The sheer variety of data sources in the modern enterprise can be overwhelming: relational databases, NoSQL databases, mainframes, message queues, cloud services, and the list goes on. Ideally you want to ingest many (most!) of these data sets into Hadoop, because if the data's not in Hadoop you can't utilise the amazing capabilities of the Hadoop ecosystem, like machine learning, analytics, visualisation and data mining at a scale that was never possible with earlier systems.
For all the benefits of the Hadoop ecosystem, it also presents a very steep learning curve for IT, developers and end users. As it stands, your team may find itself installing, configuring, scripting and coding with half a dozen different tools to ingest all those different data sets.
So which tool should you use? It depends! Apache Sqoop is a common choice, but several tools could do the job, each with its own pros and cons. And that's only the RDBMS case; you'll be confronted with a laundry list of software to cover all the other data types too.
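To give a sense of what the hand-scripted route involves, here's a minimal sketch of bulk-importing relational tables into Hive by driving Apache Sqoop from Python. The connection URL, table names and username are made-up placeholders, and this covers only the happy path for one source type:

```python
import subprocess

def sqoop_import_cmd(jdbc_url: str, table: str, username: str) -> list[str]:
    """Build a Sqoop command that copies one RDBMS table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,
        "--hive-import",       # create and load a matching Hive table
        "--num-mappers", "4",  # four parallel copy streams
    ]

if __name__ == "__main__":
    # Hypothetical warehouse connection -- replace with your own details.
    url = "jdbc:oracle:thin:@dw-host:1521/DWPROD"
    for table in ["CUSTOMERS", "ORDERS", "ORDER_ITEMS"]:
        subprocess.run(sqoop_import_cmd(url, table, "etl_user"), check=True)
```

Multiply this by error handling, incremental loads, type mapping quirks per database, and then by every non-relational source, and the "half a dozen tools" problem becomes clear.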
To shorten both the learning curve and the implementation time, it makes sense to use a tool such as Syncsort DMX-h to simplify data ingestion.
DMX-h offers the fast, easy connectivity you need for all your enterprise data sources.
For large enterprises and government departments, moving mainframe data into Hadoop can sometimes end up in the too-hard basket. Converting mainframe EBCDIC format to ASCII, unpacking mainframe numeric formats and dealing with COBOL copybooks can certainly be complex. However, there's a treasure trove of information locked away on the mainframe that's too valuable to ignore.
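To illustrate the kind of low-level work involved, here's a rough sketch of decoding a mainframe record by hand in Python. It assumes EBCDIC code page 037 and a COMP-3 (packed decimal) numeric field; the record layout is hypothetical, and a real copybook brings many more field types:

```python
def ebcdic_to_ascii(raw: bytes) -> str:
    """Decode EBCDIC text (code page 037) to a Python string."""
    return raw.decode("cp037")

def unpack_comp3(raw: bytes) -> int:
    """Unpack a COBOL COMP-3 (packed decimal) field.

    Each byte carries two BCD digits; the final low nibble is the
    sign (0xD = negative, 0xC or 0xF = positive/unsigned).
    """
    value = 0
    for byte in raw[:-1]:
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    value = value * 10 + (raw[-1] >> 4)
    return -value if (raw[-1] & 0x0F) == 0xD else value

# Hypothetical record: a 5-byte EBCDIC text field plus a 2-byte packed amount.
record = b"\xC8\x85\x93\x93\x96" + b"\x12\x3C"
print(ebcdic_to_ascii(record[:5]))  # -> Hello
print(unpack_comp3(record[5:]))     # -> 123
```

This handles just two field types for one code page; parsing copybooks, redefines, occurs clauses and variable-length records is where the real complexity lives, which is why a purpose-built tool earns its keep.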
Syncsort makes mainframe data ingestion to Hadoop as simple and secure as it is for any other data type, and for industries where compliance and data lineage are vital, DMX-h can even process mainframe data on Hadoop and Apache Spark without conversion - try doing that with another tool!
Data Funnel is a new feature of DMX-h that's rapidly becoming a favourite, offering the ability to move hundreds of database tables at a time with a single click. Now you can transfer large volumes of data quickly and easily from an existing data warehouse running on a system like Teradata, Oracle or SQL Server into Hive and HDFS on Hadoop.
All in all, Syncsort DMX-h is a tool that will simplify efforts to populate your Hadoop data lake, free up data scientists and developers to concentrate on high-value activities that affect revenue, and allow less technical staff easy access to the treasure trove of data you're building with Hadoop.
Find out how to get all types of data into Hadoop quickly, easily and with no coding required - take a test drive of Syncsort DMX-h today.