Big Data & Analytics Blog

News and views from the team at Keylink

A Simple, No-Coding Approach for Ingesting Data into Hadoop

Wednesday, November 2, 2016

The sheer variety of data sources in the modern enterprise can be a bit overwhelming - traditional databases, NoSQL databases, mainframes, message queues, cloud services - the list goes on. Ideally you want to ingest many (most!) of these data sets into Hadoop, because if the data isn't in Hadoop you can't take advantage of the amazing capabilities of the Hadoop ecosystem - machine learning, analytics, visualisation and data mining at a scale that was never possible with earlier systems.

For all the benefits of the Hadoop ecosystem, it can also present a very steep learning curve for IT, developers and end users. As it stands, your team may find themselves installing, configuring, scripting and coding with half a dozen different tools just to ingest all those different data sets.

Pop Quiz

I want to ingest data from an existing relational database (RDBMS) to Hadoop - do I use Apache Flume, Apache Sqoop or Apache Kafka?

Answer: it depends! Apache Sqoop is the most common choice, but you could actually use any of these three tools, each with its own pros and cons. And that's only the RDBMS case; you'll be confronted with a laundry list of software to cover all the other data types too.
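To give a flavour of what the hand-rolled route looks like, here's a typical Sqoop import of one table into HDFS. The hostname, database, credentials and paths below are illustrative placeholders, not from any real environment:

```shell
# Import a single RDBMS table into HDFS with Apache Sqoop.
# Connection string, user, table and target directory are placeholders.
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table customers \
  --target-dir /data/raw/sales/customers \
  --num-mappers 4 \
  --as-parquetfile
```

Multiply that by hundreds of tables - plus Flume or Kafka configuration for the streaming sources - and the appeal of a single graphical environment becomes clear.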

Simplifying data access

In terms of learning curve and implementation time it really makes sense to use a tool such as Syncsort DMX-h to simplify data ingestion by providing:

  • A single software environment for accessing and integrating all your enterprise data sources – batch and streaming – on premises or in the cloud
  • An environment that evolves with the Hadoop ecosystem to keep you current without rewriting jobs or acquiring new skills
  • A single easy-to-use graphical interface to democratise data access for less technical users

DMX-h offers the fast, easy connectivity you need for all your enterprise data sources:

  • Files – CSV Delimited Files, JSON, XML, Avro, Parquet
  • Database – Hive, HCatalog, IBM DB2, Greenplum, SQL Server, Netezza, Oracle, Sybase, Teradata, Vertica, MySQL, ODBC, JDBC
  • Mainframe – EBCDIC fixed & variable, VSAM, packed decimal
  • NoSQL – Cassandra, HBase, MongoDB
  • Streaming – Apache Kafka, MapR Streams, IBM WebSphere MQ
  • Cloud – Amazon S3, Amazon Redshift, Google Cloud Storage
  • Plus – Tableau, Qlik, Salesforce.com, SAP/R3

Mainframe data on Hadoop too?

For large enterprises and government departments, moving mainframe data into Hadoop can sometimes end up in the too-hard basket. Converting mainframe EBCDIC data to ASCII, unpacking mainframe numeric formats and dealing with COBOL copybooks can certainly be complex. However, there's a treasure trove of information locked away on the mainframe that's too valuable to ignore.
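To give a sense of what those conversions involve, here's a minimal Python sketch of two of them: decoding an EBCDIC text field (using the standard cp037 code page) and unpacking a COMP-3 packed-decimal number. Real jobs are driven by a COBOL copybook describing every field; the byte values below are a made-up illustration:

```python
# Minimal sketch of two common mainframe data conversions.
# In practice a COBOL copybook defines the record layout;
# the field contents here are illustrative only.

def decode_ebcdic(raw: bytes) -> str:
    """Decode an EBCDIC (code page 037) text field to a Python string."""
    return raw.decode("cp037")

def unpack_comp3(raw: bytes, scale: int = 0) -> float:
    """Unpack a COMP-3 (packed decimal) field.

    Each byte holds two BCD digits, except the last byte, whose low
    nibble is the sign: 0xD means negative, 0xC or 0xF non-negative.
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)      # last byte: high nibble is a digit...
    sign_nibble = raw[-1] & 0x0F     # ...low nibble is the sign
    value = 0
    for d in digits:
        value = value * 10 + d
    if sign_nibble == 0x0D:
        value = -value
    return value / (10 ** scale)

# The EBCDIC bytes for "HELLO", and the packed bytes for -123.45:
print(decode_ebcdic(b"\xC8\xC5\xD3\xD3\xD6"))   # -> HELLO
print(unpack_comp3(b"\x12\x34\x5D", scale=2))   # -> -123.45
```

That's just two field types; a production copybook mixes these with zoned decimals, binary fields and variable-length records, which is exactly the complexity a tool like DMX-h hides.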

Syncsort makes mainframe data ingestion to Hadoop as simple and secure as any other data type, and for industries where compliance and data lineage are vital, DMX-h can even process mainframe data on Hadoop and Apache Spark without conversion - try doing that with another tool!

Move a whole database schema to Hadoop with a single command

Data Funnel is a new feature of DMX-h that's rapidly becoming a favourite, offering the ability to move hundreds of database tables at a time with a single click. Now you can transfer large volumes of data quickly and easily from an existing data warehouse running on a system like Teradata, Oracle or SQL Server into Hive and HDFS on Hadoop.

All in all, Syncsort DMX-h is a tool that will simplify efforts to populate your Hadoop data lake, free up data scientists and developers to concentrate on high-value activities that affect revenue, and allow less technical staff easy access to the treasure trove of data you're building with Hadoop.

Find out how easy it is to get all types of data into Hadoop quickly, easily and with no coding required - take a test drive of Syncsort DMX-h today.

Category:
Syncsort DMX-h

Need Help?

Struggling with data integration, ETL, Hadoop, dashboards? We can help.

Our Products
Contact us