Data is business. The faster an organization can process data, the better it can react to business events in real time.
Today, organizations are bringing together new types of data from a variety of internal and external sources for analytics, often in real time. They are exploring new architectures, such as next-generation data lakes, information hubs and real-time streaming architectures, to process and gain value from this data and to build machine learning models on their path to artificial intelligence (AI). There is a growing need for capabilities that can efficiently feed data into landing zones or information hubs and then process large data sets quickly enough to respond to changing business events.
According to the third Gartner Chief Data Officer survey in 2017, chief data officers (CDOs) are not focused solely on data, as the title may imply. Their responsibilities span data management, analytics, data science, ethics and digital transformation. In the 2017 survey, 86 percent of respondents ranked “defining data and analytics strategy for the organization” as their top responsibility, up from 64 percent in 2016. This reflects the need to create or modernize data and analytics strategies amid an increasing dependence on data and insights in a digital business context.
As organizations struggled to manage the ingestion of rapidly changing structured operational data, next-generation data lake models evolved that stream data through Kafka-based information hubs.
Kafka was conceived as a distributed streaming platform. Its low-latency pipeline enables real-time event processing, movement of data between systems and applications, and real-time transformation of data. Kafka is not just a pipeline, though: data can also be stored on the platform. Kafka-based information hubs therefore go well beyond feeding the data lake, seamlessly delivering continuously changing data in real time for downstream data integration with everything from cloud to AI environments.
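Kafka's ability to both move and retain data comes from its core abstraction: each topic partition is an append-only log that records keep their position (offset) in, and that consumers can replay from any offset. The toy model below (an illustrative sketch only; real Kafka distributes partitions across brokers and handles retention, replication and concurrency) shows just that log-and-offset idea:

```python
class MiniLog:
    """Toy model of one Kafka topic partition: an append-only log.
    Illustrative only -- real Kafka partitions live on brokers,
    are replicated, and expire records per a retention policy."""

    def __init__(self):
        self.records = []  # retained records, addressable by offset

    def append(self, key, value):
        """Append a record and return its offset in the log."""
        self.records.append((key, value))
        return len(self.records) - 1

    def read_from(self, offset):
        """Consumers track their own offset and can replay old data."""
        return self.records[offset:]


log = MiniLog()
log.append("order-1", "created")
log.append("order-1", "shipped")
# A late-joining consumer can still replay everything from offset 0:
assert log.read_from(0) == [("order-1", "created"), ("order-1", "shipped")]
```

Because the log is retained, downstream systems (a data lake loader, a streaming analytics job) can each consume the same stream independently at their own pace.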
To deliver transactional data into Kafka-based information hubs, organizations need a replication platform, such as IBM Data Replication, that provides a Kafka target engine. The engine writes data to Kafka using either a Java API-based writer with built-in buffering or a REST (representational state transfer) API. The Kafka target engine fully integrates with the data replication platform’s low-impact, log-based capture from a wide variety of sources, including Db2 for z/OS, Db2 for i, Db2 LUW, Oracle, Microsoft SQL Server and even IBM VSAM and IMS.
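To make the REST path concrete: REST-style produce endpoints in the Kafka ecosystem (for example, the Confluent REST Proxy's `POST /topics/<topic>`) accept a JSON envelope containing a `records` array of key/value pairs. The sketch below builds such an envelope from captured change events. It is a hypothetical helper, not IBM Data Replication's actual writer; the field names inside each change event (`op`, `table`, `after`) are illustrative:

```python
import json


def to_produce_payload(change_events):
    """Wrap captured change events in the JSON envelope used by
    REST Proxy-style produce endpoints (POST /topics/<topic>).
    Hypothetical helper; the shape of each event is illustrative."""
    return json.dumps({
        "records": [
            {"key": ev["key"], "value": ev["row"]} for ev in change_events
        ]
    })


payload = to_produce_payload([
    {"key": "42",
     "row": {"op": "UPDATE", "table": "ORDERS",
             "after": {"status": "SHIPPED"}}},
])
```

A replication engine would POST this payload (with an appropriate `Content-Type` such as `application/vnd.kafka.json.v2+json`) to the topic endpoint; the Java API-based writer achieves the same result by batching records through a Kafka producer client instead.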
There is often little room for latency in real-time data transfer. You therefore need a data replication capability that incrementally replicates changes captured from database logs in near real time; the data it lands in Kafka can then drive streaming analytics, feed a data lake and more.
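The essence of incremental, log-based capture is that each replication cycle ships only the entries past the last replicated log position, then advances that position. A minimal sketch of that bookkeeping, assuming log entries carry a monotonically increasing sequence number (called `lsn` here for illustration):

```python
def incremental_changes(log_entries, last_lsn):
    """Return only the log entries newer than the last replicated
    position (LSN), plus the new position to persist for restart.
    Sketch only -- a real engine reads the database log, handles
    transactions, and commits the position atomically with delivery."""
    delta = [e for e in log_entries if e["lsn"] > last_lsn]
    next_lsn = delta[-1]["lsn"] if delta else last_lsn
    return delta, next_lsn


entries = [
    {"lsn": 1, "op": "INSERT"},
    {"lsn": 2, "op": "UPDATE"},
    {"lsn": 3, "op": "DELETE"},
]
# Entry 1 was already replicated, so only the delta (2 and 3) ships:
delta, pos = incremental_changes(entries, last_lsn=1)
assert [e["lsn"] for e in delta] == [2, 3] and pos == 3
```

Persisting `pos` after each delivery is what lets replication resume after a failure without re-sending already-applied changes.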
The good news is that today there are alternative technologies available for analytics leaders to choose from, based on their business requirements and use cases.
If you are an analytics leader using or planning to deploy Apache Kafka, check out this IBM webcast and learn how to make your information hub and data lake journey successful with the right data replication solution.
To get started and learn how you can use IBM Data Replication for incremental delivery of transactional data to feed your Hadoop-based data lakes or Kafka-based data hubs, read the IBM Data Replication for Big Data Solution Brief.