Organizations these days demand to run their big data workloads wherever and whenever they want – in their own private data centers, in public clouds, on the edge, and every which way in between. Hortonworks acknowledged that reality today by announcing expanded cloud partnerships with Google, Microsoft, and IBM. It also announced the first release of its Hortonworks Data Platform (HDP) based on Apache Hadoop version 3.
Hortonworks already had partnerships with Google, Microsoft, and IBM to run its products in their cloud offerings. You can also get Hortonworks DataFlow (HDF), its streaming data solution, and Hortonworks DataPlane on the biggest public cloud, Amazon Web Services. But with today’s announcement on the first day of Hortonworks’ annual DataWorks Summit (formerly Hadoop Summit), the company is reiterating its commitment to ensuring that its customers’ HDP, HDF, and DataPlane implementations will run consistently across all of those environments.
“We certainly believe that for a very long time our customers are going to need to implement their modern data architecture across multiple hybrid footprints on prem and in the cloud,” said Hortonworks CTO Scott Gnau. “So whether you’re running against data that’s in AWS or you’re running against data in your data center, being able to have a common software infrastructure that allows that application to exist and run and be portable without being rewritten becomes extremely important.”
Hortonworks’ cloud footprint is relatively small at this point. Out of 1,400 paying customers, only 25% are running some Hortonworks software in the cloud, according to Gnau. By comparison, 95% of its customers have an on-premises footprint, which means only 5% are cloud-only and roughly 20% run in both environments.
“Our goal there is to leverage the best of both worlds,” Gnau tells Datanami. “Customers are looking for flexibility in deployment. They’re looking for the ability to have burst processing capabilities. They want to take advantage of agile deployment via containerization and serverless tech.”
It’s the small details that really count when building data infrastructure that can run just about anywhere. In most of these cloud implementations, HDP runs with the cloud’s favored object storage system, such as Amazon S3, Microsoft’s Windows Azure Storage Blob (WASB), or Google Cloud Storage (GCS), rather than HDFS. Beyond that, Hortonworks is focused on ensuring that the management, security, and governance of data are handled in a consistent manner through projects like Apache Atlas and Apache Ranger.
“We can give you that common experience across all of those footprints and let you take advantage of that, but do it in a highly secure and manageable way,” Gnau says. “We’ll give you … that common look and feel, that application portability, and a common state of tools to let you manage the estate of your data regardless of where it’s physically stored.”
The cloud offerings are not all the same. On Microsoft Azure, customers have the choice of running HDP and HDF themselves or using HDInsight, a hosted Hadoop service based on Hortonworks’ Hadoop distribution that Microsoft manages on customers’ behalf. “We are giving customers the most choice as to how they move data workloads to the cloud, on Azure or Microsoft HDInsight for an enterprise-grade managed service that makes it easier for end users,” stated Rohan Kumar, corporate vice president of Azure Data for Microsoft.
IBM, meanwhile, is launching a hosted big data solution dubbed IBM Hosted Analytics with Hortonworks (IHAH), which includes HDP, IBM Db2 Big SQL, and the IBM Data Science Experience. “The strong support we’ve received over the last year for our integrated solutions led IBM and Hortonworks to extend our joint efforts even further,” stated Rob Thomas, the general manager of IBM Analytics, in a press release.
Hortonworks also released HDP 3, the first distribution of its enterprise data platform built on Apache Hadoop version 3, which brings enhancements in the areas of separation of compute and storage, containerization, and deep learning.
Specifically, Hortonworks is highlighting the capacity to support object stores in HDP 3 as an alternative to HDFS when running in the cloud. The company is also touting Apache Hive 3.0, which brings improvements to interactive query. Support for GPUs within Hadoop 3’s YARN scheduler also enables new deep learning and machine learning workloads to run on supported Hadoop clusters. Currently, many of those workloads run on clusters adjacent to Hadoop.
Hortonworks expects thousands to attend its DataWorks Summit, which takes place today through Thursday at the San Jose McEnery Convention Center.