What’s going on in the data world right now, and how will it impact the market in 2018? There are the obvious, banner headlines, of course: AI is everywhere and will change everything; Enterprises continue to move their infrastructure – and data – to the cloud; GDPR will make data protection every company’s priority. But you knew all that. And maybe you were a bit skeptical of the grandiose claims anyway.
What substantive changes are really taking place? What do you need to be aware of as you set your architectural and procurement strategy and make decisions in those areas? We set out to identify ten impactful changes taking place in the analytics arena right now, and we present them to you here.
1. Hadoop is Fundamental
Yes, those Big Data project failure rates have been high. Yes, Spark has in some ways displaced Hadoop and increasing numbers of customers are running the former independently of the latter. So the industry blames Hadoop…and stops uttering its name. Hadoop must be dead, right?
Wrong! Everyone’s talking about Data Lakes now and, much of the time, that’s just code for Hadoop. And while, yes, many organizations are implementing their data lakes in cloud storage layers, they’re often using Hadoop ecosystem technologies to analyze that data. Beyond that, consider that cloud storage layers can be made to emulate HDFS, Hadoop’s file system, and you start to realize that when you ponder cloud Data Lakes and Hadoop Data Lakes, there’s a distinction without much difference.
The good news is that this year, Hadoop’s going to do what it always should have: see adoption by the Enterprise, without great fanfare. Hadoop will become one data tool among many, and will be used when it makes tactical sense. It’s the combination of data technologies, including Hadoop, Spark, Business Intelligence (BI) and Data Warehouses, that makes the current analytics market so exciting.
2. Bye-Bye, Enterprise Stack BI
Earlier this year, MicroStrategy, the Enterprise BI pure play, announced its concession to the companies that compete with it on the front-end, by introducing connectors to their products. MicroStrategy is doubling down on its belief that its back-end OLAP platform, and associated data governance capabilities, are where it can best monetize. The company also seems to have decided that competing on the visualization and dashboard side is difficult and, even to the extent that it can be successful, provides diminishing returns.
Will the back-end be enough to sustain Enterprise revenue and supported growth? We’ll have to see. But one thing’s for sure: The monolithic Enterprise BI stack has become disaggregated and old dogs will need to learn new tricks.
3. Data Hierarchies
Maybe you’re familiar with the concept of data hierarchy, in terms of data storage and its correlation with frequency of access. “Hot” data – that which is used most often – is sometimes routed to very fast storage like solid state drives, or even CPU memory cache. Colder data is often routed to older – but cheaper – spinning hard disk drives.
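To make the hot/cold idea concrete, here is a minimal sketch of routing data to a storage tier by how often it is read. The thresholds and the function name are hypothetical, chosen only for illustration; real systems tune these per workload.

```python
# Hypothetical cut-offs, in reads per day. These numbers are assumptions
# for illustration, not values from any particular product.
MEMORY_READS_PER_DAY = 1000
SSD_READS_PER_DAY = 10


def storage_tier(reads_per_day: float) -> str:
    """Pick a storage tier from access frequency: hotter data gets faster storage."""
    if reads_per_day >= MEMORY_READS_PER_DAY:
        return "memory"  # hottest data: memory cache
    if reads_per_day >= SSD_READS_PER_DAY:
        return "ssd"     # hot data: solid state drives
    return "hdd"         # colder data: cheaper spinning disks


if __name__ == "__main__":
    for reads in (5000, 50, 0.1):
        print(reads, "->", storage_tier(reads))
```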
With the storage hierarchy well-established, we’ll start to see recognition for other hierarchies this year. For example, analytics involves work with everything from experimental data sets that may be relevant to particular teams or business units, to highly structured, vetted and consensus-driven data that is useful to the entire Enterprise. In the middle are structured data sets that – possibly due to size, or level of cleanliness – are seen as somewhat less than production-level.
Experimental data sets sit best in a Data Lake. Highly vetted data sets are most logically kept in a data warehouse. And the mid-level data sets will likely live in Hadoop or Cloud storage, but will often be queried from relational databases, using SQL-on-Hadoop bridges like IBM Big SQL, Microsoft PolyBase, and Oracle Big Data SQL.
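The mapping just described can be written down as a simple lookup. This is only a sketch of the idea: the `Vetting` enum and `repository_for` function are hypothetical names, and a real organization would likely weigh more factors than a single vetting level.

```python
from enum import Enum


class Vetting(Enum):
    """Hypothetical labels for how vetted a data set is."""
    EXPERIMENTAL = "experimental"
    MID_LEVEL = "mid-level"
    VETTED = "vetted"


# Where each level of data set most naturally lives, per the text above.
REPOSITORY = {
    Vetting.EXPERIMENTAL: "data lake",
    Vetting.MID_LEVEL: "Hadoop or cloud storage, queried via a SQL-on-Hadoop bridge",
    Vetting.VETTED: "data warehouse",
}


def repository_for(level: Vetting) -> str:
    """Return the suggested repository for a data set's vetting level."""
    return REPOSITORY[level]


if __name__ == "__main__":
    for level in Vetting:
        print(f"{level.value}: {repository_for(level)}")
```

The same pattern would extend to the other hierarchies, for example a lookup keyed on source trustworthiness instead of vetting level.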
Another hierarchy might stratify data according to whether it will be used in the design of machine learning models or just for straight analysis. And another might be defined by the trustworthiness of the data source.
Hierarchies will be important because there’s also a hierarchy of tools and technologies, including BI and Big Data analytics tools on the query side, and transactional databases, NoSQL databases, Data Warehouses and Data Lakes on the repository side. Eventually, the hierarchies might simplify and the technologies might consolidate. But with so many technology choices right now, we’ll need hierarchies in the data to dictate our best practices in toolchain deployment.
4. Visualization Commoditization
MicroStrategy’s announcement of connectors to Tableau, Qlik and Power BI is more than a concession to competitors. It’s a de facto acceptance that those three self-service BI tools are now, essentially, the standard! Those companies have erected their own barrier to entry for others to do well in the visualization space.
They have also commoditized the whole area. Between Tableau Public, Qlik Sense Cloud Basic and Power BI Desktop (and the free tier of the Power BI cloud service), there’s a long tail of entry-level analytics that can be done for free. Add in tools like plot.ly, the D3 ecosystem and open source geospatial/mapping platforms and you’ll find your analytics capabilities are limited more by available time than by money.
Users are taking good data viz capabilities for granted now. They are still impressed by them, but not bowled over. Good viz isn’t so much a competitive edge anymore. Rather, bad viz is a competitive liability.