This is the introductory post in a blog series that explores how we in Hortonworks Engineering build, test and release new versions of our platforms.
In this post, we introduce the basic themes and set context for deeper discussions in subsequent blogs.
At Hortonworks, we are very proud of the work we do, together with the open-source communities, to push the frontiers of data infrastructure through projects such as YARN, Hive/LLAP, Atlas and Ranger, along with several other open-source projects like Spark and Kafka.
Our ability to keep doing so depends not only on having the brightest minds in the building(s) but also, increasingly, on giving ourselves the tools to validate that work at scale, to a level of readiness at which hundreds of Enterprise customers (of Hortonworks or other distributions) can run their business on it securely and reliably.
Roughly speaking, here’s what happens in any given release:
That’s neat, you’d say? The reality is far more complex…
To give a sense of the breadth of the task of integrating and validating 25+ open-source projects into a coherent distribution (HDP or HDF), here are some of the vectors we in Hortonworks engineering deal with on a daily basis:
Mathematically, this leads to over 30K combinations: finite, but overwhelming to validate!
Navigating the “Matrix”, as we reverentially refer to it, is a really hard engineering problem, at least as hard as working on YARN or Atlas or LLAP, if not harder!
Moreover, we have several releases in flight at the same time, each requiring a different amount of testing: major releases, maintenance releases and hotfixes.
Last but not least, we have a corpus of over 30,000 functional tests alone, built up over 10 years (yes, since before life at Hortonworks), which cover:
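To see why the number of combinations grows so quickly, here is a minimal sketch of how such a matrix multiplies out. The dimension names (OS, DB, FS, JDK) come from this post; the values under each dimension are hypothetical placeholders, not the actual supported platforms, and the real Matrix has more dimensions than shown here.

```python
from itertools import product
from math import prod

# Hypothetical dimensions and values, for illustration only.
# The real Matrix has more dimensions and different values.
matrix = {
    "os": ["os_a", "os_b", "os_c"],
    "db": ["db_a", "db_b", "db_c"],
    "fs": ["fs_a", "fs_b"],
    "jdk": ["jdk_a", "jdk_b"],
}


def count_slices(dimensions):
    """Number of slices is the product of the sizes of all dimensions."""
    return prod(len(values) for values in dimensions.values())


def iter_slices(dimensions):
    """Yield every slice as a dict mapping dimension name to chosen value."""
    names = list(dimensions)
    for combo in product(*(dimensions[name] for name in names)):
        yield dict(zip(names, combo))


total = count_slices(matrix)          # 3 * 3 * 2 * 2 = 36
first_slice = next(iter_slices(matrix))
```

Even four small dimensions give 36 slices; each additional dimension multiplies the total again, which is how a handful of vectors reaches tens of thousands of combinations.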
- Unit testing
- Operational Readiness
Each of these tests has to be run on each “configuration” or “Matrix slice” (OS/DB/FS/JDK/…) before we feel comfortable shipping a release to our Enterprise customers.
To put everything into perspective, here are some daily stats on the Hortonworks machinery for each “slice”:
- 3,500 VMs
- 21,000 compute hours
- 30,000 tests
- 50+ projects (including Apache projects, connectors etc.)
- 100+ Commits
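A quick back-of-the-envelope calculation on these figures helps convey the scale. The sketch below only divides the numbers listed above; treating the load as evenly spread across VMs and tests is a simplifying assumption, not something the stats themselves state.

```python
# Daily per-slice figures taken from the list above.
vms = 3500
compute_hours = 21000
tests = 30000

# Averages, assuming (for illustration) an even spread of work.
hours_per_vm = compute_hours / vms              # 6.0 compute hours per VM
tests_per_vm = tests / vms                      # roughly 8.57 tests per VM
compute_hours_per_test = compute_hours / tests  # roughly 0.7 compute hours per test
```

On average, then, each VM is busy for about six hours per slice per day.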
This, naturally, necessitates a degree of sophistication and innovation for the infrastructure which is fairly unprecedented!
In the same vein, once the infrastructure is available, analyzing the output of the tests is a huge challenge, given the sheer breadth of tests we have built over time. Take a moment to imagine this… if 1500 tests fail due to a broken merge, it would take an enormous amount of human time to analyze the failures and pinpoint the root cause. Wouldn’t it be better to use text analytics and machine learning to categorize test failures and report them with a possible root cause? Why stop there… let’s go further and file the (internal) jira too! 🙂
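To make the idea of categorizing failures concrete, here is a deliberately simple sketch. It is not the ML-based approach this post refers to; it is a plain text-normalization stand-in that masks the volatile parts of a failure message (numbers, hex values) and groups tests that share the resulting signature. The function names and the sample messages are made up for illustration.

```python
import re
from collections import defaultdict


def signature(message):
    """Reduce a failure message to a coarse signature by masking volatile parts."""
    sig = message.lower()
    sig = re.sub(r"0x[0-9a-f]+", "<hex>", sig)  # mask hex values first
    sig = re.sub(r"\d+", "<num>", sig)          # then mask remaining numbers
    sig = re.sub(r"\s+", " ", sig).strip()      # collapse whitespace
    return sig


def group_failures(failures):
    """Group (test_name, message) pairs by signature, largest group first."""
    groups = defaultdict(list)
    for test_name, message in failures:
        groups[signature(message)].append(test_name)
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


# Made-up failure messages, for illustration only.
sample = [
    ("test_a", "Connection refused to host node-12:8020"),
    ("test_b", "Connection refused to host node-7:8020"),
    ("test_c", "Timeout after 300 seconds"),
]
grouped = group_failures(sample)
```

Here `test_a` and `test_b` collapse into one group because their messages differ only in host number, so a person would look at two categories instead of three failures. With 1500 failures from one broken merge, the same idea would ideally surface one dominant group.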
Pixie Dust to the Rescue
Wouldn’t it be nice if we could sprinkle some pixie dust and conjure up infrastructure to help us deal with this?
Unfortunately, that’s a viable option in a Disney movie, but for us – not so much.
So, as we started to look at Version 2 of our internal infrastructure (aka Project Pixie Dust) a couple of years ago, we had some lofty goals:
- Builds in 1hr
- Unit tests (UT) in 10 mins
- Continuous integration (CI) in 1 hr
- Deploy a single HDP cluster in 15 mins
- Validate 30,000 tests across 500 HDP clusters per “slice” within 6 hours
- ML-based analytics of test case logs for automated analysis and reporting – and filing jiras!
- Last, and most important – do this all on HDP! Use what we ship, and ship what we use!
So, how far did we get, and how?
The rest of this series will walk you through the fantastic feats of gymnastics we made HDP perform to get a long way toward these goals.
A teaser: Hadoop-3.0 YARN-based Docker container cloud running several million containers per release and several thousand HDP clusters per day, Ambari deploys in 10 mins or less, ML-based text-analytics for auto-categorization of failures etc.
Stay tuned! We are sure you will enjoy reading about it, and we are certainly very proud of it… for this work not only solves some very hard engineering problems, but is also a massive competitive differentiator for Hortonworks!