This blog post covers some of the sessions from Dataworks Summit San Jose 2018 that focus on the efforts of the Apache Hadoop YARN community. Come & explore the latest and greatest of Apache Hadoop YARN at Dataworks Summit San Jose 2018!
The agenda is packed as usual.
If you are specifically interested in the latest developments in YARN and the larger Hadoop project in general, you can use this post as a map of sorts – pointing to some of the talks and sessions that explore the latest trends and happenings in the Apache Hadoop YARN community.
Sessions at the main conference
Apache Hadoop YARN 3.x in Alibaba
By: Weiwei Yang (Alibaba Group) & Ren Chunde (Alibaba Group)
When/Where: Thursday, June 21 10:20 AM – 11:00 AM, Meeting Room 211A/B/C/D
Attend this talk to learn how Alibaba’s data infrastructure has been built on Apache Hadoop YARN since 2013, how they manage more than 10k nodes, and how Hadoop YARN serves all types of workloads – batch jobs, streaming, machine learning, OLAP, and even online services that directly impact Alibaba’s user experience. They will also share how they leverage many of the YARN 3.x improvements and how they keep evolving YARN’s ability to tackle the challenges brought by continuously growing data and business at Alibaba.
Apache Hadoop YARN: state of the union
By: Vinod Kumar Vavilapalli (Hortonworks) (hey, that’s me!) and Sunil Govindan (Hortonworks).
When/Where: Tuesday, June 19 11:00 AM – 11:40 AM at Grand Ballroom 220A
In this talk, we deliver the annual sermon covering the latest and greatest of Apache Hadoop YARN. We’ll start with the current status of YARN, then move on to the exciting present and future of YARN.
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
By: Suma Shivaprasad (Hortonworks) & Rohith Sharma (Hortonworks)
When/Where: Tuesday, June 19 2:50 PM – 3:30 PM, Meeting Room 211A/B/C/D
The Hadoop community announced Hadoop 3.0 GA in December 2017 and 3.1 around April 2018, loaded with a lot of features and improvements. One of the biggest challenges for any new major release of a software platform is upgrading to it. This talk focuses on upgrades to Hadoop 3 and provides a cluster upgrade guide for admins and a workload migration guide for users of Hadoop.
Running distributed Tensorflow in production: challenges and solutions on YARN 3.0
By: Wangda Tan (Hortonworks) and Yanbo Liang (Hortonworks)
When/Where: Wednesday, June 20 11:00 AM – 11:40 AM, Grand Ballroom 220A
Deep learning is popular, and Tensorflow is one of the most popular deep learning platforms. More and more enterprises are starting to try Tensorflow to solve their use cases. With the latest features added to YARN, such as GPU isolation, placement constraints (how to wisely place workers/parameter servers to better leverage resources), Docker container integration, and native service support, YARN has great support for deep learning and machine learning workloads!
YARN federation: taming a beasty fleet with global optimizations
By: Carlo Curino (Microsoft) & Subru Krishnan (Microsoft)
When/Where: Thursday, June 21 9:30 AM – 10:10 AM, Executive Ballroom 210D/H
As they say, operating one cluster is hard, operating a few large clusters is harder, operating many massive clusters is… terrible. Attend this talk to learn how the Microsoft team tackles this issue, drawing on their experience operating several clusters, each with tens of thousands of Hadoop nodes. They will present a new component, the Global Policy Generator, that makes the operation of such clusters seamless for users and painless for operators by overseeing the operations of an entire virtually unified, federated Hadoop YARN cluster. If you attended their talk last year on running Hadoop YARN on 40K nodes, don’t miss this one!
Containers and Big Data
By: Billie Rinaldi (Hortonworks) and Shane Kumpf (Hortonworks)
When/Where: Wednesday, June 20 2:00 PM – 2:40 PM, Grand Ballroom 220A
As containerization continues to gain momentum and become a de facto standard for application deployment, challenges around containerization of big data workloads are coming to light. Attend this session to explore the emerging patterns and challenges related to containers and big data workloads, including running applications such as Apache Spark, Apache HBase, and Kubernetes in containers on YARN.
Rich placement constraints: Who said YARN cannot schedule services?
By: Konstantinos Karanasos (Microsoft) and Wangda Tan (Hortonworks)
When/Where: Wednesday, June 20 11:50 AM – 12:30 PM, Executive Ballroom 210C/G
Optimizing the performance and resilience of machine learning, streaming, and latency-sensitive online applications in shared production clusters requires precise control of their placement by means of complex constraints. Attend this joint talk, in which two Hadoop PMC members present the brand-new addition of expressive placement constraints in YARN. They describe real use cases from production clusters and show the benefits of placement constraints on large clusters using popular applications in both on-prem and cloud settings.
Related Hadoop talks
Scaling Hadoop at LinkedIn
By: Konstantin Shvachko (LinkedIn) and Erik Krogen (LinkedIn).
When/Where: Tuesday, June 19 2:00 PM – 2:40 PM, Grand Ballroom 220A
LinkedIn leverages the Apache Hadoop ecosystem for its big data analytics. This talk will tell the story of how our friends at LinkedIn doubled their Hadoop infrastructure twice in the past two years, catering to the big data analytics that powers the steady growth of the member base at LinkedIn.
Exploiting machine learning to keep Hadoop clusters healthy
By: Dheeraj Kapur (Oath) & Swetha Banagiri (Oath)
When/Where: Wednesday, June 20 4:40 PM – 5:30 PM, Executive Ballroom 210B/F
Oath has one of the largest Hadoop footprints, with tens of thousands of jobs run every day on 50k+ nodes. Attend this talk to see how the folks at Oath use machine learning to predict whether any one of their 200k+ disks is going to fail.
Migrating a live Hadoop cluster with zero user intervention
By: Sumit Kumar Mukherjee (Adobe Systems)
When/Where: Thursday, June 21 10:20 AM – 11:00 AM, Grand Ballroom 220A
Attend this talk to learn about Adobe’s experience in moving their Hadoop workloads from one datacenter to a brand-new Hadoop cluster in another datacenter – all in under 30 minutes and with no user intervention, using an innovative network-based migration strategy that ensured users did not have to make any changes!
Quality for the Hadoop Zoo
By: Sunitha Velpula (Hortonworks)
When/Where: Thursday, June 21 12:20 PM – 1:00 PM, Meeting Room 230B
Hadoop distributions can be a combination of 25+ open source projects! Ensuring the quality of such a complex stack and its many combinations can be overwhelming. Attend this talk to get a glimpse into how such a stack can be tested to cater to the myriad combinations of workloads and environments, across vectors like operating systems, JDKs, and so on.
Birds of a Feather
Come join the discussion at the Birds of a Feather (BOF) session: share your experiences, challenges, future interests, and requirements on Apache Hadoop YARN, and discuss what’s on the roadmap and future design options. The session takes place on Wed, June 20th at 5:40 PM in Meeting Room 211A/B/C/D.
See you there!
That is just a small portion of the full agenda, though. There are lots of other sessions that cover other exciting topics in open source, big data, analytics, data science, and artificial intelligence. See you all at Dataworks Summit, and have a great conference!