MapR Technologies welcomed more than 100 people Wednesday to a one-day Convergence event in San Diego, where local companies like Qualcomm and genomic startup LunaDNA shared stories about their big data journeys.
Dawn Barry, the president and co-founder of LunaDNA, provided a compelling keynote address on the power of genomics and the benefits that it will generate for humankind. While genomic data will eventually lead to great progress in disease prevention, food yield, and public safety (to name a few areas), there’s a lot of work to be done in the meantime.
For starters, the DNA data will need to be correlated with clinical medical information, lifestyle, environment, and nutrition data to get answers. “Just the genomic data is a heck of a lot of data,” she said. “Now you’re going to add more of that because we’re going to make correlations. You’re talking about a tsunami of data.”
Spun off last year from San Diego-based genomic sequencing firm Illumina, LunaDNA is a public benefit corporation that’s building a community that encourages people to share their genomic data. The data would be shared with researchers in a manner that respects its security and privacy, and that also provides value back to the people who share it, in the form of dividends.
Barry is confident LunaDNA’s approach can successfully get the genomics revolution into high gear. “In the area of human health… a lot of us don’t feel like it’s moving along as fast as it could,” she said. “We believe that by bringing people to the center of the research we’ll get that research, that dynamic data, that longitudinal data that can affect contextual data and through that be able to make meaningful decisions.”
The size and variety of the data types surrounding genomics demand a scalable and flexible platform, Barry says. “It’s not just humans in isolation. It’s really about this interconnected system, this interconnected environment, and that’s why platforms like MapR [are needed],” she said. “Companies that lay the foundation for the platform, like MapR, can really handle the massive amount of data that’s coming and can be positioned well for the long haul.”
Another MapR customer is Optum, a subsidiary of UnitedHealth Group that has built a sizable data analytics practice dubbed OptumIQ geared toward helping healthcare companies improve health outcomes while reducing costs. Central to that offering is a MapR cluster that spans 1,300 nodes and stores more than 54PB of data, according to Jay Hugalavalli, senior director of software engineering service for Optum.
As Hugalavalli explained, Optum uses a wide collection of technologies from the Hadoop stack with its MapR cluster, including core components like HDFS, YARN, MapReduce, Hive, and HBase, as well as relative newcomers like Spark and Drill. Optum also uses MapR-Streams and Kafka to manage the movement of big data streams, and MapR-DB to provide an operational data store. Kubernetes handles orchestration, while GPUs provide a processing boost for deep learning workloads.
All told, the MapR cluster serves more than 4,500 users at Optum, which views the cluster as core to its analytics mission. Being able to run deep learning and AI workloads on the same cluster where SQL and NoSQL workloads reside is a big benefit, Hugalavalli says.
Another local firm using MapR is Qualcomm, the $22.3-billion mobile chip manufacturer that employs more than 33,000 workers. According to Chandra Mouli, Qualcomm’s vice president of engineering IT, there are multiple ways the company utilizes big data products.
For starters, Qualcomm uses machine learning to analyze the files generated from the billions of design jobs it runs across its HPC and cloud-based computing resources as it designs its chips, Mouli told MapR Chief Technology Officer Tom Fisher during a cool fireside chat.
“You can imagine how many petabytes of data we generate. A lot of them are temp files we don’t need as the design progresses,” he says. “Cost is an aspect that we’re very focused on. So we’re using machine learning to go and mine the hundreds of millions of files to see who touched it, who used it, and if we don’t need it, how do you delete it.”
The company also uses machine learning in the circuit-design process itself, including before the design has been finalized, to predict more efficient ways of doing things. Machine learning is also used after the design is set and the chip fabs are getting ready to ramp up manufacturing.
“Once the design is done and the chip factories come out to talk to us about yield management, we are able to predict using machine learning the wafer yields and you can predict is it going to fail,” Mouli says. Better insight into failure rates helps the company save money during the testing process, he says.
Qualcomm, which is a MapR customer and has been tied to Cloudera in the past, is also using big data platforms to manage the flow of data from fleets of connected cars in Europe into its data stores. Qualcomm writes 9 million messages per day, weighing about 100TB, into its big data cluster. Getting that data into a format and place where it can be useful is no trivial feat, says Mouli, who previously worked in the relational database world.
“Getting the data in is so important. If you don’t have the data, you can’t have the machine learning,” he says. “I built ETL systems from scratch. You know if you’ve done that kind of work, how hard it is. You really have a bottleneck of getting the data in, even in a traditional world.”
The next MapR Convergence event is scheduled for August 23 in Atlanta, Georgia.