GPU leader Nvidia, generally associated with deep learning, autonomous vehicles and other higher-end AI-related workloads (and gaming, of course), is mounting an open-source, end-to-end GPU acceleration platform and ecosystem aimed at machine learning and data analytics, domains heretofore within the CPU realm.
GPUs already operate in the analytics space, of course: examples include Kinetica’s GPU-accelerated database and OmniSci’s (formerly MapD) Core database system for big data query and visualization. Now, as part of Nvidia’s effort to push GPU acceleration into ML and HPDA (high performance data analytics), the company reports that its RAPIDS platform trains with the XGBoost machine learning algorithm 50x faster on an NVIDIA DGX-2 supercomputer than on CPU-only systems.
RAPIDS brings with it an ecosystem from the open-source community, including Databricks (a web-based platform for big data processing in the cloud using Apache Spark) and Anaconda (an open-source distribution of the Python and R programming languages for data science and machine learning), as well as tech companies such as Hewlett Packard Enterprise, IBM and Oracle.
The RAPIDS suite of open-source libraries has been under development for the past two years by Nvidia engineers working with open-source contributors, including those behind Apache Arrow (a data layer for in-memory analytics), pandas and scikit-learn. It is designed to give data scientists the tools to run the entire data science pipeline on GPUs. RAPIDS builds on popular open-source projects by adding GPU acceleration to the Python data science toolchain.
“We’re building on the community of Python users… and more recently built around… Apache Arrow and an in-memory data format and some other tools that allow us to scale from using just one GPU to multiple GPUs in the system, to multiple nodes and clusters of GPUs,” said Jeff Tseng, head of product for AI infrastructure at Nvidia, in a pre-announcement conference call. “These technologies are driving RAPIDS’ ability to integrate into today’s most popular data science workloads and accelerate them…. We’re going to be focused on business data, on tabular data, and we’re going to accelerate machine learning data prep.”
You can read the rest of the story at EnterpriseTech.