Genomics holds real promise to improve healthcare for countless patients worldwide. When the genomic roots of a disorder have been identified, pharmaceutical companies can develop treatments that target those underlying causes. Clinicians can then use these targeted therapies to develop treatment plans individualized to each patient, increasing the chances of successful outcomes.
Massive genomic data sets must be analyzed and compared to identify the variants that can spur these breakthroughs. With incredible potential and massive data, it’s not surprising that genomics has been identified as one of the largest generators of data over the next 10+ years (“Big Data: Astronomical or Genomical?” PLOS Biology July 7, 2015). The demand for technology in the genomics space is growing almost as quickly as the data.
Intel is a data company. We address big data problems and help organizations sort, analyze, and interpret data to solve real-world problems. Intel has capabilities in software optimization, scaling solutions, and speeding up the time to meaningful outcomes. We apply new technologies like faster processors, NVMe* and PCIe* SSDs, FPGAs, high-speed fabrics, and artificial intelligence (AI) to address these big data problems. What better problem is there to address than curing disease?
Collaborating to Support Genomics Research: BIGstack
In late 2016, Intel and the Broad Institute of MIT and Harvard, the leader in genomics research, announced the Center for Genomic Data Engineering with a five-year, $25M commitment from Intel. This Center was a unique partnership for both organizations. Intel saw the opportunity to make a real difference in people’s lives by combining the expertise of the two organizations to support, analyze, and manage the rapidly increasing genomics data available to researchers, pharmaceutical companies, and clinicians.
In May 2017, Intel announced BIGstack (Broad Intel Genomics Stack) as an early result of the collaboration. BIGstack is an integrated hardware and software stack designed to run the Broad Institute Genome Analysis Toolkit (GATK) more quickly, at a larger scale, and with easier deployment. BIGstack is based on the GATK Best Practices pipelines, with optimizations for Intel® architecture and tools to improve performance (e.g., Intel’s Genomics Kernel Library (GKL) and Intel’s GenomicsDB).
Introducing BIGstack 2.0
It has been an exciting first year for the Broad Institute and Intel collaboration as we continue to improve GATK performance and help scientists and clinicians to treat disease.
Today, I’m thrilled to announce the release of BIGstack 2.0. BIGstack 2.0 incorporates our latest Intel® Xeon® Scalable processors, Intel® 3D NAND SSDs, and Intel FPGAs while also leveraging the latest genomic tools from the Broad Institute in GATK 3.8 and GATK 4.0.
This new stack provides a 3.34x speedup in whole genome analysis and a 2.2x increase in daily throughput.¹ It delivers these performance improvements at a cost of just $5.68 per whole genome analyzed.² The result: researchers will be able to analyze more genomes, more quickly and at lower cost, enabling new discoveries, new treatment options, and faster diagnosis of disease.
We are working with many organizations to leverage BIGstack. I am thankful for two of our key customers, BGI and Novogene, for their collaboration.
Fang Lin, Chief Information Officer of BGI, said, “BGI was able to leverage Intel Xeon Scalable processors, Intel FPGAs, and BIGstack to improve whole genome analysis to a few hours. Intel’s BIGstack provided the hardware architecture and software optimizations to allow BGI to further optimize our genomics pipeline to meet our latency goals.”
Tian Shilin, Novogene Chief Information Officer, added, “Novogene has increased capacity to generate genomic data 30-fold to 600 PB per year. To address this increase in data, Novogene worked with Intel to implement BIGstack to meet the speed and scale required to support the dramatic increase in genomic data. Novogene was able to leverage BIGstack to rapidly deploy a solution to meet our requirements.”
Intel Select Solutions for Genomics Analytics
Intel will further facilitate the evolution of genomics and make it easier for researchers to adopt the latest genomics technology. I’m excited to announce Intel® Select Solutions for Genomics Analytics – premium, turnkey genomics solutions that will be offered by a limited number of Intel ecosystem partners.
Intel Select Solutions are verified hardware and software stacks that are optimized for specific workloads across compute, storage, and network. Intel Select Solutions for Genomics Analytics are based on BIGstack 2.0 and provide the added benefits of feature-rich, Intel-verified solution configurations.
Intel Select Solutions for Genomics Analytics adhere to tight hardware and software specifications, a rigorous benchmarking methodology, and a verification process by Intel’s internal engineering organization. Intel Select Solutions for Genomics Analytics are a fast path to unlock the benefits of BIGstack 2.0, enabling customers to confidently deploy infrastructure purpose-built for genomics workloads.
BIGstack at SC17 and Beyond
Intel is excited to be driving the genomics revolution and delivering a new era of precision therapies. We are also happy to announce that our partners Inspur, Colfax, HPE, Lenovo, and Atos are supporting BIGstack. Please visit their booths at SC17 to learn more about BIGstack.
- Inspur Booth # 1643
- Colfax Booth # 1219
- HPE Booth # 925
- Lenovo Booth # 1353
- Atos Booth # 1925
1 Configuration details for the 3.34x and 2.2x performance claims:
|Component||Latest-Generation Platform Configuration with Intel® Arria 10 FPGA PCIe Card||Latest-Generation Platform Configuration||Previous-Generation Platform Configuration|
|Number of Nodes||1||1||1|
|Processor||2 x Intel® Xeon® Platinum 8180 Processor with 28 cores each (56 total)||2 x Intel® Xeon® Platinum 8180 Processor with 28 cores each (56 total)||2 x Intel® Xeon® Processor E5-2699 v4 with 22 cores each (44 total)|
|FPGA||1 x Intel® Arria 10 FPGA PCIe Card||NA||NA|
|Memory||16 x 32 GB 2666 REG ECC (Total 512 GB)||16 x 32 GB 2666 REG ECC (Total 512 GB)||16 x 32 GB 2400 REG ECC (Total 512 GB)|
|Storage Configuration I||7 x Intel® SSD Data Center (Total 14 TB)||8 x Intel® SSD Data Center (Total 32 TB)||8 x Intel® SSD Data Center (Total 32 TB)|
|Storage Configuration II||NA||8 x Western Digital* 6 TB SAS HDD 3.5″ (Total 48 TB)|
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance/datacenter.
2 Assuming 3-year life cycle of the appliance, a $522K initial cost for 24 node configuration, 5 whole genomes per day per node analyzed and 70% utilization of the system.
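For readers who want to check the arithmetic behind the $5.68 figure, the assumptions in footnote 2 can be plugged into a short calculation. The sketch below is illustrative only: the function and constant names are hypothetical, and the 365-day year is an assumption, since the footnote does not state how days are counted. It considers only the initial cost given in the footnote and nothing else, such as power or staffing.

```python
# Inputs taken from footnote 2. Names are illustrative, not from Intel.
INITIAL_COST_USD = 522_000        # initial cost of the 24-node configuration
NODES = 24
GENOMES_PER_NODE_PER_DAY = 5
UTILIZATION = 0.70                # 70% utilization of the system
LIFETIME_YEARS = 3
DAYS_PER_YEAR = 365               # assumption: not stated in the footnote


def cost_per_genome(
    initial_cost=INITIAL_COST_USD,
    nodes=NODES,
    genomes_per_node_per_day=GENOMES_PER_NODE_PER_DAY,
    utilization=UTILIZATION,
    years=LIFETIME_YEARS,
    days_per_year=DAYS_PER_YEAR,
):
    """Spread the initial cost over every genome analyzed in the life cycle."""
    peak_genomes = nodes * genomes_per_node_per_day * days_per_year * years
    genomes_analyzed = peak_genomes * utilization
    return initial_cost / genomes_analyzed


if __name__ == "__main__":
    print(f"Cost per whole genome: ${cost_per_genome():.2f}")
```

Under these assumptions the system analyzes roughly 92,000 genomes over three years, and the result rounds to the $5.68 quoted above. Changing the utilization or life cycle arguments shows how sensitive the per-genome cost is to those inputs.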