NCI is collaborating with the U.S. Department of Energy (DOE) as part of the inter-agency coordination activities defined in the National Strategic Computing Initiative (NSCI) Executive Order (July 29, 2015) and announced during Vice President Biden’s Cancer Moonshot Summit on June 29, 2016. The NCI-DOE collaboration has initiated three pilot efforts that will simultaneously impact the future of cancer research and guide future advances in scientific computing. Over the next three years, these pilots will characterize and help overcome key precision oncology challenges at the molecular, patient, and population levels.
The NCI Surveillance, Epidemiology, and End Results (SEER) Program and DOE’s Oak Ridge National Laboratory (ORNL) co-lead the population-level pilot (Pilot 3). Three additional DOE laboratories, Argonne National Laboratory (ANL), Los Alamos National Laboratory (LANL), and Lawrence Livermore National Laboratory (LLNL), participate in the effort. This pilot will address the growing challenges that cancer surveillance faces in capturing the information essential for understanding the effectiveness of cancer diagnosis and treatment in the context of our complex medical and social environment. The computational strengths and deep learning expertise of the DOE laboratories, together with the SEER staff’s content expertise in cancer registries, will be leveraged to rapidly move the surveillance community forward. The objective of Pilot 3 is to deliver an infrastructure that supports the development of algorithms and informatics tools for a comprehensive, scalable, and cost-effective national cancer surveillance program. Such a program would enhance the existing system while expanding the breadth of data captured to integrate biological, social, psychological, and ecological variables into models of cancer outcomes.
The aims of Pilot 3 are:
- Aim 1: Development of scalable natural language processing (NLP) and machine learning tools for deep text comprehension of unstructured clinical text to enable accurate, automated capture of reportable cancer surveillance data elements. Aim 1 will build the foundational infrastructure and models necessary for clinical document selection, annotation, and the development, validation, and iterative improvement of algorithms.
- Aim 2: Exascale linkage and scalable analytics of heterogeneous cancer surveillance data (e.g., claims, pharmacy, electronic health records (EHR), images), including novel data such as lifetime human exposure (exposome) data, to discover patterns and understand drivers of cancer outcomes. Aim 2 will provide robust infrastructure and processes for defining, acquiring, linking, storing, and processing heterogeneous datasets to support expanded cancer surveillance, and will develop scalable graph, visual, and in-memory methods and tools for exploring heterogeneous data.
- Aim 3: Development of a data-driven modeling and simulation paradigm for predictive modeling of patient-specific health trajectories. These models will enable in silico, large-scale evaluation and recommendation of precision cancer therapies and prediction of their impact in the real world. Aim 3 will build on the informatics tools developed in Aims 1 and 2 to focus on modeling recurrence or progression and will begin to predict response to initial treatment (outcome: recurrence) or subsequent treatment (outcome: survival).
As a part of the collaboration, the Pilot 3 management team is establishing a community of academic and commercial stakeholders to engage in this pilot. These stakeholders are interested in machine learning and NLP to support cancer surveillance.
NCI’s Pilot 3 management team members (all in the Surveillance Research Program) are Lynne Penberthy, Paul Fearn, Jessica Boten, Donna Rivera, Marina Matatova, and Steve Friedman. The DOE management team includes Gina Tourassi, Joe Lake, and Gilbert Weigand (ORNL), Tom Brettin (ANL), Ana Paula Sales (LLNL), and Tanmoy Bhattacharya (LANL).
Stay tuned for future blog posts that will explain NCI-DOE collaborations with key stakeholders and provide updates on pilot initiatives.