Toward Precision Cancer Surveillance


An integrated team from NCI’s Surveillance, Epidemiology, and End Results (SEER) Program, four Department of Energy (DOE) laboratories (Oak Ridge National Laboratory [ORNL], Lawrence Livermore National Laboratory, Los Alamos National Laboratory, and Argonne National Laboratory), Information Management Systems (IMS), and four SEER registries met on March 28–30, 2017, to continue their work on the NCI-DOE Pilot 3 collaboration. This partnership will enhance cancer research by combining the DOE’s expertise in high-performance computing with SEER’s expertise in cancer surveillance. The meeting focused on the progress made on Aims 1 and 2 of the pilot as well as future goals for Aim 3.

The goal of Aim 1 is to create natural language processing (NLP) and machine learning tools that can accurately capture information from unstructured clinical text for expanded cancer surveillance data reporting. The collaboration team has completed development of a Clinical Document Annotation and Processing (CDAP) pipeline. This...

collaboration, DOE, surveillance data, CDAP, NLP

Acquiring diagnostic, treatment, and outcomes information on cancer cases for population-based cancer surveillance currently involves a tremendous amount of manual data abstraction and information processing by expert staff. A majority (an estimated 65%) of the clinical data elements needed to characterize cancer patients come from unstructured sources (e.g., pathology reports, radiology notes, treatment summaries, clinical visit notes). Many hospital-based cancer registries that abstract and report cancer cases to state-level central cancer registries rely on manual data abstraction from document-based medical records. Central cancer registry staff also perform manual data processing to find additional cases, consolidate records, and fix data errors and gaps. Manual processes impose inherent limitations on the volume and types of information that registries can collect. Furthermore, with the increasing complexity of cancer care, staff may not have the resources to...

natural language processing, NLP, computational linguistics, cancer surveillance, data abstraction