Toward Precision Cancer Surveillance

NLP

Feb 6

The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute (NCI) is collaborating with the Department of Energy (DOE) on a 5-year pilot project that uses high-performance computing to support cancer surveillance. Pilot 3 of the NCI-DOE Collaboration applies advanced computational capabilities and deep learning methods to population-based cancer data to understand the impact of new diagnostics, treatments, and other factors affecting patient outcomes.

Members and stakeholders of the collaboration came together for a two-day hackathon at DOE’s Oak Ridge National Laboratory near Knoxville, Tennessee, on September 10-11, 2019. The hackathon included a hands-on review of algorithms to improve the efficiency of cancer registries and discussions of next steps for implementing the new tools in the registry workflow. The event also featured in-depth discussions of project focus areas, including privacy-aware computing, model...

collaboration, algorithm, DOE, high-performance computing, hackathon, tools, recurrence, data capture, NLP, clinical trials
Jun 5

An integrated team met on March 28–30, 2017, to continue work on the NCI-DOE Pilot 3 collaboration. The team included staff from NCI’s Surveillance, Epidemiology, and End Results (SEER) Program, Information Management Services (IMS), four SEER registries, and four Department of Energy (DOE) laboratories: Oak Ridge National Laboratory (ORNL), Lawrence Livermore National Laboratory, Los Alamos National Laboratory, and Argonne National Laboratory. The partnership aims to enhance cancer research by combining DOE’s expertise in high-performance computing with SEER’s expertise in cancer surveillance. The meeting focused on progress in Aims 1 and 2 of the pilot and on future goals for Aim 3.

The goal of Aim 1 is to create natural language processing (NLP) and machine learning tools that can accurately capture information from unstructured clinical text to expand cancer surveillance data reporting. The collaboration team has completed development of a Clinical Document Annotation and Processing (CDAP) pipeline. This...

collaboration, DOE, surveillance data, CDAP, NLP
Dec 5

Acquiring diagnostic, treatment, and outcomes information on cancer cases for population-based cancer surveillance currently requires a great deal of manual data abstraction and information processing by expert staff. An estimated 65% of the clinical data elements needed to characterize cancer patients come from unstructured sources (e.g., pathology reports, radiology notes, treatment summaries, and clinical visit notes). Many hospital-based cancer registries that abstract and report cancer cases to state-level central cancer registries rely on manual abstraction from document-based medical records. Central cancer registry staff also process data manually to find additional cases, consolidate records, correct errors, and fill gaps. Manual processes inherently limit the volume and types of information that registries can collect. Furthermore, with the increasing complexity of cancer care, staff may not have the resources to...

natural language processing, NLP, computational linguistics, cancer surveillance, data abstraction