Toward Precision Cancer Surveillance


Acquiring diagnostic, treatment, and outcomes information on cancer cases for population-based cancer surveillance currently requires a tremendous amount of manual data abstraction and information processing by expert staff. An estimated 65% of the clinical data elements needed to characterize cancer patients come from unstructured sources (e.g., pathology reports, radiology notes, treatment summaries, clinical visit notes). Many hospital-based cancer registries, which abstract and report cancer cases to central cancer registries at the state level, rely on manual data abstraction from document-based medical records. Central cancer registry staff also process data manually to find additional cases, consolidate records, and fix data errors and gaps. Manual processes inherently limit the volume and types of information that registries can collect. Furthermore, with the increasing complexity of cancer care, staff may not have the resources to...

natural language processing, NLP, computational linguistics, cancer surveillance, data abstraction