Natural Language Processing (NLP)

SRP is developing Natural Language Processing (NLP) tools and methods to improve the efficiency and quality of data abstraction and processing for cancer registries, and to enable the acquisition of more detailed clinical data that may not currently be reported. NLP tools process free-text documentation, such as pathology reports, radiology reports, and oncology clinical notes, and extract information from it. NLP engineers and data scientists train these algorithms to perform tasks such as information extraction, de-identification, and classification.
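As a rough illustration of the three tasks named above, the following Python sketch applies hand-written rules to a single made-up pathology report. It is not how the SRP tools work: those are trained models, while this sketch substitutes simple regular expressions and keyword matching. All function names, patterns, term lists, and the sample text are assumptions introduced here for illustration.

```python
import re
from typing import Optional

# Hypothetical patterns and terms, chosen only for this illustration.
LATERALITY_PATTERN = re.compile(r"\b(left|right|bilateral)\b", re.IGNORECASE)
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
MRN_PATTERN = re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE)
CANCER_TERMS = ("carcinoma", "melanoma", "lymphoma", "sarcoma")


def extract_laterality(report: str) -> Optional[str]:
    """Information extraction: return the first laterality word found, if any."""
    match = LATERALITY_PATTERN.search(report)
    return match.group(1).lower() if match else None


def deidentify(report: str) -> str:
    """De-identification: mask dates and record numbers with placeholders."""
    masked = DATE_PATTERN.sub("[DATE]", report)
    return MRN_PATTERN.sub("[MRN]", masked)


def mentions_cancer_term(report: str) -> bool:
    """Classification: flag reports that contain any listed cancer term."""
    text = report.lower()
    return any(term in text for term in CANCER_TERMS)


if __name__ == "__main__":
    sample = (
        "MRN: 123456. Collected 01/02/2019. "
        "Left breast, core biopsy: invasive ductal carcinoma."
    )
    print(extract_laterality(sample))
    print(deidentify(sample))
    print(mentions_cancer_term(sample))
```

Rules like these break down quickly on real reports. For example, keyword matching would also flag "no evidence of carcinoma", which is one reason trained models are used for these tasks instead.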

SRP is collaborating with four labs at the Department of Energy (DOE) to use high-performance computing in support of more advanced NLP and deep learning tools. The collaboration has developed tools that semi-automatically abstract site (including subsite), histology, laterality, behavior, and grade from pathology reports, with the aim of reducing the manual coding burden on cancer registrars. The partnership is also using datasets to train algorithms that sort pathology reports by reportability. Other initiatives that apply NLP methods and tools include large-scale quality assessments of SEER data and de-identification of the data.

Last Updated: 02 Jan, 2020