Cancer Research Training Award Fellow Sought for Natural Language Processing (NLP) and Deep Learning in Cancer Surveillance


The Surveillance Research Program (SRP) directs the collection and analysis of cancer surveillance data to answer key questions about cancer. As part of its mission, SRP manages the Surveillance, Epidemiology, and End Results (SEER) Program, an integrated, comprehensive population-based cancer reporting system. A major challenge in advancing scientific discoveries in cancer research is that a large portion of the relevant clinical information about patients is contained in free text formats which do not lend themselves to ready analysis and the creation of datasets in support of cancer research.

Traditionally, cancer surveillance data have been interpreted, extracted, and entered into data systems manually by trained cancer registrars. With the increasing complexity of cancer diagnosis and treatment across hospitals and outpatient facilities, it is no longer feasible for cancer registrars to access, identify, and extract all the data manually. Therefore, SRP is working to develop and apply NLP and deep learning methods and systems to automate manual information processing and to maximize the value of free text documents such as electronic pathology reports and radiologic dictations.

Currently, the SEER Program collects data on approximately 450,000 new cancer cases annually from states and regions that represent 28% of the US population. Approximately 80% of those new cancer cases have associated unstructured electronic pathology reports. While the registries currently use these reports extensively through keyword and rule-based automation combined with manual processes, the ability to automatically extract and report relevant components from these reports is necessary as the SEER Program moves into the future. Automating or further augmenting the extraction of structured data from these documents would enhance the existing SEER data while simultaneously increasing the consistency of cancer reporting and reducing the big data burden on cancer registries.

To address this problem, NCI and SEER registries have partnered with Department of Energy (DOE) laboratories to apply national supercomputing resources and technical expertise in deep learning, uncertainty quantification, graph analytics, data visualization, modeling and simulation.

Position Description

The Cancer Research Training Award (CRTA) fellow will function as an integral member of the Surveillance Informatics Branch in SRP. He/she will work with staff on projects related to the mission of SRP, especially planning, reviewing and contributing to the development of tools and methods to more efficiently capture detailed information from clinical documents and to the integration of NLP and deep learning tools within cancer registry workflows. He/she may also work on related projects such as evaluation of de-identification tools and methods. Day-to-day activities of this full-time position include, but are not limited to:

  • Project management tasks such as organizing meetings, workshops, and teleconferences related to the overall objective of developing and testing NLP and deep learning methods for capture of data from unstructured text that would enhance cancer surveillance;
  • Planning and implementing annotation of clinical documents, and reviewing and validating the output of NLP and deep learning models;
  • Writing reports and presenting findings;
  • Working with contractors and others internal and external to SRP and the SEER Program to develop and test methods or tools for capturing and extracting key information;
  • Performing literature searches, conducting systematic reviews of databases and web sources;
  • Leading or assisting with collaborative research projects related to NLP and machine learning; and
  • Participating in staff meetings and attending lectures or other training opportunities sponsored by the National Institutes of Health.

CRTA Fellows may have the opportunity to travel for the National Cancer Institute to national conferences. This fellowship provides an excellent opportunity for a recent graduate potentially interested in pursuing further education leading to a medical informatics or research career.

Qualifications


  • Master's-level degree or PhD in computational linguistics, medical informatics, or related field.
  • A strong interest in medical informatics, health care and electronic health data and/or cancer surveillance methodologies.
  • Experience reviewing, analyzing, and summarizing scientific literature.
  • Excellent attention to detail and interpersonal, organizational, writing, and project management skills.
  • The ability to work independently and as a team member.
  • Skills and experience in data management with text files, SQL and/or NoSQL databases.
  • Skills and experience in programming and scripting languages and computing environments for data science (e.g. Python, R).
  • Knowledge of and experience with NLP and machine learning tools and methods.

Application Requirements

To be considered for this position, please submit your resume/CV and cover letter to Trish Murphy by the application deadline. In the cover letter, provide an explanation of your interest in the fellowship program, explain your professional development goals and research interests, and describe your experience or interest in computational science and biomedical, behavioral, or population science.

Once selected for the fellowship, you will need to submit the following additional materials:

  • Two letters of recommendation.
  • Proof of U.S. Citizenship or resident alien status (e.g. photocopy of birth certificate or passport).
  • Official Transcripts and/or Proof of Academic Good Standing - Send transcript of highest degree conferred. If currently enrolled in school, Proof of Academic Good Standing must be sent on official letterhead and signed by graduate program director, advisor, or equivalent.

Note: In order to qualify for the position, the candidate must be a U.S. citizen or resident alien. A candidate with an I-551 stamp in their passport can also qualify since this is temporary verification of permanent residency status pending issuance of the green card. Individuals with "Employment Authorization" documents (EADs) do not meet eligibility criteria.

Application materials may be submitted via email to Trish Murphy. Paper copies and Official Transcripts and/or Proof of Academic Good Standing must be mailed to:

Ms. Trish Murphy
Surveillance Research Program
9609 Medical Center Drive MSC 9765 Room 4E516
Bethesda, MD 20892 (U.S. Mail)
Rockville, MD 20850 (Courier Service)
Tel. 240-276-6903
Fax: 240-276-7908

The National Cancer Institute is an Equal Opportunity Employer.


The Fellowship Mentor for this position will be:

Paul Fearn, Ph.D., MBA
Chief, Surveillance Informatics Branch
Surveillance Research Program, National Cancer Institute
National Institutes of Health
9609 Medical Center Drive MSC 9765
Bethesda, MD 20892 (U.S. Mail)
Rockville, MD 20850 (Courier Service)

Approximate Start Date

As soon as possible.

Stipend and Benefits

The stipends for CRTA Fellows are adjusted yearly and are commensurate with academic achievement and relevant experience. More information is available online. Benefits include health insurance at no cost and a wide range of career development and social activities. The office is located in commuter-friendly Rockville, Maryland, close to the bustling metropolis of Bethesda, and near downtown DC.