Cancer Research Training Award Fellow Sought for Virtual Tissue Repository and Natural Language Processing Pilot Projects


The Surveillance Research Program (SRP) directs the collection and analysis of cancer surveillance data to answer key questions about cancer. As part of its mission, SRP manages the Surveillance, Epidemiology, and End Results (SEER) Program, an integrated, comprehensive, multiple population-based cancer reporting system that collects data on approximately 500,000 cancer cases annually and represents approximately 35% of the US population.

Population-based biospecimens are critical to modern cancer research, including the development and testing of novel predictive biomarkers to direct therapeutic decision making, prognostic biomarkers for understanding outcome risks, the identification of novel cancer subtypes, and the testing of molecular-level hypotheses in population subgroups. Leveraging its cancer registries across the United States, the SEER Program is initiating a series of activities to assist investigators in locating and accessing biospecimens for research purposes on a population scale.

In addition, cancer surveillance data have traditionally been manually interpreted, extracted, and entered by trained cancer registrars. With the increasing complexity of cancer diagnosis and treatment and the expanding use of electronic medical records among hospitals and outpatient facilities, it is no longer feasible for cancer registrars to access, identify, and extract the data manually. Therefore, SRP is working to develop and apply methods and systems for natural language processing (NLP) and machine learning (e.g. deep learning, active learning) to automate and facilitate information processing that is currently manual, and to maximize the value of free-text documents such as electronic pathology reports and radiologic dictations. Approximately 80% of incident cases ascertained through SEER have associated unstructured electronic pathology (e-path) reports. Several NLP projects are underway using data and reports obtained through SEER registries, including text mining to automatically extract and code the Breslow measurements of skin melanoma patients, the circumferential resection margin of rectal tumors, and the HPV status of patients with oral cancer. Additional work is being conducted to develop algorithms for detecting metastases in pathology and diagnostic imaging reports.

Position Description

The Cancer Research Training Award (CRTA) fellow will function within the Data Quality, Analysis and Interpretation Branch (DQAIB) as an integral member of the Virtual Tissue Repository (VTR) team and other teams (Quality Audits, Recurrence) working on NLP projects. As part of the VTR Team, he/she will work with staff on projects related to the mission of DQAIB, particularly as that mission relates to projects on genomics and biospecimens by maintaining and building information databases; evaluating information from specimen tracking; integrating information from submitted clinicopathologic data and treatment information, automated digital image analysis obtained from whole slide images, and clinical outcomes; and being a key member of the analytical team. He/she will also function as an integral member of teams working on NLP projects within SRP by analyzing extracted data and identifying tools and methods to more efficiently capture detailed information from currently collected electronic reports and planning for the integration of other key electronic data sources, such as radiology dictations, pathology reports, and digital whole slide images.

Day-to-day activities of this full-time position include, but are not limited to:

  • Organizing and collaborating on pilot studies of components of a national network for biospecimen acquisition with NCI, academic investigators, and contractors as well as regular interactions with SEER registry personnel;
  • Organizing meetings, workshops, and teleconferences related to the overall objective of developing and testing informatics methods for capture of data from unstructured text that would enhance cancer surveillance;
  • Analyzing data extracted from pathology and radiology reports utilizing NLP algorithms;
  • Writing scripts, algorithms, interfaces, or functions to support NLP and machine learning, and assisting with implementation in applied projects within SRP;
  • Working with contractors and others internal and external to SRP and the SEER Program to develop and test methods or tools for capturing and extracting key information;
  • Assisting with collaborative research projects related to medical informatics and/or computational linguistics;
  • Performing ongoing literature reviews and soliciting information and recommendations from ethics experts within and external to the NCI to develop best practices for managing patient privacy issues;
  • Assisting in synthesizing the results from targeted pilots to develop a production level implementation plan for developing a large-scale virtual biorepository;
  • Gathering, analyzing, and synthesizing information through literature reviews, surveys, and interviews, and presenting it in written reports and oral presentations; and
  • Participating in staff meetings and attending lectures or other training opportunities sponsored by the National Institutes of Health.

Qualifications

  • Doctorate or Master's-level degree in public health, epidemiology, genomics, bioinformatics, medical informatics or statistics.
  • A strong interest in medical informatics, electronic health data and/or cancer surveillance methodologies, biospecimen research, epidemiology, population-based research, and/or cancer control.
  • Skills and experience in data management with text files, SQL and/or NoSQL databases.
  • Skills and experience in programming and scripting languages (e.g. Python, Java).
  • Knowledge of and experience with NLP and machine learning tools and methods.
  • Experience reviewing, analyzing, and summarizing scientific literature.
  • Excellent attention to detail and interpersonal, organizational, writing, and project management skills.
  • The ability to work independently and as a team member.

CRTA Fellows may have the opportunity to travel for the National Cancer Institute to national conferences. This fellowship provides an excellent opportunity for a recent graduate potentially interested in pursuing further education leading to a research or medical career.

Application Requirements

To be considered for this position, please submit your resume/CV and cover letter to Trish Murphy by the application deadline. In the cover letter, provide an explanation of your interest in the fellowship program, explain your professional development goals and research interests, and describe your experience or interest in quantitative/computational science and/or population science.

In addition to your resume/CV and cover letter, you will need to submit the following additional materials:

  • Two letters of recommendation.
  • Proof of U.S. citizenship or resident alien status (e.g. photocopy of birth certificate or passport).
  • Official Transcripts and/or Proof of Academic Good Standing - Send transcripts for all degrees conferred. If you are currently enrolled in the last semester of your program, Proof of Academic Good Standing must be sent on official letterhead and signed by the graduate program director, advisor, or equivalent.

Note: In order to qualify for the position, the candidate must be a U.S. citizen or resident alien. A candidate with an I-551 stamp in their passport can also qualify since this is temporary verification of permanent residency status pending issuance of the green card. Individuals with "Employment Authorization" documents (EADs) do not meet eligibility criteria.

Application materials may be submitted via email to Trish Murphy; paper copies and Official Transcripts and/or Proof of Academic Good Standing must be mailed to:

Ms. Trish Murphy
Surveillance Research Program
9609 Medical Center Drive MSC 9765 Room 4E516
Bethesda, MD 20892 (U.S. Mail)
Rockville, MD 20850 (Courier Service)
Tel. 240-276-6903
Fax: 240-276-7908

The National Cancer Institute is an Equal Opportunity Employer.

Approximate Start Date

March – June, 2020

Stipend and Benefits

The stipends for CRTA Fellows are adjusted yearly, and are commensurate with academic achievement and relevant experience. More information is available online at the Behavioral Research Program website. Benefits include health insurance at no cost and a wide range of career development and social activities. The office is located in commuter-friendly Rockville, Maryland, close to the bustling metropolis of Bethesda, and near downtown DC.

Last Updated: 25 Feb, 2020