Analysis of Data Based on the 2018 NAACCR Submission

Exclusions Across All Cancer Sites to Remove Obvious Aberrant Data

To produce stable estimates, the data are carefully examined before the model is fitted. Some data are apparent outliers and are removed (e.g., a single submission with a sudden spike in cases followed by a decline in the next submission). These data are removed because the purpose of delay modeling is to project future case counts from the most current submission, and an aberrant submission is unlikely to recur.

Registries Included in the Analysis

The analysis of the 2018 NAACCR submission covers 56 US registries and 13 Canadian registries. The 56 US registries comprise 9 sub-state registries (from 4 states that are divided into sub-state registries), 46 state registries, and the District of Columbia. With US and Canadian registries combined, there are 69 registries. In total, 3 US registries and 4 Canadian registries are excluded from the analysis.

No SEER registries are excluded based on these requirements.

Cancer Sites and Races/Ethnicity Included in Analysis

Cancer Sites

To create a stable delay-adjustment model, the submissions must contain a sufficient number of cases to analyze the delay pattern for a cancer site or race group. Delay-adjusted incidence rates and trends are reported for 24 cancer sites. The modeling includes malignant cases only, except where noted:

  • All cancers combined (malignant only except for urinary bladder)
  • Brain and other nervous system
  • Breast (female cases only)
  • Cervix uteri
  • Colon and rectum
  • Corpus and uterus
  • Esophagus
  • Hodgkin lymphoma
  • Kidney and renal pelvis
  • Larynx
  • Leukemia
  • Liver and intrahepatic bile duct
  • Lung and bronchus
  • Melanoma of the skin
  • Myeloma
  • Non-Hodgkin lymphoma
  • Oral cavity and pharynx
  • Ovary
  • Pancreas
  • Prostate
  • Stomach
  • Testis
  • Thyroid
  • Urinary bladder (in situ and malignant)

Races/Ethnicity in the Model

Prior to the 2017 release, delay adjustment factors for US registries were developed for all races, Whites, Blacks, and Asian/Pacific Islanders (API). Starting with the 2017 release, delay adjustment factors were developed for all races, Whites, Blacks, API, American Indians/Alaska Natives (AI/AN), Hispanics, Non-Hispanics, White Hispanics, Black Hispanics, White Non-Hispanics, and Black Non-Hispanics. Canadian registries carry no racial or ethnic designation. For all groups except AI/AN, the modeling is done by registry. For AI/AN, because the data are sparse, the modeling is done at the US level and includes only Purchased/Referred Care Delivery Areas (PRCDA, formerly CHSDA). For all cancer sites combined and for colorectal, prostate, female breast, and lung cancer, the AI/AN modeling is also done by CHSDA region. For all other cancer sites, because the data are sparse, the modeling is done for all PRCDA counties in the US combined.

Table 3. Up to eight factors are developed for each tumor.
                      | All Races | Race (White, Black, API, AI/AN-CHSDA) | Ethnicity (Hispanic, Non-Hispanic) | Race x Ethnicity (White NH, White H, Black NH, Black H)
All sites             |           |                                       |                                    |
Specific Cancer Sites |           |                                       |                                    |

Table 3 shows that there are up to eight factors for each tumor. For example, a White, non-Hispanic case has the following eight delay factors (each x marks one factor).

                      | All Races | White | Non-Hispanic | WNH
All sites             | x         | x     | x            | x
Specific Cancer Sites | x         | x     | x            | x

Because API and AI/AN are not coded for Race x Ethnicity, API and AI/AN cases have only six factors. For example, a Hispanic API case has the following six factors.

                      | All Races | API | Hispanic
All sites             | x         | x   | x
Specific Cancer Sites | x         | x   | x
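
The counting rule behind these two examples can be written out as a short sketch. The function name and the string labels below are illustrative assumptions, not part of the NAACCR software; the sketch only enumerates which factors apply to a case, it does not estimate them.

```python
# Hypothetical helper: names and labels are illustrative only.
# Per the text, only White and Black cases carry a Race x Ethnicity factor;
# API and AI/AN are not coded for Race x Ethnicity.
RACE_X_ETHNICITY_RACES = {"White", "Black"}

def delay_factor_strata(race, ethnicity):
    """Return the (site level, population group) pairs that apply to one case."""
    groups = ["All Races", race, ethnicity]
    if race in RACE_X_ETHNICITY_RACES:
        groups.append(f"{race} {ethnicity}")
    site_levels = ["All sites", "Specific cancer site"]
    return [(site, group) for site in site_levels for group in groups]

white_nh = delay_factor_strata("White", "Non-Hispanic")
api_hispanic = delay_factor_strata("API", "Hispanic")
print(len(white_nh), len(api_hispanic))  # prints: 8 6
```

Two site levels times four population groups gives the eight factors of the White Non-Hispanic example; dropping the Race x Ethnicity group leaves the six factors of the API example.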

Determining if Each Registry/Cancer Site Combination Has Sufficient Counts to be Modeled Individually

Because the purpose of the NAACCR delay modeling is ultimately to produce registry-specific delay factors, models were run for each registry separately to the extent possible. However, the modeling can be difficult if there are fewer than 50 cases per year for a particular registry and cancer site combination.

If $N_{ijkl}$ represents the number of cases submitted in submission year $i$ for diagnosis year $j$, cancer site $k$, and registry $l$, we computed the average $\bar{N}_{\cdot\cdot kl}$ for each cancer site and registry by averaging over all submission years $i$ and diagnosis years $j$.

If $\bar{N}_{\cdot\cdot kl} \geq 50$, the modeling was done with data from that registry alone.

If $\bar{N}_{\cdot\cdot kl} < 50$, the delay model for that cancer site and registry was estimated from a group of registries that had similar delay times; that is, a composite factor was used for this cancer site/registry combination.
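
As a rough illustration of this rule, the sketch below averages counts over every submission year and diagnosis year cell for each cancer site and registry, then applies the threshold of 50. The record layout, function names, and the counts in the example are assumptions made for illustration; they are not taken from the NAACCR data or software.

```python
from collections import defaultdict

THRESHOLD = 50  # average cases per submission year and diagnosis year (from the text)

def average_counts(records):
    """records: iterable of (submission_year, diagnosis_year, site, registry, count).

    Returns {(site, registry): mean count over all submission/diagnosis-year cells}.
    """
    totals = defaultdict(float)
    cells = defaultdict(int)
    for _submission_year, _diagnosis_year, site, registry, count in records:
        totals[(site, registry)] += count
        cells[(site, registry)] += 1
    return {key: totals[key] / cells[key] for key in totals}

def modeling_plan(records, threshold=THRESHOLD):
    """Label each (site, registry) as modeled alone or with its registry group."""
    return {
        key: "registry" if average >= threshold else "group"
        for key, average in average_counts(records).items()
    }

# Made-up counts, for illustration only.
records = [
    (2017, 2014, "Testis", "Registry A", 40),
    (2018, 2014, "Testis", "Registry A", 44),
    (2017, 2014, "Testis", "Registry B", 120),
    (2018, 2014, "Testis", "Registry B", 130),
]
plan = modeling_plan(records)
```

In this made-up example, Registry A averages 42 cases and falls back to its registry group, while Registry B averages 125 and is modeled on its own data.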

Table 4 lists the number of registries with fewer than 50 cases per submission and diagnosis year for various cancer sites.

To determine the groups, for each cancer site (and for all sites combined) we computed an approximate registry-specific estimate of reporting delay. This estimate is the ratio of the count in the first submission to the count in the fifth submission, averaged over all diagnosis years for which both a first and a fifth submission were present. All registries were ranked on this measure of five-year completeness and, based on the ranking, were divided into three groups with approximately equal population size in each group. If a registry/cancer site combination had an average of <50 cases per year, its delay estimates were derived from the group of registries to which it belongs.
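
The grouping step can be sketched as follows. The text does not say exactly how the ranked list is cut into three groups of approximately equal population, so the cut rule here (assigning each registry by the midpoint of its cumulative population share) is an assumption, as are the function names, the data layout keyed by submission number, and the example values.

```python
def completeness_ratio(counts):
    """counts: {diagnosis_year: {submission_number: count}} for one registry and site.

    Averages (first submission count / fifth submission count) over the
    diagnosis years that have both a first and a fifth submission.
    """
    ratios = [
        by_submission[1] / by_submission[5]
        for by_submission in counts.values()
        if 1 in by_submission and 5 in by_submission and by_submission[5] > 0
    ]
    return sum(ratios) / len(ratios) if ratios else None

def split_into_groups(registries, n_groups=3):
    """registries: list of (name, ratio, population).

    Ranks registries by ratio, then assigns group 0..n_groups-1 so that each
    group holds roughly an equal share of the total population (assumed rule).
    """
    ranked = sorted(registries, key=lambda registry: registry[1])
    total = sum(registry[2] for registry in ranked)
    groups, cumulative = {}, 0
    for name, _ratio, population in ranked:
        midpoint = cumulative + population / 2
        groups[name] = min(int(midpoint * n_groups / total), n_groups - 1)
        cumulative += population
    return groups

# Made-up values, for illustration only.
example_counts = {2010: {1: 90, 5: 100}, 2011: {1: 80, 5: 100}, 2012: {1: 95}}
example_registries = [
    ("A", 0.90, 10), ("B", 0.95, 10), ("C", 0.80, 10),
    ("D", 0.85, 10), ("E", 0.99, 10), ("F", 0.70, 10),
]
example_groups = split_into_groups(example_registries)
```

Diagnosis year 2012 in the example has no fifth submission, so it is left out of the average, matching the restriction described above.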

Table 4. Number of registries with <50 cases per year for different cancer sites.
Sites | # of registries with <50 cases/year
All sites | 0
Female Breast, Colon and Rectum, Lung and Bronchus, Prostate | 1
Corpus & Uterus, Kidney & Renal, Leukemia, Melanoma, Non-Hodgkin Lymphoma, Oral Cavity and Pharynx, Pancreas, Thyroid, Urinary Bladder | 2
Stomach | 6
Brain & Other Nervous System, Myeloma | 7
Ovary | 8
Esophagus, Liver & Intrahepatic Bile Duct | 11
Cervix, Larynx | 18
Hodgkin Lymphoma | 21
Testis | 22

Model Fitting

A multivariate ANOVA model is fitted, and backward selection based on p-values is applied to choose the covariates. The dependent variable is the log of the ratios defined above. The final model retains only covariates with p-values < 0.05, so different covariates may remain in the model for different ratios. The fitted values are the estimated ratios defined above, and the estimated delay factors are then calculated from these estimated ratios.
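
A minimal sketch of p-value-based backward selection is given below. It assumes the common variant in which the least significant covariate is dropped one at a time and the model is refitted after each removal; the text does not state the exact procedure. The `fit_pvalues` callback, the covariate names, and the p-values in the demo are all hypothetical stand-ins for a real model fit on the log ratios.

```python
def backward_select(covariates, fit_pvalues, alpha=0.05):
    """Drop the least significant covariate and refit until all p-values < alpha.

    fit_pvalues(selected) is a caller-supplied (hypothetical) function that fits
    the model on the selected covariates and returns {covariate_name: p_value}.
    """
    selected = list(covariates)
    while selected:
        pvalues = fit_pvalues(selected)
        worst = max(selected, key=lambda name: pvalues[name])
        if pvalues[worst] < alpha:
            break  # every remaining covariate is significant
        selected.remove(worst)
    return selected

def demo_fit(selected):
    """Stand-in for a real fit: returns fixed, made-up p-values.

    In a real fit the p-values would change each time a covariate is dropped.
    """
    made_up = {"registry": 0.001, "diagnosis_year": 0.20, "submission_year": 0.03}
    return {name: made_up[name] for name in selected}

kept = backward_select(["registry", "diagnosis_year", "submission_year"], demo_fit)
```

Because the selection runs separately for each ratio, the set of retained covariates can differ from one ratio to the next, as noted above.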