Analysis of Data Based on the 2024 NAACCR Submission

Exclusions across All Cancer Sites to Remove Obviously Aberrant Data

To produce stable estimates, the data are carefully examined before the model is fitted. Apparent outliers are removed (e.g., a single submission with a sudden spike in cases followed by a decline in the next submission). These data are removed because the purpose of delay modeling is to project future case counts from the most current submission, and an aberrant submission is not likely to recur.

Registries Included in the Analysis

In the analysis of the 2024 NAACCR submission, there are 57 U.S. registries and 13 Canadian registries, for a total of 70 registries. The 57 U.S. registries consist of 9 sub-state registries (from 4 states that are divided into sub-state registries), 46 state registries, the District of Columbia, and Puerto Rico. In total, 1 U.S. registry and 4 Canadian registries are excluded from the analysis.

No SEER registries are excluded based on these requirements.

Cancer Sites and Races/Ethnicity Included in Analysis

Cancer Sites

To create a stable delay-adjustment model, the submissions must contain a sufficient number of cases to analyze the delay pattern for a cancer site or race group. Delay-adjusted incidence rates and trends are reported for 24 cancer sites. The modeling includes malignant cases only, except where noted:

All cancers combined (malignant only except for urinary bladder)
Brain and other nervous system
Breast (female cases only)
Cervix uteri
Colon and rectum
Corpus and uterus
Esophagus
Hodgkin lymphoma
Kidney and renal pelvis
Larynx
Leukemia
Liver and intrahepatic bile duct
Lung and bronchus
Melanoma of the skin
Myeloma
Non-Hodgkin lymphoma
Oral cavity and pharynx
Ovary
Pancreas
Prostate
Stomach
Testis
Thyroid
Urinary bladder (in situ and malignant)

Races/Ethnicity in the Model

Prior to the 2017 release, delay adjustment factors for U.S. registries were developed for all races, Whites, Blacks, and API. From the 2017 release to the 2023 release, the delay adjustment factors were developed for all races, Whites, Blacks, API, AI/AN, Hispanics, non-Hispanics, White Hispanics, Black Hispanics, White non-Hispanics, and Black non-Hispanics. Starting with the 2024 release, the delay adjustment factors are developed for all races, Whites, Blacks, Hispanics, White non-Hispanics, Black non-Hispanics, API non-Hispanics, and AI/AN non-Hispanics. Canadian registries carry no racial or ethnic designation.

For all groups except AI/AN non-Hispanics, the modeling is done by registry. For AI/AN non-Hispanics, because the data are sparse, the modeling is done at the U.S. level and includes only Purchased/Referred Care Delivery Areas (PRCDA, formerly CHSDA). For all cancer sites combined and for colorectal, prostate, female breast, and lung cancer, the modeling is also done by PRCDA region. For all other cancer sites, because the data are sparse, the modeling is done for all PRCDA counties in the U.S.

Table 3. Up to six delay factors are developed for each tumor.
	All Races	Race (White, Black)	Race x Ethnicity (Hispanic, White NH, Black NH, API NH, AIAN NH)
All sites
Specific Cancer Sites

Table 3 shows that there are up to six factors for each tumor. For example, a White non-Hispanic case has the following delay factors.

	All Races	White	White non-Hispanic
All sites	x	x	x
Specific Cancer Sites	x	x	x
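
The layout of Table 3 can be sketched as a small lookup. This is only an illustration: it assumes that a case receives one factor for each combination of site level (all sites, specific site) and each stratum in Table 3 that matches the case. The function names and the handling of groups without a race-only stratum (e.g., API non-Hispanic) are assumptions, not part of the source.

```python
# Strata taken from the column headers of Table 3.
SITE_LEVELS = ("All sites", "Specific cancer site")
RACE_STRATA = {"White", "Black"}
RACE_ETHNICITY_STRATA = {
    "Hispanic",
    "White non-Hispanic",
    "Black non-Hispanic",
    "API non-Hispanic",
    "AI/AN non-Hispanic",
}


def race_ethnicity_stratum(race, hispanic):
    """Return the race x ethnicity stratum for a case, or None if none applies."""
    if hispanic:
        return "Hispanic"
    label = f"{race} non-Hispanic"
    return label if label in RACE_ETHNICITY_STRATA else None


def applicable_factors(race, hispanic):
    """List the (site level, stratum) pairs assumed to apply to one case."""
    strata = ["All Races"]
    if race in RACE_STRATA:  # race-only factors exist for White and Black only
        strata.append(race)
    combined = race_ethnicity_stratum(race, hispanic)
    if combined is not None:
        strata.append(combined)
    return [(site, stratum) for site in SITE_LEVELS for stratum in strata]


white_nh = applicable_factors("White", hispanic=False)
api_nh = applicable_factors("API", hispanic=False)
print(len(white_nh), len(api_nh))
```

Under these assumptions a White non-Hispanic case maps to all six cells, matching the example table, while an API non-Hispanic case maps to four because Table 3 has no race-only API column.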

Determining if Each Registry/Cancer Site Combination Has Sufficient Counts to be Modeled Individually

Because the purpose of the NAACCR delay modeling is ultimately to produce registry-specific delay factors, models were run for each registry separately to the extent possible. However, the modeling can be difficult if there are fewer than 50 cases per year for a particular registry and cancer site combination.

Let N_ijkl denote the number of cases submitted in submission year (i) for diagnosis year (j), cancer site (k), and registry (l). We computed the average N.._kl for each cancer site and registry by averaging over all submission years (i) and diagnosis years (j).

If N.._kl ≥ 50, the modeling was done with data from that registry alone.

If N.._kl < 50, the delay model for that cancer site and registry was estimated from a group of registries that had similar delay times; that is, a composite factor was used for this cancer site/registry combination.
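
The threshold rule above can be sketched as follows. This is a minimal illustration, assuming each (submission year, diagnosis year) cell is weighted equally in the average; the record layout, function names, and counts are made up for the example.

```python
from collections import defaultdict

THRESHOLD = 50  # average cases per submission year and diagnosis year, from the text


def mean_counts(records):
    """Average N_ijkl over submission year i and diagnosis year j.

    records: iterable of (submission_year, diagnosis_year, site, registry, n_cases).
    Returns {(site, registry): average count}.
    """
    totals = defaultdict(int)
    cells = defaultdict(int)
    for _submission_year, _diagnosis_year, site, registry, n_cases in records:
        key = (site, registry)
        totals[key] += n_cases
        cells[key] += 1
    return {key: totals[key] / cells[key] for key in totals}


def modeling_plan(records, threshold=THRESHOLD):
    """Label each (site, registry) pair as modeled individually or by group."""
    return {
        key: "individual" if average >= threshold else "group"
        for key, average in mean_counts(records).items()
    }


# Made-up counts for two hypothetical registries.
records = [
    (2023, 2020, "Testis", "Registry A", 40),
    (2024, 2020, "Testis", "Registry A", 44),
    (2024, 2021, "Testis", "Registry A", 42),
    (2023, 2020, "Testis", "Registry B", 120),
    (2024, 2020, "Testis", "Registry B", 130),
    (2024, 2021, "Testis", "Registry B", 125),
]
plan = modeling_plan(records)
print(plan)
```

Here Registry A averages 42 cases and falls back to a group model, while Registry B averages 125 and is modeled on its own.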

Table 4 lists the number of registries with fewer than 50 cases per submission and diagnosis year for various cancer sites.

To determine the groups, for each cancer site (and all sites) we computed an approximate registry-specific estimate of reporting delay. This estimate is the ratio of the count in the first submission to the count in the fifth submission, averaged over all diagnosis years for which both a first and a fifth submission were present. All registries were ranked on this measure of five-year completeness and, based on the ranking, were divided into three groups with approximately equal population size. If a registry/cancer site combination had an average of <50 cases per year, its delay estimates were derived from the group of registries to which it belongs.
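
The grouping step can be sketched as below. The source does not say whether the ratio is averaged per diagnosis year or computed from averaged counts, nor how the population cut points are chosen; this sketch assumes a mean of per-year ratios and assigns each registry by the midpoint of its cumulative population share. All names and numbers are hypothetical.

```python
def completeness_ratio(counts):
    """Mean of first-submission / fifth-submission counts over diagnosis years.

    counts: {diagnosis_year: {submission_index: count}}, where submission
    index 1 is the first submission of that diagnosis year.
    Returns None when no diagnosis year has both submissions.
    """
    ratios = [
        by_submission[1] / by_submission[5]
        for by_submission in counts.values()
        if 1 in by_submission and 5 in by_submission and by_submission[5] > 0
    ]
    if not ratios:
        return None
    return sum(ratios) / len(ratios)


def assign_groups(ratio_by_registry, population_by_registry, n_groups=3):
    """Rank registries by completeness and split into population-balanced groups."""
    ranked = sorted(ratio_by_registry, key=ratio_by_registry.get)
    total = sum(population_by_registry[registry] for registry in ranked)
    groups = {}
    cumulative = 0
    for registry in ranked:
        # Place the registry by the midpoint of its span in cumulative population.
        midpoint = cumulative + population_by_registry[registry] / 2
        groups[registry] = min(int(midpoint / total * n_groups), n_groups - 1)
        cumulative += population_by_registry[registry]
    return groups


# Made-up completeness ratios and populations (in millions).
ratios = {"A": 0.80, "B": 0.85, "C": 0.90, "D": 0.93, "E": 0.95, "F": 0.97}
populations = {"A": 2, "B": 4, "C": 3, "D": 3, "E": 1, "F": 5}
groups = assign_groups(ratios, populations)
print(groups)
```

With these made-up inputs each of the three groups covers 6 million people. Real registries will rarely split this evenly, which is why the text says "approximately equal".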

Table 4. Number of registries with <50 cases per year for different cancer sites.
Sites	# of registries with <50 cases/year
All sites	0
Female Breast, Colon and Rectum, Lung and Bronchus, Melanoma, Prostate	1
Corpus & Uterus, Kidney & Renal, Leukemia, Non-Hodgkin Lymphoma, Oral Cavity and Pharynx, Pancreas, Thyroid, Urinary Bladder	2
Stomach	5
Liver & Intrahepatic Bile Duct, Myeloma, Brain & Other Nervous System	6
Ovary	8
Esophagus	9
Cervix	17
Larynx	18
Testis	19
Hodgkin Lymphoma	21

Model Fitting

A statistical model is fitted, and backward selection is applied to choose covariates. The dependent variable is the log of the ratios defined above. The backward selection is based on p-values: the final model retains only covariates with p-values < 0.05. Different covariates may remain in the model for different ratios. The fitted values are the estimated ratios, and the estimated delay factors are then calculated from these estimated ratios.
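
The selection loop can be sketched generically. The source does not specify the model form or the elimination order, so this assumes the common variant that drops the least significant covariate one at a time and refits. The `fit_pvalues` callable, the covariate names, and the p-values are hypothetical stand-ins for a real regression of the log ratios.

```python
def backward_select(covariates, fit_pvalues, alpha=0.05):
    """Drop the least significant covariate until all p-values are below alpha.

    fit_pvalues: hypothetical callable that refits the model on the given
    covariates and returns {covariate: p_value}.
    """
    selected = list(covariates)
    while selected:
        pvalues = fit_pvalues(selected)
        worst = max(selected, key=lambda name: pvalues[name])
        if pvalues[worst] < alpha:
            break  # every remaining covariate is significant
        selected.remove(worst)
    return selected


def toy_fit(selected):
    """Stand-in for a real model fit, returning made-up p-values."""
    base = {"diagnosis_year": 0.001, "age_group": 0.03, "sex": 0.40, "registry_type": 0.20}
    pvalues = {name: base[name] for name in selected}
    # Pretend registry_type becomes significant once sex is removed,
    # which is why the model is refitted after every removal.
    if "sex" not in selected and "registry_type" in pvalues:
        pvalues["registry_type"] = 0.04
    return pvalues


final = backward_select(["diagnosis_year", "age_group", "sex", "registry_type"], toy_fit)
print(final)
```

Because the loop runs separately for each ratio, different ratios can end up with different covariates, as noted above.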