Methodology

The delay modeling process for the NAACCR annual submission involves several complex algorithms and methods. This page gives an overview of some of the methodologies used in the process.

Delay Model

Data Used

Table 1 shows the portion of the data used in the new model.

Table 1. Data portion used in the new model (with number of years of reporting delay in each cell).
	Reporting Year
Diagnosis Year	2013	2014	2015	2016	2017	2018	2019	2020	2021	2022	2023	2024
2011	2	3	4	5	6	7	8	9	10	11	12	13
2012		2	3	4	5	6	7	8	9	10	11	12
2013			2	3	4	5	6	7	8	9	10	11
2014				2	3	4	5	6	7	8	9	10
2015					2	3	4	5	6	7	8	9
2016						2	3	4	5	6	7	8
2017							2	3	4	5	6	7
2018								2	3	4	5	6
2019									2	3	4	5
2020										2	3	4
2021											2	3
2022												2
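
Each cell in Table 1 equals the reporting year minus the diagnosis year. The following Python sketch regenerates the layout of the table under that reading. The function name delay_table and its parameters are illustrative and are not part of the NAACCR process.

```python
def delay_table(first_dx=2011, last_dx=2022, first_rep=2013, last_rep=2024, min_delay=2):
    """Return {diagnosis_year: {reporting_year: delay}} for the cells shown in Table 1.

    Assumption: delay = reporting year - diagnosis year, and only cells with a
    delay of at least `min_delay` years are used.
    """
    table = {}
    for dx in range(first_dx, last_dx + 1):
        row = {}
        for rep in range(first_rep, last_rep + 1):
            delay = rep - dx
            if delay >= min_delay:
                row[rep] = delay
        table[dx] = row
    return table


table = delay_table()
# Diagnosis year 2011 runs from delay 2 (reported 2013) to delay 13 (reported 2024);
# diagnosis year 2022 has a single cell, delay 2 (reported 2024).
print(table[2011][2013], table[2011][2024], table[2022][2024])  # prints: 2 13 2
```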

Model

For each cancer site and eligible registry, the models and covariates are:

All races (year of diagnosis, age group (<50, 50-64, 65+))

By race (year of diagnosis, age group (<50, 50-64, 65+), race (White, Black))

By race and ethnicity (year of diagnosis, age group (<50, 50-64, 65+), race and ethnicity (Hispanic, non-Hispanic White, non-Hispanic Black, non-Hispanic Asian and Pacific Islanders))

The modeling steps are shown below.

Step 1: Find the ratios of sequential counts: the ratio of the count at delay time 3 to the count at delay time 2, the ratio of delay times 4 and 3, the ratio of delay times 5 and 4, …, and the ratio of delay times 11 and 10. If either cell is missing, the ratio is not calculated.
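
As a minimal sketch of Step 1 for a single diagnosis year, the Python function below divides each count by the count reported one year earlier and skips any pair with a missing cell. The function name, the dictionary layout, and the example counts are hypothetical. Treating a zero count like a missing cell is an assumption made here to avoid division by zero; the source does not address that case.

```python
def sequential_ratios(counts_by_delay, max_delay=11):
    """Return {(j, j - 1): count[j] / count[j - 1]} for j = 3..max_delay.

    counts_by_delay maps a delay time (in years) to a case count, with None
    (or an absent key) standing for a missing cell.
    """
    ratios = {}
    for j in range(3, max_delay + 1):
        numerator = counts_by_delay.get(j)
        denominator = counts_by_delay.get(j - 1)
        if numerator is None or denominator is None:
            continue  # missing cell: the ratio is not calculated
        if denominator == 0:
            continue  # assumption: a zero count is handled like a missing cell
        ratios[(j, j - 1)] = numerator / denominator
    return ratios


# Made-up counts for one diagnosis year; the count at delay 4 is missing.
example_counts = {2: 100, 3: 104, 4: None, 5: 106, 6: 106}
print(sorted(sequential_ratios(example_counts)))  # prints: [(3, 2), (6, 5)]
```

Because the delay 4 count is missing, both the 4-to-3 and the 5-to-4 ratios are dropped, which shows how one missing cell removes two ratios.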

Step 2: Group the ratios found in Step 1 into 4 groups: (1) ratios of delay times 3 and 2; (2) ratios of delay times 4 and 3; (3) ratios of delay times 5 and 4; (4) ratios of delay times j and j-1 (j=6, 7, 8, 9, 10, and 11). These four groups are dependent variables in the model. Normally, if there are no missing counts, group 1 has 11 ratios, group 2 has 10 ratios, group 3 has 9 ratios, and group 4 has 33 ratios.

Table 2. Four dependent variables formed by ratios of delay times (Labeled as a, b, c, and d).
Diagnosis Year	r_3/2	r_4/3	r_5/4	r_6/5	r_7/6	r_8/7	r_9/8	r_10/9	r_11/10
	Four Dependent Variables
	a	b	c	d	d	d	d	d	d
2011	y₂₀₁₄/ y₂₀₁₃	y₂₀₁₅/ y₂₀₁₄	y₂₀₁₆/ y₂₀₁₅	y₂₀₁₇/ y₂₀₁₆	y₂₀₁₈/ y₂₀₁₇	y₂₀₁₉/ y₂₀₁₈	y₂₀₂₀/ y₂₀₁₉	y₂₀₂₁/ y₂₀₂₀	y₂₀₂₂/ y₂₀₂₁
2012	y₂₀₁₅/ y₂₀₁₄	y₂₀₁₆/ y₂₀₁₅	y₂₀₁₇/ y₂₀₁₆	y₂₀₁₈/ y₂₀₁₇	y₂₀₁₉/ y₂₀₁₈	y₂₀₂₀/ y₂₀₁₉	y₂₀₂₁/ y₂₀₂₀	y₂₀₂₂/ y₂₀₂₁	y₂₀₂₃/ y₂₀₂₂
2013	y₂₀₁₆/ y₂₀₁₅	y₂₀₁₇/ y₂₀₁₆	y₂₀₁₈/ y₂₀₁₇	y₂₀₁₉/ y₂₀₁₈	y₂₀₂₀/ y₂₀₁₉	y₂₀₂₁/ y₂₀₂₀	y₂₀₂₂/ y₂₀₂₁	y₂₀₂₃/ y₂₀₂₂	y₂₀₂₄/ y₂₀₂₃
2014	y₂₀₁₇/ y₂₀₁₆	y₂₀₁₈/ y₂₀₁₇	y₂₀₁₉/ y₂₀₁₈	y₂₀₂₀/ y₂₀₁₉	y₂₀₂₁/ y₂₀₂₀	y₂₀₂₂/ y₂₀₂₁	y₂₀₂₃/ y₂₀₂₂	y₂₀₂₄/ y₂₀₂₃
2015	y₂₀₁₈/ y₂₀₁₇	y₂₀₁₉/ y₂₀₁₈	y₂₀₂₀/ y₂₀₁₉	y₂₀₂₁/ y₂₀₂₀	y₂₀₂₂/ y₂₀₂₁	y₂₀₂₃/ y₂₀₂₂	y₂₀₂₄/ y₂₀₂₃
2016	y₂₀₁₉/ y₂₀₁₈	y₂₀₂₀/ y₂₀₁₉	y₂₀₂₁/ y₂₀₂₀	y₂₀₂₂/ y₂₀₂₁	y₂₀₂₃/ y₂₀₂₂	y₂₀₂₄/ y₂₀₂₃
2017	y₂₀₂₀/ y₂₀₁₉	y₂₀₂₁/ y₂₀₂₀	y₂₀₂₂/ y₂₀₂₁	y₂₀₂₃/ y₂₀₂₂	y₂₀₂₄/ y₂₀₂₃
2018	y₂₀₂₁/ y₂₀₂₀	y₂₀₂₂/ y₂₀₂₁	y₂₀₂₃/ y₂₀₂₂	y₂₀₂₄/ y₂₀₂₃
2019	y₂₀₂₂/ y₂₀₂₁	y₂₀₂₃/ y₂₀₂₂	y₂₀₂₄/ y₂₀₂₃
2020	y₂₀₂₃/ y₂₀₂₂	y₂₀₂₄/ y₂₀₂₃
2021	y₂₀₂₄/ y₂₀₂₃
2022
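
The group sizes quoted in Step 2 (11, 10, 9, and 33) can be checked by enumerating the cells of Table 2. The sketch below assumes that a count at delay time j for a given diagnosis year is observed in reporting year (diagnosis year + j), as in Table 1. The function names are illustrative.

```python
def group_label(j):
    """Map the ratio of delay times j and j - 1 to its group: a, b, c, or d."""
    return {3: "a", 4: "b", 5: "c"}.get(j, "d")


def count_ratios_by_group(first_dx=2011, last_dx=2022, last_rep=2024, max_delay=11):
    """Count the ratios available in each group when no counts are missing."""
    counts = {"a": 0, "b": 0, "c": 0, "d": 0}
    for dx in range(first_dx, last_dx + 1):
        for j in range(3, max_delay + 1):
            # The ratio needs the count at delay j, reported in year dx + j.
            if dx + j <= last_rep:
                counts[group_label(j)] += 1
    return counts


print(count_ratios_by_group())  # prints: {'a': 11, 'b': 10, 'c': 9, 'd': 33}
```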

Step 3: Exclude registries that have too much missing data. Eliminate registries that do not have at least (1) 5 out of 11 ratios of delay times 3 and 2; (2) 5 out of 10 ratios of delay times 4 and 3; (3) 5 out of 9 ratios of delay times 5 and 4; and (4) 20 out of 33 of the remaining ratios. No delay modeling is conducted for these registries because they do not have a sufficient history of reporting delay.
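
The exclusion rule in Step 3 can be sketched as a simple threshold check. This sketch reads each threshold as a minimum (at least 5, 5, 5, and 20 available ratios), which is an interpretation of the wording above. The names MIN_RATIOS and registry_is_eligible are hypothetical.

```python
# Minimum number of non-missing ratios required in each group
# (a: delay 3 and 2, b: delay 4 and 3, c: delay 5 and 4, d: all remaining ratios).
MIN_RATIOS = {"a": 5, "b": 5, "c": 5, "d": 20}


def registry_is_eligible(available_ratios):
    """Return True if every group meets its minimum number of available ratios.

    available_ratios maps a group label to the number of ratios that could be
    calculated for the registry; a group that is absent counts as zero.
    """
    return all(
        available_ratios.get(group, 0) >= minimum
        for group, minimum in MIN_RATIOS.items()
    )


print(registry_is_eligible({"a": 11, "b": 10, "c": 9, "d": 33}))  # prints: True
print(registry_is_eligible({"a": 4, "b": 10, "c": 9, "d": 33}))   # prints: False
```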

Step 4: Use the logarithms of the ratios derived from Step 2 as the dependent variables to fit a statistical model. The fitted model is then used to produce delay adjustment factors. For example, let a, b, c, and d denote the model estimates of r_3/2, r_4/3, r_5/4, and r₍₅₊₎ (the ratios of delay times j and j-1 for j = 6 to 11), respectively. The delay adjustment factor is a*b*c*d⁶ for diagnosis year 2022, b*c*d⁶ for diagnosis year 2021, c*d⁶ for diagnosis year 2020, and so on.
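
The products in Step 4 can be sketched as a running product of the estimated ratios from the current delay time up to delay time 11. The sketch assumes that a, b, c, and d are already on the ratio scale (not the log scale) and that the "and so on" pattern continues as d⁶, d⁵, … for diagnosis years 2019 and earlier. The function name and the numeric values are made up for illustration and are not model output.

```python
def delay_adjustment_factor(a, b, c, d, current_delay, max_delay=11):
    """Multiply the estimated ratios for delay times current_delay + 1 .. max_delay.

    a, b, c are the estimated ratios for delay times 3/2, 4/3, and 5/4;
    d is the common estimated ratio for delay times j/(j - 1), j = 6..max_delay.
    """
    step_ratio = {3: a, 4: b, 5: c}
    factor = 1.0
    for j in range(current_delay + 1, max_delay + 1):
        factor *= step_ratio.get(j, d)  # delay times 6 and above all use d
    return factor


# Hypothetical estimates, chosen only to show the arithmetic.
a, b, c, d = 1.05, 1.02, 1.01, 1.001
last_reporting_year = 2024
for dx in (2022, 2021, 2020):
    delay = last_reporting_year - dx  # 2, 3, and 4 years
    print(dx, delay_adjustment_factor(a, b, c, d, delay))
```

With these inputs the three factors correspond to a*b*c*d⁶, b*c*d⁶, and c*d⁶, matching the expressions given in Step 4.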