Methodology

The delay modeling process for the NAACCR annual submission involves several complex algorithms and methods. This page provides an overview of some of the methodologies used in the process.

Delay Model

Data Used

Table 1 shows the portion of the data used in the new model.

Table 1. Data portion used in the new model (with number of years of reporting delay in each cell).
                Reporting Year
Diagnosis Year  2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
2005               2    3    4    5    6    7    8    9   10   11   12   13
2006                    2    3    4    5    6    7    8    9   10   11   12
2007                         2    3    4    5    6    7    8    9   10   11
2008                              2    3    4    5    6    7    8    9   10
2009                                   2    3    4    5    6    7    8    9
2010                                        2    3    4    5    6    7    8
2011                                             2    3    4    5    6    7
2012                                                  2    3    4    5    6
2013                                                       2    3    4    5
2014                                                            2    3    4
2015                                                                 2    3
2016                                                                      2
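
The pattern in Table 1 can be read as a simple rule: each cell holds the reporting year minus the diagnosis year, and only cells with a delay of at least two years appear. The Python sketch below reproduces the table under that reading. The function name `delay_table` and its parameters are illustrative and are not taken from the NAACCR software.

```python
def delay_table(first_dx=2005, last_dx=2016,
                first_report=2007, last_report=2018, min_delay=2):
    """Return {(diagnosis_year, reporting_year): delay_in_years}.

    Assumption: delay = reporting year - diagnosis year, and only
    cells with delay >= min_delay are used, as read from Table 1.
    """
    cells = {}
    for dx in range(first_dx, last_dx + 1):
        for rep in range(first_report, last_report + 1):
            delay = rep - dx
            if delay >= min_delay:
                cells[(dx, rep)] = delay
    return cells


cells = delay_table()
# Delays available for diagnosis year 2014, in reporting-year order.
delays_2014 = [cells[(2014, rep)] for rep in range(2016, 2019)]
```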

Model

For each cancer site and eligible registry, the models and covariates are:

All races (year of diagnosis, age group (<50, 50-64, 65+))

By race (year of diagnosis, age group (<50, 50-64, 65+), race (White, Black, Asian-Pacific Islanders (API)))

By ethnicity (year of diagnosis, age group (<50, 50-64, 65+), ethnicity (Hispanic, non-Hispanic))

By race and ethnicity (year of diagnosis, age group (<50, 50-64, 65+), race and ethnicity (White Hispanic, White non-Hispanic, Black Hispanic, Black non-Hispanic))

The modeling steps are shown below.

Step 1: Find the ratios of sequential counts: the ratio of the counts at delay times 3 and 2, at delay times 4 and 3, at delay times 5 and 4, …, and at delay times 11 and 10. If either cell in a ratio is missing, the ratio is not calculated.
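
As an illustration of Step 1, the sketch below computes the sequential ratios for one diagnosis year from a mapping of delay time to case count. The function name, the input layout, and the decision to skip a ratio whose denominator is zero are assumptions made for this example, not details given on this page.

```python
def sequential_ratios(counts, max_delay=11):
    """Return {j: counts[j] / counts[j - 1]} for j = 3..max_delay.

    counts maps delay time (years) to the case count reported at that
    delay; a missing key or a None value marks a missing cell.
    A ratio is skipped when either of its two cells is missing.
    Assumption: a zero denominator is also treated as missing.
    """
    ratios = {}
    for j in range(3, max_delay + 1):
        numerator = counts.get(j)
        denominator = counts.get(j - 1)
        if numerator is None or denominator is None or denominator == 0:
            continue
        ratios[j] = numerator / denominator
    return ratios


# Hypothetical counts with the delay-4 cell missing: the ratios for
# j = 4 and j = 5 both depend on that cell and are therefore skipped.
example = sequential_ratios({2: 100, 3: 110, 4: None, 5: 120, 6: 121})
```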

Step 2: Group the ratios found in Step 1 into 4 groups: (1) ratios of delay times 3 and 2; (2) ratios of delay times 4 and 3; (3) ratios of delay times 5 and 4; (4) ratios of delay times j and j-1 (j=6, 7, 8, 9, 10, and 11). These four groups are the dependent variables in the model. Normally, if there are no missing counts, group 1 has 11 ratios, group 2 has 10 ratios, group 3 has 9 ratios, and group 4 has 33 ratios.

Table 2. Four dependent variables formed by ratios of delay times (labeled a, b, c, and d).
                Four Dependent Variables
                a            b            c            d
Diagnosis Year  r3/2         r4/3         r5/4         r6/5         r7/6         r8/7         r9/8         r10/9        r11/10
2005            y2008/y2007  y2009/y2008  y2010/y2009  y2011/y2010  y2012/y2011  y2013/y2012  y2014/y2013  y2015/y2014  y2016/y2015
2006            y2009/y2008  y2010/y2009  y2011/y2010  y2012/y2011  y2013/y2012  y2014/y2013  y2015/y2014  y2016/y2015  y2017/y2016
2007            y2010/y2009  y2011/y2010  y2012/y2011  y2013/y2012  y2014/y2013  y2015/y2014  y2016/y2015  y2017/y2016  y2018/y2017
2008            y2011/y2010  y2012/y2011  y2013/y2012  y2014/y2013  y2015/y2014  y2016/y2015  y2017/y2016  y2018/y2017
2009            y2012/y2011  y2013/y2012  y2014/y2013  y2015/y2014  y2016/y2015  y2017/y2016  y2018/y2017
2010            y2013/y2012  y2014/y2013  y2015/y2014  y2016/y2015  y2017/y2016  y2018/y2017
2011            y2014/y2013  y2015/y2014  y2016/y2015  y2017/y2016  y2018/y2017
2012            y2015/y2014  y2016/y2015  y2017/y2016  y2018/y2017
2013            y2016/y2015  y2017/y2016  y2018/y2017
2014            y2017/y2016  y2018/y2017
2015            y2018/y2017
2016
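
The grouping in Step 2 and Table 2 can be sketched as follows. The code assigns each ratio to group a, b, c, or d according to its delay time j, and the usage lines rebuild the complete set of ratios to check the expected group sizes of 11, 10, 9, and 33. The names `group_of` and `group_ratios` and the nested-dictionary input are hypothetical choices for this example.

```python
FIXED_GROUPS = {3: "a", 4: "b", 5: "c"}  # r3/2, r4/3, r5/4


def group_of(j):
    """Return the Table 2 group label for the ratio of delays j and j-1."""
    if j in FIXED_GROUPS:
        return FIXED_GROUPS[j]
    if 6 <= j <= 11:
        return "d"  # r6/5 through r11/10 share one group
    raise ValueError(f"no group defined for delay time {j}")


def group_ratios(ratios_by_year):
    """Pool ratios across diagnosis years into the four groups.

    ratios_by_year maps diagnosis year -> {j: ratio of delays j and j-1}.
    """
    groups = {"a": [], "b": [], "c": [], "d": []}
    for ratios in ratios_by_year.values():
        for j, ratio in ratios.items():
            groups[group_of(j)].append(ratio)
    return groups


# Complete data: diagnosis years 2005-2016, last reporting year 2018.
# The ratio values are placeholders; only the counts matter here.
complete = {
    dx: {j: 1.0 for j in range(3, min(11, 2018 - dx) + 1)}
    for dx in range(2005, 2017)
}
sizes = {name: len(values) for name, values in group_ratios(complete).items()}
```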

Step 3: Exclude registries that have too much missing data. Eliminate registries that do not have at least (1) 5 out of 11 ratios of delay times 3 and 2; (2) 5 out of 10 ratios of delay times 4 and 3; (3) 5 out of 9 ratios of delay times 5 and 4; (4) 20 out of 33 of the remaining ratios. No delay modeling is conducted for these registries because they do not have a sufficient history of reporting delay.
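
A minimal sketch of the Step 3 screen is shown below. It assumes the thresholds are minimum counts of available ratios per group, and it assumes a registry's ratios have already been pooled into groups labeled a, b, c, and d as in Table 2. The names `MIN_RATIOS` and `registry_is_eligible` are hypothetical.

```python
# Minimum number of non-missing ratios required in each group
# (assumption: the Step 3 thresholds are read as "at least").
MIN_RATIOS = {"a": 5, "b": 5, "c": 5, "d": 20}


def registry_is_eligible(groups, minimums=MIN_RATIOS):
    """Return True if every group meets its minimum ratio count.

    groups maps a group label to the list of ratios available
    for one registry; an absent label counts as zero ratios.
    """
    return all(
        len(groups.get(label, [])) >= required
        for label, required in minimums.items()
    )


# Hypothetical registry that falls one ratio short in group d.
sparse = {"a": [1.0] * 5, "b": [1.0] * 5, "c": [1.0] * 5, "d": [1.0] * 19}
sparse_ok = registry_is_eligible(sparse)
```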

Step 4: Fit a multivariate ANOVA model in which the dependent variables are the logarithms of the ratios derived from Step 2.

Step 5: The fitted model is then used to produce delay adjustment factors. For example, let a, b, c, and d denote r(3/2), r(4/3), r(5/4), and r(5+), respectively, as estimates of the ratios from the model. The delay adjustment factor for diagnosis year 2016 is obtained as a*b*c*d^6, for diagnosis year 2015 as b*c*d^6, for diagnosis year 2014 as c*d^6, and so on. The factor d is raised to the sixth power because it is applied once for each of the six steps from delay time 5 to delay time 11.
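
The product in Step 5 can be sketched in Python as below. The sketch reads the trailing term as d raised to the sixth power, an assumption based on d covering the six ratios for j = 6 to 11. It also extends "and so on" by multiplying only the ratios that remain between a diagnosis year's current delay and delay time 11. The function names and the example estimates are hypothetical.

```python
def delay_adjustment_factor(current_delay, a, b, c, d, max_delay=11):
    """Multiply the fitted ratios for every step still to come.

    Steps to delay 3, 4, and 5 use a, b, and c; each later step
    (delay 6 through max_delay) uses d.
    """
    step_ratio = {3: a, 4: b, 5: c}
    factor = 1.0
    for j in range(current_delay + 1, max_delay + 1):
        factor *= step_ratio.get(j, d)
    return factor


def factor_for_diagnosis_year(dx_year, a, b, c, d, last_report=2018):
    """Factor for a diagnosis year as of the latest reporting year."""
    return delay_adjustment_factor(last_report - dx_year, a, b, c, d)


# Hypothetical fitted ratios, for illustration only.
a, b, c, d = 1.10, 1.05, 1.02, 1.01
factors = {
    year: factor_for_diagnosis_year(year, a, b, c, d)
    for year in range(2012, 2017)
}
```

Under this reading, diagnosis year 2016 receives a*b*c*d^6, diagnosis year 2013 receives d^6, and each earlier year drops one power of d.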