CP*Trends–Methodology

Introduction

Constructing the CP*Trends graphs requires smoothing to make them visually interpretable. Disease rates by single year of age and calendar year are smoothed over a circular window that extends along both the age and calendar-year time scales. The window size is chosen to strike a reasonable balance between very granular estimates, which may be noisy, and less granular estimates, which may be biased. The rate computed for the entire window is plotted at the age and calendar year of the window's center point. In addition, two statistics (the adjusted C-P Score and the Deviance Explained by Period and Cohort) are computed for each graph to quantify one’s visual impressions of it. The C-P Score summarizes the relative magnitude of the period and cohort contributions in a model for trend. A table of rates contains many more cohorts than periods, which creates a bias favoring cohort; hence, an adjusted C-P Score is given to correct for this bias. A negative adjusted C-P Score suggests that period effects dominate, while a positive score points to cohort effects. Graphs showing trend give bootstrap mean estimates of the adjusted C-P Score along with 95% confidence intervals. An overview of the smoothing method and the two summary statistics is provided below. More details are provided in Holford et al. (2024) [Am J Epidemiol] and Holford et al. (in press) [Am J Epidemiol].

Smoothing

Rates by single year of age and calendar year often depend on a small number of cases or deaths, which results in imprecise estimates. To improve the estimates, data from neighboring ages and calendar years are combined, increasing precision and reducing noise in the graphical display of the trend. The tradeoff is that while random error in the estimate is reduced, bias can increase at times when rates are changing rapidly.

Yearly Smoothed Rates

Let r denote the radius of the window: the window contains all age-year cells whose distance, in years, from the midpoint of the age and year of interest is less than or equal to r. The smoothed rate is the ratio of the total number of cases or deaths for a cancer site to the total population across those age-year cells. As r increases, random error tends to decrease because the window contains more cases or deaths, whereas bias tends to increase. Bias is especially affected by curvature in the trend and by ages or years near the boundary of the available data.
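The pooling step can be sketched in Python as follows. This is a minimal illustration, not the CP*Trends implementation: it assumes counts and populations are stored in NumPy arrays indexed by single year of age and calendar year, measures the distance between cells as Euclidean distance in the age-year plane, and simply truncates windows at the edges of the table. The function name `smooth_rates` is hypothetical.

```python
import numpy as np


def smooth_rates(counts, pop, r):
    """Circular-window smoothed rates on an age-by-year grid (illustrative sketch).

    counts, pop: 2-D arrays indexed [age, year] in single years.
    r: window radius in years. All cells whose Euclidean distance from the
    target cell is <= r are pooled. Cells outside the table are simply
    ignored, so windows are truncated at the boundary (an assumption).
    """
    counts = np.asarray(counts, dtype=float)
    pop = np.asarray(pop, dtype=float)
    n_age, n_year = counts.shape
    ages, years = np.meshgrid(np.arange(n_age), np.arange(n_year), indexing="ij")
    smoothed = np.empty_like(counts)
    for a in range(n_age):
        for y in range(n_year):
            # Boolean mask of the age-year cells inside the circular window
            inside = (ages - a) ** 2 + (years - y) ** 2 <= r ** 2
            # Pooled rate: total events divided by total population in the window
            smoothed[a, y] = counts[inside].sum() / pop[inside].sum()
    return smoothed
```

With r = 0 the function returns the unsmoothed rates; each increase in r pools more neighboring cells, trading random error for potential bias.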

Selection of Window Size

The chi-square goodness-of-fit statistic, computed over all calendar years within a range of ages, measures how well the smoothing is working and the extent to which the smoothed estimate is biased. Specifically, the goodness-of-fit measure is the ratio of the chi-square statistic to its degrees of freedom. The ratio is expected to be about 1 if there is no bias, and it tends to increase as r increases if the estimates are affected by bias. The window used is the largest one with a goodness-of-fit measure below 1.50, which reduces random error while avoiding excessive bias. This threshold was found empirically to give a reasonable tradeoff: the graphs look reasonably smooth, yet are not smoothed so heavily that bias is introduced into the trends.
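A sketch of the window-selection rule follows, under stated assumptions: expected counts are taken as smoothed rate × population, the chi-square is the Pearson statistic, and the degrees of freedom are approximated by the number of cells contributing to it. The names `chisq_ratio` and `select_radius` are hypothetical, and `smoother` can be any function with the signature `smoother(counts, pop, r)` that returns smoothed rates, such as a circular-window smoother.

```python
import numpy as np


def chisq_ratio(counts, pop, smoothed, age_rows=slice(None)):
    """Pearson chi-square / df over all years for the selected range of ages."""
    obs = np.asarray(counts, dtype=float)[age_rows]
    # Expected counts implied by the smoothed rates
    expected = (np.asarray(smoothed, dtype=float)[age_rows]
                * np.asarray(pop, dtype=float)[age_rows])
    used = expected > 0
    chisq = np.sum((obs[used] - expected[used]) ** 2 / expected[used])
    # Assumption: df approximated by the number of contributing cells
    return chisq / np.count_nonzero(used)


def select_radius(counts, pop, radii, smoother, threshold=1.50, age_rows=slice(None)):
    """Largest radius whose chi-square/df ratio is below `threshold` (None if none is)."""
    best = None
    for r in sorted(radii):
        ratio = chisq_ratio(counts, pop, smoother(counts, pop, r), age_rows)
        if ratio < threshold:
            best = r
    return best
```

Scanning every candidate radius, rather than stopping at the first failure, matches the rule as stated: the largest window that meets the threshold is kept.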

Deviance Analysis of Period and Cohort Effects

For each set of period and cohort graphs (based on incidence or mortality rates for a selected cancer site and sex), an adjusted C-P Score and the Deviance Explained by Period and Cohort are computed to assist in interpreting the graphs and in understanding how the cancer site compares to others (see C-P Score Comparison). These statistics are explained below. Details of the computations of these two statistics for each individual graph are provided in the Model Summary tab within the CP*Trends tool (click the "Get Started" button on CP*Trends: Compare Cohort and Period Trends across Cancer Sites).

Drift

Drift is the sum of the linear slopes for period and for cohort, and it indicates the overall direction of the trend. These linear effects are not identifiable, meaning that they cannot be uniquely attributed to either period or cohort, but the overall direction of the trend, the drift, is important to measure. Curvature measures the deviation from the linear trend, and it is the only aspect of the trend analysis that provides information about the separate contributions of period and cohort. After controlling for age, the summary analysis is presented in two ways: (1) without adjusting for drift, which captures all aspects of the trend, and (2) adjusting for drift, which isolates the part of the analysis that does provide information on which factor is dominant.
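The identifiability point can be illustrated with the standard linear age-period-cohort identity. Here a, p, and c = p - a denote age, period, and birth cohort, α, π, and γ are linear slopes for each (notation introduced only for this illustration), and t is an arbitrary constant.

```latex
\begin{aligned}
c &= p - a \quad\Longrightarrow\quad a - p + c = 0 \\
\alpha a + \pi p + \gamma c
  &= \alpha a + \pi p + \gamma c + t\,(a - p + c) \\
  &= (\alpha + t)\,a + (\pi - t)\,p + (\gamma + t)\,c \\
\text{drift} &= (\pi - t) + (\gamma + t) = \pi + \gamma
\end{aligned}
```

Because every value of t fits the data equally well, the period and cohort slopes cannot be estimated individually, yet their sum, the drift, is the same for every t. Departures from the linear terms (curvature) are untouched by t, which is why they carry the information about period versus cohort.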

Adjusted C-P Score

The age-period-cohort model is used to summarize the relative contributions of period and cohort to the temporal trend. The Cohort-Period Score (C-P Score) measures how much of the trend can be explained by cohort effects versus period effects. In principle, the C-P Score ranges from -1 to +1: if the entire temporal trend is accounted for by cohort effects, the score is +1, and if the entire trend is due to period effects, it is -1. However, period and cohort effects are collinear, so the two sets of parameters cannot be independent. In addition, a table of rates contains many more cohorts than periods (a table with A ages and P periods spans A + P - 1 birth cohorts; for example, 50 ages and 40 years give 89 cohorts), which biases the score in favor of cohort. Thus, we use an adjusted C-P Score that controls for this source of bias.

On a more technical level, the C-P Score is calculated with Poisson regression, beginning with a model that includes age effects and linear drift, i.e., the sum of the period slope and the cohort slope. (Because of the identifiability problem, the period and cohort slopes cannot both be estimated.) Scaled deviance is a measure of goodness of fit for a model. The crude C-P Score is the difference between the proportions of scaled deviance attributable to cohort alone and to period alone. Because the bias depends on the degrees of freedom for period and cohort, a correction based on them is subtracted from the crude C-P Score, and the result is rescaled so that it equals zero when the contributions of period and cohort are the same, while the range remains -1 to +1.
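The following Python sketch shows one way the crude and adjusted scores could be computed from the scaled deviances of four nested Poisson models. The definitions are assumptions made for illustration, not the published formulas (see Holford et al. for those): the shares are taken relative to the deviance explained by the full model, the bias is approximated by the expected crude score when neither factor has real curvature (each deviance drop then averages its degrees of freedom, P - 2 for period and C - 2 for cohort), and the rescaling is piecewise linear. Function names are hypothetical.

```python
def crude_cp_score(dev_ad, dev_adp, dev_adc, dev_apc):
    """Crude C-P Score from scaled deviances of nested Poisson models (sketch).

    dev_ad : age + drift
    dev_adp: age + drift + period curvature
    dev_adc: age + drift + cohort curvature
    dev_apc: full age-period-cohort model
    Assumption: shares are relative to the deviance explained by the full model.
    """
    total = dev_ad - dev_apc
    share_cohort = (dev_ad - dev_adc) / total
    share_period = (dev_ad - dev_adp) / total
    return share_cohort - share_period


def adjusted_cp_score(crude, n_periods, n_cohorts):
    """Hypothetical degrees-of-freedom bias correction, rescaled to stay in [-1, 1]."""
    df_p = n_periods - 2  # period curvature df beyond age + drift
    df_c = n_cohorts - 2  # cohort curvature df beyond age + drift
    # Approximate crude score expected when neither factor has real curvature
    bias = (df_c - df_p) / (df_c + df_p)
    shifted = crude - bias
    # Piecewise-linear rescaling: -1 -> -1, bias -> 0, +1 -> +1
    return shifted / (1 - bias) if shifted >= 0 else shifted / (1 + bias)
```

For a table with A ages and P periods, `n_cohorts` would be A + P - 1; when the numbers of periods and cohorts are equal, the correction vanishes and the crude and adjusted scores coincide.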

To assess the precision of the adjusted C-P Score, 1000 bootstrap samples were drawn from the table of rates, and the score was computed for each sample. The mean of the bootstrap scores is presented as the estimate of the effect, and the 2.5th and 97.5th percentiles of their distribution give the limits of the 95% confidence interval.
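The percentile bootstrap described above can be sketched as follows. The resampling scheme CP*Trends uses is not specified here, so this sketch assumes a parametric scheme in which each cell's count is redrawn from a Poisson distribution with mean equal to the observed count. `bootstrap_ci` and `score_fn` are hypothetical names; `score_fn` stands in for the full adjusted C-P Score calculation.

```python
import numpy as np


def bootstrap_ci(counts, pop, score_fn, n_boot=1000, seed=0):
    """Bootstrap mean and 95% percentile interval for a score (sketch).

    Assumption: each replicate redraws every cell count from a Poisson
    distribution whose mean is the observed count; populations are held fixed.
    score_fn(counts, pop) returns the statistic for one table.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    scores = np.array([score_fn(rng.poisson(counts), pop) for _ in range(n_boot)])
    # 2.5th and 97.5th percentiles bound the 95% interval
    lower, upper = np.percentile(scores, [2.5, 97.5])
    return scores.mean(), lower, upper
```

Fixing `seed` makes the interval reproducible, which is convenient when the same table is re-analyzed.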

Deviance Explained by Period and Cohort

In addition to the adjusted C-P Score, a second measure is needed to fully characterize the contributions of period and cohort. The Deviance Explained by Period and Cohort is a value between 0% and 100% that measures the strength of the period and cohort effects when they are added to the model with age alone or with age and drift. Low values imply that little of the effect in the age-period-cohort model is attributable to cohort and/or period, which might reflect either small effects or lack of fit of the model. For example, the adjusted C-P Score could be close to 0 while the Deviance Explained by Period and Cohort, adjusted for age and drift, is small, meaning that neither the cohort nor the period effects are very curved (both are close to straight lines). A small Percent of Deviance Explained by Period and Cohort could also mean that the period or cohort effects do not fit the data well, i.e., the model lacks fit because of a more complex relationship. For example, obesity is a birth-cohort-related risk factor for breast cancer, yet it is thought to increase breast cancer risk in postmenopausal women but reduce risk in premenopausal women. This type of interaction between age and birth-cohort factors is not included in the standard model and would cause lack of fit if changes in obesity levels by birth cohort were a strong driver of trends.

The Deviance Explained when age alone is included (but not drift) captures the effects attributable to period and cohort as well as the effect of drift. If the trend is linear, this effect is captured by drift and the adjusted C-P Score is near 0. However, if drift is strong, then period and cohort are influencing the trend; it is just not possible to ascribe that contribution to either one of the temporal factors.

Statistically, the Percent of Deviance Explained by Period and Cohort is 100 × the ratio of the change in scaled deviance when both period and cohort are added to the model to the scaled deviance of the model with just age and drift.
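The percentage can be computed directly from two scaled deviances. The sketch below also includes a helper for the Poisson deviance, 2 Σ [y log(y/μ) - (y - μ)], divided by a dispersion (scale) estimate, which is the usual definition of scaled deviance for Poisson regression. Function names are hypothetical, and the fitted values μ are assumed to come from whatever Poisson model fit is being used.

```python
import math


def poisson_deviance(observed, expected, scale=1.0):
    """Scaled Poisson deviance: 2 * sum[y log(y/mu) - (y - mu)] / scale.

    scale is a dispersion estimate supplied by the caller; scale=1 gives the
    ordinary deviance. Cells with y = 0 contribute 2 * mu.
    """
    total = 0.0
    for y, mu in zip(observed, expected):
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += term - (y - mu)
    return 2.0 * total / scale


def percent_deviance_explained(dev_base, dev_apc):
    """100 x (drop in scaled deviance from adding period and cohort) / base deviance.

    dev_base: scaled deviance of the base model (e.g., age + drift)
    dev_apc:  scaled deviance of the full age-period-cohort model
    """
    if dev_base <= 0:
        raise ValueError("base-model deviance must be positive")
    return 100.0 * (dev_base - dev_apc) / dev_base
```

For example, a base deviance of 200 that falls to 50 in the full model corresponds to 75% of the deviance explained by period and cohort.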

C-P Score Comparison

A chart giving the adjusted C-P Score and its 95% confidence interval for incidence or mortality, by sex (male, female, both sexes) and race (Black, White, both races), shows how a specific cancer site compares to all other cancer sites. This graph helps put the strength of the cohort and period effects for each cancer site in perspective relative to the others.