# CP*Trends–Methodology: Compare Cohort and Period Trends across Cancer Sites

## Introduction

Constructing the CP*Trends graphs requires smoothing to make them visually interpretable. The disease rates by single year of age and calendar year are smoothed over an extended circular window on both the age and calendar-year time scales. The size of the window is chosen to strike a reasonable balance between very granular estimates, which may be noisy, and less granular estimates, which may be biased. A rate computed for the entire window is plotted at the age and calendar year specified by the center point of the window. In addition, two statistics are computed for each graph to quantify one’s visual impressions of the graphs. The first statistic, the C-P Score, summarizes the relative contribution of period and cohort in a model for trend. The second statistic, Deviance Explained by Period and Cohort, quantifies how strong the period and cohort effects are when added to the model. An overview of the smoothing method and the two summary statistics is provided below. More details are provided in Holford et al. (2019) [*Am J Epidemiol*].

### Smoothing

Rates by single year of age and calendar year often depend on a small number of cases or deaths, which results in imprecise estimates. To improve the estimates, data from neighboring ages and calendar years are combined, increasing precision and reducing noise for graphical display of the trend. The tradeoff is that while random error in the estimate is reduced, bias is potentially increased during times when rates are changing rapidly.

#### Yearly Smoothed Rates

Let *r* represent the size of the window: the window includes every age and year whose distance in time from the middle of the age and year of interest is less than or equal to *r*. The smoothed rate is the ratio of the number of cases or deaths for a cancer site to the total population, both summed over the age-year cells in the window. As *r* increases, random error tends to decrease because more cases or deaths contribute to the estimate, whereas bias tends to increase. The bias is especially affected by curvature in the trend and by ages or years that are near the boundary of the available data.
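The windowed rate described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the function name `smooth_rates`, the use of Euclidean distance on a grid with one-year spacing, and the choice to simply truncate the window at the edge of the data are all assumptions made for the example.

```python
def smooth_rates(cases, population, r):
    """Circular-window smoothed rates on an age-by-year grid.

    cases[i][j] and population[i][j] hold the counts for age index i and
    calendar-year index j. Assumes one-year spacing on both axes and
    Euclidean distance between cell centres (illustrative assumptions).
    """
    n_age, n_year = len(cases), len(cases[0])
    reach = int(r)  # cells farther than floor(r) along one axis are outside
    smoothed = [[None] * n_year for _ in range(n_age)]
    for a in range(n_age):
        for y in range(n_year):
            num = 0.0  # cases or deaths pooled over the window
            den = 0.0  # population pooled over the window
            # Near the data boundary the window is truncated to existing cells.
            for i in range(max(0, a - reach), min(n_age, a + reach + 1)):
                for j in range(max(0, y - reach), min(n_year, y + reach + 1)):
                    if (i - a) ** 2 + (j - y) ** 2 <= r ** 2:
                        num += cases[i][j]
                        den += population[i][j]
            smoothed[a][y] = num / den if den > 0 else None
    return smoothed
```

With `r = 0` the function returns the unsmoothed single-year rates. Increasing `r` pools more cells into each estimate, which is where the reduction in random error and the potential increase in bias both come from.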

#### Selection of Window Size

Chi-square goodness of fit over all calendar years within a range of ages provides a measure of how well the smoothing is working and the extent to which the smoothed estimate is biased. Specifically, we use the ratio of chi-square to its degrees of freedom. One expects the ratio to be 1 if there is no bias, but as *r* increases the ratio tends to increase if the estimates are affected by bias. The window size used is the largest window that has a goodness-of-fit measure less than 1.50, thus reducing random error while avoiding excessive bias. This value was found empirically to give a reasonable tradeoff: graphs that look smooth, yet are not "over-smoothed" in a way that introduces bias into the trends.
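The selection rule can be illustrated with a short Python sketch. The helper names `chi_square_ratio` and `select_window` are hypothetical, the Pearson form of the chi-square statistic is an assumption, and the degrees of freedom are passed in by the caller because this overview does not state how they are counted. The candidate ratios in the usage example are made-up numbers, not results.

```python
def chi_square_ratio(observed, expected, df):
    """Pearson chi-square divided by its degrees of freedom.

    observed: case or death counts per age-year cell.
    expected: counts implied by the smoothed rates (rate x population).
    df: degrees of freedom, supplied by the caller (assumption: the
        overview does not say how they are counted).
    """
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2 / df


def select_window(ratio_by_r, threshold=1.50):
    """Largest window size whose chi-square/df ratio is below the threshold."""
    eligible = [r for r, ratio in ratio_by_r.items() if ratio < threshold]
    return max(eligible) if eligible else None


# Made-up ratios for four candidate window sizes.
ratios = {1: 0.98, 2: 1.10, 3: 1.42, 4: 1.63}
best_r = select_window(ratios)  # 3: the largest r with a ratio under 1.50
```

Returning `None` when no candidate qualifies is a design choice for the sketch; a real pipeline would need a fallback, such as using the smallest available window.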

### Deviance Analysis of Period and Cohort Effects

For each period and cohort set of graphs (based on incidence or mortality rates for a selected cancer site and sex), a C-P Score and a Percent Deviance Explained by Period and Cohort are computed to assist in interpreting the graphs and understanding how this cancer site compares to others (see Scatter Plots). These statistics are explained below. Details of the computations of these two statistics for each individual graph are provided in the Model Summary tab within the CP*Trends tool (click "Get Started" button).

#### C-P Score

The age-period-cohort model is used to summarize the relative contribution of period and cohort to the temporal trend. The Cohort-Period Score (C-P Score) is a measure of how much of the trend can be explained by cohort effects versus period effects. The C-P Score ranges between -1 and +1. If the entire temporal trend is accounted for by cohort effects, then the C-P Score is +1, but if the entire trend is due to period effects, then it is -1. If, on the other hand, the contributions of cohort and period effects are equal, then the C-P Score is 0.

On a more technical level, Poisson regression is used to calculate the C-P Score, beginning with a model that includes age effects and linear drift, i.e., the sum of the period slope and the cohort slope. (Because of the identifiability problem, the period and cohort slopes cannot be determined separately.) Scaled deviance is a measure of goodness of fit for a model. The C-P Score is the difference between the proportion of scaled deviance due to cohort alone and the proportion due to period alone.
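As a rough sketch of the arithmetic, the function below computes a score of this kind from the scaled deviances of four nested models. The function name `cp_score`, its argument names, and above all the choice of denominator are assumptions: this overview does not say what the proportions are taken relative to, so the sketch uses the total reduction in scaled deviance from adding both period and cohort to the age-drift model, which reproduces the stated behaviour (+1 for cohort only, -1 for period only, 0 for equal contributions). The exact definition is in Holford et al. (2019) and in the tool's Model Summary tab.

```python
def cp_score(dev_age_drift, dev_plus_period, dev_plus_cohort, dev_full):
    """Illustrative cohort-versus-period score from scaled deviances.

    dev_age_drift:   age + drift model
    dev_plus_period: age + drift + period
    dev_plus_cohort: age + drift + cohort
    dev_full:        age + drift + period + cohort
    The denominator is an assumption made for this sketch.
    """
    total = dev_age_drift - dev_full  # reduction from adding both effects
    if total <= 0:
        raise ValueError("period and cohort do not reduce the deviance")
    due_to_cohort = (dev_age_drift - dev_plus_cohort) / total
    due_to_period = (dev_age_drift - dev_plus_period) / total
    return due_to_cohort - due_to_period
```

For example, with made-up deviances where adding period changes nothing and adding cohort achieves the full reduction, `cp_score(100, 100, 40, 40)` evaluates to `1.0`.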

#### Deviance Explained by Period and Cohort

In addition to the C-P Score, a second measure is needed to fully characterize the contributions of period and cohort. The Deviance Explained by Period and Cohort is a value between 0% and 100% that measures how strong the period and cohort effects are when added to the model. Low values imply that very little of the effect in the age-period-cohort model is attributable to cohort and/or period, which might suggest either small effects or lack of fit for the model. For example, the C-P Score could be close to 0 and the Deviance Explained by Period and Cohort could be small, meaning that neither the cohort nor the period effects are very curved, i.e., both are close to a straight line. A small value of the Percent Deviance Explained by Period and Cohort could also mean that the additive period or cohort effects do not fit the data, i.e., there is lack of fit of the model due to a more complex relationship. For example, while obesity is a risk factor for breast cancer associated with birth cohorts, it is thought to increase breast cancer risk in post-menopausal women but reduce risk in pre-menopausal women. This type of interaction between age and birth cohort factors is not included in the standard model and would cause lack of fit if changes in levels of obesity by birth cohort are a strong driver of trends.

Statistically, the percent deviance explained by period and cohort is 100 × the ratio of the reduction in scaled deviance when both period and cohort are added to the model to the scaled deviance for the model with just age and drift.
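The definition above translates directly into a one-line calculation. The function and argument names below are hypothetical, and the deviances in the usage example are made-up values chosen only to show the arithmetic.

```python
def percent_deviance_explained(dev_age_drift, dev_full):
    """Percent of scaled deviance explained by period and cohort.

    dev_age_drift: scaled deviance of the model with just age and drift.
    dev_full:      scaled deviance after adding both period and cohort.
    """
    if dev_age_drift <= 0:
        raise ValueError("age-drift deviance must be positive")
    return 100.0 * (dev_age_drift - dev_full) / dev_age_drift


# Made-up deviances: adding period and cohort reduces 200 to 50.
pct = percent_deviance_explained(200.0, 50.0)  # 75.0
```

A value near 0 means the added period and cohort terms barely improve on the age-drift model, while a value near 100 means they account for nearly all of its remaining deviance.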

#### Scatter Plots

Six scatterplots are available, one for each combination of rate type (incidence or mortality) and sex (male, female, both sexes). Each scatterplot places the C-P Score on the X-axis and the Percent Deviance Explained by Period and Cohort on the Y-axis, showing graphically how a specific cancer site compares to all other cancer sites. The plot helps put the strength of the cohort and period effects for that cancer site in perspective relative to the others.