
Data Dependent Selection (DDS)

 

Data Dependent Choice of Model Selection Methods

Joinpoint offers four methods for selecting the model, that is, the number of joinpoints: the Permutation test, the Bayesian Information Criterion (BIC), BIC3, and Modified BIC. The Permutation test was proposed in Kim et al. (2000) and implemented in the original version of Joinpoint. The use of BIC in Joinpoint is described in Kim et al. (2009), and BIC3, which applies a harsher penalty than the traditional BIC, is introduced in Kim and Kim (2016). The Modified BIC, originally proposed by Zhang and Siegmund (2007) and applied to the joinpoint regression model, was implemented in Joinpoint V 3.5.
 

The Permutation test is computationally intensive; the information-based criteria (BIC, BIC3, and Modified BIC) are much more computationally efficient. Regarding performance, simulation studies indicated that (i) BIC detects changes with a small effect size well but tends to overestimate the number of joinpoints, (ii) Modified BIC is the most conservative of these selection methods and detects changes with a large effect size well, and (iii) BIC3 performs comparably to the Permutation test.

Joinpoint version 4.6.0.0-Alpha implements a new feature that combines BIC and BIC3 to improve on the performance of BIC3 while remaining computationally more efficient than the Permutation test. The new procedure internally chooses the model selection method, BIC or BIC3, based on the characteristics of the data; the basic idea is to use BIC if change sizes are relatively small and BIC3 otherwise.
 
Consider a joinpoint regression model,

\(y = \beta_{0} +\ \beta_{1}x\ +\ \delta_{1} ( x - \tau_{1} )^+ +\ \cdots\ +\ \delta_{\kappa} ( x - \tau_{\kappa} )^+ +\ \epsilon ,\) where \(\kappa\) is an unknown number of joinpoints, and \(a^+ = \ a \) if \(a \ > \ 0 \), and 0 otherwise. Suppose that with a pre-specified \(k_{max}\), a model with k joinpoints is selected by BIC or BIC3 \((0 \ \leq \ k \leq \ k_{max})\), for which the parameters are estimated as \(\hat{\tau_1}, \ldots, \hat{\tau_k} , \hat{\beta_0}, \hat{\beta_1}, \hat{\delta_1}, \ldots, \hat{\delta_k}\). For the observations in the estimated ith and (i+1)st segments (i=1, 2, ..., k), that is, the observations whose x-values are in \(( \hat{\tau_{i-1}}, \hat{\tau_{i+1}} ]\), where \(\hat{\tau_0} = \min_j x_j \ - \ 1\) and \(\hat{\tau_{k+1}} = \max_j x_j\), denote their x-values in ascending order by \(x_{j_1 + 1}, \ \ldots, \ x_{j_2}\), and let

\(z_i \ = \ \left( (x_{j_1 + 1} \ - \ \hat{\tau_i})^+, \ \ldots, \ (x_{j_2} - \hat{\tau_i})^+ \right)^T\) and

\(X_0 \ = \ \begin{pmatrix} 1 & x_{j_1+1} \\ \vdots & \vdots \\ 1 & x_{j_2} \end{pmatrix}\). Also let

\(\Delta_{i,i+1} \ = \ \hat{\delta_i}^2 z_i^T(I \ - \ H_0) z_i / \hat{\sigma}^2\), where \(H_0 \ = \ X_0(X_0^TX_0)^{-1}X_0^T\) and \(\hat{\sigma}^2\) is the mean squared error of the model with the maximum number of joinpoints, \(k_{max}\), and define

\(\Delta(k) \ = \ \min_{i=1, \ldots, k} \Delta_{i, i+1}\).

Note that the measure \(\Delta_{i, i+1}\) is motivated by a quantity related to the power of a test to detect a slope change of \(\delta\) in a simple linear regression model.
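As an illustration of the definitions above, the following is a minimal Python sketch of how \(\Delta_{i,i+1}\) and \(\Delta(k)\) could be computed from an already fitted model. It is not Joinpoint's implementation. The function names (`residual_quadratic_form`, `delta_pairs`, `delta_of_k`) and argument names are hypothetical, and the estimates \(\hat{\tau_i}\), \(\hat{\delta_i}\), and \(\hat{\sigma}^2\) are assumed to be supplied by the caller. The sketch uses the standard identity that \(z^T(I - H_0)z\) equals the residual sum of squares from regressing \(z\) on an intercept and \(x\), so no matrix library is needed. The case k = 0 is not defined in the text above, so the sketch raises an error for it.

```python
def residual_quadratic_form(xs, zs):
    """Return z^T (I - H_0) z, where H_0 projects onto the columns (1, x).

    This equals the residual sum of squares of a simple linear
    regression of z on x: Szz - Sxz^2 / Sxx.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    z_bar = sum(zs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    szz = sum((z - z_bar) ** 2 for z in zs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    return szz - sxz ** 2 / sxx


def delta_pairs(x, tau_hat, delta_hat, sigma2_hat):
    """Return [Delta_{1,2}, ..., Delta_{k,k+1}] for a fitted k-joinpoint model.

    x          : all observed x-values
    tau_hat    : estimated joinpoints, in ascending order (length k)
    delta_hat  : estimated slope changes (length k)
    sigma2_hat : mean squared error of the k_max-joinpoint model
    """
    k = len(tau_hat)
    # Boundary conventions from the text: tau_0 = min(x) - 1, tau_{k+1} = max(x).
    bounds = [min(x) - 1] + list(tau_hat) + [max(x)]
    values = []
    for i in range(1, k + 1):
        lo, hi = bounds[i - 1], bounds[i + 1]
        # Observations in the ith and (i+1)st estimated segments.
        xs = sorted(v for v in x if lo < v <= hi)
        # z_i holds the positive parts (x - tau_i)^+.
        zs = [max(v - bounds[i], 0.0) for v in xs]
        q = residual_quadratic_form(xs, zs)
        values.append(delta_hat[i - 1] ** 2 * q / sigma2_hat)
    return values


def delta_of_k(x, tau_hat, delta_hat, sigma2_hat):
    """Return Delta(k), the minimum of Delta_{i,i+1} over i = 1, ..., k."""
    values = delta_pairs(x, tau_hat, delta_hat, sigma2_hat)
    if not values:
        # Assumption of this sketch: k = 0 is treated as an error.
        raise ValueError("Delta(k) requires at least one joinpoint")
    return min(values)
```

For example, `delta_of_k([0, 1, 2, 3, 4], [2], [1.0], 1.0)` evaluates a single joinpoint at x = 2 with a unit slope change and unit error variance.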

Given two pre-specified values, c and d, as cutoff values, the number of joinpoints is estimated as \(\hat{\kappa}\) according to the following steps:

Step 1: Estimate the number of joinpoints using both BIC and BIC3 and call them \(\hat{\kappa_{BIC}}\) and \(\hat{\kappa_{BIC3}}\), respectively.

Step 2: If \(\hat{\kappa_{BIC}}\) = \(\hat{\kappa_{BIC3}}\), then report it as \(\hat{\kappa}\).

Step 3: If \(\hat{\kappa_{BIC}} \ \neq \ \hat{\kappa_{BIC3}}\), compute \(\Delta_{max} \ = \ \max (\Delta(\hat{\kappa_{BIC}}), \ \Delta(\hat{\kappa_{BIC3}}))\) and \(\Delta_{min} \ = \ \min (\Delta(\hat{\kappa_{BIC}}), \ \Delta(\hat{\kappa_{BIC3}}))\).

Step 4: Use BIC if \(\Delta_{max} \ \leq \ c\) or \(\Delta_{diff} \ = \ \Delta_{max} - \ \Delta_{min} \ > \ d\) (that is, \(\hat{\kappa} = \hat{\kappa_{BIC}}\)), and use BIC3 otherwise (that is, \(\hat{\kappa} = \hat{\kappa_{BIC3}}\)).
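The decision rule in Steps 2 to 4 can be sketched in a few lines of Python. This is only an illustration of the logic as stated above, not Joinpoint's code. The function name `select_by_dds` and its arguments are hypothetical, and the BIC and BIC3 estimates, together with their \(\Delta\) values, are assumed to have been computed beforehand. The defaults c=10 and d=200 are the cutoff values recommended on this page.

```python
def select_by_dds(k_bic, k_bic3, delta_bic=None, delta_bic3=None,
                  c=10.0, d=200.0):
    """Return (estimated number of joinpoints, name of the rule applied).

    k_bic, k_bic3         : joinpoint counts selected by BIC and BIC3 (Step 1)
    delta_bic, delta_bic3 : Delta(k_bic) and Delta(k_bic3); only needed
                            when the two counts disagree
    c, d                  : pre-specified cutoff values
    """
    # Step 2: if both criteria agree, report the common value.
    if k_bic == k_bic3:
        return k_bic, "BIC=BIC3"

    if delta_bic is None or delta_bic3 is None:
        raise ValueError("Delta values are required when BIC and BIC3 disagree")

    # Step 3: larger and smaller of the two Delta values.
    delta_max = max(delta_bic, delta_bic3)
    delta_min = min(delta_bic, delta_bic3)

    # Step 4: use BIC if Delta_max <= c or Delta_max - Delta_min > d,
    # and use BIC3 otherwise.
    if delta_max <= c or (delta_max - delta_min) > d:
        return k_bic, "BIC"
    return k_bic3, "BIC3"
```

For instance, with the default cutoffs, `select_by_dds(3, 1, delta_bic=8.0, delta_bic3=5.0)` follows BIC because \(\Delta_{max} \leq c\), while `select_by_dds(3, 1, delta_bic=50.0, delta_bic3=30.0)` follows BIC3 because neither condition in Step 4 holds.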

Based on a simulation study that examined the performance of the new model selection procedure under various choices of c and d, we recommend using c=10 and d=200. Among the values of c and d considered, these choices were observed to perform best with respect to the goal of being at least as good as BIC3 and improving on BIC3 when BIC performs better than BIC3. Further details can be found in a technical report that is available upon request.

 

References