
Using SEER*Stat to Create Joinpoint Input Data Files

Joinpoint requires a specific file format for the input data file. Therefore, the following export options must be used in SEER*Stat when creating data files to be used in Joinpoint:

  • The data file must be a text file (use a .txt extension) or a gzipped text file (use a .gz extension)
  • The Output Variables option must be set to "Numeric Representation"
  • The Line Delimiter option must be set to "DOS/Windows (CR/LF)"
  • The Field Delimiter may be any of the available choices. However, if commas are used as the field delimiter, the "Remove All Thousand Separators (Commas)" option must be checked.
  • "Remove Flags (Footnote Characters)" must be selected.
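As a rough pre-flight check, several of the export rules above can be approximated in a short Python sketch. The helpers `check_export_bytes` and `check_export_file` are hypothetical and are not part of SEER*Stat or Joinpoint. The sketch assumes that, once "Numeric Representation" is selected and flags are removed, every field should parse as a number, and it uses a tab as the default delimiter purely as an illustrative choice.

```python
import gzip
from pathlib import Path


def check_export_bytes(data: bytes, delimiter: str = "\t") -> list[str]:
    """Return a list of problems found in raw export bytes (empty if none)."""
    problems = []
    text = data.decode("latin-1")
    lines = text.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()  # the final record may end with CR/LF
    # A leftover bare LF or CR means the line delimiter was not CR/LF.
    if any("\n" in line or "\r" in line for line in lines):
        problems.append("line delimiter is not DOS/Windows (CR/LF)")
        lines = text.splitlines()
    widths = {len(line.split(delimiter)) for line in lines}
    if len(widths) > 1:
        # Unremoved thousands separators in a comma-delimited file
        # typically show up as records with extra fields.
        problems.append(f"records have differing field counts: {sorted(widths)}")
    for number, line in enumerate(lines, start=1):
        for field in line.split(delimiter):
            # Assumption: every field is numeric once flags are removed.
            try:
                float(field)
            except ValueError:
                problems.append(f"record {number}: non-numeric field {field!r}")
                break
    return problems


def check_export_file(path, delimiter: str = "\t") -> list[str]:
    """Check a .txt or .gz export file on disk."""
    path = Path(path)
    if path.suffix not in (".txt", ".gz"):
        return [f"unexpected extension {path.suffix!r}; use .txt or .gz"]
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as fh:  # binary mode keeps CR/LF intact
        return check_export_bytes(fh.read(), delimiter)
```

An empty list means none of these heuristics fired; it does not guarantee that Joinpoint will accept the file.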

The independent variable (year) must contain individual values, not ranges. We therefore recommend creating a user-defined variable in SEER*Stat that does not contain any ranges. For example, when creating a new "Year of Diagnosis" variable, remove the range value for all years combined (e.g., 1973-2004). When a SEER*Stat dictionary file (*.DIC) is used, any independent variable value whose format label contains a dash (-) is not used in the analysis, and its label is updated to include the text "(not used in calculations)". This lets users exclude totals and sub-totals from the independent variable without having to rerun their SEER*Stat analyses.
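The dash rule for dictionary files can be pictured with the hypothetical sketch below. It is not SEER*Stat or Joinpoint code: it assumes the format labels are available as a mapping from numeric code to label, and it simply appends the note to every excluded label.

```python
# Hypothetical illustration of the dash rule; not SEER*Stat or Joinpoint code.
NOT_USED_NOTE = " (not used in calculations)"


def apply_dash_rule(labels: dict[int, str]) -> tuple[dict[int, str], dict[int, str]]:
    """Split independent-variable labels into usable values and relabeled ones.

    Assumes `labels` maps each numeric code to its format label, e.g.
    {0: "1973", 1: "1974", 31: "1973-2004"}.
    """
    usable = {}
    relabeled = {}
    for code, label in labels.items():
        if "-" in label:
            # Ranges such as totals or sub-totals are excluded from the fit.
            relabeled[code] = label + NOT_USED_NOTE
        else:
            usable[code] = label
            relabeled[code] = label
    return usable, relabeled
```

With the example mapping above, only codes 0 and 1 remain usable, and code 31 is relabeled "1973-2004 (not used in calculations)".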

The records for each by-group must be contiguous: if N is the number of records in the first by-group, lines 1 through N must contain all records for the first by-group, and so on for each following by-group. Within each by-group, the records must be sorted by the independent variable (typically year).
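Both ordering requirements can be verified before a file is loaded with a hypothetical helper such as the one below. It assumes the records have already been parsed into dictionaries; the key names used in the example (`sex`, `year`) are illustrative only.

```python
def check_by_group_order(records, by_keys, year_key):
    """Return None if the order is valid, else a description of the first problem.

    Checks that each by-group occupies one contiguous run of records and
    that the independent variable does not decrease within a by-group.
    """
    seen = set()
    previous_group = None
    previous_year = None
    for index, record in enumerate(records, start=1):
        group = tuple(record[key] for key in by_keys)
        if group != previous_group:
            # A by-group that reappears after another one is not contiguous.
            if group in seen:
                return f"record {index}: by-group {group} is not contiguous"
            seen.add(group)
            previous_group = group
            previous_year = None
        year = record[year_key]
        if previous_year is not None and year < previous_year:
            return f"record {index}: year {year} is out of order in by-group {group}"
        previous_year = year
    return None
```

For example, records ordered (sex 1, 2000), (sex 2, 2000), (sex 1, 2001) are reported as not contiguous at record 3.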

To make sure the sort order is correct, put all table variables in the row dimension on the SEER*Stat Table tab, and make the independent variable (year) the last variable in the list of row variables. Alternatively, you can set the independent variable as the column variable.
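Putting the independent variable last in the row dimension works because it makes year the innermost sort key: rows are ordered by the by-group variables first, which keeps each by-group contiguous, and then by year within each by-group. The same ordering can be reproduced on already-exported records with a compound sort key; `sort_for_joinpoint` is a hypothetical helper, not part of either program.

```python
def sort_for_joinpoint(records, by_keys, year_key):
    """Order records by the by-group variables, with year as the last key.

    Earlier keys in `by_keys` vary slowest, mirroring the order of the
    row variables in a SEER*Stat table; year varies fastest.
    """
    return sorted(
        records,
        key=lambda record: tuple(record[key] for key in by_keys) + (record[year_key],),
    )
```

With `by_keys=["sex"]`, records arrive grouped as all sex 1 years in ascending order, followed by all sex 2 years in ascending order.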

The SEER Cancer Statistics Review (CSR) uses the same Joinpoint Regression Program. The CSR analyses use the "Standard Error" setting for the Heteroscedastic Errors Option, with the standard error of the rate used as an estimate of the standard deviation. With these options and the current default number of permutations (4499), if the Joinpoint Regression Program selects the model with 0 joinpoints, the annual percent change (APC) will agree with the value given in the CSR.
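For the 0-joinpoint case, the annual percent change comes from a straight-line fit of the log rate on year: if b1 is the fitted slope, APC = 100 * (exp(b1) - 1). The sketch below is an illustration under stated assumptions, not the Joinpoint Regression Program's own code. It assumes a log-linear model and weights each point by 1 / SE(ln rate)^2, approximating SE(ln rate) by SE(rate) / rate (the delta method). The function name `apc_zero_joinpoints` is hypothetical.

```python
import math


def apc_zero_joinpoints(years, rates, std_errors):
    """APC from a weighted log-linear fit with no joinpoints (sketch).

    Assumption: weights are 1 / SE(ln rate)^2 with SE(ln rate) ~= SE / rate.
    Requires at least two distinct years and positive rates and SEs.
    """
    ys = [math.log(rate) for rate in rates]
    weights = [(rate / se) ** 2 for rate, se in zip(rates, std_errors)]
    w_sum = sum(weights)
    # Weighted means of year and log rate.
    x_bar = sum(w * x for w, x in zip(weights, years)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    # Weighted least-squares slope of ln(rate) on year.
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, years, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, years))
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)
```

For rates that grow by exactly 2% per year, the function returns approximately 2.0 whatever the standard errors are, because an exact log-linear series is fitted perfectly under any set of positive weights. With real data, the weights shift the fitted slope toward years whose rates are estimated more precisely.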

Note: To obtain the SEER*Stat program, visit the SEER web page at:

http://seer.cancer.gov/seerstat/