Importing SEER*Stat Data into DevCan
Exercise 1

In this exercise, you will use data from 2000-2002 to calculate, by race, the probability of a female developing or dying of malignant breast cancer.

Key Points and Reminders

  • This exercise describes only the steps needed to get data regarding one cancer site.

Step 1: Prepare the Cancer Incidence data

  1. Start SEER*Stat.
  2. Start a new Rate Session.
  3. On the Data tab, select the database "Incidence - SEER 13 Regs Research Data, Nov 2004 Sub for Expanded Races (1992-2002)".
  4. On the Statistic tab, select Rates (Crude) as your type of statistic.
  5. Go to the Selection tab.
  6. Edit the Race, Sex, Year Dx, Registry, County (Pop, Case Files) selection statement to read:
    {Race, Sex, Year Dx, Registry, County.Sex} = 'Female'
  7. Edit the Other (Case Files) selection statement to read:
    {Site and Morphology.Site rec with Kaposi and mesothelioma} = 'Breast'
  8. Mark both the Select Only Malignant Behavior and Select Only the First Matching Record for Each Person checkboxes. Now your selection definition is set up to find only the first malignant breast cancer for each female in the database.
  9. Go to the Table tab.
  10. Open the File menu and select Dictionary.
  11. Open the Race, Sex, Year Dx, Registry, County folder.
  12. Create a User-Defined variable based on "Race Recode Y". It should have three groupings: "All Races" (which includes all available values), "White", and "Black". Call it "Race recode Y (All, White, Black)". If desired, save the variable to the dictionary. (Learn more about saving this exercise's variables to the dictionary.)
  13. Re-open the Race, Sex, Year Dx, Registry, County folder and create a User-Defined variable based on "Year of Diagnosis". It should have one grouping, "2000-2002", which includes those three years. Call it "Year Dx (2000-02 1 group only) for DevCan". (Learn more about creating a year variable with only one grouping.)
  14. Close the dictionary.
  15. Arrange the variables on the Table tab as follows:
    • Page
      • Year Dx (2000-02 1 group only) for DevCan
    • Row
      • Race recode Y (All, White, Black)
    • Column
      • Age Recode with <1 year olds
    Note: Do not arrange the variables in a different order. (Learn more about the order of variables on the Table tab.)
  16. Go to the Output tab.
  17. Enter a title for the matrix.
  18. Choose to Display Rates as Cases Per 100,000.
  19. Execute the session. Two warnings will be displayed.
    • The first one will warn you that the "Year Dx (2000-02 1 group only) for DevCan" variable has only one grouping defined. This is expected in this case. (Learn more about creating a year variable with only one grouping.) Click Yes to continue.
    • The second will warn you that when First Matching Record Selection is used, the population will not be adjusted to account for people who are no longer at risk. You may safely ignore this warning because DevCan has been designed to compensate for it. Click OK to continue. (Note: You have the option to suppress future displays of this warning. If you have done so before, you will not see it this time.)
    • If you get warnings or error messages other than these, go back and check that you have followed all steps correctly.
  20. After calculation, your matrix is displayed in a new window. Save it with a filename that identifies it as the Cancer Incidence matrix. Compare it to ssdc1_cancer_incidence.sim if necessary.
  21. Open the Matrix menu. Select Export, then Text File. Set up the options as follows. (Learn more about export settings for DevCan.)
    • Output Variables as: Numeric Representation
    • Line Delimiter: DOS/Windows (CR/LF)
    • Missing Character: Space
    • Field Delimiter: Tab
    • Check the boxes to Remove All Thousands Separators (Commas) and Remove Flags (Footnote), Prefix and Suffix Characters. Leave the other checkboxes unmarked.
  22. Export the matrix with a filename that identifies it as the Cancer Incidence data.
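The export options in step 21 determine how the text file can be read back. The Python sketch below parses one exported row under those settings. It is an illustration only: it assumes a hypothetical row layout of tab-separated numeric values, a space for a missing value, and a CR/LF line ending. The function name `parse_export_line` and the sample rows are made up and are not part of SEER*Stat or DevCan.

```python
from __future__ import annotations


def parse_export_line(line: str) -> list[float | None]:
    """Split one exported row into numbers (hypothetical layout).

    Assumes the step 21 settings: Tab field delimiter, CR/LF line ending,
    a space for missing values, and no thousands separators or flags.
    """
    fields = line.rstrip("\r\n").split("\t")
    values: list[float | None] = []
    for field in fields:
        text = field.strip()
        # A field that is blank after stripping is treated as missing.
        values.append(float(text) if text else None)
    return values


if __name__ == "__main__":
    # Numeric codes, a count, a missing value, and a rate.
    print(parse_export_line("0\t2\t1234\t \t56.7\r\n"))  # [0.0, 2.0, 1234.0, None, 56.7]
    # A thousands separator makes the field unparseable as a number.
    try:
        parse_export_line("0\t2\t1,234\r\n")
    except ValueError as err:
        print("unparseable:", err)
```

A value written as "1,234" cannot be converted to a number, which is one reason to check the Remove All Thousands Separators box.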

Learn More...

  • Saving this exercise's variables to the dictionary: Some of the variables you create in this exercise will be used again in Exercise 2, and may be useful at other times as well. If you want to save them for later use, mark the Save to Dictionary box before you close the Edit Variable window.
  • Creating a year variable with only one grouping: Ordinarily in SEER*Stat, we use a selection statement when we are interested in just a single grouping. Here, we use a table variable instead, which is usually not recommended, because of how first matching record selection works. If we restricted our selection to the years 2000-2002 and marked the "Select Only the First Matching Record for Each Person" box, SEER*Stat would find, for each person, the first breast cancer diagnosed between 2000 and 2002. However, we want a person's breast cancer from that period only if it was also their first breast cancer ever. In other words, we are looking for first-ever breast cancers that happened to be diagnosed in 2000-2002. So we select all first breast cancers, but display (and export) only those in our desired time frame.
  • The order of variables on the Table tab: DevCan expects the variables in data files to be arranged in a specific order. The age variable must be at the bottom of the list on the Table tab. Variables that correspond to one another (e.g., "Age at Diagnosis" and "Age at Death", or "Site Recode" and "Cause of Death") must occupy the same position in each file (i.e., the variables must be in the same order from top to bottom).
  • Export settings for DevCan: The same export settings must be used whenever you export data from SEER*Stat for use in DevCan. If this is your primary use of SEER*Stat, you may want to click Set Default so that you do not have to re-enter them each time.
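The reasoning in "Creating a year variable with only one grouping" comes down to the order of two operations: picking each person's first breast cancer, and restricting to 2000-2002. The Python sketch below contrasts the two orders on three made-up records. It treats "first" as the earliest year of diagnosis and uses hypothetical names (`Tumor`, `first_per_person`, `in_window`); it is not how SEER*Stat implements first matching record selection.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tumor:
    person_id: int
    year_dx: int  # year of diagnosis


def first_per_person(tumors: list[Tumor]) -> list[Tumor]:
    """Keep each person's earliest breast cancer (ties keep input order)."""
    first: dict[int, Tumor] = {}
    for t in sorted(tumors, key=lambda t: t.year_dx):
        first.setdefault(t.person_id, t)
    return list(first.values())


def in_window(tumors: list[Tumor], start: int = 2000, end: int = 2002) -> list[Tumor]:
    """Keep tumors diagnosed from start through end, inclusive."""
    return [t for t in tumors if start <= t.year_dx <= end]


tumors = [
    Tumor(person_id=1, year_dx=1995),  # person 1: first-ever breast cancer before 2000
    Tumor(person_id=1, year_dx=2001),  # person 1: second breast cancer, inside the window
    Tumor(person_id=2, year_dx=2001),  # person 2: first-ever breast cancer, inside the window
]

# Restrict years first (like a selection statement), then take first records.
restrict_then_first = first_per_person(in_window(tumors))
# Take first-ever records, then keep 2000-2002 (like the one-grouping table variable).
first_then_restrict = in_window(first_per_person(tumors))

print(sorted(t.person_id for t in restrict_then_first))  # [1, 2]
print(sorted(t.person_id for t in first_then_restrict))  # [2]
```

Only the second order, which this exercise reproduces with the one-grouping year variable, leaves out person 1, whose first breast cancer was diagnosed before 2000.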

Step 2: Prepare the Cancer Mortality data

  1. Start a new Rate Session.
  2. On the Data tab, select the database "Mortality - All COD, Public-Use With State, Total U.S. for Expanded Races (1990-2002)".
  3. On the Statistic tab, select Rates (Crude) as your type of statistic.
  4. Go to the Selection tab.
  5. Edit the Race, Sex, Year Dth, State, Registry (Pop, Case Files) selection statement to read as follows:
    {Race, Sex, Year Dth, State, Registry.Sex} = 'Female'
    AND {Race, Sex, Year Dth, State, Registry.Year of death} = '2000','2001','2002'
    AND ({Race, Sex, Year Dth, State, Registry.SEER registry} = 'San Francisco-Oakland SMSA','Connecticut','Detroit (Metropolitan)','Hawaii','Iowa','New Mexico','Seattle (Puget Sound)','Utah','Atlanta (Metropolitan)','San Jose-Monterey','Los Angeles','Rural Georgia'
    OR ({Race, Sex, Year Dth, State, Registry.SEER registry} = 'Alaska'
    AND {Race, Sex, Year Dth, State, Registry.Race recode Y} = 'American Indian/Alaska Native'))
    Note that the SEER 13 registries are not listed consecutively in SEER*Stat; double-check that you have selected the right ones. Also note the placement of the parentheses in the last three lines of the statement. Because the Alaska Native Tumor Registry collects cancer incidence data only for patients whose race is "American Indian/Alaska Native", you must narrow the mortality selection statement so that cancer deaths among other races in Alaska are not included in the analysis.
  6. Edit the Other (Case Files) selection statement to read:
    {Site and Morphology.Cause of death recode} = 'Breast'
    Now your analysis will include all females in the SEER 13 Registries who died from breast cancer between 2000 and 2002.
  7. Go to the Table tab.
  8. Open the File menu and click Dictionary.
  9. Open the Race, Sex, Year Dth, State, Registry folder.
  10. Just as you did in the Incidence session, create a User-Defined variable based on "Race Recode Y". It should have three groupings: "All Races" (which includes all available values), "White", and "Black". Call it "Race recode Y (All, White, Black)". As before, you may choose to save this variable to the dictionary.
  11. Create another User-Defined variable based on "Year of Death". Like the year of diagnosis variable you created in the Incidence session, it should have only one grouping, "2000-2002", that includes those three years. Call it "Yr Dth (2000-02 1 group only) for DevCan". (Learn more about the year variable in the mortality sessions.)
  12. Close the dictionary.
  13. Arrange the variables on the Table tab as follows:
    • Page
      • Yr Dth (2000-02 1 group only) for DevCan
    • Row
      • Race recode Y (All, White, Black)
    • Column
      • Age Recode with <1 year olds
  14. Go to the Output tab.
  15. Enter a title for the matrix.
  16. Choose to Display Rates as Cases Per 100,000.
  17. Execute the session. You will receive the same warning about the year variable having only one grouping as you did when executing the Cancer Incidence session. Click Yes to continue.
  18. After calculation, your matrix is displayed in a new window. Save it, in the same place as the previous matrix, with a filename that identifies it as the Cancer Mortality matrix. Compare it to ssdc1_cancer_mortality.sim if necessary. Do not close the session window.
  19. Open the Matrix menu. Select Export, then Text File.
  20. Edit the settings the same way as before.
  21. Export the matrix with a filename that identifies it as the Cancer Mortality data.
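The selection statement in step 5 mixes AND and OR, so the parentheses decide which deaths are kept. The Python sketch below mirrors that statement as a predicate over a single death record. The record keys (`sex`, `year_of_death`, `registry`, `race`), the function names, and the sample records are hypothetical; the registry and race labels are copied from the statement. The second function drops the outer parentheses to show what happens in Python, where `and` binds more tightly than `or`; it is not a claim about how SEER*Stat parses a statement without them.

```python
# Registries named in the first branch of the OR in the step 5 statement.
NON_ALASKA_REGISTRIES = {
    "San Francisco-Oakland SMSA", "Connecticut", "Detroit (Metropolitan)",
    "Hawaii", "Iowa", "New Mexico", "Seattle (Puget Sound)", "Utah",
    "Atlanta (Metropolitan)", "San Jose-Monterey", "Los Angeles",
    "Rural Georgia",
}
AIAN = "American Indian/Alaska Native"
YEARS = {2000, 2001, 2002}


def keep_death(rec: dict) -> bool:
    """Mirror of the step 5 statement (hypothetical record keys)."""
    return (
        rec["sex"] == "Female"
        and rec["year_of_death"] in YEARS
        and (
            rec["registry"] in NON_ALASKA_REGISTRIES
            # Alaska deaths count only for American Indian/Alaska Native people.
            or (rec["registry"] == "Alaska" and rec["race"] == AIAN)
        )
    )


def keep_death_misplaced(rec: dict) -> bool:
    """Same terms without the parentheses around the OR."""
    # Python groups this as (female and year and registry) or (Alaska and AIAN).
    return (
        rec["sex"] == "Female"
        and rec["year_of_death"] in YEARS
        and rec["registry"] in NON_ALASKA_REGISTRIES
        or (rec["registry"] == "Alaska" and rec["race"] == AIAN)
    )


iowa_woman = {"sex": "Female", "year_of_death": 2001, "registry": "Iowa", "race": "White"}
alaska_white_woman = {"sex": "Female", "year_of_death": 2001, "registry": "Alaska", "race": "White"}
alaska_native_man_1995 = {"sex": "Male", "year_of_death": 1995, "registry": "Alaska", "race": AIAN}

print(keep_death(iowa_woman))                        # True
print(keep_death(alaska_white_woman))                # False
print(keep_death(alaska_native_man_1995))            # False
print(keep_death_misplaced(alaska_native_man_1995))  # True
```

With the parentheses in place, Alaska deaths are kept only for American Indian/Alaska Native women who died in 2000-2002; without them, the sketch would also keep an Alaska Native man who died in 1995.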

Learn More...

  • The year variable in the mortality sessions: The mortality sessions do not use first matching record selection, and their selection statement already restricts year of death to 2000-2002, so setting up a year variable with only one grouping doesn't affect the results. However, you must still include this variable in the table so that the mortality files exported by SEER*Stat match the layout of the incidence file.
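Because the incidence and mortality files must line up variable by variable, you may want to compare their Table tab layouts before importing them in Step 4. The Python sketch below does that with a hypothetical helper, `check_layouts`. It takes each file's variables in top-to-bottom order (Page, then Row, then Column), checks that the age variable is last in the incidence file, and checks that each mortality file lists the corresponding variables in the same positions. The pairing of the "Year Dx" and "Yr Dth" variables is supplied by hand; the sketch is an illustration only.

```python
from __future__ import annotations

AGE = "Age Recode with <1 year olds"

# Top-to-bottom Table tab order (Page, Row, Column) used in Steps 1-3.
INCIDENCE = [
    "Year Dx (2000-02 1 group only) for DevCan",
    "Race recode Y (All, White, Black)",
    AGE,
]
MORTALITY = [
    "Yr Dth (2000-02 1 group only) for DevCan",
    "Race recode Y (All, White, Black)",
    AGE,
]
# Hand-written pairing of incidence variables with mortality counterparts;
# variables not listed are expected to appear under the same name.
COUNTERPART = {
    "Year Dx (2000-02 1 group only) for DevCan":
        "Yr Dth (2000-02 1 group only) for DevCan",
}


def check_layouts(incidence: list[str], mortality: list[list[str]]) -> list[str]:
    """Return the problems found; an empty list means the layouts line up."""
    problems = []
    if not incidence or incidence[-1] != AGE:
        problems.append("incidence: age variable is not last")
    expected = [COUNTERPART.get(name, name) for name in incidence]
    for i, layout in enumerate(mortality, start=1):
        if layout != expected:
            problems.append(f"mortality file {i}: order differs from incidence")
    return problems


# Cancer Mortality and All Causes of Mortality share the same layout.
print(check_layouts(INCIDENCE, [MORTALITY, MORTALITY]))  # []
print(check_layouts(INCIDENCE, [MORTALITY[::-1]]))
```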

Step 3: Prepare the All Causes of Mortality data

  1. Return to the session window you used to create the Cancer Mortality matrix. (Learn more about reusing the Cancer Mortality session.) If you have closed this window, you can still retrieve it:
    1. If the Cancer Mortality matrix is not open in SEER*Stat, re-open it. If it is open, click on it to make sure it is the active window.
    2. Open the Matrix menu and select Retrieve Session.
  2. Go to the Selection tab.
  3. Clear the Other (Case Files) statement. Now your matrix will include all deaths of females in the SEER 13 Registries between 2000 and 2002, instead of only deaths from breast cancer.
  4. Go to the Output tab and enter a new title for the matrix.
  5. Execute the session. You will receive the same warning about the year variable having only one grouping as in the previous sessions. Click Yes to continue.
  6. After calculation, your matrix is displayed in a new window. Save it with a filename that identifies it as the All Causes of Mortality matrix. Compare it to ssdc1_all_causes_of_mortality.sim if necessary.
  7. Open the Matrix menu. Select Export, then Text File.
  8. Edit the settings the same way as before.
  9. Export the matrix with a filename that identifies it as the All Causes of Mortality data.

Learn More...

  • Reusing the Cancer Mortality session: The Cancer Mortality session can be reused to build the All Causes of Mortality session, since both cover the same time period and the same population. The only difference is that the All Causes of Mortality data includes all deaths, not just deaths from breast cancer.

Step 4: Import the data into DevCan

  1. Start DevCan.
  2. Open the Database menu and select Import SEER*Stat Data.
  3. Use the Browse buttons to locate the ".dic" files you exported from SEER*Stat in Steps 1-3 (Cancer Incidence, Cancer Mortality, and All Causes of Mortality). Click Execute.
  4. You will be prompted to enter a new database name with which to save this data. You may choose any name that is not already in use by another DevCan database. Click OK when done.
  5. DevCan will ask whether you want to import Counts and Populations from the exported SEER*Stat files. You should click Yes. If you choose to import Rates instead, you will not be able to calculate confidence intervals in DevCan.
  6. DevCan will display a report that shows the names and values of the variables to be imported. If the information in this report is not as you expected, click Cancel to return to DevCan's main window without importing the data, and review this tutorial to make sure you have not made any mistakes. Otherwise, click OK, and the data will be imported and loaded automatically.
  7. The variable "Race recode Y (All, White, Black)" will be listed in the Parameters section. Highlight it. Its possible values -- the groupings you defined when you created the variable in SEER*Stat -- will appear in the Items Selected section. Click and drag to highlight the values you want.
  8. In the Parameters section, highlight the "Year Dx (2000-02 1 group only) for DevCan" variable. It has only one possible value, the "2000-2002" grouping. Click that value to highlight it.
  9. Use the drop-down list on the DevCan toolbar to select how you want the output displayed.
  10. Open the Session menu and select Execute.
  11. Your probabilities will be calculated and displayed. Click the tabs above the output window to switch between different sets of statistics. (Learn more about how these results may differ from the SEER*Explorer.)
  12. If you want to Save and/or Print the reports, use the appropriate commands on the File menu.
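Step 5 recommends importing counts and populations rather than rates. One way to see why: a rate alone no longer records how many people it is based on, and that size drives its uncertainty. The Python sketch below computes a crude rate per 100,000 (the scale chosen on the Output tab) and an approximate standard error that treats the count as Poisson, sqrt(count) / population times 100,000. The counts and populations are made up, and this simple formula illustrates the idea only; it is not the variance method DevCan uses for its confidence intervals.

```python
import math


def crude_rate(count: int, population: int, per: int = 100_000) -> float:
    """Crude rate: events divided by population, scaled to `per` persons."""
    return count / population * per


def poisson_se(count: int, population: int, per: int = 100_000) -> float:
    """Approximate standard error of a crude rate, treating the count as Poisson."""
    return math.sqrt(count) / population * per


# Two hypothetical strata with the same rate but very different sizes.
small = (12, 30_000)
large = (1_200, 3_000_000)

for count, pop in (small, large):
    print(f"rate={crude_rate(count, pop):.1f}  se={poisson_se(count, pop):.2f}")
# rate=40.0  se=11.55
# rate=40.0  se=1.15
```

Both strata have a rate of 40.0 per 100,000, but the standard error for the small stratum is ten times larger. From the rates alone, the two would be indistinguishable.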

Learn More...

  • How this exercise's results may differ from those in the SEER*Explorer: The SEER*Explorer uses SEER*Stat and DevCan to generate these same statistics, but the results you generate in this exercise and the next may differ slightly from the SEER*Explorer's for people aged 85 and older. This is because the SEER*Explorer uses a database that is not publicly available, which contains additional stratifying information for that age group.
Last Updated: 15 Apr, 2021