Head-Bang: Frequently Asked Questions

Is there a suggested citation for the Head-Bang software?
What is the best number of nearest neighbor points (NN) and triples to be used?
What is an appropriate weight variable?
How does the algorithm smooth at the edges of the map? If I have data for areas beyond the boundaries of my map, should I use that?
Is there a recommended statistical method to compute/display the "potential distortion" introduced when ratios are "smoothed" using the Headbang software?
If the crude ratios do not have a normal distribution, should the crude ratios be "standardized" (log transformation) prior to using Headbang?

1. Is there a suggested citation for the Head-Bang software?

A citation for Head-Bang, indicating the software version, is recommended. See Suggested Citation on Head-Bang's help menu for the citation specific to your version of the program. The general format is:

Hansen Simonson and Statistical Research and Applications Branch, NCI. Headbang software (surveillance.cancer.gov/headbang) version <version>.

2. What is the best number of nearest neighbor points (NN) and triples to be used?

In our study (Mungiole and Pickle, 1999), we found that you do not gain much by going beyond NN=6. However, there are odd cases where one unit is an outlier and affects the smooth; in these cases, NN=8 or NN=10 seems better. Simonson, the original author of the program, suggested setting the number of triples to two-thirds of NN (NTRIP = 2/3 x NN), and we found no reason to deviate from that in our study.

Setting NN too high, say to a number that would make half of all geographic units neighbors of each place, could smooth the original value toward the values of places a great distance away, places that would not be considered neighbors in the usual sense of the term. The user should also note that neighbors are chosen purely by nearness: there is no adjacency requirement and no maximum distance. For example, for US data, values in Alaska would be smoothed by finding neighbors for triples in Washington, Oregon, Idaho and beyond.

Another consideration in choosing the number of nearest neighbors is the total number of places in the dataset. For example, NN=30 would be too high for smoothing state values (n=51) but might be a good choice for county values (n=3047 or so). When smoothing a new type of dataset for the first time, it is a good idea to pick a few spots on the map and identify their 8, 10, 20, and 30 nearest neighbors. This will help you decide how many places might reasonably be considered similar neighbors for smoothing purposes.
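The neighbor check suggested above can be done outside Head-Bang before choosing NN. The following Python lines are a minimal sketch, not Head-Bang's own neighbor search: they assume each place is represented by a centroid in projected (planar) coordinates and rank the other places by straight-line distance. The function name nearest_neighbors and the sample coordinates are hypothetical.

```python
import math

def nearest_neighbors(centroids, target, k):
    """Return the names of the k places closest to `target`.

    Distance is straight-line distance between centroids, which is an
    assumption of this sketch; Head-Bang's own search may differ.
    """
    tx, ty = centroids[target]
    ranked = sorted(
        (math.hypot(x - tx, y - ty), name)
        for name, (x, y) in centroids.items()
        if name != target
    )
    return [name for _, name in ranked[:k]]

# Hypothetical centroid coordinates in arbitrary units, for illustration only.
centroids = {
    "A": (0.0, 0.0),
    "B": (1.0, 0.0),
    "C": (0.0, 2.0),
    "D": (3.0, 3.0),
    "E": (10.0, 10.0),
}

# List the neighbors of one spot for several candidate values of NN.
for k in (2, 3, 4):
    print(k, nearest_neighbors(centroids, "A", k))
```

If the list for a candidate NN already includes places you would not regard as similar to the target (place "E" in this toy example), that value of NN is probably too high for the dataset.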

3. What is an appropriate weight variable?

We use population weights because the rate variance is proportional to 1/population. If you are smoothing another type of statistic, an appropriate weight would be inversely proportional to its variance or standard deviation.
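As an illustration of why population works as a weight for rates, the short Python sketch below computes weights proportional to 1/variance. The function name, the sample populations, and the rescaling of the weights to sum to 1 are assumptions made for readability; nothing here is taken from Head-Bang's input requirements.

```python
def inverse_variance_weights(variances):
    """Return weights proportional to 1/variance, rescaled to sum to 1."""
    raw = [1.0 / v for v in variances]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical populations for three places.
populations = [1000, 4000, 5000]

# For a rate, the variance is proportional to 1/population,
# so 1/variance is proportional to population.
rate_variances = [1.0 / p for p in populations]

weights = inverse_variance_weights(rate_variances)
for p, w in zip(populations, weights):
    print(p, round(w, 3))
```

For another statistic, the same function could be given that statistic's estimated variances in place of rate_variances.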

4. How does the algorithm smooth at the edges of the map? If I have data for areas beyond the boundaries of my map, should I use that?

The algorithm reflects existing data out beyond the boundaries in order to construct triples. That is, it does something reasonable to get enough data to apply the algorithm, but if you have actual data outside the area of interest, use that instead of throwing it away.

5. Is there a recommended statistical method to compute/display the "potential distortion" introduced when ratios are "smoothed" using the Headbang software?

See: Nandram B, Sedransk J, Pickle L. Bayesian analysis and mapping of mortality rates for chronic obstructive pulmonary disease. J Amer Stat Assoc 2000;95:1110-18.

This paper shows the results of a Bayesian analysis of COPD rates. Although the analysis is Bayesian, the display method is useful more generally: the paper shows the variation among the MCMC samples in a map that color-codes the percentage of samples in which a place's category differed by a given amount from its category on the mean map. For example, if NYC fell into category 4 of 5 on the mean map, what percentage of the time did it fall in category 1, 2, 3, or 5? The figures in the article show the proportion of replicates in which a place's rate fell into a quintile color category at least two categories away from its category on the mean map. If you do not care so much about where the greatest smoothing occurs, you can always construct a table of "before" and "after" quintiles: the counts on the diagonal are the numbers of units categorized into the same quintile before and after smoothing. In general, the more sparsely populated places are smoothed the most, i.e., they are the most likely to be in a different category after smoothing than before.
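The "before" and "after" quintile table can be built with a few lines of Python. This is a minimal sketch under stated assumptions: quintiles are assigned by rank so that each category holds about a fifth of the units (ties are broken by input order), and the crude and smoothed values are made-up numbers, not output from Head-Bang.

```python
def quintile_categories(values):
    """Assign each value a category from 1 to 5 by rank.

    Each category receives about one fifth of the units.
    Ties are broken by input order, an assumption of this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    categories = [0] * n
    for rank, i in enumerate(order):
        categories[i] = rank * 5 // n + 1
    return categories

def cross_tab(before, after, k=5):
    """Count units by (before category, after category)."""
    table = [[0] * k for _ in range(k)]
    for b, a in zip(before, after):
        table[b - 1][a - 1] += 1
    return table

# Hypothetical crude and smoothed values for ten places.
crude = [2.0, 9.5, 4.1, 7.3, 5.0, 1.2, 8.8, 3.3, 6.1, 10.4]
smoothed = [3.0, 8.0, 4.0, 6.5, 5.2, 4.5, 7.9, 3.8, 6.0, 8.5]

table = cross_tab(quintile_categories(crude), quintile_categories(smoothed))
for row in table:
    print(row)

# Units that stayed in the same quintile lie on the diagonal.
unchanged = sum(table[i][i] for i in range(5))
print("unchanged:", unchanged, "of", len(crude))
```

Rows are the quintiles before smoothing and columns the quintiles after, so off-diagonal counts far from the diagonal indicate places whose category moved the most.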

6. If the crude ratios do not have a normal distribution, should the crude ratios be "standardized" (log transformation) prior to using Headbang?

Standardization should not be necessary, since Head-Bang is based on medians. Thus, any monotonic transformation of the data, such as taking logs, should give the same results once the smoothed values are transformed back to the original scale.
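The invariance of a median under a monotonic transformation can be checked directly. The Python sketch below illustrates it for a single plain median of made-up values; it is not a run of the Head-Bang algorithm. An odd number of values is used on purpose: with an even number, the median averages the two middle values, and the agreement is then only approximate.

```python
import math
import statistics

# Hypothetical skewed rates (odd count, so the median is an actual data value).
rates = [0.8, 1.1, 1.5, 2.4, 9.0]

# Median taken on the original scale.
median_raw = statistics.median(rates)

# Median taken on the log scale, then transformed back.
median_via_logs = math.exp(statistics.median([math.log(r) for r in rates]))

print(median_raw, median_via_logs)
print(math.isclose(median_raw, median_via_logs))
```

The two medians agree up to floating-point rounding, whereas the mean of the logs, transformed back, would not equal the mean of the rates.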