Number of Permuted Data Sets

How many permuted data sets should I use?

Answer:

To get p-values that vary less from run to run when the random number seed is changed, we suggest using at least 4499 permutations. For this reason, the default number of permutations in the current version of Joinpoint is 4499 (it was 999 in previous versions).

If the maximum number of joinpoints is set to 3 and the minimum is set to 0, three statistical tests are performed, each at the Bonferroni-adjusted significance level of .05/3 = .0167. The value 4499 was chosen so that if a run using one seed and 4499 permutations gives a p-value of .0167, then, assuming the number of possible permutations is large, the p-value from a complete run using all possible permutations would have approximately a 99% chance of lying between .0120 and .0220 (length of confidence interval = .0100) and approximately a 95% chance of lying between .0129 and .0206 (length of confidence interval = .0077). The number of permutations the user selects is therefore a tradeoff between computer time and consistency of the p-values obtained.
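The interval arithmetic above can be checked roughly by treating the Monte Carlo p-value as a binomial proportion estimated from N permuted data sets and using a normal approximation. This is a minimal sketch under that assumption; the function name `mc_pvalue_interval` is hypothetical, and the page's own figures were presumably computed by a somewhat different method, so the results agree only approximately.

```python
import math
from statistics import NormalDist


def mc_pvalue_interval(p_hat, n_perm, level):
    """Approximate interval for the all-permutations p-value, given a Monte Carlo
    estimate p_hat from n_perm permuted data sets (normal approximation, assumed)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)      # two-sided critical value
    se = math.sqrt(p_hat * (1 - p_hat) / n_perm)   # binomial standard error
    return p_hat - z * se, p_hat + z * se


# Bonferroni-adjusted cutoff for 3 tests at an overall level of .05
cutoff = 0.05 / 3
print(f"cutoff = {cutoff:.4f}")  # cutoff = 0.0167

for level in (0.99, 0.95):
    lo, hi = mc_pvalue_interval(0.0167, 4499, level)
    print(f"{level:.0%}: ({lo:.4f}, {hi:.4f}), length {hi - lo:.4f}")
# 99%: (0.0118, 0.0216), length 0.0098
# 95%: (0.0130, 0.0204), length 0.0075
```

The approximate intervals are close to the quoted ones (.0120 to .0220 and .0129 to .0206); the small differences are consistent with the page using an interval that is not symmetric about the estimate.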

The Joinpoint program uses Monte Carlo simulation to calculate p-values for a series of permutation tests. See Permutation Test Details for more on the permutation tests performed.
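To illustrate what a Monte Carlo permutation p-value is, here is a minimal sketch for a generic one-sided test of a difference in two group means. It is a swapped-in textbook example, not Joinpoint's actual test statistic (Joinpoint permutes errors from fitted models); the function names and the (count + 1) / (N + 1) estimator are assumptions chosen for illustration.

```python
import random


def mean_diff(x, y):
    """Test statistic: difference in group means."""
    return sum(x) / len(x) - sum(y) / len(y)


def mc_permutation_pvalue(x, y, n_perm, seed):
    """Hypothetical sketch: one-sided Monte Carlo permutation p-value."""
    rng = random.Random(seed)            # seeded generator gives repeatable shuffles
    observed = mean_diff(x, y)
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)              # one permuted data set
        stat = mean_diff(pooled[:len(x)], pooled[len(x):])
        # Small tolerance so a rearrangement of the observed split still counts
        # despite floating-point rounding in the sums.
        if stat >= observed - 1e-12:
            count += 1
    # Count the observed data as one of the n_perm + 1 arrangements.
    return (count + 1) / (n_perm + 1)


x = [5.1, 4.8, 6.2, 5.9, 6.4]   # made-up group 1
y = [4.2, 4.9, 3.8, 4.5, 4.1]   # made-up group 2
print(mc_permutation_pvalue(x, y, n_perm=999, seed=1))
```

For these made-up numbers only 2 of the 252 ways to split the 10 values into two groups of 5 give a difference at least as large as the observed one, so the exact permutation p-value is 2/252 ≈ 0.008; the Monte Carlo estimate should land near that, with the precision governed by `n_perm`.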

Here we discuss the implications of the choice of the number of permuted data sets (N). The program runs faster with smaller values of N, but larger values of N give a more precise p-value. A larger N also reduces the probability that another analysis of the same data, run with different random number generator seeds, would get a different answer. (Computer programs produce pseudo-random numbers using algorithms that mimic randomness; we use these numbers to shuffle, or permute, the errors. The algorithms are started from one or more seeds, and reusing the same seeds reproduces the same sequence of pseudo-random numbers.)
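The effect of N on run-to-run variability can be simulated directly. This sketch assumes that each permuted statistic independently exceeds the observed one with a fixed probability `p_true` (so the exceedance count is binomial), and reruns the estimate under 200 different seeds for each N; the function name and the choice of `p_true = 0.0167` are illustrative assumptions.

```python
import random
import statistics


def simulated_pvalues(p_true, n_perm, n_seeds):
    """Monte Carlo p-value estimates obtained under n_seeds different seeds,
    assuming each permutation exceeds the observed statistic with prob. p_true."""
    estimates = []
    for seed in range(n_seeds):
        rng = random.Random(seed)
        count = sum(rng.random() < p_true for _ in range(n_perm))
        estimates.append((count + 1) / (n_perm + 1))
    return estimates


for n_perm in (99, 999, 4499):
    est = simulated_pvalues(0.0167, n_perm, n_seeds=200)
    print(f"N={n_perm:>5}: min {min(est):.4f}, max {max(est):.4f}, "
          f"sd {statistics.stdev(est):.5f}")
```

The spread of estimates across seeds shrinks roughly in proportion to 1/sqrt(N), which is why a larger N makes it less likely that two runs with different seeds land on opposite sides of a significance cutoff.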

This program addresses the problem of two analyses of the same data getting different answers by using default random number generator seeds. As long as no parameters are changed (including the random number generator seed and N), repeated analyses will produce the same results. However, two runs of the same analysis that differ only in their seeds could get different answers.
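The reproducibility described above comes from the seed fully determining the pseudo-random sequence. A minimal sketch using Python's `random.Random` (the helper name and the residual values are hypothetical, and Joinpoint's own generator may differ):

```python
import random


def permuted_order(values, seed):
    """Return one shuffled copy of values, using a generator started from seed."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


residuals = [0.4, -1.2, 0.7, 0.1, -0.3, 0.9, -0.6]  # hypothetical values
a = permuted_order(residuals, seed=12345)
b = permuted_order(residuals, seed=12345)
c = permuted_order(residuals, seed=54321)
print(a == b)  # True: same seed, same permutation
print(a == c)  # usually False: a different seed gives a different permutation
```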

To give an idea of how results could change for someone using different random number generator seeds, we list some confidence intervals for p-values below.

[Tables: confidence intervals for p-values with N = 99, 999, 9999, and 99999 permuted data sets]
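Rough intervals of this kind can be computed for any N with a normal approximation to the binomial. This is a sketch, not the method behind the original tables: the example p-values 0.01 and 0.05 are arbitrary choices, the lower bound is clipped at 0, and the approximation is crude for small N and small p (for example N = 99 with p = 0.01).

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def approx_interval(p_hat, n_perm, z=Z95):
    """Approximate 95% interval for the all-permutations p-value (assumed method)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n_perm)
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


for n_perm in (99, 999, 9999, 99999):
    for p_hat in (0.01, 0.05):
        lo, hi = approx_interval(p_hat, n_perm)
        print(f"N={n_perm:>6}  p={p_hat:.2f}  approx. 95% interval ({lo:.4f}, {hi:.4f})")
```

Each tenfold increase in N narrows the interval by a factor of about sqrt(10) ≈ 3.2, at roughly ten times the computing time.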