Is the correction for multiple tests always necessary?

The answer is that multiple testing correction is not always necessary.

Multiple comparisons can be accounted for with Bonferroni and other corrections, or by calculating the False Discovery Rate. But these approaches are not always needed. Here are three situations where special calculations are not needed.

Account for multiple comparisons when interpreting the results rather than in the calculations

Some statisticians recommend never correcting for multiple comparisons while analyzing data (1,2).

Instead, report all of the individual P values and confidence intervals, and make it clear that no mathematical correction was made for multiple comparisons. This approach requires that all comparisons be reported. When you interpret these results, you need to informally account for multiple comparisons. If all the null hypotheses are true, you’d expect 5% of the comparisons to have uncorrected P values less than 0.05. Compare this number to the actual number of small P values.
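The informal check described above can be sketched in a few lines of Python. This is only an illustration: the P values are made up, the function names are hypothetical, and the binomial tail probability additionally assumes that the comparisons are independent, which the text above does not require and which real data often violate.

```python
from math import comb

ALPHA = 0.05

def expected_small_p(n_comparisons, alpha=ALPHA):
    """Expected number of P values below alpha if every null hypothesis is true."""
    return n_comparisons * alpha

def observed_small_p(p_values, alpha=ALPHA):
    """Number of uncorrected P values that fall below alpha."""
    return sum(1 for p in p_values if p < alpha)

def prob_at_least(k, n_comparisons, alpha=ALPHA):
    """Chance of seeing k or more P values below alpha when all nulls are true.

    Assumption (not from the text): the comparisons are independent,
    so the count of small P values follows a binomial distribution.
    """
    return sum(
        comb(n_comparisons, i) * alpha**i * (1 - alpha) ** (n_comparisons - i)
        for i in range(k, n_comparisons + 1)
    )

# Hypothetical uncorrected P values from 20 comparisons, all reported.
p_values = [
    0.003, 0.021, 0.048, 0.09, 0.12, 0.18, 0.22, 0.27, 0.31, 0.36,
    0.41, 0.47, 0.52, 0.58, 0.63, 0.69, 0.74, 0.81, 0.88, 0.95,
]

n = len(p_values)
expected = expected_small_p(n)
observed = observed_small_p(p_values)
tail = prob_at_least(observed, n)

print(f"Comparisons: {n}")
print(f"Expected P < {ALPHA} under the null: {expected:.1f}")
print(f"Observed P < {ALPHA}: {observed}")
print(f"Chance of {observed} or more if all nulls are true: {tail:.3f}")
```

With these made-up values, three small P values are observed where one would be expected by chance. Whether that difference is convincing is a judgment call, which is exactly the informal interpretation this approach asks for.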

Corrections for multiple comparisons may not be needed if you make only a few planned comparisons

Other statisticians recommend not doing any formal corrections for multiple comparisons when the study focuses on only a few scientifically sensible comparisons, rather than every possible comparison. The term planned comparison is used to describe this situation. These comparisons must be designed into the experiment, and cannot be decided upon after inspecting the data.

Corrections for multiple comparisons are not needed when the comparisons are complementary

Ridker and colleagues (3) asked whether lowering LDL cholesterol would prevent heart disease in patients who did not have high LDL concentrations and did not have a prior history of heart disease (but did have an abnormal blood test suggesting the presence of some inflammatory disease). The study included almost 18,000 people. Half received a statin drug to lower LDL cholesterol and half received placebo.

The investigators' primary goal (planned as part of the protocol) was to compare the number of “end points” that occurred in the two groups, including deaths from a heart attack or stroke, nonfatal heart attacks or strokes, and hospitalization for chest pain. These events happened about half as often to people treated with the drug compared to people taking placebo. The drug worked.

The investigators also analyzed each of the endpoints. Those taking the drug (compared to those taking placebo) had fewer deaths, and fewer heart attacks, and fewer strokes, and fewer hospitalizations for chest pain.

The data from various demographic groups were then analyzed separately. Separate analyses were done for men and women, old and young, smokers and nonsmokers, people with hypertension and without, people with a family history of heart disease and those without. In each of 25 subgroups, patients receiving the drug experienced fewer primary endpoints than those taking placebo, and all these effects were statistically significant.

The investigators made no correction for multiple comparisons for all these separate analyses of outcomes and subgroups. No corrections were needed, because the results are so consistent. The multiple comparisons each ask the same basic question a different way, and all the comparisons point to the same conclusion – people taking the drug had less cardiovascular disease than those taking placebo.

References

1. Rothman, K.J. (1990). No adjustments are needed for multiple comparisons. Epidemiology, 1: 43-46.

2. Saville, D.J. (1990). Multiple comparison procedures: the practical solution. The American Statistician, 44: 174-180.

3. Ridker, P.M., et al. (2008). Rosuvastatin to prevent vascular events in men and women with elevated C-reactive protein. N Engl J Med, 359: 2195-2207.

Gentilini Davide