Letters to the Editor

Richard Evans, PhD

University of Minnesota Clinical and Translational Science Institute, Minneapolis, MN

The “P value < .05” trope

Thank you for publishing the article, “Synthesis of surgeon and rehabilitation therapist treatment methods of bicipital tenosynovitis in dogs allows development of an initial consensus therapeutic protocol.”1 It is an excellent addition to the canine orthopedic rehabilitation literature. In that article, the authors provided 5 P value cutoffs for statistical significance that resulted in different interpretations of statistical significance and the study’s overall false-positive rate. One cutoff was defined in the statistics section, “A P value < .05 was considered significant.” That implied every P value below .05 was statistically significant. The other cutoffs contradicted the .05 cutoff. They were more stringent and implicitly stated with 1 Dunn and 3 Bonferroni multiplicity corrections, all giving cutoffs lower than .05.

It is fine to use several cutoffs, but knowing the cutoffs for P values is important because the cutoffs control the number of false-positive statistically significant results. If the cutoff for all of the P values was .05, as stated, the probability of a false-positive result in this article is roughly 1 – 0.95^58 = 0.95, a near certainty. If some P values had lower cutoffs, that probability is lower. Sometimes cutoffs from corrections are hard to determine and report, but they usually follow omnibus tests, and cutoffs for those can be reported. In any case, it would be helpful if the authors could clarify the apparent contradiction in cutoffs. That is especially true of the P values in Tables 5 and 6, because it is hard to tell which of the approximately 58 P values correspond to which of the 5 cutoffs.
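The "roughly 0.95" figure can be reproduced as a quick check. The sketch below assumes that the 58 tests are independent and that every null hypothesis is true; neither assumption is stated in the letter, and the function name is made up for illustration.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests
    independent tests, each run at cutoff alpha, when every null
    hypothesis is true (both are assumptions of this sketch)."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Every one of the ~58 P values judged against .05.
    print(round(familywise_error_rate(0.05, 58), 2))
    # Same tests judged against a Bonferroni cutoff of .05/58.
    print(round(familywise_error_rate(0.05 / 58, 58), 3))
```

Under these assumptions, a uniform cutoff of .05 gives a probability of about 0.95, whereas dividing the cutoff by the number of tests holds it just under .05.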

The phrase, “A P value of < .05 was considered statistically significant,” has been identified as a trope.2 An informal search of JAVMA using the keywords “Bonferroni” or “Tukey” produced 7 recent articles: 6 had the same conflicting cutoff problem as this article, and 1 did not. That article used the Tukey method and correctly omitted the trope. The solution is simple: omit the trope, and describe the multiplicity adjustments in the statistics section by defining the cutoffs for the omnibus tests that precede the post hoc tests. That way, readers need only consider the false-positive rate for the collection of omnibus tests and a few other miscellaneous statistical tests in a study. For tables, the best solution may be to report the P values but use superscript asterisks to indicate why a P value is or is not statistically significant under a particular multiplicity correction.

References

1. Lane DM, Pfeil D von, Kowaleski MP. Synthesis of surgeon and rehabilitation therapist treatment methods of bicipital tenosynovitis in dogs allows development of an initial consensus therapeutic protocol. J Am Vet Med Assoc. Published online November 8, 2023. doi:10.2460/javma.23.08.0461

2. White NM, Balasubramaniam T, Nayak R, Barnett AG. An observational analysis of the trope “A p-value of < 0.05 was considered statistically significant” and other cut-and-paste statistical methods. PLoS One. 2022;17(3):e0264360. doi:10.1371/journal.pone.0264360

The authors respond:

We thank Dr. Evans for the supportive comments on our manuscript. As Dr. Evans noted, we stated, “A P value < .05 was considered significant.” We also stated that we used Bonferroni and Dunn tests for multiple comparisons. We apologize to Dr. Evans and the readers for any confusion.

All P values reported in the manuscript were the raw P values, with no conversion. In some cases—for example, Table 4—the comparison was of 3 groups and the type I error rate of .05 was protected by means of Bonferroni; the raw P values were reported, and a P of < .05/3 = .017 was required for significance at the .05 level. The P values reported in Table 6 are the raw P values from Kruskal-Wallis. Post hoc comparisons of those 3 groups (in Table 6) were by means of the Dunn test; we did not report Dunn P values.
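The Bonferroni rule described above, in which raw P values are compared against .05 divided by the number of comparisons, can be sketched as follows. The function names and the three example P values are invented for illustration; they are not taken from the manuscript's tables.

```python
def bonferroni_cutoff(alpha: float, n_comparisons: int) -> float:
    """Per-comparison cutoff that keeps the overall type I error
    rate at or below alpha across n_comparisons tests."""
    return alpha / n_comparisons


def significant_raw_p(raw_p_values, alpha=0.05):
    """Flag each raw (unadjusted) P value as significant or not by
    comparing it with the Bonferroni cutoff for the whole set."""
    cutoff = bonferroni_cutoff(alpha, len(raw_p_values))
    return [p < cutoff for p in raw_p_values]


if __name__ == "__main__":
    # Hypothetical raw P values for 3 pairwise comparisons.
    example = [0.010, 0.030, 0.200]
    print(round(bonferroni_cutoff(0.05, len(example)), 3))
    print(significant_raw_p(example))
```

In this made-up example, a raw P of .03 falls below .05 but above .05/3, so it would not count as significant at the .05 level after correction.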

A “P < .05” cutoff means that, when there is truly no difference, one will on average make a false-positive conclusion (type I error) once out of every 20 comparisons. The value of P (say, .03 vs < .0001) estimates the strength of the conclusion concerning the difference in a contrast. We agree with Dr. Evans that we presented a large number of contrasts with the resultant high probability of a false positive. Given the probable number of contrasts in this study that were likely truly not different, it is reasonable to estimate that we had 1 to 3 false positives. A review of the P values in our manuscript revealed that there were 2 P = .03 and 5 P = .01; all others were P < .01. We accept that a type I error is a reality in statistics in research and that false positive(s) may have occurred. The reader is cautioned that there may be 1 to 3 type I errors in this manuscript, most likely among the results with P = .01 to .03.
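One way to arrive at an estimate in the range of 1 to 3 false positives is to multiply the cutoff by the number of contrasts with truly no difference. That number is unknown, so the counts of 20 and 58 below are illustrative assumptions rather than figures from the manuscript, and the function name is hypothetical.

```python
def expected_false_positives(alpha: float, n_true_nulls: int) -> float:
    """Expected number of type I errors when n_true_nulls contrasts
    with truly no difference are each tested at cutoff alpha."""
    return alpha * n_true_nulls


if __name__ == "__main__":
    # Assumed numbers of truly null contrasts (not known in practice).
    for n_true_nulls in (20, 58):
        expected = expected_false_positives(0.05, n_true_nulls)
        print(n_true_nulls, round(expected, 1))
```

With a cutoff of .05, 20 truly null contrasts give an expected 1 false positive, and 58 give an expected 2.9.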

David Lane, DVM, DACVSMR

Points East West Veterinary Services Squamish, BC, Canada

Dirsko von Pfeil, DVM, DACVS, DECVS, DACVSMR, DECVSMR

Small Animal Surgery Locum PLLC

Dallas, TX

Bessy’s Kleintierklinik

Watt/Regensdorf, Switzerland

Mike Kowaleski, DVM, DACVS, DECVS

Cummings School of Veterinary Medicine, Tufts University

North Grafton, MA

Joe Hauptman
