More Than Just P Values: Exercising Healthy Skepticism With Research
Any views and opinions expressed are those of the author(s) and/or participants and do not necessarily reflect the views, policy, or position of Podiatry Today or HMP Global, their employees, and affiliates.
Many modern podiatrists read scientific literature, whether to stay up to date, find better practices, or enhance their clinical knowledge and decision-making. Yet understanding how to critically evaluate and interpret the literature remains a challenge for many. I think this is in part because reading and understanding scientific literature is like learning a new language, and understanding medical statistics is yet another new language. Most health care practitioners do not learn these “languages” directly, and sound appraisal of the literature takes reading and understanding hundreds of articles, good and bad, to appreciate the difference.
A common pitfall, in my experience, is relying solely on titles or abstracts without delving into the methods or results. The unfortunate reality is that most practitioners are busy with their patients, families, and social commitments outside of medicine. It can be challenging to dedicate the hours needed for in-depth understanding, and with hundreds of specialty-specific studies published each month, it is next to impossible to read and evaluate everything.1 That being said, dismissing or accepting articles based on abstracts alone is a disservice. The Methods and Results sections are arguably two of the most important parts of a scientific paper, and the abstract alone does not do them justice.
When it comes to understanding articles, a knowledge of medical statistics is of the utmost importance. In statistics, the elusive P value often takes center stage. The P value is a complex and nuanced topic, and its true meaning goes beyond a mere probability of randomness. Morgenstern, an emergency medicine physician with a knack for research, described the P value as follows: “... the p-value is essentially the probability of obtaining a result equal to or more extreme than the result actually observed, assuming that the null hypothesis is correct.”2 The null hypothesis asserts that there is no discernible difference between the two groups under examination, and the calculation of the P value assumes that this null hypothesis is true. Again, as Morgenstern writes, “Therefore, it is impossible for the P value to prove the null hypothesis false. It only gives you information about how likely your results are if the null hypothesis is true.”2 The significance of the P value lies in prompting a second look at the hypothesis, not in affirming validity.3 Two studies have argued that the 0.05 threshold, a standard in medical papers, is arbitrarily defined and does not guarantee clinical relevance.2,3
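To make that definition concrete, here is a minimal simulation sketch. Everything in it is hypothetical and assumed on my part (the group sizes, the two-sample t test, and the use of Python with NumPy and SciPy are illustrative choices, not anything drawn from the article or its references): when two groups truly come from the same population, a P value at or below 0.05 still turns up in about 5% of experiments purely by chance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials = 10_000
false_positives = 0

for _ in range(n_trials):
    # Both groups are drawn from the SAME distribution, so the null hypothesis is true.
    group_a = rng.normal(loc=0.0, scale=1.0, size=30)
    group_b = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = stats.ttest_ind(group_a, group_b)
    if p <= 0.05:
        false_positives += 1

print(f"P <= 0.05 in {false_positives / n_trials:.1%} of trials with no real difference")
```

In other words, “significant” results are guaranteed to surface eventually if enough null experiments are run, which is exactly why a single small P value should prompt a second look rather than a firm conclusion.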
If the P value is equal to or less than 0.05, it suggests that the null hypothesis may be inaccurate, that chance alone produced the result, or that there is bias in the study. The calculation of the P value assumes unbiased data collection, which is inherently challenging.2 And while a statistically significant P value suggests the data are unlikely under the null hypothesis, it does not propose an alternative or validate the experimental hypothesis. It is crucial to recognize that statistical significance does not affirm the correctness of the experimental hypothesis, a point often overlooked.2
Ronald Fisher, who popularized the P value, held that a statistically significant result means the data are worthy of a second look, or that the study is worth replicating to evaluate the results.3 It is not a definitive statement about the validity of the data or its value in medicine.3 The P value also says nothing about the magnitude of the difference; clinical and statistical significance are two very different concepts. A result might be statistically significant yet offer only a 0.0001% clinical benefit, which is essentially meaningless.
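As a hypothetical illustration of that gap between statistical and clinical significance (the outcome scale, sample sizes, and effect size below are assumptions I chose for the sketch, not data from any study), a trivially small difference becomes “statistically significant” once the sample is large enough.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Imaginary patient-reported outcome scores on a 0-100 scale; the true
# difference between groups is only half a point.
control = rng.normal(loc=50.0, scale=10.0, size=20_000)
treatment = rng.normal(loc=50.5, scale=10.0, size=20_000)

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"Mean difference: {treatment.mean() - control.mean():.2f} points")
print(f"P value: {p_value:.1e}")  # far below 0.05, yet the effect is trivial
```

The printed P value is minuscule, but the half-point difference is what actually matters at the bedside.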
Statistical misinterpretation is widespread, compounded by the complexity of tests such as the chi-squared test, the Mann-Whitney U test, and logistic regression. This is understandable, as most authors will hire an outside statistician to try to make sense of their data. In my opinion, the raw data should make it straightforward to draw conclusions without the use of heavy statistics. However, the unfortunate reality, often treated as the elephant in the room, is that one can ask a statistician to run test after test on a data set until it yields the desired conclusion.4-6 This is referred to as “P hacking,” or finding a way to ensure that the P value of the variable you are testing is statistically significant. It may occur in several ways, including trying different statistical analysis methods, utilizing multiple data sets, or running many comparisons, all in an attempt to achieve P<0.05.2,4,6
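One of those tactics, running many comparisons, is easy to demonstrate with a rough simulation. The 20 endpoints, group sizes, and Python tooling below are assumed purely for illustration: if an investigator tests 20 independent outcomes on pure noise, the chance that at least one comes back “significant” is roughly 1 - 0.95^20, or about 64%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_experiments = 2_000
n_outcomes = 20  # hypothetical number of endpoints examined in each "study"
studies_with_a_hit = 0

for _ in range(n_experiments):
    for _ in range(n_outcomes):
        a = rng.normal(size=25)
        b = rng.normal(size=25)  # no true difference exists for any outcome
        if stats.ttest_ind(a, b).pvalue <= 0.05:
            studies_with_a_hit += 1
            break

print(f"At least one P <= 0.05 in {studies_with_a_hit / n_experiments:.0%} of simulated studies")
print(f"Analytic expectation: {1 - 0.95**20:.0%}")
```

None of those simulated “findings” reflect a real effect; they are simply what an unadjusted P<0.05 threshold produces when enough questions are asked of the same data.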
This is why it is of the utmost importance to read the methods, see how the data were collected, and note what (if any) bias was present; in my observation, there is almost always some bias. Reading the methods, evaluating biases, and scrutinizing the raw data become imperative in countering statistical manipulation. Skepticism is all the more important in the “publish or perish” era: one study4 of abstracts published between 1990 and 2015 found that 96% reported at least one statistically significant P value, implying that statistically nonsignificant results are published less frequently and that P hacking is rampant.
P values can be artificially deflated in biased trials, whether the bias is intentional or not.2 A low P value does not guarantee a sound study, which makes careful evaluation of the methods necessary. One must also consider the influence of potential conflicts of interest, related to industry or otherwise, along with whether those circumstances are fully disclosed.5,7
When reading through any study, it is good to be a skeptic. Does it seem too good to be true? If so, it probably is, and the results will not be useful for clinical practice. In the methods, how were patients chosen for the study, and were they randomized? If it was a cadaver study, could you feasibly generalize the findings to the average population? Probably not. That being said, commentary points out that many smaller, biased, nonrandomized trials have nonetheless become the “standard of care,” which I believe is a misstep in the scientific community.8,9 It is also worth reminding yourself that findings with P values around 0.05 carry a greater than 20% chance of being false positives.6,10
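To see where a figure like that can come from, here is a hedged back-of-envelope calculation. It uses the -e*p*ln(p) bound on the Bayes factor, a calibration in the spirit of references 6 and 10 rather than a computation taken directly from them, and it assumes even (1:1) prior odds that the experimental hypothesis is true; the function name and those assumptions are mine.

```python
import math

def false_positive_risk(p_value: float, prior_odds_null: float = 1.0) -> float:
    """Lower bound on the probability that the null hypothesis is true,
    given an observed p_value, using the -e*p*ln(p) Bayes factor bound."""
    # The bound is only valid for p < 1/e (about 0.37), which covers the usual range.
    bf_favoring_null = -math.e * p_value * math.log(p_value)
    posterior_odds_null = bf_favoring_null * prior_odds_null
    return posterior_odds_null / (1.0 + posterior_odds_null)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p:<5} -> false positive risk of at least {false_positive_risk(p):.0%}")
```

Under those assumptions, a P value right at 0.05 corresponds to a false positive risk of roughly 29%, while a P value of 0.001 brings it down to about 2%, which is one reason some statisticians have argued for far stricter thresholds.6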
Peer review, often seen as a hallmark of credibility, has its shortcomings. While it may seem obvious, it is important to remind yourself that not all peer-reviewed papers are equally valid. Note that the former physician Andrew Wakefield’s fraudulent and retracted study linking vaccines to autism was peer-reviewed (in a very well-respected journal, no less), while Watson and Crick’s paper identifying the structure of DNA was not peer-reviewed.11,12
With the current “publish or perish” environment and the potential for fraud and predatory journals, it is even more important to read the entire paper and perform your own critical evaluation before interpreting the results. The peer review process is not bulletproof, as evidenced by published papers in which satire and AI-generated text slipped through.13-15
In closing, low P values and author claims don’t automatically translate to clinical relevance. Reading methods, analyzing raw data, and embracing skepticism are crucial in forming independent conclusions. As podiatrists navigating the scientific landscape, a discerning eye is the key to separating the wheat from the chaff.
Dr. Ehlers is in private practice in Arvada, CO, and is an attending at the Highlands-Presbyterian/St. Luke’s Podiatric Residency Program. He finds interest in debunking medical myths and dogma.
References
1. Zul M. How many journal articles have been published? PublishingState. October 19, 2023.
2. Morgenstern J. EBM masterclass: What exactly is a P value? First10EM. October 11, 2021. Available at: https://doi.org/10.51684/FIRS.83454
3. Dahiru T. P-value, a true test of statistical significance? A cautionary note. Ann Ib Postgrad Med. 2008 Jun;6(1):21-6. doi: 10.4314/aipm.v6i1.64038. PMID: 25161440; PMCID: PMC4111019.
4. Ioannidis JP. What have we (not) learnt from millions of scientific papers with P values? Am Stat. 2019;73(sup1):20-25.
5. Jureidini J, McHenry LB. The illusion of evidence based medicine. BMJ. 2022 Mar 16;376:o702. doi: 10.1136/bmj.o702. PMID: 35296456.
6. Johnson VE. Revised standards for statistical evidence. Proc Natl Acad Sci U S A. 2013;110(48):19313-19317.
7. Taheri C, Kirubarajan A, Li X, et al. Discrepancies in self-reported financial conflicts of interest disclosures by physicians: a systematic review. BMJ Open 2021;11:e045306. doi: 10.1136/bmjopen-2020-045306
8. Herrera-Perez D, Haslam A, Crain T, et al. A comprehensive review of randomized clinical trials in three medical journals reveals 396 medical reversals. Elife. 2019 Jun 11;8:e45183. doi: 10.7554/eLife.45183. PMID: 31182188; PMCID: PMC6559784.
9. Prasad V, Vandross A, Toomey C, et al. A decade of reversal: an analysis of 146 contradicted medical practices. Mayo Clin Proc. 2013 Aug;88(8):790-8. doi: 10.1016/j.mayocp.2013.05.012. Epub 2013 Jul 18. PMID: 23871230.
10. Goodman SN. Of P-values and Bayes: a modest proposal. Epidemiology. 2001 May;12(3):295-7. doi: 10.1097/00001648-200105000-00006. PMID: 11337600.
11. Wakefield AJ, Murch SH, Anthony A, et al. Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. Lancet. 1998 Feb 28;351(9103):637-41. doi: 10.1016/s0140-6736(97)11096-0. Retraction in: Lancet. 2010 Feb 6;375(9713):445. Erratum in: Lancet. 2004 Mar 6;363(9411):750. PMID: 9500320.
12. Watson JD, Crick FH. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature. 1953 Apr 25;171(4356):737-8. doi: 10.1038/171737a0. PMID: 13054692.
13. Elbein A. “What’s the deal with birds?” a new paper asks, while making a point. Audubon Magazine. April 22, 2020.
14. Manshu Z, Liming W, Tao Y, et al. The three-dimensional porous mesh structure of Cu-based metal-organic-framework - aramid cellulose separator enhances the electrochemical performance of lithium metal anode batteries. Surfaces Interfaces. 2024;46:104081.
15. Ashraf I, Mohammad A, Neta A, et al. Successful management of an iatrogenic portal vein and hepatic artery injury in a 4-month-old female patient: A case report and literature review. Radiol Case Rep. 2024;19(6):2106-2111.