Basic and Applied Social Psychology (BASP) has banned the use of the null hypothesis significance testing procedure (NHSTP). The editorial making the announcement is full of evocative language about having to “remove all vestiges of the NHSTP” from manuscripts before publication, criticising “the stultified structure of NHSTP thinking”, and liberating psychological research from the “crutch of NHSTP”. The editors are down on confidence intervals as well and, while no direct alternative is proposed, Bayesian analysis gets a Highly Commended. If that sounds like a battle line being drawn in another epistemological paradigm war, this might be a good time to pick a side. Are you a Frequentifier or a Bayesianite?
It’s probably about probability
The Frequentist approach calculates the probability of A from the number of times it occurs (n) out of the number of opportunities for it to occur (N), P(A) = n/N. Because it’s impossible to observe every opportunity for A to occur, a sample of observations is selected, which means that Frequentist analysis always includes sampling error, and estimates the probability of an event from its relative frequency in that sample. A common misperception is that the resulting p-value gives the probability of an observation being true. Another misconception is that probability is some proxy for replicability. However, a small p-value is no guarantee of replication: under standard assumptions, a result that only just reaches p = .05 has roughly an even (50%) chance of reaching significance again in an exact replication, even if the observed effect is the true effect. Frequentist analysis often also reports the effect size, which indicates either the proportion of variance of a dependent variable associated with levels of an independent variable (η², ω²) or the standardised difference between group means (Cohen’s d). Which measure is appropriate depends on the design, cell sizes and so on, and all of them are useful in interpreting analysis.
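To make the two kinds of effect size concrete, here is a minimal sketch in Python using only the standard library. The function names and the scores are invented for illustration; they are not taken from the BASP editorial or any particular study, and a real analysis would normally lean on a dedicated statistics package.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardised mean difference, using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


def eta_squared(*groups):
    """Proportion of total variance associated with group membership."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


# Hypothetical scores for two conditions (made up for this example).
control = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]
treatment = [6.0, 7.0, 8.0, 7.0, 6.0, 8.0]

d = cohens_d(treatment, control)
eta2 = eta_squared(control, treatment)
print(f"Cohen's d = {d:.2f}, eta squared = {eta2:.2f}")
```

Cohen’s d expresses the gap between the group means in standard deviation units, whereas η² expresses how much of the total variation in scores goes with group membership, so the two numbers answer slightly different questions about the same data.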
One alternative to Frequentist analysis is Bayesian probability, whereby the starting point is an a priori probability which is updated a posteriori with new data using Bayes’ theorem. The theorem basically calculates a conditional probability of A given B using random or constructed prior probabilities of A and B, P(A|B) = [P(B|A) × P(A)]/P(B), so knowing about B changes the support for A. Bayesian analysis improves as B is refined, and the increasing size of samples in big databases has created more opportunities. The giant YouGov profiler, for example, uses hierarchical Bayesian models.
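Bayes’ theorem is easier to follow with numbers in it. The sketch below is only an illustration: the function name, the prior of 0.10, and the two likelihoods are all invented, and P(B) is expanded using the law of total probability so that the only inputs are the prior and the two conditional probabilities.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) = P(B|A) * P(A) / P(B) for a yes/no hypothesis A."""
    # P(B) from the law of total probability over A and not-A.
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical numbers: a sceptical prior and moderately diagnostic evidence.
belief = 0.10
history = [belief]
for _ in range(3):
    # Each new observation of B uses the previous posterior as its prior.
    belief = posterior(belief, p_b_given_a=0.8, p_b_given_not_a=0.2)
    history.append(belief)

print([round(b, 3) for b in history])
```

Each pass through the loop treats yesterday’s posterior as today’s prior, which is the sense in which the estimate is refined as more information about B arrives.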
The Frequentist-Bayesian debate has to be seen in the context of Big Data, and some problems are avoided with larger sample sizes. However, the trouble with averages in large datasets is that everyone starts to look the same. In the YouGov data, pick any two newspapers or football clubs or shampoo brands and the average profile of fans or consumers is likely to be quite similar: middle-aged men with thinning hair and some disposable income. (Okay, maybe just newspapers and football clubs.) Another frequently cited criticism is that most people who look at the average person or the archetype don’t recognise anyone in it.
The power of a test is the probability of detecting a difference when one really exists. Power depends on the size of the expected difference, the sample size, and the risk of Type I error that you’re prepared to take. Fix the expected difference, the Type I error risk, and the power you want, and there are tables in most stats books and calculators on the interweb that will tell you the minimum sample size for that test. The sample size in any one study might be too small for it to be read as definitive, but noteworthy patterns could be explored in further studies. Looking at it the opposite way, large datasets have sufficient power to reliably identify characteristics that are peculiar to just one group, things that distinguish them from the others, but also have the power to identify essentially meaningless differences. There is still a role, then, for skilful interpretation.
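The trade-off between expected difference, sample size, Type I error risk, and power can be sketched with a normal approximation for a two-sided comparison of two equal-sized groups. This is a rough stand-in for the tables and calculators, not a replacement for them: the function names are invented, the difference is expressed as Cohen’s d, and exact t-based calculations give slightly larger sample sizes.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate minimum n per group to detect a standardised difference d."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


def approx_power(effect_size_d, n_each, alpha=0.05):
    """Approximate power for a given d and n per group (far tail ignored)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size_d * (n_each / 2) ** 0.5
    return STD_NORMAL.cdf(noncentrality - z_alpha)


# A medium-sized difference needs a modest sample...
print(n_per_group(0.5))
# ...whereas a huge dataset will flag even a trivially small difference.
print(round(approx_power(0.02, 100_000), 3))
```

The second call is the Big Data caveat in miniature: with 100,000 cases per group, a difference of two hundredths of a standard deviation is detected almost every time, which says nothing about whether it matters.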
What the debate about Frequentism and Bayesianism overlooks entirely is that research using statistical analysis aims to give meaningful answers to questions. Decisions are made about the questions to pose, the data to collect, the statistical tests to run, and the interpretation of the results, whether one has a Frequentist or a Bayesian stance. It may be true that the unthinking repeated use of NHSTP in psychology is a problem, but the issue could just be laziness, not philosophy or choice of technique at all. The trend for reporting effect sizes and not just probabilities has a longer history, and was an implicit criticism of NHSTP. There are also techniques that use Bayesian analysis to refine the standard errors in t-tests. And the repetition of research on which Frequentism relies has advantages of its own, such as making meta-analysis possible.
BASP’s move is not a worldwide ban enforced by an army of axiom-wielding Bayesianistas. It’s more of a discontinuation of the practice. The journal now requires strong descriptive statistics, effect sizes, frequency distributions, and larger samples. A more pragmatic approach might borrow from mixed methods and allow for epistemological mixing. As with other methodological innovations, this is likely to be driven by real-world researchers who are more interested in the best way to answer the question in front of them than in philosophical niceties. Crucially, the BASP editors hope to inspire other journals to take the same step, so either pick a side now or learn to make the best of both.
Link to the BASP editorial :