Stats Talk

Aug 14, 2015

You’re in full flow describing the potential of latent variable modelling with socio-demographic data. You’ve made it to the interview shortlist, not least thanks to your sparkling CV [how to make your CV sparkle], so now it’s time to demonstrate your detailed and sophisticated understanding of statistics, your wealth of experience, and the enormous contribution you’re about to make to their organisation. Then someone stops you to ask:

“Sorry, what’s a variable again?”

Yes, that is a real quote, and today’s blog is about the statistics non-expert. If your aim is to convince a panel of interviewers that you’re the best person for the job, different things will probably impress different panel members, and it’s best to avoid making anyone feel inadequate because they can’t tell a *t*-test from a tea-bag.

In one way, it’s actually quite a good interview question because, as anyone who teaches for a living knows, it’s hard to explain something without first understanding it properly. If someone really needs an explanation of what a variable is, gender and age are good examples of categorical and continuous variables respectively, because practically everybody has one of each. They’re also among the most commonly used variables in the human and social sciences, even if leaning on them is sometimes a bit lazy. Depending on your discipline, more appropriate examples may come to mind; there’s not much point in talking about demographics to a room full of marine biologists.
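If the panellist would rather see it than hear it, here’s a tiny sketch in Python (my choice of language; the respondents are invented) showing the practical difference: a categorical variable like gender is summarised by counting each label, while a continuous variable like age can sensibly be averaged.

```python
from collections import Counter
from statistics import mean

# Invented respondents: each dict is one person; "gender" and "age" are the variables.
respondents = [
    {"gender": "female", "age": 34},
    {"gender": "male", "age": 29},
    {"gender": "female", "age": 41},
    {"gender": "male", "age": 52},
    {"gender": "female", "age": 23},
]

# A categorical variable takes one of a fixed set of labels, so we count each label.
gender_counts = Counter(person["gender"] for person in respondents)

# A continuous variable is a measurement on a scale, so an average makes sense.
mean_age = mean(person["age"] for person in respondents)

print(gender_counts)  # Counter({'female': 3, 'male': 2})
print(mean_age)       # 35.8
```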

As for statistical tests, most of them basically come down to straight lines and normal curves; everything else just adds layers of complexity. The most efficient way to convey your wealth of experience is to list as many statistical tests, modelling methods, and programming innovations as you can before picking one to explain in more detail. Following the principle of the lowest common denominator, it might be best to start with the basics.

To describe the normal distribution, IQ can be useful because, again, practically everyone has an IQ, regardless of whether you think it’s a concept with any validity. The normal curve illustrates that very few people have a very low IQ, most people are somewhere around the middle, and very few people have a very high IQ. It looks a bit like a bell, and a bit like a child’s drawing of a volcano. With normative data like these, a common question is whether two groups, such as males and females, differ on average. *T*-tests, and their burly ANOVA *F*-test cousins, use reference distributions derived from the normal curve to judge whether the difference between groups in a sample is bigger than chance alone would be likely to produce.
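Here’s a minimal Python sketch of both ideas, assuming IQ is scaled in the usual way to a mean of 100 and a standard deviation of 15. It uses the standard library’s `statistics.NormalDist` to show the bell shape in numbers, then computes Welch’s *t* statistic (one common variant of the *t*-test) by hand for two small groups of invented scores. Turning *t* into a *p*-value needs the *t* distribution itself, which the standard library doesn’t provide; `scipy.stats.ttest_ind(a, b, equal_var=False)` is one way to get it.

```python
import math
from statistics import NormalDist, mean, stdev

# Assumption: IQ is scaled, as is conventional, to a mean of 100 and an SD of 15.
iq = NormalDist(mu=100, sigma=15)

share_middle = iq.cdf(115) - iq.cdf(85)  # within one SD of the middle: about 68%
share_very_high = 1 - iq.cdf(130)        # more than two SDs above: about 2%
share_below_mean = iq.cdf(100)           # exactly half sit below the average

def welch_t(group_a, group_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    se = math.sqrt(stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

# Invented scores for two small groups, purely for illustration.
group_a = [102, 98, 110, 95, 105, 99, 101]
group_b = [97, 92, 100, 94, 99, 91, 96]

t = welch_t(group_a, group_b)
print(round(share_middle, 3), round(share_very_high, 3), share_below_mean)  # 0.683 0.023 0.5
print(round(t, 2))  # 2.59
```

The bigger *t* is, the less plausible it is that the gap between the groups is just sampling noise.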

A correlation means that more of one thing is associated with more of something else; an association with less of something else is, of course, a negative correlation, but we’re dealing with someone who can barely count to π here! In general, as children get older they learn more words, so a graph of their age and vocabulary would show a positive correlation. The strength of the relationship shows in how tightly the points cluster around a straight line; if every point falls exactly on the line, the correlation is perfect. Regression modelling, like correlation, is based on straight lines and can test whether an additional variable pushes the slope of a relationship steeper or shallower. Continuing with the vocabulary example, having access to lots of books might make a child’s vocabulary grow more steeply with age. A further layer of complexity in multi-level modelling is the addition of group-level variables, such as including school characteristics when modelling student behaviour.

The final thing you’ll need to show our non-expert panellist is the ability to select the appropriate statistical test, though there are also times when it is not appropriate to use statistics at all. Beyond the necessarily basic descriptions here, statistical tests make loads of assumptions, such as assumptions about the shape of the data’s distribution (its skewness and kurtosis), and these need to be met; if they’re not, you need to know what else to do. For example, tests of difference and correlations work best with continuous variables, but there are alternative tests for use with categories.
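To make “skewness and kurtosis” a bit less abstract, here’s a minimal Python sketch that computes both from moments about the mean, using the simple population formulas with no small-sample correction. The two datasets are invented. If you have SciPy, `scipy.stats.skew` and `scipy.stats.kurtosis` with their default settings should agree with these hand-rolled versions.

```python
from math import fsum
from statistics import fmean

def central_moment(data, k):
    """k-th moment about the mean (population version, no small-sample correction)."""
    m = fmean(data)
    return fsum((x - m) ** k for x in data) / len(data)

def skewness(data):
    """0 for a symmetric distribution; positive when there's a long tail to the right."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """0 for a normal curve; negative for lighter tails, positive for heavier tails."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Invented data: a symmetric set and a right-skewed set (think incomes, in thousands).
symmetric = [1, 2, 3, 4, 5, 6, 7]
right_skewed = [20, 22, 23, 25, 26, 28, 30, 35, 48, 95]

print(skewness(symmetric), excess_kurtosis(symmetric))  # 0.0 -1.25
print(round(skewness(right_skewed), 2))
```

One very high value is enough to drag the skewness well above zero, which is exactly the sort of thing that can undermine a test that assumes a normal curve.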

There are two little words that can cause enormous confusion for non-experts: ‘average’ and ‘significant’. The average journalist is significantly more likely to describe random results inaccurately as significant than the average statistician. For ‘average’, refer again to the normal distribution: the average sits in the middle, so about half of everybody is below average and the other half above (curiously, most people think they’re above average). The simplest way to describe statistical significance might be in terms of the conventional 5%, or 1%, threshold: a difference is called significant when a gap that large would turn up by chance less than 5% (or 1%) of the time if the groups didn’t really differ. That isn’t quite the same as being 95% certain the difference is real, but it is a far stricter bar than the 50-50 odds of a coin toss.
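A quick simulation shows what that 5% means in practice, and why the average journalist gets caught out. This Python sketch is built on assumptions of my own: two groups of 50 drawn from the same population (mean 100, SD 15), compared with a Welch-style *t* statistic against the normal-approximation cut-off of about 1.96. Because there’s no real difference, every “significant” result is a false alarm, and they should turn up in roughly 5% of studies.

```python
import math
import random
from statistics import NormalDist, fmean

def sample_variance(xs):
    """Sample variance with the usual n - 1 denominator."""
    m = fmean(xs)
    return math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def false_positive_rate(n_studies=2000, n_per_group=50, alpha=0.05, seed=1):
    """Share of simulated studies flagged 'significant' when there is truly no difference."""
    rng = random.Random(seed)
    # Two-sided critical value from the normal approximation: about 1.96 when alpha = 0.05.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    flagged = 0
    for _ in range(n_studies):
        # Both groups come from the SAME population, so any gap is pure chance.
        a = [rng.gauss(100, 15) for _ in range(n_per_group)]
        b = [rng.gauss(100, 15) for _ in range(n_per_group)]
        se = math.sqrt(sample_variance(a) / n_per_group + sample_variance(b) / n_per_group)
        if abs(fmean(a) - fmean(b)) / se > critical:
            flagged += 1
    return flagged / n_studies

rate = false_positive_rate()
print(rate)  # close to 0.05; the exact value depends on the seed
```

Run twenty studies of pure noise and, on average, one of them will look “significant”, which is worth remembering before any headline gets written.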

You can expect a good deal of variability on interview panels, so judging the level at which to pitch your skills needs some consideration. The ability to explain statistical analysis to a non-expert, and thereby gain their trust, is likely to be an advantage. Try thinking of the most basic and obvious question, the one you assume everyone knows the answer to, and be prepared to take a moment to explain what a variable is.

The Institute of Statistical Science of Academia Sinica

December 31, 2021