That’s a surprisingly big number! Descriptive statistics

Stats Talk
Feb 20, 2015

You’re on the bus. It’s early in the morning and difficult to remember what number comes after one. (Twelfty, isn’t it?) There seem to be some other people on the bus, people with coats, and hair, and bags, and mobile phones. When the early morning coffee starts to kick in, you start counting the colours of the coats and the hair and the bags. “Wow!” you think, “That’s amazing! There are seventeen people on this bus with blonde coats! That’s millions!” “So what?” says the bus driver when you try to tell him about your exciting discovery. Counting the things you see is relatively easy. Descriptive statistics is about giving some context to your observations, and starting you on the way to a genuine discovery.

Raw meat

There is a trend in sports analytics to count everything that moves. The Six Nations (that’s the rugby tournament that marks the start of Spring) has an official analytics partner. The “analytics” involves lots of counting, and helpful numbers pop up in the television coverage on “statistics” such as lineouts won on own throw. We can assume that they’re selling something more sophisticated to the teams, but what’s presented is probably better described as “metrics”. The closest they get to statistics is percentages of play in action areas, but even then we’re not told what percentages one would ordinarily expect. The real breakthrough in sports analytics has been in measurement and coding, and there are piles of raw numbers generated every week. Players wear GPS trackers that measure distance covered and the intensity of impacts. We now know, for example, that scrums are about 6G and tackles can be over 30G (thanks, the42.ie!). Video analysis has also progressed to classifying the contribution of each player arriving, but that’s data transformation, not analysis.

The first real analytical steps in building on the masses of data becoming available have to include looking at distributions and central tendency. Distribution has to do with positioning people on a scale, say from no tackles in a match to lots of tackles. There will be some players in a match who do lots of tackling, and some who do very little, and most who are somewhere in the middle, the group tending towards the centre, you see. There are three kinds of centre too: the mean is the arithmetic average, the median is the middle number of tackles if they’re arranged from least to most, and the mode is the most common number. The trend in some sports stats has been towards reporting the extremes, but with no clear rationale.
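For anyone who would rather see the three kinds of centre than read about them, here is a minimal sketch in Python. The tackle counts are invented purely for illustration and do not come from any real match.

```python
from statistics import mean, median, mode

# Hypothetical tackle counts for eleven players in one match (made-up data)
tackles = [0, 2, 3, 5, 5, 5, 7, 8, 9, 12, 21]

summary = {
    "mean": mean(tackles),      # arithmetic average
    "median": median(tackles),  # middle value once the counts are sorted
    "mode": mode(tackles),      # most common count
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample the one player with 21 tackles pulls the mean above the median and the mode, which is a first hint of why a single extreme number deserves some suspicion.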

Out-liers

Out-liers (not outliers, which, if it were a word, would refer to things being more outly) are data points that look really big but do not appear to fit in a dataset; they might be genuine freaks, they might be measurement errors, or they might just indicate that the sample is too small. Hampel (1974. Thanks, Hampel!) helpfully came up with the concept of the influence curve of a data point, the degree to which one observation affects the pattern in a dataset: The influence of a single data point is approximately inversely proportional to the sample size, that is, the smaller the sample the greater the risk of influence. In a rugby match featuring a maximum sample of 46, the influence of out-liers is tight-head prop-esque.
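A rough way to get a feel for this is to drop one extreme value into samples of different sizes and watch how far the mean moves. The sketch below does this for the sample mean only; it is not Hampel's influence curve itself, and the tackle counts (a typical 6 and an extreme 30) are assumptions made up for the example.

```python
from statistics import mean

def mean_shift(sample, extreme):
    """Return how much the mean changes when one extreme value is added."""
    return mean(sample + [extreme]) - mean(sample)

TYPICAL = 6    # hypothetical typical tackle count
EXTREME = 30   # hypothetical out-lier

# 45 typical players plus one out-lier gives the match-sized sample of 46
for n in (10, 45, 450):
    sample = [TYPICAL] * n
    shift = mean_shift(sample, EXTREME)
    print(f"{n + 1:>4} observations: mean moves by {shift:.3f}")
```

For the mean, the shift is exactly (extreme - old mean) / (n + 1), so the same out-lier moves the average roughly ten times as far in a sample of 46 as it would in a sample of 451.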

The number of lineouts won on own throw is just as useless as the number of blonde coats without any context and without any expectation. If we were also told – and the data are certainly there – the average number of lineouts won in matches over the last 15 years, and whether today’s match was significantly above or below the average, that would be something to shout about. There’s a difference between saying, “Wow! There’s a really big number!” based on a single number and “Wow! There’s a surprisingly big number!” based on a comparison. The next step is to link differences in what can easily be counted to what actually counts, that is, the result of the match. It is possible, for a start, to correlate any performance metric of a team with the outcome of the match. It is possible to control for the influence of any other performance metric, and soon you’re on the way to a statistical model of how to win a rugby match.
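As a sketch of those first two steps, the snippet below compares today's count with a historical average using a z-score, then correlates the metric with the winning margin using Pearson's r. Every number in it is invented for illustration, and the names `z_score` and `pearson_r` are just helpers written for this example; a real analysis would need far more matches and a proper significance test.

```python
from statistics import mean, stdev

def z_score(value, history):
    """Distance of value from the historical mean, in standard deviations."""
    return (value - mean(history)) / stdev(history)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical history: lineouts won on own throw in ten past matches,
# and the points margin (positive = win) in the same matches
past_lineouts_won = [9, 11, 10, 12, 8, 10, 11, 9, 10, 10]
points_margin = [-3, 7, 0, 10, -8, 2, 5, -4, 1, 3]

today = 15  # hypothetical count from today's match
z = z_score(today, past_lineouts_won)
r = pearson_r(past_lineouts_won, points_margin)

print(f"today's z-score: {z:.2f}")
print(f"correlation with points margin: {r:.2f}")
```

A z-score well beyond 2 is what turns “a really big number” into “a surprisingly big number”, and a strong correlation with the margin is the first hint that the metric might actually count. Controlling for other metrics would be the job of a regression model, which this sketch does not attempt.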

Analysis of sport has taken several steps in a more numerically literate direction, but we’re still a long way from seeing terms like “kurtosis” or even “predictor” appear in the coverage. There’s probably still too much reliance on raw, de-contextualised data and on impressive-looking out-liers, but there’s also enormous potential to find out what really counts. The first step is descriptive statistics, but that’s still several steps away from master-minding a Six Nations win, so don’t mention anything to the bus driver quite yet.
