Error, Part 1: Type I and reliability

Stats Talk
Jan 23, 2015

Statistics is far, far simpler than normal life. In most spheres of daily existence, there are hundreds of things that could go wrong, whereas in statistics there are just two, very simply named Type I and Type II. If you can avoid both, you’ll do just fine. Type I error is essentially seeing something that isn’t there, and Type II error is failing to spot something that is. This post will take you through Type I; next week’s will cover Type II, all with the help of Captain Statto and the crew of the pirate ship Regressor.

Imagine, if you will, the day Captain Statto sent Avery up to the crow’s nest with his telescope to have a look around. Statto thought there were no other ships around but wanted Avery to check before settling down for the evening with a bottle of rum. After a few minutes, Avery proudly shouted down that there was a ship flying a Jolly Roger two points abaft the beam, starboard side. Battle stations! The crew dragged themselves away from their games of Liar’s Dice and loaded the cannon. They patiently scanned the horizon, but there was no sign of another ship. It turned out that Avery’s Jolly Roger was in fact a seagull. Avery had mistakenly rejected the null hypothesis (that there were no other ships in the area), committing a Type I error, because of a problem with the data collection instrument, namely Avery’s rum-soaked eyes.

Unreliable eyes

Avery’s eyes proved to be neither a valid nor a reliable method of data collection, and Type I errors often have to do with the instruments used. Validity was covered in a previous post, and basically refers to whether a scale measures what it claims to measure. Reliability, the other essential property of good measurement, has to do with whether a scale works consistently for different people and for the same people over time.

In quantitative analysis, there are two kinds of reliability: internal and test-retest. Internal reliability is used for multi-item scales and tests whether people give consistent patterns of answers. For example, the internal reliability is high when everyone who ticks A on question 1 also ticks B on question 2. Internal reliability is measured using Cronbach’s α statistic, which rises with both the number of scale items and the average correlation between them; values above .7 are generally taken to indicate good reliability. One thing to consider is the population on which reliability analysis is based. For example, there is a tendency to standardise assessment tools using undergraduate students as participants, often for course credit. Undergraduate students are systematically different from the normal population, in their age profile and average weekly consumption of alcohol among other things, and this leaves the instrument open to the criticism that it is not reliable for the normal population. There’ll be more on the perils of sampling next week.
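
For the curious, here is a minimal sketch of how Cronbach’s α could be computed by hand, using the usual variance form of the formula: α = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the crew’s scores are made up for illustration; in practice a statistics package would normally do this for you.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    `responses` is a list of rows, one row per respondent,
    each row holding that person's score on every item.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")

    # Transpose rows into columns so each tuple holds one item's scores.
    items = list(zip(*responses))
    sum_item_variances = sum(variance(scores) for scores in items)

    # Variance of each respondent's total score across all items.
    total_variance = variance([sum(row) for row in responses])

    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical answers from five crew members on a three-item scale.
crew_scores = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
]
print(round(cronbach_alpha(crew_scores), 2))
```

If every respondent gives the same score on every item, the item variances sum to exactly one k-th of the total variance and α comes out at 1, the ceiling for a perfectly consistent scale.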

Test-retest reliability is concerned with whether the same person gives the same answer when they respond to items on a scale more than once. For anything concerning humans, a common convention is to measure at two time-points two weeks apart. If the retest comes too soon, scores may be inflated owing to memory effects, while too long a gap opens the possibility of different scores due to fluctuations in the intensity of whatever is being measured; this is especially true of psychological problems like depression and anxiety. Test-retest reliability can be assessed using Pearson correlations between items or between scale total scores for each participant, and for a reliable scale correlations of the order of .85 or .9 can be expected.
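
As a rough sketch of the test-retest calculation, the snippet below computes a Pearson correlation between total scores at two time-points. The function name `pearson_r` and the scores are invented for illustration, and the code assumes each participant appears in the same position in both lists.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 long")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    return sxy / sqrt(sxx * syy)


# Hypothetical scale totals for six participants, two weeks apart.
week_0 = [12, 18, 25, 9, 30, 21]
week_2 = [14, 17, 27, 10, 28, 22]
print(round(pearson_r(week_0, week_2), 2))
```

Note that a high correlation only says participants kept their rank order; if everyone scored five points higher at retest, the correlation would still be perfect.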

Risk of bias

In qualitative research, inter-coder reliability is used as a safeguard against bias in analysis. Analysis of interviews, for example, usually involves the development of a coding frame, a list of themes relevant to the study that might arise in the interview. The first step is for one researcher to note occurrences of all the codes in all the interviews. Using the same coding frame, a second rater then independently analyses a sample of interviews. The percentage agreement between the original and the second ratings is calculated and adjusted for chance using a κ statistic such as Cohen’s κ. If the minimum accepted κ coefficient of about .7 is reached for a code, it is considered reliable.
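
To make the chance adjustment concrete, here is a small sketch of Cohen’s κ for a single code, assuming each rater has marked the code as present (1) or absent (0) in the same set of interview segments. Cohen’s κ is only one of several κ-type statistics, and the function name `cohens_kappa` and the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' labels on the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty lists of equal length")

    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: was the code present (1) or absent (0)
# in each of ten interview segments?
first_rater = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
second_rater = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
print(round(cohens_kappa(first_rater, second_rater), 2))
```

Two raters who agree on exactly half the segments while each using the code half the time get κ = 0, which is the point of the adjustment: their agreement is no better than coin-tossing.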

The level of significance of a statistical test result means the level of risk of Type I error that you’re prepared to live with. Most people are happy with 5%, about a one-in-twenty chance of finding something that isn’t actually there. Even once you’ve minimised the risk of measurement error, there are still countless extraneous variables in even the best designed studies, but being 95% sure of something is usually enough. So, having relieved Avery of look-out duties, Captain Statto can sail happily onwards for a further 19 days before expecting a similar seagull fail. Unless someone makes a Type II error next week…
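
The one-in-twenty figure can be checked with a quick simulation. The sketch below assumes a simple two-sided z-test on samples drawn from a population where the null hypothesis really is true (mean 0, known standard deviation 1), so every “significant” result is a seagull. The function name `false_alarm_rate` and its default settings are illustrative choices, not anything prescribed by the test itself.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_alarm_rate(trials=5000, n=30, alpha=0.05, seed=1):
    """Share of tests that reject a null hypothesis that is actually true."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 when alpha is 0.05.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    false_alarms = 0
    for _ in range(trials):
        # No ship on the horizon: the true mean is exactly 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * sqrt(n)  # standard deviation known to be 1
        if abs(z) > z_crit:
            false_alarms += 1
    return false_alarms / trials


print(false_alarm_rate())
```

The rate should land close to 0.05. Bear in mind that one in twenty is an average rather than a schedule: across 20 independent looks at an empty horizon, the chance of at least one false alarm is 1 − 0.95^20, or about 64%.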
