9  Statistics and probability

Reasoning honestly from limited data

A football team scores 1, 3, 0, 2, 1, 4, 1 goals across seven matches. What is their typical score?

A weather forecast says 70% chance of rain. What does that actually mean?

Two students both average 65% on their tests, but one scores between 60–70% every time and the other swings from 20% to 100%. Are they the same?

These questions all need the same toolkit: averages, spread, and probability. The numbers tell you something — but only if you know which number to look at.

9.1 What this chapter helps you do

Symbols to keep handy

These are the bits of notation you'll see a lot. If a line of symbols feels like a fence, read it out loud once, then keep going.

  • Σ: sigma, meaning sum all the following terms

  • P(A): the probability of event A

  • P(A) = m/n: probability equals m favourable outcomes divided by n equally likely outcomes in total

  • \bar{x}: x-bar, the sample mean

  • range, IQR: range is max minus min; IQR is the interquartile range (middle 50%)

Here is the main move this chapter is making, in plain terms. You do not need to be fast. You just need to keep the thread.

  • Coming in: You can add and divide numbers. You know what a fraction is. You have encountered averages before.

  • Leaving with: Data is a sample, not a population. A single average hides the spread. Probability is a number between 0 and 1 that measures how likely an outcome is, based on either counting or frequency. The right question is never just “what is the average?” — it is also “how much does it vary, and how confident can we be?”

9.2 What the notation is saying

Start with a simple list of scores: 4, 6, 7, 7, 9. We could ask two different questions. What is a typical score? And how spread out are the scores?

The mean \bar{x} is the sum of all values divided by the number of values. Given data x_1, x_2, \ldots, x_n:

\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}

The \Sigma (capital sigma) means “sum all of the following.” It is notation for a loop: add every term x_i for i = 1 to n.

Here, x_1, x_2, \ldots, x_n are the data values, and n is how many values there are. This means the mean is the total shared equally.
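Since Σ is described above as notation for a loop, a minimal Python sketch may make that concrete. It uses the scores 4, 6, 7, 7, 9 from the start of this section; the function name `mean` is our own choice for illustration, not something the chapter defines.

```python
def mean(values):
    """Return the sum of the values divided by how many there are."""
    total = 0
    for x in values:  # this loop is what the sigma symbol abbreviates
        total += x
    return total / len(values)


scores = [4, 6, 7, 7, 9]
print(mean(scores))  # 33 / 5, prints 6.6
```

The loop body is the "add every term" part; the final division is the "shared equally" part.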

The median is the middle value when the data are ordered. If there are two middle values, take their mean. The median is less sensitive to extreme values than the mean.

This means the median tells us what sits in the middle, not what would happen if we shared everything equally.

The range is maximum minus minimum. It measures total spread but is heavily influenced by outliers.

This means range uses only the two end values.

Probability of event A is:

P(A) = \frac{\text{number of favourable outcomes}}{\text{total number of equally likely outcomes}}

P(A) = 0: impossible. P(A) = 1: certain. All probabilities for a complete set of outcomes sum to 1.

Here, A is the event we care about. This means probability is a fraction of all possible outcomes.

Relative frequency: when you can’t count equally likely outcomes in advance, estimate the probability from data.

P(A) \approx \frac{\text{number of times A occurred}}{\text{total trials}}

This means data can give us an estimate of probability, even when we cannot work it out exactly in advance.

9.3 The method

Computing the mean

  1. Add all values — to get the total.
  2. Divide by the count — to find the fair share per value.

This means the mean is the equal-share value.

Computing the median

  1. Order the values from least to greatest.
  2. If n is odd, the median is the value at position \frac{n+1}{2}.
  3. If n is even, average the values at positions \frac{n}{2} and \frac{n}{2}+1.

This means the median depends on order, so you must sort the data first.
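The three steps above can be sketched in Python as follows. This is an illustration under our own naming (`median` is a hypothetical helper), and the second list is made up to show the even case. Note that Python counts positions from 0, so position (n+1)/2 becomes index n // 2.

```python
def median(values):
    """Middle value of the data, following the odd/even rule."""
    ordered = sorted(values)  # step 1: order the values
    n = len(ordered)
    if n % 2 == 1:
        # odd n: position (n + 1) / 2, which is index n // 2 counting from 0
        return ordered[n // 2]
    # even n: average positions n/2 and n/2 + 1 (indices n//2 - 1 and n//2)
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median([9, 7, 4, 7, 6]))  # odd count: prints 7
print(median([9, 7, 4, 6]))     # even count (made-up list): prints 6.5
```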

Computing the range and IQR

Range: \max - \min.

IQR (interquartile range): split the ordered data in half at the median. If n is odd, leave the median itself out of both halves. Lower quartile Q1 = median of the lower half. Upper quartile Q3 = median of the upper half. \text{IQR} = Q3 - Q1.

The IQR describes the spread of the middle 50% of the data — it ignores the extremes at both ends.

This means IQR is often more stable than range when there are outliers.
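A short Python sketch of the range and IQR calculation is given here, applied to the list 4, 6, 7, 7, 9 from Section 9.2. It assumes the median-of-halves rule, with the median left out of both halves when the count is odd (the convention Example 1b uses); other textbooks and software use slightly different quartile rules. The function names are our own, and the sketch needs at least two values.

```python
def median_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def spread(values):
    """Return (range, Q1, Q3, IQR) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]      # values below the middle
    upper = ordered[n - half:]  # values above the middle (median skipped if n is odd)
    q1 = median_sorted(lower)
    q3 = median_sorted(upper)
    return ordered[-1] - ordered[0], q1, q3, q3 - q1


print(spread([4, 6, 7, 7, 9]))  # prints (5, 5.0, 8.0, 3.0)
```

Here the lower half is 4, 6 (so Q1 = 5) and the upper half is 7, 9 (so Q3 = 8), giving an IQR of 3 against a range of 5.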

Computing probability from equally likely outcomes

  1. Count total possible outcomes.
  2. Count outcomes that match event A.
  3. Divide.

This means probability compares successful outcomes to all possible ones.
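The counting steps can be sketched in Python with exact fractions. The spinner here is a made-up example (eight equal sectors numbered 1 to 8), and `probability` is a hypothetical helper, not a library function.

```python
from fractions import Fraction


def probability(outcomes, is_favourable):
    """Favourable outcomes divided by all equally likely outcomes."""
    outcomes = list(outcomes)
    favourable = [o for o in outcomes if is_favourable(o)]
    return Fraction(len(favourable), len(outcomes))


spinner = range(1, 9)  # hypothetical fair spinner with sectors 1 to 8
p = probability(spinner, lambda s: s % 3 == 0)
print(p)  # 2 favourable sectors (3 and 6) out of 8: prints 1/4
```

Using `Fraction` keeps the answer as an exact, simplified fraction rather than a rounded decimal. The sketch is only valid when every outcome in the list really is equally likely.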

Why this works

The mean is the “balance point” of a data set: if you placed equal weights at each value on a number line, the mean is where the line would balance. The median is the “middle”: it splits the ordered data 50/50. They give the same answer for symmetric data; they diverge when the data is skewed. The median is more robust: one extreme outlier can shift the mean a lot, but it moves the median only slightly or not at all.

Probability lies between 0 and 1 because it is a fraction of a complete set of outcomes. If the probabilities of a set of non-overlapping outcomes sum to more than 1, you have double-counted. If they sum to less than 1, you have missed some outcomes.


Edit any of the seven scores below. Watch the mean, median, range, and IQR update instantly — and see the difference between mean and median when you push one value to an extreme.

Try this: change Score 1 from 62 to 2. Watch the mean drop noticeably while the median barely shifts. That is the difference between the two measures — and why it matters which one you use.
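For readers without the interactive panel, the same experiment can be sketched in Python. This assumes the seven scores are those of Example 1 (whose first score is 62) and uses the standard-library `statistics` module.

```python
from statistics import mean, median

scores = [62, 71, 58, 74, 66, 70, 63]
changed = [2] + scores[1:]  # Score 1 changed from 62 to 2

print(round(mean(scores), 1), median(scores))    # prints 66.3 66
print(round(mean(changed), 1), median(changed))  # prints 57.7 66
```

With this particular data the mean falls by more than eight points while the median stays at 66, because the changed score was already below the middle of the ordered list.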


9.4 Worked examples

Example 1 — Test scores. A student scores the following in seven tests: 62, 71, 58, 74, 66, 70, 63. Find the mean, median, and range.

Ordered: 58, 62, 63, 66, 70, 71, 74.

Mean (add all values, then divide by 7): \bar{x} = \frac{62 + 71 + 58 + 74 + 66 + 70 + 63}{7} = \frac{464}{7} \approx 66.3

Range (highest minus lowest): 74 - 58 = 16

Median (the 4th value in the ordered list): 66

The mean and median are close, which suggests the scores are fairly symmetric: no extreme result is pulling the average up or down.

This means either measure gives a similar picture for this set of data.


Example 1b — IQR for the same dataset. Ordered data (7 values): 58, 62, 63, 66, 70, 71, 74.

The median is the 4th value: 66. For an odd-count dataset, exclude the median when splitting into halves.

Lower half (values 1–3): 58, 62, 63. Q1 = median = 62.

Upper half (values 5–7): 70, 71, 74. Q3 = median = 71.

\text{IQR} = Q3 - Q1 = 71 - 62 = 9

The middle 50% of scores sit within a 9-point range.

This means the centre of the data is less spread out than the full range suggests.


Example 2 — Weather probability. Over the past 60 days, it has rained on 15 of them. Estimate the probability of rain on any given day.

P(\text{rain}) \approx \frac{15}{60} = \frac{1}{4} = 0.25 = 25\%

This is a relative frequency estimate — it uses past data rather than counting equally likely outcomes. The more data you have, the more reliable the estimate.

This means 25% is an estimate from evidence, not a guarantee.


Example 3 — Picking from a group. A class has 12 students who play sport, each playing exactly one: 5 play football, 4 play basketball, and 3 play tennis. One student is picked at random to represent the class. What is the probability they play football? What is the probability they do not play basketball?

P(\text{football}) = \frac{5}{12}

P(\text{not basketball}) = \frac{12 - 4}{12} = \frac{8}{12} = \frac{2}{3}

This means “not basketball” includes every outcome except basketball.
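The same calculation can be checked with exact fractions in Python. This sketch assumes each of the 12 students plays exactly one of the three sports, so the three groups do not overlap; the variable names are our own.

```python
from fractions import Fraction

players = {"football": 5, "basketball": 4, "tennis": 3}
total = sum(players.values())  # 12 students

# Probability of each sport: favourable count over total count
p = {sport: Fraction(count, total) for sport, count in players.items()}

assert sum(p.values()) == 1  # a complete set of outcomes sums to 1

p_not_basketball = 1 - p["basketball"]  # complement: everything except basketball

print(p["football"], p_not_basketball)  # prints 5/12 2/3
```

Subtracting from 1 gives the same answer as counting the 8 students who do not play basketball, which is often the quicker route when the "not" group is large.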


Example 4 — Expected score. A game show gives two options: take $300 for certain, or spin a wheel that pays $800 with probability 0.5 and $0 with probability 0.5. What is the expected value of spinning the wheel?

E = (0.5 \times 800) + (0.5 \times 0) = 400 + 0 = 400

The expected value of spinning is $400, which is higher than the certain $300. But “expected value” means the average across many spins, not a guarantee for this one spin. You could walk away with nothing.

This means the best long-run average choice is not always the safest single choice.
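The expected value calculation generalises to any list of outcomes, which a small Python sketch can show. The helper `expected_value` and the (probability, payoff) pair layout are our own choices for illustration.

```python
def expected_value(outcomes):
    """Sum of probability * payoff over (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in outcomes)


wheel = [(0.5, 800), (0.5, 0)]  # spin: $800 or $0, equally likely
certain = [(1.0, 300)]          # take the guaranteed $300

print(expected_value(wheel), expected_value(certain))  # prints 400.0 300.0
```

The function compares long-run averages only. It says nothing about how much the result of a single spin can vary, which is the other half of the decision.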


The simulator below shows the difference between theoretical probability and what actually happens in a finite number of draws. Results vary each time you click — that is the point.

Simulation note: each click on “Simulate 100 draws” generates a fresh set of random draws. The simulated bars will be close to the theoretical bars but rarely identical — that gap between prediction and observation is real and expected. With 1 000 draws the bars would be closer; with 10 draws they might be very far apart. More data means a more reliable estimate.
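A rough stand-in for the simulator can be written in a few lines of Python. The chapter does not say what the simulator draws from, so this sketch assumes random picks (with replacement) from the class in Example 3, where the theoretical probabilities are 5/12, 4/12, and 3/12. The function name `simulate` is hypothetical.

```python
import random
from collections import Counter

# Assumed population: the 12 students from Example 3
sports = ["football"] * 5 + ["basketball"] * 4 + ["tennis"] * 3


def simulate(n_draws, rng):
    """Relative frequency of each sport in n_draws random picks."""
    counts = Counter(rng.choice(sports) for _ in range(n_draws))
    return {s: counts[s] / n_draws for s in ("football", "basketball", "tennis")}


rng = random.Random()  # unseeded, so results vary from run to run
for n in (10, 100, 1000):
    print(n, simulate(n, rng))
```

Running it several times should show the 10-draw frequencies jumping around, while the 1000-draw frequencies tend to stay near 0.42, 0.33, and 0.25. Passing `random.Random(0)` instead would make a run repeatable.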

9.5 Where this goes

This chapter gives us the first tools for reading data honestly. We are no longer asking only “what is the answer?” We are also asking “how typical is it?” and “how sure are we?”

Volume 6 develops this much further: probability distributions, hypothesis testing, and regression.

The expected value calculation in Example 4 is also the foundation of decision theory, financial option pricing (Black-Scholes is built on expected values under a probability distribution), and machine learning loss functions. The concept is old. The applications are new.

Where this shows up

  • Sports commentators and coaches use averages and ranges to compare players’ performance over a season.
  • A weather forecast’s percentage chance of rain can be read as a relative frequency: of all the days with similar conditions, roughly that percentage had rain.
  • A lab scientist reports every measurement with ± uncertainty — that is a statement about spread.
  • A data scientist evaluates a classifier by its error rate — a probability computed from test data.

Statistics is not a course you take. It is a way of reading claims about the world.

9.6 Exercises

  1. The daily high temperatures for one week (°C): −4, −7, −2, 1, 3, −1, −5. Find the mean, median, and range.

  2. Seven students score 62, 71, 85, 68, 90, 74, and 55 on a test. Find the mean and median. A new student joins the group with a score of 4. Recalculate the mean. Does the median change? What does this show about which measure is more stable under an extreme value?

  3. A standard die has faces 1–6. What is the probability of rolling a number greater than 4? Of rolling an even number? Of rolling a 7?

  4. A bag contains 4 red, 6 blue, and 2 green marbles. A marble is drawn at random.

    1. What is P(red)?
    2. What is P(not blue)?
    3. What is P(red or green)?

  5. A school canteen records the most popular lunch option each day for 200 days: hot meal on 110 days, sandwich on 60 days, salad on 30 days. Estimate the probability that the hot meal is the most popular option on a randomly chosen day. Out of the next 50 days, on how many would you expect the hot meal to be the most popular?

  6. A quiz game has three options:

    • Option A: 80% chance of +$500, 20% chance of −$200
    • Option B: 50% chance of +$900, 50% chance of −$100
    • Option C: certain return of +$280

    Compute the expected value of each option. Which is highest? Which would you choose, and why might expected value alone not be the only consideration?

  7. A set of 9 test scores has a median of 14.2 and a mean of 15.1. One outlier is identified and removed, leaving 8 values with a mean of 14.6. What was the outlier value? (Hint: total of 9 values = mean × 9. Total of remaining 8 = new mean × 8. Outlier = first total − second total.)