32  Proof and mathematical reasoning

Establishing that something must be true

You have used the formula for the sum of an arithmetic series. You have used the quadratic formula. You have been told that \sqrt{2} is irrational. Each time, you accepted the result because it worked on the examples you tried, or because someone authoritative said so.

That is about to change.

The shift from computing to proving is the sharpest turn in mathematics. It is the difference between “this has always worked so far” and “this cannot fail.” Most of the mathematics you have seen up to now has been the first kind. This chapter introduces the second.

32.1 What this chapter helps you do

Symbols to keep handy

These are the bits of notation you'll see a lot. If a line of symbols feels like a fence, read it out loud once, then keep going.

  • , , : the integers, the rationals, the reals

Definitions to keep handy

These are the words we keep coming back to. If one feels slippery, come back here and steady it before you push on.

  • proof: An argument that shows something must be true in all cases, not just in the examples you tried.

  • theorem: A claim worth naming that has been proved.

  • contradiction: A proof method where you assume the opposite and show it leads to an impossibility.

  • induction: A proof method for statements about the natural numbers: base case plus a step that carries truth from n to n+1.

This chapter shows how mathematics moves from convincing examples to guaranteed arguments. We begin by separating pattern-spotting from proof. Then we lay out the logical parts of a proof and the notation that signals them. After that we work through four common proof methods, examine the ways arguments fail, and finish with examples that show proof as an active craft rather than a ritual. The aim is not to make proof feel easy on first contact. It is to make its purpose and structure legible.

Watch for this

When a proof feels hard, the problem is often orientation rather than algebra. Before pushing symbols around, ask:

  • What am I allowed to assume?
  • What exactly must be shown?
  • Which proof method fits that gap?

Those three questions usually matter more than the first line of calculation.

32.2 A pattern that lies to you

Consider the expression n^2 - n + 41. Try a few values:

n n^2 - n + 41 Prime?
1 41 Yes
2 43 Yes
3 47 Yes
4 53 Yes
10 131 Yes
20 421 Yes

In fact, every value from n = 1 to n = 40 gives a prime number.

Forty consecutive confirmations. Most people would bet on “always prime.”

But try n = 41:

41^2 - 41 + 41 = 41^2 = 1681 = 41 \times 41

Clearly composite. The pattern fails the moment you step past the range where you checked it.

This is the central lesson before we write a single proof: no number of confirmed cases constitutes a proof. Not ten, not a hundred, not a trillion. A proof must cover every case, including those that haven’t been tried. This means examples are useful for guessing a theorem, but never enough for establishing one.

32.3 What a proof is

A proof is a sequence of logical deductions from agreed starting points — called axioms or previously proved theorems — leading to a conclusion. Each step must follow from the previous ones by a valid logical rule. If every step is valid, the conclusion is guaranteed. Not likely. Not probably. Guaranteed.

Every proof has the same skeleton:

  • Statement: what you are claiming is true
  • Hypothesis: what you are allowed to assume
  • Conclusion: what you need to establish
  • Proof: the chain of deductions from hypothesis to conclusion

The hypothesis is given to you. The conclusion is what you must reach. Everything in between is your work.

32.3.1 The notation of logical structure

Two symbols carry most of the logical load:

How to read the proof symbols

  • Symbol: P \Rightarrow Q

  • Reads as: “P implies Q” or “if P, then Q”

  • Means: whenever P is true, Q must be true

  • Use when: you are proving a conditional statement

  • Common misread: the arrow has a direction; P \Rightarrow Q does not mean Q \Rightarrow P

  • Symbol: P \Leftrightarrow Q

  • Reads as: “P if and only if Q”

  • Means: each one implies the other (two directions)

  • Use when: you are proving two statements are equivalent

  • Common misread: proving only one direction is not enough

  • Symbol: \forall / \exists

  • Reads as: “for all” / “there exists”

  • Means: a claim about every object / at least one object

  • Use when: you need to be explicit about what your statement ranges over

  • Common misread: mixing up “for all” and “there exists” changes the statement completely

P \Rightarrow Q means P implies Q: if P is true, then Q must be true. Read it as “if P, then Q.” The arrow shows direction — P \Rightarrow Q does not automatically mean Q \Rightarrow P.

P \Leftrightarrow Q means P if and only if Q: the implication runs both ways. P is true exactly when Q is true, and vice versa.

Two quantifiers say which objects you are making claims about:

\forall x means for all x — the statement holds for every element in the domain under discussion.

\exists x means there exists an x — the statement holds for at least one element.

The symbol x \in S means x is a member of the set S. You will see it most often with the standard number sets:

  • \mathbb{N} — the natural numbers: 1, 2, 3, \ldots
  • \mathbb{Z} — the integers: \ldots, -2, -1, 0, 1, 2, \ldots
  • \mathbb{Q} — the rationals: all fractions p/q with p, q \in \mathbb{Z}, q \neq 0
  • \mathbb{R} — the real numbers: everything on the continuous number line

So “\forall n \in \mathbb{Z}” reads: “for every integer n.” A typical theorem opens with a phrase like this to tell you exactly what kind of object is being discussed.

32.4 Four proof methods

There is no single template. What follows are four approaches that cover the vast majority of what you will encounter in the first two years of university mathematics.

32.4.1 1. Direct proof

The simplest strategy: assume the hypothesis, apply definitions and algebra, arrive at the conclusion.

Theorem. The sum of two even integers is even.

Proof. Let m and n be even integers. By definition of “even,” there exist integers a and b such that m = 2a and n = 2b. Then:

m + n = 2a + 2b = 2(a + b)

Since a + b is an integer, m + n is divisible by 2, hence even. \square

Notice the structure: start with the definitions of the objects involved, do algebra, end at the conclusion. The key move was unpacking “even” into its definition (m = 2a) so that the algebra could work. This means direct proof usually begins by translating words like “even” or “odd” into algebraic form.

32.4.2 2. Proof by contrapositive

Every statement “if P then Q” (P \Rightarrow Q) is logically equivalent (meaning: any proof of one automatically proves the other — they make exactly the same claim in different words) to its contrapositive: “if \neg Q then \neg P” (\neg Q \Rightarrow \neg P), where \neg Q means “not Q” — the opposite truth value.

Sometimes the contrapositive is much easier to work with directly.

Theorem. If n^2 is even, then n is even.

Proof. We prove the contrapositive: if n is odd, then n^2 is odd.

Suppose n is odd. Then n = 2k + 1 for some integer k. Squaring:

n^2 = (2k+1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1

Since 2k^2 + 2k is an integer, n^2 has the form 2(\text{integer}) + 1, so n^2 is odd.

The contrapositive is proved. Therefore the original statement holds. \square

The reason to choose contrapositive over direct proof here: starting from “n^2 is even” and trying to work towards “n is even” is awkward — you’d have to take a square root, and it isn’t obvious that \sqrt{2k} is an integer. The contrapositive avoids that entirely. This means contrapositive proof is not a different theorem, only a smarter route to the same theorem.

32.4.3 3. Proof by contradiction

Assume the conclusion is false. Show that this assumption, together with the hypothesis, leads to a logical impossibility — a statement that contradicts something you know to be true. Since the assumption of falseness leads to a contradiction, the conclusion must be true.

The classic example is one you may have seen stated but never seen proved:

Theorem. \sqrt{2} is irrational.

Proof. Suppose for contradiction that \sqrt{2} is rational. Then we can write \sqrt{2} = p/q where p, q \in \mathbb{Z}, q \neq 0, and p and q share no common factor (the fraction is in lowest terms).

Squaring both sides: 2 = p^2 / q^2, so p^2 = 2q^2.

This means p^2 is even. By the theorem proved above (contrapositive), p must be even. So write p = 2k for some integer k.

Substituting: (2k)^2 = 2q^2, which gives 4k^2 = 2q^2, so q^2 = 2k^2.

This means q^2 is even, so q is even (by the same reasoning we used for p: if a number’s square is even, the number itself must be even).

But now both p and q are even — they share a factor of 2. This contradicts our assumption that p/q was in lowest terms.

The assumption that \sqrt{2} is rational leads to a contradiction. Therefore \sqrt{2} is irrational. \square

Note how the proof assembled earlier results — “if n^2 is even then n is even” was used twice. Proofs are built on top of other proofs. The earlier theorem was not an isolated exercise; it was a tool. This means contradiction works by assuming the opposite and showing that the assumption breaks the logical system you started with.

32.4.4 4. Mathematical induction

Induction proves statements of the form “this holds for every natural number n \geq 1” (or n \geq 0, or from any fixed starting point).

The structure has two parts:

  1. Base case. Show the statement holds for the starting value (usually n = 1).
  2. Inductive step. Show that if the statement holds for n = k (the inductive hypothesis), then it holds for n = k + 1.

If both hold, the statement holds for all n from the base case onward. The logic is a chain: the base case starts the chain, the inductive step extends it one link at a time, and there is no upper limit on the chain length.

Theorem. For all n \geq 1:

1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}

Proof.

Base case. When n = 1: the left side is 1, and \frac{1 \cdot 2}{2} = 1. The statement holds.

Inductive step. Assume the statement holds for n = k, i.e., assume:

1 + 2 + \cdots + k = \frac{k(k+1)}{2}

We need to show it holds for n = k + 1, i.e., that:

1 + 2 + \cdots + k + (k+1) = \frac{(k+1)(k+2)}{2}

Starting from the left side and using the inductive hypothesis:

1 + 2 + \cdots + k + (k+1) = \frac{k(k+1)}{2} + (k+1)

Factor out (k+1):

= (k+1)\left(\frac{k}{2} + 1\right) = (k+1) \cdot \frac{k+2}{2} = \frac{(k+1)(k+2)}{2}

This is exactly what we needed to show.

By induction, the formula holds for all n \geq 1. \square

The formula n(n+1)/2 is not new to you — it appeared when we studied arithmetic series. But now you have not just a formula and some confirming examples. You have a proof. It holds for n = 10^{100} as surely as for n = 5. This means induction proves infinitely many cases by establishing a base and a reliable step from one case to the next.

32.5 What makes a proof fail

Not every argument that looks like a proof is one. Three failure modes appear constantly, and learning to recognise them is half the work.

32.5.1 1. Assuming the conclusion

Also called circular reasoning: a step in the proof assumes, in some form, the very thing you are trying to prove.

Here is a compact example. Suppose someone “proves” that 1 = 2 like this:

Let a = b. Then a^2 = ab, so a^2 - b^2 = ab - b^2, so (a+b)(a-b) = b(a-b), so a + b = b. Since a = b, this gives 2b = b, so 2 = 1.

The flaw: the step “so a + b = b” divides both sides by (a - b). But a = b, so a - b = 0. Division by zero is not a valid algebraic step. It is not that the algebra went wrong in a subtle way — the step is simply not allowed.

Circular reasoning is harder to spot than hidden division by zero. Watch for any step that uses the thing you’re trying to establish.

32.5.2 2. Missing a case

A proof that only covers some instances — even most instances — is not a proof. The primality of n^2 - n + 41 was “verified” for 40 cases and failed on the 41st. Missing cases can be more subtle: a proof about integers that silently only works for positive ones, a proof about triangles that only handles the acute case.

Before finishing a proof, ask: are there configurations I haven’t considered? Edge cases, boundary values, negative inputs, degenerate shapes? This means a proof can fail not only through bad algebra, but through an argument that silently covers only part of the problem.

32.5.3 3. An invalid step

An algebraic manipulation that is not universally valid. The most common:

  • Dividing by a variable without confirming it’s nonzero (as in the a = b example above)
  • Taking a square root and losing the negative case: x^2 = 4 implies x = 2 or x = -2, not just x = 2
  • Cancelling factors that might be zero
  • Swapping the order of limits, integrals, or sums without justification (this matters more in later courses)

Every algebraic step has conditions under which it is valid. When those conditions aren’t met, the step is wrong — even if it gives the right answer in specific cases. This means correctness in proof depends on the legality of each step, not on whether the final line happens to look plausible.

TipWhat proofs are actually for

Proofs are not rituals performed for examiners. They are the reason mathematicians can trust results for trillions of cases they’ve never checked.

When you prove that an algorithm runs in O(n \log n) time, you’re not estimating — you’re guaranteeing. When a cryptographic security proof says the encryption is unbreakable under certain assumptions, the logic is auditable. A regulator, a competitor, a court can read every step. The proof is the audit trail.

Physical laws are not just patterns observed in experiments. They are structures whose consequences can be derived — predicted before they are measured. The derivation is a proof. When a prediction matches an observation, it confirms the law; when it doesn’t, the law is wrong. The reliability of science rests on this structure.

Mathematics is the only domain where “I’ve checked a huge number of cases” is explicitly not sufficient, and where a short argument can settle a question for every case simultaneously. That is what makes it powerful.

32.6 Worked examples

32.6.1 Example 1: Direct proof — the product of two odd numbers

Context. Two players take turns on a board game. Each player’s turn takes an odd number of seconds. Prove that no matter how many turns have been played, the product of any two turn lengths is always odd.

Theorem. The product of two odd integers is odd.

Proof. Let m and n be odd integers. By definition, there exist integers a and b such that m = 2a + 1 and n = 2b + 1.

Then: m \cdot n = (2a+1)(2b+1) = 4ab + 2a + 2b + 1 = 2(2ab + a + b) + 1

Since 2ab + a + b is an integer, mn has the form 2(\text{integer}) + 1, so mn is odd. \square

Commentary. The key move was unpacking the definition of “odd” into algebra. Once you write m = 2a + 1 and n = 2b + 1, the result of multiplying them is computable. The only thing left is to recognise that the answer has the right form — it is one more than an even number.

The proof doesn’t appeal to intuition, examples, or the fact that you’ve tested it. It works because the algebra always produces 2(\ldots) + 1, regardless of what a and b are. This means a direct proof succeeds when the conclusion appears naturally from the definition of the objects involved.

32.6.2 Example 2: Proof by contradiction — infinitely many primes

Context. Public-key cryptography (RSA, used in every HTTPS connection) relies on the existence of very large prime numbers. The security argument assumes there are infinitely many primes to draw from. Euclid proved this over 2000 years ago. The argument has not changed.

Theorem. There are infinitely many prime numbers.

Proof. Suppose for contradiction that there are only finitely many primes. List them all: p_1, p_2, \ldots, p_k.

Consider the number: N = p_1 \cdot p_2 \cdots p_k + 1

N is greater than 1, so it has at least one prime factor. Call it p. Since our list contains all primes, p must be one of p_1, p_2, \ldots, p_k.

But p divides p_1 \cdot p_2 \cdots p_k (since p is one of those factors). And p divides N. So p must divide the difference: N - p_1 \cdot p_2 \cdots p_k = 1

But no prime divides 1. Contradiction.

The assumption that there are finitely many primes leads to a contradiction. Therefore there are infinitely many primes. \square

Commentary. The number N is not claimed to be prime. It is constructed deliberately so that dividing it by any prime on the list leaves a remainder of 1. The contradiction is that the prime factor of N must be on the list but cannot be on the list.

This is one of the most elegant proofs in mathematics. It uses no advanced machinery — only the definition of prime, the fact that every integer greater than 1 has a prime factor, and arithmetic. The idea (constructing N) is the creative step; the logic is then straightforward. This means contradiction proofs often hinge on constructing exactly the object that the false assumption cannot accommodate.

32.6.3 Example 3: Proof by induction — 2^n > n for all n \geq 1

Context. In digital circuit design, the number of possible states in a circuit with n binary switches is 2^n. Engineers say this “grows exponentially.” Proving 2^n > n is a basic instance of establishing that exponential growth outpaces linear growth — relevant whenever you’re deciding whether a brute-force approach (checking every state) is feasible.

Theorem. For all integers n \geq 1, 2^n > n.

Proof.

Base case. When n = 1: 2^1 = 2 > 1. The statement holds.

Inductive step. Assume the statement holds for n = k, i.e., assume 2^k > k. We need to show 2^{k+1} > k + 1.

Starting from the inductive hypothesis 2^k > k, multiply both sides by 2: 2^{k+1} = 2 \cdot 2^k > 2k

It suffices to show 2k \geq k + 1, i.e., k \geq 1. Since k \geq 1 by our starting point, this holds.

Therefore 2^{k+1} > 2k \geq k + 1.

By induction, 2^n > n for all n \geq 1. \square

Commentary. The inductive step needed two pieces: first, using the inductive hypothesis to get a bound on 2^{k+1}; second, connecting that bound to k + 1. The step “2k \geq k+1 because k \geq 1” looks almost too obvious to write down. Write it down anyway. In a proof, “obvious” is not a valid reason — the valid reason is “k \geq 1.” This means induction is only convincing when every link in the chain is explicitly justified.

32.7 Where this goes

This chapter shifted mathematics from calculation to justification. The ideas ahead do not only need formulas; they rely on knowing why the formulas and theorems can be trusted.

Throughout this series. Every other volume has been making claims without full justification — the sum formula for arithmetic series, the laws of exponents, properties of functions. You now have the tools to ask “how do we know that?” and to read the answer. The proofs are in the textbooks; you can now follow them.

Complex analysis (Volume 7). The central results there — Cauchy’s integral theorem, the residue theorem — are not computational tricks. They are theorems with proofs that depend on precise definitions of limits, continuity, and differentiability. Those proofs are accessible once you are comfortable with the logical structure introduced here. Complex analysis is where proof and computation meet in the most surprising way: a function’s behaviour on a closed curve can determine its behaviour at every point inside.

TipApplications

Algorithm correctness. Proving that an algorithm produces the right output for every valid input, not just the test cases you’ve tried. Induction is the standard tool for array algorithms and recursive functions.

Cryptographic security proofs. RSA’s security is reduced to the difficulty of factoring large numbers — the reduction is a proof. If you could break RSA, you could factor; but factoring appears hard. The chain of implication is a formal proof.

Derivation from physical laws. The equations governing heat flow, electromagnetic fields, and quantum mechanics are not empirical curve fits. They are derived — proved — from foundational principles. The derivations are proofs.

Formal verification. Aviation, medical devices, and nuclear control systems are subject to formal verification: software is checked against a mathematical model, and the check is a proof. DO-178C (aviation software standard) and IEC 62304 (medical software) both recognise formal methods. The mathematical core is the logic introduced here.

32.8 What you can do now

You should now be better able to separate evidence from proof, identify the hypothesis and conclusion of a statement, and choose among direct proof, contrapositive, contradiction, and induction with more intention. Just as important, you should be better able to diagnose why an argument fails. The next time you meet a theorem in calculus or algebra, you are no longer limited to accepting it on trust alone.

32.9 Exercises

Exercise 1. Prove that if n is divisible by 3, then n^2 is divisible by 9.

Hint. Write n = 3k for some integer k, then compute n^2.


Exercise 2. Prove by contradiction that \log_2 3 is irrational.

Hint. Assume \log_2 3 = p/q in lowest terms and use the definition of logarithm: \log_2 3 = p/q means 2^{p/q} = 3.


Exercise 3. Prove by induction that n! > 2^n for all n \geq 4.

Instructions. (a) Check the base case n = 4 by computing both sides. (b) Write out the inductive step: assuming k! > 2^k, show (k+1)! > 2^{k+1}.


Exercise 4. Find the flaw in the following “proof” that all horses are the same colour.

Claim. In any group of n horses, all horses are the same colour.

Proof.

Base case. When n = 1: a group of one horse trivially has all horses the same colour (there is only one). ✓

Inductive step. Assume any group of k horses is the same colour. Consider a group of k+1 horses: H_1, H_2, \ldots, H_k, H_{k+1}.

Look at the first k horses \{H_1, \ldots, H_k\}. By the inductive hypothesis, they are all the same colour.

Now look at the last k horses \{H_2, \ldots, H_{k+1}\}. By the inductive hypothesis, they are all the same colour.

These two groups overlap (they share H_2, \ldots, H_k), so all k+1 horses must be the same colour.

By induction, all horses in any group are the same colour. \square

The base case is correct. The logic in the inductive step sounds plausible. But the conclusion is obviously false. Where does the proof break?

Hint. What happens to the “overlap” argument when k = 1?


Exercise 5. (Harder.) Prove that between any two distinct real numbers, there exists a rational number.

More precisely: prove that for any a, b \in \mathbb{R} with a < b, there exists q \in \mathbb{Q} with a < q < b.

Hint. You will need the Archimedean property: for any real number x > 0, there exists a natural number n such that n > x. Use it to find an integer n large enough that 1/n < b - a. Then find an integer m such that m/n falls in the interval (a, b).