30  Differential calculus

The instantaneous rate of change

Your car’s GPS records your average speed over the entire trip: 62 km/h. But your speedometer, right now, reads 90 km/h. These are different quantities. The GPS average looks back over the whole journey; the speedometer is telling you something about this instant. The question “how fast am I going right now?” is not the same question as “how fast did I go on average?” — and answering it requires a fundamentally different tool.

Here is a sharper version of the same problem. A drug is injected into the bloodstream. A doctor monitors its concentration — measured in milligrams per litre — every hour. She can compute the average rate of clearance over a two-hour window. But what determines the dosing schedule is the instantaneous rate of clearance right now: how fast is the concentration dropping at this moment? If that rate is too slow, the drug accumulates to toxic levels. Too fast, and the dose is ineffective before the next one can be given. The decision is made on the basis of a rate at a point in time, not an average across an interval.

Both situations share the same mathematical structure. In chapter 1 you met limits: the value that a function approaches as the input gets arbitrarily close to some point. The speedometer reading is a limit — the limit of average speeds over shorter and shorter time intervals. This chapter builds that limit into a systematic tool.

30.1 What this chapter helps you do

Symbols to keep handy

These are the bits of notation you'll see a lot. If a line of symbols feels like a fence, read it out loud once, then keep going.

  • f'(x): f prime of x, the derivative of f

  • \dfrac{d}{dx}[f(x)]: d by dx of f, the derivative operator applied to f

  • f''(x): f double-prime of x, the second derivative of f

Definitions to keep handy

These are the words we keep coming back to. If one feels slippery, come back here and steady it before you push on.

  • derivative: The instantaneous rate of change of a function: how fast the output is changing per unit change in the input, right now.

  • tangent line: The line that matches the curve’s direction at a point; its slope is the derivative there.

  • second derivative: The rate of change of the rate of change. It measures how the slope itself is changing, which shows up as the curvature of a graph or as acceleration in motion.

This chapter turns the idea of a limit into a practical language of change. We begin with the derivative as a limit of average rates, then build the standard differentiation rules that make real calculations possible. After that, we use first and second derivatives to describe shape, motion, and optimisation. The central question throughout is simple: what does this derivative say about how the quantity is changing right now?

Watch for this

A derivative is not just the result of a rule. It is a rate. Whenever you differentiate, pause long enough to name:

  • what quantity is changing
  • what it is changing with respect to
  • what the sign and size of the derivative mean in context

30.2 The limit definition of the derivative

The derivative has a precise definition as a limit. Start with the average rate of change of f over the interval from x to x + h:

\frac{f(x+h) - f(x)}{h}

The numerator f(x+h) - f(x) is the change in output. The denominator h is the change in input. Their ratio is the slope of the secant line connecting the two points (x,\, f(x)) and (x+h,\, f(x+h)) on the graph. This means the derivative starts with an ordinary average rate of change over an interval you can still see and measure.

Now let h \to 0. The second point slides toward the first. The secant line tilts and, in the limit, becomes the tangent line at x. The slope of that tangent line is the derivative:

\boxed{f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}}

This is the formal definition. When the limit exists, f is said to be differentiable at x; when it does not, f'(x) is undefined. Every differentiation rule derived later is a consequence of this definition, which is the limit you saw in chapter 1 applied to a specific quotient. The derivative is therefore not separate from limits: it is the limit of average change as the interval collapses to a point.
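The limit can be watched numerically. The short Python sketch below (the helper names `difference_quotient` and `square` are illustrative, not from any library) evaluates the quotient for f(x) = x^2 at x = 3 as h shrinks. Because (3 + h)^2 - 9 = 6h + h^2, the quotient is exactly 6 + h, so the printed slopes should settle towards f'(3) = 6.

```python
def difference_quotient(f, x, h):
    """Slope of the secant line from (x, f(x)) to (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


# Shrink h by a factor of 10 each step; the secant slope tends to f'(3) = 6.
for k in range(1, 7):
    h = 10.0 ** (-k)
    print(f"h = {h:.0e}   quotient = {difference_quotient(square, 3.0, h):.8f}")
```

Floating-point arithmetic limits how far this can be pushed: once h is very small, rounding error in the subtraction f(x + h) - f(x) starts to dominate, so a numerical quotient only approximates the limit.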

30.3 What the notation is saying

Two notations for the derivative are standard. You will need both.

How to read derivative notation

  • Symbol: f'(x)

  • Reads as: “f prime of x”

  • Means: the instantaneous rate of change of f at input x (the tangent slope)

  • Use when: you’re working with a named function and a single clear variable

  • Common misread: treating f'(x) as the same thing as f(x). f'(x) is a new function whose outputs are rates, not values of f

  • Symbol: \dfrac{dy}{dx}

  • Reads as: “d y by d x”

  • Means: the derivative of y with respect to x (what changes, and what it changes with respect to)

  • Use when: units or multiple variables matter

  • Common misread: it is not a literal fraction, but it behaves like one in many algebraic manipulations

Prime notation (Lagrange): f'(x), read f prime of x. Clean and fast to write. Used when there is no ambiguity about which variable you’re differentiating with respect to.

Leibniz notation: \dfrac{dy}{dx}, read dy by dx. More verbose but carries structural information — it names both the output variable (y) and the input variable (x). When a problem involves several variables, or when you need to keep track of units, Leibniz notation makes the algebra clearer. Most science and engineering texts default to it.

Both mean the same thing. If y = f(x), then:

f'(x) = \frac{dy}{dx} = \frac{d}{dx}\bigl[f(x)\bigr]

The expression \dfrac{d}{dx}[\,\cdot\,] is an operator — it takes a function as input and produces its derivative as output. Think of d/dx as an instruction: differentiate with respect to x. This means all three notations describe the same rate-of-change object, but each notation highlights a different aspect of it.

The tangent line

The derivative f'(a) is the slope of the tangent line to the graph of f at the point (a, f(a)): the straight line through (a, f(a)) with slope f'(a), whose equation is y = f(a) + f'(a)(x - a). Near x = a, this line is the best linear approximation to f.
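As an illustration of "best linear approximation", the sketch below builds the tangent line y = f(a) + f'(a)(x - a) for f(x) = \sqrt{x} at a = 4, using f'(x) = 1/(2\sqrt{x}) from the power rule in 30.4.1, and compares it with the curve. The helper `tangent_line` is a hypothetical name chosen for this example.

```python
import math


def tangent_line(f, df, a):
    """Return the tangent line x -> f(a) + f'(a)(x - a) as a function."""
    fa, slope = f(a), df(a)
    return lambda x: fa + slope * (x - a)


# f(x) = sqrt(x) near a = 4, where f'(x) = 1 / (2 sqrt(x)).
line = tangent_line(math.sqrt, lambda x: 0.5 / math.sqrt(x), 4.0)

for d in (1.0, 0.1, 0.01):
    x = 4.0 + d
    error = math.sqrt(x) - line(x)
    # The error shrinks roughly like d**2, much faster than d itself.
    print(f"x = {x:<5}  sqrt(x) = {math.sqrt(x):.6f}  tangent = {line(x):.6f}  error = {error:.2e}")
```

Each time the distance d from a shrinks by a factor of 10, the error shrinks by roughly a factor of 100. That faster-than-linear shrinking is what singles out the tangent among all lines through (a, f(a)).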

Contrary to the informal picture of a line that only touches the curve, a tangent line can cross it: for f(x) = x^3, the tangent line at the origin is y = 0, and the curve passes from below it to above it there. What defines a tangent is its slope, not whether it crosses.

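This is why a function fails to be differentiable at a corner or cusp: the difference quotient has no single finite limit there, so no tangent line with a well-defined slope exists. For f(x) = |x| at x = 0, the slope is -1 from the left and +1 from the right.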
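A quick numerical check makes the corner of f(x) = |x| at x = 0 visible. The sketch below (the helper name `one_sided_slopes` is illustrative) computes secant slopes using a point just to the right and a point just to the left. For a derivative to exist, both must approach the same number; here they stay stuck at -1 and +1.

```python
def one_sided_slopes(f, x, h):
    """Secant slopes using a point just to the left and just to the right of x."""
    left = (f(x - h) - f(x)) / (-h)
    right = (f(x + h) - f(x)) / h
    return left, right


for h in (0.1, 0.001, 1e-6):
    left, right = one_sided_slopes(abs, 0.0, h)
    # For |x| at 0 these stay at -1 and +1 no matter how small h gets.
    print(f"h = {h:g}: left slope = {left}, right slope = {right}")
```

For a smooth function such as x^2 the two one-sided slopes close in on each other as h shrinks, which is the numerical signature of differentiability.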

30.4 The method — differentiation rules

Computing every derivative from the limit definition would be exhausting. Fortunately, the limit has been worked out once for each class of function, packaged into rules, and those rules can be applied mechanically. The skill in differentiation is knowing which rule to reach for.

30.4.1 Power rule

For any function f(x) = x^n, where n is a real number:

\frac{d}{dx}\bigl[x^n\bigr] = n x^{n-1}

This is the most-used rule in calculus. It applies to positive integers, negative integers, fractions — any real exponent. This means the power rule is the default tool for algebraic functions because it turns a hard limit into a quick pattern.

Derivation for positive integer n. Start from the limit definition:

f'(x) = \lim_{h \to 0} \frac{(x+h)^n - x^n}{h}

To make progress, we need to expand (x+h)^n. For small n you can do this by multiplying out, but the pattern for general n comes from the binomial theorem — covered in Vol 3 polynomials. For n = 2: (x+h)^2 = x^2 + 2xh + h^2. For n = 3: (x+h)^3 = x^3 + 3x^2h + 3xh^2 + h^3. In general:

(x+h)^n = x^n + n x^{n-1} h + \frac{n(n-1)}{2} x^{n-2} h^2 + \cdots + h^n

Subtract x^n and divide by h:

\frac{(x+h)^n - x^n}{h} = n x^{n-1} + \frac{n(n-1)}{2} x^{n-2} h + \cdots + h^{n-1}

Every term except the first contains at least one factor of h. As h \to 0, all those terms vanish:

f'(x) = \lim_{h \to 0} \left( n x^{n-1} + \text{terms with } h \right) = n x^{n-1}

This proves the rule for positive integers n. Notice what happened: the limit killed every term except the leading one, which is why the power rule is so clean. The derivative keeps the first-order change and discards the higher-order terms, which become negligible as h shrinks.
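The binomial argument can be checked exactly with integer arithmetic. The sketch below (hypothetical helper names; `math.comb` is the standard-library binomial coefficient, available from Python 3.8) lists the coefficients of the difference quotient ((x + h)^n - x^n)/h as a polynomial in h. Setting h = 0 keeps only the constant coefficient, which is n x^{n-1}.

```python
from math import comb


def quotient_coefficients(n, x):
    """Coefficients of ((x + h)**n - x**n) / h as a polynomial in h.

    Entry j is the coefficient of h**j, namely comb(n, j + 1) * x**(n - j - 1),
    read off from the binomial expansion of (x + h)**n (n a positive integer).
    """
    return [comb(n, j + 1) * x ** (n - j - 1) for j in range(n)]


def evaluate(coeffs, h):
    """Evaluate sum(coeffs[j] * h**j)."""
    return sum(c * h ** j for j, c in enumerate(coeffs))


coeffs = quotient_coefficients(3, 2)
print(coeffs)               # [12, 6, 1], i.e. 12 + 6h + h^2
print(evaluate(coeffs, 0))  # 12, which is n * x**(n - 1) = 3 * 2**2
```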

Examples:

f(x) | f'(x)
x^5 | 5x^4
x^{-2} | -2x^{-3}
\sqrt{x} = x^{1/2} | \tfrac{1}{2} x^{-1/2}
1 = x^0 | 0

The last row says: the derivative of any constant is zero. A constant function has zero slope everywhere — that makes sense.

30.4.2 Sum/difference and constant multiple rules

These follow directly from the limit definition, because the limit of a sum is the sum of the limits and a constant factor passes through a limit.

Constant multiple rule: \frac{d}{dx}\bigl[c \cdot f(x)\bigr] = c \cdot f'(x)

Sum/difference rule: \frac{d}{dx}\bigl[f(x) \pm g(x)\bigr] = f'(x) \pm g'(x)

Together these mean you can differentiate a polynomial term by term. For p(x) = 3x^4 - 2x^2 + 5x - 1:

p'(x) = 3 \cdot 4x^3 - 2 \cdot 2x + 5 \cdot 1 - 0 = 12x^3 - 4x + 5

This means linearity lets you break a large derivative into smaller pieces and differentiate each piece separately.
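Term-by-term differentiation is mechanical enough to automate. The sketch below assumes a polynomial is stored as a list of coefficients, lowest degree first (a representation chosen for this example); the power and constant multiple rules then turn coefficient a of x^k into k·a for x^{k-1}.

```python
def poly_derivative(coeffs):
    """Differentiate a polynomial term by term.

    coeffs[k] is the coefficient of x**k (lowest degree first), so the
    term a*x**k becomes k*a*x**(k - 1) by the power and constant multiple rules.
    The constant term's derivative is 0, so that slot is dropped.
    """
    return [k * a for k, a in enumerate(coeffs)][1:]


# p(x) = 3x^4 - 2x^2 + 5x - 1, stored lowest degree first.
p = [-1, 5, -2, 0, 3]
print(poly_derivative(p))  # [5, -4, 0, 12], i.e. 12x^3 - 4x + 5
```

A constant polynomial such as [7] returns the empty list, which here stands for the zero polynomial.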

30.4.3 Product rule

The derivative of a product is not the product of the derivatives. That failure is easy to verify: \tfrac{d}{dx}[x \cdot x] = \tfrac{d}{dx}[x^2] = 2x, but \tfrac{d}{dx}[x] \times \tfrac{d}{dx}[x] = 1 \times 1 = 1. Different answers.

The correct rule: if f and g are both differentiable, then:

\frac{d}{dx}\bigl[f(x)\cdot g(x)\bigr] = f'(x)\cdot g(x) + f(x)\cdot g'(x)

A useful mnemonic: derivative of first times second, plus first times derivative of second. The rule accounts for how both factors are changing simultaneously. This means a product changes for two reasons at once: the first factor changes and the second factor changes.

Check: \tfrac{d}{dx}[x \cdot x] = 1 \cdot x + x \cdot 1 = 2x. Correct.
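One way to see that the product rule is mechanical is forward-mode automatic differentiation with dual numbers, a technique that is not part of this chapter and is used here only as an illustration. Each number carries a value and a derivative, and multiplication is defined by the product rule. The `Dual` class below is a minimal, hypothetical sketch supporting only addition and multiplication.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative: val + der * eps, where eps**2 = 0."""
    val: float
    der: float

    def __add__(self, other):
        # Sum rule: (f + g)' = f' + g'
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        # Product rule: (f g)' = f' g + f g'
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)


x = Dual(5.0, 1.0)  # the input x at x = 5, with dx/dx = 1
y = x * x           # x^2
print(y)            # Dual(val=25.0, der=10.0): d/dx[x * x] = 2x = 10 at x = 5
```

Expanding (a + a'ε)(b + b'ε) and dropping ε² gives ab + (a'b + ab')ε, so the derivative slot follows the product rule automatically.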

30.4.4 Chain rule

The chain rule handles composite functions — functions of functions. If y = f(u) and u = g(x), so that y = f(g(x)), then:

\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}

In prime notation: \bigl(f \circ g\bigr)'(x) = f'\bigl(g(x)\bigr) \cdot g'(x).

This is the most important and most commonly misapplied rule in calculus. The key skill is identifying the outer and inner functions before differentiating. This means the chain rule measures how a change in x affects the inner function, then how that inner change affects the outer one.

Example. Differentiate y = \sin(x^2).

  • Outer function: f(u) = \sin(u), so \dfrac{dy}{du} = \cos(u).
  • Inner function: g(x) = x^2, so \dfrac{du}{dx} = 2x.
  • Chain rule: \dfrac{dy}{dx} = \cos(x^2) \cdot 2x = 2x\cos(x^2).

The notation \dfrac{dy}{dx} = \dfrac{dy}{du} \cdot \dfrac{du}{dx} looks as if the du’s cancel, and you can use it as if they do. The justification is a limit argument rather than literal fraction cancellation: the average rate \Delta y / \Delta x factors as (\Delta y / \Delta u)(\Delta u / \Delta x), and each factor tends to its derivative as \Delta x \to 0 (a careful proof handles the case \Delta u = 0 separately). The practical result is the same: set up the product and compute. Composite functions must be differentiated layer by layer.
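The chain rule can be written as a function that builds a derivative from its layers. The sketch below (hypothetical helper names `chain` and `central_difference`) assembles the derivative of y = \sin(x^2) from f'(u) = \cos u, g(x) = x^2 and g'(x) = 2x, then compares it with a symmetric difference quotient at an arbitrary point x = 1.3.

```python
import math


def chain(outer_prime, inner, inner_prime):
    """Return x -> f'(g(x)) * g'(x), the derivative of f(g(x))."""
    return lambda x: outer_prime(inner(x)) * inner_prime(x)


# y = sin(x^2): outer f(u) = sin u with f'(u) = cos u, inner g(x) = x^2 with g'(x) = 2x.
dy_dx = chain(math.cos, lambda x: x * x, lambda x: 2 * x)


def central_difference(f, x, h=1e-6):
    """Numerical slope estimate, used here only as an independent check."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 1.3
print(dy_dx(x0))                                          # 2 x0 cos(x0^2)
print(central_difference(lambda x: math.sin(x * x), x0))  # should agree closely
```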

30.4.5 Derivatives of key functions

These are stated here without proof. Each can be derived from the limit definition, but the derivations rely on special limits such as \lim_{h \to 0} \frac{\sin h}{h} = 1 and \lim_{h \to 0} \frac{e^h - 1}{h} = 1, whose proofs depend on how sine and e are defined.

Function | Derivative | Read the derivative as
\sin x | \cos x | Cosine is the rate of change of sine: maximum slope at x = 0, zero slope at the peaks
\cos x | -\sin x | Sine with a sign flip: cosine is decreasing wherever sine is positive, and falls fastest at x = \pi/2, where sine peaks
e^x | e^x | The exponential is its own derivative: the rate of growth equals the current value
\ln x | \dfrac{1}{x} | Valid for x > 0, the domain of \ln x. For x < 0, use \ln\lvert x\rvert, whose derivative is also \dfrac{1}{x}

The self-referential property of e^x, that it equals its own derivative, is why e appears everywhere in physics and engineering. Any system whose rate of change is proportional to its current value satisfies \dfrac{dy}{dx} = ky for some constant k, and its solutions are the exponentials y = Ce^{kx}; the case k = 1 is \dfrac{dy}{dx} = y. The table of key derivatives is really a table of standard change behaviours that recur in many different contexts.
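The table can be spot-checked numerically. The sketch below compares each claimed derivative with a symmetric difference quotient (f(x + h) - f(x - h))/(2h) at a few sample points; the sample points, step h and tolerance are arbitrary choices, and a passing check is evidence, not a proof.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient: an estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Each entry: (name, function, derivative claimed in the table).
table = [
    ("sin x", math.sin, math.cos),
    ("cos x", math.cos, lambda x: -math.sin(x)),
    ("e^x", math.exp, math.exp),
    ("ln|x|", lambda x: math.log(abs(x)), lambda x: 1 / x),
]

for name, f, df in table:
    for x in (-2.0, 0.5, 1.7):
        assert abs(central_difference(f, x) - df(x)) < 1e-6, (name, x)
print("all table entries agree with the numerical slopes")
```

The ln|x| row is checked at x = -2.0 as well, which confirms that 1/x is also the right derivative on the negative side.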

30.5 Higher derivatives

The derivative of f is itself a function. You can differentiate it again. The second derivative f''(x) (or \dfrac{d^2y}{dx^2}) is the rate of change of the rate of change. This means the first derivative tells you how steep the graph is, and the second derivative tells you how that steepness is itself changing.

The clearest physical example: if s(t) is position at time t, then s'(t) is velocity and s''(t) is acceleration. Acceleration is how quickly velocity is changing.

The second derivative also carries geometric information:

  • f''(x) > 0: the curve is concave up at x — the slope is increasing, the curve bends upward like a bowl.
  • f''(x) < 0: the curve is concave down at x — the slope is decreasing, the curve bends downward like an arch.

This is what the second derivative test uses.
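The second derivative can also be estimated directly from function values. The sketch below uses the central second difference (f(x + h) - 2f(x) + f(x - h))/h^2 on a hypothetical position function s(t) = t^3 - 6t, whose exact second derivative is s''(t) = 6t, and reports the concavity at a few times. The step h and the small threshold for "no clear sign" are arbitrary choices.

```python
def second_difference(f, x, h=1e-3):
    """Estimate f''(x) from (f(x + h) - 2 f(x) + f(x - h)) / h^2."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def position(t):
    # Hypothetical position s(t) = t^3 - 6t, so s''(t) = 6t exactly.
    return t ** 3 - 6 * t


for t in (-1.0, 0.0, 2.0):
    acc = second_difference(position, t)
    shape = "concave up" if acc > 1e-6 else "concave down" if acc < -1e-6 else "no clear sign"
    print(f"t = {t}: s''(t) ~ {acc:.4f} (exact {6 * t}), {shape}")
```

Read as motion, a negative s'' at t = -1 means the velocity is decreasing there, and a positive s'' at t = 2 means it is increasing.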

30.6 Applications: optimisation

One of the most immediate uses of calculus is finding where a function reaches its maximum or minimum value.

Critical points. If f is differentiable and has a local maximum or minimum at x = c, then the tangent line at c must be horizontal — a tilted tangent would mean the function is still rising or falling, so c couldn’t be an extreme point. Therefore:

f'(c) = 0 \implies c \text{ is a critical point}

Not every critical point is a maximum or minimum. It could be a stationary point of inflection, like x = 0 for f(x) = x^3: the slope there is momentarily zero, but the function keeps increasing on both sides. The second derivative helps distinguish the cases. Critical points are candidates for extrema, not guarantees.

Second derivative test. If f'(c) = 0:

  • If f''(c) > 0: concave up at c, so the tangent is at the bottom of a bowl — local minimum.
  • If f''(c) < 0: concave down at c, so the tangent is at the top of an arch — local maximum.
  • If f''(c) = 0: the test is inconclusive. For example, x^4, -x^4 and x^3 all have f'(0) = f''(0) = 0, yet x = 0 is a minimum, a maximum and neither, respectively. In that case, fall back on the first derivative: check the sign of f' on either side of c. A change from + to - means a local maximum, from - to + a local minimum, and no change means neither.

The second derivative test reads the local shape of the graph to classify what the first derivative found.

Worked procedure. Find the local extrema of f(x) = x^3 - 6x^2 + 9x + 1.

Step 1. Differentiate: f'(x) = 3x^2 - 12x + 9.

Step 2. Set f'(x) = 0: 3x^2 - 12x + 9 = 0 \implies x^2 - 4x + 3 = 0 \implies (x-1)(x-3) = 0.

Critical points: x = 1 and x = 3.

Step 3. Compute f''(x) = 6x - 12.

Step 4. Test each critical point:

  • f''(1) = 6(1) - 12 = -6 < 0: local maximum at x = 1. Value: f(1) = 1 - 6 + 9 + 1 = 5.
  • f''(3) = 6(3) - 12 = 6 > 0: local minimum at x = 3. Value: f(3) = 27 - 54 + 27 + 1 = 1.

The function rises to a local maximum of 5 at x = 1, falls to a local minimum of 1 at x = 3, then rises again.
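The four steps can be packaged for any cubic. The sketch below (a hypothetical helper, assuming the leading coefficient is nonzero) solves f'(x) = 0 with the quadratic formula and applies the second derivative test, reproducing the result above.

```python
import math


def classify_cubic_extrema(a, b, c, d):
    """Find and classify the critical points of f(x) = a x^3 + b x^2 + c x + d.

    Solves f'(x) = 3a x^2 + 2b x + c = 0 with the quadratic formula, then
    applies the second derivative test with f''(x) = 6a x + 2b. Assumes a != 0.
    """
    def f(x):
        return a * x**3 + b * x**2 + c * x + d

    disc = (2 * b) ** 2 - 4 * (3 * a) * c
    if disc < 0:
        return []  # f'(x) is never zero, so there are no critical points
    # A set removes the duplicate root when disc == 0.
    roots = sorted({(-2 * b + s * math.sqrt(disc)) / (6 * a) for s in (-1, 1)})
    result = []
    for x in roots:
        second = 6 * a * x + 2 * b
        if second > 0:
            kind = "local minimum"
        elif second < 0:
            kind = "local maximum"
        else:
            kind = "inconclusive"
        result.append((x, f(x), kind))
    return result


print(classify_cubic_extrema(1, -6, 9, 1))
# [(1.0, 5.0, 'local maximum'), (3.0, 1.0, 'local minimum')]
```

For f(x) = x^3, the call classify_cubic_extrema(1, 0, 0, 0) returns a single critical point at 0 labelled inconclusive, which is exactly the case the first derivative test has to settle.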

30.7 Worked examples


Example 1 (Computing/data). Differentiate f(x) = 3x^4 - 2x^2 + 5x - 1.

Apply the power rule term by term, with the constant multiple and sum rules:

f'(x) = 3 \cdot 4x^3 - 2 \cdot 2x^1 + 5 \cdot 1 - 0

f'(x) = 12x^3 - 4x + 5

This is a polynomial: differentiating a polynomial of degree n \ge 1 always produces a polynomial of degree n - 1. The constant term -1 vanishes because a constant has zero slope everywhere. Polynomial derivatives keep the same algebraic structure, one degree lower.


Example 2 (Science). Differentiate f(x) = x^2 \sin(x).

This is a product. Write the first factor as u(x) = x^2 and the second as v(x) = \sin(x), so that u'(x) = 2x and v'(x) = \cos(x). Apply the product rule:

\frac{d}{dx}\bigl[x^2 \sin x\bigr] = 2x \cdot \sin x + x^2 \cdot \cos x

= 2x\sin x + x^2 \cos x

Functions of this shape describe oscillations with a growing envelope: x^2 \sin x swings between the curves y = x^2 and y = -x^2, so its amplitude grows quadratically in x. The product rule keeps track of both the oscillation and the changing amplitude.


Example 3 (Engineering). Differentiate f(x) = e^{3x^2}.

This is a composite function. Identify the outer and inner functions:

  • Outer: f(u) = e^u, which has derivative e^u.
  • Inner: g(x) = 3x^2, which has derivative 6x.

Chain rule:

\frac{d}{dx}\bigl[e^{3x^2}\bigr] = e^{3x^2} \cdot 6x = 6x\,e^{3x^2}

Exponentials of a quadratic, such as the Gaussian e^{-x^2}, appear in probability distributions, heat diffusion kernels, and signal-processing windows, and the chain rule is the practical way to differentiate them. The outer exponential survives, while the inner function contributes an extra factor, here 6x.
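The most common chain-rule slip is to differentiate the outer layer and forget the inner factor. The sketch below compares the correct derivative 6x e^{3x^2}, the slip e^{3x^2}, and a numerical slope at x = 0.5 (an arbitrary sample point); the function names are illustrative.

```python
import math


def f(x):
    return math.exp(3 * x * x)


def correct(x):
    # Chain rule: outer derivative e^u at u = 3x^2, times inner derivative 6x.
    return math.exp(3 * x * x) * 6 * x


def forgot_inner(x):
    # A common slip: differentiating the outer layer only.
    return math.exp(3 * x * x)


def central_difference(g, x, h=1e-6):
    """Symmetric difference quotient used as an independent check."""
    return (g(x + h) - g(x - h)) / (2 * h)


x0 = 0.5
print(f"numerical slope   {central_difference(f, x0):.6f}")
print(f"chain rule        {correct(x0):.6f}")
print(f"outer layer only  {forgot_inner(x0):.6f}")
```

At x = 0.5 the missing inner factor is 6x = 3, so the slip is off by a factor of 3.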


Example 4 (Real-world optimisation). A farmer has 80 m of fencing and wants to enclose a rectangular paddock. One side is a riverbank, which needs no fence. What dimensions maximise the enclosed area?

Set up. Let the side perpendicular to the river have length x (metres). Two of these sides are needed. The side parallel to the river, call it y, uses the remaining fencing:

2x + y = 80 \implies y = 80 - 2x

The area is:

A(x) = x \cdot y = x(80 - 2x) = 80x - 2x^2

Differentiate and find critical points:

A'(x) = 80 - 4x

Setting A'(x) = 0: 80 - 4x = 0 \implies x = 20.

Classify: A''(x) = -4 < 0 everywhere, so x = 20 is a local (and global) maximum.

Dimensions: x = 20 m, y = 80 - 2(20) = 40 m.

Maximum area: A = 20 \times 40 = 800\text{ m}^2.

The optimal paddock runs twice as long along the river as it extends away from it. The same ratio holds for any amount of fencing L: the area x(L - 2x) is maximised at x = L/4, giving y = L/2 = 2x. Optimisation turns a practical design question into a derivative question about where increase changes to decrease.
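A brute-force search gives an independent check on the calculus. The sketch below evaluates A(x) = x(80 - 2x) on a 0.01 m grid over the feasible range 0 < x < 40 (the grid spacing is an arbitrary choice) and picks the largest area.

```python
def area(x, fencing=80.0):
    """Paddock area when each side perpendicular to the river has length x."""
    y = fencing - 2 * x  # side parallel to the river
    return x * y


# Check the calculus answer by brute force on a 0.01 m grid over 0 < x < 40.
candidates = [i / 100 for i in range(1, 4000)]
best_x = max(candidates, key=area)
print(best_x, area(best_x))  # 20.0 800.0
```

The search only confirms the answer on a finite grid; the derivative argument is what shows that x = 20 is the exact maximiser.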

30.8 Where this goes

This chapter moved from limits to a full toolkit for describing change. The next chapter asks the inverse question: once a rate is known, how do we recover the total change it produces?

Integral calculus: Integration inverts differentiation. Where this chapter asked “given a function, find its rate of change,” the next chapter asks “given the rate of change, recover the original function.” The Fundamental Theorem of Calculus, the deepest result in the subject, shows that these two operations are inverses of each other. Every calculation of area, volume, work, and accumulated change runs through it.

Ordinary differential equations (Vol 7): A differential equation is a relationship between a function and its own derivative — an equation that says something like “the rate of change of this quantity is proportional to the quantity itself.” Everything in this chapter is prerequisite: you need to know what a derivative is, how to compute one, and how the chain rule works, before a differential equation makes sense. The entire Vol 7 ODE sequence begins here.

Where this shows up

  • Velocity and acceleration. If s(t) is the position of an object at time t, then s'(t) is its velocity and s''(t) is its acceleration. Every problem in Newtonian mechanics is stated in these terms.
  • Marginal analysis. In economics, marginal cost is the derivative of total cost with respect to quantity. Profit is maximised where marginal revenue equals marginal cost — a critical point of the profit function.
  • Newton’s law of cooling. The rate of heat loss of an object is proportional to the difference between the object’s temperature and the ambient temperature. That “rate of heat loss” is a derivative. The law is \tfrac{dT}{dt} = -k(T - T_\text{ambient}) — a differential equation that lives in Vol 7.
  • Gradient descent. The optimisation algorithm that trains neural networks computes the derivative of the loss function with respect to each model parameter, then nudges every parameter in the direction that decreases the loss fastest (steepest descent). Computing those derivatives efficiently, by applying the chain rule layer by layer backwards through a deep network, is called backpropagation.
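A one-parameter toy version of gradient descent shows the idea. The sketch below minimises a hypothetical loss L(w) = (w - 3)^2 using its derivative L'(w) = 2(w - 3); the learning rate and step count are arbitrary illustrative choices. Real training uses many parameters and computes the derivatives by backpropagation, which this sketch does not attempt.

```python
def gradient_descent(df, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the derivative: x <- x - learning_rate * f'(x)."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * df(x)
    return x


# Hypothetical loss L(w) = (w - 3)^2 with derivative L'(w) = 2(w - 3); minimum at w = 3.
w = gradient_descent(lambda w: 2 * (w - 3), x0=0.0)
print(w)  # very close to 3
```

Each step here multiplies the distance to the minimum by 1 - 2 × 0.1 = 0.8, so after 100 steps it is tiny. A learning rate above 1 would make that factor larger than 1 in size, and the iterates would diverge.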

30.9 What you can do now

You can now move among three connected views of differentiation: a limit of average change, the slope of a tangent line, and a computational tool for rates, turning points, and optimisation. You should be able to differentiate standard expressions, interpret the sign of a derivative in context, and use first and second derivatives to reason about local behaviour. The next chapter reverses the question: given a rate, recover the total accumulation.

30.10 Exercises

These are puzzles. Each has a clean answer, but the interesting part is choosing the right rule and executing the steps carefully before you reach for the result.

Exercise 1. Differentiate f(x) = 4x^5 - 3x^3 + 7x - 2 using the power rule.


Exercise 2. Differentiate f(x) = x^3 \cos(x) using the product rule.


Exercise 3. Differentiate f(x) = (2x^3 + 1)^5 using the chain rule.


Exercise 4. Find all critical points of f(x) = x^3 - 3x^2 - 9x + 2 and classify each as a local maximum, local minimum, or neither.


Exercise 5. A manufacturer’s total cost in dollars for producing q units per day is C(q) = q^3 - 6q^2 + 15q + 50. At what production level is the marginal cost minimised? (Marginal cost is C'(q).)


Exercise 6. A stone is dropped from a bridge. Its height above the water at time t seconds is h(t) = 45 - 4.9t^2 metres. At what speed (in m/s) is it falling at the instant it hits the water?