And by “earlier,” I mean as the very first thing. Let me explain.
I was watching a 3Blue1Brown video on implicit differentiation. This is one area of Calculus that had faded almost into nothingness in my memory. As always, the video clearly explained the topic and I refreshed my memory on the area, but something didn’t quite sit right with me. In particular, the problem where he introduces a function S(x, y) felt artificial, and he also never quite explains where the dx and dy come from.
I looked for a Khan Academy video on the same subject, but found it to be substantially worse. It just talks about the chain rule and how to apply it, never quite explaining why it works. And like so much of Calculus education, it plays very fast and loose with dx and dy. Are they operators? Functions? Infinitesimals? Something else? Can I treat them as a fraction? An algebraic variable? And the answer is always that sometimes it’s like a fraction, sometimes not, and you’ll just have to figure out when certain things are allowed. And is the chain rule *fundamental* to the technique, or is there something more general going on (well, I already knew from the 3Blue1Brown video that it wasn’t fundamental, but why the difference)?
It got me thinking. I’ve always liked the infinitesimal approach to Calculus, so let’s apply it here. Using a concrete example, let’s differentiate y = log(x). First, we can single out another point on the curve as:
y+dy = log(x+dx)
Note that I’m not even using infinitesimals yet. I’m just stating the obvious fact that (x+dx, y+dy) is another point on the curve. Take both our equations and exponentiate both sides:
e[sup]y[/sup] = x
e[sup]y+dy[/sup] = x+dx
We ultimately want to express an answer as dy/dx. Let’s subtract the two equations and see what happens:
e[sup]y+dy[/sup] - e[sup]y[/sup] = dx
e[sup]y[/sup](e[sup]dy[/sup] - 1) = dx
Again, I haven’t even used infinitesimals yet, just algebra. But for the next step, I have to assume that dy is small. What is an approximation of e[sup]dy[/sup]? Well, we know the Taylor series for e[sup]x[/sup] starts with 1+x (Was that cheating, since the Taylor series requires differentiating e[sup]x[/sup]? Not really, since the very definition of e is the number for which the slope of e[sup]x[/sup] equals e[sup]x[/sup] itself. So the second term of the Taylor series must have a coefficient of 1).
Therefore, we have:
e[sup]y[/sup](1 + dy - 1) = dx
e[sup]y[/sup] dy = dx
But we know that e[sup]y[/sup]=x. And so we have:
dy/dx = 1/x
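(A quick sanity check, not part of the argument: a few lines of Python that nudge x by ever-smaller values of dx and watch dy/dx close in on 1/x. The sample point x = 3 and the step sizes are just arbitrary choices of mine.)

```python
import math

# For y = log(x), the ratio dy/dx should approach 1/x as dx shrinks.
x = 3.0
for dx in (1e-2, 1e-4, 1e-6, 1e-8):
    dy = math.log(x + dx) - math.log(x)   # (x+dx, y+dy) is another point on the curve
    print(f"dx={dx:.0e}  dy/dx={dy/dx:.10f}  1/x={1/x:.10f}")
```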
Easy. There was only one point where I made any assumption that dy was small, and that was easily justified. And in fact I could have used any number of other justifications for the same argument.
Also note that I didn’t need anything else in Calculus; certainly not the chain rule, nor even the standard “(f(x+h) - f(x))/h”. Needing so little, this felt more fundamental than the usual methods. And it shows you exactly what dx and dy are and where they came from: they’re just variables that you introduced in order to specify a different point on the curve. And it’s obvious when we’re doing completely ordinary algebraic ops and when we’re dropping terms because they are too small to care about.
To my mind, this works better than the usual (f(x+h) - f(x))/h approach. That’s similar, but the early divide feels awkward. If you make implicit differentiation the starting point, the divide only comes at the very end when you’re happy with the results. And then you don’t have to teach implicit differentiation as a separate thing, since it’s actually the most fundamental thing and you can always fall back to it in a bind.
3Blue1Brown did a more complex example, but to my mind it became even less clear why and when the dx and dy come into play. So let me try it again using the infinitesimal style:
sin(x)y[sup]2[/sup] = x
sin(x+dx)(y+dy)[sup]2[/sup] = x+dx
sin(x+dx)(y+dy)[sup]2[/sup] - sin(x)y[sup]2[/sup] = dx
(sin(x) + dxcos(x))(y[sup]2[/sup] + 2ydy + dy[sup]2[/sup]) - sin(x)y[sup]2[/sup] = dx
(sin(x)y[sup]2[/sup] + dxcos(x)y[sup]2[/sup] + 2sin(x)ydy + [small terms]) - sin(x)y[sup]2[/sup] = dx
dxcos(x)y[sup]2[/sup] + 2sin(x)ydy = dx
2sin(x)ydy = dx - dxcos(x)y[sup]2[/sup]
2sin(x)ydy = dx(1- cos(x)y[sup]2[/sup])
dy/dx = (1- cos(x)y[sup]2[/sup])/(2sin(x)y)
y[sup]2[/sup] = x/sin(x) [rearrange eq 1]
dy/dx = (1 - cos(x)x/sin(x))/(2sin(x)y)
dy/dx = (sin(x) - cos(x)x)/(2sin[sup]2[/sup](x)y)
And there’s one remaining y. But it needs a square root, so we know there are two sets of curves, just as there would have been in the original:
dy/dx = (sin(x) - cos(x)x)/(2sin[sup]2[/sup](x)sqrt(x/sin(x)))
dy/dx = (sin(x) - cos(x)x)/(-2sin[sup]2[/sup](x)sqrt(x/sin(x)))
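(Another quick numerical spot-check, not part of the argument: the Python below compares (y(x+dx) - y(x))/dx against the formula on the positive branch. The sample point x = 1 and the step size are arbitrary choices of mine.)

```python
import math

# Spot-check dy/dx for sin(x)*y**2 = x on the positive branch y = sqrt(x/sin(x)).
def y(x):
    return math.sqrt(x / math.sin(x))

x, dx = 1.0, 1e-7
numeric = (y(x + dx) - y(x)) / dx
formula = (math.sin(x) - math.cos(x) * x) / (2 * math.sin(x) ** 2 * y(x))
print(numeric, formula)   # the two values should agree to several decimal places
```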
Again, very straightforward. I didn’t even need the product rule! I only needed one tiny bit of other calculus (that (sin(x))’ = cos(x)), and two assumptions about the infinitesimals: that f(x+dx) =~ f(x) + dx*f’(x), and that any total power of a dx or dy above 1 can be discarded.
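Those two rules are mechanical enough that you can encode them directly. Here’s a minimal Python sketch of my own (essentially dual-number-style bookkeeping, not anything from the videos): a value carries its dx coefficient along with it, multiplication silently drops the dx[sup]2[/sup] term, and log applies f(x+dx) =~ f(x) + dx*f’(x). Running it on log(x) reproduces dy/dx = 1/x.

```python
import math

class Tiny:
    """A value a + b*dx, where anything of order dx**2 or higher gets dropped."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        return Tiny(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a1 + b1*dx)(a2 + b2*dx) = a1*a2 + (a1*b2 + b1*a2)*dx + [dx**2 term, dropped]
        return Tiny(self.a * other.a, self.a * other.b + self.b * other.a)

def log(t):
    # f(x+dx) =~ f(x) + dx*f'(x), with f = log and f'(x) = 1/x
    return Tiny(math.log(t.a), t.b / t.a)

x = Tiny(3.0, 1.0)            # represents x + dx at x = 3
print(log(x).b, 1 / 3.0)      # dy/dx for y = log(x): both print 0.333...
print((x * x).b)              # dy/dx for y = x*x: prints 6.0, i.e. 2x at x = 3
```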
Now, obviously there are limits to this approach (no pun intended), because eventually you want to manipulate things in a higher order fashion, with the chain rule/product rule/etc. And then you get to integrals where you have to think even more abstractly. But why isn’t implicit differentiation taught immediately using an approach like this? It makes the manipulations very concrete and puts differentiation on a broader base right from the start.