Why isn't implicit differentiation using infinitesimals taught earlier?

And by “earlier,” I mean as the very first thing. Let me explain.

I was watching a 3Blue1Brown video on implicit differentiation. This is one area of Calculus that had faded almost into nothingness in my memory. As always, the video clearly explained the topic and I refreshed my memory on the area, but something didn’t quite sit right with me. In particular, the problem where he introduces a function S(x, y) felt artificial, and he also never quite explains where the dx and dy come from.

I looked for a Khan Academy video on the same subject, but found it to be substantially worse. It just talks about the chain rule and how to apply it, never quite explaining why it works. And like so much of Calculus education, it plays very fast and loose with dx and dy. Are they operators? Functions? Infinitesimals? Something else? Can I treat them as a fraction? An algebraic variable? And the answer is always that sometimes it’s like a fraction, sometimes not, and you’ll just have to figure out when certain things are allowed. And is the chain rule *fundamental* to the technique or is there something more general going on (well, I already knew from the 3Blue1Brown video that it wasn’t fundamental, but why the difference)?

It got me thinking. I’ve always liked the infinitesimal approach to Calculus, so let’s apply it here. Using a concrete example, let’s differentiate y = log(x). First, we can single out another point on the curve as:
y+dy = log(x+dx)

Note that I’m not even using infinitesimals yet. I’m just stating the obvious fact that (x+dx, y+dy) is another point on the curve. Take both our equations and exponentiate both sides:
e[sup]y[/sup] = x
e[sup]y+dy[/sup] = x+dx

We ultimately want to express an answer as dy/dx. Let’s subtract the two equations and see what happens:
e[sup]y+dy[/sup] - e[sup]y[/sup] = dx
e[sup]y[/sup](e[sup]dy[/sup] - 1) = dx

Again, I haven’t even used infinitesimals yet, just algebra. But for the next step, I have to assume that dy is small. What is an approximation of e[sup]dy[/sup]? Well, we know the Taylor series for e[sup]x[/sup] starts with 1+x. (Was that cheating, since the Taylor series requires differentiating e[sup]x[/sup]? Not really, since the defining property of e is that the slope of e[sup]x[/sup] equals e[sup]x[/sup] itself, so the second term of the Taylor series must have a coefficient of 1.)

Therefore, we have:
e[sup]y[/sup](1 + dy - 1) = dx
e[sup]y[/sup] dy = dx

But we know that e[sup]y[/sup]=x. And so we have:
dy/dx = 1/x

Easy. There was only one point where I made any assumption that dy was small, and that was easily justified. And in fact I could have used any number of other justifications for the same argument.
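If you want to sanity-check that numerically, here’s a quick sketch in Python (dx here is just a small ordinary number standing in for the infinitesimal, and x = 2 is an arbitrary choice of mine):

```python
import math

# Pick a point on y = log(x) and a nearby point, exactly as in the derivation
x = 2.0
dx = 1e-6
dy = math.log(x + dx) - math.log(x)

print(dy / dx)  # ~0.4999999, matching 1/x = 0.5 at x = 2
print(1 / x)
```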

Also note that I didn’t need anything else in Calculus; certainly not the chain rule, nor even the standard “(f(x+h) - f(x))/h”. Needing so little, this felt more fundamental than the usual methods. And it shows you exactly what dx and dy are and where they came from: they’re just variables that you introduced in order to specify a different point on the curve. And it’s obvious when we’re doing completely ordinary algebraic ops and when we’re dropping terms because they are too small to care about.

To my mind, this works better than the usual (f(x+h) - f(x))/h approach. That’s similar, but the early divide feels awkward. If you make implicit differentiation the starting point, the divide only comes at the very end when you’re happy with the results. And then you don’t have to teach implicit differentiation as a separate thing, since it’s actually the most fundamental thing and you can always fall back to it in a bind.

3Blue1Brown did a more complex example, but to my mind it became even less clear why and when the dx and dy come into play. So let me try it again using the infinitesimal style:
sin(x)y[sup]2[/sup] = x
sin(x+dx)(y+dy)[sup]2[/sup] = x+dx
sin(x+dx)(y+dy)[sup]2[/sup] - sin(x)y[sup]2[/sup] = dx
(sin(x) + dx*cos(x))(y[sup]2[/sup] + 2y*dy + dy[sup]2[/sup]) - sin(x)y[sup]2[/sup] = dx
(sin(x)y[sup]2[/sup] + dx*cos(x)y[sup]2[/sup] + 2sin(x)y*dy + [small terms]) - sin(x)y[sup]2[/sup] = dx
dx*cos(x)y[sup]2[/sup] + 2sin(x)y*dy = dx
2sin(x)y*dy = dx - dx*cos(x)y[sup]2[/sup]
2sin(x)y*dy = dx(1 - cos(x)y[sup]2[/sup])
dy/dx = (1 - cos(x)y[sup]2[/sup])/(2sin(x)y)
y[sup]2[/sup] = x/sin(x) [rearrange eq 1]
dy/dx = (1 - x*cos(x)/sin(x))/(2sin(x)y)
dy/dx = (sin(x) - x*cos(x))/(2sin[sup]2[/sup](x)*y)

And there’s one remaining y. But it needs a square root, so we know there are two sets of curves, just as there would have been in the original:
dy/dx = (sin(x) - x*cos(x))/(2sin[sup]2[/sup](x)*sqrt(x/sin(x)))
dy/dx = (sin(x) - x*cos(x))/(-2sin[sup]2[/sup](x)*sqrt(x/sin(x)))

Again, very straightforward. I didn’t even need the product rule! I only needed one tiny bit of other calculus (that (sin(x))’ = cos(x)), and two assumptions about the infinitesimals: that f(x+dx) =~ f(x) + dx*f’(x), and that any total power of a dx or dy above 1 can be discarded.
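For anyone who wants to check that answer with plain numbers, here’s a quick spot check (Python; I’m arbitrarily picking the positive branch y = sqrt(x/sin(x)) and the point x = 1):

```python
import math

def y_of(x):
    # The positive branch of sin(x)y^2 = x, i.e. y = sqrt(x/sin(x))
    return math.sqrt(x / math.sin(x))

x, dx = 1.0, 1e-7
numeric = (y_of(x + dx) - y_of(x)) / dx  # slope straight from the curve

y = y_of(x)
formula = (1 - math.cos(x) * y**2) / (2 * math.sin(x) * y)

print(numeric, formula)  # both ~0.195
```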

Now, obviously there are limits to this approach (no pun intended), because eventually you want to manipulate things in a higher order fashion, with the chain rule/product rule/etc. And then you get to integrals where you have to think even more abstractly. But why isn’t implicit differentiation taught immediately using an approach like this? It makes the manipulations very concrete and puts differentiation on a broader base right from the start.

I have no idea what any of that means and the last time I had to do anything beyond basic arithmetic by hand was in math class.

I would love a more intuitive way to grok differentiation and/or integration.
I don’t think you have found it though. I failed using this process even for y=x^2.

I probably started a little more advanced than I needed to. Here are the steps for your example:
y = x[sup]2[/sup]
y+dy = (x+dx)[sup]2[/sup]

Expand the second:
y+dy = x[sup]2[/sup] + 2x*dx + dx[sup]2[/sup]

Remove high order terms:
y+dy = x[sup]2[/sup] + 2x*dx

Subtract the first:
dy = 2x*dx

And finally, divide to get dy/dx on the left:
dy/dx = 2x

We can easily generalize to x[sup]N[/sup]:
y+dy = (x+dx)[sup]N[/sup] = x[sup]N[/sup] + N*x[sup]N-1[/sup]*dx + [small terms]

Subtract y=x[sup]N[/sup] and remove small terms:
dy = N*x[sup]N-1[/sup]*dx

Divide:
dy/dx = N*x[sup]N-1[/sup]
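And a quick numerical spot check of that general result (Python; N = 5 and x = 2 are arbitrary choices):

```python
N, x, dx = 5, 2.0, 1e-7
numeric = ((x + dx)**N - x**N) / dx  # slope straight from the curve
exact = N * x**(N - 1)
print(numeric, exact)  # both ~80.0
```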

Of course, for simple derivatives, this is pretty much the same as the ordinary technique. But it gets students used to putting an infinitesimal dy on the y terms, wherever they may be: on the left, on the right, inside a function, or whatever. Your example could have been stated as “x[sup]2[/sup] - y = 0” and it would work just as well, only requiring students to move things left and right at the end when it is simpler.

I’m not 100% sure which techniques the OP is talking about, since I’ve never studied derivatives in English, but I think part of the issue here is that, AIUI, in the US integrals go before derivatives. That’s why you can start directly on chains: in our case, we did derivatives first.

So for us it went:

  1. limits.
  2. using limits to find the basic derivatives (I think this may be what the OP is proposing, only with a different name).
  3. learning combination techniques such as chaining.

I don’t expect anyone would go through that sort of verbose explicit calculation every time any more than they would with one variable. Given an equation like y - x[sup]2[/sup] = 0, one would calculate the total differential (a differential form) d(y - x[sup]2[/sup]) = dy - 2x dx = 0. Then you can solve for dy/dx, as it were. This is all taught rigorously in first-year calculus.
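For what it’s worth, a computer algebra system will do that calculation mechanically. A minimal sketch assuming SymPy, using dy/dx = -F_x/F_y for a curve F(x, y) = 0:

```python
import sympy as sp

x, y = sp.symbols('x y')

F = y - x**2  # the example above, y - x^2 = 0
print(-sp.diff(F, x) / sp.diff(F, y))  # 2*x

F = sp.sin(x) * y**2 - x  # the OP's example, sin(x)y^2 = x
print(sp.simplify(-sp.diff(F, x) / sp.diff(F, y)))  # (1 - y**2*cos(x))/(2*y*sin(x)), up to rearrangement
```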

If some video does not explain why the chain rule works, then it must be doing a poor job of teaching any calculus.

Furthermore, perhaps the video in question fails to convey some basic geometrical intuition: an equation like S(x, y) = 0 defines a curve in the plane, and when we differentiate it what we get is the equation of the line tangent to the curve at any point. (At some point we may need to formally define tangent spaces and so on, but the basic geometric picture should be made clear.)
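To spell that picture out in the thread’s notation (this is my sketch, not anything from the videos): if (x, y) and (x+dx, y+dy) both lie on the curve, then
S(x+dx, y+dy) - S(x, y) = 0
S[sub]x[/sub]*dx + S[sub]y[/sub]*dy = 0 [dropping higher-order terms, with S[sub]x[/sub] and S[sub]y[/sub] the partial derivatives]
which is precisely the tangent direction at (x, y), and rearranges to dy/dx = -S[sub]x[/sub]/S[sub]y[/sub].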

My own approach to teaching implicit differentiation, and the one in every Calculus textbook I remember seeing, is that shown in the second video, not the first. And it’s always taught after students have learned (and, hopefully, internalized) the Chain Rule, because (as the Khan video points out) it follows very logically from the Chain Rule and the idea that y is a function of x (as defined implicitly by the equation).

No it doesn’t. Unless I missed it, dy and dx never appear in the Khan video, only the expression “dy/dx” which, taken as a whole, denotes “the derivative of y with respect to x.”

“They are neither finite quantities, nor quantities infinitely small, nor yet nothing. May we not call them ghosts of departed quantities?”

Working with infinitesimals, or fluxions, or whatever they were called, was the way Calculus was done in its early days. It worked (mostly, sort of), but it didn’t entirely make sense, which is why that approach gave way to an approach based on limits.

The “infinitesimal approach” can be a good way of explaining things intuitively. For example, one way of deriving the Product Rule is to think about what happens to a product, u*v, when u and v both change by some small amount.

(u + du)(v + dv) = uv + u*dv + v*du + du*dv. The amount of the change is u*(the change in v), plus v*(the change in u), plus some amount that is so small we can disregard it.

That suggests what should happen, and what the Product Rule should be, but it doesn’t prove it. How do we know when something is so small we can disregard it? Mathematicians rightly insist that everything should be able to be made logically rigorous. Calculus using infinitesimals can be made logically rigorous, but that’s not necessarily the easiest (and certainly not the historically earliest) way of doing so.
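You can at least see the relative sizes numerically; a quick sketch (Python, with made-up values for u, v, du, dv):

```python
u, v = 3.0, 5.0
du, dv = 1e-6, 1e-6

exact = (u + du) * (v + dv) - u * v  # the full change in the product
first_order = u * dv + v * du        # what the Product Rule keeps

print(exact)        # ~8.000001e-06
print(first_order)  # 8e-06
print(du * dv)      # 1e-12, a million times smaller than the kept terms
```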

IMHO when, say, a physicist writes something like df = ∂f/∂x dx + ∂f/∂y dy he or she is not thinking of these as “fluxions” or infinitesimals, but probably as something like differential forms (as in differential geometry).

Very rarely does a thread here make me feel like a complete idiot; the last time was maybe around 2000.

Congratulations!

Please justify why you can just eliminate the (dx)[sup]2[/sup]

The thinking goes: You’re dealing with very small numbers. Kinda like 0.001, right? But with as many zeroes as you want to add.

But when you multiply a very small number times itself – still very small – the product is an extraordinarily small number. So small that it no longer matters: for example, 0.001 times itself is 0.000001. The first-order terms are arbitrarily small, and yet still big enough to make a difference. But when you multiply them together, they get so small that you discard them as irrelevant.
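To put numbers on how fast that happens: the error from dropping the dx[sup]2[/sup] term shrinks in proportion to dx itself, so it vanishes in the limit. A quick illustration (Python; x = 3 is arbitrary):

```python
x = 3.0
for dx in (1e-2, 1e-4, 1e-6):
    exact = (x + dx)**2 - x**2  # 2x*dx + dx^2
    kept = 2 * x * dx           # the first-order part only
    print(dx, (exact - kept) / exact)  # relative error, roughly dx/(2x)
```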

You say that it makes it very concrete, but to most people, the math you wrote might as well be this: [Imgur link]

Something can seem obvious without being so, and once you’re familiar with something, it can be very difficult to imagine what it’s like not to be familiar with it. Right now, with the benefit of years of study and reflection, with a mind that’s likely better than average at systemisation, you find it obvious because you’ve understood it at a fundamental, abstract level and are now looking back at the stuff you learned in more concrete ways, and it all makes elegant, efficient sense, like you’re really grokking it now.

Young children and everyone else who knows little of that topic can only learn based on what they know starting out and their current ability to learn which, at first, are both limited. So you start with very concrete examples like: “John has 4 apples. He gives 1 apple to Tim. How many apples does John have?” That might seem like a waste of time to you, but to someone who’s learning about math, that’s pretty much where they have to start. Explaining it to them at the most abstract level would be like trying to teach them Russian by having them read War and Peace. Once they do grok it though, they’re now a small notch above where they used to be, ready for the slightly more abstract level. The process is incremental and builds on itself.

You’ve integrated a lot of the nuances and methods involved, to the point that using them is no more of a challenge than tying your shoelaces or walking, both of which required a lot of attention and effort to understand and master when you learned them as a child. Now, however, they’ve been “compressed” to the point that you can use them without even thinking about them, with very little cognitive load.

Why don’t we teach calculus with infinitesimals? Good question. If we had to start over, I think we would. Jerome Keisler wrote a calc book doing it all with infinitesimals, but people are too stuck-in-the-mud to adopt it.

The assumption that is made is that there are special numbers called infinitesimals and that every number can be written as x + h with h infinitesimal and x “ordinary”. Then we may write x = ord(x+h). As with all numbers, a non-zero infinitesimal is either positive or negative. But if h > 0 is infinitesimal then all of h, h+h, h+h+h, … remain < 1. Now assume that f is a function. Then the derivative of f is defined as ord((f(x+h) - f(x))/h), provided that this ordinary part of the difference quotient does not depend on h. If that proviso fails, then f does not have a derivative at x.

If you apply this to f(x) = x^2, you get ord(((x+h)^2 - x^2)/h) = ord(2x+h) = 2x. That extra h being infinitesimal is what justifies throwing away the dx^2 in the earlier post. Even after you divide by dx (which is the same thing as h really) it remains infinitesimal.
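One way to see that “drop the h[sup]2[/sup]” step as honest bookkeeping rather than hand-waving is dual numbers: adjoin a symbol h with h[sup]2[/sup] = 0, and the discarding happens automatically. A minimal sketch (Python; the class and names are mine, not from any library):

```python
class Dual:
    """Numbers of the form a + b*h, where h*h = 0."""
    def __init__(self, real, infsml=0.0):
        self.real = real      # the "ordinary" part: ord(a + b*h) = a
        self.infsml = infsml  # the coefficient of the infinitesimal h

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.infsml + other.infsml)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b*h)(c + d*h) = ac + (ad + bc)h + bd*h^2, and h^2 = 0
        return Dual(self.real * other.real,
                    self.real * other.infsml + self.infsml * other.real)
    __rmul__ = __mul__

def derivative(f, x):
    # ord((f(x + h) - f(x))/h) amounts to reading off the coefficient of h
    return f(Dual(x, 1.0)).infsml

print(derivative(lambda x: x * x, 3.0))  # 6.0, i.e. 2x at x = 3
```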

It is certainly true that calculus done this way is easier, once you accept infinitesimals. It is easy (but highly non-elementary) to show that infinitesimals are consistent using a construction called the ultraproduct. That model also makes it clear how any ordinary function from the reals to the reals can be extended to a function on these extended reals. But I will not go into that here.

Incidentally, if h is infinitesimal, 1/h is infinite, and you can use infinite numbers to define integrals. Despite what Nava suggested, integral calculus is always done after differential, because you cannot compute integrals until you have the fundamental theorem of calculus and understand antiderivatives.

This isn’t about whether the chain rule works or not–it’s that implicit differentiation is taught as a narrow special case of the chain rule, which itself is just one of many similar differentiation rules.

What I posit here is that we reverse things: implicit differentiation (except don’t call it that; just call it differentiation) is taught first. From there, special cases like the chain rule can be derived.

Clearly, one quickly moves on from directly manipulating the infinitesimals. As you move past the fundamentals, you learn to substitute d(x[sup]2[/sup]) with 2x*dx, etc. That’s not any different from how we progress from lim[h->0] (f(x+h) - f(x))/h to the power rule and so on. But if you start with the implicit technique, it becomes embedded early on and not seen as a special thing that you only bring out from time to time.

That was my experience as well. So I’ll just emphasize my comment to DPRK, which is to question why we do it that way. Sure, it follows from the chain rule, which itself follows from our fundamental definitions. But it doesn’t have to be taught that way, and to my mind everything flows more naturally if we reverse the order.

I don’t know why mathematicians aren’t more embarrassed about the treatment of infinitesimals. They are embarrassed about the past treatment of imaginary numbers, to the point of apologizing for the name (“Please don’t think of them as imaginary. They’re perfectly legitimate numbers!”). As you note, Calculus was developed using them, and even if there was a period where they weren’t defined rigorously, they are now, after Abraham Robinson’s efforts. But there doesn’t seem to be any significant admission that Newton and Leibniz were on the right track after all, nor much innovation in pedagogy.

I’m not sure you understand what “implicit differentiation” means. Which is understandable, since I think it’s a misleading term. It’s not the differentiation that’s implicit, it’s the function that you’re differentiating.

Isn’t this what pedagogy is about, though? Thinking, with the benefit of hindsight, about how we can better guide others through their education? As I mentioned, implicit differentiation was something I’d almost completely forgotten about, and it was taught to me as a minor result of the chain rule. Now, years later, I looked at it and was able to grok it in a certain way that changed my total understanding. And so now I view the implicit method as fundamental, and things like the chain rule to be easily-derivable corollaries.

Maybe I’m wrong; I dunno. But it feels to me like the asymmetry of the left and right sides of the equation (as normally presented) is an early obstacle that doesn’t need to be there. Just treat them equally at the start and some later results will flow more naturally.

All functions are implicit, because they can all be written as y=x[sup]2[/sup] or x[sup]2[/sup]-y=0 or the like. It’s just that some functions are simpler in their implicit forms, and functions which are multi-valued in x do not have a single non-implicit representation.

My suggestion is to start with the implicit form, because it’s the more general case. And, the way I see it, it has the benefit of making the process of differentiation more transparent and easier to understand. Later on, one can introduce non-implicit forms as essentially a shortcut.

Assuming I’m understanding you correctly: you may be right that this (going from e[sup]y[/sup] = x to the derivative of (e[sup]y[/sup]) = the derivative of (x)) should be thought of the same way as going from y = log(x) to the derivative of (y) = the derivative of (log(x)). But isn’t that a totally separate issue from the use of infinitesimals?