

#1




Why isn't implicit differentiation using infinitesimals taught earlier?
And by "earlier," I mean as the very first thing. Let me explain.
I was watching a 3Blue1Brown video on implicit differentiation. This is one area of Calculus that had faded almost into nothingness in my memory. As always, the video clearly explained the topic and I refreshed my memory, but something didn't quite sit right with me. In particular, the problem where he introduces a function S(x, y) felt artificial, and he never quite explains where the dx and dy come from.

I looked for a Khan Academy video on the same subject, but found it to be substantially worse. It just talks about the chain rule and how to apply it, never quite explaining why it works. And like so much of Calculus education, it plays very fast and loose with dx and dy. Are they operators? Functions? Infinitesimals? Something else? Can I treat them as a fraction? An algebraic variable? The answer is always that sometimes it's like a fraction, sometimes not, and you'll just have to figure out when certain things are allowed. And is the chain rule fundamental to the technique, or is there something more general going on? (I already knew from the 3Blue1Brown video that it wasn't fundamental, but then why the difference?)

It got me thinking. I've always liked the infinitesimal approach to Calculus, so let's apply it here. Using a concrete example, let's differentiate y = log(x). First, we can single out another point on the curve:

y + dy = log(x + dx)

Note that I'm not even using infinitesimals yet. I'm just stating the obvious fact that (x+dx, y+dy) is another point on the curve. Take both our equations and exponentiate both sides:

e^{y} = x
e^{y+dy} = x + dx

We ultimately want to express an answer as dy/dx. Let's subtract the two equations and see what happens:

e^{y+dy} - e^{y} = dx
e^{y}(e^{dy} - 1) = dx

Again, I haven't even used infinitesimals yet, just algebra. But for the next step, I have to assume that dy is small. What is an approximation of e^{dy}?
Well, we know the Taylor series for e^{x} starts with 1 + x. (Was that cheating, since the Taylor series requires differentiating e^{x}? Not really, since the very definition of e is the number where the slope of e^{x} equals e^{x}, so the second term of the Taylor series must have a coefficient of 1.) Therefore, we have:

e^{y}(1 + dy - 1) = dx
e^{y} dy = dx

But we know that e^{y} = x. And so we have:

dy/dx = 1/x

Easy. There was only one point where I made any assumption that dy was small, and that was easily justified. In fact, I could have used any number of other justifications for the same argument. Also note that I didn't need anything else in Calculus; certainly not the chain rule, nor even the standard (f(x+h) - f(x))/h. Needing so little, this felt more fundamental than the usual methods. And it shows you exactly what dx and dy are and where they came from: they're just variables that you introduced in order to specify a different point on the curve. And it's obvious when we're doing completely ordinary algebraic operations and when we're dropping terms because they are too small to care about.

To my mind, this works better than the usual (f(x+h) - f(x))/h approach. That's similar, but the early divide feels awkward. If you make implicit differentiation the starting point, the divide only comes at the very end, when you're happy with the results. And then you don't have to teach implicit differentiation as a separate thing, since it's actually the most fundamental thing and you can always fall back to it in a bind.

3Blue1Brown did a more complex example, but to my mind it became even less clear why and when the dx and dy come into play.
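The dy/dx = 1/x result is easy to sanity-check numerically (a quick Python sketch; the helper function name is my own, not from any library):

```python
import math

def slope_estimate(f, x, h=1e-6):
    # Numerical stand-in for dy/dx: a symmetric difference quotient.
    return (f(x + h) - f(x - h)) / (2 * h)

# d/dx log(x) should come out to 1/x at every sample point.
for x in [0.5, 1.0, 2.0, 10.0]:
    assert abs(slope_estimate(math.log, x) - 1 / x) < 1e-6
```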
So let me try it again using the infinitesimal style:

sin(x)y^{2} = x
sin(x+dx)(y+dy)^{2} = x + dx
sin(x+dx)(y+dy)^{2} - sin(x)y^{2} = dx
(sin(x) + dx*cos(x))(y^{2} + 2y*dy + dy^{2}) - sin(x)y^{2} = dx
(sin(x)y^{2} + dx*cos(x)y^{2} + 2*sin(x)y*dy + [small terms]) - sin(x)y^{2} = dx
dx*cos(x)y^{2} + 2*sin(x)y*dy = dx
2*sin(x)y*dy = dx - dx*cos(x)y^{2}
2*sin(x)y*dy = dx(1 - cos(x)y^{2})
dy/dx = (1 - cos(x)y^{2})/(2*sin(x)y)

From the original equation, y^{2} = x/sin(x), so:

dy/dx = (1 - cos(x)*x/sin(x))/(2*sin(x)y)
dy/dx = (sin(x) - x*cos(x))/(2*sin^{2}(x)y)

And there's one remaining y. But it needs a square root, so we know there are two sets of curves, just as there would have been in the original:

dy/dx = +(sin(x) - x*cos(x))/(2*sin^{2}(x)*sqrt(x/sin(x)))
dy/dx = -(sin(x) - x*cos(x))/(2*sin^{2}(x)*sqrt(x/sin(x)))

Again, very straightforward. I didn't even need the product rule! I only needed one tiny bit of other calculus (that (sin(x))' = cos(x)), and two assumptions about the infinitesimals: that f(x+dx) =~ f(x) + dx*f'(x), and that any total power of a dx or dy above 1 can be discarded.

Now, obviously there are limits to this approach (no pun intended), because eventually you want to manipulate things in a higher-order fashion, with the chain rule/product rule/etc. And then you get to integrals, where you have to think even more abstractly. But why isn't implicit differentiation taught immediately using an approach like this? It makes the manipulations very concrete and puts differentiation on a broader base right from the start.
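The intermediate formula dy/dx = (1 - cos(x)y^{2})/(2*sin(x)y) can be checked along the positive branch y = sqrt(x/sin(x)) (a Python sketch; the function names are my own):

```python
import math

def y_pos(x):
    # Positive branch of the curve sin(x)*y^2 = x.
    return math.sqrt(x / math.sin(x))

def implicit_slope(x):
    # dy/dx from the derivation: (1 - cos(x)*y^2) / (2*sin(x)*y)
    y = y_pos(x)
    return (1 - math.cos(x) * y**2) / (2 * math.sin(x) * y)

x, h = 1.0, 1e-6
numeric = (y_pos(x + h) - y_pos(x - h)) / (2 * h)  # finite-difference slope
assert abs(implicit_slope(x) - numeric) < 1e-6
```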
#2




I have no idea what any of that means and the last time I had to do anything beyond basic arithmetic by hand was in math class.

#3




I would love a more intuitive way to grok differentiation and/or integration.
I don't think you have found it though. I failed using this process even for y=x^2. 
#4




Quote:
y = x^{2}
y + dy = (x + dx)^{2}

Expand the second:

y + dy = x^{2} + 2x*dx + dx^{2}

Remove high-order terms:

y + dy = x^{2} + 2x*dx

Subtract the first:

dy = 2x*dx

And finally, divide to get dy/dx on the left:

dy/dx = 2x

We can easily generalize to x^{N}:

y + dy = (x + dx)^{N} = x^{N} + N*x^{N-1}*dx + [small terms]

Subtract y = x^{N} and remove small terms:

dy = N*x^{N-1}*dx

Divide:

dy/dx = N*x^{N-1}

Of course, for simple derivatives, this is pretty much the same as the ordinary technique. But it gets students used to putting an infinitesimal dy on the y terms, wherever they may be: on the left, on the right, inside a function, or whatever. Your example could have been stated as "x^{2} - y = 0" and it would work just as well, only requiring students to move things left and right at the end when it is simpler.
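Both results are easy to check against a finite-difference slope (a quick Python sketch; the helper name is my own):

```python
def power_rule(x, n):
    # dy/dx = n * x^(n-1), as derived by dropping the [small terms].
    return n * x ** (n - 1)

x, n, h = 2.0, 5, 1e-7
# Symmetric difference quotient along y = x^n.
numeric = ((x + h) ** n - (x - h) ** n) / (2 * h)
assert abs(numeric - power_rule(x, n)) < 1e-4
```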


#5




I'm not 100% sure what techniques the OP is talking about, since I've never studied derivatives in English, but I think part of the issue here is that, AIUI, in the US integrals go before derivatives. That's why you can start directly on the chain rule: in our case, we did derivatives first.
So for us it went: 1) limits; 2) using limits to find the basic derivatives (I think this may be what the OP is proposing, only with a different name); 3) learning combination techniques such as chaining. 
#6




I don't expect anyone would go through that sort of verbose explicit calculation every time, any more than they would with one variable. Given an equation like y - x^{2} = 0, one would calculate the total differential (a differential form) d(y - x^{2}) = dy - 2x dx = 0. Then you can solve for dy/dx, as it were. This is all taught rigorously in first-year calculus.
If some video does not explain why the chain rule works, then it must be doing a poor job of teaching any calculus. 
#7




Furthermore, perhaps the video in question fails to convey some basic geometrical intuition: an equation like S(x, y) = 0 defines a curve in the plane, and when we differentiate it what we get is the equation of the line tangent to the curve at any point. (At some point we may need to formally define tangent spaces and so on, but the basic geometric picture should be made clear.)

#8




Quote:
Working with infinitesimals, or fluxions, or whatever they were called, was the way Calculus was done in its early days. It worked (mostly, sort of), but it didn't entirely make sense, which is why that approach gave way to an approach based on limits.

The "infinitesimal approach" can be a good way of explaining things intuitively. For example, one way of deriving the Product Rule is to think about what happens to a product, u*v, when u and v both change by some small amount: (u + du)(v + dv) = uv + u*dv + v*du + du*dv. The amount of the change is u*(the change in v), plus v*(the change in u), plus some amount that is so small we can disregard it. That suggests what should happen, and what the Product Rule should be, but it doesn't prove it. How do we know when something is so small we can disregard it?

Mathematicians rightly insist that everything should be able to be made logically rigorous. Calculus using infinitesimals can be made logically rigorous, but that's not necessarily the easiest (and certainly not the historically earliest) way of doing so. 
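The (u + du)(v + dv) expansion above can be checked with small-but-finite changes (a Python sketch, using h = 1e-5 as the "small amount"; the variable names are illustrative):

```python
import math

u, v = math.sin, math.exp          # any two smooth functions will do
x, h = 1.3, 1e-5

du = u(x + h) - u(x)
dv = v(x + h) - v(x)
actual = u(x + h) * v(x + h) - u(x) * v(x)   # true change in u*v
predicted = u(x) * dv + v(x) * du            # the Product Rule terms
leftover = du * dv                           # the "disregardable" part

assert abs(actual - (predicted + leftover)) < 1e-12  # exact algebra
assert abs(leftover) < 1e-8                          # second-order small
```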
#9




IMHO when, say, a physicist writes something like df = ∂f/∂x dx + ∂f/∂y dy he or she is not thinking of these as "fluxions" or infinitesimals, but probably as something like differential forms (as in differential geometry).



#10




Very rarely does a thread here make me feel like a complete idiot; the last time was maybe around 2000.
Congratulations! 
#11




Please justify why you can just eliminate the (dx)^{2}
__________________
If all else fails, try S.C.E. to Aux. 
#12




The thinking goes: You're dealing with very small numbers. Kinda like 0.001, right? But with as many zeroes as you want to add.
But when you multiply a very small number by itself (another very small number), the product is an extraordinarily small number. So small that it no longer matters; for example, 0.001 times itself gives 0.000001. The first-order terms are arbitrarily small, and yet still big enough to make a difference. But when you multiply them together, they get so small that you discard them as irrelevant. 
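This shrinking can be made concrete (a quick Python sketch for y = x^{2}, comparing the kept 2x*dx term to the discarded dx^{2} term):

```python
x = 2.0
ratios = []
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    kept = 2 * x * h          # first-order term, 2x*dx
    dropped = h * h           # the dx^2 term we throw away
    ratios.append(dropped / kept)

# Each time dx shrinks 10x, the discarded term shrinks 100x,
# so its share relative to the kept term shrinks 10x.
assert all(later < earlier / 5 for earlier, later in zip(ratios, ratios[1:]))
```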
#13




You say that it makes it very concrete but to most people, the math you wrote might as well be this: https://imgur.com/pK8cWEp
Something can seem obvious without being so and once you're familiar with something, it can be very difficult to imagine what it's like not to be familiar with it. Right now, with the benefit of years of study and reflection, with a mind that's likely better than average at systemisation, you find it obvious because you've understood it at a fundamental, abstract level and are now looking back at the stuff you learned in more concrete ways and it all makes elegant, efficient sense, like you're really grokking it now. Young children and everyone else who knows little of that topic can only learn based on what they know starting out and their current ability to learn which, at first, are both limited. So you start with very concrete examples like: "John has 4 apples. He gives 1 apple to Tim. How many apples does John have?" That might seem like a loss of time to you but to someone who's learn about math, that's pretty much where they have to start. Explaining it to them at the most abstract level would be like trying to teach them Russian by having them read War and peace. Once they do grok it though, they're not a small notch above where they used to be, ready for the slightly more abstract level. The process is incremental and builds on itself. You've integrated a lot of nuances and methods involved to the point that it's no more of a challenge than tying your shoelaces or walking, both of which required a lot of attention and effort to understand and master when you learned them as a child. Now, however, they've been "compressed" to the point that you can use them without even thinking about them which enables you to use them with very little cognitive load. 
#14




Why don't we teach calculus with infinitesimals? Good question. If we had to start over, I think we would. H. Jerome Keisler wrote a calc book doing it all with infinitesimals, but people are too stuck-in-the-mud to adopt it.
The assumption that is made is that there are special numbers called infinitesimals, and that every number can be written as x + h with h infinitesimal and x "ordinary". Then we may write x = ord(x+h). As with all numbers, a nonzero infinitesimal is either positive or negative. But if h > 0 is infinitesimal, then all of h, h+h, h+h+h, ... remain < 1.

Now assume that f is a function. Then the derivative of f is defined as ord((f(x+h) - f(x))/h), provided that this ordinary part of the difference quotient does not depend on h. If that proviso fails, then f does not have a derivative at x. If you apply this to f(x) = x^{2}, you get ord(((x+h)^{2} - x^{2})/h) = ord(2x + h) = 2x. That extra h being infinitesimal is what justifies throwing away the dx^{2} in the earlier post. Even after you divide by dx (which is the same thing as h, really), it remains infinitesimal.

It is certainly true that calculus done this way is easier, once you accept infinitesimals. It is easy (but highly non-elementary) to show that infinitesimals are consistent using a construction called an ultraproduct. That model also makes it clear how any ordinary function from the reals to the reals can be extended to a function on these extended reals. But I will not go into that here. Incidentally, if h is infinitesimal, 1/h is infinite, and you can use infinite numbers to define integrals.

Despite what Nava suggested, integral calculus is always done after differential, because you cannot compute integrals until you have the fundamental theorem of calculus and understand antiderivatives. 
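This ord(...) definition can even be mimicked on a computer with "dual numbers", where the infinitesimal h is a symbol satisfying h*h = 0. A rough Python sketch (the class and names are my own illustration, not from any textbook):

```python
class Dual:
    """A number of the form a + b*h, where h is 'infinitesimal' (h*h = 0)."""
    def __init__(self, ord_part, inf_part=0.0):
        self.ord = ord_part   # the "ordinary" part: ord(a + b*h) = a
        self.inf = inf_part   # the coefficient on h

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.ord + other.ord, self.inf + other.inf)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b*h)(c + d*h) = ac + (ad + bc)h + bd*h^2; h^2 is dropped.
        return Dual(self.ord * other.ord,
                    self.ord * other.inf + self.inf * other.ord)
    __rmul__ = __mul__

def derivative(f, x):
    # ord((f(x+h) - f(x))/h) becomes: feed x + h through f
    # and read off the coefficient of h.
    return f(Dual(x, 1.0)).inf

assert derivative(lambda x: x * x, 3.0) == 6.0       # d/dx x^2 = 2x
assert derivative(lambda x: x * x * x, 2.0) == 12.0  # d/dx x^3 = 3x^2
```

The h^2 term simply never gets stored, which is exactly the "throw away dx^{2}" step made mechanical.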


#15




Quote:
What I posit here is that we reverse things: implicit differentiation (except don't call it that; just call it differentiation) is taught first. From there, special cases like the chain rule can be derived. Clearly, one quickly moves on from directly manipulating the infinitesimals. As you move past the fundamentals, you learn to substitute d(x^{2}) with 2x*dx, etc. That's not any different from how we progress from lim[h→0] (f(x+h) - f(x))/h to the power rule and so on. But if you start with the implicit technique, it becomes embedded early on and not seen as a special thing that you only bring out from time to time. 
#16




Quote:
I don't know why mathematicians aren't more embarrassed about the treatment of infinitesimals. They are embarrassed about the past treatment of imaginary numbers, to the point of apologizing for the name ("Please don't think of them as imaginary. They're perfectly legitimate numbers!"). As you note, Calculus was developed using them, and even if there was a period where they weren't defined rigorously, they are now after Abraham Robinson's efforts. But there doesn't seem to be any significant admission that Newton+Leibniz were on the right track after all, or innovation in pedagogy. 
#17




I'm not sure you understand what "implicit differentiation" means. Which is understandable, since I think it's a misleading term. It's not the differentiation that's implicit, it's the function that you're differentiating.

#18




Quote:
Maybe I'm wrong; I dunno. But it feels to me like the asymmetry of the left and right of the equation (as normally presented) is an early obstacle that doesn't need to be there. Just treat them equally at the start and some later results will flow more naturally. 
#19




Quote:
My suggestion is to start with the implicit form, because it's the more general case. And, the way I see it, it has the benefit of making the process of differentiation more transparent and easier to understand. Later on, one can introduce non-implicit forms as essentially a shortcut. 


#20




Quote:

#21




Quote:
I don't think this matters. The average student uses the reals all the time, but no one but math majors ever even encounters a Dedekind cut. Same goes for set theory and the Zermelo-Fraenkel axioms. At some point, we just have to accept that the structures we're using have been proven to be consistent. 
#22




Quote:
First: use infinitesimals to teach calculus. I think this is a more natural approach in general, but the benefit here specifically is that we can treat dx and dy as actual variables that we can manipulate like any other, as compared to the usual approach where dy/dx is treated as a unit blob, except sometimes not really.

Second: start with the implicit representation of functions. That means, when trying to find the slope of a curve, of whatever kind, that we add a dy to every place that y shows up and a dx to every place that x shows up. That gives us a new, nearby point on the curve. We can use that equation and the first one, plus just a little algebra, to compute the derivative.

One can use these fundamentals to derive all the normal rules: power rule, product rule, chain rule, etc. From there, one can move on to more special cases. 
#23




Personally, I always think of "dx" as "a little piece of x", and so on.
The drawback to this is that it makes partial derivatives extremely nonintuitive, for those students who will eventually encounter them. 
#24




Quote:
Quote:
Do they make it easier to logically justify and rigorously prove those rules? Do they make it easier to informally derive the rules? Do they make it easier to intuitively understand the rules and how or why they work? Do they make it easier to use the rules, once you have them? 


#25




I don't, at least not initially. One thing I may not have been clear about is that the initial substitution could have been anything, because I'm just stating the same fact in a different way:

y = x^{2} [(x, y) is a point on the curve]
y + dy = (x + dx)^{2} [(x+dx, y+dy) is a point on the curve]
q = t^{2} [(t, q) is a point on the curve]
batman = superman^{2} [(superman, batman) is a point on the curve]

However, I chose (x+dx, y+dy) because I knew it would make things easier later on, and that eventually I would have to assume that dx and dy were small. But the smallness assumptions come in specific, obvious places: throwing out high powers of them, or assumptions like sin(dx) ~= dx, etc.

Quote:
- Not exactly, because as Hari Seldon said, proving the consistency of infinitesimals requires some serious math. But then, so does justifying the existence of the reals or the consistency of set theory. So we aren't in new waters here.
- Yes, because we can manipulate infinitesimals just like anything else and aren't burdened with the excess notation of limits.
- Yes, because we aren't burdened with weird special rules like "dy/dx looks like a fraction but really isn't". With infinitesimals, it is a fraction. There are no special cases aside from those we already know of (no dividing by zero, etc.).
- Again, yes, for more or less the same reason: there are fewer restrictions and thus more avenues to approach any given problem. 
#26




Quote:
x^{2} + y^{2} + z^{2} = 1

Suppose we want to find dz/dy. We'll make our new point (x, y+dy, z+dz), because we know we're holding x constant. Go through the steps I outlined above and you get:

2y*dy + 2z*dz = 0 [the x components drop out in the subtraction]

And with a little manipulation you get:

dz/dy = -y/z

Easy enough. And even easier for functions that are usually written non-implicitly, like z = x^{2} + y^{2}. 
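Carrying the signs through gives dz/dy = -y/z, which a quick numerical check confirms on the upper hemisphere (a Python sketch; helper names are mine):

```python
import math

def z_of(x, y):
    # Upper hemisphere of x^2 + y^2 + z^2 = 1.
    return math.sqrt(1 - x * x - y * y)

x, y = 0.3, 0.4
z = z_of(x, y)
h = 1e-6
# Finite-difference dz/dy with x held constant.
numeric = (z_of(x, y + h) - z_of(x, y - h)) / (2 * h)
assert abs(numeric - (-y / z)) < 1e-6
```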
#27




I'm an optimist so I'd say you're half right : )
Quote:
Quote:
Look at how ONI taught you. At first, you stumbled your way through the game, coming up with jury-rigged, ad hoc solutions to simple, basic problems. After solving many basic problems and becoming acquainted with the game, it started to dawn on you how you might solve them better with more complex and efficient solutions. Now you can skip the basic stuff and go straight to the best way of doing things. But if you tried getting a new player to do that, it would likely overwhelm them.

The military likes to conceive of teaching in terms of crawl, walk, run, and when you know how to run, it can be difficult to see why anyone would waste their time crawling, because it's currently useless to those who can run. It's possible that the way you propose is better. I don't have a reason to doubt that it's a superior way to talk about math among people who already know a fair amount of it. I'm just not sure it's a good way to teach people who know little math. If you have cousins handy, you could try it out on them, if you're OK with becoming their least favorite uncle for a while.

Have you noticed something similar when it comes to other skills, like programming? Can your total understanding change more than once if you go deep enough into a skill?

Last edited by MichaelEmouse; 03-11-2019 at 04:59 PM. 
#28




Oops, I answered the wrong question above, though perhaps what I did answer is still satisfying. The point is that it just follows from how we're defining the curve. The whole basis for the implicit form is that we're stating an equation S(x, y) = 0 where (x, y) is a point on the curve. If that's true, then (x+dx, y+dy) is a point on the curve when S(x+dx, y+dy) = 0. It's just basic substitution.
I don't assume, initially, that dx and dy are small. That comes at some later step where I have to throw out a dx^{2} or the like. 
#29




Quote:
Yes, though it's seemingly harder to pin down, because it's harder to think of specific examples, or even where things went wrong before. However, looking at early code I wrote, I often slap my forehead and wonder what I was thinking. It's not even that the code was bad, per se, and not that I was an idiot. But somehow I didn't grasp the essential element of things and made a much more complex solution than was required. 


#30




Quote:
I've never seen anyone prove this using your method. Do you know of any good ones?
__________________
"Ridicule is the only weapon that can be used against unintelligible propositions. Ideas must be distinct before reason can act upon them."
If you don't stop to analyze the snot spray, you are missing that which is best in life. - Miller
I'm not sure why this is, but I actually find this idea grosser than cannibalism. - Excalibre, after reading one of my sure-fire million-seller business plans. 
#31




Quote:
When looking at early code you wrote and seeing how your code could have been more elegant, how would you teach your previous self to do it better in a way that your previous self would understand with the knowledge and skills he had at the time? 
#32




Quote:
It's possible that the book Hari mentioned above does so. There appears to be an online copy here. I haven't yet looked at it. 
#33




I wish I knew! This aspect of programming isn't taught at all in CS courses. People are just expected to eventually learn how to program elegantly. Maybe the coursework needs to be graded on beauty and not just correctness.

#34




Something for you to analyze and figure out. Having to teach someone (even a hypothetical someone) is a great way to make explicit, formalize and consolidate what you know. Your understanding might deepen further if you do.
Quote:
Zachtronics games like SpaceChem and Opus Magnum give people practice at doing that with a very basic kind of visual scripting. Door Kickers might be a top-down version of SWAT 4, but it can also have that same economical elegance where everything flows smoothly and comes together neatly based on a small set of instructions; a level/problem can go from seeming impossible to being completed in very little time with no wasted actions.

Maybe the reason it isn't taught in CS classes is that it doesn't just relate to programming or games; it's a general skill that can be applied to pretty much anything, even pretty pedestrian stuff. I once had a very unsatisfying job where I had to fold cardboard boxes, which was as boring as you would expect. So I made it a little less boring by figuring out the most efficient way to fold them with a minimum amount of movement and waiting. I did the same for loading rifle rounds into magazines when that was part of a test. It didn't require much cleverness, mainly the willingness to pause, observe, reflect and experiment to figure out a better way.

It might be suitable for more advanced CS classes (like the equivalent of an MBA) or competitions. Are there jams where people are given a goal and whoever can do it in the fewest characters wins? 


#35




Quote:
I like Zachtronics games, too. And household optimization! 
#36




Quote:
As an example of the nonintuitiveness I mean, what's (dx/dy) * (dy/dz) * (dz/dx)? Why, obviously, all of the infinitesimals cancel, and so it's just 1. So what's (∂x/∂y)*(∂y/∂z)*(∂z/∂x)? Why, obviously, that's... negative 1? Where'd that negative come from? The key, of course, is that while dx/dy can be broken down to something called dx divided by something called dy, ∂x/∂y (despite looking very similar) is a symbol in itself, not something called ∂x divided by something called ∂y. 
#37




Sure it is, though it may not be recognizable. Ok, I'll do a more normal one. The function is:
f(x, y) = x^{2}y + y^{3}

The partial derivative with respect to y is:

∂/∂y f(x, y) = x^{2} + 3y^{2}

We can do it using the implicit technique by defining:

z = x^{2}y + y^{3}

We're interested in dz/dy, so we'll use (x, y+dy, z+dz) as the new point, holding x constant:

z + dz = x^{2}(y + dy) + (y + dy)^{3}

Subtract the original equation, drop the higher powers of dy, and do some algebra:

dz = x^{2}*dy + 3y^{2}*dy
dz/dy = x^{2} + 3y^{2}

Same thing. And sure, ∂ vs. d can get confusing, though I think maybe that's another argument in favor of infinitesimals, not against. The infinitesimals always work right if you're explicit about them.

Last edited by Dr. Strangelove; 03-11-2019 at 06:42 PM. 
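As a sanity check, the dz/dy = x^{2} + 3y^{2} result agrees with a finite-difference slope (a quick Python sketch; names mine):

```python
def f(x, y):
    return x * x * y + y ** 3

def dz_dy(x, y):
    # Result of the implicit calculation: x^2 + 3y^2.
    return x * x + 3 * y * y

x, y, h = 1.5, 0.7, 1e-6
# Finite-difference slope with x held constant.
numeric = (f(x, y + h) - f(x, y - h)) / (2 * h)
assert abs(numeric - dz_dy(x, y)) < 1e-6
```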
#38




Is this something my son is doing in his Differential Equations class? I never took HS-level precalc, much less a 300-level college math class.

#39




Differential equations is different, and much more difficult IMHO. This stuff is basically Calculus I, or maybe somewhere in AP Calc AB. I was never very good at DiffEq.



#40




Thanks for keeping me from being laughed at by a knowitall (albeit one who is very good at math) college student.

#41




Quote:
A couple of extras: when you divide a tiny number by a tiny number, the result might not be tiny. That's why we can't just translate dy/dx to 0/0 (which breaks things anyway). Also, a big+tiny number minus the same big number leaves us the tiny number. Although tiny, if it's the only thing that's left, it's still important and we can't throw it away. But we can still get rid of tiny*tiny numbers, because they're extra-tiny compared to ordinary tiny numbers.

One nice thing here is that students can experiment with the tiny numbers on their calculator. Plug in e^{0.0001} and get 1.000100...[a tiny bit extra]. Then see how the error goes down even further as you add more zeroes. Of course, one wants to be careful that the students don't get the idea that this is the only thing going on (however many zeroes you add, infinitesimals are smaller yet), but it does give a nice bit of concrete intuition.

Better yet would be a computer program with a kind of "infinite zoom" system, like the 3Blue1Brown videos have. With that, it should really sink in that you can always zoom in enough that the slope looks constant (except for non-differentiable functions). 
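The calculator experiment can be automated to show the error shrinking quadratically (a Python sketch):

```python
import math

# e^h is approximately 1 + h, and the error shrinks like h^2:
# each time h gets 10x smaller, the error drops ~100x.
prev_err = None
for k in range(2, 7):
    h = 10.0 ** (-k)
    err = abs(math.exp(h) - (1 + h))
    if prev_err is not None:
        assert err < prev_err / 50   # roughly the expected 100x drop
    prev_err = err
```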
#42




As for formally developing infinitesimals in order to teach calculus, as Hari Seldon explained, sure, why not? But keep in mind that there is going to be some conservation of difficulty in defining, e.g., hyperreal numbers, and the aforementioned physicists may not care deeply about the formal logic of it, and it does not impact their symbolic calculations either way.

#43




Quote:
I'm talking about the very earliest parts of calculus, where the usual starting point is lim[h→0] (f(x+h) - f(x))/h. And from there, showing where the power rule, the chain rule, the product rule, the transcendental derivatives, and the others come from. Starting with implicit differentiation feels both simpler and more elegant to me. 
#44




When I was a TA for a PDP-11 assembly language class over 40 years ago, we started with an assignment, done in Pascal, which we graded more on beauty than on correctness. Beauty in this sense was following the rules of the then-new structured programming movement. We figured that if the students didn't learn it early, their assembly language code would be even more of a mess than it turned out to be.



#45




Definitely true when I took it in high school a long time ago and when my daughter took it not such a long time ago. I can't imagine how anyone could teach integration first.

#46




Quote:

#47




Okay, I'll say it. I have no clue what this is about. Jesus H. Christ, this whole thread just gave me a headache.

#48




Quote:

#49




Seriously, why? Do you always get a headache when people discuss things that are outside your own areas of knowledge?



#50




No headache, but I'm in a place possibly worse: I had Calc I and II (more than 40 years ago now), and I get just enough of the discussion that I had to make some notes and do some calculations/equations myself to follow along. That, I think, is why there will always be some form of the SDMB; where else can threads like this so easily mix with the range of other topics we discuss? This is really terrific!


