Finding an Integral Using u-substitution

Please explain finding an integral by u-substitution to me like I am a 5-year-old. I can do it, but I don’t remember why it works. Someone explained it to me 25 or 30 years ago but I haven’t thought about it since and now I don’t have a clue. Let me use the following example: After finding the appropriate u substitution you come up with the expression du/dx = 2x[sup]2[/sup]. Then you are supposed to say that therefore du = 2x[sup]2[/sup] dx. What the heck? I know it works, but how do you justify this? du/dx is not a fraction! Neither du nor dx is a number! What is actually going on here?

I suppose I could google it or try to figure it out myself but why waste this opportunity to be teased and ridiculed by the brilliant minds on the SDMB. If a coherent discussion develops, I am pretty sure there will be some talk about limits.

First, you will seldom go wrong in treating a first derivative, du/dx, as a fraction (just don’t cancel the d’s). Treating du and dx as separate and separable objects is pretty much the mathematics of infinitesimals.

But without going into detail, think of the operation of integrating f(x). We write that as (integral sign) f(x) dx. The dx here simply indicates the variable you are integrating with respect to, so we can distinguish

int [ x[sup]2[/sup] y] dx = (1/3) x[sup]3[/sup] y
int [ x[sup]2[/sup] y] dy = (1/2) x[sup]2[/sup] y[sup]2[/sup]

so think of your problem du(x)/dx = 2x[sup]2[/sup] as

define f(x) = du(x)/dx = 2x[sup]2[/sup]

Then integrating f(x) is the same as integrating du(x)/dx, and you never have to worry about whether or not you’re multiplying the dx out.
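
If it helps to see this mechanically, here is a quick check in Python/SymPy; it is nothing more than the examples above run by machine:

```python
import sympy as sp

x, y = sp.symbols('x y')

# The two integrals above: the dx / dy tells integrate() which variable to use.
print(sp.integrate(x**2 * y, x))   # x**3*y/3
print(sp.integrate(x**2 * y, y))   # x**2*y**2/2

# The OP's example: if du/dx = 2x**2, then u is an antiderivative of 2x**2.
print(sp.integrate(2 * x**2, x))   # 2*x**3/3  (plus an arbitrary constant)
```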

By “u-substitution”, it seems that you just mean that instead of integrating f(x) dx, you just define u = u(x) to be some function that will end up making your integral simpler; practice will help a lot in finding a correct substitution.

As for the notation, it suffers from a bit of abuse in that du/dx is not a fraction, and you should not think of it as du divided by dx, as you point out, but on the other hand recall that dx and du are not numbers but rather differential 1-forms, so that you are able to integrate not f(x) but f(x) dx. The quantities with “d” in front of them are also variables you need to take account of and substitute for when changing coordinates.

The formula you need to understand is that if u = g(x), then du = g’(x) dx. In the above notation, that says du = (du/dx) dx, which looks tautological, but it’s not, really, at least not due to elementary arithmetic.

As for your example, if du = 2x[sup]2[/sup] dx, what you need to do is write your original integral in terms of u and du instead of x and dx, possibly using the inverse dx = (1/(2x[sup]2[/sup])) du if you need it. Then, once the x’s are gone, you can pretend u is the independent variable and integrate as usual.
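
As a made-up concrete illustration of that recipe (my choice of integrand, not the OP’s), take u = (2/3)x[sup]3[/sup], so that du = 2x[sup]2[/sup] dx, and integrate 2x[sup]2[/sup] cos((2/3)x[sup]3[/sup]). In SymPy:

```python
import sympy as sp

x, u = sp.symbols('x u')

g = sp.Rational(2, 3) * x**3        # u = g(x), so du = g'(x) dx = 2*x**2 dx
integrand_x = 2 * x**2 * sp.cos(g)  # the original integrand, written in x

# Rewrite in terms of u and du: here 2*x**2 dx is exactly du,
# so the integral becomes  ∫ cos(u) du.
antideriv_u = sp.integrate(sp.cos(u), u)   # sin(u)

# Substitute u = g(x) back in, then compare with integrating directly in x.
antideriv_x = antideriv_u.subs(u, g)       # sin(2*x**3/3)
direct = sp.integrate(integrand_x, x)
print(sp.simplify(direct - antideriv_x))   # 0
```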

Sorry for the terse explanation; please ask if anything needs elaboration.

I would not go so far as to say dx “simply indicates the variable you are integrating”; rather, f(x) is not something you can integrate over a 1-dimensional interval, while f(x) dx is something that you can integrate over an interval and get a number. (If we are going to make the connection to definite integrals…)

The problem of finding antiderivatives can be formulated without integral signs and differentials, but on the other hand one must get used to the notation everybody uses.

What is a “differential 1-form”? Maybe that would help me understand.

The proposition that “if u = g(x), then du = g’(x) dx” is what is giving me trouble. Apparently it is true; people have been using it for hundreds of years. Can you give a simple proof that my tiny brain can comprehend?

I can use the fact that “if u = g(x), then du = g’(x) dx” to evaluate an integral. That’s not the problem. I just want to know how we can be certain that it is true.

This is a key point I think. If you’re not careful with your wording your work may be marked down by a pedantic teacher, but at least you get the right answer! I don’t think the calculus stars 300 years ago had qualms about this.

And the great 12th-century Indian mathematician Bhaskara II has a memorable and relevant quote shown at the top of this page in Kim Plofker’s book.

Basically, the change-of-variables formula boils down to the chain rule for the derivative of a composite function: if u = g(x) and y = f(u), then dy/dx = dy/du ⋅ du/dx.
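
If a symbolic spot-check helps, here is the chain rule verified in SymPy for one arbitrary pair of functions (my choices, purely for illustration):

```python
import sympy as sp

x, u = sp.symbols('x u')

g = x**2 + 1        # u = g(x)   (an arbitrary example)
f = sp.sin(u)       # y = f(u)

dy_dx = sp.diff(f.subs(u, g), x)                  # d/dx of f(g(x))
chain = sp.diff(f, u).subs(u, g) * sp.diff(g, x)  # (dy/du) * (du/dx)

print(sp.simplify(dy_dx - chain))   # 0
```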

Without getting into a detailed geometrical explanation (unless that’s what you want :), dx represents a small displacement in the x-direction, and will satisfy (dx)(d/dx) = 1. If u = g(x) is another function, by how much will u increase if we increase x by a unit infinitesimal amount? The rate of change is the derivative g’(x), so du = g’(x) dx. (Similarly, d/du = (d/dx) / g’(x).) Note that this is not mere work with fractions: with more variables you will have df = (∂f/∂x)dx + (∂f/∂y)dy, and changing coordinates in the volume element will factor in the determinant of the Jacobian matrix.

Using only one variable may obscure some nuances, but for any function f we will have (df)(d/dx) = df/dx. On the other hand, (dx)(d/dx) = 1. Also note what we said above, about a change in x resulting in the change in u magnified by a factor of du/dx, and vice versa.
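
To see the multi-variable remark in action, here is a small SymPy sketch; the polar-coordinate change x = r cos θ, y = r sin θ is my own example, picked only because it is familiar:

```python
import sympy as sp

x, y, r, theta = sp.symbols('x y r theta', positive=True)

# Total differential df = (∂f/∂x) dx + (∂f/∂y) dy, for one sample f.
f = x**2 * y
print(sp.diff(f, x), sp.diff(f, y))   # 2*x*y  and  x**2

# Change of coordinates x = r*cos(theta), y = r*sin(theta): the area element
# picks up the determinant of the Jacobian matrix, which comes out to r.
X = sp.Matrix([r * sp.cos(theta), r * sp.sin(theta)])
J = X.jacobian([r, theta])
print(sp.simplify(J.det()))           # r
```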

septimus, would you mind quoting that statement for us? Your Google Books link shows only a single sentence on the page you linked, in an Indian language that I can neither read nor recognize.

This is worth stressing. When you’re dealing with total derivatives, you can in fact treat dy/dx as a fraction without getting into trouble, and there are even ways of formulating calculus where it really is a no-kidding genuine fraction. But that will absolutely not work once you eventually start working with partial derivatives (notated by those curvy "d"s that look like backwards 6s), and in fact you end up with some completely counterintuitive results like (∂x/∂y) * (∂y/∂z) * (∂z/∂x) = -1, rather than 1.
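
A concrete check of that -1, using a relation I just made up (z = x·y, so each variable is implicitly a function of the other two):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', positive=True)

# Constraint z = x*y: solve for each variable in terms of the other two.
x_of_yz = z / y
y_of_xz = z / x
z_of_xy = x * y

product = sp.diff(x_of_yz, y) * sp.diff(y_of_xz, z) * sp.diff(z_of_xy, x)

# Re-impose the constraint and simplify: the product is -1, not +1.
print(sp.simplify(product.subs(z, x * y)))   # -1
```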

Yes, please do - I had Google Books translate that single sentence and it translates to…

“You arrive at a page that cannot be viewed. Or the restrictions on viewing this book have been reached.”

Neither memorable nor relevant, I’m afraid.

Is this 5-year-old supposed to understand the Chain Rule for derivatives? Because u-substitution is more-or-less the Chain Rule in reverse.

The Chain Rule says, for example, that since the derivative of x[sup]3[/sup] (with respect to x) is 3x[sup]2[/sup], the derivative of u[sup]3[/sup] (with respect to x) is 3u[sup]2[/sup] * u’.

That is, the derivative of (something)[sup]3[/sup] is 3(something)[sup]2[/sup] times the derivative of the something.

The derivative of (10x[sup]5[/sup] + 7x + 93)[sup]3[/sup] would be 3(10x[sup]5[/sup] + 7x + 93)[sup]2[/sup] * (50x[sup]4[/sup] + 7).

So then the antiderivative of 3(10x[sup]5[/sup] + 7x + 93)[sup]2[/sup] * (50x[sup]4[/sup] + 7) would be (10x[sup]5[/sup] + 7x + 93)[sup]3[/sup]. And u-substitution is really just a way of making this obvious.
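
If anyone wants the arithmetic double-checked by machine, here is a SymPy confirmation of both directions (nothing new, just the expressions above):

```python
import sympy as sp

x = sp.symbols('x')
inner = 10 * x**5 + 7 * x + 93

# Forward (Chain Rule): d/dx of inner**3 matches 3*inner**2 * (50*x**4 + 7).
d = sp.diff(inner**3, x)
print(sp.simplify(d - 3 * inner**2 * (50 * x**4 + 7)))   # 0

# Reverse (u-substitution): the antiderivative is inner**3, up to a constant.
F = sp.integrate(3 * inner**2 * (50 * x**4 + 7), x)
print(sp.expand(F - inner**3))   # -804357, a pure constant (the missing +C)
```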

There’s a lot more I could say about how to actually do it, but that’s the basic idea behind it.

That’s not an Indian language but I believe it’s Thai. Anyways, I have the book and here is the excerpt:

In other words, the “at-that-time” Sine-difference Δ Sin for a given arc α is considered simply proportional to the Cosine of α:

Δ Sin = Cos α * (225/R)

It has been noted that this and related statements reveal similarities between Bhāskara’s ideas of motion and concepts in differential calculus. (In fact, perhaps these ratios of small quantities are what he was referring to in his commentary on Līlāvatī 47 when he spoke of calculations with factors of 0/0 being “useful in astronomy.”) This analogy should not be stretched too far: for one thing, Bhāskara is dealing with particular increments of particular trigonometric quantities, not with general functions or rates of change in the abstract. But it does bring out the conceptual boldness of the idea of an instantaneous speed, and of its derivation by means of ratios of small increments.

Using the Leibniz notation, the chain rule states that dy/dx = dy/du · du/dx. If dy, dx, and du were real numbers and dy/dx, dy/du, and du/dx were fractions, then the proof of the chain rule would almost be trivial. However, they are not numbers or fractions. Nonetheless, I have seen several different proofs of the chain rule. Perhaps some sort of backwards proof of the chain rule could show that “if u = g(x), then du = g’(x) dx.”

I know how u-substitution works. I just want to know why it works. In fact, I stumbled upon this question when preparing training on solving differential equations using the separation of variables technique. It’s really the same question but in a slightly different context. I just used the u-substitution context because I thought more people would be familiar with it. I wanted to be ready with a short simple answer in case some trainee asked the same question that I asked in this thread. Who knows, maybe one of the trainees could be a pedantic ***hole like me and just want to ask that question so that they could watch me squirm. (More likely though, I would get questions like “What’s a differential?” or “Couldn’t you just google it?”.)

In looking for a simple answer, I have found several sources that resort to what I consider mere hand waving such as “using some ‘informal’ algebra”, “treating dx as if it were a number even though it’s not”, etc. I have even seen some weasel words in this thread such as "you will seldom [emphasis added] go wrong … ", "you can pretend … ", and “… more-or-less.”

One more thing. No one in their right mind wants to actually watch me physically squirm.

Now, the slightly more modern point of view is that if you have a smooth space, which could be Euclidean space but also something curved like a differentiable manifold, you can zoom in to an infinitesimal neighborhood of a given point by considering the tangent space at that point, one way of looking at which is that curves passing through the point represent tangent vectors. A tangent vector applied to a function will give the corresponding directional derivative of the function at that point. Usually this kind of stuff is introduced when you study multi-variable calculus, not in a first-semester course.

Furthermore, at any point you can also consider the cotangent space. E.g., a smooth function equal to 0 at a given point represents a cotangent vector. So now if you start with any smooth function f, its differential df will give a cotangent vector at any point; therefore it is a differential 1-form.

Anyway, the advantage of this point of view is that it provides a rigorous way of interpreting the intuitive calculations with infinitesimals which is independent of a particular choice of coordinates, and can be generalized.

The Leibniz notation may be obscuring what is going on. If you think of the Chain Rule in terms of the total derivative of a function, the total derivative of the composite of two functions is just the composite of the total derivatives. The composition of two linear functions is calculated via matrix multiplication; in one variable the derivative at any point is given by a single number, and you are multiplying those numbers together.

Anyway, back to the beginning: when you write “dx” it is not a “number” (then again, your coordinate function x is not a mere “number” either; rather, at any point you may evaluate it to get a number). Like x, dx may be evaluated at any point and gives a cotangent vector, which is not a “number” but some linear map; however, in terms of your fixed coordinate x (which gives a resulting coordinate dx on the cotangent space), the linear map is represented by a number, namely the derivative of the function at that point. Think of f(x) = f(a) + f’(a)(x - a) + …. The statement “df (evaluated at a) = f’(a) dx (evaluated at a)” is the infinitesimal/differential version of that, and it is abbreviated by the formula df = f’(x) dx.
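
Numerically, that differential version just says the change in f is f’(a) times the change in x, to first order. A toy check with numbers I made up (f(x) = x[sup]3[/sup], a = 2):

```python
# f(x) = x**3 near a = 2: the differential claims  f(a + dx) - f(a) ≈ f'(a)*dx.
f = lambda t: t**3
fprime = lambda t: 3 * t**2
a = 2.0

for dx in (1e-1, 1e-3, 1e-5):
    actual = f(a + dx) - f(a)    # the true change in f
    linear = fprime(a) * dx      # the differential's prediction
    print(dx, actual, linear, actual - linear)   # the error shrinks like dx**2
```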

Mathematics is not a place for lies or informality or weaselling, so you should discount those sources :) Even people who give informal “physics proofs” know how to make them completely mathematically rigorous, assuming they know what they are talking about.

The short answer you can tell people seems to be that both “x” and “dx” are coordinates, so that when you change coordinates and write your integral in terms of the new coordinates, you need to substitute for both of them. Or, more basically, when finding antiderivatives you are using the Chain Rule— the trainees probably remember that name.

I just found this video that does not answer my question but at least it kinda, sorta, maybe asks my question.

dx and dy are “algebraic expressions” just like x and y, and are “super mathematically rigorous” no less than x and y, so I am not convinced that the narrator knows what they are talking about.

But we must admit that the notation IS potentially confusing: dy/dx is not a fraction, and yet we have these things dx as well as d/dx defined, not to mention the curly d’s, stuff like |dx| and (dx)[sup]1/2[/sup], dx dy as well as dx ^ dy, you name it…

I guess there are different ways one could answer the question, but one way is “It works by definition.”

We define ∫ f(x) dx to mean “the general antiderivative of the function f(x) with respect to the variable x”; and we define du to be “du/dx * dx” (that is, the derivative of u with respect to x, times dx) without ever really defining what dx by itself means. These definitions then give us a convenient way of doing what we need to do with integrals or separable differential equations.
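
Here is that definitional reading acted out in SymPy, for one made-up pair f and g; the only point is that both routes land on the same antiderivative:

```python
import sympy as sp

x, u = sp.symbols('x u')

f = u**2          # an arbitrary choice of f
g = sp.sin(x)     # u = g(x), so by the definition above du = cos(x) dx

# Route 1: integrate f(g(x)) * g'(x) directly with respect to x.
route1 = sp.integrate(f.subs(u, g) * sp.diff(g, x), x)

# Route 2: integrate f(u) with respect to u, then set u = g(x).
route2 = sp.integrate(f, u).subs(u, g)

print(sp.simplify(route1 - route2))   # 0 (same antiderivative, up to a constant)
```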

Just calling it another name for the chain rule seems incomplete to me. Otherwise, I couldn’t do more complicated substitutions, like this (admittedly contrived) example:
Say I want to compute:
∫3x[sup]5[/sup] dx

Start with this substitution:
u[sup]2[/sup] = x[sup]3[/sup]
Take the derivative of both sides:
2u du = 3x[sup]2[/sup] dx

Note that the original integral is one part of each:
∫3x[sup]5[/sup] dx = ∫x[sup]3[/sup] * 3x[sup]2[/sup] dx = ∫u[sup]2[/sup] * 2u du = ∫2u[sup]3[/sup] du
Integrate the normal way (ignoring the +C for convenience):
u[sup]4[/sup]/2
And substitute back to x:
(x[sup]3[/sup])[sup]2[/sup]/2 = x[sup]6[/sup]/2

That is of course the same as what we’d get if we did it the normal way in the beginning.
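
For the suspicious, a SymPy check of that contrived example (taking the positive branch u = x[sup]3/2[/sup] when substituting back):

```python
import sympy as sp

x, u = sp.symbols('x u', positive=True)

# Direct route.
direct = sp.integrate(3 * x**5, x)            # x**6/2

# Substitution route: u**2 = x**3, so 2u du = 3x**2 dx and the integrand is 2u**3.
sub = sp.integrate(2 * u**3, u)               # u**4/2
sub_back = sub.subs(u, x**sp.Rational(3, 2))  # back to x: (x**(3/2))**4 / 2

print(sp.simplify(direct - sub_back))         # 0
```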

Let’s start with the definition of the derivative: g’(x) = limit, as h goes to 0, of (g(x+h) - g(x))/h. If we let dx = h be an infinitesimally small change in x and du = g(x+h) - g(x) be the resulting change in u = g(x), then we get g’(x) = du/dx, or du = g’(x) dx.

Now on to integration. Remember that when I write Int(f’(x)dx) = f(x) + c, I am calculating the area under the curve f’(x) by taking little tiny slices of width dx and height f’(x) and adding their areas together. To make things concrete, I can chop the region from a to b up into (b-a)*10^1000 equal slices of width dx = 10^-1000, and the total comes out to f(b) - f(a).

Now suppose I am interested in calculating Int(f(g(x))g’(x)dx) from a to b. I can take (b-a)*10^1000 equal-width bars of width dx = 10^-1000 along the x-axis and add their areas together. This will be the sum of all (b-a)*10^1000 regions, each with area f(g(x))*g’(x)*10^-1000.

Next I am going to consider a different graph where the height is f(g(x)), with the same set of bars, but now I am going to relabel the x-axis from x to u = g(x), so that the curve now has height f(u) and extends from u = g(a) to u = g(b). If, for example, g(x) = 2*x^2 and the region runs from a = 0 to b = 1, then the point at x = 0.1 will be replaced with the value u = 0.02, the point at x = 0.2 will be replaced with the value u = 0.08, and the region will extend from u = 0 to u = 2. If I use the same bars as before, they will have height f(g(x)) and width in terms of u equal to du = g’(x)dx = g’(x)*10^-1000, so their total area will be the sum of the (b-a)*10^1000 regions with area f(g(x))*g’(x)*10^-1000, the same as for my original graph. But this is also the graph of f(u) vs. u from the limits g(a) to g(b), so its area equals Int(f(u)du) from g(a) to g(b).

So Int(f(g(x))g’(x)dx) from a to b = Int(f(u)du) from g(a) to g(b)
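
That argument is easy to act out numerically, with far fewer than 10^1000 slices. Using the g(x) = 2x^2 from above on [0, 1] and a made-up f (cosine, chosen arbitrarily), the two Riemann sums below come out essentially equal:

```python
import numpy as np

f = lambda t: np.cos(t)    # an arbitrary choice of f
g = lambda t: 2 * t**2     # the g(x) = 2x^2 from the post
gprime = lambda t: 4 * t

a, b = 0.0, 1.0
N = 1_000_000              # number of slices (10^1000 is a bit ambitious)

# Left-hand Riemann sum of f(g(x)) * g'(x) over [a, b], in the x variable.
dx = (b - a) / N
xs = a + dx * np.arange(N)
lhs = np.sum(f(g(xs)) * gprime(xs) * dx)

# Left-hand Riemann sum of f(u) over [g(a), g(b)], in the u variable.
du = (g(b) - g(a)) / N
us = g(a) + du * np.arange(N)
rhs = np.sum(f(us) * du)

print(lhs, rhs, np.sin(g(b)) - np.sin(g(a)))   # all three ≈ sin(2) ≈ 0.9093
```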