What's going on with this d<variable> stuff in integrals?

I’m taking physics, and I’m decent at calculus, but something came up that I’m not really clear on mathematically. I’ve always been taught to think of integral <expression> d<variable> as one really fancy operator, albeit one that could be played with a bit (e.g. u-substitution). I knew this wasn’t precisely the case (and was told as much), but was content to go along with it. Then something came up with work.

We defined work as integral F . ds, where . is the dot product operator and F and s are vectors. So if you write F and ds in component form you get F = F[sub]x[/sub] i-hat + F[sub]y[/sub] j-hat and ds = dx i-hat + dy j-hat, and then you magically get:

integral F[sub]x[/sub] dx + integral F[sub]y[/sub] dy

Okay, I’m not sure I grasp how this works. I vaguely remember defining ds as the limit as something goes to 0 (probably change in position). And I recall that the d more or less means a REEEEEAAAALLLY small value. Given how I know the dot product works, there’s clearly some multiplication going on with ds, but I give up on how it’s working. I suspect there’s some vector calc stuff (which isn’t required for this course) given, well, the vectors. But I don’t think I understand the definitions well enough to really grasp why we can do what we just did. I had a hard enough time back in calc with u-substitution’s “now we’ll pretend that du/dx is really a fraction even though we told you it wasn’t and multiply both sides by dx, just accept it” thing. I suspect that something similar is going on in this case.

Can anyone shed light on precisely what “ds” (or d-whatever) MEANS and what makes it do what it does? I’m getting a little tired of seeing completely “arbitrary” things done with it – usually just simple division or multiplication – and then being assured that the d<variable> construction is still just magic and not something that operations can be done on.

Do you remember taking derivatives and how they work? If so, then the dy’s and dx’s shouldn’t be so confusing.

If you have a function y = 4x, then taking the derivative dy/dx = 4 has the meaning “the derivative of y with respect to x.”

dx/dy has an entirely different meaning.

Now look at the opposite:

y = integral 4 dx

Where did this come from? It’s actually

dy/dx = 4, multiply both sides by dx to get:

dy = 4 dx, take the integral of BOTH sides to get:

integral 1 dy = integral 4 dx

y = integral 4 dx

If you understand derivatives, then dy and dx should make sense.
If you don’t understand derivatives, then you should be looking there first.

dy/dx is an instantaneous rate of change. For example, the speed of your car (velocity) is the instantaneous rate of change of where your car is (position).
dy/dx means the rate at which y changes for a teeny tiny change of x. If dy = 4 dx, then y changes 4 times faster than x. The dy’s and dx’s are there because for most functions, every point has a different rate of change.
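If it helps to see it with actual numbers, here’s a quick sketch (Python; the functions, the point x = 3, and the step sizes are just ones I picked) of dy/dx behaving like a ratio of small changes: for y = 4x the ratio is exactly 4 no matter how small dx gets, and for y = x^2 it settles toward 2x as dx shrinks.

```python
# dy as "a tiny change in y caused by a tiny change dx in x",
# and dy/dx as the ratio of the two.

def y_linear(x):
    return 4 * x      # dy/dx should be exactly 4

def y_curvy(x):
    return x ** 2     # dy/dx should approach 2*x

x = 3.0
for dx in (0.1, 0.01, 0.001):
    dy = y_linear(x + dx) - y_linear(x)
    print(f"y = 4x : dx = {dx:<6} dy = {dy:<10.6f} dy/dx = {dy/dx:.6f}")

for dx in (0.1, 0.01, 0.001):
    dy = y_curvy(x + dx) - y_curvy(x)
    print(f"y = x^2: dx = {dx:<6} dy = {dy:<10.6f} dy/dx = {dy/dx:.6f}")   # ratio -> 6 = 2*x
```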

No, that DOES make sense. It’s just that in calc we were taught that the d<variable> construction is literally a magical operator, and you are NOT allowed, ever, to multiply, divide, or look at it in any context that is not pre-specified. Now in physics I’m seeing it being multiplied and divided by all over the place.

And we were never taught the multiplication-then-integrate thing for functions. In fact, we were explicitly taught NEVER to do an integration of an equation like that. We were literally told for two semesters that d<something>/d<something> is COMPLETE magic and operations would never work on it. And then the next semester integral <expression> d<something> is MAGIC and d<something> is not a value at all, despite what we’re doing with u-substitution, what are you talking about, dear student? Now go learn this stuff by rote!

If they are just really mundane fractions (caused by limits) that cause special stuff to happen when you multiply by them and integrate, or when you multiply by the ratio between two of them, then fine. I was just assured so many times that it was magic that I wasn’t sure if this was still magic or the illusion was just being shattered.

A lot of what’s going on can be justified by the Radon-Nikodym theorem, which is basically the grown-up version of both the fundamental theorem of calculus and the change of variables rules that you learned in elementary calculus. There are some regularity conditions, but most of the nice functions you see in applications will automatically satisfy them, so you don’t really have to worry about the details if you don’t want to.
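For reference, the statement as I remember it (the σ-finiteness hypotheses are exactly the regularity conditions I was waving at): if ν and μ are σ-finite measures on the same space and ν is absolutely continuous with respect to μ, then

$$\nu(A) = \int_A f \, d\mu \quad \text{for every measurable } A,$$

for some measurable f ≥ 0, and that f is what gets written dν/dμ - which is where the fraction-like behavior of the d’s eventually gets cashed out rigorously.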

Edit: Since you’re interested in machine learning-type stuff, it’s probably worth your time to eventually get enough analysis under your belt to tackle Radon-Nikodym. On the other hand, it’s definitely not something you need to worry about right away.

@sachertort - nice explanation. :slight_smile:

I believe it has rigorous meaning as a differential form, but differential geometry hurts my head to think about.

For first-time line-integral users I tended to treat ds = (dx, dy) as a shorthand for ds ~= (delta x, delta y) for some small intervals where we’re going to do a sum, take the limit as the intervals go to zero to get the integral, and make the ~= into an = later anyway.
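Here’s what that looks like numerically (Python; the field F(x, y) = (y, x) and the path y = x^2 from (0,0) to (1,1) are just ones I picked; that F has potential x*y, so the exact line integral is 1):

```python
import numpy as np

# Chop the path into little steps, treat ds as (delta x, delta y),
# dot F with each step, add the pieces up, and watch the sum settle
# down as the steps shrink.

def F(x, y):
    return y, x   # field with potential x*y, so the exact answer is 1

for n in (10, 100, 1000, 10000):
    xs = np.linspace(0.0, 1.0, n + 1)
    ys = xs ** 2
    total = 0.0
    for i in range(n):
        dx, dy = xs[i + 1] - xs[i], ys[i + 1] - ys[i]   # the little (delta x, delta y)
        fx, fy = F(xs[i], ys[i])
        total += fx * dx + fy * dy                      # F . ds for this piece
    print(n, total)   # -> 1.0 as the pieces shrink
```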

No magic. The dot product takes two vectors as input, multiplies each component of one vector with the corresponding component of the other vector, and then sums the three results (or two results, if you’re only working with 2-D vectors) to give a scalar answer. So you do your dot product of the force vector (a, b, c) and the displacement vector (x, y, z), and the end result is the sum ax+by+cz. You can integrate all three of those terms together as a single expression, or break them up into three separate integrals.

One way to see integration is as finding the area under a curve y=F(x). The brute-force method, demonstrated when one is learning integral calculus, is to break that area up into small vertically-oriented rectangles of finite width delta-x, then do a summation of all those rectangular areas from x=a to x=b. So your summation looks like this (sorry, don’t know how to do symbols here):

area under y from a to b = sum (i=a to b, F(i)*delta-x) (i increases in increments of delta-x)

This is necessarily an approximation of the true area under y, and that approximation generally becomes worse as delta-x gets bigger and bigger. So if you want a more accurate approximation, you make delta-x smaller.

The integral is the logical outcome of making delta-x as small as possible - infinitesimal. In that extreme, the big “sigma” used in summation becomes the big script “S” that is the integral symbol, and the delta-x becomes the infinitesimal width dx.
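For instance (Python; F(x) = x^2 on [0, 1] is just an example I picked, with exact area 1/3), the brute-force sum really does close in on the integral as delta-x shrinks:

```python
# Area under y = F(x) = x^2 from a = 0 to b = 1, approximated by
# rectangles of width delta_x, with delta_x shrinking.

def F(x):
    return x ** 2

a, b = 0.0, 1.0
for n in (10, 100, 1000, 10000):
    delta_x = (b - a) / n
    area = sum(F(a + i * delta_x) * delta_x for i in range(n))
    print(f"delta_x = {delta_x:<8} area ~= {area:.6f}")   # -> 1/3 = 0.333333...
```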

So…go back to your original example regarding mechanical work. You’ve got a function describing a 2-dimensional force vector F(s), where s is a 2-dimensional position vector. You want to calculate the work done in moving from s1 to s2. First, you describe the infinitesimal unit of work, dW, done when moving through an infinitesimal displacement ds, where ds is the vector (dx, dy):

dW = F(s) <dot> ds

dW = (F[sub]x[/sub], F[sub]y[/sub]) <dot> (dx, dy)

Remember, dot-product takes two vectors and results in a scalar, so the above becomes:

dW = F[sub]x[/sub] * dx + F[sub]y[/sub] * dy

Now integrate both sides from position s1 to position s2:

W = integral(from s1 to s2, F[sub]x[/sub] * dx) + integral(from s1 to s2, F[sub]y[/sub] * dy)
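A tiny worked example, with a force and a path I just made up: take F = (2x, 3) and walk in a straight line from (0,0) to (1,1), i.e. along (x, y) = (t, t) for t from 0 to 1. Doing it as one dot-product integral, or as the two component integrals above, gives the same answer:

$$W = \int \mathbf{F}\cdot d\mathbf{s} = \int_0^1 \bigl(2t\cdot 1 + 3\cdot 1\bigr)\,dt = \bigl[t^2 + 3t\bigr]_0^1 = 4, \qquad \int_0^1 2x\,dx + \int_0^1 3\,dy = 1 + 3 = 4.$$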

Let me guess, you got integration before derivation…

Along with Sachertorte’s explanation, keep in mind that “d(variable)” means “an infinitesimal amount of variation in (variable)”, and that multiplying or dividing by “an infinitesimal amount” is not the same as multiplying or dividing by zero. dy/dx means “we are deriving Y with regards to X”. You can have a function which involves more than one variable: y(x,z) can be derived with regards to X (with Z treated as a constant) or with regards to Z (with X treated as a constant). In integrals, again, the “d(variable)” tells you with regards to which variable you’re integrating - with every other possible variable treated as a constant - and again it’s something with an actual value; you can multiply and divide by it.
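If it helps, here’s a tiny sketch of that “everything else held constant” idea (Python with sympy; the function y(x, z) = x^2*z + 3*z is just one I made up):

```python
import sympy as sp

# Differentiate or integrate with respect to one variable while the
# other variable just rides along as a constant.
x, z = sp.symbols('x z')
y = x**2 * z + 3*z

print(sp.diff(y, x))        # 2*x*z              (z held constant)
print(sp.diff(y, z))        # x**2 + 3           (x held constant)
print(sp.integrate(y, x))   # x**3*z/3 + 3*x*z   (integrating "dx"; z again a constant)
```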

I really do understand what all of you are saying (well, I don’t quite grasp the Radon-Nikodym theorem), and I know all that information. I know how the dot product works. It’s just that I was told that “multiplication by dx” has literally no meaning, that it’s gibberish with no formal mathematical definition, and that you’re doing an undefined operation.

If that’s not the case, then I can just go on understanding one more thing than I did before. It’s just that I was assured so many times that doing anything to or with dx, other than using it as punctuation for derivatives and integrals, is completely mathematically ill-defined gibberish that it became hard to tell whether they were doing what I thought they were (multiplying by dx) or just abstracting away really complex nonsense.

OK, so yes, it actually has mathematical meaning. Please file that Math teacher alongside that one of mine who claimed that “there is no logic in Math, you just learn it!”

It’s weird, we started by getting dy/dx by using limits on the slope construction (delta y/delta x), and then transitioned straight to “dy/dx is NOT a fraction, it’s an OPERATOR, multiplying by dx is like multiplying by the fraction bar between them. You wouldn’t multiply by ‘/’ would you?”

If I had to guess, the intention may have been so we didn’t try to solve for x by multiplying by dx and dividing by d, but then I’m not sure why they didn’t just tell us not to divide by d (well, my good math teacher did; he said that it’s technically possible to assign d or dx a value with really advanced math backing you up, but that way lies madness, so forget it even has a value).

There’s actually a good treatment in the Wikipedia article on “differential of a function”, with some history of the usage.

Basically, for a function of one real variable f, the differential df is a function of two real variables defined by df(x, deltax) = f’(x) * deltax.
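A quick numerical illustration (Python; using f = sin and a point and step sizes I picked): df(x, deltax) tracks the actual change f(x + deltax) - f(x) more and more closely as deltax shrinks.

```python
import math

# df(x, delta_x) = f'(x) * delta_x as a function of two variables,
# compared with the actual change in f.  Here f = sin, so f' = cos.

def f(x):
    return math.sin(x)

def df(x, delta_x):
    return math.cos(x) * delta_x

x = 1.0
for delta_x in (0.1, 0.01, 0.001):
    actual = f(x + delta_x) - f(x)
    print(f"delta_x = {delta_x:<6} df = {df(x, delta_x):.8f}  actual = {actual:.8f}")
```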

One caveat: While it is true that you can treat “dx” as a variable in almost all contexts, and do things like multiply and divide by it and get right answers, the same is not true of partial derivatives. Thus, for instance, you can say that (dx/dy)(dy/dz)(dz/dx) = 1, by “canceling out” the factors on the top and bottom. But you can’t say that (∂x/∂y)(∂y/∂z)(∂z/∂x) = 1. In fact, that product of partial derivatives is actually negative one.
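A concrete check, with the simplest constraint I could think of tying the three variables together: x + y + z = 0. Holding z fixed, x = -y - z, so ∂x/∂y = -1; by the same token ∂y/∂z = -1 and ∂z/∂x = -1. So

$$\left(\frac{\partial x}{\partial y}\right)_{z} \left(\frac{\partial y}{\partial z}\right)_{x} \left(\frac{\partial z}{\partial x}\right)_{y} = (-1)(-1)(-1) = -1,$$

whereas naive “canceling” would have told you +1.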

This makes zero sense to me. First, we’re told that dx is a very small change in x, then we’re told that dy/dx means “we’re deriving.” Yet 3/4 doesn’t mean “we’re deriving 3 with regards to 4,” so why would dy/dx mean we’re deriving dy with respect to dx?! How do we go from “it’s a very small change” to “we’re deriving”? Those concepts are unconnected.

The other thing I don’t get is why we write d’y/dx’ to mark a derivative. What’s up with the placement of the apostrophes?

It’s a matter of history (as per leahcim’s link). If you go back to the days of Leibniz, when calculus was first invented, they actually did think of dx as a thing that you could do stuff with. Trouble is, it represented an “infinitesimal” quantity, one that wasn’t exactly zero but wasn’t any specific positive number either. Since that wasn’t really mathematically justified and didn’t totally make sense, later generations (specifically, Cauchy) defined dy/dx not as a ratio of infinitesimal quantities themselves, but as a limit of a ratio.

(That’s the way most moderns use it, but in the 1960s “non-standard analysis” was developed, which takes the “infinitesimals” approach of Leibniz and puts it on a rigorous mathematical base.)

We don’t; we either write dy/dx or we write y’ (pronounced “y prime”), depending on which notation we’re using for the derivative.

Note that even in bog-standard analysis there is the exterior derivative that can be used to define these things. Having a well-defined meaning of dx is not the sole province of non-standard analysis. It’s just a PITA to define rigorously at the same level of class when you’re talking about derivatives.

This, in a nutshell, is one of the major problems with math education. Calling d<variable> a “magic operator” isn’t flat-out incorrect–infinitesimals are somewhat different–but ignoring useful properties like infinitesimal multiplication and division because it’s difficult to prove them rigorously (at least in an elementary course) is IMO the sign of a teacher who doesn’t really have fluency with calculus.

In sum, it is perfectly OK to treat infinitesimals like ordinary values/variables as intermediate steps in solving a problem (your final answer, of course, must either have no infinitesimals, infinitesimals in a ratio like dy/dx, or infinitesimals that are under an integral sign). Regarding the ds vector, this is just one of those useful intermediate forms which disappear in a final answer. It’s useful because it generalizes to any coordinate system (the exact form changes if, for example, you use polar coordinates, but this is hidden by just writing ds).
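Spelling out the “exact form changes” remark with the standard expressions: the same infinitesimal displacement is

$$d\mathbf{s} = dx\,\hat{\imath} + dy\,\hat{\jmath} = dr\,\hat{r} + r\,d\theta\,\hat{\theta}$$

in Cartesian and in polar coordinates respectively; writing ds lets you put off choosing a coordinate system until you actually have to compute.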

Well, we don’t say “we’re deriving dy with respect to dx”; we say “we’re deriving (maybe “differentiating” is better) y with respect to x”. The concepts are related in that, knowing an explicit formula relating y to x, a change in x (dx) causes a change in y (dy).

Are you referring to the second derivative of y with respect to x, i.e. should those apostrophes actually be 2’s (written like exponents)?

If so, I think this is an artifact of treating “take the derivative with respect to x” as a kind of multiplicative operator “d/dx”. Thus, “the derivative of y with respect to x” is shown as the operator expression “(d/dx)y”, which if you “reduce” it as if you were multiplying two fractions becomes “dy/dx”. Then if you take the derivative of dy/dx (i.e. the second derivative of y) in the same way, you have the operator expression “(d/dx)(dy/dx)”, and again, if you “reduce” this fractional product you get “d^2y/dx^2”, where it’s understood that the “exponent” in the “denominator” actually applies to the whole denominator, not just the x.
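In symbols, with that reading:

$$\frac{d}{dx}\!\left(\frac{dy}{dx}\right) = \frac{d^{2}y}{dx^{2}},$$

where the 2 downstairs applies to the whole dx, i.e. it is (dx)^2, not d(x^2).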