Please help me understand the Jacobian matrix

I’m just trying to get my head around the Jacobian matrix for a task.

I think I understand derivatives (e.g. dF/dx of “F = x^2” is 2x), and I think I grok gradients (just a vector field where each vector points in the direction of greatest increase).
But suddenly I get to the Jacobian matrix and hit some kind of mental block. I must have read the first 3 pages of google results. I still don’t get it to a sufficient level to know how to apply it.


  • My task, which is not homework :), includes computing the jacobian of a deformation field.
    The deformation field is a function that for each x y z of 3D space, gives a displacement in each of the 3 dimensions.
    So: (x’, y’, z’) = (x, y, z) + Disp(x, y, z)

Well, understanding is one thing and getting your task done is another; they’re related, but you needn’t necessarily fully have the former to do the latter. Still, I’ll concentrate on the former anyway:

I think we teach multivariable calculus in a somewhat needlessly confusing way, largely out of historical reasons more than anything else. Let me try to give you some better intuition so that you can understand how derivatives, the gradient, and the Jacobian are all fundamentally the same concept (just called different things depending on the number of dimensions).

In the following, I’ll write “x |-> x[sup]2[/sup]” to mean the function which sends an input x to the output x[sup]2[/sup], “e |-> 8e” to mean the function which multiplies its input by 8, and so on.

A function’s derivative at a point is the local linear approximation to the function at that point. What do I mean by this?

Well, for starters, let’s introduce the idea of a linear function. We call a function G linear if G(a + b + c + …) = G(a) + G(b) + G(c) + … . For example, e |-> 8e and e |-> -e are linear, while e |-> e[sup]3[/sup] and e |-> sin(e) are not. Functions which distribute across addition like this are of course very convenient; hence our interest in them.

Now, if F is some function, and p is some point, then the derivative of F at p is itself a function. I know, you don’t normally think of it that way. But you should! The derivative of x |-> x[sup]2[/sup] at 4 isn’t really 8; it’s the function e |-> 8e (i.e., “multiply by 8”).

Finally, the purpose of the derivative of a function at a point is to describe, to the best linear approximation, how small changes in the function’s input around that point cause changes in its output. That is, if G is the derivative of F at p, then F(p + e) should be approximately equal to F(p) + G(e) for small e. For example, when we say that e |-> 8e is the derivative of x |-> x[sup]2[/sup] at the point 4, what we mean is that (4 + e)[sup]2[/sup] is approximately equal to 4[sup]2[/sup] + 8e, for small e (this being the best approximation we can give using a linear function).
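To make that concrete, here’s a quick numeric sketch (in Python; the names F and G are just for illustration) checking that (4 + e)[sup]2[/sup] really is close to 4[sup]2[/sup] + 8e for small e:

```python
# Check the linear-approximation idea: F(p + e) ~ F(p) + G(e) for small e,
# where G is the derivative of F at p.
def F(x):
    return x ** 2

def G(e):
    # the derivative of F at 4, viewed as a function: "multiply by 8"
    return 8 * e

p = 4.0
for e in (0.1, 0.01, 0.001):
    # the gap between the exact value and the linear approximation
    # shrinks like e^2 as e gets small
    print(e, F(p + e), F(p) + G(e))
```

The leftover error is exactly e[sup]2[/sup] here, which is why the approximation gets good so fast as e shrinks.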

Alright; what of all this? Well, the point is, nothing I said above assumes F takes in one-dimensional input or produces one-dimensional output. That’s just one common case. But we can look at functions from any number of dimensions to any number of dimensions as well.

Now, if F is a function from n-dimensional space to m-dimensional space, so are its derivatives at any point. And, as you probably are aware, a linear function from n-dimensional space to m-dimensional space can be represented by an m-by-n matrix of numbers.

In particular, if F is a function from 1-dimensional space to 1-dimensional space, then its derivative at any point can be represented by a 1-by-1 matrix of numbers; i.e., by a single number. This, as we saw above, is what’s going on when we say the derivative of x |-> x[sup]2[/sup] at 4 is 8; the 8 is shorthand for the function e |-> 8e which it represents.

If F is a function from n-dimensional space to 1-dimensional space, then its derivative at any point can be represented by a 1-by-n matrix of numbers; i.e., by an n-dimensional vector. This is what’s going on when we say the “gradient” of <x, y> |-> x[sup]2[/sup] + y[sup]2[/sup] at <4, 3> is <8, 6>; the <8, 6> is shorthand for the function <e, f> |-> <8e, 6f> which it represents, and “gradient” is just a fancy word for “derivative of a function from n-dimensional space to 1-dimensional space”. (It happens to be the case that the vector representing the derivative in this case always points in the direction of greatest rate of increase and has magnitude equal to the rate of increase in that direction, but don’t think of that as the fundamental definition of the gradient; the gradient is just the derivative by another name)

Finally, if F is a function from n-dimensional space to m-dimensional space, then its derivative at any point can be represented by a full m-by-n matrix of numbers, and in this case the fancy word we use for it is “Jacobian”.

(And “partial derivatives”? That’s just when you speak of particular entries in this matrix, rather than the entire matrix)

Alright, as for computation: the rows and columns of a Jacobian matrix correspond to particular co-ordinates of the output and input, and the entry at any given point is the partial derivative of that corresponding output co-ordinate with respect to that corresponding input co-ordinate.

For example: the function <x, y> |-> <x[sup]2[/sup]y, x + y, sin(y)>. This is a function from 2-dimensional space to 3-dimensional space. Its Jacobian matrix will be a 3 by 2 matrix. Its first row corresponds to the x[sup]2[/sup]y coordinate of the output, its second row to the x + y, and its third row to the sin(y). Its first column corresponds to the x of the input, and its second column corresponds to the y of the input. At each row and column goes the corresponding partial derivative. So the matrix in full becomes

2xy | x[sup]2[/sup]
------------
1   | 1
------------
0   | cos(y)


(You may use the flipped convention for rows and columns; it doesn’t really matter. Indeed, in many cases, the matrix itself doesn’t matter; it’s just a way of representing a linear function, which is often more conveniently dealt with without resorting to the matrix representation. But when you do want the matrix, this is how you get it; just set up all the partial derivatives in a table)
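If it helps, you can sanity-check that table numerically: approximate each column by nudging the corresponding input co-ordinate a tiny bit and seeing how the output moves. A sketch (Python/NumPy, with made-up names F and jacobian_fd), evaluated at the arbitrarily chosen point <4, 3>:

```python
import numpy as np

# The worked example from above: <x, y> |-> <x^2 y, x + y, sin(y)>
def F(v):
    x, y = v
    return np.array([x**2 * y, x + y, np.sin(y)])

def jacobian_fd(f, p, h=1e-6):
    # Build the Jacobian column by column: column i is how the output
    # moves per unit nudge of input co-ordinate i.
    p = np.asarray(p, dtype=float)
    cols = []
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = h
        cols.append((f(p + step) - f(p)) / h)
    return np.column_stack(cols)

p = np.array([4.0, 3.0])
# The analytic table from the post, evaluated at <4, 3>:
analytic = np.array([[2 * p[0] * p[1], p[0]**2],
                     [1.0,             1.0],
                     [0.0,             np.cos(p[1])]])
print(jacobian_fd(F, p))  # numerically close to the analytic 3-by-2 matrix
```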

You might benefit from a more physical point of view. One application of the Jacobian is in solid mechanics.

The deformation gradient is a linear transformation that converts one configuration of points to another. Let x represent one configuration and X represent another. Then X=Fx, where F is the deformation gradient carrying x to X. The vectors x and X contain the locations of the points in a body.

The determinant of F is the Jacobian, and it turns out to be the ratio of the density in the original configuration to the density in the final configuration.

In other words, if the location of points in a chunk of Jello was x, and you squashed the Jello until the points took on locations X, then the determinant of F (X=Fx) would be the ratio of the final density to the initial density of the chunk of Jello.

You’re speaking about the Jacobian determinant; the OP seems interested in the Jacobian matrix F itself.

Still, the point you make is a good one for defining the determinant of a linear transformation from some space to itself (with square matrices being a particularly co-ordinatey way of looking at this): the determinant is the constant ratio describing how much the “volume” (length in 1 dimension, area in 2 dimensions, and so on) of any figure multiplies by under this transformation.
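A tiny numeric illustration of that volume-ratio view (Python/NumPy, 2 dimensions, with an arbitrarily chosen map A): the unit square, pushed through a linear map, becomes the parallelogram spanned by the map’s columns, and its area is |det A|.

```python
import numpy as np

# An arbitrary linear map of the plane
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# The unit square (spanned by e1 and e2) maps to the parallelogram
# spanned by A's columns; its signed area is the 2D cross product.
col1, col2 = A[:, 0], A[:, 1]
signed_area = col1[0] * col2[1] - col1[1] * col2[0]

print(signed_area, np.linalg.det(A))  # both ~6: every area gets multiplied by 6
```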

[Technically, “oriented volume”, in that the determinant will go negative to reflect volume that turns inside out].

Er, sorry, I should have written <e, f> |-> 8e + 6f here. A function from 2-dimensional space to 1-dimensional space. In general, a vector v can be taken as representing the function w |-> v dot w, and that’s precisely what we’re doing when we say the gradient is a particular vector. Though if we presented things differently, we wouldn’t even have to worry about a lot of this unnecessary back and forth into unwieldy representations.

(While I’m fixing minor errors, Hyperelastic’s last line should read “ratio of the initial density to the final density”, as in their penultimate paragraph. Of course, the ratio of densities goes the other way from the ratio of volumes, the latter being, I think, a more direct conceptualization of the determinant.)

Thanks a lot Indistinguishable, I think I have a pretty good understanding of the Jacobian (matrix) now.
In my mind, derivatives, gradients and the jacobian were separate but related concepts. Of all the links, none made it clear that they are the same thing over different orders.

So, in terms of my problem, and this might be hopelessly naive, would I be right in looking at it like this:

e.g. Let’s say I have a 1D deformation, and its values are [3 6 2 7 4 1].

“Differentiating” at x = 1, I can just use a delta_x of 1 (so, not really “infinitesimal”), yielding a slope of 3.
And similarly, use a delta_y of 1 etc for the 3D case to fill in all the elements of the jacobian?

Yes! I got all excited and read Jacobian and not much else from the OP.

In mechanics, when we say Jacobian, we always mean det F. We call F the deformation gradient.

Indistinguishable, can you please travel back in time and become (all of) my calculus teacher(s)? Please?

Well, I still don’t exactly know what the task you’re supposed to accomplish is, but I think what you’re saying here sounds reasonable.
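To make it concrete, a quick sketch of exactly that (in Python rather than Matlab, using the sample values you gave): with samples on a grid of spacing 1, the forward difference is itself the slope estimate, no division needed.

```python
# The 1D deformation samples from the post, on a grid of spacing 1
values = [3, 6, 2, 7, 4, 1]

# Forward difference at each sample: (values[i+1] - values[i]) / 1
slopes = [values[i + 1] - values[i] for i in range(len(values) - 1)]

print(slopes)  # [3, -4, 5, -3, -3]; the slope at the first sample is 6 - 3 = 3
```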

Aw, shucks; I think I need a blushing smiley. :slight_smile:

I still can’t see how to apply the jacobian to my problem. I’m really tearing my hair out over this. :mad:

So I have a displacement field: it’s a 3D field of 3D vectors. Each vector is just a translation to apply to a control point at that position in the world.

e.g. Applying this field to a regular cube, say, might produce something like this.

But I need to calculate an affine matrix for each point, that describes the local transformation in terms of scaling etc, not translation.
Apparently this matrix is “simply” calculated as M = I + J, where M is the affine transform, I is the identity matrix and J is the jacobian of the distortion field.

But what does “jacobian of the distortion field” mean?

I’m not sure what you mean by “describes the local transformation in terms of scaling etc, not translation”; if “affine matrix” means what I think it means, there is translation involved.

Anyway, the function you’re computing is f(p) = p + d(p), where d(p) is the displacement at point p. That is, you’re ultimately interested in the function f which adds to each point the appropriate displacement. The derivative of f [aka, its Jacobian] at p is therefore the derivative of the identity function at p + the derivative of d at p; the derivative of the identity function is itself (since it’s a perfect linear approximation to itself), so we get the equation M = I + J, where M is the derivative of f and J is the derivative of d.

That’s just in case you were wondering where that equation came from; you’re basically just calculating the derivative of f.

Ok, now, how do you calculate J? That is, how do you calculate the derivative of d? Well, if you’re going to represent it as a matrix, then its first column (well, you may perhaps use the flipped convention for rows and columns, but whatever) is the derivative of the displacement field as you move along the X axis, its second column is the derivative of the displacement field as you move along the Y axis, and the third is the derivative as you move along the Z axis. So, the first column of the matrix, at the point p, can be approximated as (d(p + <tinystep, 0, 0>) - d(p))/tinystep, the second column can be approximated as (d(p + <0, tinystep, 0>) - d(p))/tinystep, and so on for the third column (note that these are all three-dimensional vectors).
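Spelled out as code (a Python/NumPy sketch, not your Matlab; the displacement field d here is made up purely for illustration), that column-by-column recipe plus M = I + J looks like:

```python
import numpy as np

def d(p):
    # A made-up smooth displacement field, just for illustration.
    x, y, z = p
    return np.array([0.1 * x * y, 0.05 * z**2, 0.2 * np.sin(x)])

def jacobian(d, p, h=1e-5):
    # Column i is (d(p + h * axis_i) - d(p)) / h: one column per input axis.
    p = np.asarray(p, dtype=float)
    J = np.empty((3, 3))
    for i in range(3):
        step = np.zeros(3)
        step[i] = h
        J[:, i] = (d(p + step) - d(p)) / h
    return J

p = np.array([1.0, 2.0, 3.0])
J = jacobian(d, p)
M = np.eye(3) + J  # the local affine transform of f(p) = p + d(p)
print(M)
```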

Does that help?

Well yes, that’s a very good explanation, and it does make sense.

But now I’m wondering why my (naive) function doesn’t work. Are you familiar with Matlab?
I’m going to just post up my matlab code; this might be a bit cheeky, especially for GQ, but I’m way past the point of desperation…

I’m not familiar with Matlab syntax, but if you talk me through it, I should be ok. What does the first line do? I think I understand the rest (initialize a 3 * 3 matrix “jacob”; fill in its entries. deformAtPlusXYZ and deform are both, apparently, 3d vectors).

Just from the comment, it sounds like deform is d(p) and deformAtPlusXYZ is d(p + <1, 1, 1>). But you don’t want d(p + <1, 1, 1>) for anything; you want to use d(p + <1, 0, 0>) - d(p) for calculating the first column, d(p + <0, 1, 0>) - d(p) for calculating the second column, and d(p + <0, 0, 1>) - d(p) for calculating the third column. [Well, it’d be even better to take steps of size less than 1 unit if your grid is finer, but I’ll assume you chose 1 unit because that is indeed the resolution of your grid]

That is, it sounds like you’re always looking at steps in the <1, 1, 1> direction. But that’s not what you should be doing; you should be looking separately at steps in the three different directions, the X, Y, and Z directions.

Houston, we have a jacobian!

Thanks Indistinguishable, that worked a treat! I would describe how I want to repay you but it would be NSFW :smiley:

Just one more thing…the values I’m calculating are tiny, I had to scale them up before they matched my reference image. Is there a standard scaling factor (e.g. should the largest jacobian on an image be 1 say?)
</columbo>

Scratch the last thing actually, scaling was just a bug