Interpolation, extrapolation, or other? Multivariate data...

Suppose you have a function of two independent variables, Z = f(X,Y), and a graph comparing the two independent variables shows your (X,Y) point cloud is somewhat long and slender and somewhat crescent shaped, like a bit of scatter along and around the middle region of the positive branch of a hyperbola.

If you want to estimate f for a point (X,Y) that is well inside the cloud, certainly one way to do so is to interpolate between some close neighbors. At least, I think the term “interpolation” can apply to functions of more than one variable.

Suppose you want to estimate f for a point that is partly surrounded by the crescent, so that it is between the middle of the cloud and the line joining the ends of the cloud. All of its nearest neighbors are broadly in the same general direction. Is using those nearest neighbors interpolation or extrapolation? Do we call it interpolation because we’re evaluating f for values of X and Y that are respectively within the ranges of X and Y? Or do we call it extrapolation because it is outside the cluster of neighbors?

Suppose we pick a point to estimate that is far off the crescent but still within the ranges of X and Y; that is, draw a rectangle to enclose the crescent, and estimate a point near the corner of the rectangle that is farthest from any data point. What’s that?

This would be easier if I drew a plot, but there’s one that is useful enough at so imagine we’re talking about a point near the center of the circle that the points roughly follow.


I don’t believe there is a technically correct answer to this question. You have the right understanding of those words as they are used univariately, and you could say you are extrapolating in the y direction and interpolating in the x direction, but that seems very awkward.

However, “extrapolate” also has the looser, non-technical meaning of “infer from the existing data,” so I’d go with that one if forced to choose between the two.

What it sounds like you are describing is a modeling method known as k-nearest neighbors regression. I wouldn’t describe it as interpolation, because you aren’t fitting a slope between nearby values; you are just averaging whatever is available. This is a well-regarded algorithm, but it only works in areas close to where the data are located. Averaging a number of faraway points to get a value is likely to give erroneous results.
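For concreteness, here is a minimal sketch of what I mean by k-nearest neighbors regression, assuming plain unweighted averaging and Euclidean distance. The function name `knn_predict` and the `reach` value it returns are made up for illustration and are not taken from any library or from your setup.

```python
import math

def knn_predict(points, values, query, k=3):
    """Estimate z at `query` as the plain average of the z-values
    of the k sample points nearest to it (hypothetical helper)."""
    # Rank sample indices by Euclidean distance to the query point.
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    nearest = order[:k]
    estimate = sum(values[i] for i in nearest) / len(nearest)
    # Distance to the farthest neighbor actually used. A large value is a
    # rough warning that the estimate leans on data far from the query.
    reach = math.dist(points[nearest[-1]], query)
    return estimate, reach

# Toy data sampled from z = x + y on the corners of the unit square.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vals = [0.0, 1.0, 1.0, 2.0]
estimate, reach = knn_predict(pts, vals, (0.5, 0.5), k=4)
print(estimate, reach)
```

Note that nothing in the averaging step cares whether the neighbors surround the query point or all lie off to one side, which is why I hesitate to call it interpolation.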

What you actually want to do is fit the data to a parametric function and then evaluate that function at the center of the crescent. In that case you would be extrapolating, since you are extending the function away from the region on which it was fit. How best to fit the function depends on the nature of Z as a function of X and Y, and may be rather difficult.

No, my question is actually about the terminology distinction, not ways of accomplishing the estimation. The estimation is already working fine. Any comment on terminology?

I have always thought of the distinction trivially.

Interpolation estimates within the body of samples you have.
Extrapolation estimates outside of the body of samples.

Assuming this is the question, you can do this for any number of independent variables; it is just a matter of definition where you draw the distinction between inside and outside. And that isn’t trivial.
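To show how much the answer depends on that definition, here is a rough sketch comparing three candidate notions of "inside" for a 2-D point cloud: the bounding rectangle, the convex hull, and plain distance to the nearest sample. None of these is the official definition; they are just common choices, and all function names are invented for this example. The crescent is approximated by points on a half circle, which is an assumption standing in for the data described in the question.

```python
import math

def in_bounding_box(points, q):
    """True if q lies within the X range and the Y range of the samples."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs) <= q[0] <= max(xs) and min(ys) <= q[1] <= max(ys)

def _cross(o, a, b):
    # z-component of (a - o) x (b - o); positive means a left turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def in_convex_hull(points, q):
    """True if q is inside or on the convex hull of the samples."""
    hull = convex_hull(points)
    n = len(hull)
    if n < 3:
        return False  # degenerate cloud: no area to be inside of
    return all(_cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))

def nearest_distance(points, q):
    """Distance from q to the closest sample."""
    return min(math.dist(p, q) for p in points)

# Assumed stand-in for the crescent: 21 points on the upper unit half circle.
arc = [(math.cos(math.pi * i / 20), math.sin(math.pi * i / 20)) for i in range(21)]
hollow = (0.0, 0.3)    # partly surrounded by the crescent
corner = (0.95, 0.95)  # near a corner of the enclosing rectangle

for name, q in [("hollow", hollow), ("corner", corner)]:
    print(name, in_bounding_box(arc, q), in_convex_hull(arc, q),
          round(nearest_distance(arc, q), 3))
```

With this toy crescent, the point in the hollow is inside both the rectangle and the convex hull yet far from every sample, while the corner point is inside the rectangle but outside the hull. So the same estimate can be labeled interpolation or extrapolation depending on which boundary you adopt.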