Sure, you can make an argument for all sorts of error metrics. But consider the simplest possible situation: what’s the best-fit number for samples of 0 and 1? Intuitively, we’d expect 0.5, and that’s what least squares gives us. But least-abs could be any number between 0 and 1, which doesn’t seem right (of course, there are situations where it is, such as “where should two friends meet on a road between them if we want to minimize total driving time”, but even then least squares seems to better capture notions of fairness).
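To make that concrete, here’s a quick numerical check (a sketch in numpy; the 101-point grid is just for illustration):

```python
import numpy as np

samples = np.array([0.0, 1.0])
candidates = np.linspace(0.0, 1.0, 101)

# Sum of squared errors: uniquely minimized at the mean, 0.5.
sq_err = [np.sum((samples - c) ** 2) for c in candidates]
print(candidates[np.argmin(sq_err)])  # -> 0.5

# Sum of absolute errors: c + (1 - c) = 1 for every c in [0, 1],
# so every point in the interval ties for the minimum.
abs_err = [np.sum(np.abs(samples - c)) for c in candidates]
print(np.min(abs_err), np.max(abs_err))  # both ~1.0
```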
I wasn’t thinking just of integer exponents (squares, cubes…) but also of rational ones like 2.5. We don’t see powers of 2.5 or 1.5 either.
Perhaps it comes down to how humans perceive the world: the quantities we are interested in (or are capable of being interested in) are related by the square function, or perhaps the relations we care about (or whose derivation comes naturally to us) are the ones where the quantities are related by squares.
I call this the “Bluenose effect”. The Bluenose was a Nova Scotia schooner that was widely held to be the most beautiful of ships, and not coincidentally, she was the fastest. The human eye detects what works best and sees it as beauty. Humans evolved in a world in which recognition of utility had survival value.
There is aesthetic attraction to many things that work well, from arrows to cheetahs. Perfection of design in the geometric universe came first, and the human eye followed through Darwinian selection.
I’m inclined to answer the OP’s question by observing that the cube is the only polyhedron that packs uniformly in space, which locks us into the simplicity of parallelism and perpendicularism. It is unimaginable that space would be a matrix of dimensions that are at odds with squarism.
I have wondered about the geometry of honeybees, which might have some dimensionality that we cannot grasp, since their livelihood depends so heavily on hexagons.
But it’s interesting to consider a species of philosophical fish, inhabiting a two-mile-deep stratum in a four-mile-deep ocean, and who never venture beyond it. They have never experienced the surface nor the seabed, and live in effectively zero gravity. So how do they know up from down? They essentially occupy a universe made of an infinite number of intersecting planes, with no fixed points of reference. How would they reckon geometry?
Minimizing squared-error (with its linear derivative) is often preferred for reasons of tractability rather than efficacy! This criterion gives a simple solution to the linear regression problem. As another example, the proofs that the Discrete Sine and Cosine Transforms are optimal for certain signal compression problems rely on simplifying assumptions including the squared-error criterion. AFAIK there are no similar proofs for an absolute-error criterion.
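To illustrate the tractability point, here’s a sketch of the closed-form least-squares line fit (the data and seed are just for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(size=x.size)

# The derivative of sum((y - A p)^2) is linear in the parameters p, so
# setting it to zero gives the normal equations: one linear solve, done.
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.solve(A.T @ A, A.T @ y)
print(slope, intercept)  # close to (2, 1)

# There is no analogous closed form for sum(|y - A p|); that criterion
# needs an iterative optimizer.
```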
It’s a good thing that you didn’t follow that inclination, then, because you’d be wrong. There are plenty of polyhedra that pack uniformly in space, including the extrusion of any polygon which packs uniformly in the plane, plus some 3d-specific ones like the rhombic dodecahedron.
And while least-absolute-value fits do have some wiggle to them, the wiggle is much reduced for data sets of significant size (the sort you’d usually be fitting lines to in the first place). And if the proofs that certain transforms are optimal depend on assuming the squared-error criterion, then maybe that just means some other transforms are optimal under the least-absolute-value criterion.
Least-squares fitting is based on a number of assumptions which are usually good approximations to the real world, and so it usually gives good results. But those assumptions aren’t always good, so it’s important to understand them and their limitations; that way, when they do fail, you understand why and can use something more appropriate. In particular, a great many real-world probability distributions have fatter tails than the Gaussian, and so produce outliers more often. Those outliers won’t usually show up (that’s what makes them outliers), but when they do, they can totally wreck an analysis based on Gaussian assumptions.
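Here’s a sketch of that failure mode (the data and the choice of optimizer are just for demonstration): a single outlier drags the least-squares line much further than the least-absolute-error fit.

```python
import numpy as np
from scipy.optimize import minimize

x = np.arange(10.0)
y = 3.0 * x + 2.0
y[-1] += 100.0  # one fat-tailed outlier

# Least squares: the outlier's residual gets squared, so it dominates.
ls_slope, ls_intercept = np.polyfit(x, y, 1)

# Least absolute error: the outlier only contributes linearly.
l1 = minimize(lambda p: np.sum(np.abs(y - (p[0] * x + p[1]))),
              x0=[1.0, 0.0], method="Nelder-Mead")

print(ls_slope, ls_intercept)  # pulled well away from (3, 2)
print(l1.x)                    # stays close to (3, 2)
```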
I don’t think the least-squares method deserves too much attention here, because it is only one of many methods that can be used. It is used because it is simpler. Much of machine learning rests on the directional derivative and the direction of steepest ascent/descent.
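For what it’s worth, a minimal steepest-descent sketch for a line fit (the learning rate and iteration count are illustrative, not tuned):

```python
import numpy as np

x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0

slope, intercept, lr = 0.0, 0.0, 0.01  # lr is an illustrative learning rate

for _ in range(5000):
    residual = y - (slope * x + intercept)
    # Gradient of the mean squared error with respect to each parameter;
    # stepping against it is the direction of steepest descent.
    slope -= lr * (-2.0 * np.mean(residual * x))
    intercept -= lr * (-2.0 * np.mean(residual))

print(slope, intercept)  # converges toward (2, 1)
```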
Of course, we would all agree that the reason we use perpendicular axes is that distance travelled along one axis contributes nothing to distance travelled along the other. The resultant displacement in such a system is sqrt(a^2 + b^2).
I was wondering whether there could be another axis system where the resultant displacement (or some other quantity of interest) is obtained using an exponent other than 2.
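One standard way to make that concrete is the Minkowski family of distances, where the exponent p is a free parameter and p = 2 recovers the ordinary Euclidean case (a sketch; the p values below are just examples):

```python
import numpy as np

def minkowski(a, b, p):
    """Distance between points a and b with exponent p; p = 2 is Euclidean."""
    return float(np.sum(np.abs(np.asarray(a) - np.asarray(b)) ** p) ** (1.0 / p))

a, b = (0.0, 0.0), (3.0, 4.0)
for p in (1, 1.5, 2, 2.5):
    print(p, minkowski(a, b, p))
# p = 1 gives 7 (taxicab) and p = 2 gives 5 (Euclidean).  Of the whole
# family, only p = 2 is invariant under rotations of the axes.
```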
Sorry, but this is mystical nonsense. From a purely geometric standpoint we have a preference for things that are symmetrical and in certain supposedly ideal proportions (hence the Golden and Silver Ratios), but when that extends to functionality independent of geometry, “Darwinian selection” of aesthetics offers nothing in terms of optimal performance. The unprecedented performance of the Bluenose was almost entirely an accident of modifications forced by changes in racing rules during her construction, and in general, beauty, even when universally agreed upon, is not a great indicator of optimal performance or strength.
I’m not sure what you mean here, and I don’t think you do, either. In any system of orthogonal Cartesian axes, in any number of dimensions, the distance between two points is the root sum of squares. You could have axes which are not orthogonal, or a metric space that is not flat (Euclidean), in which case the distance would vary with the underlying geometry. But the distance being a function of squared quantities is a basic result of the coordinate transformation from the default coordinate system to one with the line connecting the two points lying coincident with one of the axes. (A sketch of the non-orthogonal case follows below.)
Stranger
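Here is the non-orthogonal case from the post above made concrete (a sketch; the basis vectors are unit length and the angle is arbitrary). A cross term 2ab·cos(theta) appears, and the formula collapses back to the root sum of squares exactly when theta is 90 degrees:

```python
import numpy as np

def oblique_distance(a, b, theta):
    """Length of the vector with coordinates (a, b) on unit basis axes
    separated by an angle theta (in radians)."""
    return np.sqrt(a**2 + b**2 + 2.0 * a * b * np.cos(theta))

print(oblique_distance(3.0, 4.0, np.pi / 2))  # 5.0: orthogonal, no cross term
print(oblique_distance(3.0, 4.0, np.pi / 3))  # ~6.08: the cross term survives
```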