 # Coefficient of covariation?

A statistics question I haven’t had any luck with: Is it possible (or useful) to generate a coefficient of covariation analogous to a coefficient of variation?

If one forms a variance/covariance matrix from a set of data, it shows by the magnitude and sign which factors covary with which others. However, because the scales of the different factors may be different, the meaning of directly comparing several covariances is confused.

I know that the usual way of normalizing a variance/covariance matrix is by dividing the entries by the product of their component standard deviations, resulting in a correlation matrix. Fine. The entries in this matrix will range from negative to positive 1.

However, I also know that you can normalize a standard deviation by dividing it by the mean, producing a coefficient of variation. So is it possible to normalize the covariance by dividing the square root of its magnitude by the square root of the product of the means, assigning the result the sign of the original? This seems to be a normalization analogous to the coefficient of variation, which could perhaps be called the coefficient of covariation. Doing this to my data matrix produces a result very similar to the correlation matrix, except that it is not confined to the range negative to positive 1: it has values generally larger in magnitude than the correlations, and seems intuitively to be preserving more of the original information, but I’m not sure exactly how to interpret this information…

I Googled “coefficient of covariation,” and didn’t find much–only a South African paper in the Pakistan Journal of Applied Sciences by Seeletse in 2001, which appeared to suggest using exactly this math to produce precisely this metric. Unfortunately, two middle pages of the five-page paper are missing from the available PDF!

A friend suggested that one possible reason that this metric isn’t common is that it might not be valid to compare means of two grossly different regimes in this way–i.e., Teslas compared to dollars, or number of kumquats in Borneo to electoral college votes. It might not be valid to derive a unitless covariation between such variables by computing such Tesla*dollar means. This is a point that would be interesting to investigate, but I don’t think it applies to my data: I’m looking at populations of individuals in different samples, so the units should be the same. I’m more interested in eliminating strictly size effects, rather than unit effects. Also, I could see that it might be invalid to compare grossly differently-sized samples in this way–1000 individuals vs. 10, maybe–but I’m just trying to eliminate the size component of middle-of-the-road densities from their covariations to leave strictly the covariance portion without scale differences.

Any thoughts?

Stick with the correlation matrix. The coefficient of variation is nice because it has a very simple interpretation. The coefficient of covariation doesn’t seem like it will.

Covariation requires that your variables have paired values, so how can they have different sample sizes? I guess it’s not clear what you are trying to achieve with your “new” statistical measure. If you can explain this better, maybe you will get better responses.

If statistics was easy to describe, it’d probably be a more popular subject. I have the same number of samples, so they pair; it is the magnitude of the populations within the samples which needs to be normalized. Or, more precisely, the magnitude of the means of the samples, I guess. I apologize that my terminology is probably not terribly precise–but then again, if I could generate exactly precise terminology, I probably wouldn’t have this question. I’ll try to explain a hypothetical setup similar to my own:

If I take, say, N=10 one-acre samples from a forest, each sample may contain a certain number of individuals of each of two species, A and B. You’ll get an average number of each species per acre, X[sub]A[/sub] and X[sub]B[/sub]. Each will also have a standard deviation (SD), s[sub]A[/sub] and s[sub]B[/sub]. You can calculate a coefficient of variation (CV) of each species as s/X, and make meaningful comparisons of their dispersion, irrespective (largely) of the magnitude of their relative means–i.e., the dispersion measures have been normalized. I just generated some fake data, and species A has a mean of 14.4 and a SD of 4.2 individuals/acre, while species B has a mean of 7.9 and a SD of 3.2 individuals/acre. The CVs of the species, however, are 0.3 and 0.4, respectively, showing that B has a higher dispersion than A, even though its SD is lower.
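As a quick check of that arithmetic, here is a minimal Python sketch that recomputes the two CVs from the means and SDs quoted above. The function name `coefficient_of_variation` is only an illustrative label, not something taken from a library.

```python
def coefficient_of_variation(mean, sd):
    """Return sd / mean: the dispersion expressed relative to the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean

# Summary figures quoted in the post (individuals/acre).
cv_a = coefficient_of_variation(mean=14.4, sd=4.2)
cv_b = coefficient_of_variation(mean=7.9, sd=3.2)

print(f"CV of A: {cv_a:.2f}")  # 0.29
print(f"CV of B: {cv_b:.2f}")  # 0.41
```

To one decimal place these are the 0.3 and 0.4 given above: B is the more dispersed species relative to its mean even though its SD is smaller.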

You can find the covariance between A and B as sum((X[sub]i[/sub]-X[sub]A[/sub])(X[sub]j[/sub]-X[sub]B[/sub]))/N, where X[sub]i[/sub] is the number of individuals of species A in the ith acre, X[sub]j[/sub] is the number of species B in that same acre, and the sum runs over all N acres. (I realize the denominator may or may not need to be N-1 for arcane reasons, but let’s not complicate this any more than it already is.) So far it still seems to be a measure of dispersion (or squared dispersion), although it’s getting harder to say dispersion of what. This covariance reduces to the variance if one species is being compared against itself (sum(X[sub]i[/sub]-X[sub]A[/sub])^2/N). For this made-up data, the covariance is -11.46 individuals^2/acre^2, which indicates that they strongly negatively covary: when the population of A is high, B tends to be low (which I know because I set the data up that way). But exactly how strong? Is it significant? The respective variances are 17.8 and 10.0 (obviously: the squares of the SDs), so that at least gives us some sense of the magnitude of the covariation, but nothing very precise. If there was a third species, C, with a covariance to A of -12, it wouldn’t really be clear if it was more or less covariant to A than B is, or by how much.
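The covariance formula above can be sketched in a few lines of Python. The counts below are hypothetical numbers invented purely for illustration (the post's own fake data is not shown), and the divisor is N rather than N-1, following the formula as written.

```python
def covariance(xs, ys):
    """Population covariance (divides by N) of two paired samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Hypothetical counts per acre, invented for illustration (NOT the post's data).
species_a = [10, 12, 14, 16, 18]
species_b = [9, 8, 7, 5, 6]

cov_ab = covariance(species_a, species_b)  # negative: B is low where A is high
var_a = covariance(species_a, species_a)   # a variable's covariance with itself is its variance

print(cov_ab, var_a)
```

The second call shows the special case mentioned above: pairing a species with itself turns the covariance into the ordinary variance.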

I know that the standard practice to normalize the covariance is to divide by the product of the SDs (s[sub]A[/sub]s[sub]B[/sub]) and generate a correlation coefficient (r=s[sub]AB[/sub]/(s[sub]A[/sub]s[sub]B[/sub])=-0.8 in this case–pretty strongly negatively correlated). I also understand the nice property of the correlation coefficient that it’s capped at -1 and 1 (the normalization of the variance by itself; r=s[sub]AA[/sub]/(s[sub]A[/sub]s[sub]A[/sub])), which allows you to say if something is absolutely correlated, inversely correlated, uncorrelated, or somewhere in between. But the CV has a nice intuitive feel to it, because you’re normalizing the SD by something easily understandable: the mean. That removes the central tendency component from a dispersion measure, leaving just the dispersion, and uses both parameters of the Normal distribution. Conversely, while the correlation coefficient itself is fairly intuitive, the method of generating it doesn’t seem as concrete: You divide something that looks like a dispersion squared by two dispersions and get…well, something pretty darn useful, but it doesn’t seem like what one would try a priori. To answer the a priori question “How do I normalize these covariances” (not necessarily the question “How do I generate a metric of correlation”), it seems like one would try the same trick that worked before–dividing something dispersion-y by something central tendency-y. Interestingly, the result of doing so is not very intuitive.
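For reference, the standard normalization can be reproduced from the summary figures already quoted (covariance -11.46, variances 17.8 and 10.0). This is only a sketch; `correlation` is an illustrative helper name, not a library call.

```python
import math

def correlation(cov_xy, var_x, var_y):
    """Pearson r: the covariance divided by the product of the two SDs."""
    return cov_xy / math.sqrt(var_x * var_y)

# Figures quoted in the post.
r_ab = correlation(-11.46, 17.8, 10.0)
r_aa = correlation(17.8, 17.8, 17.8)  # a variable against itself

print(f"r(A, B) = {r_ab:.2f}")  # -0.86
print(f"r(A, A) = {r_aa:.2f}")  # 1.00
```

From these rounded inputs r comes out at about -0.86, in the neighborhood of the rough -0.8 quoted above, and the self-correlation hits the cap of 1.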

To put it another way, if you can normalize the variance/SD in two ways (by dividing by the SD, or by the mean, generating the correlation coefficient and coefficient of variation, respectively), why can you only normalize the covariance/co-SD one way (by dividing by the SDs)? Where’s the fourth entry in this table–the memristor, if you will? The covariance (sum((X[sub]i[/sub]-X[sub]A[/sub])(X[sub]j[/sub]-X[sub]B[/sub]))/N) reduces to the variance (sum(X[sub]i[/sub]-X[sub]A[/sub])^2/N) as a special case; the correlation coefficient (s[sub]AB[/sub]/(s[sub]A[/sub]s[sub]B[/sub])) reduces to unity (s[sub]AA[/sub]/(s[sub]A[/sub]s[sub]A[/sub])); it seems that some coefficient of covariation (CCV), sign(s[sub]AB[/sub])(|s[sub]AB[/sub]|/(X[sub]A[/sub]X[sub]B[/sub]))^0.5, should exist where the CV ((s[sub]AA[/sub]/(X[sub]A[/sub]X[sub]A[/sub]))^0.5=s[sub]A[/sub]/X[sub]A[/sub]) is the special case reduction. (The CCV for this made-up data was -0.32.)
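The proposed metric is easy to sketch in Python from the figures already quoted. The function name is an illustrative assumption, and the guard on the product of the means is an added safety check for this sketch, not part of the original formula.

```python
import math

def coefficient_of_covariation(cov_xy, mean_x, mean_y):
    """sign(cov) * sqrt(|cov| / (mean_x * mean_y)), as proposed in the post."""
    mean_product = mean_x * mean_y
    if mean_product <= 0:
        # Assumption of this sketch: both means are positive (e.g. counts).
        raise ValueError("the product of the means must be positive")
    return math.copysign(math.sqrt(abs(cov_xy) / mean_product), cov_xy)

# Figures quoted in the post: cov(A, B) = -11.46, means 14.4 and 7.9.
ccv_ab = coefficient_of_covariation(-11.46, 14.4, 7.9)
# Special case: A against itself, using the variance of A (SD 4.2 squared).
ccv_aa = coefficient_of_covariation(4.2 ** 2, 14.4, 14.4)

print(f"CCV(A, B) = {ccv_ab:.2f}")  # -0.32
print(f"CCV(A, A) = {ccv_aa:.2f}")  # 0.29, the CV of A
```

The first value matches the -0.32 quoted above, and the second shows the reduction to the CV when a variable is paired with itself.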

I’m not trying to use the CCV for anything in particular; I agree that r works perfectly well. I’m just wondering what the heck (if anything) this CCV thing is or would be, what its properties are, if it’s been tried before, if it’s called something else, if it’s discussed anywhere, if anyone’s heard of it, thought of it, tried it out, played with it, etc. None of my statistics books seem to mention it or anything like it, and Web of Science returned zero results. But I can’t see why not, since a priori it seems so much like the way to approach things. It was certainly the first thing that popped into my head when I was sitting there tiredly looking at a few covariances I happened upon and wondering how to legitimately compare them. Normally, if I was trying to correlate something I would have gone straight to, duh, the correlation coefficient, without ever explicitly thinking about the covariances. But when unexpectedly confronted with covariances, I thought to myself, “Self, I need to normalize these. How? Well, covariances are, as far as I can tell, mathematical generalizations of variances. You normalize variances with the mean; I should be able to normalize these against an appropriately ‘combined’ mean–something like the square root of the product of the means.” Google “coefficient of covariation” and, sure enough, the predominant hit is an (incomplete) paper containing precisely this equation. Spend 20 minutes calculating these CCVs before doing a face-palm and realizing you’re computing an obscure or non-existent metric for something that would’ve been perfectly obvious if you’d had more caffeine. Redo calculations, and spend the next week wondering what the heck you were calculating, and how two such seemingly similar objects can be normalized in such strangely different fashions. Spend too much time on Wikipedia getting your brain warped by excessive Greek notation.

Surely, if there’s such a commonly utilized non-normalized matrix (which seems pretty useless otherwise), and there’s a common normalization that applies to the diagonal of that matrix, somebody would have tried the mathematically simple analogization of that normalization, and it’d be mentioned pretty early in the literature. And if that obvious attempt is a mathematical or interpretational disaster, you’d think textbooks would give a warning as to why it isn’t useful to do, or someone would have published on why exactly it doesn’t work; how it violates some assumption of normality, or is a biased estimator of thus-and-such, or some other statistic-ish problem. But so far I haven’t seen anything…

Actually, both the coefficient of correlation and the coefficient of variation/covariation are dimensionless, because they are ratios of quantities measured in the same units. The coefficient of covariation, though, takes on large values when the denominator (the product of the means) is small relative to the covariance. It is also sensitive to translation of the origin: adding a constant to the data changes the means but not the covariance. This is probably a reason why the coefficient of correlation is preferred, since the latter is insensitive to translations and rescaling. Have you found any interesting use for the CCV?
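The translation point can be illustrated with a small Python sketch. The paired counts are hypothetical numbers invented for this example, and the helper names (`covariance`, `correlation`, `ccv`) are illustrative only; positive means are assumed throughout.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Population covariance (divides by N) of two paired samples."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Pearson r computed from the covariance and the two variances."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def ccv(xs, ys):
    """Signed sqrt of |cov| over the product of the means (positive means assumed)."""
    cov = covariance(xs, ys)
    return math.copysign(math.sqrt(abs(cov) / (mean(xs) * mean(ys))), cov)

# Hypothetical paired counts, invented for illustration.
a = [10, 12, 14, 16, 18]
b = [9, 8, 7, 5, 6]

shifted_a = [x + 100 for x in a]  # translate the origin of both variables
shifted_b = [y + 100 for y in b]
scaled_a = [10 * x for x in a]    # change the scale of A only

cases = [("original", a, b),
         ("shifted +100", shifted_a, shifted_b),
         ("A scaled x10", scaled_a, b)]
for label, xs, ys in cases:
    print(f"{label:>13}: r = {correlation(xs, ys):+.3f}, CCV = {ccv(xs, ys):+.3f}")
```

In this sketch r stays at -0.9 in all three cases. The CCV is unchanged by the rescaling (the covariance and the mean scale together) but shrinks from roughly -0.19 to roughly -0.02 after the shift, because the means grow while the covariance does not.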