Statistics: repeatability

Concerning repeated measurements, what is the mathematical definition of repeatability? Some websites seem to give this as simply the standard deviation, but surely it’s more than that? (and surely it’s not just +/- 3*sigma?)

Related question:

Suppose I have three measurement instruments (that measure different things, e.g. length, pressure, temperature) with known repeatability specifications of X1, X2, and X3. Further suppose that I take simultaneous measurements with these instruments and use those measurements together in a calculation.

How do I calculate the repeatability of that calculated value?

I’m really not sure what you’re trying to ask here. It looks like you’re missing a lot of context.

Instrument/sensor manufacturers typically spec accuracy and repeatability, e.g. these guys. They give a verbal description of the general concept of repeatability:

They also give the repeatability spec for their particular flow meter as 0.2%. But how is that number calculated? One guy takes X readings within a short time, and…then what? What’s the formula for calculating repeatability?

My second question is about how the variability in a calculated value is affected by the repeatability of the instruments whose readings are used in that calculation. Suppose for example I have the following measurement repeatabilities:

Length, 0.5%
Pressure, 1.0%

I have a column of unknown liquid, and I take measurements of its height and the pressure at bottom, from which I want to calculate density:

Density = pressure/(g*height)

Knowing the repeatability specs of my length and pressure measurements, how can I quantify the variability of my density measurement? Is repeatability the wrong term/concept to use in this context?

Perhaps the OP can start here: Accuracy and precision - Wikipedia, and tell us where that leaves him versus where he wants to end up.

ETA: ninja’d by the OP’s response. Continuing from there …

You’re using “repeatability” in the sense of precision per the wiki article.

For your measurement question, you multiply the error bars.

e.g. your pressure measurement isn’t 100 psi. Given your 1% spec, it’s really 99 psi to 101 psi.

Ditto your height measurement isn’t 200 inches; it’s 199 to 201 inches.

So pair the limits that give the largest density (highest pressure with lowest height) and the limits that give the smallest (lowest pressure with highest height). Those two are the envelope for the density *to the same degree of confidence as* the original repeatability / precision specs.
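Here’s a quick sketch of that envelope arithmetic in Python (made-up nominal readings of 100 psi and 200 inches; units handled loosely, since this just illustrates the bookkeeping):

```python
# Worst-case "envelope" propagation for density = pressure / (g * height).
# The 1% and 0.5% figures are the repeatability specs quoted above;
# the nominal readings are invented for illustration.
g = 9.81

p, p_tol = 100.0, 0.01    # pressure reading and its 1% spec
h, h_tol = 200.0, 0.005   # height reading and its 0.5% spec

# Density is largest with the highest pressure and lowest height,
# smallest with the lowest pressure and highest height.
rho_hi = (p * (1 + p_tol)) / (g * h * (1 - h_tol))
rho_lo = (p * (1 - p_tol)) / (g * h * (1 + h_tol))
rho_nom = p / (g * h)

print(rho_lo, rho_nom, rho_hi)   # envelope is roughly +/- 1.5% around nominal
```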

The OP might want to monitor this new thread for developments relevant to his questions. http://boards.straightdope.com/sdmb/showthread.php?t=807014

Then maybe I’m misunderstanding repeatability? There’s a whole separate Wikipedia page for it, so I gather it’s something distinct from standard deviation:

Standard deviation is a bit of a misnomer. IMO the name promises more than it *really* delivers. It’s not wrong, it’s just got a little puffery going on.

Simplifying a bunch, SD is a statistical measuring stick which means “IF the underlying reality is normally distributed, THEN ~2/3rds of the measurements will be observed within +/- 1 standard deviation of the mean.”

The IF part is an assumption about the thing you’re measuring and the tool you’re measuring with. For many situations these are reasonable assumptions. For others not. Knowing the difference is key.

The THEN part says that for many practical purposes, 2/3rds is close enough and we can safely mostly ignore the outliers as we go on to the next step in whatever we’re doing. For many situations that’s a reasonable heuristic. For others not. Knowing the difference is key.
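If it helps, here’s a quick numerical illustration of that IF/THEN statement (a Python sketch with made-up, normally distributed measurements):

```python
import random

# Simulate many measurements of a quantity whose true value is 100,
# with normally distributed measurement error of standard deviation 1.
readings = [random.gauss(100.0, 1.0) for _ in range(100_000)]

mean = sum(readings) / len(readings)
sd = (sum((x - mean) ** 2 for x in readings) / (len(readings) - 1)) ** 0.5

within_one_sd = sum(1 for x in readings if abs(x - mean) <= sd) / len(readings)
print(within_one_sd)   # roughly 0.68 -- about two-thirds, as described above
```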

I suspect Chronos’ request for context in post #2 was aimed at getting the info to know those differences.

Assuming you really do have a normally distributed situation with your problem and with your tool, then the repeatability can be measured using SDs. Metaphorically: repeatability of normally distributed measurements is to SD like length is to inches.

I’m getting close to the end of my very rusty expertise here, so I’m mostly gonna shut up now. My record on saying smart stuff in statistics threads is not good once they really get going. We have plenty of true stats experts who will scoff at my caricatures just above and carry on.

In that case, I would assume that what they’re giving you is the standard deviation, since that’s the most natural choice for how to quantify such things. At least, assuming Gaussian errors, but that’s a very standard assumption (even if it shouldn’t always be).
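So the likely recipe behind a spec like the 0.2% quoted earlier is something along these lines (a sketch only; I’m assuming the manufacturer reports the sample standard deviation of repeated readings as a percentage of the reading, though some may quote 2 or 3 standard deviations instead):

```python
import statistics

# Hypothetical repeated readings of the same steady flow, taken in quick succession
readings = [50.1, 50.0, 49.9, 50.2, 50.0, 49.8, 50.1, 50.0]

mean = statistics.mean(readings)
sd = statistics.stdev(readings)        # sample standard deviation

repeatability_pct = 100 * sd / mean    # expressed as a percentage of reading
print(repeatability_pct)               # ~0.25% for these made-up numbers
```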

And assuming that you are dealing with Gaussian errors, LSLGuy’s procedure is incorrect. That procedure only works if you’ve got tailless error distributions, and your error is specified by a maximum and minimum possible value, but such distributions are very seldom encountered. Assuming Gaussian errors, the proper procedure for the error in a multiplication would be to take the two relative error values and add them in quadrature. So, for instance, if you’re multiplying (or dividing) two numbers which both have a 1% error, then your result will have a 1.41% error, and if you’re multiplying a number with a 3% error by another one with a 4% error, you’ll get a result with 5% error.
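A quick sketch of that quadrature rule, applied to the numbers above and to the OP’s density formula (Python; the 1% and 0.5% figures are the specs quoted earlier):

```python
from math import sqrt

# For multiplication or division of independent quantities,
# relative errors combine in quadrature (root-sum-square).
def combined_relative_error(*rel_errors):
    return sqrt(sum(e ** 2 for e in rel_errors))

print(combined_relative_error(0.01, 0.01))   # ~0.0141, i.e. 1.41%
print(combined_relative_error(0.03, 0.04))   # 0.05, i.e. 5%

# Density = pressure / (g * height), with a 1% pressure spec and 0.5% height spec
print(combined_relative_error(0.01, 0.005))  # ~0.011, i.e. about 1.1%
```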

That’s exactly what I was thinking. Take your object’s length; let’s say it is truly 100 units. If you measure it several times and each time your measuring tool reads 95, then it is perfectly repeatable, but only within 5% in terms of precision.

And it might not even be within 5%. Take the tool and measure something that is truly 50 units long. What if it reads 45 each time? Still perfectly repeatable, but now it is off by 10%, because it has an ***offset*** of 5 units.

Now take the instrument and use it to measure a stack of standards, going up 10 units each time. When we reach 100 on the standards, the tool reads 95. Keep going up to 200 then remove 10 units each time. What does it read at 100? It might read 105. So it has a hysteresis curve peaking at ± 5 units at its midpoint. But it could still be perfectly repeatable in each direction.
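A toy model of that instrument (Python, with all numbers invented to match the example) shows how a gauge can be perfectly repeatable in each direction and still show hysteresis:

```python
def toy_reading(true_value, direction, full_scale=200.0):
    """A perfectly repeatable but imperfect instrument: a hysteresis error
    that is zero at the ends of the range and peaks at +/- 5 units at mid-scale,
    negative going up the scale and positive coming back down."""
    peak = 5.0 * (1 - abs(true_value - full_scale / 2) / (full_scale / 2))
    return true_value + (peak if direction == "down" else -peak)

print(toy_reading(100, "up"))    # 95.0  -- identical every time, so perfectly repeatable
print(toy_reading(100, "down"))  # 105.0 -- yet 10 units apart depending on direction
```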

This is why we calibrate each instrument against standards in an increasing direction, then back down in a decreasing direction. And repeat. This determines precision, repeatability and hysteresis. And makes for very long days.

And it gets even more troubling if we are measuring a system that has moving parts that can have friction. We used to get notes from the research engineers to calibrate the thrust load cell by going up and down 10 lbs per step, with 3 gentle taps of a rubber mallet at every reading! And if they didn’t like the result they would say, “You tapped it wrong…”

Dennis

That’s a failure of accuracy, not of precision.

I’ve never quite understood the definition of repeatability.

Let’s say I take a voltmeter and connect it to a perfectly-stable voltage source and get the following at 1 second intervals:

1.234 V
1.234 V
1.236 V
1.234 V
1.235 V
1.233 V
1.233 V
1.234 V

And then I calculate 2σ or 3σ or whatever. Is that a measure of repeatability? Or am I measuring noise?

So then I turn on more averaging and get the following every ten seconds:

1.234 V
1.234 V
1.234 V
1.234 V
1.234 V
1.235 V
1.235 V
1.235 V
1.235 V
1.235 V
1.235 V
1.236 V

Can I calculate repeatability from that? Looks like short term drift to me.

In other words, wouldn’t “repeatability” be a function of the sampling rate and the total number of samples? Or put another way, isn’t “repeatability” really noise + drift?
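For what it’s worth, just crunching the two series above (a Python sketch) shows how the number you get depends on the averaging and the time span:

```python
import statistics

# The two sets of voltmeter readings from above
fast = [1.234, 1.234, 1.236, 1.234, 1.235, 1.233, 1.233, 1.234]   # 1 s apart
slow = [1.234, 1.234, 1.234, 1.234, 1.234, 1.235,
        1.235, 1.235, 1.235, 1.235, 1.235, 1.236]                 # 10 s apart, averaged

print(statistics.stdev(fast))   # ~0.001 V  -- dominated by noise
print(statistics.stdev(slow))   # ~0.0007 V -- smaller, but the readings trend upward (drift)
```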

Correct. The OP should google “propagation of uncertainties” for more info and examples.

There are two aspects to what the o.p. is referring to as “repeatability”: one is the error in the measurement method (deviation from actual), and the other is the natural variability in the property being measured. From a statistical standpoint these are lumped together in a population of measurements, and the ability to obtain measurements that fit within the previous distribution is called confidence (the confidence level or confidence interval, depending on how it is stated). It takes careful trend analysis using measurements of a calibrated property (a “control” in experimental terms) in order to separate the variance of measurement error from the variance in the measured property. This is often necessary when a measurement tool experiences drift (say, from heating up during measurements) or when different tools are used to measure a property (one ruler might just be slightly longer than another, or we might see “rounding” errors due to converting measurements taken at different scales or recorded in different units of measurement).

Assuming the relationship you are measuring is linear in nature and the measurement error is distributed in a Gaussian fashion, performing an ordinary least squares regression will give an estimate of how much natural variability can be attributed to the property versus measurement or other errors, in the form of the coefficient of determination (often referred to as R[SUP]2[/SUP]), which is unity minus the ratio of the sum of squares about the OLS fit to the total sum of squares about the mean. A high value for R[SUP]2[/SUP] indicates a good correspondence between the regression fit and the data (few “outliers”); however, this does not necessarily mean that the trend is “true”, merely that the data shows low variability, a nuance that is often lost on people who do not look critically at the data set. For instance, a data set that has very few measurements, or for which the measurements fall into just a couple of small clusters about a couple of loci on the independent scale, may naturally have a high R[SUP]2[/SUP] even if the actual trend is not at all linear, which would be shown by increasing the range of measurements at different values of the independent property. If you had only two measurements, the R[SUP]2[/SUP] would naturally be 1, even though it actually tells you nothing about the distribution.
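A minimal sketch of that calculation (Python with NumPy; the data are invented, just to show the mechanics of R[SUP]2[/SUP]):

```python
import numpy as np

# Invented (x, y) measurements -- roughly linear with some scatter
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Ordinary least squares fit y = a*x + b
a, b = np.polyfit(x, y, 1)
y_hat = a * x + b

ss_res = np.sum((y - y_hat) ** 2)      # sum of squares about the fit
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares about the mean
r_squared = 1 - ss_res / ss_tot        # coefficient of determination

print(a, b, r_squared)
```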

There are other tests that measure the actual “goodness of fit” of the data to an assumed theoretical distribution, or can be used to compare the distributions of multiple sample populations of data or multivariate sources of variability in order to perform an analysis of variance (ANOVA) between populations, but these basically test how well your data fills out a distribution or how repeatable your measurements are given a certain sample size within a defined confidence interval; actual variability due to measurement error can only be distinguished by comparison to a known controlled distribution. In reality most errors and other supposedly randomly distributed properties are not strictly Gaussian (or have multiple dependent variables which each have their own variance that has to be accounted for) but are close enough that a Gaussian model is a good approximation and fits closely enough to discern actual bias in the measurement or phenomenon in question.
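For instance, a one-way ANOVA across a few sample populations might look like this (a SciPy sketch with invented readings, assuming scipy is available):

```python
from scipy import stats

# Invented repeated readings of the same standard taken with three different instruments
instrument_a = [10.01, 10.03, 9.98, 10.02, 10.00]
instrument_b = [10.05, 10.07, 10.04, 10.06, 10.05]
instrument_c = [10.00, 9.99, 10.01, 10.02, 9.98]

# One-way ANOVA: is the variation between instruments large compared with
# the variation within each instrument's own repeated readings?
f_stat, p_value = stats.f_oneway(instrument_a, instrument_b, instrument_c)
print(f_stat, p_value)   # a small p-value suggests the instruments genuinely disagree
```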

However, it should be noted that when you are presented with a tolerance in measurement that is just +/- x.xxx with no other information, the only reasonable assumption you can make is that the error due to measurement is uniform across the range, i.e. that there is an equal chance of the error being +x.xxx as there is of it being 0.000, and no chance of exceeding x.xxx down to the precision given by the number of significant figures. (The measurement error tolerance should have the same number of significant figures as the measurement, but you’ll find that this often isn’t the case with many instruments, in which case the measurement should be adjusted to match the significant figures of the tolerance.) While the strictly random part of the error should be Gaussian by definition (and therefore has some small potential of having a measurement arbitrarily far from the mean), the stated error tolerance may include sources of bias such as effects from the environment that will provide a consistent offset from the mean (again, an instrument heating up as measurements are taken is a common one).

There are plenty of resources on the topic of measurement error and statistics, but I like John Mandel’s The Statistical Analysis of Experimental Data as a good starting point for understanding measurement error and basic regression analysis. Unlike most books on statistics, Mandel speaks specifically to the inherent variability in measurement as distinct from other sources of variability. The National Institute of Standards and Technology (NIST) also has the excellent online Engineering Statistics Handbook, which is a good starting place for understanding the applications of statistical methods to experimental measurement.

Stranger

The general mathematical framework for doing this is Bayesian Networks.

The mathematics involved in doing it can be rather simple and intuitive, or quite complex, depending on your system:

If the 3 variables that you are measuring are completely independent, then you just can’t combine the measurements.

If the 3 variables are proportional one to the other and the ‘errors’ of the instruments are Gaussian, then there are relatively simple mathematical formulations.

If the 3 variables are linked by probabilistic relationships involving non-linear formulas (e.g. multiplications or sinusoids), or if your model has hidden variables, then things can get a little hairy (a brute-force sketch of the non-linear case is below).
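One brute-force way to handle the non-linear case, short of full Bayesian-network machinery, is straight Monte Carlo propagation; here is a sketch using the OP’s density formula, assuming independent Gaussian errors of 1% on pressure and 0.5% on height (nominal values invented):

```python
import random
from math import sqrt

g = 9.81
p_nom, h_nom = 100.0, 200.0   # invented nominal readings

samples = []
for _ in range(100_000):
    p = random.gauss(p_nom, 0.01 * p_nom)    # 1% pressure error
    h = random.gauss(h_nom, 0.005 * h_nom)   # 0.5% height error
    samples.append(p / (g * h))

mean = sum(samples) / len(samples)
sd = sqrt(sum((d - mean) ** 2 for d in samples) / (len(samples) - 1))
print(100 * sd / mean)   # ~1.1% relative spread, in line with the quadrature rule above
```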

If you are interested in digging into that, here is the tutorial I learned Bayesian networks from a while ago. I don’t know how up to date it is, but it is pedagogical.