Statistics question

Sorta long, sorry.

My work is in the area of very very high-reliability electronics. So we have a strict rule: exposed pure tin (Sn) is not allowed, and pure Sn finishes/platings are not allowed. This is because “tin whiskers” can grow from pure Sn, causing short circuits. To prevent this from happening, we require solder to contain at least 3% lead (Pb) by mass. (Traditional solder is 63% Sn and 37% Pb.) Also, the plating/finish on component terminals/leads cannot be pure Sn; the finish must contain at least 3% Pb. (A traditional solder finish is best, as it contains 37% Pb. But we allow all the way down to 3% Pb. But not lower.)

A contractor assembled a bunch of printed circuit boards (PCBs) using traditional solder (63% Sn / 37% Pb), which is good. But… some of the capacitors that were soldered to the PCBs had a pure Sn finish on the leads/terminals. :roll_eyes:

Now, in the areas where the solder covered the terminals during soldering, it’s fine. But the solder did not completely cover the top of the terminals. This explains it better:

The contractor said, “Nothing to worry about. Even though it doesn’t look like it, some of the solder did make its way to the top of the terminal, which means some Pb made its way to the top of the terminal.” To prove this, they measured the percentage of Pb at the top of the terminals for each capacitor using XRF. They took five measurements on each terminal. Here is data from one terminal for one of the capacitors:

(That’s not the actual data or actual photo, but close to it.)

Contractor is saying, “See, look! The lowest value on that terminal is 3.6% Pb, which is higher than the 3% minimum. All is good!”

So I thought, hmm, let’s do some statistics on this. So I estimated the population mean by doing this:

N = 5
Degrees of freedom (DF) = 4
Sample average = 6.26
sample standard deviation (sn-1) = 3.494
Confidence level (CL): 95%
t_value = 2.776 (based on DF and CL)

Therefore, and with 95% confidence, the population mean is estimated to be:

6.26 ± (t_value * sn-1)/sqrt(N)
= 6.26 ± (2.776 * 3.494)/sqrt(5)
= 6.26 ± 4.34, or between 1.92 and 10.60

So with 95% confidence, the population mean is estimated to be between 1.92% Pb and 10.60% Pb.
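For anyone who wants to check the arithmetic above, here is a minimal sketch that reproduces it from the quoted summary statistics. The variable names are mine, and the t-value is hard-coded from the post rather than looked up by the code.

```python
import math

# Summary statistics quoted in the post (the raw readings are not reproduced here).
n = 5
sample_mean = 6.26   # % Pb
sample_sd = 3.494    # % Pb, computed with the n-1 denominator
t_value = 2.776      # two-sided 95% critical value for 4 degrees of freedom

# Half-width of the interval: t * s / sqrt(n)
half_width = t_value * sample_sd / math.sqrt(n)
lower = sample_mean - half_width
upper = sample_mean + half_width

print(f"95% CI for the mean: {lower:.2f} to {upper:.2f} % Pb")
```

This only confirms the calculation. It says nothing about whether a t-interval on five readings is the right tool for the question.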

Is the above correct and valid? And is this the best way for me to do this? I’m thinking it would be better to compute the probability the population mean is less than 3%. Or perhaps even better, the percentage of the population that is less than 3%. But I don’t know much about statistics, and don’t know how to do that.
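One very rough way to get at "the percentage of the population that is less than 3%" is a plug-in estimate. The sketch below assumes the readings are normally distributed with exactly the sample mean and standard deviation. Both of those are assumptions of mine, not anything established in the post, and the estimate ignores the uncertainty that comes from having only five readings.

```python
from statistics import NormalDist

# Summary statistics from the post; treating them as the true
# population values is an assumption made only for illustration.
sample_mean = 6.26   # % Pb
sample_sd = 3.494    # % Pb
spec_limit = 3.0     # % Pb minimum

# Fraction of a normal distribution that falls below the spec limit.
fraction_below = NormalDist(mu=sample_mean, sigma=sample_sd).cdf(spec_limit)

print(f"Plug-in estimate of readings below {spec_limit}% Pb: {fraction_below:.1%}")
```

Under those assumptions the answer lands somewhere in the high teens of percent. That is nowhere near "all is good", even though every individual reading was above 3%.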

I would suggest testing more than a single capacitor. For example, you could pass/fail each one, and if you test, let’s say, 60 of them and they all pass then you could be 95% confident that at least 95% are good.
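The "60 of them" figure can be sanity-checked with the usual zero-failure calculation. If the true pass rate were only R, the chance of n passes in a row is R^n, so you need the smallest n with R^n ≤ 1 − confidence. This is a sketch with a function name of my own choosing, and it assumes independent, randomly selected parts.

```python
import math

def zero_failure_sample_size(reliability, confidence):
    """Smallest n such that n passes out of n demonstrates `reliability`
    at the given `confidence`, assuming independent random samples."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

n_required = zero_failure_sample_size(reliability=0.95, confidence=0.95)
print(n_required)
```

The strict minimum comes out just under 60, so 60 is a sensible round number.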

You seem to be suggesting that the data off each one is suspect, however, as in you took five measurements of a single terminal and got wildly differing results each time. Therefore it sounds like you really need to know the accuracy and precision of the XRF instrumentation in use. ±20%? 30%? Is there a way to prepare your own (known) samples and get an accurate number?

Looking at this from a different point of view, why are you allowing the contractor to try to talk his way out of using pure Sn tips when you specified a minimum 3% Pb?

I think I’d have to have more industry knowledge to speak about how acceptable that is; however, I just want to note that a 3% mixture is not necessarily the same as a uniform 3%. Like if you sampled an area that used your acceptable 3% lead, you’d get some readings above and some below, right?

Echoing the others sorta.

So they took 5 samples from a single exemplar installed capacitor and are claiming somehow that’s adequate proof of a production run measured in dozens to thousands? And you don’t / can’t know if they cherry-picked that one?

I am most definitely not a statistician. But that doesn’t even pass the laugh test.

Statistically speaking what they demonstrated is that they are utterly clueless as to everything about statistical process control. Or they’re bad at lying. Or they think you & your organization are fools. Probably a mix of all three, but none are a good look.

Assuming I accurately understand the situation. ETA: Which based on the response below I evidently do not.

Each PCB contains five of these “risky” capacitors, and there are 1,500 PCBs. They took XRF measurements on a few of them. The data I presented is just one of the data sets they provided.

That’s a good point; I don’t know how accurate their XRF tool is. I guess I was just trying to show that simply looking at the minimum value of five measurements is not valid from a statistical POV, hence the reason for estimating the population mean. But as mentioned in my OP, I’m thinking it would be better to compute the probability the population mean is less than 3%. Or perhaps even better, the percentage of the population that is less than 3%. But I don’t know much about statistics, and don’t know how to do that. Regardless, I am going to recommend the PCBs be reworked to fix the problem; was just looking for better statistical treatment.

I am not sure what was originally specified – I haven’t seen the contract. I am working on the assumption that the contract didn’t specify a minimum Pb content for the finishes on component terminals. (If it did, you’re correct, and we simply tell them to fix the problem.)

Strictly speaking, if you have a population, and some variable has a mean, then there is no “probability the population mean is less than 3%” — it has some definite value, either less than 3% or not less than 3%! What you can do is hypothesis testing with a p-value.
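As a sketch of what that hypothesis test might look like with the numbers in the OP: take the null hypothesis "the mean is at most 3% Pb" and ask whether the data are strong enough to reject it. The one-sided critical value of 2.132 (95%, 4 degrees of freedom) is a standard t-table value that I have hard-coded. The test also assumes roughly normal, independent readings, which is questionable here.

```python
import math

n = 5
sample_mean = 6.26   # % Pb
sample_sd = 3.494    # % Pb
spec_limit = 3.0     # % Pb, value of the mean under the null hypothesis
t_critical = 2.132   # one-sided 95% critical value for 4 df (from a t-table)

# One-sample t statistic for H0: mean <= 3 versus H1: mean > 3
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - spec_limit) / standard_error
reject_null = t_stat > t_critical

print(f"t = {t_stat:.3f}, reject H0 at the 5% level: {reject_null}")
```

The statistic falls a little short of the critical value, so these five readings do not demonstrate that the mean exceeds 3%, let alone that every spot does.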

I am just not positive that I grok what the XRF measurements are telling us. It seems that, looking at a single capacitor, both the amount of lead varies from point to point [in some way, but how exactly?], and that the measurements taken at any given fixed point are also noisy? So you will have to take both of those things into account.

The OP represents the gold standard of providing context. Thanks for the details.

A first major question in my mind is this: Taking the 3% Pb content as the spec, is this XRF measurement really responsive to that spec? The spec comes from the fact that 3% Pb-Sn won’t whisker out due to the Pb doping messing with Sn crystal growth, but this isn’t 3% Pb-Sn. It’s bulk pure Sn (which will whisker out) covered by a layer of 37% Pb-Sn. Pure Sn needs a robust coating to keep the whiskers at bay, and the chosen coating could presumably be 37% Pb-Sn, but it still needs to be robust enough to prevent whisker penetration (as these grow from the bottom).

In this interpretation, the coating is made from 37% Pb-Sn, but the XRF shows much lower Pb fractions due to X-ray penetration. That is, the new layer would just be a very thin layer of 37% Pb-Sn, and the XRF is reporting a mix of that thin layer plus the pure Sn below. If the penetration depth of the XRF measurement for these materials is known, then the reported Pb fractions could be converted into thicknesses. Some Googling of XRF penetration depths for Sn yields a rough estimate of maybe 30 microns for the new layer’s thickness (for the lowest fraction cases).

The question becomes whether such a thin layer of 37% Pb-Sn is sufficient to prevent whisker penetration from the Sn underneath. Conformal coating with parylene is recommended to be closer to 100 microns for this purpose, but it’s also a very different material (better? worse?)

Or maybe the Sn terminals are heated enough such that the Pb-Sn solder infuses throughout the pure Sn? The photo suggests that the Sn on the terminals never melts, though, so I would naively think that doesn’t happen to any significant extent.

Or maybe my picture is all wrong in some way; I’m no metallurgist and am happy to be corrected. Setting the metallurgy aside and talking numbers…

The variations in the readings are due to something we don’t have a model for. For purely statistical problems (like how many times I rolled snake eyes in a series of craps sessions), the model is perfectly understood, so concrete statistical inferences can be made.

But here, the variation is due to god-knows-what: Intrinsic precision of the XRF? Non-linearity in the instrument? Measurement bias due to the geometry of the target, alignment of the XRF, and/or the collimation used in the measurements? Variation in solder flows due to geometric effects or inconsistent heating? Actual variations in bulk Pb fraction (if such an interpretation makes sense)? Or maybe the dominant cause of failures is some discrete but uncommon problem, not tied to any “continuous” mechanism.

Statistics can’t climb us out of that swamp of unknowns. Even if we take the XRF readings as fully precise and accurate, there is no model for extrapolating a failure rate. Extrapolation with a normal distribution – or any other distribution – won’t mean anything unless there is a defensible model in hand that says the Pb fraction should follow said distribution. The only statistical path in the absence of a model is to take enough samples to place an empirical, statistical limit on the failure rate. For example, if you want to say with 95% confidence that the failure rate is less than 1%, you need to test 300 good parts with no failures seen. For tighter tolerances, you can scale linearly from here. So, saying “The failure rate is <0.1% with 95% confidence” requires testing 3000 parts and seeing no failures. Of course, if failures are observed, one can set statistical upper and lower limits on the rate.
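The 300 and 3000 figures can be checked from the other direction: given n parts tested with zero failures, what is the largest failure rate still compatible with that result at 95% confidence? A minimal sketch, assuming independent randomly chosen parts, with a function name of my own:

```python
def zero_failure_upper_limit(n_tested, confidence=0.95):
    """Upper confidence limit on the failure rate after n_tested parts
    all pass: the p that solves (1 - p)**n_tested == 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n_tested)

for n_tested in (300, 3000):
    limit = zero_failure_upper_limit(n_tested)
    print(f"{n_tested} passes, 0 failures: failure rate < {limit:.4%}")
```

The limits come out just under 1% and 0.1% respectively, which is the linear scaling described above (roughly 3 divided by the number tested).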

But without a defensible model for the failures or without enough direct empirical data, any claim of an upper limit on failure rate will be fanciful.

I’d just have them re-work the boards with the correct capacitors at their expense.

One-and-done. With no math needed.

Addressing the statistics part of the question: you’re assuming a symmetric distribution about the mean. But this cannot be true. For very very small standard deviations, with a mean sufficiently far away from both zero and 100%, this is an ok approximation. But for those numbers it is not. Bear in mind the concentration can never be less than zero or more than 100. Who knows what the “correct” distribution is (probably some variant of a beta distribution) but you can do better than what you have by taking the natural log, assuming a normal distribution in log space, calculating what confidence intervals you need, then exponentiating the results. This is called a log normal distribution. Strictly speaking it is not totally correct but it is a step in the right direction and is easy to do.
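A minimal sketch of that log-space recipe follows. The five readings are hypothetical values I made up for illustration, not the thread's actual data. The t-value of 2.776 is the one quoted in the OP for 4 degrees of freedom. Note that exponentiating gives an interval for the geometric mean (the median of a log-normal), not the arithmetic mean.

```python
import math
from statistics import mean, stdev

def lognormal_interval(readings, t_value):
    """t-interval computed on log(readings), then mapped back with exp().
    Returns (lower, geometric_mean, upper) in the original units."""
    logs = [math.log(x) for x in readings]
    centre = mean(logs)
    half_width = t_value * stdev(logs) / math.sqrt(len(logs))
    return (math.exp(centre - half_width),
            math.exp(centre),
            math.exp(centre + half_width))

# HYPOTHETICAL readings in % Pb, for illustration only.
readings = [3.6, 4.0, 5.2, 6.3, 12.2]
low, geo_mean, high = lognormal_interval(readings, t_value=2.776)

print(f"geometric mean {geo_mean:.2f}, 95% interval {low:.2f} to {high:.2f} % Pb")
```

One nice property of working in log space is that the lower limit can never go negative, unlike the symmetric interval.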

Edit: I just noticed you are asking about the population mean, so yes you can rely on the central limit theorem for that. But I’m not sure that is the right quantity to focus on. Would a better question be: what is the probability that a given capacitor contains a coating that is less than 3% lead in some place?

From Confidence interval - Wikipedia with the important part:

A 95% confidence level does not imply a 95% probability that the true parameter lies within a particular calculated interval, which is instead associated with the credible interval in Bayesian inference. The confidence level instead reflects the long-run reliability of the method used to generate the interval. In other words, if the same sampling procedure were repeated 100 times from the same population, approximately 95 of the resulting intervals would be expected to contain the true population mean.
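The "long-run reliability" reading can be illustrated with a small simulation: draw many samples of five from a population whose mean is known, build the same t-interval each time, and count how often the interval covers the true mean. The population parameters below are assumptions chosen only to resemble the OP's numbers, and normality is assumed.

```python
import math
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 6.26, 3.494   # assumed "true" population, illustration only
N, T_VALUE, TRIALS = 5, 2.776, 10_000

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    half_width = T_VALUE * stdev(sample) / math.sqrt(N)
    # Does this particular interval contain the true mean?
    if abs(mean(sample) - TRUE_MEAN) <= half_width:
        hits += 1

coverage = hits / TRIALS
print(f"Fraction of intervals covering the true mean: {coverage:.3f}")
```

The coverage should come out close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the procedure.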

A subtle distinction, but the conclusion you drew isn’t the exact result being shown.

I do not think there is going to be much of a difference in the required number of trials.

If the failure rate is p and you test N parts, then the probability that they all pass is (1-p)^N.

On the other hand, if you take a Bayesian approach and the rate is a priori uniformly distributed between 0 and 1, then the probability density of the failure rate at p given that they all pass is (N+1)(1-p)^N, therefore the probability that it is ≤ p is 1-(1-p)^{N+1}, not a big difference if you are testing dozens or hundreds of parts.

[sorry, I may have swapped p and 1-p somewhere above…!]

If I can avoid introducing typos upon editing this time, what I wanted to say was, if the failure probability p is uniformly random between 0 and 1, then, a posteriori, the probability that it equals exactly p given that no failures were observed is (N+1)(1-p)^N dp. The probability that it is less than or equal to p is therefore 1-(1-p)^{N+1}.
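As a quick check on that last step, the posterior density (N+1)(1-q)^N can be integrated numerically from 0 to p and compared with the closed form 1-(1-p)^{N+1}. This is only a sketch: the function names are mine, and the values N = 100 and p = 0.03 are arbitrary.

```python
def posterior_cdf_closed_form(p, n):
    """P(failure rate <= p | n passes, 0 failures), uniform prior."""
    return 1 - (1 - p) ** (n + 1)

def posterior_cdf_numeric(p, n, steps=100_000):
    """Midpoint-rule integral of the density (n+1)(1-q)^n from 0 to p."""
    width = p / steps
    total = 0.0
    for i in range(steps):
        q = (i + 0.5) * width
        total += (n + 1) * (1 - q) ** n * width
    return total

n_parts, p = 100, 0.03   # arbitrary illustrative values
closed = posterior_cdf_closed_form(p, n_parts)
numeric = posterior_cdf_numeric(p, n_parts)
print(f"closed form {closed:.6f}, numeric {numeric:.6f}")
```

The two agree to many decimal places, which confirms that the density integrates to the stated cumulative probability.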

For example, if I want to be “95% confident” that it is <0.1%, I can set 0.95 = 1 − (0.999)^{N+1}