How does a bell curve distribution change if you cut off the end

Wesley_Clark · September 29, 2016, 4:57pm

Let’s say you have a bell curve distribution. You cut off the bottom 25%. What happens to the mean and median? How do you calculate that?

What if you only take the top 10%?

Is there a website where I can put in a number to see how it changes the distribution?

Riemann · September 29, 2016, 5:37pm

I don’t have an easy answer - really because at first glance this is a rather odd thing to want to do. I’m rather interested in the actual thing you’re looking at – what kind of random variable has the property of never falling below a certain value, but follows a “chopped off” Gaussian distribution above that value? I can almost imagine a few things, but I’m not really sure how plausible they are.

Some things that don’t answer your question that may be relevant are the lognormal distribution, or deviations such as kurtosis and heteroskedasticity.

Saffer · September 29, 2016, 5:39pm

Look up conditional value at risk. Also called expected shortfall. Basically it is the expected value of a random variable, on the condition that it is less than some value. It is a popular measure of risk in finance.

KarlGauss · September 29, 2016, 11:12pm

By any chance, are you asking about the following type of situation:

Assume there is a college class where the students’ ‘intelligence’ and skill at taking exams are both normally distributed (with the distributions bell curve shaped).

If you remove, say, the bottom 25 percent, is there a way of making the remaining 75 percent of students normally distributed with respect to intelligence and exam taking skill? I think the answer must be ‘no’. (As a result, I was frequently upset when profs tried to effect such miracles, e.g. when they tried to make a bell curve out of the remaining students, for the evil purpose, of course, of ‘belling’ our marks.)

septimus · September 30, 2016, 4:40am

One practical but brutish way to do this is by using standard tools to just generate some random data and then calculate their statistics. For example
rnvars 400000 100 15 | statmoment
generates 400,000 normal variates of mean 100 and s.d. 15, and passes them to a statistics program. This prints out, about as expected,
400000 items: mean 99.992430 sd 15.011881 skew 0.008847 kurt 3.002609 min 26.098508 max 167.497638
Now do the same thing, but sort the random data and discard the bottom 25% before showing the statistics:
rnvars 400000 100 15 | sort -nr | head -300000 | tee tmp | statmoment
printing
300000 items: mean 106.343541 sd 11.004399 skew 0.732533 kurt 3.229362 min 89.838353 max 167.497638
statmoment doesn’t show the median, but you can get that with standard Unix tools:
echo -n Median is" " ; head -150000 tmp | tail -1
prints
Median is 104.752155

If the 25% truncation case is all you wanted, you should be able to just scale the above numbers and use them.
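
For anyone without rnvars and statmoment on hand, here is a rough Python equivalent of the same brute-force experiment. It is only a sketch: the function name simulate_truncated, its parameters, and the fixed seed are made up for illustration, and the exact figures will wobble a little from run to run and seed to seed.

```python
import random
import statistics


def simulate_truncated(n=400_000, mean=100.0, sd=15.0, cut_bottom=0.25, seed=0):
    """Draw n normal variates, drop the lowest cut_bottom fraction, summarize the rest."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs are repeatable
    values = sorted(rng.gauss(mean, sd) for _ in range(n))
    kept = values[int(n * cut_bottom):]  # discard the bottom of the sorted sample
    return {
        "count": len(kept),
        "mean": statistics.fmean(kept),
        "sd": statistics.pstdev(kept),
        "median": statistics.median(kept),
        "min": kept[0],
    }


if __name__ == "__main__":
    for name, value in simulate_truncated().items():
        print(name, value)
```

With mean 100 and s.d. 15, the results should land close to the figures quoted above: a mean a bit over 106, a median a bit under 105, and a minimum just under 90.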

Ruken · September 30, 2016, 11:05am

You can also similarly brute-force it in excel, if that’s more the OP’s game.

Dinsdale · September 30, 2016, 4:43pm

Not directly responsive, but reminded me of Gould’s book Full House where he discussed changes in mean/median/mode. He was generally speaking of distributions with a hard minimum or maximum, which would be skewed one way or the other. Discussed the effect on the mean/median/mode as the tail extended. As a general rule, I believe the median and mode are less responsive than the mean to changes at the extreme. So if you chopped off one of the tails, I would imagine the mean would change the most, the median less so, and the mode likely not at all.

That’s all this math-impaired lawyer can add (other than that I highly recommend that book)!

Buck_Godot · September 30, 2016, 6:02pm

Wikipedia to the rescue

Buck_Godot · September 30, 2016, 6:16pm

I just noticed that the page I gave didn’t have the median calculated, but it’s pretty easy.

Using EXCEL’s terminology, let NORM.INV be the quantile function (inverse CDF) of the standard normal distribution, so that NORM.INV(0.5) = 0 and NORM.INV(0.025) = -1.96.

Then if you cut off the lower fraction q of the tail and the upper fraction r of the tail, the pth quantile will be

NORM.INV(q+p*(1-r-q))

The median will be NORM.INV(q+(1-r-q)/2)
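
The same quantile formula can be sketched in Python, where statistics.NormalDist().inv_cdf plays the role of NORM.INV. The function name truncated_quantile and the optional mean and sd arguments are illustrative additions, not part of the formula as posted; q, r and p are fractions between 0 and 1.

```python
from statistics import NormalDist


def truncated_quantile(p, q=0.0, r=0.0, mean=0.0, sd=1.0):
    """p-th quantile of a normal after removing the lowest fraction q and the highest fraction r."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if q < 0.0 or r < 0.0 or q + r >= 1.0:
        raise ValueError("q and r must be non-negative and leave some of the distribution")
    # A point at fraction p of what remains sits at fraction q + p*(1 - r - q) of the original.
    return NormalDist(mean, sd).inv_cdf(q + p * (1.0 - r - q))


if __name__ == "__main__":
    # Median after cutting the bottom 25% of a mean-100, s.d.-15 bell curve.
    print(truncated_quantile(0.5, q=0.25, mean=100.0, sd=15.0))
    # Median of the top 10% only, in standard deviations above the original mean.
    print(truncated_quantile(0.5, q=0.90))
```

The first call should come out near 104.8, which agrees with the simulated median earlier in the thread.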

Leo_Bloom · September 30, 2016, 6:22pm

Wow, two great words at once. (I always thought kurtosis had something to do with scoliosis/orthopedics. Maybe it does.) And now I see it can be an adjective with lepto and platy! (An easy descriptive pdf: http://www.uky.edu/Centers/HIV/cjt765/9.Skewness%20and%20Kurtosis.pdf)

Heteroskeda…I have to work on.

Wesley_Clark · October 1, 2016, 1:10pm

Yeah, something like that.

I know the new distribution will not be a bell curve. But what happens to the average and median values if you cut off the bottom 20 or 30%? How much does it improve?

How much does it go down if you cut off the top 5%?

etc.

Wesley_Clark · October 1, 2016, 1:35pm

Thanks. Math was never something I was good at but I found this site.

http://www.ntrand.com/truncated-normal-distribution/

I believe you need the bottom end of the distribution, top end, mode and standard deviation to calculate the new mean of a truncated distribution.

However where it says to calculate the mean, I’m not sure how to get the probability density function and cumulative distribution function.

A can be obtained from a, m & sigma. B from b, m & sigma (I assume that is a typo where the calculation for A and B are identical. (a-m)/sigma for both. If it isn’t a typo wouldn’t the answer for all the equations end up as 0?) But getting delta or the mean requires the two functions above, I’m not sure how they are obtained.

Buck_Godot · October 3, 2016, 5:40pm

The probability density function it is referring to is the probability density function of the standard normal distribution.

phi(x) = 1/sqrt(2*pi) * exp(-x^2/2)

with capital PHI being the integral of that, which doesn’t have a nice closed form, but for which there are look up tables, or function calls in most programming languages. In Excel, both phi and PHI can be calculated with the function NORM.DIST, with mean 0, standard deviation 1, and the flag for cumulative set to 0 and 1 respectively.

So in EXCEL notation.

DELTA would be PHI(B)-PHI(A) = NORM.DIST(B,0,1,1)-NORM.DIST(A,0,1,1)

and

mean=m+(NORM.DIST(B,0,1,0)-NORM.DIST(A,0,1,0))*sigma/DELTA

The issue with A=B is not a typo. According to the notation on the site you linked, the distribution is cut below A and above B, so if A=B, you have cut the entire distribution out, and so there is no distribution left, giving you a mean of (0/0) = undefined.
Based on your OP it looked like you were more interested in cutting off at particular percentiles (cutting off the top and bottom 25%) rather than particular values (cutting off values below 2 and above 3). If you want to cut off the lower fraction q and the upper fraction p (e.g. 0.25 for 25%), then you can simplify things by replacing the equations for A and B and DELTA with the following EXCEL equivalents

A=NORM.INV(q,0,1) and B=NORM.INV(1-p,0,1) DELTA=1-p-q

Hope this helps.

Buck_Godot · October 3, 2016, 5:50pm

ETA: typo and formatting issues in last post

should be

mean=m+(NORM.DIST(A,0,1,0)-NORM.DIST(B,0,1,0))*sigma/DELTA

and the last line should read
A=NORM.INV(q,0,1)
B=NORM.INV(1-p,0,1)
DELTA=1-p-q
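
For those not using Excel, the corrected mean formula can be sketched in Python with statistics.NormalDist, whose pdf and inv_cdf methods stand in for NORM.DIST(x,0,1,0) and NORM.INV(x,0,1). The function name truncated_mean is made up for illustration. One assumption goes beyond the formulas above: when q or p is 0 the cut point sits at infinity, so the sketch treats the density there as 0 instead of calling inv_cdf, which rejects 0 and 1.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: mean 0, standard deviation 1


def truncated_mean(q=0.0, p=0.0, m=0.0, sigma=1.0):
    """Mean of a normal(m, sigma) after cutting the lowest fraction q and the highest fraction p."""
    if q < 0.0 or p < 0.0 or q + p >= 1.0:
        raise ValueError("q and p must be non-negative and leave some of the distribution")
    # Density at each cut point A and B; a cut of 0 means no cut, so the density there is 0.
    pdf_a = _STD.pdf(_STD.inv_cdf(q)) if q > 0.0 else 0.0
    pdf_b = _STD.pdf(_STD.inv_cdf(1.0 - p)) if p > 0.0 else 0.0
    delta = 1.0 - p - q  # fraction of the original distribution that remains
    return m + (pdf_a - pdf_b) * sigma / delta


if __name__ == "__main__":
    print(truncated_mean(q=0.25, m=100.0, sigma=15.0))  # bottom 25% removed
    print(truncated_mean(p=0.05, m=100.0, sigma=15.0))  # top 5% removed
    print(truncated_mean(q=0.90, m=100.0, sigma=15.0))  # top 10% only
```

For a mean of 100 and s.d. of 15, cutting the bottom 25% should give a mean of roughly 106.4, in line with the simulation earlier in the thread.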

Buck_Godot · October 3, 2016, 5:52pm

nm double post
