If I have a mean, a standard dev, and an "n", can I generate a data set?

Cagey_Drifter · September 17, 2009, 12:20am

How might I generate a data set that conforms to a certain mean, standard deviation, and an n (number of items in data set)?

statsman1982 · September 17, 2009, 12:23am

I assume that you want to generate normal data, since that distribution is completely specified by the mean and sd. What software are you using? There is a data analysis add-in for Excel that can do it pretty quickly, and SAS and R (the latter is open source freeware) can do it fairly easily too.

Cagey_Drifter · September 17, 2009, 12:48am

Hm. Is using a plugin the only way?

thelurkinghorror · September 17, 2009, 12:55am

The Excel add-in is free. Depending on your version, you might need the CD handy. Go to Excel Options > Add-Ins and select “Analysis ToolPak.” In 2007 you get to Excel Options by clicking the round logo in the upper left.

Once it’s installed, use the “Random Number Generation” tool.

Andy_L · September 17, 2009, 1:01am

No. Most software packages have a builtin function for providing uniformly distributed random numbers, and there are several methods for converting uniformly distributed random numbers to normally distributed random numbers.

I’ve used inverse transform sampling (see Wikipedia), but the Box–Muller transform and the Ziggurat algorithm (both also on Wikipedia) are available too.
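
For anyone curious what such a conversion looks like, here is a minimal Python sketch of the Box–Muller transform. The function names (box_muller, normal_sample) and the seed argument are made up for illustration; only the standard library is assumed.

```python
import math
import random

def box_muller(mean, sd, rng=random):
    """Turn two uniform random draws into one normally distributed value."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    # Standard normal deviate from the Box-Muller formula.
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    # Rescale to the requested mean and standard deviation.
    return mean + sd * z

def normal_sample(n, mean, sd, seed=None):
    """Draw n values; the seed is optional and only makes runs repeatable."""
    rng = random.Random(seed)
    return [box_muller(mean, sd, rng) for _ in range(n)]
```

Calling normal_sample(5, 10, 2) gives five values drawn from a normal distribution with mean 10 and SD 2. The sample's own mean and SD will only be approximately 10 and 2.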

Cagey_Drifter · September 17, 2009, 1:06am

I guess what I’m asking is whether there’s a relatively simple way to do it without software.

Cagey_Drifter · September 17, 2009, 1:17am

(if I have a small n; n < 6)

Andy_L · September 17, 2009, 1:21am

You might be able to get a book of random number tables - the CRC mathematics manuals still have those (I think - my copy of the CRC manual dates from 1982 or so)

statsman1982 · September 17, 2009, 1:34am

Yes, you’ll need a table of random numbers before you do anything. RAND put out a famous one years ago, “A Million Random Digits with 100,000 Normal Deviates,” if I’m not mistaken. Do you want data from a normal distribution?

Cagey_Drifter · September 17, 2009, 1:50am

Thank you, but no. I’m practicing doing an ANOVA analysis from a set of textbook questions in a book about statistical analysis of experiments (not homework), and I’m just trying to figure out how they calculated sum of squares for S/A without having the individual observations. They only give the mean, SD, the n, and the a (number of test groups), but I thought you needed to have the individual observations to do it. I’m really perplexed by this.

ultrafilter · September 17, 2009, 2:01am

Then you do want normally distributed data: one of the assumptions underlying the standard ANOVA model is that your residuals are normally distributed. You can do it in Excel using the NORMINV function, where the parameters are rand(), your mean, and your standard deviation; that’s much simpler than any method not involving a computer.

Cagey_Drifter · September 17, 2009, 2:08am

How would I create the full data set though? I don’t see how to do that using NORMINV.

statsman1982 · September 17, 2009, 2:13am

I think I understand what you’re trying to do. You’re trying to recreate the data set for which you only have summary statistics? Is that correct?

In that case, you won’t be able to get what you want, for a couple of reasons. First, the summary statistics you have are just that: statistics, not parameters. The latter are what you want, but you can’t get those from just looking at the sample results. And even if you could, there is no guarantee that the data set would be exactly the same, point for point, as the one used to calculate the statistics you have. Just as the data are random (i.e., they vary), so are statistics random (i.e., they vary from sample to sample).

Cagey_Drifter · September 17, 2009, 3:19am

Yeah, that’s right. How weird. I’m unsure how they arrived at the answer in the back of the book without the observations.

j_sum1 · September 17, 2009, 4:45am

I do this all the time – generating data for teaching purposes.
In Excel the formula is =round(norminv(rand(),10,2),1)
This gives a normally distributed data value with mean of 10, SD of 2 rounded to one decimal place. Alter as required.

I have used similar formulas to create data sets from rectangular, triangular, skewed, bimodal and other shaped distributions.

One thing to note is that you are in effect sampling from the distribution. There is no guarantee that your sampled data will have exactly the mean and standard deviation you specify in your formula. But you can get close, and you can tweak your data afterwards if you really want.
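
One way to do that tweaking, sketched in Python rather than Excel: draw n values, standardize them, then rescale so the sample mean and sample SD hit the targets exactly. The name exact_sample is hypothetical, and this assumes the SD you want to match is the sample SD (n - 1 in the denominator) and that n is at least 2.

```python
import random
import statistics

def exact_sample(n, mean, sd, seed=None):
    """Return n values whose sample mean and sample SD equal the targets."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = statistics.mean(raw)
    s = statistics.stdev(raw)  # sample SD, n - 1 denominator
    # Standardize the draws, then shift and scale to the requested values.
    return [mean + sd * (x - m) / s for x in raw]
```

The result is just one of infinitely many data sets with those summary statistics, so it will not reproduce the original observations behind a published mean and SD.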

thelurkinghorror · September 17, 2009, 5:17pm

You’re right, you need individual scores to calculate S/A using either method, as it looks at how each score deviates from its own group mean. However, you only need sums or means to calculate the between-subjects SS.
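
As a rough illustration of the between-groups part, here is a small Python sketch that uses only the group means and group sizes. The function name ss_between is made up, and it assumes the usual one-way layout where the grand mean is the size-weighted mean of the group means.

```python
def ss_between(means, ns):
    """Between-groups sum of squares from group means and group sizes."""
    total_n = sum(ns)
    # Grand mean is the size-weighted average of the group means.
    grand_mean = sum(n * m for m, n in zip(means, ns)) / total_n
    # Each group contributes n * (group mean - grand mean)^2.
    return sum(n * (m - grand_mean) ** 2 for m, n in zip(means, ns))
```

For three groups of 5 with means 4, 6, and 8, the grand mean is 6 and the result is 5 * (4 + 0 + 4) = 40.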

ChordedZither · September 17, 2009, 6:29pm

Could it be as simple as this?

Let “sum” denote the sum of the x values and “sumsq” denote the sum of the squares of the x values.

You know the mean and n, and mean = sum / n. So you can solve for sum without knowing the individual x values.
You know the standard deviation (sd), and sd = sqrt((sumsq - n * mean^2) / n)
So sumsq = n * sd^2 + n * mean^2
(I’m using ^ to denote raising to the power). So you can solve for the sumsq without knowing the individual x values as well.
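
Here is that algebra as a short Python sketch, with the function names (sum_and_sumsq, ss_within) and the sample_sd flag invented for illustration. By default it follows the formula above, which divides by n. If the book reports the sample SD (dividing by n - 1), which is an assumption worth checking, pass sample_sd=True.

```python
def sum_and_sumsq(mean, sd, n, sample_sd=False):
    """Recover sum(x) and sum(x^2) from the mean, SD, and n of one group."""
    total = n * mean
    # Denominator used when the SD was computed: n, or n - 1 for a sample SD.
    denom = n - 1 if sample_sd else n
    sumsq = denom * sd ** 2 + n * mean ** 2
    return total, sumsq

def ss_within(sds, ns, sample_sd=False):
    """Within-groups SS (S/A): sum of squared deviations pooled over groups."""
    return sum((n - 1 if sample_sd else n) * sd ** 2 for sd, n in zip(sds, ns))
```

As a check, the values 2, 4, 4, 4, 5, 5, 7, 9 have mean 5 and SD 2 (dividing by n), and sum_and_sumsq(5, 2, 8) returns a sum of 40 and a sum of squares of 232, matching the raw data. The within-group SS for that one group is 8 * 2^2 = 32.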

Don_t_Panic · September 17, 2009, 6:30pm

The Amazon customer reviews are great.
