Reply
 
Thread Tools Display Modes
  #1  
Old 04-10-2012, 02:17 AM
Dave Hartwick Dave Hartwick is offline
Guest
 
Join Date: Jul 2009
Posts: 1,902
Approximating individual surveys from aggregate results

I'm working on a project where we collect employee surveys and report on them. Depending on role and location, the surveys may have up to 15 responses. Each response must be from one to five.

Responses are then grouped. For example:

Group 1: q1, q4, q9
Group 2: q5, q6, q7
Group 3: q6 and q8
Group 4: q2 and q3
Group 5: q10 and q11

(Yes, a question can be in more than one group, as q4 is in this example.)

Straightforward so far. However, it's an inherited project and the surveys from before a certain date were not kept, nor was a count of the responses. I'd like to approximate the old surveys in order to determine a minimum number of responses for those old weeks. Usually there's only 8-15 responses in any given week, so it shouldn't require all that much crunching.

I've set up a spreadsheet, naturally, and tried running the Excel (XL07) Solver add-in over the data but no luck-- no surprise there, I've never got Solver to do much for me, although it probably works fine on simpler problems like textbook break-evens. I'm not even coming up with good search terms to look up. Any ideas?

Sample responses:

Group 1: 3.4
Group 2: 3.2
Group 3: 3.15
Group 4: 3.5
Group 5: 3.8
  #2  
Old 04-10-2012, 10:22 AM
ultrafilter ultrafilter is offline
Guest
 
Join Date: May 2001
Location: In another castle
Posts: 18,988
My gut feeling is that what you're trying to do is impossible, but I'm still not 100% clear on what exactly it is. What exactly is involved in grouping the responses?
  #3  
Old 04-10-2012, 10:35 AM
Baracus Baracus is offline
Guest
 
Join Date: Dec 2006
Location: Cary, NC
Posts: 1,530
Yeah, sorry, but it isn't terribly clear what you are trying to do. Back out the average response for each question?
  #4  
Old 04-10-2012, 01:58 PM
Dave Hartwick Dave Hartwick is offline
Guest
 
Join Date: Jul 2009
Posts: 1,902
Mainly to determine an estimated number of responses for each group in a reporting week. Secondly, I'd like to load dummy responses into the database that would calculate out to the same result as the aggregate data.
  #5  
Old 04-10-2012, 02:10 PM
Dave Hartwick Dave Hartwick is offline
Guest
 
Join Date: Jul 2009
Posts: 1,902
Quote:
Originally Posted by ultrafilter View Post
My gut feeling is that what you're trying to do is impossible, but I'm still not 100% clear on what exactly it is. What exactly is involved in grouping the responses?
Formula for grouping the responses for a team would be (sum(questions in group))/(number of surveys*number of questions in group), three being the number of questions in Group 1.

So, say there were only two employees responding that week, employees A and B. Looking only at Group 1 responses, A responds:

q1=3, q4=5, q9=4 (on a 1 to 5 scale, of course, for a total of 12)

B is a little less sanguine and responds:

q1=2, q4=2, q9=1 (totalling 5)

(5+12)/(2*3)

17/6

or 2.83.
  #6  
Old 04-10-2012, 03:20 PM
Buck Godot Buck Godot is offline
Guest
 
Join Date: Mar 2010
Location: MD outside DC
Posts: 4,397
Quote:
Originally Posted by Dave Hartwick View Post
However, it's an inherited project and the surveys from before a certain date were not kept, nor was a count of the responses. I'd like to approximate the old surveys in order to determine a minimum number of responses for those old weeks.

So it sounds like you have the averages from previous weeks, but not the actual data. What I am not clear on, is what you want to calculate from the previous weeks. I think an example as to what you have is input and what you want as output would be helpful. Also information as to how you are going to use the output might assist us in determining if there is an alternative way to get what you want.
  #7  
Old 04-10-2012, 03:54 PM
ultrafilter ultrafilter is offline
Guest
 
Join Date: May 2001
Location: In another castle
Posts: 18,988
Quote:
Originally Posted by Dave Hartwick View Post
Formula for grouping the responses for a team would be (sum(questions in group))/(number of surveys*number of questions in group), three being the number of questions in Group 1.

So, say there were only two employees responding that week, employees A and B. Looking only at Group 1 responses, A responds:

q1=3, q4=5, q9=4 (on a 1 to 5 scale, of course, for a total of 12)

B is a little less sanguine and responds:

q1=2, q4=2, q9=1 (totalling 5)

(5+12)/(2*3)

17/6

or 2.83.
Let k be the number of questions in a group, let n be the number of surveys, and let m be the average. If you round the product kmn to an integer, you're just looking for an integer solution to the equation x1 + 2x2 + 3x3 + 4x4 + 5x5 = kmn, with the constraints that each xi is positive. There are algorithms that can solve this (Google on solving Diophantine equations), but since you know that each xi has to be between 0 and kmn, you can just brute force it. You will need to do some kind of programming for this, though; Excel's solver will not be good enough.
  #8  
Old 04-10-2012, 05:15 PM
ZenBeam ZenBeam is offline
Guest
 
Join Date: Oct 1999
Location: I'm right here!
Posts: 8,858
Do you always get two digits of accuracy, or could that 2.833333 sometimes be reported as 2.8?
  #9  
Old 04-10-2012, 08:27 PM
Dave Hartwick Dave Hartwick is offline
Guest
 
Join Date: Jul 2009
Posts: 1,902
9 digits in the aggregate data, sample below:


3.476190476
3.333333333
3.428571429
3.285714286
3.571428571
  #10  
Old 04-10-2012, 08:53 PM
ultrafilter ultrafilter is offline
Guest
 
Join Date: May 2001
Location: In another castle
Posts: 18,988
Quote:
Originally Posted by Dave Hartwick View Post
9 digits in the aggregate data, sample below:


3.476190476
3.333333333
3.428571429
3.285714286
3.571428571
That's more than enough.
  #11  
Old 04-10-2012, 08:56 PM
Dave Hartwick Dave Hartwick is offline
Guest
 
Join Date: Jul 2009
Posts: 1,902
Quote:
Originally Posted by Buck Godot View Post
So it sounds like you have the averages from previous weeks, but not the actual data. What I am not clear on, is what you want to calculate from the previous weeks. I think an example as to what you have is input and what you want as output would be helpful. Also information as to how you are going to use the output might assist us in determining if there is an alternative way to get what you want.
Yes, I have the averages. The output I'm hoping for is 1) an estimate of the number of responses and 2) dummy responses I can add to the database that will get the same average (or to one decimal place-- that would be good enough).

I'll go ahead with a more detailed example of the input and output. Let's say I've got the results:

Group 1: 2.433333333
Group 2: 2.6
Group 3: 2.9
Group 4: 2.65
Group 5: 3.466666667

I know that the employees answered 13 questions, grouped as below.

Group 1: q1, q2, q3
Group 2: q4, q5, q6
Group 3: q7, q8
Group 4: q9, q10
Group 5: q11, q12, q13

What I'd like for output is:

Code:
	q1	q2	q3	q4	q5	q6	q7	q8	q9	q10	q11	q12	q13
r1	5	1	3	1	2	3	5	5	3	4	4	3	3
r2	5	3	4	2	5	4	2	5	3	2	4	2	1
r3	2	4	5	2	3	5	4	1	4	1	2	1	5
r4	1	2	1	5	2	1	1	5	2	2	1	3	1
r5	1	4	4	3	4	1	1	2	5	3	5	5	1
r6	4	3	4	2	4	4	4	2	5	5	5	5	1
r7	2	1	3	5	4	3	5	3	1	3	4	2	3
r8	1	2	3	3	1	4	5	2	1	5	1	1	5
r9	4	5	2	5	4	1	4	5	3	3	5	5	4
r10	4	5	4	2	2	5	5	5	4	5	2	2	1
You could rearrange those in any way, switching respondent 1 and 10, say, or swapping r3 and r4's answers for questions 1 and 2, no matter, so long as the averages come out the same. I'm not trying to determine the individual results, I'd like to fake up some data that will work out the same as the legacy averages, and thereby, hopefully, determine a minimum number of responses that fit the data.

Thanks all for replying. I had a look at Diophantine equations in Wikipedia but didn't leap up and cry "Eureka". Looks like I'll back burner this aspect of the project and make do with the aggregate data. Nobody even asked me to do this. I just got curious if it could be done.
  #12  
Old 04-10-2012, 09:09 PM
Dave Hartwick Dave Hartwick is offline
Guest
 
Join Date: Jul 2009
Posts: 1,902
Realized too late that I posted bad survey results, should've been something like:


Code:
	q1	q2	q3	q4	q5	q6	q7	q8	q9	q10	q11	q12	q13
r1	3	1	1	4	3	1	3	5	5	2	5	5	3
r2	1	2	1	2	1	1	2	3	3	5	5	1	4
r3	2	1	3	3	2	4	1	1	3	1	5	1	1
r4	1	3	2	3	4	1	2	2	2	3	5	4	2
r5	2	5	3	1	1	5	4	4	4	1	1	5	5
r6	5	2	2	2	1	2	3	4	4	4	5	5	5
r7	4	1	1	1	1	1	4	1	1	2	4	3	1
r8	2	3	2	3	4	3	3	3	2	1	3	4	5
r9	5	3	5	1	5	5	3	3	1	3	4	4	1
r10	1	4	2	5	3	5	3	4	5	1	2	1	5
  #13  
Old 04-11-2012, 12:31 PM
Buck Godot Buck Godot is offline
Guest
 
Join Date: Mar 2010
Location: MD outside DC
Posts: 4,397
OK I understand the question, and could see that given the integer nature of the responses, and overlaps that it could be possible to solve, I realize that I haven't had nearly enough number theory to necessarily solve it.

The only help I can give is that its clear that if you have enough decimal places and look at the fractional component of your answer it would be possible to work out for each group a factor of the denominator that was used for that group. Fo that if there are N respondents and Q questions in a group then the Average*N*Q must be an integer.

In group 1 we have 2.4333 which only equals and integer when multiplied by a multiple of 30. So N*3 is a multiple of 30, so N is a multiple of 10.

question 2: 2.6 will be an integer when multipled by 5, so 2*N must be a multiple of 5, so N must be a multiple of 5.

If you do this for all of your questions and then take the least common multiple, that should give you a good guess as to the number of people questioned.
Reply

Bookmarks

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off

Forum Jump


All times are GMT -5. The time now is 07:35 PM.

Powered by vBulletin® Version 3.8.7
Copyright ©2000 - 2018, vBulletin Solutions, Inc.

Send questions for Cecil Adams to: cecil@chicagoreader.com

Send comments about this website to: webmaster@straightdope.com

Terms of Use / Privacy Policy

Advertise on the Straight Dope!
(Your direct line to thousands of the smartest, hippest people on the planet, plus a few total dipsticks.)

Publishers - interested in subscribing to the Straight Dope?
Write to: sdsubscriptions@chicagoreader.com.

Copyright 2017 Sun-Times Media, LLC.

 
Copyright © 2017