# Study interpretation help (stats)

I’m reading a research study and it’s got some whizz bang stats in it that I don’t understand. The context is a claim that the new playground encourages more physical activity. Levels of physical activity were checked at regular intervals and coded in three ways: first, on a 1-5 scale based on how active the child was; second, as sedentary or non-sedentary (0 or 1); and third, as moderate to vigorous physical activity (MVPA) (1) or not (0). Then there is this table I can’t make head or tail of.

What does it mean?

Hi Jebbilene, would you mind sharing a link to the original study? Usually the authors are considerate enough to explain their methods and comment on the tabulated results.

Without the original study, I would start by imagining what the underlying data frame might look like. The wording of the research question suggests that the explanatory variable is the condition of the play area (undeveloped or newly developed), while the response variable is the child’s level of physical activity.

The N=6596 in the title could represent the number of distinct children (if it’s a matched pairs study), or the number of distinct observation times (where the same child might have activity level measured in two different rows of the data frame). The linear and logistic regressions that they report in the table are hard to square with a matched pairs study, so let’s suppose the data frame for the first regression looks like this:

| Obs | Playground Condition | Physical Activity |
|------|------|------|
| 1 | old | 2 |
| 2 | old | 3 |
| 3 | old | 1 |
| … | … | … |
| 6595 | new | 4 |
| 6596 | new | 5 |

Because the response variable is numerical, they can do ordinary least squares (linear) regression, coding the playground condition as old=0, new=1. The parameters of interest would be the slope of the line, the y-intercept, and the coefficient of determination (R-Sq). These numbers are basically what you find in the first column of the table.
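To make those three numbers concrete, here is a minimal sketch in Python. The eight data points are made up for illustration and are not from the study; the variable names are my own. It assumes the simple old=0, new=1 coding described above, with no other predictors in the model.

```python
import numpy as np

# Hypothetical toy data, NOT from the study: activity scores (1-5)
# observed on the old (x=0) and new (x=1) playground.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([2, 3, 1, 2, 4, 5, 3, 4])

# Ordinary least squares fit of y = intercept + slope * x.
slope, intercept = np.polyfit(x, y, 1)

# With a 0/1 predictor, the intercept is the mean score on the old
# playground and the slope is the difference between the two group means.
mean_old = y[x == 0].mean()
mean_new = y[x == 1].mean()

# R-squared: share of the variance in y explained by the fitted line.
fitted = intercept + slope * x
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"intercept = {intercept:.2f} (mean on old = {mean_old:.2f})")
print(f"slope     = {slope:.2f} (new minus old = {mean_new - mean_old:.2f})")
print(f"R-squared = {r_squared:.2f}")
```

So in this setup the slope answers "how much higher is the average activity score on the new playground?", while R-squared only says how much of the child-to-child variation the playground condition accounts for. If the published model includes further covariates, the slope would be an adjusted difference rather than a raw one.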

As for the second column in the table, now the data frame might be formatted like so:

| Obs | Playground Condition | Non-Sedentary Physical Activity? |
|------|------|------|
| 1 | old | 0 |
| 2 | old | 0 |
| 3 | old | 1 |
| … | … | … |
| 6595 | new | 1 |
| 6596 | new | 1 |

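Because the response variable is now binary (y=0 or y=1), linear regression is not recommended. Instead they model the probability p that y=1 with a logistic transformation, something like p=e^z/(1+e^z), where z is a linear function of the explanatory variable x (playground condition). Again the parameters reported are essentially slope, intercept, and R-sq, although for logistic regression the R-sq is usually a "pseudo" version that is not directly comparable to the linear one.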
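Here is a rough sketch of what the logistic fit does, again in Python with made-up numbers that are not from the study. I fit the model by Newton-Raphson using plain NumPy, which is one standard way to get the maximum-likelihood estimates; I am only assuming that the authors' software did something equivalent.

```python
import numpy as np

# Hypothetical toy data, NOT from the study: 1 = non-sedentary, 0 = sedentary.
x = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y = np.array([0, 0, 0, 1, 1, 0, 1, 1, 1, 1])

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(len(x)), x])
beta = np.zeros(2)  # [intercept, slope] on the log-odds scale

# Newton-Raphson iterations for the maximum-likelihood logistic fit.
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ beta)))         # predicted probabilities
    gradient = X.T @ (y - p)                  # score vector
    hessian = X.T @ (X * (p * (1 - p))[:, None])
    beta = beta + np.linalg.solve(hessian, gradient)

intercept, slope = beta

# Convert the log-odds back to probabilities for each playground condition.
p_old = 1 / (1 + np.exp(-intercept))
p_new = 1 / (1 + np.exp(-(intercept + slope)))
odds_ratio = np.exp(slope)

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"P(non-sedentary | old) = {p_old:.2f}")
print(f"P(non-sedentary | new) = {p_new:.2f}")
print(f"odds ratio (new vs old) = {odds_ratio:.2f}")
```

The coefficients live on the log-odds scale, so a negative intercept only means the baseline probability (old playground) is below one half. The slope is the log of the odds ratio, so a positive slope means higher odds of being non-sedentary on the new playground.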

The third column of the table would be obtained by a similar recipe as the second, except the data frame uses Moderate to Vigorous Physical Activity as the response variable, rather than Non-Sedentary Physical Activity.

Hi biqu, thanks very much for the response. You are correct about the variables and that N=6596 is the number of observations, rather than the number of children. So I think your guess as to how the data frame may have looked is probably pretty accurate.

The study is here (paywalled): Childcare outdoor renovation as a built environment health promotion strategy: evaluating the preventing obesity by design intervention (PubMed). It assesses the new playgrounds for a few different things, but the thing I am interested in is how much physical activity increased by. At one point they say by 22%, but I don’t know where they got that from. They seem to gloss over a few things.

> The parameters of interest would be the slope of the line, the y-intercept, and the coefficient of determination (R-Sq).

Could you tell me why those things would be of interest, and what they might tell us? I’m way out of my depth, but I thought R-Sq was supposed to be high to indicate correlation, and it seems really low. And doesn’t negative mean inverse correlation, so the girls were actually less active in the new playgrounds?