Study interpretation help (stats)

Jebbilene · March 27, 2021, 8:27am

I’m reading a research study and it’s got some whizz-bang stats in it that I don’t understand. The context is that the new playground encourages more physical activity. Levels of physical activity were checked at regular intervals and coded in three ways: firstly 1-5 based on how active the child was, secondly as sedentary or non-sedentary (0 or 1), and thirdly as moderate to vigorous physical activity (MVPA) (1) or not (0). Then there is this table I can’t make head or tail of.

[Album] imgur.com

What does it mean?

biqu · April 2, 2021, 1:53pm

Hi Jebbilene, would you mind sharing a link to the original study? Usually the authors are considerate enough to explain their methods and comment on the tabulated results.

Without the original study, I would start by imagining what the underlying data frame might look like. The wording of the research question suggests that the explanatory variable is the condition of the play area (undeveloped or newly developed), while the response variable is the child’s level of physical activity.

The N=6596 in the title could represent the number of distinct children (if it’s a matched pairs study), or the number of distinct observation times (where the same child might have activity level measured in two different rows of the data frame). The linear and logistic regressions that they report in the table are hard to square with a matched pairs study, so let’s suppose the data frame for the first regression looks like this:

Obs	Playground Condition	Physical Activity
1	old	2
2	old	3
3	old	1
…	…	…
6595	new	4
6596	new	5

Because the response variable is numerical, they can do ordinary least squares (linear) regression, coding the playground condition as old=0, new=1. The parameters of interest would be the slope of the line, the y-intercept, and the coefficient of determination (R-Sq). These numbers are basically what you find in the first column of the table.
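To make that concrete, here is a minimal sketch of such a regression in plain Python. The activity codes are invented for illustration and are not taken from the study, and `ols_fit` is a hypothetical helper name. With a 0/1 explanatory variable, the intercept comes out as the mean of the "old" group and the slope as the difference between the two group means.

```python
from statistics import mean

# Hypothetical activity codes on the 1-5 scale; NOT data from the study.
old = [2, 3, 1, 2, 3]
new = [4, 5, 3, 3, 4]

x = [0] * len(old) + [1] * len(new)  # playground condition: old=0, new=1
y = old + new                        # physical activity code


def ols_fit(x, y):
    """Return (intercept, slope, r_squared) for a one-variable least-squares fit."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


intercept, slope, r_squared = ols_fit(x, y)
print("intercept (mean of old group):", intercept)
print("slope (new mean minus old mean):", slope)
print("R-Sq:", r_squared)
```

So for this kind of model the slope is the number to look at if you want to know how much the average activity code changed after renovation.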

As for the second column in the table, now the data frame might be formatted like so:

Obs	Playground Condition	Non-Sedentary Physical Activity?
1	old	0
2	old	0
3	old	1
…	…	…
6595	new	1
6596	new	1

Because the response variable is now binary (y=0 or y=1), linear regression is not recommended. Instead they employ a logistic transformation, something like y=e^z/(1+e^z), and regress z against the explanatory variable x (playground condition). Again the parameters reported are essentially slope, intercept, and R-sq.
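A small sketch of the logistic idea, using invented proportions rather than anything from the study. When the only explanatory variable is a single 0/1 indicator, the fitted logistic model reproduces the two group proportions exactly, so the intercept and slope can be written down directly instead of being estimated iteratively. The names `logistic`, `logit`, `p_old` and `p_new` are illustrative assumptions.

```python
import math


def logistic(z):
    """Map a value z on the log-odds scale to a probability between 0 and 1."""
    return math.exp(z) / (1 + math.exp(z))


def logit(p):
    """Inverse of logistic: map a probability to its log-odds."""
    return math.log(p / (1 - p))


# Hypothetical share of observations coded non-sedentary; NOT from the study.
p_old = 0.50
p_new = 0.55

intercept = logit(p_old)             # log-odds when x = 0 (old playground)
slope = logit(p_new) - logit(p_old)  # change in log-odds when x goes 0 -> 1
odds_ratio = math.exp(slope)         # the same change expressed as a ratio of odds

print("intercept:", intercept)
print("slope:", slope)
print("odds ratio:", odds_ratio)
print("back-transformed:", logistic(intercept), logistic(intercept + slope))
```

Papers often report the exponentiated slope (the odds ratio) rather than the slope itself, which is worth keeping in mind when reading a table of logistic results.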

The third column of the table would be obtained by a similar recipe as the second, except the data frame uses Moderate to Vigorous Physical Activity as the response variable, rather than Non-Sedentary Physical Activity.

Jebbilene · April 2, 2021, 10:09pm

Hi biqu, thanks very much for the response. You are correct about the variables and that N=6596 is the number of observations, rather than the number of children. So I think your guess as to how the data frame may have looked is probably pretty accurate.

The study is here (paywalled): Childcare outdoor renovation as a built environment health promotion strategy: evaluating the preventing obesity by design intervention - PubMed. It assesses the new playgrounds for a few different things, but the thing I am interested in is how much physical activity increased by. At one point they say by 22% but I don’t know where they got that from. They seem to gloss over a few things.

“The parameters of interest would be the slope of the line, the y-intercept, and the coefficient of determination (R-Sq).”

Could you tell me why those things would be of interest, and what they might tell us? I’m way out of my depth, but I thought R-Sq was supposed to be high to indicate correlation, and it seems really low. And doesn’t negative mean inverse correlation, so the girls were actually less active in the new playgrounds?

biqu · April 2, 2021, 11:02pm

Thanks for the link, Jebbilene. After downloading the PDF I found the paragraph you’re referring to:

“The base model controlling only for gender was processed for each of the three PA outcome variables. On average, girls were less physically active than boys and less likely to be classified as nonsedentary ... Children observed after outdoor renovations were 22% more likely to be engaged in nonsedentary activity. So, independent of gender, children were more likely to be engaged in nonsedentary activity in renovated OLEs (Table 2).”

I’m guessing that by “base model” they mean a multivariate equation like Y=a+b₁x₁+b₂x₂+…+b_kx_k, where Y is the response variable (physical activity, either coded on the 1–5 scale or transformed logistically) and the x’s are the explanatory variables (including gender and the specific playground features built during the renovation). Because the 6596 observations were not all made on the same playground site, the effect on physical activity needed to be standardized. That might be why Table 2 reported two sets of numbers in each cell: unstandardized (and standardized) effects.

The 1.22 that appears in the second row seems to be what they’re interpreting when they say “22% more likely to be engaged in nonsedentary activity” after renovations. My guess is that the second row reports the effect of renovation in a logistic regression, expressed as an odds ratio (e raised to the slope b₁) rather than as the raw slope. If x₁ is the explanatory variable and it changes from 0 to 1 when a playground gets renovated, then an odds ratio of 1.22 means the odds of nonsedentary activity are multiplied by 1.22, i.e. about 22% higher (due to the way the logistic transformation is defined). A raw slope of b₁=1.22 would instead correspond to an odds ratio of e^1.22, roughly 3.4.
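One way to get a feel for the 1.22 is to treat it as an odds ratio, which is an assumption about how the table is reported and not something confirmed here from the paper. The sketch below applies that odds ratio to a few made-up baseline probabilities; `shift_probability` and the baselines are hypothetical.

```python
import math


def shift_probability(p_baseline, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the new probability."""
    odds = p_baseline / (1 - p_baseline)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


ODDS_RATIO = 1.22             # read off the table, ASSUMED to be an odds ratio
slope = math.log(ODDS_RATIO)  # the matching logistic-regression slope, about 0.199

print("implied slope:", round(slope, 3))
for p in (0.10, 0.50, 0.80):  # hypothetical baseline probabilities, not from the study
    p_new = shift_probability(p, ODDS_RATIO)
    print(f"baseline {p:.2f} -> renovated {p_new:.3f} "
          f"(relative change {100 * (p_new / p - 1):.1f}%)")
```

Under that reading, "22% more likely" describes the odds. At a baseline of 0.50 the probability itself only moves to about 0.55, roughly a 10% relative increase, so the size of the change in probability depends on where you start.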

The Y-intercept of the model is perhaps not meaningful enough to warrant an appearance in Table 2. Now that I’m reading the original study, it appears they’re using row 1 to provide another coefficient of the base model (say b₂). Then if x₂=1 represents that the observed child is female, the model predicts a lower physical activity than the case x₂=0. (About 36% lower in odds terms, if the 0.643 is an odds ratio, because 1-0.643 = 0.357.)

As for R², you’re right that the numbers appear low. With multivariate regression you can think of R² as telling you how much explanatory power can be attributed to the variables on which you regress. There appears to be a whole lot of other variation in physical activity, which cannot be accounted for by gender and playground renovation.
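A low R² does not by itself mean the effect is absent or negative. The sketch below uses invented activity codes (not from the study) in which the renovated group has a higher mean, yet almost all of the variation is between individual observations rather than between the two groups. It relies on the fact that, for a single 0/1 explanatory variable, R² equals the between-group sum of squares divided by the total sum of squares; `r_squared_two_groups` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical activity codes (1-5); NOT data from the study.
old = [1, 2, 3, 4, 5]
new = [2, 2, 3, 4, 5]


def r_squared_two_groups(group_a, group_b):
    """R-squared from regressing the outcome on a 0/1 group indicator."""
    combined = group_a + group_b
    grand = mean(combined)
    ss_total = sum((v - grand) ** 2 for v in combined)
    ss_between = (len(group_a) * (mean(group_a) - grand) ** 2
                  + len(group_b) * (mean(group_b) - grand) ** 2)
    return ss_between / ss_total


print("mean old:", mean(old))
print("mean new:", mean(new))
print("R-squared:", r_squared_two_groups(old, new))
```

Here the new group is more active on average, but R² is well under 1% because children differ from one another far more than the two conditions differ.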

Jebbilene · April 3, 2021, 10:51pm

Okay, I understand this much better now. Thank you for taking the time to explain it.
