Chapter 12 Violations of independence, homogeneity, or Normality

12.1 Lack of independence

12.1.1 A paired t-test is a special case of a linear model for correlated data with two groups

12.1.1.1 Example 1 (fig1b)

Lynes, M.D., Leiria, L.O., Lundh, M., Bartelt, A., Shamsi, F., Huang, T.L., Takahashi, H., Hirshman, M.F., Schlein, C., Lee, A. and Baer, L.A., 2017. The cold-induced lipokine 12, 13-diHOME promotes fatty acid transport into brown adipose tissue. Nature medicine, 23(5), pp.631-637.

Public source

Data source

The data are from the experiment for Figure 1b of the 12,13-diHOME article from the previous chapter.

Response variable\(\texttt{diHOME}\): concentration (pmol per ml) of the putative lipokine 12,13 di-HOME. This is a continuous variable.

Factor variable\(\texttt{treatment}\), with levels: “saline” and “cold.” Coded as a factor.

Blocking variableid: identification code of the individual human.

Design – Blocked. Plasma concentrations of 12,13-diHOME were measured two times in each individual, after injection with saline and after a 1 hour cold exposure.

The consequence of the two measures in each human is correlated error. In blocked designs like this, the correlated error arises because we expect measures within a block (the individual for the fig1b experiment) to be more similar to each other than to measures from other blocks. A blocked design is a powerful experimental method to reduce uncertainty in estimates of differences among means – and, consequently, the power of a statistical test. The paired t-test exploits this design while the classic Student’s t-test ignores it.

Mulitple response measures per individual violates the “independent” sampling assumption for inference. If there is a lack of independence between the two groups, a “which test?” strategy points to a paired t-test in place of Student’s t-test.

Here, I show empirically that a paired t-test is a special case of a kind of linear model for correlated data called a linear mixed model. The advantage of understanding this is, the linear model is infinitely expandable while a paired t-test is limited to the comparison of two means. If you have more than two groups or a factorial design or a covariate, there is no elegant way to expand a paired t-test. Correlated error and linear mixed models are discussed more in the chapter Models with random factors – Blocking and pseudoreplication.

A good way to think about the model generating the data is

\[\begin{equation} diHome = (\beta_0 + \beta_{0_i}) + \beta_1 treatment\_cold + \varepsilon \end{equation}\]

  1. \(\beta_0\) is the expected value of diHome in humans given a saline treatment.
  2. \(\beta_{0_i}\) is the expected difference between the expected value of diHome in human \(i\) given a saline treatment and the expected value of diHome in humans given a saline treatment. \(\beta_{0_i}\) is a kind of random effect
  3. \(\beta_0 + \beta_{0_i}\) is the expected value of diHome in human \(i\) given a saline treatment. It is the random intercept for human \(i\).

Importantly, if the random effect (\(\beta_{0_i}\)) isn’t modeled, then variation in this component is picked up by the model error \(e\). Consequently, the error component is correlated because the subsets of \(e\) from the same individual are expected to have more similar values to each other than to that from other individuals. Correlated error is discussed more in the chapter Models with random factors – Blocking and pseudoreplication. Importantly, if the random effect isn’t modeled, the increased variance of \(e\) decreases precision of the estimate. Consequently, CIs are wider and significance tests have less power.

12.1.1.2 Fit the model

# fit the model
fig1b_m1 <- lmer(diHOME ~ treatment + (1|id), data = fig1b)
  1. The function lmer() from the lme4 package allows the addition of random factors to the model formula. Otherwise, the function works like the lm function.
  2. The added random factor is \(\texttt{id}\), which contains the id of the human. \(\texttt{id}\) didn’t appear from nowhere, it is a column in the data.table fig1b.
  3. Random factors are added embedded within parentheses. A random intercept is added using the formula (1|factor) where \(\texttt{factor}\) is the names of the column of the data containing the random factor.
  4. \(\texttt{id}\) is a random factor. \(\texttt{treatment}\) is a fixed factor. Models with both fixed and random factors go by many names. In this text, I use “linear mixed model.”

12.1.1.3 Inference

Inference using the linear model with added random factor:

# estimated marginal means table
fig1b_m1_emm <- emmeans(fig1b_m1, specs = "treatment")

# contrasts table
fig1b_m1_pairs <- contrast(fig1b_m1_emm,
                     method = "revpairwise") %>%
  summary(infer = TRUE)

fig1b_m1_pairs %>%
  kable(digits = c(1,1,2,1,1,1,5,5)) %>%
  kable_styling()
contrast estimate SE df lower.CL upper.CL t.ratio p.value
cold - saline 0.2 0.05 8 0.1 0.4 4.27223 0.00272

Inference using paired t-test:

fig1b_ttest <- t.test(
  x = fig1b[treatment == "cold", diHOME],
  y = fig1b[treatment == "saline", diHOME],
  paired = TRUE
)

fig1b_ttest
## 
##  Paired t-test
## 
## data:  fig1b[treatment == "cold", diHOME] and fig1b[treatment == "saline", diHOME]
## t = 4.2722, df = 8, p-value = 0.002716
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1078778 0.3609173
## sample estimates:
## mean of the differences 
##               0.2343975

Notes

  1. The t and p values from the linear model and from the t-tests are the same because the paired t-test is a special case of the linear model.

12.1.1.4 A blocked design (paired t-test) is more efficient

12.1.2 Inferences from the linear mixed model and paired t-tests are not the same when there are more than two groups

12.1.2.1 Example 2 (diHOME fig2a) – more than two groups

The structure of the Figure 2A experiment from the chapter Linear models with a single, categorical X is ambiguous. It is not clear from the archived data if the measures of 12,13 diHome were on separate mice within each of the three treatments (“Control,” “1 hour cold,” “30 min NE”) or if the same mice were used in each set. If each mouse was measured three times, I assume the researchers would have stated this, which they didn’t. But, to outline modeling correlated data when we have more than two groups, let’s say each mouse in the Figure 2A experiment was subjected to all three treatments. We have three measures per mouse and the response measure violates the independence assumption.

The linear model (in regression notation) is

\[\begin{equation} diHome = (\beta_0 + \beta_{0_i}) + \beta_1 treatment\_cold\ + \beta_2 treatment\_NE\ + \varepsilon \end{equation}\]

12.1.2.2 Fit the model

# fit the model
fig2a_m2 <- lmer(diHOME ~ treatment + (1|id), data = fig2a)

Notes

  1. If a treatment factor has more than two groups, nothing special has to be done relative to a model with a treatment factor with only two groups.

12.1.2.3 Inference

Inference using the linear model with added random factor:

# estimated marginal means table
fig2a_m2_emm <- emmeans(fig2a_m2, specs = "treatment")

# contrasts table
fig2a_m2_planned <- contrast(fig2a_m2_emm,
                     method = "trt.vs.ctrl",
                     adjust = "none") %>%
  summary(infer = TRUE)

fig2a_m2_planned %>%
  kable(digits = c(1,1,2,1,1,1,5,5)) %>%
  kable_styling()
contrast estimate SE df lower.CL upper.CL t.ratio p.value
1 hour cold - Control 8.1 3.88 9.4 -0.6 16.8 2.08124 0.06589
30 min NE - Control 14.8 3.62 9.0 6.6 23.0 4.08718 0.00271

Inference using two paired t-tests:

fig2a_ttest_cold <- t.test(
  x = fig2a[treatment == "1 hour cold", diHOME],
  y = fig2a[treatment == "Control", diHOME],
  paired = TRUE
)
fig2a_ttest_cold
## 
##  Paired t-test
## 
## data:  fig2a[treatment == "1 hour cold", diHOME] and fig2a[treatment == "Control", diHOME]
## t = 2.1973, df = 4, p-value = 0.09293
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.925189 16.533005
## sample estimates:
## mean of the differences 
##                7.303908
fig2a_ttest_ne <- t.test(
  x = fig2a[treatment == "30 min NE", diHOME],
  y = fig2a[treatment == "Control", diHOME],
  paired = TRUE
)
fig2a_ttest_ne
## 
##  Paired t-test
## 
## data:  fig2a[treatment == "30 min NE", diHOME] and fig2a[treatment == "Control", diHOME]
## t = 3.7564, df = 5, p-value = 0.01321
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   4.670273 24.918436
## sample estimates:
## mean of the differences 
##                14.79435

Notes

  1. important – the t-test for the comparison of “1 hour cold” and “Control” excluded mouse 6 because of a missing “1 hour cold” measure. The linear mixed model was fit to all data, including the “Control” and “30 min NE” measures. The consequence is higher power/more precision in the linear mixed model.
  2. The paired t-test t and p values are not equal to those from the linear mixed model. This is partly because of the missing data (note 1). Even if there were no missing data, these wouldn’t be equal because the linear mixed model uses an estimate of \(\sigma\) computed from the single model with all three groups to compute the standard errors. See the comparison of t-tests and the contrast table from a linear model in Section ??.
  3. The results of the separate, paired t-tests and the linear mixed model differ in a way that might affect our inference about the system. Which is correct? Neither – they simply make different assumptions about the data generating model.
  • The linear model strategy has more power and precision but this advantage is small. The best reason to use linear models instead of separate t-tests is learning how to use linear models, and their extensions, gives you phenomenal cosmic power.
  • Do not compute both separate paired t-tests and the linear models and then convince yourself that that the assumption of the method with the p-value that matches your hypothesis is correct. See the p-hacking discussion above.

12.2 Heterogeneity of variances

Heterogeneity of variance among treatment groups is a problem for inference, especially if the sample size is unequal among groups (statisticians tend to agree that heterogeneity is much more problematic than a non-normal response).

12.2.0.1 A Welch t-test is a special case of a linear model for heterogeneity

A “which test?” strategy points to a Welch’s t-test in place of Student’s t-test if there is heterogeneity of variances between treatment groups. A Welch t-test is infrequent in the experimental biology literature, perhaps because

  1. it is poorly known and it doesn’t occur to researchers to use a test that models heterogeneity of variances.
  2. heterogeneity often arises in right-skewed data, which is often analyzed with a non-parametric test like the Mann-Whitney U test.

The Welch t-test is a special case of a linear model that explicitly models the within-group variance using generalized least squares (GLS). The 95% CI of a mean differences and p-values from the fit gls linear model and from Welch’s t-test are the same. Advantages of using a linear modeling strategy is that a researcher uses the model to estimate effects (difference in means) and measures of uncertainty in the effects (standard errors or confidence intervals of the difference). Advantages of specifically using a GLS linear model is that it is easily expanded to analyze more complex designs including 1) more than one factor, 2) added covariates, 3) correlated residuals due to non-independence.

Some statisticians argue that researchers should always use a Welch t-test instead of Student’s t-test. Given this logic, researchers should consider using GLS linear models for more complex experimental designs (added covariates, factorial) in place of classical ANCOVA and two-way ANOVA.

Modeling variance heterogeneity is the focus of Chapter 18 so the account here is brief. Heterogeneity can be modeled using a generalized least squares linear model with the gls function. The weights argument is used to model the variances using each group’s sample variance. In this example, I use the data from the Figure 1b experiment, which can be compared to the analysis of the same data in Example 2 above.

12.2.0.2 Fit the model

# gls fails with missing data
subdata <- fig2a[is.na(diHOME) == FALSE,] # omit rows with missing data
fig2a_m3 <- gls(diHOME ~ treatment,
                data = subdata,
                weights = varIdent(form = ~ 1 | treatment))

The model fig2a_m3 uses variance computed in each group separately as the estimate of \(\sigma\) for that group. The coefficient table of the GLS model is

12.2.0.3 Inference from the linear model

fig2a_m3_coef <- cbind(coef(summary(fig2a_m3)),
                       confint(fig2a_m3))

fig2a_m3_coef %>%
  kable(digits = c(4,4,4,6,4,4)) %>%
  kable_styling()
Value Std.Error t-value p-value 2.5 % 97.5 %
(Intercept) 12.0231 1.2001 10.0181 0.000000 9.6708 14.3753
treatment1 hour cold 7.1404 3.1635 2.2571 0.040505 0.9401 13.3407
treatment30 min NE 14.7944 4.5688 3.2382 0.005951 5.8398 23.7490

Notes

  1. Important for reporting CIs and p-values. Unlike the linear model modeling homogenous variance, the CIs and p-values for the coefficients of \(\texttt{treatment1 hour cold}\) and \(\texttt{treatment30 min NE}\) are not the same as the p-values of these equivalent contrasts in the contrasts table (see below). The reason is, the computation of the CI and p-values in the two tables use two different degrees of freedom. Report the CI and p-values from the contrast table using the Satterthwaite df.

The modeled means and contrasts are computed as above for the lm object

fig2a_m3_emm <- emmeans(fig2a_m3, specs="treatment")
fig2a_m3_emm
##  treatment   emmean   SE df lower.CL upper.CL
##  Control       12.0 1.20  5     8.94     15.1
##  1 hour cold   19.2 2.93  4    11.04     27.3
##  30 min NE     26.8 4.41  5    15.49     38.1
## 
## Degrees-of-freedom method: satterthwaite 
## Confidence level used: 0.95

Notes

  1. The SE of the means in this table are modeled SEs but are equal to the sample SE of the means, because this was specified in the GLS model.
fig2a_m3_pairs <-  contrast(fig2a_m3_emm,
                            method = "revpairwise",
                            adjust = "none") %>%
  summary(infer = TRUE)

fig2a_m3_pairs %>%
  kable(digits = c(1,4,4,1,4,4,4,5)) %>%
  kable_styling()
contrast estimate SE df lower.CL upper.CL t.ratio p.value
1 hour cold - Control 7.1404 3.1635 5.3 -0.8394 15.1202 2.2571 0.07026
30 min NE - Control 14.7944 4.5688 5.7 3.4896 26.0991 3.2382 0.01889
30 min NE - 1 hour cold 7.6540 5.2915 8.4 -4.4595 19.7675 1.4465 0.18452

Notes

  1. Compare the statistics for “1 hour cold - Control” and “30 min NE - Control” to the coefficients for \(\texttt{treatment1 hour cold}\) and \(\texttt{treatment30 min NE}\) in the coefficient table. The estimates, SE, and t are the same but the CIs and p values differ. The contrast function is using a different method (“satterthwaite”) for computing the degrees of freedom and this results in a different value of the t-distribution tail area used to compute the CI and p value.

12.2.0.4 Inference using Welch t-test

test1 <- t.test(fig2a[treatment == "1 hour cold", diHOME],
       fig2a[treatment == "Control", diHOME],
       var.equal = FALSE)

test2 <- t.test(fig2a[treatment == "30 min NE", diHOME],
       fig2a[treatment == "Control", diHOME],
       var.equal = FALSE)

test3 <- t.test(fig2a[treatment == "30 min NE", diHOME],
       fig2a[treatment == "1 hour cold", diHOME],
       var.equal = FALSE)

Notes

  1. the default t.test is the Welch t-test. However, I’ve included var.equal = FALSE so the method is transparent.

12.2.0.5 Compare inference from the linear model and Welch t-tests

Compare the contrast p values to the three Welch t-tests of all pairs of treatment levels in the fig2a experiment.

contrast t (gls lm) p (gls lm) t (welch t) p (welch t)
1 hour cold - Control 2.257139 0.0702626 2.257139 0.0702622
30 min NE - 1 hour cold 1.446454 0.1845154 1.446454 0.1845155
30 min NE - Control 3.238157 0.0188908 3.238158 0.0188904

The t and p-values computed from the GLS linear model and from the three, pairwise Welch t-tests are the same (to about the 6th decimal place). They are the same because each is estimating \(\sigma^2\) separately for each group and not as the pooled (among two groups for t-test or three groups for the linear model) estimate and because they use the same degrees of freedom to compute the p-value.

Let’s summarize these comparisons

  1. Inference from a linear model using homogenous variance (the lm function) and from a Student’s t-test are the same if there are only two levels in the treatment variable.
  2. Inference from a linear model using homogenous variance (the lm function) and from the series of pairwise, Student’s t-tests differ when there are more than two levels in the treatment variable.
  3. Inference from a GLS linear model using heterogenous variance (the gls function) and from a Welch t-test are the same regardless of the number of levels in the treatment variable.

Even though the linear model that models heterogeneity and the Welch t-test produce the same results, researchers should use the linear model because

  1. A linear modeling strategy encourages researchers to think about the effect and uncertainty in the effect and not just a p-value.
  2. The linear model is nearly infinitely flexible and expandible while the t-test has extremely limited flexibility (The Welch t-test is one way to expand the classical, Student’s t-test).

12.2.1 When groups of the focal test have >> variance

12.3 The conditional response isn’t Normal

It is common to hear or read from a statistitican that of all the issues with inference from a classical linear model (including t-tests and ANOVA), non-normal “data” is the least worrisome – we should be much more concerned about lack of independence and heterogeneity.

  1. The theoretical basis for this sentiment is the central limit theorem. In short, if we repeatedly 1) sample a non-normal distribution and 2) compute the sample mean, the distribution of these resampled means is approximately normal (the distribution converges to normal as the sample size increases). Because inferential statistics using a classical linear model assume a normal distribution of the means, inference from classical linear models are increasingly robust to non-normal distributions as the sample size increases. The main issue for applied statistics in experimental biology is the small sample sizes.
  2. Empirical results from simulations support the theory. Actual type I error using classical linear models with non-normal data tends to be close to the nominal (say, 0.05) level.
  3. That said, generalized linear models that explicitly model the distribution have more power than classical linear models and this is a pretty good reason to use GLMs.
  4. Confidence intervals computed using the normal distribution on non-normal data can be awkward, for example, the CIs might include impossible values such as negative counts or percent of cells expressing a certain marker above 100 %.

The normality assumption of the classical linear model does not apply to the distribution of the response (Y) variable and certainly not to the distribution of the treatment (X) variable but to the residuals of the fit model (actually, it applies to the distribution of the means, but if the sampling distribution is normal then the means are normal). An equivalent way to think about this is that the normality assumption applies to the distribution of the response if all values of \(X\) in the sample are the same – this is known as the conditional response. Contrary to common practice by experimental biologists, and the advice of some textbooks, there is no good reason to use the p-value from a test of normality as a decision tool to use either a t-test or a non-parametric test like the Mann-Whitney-Wilcoxon. A test of normality for small samples will frequently result in \(p > 0.05\) (“normal”), even for fake data sampled from a nonnormal distribution, while a test with a large sample size will frequently result in \(p < 0.05\) (“not-normal”) even if a histogram of the residuals appears normal.

A Normal Q-Q plot (Section 11.3.2) is a useful tool for the decision to use some alternative to inference using a Normal assumption. If a Q-Q plot indicates that the residuals are far from expected for a normal distribution, then the researcher has several alternatives, the best choice of which partially depends on the goals of the analysis. Some alternative to a linear model assuming normal distribution include:

  1. Count data – count data tend to be right skewed and groups with higher mean counts tend to have larger variance.
  • Generalized linear models using either a negative binomial or quasi-poisson distribution family are a good alternative to consider.
  • Bootstrap confidence intervals and permutation p-values are resampling techniques. Bootstrap confidence intervals are a good alternative if the sample size is much larger (say, > 40) than typical in experimental biology. Permutation p-values are a good alternative regardless of sample size.
  • A GLS linear model to account for any heterogeneity in variance that typically accompanies non-normal data. This doesn’t address the non-normal distribution but, again, heterogeneity is typically a much larger problem than non-normality.
  • Log transformation of a count response is controversial at best. log transformations can be pretty good at making the response look more like it was sampled from a normal distribution (and can make variances more similar). But…If the counts include zero, researchers have to add a kludge factor (typically equal to 1) to all counts prior to transformation. The value of the kludge factor is arbitrary and matters (adding .1 gives different results than adding 1). A log transformation (or any transformation) raises interpretation problems. For the log transformation, the effect is the difference between the means of the log-transformed counts. I don’t know what the magnitude of an effect on a log scale means biologically. If we backtransform this, the effect is the ratio of the geometric means of the two groups. A ratio effect is easy enough to interpret but do we really want to model the geometric and not the arithmetic means? Finally, log transformation does not specifically address either the shape of the distribution or the heterogeneity that often comes with the non-normal shape. A generalized linear model can specifically model both the shape and the variance.
  • Classic non-parametric tests such as Mann-Whitney-Wilcoxon were invented when computers were not fast enough to perform permutation tests and before the development of Generalized Linear Models (and other modern methods). Since at least 1990, there are better alternatives to non-parametric tests.
  1. Fraction (proportion) data – For example, the number of cells expressing a certain marker relative to all counted cells in the sample. Fraction data have hard bounds at 0 and 1, or at 0 and 100 if converted to a percent. Fraction data tend to be right skewed if the mean is closer to zero and left skewed if the mean is closer to the upper bound.
  • Generalized linear model using a binomial distribution family with a logit link (“logistic regression”) is a good alternative to consider. This model is used for Bernouili (success/fail) responses (for example the subject “lived” or “died”), where success is assigned the value 1 and fail is assigned the value 0. Using this model with proportion data is equivalent to assigning a 0 (“does not express marker”) or 1 (“expresses marker”) to all cells in the count.
  • Bootstrap confidence intervals and permutation p-values are resampling techniques. Bootstrap confidence intervals are a good alternative if the sample size is much larger (say, > 40) than typical in experimental biology. Permutation p-values are a good alternative regardless of sample size.
  • A GLS linear model to account for any heterogeneity in variance that typically accompanies non-normal data. This doesn’t address the non-normal distribution but, again, heterogeneity is typically a much larger problem than non-normality.
  • arcsin transformation. See [The arcsine is asinine: the analysis of proportions in ecology] (https://esajournals.onlinelibrary.wiley.com/doi/full/10.1890/10-0340.1)
  • Classic non-parametric tests such as Mann-Whitney-Wilcoxon were invented when computers were not fast enough to perform permutation tests and before the development of Generalized Linear Models (and other modern methods). Since at least 1990, there are better alternatives to non-parametric tests.

Generalized linear models that explicitly model the distribution of the response is covered in more detail in the chapter Linear models for count data – Generalized Linear Models I.

12.3.1 Example 1 (fig6f) – Linear models for non-normal count data

Source article: Exercise reduces inflammatory cell production and cardiovascular inflammation via instruction of hematopoietic progenitor cells

Data source

Public source

12.3.1.1 Fit a Linear Model – Inference assuming Normal conditional response

fit the model

m1 <- lm(neutrophil_count ~ treatment, data = fig6f)

check the model

ggcheck_the_model(m1)

The left plot shows classic right skew. The right plot shows evidence for positive relationship between mean and variance. This suggest we fit a negative binomial or quasi-poisson model.

inference

m1_emm <- emmeans(m1, specs = "treatment")
m1_pairs <- contrast(m1_emm,
                     method = "trt.vs.ctrl") %>%
  summary(infer = TRUE)

12.3.1.2 Fit a Generalized Linear Model – inference assuming negative binomial conditional response

Generalized linear models for count data are covered more thoroughly in the chapter Linear models for count data – Generalized Linear Models I.

fit the model

m2 <- glmmTMB(neutrophil_count ~ treatment,
              data = fig6f,
              family = nbinom2(link = "log"))

Notes

  1. The function glmmTMB allows fitting from multiple distribution families and works like the lm function with a few tweaks, including an argument for the distribution family.

inference

m2_coef <- coef(summary(m2))$cond

m2_coef %>%
  kable(digits = c(2,3,1,5)) %>%
  kable_styling()
Estimate Std. Error z value Pr(>|z|)
(Intercept) 15.11 0.085 177.9 0.00000
treatmentExercise -0.53 0.137 -3.8 0.00013

Notes

  1. The coef function for a glmmTMB object is a list with several elements. the element \(\texttt{\$cond}\)" returns the coefficient table.
  2. \(b_0\) (\(\texttt{(intercept)}\)) is the natural log of the mean of the reference (“sedentary”) group.
  3. \(b_1\) (\(\texttt{(intercept)}\)) is the difference of the natural log of the mean of exercise group mean and the natural log of the mean of the sedentary group mean.
  4. The coefficient values are on a link scale, which is the scale of the transformation of the group means in the GLM fit. The link transformation for this fit was log (the natural log transformation). In general, values on the response scale are more interpretable. \(exp(b_0)\) is the modeled mean of the reference group on the response scale (because the model had no covariates, it is the sample mean of the reference group). \(exp(b_1)\) is the ratio of the exercise to sedentary means. Remember: \(exp(log(a) - log(b)) = a/b\)
m2_emm <- emmeans(m2,
                  specs = "treatment",
                  type = "response")
m2_pairs <- contrast(m2_emm,
                     method = "trt.vs.ctrl",
                     type = "response") %>%
  summary(infer = TRUE)

Notes

  1. The argument type = "response" tells emmeans and contrast to return the estimates on the response (not link) scale. See the notes for the coefficient table above for a brief explanation of the link and response scale. For more detailed explanation, see the Chapter Linear models for count data – Generalized Linear Models I

12.3.1.3 Compare inference from the LM and GLM models

gg1 <- ggplot_the_model(
  m1,
  m1_emm,
  m1_pairs,
  legend_position = "none",
  y_label = "Neutrophil Count (/mL)",
  effect_label = "Effect (/mL)",
  palette = pal_okabe_ito_blue,
  rel_heights = c(0.75,1)
)

gg2 <- ggplot_the_model(
  m2,
  m2_emm,
  m2_pairs,
  legend_position = "none",
  y_label = "Neutrophil Count (/mL)",
  effect_label = "Relative Effect",
  palette = pal_okabe_ito_blue,
  rel_heights = c(0.75,1),
  effect_x_lim = c(0.35, 1.25)
)

plot_grid(gg1, gg2, nrow=2, labels = "AUTO")

Notes

  1. Generalized linear models are covered more thoroughly in Chapter 17.
  2. The effect estimated from the GLM is not a difference but the ratio of the exercise group mean to the sedentary group mean. Both differences and ratios indicate how much bigger (or smaller) the mean of one group is compared to another but in a linear model “how much bigger” is a difference while in the glm here “how much bigger” is a multiple (or “times”) – the mean of the exercise group is .59X the mean of the sedentary group.
  3. The CIs for the means and for the effect are asymmetric in the GLM. The CIs from a

12.3.2 My data aren’t normal, what is the best practice?

  1. The major advantage of using a GLM for count data instead of the classical linear model is the increased power of a significance test and increased precision of estimates. And, CIs will reflect the asymmetry in the uncertainty, unlike the CIs of a classical linear model. However, researchers should gaurd against the slight increase in Type I error of a GLM.
(A) histogram of distribution used for simulation of non-normal count response (sampled from negative binomial distribution with mu and theta equal to the observed values in Fig. 6f). (B) An example of a sampled fake data set.

Figure 12.1: (A) histogram of distribution used for simulation of non-normal count response (sampled from negative binomial distribution with mu and theta equal to the observed values in Fig. 6f). (B) An example of a sampled fake data set.

Table 12.1: Table of Type I error and Power computed from simulations of non-normal count data. The model columns are linear model, generalized least squares, GLM negative binomial, GLM quasipoisson, permutation, and Mann-Whitney-Wilcoxon. The values in the model columns are the frequency of simulated experiments with p < 0.05. Sample size is 10 when equal and 12, 8 when not equal. The effect ratio is 0.7 in the simulations of power.
sim normal type n theta lm gls glm_nb glm_qp lmp mww
1 TRUE type 1 10 0.05 0.050 0.076 0.052 0.053 0.046
2 FALSE type 1 10 6.6 0.04 0.041 0.075 0.044 0.046 0.039
3 FALSE type 1 10 1 0.04 0.034 0.075 0.046 0.046 0.041
4 FALSE type 1 12, 8 6.6 0.05 0.049 0.080 0.048 0.050 0.046
5 TRUE power 10 0.57 0.567 0.653 0.572 0.571 0.528
6 FALSE power 10 6.6 0.49 0.476 0.590 0.494 0.499 0.440
7 FALSE power 10 1 0.10 0.082 0.171 0.113 0.114 0.088
8 FALSE power 12, 8 6.6 0.44 0.501 0.578 0.452 0.444 0.438
9 FALSE power 12, 8 1, 6.6 0.03 0.082 0.144 0.054 0.041 0.045
10 FALSE type 1 12, 8 6.6, 1 0.13 0.103 0.139 0.116 0.132 0.144

Table 12.1 summarizes the results of a simulation to compare the performance of alternatives to a t-test. The simulated data were modeled to look like the Figure 6f (Figure 12.1). The methods are the classical linear model (lm) (equivalent to the t-test), GLS linear model (gls), negative binomial GLM (glm-nb), quasi-poisson GLM (glm-qp), permutation test (lmp), and Mann-Whitney-Wilcoxon test. Performance is based on the the p-values from the test. The frequency of \(p < 0.05\) is given in the column for each method. If the simulation was run with no treatment effect, this frequency is the simulated Type I error, given the nominal error of \(\alpha = 0.05\). Values less then 0.05 are conservative and values greater than 0.05 are liberal. If the simulation was run with a treatment effect, the frequency is the simulated power of the test.

Over the limited parameter space of the non-normal distribution (the shape of the count distribution including the relationship between mean and variance, the sample size, the effect size), the negative binomial GLM has substantially higher power than the classical linear model, although this comes at some cost to inflated Type I error. Even with a Normal simulated distribution, the negative binomial GLM has higher power than the classical linear model, but again, at some cost to Type I error.

In the simulations with a negative binomial distributed response, the quasipoisson GLM has consistently higher power than the classical linear model and excellent Type I error control. The performance of the permutation test is very similar to that of the quasipoisson GLM. In the simulations with a negative binomial distributed response, the GLS linear model has about the same power as the classical linear model. Perhaps surprisingly, the Mann-Whitney-Wilcoxon fails to outperform the classical linear model and even underperforms the classical linear in some parts of the parameter space.

12.4 Hidden Code

12.4.1 Importing and wrangling the fig1b data

data_from <- "The cold-induced lipokine 12,13-diHOME promotes fatty acid transport into brown adipose tissue"
file_name <- "41591_2017_BFnm4297_MOESM1_ESM.xlsx"
file_path <- here(data_folder, data_from, file_name)

fig1b <- read_excel(file_path,
                     sheet = "Fig 1 b thur c",
                     range = "A1:C20",
                     col_names = TRUE) %>%
  data.table() %>%
  clean_names() %>%
  na.omit() # get rid of blank row

setnames(fig1b,
         old = names(fig1b),
         new = c("sample", "diHOME", "bat_activity"))

fig1b[, id := substr(sample, 1, 6)]
fig1b[, treatment := ifelse(substr(sample, 8,8) ==
                              "C", "cold", "saline")]
fig1b[, treatment := factor(treatment, levels = c("saline", "cold"))]

#View(fig1b)

12.4.2 Importing and wrangling the fig2a data

data_from <- "The cold-induced lipokine 12,13-diHOME promotes fatty acid transport into brown adipose tissue"
file_name <- "41591_2017_BFnm4297_MOESM2_ESM.xlsx"
file_path <- here(data_folder, data_from, file_name)

# assuming mice are independent and not same mouse used for all three treatment
melt_col_names <- paste("Animal", 1:6)
fig2a <- read_excel(file_path,
                     sheet = "Fig 2a",
                     range = "A3:G6",
                     col_names = TRUE) %>%
  data.table() %>%
  melt(measure.vars = melt_col_names,
       variable.name = "id",
       value.name = "diHOME") # cannot start a variable with number
setnames(fig2a, old = colnames(fig2a)[1], new = "treatment")

treatment_order <- c("Control", "1 hour cold", "30 min NE")
fig2a[, treatment := factor(treatment, treatment_order)] # order levels

#View(fig2a)

12.4.3 Importing and wrangling the fig6f data

# need data_folder from earlier chunk
data_from <- "Exercise reduces inflammatory cell production and cardiovascular inflammation via instruction of hematopoietic progenitor cells"
file_name <- "41591_2019_633_MOESM8_ESM.xlsx"
file_path <- here(data_folder, data_from, file_name)

# assuming mice are independent and not same mouse used for all three treatment
melt_col_names <- c("Sedentary", "Exercise")
fig6f <- read_excel(file_path,
                     sheet = "Figure 6f",
                     range = "A7:B29",
                     col_names = TRUE) %>%
  data.table() %>%
  melt(measure.vars = melt_col_names,
       variable.name = "treatment",
       value.name = "neutrophils") %>%
  na.omit() # danger!

treatment_levels <- melt_col_names
fig6f[, treatment := factor(treatment,
                            levels = treatment_levels)]

# neutrophils is count/10^6
fig6f[, neutrophil_count := round(neutrophils*10^6, 0)]

fig6f[1, neutrophil_count]
## [1] 4936007
#View(fig6f)