Chapter 2 Analyzing experimental data with a linear model

fig2i_m1 <- lm(liver_tg ~ treatment, data = fig2i)
fig2i_m1_emm <- emmeans(fig2i_m1, specs = "treatment")
fig2i_m1_pairs <- contrast(fig2i_m1_emm,
                            method = "revpairwise") %>%
  summary(infer = TRUE)
fig2i_m1_emm_dt <- summary(fig2i_m1_emm) %>%
  data.table()
fig2i_m1_pairs_dt <- data.table(fig2i_m1_pairs)
fig2i_m1_pairs_dt[ , p_pretty := pvalString(p.value)]
fig2i_m1_pairs_dt[, group1 := 1]
fig2i_m1_pairs_dt[, group2 := 2]

2.1 This text is about the estimation of treatment effects and the uncertainty in our estimates using linear models. This raises the question: what is “an effect?”

This text has an unusual start – an example analysis. This example is a goal or target; it’s what you will be working towards as you learn from this text. The data for the analysis come from multiple experiments presented in Figure 2 in the article ASK1 inhibits browning of white adipose tissue in obesity. The analysis is in the last part of this chapter. The second part of this chapter is just enough biology to help you understand the biological importance of each experiment. The first part of this chapter uses one experiment from Figure 2 to outline what the statistical analysis of experimental data is all about. Much of this outline will be repeated in the “An introduction to linear models” chapter.

The analysis in part 3 of this chapter is a set of experiments exploring the consequences of adipose-tissue-specific deletion of the ASK1 signaling protein on multiple adverse effects of a high-fat diet in mice, including weight gain, glucose intolerance, and increased liver triacylglycerol levels. Think of this as a template for organizing your own R Markdown documents. This document is a re-analysis of the experiments in Figure 2 in the article ASK1 inhibits browning of white adipose tissue in obesity, including generation of the publication-ready plots. I chose the data in Figure 2 of this paper because of the diversity of analyses and plot types. My analyses and plots differ slightly from those of the researchers because I implemented better practices – the stuff of this text.

The goal of the experiments is to measure the effect of the adipose-specific ASK1 deletion. To understand what I mean by “an effect,” and to understand how we can estimate an effect by fitting a linear model to data, let’s look more closely at the analysis for Figure 2i.

For Figure 2i, the researchers want to know if “knocking out” the ASK1 gene in the adipose tissue cells lowers the liver triglyceride (TG) level in mice fed a high-fat diet. That is, is \(\overline{y}_{ASK1Δadipo} < \overline{y}_{ASK1F/F}\), where \(\overline{y}_{ASK1Δadipo}\) is the mean liver TG level of the knockout mice (in genetics nomenclature, Δ refers to a gene deletion) and \(\overline{y}_{ASK1F/F}\) is the mean liver TG level of the control mice. The difference in the means, \(\overline{y}_{ASK1Δadipo} - \overline{y}_{ASK1F/F}\), is the effect (of ASK1 deletion on liver TG levels).

The measured means in each group are computed from a random sample of mice. If we only cared about the six mice in each group in this experiment, then we would not need to fit a linear model to the data to estimate the effect; we could simply compute each group mean and subtract the control mean from the knockout mean. But we care about more than these dozen mice, because we are trying to discover something general about ASK1 regulation of TG levels in mice, generally (and even in mammals, and especially humans, generally). To make this leap of inference, we use a model to claim that each sample mean is an estimate of the respective population mean. Given this model, we can compute the standard error of each mean and the standard error of the difference in means. A standard error is a measure of the sampling variance of a statistic and, therefore, a measure of the precision of the estimate. The standard error, then, is a measure of uncertainty in the estimate. Here is how to think about precision and uncertainty: if we were to repeat this experiment many, many times, we would generate a long list of mean TG levels for the control mice and a long list of mean TG levels for the knockout mice. The less variable the means in each list, the more precise the estimates. By using a model, we do not need to repeat this experiment many times to get a standard error.
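This repeated-sampling idea can be illustrated with a quick simulation. This is a hypothetical sketch: the population mean (60), standard deviation (12), and sample size (6) are invented for illustration and are not the Figure 2i values.

```r
# Hypothetical sketch of the repeated-sampling idea behind a standard error.
# The mean (60), sd (12), and n (6) are invented, not the Figure 2i values.
set.seed(1)
n_sim <- 10000
# a long list of sample means, one per simulated "experiment"
mean_list <- replicate(n_sim, mean(rnorm(6, mean = 60, sd = 12)))
# the standard deviation of this list of means is the standard error
sd(mean_list)   # close to the theoretical SE: 12 / sqrt(6)
12 / sqrt(6)
```

A model lets us skip the simulation: the fitted estimate of the residual standard deviation, divided by \(\sqrt{n}\), gives the same quantity from a single experiment.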

The model we are going to fit to the Figure 2i data is

\[\begin{align} y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i\\ \varepsilon_i &\sim N(0, \sigma^2) \end{align}\]

This is a model of how the Figure 2i data were generated. In this model, \(y_i\) is the liver TG level for some fictional (generated!) mouse (the i stands for the ith fictional mouse generated) and \(x_i\) is a variable that indicates the condition of the ASK1 gene in fictional mouse i. For \(x_i\), a value of 0 is given to mice with a functional ASK1 gene and a value of 1 is given to mice with a knocked-out gene.

\(\beta_0\) is the “true” mean of the TG level in mice fed a high-fat diet and with a functional ASK1 gene. By “true,” I mean the mean that would be computed if we were to measure TG on an infinite number of these mice. The observed mean of the ASK1F/F group is an estimate of \(\beta_0\). The sum \(\beta_0\) + \(\beta_1\) is the true mean of the TG level in mice fed a high-fat diet but with a knocked out ASK1 gene. This means that \(\beta_1\) is the true difference in the means, or the true effect. The observed difference in means between the ASK1Δadipo and ASK1F/F groups is an estimate of \(\beta_1\). This difference is the estimated effect.

Notice that the sum \(\beta_0 + \beta_1 x_i\) equals the true mean of the infinite set of normal mice if \(x_i = 0\) and equals the true mean of the infinite set of ASK1 knockout mice if \(x_i = 1\). \(\varepsilon_i\) is the error for mouse i, which is the difference between the TG level for mouse i and the expected TG value for mouse i. The expected value for a mouse with a normal ASK1 gene is \(\beta_0\). The expected value for a mouse with a knocked out ASK1 gene is \(\beta_0 + \beta_1\). The second line of the model simply states that \(\varepsilon_i\) is modeled as a random sample from a normal distribution with a mean of zero and a variance of \(\sigma^2\).

By fitting a model to the data we estimate the parameters \(\beta_0\), \(\beta_1\) and \(\sigma\). It is the estimation of \(\sigma\) that allows us to compute a measure of our uncertainty (a standard error) of our estimates of the means (\(\beta_0\) and \(\beta_0 + \beta_1\)) and of the difference in the means (\(\beta_1\)).
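The fitted model object (fig2i_m1, created by the lm() call at the top of this chapter) stores these estimates. One way to inspect them directly (sigma() is the base-R stats accessor for the residual standard deviation):

```r
coef(fig2i_m1)    # estimates of beta_0 (intercept) and beta_1 (treatment effect)
sigma(fig2i_m1)   # estimate of sigma, the residual standard deviation
```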

Let’s fit this model to the Figure 2i data using R.
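The fit uses R’s lm() function, with the model formula `liver_tg ~ treatment` encoding \(y = \beta_0 + \beta_1 x\) (this is the same call as in the setup code at the top of the chapter):

```r
fig2i_m1 <- lm(liver_tg ~ treatment, data = fig2i)
```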

Robust inference from the model (generalizing from sample to population, including measures of the uncertainty of our estimates) requires that our data approximate the kind of data we’d expect from the data-generating model specified above. All rigorous analysis should use specific model checks to evaluate this. First, the “normality check” – we use a quantile-quantile (QQ) plot to see if our data approximate what we’d see if we sampled from a normal distribution. This looks okay, in the sense that the observed data points (open circles) fall within the boundaries set by the dashed lines. Inference is pretty robust to moderate departures from normality.

qqPlot(fig2i_m1, id=FALSE)

Second, the “homogeneity check” – we use a spread-level plot to see if there is some pattern to the variance, for example if the spread of residuals is noticeably bigger in one group than another, or if the spread increases with the fitted value. This looks pretty good. Given these checks, let’s move on and look at the table of model coefficients.

spreadLevelPlot(fig2i_m1, id=FALSE)

## Suggested power transformation:  1.294553
fig2i_m1 <- lm(liver_tg ~ treatment, data = fig2i)
fig2i_m1_coef <- cbind(coef(summary(fig2i_m1)),
                       confint(fig2i_m1))
knitr::kable(fig2i_m1_coef, digits = c(1, 2, 1, 3, 1, 1))
                      Estimate  Std. Error  t value  Pr(>|t|)  2.5 %  97.5 %
(Intercept)               61.5        4.98     12.3     0.000   50.4    72.6
treatmentASK1Δadipo      -21.6        7.05     -3.1     0.012  -37.3    -5.9

The two values in the column “Estimate” are the estimates of \(\beta_0\) and \(\beta_1\). The top value (61.5) is the mean of the control mice (the units are µmol/g). The mean of the knockout mice is the sum of the two values (39.9 µmol/g). And the effect of ASK1 deletion on TG levels is simply the second value (-21.6 µmol/g). The standard error of the effect is 7.05 µmol/g. We can use the standard error to compute a t-value (-3.1, in the column “t value”). A t-value is a test statistic. The probability (“p value”) of the significance test is 0.012. This is the probability of sampling a t-value as large or larger than the observed t-value, if we were to sample from a null distribution of t-values (a distribution of sampled t-values if the true value of \(\beta_1\) were 0). We can also use the standard error to compute a 95% confidence interval of the effect. The lower bound of this interval is -37.3 µmol/g and the upper bound is -5.9 µmol/g. A confidence interval is another way of communicating uncertainty, and the way advocated in this text. In a 95% confidence interval, 95% of similarly constructed intervals (from hypothetical sampling of six mice from the ASK1 normal population and six mice from the ASK1 knockout population) will contain the true effect (the true difference in means). Another way to think about a confidence interval is that it is the range of true differences that are compatible with the data, where compatible means “not rejected” by a t-test (a t-test between the estimated effect and any number inside the interval would return a p-value greater than 0.05).
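The t-value and the confidence interval in the table follow directly from the estimate and its standard error. With six mice per group, the residual degrees of freedom are \(12 - 2 = 10\), and the 0.975 quantile of a t distribution with 10 df is about 2.228:

\[\begin{align} t &= \frac{-21.6}{7.05} \approx -3.1\\ \mathrm{95\%\ CI} &= -21.6 \pm 2.228 \times 7.05 \approx (-37.3,\ -5.9) \end{align}\]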

Here is how we might report this result in a paper:

Mean TG level in ASK1Δadipo mice on a high-fat diet was 21.6 µmol/g less than that in ASK1F/F mice on a high-fat diet (95% CI: -37.3, -5.9, \(p = 0.012\)).

And a plot for the paper: