Elements of Applied Biostatistics
Preface
0.1 Math
0.2 R and programming
Part I: Getting Started
1 Getting Started – R Projects and R Markdown
1.1 R vs R Studio
1.2 Download and install R and R Studio
1.3 Install R Markdown
1.4 Importing Packages
1.5 Create an R Studio Project for this textbook
1.5.1 Create an R Markdown file for this chapter
1.5.2 Create a “fake-data” chunk
1.5.3 Create a “plot” chunk
1.5.4 Knit
Part II: An introduction to the analysis of experimental data with a linear model
2 Analyzing experimental data with a linear model
2.1 This text is about the estimation of treatment effects and the uncertainty in our estimates using linear models. This raises the question: what is “an effect”?
Background physiology to the experiments in Figure 2 of “ASK1 inhibits browning of white adipose tissue in obesity”
Analyses for Figure 2 of “ASK1 inhibits browning of white adipose tissue in obesity”
2.2 Setup
2.3 Data source
2.4 Control the color palette
2.5 Useful functions
2.6 Figure 2b – Effect of ASK1 deletion on growth (body weight)
2.6.1 Figure 2b – import
2.6.2 Figure 2b – exploratory plots
2.7 Figure 2c – Effect of ASK1 deletion on final body weight
2.7.1 Figure 2c – import
2.7.2 Figure 2c – check own computation of weight change vs. imported value
2.7.3 Figure 2c – exploratory plots
2.7.4 Figure 2c – fit the model: m1 (lm)
2.7.5 Figure 2c – check the model: m1
2.7.6 Figure 2c – fit the model: m2 (gamma glm)
2.7.7 Figure 2c – check the model: m2
2.7.8 Figure 2c – inference from the model
2.7.9 Figure 2c – plot the model
2.7.10 Figure 2c – report
2.8 Figure 2d – Effect of ASK1 KO on glucose tolerance (whole curve)
2.8.1 Figure 2d – import
2.8.2 Figure 2d – exploratory plots
2.8.3 Figure 2d – fit the model
2.8.4 Figure 2d – check the model
2.8.5 Figure 2d – inference
2.8.6 Figure 2d – plot the model
2.9 Figure 2e – Effect of ASK1 deletion on glucose tolerance (summary measure)
2.9.1 Figure 2e – massage the data
2.9.2 Figure 2e – exploratory plots
2.9.3 Figure 2e – fit the model
2.9.4 Figure 2e – check the model
2.9.5 Figure 2e – inference from the model
2.9.6 Figure 2e – plot the model
2.10 Figure 2f – Effect of ASK1 deletion on glucose infusion rate
2.10.1 Figure 2f – import
2.10.2 Figure 2f – exploratory plots
2.10.3 Figure 2f – fit the model
2.10.4 Figure 2f – check the model
2.10.5 Figure 2f – inference
2.10.6 Figure 2f – plot the model
2.11 Figure 2g – Effect of ASK1 deletion on tissue-specific glucose uptake
2.11.1 Figure 2g – import
2.11.2 Figure 2g – exploratory plots
2.11.3 Figure 2g – fit the model
2.11.4 Figure 2g – check the model
2.11.5 Figure 2g – inference
2.11.6 Figure 2g – plot the model
2.12 Figure 2h
2.13 Figure 2i – Effect of ASK1 deletion on liver TG
2.13.1 Figure 2i – fit the model
2.13.2 Figure 2i – check the model
2.13.3 Figure 2i – inference
2.13.4 Figure 2i – plot the model
2.13.5 Figure 2i – report the model
2.14 Figure 2j
3 An introduction to linear models
3.1 Two specifications of a linear model
3.1.1 The “error draw” specification
3.1.2 The “conditional draw” specification
3.1.3 Comparing the two ways of specifying the linear model
3.2 Statistical models are used for prediction, explanation, and description
3.3 What do we call the \(X\) and \(Y\) variables?
3.4 Modeling strategy
3.5 Fitting the model
3.6 Fitting linear models to experimental data in which the \(X\) variable is categorical
3.7 Assumptions for inference with a statistical model
3.8 Specific assumptions for inference with a linear model
3.9 “Linear model,” “regression model,” or “statistical model”?
Part III: R fundamentals
4 Data – Reading, Wrangling, and Writing
4.1 Learning from this chapter
4.2 Working in R
4.2.1 Importing data
4.3 Data wrangling
4.3.1 Reshaping data – Wide to long
4.3.2 Reshaping data – Transpose (turning the columns into rows)
4.3.3 Combining data
4.3.4 Subsetting data
4.3.5 Wrangling columns
4.3.6 Missing data
4.4 Saving data
4.5 Exercises
5 Plotting Models
5.1 Pretty good plots show the model and the data
5.1.1 Pretty good plot component 1: Modeled effects plot
5.1.2 Pretty good plot component 2: Modeled mean and CI plot
5.1.3 Combining Effects and Modeled mean and CI plots – an Effects and response plot
5.2 Some comments on plot components
5.3 Working in R
5.3.1 Unpooled SE bars and confidence intervals
5.3.2 Adding bootstrap intervals
5.3.3 Adding modeled means and error intervals
5.3.4 Adding p-values
5.3.5 Adding custom p-values
5.3.6 Plotting two factors
5.3.7 Interaction plot
5.3.8 Plot components
Part IV: Some Fundamentals of Statistical Modeling
6 Variability and Uncertainty (Standard Deviations, Standard Errors, Confidence Intervals)
6.1 The sample standard deviation vs. the standard error of the mean
6.1.1 Sample standard deviation
6.1.2 Standard error of the mean
6.2 Using Google Sheets to generate fake data to explore the standard error
6.2.1 Steps
6.3 Using R to generate fake data to explore the standard error
6.3.1 Part I
6.3.2 Part II – Means
6.3.3 Part III – How do SD and SE change as sample size (n) increases?
6.3.4 Part IV – Generating fake data with for-loops
6.4 Bootstrapped standard errors
6.4.1 An example of bootstrapped standard errors using vole data
6.5 Confidence Interval
6.5.1 Interpretation of a confidence interval
7 P-values
7.1 A p-value is the probability of sampling a value as or more extreme than the test statistic if sampling from a null distribution
7.2 Pump your intuition – Creating a null distribution
7.3 A null distribution of t-values – the t distribution
7.4 P-values from the perspective of permutation
7.5 Parametric vs. non-parametric statistics
7.6 Frequentist probability and the interpretation of p-values
7.6.1 Background
7.6.2 This book covers frequentist approaches to statistical modeling, and when a probability arises, such as the p-value of a test statistic, this will be a frequentist probability.
7.6.3 Two interpretations of the p-value
7.6.4 NHST
7.6.5 Some major misconceptions of the p-value
7.6.6 Recommendations
7.7 Problems
Part V: Introduction to Linear Models
8 Models with a single, continuous X
8.1 A linear model with a single, continuous X is classical “regression”
8.1.1 Analysis of “green-down” data
8.1.2 Learning from the green-down example
8.1.3 What a regression coefficient means
8.1.4 Using the linear model for prediction – prediction models
8.1.5 Using a linear model for “explanation” – causal models
8.2 Working in R
8.2.1 Fitting the linear model
8.2.2 Getting to know the linear model: the summary function
8.2.3 Inference – the coefficient table and confidence intervals
8.2.4 How good is our model?
9 A linear model with a single, categorical X
9.1 A linear model with a single, categorical X estimates the effects of X on the response
9.1.1 Table of model coefficients
9.1.2 The linear model
9.1.3 Reporting results
9.2 Comparing the results of a linear model to classical hypothesis tests
9.2.1 t-tests are special cases of a linear model
9.2.2 ANOVA is a special case of a linear model
9.3 Working in R
9.3.1 Fitting the model
9.3.2 Changing the reference level
9.3.3 An introduction to contrasts
9.3.4 Harrell plot
10 Model Checking
10.1 Do coefficients make numeric sense?
10.2 All statistical analyses should be followed by model checking
10.3 Linear model assumptions
10.4 Diagnostic plots use the residuals from the model fit
10.4.1 Residuals
10.4.2 A Normal Q-Q plot is used to check normality
10.4.3 Outliers – an outlier is a point that is highly unexpected given the modeled distribution
10.5 Model checking homoskedasticity
10.6 Model checking independence – happiness adverse example
10.7 Using R
11 Model Fitting and Model Fit (OLS)
11.1 Least Squares Estimation and the Decomposition of Variance
11.2 OLS regression
11.3 How well does the model fit the data? \(R^2\) and “variance explained”
12 Best Practices – Issues in Inference
12.1 Power
12.1.1 “Types” of Error
12.2 Multiple testing
12.2.1 Some background
12.2.2 Multiple testing – working in R
12.2.3 False Discovery Rate
12.3 Difference in p is not different
12.4 Inference when data are not Normal
12.4.1 Working in R
12.4.2 Bootstrap Confidence Intervals
12.4.3 Permutation test
12.4.4 Non-parametric tests
12.4.5 Log transformations
12.4.6 Performance of parametric tests and alternatives
12.5 Max vs. mean
12.6 Pre-post, normalization
Part VI: More than one \(X\) – Multivariable Models
13 Adding covariates to a linear model
13.1 Adding covariates can increase the precision of the estimated effect of interest
13.2 Adding covariates can decrease prediction error in predictive models
13.3 Adding covariates can reduce bias due to confounding in explanatory models
13.4 Best practices 1: A pre-treatment measure of the response should be a covariate and not subtracted from the post-treatment measure (regression to the mean)
13.4.1 Regression to the mean in words
13.4.2 Regression to the mean in pictures
13.4.3 Do not use percent change, believing that percents account for effects of initial weights
13.4.4 Do not “test for balance” of baseline measures
13.5 Best practices 2: Use a covariate instead of normalizing a response
14 Two (or more) Categorical \(X\) – Factorial designs
14.1 Factorial experiments
14.1.1 Model coefficients: an interaction effect is what is left over after adding the treatment effects to the control
14.1.2 What is the biological meaning of an interaction effect?
14.1.3 The interpretation of the coefficients in a factorial model is entirely dependent on the reference…
14.1.4 Estimated marginal means
14.1.5 In a factorial model, there are multiple effects of each factor (simple effects)
14.1.6 Marginal effects
14.1.7 The additive model
14.1.8 Reduce models for the right reason
14.1.9 What about models with more than two factors?
14.2 Reporting results
14.2.1 Text results
14.3 Working in R
14.3.1 Model formula
14.3.2 Modeled means
14.3.3 Marginal means
14.3.4 Contrasts
14.3.5 Simple effects
14.3.6 Marginal effects
14.3.7 Plotting results
14.4 Problems
15 ANOVA Tables
15.1 Summary of usage
15.2 Example: a one-way ANOVA using the vole data
15.3 Example: a two-way ANOVA using the urchin data
15.3.1 How to read an ANOVA table
15.3.2 How to read ANOVA results reported in the text
15.3.3 Better practice – estimates and their uncertainty
15.4 Unbalanced designs
15.4.1 What is going on in unbalanced ANOVA? – Type I, II, III sum of squares
15.4.2 Back to interpretation of main effects
15.4.3 The ANOVA tables for Type I, II, and III sum of squares are the same if the design is balanced
15.5 Working in R
15.5.1 Type I sum of squares in R
15.5.2 Type II and III Sum of Squares
16 Predictive Models
16.1 Overfitting
16.2 Model building vs. Variable selection vs. Model selection
16.2.1 Stepwise regression
16.2.2 Cross-validation
16.2.3 Penalization
16.3 Shrinkage
Part VII – Expanding the Linear Model
17 Linear mixed models
17.1 Random effects
17.2 Random effects in statistical models
17.3 Linear mixed models are flexible
17.4 Blocking
17.4.1 Visualizing variation due to blocks
17.4.2 Blocking increases precision of point estimates
17.5 Pseudoreplication
17.5.1 Visualizing pseudoreplication
17.6 Mapping NHST to estimation: A paired t-test is a special case of a linear mixed model
17.7 Advanced topic – Linear mixed models shrink coefficients by partial pooling
17.8 Working in R
17.8.1 Coral data
18 Generalized linear models I: Count data
18.1 The generalized linear model
18.2 Count data example – number of trematode worm larvae in eyes of threespine stickleback fish
18.2.1 Modeling strategy
18.2.2 Checking the model I – a Normal Q-Q plot
18.2.3 Checking the model II – scale-location plot for checking homoskedasticity
18.2.4 Two distributions for count data – Poisson and Negative Binomial
18.2.5 Fitting a GLM with a Poisson distribution to the worm data
18.2.6 Model checking fits to count data
18.2.7 Fitting a GLM with a Negative Binomial distribution to the worm data
18.3 Working in R
18.3.1 Fitting a GLM to count data
18.3.2 Fitting a generalized linear mixed model (GLMM) to count data
18.3.3 Fitting a generalized linear model to continuous data
18.4 Problems
19 Linear models with heterogeneous variance
19.1 gls
Part V: Expanding the Linear Model – Generalized Linear Models and Multilevel (Linear Mixed) Models
20 Plotting functions (#ggplotsci)
20.1 odd-even
20.2 Estimate response and effects with emmeans
20.3 emm_table
20.4 pairs_table
20.5 gg_mean_error
20.6 gg_ancova
20.7 gg_mean_ci_ancova
20.8 gg_effects
Appendix 1: Getting Started with R
20.9 Get your computer ready
20.9.1 Start here
20.9.2 Install R
20.9.3 Install R Studio
20.9.4 Install R Markdown
20.9.5 (optional) Alternative LaTeX installations
20.10 Start learning R Studio
Appendix 2: Online Resources for Getting Started with Statistical Modeling in R
Appendix 3: Fake Data Simulations
20.11 Performance of Blocking relative to a linear model
Elements of Statistical Modeling for Experimental Biology