Elements of Applied Biostatistics
Preface
0.1
Math
0.2
R and programming
Part I: R fundamentals
1
Organization – R Projects and R Notebooks
1.1
Importing Packages
1.2
Create an R Studio Project for this Class
1.3
R Notebooks
1.3.1
Create an R Notebook for this Chapter
1.3.2
Create a “setup” chunk
1.3.3
Create a “simple plot” chunk
1.3.4
Create more R chunks and explore options and play with R code
2
Data – Reading, Writing, and Fake
2.1
Create new notebook for this chapter
2.2
Importing Data
2.2.1
Excel File
2.2.2
Text File
2.3
Creating Fake Data
2.3.1
Continuous X (fake observational data)
2.3.2
Categorical X (fake experimental data)
2.3.3
Correlated X (fake observational data)
2.4
Saving Data
2.5
Problems
3
Pretty good plots
3.1
Pretty good plots show the model and the data
3.1.1
Pretty good plot component 1: Modeled effects plot
3.1.2
Pretty good plot component 2: Modeled mean and CI plot with jittered raw data
3.1.3
Combining Effects and Modeled mean and CI plots – an Effects and response plot.
3.2
Some comments on plot components
3.3
Working in R
3.3.1
Adding modeled error intervals
3.3.2
Adding p-values
3.3.3
Adding custom p-values
3.3.4
Plotting two factors
Part II: Statistics fundamentals
4
Variability and Uncertainty (Standard Deviations and Standard Errors)
4.1
The sample standard deviation vs. the standard error of the mean
4.1.1
Sample standard deviation
4.1.2
Standard error of the mean
4.2
Using Google Sheets to generate fake data to explore uncertainty
4.2.1
Steps
4.3
Using R to generate fake data to explore uncertainty
4.3.1
Part I
4.3.2
Part II – means
4.3.3
Part III – how do SD and SE change as sample size (n) increases?
4.3.4
Part IV – Generating fake data with “for loops”
4.4
Bootstrapped standard errors
Part III: Statistical Modeling
5
Statistical Modeling
5.1
Two specifications of a linear model
5.1.1
The “random error” specification
5.1.2
The “random draw” specification
5.1.3
Comparing the two ways of specifying the linear model
5.2
What do we call the \(X\) and \(Y\) variables?
5.3
Statistical models are used for prediction, explanation, and description
5.4
Modeling strategy
5.5
A mean is the simplest model
5.6
Assumptions for inference with a statistical model
5.7
Specific assumptions for inference with a linear model
5.8
“Statistical model” or “regression model”?
5.9
GLM vs. GLM vs. GLS
6
A linear model with a single, continuous X
6.1
A linear model with a single, continuous X is classical “regression”
6.1.1
Using a linear model to estimate explanatory effects
6.1.2
Using a linear model for prediction
6.1.3
Reporting results
6.2
Working in R
6.2.1
Exploring the bivariate relationship between Y and X
6.2.2
Fitting the linear model
6.2.3
Getting to know the linear model: the summary function
6.2.4
display: An alternative to summary
6.2.5
Confidence intervals
6.2.6
How good is our model?
6.2.7
Exploring an lm object
6.3
Problems
7
Least Squares Estimation and the Decomposition of Variance
7.1
OLS regression
7.2
How well does the model fit the data? \(R^2\) and “variance explained”
Part IV: Linear Models for Experimental Data
8
A linear model with a single, categorical X
8.1
A linear model with a single, categorical X is the engine behind a single factor (one-way) ANOVA and a t-test is a special case of this model.
8.1.1
Table of model coefficients
8.1.2
The linear model
8.1.3
Reporting results
8.2
Working in R
8.2.1
Exploring the relationship between Y and X
8.2.2
Fitting the model
8.2.3
An introduction to contrasts
8.2.4
Harrell plot
9
Model Checking
9.1
All statistical analyses should be followed by model checking
10
Skewed data
11
Two (or more) Categorical \(X\) – Factorial designs
11.1
Factorial experiments
11.1.1
Model coefficients: an interaction effect is what is leftover after adding the treatment effects to the control
11.1.2
What is the biological meaning of an interaction effect?
11.1.3
What about models with more than two factors?
11.1.4
The additive model
11.1.5
Contrasts – simple vs. main effects
11.2
Reporting results
11.2.1
Text results
11.2.2
Harrellplot
11.2.3
Interaction plots
11.3
Recommendations
11.4
Working in R
11.5
Problems
12
Adding covariates to a linear model I: ANCOVA
12.1
Adding covariates can increase the precision of the effect of interest
12.1.1
Interaction effects with covariates
12.1.2
Add only covariates that were measured before peeking at the data
12.2
Regression to the mean
12.2.1
Do not use percent change, believing that percents account for effects of initial weights
12.2.2
Do not “test for balance” of baseline measures
Part V: Hypothesis Testing
13
P-values
13.1
\(p\)-values
13.2
Creating a null distribution.
13.2.1
The null distribution
13.2.2
\(t\)-tests
13.2.3
P-values from the perspective of permutation
13.3
Statistical modeling instead of hypothesis testing
13.4
Frequentist probability and the interpretation of p-values
13.4.1
Background
13.4.2
This book covers frequentist approaches to statistical modeling and when a probability arises, such as the \(p\)-value of a test statistic, this will be a frequentist probability.
13.4.3
Two interpretations of the \(p\)-value
13.4.4
NHST
13.4.5
Some major misconceptions of the \(p\)-value
13.4.6
Recommendations
13.5
Problems
14
ANOVA Tables
14.1
Summary of usage
14.2
Example: a one-way ANOVA using the vole data
14.3
Example: a two-way ANOVA using the urchin data
14.3.1
How to read an ANOVA table
14.3.2
How to read ANOVA results reported in the text
14.3.3
Better practice – estimates and their uncertainty
14.4
Unbalanced designs
14.4.1
What is going on in unbalanced ANOVA? – Type I, II, III sum of squares
14.4.2
Back to interpretation of main effects
14.4.3
The ANOVA tables for Type I, II, and III sum of squares are the same if the design is balanced.
14.5
Working in R
14.5.1
Type I sum of squares in R
14.5.2
Type II and III Sum of Squares
Part VI: Expanding the Linear Model: Generalized Linear Models
15
Generalized linear models I: Count data
15.1
The generalized linear model
15.2
Count data example – number of trematode worm larvae in eyes of threespine stickleback fish
15.2.1
Modeling strategy
15.2.2
Checking the model I – a Normal Q-Q plot
15.2.3
Checking the model II – scale-location plot for checking homoskedasticity
15.2.4
Two distributions for count data – Poisson and Negative Binomial
15.2.5
Fitting a GLM with a Poisson distribution to the worm data
15.2.6
Model checking fits to count data
15.2.7
Fitting a GLM with a Negative Binomial distribution to the worm data
15.3
Working in R
15.4
Problems
Part VII: Expanding the Linear Model: Hierarchical (Linear Mixed) Models
16
Linear mixed models
16.1
Random effects
16.2
Random effects in statistical models
16.3
Linear mixed models are flexible
16.4
Visualizing block effects
16.5
Linear mixed models can increase precision of point estimates
16.6
Linear mixed models are used to avoid pseudoreplication
16.7
Linear mixed models shrink coefficients by partial pooling
16.8
Working in R
16.8.1
coral data
Appendix 1: Getting Started with R
16.9
Get your computer ready
16.9.1
Install R
16.9.2
Install R Studio
16.9.3
Resources for installing R and R Studio
16.9.4
Install LaTeX
16.10
Start learning
16.10.1
Start with Data Camp Introduction to R
16.10.2
Then Move to Introduction to R Studio
16.10.3
Develop your project with an R Studio Notebook
16.11
Getting Data into R
16.12
Additional R learning resources
16.13
Packages used extensively in this text
Appendix 2: Online Resources for Getting Started with Statistical Modeling in R
Elementary Statistical Modeling for Applied Biostatistics
Part VI: Expanding the Linear Model: Generalized Linear Models