Elements of Applied Biostatistics
Preface
0.1
Math
0.2
R and programming
Part I: R fundamentals
1
Organization – R Projects and R Notebooks
1.1
Importing Packages
1.2
Create an R Studio Project for this Class
1.3
R Notebooks
1.3.1
Create an R Notebook for this Chapter
1.3.2
Create a “load-packages” chunk
1.3.3
Create a “simple plot” chunk
1.3.4
Create more R chunks and explore options and play with R code
2
Data – Importing and Saving Data
2.1
Create new notebook for this chapter
2.2
Importing Data
2.2.1
Excel File
2.2.2
Text File
2.3
Saving Data
2.4
Problems
Part II: Some Fundamentals of Statistical Modeling
3
An Introduction to Statistical Modeling
3.1
Two specifications of a linear model
3.1.1
The “error draw” specification
3.1.2
The “conditional draw” specification
3.1.3
Comparing the two ways of specifying the linear model
3.2
What do we call the \(X\) and \(Y\) variables?
3.3
Statistical models are used for prediction, explanation, and description
3.4
Modeling strategy
3.5
A mean is the simplest model
3.6
Assumptions for inference with a statistical model
3.7
Specific assumptions for inference with a linear model
3.8
“Statistical model” or “regression model”?
3.9
GLM vs. GLM vs. GLS
4
Variability and Uncertainty (Standard Deviations, Standard Errors, Confidence Intervals)
4.1
The sample standard deviation vs. the standard error of the mean
4.1.1
Sample standard deviation
4.1.2
Standard error of the mean
4.2
Using Google Sheets to generate fake data to explore uncertainty
4.2.1
Steps
4.3
Using R to generate fake data to explore uncertainty
4.3.1
part I
4.3.2
part II - means
4.3.3
part III - how do SD and SE change as sample size (n) increases?
4.3.4
Part IV – Generating fake data with “for loops”
4.4
Bootstrapped standard errors
5
Covariance and Correlation
5.1
6
P-values
6.1
\(p\)-values
6.2
Creating a null distribution.
6.2.1
the Null Distribution
6.2.2
\(t\)-tests
6.2.3
P-values from the perspective of permutation
6.3
Statistical modeling instead of hypothesis testing
6.4
frequentist probability and the interpretation of p-values
6.4.1
Background
6.4.2
This book covers frequentist approaches to statistical modeling and when a probability arises, such as the \(p\)-value of a test statistic, this will be a frequentist probability.
6.4.3
Two interpretations of the \(p\)-value
6.4.4
NHST
6.4.5
Some major misconceptions of the \(p\)-value
6.4.6
Recommendations
6.5
Problems
7
Creating Fake Data
7.0.1
Continuous X (fake observational data)
7.0.2
Categorical X (fake experimental data)
7.0.3
Correlated X (fake observational data)
Part III: Introduction to Linear Models
8
A linear model with a single, continuous X
8.1
A linear model with a single, continuous X is classical “regression”
8.1.1
Using a linear model to estimate explanatory effects
8.1.2
Using a linear model for prediction
8.1.3
Reporting results
8.2
Working in R
8.2.1
Exploring the bivariate relationship between Y and X
8.2.2
Fitting the linear model
8.2.3
Getting to know the linear model: the summary function
8.2.4
display: An alternative to summary
8.2.5
Confidence intervals
8.2.6
How good is our model?
8.2.7
exploring a lm object
8.3
Problems
9
A linear model with a single, categorical X
9.1
A linear model with a single, categorical X is the engine behind a single factor (one-way) ANOVA and a t-test is a special case of this model.
9.1.1
Table of model coefficients
9.1.2
The linear model
9.1.3
Reporting results
9.2
Working in R
9.2.1
Exploring the relationship between Y and X
9.2.2
Fitting the model
9.2.3
An introduction to contrasts
9.2.4
Harrell plot
10
Model Checking
10.1
Do coefficients make numeric sense?
10.2
All statistical analyses should be followed by model checking
10.3
Linear model assumptions
10.4
Diagnostic plots use the residuals from the model fit
10.4.1
Residuals
10.4.2
A Normal Q-Q plot is used to check normality
10.4.3
Outliers - an outlier is a point that is highly unexpected given the modeled distribution.
10.5
Model checking homoskedasticity
10.6
Model checking independence - happiness adverse example.
10.7
Using R
11
Model Fitting and Model Fit (OLS)
11.1
Least Squares Estimation and the Decomposition of Variance
11.2
OLS regression
11.3
How well does the model fit the data? \(R^2\) and “variance explained”
12
Plotting Models
12.1
Pretty good plots show the model and the data
12.1.1
Pretty good plot component 1: Modeled effects plot
12.1.2
Pretty good plot component 2: Modeled mean and CI plot with jittered raw data
12.1.3
Combining Effects and Modeled mean and CI plots – an Effects and response plot.
12.2
Some comments on plot components
12.3
Working in R
12.3.1
Adding modeled error intervals
12.3.2
Adding p-values
12.3.3
Adding custom p-values
12.3.4
Plotting two factors
Part IV: More than one \(X\) – Multivariable Models
13
Adding covariates to a linear model
13.1
Adding covariates can increase the precision of the effect of interest
13.1.1
Interaction effects with covariates
13.1.2
Add only covariates that were measured before peeking at the data
13.2
Regression to the mean
13.2.1
Do not use percent change, believing that percents account for effects of initial weights
13.2.2
Do not “test for balance” of baseline measures
14
Two (or more) Categorical \(X\) – Factorial designs
14.1
Factorial experiments
14.1.1
Model coefficients: an interaction effect is what is leftover after adding the treatment effects to the control
14.1.2
What is the biological meaning of an interaction effect?
14.1.3
What about models with more than two factors?
14.1.4
The additive model
14.1.5
Contrasts – simple vs. main effects
14.2
Reporting results
14.2.1
Text results
14.2.2
Harrellplot
14.2.3
Interaction plots
14.3
Recommendations
14.4
Working in R
14.5
Problems
15
ANOVA Tables
15.1
Summary of usage
15.2
Example: a one-way ANOVA using the vole data
15.3
Example: a two-way ANOVA using the urchin data
15.3.1
How to read an ANOVA table
15.3.2
How to read ANOVA results reported in the text
15.3.3
Better practice – estimates and their uncertainty
15.4
Unbalanced designs
15.4.1
What is going on in unbalanced ANOVA? – Type I, II, III sum of squares
15.4.2
Back to interpretation of main effects
15.4.3
The anova tables for Type I, II, and III sum of squares are the same if the design is balanced.
15.5
Working in R
15.5.1
Type I sum of squares in R
15.5.2
Type II and III Sum of Squares
16
Predictive Models
16.1
Overfitting
16.2
Model building vs. Variable selection vs. Model selection
16.2.1
Stepwise regression
16.2.2
Cross-validation
16.2.3
Penalization
16.3
Shrinkage
Part V: Expanding the Linear Model – Generalized Linear Models and Multilevel (Linear Mixed) Models
17
Generalized linear models I: Count data
17.1
The generalized linear model
17.2
Count data example – number of trematode worm larvae in eyes of threespine stickleback fish
17.2.1
Modeling strategy
17.2.2
Checking the model I – a Normal Q-Q plot
17.2.3
Checking the model II – scale-location plot for checking homoskedasticity
17.2.4
Two distributions for count data – Poisson and Negative Binomial
17.2.5
Fitting a GLM with a Poisson distribution to the worm data
17.2.6
Model checking fits to count data
17.2.7
Fitting a GLM with a Negative Binomial distribution to the worm data
17.3
Working in R
17.4
Problems
18
Linear mixed models
18.1
Random effects
18.2
Random effects in statistical models
18.3
Linear mixed models are flexible
18.4
Visualizing block effects
18.5
Linear mixed models can increase precision of point estimates
18.6
Linear mixed models are used to avoid pseudoreplication
18.7
Linear mixed models shrink coefficients by partial pooling
18.8
Working in R
18.8.1
coral data
Appendix 1: Getting Started with R
18.9
Get your computer ready
18.9.1
Install R
18.9.2
Install R Studio
18.9.3
Resources for installing R and R Studio
18.9.4
Install LaTeX
18.10
Start learning
18.10.1
Start with Data Camp Introduction to R
18.10.2
Then Move to Introduction to R Studio
18.10.3
Develop your project with an R Studio Notebook
18.11
Getting Data into R
18.12
Additional R learning resources
18.13
Packages used extensively in this text
Appendix 2: Online Resources for Getting Started with Statistical Modeling in R
Elementary Statistical Modeling for Applied Biostatistics
Part IV: More than one \(X\) – Multivariable Models