Upon completion of this course the successful student can:
- Understand the concept of a statistical model and the fundamentals of sampling uncertainty in statistics.
- Evaluate the bias, variance, and relative efficiency of given statistics using Monte Carlo simulation in the R statistical software package.
- Draw valid conclusions on the (relative) performance of given statistics when presented with results on bias, variance, and relative efficiency under particular simulation conditions.
- Understand concepts and techniques in statistical optimization (mainly univariate), including iteration-based optimization.
- Set up and program an algorithm to estimate the parameters of given statistical learning models using maximum likelihood and least squares estimation.
- Assess model accuracy and perform model selection using cross-validation.
Assumed previous knowledge
An introductory course in statistics (e.g., JBM101) is required.
A course on programming is recommended.
The first part of the course covers computational statistics. First, the fundamentals of statistics are discussed in terms of the assumed data generating model and sampling uncertainty. Then Monte Carlo simulation is introduced, along with its use in assessing the properties of statistical estimators and hypothesis tests (control of the type I error rate and power). Finally, the use of cross-validation for model selection and model assessment is discussed, with special attention to the bias-variance trade-off in prediction.
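As an illustration of the Monte Carlo approach described above, the following minimal sketch compares the sample mean and the sample median as estimators of the centre of a normal distribution. The course itself works in R; Python is used here only for illustration. The function name `simulate`, the sample size, the number of replications, and the choice of estimators are all assumptions made for this example, not course material.

```python
import random
import statistics


def simulate(n=25, reps=5000, mu=0.0, sigma=1.0, seed=1):
    """Estimate bias and variance of the sample mean and sample median
    by repeatedly drawing samples of size n from Normal(mu, sigma)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))

    result = {}
    for name, estimates in (("mean", means), ("median", medians)):
        result[name] = {
            # Monte Carlo bias: average estimate minus the true value
            "bias": statistics.fmean(estimates) - mu,
            # Monte Carlo variance of the estimator across replications
            "variance": statistics.variance(estimates),
        }
    # Relative efficiency expressed as var(median) / var(mean);
    # values above 1 indicate that the mean is the more efficient estimator.
    result["var_median_over_var_mean"] = (
        result["median"]["variance"] / result["mean"]["variance"]
    )
    return result


if __name__ == "__main__":
    for key, value in simulate().items():
        print(key, value)
```

Under these simulation conditions both estimators should show a bias close to zero, while the variance ratio should exceed 1. Conclusions of this kind hold only for the data generating model that was simulated.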
The second part covers statistical computing, that is, the use of computers for optimization and numerical approximation in statistical problems. Maximum likelihood and least squares estimation are introduced, and the necessary basics of univariate optimization are covered in relation to descriptive statistics, such as the sample mean and sample variance, and to linear regression analysis. Then the use of numerical optimization routines (e.g., bisection and Newton methods) is discussed in relation to logistic regression analysis.
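To give a flavour of iteration-based optimization, here is a minimal sketch of Newton's method for the maximum likelihood estimate of a single slope in a logistic regression without intercept. The course itself works in R; Python is used here only for illustration. The model without intercept, the toy data, and the names `score`, `information`, and `newton_logistic_slope` are assumptions chosen to keep the problem univariate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(beta, x, y):
    """First derivative of the log-likelihood with respect to beta."""
    return sum(xi * (yi - sigmoid(beta * xi)) for xi, yi in zip(x, y))


def information(beta, x):
    """Negative second derivative of the log-likelihood."""
    total = 0.0
    for xi in x:
        p = sigmoid(beta * xi)
        total += xi * xi * p * (1.0 - p)
    return total


def newton_logistic_slope(x, y, beta=0.0, tol=1e-10, max_iter=50):
    """Newton iterations: beta <- beta + score / information."""
    for _ in range(max_iter):
        step = score(beta, x, y) / information(beta, x)
        beta += step
        if abs(step) < tol:  # stop once the update is negligible
            break
    return beta


# Hypothetical toy data; the classes overlap, so the estimate is finite.
x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

if __name__ == "__main__":
    beta_hat = newton_logistic_slope(x, y)
    print("estimated slope:", beta_hat)
    print("score at estimate:", score(beta_hat, x, y))
```

At the maximum likelihood estimate the score is zero, which gives a simple check that the iterations have converged. A bisection method would instead search for the root of the same score function over a bracketing interval, without using the second derivative.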
The final grade will be based on two items:
- Four homework assignments, of which the best three each count for 15% of the final grade.
- A written exam (closed book) which counts for the remaining 55% of the final grade.
If you score less than 50% of the points on the final exam, you fail the course, regardless of the points you collected on the homework assignments. In that case, your grade will be the minimum of 5 and the grade you achieved. However, you are allowed to take the second chance exam. The grade of the second chance exam replaces the grade of the first exam; your homework assignments always count for 45% of your grade.
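The grading rule above can be sketched as a small calculation. This is an illustrative reading only, under several assumptions that the text does not state: all grades are on a 0 to 10 scale, "less than 50% of the points" corresponds to an exam grade below 5.0, the same cap applies to the second chance exam, and no rounding is applied. The function name `final_grade` is hypothetical.

```python
def final_grade(homework, exam, second_chance=None):
    """Combine four homework grades and an exam grade (0-10 scale assumed)."""
    if len(homework) != 4:
        raise ValueError("expected exactly four homework grades")

    # The best three homework grades each count for 15%.
    best_three = sorted(homework, reverse=True)[:3]
    homework_part = 0.15 * sum(best_three)

    # A second chance exam, if taken, replaces the first exam grade.
    exam_used = exam if second_chance is None else second_chance
    grade = homework_part + 0.55 * exam_used

    # Assumption: below 50% on the exam means an exam grade below 5.0,
    # in which case the final grade is capped at 5.
    if exam_used < 5.0:
        grade = min(5.0, grade)
    return grade


if __name__ == "__main__":
    print(final_grade([8, 6, 7, 9], 7.0))
    print(final_grade([10, 10, 10, 10], 4.0))
    print(final_grade([10, 10, 10, 0], 4.0, second_chance=6.0))
```

The second call shows the cap: perfect homework grades cannot lift the final grade above 5 when the exam grade is below the threshold.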