This course is an introduction to machine learning in R with applications to finance. Upon completion of this course, you should be able to:
1) Use the free software R to carry out key “big data” tasks, including loading and cleaning large datasets, using R packages, summarizing and visualizing data, and making predictions.
2) Apply supervised and unsupervised learning to prediction problems in finance involving big data, for example predicting future stock prices, constructing tradable portfolios, and assessing credit risk.
3) Analyze the main benefits and limitations of the supervised and unsupervised learning methods covered in class.
4) Evaluate and compare the performance of different methods.
The course combines theory and practice, with a focus on applying Big Data methods to finance. You will learn why, when, and how to apply "Big Data methodology" to real-world situations. In the main part of the course, we will explore how to analyze panel data, how to use machine learning techniques to make predictions, and how to evaluate those predictions. While most of the focus will be on supervised machine learning in the context of portfolio theory, the methodology you will learn can be applied in many other domains. More specifically, you will learn how to analyze large-scale business data in practice using machine learning techniques. The course covers:
1) Introduction to statistical computing. You will learn the basics of using computers to analyze big data, with special emphasis on R and its most common big data libraries.
2) Working with data. Data is rarely found in perfectly usable form. You will learn how to clean the data to make it usable.
3) Supervised learning. We cover the basics, including regression and classification, the two basic supervised learning problems, as well as more advanced techniques such as decision trees, support vector machines, and boosting.
4) Unsupervised learning. We also study unsupervised learning techniques, whose purpose is to discover hidden patterns or data groupings without the need for human intervention, and apply them to portfolio theory and asset pricing.
5) Fitting and overfitting. Big Data allows fitting very flexible models, which permits learning subtle features of the data. This creates the danger of overfitting, where a model that fits the data used to estimate it fails out of sample. Controlling overfitting is one of the central tasks in the analysis of Big Data.
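
To give a flavor of the overfitting problem described in the last topic, here is a minimal sketch in base R. It is an illustration only, not course material: the simulated data, the sample sizes, the polynomial degrees, and the helper `mse` are all assumptions chosen for the example. It fits polynomials of increasing flexibility to a small training sample and compares the in-sample error with the error on fresh data.

```r
# Simulated data (assumed for illustration): y = sin(2*pi*x) + noise
set.seed(1)
n_train <- 30
n_test  <- 1000

train <- data.frame(x = runif(n_train))
train$y <- sin(2 * pi * train$x) + rnorm(n_train, sd = 0.3)

test <- data.frame(x = runif(n_test))
test$y <- sin(2 * pi * test$x) + rnorm(n_test, sd = 0.3)

# Hypothetical helper: mean squared prediction error
mse <- function(actual, predicted) mean((actual - predicted)^2)

# Fit polynomials of increasing flexibility to the training sample only
degrees <- c(1, 3, 10)
results <- data.frame(degree = degrees, train_mse = NA_real_, test_mse = NA_real_)

for (i in seq_along(degrees)) {
  d <- degrees[i]
  fit <- lm(y ~ poly(x, d), data = train)
  results$train_mse[i] <- mse(train$y, predict(fit, newdata = train))  # in sample
  results$test_mse[i]  <- mse(test$y,  predict(fit, newdata = test))   # out of sample
}

print(results)
```

Because the models are nested, the training error cannot increase as the degree grows. The out-of-sample error has no such guarantee, and for very flexible fits it typically ends up well above the training error. Comparing the two columns is the simplest diagnostic for overfitting.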