After completing this course:
- The student can describe the main principles of a “data science” approach to behavioral and social science research.
- The student can compare and contrast the “data science” approach with the traditional statistical paradigms used in behavioral and social science research.
- The student can use principal component analysis and cluster analysis to find structure in complex behavioral and social science data.
- The student can use linear regression and logistic regression to build predictive models that generalize to unseen data.
- The student can use a trained linear regression or logistic regression model for prediction or classification, respectively.
Our society is becoming increasingly data-driven. According to a May 2013 article in ScienceDaily, “A full 90 percent of all the data in the world has been generated over the last two years.” These massive amounts of data hold a great deal of information, and a “data science” approach can help us learn from them. Data science methods are used to derive knowledge from data in academic research, companies, government agencies, and any other organization that wants to base its decisions on data. This course offers an introduction to data science methods for social and behavioral science research. Upon completing it, students will have the skills needed to apply statistical data science techniques to summarize and visualize complex data, discover patterns, and predict outcomes and trends for unseen data. Topics include prediction, classification, clustering, dimension reduction, shrinkage methods, and more.
During the course, students complete two group assignments in which they apply their data science skills to real behavioral and social science data. The assignments are carried out in the open-source statistical software platform R. The course concludes with a written exam.
This course is compulsory for students in the Psychological Methods and Data Analysis major.
Familiarity with basic statistics, in particular linear regression, is assumed.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. New York: Springer.