Data sets (e.g., retrieved from a company’s database, collected via surveys, or scraped from the Internet) need to undergo a careful cleaning and transformation procedure before they can be used in empirical research projects 1. In this course, you will therefore familiarize yourself with data structures common in empirical marketing research, and learn how to efficiently engineer complex data sets and document them for reusability and reproducibility.
After successful completion of this course, you will be able to:
- Use R to read in various data formats for further processing
- Apply common data operations in R to transform and clean your data (e.g., aggregation, merging, de-duplication, reshaping, date conversions, regular expressions)
- Use basic programming concepts to increase speed and minimize errors (e.g., looping, vectorization, writing functions, handling errors/debugging)
- Operationalize variables/engineer features from numerical, textual, and visual raw data
- Store and manage data using file-based systems and databases
- Use workflow management techniques to create and audit automated and reproducible data pipelines
- Version code and manage and contribute to GitHub repositories
- Document and archive final data sets, and learn how to make them available for public (re)use
Students pass this course if the final course grade (i.e., the weighted average of the group project and exam; weights indicated below) is ≥ 5.5, and the exam is passed (≥ 5.5).
- Group project (4-5 team members) with peer assessment 1 (40%)
1 Self- and peer assessment: The team project is subject to self- and peer assessment, i.e., students’ grades will be corrected upwards or downwards, depending on their own contribution to the overall team effort. Students provide written feedback to each other once during the course, and score themselves and their team members on, among other things, the quantity and quality of their contributions.
- Hybrid format: Jupyter notebooks or pre-recorded web clips for preparation and self-paced lab sessions; live streams on Zoom for feedback and joint coding sessions (recordings will be made available)
- Modern content: copy-paste code snippets and demos from the course page, access code on GitHub, start projects with workflow templates
- Interactive, immersive and student-centred: live coding, hackathon, debates, working with real data sets
Student profile / prerequisites
- The course is taught to MSc students in the Marketing Analytics (TiSEM) program.
- The course expects students to have acquired a working knowledge of R (e.g., from introductory courses at DataCamp).
- The course welcomes novices, who are expected to do extra preparation before the course starts. Preparation material will be shared with students in advance in the form of R Notebooks or course recommendations at DataCamp. Novices may further benefit from following other courses at Tilburg University in which R is used.
- Students are advised to use their own computer for this course (Windows, Mac or Linux). Android/Chromebook/iOS devices are not supported.
Enrollment and Obtaining Course Credits
- The course (3 ECTS) will be taught in the Marketing Analytics Program at Tilburg University (please check Osiris for the specifics).
- Interested Research Master or PhD students who seek to advance their data collection skills can audit this course with the approval of the instructor and their coordinator.