Do liars use specific words and sentence structures more often than people who tell the truth? Can a computer judge the quality of a text and make suggestions for improvement? Have political speeches become more polarizing in the last decade? How can a publisher turn a newspaper archive with millions of articles into an online database that can be browsed by topic?
These are just a few of the questions that fall within the scope of ‘text mining’, an umbrella term for processes that extract high-quality information from text, mostly by means of advanced computational techniques. In this course, students learn how these techniques work and apply them in the context of scientific research. More specifically, by the end of this course, the student will be able to:
- search and segment text with regular expressions;
- pre-process unstructured text for a specific computational analysis;
- calculate type-token ratios and other measures of lexical diversity;
- describe the computational techniques that underlie state-of-the-art applications for text categorization, sentiment analysis, and topic modeling;
- identify the strengths and weaknesses of these applications;
- describe the possibilities and limitations of several open-access text corpora (e.g., OpenSONAR, COCA, Google Corpus);
- use the above applications and corpora to answer simple research questions.
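To give a flavor of the first skills in this list, the sketch below segments a text with a regular expression and computes a type-token ratio (the number of distinct word forms divided by the total number of words). It is only an illustration, not course material: the function names `tokenize` and `type_token_ratio`, the sample sentence, and the simple tokenization pattern are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens.

    The pattern is a deliberately simple assumption: runs of the letters
    a-z, optionally followed by an apostrophe and more letters (e.g. "don't").
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    """Return distinct tokens (types) divided by total tokens."""
    if not tokens:
        return 0.0  # avoid division by zero for empty input
    return len(set(tokens)) / len(tokens)


# Hypothetical sample text, chosen only for illustration.
sample = "The cat sat on the mat. The dog sat too."
tokens = tokenize(sample)
ttr = type_token_ratio(tokens)
print(len(tokens), len(set(tokens)), ttr)
```

Because the type-token ratio tends to fall as a text gets longer, it is usually compared only across texts of similar length.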
This course has a maximum capacity of 40 participants. If you filled out the survey you recently received (selecting master courses CIS, Fall semester 2017), you will be enrolled automatically. All other students must send an e-mail to mastercoursesCIW@uvt.nl and will be enrolled if places are available.
Attendance at the meetings is mandatory, and active participation is required. The grade is based on weekly individual assignments during the course (together 50% of the grade) and a research project (the other 50%).
Required Prerequisites
No prior experience with programming or machine learning is required.