Course module: 880091-M-3
Research Skills: Text Mining
Course info
Course module: 880091-M-3
Credits (ECTS): 3
Category: MA (Master)
Course type: Course
Language of instruction: English
Offered by: Tilburg University; Tilburg School of Humanities and Digital Sciences; TSH Other
Is part of
M Communication and Information Sciences (research)
M Culture Studies
M Communication and Information Sciences
M Linguistics and Communication Sciences (research)
Convenant TSH
C.D. Emmery, MSc
dr. E.A. Keuleers
Academic year: 2016
Remarks: This information is not up to date. Check the Course Catalog 2018 or select the course via “Register”.
Registration open: not known yet

Do liars use specific words and sentence structures more often than people who tell the truth? Can a computer judge the quality of a text and make suggestions for improvement? Have political speeches become more polarizing in the last decade? How can a publisher turn a newspaper archive with millions of articles into an online database that can be browsed by topic?

These are just a few of the questions that fall within the scope of ‘text mining’, an umbrella term for various processes that extract high-quality information from text, mostly by relying on advanced computational techniques. In this course, students learn how these techniques work and how to apply them in scientific research. More specifically, by the end of this course, the student can:

- search and segment text with regular expressions;

- pre-process unstructured text for a specific computational analysis;

- calculate type-token ratios and other measures of lexical diversity;

- describe the computational techniques that underlie state-of-the-art applications for text categorization, sentiment analysis, and topic modeling;

- identify the strengths and weaknesses of these applications;

- describe the possibilities and limitations of several open-access text corpora (e.g., OpenSONAR, COCA, Google Corpus);

- use the above applications and corpora to answer simple research questions.
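As an illustration only (this sketch is not taken from the course materials), the first and third objectives can be approximated in Python with the standard library alone. The sentence-splitting rule, the `\w+` token pattern, the function names and the window size are all assumptions chosen for the example; real tokenizers treat abbreviations, hyphens and clitics more carefully. The moving-average type-token ratio (MATTR) is included as one example of the "other measures" of lexical diversity, because the raw type-token ratio tends to drop as a text gets longer.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive assumption: a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of word characters as tokens (punctuation is dropped).
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    # Number of distinct word forms (types) divided by the number of tokens.
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def moving_average_ttr(tokens: list[str], window: int = 50) -> float:
    # MATTR: mean TTR over every run of `window` consecutive tokens.
    # Falls back to the plain TTR when the text is shorter than one window.
    if len(tokens) < window:
        return type_token_ratio(tokens)
    ratios = [
        type_token_ratio(tokens[i:i + window])
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog sat too!"
    print(split_sentences(sample))   # two sentences
    tokens = tokenize(sample)
    print(len(tokens), len(set(tokens)))  # 10 tokens, 7 types
    print(round(type_token_ratio(tokens), 3))       # 0.7
    print(round(moving_average_ttr(tokens, 5), 3))  # 0.867
```

With a window of 5 on the ten-token sample, MATTR averages six windowed ratios instead of relying on one ratio for the whole text, which makes scores for texts of different lengths easier to compare.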


This course has a maximum capacity of 40 participants. If you filled out the survey you recently received for selecting CIS master's courses for the Fall semester 2017, you will be enrolled automatically. All other students have to send an e-mail to: and will be enrolled if places are available.

Attendance at the meetings is compulsory and active participation is required. The grade is based on weekly individual assignments during the course (together 50% of the grade) and a research project (the other 50%).

Required Prerequisites

No prior experience with programming or machine learning is required.

The course consists of weekly two-hour meetings over a period of seven weeks. These are a mix of lectures and seminars in which students gain hands-on experience with several text mining tools (including, but not limited to, those described above) and text corpora.

Type of instructions

Lectures and seminars

Type of exams

Assignments + paper (see Specifics)

Compulsory Reading
  1. Will be announced during the course

Recommended Reading
  1. Will be announced during the course
