Students in this course will learn how to design and build infrastructures for retrieving, integrating, managing, and querying data. Emphasis will be placed both on the theoretical principles (presented and discussed during the lectures) and on the best practical mechanisms (illustrated and practiced during the lab sessions) that enable the smooth and optimal use of data.
At the end of the course, participants will know the models and methodologies that are currently most commonly used for managing data and be able to quickly learn how to use contemporary technologies and infrastructures for interactive data management. After the successful completion of the course, students will be able to:
- Discuss data formats and characteristics using currently popular applications and list the challenges for collecting, integrating, and managing such data.
- Design and build tools for retrieving, integrating, managing, and querying data.
- Explain and relate models and methodologies that are currently most commonly used for interactive management of data.
- Review possible infrastructures for a given purpose and justify the selection of the most suitable one.
- Quickly master and make practical use of contemporary technologies and infrastructures.
The course consists of lectures. The primary topics covered in the lectures each week are the following (slight modifications are possible):
Weeks 1-2: Relational database model and Structured Query Language (SQL) for representing, creating, updating, and querying databases. The provided material (lecture presentations, lab exercises, online tutorials) will serve as a refresher for students who already have the related knowledge, and will also include detailed explanations for students who are encountering the topic for the first time.
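As a small illustration of the four kinds of SQL statements mentioned above (creating, updating, and querying a database), the following sketch uses Python's built-in `sqlite3` module with an in-memory database. The `student` table and its columns are hypothetical examples, not part of the course material.

```python
import sqlite3

# In-memory SQLite database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define the structure of a table
cur.execute("""
    CREATE TABLE student (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        credits INTEGER NOT NULL DEFAULT 0
    )
""")

# INSERT: add rows, using ? placeholders instead of string formatting
cur.executemany(
    "INSERT INTO student (name, credits) VALUES (?, ?)",
    [("Ada", 30), ("Alan", 15), ("Grace", 45)],
)

# UPDATE: modify existing rows
cur.execute("UPDATE student SET credits = credits + 5 WHERE name = ?", ("Alan",))

# SELECT: query with a filter and an ordering
rows = cur.execute(
    "SELECT name, credits FROM student WHERE credits >= 20 ORDER BY credits DESC"
).fetchall()
conn.commit()
print(rows)  # [('Grace', 45), ('Ada', 30), ('Alan', 20)]
```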
Week 3: Analytic skills for conceptual modelling, e.g., ontologies and taxonomies. Discussion of the Entity Relationship (ER) diagram, which is typically used to design and represent relational databases. This includes translating a natural language specification into an ER diagram as well as deriving a relational database schema from an ER diagram.
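One common way to go from an ER diagram to a relational schema is sketched below, assuming a hypothetical diagram with two entities (`Student`, `Course`) and a many-to-many relationship `Enrolls` that carries a `grade` attribute. Each entity becomes a table, and the M:N relationship becomes its own table whose primary key combines the foreign keys of the participating entities. This is one standard mapping rule, shown here in SQLite through Python.

```python
import sqlite3

# Hypothetical ER diagram:
#   entity Student(student_id, name)
#   entity Course(course_id, title)
#   M:N relationship Enrolls(Student, Course) with attribute grade
SCHEMA = """
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE enrolls (
    student_id INTEGER REFERENCES student(student_id),
    course_id  INTEGER REFERENCES course(course_id),
    grade      REAL,
    PRIMARY KEY (student_id, course_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)

conn.execute("INSERT INTO student VALUES (1, 'Ada')")
conn.execute("INSERT INTO course VALUES (10, 'Data Management')")
conn.execute("INSERT INTO enrolls VALUES (1, 10, 8.5)")

# An enrollment that refers to a non-existent student violates the foreign key.
try:
    conn.execute("INSERT INTO enrolls VALUES (99, 10, 7.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables, fk_enforced)  # ['course', 'enrolls', 'student'] True
```

A 1:N relationship would instead usually be mapped by adding a foreign key column to the table on the "many" side, without a separate relationship table.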
Week 4: Practical aspects of using contemporary technologies and infrastructures, for example the available relational database management systems (e.g., MySQL, SQLite), importing and exporting database data, and connecting to a database programmatically (in Python).
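A minimal sketch of importing and exporting data through a Python database connection follows. It uses the standard-library `csv` and `sqlite3` modules. The CSV content and table are hypothetical, and `io.StringIO` stands in for real files so that the example is self-contained. Drivers for other systems such as MySQL follow the same Python DB-API pattern (connect, execute, fetch), although their placeholder syntax may differ (for example `%s` instead of `?` or `:name`).

```python
import csv
import io
import sqlite3

# Hypothetical CSV input; in practice this would be read from a file.
CSV_IN = "name,credits\nAda,30\nAlan,15\nGrace,45\n"

conn = sqlite3.connect(":memory:")  # a path such as "course.db" would persist the data
conn.execute("CREATE TABLE student (name TEXT, credits INTEGER)")

# Import: parse the CSV rows into dicts and bulk-insert them via named placeholders
reader = csv.DictReader(io.StringIO(CSV_IN))
conn.executemany(
    "INSERT INTO student (name, credits) VALUES (:name, :credits)",
    reader,
)
conn.commit()

# Export: write a query result (header row + data rows) back out as CSV
out = io.StringIO()
writer = csv.writer(out)
cur = conn.execute("SELECT name, credits FROM student ORDER BY name")
writer.writerow([col[0] for col in cur.description])  # column names
writer.writerows(cur)
conn.close()
print(out.getvalue())
```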
Week 5: Overview of the benefits, models, and practical usage of currently popular data-intensive systems. These include parallel and distributed data processing (MapReduce, etc.), data warehousing, column-based access (SAP S/4HANA), online analytical processing (ROLAP, MOLAP, etc.), graph databases, data stream management systems, and approximate query processing.
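To make the MapReduce model more concrete, the sketch below simulates its three phases (map, shuffle, reduce) for the classic word-count task in a single Python process. It is only an illustration of the programming model, not of any particular framework: a real system distributes the map and reduce tasks across many machines and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Emit an intermediate (key, value) pair for every word
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all intermediate values by key (handled by the framework in practice)
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values collected for one key into a single result
    return key, sum(values)


documents = ["the quick brown fox", "the lazy dog", "The fox"]
intermediate = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["the"], counts["fox"])  # 3 2
```

Because each `map_phase` call depends on only one document, and each `reduce_phase` call depends on only one key, both phases can run in parallel. This independence is what makes the model scale.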
Week 6: Database normalization, i.e., structuring a relational database according to a series of so-called normal forms in order to reduce data redundancy and improve data integrity.
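The following sketch illustrates one normalization step on a hypothetical relation `enrollment(student_id, course_id, course_title)` with key `(student_id, course_id)`. The functional dependency `course_id -> course_title` depends on only part of the key, so the relation is not in second normal form (2NF), and each course title is repeated for every enrolled student. Relations are modelled as Python sets of tuples, and the helper names are illustrative only. The code checks the dependency, decomposes the relation, and confirms that a natural join recovers the original data.

```python
# Hypothetical relation enrollment(student_id, course_id, course_title)
enrollment = {
    (1, "DB101", "Databases"),
    (2, "DB101", "Databases"),
    (3, "DB101", "Databases"),
    (1, "ML201", "Machine Learning"),
}


def fd_holds(rows, lhs, rhs):
    """Check the functional dependency lhs -> rhs (columns given by index)."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def project(rows, cols):
    """Relational projection: keep the given columns and drop duplicate tuples."""
    return {tuple(row[i] for i in cols) for row in rows}


assert fd_holds(enrollment, lhs=[1], rhs=[2])  # course_id -> course_title

# Decompose into two 2NF relations; each title is now stored only once.
enrolls = project(enrollment, [0, 1])  # (student_id, course_id)
course = project(enrollment, [1, 2])   # (course_id, course_title)

# Natural join on course_id reconstructs the original relation (lossless).
rejoined = {(s, c, t) for (s, c) in enrolls for (c2, t) in course if c == c2}
print(len(course), rejoined == enrollment)  # 2 True
```

The decomposition is lossless because the shared attribute `course_id` is a key of the `course` relation.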
Week 7: Connecting the various topics. Exam preparation.
Knowledge of the following computer science areas is expected: (i) data structures, (ii) data modelling and databases, and (iii) basic programming. Participants are expected to study these topics on their own before the course, or during the first couple of weeks of the course, using the material provided on the course Canvas page.
Types of instructions
Types of exams
30% team project, split into two milestones
70% final written exam