MARCO LIPPI

Department of Engineering Sciences and Methods

Course: Advanced Information Systems

Class: MANAGEMENT ENGINEERING (D.M.270/04) (Offer 2017)
  • CFU: 9
  • SSD: ING-INF/05

Objectives

To introduce students to advanced aspects of data science and data management for Business Intelligence, Data Warehousing, and Information Integration.

Prerequisites

Information Systems (Sistemi Informativi)

Course Syllabus

Data Science:
- Statistics background
-- Descriptive and inferential statistics, data sampling, hypothesis testing
- Machine learning
-- Supervised and unsupervised learning
-- Performance measures for classification and regression
-- Decision trees, support vector machines, neural networks, clustering, regression tasks
- Data visualization
- Software: R

Data Management:
- Data Warehouses
-- Conceptual design: Dimensional Fact Model (DFM)
-- Logical design: star schema and snowflake schema
- Data Warehouse and SQL
-- SQL background: join and grouping
-- Software: MySQL
- Data analysis over data warehouses
-- OLAP (On-Line Analytical Processing): roll-up, drill-down
- Data Mining
-- Association Rule Mining
- Big Data
-- Distributed and non-relational databases (NoSQL)
-- Map-Reduce and large-scale databases
-- Software: MongoDB

Reference texts

M. Golfarelli, S. Rizzi. Data Warehouse: Teoria e Pratica della Progettazione - Seconda Edizione. McGraw-Hill, 2006.
G. James, D. Witten, T. Hastie, R. Tibshirani. An Introduction to Statistical Learning with Applications in R. Freely available at http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf
Instructor's lecture notes.

Teaching methods

Lectures, practical exercises, laboratory activities.

Verification of learning

Oral exam. Presenting and discussing a project on one of the course topics, chosen by the student and approved by the instructor, is a mandatory part of the exam. The project can be carried out in groups of two students (three only in exceptional cases, to be agreed with the instructor). The project should not require an excessive amount of work. The final mark is weighted approximately as follows:
- Oral exam: 70%
- Project: 30%

Expected results

- Knowledge and understanding
-- Through lectures, students will gain a deep knowledge and understanding of systems for data analysis and management, with a specific focus on large data collections.
- Applying knowledge and understanding
-- Through classroom exercises and practical computer exercises, students will be able to use the advanced features of the standard language to apply the knowledge gained to the design and implementation of data warehouse systems, and to use the R programming language to build analytic and predictive models.
- Making judgments
-- By carrying out a project and solving individual and laboratory exercises, students will be able to critically evaluate the design and implementation choices made and the results obtained.
- Communication skills
-- The oral exam, which includes the presentation of the completed project, will train students to organize and clearly present the results of their work using appropriate technical language.
- Learning skills
-- The activities carried out during the course and the examination give students the tools to update their knowledge independently. This is especially important in advanced data science and data management, where technology is constantly evolving.