
Introduction to Machine Learning

2020/2021
Academic Year
ENG
Instruction in English
6
Credits

Instructor

Course Syllabus

Abstract

The term "big data" became widespread relatively recently. It designates a new applied area: the search for ways to analyze huge amounts of heterogeneous information quickly and automatically. Big data science is still taking shape, but it is already in great demand, and that demand is expected to grow. It can help with tasks that sound surprising: assessing the state of the liver from a cardiogram, predicting a salary from a job description, or recommending music to a user based on their online profile. Big data can be anything: the results of scientific experiments, bank transaction logs, meteorological observations, social media profiles; in short, anything that is useful to analyze. The most promising approach to big data analysis is considered to be machine learning, a set of methods that let a computer find previously unknown relationships and patterns in data. Researchers at the HSE Faculty of Computer Science and at the School of Data Analysis actively use machine learning and develop new approaches to it, and they teach this course. You will learn the basic types of problems solved by machine learning, mainly classification, regression, and clustering. You will study the basic methods of machine learning and their properties, and learn how to evaluate the quality of models and decide whether a model is suitable for a specific problem. Finally, you will get acquainted with modern libraries that implement these models and the methods for assessing their quality. We will work with real data from real tasks.
Learning Objectives

  • Mastering basic machine learning algorithms that solve a variety of data analysis problems.
  • Mastering the Python libraries that implement various machine learning algorithms.
Expected Learning Outcomes

  • Solves problems and proves statements on the topic of the module.
Course Contents

  • Introduction. Object similarity measures.
    Introduction to cluster analysis. An overview of clustering methodologies. Measuring similarity between objects. Distance for numeric data. Minkowski distance. Proximity measures for symmetric vs. asymmetric binary variables. Distance between categorical attributes, ordinal attributes, and mixed types. Proximity measures between two vectors, cosine similarity. Correlation measures between two variables: covariance and the correlation coefficient.
  • K-means clustering. Hierarchical clustering.
    Partitioning-based clustering methods. K-means clustering method. Initialization of k-means clustering. K-medoids clustering method. K-medians and K-modes clustering methods. Kernel k-means clustering. Hierarchical clustering methods. Agglomerative clustering algorithms. Divisive clustering algorithms. Extensions to hierarchical clustering. BIRCH. Clustering using well-scattered representatives. Graph partitioning on the KNN graph of the data.
  • Density-based clustering. Grid-based clustering.
    DBSCAN. OPTICS. STING. CLIQUE.
  • Clustering validation.
    Methods for clustering validation. Clustering evaluation, measuring clustering quality. Constraint-based clustering. External measures: matching-based measures, entropy-based measures, pairwise measures. Internal measures for clustering validation. Relative measures. Cluster stability. Clustering tendency.
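
As an informal illustration of the first two topics above, the sketch below implements the Minkowski distance, cosine similarity, and a minimal k-means loop (Lloyd's algorithm) using only the Python standard library. It is not course material: the function names, the toy data set, and the choice to pass the initial centroids explicitly are assumptions made for this example, and in practice a library implementation would normally be preferred.

```python
import math


def minkowski_distance(x, y, p=2):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm with Euclidean distance.

    The caller supplies the initial centroids (an assumption of this sketch),
    which keeps the result deterministic.
    """
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: minkowski_distance(pt, centroids[j]))
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
```

On this toy data the first three points end up in cluster 0 and the last three in cluster 1. Passing different initial centroids can change the result, which is why the initialization of k-means appears as a separate topic in the list above.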
Assessment Elements

  • blocking Written test (60 minutes)
  • blocking In-class assessment
  • blocking Written test (60 minutes)
  • blocking In-class assessment
Interim Assessment

  • Interim assessment (module 2)
    0.5 * In-class assessment + 0.5 * Written test (60 minutes)
Bibliography

Recommended Core Bibliography

  • Haroon, D. (2017). Python Machine Learning Case Studies : Five Case Studies for the Data Scientist. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1623520

Recommended Additional Bibliography

  • Baka, B. (2017). Python Data Structures and Algorithms. Birmingham, U.K.: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1528144
  • Lubanovic, B. (2019). Introducing Python : Modern Computing in Simple Packages. [N.p.]: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2291494