
How to Win a Data Science Competition: Learn from Top Kagglers

Academic year: 2020/2021
Language of instruction: English (ENG)
Credits: 3

Instructor

Course Syllabus

Abstract

This discipline builds on the following courses:
  • Machine learning
  • Data analysis methods
To master the discipline, students need the following knowledge and competencies:
  • Programming methods
  • Linear algebra
  • Probability and statistics
The main provisions of the discipline can be used in students' professional activities.
Course page: https://www.coursera.org/learn/competitive-data-science
Learning Objectives

  • The purpose of the discipline is to introduce modern methods of data analysis and machine learning and their use in data analysis competitions
Expected Learning Outcomes

  • Be able to choose a data processing method and apply it to the data
  • Be able to choose a cross-validation method and evaluate the quality of the selected data processing method
  • Be able to solve the problems posed in data analysis competitions
Course Contents

  • Data processing
    Predictive modeling tasks. Data preprocessing. Collecting and processing data from various sources, including text and images. Advanced feature engineering techniques: generating mean encodings, using aggregated statistical measures, and finding nearest neighbors as a means to improve predictions
  • Methodology of cross-validation
    Reliable cross-validation methodologies. Overfitting and underfitting. Advanced techniques to overcome overfitting and underfitting. Analysis and interpretation of data. Comparison of data analysis algorithms
  • Competitions in data analysis
    Examples of competitions. General principles for approaching a data analysis competition problem. Key issues for a good solution
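
The mean encodings and cross-validation topics listed above can be illustrated together. The following is a minimal sketch, not course material: it assumes a single categorical feature, a numeric target, contiguous folds without shuffling, and a fallback to the global target mean for categories unseen in the training folds. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def kfold_indices(n_rows, n_folds):
    """Split row indices 0..n_rows-1 into n_folds contiguous validation folds."""
    fold_sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
                  for i in range(n_folds)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_mean_encoding(categories, targets, n_folds=3):
    """Encode each row's category by the target mean computed on the other folds.

    Computing the mean only on rows outside the current fold keeps a row's own
    target out of its encoding, which limits target leakage and overfitting.
    """
    global_mean = sum(targets) / len(targets)
    encoded = [global_mean] * len(targets)
    for val_idx in kfold_indices(len(targets), n_folds):
        val_set = set(val_idx)
        sums, counts = defaultdict(float), defaultdict(int)
        # Accumulate per-category target statistics on the training folds only.
        for i, (cat, y) in enumerate(zip(categories, targets)):
            if i not in val_set:
                sums[cat] += y
                counts[cat] += 1
        # Encode validation rows; unseen categories keep the global mean.
        for i in val_idx:
            cat = categories[i]
            if counts[cat] > 0:
                encoded[i] = sums[cat] / counts[cat]
    return encoded


# Hypothetical toy data: one categorical feature and a binary target.
categories = ["a", "b", "a", "b", "a", "b"]
targets = [1, 0, 1, 1, 0, 0]
encoded = out_of_fold_mean_encoding(categories, targets, n_folds=3)
print(encoded)
```

In practice the folds are usually shuffled or stratified, and the per-category mean is often smoothed toward the global mean; both are omitted here to keep the sketch short.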
Assessment Elements

  • non-blocking Online test
  • non-blocking Exam
Interim Assessment

  • Interim assessment (module 3)
    0.3 * Exam + 0.7 * Online test
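
As a worked illustration of the weighting above, the following sketch computes the interim grade from two component scores. The scores used are hypothetical, and the syllabus does not state the grading scale or any rounding rule, so none is applied here.

```python
def interim_grade(exam, online_test):
    """Weighted interim grade: 30% exam, 70% online test (weights from the syllabus)."""
    return 0.3 * exam + 0.7 * online_test


# Hypothetical component scores, for illustration only.
grade = interim_grade(exam=8, online_test=6)
print(round(grade, 2))
```

Because the online test carries more than twice the weight of the exam, a one-point change in the online test moves the interim grade by 0.7, against 0.3 for the exam.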
Bibliography

Recommended Core Bibliography

  • Muller, A. C., & Guido, S. (2017). Introduction to machine learning with Python: a guide for data scientists. O’Reilly Media. (HSE access: http://ebookcentral.proquest.com/lib/hselibrary-ebooks/detail.action?docID=4698164)

Recommended Additional Bibliography

  • Witten, I. H., et al. (2017). Data mining: Practical machine learning tools and techniques. Morgan Kaufmann. 654 pp.