
How to Win a Data Science Competition: Learn from Top Kagglers

2020/2021
Academic Year
ENG
Instruction in English
3
ECTS credits
Delivered at:
Department of Applied Mathematics and Informatics (Faculty of Informatics, Mathematics, and Computer Science (HSE Nizhny Novgorod))
Course type:
Compulsory course
When:
Year 2, module 3

Instructor


Razvenskaya, Olga

Course Syllabus

Abstract

The study of this discipline is based on the following courses:
  • Machine learning
  • Data analysis methods
To master the discipline, students must have knowledge and competencies in:
  • Programming methods
  • Linear algebra
  • Probability and statistics
Students can apply the main provisions of the discipline in their professional activities.
https://www.coursera.org/learn/competitive-data-science
Learning Objectives

  • The purpose of the discipline is to introduce students to modern methods of data analysis and machine learning and their use in data analysis competitions
Expected Learning Outcomes

  • Be able to choose a data processing method and apply it to the data
  • Be able to choose a cross-validation method and evaluate the quality of the selected data processing method
  • Be able to solve data analysis competition problems
Course Contents

  • Data processing
    Predictive modeling tasks. Data preprocessing. Collection and processing of data from various sources, such as texts and images. Advanced feature engineering techniques: generating mean encodings, using aggregated statistical measures, and finding nearest neighbors as a means to improve predictions
  • Cross-validation methodology
    Reliable cross-validation methodologies. Overfitting and underfitting. Advanced techniques to overcome overfitting and underfitting. Analysis and interpretation of data. Comparison of data analysis algorithms
  • Competitions in data analysis
    Examples of competitions. General principles for approaching a data analysis competition problem. Key issues for a good solution
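As an illustration of how two of the topics above fit together, the sketch below computes a mean encoding of a categorical feature in an out-of-fold manner, so that the encoding of each row never uses that row's own target value. This is a minimal sketch, not course material: the function names, the contiguous fold split, the fallback to the global training mean for unseen categories, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def kfold_indices(n_rows, n_folds):
    """Split row indices 0..n_rows-1 into n_folds contiguous folds (assumed scheme)."""
    base, extra = divmod(n_rows, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_mean_encoding(categories, targets, n_folds=2):
    """Replace each category with the mean target of that category,
    computed only on rows outside the row's own fold."""
    encoded = [0.0] * len(categories)
    for valid_idx in kfold_indices(len(categories), n_folds):
        valid = set(valid_idx)
        sums, counts = defaultdict(float), defaultdict(int)
        total, n_train = 0.0, 0
        # Accumulate per-category target statistics on the training part only.
        for i, (cat, y) in enumerate(zip(categories, targets)):
            if i in valid:
                continue
            sums[cat] += y
            counts[cat] += 1
            total += y
            n_train += 1
        # Categories unseen in the training part fall back to the global mean.
        fallback = total / n_train if n_train else 0.0
        for i in valid_idx:
            cat = categories[i]
            encoded[i] = sums[cat] / counts[cat] if counts[cat] else fallback
    return encoded


# Hypothetical toy data: one categorical feature and a binary target.
categories = ["a", "b", "a", "b", "a", "c"]
targets = [1, 0, 1, 1, 0, 1]
encoded = out_of_fold_mean_encoding(categories, targets, n_folds=2)
print(encoded)
```

Computing the statistics inside each fold split is one common way to limit target leakage, which would otherwise make validation scores look better than they are.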
Assessment Elements

  • non-blocking Online test
  • non-blocking Exam
Interim Assessment

  • Interim assessment (module 3)
    0.3 * Exam + 0.7 * Online test
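The weighting above can be checked with a short calculation. The sketch below is only an illustration: the function name and the sample scores are hypothetical, and the syllabus does not state the grading scale or any rounding rule, so none is applied.

```python
def interim_grade(exam, online_test):
    """Weighted interim grade: 30% exam, 70% online test (weights from the syllabus)."""
    return 0.3 * exam + 0.7 * online_test


# Hypothetical scores, chosen only to show the arithmetic.
grade = interim_grade(exam=8, online_test=6)
print(round(grade, 2))
```

With these sample scores the result is 2.4 + 4.2 = 6.6, so the online test dominates the final grade.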
Bibliography

Recommended Core Bibliography

  • Müller, A. C., & Guido, S. (2017). Introduction to machine learning with Python: a guide for data scientists. O’Reilly Media. (HSE access: http://ebookcentral.proquest.com/lib/hselibrary-ebooks/detail.action?docID=4698164)

Recommended Additional Bibliography

  • Witten, I. H., et al. (2017). Data mining: Practical machine learning tools and techniques. Morgan Kaufmann. 654 pp.