What is Data Science?

Academic year: 2024/2025
Language of instruction: English
Credits: 3
Status: Compulsory course
When taught: 1st year, 4th module

Instructor

Course Syllabus

Abstract

During this course, students become acquainted with the mathematical concepts underlying data science and master basic methods of collecting, processing, and transforming data using Python. The course covers data preprocessing, visualization, classical machine learning methods, and deep neural networks. It includes the basics of classification, regression, image recognition, and natural language processing.
Learning Objectives

  • To form an understanding of the various ways of working with data.
  • To become acquainted with methods of data preprocessing and visualization.
  • To develop the ability to create data analysis software that employs machine learning methods and deep neural networks.
Expected Learning Outcomes

  • Students are aware of the basic concepts of deep learning for NLP and can implement them in Python: recurrent neural networks, convolutional networks, pooling, attention mechanism, transformer.
  • Students are aware of the concept of cluster analysis and can write a Python program for it
  • Students are aware of basic concepts of deep learning: tensor, model weights, layers, various activation functions, loss function and metrics, optimization methods, softmax and cross-entropy, dropout, batches, stochastic gradient descent, epoch, batch normalization.
  • Students are aware of basic concepts of linear algebra: matrices, vectors, matrix algebra, matrix decompositions.
  • Students are aware of basic concepts of machine learning: models, problems of ML, overfitting and underfitting, correctness, bias-variance tradeoff, feature extraction and selection.
  • Students are aware of basic concepts of natural language processing and its applications
  • Students are aware of basic concepts of neural networks: artificial neuron, activation function, perceptron, backpropagation
  • Students are aware of basic concepts of probability theory: dependence and independence, conditional probability, Bayes's theorem, normal distribution.
  • Students are aware of basics of Python: variables, expressions, control structures, functions, classes, exceptions, files
  • Students are aware of basics of statistics: average, dispersion and standard deviation, correlation.
  • Students are aware of the concept and can write a Python program for decision trees
  • Students are aware of the concept and can write a Python program for k-nearest neighbors classification
  • Students are aware of the concept and can write a Python program for naive Bayes classification
  • Students are aware of the concept and can write a Python program for regression analysis
  • Students are aware of the concept of PCA and can write a Python program that performs it
  • Students are aware of the concepts and can write Python programs for nonlinear dimension reduction and manifold learning
  • Students are aware of the gradient descent method
  • Students are aware of statistical hypothesis testing, confidence interval, Bayesian inference.
  • Students can visualize data using Python. They can use matplotlib to draw bar charts, line charts, and scatter plots.
  • Students can use Python to create a bag-of-words text representation and use TF-IDF for term weighting
  • Students can use Python to create deep networks for image recognition: dense layers, convolutional layers, pooling layers, augmentation.
  • Students can use Python to create word embeddings
  • Students can use Python to perform text preprocessing: word normalization (spelling correction, stemming, lemmatization, stopword removal, case folding), tokenization, and creation of n-grams.
  • Students can write Python programs to explore, represent, clean, manipulate, and rescale data.
  • Students can write Python programs to read data files and scrape the Web
  • Students can write Python programs to work with regular expressions, time and date, perform one-hot vectorization.
Course Contents

  • Python
  • Visualization of data
  • Linear algebra
  • Statistics
  • Probability
  • Hypothesis and inference
  • Gradient descent
  • Getting data
  • Working with data
  • Data pre-processing
  • Linear dimension reduction
  • Nonlinear dimension reduction
  • Machine learning
  • k-nearest neighbors
  • Naive Bayes classification
  • Regression
  • Decision trees
  • Clustering
  • Neural networks
  • Deep learning
  • Image recognition
  • Natural language processing
  • Text preprocessing and building vocabulary
  • Bag of words
  • Word embedding
  • Deep network architecture for NLP
Assessment Elements

  • Linear algebra (non-blocking)
  • Statistics (non-blocking)
  • Probability (non-blocking)
  • Hypothesis and inference (non-blocking)
  • Gradient descent (non-blocking)
  • Getting data (non-blocking)
  • Working with data (non-blocking)
  • Dimension reduction (non-blocking)
  • Basic concepts of machine learning (non-blocking)
  • k-nearest neighbors (non-blocking)
  • Naive Bayes classifier (non-blocking)
  • Neural networks and deep learning (non-blocking)
  • Image recognition (non-blocking)
Interim Assessment

  • 2024/2025 4th module
    0.077 * Basic concepts of machine learning + 0.077 * Dimension reduction + 0.077 * Getting data + 0.077 * Gradient descent + 0.077 * Hypothesis and inference + 0.076 * Image recognition + 0.077 * Linear algebra + 0.077 * Naive Bayes classifier + 0.077 * Neural networks and deep learning + 0.077 * Probability + 0.077 * Statistics + 0.077 * Working with data + 0.077 * k-nearest neighbors
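
The formula above is a plain weighted sum of the thirteen assessment elements. As a minimal sketch, the calculation could look like this in Python. The weights are copied from the formula; the helper name `interim_grade`, the example scores, and the assumption that every element is scored on the same numeric scale are illustrative only and are not stated in this syllabus. Any rounding rule applied to the result is likewise not specified here.

```python
# Weights copied from the interim assessment formula in this syllabus.
WEIGHTS = {
    "Basic concepts of machine learning": 0.077,
    "Dimension reduction": 0.077,
    "Getting data": 0.077,
    "Gradient descent": 0.077,
    "Hypothesis and inference": 0.077,
    "Image recognition": 0.076,
    "Linear algebra": 0.077,
    "Naive Bayes classifier": 0.077,
    "Neural networks and deep learning": 0.077,
    "Probability": 0.077,
    "Statistics": 0.077,
    "Working with data": 0.077,
    "k-nearest neighbors": 0.077,
}


def interim_grade(scores):
    """Return the weighted sum of element scores (hypothetical helper).

    `scores` maps each assessment element name to a numeric score.
    Every element listed in WEIGHTS must be present.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# The weights add up to 1 (12 * 0.077 + 0.076), up to floating-point rounding.
total_weight = sum(WEIGHTS.values())

# Hypothetical example: the same score of 8 on every element.
example_scores = {name: 8 for name in WEIGHTS}
example_grade = interim_grade(example_scores)
```

Because the weights sum to 1, a student with the same score on every element receives that score as the interim grade, and each element contributes roughly 7.7% of the total.
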
Bibliography

Recommended Core Bibliography

  • Aman Kedia, & Mayank Rasu. (2020). Hands-On Python Natural Language Processing : Explore Tools and Techniques to Analyze and Process Text with a View to Building Real-world NLP Applications. Packt Publishing.
  • Ben Stephenson. (2019). The Python Workbook : A Brief Introduction with Exercises and Solutions (Vol. 2nd ed. 2019). Springer.
  • Bill Lubanovic. (2019). Introducing Python : Modern Computing in Simple Packages. [N.p.]: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2291494
  • Grus, J. (2019). Data Science From Scratch : First Principles with Python (Vol. Second edition). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2102311

Recommended Additional Bibliography

  • Döbler, M., & Grössmann, T. (2019). Data Visualization with Python : Create an Impact with Meaningful Data Insights Using Interactive and Engaging Visuals. Packt Publishing.
  • Embarak, O. (2018). Data Analysis and Visualization Using Python : Analyze Data to Create Visualizations for BI Systems. Apress.
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Retrieved from http://www.deeplearningbook.org
  • Idris, I. (2015). NumPy: Beginner’s Guide - Third Edition (Vol. 3rd edition). Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1018109
  • Muller, A. C., & Guido, S. (2017). Introduction to machine learning with Python: a guide for data scientists. O’Reilly Media. (HSE access: http://ebookcentral.proquest.com/lib/hselibrary-ebooks/detail.action?docID=4698164)
  • Vanderplas, J.T. (2016). Python data science handbook: Essential tools for working with data. Sebastopol, CA: O’Reilly Media, Inc. https://proxylibrary.hse.ru:2119/login.aspx?direct=true&db=nlebk&AN=1425081.
  • Williams, G. (2019). Linear Algebra with Applications (Vol. Ninth edition). Burlington, MA: Jones & Bartlett Learning. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1708709

Authors

  • Kuptsov Pavel Vladimirovich