Introduction to NLP

Academic year: 2024/2025
Language of instruction: English
ECTS credits: 3
Delivered at: School of Fundamental and Applied Linguistics
Course type: Compulsory course
When: Year 1, Module 4

Course Syllabus

Abstract

Natural Language Processing (NLP) is the field concerned with applying machine learning algorithms to text and speech. The course covers the basics of NLP, the mathematical methods used in NLP, sentiment analysis, working with databases, and related topics.
Learning Objectives

  • Gain knowledge of the statistical methods of Natural Language Processing.
  • Get acquainted with the vector representation of data.
  • Study how models are built in NLP.
  • Get acquainted with NLP libraries.
Expected Learning Outcomes

  • Students can process texts using basic string manipulations, as well as sentiment analysis and topic modeling.
  • Students know the history of the discipline and its subfields.
  • Students have the skills to work with unstructured text data.
  • Students understand the concept of k-nearest neighbors classification and can write a Python program that implements it.
  • Students understand the concept of naive Bayes classification and can write a Python program that implements it.
  • Students can use Python to perform text preprocessing: word normalization (spelling correction, stemming, lemmatization, stopword removal, case folding), tokenization, and creation of n-grams.
  • Students understand the transformer architecture.
  • Students are aware of different types of machine learning techniques, such as supervised and unsupervised learning.
  • Students are aware of ways to collect data by scraping web pages.
  • Students are aware of topic modeling.
  • Students are aware of two topic modeling algorithms: LSA and LDA.
  • Students can apply the basics of topic modeling, are familiar with the main approaches to text summarization, simplification, and generation, and can write example programs in Python.
  • Students are aware of the motivations behind converting human language into mathematical structures.
  • Students are aware of the different types of vector representation techniques.
  • Students can apply NLP transduction and induction processes.
  • Students can apply CoLA, SST-2, and Winograd schemas to solve tasks.
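
As an illustration of the preprocessing outcome above, the following is a minimal sketch of case folding, tokenization, stopword removal, and n-gram creation using only the Python standard library. The function names and the tiny stopword list are hypothetical and chosen for this example; they are not taken from the course materials, which may rely on dedicated NLP libraries instead.

```python
import re

# Hypothetical, deliberately tiny stopword list used only for this sketch.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def tokenize(text):
    """Case-fold the text and split it into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop every token that appears in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = remove_stopwords(tokenize("The cat is on the mat"))
bigrams = ngrams(tokens, 2)
print(tokens)   # ['cat', 'on', 'mat']
print(bigrams)  # [('cat', 'on'), ('on', 'mat')]
```

Stemming, lemmatization, and spelling correction are left out here because they normally depend on language-specific resources.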
Course Contents

  • Introduction
  • Basic Feature Extraction Methods
  • Developing a Text Classifier
  • Collecting Text Data from the Web
  • Topic Modeling
  • Text Summarization and Text Generation
  • Vector Representation
  • Sentiment Analysis
Assessment Elements

  • non-blocking Practical Work 1 "Basic Feature Extraction Methods"
  • non-blocking Practical Work 2 "Developing a Text Classifier"
  • non-blocking "Collecting Text Data from the Web"
  • non-blocking Practical Work 4 "Topic Modeling"
  • non-blocking Practical Work 5 "Text Summarization and Text Generation"
  • non-blocking Practical Work 6 "Vector Representation"
  • non-blocking Practical Work 7 "Sentiment Analysis"
  • non-blocking Practical Work 8 "Model Architecture of the Transformer"
  • non-blocking Practical Work 9 "NLP Task with Transformers"
  • non-blocking Activity in Lectures
  • non-blocking Creative Task
Interim Assessment

  • 2024/2025 4th module
    0.1 * Practical Work 1 "Basic Feature Extraction Methods" + 0.1 * Practical Work 2 "Developing a Text Classifier" + 0.1 * "Collecting Text Data from the Web" + 0.1 * Practical Work 4 "Topic Modeling" + 0.1 * Practical Work 5 "Text Summarization and Text Generation" + 0.1 * Practical Work 6 "Vector Representation" + 0.1 * Practical Work 7 "Sentiment Analysis" + 0.1 * Practical Work 8 "Model Architecture of the Transformer" + 0.1 * Practical Work 9 "NLP Task with Transformers" + 0.05 * Activity in Lectures + 0.05 * Creative Task
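
The weights in the formula above sum to 1.0 (nine elements at 0.1 and two at 0.05). A minimal sketch of the calculation in Python is shown below. The dictionary keys, the function name interim_grade, and the 10-point score scale are assumptions made for illustration and are not specified on this page.

```python
# Weights copied from the interim assessment formula; the keys are shortened labels.
WEIGHTS = {
    "Practical Work 1": 0.1,
    "Practical Work 2": 0.1,
    "Collecting Text Data from the Web": 0.1,
    "Practical Work 4": 0.1,
    "Practical Work 5": 0.1,
    "Practical Work 6": 0.1,
    "Practical Work 7": 0.1,
    "Practical Work 8": 0.1,
    "Practical Work 9": 0.1,
    "Activity in Lectures": 0.05,
    "Creative Task": 0.05,
}


def interim_grade(scores):
    """Return the weighted sum of scores; every assessment element is required."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"Missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Example under the assumed 10-point scale: 8 on every element gives about 8.0 overall.
example_scores = {name: 8 for name in WEIGHTS}
print(round(interim_grade(example_scores), 2))
```

Because the weights sum to 1.0, the result stays on the same scale as the individual scores.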
Bibliography

Recommended Core Bibliography

  • Beysolow, T. (2018). Applied Natural Language Processing with Python : Implementing Machine Learning and Deep Learning Algorithms for Natural Language Processing. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1892182
  • Indurkhya, N., & Damerau, F. J. (2010). Handbook of natural language processing. Chapman and Hall/CRC. 704 pp.
  • Eisenstein, J. (2019). Introduction to natural language processing.
  • Nfn Bahrawi. (2019). Online Realtime Sentiment Analysis Tweets by Utilizing Streaming API Features From Twitter. Jurnal Penelitian Pos Dan Informatika, (1), 53. https://doi.org/10.17933/jppi.2019.090105
  • Pozzi, F., et al. (2016). Sentiment analysis in social networks. Morgan Kaufmann Publishers. Books 24x7 electronic library system.
  • Jurafsky, D. (2009). Speech and language processing. An introduction to natural language processing, computational lin...

Recommended Additional Bibliography

  • Dale, R., Moisl, H., & Somers, H. (Eds.). (2000). Handbook of natural language processing. CRC Press. 1015 pp.
  • Natural Language Processing and Information Systems. (2017). Springer.

Authors

  • Kuptsov Pavel Vladimirovich
  • Stankevich Nataliia Vladimirovna