Fundamentals of Computational Linguistics

Academic year: 2019/2020
Language of instruction: English
ECTS credits: 4
Delivered at: Department of Applied Mathematics and Informatics (Faculty of Informatics, Mathematics, and Computer Science (HSE Nizhny Novgorod))
Course type: Compulsory course
When: 1st year, module 1

Instructor

Course Syllabus

Abstract

The course aims to teach the basics of natural language processing (NLP) and computational linguistics (CL), two vibrant interdisciplinary fields. It covers the methods and approaches used in many real-world NLP applications, such as language modeling, text classification, sentiment analysis, and machine translation. Students will not only use existing NLP libraries and software packages but also learn the principles behind their design and the main mathematical models underlying modern computational linguistics. The course also includes practical programming assignments in Python and experiments on English and Russian texts.
Learning Objectives

  • Know the structural features of natural language texts and the principles of their computer processing in order to obtain linguistic (morphological, syntactic, semantic) information;
  • Be familiar with the methods used to solve complex practical problems of natural language processing, in particular information retrieval, summarization, sentiment analysis, and machine translation;
  • Understand the limitations of existing computer models of natural language processing.
Learning Outcomes

  • Be able to use modern PoS taggers for Russian and English
  • Understand the task of machine translation and its basic algorithms
  • Understand the basic methodology of text processing
  • Understand the pros and cons of evaluation methods in data mining tasks
  • Understand the task of PoS tagging
  • Understand the task of text classification
  • Understand the terminology of computational semantics: ontologies (WordNet) and distributional semantics
  • Understand the terminology of language models; be able to choose an appropriate language model and know when language models are applicable
  • Use text classification algorithms for sentiment analysis and text categorization
  • Be able to choose the right algorithms for text classification
  • Be able to write basic regular expressions for everyday tasks
  • Be able to interpret metrics in ML tasks
  • Be able to understand modern NLP tasks and their terminology
Course Contents

  • Introduction to natural language processing
    Structural features of texts in natural language; ambiguity on all levels of language; the main challenges of natural language processing; basic approaches to problem solving: manually written rules and machine learning.
  • Basic text processing and edit distance
    Preprocessing: tokenization and segmentation; normalization of words: stemming, lemmatization, morphological analyzers; regular expressions; edit distance.
  • Language models
    N-grams; perplexity; methods of smoothing; the use of language models: input prediction, error correction, speech recognition, text generation.
  • Tagging problems and hidden Markov models
    POS tagging; named entity recognition as a tagging problem; hidden Markov models, their advantages and disadvantages; the Viterbi algorithm.
  • Text classification and sentiment analysis
    Classification problems; naive Bayes classifier; text classification; sentiment analysis.
  • Evaluation
    Performance measures: accuracy, precision, recall, F-measure; the state of the art.
  • Basics of machine translation
    Classical approaches: direct, transfer-based, interlingual; statistical machine translation; IBM model; alignment; phrase-based translation models.
  • Computational semantics
    Word senses and meanings; WordNet; semantic similarity measures: thesaurus-based and distributional methods.
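As a minimal sketch of the tokenization and regular-expression topic, the tokenizer below uses one hand-written pattern (an illustrative assumption, not a standard rule set) that keeps hyphenated words and apostrophe contractions together and splits off punctuation. Python's `re` module matches `\w` against Unicode letters by default for `str` patterns, so the same pattern also works on Russian text.

```python
import re

# Illustrative pattern (an assumption): a word, optionally extended by
# hyphen- or apostrophe-joined parts, or any single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split raw text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


print(tokenize("Don't split state-of-the-art models, please!"))
# ["Don't", 'split', 'state-of-the-art', 'models', ',', 'please', '!']
print(tokenize("Привет, мир!"))
# ['Привет', ',', 'мир', '!']
```

Real tokenizers handle many more cases (abbreviations, URLs, numbers with separators), which is why libraries ship larger rule sets.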
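Edit distance, listed under basic text processing, is usually computed with a dynamic-programming table. This sketch assumes unit costs for insertion, deletion, and substitution (Levenshtein distance); some textbooks charge 2 for a substitution, which changes the resulting numbers.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of insertions, deletions and substitutions (cost 1 each)."""
    n, m = len(source), len(target)
    # dist[i][j] holds the distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters of the source prefix
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters of the target prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dist[n][m]


print(edit_distance("kitten", "sitting"))       # 3
print(edit_distance("intention", "execution"))  # 5
```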
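For the language-model topic, a minimal sketch of a bigram model with Laplace (add-one) smoothing and perplexity follows. The toy corpus and function names are hypothetical, and unknown words are not handled (a real model would map them to an `<unk>` token). Perplexity is the exponential of the average negative log-probability per predicted token, so a fluent sentence should score lower than a scrambled one.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigrams and their histories; each sentence is padded with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        histories.update(tokens[:-1])  # every token that is followed by another one
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Possible next tokens: all training words plus the end-of-sentence marker
    vocab = {w for sent in sentences for w in sent} | {"</s>"}
    return histories, bigrams, vocab


def bigram_prob(prev, word, histories, bigrams, vocab):
    """Laplace-smoothed P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (histories[prev] + len(vocab))


def perplexity(sentences, histories, bigrams, vocab):
    """exp of the average negative log-probability per predicted token."""
    log_prob, n_predictions = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, word in zip(tokens[:-1], tokens[1:]):
            log_prob += math.log(bigram_prob(prev, word, histories, bigrams, vocab))
            n_predictions += 1
    return math.exp(-log_prob / n_predictions)


train = [["i", "like", "nlp"], ["i", "like", "python"], ["you", "like", "nlp"]]
model = train_bigram_lm(train)
print(round(perplexity([["i", "like", "nlp"]], *model), 2))  # 2.83
print(round(perplexity([["nlp", "like", "i"]], *model), 2))  # 8.49
```

Add-one smoothing is the simplest smoothing method; it moves a lot of probability mass to unseen bigrams, which is why the course also covers other smoothing methods.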
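The Viterbi algorithm for HMM tagging can be sketched as below. The tag set and all probabilities are made-up toy values, not estimates from any corpus. The sketch multiplies raw probabilities for readability; on real sentences one would add log-probabilities instead to avoid numerical underflow.

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Most probable hidden-state sequence for `words` under a first-order HMM."""
    if not words:
        return []
    # best[t][s]: probability of the best tag path for words[:t+1] ending in s
    best = [{s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}]
    backptr = [{}]
    for t in range(1, len(words)):
        best.append({})
        backptr.append({})
        for s in states:
            # Pick the predecessor that maximises the path probability into s
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s].get(words[t], 0.0)
            backptr[t][s] = prev
    # Follow back-pointers from the most probable final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(words) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    return path[::-1]


# Hypothetical toy HMM
STATES = ("DET", "NOUN", "VERB")
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "cat": 0.5, "barks": 0.1},
    "VERB": {"barks": 0.6, "runs": 0.3, "dog": 0.1},
}
print(viterbi(["the", "dog", "barks"], STATES, START, TRANS, EMIT))
# ['DET', 'NOUN', 'VERB']
```

Note that "dog" and "barks" are ambiguous in the toy emission table; the transition probabilities resolve the ambiguity, which is the point of using an HMM rather than tagging each word independently.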
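A naive Bayes text classifier can be sketched in a few lines. This is the multinomial variant with add-one smoothing, computed in log space; the tiny sentiment data set and the function names are hypothetical, and words unseen in training are simply ignored at classification time.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (tokens, label) pairs. Multinomial NB with add-one smoothing."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    log_prior = {c: math.log(n / len(docs)) for c, n in label_counts.items()}
    log_likelihood = {}
    for c in label_counts:
        total = sum(word_counts[c].values())
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(tokens, log_prior, log_likelihood, vocab):
    """Return the label with the highest log posterior; unknown words are skipped."""
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


train = [
    ("great fun film".split(), "pos"),
    ("loved the acting".split(), "pos"),
    ("boring and dull".split(), "neg"),
    ("the plot was dull".split(), "neg"),
]
model = train_naive_bayes(train)
print(classify("dull boring film".split(), *model))  # neg
print(classify("great acting".split(), *model))      # pos
```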
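The evaluation measures listed above can be computed directly from gold and predicted labels. This minimal sketch treats one class as "positive" and uses the balanced F-measure F1 = 2PR / (P + R); the label lists are hypothetical. Note that precision and recall can differ sharply even when accuracy looks similar.

```python
def accuracy(gold, predicted):
    """Share of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for the class `positive`."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = ["pos", "pos", "pos", "neg", "neg"]
pred = ["pos", "neg", "neg", "pos", "neg"]
print(accuracy(gold, pred))                    # 0.4
print(precision_recall_f1(gold, pred, "pos"))  # precision 0.5, recall 1/3, F1 0.4
```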
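For statistical machine translation and word alignment, the simplest IBM model (Model 1) is trained with expectation maximisation. The sketch below follows the common textbook toy example (German-English sentence pairs); for brevity it omits the NULL source word that the full model includes, and the function name is hypothetical. Starting from uniform translation probabilities, repeated EM iterations let co-occurrence evidence pull "das" towards "the", "buch" towards "book", and so on.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """corpus: list of (foreign_tokens, english_tokens) sentence pairs.

    Returns t, where t[(e, f)] estimates P(e | f). No NULL word is used.
    """
    english_vocab = {e for _, es in corpus for e in es}
    t = defaultdict(lambda: 1.0 / len(english_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        # E-step: share each English word among the foreign words that may have produced it
        for fs, es in corpus:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    delta = t[(e, f)] / norm
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: renormalise expected counts into probabilities
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
for f in ("das", "haus", "buch", "ein"):
    best = max(("the", "house", "book", "a"), key=lambda e: t[(e, f)])
    print(f, "->", best)
```

The most probable alignment of a sentence pair can then be read off by linking each English word to the foreign word with the highest `t[(e, f)]`.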
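For the distributional methods of semantic similarity, the sketch below represents each word by raw co-occurrence counts within a window of two words on each side (the window size and the toy corpus are assumptions) and compares words by cosine similarity. Practical systems usually reweight the counts (for example with PPMI) or use dense embeddings; this shows only the underlying idea that words in similar contexts get similar vectors.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


corpus = [s.split() for s in [
    "i drink hot coffee every morning",
    "i drink hot tea every morning",
    "i drink strong coffee",
    "the car needs fuel",
]]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["coffee"], vecs["tea"]), 2))  # 0.88
print(cosine(vecs["coffee"], vecs["car"]))            # 0.0
```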
Assessment Elements

  • Test (non-blocking)
  • Test (non-blocking)
  • Test (non-blocking)
  • Test (non-blocking)
  • Test (non-blocking)
  • Oral exam (non-blocking)
Interim Assessment

  • Interim assessment (module 1)
    0.1 * Test + 0.1 * Test + 0.1 * Test + 0.1 * Test + 0.1 * Test + 0.5 * Oral exam
Bibliography

Recommended Core Bibliography

  • Perkins, J. Python Text Processing with NLTK 2.0 Cookbook: Use Python NLTK Suite of Libraries to Maximize Your Natural Language Processing Capabilities [Electronic resource] / Jacob Perkins; DB ebrary. – Birmingham: Packt Publishing Ltd, 2010. – 336 p.
  • Sarkar, D. Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data [Electronic resource] / Dipanjan Sarkar; DB Books 24x7. – Chicago: Apress, 2016. – 412 p. – ISBN 978-1-4842-2387-1
  • The Handbook of Computational Linguistics and Natural Language Processing [Electronic resource] / ed. by Alexander Clark, Chris Fox, Shalom Lappin; DB ebrary. – Chichester: John Wiley & Sons, 2013. – 203 p. – Available at: https://ebookcentral.proquest.com/lib/hselibrary-ebooks/reader.action?docID=4035461&query=computational+linguistics
  • The Handbook of Natural Language Processing [Electronic resource] / ed. by Robert Dale, Hermann Moisl, Harold Somers; DB ebrary. – New York: Marcel Dekker, Inc., 2010. – XIX, 996 p. – Available at: https://ebookcentral.proquest.com/lib/hselibrary-ebooks/reader.action?docID=216282&query=natural+language+processing+with