Natural Language Processing
Малафеев Алексей Юрьевич
- As a result of mastering the discipline, the student will know the structural features of natural language texts and the principles of their computer processing for obtaining linguistic (morphological, syntactic, semantic) information.
- As a result of mastering the discipline, the student will be familiar with the methods used to solve complex practical problems of natural language processing, in particular information retrieval, summarization, sentiment analysis, and machine translation.
- The student will understand the limitations of existing computer models of natural language processing.
- Understands the difficulties in natural language processing, has an idea of approaches to solve these problems
- Knows how to preprocess text, knows the syntax of regular expressions, has an idea of edit distance.
- Knows why language models are needed, knows how to create language models using n-grams
- Has an idea of the tagging problem, knows the principle of hidden Markov models and the basic algorithm for implementing them
- Has an idea of the classification problem and approaches to it, understands the naive Bayes classifier algorithm
- Understands the difference between basic classification metrics
- Has an idea of dependency and constituency trees and context-free grammars, knows the basic algorithms of syntactic parsing
- Has an idea of classical and modern approaches to machine translation
- Has an idea of computational semantics, knows the basic approaches, is able to calculate semantic similarity
- Has an idea of different types of summarization and ways to assess the quality of summarization
- Introduction to natural language processing. Structural features of texts in natural language; ambiguity on all levels of language; the main challenges of natural language processing; basic approaches to problem solving: manually written rules and machine learning.
- Basic text processing and edit distance. Preprocessing: tokenization and segmentation; normalization of words: stemming, lemmatization, morphological analyzers; regular expressions; edit distance.
- Language models. N-grams; perplexity; methods of smoothing; the use of language models: input prediction, error correction, speech recognition, text generation.
- Tagging problems and hidden Markov models. POS tagging; named entity recognition as a tagging problem; hidden Markov models, their advantages and disadvantages; the Viterbi algorithm.
- Text classification and sentiment analysis. Classification problems; naive Bayes classifier; text classification; sentiment analysis.
- Evaluation. Performance measures: accuracy, precision, recall, F-measure; state-of-the-art.
- Parsing. Constituency and dependency trees; context-free grammar; probabilistic approach to parsing; lexicalized PCFGs; CKY algorithm.
- Machine translation. Classical approaches: direct, transfer-based, interlingual; statistical machine translation; IBM model; alignment; parameter estimation in IBM models; phrase-based translation models.
- Computational semantics. Word senses and meanings; WordNet; semantic similarity measures: thesaurus-based and distributional methods.
- Text summarization. Extractive and abstractive summarization; multiple-document summarization; query-based summarization; supervised and unsupervised learning; evaluation of summarization systems; ROUGE.
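As a small illustration of one topic from the list above (edit distance), here is a minimal sketch in Python. It is not taken from the course materials. It assumes the Levenshtein variant with a unit cost for insertion, deletion, and substitution; the function name `edit_distance` is chosen here for illustration only.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance, assuming unit cost for insert, delete, substitute."""
    # Before row i is processed, previous[j] is the distance between
    # source[:i-1] and target[:j]; the first row compares "" with target[:j].
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Turning source[:i] into the empty string takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Other cost schemes (for example, counting a substitution as 2) give different values for the same word pair, so the cost assumption should be stated whenever distances are compared.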
- Graded tests
- Oral exam. The exam is held on Zoom (https://zoom.us) or MS Teams (https://teams.microsoft.com). The instructor will send the link three days before the exam.
- Perkins, J. Python Text Processing with NLTK 2.0 Cookbook: Use Python NLTK Suite of Libraries to Maximize Your Natural Language Processing Capabilities [Electronic resource] / Jacob Perkins; DB ebrary. – Birmingham: Packt Publishing Ltd, 2010. – 336 p.
- The Handbook of Natural Language Processing [Electronic resource] / edited by Robert Dale, Hermann Moisl, Harold Somers; DB ebrary. – New York: Marcel Dekker, Inc., 2010. – XIX, 996 p. – Available at: https://ebookcentral.proquest.com/lib/hselibrary-ebooks/reader.action?docID=216282&query=natural+language+processing+with