Fundamentals of Computational Linguistics
- Know the structural features of natural language texts and the principles of their computer processing in order to obtain linguistic (morphological, syntactic, semantic) information;
- Have an idea of the methods used to solve complex practical problems of natural language processing, in particular, information retrieval, summarization, sentiment analysis, machine translation;
- Understand the limitations of existing computer models of natural language processing.
- Understand current NLP tasks and the associated terminology
- Understand the basic methodology of text processing
- Be able to write basic regular expressions for everyday tasks
- Understand the terminology of language models; be able to choose an appropriate language model and identify the use cases for language models
- Understand the task of PoS tagging
- Be able to use modern PoS taggers for Russian and English
- Understand the task of text classification
- Be able to choose appropriate algorithms for text classification
- Be able to use text classification algorithms for sentiment analysis and text categorization
- Understand the pros and cons of evaluation methods in data mining tasks
- Be able to interpret metrics in ML tasks
- Understand the task of machine translation and its basic algorithms
- Understand the terminology of computational semantics: ontologies (WordNet) and distributional semantics
- Introduction to natural language processing
  Structural features of texts in natural language; ambiguity on all levels of language; the main challenges of natural language processing; basic approaches to problem solving: manually written rules and machine learning.
- Basic text processing and edit distance
  Preprocessing: tokenization and segmentation; normalization of words: stemming, lemmatization, morphological analyzers; regular expressions; edit distance.
- Language models
  N-grams; perplexity; methods of smoothing; the use of language models: input prediction, error correction, speech recognition, text generation.
- Tagging problems and hidden Markov models
  POS tagging; named entity recognition as a tagging problem; hidden Markov models, their advantages and disadvantages; the Viterbi algorithm.
- Text classification and sentiment analysis
  Classification problems; naive Bayes classifier; text classification; sentiment analysis.
- Evaluation
  Performance measures: accuracy, precision, recall, F-measure; state-of-the-art.
- Basics of machine translation
  Classical approaches: direct, transfer-based, interlingual; statistical machine translation; IBM models; alignment; phrase-based translation models.
- Computational semantics
  Word senses and meanings; WordNet; semantic similarity measures: thesaurus-based and distributional methods.
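As an illustration of the kind of algorithm listed under basic text processing above, edit distance can be sketched with dynamic programming. This is a minimal sketch, not course material: the function name `edit_distance` is hypothetical, and it assumes the Levenshtein variant with a unit cost for insertion, deletion and substitution (some textbooks charge 2 for a substitution).

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance, assuming unit costs for all three operations."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        # Turning the first i characters of `source` into an empty
        # string takes i deletions.
        current = [i]
        for j, target_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (source_char != target_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # prints 3
```

Only two rows of the table are kept at a time, which is enough when the distance itself is needed and the alignment is not.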
- Interim assessment (1 module)
  0.1 * Test + 0.1 * Test + 0.1 * Test + 0.1 * Test + 0.1 * Test + 0.5 * Oral exam
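The assessment formula above is a weighted sum of five tests and the oral exam, with weights that add up to 1. A minimal sketch of the arithmetic follows; the function name `interim_grade` and the sample scores are hypothetical, and the grading scale is an assumption since the syllabus does not state it.

```python
TEST_WEIGHT = 0.1       # weight of each test, from the formula
ORAL_EXAM_WEIGHT = 0.5  # weight of the oral exam, from the formula
NUM_TESTS = 5           # the formula lists five tests


def interim_grade(test_scores, oral_exam_score):
    """Weighted sum of five test scores and the oral exam score."""
    if len(test_scores) != NUM_TESTS:
        raise ValueError(f"expected {NUM_TESTS} test scores")
    return TEST_WEIGHT * sum(test_scores) + ORAL_EXAM_WEIGHT * oral_exam_score


# Hypothetical scores, assuming every component uses the same scale.
print(interim_grade([8, 7, 9, 10, 6], 8))
```

With these sample scores the sum works out to 0.1 * 40 + 0.5 * 8 = 8. The oral exam carries as much weight as all five tests together.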
- Perkins, J. Python Text Processing with NLTK 2.0 Cookbook: Use Python NLTK Suite of Libraries to Maximize Your Natural Language Processing Capabilities [Electronic resource] / Jacob Perkins; DB ebrary. – Birmingham: Packt Publishing Ltd, 2010. – 336 p.
- The Handbook of Natural Language Processing [Electronic resource] / edited by Robert Dale, Hermann Moisl, Harold Somers; DB ebrary. – New York: Marcel Dekker, Inc., 2010. – XIX, 996 p. – Available at: https://ebookcentral.proquest.com/lib/hselibrary-ebooks/reader.action?docID=216282&query=natural+language+processing+with
- Sarkar, D. Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data [Electronic resource] / Dipanjan Sarkar; DB Books 24x7. – Chicago: Apress, 2016. – 412 p. – ISBN 978-1-4842-2387-1
- The Handbook of Computational Linguistics and Natural Language Processing [Electronic resource] / edited by Alexander Clark, Chris Fox, Shalom Lappin; DB ebrary. – Chichester: John Wiley & Sons, 2013. – 203 p. – Available at: https://ebookcentral.proquest.com/lib/hselibrary-ebooks/reader.action?docID=4035461&query=computational+linguistics