Dalhousie University [ http://web.cs.dal.ca/~vlado/csci6509/coursecalendar.html ]
Winter 2021 (Jan 6-Apr 8)
Faculty of Computer Science
Dalhousie University

CSCI 4152/6509 — Course Calendar (tentative)

  Part I: Introduction
  Jan 6-8 "Soft start": Course Introduction
We will use a "soft start" to the term: there will be no synchronous classes before Monday, Jan 11, 2021.
Files: Syllabus (PDF).
1 Mo Jan 11 Course Introduction
Course information: logistics and administrivia, textbook and other resources, evaluation scheme, academic integrity policy, culture of respect, tentative course schedule.
Files: slides, lecture notes.
L1 Mo Jan 11 Lab 1 (4152): Computing Environment and Perl Tutorial 1
Logging in using CSID, timberlea environment; Introduction to Perl programming language.
Files: lab notes, slides.
2 We Jan 13 Introduction to NLP
Introduction to NLP (reading Ch.1 [JM]): natural language and other languages. NLP applications. NLP as a research area, NLP Research Links and NLP Anthology http://aclweb.org/anthology/. Short history of NLP. Levels of NLP.
Files: slides, lecture notes.
L1 Th Jan 14 Lab 1 (6509): Computing Environment and Perl Tutorial 1
Files: lab notes, slides.
3 Fr Jan 15 Ambiguity in NLP; About Course Project
Some reasons why NLP is hard. Ambiguities at different levels of NLP: phonological (phonetic), lexical, syntactic, semantic, pragmatic, and discourse ambiguities. About the course project: deliverables, P0, P1, P, R; project types, choosing topic, resources, themes and previous topics.
Files: slides, lecture notes.
  Fr Jan 15 Last day to add/drop courses
  Part II: Stream-based Text Processing
4 Mo Jan 18 Finite Automata
Part II: Stream-based Text Processing: Deterministic and Non-deterministic Automata. Review of Deterministic Finite Automata (DFA) and Non-deterministic Finite Automata (NFA), and their use in NLP; NFA-to-DFA conversion.
Files: slides, lecture notes.
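
As an illustration of the NFA-to-DFA conversion listed for this lecture, the subset construction can be sketched in Python roughly as follows. This is a minimal sketch and not the course's own code: the function names `nfa_to_dfa` and `dfa_accepts`, the dictionary encoding of the transition function, and the example automaton (strings over {a, b} ending in "ab") are assumptions made for illustration, and epsilon transitions are not handled.

```python
from collections import deque


def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction for an NFA without epsilon transitions.

    nfa maps (state, symbol) to a set of next states; accepting is a set.
    Each DFA state is a frozenset of NFA states.
    Returns (dfa_transitions, dfa_start, dfa_accepting).
    """
    dfa_start = frozenset([start])
    dfa = {}
    dfa_accepting = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        # A DFA state accepts if it contains any accepting NFA state.
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            target = frozenset(
                nxt for state in current for nxt in nfa.get((state, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa, dfa_start, dfa_accepting


def dfa_accepts(dfa, dfa_start, dfa_accepting, text):
    """Run the DFA on text; every symbol must belong to the alphabet."""
    state = dfa_start
    for symbol in text:
        state = dfa[(state, symbol)]
    return state in dfa_accepting


# Hypothetical example NFA: accepts strings over {a, b} that end in "ab".
example_nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
example_dfa = nfa_to_dfa(example_nfa, 0, {2}, "ab")
```

Only the subsets reachable from the start state are generated, so the resulting DFA is often much smaller than the worst-case 2^n states.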
L2 Mo Jan 18 Lab 2 (4152): Perl Tutorial 2
Regular expressions and character n-grams in Perl.
Files: lab notes, slides.
5 We Jan 20 Regular Expressions
Review of regular expressions. (Reading: Chapter 2 [JM])
Files: slides, lecture notes.
A0 out
A1 out
L2 Th Jan 21 Lab 2 (6509): Perl Tutorial 2
Files: lab notes, slides.
6 Fr Jan 22 Text Processing in Perl
Introduction to Perl, main Perl features, program examples, syntactic elements, I/O, regular expressions in Perl, regular expression based processing in Perl.
Files: slides, lecture notes.
7 Mo Jan 25 Perl Processing Examples
More on Perl regular expressions; Text processing examples: tokenization, counting letters.
Files: slides, lecture notes.
L3 Mo Jan 25 Lab 3 (4152): Perl Tutorial 3
Perl: Arrays or lists; associative arrays or hashes; references.
Files: lab notes, slides.
8 We Jan 27 Elements of Morphology
Elements of Morphology: reading: Section 3.1 [JM]; morphemes, stems, affixes, tokenization, stemming, lemmatization; morphological processes. Characters, Words, and N-grams: counting words, Zipf's law.
Files: slides, lecture notes.
A0 due
L3 Th Jan 28 Lab 3 (6509): Perl Tutorial 3
Files: lab notes, slides.
9 Fr Jan 29 Guest Speaker, LeadSift
Files: slides, lecture notes.
  Fr Jan 29 Last day to drop classes without "W"
10 Mo Feb  1 Counting N-grams
Guest Speaker, WESI; N-grams: character and word n-grams.
Files: slides, lecture notes.
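
Counting character and word n-grams, the topic of this lecture, might look like the following in Python. This is only a sketch: the lecture examples use Perl, and the function names `char_ngrams` and `word_ngrams`, the lowercasing step, and the whitespace tokenization are assumptions made here for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count all character n-grams (overlapping substrings of length n)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count word n-grams after lowercasing and splitting on whitespace."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


# Example usage on a small string.
bigram_counts = char_ngrams("banana", 2)
word_bigram_counts = word_ngrams("The cat saw the cat", 2)
```

A `Counter` is used so that the most frequent n-grams can be retrieved directly with `most_common()`.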
L4 Mo Feb  1 Lab 4 (4152): Git and GitLab Tutorial
Introduction to GitLab and Git; adding and modifying files, setting up SSH key, add, commit, and push commands, checkout; creating branches and working collaboratively, pull, merge, rebase, resolving conflicts.
Files: lab notes, slides.
11 We Feb  3 Elements of Information Retrieval
More Perl examples with N-gram collection, using Ngrams module; Elements of information retrieval: typical IR system architecture, vector space model. reading: [JM] 23.1 (Information Retrieval), [MS] Ch.15 (Topics in Information Retrieval).
Files: slides, lecture notes.
L4 Th Feb  4 Lab 4 (6509): Git and GitLab Tutorial
Files: lab notes, slides.
A1 due
  Fr Feb  5 Munro Day, University closed
  Mo Feb  8 Snow Day, University closed, classes suspended
P0 due
12 We Feb 10 IR Evaluation Measures
Some interesting links: Lucene, IR book by Manning, Raghavan, and Schütze. IR Evaluation Measures: Precision, Recall, and F-measure; Precision-Recall curve.
Files: slides, lecture notes.
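
The three set-based measures named above can be computed as in the sketch below. The function name `precision_recall_f1` and the document identifiers are hypothetical, and the balanced F-measure (F1, the harmonic mean of precision and recall) is assumed here; the lecture may use a more general weighted form.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for a retrieved set and a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: fraction of retrieved documents that are relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 documents retrieved, 5 relevant, 2 in common.
p, r, f = precision_recall_f1({"d1", "d2", "d3", "d4"},
                              {"d2", "d4", "d5", "d6", "d7"})
```

In this example precision is 2/4 and recall is 2/5, so F1 falls between the two, closer to the smaller value.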
L5 -- Lab 5 (4152): Python NLTK Tutorial 1
Introduction to Python: basics, lists, tuples, dictionaries; Introduction to NLTK: tokenization, stop-words, stemming, n-grams, frequency distribution, classification.
Files: lab notes, slides.
L5 Th Feb 11 Lab 5 (6509): Python NLTK Tutorial 1
Files: lab notes, slides.
13 Fr Feb 12 Text Classification
Precision-Recall curve (continued). Text mining. Text Classification: Classifier Evaluation, Evaluation measures for text classification: accuracy, precision, recall; macro and micro averaging; Evaluation methods for classification: underfitting and overfitting.
Files: slides, lecture notes.
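
To make the difference between macro and micro averaging concrete, here is a small sketch for precision in a single-label classification setting. The function name `macro_micro_precision` and the toy labels are assumptions for illustration. Macro averaging takes the mean of the per-class precisions, while micro averaging pools the true-positive and false-positive counts over all classes first.

```python
def macro_micro_precision(gold, predicted):
    """Return (macro-averaged, micro-averaged) precision over all classes."""
    labels = sorted(set(gold) | set(predicted))
    tp = {c: 0 for c in labels}  # true positives per predicted class
    fp = {c: 0 for c in labels}  # false positives per predicted class
    for g, p in zip(gold, predicted):
        if g == p:
            tp[p] += 1
        else:
            fp[p] += 1
    per_class = [
        tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in labels
    ]
    macro = sum(per_class) / len(per_class)
    total_tp, total_fp = sum(tp.values()), sum(fp.values())
    micro = total_tp / (total_tp + total_fp) if total_tp + total_fp else 0.0
    return macro, micro


# Toy example: class "a" has precision 3/3, class "b" has precision 1/2.
macro, micro = macro_micro_precision(["a", "a", "a", "a", "b"],
                                     ["a", "a", "a", "b", "b"])
```

Here the macro average is (1.0 + 0.5) / 2 and the micro average is 4/5, which shows how micro averaging gives more weight to the larger class.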
  Mo Feb 15 Nova Scotia Heritage Day, University closed
  Mo Feb 15 Winter Study Break, no classes, Feb 15-19
14 Mo Feb 22 Similarity-based Text Classification
Evaluation methods for classification (continued); Text clustering; Similarity-based text classification. CNG classification method for authorship attribution. Probabilistic approach to NLP: logical vs. plausible reasoning in AI and NLP; Brief review of elements of probability theory.
Files: slides, lecture notes.
L6 Mo Feb 22 Lab 6 (4152): Python NLTK Tutorial 2
Part-of-speech taggers in NLTK: HMM and CRF, Brill tagger; Named entity chunking; Jupyter and using JupyterHub.
Files: lab notes, slides.
  Part III: Probabilistic Approach to NLP
15 We Feb 24 Probabilistic Modeling
Bayesian inference, generative models. Probabilistic modeling: random variables, configurations, and models; computational tasks; joint distribution model.
Files: slides, lecture notes.
L6 Th Feb 25 Lab 6 (6509): Python NLTK Tutorial 2
Files: lab notes, slides.
16 Fr Feb 26 P0 Topics Discussion (1)
Projects discussion: P-01, P-02, P-03, P-04, P-05, P-06, P-07, P-08, P-09, P-11.
Files: P0 slides.
17 Mo Mar  1 P0 Topics Discussion (2)
Projects discussion: P-10, P-12, P-13, P-14, P-15, P-17, P-18, P-19, P-20, P-21, P-22.
Files: P0 slides.
L7 Mo Mar  1 Lab 7 (4152): Fetching Tweets with Python
Files: lab notes.
18 We Mar  3 Naive Bayes Classification Model
Fully independent model; efficient product-sum formula. Naive Bayes model: definition, assumption, graphical model, computational tasks.
Files: slides, lecture notes.
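
A Naive Bayes text classifier built on the independence assumption named above might be sketched as follows. This is an illustration only, not the model as presented in the lecture: the function names `train_naive_bayes` and `classify`, the multinomial word model, the add-one smoothing, the skipping of unseen words, and the toy training data are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """documents is a list of (list_of_words, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in documents:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify(words, model):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            if word in vocab:  # assumption: words unseen in training are skipped
                # Add-one smoothing keeps every probability nonzero.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (hypothetical).
model = train_naive_bayes([
    (["win", "money", "now"], "spam"),
    (["cheap", "money"], "spam"),
    (["meeting", "now"], "ham"),
    (["project", "meeting", "notes"], "ham"),
])
```

Working with sums of logarithms rather than products of probabilities avoids numerical underflow on longer documents.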
L7 Th Mar  4 Lab 7 (6509): Fetching Tweets with Python
Files: lab notes.
19 Fr Mar  5 N-gram Model
Naive Bayes model (continued): spam example, number of parameters, pros and cons of the NB model, additional notes. N-gram model: language modeling, n-gram model assumption, Markov property, Markov Chain. reading: [JM] Ch. 4 (N-Grams)
Files: slides, lecture notes.
P1 due
20 Mo Mar  8 N-gram Model Smoothing
N-gram model (continued): perplexity. Text classification using language modeling. N-gram model smoothing: Laplace smoothing, Witten-Bell discounting.
Files: slides, lecture notes.
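
Laplace (add-one) smoothing for a bigram model can be sketched as below. The function names `train_bigram_counts` and `laplace_bigram_prob` and the toy sentence are assumptions for illustration, and sentence-boundary markers are omitted for brevity. The smoothed estimate is (count(prev, word) + 1) / (count(prev) + V), where V is the vocabulary size and count(prev) is the number of bigrams that start with prev.

```python
from collections import Counter


def train_bigram_counts(tokens):
    """Collect the vocabulary, context counts, and bigram counts."""
    vocab = set(tokens)
    # Number of times each token occurs as the first element of a bigram.
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    return vocab, contexts, bigrams


def laplace_bigram_prob(prev, word, vocab, contexts, bigrams):
    """P(word | prev) with add-one smoothing over the vocabulary."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


tokens = "the cat sat on the mat".split()
vocab, contexts, bigrams = train_bigram_counts(tokens)
p_seen = laplace_bigram_prob("the", "cat", vocab, contexts, bigrams)    # (1 + 1) / (2 + 5)
p_unseen = laplace_bigram_prob("the", "sat", vocab, contexts, bigrams)  # (0 + 1) / (2 + 5)
```

Every unseen bigram receives a small nonzero probability, and for each context the smoothed probabilities over the vocabulary still sum to 1.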
  Mo Mar  8 Last day to drop classes with "W"
21 We Mar 10 POS Tagging
N-gram model smoothing (continued): Witten-Bell discounting formulae. POS tags: introduction, open and closed word categories. reading: [JM] Ch5 Part-of-Speech Tagging. Open word categories: nouns (NN, NNS, NNP, NNPS), adjectives (JJ, JJR, JJS), verbs (VB, VBP, VBZ, VBG, VBD, VBN), adverbs (RB, RBR, RBS); Closed word categories: DT, WDT.
Files: slides, lecture notes.
22 Fr Mar 12 Hidden Markov Model (HMM)
Closed word categories (continued): PDT, PRP, PRP$, WP, WP$, IN, RP, POS, MD, TO, RB (closed), WRB, CC, UH; Other POS classes: EX, FW, LS, punctuation, SYM. Examples. Hidden Markov Model (HMM): motivation, definition, HMM assumption, applications, POS tagging. reading: [JM] Ch. 6 (HMM, first part)
Files: slides, lecture notes.
A2 due
  Part IV: Parsing (Syntactic Processing)
23 Mo Mar 15 Introduction to Prolog
Implicative normal form and Horn clauses; rules, facts, and a simple Prolog example; Unification and backtracking, variables, lists, factorial example.
Files: slides, lecture notes.
L8 Mo Mar 15 Lab 8 (4152): Prolog Tutorial 1
Files: lab notes, slides.
24 We Mar 17 Natural Language Syntax
Parsing (Syntactic Processing): Natural language syntax: phrase structure, clauses, sentences; reading: [JM] Ch 12; parsing, parse tree examples. Context-Free Grammars review: definition, parse trees, derivations and other concepts, bracket representation.
Files: slides, lecture notes.
L8 Th Mar 18 Lab 8 (6509): Prolog Tutorial 1
Files: lab notes, slides.
25 Fr Mar 19 Guest Speaker, Dash Hudson
Files: slides, lecture notes.
26 Mo Mar 22 NL Parsing in Prolog
Files: slides, lecture notes.
L9 Mo Mar 22 Lab 9 (4152): Prolog Tutorial 2
Files: lab notes, slides.
27 We Mar 24 Probabilistic Context-Free Grammars
Probabilistic Context-Free Grammars (continued): computational tasks, evaluation, generation; expressing PCFGs in DCGs, CYK chart parsing algorithm, CNF.
Files: slides, lecture notes.
L9 Th Mar 25 Lab 9 (6509): Prolog Tutorial 2
Files: lab notes, slides.
28 Fr Mar 26 Efficient Parsing with PCFGs
CYK algorithm, efficient PCFG marginalization.
Files: slides, lecture notes.
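
PCFG marginalization with a CYK-style chart, that is, summing the probabilities of all parse trees of a sentence, can be sketched as follows. This is a minimal sketch assuming a grammar already in Chomsky Normal Form; the function name `inside_probability`, the dictionary encoding of the rules, and the toy grammar and its probabilities are made up for illustration.

```python
from collections import defaultdict


def inside_probability(words, lexical, binary, start="S"):
    """Sum the probabilities of all parses of words under a CNF PCFG.

    lexical maps (A, word) to P(A -> word);
    binary maps (A, B, C) to P(A -> B C).
    """
    n = len(words)
    if n == 0:
        return 0.0
    # chart[i][j][A] = total probability that A derives words[i..j] inclusive.
    chart = [[defaultdict(float) for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        for (a, w), p in lexical.items():
            if w == word:
                chart[i][i][a] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point: left part is i..k, right is k+1..j
                for (a, b, c), p in binary.items():
                    left = chart[i][k].get(b, 0.0)
                    right = chart[k + 1][j].get(c, 0.0)
                    if left and right:
                        chart[i][j][a] += p * left * right
    return chart[0][n - 1].get(start, 0.0)


# Toy grammar (hypothetical) with a PP-attachment ambiguity.
binary_rules = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.6,
    ("VP", "VP", "PP"): 0.4,
    ("NP", "NP", "PP"): 0.3,
    ("PP", "P", "NP"): 1.0,
}
lexical_rules = {
    ("NP", "she"): 0.3,
    ("NP", "stars"): 0.2,
    ("NP", "telescopes"): 0.2,
    ("V", "saw"): 1.0,
    ("P", "with"): 1.0,
}
total = inside_probability("she saw stars with telescopes".split(),
                           lexical_rules, binary_rules)
```

The example sentence has two parses, with probabilities 0.00288 (PP attached to the verb phrase) and 0.00216 (PP attached to the noun phrase); the chart sums them in O(n^3) chart steps without enumerating the trees. Replacing the sum with a maximum would give the most probable parse instead.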
29 Mo Mar 29 Efficient Inference in HMMs
Efficient inference in PCFG model: conditioning, completion (parsing); Issues with PCFGs: structural and lexical dependencies. Back to HMM: Example of learning HMM.
Files: slides, lecture notes.
30 We Mar 31 CFGs for Natural Languages
Efficient inference in HMMs (continued): brute-force approach, tagging example, dynamic programming approach, Viterbi algorithm example. Typical phrase structure rules in English: Sentence (S), Noun Phrase (NP), Verb Phrase (VP), Prepositional Phrase (PP), Adjective Phrase (ADJP), Adverbial Phrase (ADVP), others.
Files: slides, lecture notes.
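
The dynamic programming approach to HMM tagging mentioned above (the Viterbi algorithm) might be sketched like this. The function name `viterbi`, the two-tag model, and all probabilities are invented for illustration and are not taken from the lecture example.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable state sequence, its joint probability)."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               previous state on that path)
    best = [{
        s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max(
                (best[-1][r][0] * trans_p[r].get(s, 0.0), r) for r in states
            )
            column[s] = (prob * emit_p[s].get(obs, 0.0), prev)
        best.append(column)
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy tagging model (hypothetical): N = noun, V = verb.
states = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {
    "N": {"dogs": 0.4, "fish": 0.5, "sleep": 0.1},
    "V": {"fish": 0.4, "sleep": 0.6},
}
tags, prob = viterbi(["dogs", "fish", "sleep"], states, start_p, trans_p, emit_p)
```

For this toy model the best sequence is N N V, even though V is the locally more probable tag for "fish" after the second word. The brute-force approach would score every one of the |states|^n tag sequences, while the table above needs only about n * |states|^2 steps.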
  Th Apr  1 A3 due
  Fr Apr  2 Good Friday, University closed
31 Mo Apr  5 Heads and Dependency, NL Phenomena
Heads and dependency: example, head-feature principle, dependency trees, arguments and adjuncts; Non-context-free Natural Language Phenomena: are NLs context-free in formal sense?, NL phenomena: agreement, movement, subcategorization.
Files: slides, lecture notes.
  Part VI: Student Presentations
32 We Apr  7 (Friday schedule) Student Presentations 09:30-10:30
09:30-10:30: PT-25* (Rafael), PT-26* (Mack), PT-27* (Caleidgh), PT-28* (Janessa);
16:00-17:00: PT-45* (Gautam), PT-46* (Mani), PT-47* (Kamal), PT-48* (Harsh);
33 Th Apr  8 (Friday schedule) Student Presentations
09:30-10:30: PT-01* (p-04 Matthew, Cooper, Amous), PT-02, PT-03* (Christian, Ryan S.), PT-04* (Hamideh);
10:45-11:45: PT-05* (p-05 Sijia, Frank, Tina), PT-06* (Wen), PT-07* (Alex), PT-08* (James);
12:00-13:00: PT-09* (Noah), PT-10* (Rakshit), PT-11* (Bhuvaneshwari), PT-12* (Shakhboz);
13:30-14:30: PT-13* (p-18 Elizabeth, Liam, Lauryn, Will), PT-14* (Shyam), PT-15* (p-11 Youssof, Imaad, Harris, Ravi), PT-16* (p-12 Ryan G.);
14:45-15:45: PT-17* (Elias), PT-18* (Yiping), PT-19* (p-07 Yiwei, Bote, Zhilei), PT-20* (Moath);
16:00-17:00: PT-21* (Nathaniel), PT-22* (Adrian), PT-23* (p-03 Gabriel, Ben, Rhys, Courtney), PT-24* (Harvey);
  Th Apr  8 Term end
  Mo Apr 12 Project Report due
  Th Apr 15 A4 due
  Final Exam
  Tu Apr 20 Final Exam (electronic) (8:30-11:30)
The instructions for completing the exam will be sent by email. Exam schedule URL: http://www.dal.ca/academics/exam_schedule/halifax_campus_exam_schedule.html

Maintained by: Vlado Keselj, last update: 03-May-2021