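Course name: Natural Language Processing
Course type: Department elective
Instructor: Hsin-Hsi Chen (陳信希)
College: College of Electrical Engineering and Computer Science
Department: Department of Computer Science and Information Engineering
Exam date (year/month/day): 106/04/20 (ROC calendar, i.e. 2017/04/20)
Time limit (minutes): 180
Exam questions: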
1. The following questions concern the resources used in natural language
processing (NLP) research.
(a) The Annual Meeting of the Association for Computational Linguistics (ACL)
and the International Conference on Computational Linguistics (COLING)
are two top-tier, representative conferences in NLP. Please specify
the largest NLP archive in the world, which keeps the proceedings of
the major NLP conferences. (5 points)
(b) If we need an English treebank to train a parser, please suggest
an organization where we can purchase the required treebank.
(5 points)
(c) If we need a balanced Chinese corpus to develop a Chinese segmentation
system, please suggest an organization where we can get the required
corpus. (5 points)
2. A pipelined NLP system can be composed of a morphological processing
module, a syntactic analysis module, a semantic interpretation module,
and a discourse analysis module. Using the following sentence, please
describe any 5 operations in the pipelined system. The operations
can be selected from the same module or from different modules. Please
also state which module each operation belongs to.
(20 points)
英國首相今天宣布提前大選,英鎊轉貶,但隨後重升。
3. Labelling (tagging) operations play an important role in natural
language processing. Different labels are proposed at different
analysis levels. For example, a set of part-of-speech (POS) tags
is defined at the lexical level. A POS tagger aims to label each
word in a sentence with a POS tag. Here tagging is a labelling operation.
Please specify 3 other labelling (tagging) operations in NLP.
(15 points)
4. The following shows a review of a hotel:
客房古老,面積不大,不過景觀很好,可以看見秦淮河。
The words "客房", "面積", and "景觀" are aspect terms. In contrast, the
words "古老", "大", and "好" are opinion words, which modify aspect
terms and show the polarity toward each aspect. In some cases, only opinion
words are used in a review and the aspect terms are absent (i.e., the
aspect is implicit). In the sentence "這是千萬畫素裡最便宜的一台", we
know that the opinion word "便宜" modifies an implicit aspect term "價錢".
Given a hotel review corpus, please propose a method to find
collocations of opinion words and aspect terms, and use the findings to
handle the implicit-aspect problem. (10 points)
5. One application of a language model is to estimate the
probability of the next word given the previous n-1 words. Please compare
how a traditional language model and a neural probabilistic language model
handle this problem. (10 points)
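As a reminder of the task setup only (not a model answer), the sketch below estimates the probability of the next word from raw n-gram counts by maximum likelihood. The function names `train_ngram` and `next_word_prob` are hypothetical, and the toy corpus is one of the example sentences given later in this exam.

```python
from collections import Counter

def train_ngram(tokens, n):
    """Count n-grams and their (n-1)-word contexts in a list of tokens."""
    positions = range(len(tokens) - n + 1)
    ngrams = Counter(tuple(tokens[i:i + n]) for i in positions)
    contexts = Counter(tuple(tokens[i:i + n - 1]) for i in positions)
    return ngrams, contexts

def next_word_prob(ngrams, contexts, context, word):
    """Maximum-likelihood estimate of P(word | context), with no smoothing."""
    context = tuple(context)
    if contexts[context] == 0:
        return 0.0  # context never observed in the training tokens
    return ngrams[context + (word,)] / contexts[context]

# Toy corpus (assumption: lowercased, whitespace-tokenized, punctuation dropped).
tokens = "the cat is walking in the bedroom".split()
ngrams, contexts = train_ngram(tokens, 2)  # n = 2, i.e. a bigram model
print(next_word_prob(ngrams, contexts, ["the"], "cat"))  # 0.5
```

Here "the" occurs twice as a context and is followed by "cat" once, so the estimate is 1/2.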
6. In training an HMM, we need to compute the number of times each individual
arc (link) is traversed for a training instance. How can we compute this
number for each arc efficiently without enumerating all the paths?
(10 points)
7.
(a) What is the zero-probability problem? (5 points)
(b) What is the major problem with Laplace smoothing? (5 points)
(c) How does Kneser-Ney smoothing work to deal with the zero-probability
problem? (10 points)
(d) In traditional language modeling, smoothing techniques are introduced
to avoid the zero-probability problem. In a distributed representation, we
associate each word in the vocabulary with a dense vector.
Similar words (semantically and syntactically) will be close in the
embedding space. Is it necessary to introduce smoothing techniques into a
neural probabilistic language model? Please explain why. You can use
the following examples to explain your answer. (10 points)
The cat is walking in the bedroom.
A dog was running in a room.
8. In analogy analysis, two pairs of words that share a relation are
given. We aim to identify a hidden word based on the three other words.
Word embeddings have been shown to be powerful in this application. Please
present three similarity computation methods to find the hidden word.
(10 points)
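To make the task setup concrete, here is a minimal sketch of one common formulation (vector offset followed by cosine similarity). It is an illustration, not the expected answer to the question. The 2-D vectors and the words in `emb` are hypothetical toy values, and `solve_analogy` is a name chosen only for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(emb, a, b, c):
    """Return the word d such that a : b is like c : d (vector-offset method)."""
    # Target point: move from c by the same offset that leads from a to b.
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    # The three given words are excluded from the candidates.
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Hypothetical toy embeddings, hand-made so that the offset is exact.
emb = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 3.0],
}
print(solve_analogy(emb, "man", "woman", "king"))  # queen
```

With real embeddings the offset is only approximate, which is why the choice of similarity function matters.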