Author: a123zyx (小企)
Date: 2014-04-18 10:39:32
Course name: Natural Language Processing
Course type: Elective, Graduate Institute of Computer Science
Instructor: 陳信希
College: College of Electrical Engineering and Computer Science
Department: Graduate Institute of Computer Science
Exam date (Y/M/D): 4/10
Exam duration (minutes): 180 min
Reward points requested: Yes
(If not explicitly stated, none will be given.)
Exam questions:
1.Opinion mining and sentiment analysis is a very important NLP application
nowadays. A review is usually composed of some aspects of an opinion
target and the opinion words expressing polarity about those aspects. The
following review about Howard Civil Service International House (福華文教
會館) is selected from TripAdvisor. Please indicate what explicit
aspects and opinion words appear in this review. (10 points)
"Our room was excellent. The hotel staff were very nice, and there was
always someone who spoke English to serve us. They gave very good dining
suggestions and made sure trustworthy taxi drivers were available for us.
The location is convenient for business travel: restaurants, banks, and
services are only a short walk away. The hotel buffet was quite good, and
so was the café. Overall, it was a pleasant stay."
2.Machine translation (MT) is another important NLP application. It aims to
translate a document in one language into a document in another language.
There are many challenging issues in designing MT systems. The following
shows an English sentence and three Chinese sentences produced by
Google Translate in 2008, 2012 and 2014, respectively. Please translate this
English sentence into Chinese and analyze, from this example, why MT is
challenging. (10 points)
Source: Taiwan wins gold in woman's 75 kg powerlifting in Paralympics
2008 : 台灣勝金在婦女的75公斤 powerlifting 在殘奧會
2012 : 台灣勝在殘奧會舉重女子75公斤黃金
2014 : 台灣勝金在女子75公斤級舉重殘奧會
3.Basically, an NLP system is a pipeline of four modules which deal with
different problems on different linguistic levels. Please explain the
functions of each module. (12 points)
4.A blog post may be composed of sentences with emoticons. These non-verbal
emotional expressions describe the author's feelings when s/he wrote
the post. The following shows some typical examples. Given a collection of
sentences, each of them containing an emoticon, we plan to learn an emotion
dictionary with mutual information. The dictionary keeps the emotion
tendency of each word. Please define mutual information (MI) first, and
then discuss how you would achieve this goal with MI. (10 points)
● I'm meeting you for dinner today; I don't know why, but I feel especially nervous :o
● Thank you for treating me to dinner and even giving me a gift :目
● But I was still very happy when I received it :P
(The emoticons above were originally images; similar symbols are used here.)
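As a sketch of one possible answer (not the course's prescribed method), pointwise mutual information between a word and the emoticon label of its sentence can serve as the word's emotion-tendency score. The toy data below is a hypothetical English rendering of the examples above; all words, labels, and counts are illustrative assumptions.

```python
import math
from collections import Counter

def build_emotion_dictionary(sentences):
    """Score each word's emotion tendency by pointwise mutual information
    with the emoticon label of the sentences it appears in.

    `sentences` is a list of (words, emoticon_label) pairs.
    PMI(w, e) = log2( P(w, e) / (P(w) * P(e)) ), estimated from
    per-sentence presence counts.
    """
    word_counts = Counter()
    label_counts = Counter()
    pair_counts = Counter()
    n = 0
    for words, label in sentences:
        for w in set(words):          # count word presence once per sentence
            word_counts[w] += 1
            pair_counts[(w, label)] += 1
        label_counts[label] += 1
        n += 1
    dictionary = {}
    for (w, e), c in pair_counts.items():
        pmi = math.log2((c / n) / ((word_counts[w] / n) * (label_counts[e] / n)))
        dictionary.setdefault(w, {})[e] = pmi
    return dictionary

# Hypothetical mini-corpus loosely matching the three example sentences.
data = [
    (["dinner", "nervous"], ":o"),
    (["thanks", "dinner", "gift"], ":D"),
    (["gift", "happy"], ":P"),
]
d = build_emotion_dictionary(data)
```

A word scores high for an emoticon it co-occurs with more often than chance; keeping, for each word, its highest-scoring emoticon (or the full score vector) yields the emotion dictionary.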
5.The t-test is a useful hypothesis-testing tool. It can be used to learn
multi-word expressions from a large corpus. Moreover, it can also be used
to tell whether the performance of two models differs significantly. Please
specify these TWO applications of the t-test in detail. (10 points)
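A minimal sketch of both applications using only the standard library; the variance approximation s² ≈ x̄ for collocations and the illustrative critical value are assumptions of this sketch, not part of the exam.

```python
import math

def collocation_t_score(c_bigram, c_w1, c_w2, n):
    """t-score for testing whether w1 w2 co-occur more often than chance.
    Null hypothesis: P(w1 w2) = P(w1) P(w2).  The sample variance is
    approximated by the sample mean (the usual collocation treatment).
    A large t (e.g. > 2.576 at the 0.005 level) rejects independence."""
    x_bar = c_bigram / n
    mu = (c_w1 / n) * (c_w2 / n)
    return (x_bar - mu) / math.sqrt(x_bar / n)

def paired_t_statistic(scores_a, scores_b):
    """Paired t-test statistic for comparing two models evaluated on the
    same test folds: t = mean(d) / (s_d / sqrt(k)) over the per-fold
    score differences d."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    mean_d = sum(diffs) / k
    var_d = sum((x - mean_d) ** 2 for x in diffs) / (k - 1)
    return mean_d / math.sqrt(var_d / k)

# Made-up counts: a bigram seen 20 times where independence predicts 1.
t_colloc = collocation_t_score(20, 100, 100, 10000)
# Made-up per-fold accuracies for two models on the same three folds.
t_models = paired_t_statistic([0.80, 0.82, 0.81], [0.70, 0.71, 0.72])
```

In the first use, word pairs whose t-score exceeds the critical value are kept as multi-word expression candidates; in the second, a large |t| means the two models' fold-wise scores differ beyond what chance would explain.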
6.A person found an old book inside a wall while restoring a historic
building. He claimed that the book was written in the 16th century. Assume
you have several book corpora written in the 15th, 16th, ..., 20th
centuries, respectively. How would you verify this claim based on the book's
content? The person further claimed that the book was written by William
Shakespeare (1564-1616). Please design a method to verify whether the book
is a fake based on William Shakespeare's writing style. (10 points)
7.The following defines basic symbols for smoothing.
N:total occurrences of n-grams in the training dataset
B:total number of n-gram types
r:frequency of an n-gram
Nr:number of n-gram types with frequency r in the training dataset
Tr:total occurrences in a held-out dataset of the n-grams whose training
   frequency is r
Please give a formula to estimate the probability of an unseen n-gram for
each of the smoothing methods. (12 points)
(a) Add a small value λ to all types of n-grams.
(b) Subtract a constant δ from each non-zero count.
(c) Estimate by held out dataset.
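One possible set of answers in the exam's notation, where we additionally write N_0 = B − Σ_{r≥1} N_r for the number of unseen n-gram types and T for the total n-gram occurrences in the held-out dataset (both extra symbols are assumptions of this sketch):

```latex
% (a) Add-lambda smoothing: every type gains lambda pseudo-counts,
%     so an unseen n-gram (r = 0) receives
P_{\text{add-}\lambda}(\text{unseen}) = \frac{\lambda}{N + B\lambda}

% (b) Absolute discounting: each of the \sum_{r \ge 1} N_r seen types
%     gives up \delta counts; the freed mass is shared by the N_0
%     unseen types:
P_{\delta}(\text{unseen}) = \frac{\delta \sum_{r \ge 1} N_r}{N \, N_0}

% (c) Held-out estimation: the probability of a training-frequency-r
%     n-gram is its average held-out frequency; for r = 0,
P_{\text{ho}}(\text{unseen}) = \frac{T_0}{N_0 \, T}
```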
8.What are the differences between deleted interpolation and the back-off
model? Please take the computation of P(Wn|Wn-3,Wn-2,Wn-1) as an example.
(10 points)
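For reference, the two schemes can be contrasted as follows (the weights λ_i, the discounted estimate P*, and the back-off weight α are the usual notation, not symbols defined in the exam):

```latex
% Deleted interpolation always mixes estimates of every order:
\hat{P}(w_n \mid w_{n-3} w_{n-2} w_{n-1})
  = \lambda_4 P(w_n \mid w_{n-3} w_{n-2} w_{n-1})
  + \lambda_3 P(w_n \mid w_{n-2} w_{n-1})
  + \lambda_2 P(w_n \mid w_{n-1})
  + \lambda_1 P(w_n),
\qquad \textstyle\sum_i \lambda_i = 1

% The back-off model uses only the highest order with a non-zero count,
% discounting seen n-grams and passing the reserved mass down:
P_{\text{bo}}(w_n \mid w_{n-3} w_{n-2} w_{n-1}) =
\begin{cases}
  P^{*}(w_n \mid w_{n-3} w_{n-2} w_{n-1})
    & \text{if } C(w_{n-3} w_{n-2} w_{n-1} w_n) > 0 \\[4pt]
  \alpha(w_{n-3} w_{n-2} w_{n-1})\,
    P_{\text{bo}}(w_n \mid w_{n-2} w_{n-1})
    & \text{otherwise}
\end{cases}
```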
9.Given a model λ and an observation sequence O,
(a) find the probability of the sequence with Backward algorithm. (8 points)
(b) find the best path with Viterbi algorithm. (8 points)
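A compact sketch of both algorithms for a discrete HMM λ = (A, B, π); the matrix layout, variable names, and toy parameters below are our assumptions.

```python
def backward_probability(A, B, pi, obs):
    """Backward algorithm: P(O | lambda) for an HMM lambda = (A, B, pi).
    A[i][j]: transition prob, B[i][o]: emission prob, pi[i]: initial prob,
    obs: sequence of integer-coded observations."""
    n, T = len(pi), len(obs)
    beta = [[0.0] * n for _ in range(T)]
    for i in range(n):                        # initialization: beta_T(i) = 1
        beta[T - 1][i] = 1.0
    for t in range(T - 2, -1, -1):            # induction, backwards in time
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    # termination: fold in the initial distribution and first emission
    return sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(n))

def viterbi(A, B, pi, obs):
    """Viterbi algorithm: most probable state path and its probability."""
    n, T = len(pi), len(obs)
    delta = [[0.0] * n for _ in range(T)]
    psi = [[0] * n for _ in range(T)]
    for i in range(n):
        delta[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[t - 1][i] * A[i][j])
            delta[t][j] = delta[t - 1][best_i] * A[best_i][j] * B[j][obs[t]]
            psi[t][j] = best_i                # remember the best predecessor
    last = max(range(n), key=lambda i: delta[T - 1][i])
    path = [last]
    for t in range(T - 1, 0, -1):             # backtrack through psi
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, delta[T - 1][last]

# Toy 2-state HMM (all numbers are made up for illustration).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
p = backward_probability(A, B, pi, [0, 1])
path, prob = viterbi(A, B, pi, [0, 1])
```

Running the forward algorithm on the same sequence gives the same P(O | λ), which is a useful sanity check when implementing either pass.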
10.Forward probability and backward probability are often used to determine
the parameters of an HMM. Please show how this works. (10 points)
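The standard Baum-Welch re-estimation that such an answer would build on, with α_t(i) and β_t(i) the forward and backward probabilities:

```latex
% Expected state and transition occupancies from the two passes:
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)},
\qquad
\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}
                  {P(O \mid \lambda)}

% Ratios of expected counts give the re-estimated parameters:
\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}
                    {\sum_{t=1}^{T-1} \gamma_t(i)},
\qquad
\hat{b}_j(k) = \frac{\sum_{t:\, o_t = k} \gamma_t(j)}
                    {\sum_{t=1}^{T} \gamma_t(j)},
\qquad
\hat{\pi}_i = \gamma_1(i)
```

Iterating this re-estimation until convergence is the Baum-Welch (EM) training procedure for HMMs.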