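Course name: Natural Language Processing
Course type: Elective
Instructor: Hsin-Hsi Chen (陳信希)
College: College of Electrical Engineering and Computer Science
Departments: Graduate Institute of Computer Science and Information Engineering, Graduate Institute of Networking and Multimedia, Knowledge Management Program
Exam date (year.month.day): 2015.06.25
Time limit (minutes): 180
Exam questions: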
Natural Language Processing Final-Term Exam (2015.6.25)
The final-term exam is composed of 14 sets of questions, worth 145 points in
total. Please answer as many questions as you can between 9:10 and 12:10.
1. Given a sentence as follows:
"He took the cheapest flight from Taipei to Tokyo leaving in the morning."
Answer each of the following questions: (10 points)
(a) What is the argument structure of the verb "take"?
(b) What is the head noun of the object NP of "take"?
(c) What are the pre-noun-modifier and post-noun-modifier in this sentence?
2. Draw the constituent structure (i.e., phrase structure) and the dependency
structure of the sentence, "Sue bought a plant with yellow leaves",
respectively. (10 points)
3. CKY parsing and Chart parsing provide mechanisms to avoid repeated work
during parsing. What are the differences between these two parsing frameworks?
(10 points)
4. Assume we have the following grammar rules.
S→np vp, vp→tv np, vp→sv s, vp→iv, np→det n
np→I, tv→saw, sv→saw, det→her, np→her, n→duck, iv→duck
(a) What rules will be used in a bottom-up chart parser? (4 points)
(b) Please draw the chart for the sentence "I saw her duck" with bottom-up
chart parsing. (8 points)
(c) Extract all the possible parse trees from the chart. (4 points)
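As a rough way to check an answer to Question 4, the grammar can be run through a small bottom-up recognizer. The sketch below is a simplified CKY-style passive chart with an extra step for the unary rule vp→iv; it is not the active-edge chart that part (b) asks you to draw. It assumes that "S" and "s" in the rule list denote the same symbol, and the names BINARY, UNARY, LEXICON, add_with_unary, and parse are illustrative only.

```python
from collections import defaultdict

# Grammar from Question 4. Assumption: "S" and "s" are the same symbol.
BINARY = [("s", "np", "vp"), ("vp", "tv", "np"), ("vp", "sv", "s"), ("np", "det", "n")]
UNARY = [("vp", "iv")]
LEXICON = {"I": ["np"], "saw": ["tv", "sv"], "her": ["det", "np"], "duck": ["n", "iv"]}


def add_with_unary(cell, label, tree):
    """Store a tree under its label, then apply unary rules such as vp -> iv."""
    if tree in cell[label]:
        return
    cell[label].append(tree)
    for parent, child in UNARY:
        if child == label:
            add_with_unary(cell, parent, (parent, tree))


def parse(words):
    """Return every tree rooted in 's' that spans the whole word list."""
    n = len(words)
    # chart[(start, end)][label] is the list of trees for that span and label.
    chart = defaultdict(lambda: defaultdict(list))
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            add_with_unary(chart[i, i + 1], tag, (tag, word))
    # Build longer spans from shorter ones (bottom-up).
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY:
                    for left_tree in list(chart[start, mid][left]):
                        for right_tree in list(chart[mid, end][right]):
                            add_with_unary(chart[start, end], parent,
                                           (parent, left_tree, right_tree))
    return chart[0, n]["s"]


trees = parse("I saw her duck".split())
for tree in trees:
    print(tree)
```

Each tree is printed as a nested tuple whose first element is the category label; the sentence is ambiguous under this grammar, so more than one tree is printed.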
5. Both the inside algorithm and the outside algorithm can individually compute
the probability of a sentence given a probabilistic context-free grammar
(PCFG). When will both algorithms be applied? (4 points)
6. To simplify the derivation of the outside probability shown below, we
assume the grammar is in Chomsky normal form (CNF). Please specify how to
extend the idea to deal with general probabilistic context-free grammars
(PCFGs). (10 points)
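\[
\alpha_j(p, q) = \Big[ \sum_{f,\, g \neq j} \; \sum_{e=q+1}^{m} \alpha_f(p, e) \, P(N^f \to N^j N^g) \, \beta_g(q+1, e) \Big]
 + \Big[ \sum_{f,\, g} \; \sum_{e=1}^{p-1} \alpha_f(e, q) \, P(N^f \to N^g N^j) \, \beta_g(e, p-1) \Big]
\]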
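The CNF recursion in Question 6 can be checked numerically. The sketch below computes inside probabilities β and outside probabilities α with 1-based span indices (p, q) as in the formula, then uses the identity that, for any position k, the sum over j of α_j(k, k) · P(N^j → w_k) equals the sentence probability β_S(1, m). The toy grammar, its probabilities, and the example sentence are assumptions made up for illustration, as are the function names. No rule in the toy grammar has two identical children, so the g≠j restriction in the formula has no effect here.

```python
from collections import defaultdict

# Toy CNF grammar (an assumption for illustration, not part of the exam).
BINARY = {
    ("S", "NP", "VP"): 1.0, ("PP", "P", "NP"): 1.0,
    ("VP", "V", "NP"): 0.7, ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.4,
}
LEXICAL = {
    ("P", "with"): 1.0, ("V", "saw"): 1.0, ("NP", "astronomers"): 0.1,
    ("NP", "ears"): 0.18, ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
    ("NP", "telescopes"): 0.1,
}


def inside(words):
    """beta[j, p, q] = P(w_p..w_q | N^j spans p..q), computed bottom-up."""
    m = len(words)
    beta = defaultdict(float)
    for k, word in enumerate(words, start=1):
        for (nt, w), prob in LEXICAL.items():
            if w == word:
                beta[nt, k, k] += prob
    for width in range(2, m + 1):
        for p in range(1, m - width + 2):
            q = p + width - 1
            for (parent, left, right), prob in BINARY.items():
                for d in range(p, q):
                    beta[parent, p, q] += prob * beta[left, p, d] * beta[right, d + 1, q]
    return beta


def outside(words, beta, start="S"):
    """alpha[j, p, q] following the two sums of the CNF recursion."""
    m = len(words)
    alpha = defaultdict(float)
    alpha[start, 1, m] = 1.0
    # Wider spans first: a node's outside value depends on its parent's.
    for width in range(m - 1, 0, -1):
        for p in range(1, m - width + 2):
            q = p + width - 1
            for (parent, left, right), prob in BINARY.items():
                # First sum: N^j is the left child, parent spans (p, e).
                for e in range(q + 1, m + 1):
                    alpha[left, p, q] += alpha[parent, p, e] * prob * beta[right, q + 1, e]
                # Second sum: N^j is the right child, parent spans (e, q).
                for e in range(1, p):
                    alpha[right, p, q] += alpha[parent, e, q] * prob * beta[left, e, p - 1]
    return alpha


def prob_via_outside(words, alpha, k):
    """Sentence probability recovered from outside values at position k."""
    return sum(alpha[nt, k, k] * prob
               for (nt, w), prob in LEXICAL.items() if w == words[k - 1])


words = "astronomers saw stars with ears".split()
beta = inside(words)
alpha = outside(words, beta)
print(beta["S", 1, len(words)])
print([prob_via_outside(words, alpha, k) for k in range(1, len(words) + 1)])
```

All printed values should agree up to floating-point rounding, which is the consistency between the two algorithms that Question 5 alludes to.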
7. Please use the four transition-actions, i.e., shift, reduce, arc-left, and
arc-right, to derive the correct dependency structure for the sentence
"I saw her duck" in the arc-eager dependency parser. Assume the four words,
I, saw, her, and duck, have the parts-of-speech defined in Question 4.
(10 points)
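The four transitions in Question 7 can be written as a tiny state machine over a stack and a buffer. The sketch below replays a given action sequence and returns the head of each word. It assumes the usual arc-eager definitions (arc-left attaches the stack top to the front of the buffer and pops it; arc-right attaches the front of the buffer to the stack top and pushes it), an artificial ROOT node, and that every word in the sentence is distinct so words can serve as identifiers. The action sequence shown corresponds to only one reading (her as determiner, duck as noun) and is an illustration, not a model answer.

```python
def arc_eager(words, actions):
    """Replay arc-eager transitions and return a dict mapping dependent -> head."""
    stack = ["ROOT"]
    buffer = list(words)
    heads = {}
    for action in actions:
        if action == "shift":
            # Move the next input word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "arc-left":
            # Stack top becomes a dependent of the buffer front, then is popped.
            dependent = stack.pop()
            assert dependent != "ROOT" and dependent not in heads
            heads[dependent] = buffer[0]
        elif action == "arc-right":
            # Buffer front becomes a dependent of the stack top, then is pushed.
            dependent = buffer.pop(0)
            heads[dependent] = stack[-1]
            stack.append(dependent)
        elif action == "reduce":
            # Only a word that already has a head may be popped.
            assert stack[-1] in heads
            stack.pop()
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


sentence = ["I", "saw", "her", "duck"]
# One possible sequence, for the reading where "her duck" is a noun phrase.
sequence = ["shift", "arc-left", "arc-right", "shift", "arc-left", "arc-right"]
print(arc_eager(sentence, sequence))
```

The other reading (her as np, duck as iv) needs a different action sequence; swapping in another list for `sequence` is enough to try it.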
8. Word sense disambiguation (WSD) is a fundamental problem in natural
language processing. A word in a sentence may have more than one sense.
Contextual information of the target word is usually adopted to
disambiguate its use. However, the words themselves in the context may
be ambiguous too. In other words, disambiguating a word A may need the
support of a word B in A's context, while disambiguating B may in turn need
A to be disambiguated, because A is in the context of B. How do you deal
with this chicken-and-egg problem in WSD? (10 points)
9. Assume we define a university database with the following relation schemas.
Each attribute of each relation is self-explanatory.
STUDENT
┌──────┬───────────────┬───────┬───────┐
│ Name │ StudentNumber │ Class │ Major │
└──────┴───────────────┴───────┴───────┘
COURSE
┌────────────┬──────────────┬─────────────┬────────────┐
│ CourseName │ CourseNumber │ CreditHours │ Department │
└────────────┴──────────────┴─────────────┴────────────┘
PREREQUISITE
┌──────────────┬────────────────────┐
│ CourseNumber │ PrerequisiteNumber │
└──────────────┴────────────────────┘
SECTION
┌───────────────────┬──────────────┬──────────┬──────┬────────────┐
│ SectionIdentifier │ CourseNumber │ Semester │ Year │ Instructor │
└───────────────────┴──────────────┴──────────┴──────┴────────────┘
GRADE_REPORT
┌───────────────┬───────────────────┬───────┐
│ StudentNumber │ SectionIdentifier │ Grade │
└───────────────┴───────────────────┴───────┘
We plan to design a system supporting natural language access to this
university database. The system accepts queries such as "find the grade
report of John Yen in 2014", "list the prerequisites of data structures and
algorithms", and so on. Please specify the issues involved in designing such
a system. (10 points)
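One issue behind Question 9 is that a short query can touch several relations. As a sketch, the query "find the grade report of John Yen in 2014" might be mapped to the SQL below: the name lives in STUDENT, the year lives in SECTION, and the grade lives in GRADE_REPORT, so two joins are needed. The column types, the sample rows, and the choice of output columns are all assumptions for illustration; only the relation and attribute names come from the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types and sample rows are hypothetical; only the names follow the schema.
conn.executescript("""
CREATE TABLE STUDENT (Name TEXT, StudentNumber INTEGER, Class INTEGER, Major TEXT);
CREATE TABLE SECTION (SectionIdentifier INTEGER, CourseNumber TEXT,
                      Semester TEXT, Year INTEGER, Instructor TEXT);
CREATE TABLE GRADE_REPORT (StudentNumber INTEGER, SectionIdentifier INTEGER, Grade TEXT);
INSERT INTO STUDENT VALUES ('John Yen', 1, 1, 'CS');
INSERT INTO SECTION VALUES (10, 'CS101', 'Fall', 2014, 'Lee');
INSERT INTO SECTION VALUES (11, 'CS102', 'Fall', 2013, 'Wu');
INSERT INTO GRADE_REPORT VALUES (1, 10, 'A');
INSERT INTO GRADE_REPORT VALUES (1, 11, 'B');
""")

# One possible target query for "find the grade report of John Yen in 2014".
# "John Yen" must be linked to STUDENT.Name and "in 2014" to SECTION.Year.
QUERY = """
SELECT s.CourseNumber, s.Semester, g.Grade
FROM STUDENT st
JOIN GRADE_REPORT g ON g.StudentNumber = st.StudentNumber
JOIN SECTION s ON s.SectionIdentifier = g.SectionIdentifier
WHERE st.Name = ? AND s.Year = ?
"""

rows = conn.execute(QUERY, ("John Yen", 2014)).fetchall()
print(rows)
```

Deciding that "in 2014" restricts SECTION.Year rather than some other attribute, and that "grade report" names a relation rather than a column, is exactly the kind of mapping decision the question asks you to discuss.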
10. There are four types of discourse relations defined at the top level of
the Penn Discourse Treebank (PDTB): temporal, contingency, comparison, and
expansion. Please label the discourse relation in each of the following
sentences. (10 points)
(a) If he is injured, he cannot play the second half of the game.
(b) Voltaire is the leader of the Enlightenment. In addition, he is also
a prolific writer.
(c) Despite the typhoon, the police still have to patrol outside.
(d) The typhoon struck; the school has broken up.
(e) This sentence is first segmented, and then tagged with POS.
11. The following is a review for a hotel. Please identify the opinions and
aspects for the target hotel. (10 points)
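就在東京車站走路5分鐘路程,旁邊緊鄰地鐵站京橋站,對於去成田機場客人是不
錯的選擇,就是房間比較mini,位置沒得說,客人以歐美人居多
(English gloss: It is just a 5-minute walk from Tokyo Station and right next
to Kyobashi subway station, a good choice for guests heading to Narita
Airport. The rooms are rather "mini", but the location is beyond reproach.
Most of the guests are from Europe and the Americas.)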
(selected from booking.com)
12. Selectional restrictions and selectional preferences are quite useful in
many NLP applications. Please explain their use in each of the following
cases. (15 points)
(a) Used in determination of the possible semantic type of an unknown word.
(b) Used in pronominal anaphora resolution.
(c) Used in relative clause attachment.
13. Identify the co-reference chains from the following news article.
(10 points)
Pledges by institutions and individuals to purchase "green" power from
state-owned Taiwan Power Co (Taipower) have far exceeded a goal set by
the government for this year after several business heavyweights
announced their participation in the program, the Bureau of Energy said
yesterday.
On Tuesday, TSMC, the world's largest contract chip maker, announced
that it would purchase 100 million kWh of green power, accounting for
almost 13 percent of what Taipower has available.
On Thursday, I-Mei Foods pledged to buy 2.5 million kWh of green power.
The top three buyers last year were IC packaging and testing services
provider Advanced Semiconductor Engineering Inc, Fubon Financial Holding
Co and petrochemical firm Swancor Industry Co, with those three
accounting for almost 90 percent of the power purchased.
(selected from Taipei Times)
14. In the term project, you were asked to detect and correct extra words in
given sentences.
(a) Please describe your approach to the correction part. (5 points)
(b) If we plan to extend this research to detect whether any words are
missing from given sentences, please propose ideas based on what you
learned from the term project. (5 points)