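Course name: Natural Language Processing
Course type: Departmental elective
Instructor: 陳信希
College: College of Electrical Engineering and Computer Science
Department: Department of Computer Science and Information Engineering
Exam date (Y/M/D): 2021/6/24
Time limit (minutes): 180
*Moved online due to the pandemic; online resources could be consulted
Exam questions: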
1. Given the sentence “在 夫子廟 入口 遍布 我 喜歡 的 小吃店”, please show
the results produced by (a) a constituency parser, (b) a noun phrase chunker,
and (c) a dependency parser. (15 points)
2. Assume an arc-standard dependency parser is adopted. Please show the actions
to parse the sentence “在 夫子廟 入口 遍布 我 喜歡 的 小吃店”. (10 points)
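
For readers unfamiliar with the transition system, here is a minimal sketch of how arc-standard actions manipulate the stack and buffer. It is not an answer to the question: the English toy sentence, the action sequence, and the function name `arc_standard` are illustrative assumptions, and arcs are left unlabeled for brevity.

```python
def arc_standard(words, actions):
    """Apply a sequence of arc-standard actions.

    Returns (arcs, stack, buffer), where arcs is a list of
    (head, dependent) pairs. Words are assumed to be unique strings.
    """
    stack = ["ROOT"]
    buffer = list(words)
    arcs = []
    for action in actions:
        if action == "SHIFT":
            # Move the next buffer word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # The second-from-top item becomes a dependent of the top item.
            if len(stack) < 3:
                raise ValueError("LEFT-ARC needs two non-ROOT items on the stack")
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT-ARC":
            # The top item becomes a dependent of the second-from-top item.
            if len(stack) < 2:
                raise ValueError("RIGHT-ARC needs two items on the stack")
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs, stack, buffer


# Hypothetical toy example, not the exam sentence.
toy_words = ["I", "like", "noodles"]
toy_actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
toy_arcs, toy_stack, toy_buffer = arc_standard(toy_words, toy_actions)
print(toy_arcs)
```

A parse is complete when the buffer is empty and only ROOT remains on the stack.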
3. Assume we have a set of four discourse relations – say, temporal,
contingency, comparison, and expansion, as defined in PDTB. Please judge if
“而” in each of the following sentences is a discourse connective.
If yes, please specify the relation it signals. (20 points)
(a) 1997 年發達國家經濟形勢的特點是[美國增長強勁]而[日本經濟疲弱]。
(b) 開放起了[積極]而[關鍵]的作用。
(c) [這當然不是歷史的巧合],而[是歷史的累積和轉接]。
(d) [水東開發區是適應乙烯工程的需要]而[建立的一個後繼加工基地]。
4. In recent years, there have been important advances in the quality of
state-of-the-art models, but those models are often less interpretable.
Nowadays “explainable NLP” is an emerging research topic when we develop a model.
The attention mechanism is a widely used operation to enable explanations.
Please explain how it achieves "explanation." (10 points)
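
As a rough illustration of the idea, the sketch below computes scaled dot-product attention weights for one query over a few tokens and reports the token with the largest weight. The tokens, vectors, and function names are made-up assumptions; whether such weights are a faithful explanation of a model's decision is a separate question that this sketch does not settle.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    return softmax(scores)


# Hypothetical toy inputs: one key vector per token.
tokens = ["the", "food", "was", "great"]
keys = [[0.1, 0.0], [0.5, 0.2], [0.0, 0.1], [1.0, 0.9]]
query = [1.0, 1.0]

weights = attention_weights(query, keys)
top_token = tokens[weights.index(max(weights))]
for token, weight in zip(tokens, weights):
    print(f"{token:>6}: {weight:.3f}")
```

The weights sum to one, so they can be read as a distribution over input tokens and visualized, for example as a heat map over the sentence.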
5. Newspapers have become more partisan. Some research proposes a slant
index to measure the frequency of phrases to sway readers to the left or
the right in a media outlet. Some research investigates demographic
characteristics and political attitudes of newspaper readers in Taiwan from
1992 to 2004. Their studies conclude that media are biased, i.e.,
left-wing vs. right-wing in the US and pan-green vs. pan-blue in Taiwan. Now you are
asked to design an NN model to transform a pan-green content to a pan-blue one.
Please show your idea. (10 points)
6. There are several ways to achieve semantic analysis. One possibility is a
sequence-to-sequence model to transform an NL sentence to a semantic form.
Another possibility is to extract the most important parts from an NL sentence,
such as Arg0, Arg1, and so on. Please explain the ideas behind these two
possible solutions. (15 points)
7. One major disadvantage of skip-gram and CBOW is the same representation for
different senses of a word. Do you have any ideas for capturing a suitable sense
of a word based on its context? (10 points)
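
One simple possibility, sketched below under strong assumptions, is to keep one vector per sense and pick the sense closest to the average of the context word vectors. All vectors, sense labels, and function names here are invented for illustration; contextual embedding models are another common route and are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def pick_sense(context_words, sense_vectors, word_vectors):
    """Return the sense label whose vector is closest to the context average."""
    known = [word_vectors[w] for w in context_words if w in word_vectors]
    if not known:
        raise ValueError("no context word has a vector")
    context = average(known)
    return max(sense_vectors, key=lambda s: cosine(context, sense_vectors[s]))


# Hypothetical 2-dimensional toy vectors.
word_vectors = {
    "river": [1.0, 0.0],
    "water": [0.9, 0.1],
    "money": [0.0, 1.0],
    "loan": [0.1, 0.9],
}
bank_senses = {"bank/river": [1.0, 0.0], "bank/finance": [0.0, 1.0]}

sense_a = pick_sense(["river", "water"], bank_senses, word_vectors)
sense_b = pick_sense(["money", "loan"], bank_senses, word_vectors)
print(sense_a, sense_b)
```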
8. To automatically interpret the semantics of written languages, the
analysis and understanding of causal relationships between facts stand as a
key point. The following shows three examples. The 2nd column shows a passage.
The cause and the effect extracted from the passage are shown in the 3rd and
4th columns, respectively.
https://imgur.com/a/GAyJN9h
Assume you are given a cause-effect corpus consisting of passages with
annotated cause and effect segments. You are asked to design a system to
identify the cause and effect segments from the given passage. (10 points)
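
One common way to frame this task is sequence labeling with BIO tags, where a tagger predicts a tag per token and the spans are then decoded. The sketch below covers only the decoding step; the tag names `CAUSE` and `EFFECT`, the example sentence, and the hand-written tags standing in for a model's predictions are all assumptions.

```python
def decode_spans(tokens, tags):
    """Turn per-token BIO tags into a list of (label, text) spans."""
    spans = []
    label, words = None, []

    def flush():
        # Close the span currently being built, if any.
        if label is not None:
            spans.append((label, " ".join(words)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and tag[2:] == label:
            words.append(token)
        else:
            # "O" or an inconsistent I- tag ends the current span;
            # the token itself is not added to any span.
            flush()
            label, words = None, []
    flush()
    return spans


# Hypothetical example; in practice the tags would come from a trained tagger.
tokens = ["Heavy", "rain", "caused", "severe", "flooding", "."]
tags = ["B-CAUSE", "I-CAUSE", "O", "B-EFFECT", "I-EFFECT", "O"]
spans = decode_spans(tokens, tags)
print(spans)
```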
9. For privacy and security reasons, electronic medical records (EMRs)
have to be de-identified before being released for potential applications.
According to HIPAA, 18 types of identifiable data must be removed, including
names, telephone, email addresses, IP addresses, social security numbers,
medical record numbers, and so on. Do you have any ideas to deal with this
problem? (10 points)
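
As a partial sketch, identifiers with a regular surface form can be masked with pattern matching. The patterns below are simplified assumptions (US-style phone and social security number formats, loose e-mail and IPv4 matching) and the sample text is invented. Names and other free-form identifiers would need a different approach, such as a trained named entity recognizer, which is not shown here.

```python
import re

# Simplified, illustrative patterns; real records need broader coverage.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
]


def deidentify(text):
    """Replace every pattern match with a bracketed type placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


# Hypothetical sample text.
sample = (
    "Contact jane.doe@example.com or 555-123-4567 "
    "from 192.168.0.1; SSN 123-45-6789."
)
masked = deidentify(sample)
print(masked)
```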