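I'm currently reading an article on machine translation, and there is one short
passage I just can't make sense of. I'd be grateful for any guidance from the
more experienced members of this board.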
Error classification and annotation is carried out when the focus is on
understanding the types of errors produced by an MT system and their
frequency. An example of an error typology for the evaluation of MT
is proposed by Vilar, Xu, d'Haro, and Ney. This form of evaluation was
particularly useful when the dominant MT paradigm was rule based; that is,
it was possible to "code" linguistic rules for the transfer of words, phrases,
and grammatical structures from one source language into a target language.
The use of error typology for the more recent data-driven or statistical
machine translation is more limited because, in this case, the nature and
volume of the data, as opposed to formal linguistic rules, dictate the output
to a large extent.
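Error classification means sorting the parts the machine got wrong into
categories such as missing word, incorrect word order, and so on. This approach
is not well suited to statistical machine translation. (SMT can learn: given
several source texts paired with their translations, the machine analyzes them
and derives some kind of rule or formula, so the next time it meets a similar
sentence it is able to translate it.)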
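To make that description concrete, here is a toy Python sketch of "learning from paired examples". It is only an illustration under strong assumptions, not a real SMT system: the word pairs are assumed to be already aligned, the names `train`, `translate`, and `training_pairs` are made up for this post, and the data is invented.

```python
from collections import Counter, defaultdict

def train(aligned_pairs):
    """Count how often each source word is paired with each target word."""
    table = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        table[src][tgt] += 1
    return table

def translate(table, words):
    """Pick the most frequent target word seen in training; keep unseen words as-is."""
    return [table[w].most_common(1)[0][0] if w in table else w for w in words]

# Hypothetical, hand-made training data (assumed to be word-aligned already).
training_pairs = [("bank", "銀行"), ("bank", "銀行"), ("bank", "河岸"), ("loan", "貸款")]

table = train(training_pairs)
result = translate(table, ["bank", "loan", "river"])
print(result)  # ['銀行', '貸款', 'river']
```

In this sketch nothing is hand-coded as a rule: the output comes entirely from the counts in the training pairs, and a word that never appeared in them ("river") cannot be translated at all.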
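The part I don't quite understand is "the nature and volume of the data dictate
the output to a large extent." Given how SMT works, the volume of data clearly
matters to whether a translation gets produced, since the data is the source
from which the rules are induced. But what does "nature" (the type of data?)
have to do with it?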