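Course name: Introduction to Digital Speech Processing
Course type: Elective
Instructor: Prof. Lin-shan Lee (李琳山)
College: College of Electrical Engineering and Computer Science
Departments: Department of Electrical Engineering, Graduate Institute of Computer Science and Information Engineering, Graduate Institute of Networking and Multimedia
Exam date (Y/M/D): 2018/11/28
Time limit (minutes): 120
Exam questions: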
Introduction to Digital Speech Processing, Midterm Exam
Nov. 28, 2018, 10:00-12:00
● OPEN Lecture Power Point (Printed Version) and Personal Notes
● You must answer all questions in CHINESE sentences, but
you may use English terminology
● Total points: 100
1. Take a look at the block diagram of a speech recognition system in
Figure 1.
https://i.imgur.com/9snxD8J.png
Figure 1: A Speech Recognition System
(a) In the block of front-end processing, why do we use the
filter-bank? (4%)
(b) Explain the roles of the acoustic models, lexicon, and language
model in Figure 1. (12%)
(c) Why do we need smoothing in the language model? (2%)
(d) Which part includes the HMM-GMM? (2%)
2. Given an HMM $\lambda = (A, B, \pi)$ with $N$ states, an observation sequence
$O = o_1 o_2 \dots o_t \dots o_T$ and a state sequence $q = q_1 q_2 \dots q_t \dots q_T$,
define
$$\alpha_t(i) = \mathrm{Prob}[o_1 o_2 \dots o_t,\ q_t = i \mid \lambda]$$
$$\beta_t(i) = \mathrm{Prob}[o_{t+1} o_{t+2} \dots o_T \mid q_t = i,\ \lambda]$$
(a) What is $\sum_{i=1}^{N} \alpha_t(i)\,\beta_t(i)$? Show your results. (4%)
(b) What is $\dfrac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$? Show your results. (4%)
(c) What is $\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$? Show your results. (4%)
(d) Formulate and describe the Viterbi algorithm to find the best
state sequence q* = q1*q2* ... qt* ... qT* giving the highest
probability Prob[O, q*|λ]. Explain how it works and why
backtracking is necessary. (4%)
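The quantities in Problem 2 can be checked numerically. The following is a minimal sketch, assuming a hypothetical 2-state HMM with discrete observations; the values of `A`, `B`, `pi`, and `O` are made up for illustration and are not part of the exam. It computes the forward and backward variables, the expressions from parts (a) to (c), and a Viterbi path with backtracking.

```python
import numpy as np

# Hypothetical 2-state HMM with 3 discrete symbols (illustrative values only).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])           # A[i, j] = a_ij = P(q_{t+1} = j | q_t = i)
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])      # B[j, k] = b_j(k) = P(o_t = k | q_t = j)
pi = np.array([0.6, 0.4])            # initial state distribution
O = [0, 1, 2, 2]                     # observation sequence as symbol indices


def forward(A, B, pi, O):
    """alpha[t, i] = P(o_1 .. o_t, q_t = i | lambda)."""
    alpha = np.zeros((len(O), len(pi)))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, len(O)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha


def backward(A, B, O):
    """beta[t, i] = P(o_{t+1} .. o_T | q_t = i, lambda)."""
    beta = np.ones((len(O), A.shape[0]))
    for t in range(len(O) - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta


def viterbi(A, B, pi, O):
    """Return the best state sequence q* and P(O, q* | lambda)."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))             # best score of any path ending in state j at time t
    psi = np.zeros((T, N), dtype=int)    # best predecessor of state j at time t
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    # Backtracking: the best final state is only known at time T, so the
    # path is recovered by following the stored predecessors backwards.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())


alpha = forward(A, B, pi, O)
beta = backward(A, B, O)
likelihood = alpha[-1].sum()               # P(O | lambda)
per_t = (alpha * beta).sum(axis=1)         # part (a): one value per t
gamma = alpha * beta / per_t[:, None]      # part (b): P(q_t = i | O, lambda)
t = 0
xi_num = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]  # part (c)
best_path, best_prob = viterbi(A, B, pi, O)

print(per_t, likelihood)
print(best_path, best_prob)
```

In this sketch every entry of `per_t` matches `likelihood`, each row of `gamma` sums to one, and the entries of `xi_num` sum to `likelihood`, which is consistent with reading the three expressions as P(O | λ), P(q_t = i | O, λ), and P(O, q_t = i, q_{t+1} = j | λ).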
3. Explain what a tree lexicon is and why it is useful in speech
recognition. (8%)
4. (a) Given a discrete-valued random variable $X$ with probability
distribution
$$\{p_i = \mathrm{Prob}(X = x_i),\ i = 1, 2, 3, \dots, M\}, \quad \sum_{i=1}^{M} p_i = 1$$
Explain the meaning of $H(X) = -\sum_{i=1}^{M} p_i\,[\log(p_i)]$. (4%)
(b) Explain why and how H(X) above can be used to select the
criterion to split a node into two in developing a decision
tree. (4%)
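One common way to use H(X) as a splitting criterion is to pick the question that gives the largest entropy reduction. The sketch below assumes that reading; the helper names `entropy` and `split_gain` and the toy labels are hypothetical, and base-2 logarithms are an arbitrary choice.

```python
import math
from collections import Counter


def entropy(labels):
    """H(X) = -sum_i p_i * log2(p_i), with p_i estimated from label counts."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_gain(labels, answers):
    """Entropy reduction when a yes/no question splits a node into two children."""
    yes = [x for x, a in zip(labels, answers) if a]
    no = [x for x, a in zip(labels, answers) if not a]
    if not yes or not no:
        return 0.0                      # the question does not split the node
    n = len(labels)
    weighted = len(yes) / n * entropy(yes) + len(no) / n * entropy(no)
    return entropy(labels) - weighted


# Toy node with two classes and two candidate questions (made-up data).
labels = ["a", "a", "b", "b"]
by_q1 = [True, True, False, False]      # separates the classes perfectly
by_q2 = [True, False, True, False]      # leaves both children fully mixed

print(split_gain(labels, by_q1), split_gain(labels, by_q2))
```

Under this criterion the first question would be chosen, since it leaves both children pure while the second leaves the uncertainty unchanged.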
5. (a) What is the perplexity of a language source? (4%)
(b) What is the perplexity of a language model with respect to a
corpus? (4%)
(c) How are they related to a "virtual vocabulary"? (4%)
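The two notions of perplexity in Problem 5 can be illustrated with a small sketch. It assumes the usual definitions: 2 raised to the entropy for a source, and 2 raised to the negative average log2-probability the model assigns to each word of the corpus. The function names and probabilities are hypothetical.

```python
import math


def source_perplexity(probs):
    """2**H for a memoryless source with the given symbol probabilities."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy


def model_perplexity(word_probs):
    """Perplexity of a model on a corpus.

    word_probs[i] is the probability the model assigned to the i-th word
    of the corpus given its history.
    """
    avg_log2 = sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** (-avg_log2)


# A uniform source over 8 words, and a model that gives every corpus word
# probability 0.25 (made-up numbers).
print(source_perplexity([1 / 8] * 8))
print(model_perplexity([0.25] * 10))
```

The "virtual vocabulary" reading is that a perplexity of PP makes each word choice, on average, as hard as picking uniformly among PP equally likely words.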
6. Please answer the following questions.
(a) Explain what a triphone is and why it is useful. (4%)
(b) Explain why and how the unseen triphones can be trained using
decision trees. (4%)
7. What is the prosody of speech signals? How is it related to
text-to-speech synthesis? (6%)
8. Explain why and how beam search and two-pass search are useful in
large vocabulary continuous speech recognition. (8%)
9. Briefly describe the LBG algorithm and the K-means algorithm.
Which of the two algorithms usually performs better? (Explain
your answer with descriptions, not with formulas alone.) (8%)
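The difference between the two algorithms in Problem 9 lies mainly in initialization, which the sketch below tries to make concrete. It assumes squared Euclidean distortion, an additive split offset `eps`, and a codebook size that is a power of two; all names and the synthetic data are hypothetical.

```python
import numpy as np


def kmeans(X, centroids, n_iter=20):
    """Refine the given initial centroids with nearest-neighbour/centroid updates."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)               # nearest centroid per vector
        for k in range(len(centroids)):
            members = X[assign == k]
            if len(members):                    # keep the old centroid if the cell is empty
                centroids[k] = members.mean(axis=0)
    return centroids


def distortion(X, centroids):
    """Average squared distance from each vector to its nearest centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()


def lbg(X, size, eps=0.01):
    """Grow the codebook 1 -> 2 -> 4 -> ... by splitting, then refine with K-means."""
    codebook = X.mean(axis=0, keepdims=True)    # start from the global centroid
    while len(codebook) < size:
        codebook = np.vstack([codebook + eps, codebook - eps])  # split every centroid
        codebook = kmeans(X, codebook)
    return codebook


# Synthetic 2-D data around four made-up centres.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centres])

random_init = X[rng.choice(len(X), size=4, replace=False)]
km_codebook = kmeans(X, random_init)            # plain K-means, random start
lbg_codebook = lbg(X, 4)                        # LBG, start from the global mean

print(distortion(X, km_codebook), distortion(X, lbg_codebook))
```

Plain K-means depends on where the initial centroids happen to fall, while LBG derives each stage's starting point from the previous, smaller codebook; the printed distortions only show one run and should not be taken as a general comparison.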
10. Homework problems (You can choose either HW2-1 or HW2-2 to answer.)
HW2-1
(a) We added the sp and sil models in HW2-1. How can they be used in
digit recognition? (2%)
(b) Write down two methods to improve the baseline digit
recognizer and explain the reasons. (4%)
HW2-2
(a) Why do we use Right-Context-Dependent Initials/Finals as labels?
(2%)
(b) What characteristics can we use to help distinguish the Initials
and Finals? (4%)