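Course name: Introduction to Digital Speech Processing
Course type: Elective for Electrical Engineering / Computer Science and Information Engineering
Instructor: 李琳山 (Lin-shan Lee)
College: College of Electrical Engineering and Computer Science
Department: Electrical Engineering
Exam date (year.month.day): 108.11.09 (ROC calendar, i.e. 2019.11.09)
Time limit (minutes): 120
Exam questions:
Note: Some of the mathematical expressions below are written in LaTeX syntax.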
1. (8 pts) What is a GMM? How do we use it with HMMs for continuous speech
recognition?
2. (12 pts) Given an HMM with parameters \lambda = (A, B, \pi), an observation
sequence \bar{O} = o_1,...,o_t,...,o_T and a state sequence
\bar{q} = q_1,...,q_t,...,q_T, define
\alpha_t(i) = Prob[o_1,...,o_t, q_t = i | \lambda]
\beta_t(i) = Prob[o_{t+1},...,o_T | q_t = i, \lambda]
We usually assume Prob[\bar{O}, q_t = i | \lambda] = \alpha_t(i)\beta_t(i).
(3 pts) Show that Prob(\bar{O} | \lambda) =
\sum_{i=1}^N[\alpha_t(i)\beta_t(i)].
(3 pts) Show that Prob(q_t = i | \bar{O}, \lambda) =
\frac{\alpha_t(i)\beta_t(i)}{\sum_{i=1}^N[\alpha_t(i)\beta_t(i)]}.
(6 pts) Formulate and describe the procedure of the Viterbi algorithm for
finding the best state sequence \bar{q}^* = q_1^*,...,q_t^*,...,q_T^*.
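
The quantities in question 2 can be checked numerically on a toy model. The following is only a minimal sketch, not a model answer: the two-state discrete HMM, its parameter values and the observation sequence are all made up for illustration, and the function names (forward, backward, viterbi) are hypothetical. It computes \alpha_t(i) and \beta_t(i) as defined above, so that \sum_i \alpha_t(i)\beta_t(i) can be compared across t, and it runs a standard Viterbi recursion with backtracking.

```python
# Toy discrete HMM; every number here is an illustrative assumption.
A = [[0.7, 0.3], [0.4, 0.6]]             # A[i][j] = P(q_{t+1} = j | q_t = i)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # B[i][k] = P(o_t = k | q_t = i)
pi = [0.6, 0.4]                          # pi[i] = P(q_1 = i)
obs = [0, 1, 2, 2]                       # observation symbol indices o_1..o_T


def forward(A, B, pi, obs):
    """alpha[t][i] = P(o_1..o_t, q_t = i | lambda), with t zero-based."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = P(o_{t+1}..o_T | q_t = i, lambda), with beta at time T = 1."""
    N = len(A)
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):          # o_T, o_{T-1}, ..., o_2
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta


def viterbi(A, B, pi, obs):
    """Return (best state sequence, its joint probability with obs)."""
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]   # initialization
    backptr = []
    for o in obs[1:]:                                   # recursion
        step, new_delta = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backptr.append(step)
        delta = new_delta
    last = max(range(N), key=lambda i: delta[i])        # termination
    path = [last]
    for step in reversed(backptr):                      # backtracking
        path.insert(0, step[path[0]])
    return path, delta[last]


alpha = forward(A, B, pi, obs)
beta = backward(A, B, obs)
# Sum over states of alpha_t(i) * beta_t(i), one value per time step t.
totals = [sum(a * b for a, b in zip(alpha[t], beta[t])) for t in range(len(obs))]
# gamma[t][i] = P(q_t = i | O, lambda)
gamma = [[alpha[t][i] * beta[t][i] / totals[t] for i in range(len(pi))]
         for t in range(len(obs))]
best_path, best_prob = viterbi(A, B, pi, obs)
```

On this toy model every entry of totals should agree up to rounding, and each row of gamma should sum to one, which is what the two 3-point parts ask to be shown in general.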
3. (10 pts) Please explain how the LBG algorithm and the K-means algorithm
work, respectively. Does the K-means algorithm always yield the same result
regardless of the initialization?
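
The initialization part of question 3 can be explored with a small experiment. The sketch below is an illustration only: the one-dimensional data points, the two sets of initial centers and the function name kmeans_1d are all made up, and it implements plain K-means (alternating nearest-center assignment and mean update), not LBG.

```python
def kmeans_1d(points, centers, iters=100):
    """Plain K-means on 1-D data, started from the given initial centers."""
    centers = list(centers)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)),
                          key=lambda k: (x - centers[k]) ** 2)
            clusters[nearest].append(x)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [sum(c) / len(c) if c else centers[k]
                       for k, c in enumerate(clusters)]
        if new_centers == centers:   # converged
            break
        centers = new_centers
    return sorted(centers)


points = [0, 1, 10, 11, 20, 21]            # hypothetical data
result_low = kmeans_1d(points, [0, 1])     # both initial centers on the left
result_high = kmeans_1d(points, [20, 21])  # both initial centers on the right
```

Comparing result_low with result_high shows whether the two runs converge to the same centers on this data set.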
4. (10 pts) When training triphone acoustic models, data and parameter sharing
is a common approach to ensure that there is enough data to train each
acoustic model. Such sharing usually takes place at the state level. Please
explain what this means.
5. (15 pts) You are on an adventure in the Mabao forest. There are only four
kinds of animals in the forest: otters, foxes, squirrels and duckbills.
You know that the population percentages of these four kinds of animals are
30%, 20%, 40% and 10%, respectively.
One morning, you see a brown-colored creature with white stripes on its back
and a black tail run away swiftly, but it happens so suddenly that you cannot
clearly tell which species it is. Luckily, a previous study gives the
probability of observing each of the three characteristics on each of the
four species, as listed in Table 1, where o_1, o_2, o_3 refer to
"brown-colored", "white-striped" and "black-tailed".
Moreover, you know that for each of the four species, the three
characteristics occur independently, that is,
\forall i \neq j, o_i \perp o_j | c_k.
To make your guessing more efficient, so that you can spend most of your
time enjoying the wilderness, you decide to build a decision tree for animal
classification based on three questions: "Whether it is brown-colored",
"Whether it has white stripes" and "Whether it has a black tail". The
decision tree looks like the one in Figure 1. Please build this decision tree
by putting the three questions into the three nodes. What is the entropy
reduction resulting from the uppermost node of the tree? You are allowed to
leave the logarithmic terms in your answer instead of giving a numerical
solution.
               | p(o_1 | c_i) | p(o_2 | c_i) | p(o_3 | c_i) |
otter(c_1)     |     0.8      |     0.3      |     0.8      |
fox(c_2)       |     0.1      |     0.3      |     0.4      |
squirrel(c_3)  |     0.2      |     0.7      |     0.4      |
duckbill(c_4)  |     0.8      |     0.3      |     0.2      |
Table 1: The Posterior Probability of the Three Characteristics
                  question A
               /(T)        \(F)
       question B          question C
      /(T)    \(F)        /(T)    \(F)
  class a   class b   class c   class d
Figure 1: Sample Decision Tree
(Hint: You do not need to actually compute the entropy of the whole tree.
Instead, you should be able to come up with a "best" tree structure by simply
looking at the posterior probability of the three characteristics. Trust
your intuition!)
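
For readers who want to check their intuition for question 5 numerically, the entropy reduction of a single yes/no question at the root can be sketched as follows. This is a minimal illustration, not the official solution: it assumes the entropy reduction is H(C) minus the expected class entropy after the answer is known, uses base-2 logarithms, and the names priors, likelihood, entropy and entropy_reduction are made up here. The numbers are copied from the problem statement and Table 1.

```python
from math import log2

# Class priors from the problem statement: otter, fox, squirrel, duckbill.
priors = [0.3, 0.2, 0.4, 0.1]
# Table 1: likelihood[i][j] = p(o_{j+1} | c_{i+1}).
likelihood = [
    [0.8, 0.3, 0.8],
    [0.1, 0.3, 0.4],
    [0.2, 0.7, 0.4],
    [0.8, 0.3, 0.2],
]


def entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in dist if p > 0)


def entropy_reduction(j):
    """H(C) minus the expected entropy of C after asking question j at the root."""
    n = len(priors)
    joint_yes = [priors[i] * likelihood[i][j] for i in range(n)]
    joint_no = [priors[i] * (1 - likelihood[i][j]) for i in range(n)]
    p_yes, p_no = sum(joint_yes), sum(joint_no)
    h_after = (p_yes * entropy([p / p_yes for p in joint_yes])
               + p_no * entropy([p / p_no for p in joint_no]))
    return entropy(priors) - h_after


# One entropy reduction per candidate root question (o_1, o_2, o_3).
gains = [entropy_reduction(j) for j in range(3)]
best_question = max(range(3), key=lambda j: gains[j])  # zero-based index
```

Only the root node is covered here. Choosing the two second-level questions would additionally use the conditional independence of the characteristics given the species.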
6. (10 pts) Explain: What is entropy? What is the perplexity of a language
model with respect to a test corpus?
7. (10 pts) Explain the OOV problem and how it can be solved for high-frequency
OOV words in the Chinese language.
8. (10 pts) Explain the following two things:
(5 pts) What are excitation and formant structure? Which one is more
important in speech recognition? Why?
(5 pts) What is voiced speech? What is pitch? How is it related to the tones
in Mandarin Chinese?
9. (6 pts) Describe the precise way of measuring the recognition errors between
the following two strings in digit string recognition:
(3 pts) a as the reference and b the machine output
(3 pts) b as the reference and a the machine output
(a) 52030325 (b) 5940345
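
One common way to measure such errors is a minimum-edit-distance alignment that counts substitutions, deletions and insertions. The sketch below assumes unit cost for all three error types, which may differ from the weights used in the course or in a particular toolkit, and the function names align_counts and accuracy are hypothetical. Swapping the reference and the machine output turns deletions into insertions and changes the normalizing length.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimum-error
    alignment of hyp against ref, with unit cost for every error type."""
    R, H = len(ref), len(hyp)
    # cost[i][j] = (total errors, S, D, I) for aligning ref[:i] with hyp[:j].
    cost = [[None] * (H + 1) for _ in range(R + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, R + 1):
        cost[i][0] = (i, 0, i, 0)        # only deletions
    for j in range(1, H + 1):
        cost[0][j] = (j, 0, 0, j)        # only insertions
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            t, s, d, n = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, n)              # match, no error
            else:
                diag = (t + 1, s + 1, d, n)      # substitution
            t, s, d, n = cost[i - 1][j]
            dele = (t + 1, s, d + 1, n)          # reference digit missing
            t, s, d, n = cost[i][j - 1]
            ins = (t + 1, s, d, n + 1)           # extra digit in the output
            cost[i][j] = min(diag, dele, ins)    # fewest total errors wins
    return cost[R][H][1:]


def accuracy(ref, hyp):
    """(N - S - D - I) / N, where N is the length of the reference."""
    s, d, i = align_counts(ref, hyp)
    return (len(ref) - s - d - i) / len(ref)


a, b = "52030325", "5940345"
counts_a_ref = align_counts(a, b)   # a as the reference, b the machine output
counts_b_ref = align_counts(b, a)   # b as the reference, a the machine output
```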
10. (9 pts) Explain what beam search is. What are the advantages of using it in
a large-vocabulary continuous speech recognition system? What are the
trade-offs in choosing the beam width for it?