[Exam] 103-1 陳炳宇 Computer Organization and Structure Final Exam

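Author: h999342 (翔子)   2015-01-16 17:22:48
Course name: Computer Organization and Structure
Course type: Required course for second-year Information Management students
Instructor: 陳炳宇
College: College of Management
Department: Department of Information Management
Exam date (Y/M/D): 2015/1/13
Time limit (minutes): 180 minutes
Reward requested: Yes
(No reward is issued unless this is stated explicitly)
Exam questions: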
Computer Organization and Structure
Final Exam.
Date: 2015/1/13
1. (8%) We have a program core consisting of five conditional branches.
The program core will be executed thousands of times. Below are the outcomes
of each branch for one execution of the program core (T for taken, N for
not taken).
Branch 1: T-T-T
Branch 2: N-N-N-N
Branch 3: T-N-T-N-T-N
Branch 4: T-T-T-N-T
Branch 5: T-T-N-T-T-N-T
Assume the behavior of each branch remains the same for each program core
execution. For dynamic schemes, assume each branch has its own prediction
buffer and each buffer is initialized to the same state before each execution.
List the predictions for the following branch prediction schemes:
a. Always taken
b. Always not taken
c. 1-bit predictor, initialized to predict taken
d. 2-bit predictor, initialized to weakly predict taken
What are the prediction accuracies?
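
A minimal sketch for checking answers to question 1, not an official solution. It assumes the 2-bit predictor is a plain saturating counter (0 = strongly not taken, 1 = weakly not taken, 2 = weakly taken, 3 = strongly taken); course material may use a different state diagram, which can change the result. All function names are hypothetical.

```python
# Outcomes of the five branches for one execution of the program core.
BRANCHES = ["TTT", "NNNN", "TNTNTN", "TTTNT", "TTNTTNT"]
TOTAL = sum(len(outcomes) for outcomes in BRANCHES)


def static_predictions(outcomes, guess):
    """Always predict the same direction ('T' or 'N')."""
    return guess * len(outcomes)


def one_bit_predictions(outcomes, initial="T"):
    """1-bit predictor: predict whatever the branch did last time."""
    prediction, predictions = initial, []
    for outcome in outcomes:
        predictions.append(prediction)
        prediction = outcome
    return "".join(predictions)


def two_bit_predictions(outcomes, initial=2):
    """2-bit saturating counter (assumed encoding: 0..1 predict N, 2..3 predict T)."""
    counter, predictions = initial, []
    for outcome in outcomes:
        predictions.append("T" if counter >= 2 else "N")
        counter = min(3, counter + 1) if outcome == "T" else max(0, counter - 1)
    return "".join(predictions)


def correct_count(predict):
    """Number of correct predictions over all five branches."""
    return sum(
        p == o
        for outcomes in BRANCHES
        for p, o in zip(predict(outcomes), outcomes)
    )


SCHEMES = {
    "always taken": lambda o: static_predictions(o, "T"),
    "always not taken": lambda o: static_predictions(o, "N"),
    "1-bit, init taken": one_bit_predictions,
    "2-bit, init weakly taken": two_bit_predictions,
}

for name, predict in SCHEMES.items():
    hits = correct_count(predict)
    print(f"{name}: {hits}/{TOTAL} = {hits / TOTAL:.0%}")
```

Because every buffer is reset to the same state before each execution, the accuracy over one pass of the core is also the accuracy over thousands of passes.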
2. (2%) What is the difference between CPU and GPU? What kinds of problems are
GPUs suited to handle?
3. (12%) For a direct-mapped cache design with 32-bit address, the following
bits of the address are used to access the cache.
       Tag      Index    Offset
   a.  31-12    11-6     5-0
   b.  31-10    9-5      4-0
a. What is the cache line size (in words)?
b. How many entries does the cache have?
c. What is the ratio of the total bits required for such a cache
implementation to the data storage bits?
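
A rough sketch of how the three quantities in question 3 follow from the field widths. It assumes byte addressing, 4-byte words, and exactly one valid bit per entry with no dirty or LRU bits; none of these is stated in the question, and the function name is hypothetical.

```python
def cache_geometry(tag_bits, index_bits, offset_bits, word_bytes=4, valid_bits=1):
    """Return (line size in words, number of entries, total bits / data bits)."""
    line_bytes = 1 << offset_bits          # offset selects a byte within the line
    line_words = line_bytes // word_bytes
    entries = 1 << index_bits              # index selects one cache entry
    data_bits = line_bytes * 8
    # Assumption: each entry stores data + tag + valid bit(s) only.
    total_bits = data_bits + tag_bits + valid_bits
    return line_words, entries, total_bits / data_bits


# Field widths read off the table: design a is 20/6/6, design b is 22/5/5.
for label, fields in (("a", (20, 6, 6)), ("b", (22, 5, 5))):
    words, entries, ratio = cache_geometry(*fields)
    print(f"design {label}: {words} words/line, {entries} entries, ratio {ratio:.4f}")
```

The ratio is computed per entry; it equals the whole-cache ratio because every entry has the same layout.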
4. (15%) What is the average CPI of each of the following three schemes when
executing the code sequence below? (Note: For the pipelined scheme, there are
five stages: IF, ID, EX, MEM, and WB. We assume that reads and writes of the
register file can occur in the same clock cycle, and that the stall circuits
are available.)
add $t3, $s1, $s2
sub $t1, $s1, $s2
lw $t2, 100($t3)
sub $s1, $t1, $t2
a. single cycle scheme
b. pipelined scheme without data forwarding hardware
c. pipelined scheme with data forwarding hardware (one from EX/MEM to ALU input and the other from MEM/WB to ALU input) available
5. (8%) Consider the following code segment in C:
A = B + E;
C = B + F;
Here is the generated MIPS code for this segment, assuming all variables are
in memory and are addressable as offsets from $t0:
lw $t1, 0($t0)
lw $t2, 4($t0)
add $t3, $t1, $t2
sw $t3, 12($t0)
lw $t4, 8($t0)
add $t5, $t1, $t4
sw $t5, 16($t0)
Find the hazards in the code segment and reorder the instructions to avoid
any pipeline stalls.
6. (20%) A majority function is generated in a combinational circuit when the
output is equal to 1 if the input variables have more 1's than 0's. The
output is 0 otherwise.
a. Please write the truth table for a 4-input majority function.
b. What are the functions in sum of products forms? (you can just use
"little m" notation)
c. Please use the Karnaugh map to find the minimum sum of products form and
the minimum sum of products form for the complement.
d. Please draw the logic schematic by using AND, OR, and INVERT gates.
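
A small sketch for checking answers to question 6 by brute force. It enumerates the truth table (input a is taken as the most significant bit, an assumption about minterm ordering) and tests two candidate expressions against it; it does not perform the Karnaugh-map minimization itself, and the candidate forms are this sketch's own, not an official key.

```python
from itertools import product


def majority(a, b, c, d):
    """1 when the inputs contain more 1's than 0's (at least three 1's)."""
    return int(a + b + c + d > 2)


def candidate_sop(a, b, c, d):
    """Candidate sum of products: abc + abd + acd + bcd."""
    return int((a and b and c) or (a and b and d) or (a and c and d) or (b and c and d))


def candidate_complement_sop(a, b, c, d):
    """Candidate for the complement: a'b' + a'c' + a'd' + b'c' + b'd' + c'd'."""
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d
    return int((na and nb) or (na and nc) or (na and nd)
               or (nb and nc) or (nb and nd) or (nc and nd))


rows = list(product((0, 1), repeat=4))      # 0000, 0001, ..., 1111
minterms = [i for i, bits in enumerate(rows) if majority(*bits)]

print("a b c d | F")
for bits in rows:
    print(*bits, "|", majority(*bits))
print("minterms:", minterms)
print("SOP matches:", all(candidate_sop(*r) == majority(*r) for r in rows))
print("complement matches:",
      all(candidate_complement_sop(*r) == 1 - majority(*r) for r in rows))
```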
7. (15%) Assume the three caches below, each consisting of 16 words. Given the
series of address references as word addresses: 2, 3, 4, 16, 18, 16, 4, 2.
Please label each reference as a hit or a miss for the three caches (a),
(b), and (c) below. Assume that LRU is used as the cache replacement
algorithm and that all the caches are initially empty.
a. A direct-mapped cache with 16 one-word blocks
b. A direct-mapped cache with 4 four-word blocks
c. A four-way set associative cache with block size of one-word
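
The three configurations in question 7 can be traced with one small simulator, sketched below. It assumes the usual mapping (block number = word address divided by block size, set = block number modulo number of sets); the function name and parameters are hypothetical.

```python
ADDRESSES = [2, 3, 4, 16, 18, 16, 4, 2]   # word addresses from the question


def simulate(addresses, num_sets, ways, block_words):
    """Return a string of 'H'/'M' for each reference, using LRU replacement."""
    # Each set is a list of resident block numbers, ordered from LRU to MRU.
    sets = [[] for _ in range(num_sets)]
    results = []
    for addr in addresses:
        block = addr // block_words
        resident = sets[block % num_sets]
        if block in resident:
            resident.remove(block)          # will be re-appended as MRU
            results.append("H")
        else:
            if len(resident) == ways:
                resident.pop(0)             # evict the least recently used block
            results.append("M")
        resident.append(block)
    return "".join(results)


# Every configuration holds 16 words in total.
CONFIGS = {
    "a": dict(num_sets=16, ways=1, block_words=1),  # direct-mapped, 1-word blocks
    "b": dict(num_sets=4, ways=1, block_words=4),   # direct-mapped, 4-word blocks
    "c": dict(num_sets=4, ways=4, block_words=1),   # 4-way set associative
}

for label, cfg in CONFIGS.items():
    print(label, simulate(ADDRESSES, **cfg))
```

A direct-mapped cache is modeled as one way per set, so LRU only matters for configuration (c).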
8. (10%) Suppose we have a processor with a base CPI of 1.0, assuming all
references hit in the primary cache, and a clock rate of 5 GHz. Assume a
main memory access time of 100 ns, including all the miss handling.
Suppose the miss rate per instruction at the primary cache is 2%. How much
faster will the processor be if we add a secondary cache that has a 5 ns
access time for either a hit or a miss and is large enough to reduce the
miss rate to main memory to 0.6%?
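
A sketch of the arithmetic for question 8, under the usual reading that the 0.6% figure is a global miss rate per instruction and that every primary-cache miss pays the secondary-cache access time. Variable names are illustrative only.

```python
CLOCK_GHZ = 5.0            # 5 GHz means 5 cycles per nanosecond
BASE_CPI = 1.0
L1_MISS_RATE = 0.02        # misses per instruction at the primary cache
GLOBAL_MISS_RATE = 0.006   # misses per instruction that still reach main memory


def cycles(nanoseconds):
    """Convert an access time in ns to clock cycles."""
    return nanoseconds * CLOCK_GHZ


memory_penalty = cycles(100)   # main memory access
l2_penalty = cycles(5)         # secondary cache access

# Only a primary cache: every miss goes to main memory.
cpi_l1_only = BASE_CPI + L1_MISS_RATE * memory_penalty

# With a secondary cache: all L1 misses pay the L2 access,
# and the remaining global misses also pay the main memory access.
cpi_with_l2 = (BASE_CPI
               + L1_MISS_RATE * l2_penalty
               + GLOBAL_MISS_RATE * memory_penalty)

speedup = cpi_l1_only / cpi_with_l2
print(f"CPI without L2: {cpi_l1_only:.2f}")
print(f"CPI with L2:    {cpi_with_l2:.2f}")
print(f"Speedup:        {speedup:.2f}x")
```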
9. (10%) Please describe Amdahl's law for parallel computing and use it to
answer the following question: if 60% of a task's work is parallelizable,
what is the speedup when the task runs on 10 processors?
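
Amdahl's law gives the speedup as 1 / ((1 - p) + p / n), where p is the parallelizable fraction and n is the number of processors. A short sketch of the calculation for question 9 follows; it assumes the parallel part scales perfectly across processors, and the function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup when only `parallel_fraction` of the work is spread over `processors`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


print(f"60% parallel, 10 processors: {amdahl_speedup(0.6, 10):.3f}x")
# The serial 40% bounds the speedup no matter how many processors are added.
print(f"upper bound as n grows:      {1.0 / (1.0 - 0.6):.3f}x")
```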
