Foreword vii
Preface xvii
Acknowledgments xxv

Fundamentals of Computer Design
2 (2)
The Changing Face of Computing and the Task of the Computer Designer 4 (7)
11 (3)
Cost, Price, and Their Trends 14 (10)
Measuring and Reporting Performance 24 (15)
Quantitative Principles of Computer Design 39 (9)
Putting It All Together: Performance and Price-Performance 48 (8)
Another View: Power Consumption and Efficiency as the Metric 56 (1)
57 (8)
65 (2)
Historical Perspective and References 67 (23)
74 (16)

Instruction Set Principles and Examples
90 (2)
Classifying Instruction Set Architectures 92 (3)
95 (6)
Addressing Modes for Signal Processing 101 (3)
Type and Size of Operands 104 (1)
Operands for Media and Signal Processing 105 (3)
Operations in the Instruction Set 108 (1)
Operations for Media and Signal Processing 109 (2)
Instructions for Control Flow 111 (6)
Encoding an Instruction Set 117 (3)
Crosscutting Issues: The Role of Compilers 120 (9)
Putting It All Together: The MIPS Architecture 129 (7)
Another View: The Trimedia TM32 CPU 136 (6)
142 (5)
147 (1)
Historical Perspective and References 148 (24)
161 (11)

Instruction-Level Parallelism and Its Dynamic Exploitation
Instruction-Level Parallelism: Concepts and Challenges 172 (9)
Overcoming Data Hazards with Dynamic Scheduling 181 (8)
Dynamic Scheduling: Examples and the Algorithm 189 (7)
Reducing Branch Costs with Dynamic Hardware Prediction 196 (13)
High-Performance Instruction Delivery 209 (6)
Taking Advantage of More ILP with Multiple Issue 215 (9)
Hardware-Based Speculation 224 (16)
Studies of the Limitations of ILP 240 (13)
Limitations on ILP for Realizable Processors 253 (6)
Putting It All Together: The P6 Microarchitecture 259 (13)
Another View: Thread-Level Parallelism 272 (1)
Crosscutting Issues: Using an ILP Data Path to Exploit TLP 273 (1)
273 (3)
276 (4)
Historical Perspective and References 280 (24)
288 (16)

Exploiting Instruction-Level Parallelism with Software Approaches
Basic Compiler Techniques for Exposing ILP 304 (9)
313 (2)
Static Multiple Issue: The VLIW Approach 315 (4)
Advanced Compiler Support for Exposing and Exploiting ILP 319 (21)
Hardware Support for Exposing More Parallelism at Compile Time 340 (10)
Crosscutting Issues: Hardware versus Software Speculation Mechanisms 350 (1)
Putting It All Together: The Intel IA-64 Architecture and Itanium Processor 351 (12)
Another View: ILP in the Embedded and Mobile Markets 363 (7)
370 (2)
372 (1)
Historical Perspective and References 373 (17)
378 (12)

390 (2)
Review of the ABCs of Caches 392 (14)
406 (7)
Reducing Cache Miss Penalty 413 (10)
423 (12)
Reducing Cache Miss Penalty or Miss Rate via Parallelism 435 (8)
443 (5)
Main Memory and Organizations for Improving Performance 448 (6)
454 (6)
460 (9)
Protection and Examples of Virtual Memory 469 (9)
Crosscutting Issues: The Design of Memory Hierarchies 478 (4)
Putting It All Together: Alpha 21264 Memory Hierarchy 482 (8)
Another View: The Emotion Engine of the Sony Playstation 2 490 (4)
Another View: The Sun Fire 6800 Server 494 (4)
498 (6)
504 (1)
Historical Perspective and References 504 (24)
513 (15)

Multiprocessors and Thread-Level Parallelism
528 (12)
Characteristics of Application Domains 540 (9)
Symmetric Shared-Memory Architectures 549 (11)
Performance of Symmetric Shared-Memory Multiprocessors 560 (16)
Distributed Shared-Memory Architectures 576 (8)
Performance of Distributed Shared-Memory Multiprocessors 584 (6)
590 (15)
Models of Memory Consistency: An Introduction 605 (3)
Multithreading: Exploiting Thread-Level Parallelism within a Processor 608 (7)
615 (7)
Putting It All Together: Sun's Wildfire Prototype 622 (13)
Another View: Multithreading in a Commercial Server 635 (1)
Another View: Embedded Multiprocessors 636 (1)
637 (6)
643 (6)
Historical Perspective and References 649 (29)
665 (13)

678 (1)
679 (13)
Buses: Connecting I/O Devices to CPU/Memory 692 (10)
Reliability, Availability, and Dependability 702 (3)
RAID: Redundant Arrays of Inexpensive Disks 705 (5)
Errors and Failures in Real Systems 710 (6)
716 (4)
720 (11)
Benchmarks of Storage Performance and Availability 731 (6)
737 (4)
Designing an I/O System in Five Easy Pieces 741 (13)
Putting It All Together: EMC Symmetrix and Celerra 754 (6)
Another View: Sanyo VPC-SX500 Digital Camera 760 (3)
763 (6)
769 (1)
Historical Perspective and References 770 (18)
778 (10)

Interconnection Networks and Clusters
788 (5)
793 (9)
Interconnection Network Media 802 (3)
Connecting More Than Two Computers 805 (9)
814 (7)
Practical Issues for Commercial Interconnection Networks 821 (4)
Examples of Interconnection Networks 825 (5)
830 (4)
Crosscutting Issues for Interconnection Networks 834 (4)
838 (5)
843 (12)
Putting It All Together: The Google Cluster of PCs 855 (7)
Another View: Inside a Cell Phone 862 (5)
867 (3)
870 (1)
Historical Perspective and References 871
877

Appendix A Pipelining: Basic and Intermediate Concepts
A-2
A.2 The Major Hurdle of Pipelining: Pipeline Hazards A-11
A.3 How Is Pipelining Implemented? A-26
A.4 What Makes Pipelining Hard to Implement? A-37
A.5 Extending the MIPS Pipeline to Handle Multicycle Operations A-47
A.6 Putting It All Together: The MIPS R4000 Pipeline A-57
A.7 Another View: The MIPS R4300 Pipeline A-66
A-67
A.9 Fallacies and Pitfalls A-77
A-78
A.11 Historical Perspective and References A-78
A-81

Appendix B Solutions to Selected Exercises
B-2
B-2
B-7
B-11
B-16
B-21
B-25
B-29
B-30
B-35

Online Appendices (www.mkp.com/CA3/)
Appendix C A Survey of RISC Architectures for Desktop, Server, and Embedded Computers
Appendix D An Alternative to RISC: The Intel 80x86
Appendix E Another Alternative to RISC: The VAX Architecture
Appendix F The IBM 360/370 Architecture for Mainframe Computers
Appendix G Vector Processors
Appendix H Computer Arithmetic
Appendix I Implementing Coherence Protocols

References R-1
Index I-1