Handbook of Computer Architecture

Author: Chattopadhyay Anupam
Publication date: 2025
Publisher: Springer Nature
Number of pages: 1465
File size: 19.8 MB
File type: PDF

Preface....5

Acknowledgments....7

Contents....8

About the Editor....12

Section Editors....14

Contributors....16

Part I Single Core Processors....21

1 Microarchitecture....22

Contents....22

Introduction....23

Single-Cycle Processor Design....24

Processor Data Path....25

Processor Control Unit....30

Pipelining....31

Pipeline Principle and Performance Metrics....31

Pipelined Processors....35

Pipeline Hazards....37

Data Hazards....38

Control Hazards....44

Structural Hazards....54

Multiple-Issue Processor....54

Conclusions....62

References....64

2 The Architecture....65

Contents....65

Introduction....66

Terms and Notations....68

Laws and Models in Microprocessor/System-on-Chip (SoC) Architectures....68

ISA Selection and Considerations....72

CISC: Complex Instruction Set Computer....72

The Baseline: Looking at the ISA of the 8088 and the 8086 Processors....72

IA32 Architectures....75

Extending the Architecture to 64 Bits (X86-64 ISA)....75

IA-64 Registers....76

Adjusting the Architecture to Support New Technologies....77

Summary....79

RISC: Reduced Instruction Set Computer....79

MIPS....80

SPARC ISA....81

ARM: Advanced RISC Machines....82

ARM7-32 Bits....84

AA64 Architecture....84

Summary of ARM ISA....87

The RISC-V Approach for ISA....87

RISCV: Basic ISA (RISCV 2021)....89

RISCV: Extensions (RISCV 2021)....89

Extension M: Integer Multiplication and Division....90

Extension A: Atomic Instructions....90

Extension F: Single-Precision Floating Point....90

Summary of ISA Selection....91

Vector and SIMD Extensions....91

SIMD Architectures....92

MMX....93

Streaming SIMD Extensions (SSE)....93

Advanced Vector Extensions (AVX)....94

Support for Machine Learning....95

Discussion on the Use of SIMD Operations (in Intel's Cores)....96

Support for Vectors....96

Cross-Layers Optimizations....97

Background....98

Delayed Branch in MIPS....98

The User-Defined Microcode Programming....99

VLIW Architectures....99

HW/SW Codesign: The CUDA Approach....100

ISA Agnostic Systems....102

The Use of Intermediate Representations....102

Binary Translation....103

Summary....104

References....104

3 Architectures for Self-Powered Edge Intelligence....107

Contents....107

Evolution of Edge Intelligence and a Pathway to Self-Powered Intelligent Computations....108

Architectures for Energy Harvesting in IoT Edges....110

A Self-Powered Image Sensor System with Autonomous Mode Management (AMM)....111

Factors Affecting Self-Power Performance....112

Effects of a Processing Pipeline....112

Effects of Unit Pixel Size....112

Effects of SRAM Leakage Energy....113

Effects of Power Converter Efficiency....113

ROI-Aware Image Processing Architecture....113

Moving Object Detection Architecture....114

Low-Power Moving Object Detection....114

Noise-Robust Moving Object Detection....115

ROI-Based Coding Architecture....115

Temporal ROI-Based Coding....115

Spatial ROI-Based Coding....116

Resource-Aware Control of Target Data Rate....116

Conventional Target Data Rate Control....116

Energy- and Content-Aware Target Data Rate Control....117

Resource-Aware Control of Encoding Data Rate....118

Challenges in Data Rate Control....118

Low-Power Data Rate Control....119

Architectural Support for Handling Sparsity in IoT Devices....119

Approaches in Matrix Multiplication....120

Inner Product-Based Approach....120

Outer Product-Based Approach....121

Compressed Sparse Formats....121

Recent Hardware Architecture for Handling Sparsity....122

Hardware Architecture for Inner Product Approach....123

Hardware Architecture for Outer Product Approach....126

Architectures for Power-Gating-Based Active Leakage Control....129

Overview of Power-Gating....129

Challenges and Trade-Offs in Power-Gating....130

Power-Gating Efficiency Learner....133

Self-Adaptive Power-Gating Architecture....134

Test Chip and Measurement Results....135

Conclusion and Future Roadmap....138

References....139

4 Real-Time Scheduling for Computing Architectures....144

Contents....144

Real-Time Operating System (RTOS)....145

Introduction to Key OS Features....145

Introduction to Real-Time Systems....147

Real-Time CPU Scheduling....149

Scheduling on Single-Core CPUs....149

Scheduling on Multi-core CPUs....151

Real-Time Scheduling for CPU-GPU Systems....154

GPU Background....154

GPU Hardware Architecture....155

Threading Model....156

Scheduling Tasks on a Single GPU....157

Intra-SM Resource Allocation....158

Inter-SM Resource Allocation....159

Memory Transfer Between Device and Host....160

Multi-GPU and CPU-GPU Scheduling....160

Multiple GPUs Controlled by One Host....161

Heterogeneous Systems as DAGs....161

Splitting Tasks Between CPUs and GPUs....162

Application Domains....163

Graphics Processing....163

Cloud Systems....163

Tools and Frameworks....164

NVIDIA and CUDA....164

AMD and ROCm....164

OpenCL....164

Alternative Architectures....165

Processing in Memory....165

FPGAs as Accelerators....165

Real-Time Edge Computing Systems....165

Introduction to Edge Computing....165

The Edge Architecture....167

Real-Time Edge Computing....168

Resource Allocation in Real-Time Edge....169

Contention Model....169

Tiered Architecture....171

Model Parameters....172

Introduction to Real-Time Networks....173

Real-Time Wired Networks....173

Real-Time Wireless Networks....175

Real-Time Flow....178

Routing and Scheduling in Real-Time Wireless Sensor Networks....180

RAP Routing Protocol....181

SPEED Routing Protocol....181

Summary....181

References....182

5 Secure Processor Architectures....188

Contents....188

Introduction....189

Modern CPU Microarchitecture....190

Micro-architectural Attacks....193

Transient Micro-architectural Attacks....196

Meltdown and Spectre-Like Attacks....198

Micro-architectural Data Sampling Attacks....201

Countermeasures....206

Prevention-Based Countermeasures....206

Detection-Based Countermeasures....211

Conclusions....211

References....212

6 Bus and Memory Architectures....217

Contents....217

Introduction....218

SoC Overview....218

Processor Overview....219

CPU Types....220

Balanced Processor Architectures....221

CPU Memory Parallelism....222

MSHRs....222

Memory-Level and Memory Hierarchy Parallelism (MLP and MHP)....223

Parallelism to DRAM....224

Accelerators....224

On-Chip Connectivity....225

Interconnect Interfaces....225

Interconnect Topologies....226

Off-Chip Connectivity....227

Summary and Conclusion....227

References....227

Part II Application-Specific Processors....229

7 Architectures for Multimedia Processing: A Cross-Layer Perspective....230

Contents....230

Introduction and Overview of Video Codecs....231

High Efficiency Video Coding....233

Overview of the Standard....233

Analysis of Computational Complexity, Memory Requirements, and Processor Temperature....235

Hardware and Software Architectures for Video Coding....239

Complexity Reduction....242

Low-Power Memory Architectures....242

Workload Balancing for Multiple Video Tiles....244

Dynamic Thermal Management for HEVC....244

Future Directions....247

Conclusions....249

References....249

8 Post-Quantum Cryptographic Accelerators....252

Contents....252

Introduction....253

Post-Quantum Cryptography (PQC)....255

NIST Post-Quantum Cryptography Standardisation Project....255

Initial Submissions....256

NIST's PQC Round1....256

NIST's PQC Round2....256

NIST's PQC Round3....256

Classes of Post-Quantum Cryptography....256

Code-Based....257

Multivariate-Based....257

Hash-Based....257

Isogeny-Based....257

Lattice-Based....258

Lattice-Based Cryptography Primitives....259

Lattices....259

Computational Problems on Lattices....259

Average-Case Problems on Standard Lattices....260

Classes of Lattices....261

Ring-LWE Based PKE Scheme....262

Computationally Intensive Components of LWE (and Variants)....263

Discrete Gaussian Sampling....264

Polynomial Multiplication....265

Schoolbook Algorithm....265

Number Theoretic Transform (NTT)....266

Barrett's Reduction....268

Coprocessors for the Lattice-Based Cryptography....269

General Optimisation Strategies....269

Performance Benchmarks....270

Coprocessors Design Paradigms for Lattice-Based Cryptography....271

Optimization Strategies for Implementation of Underlying Components....276

Discrete Gaussian Sampling....276

Polynomial Multiplication....278

Physical Protection of Lattice-Based Cryptography....281

Timing Attacks....282

Power Analysis Attacks....282

Fault Attacks....283

Challenges in the Post-Quantum Cryptography Adaptation....284

Conclusions....285

References....286

9 Fault Tolerant Architectures....291

Contents....291

Introduction....292

Faults, Errors, and Failures....295

Fault Model....295

Fault Mechanisms....296

External Faults....296

Aging/Stress-Induced Faults....297

Fault Masking....298

Reliability....300

Types of Reliability....300

Reliability Estimation....301

Fault Tolerance....303

Fault Tolerance Activities....304

Redundancy....305

Fault-Tolerant Computation....308

Single-Core Computing....308

Multicore Computing....310

Reconfigurable Computing....311

Fault-Tolerant Memory/Storage....312

Cache/On-chip SRAM....313

Main Memory/DRAM....313

Storage....313

Fault-Tolerant On-Chip Communication....314

Cross-Layer Reliability....315

Domain-Specific Fault Tolerance....317

Signal Processing....317

Wireless Communication....318

Fault Tolerance in Emerging Technologies....318

Emerging Memory Technologies....318

Reliability Issues in NVMs....320

Read Disturb Issue in OxRRAM....320

Thermal Issues due to PCM's High Voltage Operations....322

Fault Tolerance in AI/ML....323

Built-In Error Tolerance of Machine Learning Models....323

Fault Tolerance via Self-Repair....324

Conclusion....328

Glossary....328

References....330

10 Architectures for Machine Learning....335

Contents....335

Introduction....336

Architectures for Neuromorphic Computing....338

Biological Computing Models and Learning Methods....338

Microarchitecture for Neuromorphic Computing....344

Circuit-Level Design Considerations....349

Prominent Neuromorphic Chips....357

SpiNNaker....357

Neurogrid....358

BrainScaleS....359

LaCSNN....359

TrueNorth....360

Loihi....361

ODIN....362

Tianjic....362

Architectures for Artificial Neural Networks....363

Design Metrics for ANN Architectures....364

Design Abstractions and Trade-Offs....369

Selective ANN Architectures and Circuits....371

Architectures for Classic Machine Learning....384

Conclusions....385

References....386

11 Computer Arithmetic....394

Contents....394

Introduction....395

Definitions....398

Radix....398

Positional Notation....399

Absolute Error....399

Relative Error....399

Numerical Precision....400

Units in the Last Place....400

Machine Epsilon....400

Floating-Point Operations Per Second....400

Integer Arithmetic....400

Gray Code....401

Unary Code....402

Fixed-Point Arithmetic....402

Floating-Point Arithmetic....403

IEEE 754....403

Subnormal Numbers....404

Exceptions....404

Not a Number-NaN and Infinity....405

Quiet NaN....405

Signaling NaN....405

Rounding Modes....405

Floating-Point Approximate Circuits....406

Posit Arithmetic....407

Other Formats....408

BF16....408

TensorFloat-32....409

Hardware Implementations....409

Adders....409

Ripple-Carry Adder....409

Carry-Lookahead Adder....410

Multipliers....410

Dividers....411

Square Root....411

Conclusion....411

References....412

12 Architectures for Scientific Computing....414

Contents....414

Introduction....415

Definitions....416

Scientific Computing....416

Multicore Architectures....417

Manycore Architectures....418

Field-Programmable Gate Arrays....418

Coarse-Grained Reconfigurable Architectures....418

Custom Architectures....420

Multicore Architectures....420

General Purpose Graphics Processing Units....421

Field-Programmable Gate Arrays....423

Coarse-Grained Reconfigurable Architectures....423

Conclusion....425

References....425

Part III Multicore and Reconfigurable Architectures....428

13 Field-Programmable Gate Array Architecture....429

Contents....429

Introduction....430

Methodology and Tools for FPGA Architecture Evaluation....432

Key FPGA Applications....434

Programmable Logic Blocks....435

Programmable Routing....442

Programmable IO....447

Programmable Clock Distribution Networks....449

On-chip Memory....451

DSP Blocks....459

Processor Subsystems....465

System-Level Interconnect: Network-on-Chip....467

Interposers....469

Configuration and Security....471

Conclusion....472

References....472

14 Coarse-Grained Reconfigurable Array (CGRA)....476

Contents....476

Introduction....477

Historical Context....479

Architecture: A Landscape of Modern CGRA....482

Compilation for CGRAs....486

Modulo Scheduling and Modulo Routing Resource Graph (MRRG)....486

CGRA Mapping Approaches....488

Heuristic Approaches....489

Mathematical Optimization Techniques....494

Graph-Theory-Inspired Techniques....495

Other Compilation-Related Issues....503

Challenges Related to Data Access....503

Nested Loop Mapping....505

Application-Level Mapping....507

Handling Loops with Control Flow....510

Scalable CGRA Mapping....510

Conclusions....511

References....511

15 Dynamic and Partial Reconfiguration of FPGAs....517

Contents....517

Introduction....518

FPGA Configuration....520

Designing Partially Reconfigurable Systems....522

Managing Partial Reconfiguration....527

Applications of Dynamic Partial Reconfiguration....529

Computing Infrastructure and Virtualization....530

Design Compilation....531

Adaptive Systems....532

Machine Learning....534

Reliability and Harsh Environments....534

Research Directions....535

Conclusions....536

References....536

16 GPU Architecture....541

Contents....541

Introduction....542

Graphics Pipeline....543

GPU for General-Purpose Computing....546

Execution Model....546

Programming Interface....548

Hardware Architecture....550

Shader Pipeline....550

Register File....552

Warp Scheduler....553

SIMT Stack....554

Memories....555

Global Memory....555

Constant Memory and Texture Memory....556

Shared Memory....556

L1 and L2 Caches....557

Optimization Use Case: Access-Aware Variable Mapping to Memory....557

Recent Research on GPU Architecture....560

Performance....560

Hiding Memory Access Latency with Advanced Warp Schedulers....560

Throttling Memory Access Latency....561

Energy Efficiency....562

Revisiting Compute Cores and Pipeline....563

Revisiting Register File....564

Reliability....565

Run-Time Error Detection and Correction....566

Fault Analysis....567

Conclusion....567

References....567

17 Power Management of Multicore Systems....570

Contents....570

Introduction....571

Power Dissipation in Multicore Systems....573

Causes and Effects of Power Dissipation....573

Power Dissipation in Multicore Systems....576

Common Power Reduction Methods....577

Hardware....577

Firmware....578

Dynamic Voltage and Frequency Scaling (DVFS)....578

Dynamic Power Management (DPM)....579

Virtualization....580

Software....580

Task Migration....580

Task Scheduling....580

Data Forwarding....580

Power Management: Embedded Systems....581

Energy Minimization....581

Thermal Management....583

Reliability Improvement....587

Power Management: Desktop and Servers....589

ACPI Standard....590

Power Schemes: Governors....591

Power Management: High-Performance Computing (HPC) Data Centers....592

Fast Heuristics....593

Heuristics Using Design-Time Profiling....593

Machine Learning....594

Network Technologies....594

Recent Advances in Multicore Power Management....594

2.5D/3D Systems....594

Cross-Layer Approach....595

Emerging Technologies....595

AI-/ML-Based Power Management....595

Conclusion....596

Glossary....596

References....597

18 General-Purpose Multicore Architectures....603

Contents....603

Introduction....604

Motivating the Need for Concurrent Processing....606

Classifying Parallel Computing Hardware....606

Multiprocessing....607

Thread-Level Parallelism Within an Application....609

What to Do With All These Transistors?....612

Multicore CPU Hardware Design....614

Optimizing CPU Cores for Parallelism....614

Sharing Caches and Main Memory....617

Coordinating Memory Requests Across Cores....622

Scaling to Many Cores....622

Managing Memory....623

Shared-Memory Model....625

Main Memory Policies....626

Mitigating Interference....629

Cache Coherence....630

Memory Consistency Models....634

Optimizing Operating Systems for Multicore CPUs....636

Evaluating Multicore CPUs....639

The Evolution of Multicore CPUs....643

Systems-on-Chip....643

Heterogeneous CPU Cores....645

Chiplet-Based Multicore Design....646

Conclusion....648

References....648

Part IV Emerging Computing Architectures....652

19 Compute-in-Memory Architecture....653

Contents....653

Introduction....654

DNN Basics and Corresponding CIM Principle....656

Architecture and Algorithm Techniques for CIM....658

Hierarchical Architecture of CIM....658

Network Mapping Strategies....659

Mapping Methods for Inference....660

Mapping Method for Training....663

Number Representation in CIM Architecture....663

Pipeline Design in CIM Architecture....666

Intra-Layer Pipeline....667

Inter-Layer Pipeline....667

Quantization Techniques in CIM Architectures....670

Hardware Implementations for CIM Architecture....673

Device Technologies....673

SRAM....673

Two Terminal eNVM....675

Three-Terminal eNVM....676

Overcoming the Non-idealities from eNVM....677

Circuit Techniques for CIM....678

Memory Modification....678

Input Encoding....679

Output Sensing....681

Frameworks for Evaluating CIM Designs....687

Conclusion....688

References....689

20 Design Automation Techniques for Microfluidic Biochips....693

Contents....693

Introduction....694

Flow-Based Microfluidic Biochips....696

Design Tasks for FBMBs....697

Architecture Design of the Flow Layer....697

Architecture Design of the Control Layer....699

Design Automation for FBMBs....699

Synthesis Methods for the Flow Layer....699

Synthesis Methods for the Control Layer....702

Synthesis Methods for the Codesign of the Control and Flow Layers....705

Digital Microfluidic Biochips....707

Technology Platforms and Applications....707

Synthesis Methods....710

Scheduling and Module Placement....711

Droplet Routing....712

MEDA Biochips....713

Hardware Implementation....714

MEDA Evolution....717

Synthesis Methods....718

Scheduling and Placement for MEDA Biochips....718

Droplet Routing and Extension for MEDA....721

Conclusion....724

References....725

21 Architectures for Quantum Information Processing....729

Contents....729

Introduction....730

Background....731

Quantum Bits (Qubits)....732

Quantum Gates....733

Quantum Error....733

Gate Error....734

Relaxation and Dephasing....734

Measurement Error....734

Crosstalk Error....735

Quantum Hardware....735

Qubit Technologies....736

Superconducting Qubits....736

Trapped-Ion Qubits....737

Spin Qubits....738

Quantum Algorithms....738

Algorithms Designed for Fault-Tolerant Quantum Computers....739

Shor's Algorithm....739

Grover's Algorithm....740

Algorithms for NISQ Computers....740

Variational Quantum Eigensolver or VQE....740

Quantum Approximate Optimization Algorithm or QAOA....741

Quantum Software....741

Quantum Program, Quantum Instruction Sets, and Software Development Kits....741

Quantum Programming Languages....742

Quantum Annealing....744

Compilation, Mapping, and Optimization....745

Superconducting Quantum Computers....746

Coupling Constraints and Need for SWAP Operation....746

Compilation and Optimization....747

Trapped-Ion Quantum Computers....747

Shuttle Operation....747

Compilation and Optimization....748

Considerations for Noisy Systems....749

Technology Agnostic Work....749

Noise-Aware Qubit Mapping....749

Measurement Error Mitigation....750

Superconducting-Specific Work....750

Crosstalk Mitigation....750

Leveraging Extended Native Gates....751

Application-Specific Compilation....751

Conclusion....752

References....752

22 Design and Tool Solutions for Monolithic Three-Dimensional Integrated Circuits....756

Contents....756

Introduction....757

Monolithic 3D IC Design Flow....758

Motivation and Background....758

Benefit Trends of Monolithic 3D ICs Across Technology Nodes....759

Analysis on Benefits of Monolithic 3D ICs....759

Technology Nodes and Design Libraries....759

Implementation Methodology....760

Power Saving Trend of Monolithic 3D ICs....761

Analysis of Trends....763

M3D Power Saving at Low Frequency....763

M3D Power Saving at High Frequency....765

A Design-Aware Partitioning Approach to Monolithic 3D IC with 2D Commercial Tools....766

Implementation Methodology....767

Design-Aware Partitioning Stage....767

MIV Planning Stage....769

Cascade-2D Stage....770

Impact of New Monolithic 3D IC Design Flow....773

Power and Performance Benefit....773

Comparison to Shrunk-2D Design Flow....774

Power Supply Integrity of Monolithic Three-Dimensional Integrated Circuits....778

Motivation and Background....778

System-Level Power Delivery Network Analysis for Monolithic 3D ICs....779

System-Level Power Delivery Network Modeling....780

Analysis on Power Supply Integrity of Monolithic 3D ICs....781

Monolithic 3D IC Power Delivery Network Design Flow....781

Technology Nodes and Design Libraries....781

Analysis Methods....782

Static Rail Analysis....783

Dynamic Rail Analysis....786

Frequency- and Time-Domain Analysis....787

Monolithic 3D ICs for Deep Neural Network Hardware....790

Motivation and Background....790

Impact of Monolithic 3D ICs on On-Chip Deep Neural Networks Targeting Speech Recognition....791

Deep Neural Network for Speech Recognition....791

DNN Topology....791

Deep Neural Network Training and Classification....792

Coarse-Grain Sparsification....793

Deep Neural Network Architecture Description....794

Impact of Monolithic 3D ICs on Energy-Efficiency of Deep Neural Network Hardware....796

Area, Wire-Length, and Capacitance Comparisons....796

Power Comparisons....798

Impact of Monolithic 3D ICs on Performance of Deep Neural Network Hardware....800

Architectural Impact Discussions....802

CGS-16 and CGS-64 Architecture Comparisons....802

Impact of Workloads....805

Conclusion....806

References....807

Part V Processor Design and Programming Flows....810

23 Architecture Description Languages....811

Contents....811

Introduction....812

A Brief History of ADLs....816

The Classical Era: 1990–2000....817

The First Industrial Era: 2000–2010....817

The Second Industrial Era: 2010–2020....818

Types and Characteristics of ADLs....818

Types of ADLs....818

Characteristics of ADLs....819

Key ADLs....820

MIMOLA....820

EXPRESSION....821

nML....821

LISA....822

PEAS....823

TENSILICA TIE....823

ARC APEX....824

Codasip CodAL....825

Andes ACE....826

RISC-V Chisel....826

ADL-Driven Methodologies....827

Generation of Software Tools....827

Automatic Synthesis of Custom Instructions for an Application....828

Instruction-Set Simulator Generation....829

Generation of Hardware Implementation....831

Top-Down Verification....832

Validation of an ADL Specification....833

Specification-Driven, Simulation-Based, Verification....834

Applications of ADL-Based Design....835

Conclusions....837

References....838

24 Accelerator Design with High-Level Synthesis....844

Contents....844

Introduction....845

Background: Technology and Models....847

Target Technology....847

Accelerator Models....849

Accelerator Template....850

Introduction to High-Level Synthesis....851

A Traditional High-Level Synthesis Framework....851

A Bit of History on Commercial Products and Academic Projects....853

From Input Specification to Intermediate Representation....854

Input Specification and Intermediate Representation....854

Analysis and Optimization of the Intermediate Representation....856

Creation of the Microarchitecture....858

Scheduling and Performance Optimization....858

Binding and Resource Optimization....860

Definition of the Memory Architecture....862

Creation of the FSM Controller....866

RTL Generation and System Integration....866

Code Generation, Evaluation, and Verification....866

System-Level Integration and Optimization....867

Open and Modern Challenges....868

Creation of Domain-Specific Architectures....868

Programmability and System-Level Optimization....870

Hardware Security and Data Protection....871

Conclusion....871

References....872

25 Processor Simulation and Characterization....877

Contents....877

Introduction....878

Application and Algorithm Analysis....881

Data Types and Operations....881

Algorithms....882

Example: Affine Transform of 2D Image....883

New or Existing Processor?....884

Existing Processor....884

Extending Configurable Processor....885

New Processor with New ISA....885

Hybrid Mode: New ISA with Custom Extensions....886

Standard Benchmarks....886

Issues with Estimating Processor Performance....886

Whetstone....890

Linpack....890

Dhrystone....890

CoreMark....891

Embench....892

SPEC CPU....893

EEMBC....894

Berkeley Design Technology....894

Summary....894

Using Application Code for Benchmarking....895

Estimation Analysis....895

Examples of Estimation Flow....897

Hardware Aspects....898

Software Aspects....900

Custom Instructions....903

For Further Consideration....903

Processor Simulation....904

Functional Simulation....904

Definition....904

Trace-Driven Cache Simulators and Branch-Prediction Simulators....904

Instruction Mix Analysis....905

Instruction Level Parallelism (ILP)....905

Memory Access Patterns....905

Register-File Usage Analysis....906

Open-Source Simulators....906

Cycle-Level Simulation....907

Definition....907

Performance Analysis....907

Metrics and System Partitioning....907

Optimization....908

Configurability....908

Open-Source Simulators....908

Hardware Emulation....909

Definition....909

Emulation Modes....909

Using Processor Simulators in System Modelling....909

Summary Table Comparing Various CPU Modelling Abstractions....911

Examples....911

Conclusion....913

References....913

26 Methodologies for Design Space Exploration....916

Contents....916

Introduction....917

DSE: The Basic Concepts....918

Two Basic Ingredients of DSE....920

Y-Chart-Based DSE....921

Evaluation of a Single Design Point....923

Simulative Fitness Evaluation....923

Analytical Fitness Evaluation....927

Searching the Design Space....928

GA-Based DSE....929

Optimizing GA-Based DSE....932

Multi-application Workload Models....933

Scenario-Based DSE....934

Application Exploration....938

NAS by Means of Evolutionary Piecemeal Training (EPT)....938

Evolutionary Operators....939

NAS Results....940

Conclusion and Outlook....941

References....943

27 Virtual Prototyping of Processor-Based Platforms....947

Contents....947

Introduction to Virtual Prototypes....949

SoC Design and Verification Overview....949

Historic Background of Virtual Prototyping....951

Virtual Prototyping in the Verification Continuum....952

Use-Cases for Virtual Prototypes....954

Architecture Analysis....956

Macro-architecture Specification....956

HW/SW Performance Optimization and Validation....959

Software Use-Cases....960

Early Software Development....962

Software Regression Testing....962

Hybrid Use-Cases for Software-Driven Functional Verification....963

RTL Co-simulation....964

Hybrid Emulation....964

Hybrid FPGA Prototyping....965

System-Level Power Analysis....965

Summary....966

Building Transaction Level Virtual Prototypes....967

The SystemC Transaction Level Modeling Standard....967

Loosely Timed Modeling Style....969

Extended Loosely Timed Modeling Style....970

Approximately Timed Modeling Style....970

Extended AT....971

TLM-2.0 Summary....972

Building TLM Components for Virtual Prototypes....973

Levels of Abstraction....973

Processor Models....974

TLM Integration of Processor Models....975

TLM Models of Peripheral Components....976

SSD Controller SoC Case Study....977

SSD Controller SoC Introduction....978

Loosely Timed Virtual Prototype of the SSD SoC....979

Accurate Virtual Prototype of SSD SoC....980

SSD Case Study Summary....982

Conclusion and Outlook....982

References....984

28 FPGA-Specific Compilers....988

Contents....988

Introduction....989

Existing HLS Compilers and Programming Models....991

C-Based HLS Tools....992

Dataflow Compilers....993

Domain-Specific Languages (DSLs)....994

Emerging Accelerator Design Languages....995

Key Compiler and Synthesis Optimizations....996

Pipelining Techniques....997

Operator-Level Optimizations....997

Statically Scheduled Pipelining....999

Dynamically Scheduled Pipelining....1001

Parallelization Techniques....1003

Homogeneous Data-Level Parallelism....1004

Heterogeneous Task-Level Parallelism....1005

Memory Customization Techniques....1006

Exploiting Data Reuse....1006

Decoupled Access-Execute....1007

Data Vectorization....1009

Memory Banking....1009

Data Type Customization Techniques....1010

Automatic Bitwidth Optimization....1010

Custom Precision Floating-Point Data Types....1011

Float to Fixed-Point Conversion....1011

Case Study: Binarized Convolutional Neural Networks....1012

Algorithm Overview....1012

Pipelining and Unrolling....1013

Line Buffers and Window Buffers....1015

Data Vectorization....1015

Building the BNN Accelerator Using HeteroCL....1016

Evaluation....1016

Concluding Remarks....1018

References....1019

29 Approximate Computing Architectures....1025

Contents....1025

Approximate Computing....1026

Approximate Arithmetic Components....1028

Design Methodologies for Approximate Components....1028

Manual Approximation Methods....1029

Automated Approximation Methods....1030

Error Metrics and Evaluation Analysis for Approximate Components....1032

Arithmetic Error Metrics....1033

General Error Metrics....1034

Quality Evaluation....1034

Design Methods for Building Approximate Hardware Accelerators: Case Studies for Error-Tolerant Applications....1035

Image and Video Processing Applications....1036

AutoAx Methodology....1036

Results....1040

Deep Neural Networks (DNNs)....1044

ALWANN Methodology....1045

Evaluation and Experiments....1047

Cross-Layer Approximations for Error-Tolerant Applications....1050

Methodology for Combining Hardware- and Software-Level Approximations....1050

Cross-Layer Methodology for Optimizing DNNs....1052

Case Studies for Improving the Energy and Performance Efficiency of DNN Inference....1053

Structured Pruning....1053

Quantization....1055

Hardware-Level Approximations: Impact of Self-Healing and Nonself-Healing Designs on DNN Accuracy....1056

Conclusions....1061

References....1062

30 Parallel Programming Models....1066

Contents....1066

Introduction....1067

Hardware Models....1067

Constructs in Parallel Programming Models....1068

Taxonomy....1069

The OpenMP Programming Model....1071

The Worksharing Model....1073

The Tasking Model....1074

SIMD Support in OpenMP....1076

Vectorization, Intrinsics, and Semi-automatic Vectorization....1077

SIMD Loops....1079

Function Vectorization....1081

The Accelerator Model....1082

The OmpSs-2 Programming Model....1084

Advanced Dependency System....1084

Global Domain of Dependencies....1085

Advanced Dependency Types....1087

Exploiting Structured Parallelism on Many-Core Processors....1088

Optimal Task Granularity....1088

Work-Sharing Task Syntax....1089

Semantics of Work-Sharing Tasks....1089

OmpSs-2 NUMA Support....1090

NUMA-Aware Allocation API....1090

Nanos6 Data-Tracking System....1091

Nanos6 NUMA-Aware Scheduling System....1092

The XiTAO Programming Model and Runtime....1093

Explicit DAG Programming in XiTAO....1093

Software Topologies and Locality-Aware Programming....1095

The Software Topology Mapping....1095

Locality-Aware Moldable Mapping....1096

The XiTAO Data-Parallel Interface....1097

The Asynchronous Data-Parallel Mode....1098

The Synchronous Data-Parallel Mode....1098

The XiTAO Runtime....1099

XiTAO Internals....1099

Configuring the Runtime....1100

Conclusion....1100

References....1101

31 Dataflow Models of Computation for Programming Heterogeneous Multicores....1103

Contents....1103

Introduction....1104

About Models of Computation....1106

Dataflow Models of Computation....1108

Static Dataflow Models....1108

Homogeneous Synchronous Dataflow (HSDF)....1110

Synchronous Dataflow (SDF)....1112

Further Static Extensions....1115

Dynamic Dataflow MoC....1116

Kahn Process Network....1116

Dataflow Process Networks....1117

Relation to Other Dataflow MoCs and Extensions....1118

Reconfigurable Dataflow....1118

πSDF....1118

Other Reconfigurable Dataflow MoC....1120

Optimization of Dataflow Programs....1121

Modeling Heterogeneous Platforms....1122

System-Level Description....1122

Modeling Performance and Energy Consumption....1124

Static Mapping....1125

Hybrid Mapping....1128

Examples: Models and Tools....1130

Dataflow in Commercial and Mainstream Tools....1130

MPSoC Application Programming Studio (MAPS)....1130

Preesm and Spider....1133

Preesm....1134

Spider....1135

Conclusion and Outlook....1136

References....1136

32 Retargetable Compilation....1143

Contents....1143

Introduction and Historical Perspective....1144

Compiler Construction....1145

Compiler Frameworks....1145

Retargetable Compilers....1147

Outline of This Chapter....1149

Anatomy of a Compiler....1149

Intermediate Representations....1149

Compilation Phases and Dependencies....1150

Front End....1151

Middle End....1152

Back End....1155

Linker....1156

Architectural Scope of ASIPs....1157

Parallelism....1158

Specialization....1160

Example....1162

Retargetable Compilers for ASIPs....1163

Processor Intermediate Representations....1166

Retargetable Compiler Optimizations....1167

Front End and Middle End....1169

Code Selection....1172

Register Allocation....1173

Register Assignment....1175

Instruction Scheduling....1176

Conclusions....1181

References....1182

Part VI Test and Verification....1185

33 Verification and Its Role in Design of Modern Computers....1186

Contents....1187

Introduction....1187

Formal Verification, Simulation, and Emulation....1188

Outline of the Section....1189

Section Organization....1190

Bit-Level Model Checking Algorithms....1190

C-to-RTL Equivalence Checking....1190

Symbolic Simulation....1192

Mechanical Theorem Proving....1192

Versatile Binary-Level Concolic Testing....1193

Information Flow Analysis....1193

Verification of Quantum Circuit Design Flows....1194

Discussion....1194

Conclusion....1196

References....1196

34 Bit-Level Model Checking....1198

Contents....1198

Introduction....1199

Preliminaries....1200

Explicit Example: A Simple Counter....1200

Linear Time Temporal Logic....1202

Representing Systems Symbolically....1203

Algorithms for Safety Properties....1207

The Induction Principle....1207

Overview of Model Checking Algorithms....1209

Symbolic Model Checking (with BDDs)....1212

Bounded Model Checking....1213

k-Induction....1214

Interpolation and Model Checking....1216

Interpolation Sequence-Based Model Checking (Isb)....1217

Interpolation-Based Model Checking (Itp)....1218

Property Directed Reachability....1220

Combining Interpolation and Pdr....1223

Summary....1224

Algorithms for Liveness Properties....1224

Introduction....1224

Overview of Model Checking Algorithms....1225

Symbolic Model Checking with BDDs....1225

Liveness-to-Safety Conversion (L2S)....1226

Bounded Liveness Checking....1227

Counter-Based Translation....1227

kLiveness....1227

FAIR....1229

Summary....1230

Design Simplification Techniques....1230

Reductions....1231

Combinational Redundancy Removal....1231

Retiming....1231

Sequential Redundancy Removal....1231

Input Reparameterization....1232

Phase Abstraction....1232

Over-approximations....1232

Proof-Based Abstraction....1233

Counterexample-Guided Abstraction....1233

Other Approaches....1234

Summary....1234

Conclusion....1234

References....1234

35 High-Level Formal Equivalence....1238

Contents....1238

Types of Equivalence to Check....1239

Combinational Equivalence....1240

Sequential Equivalence....1241

Transaction-Based Equivalence....1242

Verification Methodology....1243

Using Design Exercise for Datapath Designs....1252

Advanced Datapath Verification....1254

Managing Inconclusive Proofs....1254

Accuracy Challenges....1256

Accuracy Optimized Component Verification....1259

Proving Faithful Rounding....1260

Proving Monotonicity....1261

Proving Commutativity....1262

References....1263

36 Verification of Arithmetic and Datapath Circuits with Symbolic Simulation....1264

Contents....1264

Introduction....1265

Symbolic Simulation....1265

Symbolic Simulation as Formal Verification....1266

Symbolic Simulation Among Formal Verification Methods....1267

Chapter Outline....1269

Simulation....1270

Booleans and Undefined Values....1270

Circuit Simulation and Undefined Values....1271

Mathematical Model of Circuit Simulation....1274

Circuit Properties....1276

Mathematical Model of Circuit Properties....1277

Symbolic Simulation....1278

Symbolic Computation....1278

Simulation with Symbolic Values....1279

Mathematical Model of Symbolic Simulation....1282

Practical Considerations....1283

Simulation Scope Control....1284

Property Triggers....1284

Scope Reduction by Triggers....1288

Reachable-State Invariants....1290

Complexity Management....1292

Simulation Complexity....1292

Complexity Analysis....1294

Weakening....1295

Verification Flow....1297

Arithmetic Circuits....1299

Direct Verification....1299

Floating-Point Operations....1301

Floating-Point Addition....1302

Integer Multiplication....1303

Floating-Point Multiplication and Fused Multiply-Add....1305

Floating-Point Division and Square Root....1306

Industrial Verification....1309

Related Work....1310

References....1312

37 Microprocessor Assurance and the Role of Theorem Proving....1316

Contents....1316

Introduction....1317

ACL2 Preliminaries....1319

Logic Basics....1320

Extension Principles....1322

The Theorem Prover....1323

Some Execution Features: Guards, MBE, and Stobjs....1324

Intended Domains and Guards....1325

Must Be Equal....1326

Single-Threaded Objects....1327

ISA Analysis....1327

ISA Formalization....1328

Mechanical Analysis for ISA....1329

Binary Code Analysis with ISA Models....1330

Some Formalized ISAs....1331

Analysis of Microarchitecture Properties....1333

Pipelining, Out-of-Order, and Speculative Executions....1333

Pipelining....1333

Interrupts, Out-of-Order and Speculative Execution, Self-Modifying Code, and the Works....1335

Reasoning About Memory Hierarchy....1337

Verification of Execution Units....1338

Deep Dive: Formalization and Analysis of (Simplified) x86....1340

Approach....1340

Design Considerations....1342

Scope....1344

Application: Verifying x86 Instruction Implementations....1345

Ucode Model....1347

Verification of the exec Block....1348

A Candidate Instruction....1348

Verification of the Decode Block....1350

Verification of the Xlate/Ucode Blocks....1351

Discussion....1352

Theorem Proving Beyond Microarchitecture....1353

Conclusion....1353

References....1354

38 Versatile Binary-Level Concolic Testing....1359

Contents....1359

Introduction....1360

Challenges of Classic Symbolic and Concolic Testing....1361

Overview of Versatile Binary-Level Concolic Testing....1361

Background....1362

Symbolic Execution....1362

Concolic Testing....1363

Related Works....1364

The Infrastructure of Versatile Binary-Level Concolic Testing....1365

Design and Architecture....1366

Real-World Examples....1367

Concolic Testing on COTS Linux Kernel Modules....1369

Design and Architecture....1370

Real-World Examples....1372

Concolic Testing for Hardware/Software Co-validation of Systems-on-Chips....1374

Design and Architecture....1374

Real-World Examples....1377

Conclusions....1379

References....1379

39 Information Flow Verification....1383

Contents....1383

Introduction....1384

Information Flow....1385

Information Flow Model....1385

Specifying Information Flow Properties....1389

Information Flow Analysis....1390

Trace Properties and Hyperproperties....1392

Verifying Hyperproperties....1393

Static Analysis....1394

Dynamic Analysis....1396

Verification Tools....1397

Simulation-Based Verification....1397

Formal Verification Methods....1398

Case Studies....1398

Cache Timing Side Channels....1399

Memory Access Control....1402

Conclusion....1404

References....1404

40 Verification of Quantum Circuits....1407

Contents....1407

Introduction....1408

Background....1410

Quantum Computing....1410

Quantum Circuit Compilation....1412

Verification....1414

Classical Circuits....1414

Quantum Circuits....1415

Formal Verification....1417

Decision Diagrams....1418

General Approach....1419

Alternating Approach....1420

Designing a Strategy for Verifying Compilation Flow Results....1421

Simulative Verification....1424

Verification Schemes Based on Simulation....1425

Stimuli Generation Schemes....1426

Resulting Quantum Circuit Equivalence Checking Flow....1430

Conclusions....1432

References....1432

Index....1435

This handbook presents the key topics in computer architecture, ranging from the basic to the most advanced, including software and hardware design methodologies. It provides readers with a comprehensive, up-to-date reference covering applications in single-core processors, multicore processors, application-specific processors, reconfigurable architectures, emerging computing architectures, processor design and programming flows, and test and verification. Readers gain a complete and quick technical reference that combines a high-level review of computer architecture technology, detailed technical descriptions, and the latest practical applications.

The content is spread over multiple sections, and in each section, specific chapters offer a detailed look at a topic of interest. The chapters are presented in order of increasingly advanced concepts. They are also cross-linked so that a reader can peruse a chapter with only the necessary prerequisites from selected prior chapters.

In the first section, on single-core processors, three chapters provide background on computer organization, microarchitecture, and communication networks. These are complemented by chapters on operating systems, edge computing, and secure computing architectures, which give the reader a sufficient foundation to move toward more advanced notions in any of the following sections.

The section on application-specific processors provides valuable insights into the growing demand from application developers for customized architectures, also referred to as co-processors or accelerators. From a wide range of application segments, multimedia processing, scientific computing, machine learning, and cryptographic workloads are covered here. Since these applications depend heavily on digital arithmetic, a short overview of its concepts is presented as well. Multimedia, machine learning, and several other domain-specific architectures are known to be affected, for better or worse, by the device-level faults that appear in advanced technology nodes. This is discussed in the section on fault-tolerant architectures.

Various application-specific and general-purpose processors come together in the rich tapestry of modern Systems-on-Chip (SoCs). This also significantly broadens the notion of architecture by offering reconfigurability as a property. Multicore SoCs and reconfigurable architectures are studied in a dedicated section covering general-purpose multicore architectures, Graphics Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs). Furthermore, readers are invited to delve into Coarse-Grained Reconfigurable Architectures (CGRAs), dynamic and partial reconfiguration, and power management challenges for multicore systems.

Advances in technology offer various capabilities to modern architects. The section on Emerging Computing Architectures studies these, including compute-in-memory architectures, architectures for microfluidic biochips, quantum computing, and architectures that benefit from 3D ICs. The complexity of modern computer architectures can only be managed with the help of powerful design automation flows. This is discussed in the section on Processor Design and Programming Flows. The introductory chapters on parallel programming models and dataflow models help the reader become familiar with the abstract notions needed to grasp the design automation concepts. This foundation leads into the methodologies for design space exploration, followed by specific tool flows, as elaborated in the chapters on architecture description languages, high-level synthesis, processor simulation, and virtual prototyping. For customizable, application-specific, and reconfigurable architectures, compilation flows play a critical role in extracting maximum efficiency from the computing fabric. These are discussed in two chapters on FPGA-specific compilers and retargetable compilers. Balancing technology constraints all the way up to the application layer is a complex design automation challenge, which is discussed in the chapter on approximate computing architectures.

The last section of this volume presents the classic and modern techniques for testing and verification of computer architectures.

