#### Lecture 12: Caches (cont'd)

CS 105

Fall 2023

### **Review: The CPU-Memory Gap**



## **Review: Principle of Locality**

Programs tend to use data and instructions with addresses near or equal to those they have used recently

- Temporal locality:
  - Recently referenced items are likely to be referenced again in the near future

- Spatial locality:
  - Items with nearby addresses tend to be referenced close together in time
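One way to see spatial locality is to look at the addresses a loop touches. The sketch below is an illustration only: the 16x16 array of 4-byte elements, the 64-byte block size, and the helper names `address` and `same_block_fraction` are all assumptions made for the example. It measures how often an access falls in the same block as the access just before it.

```python
ROWS, COLS, ELEM_SIZE = 16, 16, 4   # assumed: 16x16 array of 4-byte ints, stored row by row
BLOCK_SIZE = 64                     # assumed block size in bytes


def address(row, col):
    """Byte address of element (row, col) in a row-major array starting at 0."""
    return (row * COLS + col) * ELEM_SIZE


def same_block_fraction(addresses, block_size=BLOCK_SIZE):
    """Fraction of accesses that land in the same block as the previous access."""
    blocks = [a // block_size for a in addresses]
    repeats = sum(1 for prev, cur in zip(blocks, blocks[1:]) if prev == cur)
    return repeats / (len(blocks) - 1)


# Inner loop over columns: consecutive accesses are 4 bytes apart.
row_major = [address(r, c) for r in range(ROWS) for c in range(COLS)]
# Inner loop over rows: consecutive accesses are 64 bytes apart.
col_major = [address(r, c) for c in range(COLS) for r in range(ROWS)]

print(same_block_fraction(row_major))   # close to 1: good spatial locality
print(same_block_fraction(col_major))   # 0.0: every access moves to a new block
```

Under these assumptions each row fills exactly one block, so the row-by-row traversal stays inside a block for 16 accesses at a time, while the column-by-column traversal changes block on every access.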







## **Review: Handling a Cache Miss**

When a cache miss occurs, update the cache line at that index:

1. Set the valid bit to 1
2. Update the tag
3. Replace the data block with bytes from memory
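The three steps can be sketched for a small direct-mapped cache. This is a minimal illustration, not a description of real hardware: the sizes `BLOCK_SIZE` and `NUM_LINES`, the `Line` class, the `read_byte` helper, and the 256-byte `memory` are all assumed for the example.

```python
from dataclasses import dataclass

BLOCK_SIZE = 8   # assumed bytes per data block
NUM_LINES = 2    # assumed number of lines in the direct-mapped cache


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    data: bytes = bytes(BLOCK_SIZE)


def read_byte(cache, memory, addr):
    """Read one byte through the cache; return (value, hit)."""
    offset = addr % BLOCK_SIZE
    index = (addr // BLOCK_SIZE) % NUM_LINES
    tag = addr // (BLOCK_SIZE * NUM_LINES)
    line = cache[index]
    hit = line.valid and line.tag == tag
    if not hit:
        base = addr - offset                          # first address of the block
        line.valid = True                             # 1. set the valid bit
        line.tag = tag                                # 2. update the tag
        line.data = memory[base:base + BLOCK_SIZE]    # 3. replace the data block
    return line.data[offset], hit


memory = bytes(range(256))                 # assumed memory: byte i holds the value i
cache = [Line() for _ in range(NUM_LINES)]
first = read_byte(cache, memory, 0x64)     # cold cache: miss, block 0x60..0x67 loaded
second = read_byte(cache, memory, 0x65)    # same block: hit
print(first, second)
```

The second read hits because the miss on `0x64` brought in the whole 8-byte block, not just the requested byte.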





### Exercise: Direct-mapped Cache



Assume 8 byte data blocks

| tag  | idx | off | h/m  |
|------|-----|-----|------|
| 0110 | 0   | 000 | Miss |

| Cache (Valid, Tag, Data Block) | Line 0        | Line 1        |
|--------------------------------|---------------|---------------|
| Initial                        | 0, 0000, 47 48 | 0, 0000, 47 48 |
| After first access             | 1, 0110, 13 14 | 0, 0000, 47 48 |
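The tag, index, and offset fields come from splitting the address bits. A minimal sketch follows; the helper name `split_address` is hypothetical, the example address `0x60` is an illustrative choice, and the code assumes the block size and number of sets are powers of two.

```python
def split_address(addr, block_size, num_sets):
    """Split an address into (tag, index, offset).

    Assumes block_size and num_sets are powers of two.
    """
    offset_bits = block_size.bit_length() - 1   # log2(block_size)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets)
    offset = addr & (block_size - 1)            # low bits select a byte in the block
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)    # remaining high bits
    return tag, index, offset


# 8-byte blocks and 2 lines: 3 offset bits, 1 index bit, the rest is tag.
tag, index, offset = split_address(0x60, block_size=8, num_sets=2)
print(f"tag={tag:04b} idx={index:01b} off={offset:03b}")
```

For a direct-mapped cache the number of sets equals the number of lines, so the same helper covers both organizations.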

How well does this take advantage of spatial locality?

How well does this take advantage of temporal locality?






### **Eviction from the Cache**

On a cache miss, a new block is loaded into the cache

- Direct-mapped cache: A valid block at the same location must be evicted—no choice
- Associative cache: If all blocks in the set are valid, one must be evicted
  - Random policy
  - FIFO
  - LIFO
  - Least-recently used; requires extra data in each set
  - Most-recently used; requires extra data in each set
  - Most-frequently used; requires extra data in each set
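As an illustration of one of these policies, here is a minimal sketch of least-recently-used eviction in a set-associative cache. The function name `simulate_lru`, the default sizes, and the address trace are assumptions for the example. The "extra data in each set" is modeled by the ordering of an `OrderedDict`, which keeps the least recently used tag first.

```python
from collections import OrderedDict


def simulate_lru(addresses, block_size=8, num_sets=2, ways=2):
    """Return 'hit' or 'miss' for each address under LRU replacement."""
    # One OrderedDict per set: keys are tags, ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for addr in addresses:
        block = addr // block_size
        index = block % num_sets
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)          # now the most recently used line
            results.append("hit")
        else:
            if len(lines) == ways:          # set is full: evict the LRU line
                lines.popitem(last=False)
            lines[tag] = None
            results.append("miss")
    return results


# Hypothetical trace: all three blocks map to set 0 of a 2-way cache.
trace = [0x00, 0x10, 0x00, 0x20, 0x10, 0x00]
print(simulate_lru(trace))
```

In this trace the access to `0x20` evicts the block at `0x10`, because the block at `0x00` was touched more recently.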

### Exercise: 2-way Set Associative Cache

| Memory |    |
|--------|----|
| 0x74   | 18 |
| 0x70   | 17 |
| 0x6c   | 16 |
| 0x68   | 15 |
| 0x64   | 14 |
| 0x60   | 13 |

Assume 8 byte data blocks

| Access  | tag  | idx | off | h/m  |
|---------|------|-----|-----|------|
| rd 0x64 | 0110 | 0   | 100 | Miss |
| rd 0x70 |      |     |     |      |
| rd 0x64 |      |     |     |      |
| rd 0x64 |      |     |     |      |
| rd 0x60 |      |     |     |      |
| rd 0x70 |      |     |     |      |
| rd 0x80 |      |     |     |      |

| Cache (Valid, Tag, Data Block) | Set 0, Line 0 | Set 0, Line 1 | Set 1, Line 0 | Set 1, Line 1 |
|--------------------------------|---------------|---------------|---------------|---------------|
| Initial                        | 0, 0, 47 48   | 0, 1, 47 48   | 0, 0, 47 48   | 0, 1, 47 48   |
| After rd 0x64                  | 1, 6, 13 14   |               |               |               |
| After rd 0x70                  |               |               |               |               |
| After rd 0x64                  |               |               |               |               |
| After rd 0x64                  |               |               |               |               |
| After rd 0x60                  |               |               |               |               |
| After rd 0x70                  |               |               |               |               |
| After rd 0x80                  |               |               |               |               |



## Typical Intel Core i7 Hierarchy

Processor package:

- L1 d-cache and i-cache: 32 KB, 8-way; access: 4 cycles
- L2 unified cache: 256 KB, 8-way; access: 10 cycles
- L3 unified cache: 8 MB, 16-way; access: 40-75 cycles
- Block size: 64 bytes for all caches
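From these figures the number of sets at each level follows from capacity / (block size × associativity). The short sketch below works this out; the helper name `num_sets` and the `levels` table are introduced only for the example, and KB and MB are taken as powers of two.

```python
KB, MB = 1024, 1024 * 1024
BLOCK_SIZE = 64   # bytes, the same for all three levels


def num_sets(capacity_bytes, ways, block_size=BLOCK_SIZE):
    """Number of sets = capacity / (block size * lines per set)."""
    return capacity_bytes // (block_size * ways)


# (capacity, associativity) for each level, taken from the list above.
levels = {"L1": (32 * KB, 8), "L2": (256 * KB, 8), "L3": (8 * MB, 16)}

for name, (capacity, ways) in levels.items():
    sets = num_sets(capacity, ways)
    index_bits = sets.bit_length() - 1    # log2(sets), since sets is a power of two
    print(f"{name}: {sets} sets, {index_bits} index bits")
```

With 64-byte blocks every level uses 6 offset bits, so only the number of index bits changes from level to level.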

### Caching and Writes

- What to do on a write-hit?
  - Write-through: write immediately to memory
  - Write-back: defer write to memory until replacement of line
    - Needs a dirty bit (records whether the line differs from memory)
- What to do on a write-miss?
  - Write-allocate: load into cache, update line in cache
    - Good if more writes to the location follow
  - No-write-allocate: writes straight to memory, does not load into cache
- Typical combinations
  - Write-through + No-write-allocate
  - Write-back + Write-allocate
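To compare the two typical combinations, the sketch below counts how many writes reach memory for a given trace. It is a simplified model under stated assumptions: a direct-mapped cache that tracks only tags and dirty bits, a hypothetical function name `memory_writes`, and a made-up trace. Dirty lines still in the cache at the end are not flushed.

```python
def memory_writes(trace, write_back, block_size=4, num_lines=4):
    """Count writes sent to memory for a trace of ('rd' | 'wr', addr) pairs.

    write_back=True  models write-back + write-allocate.
    write_back=False models write-through + no-write-allocate.
    """
    lines = [None] * num_lines    # each entry: None (invalid) or [tag, dirty]
    writes = 0
    for op, addr in trace:
        block = addr // block_size
        index, tag = block % num_lines, block // num_lines
        line = lines[index]
        hit = line is not None and line[0] == tag
        if op == "wr" and not write_back:
            writes += 1           # write-through: every write goes to memory
            continue              # no-write-allocate: a miss does not load the line
        if not hit:
            if line is not None and line[1]:
                writes += 1       # evicting a dirty line writes it back first
            lines[index] = line = [tag, False]
        if op == "wr":
            line[1] = True        # write-back: mark dirty, defer the memory write
    return writes


# Hypothetical trace: three writes to one location, then a read that maps
# to the same line and forces an eviction.
trace = [("wr", 0x70), ("wr", 0x70), ("wr", 0x70), ("rd", 0x80)]
print(memory_writes(trace, write_back=True))
print(memory_writes(trace, write_back=False))
```

Repeated writes to the same location are where write-back pays off: the three writes are absorbed by the cache and memory is updated once, when the dirty line is evicted.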

### Exercise 5: Write-back + Write-allocate

| Memory |    |
|--------|----|
| 0x84   | 22 |
| 0x80   | 21 |
| 0x7c   | 20 |
| 0x78   | 19 |
| 0x74   | 18 |
| 0x70   | 17 |



Assume 4 byte data blocks

| Access    | tag | idx | off | h/m |
|-----------|-----|-----|-----|-----|
| rd 0x70   |     |     |     |     |
| wr 8,0x70 |     |     |     |     |
| wr 9,0x84 |     |     |     |     |
| rd 0x84   |     |     |     |     |
| rd 0x80   |     |     |     |     |

| Cache           | Line 0  | Line 1  | Line 2  | Line 3  |
|-----------------|---------|---------|---------|---------|
| Initial         | 0 0 47  | 0 1 47  | 0 2 47  | 0 3 47  |
| After rd 0x70   |         |         |         |         |
| After wr 8,0x70 |         |         |         |         |
| After wr 9,0x84 |         |         |         |         |
| After rd 0x84   |         |         |         |         |
| After rd 0x80   |         |         |         |         |

## **Caching Organization Summarized**

- A cache consists of lines
- A line contains
  - A block of bytes, the data values from memory
  - A tag, indicating where in memory the values are from
  - A valid bit, indicating if the data are valid
- Lines are organized into sets
  - Direct-mapped cache: one line per set
  - k-way associative cache: k lines per set
  - Fully associative cache: all lines in one set
- Caches handle both reads and writes
  - Write-through: write to both cache and memory
  - Write-back: write only to cache; write to memory on eviction
  - Write-allocate: allocate a line on any miss
  - No-write-allocate: allocate a line only on a read miss