Cache

A cache is divided into blocks.

Each line holds a valid bit (to determine whether the data is valid), a tag (which distinguishes a block from the others), and the block of data itself. Multiple blocks can map to the same line.

| line # | valid | tag         | block    |
|--------|-------|-------------|----------|
| 0      | 0     | 10101001001 | 11010010 |
| 1      | 1     | 01111001011 | 01001100 |
| 2      | 1     | 11111010010 | 11001011 |
| 3      | 0     | 00001001100 | 00100110 |
| 4      | 1     | 01000100110 | 00100110 |
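
As a rough model (not from the notes; field sizes are illustrative), a cache line can be represented in C like this:

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES 32          /* 2^n lines (illustrative) */

/* One cache line: valid bit, tag, and the block of data itself. */
typedef struct {
    bool     valid;           /* is the stored block meaningful? */
    uint32_t tag;             /* distinguishes blocks mapped to this line */
    uint8_t  block[16];       /* the block of data (16 bytes here) */
} cache_line_t;

static cache_line_t cache[NUM_LINES];
```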

Direct Mapping

The following is an example of a cache with 4-word blocks and 2 lines.

| line # | word | byte 1   | byte 2   | byte 3   | byte 4   |
|--------|------|----------|----------|----------|----------|
| 0      | 0    | 00000000 | 00000000 | 00000000 | 00000000 |
| 0      | 1    | 00000000 | 00000000 | 00000000 | 00000000 |
| 0      | 2    | 00000000 | 00000000 | 00000000 | 00000000 |
| 0      | 3    | 00000000 | 00000000 | 00000000 | 00000000 |
| 1      | 0    | 00000000 | 00000000 | 00000000 | 00000000 |
| 1      | 1    | 00000000 | 00000000 | 00000000 | 00000000 |
| 1      | 2    | 00000000 | 00000000 | 00000000 | 00000000 |
| 1      | 3    | 00000000 | 00000000 | 00000000 | 00000000 |

Now let's see how to determine where to find a word in the cache, based on its address (with 4-word blocks, in a 2-line cache).

| tag (27 bits)               | line # (1 bit) | word (2 bits) | byte (2 bits) |
|-----------------------------|----------------|---------------|---------------|
| 000000000000000000001000000 | 0              | 10            | 01            |

Determining a HIT in a cache is easy: index the line with the line bits, then check that the valid bit is set and that the stored tag matches the tag of the requested address (where multiple words are present in a block, a mux is needed to select which word to get data from).

Cache Hit
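
A minimal sketch of this lookup in C, for the 2-line cache with 4-word blocks above (the address split and all names are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

#define BYTE_BITS 2   /* 4-byte words */
#define WORD_BITS 2   /* m = 2 -> 4 words per block */
#define LINE_BITS 1   /* n = 1 -> 2 lines */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t words[1 << WORD_BITS];  /* block data, one entry per word */
} line_t;

static line_t cache[1 << LINE_BITS];

/* Returns true on a HIT and stores the selected word in *out. */
bool cache_read(uint32_t addr, uint32_t *out)
{
    uint32_t word = (addr >> BYTE_BITS) & ((1u << WORD_BITS) - 1);
    uint32_t line = (addr >> (BYTE_BITS + WORD_BITS)) & ((1u << LINE_BITS) - 1);
    uint32_t tag  =  addr >> (BYTE_BITS + WORD_BITS + LINE_BITS);

    line_t *l = &cache[line];
    if (l->valid && l->tag == tag) {  /* HIT: line is valid and tags match */
        *out = l->words[word];        /* the "mux": select the word */
        return true;
    }
    return false;                     /* MISS */
}
```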

Cache Size

To determine the size of a direct-mapped cache (with 32-bit addresses and 32-bit words) we need some data:

  • \(2^n\) lines
  • a block size of \(2^m\) words
  • 1 valid bit per line

tag size = \(32 - n - m - 2\)

cache size in bits = \(2^n \cdot (2^m \cdot 32 + tag\_size + 1)\)

(each line stores \(2^m\) words of 32 bits, plus the tag and the valid bit)
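
For example (an illustrative configuration, not one from the slides): with \(n = 10\) and \(m = 2\), the tag size is \(32 - 10 - 2 - 2 = 18\) bits, and the cache size is \(2^{10} \cdot (2^2 \cdot 32 + 18 + 1) = 1024 \cdot 147 = 150528\) bits, i.e. about 18.4 KiB of storage for 16 KiB of data.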

Associativity

By making the cache more associative, we reduce the number of conflict misses: each address maps to a set of several lines rather than to exactly one, so blocks that would collide in a direct-mapped cache can coexist.

Cache Size pt. 2

TODO: pdf 22, slide 11

Policies

Replacement Policy

  • LRU (least recently used): replace the oldest block; it requires a bit (or more, for higher associativity) to determine how old a block is (a sketch follows this list)
  • LFU (least frequently used): replace the least frequently used block, but it requires more complex hardware (a counter for each block, updated at every access)
  • RANDOM: replace a random block
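
A minimal sketch of LRU for a 2-way set-associative cache, where a single bit per set is enough to remember which way was used least recently (all names are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS 64

typedef struct {
    bool     valid;
    uint32_t tag;
} way_t;

typedef struct {
    way_t way[2];
    int   lru;      /* index of the least recently used way (0 or 1) */
} set_t;

static set_t sets[NUM_SETS];

/* Called on every access: the way just touched becomes the most
 * recently used, so the other way is now the LRU candidate. */
void touch(set_t *s, int used_way)
{
    s->lru = 1 - used_way;
}

/* On a miss, the victim is simply the LRU way of the set. */
int pick_victim(const set_t *s)
{
    return s->lru;
}
```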

Writing Policy

It's the policy that determines when the RAM is updated after the cache is written to.

  • Write through: at each update, the block is also written to RAM (good for consistency on multi-core systems, but very slow)
  • Write back: the block is written to RAM only when it is replaced (faster, but the content in the cache isn't in sync with the content in RAM)

By using a DIRTY bit, we can write back to RAM only the blocks that have actually been changed, as in the sketch below.
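
A hedged sketch of write back with a dirty bit (the RAM interface is a placeholder, not a real API):

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool     valid;
    bool     dirty;   /* has this block been modified since it was loaded? */
    uint32_t tag;
    uint32_t data;
} line_t;

/* Placeholder for the actual RAM write (hypothetical helper). */
void ram_write(uint32_t tag, uint32_t data);

/* Write back: update only the cache, and mark the line dirty. */
void cache_write(line_t *l, uint32_t value)
{
    l->data  = value;
    l->dirty = true;
}

/* On replacement, RAM is updated only if the block was changed. */
void evict(line_t *l)
{
    if (l->valid && l->dirty)
        ram_write(l->tag, l->data);
    l->valid = false;
    l->dirty = false;
}
```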

MISS types

  • Cold start: the address is requested for the first time (reduced by making bigger blocks, which exploit spatial locality)
  • Conflict: the block has been replaced because too many blocks mapped to the same line or set (reduced by increasing the associativity)
  • Capacity: the block has been replaced because the cache is too small (reduced by increasing the size of the cache)

Cache & Parallelism

In multi-processor architectures there are multiple parallel caches with fast communication between them to keep the data coherent, since multiple different processes can access and modify the same data.

There must be a way to keep consistency and coherence of the data in multiple caches.

To solve this problem, there are two strategies:

  • a distributed protocol which the caches use to communicate
  • a centralized manager which handles the interactions

Cache Controller FSA

The cache controller can be described as a finite state automaton:

stateDiagram-v2
    IDLE --> Tag
    Tag --> IDLE
    Allocation --> Tag
    Tag --> Allocation
    Tag --> WriteBack
    WriteBack --> WriteBack
    WriteBack --> Allocation
    Allocation --> Allocation

The writes of different processors to the same location must be seen by everyone in the same order.

Cache Invalidation Protocol

Coherence means that the value I read is the last one written; consistency means that all copies of the data agree with each other (the calendar - message example).
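
A very simplified sketch of a write-invalidate scheme, assuming each processor has a private cache and broadcasts an invalidation before writing (this structure is illustrative, not the exact protocol from the slides):

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_CPUS  4
#define NUM_LINES 64

typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t data;
} line_t;

/* One private cache per processor. */
static line_t caches[NUM_CPUS][NUM_LINES];

/* Before writing, invalidate every other copy of the block, so the
 * next read by another processor misses and fetches the new value. */
void write_invalidate(int cpu, uint32_t line, uint32_t tag, uint32_t value)
{
    for (int other = 0; other < NUM_CPUS; other++) {
        if (other == cpu)
            continue;
        line_t *l = &caches[other][line];
        if (l->valid && l->tag == tag)
            l->valid = false;   /* invalidate the stale copy */
    }
    caches[cpu][line].valid = true;
    caches[cpu][line].tag   = tag;
    caches[cpu][line].data  = value;
}
```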

Virtual Memory

In a multi-process system it's hard to manage the memory of all the processes, and the physical memory may not be sufficient for all of them. The solution is to make the addresses used by the processes "virtual", and map them to physical addresses when needed.

The memory is divided into pages, which are stored on a slower memory (mass storage) when not needed.

Each process has a page table which takes a virtual page and maps it to a physical page.

| valid | dirty | used | page address           |
|-------|-------|------|------------------------|
| 1     | 0     | 0    | physical page address  |
| 1     | 1     | 0    | physical page address  |
| 0     | 0     | 0    | address in mass memory |
| 1     | 1     | 1    | physical page address  |

When valid is 1, the page is in RAM, but you still need 2 memory accesses to reach the data (one for the page table, one for the data itself). When valid is 0, the page is in mass storage: a page fault exception is raised, and it takes millions of clock cycles to bring the page into RAM.
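
A minimal sketch of the translation, assuming 4 KiB pages so that the low 12 bits of the address are the page offset (field names and the page-fault handler are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_OFFSET_BITS 12   /* 4 KiB pages (assumed) */
#define NUM_PAGES        1024

typedef struct {
    bool     valid;   /* is the page currently in RAM? */
    bool     dirty;
    bool     used;
    uint32_t frame;   /* physical page number, when valid */
} pte_t;

static pte_t page_table[NUM_PAGES];

/* Hypothetical handler: loads the page from mass storage
 * (millions of clock cycles) and updates the page table. */
void page_fault(uint32_t vpn);

uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_OFFSET_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_OFFSET_BITS) - 1);

    if (!page_table[vpn].valid)
        page_fault(vpn);   /* page is in mass storage */

    /* First access: the page table itself; the data access follows. */
    return (page_table[vpn].frame << PAGE_OFFSET_BITS) | offset;
}
```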

Policies

Replacement Policy

  • LRU
  • LFU
  • RANDOM

Writing Policy

  • Write back (write through would require too much time, since every write would have to reach the slower memory)

TLB

A Translation Lookaside Buffer is a small, fast buffer that caches the most recent virtual-to-physical translations, so that most accesses skip the extra page-table access in RAM.
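
As a sketch, a fully associative TLB lookup could look like this (size and names are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 16

typedef struct {
    bool     valid;
    uint32_t vpn;    /* virtual page number */
    uint32_t frame;  /* corresponding physical page number */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Returns true on a TLB hit; on a miss the page table must be
 * walked (the extra memory access) and the TLB refilled. */
bool tlb_lookup(uint32_t vpn, uint32_t *frame)
{
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *frame = tlb[i].frame;
            return true;
        }
    }
    return false;
}
```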