Computer Organization Principles Review Summary (III) Multi-level Memory

Chapter 3 Multi-level Memory#

This chapter contains a lot of content, mainly including various types of memory and storage methods, focusing on basic concepts of memory, DRAM, SRAM, cache, hit rate and average access time, main memory and cache mapping methods, and virtual memory.

3.1 Overview of Memory#

3.1.1 Classification of Memory#

  • Memory is the storage device in a computer system used to store programs and data.
  • Storage Medium: Currently mainly using semiconductor devices and magnetic materials.
  • Storage Bit: A bistable semiconductor circuit or a CMOS transistor or a storage element made of magnetic material can store one bit of binary code. This binary code bit is the smallest storage unit in memory, called a storage bit.
  • Storage Unit: A storage unit is composed of several storage bits. Many storage units make up a memory.

According to the different properties of storage materials and usage methods, memory can be classified in various ways:
(1) Based on storage medium, it is divided into magnetic surface/semiconductor memory.
(2) Based on access method, it is divided into random/sequential access (magnetic tape).
(3) Based on read/write function, it is divided into read-only memory (ROM) and random read/write memory (RAM).
(4) Based on volatility of information, it is divided into volatile and non-volatile.
(5) Based on role in the memory system, it is divided into main/auxiliary/cache/control.

3.1.2 Memory Hierarchy#

Current characteristics of memory:

  • Fast memory is expensive and has a small capacity;

  • Cheap memory is slow and has a large capacity.
    When designing the architecture of computer memory systems, we hope for large capacity, fast speed, and low cost. Therefore, in the design of memory systems, a compromise should be made among memory capacity, speed, and price, establishing a multi-level memory architecture, as shown in the figure below.
    Insert image description here

  • High-speed cache, referred to as cache, is a high-speed small-capacity semiconductor memory in the computer system.

  • Main memory is the primary storage of the computer system, used to hold the programs and data in use while the computer runs.

  • External storage (auxiliary memory) is large-capacity backing storage.

3.1.3 Technical Indicators of Main Memory#

  • Word Storage Unit: A storage unit that holds one machine word, and the corresponding unit address is called the word address.
  • Byte Storage Unit: A unit that holds one byte, and the corresponding address is called the byte address.
  • Storage Capacity: Refers to the total number of storage units that can be accommodated in a memory. The larger the storage capacity, the more information can be stored.
  • Access Time (also known as memory access time): Refers to the time from when a read operation command is issued until the operation is completed and the data is read out onto the data bus. Usually, the write operation time is taken to be equal to the read operation time, hence it is called memory access time.
  • Storage Cycle: Refers to the minimum time interval required to initiate two consecutive read operations. Usually, the storage cycle is slightly longer than the access time, and its time unit is ns.
  • Memory Bandwidth: The amount of information accessed by the memory in a unit time, usually measured in bits/second or bytes/second.
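To make the bandwidth indicator concrete, here is a small calculation sketch; the 32-bit width and 100 ns storage cycle are illustrative numbers assumed for the example, not figures from the text.

```python
# Assumed, illustrative figures: a 32-bit-wide main memory with a
# 100 ns storage cycle (minimum interval between two consecutive accesses).
word_width_bits = 32
storage_cycle_s = 100e-9

# Memory bandwidth = information accessed per unit time.
bandwidth_bits_per_s = word_width_bits / storage_cycle_s

print(bandwidth_bits_per_s / 1e6)      # in Mbit/s
print(bandwidth_bits_per_s / 8 / 1e6)  # in MB/s
```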

3.2 SRAM Memory (Static Random Access Memory)#

The currently widely used main memory (internal memory) is semiconductor memory. Based on the different mechanisms of information storage, it can be divided into two categories:

  • Static Random Access Memory (SRAM): Fast access speed, but storage capacity is not as large as DRAM.
  • Dynamic Random Access Memory (DRAM): Slightly slower access speed, but storage capacity is larger than SRAM.

3.2.1 Basic Static Storage Element Array#

  • Storage Bit: A latch (flip-flop). As long as a DC power supply is continuously applied to this memory circuit, it can indefinitely maintain the memory state of 1 or 0. If the power supply is cut off, the stored data (1 or 0) will be lost.
  • Three Sets of Signal Lines (Key Points): Address Lines, Data Lines (row lines, column lines), Control Lines.
  • Address Lines: If there are 6 lines, it specifies that the memory capacity is 2^6 = 64 storage units.
  • Data Lines: If there are 4 lines, it specifies that the memory word length is 4 bits, thus the total number of storage bits is 64×4 = 256.
  • Control Lines: R/~W control line, specifies whether to read from or write to the memory.

The address decoder outputs 64 selection lines, which we call row lines, and its function is to open the input of each storage bit's NAND gate.
Insert image description here
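The capacity arithmetic above (6 address lines, 4 data lines) can be written out directly:

```python
# The figures from the text: 6 address lines and 4 data lines.
address_lines = 6
data_lines = 4

storage_units = 2 ** address_lines        # 2^6 = 64 storage units
word_length = data_lines                  # memory word length: 4 bits
total_bits = storage_units * word_length  # 64 × 4 = 256 storage bits

assert (storage_units, total_bits) == (64, 256)
```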

3.2.2 Basic SRAM Logic Structure#

  • Most SRAM chips use a dual-decoding method to organize larger storage capacities.

  • It employs two-level decoding: dividing the address into x-direction and y-direction parts as shown in the figure.
    Insert image description here

  • Storage array (256 rows × 128 columns × 8 bits).

  • Address decoder

    • Uses dual decoding (reducing the number of selection lines).
    • A0~A7 are row address decode lines.
    • A8~A14 are column address decode lines.
  • There are 8 bidirectional data lines.

3.2.3 Read/Write Cycle Waveform Diagram#

Insert image description here
Insert image description here
Example 1: The figure shows the write timing diagram of SRAM. R/W is the read/write command control line. When the R/W line is low, the memory writes the data on the data line to the memory at the specified address. Please identify the errors in the write timing diagram and draw the correct write timing diagram.
Insert image description here

Solution: The timing signals for a memory write must be synchronized. While a negative pulse is applied to the R/W line, the levels on the address lines and data lines must remain stable, because data is latched as soon as the R/W line goes low. Therefore, if the data line changes value while R/W is low, the memory will store the new data (error ⑤); similarly, if the address lines change while R/W is low, data will also be written to the new address (errors ② and ③).
The correct write timing diagram is shown in diagram (b).

3.3 DRAM Memory (Dynamic Random Access Memory)#

The storage element of SRAM memory is a flip-flop, which has two stable states. The storage element of DRAM memory consists of a MOS transistor and a capacitor.
The MOS transistor acts as a switch, and the stored information of 1 or 0 is represented by the charge amount on the capacitor.

  • When the capacitor is fully charged, it represents storing a 1;
  • When the capacitor discharges and has no charge, it represents storing a 0.

The difference between DRAM and SRAM is:

  • It adds row and column address latches. Because DRAM has a large capacity, the address must be correspondingly wide, which would inevitably increase the number of address pins on the chip. To avoid this, the address code is transmitted in two halves (time-division multiplexing). For example, with a 10-bit address bus, address bits A0–A9 are sent first and latched into the row address latch by the row select signal RAS; then bits A10–A19 are sent and latched into the column address latch by the column select signal CAS. Together the two halves form a 20-bit address, giving a storage capacity of 1M×4 bits.
  • It adds a refresh counter and the corresponding control circuits. A DRAM cell must be refreshed after being read, and cells that have not been accessed must also be refreshed periodically. Refreshing is done by row, so the refresh counter is as long as the row address latch. Refresh operations alternate with read/write operations, so a 2-to-1 multiplexer selects either the refresh row address or the normal read/write row address.
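The time-division transmission of the address can be sketched as follows. Which half of the address serves as the row half is an assumption made for illustration (here the low half A0–A9, as in the text's example).

```python
def split_dram_address(addr: int, half_bits: int = 10):
    """Split a (2*half_bits)-wide DRAM address into the two halves sent
    over the multiplexed address pins: the row half (latched by RAS)
    first, then the column half (latched by CAS).

    Assumption for illustration: the low half (A0..A9) is the row address.
    """
    mask = (1 << half_bits) - 1
    row = addr & mask                 # A0..A9  -> row address latch (RAS)
    col = (addr >> half_bits) & mask  # A10..A19 -> column address latch (CAS)
    return row, col

row, col = split_dram_address((5 << 10) | 7)  # 20-bit address, 10 pins
```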

3.3.3 Read/Write Cycle, Refresh Cycle (Key Points)#

Read/Write Cycle#

Both the read cycle and the write cycle are defined from one falling edge of the row select signal RAS to the next falling edge of RAS, i.e., the time interval between two consecutive accesses. For ease of control, the read cycle and write cycle are usually made equal.
Insert image description here

Refresh Cycle#

DRAM storage bits are based on the charge amount on the capacitor, which decreases over time and temperature, so they must be refreshed periodically to maintain the correct information they originally remembered.
There are two types of refresh operations: centralized refresh and distributed refresh.

Centralized Refresh#

All rows of the DRAM are refreshed within each refresh cycle. For example, if the refresh cycle is 8 ms, every row must be refreshed once within each 8 ms window. To do this, the 8 ms is divided into two parts: the first part is used for normal read/write operations, and the remaining tail of the window is used to refresh all rows one after another. During this concentrated refresh period normal accesses cannot proceed, so centralized refresh introduces dead time.

Distributed Refresh#

The refresh of each row is interleaved into the normal read/write cycle. For example, in the DRAM shown in figure 3.7 on page 70, if the refresh cycle is 8ms, then each row must be refreshed every 8ms ÷ 1024 = 7.8us. Distributed refresh has no dead time!
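The distributed-refresh interval quoted above is simple arithmetic and can be checked directly:

```python
# Figures from the text: an 8 ms refresh cycle and 1024 rows.
refresh_cycle_s = 8e-3
rows = 1024

# Distributed refresh: one row is refreshed every refresh_cycle / rows.
interval_s = refresh_cycle_s / rows
print(interval_s * 1e6)  # microseconds between row refreshes (≈ 7.8 µs)
```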

3.4 Read-Only Memory (ROM) and Flash Memory#

1. Mask ROM (MROM)#

ROM with fixed storage content, provided by the manufacturer. Once the ROM chip is made, the stored content cannot be changed. It is used to store widely used programs or data with standard functions, or user-customized programs or data with special functions (all of which use binary code).

  • Advantages: High reliability and integration, low price.
  • Disadvantages: Cannot be rewritten.

2. Programmable ROM#

Users can modify its stored content. Depending on the programming operation, programmable ROM can be divided into:

  • One-time programmable (PROM)
    Characteristics: Users can change certain storage elements in the product; users can program once.
    Advantages: Can be programmed according to user needs.
    Disadvantages: Can only be rewritten once.
  • Erasable Programmable ROM (EPROM)
    The stored content can be written as needed. When an update is needed, the original stored content is erased, and new content is written in.
  • Electrically Erasable Programmable ROM (EEPROM)

3. Flash Memory#

  • Flash memory (also called flash storage) is a high-density non-volatile read/write memory.
  • High density means it has a huge number of bits of storage capacity.
  • Non-volatile means the stored data can be preserved for a long time without power.
  • It has the advantages of both RAM and ROM, representing a revolutionary advancement in storage technology.

The basic operations of flash memory include programming operations, reading operations, and erasing operations.

3.5 Parallel Memory (Key Points)#

  • The speed mismatch between the CPU and main memory has become a major bottleneck in the design of high-speed computers.
  • To increase the data transfer rate between the CPU and main memory, in addition to using faster technologies for main memory to shorten read times, parallel technology memory can also be used.
  • Dual-port memory — spatial parallel technology.
  • Multi-module cross memory — temporal parallel technology.

3.5.1 Dual-Port Memory#

1. Logic Structure of Dual-Port Memory#

Dual-port memory is so named because one and the same memory has two sets of independent read/write control circuits. Since the two ports can operate independently in parallel, it is a high-speed memory that is very useful in research and engineering applications.
For example, the logic diagram of dual-port memory IDT7133 is shown in the figure on the next page.
Insert image description here

2. Non-Conflict Read/Write Control#

  • When the addresses of the two ports are different, read/write operations can be performed on both ports without conflict.
  • When either port is selected and driven, the entire memory can be accessed, and each port has its own chip select control (CE) and output driver control (OE).
  • During read operations, the OE (active low) of the port opens the output driver, and the data read from the storage matrix appears on the I/O line.

3. Conflict Read/Write Control#

  • When both ports access the same storage unit in memory simultaneously, a read/write conflict occurs.
  • To solve this problem, a BUSY flag is set. In this case, the chip's judgment logic can decide which port to prioritize for read/write operations, while the other delayed port is set to BUSY (BUSY becomes low), effectively temporarily disabling that port.

3.5.2 Multi-Module Cross Memory#

1. Modular Organization of Memory#

A main memory composed of several modules is linearly addressed. There are two ways to distribute these addresses among the modules: the sequential method and the cross (interleaved) method.
Insert image description here
[Example] M0-M3 has four modules, each with 8 words.
Sequential Method: M0: 0-7
       M1: 8-15
       M2: 16-23
       M3: 24-31
The 5-bit address is organized as X X | X X X:
the high 2 bits select the module, the low 3 bits select the address within the module.

  • Characteristics: When a certain module is accessed, other modules do not work.
  • Advantages: When a certain module fails, other modules can continue to work, and it is relatively easy to expand memory capacity by adding modules.
  • Disadvantages: Each module works serially, limiting the bandwidth of the memory.

[Example] M0-M3 has four modules, each with 8 words.
Cross Method: M0: 0, 4,…… remainder 0 when divided by 4
        M1: 1, 5,…… remainder 1 when divided by 4
        M2: 2, 6,…… remainder 2 when divided by 4
        M3: 3, 7,…… remainder 3 when divided by 4
The 5-bit address is organized as X X X | X X:
the high 3 bits select the address within the module, the low 2 bits select the module.

  • Characteristics: Continuous addresses are distributed across adjacent different modules, and addresses within the same module are not continuous.
  • Advantages: Block transfers of continuous words can achieve multi-module pipelined parallel access, greatly improving the bandwidth of the memory. This is used for batch data reading.
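The two addressing schemes from the examples can be sketched side by side; both functions return a (module, word-within-module) pair for the 4-module, 8-words-per-module configuration above.

```python
def sequential_module(addr: int, words_per_module: int = 8):
    """Sequential method: high bits select the module, low bits the word."""
    return addr // words_per_module, addr % words_per_module

def cross_module(addr: int, num_modules: int = 4):
    """Cross (interleaved) method: low bits select the module."""
    return addr % num_modules, addr // num_modules

# Address 9 lives in M1 under both schemes, but consecutive addresses
# 8, 9, 10, 11 all fall in M1 sequentially, while the cross method
# spreads them over M0..M3 — enabling pipelined parallel access.
```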

2. Basic Structure of Multi-Module Cross Memory#

The figure below shows the structure diagram of a four-module cross memory. The main memory is divided into 4 independent modules M0, M1, M2, and M3, each with its own read/write control circuit, address register, and data register, all transmitting information to the CPU in the same way. Ideally, if program segments or data blocks are continuously accessed in main memory, the access speed of main memory will be greatly improved.
Insert image description here

3.6 Cache Memory (Key Points)#

3.6.1 Basic Principles of Cache#

1. Function of Cache#

  • Cache is an important technology adopted to solve the speed mismatch problem between the CPU and main memory.
  • It is a small capacity high-speed buffer memory located between the CPU and main memory.
  • Based on the locality principle of program access.
  • It can quickly provide instructions and data to the CPU, thereby accelerating the execution speed of the program.
  • To pursue high speed, all functions, including management, are implemented by hardware.
Locality Principle of Program Access#

In a short time interval, the program frequently accesses memory addresses within a local range while accessing addresses outside that range very rarely, which is called the locality of the program.
Generally, cache is made of high-speed SRAM, which is more expensive than main memory, but because its capacity is much smaller than main memory, it can better solve the conflict between speed and price.

2. Basic Principles of Cache#

  • The design basis of cache: The data accessed by the CPU this time is likely to also be nearby data in the next access. (Locality of program access)
  • Data exchange between the CPU and Cache is word-oriented.
  • Data exchange between main memory and Cache is block-oriented.
  • When the CPU reads a word from memory, it sends the memory address of that word to both Cache and main memory. At this time, the Cache control logic determines based on the address whether this word is currently in Cache. If it is, this word is immediately sent to the CPU; if not, it reads this word from main memory using the main memory read cycle and sends it to the CPU, while also reading the entire data block containing this word from main memory into Cache.

In the figure below, the cache is divided into 4 rows, each containing 4 words. The addresses allocated to cache are kept in a content addressable memory (CAM), a memory that is addressed by content. When the CPU executes a memory access instruction, it sends the address of the word to be accessed to both the CAM and main memory. The address sent to the CAM is compared by content; if the word is not in cache, it is fetched from main memory and sent to the CPU, and at the same time the entire row (the four words) containing that word is loaded into cache.
Insert image description here
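The fetch flow just described can be sketched as a toy simulation. The data structures are hypothetical (a real cache also has a bounded number of rows, tag comparison hardware, and a replacement policy); the sketch only shows the miss-loads-whole-block behaviour.

```python
# Toy sketch: on a miss, the whole block containing the requested word
# is brought into cache; later words from that block hit.
BLOCK_SIZE = 4                   # words per block/row, as in the 4-word-row figure

main_memory = list(range(64))    # word i holds value i, for illustration
cache = {}                       # block number -> list of words (tag + data store)

def read_word(addr: int) -> int:
    block, offset = divmod(addr, BLOCK_SIZE)
    if block not in cache:       # miss: load the entire block from main memory
        start = block * BLOCK_SIZE
        cache[block] = main_memory[start:start + BLOCK_SIZE]
    return cache[block][offset]  # hit path: word delivered straight from cache

assert read_word(13) == 13       # miss: loads block 3 (words 12..15)
assert read_word(14) == 14       # hit in the freshly loaded block
```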

3. Structure of Cache#

  • The data block of Cache is called a row, denoted as L~i~, where i=0, 1, … , m-1.
  • The data block of main memory is called a block, denoted as B~j~, where j=0, 1, … , n-1.
  • Rows and blocks are of equal length, and each row (block) contains k main memory words.
  • Cache consists of data memory and tag memory.
    • Data memory: Stores the data of one data block from main memory.
    • Tag memory: Stores the address information of the data in main memory.
      Insert image description here

4. Hit and Miss#

Hit:

  • Main memory block is loaded into cache.
  • Main memory block and cache block establish a corresponding relationship.
  • A tag records the main memory block number that has established a corresponding relationship with a certain cache block.

Miss:

  • Main memory block is not loaded into cache.
  • Main memory block and cache block have not established a corresponding relationship.

Hit Rate:

  • From the perspective of the CPU, the purpose of adding a cache is to make the average read time of main memory as close as possible to the read time of cache in terms of performance.
  • To achieve this goal, the portion of all memory accesses that cache satisfies for the CPU should occupy a high proportion, meaning the hit rate of cache should be close to 1.
  • During the execution of a program, let Nc be the total number of accesses satisfied by cache and Nm the total number satisfied by main memory; the hit rate h is then defined as h = Nc / (Nc + Nm).
  • If Tc is the cache access time on a hit and Tm the main memory access time on a miss, then the average access time Ta of the cache/main memory system is: $T_a = h \cdot T_c + (1-h) \cdot T_m$
  • Our goal is to make the average access time T~a~ of the cache/main memory system as close to T~c~ as possible with a small hardware cost.
  • Let r denote the ratio by which main memory is slower than cache, $r = \frac{T_m}{T_c}$, and let e denote the access efficiency. Then:

$$e = \frac{T_c}{T_a} = \frac{T_c}{h \cdot T_c + (1-h) \cdot T_m} = \frac{1}{h + (1-h) \cdot r} = \frac{1}{r + (1-r) \cdot h}$$

- From this expression, to improve access efficiency, **the hit rate h should be as close to 1 as possible, and the r value should ideally be between 5 and 10**, not too large.
- The hit rate h is related to the behavior of the program, the capacity of the cache, the organization method, and the block size.

Example 6: When the CPU executes a program, 1900 accesses are completed by cache and 100 accesses are completed by main memory. Given that the cache access cycle is 50 ns and the main memory access cycle is 250 ns, calculate the efficiency and average access time of the cache/main memory system.

Solution:
h = Nc / (Nc + Nm) = 1900/(1900+100) = 0.95
r = Tm / Tc = 250 ns / 50 ns = 5
e = 1/(r+(1-r)h) = 1/(5+(1-5)×0.95) = 83.3%
e = Tc / Ta, so Ta = Tc / e = 50 ns / 0.833 = 60 ns

### 3.6.2 Address Mapping between Main Memory and Cache

- Compared to main memory, **the capacity of cache is very small, and the content it holds is only a subset of the content of main memory**; data exchange between cache and main memory is done in blocks.
- To place a main memory block into cache, some method must be applied to locate main memory addresses in cache, which is called **address mapping**.
- Whichever mapping method is chosen, both main memory and cache must be divided into blocks of the same size.
- When choosing a mapping method, consider:
  - Whether the hardware is easy to implement.
  - Whether address transformation is fast.
  - Whether the utilization of main memory space is high.
  - The probability of conflict when a block is loaded into cache.
- Below we introduce three mapping methods: **fully associative mapping, direct mapping, and set associative mapping**.
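Before turning to the mapping methods, the hit-rate, efficiency, and average-access-time formulas from 3.6.1 can be checked numerically; this sketch reuses Example 6's numbers.

```python
# Checking the formulas with Example 6's numbers.
Nc, Nm = 1900, 100          # accesses completed by cache / by main memory
Tc, Tm = 50e-9, 250e-9      # access cycles, in seconds

h = Nc / (Nc + Nm)          # hit rate (0.95)
r = Tm / Tc                 # speed ratio (≈ 5)
e = 1 / (r + (1 - r) * h)   # access efficiency (≈ 0.833)
Ta = h * Tc + (1 - h) * Tm  # average access time (≈ 60 ns)

# The two routes to Ta agree: Ta = h*Tc + (1-h)*Tm and Ta = Tc / e.
assert abs(Ta - Tc / e) < 1e-15
```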
#### Fully Associative Mapping

The address (block number) of a main memory block and its content (words) are stored together in a cache row, with the block address held in the tag field of the row. Any block in main memory can be mapped to any row in cache.

![Insert image description here](https://img-blog.csdnimg.cn/20210615202323953.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

- The CPU memory access instruction specifies a memory address (composed of block number and word).
- For fast retrieval, the block number in the main memory address is compared simultaneously with the tags of all rows in cache by a comparator. If one matches, it is a hit, and the word is read from cache; on a miss, the word is read from main memory according to the memory address.
- Conversion formulas:
  - Length of main memory address = (s+w) bits.
  - Number of addressable units = 2^(s+w) words or bytes.
  - Block size = row size = 2^w words or bytes.
  - Number of blocks in main memory = 2^s.
  - Tag size = s bits.
  - Number of rows in cache = not determined by the address format.
- Advantages: a block from main memory can be copied to any row in cache, so it is **very flexible**, with **high utilization of cache space** and a **low probability of block conflict**.
- Disadvantages: **the comparator is difficult to implement**, requiring a fast and expensive associative memory; **only suitable for small-capacity caches**.

#### Direct Mapping

- A **many-to-one** mapping relationship.
- **A block from main memory can only be copied to one specific row position in cache.**
- The row number i of cache and the block number j of main memory satisfy:
  - i = j mod m (m is the total number of rows in cache).
  - The content of the j-th block of main memory is copied to the i-th row of cache.
![Insert image description here](https://img-blog.csdnimg.cn/20210615202913451.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

##### Retrieval Process of Direct Mapping

- Use the row-number field of the address to select the corresponding row.
- Compare the row's tag with the tag field of the CPU access address. If they match, it is a hit, and cache is accessed; on a miss, main memory is accessed and the corresponding block is written into cache.

![Insert image description here](https://img-blog.csdnimg.cn/20210616090434170.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

**Conversion formulas**:

- Length of main memory address = (s+w) bits.
- Number of addressable units = 2^(s+w) words or bytes.
- Block size = row size = 2^w words or bytes.
- Number of blocks in main memory = 2^s.
- Number of rows in cache = m = 2^r.
- Tag size = (s-r) bits.

Advantages: far fewer comparison circuits are needed (m times fewer), so **hardware implementation is simple**; the cache address is simply the low-order bits of the main memory address, requiring no transformation.
Disadvantages: **high probability of conflict**.
Application scenario: suitable for **large-capacity caches**.

#### Set Associative Mapping

The advantages and disadvantages of fully associative mapping and direct mapping are exactly opposite. In terms of flexibility of block placement and hit rate, the former is superior; in terms of comparator-circuit simplicity and hardware cost, the latter is better.

- Cache is divided into u groups (sets), each with v rows.
- The mapping between main memory blocks and cache groups is direct, meaning **which group a main memory block goes into is fixed**; within a group, the mapping to rows is fully associative, meaning **a main memory block can be placed in any row of its fixed group**.
- The relationship between the group number q of cache and the block number j of main memory is:
  - q = j mod u (u is the total number of groups in cache).
  - The content of the j-th block of main memory is copied to some row of the q-th group of cache.
- Address transformation:
  - To check whether main memory address x is in cache, first compute y = x mod u, then search once within group y.

Set associative mapping is easier to implement than fully associative mapping and has a lower conflict probability than direct mapping. If v = 1, it degenerates into direct mapping; if u = 1, it becomes fully associative mapping. The value of v is generally small, typically a power of 2 (typical values are 2, 4, 8, 16); such a cache is called a v-way set associative cache.

![Insert image description here](https://img-blog.csdnimg.cn/20210616091621259.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

**Conversion formulas**:

- Length of main memory address = (s+w) bits.
- Number of addressable units = 2^(s+w) words or bytes.
- Block size = row size = 2^w words or bytes.
- Number of blocks in main memory = 2^s.
- Number of rows per group = v.
- Number of groups u = 2^d.
- Number of rows in cache = u×v.
- Tag size = (s-d) bits.

Example 6: A set associative cache has 64 rows, 4 rows per group; main memory contains 4K blocks, each of 128 words. Give the format of the memory address.

Solution:
Block size = row size = 2^w words = 128 = 2^7, so w = 7.
Rows per group v = 4; rows in cache = u×v = 64, so the number of groups u = 16.
u = 2^d = 16 = 2^4, so d = 4.
Number of blocks in main memory = 2^s = 4K = 2^2 × 2^10 = 2^12, so s = 12.
The address format is:

| Tag (s-d) | Group Number (d) | Word Number (w) |
|--|--|--|
| 8 bits | 4 bits | 7 bits |

### 3.6.3 Replacement Strategies

- The working principle of cache requires that the most recently useful data be kept in cache as much as possible.
- When a new memory block needs to be copied into cache but all the positions allowed for that block are occupied by other main memory blocks, a replacement is needed.
  - In direct mapping: **replace directly** (there is only one candidate row).
  - In fully associative and set associative mapping: one row is selected from the several candidate rows that may hold the new main memory block.
- Commonly used replacement algorithms:
  - Least Frequently Used (LFU) algorithm.
  - Least Recently Used (LRU) algorithm.
  - Random replacement.

#### Least Frequently Used (LFU) Algorithm

- Replace **the row that has been accessed the fewest times within a certain period**.
- **Each row has a counter.** A newly loaded row starts counting from 0, and each access increments the accessed row's counter by 1. When a replacement is needed, the row with the smallest count is replaced, and the counters of the candidate rows are reset to zero.
- The counting period of this algorithm is limited to the interval between two replacements of the rows in question, so **it cannot reflect the most recent access pattern of the cache**.

#### Least Recently Used (LRU) Algorithm

- Replace **the row that has gone unaccessed for the longest time**.
- Each row again has a counter; on each cache hit, the hit row's counter is reset to zero while the counters of the other rows are incremented by 1. When a replacement is needed, the row with the largest count is replaced.
- This algorithm protects data rows that have just been copied into cache, giving **a higher hit rate**.

#### Random Replacement

- **Randomly select a row to replace** from the candidate row positions.
- This strategy is **easy to implement in hardware** and **faster than the two strategies above**.
- Its drawback is that a randomly evicted row may be needed again soon, which **lowers the hit rate and cache efficiency**. This drawback diminishes as cache capacity increases.
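The counter-based LRU scheme described above can be sketched for a single fully associative set; the class structure is illustrative, not a definitive implementation.

```python
# Sketch of counter-based LRU for one set of `ways` rows:
# on a hit the hit row's counter resets to 0 and the others age by 1;
# on a miss the row with the largest counter (the LRU row) is replaced.
class LRUSet:
    def __init__(self, ways: int):
        self.tags = [None] * ways   # None marks an empty (invalid) row
        self.count = [0] * ways

    def access(self, tag) -> bool:
        """Return True on hit; on a miss, replace the LRU row."""
        if tag in self.tags:
            i = self.tags.index(tag)            # hit: this row becomes newest
            hit = True
        else:
            i = max(range(len(self.tags)),      # victim: empty rows first,
                    key=lambda k: (self.tags[k] is None, self.count[k]))
            self.tags[i] = tag
            hit = False
        for k in range(len(self.count)):        # age every row ...
            self.count[k] += 1
        self.count[i] = 0                       # ... except the one just used
        return hit

s = LRUSet(2)
# "B" is evicted by "C" (A was touched more recently), so "B" misses again.
assert [s.access(t) for t in ["A", "B", "A", "C", "B"]] == [False, False, True, False, False]
```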
## 3.7 Virtual Memory (Key Points)

### 3.7.1 Basic Concepts of Virtual Memory

#### 1. Real Address and Virtual Address

- The address used **when writing programs** is called a **virtual address or logical address**, and the corresponding storage space is called the **virtual memory space or logical address space**.
- The access address of **physical memory** is called a **real address or physical address**, and the corresponding storage space is called the **physical storage space or main memory space**.
- The process of **converting a program's virtual addresses into real addresses is called program relocation**.

#### 2. Access Process of Virtual Memory

- User programs are written against virtual addresses in the virtual memory space and are stored in **auxiliary storage**. When a program runs, **the address translation mechanism** loads part of the program into real memory, based on the real address space allocated to that program.
- On each memory access, it is first checked whether the part corresponding to the virtual address is in real memory:
  - If it is, address translation is performed and main memory is accessed with the real address;
  - Otherwise, a part of the program is brought in from auxiliary storage according to some algorithm, and main memory is then accessed in the same way.
- Thus the virtual address space of a program can be much larger than the real address space, or much smaller.
  - The former aims to **enlarge the available storage capacity**, while the latter aims at **address transformation**.
  - The latter usually occurs in multi-user or multitasking systems: the real memory space is large, but a single task does not need a large address space, and a smaller virtual memory space can **shorten the instruction address field**.
- Each program can thus have a virtual memory with the **capacity of auxiliary storage** and an **access speed close to that of main memory**.
However, this virtual memory is a conceptual model composed of main memory, auxiliary storage, and auxiliary storage management components, not an actual physical memory. Virtual memory is realized by adding hardware and software on top of main memory and auxiliary storage.

#### 3. Similarities and Differences between Cache and Virtual Memory

- From the concept of virtual memory, the access mechanism between main memory and auxiliary storage is similar to that between cache and main memory. These are two levels of the three-level storage system composed of cache, main memory, and auxiliary storage.
- Auxiliary hardware and software handle address transformation and management between cache and main memory, and between main memory and auxiliary storage, so that the levels form an organic three-level storage system.
- **Cache and main memory constitute the system's memory, while main memory and auxiliary storage, supported by auxiliary software and hardware, form the virtual memory.**

In the three-level storage system, cache–main memory and main memory–auxiliary storage have much in common:

- **Same starting point**: both build a layered storage system to **improve the performance-price ratio of the storage system**, striving to bring the system's performance close to that of the fast memory while keeping its price and capacity close to those of the slow memory.
- **Same principle**: both exploit the **locality principle of program execution** to move the most recently used blocks of information from slower, larger storage into faster, smaller storage.
However, the two storage levels of cache-main memory and main memory-auxiliary storage also differ in many ways:

- **Different focus**: **Cache mainly solves the speed gap between main memory and the CPU**; in terms of improving the performance-price ratio, **virtual memory mainly addresses storage capacity**, as well as storage management, main memory allocation, and storage protection.
- **Different data paths**: **The CPU has direct access paths to both cache and main memory**, and on a cache miss it can access main memory directly; **there is no direct data path between auxiliary storage and the CPU**, so a main memory miss can only be resolved through page swapping, with the CPU ultimately accessing main memory.
- **Different transparency**: **Cache management is handled entirely by hardware and is transparent to both system programmers and application programmers**; virtual memory management is handled jointly by software (the operating system) and hardware. Because software is involved, **virtual memory is not transparent to the system programmers who implement storage management, but it is transparent to application programmers** (segment and page management are "semi-transparent" to application programmers).
- **Different miss penalties**: The access time of main memory is 5 to 10 times that of cache, while **the access speed of main memory is usually thousands of times that of auxiliary storage**, so **the performance loss on a main memory miss is far greater than on a cache miss**.

#### 4. Key Issues the Virtual Memory Mechanism Must Solve

- **Scheduling issues**: deciding which programs and data should be loaded into main memory.
- **Address mapping issues**: transforming virtual addresses into physical addresses of main memory (**internal address transformation**) and transforming virtual addresses into auxiliary storage addresses (**external address transformation**) for page swapping. This must also cover main memory allocation, storage protection, and program relocation.
- **Replacement issues**: deciding which programs and data should be swapped out of main memory.
- **Update issues**: ensuring **consistency between main memory and auxiliary storage**.

Under the control of the operating system, hardware and system software solve the above problems for the user, greatly simplifying application programming.

### 3.7.2 Page-Based Virtual Memory

#### 1. Page-Based Virtual Memory Address Mapping

- In a page-based virtual storage system, the virtual address space is divided into equal-sized pages called **logical pages**; the main memory space is divided into pages of the same size called **physical pages**.
- The virtual address is divided into two fields: **the high field is the logical page number, the low field is the page offset**. The physical address is likewise divided into two fields: **the high field is the physical page number, the low field is the page offset.**
- A **page table** converts virtual (logical) addresses into physical addresses. The address mapping process of a page-based virtual storage system is shown in the figure below.

![Insert image description here](https://img-blog.csdnimg.cn/20210617193837358.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

- In most systems, each process has its own page table.
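The page-table lookup just described can be sketched in a few lines. This is a minimal illustration only: the 4 KiB page size, the table contents, and the `translate` helper are all invented for the example, not taken from the text.

```python
# Minimal sketch of page-based address translation.
# Illustrative values: 4 KiB pages, so the page offset field is 12 bits.
PAGE_SIZE = 4096

# page_table[logical_page_number] = (valid_bit, physical_page_number)
page_table = [
    (1, 5),   # logical page 0 -> physical page 5
    (0, 0),   # logical page 1 not loaded into main memory
    (1, 9),   # logical page 2 -> physical page 9
]

def translate(virtual_addr: int) -> int:
    """Split the virtual address, index the page table with the logical
    page number, and concatenate physical page number with page offset."""
    vpn = virtual_addr // PAGE_SIZE      # high field: logical page number
    offset = virtual_addr % PAGE_SIZE    # low field: page offset
    valid, ppn = page_table[vpn]
    if not valid:
        # Missing page: the OS would now bring it in from auxiliary storage.
        raise RuntimeError(f"page fault on logical page {vpn}")
    return ppn * PAGE_SIZE + offset

print(hex(translate(0x2ABC)))  # logical page 2, offset 0xABC -> 0x9abc
```

Note how the page offset passes through unchanged; only the high field is rewritten, which is exactly why page sizes are powers of 2.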
Each entry in the page table corresponds to one virtual page and contains the main memory page address (physical page number) where that virtual page resides, as well as a valid bit indicating whether the logical page has been loaded into main memory.

- During address transformation, **the logical page number is used to index the page table (treating the virtual page number as the subscript of the page table array) and find the corresponding physical page number, which becomes the high field of the physical address and is concatenated with the page offset of the virtual address** to form the complete physical address.
- The number of pages each process needs is not fixed, so the page table length varies. A common implementation therefore keeps the base address of the page table in a register while the page table itself resides in main memory. Since the virtual address space can be very large, each process's page table can be very long. To save the main memory occupied by the page table itself, some systems keep page tables in virtual memory, so the page table itself must also be paged.
- While a process runs, part of its page table is in main memory and the rest stays in auxiliary storage.
- Other systems adopt a **two-level page table** structure: each process has a page directory table, each entry of which points to a page table. **If the page directory table has m entries and each page table has at most n entries, a process can have at most m × n pages.**
- In systems with long page tables, an inverted page table can also be used to map in reverse from physical page numbers to logical page numbers.

#### 2. Translation Lookaside Buffer (TLB)

- Since page tables normally reside in main memory, even when the logical page is already in main memory, one memory access requires at least two accesses to physical memory, doubling the access time of virtual memory.
- To avoid extra accesses to main memory, the page table itself can be cached: the most active parts of the page table are kept in high-speed memory, forming the **TLB (fast table)**.
- The complete page table stored in main memory is correspondingly called the **slow table**.
- The address mapping process with a TLB is shown in the figure.

![Insert image description here](https://img-blog.csdnimg.cn/20210617194507114.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

#### 3. Internal Page Table and External Page Table

- The page table that **transforms virtual addresses into main memory physical addresses** is usually called the internal page table.
- The **external page table** is used for **transformations between virtual addresses and auxiliary storage addresses**.
- On a page fault, the page-swapping operation must first locate the page in auxiliary storage, so the structure of the external page table is closely tied to the addressing mechanism of the auxiliary storage.
- For a disk, for example, the auxiliary storage address includes the drive number, head number, track number, and sector number.

The main advantages of page-based storage are:

- High utilization of main memory.
- A relatively simple page table.
- Fast address transformation.
- Easier disk management.

Main disadvantages:

- Pages bear no relation to the program's logical modules, so modularity is poor.
- Long page tables, which consume a large amount of storage space.

### 3.7.3 Segment-Based Virtual Memory and Segment-Page-Based Virtual Memory

#### 1. Segment-Based Virtual Memory

- **Segments are variable-length areas divided along the natural boundaries of the program.**
- Programmers place different kinds of objects, such as subroutines, operands, and constants, into different segments, and each program can have multiple segments of the same type.
- In a segment-based virtual storage system, **the virtual address consists of a segment number and an offset within the segment**. The transformation from virtual address to real main memory address is done through a **segment table**.
- Each program has a segment table, and each entry of the segment table corresponds to one segment, containing at least the following three fields:
  - **Valid bit**: indicates whether the segment has been loaded into real memory.
  - **Segment base address**: the starting address of the segment in real memory, once loaded.
  - **Segment length**: records the actual length of the segment. The segment length field ensures that the offset used when accessing a segment does not exceed the segment length, preventing an address overflow from corrupting other segments.

The segment table itself is also a segment; it can reside in auxiliary storage but is generally kept resident in main memory. The transformation from a segment-based virtual address to a real memory address is shown in the figure.

![Insert image description here](https://img-blog.csdnimg.cn/20210617195345818.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

Segment-based virtual memory has several advantages:

- ① The **logical independence** of segments makes them easy to compile, manage, modify, and protect, and also **facilitates sharing among multiple programs**.
- ② **Segment lengths can change dynamically as needed**, allowing flexible scheduling that uses main memory space effectively.

Disadvantages:

- ① Because segment lengths are not fixed, **allocating main memory space is more complicated**.
- ② Many **external fragments** tend to be left between segments, reducing the utilization of storage space.
- ③ Since segment lengths are not necessarily powers of 2, the lowest bits of the virtual and real addresses cannot simply serve as the offset within the segment; the physical address must be computed by adding the segment base address and the offset. Segment-based storage management therefore requires **more hardware support** than page-based storage management.

#### 2. Segment-Page-Based Virtual Memory

- Segment-page-based virtual memory combines segment-based and page-based virtual memory. **Main memory is divided into pages. Each program is first divided into segments according to its logical structure, and each segment is then divided into pages matching the main memory page size**, so programs are loaded and unloaded by pages while programming, protection, and sharing are done by segments.

[Example 1] Suppose there are three programs, identified by base numbers A, B, and C, whose base register contents are S~A~, S~B~, and S~C~ respectively. Program A consists of 4 segments and program C consists of 3 segments. The transformation from logical address to physical address in a segment-page-based virtual storage system is shown in the figure. In main memory, each program has a segment table; program A has 4 segments and program C has 3 segments, each segment with its own page table. Each row of the segment table gives the starting position of the corresponding page table, and each row of a page table gives a physical page number.
Please explain the process of transforming virtual addresses to physical addresses.

![Insert image description here](https://img-blog.csdnimg.cn/20210617200047485.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

Solution: The address transformation proceeds as follows:

(1) Using base number C, the storage management unit looks up the base register table to obtain the segment table base address S~C~ of program C. From the S-th entry of program C's segment table (here S = 1), it finds the starting address of the page table for segment S.
(2) The logical page number P (here P = 2) indexes that page table, yielding the physical page number (10 in the figure).
(3) The physical page number is concatenated with the page offset to obtain the physical address.

If the computer system has only one base register, the base number can be omitted; during multiprogram switching, the operating system modifies the contents of the base register. Evidently, the disadvantage of segment-page-based virtual memory is that mapping a virtual address to a main memory address requires multiple table lookups, making it more complex to implement.

### 3.7.4 Replacement Algorithms for Virtual Memory

When pages are brought from auxiliary storage into a main memory that is already full, pages in main memory must be replaced. The replacement algorithms for virtual memory are similar to those for cache: FIFO, LRU, LFU, and so on. They differ from cache replacement algorithms as follows:

- (1) Cache replacement is implemented entirely in hardware, while virtual memory replacement has the support of the operating system.
- (2) A page fault in virtual memory hurts system performance far more than a cache miss, since page swapping requires accessing auxiliary storage and involves a task switch.
- (3) Page replacement in virtual memory has a wide range of choices: any page belonging to the process can be replaced.

## Summary of This Chapter

- The requirements on memory are large capacity, fast speed, and low cost. To resolve the contradictions among these three, computers adopt a **multi-level storage architecture**: cache, main memory, and external storage. The CPU can directly access internal memory (cache, main memory) but cannot directly access external storage. **The technical indicators of memory** include storage capacity, access time, storage cycle, and memory bandwidth.
- The widely used **SRAM and DRAM** are both semiconductor random read/write memories. The former is faster than the latter but has lower integration density. Both are small, reliable, and inexpensive, but lose their contents on power loss.
- **Read-only memory and flash memory** compensate for exactly this shortcoming of SRAM and DRAM, retaining written data after power loss. Flash memory in particular offers high performance, low power consumption, high reliability, and mobility, representing a newer storage architecture.
- **Dual-port memory and multi-module interleaved memory** are parallel memory structures: the former uses spatial parallelism, the latter temporal parallelism. Both are widely used in research and engineering.
- Cache is a high-speed buffer memory, an important hardware technique for bridging the speed gap between the CPU and main memory; it has developed into multi-level cache systems with separate instruction and data caches. The cache hit rate should be close to 1.
- The address mapping between main memory and cache has three methods: **fully associative, direct, and set associative**. Set associative mapping is a compromise between the other two, moderately combining their advantages while avoiding their disadvantages; it is attractive in flexibility, hit rate, and hardware cost, and is therefore widely adopted.
- User programs are written with virtual addresses (logical addresses) and stored in auxiliary storage. When a program runs, the address transformation mechanism loads part of the program into real memory (the physical storage space, or main memory space) according to the real address space allocated to that program. The operating system, with hardware support, transforms the program's virtual addresses into real addresses, a process known as program relocation. **Each memory access first checks whether the part corresponding to the virtual address is in real memory: if so, address transformation is performed and main memory is accessed with the real address; otherwise, part of the program in auxiliary storage is brought into memory according to a certain algorithm, and main memory is then accessed in the same way.** For application programs, if the main memory hit rate is high, the access time of virtual memory approaches that of main memory, while the size of virtual memory depends only on the size of auxiliary storage.
- The virtual memory mechanism must also solve several key issues, including **scheduling, address mapping, and replacement**. Under the control of the operating system, hardware and system software solve these problems for the user, greatly simplifying application programming.
- In a **page-based virtual storage system**, both the virtual address space and the main memory space are divided into equal-sized pages, and a page table converts virtual addresses into physical addresses. To avoid extra accesses to main memory, the page table itself can be cached, with its most active parts kept in the Translation Lookaside Buffer (TLB).
- The disadvantage of paging is that the page length bears no relation to the logical structure of the program, while segmentation divides memory into variable-length areas along the program's natural boundaries. In a **segment-based virtual storage system**, the virtual address consists of a segment number and an offset within the segment, and the transformation from virtual address to real main memory address goes through a **segment table**.
- **Segment-page-based virtual memory** combines segment-based and page-based virtual memory, loading and unloading programs by pages while programming, protecting, and sharing by segments. Virtual memory must also address storage protection and related issues: virtual storage systems typically use page table protection, segment table protection, and key-based protection to protect storage areas, and access-mode protection can be applied in combination with how the information in main memory is used.
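As a concrete illustration of the LRU policy named in Section 3.7.4, here is a minimal page-replacement sketch. The `LRUPager` class and the reference trace are invented for illustration; real systems only approximate LRU with hardware-maintained use bits and OS bookkeeping.

```python
from collections import OrderedDict

class LRUPager:
    """Toy LRU page replacement: main memory holds `capacity` physical
    pages; a reference either hits or evicts the least recently used page."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.frames = OrderedDict()  # insertion order tracks recency of use

    def reference(self, page: int) -> bool:
        """Return True on a hit, False on a page fault."""
        if page in self.frames:
            self.frames.move_to_end(page)  # mark as most recently used
            return True
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)  # evict the LRU page
        self.frames[page] = None  # "load" the page from auxiliary storage
        return False

pager = LRUPager(capacity=3)
hits = sum(pager.reference(p) for p in [1, 2, 3, 1, 4, 1, 2])
print(hits)  # -> 2: only the two re-references to page 1 hit
```

Tracing the example: after 1, 2, 3 fault in, re-referencing 1 hits; referencing 4 evicts page 2 (least recently used), so the final reference to 2 faults again, which is why pure FIFO and LRU can differ on the same trace.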