Computer Organization Principles Review Summary (III) Multi-level Memory

Chapter 3 Multi-level Memory#

This chapter contains a lot of content, mainly including various types of memory and storage methods, focusing on basic concepts of memory, DRAM, SRAM, cache, hit rate and average access time, main memory and cache mapping methods, and virtual memory.

3.1 Overview of Memory#

3.1.1 Classification of Memory#

  • Memory is the storage device in a computer system used to store programs and data.
  • Storage Medium: Currently mainly using semiconductor devices and magnetic materials.
  • Storage Bit: A bistable semiconductor circuit or a CMOS transistor or a storage element made of magnetic material can store one bit of binary code. This binary code bit is the smallest storage unit in memory, called a storage bit.
  • Storage Unit: A storage unit is composed of several storage bits. Many storage units make up a memory.

According to the different properties of storage materials and usage methods, memory can be classified in various ways:
(1) Based on storage medium, it is divided into magnetic surface/semiconductor memory.
(2) Based on access method, it is divided into random/sequential access (magnetic tape).
(3) Based on read/write function, it is divided into read-only memory (ROM) and random read/write memory (RAM).
(4) Based on volatility of information, it is divided into volatile and non-volatile.
(5) Based on role in the memory system, it is divided into main/auxiliary/cache/control.

3.1.2 Memory Hierarchy#

Current characteristics of memory:

  • Fast memory is expensive and has a small capacity;

  • Cheap memory is slow and has a large capacity.
    When designing the architecture of computer memory systems, we hope for large capacity, fast speed, and low cost. Therefore, in the design of memory systems, a compromise should be made among memory capacity, speed, and price, establishing a multi-level memory architecture, as shown in the figure below.
    Insert image description here

  • High-speed cache, referred to as cache, is a high-speed small-capacity semiconductor memory in the computer system.

  • Main memory is the primary storage of the computer system, used to hold the programs and data in use while the computer runs.

  • External storage (auxiliary memory) is large-capacity backing storage.

3.1.3 Technical Indicators of Main Memory#

  • Word Storage Unit: A storage unit that holds one machine word, and the corresponding unit address is called the word address.
  • Byte Storage Unit: A unit that holds one byte, and the corresponding address is called the byte address.
  • Storage Capacity: Refers to the total number of storage units that can be accommodated in a memory. The larger the storage capacity, the more information can be stored.
  • Access Time (also known as memory access time): Refers to the time from when a read operation command is issued until the operation is completed and the data is read out onto the data bus. Usually, the write operation time is taken to be equal to the read operation time, hence it is called memory access time.
  • Storage Cycle: Refers to the minimum time interval required to initiate two consecutive read operations. Usually, the storage cycle is slightly longer than the access time, and its time unit is ns.
  • Memory Bandwidth: The amount of information accessed by the memory in a unit time, usually measured in bits/second or bytes/second.
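To make the bandwidth indicator concrete, here is a small calculation sketch; the 32-bit width and 100 ns storage cycle are illustrative numbers assumed for the example, not figures from the text.

```python
# Assumed, illustrative figures: a 32-bit-wide main memory with a
# 100 ns storage cycle (minimum interval between two consecutive accesses).
word_width_bits = 32
storage_cycle_s = 100e-9

# Memory bandwidth = information accessed per unit time.
bandwidth_bits_per_s = word_width_bits / storage_cycle_s

print(bandwidth_bits_per_s / 1e6)      # in Mbit/s
print(bandwidth_bits_per_s / 8 / 1e6)  # in MB/s
```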

3.2 SRAM Memory (Static Random Access Memory)#

The currently widely used main memory (internal memory) is semiconductor memory. Based on the different mechanisms of information storage, it can be divided into two categories:

  • Static Random Access Memory (SRAM): Fast access speed, but storage capacity is not as large as DRAM.
  • Dynamic Random Access Memory (DRAM): Slightly slower access speed, but storage capacity is larger than SRAM.

3.2.1 Basic Static Storage Element Array#

  • Storage Bit: A latch (flip-flop). As long as a DC power supply is continuously applied to this memory circuit, it can indefinitely maintain the memory state of 1 or 0. If the power supply is cut off, the stored data (1 or 0) will be lost.
  • Three Sets of Signal Lines (Key Points): Address Lines, Data Lines (row lines, column lines), Control Lines.
  • Address Lines: If there are 6 lines, it specifies that the memory capacity is 2^6 = 64 storage units.
  • Data Lines: If there are 4 lines, it specifies that the memory word length is 4 bits, thus the total number of storage bits is 64×4 = 256.
  • Control Lines: R/~W control line, specifies whether to read from or write to the memory.

The address decoder outputs 64 selection lines, which we call row lines, and its function is to open the input of each storage bit's NAND gate.
Insert image description here
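The capacity arithmetic above (6 address lines, 4 data lines) can be written out directly:

```python
# The figures from the text: 6 address lines and 4 data lines.
address_lines = 6
data_lines = 4

storage_units = 2 ** address_lines        # 2^6 = 64 storage units
word_length = data_lines                  # memory word length: 4 bits
total_bits = storage_units * word_length  # 64 × 4 = 256 storage bits

assert (storage_units, total_bits) == (64, 256)
```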

3.2.2 Basic SRAM Logic Structure#

  • Most SRAM chips use a dual-decoding method to organize larger storage capacities.

  • It employs two-level decoding: dividing the address into x-direction and y-direction parts as shown in the figure.
    Insert image description here

  • Storage array (256 rows × 128 columns × 8 bits).

  • Address decoder

    • Uses dual decoding (reducing the number of selection lines).
    • A0~A7 are row address decode lines.
    • A8~A14 are column address decode lines.
  • There are 8 bidirectional data lines.

3.2.3 Read/Write Cycle Waveform Diagram#

Insert image description here
Insert image description here
Example 1: The figure shows the write timing diagram of SRAM. R/W is the read/write command control line. When the R/W line is low, the memory writes the data on the data line to the memory at the specified address. Please identify the errors in the write timing diagram and draw the correct write timing diagram.
Insert image description here

Solution: The timing signals for a memory write must be synchronized. While a negative pulse is applied to the R/W line, the levels on the address lines and data lines must remain stable, because data is latched as soon as the R/W line goes low. Therefore, if the data line changes value while R/W is low, the memory will store the new data (error ⑤); similarly, if the address lines change while R/W is low, data will also be written to the new address (errors ② and ③).
The correct write timing diagram is shown in diagram (b).

3.3 DRAM Memory (Dynamic Random Access Memory)#

The storage element of SRAM memory is a flip-flop, which has two stable states. The storage element of DRAM memory consists of a MOS transistor and a capacitor.
The MOS transistor acts as a switch, and the stored information of 1 or 0 is represented by the charge amount on the capacitor.

  • When the capacitor is fully charged, it represents storing a 1;
  • When the capacitor discharges and has no charge, it represents storing a 0.

The difference between DRAM and SRAM is:

  • It adds row and column address latches. Because DRAM has a large capacity, the address must be correspondingly wide, which would inevitably increase the number of address pins on the chip. To avoid this, the address code is transmitted in two halves (time-division multiplexing). For example, with a 10-bit address bus, address bits A0–A9 are sent first and latched into the row address latch by the row select signal RAS; then bits A10–A19 are sent and latched into the column address latch by the column select signal CAS. Together the two halves form a 20-bit address, giving a storage capacity of 1M×4 bits.
  • It adds a refresh counter and the corresponding control circuits. A DRAM cell must be refreshed after being read, and cells that have not been accessed must also be refreshed periodically. Refreshing is done by row, so the refresh counter is as long as the row address latch. Refresh operations alternate with read/write operations, so a 2-to-1 multiplexer selects either the refresh row address or the normal read/write row address.
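The time-division transmission of the address can be sketched as follows. Which half of the address serves as the row half is an assumption made for illustration (here the low half A0–A9, as in the text's example).

```python
def split_dram_address(addr: int, half_bits: int = 10):
    """Split a (2*half_bits)-wide DRAM address into the two halves sent
    over the multiplexed address pins: the row half (latched by RAS)
    first, then the column half (latched by CAS).

    Assumption for illustration: the low half (A0..A9) is the row address.
    """
    mask = (1 << half_bits) - 1
    row = addr & mask                 # A0..A9  -> row address latch (RAS)
    col = (addr >> half_bits) & mask  # A10..A19 -> column address latch (CAS)
    return row, col

row, col = split_dram_address((5 << 10) | 7)  # 20-bit address, 10 pins
```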

3.3.3 Read/Write Cycle, Refresh Cycle (Key Points)#

Read/Write Cycle#

Both the read cycle and the write cycle are defined from one falling edge of the row select signal RAS to the next falling edge of RAS, i.e., the time interval between two consecutive accesses. For ease of control, the read cycle and write cycle are usually made equal.
Insert image description here

Refresh Cycle#

DRAM storage bits are based on the charge amount on the capacitor, which decreases over time and temperature, so they must be refreshed periodically to maintain the correct information they originally remembered.
There are two types of refresh operations: centralized refresh and distributed refresh.

Centralized Refresh#

All rows of the DRAM are refreshed within each refresh cycle. For example, if the refresh cycle is 8 ms, every row must be refreshed once within each 8 ms window. To do this, the 8 ms is divided into two parts: the first part is used for normal read/write operations, and the remaining tail of the window is used to refresh all rows one after another. During this concentrated refresh period normal accesses cannot proceed, so centralized refresh introduces dead time.

Distributed Refresh#

The refresh of each row is interleaved into the normal read/write cycle. For example, in the DRAM shown in figure 3.7 on page 70, if the refresh cycle is 8ms, then each row must be refreshed every 8ms ÷ 1024 = 7.8us. Distributed refresh has no dead time!
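The distributed-refresh interval quoted above is simple arithmetic and can be checked directly:

```python
# Figures from the text: an 8 ms refresh cycle and 1024 rows.
refresh_cycle_s = 8e-3
rows = 1024

# Distributed refresh: one row is refreshed every refresh_cycle / rows.
interval_s = refresh_cycle_s / rows
print(interval_s * 1e6)  # microseconds between row refreshes (≈ 7.8 µs)
```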

3.4 Read-Only Memory (ROM) and Flash Memory#

1. Mask ROM (MROM)#

ROM with fixed storage content, provided by the manufacturer. Once the ROM chip is made, the stored content cannot be changed. It is used to store widely used programs or data with standard functions, or user-customized programs or data with special functions (all of which use binary code).

  • Advantages: High reliability and integration, low price.
  • Disadvantages: Cannot be rewritten.

2. Programmable ROM#

Users can modify its stored content. Depending on the programming operation, programmable ROM can be divided into:

  • One-time programmable (PROM)
    Characteristics: Users can change certain storage elements in the product; users can program once.
    Advantages: Can be programmed according to user needs.
    Disadvantages: Can only be rewritten once.
  • Erasable Programmable ROM (EPROM)
    The stored content can be written as needed. When an update is needed, the original stored content is erased, and new content is written in.
  • Electrically Erasable Programmable ROM (EEPROM)

3. Flash Memory#

  • Flash memory (also called flash storage) is a high-density non-volatile read/write memory.
  • High density means it has a huge number of bits of storage capacity.
  • Non-volatile means the stored data can be preserved for a long time without power.
  • It has the advantages of both RAM and ROM, representing a revolutionary advancement in storage technology.

The basic operations of flash memory include programming operations, reading operations, and erasing operations.

3.5 Parallel Memory (Key Points)#

  • The speed mismatch between the CPU and main memory has become a major bottleneck in the design of high-speed computers.
  • To increase the data transfer rate between the CPU and main memory, in addition to using faster technologies for main memory to shorten read times, parallel technology memory can also be used.
  • Dual-port memory — spatial parallel technology.
  • Multi-module cross memory — temporal parallel technology.

3.5.1 Dual-Port Memory#

1. Logic Structure of Dual-Port Memory#

Dual-port memory is so named because one and the same memory has two sets of independent read/write control circuits. Since the two ports can operate independently in parallel, it is a high-speed memory that is very useful in research and engineering applications.
For example, the logic diagram of dual-port memory IDT7133 is shown in the figure on the next page.
Insert image description here

2. Non-Conflict Read/Write Control#

  • When the addresses of the two ports are different, read/write operations can be performed on both ports without conflict.
  • When either port is selected and driven, the entire memory can be accessed, and each port has its own chip select control (CE) and output driver control (OE).
  • During read operations, the OE (active low) of the port opens the output driver, and the data read from the storage matrix appears on the I/O line.

3. Conflict Read/Write Control#

  • When both ports access the same storage unit in memory simultaneously, a read/write conflict occurs.
  • To solve this problem, a BUSY flag is set. In this case, the chip's judgment logic can decide which port to prioritize for read/write operations, while the other delayed port is set to BUSY (BUSY becomes low), effectively temporarily disabling that port.

3.5.2 Multi-Module Cross Memory#

1. Modular Organization of Memory#

A main memory composed of several modules is linearly addressed. There are two ways to distribute these addresses among the modules: the sequential method and the cross (interleaved) method.
Insert image description here
[Example] M0-M3 has four modules, each with 8 words.
Sequential Method: M0: 0-7
       M1: 8-15
       M2: 16-23
       M3: 24-31
The 5-bit address is organized as X X | X X X:
the high 2 bits select the module, the low 3 bits select the address within the module.

  • Characteristics: When a certain module is accessed, other modules do not work.
  • Advantages: When a certain module fails, other modules can continue to work, and it is relatively easy to expand memory capacity by adding modules.
  • Disadvantages: Each module works serially, limiting the bandwidth of the memory.

[Example] M0-M3 has four modules, each with 8 words.
Cross Method: M0: 0, 4,…… remainder 0 when divided by 4
        M1: 1, 5,…… remainder 1 when divided by 4
        M2: 2, 6,…… remainder 2 when divided by 4
        M3: 3, 7,…… remainder 3 when divided by 4
The 5-bit address is organized as X X X | X X:
the high 3 bits select the address within the module, the low 2 bits select the module.

  • Characteristics: Continuous addresses are distributed across adjacent different modules, and addresses within the same module are not continuous.
  • Advantages: Block transfers of continuous words can achieve multi-module pipelined parallel access, greatly improving the bandwidth of the memory. This is used for batch data reading.
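The two addressing schemes from the examples can be sketched side by side; both functions return a (module, word-within-module) pair for the 4-module, 8-words-per-module configuration above.

```python
def sequential_module(addr: int, words_per_module: int = 8):
    """Sequential method: high bits select the module, low bits the word."""
    return addr // words_per_module, addr % words_per_module

def cross_module(addr: int, num_modules: int = 4):
    """Cross (interleaved) method: low bits select the module."""
    return addr % num_modules, addr // num_modules

# Address 9 lives in M1 under both schemes, but consecutive addresses
# 8, 9, 10, 11 all fall in M1 sequentially, while the cross method
# spreads them over M0..M3 — enabling pipelined parallel access.
```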

2. Basic Structure of Multi-Module Cross Memory#

The figure below shows the structure diagram of a four-module cross memory. The main memory is divided into 4 independent modules M0, M1, M2, and M3, each with its own read/write control circuit, address register, and data register, all transmitting information to the CPU in the same way. Ideally, if program segments or data blocks are continuously accessed in main memory, the access speed of main memory will be greatly improved.
Insert image description here

3.6 Cache Memory (Key Points)#

3.6.1 Basic Principles of Cache#

1. Function of Cache#

  • Cache is an important technology adopted to solve the speed mismatch problem between the CPU and main memory.
  • It is a small capacity high-speed buffer memory located between the CPU and main memory.
  • Based on the locality principle of program access.
  • It can quickly provide instructions and data to the CPU, thereby accelerating the execution speed of the program.
  • To pursue high speed, all functions, including management, are implemented by hardware.
Locality Principle of Program Access#

In a short time interval, the program frequently accesses memory addresses within a local range while accessing addresses outside that range very rarely, which is called the locality of the program.
Generally, cache is made of high-speed SRAM, which is more expensive than main memory, but because its capacity is much smaller than main memory, it can better solve the conflict between speed and price.

2. Basic Principles of Cache#

  • The design basis of cache: The data accessed by the CPU this time is likely to also be nearby data in the next access. (Locality of program access)
  • Data exchange between the CPU and Cache is word-oriented.
  • Data exchange between main memory and Cache is block-oriented.
  • When the CPU reads a word from memory, it sends the memory address of that word to both Cache and main memory. At this time, the Cache control logic determines based on the address whether this word is currently in Cache. If it is, this word is immediately sent to the CPU; if not, it reads this word from main memory using the main memory read cycle and sends it to the CPU, while also reading the entire data block containing this word from main memory into Cache.

In the figure below, the cache is divided into 4 rows, each containing 4 words. The addresses allocated to cache are kept in a content addressable memory (CAM), a memory that is addressed by content. When the CPU executes a memory access instruction, it sends the address of the word to be accessed to both the CAM and main memory. The address sent to the CAM is compared by content; if the word is not in cache, it is fetched from main memory and sent to the CPU, and at the same time the entire row (the four words) containing that word is loaded into cache.
Insert image description here
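The fetch flow just described can be sketched as a toy simulation. The data structures are hypothetical (a real cache also has a bounded number of rows, tag comparison hardware, and a replacement policy); the sketch only shows the miss-loads-whole-block behaviour.

```python
# Toy sketch: on a miss, the whole block containing the requested word
# is brought into cache; later words from that block hit.
BLOCK_SIZE = 4                   # words per block/row, as in the 4-word-row figure

main_memory = list(range(64))    # word i holds value i, for illustration
cache = {}                       # block number -> list of words (tag + data store)

def read_word(addr: int) -> int:
    block, offset = divmod(addr, BLOCK_SIZE)
    if block not in cache:       # miss: load the entire block from main memory
        start = block * BLOCK_SIZE
        cache[block] = main_memory[start:start + BLOCK_SIZE]
    return cache[block][offset]  # hit path: word delivered straight from cache

assert read_word(13) == 13       # miss: loads block 3 (words 12..15)
assert read_word(14) == 14       # hit in the freshly loaded block
```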

3. Structure of Cache#

  • The data block of Cache is called a row, denoted as L~i~, where i=0, 1, … , m-1.
  • The data block of main memory is called a block, denoted as B~j~, where j=0, 1, … , n-1.
  • Rows and blocks are of equal length, and each row (block) contains k main memory words.
  • Cache consists of data memory and tag memory.
    • Data memory: Stores the data of one data block from main memory.
    • Tag memory: Stores the address information of the data in main memory.
      Insert image description here

4. Hit and Miss#

Hit:

  • Main memory block is loaded into cache.
  • Main memory block and cache block establish a corresponding relationship.
  • A tag records the main memory block number that has established a corresponding relationship with a certain cache block.

Miss:

  • Main memory block is not loaded into cache.
  • Main memory block and cache block have not established a corresponding relationship.

Hit Rate:

  • From the perspective of the CPU, the purpose of adding a cache is to make the average read time of main memory as close as possible to the read time of cache in terms of performance.
  • To achieve this goal, the portion of all memory accesses that cache satisfies for the CPU should occupy a high proportion, meaning the hit rate of cache should be close to 1.
  • During the execution of a program, let Nc be the total number of accesses satisfied by cache and Nm the total number satisfied by main memory; the hit rate h is then defined as h = Nc / (Nc + Nm).
  • If Tc is the cache access time on a hit and Tm the main memory access time on a miss, then the average access time Ta of the cache/main memory system is: $T_a = h \cdot T_c + (1-h) \cdot T_m$
  • Our goal is to make the average access time T~a~ of the cache/main memory system as close to T~c~ as possible with a small hardware cost.
  • Let r denote the ratio by which main memory is slower than cache, $r = \frac{T_m}{T_c}$, and let e denote the access efficiency. Then:

$$e = \frac{T_c}{T_a} = \frac{T_c}{h \cdot T_c + (1-h) \cdot T_m} = \frac{1}{h + (1-h) \cdot r} = \frac{1}{r + (1-r) \cdot h}$$

- From this expression, to improve access efficiency, **the hit rate h should be as close to 1 as possible, and the r value should ideally be between 5 and 10**, not too large.
- The hit rate h is related to the behavior of the program, the capacity of the cache, the organization method, and the block size.

Example 6: When the CPU executes a program, 1900 accesses are completed by cache and 100 accesses are completed by main memory. Given that the cache access cycle is 50 ns and the main memory access cycle is 250 ns, calculate the efficiency and average access time of the cache/main memory system.

Solution:
h = Nc / (Nc + Nm) = 1900/(1900+100) = 0.95
r = Tm / Tc = 250 ns / 50 ns = 5
e = 1/(r+(1-r)h) = 1/(5+(1-5)×0.95) = 83.3%
e = Tc / Ta, so Ta = Tc / e = 50 ns / 0.833 = 60 ns

### 3.6.2 Address Mapping between Main Memory and Cache

- Compared to main memory, **the capacity of cache is very small, and the content it holds is only a subset of the content of main memory**; data exchange between cache and main memory is done in blocks.
- To place a main memory block into cache, some method must be applied to locate main memory addresses in cache, which is called **address mapping**.
- Whichever mapping method is chosen, both main memory and cache must be divided into blocks of the same size.
- When choosing a mapping method, consider:
  - Whether the hardware is easy to implement.
  - Whether address transformation is fast.
  - Whether the utilization of main memory space is high.
  - The probability of conflict when a block is loaded into cache.
- Below we introduce three mapping methods: **fully associative mapping, direct mapping, and set associative mapping**.
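Before turning to the mapping methods, the hit-rate, efficiency, and average-access-time formulas from 3.6.1 can be checked numerically; this sketch reuses Example 6's numbers.

```python
# Checking the formulas with Example 6's numbers.
Nc, Nm = 1900, 100          # accesses completed by cache / by main memory
Tc, Tm = 50e-9, 250e-9      # access cycles, in seconds

h = Nc / (Nc + Nm)          # hit rate (0.95)
r = Tm / Tc                 # speed ratio (≈ 5)
e = 1 / (r + (1 - r) * h)   # access efficiency (≈ 0.833)
Ta = h * Tc + (1 - h) * Tm  # average access time (≈ 60 ns)

# The two routes to Ta agree: Ta = h*Tc + (1-h)*Tm and Ta = Tc / e.
assert abs(Ta - Tc / e) < 1e-15
```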
#### Fully Associative Mapping

The address (block number) of a main memory block and its content (words) are stored together in a cache row, with the block address held in the tag field of the row. Any block in main memory can be mapped to any row in cache.

![Insert image description here](https://img-blog.csdnimg.cn/20210615202323953.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

- The CPU memory access instruction specifies a memory address (composed of block number and word).
- For fast retrieval, the block number in the main memory address is compared simultaneously with the tags of all rows in cache by a comparator. If one matches, it is a hit, and the word is read from cache; on a miss, the word is read from main memory according to the memory address.
- Conversion formulas:
  - Length of main memory address = (s+w) bits.
  - Number of addressable units = 2^(s+w) words or bytes.
  - Block size = row size = 2^w words or bytes.
  - Number of blocks in main memory = 2^s.
  - Tag size = s bits.
  - Number of rows in cache = not determined by the address format.
- Advantages: a block from main memory can be copied to any row in cache, so it is **very flexible**, with **high utilization of cache space** and a **low probability of block conflict**.
- Disadvantages: **the comparator is difficult to implement**, requiring a fast and expensive associative memory; **only suitable for small-capacity caches**.

#### Direct Mapping

- A **many-to-one** mapping relationship.
- **A block from main memory can only be copied to one specific row position in cache.**
- The row number i of cache and the block number j of main memory satisfy:
  - i = j mod m (m is the total number of rows in cache).
  - The content of the j-th block of main memory is copied to the i-th row of cache.
![Insert image description here](https://img-blog.csdnimg.cn/20210615202913451.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

##### Retrieval Process of Direct Mapping

- Use the row-number field of the address to select the corresponding row.
- Compare the row's tag with the tag field of the CPU access address. If they match, it is a hit, and cache is accessed; on a miss, main memory is accessed and the corresponding block is written into cache.

![Insert image description here](https://img-blog.csdnimg.cn/20210616090434170.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

**Conversion formulas**:

- Length of main memory address = (s+w) bits.
- Number of addressable units = 2^(s+w) words or bytes.
- Block size = row size = 2^w words or bytes.
- Number of blocks in main memory = 2^s.
- Number of rows in cache = m = 2^r.
- Tag size = (s-r) bits.

Advantages: far fewer comparison circuits are needed (m times fewer), so **hardware implementation is simple**; the cache address is simply the low-order bits of the main memory address, requiring no transformation.
Disadvantages: **high probability of conflict**.
Application scenario: suitable for **large-capacity caches**.

#### Set Associative Mapping

The advantages and disadvantages of fully associative mapping and direct mapping are exactly opposite. In terms of flexibility of block placement and hit rate, the former is superior; in terms of comparator-circuit simplicity and hardware cost, the latter is better.

- Cache is divided into u groups (sets), each with v rows.
- The mapping between main memory blocks and cache groups is direct, meaning **which group a main memory block goes into is fixed**; within a group, the mapping to rows is fully associative, meaning **a main memory block can be placed in any row of its fixed group**.
- The relationship between the group number q of cache and the block number j of main memory is:
  - q = j mod u (u is the total number of groups in cache).
  - The content of the j-th block of main memory is copied to some row of the q-th group of cache.
- Address transformation:
  - To check whether main memory address x is in cache, first compute y = x mod u, then search once within group y.

Set associative mapping is easier to implement than fully associative mapping and has a lower conflict probability than direct mapping. If v = 1, it degenerates into direct mapping; if u = 1, it becomes fully associative mapping. The value of v is generally small, typically a power of 2 (typical values are 2, 4, 8, 16); such a cache is called a v-way set associative cache.

![Insert image description here](https://img-blog.csdnimg.cn/20210616091621259.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

**Conversion formulas**:

- Length of main memory address = (s+w) bits.
- Number of addressable units = 2^(s+w) words or bytes.
- Block size = row size = 2^w words or bytes.
- Number of blocks in main memory = 2^s.
- Number of rows per group = v.
- Number of groups u = 2^d.
- Number of rows in cache = u×v.
- Tag size = (s-d) bits.

Example 6: A set associative cache has 64 rows, 4 rows per group; main memory contains 4K blocks, each of 128 words. Give the format of the memory address.

Solution:
Block size = row size = 2^w words = 128 = 2^7, so w = 7.
Rows per group v = 4; rows in cache = u×v = 64, so the number of groups u = 16.
u = 2^d = 16 = 2^4, so d = 4.
Number of blocks in main memory = 2^s = 4K = 2^2 × 2^10 = 2^12, so s = 12.
The address format is:

| Tag (s-d) | Group Number (d) | Word Number (w) |
|--|--|--|
| 8 bits | 4 bits | 7 bits |

### 3.6.3 Replacement Strategies

- The working principle of cache requires that the most recently useful data be kept in cache as much as possible.
- When a new memory block needs to be copied into cache but all the positions allowed for that block are occupied by other main memory blocks, a replacement is needed.
  - In direct mapping: **replace directly** (there is only one candidate row).
  - In fully associative and set associative mapping: one row is selected from the several candidate rows that may hold the new main memory block.
- Commonly used replacement algorithms:
  - Least Frequently Used (LFU) algorithm.
  - Least Recently Used (LRU) algorithm.
  - Random replacement.

#### Least Frequently Used (LFU) Algorithm

- Replace **the row that has been accessed the fewest times within a certain period**.
- **Each row has a counter.** A newly loaded row starts counting from 0, and each access increments the accessed row's counter by 1. When a replacement is needed, the row with the smallest count is replaced, and the counters of the candidate rows are reset to zero.
- The counting period of this algorithm is limited to the interval between two replacements of the rows in question, so **it cannot reflect the most recent access pattern of the cache**.

#### Least Recently Used (LRU) Algorithm

- Replace **the row that has gone unaccessed for the longest time**.
- Each row again has a counter; on each cache hit, the hit row's counter is reset to zero while the counters of the other rows are incremented by 1. When a replacement is needed, the row with the largest count is replaced.
- This algorithm protects data rows that have just been copied into cache, giving **a higher hit rate**.

#### Random Replacement

- **Randomly select a row to replace** from the candidate row positions.
- This strategy is **easy to implement in hardware** and **faster than the two strategies above**.
- Its drawback is that a randomly evicted row may be needed again soon, which **lowers the hit rate and cache efficiency**. This drawback diminishes as cache capacity increases.
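The counter-based LRU scheme described above can be sketched for a single fully associative set; the class structure is illustrative, not a definitive implementation.

```python
# Sketch of counter-based LRU for one set of `ways` rows:
# on a hit the hit row's counter resets to 0 and the others age by 1;
# on a miss the row with the largest counter (the LRU row) is replaced.
class LRUSet:
    def __init__(self, ways: int):
        self.tags = [None] * ways   # None marks an empty (invalid) row
        self.count = [0] * ways

    def access(self, tag) -> bool:
        """Return True on hit; on a miss, replace the LRU row."""
        if tag in self.tags:
            i = self.tags.index(tag)            # hit: this row becomes newest
            hit = True
        else:
            i = max(range(len(self.tags)),      # victim: empty rows first,
                    key=lambda k: (self.tags[k] is None, self.count[k]))
            self.tags[i] = tag
            hit = False
        for k in range(len(self.count)):        # age every row ...
            self.count[k] += 1
        self.count[i] = 0                       # ... except the one just used
        return hit

s = LRUSet(2)
# "B" is evicted by "C" (A was touched more recently), so "B" misses again.
assert [s.access(t) for t in ["A", "B", "A", "C", "B"]] == [False, False, True, False, False]
```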
## 3.7 Virtual Memory (Key Points)

### 3.7.1 Basic Concepts of Virtual Memory

#### 1. Real Address and Virtual Address

- The address used **when writing programs** is called a **virtual address or logical address**, and the corresponding storage space is called the **virtual memory space or logical address space**.
- The access address of **physical memory** is called a **real address or physical address**, and the corresponding storage space is called the **physical storage space or main memory space**.
- The process of **converting a program's virtual addresses into real addresses is called program relocation**.

#### 2. Access Process of Virtual Memory

- User programs are written against virtual addresses in the virtual memory space and are stored in **auxiliary storage**. When a program runs, **the address translation mechanism** loads part of the program into real memory, based on the real address space allocated to that program.
- On each memory access, it is first checked whether the part corresponding to the virtual address is in real memory:
  - If it is, address translation is performed and main memory is accessed with the real address;
  - Otherwise, a part of the program is brought in from auxiliary storage according to some algorithm, and main memory is then accessed in the same way.
- Thus the virtual address space of a program can be much larger than the real address space, or much smaller.
  - The former aims to **enlarge the available storage capacity**, while the latter aims at **address transformation**.
  - The latter usually occurs in multi-user or multitasking systems: the real memory space is large, but a single task does not need a large address space, and a smaller virtual memory space can **shorten the instruction address field**.
- Each program can thus have a virtual memory with the **capacity of auxiliary storage** and an **access speed close to that of main memory**.
However, this virtual memory is a conceptual model composed of main memory, auxiliary storage, and auxiliary storage management components, not an actual physical memory. Virtual memory is realized by adding hardware and software on top of main memory and auxiliary storage.

#### 3. Similarities and Differences between Cache and Virtual Memory

- From the concept of virtual memory, the access mechanism between main memory and auxiliary storage is similar to that between cache and main memory. These are two levels of the three-level storage system composed of cache, main memory, and auxiliary storage.
- Auxiliary hardware and software handle address transformation and management between cache and main memory, and between main memory and auxiliary storage, so that the levels form an organic three-level storage system.
- **Cache and main memory constitute the system's memory, while main memory and auxiliary storage, supported by auxiliary software and hardware, form the virtual memory.**

In the three-level storage system, cache–main memory and main memory–auxiliary storage have much in common:

- **Same starting point**: both build a layered storage system to **improve the performance-price ratio of the storage system**, striving to bring the system's performance close to that of the fast memory while keeping its price and capacity close to those of the slow memory.
- **Same principle**: both exploit the **locality principle of program execution** to move the most recently used blocks of information from slower, larger storage into faster, smaller storage.
However, the two storage levels of cache-main memory and main memory-auxiliary storage also differ in many ways:

- **Different focus**: **Cache mainly solves the speed gap between main memory and the CPU**; in terms of improving the performance-price ratio, **virtual memory mainly addresses storage capacity**, as well as storage management, main memory allocation, and storage protection.
- **Different data paths**: **The CPU has direct access paths to both cache and main memory**, and on a cache miss it can access main memory directly; **there is no direct data path between auxiliary storage and the CPU**, so a main memory miss can only be resolved through page swapping, with the CPU ultimately accessing main memory.
- **Different transparency**: **Cache management is handled entirely by hardware and is transparent to both system programmers and application programmers**; virtual memory management is handled jointly by software (the operating system) and hardware. Because software is involved, **virtual memory is not transparent to the system programmers who implement storage management, but it is transparent to application programmers** (segment and page management are "semi-transparent" to application programmers).
- **Different miss penalties**: The access time of main memory is 5 to 10 times that of cache, while **the access speed of main memory is usually thousands of times that of auxiliary storage**, so **the performance loss on a main memory miss is far greater than on a cache miss**.

#### 4. Key Issues the Virtual Memory Mechanism Must Solve

- **Scheduling issues**: deciding which programs and data should be loaded into main memory.
- **Address mapping issues**: transforming virtual addresses into physical addresses of main memory (**internal address transformation**) and transforming virtual addresses into auxiliary storage addresses (**external address transformation**) for page swapping. This must also cover main memory allocation, storage protection, and program relocation.
- **Replacement issues**: deciding which programs and data should be swapped out of main memory.
- **Update issues**: ensuring **consistency between main memory and auxiliary storage**.

Under the control of the operating system, hardware and system software solve the above problems for the user, greatly simplifying application programming.

### 3.7.2 Page-Based Virtual Memory

#### 1. Page-Based Virtual Memory Address Mapping

- In a page-based virtual storage system, the virtual address space is divided into equal-sized pages called **logical pages**; the main memory space is divided into pages of the same size called **physical pages**.
- The virtual address is divided into two fields: **the high field is the logical page number, the low field is the page offset**. The physical address is likewise divided into two fields: **the high field is the physical page number, the low field is the page offset.**
- A **page table** converts virtual (logical) addresses into physical addresses. The address mapping process of a page-based virtual storage system is shown in the figure below.

![Insert image description here](https://img-blog.csdnimg.cn/20210617193837358.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

- In most systems, each process has its own page table.
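The page-table lookup just described can be sketched in a few lines. This is a minimal illustration only: the 4 KiB page size, the table contents, and the `translate` helper are all invented for the example, not taken from the text.

```python
# Minimal sketch of page-based address translation.
# Illustrative values: 4 KiB pages, so the page offset field is 12 bits.
PAGE_SIZE = 4096

# page_table[logical_page_number] = (valid_bit, physical_page_number)
page_table = [
    (1, 5),   # logical page 0 -> physical page 5
    (0, 0),   # logical page 1 not loaded into main memory
    (1, 9),   # logical page 2 -> physical page 9
]

def translate(virtual_addr: int) -> int:
    """Split the virtual address, index the page table with the logical
    page number, and concatenate physical page number with page offset."""
    vpn = virtual_addr // PAGE_SIZE      # high field: logical page number
    offset = virtual_addr % PAGE_SIZE    # low field: page offset
    valid, ppn = page_table[vpn]
    if not valid:
        # Missing page: the OS would now bring it in from auxiliary storage.
        raise RuntimeError(f"page fault on logical page {vpn}")
    return ppn * PAGE_SIZE + offset

print(hex(translate(0x2ABC)))  # logical page 2, offset 0xABC -> 0x9abc
```

Note how the page offset passes through unchanged; only the high field is rewritten, which is exactly why page sizes are powers of 2.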
Each entry in the page table corresponds to one virtual page and contains the main memory page address (physical page number) where that virtual page resides, as well as a valid bit indicating whether the logical page has been loaded into main memory.

- During address transformation, **the logical page number is used to index the page table (treating the virtual page number as the subscript of the page table array) and find the corresponding physical page number, which becomes the high field of the physical address and is concatenated with the page offset of the virtual address** to form the complete physical address.
- The number of pages each process needs is not fixed, so the page table length varies. A common implementation therefore keeps the base address of the page table in a register while the page table itself resides in main memory. Since the virtual address space can be very large, each process's page table can be very long. To save the main memory occupied by the page table itself, some systems keep page tables in virtual memory, so the page table itself must also be paged.
- While a process runs, part of its page table is in main memory and the rest stays in auxiliary storage.
- Other systems adopt a **two-level page table** structure: each process has a page directory table, each entry of which points to a page table. **If the page directory table has m entries and each page table has at most n entries, a process can have at most m × n pages.**
- In systems with long page tables, an inverted page table can also be used to map in reverse from physical page numbers to logical page numbers.

#### 2. Translation Lookaside Buffer (TLB)

- Since page tables normally reside in main memory, even when the logical page is already in main memory, one memory access requires at least two accesses to physical memory, doubling the access time of virtual memory.
- To avoid extra accesses to main memory, the page table itself can be cached: the most active parts of the page table are kept in high-speed memory, forming the **TLB (fast table)**.
- The complete page table stored in main memory is correspondingly called the **slow table**.
- The address mapping process with a TLB is shown in the figure.

![Insert image description here](https://img-blog.csdnimg.cn/20210617194507114.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

#### 3. Internal Page Table and External Page Table

- The page table that **transforms virtual addresses into main memory physical addresses** is usually called the internal page table.
- The **external page table** is used for **transformations between virtual addresses and auxiliary storage addresses**.
- On a page fault, the page-swapping operation must first locate the page in auxiliary storage, so the structure of the external page table is closely tied to the addressing mechanism of the auxiliary storage.
- For a disk, for example, the auxiliary storage address includes the drive number, head number, track number, and sector number.

The main advantages of page-based storage are:

- High utilization of main memory.
- A relatively simple page table.
- Fast address transformation.
- Easier disk management.

Main disadvantages:

- Pages bear no relation to the program's logical modules, so modularity is poor.
- Long page tables, which consume a large amount of storage space.

### 3.7.3 Segment-Based Virtual Memory and Segment-Page-Based Virtual Memory

#### 1. Segment-Based Virtual Memory

- **Segments are variable-length areas divided along the natural boundaries of the program.**
- Programmers place different kinds of objects, such as subroutines, operands, and constants, into different segments, and each program can have multiple segments of the same type.
- In a segment-based virtual storage system, **the virtual address consists of a segment number and an offset within the segment**. The transformation from virtual address to real main memory address is done through a **segment table**.
- Each program has a segment table, and each entry of the segment table corresponds to one segment, containing at least the following three fields:
  - **Valid bit**: indicates whether the segment has been loaded into real memory.
  - **Segment base address**: the starting address of the segment in real memory, once loaded.
  - **Segment length**: records the actual length of the segment. The segment length field ensures that the offset used when accessing a segment does not exceed the segment length, preventing an address overflow from corrupting other segments.

The segment table itself is also a segment; it can reside in auxiliary storage but is generally kept resident in main memory. The transformation from a segment-based virtual address to a real memory address is shown in the figure.

![Insert image description here](https://img-blog.csdnimg.cn/20210617195345818.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

Segment-based virtual memory has several advantages:

- ① The **logical independence** of segments makes them easy to compile, manage, modify, and protect, and also **facilitates sharing among multiple programs**.
- ② **Segment lengths can change dynamically as needed**, allowing flexible scheduling that uses main memory space effectively.

Disadvantages:

- ① Because segment lengths are not fixed, **allocating main memory space is more complicated**.
- ② Many **external fragments** tend to be left between segments, reducing the utilization of storage space.
- ③ Since segment lengths are not necessarily powers of 2, the lowest bits of the virtual and real addresses cannot simply serve as the offset within the segment; the physical address must be computed by adding the segment base address and the offset. Segment-based storage management therefore requires **more hardware support** than page-based storage management.

#### 2. Segment-Page-Based Virtual Memory

- Segment-page-based virtual memory combines segment-based and page-based virtual memory. **Main memory is divided into pages. Each program is first divided into segments according to its logical structure, and each segment is then divided into pages matching the main memory page size**, so programs are loaded and unloaded by pages while programming, protection, and sharing are done by segments.

[Example 1] Suppose there are three programs, identified by base numbers A, B, and C, whose base register contents are S~A~, S~B~, and S~C~ respectively. Program A consists of 4 segments and program C consists of 3 segments. The transformation from logical address to physical address in a segment-page-based virtual storage system is shown in the figure. In main memory, each program has a segment table; program A has 4 segments and program C has 3 segments, each segment with its own page table. Each row of the segment table gives the starting position of the corresponding page table, and each row of a page table gives a physical page number.
Please explain the process of transforming virtual addresses to physical addresses.

![Insert image description here](https://img-blog.csdnimg.cn/20210617200047485.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1ODkwNTMz,size_16,color_FFFFFF,t_70)

Solution: The address transformation proceeds as follows:

(1) Using base number C, the storage management unit looks up the base register table to obtain the segment table base address S~C~ of program C. From the S-th entry of program C's segment table (here S = 1), it finds the starting address of the page table for segment S.
(2) The logical page number P (here P = 2) indexes that page table, yielding the physical page number (10 in the figure).
(3) The physical page number is concatenated with the page offset to obtain the physical address.

If the computer system has only one base register, the base number can be omitted; during multiprogram switching, the operating system modifies the contents of the base register. Evidently, the disadvantage of segment-page-based virtual memory is that mapping a virtual address to a main memory address requires multiple table lookups, making it more complex to implement.

### 3.7.4 Replacement Algorithms for Virtual Memory

When pages are brought from auxiliary storage into a main memory that is already full, pages in main memory must be replaced. The replacement algorithms for virtual memory are similar to those for cache: FIFO, LRU, LFU, and so on. They differ from cache replacement algorithms as follows:

- (1) Cache replacement is implemented entirely in hardware, while virtual memory replacement has the support of the operating system.
- (2) A page fault in virtual memory hurts system performance far more than a cache miss, since page swapping requires accessing auxiliary storage and involves a task switch.
- (3) Page replacement in virtual memory has a wide range of choices: any page belonging to the process can be replaced.

## Summary of This Chapter

- The requirements on memory are large capacity, fast speed, and low cost. To resolve the contradictions among these three, computers adopt a **multi-level storage architecture**: cache, main memory, and external storage. The CPU can directly access internal memory (cache, main memory) but cannot directly access external storage. **The technical indicators of memory** include storage capacity, access time, storage cycle, and memory bandwidth.
- The widely used **SRAM and DRAM** are both semiconductor random read/write memories. The former is faster than the latter but has lower integration density. Both are small, reliable, and inexpensive, but lose their contents on power loss.
- **Read-only memory and flash memory** compensate for exactly this shortcoming of SRAM and DRAM, retaining written data after power loss. Flash memory in particular offers high performance, low power consumption, high reliability, and mobility, representing a newer storage architecture.
- **Dual-port memory and multi-module interleaved memory** are parallel memory structures: the former uses spatial parallelism, the latter temporal parallelism. Both are widely used in research and engineering.
- Cache is a high-speed buffer memory, an important hardware technique for bridging the speed gap between the CPU and main memory; it has developed into multi-level cache systems with separate instruction and data caches. The cache hit rate should be close to 1.
- The address mapping between main memory and cache has three methods: **fully associative, direct, and set associative**. Set associative mapping is a compromise between the other two, moderately combining their advantages while avoiding their disadvantages; it is attractive in flexibility, hit rate, and hardware cost, and is therefore widely adopted.
- User programs are written with virtual addresses (logical addresses) and stored in auxiliary storage. When a program runs, the address transformation mechanism loads part of the program into real memory (the physical storage space, or main memory space) according to the real address space allocated to that program. The operating system, with hardware support, transforms the program's virtual addresses into real addresses, a process known as program relocation. **Each memory access first checks whether the part corresponding to the virtual address is in real memory: if so, address transformation is performed and main memory is accessed with the real address; otherwise, part of the program in auxiliary storage is brought into memory according to a certain algorithm, and main memory is then accessed in the same way.** For application programs, if the main memory hit rate is high, the access time of virtual memory approaches that of main memory, while the size of virtual memory depends only on the size of auxiliary storage.
- The virtual memory mechanism must also solve several key issues, including **scheduling, address mapping, and replacement**. Under the control of the operating system, hardware and system software solve these problems for the user, greatly simplifying application programming.
- In a **page-based virtual storage system**, both the virtual address space and the main memory space are divided into equal-sized pages, and a page table converts virtual addresses into physical addresses. To avoid extra accesses to main memory, the page table itself can be cached, with its most active parts kept in the Translation Lookaside Buffer (TLB).
- The disadvantage of paging is that the page length bears no relation to the logical structure of the program, while segmentation divides memory into variable-length areas along the program's natural boundaries. In a **segment-based virtual storage system**, the virtual address consists of a segment number and an offset within the segment, and the transformation from virtual address to real main memory address goes through a **segment table**.
- **Segment-page-based virtual memory** combines segment-based and page-based virtual memory, loading and unloading programs by pages while programming, protecting, and sharing by segments. Virtual memory must also address storage protection and related issues: virtual storage systems typically use page table protection, segment table protection, and key-based protection to protect storage areas, and access-mode protection can be applied in combination with how the information in main memory is used.
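As a concrete illustration of the LRU policy named in Section 3.7.4, here is a minimal page-replacement sketch. The `LRUPager` class and the reference trace are invented for illustration; real systems only approximate LRU with hardware-maintained use bits and OS bookkeeping.

```python
from collections import OrderedDict

class LRUPager:
    """Toy LRU page replacement: main memory holds `capacity` physical
    pages; a reference either hits or evicts the least recently used page."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.frames = OrderedDict()  # insertion order tracks recency of use

    def reference(self, page: int) -> bool:
        """Return True on a hit, False on a page fault."""
        if page in self.frames:
            self.frames.move_to_end(page)  # mark as most recently used
            return True
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)  # evict the LRU page
        self.frames[page] = None  # "load" the page from auxiliary storage
        return False

pager = LRUPager(capacity=3)
hits = sum(pager.reference(p) for p in [1, 2, 3, 1, 4, 1, 2])
print(hits)  # -> 2: only the two re-references to page 1 hit
```

Tracing the example: after 1, 2, 3 fault in, re-referencing 1 hits; referencing 4 evicts page 2 (least recently used), so the final reference to 2 faults again, which is why pure FIFO and LRU can differ on the same trace.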