
Memory management and page mapping in Linux

Published: 2023/11/29

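1 The simplest point first: this is what memory looks like to a running process. The region above 3 GB belongs to the kernel, which uses it for its own image and for dynamically allocated memory. Note that because this is the process's view, every address below is a virtual address.
[Figure not preserved: the virtual address space as seen by a process]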

2 Next, the Linux physical memory layout
The following passage explains why Linux cannot use all of the RAM:

Why isn't the kernel loaded starting with the first available megabyte of RAM? Well, the PC architecture has
several peculiarities that must be taken into account.

For example:
1 Page frame 0 is used by BIOS to store the system hardware configuration detected during the
Power-On Self-Test (POST); the BIOS of many laptops, moreover, writes data on this page frame
even after the system is initialized.

2 Physical addresses ranging from 0x000a0000 to 0x000fffff are usually reserved to BIOS
routines and to map the internal memory of ISA graphics cards. This area is the well-known hole from
640 KB to 1 MB in all IBM-compatible PCs: the physical addresses exist but they are reserved, and
the corresponding page frames cannot be used by the operating system.

3 Additional page frames within the first megabyte may be reserved by specific computer models. For
example, the IBM ThinkPad maps the 0xa0 page frame into the 0x9f one.

In short: the first 1 MB of memory holds BIOS data and other hardware information, so the kernel code starts at physical address 1 MB.
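The reserved frames named in the quotation can be expressed as a small check on page frame numbers. This is only an illustrative sketch assuming 4 KB pages; the helper names are hypothetical, and a real kernel learns the reserved ranges from the firmware memory map rather than from hard-coded constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4 KB pages, as on 32-bit x86 */

/* Page frame number that contains a physical address. */
static uint32_t phys_to_pfn(uint32_t phys)
{
    return phys >> PAGE_SHIFT;
}

/* True if the frame is one of the reserved frames named in the quotation:
 * frame 0 (BIOS data) or the 640 KB - 1 MB hole (0x000a0000-0x000fffff).
 * Hypothetical helper, valid only for frames in the first megabyte. */
static bool low_frame_is_reserved(uint32_t pfn)
{
    assert(pfn < 0x100);
    return pfn == 0 || (pfn >= 0xa0 && pfn <= 0xff);
}
```

The hole from 0x000a0000 to 0x000fffff corresponds to page frames 0xa0 through 0xff, which is why the ThinkPad example in the quotation speaks of the 0xa0 and 0x9f frames.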

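Ignoring virtual addresses, that is, before page tables come into play, the kernel occupies physical memory as shown in the following figure:
[Figure not preserved: physical memory occupied by the kernel image]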


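The meaning of each section in the figure is self-explanatory. The kernel occupies the physical range [_text, _end].
The exact values are not important, because the position and size vary with the architecture and with the kernel build.
For example, in the System.map file produced by building my linux-2.6.38.8 kernel, the addresses of _text and _end are: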

  • 0xc0400000 --- _text
  • 0xc0cc5000 --- _end

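Note first that these are virtual addresses, valid once the kernel is using page tables.
_text lies in virtual memory at 3 GB + 4 MB.
_end lies in virtual memory at 3 GB + a little over 12 MB.

This shows that my kernel build is somewhat larger than the kernel in the figure above.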
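The arithmetic behind these offsets can be checked with a short sketch. It assumes the usual 3 GB/1 GB split, so that the direct mapping starts at PAGE_OFFSET = 0xC0000000; the helper names are hypothetical and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u  /* 3 GB: start of the kernel's linear mapping (assumed split) */

/* Physical address behind a directly mapped kernel virtual address. */
static uint32_t virt_to_phys_direct(uint32_t vaddr)
{
    assert(vaddr >= PAGE_OFFSET);
    return vaddr - PAGE_OFFSET;
}

/* Size in bytes of a kernel image spanning [text, end). */
static uint32_t image_size(uint32_t text, uint32_t end)
{
    assert(end >= text);
    return end - text;
}
```

With the System.map values above, virt_to_phys_direct(0xc0400000) gives 0x00400000 (4 MB), virt_to_phys_direct(0xc0cc5000) gives 0x00cc5000 (about 12.8 MB), and image_size(0xc0400000, 0xc0cc5000) gives 0x8c5000 bytes, roughly 8.8 MB.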

3 Kernel Page Tables
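Once the kernel has been loaded and initialized it runs in protected mode, so before going further you need to understand protected mode and how Linux uses page tables. The figure shows the form of a Linux page table; every process has its own page table, and so does the kernel:
[Figure not preserved: structure of a Linux page table]
What is the relationship between a process's page table and the kernel's? See this quotation: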

1 The kernel maintains a set of page tables for its own use, rooted at a so-called master kernel Page Global Directory.

2 After system initialization, this set of page tables is never directly used by any process or kernel thread;

3 rather, the highest entries of the master kernel Page Global Directory are the reference model for the corresponding entries of the Page Global Directories of every regular process in the system.

Statement 3 deserves repeating for emphasis:

the highest entries of the master kernel Page Global Directory are
the reference model for the corresponding entries of the Page Global Directories of
every regular process in the system.
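One way to picture "reference model" is as a copy of the top entries of the master directory into each process directory. The following user-space sketch assumes a non-PAE 32-bit x86 layout (1024 directory entries of 4 MB each, kernel space starting at 3 GB); the type and function names are hypothetical, not the kernel's own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTRS_PER_PGD     1024          /* entries in a non-PAE x86 page directory */
#define PGDIR_SHIFT      22            /* each entry covers 4 MB */
#define PAGE_OFFSET      0xC0000000u   /* kernel space starts at 3 GB (assumed split) */
#define KERNEL_PGD_START (PAGE_OFFSET >> PGDIR_SHIFT)   /* entry 768 */

typedef uint32_t pgd_entry;            /* stand-in for a page directory entry */

/* Copy the kernel part (entries 768..1023) of the master directory into a
 * process directory; the user part (entries 0..767) is left untouched. */
static void copy_kernel_pgd(pgd_entry *process_pgd, const pgd_entry *master_pgd)
{
    assert(process_pgd && master_pgd);
    memcpy(process_pgd + KERNEL_PGD_START,
           master_pgd + KERNEL_PGD_START,
           (PTRS_PER_PGD - KERNEL_PGD_START) * sizeof(pgd_entry));
}
```

Because every process directory carries the same top entries, kernel addresses resolve identically no matter which process is current.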

--------------------------
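4 With page tables in the picture, the discussion splits into two questions.
First: what does the kernel page table map? Second: how does the kernel set up these mappings?

First: what does the kernel page table map? This is the page table that the kernel uses while it is running.

In order: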

  • Physical memory mapping ---- the most basic memory mapping.
    • Assume first that RAM is between 0 and 896 MB (1 GB - 128 MB). During initialization, physical addresses 0x0 to 896 MB are mapped to linear addresses (3 GB + 0x0) to (3 GB + 896 MB). The addresses of kernel functions and variables are fixed at build time as virtual addresses above 3 GB, so the kernel assumes that it has 1 GB of virtual memory to work with; when page frames run short, pages are swapped out [swapping is complicated; for now take it as given, or simply assume that there is enough memory].
    • If RAM is actually larger than 896 MB, the high addresses are remapped dynamically when they are accessed [a later section discusses this].
  • Fix-mapped linear addresses ---- all I know so far is that this region can be mapped to any physical memory [I am not yet clear about what it is used for; setting it aside for now].
  • Persistent kernel mappings ----- Starting from PKMAP_BASE we find the linear addresses used for the persistent kernel mapping of high-memory page frames.
  • vmalloc area ----- Linux provides a mechanism via vmalloc() where physically non-contiguous memory can be used that is contiguous in virtual memory. [See the section on noncontiguous memory areas below.]
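The direct mapping in the first bullet is a constant offset, which a few lines of C make concrete. This is a sketch under the assumptions stated in the list (3 GB kernel base, 896 MB of directly mapped memory); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET  0xC0000000u   /* 3 GB */
#define LOWMEM_LIMIT (896u << 20)  /* 896 MB of directly mapped physical memory */

/* True if a physical address falls inside the directly mapped range. */
static bool phys_is_lowmem(uint32_t phys)
{
    return phys < LOWMEM_LIMIT;
}

/* Linear address of a low-memory physical address: a constant offset. */
static uint32_t lowmem_phys_to_linear(uint32_t phys)
{
    assert(phys_is_lowmem(phys));
    return phys + PAGE_OFFSET;
}
```

The last directly mapped page, at physical 0x37fff000, lands at linear 0xf7fff000, so the direct mapping ends at 3 GB + 896 MB = 0xf8000000. Physical addresses at or above 896 MB have no such fixed linear address, which is why the remaining regions in the list exist.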

------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------

Kernel Mappings of High-Memory page Frames
I want to use this dynamic kernel mapping to understand how linear addresses correspond to physical addresses, and how the kernel keeps track of physical page frames in both low memory and high memory.

1 A quoted passage explains why kernel mappings are necessary.

1 Where the end of the direct mapping is stored (it also appears in the figure above)
The linear address that corresponds to the end of the directly mapped physical memory, and thus to the
beginning of the high memory, is stored in the high_memory variable, which is set to 896 MB.

2 Page frames above the 896 MB boundary are not generally mapped in the fourth gigabyte of
the kernel linear address spaces, so the kernel is unable to directly access them.

3 This implies that each page allocator function that returns the linear address of the
assigned page frame doesn't work for high-memory page frames, that is, for page frames in
the ZONE_HIGHMEM memory zone.

So low memory is mapped from the start and never needs remapping. High memory is not covered by the kernel page tables, so a mapping has to be requested dynamically whenever a high-memory page is used.

2 The first method: permanent kernel mappings (the "persistent kernel mappings" region in the figure above)
The basic variables and data structures used for the mapping:

  • pkmap_page_table ------- stores the address of the Page Table used for permanent kernel mappings.
  • LAST_PKMAP ------ macro that yields the number of Page Table entries.
  • pkmap_count ------ an array in the kernel, declared as: int pkmap_count[LAST_PKMAP].
    The pkmap_count array includes LAST_PKMAP counters, one for each entry of the pkmap_page_table Page Table.
    Each counter is in one of three states:
    1 The counter is 0
    The corresponding Page Table entry does not map any high-memory page frame and is usable.

    2 The counter is 1
    The corresponding Page Table entry maps a high-memory page frame, but it cannot be
    used because the corresponding TLB entry has not been flushed since its last usage.
    In other words, the linear address is still mapped but no component is using it. It is an idle resource, and it is reclaimed when free entries run short.

    3 The counter is n (greater than 1)
    The corresponding Page Table entry maps a high-memory page frame, which is used by exactly n - 1
    kernel components.

  • page_address_htable ----- This table contains one page_address_map data structure for each page frame in high memory that is currently mapped.
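  • page_address_map ----- defined as follows:
    struct page_address_map {
        struct page *page;      /* descriptor of the mapped high-memory page frame */
        void *virtual;          /* linear address the page frame is mapped at */
        struct list_head list;  /* links the entries of page_address_htable */
    };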
  • page_address( ) function ----- returns the linear address associated with the page frame, or NULL if the
    page frame is in high memory and is not mapped.
  • struct page ----- State information of a page frame is kept in a page descriptor of type page. All page descriptors are stored in the mem_map array. That is, every page frame of physical memory has a corresponding struct page that is set up during kernel initialization, and the kernel manages page frames and records their state through these page descriptors, just as the basic information about a process is kept in struct task_struct. As the following passage says, struct page is the kernel's representation of each page of physical RAM:
    The kernel must keep track of the current status of each page frame. For instance, it must
    be able to distinguish the page frames that are used to contain pages that belong to
    processes from those that contain kernel code or kernel data structures. Similarly, it must
    be able to determine whether a page frame in dynamic memory is free. A page frame in
    dynamic memory is free if it does not contain any useful data. It is not free when the page
    frame contains data of a User Mode process, data of a software cache, dynamically
    allocated kernel data structures, buffered data of a device driver, code of a kernel module,
    and so on.
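The three pkmap_count states described in the list can be summarized as a small classifier. This is a sketch for illustration only: the enum and function names are hypothetical and do not exist in the kernel, which simply compares the counter against 0 and 1.

```c
#include <assert.h>

/* Hypothetical names for the three counter states described in the list. */
enum pkmap_state {
    PKMAP_FREE,    /* counter 0: entry maps nothing and is usable */
    PKMAP_STALE,   /* counter 1: no users, but the TLB has not been flushed yet */
    PKMAP_IN_USE   /* counter n > 1: used by n - 1 kernel components */
};

static enum pkmap_state pkmap_state_of(int counter)
{
    assert(counter >= 0);
    if (counter == 0)
        return PKMAP_FREE;
    if (counter == 1)
        return PKMAP_STALE;
    return PKMAP_IN_USE;
}

/* Number of kernel components currently using the entry. */
static int pkmap_users(int counter)
{
    assert(counter >= 0);
    return counter > 1 ? counter - 1 : 0;
}
```

The off-by-one in pkmap_users() reflects the convention in the text: a counter of 2 means exactly one user.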

First, this is how the kernel refers to a page:
Suppose the kernel is working with a struct page. When it needs the page's linear (virtual) address, it calls page_address(page), which returns that address, provided the page is in low memory, or is in high memory and already mapped.
For a low-memory page the linear address is computed as: __va((unsigned long)(page - mem_map) << 12).
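The low-memory formula is pointer arithmetic followed by a shift and an offset. The sketch below reproduces it in user space, assuming 4 KB pages and a 3 GB kernel base; struct page_sim is a stand-in for the kernel's struct page, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12            /* 4 KB pages */
#define PAGE_OFFSET 0xC0000000u   /* 3 GB (assumed split) */

struct page_sim { unsigned long flags; };  /* stand-in for the kernel's struct page */

/* Linear address of a low-memory page frame, mirroring
 * __va((unsigned long)(page - mem_map) << 12). */
static uint32_t lowmem_page_address(const struct page_sim *mem_map,
                                    const struct page_sim *page)
{
    assert(page >= mem_map);
    uint32_t pfn = (uint32_t)(page - mem_map);   /* index in mem_map = frame number */
    return (pfn << PAGE_SHIFT) + PAGE_OFFSET;    /* physical address, then __va() */
}
```

The index of a descriptor in mem_map is the page frame number, so descriptor 3 describes physical address 0x3000 and linear address 0xc0003000.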

The following pseudocode shows how the remapping is performed. I will not walk through it here; see the book Understanding the Linux Kernel for details:

/* Map a page into the kernel address space. */
void *kmap(struct page *page)
{
    if (!PageHighMem(page))
        return page_address(page);      /* low memory is always mapped */
    return kmap_high(page);
}

void *kmap_high(struct page *page)
{
    unsigned long vaddr;

    spin_lock(&kmap_lock);
    vaddr = (unsigned long) page_address(page);
    if (!vaddr)                         /* not mapped yet */
        vaddr = map_new_virtual(page);
    pkmap_count[(vaddr - PKMAP_BASE) >> PAGE_SHIFT]++;  /* one more user */
    spin_unlock(&kmap_lock);
    return (void *) vaddr;
}

/* Main loop of map_new_virtual(page), entered with kmap_lock held: */
for (;;) {
    int count;
    DECLARE_WAITQUEUE(wait, current);

    /* Scan the counters, starting after the last slot handed out. */
    for (count = LAST_PKMAP; count > 0; --count) {
        last_pkmap_nr = (last_pkmap_nr + 1) & (LAST_PKMAP - 1);
        if (!last_pkmap_nr) {
            /* Wrapped around: release every entry whose counter is 1. */
            flush_all_zero_pkmaps();
            count = LAST_PKMAP;
        }
        if (!pkmap_count[last_pkmap_nr]) {
            /* Free entry found: install the mapping. */
            unsigned long vaddr = PKMAP_BASE +
                                  (last_pkmap_nr << PAGE_SHIFT);
            set_pte(&(pkmap_page_table[last_pkmap_nr]),
                    mk_pte(page, __pgprot(0x63)));
            pkmap_count[last_pkmap_nr] = 1;
            set_page_address(page, (void *) vaddr);
            return vaddr;
        }
    }

    /* No free entry: sleep until another kernel control path releases one. */
    current->state = TASK_UNINTERRUPTIBLE;
    add_wait_queue(&pkmap_map_wait, &wait);
    spin_unlock(&kmap_lock);
    schedule();
    remove_wait_queue(&pkmap_map_wait, &wait);
    spin_lock(&kmap_lock);

    /* Someone else may have mapped the page while we slept. */
    if (page_address(page))
        return (unsigned long) page_address(page);
}
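The slot search in the loop above can be tried out in user space. The sketch below keeps only the counter logic: there are no page tables, locks or wait queues, LAST_PKMAP is shrunk to 8 for readability, and a full table returns -1 where the kernel would sleep. All names ending in _sim, and claim_pkmap_slot(), are hypothetical.

```c
#include <assert.h>

#define LAST_PKMAP 8   /* tiny table for illustration; must be a power of two */

static int pkmap_count[LAST_PKMAP];
static unsigned int last_pkmap_nr;

/* Release every entry that has no users (counter 1 becomes 0). */
static void flush_all_zero_pkmaps_sim(void)
{
    for (int i = 0; i < LAST_PKMAP; i++)
        if (pkmap_count[i] == 1)
            pkmap_count[i] = 0;
}

/* Find a free slot, set its counter to 1 and return its index,
 * or return -1 if every slot is in use. */
static int claim_pkmap_slot(void)
{
    for (int count = LAST_PKMAP; count > 0; --count) {
        last_pkmap_nr = (last_pkmap_nr + 1) & (LAST_PKMAP - 1);
        if (!last_pkmap_nr) {
            flush_all_zero_pkmaps_sim();
            count = LAST_PKMAP;
        }
        if (!pkmap_count[last_pkmap_nr]) {
            pkmap_count[last_pkmap_nr] = 1;
            return (int)last_pkmap_nr;
        }
    }
    return -1;
}

/* Counterpart of kmap_high(): claim a slot and record one user. */
static int kmap_high_sim(void)
{
    int slot = claim_pkmap_slot();
    if (slot >= 0)
        pkmap_count[slot]++;   /* counter is now 2: mapped, one user */
    return slot;
}

/* Drop one user; the slot stays at counter 1 until the next flush. */
static void kunmap_high_sim(int slot)
{
    assert(slot >= 0 && slot < LAST_PKMAP && pkmap_count[slot] > 1);
    pkmap_count[slot]--;
}
```

A released slot is not reusable immediately: its counter drops to 1, and only the flush performed when the scan wraps around to index 0 turns it back into a free slot.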

3 Temporary Kernel Mappings

A comparison between temporary kernel mappings and permanent kernel mappings:

1 The temporary mapping of data from highmem into kernel virtual
memory is done using the functions kmap(), kunmap(), kmap_atomic() and kunmap_atomic().

2 The function kmap() gives you a persistent mapping, i.e. one that will
still be there after you schedule and/or move to another CPU.
However, this kind of mapping is allocated under a global lock, which can be a bottleneck on SMP systems.
The kmap() function is discouraged.

3 Good SMP scalability can be obtained by using kmap_atomic(), which is lockless.
The reason kmap_atomic() can run without any locks is that the page is mapped to a fixed address
which is private to the CPU on which you run. Of course, this means that you can not schedule between setting up
such a mapping and using it, since another process running on the same CPU might also need the same address!
This is the highmem mapping type used most in the 2.6 kernel.

Data structures for fix-mapped addresses:

  • enum fixed_addresses ----- fixes virtual addresses at kernel compile time. It serves many other purposes as well, but temporary kernel mappings use only FIX_KMAP_BEGIN and FIX_KMAP_END. Its definition and the accompanying comment:
    /*
     * Here we define all the compile-time 'special' virtual
     * addresses. The point is to have a constant address at
     * compile time, but to set the physical address only
     * in the boot process. We allocate these special addresses
     * from the end of virtual memory (0xfffff000) backwards.
     */
    enum fixed_addresses {
        ....
    #ifdef CONFIG_HIGHMEM
        FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */
        FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
    #endif
        ....
    };
  • enum km_type --- used for the remapping that gives access to high memory.
    1 Each CPU has its own set of 13 windows, represented by the enum km_type data structure.

    2 The kernel must ensure that the same window is never used by two kernel control paths at the same time.
    Thus, each symbol in the km_type structure is dedicated to one kernel component and is named after the
    component. The last symbol, KM_TYPE_NR, does not represent a linear address by itself, but yields the
    number of different windows usable by every CPU.

    In other words: up to 13 kernel control paths (kernel components) may need a temporary mapping at the same time, so each of them is given its own window (that is, one page table entry). No lock is needed and no conflict can arise. On an SMP system, every CPU has its own 13 windows.

    [I do not yet know why there are exactly 13 control paths, but it should become clear later.]
    The following code uses fixed_addresses and km_type to switch pages: it converts type into the linear address of the current CPU's window and then updates the page table:
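    void *kmap_atomic(struct page *page, enum km_type type)
    {
        enum fixed_addresses idx;
        unsigned long vaddr;

        current_thread_info()->preempt_count++;        /* disable kernel preemption */
        if (!PageHighMem(page))
            return page_address(page);                 /* low memory: already mapped */
        idx = type + KM_TYPE_NR * smp_processor_id();  /* this CPU's window for type */
        vaddr = fix_to_virt(FIX_KMAP_BEGIN + idx);     /* linear address of the window */
        set_pte(kmap_pte - idx, mk_pte(page, 0x063));  /* point the window at the page */
        __flush_tlb_single(vaddr);                     /* drop the stale TLB entry */
        return (void *) vaddr;
    }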
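The index arithmetic in kmap_atomic() can be isolated in a short sketch. It assumes 4 KB pages, the 0xfffff000 top address mentioned in the quoted comment, 13 windows per CPU, and fix-mapped slots that grow downward by one page per index. The function names are hypothetical, and the real value of FIX_KMAP_BEGIN depends on the kernel configuration, so the index here is relative to it.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define FIXADDR_TOP 0xfffff000u   /* top of the fix-mapped region, per the quoted comment */
#define KM_TYPE_NR  13            /* windows per CPU, per the text */

/* Linear address of fix-mapped slot idx; slots grow downward from FIXADDR_TOP. */
static uint32_t fix_to_virt_sim(uint32_t idx)
{
    return FIXADDR_TOP - (idx << PAGE_SHIFT);
}

/* Index, relative to FIX_KMAP_BEGIN, of window `type` on CPU `cpu`. */
static uint32_t kmap_window_index(uint32_t type, uint32_t cpu)
{
    assert(type < KM_TYPE_NR);
    return type + KM_TYPE_NR * cpu;
}
```

Window 12 on CPU 0 has index 12 and window 0 on CPU 1 has index 13, so two CPUs never share a window. That separation is what lets kmap_atomic() work without a lock.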
    ------------------------------------------------------------------------------------------------------
    -----------------------------------------------------------------------------------------------------

PS:

1 ZONE_DMA
Contains page frames of memory below 16 MB

2 ZONE_NORMAL
Contains page frames of memory at and above 16 MB and below 896 MB

3 ZONE_HIGHMEM
Contains page frames of memory at and above 896 MB
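The three zone boundaries translate directly into a classification by physical address. This is a minimal sketch of the boundaries listed above for 32-bit x86; the enum and function names are hypothetical and carry a _SIM suffix to avoid confusion with the kernel's own identifiers.

```c
#include <assert.h>
#include <stdint.h>

enum zone_sim { ZONE_DMA_SIM, ZONE_NORMAL_SIM, ZONE_HIGHMEM_SIM };

#define MB(x) ((uint64_t)(x) << 20)

/* Zone that contains a given physical address, using the limits listed above. */
static enum zone_sim zone_of_phys(uint64_t phys)
{
    if (phys < MB(16))
        return ZONE_DMA_SIM;
    if (phys < MB(896))
        return ZONE_NORMAL_SIM;
    return ZONE_HIGHMEM_SIM;
}

/* Usage example: the boundaries are inclusive on the upper zone. */
static void zone_self_check(void)
{
    assert(zone_of_phys(MB(16) - 1) == ZONE_DMA_SIM);
    assert(zone_of_phys(MB(16)) == ZONE_NORMAL_SIM);
    assert(zone_of_phys(MB(896)) == ZONE_HIGHMEM_SIM);
}
```

Only page frames in the first two zones have a permanent linear address; frames classified as high memory need kmap() or kmap_atomic().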


-----------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------

Linear Addresses of Noncontiguous Memory Areas

Linux provides a mechanism via vmalloc() where physically non-contiguous memory can be used that is contiguous in virtual memory.
The main use is when the system does not have enough contiguous memory: vmalloc() can allocate scattered pages from high memory, so that the pages are discontiguous in physical memory while the page tables map them to a contiguous range of virtual memory.

get_vm_area() ------ looks for a free range of linear addresses between VMALLOC_START and VMALLOC_END (that is, it allocates a range of virtual addresses). Its main steps are:

  • Invokes kmalloc( ) to obtain a memory area for the new descriptor of type vm_struct.
  • Gets the vmlist_lock lock for writing and scans the list of descriptors of type vm_struct looking for a free range of linear addresses that includes at least size + 4096 addresses (4096 is the size of the safety interval between the memory areas).
  • If such an interval exists, the function initializes the fields of the descriptor, releases the vmlist_lock lock, and terminates by returning the initial address of the noncontiguous memory area.
  • Otherwise, get_vm_area( ) releases the descriptor obtained previously, releases the vmlist_lock lock, and returns NULL.

    The next step is to allocate the physical pages and map them to contiguous virtual pages. Even if some details are unclear on a first reading, this is the overall picture, and there is no need to understand every line yet.

    void *vmalloc(unsigned long size)
    {
        struct vm_struct *area;
        struct page **pages;
        unsigned int array_size, i;

        size = (size + PAGE_SIZE - 1) & PAGE_MASK;   /* round up to whole pages */
        area = get_vm_area(size, VM_ALLOC);          /* reserve a range of virtual addresses */
        if (!area)
            return NULL;
        area->nr_pages = size >> PAGE_SHIFT;
        array_size = (area->nr_pages * sizeof(struct page *));
        /* array that holds the struct page * pointers of the area */
        area->pages = pages = kmalloc(array_size, GFP_KERNEL);
        if (!area->pages) {
            remove_vm_area(area->addr);
            kfree(area);
            return NULL;
        }
        memset(area->pages, 0, array_size);
        for (i = 0; i < area->nr_pages; i++) {
            /* allocate a physical page frame, from high memory if possible;
               the call returns a struct page * pointer */
            area->pages[i] = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
            if (!area->pages[i]) {
                area->nr_pages = i;
        fail:   vfree(area->addr);
                return NULL;
            }
        }
        /* enter the pages into the page tables: existing entries are updated,
           missing levels of the page table are created */
        if (map_vm_area(area, __pgprot(0x63), &pages))
            goto fail;
        return area->addr;                           /* virtual address of the area */
    }
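The range search that get_vm_area() performs, as described in the steps above, can be sketched as a first-fit scan over a sorted list of existing areas. This is a simplified user-space model: it uses an array instead of the kernel's linked list of vm_struct descriptors, takes no lock, and the struct and function names are hypothetical. The stored size of each area is assumed to already include its 4096-byte safety interval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Simplified stand-in for a vm_struct descriptor. */
struct vm_area_sim {
    uint32_t addr;   /* first linear address of the area */
    uint32_t size;   /* size in bytes, safety interval included */
};

/* First-fit search over areas sorted by address. Returns the start of a free
 * range of at least size + PAGE_SIZE bytes inside [start, end), or 0 if none. */
static uint32_t find_vm_range(const struct vm_area_sim *areas, size_t n,
                              uint32_t start, uint32_t end, uint32_t size)
{
    uint32_t need = size + PAGE_SIZE;   /* requested size plus safety interval */
    uint32_t addr = start;

    assert(start < end && size % PAGE_SIZE == 0);
    for (size_t i = 0; i < n; i++) {
        /* Is the gap in front of area i large enough? */
        if (areas[i].addr >= addr && areas[i].addr - addr >= need)
            return addr;
        addr = areas[i].addr + areas[i].size;
    }
    /* Otherwise try the space after the last area. */
    return (addr <= end && end - addr >= need) ? addr : 0;
}
```

The extra page per area leaves an unmapped gap between neighbouring areas, so an access that runs past the end of one area faults instead of silently touching the next one.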


