A legacy problem that led to a Linux memory-management "bloodbath"

Published: 2025/3/15

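I recently dealt with an ancient legacy problem: memory belonging to our kernel module, DPI, was being overwritten for no apparent reason, which caused the system to panic. The kernel is Linux 3.18 on x86_64. Because we ship a trimmed-down system, many debugging tools have been stripped out; SLAB_DEBUG and KASAN are not supported. The data in question is mostly read-only lookup data that is never modified after initialization, so I came up with a plan: once DPI has finished initializing, mark the memory pages it uses as read-only, then use the stack trace from the resulting fault to find the culprit.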

Based on this analysis, the work breaks down into the following steps:

  • Look up the PTE for the virtual address
  • Set the PTE attribute to read-only

    #include <linux/kernel.h>
    #include <linux/module.h>
    #include <linux/init.h>
    #include <linux/gfp.h>
    #include <linux/mm.h>

    MODULE_LICENSE("GPL");

    static struct page *page;
    static void *address;
    static pte_t *ptep;

    static __init int test_init(void)
    {
        int level;

        /* order 0: exactly one 4K page */
        page = alloc_pages(GFP_KERNEL, 0);
        if (unlikely(page == NULL)) {
            pr_err("alloc page failed\n");
            return -ENOMEM;
        }

        address = page_address(page);
        pr_info("lookup_address %p\n", address);

        /* 1. look up the PTE */
        ptep = lookup_address((unsigned long)address, &level);
        if (unlikely(ptep == NULL)) {
            pr_err("lookup_address %p\n", ptep);
            goto err;
        }

        if (level != PG_LEVEL_4K) {
            pr_err("level not 4K %d\n", level);
            goto err;
        }

        if (!pte_present(*ptep)) {
            pr_err("pte not present\n");
            goto err;
        }

        /*
         * 2. set the write-protect flag.
         * Note: no TLB flush is done here, so a stale writable
         * TLB entry may still allow writes for a while.
         */
        set_pte(ptep, pte_wrprotect(*ptep));
        return 0;

    err:
        __free_page(page);
        return -EINVAL;
    }

    static __exit void test_exit(void)
    {
        /* restore write permission before giving the page back */
        set_pte(ptep, pte_mkwrite(*ptep));
        __free_page(page);
        pr_info("test exit\n");
    }

    module_init(test_init);
    module_exit(test_exit);

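Following this plan, I extracted the relevant code from our product into a very simple test case and assumed the job was done. The plan looked great on paper, but reality was less kind, and things did not go as expected. Running insmod test.ko several times produced the following: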

    [ 659.486243] lookup_address ffff8800692e4000
    [ 659.486248] level not 4K 2
    [ 660.142577] lookup_address ffff880046436000
    [ 660.142582] level not 4K 2
    [ 660.530890] lookup_address ffff8800461a0000
    [ 660.530896] level not 4K 2
    [ 660.873884] lookup_address ffff88012369a000
    [ 660.873889] level not 4K 2
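To make the `level not 4K 2` lines concrete: in the x86 kernel headers, `enum pg_level` numbers the mapping sizes so that 2 stands for PG_LEVEL_2M. The user-space sketch below (plain C arithmetic, not kernel code) computes which region shares a single page-table entry with a given address. The helper names `level_size` and `mapping_base` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the ordering of enum pg_level in the x86 kernel headers. */
enum pg_level { PG_LEVEL_NONE, PG_LEVEL_4K, PG_LEVEL_2M, PG_LEVEL_1G };

/* Bytes covered by one page-table entry at this level (0 if unknown). */
uint64_t level_size(int level)
{
    switch (level) {
    case PG_LEVEL_4K: return 1ULL << 12;
    case PG_LEVEL_2M: return 1ULL << 21;
    case PG_LEVEL_1G: return 1ULL << 30;
    default:          return 0;
    }
}

/* First address of the mapping that contains addr at this level. */
uint64_t mapping_base(uint64_t addr, int level)
{
    uint64_t size = level_size(level);

    assert(size != 0);          /* reject PG_LEVEL_NONE and junk values */
    return addr & ~(size - 1);  /* sizes are powers of two */
}
```

For the first logged address, `mapping_base(0xffff8800692e4000ULL, PG_LEVEL_2M)` gives 0xffff880069200000, so changing that one entry would affect every 4K page from there up to 0xffff880069400000, 512 pages in total.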

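Why is level not PG_LEVEL_4K? We asked for a single page, yet the level reported is PG_LEVEL_2M, which means we would end up marking a whole 2M of memory read-only. To get to the bottom of this, we have to walk through the memory-management initialization flow: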

    start_kernel()
      |---->setup_arch(&command_line);
        |---->init_mem_mapping();
          |---->memory_map_top_down();
            |---->init_range_memory_mapping();
              |---->init_memory_mapping();

    /*
     * Setup the direct mapping of the physical memory at PAGE_OFFSET.
     * This runs before bootmem is initialized and gets pages directly from
     * the physical memory. To access them they are temporarily mapped.
     */
    unsigned long __init_refok init_memory_mapping(unsigned long start,
                                                   unsigned long end)
    {
        struct map_range mr[NR_RANGE_MR];
        unsigned long ret = 0;
        int nr_range, i;

        pr_info("init_memory_mapping: [mem %#010lx-%#010lx]\n",
                start, end - 1);

        memset(mr, 0, sizeof(mr));
        nr_range = split_mem_range(mr, 0, start, end);

        for (i = 0; i < nr_range; i++)
            ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
                                               mr[i].page_size_mask);

        add_pfn_range_mapped(start >> PAGE_SHIFT, ret >> PAGE_SHIFT);

        return ret >> PAGE_SHIFT;
    }

    static int __meminit split_mem_range(struct map_range *mr, int nr_range,
                                         unsigned long start,
                                         unsigned long end)
    {
        ...... // some code omitted

        /* big page (2M) range */
        start_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE));
    #ifdef CONFIG_X86_32
        end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE));
    #else /* CONFIG_X86_64 */
        end_pfn = round_up(pfn, PFN_DOWN(PUD_SIZE));
        if (end_pfn > round_down(limit_pfn, PFN_DOWN(PMD_SIZE)))
            end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE));
    #endif

        if (start_pfn < end_pfn) {
            nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,
                               page_size_mask & (1<<PG_LEVEL_2M));
            pfn = end_pfn;
        }

    #ifdef CONFIG_X86_64
        /* big page (1G) range */
        start_pfn = round_up(pfn, PFN_DOWN(PUD_SIZE));
        end_pfn = round_down(limit_pfn, PFN_DOWN(PUD_SIZE));
        if (start_pfn < end_pfn) {
            nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,
                               page_size_mask &
                               ((1<<PG_LEVEL_2M)|(1<<PG_LEVEL_1G)));
            pfn = end_pfn;
        }

        /* tail is not big page (1G) alignment */
        start_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE));
        end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE));
        if (start_pfn < end_pfn) {
            nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,
                               page_size_mask & (1<<PG_LEVEL_2M));
            pfn = end_pfn;
        }
    #endif

        ...... // some code omitted
    }
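As a rough user-space model of the splitting logic excerpted above, the sketch below works on page-frame numbers for the x86_64 case and returns the pieces a range is cut into. It is a simplification, not the kernel function: `page_size_mask` and the merging of adjacent ranges are left out, and the handling of the unaligned 4K head and tail sits in the part the excerpt omits, so those two steps are this sketch's assumption. The names `split_range`, `save_range` and `struct range` are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PFN_PER_2M 512ULL             /* 2 MiB / 4 KiB */
#define PFN_PER_1G (512ULL * 512ULL)  /* 1 GiB / 4 KiB */

enum map_size { MAP_4K, MAP_2M, MAP_1G };

struct range {
    uint64_t start_pfn, end_pfn;      /* half-open: [start_pfn, end_pfn) */
    enum map_size size;
};

uint64_t round_up_to(uint64_t x, uint64_t a)   { return (x + a - 1) / a * a; }
uint64_t round_down_to(uint64_t x, uint64_t a) { return x / a * a; }
uint64_t min_u64(uint64_t a, uint64_t b)       { return a < b ? a : b; }

/* Append [start, end) if it is non-empty; return the new entry count. */
int save_range(struct range *out, int n, uint64_t start, uint64_t end,
               enum map_size size)
{
    if (start < end) {
        out[n].start_pfn = start;
        out[n].end_pfn = end;
        out[n].size = size;
        n++;
    }
    return n;
}

/* Split [start_pfn, limit_pfn) into at most 5 pieces; out needs 5 slots. */
int split_range(struct range *out, uint64_t start_pfn, uint64_t limit_pfn)
{
    uint64_t pfn = start_pfn, s, e;
    int n = 0;

    assert(start_pfn <= limit_pfn);

    /* assumed: 4K head up to the first 2M boundary */
    e = min_u64(round_up_to(pfn, PFN_PER_2M), limit_pfn);
    n = save_range(out, n, pfn, e, MAP_4K);
    pfn = e;

    /* 2M pages up to the first 1G boundary */
    s = round_up_to(pfn, PFN_PER_2M);
    e = min_u64(round_up_to(pfn, PFN_PER_1G),
                round_down_to(limit_pfn, PFN_PER_2M));
    if (s < e) { n = save_range(out, n, s, e, MAP_2M); pfn = e; }

    /* 1G pages for the aligned middle */
    s = round_up_to(pfn, PFN_PER_1G);
    e = round_down_to(limit_pfn, PFN_PER_1G);
    if (s < e) { n = save_range(out, n, s, e, MAP_1G); pfn = e; }

    /* 2M pages for the tail that is not 1G aligned */
    s = round_up_to(pfn, PFN_PER_2M);
    e = round_down_to(limit_pfn, PFN_PER_2M);
    if (s < e) { n = save_range(out, n, s, e, MAP_2M); pfn = e; }

    /* assumed: 4K tail for whatever is left */
    return save_range(out, n, pfn, limit_pfn, MAP_4K);
}
```

For example, a range running from 1 MiB to 3 GiB + 5 MiB comes out as five pieces: a 4K head, a 2M run up to the first 1 GiB boundary, a 1G run, a 2M tail, and a 4K tail.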

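split_mem_range() shows that when the kernel sets up the direct mapping of physical memory, it uses huge pages wherever it can. That explains why the memory we allocated sits at PG_LEVEL_2M; in theory, PG_LEVEL_1G huge pages should show up as well. So the cause is found, but how do we fix it? BPF came to mind: it injects BPF bytecode into the kernel and, for safety, marks the memory holding that bytecode read-only, so it must run into exactly the same problem. RTFSC: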

    sys_bpf()
      |---->bpf_prog_load()
        |---->bpf_prog_select_runtime()
          |---->bpf_int_jit_compile()
            |---->set_memory_ro()
              |---->change_page_attr_clear()
                |---->__change_page_attr_set_clr()
                  |---->__change_page_attr()
                    |---->lookup_address_cpa()
                      |---->split_large_page() /* ! PG_LEVEL_4K */

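The call flow above shows that the bpf() system call eventually reaches split_large_page(), which handles the case where the memory is covered by a large page. The x86 platform wraps this in a family of functions, so we changed our implementation to use set_memory_ro(). We had been too clever by half in thinking that editing the PTE attribute directly would be enough, and fell straight into the pit.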

    /*
     * The set_memory_* API can be used to change various attributes of a virtual
     * address range. The attributes include:
     * Cachability   : UnCached, WriteCombining, WriteBack
     * Executability : eXeutable, NoteXecutable
     * Read/Write    : ReadOnly, ReadWrite
     * Presence      : NotPresent
     */
    int set_memory_uc(unsigned long addr, int numpages);
    int set_memory_wc(unsigned long addr, int numpages);
    int set_memory_wb(unsigned long addr, int numpages);
    int set_memory_x(unsigned long addr, int numpages);
    int set_memory_nx(unsigned long addr, int numpages);
    int set_memory_ro(unsigned long addr, int numpages);
    int set_memory_rw(unsigned long addr, int numpages);
    int set_memory_np(unsigned long addr, int numpages);
    int set_memory_4k(unsigned long addr, int numpages);
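Why the split matters can be shown with a toy user-space model. This is not kernel code and is far simpler than the real split_large_page(): a 2 MiB region is covered either by one large entry with a single writable flag, or by 512 4K entries with one flag each. All names here (`toy_pmd`, `toy_split`, `toy_protect_large`, `toy_set_ro`, `toy_count_writable`) are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTES_PER_PMD 512

/* Toy stand-in for one 2 MiB slot of the direct mapping. */
struct toy_pmd {
    bool large;                       /* true: one 2M entry covers it all */
    bool large_writable;              /* flag of the 2M entry             */
    bool pte_writable[PTES_PER_PMD];  /* per-4K flags, valid after split  */
};

/* Replace the single 2M entry by 512 4K entries with the same flag. */
void toy_split(struct toy_pmd *pmd)
{
    size_t i;

    if (!pmd->large)
        return;
    for (i = 0; i < PTES_PER_PMD; i++)
        pmd->pte_writable[i] = pmd->large_writable;
    pmd->large = false;
}

/* What editing the looked-up entry directly would do on a 2M mapping. */
void toy_protect_large(struct toy_pmd *pmd)
{
    assert(pmd->large);
    pmd->large_writable = false;
}

/* Split first if needed, then write-protect only the requested 4K page. */
void toy_set_ro(struct toy_pmd *pmd, size_t idx)
{
    assert(idx < PTES_PER_PMD);
    toy_split(pmd);
    pmd->pte_writable[idx] = false;
}

/* Number of 4K pages in the region that can still be written. */
size_t toy_count_writable(const struct toy_pmd *pmd)
{
    size_t i, n = 0;

    if (pmd->large)
        return pmd->large_writable ? PTES_PER_PMD : 0;
    for (i = 0; i < PTES_PER_PMD; i++)
        n += pmd->pte_writable[i] ? 1 : 0;
    return n;
}
```

In this model, `toy_protect_large()` leaves no writable page in the region, while `toy_set_ro()` on one index leaves the other 511 pages writable, which is the behaviour we wanted from the start.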

The road of learning never ends, and that goes double for the kernel. RTFSC!!!!
