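Android Linker Study Notes
Original article: http://drops.wooyun.org/tips/12122
0x00 Background
The linker is the loader and dynamic linker for shared objects (.so files) on Android. To understand how it works, it helps to first know the ELF file layout, then how an ELF file is loaded and started, and finally how the linker itself loads and links a library.
Plenty of material on the ELF file format is available online, so it is not repeated here.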
0x01 Loading and starting a shared object
When an app needs a shared library, it declares the following in its Java code:

    static {
        System.loadLibrary("name");
    }

This statement loads the library. Tracing System.loadLibrary through the source shows that it ends up in a native method, with the following call chain:

    Dalvik/vm/native/java_lang_Runtime.cpp: Dalvik_java_lang_Runtime_nativeLoad
      -> Dalvik/vm/Native.cpp: dvmLoadNativeCode

Inside dvmLoadNativeCode we find the following code:

    ...
    handle = dlopen(pathName, RTLD_LAZY);  // get a handle to the library; the handle is a soinfo*
    // the library file is the one named by the argument passed to System.loadLibrary
    ...
    vonLoad = dlsym(handle, "JNI_OnLoad");  // look up the address of the library's JNI_OnLoad
    if (vonLoad == NULL) {
        // no JNI_OnLoad means javah-style JNI names are used, so resolution is deferred
        LOGD("No JNI_OnLoad found in %s %p, skipping init", pathName, classLoader);
        // this message shows up in logcat all the time
    } else {
        ...
    }

As the code shows, the key call when Android loads a shared library is dlopen. Its implementation lives in bionic/linker/dlfcn.cpp:

    void* dlopen(const char* filename, int flags) {
      ScopedPthreadMutexLocker locker(&gDlMutex);
      soinfo* result = do_dlopen(filename, flags);
      if (result == NULL) {
        __bionic_format_dlerror("dlopen failed", linker_get_error_buffer());
        return NULL;
      }
      return result;
    }

This function mainly calls do_dlopen and returns a handle to the library. The handle is a pointer to a soinfo structure, which is defined in bionic/linker/linker.h.
Next, look at do_dlopen in linker.cpp:

    soinfo* do_dlopen(const char* name, int flags) {
      if ((flags & ~(RTLD_NOW|RTLD_LAZY|RTLD_LOCAL|RTLD_GLOBAL)) != 0) {
        DL_ERR("invalid flags to dlopen: %x", flags);
        return NULL;
      }
      set_soinfo_pool_protection(PROT_READ | PROT_WRITE);
      soinfo* si = find_library(name);  // look up the shared library
      if (si != NULL) {
        si->CallConstructors();
      }
      set_soinfo_pool_protection(PROT_READ);
      return si;
    }

The interesting part is clearly find_library:

    static soinfo* find_library(const char* name) {
      soinfo* si = find_library_internal(name);
      if (si != NULL) {
        si->ref_count++;
      }
      return si;
    }

Going one level deeper:

    static soinfo* find_library_internal(const char* name) {
      ...
      soinfo* si = find_loaded_library(name);  // if the so is already loaded, return its soinfo
      if (si != NULL) {
        if (si->flags & FLAG_LINKED) {
          return si;
        }
        DL_ERR("OOPS: recursive link to \"%s\"", si->name);
        return NULL;
      }

      TRACE("[ '%s' has not been loaded yet.  Locating...]", name);
      si = load_library(name);  // the so is not loaded yet, so load it now
      if (si == NULL) {
        return NULL;
      }

      // At this point we know that whatever is loaded @ base is a valid ELF
      // shared library whose segments are properly mapped in.
      TRACE("[ find_library_internal base=%p size=%zu name='%s' ]",
            reinterpret_cast<void*>(si->base), si->size, si->name);

      if (!soinfo_link_image(si)) {  // after loading, link the image; analyzed in detail in section 1.2
        munmap(reinterpret_cast<void*>(si->base), si->size);
        soinfo_free(si);
        return NULL;
      }

      return si;
    }

Leaving the error handling aside and assuming every call returns what we expect, the function proceeds as follows: find_loaded_library checks whether the library is already loaded; if it is not, load_library loads it; finally soinfo_link_image links the freshly loaded image.
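The lookup-or-load flow of find_library and find_library_internal can be illustrated with a toy model. This is only a sketch: ToySoinfo and ToyLinker are hypothetical names, and the real loading and linking steps are reduced to setting a flag.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the linker's soinfo; only the fields used here.
struct ToySoinfo {
    std::string name;
    bool linked = false;
    int ref_count = 0;
};

// Hypothetical registry of loaded libraries, keyed by name.
class ToyLinker {
public:
    // Mirrors find_library: look up or load, then bump the reference count.
    ToySoinfo* find_library(const std::string& name) {
        ToySoinfo* si = find_library_internal(name);
        if (si != nullptr) {
            si->ref_count++;
        }
        return si;
    }

    // Number of times a library actually had to be loaded.
    int loads() const { return loads_; }

private:
    // Mirrors find_library_internal: reuse a linked entry, else load and link.
    ToySoinfo* find_library_internal(const std::string& name) {
        std::map<std::string, ToySoinfo>::iterator it = loaded_.find(name);
        if (it != loaded_.end()) {
            // Known but not yet linked corresponds to the "recursive link" error.
            return it->second.linked ? &it->second : nullptr;
        }
        ToySoinfo& si = loaded_[name];  // stands in for load_library
        si.name = name;
        ++loads_;
        si.linked = true;               // stands in for soinfo_link_image
        assert(si.ref_count == 0);
        return &si;
    }

    std::map<std::string, ToySoinfo> loaded_;
    int loads_ = 0;
};
```

Requesting the same name twice returns the same object and only increases its reference count, which is the behaviour the ref_count++ in find_library suggests.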
load_library is the heart of the whole loading process. It creates the handle for the library:

    static soinfo* load_library(const char* name) {
        // Open the file.
        int fd = open_library(name);
        if (fd == -1) {
            DL_ERR("library \"%s\" not found", name);
            return NULL;
        }

        // Read the ELF header and load the segments.
        ElfReader elf_reader(name, fd);
        if (!elf_reader.Load()) {
            return NULL;
        }

        const char* bname = strrchr(name, '/');
        soinfo* si = soinfo_alloc(bname ? bname + 1 : name);
        if (si == NULL) {
            return NULL;
        }
        si->base = elf_reader.load_start();
        si->size = elf_reader.load_size();
        si->load_bias = elf_reader.load_bias();
        si->flags = 0;
        si->entry = 0;  // the entry point is set to null
        si->dynamic = NULL;
        si->phnum = elf_reader.phdr_count();
        si->phdr = elf_reader.loaded_phdr();
        return si;
    }

In outline, load_library does three things: (1) open_library opens the file; (2) an ElfReader reads the ELF header and maps the segments; (3) soinfo_alloc creates the soinfo, which is then filled in from the ElfReader.
Step 2 is described in detail below.
1.1 Reading and loading the SO file
The linker uses the Load function of the ElfReader class to parse and map the so file. The class lives in linker_phdr.cpp. Load looks like this:

    bool ElfReader::Load() {
      return ReadElfHeader() &&
             VerifyElfHeader() &&
             ReadProgramHeader() &&
             ReserveAddressSpace() &&
             LoadSegments() &&
             FindPhdr();
    }

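The function simply calls ReadElfHeader, VerifyElfHeader, ReadProgramHeader and the rest in order, and stops at the first step that fails.
First we need to understand how Android loads segments.
The program header table of an ELF file contains one or more PT_LOAD segments, which mark the regions of the file that must be mapped into the process address space. Each loadable segment has the following important attributes:
- p_offset: offset of the segment in the file
- p_filesz: size of the segment in the file
- p_memsz: size of the segment in memory (never smaller than p_filesz)
- p_vaddr: virtual address of the segment
- p_flags: segment flags (readable, writable, executable)
For now we ignore the p_paddr and p_align members.
The loadable segments can be viewed as a list of virtual address ranges [p_vaddr ... p_vaddr+p_memsz). A few rules apply (they are spelled out in the comment at the top of linker_phdr.cpp):
- the virtual address ranges must not overlap;
- if a segment's p_filesz is smaller than its p_memsz, the extra bytes must be initialized to 0;
- ranges need not start or end on page boundaries. Two distinct segments may start and end on the same page, in which case the page inherits the mapping flags of the latter segment.
Here are two loadable segments as an example:

    [ offset:0,      filesz:0x4000, memsz:0x4000, vaddr:0x30000 ],
    [ offset:0x4000, filesz:0x2000, memsz:0x8000, vaddr:0x40000 ],

They correspond to the following virtual address ranges:

    0x30000...0x34000
    0x40000...0x48000

If the loader decides to place the image at base address 0xa0000000, so that the first segment lands at 0xa0030000 (as we will see, the kernel picks this address when the linker reserves address space), the actual virtual address ranges become:

    0xa0030000...0xa0034000
    0xa0040000...0xa0048000

In other words, the difference between every segment's actual load address and its vaddr is the same constant (0xa0030000 - 0x30000 = 0xa0040000 - 0x40000 = 0xa0000000).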
In practice, however, segments do not always start on a page boundary. Since memory can only be mapped at page granularity, the load bias has to be computed as:

    load_bias = phdr0_load_address - PAGE_START(phdr0->p_vaddr)
    (#define PAGE_START(x)  ((x) & PAGE_MASK); PAGE_MASK is typically 0xfffff000.)

For the first segment above, load_bias = 0xa0030000 - (0x30000 & 0xfffff000) = 0xa0000000.
Here phdr0_load_address must itself be page-aligned, so the segment's actual content starts at:

    phdr0_load_address + PAGE_OFFSET(phdr0->p_vaddr)
    (#define PAGE_OFFSET(x)  ((x) & ~PAGE_MASK), i.e. x & 0xfff)

Note: ELF requires the following condition to hold so that mmap can work:

    PAGE_OFFSET(phdr0->p_vaddr) == PAGE_OFFSET(phdr0->p_offset)

The p_vaddr of every loadable segment must be incremented by load_bias; the sum is the segment's actual start address in memory.
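The load bias arithmetic can be checked with a few lines of code. The sketch below assumes 32-bit addresses and 4 KiB pages (which yields the PAGE_MASK of 0xfffff000); the function names are illustrative and are not the linker's own macros.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 4 KiB pages and 32-bit addresses.
const uint32_t kPageSize = 4096;
const uint32_t kPageMask = ~(kPageSize - 1);

// Equivalent of PAGE_START(x): round down to a page boundary.
inline uint32_t page_start(uint32_t x) { return x & kPageMask; }

// Equivalent of PAGE_OFFSET(x): offset of x within its page.
inline uint32_t page_offset(uint32_t x) { return x & ~kPageMask; }

// load_bias = phdr0_load_address - PAGE_START(phdr0->p_vaddr)
inline uint32_t load_bias(uint32_t phdr0_load_address, uint32_t phdr0_vaddr) {
    // The load address of the first segment must be page-aligned.
    assert(page_offset(phdr0_load_address) == 0);
    return phdr0_load_address - page_start(phdr0_vaddr);
}

// Run-time start address of any loadable segment: p_vaddr + load_bias.
inline uint32_t runtime_address(uint32_t p_vaddr, uint32_t bias) {
    return p_vaddr + bias;
}
```

With the two example segments, load_bias(0xa0030000, 0x30000) gives 0xa0000000, and adding that bias to the second segment's p_vaddr of 0x40000 gives 0xa0040000.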
1.1.1 ReadProgramHeader
With the segment loading scheme clear, we can turn to the actual linker code, starting with ReadProgramHeader:

    bool ElfReader::ReadProgramHeader() {
      phdr_num_ = header_.e_phnum;
      ...
      ElfW(Addr) page_min = PAGE_START(header_.e_phoff);
      ElfW(Addr) page_max = PAGE_END(header_.e_phoff + (phdr_num_ * sizeof(ElfW(Phdr))));
      ElfW(Addr) page_offset = PAGE_OFFSET(header_.e_phoff);

      phdr_size_ = page_max - page_min;

      void* mmap_result = mmap(NULL, phdr_size_, PROT_READ, MAP_PRIVATE, fd_, page_min);
      ...
      phdr_mmap_ = mmap_result;
      phdr_table_ = reinterpret_cast<ElfW(Phdr)*>(reinterpret_cast<char*>(mmap_result) + page_offset);
      return true;
    }

The statement worth a closer look is the assignment to phdr_table_ (third line from the end):
First, reinterpret_cast<char*>(mmap_result) converts the void* pointer mmap_result to char*.
Then reinterpret_cast<char*>(mmap_result) + page_offset adds page_offset to that char* pointer, so the result points at the real start of the program header table.
Finally the result is converted to ElfW(Phdr)*, so phdr_table_ points at the first entry of the program header table.
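The page arithmetic in ReadProgramHeader can be tried in isolation. This is a minimal sketch assuming 4 KiB pages and 32-bit values; PhdrMapping and plan_phdr_mapping are hypothetical names introduced only for the illustration.

```cpp
#include <cassert>
#include <cstdint>

const uint32_t kPageSize = 4096;  // assumption: 4 KiB pages

inline uint32_t page_start(uint32_t x)  { return x & ~(kPageSize - 1); }
inline uint32_t page_offset(uint32_t x) { return x & (kPageSize - 1); }
inline uint32_t page_end(uint32_t x)    { return page_start(x + (kPageSize - 1)); }

// Hypothetical result type: what to hand to mmap, and where the table starts.
struct PhdrMapping {
    uint32_t file_offset;   // page-aligned file offset passed to mmap (page_min)
    uint32_t length;        // number of bytes to map (phdr_size_)
    uint32_t table_offset;  // offset of the table inside the mapping (page_offset)
};

inline PhdrMapping plan_phdr_mapping(uint32_t e_phoff, uint32_t phnum,
                                     uint32_t phentsize) {
    uint32_t page_min = page_start(e_phoff);
    uint32_t page_max = page_end(e_phoff + phnum * phentsize);

    PhdrMapping m;
    m.file_offset  = page_min;
    m.length       = page_max - page_min;
    m.table_offset = page_offset(e_phoff);

    // The whole table must lie inside the mapped range.
    assert(m.table_offset + phnum * phentsize <= m.length);
    return m;
}
```

For a typical 32-bit file with e_phoff = 52 and eight 32-byte entries, one page is mapped from file offset 0 and the table starts 52 bytes into the mapping.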
1.1.2 ReserveAddressSpace
Next comes ReserveAddressSpace:

    /* Reserve a virtual address range large enough to hold all loadable segments.
       This is done with mmap by creating a private anonymous mapping with PROT_NONE.
       PROT_NONE means the pages are inaccessible; anonymous means the mapping is not
       backed by any file (fd must be -1); private means writes to the region are
       copy-on-write, so no modification ever reaches an underlying file. */
    bool ElfReader::ReserveAddressSpace() {
      ElfW(Addr) min_vaddr;
      load_size_ = phdr_table_get_load_size(phdr_table_, phdr_num_, &min_vaddr);
      ...
      uint8_t* addr = reinterpret_cast<uint8_t*>(min_vaddr);
      int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS;
      void* start = mmap(addr, load_size_, PROT_NONE, mmap_flags, -1, 0);
      ...
      load_start_ = start;
      load_bias_ = reinterpret_cast<uint8_t*>(start) - addr;
      return true;
    }

The key helper here is phdr_table_get_load_size:

    /* Returns the size of the extent covered by all loadable segments listed in the
       ELF program header table (the segments may be non-contiguous), or 0 if there
       are no loadable segments.
       If out_min_vaddr or out_max_vaddr is non-NULL, it is set to the minimum or
       maximum page address of the range to be reserved (0 if there are no loadable
       segments). */
    size_t phdr_table_get_load_size(const ElfW(Phdr)* phdr_table, size_t phdr_count,
                                    ElfW(Addr)* out_min_vaddr,
                                    ElfW(Addr)* out_max_vaddr) {
      ElfW(Addr) min_vaddr = UINTPTR_MAX;
      ElfW(Addr) max_vaddr = 0;

      bool found_pt_load = false;
      for (size_t i = 0; i < phdr_count; ++i) {
        const ElfW(Phdr)* phdr = &phdr_table[i];

        if (phdr->p_type != PT_LOAD) {
          continue;
        }
        found_pt_load = true;

        if (phdr->p_vaddr < min_vaddr) {
          min_vaddr = phdr->p_vaddr;
        }

        if (phdr->p_vaddr + phdr->p_memsz > max_vaddr) {
          max_vaddr = phdr->p_vaddr + phdr->p_memsz;
        }
      }
      if (!found_pt_load) {
        min_vaddr = 0;
      }

      min_vaddr = PAGE_START(min_vaddr);
      max_vaddr = PAGE_END(max_vaddr);

      if (out_min_vaddr != NULL) {
        *out_min_vaddr = min_vaddr;
      }
      if (out_max_vaddr != NULL) {
        *out_max_vaddr = max_vaddr;
      }
      return max_vaddr - min_vaddr;
    }

Put simply, this function returns the total size of the address range needed by the loadable segments of the ELF file and reports their page-aligned minimum virtual address. Note that the function takes four parameters, yet ReserveAddressSpace passes only three and omits out_max_vaddr. For that call to compile, the declaration must give out_max_vaddr a default argument (presumably NULL). In my view the caller does not need the value anyway: given out_min_vaddr and the difference load_size, the maximum is simply out_min_vaddr + load_size.
Back to ReserveAddressSpace. Once load_size is known, enough address space has to be reserved for the segments. Note that the first argument to mmap is not NULL but addr, which asks for the mapping to start at addr in the process. Because MAP_FIXED is not set this is only a hint; in practice the request is usually not honoured and the kernel chooses the address itself, so addr can be regarded as NULL. mmap returns the actual start address of the mapping, start. load_bias_ = start - addr is therefore the difference between where the image was actually mapped and where the linker asked for it to be mapped. From here on the linker obtains the in-memory start address of any segment as p_vaddr + load_bias_.
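As a worked example of the size computation, the sketch below applies the same logic to the two example segments from section 1.1. ToyPhdr is a simplified, hypothetical stand-in for ElfW(Phdr), 4 KiB pages are assumed, and the default argument mirrors the kind of declaration that lets a caller omit an output pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const uint32_t kPageSize = 4096;  // assumption: 4 KiB pages
const uint32_t kPtLoad = 1;       // PT_LOAD, as defined by the ELF specification

inline uint32_t page_start(uint32_t x) { return x & ~(kPageSize - 1); }
inline uint32_t page_end(uint32_t x)   { return page_start(x + (kPageSize - 1)); }

// Simplified, hypothetical program header with only the fields used here.
struct ToyPhdr {
    uint32_t p_type;
    uint32_t p_offset;
    uint32_t p_vaddr;
    uint32_t p_filesz;
    uint32_t p_memsz;
};

// Size of the page-aligned range covering all PT_LOAD segments (0 if none).
inline uint32_t load_size(const std::vector<ToyPhdr>& phdrs,
                          uint32_t* out_min_vaddr = nullptr) {
    uint32_t min_vaddr = UINT32_MAX;
    uint32_t max_vaddr = 0;
    bool found_pt_load = false;
    for (size_t i = 0; i < phdrs.size(); ++i) {
        const ToyPhdr& p = phdrs[i];
        if (p.p_type != kPtLoad) continue;
        found_pt_load = true;
        if (p.p_vaddr < min_vaddr) min_vaddr = p.p_vaddr;
        if (p.p_vaddr + p.p_memsz > max_vaddr) max_vaddr = p.p_vaddr + p.p_memsz;
    }
    if (!found_pt_load) min_vaddr = 0;
    min_vaddr = page_start(min_vaddr);
    max_vaddr = page_end(max_vaddr);
    assert(max_vaddr >= min_vaddr);
    if (out_min_vaddr != nullptr) *out_min_vaddr = min_vaddr;
    return max_vaddr - min_vaddr;
}

// load_bias_ = start - addr, where addr is the requested (minimum) address.
inline uint32_t bias_from_reservation(uint32_t start, uint32_t min_vaddr) {
    return start - min_vaddr;
}
```

For the example segments the reserved range is 0x18000 bytes starting at 0x30000; if the kernel returns 0xa0030000 for the reservation, the bias is 0xa0000000.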
1.1.3 LoadSegments
Now the loadable segments of the ELF file are actually mapped:

    bool ElfReader::LoadSegments() {
      for (size_t i = 0; i < phdr_num_; ++i) {
        const ElfW(Phdr)* phdr = &phdr_table_[i];

        if (phdr->p_type != PT_LOAD) {
          continue;
        }

        // Segment addresses in memory.
        ElfW(Addr) seg_start = phdr->p_vaddr + load_bias_;
        ElfW(Addr) seg_end   = seg_start + phdr->p_memsz;

        ElfW(Addr) seg_page_start = PAGE_START(seg_start);
        ElfW(Addr) seg_page_end   = PAGE_END(seg_end);

        ElfW(Addr) seg_file_end   = seg_start + phdr->p_filesz;

        // File offsets.
        ElfW(Addr) file_start = phdr->p_offset;
        ElfW(Addr) file_end   = file_start + phdr->p_filesz;

        ElfW(Addr) file_page_start = PAGE_START(file_start);
        ElfW(Addr) file_length = file_end - file_page_start;

        if (file_length != 0) {
          void* seg_addr = mmap(reinterpret_cast<void*>(seg_page_start),
                                file_length,  // based on the size in the file, not the size in memory
                                PFLAGS_TO_PROT(phdr->p_flags),
                                MAP_FIXED|MAP_PRIVATE,
                                fd_,
                                file_page_start);
          if (seg_addr == MAP_FAILED) {
            DL_ERR("couldn't map \"%s\" segment %zd: %s", name_, i, strerror(errno));
            return false;
          }
        }

        /* If the segment is writable and its file content does not end on a page
           boundary, zero the memory from the end of the content to the end of
           that page. */
        if ((phdr->p_flags & PF_W) != 0 && PAGE_OFFSET(seg_file_end) > 0) {
          memset(reinterpret_cast<void*>(seg_file_end), 0, PAGE_SIZE - PAGE_OFFSET(seg_file_end));
        }

        seg_file_end = PAGE_END(seg_file_end);

        // seg_file_end is now the first page address after the file
        // content. If seg_end is larger, we need to zero anything
        // between them. This is done by using a private anonymous
        // map for all extra pages.
        if (seg_page_end > seg_file_end) {
          void* zeromap = mmap(reinterpret_cast<void*>(seg_file_end),
                               seg_page_end - seg_file_end,
                               PFLAGS_TO_PROT(phdr->p_flags),
                               MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
                               -1,
                               0);
          if (zeromap == MAP_FAILED) {
            DL_ERR("couldn't zero fill \"%s\" gap: %s", name_, strerror(errno));
            return false;
          }
        }
      }
      return true;
    }

This part is simple: each loadable segment of the ELF file is mapped into memory in turn, followed by some cleanup (zeroing the tail of the last file-backed page of a writable segment, and mapping anonymous zero pages wherever p_memsz exceeds p_filesz).
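To see which ranges LoadSegments would map for a given segment, the address arithmetic can be separated from the mmap calls. The sketch below only computes the ranges; SegmentPlan and plan_segment are hypothetical names, and 4 KiB pages with 32-bit addresses are assumed.

```cpp
#include <cassert>
#include <cstdint>

const uint32_t kPageSize = 4096;  // assumption: 4 KiB pages

inline uint32_t page_start(uint32_t x)  { return x & ~(kPageSize - 1); }
inline uint32_t page_offset(uint32_t x) { return x & (kPageSize - 1); }
inline uint32_t page_end(uint32_t x)    { return page_start(x + (kPageSize - 1)); }

// Hypothetical summary of the two mappings made for one PT_LOAD segment.
struct SegmentPlan {
    uint32_t file_map_addr;    // address of the file-backed mapping
    uint32_t file_map_len;     // bytes mapped from the file (0 means no mapping)
    uint32_t file_map_offset;  // page-aligned file offset passed to mmap
    uint32_t zero_map_addr;    // address of the anonymous zero-fill mapping
    uint32_t zero_map_len;     // bytes of anonymous memory (0 means none needed)
};

inline SegmentPlan plan_segment(uint32_t p_offset, uint32_t p_vaddr,
                                uint32_t p_filesz, uint32_t p_memsz,
                                uint32_t load_bias) {
    // mmap needs the file offset and the address to agree modulo the page size.
    assert(page_offset(p_vaddr) == page_offset(p_offset));

    uint32_t seg_start      = p_vaddr + load_bias;
    uint32_t seg_page_start = page_start(seg_start);
    uint32_t seg_page_end   = page_end(seg_start + p_memsz);
    uint32_t seg_file_end   = page_end(seg_start + p_filesz);

    uint32_t file_page_start = page_start(p_offset);
    uint32_t file_length     = p_offset + p_filesz - file_page_start;

    SegmentPlan plan;
    plan.file_map_addr   = seg_page_start;
    plan.file_map_len    = file_length;
    plan.file_map_offset = file_page_start;
    plan.zero_map_addr   = seg_file_end;
    plan.zero_map_len    = seg_page_end > seg_file_end ? seg_page_end - seg_file_end : 0;
    return plan;
}
```

For the second example segment (filesz 0x2000, memsz 0x8000) with a bias of 0xa0000000, 0x2000 bytes come from the file at 0xa0040000 and the remaining 0x6000 bytes from 0xa0042000 onward are anonymous zero pages.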
1.1.4 FindPhdr
FindPhdr determines the address of the program header table in memory. This is not the same as phdr_table_, which is a temporary mapping that is released before the so is relocated:

    bool ElfReader::FindPhdr() {
      const ElfW(Phdr)* phdr_limit = phdr_table_ + phdr_num_;

      // If there is a PT_PHDR segment, use its address directly.
      for (const ElfW(Phdr)* phdr = phdr_table_; phdr < phdr_limit; ++phdr) {
        if (phdr->p_type == PT_PHDR) {
          return CheckPhdr(load_bias_ + phdr->p_vaddr);
        }
      }

      // Otherwise, check the first loadable segment. If its file offset is 0, it
      // starts with the ELF header, and the in-memory address of the program header
      // table can be derived from it (although this is a bit more roundabout).
      for (const ElfW(Phdr)* phdr = phdr_table_; phdr < phdr_limit; ++phdr) {
        if (phdr->p_type == PT_LOAD) {
          if (phdr->p_offset == 0) {
            ElfW(Addr)  elf_addr = load_bias_ + phdr->p_vaddr;
            const ElfW(Ehdr)* ehdr = reinterpret_cast<const ElfW(Ehdr)*>(elf_addr);
            ElfW(Addr)  offset = ehdr->e_phoff;
            return CheckPhdr((ElfW(Addr))ehdr + offset);
          }
          break;
        }
      }

      DL_ERR("can't find loaded phdr for \"%s\"", name_);
      return false;
    }

To follow this code we need to know what the PT_PHDR segment type means: it specifies the location and size of the program header table itself, both in the file and in the memory image of the program. This segment type may not occur more than once in a file. It may occur only if the program header table is part of the memory image of the program, and if present it must precede any loadable segment entry.
That completes the analysis of how a so file is read and loaded. Note that Android loads a so purely in terms of segments; sections play no part at all. Also, VerifyElfHeader shows that Android validates only e_ident, e_type, e_version and e_machine in the ELF header (e_phnum must of course be correct too). This explains why the section-related header fields of some packed so files can be modified arbitrarily without the system reporting an error.
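The kind of check described for VerifyElfHeader can be sketched as follows. This is not the linker's code: ToyEhdr is a hypothetical, truncated view of a 32-bit ELF header, looks_loadable is an illustrative name, and the expected class and byte order (32-bit, little-endian) are assumptions. The numeric constants are the standard ELF values.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, truncated view of a 32-bit ELF header: only the first fields,
// which are the ones the text says are validated.
struct ToyEhdr {
    unsigned char e_ident[16];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
};

const uint16_t kEtDyn = 3;      // ET_DYN: shared object
const uint16_t kEmArm = 40;     // EM_ARM
const uint32_t kEvCurrent = 1;  // EV_CURRENT

// Returns true if the header fields look like a loadable shared object.
// Section-related fields are deliberately never inspected.
inline bool looks_loadable(const ToyEhdr& h, uint16_t expected_machine) {
    if (std::memcmp(h.e_ident, "\177ELF", 4) != 0) return false;  // magic
    if (h.e_ident[4] != 1) return false;  // EI_CLASS: ELFCLASS32 (assumed target)
    if (h.e_ident[5] != 1) return false;  // EI_DATA: ELFDATA2LSB (assumed target)
    if (h.e_type != kEtDyn) return false;
    if (h.e_version != kEvCurrent) return false;
    if (h.e_machine != expected_machine) return false;
    return true;
}
```

Because nothing here reads e_shoff, e_shnum or e_shstrndx, a header with scrambled section fields still passes, which matches the observation about packed so files.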
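1.2 How a shared object is linked
Section 1.1 covered how Android loads a so in detail; now we turn to how it is linked. Before reading the linker's linking code, we need some background on dynamic linking in ELF.
1.2.1 The dynamic section
If an object file takes part in dynamic linking, its program header table contains an entry of type PT_DYNAMIC. This "segment" contains the .dynamic section, which is an array. The section is marked by the special symbol _DYNAMIC and holds an array of the following structure:

    typedef struct {
        Elf32_Sword d_tag;
        union {
            Elf32_Word d_val;
            Elf32_Addr d_ptr;
        } d_un;
    } Elf32_Dyn;

    extern Elf32_Dyn _DYNAMIC[];  // note that this is an array

    /* Note: for each object of this type, d_tag controls how d_un is interpreted:
       d_val  an Elf32_Word holding an integer value, with various interpretations.
       d_ptr  an Elf32_Addr holding a program virtual address.
       For the values of d_tag, their meaning, and how they relate to d_un, see ELF.PDF, p. 24. */

This Elf32_Dyn array is the dynamic member of the soinfo structure. In load_library, shown earlier, si->dynamic is set to NULL, which tells us the value is not needed while loading, only while linking. Linking of Android shared libraries is also done by the linker, mainly in soinfo_link_image in linker.cpp (called from find_library_internal). The function is long, so we go through it in pieces: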
First, the dynamic section has to be located through the program header table:

    /* in function soinfo_link_image */
        /* Extract the dynamic section. */
        size_t dynamic_count;
        ElfW(Word) dynamic_flags;
        /* si->dynamic is an ElfW(Dyn) pointer, i.e. the Elf32_Dyn _DYNAMIC[] array mentioned above. */
        phdr_table_get_dynamic_section(phdr, phnum, base, &si->dynamic,
                                       &dynamic_count, &dynamic_flags);

This function is straightforward:

    /* Returns the address and size of the ELF file's dynamic section in memory,
     * or NULL if the section is missing.
     * Input:
     *   phdr_table  -> program header table
     *   phdr_count  -> number of entries in tables
     *   load_bias   -> load bias
     * Output:
     *   dynamic       -> address of table in memory (NULL on failure).
     *   dynamic_count -> number of items in table (0 on failure).
     *   dynamic_flags -> protection flags for section (unset on failure)
     */
    void phdr_table_get_dynamic_section(const ElfW(Phdr)* phdr_table, size_t phdr_count,
                                        ElfW(Addr) load_bias,
                                        ElfW(Dyn)** dynamic, size_t* dynamic_count, ElfW(Word)* dynamic_flags) {
      const ElfW(Phdr)* phdr = phdr_table;
      const ElfW(Phdr)* phdr_limit = phdr + phdr_count;

      for (phdr = phdr_table; phdr < phdr_limit; phdr++) {
        if (phdr->p_type != PT_DYNAMIC) {
          continue;
        }

        *dynamic = reinterpret_cast<ElfW(Dyn)*>(load_bias + phdr->p_vaddr);
        if (dynamic_count) {
          *dynamic_count = (unsigned)(phdr->p_memsz / 8);
          // As section 1.2.1 showed, an Elf32_Dyn occupies 8 bytes, and the PT_DYNAMIC
          // segment holds an array of Elf32_Dyn, so dynamic_count is the segment's
          // memsz / 8 (on 32-bit targets).
        }
        if (dynamic_flags) {
          *dynamic_flags = phdr->p_flags;
        }
        return;
      }
      *dynamic = NULL;
      if (dynamic_count) {
        *dynamic_count = 0;
      }
    }

Once the dynamic section has been located, the so can be linked using the Elf32_Dyn array it contains. The linker extracts the useful information by walking the array and handling each entry according to its d_tag:

    /* in function soinfo_link_image */
        // Extract useful information from the dynamic section.
        uint32_t needed_count = 0;
        // Walk the dyn array from the start, handling each entry according to its tag.
        for (ElfW(Dyn)* d = si->dynamic; d->d_tag != DT_NULL; ++d) {  // a DT_NULL entry marks the end of the _DYNAMIC array
            ........
            switch (d->d_tag) {
            case DT_HASH:
                ........
                break;
            case DT_STRTAB:
                si->strtab = reinterpret_cast<const char*>(base + d->d_un.d_ptr);
                break;
            case DT_SYMTAB:
                si->symtab = reinterpret_cast<ElfW(Sym)*>(base + d->d_un.d_ptr);
                break;
            case DT_JMPREL:
    #if defined(USE_RELA)
                si->plt_rela = reinterpret_cast<ElfW(Rela)*>(base + d->d_un.d_ptr);
    #else
                si->plt_rel = reinterpret_cast<ElfW(Rel)*>(base + d->d_un.d_ptr);
    #endif
                break;
            case DT_PLTRELSZ:
    #if defined(USE_RELA)
                si->plt_rela_count = d->d_un.d_val / sizeof(ElfW(Rela));
    #else
                si->plt_rel_count = d->d_un.d_val / sizeof(ElfW(Rel));
    #endif
                break;
    #if defined(__mips__)
            case DT_PLTGOT:
                // Used by mips and mips64.
                si->plt_got = reinterpret_cast<ElfW(Addr)**>(base + d->d_un.d_ptr);
                break;
    #endif
            ........
    #if defined(USE_RELA)
            case DT_RELA:
                si->rela = reinterpret_cast<ElfW(Rela)*>(base + d->d_un.d_ptr);
                break;
            case DT_RELASZ:
                si->rela_count = d->d_un.d_val / sizeof(ElfW(Rela));
                break;
            case DT_REL:
                DL_ERR("unsupported DT_REL in \"%s\"", si->name);
                return false;
            case DT_RELSZ:
                DL_ERR("unsupported DT_RELSZ in \"%s\"", si->name);
                return false;
    #else
            case DT_REL:
                si->rel = reinterpret_cast<ElfW(Rel)*>(base + d->d_un.d_ptr);
                break;
            case DT_RELSZ:
                si->rel_count = d->d_un.d_val / sizeof(ElfW(Rel));
                break;
            case DT_RELA:
                DL_ERR("unsupported DT_RELA in \"%s\"", si->name);
                return false;
    #endif
            case DT_INIT:  // normally seen only in executables
                si->init_func = reinterpret_cast<linker_function_t>(base + d->d_un.d_ptr);
                DEBUG("%s constructors (DT_INIT) found at %p", si->name, si->init_func);
                break;
            case DT_FINI:
                si->fini_func = reinterpret_cast<linker_function_t>(base + d->d_un.d_ptr);
                DEBUG("%s destructors (DT_FINI) found at %p", si->name, si->fini_func);
                break;
            case DT_INIT_ARRAY:
                si->init_array = reinterpret_cast<linker_function_t*>(base + d->d_un.d_ptr);
                DEBUG("%s constructors (DT_INIT_ARRAY) found at %p", si->name, si->init_array);
                break;
            case DT_INIT_ARRAYSZ:
                si->init_array_count = ((unsigned)d->d_un.d_val) / sizeof(ElfW(Addr));
                break;
            case DT_FINI_ARRAY:
                si->fini_array = reinterpret_cast<linker_function_t*>(base + d->d_un.d_ptr);
                DEBUG("%s destructors (DT_FINI_ARRAY) found at %p", si->name, si->fini_array);
                break;
            case DT_FINI_ARRAYSZ:
                si->fini_array_count = ((unsigned)d->d_un.d_val) / sizeof(ElfW(Addr));
                break;
            case DT_PREINIT_ARRAY:
                si->preinit_array = reinterpret_cast<linker_function_t*>(base + d->d_un.d_ptr);
                DEBUG("%s constructors (DT_PREINIT_ARRAY) found at %p", si->name, si->preinit_array);
                break;
            case DT_PREINIT_ARRAYSZ:
                si->preinit_array_count = ((unsigned)d->d_un.d_val) / sizeof(ElfW(Addr));
                break;
            case DT_TEXTREL:
    #if defined(__LP64__)
                DL_ERR("text relocations (DT_TEXTREL) found in 64-bit ELF file \"%s\"", si->name);
                return false;
    #else
                si->has_text_relocations = true;
                break;
    #endif
            case DT_SYMBOLIC:
                si->has_DT_SYMBOLIC = true;
                break;
            case DT_NEEDED:
                ++needed_count;
                break;
            case DT_FLAGS:
                if (d->d_un.d_val & DF_TEXTREL) {
                    ........
                    si->has_text_relocations = true;
                }
                if (d->d_un.d_val & DF_SYMBOLIC) {
                    si->has_DT_SYMBOLIC = true;
                }
                break;
    #if defined(__mips__)
            case DT_STRSZ:
            case DT_SYMENT:
            case DT_RELENT:
                break;
            case DT_MIPS_RLD_MAP:
                // Set the DT_MIPS_RLD_MAP entry to the address of _r_debug for GDB.
                {
                  r_debug** dp = reinterpret_cast<r_debug**>(base + d->d_un.d_ptr);
                  *dp = &_r_debug;
                }
                break;
            case DT_MIPS_RLD_VERSION:
            case DT_MIPS_FLAGS:
            case DT_MIPS_BASE_ADDRESS:
            case DT_MIPS_UNREFEXTNO:
                break;

            case DT_MIPS_SYMTABNO:
                si->mips_symtabno = d->d_un.d_val;
                break;

            case DT_MIPS_LOCAL_GOTNO:
                si->mips_local_gotno = d->d_un.d_val;
                break;

            case DT_MIPS_GOTSYM:
                si->mips_gotsym = d->d_un.d_val;
                break;
    #endif
            default:
                DEBUG("Unused DT entry: type %p arg %p",
                      reinterpret_cast<void*>(d->d_tag), reinterpret_cast<void*>(d->d_un.d_val));
                break;
            }
        }

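The pattern of walking the dynamic array until DT_NULL, and of resolving DT_NEEDED entries through the string table, can be shown on a hand-built array. This is a sketch only: ToyDyn mirrors the 32-bit Elf32_Dyn layout, needed_libraries is a hypothetical helper, and the tag constants are the standard ELF values.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Mirrors the 32-bit Elf32_Dyn layout: a tag plus a value/pointer union.
struct ToyDyn {
    int32_t d_tag;
    union {
        uint32_t d_val;
        uint32_t d_ptr;
    } d_un;
};
static_assert(sizeof(ToyDyn) == 8, "a 32-bit dynamic entry occupies 8 bytes");

const int32_t kDtNull = 0;    // DT_NULL: end of the array
const int32_t kDtNeeded = 1;  // DT_NEEDED: d_val is an offset into the string table
const int32_t kDtStrtab = 5;  // DT_STRTAB: d_ptr is the address of the string table

// Collects the names of all required libraries, in the order they are listed.
inline std::vector<std::string> needed_libraries(const ToyDyn* dynamic,
                                                 const char* strtab) {
    assert(dynamic != nullptr && strtab != nullptr);
    std::vector<std::string> names;
    for (const ToyDyn* d = dynamic; d->d_tag != kDtNull; ++d) {
        if (d->d_tag == kDtNeeded) {
            names.push_back(strtab + d->d_un.d_val);
        }
    }
    return names;
}
```

The static_assert also confirms the 8-byte entry size that phdr_table_get_dynamic_section relies on when it divides p_memsz by 8.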
After the walk over the dynamic array all the useful information has been collected, and it is time to act on it:

    /* in function soinfo_link_image */
        // Check once more; this is always a sensible thing to do.
        if (relocating_linker && needed_count != 0) {
            DL_ERR("linker cannot have DT_NEEDED dependencies on other libraries");
            return false;
        }
        if (si->nbucket == 0) {
            DL_ERR("empty/missing DT_HASH in \"%s\" (built with --hash-style=gnu?)", si->name);
            return false;
        }
        if (si->strtab == 0) {
            DL_ERR("empty/missing DT_STRTAB in \"%s\"", si->name);
            return false;
        }
        if (si->symtab == 0) {
            DL_ERR("empty/missing DT_SYMTAB in \"%s\"", si->name);
            return false;
        }

        // If this is the main executable, then load all of the libraries from LD_PRELOAD now.
        // gLdPreloadNames has not appeared anywhere in the analysis so far. That is
        // because an executable does not start from dlopen: it starts from the kernel's
        // execv, and only after several layers of calls reaches the linker's
        // linker_init_post_relocation function, which calls parse_LD_PRELOAD to fill in
        // gLdPreloadNames.
        if (si->flags & FLAG_EXE) {
            memset(gLdPreloads, 0, sizeof(gLdPreloads));
            size_t preload_count = 0;
            for (size_t i = 0; gLdPreloadNames[i] != NULL; i++) {
                soinfo* lsi = find_library(gLdPreloadNames[i]);
                if (lsi != NULL) {
                    gLdPreloads[preload_count++] = lsi;
                } else {
                    ........
                }
            }
        }

        // Allocate an array of soinfo* to hold the soinfo of every library this so depends on.
        soinfo** needed = reinterpret_cast<soinfo**>(alloca((1 + needed_count) * sizeof(soinfo*)));
        soinfo** pneeded = needed;

        // Fetch the soinfo of each external library listed in the dynamic array.
        for (ElfW(Dyn)* d = si->dynamic; d->d_tag != DT_NULL; ++d) {
            if (d->d_tag == DT_NEEDED) {
                const char* library_name = si->strtab + d->d_un.d_val;  // look up the library name by its string table index
                DEBUG("%s needs %s", si->name, library_name);
                soinfo* lsi = find_library(library_name);  // get the soinfo of that library
                if (lsi == NULL) {
                    ........
                }
                *pneeded++ = lsi;
            }
        }
        *pneeded = NULL;

    #if !defined(__LP64__)
        if (si->has_text_relocations) {
            // Make segments writable to allow text relocations to work properly. We will later call
            // phdr_table_protect_segments() after all of them are applied and all constructors are run.
            DL_WARN("%s has text relocations. This is wasting memory and prevents "
                    "security hardening. Please fix.", si->name);
            if (phdr_table_unprotect_segments(si->phdr, si->phnum, si->load_bias) < 0) {
                DL_ERR("can't unprotect loadable segments for \"%s\": %s",
                       si->name, strerror(errno));
                return false;
            }
        }
    #endif

    #if defined(USE_RELA)
        if (si->plt_rela != NULL) {
            DEBUG("[ relocating %s plt ]\n", si->name);
            if (soinfo_relocate(si, si->plt_rela, si->plt_rela_count, needed)) {
                return false;
            }
        }
        if (si->rela != NULL) {
            DEBUG("[ relocating %s ]\n", si->name);
            if (soinfo_relocate(si, si->rela, si->rela_count, needed)) {
                return false;
            }
        }
    #else
        if (si->plt_rel != NULL) {
            DEBUG("[ relocating %s plt ]", si->name);
            if (soinfo_relocate(si, si->plt_rel, si->plt_rel_count, needed)) {
                return false;
            }
        }
        if (si->rel != NULL) {
            DEBUG("[ relocating %s ]", si->name);
            if (soinfo_relocate(si, si->rel, si->rel_count, needed)) {
                return false;
            }
        }
    #endif

    #if defined(__mips__)
        if (!mips_relocate_got(si, needed)) {
            return false;
        }
    #endif

        si->flags |= FLAG_LINKED;
        DEBUG("[ finished linking %s ]", si->name);

    #if !defined(__LP64__)
        if (si->has_text_relocations) {
            // All relocations are done, we can protect our segments back to read-only.
            if (phdr_table_protect_segments(si->phdr, si->phnum, si->load_bias) < 0) {
                DL_ERR("can't protect segments for \"%s\": %s",
                       si->name, strerror(errno));
                return false;
            }
        }
    #endif

        /* We can also turn on GNU RELRO protection */
        if (phdr_table_protect_gnu_relro(si->phdr, si->phnum, si->load_bias) < 0) {
            DL_ERR("can't enable GNU RELRO protection for \"%s\": %s",
                   si->name, strerror(errno));
            return false;
        }

        notify_gdb_of_load(si);
        return true;
    }

0x02 Running the shared object
Once soinfo_link_image has finished inside find_library_internal, control returns to the caller find_library, and from there back to do_dlopen:

    soinfo* do_dlopen(const char* name, int flags) {
      if ((flags & ~(RTLD_NOW|RTLD_LAZY|RTLD_LOCAL|RTLD_GLOBAL)) != 0) {
        DL_ERR("invalid flags to dlopen: %x", flags);
        return NULL;
      }
      set_soinfo_pool_protection(PROT_READ | PROT_WRITE);
      soinfo* si = find_library(name);
      if (si != NULL) {
        si->CallConstructors();
      }
      set_soinfo_pool_protection(PROT_READ);
      return si;
    }

If the returned si is not NULL, the so was loaded and linked successfully, and its initialization functions can now run:

    void soinfo::CallConstructors() {
      ........
      // DT_INIT should be called before DT_INIT_ARRAY if both are present.
      // If the file has both .init and .init_array sections, the .init code runs
      // first, followed by the .init_array code.
      CallFunction("DT_INIT", init_func);
      CallArray("DT_INIT_ARRAY", init_array, init_array_count, false);
    }

Since we are only looking at so libraries, the call that matters is CallArray("DT_INIT_ARRAY", init_array, init_array_count, false):

    void soinfo::CallArray(const char* array_name UNUSED, linker_function_t* functions, size_t count, bool reverse) {
      ........
      // reverse selects whether the functions in .init_array run front to back or
      // back to front; the default is front to back.
      int begin = reverse ? (count - 1) : 0;
      int end = reverse ? -1 : count;
      int step = reverse ? -1 : 1;

      for (int i = begin; i != end; i += step) {
        TRACE("[ %s[%d] == %p ]", array_name, i, functions[i]);
        CallFunction("function", functions[i]);  // call each function in init_array in turn
      }
      ........
    }

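The structure and purpose of the init_array section deserve a short explanation.
First, its layout. The section holds pointers to pieces of initialization code, which normally run before main. In C++ programs this is how static constructors get run; another use is to initialize parts of the C library's I/O system. Opening a so file that has an init_array section in IDA shows data like the following:
[Figure: IDA listing of the .init_array section, showing three function pointers]
There are three function pointers here, each holding the address of a function. Note that every pointer value in the listing has 1 added to it: bit 0 of the address being set tells the processor to switch from ARM to Thumb state, so the code at the target address is interpreted and executed as Thumb instructions.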
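The role of the low bit can be made concrete with two small helpers. This is an illustrative sketch with hypothetical names, treating an init_array entry as a plain 32-bit value.

```cpp
#include <cassert>
#include <cstdint>

// True if bit 0 of the entry is set, i.e. the target should run in Thumb state.
inline bool is_thumb(uint32_t entry) {
    return (entry & 1u) != 0;
}

// Address of the first instruction: the entry with the state bit cleared.
inline uint32_t code_address(uint32_t entry) {
    return entry & ~1u;
}
```

So an entry of 0x1235 denotes Thumb code whose first instruction is at 0x1234, which is why the values stored in the section look one byte off from the function addresses.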
Now for the implementation of CallFunction:

    void soinfo::CallFunction(const char* function_name UNUSED, linker_function_t function) {
      // Return immediately if the function address is NULL or -1.
      if (function == NULL || reinterpret_cast<uintptr_t>(function) == static_cast<uintptr_t>(-1)) {
        return;
      }
      ........
      function();  // call the function the pointer refers to

      // The function may have called dlopen(3) or dlclose(3), so we need to ensure our data structures
      // are still writable. This happens with our debug malloc (see http://b/7941716).
      set_soinfo_pool_protection(PROT_READ | PROT_WRITE);
    }

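The combined behaviour of CallArray and CallFunction (iteration order plus the NULL and -1 guards) can be reproduced in a small standalone model. The names init_fn, call_log and the ctor_* functions are hypothetical and exist only to make the call order observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef void (*init_fn)();

// Records the order in which the toy constructors run.
inline std::vector<int>& call_log() {
    static std::vector<int> log;
    return log;
}
inline void ctor_a() { call_log().push_back(1); }
inline void ctor_b() { call_log().push_back(2); }
inline void ctor_c() { call_log().push_back(3); }

// Same guard as CallFunction: skip NULL and -1 entries, call everything else.
inline void call_function(init_fn fn) {
    if (fn == nullptr ||
        reinterpret_cast<uintptr_t>(fn) == static_cast<uintptr_t>(-1)) {
        return;
    }
    fn();
}

// Same traversal as CallArray: front to back, or back to front if reverse is set.
inline void call_array(init_fn* functions, size_t count, bool reverse) {
    assert(functions != nullptr || count == 0);
    int begin = reverse ? static_cast<int>(count) - 1 : 0;
    int end   = reverse ? -1 : static_cast<int>(count);
    int step  = reverse ? -1 : 1;
    for (int i = begin; i != end; i += step) {
        call_function(functions[i]);
    }
}
```

With an array holding ctor_a, a NULL entry, ctor_b, a -1 entry and ctor_c, a forward pass runs a, b, c and skips the two sentinels, while a reverse pass runs c, b, a.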
That completes the analysis of how the Android linker loads, links and initializes a so.