Mach-O Format Analysis

Published: 2025/3/15


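0x00 Abstract

Life has no root or stem; it drifts like dust on the road. Scattered, it tumbles with the wind; this body is no longer what it was.

- Tao Yuanming, "Miscellaneous Poems"

Mach-O is the executable file format on OS X, analogous to PE on Windows and ELF on Linux. Without a thorough understanding of the Mach-O format and the concepts around it, any further research is like building a castle in the air.

Every Mach-O file contains a Mach-O header, followed by the load commands (Load Commands), and finally the data blocks (Data).

The rest of this article analyzes the whole Mach-O format in detail.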

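0x01 A Brief Introduction to the Mach-O Format

The layout of a Mach-O file is shown in the figure below:

It consists of the following parts:

Header: stores basic information about the Mach-O file, including the target platform, the file type, and the number of load commands.
LoadCommands: this region immediately follows the Header; when the file is loaded, its contents determine the memory layout.
Data: the actual contents of each segment are stored here, including the code, the data, and so on.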
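
The three-part layout can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes a native-endian file, uses the header sizes implied by the struct definitions in the next section (28 bytes for 32-bit, 32 bytes for 64-bit), and the type and function names (`macho_layout`, `macho_layout_from_header`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the magic values; not taken from a system header. */
#define SKETCH_MH_MAGIC    0xfeedfaceu
#define SKETCH_MH_MAGIC_64 0xfeedfacfu

/* Hypothetical helper type: byte ranges of the first two regions of the file. */
struct macho_layout {
    size_t header_size; /* 28 bytes (32-bit) or 32 bytes (64-bit) */
    size_t cmds_start;  /* the load commands begin right after the header */
    size_t cmds_end;    /* cmds_start + sizeofcmds */
};

/* Returns 0 on success, -1 if magic is not a native-endian Mach-O magic. */
int macho_layout_from_header(uint32_t magic, uint32_t sizeofcmds,
                             struct macho_layout *out)
{
    assert(out != NULL);
    if (magic == SKETCH_MH_MAGIC_64)
        out->header_size = 32;
    else if (magic == SKETCH_MH_MAGIC)
        out->header_size = 28;
    else
        return -1;
    out->cmds_start = out->header_size;
    out->cmds_end = out->header_size + (size_t)sizeofcmds;
    return 0;
}
```

For a 64-bit file whose load commands total 1432 bytes, this puts the load commands in bytes 32 through 1463 of the file.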

0x02 Headers

2.1 Data Structures

The header definitions can be found in the open-source kernel code.

 /*
 * The 32-bit mach header appears at the very beginning of the object file for
 * 32-bit architectures.
 */
struct mach_header {
	uint32_t	magic;		/* mach magic number identifier */
	cpu_type_t	cputype;	/* cpu specifier */
	cpu_subtype_t	cpusubtype;	/* machine specifier */
	uint32_t	filetype;	/* type of file */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* the size of all the load commands */
	uint32_t	flags;		/* flags */
};

/* Constant for the magic field of the mach_header (32-bit architectures) */
#define	MH_MAGIC	0xfeedface	/* the mach magic number */
#define MH_CIGAM	0xcefaedfe	/* NXSwapInt(MH_MAGIC) */

/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
	uint32_t	magic;		/* mach magic number identifier */
	cpu_type_t	cputype;	/* cpu specifier */
	cpu_subtype_t	cpusubtype;	/* machine specifier */
	uint32_t	filetype;	/* type of file */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* the size of all the load commands */
	uint32_t	flags;		/* flags */
	uint32_t	reserved;	/* reserved */
};

/* Constant for the magic field of the mach_header_64 (64-bit architectures) */
#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */
 

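From the definitions of mach_header and mach_header_64, it is clear that the main job of the header is to let the system quickly determine the runtime environment (the CPU type) and the file type of a Mach-O file.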
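
As a rough illustration of how the four magic values are used, the sketch below classifies a magic number read from the first four bytes of a file. The constants are local copies of the values listed above, and `magic_info`, `sketch_swap32`, and `classify_magic` are hypothetical names, not part of any system header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MH_MAGIC    0xfeedfaceu
#define SKETCH_MH_CIGAM    0xcefaedfeu
#define SKETCH_MH_MAGIC_64 0xfeedfacfu
#define SKETCH_MH_CIGAM_64 0xcffaedfeu

struct magic_info {
    bool is_64;   /* the file uses mach_header_64 */
    bool swapped; /* the file's byte order differs from the host's */
};

/* Reverses the byte order of a 32-bit value. */
uint32_t sketch_swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Returns 0 and fills *out for a known magic, -1 otherwise. */
int classify_magic(uint32_t magic, struct magic_info *out)
{
    assert(out != NULL);
    switch (magic) {
    case SKETCH_MH_MAGIC:    out->is_64 = false; out->swapped = false; return 0;
    case SKETCH_MH_CIGAM:    out->is_64 = false; out->swapped = true;  return 0;
    case SKETCH_MH_MAGIC_64: out->is_64 = true;  out->swapped = false; return 0;
    case SKETCH_MH_CIGAM_64: out->is_64 = true;  out->swapped = true;  return 0;
    default:                 return -1;
    }
}
```

When `swapped` is set, every multi-byte header field would have to be passed through a byte swap such as `sketch_swap32` before it is interpreted.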

2.2 Example

Let's use some tools to analyze a real Mach-O file and look at its header.

otool prints the contents of the Mach header, although the output is not very readable.

 $ otool -h git
git:
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777223          3  0x80           2    17       1432 0x00200085
 

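Another tool, MachOView, shows the same information more clearly.

The magic number is 0xFEEDFACF, so this is a file for a 64-bit platform.
CPU Type and CPU SubType are straightforward: the file runs on x86_64 CPUs.
File Type indicates that this is an executable; it is analyzed in detail below.
Flags indicates four properties of this Mach-O file; they are analyzed in detail below.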
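
The cputype value 16777223 printed by otool is easier to read in hexadecimal: it is 0x01000007, the 64-bit ABI bit combined with the base CPU type 7 (x86). The sketch below shows that decomposition. The mask values are local copies of the CPU_ARCH_MASK and CPU_ARCH_ABI64 constants from the Mach headers, the function names are hypothetical, and the field is treated as unsigned for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CPU_ARCH_MASK  0xff000000u /* top byte holds ABI/capability bits */
#define SKETCH_CPU_ARCH_ABI64 0x01000000u /* set for 64-bit ABIs */
#define SKETCH_CPU_TYPE_X86   7u

/* True when the 64-bit ABI bit is set in cputype. */
bool cputype_is_64bit(uint32_t cputype)
{
    return (cputype & SKETCH_CPU_ARCH_ABI64) != 0;
}

/* Strips the architecture bits, leaving the base CPU family. */
uint32_t cputype_base(uint32_t cputype)
{
    return cputype & ~SKETCH_CPU_ARCH_MASK;
}
```

For 16777223, `cputype_is_64bit` is true and `cputype_base` yields 7, which matches the x86_64 reading given above.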

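2.3 Field Details

2.3.1 FileType

Mach-O is used not only for executables but also for other kinds of files:

Kernel extensions
Libraries
Core dumps
…

The file types are defined in the source as follows: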

 #define	MH_OBJECT	0x1		/* relocatable object file */
#define	MH_EXECUTE	0x2		/* demand paged executable file */
#define	MH_FVMLIB	0x3		/* fixed VM shared library file */
#define	MH_CORE		0x4		/* core file */
#define	MH_PRELOAD	0x5		/* preloaded executable file */
#define	MH_DYLIB	0x6		/* dynamically bound shared library */
#define	MH_DYLINKER	0x7		/* dynamic link editor */
#define	MH_BUNDLE	0x8		/* dynamically bound bundle file */
#define	MH_DYLIB_STUB	0x9		/* shared library stub for static */
					/*  linking only, no section contents */
#define	MH_DSYM		0xa		/* companion file with only debug */
					/*  sections */
#define	MH_KEXT_BUNDLE	0xb		/* x86_64 kexts */
 

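The most commonly used file types are explained below.

File Type	Purpose	Example

MH_OBJECT	Intermediate object file produced during compilation	gcc -c xxx.c produces xxx.o
MH_EXECUTE	Executable binary	/usr/bin/git
MH_CORE	Core dump	The dump file written on a crash
MH_DYLIB	Dynamic library	The libraries under /usr/lib/
MH_DYLINKER	Dynamic linker	/usr/lib/dyld
MH_KEXT_BUNDLE	Kernel extension	A simple kernel module you develop yourself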
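
A small lookup table is one way to turn the numeric filetype field into the names used in this table. The sketch below covers only the six types listed here; the table and function names (`k_filetypes`, `filetype_name`, `filetype_value`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct filetype_entry {
    uint32_t value;
    const char *name;
};

/* Only the file types discussed in the table; values as in the defines above. */
static const struct filetype_entry k_filetypes[] = {
    { 0x1, "MH_OBJECT" },   { 0x2, "MH_EXECUTE" },  { 0x4, "MH_CORE" },
    { 0x6, "MH_DYLIB" },    { 0x7, "MH_DYLINKER" }, { 0xb, "MH_KEXT_BUNDLE" },
};
#define N_FILETYPES (sizeof k_filetypes / sizeof k_filetypes[0])

/* Returns the symbolic name for a filetype value, or "unknown". */
const char *filetype_name(uint32_t value)
{
    for (size_t i = 0; i < N_FILETYPES; i++)
        if (k_filetypes[i].value == value)
            return k_filetypes[i].name;
    return "unknown";
}

/* Looks a name up in the table; returns 0 and sets *out on success, -1 otherwise. */
int filetype_value(const char *name, uint32_t *out)
{
    assert(name != NULL && out != NULL);
    for (size_t i = 0; i < N_FILETYPES; i++) {
        if (strcmp(k_filetypes[i].name, name) == 0) {
            *out = k_filetypes[i].value;
            return 0;
        }
    }
    return -1;
}
```

The filetype value 2 shown by otool for git maps to MH_EXECUTE with this table.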

2.3.2 flags

The Mach-O header also carries flags, several of which are important load options for dyld. They are defined in the source as follows:

 #define	MH_INCRLINK	0x2		/* the object file is the output of an
					   incremental link against a base file
					   and can't be link edited again */
#define MH_DYLDLINK	0x4		/* the object file is input for the
					   dynamic linker and can't be staticly
					   link edited again */
#define MH_BINDATLOAD	0x8		/* the object file's undefined
					   references are bound by the dynamic
					   linker when loaded. */
#define MH_PREBOUND	0x10		/* the file has its dynamic undefined
					   references prebound. */
#define MH_SPLIT_SEGS	0x20		/* the file has its read-only and
					   read-write segments split */
#define MH_LAZY_INIT	0x40		/* the shared library init routine is
					   to be run lazily via catching memory
					   faults to its writeable segments
					   (obsolete) */
#define MH_TWOLEVEL	0x80		/* the image is using two-level name
					   space bindings */
...
// The full list is long; see the source if you are interested
// EXTERNAL_HEADERS/mach-o/x86_64/loader.h
 

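Again, here is a brief description of a few of the more important ones.

Flag	Meaning

MH_NOUNDEFS	The file has no undefined symbols, so there are no unresolved link dependencies
MH_DYLDLINK	The file is input for dyld and cannot be statically linked again
MH_PIE	The main executable may be loaded at a randomized address (ASLR)
MH_ALLOW_STACK_EXECUTION	Stack memory is executable; this is normally off by default
MH_NO_HEAP_EXECUTION	Heap memory is not executable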
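
The flags value 0x00200085 from the otool output earlier can be decoded by testing one bit at a time. The sketch below does that for a handful of flags. The bit values are copied from loader.h as far as I know them (the three that are not shown in the excerpt above are 0x20000 for MH_ALLOW_STACK_EXECUTION, 0x200000 for MH_PIE, and 0x1000000 for MH_NO_HEAP_EXECUTION), and `describe_flags` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct flag_name {
    uint32_t bit;
    const char *name;
};

/* A subset of the header flags, in ascending bit order. */
static const struct flag_name k_flag_names[] = {
    { 0x1u,       "MH_NOUNDEFS" },
    { 0x4u,       "MH_DYLDLINK" },
    { 0x80u,      "MH_TWOLEVEL" },
    { 0x20000u,   "MH_ALLOW_STACK_EXECUTION" },
    { 0x200000u,  "MH_PIE" },
    { 0x1000000u, "MH_NO_HEAP_EXECUTION" },
};

/* Writes a space-separated list of the known flags set in `flags` into buf.
 * Returns the number of names written in full. */
int describe_flags(uint32_t flags, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    int count = 0;
    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof k_flag_names / sizeof k_flag_names[0]; i++) {
        if ((flags & k_flag_names[i].bit) == 0)
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", k_flag_names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* the buffer is full; stop appending */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

For 0x00200085 this reports four names: MH_NOUNDEFS, MH_DYLDLINK, MH_TWOLEVEL, and MH_PIE, which lines up with the four properties mentioned for the git binary.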

2.4 Headers Summary

0x03 Load Commands

This is the load_command data structure:

 struct load_command {
	uint32_t cmd;		/* type of load command */
	uint32_t cmdsize;	/* total size of command in bytes */
};
 

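The load commands immediately follow the header, and their total size is already given in the Mach-O header (sizeofcmds). Once the header has been read, the rest of the file is loaded by parsing the load commands. I took a brief look at how the kernel parses Mach-O data; setting aside the kernel's implementation details, the logic is quite simple.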

 static
load_return_t
parse_machfile(
	struct vnode 		*vp,       
	vm_map_t		map,
	thread_t		thread,
	struct mach_header	*header,
	off_t			file_offset,
	off_t			macho_size,
	int			depth,
	int64_t			aslr_offset,
	int64_t			dyld_aslr_offset,
	load_result_t		*result
)
{
	[...] // a large amount of initialization and checking is omitted here

		/*
		 * Loop through each of the load_commands indicated by the
		 * Mach-O header; if an absurd value is provided, we just
		 * run off the end of the reserved section by incrementing
		 * the offset too far, so we are implicitly fail-safe.
		 */
		offset = mach_header_sz;
		ncmds = header->ncmds;

		while (ncmds--) {
			/*
			 *	Get a pointer to the command.
			 */
			lcp = (struct load_command *)(addr + offset);
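			// lcp now points at the command that is about to be parsed
			oldoffset = offset;
			// oldoffset is the offset from the start of the Mach-O image in memory to the current command
			offset += lcp->cmdsize;
			// after adding the length of the current command, offset is the distance from the start of the image to the next command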
			/*
			 * Perform prevalidation of the struct load_command
			 * before we attempt to use its contents.  Invalid
			 * values are ones which result in an overflow, or
			 * which can not possibly be valid commands, or which
			 * straddle or exist past the reserved section at the
			 * start of the image.
			 */
			if (oldoffset > offset ||
			    lcp->cmdsize < sizeof(struct load_command) ||
			    offset > header->sizeofcmds + mach_header_sz) {
				ret = LOAD_BADMACHO;
				break;
			}
			// a sanity check; it has nothing to do with how the data is loaded into memory

			/*
			 * Act on struct load_command's for which kernel
			 * intervention is required.
			 */
			switch(lcp->cmd) {
			case LC_SEGMENT:
				[...]
				ret = load_segment(lcp,
				                   header->filetype,
				                   control,
				                   file_offset,
				                   macho_size,
				                   vp,
				                   map,
				                   slide,
				                   result);
				break;
			case LC_SEGMENT_64:
				[...]
				ret = load_segment(lcp,
				                   header->filetype,
				                   control,
				                   file_offset,
				                   macho_size,
				                   vp,
				                   map,
				                   slide,
				                   result);
				break;
			case LC_UNIXTHREAD:
				if (pass != 1)
					break;
				ret = load_unixthread(
						 (struct thread_command *) lcp,
						 thread,
						 slide,
						 result);
				break;
			case LC_MAIN:
				if (pass != 1)
					break;
				if (depth != 1)
					break;
				ret = load_main(
						 (struct entry_point_command *) lcp,
						 thread,
						 slide,
						 result);
				break;
			case LC_LOAD_DYLINKER:
				if (pass != 3)
					break;
				if ((depth == 1) && (dlp == 0)) {
					dlp = (struct dylinker_command *)lcp;
					dlarchbits = (header->cputype & CPU_ARCH_MASK);
				} else {
					ret = LOAD_FAILURE;
				}
				break;
			case LC_UUID:
				if (pass == 1 && depth == 1) {
					ret = load_uuid((struct uuid_command *) lcp,
							(char *)addr + mach_header_sz + header->sizeofcmds,
							result);
				}
				break;
			case LC_CODE_SIGNATURE:
				[...]
				ret = load_code_signature(
					(struct linkedit_data_command *) lcp,
					vp,
					file_offset,
					macho_size,
					header->cputype,
					result);
				[...]
				break;
#if CONFIG_CODE_DECRYPTION
			case LC_ENCRYPTION_INFO:
			case LC_ENCRYPTION_INFO_64:
				if (pass != 3)
					break;
				ret = set_code_unprotect(
					(struct encryption_info_command *) lcp,
					addr, map, slide, vp, file_offset,
					header->cputype, header->cpusubtype);
				if (ret != LOAD_SUCCESS) {
					printf("proc %d: set_code_unprotect() error %d "
					       "for file \"%s\"\n",
					       p->p_pid, ret, vp->v_name);
					/* 
					 * Don't let the app run if it's 
					 * encrypted but we failed to set up the
					 * decrypter. If the keys are missing it will
					 * return LOAD_DECRYPTFAIL.
					 */
					 if (ret == LOAD_DECRYPTFAIL) {
						/* failed to load due to missing FP keys */
						proc_lock(p);
						p->p_lflag |= P_LTERM_DECRYPTFAIL;
						proc_unlock(p);
					 }
					 psignal(p, SIGKILL);
				}
				break;
#endif
			default:
				/* Other commands are ignored by the kernel */
				ret = LOAD_SUCCESS;
				break;
			}
			if (ret != LOAD_SUCCESS)
				break;
		}
		if (ret != LOAD_SUCCESS)
			break;
	}

	[...] // the post-load processing code is omitted here
}
 

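3.1 The cmdsize Field

The first few lines inside the while loop show how the cmdsize field of load_command is used to step through the load commands of a Mach-O file.

 ...
lcp = (struct load_command *)(addr + offset);
// lcp now points at the command that is about to be parsed
oldoffset = offset;
// oldoffset is the offset from the start of the Mach-O image in memory to the current command
offset += lcp->cmdsize;
// after adding the length of the current command, offset is the distance from the start of the image to the next command
...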
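
The same stepping logic can be tried in user space on a plain byte buffer. The sketch below is not the kernel's code: it is a simplified walk that assumes native byte order, takes a pointer to the first load command (just past the header), and applies bounds checks in the spirit of the kernel's prevalidation. `sketch_load_command` and `count_load_commands` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct load_command. */
struct sketch_load_command {
    uint32_t cmd;     /* type of load command */
    uint32_t cmdsize; /* total size of the command in bytes */
};

/* Walks ncmds load commands stored in cmds[0 .. sizeofcmds) and counts those
 * whose cmd equals wanted. Returns -1 if a command is malformed or would run
 * past the end of the load command area. */
int count_load_commands(const uint8_t *cmds, uint32_t sizeofcmds,
                        uint32_t ncmds, uint32_t wanted)
{
    assert(cmds != NULL);
    uint32_t offset = 0; /* invariant: offset <= sizeofcmds */
    int found = 0;

    while (ncmds--) {
        struct sketch_load_command lc;
        if (sizeofcmds - offset < sizeof lc)
            return -1; /* no room left for another command header */
        memcpy(&lc, cmds + offset, sizeof lc); /* avoids unaligned reads */
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - offset)
            return -1; /* too small to be valid, or straddles the end */
        if (lc.cmd == wanted)
            found++;
        offset += lc.cmdsize; /* step to the next command */
    }
    return found;
}
```

Because each cmdsize is checked against the remaining space before it is added, the offset can never overflow or leave the buffer, which is the property the kernel's comparison of oldoffset and offset is protecting.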
 

3.2 The cmd Field

 switch(lcp->cmd) {
			case LC_SEGMENT:
				[...]
				ret = load_segment(lcp,
				                   header->filetype,
				                   control,
				                   file_offset,
				                   macho_size,
				                   vp,
				                   map,
				                   slide,
				                   result);
				break;
			case LC_SEGMENT_64:
				[...]
				ret = load_segment(lcp,
				                   header->filetype,
				                   control,
				                   file_offset,
				                   macho_size,
				                   vp,
				                   map,
				                   slide,
				                   result);
				break;
			case LC_UNIXTHREAD:
				if (pass != 1)
					break;
				ret = load_unixthread(
						 (struct thread_command *) lcp,
						 thread,
						 slide,
						 result);
				break;
			case LC_MAIN:
				if (pass != 1)
					break;
				if (depth != 1)
					break;
				ret = load_main(
						 (struct entry_point_command *) lcp,
						 thread,
						 slide,
						 result);
				break;
			case LC_LOAD_DYLINKER:
				if (pass != 3)
					break;
				if ((depth == 1) && (dlp == 0)) {
					dlp = (struct dylinker_command *)lcp;
					dlarchbits = (header->cputype & CPU_ARCH_MASK);
				} else {
					ret = LOAD_FAILURE;
				}
				break;
			case LC_UUID:
				if (pass == 1 && depth == 1) {
					ret = load_uuid((struct uuid_command *) lcp,
							(char *)addr + mach_header_sz + header->sizeofcmds,
							result);
				}
				break;
			case LC_CODE_SIGNATURE:
				[...]
				ret = load_code_signature(
					(struct linkedit_data_command *) lcp,
					vp,
					file_offset,
					macho_size,
					header->cputype,
					result);
				[...]
				break;
#if CONFIG_CODE_DECRYPTION
			case LC_ENCRYPTION_INFO:
			case LC_ENCRYPTION_INFO_64:
				if (pass != 3)
					break;
				ret = set_code_unprotect(
					(struct encryption_info_command *) lcp,
					addr, map, slide, vp, file_offset,
					header->cputype, header->cpusubtype);
				if (ret != LOAD_SUCCESS) {
					printf("proc %d: set_code_unprotect() error %d "
					       "for file \"%s\"\n",
					       p->p_pid, ret, vp->v_name);
					/* 
					 * Don't let the app run if it's 
					 * encrypted but we failed to set up the
					 * decrypter. If the keys are missing it will
					 * return LOAD_DECRYPTFAIL.
					 */
					 if (ret == LOAD_DECRYPTFAIL) {
						/* failed to load due to missing FP keys */
						proc_lock(p);
						p->p_lflag |= P_LTERM_DECRYPTFAIL;
						proc_unlock(p);
					 }
					 psignal(p, SIGKILL);
				}
				break;
#endif
			default:
				/* Other commands are ignored by the kernel */
				ret = LOAD_SUCCESS;
				break;
			}
 

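This code shows that a different function does the loading depending on the value of the cmd field. The table below lists what the different command types do in the kernel code.

Command	Handler	Purpose

LC_SEGMENT; LC_SEGMENT_64	load_segment	Loads the segment's data and maps it into the process's address space
LC_LOAD_DYLINKER	load_dylinker	Loads the dynamic linker, /usr/lib/dyld
LC_UUID	load_uuid	Loads the 128-bit unique ID
LC_THREAD	load_thread	Starts a Mach thread, without allocating a stack
LC_UNIXTHREAD	load_unixthread	Starts a UNIX thread
LC_CODE_SIGNATURE	load_code_signature	Loads the code signature
LC_ENCRYPTION_INFO	set_code_unprotect	Sets up decryption for an encrypted binary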
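
The dispatch described by this table can be mirrored with a small switch. The sketch below only classifies a command; it does not load anything. The numeric LC_* values are the loader.h values as I understand them, and the enum and function names (`kernel_action`, `kernel_action_for`) are hypothetical. Commands that are not in the table fall through to the "ignored" case, matching the default branch of the kernel's switch.

```c
#include <assert.h>
#include <stdint.h>

/* Which handler from the table a command maps to. */
enum kernel_action {
    KA_IGNORED = 0,
    KA_LOAD_SEGMENT,
    KA_LOAD_THREAD,
    KA_LOAD_UNIXTHREAD,
    KA_LOAD_DYLINKER,
    KA_LOAD_UUID,
    KA_LOAD_CODE_SIGNATURE,
    KA_SET_CODE_UNPROTECT
};

/* Maps a load command type to the handler listed in the table. */
enum kernel_action kernel_action_for(uint32_t cmd)
{
    switch (cmd) {
    case 0x1:  /* LC_SEGMENT */
    case 0x19: /* LC_SEGMENT_64 */
        return KA_LOAD_SEGMENT;
    case 0x4:  /* LC_THREAD */
        return KA_LOAD_THREAD;
    case 0x5:  /* LC_UNIXTHREAD */
        return KA_LOAD_UNIXTHREAD;
    case 0xe:  /* LC_LOAD_DYLINKER */
        return KA_LOAD_DYLINKER;
    case 0x1b: /* LC_UUID */
        return KA_LOAD_UUID;
    case 0x1d: /* LC_CODE_SIGNATURE */
        return KA_LOAD_CODE_SIGNATURE;
    case 0x21: /* LC_ENCRYPTION_INFO */
        return KA_SET_CODE_UNPROTECT;
    default:   /* everything else is left to dyld or other tools */
        return KA_IGNORED;
    }
}
```

A symbol table command such as LC_SYMTAB (0x2), for example, is classified as ignored here, since it does not appear in the table.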

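0x04 Segment & Section

When data is loaded, the commands that matter most are LC_SEGMENT and LC_SEGMENT_64. The purposes of the other commands were covered briefly in the previous section and are not explored further here.

The data structures for LC_SEGMENT and LC_SEGMENT_64 are as follows.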

 
struct segment_command { /* for 32-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT */
	uint32_t	cmdsize;	/* includes sizeof section structs */
	char		segname[16];	/* segment name */
	uint32_t	vmaddr;		/* memory address of this segment */
	uint32_t	vmsize;		/* memory size of this segment */
	uint32_t	fileoff;	/* file offset of this segment */
	uint32_t	filesize;	/* amount to map from the file */
	vm_prot_t	maxprot;	/* maximum VM protection */
	vm_prot_t	initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */
};


struct segment_command_64 { /* for 64-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT_64 */
	uint32_t	cmdsize;	/* includes sizeof section_64 structs */
	char		segname[16];	/* segment name */
	uint64_t	vmaddr;		/* memory address of this segment */
	uint64_t	vmsize;		/* memory size of this segment */
	uint64_t	fileoff;	/* file offset of this segment */
	uint64_t	filesize;	/* amount to map from the file */
	vm_prot_t	maxprot;	/* maximum VM protection */
	vm_prot_t	initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */
};
 

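As you can see, most of these fields help the kernel map the segment into virtual memory. The field to focus on is nsects, which gives the number of sections in the segment. Sections are where the actual useful data is stored.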
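
The cmdsize comment in the struct ("includes sizeof section_64 structs") ties nsects to the size of the command: the section headers sit directly after the segment command. The sketch below works that out for the 64-bit case. The structs are local mirrors of the definitions in this article, with vm_prot_t written as int32_t, and `expected_segment_cmdsize_64` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of segment_command_64. */
struct sketch_segment_command_64 {
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint64_t vmaddr;
    uint64_t vmsize;
    uint64_t fileoff;
    uint64_t filesize;
    int32_t  maxprot;
    int32_t  initprot;
    uint32_t nsects;
    uint32_t flags;
};

/* Local mirror of section_64. */
struct sketch_section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr;
    uint64_t size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};

_Static_assert(sizeof(struct sketch_segment_command_64) == 72, "unexpected padding");
_Static_assert(sizeof(struct sketch_section_64) == 80, "unexpected padding");

/* cmdsize of an LC_SEGMENT_64 that carries nsects section headers. */
uint32_t expected_segment_cmdsize_64(uint32_t nsects)
{
    /* Guard against 32-bit overflow for absurd section counts. */
    assert(nsects <= (UINT32_MAX - sizeof(struct sketch_segment_command_64)) /
                         sizeof(struct sketch_section_64));
    return (uint32_t)(sizeof(struct sketch_segment_command_64) +
                      nsects * sizeof(struct sketch_section_64));
}
```

A segment with no sections has a 72-byte command, and each section adds 80 bytes, so a segment with two sections has a cmdsize of 232.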

The Section data structures are as follows:

 struct section { /* for 32-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint32_t	addr;		/* memory address of this section */
	uint32_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
};

struct section_64 { /* for 64-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint64_t	addr;		/* memory address of this section */
	uint64_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
	uint32_t	reserved3;	/* reserved */
};
 

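Apart from the fields that likewise assist memory mapping, all you need to know when learning the Mach-O format is that different sections serve different purposes.

Section	Purpose

__text	Code
__cstring	Hard-coded C string literals
__const	Variables declared const
__DATA.__bss	The bss section (uninitialized static data)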
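
Looking a section up by name has one trap: sectname and segname are fixed 16-byte fields, and a name that is exactly 16 characters long has no terminating NUL. The sketch below compares such fields safely. It uses a cut-down struct holding only the two name fields, and all names in it (`sketch_section_names`, `name16_equals`, `find_section`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Only the two name fields of a section header, for illustration. */
struct sketch_section_names {
    char sectname[16];
    char segname[16];
};

/* Compares a fixed 16-byte name field with a C string. The field is
 * NUL-padded when shorter than 16 bytes and unterminated when exactly 16. */
bool name16_equals(const char field[16], const char *name)
{
    assert(field != NULL && name != NULL);
    size_t n = strlen(name);
    if (n > 16)
        return false;
    if (memcmp(field, name, n) != 0)
        return false;
    return n == 16 || field[n] == '\0';
}

/* Returns the first section matching both names, or NULL if none does. */
const struct sketch_section_names *
find_section(const struct sketch_section_names *sects, size_t nsects,
             const char *segname, const char *sectname)
{
    for (size_t i = 0; i < nsects; i++) {
        if (name16_equals(sects[i].segname, segname) &&
            name16_equals(sects[i].sectname, sectname))
            return &sects[i];
    }
    return NULL;
}
```

Calling `find_section(sects, n, "__TEXT", "__text")` on the section headers of a segment would then return the code section, if present.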

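Sections are already the finest-grained unit, and there are many more specialized sections that are not listed here one by one. When you run into an unfamiliar section type, consult Apple's documentation.

0x05 Summary

A careful analysis of the Mach-O format gives a better understanding of how Mach-O files are loaded, and lays the groundwork for studying dyld or other components of OS X.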

Original article: http://turingh.github.io/2016/03/07/mach-o%E6%96%87%E4%BB%B6%E6%A0%BC%E5%BC%8F%E5%88%86%E6%9E%90/
