Grasping Linux Kernel Design Ideas (12): Memory Management and the Slab Allocator
[Copyright notice: please respect original work. When reposting, keep the source: blog.csdn.net/shallnet. This article is for study and exchange only; do not use it for commercial purposes.]
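The previous section ended by noting that if requests for small memory areas were served by the buddy system, a great deal of free space inside each page would be left unusable. The slab allocator was introduced to handle requests for small memory areas (tens or hundreds of bytes). The main purpose of slab in Linux is to reduce the number of calls to the buddy algorithm.
The kernel often reuses the same kind of memory area. For example, whenever the kernel creates a new process, it has to allocate memory for the data structures associated with that process (task_struct, open file objects, and so on). When the process ends, those memory areas are reclaimed. Because processes are created and destroyed very frequently, Linux keeps these frequently used pages in a cache and reuses them.
The slab allocator manages memory in terms of objects. Objects of the same type are grouped into one class (process descriptors, for example, form one class). Whenever such an object is requested, the slab allocator hands out a free object; when the object is released, it is kept in the slab allocator again instead of being returned directly to the buddy system.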
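The "hand out a free object, keep it on release" idea can be sketched in user space. This is only a toy model under stated assumptions: `toy_cache` and its functions are hypothetical names invented for illustration, and `malloc()`/`free()` stand in for the buddy system. It is not how the kernel implements `kmem_cache`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy cache for one object type; NOT the kernel's kmem_cache. */
struct toy_cache {
    size_t obj_size;      /* size of each object in bytes */
    void **free_objs;     /* stack of released objects kept for reuse */
    size_t nr_free;       /* number of objects currently on the stack */
    size_t capacity;      /* maximum number of objects the stack can hold */
    size_t backend_calls; /* how many times we had to fall back to malloc() */
};

/* Set up an empty cache; returns 0 on success, -1 if out of memory. */
int toy_cache_init(struct toy_cache *c, size_t obj_size, size_t capacity)
{
    assert(c != NULL && obj_size > 0 && capacity > 0);
    c->free_objs = malloc(capacity * sizeof(*c->free_objs));
    if (!c->free_objs)
        return -1;
    c->obj_size = obj_size;
    c->nr_free = 0;
    c->capacity = capacity;
    c->backend_calls = 0;
    return 0;
}

/* Hand out a cached free object if one exists; otherwise ask the backend. */
void *toy_cache_alloc(struct toy_cache *c)
{
    if (c->nr_free > 0)
        return c->free_objs[--c->nr_free];
    c->backend_calls++;
    return malloc(c->obj_size);
}

/* Keep the object for reuse; give it back only when the cache is full. */
void toy_cache_free(struct toy_cache *c, void *obj)
{
    if (!obj)
        return;
    if (c->nr_free < c->capacity)
        c->free_objs[c->nr_free++] = obj;
    else
        free(obj);
}

/* Return every cached object to the backend and release the stack. */
void toy_cache_destroy(struct toy_cache *c)
{
    while (c->nr_free > 0)
        free(c->free_objs[--c->nr_free]);
    free(c->free_objs);
    c->free_objs = NULL;
}
```

In this sketch, allocating, freeing, and allocating again returns the same pointer the second time, and `backend_calls` stays at 1, which is the saving the text describes.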
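Frequently requested objects are handled by creating special-purpose objects of the appropriate size. Infrequently requested objects are handled with a series of objects whose sizes are geometrically distributed (see general-purpose objects).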
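As a rough illustration of geometrically distributed sizes, the sketch below rounds a request up to the smallest size class that fits. The table `toy_sizes` and the function names are assumptions made for this example; the kernel's actual list of general-purpose sizes depends on its version and configuration.

```c
#include <stddef.h>

/* Hypothetical size classes, each twice the previous one. */
static const size_t toy_sizes[] = { 32, 64, 128, 256, 512, 1024, 2048, 4096 };
#define TOY_NR_SIZES (sizeof(toy_sizes) / sizeof(toy_sizes[0]))

/* Index of the smallest class that can hold `size` bytes, or -1 if none can. */
int toy_size_index(size_t size)
{
    for (size_t i = 0; i < TOY_NR_SIZES; i++)
        if (size <= toy_sizes[i])
            return (int)i;
    return -1;
}

/* Size of the class chosen for `size`, or 0 if the request is too large. */
size_t toy_size_class(size_t size)
{
    int i = toy_size_index(size);
    return i < 0 ? 0 : toy_sizes[i];
}

/* Bytes lost to rounding up (internal fragmentation) for one request. */
size_t toy_wasted_bytes(size_t size)
{
    size_t cls = toy_size_class(size);
    return cls ? cls - size : 0;
}
```

With doubling sizes, a request is rounded up to less than twice its own size, so at most about half of the chosen object is wasted. That is the trade-off for not creating a dedicated cache for every rarely used size.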
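The slab scheme groups objects into caches. Because the way caches are organized and managed is closely tied to the hit rate of the hardware cache, a slab cache is not built directly from individual objects. Instead it is built from a series of "slabs", and each slab contains a number of objects of the same type, each of which is either allocated or free. In practice, a cache is a region of main memory divided into several blocks. Each block is a slab, each slab consists of one or more pages, and the objects are stored inside the slabs.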
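The relationship between pages, slabs, and objects can be made concrete with a little arithmetic. The sketch below assumes a 4096-byte page and ignores the space the slab's own management data may occupy, so it gives an upper bound rather than the number the kernel would compute. All names are hypothetical.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* A slab of order n spans 2^n contiguous pages. */
size_t toy_slab_bytes(unsigned int order)
{
    return (size_t)TOY_PAGE_SIZE << order;
}

/* Upper bound on objects per slab, ignoring slab management overhead. */
size_t toy_objs_per_slab(unsigned int order, size_t obj_size)
{
    if (obj_size == 0)
        return 0;
    return toy_slab_bytes(order) / obj_size;
}

/* Bytes left unused at the end of the slab after packing the objects. */
size_t toy_slab_leftover(unsigned int order, size_t obj_size)
{
    if (obj_size == 0)
        return toy_slab_bytes(order);
    return toy_slab_bytes(order) % obj_size;
}
```

For example, under these assumptions a single-page slab holds 16 objects of 256 bytes with nothing left over, while a two-page slab holds 8 objects of 1000 bytes and leaves 192 bytes unused.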
Slab-related data structures:
A cache is represented by the kmem_cache structure.
struct kmem_cache {
/* 1) per-cpu data, touched during every alloc/free */
    struct array_cache *array[NR_CPUS];
/* 2) Cache tunables. Protected by cache_chain_mutex */
    unsigned int batchcount;
    unsigned int limit;
    unsigned int shared;

    unsigned int buffer_size;
    u32 reciprocal_buffer_size;
/* 3) touched by every alloc & free from the backend */

    unsigned int flags;         /* constant flags */
    unsigned int num;           /* # of objs per slab */

/* 4) cache_grow/shrink */
    /* order of pgs per slab (2^n) */
    unsigned int gfporder;

    /* force GFP flags, e.g. GFP_DMA */
    gfp_t gfpflags;

    size_t colour;              /* cache colouring range */
    unsigned int colour_off;    /* colour offset */
    struct kmem_cache *slabp_cache;
    unsigned int slab_size;
    unsigned int dflags;        /* dynamic flags */

    /* constructor func */
    void (*ctor)(void *obj);

/* 5) cache creation/removal */
    const char *name;
    struct list_head next;

/* 6) statistics */
#ifdef CONFIG_DEBUG_SLAB
    unsigned long num_active;
    unsigned long num_allocations;
    unsigned long high_mark;
    unsigned long grown;
    unsigned long reaped;
    unsigned long errors;
    unsigned long max_freeable;
    unsigned long node_allocs;
    unsigned long node_frees;
    unsigned long node_overflow;
    atomic_t allochit;
    atomic_t allocmiss;
    atomic_t freehit;
    atomic_t freemiss;

    /*
     * If debugging is enabled, then the allocator can add additional
     * fields and/or padding to every object. buffer_size contains the total
     * object size including these internal fields, the following two
     * variables contain the offset to the user object and its size.
     */
    int obj_offset;
    int obj_size;
#endif /* CONFIG_DEBUG_SLAB */

    /*
     * We put nodelists[] at the end of kmem_cache, because we want to size
     * this array to nr_node_ids slots instead of MAX_NUMNODES
     * (see kmem_cache_init())
     * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache
     * is statically defined, so we reserve the max number of nodes.
     */
    struct kmem_list3 *nodelists[MAX_NUMNODES];
    /*
     * Do not add fields after nodelists[]
     */
};
Within it, the struct kmem_list3 structure links the slabs together and holds the shared cache. It is defined as follows:
/*
 * The slab lists for all objects.
 */
struct kmem_list3 {
    struct list_head slabs_partial; /* partial list first, better asm code */
    struct list_head slabs_full;
    struct list_head slabs_free;
    unsigned long free_objects;
    unsigned int free_limit;
    unsigned int colour_next;       /* Per-node cache coloring */
    spinlock_t list_lock;
    struct array_cache *shared;     /* shared per node */
    struct array_cache **alien;     /* on other nodes */
    unsigned long next_reap;        /* updated without locking */
    int free_touched;               /* updated without locking */
};
This structure contains three lists, slabs_partial, slabs_full, and slabs_free, which together hold all of the slabs in the cache. Each slab is described by a slab descriptor, struct slab:
/*
 * struct slab
 *
 * Manages the objs in a slab. Placed either at the beginning of mem allocated
 * for a slab, or allocated from an general cache.
 * Slabs are chained into three list: fully used, partial, fully free slabs.
 */
struct slab {
    struct list_head list;
    unsigned long colouroff;
    void *s_mem;            /* including colour offset */
    unsigned int inuse;     /* num of objs active in slab */
    kmem_bufctl_t free;
    unsigned short nodeid;
};
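Which of the three lists a slab belongs on follows from how many of its objects are in use. The user-space sketch below models only that rule. `toy_slab` and its helpers are hypothetical. In the kernel structures shown above, the per-slab capacity is `num` in `struct kmem_cache`, not a field of `struct slab`; it is stored per slab here only to keep the example short.

```c
/* The three states correspond to slabs_free, slabs_partial and slabs_full. */
enum toy_slab_state { TOY_SLAB_FREE, TOY_SLAB_PARTIAL, TOY_SLAB_FULL };

/* Hypothetical, simplified slab: only the counters that decide its list. */
struct toy_slab {
    unsigned int inuse; /* objects currently allocated from this slab */
    unsigned int num;   /* total objects the slab can hold */
};

/* Decide which list the slab should be on. */
enum toy_slab_state toy_slab_classify(const struct toy_slab *s)
{
    if (s->inuse == 0)
        return TOY_SLAB_FREE;
    if (s->inuse == s->num)
        return TOY_SLAB_FULL;
    return TOY_SLAB_PARTIAL;
}

/* Take one object from the slab; returns -1 if the slab is already full. */
int toy_slab_get(struct toy_slab *s)
{
    if (s->inuse == s->num)
        return -1;
    s->inuse++;
    return 0;
}

/* Give one object back to the slab; returns -1 if nothing was allocated. */
int toy_slab_put(struct toy_slab *s)
{
    if (s->inuse == 0)
        return -1;
    s->inuse--;
    return 0;
}
```

After each get or put, the caller would re-classify the slab and move it to the matching list if its state changed.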
A new cache is created with the following function:
struct kmem_cache *kmem_cache_create(const char *name, size_t size, size_t align,
                                     unsigned long flags, void (*ctor)(void *));
On success, the function returns a pointer to the newly created cache. A cache is destroyed by calling the following function:
void kmem_cache_destroy(struct kmem_cache *cachep);
Neither of these two functions may be called from interrupt context, because both may sleep.
Once a cache has been created, objects can be obtained from it with the following function:
/**
 * kmem_cache_alloc - Allocate an object
 * @cachep: The cache to allocate from.
 * @flags: See kmalloc().
 *
 * Allocate an object from this cache. The flags are only relevant
 * if the cache has no available objects.
 */
void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
    void *ret = __cache_alloc(cachep, flags, __builtin_return_address(0));

    trace_kmem_cache_alloc(_RET_IP_, ret,
                           obj_size(cachep), cachep->buffer_size, flags);

    return ret;
}
This function returns a pointer to an object from the given cache cachep.
If none of the slabs in the cache has a free object, the slab layer must obtain new pages through kmem_getpages(). The flags argument is passed on to __get_free_pages().
static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid);
An object is released with the following function:
/**
 * kmem_cache_free - Deallocate an object
 * @cachep: The cache the allocation was from.
 * @objp: The previously allocated object.
 *
 * Free an object which was previously allocated from this
 * cache.
 */
void kmem_cache_free(struct kmem_cache *cachep, void *objp)
{
    unsigned long flags;

    local_irq_save(flags);
    debug_check_no_locks_freed(objp, obj_size(cachep));
    if (!(cachep->flags & SLAB_DEBUG_OBJECTS))
        debug_check_no_obj_freed(objp, obj_size(cachep));
    __cache_free(cachep, objp);
    local_irq_restore(flags);

    trace_kmem_cache_free(_RET_IP_, objp);
}
If you need to create many objects of the same type frequently, you should consider using a slab cache.
In fact, the kmalloc() function covered in the previous section also allocates through the slab allocator.
static __always_inline void *kmalloc(size_t size, gfp_t flags)
{
    struct kmem_cache *cachep;
    void *ret;

    if (__builtin_constant_p(size)) {
        int i = 0;

        if (!size)
            return ZERO_SIZE_PTR;

#define CACHE(x) \
        if (size <= x) \
            goto found; \
        else \
            i++;
#include <linux/kmalloc_sizes.h>
#undef CACHE
        return NULL;
found:
#ifdef CONFIG_ZONE_DMA
        if (flags & GFP_DMA)
            cachep = malloc_sizes[i].cs_dmacachep;
        else
#endif
            cachep = malloc_sizes[i].cs_cachep;

        ret = kmem_cache_alloc_notrace(cachep, flags);

        trace_kmalloc(_THIS_IP_, ret,
                      size, slab_buffer_size(cachep), flags);

        return ret;
    }
    return __kmalloc(size, flags);
}
The kfree() function is implemented as follows:
/**
 * kfree - free previously allocated memory
 * @objp: pointer returned by kmalloc.
 *
 * If @objp is NULL, no operation is performed.
 *
 * Don't free memory not originally allocated by kmalloc()
 * or you will run into trouble.
 */
void kfree(const void *objp)
{
    struct kmem_cache *c;
    unsigned long flags;

    trace_kfree(_RET_IP_, objp);

    if (unlikely(ZERO_OR_NULL_PTR(objp)))
        return;
    local_irq_save(flags);
    kfree_debugcheck(objp);
    c = virt_to_cache(objp);
    debug_check_no_locks_freed(objp, obj_size(c));
    debug_check_no_obj_freed(objp, obj_size(c));
    __cache_free(c, (void *)objp);
    local_irq_restore(flags);
}
Finally, combining this with the previous section, here is how to choose an allocation function:
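If you need physically contiguous pages, use one of the low-level page allocators or kmalloc().
If you want to allocate from high memory, use alloc_pages().
If you do not need physically contiguous pages, only pages that are contiguous in virtual address space, use vmalloc().
If you are creating and destroying many large data structures, consider setting up a slab cache.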