當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

Golang源码探索(三) GC的实现原理

發布時間：2024/8/26 编程问答 31 豆豆

生活随笔收集整理的這篇文章主要介紹了 Golang源码探索(三) GC的实现原理小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

Golang從1.5開始引入了三色GC, 經過多次改進, 當前的1.9版本的GC停頓時間已經可以做到極短.
停頓時間的減少意味著"最大響應時間"的縮短, 這也讓go更適合編寫網絡服務程序.
這篇文章將通過分析golang的源代碼來講解go中的三色GC的實現原理.

這個系列分析的golang源代碼是Google官方的實現的1.9.2版本, 不適用于其他版本和gccgo等其他實現,
運行環境是Ubuntu 16.04 LTS 64bit.
首先會講解基礎概念, 然后講解分配器, 再講解收集器的實現.

基礎概念

內存結構

go在程序啟動時會分配一塊虛擬內存地址是連續的內存, 結構如下:

這一塊內存分為了3個區域, 在X64上大小分別是512M, 16G和512G, 它們的作用如下:

arena

arena區域就是我們通常說的heap, go從heap分配的內存都在這個區域中.

bitmap

bitmap區域用于表示arena區域中哪些地址保存了對象, 并且對象中哪些地址包含了指針.
bitmap區域中一個byte(8 bit)對應了arena區域中的四個指針大小的內存, 也就是2 bit對應一個指針大小的內存.
所以bitmap區域的大小是 512GB / 指針大小(8 byte) / 4 = 16GB.

bitmap區域中的一個byte對應arena區域的四個指針大小的內存的結構如下,
每一個指針大小的內存都會有兩個bit分別表示是否應該繼續掃描和是否包含指針:

bitmap中的byte和arena的對應關系從末尾開始, 也就是隨著內存分配會向兩邊擴展:

spans

spans區域用于表示arena區中的某一頁(Page)屬于哪個span, 什么是span將在下面介紹.
spans區域中一個指針(8 byte)對應了arena區域中的一頁(在go中一頁=8KB).
所以spans的大小是 512GB / 頁大小(8KB) * 指針大小(8 byte) = 512MB.

spans區域的一個指針對應arena區域的一頁的結構如下, 和bitmap不一樣的是對應關系會從開頭開始:

什么時候從Heap分配對象

很多講解go的文章和書籍中都提到過, go會自動確定哪些對象應該放在棧上, 哪些對象應該放在堆上.
簡單的來說, 當一個對象的內容可能在生成該對象的函數結束后被訪問, 那么這個對象就會分配在堆上.
在堆上分配對象的情況包括:

返回對象的指針
傳遞了對象的指針到其他函數
在閉包中使用了對象并且需要修改對象
使用new

在C語言中函數返回在棧上的對象的指針是非常危險的事情, 但在go中卻是安全的, 因為這個對象會自動在堆上分配.
go決定是否使用堆分配對象的過程也叫"逃逸分析".

GC Bitmap

GC在標記時需要知道哪些地方包含了指針, 例如上面提到的bitmap區域涵蓋了arena區域中的指針信息.
除此之外, GC還需要知道棧空間上哪些地方包含了指針,
因為棧空間不屬于arena區域, 棧空間的指針信息將會在函數信息里面.
另外, GC在分配對象時也需要根據對象的類型設置bitmap區域, 來源的指針信息將會在類型信息里面.

總結起來go中有以下的GC Bitmap:

bitmap區域: 涵蓋了arena區域, 使用2 bit表示一個指針大小的內存
函數信息: 涵蓋了函數的棧空間, 使用1 bit表示一個指針大小的內存 (位于stackmap.bytedata)
類型信息: 在分配對象時會復制到bitmap區域, 使用1 bit表示一個指針大小的內存 (位于_type.gcdata)

Span

span是用于分配對象的區塊, 下圖是簡單說明了Span的內部結構:

通常一個span包含了多個大小相同的元素, 一個元素會保存一個對象, 除非:

span用于保存大對象, 這種情況span只有一個元素
span用于保存極小對象且不包含指針的對象(tiny object), 這種情況span會用一個元素保存多個對象

span中有一個freeindex標記下一次分配對象時應該開始搜索的地址, 分配后freeindex會增加,
在freeindex之前的元素都是已分配的, 在freeindex之后的元素有可能已分配, 也有可能未分配.

span每次GC以后都可能會回收掉一些元素, allocBits用于標記哪些元素是已分配的, 哪些元素是未分配的.
使用freeindex + allocBits可以在分配時跳過已分配的元素, 把對象設置在未分配的元素中,
但因為每次都去訪問allocBits效率會比較慢, span中有一個整數型的allocCache用于緩存freeindex開始的bitmap, 緩存的bit值與原值相反.

gcmarkBits用于在gc時標記哪些對象存活,?每次gc以后gcmarkBits會變為allocBits.
需要注意的是span結構本身的內存是從系統分配的, 上面提到的spans區域和bitmap區域都只是一個索引.

Span的類型

span根據大小可以分為67個類型, 如下:

// class bytes/obj bytes/span objects tail waste max waste // 1 8 8192 1024 0 87.50% // 2 16 8192 512 0 43.75% // 3 32 8192 256 0 46.88% // 4 48 8192 170 32 31.52% // 5 64 8192 128 0 23.44% // 6 80 8192 102 32 19.07% // 7 96 8192 85 32 15.95% // 8 112 8192 73 16 13.56% // 9 128 8192 64 0 11.72% // 10 144 8192 56 128 11.82% // 11 160 8192 51 32 9.73% // 12 176 8192 46 96 9.59% // 13 192 8192 42 128 9.25% // 14 208 8192 39 80 8.12% // 15 224 8192 36 128 8.15% // 16 240 8192 34 32 6.62% // 17 256 8192 32 0 5.86% // 18 288 8192 28 128 12.16% // 19 320 8192 25 192 11.80% // 20 352 8192 23 96 9.88% // 21 384 8192 21 128 9.51% // 22 416 8192 19 288 10.71% // 23 448 8192 18 128 8.37% // 24 480 8192 17 32 6.82% // 25 512 8192 16 0 6.05% // 26 576 8192 14 128 12.33% // 27 640 8192 12 512 15.48% // 28 704 8192 11 448 13.93% // 29 768 8192 10 512 13.94% // 30 896 8192 9 128 15.52% // 31 1024 8192 8 0 12.40% // 32 1152 8192 7 128 12.41% // 33 1280 8192 6 512 15.55% // 34 1408 16384 11 896 14.00% // 35 1536 8192 5 512 14.00% // 36 1792 16384 9 256 15.57% // 37 2048 8192 4 0 12.45% // 38 2304 16384 7 256 12.46% // 39 2688 8192 3 128 15.59% // 40 3072 24576 8 0 12.47% // 41 3200 16384 5 384 6.22% // 42 3456 24576 7 384 8.83% // 43 4096 8192 2 0 15.60% // 44 4864 24576 5 256 16.65% // 45 5376 16384 3 256 10.92% // 46 6144 24576 4 0 12.48% // 47 6528 32768 5 128 6.23% // 48 6784 40960 6 256 4.36% // 49 6912 49152 7 768 3.37% // 50 8192 8192 1 0 15.61% // 51 9472 57344 6 512 14.28% // 52 9728 49152 5 512 3.64% // 53 10240 40960 4 0 4.99% // 54 10880 32768 3 128 6.24% // 55 12288 24576 2 0 11.45% // 56 13568 40960 3 256 9.99% // 57 14336 57344 4 0 5.35% // 58 16384 16384 1 0 12.49% // 59 18432 73728 4 0 11.11% // 60 19072 57344 3 128 3.57% // 61 20480 40960 2 0 6.87% // 62 21760 65536 3 256 6.25% // 63 24576 24576 1 0 11.45% // 64 27264 81920 3 128 10.00% // 65 28672 57344 2 0 4.91% // 66 32768 32768 1 0 12.50%

以類型(class)為1的span為例,
span中的元素大小是8 byte, span本身占1頁也就是8K, 一共可以保存1024個對象.

在分配對象時, 會根據對象的大小決定使用什么類型的span,
例如16 byte的對象會使用span 2, 17 byte的對象會使用span 3, 32 byte的對象會使用span 3.
從這個例子也可以看到, 分配17和32 byte的對象都會使用span 3, 也就是說部分大小的對象在分配時會浪費一定的空間.

有人可能會注意到, 上面最大的span的元素大小是32K, 那么分配超過32K的對象會在哪里分配呢?
超過32K的對象稱為"大對象", 分配大對象時, 會直接從heap分配一個特殊的span,
這個特殊的span的類型(class)是0, 只包含了一個大對象, span的大小由對象的大小決定.

特殊的span加上的66個標準的span, 一共組成了67個span類型.

Span的位置

在前一篇中我提到了P是一個虛擬的資源, 同一時間只能有一個線程訪問同一個P, 所以P中的數據不需要鎖.
為了分配對象時有更好的性能, 各個P中都有span的緩存(也叫mcache), 緩存的結構如下:

各個P中按span類型的不同, 有67*2=134個span的緩存,

其中scan和noscan的區別在于,
如果對象包含了指針, 分配對象時會使用scan的span,
如果對象不包含指針, 分配對象時會使用noscan的span.
把span分為scan和noscan的意義在于,
GC掃描對象的時候對于noscan的span可以不去查看bitmap區域來標記子對象, 這樣可以大幅提升標記的效率.

在分配對象時將會從以下的位置獲取適合的span用于分配:

首先從P的緩存(mcache)獲取, 如果有緩存的span并且未滿則使用, 這個步驟不需要鎖
然后從全局緩存(mcentral)獲取, 如果獲取成功則設置到P, 這個步驟需要鎖
最后從mheap獲取, 獲取后設置到全局緩存, 這個步驟需要鎖

在P中緩存span的做法跟CoreCLR中線程緩存分配上下文(Allocation Context)的做法相似,
都可以讓分配對象時大部分時候不需要線程鎖, 改進分配的性能.

分配對象的處理

分配對象的流程

go從堆分配對象時會調用newobject函數, 這個函數的流程大致如下:

首先會檢查GC是否在工作中, 如果GC在工作中并且當前的G分配了一定大小的內存則需要協助GC做一定的工作,
這個機制叫GC Assist, 用于防止分配內存太快導致GC回收跟不上的情況發生.

之后會判斷是小對象還是大對象, 如果是大對象則直接調用largeAlloc從堆中分配,
如果是小對象分3個階段獲取可用的span, 然后從span中分配對象:

首先從P的緩存(mcache)獲取
然后從全局緩存(mcentral)獲取, 全局緩存中有可用的span的列表
最后從mheap獲取, mheap中也有span的自由列表, 如果都獲取失敗則從arena區域分配

這三個階段的詳細結構如下圖:

數據類型的定義

分配對象涉及的數據類型包含:

p: 前一篇提到過, P是協程中的用于運行go代碼的虛擬資源
m: 前一篇提到過, M目前代表系統線程
g: 前一篇提到過, G就是goroutine
mspan: 用于分配對象的區塊
mcentral: 全局的mspan緩存, 一共有67*2=134個
mheap: 用于管理heap的對象, 全局只有一個

源代碼分析

go從堆分配對象時會調用newobject函數, 先從這個函數看起:

// implementation of new builtin // compiler (both frontend and SSA backend) knows the signature // of this function func newobject(typ *_type) unsafe.Pointer { return mallocgc(typ.size, typ, true) }

newobject調用了mallocgc函數:

// Allocate an object of size bytes. // Small objects are allocated from the per-P cache's free lists. // Large objects (> 32 kB) are allocated straight from the heap. func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer { if gcphase == _GCmarktermination { throw("mallocgc called with gcphase == _GCmarktermination") } if size == 0 { return unsafe.Pointer(&zerobase) } if debug.sbrk != 0 { align := uintptr(16) if typ != nil { align = uintptr(typ.align) } return persistentalloc(size, align, &memstats.other_sys) } // 判斷是否要輔助GC工作 // gcBlackenEnabled在GC的標記階段會開啟 // assistG is the G to charge for this allocation, or nil if // GC is not currently active. var assistG *g if gcBlackenEnabled != 0 { // Charge the current user G for this allocation. assistG = getg() if assistG.m.curg != nil { assistG = assistG.m.curg } // Charge the allocation against the G. We'll account // for internal fragmentation at the end of mallocgc. assistG.gcAssistBytes -= int64(size) // 會按分配的大小判斷需要協助GC完成多少工作 // 具體的算法將在下面講解收集器時說明 if assistG.gcAssistBytes < 0 { // This G is in debt. Assist the GC to correct // this before allocating. This must happen // before disabling preemption. gcAssistAlloc(assistG) } } // 增加當前G對應的M的lock計數, 防止這個G被搶占 // Set mp.mallocing to keep from being preempted by GC. mp := acquirem() if mp.mallocing != 0 { throw("malloc deadlock") } if mp.gsignal == getg() { throw("malloc during signal") } mp.mallocing = 1 shouldhelpgc := false dataSize := size // 獲取當前G對應的M對應的P的本地span緩存(mcache) // 因為M在擁有P后會把P的mcache設到M中, 這里返回的是getg().m.mcache c := gomcache() var x unsafe.Pointer noscan := typ == nil || typ.kind&kindNoPointers != 0 // 判斷是否小對象, maxSmallSize當前的值是32K if size <= maxSmallSize { // 如果對象不包含指針, 并且對象的大小小于16 bytes, 可以做特殊處理 // 這里是針對非常小的對象的優化, 因為span的元素最小只能是8 byte, 如果對象更小那么很多空間都會被浪費掉 // 非常小的對象可以整合在"class 2 noscan"的元素(大小為16 byte)中 if noscan && size < maxTinySize { // Tiny allocator. // // Tiny allocator combines several tiny allocation requests // into a single memory block. The resulting memory block // is freed when all subobjects are unreachable. The subobjects // must be noscan (don't have pointers), this ensures that // the amount of potentially wasted memory is bounded. // // Size of the memory block used for combining (maxTinySize) is tunable. // Current setting is 16 bytes, which relates to 2x worst case memory // wastage (when all but one subobjects are unreachable). // 8 bytes would result in no wastage at all, but provides less // opportunities for combining. // 32 bytes provides more opportunities for combining, // but can lead to 4x worst case wastage. // The best case winning is 8x regardless of block size. // // Objects obtained from tiny allocator must not be freed explicitly. // So when an object will be freed explicitly, we ensure that // its size >= maxTinySize. // // SetFinalizer has a special case for objects potentially coming // from tiny allocator, it such case it allows to set finalizers // for an inner byte of a memory block. // // The main targets of tiny allocator are small strings and // standalone escaping variables. On a json benchmark // the allocator reduces number of allocations by ~12% and // reduces heap size by ~20%. off := c.tinyoffset // Align tiny pointer for required (conservative) alignment. if size&7 == 0 { off = round(off, 8) } else if size&3 == 0 { off = round(off, 4) } else if size&1 == 0 { off = round(off, 2) } if off+size <= maxTinySize && c.tiny != 0 { // The object fits into existing tiny block. x = unsafe.Pointer(c.tiny + off) c.tinyoffset = off + size c.local_tinyallocs++ mp.mallocing = 0 releasem(mp) return x } // Allocate a new maxTinySize block. span := c.alloc[tinySpanClass] v := nextFreeFast(span) if v == 0 { v, _, shouldhelpgc = c.nextFree(tinySpanClass) } x = unsafe.Pointer(v) (*[2]uint64)(x)[0] = 0 (*[2]uint64)(x)[1] = 0 // See if we need to replace the existing tiny block with the new one // based on amount of remaining free space. if size < c.tinyoffset || c.tiny == 0 { c.tiny = uintptr(x) c.tinyoffset = size } size = maxTinySize } else { // 否則按普通的小對象分配 // 首先獲取對象的大小應該使用哪個span類型 var sizeclass uint8 if size <= smallSizeMax-8 { sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv] } else { sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv] } size = uintptr(class_to_size[sizeclass]) // 等于sizeclass * 2 + (noscan ? 1 : 0) spc := makeSpanClass(sizeclass, noscan) span := c.alloc[spc] // 嘗試快速的從這個span中分配 v := nextFreeFast(span) if v ==

轉載于:https://www.cnblogs.com/xumaojun/p/8547439.html

總結

以上是生活随笔為你收集整理的Golang源码探索(三) GC的实现原理的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇： LeetCode 13. Roman t
下一篇： IoT与区块链的机遇与挑战