Why does HashMap convert a linked list to a red-black tree only when its length exceeds 8?
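Java's HashMap resolves hash collisions by separate chaining: key-value pairs that map to the same bucket index are stored in one linked list. As the list grows, both lookups and insertions (which must check whether the key already exists) have to traverse it, so they slow down. JDK 1.8 added a mechanism that converts the list into a red-black tree, but the conversion is not cheap, so a bin is treeified only when an element is added to a bin that already holds at least TREEIFY_THRESHOLD nodes.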
    /**
     * The bin count threshold for using a tree rather than list for a
     * bin. Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

TREEIFY_THRESHOLD defaults to 8. The relevant check in putVal() is:
    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
        treeifyBin(tab, hash);

In practice, a list longer than 8 is not always treeified. When table.length (the number of buckets) is less than MIN_TREEIFY_CAPACITY, HashMap resizes the table instead of converting the bin to a red-black tree.
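The two conditions described so far can be sketched as one small decision function. This is only an illustration, not JDK code: the class `BinGrowthSketch`, the enum `Action`, and the method `onAppend` are hypothetical names, and the two constants are assumed to have their JDK 8 default values of 8 and 64.

```java
// Hypothetical helper, not part of the JDK: it mirrors the two checks
// that decide what happens when a node is appended to a bin's list.
final class BinGrowthSketch {
    // Assumed JDK 8 defaults from java.util.HashMap.
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { KEEP_LIST, RESIZE, TREEIFY }

    /**
     * @param nodesBeforeInsert nodes already in the bin before the new one is appended
     * @param tableLength       current number of buckets
     */
    static Action onAppend(int nodesBeforeInsert, int tableLength) {
        // Corresponds to "binCount >= TREEIFY_THRESHOLD - 1" in putVal().
        if (nodesBeforeInsert < TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;
        }
        // Corresponds to the capacity check at the top of treeifyBin().
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(onAppend(7, 64)); // KEEP_LIST
        System.out.println(onAppend(8, 16)); // RESIZE
        System.out.println(onAppend(8, 64)); // TREEIFY
    }
}
```

Separating the two checks this way shows that a small table reacts to a long list by spreading its entries over more buckets first, which is usually cheaper than building a tree.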
    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

MIN_TREEIFY_CAPACITY defaults to 64. The source of treeifyBin() looks roughly like this:
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            // table is too small: resize first
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            // ...
        }
    }

Why 8?
How was the default value of 8 for TREEIFY_THRESHOLD chosen? The implementation notes in the source code give the answer:
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:

Treeifying a bin converts its Node objects into TreeNode objects, and a TreeNode is about twice the size of a Node. Besides the memory cost, treeify itself takes time. In addition, when the number of nodes in a tree bin drops to UNTREEIFY_THRESHOLD (default 6) or fewer, the tree is converted back into a linked list. If bins flipped back and forth between the two forms frequently, the overhead would be considerable.
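So we need to pick a value k such that a list is converted to a red-black tree once its length reaches k. k must not be too large, because long lists hurt insertion and lookup performance. k must not be too small either: treeify is not cheap, so we want the probability of a list reaching length k to be small enough that treeify is avoided in almost all cases.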
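The probability that a list has length k is the probability that k key-value pairs land in the same bucket. Suppose the table has length m, that is, m buckets, and that a key-value pair is equally likely to fall into any of them. With n key-value pairs in the map, the number X of pairs in a given bucket is binomial with parameters n and p, which for a large table is well approximated by a Poisson distribution:

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!} \tag{1}$$

where $\lambda = np$. Because every bucket is equally likely, $p = 1/m$, so what remains is to find $n$, the number of key-value pairs. A HashMap normally goes through resizing, and each resize doubles the table length, so we can estimate the number of pairs from that angle. A resize happens when the number of pairs exceeds the product of the load factor (default 0.75f) and the current table length, so right after a resize to length $m$:

$$n = 0.75 \cdot \frac{m}{2} = 0.375m$$

Just before the next resize:

$$n = 0.75m$$

Taking the average:

$$n = \frac{0.375m + 0.75m}{2} = 0.5625m$$

which finally gives:

$$\lambda = np = \frac{0.5625m}{m} = 0.5625 \approx 0.5$$

Substituting $\lambda = 0.5$ into equation (1) yields the following table: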
- X = 0: P = 0.60653066
- X = 1: P = 0.30326533
- X = 2: P = 0.07581633
- X = 3: P = 0.01263606
- X = 4: P = 0.00157952
- X = 5: P = 0.00015795
- X = 6: P = 0.00001316
- X = 7: P = 0.00000094
- X = 8: P = 0.00000006
- X > 8: P less than 1 in ten million
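As the table shows, the probability that exactly 8 key-value pairs end up in the same bucket is only 0.00000006, and the probability of more than 8 is below one in ten million. So 8 it is!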
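The table can be reproduced with a few lines of Java. The sketch below assumes the Poisson parameter of 0.5 quoted in the HashMap source comments; the class `BinSizePoisson` and the method `poisson` are hypothetical names chosen for this illustration.

```java
import java.util.Locale;

// Hypothetical illustration: recomputes the bin-size probabilities
// from the Poisson formula exp(-lambda) * lambda^k / k!.
final class BinSizePoisson {
    /** Poisson probability P(X = k) for the given lambda. */
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        double lambda = 0.5; // assumed average load per bin, as in the source comments
        double cumulative = 0.0;
        for (int k = 0; k <= 8; k++) {
            double p = poisson(lambda, k);
            cumulative += p;
            System.out.printf(Locale.ROOT, "X = %d: P = %.8f%n", k, p);
        }
        // Everything not covered by k = 0..8 is the tail probability P(X > 8).
        System.out.printf(Locale.ROOT, "X > 8: P = %.3e%n", 1.0 - cumulative);
    }
}
```

Changing `lambda` to another value, for example the unrounded 0.5625, is a quick way to see how sensitive the tail probabilities are to the assumed average load.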