Java HashSet Source Code Analysis

Published: 2025/3/20 (java, 豆豆)

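The source analyzed here comes from JDK 1.7. HashSet is implemented on top of HashMap, and most of its methods simply call the corresponding HashMap methods.
See also the separate article analyzing the HashMap source code.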

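Overview

Implements the Set interface and is backed by a HashMap.
Makes no guarantee about iteration order, and does not guarantee that the order stays constant over time.
HashSet permits the null element.
Assuming the hash function disperses the elements well, lookup takes O(1) time.
Iteration time is determined by the size of the set plus the capacity of the backing HashMap.
HashSet is not synchronized. A synchronized set can be obtained with Set s = Collections.synchronizedSet(new HashSet(...));
The Iterator of HashSet is fail-fast, but this cannot guarantee that a program is correct. The fail-fast mechanism should only be used to detect bugs.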
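
The properties listed above can be checked with a small program. The following is a minimal sketch, not part of the JDK source: the class name HashSetBasicsDemo and its method names are made up for illustration. It shows that null is accepted once, that modifying the set during iteration is detected by the fail-fast iterator, and that iteration over a synchronized wrapper still has to be locked by hand.

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; the names are illustrative only.
class HashSetBasicsDemo {

    // A HashSet accepts null once; a second add is rejected as a duplicate.
    public static boolean acceptsSingleNull() {
        Set<String> set = new HashSet<>();
        boolean first = set.add(null);
        boolean second = set.add(null);
        return first && !second && set.contains(null) && set.size() == 1;
    }

    // Removing through the set (not through the iterator) while iterating
    // trips the fail-fast check on the next call to next().
    public static boolean detectsConcurrentModification() {
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < 10; i++) {
            set.add(i);
        }
        try {
            for (Integer n : set) {
                set.remove(n); // structural change behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // The wrapper synchronizes individual calls; iteration needs a manual lock.
    public static int sumSynchronized() {
        Set<Integer> sync = Collections.synchronizedSet(new HashSet<Integer>());
        sync.add(1);
        sync.add(2);
        sync.add(3);
        int sum = 0;
        synchronized (sync) {
            for (int n : sync) {
                sum += n;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("null accepted once: " + acceptsSingleNull());
        System.out.println("fail-fast triggered: " + detectsConcurrentModification());
        System.out.println("sum under lock: " + sumSynchronized());
    }
}
```

The fail-fast check is best effort. In this sketch it fires reliably because elements remain after the first removal, but code should not depend on the exception for correctness.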

Implemented interfaces

The Set interface declares the common collection methods.

public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable

public interface Set<E> extends Collection<E> {
    // Query Operations
    int size();
    boolean isEmpty();
    boolean contains(Object o);
    Iterator<E> iterator();
    Object[] toArray();
    <T> T[] toArray(T[] a);

    // Modification Operations
    boolean add(E e);
    boolean remove(Object o);

    // Bulk Operations
    boolean containsAll(Collection<?> c);
    boolean addAll(Collection<? extends E> c);
    boolean retainAll(Collection<?> c);
    boolean removeAll(Collection<?> c);
    void clear();

    // Comparison and hashing
    boolean equals(Object o);
    int hashCode();
}

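Cloneable: clone makes a shallow copy of the set. It calls the map's clone method to copy its own map field. Because HashMap performs a shallow copy, new Entry objects are created but no new keys or values are, so a change made to a key or value object is visible through both the original set and the clone.

Serializable: HashSet implements its own readObject and writeObject methods. The elements in the map's keySet are written out one by one, and the receiving side reads them back and puts them into its own map.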

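Main members

map holds the elements of the set; a HashSet is in effect the keySet of a HashMap. The transient modifier means the field is skipped during default serialization, since HashSet implements its own serialization methods.
The constant PRESENT fills the value slot of the HashMap. Every value in the HashSet's map is therefore the same object, which uses heap space efficiently.

NOTE: For every object placed into a collection (map, set, list), only a copy of the reference is stored in the collection. If the object's contents are changed through an external reference, the contents seen through the collection change as well. If the external reference is pointed at a different object, however, the reference inside the collection does not change and still points to the object that was added. In other words, adding an element is a shallow copy: the reference is copied and no new object is created.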

private transient HashMap<E,Object> map;

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
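
The NOTE above about reference copying can be illustrated with a short sketch. The class name ReferenceSemanticsDemo is made up for this example. StringBuilder is used as the element type because it does not override equals or hashCode, so mutating it does not disturb its position in the set.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; the names are illustrative only.
class ReferenceSemanticsDemo {

    public static String demo() {
        StringBuilder external = new StringBuilder("a");
        Set<StringBuilder> set = new HashSet<>();
        set.add(external); // only the reference is copied into the set

        // Mutating the object through the external reference is visible via the set.
        external.append("b");
        String seenThroughSet = set.iterator().next().toString();

        // Rebinding the external variable does not affect the reference held by the set.
        StringBuilder original = external;
        external = new StringBuilder("other");
        boolean setStillHoldsOriginal = set.iterator().next() == original;

        return seenThroughSet + ":" + setStillHoldsOriginal;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "ab:true"
    }
}
```

Mutating a field that takes part in hashCode or equals is a different matter: the element would stay in its old bucket and could no longer be found.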

Constructors

As the code shows, initializing a HashSet means initializing its member variable map, the backing HashMap.

public HashSet() {
    map = new HashMap<>();
}

public HashSet(Collection<? extends E> c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}

public HashSet(int initialCapacity, float loadFactor) {
    map = new HashMap<>(initialCapacity, loadFactor);
}

public HashSet(int initialCapacity) {
    map = new HashMap<>(initialCapacity);
}
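
To see why the collection constructor above requests Math.max((int) (c.size()/.75f) + 1, 16), it helps to work through the arithmetic. The sketch below uses a made-up class name, InitialCapacityDemo. The power-of-two rounding imitates what HashMap does internally and ignores the MAXIMUM_CAPACITY cap, which is a simplification.

```java
// Hypothetical demo class; the names are illustrative only.
class InitialCapacityDemo {

    // Capacity that HashSet(Collection) requests for a collection of n elements.
    public static int requestedCapacity(int n) {
        return Math.max((int) (n / .75f) + 1, 16);
    }

    // Simplified power-of-two rounding (assumption: ignores MAXIMUM_CAPACITY).
    public static int tableSize(int requested) {
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Resize threshold for a table size at the default load factor 0.75.
    public static int threshold(int tableSize) {
        return (int) (tableSize * 0.75f);
    }

    public static void main(String[] args) {
        int n = 100;
        int requested = requestedCapacity(n);  // 134
        int table = tableSize(requested);      // 256
        int limit = threshold(table);          // 192
        System.out.println(requested + " -> " + table + " (threshold " + limit + ")");
    }
}
```

For 100 elements the threshold of 192 is well above the element count, so addAll(c) completes without triggering a resize.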

Collection methods

As the code shows, every operation on a HashSet is an operation on its backing HashMap object, map.

public Iterator<E> iterator() {
    return map.keySet().iterator();
}

public int size() {
    return map.size();
}

public boolean isEmpty() {
    return map.isEmpty();
}

public boolean contains(Object o) {
    return map.containsKey(o);
}

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}

public boolean remove(Object o) {
    return map.remove(o) == PRESENT;
}

public void clear() {
    map.clear();
}
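
The delegation pattern above can be reproduced in a few lines. The following MiniHashSet is a hypothetical, stripped-down class written for this article, not the JDK class. It shows how the return values of add and remove fall out of HashMap.put and HashMap.remove.

```java
import java.util.HashMap;

// Hypothetical minimal set built on HashMap, mirroring the delegation shown above.
class MiniHashSet<E> {
    // Shared dummy value stored for every key.
    private static final Object PRESENT = new Object();

    private final HashMap<E, Object> map = new HashMap<>();

    // put returns the previous value, so null means the key was not present.
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    // remove returns the removed value, which is PRESENT only if the key existed.
    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MiniHashSet<String> set = new MiniHashSet<>();
        System.out.println(set.add("a"));    // true: newly added
        System.out.println(set.add("a"));    // false: already present
        System.out.println(set.remove("a")); // true: was present
        System.out.println(set.remove("a")); // false: nothing to remove
    }
}
```

Because the stored value is never null, comparing the result of put with null is enough to tell a new element from a duplicate, even when the element itself is null.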

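How HashMap handles resizing

When to resize: once the map's size reaches the threshold, (load factor) x (capacity), and (as addEntry shows in JDK 1.7) the target bucket is already occupied, the table is doubled. This involves copying the table and is expensive, so if the number of elements is known in advance, set a suitable initial capacity at construction time to avoid resizing.
If the array has already reached the maximum capacity, the threshold is set to Integer.MAX_VALUE and no further resizing takes place.
A new array is allocated, and the entries of the old array are re-addressed and moved into it. Because the capacity changes, an Entry's index can change even though its hash does not: index = hash & (length - 1) takes the low bits of the hash, and as length grows, more bits are included.
The entries are again copied using head insertion.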

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

/**
 * Transfers all entries from current table to newTable.
 */
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
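
The effect of doubling on the bucket index can be checked directly. In the sketch below (the class name ResizeIndexDemo is made up), indexFor is copied from the article's code. With an unchanged hash, an entry either keeps its index or moves up by exactly the old capacity, depending on the one extra bit that the larger mask exposes.

```java
// Hypothetical demo class; indexFor mirrors the JDK 1.7 helper shown in this article.
class ResizeIndexDemo {

    public static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // After doubling, the new index is the old index or the old index plus oldCapacity.
    public static boolean staysOrMovesByOldCapacity(int hash, int oldCapacity) {
        int oldIndex = indexFor(hash, oldCapacity);
        int newIndex = indexFor(hash, 2 * oldCapacity);
        return newIndex == oldIndex || newIndex == oldIndex + oldCapacity;
    }

    public static void main(String[] args) {
        // 21 is binary 10101: bit 4 is set, so the entry moves from 5 to 5 + 16.
        System.out.println(indexFor(21, 16) + " -> " + indexFor(21, 32)); // prints "5 -> 21"
        // 5 is binary 00101: bit 4 is clear, so the index is unchanged.
        System.out.println(indexFor(5, 16) + " -> " + indexFor(5, 32));   // prints "5 -> 5"
    }
}
```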

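HashMap: computing where an element is stored (the hash value)

The hashCode of a String key is derived from the string's contents. Because this can cause many collisions, the hash of String keys is computed separately (via sun.misc.Hashing.stringHash32) when hashSeed is non-zero.
For a key whose class does not override hashCode, the default Object.hashCode is typically based on the object's identity (often described as its memory address). To make full use of the high bits of the int value, their influence has to be carried into the low bits, because the length of most maps is fairly small and only the low bits select the bucket.
Because length is a power of two, hash & (length - 1) can be used in place of hash % length, and the bitwise operation is more efficient.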

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}
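
The two ideas above, mixing high bits into low bits and masking instead of taking a remainder, can be tried in isolation. The sketch below uses a made-up class name, HashSpreadDemo. The spread method copies only the shift-and-XOR steps of the JDK 1.7 hash function and leaves out hashSeed, which is an assumption made to keep the example self-contained.

```java
// Hypothetical demo class; spread() copies the bit mixing from the JDK 1.7 hash() above,
// without hashSeed (assumption for illustration).
class HashSpreadDemo {

    public static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        // Without mixing, both values have zero low bits and share bucket 0 of 16.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));                 // prints "0 0"
        // After mixing, the high bits influence the bucket choice.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16)); // prints "1 2"
        // For a power-of-two length the mask matches floorMod, even for negative hashes,
        // whereas the % operator would give a negative result.
        System.out.println(indexFor(-7, 16) + " " + Math.floorMod(-7, 16) + " " + (-7 % 16)); // prints "9 9 -7"
    }
}
```

The last line shows a further benefit of masking beyond speed: the result is always a valid non-negative array index.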

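HashMap's put method

HashMap's put method is used below as an example to explain the collection methods of HashSet.

If the key is null, the chain at table[0] is searched for a null key. If one is found its value is updated; otherwise a (null, value) node is added.
If the key is not null, a hash value is derived from the key's hashCode, and the index in the table is computed from that hash. The hash computation XORs the high bits with the low bits so that the high bits also take part, which reduces hash collisions.
Because table.length is always a power of two, indexFor uses hash & (table.length - 1) to take the low bits of the hash, using a bitwise operation for efficiency.
If table[i] is not null, a hash collision has occurred (this does not mean the hash values are equal; keys with different hash codes can also collide). If the chain contains an entry with the same hash whose key is identical (==) or equal (equals) to the new key, its value is updated.
If table[i] is null, or the chain at table[i] contains no entry with the same hash and a key for which equals returns true, a new node is added according to the hash.
addEntry() first checks whether the size has reached the threshold, then inserts the element using head insertion.
NOTE
When deciding whether an inserted Entry overwrites an existing one, the map first compares the hash derived from the key's hashCode with that of the existing key, and only then checks == or equals. So if you override equals, remember to override hashCode consistently; otherwise no overwrite takes place even when equals reports the keys as equal.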

public V put(K key, V value) {
    // HashMap allows a null key and null values.
    // When key is null, putForNullKey is called and the value goes into the first slot of the array.
    if (key == null)
        return putForNullKey(value);
    // Recompute the hash from the key's hashCode.
    int hash = hash(key); // note: this implementation in JDK 1.7 differs from earlier versions
    // Find the index in the table for this hash.
    int i = indexFor(hash, table.length);
    // If the Entry at index i is not null, walk the chain through e.next.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // No matching key was found in bucket i (the bucket is empty or holds other keys).
    modCount++;
    // Add the key and value at index i.
    addEntry(hash, key, value, i);
    return null;
}

/** Produces the hash code (a variant of hash() that uses useAltHashing instead of testing hashSeed directly) */
final int hash(Object k) {
    int h = 0;
    if (useAltHashing) {
        if (k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }
        h = hashSeed;
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    /* Mix in the high bits, so that keys differing only in their high bits do not collide */
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

/**
 * Produces the index. Because the resulting index does not follow insertion order,
 * the iteration order of a HashMap is not predictable. Note that different hashes
 * can produce the same index. The implementation obtains the bucket index with
 * h & (length-1); since the underlying array length is a power of two, this is a
 * noticeable speed optimization.
 */
static int indexFor(int h, int length) {
    return h & (length-1);
}

private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
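
The NOTE about overriding equals together with hashCode is easy to demonstrate with HashSet, since add goes through put. The following sketch uses made-up classes (EqualsHashCodeDemo, BrokenPoint, GoodPoint). BrokenPoint overrides only equals, so two equal instances keep their distinct identity-based hash codes and are both stored.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical demo classes; the names are illustrative only.
class EqualsHashCodeDemo {

    // Overrides equals but NOT hashCode: equal points get unrelated hash codes.
    static final class BrokenPoint {
        final int x, y;
        BrokenPoint(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof BrokenPoint && ((BrokenPoint) o).x == x && ((BrokenPoint) o).y == y;
        }
    }

    // Overrides both consistently: equal points share a hash code.
    static final class GoodPoint {
        final int x, y;
        GoodPoint(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodPoint && ((GoodPoint) o).x == x && ((GoodPoint) o).y == y;
        }
        @Override public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static int sizeWithBroken() {
        Set<BrokenPoint> set = new HashSet<>();
        set.add(new BrokenPoint(1, 2));
        set.add(new BrokenPoint(1, 2)); // equal by equals, but hashed by identity
        return set.size();              // in practice 2: the duplicate is not detected
    }

    public static int sizeWithGood() {
        Set<GoodPoint> set = new HashSet<>();
        set.add(new GoodPoint(1, 2));
        set.add(new GoodPoint(1, 2));   // same hash and equals: treated as the same element
        return set.size();              // 1
    }

    public static void main(String[] args) {
        System.out.println("equals only: " + sizeWithBroken());
        System.out.println("equals and hashCode: " + sizeWithGood());
    }
}
```

The e.hash == hash test in put runs before equals is ever called, which is why the duplicate BrokenPoint is not recognized.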

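The clone method

clone performs a shallow copy.
Cloning a HashSet means copying its backing map.

HashMap's clone method

clone implements a shallow copy. New Entry objects are created, but the keys and values are not copied. If a key's properties are changed through a reference held by the original HashMap, the key in the cloned HashMap changes too. The array of the cloned map is also not necessarily the same size as that of the original map.

HashSet's clone method is really just a clone of the member variable map.
First, a new HashMap object is created and reset to an empty state.
Then its table is sized: the capacity is Math.min(table.length, Math.min(MAXIMUM_CAPACITY, size * Math.min(1 / loadFactor, 4))). The initial array size of the cloned HashMap is therefore not necessarily the same as the current map's; it is chosen to be reasonable for the current size and loadFactor.
Finally, putAllForCreate(this) puts each (key, value) of the current map into the new map. New Entry objects are created in the process, but no new keys or values are, so changing a key or value object through the original HashMap or through the clone has the same effect.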

// HashSet
public Object clone() {
    try {
        HashSet<E> newSet = (HashSet<E>) super.clone();
        newSet.map = (HashMap<E, Object>) map.clone();
        return newSet;
    } catch (CloneNotSupportedException e) {
        throw new InternalError();
    }
}

// HashMap
public Object clone() {
    HashMap<K,V> result = null;
    try {
        result = (HashMap<K,V>)super.clone();
    } catch (CloneNotSupportedException e) {
        // assert false;
    }
    if (result.table != EMPTY_TABLE) {
        result.inflateTable(Math.min(
            (int) Math.min(
                size * Math.min(1 / loadFactor, 4.0f),
                // we have limits...
                HashMap.MAXIMUM_CAPACITY),
            table.length));
    }
    result.entrySet = null;
    result.modCount = 0;
    result.size = 0;
    result.init();
    result.putAllForCreate(this);

    return result;
}

private void putAllForCreate(Map<? extends K, ? extends V> m) {
    for (Map.Entry<? extends K, ? extends V> e : m.entrySet())
        putForCreate(e.getKey(), e.getValue());
}

private void putForCreate(K key, V value) {
    int hash = null == key ? 0 : hash(key);
    int i = indexFor(hash, table.length);

    /**
     * Look for preexisting entry for key. This will never happen for
     * clone or deserialize. It will only happen for construction if the
     * input Map is a sorted map whose ordering is inconsistent w/ equals.
     */
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k)))) {
            e.value = value;
            return;
        }
    }

    createEntry(hash, key, value, i);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

Entry(int h, K k, V v, Entry<K,V> n) {
    value = v;
    next = n;
    key = k;
    hash = h;
}
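
The shallow-copy behavior described above can be observed from the outside. In this sketch (the class name ShallowCloneDemo is made up), the clone gets its own backing map, so structural changes are independent, while the element objects themselves are shared between the two sets.

```java
import java.util.HashSet;

// Hypothetical demo class; the names are illustrative only.
class ShallowCloneDemo {

    @SuppressWarnings("unchecked")
    public static String demo() {
        HashSet<StringBuilder> original = new HashSet<>();
        StringBuilder element = new StringBuilder("x");
        original.add(element);

        HashSet<StringBuilder> copy = (HashSet<StringBuilder>) original.clone();

        // The element object is shared: a mutation is visible through both sets.
        element.append("y");
        boolean sameElement = copy.iterator().next() == element;

        // The backing maps are separate: clearing the copy leaves the original intact.
        copy.clear();

        return copy.size() + ":" + original.size() + ":" + sameElement
                + ":" + original.iterator().next();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "0:1:true:xy"
    }
}
```

If independent element copies are needed, each element has to be copied explicitly; clone does not do that.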

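Serialization methods

Serialization writes out the elements of the map's keySet one by one; the receiving side reads them in the same order and rebuilds the HashMap.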

private void writeObject(java.io.ObjectOutputStream s)
    throws java.io.IOException {
    // Write out any hidden serialization magic
    s.defaultWriteObject();

    // Write out HashMap capacity and load factor
    s.writeInt(map.capacity());
    s.writeFloat(map.loadFactor());

    // Write out size
    s.writeInt(map.size());

    // Write out all elements in the proper order.
    for (E e : map.keySet())
        s.writeObject(e);
}

private void readObject(java.io.ObjectInputStream s)
    throws java.io.IOException, ClassNotFoundException {
    // Read in any hidden serialization magic
    s.defaultReadObject();

    // Read in HashMap capacity and load factor and create backing HashMap
    int capacity = s.readInt();
    float loadFactor = s.readFloat();
    map = (((HashSet)this) instanceof LinkedHashSet ?
           new LinkedHashMap<E,Object>(capacity, loadFactor) :
           new HashMap<E,Object>(capacity, loadFactor));

    // Read in size
    int size = s.readInt();

    // Read in all elements in the proper order.
    for (int i=0; i<size; i++) {
        E e = (E) s.readObject();
        map.put(e, PRESENT);
    }
}
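
From the caller's side, the custom writeObject and readObject are invoked implicitly by the standard object streams. The following round trip is a minimal sketch with a made-up class name, HashSetSerializationDemo. It serializes a set into a byte array and reads it back, which rebuilds the backing map on the receiving side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashSet;

// Hypothetical demo class; the names are illustrative only.
class HashSetSerializationDemo {

    @SuppressWarnings("unchecked")
    public static HashSet<String> roundTrip(HashSet<String> set)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(set); // triggers HashSet.writeObject
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (HashSet<String>) in.readObject(); // triggers HashSet.readObject
        }
    }

    // Wraps the checked exceptions so the result is easy to test.
    public static boolean roundTripPreservesElements() {
        HashSet<String> set = new HashSet<>();
        set.add("a");
        set.add("b");
        set.add(null);
        try {
            HashSet<String> copy = roundTrip(set);
            return copy != set && copy.equals(set);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip equal: " + roundTripPreservesElements());
    }
}
```

Only the elements, capacity, load factor and size travel over the stream; the table itself is rebuilt by the put calls in readObject.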
