HashMap之resize詳解

1. 導讀

上期分享了HashMap的key定位以及數據節點的設計, 本期就下面三個問題來分享下個人對於HashMap擴容的理解:

.1 HashMap為什麼要擴容? 何時擴容?

.2 負載因子為什麼是0.75?

.3 HashMap如何擴容;

2. HashMap為什麼要擴容

經過上期分享, 我們都知道HashMap在構建初始是可以指定table(hash槽)的長度的, 假設我們設定了2, 這時候有10萬數據要插入, 最好的情況就是兩邊各是5萬, 最差的情況就是一邊是10萬, 顯然這個時候hash衝突已經很嚴重了, 為了解決衝突, 我們就需要對table進行擴容, 所以HashMap的擴容就是加長table的長度, 來減少hash衝突的概率;

3. HashMap何時擴容

HashMap是如何來判定何時該擴容的呢?

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { if ((tab = table) == null || (n = tab.length) == 0) n = (tab = resize()).length; ... if (++size > threshold) resize(); ... }

上面代碼是HashMap::put的核心實現, 我將與本問題無關的代碼都省略了, HashMap會在兩個地方進行resize(擴容):

.1 HashMap實行了懶載入, 新建HashMap時不會對table進行賦值, 而是到第一次插入時, 進行resize時構建table;

.2 當HashMap.size 大於 threshold時, 會進行resize;threshold的值我們在上一次分享中提到過: 當第一次構建時, 如果沒有指定HashMap.table的初始長度, 就用默認值16, 否則就是指定的值; 然後不管是第一次構建還是後續擴容, threshold = table.length * loadFactor;

3. 為什麼是0.75

HashMap的擴容時取決於threshold, 而threshold取決於loadFactor, loadFactor(負載因子)HashMap的默認值是0.75(3/4), 那麼為什麼當HashMap的容量超過3/4時就需要擴容了呢? 為什麼不是1/2擴容或者等於table.length時擴容呢?

答案就在HashMap的注釋中:

/** * Ideally, under random hashCodes, the frequency of * nodes in bins follows a Poisson distribution * (http://en.wikipedia.org/wiki/Poisson_distribution) with a * parameter of about 0.5 on average for the default resizing * threshold of 0.75, although with a large variance because of * resizing granularity. Ignoring variance, the expected * occurrences of list size k are (exp(-0.5) * pow(0.5, k) / * factorial(k)). The first values are: * * 0: 0.60653066 * 1: 0.30326533 * 2: 0.07581633 * 3: 0.01263606 * 4: 0.00157952 * 5: 0.00015795 * 6: 0.00001316 * 7: 0.00000094 * 8: 0.00000006 * */