
[Python Tips] It's Time to Use defaultdict and Counter Instead of dictionary

Published: 2023/12/10


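When working with a dict, we usually have to check whether a key exists: if it does not, we set a default value, and if it does, we do something else. That takes several extra lines of code. Is there a more efficient way to write this, one that cuts the code down without hurting readability? As programmers, we all want to write code that works and is also efficient and concise.

Today I read an article whose author shows that using defaultdict and Counter in place of dictionary leads to more concise and more readable code. This post is a brief translation of that article, followed by a short introduction to the two data types.

Original article:

https://towardsdatascience.com/python-pro-tip-start-using-python-defaultdict-and-counter-in-place-of-dictionary-d1922513f747

For an introduction to dictionaries, see also my earlier post Python基礎入門_2基礎語法和變量類型 (Python Basics 2: Basic Syntax and Variable Types).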

Contents:

Counter and defaultdict
Why use defaultdict?
Definition and usage of defaultdict
Definition and usage of Counter

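Learning a programming language is easy. When I pick up a new language, I focus on the following topics, in this order, and starting to write code in the new language is then quite straightforward:

Operators and data types: +, -, int, float, str
Conditional statements: if, else, case, switch
Loops: for, while
Data structures: List, Array, Dict, Hashmaps
Defining functions

But being able to write code and being able to write elegant, efficient code are two different things. Every language has its own distinctive features.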

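As a result, a newcomer to a language tends to write more code than necessary. For example, a Java developer who has just learned Python might sum a list of numbers like this:

x = [1, 2, 3, 4, 5]
sum_x = 0
for i in range(len(x)):
    sum_x += x[i]

An experienced Python programmer would write:

sum_x = sum(x)

So I am starting a series of posts called "Python Shorts". It will cover simple concepts that Python provides, along with useful tricks and usage examples. The goal of the series is to write efficient and readable code.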

Counter and defaultdict

The example to implement here is counting how many times each word occurs in a piece of text, say Hamlet. How should we do it?

Python offers several ways to do this, but only one of them is really elegant. Let's start with the naive implementation in plain Python, using the dict data type.

The code looks like this:

# count the number of word occurrences in a piece of text
text = "I need to count the number of word occurrences in a piece of text. How could I do that? " \
       "Python provides us with multiple ways to do the same thing. But only one way I find beautiful."

word_count_dict = {}
for w in text.split(" "):
    if w in word_count_dict:
        word_count_dict[w] += 1
    else:
        word_count_dict[w] = 1

We can use defaultdict to cut down the number of lines:

from collections import defaultdict

word_count_dict = defaultdict(int)
for w in text.split(" "):
    word_count_dict[w] += 1
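The reason this works is that defaultdict(int) calls int() whenever a key is missing, and int() with no arguments returns 0, so the += 1 always has a value to add to. A minimal sketch of that behaviour follows; the sample sentence and the variable names are made up for illustration:

```python
from collections import defaultdict

# int() called with no arguments returns 0; this becomes the default value.
assert int() == 0

counts = defaultdict(int)
for word in "to be or not to be".split(" "):
    # The first time a word is seen, counts[word] is created as 0
    # and then incremented to 1.
    counts[word] += 1

print(dict(counts))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```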

Counter can do the same job:

from collections import Counter

word_count_dict = Counter()
for w in text.split(" "):
    word_count_dict[w] += 1

Counter also supports an even more concise form:

word_counter = Counter(text.split(" "))
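As a quick sanity check, the three approaches (plain dict, defaultdict and Counter) should produce the same counts. The sketch below assumes a short made-up sentence instead of the text above, and the variable names are only illustrative:

```python
from collections import Counter, defaultdict

sample = "the cat and the hat"  # hypothetical sample text
words = sample.split(" ")

# 1) plain dict with an explicit membership test
plain = {}
for w in words:
    if w in plain:
        plain[w] += 1
    else:
        plain[w] = 1

# 2) defaultdict(int): missing keys start at 0
with_default = defaultdict(int)
for w in words:
    with_default[w] += 1

# 3) Counter built directly from the list of words
with_counter = Counter(words)

# Converting to plain dicts makes the comparison explicit.
print(dict(with_default) == plain)   # True
print(dict(with_counter) == plain)   # True
print(with_counter.most_common(1))   # [('the', 2)]
```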

Counter is, as the name suggests, a counter: it exists to count how many times each object occurs. That means we can also ask it for the most frequent words:

print('most common word: ', word_count_dict.most_common(10))

Output (the order of words with equal counts may differ between Python versions):

most common word: [('I', 3), ('the', 2), ('of', 2), ('do', 2), ('to', 2), ('multiple', 1), ('in', 1), ('way', 1), ('us', 1), ('occurrences', 1)]

Some other usage examples:

# Count Characters
print(Counter('abccccccddddd'))
# Count List elements
print(Counter([1, 2, 3, 4, 5, 1, 2]))

Output:

Counter({'c': 6, 'd': 5, 'a': 1, 'b': 1})
Counter({1: 2, 2: 2, 3: 1, 4: 1, 5: 1})

Why use defaultdict?

Since Counter is so convenient, is Counter all we need? Of course not. Counter is built for counting, so its values are meant to be integer counts. If the values we need are strings, lists or tuples, it is no longer the right tool.

This is where defaultdict comes in. Its biggest difference from dict is that it supplies a default value even when the key does not exist yet. For example:

from collections import defaultdict

s = [('color', 'blue'), ('color', 'orange'), ('color', 'yellow'),
     ('fruit', 'banana'), ('fruit', 'orange'), ('fruit', 'banana')]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)
print(d)

Output:

defaultdict(<class 'list'>, {'color': ['blue', 'orange', 'yellow'], 'fruit': ['banana', 'orange', 'banana']})

Here each missing key is given an empty list the first time it is accessed. If we pass set instead:

s = [('color', 'blue'), ('color', 'orange'), ('color', 'yellow'),
     ('fruit', 'banana'), ('fruit', 'orange'), ('fruit', 'banana')]
d = defaultdict(set)
for k, v in s:
    d[k].add(v)
print(d)

Output (the order of elements inside a set is arbitrary):

defaultdict(<class 'set'>, {'color': {'blue', 'yellow', 'orange'}, 'fruit': {'banana', 'orange'}})

Note that lists and sets use different methods to add an element: list.append() for lists and set.add() for sets.
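One detail worth seeing in code is when the default value is actually created. The sketch below, with made-up key names, contrasts a plain dict with a defaultdict: the default is built on first access through square brackets, and that access also stores the key, whereas get() leaves the mapping untouched:

```python
from collections import defaultdict

plain = {}
try:
    plain["missing"]
except KeyError:
    print("plain dict raises KeyError")

d = defaultdict(list)
print(d["missing"])     # []  (created on first access)
print("missing" in d)   # True: merely reading the key inserted it
print(d.get("other"))   # None: get() does not call the factory
print("other" in d)     # False
```

If inserting keys on read is not wanted, get() or an explicit membership test avoids it.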

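Next come some additional notes on the definitions and methods of these two data types, based mainly on the official documentation.

Definition and usage of defaultdict

The official documentation describes defaultdict as follows:

class collections.defaultdict([default_factory[, …]])

Returns a new dictionary-like object. defaultdict is a subclass of the built-in dict class. It overrides one method and adds one writable instance variable. The remaining functionality is the same as for the dict class and is not repeated here.

The first argument provides the initial value for the default_factory attribute; it defaults to None. All remaining arguments are treated the same as if they were passed to the dict constructor, including keyword arguments.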
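The description above can be illustrated with a small sketch. The keys and the "unknown" default are made up for illustration; the point is that extra arguments go to the dict constructor and that default_factory can be reassigned later:

```python
from collections import defaultdict

# Arguments after the factory are handed to dict(), so the mapping
# can be pre-filled from another mapping and from keyword arguments.
d = defaultdict(lambda: "unknown", {"a": "apple"}, b="banana")
print(d["a"], d["b"], d["z"])  # apple banana unknown

# default_factory is a writable attribute; setting it to None makes
# missing keys raise KeyError again, as in a plain dict.
d.default_factory = None
try:
    d["not there"]
except KeyError:
    print("KeyError, just like dict")
```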

dict has a method, setdefault(), that also allows fairly concise code:

s = [('color', 'blue'), ('color', 'orange'), ('color', 'yellow'),
     ('fruit', 'banana'), ('fruit', 'orange'), ('fruit', 'banana')]
a = dict()
for k, v in s:
    a.setdefault(k, []).append(v)
print(a)

However, the official documentation also states that the defaultdict approach is simpler and faster than this one.
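To check the speed claim on your own machine, a rough timing sketch like the one below can be used. The test data and function names are invented for this example, and the measured times depend on the Python version and hardware, so no particular result is implied:

```python
import timeit
from collections import defaultdict

pairs = [(i % 100, i) for i in range(10_000)]  # made-up test data

def group_with_setdefault():
    grouped = {}
    for k, v in pairs:
        # A new empty list is built on every call, even when k already exists.
        grouped.setdefault(k, []).append(v)
    return grouped

def group_with_defaultdict():
    grouped = defaultdict(list)
    for k, v in pairs:
        # list() is only called when k is missing.
        grouped[k].append(v)
    return grouped

# Both approaches build the same grouping.
assert group_with_setdefault() == dict(group_with_defaultdict())

for fn in (group_with_setdefault, group_with_defaultdict):
    seconds = timeit.timeit(fn, number=50)
    print(f"{fn.__name__}: {seconds:.3f}s")
```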

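Definition and usage of Counter

The official documentation says:

class collections.Counter([iterable-or-mapping])

A Counter is a dict subclass for counting hashable objects. It is a collection in which elements are stored as dictionary keys and their counts are stored as dictionary values. Counts may be any integer value, including zero and negative numbers. The Counter class is similar to bags or multisets in other languages.

Note that it is the elements being counted that must be hashable. The argument passed to Counter can be any iterable, such as a list, even though a list itself is not hashable. Whether an object is hashable depends on whether its type implements the __hash__ method:

a = (2, 1)
a.__hash__()

Output (the exact value depends on the Python version and platform):

3713082714465905806

For a list:

b = [1, 2]
b.__hash__()

this raises an error:

TypeError: 'NoneType' object is not callable

As mentioned in an earlier post, calling the built-in hash() function is another way to check whether an object is hashable. Among the built-in types, the hashable ones are the immutable ones.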
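The hash() check described above can be wrapped in a small helper. is_hashable is a hypothetical name introduced only for this sketch, not a standard library function:

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper for illustration)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((2, 1)))    # True
print(is_hashable([1, 2]))    # False
print(is_hashable("text"))    # True
print(is_hashable((1, [2])))  # False: a tuple containing a list is not hashable
```

The last line shows that immutability of the container alone is not enough: a tuple is hashable only if everything inside it is hashable too.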

A Counter can also be initialized with keyword arguments:

c = Counter(cats=4, dogs=8)
print(c)

Output:

Counter({'dogs': 8, 'cats': 4})
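Besides keyword arguments, a Counter can be built from an iterable or from a mapping, as the class signature suggests. The following sketch uses made-up data to show the three forms, plus two behaviours that follow from the documentation quoted earlier: a missing element counts as zero, and update() adds to existing counts:

```python
from collections import Counter

from_iterable = Counter("gallahad")             # count characters
from_mapping = Counter({"red": 4, "blue": 2})   # start from a mapping
from_keywords = Counter(cats=4, dogs=8)         # start from keyword args

print(from_iterable["a"])             # 3
print(from_mapping["red"])            # 4
print(from_keywords.most_common(1))   # [('dogs', 8)]

# A missing element counts as zero, and the lookup does not add the key.
print(from_mapping["green"])          # 0
print("green" in from_mapping)        # False

# update() adds to the existing counts instead of replacing them.
from_keywords.update(cats=10)
print(from_keywords["cats"])          # 14
```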

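Besides most_common(), introduced above, Counter has several other methods:

elements(): returns an iterator that repeats each element as many times as its count. Older versions of the documentation describe the order as arbitrary; since Python 3.7 elements come out in the order they were first encountered. Elements whose count is less than 1 are ignored. For example:

c = Counter(a=4, b=2, c=0, d=-2)
sorted(c.elements())  # ['a', 'a', 'a', 'a', 'b', 'b']

subtract(): subtracts the counts of another iterable or mapping in place. Both the inputs and the resulting counts may be zero or negative.

c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)
c.subtract(d)
print(c)  # Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})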
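It may help to contrast subtract() with the - operator, since the two treat non-positive results differently. The stock and sold counters below are made-up data for this sketch:

```python
from collections import Counter

stock = Counter(apple=3, pear=1)
sold = Counter(apple=1, pear=2)

# The - operator returns a new Counter and drops counts that are
# zero or negative.
print(stock - sold)   # Counter({'apple': 2})

# subtract() modifies the Counter in place and keeps every count,
# including negative ones.
stock.subtract(sold)
print(stock)          # Counter({'apple': 2, 'pear': -1})
```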

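In addition, the following idioms are available (each line stands on its own; n is the number of items wanted and list_of_pairs is a list of (elem, count) pairs):

# total of all counts
sum(c.values())
# reset all counts
c.clear()
# list the unique elements
list(c)
# convert to a set
set(c)
# convert to a regular dictionary
dict(c)
# convert to (elem, count) pairs
c.items()
# convert from a list of (elem, count) pairs
Counter(dict(list_of_pairs))
# the n least common elements
c.most_common()[:-n-1:-1]
# keep only the positive counts
+c
# negate every count, then keep only those that end up positive
-c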
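A few of these idioms are easier to follow with concrete values. The sketch below reuses the counter from the elements() example and assumes n = 2:

```python
from collections import Counter

c = Counter(a=4, b=2, c=0, d=-2)
n = 2  # how many of the least common elements to show

print(sum(c.values()))            # 4
print(c.most_common()[:-n-1:-1])  # [('d', -2), ('c', 0)]
print(+c)                         # Counter({'a': 4, 'b': 2})
print(-c)                         # Counter({'d': 2})
```

The slice [:-n-1:-1] walks the most_common() list backwards from its end and stops after n items, which is why the least common elements come out first.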

The operators +, -, & and | are also supported, each with its own meaning:

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)
# addition: c[x] + d[x]
print(c + d)  # Counter({'a': 4, 'b': 3})
# subtraction, keeping only positive counts
print(c - d)  # Counter({'a': 2})
# intersection: min(c[x], d[x])
print(c & d)  # Counter({'a': 1, 'b': 1})
# union: max(c[x], d[x])
print(c | d)  # Counter({'a': 3, 'b': 2})
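As one possible use of these multiset operators, subtraction can test whether one collection of items is covered by another. can_build is a hypothetical helper written for this sketch, not something from the original article:

```python
from collections import Counter

def can_build(word, letters):
    """Hypothetical helper: True if `word` can be spelled using `letters`."""
    # Subtraction drops non-positive counts, so an empty result means
    # every letter of `word` is available often enough in `letters`.
    return not (Counter(word) - Counter(letters))

print(can_build("add", "dad"))   # True
print(can_build("dadd", "dad"))  # False: needs three 'd', only two available
```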

References:

collections - Container datatypes
What does "hashable" mean in Python?

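Summary

If you need to count things, such as word occurrences, Counter is a good choice: it is concise and readable. If the values you need to store are not integers but all share one type, for example lists, then a defaultdict is a better choice than a plain dict.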

Finally, the code examples from this article are available on GitHub:

https://github.com/ccc013/Python_Notes/blob/master/Python_tips/defaultdict_and_counter.ipynb

You are welcome to follow my WeChat official account, 算法猿的成長, so we can exchange ideas, learn and improve together!
