Advantages of Using NumPy over Python Lists
In this article, I will show a few neat tricks that come with NumPy, yet are much faster than vanilla Python code.
Memory Usage
The most important gain is in memory usage. This comes in handy when we implement complex algorithms and in research work.
import numpy as np

array = list(range(10**7))
np_array = np.array(array)
I found the following code on a blog. I will be using this snippet to compute the size of the objects in this article.
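A minimal sketch of such a get_size helper, built on recursive calls to sys.getsizeof, looks roughly like this (the blog's original version may differ in its details):

import sys

def get_size(obj, seen=None):
    """Roughly estimate the memory footprint of an object, in bytes."""
    if seen is None:
        seen = set()
    if id(obj) in seen:            # don't double-count shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(get_size(k, seen) + get_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(get_size(item, seen) for item in obj)
    return size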
get_size(array)    ====> 370000108 bytes ~ 352.85 MB
get_size(np_array) ====> 80000160 bytes  ~ 76.29 MB
This is because NumPy arrays are fixed-length arrays that store raw, fixed-size values, while vanilla Python lists are extensible containers of boxed Python objects.
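A quick way to see the difference is to inspect the array's metadata directly (the exact numbers in the comments assume a 64-bit build where the default integer dtype is int64):

import sys
import numpy as np

array = list(range(10**7))
np_array = np.array(array)

print(np_array.itemsize)     # 8 bytes per int64 element
print(np_array.nbytes)       # 80000000 bytes for the raw data buffer
print(sys.getsizeof(array))  # size of the list object alone, not counting
                             # the 10**7 boxed int objects it points to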
Speed
Speed is, in fact, a very important property of a data structure. Why do NumPy operations take so much less time than vanilla Python? Let's have a look at a few examples.
Matrix Multiplication
In this example, we will look at a scenario where we multiply two square matrices.
from time import time
import numpy as np

def matmul(A, B):
    N = len(A)
    product = [[0 for x in range(N)] for y in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                product[i][j] += A[i][k] * B[k][j]
    return product

matrix1 = np.random.rand(1000, 1000)
matrix2 = np.random.rand(1000, 1000)

t = time()
prod = matmul(matrix1, matrix2)
print("Normal", time() - t)

t = time()
np_prod = np.matmul(matrix1, matrix2)
print("Numpy", time() - t)
The times are observed as follows:
Normal 7.604596138000488
Numpy  0.0007512569427490234
We can see that the NumPy implementation is almost 10,000 times faster. Why? Because NumPy uses under-the-hood optimizations such as transposing and chunked multiplications. Furthermore, the operations are vectorized, so the looped work is performed much faster. NumPy uses the BLAS (Basic Linear Algebra Subprograms) library in its backend. Hence, it is important to install NumPy properly so that the compiled binaries fit the hardware architecture.
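If you want to check which backend your own installation was linked against, NumPy can print its build configuration (the output depends on how NumPy was installed; OpenBLAS and MKL are common):

import numpy as np

# Prints the BLAS/LAPACK libraries this NumPy build uses
np.show_config()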
More Vectorized Operations
Vectorized operations are simply operations on vectors, including dot products, transposes and other matrix operations, applied to the entire array at once. Let's have a look at the following example, where we compute the element-wise product.
vec_1 = np.random.rand(5000000)
vec_2 = np.random.rand(5000000)

t = time()
dot = [float(x*y) for x, y in zip(vec_1, vec_2)]
print("Normal", time() - t)
t = time()
np_dot = vec_1 * vec_2
print("Numpy", time() - t)
The timings for each operation will be:
Normal 2.0582966804504395
Numpy  0.02198004722595215
We can see that the NumPy implementation gives a much faster vectorized operation.
Broadcast Operations
NumPy also provides much faster operations when an array is combined with a scalar; these are called broadcast operations, because the operation is broadcast over the entire array. Under the hood, this can take advantage of SIMD instructions such as Intel AVX where the hardware and build support them.
vec = np.random.rand(5000000)

t = time()
mul = [float(x) * 5 for x in vec]
print("Normal", time() - t)
t = time()
np_mul = 5 * vec
print("Numpy", time() - t)
Let's see how the running times look:
Normal 1.3156049251556396
Numpy  0.01950979232788086
Almost 100 times faster!
Filtering
Filtering covers scenarios where you pick only some items from an array, based on a condition. This is integrated into NumPy's indexed access (boolean indexing). Let me show you a simple practical example.
X = np.array(DATA)
Y = np.array(LABELS)

Y_red = Y[Y=='red'] # obtain all Y values with RED
X_red = X[Y=='red'] # feed Y=='red' indices and filter X
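Here DATA and LABELS are just placeholders for whatever dataset you have at hand; with toy values, the boolean mask behaves like this:

import numpy as np

DATA = [0.1, 0.5, 0.9, 0.3]               # hypothetical feature values
LABELS = ['red', 'blue', 'red', 'green']  # hypothetical labels

X = np.array(DATA)
Y = np.array(LABELS)

print(Y[Y == 'red'])  # ['red' 'red']
print(X[Y == 'red'])  # [0.1 0.9]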
Let's compare this against the vanilla Python implementation.
X = np.random.rand(5000000)
Y = (10 * np.random.rand(5000000)).astype(np.int64)

t = time()
Y_even = [int(y) for y in Y if y%2==0]
X_even = [float(X[i]) for i, y in enumerate(Y) if y%2==0]
print("Normal", time() - t)
t = time()
np_Y_even = Y[Y%2==0]
np_X_even = X[Y%2==0]
print("Numpy", time() - t)
The running times are as follows:
Normal 6.341982841491699
Numpy  0.2538008689880371
This is a pretty handy trick when you want to separate data based on some condition or label. It is very useful in data analytics and machine learning.
Finally, let's have a look at np.where, which lets you transform a NumPy array based on a condition.
X = (10 * np.random.rand(5000000)).astype(np.int64)
X_even_or_zeros = np.where(X%2==0, 1, 0)
This returns an array in which slots holding even values are replaced with ones and all other slots with zeros.
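On a small hand-made array the effect is easy to see:

import numpy as np

x = np.array([3, 4, 7, 10])
print(np.where(x % 2 == 0, 1, 0))  # [0 1 0 1]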
These are a few vital operations, and I hope the read was worth your time. I always use NumPy with huge numeric datasets and find the performance very satisfying. NumPy has really helped the research community stick with Python without dropping down to C/C++ for numeric computation speed. Room for improvement still exists!
Cheers!
Translated from: https://medium.com/swlh/why-use-numpy-d06c573fbcda