Advantages of Using NumPy over Python Lists
In this article, I will show a few neat tricks that come with NumPy, yet are much faster than vanilla Python code.
Memory Usage
The most important gain is in memory usage. This comes in handy when we implement complex algorithms and in research work.
import numpy as np

array = list(range(10**7))
np_array = np.array(array)
I found the following code on a blog. I will be using this snippet to compute the size of the objects in this article.
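A minimal sketch of such a get_size helper, built on recursive calls to sys.getsizeof, looks roughly like this (the blog's original version may differ in its details):

import sys

def get_size(obj, seen=None):
    """Roughly estimate the memory footprint of an object, in bytes."""
    if seen is None:
        seen = set()
    if id(obj) in seen:            # don't double-count shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(get_size(k, seen) + get_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(get_size(item, seen) for item in obj)
    return size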
get_size(array)    ====> 370000108 bytes ~ 352.85 MB
get_size(np_array) ====> 80000160 bytes  ~ 76.29 MB
This is because NumPy arrays are fixed-length arrays that store raw, fixed-size values, while vanilla Python lists are extensible containers of boxed Python objects.
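A quick way to see the difference is to inspect the array's metadata directly (the exact numbers in the comments assume a 64-bit build where the default integer dtype is int64):

import sys
import numpy as np

array = list(range(10**7))
np_array = np.array(array)

print(np_array.itemsize)     # 8 bytes per int64 element
print(np_array.nbytes)       # 80000000 bytes for the raw data buffer
print(sys.getsizeof(array))  # size of the list object alone, not counting
                             # the 10**7 boxed int objects it points to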
Speed
Speed is, in fact, a very important property of a data structure. Why do NumPy operations take so much less time than vanilla Python? Let's have a look at a few examples.
Matrix Multiplication
In this example, we will look at a scenario where we multiply two square matrices.
from time import time
import numpy as np

def matmul(A, B):
    N = len(A)
    product = [[0 for x in range(N)] for y in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                product[i][j] += A[i][k] * B[k][j]
    return product

matrix1 = np.random.rand(1000, 1000)
matrix2 = np.random.rand(1000, 1000)

t = time()
prod = matmul(matrix1, matrix2)
print("Normal", time() - t)

t = time()
np_prod = np.matmul(matrix1, matrix2)
print("Numpy", time() - t)
The times are observed as follows:
Normal 7.604596138000488
Numpy  0.0007512569427490234
We can see that the NumPy implementation is almost 10,000 times faster. Why? Because NumPy uses under-the-hood optimizations such as transposing and chunked multiplications. Furthermore, the operations are vectorized, so the looped work is performed much faster. NumPy uses the BLAS (Basic Linear Algebra Subprograms) library in its backend. Hence, it is important to install NumPy properly so that the compiled binaries fit the hardware architecture.
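If you want to check which backend your own installation was linked against, NumPy can print its build configuration (the output depends on how NumPy was installed; OpenBLAS and MKL are common):

import numpy as np

# Prints the BLAS/LAPACK libraries this NumPy build uses
np.show_config()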
More Vectorized Operations
Vectorized operations are simply operations on vectors, including dot products, transposes and other matrix operations, applied to the entire array at once. Let's have a look at the following example, where we compute the element-wise product.
vec_1 = np.random.rand(5000000)
vec_2 = np.random.rand(5000000)

t = time()
dot = [float(x*y) for x, y in zip(vec_1, vec_2)]
print("Normal", time() - t)
t = time()
np_dot = vec_1 * vec_2
print("Numpy", time() - t)
The timings for each operation will be:
Normal 2.0582966804504395
Numpy  0.02198004722595215
We can see that the NumPy implementation gives a much faster vectorized operation.
Broadcast Operations
NumPy also provides much faster operations when an array is combined with a scalar; these are called broadcast operations, because the operation is broadcast over the entire array. Under the hood, this can take advantage of SIMD instructions such as Intel AVX where the hardware and build support them.
vec = np.random.rand(5000000)

t = time()
mul = [float(x) * 5 for x in vec]
print("Normal", time() - t)
t = time()
np_mul = 5 * vec
print("Numpy", time() - t)
Let's see how the running times look:
Normal 1.3156049251556396
Numpy  0.01950979232788086
Almost 100 times faster!
Filtering
Filtering covers scenarios where you pick only some items from an array, based on a condition. This is integrated into NumPy's indexed access (boolean indexing). Let me show you a simple practical example.
X = np.array(DATA)
Y = np.array(LABELS)

Y_red = Y[Y=='red'] # obtain all Y values with RED
X_red = X[Y=='red'] # feed Y=='red' indices and filter X
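Here DATA and LABELS are just placeholders for whatever dataset you have at hand; with toy values, the boolean mask behaves like this:

import numpy as np

DATA = [0.1, 0.5, 0.9, 0.3]               # hypothetical feature values
LABELS = ['red', 'blue', 'red', 'green']  # hypothetical labels

X = np.array(DATA)
Y = np.array(LABELS)

print(Y[Y == 'red'])  # ['red' 'red']
print(X[Y == 'red'])  # [0.1 0.9]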
Let's compare this against the vanilla Python implementation.
X = np.random.rand(5000000)
Y = (10 * np.random.rand(5000000)).astype(np.int64)

t = time()
Y_even = [int(y) for y in Y if y%2==0]
X_even = [float(X[i]) for i, y in enumerate(Y) if y%2==0]
print("Normal", time() - t)
t = time()
np_Y_even = Y[Y%2==0]
np_X_even = X[Y%2==0]
print("Numpy", time() - t)
The running times are as follows:
Normal 6.341982841491699
Numpy  0.2538008689880371
This is a pretty handy trick when you want to separate data based on some condition or label. It is very useful in data analytics and machine learning.
Finally, let's have a look at np.where, which lets you transform a NumPy array based on a condition.
X = (10 * np.random.rand(5000000)).astype(np.int64)
X_even_or_zeros = np.where(X%2==0, 1, 0)
This returns an array in which slots holding even values are replaced with ones and all other slots with zeros.
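On a small hand-made array the effect is easy to see:

import numpy as np

x = np.array([3, 4, 7, 10])
print(np.where(x % 2 == 0, 1, 0))  # [0 1 0 1]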
These are a few vital operations, and I hope the read was worth your time. I always use NumPy with huge numeric datasets and find the performance very satisfying. NumPy has really helped the research community stick with Python without dropping down to C/C++ for numeric computation speed. Room for improvement still exists!
Cheers!
Translated from: https://medium.com/swlh/why-use-numpy-d06c573fbcda