

5 pandas Tricks You Didn't Know About

Published: 2023/11/29



I've been using pandas for years, and each time I feel I'm typing too much, I google it and usually find a new pandas trick! I learned about these functions recently, and I deem them essential because of their ease of use.

1. between function


I've been using the "between" function in SQL for years, but I only discovered it in pandas recently.

Let’s say we have a DataFrame with prices and we would like to filter prices between 2 and 4.


df = pd.DataFrame({'price': [1.99, 3, 5, 0.5, 3.5, 5.5, 3.9]})

With the between function, you can reduce this filter:

df[(df.price >= 2) & (df.price <= 4)]

To this:


df[df.price.between(2, 4)]

It might not seem like much, but those parentheses are annoying when writing many filters. The filter with the between function is also more readable.

The between function checks the inclusive interval left <= series <= right.
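The endpoints are configurable. A minimal sketch, assuming a recent pandas version where the `inclusive` argument accepts `'both'`, `'neither'`, `'left'`, or `'right'` (older versions took a boolean):

```python
import pandas as pd

df = pd.DataFrame({'price': [1.99, 3, 5, 0.5, 3.5, 5.5, 3.9]})

# Default: both endpoints included (0.5 <= price <= 3.5)
both = df[df.price.between(0.5, 3.5)]

# Exclude both endpoints (0.5 < price < 3.5)
strict = df[df.price.between(0.5, 3.5, inclusive='neither')]

print(both.price.tolist())    # [1.99, 3.0, 0.5, 3.5]
print(strict.price.tolist())  # [1.99, 3.0]
```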

2. Fix the order of the rows with the reindex function


The reindex function conforms a Series or a DataFrame to a new index. I resort to reindex when making reports with columns that have a predefined order.

Let's add T-shirt sizes to our DataFrame. The goal of the analysis is to calculate the mean price for each size:

df = pd.DataFrame({'price': [1.99, 3, 5], 'size': ['medium', 'large', 'small']})
df_avg = df.groupby('size').price.mean()
df_avg

Sizes appear in a random order in the table above. They should be ordered: small, medium, large. Since the sizes are strings, we cannot use the sort_values function. Here the reindex function comes to the rescue:

df_avg.reindex(['small', 'medium', 'large'])
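One detail worth knowing: labels that are missing from the original index come back as NaN by default, and reindex takes a `fill_value` argument to substitute something more report-friendly. A small sketch (the `'x-large'` label is a hypothetical addition for illustration):

```python
import pandas as pd

df = pd.DataFrame({'price': [1.99, 3, 5], 'size': ['medium', 'large', 'small']})
df_avg = df.groupby('size').price.mean()

# 'x-large' has no data; without fill_value it would become NaN
report = df_avg.reindex(['small', 'medium', 'large', 'x-large'], fill_value=0)
print(report)
```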


3. Describe on steroids


The describe function is an essential tool when working on Exploratory Data Analysis. It shows basic summary statistics for all columns in a DataFrame.

df.price.describe()

What if we would like to calculate 10 quantiles instead of 3?


df.price.describe(percentiles=np.arange(0, 1, 0.1))

The describe function takes a percentiles argument. We can specify the percentiles with NumPy's arange function to avoid typing each one by hand.

This feature becomes really useful when combined with the groupby function:

df.groupby('size').describe(percentiles=np.arange(0, 1, 0.1))
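The grouped describe comes back with a (column, statistic) MultiIndex on the columns, so a single statistic can be pulled out per group. A short sketch using the same toy DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'price': [1.99, 3, 5], 'size': ['medium', 'large', 'small']})
desc = df.groupby('size').describe(percentiles=np.arange(0, 1, 0.1))

# Columns are a MultiIndex of (column, statistic);
# select one statistic across all groups:
medians = desc[('price', '50%')]
print(medians)
```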

4. Text search with regex


Our T-shirt dataset has 3 sizes. Let’s say we would like to filter small and medium sizes. A cumbersome way of filtering is:


df[(df['size'] == 'small') | (df['size'] == 'medium')]

This is bad because we usually combine it with other filters, which makes the expression unreadable. Is there a better way?

pandas string columns have a "str" accessor, which implements many functions that simplify string manipulation. One of them is the "contains" function, which supports search with regular expressions.

df[df['size'].str.contains('small|medium')]

The filter with the "contains" function is more readable, easier to extend, and easier to combine with other filters.
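Two keyword arguments of str.contains are worth knowing in practice: `case=False` makes the match case-insensitive, and `na=False` treats missing values as non-matches instead of propagating NaN into the boolean mask. A small sketch (the mixed-case and missing values are hypothetical additions for illustration):

```python
import pandas as pd

df = pd.DataFrame({'size': ['small', 'Medium', 'large', None]})

# case=False: case-insensitive regex match
# na=False: missing values count as non-matches
mask = df['size'].str.contains('small|medium', case=False, na=False)
print(df[mask]['size'].tolist())  # ['small', 'Medium']
```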

5. Bigger-than-memory datasets with pandas


pandas cannot even read datasets bigger than main memory. It throws a MemoryError, or the Jupyter kernel crashes. But to process a big dataset you don't need Dask or Vaex; you just need some ingenuity. Sounds too good to be true?

In case you've missed my article about Dask and Vaex with bigger-than-main-memory datasets:

When doing an analysis you usually don’t need all rows or all columns in the dataset.


In case you don't need all rows, you can read the dataset in chunks and filter out unnecessary rows to reduce memory usage:

iter_csv = pd.read_csv('dataset.csv', iterator=True, chunksize=1000)
df = pd.concat([chunk[chunk['field'] > constant] for chunk in iter_csv])

Reading a dataset in chunks is slower than reading it all at once. I would recommend using this approach only with bigger-than-memory datasets.

In case you don't need all columns, you can specify the required columns with the "usecols" argument when reading the dataset:

df = pd.read_csv('dataset.csv', usecols=['col1', 'col2'])

The great thing about these two approaches is that you can combine them.

Before you go


These are a few links that might interest you:


- Your First Machine Learning Model in the Cloud
- AI for Healthcare
- Parallels Desktop 50% off
- School of Autonomous Systems
- Data Science Nanodegree Program
- 5 lesser-known pandas tricks
- How NOT to write pandas code

Translated from: https://towardsdatascience.com/5-essential-pandas-tricks-you-didnt-know-about-2d1a5b6f2e7

