5 Fundamental Operations on a Pandas DataFrame

Published: 2023/11/29

Tips and Tricks for Data Science

pandas is a powerful, easy-to-use software library for the Python programming language, used for data manipulation and analysis.

Installing pandas: https://pypi.org/project/pandas/

pip install pandas

What is a Pandas DataFrame?

A pandas DataFrame is a two-dimensional data structure that stores data in tabular form. Every row and every column is labeled, and a column can hold data of any type.

Here is an example:

[Figure: first 3 rows of the Titanic: Machine Learning from Disaster dataset]

1. Creating a pandas DataFrame

The pandas.DataFrame constructor:

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

data: This parameter is the input used to build the DataFrame. It can be a NumPy ndarray, an iterable, a dict, or another DataFrame. An ndarray is a multidimensional container of items of the same type and size. An iterable is any Python object that can return its members one at a time, so it can be iterated over in a for-loop; lists, tuples, and sets are examples. A dict can hold pandas Series, arrays, constants, or list-like objects as its values.

index: This parameter takes an Index or an array-like object and supplies the row labels of the resulting DataFrame. If no row labels are provided, it defaults to a RangeIndex.

columns: This parameter takes an Index or an array-like object and supplies the column labels of the resulting DataFrame. If no column labels are provided, it defaults to a RangeIndex.

dtype: Each column in the DataFrame can have only a single data type. This parameter forces one data type on all the data. By default, the data type is inferred from the data.

copy: When this parameter is set to True and the input data is a DataFrame or a 2D ndarray, the data is copied into the resulting DataFrame. The signature above shows False as the default; since pandas 1.3 the default is None, which behaves like False for DataFrame and 2D ndarray input and like True for dict input.
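
The parameters described above can be tried together on a small array. This is only a minimal sketch: the array values and the labels 'a', 'b', 'c', 'x', 'y' are made up for illustration and do not come from the original article.

```python
import numpy as np
import pandas as pd

# A 2D ndarray: every item has the same type (here, integers).
arr = np.array([[1, 2], [3, 4], [5, 6]])

# index sets the row labels, columns sets the column labels,
# and dtype forces a single data type on all the data.
df = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['x', 'y'], dtype='float64')

# Without index and columns, both default to a RangeIndex (0, 1, 2, ...).
df_default = pd.DataFrame(arr)

# copy=True gives the DataFrame its own copy of the array,
# so changing arr afterwards does not change df_copy.
df_copy = pd.DataFrame(arr, copy=True)
arr[0, 0] = 100

print(df)
print(df_default.index, df_default.columns)
print(df_copy.iloc[0, 0])
```

Whether a DataFrame built without copy=True shares memory with the array depends on the pandas version, so the sketch only relies on the copy=True case.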

Creating a Pandas DataFrame from a Python Dictionary

import pandas as pd

d = {'Name' : ['John', 'Adam', 'Jane'], 'Age' : [25, 18, 30]}
pd.DataFrame(d)

The index parameter can be used to change the default row index. With dict input, the columns parameter selects the keys to use and sets their order; a label that is not a key of the dict yields a column of missing values rather than renaming an existing column:

d = {'Name' : ['John', 'Adam', 'Jane'], 'Age' : [25, 18, 30]}
# Rows are labeled 10, 20, 30 and the Age column is placed before Name.
pd.DataFrame(d, index=[10, 20, 30], columns=['Age', 'Name'])

Creating a Pandas DataFrame from a List

l = [['John', 25], ['Adam', 18], ['Jane', 30]]
pd.DataFrame(l, columns=['Name', 'Age'])

Creating a Pandas DataFrame from a File

In most data science work, the dataset is stored in a file, commonly in CSV (Comma-Separated Values) format. pandas can load the data from a CSV file, along with its labels, into a DataFrame using the function pandas.read_csv().

[Figure: Example1.csv]
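
Since the original screenshot of Example1.csv is not available, here is a minimal sketch of reading CSV data. The file contents below are a hypothetical stand-in, and io.StringIO is used only so the example runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of Example1.csv.
csv_text = "Name,Age\nJohn,25\nAdam,18\nJane,30\n"

# read_csv accepts a file path or a file-like object. By default, the
# first line of the file is used as the column labels.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

With a real file on disk, the call would simply be pd.read_csv('Example1.csv').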

2. Selecting Rows and Columns from a Pandas DataFrame

Selecting Columns from a Pandas DataFrame

Columns can be selected by name. A single name returns one column; a list of names, written inside the indexing brackets, returns several:

df[[column_1, column_2]]

[Figure: selecting column 'Name' from DataFrame df]
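
Column selection can be sketched as follows, assuming df is the small Name/Age DataFrame built earlier in this article:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]})

# A single label returns a Series.
names = df['Name']

# A list of labels returns a DataFrame with those columns, in that order.
subset = df[['Age', 'Name']]

print(names)
print(subset)
```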

Selecting Rows from a Pandas DataFrame

pandas provides two attributes for selecting rows from a DataFrame: loc and iloc.

loc is label-based, which means that rows are specified by their row labels. iloc is integer-based, which means that rows are specified by their integer positions, starting at 0.

[Figure: using loc and iloc for selecting rows from DataFrame df]
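
The difference between the two is easiest to see when the row labels are not 0, 1, 2. The sketch below assumes the row labels 10, 20, 30 used in the dictionary example earlier:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]},
    index=[10, 20, 30],
)

# loc looks rows up by label: the row labeled 20.
row_by_label = df.loc[20]

# iloc looks rows up by position: the second row (position 1).
row_by_position = df.iloc[1]

# A label slice includes both endpoints; a position slice excludes the stop.
rows_by_label = df.loc[10:20]
rows_by_position = df.iloc[0:2]

print(row_by_label)
print(rows_by_position)
```

Here both single-row lookups return Adam's row, and both slices return the first two rows.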

3. Inserting Rows and Columns to a Pandas DataFrame

Inserting Rows to a Pandas DataFrame

One way to insert a row into a DataFrame is to create a pandas.Series object whose index is the DataFrame's column labels and add it at the end of the DataFrame. The original approach used the pandas.DataFrame.append() method, which was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, pandas.concat() does the same job.

[Figure: inserting new row to DataFrame df]
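
A minimal sketch of adding a row at the end, written with pandas.concat() so that it also runs on pandas 2.0 and later, where DataFrame.append() no longer exists. The new row's values ('Kelly', 22) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]})

# The DataFrame's column labels serve as the index of the Series.
new_row = pd.Series(['Kelly', 22], index=df.columns)

# to_frame().T turns the Series into a one-row DataFrame, and
# ignore_index=True renumbers the rows 0, 1, 2, 3 after concatenation.
df = pd.concat([df, new_row.to_frame().T], ignore_index=True)
print(df)
```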

Inserting Columns to a Pandas DataFrame

One easy way to add a column to a DataFrame is to refer to the new column by name and assign values to it.

[Figure: inserting columns ID, Score and Country to DataFrame df]
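
Adding the three columns by assignment might look like this. The column names come from the figure caption; the values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]})

# Assigning a list creates the column; the list needs one value per row.
df['ID'] = [101, 102, 103]
df['Score'] = [88.5, 92.0, 79.5]

# Assigning a single value repeats it in every row.
df['Country'] = 'USA'

print(df)
```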

4. Deleting Rows and Columns from a Pandas DataFrame

Deleting Rows from a Pandas DataFrame

A row can be deleted by passing its row label to the method pandas.DataFrame.drop().

[Figure: deleting row with label 1 from DataFrame df]
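
A short sketch of dropping the row labeled 1, assuming the default row labels 0, 1, 2:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]})

# drop returns a new DataFrame without the row labeled 1;
# df itself is left unchanged.
result = df.drop(1)

print(result)
```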

To delete a row based on the value in a column, first obtain the index of the matching row with the DataFrame.index attribute, then pass that index to the pandas.DataFrame.drop() method.

[Figure: deleting row with Name Kelly from DataFrame df]
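
The two steps can be sketched as follows. The DataFrame contents are assumed for illustration; only the name Kelly comes from the figure caption.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Kelly', 'Jane'], 'Age': [25, 22, 30]})

# Step 1: filter the rows where Name is Kelly and take their index.
kelly_index = df[df['Name'] == 'Kelly'].index

# Step 2: drop the rows with those labels.
result = df.drop(kelly_index)

print(result)
```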

Deleting Columns from a Pandas DataFrame

A column can be deleted from a DataFrame by its label or by its position, using the method pandas.DataFrame.drop(). Because drop() works with labels, a position is first turned into a label through the DataFrame's columns attribute.

[Figure: deleting column with label 'Country' from DataFrame df]

[Figure: deleting column with position 2 from DataFrame df]

The axis argument is set to 1 when dropping columns, and 0 when dropping rows.
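
Both ways of deleting a column can be sketched like this. The DataFrame and its values are assumed for illustration; note that drop() takes labels, so the position is looked up in df.columns first.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Adam', 'Jane'],
    'Age': [25, 18, 30],
    'Score': [88.5, 92.0, 79.5],
    'Country': ['USA', 'USA', 'USA'],
})

# Delete by label: axis=1 tells drop to look among the columns.
without_country = df.drop('Country', axis=1)

# Delete by position: df.columns[2] is the label of the third column.
without_third = df.drop(df.columns[2], axis=1)

print(without_country.columns.tolist())
print(without_third.columns.tolist())
```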

5. Sorting a Pandas DataFrame

A Pandas DataFrame can be sorted using the pandas.DataFrame.sort_values() method. The by parameter takes the label of the column to sort by. Setting ascending to True sorts in ascending order, and setting it to False sorts in descending order.

[Figure: sorting DataFrame df by Name in ascending order]

[Figure: sorting DataFrame df by Age in descending order]
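
Both sorts can be sketched on the small Name/Age DataFrame used earlier (assumed here in place of the original screenshots):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]})

# Sort by Name, A to Z.
by_name = df.sort_values(by='Name', ascending=True)

# Sort by Age, largest first.
by_age_desc = df.sort_values(by='Age', ascending=False)

print(by_name)
print(by_age_desc)
```

sort_values returns a new DataFrame and keeps the original row labels, so the rows appear reordered but are still labeled as before.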

Translated from: https://medium.com/ml-course-microsoft-udacity/5-fundamental-operations-on-a-pandas-dataframe-93b4384dff9d