Python data feature extraction: extracting features from training data

Published: 2025/3/20 python 29 豆豆

Create two DataFrames. In the first, every data point has the same set of features; the second is a modified copy of the first in which some data points carry different features:

import random

import numpy as np
import pandas as pd

# Create a DataFrame where all the keys for each datapoint in the "features" column are the same.
num = 300000
NAMES = ['John', 'Mark', 'David', 'George', 'Kevin']
AGES = [25, 21, 12, 11, 16]
FEATURES1 = ['Post Graduate', 'Under Graduate', 'High School']
FEATURES2 = ['Football Player', 'Cricketer', 'Carpenter', 'Driver']
LABELS = [1, 2, 3]

df = pd.DataFrame({
    'features': [
        "name={0};age={1};feature1={2};feature2={3}".format(
            NAMES[np.random.randint(0, len(NAMES))],
            AGES[np.random.randint(0, len(AGES))],
            FEATURES1[np.random.randint(0, len(FEATURES1))],
            FEATURES2[np.random.randint(0, len(FEATURES2))])
        for i in range(num)],
    'label': [LABELS[np.random.randint(0, len(LABELS))] for i in range(num)],
})
print(df.head(20))

# Create a modified sample DataFrame from the previous one, where not all the keys are the same for each data point.
# Copy first so that df itself keeps the uniform features.
mod_df = df.copy()
random_positions1 = random.sample(range(10), 5)
random_positions2 = random.sample(range(11, 20), 5)
INTERESTS = ['Basketball', 'Golf', 'Rugby']
SMOKING = ['Yes', 'No']

mod_df.loc[random_positions1, 'features'] = [
    "name={0};age={1};interest={2}".format(
        NAMES[np.random.randint(0, len(NAMES))],
        AGES[np.random.randint(0, len(AGES))],
        INTERESTS[np.random.randint(0, len(INTERESTS))])
    for i in range(len(random_positions1))]

mod_df.loc[random_positions2, 'features'] = [
    "name={0};age={1};smoking={2}".format(
        NAMES[np.random.randint(0, len(NAMES))],
        AGES[np.random.randint(0, len(AGES))],
        SMOKING[np.random.randint(0, len(SMOKING))])
    for i in range(len(random_positions2))]
print(mod_df.head(20))
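The sample data above draws one random index per field per row. As a rough alternative, the sketch below draws each field for all rows at once from a seeded generator, so that the sample is reproducible. The helper name `make_sample_frame` and the `seed` parameter are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np
import pandas as pd

NAMES = ['John', 'Mark', 'David', 'George', 'Kevin']
AGES = [25, 21, 12, 11, 16]
FEATURES1 = ['Post Graduate', 'Under Graduate', 'High School']
FEATURES2 = ['Football Player', 'Cricketer', 'Carpenter', 'Driver']
LABELS = [1, 2, 3]


def make_sample_frame(num, seed=0):
    """Hypothetical helper: build `num` rows of 'key=value;...' strings plus a label."""
    rng = np.random.default_rng(seed)
    # Draw every field for all rows in one call instead of once per row.
    names = rng.choice(NAMES, size=num)
    ages = rng.choice(AGES, size=num)
    feat1 = rng.choice(FEATURES1, size=num)
    feat2 = rng.choice(FEATURES2, size=num)
    features = [
        f"name={n};age={a};feature1={f1};feature2={f2}"
        for n, a, f1, f2 in zip(names, ages, feat1, feat2)
    ]
    labels = rng.choice(LABELS, size=num)
    return pd.DataFrame({"features": features, "label": labels})


sample = make_sample_frame(1000)
print(sample.head())
```

Fixing the seed means two runs produce the same rows, which makes it easier to compare the extraction approaches on identical input.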

Assume the original data is stored in a DataFrame named df.

Solution 1 (all features are the same for every data point)

[The code for this solution was not preserved in the source.]

Edit: the one thing you need to do is edit the columns list accordingly.
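For the case where every row carries the same keys in the same order, one possible approach is to split each string on `;` and then strip the `key=` prefix from each field. The sketch below is an illustration under that assumption, not necessarily the code originally posted for this solution; the two-row `df` and the contents of `columns` are made up for the example.

```python
import pandas as pd

# Tiny stand-in for the real data: every row has the same four keys.
df = pd.DataFrame({
    "features": [
        "name=John;age=25;feature1=Post Graduate;feature2=Football Player",
        "name=Mark;age=21;feature1=High School;feature2=Driver",
    ],
    "label": [1, 3],
})

# Assumed: the keys, in the order they appear in every row.
# This is the list to edit if the data uses different keys.
columns = ["name", "age", "feature1", "feature2"]

# Split each row on ';' into one column per field.
parts = df["features"].str.split(";", expand=True)
parts.columns = columns

# Keep only the text after the first '=' in each field.
for col in columns:
    parts[col] = parts[col].str.split("=", n=1).str[1]

result = parts.join(df["label"])
print(result)
```

This relies on positional alignment, so it gives wrong columns as soon as one row has a different key or a different key order.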

Solution 2 (the features may or may not be the same for every data point)

import itertools

import numpy as np
import pandas as pd

# The following functions are meant to extract the keys from each row, which are going to be used as columns.
def extract_key(x):
    return x.split('=')[0]

def def_columns(x):
    lista = x.split(';')
    keys = [extract_key(i) for i in lista]
    return keys

df = mod_df
columns = df.features.apply(def_columns).tolist()
flattened_columns = list(itertools.chain(*columns))
flattened_columns = np.unique(np.array(flattened_columns)).tolist()
print(flattened_columns)

# This function turns each row from the original dataframe into a dictionary.
def function(x):
    lista = x.split(';')
    dict_ = {}
    for i in lista:
        # Split on the first '=' only, so values containing '=' stay intact.
        key, val = i.split('=', 1)
        dict_[key] = val
    return dict_

arr = df.features.apply(function).tolist()
# Keys missing from a row become NaN in the resulting DataFrame.
result = pd.DataFrame(arr)
print(result.head(20))
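To see what the dictionary approach produces when rows have different keys, here is a minimal, self-contained sketch on three hand-written rows. The names `rows`, `to_dict`, and `wide` are chosen for this illustration, and converting `age` to a number is an extra step assumed to be useful for training.

```python
import pandas as pd

# Three rows with different key sets, in the same 'key=value;...' format.
rows = pd.Series([
    "name=John;age=25;feature1=Post Graduate;feature2=Driver",
    "name=Mark;age=21;interest=Golf",
    "name=Kevin;age=16;smoking=No",
])


def to_dict(row):
    """Turn one 'key=value;key=value' string into a dict."""
    return dict(field.split("=", 1) for field in row.split(";"))


# A list of dicts becomes one column per distinct key;
# keys absent from a row are filled with NaN.
wide = pd.DataFrame(rows.apply(to_dict).tolist())

# All parsed values are strings, so convert numeric fields explicitly.
wide["age"] = pd.to_numeric(wide["age"])
print(wide)
```

The NaN cells mark features a row never had, which usually needs handling (for example imputation or a separate "missing" category) before the table is passed to a model.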
