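Feature Engineering
Business modeling workflow
Abstract the business problem into a classification or regression problem
Define the label to obtain y
Select suitable samples and join all available information as the feature source
Feature engineering + model training + model evaluation and tuning (these steps may feed back into one another)
Produce the model report
Deploy and monitor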
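What is a feature
In machine learning, a feature is an individual property, or a set of properties, used to explain why a phenomenon occurs. Once those properties are converted into a measurable form, they are called features.
For example, suppose you have a list of students containing each student's name, hours studied, IQ, and total score on previous exams. A new student arrives: you know their study hours and IQ, but their exam score is missing and you need to estimate it.
Here you would use IQ and study_hours to build a predictive model that estimates the missing score, so IQ and study_hours become the features of that model.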
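The student example can be sketched as a small least-squares fit. The numbers below are hypothetical toy data invented for illustration, and `predict_score` is a helper name introduced here, not something from the original material.

```python
import numpy as np

# Hypothetical toy data: columns are IQ and study_hours, target is the exam score.
X = np.array([[100, 2.0], [110, 3.0], [120, 5.0], [105, 1.0], [130, 4.0]])
y = np.array([60.0, 70.0, 85.0, 58.0, 88.0])

# Add an intercept column and fit ordinary least squares.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_score(iq, study_hours):
    """Estimate a missing exam score from the two features."""
    return float(coef[0] + coef[1] * iq + coef[2] * study_hours)

# The new student: IQ and study hours are known, the score is not.
estimate = predict_score(115, 3.5)
print(round(estimate, 1))
```

Any regression model could stand in for the linear fit; the point is only that the two known columns play the role of features and the missing score is the target.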
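What feature engineering may include
Basic feature construction
Data preprocessing
Feature derivation
Feature selection
This is one complete feature engineering workflow, but not the only one. The steps may be reordered, and the right order depends on the specific scenario.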
import pandas as pd
import numpy as np
df_train = pd.read_csv('/Users/zhucan/Desktop/金融風控實戰/第三課資料/train.csv')
df_train.head()
# Inspect the basic shape and column types of the data
df_train.shape
# (891, 12)
df_train.info()
df_train.describe()
# Box plot of Age
df_train.boxplot(column="Age")
import seaborn as sns
sns.set(color_codes=True)
np.random.seed(sum(map(ord, "distributions")))  # fix the random seed
# Histogram with a KDE curve and a rug; distplot is deprecated in recent seaborn releases
sns.histplot(df_train['Age'].dropna(), kde=True, bins=20)
sns.rugplot(df_train['Age'].dropna())
# Distinct values of the label
set(df_train.label)
# {0, 1}
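Data preprocessing
(1) Missing values
Two tools are mainly used:
pandas fillna and scikit-learn's SimpleImputer (named Imputer in older scikit-learn releases)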
df_train['Age'].sample(10)
#299 50.0
#408 21.0
#158 NaN
#672 70.0
#172 1.0
#447 34.0
#86 16.0
#824 2.0
#527 NaN
#327 36.0
#Name: Age, dtype: float64
# Fill missing ages with the column mean
df_train['Age'].fillna(value=df_train['Age'].mean()).sample(10)
#115 21.000000
#372 19.000000
#771 48.000000
#379 19.000000
#855 18.000000
#231 29.000000
#641 24.000000
#854 44.000000
#303 29.699118
#0 22.000000
#Name: Age, dtype: float64
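A common variation on the mean fill is to learn the fill value on the training data only and to keep a flag recording which rows were missing. The sketch below assumes two hypothetical toy frames (`train`, `new`) and a hypothetical flag column `Age_missing`; it uses the median rather than the mean, which is a swapped-in choice, not the method shown in the original.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames standing in for a training set and a scoring set.
train = pd.DataFrame({'Age': [22.0, np.nan, 35.0, 58.0, np.nan, 41.0]})
new = pd.DataFrame({'Age': [np.nan, 30.0]})

# Learn the fill value on the training data only, then reuse it everywhere.
age_median = train['Age'].median()

for frame in (train, new):
    # Keep a flag so a model can still see that the value was missing.
    frame['Age_missing'] = frame['Age'].isna().astype(int)
    frame['Age'] = frame['Age'].fillna(age_median)

print(age_median)
print(new)
```

Reusing the training statistic on new data keeps the scoring set from influencing the preprocessing.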
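(2) Numeric features
Numeric scaling
""" Log and similar transforms reduce the skew of a distribution:
they compress the gaps between large values and stretch the gaps between small ones """
import numpy as np
log_age = df_train['Age'].apply(lambda x: np.log(x))
df_train.loc[:, 'log_age'] = log_age
df_train.head(10)
""" Range scaling: rescale by the minimum and maximum into the interval [0, 1] """
from sklearn.preprocessing import MinMaxScaler
mm_scaler = MinMaxScaler()
fare_trans = mm_scaler.fit_transform(df_train[['Fare']])
""" Standardization: rescale each column to zero mean and unit variance (this does not make the data normally distributed) """
from sklearn.preprocessing import StandardScaler
std_scaler = StandardScaler()
fare_std_trans = std_scaler.fit_transform(df_train[['Fare']])
""" Center by the median and scale by the interquartile range, which is insensitive to outliers """
from sklearn.preprocessing import robust_scale
fare_robust_trans = robust_scale(df_train[['Fare', 'Age']])
""" Normalize each row (sample) to unit norm; the column-wise scaling into [0, 1] shown above serves a similar purpose """
from sklearn.preprocessing import Normalizer
normalizer = Normalizer()
# Normalizer does not accept NaN, so rows with a missing Age are dropped first
fare_normal_trans = normalizer.fit_transform(df_train[['Age', 'Fare']].dropna())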
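To make the difference between the column-wise scalers concrete, the arithmetic can be written out by hand. This is a sketch on hypothetical fares with one extreme value; it illustrates the formulas and is not a substitute for the scikit-learn classes.

```python
import numpy as np

# Hypothetical fares with one extreme value.
fare = np.array([7.0, 8.0, 13.0, 26.0, 512.0])

# Min-max scaling: (x - min) / (max - min), result lies in [0, 1].
fare_minmax = (fare - fare.min()) / (fare.max() - fare.min())

# Standardization: (x - mean) / std, result has mean 0 and standard deviation 1.
fare_std = (fare - fare.mean()) / fare.std()

# Robust scaling: (x - median) / IQR, driven by the middle of the data.
q1, median, q3 = np.percentile(fare, [25, 50, 75])
fare_robust = (fare - median) / (q3 - q1)

print(fare_minmax.round(3))
print(fare_std.round(3))
print(fare_robust.round(3))
```

With min-max scaling the single large fare pushes every other value close to 0, while the robust version keeps the typical fares spread out because the median and IQR ignore the extreme.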
(3) Statistics
""" Maximum and minimum """
max_age = df_train['Age'].max()
min_age = df_train["Age"].min()
""" Quantiles and outlier handling: the bluntest approach replaces values below the 1st percentile and above the 99th percentile with those two cut points """
age_quarter_01 = df_train['Age'].quantile(0.01)
age_quarter_99 = df_train['Age'].quantile(0.99)
""" Arithmetic combinations """
df_train.loc[:, 'family_size'] = df_train['SibSp'] + df_train['Parch'] + 1
df_train.loc[:, 'tmp'] = df_train['Age'] * df_train['Pclass'] + 4 * df_train['family_size']
""" Polynomial features """
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
df_train[['SibSp', 'Parch']].head()
poly_fea = poly.fit_transform(df_train[['SibSp', 'Parch']])
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
pd.DataFrame(poly_fea, columns=poly.get_feature_names_out()).head()
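The quantile step above only computes the two cut points. One way to actually apply the capping is `Series.clip`; the sketch below uses a hypothetical series of ages with two extreme values, and the name `age_capped` is introduced here for illustration.

```python
import pandas as pd

# Hypothetical ages: 0..99 plus two extreme values.
age = pd.Series(list(range(100)) + [150, 300], dtype=float)

lower = age.quantile(0.01)
upper = age.quantile(0.99)

# Values below the 1st percentile or above the 99th are pulled to the cut points.
age_capped = age.clip(lower=lower, upper=upper)

print(lower, upper)
print(age_capped.min(), age_capped.max())
```

Capping keeps every row, unlike dropping outliers, which matters when the sample is small.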
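(4) Discretization / binning / bucketing
""" Equal-width binning """
df_train.loc[:, 'fare_cut'] = pd.cut(df_train['Fare'], 20)
df_train.head()
""" Equal-frequency binning: each bin holds roughly the same number of samples """
""" Equal-frequency binning is the usual choice because it keeps the bin sizes balanced """
df_train.loc[:, 'fare_qcut'] = pd.qcut(df_train['Fare'], 10)
df_train.head()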
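The practical difference between the two binning methods shows up on skewed data. A minimal sketch on a hypothetical fare series (invented values) follows.

```python
import pandas as pd

# Hypothetical skewed fares: many small values and a few large ones.
fare = pd.Series([5, 6, 7, 8, 9, 10, 11, 12, 100, 500], dtype=float)

# Equal-width: 5 bins of the same width over [min, max].
width_counts = pd.cut(fare, 5).value_counts().sort_index()

# Equal-frequency: 5 bins holding about the same number of samples.
freq_counts = pd.qcut(fare, 5).value_counts().sort_index()

print(width_counts.tolist())
print(freq_counts.tolist())
```

Equal-width binning puts nearly every sample into the first bin here, while equal-frequency binning spreads them evenly.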
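(5) BiVar plot
""" A BiVar plot puts the feature in ascending order on the x-axis and the bad rate on the y-axis, showing how the bad rate changes with the feature """
""" Bad rate curve """
df_train = df_train.sort_values('Fare')
alist = list(set(df_train['fare_qcut']))
badrate = {}
for x in alist:
    a = df_train[df_train.fare_qcut == x]
    bad = a[a.label == 1]['label'].count()
    good = a[a.label == 0]['label'].count()
    badrate[x] = bad / (bad + good)
f = zip(badrate.keys(), badrate.values())
f = sorted(f, key=lambda x: x[1], reverse=True)
badrate = pd.DataFrame(f)
badrate.columns = pd.Series(['cut', 'badrate'])
badrate = badrate.sort_values('cut')
print(badrate)
badrate.plot("cut", "badrate", figsize=(10, 4))  # .plot is available on a DataFrame or Series
Equal-frequency binning is the usual choice. Equal-width binning is rarely used because it can leave the bins very unevenly populated.
Five or six bins are typical. Merge bins until the bad rate curve changes from non-strictly increasing to strictly increasing, while keeping the share of samples in each bin balanced.
A usable BiVar plot should:
(1) be explainable in business terms
(2) not be too flat; a variable such as zodiac sign, whose curve is flat, should be dropped
(3) become strictly monotonically increasing after coarse binning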
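The bad rate per bin and the monotonicity check can be sketched with a `groupby`. The data, the helper names `badrate_by_bin` and `strictly_monotonic`, and the bin counts are all hypothetical choices for illustration; the check accepts either direction, so a strictly decreasing curve also passes.

```python
import pandas as pd

# Hypothetical data: a numeric feature and a 0/1 label (1 = bad).
df = pd.DataFrame({
    'fare':  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'label': [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0],
})

def badrate_by_bin(frame, n_bins):
    """Bad rate per equal-frequency bin, ordered by the feature."""
    bins = pd.qcut(frame['fare'], n_bins)
    # The mean of a 0/1 label within a bin is that bin's bad rate.
    return frame.groupby(bins, observed=True)['label'].mean()

def strictly_monotonic(rates):
    """True if the bad rate moves strictly in one direction across bins."""
    steps = rates.diff().dropna()
    return bool((steps > 0).all() or (steps < 0).all())

fine = badrate_by_bin(df, 6)     # 2 samples per bin
coarse = badrate_by_bin(df, 3)   # 4 samples per bin

print(fine.tolist(), strictly_monotonic(fine))
print(coarse.tolist(), strictly_monotonic(coarse))
```

With six fine bins the curve zigzags; merging into three coarse bins yields a strictly monotonic curve, which is the effect coarse binning is meant to achieve.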
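(6) One-hot encoding
""" One-hot encoding """
""" Categorical data such as male/female is usually one-hot encoded into 0/1 columns, mainly with pd.get_dummies """
fare_qcut_oht = pd.get_dummies(df_train[['fare_qcut']])
fare_qcut_oht.head()
embarked_oht = pd.get_dummies(df_train[['Embarked']])
embarked_oht.head()
One-hot encoding can make the dimensionality too high; binning first and then one-hot encoding the bins mitigates this.
Binning loses information, but it brings stability and robustness.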
(7) Datetime data
''' Datetime handling '''
car_sales = pd.read_csv('/Users/zhucan/Desktop/金融風控實戰/第三課資料/car_data.csv')
print(car_sales.head())
car_sales.loc[:, 'date'] = pd.to_datetime(car_sales['date_t'])
print(car_sales.head())
car_sales.info()
''' The column starts out as a string and becomes a datetime after conversion '''
""" Extract the key date components """
""" Month """
car_sales.loc[:, 'month'] = car_sales['date'].dt.month
""" Day of the month """
car_sales.loc[:, 'dom'] = car_sales['date'].dt.day
""" Day of the year """
car_sales.loc[:, 'doy'] = car_sales['date'].dt.dayofyear
""" Day of the week """
car_sales.loc[:, 'dow'] = car_sales['date'].dt.dayofweek
print(car_sales.head())
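Beyond the raw components, a couple of derived date features are often built from the same accessor. The sketch below uses three hypothetical dates; `is_weekend` and `days_since_first` are illustrative names, not columns from the original data.

```python
import pandas as pd

# Hypothetical dates: a Friday, a Saturday, and the following Monday.
dates = pd.to_datetime(pd.Series(['2023-01-06', '2023-01-07', '2023-01-09']))

# dayofweek counts Monday as 0, so 5 and 6 are Saturday and Sunday.
is_weekend = (dates.dt.dayofweek >= 5).astype(int)

# Elapsed days relative to the earliest date in the series.
days_since_first = (dates - dates.min()).dt.days

print(is_weekend.tolist())
print(days_since_first.tolist())
```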
(8) Text data
""" Bag-of-words model """
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
corpus = [
    'This is a very good class',
    'students are very very very good',
    'This is the third sentence',
    'Is this the last doc',
    'PS teacher Mei is very very handsome',
]
X = vectorizer.fit_transform(corpus)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(vectorizer.get_feature_names_out())
X.toarray()
Result: each row is the word-count vector of one sample.
''' Unigrams, bigrams, and trigrams '''
vec = CountVectorizer(ngram_range=(1, 3))
X_ngram = vec.fit_transform(corpus)
print(vec.get_feature_names_out())
X_ngram.toarray()
""" TF-IDF """
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vec = TfidfVectorizer()
tfidf_X = tfidf_vec.fit_transform(corpus)
print(tfidf_vec.get_feature_names_out())
tfidf_X.toarray()
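To show what a TF-IDF weight is made of, the computation can be written out in plain Python. This sketch assumes the smoothed form idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalization of each row, which is how scikit-learn documents its defaults; it tokenizes on whitespace only, so its vocabulary will not match TfidfVectorizer exactly (for instance, it keeps the one-letter word "a"). The helper `tfidf_row` is a name introduced here.

```python
import math
from collections import Counter

corpus = [
    'this is a very good class',
    'students are very very very good',
    'this is the third sentence',
]
docs = [doc.split() for doc in corpus]
vocab = sorted({word for doc in docs for word in doc})
n_docs = len(docs)

# Document frequency: in how many documents each word appears.
doc_freq = {w: sum(w in doc for doc in docs) for w in vocab}

# Smoothed inverse document frequency (assumed formula, see the note above).
idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1 for w in vocab}

def tfidf_row(doc):
    """Raw count times idf for every vocabulary word, scaled to unit L2 norm."""
    counts = Counter(doc)
    raw = [counts[w] * idf[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

rows = [tfidf_row(doc) for doc in docs]
second = dict(zip(vocab, rows[1]))
print(round(second['very'], 3), round(second['good'], 3))
```

Words that appear in fewer documents get a larger idf, and repeated words within a document get a larger weight, which is why "very" outweighs "good" in the second sentence.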
Visualization
""" A word cloud shows at a glance which words carry the most weight """
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
corpus = [
    'This is a very good class',
    'students are very very very good',
    'This is the third sentence',
    'Is this the last doc',
    'teacher Mei is very very handsome',
]
X = vectorizer.fit_transform(corpus)
# Total count of each word across all documents
value = X.toarray().sum(axis=0).tolist()
# This import and the add() call follow the pyecharts 0.5.x API
from pyecharts import WordCloud
wordcloud = WordCloud(width=800, height=500)
# Pass the words and their total counts to the chart
wordcloud.add('', list(vectorizer.get_feature_names_out()), value, word_size_range=[20, 100])
wordcloud
(9) Combined features
""" Derive a combined feature from a condition """
df_train.loc[:,'alone'] = (df_train['SibSp']==0)&(df_train['Parch']==0)
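Deriving features from time series
import pandas as pd
import numpy as np
data = pd.read_excel('/Users/zhucan/Desktop/金融風控實戰/第三課資料/textdata.xlsx')
data.head()
""" ft and gt are two variable names; the suffixes 1-12 give the value for each of the 12 months """
''' ft1 is the number of refuelings computed from the data within one month before the application date '''
''' gt1 is the refueling amount computed from the data within one month before the application date '''
""" Deriving features from time series """
""" Number of months in the last p months with inv > 0, where inv is the variable name passed in """
def Num(data, inv, p):
    df = data.loc[:, inv + '1':inv + str(p)]
    auto_value = np.where(df > 0, 1, 0).sum(axis=1)
    return data, inv + '_num' + str(p), auto_value

data_new = data.copy()
for p in range(1, 12):
    for inv in ['ft', 'gt']:
        data_new, columns_name, values = Num(data_new, inv, p)
        data_new[columns_name] = values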
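The window logic is easier to follow on a tiny frame. The sketch below assumes the same column convention (ft1 is the most recent month) but uses hypothetical values; `window` is an illustrative helper that selects the columns by explicit name rather than by a label slice.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly refueling counts; ft1 is the most recent month.
toy = pd.DataFrame({
    'ft1': [0, 2, 0],
    'ft2': [3, 1, 0],
    'ft3': [0, 4, 0],
    'ft4': [5, 0, 0],
})

def window(frame, inv, p):
    """Columns inv1..invp, i.e. the most recent p months."""
    return frame[[inv + str(i) for i in range(1, p + 1)]]

recent = window(toy, 'ft', 3)

# Number of months in the window with any activity.
active_months = (recent > 0).sum(axis=1)

# 1-based position of the most recent active month, or 0 if there is none.
flags = (recent > 0).to_numpy()
months_since = np.where(flags.any(axis=1), flags.argmax(axis=1) + 1, 0)

print(active_months.tolist())
print(months_since.tolist())
```

Selecting columns by name does not depend on the physical column order, whereas a label slice such as `'ft1':'ft3'` assumes the monthly columns sit next to each other in the frame.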
''' Time-series derived features: 35 feature functions plus a batch driver '''
import numpy as np
import pandas as pd

class time_series_feature(object):
    def __init__(self):
        pass

    def _first_hit(self, flags, default):
        """1-based position of the first True in each row, or default if the row has none."""
        flags = np.asarray(flags, dtype=bool)
        return np.where(flags.any(axis=1), flags.argmax(axis=1) + 1, default)

    def _diffs(self, data, inv, p):
        """Each month minus the month before it; column 1 is the most recent month."""
        arr = np.array(data.loc[:, inv + '1':inv + str(p)])
        return arr[:, :-1] - arr[:, 1:]

    def Num(self, data, inv, p):
        """Number of months in the last p months with inv > 0."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = np.where(df > 0, 1, 0).sum(axis=1)
        return inv + '_num' + str(p), auto_value

    def Nmz(self, data, inv, p):
        """Number of months in the last p months with inv == 0."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = np.where(df == 0, 1, 0).sum(axis=1)
        return inv + '_nmz' + str(p), auto_value

    def Evr(self, data, inv, p):
        """1 if at least one of the last p months has inv > 0, else 0."""
        df = data.loc[:, inv + '1':inv + str(p)]
        arr = np.where(df > 0, 1, 0).sum(axis=1)
        auto_value = np.where(arr, 1, 0)
        return inv + '_evr' + str(p), auto_value

    def Avg(self, data, inv, p):
        """Mean of inv over the last p months."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = np.nanmean(df, axis=1)
        return inv + '_avg' + str(p), auto_value

    def Tot(self, data, inv, p):
        """Sum of inv over the last p months."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = np.nansum(df, axis=1)
        return inv + '_tot' + str(p), auto_value

    def Tot2T(self, data, inv, p):
        """Sum of inv over months 2 to p+1; compared with Tot it shows how the variable fluctuates."""
        df = data.loc[:, inv + '2':inv + str(p + 1)]
        auto_value = df.sum(1)
        return inv + '_tot2t' + str(p), auto_value

    def Max(self, data, inv, p):
        """Maximum of inv over the last p months."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = np.nanmax(df, axis=1)
        return inv + '_max' + str(p), auto_value

    def Min(self, data, inv, p):
        """Minimum of inv over the last p months."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = np.nanmin(df, axis=1)
        return inv + '_min' + str(p), auto_value

    def Msg(self, data, inv, p):
        """Months since the most recent inv > 0 within the last p months (0 if none)."""
        df = data.loc[:, inv + '1':inv + str(p)]
        # The original returned the string '0' when no month qualified; an int keeps the column numeric
        return inv + '_msg' + str(p), self._first_hit(df > 0, 0)

    def Msz(self, data, inv, p):
        """Months since the most recent inv == 0 within the last p months (0 if none)."""
        df = data.loc[:, inv + '1':inv + str(p)]
        return inv + '_msz' + str(p), self._first_hit(df == 0, 0)

    def Cav(self, data, inv, p):
        """Current-month inv / mean of inv over the last p months."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = df[inv + '1'] / np.nanmean(df, axis=1)
        return inv + '_cav' + str(p), auto_value

    def Cmn(self, data, inv, p):
        """Current-month inv / minimum of inv over the last p months."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = df[inv + '1'] / np.nanmin(df, axis=1)
        return inv + '_cmn' + str(p), auto_value

    def Mai(self, data, inv, p):
        """Largest month-over-month increase of inv within the last p months."""
        auto_value = np.nanmax(self._diffs(data, inv, p), axis=1)
        return inv + '_mai' + str(p), auto_value

    def Mad(self, data, inv, p):
        """Largest month-over-month decrease of inv within the last p months."""
        auto_value = np.nanmax(-self._diffs(data, inv, p), axis=1)
        return inv + '_mad' + str(p), auto_value

    def Std(self, data, inv, p):
        """Standard deviation of inv over the last p months."""
        df = data.loc[:, inv + '1':inv + str(p)]
        # The original called np.nanvar, which returns the variance
        auto_value = np.nanstd(df, axis=1)
        return inv + '_std' + str(p), auto_value

    def Cva(self, data, inv, p):
        """Coefficient of variation (std / mean) of inv over the last p months."""
        df = data.loc[:, inv + '1':inv + str(p)]
        # The original computed mean / variance, which is not the coefficient of variation
        auto_value = np.nanstd(df, axis=1) / np.nanmean(df, axis=1)
        return inv + '_cva' + str(p), auto_value

    def Cmm(self, data, inv, p):
        """Current-month inv - mean of inv over the last p months."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = df[inv + '1'] - np.nanmean(df, axis=1)
        return inv + '_cmm' + str(p), auto_value

    def Cnm(self, data, inv, p):
        """Current-month inv - minimum of inv over the last p months."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = df[inv + '1'] - np.nanmin(df, axis=1)
        return inv + '_cnm' + str(p), auto_value

    def Cxm(self, data, inv, p):
        """Current-month inv - maximum of inv over the last p months."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = df[inv + '1'] - np.nanmax(df, axis=1)
        return inv + '_cxm' + str(p), auto_value

    def Cxp(self, data, inv, p):
        """(Current-month inv - maximum over the last p months) / that maximum."""
        df = data.loc[:, inv + '1':inv + str(p)]
        # The original used np.nanmin here, which contradicts the description
        temp = np.nanmax(df, axis=1)
        auto_value = (df[inv + '1'] - temp) / temp
        return inv + '_cxp' + str(p), auto_value

    def Ran(self, data, inv, p):
        """Range (max - min) of inv over the last p months."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = np.nanmax(df, axis=1) - np.nanmin(df, axis=1)
        return inv + '_ran' + str(p), auto_value

    def Nci(self, data, inv, p):
        """Number of months in the last min(time on book, p) months where inv rose from the month before."""
        auto_value = (self._diffs(data, inv, p) > 0).sum(axis=1)
        return inv + '_nci' + str(p), auto_value

    def Ncd(self, data, inv, p):
        """Number of months in the last min(time on book, p) months where inv fell from the month before."""
        auto_value = (self._diffs(data, inv, p) < 0).sum(axis=1)
        return inv + '_ncd' + str(p), auto_value

    def Ncn(self, data, inv, p):
        """Number of months in the last min(time on book, p) months where inv equalled the month before."""
        auto_value = (self._diffs(data, inv, p) == 0).sum(axis=1)
        return inv + '_ncn' + str(p), auto_value

    def Bup(self, data, inv, p):
        """1 if inv[i] > inv[i+1] for every month i in the window (strictly increasing) and inv > 0, else 0.
        The original loop reset its counter with `index = + 1` and never returned 1, so this follows the description."""
        arr = np.array(data.loc[:, inv + '1':inv + str(p)])
        rising = (arr[:, :-1] > arr[:, 1:]).all(axis=1)
        auto_value = np.where(rising & (arr > 0).all(axis=1), 1, 0)
        return inv + '_bup' + str(p), auto_value

    def Pdn(self, data, inv, p):
        """1 if inv[i] < inv[i+1] for every month i in the window (strictly decreasing) and inv > 0, else 0.
        Rewritten from the description for the same reason as Bup."""
        arr = np.array(data.loc[:, inv + '1':inv + str(p)])
        falling = (arr[:, :-1] < arr[:, 1:]).all(axis=1)
        auto_value = np.where(falling & (arr > 0).all(axis=1), 1, 0)
        return inv + '_pdn' + str(p), auto_value

    def Trm(self, data, inv, p):
        """Trimmed mean of inv over the last min(time on book, p) months (one max and one min removed)."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = []
        for i in range(len(df)):
            trm_mean = list(df.iloc[i, :])
            trm_mean.remove(np.nanmax(trm_mean))
            trm_mean.remove(np.nanmin(trm_mean))
            auto_value.append(np.nanmean(trm_mean))
        return inv + '_trm' + str(p), auto_value

    def Cmx(self, data, inv, p):
        """Current-month inv / maximum of inv over the last p months."""
        df = data.loc[:, inv + '1':inv + str(p)]
        # The original computed (current - max) / max, which duplicates Cxp
        auto_value = df[inv + '1'] / np.nanmax(df, axis=1)
        return inv + '_cmx' + str(p), auto_value

    def Cmp(self, data, inv, p):
        """(Current-month inv - mean over the last p months) / that mean."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = (df[inv + '1'] - np.nanmean(df, axis=1)) / np.nanmean(df, axis=1)
        return inv + '_cmp' + str(p), auto_value

    def Cnp(self, data, inv, p):
        """(Current-month inv - minimum over the last p months) / that minimum."""
        df = data.loc[:, inv + '1':inv + str(p)]
        auto_value = (df[inv + '1'] - np.nanmin(df, axis=1)) / np.nanmin(df, axis=1)
        return inv + '_cnp' + str(p), auto_value

    def Msx(self, data, inv, p):
        """Months since the month with the largest inv in the last min(time on book, p) months."""
        arr = np.array(data.loc[:, inv + '1':inv + str(p)], dtype=float)
        flags = arr == np.nanmax(arr, axis=1)[:, None]
        return inv + '_msx' + str(p), self._first_hit(flags, p + 1)

    def Rpp(self, data, inv, p):
        """Mean of inv over the last p months / mean of inv over months p to 2p."""
        df1 = data.loc[:, inv + '1':inv + str(p)]
        value1 = np.nanmean(df1, axis=1)
        df2 = data.loc[:, inv + str(p):inv + str(2 * p)]
        value2 = np.nanmean(df2, axis=1)
        return inv + '_rpp' + str(p), value1 / value2

    def Dpp(self, data, inv, p):
        """Mean of inv over the last p months - mean of inv over months p to 2p."""
        df1 = data.loc[:, inv + '1':inv + str(p)]
        value1 = np.nanmean(df1, axis=1)
        df2 = data.loc[:, inv + str(p):inv + str(2 * p)]
        value2 = np.nanmean(df2, axis=1)
        return inv + '_dpp' + str(p), value1 - value2

    def Mpp(self, data, inv, p):
        """Maximum of inv over the last p months / maximum of inv over months p to 2p."""
        df1 = data.loc[:, inv + '1':inv + str(p)]
        value1 = np.nanmax(df1, axis=1)
        df2 = data.loc[:, inv + str(p):inv + str(2 * p)]
        value2 = np.nanmax(df2, axis=1)
        return inv + '_mpp' + str(p), value1 / value2

    def Npp(self, data, inv, p):
        """Minimum of inv over the last p months / minimum of inv over months p to 2p."""
        df1 = data.loc[:, inv + '1':inv + str(p)]
        value1 = np.nanmin(df1, axis=1)
        df2 = data.loc[:, inv + str(p):inv + str(2 * p)]
        value2 = np.nanmin(df2, axis=1)
        return inv + '_npp' + str(p), value1 / value2

    def auto_var(self, data_new, inv, p):
        """Call every feature function for (inv, p) and add each result as a new column."""
        names = ['Num', 'Nmz', 'Evr', 'Avg', 'Tot', 'Tot2T', 'Max', 'Min', 'Msg', 'Msz',
                 'Cav', 'Cmn', 'Std', 'Cva', 'Cmm', 'Cnm', 'Cxm', 'Cxp', 'Ran', 'Nci',
                 'Ncd', 'Ncn', 'Pdn', 'Cmx', 'Cmp', 'Cnp', 'Msx', 'Trm', 'Bup', 'Mai',
                 'Mad', 'Rpp', 'Dpp', 'Mpp', 'Npp']
        for name in names:
            try:
                columns_name, values = getattr(self, name)(data_new, inv, p)
            except (KeyError, ValueError):
                # Skip a feature that cannot be computed for this p, for example
                # when its window reaches past the last month column.
                # The original wrapped all calls in one bare try/except, so the
                # first failure silently dropped every feature after it.
                continue
            data_new[columns_name] = values
        return data_new

auto_var2 = time_series_feature()
for p in range(1, 12):
    for inv in ['ft', 'gt']:
        data = auto_var2.auto_var(data, inv, p)
data