
Financial Risk Control in Practice: Feature Engineering (Part 1)


Feature Engineering

Business modeling workflow

  • Abstract the business problem as a classification or regression task
  • Define the label and obtain y
  • Select suitable samples and match in all available information as the source of features
  • Feature engineering + model training + model evaluation and tuning (these steps can feed back into one another)
  • Produce the model report
  • Deploy and monitor

What is a feature?

    In the context of machine learning, a feature is an individual property, or a set of properties, used to explain the phenomenon being observed. Once those properties are expressed in some measurable form, they are called features.

    For example, suppose you have a list of students that records each student's name, hours of study, IQ and total score on previous exams. A new student arrives: you know his or her hours of study and IQ, but the exam score is missing, and you need to estimate the score he or she is likely to get.

    Here you would use IQ and study_hours to build a model that estimates the missing score, so IQ and study_hours become the features of that model.
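    As a minimal sketch of that setup (the column names IQ, study_hours and score follow the example above; the numbers are made up purely for illustration):

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Toy training data for the example above.
    students = pd.DataFrame({
        'IQ':          [110, 95, 120, 100, 105],
        'study_hours': [10,   4,  12,   6,   8],
        'score':       [88,  60,  95,  70,  78],
    })

    # The two features explain the target we want to estimate.
    model = LinearRegression()
    model.fit(students[['IQ', 'study_hours']], students['score'])

    # Estimate the missing score for a new student with IQ 102 who studied 7 hours.
    print(model.predict(pd.DataFrame({'IQ': [102], 'study_hours': [7]})))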

    What feature engineering can include

  • Basic feature construction

  • Data preprocessing

  • Feature derivation

  • Feature selection

    This is one complete feature engineering pipeline, but it is not the only one: the steps may swap order, and the specific scenario has to be analysed on its own terms.

    import pandas as pd
    import numpy as np

    df_train = pd.read_csv('/Users/zhucan/Desktop/金融風控實戰/第三課資料/train.csv')
    df_train.head()

    Output:

    # A quick look at the data
    df_train.shape
    #(891, 12)
    df_train.info()

    Output:

    df_train.describe()

    Output:

    # Box plot
    df_train.boxplot(column = "Age")

    Output:

    import seaborn as sns
    sns.set(color_codes = True)
    np.random.seed(sum(map(ord, "distributions")))  # fix the random seed
    # note: newer seaborn versions deprecate distplot in favour of histplot/displot
    sns.distplot(df_train.Age, kde = True, bins = 20, rug = True)

    Output:

    set(df_train.label) #{0, 1}

    Data preprocessing

    (1) Missing values

    The two main tools:

  • pandas fillna
  • sklearn Imputer (SimpleImputer in current scikit-learn releases)

    df_train['Age'].sample(10)
    #299    50.0
    #408    21.0
    #158     NaN
    #672    70.0
    #172     1.0
    #447    34.0
    #86     16.0
    #824     2.0
    #527     NaN
    #327    36.0
    #Name: Age, dtype: float64

    df_train['Age'].fillna(value=df_train['Age'].mean()).sample(10)
    #115    21.000000
    #372    19.000000
    #771    48.000000
    #379    19.000000
    #855    18.000000
    #231    29.000000
    #641    24.000000
    #854    44.000000
    #303    29.699118
    #0      22.000000
    #Name: Age, dtype: float64
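    The same mean imputation can also be done on the sklearn side, which is convenient inside a pipeline. A minimal sketch using the current SimpleImputer API (the older sklearn.preprocessing.Imputer named above has been removed in recent releases):

    from sklearn.impute import SimpleImputer

    # Mean imputation on Age; strategy could also be 'median' or 'most_frequent'.
    imputer = SimpleImputer(strategy='mean')
    age_imputed = imputer.fit_transform(df_train[['Age']])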

    (2) Numerical features

    Value scaling

    """取對數等變換,可以對分布做一定的緩解 可以讓數值間的差異變大""" import numpy as np log_age = df_train['Age'].apply(lambda x:np.log(x)) df_train.loc[:,'log_age'] = log_age df_train.head(10)

    結果:

    """ 幅度縮放,最大最小值縮放到[0,1]區間內 """ from sklearn.preprocessing import MinMaxScaler mm_scaler = MinMaxScaler() fare_trans = mm_scaler.fit_transform(df_train[['Fare']])""" 幅度縮放,將每一列的數據標準化為正態分布 """ from sklearn.preprocessing import StandardScaler std_scaler = StandardScaler() fare_std_trans = std_scaler.fit_transform(df_train[['Fare']])""" 中位數或者四分位數去中心化數據,對異常值不敏感 """ from sklearn.preprocessing import robust_scale fare_robust_trans = robust_scale(df_train[['Fare','Age']])""" 將同一行數據規范化,前面的同一變為1以內也可以達到這樣的效果 """ from sklearn.preprocessing import Normalizer normalizer = Normalizer() fare_normal_trans = normalizer.fit_transform(df_train[['Age','Fare']])

    (3) Statistics-based features

    """ 最大最小值 """ max_age = df_train['Age'].max() min_age = df_train["Age"].min()""" 分位數,極值處理,我們最粗暴的方法就是將前后1%的值替換成前后兩個端點的值 """ age_quarter_01 = df_train['Age'].quantile(0.01) age_quarter_99 = df_train['Age'].quantile(0.99)""" 四則運算 """ df_train.loc[:,'family_size'] = df_train['SibSp']+df_train['Parch']+1 df_train.loc[:,'tmp'] = df_train['Age']*df_train['Pclass'] + 4*df_train['family_size']""" 多項式特征 """ from sklearn.preprocessing import PolynomialFeatures poly = PolynomialFeatures(degree=2) df_train[['SibSp','Parch']].head() poly_fea = poly.fit_transform(df_train[['SibSp','Parch']]) pd.DataFrame(poly_fea,columns = poly.get_feature_names()).head()

    (4) Discretization / binning / bucketing

    """ 等距切分 """ df_train.loc[:, 'fare_cut'] = pd.cut(df_train['Fare'], 20) df_train.head()""" 等頻切分做切分,但是每一部分的人數是差不多的""" """ 通常情況都是使用等頻分箱,讓每個區間人數差不多""" df_train.loc[:,'fare_qcut'] = pd.qcut(df_train['Fare'], 10) df_train.head()

    結果:?

    (5) BiVar plots

    """ BiVar圖是指橫軸為特征升序,縱軸為badrate的變化趨勢 """ """ badrate曲線 """ df_train = df_train.sort_values('Fare') alist = list(set(df_train['fare_qcut'])) badrate = {} for x in alist:a = df_train[df_train.fare_qcut == x]bad = a[a.label == 1]['label'].count()good = a[a.label == 0]['label'].count()badrate[x] = bad/(bad+good) f = zip(badrate.keys(),badrate.values()) f = sorted(f,key = lambda x : x[1],reverse = True ) badrate = pd.DataFrame(f) badrate.columns = pd.Series(['cut','badrate']) badrate = badrate.sort_values('cut') print(badrate) badrate.plot("cut","badrate",figsize=(10,4)) #.plot用于前面是dataframe,series

    結果:

    Equal-frequency binning is the usual choice; equal-width binning is rarely used because it can leave the bins very unevenly populated.

    Typically 5-6 bins are used, coarse enough to turn a bad-rate curve that is not strictly monotonic into a strictly monotonic one, while also keeping the bin proportions reasonably balanced.

    For BiVar plots: (1) the trend should be explainable from the business; (2) a curve that is too flat is also bad (as with a zodiac-sign style variable) and the feature should be dropped; (3) use coarse binning to make the BiVar curve strictly monotonic, as sketched below.
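    A minimal sketch of that coarse-binning step, under the assumption that we already have the per-bin bad rates (in feature order) plus the per-bin sample counts: adjacent bins are pooled greedily whenever the bad rate breaks monotonicity. This only illustrates the idea, not the chi-square or best-KS merging usually used in production scorecards.

    def merge_until_monotonic(rates, counts):
        """Greedily pool adjacent bins until the bad-rate sequence is non-decreasing.
        rates: bad rate per bin, ordered by the feature; counts: sample count per bin."""
        rates, counts = list(rates), list(counts)
        i = 0
        while i < len(rates) - 1:
            if rates[i + 1] < rates[i]:          # monotonicity broken at this pair
                total = counts[i] + counts[i + 1]
                merged = (rates[i] * counts[i] + rates[i + 1] * counts[i + 1]) / total
                rates[i:i + 2] = [merged]        # pool the two bins
                counts[i:i + 2] = [total]
                i = max(i - 1, 0)                # re-check against the previous bin
            else:
                i += 1
        return rates, counts

    Applied to the ten fare_qcut bins above, this kind of pooling typically ends up with the 5-6 coarse bins mentioned earlier.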

    (6) One-hot encoding

    """ OneHot encoding/獨熱向量編碼 """ """ 一般像男、女這種二分類categories類型的數據采取獨熱向量編碼, 轉化為0、1主要用到 pd.get_dummies """ fare_qcut_oht = pd.get_dummies(df_train[['fare_qcut']]) fare_qcut_oht.head() embarked_oht = pd.get_dummies(df_train[['Embarked']]) embarked_oht.head()

    結果:?

    One-hot encoding can blow up the dimensionality; binning the variable first and then one-hot encoding the bins keeps this under control, as in the sketch below.

    Binning loses some information, but it buys stability and robustness.
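    A minimal sketch of that combination on the Fare column used above (the choice of 5 bins is arbitrary, for illustration only):

    # Bin Fare into 5 equal-frequency buckets, then one-hot encode the buckets:
    # 5 dummy columns instead of one dummy per distinct fare value.
    fare_bins = pd.qcut(df_train['Fare'], 5)
    fare_bin_oht = pd.get_dummies(fare_bins, prefix='fare_bin')
    fare_bin_oht.head()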

    (7) Date/time features

    # Dates and times
    car_sales = pd.read_csv('/Users/zhucan/Desktop/金融風控實戰/第三課資料/car_data.csv')
    print(car_sales.head())

    car_sales.loc[:, 'date'] = pd.to_datetime(car_sales['date_t'])
    print(car_sales.head())

    Output:

    car_sales.info()  # the original column is a string; after the conversion it is a datetime

    Output:

    """ 取出關鍵時間信息 """ """ 月份 """ car_sales.loc[:,'month'] = car_sales['date'].dt.month """ 幾號 """ car_sales.loc[:,'dom'] = car_sales['date'].dt.day """ 一年當中第幾天 """ car_sales.loc[:,'doy'] = car_sales['date'].dt.dayofyear """ 星期幾 """ car_sales.loc[:,'dow'] = car_sales['date'].dt.dayofweek print(car_sales.head())

    結果:
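    A few more date-derived features are often added at this point; a small sketch on the same 'date' column (the new column names are illustrative, not part of the original notebook):

    # Weekend flag, quarter, and a simple recency-style feature.
    car_sales.loc[:, 'is_weekend'] = (car_sales['date'].dt.dayofweek >= 5).astype(int)
    car_sales.loc[:, 'quarter'] = car_sales['date'].dt.quarter
    car_sales.loc[:, 'days_since_start'] = (car_sales['date'] - car_sales['date'].min()).dt.days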

    (8) Text features

    """ 詞袋模型 """ from sklearn.feature_extraction.text import CountVectorizer vectorizer = CountVectorizer() corpus = ['This is a very good class','students are very very very good','This is the third sentence','Is this the last doc','PS teacher Mei is very very handsome' ] X = vectorizer.fit_transform(corpus) print(vectorizer.get_feature_names()) X.toarray()

    結果:
    可以得到樣本的詞向量

    # Unigrams, bigrams and trigrams
    vec = CountVectorizer(ngram_range=(1, 3))
    X_ngram = vec.fit_transform(corpus)
    print(vec.get_feature_names())
    X_ngram.toarray()

    Output:

    """ TF-IDF """ from sklearn.feature_extraction.text import TfidfVectorizer tfidf_vec = TfidfVectorizer() tfidf_X = tfidf_vec.fit_transform(corpus) print(tfidf_vec.get_feature_names()) tfidf_X.toarray()

    ?結果:

    Visualization

    """ 詞云圖可以直觀的反應哪些詞作用權重比較大 """ from sklearn.feature_extraction.text import CountVectorizer vectorizer = CountVectorizer() corpus = ['This is a very good class','students are very very very good','This is the third sentence','Is this the last doc','teacher Mei is very very handsome' ] X = vectorizer.fit_transform(corpus) L = []for item in list(X.toarray()):L.append(list(item))value = [0 for i in range(len(L[0]))] for i in range(len(L[0])):for j in range(len(L)):value[i] += L[j][i]from pyecharts import WordCloud wordcloud = WordCloud(width=800,height=500) #這里是需要做的 wordcloud.add('',vectorizer.get_feature_names(),value,word_size_range=[20,100]) wordcloud

    結果:

    (9) Combined features

    """ 根據條件去判斷獲取組合特征 """ df_train.loc[:,'alone'] = (df_train['SibSp']==0)&(df_train['Parch']==0)

    Feature derivation from time series

    import pandas as pd
    import numpy as np

    data = pd.read_excel('/Users/zhucan/Desktop/金融風控實戰/第三課資料/textdata.xlsx')
    data.head()
    # ft and gt are two variable-name prefixes; the suffixes 1-12 index the value for each of the last 12 months
    # ft1 is the number of refuellings computed from the data within one month of the application date
    # gt1 is the refuelling amount computed from the data within one month of the application date

    Output:

    """ 基于時間序列進行特征衍生 """ """ 最近p個月,inv>0的月份數 inv表示傳入的變量名 """ def Num(data,inv,p):df=data.loc[:,inv+'1':inv+str(p)]auto_value=np.where(df>0,1,0).sum(axis=1)return data,inv+'_num'+str(p),auto_valuedata_new = data.copy()for p in range(1,12):for inv in ['ft','gt']:data_new,columns_name,values=Num(data_new,inv,p)data_new[columns_name]=values

    結果:

    # 37 time-series derivation functions, wrapped in a class

    import numpy as np
    import pandas as pd

    class time_series_feature(object):
        def __init__(self):
            pass

        def Num(self, data, inv, p):
            """Number of months in the last p months with inv > 0."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.where(df > 0, 1, 0).sum(axis=1)
            return inv + '_num' + str(p), auto_value

        def Nmz(self, data, inv, p):
            """Number of months in the last p months with inv = 0."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.where(df == 0, 1, 0).sum(axis=1)
            return inv + '_nmz' + str(p), auto_value

        def Evr(self, data, inv, p):
            """Whether at least one of the last p months has inv > 0."""
            df = data.loc[:, inv + '1':inv + str(p)]
            arr = np.where(df > 0, 1, 0).sum(axis=1)
            auto_value = np.where(arr, 1, 0)
            return inv + '_evr' + str(p), auto_value

        def Avg(self, data, inv, p):
            """Mean of inv over the last p months."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nanmean(df, axis=1)
            return inv + '_avg' + str(p), auto_value

        def Tot(self, data, inv, p):
            """Sum of inv over the last p months."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nansum(df, axis=1)
            return inv + '_tot' + str(p), auto_value

        def Tot2T(self, data, inv, p):
            """Sum of inv over months 2..p+1, which reflects the volatility of the variable."""
            df = data.loc[:, inv + '2':inv + str(p + 1)]
            auto_value = df.sum(axis=1)
            return inv + '_tot2t' + str(p), auto_value

        def Max(self, data, inv, p):
            """Maximum of inv over the last p months."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nanmax(df, axis=1)
            return inv + '_max' + str(p), auto_value

        def Min(self, data, inv, p):
            """Minimum of inv over the last p months."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nanmin(df, axis=1)
            return inv + '_min' + str(p), auto_value

        def Msg(self, data, inv, p):
            """Months since the most recent month with inv > 0 (0 if it never happened)."""
            df = data.loc[:, inv + '1':inv + str(p)]
            df_value = np.where(df > 0, 1, 0)
            auto_value = []
            for i in range(len(df_value)):
                row_value = df_value[i, :]
                if row_value.max() <= 0:
                    auto_value.append(0)
                else:
                    indexs = 1
                    for j in row_value:
                        if j > 0:
                            break
                        indexs += 1
                    auto_value.append(indexs)
            return inv + '_msg' + str(p), auto_value

        def Msz(self, data, inv, p):
            """Months since the most recent month with inv = 0 (0 if it never happened)."""
            df = data.loc[:, inv + '1':inv + str(p)]
            df_value = np.where(df == 0, 1, 0)
            auto_value = []
            for i in range(len(df_value)):
                row_value = df_value[i, :]
                if row_value.max() <= 0:
                    auto_value.append(0)
                else:
                    indexs = 1
                    for j in row_value:
                        if j > 0:
                            break
                        indexs += 1
                    auto_value.append(indexs)
            return inv + '_msz' + str(p), auto_value

        def Cav(self, data, inv, p):
            """Current month's inv / mean of inv over the last p months."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = df[inv + '1'] / np.nanmean(df, axis=1)
            return inv + '_cav' + str(p), auto_value

        def Cmn(self, data, inv, p):
            """Current month's inv / minimum of inv over the last p months."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = df[inv + '1'] / np.nanmin(df, axis=1)
            return inv + '_cmn' + str(p), auto_value

        def Mai(self, data, inv, p):
            """Largest month-over-month increase of inv within the last p months."""
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                value_lst = []
                for k in range(len(df_value) - 1):
                    minus = df_value[k] - df_value[k + 1]
                    value_lst.append(minus)
                auto_value.append(np.nanmax(value_lst))
            return inv + '_mai' + str(p), auto_value

        def Mad(self, data, inv, p):
            """Largest month-over-month decrease of inv within the last p months."""
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                value_lst = []
                for k in range(len(df_value) - 1):
                    minus = df_value[k + 1] - df_value[k]
                    value_lst.append(minus)
                auto_value.append(np.nanmax(value_lst))
            return inv + '_mad' + str(p), auto_value

        def Std(self, data, inv, p):
            """Dispersion of inv over the last p months (np.nanvar, i.e. the variance)."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nanvar(df, axis=1)
            return inv + '_std' + str(p), auto_value

        def Cva(self, data, inv, p):
            """Mean / variance of inv over the last p months (a coefficient-of-variation style ratio)."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nanmean(df, axis=1) / np.nanvar(df, axis=1)
            return inv + '_cva' + str(p), auto_value

        def Cmm(self, data, inv, p):
            """Current month's inv minus the mean of inv over the last p months."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = df[inv + '1'] - np.nanmean(df, axis=1)
            return inv + '_cmm' + str(p), auto_value

        def Cnm(self, data, inv, p):
            """Current month's inv minus the minimum of inv over the last p months."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = df[inv + '1'] - np.nanmin(df, axis=1)
            return inv + '_cnm' + str(p), auto_value

        def Cxm(self, data, inv, p):
            """Current month's inv minus the maximum of inv over the last p months."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = df[inv + '1'] - np.nanmax(df, axis=1)
            return inv + '_cxm' + str(p), auto_value

        def Cxp(self, data, inv, p):
            """(Current month's inv - min of inv over the last p months) / that min."""
            df = data.loc[:, inv + '1':inv + str(p)]
            temp = np.nanmin(df, axis=1)
            auto_value = (df[inv + '1'] - temp) / temp
            return inv + '_cxp' + str(p), auto_value

        def Ran(self, data, inv, p):
            """Range (max - min) of inv over the last p months."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = np.nanmax(df, axis=1) - np.nanmin(df, axis=1)
            return inv + '_ran' + str(p), auto_value

        def Nci(self, data, inv, p):
            """Number of months within min(time on book, p) where inv grew versus the previous month."""
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                value_lst = []
                for k in range(len(df_value) - 1):
                    value_lst.append(df_value[k] - df_value[k + 1])
                auto_value.append(np.where(np.array(value_lst) > 0, 1, 0).sum())
            return inv + '_nci' + str(p), auto_value

        def Ncd(self, data, inv, p):
            """Number of months within min(time on book, p) where inv fell versus the previous month."""
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                value_lst = []
                for k in range(len(df_value) - 1):
                    value_lst.append(df_value[k] - df_value[k + 1])
                auto_value.append(np.where(np.array(value_lst) < 0, 1, 0).sum())
            return inv + '_ncd' + str(p), auto_value

        def Ncn(self, data, inv, p):
            """Number of adjacent month pairs within min(time on book, p) where inv stayed equal."""
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                value_lst = []
                for k in range(len(df_value) - 1):
                    value_lst.append(df_value[k] - df_value[k + 1])
                auto_value.append(np.where(np.array(value_lst) == 0, 1, 0).sum())
            return inv + '_ncn' + str(p), auto_value

        def Bup(self, data, inv, p):
            """Flag = 1 if inv is strictly increasing month over month (inv1 > inv2 > ... > invp), else 0."""
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                index = 0
                for k in range(len(df_value) - 1):
                    if df_value[k] <= df_value[k + 1]:   # strict increase broken at this pair
                        break
                    index += 1
                value = 1 if index == p - 1 else 0       # all p-1 adjacent pairs passed
                auto_value.append(value)
            return inv + '_bup' + str(p), auto_value

        def Pdn(self, data, inv, p):
            """Flag = 1 if inv is strictly decreasing month over month (inv1 < inv2 < ... < invp), else 0."""
            arr = np.array(data.loc[:, inv + '1':inv + str(p)])
            auto_value = []
            for i in range(len(arr)):
                df_value = arr[i, :]
                index = 0
                for k in range(len(df_value) - 1):
                    if df_value[k] >= df_value[k + 1]:   # strict decrease broken at this pair
                        break
                    index += 1
                value = 1 if index == p - 1 else 0
                auto_value.append(value)
            return inv + '_pdn' + str(p), auto_value

        def Trm(self, data, inv, p):
            """Trimmed mean of inv over the last min(time on book, p) months (drop one max and one min)."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = []
            for i in range(len(df)):
                trm_mean = list(df.loc[i, :])
                trm_mean.remove(np.nanmax(trm_mean))
                trm_mean.remove(np.nanmin(trm_mean))
                auto_value.append(np.nanmean(trm_mean))
            return inv + '_trm' + str(p), auto_value

        def Cmx(self, data, inv, p):
            """(Current month's inv - max of inv over the last p months) / that max."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = (df[inv + '1'] - np.nanmax(df, axis=1)) / np.nanmax(df, axis=1)
            return inv + '_cmx' + str(p), auto_value

        def Cmp(self, data, inv, p):
            """(Current month's inv - mean of inv over the last p months) / that mean."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = (df[inv + '1'] - np.nanmean(df, axis=1)) / np.nanmean(df, axis=1)
            return inv + '_cmp' + str(p), auto_value

        def Cnp(self, data, inv, p):
            """(Current month's inv - min of inv over the last p months) / that min."""
            df = data.loc[:, inv + '1':inv + str(p)]
            auto_value = (df[inv + '1'] - np.nanmin(df, axis=1)) / np.nanmin(df, axis=1)
            return inv + '_cnp' + str(p), auto_value

        def Msx(self, data, inv, p):
            """Months since the month in which inv reached its maximum within the last min(time on book, p) months."""
            df = data.loc[:, inv + '1':inv + str(p)].copy()   # work on a copy
            df['_max'] = np.nanmax(df, axis=1)
            for i in range(1, p + 1):
                df[inv + str(i)] = list(df[inv + str(i)] == df['_max'])
            del df['_max']
            df_value = np.where(df == True, 1, 0)
            auto_value = []
            for i in range(len(df_value)):
                row_value = df_value[i, :]
                indexs = 1
                for j in row_value:
                    if j == 1:
                        break
                    indexs += 1
                auto_value.append(indexs)
            return inv + '_msx' + str(p), auto_value

        def Rpp(self, data, inv, p):
            """Mean of inv over the last p months / mean of inv over months (p, 2p)."""
            df1 = data.loc[:, inv + '1':inv + str(p)]
            value1 = np.nanmean(df1, axis=1)
            df2 = data.loc[:, inv + str(p):inv + str(2 * p)]
            value2 = np.nanmean(df2, axis=1)
            auto_value = value1 / value2
            return inv + '_rpp' + str(p), auto_value

        def Dpp(self, data, inv, p):
            """Mean of inv over the last p months minus the mean of inv over months (p, 2p)."""
            df1 = data.loc[:, inv + '1':inv + str(p)]
            value1 = np.nanmean(df1, axis=1)
            df2 = data.loc[:, inv + str(p):inv + str(2 * p)]
            value2 = np.nanmean(df2, axis=1)
            auto_value = value1 - value2
            return inv + '_dpp' + str(p), auto_value

        def Mpp(self, data, inv, p):
            """Max of inv over the last p months / max of inv over months (p, 2p)."""
            df1 = data.loc[:, inv + '1':inv + str(p)]
            value1 = np.nanmax(df1, axis=1)
            df2 = data.loc[:, inv + str(p):inv + str(2 * p)]
            value2 = np.nanmax(df2, axis=1)
            auto_value = value1 / value2
            return inv + '_mpp' + str(p), auto_value

        def Npp(self, data, inv, p):
            """Min of inv over the last p months / min of inv over months (p, 2p)."""
            df1 = data.loc[:, inv + '1':inv + str(p)]
            value1 = np.nanmin(df1, axis=1)
            df2 = data.loc[:, inv + str(p):inv + str(2 * p)]
            value2 = np.nanmin(df2, axis=1)
            auto_value = value1 / value2
            return inv + '_npp' + str(p), auto_value

        def auto_var(self, data_new, inv, p):
            """Batch-apply the two-parameter derivation functions above for one variable prefix and window p."""
            funcs = [self.Num, self.Nmz, self.Evr, self.Avg, self.Tot, self.Tot2T,
                     self.Max, self.Min, self.Msg, self.Msz, self.Cav, self.Cmn,
                     self.Std, self.Cva, self.Cmm, self.Cnm, self.Cxm, self.Cxp,
                     self.Ran, self.Nci, self.Ncd, self.Ncn, self.Pdn, self.Cmx,
                     self.Cmp, self.Cnp, self.Msx, self.Trm, self.Bup, self.Mai,
                     self.Mad, self.Rpp, self.Dpp, self.Mpp, self.Npp]
            for func in funcs:
                try:
                    columns_name, values = func(data_new, inv, p)
                    data_new[columns_name] = values
                except Exception:
                    # some features cannot be built for every p (e.g. the (p, 2p) windows
                    # run past month 12); skip them and keep going
                    pass
            return data_new

    auto_var2 = time_series_feature()
    for p in range(1, 12):
        for inv in ['ft', 'gt']:
            data = auto_var2.auto_var(data, inv, p)
    data

    Output:
