Microsoft NNI --- AutoFeatureENG

01 AutoML Overview

AutoML is not just hyperparameter tuning; it should also include automated feature engineering. Looking at it today, AutoML is a systematic framework made up of three components:

  • Automated feature engineering (Auto Feature Engineering)
  • Automated hyperparameter optimization (HPO)
  • Automated neural architecture search (NAS)

    02 NNI Overview

    NNI (Neural Network Intelligence) is an open-source AutoML toolkit initiated by Microsoft. It covers all three components mentioned above.

    Feature engineering, neural architecture search (NAS), hyperparameter tuning, and model compression can all be carried out with automated machine learning algorithms.

    https://github.com/SpongebBob/tabular_automl_NNI

    Overall, Microsoft's tools share one notable trait: the techniques are not necessarily the newest, but the design is excellent. NNI's AutoFeatureENG covers just about everything a user could wish for in automated feature engineering.


    03 NNI-AutoFeatureENG in Detail

    NNI splits AutoFeatureENG into two modules: exploration and selection. Exploration covers feature derivation and crossing, while selection covers how features are filtered.

    04 Feature Exploration

    For feature derivation, Microsoft gives a textbook-like breakdown into the following operators:

    count: Count encoding is based on replacing categories with their counts computed on the train set, also named frequency encoding.

    target: Target encoding is based on encoding categorical variable values with the mean of target variable per value.

    embedding: Regard features as sentences, generate vectors using Word2Vec.

    crosscount: Count encoding on more than one dimension, similar to CTR (Click Through Rate).

    aggregate: Decide the aggregation functions of the features, including min/max/mean/var.

    nunique: Statistics of the number of unique features.

    histsta: Statistics of feature buckets, like histogram statistics.
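
    To make the first two operators concrete, here is a minimal pandas sketch (not NNI's implementation; df, C1 and label are made-up placeholder names) of what count encoding and target encoding compute:

    import pandas as pd

    df = pd.DataFrame({
        "C1": ["a", "b", "a", "c", "a"],   # a categorical column
        "label": [1, 0, 1, 0, 0],          # binary target
    })

    # count / frequency encoding: replace each category with how often it occurs
    df["C1_count"] = df.groupby("C1")["C1"].transform("count")

    # target encoding: replace each category with the mean of the target for that category
    df["C1_target"] = df.groupby("C1")["label"].transform("mean")

    print(df)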

    Which features to cross, which column is crossed with which, and which derivation method is applied to each column can all be controlled through the search_space.json file.

    NNI provides count encoding as a 1-order op, as well as crosscount encoding and aggregate statistics (min, max, var, mean, median, nunique) as 2-order ops.

    For example, if we want to search for frequency encoding (value count) features on the columns {“C1”, …, ”C26”}, we can define it in the following way:
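
    The original post showed the corresponding search_space.json as an image, which did not survive here. Below is a hedged sketch, written as a Python dict for illustration, of what such a search space could look like; the exact schema of the tabular_automl_NNI example may differ, and the column list is abbreviated rather than spelling out all of C1 through C26:

    import json

    # hypothetical reconstruction of search_space.json for 1-order count encoding:
    # apply frequency (count) encoding to each listed categorical column
    search_space = {
        "count": ["C1", "C2", "C3", "C25", "C26"],  # abbreviated; the real list would cover C1..C26
    }

    with open("search_space.json", "w") as f:
        json.dump(search_space, f, indent=4)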

    We can define a crossed frequency encoding (value count on crossed dimensions) method on the columns {“C1”, …, ”C26”} x {“C1”, …, ”C26”} in the following way:
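
    Again as a hedged sketch rather than the verified schema, the crossed variant could be declared with a 2-order entry holding two column lists (abbreviated here), which would sit in search_space.json alongside or instead of the count entry above:

    # hypothetical reconstruction: 2-order cross-count encoding, i.e. count encoding
    # on the cross product of two column lists
    search_space = {
        "crosscount": [
            ["C1", "C2", "C3"],   # abbreviated; would cover C1..C26
            ["C1", "C2", "C3"],   # abbreviated; would cover C1..C26
        ],
    }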

    The purpose of exploration is to generate new features. Inside a trial, you can use the get_next_parameter function to get the feature candidates received for that trial:

    RECEIVED_PARAMS = nni.get_next_parameter()
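
    Putting this in context, a trial script would typically read the received candidates, build the corresponding features, train a model, and report a metric back to NNI. The sketch below assumes a hypothetical helper build_dataset and a binary classification task; only nni.get_next_parameter and nni.report_final_result are actual NNI calls:

    import nni
    import lightgbm as lgb
    from sklearn.metrics import roc_auc_score

    RECEIVED_PARAMS = nni.get_next_parameter()   # feature candidates chosen by the tuner for this trial

    # Assumption: build_dataset is a hypothetical helper that applies the candidate
    # operators (count / target / crosscount ...) described in RECEIVED_PARAMS and
    # returns train / validation splits.
    X_train, y_train, X_val, y_val = build_dataset(RECEIVED_PARAMS)

    model = lgb.train(
        {"objective": "binary", "metric": "auc", "verbosity": -1},
        lgb.Dataset(X_train, label=y_train),
        num_boost_round=100,
    )

    auc = roc_auc_score(y_val, model.predict(X_val))
    nni.report_final_result(auc)                 # report this trial's score back to NNI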

    05 Feature Selection

    To avoid a flood of features and the overfitting that comes with it, there must be a selection mechanism to pick features. Here the Microsoft team made a clever choice: for feature selection they primarily promote LightGBM, an algorithm they also open-sourced themselves.

    Anyone familiar with XGBoost or GBDT knows that tree-based algorithms make it easy to compute each feature's contribution to the prediction, so LightGBM naturally doubles as a feature selector. The drawback is that if the downstream model is a linear one such as logistic regression, it is unclear whether the selected features still generalize. After a run, the output contains the value and attributes of each feature.
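
    To illustrate why a gradient-boosted tree fits this role, a trained LightGBM booster exposes per-feature importance directly. The sketch below assumes a hypothetical DataFrame df that already holds the generated candidate features plus a label column, and ranks features by gain:

    import lightgbm as lgb
    import pandas as pd

    # Assumption: df is a hypothetical DataFrame containing the generated
    # candidate features and a binary "label" column.
    X = df.drop(columns=["label"])
    y = df["label"]

    model = lgb.train(
        {"objective": "binary", "verbosity": -1},
        lgb.Dataset(X, label=y),
        num_boost_round=100,
    )

    # rank candidate features by their total split gain in the trained trees
    importance = pd.Series(
        model.feature_importance(importance_type="gain"),
        index=model.feature_name(),
    ).sort_values(ascending=False)
    print(importance.head(20))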

    06 Summary

    NNI's AutoFeature module sets a textbook-level standard for the industry: it shows how the task should be approached and which modules it involves, and it is very convenient to use. But such a simple pattern on its own will not necessarily produce strong results.

    Suggestions to NNI

    About exploration: it would be better to also consider using a DNN (such as xDeepFM) to extract high-order features.

    About selection: there could be more intelligent options, such as an automatic selection system driven by the downstream model.

    Conclusion: NNI offers users some design inspiration and is a good open-source project. I suggest researchers leverage it to accelerate their AI research.

    Source: "How should we view Microsoft's newly released AutoML platform NNI?" by Garvin Li (Zhihu).

    NNI review article from Zhihu, by Garvin Li. NNI: an open-source AutoML toolkit for neural architecture search, model compression and hyper-parameter tuning (NNI v2.4).
