The Surprisingly Effective Genetic Approach to Feature Selection
Genetic and evolutionary algorithms are often dismissed as unable to compete with the capabilities of neural networks, and for the most part that is true, which is why the industry seldom even considers them. They are very general, whereas more specialized solutions exist for specific problems, and they tend to require a lot of computing power. But there is one fascinating application of genetic algorithms: feature selection, an important part of machine learning.
We’ll explore the genetic/evolutionary model of thinking, how that approach can be applied to feature selection, and why it is effective, alongside diagrams and analogies.
In genetic algorithms, a population of candidate solutions (also known as individuals, creatures, or phenotypes) is evolved towards better solutions to an optimization problem. Each candidate has a set of properties that can be mutated and altered.
These properties can be represented as a binary string (a sequence of zeroes and ones), although other encodings exist. In the case of feature selection, each individual represents one selection of features, and each 'property' represents one feature, which can be turned on or off (1 or 0).
The evolution of individuals begins with a randomly generated population, meaning each individual's properties are randomly initialized. Evolution is an iterative process, and the population in each iteration is referred to as a generation. In genetic feature selection on a dataset with 900 columns, an initial population may consist of 300 individuals, each a randomly generated combination of 900 on/off switches.
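As a rough illustration of the encoding and initialization described above, here is a minimal sketch using only the Python standard library. The function name `init_population` and the `p_on` parameter (the chance that a feature starts switched on) are assumptions made for this example, not something taken from the article.

```python
import random

def init_population(n_individuals, n_features, p_on=0.5, seed=None):
    """Return a list of random 0/1 feature masks, one per individual."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < p_on else 0 for _ in range(n_features)]
        for _ in range(n_individuals)
    ]

# 300 individuals, each a mask over the 900 columns of the dataset.
population = init_population(n_individuals=300, n_features=900, seed=0)
print(len(population), len(population[0]))
```

Each inner list is one individual: position `j` holds 1 if column `j` is kept and 0 if it is dropped.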
In each generation, the fitness of each individual is evaluated. Fitness is the value of the objective function of the problem being solved.
One direct fitness function is simply the accuracy of a model trained on that subset of features, or any of the many other possible model metrics. This can be costly, though, because a model has to be trained for every individual in every generation, so it is best suited to small datasets or populations.
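To make the "train a model on the subset and score it" idea concrete, here is a small sketch. The article does not say which model to use, so a hand-written nearest-centroid classifier stands in purely for illustration; the function name `centroid_accuracy`, the toy data generator `make_rows`, and the holdout split are all assumptions.

```python
import random

def centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Holdout accuracy of a nearest-centroid classifier on the masked-in features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    # Per-class mean of each selected feature.
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, y in zip(X_test, y_test):
        # Predict the class whose centroid is closest in squared distance.
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == y
    return correct / len(y_test)

# Toy data (hypothetical): feature 0 carries the label, features 1 and 2 are noise.
rng = random.Random(0)

def make_rows(n):
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([label * 4 + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y

X_train, y_train = make_rows(200)
X_test, y_test = make_rows(100)
acc_informative = centroid_accuracy(X_train, y_train, X_test, y_test, [1, 0, 0])
acc_noise = centroid_accuracy(X_train, y_train, X_test, y_test, [0, 1, 1])
print(acc_informative, acc_noise)
```

The mask that keeps the informative column should score well above the mask that keeps only noise, which is exactly the signal the genetic search needs. In practice any model and metric could be swapped in, at the cost of one training run per individual.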
An alternative is to use a variety of cheaper-to-compute metrics that can assist in evaluating the fitness of each solution. Some include:
- Collinearity. Make sure that features in a subset do not contain similar information by evaluating the overall correlation of each subset.
- Entropy / separability. With the current subset, how well separated are the classes? The more separable the data, the better.
- Hybrid. Combine these metrics with others, like variance or how normally distributed the data is, to yield a combination that satisfies the needs of the model.
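The collinearity metric, for instance, could be sketched as the mean absolute pairwise Pearson correlation among the selected columns. This particular formula is one possible reading of "overall correlation", not necessarily the author's; the names `pearson` and `collinearity_penalty` are made up for the example.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat constant columns as uncorrelated
    return cov / sqrt(var_a * var_b)

def collinearity_penalty(columns, mask):
    """Mean absolute pairwise correlation among selected columns (0 = none)."""
    selected = [col for col, bit in zip(columns, mask) if bit]
    pairs = list(combinations(selected, 2))
    if not pairs:
        return 0.0
    return sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)

columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],  # exact multiple of column 0
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
redundant = collinearity_penalty(columns, [1, 1, 0])
mixed = collinearity_penalty(columns, [1, 0, 1])
print(redundant, mixed)
```

A fitness function could then reward low values, for example by using `1 - collinearity_penalty(columns, mask)`, possibly combined with the other metrics in the list.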
Individuals on the fitter side (scoring better on the fitness function) are then randomly selected, with some controllable randomness injected to stimulate proper evolutionary discovery. Selection is not based purely on the highest score, because that would allow for little exploration and is not how evolution works in the real biological world.
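One common way to get this kind of "fitter but still random" selection is tournament selection, sketched below. The article does not name a specific selection scheme, so this is an assumed choice; the tournament size `k` is the knob that controls how much randomness is allowed.

```python
import random
from collections import Counter

def tournament_select(population, fitnesses, k=2, rng=random):
    """Pick one parent: sample k distinct individuals at random, keep the fittest."""
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]

rng = random.Random(0)
population = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 1)]
fitnesses = [0.9, 0.7, 0.5, 0.2]

# Count how often each individual is chosen as a parent over many draws.
wins = Counter(
    tournament_select(population, fitnesses, k=2, rng=rng) for _ in range(1000)
)
print(wins)
```

With `k=2`, the best individual wins most often, but weaker ones still get picked whenever they are drawn against something even weaker. Larger `k` pushes selection towards pure ranking; smaller `k` leaves more room for exploration.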
Figure: even though individual 1 has a lower fitness level, stochastic selection gives it a chance to see if a slight alteration can boost its performance. In this case, it turns out it does!

Each individual's genome is modified, either through recombination ('mating') or through random mutation (slight modification), to form a new generation. There are incredibly sophisticated ways to perform mutation and recombination that build upon the evolutionary discoveries of previous generations and the structure of the current population, so don't think of this process as brute force. Instead, it builds on what has already been learned and tests different hypotheses about what may work and what will not, in an intelligent fashion.
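The two operators can be sketched in their simplest forms: uniform crossover for recombination and bit-flip mutation. These are basic textbook variants chosen for illustration, far simpler than the sophisticated schemes the text alludes to, and the `rate` default is an arbitrary assumption.

```python
import random

def uniform_crossover(parent_a, parent_b, rng=random):
    """Child takes each bit from one parent or the other with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def bit_flip_mutation(mask, rate=0.01, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]

rng = random.Random(0)
parent_a = [1, 1, 1, 1, 0, 0, 0, 0]
parent_b = [0, 0, 1, 1, 1, 1, 0, 0]
child = bit_flip_mutation(uniform_crossover(parent_a, parent_b, rng), rate=0.05, rng=rng)
print(child)
```

Note that wherever both parents agree on a feature, crossover keeps that choice, so only mutation can change it. That is one simple way in which a new generation inherits what earlier generations have "discovered".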
The algorithm completes when the maximum number of generations (iterations) has been reached, or when the population has evolved enough that its fitness level is satisfactory. For example, we may say that the algorithm terminates when one individual reaches over 98% accuracy. Another option is to stop when the best-performing individual's fitness plateaus, or converges.
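Putting the pieces together, the three stopping rules can be sketched in one small loop. Everything here is a simplified assumption: the toy fitness function (agreement with a hidden target mask), the parameter names, the use of tournament selection and uniform crossover, and the elitism step that carries the best individual forward.

```python
import random

def evolve(fitness, n_features, pop_size=50, max_generations=200,
           target_fitness=0.98, patience=20, mutation_rate=0.02, seed=0):
    """Run a basic genetic search; return (best, score, generations, stop_reason)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best, best_score, stale = None, float("-inf"), 0
    for generation in range(1, max_generations + 1):
        scores = [fitness(ind) for ind in population]
        gen_best = max(range(pop_size), key=lambda i: scores[i])
        if scores[gen_best] > best_score:
            best, best_score, stale = population[gen_best][:], scores[gen_best], 0
        else:
            stale += 1  # no improvement this generation
        if best_score >= target_fitness:
            return best, best_score, generation, "target reached"
        if stale >= patience:
            return best, best_score, generation, "plateau"

        def pick():
            # Tournament of two: the fitter of two random individuals.
            i, j = rng.sample(range(pop_size), 2)
            return population[i] if scores[i] >= scores[j] else population[j]

        next_population = [best[:]]  # elitism (an assumption of this sketch)
        while len(next_population) < pop_size:
            a, b = pick(), pick()
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
    return best, best_score, max_generations, "max generations"

# Toy fitness (hypothetical): fraction of bits that agree with a hidden target mask.
target = [random.Random(1).randint(0, 1) for _ in range(30)]

def toy_fitness(mask):
    return sum(m == t for m, t in zip(mask, target)) / len(target)

best, score, generations, reason = evolve(toy_fitness, n_features=30)
print(score, generations, reason)
```

The returned `reason` tells you which rule fired first. In a real feature selection run, `toy_fitness` would be replaced by a model-based or metric-based fitness, and `patience` and `target_fitness` would be tuned to the budget available.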
The result of this approach is an individual that yields the subset of the data that best satisfies the cost function.
There is grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved. — Charles Darwin, The Origin of Species
The genetic approach to feature selection can be expanded so that each value is not a binary 0 or 1 indicating presence in the subset, but a scalar multiplier, much like the result that linear discriminant analysis or principal component analysis yields. Initialization would be based on normally distributed noise, a mutation would entail adding or subtracting some amount, and a recombination would yield something like an average.
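A minimal sketch of this real-valued variant might look like the following. The function names, the Gaussian mutation step `sigma`, and the per-weight mutation `rate` are illustrative assumptions; the text only specifies normally distributed initialization, additive mutation, and averaging.

```python
import random

def init_weights(n_features, rng=random):
    """One individual: a weight per feature drawn from a standard normal."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_features)]

def mutate_weights(weights, sigma=0.1, rate=0.1, rng=random):
    """Nudge a random subset of weights up or down by Gaussian noise."""
    return [w + rng.gauss(0.0, sigma) if rng.random() < rate else w
            for w in weights]

def recombine_weights(a, b):
    """Child is the element-wise average of the two parents."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def apply_weights(row, weights):
    """Scale each feature of a data row by its evolved multiplier."""
    return [x * w for x, w in zip(row, weights)]

rng = random.Random(0)
parent_a, parent_b = init_weights(4, rng), init_weights(4, rng)
child = mutate_weights(recombine_weights(parent_a, parent_b), rng=rng)
print(apply_weights([1.0, 2.0, 3.0, 4.0], child))
```

A weight near zero effectively switches a feature off, so the binary encoding can be seen as a special case of this one.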
So what is the advantage of using genetic algorithms to select features?
It doesn’t limit how many features you can choose. If you use something like permutation importance, or almost any other feature selection method, you must decide in advance how many features to select, but no one really knows how to pick a good number. Why should a sixth feature be discarded when the limit is five, even if it is almost as valuable as the fifth-best one?
It is highly customizable. While genetic algorithms generally have a reputation for being very computationally expensive, using them properly yields great bang for your buck. You can control many aspects of evolutionary algorithms, from population size to learning rate to degree of random selection to the style of mutation and recombination, and tailor them to your specific problem. Permutation importance is also expensive by default, but without the upsides and customizability of genetic feature selection.
It is efficient because it is intelligent. Brute-force feature selection essentially entails trying out all the combinations of potential features. Genetic feature selection takes a different approach: it learns from an exploration/exploitation trade-off, searching a larger search space and arriving at a better solution in less time, provided it is programmed properly.
Data scientists and machine learning engineers have usually been quick to discard evolutionary algorithms, and that is partially justified. On the other hand, innovation can only be achieved by opening our eyes to different, and sometimes seemingly stupid and pointless, exploration and application.
All images except for header image created by author.
Translated from: https://towardsdatascience.com/the-surprisingly-effective-genetic-approach-to-feature-selection-7eb2b080b713