Machine Learning Terminologies Demystified
To this day, my favorite definition of a machine is: something that makes work easier. At its simplest, a machine is an invention that does a job better, faster, and more powerfully than a human being. With regard to machine learning, this is the why: there is a need to perform a task more efficiently and at a faster rate. What is the task? To make decisions. So what, then, is machine learning?
Before I answer that, a quick introduction. On my journey to becoming a data scientist, I found myself having to learn a lot of new terminology. Even certain terms that already existed in my vocabulary took on new meanings. Many of these terms can be wordy and somewhat intimidating. My aim in this write-up is to provide, as far as possible, layman's definitions for the basic machine learning terms I have come across.
Data science, in essence, is the skill of using available information to gain insight and improve processes. It does this using a blend of machine learning algorithms, statistics, business intelligence, and programming. It aims to discover patterns in raw data, which in turn provide insight into the underlying processes.
Now back to the question: what is machine learning?
Machine learning is a field of technology that allows machines to learn from data and improve themselves. Machine learning algorithms use statistics and other mathematical tools to find patterns in data.
Machine learning can be separated into three groups:
Supervised learning is a type of machine learning in which the data is labeled to tell the machine exactly what patterns it should look for. Under the umbrella of supervised learning:
Classification: In classification tasks, the machine learning program must draw a conclusion from observed values and determine which category new observations belong to.
Regression: In regression tasks, the machine learning program must estimate and understand the relationships among variables. Regression analysis focuses on one dependent variable and how it changes with a series of other variables.
Forecasting: Forecasting is the process of making predictions about the future based on past and present data.
Unsupervised learning: here the data has no labels. The machine just looks for whatever patterns it can find. Under the umbrella of unsupervised learning:
Clustering: Clustering involves grouping sets of similar data (based on defined criteria), after which you can analyze the groups and find patterns.
Dimension reduction: Dimension reduction reduces the number of variables under consideration in order to isolate the information that is actually required.
Reinforcement learning: here the machine learns by trial and error to achieve a clear objective. It tries out lots of different things and is rewarded or penalized depending on whether its behavior helps or hinders it in reaching that objective.
Machine Learning Algorithm
An ‘algorithm’ is a series of steps to complete a task.
An algorithm in machine learning is a procedure that is run on data to create a machine learning “model.”
Machine learning algorithms perform “pattern recognition.” Algorithms “learn” from data, or are “fit” on a dataset.
A “model” in machine learning is the output of a machine learning algorithm run on data.
A model represents what was learned by a machine learning algorithm.
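To make the algorithm/model distinction concrete, here is a minimal sketch in plain Python. The function name `fit_mean_model` and the toy prices are made up for illustration; the "algorithm" here is about the simplest one imaginable (always predict the average), chosen only to show which part is the procedure and which part is the learned result.

```python
def fit_mean_model(targets):
    """The 'algorithm': a procedure that is run on data."""
    mean = sum(targets) / len(targets)

    # The 'model': what was learned from the data (here, a single number),
    # wrapped in a function that can make predictions.
    def model(_features=None):
        return mean

    return model


# Hypothetical training data: three observed house prices.
house_prices = [100.0, 150.0, 200.0]

model = fit_mean_model(house_prices)  # run the algorithm on data
prediction = model()                  # use the resulting model
print(prediction)
```

Running the same algorithm on different data would give a different model, which is the point of the distinction.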
Popular Machine Learning Algorithms
Linear regression (Supervised Learning/Regression): Linear regression is the most basic type of regression. Simple linear regression allows us to understand the relationship between two continuous variables.
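As an illustration, simple linear regression can be sketched with the ordinary least-squares formulas for a slope and an intercept. The function name and the study-hours data below are assumptions made up for this example, not part of any particular library.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one input variable: y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied versus exam score.
hours_studied = [1.0, 2.0, 3.0, 4.0]
exam_scores = [52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_simple_linear_regression(hours_studied, exam_scores)
predicted_score = slope * 5.0 + intercept  # prediction for 5 hours of study
print(slope, intercept, predicted_score)
```

The slope says how much the score changes per extra hour, which is the "relationship between two continuous variables" in numeric form.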
Logistic regression (Supervised Learning/Classification): Logistic regression focuses on estimating the probability of an event occurring based on the previous data provided. It is used to model a binary dependent variable, that is, one where only two values, 0 and 1, represent the outcomes.
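The prediction step of logistic regression can be sketched as follows. This is only the "use the model" half: the weight and bias values are hypothetical numbers picked by hand, whereas in practice they would be learned from training data.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Estimated probability that the outcome is 1."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 decision."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0


# Hypothetical, hand-picked parameters (not learned from real data).
weights = [1.0]
bias = -2.0

print(predict_probability([3.0], weights, bias))  # above 0.5
print(predict_class([3.0], weights, bias))
print(predict_class([1.0], weights, bias))
```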
Naive Bayes (Supervised Learning/Classification): The Naive Bayes classifier is based on Bayes’ theorem and treats every feature as independent of every other feature. It allows us to predict a class/category, based on a given set of features, using probability.
K-nearest neighbor algorithm (Supervised Learning): The k-nearest neighbor algorithm estimates how likely a data point is to be a member of one group or another. It essentially looks at the data points around a single data point to determine which group it belongs to.
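A minimal sketch of the idea, assuming straight-line (Euclidean) distance and a simple majority vote among the k closest points. The function name `knn_predict` and the toy points are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Distance from the query to every labeled point, closest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled data: two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (8, 7)))
```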
Decision trees (Supervised Learning/Classification and Regression): A decision tree is a flowchart-like tree structure that uses a branching method to illustrate every possible outcome of a decision. Each node within the tree represents a test on a specific variable, and each branch is an outcome of that test.
Random Forests (Supervised Learning/Classification and Regression): A random forest, as its name implies, consists of a large number of individual decision trees that operate as an ensemble. Each individual tree in the random forest spits out a class prediction, and the class with the most votes becomes the model’s prediction.
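The voting step can be sketched like this. The three "trees" below are hand-written one-question rules, purely hypothetical stand-ins; in a real random forest each tree is learned from a random sample of the training data and is usually much deeper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class that received the most votes."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical hand-written "trees", each testing a single variable.
def tree_a(sample):
    return "spam" if sample["links"] > 3 else "ham"


def tree_b(sample):
    return "spam" if sample["exclamations"] > 5 else "ham"


def tree_c(sample):
    return "ham" if sample["known_sender"] else "spam"


forest = [tree_a, tree_b, tree_c]
email = {"links": 5, "exclamations": 2, "known_sender": False}

votes = [tree(email) for tree in forest]
forest_prediction = majority_vote(votes)
print(votes, forest_prediction)
```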
Support Vector Machines (Supervised Learning/Classification): Support vector machine algorithms are supervised learning models that analyze data for classification and regression analysis. They essentially sort data into categories, which is achieved by providing a set of training examples, each example marked as belonging to one or the other of two categories. The algorithm then works to build a model that assigns new values to one category or the other.
K-Means Clustering Algorithm (Unsupervised Learning/Clustering): The algorithm works by finding groups within the data, with the number of groups represented by the variable K. It then works iteratively to assign each data point to one of the K groups based on the features provided.
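The iterative loop can be sketched in plain Python as below. To keep the example deterministic, the starting centroids are passed in by hand, which is an assumption of this sketch; real implementations usually pick them randomly or with a smarter initialization, and stop when the assignments no longer change.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and moving centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the group of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its old centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with K = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
initial_centroids = [(1.0, 1.0), (9.0, 8.0)]

centroids, clusters = k_means(points, initial_centroids)
print(centroids)
print(clusters)
```

Note that no labels are supplied anywhere: the groups come only from how close the points are to each other.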
Artificial Neural Networks (Supervised, Unsupervised, or Reinforcement Learning): An artificial neural network (ANN) comprises ‘units’ arranged in a series of layers, each of which connects to the layers on either side. ANNs are inspired by biological systems, such as the brain, and how they process information. ANNs are essentially a large number of interconnected processing elements, working in unison to solve specific problems.
Other useful terms when talking about machine learning include:
Ensemble learning method: this combines multiple algorithms to generate better results for classification, regression, and other tasks. Each individual model may be weak on its own, but when combined with others it can produce excellent results.
Artificial Intelligence (AI) refers to machines that can learn, reason, and act for themselves. They can make their own decisions when faced with new situations, in the same way that humans and animals can.
Data are characteristics or information collected through observation.
Data Cleaning refers to the steps you need to take to prepare your data for use. Here you detect incomplete, incorrect, inaccurate, or irrelevant data in your dataset and then choose to replace, modify, or delete the dirty or coarse data as needed.
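As a small illustration, here is one possible cleaning rule for a column of ages: treat missing values and values outside a plausible range as bad, and replace them with the median of the good values. The column, the range limits, and the choice of median are all assumptions made for this example; deleting the bad rows would be an equally valid choice in other situations.

```python
from statistics import median

# Hypothetical raw column: one missing value and two impossible ages.
raw_ages = [34, None, 29, -5, 41, 230, 38]


def clean_ages(ages, low=0, high=120):
    """Replace missing or out-of-range ages with the median of the valid ones."""

    def is_valid(age):
        return age is not None and low <= age <= high

    fill_value = median(a for a in ages if is_valid(a))
    return [a if is_valid(a) else fill_value for a in ages]


cleaned_ages = clean_ages(raw_ages)
print(cleaned_ages)
```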
Exploratory data analysis (EDA): This refers to the critical process of performing initial investigations on data so as to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.
Training data is the main and most important data set; it is what helps machines learn and make predictions. The machine learning engineer uses this data set to develop the algorithm, and it often makes up more than 70% of the total data used in the project.
Validation data is the second type of data set, used to validate the machine learning model before final delivery of the project. Model validation is important for ensuring that the model’s predictions are accurate enough to build the right application. Using this type of data helps you know whether the model can correctly identify new examples or not.
Testing data is the final type of data set; it helps to check the prediction level of the machine learning or AI model on examples it has not seen before.
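The three data sets can be produced by shuffling the data once and slicing it. This is a minimal sketch; the 70/15/15 proportions, the function name `split_dataset`, and the fixed seed are assumptions chosen for illustration, and other splits are common.

```python
import random


def split_dataset(rows, train_fraction=0.7, validation_fraction=0.15, seed=0):
    """Shuffle the rows, then slice them into train, validation, and test sets."""
    shuffled = rows[:]                     # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train_fraction)
    n_validation = round(len(shuffled) * validation_fraction)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever is left over
    return train, validation, test


rows = list(range(100))  # stand-in for 100 data records
train, validation, test = split_dataset(rows)
print(len(train), len(validation), len(test))
```

Each record lands in exactly one of the three sets, so the model is never checked on data it was trained on.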
The world of machine learning and data science is vast and ever growing. It is easy to view it as an insurmountable endeavor. I’d like to encourage anyone wishing to take this road not to be intimidated. A lot of these terms only sound incomprehensible; once you discover their very essence, everything becomes clear. Again, good things take time and great ones take even more time, so do not grow weary, and keep pushing forward.
Translated from: https://medium.com/@chibuzo.ugonabo/machine-learning-terminologies-demystified-6aa1aa81a57b