

Supervised vs. Unsupervised Learning

Published: 2023/12/15

If we don't know what the objective of a machine learning algorithm is, we may fail to build an accurate model. Knowing the types of machine learning algorithms is essential: it helps us see the bigger picture of machine learning and the goal behind the work being done in the field, and, above all, it puts us in a better position to break down a real problem and design a machine learning system.

The goal of most machine learning algorithms is to construct a model or a hypothesis. All machine learning models can be categorized as either supervised or unsupervised. In this note, we will discuss these two types, how they work, and how each is used in various fields.

The structure of this note:


  • Supervised learning: categories and applications.
  • Unsupervised learning: categories and applications.
  • Supervised learning vs. unsupervised learning.

Let's begin by taking a look at supervised learning.

What is Supervised Learning?

To supervise means to watch over, to provide direction for someone or something. Supervised learning is a process in which we teach or train a machine using data that is well labeled.

    The most important concept to remember:


    Supervised learning means learning by example.

    監(jiān)督學(xué)習(xí)意味著通過榜樣學(xué)習(xí)。

The objective of a supervised learning model is to predict the correct label for newly presented input data. When training a supervised learning algorithm, the computer learns by example: it learns from past data and applies that learning to present data in order to predict future events. Our training data consist of inputs paired with the correct outputs; that is, each input is labeled or tagged with the right answer. In short, the machine already knows the expected output before it starts working on or learning from the data.

During training, the algorithm searches for patterns in the data that correlate with the desired outputs. After training, a supervised learning algorithm takes in new, unseen inputs and assigns them labels based on what it learned from the training data.

That definition might be too academic, so let's think about a real-life example of the concept. Say we show a picture to a baby and tell her, "these are ice-creams." The baby here plays the role of the computer: the ice-cream photo is our input, and the annotation is our output data. The baby keeps in mind that if the object is red and has a cone shape, then it is an ice-cream. That is how she learns, and she will recognize an ice-cream picture the next time she sees one. Because we have already labeled the image, the baby knows what ice-cream is. That is how supervised learning works.

Supervised Machine Learning Categorization

Supervised learning is classified into two categories: classification and regression.

Classification

A classification problem is one where the output variable is a category, such as pass or fail, red or blue, and so on. We use classification algorithms to predict which group a piece of data belongs to.

During training, a classification algorithm is given data points with an assigned category. The job of the algorithm is to take an input value and assign it to the class or group it fits into, based on the training data provided. The most common example of classification is determining whether an email is spam or not; this is called a binary classification problem. The algorithm is given training data covering all emails (spam and not spam). The model finds the features within the data that correlate with either class and creates the mapping function from input to output: Y = f(x). Then, when presented with an unseen email, the model uses this function to predict whether the email is spam.

    Note:


    • We use binary or binomial classification when grouping data under two kinds of labels.
    • We use multi-class or multinomial classification when grouping data under more than two kinds of labels.

    Here are a few popular classification algorithms:


• Decision Trees are among the simplest and yet most useful machine learning algorithms. We split the data according to a certain parameter. The tree has two kinds of entities, namely decision nodes and leaves. The leaves are the decisions or outcomes, and the decision nodes are where we split the data.
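To make the split-selection step concrete, here is a minimal sketch in plain Python (not a full decision-tree learner; the `gini` and `best_split` helpers and the toy pass/fail data are ours for illustration). It picks the threshold on a single numeric feature that minimizes the weighted Gini impurity of the two resulting leaves:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Threshold on one feature that minimizes weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Perfectly separable toy data: values <= 2 are "fail", > 2 are "pass"
xs = [1, 2, 3, 4]
ys = ["fail", "fail", "pass", "pass"]
print(best_split(xs, ys))  # -> (2, 0.0)
```

A real decision tree applies this search recursively at every decision node, over every feature.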

• Random Forest is a set of decision trees built on various subsets of the given dataset, combined to improve predictive accuracy. Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of those predictions, predicts the final output.
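The majority-vote idea can be sketched without any tree-training machinery at all; in this illustrative snippet the "trees" are stand-in functions (hypothetical stumps), not trees fitted from data:

```python
from collections import Counter

def forest_predict(trees, x):
    """Majority vote over an ensemble of fitted classifiers (here,
    plain functions standing in for individual decision trees)."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Three hypothetical stumps, as if trained on different data subsets
trees = [lambda x: "spam" if x > 2 else "ham",
         lambda x: "spam" if x > 3 else "ham",
         lambda x: "spam" if x > 10 else "ham"]
print(forest_predict(trees, 5))  # -> "spam" (2 of 3 trees vote spam)
```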

• Support Vector Machines (SVM): The objective of the SVM algorithm is to find a hyperplane in N-dimensional space (where N is the number of features) that distinctly classifies the data points. That is, given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one group or the other, making it a non-probabilistic binary linear classifier.
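Once trained, a linear SVM classifies a point by which side of the hyperplane w·x + b = 0 it falls on. A minimal prediction-only sketch, with a hypothetical hand-picked hyperplane rather than one learned by an SVM solver:

```python
def svm_predict(x, w, b):
    """Sign of the score w.x + b: which side of the separating
    hyperplane the point x falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

w, b = [1.0, -1.0], 0.0   # hypothetical learned hyperplane: x1 = x2
print(svm_predict([3.0, 1.0], w, b), svm_predict([1.0, 3.0], w, b))  # -> 1 -1
```

Finding the w and b that maximize the margin between the two classes is the actual (quadratic) optimization an SVM trainer solves.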

• We can use K-Nearest Neighbors (KNN) for both classification and regression predictive problems, though KNN is more widely used for classification in industry. In the KNN algorithm, "k" is the number of nearest neighbors the model will consider. KNN classifies a data point based on the points that are most similar to it, using the training data to make an "educated guess" about how an unclassified point should be classified. If k = 1, the point is simply assigned to the class of its nearest neighbor.
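A bare-bones KNN classifier is short enough to sketch directly; the two-color toy dataset below is illustrative:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points (Euclidean distance)."""
    by_dist = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((1, 1), "red"), ((1, 2), "red"), ((5, 5), "blue"), ((6, 5), "blue")]
print(knn_predict(train, (1.5, 1.5), k=3))  # -> "red"
```

Note that there is no training step at all: KNN simply stores the labeled points and defers all the work to prediction time.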

Regression

A regression problem is one where the output variable is a real value, such as weight, height, or dollars. Regression is most often used to predict numerical values based on previous data observations. The typical example is predicting the housing prices of future sales based on the prevailing market price.

Some of the more familiar regression algorithms include:

• Linear regression performs the task of predicting a target y (output) from given features x (input). The input variable is called the independent variable, and the output variable is called the dependent variable. This regression technique finds a linear relationship between the independent and dependent variables. Linear regression falls into two categories: simple linear regression, with only one x and one y variable, and multiple linear regression, with one y and two or more x variables.
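For the simple (one-feature) case, the least-squares slope and intercept have a closed form, sketched here in plain Python on a toy dataset that lies exactly on y = 2x + 1:

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = a*x + b with one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # exactly y = 2x + 1
a, b = fit_simple_linear(xs, ys)
print(a, b)  # -> 2.0 1.0
```

Multiple linear regression generalizes the same idea to several x variables, solved with linear algebra rather than these two sums.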

• Logistic regression performs the task of predicting discrete values for the set of independent variables passed to it. It predicts by mapping unseen data through the logit function fitted during training. The algorithm predicts the probability of the new data belonging to a class, so its output lies in the range between 0 and 1.
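A sketch of the prediction step only: the logistic (sigmoid) function squashes a linear score into the (0, 1) range. The weights here are hypothetical stand-ins for fitted parameters, not values learned from data:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability of the positive class for feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = [1.5, -2.0], 0.5   # hypothetical pre-trained weights
p = predict_proba([2.0, 1.0], w, b)
print(p, p > 0.5)  # a probability in (0, 1), then the class decision
```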

• Polynomial regression is a particular case of linear regression. This regression technique finds the curvilinear relationship between the independent variable x and the dependent variable y.

• Ridge regression is a technique for analyzing multiple-regression data that suffer from multicollinearity, a state of very high intercorrelation among the independent variables. When multicollinearity occurs, least-squares estimates are unbiased, but their variances are large, so they may be far from the true values.
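For a single centered feature, the ridge estimate has a simple closed form: the penalty λ is added to the denominator of the ordinary least-squares slope, shrinking it toward zero. A rough sketch (the helper name and toy data are ours):

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge slope for y ~ a*x on centered data:
    add the penalty `lam` to the OLS denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / (sxx + lam)

xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
print(fit_ridge_1d(xs, ys, lam=0.0))   # lam = 0 recovers the OLS slope, 2.0
print(fit_ridge_1d(xs, ys, lam=5.0))   # a positive penalty shrinks it toward 0
```

The shrinkage trades a little bias for a large reduction in variance, which is exactly the remedy ridge offers for multicollinear data.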

    Note:


    • If the label is categorical, the model is known as a “classification.”


    • If the label is numeric, the model is known as a “regression.”


    Some practical applications of supervised learning algorithms in real life:

    監(jiān)督學(xué)習(xí)算法在現(xiàn)實生活中的一些實際應(yīng)用:

• Bioinformatics: fingerprints, iris texture, earlobe, and so on.
• Face detection, spam detection.
• Signature recognition, speech recognition.
• Weather forecasting.
• Stock price predictions, among others.

What is Unsupervised Learning?

Now that we know the basics of supervised learning, it is pertinent to move on to unsupervised learning.

Unsupervised learning is a method that trains machines on data that is neither classified nor labeled. There is no labeled training set, so the machine learns by itself: the computer must be programmed to learn on its own, and it needs to understand and provide insights from both structured and unstructured data.

The idea is to expose the machines to large volumes of varying data and allow them to learn from that data, providing insights that were previously unknown and identifying hidden patterns. As such, there aren't necessarily defined outcomes from unsupervised learning algorithms; instead, the algorithm determines what is different or interesting in the given dataset.

During unsupervised learning, the system does not have labeled data sets, and the outcomes of most problems are largely unknown. In simple terms, the AI system goes into operation blind to the machine learning objective. The lack of predefined input-output pairs makes the process even more challenging.

Let's make the concept simpler with an example. We show a group of ice-cream and cupcake pictures to the baby. Assume the baby has not seen ice-creams or cupcakes before, so she does not know what the features of an ice-cream or a cupcake are, and she cannot categorize them as in the supervised learning example. There, the whole process was straightforward, because we taught the baby every detail in the figures.

However, in unsupervised learning, the whole process becomes a little trickier. The algorithm for an unsupervised learning system has the same input data as its supervised counterpart (in our case, ice-creams and cupcakes with different shapes and colors) but no specified outcomes; in a word, there is no label associated with the learning. Once the baby (the computer) has seen the pictures (our input data), she learns from the information at hand. With information related to the problem, the baby will recognize similar objects and group them together; in other words, the computer designs and labels the groups itself. Technically, there are bound to be some wrong answers, since a degree of probability is involved. However, just like humans, the strength of machine learning lies in its ability to recognize mistakes, learn from them, and make better estimations next time. That process is known as unsupervised learning.

Unsupervised Machine Learning Categorization

Unsupervised learning is classified into two categories of problems: clustering and association.

Clustering: A clustering problem involves organizing unlabeled data into similar groups, such as grouping customers by purchasing behavior. It is one of the most common unsupervised learning methods. We often use clustering in marketing campaigns: for example, clustering algorithms can group people with similar traits and likelihood to purchase. Once we have the groups, we can run tests on each group with different marketing copy, which helps us better target our messaging to them in the future.

• Hierarchical clustering is an algorithm that groups similar objects into clusters. Initially, each data point is considered an individual cluster. The algorithm goes over the features of the data points and looks for similarity between them; when it finds similar data points, it groups them. The process continues until the whole dataset has been grouped, which creates a hierarchy of clusters.
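A minimal single-linkage sketch of this agglomerative process: start with every point as its own cluster and repeatedly merge the two closest clusters. (Illustrative only; practical implementations maintain a distance matrix and return the full merge hierarchy, not just the final clusters.)

```python
import math

def agglomerate(points, n_clusters):
    """Single-linkage agglomerative clustering: start with each point
    in its own cluster and repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)
    return clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(agglomerate(points, n_clusters=2))
# -> [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
```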

• K-Means clustering works step by step, with the main goal of producing clusters that can be labeled to identify them. K-means is a centroid-based (distance-based) algorithm, in which we calculate distances to assign each point to a cluster. The smallest distance between a data point and a centroid determines which group it belongs to, while making sure the clusters do not overlap each other. The centroid acts like the heart of the cluster. This ultimately gives us clusters that can be labeled as needed.
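The assign-then-update loop of k-means (Lloyd's algorithm) can be sketched in a few lines of plain Python; the starting centroids here are hand-picked for the toy data, whereas real implementations choose them randomly or with a seeding scheme:

```python
import math

def kmeans(points, centroids, iters=10):
    """Plain Lloyd's algorithm: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
print(kmeans(points, centroids=[(0, 0), (10, 10)]))
# two centroids, one settling in each blob of points
```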

An association problem is one where you want to discover rules that describe large portions of your data; for example, if a person buys hamburger buns, she will likely buy hamburgers.

• The Apriori algorithm is used for mining frequent itemsets and the relevant association rules. Its support measure maps the dependency of one data item on another, which helps us understand which items influence the likelihood of something happening with other items; for example, buying bread influences the buyer to buy milk and eggs, and that mapping helps increase profits for the store. This mapping can be learned with the algorithm, which yields rules as its output.
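The support computation at the heart of Apriori can be sketched as follows. This simplified version brute-forces all 1- and 2-item candidates and skips Apriori's level-wise candidate pruning; the basket data is illustrative:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Support counting for 1- and 2-item sets: keep those whose
    fraction of transactions meets the support threshold."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    freq = {}
    for size in (1, 2):
        for combo in combinations(items, size):
            support = sum(1 for t in transactions if set(combo) <= t) / n
            if support >= min_support:
                freq[combo] = support
    return freq

baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk"},
]
print(frequent_itemsets(baskets, min_support=0.5))
```

The full algorithm grows candidates level by level, only extending itemsets whose subsets are already frequent, and then derives rules such as bread → milk from the surviving sets.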

• The Frequent Pattern Growth algorithm (FP-Growth) finds frequent patterns without candidate generation. The algorithm finds the count of each repeated pattern, adds it to a table, then finds the most plausible item and sets it as the root of a tree. Other data items are then added to the tree and their support calculated; if a particular branch fails to meet the support threshold, it is pruned. Once all iterations are complete, a tree rooted at the item has been created, which is then used to derive the association rules. FP-Growth is faster than Apriori because support is calculated and checked over increasing iterations, rather than generating candidates and testing their support against the dataset.

Applications of Unsupervised Learning Algorithms

Some practical applications of unsupervised learning algorithms include:

• Credit-card fraud detection.
• Identification of human errors during data entry.
• Amazon uses unsupervised learning to learn a customer's purchases and recommend the products most frequently bought together (an example of association rule mining).

Supervised Learning vs. Unsupervised Learning

The most significant difference between supervised and unsupervised learning is that in supervised learning, each data point has a label. In contrast, in unsupervised learning there is no label for any input, meaning our data have not been classified.

Note:

• Supervised learning will always have input-output pairs.
• Unsupervised learning is just data, without labels or given meaning, that we try to make some sense of.
Quick summary

When Should You Choose Supervised Learning vs. Unsupervised Learning?

A good strategy for homing in on the right machine learning approach is to:

• Evaluate the data: Is our data labeled or unlabeled? Is expert knowledge available to support additional labeling? The answers will help determine whether we should use a supervised or an unsupervised approach.

• Review available algorithms that may suit the problem with regard to dimensionality (the number of features, attributes, or characteristics). Candidate algorithms should be tailored to the overall volume of data and its structure.

    In general, we use unsupervised machine learning when we do not have data on desired outcomes, such as determining a target market for a new product that a business has never sold before. However, if we are trying to get a better understanding of our existing consumer base, then supervised learning is the optimal technique.


End Notes

Supervised learning and unsupervised learning are critical concepts in the field of machine learning. A proper understanding of the basics is crucial before you jump into the pool of different machine learning algorithms.

    Learn on!


Resources:

There are many machine learning books you can read. I certainly didn't cover enough information here to fill a chapter, but that doesn't mean you can't keep learning! Fill your mind with more awesomeness, starting with the excellent links below.

  • Supervised and unsupervised learning
  • Machine learning course by Andrew Ng
  • 5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python
  • Machine Learning (ML) vs. AI and their Important Differences
  • Data Science Dojo: https://datasciencedojo.com/

Translated from: https://medium.com/nothingaholic/supervised-vs-unsupervised-learning-eb4edc1c803b

    無監(jiān)督學(xué)習(xí)與監(jiān)督學(xué)習(xí)

    總結(jié)

    以上是生活随笔為你收集整理的无监督学习与监督学习_有监督与无监督学习的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。

    如果覺得生活随笔網(wǎng)站內(nèi)容還不錯,歡迎將生活随笔推薦給好友。