
Very Brief Introduction to Machine Learning for AI

The topics summarized here are covered in these slides.


Intelligence(智能)

The notion of intelligence can be defined in many ways. Here we define it as the ability to take the right decisions, according to some criterion (e.g. survival and reproduction, for most animals). To take better decisions requires knowledge, in a form that is operational, i.e., can be used to interpret sensory data and use that information to take decisions.


Artificial Intelligence(人工智能)


Computers already possess some intelligence thanks to all the programs that humans have crafted and which allow them to “do things” that we consider useful (and that is basically what we mean for a computer to take the right decisions). But there are many tasks which animals and humans are able to do rather easily but remain out of reach of computers, at the beginning of the 21st century. Many of these tasks fall under the label of Artificial Intelligence, and include many perception and control tasks. Why is it that we have failed to write programs for these tasks? I believe that it is mostly because we do not know explicitly (formally) how to do these tasks, even though our brain (coupled with a body) can do them. Doing those tasks involves knowledge that is currently implicit, but we have information about those tasks through data and examples (e.g. observations of what a human would do given a particular request or input). How do we get machines to acquire that kind of intelligence? Using data and examples to build operational knowledge is what learning is about.

基于人類(lèi)編寫(xiě)的讓計(jì)算機(jī)“做一些事情”的代碼,計(jì)算機(jī)已經(jīng)可以做一些我們認(rèn)為有意義的只能的事情(那基本上是我們讓一臺(tái)電腦做正確決策的意思)。但是,在第二十一世紀(jì)初,仍然有很多事情人類(lèi)和動(dòng)物可以很容易地完成,但是計(jì)算機(jī)卻不能完成。Many of these tasks fall under the label of Artificial Intelligence,包括許多感知和控制任務(wù)。為什么我們寫(xiě)不成代碼來(lái)完成這些任務(wù)呢?我覺(jué)得主要是因?yàn)槲覀冞€沒(méi)有清楚(正式)的知道如何做這些事情,雖然我們有一個(gè)大腦(加上一個(gè)身軀)可以完成他們。完成這些事情需要一些目前隱含存在的知識(shí)的支撐,but we have information about those tasks through data and examples(如,觀察在給出特定的需求或者輸入的時(shí)候,一個(gè)人會(huì)做什么)。我們?cè)鯓幼寵C(jī)器來(lái)獲得這種智能呢?用數(shù)據(jù)和實(shí)例來(lái)構(gòu)建可操作的知識(shí)就學(xué)習(xí)要做的事情。

Machine Learning(機(jī)器學(xué)習(xí))

Machine learning has a long history and numerous textbooks have been written that do a good job of covering its main principles. Among the recent ones I suggest:

機(jī)器學(xué)習(xí)的歷史非常長(zhǎng)遠(yuǎn),已經(jīng)有非常多的不錯(cuò)的書(shū)包含了機(jī)器學(xué)習(xí)的主要原理。建議讀以下書(shū)籍:

  • Chris Bishop, “Pattern Recognition and Machine Learning”, 2007
  • 模式識(shí)別和機(jī)器學(xué)習(xí)


  • Simon Haykin, “Neural Networks: A Comprehensive Foundation”, 2009 (3rd edition)
  • ?神經(jīng)網(wǎng)絡(luò): 綜合基礎(chǔ)?


  • Richard O. Duda, Peter E. Hart and David G. Stork, “Pattern Classification”, 2001 (2nd edition)


Here we focus on a few concepts that are most relevant to this course.


Formalization of Learning(形式化學習)

First, let us formalize the most common mathematical framework for learning. We are given training examples

$D = \{z_1, z_2, \ldots, z_n\}$

with the $z_i$ being examples sampled from an unknown process $P(Z)$. We are also given a loss functional $L$ which takes as argument a decision function $f$ and an example $z$, and returns a real-valued scalar. We want to minimize the expected value of $L(f, Z)$ under the unknown generating process $P(Z)$.


Supervised Learning(監(jiān)督式學(xué)習(xí))

In supervised learning, each example is an (input, target) pair: $Z = (X, Y)$, and $f$ takes an $X$ as argument. The most common examples are

在監(jiān)督是學(xué)習(xí)中,每一個(gè)樣本個(gè)是一個(gè)(輸入,目標(biāo))對(duì)偶:, 為 的參數(shù)。最常見(jiàn)的例子如下

  • regression: $Y$ is a real-valued scalar or vector, the output of $f$ is in the same set of values as $Y$, and we often take as loss functional the squared error $L(f, (X, Y)) = \|f(X) - Y\|^2$ (both losses are sketched in code right after this list)

  • classification(分類(lèi)):??is a finite integer (e.g. a symbol) corresponding to a class index, and we often take as loss function the negative conditional log-likelihood, with the interpretation that??estimates?:

    where we have the constraints(這里的約束為)
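
As a concrete illustration of the two loss functionals above, here is a small NumPy sketch; the toy predictors and values are made up purely to show the calling convention.

    import numpy as np

    def squared_error(f, x, y):
        # Regression loss: squared Euclidean distance between the prediction f(x)
        # and the target y.
        return np.sum((f(x) - y) ** 2)

    def neg_log_likelihood(f, x, y):
        # Classification loss: minus the log of the probability the model assigns
        # to the correct class y, assuming f(x) is a vector of non-negative values
        # that sum to 1 (the constraints above).
        return -np.log(f(x)[y])

    # Made-up toy predictors, only to show how the losses are evaluated.
    regressor = lambda x: 2.0 * x                     # predicts a real value
    classifier = lambda x: np.array([0.7, 0.2, 0.1])  # predicts class probabilities

    print(squared_error(regressor, np.array([1.5]), np.array([3.2])))  # (3.0 - 3.2)^2 = 0.04
    print(neg_log_likelihood(classifier, None, 0))                     # -log(0.7) ≈ 0.36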

Unsupervised Learning(非監(jiān)督式學(xué)習(xí))

In unsupervised learning we are learning a function $f$ which helps to characterize the unknown distribution $P(Z)$. Sometimes $f$ is directly an estimator of $P(Z)$ itself (this is called density estimation). In many other cases $f$ is an attempt to characterize where the density concentrates. Clustering algorithms divide up the input space in regions (often centered around a prototype example or centroid). Some clustering algorithms create a hard partition (e.g. the k-means algorithm) while others construct a soft partition (e.g. a Gaussian mixture model) which assigns to each $Z$ a probability of belonging to each cluster. Another kind of unsupervised learning algorithm constructs a new representation for $Z$. Many deep learning algorithms fall in this category, and so does Principal Components Analysis.

在無(wú)監(jiān)督學(xué)習(xí)中,我們要學(xué)習(xí)一個(gè)函數(shù)? 來(lái)描述未知分布?。通常?是對(duì)?本身的一個(gè)估計(jì)(密度估計(jì))。在許多其他情況下,? 嘗試描述哪里是密度中心。聚類(lèi)算法按照區(qū)域(通常圍繞一個(gè)原始的樣本的或質(zhì)心)劃分輸入空間。一些聚類(lèi)算法創(chuàng)建一個(gè)硬劃分(如,k-均值算法),而其他構(gòu)建一個(gè)軟劃分(如高斯混合模型),并分配給每個(gè)?一個(gè)概率表示屬于每個(gè)聚簇的可能性。另一類(lèi)無(wú)監(jiān)督的學(xué)習(xí)算法是一類(lèi)構(gòu)造? 的新表示的算法,許多深度學(xué)習(xí)算法屬于這一類(lèi),另外主成分分析(PCA)也是。

Local Generalization(局部泛化)

The vast majority of learning algorithms exploit a single principle for achieving generalization: local generalization. It assumes that if input example $x_i$ is close to input example $x_j$, then the corresponding outputs $f(x_i)$ and $f(x_j)$ should also be close. This is basically the principle used to perform local interpolation. This principle is very powerful, but it has limitations: what if we have to extrapolate? Or, equivalently, what if the target unknown function has many more variations than the number of training examples? In that case there is no way that local generalization will work, because we need at least as many examples as there are ups and downs of the target function, in order to cover those variations and be able to generalize by this principle. This issue is deeply connected to the so-called curse of dimensionality for the following reason. When the input space is high-dimensional, it is easy for it to have a number of variations of interest that is exponential in the number of input dimensions. For example, imagine that we want to distinguish between 10 different values of each input variable (each element of the input vector), and that we care about all the $10^n$ configurations of these $n$ variables. Using only local generalization, we need to see at least one example of each of these $10^n$ configurations in order to be able to generalize to all of them.
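
A tiny nearest-neighbour sketch (hypothetical data and target function, chosen only for illustration) shows why purely local generalization breaks down once we leave the neighbourhood of the training examples.

    import numpy as np

    def nearest_neighbor_predict(x_train, y_train, x_query):
        # Purely local generalization: copy the output of the closest training example.
        idx = np.argmin(np.abs(x_train[:, None] - x_query[None, :]), axis=0)
        return y_train[idx]

    # A target function with several ups and downs, observed only on [0, 2].
    target = lambda x: np.sin(3.0 * x)
    x_train = np.linspace(0.0, 2.0, 50)
    y_train = target(x_train)

    x_near = np.array([0.95])  # inside the training range: local interpolation is close
    x_far = np.array([5.0])    # outside the training range: the prediction is unrelated
    print(nearest_neighbor_predict(x_train, y_train, x_near), target(x_near))
    print(nearest_neighbor_predict(x_train, y_train, x_far), target(x_far))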

Distributed versus Local Representation and Non-Local Generalization(分布式 VS 局部表示和非局部泛化)

A simple-minded binary local representation of integer $N$ is a sequence of $B$ bits such that $N < B$, and all bits are 0 except the $N$-th one. A simple-minded binary distributed representation of integer $N$ is a sequence of $\log_2 B$ bits with the usual binary encoding for $N$. In this example we see that distributed representations can be exponentially more efficient than local ones. In general, for learning algorithms, distributed representations have the potential to capture exponentially more variations than local ones for the same number of free parameters. They hence offer the potential for better generalization because learning theory shows that the number of examples needed (to achieve a desired degree of generalization performance) to tune $O(B)$ effective degrees of freedom is $O(B)$.
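
To make the counting argument concrete, here is a small sketch contrasting the two encodings; the helper names are my own.

    import numpy as np

    def local_representation(n, num_bits):
        # One-hot code: num_bits bits, all zero except the n-th one (requires n < num_bits).
        code = np.zeros(num_bits, dtype=int)
        code[n] = 1
        return code

    def distributed_representation(n, num_bits):
        # Usual binary code: num_bits bits are enough for 2**num_bits distinct integers.
        return np.array([(n >> i) & 1 for i in reversed(range(num_bits))])

    print(local_representation(5, 8))        # [0 0 0 0 0 1 0 0] -- 8 bits cover only 8 values
    print(distributed_representation(5, 3))  # [1 0 1] -- 3 bits already cover 8 values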

Another illustration of the difference between distributed and local representations (and the corresponding non-local and local generalization) is (traditional) clustering versus Principal Component Analysis (PCA) or Restricted Boltzmann Machines (RBMs). The former is local while the latter two are distributed. With k-means clustering we maintain a vector of parameters for each prototype, i.e., one for each of the regions distinguishable by the learner. With PCA we represent the distribution by keeping track of its major directions of variation. Now imagine a simplified interpretation of PCA in which we care mostly, for each direction of variation, whether the projection of the data in that direction is above or below a threshold. With $d$ directions, we can thus distinguish between $2^d$ regions. RBMs are similar in that they define $d$ hyper-planes and associate a bit with the indicator of being on one side or the other of each hyper-plane. An RBM therefore associates one input region with each configuration of the representation bits (these bits are called the hidden units, in neural network parlance). The number of parameters of the RBM is roughly equal to the number of these bits times the input dimension. Again, we see that the number of regions representable by an RBM or by PCA (distributed representation) can grow exponentially in the number of parameters, whereas the number of regions representable by traditional clustering (e.g. k-means or a Gaussian mixture, a local representation) grows only linearly with the number of parameters. Another way to look at this is to realize that an RBM can generalize to a new region corresponding to a configuration of its hidden unit bits for which no example was seen, something not possible for clustering algorithms (except in the trivial sense of locally generalizing to that new region what has been learned for the nearby regions for which examples have been seen).
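
The region-counting contrast can be checked with a few lines of code; the numbers, dimensions, and random directions below are illustrative assumptions rather than anything prescribed by the text.

    import numpy as np

    d = 10  # directions / hyper-planes in the distributed representation
    k = 10  # prototypes in a clustering with a comparable number of parameters

    print(2 ** d)  # 1024 regions distinguishable with d binary features
    print(k)       # only 10 regions distinguishable with k prototypes

    # Thresholded projections: each input gets a d-bit code, one bit per direction,
    # indicating on which side of each hyper-plane it falls.
    rng = np.random.RandomState(0)
    directions = rng.randn(d, 5)  # d random directions in a 5-dimensional input space
    x = rng.randn(5)              # one input point
    bits = (directions @ x > 0).astype(int)
    print(bits)                   # the binary code of the region containing x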
