人工智能ai 学习_人工智能中强化学习的要点
人工智能ai 學(xué)習(xí)
As discussed earlier, in Reinforcement Learning, the agent takes decisions in order to attain maximum rewards. These rewards are the reinforcements through which the agent learns in this type of agent.
如前所述,在“ 強(qiáng)化學(xué)習(xí)”中 ,代理做出決策以獲取最大的回報(bào)。 這些獎(jiǎng)勵(lì)是代理在此類代理中學(xué)習(xí)的增強(qiáng)。
The reinforcements are of two types:
鋼筋有兩種類型:
Positive Reinforcement:
積極加固:
When the agent completes any task, if the feedback or the points for the task are in a positive response, then it is termed as the positive reinforcement. This type of reinforcement increases the performance of the agent as the agent now gets a hint that it has to make decisions and perform tasks in this particular manner to earn more rewards in the future also.
當(dāng)代理完成任何任務(wù)時(shí),如果任務(wù)的反饋或要點(diǎn)處于積極響應(yīng)中,則稱為積極強(qiáng)化。 這種增強(qiáng)方式可以提高代理的性能,因?yàn)榇憩F(xiàn)在可以暗示它必須以這種特定方式做出決定并執(zhí)行任務(wù),以在將來(lái)也獲得更多的回報(bào)。
Negative Reinforcement:
負(fù)加固:
Whenever the agent fails to perform any task as required, in that case, the agent is provided with negative reinforcement. This can be thought as of giving punishment to a child for doing mischiefs. The negative reinforcements tell the agent that such type of performance or such type of decisions must be avoided in the future while solving similar types of problems.
每當(dāng)代理未能按要求執(zhí)行任何任務(wù)時(shí),在這種情況下,就會(huì)為代理提供負(fù)加固。 可以認(rèn)為這是對(duì)孩子作惡的懲罰。 負(fù)面的補(bǔ)充告訴代理人,將來(lái)在解決類似類型的問(wèn)題時(shí),必須避免這種績(jī)效或這種決策。
Factors on which the performance of the agent which learns through Reinforcements depend:
通過(guò)增援來(lái)學(xué)習(xí)的業(yè)務(wù)代表的績(jī)效取決于以下因素:
Input:
輸入:
The Agent seeks the initial stage as the input from which it has to start. This is an important phase because all the observations and inferences will be drawn starting from this state, and the past state of the agent will not be considered.
代理尋求初始階段作為必須從其開始的輸入。 這是重要的階段,因?yàn)閷拇藸顟B(tài)開始繪制所有觀察和推論,并且不會(huì)考慮代理的過(guò)去狀態(tài)。
Output:
輸出:
The output state that the system will reach after solving a certain problem is not fixed as there are multiple ways of solving a problem and the agent can choose different solution whenever it tries to solve the same type of problem.
系統(tǒng)解決某個(gè)問(wèn)題后將達(dá)到的輸出狀態(tài)不是固定的,因?yàn)橛卸喾N解決方法,并且座席在嘗試解決同一類型的問(wèn)題時(shí)可以選擇不同的解決方案。
Training/Learning:
培訓(xùn)/學(xué)習(xí):
The training phase or the Learning Phase is when the agent builds its Knowledge Base from the reward or punishment that it gets based on the output it produces. It is a very important phase in Reinforcement Learning because it helps the agent to understand and learn in the same way as humans. This implements the human behavior in agents which is the main target in Artificial Intelligence.
培訓(xùn)階段或?qū)W習(xí)階段是指代理根據(jù)其產(chǎn)生的輸出所獲得的獎(jiǎng)勵(lì)或懲罰建立其知識(shí)庫(kù)。 這是強(qiáng)化學(xué)習(xí)中非常重要的階段,因?yàn)樗梢詭椭硪耘c人類相同的方式來(lái)理解和學(xué)習(xí)。 這在代理中實(shí)現(xiàn)了人類行為,而代理是人工智能的主要目標(biāo)。
翻譯自: https://www.includehelp.com/ml-ai/main-points-of-reinforcement-learning-in-artificial-intelligence.aspx
人工智能ai 學(xué)習(xí)
總結(jié)
以上是生活随笔為你收集整理的人工智能ai 学习_人工智能中强化学习的要点的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
- 上一篇: dumpstack_Java Threa
- 下一篇: 图解:为什么非公平锁的性能更高?