
Table-Based Q-Learning In Under 1KB

Published: 2023/12/15

Introduction

Q-learning is an algorithm in which an agent interacts with its environment and collects rewards for taking desirable actions.


The simplest implementation of Q-learning is referred to as tabular or table-based Q-learning. There are tons of articles, tutorials, etc. already available on the web describing Q-learning, so I won't go into excruciating detail here. Instead, I want to show how efficiently table-based Q-learning can be done using tinymind. In this article, I will describe how tinymind implements Q-learning using C++ templates and fixed-point (Q-format) numbers, and walk through the example in the repo.


The Maze Problem

A common table-based Q-learning problem is to train a virtual mouse to find its way out of a maze to get the cheese (reward). Tinymind contains an example program which demonstrates how the Q-learning template library works.


In the example program, we define the maze:


/*
Q-Learning unit test. Learn the best path out of a simple maze.

5 == Outside the maze
 _________________________________________________________________
|                     |                     |
|                     |                     |
|          0          |          1          /          5
|                     |                     |
|__________/  ________|__/  ________________|_____________________
|                     |                     |                     |
|                     |                     /                     |
|          4          |          3          |          2          |
|          /          |                     |                     |
|__/  ________________|_____________________|_____________________|
                      5

The paths out of the maze:
0->4->5
0->4->3->1->5
1->5
1->3->4->5
2->3->1->5
2->3->4->5
3->1->5
3->4->5
4->5
4->3->1->5
*/

We define all of our types in a common header so that we can separate the maze learner code from the training and file management code. I have done this so that we can measure the amount of code and data required for the Q-learner alone. The common header defines the maze as well as the type required to hold states and actions:


// 6 rooms and 6 actions
#define NUMBER_OF_STATES 6
#define NUMBER_OF_ACTIONS 6

typedef uint8_t state_t;
typedef uint8_t action_t;

We train the mouse by dropping it into a randomly selected room (or outside the maze, where the cheese is). The mouse starts off by taking a random action from the list of available actions at each step. The mouse receives a reward only when it finds the cheese (i.e. makes it to position 5, outside the maze). If the mouse is dropped into position 5, it has to learn to stay there and not wander back into the maze.


Building The Example

Starting from cppnnml/examples/maze, I will create a directory to hold the executable file and build the example.


mkdir -p ~/maze
g++ -O3 -o ~/maze/maze maze.cpp mazelearner.cpp -I../../cpp

This builds the maze learner example program and places the executable file at ~/maze. We can now cd into the directory where the executable file was generated and run the example program.


cd ~/maze
./maze

When the program finishes running, you’ll see the last of the output messages, something like this:

程序完成運行后,您將看到最后一條輸出消息,如下所示:

take action 5
*** starting in state 3 ***
take action 4
take action 5
*** starting in state 2 ***
take action 3
take action 2
take action 3
take action 4
take action 5
*** starting in state 3 ***
take action 4
take action 5
*** starting in state 5 ***
take action 5

Your messages may be slightly different since we're starting our mouse in a random room on every iteration. During example program execution, we save all mouse activity to files (maze_training.txt and maze_test.txt). Within the training file, the mouse takes completely random actions for the first 400 episodes; over the next 100 episodes, the randomness is decreased from 100% to 0%. To see the first few training iterations you can do this:


head maze_training.txt

You should see something like this:


1,3,4,0,4,5,
4,5,
2,3,1,3,4,3,1,5,
5,5,
4,5,
1,5,
3,2,3,4,3,4,5,
0,4,0,4,0,4,0,4,5,
1,3,1,5,
5,4,0,4,3,1,3,1,5,

Again, your messages will look slightly different. The first number is the start state, and every comma-separated value after that is the random movement of the mouse from room to room. Example: in the first line above we started in room 1, then moved to 3, then 4, then 0, then back to 4, and then to 5. Since 5 is our goal state, we stopped. The reason this looks so erratic is that for the first 400 iterations of training, we make a random choice from our possible actions. Once we get to state 5, we get our reward and stop.


During the test runs, we’ve decreased our randomness down to 0% and so we rely upon the Q-table to decide which action to take from the state our mouse is in.


Visualizing Training And Testing

I have included a Python script to plot the training and test data. If we plot the training data for start state == 2 (i.e. the mouse is dropped into room 2 at the beginning):


Each line on the graph represents an episode where we've randomly placed the mouse into room 2 at the start of the episode. You can see that in the worst-case run, we took 32 random moves to find the goal state (state 5). This is because at each step, we're simply generating a random number to choose from the available actions (i.e. which room to move to next). If we use the script to plot the testing data for start state == 2:


You can see that once trained, the Q-learner has learned, through random experimentation, to follow an optimal path to the goal state: 2->3->4->5.

您可以看到,經過訓練后,Q-學習者通過隨機實驗學會了遵循最佳狀態進入目標狀態的途徑:2-> 3-> 4-> 5。

What happens when we drop the virtual mouse outside of the maze where the cheese is? If we plot the training data:


The mouse is making random decisions during training and so wanders back into the maze in most episodes. After training:


Our virtual mouse has learned to stay put and get the reward.


Determining The Size Of The Q-Learner

We can determine how much code and data are taken up by the Q-learner by compiling just the machine learner code and using the size program:


g++ -c mazelearner.cpp -O3 -I../../cpp && mv mazelearner.o ~/maze/.
cd ~/maze
size mazelearner.o

The output you should see is:


   text    data     bss     dec     hex filename
    540       8     348     896     380 mazelearner.o

The total code + data footprint of the Q-learner is 896 bytes. This should allow a table-based Q-learning implementation to fit in any embedded system available today.


Conclusion

Table-based Q-learning can be done very efficiently using the capabilities provided within tinymind. We don’t need floating point or fancy interpreted programming languages. One can instantiate a Q-learner using C++ templates and fixed point numbers. Clone the repo and try the example for yourself!


Translated from: https://medium.com/swlh/table-based-q-learning-in-under-1kb-3cc0b5b54b43
