SorterBot - Part 1


A web-based solution to control a swarm of Raspberry Pis, featuring a real-time dashboard, a deep learning inference engine, 1-click Cloud deployment, and dataset labeling tools.

This is the first article of the three-part SorterBot series.

  • Part 1 — General project description and the Web Application
  • Part 2 — Controlling the Robotic Arm
  • Part 3 — Transfer Learning and Cloud Deployment (coming soon)

Source code on GitHub:

  • Control Panel: Django backend and React frontend, running on EC2
  • Inference Engine: Object Recognition with PyTorch, running on ECS
  • Raspberry: Python script to control the Robotic Arm
  • Installer: AWS CDK, GitHub Actions and a bash script to deploy the solution
  • LabelTools: Dataset labeling tools with Python and OpenCV

I recently completed an AI mentorship program at SharpestMinds, of which the central element was to build a project, or even better, a complete product. I chose the latter, and in this article, I write about what I built, how I built it, and what I learned along the way. Before we get started, I would like to send a special thanks to my mentor, Tomas Babej (CTO@ProteinQure) for his invaluable help during this journey.

When thinking about what to build, I came up with an idea of a web-based solution to control a swarm of Raspberry Pis, featuring a real-time dashboard, a deep learning inference engine, 1-click Cloud deployment, and dataset labeling tools. The Raspberry Pis can have any sensors and actuators attached to them. They collect data, send it to the inference engine, which processes it and turns it into commands that the actuators can execute. A control panel is also included to manage and monitor the system, while the subsystems communicate with each other using either WebSockets or REST API calls.

As an implementation of the above general idea, I built SorterBot, where the sensor is a camera, and the actuators are a robotic arm and an electromagnet. This solution is able to automatically sort metal objects based on how they look. When the user starts a session, the arm scans the area in front of it, locates the objects and containers within its reach, then automatically divides the objects into as many groups as containers were found. Finally, it moves the objects to their corresponding containers.

SorterBot automatically picks up objects

To process the images taken by the arm's camera, I built an inference engine based on Facebook AI's Detectron2 framework. When a picture arrives for processing, it localizes the items and containers on that image, then saves the bounding boxes to the database. After the last picture in a given session is processed, the items are clustered into as many groups as containers were found. Finally, the inference engine generates commands instructing the arm to move similar-looking items into the same container.

To make it easier to control and monitor the system, I built a control panel, using React for the front-end and Django for the back-end. The front end shows a list of registered arms, allows the user to start a session, and also shows existing sessions with their statuses. Under each session, the user can access the logically grouped logs, as well as before and after overview images of the working area. To avoid paying for AWS resources unnecessarily, the user also has the option to start and stop the ECS cluster where the inference engine runs, using a button in the header.

User Interface of the Control Panel

To make it easier for the user to see what the arm is doing, I used OpenCV to stitch together the pictures that the camera took during the session. Additionally, another set of pictures are taken after the arm moved the objects to the containers, so the user can see a before/after overview of the area and verify that the arm actually moved the objects to the containers.

Overview image made of the session images stitched together

The backend communicates with the Raspberry Pis via WebSockets and REST calls, handles the database, and controls the inference engine. To deliver real-time updates from the backend as they happen, the front end also communicates with the backend via WebSockets.

Since the solution consists of many different AWS resources and it is very tedious to manually provision them, I automated the deployment process utilizing AWS CDK and a lengthy bash script. To deploy the solution, 6 environment variables have to be set, and a single bash script has to be run. After the process finishes (which takes around 30 minutes), the user can log in to the control panel from any web browser and start using the solution.

The Web Application

Conceptually the communication protocol has two parts. The first part is a repeated heartbeat sequence that the arm runs at regular intervals to check if everything is ready for a session to be started. The second part is the session sequence, responsible for coordinating the execution of the whole session across subsystems.

Diagram illustrating how the different parts of the solution communicate with each other

Heartbeat Sequence

The point where the execution of the first part starts is marked with a green rectangle. As the first step, the Raspberry Pi pings the WebSocket connection to the inference engine. If the connection is healthy, it skips over to the next part. If the inference engine appears to be offline, it requests its IP address from the control panel. After the control panel returns the IP (or ‘false’ if the inference engine is actually offline), it tries to establish a connection with the new address. This behavior enables the inference engine to be turned off when it’s not in use, which lowers costs significantly. It also simplifies setting up the arms, which is especially important when multiple arms are used.
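The heartbeat flow above can be sketched as a single function. The dependencies are injected as callables so the control flow is easy to follow and test; all names here are illustrative, not the actual functions in the Raspberry repository:

```python
def heartbeat(ping_engine, fetch_engine_ip, connect_engine, report_status, arm_id):
    """One iteration of the heartbeat sequence run by each arm.

    ping_engine()      -> bool, True if the current WebSocket connection is healthy
    fetch_engine_ip()  -> IP string from the control panel, or False if the
                          inference engine is actually offline
    connect_engine(ip) -> bool, True if a connection to the new IP was established
    report_status(arm_id, ok) -> the control panel's reply to the check-in
    """
    connected = ping_engine()
    if not connected:
        # Inference engine appears offline: ask the control panel for its IP,
        # then try to establish a connection with the new address.
        ip = fetch_engine_ip()
        connected = bool(ip) and connect_engine(ip)
    # Regardless of the outcome, report the result alongside the arm's ID;
    # the reply tells the arm whether to start a session.
    return report_status(arm_id, connected)
```

Injecting the network calls like this also makes it cheap to exercise the offline-engine branch without real hardware.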

Regardless of whether the connection with the new IP succeeds, the result is reported to the control panel alongside the arm's ID. When the control panel receives the connection status, it first checks if the arm ID is already registered in the database, and registers it if needed. After that, the connection status is pushed to the UI, where a status LED lights up in green or orange, indicating whether or not the connection succeeded.

An arm as it appears on the UI, with the start button and status light

On the UI, next to the status LED, there is a ‘play’ button. When the user clicks this button, the arm’s ID is added to a list in the database that contains the IDs of the arms that should start a session. When an arm checks in with the connection status, and that status is green, it checks if its ID is in that list. If it is, the ID gets removed and a response is sent back to the arm to start a session. If it isn’t, a response is sent back to restart the heartbeat sequence without starting a session.
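The play-button bookkeeping on the control panel side can be sketched like this. In-memory sets stand in for the actual database tables, and the function names are hypothetical:

```python
pending_sessions = set()  # arm IDs queued by the user pressing 'play'
registered_arms = set()   # stands in for the arms table in the database

def press_play(arm_id):
    """Called when the user clicks the play button next to an arm."""
    pending_sessions.add(arm_id)

def handle_checkin(arm_id, connection_ok):
    """Handle an arm's heartbeat check-in.

    Returns True if the arm should start a session, False if it should
    simply restart the heartbeat sequence.
    """
    if arm_id not in registered_arms:
        registered_arms.add(arm_id)  # register previously unseen arms
    if connection_ok and arm_id in pending_sessions:
        pending_sessions.discard(arm_id)  # consume the queued start request
        return True
    return False
```

Note that a queued start request survives a failed check-in: the session starts on the next heartbeat whose status is green.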

Session Sequence

The first task of the arm is to take pictures for inference. To do that, the arm moves to inference position then starts to rotate at its base. It stops at certain intervals, then the camera takes a picture, which is directly sent to the inference engine as bytes, using the WebSocket connection.

High-level diagram of the Inference Engine

When the image data is received from the Raspberry Pi, the image processing begins. First, the image is decoded from bytes, then the resulting NumPy array is used as the input of the Detectron2 object recognizer. The model outputs bounding box coordinates of the recognized objects alongside their classes. The coordinates are relative distances from the top-left corner of the image measured in pixels. Only binary classification is done here, meaning an object can be either an item or a container. Further clustering of items is done in a later step. At the end of the processing, the results are saved to the PostgreSQL database, then the images are written to disk to be used later by the vectorizer, and archived to S3 for later reference. Saving and uploading the image is not in the critical path, so they are executed in a separate thread. This lowers execution time as the sequence can continue before the upload finishes.

When evaluating models in Detectron2's model zoo, I chose Faster R-CNN R-50 FPN, as it provides the lowest inference time (43 ms), lowest training time (0.261 s/iteration), and lowest training memory consumption (3.4 GB), without giving up too much accuracy (41.0 box AP, which is 92.5% of the best network's box AP), compared to other available architectures.

High-level diagram of the Vectorizer

After all of the session images have been processed and the signal to generate session commands has arrived, stitching together these pictures starts in a separate process, providing a 'before' overview for the user. In parallel, all the image processing results belonging to the current session are loaded from the database. First, the coordinates are converted to absolute polar coordinates using an arm-specific constant sent with the request. The constant r represents the distance between the center of the image and the arm's base axis. The relative coordinates (x and y on the drawing below) are pixel distances from the top-left corner of the image. The angle where the image was taken is denoted with γ. Δγ represents the difference between the angle of the given item and the image's center and can be calculated using equation 1) on the drawing below. The first absolute polar coordinate of the item (angle, γ') can be calculated simply as γ' = γ + Δγ. The second coordinate (radius, r') can be calculated using equation 2) on the drawing.
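As a rough illustration of the conversion, here is a planar reconstruction of the geometry. The authoritative equations are the ones on the drawing; this sketch assumes a flat working area, a pixel-to-physical scale factor, and an image whose vertical axis is radial with respect to the base, all of which are my assumptions rather than the project's documented math:

```python
import math

def to_polar(x, y, img_w, img_h, gamma, r, px_per_unit=1.0):
    """Convert a pixel coordinate (x, y, measured from the image's top-left
    corner) to absolute polar coordinates (gamma_prime, r_prime) around the
    arm's base axis.

    gamma is the base angle at which the image was taken; r is the distance
    between the image center and the base axis (arm-specific constant).
    """
    # Offsets from the image center, rescaled to the same unit as r.
    dx = (x - img_w / 2) / px_per_unit   # tangential offset
    dy = (img_h / 2 - y) / px_per_unit   # radial offset (up = away from base)
    delta_gamma = math.atan2(dx, r + dy)   # one plausible form of equation 1)
    gamma_prime = gamma + delta_gamma      # first polar coordinate (angle)
    r_prime = math.hypot(dx, r + dy)       # one plausible form of equation 2)
    return gamma_prime, r_prime
```

An item at the exact image center maps to (γ, r), as expected, since both offsets vanish.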

Drawing and equations used to convert relative coordinates to absolute polar coordinates

After the conversion of the coordinates, the bounding boxes belonging to the same physical objects are replaced by their averaged absolute coordinates.

In the preprocessing step for the vectorizer, the images saved to disk during the previous step are loaded, then cropped around the bounding boxes of each object, resulting in a small picture of every item.

Example of an object cropped around its bounding box

These pictures are converted to tensors, then added to a PyTorch dataloader. Once all the images are cropped, the resulting batch is processed by the vectorizer network. The chosen architecture is a ResNet18 model, which is appropriate for these small images. A PyTorch hook is inserted after the last fully connected layer, so in each inference step the output of that layer, a 512-dimensional feature vector, is copied to a tensor outside of the network. After the vectorizer has processed all of the images, the resulting tensor is directly used as input of the K-Means clustering algorithm. For the other required input, the number of clusters to be computed, a simple count of the recognized containers is retrieved from the database. This step outputs a set of pairings representing which item goes to which container. Lastly, these pairings are replaced with absolute coordinates that are sent to the robotic arm.
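The clustering step can be illustrated with a minimal pure-Python K-Means over the feature vectors. The project itself clusters 512-dimensional ResNet18 features, presumably with a library implementation; this toy version only shows the assignment logic:

```python
import random

def _dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20, seed=0):
    """Assign each feature vector to one of k clusters (one per container).

    Returns a list of cluster labels, one per input vector.
    """
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)  # initialize centers from the data
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: every vector goes to its nearest center.
        labels = [min(range(k), key=lambda c: _dist2(vec, centers[c]))
                  for vec in vectors]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [vec for vec, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With k set to the number of recognized containers, the returned labels are exactly the item-to-container pairings described above.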

The commands are pairs of coordinates representing items and containers. The arm executes these one by one, moving the objects to the containers using the electromagnet.

After the objects have been moved, the arm takes another set of pictures to be stitched together, providing an overview of the area after the operation. Finally, the arm resets to its initial position and the session is complete.

To be continued in Part 2…

Translated from: https://medium.com/swlh/web-application-to-control-a-swarm-of-raspberry-pis-with-an-ai-enabled-inference-engine-b3cb4b4c9fd
