Rapido: Using Data to Improve Ride Dispatch

Published: 2023/11/29

Given our last blog post in the series, which can be found here, we thought it would be helpful to explain how we put all of the above into practice in an on-ground experiment. We mentioned there how the lack of a logical time-based control group forced us to pivot to geo-temporal control formation. I would like to take this opportunity to talk about an experiment we ran as part of the Dispatch team @ Rapido.

What is a Ride Dispatch?

The system that decides which ride request (created when you tap the Request Rapido button, aka the Book my Ride button, in your app) should be sent to which Captain(s), so that the Captain reaches the customer in the quickest and most efficient way possible, is called "Dispatch". The name is an homage to the days when taxi services were run over the telephone: a customer who called in for a pickup would be patched through to an agent, who would find a willing cabbie (often after multiple calls), and that driver was "dispatched" for that order.

Dispatch is one of the key levers of a ride-hailing marketplace. It is one of those systems that EVERY ride request has to pass through, so the room for error is small and the stakes are very high.

One of the first questions we had to answer, before even thinking about what product to build, was: "What metrics do we look at to see whether marketplace conditions are improving?" Is the ETA the gold metric for this system, or do we look at other things such as matching time, the distance driven by the captain to reach the customer, and cancellations from both the demand and supply sides? We had to be cognizant of all of these metrics while evaluating any change to our system.

What was Dispatch @ Rapido like before we started rebuilding it?

Going into the rebuilding process, the existing dispatch system was a simple radial one: a customer requests a ride on the app, the system draws a circle of radius, say, 2 km around the customer, looks at all the captains in that area, calculates each captain's crow-flying (straight-line) distance to the customer, and propagates the ping in order of that distance.
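The radial approach described above can be sketched in a few lines. This is only an illustration, not Rapido's implementation: the function names, the captain dictionary layout, the coordinates and the nearest-first ordering are all assumptions, and the haversine formula is used here as one common way to get a straight-line distance.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ("as the crow flies") distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def radial_dispatch_order(customer, captains, radius_km=2.0):
    """Return the ids of captains inside the circle, nearest first.

    customer: (lat, lon); captains: {captain_id: (lat, lon)} (hypothetical layout).
    """
    in_range = []
    for captain_id, (lat, lon) in captains.items():
        distance = haversine_km(customer[0], customer[1], lat, lon)
        if distance <= radius_km:
            in_range.append((distance, captain_id))
    return [captain_id for _, captain_id in sorted(in_range)]

# Made-up coordinates, for illustration only.
captains = {
    "c1": (17.3900, 78.4900),  # roughly 0.7 km from the customer
    "c2": (17.3860, 78.4870),  # roughly 0.1 km from the customer
    "c3": (17.4500, 78.5500),  # several km away, outside the 2 km circle
}
order = radial_dispatch_order((17.3850, 78.4867), captains)
```

Note that nothing in this sketch knows about roads, which is exactly the weakness of a purely radial system.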

As a first solution this is fine, but discerning data enthusiasts can probably find many issues with it: how do we choose the optimal radius, and what happens if there is a huge divider, like a ring road or a railway crossing, that results in a short Euclidean distance but a long route-based distance? The latter case is a sub-optimal match: the captain has to spend more time driving empty kilometers to reach the customer, and the customer gets frustrated at being matched to a captain who looks close by but takes twice as long to reach the pickup location.

This specific use case can be reduced to a higher-level question: for a given pickup location, is there a corresponding nearby area that should be geo-fenced when considering it to be a part of the “dispatch radius”?


Furthermore, is there a location that is further away in a Euclidean sense, but closer in terms of driving time?

Won't this problem be alleviated by paying for a maps API?

It is too expensive at a per-request level. Right now, even though we are at 20% of our pre-COVID levels (and recovering every week!), servicing each request via the Google Maps API would be prohibitively expensive for a growing startup like Rapido, especially in times like these, which call for innovation. The goal was to deploy a smart solution that would have a high impact on the ground without breaking the bank.

Building the driving time estimates

The most crucial component of a smart Dispatch system is reliable driving time estimates. We build these by leveraging the huge store of data available to us from our historical rides. As part of our internal logging, we record the time taken:

From the captain's location to the customer, aka the ETA

From the customer's pickup to the customer's drop, aka the Ridetime

Each of these gives us more coverage of pickup-to-drop driving times within a city: the ETA gives us short-distance coverage, and the Ridetime gives us longer-distance coverage. We combine the two sources of data, group by time of day and day of week, remove outliers, filter out buckets with too few rides to be considered valid, and store the output in a dataset that any concerned team can consume.
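The aggregation steps above could look roughly like the following sketch. The record layout, the 1.5 x IQR outlier rule, the minimum of 5 rides per bucket and the use of the median as the stored estimate are all assumptions made for illustration; the post does not say which outlier rule or thresholds were used.

```python
from collections import defaultdict
from statistics import median, quantiles

MIN_RIDES = 5  # assumed threshold; the real value is not given in the post

def build_driving_time_table(records, min_rides=MIN_RIDES):
    """Aggregate observed legs into one driving-time estimate per bucket.

    records: iterable of (from_hex, to_hex, day_of_week, hour, minutes).
    ETA legs (captain -> pickup) and Ridetime legs (pickup -> drop) are passed
    in together, since both are observed driving times between two hexes.
    """
    buckets = defaultdict(list)
    for from_hex, to_hex, day_of_week, hour, minutes in records:
        buckets[(from_hex, to_hex, day_of_week, hour)].append(minutes)

    table = {}
    for key, times in buckets.items():
        if len(times) < min_rides:
            continue  # too few rides for the bucket to be considered valid
        q1, _, q3 = quantiles(times, n=4)
        iqr = q3 - q1
        kept = [t for t in times if q1 - 1.5 * iqr <= t <= q3 + 1.5 * iqr]
        if len(kept) >= min_rides:
            table[key] = median(kept)
    return table

# Toy data: one well-covered bucket with an outlier, one sparse bucket.
eta_legs = [("A", "B", "Mon", 9, m) for m in (6, 7, 7, 8, 8, 9, 45)]
ride_legs = [("A", "C", "Mon", 9, m) for m in (20, 22)]
table = build_driving_time_table(eta_legs + ride_legs)
```

In this toy run the 45-minute leg is dropped as an outlier and the two-ride bucket is filtered out, leaving a single estimate for A to B on Monday at 9.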

Designing the experiment

Once we have a pickup-to-drop driving time map at a time-of-day and day-of-week level, we get to the dirty work of actually designing an experiment. The first step was to answer the question: "For a pickup location, can we find a close-by area that has a worse driving time to the source than an area further away?" I will use this segue to introduce some of the terminology we use in this regard:

source_hex: the Uber H3-derived hex8 in which the ride request originates

bad_hex: the Uber H3-derived hex8 that is geometrically closer to the source_hex, but not closer in driving time

good_hex: the Uber H3-derived hex8 that is geometrically further away from the source_hex, but has a faster driving time to it than the bad_hex

We do this analysis at a time_of_day and day_of_week level, so a trio of HexA, HexB and HexC could be mapped as source_hex -> HexA, bad_hex -> HexB, good_hex -> HexC on a Monday morning, while on a Sunday evening HexB and HexC need not have the same relative driving times to HexA. We were careful not to make too many dangerous assumptions here.
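As a rough sketch of how such trios could be found for one day and time bucket: given driving times towards each source hex and straight-line distances between hexes, a (bad, good) pair is any two candidate hexes where the geometrically closer one is slower to drive from. The dictionary layouts and the function name below are hypothetical, and the numbers are toy values.

```python
def find_hex_trios(driving_minutes, crow_km, bucket):
    """Return (source_hex, bad_hex, good_hex) trios valid in one bucket.

    driving_minutes: {(candidate_hex, source_hex, bucket): minutes to drive
                      from the candidate hex to the source hex}
    crow_km:         {(candidate_hex, source_hex): straight-line distance in km}
    """
    candidates = {}
    for (cand, source, b), minutes in driving_minutes.items():
        if b == bucket:
            candidates.setdefault(source, []).append((cand, minutes))

    trios = []
    for source, cands in candidates.items():
        for bad, bad_minutes in cands:
            for good, good_minutes in cands:
                closer = crow_km[(bad, source)] < crow_km[(good, source)]
                slower = bad_minutes > good_minutes
                if closer and slower:
                    trios.append((source, bad, good))
    return trios

monday = ("Mon", "morning")
sunday = ("Sun", "evening")
driving_minutes = {
    ("B", "A", monday): 14.0, ("C", "A", monday): 6.0,
    ("B", "A", sunday): 5.0, ("C", "A", sunday): 7.0,
}
crow_km = {("B", "A"): 0.9, ("C", "A"): 1.7}

monday_trios = find_hex_trios(driving_minutes, crow_km, monday)
sunday_trios = find_hex_trios(driving_minutes, crow_km, sunday)
```

With these toy numbers, B is a bad_hex and C a good_hex for source A on Monday morning, but on Sunday evening no trio exists because the closer hex is also the faster one.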

An example of a source_hex, bad_hex and good_hex

Here the brown hex is the source_hex, the yellow hex is the level-1 bad_hex and the orange one is the level-2 good_hex. From this map alone, it is not clear why the ridetime from yellow to brown is longer than from orange to brown. But when we look at the Google Maps view, it becomes evident:

We see that the brown and yellow hex8s are separated by a huge railway track (Vijayawada is one of the biggest railway junctions in the country, and trains regularly cross the roads there). The orange hex8, on the other hand, has clear, unfettered access to the source_hex.

Once we have the universe of such hex trios, we are back to the problem of how to do a test-control split. Given that time-based control is not an option, we took features of each hex trio that are relevant to dispatch and passed them through a vector-similarity measure to calculate a similarity score for each pair of trios. A pair is only scored if both trios are valid on the same day and time period (that is, both the test and the control source hexes have bad and good hexes on the same day and time period).
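A minimal sketch of the pair scoring follows. The post only says "a vector-similarity measure", so cosine similarity is an assumption here, as are the feature names (orders per hour, mean ETA in minutes, cancellation rate) and the key layout.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def score_pairs(trio_features):
    """Score every pair of trios that is valid in the same bucket.

    trio_features: {(source_hex, bucket): feature_vector} (hypothetical layout).
    Returns [(similarity, key_a, key_b)] sorted with the most similar pair first.
    """
    scored = []
    for (key_a, feat_a), (key_b, feat_b) in combinations(sorted(trio_features.items()), 2):
        if key_a[1] == key_b[1]:  # same day and time period only
            scored.append((cosine_similarity(feat_a, feat_b), key_a, key_b))
    return sorted(scored, reverse=True)

monday = ("Mon", "morning")
features = {
    ("A", monday): [120.0, 6.0, 0.10],
    ("D", monday): [118.0, 6.2, 0.11],
    ("G", monday): [10.0, 9.0, 0.40],
    ("H", ("Wed", "afternoon")): [120.0, 6.0, 0.10],
}
pairs = score_pairs(features)
```

Here H is never paired with anything because it is only valid on Wednesday afternoon, and A and D come out as the most similar pair. In practice the features would probably need to be scaled first, since raw cosine similarity is dominated by the largest-magnitude feature.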

Example of a test group

Example of a control group

It doesn’t make a lot of sense to say HexA on a Monday morning is similar to HexB on a Wednesday afternoon. So we only do the split if HexA and HexB are both source_hexes on the same day and time period.


Once the above is done for each pair in the universe, we start building the test control split to ensure that no hex in the test group is also in the control group through some other mapping, as this would contaminate the experiment results.

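One way to build such a contamination-free split is a greedy pass over the scored pairs, sketched below. The greedy strategy, the trio ids and the rule that the first trio of each pair goes to the test group are assumptions for illustration; the post does not describe the actual assignment procedure.

```python
def build_split(scored_pairs, trio_hexes):
    """Greedily assign trio pairs to test and control without sharing hexes.

    scored_pairs: [(similarity, trio_a, trio_b)], most similar pair first.
    trio_hexes:   {trio_id: set of hexes in that trio (source, bad, good)}.
    The first trio of an accepted pair goes to test, the second to control.
    """
    test, control = [], []
    test_hexes, control_hexes = set(), set()
    for _, trio_a, trio_b in scored_pairs:
        if {trio_a, trio_b} & (set(test) | set(control)):
            continue  # one of the trios is already assigned
        hexes_a, hexes_b = trio_hexes[trio_a], trio_hexes[trio_b]
        if hexes_a & hexes_b or hexes_a & control_hexes or hexes_b & test_hexes:
            continue  # accepting this pair would put a hex in both groups
        test.append(trio_a)
        control.append(trio_b)
        test_hexes |= hexes_a
        control_hexes |= hexes_b
    return test, control

trio_hexes = {
    "T1": {"A", "B", "C"}, "T2": {"D", "E", "F"},
    "T3": {"C", "G", "H"}, "T4": {"I", "J", "K"},
    "T5": {"D", "L", "M"}, "T6": {"N", "O", "P"},
}
scored_pairs = [(0.99, "T1", "T2"), (0.97, "T5", "T6"), (0.95, "T3", "T4")]
test_group, control_group = build_split(scored_pairs, trio_hexes)
```

In this toy run the pair (T5, T6) is rejected because T5 shares hex D with T2, which is already in the control group. T3 shares hex C with T1, but both end up in the test group, so nothing is contaminated.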

With the test-control split in place, the treatment is as follows: for test-group source_hexes, the good_hex is included and the bad_hex is excluded when creating the "dispatch radius", whereas for control-group source_hexes the dispatch radius is left unchanged (the bad_hex stays in and the good_hex is not added). Given that everything else remains the same, the test group should show a reduced ETA compared to the control group after the experiment.
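Expressed as code, the treatment is a small set operation on the hexes that make up the dispatch radius. The function and argument names are hypothetical, and representing the radius as a set of hexes is an assumption of this sketch.

```python
def dispatch_hexes(radius_hexes, bad_hex, good_hex, in_test_group):
    """Return the hexes searched for captains for one source_hex.

    radius_hexes: hexes covered by the default dispatch radius.
    Test group:    the bad_hex is dropped and the good_hex is added.
    Control group: the default radius is used unchanged.
    """
    if not in_test_group:
        return set(radius_hexes)
    return (set(radius_hexes) - {bad_hex}) | {good_hex}

default_radius = {"A", "B", "E"}
test_hexes = dispatch_hexes(default_radius, bad_hex="B", good_hex="C", in_test_group=True)
control_hexes = dispatch_hexes(default_radius, bad_hex="B", good_hex="C", in_test_group=False)
```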

We then ran this experiment for 2 weeks and tried to get 1000+ orders cumulatively in both the test group and the control group, so we don’t suffer from data-sparsity while analysing what happened.


Experiment Results

We ran this experiment in Hyderabad, where the test group's ETA was lower than the control group's by around 9% when comparing median ETAs and by almost 13% when comparing mean ETAs. Before the experiment, the test and control groups differed by only about 3% on both mean and median ETAs, which shows that the changes we made actually added value to on-ground ETAs.
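The comparison above boils down to a relative gap between the two groups, looked at before and during the experiment. The sketch below shows the arithmetic on toy numbers that are not Rapido data; subtracting the pre-experiment gap from the experiment-period gap is an assumed, simple way of adjusting for the pre-existing difference.

```python
from statistics import mean, median

def relative_gap(test_etas, control_etas, stat):
    """(control - test) / control for a statistic; positive means test is faster."""
    control_value = stat(control_etas)
    return (control_value - stat(test_etas)) / control_value

def adjusted_gap(pre_test, pre_control, post_test, post_control, stat):
    """Experiment-period gap minus the gap that already existed beforehand."""
    return relative_gap(post_test, post_control, stat) - relative_gap(pre_test, pre_control, stat)

# Toy ETAs in minutes, for illustration only.
pre_test, pre_control = [5, 6, 6, 7, 11], [5, 6, 6, 7, 12]
post_test, post_control = [4, 5, 5, 6, 10], [5, 6, 6, 7, 11]

mean_gap = adjusted_gap(pre_test, pre_control, post_test, post_control, mean)
median_gap = adjusted_gap(pre_test, pre_control, post_test, post_control, median)
```

Reporting both the mean and the median gap is useful because the mean is sensitive to a few very long ETAs while the median is not.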

We know that no experiment can be called successful without statistical tests of significance, so we went into the experiment having defined our hypotheses as follows:

H0 (Null Hypothesis): Hex-based swaps have no effect on realized ETAs

H1 (Alternate Hypothesis): Hex-based swaps DO have an effect on realized ETAs

Rejecting the null hypothesis at a confidence level above 95% (a p-value below 0.05) is the gold standard we were striving for, and we are happy to report that we achieved statistically significant results at a level of around 98%, with p-values in the 0.01 range across a few statistical tests of significance.
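The post does not name the tests that were used. As one illustrative option, a two-sided permutation test on the difference in mean ETA needs no distributional assumptions and fits the H0/H1 above; the function name, the resample count and the toy data below are all assumptions.

```python
import random

def permutation_p_value(test, control, n_resamples=5000, seed=0):
    """Two-sided permutation test on the difference in means.

    Under H0 the group labels are exchangeable, so we shuffle the pooled ETAs
    and count how often a random split is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    n_test = len(test)
    pooled = list(test) + list(control)

    def mean_diff(values):
        left, right = values[:n_test], values[n_test:]
        return abs(sum(left) / len(left) - sum(right) / len(right))

    observed = mean_diff(pooled)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if mean_diff(pooled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)

# Toy ETAs in minutes: the test group is clearly faster than the control group.
test_etas = [4.0 + 0.1 * i for i in range(20)]
control_etas = [6.0 + 0.1 * i for i in range(20)]
p_value = permutation_p_value(test_etas, control_etas)
```

A p-value below 0.05 from such a test would reject H0 at the 95% level described above.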

Visualized, what we got looked something like this:

Test vs control group change visualized: the blue vertical line represents the mean of the test group, and the purple vertical line represents the mean of the control group

What this image tells us is that, when viewed on a relative scale and after adjusting for the pre-experiment ETA delta, the center of the test group's ETA distribution has shifted lower compared to the control group's, showing that our changes lowered ETAs as we expected.

Conclusion

The high-level goal, as mentioned at the start, was to improve a key aspect of dispatch: ETA. We wanted to add a good amount of value not by doing something cost-intensive, but by leveraging the technology and information we already had. This is the hallmark of any data-science team: using common sense and best practices to uncover hidden insights with as simple an approach as possible.

If you enjoyed this blog post, check out what we've posted so far over here, and keep an eye on the same space for some really cool upcoming blogs. If you have any questions about the problems we face as Data Scientists at Rapido, about transitioning to a start-up after a few years in a different field, or about anything else, please reach out to me on LinkedIn or at siddharth.p@rapido.bike. I look forward to answering any questions!

Translated from: https://medium.com/rapido-labs/improving-dispatch-with-data-6a307dab7ecc
