
Localization Paper Reading Series - RoNIN (Part 2) - Robust Neural Inertial Navigation in the Wild: Benchmark, Evaluations

Published: 2025/4/5

Collected and compiled by 生活随笔; shared here for reference.

Contents

0. Abstract
- 0.1 Sentence-by-sentence reading
- 0.2 Summary
1. Introduction
- 1.1 Sentence-by-sentence reading
  - Paragraph 1 (inertial sensors are important and worth studying)
  - Paragraph 2 (inertial navigation is an ideal way to navigate; several advantages)
  - Paragraph 3 (existing methods are subject to many constraints)
  - Paragraph 4 (prior efforts to relax these constraints)
  - Paragraph 5 (contributions of this paper)
  - Paragraph 6 (the authors will open-source the code and share the dataset)
2. Related Work
- 2.1 Physics-based (no priors)
  - Traditional inertial integration faces many limitations
- 2.2 Heuristic priors
  - Good results under specific constraints, but at odds with robustness
- 2.3 Data-driven priors
  - Traditional mechanization methods
- 2.4 Inertial navigation datasets
  - RIDI dataset
  - IONet dataset, namely OXIOD
  - Shortcomings of existing datasets
  - What the authors changed
3. The RoNIN dataset
  - The authors' description of their dataset (skipped for now)
4. Robust Neural Inertial Navigation (RoNIN)
- 4.1. Coordinate frame normalization
  - Paragraph 1 (the body frame and the navigation frame differ, so the choice of coordinate frame matters)
  - Paragraph 2 (the body frame keeps changing, so it cannot serve as the reference frame)
  - Paragraph 3 (alignment using gravity, and its flaw)
  - Paragraph 4 (aligning to one fixed frame is problematic, so the design accepts any gravity-aligned frame)
  - Paragraph 5 (random gravity-aligned frames during training, a fixed frame at test time)
- 4.2. Backbone architectures (backbone networks)
  - 4.2.1 Sentence-by-sentence reading
    - RoNIN ResNet
    - RoNIN LSTM
    - RoNIN TCN
  - 4.2.2 Summary
- 4.3. Robust velocity loss
  - 4.3.1 Latent velocity loss
    - Latent velocity loss
    - Strided velocity loss
- 4.4. RoNIN body heading network
  - 4.4.1 Sentence-by-sentence reading
  - 4.4.2 Summary

0. Abstract

0.1 Sentence-by-sentence reading

This paper sets a new foundation for data-driven inertial navigation research, where the task is the estimation of positions and orientations of a moving subject from a sequence of IMU sensor measurements.
(Roughly: use inertial sensors to estimate the current position.)

More concretely, the paper presents

1) a new benchmark containing more than 40 hours of IMU sensor data from 100 human subjects with ground-truth 3D trajectories under natural human motions;
(The paper prepares a new dataset.)

2) novel neural inertial navigation architectures, making significant improvements for challenging motion cases;

and 3) qualitative and quantitative evaluations of the competing methods over three inertial navigation benchmarks. We will share the code and data to promote further research.

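0.2 Summary

1. The paper provides a dataset and open-source code to support further research.
2. The paper proposes new inertial navigation architectures. (This presumably refers to the deep-learning backbones. Note that IONet, covered in Section 2.3, already used a neural network, so the novelty lies in the specific architectures and losses rather than in applying deep learning at all.)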

1. Introduction

1.1 Sentence-by-sentence reading

Paragraph 1 (inertial sensors are important and worth studying)

An inertial measurement unit (IMU), often a combination of accelerometers, gyroscopes, and magnetometers, plays an important role in a wide range of navigation applications.

In Virtual Reality, IMU sensor fusion produces real-time orientations of head-mounted displays.
(In other words, VR needs IMUs too.)

In Augmented Reality applications (e.g., Apple ARKit [1], Google ARCore [7], or Microsoft HoloLens [16]), IMU augments SLAM [17, 14, 6] by resolving scale ambiguities and providing motion cues in the absence of visual features.
(Inertial sensors help in many settings. I am not familiar with most of them, but in SLAM the inertial sensor usually serves as an optimization term that only corrects the estimate to some extent, while the system still relies mainly on image geometry. In a multi-source fusion system, if the inertial sensor is the only one still working, the system is by default considered to have failed.)

UAVs, autonomous cars, humanoid robots, and smart vacuum cleaners are other emerging domains, utilizing IMUs for enhanced navigation, control, and beyond.

Paragraph 2 (inertial navigation is an ideal way to navigate; several advantages)

Inertial navigation is the ultimate form of IMU-based navigation, whose task is to estimate positions and orientations of a moving subject only from a sequence of IMU sensor measurements (See Fig. 1).

Inertial navigation has been a dream technology for academic researchers and industrial engineers, as IMUs

1) are energy-efficient, capable of running 24 hours a day;
2) work anywhere, even inside pockets; and
3) are in every smartphone, which everyone carries every day, all the time.

Paragraph 3 (existing methods are subject to many constraints)

Most existing inertial navigation algorithms require unrealistic constraints that are incompatible with everyday smartphone usage scenarios.

For example, an IMU must be attached to a foot to enable the zero speed update heuristic (i.e., a device speed becomes 0 every time a foot touches the ground) [11].

Step counting methods assume that the IMU is rigidly attached to a body and a subject must walk forward so that the motion direction becomes a constant in device coordinate frame [3].

Paragraph 4 (prior efforts to relax these constraints)

Data-driven approaches have recently made a breakthrough in loosening these constraints [22, 5], where the acquisition of IMU sensor data and ground-truth motion trajectories allows supervised learning of direct motion parameters (e.g., a velocity vector from IMU sensor history).

Paragraph 5 (contributions of this paper)

This paper seeks to take data-driven inertial navigation research to the next level via the following three contributions.

- The paper provides the largest inertial navigation database consisting of more than 42.7 hours of IMU and ground-truth 3D motion data from 100 human subjects. Our data acquisition protocol allows users to handle smartphones naturally as in real day-to-day activities.
- The paper presents novel neural architectures for inertial navigation, making significant improvements for challenging motion cases over the existing best method.
- The paper presents extensive qualitative and quantitative evaluations of existing baselines and state-of-the-art methods on the three benchmarks.

Paragraph 6 (the authors will open-source the code and share the dataset)

We will share the code and data to promote further research in the hope of establishing an ultimate anytime, anywhere navigation system for everyone's smartphone.
開源代碼
數(shù)據(jù)集
官方網(wǎng)站

2. Related Work

We group inertial navigation algorithms into three categories based on their use of priors.

2.1 Physics-based (no priors)

Traditional inertial integration faces many limitations

IMU double integration is a simple idea for inertial navigation.

Given the device orientation (e.g., via Kalman filter [12] on IMU signals), one subtracts the gravity from the device acceleration, integrates the residual accelerations once to get velocities, and integrates once more to get positions.

Unfortunately, sensor biases explode quickly in the double integration process, and these systems do not work in practice without additional constraints.
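To see why the biases "explode", here is a minimal one-axis sketch of naive double integration. It assumes the acceleration is already in the world frame with gravity subtracted, a 200 Hz sample rate, and a hypothetical constant accelerometer bias of 0.05 m/s² on a device that is not moving at all. The position error grows quadratically, roughly as 0.5 · b · t².

```python
def double_integrate(acc, dt):
    """Integrate acceleration samples twice with the rectangle (Euler) rule.

    acc: gravity-compensated accelerations along one world axis [m/s^2]
    Returns (velocities, positions), one entry per sample.
    """
    v, p = 0.0, 0.0
    vs, ps = [], []
    for a in acc:
        v += a * dt   # first integration: acceleration -> velocity
        p += v * dt   # second integration: velocity -> position
        vs.append(v)
        ps.append(p)
    return vs, ps


dt = 1 / 200          # assumed 200 Hz sample rate
bias = 0.05           # hypothetical constant accelerometer bias [m/s^2]
n = 60 * 200          # one minute of a stationary device
_, ps = double_integrate([bias] * n, dt)
print(f"drift after 60 s: {ps[-1]:.1f} m")  # about 90 m, i.e. 0.5 * 0.05 * 60**2
```

A bias too small to notice in a single reading becomes about 90 m of position error within one minute. This is why pure double integration needs extra constraints.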

A foot-mounted IMU with zero speed update is probably the most successful example, where the sensor bias can be corrected subject to a constraint that the velocity must become zero whenever a foot touches the ground.
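Below is a deliberately simplified sketch of the zero-velocity-update idea: the integrated velocity is reset to zero whenever a stance flag is set. The stance detector, the 1 s gait cycle, and the 0.05 m/s² bias are all assumptions. Practical foot-mounted systems usually feed the zero-velocity pseudo-measurement into a Kalman filter so that the biases themselves are also estimated; this hard reset only illustrates why the drift stays bounded.

```python
def integrate_with_zupt(acc, stance, dt):
    """Euler double integration with a zero-velocity reset during stance phases.

    acc:    gravity-compensated accelerations along one world axis [m/s^2]
    stance: booleans, True while the foot is detected to be on the ground
    Returns the final position [m].
    """
    v, p = 0.0, 0.0
    for a, on_ground in zip(acc, stance):
        v += a * dt
        if on_ground:
            v = 0.0   # zero-velocity update: the foot is planted, so speed is 0
        p += v * dt
    return p


dt = 1 / 200                                   # assumed 200 Hz sample rate
n = 60 * 200                                   # one minute of samples
acc = [0.05] * n                               # hypothetical constant bias, no real motion
stance = [(k % 200) < 100 for k in range(n)]   # assumed 1 s gait cycle, half in stance
with_zupt = integrate_with_zupt(acc, stance, dt)
without_zupt = integrate_with_zupt(acc, [False] * n, dt)
print(f"with ZUPT: {with_zupt:.2f} m, without: {without_zupt:.1f} m")
```

With the reset, the error can only build up during each short swing phase. In this toy setting that keeps it well under a meter, instead of growing quadratically over the whole minute.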

2.2Heuristic priors: 基于一定的先驗知識

在一定的限制下可以達到很好的效果，但是這和魯棒性矛盾

Human motions are highly repetitive.
人類的動作是高度重復的。

Most existing inertial navigation research seeks to find heuristics exploiting such motion regularities.
現(xiàn)有的大多數(shù)慣性導航研究試圖利用這種運動規(guī)律尋找啟發(fā)式。

Step counting is a popular approach assuming that

An IMU is rigidly attached to a body;

The motion direction is fixedwith respect to the IMU; and

The distance of travel is proportional to the number of foot-steps.
計算步數(shù)是一種流行的方法

IMU剛性附著在主體上;
2)運動方向相對于IMU固定;和
3）行走的距離與行走的步數(shù)成正比。

The method produces impressive results in a controlled environment where these assumptions are assured.
在這些假設(shè)得到保證的受控環(huán)境中，該方法產(chǎn)生了令人印象深刻的結(jié)果。
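The three step-counting assumptions can be sketched in a few lines. Steps are counted as upward threshold crossings of the acceleration magnitude, the travelled distance is steps × step length (assumption 3), and the direction is a single fixed heading (assumption 2). The threshold, the refractory gap, the 0.7 m step length, and the synthetic 2 steps/s signal are illustrative assumptions, not values from any cited method.

```python
import math


def count_steps(acc_norm, threshold=10.5, min_gap=40):
    """Count steps as upward crossings of a threshold on |acc| [m/s^2].

    min_gap: minimum number of samples between two steps (refractory period).
    """
    steps, last = 0, -min_gap
    for k in range(1, len(acc_norm)):
        rising = acc_norm[k - 1] < threshold <= acc_norm[k]
        if rising and k - last >= min_gap:
            steps += 1
            last = k
    return steps


def step_counting_displacement(steps, step_length, heading_rad):
    """Distance proportional to step count, along one fixed heading."""
    d = steps * step_length
    return d * math.cos(heading_rad), d * math.sin(heading_rad)


# Synthetic walk: |acc| oscillates around gravity at 2 steps per second (200 Hz).
fs = 200
acc_norm = [9.81 + 2.0 * math.sin(2 * math.pi * 2.0 * k / fs) for k in range(10 * fs)]
steps = count_steps(acc_norm)
dx, dy = step_counting_displacement(steps, step_length=0.7, heading_rad=0.0)
print(steps, dx, dy)
```

Each assumption appears as a hard-coded constant. That is why the approach breaks down as soon as the phone is carried in a way that violates any of them.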

More sophisticated approaches utilize principal component analysis [10] or frequency domain analysis [13] to infer motion directions.
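As a rough illustration of the PCA idea, the sketch below takes horizontal acceleration samples, assumed to be already expressed in a horizontal reference frame, and returns the angle of the dominant eigenvector of their 2×2 covariance. That angle is the axis along which the body sways most while walking. This is a generic PCA sketch, not the specific method of [10]; note that the forward/backward sign stays ambiguous.

```python
import math


def pca_motion_direction(ax, ay):
    """Angle [rad] of the principal axis of horizontal accelerations.

    Returns the orientation of the major eigenvector of the 2x2 covariance
    matrix, in (-pi/2, pi/2]; the forward/backward sign is ambiguous.
    """
    n = len(ax)
    mx, my = sum(ax) / n, sum(ay) / n
    cxx = sum((x - mx) ** 2 for x in ax) / n
    cyy = sum((y - my) ** 2 for y in ay) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(ax, ay)) / n
    # Closed-form orientation of the major eigenvector of [[cxx, cxy], [cxy, cyy]].
    return 0.5 * math.atan2(2 * cxy, cxx - cyy)


# Synthetic walk along 30 degrees: strong sway along the path, weak sway sideways.
true = math.radians(30)
ax, ay = [], []
for k in range(400):
    s = math.sin(2 * math.pi * k / 100)             # forward/backward component
    lateral = 0.1 * math.cos(2 * math.pi * k / 50)  # weaker sideways component
    ax.append(s * math.cos(true) - lateral * math.sin(true))
    ay.append(s * math.sin(true) + lateral * math.cos(true))
estimate_deg = math.degrees(pca_motion_direction(ax, ay))
print(estimate_deg)
```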

However, these heuristic-based approaches do not match up with the robustness of emerging data-driven methods [22].

2.3 Data-driven priors

Traditional mechanization methods

Robust IMU double integration (RIDI) was the first data-driven inertial navigation method [22].

RIDI focuses on regressing velocity vectors in a device coordinate frame, while relying on traditional sensor fusion methods to estimate device orientations.

RIDI works for complex motion cases such as backward-walking, significantly expanding the operating ranges of the inertial navigation system.

IONet is a neural network based approach, which regresses the velocity magnitude and the rate of motion-heading change without relying on external device orientation information [4].
(In other words, it can rely only on sensor data for the acceleration and)
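Because IONet outputs a speed and a heading-change rate rather than a velocity vector in some external frame, the trajectory is recovered by integrating in polar form: first update the heading, then step along it. The sketch below shows only that integration step. The per-step speeds and heading rates would come from the network; here they are hypothetical constants, and IONet's actual windowing is not reproduced.

```python
import math


def integrate_polar(speeds, heading_rates, dt, heading0=0.0):
    """Dead-reckon a 2D track from per-step speed [m/s] and heading rate [rad/s]."""
    x, y, heading = 0.0, 0.0, heading0
    track = [(x, y)]
    for v, w in zip(speeds, heading_rates):
        heading += w * dt                 # integrate heading rate -> heading
        x += v * math.cos(heading) * dt   # then move along the current heading
        y += v * math.sin(heading) * dt
        track.append((x, y))
    return track


# Walk at 1 m/s while turning at pi/10 rad/s for 20 s: one full circle.
dt = 0.1
n = 200
track = integrate_polar([1.0] * n, [math.pi / 10] * n, dt)
print(track[-1])  # close to the start point (0, 0)
```

No device orientation enters this computation, which matches the point that IONet does not rely on external orientation information. The price is that heading errors accumulate through the integral.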

2.4 Inertial navigation datasets

RIDI dataset

RIDI dataset utilized a phone with 3D tracking capability (Lenovo Phab Pro 2) to collect IMU-motion data under four different phone placements (i.e., a hand, a bag, a leg pocket, and a body).

The Visual Inertial SLAM produced the ground-truth motion data.

The data was collected by 10 human subjects, totalling 2.5 hours.

IONet dataset, namely OXIOD

IONet dataset, namely OXIOD, used a high precision motion capture system (Vicon) under four different phone placements (i.e., a hand, a bag, a pocket, and a trolley) [5].
The data was collected by five human subjects, totalling 14.7 hours.

Shortcomings of existing datasets

The common issue in these datasets is the reliance on a single device for both IMU data and the ground-truth motion acquisition.

The phone must have a clean line-of-sight for Visual Inertial SLAM or must be clearly visible for the Vicon system all the time, prohibiting natural phone handling, especially in the bag and leg-pocket scenarios.

What the authors changed

This paper presents a new IMU-motion data acquisition protocol that utilizes two smartphones to overcome these issues.

3. The RoNIN dataset

Scale, diversity and fidelity are the three key factors in building a next-generation inertial navigation database.

In comparison to the current largest database OXIOD [5], our dataset boasts

(The authors describe their own dataset here; not needed for now, so skipped.)

4. Robust Neural Inertial Navigation (RoNIN)

Our neural architecture for inertial navigation, dubbed Robust Neural Inertial Navigation (RoNIN), takes ResNet [8], Long Short-Term Memory network (LSTM) [9], or Temporal Convolutional Network (TCN) [2] as its backbone.

RoNIN seeks to regress a velocity vector given an IMU sensor history with two key design principles:

1) coordinate frame normalization defining the input and output feature space, and
2) robust velocity losses improving the signal-to-noise ratio even with noisy regression targets.

We now explain the coordinate frame normalization strategy, three backbone neural architectures, and the robust velocity losses. Lastly, the section presents our neural architecture for the body heading regression.

4.1. Coordinate frame normalization

Paragraph 1 (the body frame and the navigation frame differ, so the choice of coordinate frame matters)

Feature representations, in our case the choice of coordinate frames, have significant impacts on the training.

IMU sensor measurements come from moving device coordinate frames, while ground-truth motion trajectories come from a global coordinate frame.
(The sensor data are expressed in the body frame (b-frame), while navigation uses the navigation frame (n-frame).)

RoNIN uses a heading-agnostic coordinate frame to represent both the input IMU and the output velocity data.

Paragraph 2 (the body frame keeps changing, so it cannot serve as the reference frame)

Suppose we pick the local device coordinate frame to encode our data.

The device coordinate frame changes every frame, resulting in inconsistent motion representation.

For example, target velocities would change depending on how one holds a phone even for exactly the same motions.

Paragraph 3 (alignment using gravity, and its flaw)

RIDI [22] proposed the stabilized IMU coordinate frame, which is obtained from the device coordinate frame by aligning its Y-axis with the negated gravity direction.
(Aligning with gravity gives a relatively fixed coordinate frame.)

However, this alignment process has a singularity (ambiguity) when the Y-axis points towards the gravity (e.g., a phone is inside a leg pocket upside-down), making the regression task harder and often causing it to fail completely due to the randomness.
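The singularity can be made concrete with a sketch. Assume, since the paper excerpt does not say how RIDI builds the rotation, that the stabilized frame comes from the shortest-arc rotation taking the measured "up" direction (negated gravity, in device coordinates) onto the device +Y axis. The rotation axis is then the cross product of the two vectors. It vanishes when the phone is upside-down, and near that pose two almost identical inputs give orthogonal axes.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def stabilizing_rotation(up_in_device, eps=1e-12):
    """Shortest-arc rotation taking the device-frame 'up' direction onto device +Y.

    up_in_device: negated gravity direction measured in the device frame
    Returns (unit_axis, angle_rad). Assumed construction, for illustration only.
    """
    norm = math.sqrt(sum(v * v for v in up_in_device))
    u = tuple(v / norm for v in up_in_device)
    axis = cross(u, (0.0, 1.0, 0.0))
    s = math.sqrt(sum(v * v for v in axis))  # sin(angle)
    c = u[1]                                 # cos(angle) = u . e_y
    if s < eps:
        if c > 0:
            return (1.0, 0.0, 0.0), 0.0      # already aligned: identity rotation
        raise ValueError("Y axis points along gravity: rotation axis is undefined")
    return tuple(a / s for a in axis), math.atan2(s, c)


# Two almost identical upside-down poses (phone head-down in a leg pocket).
axis_a, ang_a = stabilizing_rotation((1e-6, -1.0, 0.0))
axis_b, ang_b = stabilizing_rotation((0.0, -1.0, 1e-6))
print(axis_a, axis_b)  # roughly (0, 0, 1) versus (-1, 0, 0): orthogonal axes
```

A micro-radian change in how the phone sits flips the stabilized frame in a completely different direction. From the network's point of view, the targets in that regime look random.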

Paragraph 4 (aligning to one fixed frame is problematic, so the design accepts any gravity-aligned frame)

RoNIN uses a heading-agnostic coordinate frame (HACF), that is, any coordinate frame whose Z axis is aligned with gravity. In other words, we can pick any such coordinate frame as long as we keep it consistent throughout the sequence.

The coordinate transformation into HACF does not suffer from singularities or discontinuities with a proper rotation representation, e.g., with quaternions.

Paragraph 5 (random gravity-aligned frames during training, a fixed frame at test time)

During training, we use a random HACF at each step, which is defined by randomly rotating ground-truth trajectories on the horizontal plane.

IMU data is transformed into the same HACF by the device orientation and the same horizontal rotation. The use of device orientations effectively incorporates sensor fusion into our data-driven system.

At test time, we use the coordinate frame defined by system device orientations from Android or iOS, whose Z axis is aligned with gravity.
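The training-time augmentation amounts to one random rotation about the gravity axis, shared by the IMU window and its velocity targets. The sketch below assumes the IMU vectors have already been brought into a gravity-aligned frame (for example as in the quaternion step of Section 4.1). It draws a single angle per window so the frame stays consistent across the sequence. The function names and the (gyro, acc) tuple layout are hypothetical.

```python
import math
import random


def rotate_z(v, theta):
    """Rotate a 3-vector about the gravity (Z) axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)


def random_hacf(imu_seq, vel_seq, rng):
    """Apply one random horizontal rotation to a whole training window.

    imu_seq: list of (gyro, acc) 3-vector pairs in a gravity-aligned frame
    vel_seq: list of 2D ground-truth velocities (vx, vy) in the same frame
    The same angle is used for every frame, so the new frame is a valid HACF.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    imu_out = [(rotate_z(g, theta), rotate_z(a, theta)) for g, a in imu_seq]
    vel_out = [rotate_z((vx, vy, 0.0), theta)[:2] for vx, vy in vel_seq]
    return imu_out, vel_out, theta


rng = random.Random(0)
imu = [((0.0, 0.0, 0.1), (0.3, 0.4, 9.81))] * 3
vel = [(1.0, 0.0)] * 3
imu_r, vel_r, theta = random_hacf(imu, vel, rng)
print(theta, vel_r[0])
```

Since a Z rotation leaves vertical components and horizontal magnitudes unchanged, the network sees the same physical motion under every possible heading. This is what makes the learned mapping heading-agnostic.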

4.2. Backbone architectures (backbone networks)

4.2.1 Sentence-by-sentence reading

We present three RoNIN variants based on ResNet [8], LSTM [9] or TCN [2].

RoNIN ResNet

RoNIN ResNet: We take the 1D version of the standard ResNet-18 architecture [8] and add one fully connected layer with 512 units at the end to regress a 2D vector.

At frame i, the network takes IMU data from frame i - 200 to i as a 200×6 tensor and produces a velocity vector at frame i. At test time, we make predictions every five frames and integrate them to estimate motion trajectories.
(Roughly: the data are split with a sliding window, and each window regresses one velocity.)
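The test-time procedure can be sketched without the network itself. Every fifth frame, take the last 200 six-channel frames, ask a model for a 2D velocity, then integrate those velocities over the 5-frame interval. We assume a 200 Hz IMU rate (consistent with "200 frames (i.e., one second)" in Section 4.3) and the slice `imu[i-200:i]` for "frames i - 200 to i". `dummy_model` is a hypothetical stand-in for RoNIN ResNet.

```python
def sliding_windows(imu, window=200, stride=5):
    """Yield (i, imu[i - window:i]) for every stride-th frame i with a full window."""
    for i in range(window, len(imu) + 1, stride):
        yield i, imu[i - window:i]


def integrate_predictions(velocities, stride=5, fs=200.0):
    """Integrate 2D velocities [m/s] predicted every `stride` frames into positions."""
    x, y = 0.0, 0.0
    traj = [(x, y)]
    dt = stride / fs   # time between two consecutive predictions [s]
    for vx, vy in velocities:
        x += vx * dt
        y += vy * dt
        traj.append((x, y))
    return traj


def dummy_model(window):
    """Hypothetical stand-in for the network: always predicts 1 m/s along x."""
    assert len(window) == 200 and len(window[0]) == 6   # a 200x6 input
    return (1.0, 0.0)


fs = 200
imu = [(0.0,) * 6 for _ in range(10 * fs)]   # 10 s of placeholder IMU frames
preds = [dummy_model(w) for _, w in sliding_windows(imu)]
traj = integrate_predictions(preds)
print(len(preds), traj[-1])
```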

RoNIN LSTM

RoNIN LSTM: We use a stacked unidirectional LSTM while enriching its input feature by concatenating the output of a bilinear layer [20].

RoNIN-LSTM has three layers each with 100 units and regresses a 2D vector for each frame, to which we add an additional integration layer to calculate the loss.

See Sect. 4.3 for the details of the integration layer.

RoNIN TCN

RoNIN TCN: TCN is a recently proposed CNN architecture, which approximates many-to-many recurrent architectures with dilated causal convolutions.

RoNIN TCN has six residual blocks with 16, 32, 64, 128, 72, and 36 channels, respectively, where a convolutional kernel of size 3 leads to the receptive field of 253 frames.
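The 253-frame figure can be checked with a little arithmetic, under an assumption the excerpt does not state: each residual block follows the generic TCN of [2], with two dilated causal convolutions sharing dilation 2^b in block b = 0, ..., 5. Each convolution with kernel size k and dilation d widens the receptive field by (k - 1) · d, so the total is 1 + 2 · (3 - 1) · (1 + 2 + 4 + 8 + 16 + 32) = 1 + 4 · 63 = 253.

```python
def tcn_receptive_field(num_blocks, kernel_size, convs_per_block=2):
    """Receptive field (in frames) of a TCN whose block b uses dilation 2**b.

    Assumes every block stacks `convs_per_block` causal convolutions with the
    same dilation, each widening the field by (kernel_size - 1) * dilation.
    """
    dilation_sum = sum(2 ** b for b in range(num_blocks))
    return 1 + convs_per_block * (kernel_size - 1) * dilation_sum


print(tcn_receptive_field(num_blocks=6, kernel_size=3))  # 253
```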

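4.2.2 Summary

All three backbones work on a window of IMU history: RoNIN ResNet predicts one velocity per window, while RoNIN LSTM and TCN regress a 2D vector for every frame in the window.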

4.3. Robust velocity loss

Defining a velocity for each IMU frame amounts to computing the derivative of low-frequency VI-SLAM poses at much higher frame rate. This makes the ground-truth velocity very noisy.
(That is, VI-SLAM, which provides the ground truth, outputs at a relatively low rate, while the velocities used in the loss are at a high rate; the paper calls this noise. My understanding is that the situation resembles knowledge distillation: VI-SLAM effectively outputs an average over a period of time, not the true velocity at every instant, so forcing the velocity at every instant to match that average during training is unreasonable.)

We propose two robust velocity losses that increase the signal-to-noise ratio for better motion learning.

4.3.1 Latent velocity loss

Latent velocity loss

Latent velocity loss: RoNIN LSTM/TCN regresses a sequence of two-dimensional vectors over time.

We add an integration layer that sums up the vectors (over 400/253 frames for LSTM/TCN), then define an L2 norm against the ground-truth positional difference over the same frame window.

Note that this loss simply enforces that the sum of per-frame 2D vectors must match the position difference.

To our surprise, both LSTM and TCN learn to regress a velocity in this latent layer before the integration, and hence, we name it the latent velocity loss.
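Here is a minimal sketch of the latent velocity loss for one window, following the text literally: the "integration layer" is a plain sum of the per-frame 2D vectors, compared with the ground-truth displacement by an L2 norm. Because only the sum is constrained, a jittery per-frame sequence with the right total costs nothing. Each summed vector is effectively a per-frame displacement; reading it as a velocity in m/s would need a 1/dt scale, which is our interpretation. Batching and averaging over windows are omitted.

```python
import math


def latent_velocity_loss(pred_vectors, positions):
    """L2 distance between the summed per-frame vectors and the GT displacement.

    pred_vectors: per-frame 2D network outputs over one window (length W)
    positions:    ground-truth 2D positions for the same window (length W + 1),
                  so positions[-1] - positions[0] is the displacement to match
    """
    sx = sum(v[0] for v in pred_vectors)   # the "integration layer": a plain sum
    sy = sum(v[1] for v in pred_vectors)
    dx = positions[-1][0] - positions[0][0]
    dy = positions[-1][1] - positions[0][1]
    return math.hypot(sx - dx, sy - dy)


W = 200
noisy = [(0.015, 0.0) if k % 2 == 0 else (-0.005, 0.0) for k in range(W)]
gt = [(k / W, 0.0) for k in range(W + 1)]   # 1 m straight-line displacement
print(latent_velocity_loss(noisy, gt))      # ~0: only the sum is constrained
```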

Strided velocity loss

Strided velocity loss: For RoNIN ResNet, the network learns to predict positional difference over a stride of 200 frames (i.e., one second), instead of instantaneous velocities.

More specifically, we compute MSE loss between the 2D network output at frame $i$ and $P_i - P_{i-200}$, where $P_i$ is the ground truth position at frame $i$ in the global frame.
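Below is a direct sketch of the strided target $P_i - P_{i-200}$ and the MSE against it. The MSE is averaged over both coordinates and over the sampled frames; this reduction, the sampled frame indices, and the 1.5 m/s synthetic walk are assumptions for illustration.

```python
def strided_velocity_loss(outputs, positions, frames, stride=200):
    """MSE between the 2D output at frame i and the GT target P_i - P_{i - stride}.

    outputs:   2D network outputs, one per entry of `frames`
    positions: ground-truth 2D positions P_k in the global frame
    frames:    frame indices i (each must satisfy i >= stride)
    """
    total = 0.0
    for out, i in zip(outputs, frames):
        tx = positions[i][0] - positions[i - stride][0]
        ty = positions[i][1] - positions[i - stride][1]
        total += (out[0] - tx) ** 2 + (out[1] - ty) ** 2
    return total / (2 * len(frames))   # mean over frames and both coordinates


fs = 200
positions = [(1.5 * k / fs, 0.0) for k in range(3 * fs)]   # walking along x at 1.5 m/s
frames = list(range(200, 600, 50))
perfect = [(1.5, 0.0)] * len(frames)   # 1.5 m covered in the last 200 frames (1 s)
slow = [(1.0, 0.0)] * len(frames)
print(strided_velocity_loss(perfect, positions, frames),
      strided_velocity_loss(slow, positions, frames))
```

With a one-second stride at 200 Hz, the target is numerically the average velocity over that second. Averaging is what suppresses the frame-level noise described above.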

4.4. RoNIN body heading network

4.4.1 Sentence-by-sentence reading

Different from the position regression, the task of heading regression becomes inherently ambiguous when a subject is stationary.
(For velocity, a stationary subject is easy: just regress 0. For heading, any value regressed while stationary is questionable.)

Suppose one is sitting still at a chair for 30 seconds.

We need to access the IMU sensor data 30 seconds back in time to estimate the body heading, as IMU data have almost zero information after the sitting event.

Therefore, we borrow the RoNIN LSTM architecture for the task, which is capable of keeping a long memory.

More precisely, we take the RoNIN LSTM architecture without the integration layer, and let the network predict a 2D vector (x, y), which are sin and cos values of the body heading angle at each frame.

During training, we unroll the network over 1,000 steps for back-propagation.

Note that the initial state is ambiguous if the subject is stationary; therefore, we only update network parameters when the first frame of the unrolled sequence has a velocity magnitude greater than 0.1 m/s.
(Because the heading is considered untrustworthy while stationary, no parameter update is made for an unrolled sequence whose first frame is judged stationary.)

We use MSE loss against sin and cos values of ground-truth body heading angles. We also add a normalization loss $\lVert 1 - x^2 - y^2 \rVert$ to guide the network to predict valid trigonometric values.
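The pieces of the heading loss fit together as sketched below for one unrolled sequence. There is an MSE against (sin, cos) of the ground-truth heading, plus the normalization term |1 - x² - y²|. The whole sequence is skipped when the first frame's speed is not above 0.1 m/s. The relative weight `norm_weight`, the averaging, and returning `None` to signal "no update" are hypothetical choices; the heading can be decoded with `atan2(x, y)` because x is the sine and y the cosine.

```python
import math


def heading_loss(pred, heading, speed_first_frame, min_speed=0.1, norm_weight=1.0):
    """Loss for one unrolled sequence of body-heading predictions.

    pred:    list of (x, y) network outputs; x ~ sin(heading), y ~ cos(heading)
    heading: ground-truth body heading angles [rad] for the same frames
    speed_first_frame: GT speed at the first frame of the sequence [m/s]
    Returns None when the sequence should be skipped (no parameter update).
    """
    if speed_first_frame <= min_speed:
        return None   # ambiguous initial heading: skip this sequence
    n = len(pred)
    mse = sum((x - math.sin(h)) ** 2 + (y - math.cos(h)) ** 2
              for (x, y), h in zip(pred, heading)) / (2 * n)
    norm = sum(abs(1.0 - x * x - y * y) for x, y in pred) / n   # valid sin/cos pair
    return mse + norm_weight * norm


hs = [0.0, 0.5, 1.0]
exact = [(math.sin(h), math.cos(h)) for h in hs]
print(heading_loss(exact, hs, speed_first_frame=1.0))    # ~0 for perfect predictions
print(heading_loss(exact, hs, speed_first_frame=0.05))   # None: subject was stationary
decoded = [math.atan2(x, y) for x, y in exact]           # back to heading angles
```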

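4.4.2 Summary

The authors solve two problems here:

1. While a person is stationary, the IMU provides essentially no information, so outputting the heading directly from the current data is unrealistic; the IMU data from a period before the person stopped are needed. The authors therefore use an LSTM network, which has a long memory.
2. While a person is stationary, the heading is hard to define. The authors therefore skip back-propagation for the heading network whenever the first frame of the unrolled sequence is nearly stationary (speed not above 0.1 m/s).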
