
Understanding Point Cloud Network Papers (Part 1): The Introduction of Point Cloud Networks, PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Published: 2025/4/5 | Programming Q&A | 19 | 豆豆

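This article was collected and compiled by 生活随笔. It covers "Understanding Point Cloud Network Papers (Part 1): The Introduction of Point Cloud Networks, PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation". We found it useful and are sharing it here for reference.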

1. Abstract

1.1 Sentence-by-sentence translation

Point cloud is an important type of geometric data structure.
Point clouds are an important kind of geometric data.
Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images.
Because the format is irregular, most researchers convert such data into regular 3D voxel grids or sets of images.
This, however, renders data unnecessarily voluminous and causes issues.
However, this conversion makes the data unnecessarily large and causes problems of its own.
In this paper, we design a novel type of neural network that directly consumes point clouds,
The paper designs a new type of neural network that takes point clouds directly as input,
which well respects the permutation invariance of points in the input.
and that properly respects the permutation invariance of the input points.
Our network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.
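The network is called PointNet. It provides one unified architecture for applications ranging from object classification and part segmentation to scene semantic parsing.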
Though simple, PointNet is highly efficient and effective.
Although it is simple, PointNet is both highly efficient and effective.
Empirically, it shows strong performance on par or even better than state of the art.
Empirically, it performs on par with the state of the art or better.
Theoretically, we provide analysis towards understanding of what the network has learnt and why the network is robust with respect to input perturbation and corruption.
On the theory side, the paper analyzes what the network learns and why it is robust to perturbed or corrupted input.

1.2 Summary

As we all know, the abstract is the most important part of a paper, so it should be read first. Roughly, it makes the following points:

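1. Point clouds are an important data structure, so they are worth studying.
2. Point clouds have a characteristic of their own: an irregular format. Earlier researchers tried to convert them into volumetric models, but that conversion throws away exactly these intrinsic properties (the "natural invariances").
My reading is that this resembles early machine learning, where you first extract features and then analyze them to get a result. One problem with that pipeline is that the quality of the intermediate features is hard to evaluate. It is better to hand both extraction and analysis to a single deep learning model.
By analogy, the intermediate result here is easy to evaluate: just check whether it looks like the 3D shape. But the conversion to a volumetric shape may lose information I did not notice. So this paper goes directly from the point cloud to the concrete applications.
(That was my impression on first reading. After finishing the paper I added the following.)
The "natural invariances" mainly mean that the data does not really change when, for example, the object is rotated or its points are listed in a different order. Invariance to reordering is what led the paper to use a symmetric function, and this works very well in the experiments.
3. The conversion also adds a lot of unnecessary bulk ("voluminous").
I see the first problem here as the curse of dimensionality. Converting to 3D voxels makes the structure more complex, so the model needs more parameters to process the input. That means more tuning and more training data, which we want to avoid.
4. So the paper proposes a model that goes from the point cloud to the result in a single step.
I did not understand this on first reading. After reading the whole paper, I see that it means the following: earlier models need an extra structure-extraction step, while this one reaches the final result directly from the point cloud.
5. The paper also gives some explanation of what the model learns.
This reflects a recent trend in deep learning toward interpretability, since everyone wants to explain the models they design.
6. The paper also deals with robustness.
Everyone has to consider robustness. What we should take from this paper is how it handles these general robustness issues. I think anything meant for real-world use should be tested for robustness, so these tests are necessary even though they are not a novelty.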

2. Introduction

2.1 Sentence-by-sentence translation

Original text
Paragraph 1
In this paper we explore deep learning architectures capable of reasoning about 3D geometric data such as point clouds or meshes.
The paper explores deep learning architectures that can reason about 3D geometric data such as point clouds or meshes.
Typical convolutional architectures require highly regular input data formats, like those of image grids or 3D voxels, in order to perform weight sharing and other kernel optimizations.
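Typical convolutional architectures need highly regular input formats, such as image grids or 3D voxels, so that they can share weights and apply other kernel optimizations.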
Since point clouds or meshes are not in a regular format, most researchers typically transform such data to regular 3D voxel grids or collections of images (e.g, views) before feeding them to a deep net architecture.
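Point clouds and meshes are not in a regular format. For that reason, most researchers first convert them into regular 3D voxel grids or image collections (e.g., rendered views), and only then feed them to a deep network.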
This data representation transformation, however, renders the resulting data unnecessarily voluminous — while also introducing quantization artifacts that can obscure natural invariances of the data
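However, this change of representation makes the resulting data unnecessarily large. It also introduces quantization artifacts that can hide the natural invariances of the data.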

Paragraph 2
For this reason we focus on a different input representation for 3D geometry using simply point clouds– and name our resulting deep nets PointNets.
For this reason the paper focuses on a different input representation, the raw point cloud itself, and calls the resulting networks PointNets.
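Point clouds are simple and unified structures that avoid the combinatorial irregularities and complexities of meshes, and thus are easier to learn from.
Point clouds are simple, uniform structures. They avoid the combinatorial irregularity and complexity of meshes, which makes them easier to learn from.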
The PointNet, however, still has to respect the fact that a point cloud is just a set of points and therefore invariant to permutations of its members, necessitating certain symmetrizations in the net computation.
Still, PointNet must respect the fact that a point cloud is only a set of points. It is therefore invariant to permutations of its members, so the network's computation needs some symmetrization.
Further invariances to rigid motions also need to be considered.
Invariance to rigid motions must also be taken into account.

Paragraph 3
Our PointNet is a unified architecture that directly takes point clouds as input and outputs either class labels for the entire input or per point segment/part labels for each point of the input.
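PointNet is a unified architecture that takes a point cloud directly as input. It outputs either one class label for the whole input or a segment/part label for each input point.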
The basic architecture of our network is surprisingly simple as in the initial stages each point is processed identically and independently.
The basic architecture is surprisingly simple, because in the initial stages every point is processed identically and independently.
In the basic setting each point is represented by just its three coordinates (x, y, z). Additional dimensions may be added by computing normals and other local or global features.
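In the basic setting each point is represented only by its three coordinates (x, y, z). Extra dimensions can be added by computing normals and other local or global features.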
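The phrase "processed identically and independently" can be made concrete with a shared per-point MLP. The following is a minimal plain-Python sketch with no deep learning framework, and it rests on several assumptions. Random weights stand in for trained ones. The helper names `make_layer`, `dense_relu` and `shared_mlp` are hypothetical. The widths 64, 64 are borrowed from the first hidden sizes quoted in the Fig 5 caption. Each point's output feature depends only on that point's own (x, y, z).

```python
import random

def make_layer(in_dim, out_dim, rng):
    """Hypothetical helper: random weights (out_dim x in_dim) and biases, standing in for trained ones."""
    W = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [rng.uniform(-0.1, 0.1) for _ in range(out_dim)]
    return W, b

def dense_relu(x, layer):
    """One fully connected layer followed by ReLU, applied to a single vector."""
    W, b = layer
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]

def shared_mlp(points, layers):
    """Apply the SAME layers to every point on its own: (N, 3) -> (N, C)."""
    features = []
    for p in points:
        h = list(p)                      # only this point's (x, y, z) is used
        for layer in layers:
            h = dense_relu(h, layer)
        features.append(h)
    return features

rng = random.Random(0)
layers = [make_layer(3, 64, rng), make_layer(64, 64, rng)]             # toy widths 64, 64
cloud = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]  # N = 5 toy points
feats = shared_mlp(cloud, layers)
print(len(feats), len(feats[0]))   # 5 64
```

Every point reuses the same weights, so the parameter count does not grow with the number of points N.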

Paragraph 4
Key to our approach is the use of a single symmetric function, max pooling.
The key to the approach is a single symmetric function: max pooling.
Effectively the network learns a set of optimization functions/criteria that select interesting or informative points of the point cloud and encode the reason for their selection.
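In effect, the network learns a set of optimization functions/criteria. These criteria select interesting or informative points of the point cloud and encode why those points were selected.
(My understanding: we can train a function that picks out the points carrying the most information and encodes them, i.e., transforms them or attaches extra information to them.)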
The final fully connected layers of the network aggregate these learnt optimal values into the global descriptor for the entire shape as mentioned above (shape classification) or are used to predict per point labels (shape segmentation).
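The final fully connected layers aggregate these learned optimal values into a global descriptor of the whole shape (for shape classification), or they are used to predict per-point labels (for shape segmentation).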
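The classification path described above can be sketched end to end in toy form: a shared per-point layer, then channel-wise max pooling into a global descriptor, then a fully connected layer that produces k scores. This illustrates the idea under stated assumptions and is not the paper's network. A single random layer stands in for each learned MLP, and k = 4 is an arbitrary class count. Shuffling the input points leaves the scores unchanged because max pooling ignores order.

```python
import random

def rand_layer(in_dim, out_dim, rng):
    """Hypothetical helper: random (W, b) standing in for trained weights."""
    W = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return W, [0.0] * out_dim

def dense(x, W, b, relu=True):
    """Fully connected layer on one vector, optionally followed by ReLU."""
    y = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [max(0.0, v) for v in y] if relu else y

def tiny_pointnet_cls(points, point_layer, fc_layer):
    per_point = [dense(p, *point_layer) for p in points]   # shared h(x_i), shape (N, C)
    global_feat = [max(col) for col in zip(*per_point)]    # max pool over points, shape (C,)
    return dense(global_feat, *fc_layer, relu=False)       # k class scores

rng = random.Random(42)
point_layer = rand_layer(3, 16, rng)   # C = 16 per-point channels (toy size)
fc_layer = rand_layer(16, 4, rng)      # k = 4 hypothetical classes
cloud = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(10)]
shuffled = cloud[:]
rng.shuffle(shuffled)

scores = tiny_pointnet_cls(cloud, point_layer, fc_layer)
print(scores == tiny_pointnet_cls(shuffled, point_layer, fc_layer))   # True
```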

Paragraph 5
Our input format is easy to apply rigid or affine transformations to, as each point transforms independently.
Because each point is transformed on its own, rigid or affine transformations are easy to apply to this input format.

Thus we can add a data-dependent spatial transformer network that attempts to canonicalize the data before the PointNet processes them, so as to further improve the results.
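This makes it possible to add a data-dependent spatial transformer network. Before PointNet processes the data, this network tries to canonicalize it, which further improves the results.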
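Each point is transformed independently, so applying a predicted alignment matrix amounts to one matrix product per point. The sketch below rests on some assumptions. The 3×3 matrix is hard-coded as a rotation about z and stands in for a T-Net's output, whereas the real T-Net predicts the matrix from the data. The row-vector convention p · T is also an assumption. The example also checks that reordering the points and transforming them commute.

```python
import math

def apply_transform(points, T):
    """Row-vector convention (an assumption): each point p becomes p @ T, independently."""
    return [[sum(p[k] * T[k][j] for k in range(3)) for j in range(3)] for p in points]

theta = math.pi / 2
# Stand-in for a T-Net output: rotation by 90 degrees about the z axis.
T = [[math.cos(theta), math.sin(theta), 0.0],
     [-math.sin(theta), math.cos(theta), 0.0],
     [0.0, 0.0, 1.0]]

cloud = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
aligned = apply_transform(cloud, T)
# Transforming then reordering equals reordering then transforming.
print(apply_transform(cloud[::-1], T) == aligned[::-1])   # True
```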

Paragraph 6
We provide both a theoretical analysis and an experimental evaluation of our approach.
The paper provides both a theoretical analysis and an experimental evaluation of the approach.
We show that our network can approximate any set function that is continuous.
It shows that the network can approximate any continuous set function (a set function, because the input is a point set).
More interestingly, it turns out that our network learns to summarize an input point cloud by a sparse set of key points, which roughly corresponds to the skeleton of objects according to visualization.
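More interestingly, the network learns to summarize an input point cloud with a sparse set of key points. According to the visualizations, these key points roughly correspond to the skeleton of the object.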
The theoretical analysis provides an understanding why our PointNet is highly robust to small perturbation of input points as well as to corruption through point insertion (outliers) or deletion(missing data).
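On this basis, the theoretical analysis explains why PointNet is highly robust to small perturbations of the input points. It also explains the robustness to corruption from inserted points (outliers) or deleted points (missing data).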
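Max pooling explains where the "sparse set of key points" comes from. In each feature channel exactly one point supplies the maximum, so at most C points determine the global feature, where C is the number of channels. The plain-Python illustration below uses a single random ReLU layer as h, and its helper names are hypothetical. It shows that keeping only these critical points gives the same global feature. In the test, adding a duplicate of an existing point also leaves it unchanged.

```python
import random

def per_point_features(points, W):
    """Hypothetical shared layer h: ReLU(W p) for every point (bias omitted for brevity)."""
    return [[max(0.0, sum(w * x for w, x in zip(row, p))) for row in W] for p in points]

def global_feature(points, W):
    """Channel-wise max over all points."""
    return [max(col) for col in zip(*per_point_features(points, W))]

def critical_indices(points, W):
    """Indices of points that supply the maximum in at least one channel."""
    feats = per_point_features(points, W)
    winners = set()
    for c in range(len(W)):
        winners.add(max(range(len(points)), key=lambda i: feats[i][c]))
    return sorted(winners)

rng = random.Random(1)
W = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]          # C = 8 channels
cloud = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(50)]  # N = 50 points

crit = critical_indices(cloud, W)
subset = [cloud[i] for i in crit]
print(len(crit) <= len(W))                                     # True: at most C critical points
print(global_feature(subset, W) == global_feature(cloud, W))   # True: the rest can be dropped
```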

Paragraph 7
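On a number of benchmark datasets ranging from shape classification, part segmentation to scene segmentation, we experimentally compare our PointNet with state-of-the-art approaches based upon multi-view and volumetric representations.
The benchmark datasets cover shape classification, part segmentation and scene segmentation. On them, PointNet is compared experimentally with state-of-the-art methods based on multi-view and volumetric representations.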
Under a unified architecture, not only is our PointNet much faster in speed, but it also exhibits strong performance on par or even better than state of the art.
With a single unified architecture, PointNet is much faster and also performs on par with the state of the art or better.

Paragraph 8: the contributions the paper claims
The key contributions of our work are as follows:
The main contributions are:
• We design a novel deep net architecture suitable for consuming unordered point sets in 3D;
A new deep network architecture for consuming unordered 3D point sets;
• We show how such a net can be trained to perform 3D shape classification, shape part segmentation and scene semantic parsing tasks;
How to train such a network for 3D shape classification, shape part segmentation and scene semantic parsing;
• We provide thorough empirical and theoretical analysis on the stability and efficiency of our method;
A thorough empirical and theoretical analysis of the method's stability and efficiency;
• We illustrate the 3D features computed by the selected neurons in the net and develop intuitive explanations for its performance.
Visualizations of the 3D features computed by selected neurons in the network, with intuitive explanations of its performance.

Paragraph 9
The problem of processing unordered sets by neural nets is a very general and fundamental problem – we expect that our ideas can be transferred to other domains as well.
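Processing unordered sets with neural networks is a very general and fundamental problem, so the authors expect their ideas to transfer to other domains as well.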

Figure section

We propose a novel deep net architecture that consumes raw point cloud (set of points) without voxelization or rendering.
The paper proposes a new deep network architecture that consumes raw point clouds (point sets) without voxelization or rendering.
It is a unified architecture that learns both global and local point features, providing a simple, efficient and effective approach for a number of 3D recognition tasks.
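This unified architecture learns both global and local point features. It gives a simple, efficient and effective approach to many 3D recognition tasks.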

2.2 Summary

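First, to design a new network, the authors surveyed the characteristics of traditional network models and reached the following conclusion:
Convolutional networks are a poor fit for this task. The key to a convolutional network is reusing the kernel's parameters. A point cloud has no regular grid structure for a kernel to slide over, so convolution cannot be applied directly.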

The authors then restate the drawbacks of the existing approaches: 1. the data becomes voluminous; 2. the permutation invariance of the points is not respected.

Next comes PointNet itself, which has the following properties:

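1. It is simple and easy to learn. I think this is because it needs no hand-crafted processing specific to point clouds,
i.e., no technical conversion step.
2. Note also that a point cloud is only a set. Because of its permutation invariance, some symmetrization is needed in the computation.
This is really a property of point sets rather than of PointNet.
3. Further invariance to rigid motions must also be considered.
This is really a property of point sets rather than of PointNet.
4. PointNet can handle three common kinds of problems: 1) one label for the whole input (like one label for a whole image); 2) a label for every point (like one label per pixel); 3) a label for every part.
I think this covers most problems, for example the currently popular tasks of recognition and semantic segmentation.
5. PointNet's structure is simple, because at the start each point is processed on its own.
This confused me a little on first reading. I assumed some non-linear operation had to be involved, because a linear operation could just be done with an n*n linear layer, which would work better.
After reading the whole paper, I understood that "identically" really means identically. The early operations work on one point's coordinates at a time and do not involve any other point. My first reading was not accurate.
6. The basic input needs only three values (x, y, z), and the network fills in the higher-dimensional information itself. (The paper also allows extra input channels such as normals.)
On first reading I thought this resembled the way convolution increases the number of channels. When I implemented it later, it turned out to be roughly that, though other approaches would probably work too.
7. PointNet's early layers can pick out special points that are rich in information, and they encode these points according to why they were selected. Experiments show that these points tend to lie along the object's outline (the paper describes them as roughly the object's skeleton). This also explains theoretically why the network resists interference well.
My understanding is that this is feature extraction, like the shallow layers of common networks, only done here in an unusual way. I will see what comes later.
After reading the whole paper I still did not fully understand it, but trying it out does show this behavior.
8. Finally, fully connected layers output the result.
There is not much to say here. A classification task ends with fully connected layers sooner or later, and even an FCN cannot completely avoid them.
9. The network supports rigid and affine transformations, so some preprocessing can be done.
My first reading was that we could use affine transformations in advance to fix poorly collected data. Alternatively, this property could drive data augmentation, to cope with poor-quality data collected in real applications. Personally I thought data augmentation was the more important use.
When reproducing the code, I found I had misunderstood this. The paper actually trains a special transformation network (the T-Net) that helps the model deal with rotations and affine transformations.
10. The authors show, theoretically, that PointNet can approximate any continuous set function.
That is a big claim. I take it to mean that discrete data in general is worth approaching from this direction.
11. The network trains faster and is more accurate than the other networks.
I think it is fast mainly because the structure is simple and probably has relatively few parameters. It probably works well because, as the authors say, it does not lose the information hidden in the point cloud itself.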

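Finally, the paper notes that processing unordered sets with neural networks is a very general and fundamental problem, and it hopes other fields can borrow from this work. I think that is a fair statement.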

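One phrase in the figure caption also struck me as surprising: "learns both global and local point features". I put a question mark next to it. As far as I knew, capturing local and detailed information is one of the big open problems. Shallow layers usually capture local information better, and deep layers capture global information better. I guessed that a skip connection was used, and that turned out to be the case.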

3. Related Work

Skipped for now.

4. Problem Statement

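A lesson worth remembering: next time I read a paper in an unfamiliar field, I should go back over the earlier sections right after reading the problem statement.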

4.1 Sentence-by-sentence translation

Paragraph 1
We design a deep learning framework that directly consumes unordered point sets as inputs.
The authors design a deep learning framework that takes unordered point sets directly as input.

A point cloud is represented as a set of 3D points {Pi| i = 1, …, n}, where each point Pi is a vector of its (x, y, z) coordinate plus extra feature channels such as color, normal etc.
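A point cloud is a set of 3D points {Pi | i = 1, …, n}. Each point Pi is a vector holding its (x, y, z) coordinates, optionally with extra feature channels such as color or normal.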

For simplicity and clarity, unless otherwise noted, we only use the (x, y, z) coordinate as our point’s channels.
For simplicity and clarity, only the (x, y, z) coordinates are used as point channels unless stated otherwise.

Paragraph 2
For the object classification task, the input point cloud is either directly sampled from a shape or pre-segmented from a scene point cloud.
For object classification, the input point cloud is either sampled directly from a shape or pre-segmented from a scene point cloud.
Our proposed deep network outputs k scores for all the k candidate classes.
The network outputs k scores, one for each of the k candidate classes.

For semantic segmentation, the input can be a single object for part region segmentation, or a sub-volume from a 3D scene for object region segmentation.
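For semantic segmentation, the input can be a single object (for part-region segmentation) or a sub-volume of a 3D scene (for object-region segmentation).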

Our model will output n × m scores for each of the n points and each of the m semantic sub-categories.
The model outputs n × m scores, one for each combination of the n points and the m semantic sub-categories.

4.2 Summary

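The experimental setting:
**1. Input points:** every point is a 3D coordinate (x, y, z). Other inputs could be added, but for simplicity only these are used.
**2. Input for object classification:** either a point cloud taken directly from a shape, or one segmented out of a scene beforehand (e.g., by object tracking or similar).
**3. Input for semantic segmentation:** either a whole object, or a small part of a whole scene that contains an object.
**4. Output for object classification:** k class scores. I take these scores to be softmax outputs, i.e., effectively the probability of each class.
**5. Output for semantic segmentation:** n × m scores, i.e., n points with m outputs each. Each point's output should be a softmax, i.e., the probability of each sub-category.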
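The output shapes above and the softmax reading of the scores can be sketched with toy numbers. Classification gives k scores for the whole cloud. Segmentation gives one row of m scores per point, and each row is normalized separately. The score values are made up. The softmax reading is consistent with the "softmax training loss" mentioned in Sec 5.5, but this code is an illustration and not the paper's.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

cls_scores = [2.0, 0.5, -1.0]                                     # k = 3 classes (made-up scores)
seg_scores = [[1.0, 0.0], [0.2, 0.3], [-1.0, 2.0], [0.0, 0.0]]   # n = 4 points x m = 2 parts

cls_prob = softmax(cls_scores)                     # one distribution for the whole cloud
seg_prob = [softmax(row) for row in seg_scores]    # one distribution per point
seg_label = [max(range(len(r)), key=r.__getitem__) for r in seg_prob]
print(seg_label)   # [0, 1, 1, 0]
```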

5.Deep Learning on Point Sets

This section really gets into the implementation. To understand it properly, I reproduced the model first and then read the section:

Figure section

I will not explain this figure here. My other blog post, "Reproducing PointNet in PyTorch", covers it more thoroughly.

5.0 Overview

The architecture of our network (Sec 4.2) is inspired by the properties of point sets in R^n (Sec 4.1).
The network architecture in Sec 4.2 is derived from the point-set properties described in Sec 4.1.

5.1.1 Translation of Sec 4.1: properties of point sets

**Overview**
Our input is a subset of points from an Euclidean space.
It has three main properties:
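The input is a set of points from a Euclidean space, and it has three main properties.
Property 1
• Unordered. Unlike pixel arrays in images or voxel arrays in volumetric grids, point cloud is a set of points without specific order. In other words, a network that consumes N 3D point sets needs to be invariant to N! permutations of the input set in data feeding order.
**Unordered.** A point cloud is a set of points with no particular order, unlike the pixel array of an image or the voxel array of a volumetric grid. So a network that takes N 3D points must give the same result for all N! orders in which those points can be fed in.
(My understanding: the input is a set and has no order of its own, but the data we feed in always arrives in some order.)
• Interaction among points. The points are from a space with a distance metric.
**Interaction among points.** The points come from a space with a distance metric.
It means that points are not isolated, and neighboring points form a meaningful subset.
So the points are not isolated, and neighboring points form meaningful subsets.
Therefore, the model needs to be able to capture local structures from nearby points, and the combinatorial interactions among local structures.
The model must therefore capture local structure from nearby points, along with the combinatorial interactions among local structures.
• Invariance under transformations. As a geometric object, the learned representation of the point set should be invariant to certain transformations.
**Invariance under transformations.** Because a point set is a geometric object, its learned representation should not change under certain transformations.
For example, rotating and translating points all together should not modify the global point cloud category nor the segmentation of the points.
For example, rotating and translating all the points together should change neither the category of the whole cloud nor the segmentation of its points.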
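A brute-force check on a tiny cloud makes the N! requirement of the first property concrete. A channel-wise max over per-point features gives the same output for all 4! = 24 orderings. Concatenating the per-point features, which is what an ordinary MLP would see, gives 24 different vectors. The per-point function `h` is a hand-written stand-in for a learned MLP.

```python
import itertools
import math

cloud = [(0.1, 0.2, 0.3), (0.5, -0.4, 0.0), (-0.7, 0.9, 0.2), (0.3, 0.3, -0.6)]

def h(p):
    """Hand-written per-point function standing in for a learned MLP."""
    x, y, z = p
    return (x + y, y * z, max(x, z))

def max_pool(points):
    """Symmetric: channel-wise max over the per-point features."""
    return tuple(max(col) for col in zip(*map(h, points)))

def concat(points):
    """Not symmetric: per-point features laid end to end in feeding order."""
    return tuple(v for p in points for v in h(p))

orders = list(itertools.permutations(cloud))
print(len(orders) == math.factorial(len(cloud)))   # True: 4! = 24 feeding orders
print(len({max_pool(o) for o in orders}))          # 1
print(len({concat(o) for o in orders}))            # 24
```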

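5.1.2 Summary of point set properties:

1. Unordered: a point set is a set of points and has no order of its own, but our input always arrives in some order.
2. Interaction among points: each point is influenced by the points around it, so a point needs to be able to combine information with its neighbors. I think this is roughly what convolution does. These points have no ordered structure, however, so convolution cannot be used.
3. Invariance under transformations: if the shape is rotated, flipped or subjected to certain affine transformations, its category should not change, and neither should the semantic segmentation of each point.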
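For the third property, the following sketch applies a rigid motion, a rotation about z plus a translation, with an arbitrary angle and offset. It checks that all pairwise distances, and therefore the shape, are preserved while the raw coordinates change. A network that reads raw coordinates therefore has to learn this invariance, or be helped toward it.

```python
import math

def rotate_z(p, theta):
    """Rotate one point about the z axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def rigid_motion(points, theta, offset):
    """Rotate every point, then translate every point by the same offset."""
    return [tuple(a + b for a, b in zip(rotate_z(p, theta), offset)) for p in points]

def pairwise_distances(points):
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
moved = rigid_motion(cloud, theta=0.7, offset=(5.0, -2.0, 1.0))

same_shape = all(math.isclose(a, b) for a, b in
                 zip(pairwise_distances(cloud), pairwise_distances(moved)))
print(same_shape)              # True: the shape is unchanged
print(moved[1] == cloud[1])    # False: the raw coordinates are not
```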

5.2 Translation of Sec 4.2: PointNet Architecture

Our full network architecture is visualized in Fig 2, where the classification network and the segmentation network share a great portion of structures.
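The full network architecture is shown in Fig 2, where the classification and segmentation networks share a large part of their structure. (This is the big figure referred to above.)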
Please read the caption of Fig 2 for the pipeline.
See the caption of Fig 2 for the processing pipeline.
Our network has three key modules: the max pooling layer as a symmetric function to aggregate information from all the points, a local and global information combination structure, and two joint alignment networks that align both input points and point features.
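The network has three key modules: a max pooling layer used as a symmetric function to aggregate information from all points; a structure that combines local and global information; and two joint alignment networks that align the input points and the point features.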
We will discuss our reason behind these design choices in separate paragraphs below.
The reasons behind these design choices are discussed in separate paragraphs below.

5.2.2 Summary of the model structure

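In short, this part introduces the three components of the model:
1. A max pooling layer used as a symmetric function to aggregate information from all points. (This layer is easy to spot.) Max pooling is really the core of the whole model, yet it is the simplest structure here, which I find quite interesting. There may be room for new exploration, and new papers, here.
2. A structure that combines local and global information. I think this refers to the skip connection, and the implementation confirms it. One nice detail is that after concatenating the two parts, the paper adds a group of fully connected layers to process the result. I think the benefit is that training picks a suitable weighting for the added part.
(At first I took this mlp to be a linear layer. When I actually reproduced it, that did not work: the dimension being transformed is dim=-2, while a linear layer acts on dim=-1, so a 1×1 convolution is needed.)

3. Two joint alignment networks that align the input points and the point features. (This refers to the T-Net.)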
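The dim=-2 remark about the shared mlp is really about tensor layout. PyTorch's `nn.Conv1d` expects input shaped (batch, channels, points), so a kernel_size=1 convolution mixes channels along dim -2. `nn.Linear` always acts on the last dim. Once the tensor is transposed to (batch, points, channels), the two compute the same shared per-point layer. The plain-Python sketch below checks this equivalence without PyTorch. The sizes are toy choices, and the helper names `conv1x1` and `linear` are hypothetical.

```python
import random

def conv1x1(x_cn, W, b):
    """Kernel-size-1 convolution on channel-first data: (C_in, N) -> (C_out, N)."""
    n_points = len(x_cn[0])
    return [[b[o] + sum(W[o][c] * x_cn[c][n] for c in range(len(x_cn)))
             for n in range(n_points)]
            for o in range(len(W))]

def linear(p_nc, W, b):
    """Dense layer on the last axis of point-major data: (N, C_in) -> (N, C_out)."""
    return [[b[o] + sum(W[o][c] * p[c] for c in range(len(p))) for o in range(len(W))]
            for p in p_nc]

def transpose(m):
    return [list(col) for col in zip(*m)]

rng = random.Random(3)
W = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]          # C_out = 4, C_in = 3
b = [rng.gauss(0.0, 1.0) for _ in range(4)]
points = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(6)]  # N = 6, point-major

via_conv = conv1x1(transpose(points), W, b)    # channel-first layout, as with Conv1d
via_linear = transpose(linear(points, W, b))   # same weights, applied point by point
print(via_conv == via_linear)                  # True
```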

5.3 Symmetry Function for Unordered Input

5.3.1 Translation: symmetric functions

Figure used here: Fig 5

Figure 5. Three approaches to achieve order invariance.
Three ways to achieve order invariance.
Multilayer perceptron (MLP) applied on points consists of 5 hidden layers with neuron sizes 64,64,64,128,1024, all points share a single copy of MLP. The MLP close to the output consists of two layers with sizes 512,256.
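The MLP applied to the points has 5 hidden layers with 64, 64, 64, 128 and 1024 neurons, and all points share a single copy of it. The MLP near the output has two layers, of size 512 and 256.
In short, max pooling works better than all the alternatives.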

This part explains in detail how symmetric functions deal with unordered point sets.
Paragraph 1
In order to make a model invariant to input permutation, three strategies exist:
There are three possible strategies for making a model invariant to the order of its input:

sort input into a canonical order;
將輸入按照規范的順序排列

treat the input as a sequence to train an RNN, but augment the training data by all kinds of permutations;
Treat the input as a sequence and train an RNN on it, augmenting the training data with all kinds of permutations;

use a simple symmetric function to aggregate the information from each point. Here, a symmetric function takes n vectors as input and outputs a new vector that is invariant to the input order.
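Use a simple symmetric function to aggregate the information from each point. A symmetric function takes n vectors as input and outputs a new vector that does not depend on their order.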
For example, + and ∗ operators are symmetric binary functions.
For example, + and ∗ are symmetric binary functions.

Paragraph 2
While sorting sounds like a simple solution, in high dimensional space there in fact does not exist an ordering that is stable w.r.t. point perturbations in the general sense.

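Sorting sounds like a simple solution. In a high-dimensional space, however, there is in general no ordering that stays stable when the points are perturbed.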

This can be easily shown by contradiction. If such an ordering strategy exists, it defines a bijection map between a high-dimensional space and a 1d real line.

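This is easy to show by contradiction. If such an ordering existed, it would define a bijection between the high-dimensional space and the 1D real line.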

It is not hard to see, to require an ordering to be stable w.r.t point perturbations is equivalent to requiring that this map preserves spatial proximity as the dimension reduces, a task that cannot be achieved in the general case.
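Requiring the ordering to be stable under point perturbations is equivalent to requiring that this map preserve spatial proximity as the dimension is reduced. In general, that cannot be achieved.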
Therefore, sorting does not fully resolve the ordering issue, and it’s hard for a network to learn a consistent mapping from input to output as the ordering issue persists.
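So sorting does not fully solve the ordering problem. While the ordering issue persists, it is hard for a network to learn a consistent mapping from input to output.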
As shown in experiments (Fig 5), we find that applying a MLP directly on the sorted point set performs poorly, though slightly better than directly processing an unsorted input.
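As Fig 5 shows, applying an MLP directly to the sorted point set performs poorly, though slightly better than processing unsorted input directly.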
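A two-point example shows the instability argument in action. Assume a lexicographic (x, y, z) sort is the canonical order; this is one possible choice, not the paper's. Moving one point by 0.002 along x then swaps the sorted order, so the flattened vector fed to an MLP jumps by 1.0 in several slots. A stable ordering would have to turn small point moves into small input changes.

```python
def lex_sorted(points):
    """The 'sort into a canonical order' strategy: lexicographic (x, y, z) order."""
    return sorted(points)

def flatten(points):
    """Concatenate coordinates in order, as an order-dependent MLP input would see them."""
    return [v for p in points for v in p]

cloud_a = [(0.500, 0.0, 0.0), (0.501, 1.0, 1.0)]
cloud_b = [(0.502, 0.0, 0.0), (0.501, 1.0, 1.0)]   # first point moved by only 0.002 in x

input_a = flatten(lex_sorted(cloud_a))
input_b = flatten(lex_sorted(cloud_b))
largest_jump = max(abs(u - v) for u, v in zip(input_a, input_b))
print(largest_jump)   # 1.0, although no point moved by more than 0.002
```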
Paragraph 3
The idea to use RNN considers the point set as a sequential signal and hopes that by training the RNN with randomly permuted sequences, the RNN will become invariant to input order.
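The RNN approach treats the point set as a sequential signal. It hopes that training the RNN on randomly permuted sequences will make it invariant to input order.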
However in “OrderMatters” [22] the authors have shown that order does matter and cannot be totally omitted.
However, the authors of "Order Matters" [22] showed that order does matter and cannot be ignored entirely.
While RNN has relatively good robustness to input ordering for sequences with small length (dozens), it’s hard to scale to thousands of input elements, which is the common size for point sets.
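An RNN is fairly robust to input ordering for short sequences (dozens of elements). It is hard to scale to thousands of input elements, however, and that is the typical size of a point set.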
Empirically, we have also shown that model based on RNN does not perform as well as our proposed method (Fig 5)
Empirically, the RNN-based model also does not perform as well as the proposed method (Fig 5).
Paragraph 4
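Our idea is to approximate a general function defined on a point set by applying a symmetric function on transformed elements in the set:
The idea is to approximate a general function on an unordered point set in two steps: transform each element, then apply a symmetric function to the transformed elements.
The paper gives the following expression:

$$f(\{x_1, \dots, x_n\}) \approx g(h(x_1), \dots, h(x_n)),$$

where $f: 2^{\mathbb{R}^N} \to \mathbb{R}$, $h: \mathbb{R}^N \to \mathbb{R}^K$, and $g: \mathbb{R}^K \times \dots \times \mathbb{R}^K \to \mathbb{R}$ (n copies of $\mathbb{R}^K$) is a symmetric function.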

Paragraph 5
Empirically, our basic module is very simple: we approximate h by a multi-layer perceptron network and g by a composition of a single variable function and a max pooling function.
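Empirically, the basic module is very simple. h is approximated by a multi-layer perceptron, and g by a single-variable function composed with max pooling.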
This is found to work well by experiments. Through a collection of h, we can learn a number of f’s to capture different properties of the set.
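Experiments show that this works well. Through a collection of h functions, many different f's can be learned, each capturing a different property of the set.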
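Here is a hand-built example, not a learned one, of how γ ∘ MAX ∘ h can express a continuous set function. h maps each point to (x, y, z, −x, −y, −z). Channel-wise max pooling then yields the maximum and the negated minimum along each axis. γ turns these into the volume of the axis-aligned bounding box. Each channel of h captures a different property of the set, echoing the idea that a collection of h yields a number of f's.

```python
def h(p):
    """Hand-picked per-point function (stand-in for a learned MLP): R^3 -> R^6."""
    x, y, z = p
    return (x, y, z, -x, -y, -z)

def gamma(v):
    """Maps the pooled vector to a scalar: here, the axis-aligned bounding-box volume."""
    max_x, max_y, max_z, neg_min_x, neg_min_y, neg_min_z = v
    return (max_x + neg_min_x) * (max_y + neg_min_y) * (max_z + neg_min_z)

def f(points):
    pooled = [max(col) for col in zip(*map(h, points))]   # MAX over points, per channel
    return gamma(pooled)

cloud = [(0.0, 0.0, 0.0), (2.0, 1.0, 0.5), (1.0, 3.0, 1.0), (0.5, 0.5, 4.0)]
print(f(cloud))         # 24.0, i.e. 2 * 3 * 4
print(f(cloud[::-1]))   # 24.0, order does not matter
```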

Paragraph 6
While our key module seems simple, it has interesting properties (see Sec 5.3) and can achieve strong performance (see Sec 5.1) in a few different applications.
The key module looks simple, but it has interesting properties (see Sec 5.3) and achieves strong performance (see Sec 5.1) in several different applications.
Due to the simplicity of our module, we are also able to provide theoretical analysis as in Sec 4.3.
Because the module is so simple, a theoretical analysis is also possible, as in Sec 4.3.

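5.3.2 Summary of symmetric functions

The paper proposes three approaches, and the comparison shows that the last one works best.

1) Is sorting the points into an order a good idea, and can it be done at all?
From the deep learning perspective, it is hard to learn, which is not what we want.
My own view is that this cannot work. Sorting imposes an order on things that had none, and once there is an order, different positions need different parameters. That increases the number of parameters, which we clearly do not want.
However, the paper does not argue this from the learning side; it runs an experiment instead. I think this is worth learning from: when the reasoning cannot be made clear or complete, experiments should fill in what the theory lacks.
From the geometric perspective, these points strictly have no proper order at all.
The paper argues as follows: "if there were a way to order the points stably, any point in 3D space could be mapped onto a 1D space (a number line) in a way that keeps nearby points nearby, which clearly cannot work."
One way to see this: sorting a point set means assigning each point a distinct number. A bijection between 3D space and the number line does exist, since both have the cardinality of the continuum, so the obstacle is not the number of points. The obstacle is that no such bijection can also keep neighboring points close together, and that is exactly the stability the paper asks for.
2) Is it a good idea to use an order-sensitive RNN and then augment the data? This enlarges the dataset by a factorial factor and the computation grows very quickly. Yet the result does not extract useful information; it only makes the RNN balanced across all input orders.
There are also two further problems:
1. It is hard for an RNN to become completely insensitive to the order of its data;
2. An RNN works well only up to a limited length (at most a few dozen inputs), and much longer sequences are too much for it.
3) The paper then proves the soundness of its model theoretically: a symmetric function combined with a multi-layer perceptron can approximate any continuous function on sets.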

5.4 Local and Global Information Aggregation

5.4.1 Translation: local and global information

Paragraph 1
The output from the above section forms a vector [f1, . . . , fK], which is a global signature of the input set.
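The output of the section above forms a vector [f1, …, fK], which is a global signature of the input set. In other words, the function described above ultimately outputs global information.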
We can easily train a SVM or multi-layer perceptron classifier on the shape global features for classification.
An SVM or a multi-layer perceptron classifier can easily be trained on these global shape features for classification.
However, point segmentation requires a combination of local and global knowledge. We can achieve this by a simple yet highly effective manner.
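Point segmentation, however, needs a combination of local and global knowledge. This can be achieved in a simple yet highly effective way.
Paragraph 2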
Our solution can be seen in Fig 2 (Segmentation Network).
The solution is shown in Fig 2 (Segmentation Network).
After computing the global point cloud feature vector, we feed it back to per point features by concatenating the global feature with each of the point features.
After the global point cloud feature vector is computed, it is fed back to the per-point features by concatenating it with each point's feature.
Then we extract new per point features based on the combined point features - this time the per point feature is aware of both the local and global information.
New per-point features are then extracted from these combined features. This time each point's feature is aware of both local and global information.
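The concatenation step of the segmentation network can be sketched with plain lists. The same global vector is appended to every point's local feature, which gives N rows of width C1 + C2. In the paper's Fig 2 the widths are 64 local + 1024 global = 1088. The toy sizes below (2 + 2) and the values are only for illustration, and `concat_global` is a hypothetical helper name.

```python
def concat_global(per_point, global_feat):
    """Append the same global vector to every point: (N, C1) + (C2,) -> (N, C1 + C2)."""
    return [list(feat) + list(global_feat) for feat in per_point]

local = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.5]]      # N = 3 points, C1 = 2 (toy values)
global_feat = [max(col) for col in zip(*local)]   # max-pooled global feature, C2 = 2
combined = concat_global(local, global_feat)
print(combined)
# [[0.1, 0.9, 0.7, 0.9], [0.4, 0.2, 0.7, 0.9], [0.7, 0.5, 0.7, 0.9]]
```

Every row now carries both its own local feature and the shared global context, and the per-point MLP that follows consumes these rows.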
Paragraph 3
With this modification our network is able to predict per point quantities that rely on both local geometry and global semantics.
With this modification, the network can predict per-point quantities that depend on both local geometry and global semantics.
For example we can accurately predict per-point normals (fig in supplementary), validating that the network is able to summarize information from the point’s local neighborhood.
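For example, it can accurately predict per-point normals (figure in the supplementary material). This confirms that the network can summarize information from a point's local neighborhood.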
In experiment session, we also show that our model can achieve state-of-the-art performance on shape part segmentation and scene segmentation.
The experiments also show that the model achieves state-of-the-art performance on shape part segmentation and scene segmentation.

5.4.2 Summary: local and global information

In short, there is no innovation here:
a conventional skip connection fuses the two kinds of information, and this is a standard technique.

5.5 Joint Alignment Network (mainly about rotating the input shape)

5.5.1 Translation

Paragraph 1
The semantic labeling of a point cloud has to be invariant if the point cloud undergoes certain geometric transformations, such as rigid transformation.
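If a point cloud undergoes certain geometric transformations, such as a rigid transformation, its semantic labels must stay the same. (That is, however it is rotated, the segmentation does not change.)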
We therefore expect that the learnt representation by our point set is invariant to these transformations.
The representation learned from the point set is therefore expected to be invariant to these transformations.

Paragraph 2
A natural solution is to align all input set to a canonical space before feature extraction. Jaderberg et al. [9] introduces the idea of spatial transformer to align 2D images through sampling and interpolation, achieved by a specifically tailored layer implemented on GPU.
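A natural solution is to align every input set to a canonical space before feature extraction. Jaderberg et al. [9] introduced the spatial transformer, which aligns 2D images through sampling and interpolation using a specially tailored layer implemented on the GPU.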

Paragraph 3
Our input form of point clouds allows us to achieve this goal in a much simpler way compared with [9].
Because the input is a point cloud, this goal can be reached much more simply than in [9].
We do not need to invent any new layers and no alias is introduced as in the image case.
No new layers need to be invented, and no aliasing is introduced as it is in the image case.
We predict an affine transformation matrix by a mini-network (T-net in Fig 2) and directly apply this transformation to the coordinates of input points.
A mini-network (the T-net in Fig 2) predicts an affine transformation matrix, which is applied directly to the coordinates of the input points.
The mininetwork itself resembles the big network and is composed by basic modules of point independent feature extraction, max pooling and fully connected layers.
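The mini-network resembles the big network and is built from the same basic modules: point-independent feature extraction, max pooling and fully connected layers. (Apart from its special job of predicting the affine transformation, the mini-network is designed much like the whole network.)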
More details about the T-net are in the supplementary.
More details on the T-net are in the supplementary material.

Paragraph 4
This idea can be further extended to the alignment of feature space, as well. We can insert another alignment network on point features and predict a feature transformation matrix to align features from different input point clouds. However, transformation matrix in the feature space has much higher dimension than the spatial transform matrix, which greatly increases the difficulty of optimization. We therefore add a regularization term to our softmax training loss. We constrain the feature transformation matrix to be close to orthogonal matrix:

$$L_{reg} = \lVert I - AA^{T} \rVert_F^2,$$

Paragraph 5
where A is the feature alignment matrix predicted by a mini-network. An orthogonal transformation will not lose information in the input, thus is desired. We find that by adding the regularization term, the optimization becomes more stable and our model achieves better performance.
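The regularizer L_reg = ||I − AA^T||_F^2 is straightforward to compute. This plain-Python sketch uses a hypothetical `ortho_reg` helper and 2×2 matrices for readability, whereas the paper's feature transform is much larger. The penalty is essentially zero for a rotation matrix. It is positive for a matrix that stretches one axis and so is not orthogonal.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def ortho_reg(A):
    """L_reg = ||I - A A^T||_F^2 for a square matrix A (hypothetical helper name)."""
    AAt = matmul(A, transpose(A))
    n = len(A)
    return sum(((1.0 if i == j else 0.0) - AAt[i][j]) ** 2
               for i in range(n) for j in range(n))

t = 0.3
R = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]   # rotation: orthogonal
S = [[2.0, 0.0], [0.0, 1.0]]                                    # stretches x by 2: not orthogonal
print(round(ortho_reg(R), 12))   # 0.0
print(ortho_reg(S))              # 9.0, from the (1 - 4)^2 term
```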

5.5.2 Summary

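This part raises the problem of transformation invariance. Object classification and semantic segmentation do not change when the object is rotated, so the network should also be rotation invariant.
It cites an earlier approach in which someone had proposed a canonicalization method. I had not read that paper, so I took it to be roughly a kind of preprocessing. In my view such preprocessing goes against the overall philosophy of deep learning, and I would not generally expect very good results from it.
Later I watched the authors' own video about the paper and understood this part better. Earlier work mainly did two things:

1. Map the 3D point set directly into a 2D space, so that convolution and the other operations we are used to in 2D can be used for the task. This mapping obviously loses many intrinsic features, though. I think the mapping involves two steps:
First, there is the one-to-one correspondence issue I discussed earlier. 3D points cannot be put in a one-to-one correspondence with 2D points that also keeps neighbors together; the obstacle is preserving proximity, not the number of points.
Second, to really use image-processing methods, the isolated points of the point cloud have to be turned into a continuous image.
2. Extract features from 3D space by hand. This one should be familiar to everyone, since the guiding idea of deep learning is to avoid manual operations as much as possible.

As for this paper's own method: the T-Net it uses is really a matrix transformation. Apart from that, the T-Net's pooling and mlp layers follow the same design as the rest of the network.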

5.6 Theoretical Analysis

I will discuss this in a separate blog post. If you only want to use the model, you do not really need to read the theory.
