How Data Visualization in VR Can Revolutionize Science

Published: 2023/11/29

Astronomy has become a big data discipline, and the ever growing databases in modern astronomy pose many new challenges for analysts. Scientists are more frequently turning to artificial intelligence and machine learning algorithms to analyze multidimensional data sets. However, it is not only a methodological and technical challenge: it is also a visual one! Data visualization is driving discovery in astronomy and is also helping with communicating new findings to the general public. The history of information graphics shows how the transformation of data into knowledge is vital for understanding the data at hand, a subject I have previously written about here.

The problem of visualizing complex data and exploring it interactively is by no means new or limited to research. Examples from digital information design in bioinformatics and medicine (e.g. Genome Valence by Ben Fry or Meviatis by Ricarda Schuhmann) show how visualization can support the understanding of structures within data sets and facilitate exploration. The representation of the data’s dimensions (i.e. its parameter values) can result in dynamic and aesthetic data sculptures. Such visualizations are often quite beautiful in themselves but, crucially, their interactive features enable users to quickly make comparisons and interpret the data.

Today’s digital media allows us to go beyond designing interactive on-screen three-dimensional applications. Both augmented reality (AR) and virtual reality (VR) make it possible for users to take a fresh look at their data and explore parameter spaces in 3D. There is so much potential for using these technologies in the field of information design. For VR, the advantages are obvious:

More space! VR offers a larger field of view than 2D images. This allows multiple views to be arranged in space, making it easier to draw cross-references and connections.
More dimensions! Compared to 2D graphics, VR visualizations offer additional parameters that can represent data (e.g. sound, haptics, lighting, interaction).
More structure! The perception of space and depth is more intuitive, so shapes and volumes are recognized more quickly.
More fun! Immersing yourself in the data and moving from overview to detail by scaling the space is a powerful experience.

Understanding the nature of the unknown

Inspired by the above research examples, the hypothesis I chose to explore for my bachelor thesis in Information Design was:

The presentation of scientific data with new digital media, especially VR, offers great potential for data analysis in science.

I wanted to test this hypothesis on a data set from my previous research which I had been struggling to get an overview of. During my PhD in Astrophysics, I was involved in the EXTraS project, which aimed to automatically classify unknown and newly discovered X-ray sources in the cosmos. The sources were observed by the X-ray satellite XMM-Newton from the European Space Agency (ESA). I set about designing the Virtual Data Cosmos as a way of grouping data with similar properties and visualizing these groups.

As more and more data is collected by X-ray satellites, the data archives of these satellites are growing annually. The records detail millions of sources that emit X-rays, and any newly found source among them could yield new physical discoveries. The classification of unknown sources is therefore hugely important in modern astronomy and, due to the sheer amount of data, intelligent algorithms are increasingly being adopted by astronomers worldwide.

The image below shows the entire sky at optical wavelengths as seen from Earth. This projection can be seen as analogous to a world map in which the galactic plane lies on the equator and the galactic center is in the center of the map. Just as in a normal world map, there are longitudes and latitudes, shown as white grid lines. This is typically referred to as a sky map. Laid over the optical image are white dots; each represents a region observed by the X-ray satellite XMM-Newton and includes several unknown X-ray sources. The objective of the project was to classify each of these sources.

An optical sky map of the universe (Source: ESA) adapted to show the positions of unknown X-ray sources

In order to understand the nature of each X-ray source, astronomers compare its features (specifically the energetic and temporal properties observed) to those of objects with known classification types such as binary star or Seyfert galaxy. Questions like these help:

What are the correlations between the properties of the X-ray source and those of a known classification type?
Where are the differences?
Has the unknown object been detected elsewhere in the electromagnetic spectrum, which could yield further hints about its nature?

In order to describe the similarity between an unknown and a known X-ray source we astronomers use statistics as well as visualization. In this case, machine learning algorithms (supervised decision tree algorithms to be precise) automatically characterized every source in this large and complex data set by comparing their precise parameter values (e.g. observed X-ray intensity) with those of known objects. Ultimately, the algorithms calculate the probability of an X-ray source belonging to various classification types and allocate it to the class that is most likely.

For example: the X-ray source with ID 1 has a 45% probability of being a single star, a 30% probability of being a binary star and a 0.01% probability of being a galaxy. The algorithm assigns the class with the highest probability as the final classification of the unknown source. In this case, source ID 1 would be classified as a single star.
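
The assignment rule in this example can be sketched in a few lines of Python. This is only an illustration of "pick the most probable class", not the EXTraS pipeline itself; the function name `classify` is invented here, and the three probabilities are the ones quoted above (they do not sum to 100% because the remaining probability is presumably spread over other classes).

```python
def classify(probabilities):
    """Return the class label with the highest probability."""
    return max(probabilities, key=probabilities.get)


# Probabilities quoted in the text for the source with ID 1.
source_1 = {"single star": 0.45, "binary star": 0.30, "galaxy": 0.0001}

label = classify(source_1)
print(label)  # single star
```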

Once the algorithm has classified all unknown sources in this way, the task of the astronomer is to carefully screen and check the results. How did the algorithm perform? Did it make mistakes? Since more than one algorithm was tested, one needs to compare the results of each to answer these questions. Did different algorithms classify the same unknown source into different classes? As a scientist, one also wants to know why an algorithm classified an object as it did. The astronomer requires an understanding of the relationship between different parameters and source classification types, and gains it with the help of visualization.
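
One of these checks, finding the sources on which the algorithms disagree, is easy to express in code. The following is a minimal sketch under stated assumptions: the data layout (one dictionary of labels per algorithm), the function name `find_disagreements`, and the algorithm names `tree_a` and `tree_b` are all invented for illustration.

```python
def find_disagreements(results):
    """Return {source_id: {algorithm: label}} for sources whose labels differ.

    `results` maps an algorithm name to a {source_id: label} dictionary.
    """
    disagreements = {}
    all_ids = set().union(*(labels.keys() for labels in results.values()))
    for source_id in sorted(all_ids):
        # Collect the label each algorithm gave to this source, if any.
        per_algorithm = {
            name: labels[source_id]
            for name, labels in results.items()
            if source_id in labels
        }
        if len(set(per_algorithm.values())) > 1:
            disagreements[source_id] = per_algorithm
    return disagreements


# Hypothetical results from two algorithms (names and labels are invented).
results = {
    "tree_a": {1: "single star", 2: "galaxy", 3: "binary star"},
    "tree_b": {1: "single star", 2: "binary star", 3: "binary star"},
}

print(find_disagreements(results))
# {2: {'tree_a': 'galaxy', 'tree_b': 'binary star'}}
```

A list like this tells you where to look, but not why the algorithms disagree, which is the question the visualization is meant to answer.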

The limitations of traditional science viz

A typical method is to create multiple scatterplots in which the X-ray properties of unknown cosmic sources are compared with each other while taking into account the results of a single algorithm. This is done by assigning a unique color and symbol to a specific source classification and depicting X-ray sources with specific class symbols in the plot. We astronomers can then analyze whether the positions of sources depicted with the same symbol form patterns that help to distinguish different classification types.

Typical scatterplots used in astronomy to explore data set dimensions. Classification types (e.g. stars, galaxies, etc.) are coded by color and symbol.

For example: these scatterplots were created to investigate the relationships between parameter HR1 and parameters HR2, HR3, and HR4. The parameters are abstract properties used to describe specific radiation energies of the cosmic sources and visualizing them in the abstract plane enables us to look for patterns that may characterize the properties of different objects. The data points represent all unknown cosmic sources observed by the satellite.
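
The text does not define HR1 to HR4. In X-ray source catalogues, names like these usually denote hardness ratios, which compare the count rates in two neighbouring energy bands. Assuming that is what is meant here, a hardness ratio can be sketched as follows; the function name and the example count rates are invented for illustration.

```python
def hardness_ratio(soft_counts, hard_counts):
    """Return (hard - soft) / (hard + soft).

    For non-negative count rates the result lies between -1 (all emission
    in the softer band) and +1 (all emission in the harder band).
    """
    total = soft_counts + hard_counts
    if total <= 0:
        raise ValueError("at least one band must have a positive count rate")
    return (hard_counts - soft_counts) / total


print(hardness_ratio(30.0, 10.0))  # -0.5, a soft source
print(hardness_ratio(10.0, 30.0))  # 0.5, a hard source
```

Under this assumption, each HR parameter is a bounded number that summarizes the shape of the spectrum, which is why plotting one against another can separate source classes.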

In this case, green triangles represent the class of Seyfert galaxies, while purple squares depict the class of single variable stars that exist within our Milky Way. We see that the sources overlap if we only look at the HR1 parameter, but they occupy very different regions in the HR1-HR2 plane in the first scatterplot. Hence from that plot we can conclude that sources with low HR1 and HR2 values belong to the purple square (variable star) class.

But what about sources with high HR1 and HR2 values? Comparing only these parameters would put them in the galaxy (green) class. But there are many other classes which also occupy this region, e.g. blue triangles, which represent a kind of binary star system, and this confuses the picture. To get a clearer understanding we now need to compare the HR1-HR2 parameter plane with the other scatterplots. If we now look at the second image, which illustrates the HR1-HR3 plane, we see that the sources shown in green and blue symbols are slightly more separated. By combining the information of the first and second plots, we can identify the specific combinations of HR1, HR2 and HR3 parameters that differentiate variable stars (purple), galaxies (green) and binary star systems (blue).

With each additional scatterplot we gradually form a mental model of a multidimensional parameter space in which each source class is located in a unique location. In principle this is what the algorithms do and is why our parameters are also known as the ‘dimensions’ of a data set. However, the larger the number of parameters and classes, the more difficult it is for humans to keep an overview of all relationships. It is simply not possible for us to imagine more than three dimensions at once.

In our sample, the size of the data set and the fact there were more than 50 parameters made it impossible to get an overview of all the relationships between parameter values and source classifications. The scatterplots required were simply too many and, due to the size of the data set, many regions were occupied by multiple source classes. The overlap of their symbols made it very difficult to see the data patterns.

In addition, these plots correspond to the classification by a single algorithm. So as we increase the number of algorithms in use, the number of plots would quickly become unmanageable. I concluded that this traditional 2D visualization did not allow a proper overview of the data, and was frustrated that the decision-making mechanisms of the algorithm remained opaque.
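
To get a feeling for how quickly this grows, one can count the parameter planes: every pair of parameters needs its own scatterplot, and the whole set is repeated for each algorithm. A small sketch, taking 50 parameters as a lower bound from the text and three algorithms as a purely hypothetical number:

```python
from math import comb


def scatterplot_count(n_parameters, n_algorithms=1):
    """Number of 2D scatterplots needed to show every pair of parameters."""
    return comb(n_parameters, 2) * n_algorithms


print(scatterplot_count(4))      # 6 planes for HR1 to HR4
print(scatterplot_count(50))     # 1225 planes for 50 parameters
print(scatterplot_count(50, 3))  # 3675 plots for three algorithms
```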

Designing the Virtual Data Cosmos

Visualizing the data directly

To come up with a new way to visualize this big data set, I first did some research on the history and principles of data visualization. I was fascinated by the creativity with which designers and scientists mapped their data.

Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency.

Edward Tufte coined the term ‘graphic excellence’ in data visualization. He postulated various properties that statistical graphics require to be successful. His theory was that data should be displayed directly without the user being distracted by the design itself. Furthermore, statistical graphics should serve a clear purpose (either description, exploration, tabulation or decoration) and should show several levels of detail, from a rough overview to the fine structure of the data.

Similar claims were made by a 2015 study on the visualization of big data in VR and AR. The authors concluded that for a data visualization to serve as an analysis tool, it requires the data concerned to be represented exactly. The implication for my work was that the data mapping had to be done through coding. This meant that the data values themselves would define the visual aesthetic of the virtual environment.

In addition, the interaction and scalability in a VR scene would allow the user to be fully immersed in the data and literally dive into it. One could easily move around and take different perspectives on the data set. Similarly, the user would be able to zoom out and get an overview, effectively holding the data in their hands. The data set could even be turned around and explored as though it were a physical object.

This, for me, was the most important aspect of the VR approach: it combined the advantage of data physicalization with the possibility to shape and manipulate the data environment, which is not possible in the real world.

A sketch illustrating two immersive moments in VR: holding the data in your hands versus diving into the data

Regardless of how the X-ray source data was organized, my principal idea was to pull the cluster of X-ray parameters and probabilities apart and display them in three-dimensional space. The goal was an interactive data visualization in VR in which the data could be explored directly. By interacting with a concrete virtual environment anyone could explore this abstract data space.

My solution to the problem resulted in the Virtual Data Cosmos. I’ll talk you through the design concept here. A detailed description of the design process will follow in the next article in this series.

Applying the design concept

I wanted to ensure that the visualization would first give the user an overview of the data and only then allow them to go into the detail. By zooming in on a chosen classification type, users would finally reach the DNA of the X-ray source (i.e. the details of its spectral parameters) and therefore understand why the algorithm assigned the source to a certain class.

The VR experience consists of two spaces; users can choose to zoom in and out to seamlessly move from one space to the other:

The class room represents the entire cosmos and includes all data points, grouped according to their classification by the algorithms.
The parameter space represents the observed parameter values of a user-selected subsample of the X-ray sources, and their classification by a selected algorithm.

The starting point was to create the ‘class room’, within which each classification type has its own three-dimensional volume. The class room visualizes the classification results of the X-ray sources by the various algorithms and allows users to explore the probability distributions within the database. It prompts questions such as:

How did an algorithm classify the unknown X-ray sources?
What is the probability of a source belonging to that source class?
What could be an alternative classification?
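
The data behind these questions is simple, even if the VR scene is not. As a rough sketch of what the class room has to know about each source: the data layout and function names below are assumptions, the probabilities are invented, and the class names CV, BL and STAR are borrowed from the screenshot later in this article.

```python
from collections import defaultdict


def rank_classes(probabilities):
    """Return class labels sorted from most to least probable."""
    return sorted(probabilities, key=probabilities.get, reverse=True)


def group_by_class(catalogue):
    """Group source IDs by their most probable class."""
    groups = defaultdict(list)
    for source_id, probabilities in catalogue.items():
        best = rank_classes(probabilities)[0]
        groups[best].append(source_id)
    return dict(groups)


# Invented class probabilities for three sources from one algorithm.
catalogue = {
    1: {"STAR": 0.45, "CV": 0.30, "BL": 0.25},
    2: {"STAR": 0.10, "CV": 0.20, "BL": 0.70},
    3: {"STAR": 0.55, "CV": 0.05, "BL": 0.40},
}

print(group_by_class(catalogue))      # {'STAR': [1, 3], 'BL': [2]}
print(rank_classes(catalogue[1])[1])  # CV, the alternative for source 1
```

The first call answers how the sources were classified; the second entry of the ranking gives the most plausible alternative class.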

A sketch of the VR concept showing the class room and parameter space, and how data points serve as a portal between the two spaces.

Visualizing the complete data set in the class room was a very exciting moment! For the first time since the start of the EXTraS project, we were able to clearly visualize more than 500,000 data points without compromise, and compare the results of various algorithms all at once. I felt that I finally got a clear overview of the results and could easily see the distribution of all classified X-ray sources.

Here are some screenshots from the VR class room:

Overview of the classification results in the class room.
Zooming in on the details of the data set in the class room.

The next step was to understand how an algorithm distinguished between different classes. By zooming in and comparing the features of various selected X-ray sources one enters the parameter space. There is a lot to view here, and again we faced the problem of how to visualize all parameter dimensions at once.

The desire to pull the data points apart eventually led to the final approach: to let each source perform a ‘walk’ through space, with every source starting from the same point. Its parameter values define the direction and length of each step. With this mapping, each source produces a unique path (or trace) in space, and objects with similar properties end up in similar locations in the virtual cosmos.

For example, the following image shows the possible walks of three sources belonging to different classes. This one image allows us to draw the same conclusions that we drew from comparing the three scatterplots above.

Example of parameter walks for three different source classes.

In this sketch, four steps are defined based on the values of the parameters HR1, HR2, HR3, and HR4. Their values mainly define the direction of the step, while the step length is defined by the selected algorithm.
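
To make the idea of a parameter walk concrete, here is a minimal sketch in Python. The exact mapping used in the Virtual Data Cosmos is not given in this article, so everything specific below is an assumption: each parameter gets its own fixed compass direction, its value (taken to lie between -1 and 1) tilts the step up or down, and the step length defaults to 1 unless weights are supplied. The HR values of the three example sources are invented.

```python
import math


def parameter_walk(values, step_lengths=None):
    """Return the list of 3D positions visited, starting at the origin.

    Assumed mapping (not the author's): step i lies in a vertical plane with
    azimuth 2*pi*i/n, and the parameter value in [-1, 1] sets the polar angle
    between 0 (straight up) and pi (straight down).
    """
    n = len(values)
    if step_lengths is None:
        step_lengths = [1.0] * n
    x = y = z = 0.0
    path = [(x, y, z)]
    for i, (value, length) in enumerate(zip(values, step_lengths)):
        azimuth = 2.0 * math.pi * i / n
        polar = (value + 1.0) / 2.0 * math.pi
        x += length * math.sin(polar) * math.cos(azimuth)
        y += length * math.sin(polar) * math.sin(azimuth)
        z += length * math.cos(polar)
        path.append((x, y, z))
    return path


# Invented HR1..HR4 values: two similar soft sources and one hard source.
soft_a = parameter_walk([-0.8, -0.6, -0.2, 0.1])
soft_b = parameter_walk([-0.7, -0.6, -0.3, 0.1])
hard = parameter_walk([0.6, 0.7, 0.4, 0.2])

# Similar parameter values end up close together, different ones far apart.
print(math.dist(soft_a[-1], soft_b[-1]) < math.dist(soft_a[-1], hard[-1]))  # True
```

Whatever the exact mapping, the useful property is the one shown in the last line: sources with similar parameters trace similar paths, so classes appear as bundles of paths in space.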

We see that the HR1 and HR2 steps already help us to separate variable stars from galaxies or binary star systems. The additional parameters then help to differentiate between the latter two classes.

We can see how an algorithm classified an object by the color of the object’s path. More detailed information on the data mapping will be given in a subsequent article.

This is a screenshot of the VR parameter space for a large number of sources that were assigned to three different classes (named CV, BL and STAR):

Exploring the parameter space

In the image above, there are three classes: variable stars (blue), a very active kind of elliptical galaxy (light green) and normal stars (dark green). We can see that sources whose parameters generated a similar path have been assigned to the same class. We can also see situations where the parameter values caused the path to take on a strange shape, causing confusion for the algorithm.

This representation yielded a much better understanding of why a machine-learning algorithm classified a source in a certain way and made clear why it failed to characterize other sources when their paths overlapped.

Summary

Creating the Virtual Data Cosmos convinced me not only of my hypothesis that VR offers great potential for scientific data analysis, but also that the pure presentation of big data can create interesting and aesthetic virtual spaces when determined by the specific parameters of the data. This generative approach implies that by exploring the virtual world, users can actually examine an abstract parameter space that is not necessarily visual in nature. By interacting with the virtual elements, the visualization becomes an extremely useful tool.

The scalability in VR is just one advantage over traditional science viz methods. Additionally, the immersive data visualization is fun to work with. It encourages one to focus longer on the data and have a more complete sense of what information might otherwise be hidden.

There is of course plenty more to be explored in this area. Once I was free from using conventional methods to represent the data, designing the parameter space using the radiation properties of the sources raised many new questions for me. How could the parameters be separated more precisely? Are there better representations that would allow the parameter correlations to be analyzed even more clearly? I’ll talk more about how I improved upon the first version by manipulating the parameters in the next article in this series.

The example of the Virtual Data Cosmos illustrates how applying principles of data visualization in VR can support the sciences by enabling the creation of mental models for multidimensional data. This project shows just how thinking outside the box and coming up with new ways to visualize big data opens many exciting possibilities for science.

I hope I was able to inspire you to create your own VR data visualization experience. A walk-through of the VR experience I created is available on http://annok.de/vdc-2/

During my years in astronomy, data visualization has been an elemental part of my research. Toward the end of my PhD, I encountered a challenge quite common in modern astronomy: understanding and visualizing the information in a big data set. Since I was also studying information design at the University of Applied Sciences, I started exploring data visualization and how it could be a tool for processing multidimensional data in science or industry. In this series of articles I will describe my adventure, which eventually led to the development of the Virtual Data Cosmos.

Original article: https://medium.com/nightingale/how-data-visualization-in-vr-can-revolutionize-science-aece026a2207