[Plotly + Datashader] Visualizing Large Geospatial Datasets

Published: 2023/11/29

Introduction (what we’ll create):

Unlike the previous tutorials in this map-based visualization series, we will be dealing with a very large dataset in this tutorial (about 2GB of lat, lon coordinates). We will learn how to use the Datashader library to convert this data into a pixel-density raster, which can be superimposed on a Mapbox base-map to create cool visualizations. The image below shows what you will create by the end of this tutorial.

Structure of the tutorial:

The tutorial is structured into the following sections:

Pre-requisites

About Datashader

Getting started with the tutorial

When to use this library

Pre-requisites:

This tutorial assumes that you are familiar with Python and that you have Python downloaded and installed on your machine. If you are not familiar with Python but have some experience of programming in other languages, you may still be able to follow this tutorial, depending on your proficiency.

It is very strongly recommended that you go through the Plotly tutorial before going through this tutorial. In this tutorial, the installation of plotly and the concepts covered in the Plotly tutorial will not be repeated.

Also, you are strongly encouraged to go through the ‘About Mapbox’ section in the [Plotly + Mapbox] Interactive Choropleth visualization tutorial. We will not repeat that section here, but it is very much a part of this tutorial.

About Datashader:

Quoting the official Datashader website,

Datashader is a graphics pipeline system for creating meaningful representations of large datasets quickly and flexibly

In layman's terms, datashader converts millions of lat-lon coordinates into a pixel-density map. Say you have a million lat-lon coordinates bounded by latitudes [x, y] and longitudes [a, b]. You create a 100x100 pixel image whose corners correspond to the extreme lat-lon pairs, giving 10,000 pixels in total. Each pixel corresponds to a physical tile of, say, 100 sq. km (the actual area depends on the values of x, y, a, b). If tile1 contains 100 coordinates and tile2 contains 1,000, tile2 has a coordinate density 10 times higher than tile1, so with a linear mapping the pixel for tile2 will be 10 times brighter than the pixel for tile1. In effect, a million lat-lon coordinates are reduced to 10,000 pixel-density values: the coordinates have been converted into a raster image. This is what makes datashader so powerful.
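
The binning idea can be tried out without datashader at all. The following is a minimal pure-Python sketch, not datashader's implementation: the function name rasterize_counts, the synthetic random points and the bounding box are all assumptions made for illustration.

```python
import random

def rasterize_counts(lats, lons, lat_range, lon_range, width, height):
    """Count how many points fall into each cell of a height x width grid."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    grid = [[0] * width for _ in range(height)]
    for lat, lon in zip(lats, lons):
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            continue  # skip points that lie outside the image bounds
        # Scale each coordinate to a column/row index; clamp so that a point
        # exactly on the upper edge lands in the last cell
        col = min(int((lon - lon_min) / (lon_max - lon_min) * width), width - 1)
        row = min(int((lat - lat_min) / (lat_max - lat_min) * height), height - 1)
        grid[row][col] += 1
    return grid

# Synthetic points (hypothetical data, not the tutorial's dataset)
random.seed(0)
n = 100_000
lats = [random.uniform(6, 38) for _ in range(n)]
lons = [random.uniform(68, 98) for _ in range(n)]

grid = rasterize_counts(lats, lons, (6, 38), (68, 98), width=100, height=100)
total = sum(sum(row) for row in grid)
print(total)  # all points are inside the bounds, so this equals n
```

Each entry of grid plays the role of one pixel's density. Datashader performs the same kind of aggregation, but in compiled, parallel code, which is why it copes with tens of millions of points.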

Installing datashader:

If you are using Anaconda,

conda install datashader

Else, you can use the pip installer:

pip install datashader

See the Getting Started guide on the datashader website for more information.

Getting started with the tutorial:

GitHub repo: https://github.com/carnot-technologies/MapVisualizations

Relevant notebook: DatashaderDemo.ipynb

View notebook on NBViewer: Click Here

Import relevant packages:

import dask.dataframe as dd
import datashader as ds
import plotly.express as px

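Note the import of dask.dataframe instead of pandas. Because we are dealing with a large dataset, dask is a much better fit than pandas. For perspective, the .read_csv() call takes 19 seconds with pandas and less than a second with dask. Bear in mind that dask's read_csv is lazy: it only inspects a sample of the file and defers the actual parsing until a result is needed, so the two timings do not measure the same work. Click here to know more about why dask is preferred for large datasets. The gist is that dask splits the data into partitions and can process them in parallel across the cores of your machine, which pandas does not do on its own.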

Import and clean data:

Since the relevant CSV for this tutorial is about 2 GB in size (74 million+ coordinates), it was not possible to host it on GitHub. It can be downloaded from this Google Drive link. It is recommended that you download this file and save it in your data folder. Once that is done, you can import it like any other CSV.

Note: Make sure that you don’t have any other heavy software open when you are loading this dataset, especially if your RAM is comparable to the file size.

df = dd.read_csv('data/lat_lon_data.csv')

Now, we will perform some basic cleaning of the data. Since our region of interest is India, we will make sure that all coordinates outside the lat-lon bounds of India are excluded.

# Remove any unwanted columns
df = df[['latitude', 'longitude']]

# Clean data, remove any out of bounds points
df = df[df['latitude'] > 6]
df = df[df['latitude'] < 38]
df = df[df['longitude'] > 68]
df = df[df['longitude'] < 98]
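
The four successive filters can also be written as a single boolean mask. The sketch below shows this on a small pandas dataframe so that it runs instantly; the helper name clip_to_bounds and the four sample points are assumptions for illustration. The same mask expression should work on the dask dataframe used in this tutorial, since dask supports the same comparison and & operators.

```python
import pandas as pd

# Approximate bounding box used in the tutorial (degrees)
LAT_MIN, LAT_MAX = 6, 38
LON_MIN, LON_MAX = 68, 98

def clip_to_bounds(frame):
    """Keep only the rows that lie strictly inside the bounding box."""
    mask = (
        (frame["latitude"] > LAT_MIN) & (frame["latitude"] < LAT_MAX)
        & (frame["longitude"] > LON_MIN) & (frame["longitude"] < LON_MAX)
    )
    return frame[mask]

# Hypothetical sample: two points inside the box, two outside
sample = pd.DataFrame({
    "latitude": [19.07, 51.51, 28.61, 5.0],
    "longitude": [72.88, -0.13, 77.21, 80.0],
})
inside = clip_to_bounds(sample)
print(len(inside))  # 2
```

Keeping the bounds in named constants makes it easier to reuse the snippet for another region.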

Creating the datashader canvas:

cvs = ds.Canvas(plot_width=1000, plot_height=1000)
agg = cvs.points(df, x='longitude', y='latitude')

# agg is an xarray object, see http://xarray.pydata.org/en/stable/
coords_lat, coords_lon = agg.coords['latitude'].values, agg.coords['longitude'].values

# Corners of the image, which need to be passed to mapbox
coordinates = [[coords_lon[0], coords_lat[0]],
               [coords_lon[-1], coords_lat[0]],
               [coords_lon[-1], coords_lat[-1]],
               [coords_lon[0], coords_lat[-1]]]

We have created a 1000 x 1000 canvas, cvs. Next, we project the longitude and latitude columns of the dataframe onto the canvas using cvs.points, which counts the points falling in each pixel. Then we fetch the coordinates of the resulting grid and determine the corner points of the image.
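
To make the corner logic concrete, here is a small stand-alone sketch that builds the same four-corner list from plain Python lists instead of agg.coords. It rests on two assumptions: that the grid coordinates are the centres of equal-width pixels, and that the canvas spans exactly 68 to 98 degrees longitude and 6 to 38 degrees latitude (in the tutorial the range is inferred from the data). The helper pixel_centres is hypothetical.

```python
def pixel_centres(lo, hi, n):
    """Centres of n equal-width bins spanning the interval [lo, hi]."""
    step = (hi - lo) / n
    return [lo + (i + 0.5) * step for i in range(n)]

# Stand-ins for agg.coords['longitude'].values and agg.coords['latitude'].values
coords_lon = pixel_centres(68.0, 98.0, 1000)
coords_lat = pixel_centres(6.0, 38.0, 1000)

# Same ordering as in the tutorial: each corner is a [lon, lat] pair
coordinates = [
    [coords_lon[0], coords_lat[0]],
    [coords_lon[-1], coords_lat[0]],
    [coords_lon[-1], coords_lat[-1]],
    [coords_lon[0], coords_lat[-1]],
]
for corner in coordinates:
    print(corner)
```

Under these assumptions the corners sit half a pixel inside the true bounds, which is negligible at 1000 x 1000 resolution.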

Now that we have the canvas ready, let us define the colormap for the visualization. We will use the hot colormap. You can use other alternatives, like fire, or any other color map of your choice.

from matplotlib.cm import hot
import datashader.transfer_functions as tf

# Shade the aggregate, flip it vertically, and convert it to a PIL (Python Imaging Library) image
img = tf.shade(agg, cmap=hot, how='log')[::-1].to_pil()

A couple of things to note here. We are using a transfer function to shade the projected coordinates with the hot colormap. We have specified the mapping methodology as log. This ensures that even the low-intensity points are represented adequately in the visualization. If we chose the linear mapping, the high-intensity points would completely overshadow the low-intensity points.
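
The effect of the mapping choice can be seen on a handful of made-up pixel counts. The sketch below is a simplified stand-in for what a shading function does, not datashader's code: the helper to_brightness and the sample counts are assumptions, and the log variant uses log(1 + count) so that a count of zero stays defined.

```python
import math

def to_brightness(counts, how="linear"):
    """Rescale per-pixel counts to brightness values in [0, 1]."""
    if how == "log":
        values = [math.log1p(c) for c in counts]  # log(1 + count)
    else:
        values = [float(c) for c in counts]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # flat image: nothing to contrast
    return [(v - lo) / (hi - lo) for v in values]

counts = [1, 10, 100, 100_000]  # made-up counts with one extreme hotspot
linear = to_brightness(counts, "linear")
log = to_brightness(counts, "log")
for c, a, b in zip(counts, linear, log):
    print(f"count={c:>6}  linear={a:.4f}  log={b:.4f}")
```

With the linear scale the pixel holding 100 points gets less than 1% of the maximum brightness, while the log scale gives it roughly a third, so it remains visible next to the hotspot.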

Another mapping option is eq_hist, which produces a result similar to the log transformation. You can read more about it here. A comparison of the outputs of the 3 transformations is shown below.
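
The idea behind histogram equalization is to make brightness depend on the rank of a pixel's count rather than on its magnitude. The following is a deliberately simplified sketch based on the empirical distribution of the counts; it is not datashader's eq_hist implementation, and the helper name equalise and the sample counts are assumptions.

```python
from bisect import bisect_right

def equalise(counts):
    """Brightness = fraction of pixels whose count is <= this pixel's count."""
    ordered = sorted(counts)
    n = len(ordered)
    return [bisect_right(ordered, c) / n for c in counts]

counts = [1, 10, 100, 100_000]  # made-up counts with one extreme hotspot
print(equalise(counts))  # [0.25, 0.5, 0.75, 1.0]
```

Because only the ordering matters, the single extreme hotspot no longer compresses the rest of the brightness range.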

Different transformations

As you can see, almost nothing is visible with the linear transformation. This is because a couple of pixels with extremely high intensity have overshadowed all others. You will need to zoom-in to identify those hotspots.

Similar to the transformation, different color map options are also available. To get the list of all color maps, click here. Below, the examples with a few different color maps are shown.

Different color maps

Creating the visualization:

fig = px.scatter_mapbox(df.tail(1),
                        lat='latitude',
                        lon='longitude',
                        zoom=4, width=1000, height=1000)

# Add the datashader image as a mapbox layer image
fig.update_layout(mapbox_style="carto-darkmatter",
                  mapbox_layers=[
                      {
                          "sourcetype": "image",
                          "source": img,
                          "coordinates": coordinates
                      }]
                  )
fig.show()

Here, we are plotting just one point from the dataframe (the last one), so that plotly can create the scatter visualization. We are using the carto-darkmatter style from Mapbox and overlaying the image output of datashader as a layer on top of the visualization. Congratulations!! Your visualization is ready!

When to use this library:

The answer is perhaps the simplest for this library. Use this when you have a very large data set. If you find this visualization aesthetically appealing as I do, then you can use this for smaller datasets as well, but the results will depend on the density distribution of your data. You won’t get high interactivity, because datashader essentially overlays an image on the Mapbox base-map. But you can still zoom and pan the visualization.

We are trying to fix some broken benches in the Indian agriculture ecosystem through technology, to improve farmers’ income. If you share the same passion join us in the pursuit, or simply drop us a line on report@carnot.co.in

Translated from: https://medium.com/tech-carnot/plotly-datashader-visualizing-large-geospatial-datasets-bea27b9d7824