Clustering Geospatial Data
Summary
In this article, using Data Science and Python, I will show how different Clustering algorithms can be applied to Geospatial data in order to solve a Retail Rationalization business case.
Store Rationalization is the reorganization of a company in order to increase its operating efficiency and decrease costs. As a result of the Covid-19 crisis, several retail businesses from all around the world are closing stores. That is not exclusively a symptom of financial distress; in fact, many companies have been focusing their investments on making their business more digital.
Clustering is the task of grouping a set of objects in such a way that observations in the same group are more similar to each other than to those in other groups. It is one of the most popular applications of Unsupervised Learning (Machine Learning when there is no target variable).
Geospatial analysis is the field of Data Science that processes satellite images, GPS coordinates, and street addresses and applies them to geographic models.
In this article, I’m going to use clustering with geographic data to solve a retail rationalization problem. I will present some useful Python code that can be easily applied in other similar cases (just copy, paste, run) and walk through every line of code with comments so that you can replicate this example (link to the full code below).
I will use the “Starbucks Stores dataset” that provides the location of all the stores in operation (link below). I shall select a particular geographic area and, in addition to the latitude and longitude provided, I will simulate some business information for each store in the dataset (cost, capacity, staff).
In particular, I will go through:
- Setup: import packages, read geographic data, create business features.
- Data Analysis: presentation of the business case on the map with folium and geopy.
- Clustering: Machine Learning (K-Means / Affinity Propagation) with scikit-learn, Deep Learning (Self Organizing Map) with minisom.
- Store Rationalization: build a deterministic algorithm to solve the business case.
Setup
First of all, I need to import the following packages.
## for data
import numpy as np
import pandas as pd

## for plotting
import matplotlib.pyplot as plt
import seaborn as sns

## for geospatial
import folium
import geopy

## for machine learning
from sklearn import preprocessing, cluster
import scipy

## for deep learning
import minisom
Then I shall read the data into a pandas Dataframe.
dtf = pd.read_csv('data_stores.csv')

The original dataset contains over 5,000 cities and 25,000 stores, but for the purpose of this tutorial, I will work with just one city.
filter = "Las Vegas"dtf = dtf[dtf["City"]==filter][["City","Street Address","Longitude","Latitude"]].reset_index(drop=True)dtf = dtf.reset_index().rename(columns={"index":"id"})dtf.head()
In that area, there are 156 stores. In order to proceed with the business case, I’m going to simulate some information for each store:
- Potential: total capacity in terms of staff (e.g. 10 means that the store can have up to 10 employees)
- Staff: current staff level (e.g. 7 means that the store is currently operating with 7 employees)
- Capacity: current spare capacity (e.g. 10 − 7 = 3, the store can still host 3 more employees)
- Cost: annual cost for the company to keep the store operating ("low", "medium", "high")
Please note that this is just a simulation, these numbers are generated randomly and don’t actually reflect Starbucks (or any other company) business.
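For reference, here is a minimal sketch of how such business features could be simulated; the exact distributions below are my own assumptions, not the article's actual code:

```python
import numpy as np
import pandas as pd

np.random.seed(1)  # reproducible simulation
n = 156            # number of stores in the filtered area

# hypothetical simulation of the features described above
stores = pd.DataFrame({"id": range(n)})
stores["Potential"] = np.random.randint(4, 16, size=n)   # max staff the store can host
stores["Staff"] = [np.random.randint(1, p + 1) for p in stores["Potential"]]  # current staff
stores["Capacity"] = stores["Potential"] - stores["Staff"]  # spare capacity
stores["Cost"] = np.random.choice(["low", "medium", "high"], size=n)  # annual cost bucket
```

By construction, Staff never exceeds Potential, so Capacity is always non-negative.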
Now that it’s all set, I will start by analyzing the business case, then build a clustering model and a rationalization algorithm.
Let’s get started, shall we?
Data Analysis
Let’s pretend we own a retail business and we have to close some stores. We would want to do that by maximizing profit (i.e. minimizing cost) and without laying off any staff.
The costs are distributed as follows:
x = "Cost"ax = dtf[x].value_counts().sort_values().plot(kind="barh")totals = []
for i in ax.patches:
totals.append(i.get_width())
total = sum(totals)
for i in ax.patches:
ax.text(i.get_width()+.3, i.get_y()+.20,
str(round((i.get_width()/total)*100, 2))+'%',
fontsize=10, color='black')
ax.grid(axis="x")
plt.suptitle(x, fontsize=20)
plt.show()
Currently, only a small portion of stores are running at full potential (spare Capacity = 0), while some are heavily understaffed (high spare Capacity):
Let’s visualize those pieces of information on a map. First of all, I need to get the coordinates of the geographic area to start up the map. I shall do that with geopy:
city = "Las Vegas"## get locationlocator = geopy.geocoders.Nominatim(user_agent="MyCoder")
location = locator.geocode(city)
print(location)## keep latitude and longitude only
location = [location.latitude, location.longitude]
print("[lat, long]:", location)
I am going to create the map with folium, a really convenient package that allows us to plot interactive maps without needing to load a shapefile. Each store shall be identified by a point with size proportional to its current staff and color based on its cost. I’m also going to add a small piece of HTML code to the default map to display the legend.
x, y = "Latitude", "Longitude"color = "Cost"size = "Staff"popup = "Street Address"
data = dtf.copy()
## create color column
lst_colors=["red","green","orange"]
lst_elements = sorted(list(dtf[color].unique()))
data["color"] = data[color].apply(lambda x:
lst_colors[lst_elements.index(x)])## create size column (scaled)
scaler = preprocessing.MinMaxScaler(feature_range=(3,15))
data["size"] = scaler.fit_transform(
data[size].values.reshape(-1,1)).reshape(-1)
## initialize the map with the starting locationmap_ = folium.Map(location=location, tiles="cartodbpositron",
zoom_start=11)## add points
data.apply(lambda row: folium.CircleMarker(
location=[row[x],row[y]], popup=row[popup],
color=row["color"], fill=True,
radius=row["size"]).add_to(map_), axis=1)## add html legendlegend_html = """<div style="position:fixed; bottom:10px; left:10px; border:2px solid black; z-index:9999; font-size:14px;"> <b>"""+color+""":</b><br>"""
for i in lst_elements:
legend_html = legend_html+""" <i class="fa fa-circle
fa-1x" style="color:"""+lst_colors[lst_elements.index(i)]+"""">
</i> """+str(i)+"""<br>"""
legend_html = legend_html+"""</div>"""map_.get_root().html.add_child(folium.Element(legend_html))
## plot the map
map_
Our objective is to close as many high-cost stores (red points) as possible by moving their staff into low-cost stores (green points) with spare capacity in the same neighborhood. As a result, we’ll maximize profit (by closing high-cost stores) and efficiency (by having low-cost stores work at full capacity).
How can we define neighborhoods without selecting distance thresholds and geographic boundaries? Well, the answer is … Clustering.
我們?nèi)绾卧诓贿x擇距離閾值和地理邊界的情況下定義鄰域? 好吧,答案是…群集。
Clustering
There are several algorithms that can be used, the main ones are listed here. I will try K-Means, Affinity Propagation, Self Organizing Map.
K-Means aims to partition the observations into a predefined number of clusters (k) in which each point belongs to the cluster with the nearest mean. It starts by randomly selecting k centroids and assigning the points to the closest cluster, then it updates each centroid with the mean of all points in the cluster. This algorithm is convenient when you need to get a precise number of groups (e.g. to keep a minimum number of operating stores), and it’s more appropriate for a small number of even clusters.
Here, in order to define the right k, I shall use the Elbow Method: plotting the variance as a function of the number of clusters and picking the k where the curve flattens.
X = dtf[["Latitude","Longitude"]]max_k = 10## iterationsdistortions = []
for i in range(1, max_k+1):
if len(X) >= i:
model = cluster.KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
model.fit(X)
distortions.append(model.inertia_)## best k: the lowest derivative
k = [i*100 for i in np.diff(distortions,2)].index(min([i*100 for i
in np.diff(distortions,2)]))## plot
fig, ax = plt.subplots()
ax.plot(range(1, len(distortions)+1), distortions)
ax.axvline(k, ls='--', color="red", label="k = "+str(k))
ax.set(title='The Elbow Method', xlabel='Number of clusters',
ylabel="Distortion")
ax.legend()
ax.grid(True)
plt.show()
We can try with k = 5 so that the K-Means algorithm will find 5 theoretical centroids. In addition, I will identify the real centroids too (the closest observation to the cluster center).
k = 5
model = cluster.KMeans(n_clusters=k, init='k-means++')
X = dtf[["Latitude","Longitude"]]

## clustering
dtf_X = X.copy()
dtf_X["cluster"] = model.fit_predict(X)

## find real centroids
closest, distances = scipy.cluster.vq.vq(model.cluster_centers_,
                     dtf_X.drop("cluster", axis=1).values)
dtf_X["centroids"] = 0
for i in closest:
    dtf_X["centroids"].iloc[i] = 1

## add clustering info to the original dataset
dtf[["cluster","centroids"]] = dtf_X[["cluster","centroids"]]
dtf.sample(5)
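As an aside, the `scipy.cluster.vq.vq` call above assigns each theoretical centroid to the index of its nearest real observation, which is how the "real centroids" are found. A toy illustration with made-up points:

```python
import numpy as np
from scipy.cluster.vq import vq

# hypothetical observations (rows) and two theoretical cluster centers
points = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [5.2, 5.2]])
centers = np.array([[0.4, 0.4], [5.05, 5.05]])

# vq(obs, code_book): for each center, the index of the nearest point
closest, distances = vq(centers, points)
print(closest)  # the first center maps to point 0, the second to point 2
```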
I added two columns to the dataset: “cluster” indicating what cluster the observation belongs to, and “centroids” that is 1 if an observation is also the centroid (the closest to the center) and 0 otherwise. Let’s plot it out:
## plot
fig, ax = plt.subplots()
sns.scatterplot(x="Latitude", y="Longitude", data=dtf,
                palette=sns.color_palette("bright",k),
                hue='cluster', size="centroids", size_order=[1,0],
                legend="brief", ax=ax).set_title('Clustering (k='+str(k)+')')

th_centroids = model.cluster_centers_
ax.scatter(th_centroids[:,0], th_centroids[:,1], s=50, c='black', marker="x")
Affinity Propagation is a graph-based algorithm that assigns each observation to its nearest exemplar. Basically, all the observations “vote” for which other observations they want to be associated with, which results in a partitioning of the whole dataset into a large number of uneven clusters. It’s quite convenient when you can’t specify the number of clusters, and it’s suited for geospatial data as it works well with non-flat geometry.
model = cluster.AffinityPropagation()

Using the same code as before, you can fit the model (it finds 12 clusters), and you can use the following code to plot (the difference is that k wasn’t declared at the beginning and there are no theoretical centroids):
k = dtf["cluster"].nunique()sns.scatterplot(x="Latitude", y="Longitude", data=dtf,palette=sns.color_palette("bright",k),
hue='cluster', size="centroids", size_order=[1,0],
legend="brief").set_title('Clustering
(k='+str(k)+')')
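To make the Affinity Propagation step self-contained, here is a minimal sketch on synthetic coordinates (the data and `random_state` are illustrative assumptions, not the Starbucks dataset): fitting exposes both the cluster labels and the exemplars, i.e. the real observations chosen as cluster centers.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# two tight synthetic blobs of 20 points each (hypothetical coordinates)
np.random.seed(0)
X = np.vstack([np.random.normal(0, .1, (20, 2)),
               np.random.normal(1, .1, (20, 2))])

model = AffinityPropagation(random_state=0).fit(X)
labels = model.labels_                      # cluster of each observation
exemplars = model.cluster_centers_indices_  # row indices of the exemplar points
print("clusters found:", len(exemplars))
```

Note that, unlike K-Means, the number of clusters is an output of the algorithm, not an input.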
Self Organizing Maps (SOMs) are quite different, as they rely on deep learning. A SOM is a type of artificial neural network trained with unsupervised learning to produce a low-dimensional representation of the input space, called a “map” (also referred to as the Kohonen layer). Basically, the inputs are connected to the n x m neurons that form the map; for every observation the “winning” neuron (the closest one) is computed, and neurons are clustered together using the lateral distance. Here, I will try with a 4x4 SOM:
X = dtf[["Latitude","Longitude"]]map_shape = (4,4)## scale data
scaler = preprocessing.StandardScaler()
X_preprocessed = scaler.fit_transform(X.values)## clusteringmodel = minisom.MiniSom(x=map_shape[0], y=map_shape[1],
input_len=X.shape[1])
model.train_batch(X_preprocessed, num_iteration=100, verbose=False)## build output dataframe
dtf_X = X.copy()
dtf_X["cluster"] = np.ravel_multi_index(np.array(
[model.winner(x) for x in X_preprocessed]).T, dims=map_shape)## find real centroidscluster_centers = np.array([vec for center in model.get_weights()
for vec in center])closest, distances = scipy.cluster.vq.vq(model.cluster_centers_,
X_preprocessed)
dtf_X["centroids"] = 0
for i in closest:
dtf_X["centroids"].iloc[i] = 1## add clustering info to the original datasetdtf[["cluster","centroids"]] = dtf_X[["cluster","centroids"]]## plotk = dtf["cluster"].nunique()fig, ax = plt.subplots()
sns.scatterplot(x="Latitude", y="Longitude", data=dtf,
palette=sns.color_palette("bright",k),
hue='cluster', size="centroids", size_order=[1,0],
legend="brief", ax=ax).set_title('Clustering
(k='+str(k)+')')th_centroids = scaler.inverse_transform(cluster_centers)
ax.scatter(th_centroids[:,0], th_centroids[:,1], s=50, c='black',
marker="x")
Regardless of the algorithm used to cluster the data, you now have a dataset with two more columns (“cluster”, “centroids”). We can use that to visualize the clusters on the map, and this time I’m going to display the centroids as well, using a marker.
x, y = "Latitude", "Longitude"color = "cluster"size = "Staff"popup = "Street Address"marker = "centroids"
data = dtf.copy()## create color column
lst_elements = sorted(list(dtf[color].unique()))
lst_colors = ['#%06X' % np.random.randint(0, 0xFFFFFF) for i in
range(len(lst_elements))]
data["color"] = data[color].apply(lambda x:
lst_colors[lst_elements.index(x)])## create size column (scaled)
scaler = preprocessing.MinMaxScaler(feature_range=(3,15))
data["size"] = scaler.fit_transform(
data[size].values.reshape(-1,1)).reshape(-1)## initialize the map with the starting locationmap_ = folium.Map(location=location, tiles="cartodbpositron",
zoom_start=11)## add points
data.apply(lambda row: folium.CircleMarker(
location=[row[x],row[y]], popup=row[popup],
color=row["color"], fill=True,
radius=row["size"]).add_to(map_), axis=1)## add html legendlegend_html = """<div style="position:fixed; bottom:10px; left:10px; border:2px solid black; z-index:9999; font-size:14px;"> <b>"""+color+""":</b><br>"""
for i in lst_elements:
legend_html = legend_html+""" <i class="fa fa-circle
fa-1x" style="color:"""+lst_colors[lst_elements.index(i)]+"""">
</i> """+str(i)+"""<br>"""
legend_html = legend_html+"""</div>"""map_.get_root().html.add_child(folium.Element(legend_html))## add centroids marker
lst_elements = sorted(list(dtf[marker].unique()))
data[data[marker]==1].apply(lambda row:
folium.Marker(location=[row[x],row[y]],
popup=row[marker], draggable=False,
icon=folium.Icon(color="black")).add_to(map_), axis=1)## plot the map
map_
Now that we have the clusters, we can start the store rationalization inside each of them.
Store Rationalization
Since the main focus of this article is clustering geospatial data, I will keep this section very simple. Inside each cluster, I will select the potential targets (high-cost stores) and hubs (low-cost stores), and relocate the staff of the targets into the hubs until the latter reach full capacity. Once the whole staff of a target has been moved, the store can be closed.
Iteration inside a cluster:

dtf_new = pd.DataFrame()
for c in sorted(dtf["cluster"].unique()):
    dtf_cluster = dtf[dtf["cluster"]==c]

    ## hubs and targets
    lst_hubs = dtf_cluster[dtf_cluster["Cost"]=="low"
               ].sort_values("Capacity").to_dict("records")
    lst_targets = dtf_cluster[dtf_cluster["Cost"]=="high"
                  ].sort_values("Staff").to_dict("records")

    ## move targets
    for target in lst_targets:
        for hub in lst_hubs:
            ### if hub has space
            if hub["Capacity"] > 0:
                residuals = hub["Capacity"] - target["Staff"]

                #### case the hub still has capacity: do next target
                if residuals >= 0:
                    hub["Staff"] += target["Staff"]
                    hub["Capacity"] = hub["Potential"] - hub["Staff"]
                    target["Capacity"] = target["Potential"]
                    target["Staff"] = 0
                    break

                #### case the hub is full: do next hub
                else:
                    hub["Capacity"] = 0
                    hub["Staff"] = hub["Potential"]
                    target["Staff"] = -residuals
                    target["Capacity"] = target["Potential"] - target["Staff"]

    dtf_new = dtf_new.append(pd.DataFrame(lst_hubs)
              ).append(pd.DataFrame(lst_targets))

dtf_new = dtf_new.append(dtf[dtf["Cost"]=="medium"]
          ).reset_index(drop=True).sort_values(["cluster","Staff"])
dtf_new.head()
This is a really simple algorithm that could be improved in several ways: for example, by taking the medium-cost stores into the equation and replicating the process once the low-cost ones are all full.
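As a hypothetical sketch of that extension (the helper name and logic are my own, not the article's code): build the hub list per cluster with medium-cost stores appended after the low-cost ones, so the inner loop only reaches them once the low-cost hubs run out of room.

```python
import pandas as pd

def get_hubs(dtf_cluster):
    # Hypothetical helper: low-cost hubs first (sorted by spare Capacity),
    # then medium-cost ones as a fallback, so staff spill into medium-cost
    # stores only when no low-cost hub in the cluster has room left.
    hubs = []
    for cost in ["low", "medium"]:
        part = dtf_cluster[dtf_cluster["Cost"] == cost].sort_values("Capacity")
        hubs += part.to_dict("records")
    return hubs
```

The rest of the loop would stay unchanged, simply iterating over this longer hub list.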
Let’s see how many high-cost stores we closed with this basic process:
dtf_new["closed"] = dtf_new["Staff"].apply(lambda x: 1if x==0 else 0)
print("closed:", dtf_new["closed"].sum())
We managed to close 19 stores, but did we also maintain homogeneous coverage of the area, so that customers won’t need to go to another neighborhood to visit a store? Let’s visualize the aftermath on the map by marking out the closed stores (marker = “closed”):
Conclusion
This article has been a tutorial about how to use Clustering and Geospatial Analysis for a retail business case. I used a simulated dataset to compare popular Machine Learning and Deep Learning approaches and showed how to plot the output on interactive maps. I also showed a simple deterministic algorithm to provide a solution to the business case.
This article is part of the series Machine Learning with Python, see also:
Contacts: LinkedIn | Twitter
Original article: https://towardsdatascience.com/clustering-geospatial-data-f0584f0b04ec