Similar-Neighbourhood Algorithm: New York City, the Battle of the Neighbourhoods
IBM Data Science Capstone Project
Analyzing and visualizing the structure of New York City in relation to the requirements of a Clothing Store Investor
Introduction
I used my semester break to build more data science skills by completing the IBM Data Science Professional Certificate on Coursera. The last module is a capstone project and the highlight of the course: you learn the most by applying the acquired skills to an individual real-life problem. As you will see in the business problem section, I decided to add some complexity to the standard course assignment.
Here I present a summary of my project and briefly explain the different methods. If you are interested in the full code, check out my Jupyter notebook.
Business Problem
The project is based on a hypothetical business case. A Canadian investor who recently made a fortune with an investment in a clothing store in Toronto wants to repeat his idea in New York City.
The investor first wants a macro overview of New York City, so we explore the community districts.
1. Business Problem Understanding
The project seems very clear: find the perfect district for an Italian-brand clothing store, taking into account that the location should also match the investor's idea of the perfect place to live, one where you feel safe.
2. Analytical Approach
The core of the project is the socio-economic data frame. As a complement, we build a venues data frame from data fetched from Foursquare and explore these venues. The final venues frame contains the most common venues of each district, which we obtain through one-hot encoding. This data frame is the basis for the k-means algorithm, which clusters the districts by their features so that we can compare how similar they are.
Figure: features weighted matrix
For the best result, the analytical solution to the business problem is to quantify and evaluate the client's ideas so that his requirements are met completely. To evaluate his criteria, we create a features weighted matrix that expresses the investor's wishes numerically. We multiply it with the normalized final data frame and add an extra column with the weighted results, which indicates the best districts.
3. Data requirements and collection
To find the best location for the store, I decided to add some complexity to the standard course problem. As the investor's criteria show, we need some more data.
At the beginning of the project I found data from many different sources, but decided to take it mainly from cccnewyork.org, because their data originates from the U.S. Census Bureau and was collected through the American Community Survey (https://data.census.gov/). So we can expect the data to be up to date, consistent and reliable.
The socio-economic data is obtained from various CSV files from cccnewyork.org
The venues are fetched from Foursquare through an API
The geo-coordinates are obtained with Nominatim and geopy
4. Data understanding and preparing
First of all, we build a clean socio-economic data frame with all the information that is relevant to the business problem. To do so, we load all the files and drop all unnecessary columns and rows.
Figure: final socio-economic data frame
Getting latitudes and longitudes with geocoder
As mentioned before, the geopy geocoder is a great tool for getting latitudes and longitudes. If the number of queries is large, you can use the rate limiter to fetch them successfully:
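from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

# Assumption: geolocator is a Nominatim instance; the user_agent string is a placeholder
geolocator = Nominatim(user_agent="ny_explorer")
# Wait at least one second between two requests
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

D_lat = []
D_long = []
for district in socio_test["Area"]:
    location = geocode(district)
    if location:
        D_lat.append(location.latitude)
        D_long.append(location.longitude)
    else:
        # Report the miss and keep the lists aligned with the rows
        print(district)
        D_lat.append(None)
        D_long.append(None)

# Append to df
socio["D_lat"] = D_lat
socio["D_long"] = D_long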
Mapping with Folium
Folium is a great package for making beautiful maps. We use it for a general overview of the districts of New York City, to get familiar with the structure of the city, and for interactive choropleth maps.
import folium

map_NY_Nsimple = folium.Map(location=[40.730610, -73.935242], zoom_start=10)

# for each Community District add a marker to the map
for lat, long, district in zip(socio['D_lat'], socio['D_long'], socio['Area']):
    label = '{}'.format(district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=15,
        popup=label,
        color='#3186cc',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_NY_Nsimple)

map_NY_Nsimple
Figure: simple map of the 59 community districts of New York City
Adding a choropleth layer for each column makes the map more interactive and informative. Simply add one layer per column, similar to the code below.
# folium.Choropleth takes the same arguments as the older, deprecated Map.choropleth method
income = folium.Choropleth(geo_data=nyc_geo,
                           data=socio,
                           columns=['boro_cd', 'Income'],
                           key_on='feature.properties.boro_cd',
                           fill_color='OrRd',
                           fill_opacity=0.7,
                           line_opacity=0.2,
                           legend_name='income',
                           smooth_factor=0,
                           name='income',
                           highlight=True).add_to(map_NY_N)
Figure: choropleth heatmap with checkbox for visualizing different columns
With the Layer Control you get checkboxes for selecting the different data layers.
folium.LayerControl(collapsed=False).add_to(map_NY_N)
Getting the venue data with Foursquare
With Foursquare we can get up to 100 venues for each district, which is great for a free service. We fetch the data and create a venues data frame; the pandas built-in function get_dummies lets us easily apply one-hot encoding to quantify the venues. After grouping the frame by district and calculating the mean values, we can compare the districts directly. The Jupyter notebook shows the venue exploration in detail. But what is one-hot encoding again?
One-hot encoding is a data-processing step applied to categorical data that converts it into a binary vector representation for use in machine learning algorithms.
It simply creates one column for every possible value and puts a 1 or 0 in the appropriate column.
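As a minimal sketch of the encoding and the grouping by district described above: the districts, venue categories and column names below are made up for illustration and are not the data fetched from Foursquare.

```python
import pandas as pd

# Hypothetical venue rows: one row per venue returned for a district.
venues = pd.DataFrame({
    "District": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Clothing Store", "Park", "Clothing Store", "Park", "Hotel"],
})

# One column per category: 1 where the venue belongs to it, 0 elsewhere.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "District", venues["District"])

# The mean per district is the share of each category among its venues.
district_profile = onehot.groupby("District").mean()
print(district_profile.round(2))
```

Because every row holds exactly one 1, the mean of a column within a district can be read as the relative frequency of that venue category there.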
Figure: one-hot encoded heatmap data frame with mean values
The most common venues
To compare the districts, we create a table, using a function, that gives us the most common venues of each district. We can use this function later to explore the different clusters by their venues.
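One way such a function could look is sketched below. The function name, the column labels and the small frequency table are assumptions for illustration; the notebook's own implementation may differ.

```python
import pandas as pd

def most_common_venues(profile: pd.DataFrame, top_n: int = 3) -> pd.DataFrame:
    """Return the top_n venue categories of each district, most frequent first."""
    rows = {}
    for district, freqs in profile.iterrows():
        # Sort the category frequencies of this district in descending order.
        rows[district] = freqs.sort_values(ascending=False).index[:top_n].tolist()
    columns = [f"Most common venue {i + 1}" for i in range(top_n)]
    return pd.DataFrame.from_dict(rows, orient="index", columns=columns)

# Hypothetical per-district category frequencies (rows sum to 1).
profile = pd.DataFrame(
    {"Clothing Store": [0.5, 0.2], "Hotel": [0.1, 0.5], "Park": [0.4, 0.3]},
    index=["District A", "District B"],
)
common = most_common_venues(profile, top_n=2)
print(common)
```

The input is assumed to be a frame with one row per district and one column per venue category, such as the grouped mean of a one-hot encoded venues frame.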
Figure: data frame of the most common venues by community district
Heat-map of the target venues
In line with the customer's requirements, we take a closer look at the distribution of clothing stores, boutiques and Italian restaurants in the city.
Figure: distribution of clothing stores, boutiques and Italian restaurants in the community districts of New York City
5. Analysing and Modelling
You can find the detailed code here
This project is mainly about analysing data through exploration. We use only a simple clustering algorithm, and the main part is not about a machine learning model. We apply k-means clustering, followed by more data exploration and visualisation, to deepen our feel for the data and our understanding of the city.
K-means is a method that aims to partition n data points into k clusters, where each data point is assigned to the cluster with the nearest mean. The goal is to minimize the sum of squared distances between the points and the mean of their cluster.
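To make the two alternating steps concrete, here is a small NumPy sketch of the standard iteration (assign each point to the nearest mean, then recompute the means). It is an illustration on four made-up points, not the code used in the project, which can rely on a library implementation instead.

```python
import numpy as np

def nearest_centroid(X, centroids):
    """Index of the closest centroid for every row of X."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Alternate between assigning points and updating the cluster means."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_centroid(X, centroids)
        # New mean of each cluster; an empty cluster keeps its old centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = nearest_centroid(X, centroids)
    # Sum of squared distances between the points and their cluster mean.
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, inertia

# Four made-up points forming two obvious groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centroids, inertia = kmeans(X, k=2)
print(labels, inertia)
```

The returned inertia is the quantity that k-means tries to minimize, and it is also the score typically plotted when comparing different values of k.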
The most common approach to finding a suitable number of clusters is the elbow method: we run the algorithm several times with different values of k and plot the resulting score.
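The loop behind the elbow method can be sketched as follows, assuming scikit-learn is used for k-means and taking its inertia (within-cluster sum of squares) as the score. The data is synthetic, three artificial blobs, and only stands in for the district features.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the district features: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit k-means for several k and record the within-cluster sum of squares.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting these values against k gives the elbow curve. On this artificial data the drop flattens sharply after k=3; on real data the bend is usually less pronounced.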
Figure: the elbow method for determining the number of clusters
As you can see, the elbow method is sometimes not very conclusive. But there are numerous other methods to determine the best number of clusters. The second method I used is the Silhouette coefficient.
The Silhouette coefficient is calculated from the mean intra-cluster distance and the mean nearest-cluster distance for each sample. For each point p, first find the average distance between p and all other points in the same cluster; this is a measure of cohesion (A). Then find the average distance between p and all points in the nearest other cluster; this is a measure of separation (B). The silhouette coefficient for p is the difference between B and A divided by the greater of the two: (B - A) / max(A, B).
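The sketch below first writes the per-point formula as a tiny helper and then compares several values of k by their mean silhouette score, assuming scikit-learn's silhouette_score. The data is again synthetic and the helper name is made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def silhouette_of_point(a, b):
    """Silhouette of one point from its cohesion a and its separation b."""
    return (b - a) / max(a, b)

print(silhouette_of_point(1.0, 4.0))  # (4 - 1) / 4

# Synthetic stand-in for the district features: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Mean silhouette over all points for each candidate k (needs at least 2 clusters).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

Values close to 1 indicate compact, well-separated clusters, so the k with the highest mean score is a natural candidate, to be checked against the elbow curve and a look at the clusters themselves.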
Figure: the Silhouette coefficient method for determining the number of clusters
There are numerous quantitative methods for evaluating clustering results. Use them as tools, with a full understanding of their limitations; combining contrasting methods raises the quality of your choice. You also need to examine the results yourself: a human inspection, based on an understanding of what the data represents, what a cluster represents and what the clustering is intended to achieve, is what finally leads to a suitable number of clusters.
This is the map of the community districts, clustered by venue structure and similarity.
Figure: map of the community districts clustered by venue structure and similarity
Analyzing the Investor requirements
where solving the Business Problem begins
The clustered map above is based on all venues we fetched from Foursquare, including irrelevant ones, and does not include the socio-economic data. For the quality of the result it is important to work only with relevant features, those that have an impact on the investor's decision. From this part on, we apply the features weighted matrix introduced earlier.
Initially we prepare and merge the data frames to include only the necessary columns.
Figure: data frame of all customer requirements before feature scaling
For the next part, feature scaling is very important.
Feature scaling is a technique that brings the values of the columns in a dataset onto a common scale without losing information or distorting the differences in the ranges of values. This can be achieved through normalization or standardization.
Normalization is a scaling technique that rescales the features so that the data falls in the range [0,1], which makes them comparable.
Standardization is a scaling technique that rescales the features so that they have the properties of a standard normal distribution, with mean μ=0 and standard deviation σ=1. Unlike normalization, the standardized values are not confined to a fixed range.
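Both techniques can be written in one line each with pandas. The sketch below uses a made-up three-district frame; the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical requirement columns on very different scales.
df = pd.DataFrame(
    {"Income": [40000.0, 60000.0, 100000.0], "Crime": [2.0, 4.0, 9.0]},
    index=["District A", "District B", "District C"],
)

# Normalization (min-max): every column ends up in the range [0, 1].
normalized = (df - df.min()) / (df.max() - df.min())

# Standardization (z-score): every column gets mean 0 and standard deviation 1.
standardized = (df - df.mean()) / df.std(ddof=0)

print(normalized.round(3))
print(standardized.round(3))
```

Note that the standardized income of District C is larger than 1, which shows that z-scores are not bounded the way min-max values are.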
So after the normalization and setting the index on the Area our data frame looks like the following:
Figure: data frame of all customer requirements after feature scaling
Now we can multiply by the features weighted matrix and calculate the total score column. With some simple visualisation, the data frame becomes quite informative.
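A minimal sketch of this weighting step with pandas is shown below. The feature names, the scaled values and in particular the weights (including the negative weight for crime) are invented for illustration and are not the investor's actual weights.

```python
import pandas as pd

# Hypothetical feature-scaled frame, indexed by district.
scaled = pd.DataFrame(
    {
        "Income": [1.0, 0.5, 0.0],
        "Crime": [0.0, 0.2, 1.0],
        "Italian Restaurant": [0.4, 1.0, 0.0],
    },
    index=["District A", "District B", "District C"],
)

# Hypothetical weights: positive for desired features, negative for unwanted ones.
weights = pd.Series({"Income": 1.0, "Crime": -0.5, "Italian Restaurant": 0.8})

# Multiplying a DataFrame by a Series aligns the weights with the column names.
weighted = scaled * weights
weighted["Total Score"] = weighted.sum(axis=1)

ranking = weighted.sort_values("Total Score", ascending=False)
print(ranking.round(2))
```

Keeping the weights in a Series keyed by column name means a missing or misspelled feature shows up as NaN instead of silently shifting the weights to the wrong columns.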
Figure: heat map data frame based on the feature-scaled frame multiplied by the weighted matrix
Applying the k-means method to this data frame (after dropping the total score column) does not give us a numerical ranking of the best districts, but it shows which districts are similar with respect to the investor's requirements. We repeat the same process as before and determine the number of clusters with the two methods explained previously.
Figure: NYC map showing the clusters based on the scaled and weighted data frame
The red Cluster 0 is the medium-level cluster: the overall mean of its features is mediocre. The mean total score is 1.39, but notably it includes 5 of the top-scoring districts, especially South Beach and Tottenville, which are located in Staten Island. It also includes 3 high-ranked districts from Manhattan. The rest of the cluster is moderate.
The purple Cluster 1 is the high-ranked cluster. It consists of only 2 districts, with a median total score of 1.97. Its districts, Battery Park and Midtown Business District, score with a high occurrence of hotels but have a low population.
The blue Cluster 2 is the substandard group, with a median total score of 0.67; except for population, trust and parks, its mean values are very low.
6. Evaluation
As you can see, a clustered map alone is not a result that gives the customer a good foundation for finding the perfect location for the store. The weighted heatmap, however, is great to work with, so we explore this data frame further. A map of the total scores carries much more information for the customer's decision, and combining it with the choropleth map of the socio-economic data is a good way to visualize the data frames interactively.
Figure: heat map data frame of the top 15 community districts
Visualize the total score
Figure: choropleth map of the total scores of the community districts
Figure: choropleth heatmap with checkbox for visualizing different columns
Bar plots of the top 15 districts
To support the maps, the following bar plots show the sorted top 15 results and give a contrasting view of the data.
Figure: bar plots of each column of the top 15 districts
Figure: bar plot of the top 15 community districts sorted by total score
7. Discussion of the Result
Our analysis shows that there are several promising districts for the store. South Beach, Upper West Side and Battery Park in particular are rated highly. The share of clothing stores was highest in St. George, and that of boutiques in Midtown Business District.
As you can see from the map, Cluster 0 (red) is the medium cluster with respect to the customer's requirements. It is a pretty big cluster and includes some of the best-scoring districts. The purple cluster is mostly located in Manhattan and consists of only 2 high-ranked districts. The blue cluster should be ignored.
South Beach, located in Staten Island, gained the highest score. It has a high frequency of Italian restaurants, and the fact that it is a good place to live with a low crime rate compensates for its medium socio-economic data. Choosing this location could mean that the store will profit from regular customers, but there won't be as many tourists and walk-in customers as in Manhattan.
The Upper West Side scores with high income, population and parks, but there are no hotels directly in the district, which could mean fewer tourist customers. On the other hand, Central Park, a tourist hotspot, is close by, and this proximity may compensate for the lack of hotels; the district is probably still more touristy than South Beach. Many wealthy people live here, so the store could also benefit from regular customers. It could be a great place to live if the customer prefers to live right in the city centre.
Battery Park is a tourist hotspot in New York. Despite its low population it is among the top 3 districts and has the highest income score. The few people who can afford to live in this part of Manhattan have a high income; furthermore, there are lots of hotels in and around the area, which promises a great mix of tourists and regular customers.
Tottenville, in 4th place, is also in Staten Island and has the lowest crime and the lowest unemployment rate of the top 15. Its trust score is also one of the highest. Furthermore, it has a high overall score and is pretty similar to South Beach.
There is one main decision to make:
Manhattan or Staten Island
8. Conclusion
The purpose of this project was to identify the districts that best fit the diverse requirements of the customer. By evaluating and quantifying his ideas with the weighted matrix, it was possible to identify several districts that combine his requirements for the store location with his personal living wishes.
To find the perfect location, we now have to go deeper and analyse the top 10 to 15 districts in more detail. We could compare specific neighbourhoods and add more detailed data, such as tourism frequency, to finally find the perfect neighbourhood or even the best street for the store.
Translated from: https://medium.com/@patrich94/new-york-city-the-battle-of-the-neighbourhoods-73d10987cb