當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

相似邻里算法_纽约市-邻里之战

發布時間：2023/11/29 编程问答 40 豆豆

生活随笔收集整理的這篇文章主要介紹了相似邻里算法_纽约市-邻里之战小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

相似鄰里算法

IBM Data Science Capstone Project

IBM Data Science Capstone項目

分析和可視化與服裝店投資者的要求有關的紐約市結構 (Analyzing and visualizing the structure of New York City in relation to the requirements of a Clothing Store Investor)

介紹 (Introduction)

I have taken the time in my semester break to acquire more skills related to data science by completing the IBM Data Science Professional Certificate course on Coursera. The last module is a capstone project and the highlight of the course, it includes the greatest learning effect by applying the learned skills on an individual real life problem. As you will see from the business problem part, I decided to add some more complexity to the standard course assignment .

我在學期中花了時間通過完成Coursera上的IBM數據科學專業證書課程來學習與數據科學相關的更多技能。最后一個模塊是一個頂峰項目，也是課程的重點，它通過將學到的技能應用到單個現實生活問題上，包括最大的學習效果。從業務問題部分您將看到，我決定為標準課程分配增加一些復雜性。

I present here the summery of my project and explain different methods briefly. If you are interested in the extended code check out my jupyter notebook.

我在這里介紹我的項目概述，并簡要說明不同的方法。如果您對擴展代碼感興趣，請查看我的jupyter 筆記本 。

業務問題 (Business Problem)

The project is based on a hypothetical business case. A Canadian Investor who recently made a fortune with an investment in a Clothing Store in Toronto wants to repeat his idea in New York City.

該項目基于一個假設的業務案例。一位最近在多倫多的一家服裝店投資賺錢的加拿大投資者想在紐約市重復他的想法。

As his brand is exclusive and expensive the location should be one of the most crowded districts with high employment rate and above average income. He prefers not only tourists to buy in the store he would also like to gain many regular customers.

由于他的品牌專有且價格昂貴，因此該地區應該是就業率最高且收入高于平均水平的最擁擠的地區之一。他不僅喜歡游客在商店購物，而且還希望吸引許多固定顧客。

Due to the origin of his brand has a touch of Italian design he prefers a location close to Italian restaurants on the basis of window shopping and the chance that people who go for Italian food also have a sympathy for Italian fashion is pretty high.

由于他的品牌的起源具有意大利風格，因此他更喜歡在櫥窗購物的基礎上靠近意大利餐廳，而且去意大利美食的人也很喜歡意大利時尚。

Tourists and business traveler are well known for spending money generously, therefore the criteria to be as close as possible to hotels is highly important, because guests of the city hotels are more likely to buy clothes nearby and guarantee for more walk-in customers.

游客和商務旅行者以大筆消費而聞名，因此，盡可能接近酒店的標準非常重要，因為城市酒店的客人更有可能在附近買衣服，并保證有更多的客源顧客。

As close to the city Center or other touristic hotspots to benefit from walk-in customers. Approximately 20 Minutes walking distance to the Center of the district. If possible far away from other clothing stores.

距市中心或其他旅游熱點不遠，因此可從步行游客中受益。到市區中心約20分鐘步行路程。如果可能，請遠離其他服裝店。

The Investor wishes to invest in a flat in New York City to be nearby the store. By the reason to live close to the store he has the following criteria to his place of residence: low crime rate, high community trust, close to parks, theatres and art galleries.

投資者希望在商店附近的紐約市投資一個公寓。由于居住在商店附近的原因，他對居住地有以下標準：犯罪率低，社區信任度高，靠近公園，劇院和美術館。

The Investor first wants a macro overview of New York City. So we are exploring the community districts.

投資者首先要對紐約市進行宏觀概覽。因此，我們正在探索社區區。

1.業務問題理解 (1. Business Problem Understanding)

The Project seems very clear, find the perfect district for an Italian brand clothing store, taking into account the location should be suitable to his imaginations of the perfect place of residence, where you feel safe at the same time.

該項目看起來非常清晰，為意大利品牌服裝店找到理想的地區，同時考慮到該位置應適合他對理想住所的想象，同時又讓您感到安全。

2.分析方法 (2. Analytical Approach)

The core of the project will be the socio-economic data frame. Complementary we build a venues data frame fetched from foursquare and explore these venues. The final venues frame will contain the most common venues of each district, which we will get through one hot encoding. This data frame is the basis for the k-means algorithm to cluster the districts by their features to compare similarity between these districts.

該項目的核心將是社會經濟數據框架。作為補充，我們構建了一個從foursquare獲取的場所數據框架，并探索了這些場所。最終的場所框架將包含每個地區最常見的場所，我們將通過一種熱門編碼進行了解。該數據幀是k-means算法根據區域特征對區域進行聚類以比較這些區域之間相似度的基礎。

features weighted matrix特征加權矩陣

For the best result the analytical solution to the business problem is to quantify and evaluate the thoughts of the client to full fill his requirements completely. For evaluating his criteria, we will create a features weighted matrix to express the investors desires in a scientific way, which we will multiply with the normalized final data frame to add the extra column with the weighted results, which gives us an indication of the best districts.

為了獲得最佳結果，業務問題的分析解決方案是量化和評估客戶的想法，以完全滿足其需求。為了評估他的標準，我們將創建一個特征加權矩陣 ，以科學的方式表達投資者的需求，然后將其與標準化的最終數據框相乘，以添加帶有加權結果的額外列，從而為我們指明最佳地區。

3.數據要求和收集 (3. Data requirements and collection)

To ensure the best location for the store I decided to add some more complexity to the standard course problem. As you can see from the criteria given by the investor we need some more data.

為了確保商店的最佳位置，我決定為標準課程問題增加一些復雜性。從投資者給出的標準可以看出，我們需要更多數據。

In the beginning of the Project I found data from many different data sources, but decided to get the data mainly from cccnewyork.org by the reason that the source of their Data is the U.S. Census Bureau and the data was fetched by the American Community Survey https://data.census.gov/. So we can be sure the data is up to date, consistent and reliable.

在項目開始時，我從許多不同的數據源中找到了數據，但決定主要從cccnewyork.org獲取數據，原因是其數據源是美國人口普查局，并且該數據是由美國社區調查局獲取的。 https://data.census.gov/ 。因此，我們可以確保數據是最新的，一致且可靠的。

the socio-economic data will be obtained from various csv files from cccnewyork.org
社會經濟數據將從cccnewyork.org的各種csv文件中獲得
the venues will be fetched from Foursquare through an API
場地將通過API從Foursquare獲取
the Geo-coordinates will be obtained with nominatim and geopy
地理坐標將通過nominatim和geopy獲得

4.數據理解與準備 (4. Data understanding and preparing)

First of all, we will build a clean socio-economic data frame with all the necessary information which are related to the business problem. Therefore, we need to load all the files and drop all unnecessary columns and rows.

首先，我們將建立一個干凈的社會經濟數據框架，其中包含與業務問題相關的所有必要信息。因此，我們需要加載所有文件并刪除所有不必要的列和行。

final socio economic data frame最終的社會經濟數據框架

使用Geocoder獲取經度和緯度 (Getting latitudes and longitudes with geocoder)

As mentioned before geocoder is a great tool to get the latitude and longitude. If you query is too large you can use the rate limiter for fetching larger amounts of queries successful:

如前所述，Geocoder是獲取緯度和經度的好工具。如果查詢太大，則可以使用速率限制器成功獲取大量查詢：

from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)D_lat = []
D_long = []for district in socio_test["Area"]:

location = geolocator.geocode(district)
if location:
d_latitude = location.latitude
d_longitude = location.longitude
D_lat.append(d_latitude)
D_long.append(d_longitude)
else:
print(district)#Append to df
socio["D_lat"] = D_lat
socio["D_long"] = D_long

用葉映射 (Mapping with Folium)

Folium is a great package to make beautiful maps. We will use it for a general overview of the districts of New York City to get familiar with the structure of the City and for interactive choropleth maps.

Folium是制作精美地圖的理想選擇。我們將使用它來大致了解紐約市各區，以熟悉紐約市的結構以及交互式的十字形圖。

map_NY_Nsimple = folium.Map(location=[40.730610, -73.935242], zoom_start=10)# for each Community District add a marker to map
for lat, long, district in zip(socio[‘D_lat’], socio[‘D_long’], socio[‘Area’]):
label = ‘{}’.format(district)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
[lat, long],
radius=15,
popup=label,
color=’#3186cc’,
fill=True,
fill_color=’#3186cc’,
fill_opacity=0.7,
parse_html=False).add_to(map_NY_Nsimple)

map_NY_Nsimplesimple map of the 59 community districts New York City紐約市59個社區區域的簡單地圖

By adding chloropleth layers for each columnm, the map gets more interactive and informative. Simply add a layer for each column similar to the code below.

通過為每個columnm添加chloropleth層，地圖將變得更具交互性和信息量。只需為每列添加一層，類似于下面的代碼。

income = map_NY_N.choropleth(
geo_data=nyc_geo,
data=socio,
columns=['boro_cd', 'Income'],
key_on='feature.properties.boro_cd',
fill_color='OrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='income',
smooth_factor=0,
name = "income",
highlight = True)choropleth heatmap with checkbox for visualizing different columns帶復選框的Choropleth熱圖，用于可視化不同的列

With the Layer Control you get the checkbox for selecting the different data.

使用“ 圖層控件”，您將獲得用于選擇其他數據的復選框。

folium.LayerControl(collapsed=False).add_to(map_NY_N)

使用Foursquare獲取場地數據 (Getting the venue data with Foursquare)

With Foursquare we can get up to 100 venues for each district, which is great for a free service. We will fetch the data and create a venues data frame, the pandas build in method .get_dummies lets us easily use the one hot encoding process to quantify the venues. After grouping the frame by the districts and calculating the mean value we can compare the different districts perfectly. In the jupyter Notebook you can comprehend the venue exploring detailed. But what is one hot encoding again?

使用Foursquare，我們可以為每個地區提供多達100個場所，這是免費服務的絕佳選擇。我們將獲取數據并創建一個場所數據框架，熊貓內置方法.get_dummies使我們可以輕松地使用一個熱編碼過程來量化場所。在按區域對框架進行分組并計算平均值之后，我們可以完美地比較不同的區域。在jupyter Notebook中，您可以詳細了解場地。但是又是什么一種熱編碼？

One Hot Encoding is a process in the data processing that is applied to categorical data, to convert it into a binary vector representation for use in machine learning algorithms

一種熱編碼是數據處理中的一種過程，用于分類數據，將其轉換為二進制矢量表示形式，供機器學習算法使用

One-Hot Encoding simply creates one column for every possible value and put a 1 or 0 in the appropriate column.

一鍵編碼僅為每個可能的值創建一列，然后在適當的列中放入1或0。

one hot encoded heatmap data frame with mean values一個帶有平均值的熱編碼熱圖數據幀

最常見的場所 (The most common venues)

For the comparison of the districts we would like to create a table with a function which gives us the most common venues of each district. We can use this function later to explore the different cluster by their venues.

為了比較各區，我們希望創建一個帶有功能的表格，該功能為我們提供了每個區最常見的場所。稍后我們可以使用此功能，根據其場所探索不同的集群。

data frame of the most common venues by community districts社區地區最常見場館的數據框

目標場地的熱圖 (Heat-map of the target venues)

Related to the requirements of the customer we will have a closer look at the distribution of the Clothing Stores, Boutiques and Italien Restaurants in the City.

根據客戶的需求，我們將更詳細地了解城市中服裝店，精品店和意大利餐廳的分布。

Distribution of Clothing Stores, Boutiques and Italien Restaurants in the Community Districts of New York City紐約市社區地區的服裝店，精品店和意大利餐廳的分布

5.分析與建模 (5. Analysing and Modelling)

You can find the detailed code here

您可以找到詳細的代碼這里

This project has a need for data analysing through data exploring we will only use a simple classification algorithm but the main part is not about a machine learning model. We will use the k-means clustering followed by more data exploring and visualisation to expand our feeling for the data and understanding of the city.

這個項目需要通過數據探索進行數據分析，我們將僅使用簡單的分類算法，但主要部分不涉及機器學習模型。我們將使用k-means聚類，然后進行更多數據探索和可視化，以擴大我們對城市數據和理解的感覺。

K-means is a method that aims to partition n data points into k clusters where each data point is assigned to the cluster with the nearest mean. The goal is to minimize the sum of all squared distances within a cluster.

K均值是一種旨在將n個數據點劃分為k個簇的方法，其中每個數據點均以最接近的均值分配給該簇。目的是最小化群集內所有平方距離的總和。

To find the perfect number of cluster the most common approach is the elbow method. Therefore we run the algorithm multiple times and then plotting the related score.

要找到理想的簇數，最常用的方法是彎頭方法。 因此，我們多次運行該算法，然后繪制相關分數。

The elbow method for determining number of clusters確定簇數的肘法

As you can see the elbow method is sometimes not very conclusive. But there are numerous different methods to determine the best number of clusters. The second method I used ist the Silhouette coefficient.

如您所見，肘部方法有時不是很確定。但是，有許多不同的方法可以確定最佳群集數。我使用的第二種方法是Silhouette系數。

The Silhouette coefficient is calculated using the mean intra-cluster distance and the mean nearest-cluster distance for each sample. For each point p, first find the average distance between p and all other points in the same cluster this is a measure of cohesion (A). Then find the average distance between p and all points in the nearest cluster, this is a measure of separation from the closest other cluster (B). The silhouette coefficient for p is defined as the difference between B and A (B-A) divided by the greater of the two (max(A,B))

使用每個樣本的平均集群內距離和平均最近集群距離來計算Silhouette系數。對于每個點p，首先找到p和同一簇中所有其他點之間的平均距離，這是內聚力(A)的量度。然后找到p與最近的群集中所有點之間的平均距離，這是與最近的其他群集(B)分離的量度。 p的輪廓系數定義為B和A之間的差(BA)除以兩者中的最大值(max(A，B))

The Silhouette coefficient method for determining number of clusters確定簇數的輪廓系數法

There are numerous quantitative methods of evaluating clustering results, you will see by using them as tools with the full understanding of the limitations the combination of contrasting methods rises the quality of your choice, if you be aware of actually examine the results, kind of a human inspection and making a determination based on an understanding of what the data represents, what a cluster represents, and what the clustering is intended to achieve, you will find the perfect number of clusters.

評估聚類結果的定量方法很多，如果您將其用作工具并充分了解其局限性，那么如果您知道實際檢查結果的話，對比方法的組合可以提高選擇的質量。通過對數據代表什么，集群代表什么以及將要實現的集群的理解進行人工檢查和確定，您將找到理想數量的集群。

This is the clustered map of each Community District by the venue structure and similarity.

這是根據會場結構和相似性劃分的每個社區區的聚類地圖。

map of each Community District by the venue structure and similarity.每個社區區的地圖由會場結構和相似性組成。

分析投資者要求 (Analyzing the Investor requirements)

where solving the Business Problem begins

解決業務問題的起點

The clustered map above includes all venues we have fetched from Foursquare including the irrelevant venues, except the socio economic data. For the quality of the result it is important to deal only with relevant features, which have an impact on the decision of the Investor. Beginning from this part we will deploy the recently mentioned features weight matrix.

上面的群集地圖包括我們從Foursquare獲取的所有場所，包括不相關的場所，但社會經濟數據除外。對于結果的質量，重要的是僅處理相關特征，這些特征會影響投資者的決策。從這一部分開始，我們將部署最近提到的功能權重矩陣。

Initially we prepare and merge the data frames to include only the necessary columns.

最初，我們準備并合并數據幀以僅包括必要的列。

data frame of all customer requirements before feature scaling功能擴展之前所有客戶需求的數據框

For the next part Feature Scaling is very important.

對于下一部分， 功能縮放非常重要。

Feature scaling is a technique to change the values of columns in the dataset to use a common scale, without losing information or distorting the differences in the ranges of the values. This can be achieved through Normalization and Standardization

特征縮放是一種將數據集中的列的值更改為使用公共比例的技術，而不會丟失信息或扭曲值范圍內的差異。這可以通過標準化和標準化來實現

Normalization is a scaling technique which rescales the features so that the data will fall in the range of [0,1] to bring them to a comparable grade.

規范化是一種縮放技術，可以對特征進行重新縮放，以使數據落入[0,1]的范圍內，以使它們達到可比的等級。

Standardization is a scaling technique which rescales the features the way they range between [-1,1] by the properties of a standard normal distribution with the mean μ=0 and the standard deviation, σ=1, where μ is the average and σ is the standard deviation from the average.

標準化是一種縮放技術，通過標準均值μ= 0和 標準偏差σ= 1，其中μ是平均值，而σ是與平均值的標準偏差。

So after the normalization and setting the index on the Area our data frame looks like the following:

因此，在標準化并在Area上設置索引后，我們的數據框如下所示：

data frame of all customer requirements after feature scaling功能擴展后所有客戶需求的數據框

Now we can multiply the features weight matrix and calculate the total score column, with some simple visualisation the data frame looks pretty informative.

現在，我們可以將特征權重矩陣相乘并計算總分列，通過一些簡單的可視化，數據框看起來很有用。

heat map data frame based on the feature scaled frame multiplied by the weighted matrix基于特征縮放幀乘以加權矩陣的熱圖數據幀

After applying the k-means method featuring this data frame (dropping the total score column) won’t get a visualization of the best districts numerically, but it shows us which districts are similar in accordance to the investor requirements. We will repeat the same process as mentioned before, finding the perfect number of clusters with the two method previously explained.

在應用了具有該數據框的k-means方法(刪除總分列)后，將無法從數字上直觀地看到最佳地區，但可以根據投資者的需求向我們顯示哪些地區相似。我們將重復前面提到的相同過程，并使用前面介紹的兩種方法找到理想的簇數。

NYC map showing the clusters based on the scaled and weighted data frameNYC地圖顯示基于縮放和加權數據框的聚類

Red cluster 0 ist the medium level cluster the total mean of the features is mediocre. The mean total score is 1.39, but it’s notable that it includes 5 of the top scored districts, especially South Beach and Tottenville, which are located in Staten Island. There are also 3 high ranked districts from Manhatten included. The rest of the cluster is moderate.

紅色簇0和中等水平簇的特征總平均值中等。平均總分是1.39，但值得注意的是，它包括5個得分最高的地區，尤其是位于史坦頓島的南海灘和托特維爾。曼哈頓還包括3個排名較高的地區。集群的其余部分為中等。

The purple Cluster 1 is the high ranked Cluster it consists of only 2 districts with an median total score of 1.97. The districts of this cluster Battery Park and Midtown Business District scoring with a high occurrence of hotels but low population.

紫色的聚類1是排名最高的聚類，它僅由2個區域組成，中位數總分為1.97。這個集群的炮臺公園和中城商業區的地區得分很高，旅館的發生率很高，但人口很少。

The blue Cluster 2 is the substandard faction, with a median total score of 0.67 and except of the population, trust and parks the mean values are very low.

藍色的簇2是不合標準的派系，平均總得分為0.67，除了人口，信任和公園之外，平均值非常低。

6.評估 (6. Evaluation)

As you may see presenting the customer a clustered map is not a result, which is a good foundation for finding the perfect location of the Store. But the weighted heatmap is great to work with. We are going to explore this data frame further. Presenting a map with the total scores has much more information for the decision of the customer, combining this map with the choropleth map of the socio economic data is superb to visualize the data frames interactive.

您可能會看到，向客戶展示集群地圖并不是結果，這是找到商店理想位置的良好基礎。但是加權熱圖非常有用。我們將進一步探索該數據框架。呈現具有總分的地圖可為客戶決策提供更多信息，將該地圖與社會經濟數據的choropleth地圖相結合可以很好地可視化交互式數據框架。

heat map data frame of the top 15 community districts前15個社區區域的熱點地圖數據框

可視化總成績 (Visualize the total score)

chloropleth map of the total scores of the community districts社區地區總得分的chloropleth地圖 choropleth heatmap with checkbox for visualizing different columns帶復選框的Choropleth熱圖，用于可視化不同的列

Bar plots of the top 15 districts

前15個區的條形圖

Supportive to the maps are the following bar plots of the sorted top 15 results, to get a contrasting view of the data.

下面的條形圖對地圖的前15個結果進行了排序，以支持這些地圖，以形成對比的數據視圖。

bar plots of each column of the top 15 districts前15個地區的每列的條形圖 bar plots of each column of the top 15 districts前15個地區的每列的條形圖 bar plot of top 15 community districts sorted by total score按總分排序的前15個社區區的條形圖

7.結果討論 (7. Discussion of the Result)

Our analysis shows that there are several promising districts for the store. Especially South Beach, Upper West Side and Battery Park are high rated. The Distribution of Clothing Stores was the highest in St.George and of Boutiques in Midtown Business District.

我們的分析表明，這家商店有幾個很有希望的地區。尤其是南海灘 ， 上西區和炮臺公園很高。在圣喬治和中城商業區的精品店中，服裝店的分布最高。

As you can see from the map the Cluster 0 (red) is the medium cluster for the requirements of the Customer, it is a pretty big cluster and includes some of the best scored districts. The purple Cluster is mostly located in Manhatten and consist of only 2 high ranked districts. The blue cluster should be ignored.

從地圖上可以看到，集群0( 紅色 )是滿足客戶需求的中型集群，它是一個很大的集群，其中包括一些得分最高的地區。紫色星團大部分位于曼哈頓，僅由2個排名較高的地區組成。的藍色簇應忽略。

South Beach located in Staten Island gained the highest score. There is a high frequency of Italian restaurants and the factor that it is a good place to live with a low Crime Rate compensates the medium socio economic data. Choosing this location could mean that the Store will profit from regular customers but there won’t be as many tourists and walk in customers as in Manhatten.

位于史泰登島的南海灘獲得最高分。意大利餐館的出現頻率很高，并且這是一個犯罪率低的好地方，這彌補了中等社會經濟數據的不足。選擇這個位置可能意味著該商店將從常規客戶那里獲利，但是不會有像曼哈頓那樣多的游客和步行者。

Upper Westside scores with high income, population and parks but there are no hotels directly in the district, which could lead to less touristic customers. On the oter hand the Central Park is close by, which is a touristic hotspot. But probably more touristic than South Beach. On the other hand this is a place where a lot of wealthy people live and the store could benefit from regular customers. It could be a great place to live if the customer prefers to live right in the city Center. The proximity to the Central Park a touristic hot spot could maybe compensate the lack of hotels in the relation of touristic customers.

上西區的收入，人口和公園都很高，但該地區沒有直接的酒店，這可能會減少游客人數。另一方面，中央公園就在附近，這是一個旅游熱點。但可能比南灘更具旅游意義。另一方面，這里是許多有錢人居住的地方，商店可以從普通顧客中受益。如果客戶更喜歡住在市中心，那將是一個理想的居住地。靠近中央公園的旅游熱點也許可以彌補旅游客戶關系方面缺乏酒店的問題。

The Battery Park is a touristic hotspot in New York even though the low population it is in the top 3 districts and got the highest income score. The few People who can afford to live in the top of Manhatten have a high income furthermore there are lots of hotels located in and around the area which guarantees for a great mix of tourists and regular customers.

炮臺公園是紐約的旅游熱點，盡管人口稀少，是前三名地區，并且收入最高。少數有能力在曼哈頓上山生活的人收入較高，此外，該地區及周邊地區還有許多酒店，這些酒店保證了游客和固定客戶的大量融合。

Tottenville the 4. place is in Staten Island too and has the lowest crime and the lowest unemployment rate of the top 15. The trust score is also one of the highest. Furthermore it has a high overall score and is pretty similar to South Beach.

排名第四的托特維爾也在史坦頓島，犯罪率最低，前15名的失業率最低。信任度也是最高的之一。此外，它的總體得分很高，與南海灘非常相似。

要做出一個主要決定： (There is one main decision to make:)

Manhatten or Staten Island

曼哈頓或史泰登島

8.結論 (8. Conclusion)

Purpose of this project was to identify districts which fits best to the diverse requirements of the customer. By evaluating and quantifying his imaginations with the weighted matrix it was possible to identify several districts which combines his requirements for the location of the store and personal living wishes.

該項目的目的是確定最適合客戶多樣化需求的地區。通過使用加權矩陣評估和量化他的想象力，可以確定幾個區域，這些區域結合了他對商店位置和個人生活愿望的要求。

For finding the perfect location we now have to go deeper and analyse the top 10 to 15 districts more detailed. We could compare specific neighbourhoods and add more detailed data like tourism frequency to finally find the perfect neighbourhood or even the best street for the store.

為了找到理想的位置，我們現在必須更深入地分析更詳細的前10至15個地區。我們可以比較特定的社區，并添加更多詳細數據(例如旅游頻率)，以最終找到商店的理想社區甚至最佳街道。

翻譯自: https://medium.com/@patrich94/new-york-city-the-battle-of-the-neighbourhoods-73d10987cb