Investing in a New Eatery in California: A Data-Driven Approach

Published: 2023/11/29

“It is difficult to make predictions, especially about the future.”

~Niels Bohr

Everything is better interpreted through data, and data-driven decision making is crucial for success in any industry.

This has been true since time immemorial. The difference now is that we have developed a healthier outlook toward data, we have far more data available than ever before, and we have at our disposal computing power that was previously unimaginable.

Given this, computing power and data should be leveraged to make better decisions and solve business problems.

In my project, I chose to provide recommendations for opening new eateries in California. The output is a concrete list of investment recommendations: eatery types (such as Japanese restaurant, dessert shop, etc.) paired with the counties in which to open them.
In this post, I will go over the full process of a Data Science project.

Data Sources

To solve this problem, data from four sources were used:

Geographical location data from the dataset titled “California Counties” on the California Open Data Portal, provided by the Government of California.
The Foursquare API, for information about established restaurants and related venue details.
County-wise population data from the US Government Census site.
County-wise Real GDP data provided by the Bureau of Economic Analysis, U.S. Department of Commerce.
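Combining sources like these usually comes down to joining them on the county name. The following is a minimal sketch of that step with pandas, not the author's actual code. The county names, column names, and all figures are made up for illustration.

```python
import pandas as pd

# Hypothetical, made-up values; NOT the real census, BEA, or location data.
locations = pd.DataFrame({
    "county": ["Alpha", "Beta", "Gamma"],
    "latitude": [34.0, 37.3, 39.9],
    "longitude": [-118.2, -121.9, -120.8],
})
population = pd.DataFrame({
    "county": ["Alpha", "Beta", "Gamma"],
    "population": [1_000_000, 250_000, 40_000],
})
gdp = pd.DataFrame({
    "county": ["Alpha", "Beta", "Gamma"],
    "real_gdp": [90_000_000, 18_000_000, 2_500_000],
})

# Join the three tables on the shared county name so each row describes one county.
# validate="one_to_one" raises an error if a county appears twice in either table.
counties = (
    locations
    .merge(population, on="county", how="inner", validate="one_to_one")
    .merge(gdp, on="county", how="inner", validate="one_to_one")
)
print(counties)
```

An inner join silently drops counties whose names do not match across sources, so in practice it is worth checking the row count after each merge.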
Exploratory Data Analysis

After cleaning the data (which, as the saying goes, is most of a data scientist’s job), meaningful insights were gained from it.
City Centers of California’s Counties, source: Author
It was also found that the GDPs of the counties are strongly correlated with their populations. This makes counties with both high GDP and high population attractive destinations for investment.
Strong Correlation Between GDP and Population of Californian Counties, source: Author
Number of Eateries in Each County (capped at 50 by Foursquare), source: Author
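One way to quantify a relationship like the GDP and population one is the Pearson correlation coefficient. The sketch below uses invented numbers and assumed column names (`population`, `real_gdp`). It shows the calculation only and does not reproduce the project's result.

```python
import pandas as pd

# Made-up numbers for illustration only; units of real_gdp are hypothetical.
counties = pd.DataFrame({
    "population": [40_000, 250_000, 900_000, 3_000_000, 10_000_000],
    "real_gdp": [2.0, 15.0, 60.0, 210.0, 700.0],
})

# Series.corr uses the Pearson coefficient by default.
r = counties["population"].corr(counties["real_gdp"])
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 indicates a strong positive linear relationship. With one very large county in the data, a single point can dominate the coefficient, so a scatter plot is a useful companion check.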
With the information provided by the Foursquare API, a list of the ten most common venue types was obtained for each county. This list is used later in the decision making.
Five Rows
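A ranked list of common venue types per county can be built by counting categories and pivoting the ranks into columns. This is a minimal sketch under assumptions: the `county` and `category` column names, the helper `most_common_venues`, and the sample records are all hypothetical, and the real venue data would come from the Foursquare API.

```python
import pandas as pd

# Hypothetical venue records standing in for API results.
venues = pd.DataFrame({
    "county": ["Alpha"] * 6 + ["Beta"] * 4,
    "category": [
        "Pizza Place", "Pizza Place", "Pizza Place", "Cafe", "Cafe", "Bakery",
        "Diner", "Diner", "Taco Place", "Cafe",
    ],
})

def most_common_venues(df: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Return one row per county with its top_n categories ranked by frequency."""
    counts = (
        df.groupby(["county", "category"]).size()
        .reset_index(name="count")
        # Ties are broken alphabetically so the ranking is deterministic.
        .sort_values(["county", "count", "category"], ascending=[True, False, True])
    )
    counts["rank"] = counts.groupby("county").cumcount() + 1
    top = counts[counts["rank"] <= top_n]
    # Pivot so that each rank (1, 2, 3, ...) becomes a column.
    return top.pivot(index="county", columns="rank", values="category")

ranked = most_common_venues(venues, top_n=3)
print(ranked)
```

The function defaults to `top_n=10` to match the ten venue types mentioned in the text. The call uses 3 only to keep the sample output small.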
Applying Machine Learning Model

Choosing Algorithm

The business problem is to find eatery types and locations to invest in. The data is not labeled, which makes this a classic application of unsupervised learning.

The aim is neither to predict a numeric value nor to assign a class. Nor is it to hand someone a single investment recommendation. The goal is to give the stakeholders a list of likely venues.

This can be achieved by clustering the counties based on GDP and population. KMeans clustering is a simple, well-suited statistical learning algorithm for this task.

Scikit-learn’s implementation of the KMeans clustering algorithm was used.
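Clustering on two features with scikit-learn's `KMeans` might look like the sketch below. The data points are invented, and standardizing the features with `StandardScaler` is an assumption of this example, since the post does not say whether the features were scaled.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up (population, real GDP) pairs forming four well-separated groups.
X = np.array([
    [20_000, 1.0], [35_000, 1.5], [50_000, 2.5],              # small counties
    [1_400_000, 70.0], [1_500_000, 80.0], [1_600_000, 85.0],  # mid-range counties
    [3_100_000, 260.0], [3_300_000, 280.0],                   # large counties
    [10_000_000, 700.0],                                      # a single outlier
])

# Assumption: put both features on a comparable scale so that neither
# dominates the Euclidean distances KMeans relies on.
X_scaled = StandardScaler().fit_transform(X)

# n_init=10 reruns the algorithm from 10 different starts and keeps the best;
# random_state makes the run reproducible.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)
print(labels)
```

The numeric label given to each cluster is arbitrary and can change between runs or library versions. Only the grouping is meaningful.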

Choosing k

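To choose the best k for clustering, the elbow method was employed: the model is fitted for a range of k values, the inertia (within-cluster sum of squared distances) is plotted against k, and the k at which the curve stops dropping sharply is selected.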
Inertia vs. Values of k Plot, source: Author
As is evident from the graph, the best k is 4. Hence, the clustering algorithm was applied with k = 4, forming 4 clusters of counties based on their population and GDP.
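The elbow method can be sketched as a loop that records `inertia_` for each candidate k. The data here is synthetic (four tight blobs generated with NumPy) and is not the county data. It is only meant to show where the "elbow" comes from.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data: four tight blobs around hypothetical centers.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((25, 2)) for c in centers])

# Fit KMeans for each k and record the inertia
# (sum of squared distances of samples to their nearest cluster center).
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 2))
```

Plotting these values against k gives the elbow curve. In this synthetic case the inertia falls steeply up to k = 4 and only marginally afterwards, which is the pattern the method looks for.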

Results

Four clusters of counties were formed. Upon examination, Los Angeles County was found to form a cluster by itself (cluster 2) because its GDP and population are far higher than those of any other county. Counties in another cluster (cluster 3) have high GDP and high population, but nowhere close to those of Los Angeles County; Orange, Santa Clara, and San Diego are the three counties in this cluster. Counties with low GDP and low population, such as Plumas, Nevada, and Sierra, form one cluster (cluster 1), and counties with mid-range GDP and population, such as Sacramento and Riverside, form another (cluster 4).
Resulting Clusters on a Map, source: Author
Clusters 2 and 3 contain counties with a high population and high GDP. In these counties, investing in any type of eatery is likely to be profitable, though it is advisable to choose a type that is not among the county's three most common venues.

In cluster 4, the population and GDP of the counties are higher than those of the counties in cluster 1 but lower than those of the counties in clusters 2 and 3. These counties should be considered after the counties in cluster 2 and cluster 3, in that order. Investment should go into uncommon eatery types so that they face less competition.

Cluster 1 is dominated by counties with lower populations. These counties should be considered only after the counties in clusters 2, 3, and 4, and they are the least advised destinations for investment. Investing in the most common eatery types there is not advised at all.

After the investment options were laid out, a table was built for each cluster listing, for every county, the eatery types that are not among its three most common types.
Table for Counties and Investment Recommendations in Cluster 3
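The recommendation tables can be sketched as two small steps: drop each county's three most common venue types, then group counties by cluster in the priority order described above (2, 3, 4, 1). The county names, venue rankings, and the helpers `recommend` and `build_tables` are hypothetical and used only for illustration.

```python
from typing import Dict, List

# Priority order stated in the text: cluster 2 first, then 3, then 4, then 1.
CLUSTER_PRIORITY = [2, 3, 4, 1]

def recommend(ranked_venues: List[str], skip_top: int = 3) -> List[str]:
    """Drop the skip_top most common venue types; keep the rest in rank order."""
    return ranked_venues[skip_top:]

# Hypothetical counties with made-up cluster labels and venue rankings
# (most common type first).
counties = {
    "County A": {"cluster": 1, "venues": ["Diner", "Cafe", "Pizza Place", "Bakery"]},
    "County B": {"cluster": 3, "venues": ["Cafe", "Taco Place", "Burger Joint",
                                          "Dessert Shop", "Japanese Restaurant"]},
    "County C": {"cluster": 2, "venues": ["Taco Place", "Cafe", "Pizza Place",
                                          "Ramen Restaurant"]},
    "County D": {"cluster": 4, "venues": ["Pizza Place", "Diner", "Cafe",
                                          "Thai Restaurant"]},
}

def build_tables(data: Dict[str, dict]) -> Dict[int, Dict[str, List[str]]]:
    """Build one table per cluster, ordered by investment priority."""
    tables: Dict[int, Dict[str, List[str]]] = {}
    for cluster in CLUSTER_PRIORITY:
        tables[cluster] = {
            name: recommend(info["venues"])
            for name, info in data.items()
            if info["cluster"] == cluster
        }
    return tables

tables = build_tables(counties)
for cluster, table in tables.items():
    print(cluster, table)
```

Because Python dictionaries preserve insertion order, iterating over `tables` yields the clusters from most to least preferred.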
Full Report Link: PDF in GitHub Repository
Notebook with Full Code: NB Viewer
Feel free to comment, provide feedback, or criticize.
Connect with me on LinkedIn or Twitter.
This blog post is related to the Applied Data Science Capstone Project offered by IBM through Coursera.
Translated from: https://medium.com/beginning-data-science/investing-in-a-new-eastery-in-california-a-data-driven-approach-e91229e0289e
