Here Is What I’ve Learned in 2 Years as a Data Scientist
It has been 2 years since I started my data science journey. Boy, what a roller coaster ride it has been!
There were many highs and lows, and of course, countless cups of coffee and sleepless nights.
I failed a lot, learned a lot, and of course, grew a lot as a data scientist along the way.
Throughout these 2 years, from writing on Medium, speaking at meetups and workshops, sharing my experience on LinkedIn, and consulting for clients on data science projects, to my current role teaching data science, I have found joy and fulfilment in sharing and teaching to help others in data science and make an impact.
At the end of the day, it all boils down to one simple fact: I’m moving towards my mission of making data science accessible to everyone.
If you’re interested, feel free to check out my previous LinkedIn post on why I decided to transition from data scientist to data science instructor (a.k.a. teacher).
In this article, for the first time, I’ll consolidate everything I’ve learned and condense it into 5 lessons from my 2 years as a data scientist.
If you’re just starting out in data science and wondering what to learn…
Or you’re looking for a job in data science…
Or you’re already working in the data science space…
I hope you’ll find these 5 lessons helpful!
Enough talking… Let’s get started!
5 Lessons I’ve Learned in 2 Years as a Data Scientist
1. Storytelling, NOT Presentation.
One of the most profound questions I’ve ever been asked in my data science career came from a great senior data scientist:
“Admond, what’s the story that we are gonna tell in the meeting later?”
The first time I heard this question, I was stunned for a second.
He didn’t ask what slides I’d prepared.
He didn’t ask what I was gonna share.
He didn’t ask what results I was gonna present.
None of that.
To be honest with you, I didn’t even understand why he placed so much emphasis on telling stories instead of presenting the facts we already had.
Before I began to appreciate the importance of telling stories, I made tons of mistakes.
Either stakeholders didn’t understand what I was saying, or the insights failed to convince and motivate them to take action.
Once I decided to improve my storytelling skills…
Once I started focusing on telling stories…
Things changed, for real.
Stakeholders and non-technical bosses began to understand what I was delivering, without my bombarding them with technical jargon and results. They took action.
Facts tell, but stories sell.
If you want to be a good data scientist, focus on technical skills.
If you want to be a great data scientist, focus on storytelling skills.
So… How To Learn Storytelling Skills?
Want to learn storytelling skills? Learn from Vox.
Because they are masters of storytelling. Seriously.
They have always been able to explain complex issues or ideas in an engaging and understandable way.
If this is the first time you’ve heard of Vox, check out their YouTube video below.
Just observe how they explain societal phenomena and issues through intuitive, easy-to-follow storytelling.
This is exactly the skill you need when presenting insights or delivering a core message to your audience.
Video: Vox, “How wildlife trade is linked to coronavirus”
2. Data Is Messy, Embrace It.
Forget about having Kaggle-like data in your real working environment, because most of the time you won’t have clean data.
Or worse, sometimes you don’t even have data to begin with, or you’re not sure where to get or query it because it is scattered everywhere.
Data collection and data integrity are among the most important steps in any data science project, yet many junior data scientists overlook them.
The reality is that you need to know where to get your data based on the business requirements and the existing data architecture.
You might breathe a sigh of relief after you’ve got the data, but this is where the hard part begins: data integrity.
You need to check the collected data thoroughly, asking hard questions and seeking understanding from different stakeholders, to see whether it makes sense.
Without correct and accurate data in the first place, all of our data cleaning, EDA, machine learning model building, and deployment are meaningless.
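The integrity questions above usually turn into a handful of concrete checks. Below is a minimal sketch of three of them (required fields present, values inside plausible bounds, no duplicate IDs) in plain Python with no third-party libraries. The function `check_integrity`, the `orders` records, the column names and the bounds are all hypothetical; in a real project the bounds and the uniqueness key should come from your conversations with stakeholders, not from the code.

```python
from collections import Counter


def check_integrity(rows, required, numeric_ranges, key):
    """Collect basic data integrity issues from a list of dict records.

    rows           -- list of dicts, one per record
    required       -- columns that must be present and non-empty
    numeric_ranges -- {column: (low, high)} plausible bounds, inclusive
    key            -- column that should uniquely identify a record
    """
    issues = {"missing": [], "out_of_range": [], "duplicate_keys": []}
    for i, row in enumerate(rows):
        # 1. Required fields must exist and must not be None or empty strings.
        for col in required:
            if row.get(col) in (None, ""):
                issues["missing"].append((i, col))
        # 2. Bounded fields must be numbers inside the agreed bounds
        #    (non-numeric values and NaN fail this check too).
        for col, (low, high) in numeric_ranges.items():
            value = row.get(col)
            if value in (None, ""):
                continue  # reported under "missing" if the column is required
            if not isinstance(value, (int, float)) or not (low <= value <= high):
                issues["out_of_range"].append((i, col, value))
    # 3. The key column must not repeat (reported in first-seen order).
    counts = Counter(row.get(key) for row in rows)
    issues["duplicate_keys"] = [k for k, n in counts.items() if n > 1]
    return issues


# Hypothetical order records with a few deliberate problems.
orders = [
    {"order_id": 1, "customer": "A", "amount": 120.0},
    {"order_id": 2, "customer": "", "amount": 75.5},
    {"order_id": 2, "customer": "B", "amount": -30.0},
    {"order_id": 3, "customer": "C", "amount": 99999.0},
]
report = check_integrity(
    orders,
    required=["order_id", "customer", "amount"],
    numeric_ranges={"amount": (0, 10_000)},
    key="order_id",
)
for kind, found in report.items():
    print(kind, found)
```

Each flagged row is best treated as a question to take back to the data owner (is a negative amount a refund or an error?) rather than something to drop silently.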
3. Soft Skills > Technical Skills
One of the most common questions from data science beginners is this:
“What are the skills that I need to learn when starting out in data science?”
In my opinion, learning technical skills (programming, statistics, etc.) should be the priority when first starting out in data science.
Once we have a solid foundation in technical skills, we should focus more on building and improving our soft skills (communication, storytelling, etc.).
While this might seem a bit counter-intuitive compared with the usual way of learning data science skills, I truly believe in this approach.
Why?
You see, data scientists are problem solvers.
We don’t just write some code, build some fancy machine learning models, and call it a day.
From understanding a business problem and collecting and visualizing data, to prototyping, fine-tuning, and deploying models in real-world applications, every step requires teamwork, communication, and storytelling skills: to work with team members, to manage stakeholders’ expectations, and ultimately to drive business decisions and actions.
There is a famous quote:
“Without data, you’re just another person with an opinion.”
— W. Edwards Deming
To me, getting data is only the first step. What’s more important is how you use data to drive business decisions and actions that make a real impact. Here is my slightly modified version of that quote:
“Without storytelling skills, you’re just another person with data.”
You can perform the best data analytics in the world.
You can build the best machine learning model in the world.
You can also write the cleanest code in the world.
But if you can’t use your results to drive business decisions and actions, convincing people to use what you’ve built, your results will only reside in your PowerPoint slides without any real impact.
Sad, but true.
4. Interpretable Models Matter, A Lot.
For most businesses (unless you’re working at a cutting-edge technology company), fancy or complex models are typically not the first choice for analytics or predictions.
Your boss and stakeholders want to understand what’s going on behind your results.
Therefore, you need to be able to explain it.
For instance, what caused this anomaly to be detected? And why? Does it make sense in the business context? Why is the prediction what it is? What are the contributing factors to the prediction? Are our assumptions correct?
All of these questions essentially boil down to one simple question:
“What’s the pattern behind what we observe?”
Being able to understand what’s going on behind our models and results is crucial for convincing stakeholders to take action and, ultimately, for driving business decisions.
Large enterprises simply can’t afford to deploy a black-box model in the real world and let it run wild without understanding how it works or when it fails.
This is exactly why simple models such as decision trees and logistic regression are still widely used in industry today.
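One way to see why logistic regression stays popular: its log-odds are just a bias plus a sum of weight × feature terms, so any single prediction can be split into per-feature contributions that answer "What are the contributing factors to the prediction?" directly. The sketch below trains it with plain batch gradient descent in standard-library Python rather than with a library such as scikit-learn. The churn scenario, the feature names and every data point are invented for illustration, and the hyperparameters (`lr`, `epochs`) are untuned assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and a bias with batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one row: predicted probability minus label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def explain(x, w, b, names):
    """Return the predicted probability and each feature's log-odds contribution.

    Contributions are w_j * x_j, i.e. measured against a feature value of 0,
    sorted so the most influential feature comes first.
    """
    contributions = [(name, wj * xj) for name, wj, xj in zip(names, w, x)]
    prob = sigmoid(b + sum(c for _, c in contributions))
    return prob, sorted(contributions, key=lambda nc: abs(nc[1]), reverse=True)


# Hypothetical churn data: [support_tickets_last_month, tenure_years] -> churned?
names = ["support_tickets", "tenure_years"]
X = [[0, 5], [1, 4], [0, 3], [1, 6], [2, 5], [3, 2],    # stayed
     [4, 1], [3, 0.5], [5, 2], [2, 1], [4, 3], [1, 4]]  # churned
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

w, b = train_logistic(X, y)
prob, ranked = explain([5, 0.5], w, b, names)
print(f"churn probability: {prob:.2f}")
for name, contribution in ranked:
    print(f"  {name}: {contribution:+.2f} log-odds")
```

Because contributions are measured against a feature value of zero, standardizing or centring features first (not done here) makes each one read as "relative to a typical customer", which is usually the framing stakeholders want. A shallow decision tree offers similar transparency in the form of readable if/else rules.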
5. Always See The Big Picture
I made this huge mistake when I was first starting out in data science.
I focused too much on code and errors and somehow lost sight of the big picture that truly mattered: end-to-end pipeline integration in production and how the solution performed in the real world.
In other words, I was so fixated on the technical side that I over-optimized my code and models without making a real impact on the overall project or business.
Unfortunately, I learned this the hard way.
Fortunately, I now use that lesson to remind myself to always see the big picture.
Hopefully, you’ll begin to realize the importance of seeing the big picture in your day-to-day work as a data scientist.
The first step is to understand the business domain and the problems you’re solving.
Be clear about what you and your team aim to achieve in a project, and understand how your role fits into the big picture and how the different small pieces work together toward common goals.
Final Thoughts
Thank you for reading.
My data science journey has definitely been a tough one, but I enjoyed the ride and learned a lot along the way.
And I’m still learning each and every day.
I hope you found this article helpful in some way and will apply these lessons in your work as a data scientist.
Now that I’ve become a data science instructor, you can expect more data science content from me in the future to help you learn and get into this field.
Check out my other articles if you want to learn more about data science.
If you’re interested in learning how to get into data science, feel free to check out this article, How To Go Into Data Science, where I compiled a list of common questions (or challenges) faced by data science beginners and answered them with guidance.
I hope you enjoyed reading this article, and I look forward to having you as part of the data science community.
Remember, keep learning and never stop improving.
As always, if you have any questions or comments, feel free to leave your feedback below, or reach me on LinkedIn. Until then, see you in the next post! 😄
About the Author
As a data scientist and data science instructor, Admond Lee is on a mission to make data science accessible to everyone. He helps companies and digital marketing agencies track and achieve marketing ROI with actionable insights through innovative attribution and a data-driven approach.
His story and data science work have been featured in various publications, including KDnuggets, Medium, Tech in Asia, AI Time Journal, and business magazines. He has also been invited to speak at various workshops and meetups.
With his expertise in advanced social analytics and machine learning, Admond aims to bridge the gap between digital marketing and data science.
Check out his website if you want to learn more about Admond’s story, his data science services, and how he can help you in the marketing space using data science.
You can connect with him on LinkedIn, Medium, Twitter, and Facebook.
Original article: https://towardsdatascience.com/here-is-what-ive-learned-in-2-years-as-a-data-scientist-e13a24a74a72