Sources Agree: Data Science Skills Go Beyond Data
As data science enthusiasts know, there’s a lot more to excelling in the field than technical ability. Data professionals need a wide range of skills that extend well beyond data manipulation and analysis.
This week’s episode of the Alter Everything podcast showcases Carlene Jones, data and analytics consultant, and Nynne Haagensen, a data enthusiast who worked with Carlene. Their conversation reinforces that people skills, communication abilities and business savvy are all critical to success in data science and analytics.
What are all those skills? To explore online conversations around this skill set, I decided to gather and analyze some data, naturally, inspired by this fantastic topic modeling trilogy (part 3 is coming soon!). This seemed like a fun opportunity to apply topic modeling with Alteryx Designer to what folks have discussed out there on the interwebz about the data science skill set. (Topic Modeling is part of the Alteryx Intelligence Suite, which includes some new text mining tools.)
Gathering Opinions
I built a workflow in Designer that scraped 64 articles tagged “skills” from the data science site KDnuggets and cleaned up the text. I also used Text Pre-processing to quickly prep the remaining text before sending it into the Topic Modeling and Word Cloud tools. The word cloud below gives you a preview of some of the prominent ideas, but topic modeling lets us dig a little deeper.
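For readers without Designer, the prep step can be approximated in a few lines of Python. This is only a minimal sketch, not what the Text Pre-processing or Word Cloud tools actually do internally: the stop-word list, the `preprocess` and `word_frequencies` helpers, and the two sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real list is far longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "that", "it", "with"}

def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

def word_frequencies(documents):
    """Count cleaned tokens across all documents; these counts are what a word cloud sizes words by."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts

# Made-up sample text standing in for the scraped articles.
sample_docs = [
    "Data scientists need communication skills and domain expertise.",
    "Communication skills matter as much as machine learning skills.",
]
freqs = word_frequencies(sample_docs)
print(freqs.most_common(2))
```

The same token lists can be fed straight into a topic model, which is why the cleaning happens once, up front.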
I asked the Topic Modeling tool to identify three dominant topics in the text of these articles. You should definitely read all the details on how this process works, but in a nutshell: This is an unsupervised approach, meaning that I’m not specifying what I want the model to find in advance, but rather letting it identify on its own the key ideas in the text of the articles. This tool assumes that each chunk of text I feed it is a mixture of those three different topics, since I asked for three. It figures out how those topics are represented in each chunk based on the probability that certain words occur together. It doesn’t give a name to the topics it finds, though; it needs us to figure out what its groupings of words mean.
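To make the "mixture of topics" idea concrete, here is a rough sketch of one common unsupervised approach, latent Dirichlet allocation (LDA), fitted with collapsed Gibbs sampling. The article does not say which algorithm the tool runs, so treat LDA as an assumption; the function names, the `alpha` and `beta` smoothing values, and the toy documents are likewise hypothetical.

```python
import random

def fit_lda(docs, num_topics=3, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [[0] * vocab_size for _ in range(num_topics)]
    topic_totals = [0] * num_topics
    assignments = []

    # Start by assigning every word occurrence to a random topic.
    for d, doc in enumerate(docs):
        doc_assign = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][word_id[w]] += 1
            topic_totals[k] += 1
        assignments.append(doc_assign)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove this occurrence from the counts before resampling it.
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][wid] -= 1
                topic_totals[k] -= 1
                # Weight each topic by how common it is in this document
                # and how strongly it favors this word.
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][wid] + beta)
                    / (topic_totals[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][wid] += 1
                topic_totals[k] += 1

    # Smoothed topic proportions for each document; each row sums to 1.
    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in doc_topic_counts[d]]
        for d, doc in enumerate(docs)
    ]
    return doc_topic, topic_word_counts, vocab

def top_words(topic_word_counts, vocab, n=3):
    """List the n most frequent words assigned to each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word_counts
    ]

# Toy tokenized chunks (made up for illustration).
docs = [
    ["python", "sql", "market", "value", "demand"],
    ["learning", "machine", "team", "question", "learning"],
    ["problem", "solve", "model", "code", "math"],
    ["python", "code", "problem", "learning"],
]
doc_topic, topic_word_counts, vocab = fit_lda(docs)
for words in top_words(topic_word_counts, vocab):
    print(words)
```

Note that the output is just lists of words per topic and a row of proportions per chunk. As the article says, naming the topics is left to the human reader.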
Technical Skills and More
The topic model that results from this analysis is open to interpretation, but here’s what I see. Topic 1 appears to describe the role of the data analyst or data scientist within an organization, with some technical terms mentioned (Python, SQL, Hadoop). However, it also includes concepts like “value,” “market” and “demand” that could reflect the business expertise a skilled data professional brings to the organization. Some of the chunks of original text that scored highly for the presence of Topic 1 include:
- “… a data scientist doesn’t just possess technical skills, they also have domain expertise”
- “Knowing the basic principles of data science and machine learning is still required, but knowing how to apply them to your problem is even more valuable”
- “Remember, my goal wasn’t to invent a new machine learning algorithm; it was to demonstrate to a client the potential machine learning had or didn’t have for their business”
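Picking out chunks that "scored highly" for a topic amounts to sorting the chunks by that topic's share in each chunk's mixture. A minimal sketch, assuming the model exposes one row of topic proportions per chunk; the `top_chunks_for_topic` helper, the chunk labels, and the scores below are invented for illustration and are not values from this analysis.

```python
def top_chunks_for_topic(chunks, doc_topic, topic, n=2):
    """Return the n chunks with the largest proportion of the given topic."""
    ranked = sorted(
        zip(chunks, doc_topic),
        key=lambda pair: pair[1][topic],
        reverse=True,
    )
    return [(chunk, weights[topic]) for chunk, weights in ranked[:n]]

# Hypothetical chunks and topic proportions (one row per chunk, one column per topic).
chunks = ["chunk A", "chunk B", "chunk C"]
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.5, 0.3, 0.2],
]

# Index 0 corresponds to "Topic 1" in the article's numbering.
for chunk, score in top_chunks_for_topic(chunks, doc_topic, topic=0):
    print(chunk, score)
```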
Topic 2 has “learning” as its most relevant term and “machine” in second place, so a quick conclusion would be that Topic 2 reflects the prominence of machine learning skills for data science. However, a closer review suggests that maybe “learning” could also be interpreted in another way. Some of the chunks of text that scored highly for Topic 2 include:
- “Apart from classroom learning, you can practice what you learned in the classroom by building an app, starting a blog, or exploring data analysis to enable you to learn more”
- “Communication problems are harder than technical problems”
- “If you’re stuck on a problem, sitting and staring at code may solve it or may not. Instead talk it out in language with a teammate”
Some of the other terms included in this topic are “question,” “understand,” “team,” “approach” and “offer.” This topic seems to have a theme of ongoing learning and skill development for the data professional.
Finally, Topic 3 looks like it represents the intersection of technical skills and problem-solving, with terms “problem,” “solve,” “think,” “model,” and “code” showing up as highly relevant. “Math” also appears here, as do “research” and “concept,” suggesting some of the more specific intellectual skills useful in the data fields.
- “Machine learning can seem magical. And in some cases it is. But in the cases it’s not, it’s important to acknowledge it.”
- “There are too many data points for a human to make sense of it. It is a textbook case of death by information overload”
- “Communication skills” and “data visualization”
- “Spend time thinking about the products of the company, how your job impacts the core of the business, and a few ideas of how you would do your job to solve an important problem”
- “It’s perfectly fine if you’re overwhelmed by the skills needed (So am I!)”
The Human Context for Analysis
Yes, it is a lengthy list of skills indeed! This quick analysis suggests that in discussions of data science skills, there is a recurring emphasis not just on technical skills, but on the capabilities that put data analyses into human and business contexts. The best model or analysis doesn’t mean much without humans empowered to figure out the right problem-solving strategy, the questions to ask, the methods to use and the interpretation of their results.
Learn more about how Carlene and Nynne view the skills needed for a data-driven company culture and professional success in this week’s Alter Everything episode.
Originally published on the Alteryx Community.
Translated from: https://towardsdatascience.com/sources-agree-data-science-skills-go-beyond-data-4cd9057960c4