What Do You Need to Know to Become a Data Scientist?
Data science is one of the new, emerging fields with the power to extract useful trends and insights from both structured and unstructured data. It is an interdisciplinary field that uses scientific research, algorithms, and graphs to uncover the patterns within the chaos, then uses those patterns to create amazing things.
As a data scientist, you need to know some basic mathematics and programming, and have a keen eye for finding patterns and trends. Because the field is interdisciplinary, data scientists find themselves working across many different areas of technology.
Before we get into what you need to become a data scientist, let’s first talk about what a job in data science entails.
What do data scientists do?
Working in data science is like riding a roller coaster. Some parts of the job are slow and steady, while others are fast and crazy. Still others are like going around a loop: you repeat the same things over and over again.
Whenever a data scientist starts a new project, they will go through a known set of steps to get to their final conclusion.
Any data science project starts with data and ends with data, and in between, the magic happens.
If you search the internet, you will find many articles that divide a data science project into different numbers of steps. Regardless of the count, however, the core stages are the same. For me, every data science project goes through 6 main steps.
(Image made with Canva)

Step №1: Understand the data background.
Whenever we start a data science project, we are usually trying to solve a problem, improve performance, or predict future trends. To do any of that, we first need to understand the history of the data source and how the data is produced.
Step №2: Collect data.
Once we understand the background of the data, we need to collect it so we can start working on it. Depending on the nature of the project, there are different ways to gather data. We can get it from a database, from an API, or, if you're a beginner or just working on your skills, from an open data source. Another option is to scrape the web for publicly available information.
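Here is a minimal sketch of collecting data from a database with Python's built-in `sqlite3` module. The in-memory database, the `sales` table, and the `collect_rows` helper are hypothetical stand-ins for a real data source. With MySQL or PostgreSQL you would use that database's own driver instead, but the query-and-fetch pattern is similar.

```python
import sqlite3


def collect_rows(conn, query, params=()):
    """Run a query and return each row as a dict keyed by column name (hypothetical helper)."""
    cursor = conn.execute(query, params)
    columns = [col[0] for col in cursor.description]  # column names of the result set
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Hypothetical in-memory database standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2020-01-01", "north", 120.0),
        ("2020-01-01", "south", 95.5),
        ("2020-01-02", "north", 130.0),
    ],
)

# Parameterized query: the "?" placeholder keeps input values out of the SQL string.
rows = collect_rows(
    conn,
    "SELECT day, amount FROM sales WHERE region = ? ORDER BY day",
    ("north",),
)
print(rows)
```

Returning plain dicts keeps the collected rows easy to hand to the cleaning step or to load into a library such as pandas.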
Step №3: Clean and transform the data.
Most of the time, if not always, the data we collect from the source is raw. Raw data is not suitable for algorithms or for the later steps of the project. So the first thing we do when we get new data is clean it up, categorize and tag it, and make sense of it.
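Here is a minimal sketch of these cleaning steps in plain Python, applied to a small list of hypothetical raw records. It does five things:

- trims whitespace
- normalizes capitalization
- converts ages to numbers, treating blanks and values like `n/a` as missing
- drops records that become duplicates after normalization
- tags each record with a coarse age group

The field names and the 65-year cut-off are illustrative assumptions. On real projects, a library such as pandas would typically do this work.

```python
# Hypothetical raw records as they might arrive from a source.
raw = [
    {"name": " Alice ", "age": "34", "city": "new york"},
    {"name": "Bob", "age": "", "city": "Boston"},
    {"name": "alice", "age": "34", "city": "New York "},
    {"name": "Carol", "age": "n/a", "city": "boston"},
    {"name": "Dan", "age": "71", "city": "Chicago"},
]


def clean(records):
    """Normalize text fields, parse ages, drop duplicates, and tag an age group."""
    seen = set()
    cleaned = []
    for rec in records:
        name = rec["name"].strip().title()  # " Alice " / "alice" -> "Alice"
        city = rec["city"].strip().title()  # "new york" -> "New York"
        age_text = rec["age"].strip()
        age = int(age_text) if age_text.isdigit() else None  # blank or "n/a" -> missing
        key = (name, city)
        if key in seen:  # duplicate once normalized
            continue
        seen.add(key)
        # Tag with a coarse, hypothetical age group.
        if age is None:
            group = "unknown"
        elif age < 65:
            group = "under 65"
        else:
            group = "65 and over"
        cleaned.append({"name": name, "age": age, "city": city, "age_group": group})
    return cleaned


cleaned_records = clean(raw)
for record in cleaned_records:
    print(record)
```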
Step №4: Analyze and explore the data.
Once our data is clean and structured, we can start analyzing it and attempt to find patterns in it. This can be done by visualizing the data and looking for repetitions or spikes.
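Plotting the data is usually the first way to spot spikes, but a simple numeric check can flag them too. The sketch below uses only Python's `statistics` module. It marks values whose z-score (distance from the mean, measured in standard deviations) exceeds a threshold. The daily-visit numbers and the threshold of 2 are assumptions for illustration. A z-score rule is only one of many ways to define a spike.

```python
from statistics import mean, pstdev


def find_spikes(values, threshold=2.0):
    """Return the indexes of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:  # all values identical: nothing stands out
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily website visits with one unusual day.
daily_visits = [100, 102, 98, 101, 99, 250, 100, 97]
print(find_spikes(daily_visits))
```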
Step №5: Model the data.
We finally reach the magical step! After exploring and analyzing our data, it's time to feed it into a machine learning algorithm and use the resulting model to predict future outcomes. This is truly the power of data science.
Step №6: Visualize and communicate results.
The final, and most crucial, step of the process is to visualize and present the project's results effectively.
Once those steps are done, a new project comes in, and it’s time to start all over again.
What skills are needed for data science?
Every step of the data project lifecycle requires a specific set of knowledge and skills. To make the connection clearer, I will pair each phase of the project with the skills needed to complete it.
To investigate the data background, you only need a curious mind, a pen, and paper. You sit down and either ask the data's source some questions to understand the data better or, if it is open data, read the documentation that accompanies it.
To perform data collection, you will need to know how to communicate with databases and APIs. Understanding the basic structure and mechanics of these techniques will make data collection a breeze. If you're using an open-source dataset, learning how to search for datasets and knowing some good sources can make a huge difference.
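For APIs, the usual pattern is to build a request URL with query parameters, send the request, and decode the JSON response. The sketch below uses only Python's standard library. The endpoint URL, its parameters, and the `"results"` key in the response are hypothetical. Check the documentation of the real API you are using for its actual URL, parameters, authentication, and response format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; replace it with the real API's documented URL.
BASE_URL = "https://api.example.com/v1/measurements"


def build_url(base, **params):
    """Append URL-encoded query parameters to a base URL."""
    return f"{base}?{urlencode(params)}"


def parse_records(payload):
    """Decode a JSON response body; assumes the records sit under a "results" key."""
    return json.loads(payload)["results"]


def fetch_records(url, timeout=10):
    """Send a GET request and return the parsed records (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_records(response.read().decode("utf-8"))


url = build_url(BASE_URL, city="Berlin", limit=2)
print(url)

# Parse a sample body offline instead of calling the hypothetical API.
sample = '{"results": [{"city": "Berlin", "pm25": 12.5}, {"city": "Berlin", "pm25": 9.0}]}'
records = parse_records(sample)
print(records)
```

Keeping parsing separate from fetching means the parsing logic can be checked without network access.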
To perform data cleaning, you need a good knowledge of basic data mining and cleaning techniques. You will need to tag your data and categorize it properly. You can also use regular expressions to find misspellings, or use special tools built to make this process easier.
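As a small example of using regular expressions during cleaning, the sketch below maps misspelled or abbreviated city names to one canonical spelling. The patterns and city names are hypothetical. In practice you would build them from the variants you actually find in your data.

```python
import re

# Hypothetical rules: each pattern matches common variants of one canonical value.
CANONICAL_CITIES = [
    (re.compile(r"new\s*y(?:or|ro)k|n\.?\s*y\.?", re.IGNORECASE), "New York"),
    (re.compile(r"san\s+fran[cs]isco|s\.?\s*f\.?", re.IGNORECASE), "San Francisco"),
]


def normalize_city(value):
    """Return the canonical name if a pattern matches the whole value, else the trimmed input."""
    text = value.strip()
    for pattern, canonical in CANONICAL_CITIES:
        if pattern.fullmatch(text):  # whole-string match avoids rewriting partial hits
            return canonical
    return text


messy = [" ny ", "New Yrok", "N.Y.", "san fransisco", "SF", "Boston"]
print([normalize_city(city) for city in messy])
```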
To perform data exploration, you will need some basic statistics and probability theory. Some knowledge of data visualization and experimental design can help you a lot at this stage.
To perform data modeling, you will need to know a few machine learning algorithms and how they work. You don’t need to understand everything 100%; if you can use them correctly and apply them to the correct form of data, you will be fine.
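Understanding how an algorithm works often just means being able to sketch it. As an illustration, here is k-nearest neighbours, a classic classification algorithm, written from scratch. A new point gets the majority label among the `k` training points closest to it. The data, labels, and `k=3` are hypothetical choices. In practice you would use a tested implementation such as scikit-learn's `KNeighborsClassifier`.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with its label.
    by_distance = sorted(
        (dist(point, query), label) for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # Most frequent label among the k nearest neighbours.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D training data with two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(points, labels, (1.5, 1.5)))
print(knn_predict(points, labels, (6.5, 6.5)))
```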
Finally, to communicate your results, you can draw on the essentials of science communication: know your audience and their background knowledge, and choose simple words to explain complex concepts. Effective data visualization can also make or break your project at this stage.
Technical tools
Some of the skills we just talked about require a programming language, an algorithm, or special packages.
- Programming languages: Python, R.
- For handling and creating databases: MySQL, PostgreSQL, MongoDB, or SQLite (via Python's built-in sqlite3 module). If you're using R, you can use RMySQL.
- Packages for data exploration and transformation: pandas, NumPy, or SciPy in Python; ggplot2 and dplyr in R.
- Python libraries for visualization: Matplotlib, Plotly, Pygal.
- Basic machine learning packages: scikit-learn in Python and caret in R.
Conclusion
You don’t need to know everything about statistics, math, or machine learning, or be a professional programmer, to start in data science. You only need the basics. As you work on different projects and build your profile, your knowledge base will expand, and your “data science sense” will improve naturally.
So, don’t be intimidated by the field, or by how many things you need to “master” to be a good data scientist. Just start with the basics and work your way through to the advanced topics. Be patient and give it your all, and you will get there.
Source: https://towardsdatascience.com/what-do-you-need-to-know-to-become-a-data-scientist-1ed52e0e1ad