Will Data Science Become Automated?
Opinion
Introduction
As data science grows more popular, companies are working out how many data scientists a team needs to build a successful product or answer a business problem. While hiring, they have most likely noticed that instead of paying people to perform data science, they could pay for a platform or bring data science into the company in other ways. Ultimately, data science can be automated, just like most technical processes, which is a bit of inception. The question then becomes: should it be automated, and how well does data science perform when a tool or platform automates it? I will discuss these questions below by highlighting the pros and cons of automated data science and machine learning.
Automation of Data Science
Like most things in life, moderation is key, so eliminating your human data scientists and replacing them with a tool is probably going to lead to some chaos and confusion, at least at first. Just as an online platform can teach many people to succeed in an academic area, so can an automated data science platform: a human can learn data science from a machine. But when you automate data science this early in the history of the field (yes, I know it is not as new a field as many people think), you can run into some serious problems. On the other hand, you can also gain some awesome benefits.
Pros and Cons
There are pros and cons to everything, and automated data science is no exception. I am not going to detail the specific tools or companies whose main product is data science automation, but you can expect some of these pros and cons to apply to some of those tools.
Pros
Easy to Use
The main function of automated data science platforms is to make it easier for users to implement data science in their business. Therefore, someone with a background in data analytics or product management could expect to use a platform easily to, say, categorize images.
Cheaper
Whereas hiring a data scientist can cost a company well over $100,000 in salary and onboarding costs, an automated platform could cost significantly less than even one data scientist. It is important to note that some companies have far more than one data scientist.
Powerful
Data science is widely known as a powerful tool in itself that can significantly impact a company or business. Data science and machine learning have led to countless products and served nearly every human in some way. Did you use your phone today? Was it an iPhone? Did you use Face ID? Then you have probably already used machine learning without even realizing it (unless you are a data scientist and know it already). Maybe you used Netflix’s recommendation algorithm, which suggested a show or movie. These are some examples of the everyday machine learning you will encounter. There are countless more, and a company can truly benefit from the power of data science in its business, whether internally or externally.
Cons
I am going to highlight the cons next, as I believe they are more important and outweigh the pros (as of now — this could change quickly).
Hard to Explain
The cons are where it gets tricky. These points can really hurt a company when a user does not use the platform correctly or interprets the results and the model incorrectly. It can be hard to explain the results of a complicated data science model. Now imagine you are not a data scientist and have no academic background in the various types of machine learning algorithms. You will still have to explain the platform's model results and, sometimes, implement its suggestions or predictions within your company’s integrations, which could prove time-consuming and difficult.
Misleading Results
Since you did not build the model yourself, you may be unaware of parameters that need to be tuned. You might not know, for example, that you need an elbow plot to find the optimal number of clusters for an unsupervised segmentation algorithm. Not understanding the model from the ground up can lead to results that do not make much sense. Perhaps you used logistic regression to predict temperature for the next few months, only to realize later that, despite its name, the algorithm is a classification model and not meant for predicting a continuous value. These small nuances can add up and lead to some serious mistakes.
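To make the elbow idea concrete, here is a minimal sketch in plain Python. It is not how any particular platform works: the tiny one-dimensional k-means, the farthest-point seeding, and the `SPEND` numbers are all assumptions made up for illustration. It computes the within-cluster sum of squares (inertia) for several values of k, which is the curve an elbow plot draws.

```python
def farthest_point_init(points, k):
    """Deterministic seeding (an illustrative choice): start at the first
    point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(abs(p - c) for c in centroids))
        )
    return centroids


def kmeans_inertia(points, k, max_iters=100):
    """Run a tiny 1-D k-means and return the within-cluster sum of squares."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = updated
    return sum(min((p - c) ** 2 for c in centroids) for p in points)


def elbow_curve(points, max_k):
    """Inertia for k = 1..max_k; plotting this against k gives the elbow plot."""
    return {k: kmeans_inertia(points, k) for k in range(1, max_k + 1)}


# Hypothetical data: three obvious customer segments on a single feature.
SPEND = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]

curve = elbow_curve(SPEND, 5)
for k, inertia in curve.items():
    print(f"k={k}  inertia={inertia:.2f}")
```

On this toy data the inertia falls steeply up to k = 3 and barely moves afterward; that bend is the "elbow". A platform user who never looks at this curve may accept whatever number of clusters the tool happens to default to.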
Summary
Photo by Markus Winkler on Unsplash [2].
Ultimately, whether data science will be completely automated depends on the situation. Sure, use an automated data science platform if you already have a data analyst on your team. Or use the automated solution for predictions that are not harmful when they are wrong. Categorizing clothes incorrectly is not the worst thing that can happen, but when you are in the health or finance industry and you classify a disease or large sums of money incorrectly, the harm is undeniable.
Figure out what kind of company you are and what your goals are, weigh the pros and cons, and from there decide whether automated data science is right for you. That being said, data science is already being automated in part, and in the future there will be platforms that try to automate the entire process.
I hope this article sparks some interesting discussion. Of course, I am biased and prefer to keep data scientists around; however, I know how much of data science is already automated simply by importing popular prebuilt libraries. The solution may be a human-in-the-loop method: automate what you can, and then provide checks and balances to account for model error.
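One way to picture the human-in-the-loop idea is a confidence gate: predictions the model is sure about go through automatically, and the rest are queued for a person. The sketch below is only an illustration under assumptions; the `Prediction` class, the `route` function, the 0.9 threshold, and the sample items are all hypothetical and not taken from any specific platform.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model's probability for `label`, between 0 and 1


def route(predictions, threshold=0.9):
    """Split predictions into auto-accepted ones and ones needing human review.

    The threshold is an assumed business choice: the costlier a mistake is,
    the higher it should be set.
    """
    accepted, needs_review = [], []
    for pred in predictions:
        if pred.confidence >= threshold:
            accepted.append(pred)
        else:
            needs_review.append(pred)
    return accepted, needs_review


# Hypothetical batch mixing low-stakes and high-stakes items.
batch = [
    Prediction("shirt-001", "t-shirt", 0.98),
    Prediction("scan-017", "benign", 0.71),
    Prediction("shirt-002", "jacket", 0.93),
]

accepted, needs_review = route(batch, threshold=0.9)
print("auto-accepted:", [p.item_id for p in accepted])
print("sent to a human:", [p.item_id for p in needs_review])
```

In a health or finance setting the threshold would presumably sit much higher, or every prediction would get a human check, which is the "checks and balances" part of the approach.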
Feel free to comment down below. Thank you for reading!
Translated from: https://towardsdatascience.com/will-data-science-become-automated-407f32270de6