日韩av黄I国产麻豆传媒I国产91av视频在线观看I日韩一区二区三区在线看I美女国产在线I麻豆视频国产在线观看I成人黄色短片

歡迎訪問 生活随笔!

生活随笔

當(dāng)前位置: 首頁 >

数据科学r语言_您应该为数据科学学习哪些语言?

發(fā)布時間:2023/11/29 38 豆豆
生活随笔 收集整理的這篇文章主要介紹了 数据科学r语言_您应该为数据科学学习哪些语言? 小編覺得挺不錯的,現(xiàn)在分享給大家,幫大家做個參考.

數(shù)據(jù)科學(xué)r語言

Data science is an exciting field to work in, combining advanced statistical and quantitative skills with real-world programming ability. There are many potential programming languages that the aspiring data scientist might consider specializing in.

數(shù)據(jù)科學(xué)是一個令人興奮的領(lǐng)域,它將先進(jìn)的統(tǒng)計和定量技能與實(shí)際編程能力相結(jié)合。 有抱負(fù)的數(shù)據(jù)科學(xué)家可能會考慮采用許多潛在的編程語言。

While there is no correct answer, there are several things to take into consideration. Your success as a data scientist will depend on many points, including:

盡管沒有正確的答案,但有幾件事要考慮。 您作為數(shù)據(jù)科學(xué)家的成功取決于很多方面,包括:

Specificity

特異性

When it comes to advanced data science, you will only get so far reinventing the wheel each time. Learn to master the various packages and modules offered in your chosen language. The extent to which this is possible depends on what domain-specific packages are available to you in the first place!

當(dāng)涉及到高級數(shù)據(jù)科學(xué)時,每次您都只能重新發(fā)明輪子。 學(xué)習(xí)掌握以您選擇的語言提供的各種軟件包和模塊。 可能的程度首先取決于您可以使用哪些特定于域的軟件包!

Generality

概論

A top data scientist will have good all-round programming skills as well as the ability to crunch numbers. Much of the day-to-day work in data science revolves around sourcing and processing raw data or ‘data cleaning’. For this, no amount of fancy machine learning packages are going to help.

一位頂尖的數(shù)據(jù)科學(xué)家將具有良好的全面編程技能以及處理數(shù)字的能力。 數(shù)據(jù)科學(xué)中的許多日常工作都圍繞著采購和處理原始數(shù)據(jù)或“數(shù)據(jù)清理”。 為此,沒有任何花哨的機(jī)器學(xué)習(xí)包會有所幫助。

Productivity

生產(chǎn)率

In the often fast-paced world of commercial data science, there is much to be said for getting the job done quickly. However, this is what enables technical debt to creep in — and only with sensible practices can this be minimized.

在通常快節(jié)奏的商業(yè)數(shù)據(jù)科學(xué)世界中,要快速完成工作有很多話要說。 但是,這正是技術(shù)債務(wù)蔓延的原因,只有明智的實(shí)踐才能使這種債務(wù)減至最少。

Performance

性能

In some cases it is vital to optimize the performance of your code, especially when dealing with large volumes of mission-critical data. Compiled languages are typically much faster than interpreted ones; likewise statically typed languages are considerably more fail-proof than dynamically typed. The obvious trade-off is against productivity.

在某些情況下,優(yōu)化代碼的性能至關(guān)重要,尤其是在處理大量關(guān)鍵任務(wù)數(shù)據(jù)時。 編譯語言通常比解釋語言要快得多。 同樣,靜態(tài)類型的語言比動態(tài)類型的語言具有更好的防故障能力。 明顯的權(quán)衡是不利于生產(chǎn)力。

To some extent, these can be seen as a pair of axes (Generality-Specificity, Performance-Productivity). Each of the languages below fall somewhere on these spectra.

在某種程度上,這些可以看作是一對軸(通用性,性能和生產(chǎn)率)。 下面的每種語言都屬于這些頻譜。

With these core principles in mind, let’s take a look at some of the more popular languages used in data science. What follows is a combination of research and personal experience of myself, friends and colleagues — but it is by no means definitive! In approximately order of popularity, here goes:

牢記這些核心原則,讓我們看一下數(shù)據(jù)科學(xué)中使用的一些更流行的語言。 接下來是我自己,朋友和同事的研究和個人經(jīng)驗(yàn)的結(jié)合,但這絕不是確定的! 按流行程度大致如下:

[R (R)

你需要知道的 (What you need to know)

Released in 1995 as a direct descendant of the older S programming language, R has since gone from strength to strength. Written in C, Fortran and itself, the project is currently supported by the R Foundation for Statistical Computing.

R作為較早的S編程語言的直接后代于1995年發(fā)布,此后R變得越來越強(qiáng)大。 該項(xiàng)目由C,Fortran及其本身編寫,目前得到R統(tǒng)計計算基金會的支持。

執(zhí)照 (License)

Free!

自由!

優(yōu)點(diǎn) (Pros)

  • Excellent range of high-quality, domain specific and open source packages. R has a package for almost every quantitative and statistical application imaginable. This includes neural networks, non-linear regression, phylogenetics, advanced plotting and many, many others.

    高質(zhì)量,特定領(lǐng)域和開源軟件包的優(yōu)秀產(chǎn)品。 R提供了幾乎所有可以想象的定量和統(tǒng)計應(yīng)用程序的軟件包。 這包括神經(jīng)網(wǎng)絡(luò),非線性回歸,系統(tǒng)發(fā)育,高級繪圖以及許多其他功能。

  • The base installation comes with very comprehensive, in-built statistical functions and methods. R also handles matrix algebra particularly well.

    基本安裝帶有非常全面的內(nèi)置統(tǒng)計功能和方法。 R還可以很好地處理矩陣代數(shù)。
  • Data visualization is a key strength with the use of libraries such as ggplot2.

    借助ggplot2之類的庫,數(shù)據(jù)可視化是關(guān)鍵優(yōu)勢。

缺點(diǎn) (Cons)

  • Performance. There’s no two ways about it, R is not a quick language.

    性能。 關(guān)于它,沒有兩種方法, R不是一種快速的語言 。

  • Domain specificity. R is fantastic for statistics and data science purposes. But less so for general purpose programming.

    域特異性。 對于統(tǒng)計和數(shù)據(jù)科學(xué)而言,R太棒了。 但是對于通用編程而言則更少。
  • Quirks. R has a few unusual features that might catch out programmers experienced with other languages. For instance: indexing from 1, using multiple assignment operators, unconventional data structures.

    怪癖。 R具有一些不尋常的功能,這些功能可能趕不上使用其他語言的程序員。 例如:使用多個賦值運(yùn)算符從1開始索引,使用非常規(guī)數(shù)據(jù)結(jié)構(gòu)。

裁決-“為設(shè)計目的而精采” (Verdict — “brilliant at what it’s designed for”)

R is a powerful language that excels at a huge variety of statistical and data visualization applications, and being open source allows for a very active community of contributors. Its recent growth in popularity is a testament to how effective it is at what it does.

R是一種功能強(qiáng)大的語言,擅長于各種統(tǒng)計和數(shù)據(jù)可視化應(yīng)用程序,并且開源是一個非常活躍的貢獻(xiàn)者社區(qū)。 它最近受歡迎程度的提高證明了它在工作中的有效性。

Python (Python)

你需要知道的 (What you need to know)

Guido van Rossum introduced Python back in 1991. It has since become an extremely popular general purpose language, and is widely used within the data science community. The major versions are currently 3.6 and 2.7.

Guido van Rossum于1991年引入Python。此后,Python成為一種非常流行的通用語言,并在數(shù)據(jù)科學(xué)界廣泛使用。 當(dāng)前的主要版本是3.6和2.7 。

執(zhí)照 (License)

Free!

自由!

優(yōu)點(diǎn) (Pros)

  • Python is a very popular, mainstream general purpose programming language. It has an extensive range of purpose-built modules and community support. Many online services provide a Python API.

    Python是一種非常流行的主流通用編程語言。 它具有廣泛的專用模塊和社區(qū)支持。 許多在線服務(wù)都提供Python API。

  • Python is an easy language to learn. The low barrier to entry makes it an ideal first language for those new to programming.

    Python是一種易于學(xué)習(xí)的語言。 入門門檻低,使其成為編程新手的理想第一語言。
  • Packages such as pandas, scikit-learn and Tensorflow make Python a solid option for advanced machine learning applications.

    諸如pandas , scikit-learn和Tensorflow之類的軟件包使Python成為高級機(jī)器學(xué)習(xí)應(yīng)用程序的可靠選擇。

缺點(diǎn) (Cons)

  • Type safety: Python is a dynamically typed language, which means you must show due care. Type errors (such as passing a String as an argument to a method which expects an Integer) are to be expected from time-to-time.

    類型安全性:Python是一種動態(tài)類型化的語言,這意味著您必須格外小心。 有時會出現(xiàn)類型錯誤(例如將String作為參數(shù)傳遞給需要Integer的方法)。
  • For specific statistical and data analysis purposes, R’s vast range of packages gives it a slight edge over Python. For general purpose languages, there are faster and safer alternatives to Python.

    為了實(shí)現(xiàn)特定的統(tǒng)計和數(shù)據(jù)分析目的,R廣泛的軟件包使其與Python相比有一點(diǎn)優(yōu)勢。 對于通用語言,Python提供了更快,更安全的替代方法。

裁決-“優(yōu)秀的全能選手” (Verdict — “excellent all-rounder”)

Python is a very good choice of language for data science, and not just at entry-level. Much of the data science process revolves around the ETL process (extraction-transformation-loading). This makes Python’s generality ideally suited. Libraries such as Google’s Tensorflow make Python a very exciting language to work in for machine learning.

Python是數(shù)據(jù)科學(xué)語言的很好選擇,而不僅僅是入門級的語言。 許多數(shù)據(jù)科學(xué)過程都圍繞ETL過程 (提取-轉(zhuǎn)換-加載)進(jìn)行。 這使得Python的通用性非常適合。 諸如Google的Tensorflow之類的庫使Python成為一種非常激動人心的語言,可用于機(jī)器學(xué)習(xí)。

SQL (SQL)

你需要知道的 (What you need to know)

SQL (‘Structured Query Language’) defines, manages and queries relational databases. The language appeared by 1974 and has since undergone many implementations, but the core principles remain the same.

SQL (“結(jié)構(gòu)化查詢語言”)定義,管理和查詢關(guān)系數(shù)據(jù)庫 。 該語言于1974年問世,此后經(jīng)歷了許多實(shí)現(xiàn),但是核心原理保持不變。

執(zhí)照 (License)

Varies — some implementations are free, others proprietary

不同-有些實(shí)現(xiàn)是免費(fèi)的,而另一些則是專有的

優(yōu)點(diǎn) (Pros)

  • Very efficient at querying, updating and manipulating relational databases.

    在查詢,更新和操作關(guān)系數(shù)據(jù)庫方面非常高效。
  • Declarative syntax makes SQL an often very readable language . There’s no ambiguity about what SELECT name FROM users WHERE age > 18 is supposed to do!

    聲明式語法使SQL成為一種非常易讀的語言。 對于SELECT name FROM users WHERE age > 18 SELECT name FROM users WHERE age > , SELECT name FROM users WHERE age >應(yīng)該做什么沒有任何歧義!

  • SQL is very used across a range of applications, making it a very useful language to be familiar with. Modules such as SQLAlchemy make integrating SQL with other languages straightforward.

    SQL在各種應(yīng)用程序中都非常常用,這使其成為一種非常有用的語言。 諸如SQLAlchemy之類的模塊使SQL與其他語言的集成變得簡單。

缺點(diǎn) (Cons)

  • SQL’s analytical capabilities are rather limited — beyond aggregating and summing, counting and averaging data, your options are limited.

    SQL的分析功能非常有限-除了聚合,求和,計數(shù)和平均數(shù)據(jù)之外,您的選擇也受到限制。
  • For programmers coming from an imperative background, SQL’s declarative syntax can present a learning curve.

    對于來自命令性背景的程序員而言,SQL的聲明性語法可以顯示學(xué)習(xí)曲線。
  • There are many different implementations of SQL such as PostgreSQL, SQLite, MariaDB . They are all different enough to make inter-operability something of a headache.

    SQL有許多不同的實(shí)現(xiàn),例如PostgreSQL , SQLite , MariaDB 。 它們之間的差異足以使互操作性令人頭疼。

裁決-“永恒而高效” (Verdict — “timeless and efficient”)

SQL is more useful as a data processing language than as an advanced analytical tool. Yet so much of the data science process hinges upon ETL, and SQL’s longevity and efficiency are proof that it is a very useful language for the modern data scientist to know.

SQL作為數(shù)據(jù)處理語言比作為高級分析工具更有用。 然而,這么多的數(shù)據(jù)科學(xué)過程都取決于ETL,而SQL的壽命和效率證明了它是現(xiàn)代數(shù)據(jù)科學(xué)家了解的非常有用的語言。

Java (Java)

你需要知道的 (What you need to know)

Java is an extremely popular, general purpose language which runs on the (JVM) Java Virtual Machine. It’s an abstract computing system that enables seamless portability between platforms. Currently supported by Oracle Corporation.

Java是一種非常流行的通用語言,可在(JVM)Java虛擬機(jī)上運(yùn)行。 這是一個抽象的計算系統(tǒng),可實(shí)現(xiàn)平臺之間的無縫移植。 目前由Oracle Corporation支持。

執(zhí)照 (License)

Version 8 — Free! Legacy versions, proprietary.

版本8 —免費(fèi)! 舊版,專有。

優(yōu)點(diǎn) (Pros)

  • Ubiquity . Many modern systems and applications are built upon a Java back-end. The ability to integrate data science methods directly into the existing codebase is a powerful one to have.

    無處不在。 許多現(xiàn)代系統(tǒng)和應(yīng)用程序都建立在Java后端上。 將數(shù)據(jù)科學(xué)方法直接集成到現(xiàn)有代碼庫中的能力非常強(qiáng)大。
  • Strongly typed. Java is no-nonsense when it comes to ensuring type safety. For mission-critical big data applications, this is invaluable.

    強(qiáng)類型。 在確保類型安全性方面,Java毫無疑問。 對于關(guān)鍵任務(wù)大數(shù)據(jù)應(yīng)用程序來說,這是無價的。
  • Java is a high-performance, general purpose, compiled language . This makes it suitable for writing efficient ETL production code and computationally intensive machine learning algorithms.

    Java是一種高性能的通用編譯語言。 這使其適合編寫高效的ETL生產(chǎn)代碼和計算密集型機(jī)器學(xué)習(xí)算法。

缺點(diǎn) (Cons)

  • For ad-hoc analyses and more dedicated statistical applications, Java’s verbosity makes it an unlikely first choice. Dynamically typed scripting languages such as R and Python lend themselves to much greater productivity.

    對于臨時分析和更專用的統(tǒng)計應(yīng)用程序,Java的冗長性使其成為不太可能的首選。 動態(tài)類型的腳本語言(例如R和Python)可提高生產(chǎn)力。
  • Compared to domain-specific languages like R, there aren’t a great number of libraries available for advanced statistical methods in Java.

    與R之類的領(lǐng)域特定語言相比,Java中沒有太多可用于高級統(tǒng)計方法的庫。

裁決-“數(shù)據(jù)科學(xué)的有力競爭者” (Verdict — “a serious contender for data science”)

There is a lot to be said for learning Java as a first choice data science language. Many companies will appreciate the ability to seamlessly integrate data science production code directly into their existing codebase, and you will find Java’s performance and and type safety are real advantages.

學(xué)習(xí)Java作為首選的數(shù)據(jù)科學(xué)語言有很多話要說。 許多公司將欣賞將數(shù)據(jù)科學(xué)生產(chǎn)代碼直接無縫集成到其現(xiàn)有代碼庫中的能力,并且您會發(fā)現(xiàn)Java的性能和類型安全是真正的優(yōu)勢。

However, you’ll be without the range of stats-specific packages available to other languages. That said, definitely one to consider — especially if you already know one of R and/or Python.

但是,您將沒有其他語言可用的特定于統(tǒng)計信息的軟件包范圍。 就是說,絕對要考慮的一個-特別是如果您已經(jīng)了解R和/或Python之一。

Scala (Scala)

你需要知道的 (What you need to know)

Developed by Martin Odersky and released in 2004, Scala is a language which runs on the JVM. It is a multi-paradigm language, enabling both object-oriented and functional approaches. Cluster computing framework Apache Spark is written in Scala.

Scala由Martin Odersky開發(fā)并于2004年發(fā)布,是一種在JVM上運(yùn)行的語言。 它是一種多范式語言,支持面向?qū)ο蟮姆椒ê凸δ芊椒ā?集群計算框架Apache Spark用Scala編寫。

執(zhí)照 (License)

Free!

自由!

優(yōu)點(diǎn) (Pros)

  • Scala + Spark = High performance cluster computing. Scala is an ideal choice of language for those working with high-volume data sets.

    Scala + Spark =高性能集群計算。 對于使用大量數(shù)據(jù)集的人員來說,Scala是理想的語言選擇。
  • Multi-paradigmatic: Scala programmers can have the best of both worlds. Both object-oriented and functional programming paradigms available to them.

    多范式:Scala程序員可以兼得兩者。 面向?qū)ο蠛凸δ芫幊谭独伎梢允褂谩?
  • Scala is compiled to Java bytecode and runs on a JVM. This allows inter-operability with the Java language itself, making Scala a very powerful general purpose language, while also being well-suited for data science.

    Scala被編譯為Java字節(jié)碼,并在JVM上運(yùn)行。 這允許與Java語言本身進(jìn)行互操作,從而使Scala成為功能非常強(qiáng)大的通用語言,同時也非常適合數(shù)據(jù)科學(xué)。

缺點(diǎn) (Cons)

  • Scala is not a straightforward language to get up and running with if you’re just starting out. Your best bet is to download sbt and set up an IDE such as Eclipse or IntelliJ with a specific Scala plug-in.

    如果您剛?cè)腴T,Scala不是一種簡單易用的語言。 最好的選擇是下載sbt并使用特定的Scala插件設(shè)置IDE(例如Eclipse或IntelliJ)。

  • The syntax and type system are often described as complex. This makes for a steep learning curve for those coming from dynamic languages such as Python.

    語法和類型系統(tǒng)通常被描述為復(fù)雜的。 這為那些來自動態(tài)語言(例如Python)的人提供了陡峭的學(xué)習(xí)曲線。

裁決-“完美,適用于大數(shù)據(jù)” (Verdict — “perfect, for suitably big data”)

When it comes to using cluster computing to work with Big Data, then Scala + Spark are fantastic solutions. If you have experience with Java and other statically typed languages, you’ll appreciate these features of Scala too.

在使用群集計算與大數(shù)據(jù)一起使用時,Scala + Spark是絕佳的解決方案。 如果您有Java和其他靜態(tài)類型語言的使用經(jīng)驗(yàn),那么您也會喜歡Scala的這些功能。

Yet if your application doesn’t deal with the volumes of data that justify the added complexity of Scala, you will likely find your productivity being much higher using other languages such as R or Python.

但是,如果您的應(yīng)用程序不處理足以證明Scala增加了復(fù)雜性的數(shù)據(jù)量,那么使用R或Python等其他語言可能會發(fā)現(xiàn)您的生產(chǎn)力要高得多。

朱莉亞 (Julia)

你需要知道的 (What you need to know)

Released just over 5 years ago, Julia has made an impression in the world of numerical computing. Its profile was raised thanks to early adoption by several major organizations including many in the finance industry.

Julia(Julia)發(fā)布于5年前,在數(shù)值計算領(lǐng)域給人留下了深刻的印象。 由于包括金融業(yè)在內(nèi)的數(shù)個主要組織的早期采用,提高了它的形象。

執(zhí)照 (License)

Free!

自由!

優(yōu)點(diǎn) (Pros)

  • Julia is a JIT (‘just-in-time’) compiled language, which lets it offer good performance. It also offers the simplicity, dynamic-typing and scripting capabilities of an interpreted language like Python.

    Julia是一種JIT(“及時”)編譯語言,可提供良好的性能。 它還提供了像Python這樣的解釋語言的簡單性,動態(tài)鍵入和腳本編寫功能。
  • Julia was purpose-designed for numerical analysis. It is capable of general purpose programming as well.

    Julia是專為數(shù)字分析而設(shè)計的。 它也能夠進(jìn)行通用編程。
  • Readability. Many users of the language cite this as a key advantage

    可讀性。 許多使用該語言的用戶都將其作為主要優(yōu)勢

缺點(diǎn) (Cons)

  • Maturity. As a new language, some Julia users have experienced instability when using packages. But the core language itself is reportedly stable enough for production use.

    到期。 作為一種新語言,一些Julia用戶在使用軟件包時會遇到不穩(wěn)定的情況。 但是據(jù)報道,核心語言本身已經(jīng)足夠穩(wěn)定,可供生產(chǎn)使用。
  • Limited packages are another consequence of the language’s youthfulness and small development community. Unlike long-established R and Python, Julia doesn’t have the choice of packages (yet).

    有限的軟件包是該語言的年輕化和小型開發(fā)社區(qū)的另一個結(jié)果。 與歷史悠久的R和Python不同,Julia尚未選擇軟件包。

裁決-“一個為未來” (Verdict — “one for the future”)

The main issue with Julia is one that cannot be blamed for. As a recently developed language, it isn’t as mature or production-ready as its main alternatives Python and R.

Julia的主要問題是不能責(zé)怪的。 作為一種新近開發(fā)的語言,它不像其主要替代品Python和R那樣成熟或可以投入生產(chǎn)。

But, if you are willing to be patient, there’s every reason to pay close attention as the language evolves in the coming years.

但是,如果您愿意耐心等待,那么隨著語言在未來幾年的發(fā)展,我們有充分的理由要密切注意。

的MATLAB (MATLAB)

你需要知道的 (What you need to know)

MATLAB is an established numerical computing language used throughout academia and industry. It is developed and licensed by MathWorks, a company established in 1984 to commercialize the software.

MATLAB是一種在學(xué)術(shù)界和行業(yè)中廣泛使用的已建立的數(shù)值計算語言。 它由MathWorks開發(fā)并獲得許可,該公司成立于1984年,旨在將該軟件商業(yè)化。

執(zhí)照 (License)

Proprietary — pricing varies depending on your use case

專有-定價因您的用例而異

優(yōu)點(diǎn) (Pros)

  • Designed for numerical computing. MATLAB is well-suited for quantitative applications with sophisticated mathematical requirements such as signal processing, Fourier transforms, matrix algebra and image processing.

    專為數(shù)值計算而設(shè)計。 MATLAB非常適合具有復(fù)雜數(shù)學(xué)要求的定量應(yīng)用,例如信號處理,傅立葉變換,矩陣代數(shù)和圖像處理。
  • Data Visualization. MATLAB has some great inbuilt plotting capabilities.

    數(shù)據(jù)可視化。 MATLAB具有一些出色的內(nèi)置繪圖功能。
  • MATLAB is often taught as part of many undergraduate courses in quantitative subjects such as Physics, Engineering and Applied Mathematics. As a consequence, it is widely used within these fields.

    在許多物理,工程和應(yīng)用數(shù)學(xué)等定量學(xué)科的本科課程中,經(jīng)常將MATLAB授課。 結(jié)果,它在這些領(lǐng)域中被廣泛使用。

缺點(diǎn) (Cons)

  • Proprietary licence. Depending on your use-case (academic, personal or enterprise) you may have to fork out for a pricey licence. There are free alternatives available such as Octave. This is something you should give real consideration to.

    專有許可證。 根據(jù)您的用例(學(xué)術(shù),個人或企業(yè)),您可能需要為獲得昂貴的許可證付出代價。 有免費(fèi)的替代方法,例如Octave 。 這是您應(yīng)該真正考慮的事情。

  • MATLAB isn’t an obvious choice for general-purpose programming.

    對于通用編程,MATLAB不是一個明顯的選擇。

裁決-“最適合數(shù)學(xué)密集型應(yīng)用程序” (Verdict — “best for mathematically intensive applications”)

MATLAB’s widespread use in a range of quantitative and numerical fields throughout industry and academia makes it a serious option for data science.

MATLAB在整個行業(yè)和學(xué)術(shù)界在定量和數(shù)值領(lǐng)域的廣泛使用使其成為數(shù)據(jù)科學(xué)的重要選擇。

The clear use-case would be when your application or day-to-day role requires intensive, advanced mathematical functionality. Indeed, MATLAB was specifically designed for this.

明確的用例是您的應(yīng)用程序或日常角色需要密集的高級數(shù)學(xué)功能時。 實(shí)際上,MATLAB是為此專門設(shè)計的。

其他語言 (Other Languages)

There are other mainstream languages that may or may not be of interest to data scientists. This section provides a quick overview… with plenty of room for debate of course!

還有其他主流語言可能會或可能不會對數(shù)據(jù)科學(xué)家感興趣。 本節(jié)提供快速概述…當(dāng)然還有足夠的討論空間!

C ++ (C++)

C++ is not a common choice for data science, although it has lightning fast performance and widespread mainstream popularity. The simple reason may be a question of productivity versus performance.

盡管C ++具有閃電般的快速性能和廣泛的主流流行度,但它并不是數(shù)據(jù)科學(xué)的常見選擇。 簡單的原因可能是生產(chǎn)率與性能的問題。

As one Quora user puts it:

正如Quora的一位用戶所說 :

“If you’re writing code to do some ad-hoc analysis that will probably only be run one time, would you rather spend 30 minutes writing a program that will run in 10 seconds, or 10 minutes writing a program that will run in 1 minute?”

“如果您正在編寫代碼以進(jìn)行可能僅運(yùn)行一次的臨時分析,您寧愿花30分鐘編寫將在10秒內(nèi)運(yùn)行的程序,還是花10分鐘編寫將在1秒內(nèi)運(yùn)行的程序?分鐘?”

The dude’s got a point. Yet for serious production-level performance, C++ would be an excellent choice for implementing machine learning algorithms optimized at a low-level.

花花公子的觀點(diǎn)。 但是對于嚴(yán)重的生產(chǎn)級性能,C ++將是實(shí)現(xiàn)在低級優(yōu)化的機(jī)器學(xué)習(xí)算法的絕佳選擇。

Verdict — “not for day-to-day work, but if performance is critical…”

裁決-“不是日常工作,但如果績效至關(guān)重要……”

JavaScript (JavaScript)

With the rise of Node.js in recent years, JavaScript has become more and more a serious server-side language. However, its use in data science and machine learning domains has been limited to date (although checkout brain.js and synaptic.js!). It suffers from the following disadvantages:

近年來,隨著Node.js的興起, JavaScript越來越成為一種嚴(yán)肅的服務(wù)器端語言。 但是,它在數(shù)據(jù)科學(xué)和機(jī)器學(xué)習(xí)領(lǐng)域中的使用迄今受到限制(盡管結(jié)帳brain.js和synaptic.js !)。 它具有以下缺點(diǎn):

  • Late to the game (Node.js is only 8 years old!), meaning…

    游戲晚了(Node.js才8歲!),這意味著…
  • Few relevant data science libraries and modules are available. This means no real mainstream interest or momentum

    很少有相關(guān)的數(shù)據(jù)科學(xué)庫和模塊可用。 這意味著沒有真正的主流興趣或動力
  • Performance-wise, Node.js is quick. But JavaScript as a language is not without its critics.

    在性能方面,Node.js很快。 但是JavaScript作為一種語言并不是沒有批評者的 。

Node’s strengths are in asynchronous I/O, its widespread use and the existence of languages which compile to JavaScript. So it’s conceivable that a useful framework for data science and realtime ETL processing could come together.

Node的優(yōu)勢在于異步I / O,其廣泛使用以及可編譯為JavaScript的語言的存在。 因此可以想象,一個有用的數(shù)據(jù)科學(xué)框架和實(shí)時ETL處理可以融合在一起。

The key question is whether this would offer anything different to what already exists.

關(guān)鍵問題是,這是否會提供與已經(jīng)存在的不同的東西。

Verdict — “there is much to do before JavaScript can be taken as a serious data science language”

結(jié)論—“在將JavaScript視為一種嚴(yán)肅的數(shù)據(jù)科學(xué)語言之前,還有很多事情要做”

Perl (Perl)

Perl is known as a ‘Swiss-army knife of programming languages’, due to its versatility as a general-purpose scripting language. It shares a lot in common with Python, being a dynamically typed scripting language. But, it has not seen anything like the popularity Python has in the field of data science.

Perl因其作為通用腳本語言的多功能性而被稱為“編程語言的瑞士軍刀”。 它是動態(tài)類型化的腳本語言,與Python有很多共同點(diǎn)。 但是,它沒有像Python在數(shù)據(jù)科學(xué)領(lǐng)域那樣受歡迎。

This is a little surprising, given its use in quantitative fields such as bioinformatics. Perl has several key disadvantages when it comes to data science. It isn’t stand-out fast, and its syntax is famously unfriendly. There hasn’t been the same drive towards developing data science specific libraries. And in any field, momentum is key.

考慮到它在生物信息學(xué)等定量領(lǐng)域的使用,這有點(diǎn)令人驚訝。 在數(shù)據(jù)科學(xué)方面,Perl有幾個關(guān)鍵的缺點(diǎn)。 它不是很快就脫穎而出,并且其語法眾所周知是不友好的 。 開發(fā)特定于數(shù)據(jù)科學(xué)的庫的驅(qū)動力不同。 在任何領(lǐng)域,動力都是關(guān)鍵。

Verdict — “a useful general purpose scripting language, yet it offers no real advantages for your data science CV”

判決-“一種有用的通用腳本語言,但對于您的數(shù)據(jù)科學(xué)CV并沒有任何真正的優(yōu)勢”

Ruby (Ruby)

Ruby is another general purpose, dynamically typed interpreted language. Yet it also hasn’t seen the same adoption for data science as has Python.

Ruby是另一種通用的動態(tài)類型的解釋語言。 但是,它還沒有像Python那樣被數(shù)據(jù)科學(xué)采用。

This might seem surprising, but is likely a result of Python’s dominance in academia, and a positive feedback effect . The more people use Python, the more modules and frameworks are developed, and the more people will turn to Python.

這似乎令人驚訝,但很可能是Python在學(xué)術(shù)界的統(tǒng)治地位以及積極的反饋效應(yīng)的結(jié)果。 使用Python的人越多,開發(fā)的模塊和框架就越多,并且使用Python的人也就越多。

The SciRuby project exists to bring scientific computing functionality, such as matrix algebra, to Ruby. But for the time being, Python still leads the way.

存在SciRuby項(xiàng)目是為了將科學(xué)計算功能(例如矩陣代數(shù))引入Ruby。 但是就目前而言,Python仍然處于領(lǐng)先地位。

Verdict — “not an obvious choice yet for data science, but won’t harm the CV”

判決-“對于數(shù)據(jù)科學(xué)而言,尚不是一個顯而易見的選擇,但不會損害簡歷”

結(jié)論 (Conclusion)

Well, there you have it — a quickfire guide to which languages to consider for data science. The key here is to understand your usage requirements in terms of generality vs specificity, as well as your personal preferred development style of performance vs productivity.

好了,您已掌握了它-速成指南,可為數(shù)據(jù)科學(xué)考慮使用哪些語言。 這里的關(guān)鍵是要從通用性和特異性方面了解您的使用要求,以及您個人偏愛的性能與生產(chǎn)力開發(fā)風(fēng)格。

I use R, Python and SQL on a regular basis, as my current role largely focuses on developing existing data pipeline and ETL processes. These languages give the right balance of generality and productivity to do the job, with the option of using R’s more advanced statistics packages when needed.

我經(jīng)常使用R,Python和SQL,因?yàn)槲夷壳暗穆氊?zé)主要集中在開發(fā)現(xiàn)有的數(shù)據(jù)管道和ETL流程上。 這些語言可以在通用性和生產(chǎn)率之間實(shí)現(xiàn)適當(dāng)?shù)钠胶?#xff0c;并在需要時可以選擇使用R的高級統(tǒng)計軟件包。

However — you may already have some experience with Java. Or you may want to use Scala for big data. Or, perhaps you’re keen to get involved with the Julia project.

但是,您可能已經(jīng)對Java有一定的經(jīng)驗(yàn)。 或者您可能想將Scala用于大數(shù)據(jù)。 或者,也許您渴望參與Julia項(xiàng)目。

Maybe you learned MATLAB at university, or want to give SciRuby a chance? Perhaps you have an altogether different suggestion. If so, please leave a reply below — I look forward to hearing from you!

也許您是在大學(xué)學(xué)習(xí)過MATLAB的,還是想給SciRuby一個機(jī)會? 也許您有完全不同的建議。 如果是這樣,請?jiān)谙旅媪粝麓饛?fù)-我期待您的來信!

Thanks for reading!

謝謝閱讀!

翻譯自: https://www.freecodecamp.org/news/which-languages-should-you-learn-for-data-science-e806ba55a81f/

數(shù)據(jù)科學(xué)r語言

總結(jié)

以上是生活随笔為你收集整理的数据科学r语言_您应该为数据科学学习哪些语言?的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯,歡迎將生活随笔推薦給好友。