6 Things I Learned from Teaching R at Facebook
From 2018 to 2019, I worked at Facebook as a data scientist. During that time I was involved in developing and teaching a class for R beginners. This was a two-day course taught about once a month to a group of roughly 15–20 students, and the goal was for them to leave the class able to use R in their day-to-day work.
This article shares some of the things that I learned from teaching these classes, with an emphasis on what worked well for the students. Hopefully these six tips can be of use to anyone who uses R, especially those just beginning their journey.
But first, my personal experiences learning R
I initially learned R as a statistics undergrad at Berkeley. In college I despised using R, and used it as a means to an end for completing projects and problem sets so that I could graduate.
Once I entered the workforce and started learning R from my coworkers, my perspective on the language started to shift. I realized that there were some key gaps in how R was taught in college, mainly that we were learning R for a classroom setting, which does not translate well to a workplace setting.
Since graduating college, I have grown to embrace R fully: I’ve developed R packages at Facebook and Doordash, taught R at Facebook, and attended several R conferences. With my background out of the way, I want to share some tips and advice for those on their own journey to using R in their day-to-day work.
Note: I graduated college in 2015 and the curriculum has likely improved since then, so my personal experiences may not be as relevant for more recent college grads.
1. R is not just for data scientists, and having a reason for using the language will make learning easier
Before teaching R, I assumed that a large majority of our students would be data scientists looking to increase their impact by bringing R into their SQL/Excel workflow. However, I was really surprised by the diversity of people who attended these classes. We had a good mix of software engineers, data scientists, data engineers, researchers, and finance/operations people, just to name a few.
Photo by Priscilla Du Preez on Unsplash
For data scientists, the main reason for taking the class was clear: they’re constantly working with data, and learning R gives them a more effective and flexible way of doing so. Learning R also comes more naturally to them, because they have a lot of opportunities to practice the language while making a direct impact on their work.
When trying to understand why some of the other students signed up for the class, we found a variety of reasons, for example:
- Engineers who wanted to improve their ability to modify and visualize data.
- Operations and finance people looking for an alternative to repetitive daily/weekly Excel updates.
- People who were already familiar with R but wanted to freshen up their knowledge and learn how to use it effectively at Facebook.
In the three examples above, we see ways that non-data scientists can gain value from learning R. Tangible use cases like these are great for staying focused, because learning R takes a fair amount of persistence. Broadly, you want to be in one of these two categories if you’re not a data scientist/analyst:
You want to do something, but it will be very difficult or impossible without knowing R (or some other programming language).
One last point on this topic: sometimes R is not the best tool for the job. For example, if you already know how to use SQL + Excel, you already have a deadly duo of tools to aggregate, analyze, and visualize data. Having used R myself for around 7 years, I still often find myself resorting to SQL + Excel simply because it’s faster and more shareable. So if you spend a lot of time learning R, don’t feel like you need to use it for everything, because sometimes it will actually take twice as long as using tools you’re already an expert in.
2. Tidyverse is king
Source: tidyverse.org
What is Tidyverse? The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
The two most popular and useful packages in Tidyverse are dplyr, for modifying and aggregating data, and ggplot2, for creating visualizations.
To keep this section short and to the point: Tidyverse is the quickest and most straightforward way to aggregate and modify data in R. Not only that, but it makes learning R a lot more fun and easy. I first learned R without Tidyverse and it was a miserable experience, and others who learned R in a similar way share my sentiments. Tidyverse has become so widespread among R users that I would not recommend learning or teaching R without it.
If you’ve never used Tidyverse, it’s super simple to set up and I would highly encourage you to start using it (there are many resources online to learn from).
# This is all you need to install tidyverse:
install.packages('tidyverse')
library(tidyverse)
Note: I reference some packages later in this article. If you ever need to install a new package, you can use install.packages() as shown above. Once installed, load it into R using library().
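To give a feel for what this looks like in practice, here is a minimal sketch of a dplyr aggregation. It is my own illustration rather than material from the class, and it assumes the built-in mtcars dataset as a stand-in for real data; the column names n_cars and avg_mpg are made up for the example.

```r
library(dplyr)

# mtcars ships with base R, so nothing needs to be downloaded.
# Group the cars by number of cylinders, then summarise each group.
summary_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n_cars = n(),          # hypothetical column name: cars per group
    avg_mpg = mean(mpg)    # hypothetical column name: average fuel economy
  ) %>%
  arrange(desc(avg_mpg))   # most fuel-efficient group first

print(summary_by_cyl)
```

Each step reads top to bottom through the %>% pipe, which is a large part of why this style tends to be easier for beginners to follow.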
3. Cheatsheets, cheatsheets, cheatsheets
This goes well with the previous topic because learning Tidyverse can be daunting at first, with its unique syntax and long list of functions. Luckily, the RStudio team has created a bunch of cheatsheets. For our in-person classes, we would make sure to print cheatsheets for all of the students so that they wouldn’t have to keep switching tabs to search for functions. If you are able to, I would highly recommend printing and laminating your own cheatsheets for personal use. I still reference mine even after using the language for over 5 years.
The RStudio cheatsheets page (https://rstudio.com/resources/cheatsheets/) lists the cheatsheets published by the RStudio team. Some of the topics there are more advanced, but I would say the two essential cheatsheets to get started are the ones below:
Source: https://rstudio.com/resources/cheatsheets/
4. Learn by using internal datasets
Within the first hour of class, we had our students query data from the internal database into R. At Facebook, this was as simple as using our internal package and writing:
df <- presto("SELECT * FROM example_table LIMIT 10000")
There are two main reasons I recommend learning with internal datasets:
Being able to query internal data directly into R amplifies your ability to use company data. If you are not able to query internal data directly into R, you have to use some sort of workaround, such as exporting data into a csv file and then reading that into R. This wastes a lot of time, so I would try to get familiar with bringing data directly into R as early as possible, even if it means an extra hour or two of initial setup and getting the right permissions.
A company’s data is one of its most valuable resources. If you work at Facebook, then you should be taking advantage of the fact that you have some of the richest and most interesting datasets in the world. The same applies to any other company: Uber with its ride data, Airbnb with its bookings data, Medium with data on articles. A lot of online resources will have you use a generic dataset, so I would try to take the extra step and bring in key company datasets when possible to aid your learning. By doing this, you’re already in the mindset of easing R into your workflow.
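presto() is an internal Facebook function, so it will not exist elsewhere. As a rough sketch of the same idea with publicly available tools, the example below assumes the DBI and RSQLite packages and uses an in-memory SQLite database as a stand-in for a company database. The table example_table here is hypothetical and is filled with the built-in mtcars data just so there is something to query.

```r
library(DBI)

# Stand-in for an internal database: an in-memory SQLite database.
# With a real database you would pass your own driver and credentials here.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Create a hypothetical table so the query below has data to return
dbWriteTable(con, "example_table", mtcars)

# Pull the query result straight into a data frame, no csv export needed
df <- dbGetQuery(con, "SELECT * FROM example_table LIMIT 10")

# Always close the connection when you are done
dbDisconnect(con)

print(df)
```

With a real database, the dbConnect() call is the part that should need to change (your data team can tell you which driver to use), while the query line can stay the same.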
5. The importance of importing and exporting data
R is a great tool for analyzing data, but if you can’t get data into or out of R, that’s a really big problem. The previous section touched a little bit on this, so this section is meant to be more practical and goes over some of the main methods to get different types of data into and out of R.
By focusing on these methods, you should be able to import and export almost 100% of what you need. And of course, there is also a cheatsheet that you may find helpful for this:
Source: https://rstudio.com/resources/cheatsheets/
For importing data:
- CSV: read_csv() (Tidyverse)
- Excel: read_excel() from the readxl package (installed with Tidyverse, but loaded separately with library(readxl))
- Google Sheets: Similar to the above, but may require extra steps for private sheets. You want to use the package googlesheets4. Worst case scenario, you export the Google Sheet as a csv and read it in using read_csv().
- Internal database: Use SQL to bring data directly into R. You’ll need to consult with your data team to see if there is an internal package to do this. At Facebook, presto("SELECT * FROM tbl") is all you need to grab data from a table. At smaller companies, there may be some extra steps to connect R to an internal database, but at the very least setting up an ODBC connection should allow you to grab data.
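As a small illustration of the csv route, here is a sketch that assumes the readr package (part of Tidyverse). Since I can’t assume you have a particular file on disk, it first writes a few rows of the built-in mtcars data to a temporary file and then reads them back; in real use you would point read_csv() at your own file.

```r
library(readr)

# Create a small sample csv in a temporary location (stand-in for a real file)
csv_path <- tempfile(fileext = ".csv")
write_csv(head(mtcars, 5), csv_path)

# Import: read the csv into a data frame (a tibble)
cars <- read_csv(csv_path, show_col_types = FALSE)

print(cars)

# Excel files follow the same pattern through the readxl package, e.g.:
# library(readxl)
# budget <- read_excel("budget.xlsx", sheet = 1)  # hypothetical file name
```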
For exporting data:
- Copy to clipboard: write_clip() from the clipr package copies a data frame directly into your clipboard. If your company uses Google Sheets, this is the quickest way to get data in there, so this is one of the most useful functions that you can learn. Essentially, it cuts the steps down from "Export df to csv -> Open csv and copy contents -> Paste into Sheets" to "Copy df to clipboard -> Paste into Sheets".
- Copy a plot/graph: When you make a graph in R, the easiest way to share it is to copy and paste it. Simply zoom in on a plot to open it in its own window, then right-click and copy the image.
- Screenshot directly from R: If you want to share a small table more informally (e.g. in Slack), taking a screenshot of your R console is probably the best bet. If you want to get fancy, you can use the kable() function from the knitr package to clean up your table so that it’s a little easier to read.
- Write to csv: write_csv()
- Write to internal database: This is usually a lot more complicated than reading from an internal database, but definitely talk to your data team if you think you’ll do this often.
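Here is a rough sketch of three of these export routes together. It assumes the readr and knitr packages are installed, and it only touches the clipboard when clipr is installed and reports a usable clipboard, since that depends on your machine. The small_table data frame is a made-up example built from mtcars.

```r
library(readr)

# A small example table to export
small_table <- head(mtcars[, c("mpg", "cyl", "wt")], 5)

# Write to csv (a temporary path here; use your own file path in practice)
out_path <- tempfile(fileext = ".csv")
write_csv(small_table, out_path)

# Print a cleaner-looking table in the console, handy for a screenshot
pretty_table <- knitr::kable(small_table, digits = 1)
print(pretty_table)

# Copy to clipboard, guarded so the script still runs without a clipboard
if (requireNamespace("clipr", quietly = TRUE) && clipr::clipr_available()) {
  clipr::write_clip(small_table)
}
```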
6. Keep it simple and focus on the fundamentals
There are so many things you can do with R that it can be a little overwhelming at first. For example, on the cheatsheet page alone you already see a long list of topics and packages that R is capable of handling, and even that is just scratching the surface. Don’t be intimidated by this.
We found that focusing on the fundamentals is the best way to learn R:
- Modifying the data with dplyr to do analysis
- Creating visualizations with ggplot2
If you are able to do these well, then you will have a strong foundation for doing a lot with R.
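The two fundamentals fit together naturally, and a minimal sketch might look like the following. This is my own illustration and not the class material: it assumes dplyr and ggplot2 are installed and uses the built-in mtcars dataset, with weight_kg and the 1200 kg cutoff chosen arbitrarily for the example.

```r
library(dplyr)
library(ggplot2)

# Step 1: modify the data with dplyr
heavy_cars <- mtcars %>%
  mutate(weight_kg = wt * 1000 * 0.4536) %>%  # wt is recorded in 1000s of lbs
  filter(weight_kg > 1200)                     # arbitrary cutoff for the example

# Step 2: visualize the result with ggplot2
p <- ggplot(heavy_cars, aes(x = weight_kg, y = mpg)) +
  geom_point() +
  labs(x = "Weight (kg)", y = "Miles per gallon")

print(p)
```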
Closing thoughts
I wanted to write this article because I enjoyed teaching R classes at Facebook, and thought that my experiences as an instructor could be helpful for others who do not have access to these types of classes or who are looking for advice on ways to use R more effectively in their own work.
Original article: https://towardsdatascience.com/6-things-i-learned-from-teaching-r-at-facebook-806fc2832ec0