BigQuery Tutorial: Insights from Data with BigQuery Challenge Lab
This Medium article is a detailed walkthrough of the steps I took to solve the challenge lab of the Insights from Data with BigQuery Skill Badge on the Google Cloud Platform (Qwiklabs). I got access to this lab through the Google Cloud Ready Facilitator Program. Thanks to Google!
So far, I have completed over 100 labs and 23 quests on Qwiklabs.
This lab is recommended only for students who have completed the labs in the Insights from Data with BigQuery Quest. You also need a working knowledge of SQL and BigQuery to solve it. Are you up for the challenge? Let's go!
Dataset Used
The dataset used in this challenge lab is bigquery-public-data.covid19_open_data.covid19_open_data. It contains COVID-19 data for countries around the world, and every query in this tutorial reads from it.
Challenge Scenario
There are 10 small tasks in this challenge lab, and all of them must be completed to score 100/100: 9 SQL queries and 1 Data Studio report. This tutorial lists the steps I took to solve all ten. The ten tasks are as follows:
1. Building a SQL query that outputs the total number of confirmed cases.
2. Building a SQL query that outputs the worst affected areas.
3. Building a SQL query that identifies the hotspots in the USA.
4. Building a SQL query that outputs the fatality ratio.
5. Building a SQL query that identifies a specific day according to the constraints.
6. Building a SQL query that outputs the number of days with zero net new cases.
7. Building a SQL query that outputs the doubling rate.
8. Building a SQL query that outputs the recovery rate.
9. Building a SQL query that outputs the CDGR (Cumulative Daily Growth Rate).
10. Creating a Data Studio report.
Important Note
Before starting this lab, make sure you do only what the lab requires. Allocating extra resources or doing anything the lab does not ask for may get your account blocked by the Qwiklabs admins. Don't worry if that happens: I ran into this problem myself, and the account was unblocked quickly after I contacted Qwiklabs support.
Loading the Dataset
1. In the Cloud Console, once you are fully logged in, go to Menu > BigQuery.
2. Click + Add Data, then click Explore Public Datasets in the left pane.
3. Search for covid19_open_data and select "Covid-19 Open Data". Click View Dataset to explore it.
4. Use the filter to locate the table covid19_open_data under the covid19_open_data dataset.
Detailed Tutorial of Task 1
Task 1 asks for a query that outputs the total count of confirmed cases on Apr 15, 2020. The output should be a single row containing the sum of confirmed cases across all countries in the dataset, in a column named total_cases_worldwide.
Copy the query below into the query editor and click RUN.
SELECT
  SUM(cumulative_confirmed) AS total_cases_worldwide
FROM
  `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE
  date = "2020-04-15"
Detailed Tutorial of Task 2
Task 2 asks for a query that answers: "How many states in the US had more than 100 deaths on Apr 10, 2020?" The output field should be named count_of_states.
Hint (important): exclude NULL values.
Copy the query below into the query editor and click RUN.
SELECT
  COUNT(*) AS count_of_states
FROM (
  SELECT
    subregion1_name AS state,
    SUM(cumulative_deceased) AS death_count
  FROM
    `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE
    country_name = "United States of America"
    AND date = '2020-04-10'
    AND subregion1_name IS NOT NULL
  GROUP BY
    subregion1_name
)
WHERE
  death_count > 100
Detailed Tutorial of Task 3
Task 3 asks for a query that answers: "List all the states in the United States of America that had more than 1000 confirmed cases on Apr 10, 2020." The output should have two columns, state and total_confirmed_cases, holding the state name and its confirmed cases, sorted by confirmed cases in descending order.
Copy the query below into the query editor and click RUN.
SELECT
  subregion1_name AS state,
  SUM(cumulative_confirmed) AS total_confirmed_cases
FROM
  `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE
  country_name = "United States of America"
  AND date = "2020-04-10"
GROUP BY
  subregion1_name
HAVING
  total_confirmed_cases > 1000
ORDER BY
  total_confirmed_cases DESC
Detailed Tutorial of Task 4
Task 4 asks for a query that answers the following question: "What was the case-fatality ratio in Italy for the month of April 2020?"
Case-fatality ratio is defined as (total deaths / total confirmed cases) * 100. The output should have three columns named total_confirmed_cases, total_deaths and case_fatality_ratio.
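As a quick sanity check on this definition, the ratio can be reproduced outside BigQuery. The Python sketch below is only an illustration: the helper name `case_fatality_ratio` is hypothetical and the totals are made-up numbers, not values from the dataset.

```python
def case_fatality_ratio(total_deaths: int, total_confirmed_cases: int) -> float:
    """Return (total deaths / total confirmed cases) * 100."""
    if total_confirmed_cases == 0:
        raise ValueError("total_confirmed_cases must be non-zero")
    return total_deaths / total_confirmed_cases * 100


# Illustrative numbers only, not taken from the covid19_open_data table.
ratio = case_fatality_ratio(total_deaths=250, total_confirmed_cases=2000)
print(f"{ratio:.2f}")  # prints 12.50
```

The SQL version of the same arithmetic is the third column of the query for this task.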
Copy the query below into the query editor and click RUN.
SELECT
  SUM(cumulative_confirmed) AS total_confirmed_cases,
  SUM(cumulative_deceased) AS total_deaths,
  (SUM(cumulative_deceased) / SUM(cumulative_confirmed)) * 100 AS case_fatality_ratio
FROM
  `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE
  country_name = "Italy"
  AND date BETWEEN "2020-04-01" AND "2020-04-30"
Detailed Tutorial of Task 5
Task 5 asks for a query that answers the following question: "On what day did the total number of deaths cross 10000 in Italy?"
The query should output the date in a column named "date", in the format "yyyy-mm-dd".
Copy the query below into the query editor and click RUN.
SELECT
  date
FROM
  `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE
  country_name = 'Italy'
  AND cumulative_deceased > 10000
ORDER BY
  date
LIMIT 1
Detailed Tutorial of Task 6
The query provided in the lab must be updated so that it outputs the correct number of days in India between 21 Feb 2020 and 15 March 2020 on which the number of confirmed cases did not increase.
Copy the query below into the query editor and click RUN.
WITH india_cases_by_date AS (
  SELECT
    date,
    SUM(cumulative_confirmed) AS cases
  FROM
    `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE
    country_name = "India"
    AND date BETWEEN '2020-02-21' AND '2020-03-15'
  GROUP BY
    date
  ORDER BY
    date ASC
),
india_previous_day_comparison AS (
  SELECT
    date,
    cases,
    LAG(cases) OVER (ORDER BY date) AS previous_day,
    cases - LAG(cases) OVER (ORDER BY date) AS net_new_cases
  FROM
    india_cases_by_date
)
SELECT
  COUNT(date)
FROM
  india_previous_day_comparison
WHERE
  net_new_cases = 0
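To see what LAG contributes here, it may help to replay the same logic on a tiny list. The Python sketch below is an illustration under assumptions: the helper names `lag` and `count_zero_increase_days` are hypothetical, and the sample counts are invented, not taken from the dataset.

```python
from typing import List, Optional


def lag(values: List[int]) -> List[Optional[int]]:
    """Mimic LAG(x) OVER (ORDER BY date): shift the series down by one row."""
    return [None] + values[:-1]


def count_zero_increase_days(cases: List[int]) -> int:
    """Count rows whose net new cases (cases - previous day) equal zero."""
    count = 0
    for today, previous in zip(cases, lag(cases)):
        # The first row has no previous day; like a NULL comparison in SQL,
        # it never satisfies the "= 0" filter.
        if previous is not None and today - previous == 0:
            count += 1
    return count


# Hypothetical cumulative counts, already sorted by date.
sample_cases = [3, 3, 3, 5, 5, 28]
print(count_zero_increase_days(sample_cases))  # prints 3
```

The first row is never counted because it has no previous day to compare against, which matches how the NULL produced by LAG behaves in the WHERE clause.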
Detailed Tutorial of Task 7
Using the Task 6 query as a template, build a query that finds the dates between March 22, 2020 and April 20, 2020 on which confirmed cases in the US increased by more than 10% compared to the previous day.
The output should have four columns named Date, Confirmed_Cases_On_Day, Confirmed_Cases_Previous_Day and Percentage_Increase_In_Cases.
Copy the query below into the query editor and click RUN.
WITH us_cases_by_date AS (
  SELECT
    date,
    SUM(cumulative_confirmed) AS cases
  FROM
    `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE
    country_name = "United States of America"
    AND date BETWEEN '2020-03-22' AND '2020-04-20'
  GROUP BY
    date
  ORDER BY
    date ASC
),
us_previous_day_comparison AS (
  SELECT
    date,
    cases,
    LAG(cases) OVER (ORDER BY date) AS previous_day,
    cases - LAG(cases) OVER (ORDER BY date) AS net_new_cases,
    (cases - LAG(cases) OVER (ORDER BY date)) * 100 / LAG(cases) OVER (ORDER BY date) AS percentage_increase
  FROM
    us_cases_by_date
)
SELECT
  Date,
  cases AS Confirmed_Cases_On_Day,
  previous_day AS Confirmed_Cases_Previous_Day,
  percentage_increase AS Percentage_Increase_In_Cases
FROM
  us_previous_day_comparison
WHERE
  percentage_increase > 10
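The percentage filter can be checked by hand on a short series. The following Python sketch is a rough illustration: `days_with_large_increase` is a hypothetical helper, the counts are invented, and row positions stand in for dates.

```python
from typing import List, Tuple


def days_with_large_increase(
    cases: List[int], threshold_pct: float = 10.0
) -> List[Tuple[int, int, int, float]]:
    """Return (row index, cases, previous-day cases, % increase) above the threshold."""
    rows = []
    for i in range(1, len(cases)):
        previous, today = cases[i - 1], cases[i]
        if previous == 0:
            # BigQuery would raise a division-by-zero error here;
            # this sketch simply skips the row instead.
            continue
        pct = (today - previous) * 100 / previous
        if pct > threshold_pct:
            rows.append((i, today, previous, pct))
    return rows


# Hypothetical cumulative counts, sorted by date.
sample_cases = [1000, 1200, 1260, 1500]
for row in days_with_large_increase(sample_cases):
    print(row)
```

With these sample values, the second row (a 20% jump) and the fourth row (roughly 19%) pass the filter, while the third row (5%) does not.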
Detailed Tutorial of Task 8
Task 8 asks for a query that lists the recovery rates of countries on May 10, 2020, restricted to countries with more than 50K confirmed cases, sorted in descending order and limited to 10 rows. To score full marks, the output columns must be named country, recovered_cases, confirmed_cases and recovery_rate.
Copy the query below into the query editor and click RUN.
WITH cases_by_country AS (
  SELECT
    country_name AS country,
    SUM(cumulative_confirmed) AS cases,
    SUM(cumulative_recovered) AS recovered_cases
  FROM
    `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE
    date = "2020-05-10"
  GROUP BY
    country_name
),
recovered_rate AS (
  SELECT
    country,
    cases,
    recovered_cases,
    (recovered_cases * 100) / cases AS recovery_rate
  FROM
    cases_by_country
)
SELECT
  country,
  cases AS confirmed_cases,
  recovered_cases,
  recovery_rate
FROM
  recovered_rate
WHERE
  cases > 50000
ORDER BY
  recovery_rate DESC
LIMIT 10
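The filter, sort and limit steps of this query can be mirrored in a few lines of Python. This is a sketch under assumptions: `top_recovery_rates` is a hypothetical helper, and the countries and counts are placeholders for illustration only.

```python
from typing import Dict, List, Tuple


def top_recovery_rates(
    totals: Dict[str, Tuple[int, int]], min_cases: int = 50000, limit: int = 10
) -> List[Tuple[str, float]]:
    """totals maps country -> (confirmed_cases, recovered_cases)."""
    rates = [
        (country, recovered * 100 / confirmed)
        for country, (confirmed, recovered) in totals.items()
        if confirmed > min_cases  # same role as WHERE cases > 50000
    ]
    rates.sort(key=lambda item: item[1], reverse=True)  # ORDER BY recovery_rate DESC
    return rates[:limit]  # LIMIT 10


# Placeholder countries and counts, for illustration only.
sample = {"A": (100000, 40000), "B": (60000, 45000), "C": (20000, 19000)}
print(top_recovery_rates(sample))  # prints [('B', 75.0), ('A', 40.0)]
```

Country C has the highest rate in the sample but is dropped because it does not exceed the 50K confirmed-case threshold.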
Detailed Tutorial of Task 9
Task 9 asks for a query that outputs the correct CDGR in the correct format. The CDGR, or Cumulative Daily Growth Rate, is calculated as:
CDGR = ((last_day_cases / first_day_cases) ^ (1 / days_diff)) - 1
where last_day_cases, first_day_cases and days_diff are defined as follows:
last_day_cases is the number of confirmed cases on May 10, 2020
first_day_cases is the number of confirmed cases on Feb 02, 2020
days_diff is the number of days between Feb 02 and May 10, 2020
Copy the query below into the query editor and click RUN.
WITH france_cases AS (
  SELECT
    date,
    SUM(cumulative_confirmed) AS total_cases
  FROM
    `bigquery-public-data.covid19_open_data.covid19_open_data`
  WHERE
    country_name = "France"
    AND date IN ('2020-01-24', '2020-05-10')
  GROUP BY
    date
  ORDER BY
    date
),
summary AS (
  SELECT
    total_cases AS first_day_cases,
    LEAD(total_cases) OVER (ORDER BY date) AS last_day_cases,
    DATE_DIFF(LEAD(date) OVER (ORDER BY date), date, day) AS days_diff
  FROM
    france_cases
  LIMIT 1
)
SELECT
  first_day_cases,
  last_day_cases,
  days_diff,
  POWER(last_day_cases / first_day_cases, 1 / days_diff) - 1 AS cdgr
FROM
  summary
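The CDGR arithmetic is easy to verify with a small worked example. In the Python sketch below, the function name `cdgr` and the case counts are assumptions made for illustration; the two dates are the ones named in the task description (Feb 02 and May 10, 2020).

```python
from datetime import date


def cdgr(first_day_cases: int, last_day_cases: int, days_diff: int) -> float:
    """Cumulative daily growth rate: (last / first) ** (1 / days_diff) - 1."""
    return (last_day_cases / first_day_cases) ** (1 / days_diff) - 1


# Same idea as DATE_DIFF(later_date, earlier_date, day) in the query.
days_diff = (date(2020, 5, 10) - date(2020, 2, 2)).days
print(days_diff)  # prints 98

# Hypothetical counts: a series that quadruples over 2 days grows 100% per day.
print(cdgr(first_day_cases=100, last_day_cases=400, days_diff=2))  # prints 1.0
```

Note that the exponent is 1 / days_diff as a whole, which is why the query passes it as the second argument to POWER rather than dividing afterwards.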
Detailed Tutorial of Task 10
Creating the Data Studio report takes several steps.
1. First, copy the query below into the query editor and click RUN.
SELECT
  date,
  SUM(cumulative_confirmed) AS country_cases,
  SUM(cumulative_deceased) AS country_deaths
FROM
  `bigquery-public-data.covid19_open_data.covid19_open_data`
WHERE
  date BETWEEN '2020-03-15' AND '2020-04-30'
  AND country_name = 'United States of America'
GROUP BY
  date
2. Click EXPLORE DATA > Explore with Data Studio.
3. Give Data Studio access and authorize it to control BigQuery.
If report creation fails the very first time you log in to Data Studio, click the + Blank Report option and accept the Terms of Service. Then go back to the BigQuery page and click Explore with Data Studio again.
4. Create a new time series chart in the new Data Studio report by selecting Add a chart > Time series chart.
5. Add country_cases and country_deaths to the Metric field.
6. Click Save to commit the change.
Congratulations!!
This is the skill badge I got after completing this challenge lab :P
Google Cloud Skill Badge (Image by author)
With this, we have come to the end of this challenge lab. Thanks for reading and following along. I hope you loved it!
My Portfolio and LinkedIn :)
Translated from: https://medium.com/swlh/insights-from-data-with-bigquery-challenge-lab-tutorial-f868992ef9dc