Leverage Cloud Functions and APIs to Monitor Cloud Dataprep Jobs Status in a Google Sheet
If you manage a data and analytics pipeline in Google Cloud, you may want to monitor it and obtain a comprehensive view of the end-to-end analytics process in order to react quickly when something breaks.
This article shows how to capture Cloud Dataprep job statuses through the Dataprep APIs from a Cloud Function, and then write those statuses to a Google Sheet where they are easy to check. Using the same principle, you can combine the statuses of other Google Cloud services in Google Sheets to obtain a comprehensive view of your data pipeline.
To illustrate this concept, we will assume you want to monitor a daily scheduled Dataprep job with a quick look at a Google Sheet to get an overview of potential failure. The icing on the cake is that you will also be able to check the recipe name and jobs profile results in Google Sheets.
This article is a step-by-step guide to the process of triggering Cloud Functions when a Cloud Dataprep job is finished and publishing the job results, status, and direct links into a Google Sheet.
Here is an example of a Google Sheet with job results and links published.
Fig. 2: High-level process to trigger a Cloud Function based on a Cloud Dataprep job execution
1. Getting Started
To make this guide practical, we are sharing the Node.js code for the Cloud Function on GitHub.
You need a valid Google account and access to Cloud Dataprep and Cloud Functions to try it out. You can start from the Google Console https://console.cloud.google.com/ to activate the services.
REMARK: To call APIs, one needs an Access Token. One must be a Google Cloud project owner to generate this Access Token. If you are not a Google Cloud project owner, you can try it out by using a personal Gmail account.
Fig. 3: Get the access token from the Settings menu
2. Create the HTTP Cloud Function to Publish in a Google Sheet
First, we need to create the HTTP Cloud Function that will be triggered as a Webhook when a Dataprep job has finished.
Create a Cloud Function from the Google Cloud console. The trigger type must be “HTTP”. Give it a name and note the URL it receives, which looks similar to https://us-central1-dataprep-premium-demo.cloudfunctions.net/Dataprep-Webhook-Function. You will need this URL later when creating the Webhook in Dataprep. In our example, we select Node.js as the Runtime under the Source Code section and use the code provided above.
Fig. 4: Create the Cloud Function to be called from Dataprep
If you want to explore more about Cloud Functions, check out this tutorial.
The Cloud Function code follows this logic:
Get more information (status, recipe id) about the job with the getJobGroup Dataprep API call. Documentation on this Dataprep API endpoint can be found here: https://clouddataprep.com/documentation/api/#operation/getJobGroup
Get information (name, description) about the job’s recipe with getWrangledDataset Dataprep API call. Documentation on this Dataprep API endpoint can be found here: https://clouddataprep.com/documentation/api/#operation/getWrangledDataset
Job result URL is https://clouddataprep.com/jobs/<jobID>
Job result profile in a PDF format can be downloaded from this URL: https://clouddataprep.com/v4/jobGroups/<jobID>/pdfResults
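The logic above can be sketched roughly as follows. This is a minimal illustration, not the code shared with the article: the helper names (`buildJobLinks`, `dataprepGet`, `collectJobInfo`), the API paths `/v4/jobGroups/<id>` and `/v4/wrangledDatasets/<id>`, the `Authorization: Bearer` header, and the response fields (`status`, `wrangledDataset.id`, `name`, `description`) are assumptions to verify against the Dataprep API documentation linked above. The HTTP client is passed in as `fetchFn` so the sketch can be tried without network access.

```javascript
// Host taken from the links in this article; the API paths are assumptions.
const DATAPREP_BASE = "https://clouddataprep.com";

// Build the two links that end up in the Google Sheet.
function buildJobLinks(jobId) {
  const id = encodeURIComponent(String(jobId));
  return {
    resultUrl: `${DATAPREP_BASE}/jobs/${id}`,
    profilePdfUrl: `${DATAPREP_BASE}/v4/jobGroups/${id}/pdfResults`,
  };
}

// Hypothetical helper: GET a Dataprep API path using the access token.
// fetchFn must behave like the standard fetch(url, options) function.
async function dataprepGet(path, token, fetchFn) {
  const response = await fetchFn(`${DATAPREP_BASE}${path}`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!response.ok) {
    throw new Error(`Dataprep API returned HTTP ${response.status} for ${path}`);
  }
  return response.json();
}

// Look up the job, then its recipe, and gather everything worth publishing.
async function collectJobInfo(jobId, token, fetchFn) {
  const id = encodeURIComponent(String(jobId));
  // Assumed response shape: { status, wrangledDataset: { id } }
  const jobGroup = await dataprepGet(`/v4/jobGroups/${id}`, token, fetchFn);
  const recipeId = jobGroup.wrangledDataset.id;
  // Assumed response shape: { name, description }
  const recipe = await dataprepGet(
    `/v4/wrangledDatasets/${encodeURIComponent(String(recipeId))}`,
    token,
    fetchFn
  );
  return {
    jobId: String(jobId),
    status: jobGroup.status,
    recipeId,
    recipeName: recipe.name,
    recipeDescription: recipe.description,
    ...buildJobLinks(jobId),
  };
}
```

On a Node.js runtime that ships a built-in `fetch`, `fetchFn` could simply be `fetch`; on older runtimes any client with the same call shape would do.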
The Node.js code is here. Edit it and replace the values highlighted in red with the ones you retrieved for your own Cloud Dataprep project.
- Access Token to call Dataprep API:
var DataprepToken = "eyJhjkfryue353lgh12ghjkdfsghk"
- Google Sheet ID where you want to publish the results:
const JobSheetId = "1X63lFIfsdfd3dsfN0wm3SKx-Ro"
To retrieve the Google Spreadsheet ID, follow the explanations here.
- Google API Key:
sheetsAPI.spreadsheets.values.append({key:"AIzaSydfsfsdfLh0qu8q",
To retrieve the Google API Key, follow the explanations here.
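To show where the Sheet ID and API key end up, here is a minimal sketch of the argument that could be passed to `sheetsAPI.spreadsheets.values.append`. The parameter names `spreadsheetId`, `range`, and `valueInputOption` follow the Google Sheets API v4 append method; the helper name `buildAppendRequest`, the `"A1"` range, the column order, and the fields of the `job` object are hypothetical choices for illustration, not the article's code.

```javascript
// Hypothetical helper: builds the argument for sheetsAPI.spreadsheets.values.append.
// The column order and the "A1" range are illustrative assumptions.
function buildAppendRequest(apiKey, sheetId, job, now = new Date()) {
  return {
    key: apiKey,            // Google API Key
    spreadsheetId: sheetId, // Google Sheet ID
    range: "A1",            // append after the last row of the table found at A1
    valueInputOption: "USER_ENTERED",
    requestBody: {
      // One row per finished job.
      values: [[
        now.toISOString(),
        job.jobId,
        job.status,
        job.recipeName,
        job.resultUrl,
        job.profilePdfUrl,
      ]],
    },
  };
}

// Example with placeholder values only.
const request = buildAppendRequest(
  "YOUR_API_KEY",
  "YOUR_SHEET_ID",
  {
    jobId: "123",
    status: "Complete",
    recipeName: "Daily recipe",
    resultUrl: "https://clouddataprep.com/jobs/123",
    profilePdfUrl: "https://clouddataprep.com/v4/jobGroups/123/pdfResults",
  },
  new Date(0)
);
console.log(JSON.stringify(request.requestBody.values));
```

With the `googleapis` client, the result would be passed as `sheetsAPI.spreadsheets.values.append(request)`. Some older client versions may expect the row under `resource` instead of `requestBody`, so check the version you install.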
You also need to add the following dependencies to your Node.js Cloud Function (PACKAGE.JSON tab):
Fig. 6: Node.js dependency packages
You then need to deploy the Cloud Function. After it is deployed, the Cloud Function is running and waiting to be called from Cloud Dataprep when a job is executed. You can learn more here about deploying and executing Cloud Functions.
3. Create a Cloud Dataprep Flow and Configure a Webhook
Next, you need to create the Cloud Dataprep flow that will call the HTTP Cloud Function to publish the job result in Google Sheets.
You need to create and configure a Webhook task in your flow that will call your HTTP Cloud Function.
Fig. 7: Creating a Cloud Dataprep flow and configuring a Webhook task on a flow
The Webhook task needs to be configured with this information:
URL: This is the URL of the HTTP Cloud Function you previously created. For example, https://us-central1-dataprep-premium-demo.cloudfunctions.net/Dataprep-Webhook-Function.
Headers: Add a content-type header with the value application/json, as shown in the screenshot below.
Body: Use the value {"jobid":"$jobId","jobstatus":"$jobStatus"} as shown in the screenshot below.
Trigger event: You can trigger the Webhook for any job status, or only for failed or completed jobs.
Trigger object: You can decide to trigger the Webhook for only specific outputs in the flow, or for any job executed in the flow.
When you have entered this information, you can test your Webhook task that calls your Cloud Function.
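A minimal sketch of how the Cloud Function might read this Webhook body, assuming the Node.js HTTP runtime hands the handler Express-style `req` and `res` objects with the JSON body already parsed. The function names `parseWebhookPayload` and `dataprepWebhook` are hypothetical; only the `jobid` and `jobstatus` keys come from the Body value configured above.

```javascript
// Validate the body sent by the Dataprep Webhook task.
function parseWebhookPayload(body) {
  // Accept either an already parsed object or a raw JSON string.
  const payload = typeof body === "string" ? JSON.parse(body) : body;
  if (!payload || !payload.jobid || !payload.jobstatus) {
    throw new Error("Expected a JSON body with jobid and jobstatus");
  }
  return { jobId: String(payload.jobid), jobStatus: String(payload.jobstatus) };
}

// Hypothetical entry point for the HTTP Cloud Function.
function dataprepWebhook(req, res) {
  try {
    const { jobId, jobStatus } = parseWebhookPayload(req.body);
    // The line below shows up in the Cloud Functions logs.
    console.log(`Dataprep job ${jobId} finished with status ${jobStatus}`);
    // The Dataprep API calls and the Google Sheet update would go here.
    res.status(200).send("ok");
  } catch (err) {
    // Reject malformed requests instead of failing silently.
    res.status(400).send(err.message);
  }
}

// Export the handler when running as a CommonJS module.
if (typeof exports !== "undefined") {
  exports.dataprepWebhook = dataprepWebhook;
}
```

Returning a 400 for a malformed body makes a misconfigured Webhook visible when you test the task, rather than producing an empty row in the sheet.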
Fig. 8: Webhook task parameters to call the Cloud Function
After you save the Webhook task, it is ready to be called when the job is executed.
Fig. 9: Webhook task created
4. Testing the End-to-End Process
You are now ready to test the end-to-end process by running a job from your Dataprep flow and checking that the job result status is added to your Google Sheet.
Fig. 10: Run a Dataprep job
Fig. 11: Job result status and links published in the Google Sheet
Lastly, you can also check proper execution details (API call with the parameter and Cloud Dataprep job status) by reviewing the Google Cloud Functions logs located here.
Fig. 12: Cloud Functions logs
Conclusion
You should now understand the fundamental principles of automatically publishing Dataprep job results in a Google Sheet, so that you can monitor them and easily share summary information with a broader team.
You have learned about
- Cloud Dataprep APIs
- Cloud Dataprep Webhooks
- Cloud Functions calling an API
You can also extend this solution to monitor additional Google Cloud services for end-to-end data pipeline monitoring.
You’re now ready to automate the monitoring of your job status. You can also automate Cloud Dataprep leveraging another Cloud Function or an external scheduler. Take a look at these articles explaining how to orchestrate Cloud Dataprep jobs using Cloud Composer and how to automate a Cloud Dataprep pipeline when a file arrives in Cloud Storage.
Originally published at www.trifacta.com
Translated from: https://towardsdatascience.com/leverage-cloud-functions-and-apis-to-monitor-cloud-dataprep-jobs-status-in-a-google-sheet-b412ee2b9acc