4 Essential Tactics for Managing a Great Distributed Data Team
COVID-19 has forced nearly every organization to adapt to a new workforce reality: distributed teams. We share four key tactics for turning your remote data team into a force multiplier for your entire company.
It’s month 6 (or is it 72? It’s hard to tell) of the global pandemic, and despite the short commute from your bedroom to the kitchen table, you’re still adjusting to this new normal.
Your team is responsible for all the same tasks (handling ad-hoc queries, fixing broken pipelines, implementing new rules and logic, etc.), but troubleshooting broken data has only gotten harder. It’s difficult enough to identify the root cause of a data downtime incident when you’re all 5 feet away from each other; it’s 10 times harder when you’re working in different time zones.
Distributed teams aren’t novel; in fact, they’ve become increasingly common over the last few decades. Working during a pandemic, however, is new for everyone. While this shift widens the geographic talent pool, collaborating at this scale entails unforeseen hurdles, particularly when it comes to working with real-time data.
Your daily standup only gets you so far.
Here are 4 essential steps to managing a great distributed data team:
Document all the things
Information about which tables and columns are “good or bad” breaks down when teams are distributed. One data scientist we spoke with at a leading e-commerce company told us that it takes 9 months of working on a team to develop a spidey-sense for what data lives where, which tables are the ‘right’ ones, and which columns are healthy vs. experimental.
The answer? Consider investing in a data catalog or lineage solution. Such technologies provide one source of truth about a team’s data assets, and make it easy to understand formatting and style guidelines for data input. Data catalogs become particularly important when data governance and compliance come into play, which is top of mind for data teams in financial services, healthcare, and many other industries.
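To make the idea of a single source of truth concrete, here is a minimal sketch of what one catalog entry might record. It is not the data model of any particular catalog product: the `TableEntry`, `ColumnEntry`, and `Status` names, the `analytics.orders` table, and its columns are all hypothetical, chosen only to show ownership, lineage, and a "healthy vs. experimental" flag living in one place.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Hypothetical health labels a team might agree on."""
    HEALTHY = "healthy"
    EXPERIMENTAL = "experimental"
    DEPRECATED = "deprecated"


@dataclass
class ColumnEntry:
    name: str
    description: str
    status: Status = Status.HEALTHY


@dataclass
class TableEntry:
    name: str
    owner: str
    description: str
    # Lineage: the tables this one is built from.
    upstream: list[str] = field(default_factory=list)
    columns: list[ColumnEntry] = field(default_factory=list)

    def columns_with_status(self, status: Status) -> list[str]:
        """Return the names of columns carrying the given status."""
        return [c.name for c in self.columns if c.status is status]


# Illustrative entry; the table, owner, and columns are made up.
orders = TableEntry(
    name="analytics.orders",
    owner="data-platform",
    description="One row per completed order.",
    upstream=["raw.orders", "raw.payments"],
    columns=[
        ColumnEntry("order_id", "Primary key."),
        ColumnEntry("total_usd", "Order total in US dollars."),
        ColumnEntry("churn_score", "Prototype model output.", Status.EXPERIMENTAL),
    ],
)

experimental = orders.columns_with_status(Status.EXPERIMENTAL)
```

A new teammate in another time zone can then ask the catalog which columns are experimental instead of waiting for a colleague to wake up.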
Set SLAs and SLOs for data
It’s important to ensure alignment not just among data team members but with data consumers (e.g., marketing, executives, or operations teams), too. To do so, we suggest taking a page out of the site reliability engineering book and agreeing on clear service level agreements (SLAs) and service level objectives (SLOs) for data. SLAs that set expectations around data freshness, volume, and distribution, as well as the other pillars of observability, will be crucial here.
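As an illustration of what a freshness and volume objective could look like once written down, the sketch below checks a table against two thresholds. Everything here is an assumption made for the example: the `TableSLO` and `check_slo` names, the six-hour and 1,000-row thresholds, and the idea that your warehouse metadata can supply a last-load timestamp and a row count. Timestamps are kept in UTC so that teammates in different time zones read the same result.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TableSLO:
    """Hypothetical objective agreed with data consumers."""
    table: str
    max_staleness: timedelta  # freshness: how old the latest load may be
    min_rows: int             # volume: smallest acceptable row count


def check_slo(
    slo: TableSLO,
    last_loaded_at: datetime,
    row_count: int,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return a human-readable message for each objective that is missed."""
    now = now or datetime.now(timezone.utc)
    violations = []

    staleness = now - last_loaded_at
    if staleness > slo.max_staleness:
        violations.append(
            f"{slo.table}: freshness missed, last load was {staleness} ago "
            f"(limit {slo.max_staleness})"
        )

    if row_count < slo.min_rows:
        violations.append(
            f"{slo.table}: volume missed, {row_count} rows "
            f"(minimum {slo.min_rows})"
        )

    return violations


# Illustrative values only.
slo = TableSLO("analytics.orders", max_staleness=timedelta(hours=6), min_rows=1000)
now = datetime(2020, 9, 1, 12, 0, tzinfo=timezone.utc)

on_time = check_slo(slo, datetime(2020, 9, 1, 9, 0, tzinfo=timezone.utc), 5000, now)
late_and_small = check_slo(slo, datetime(2020, 8, 31, 23, 0, tzinfo=timezone.utc), 200, now)
```

In this sketch `on_time` is empty, while `late_and_small` contains one freshness message and one volume message. The value lies less in the code than in the fact that the thresholds are explicit and were agreed with the people who consume the data.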
Katie Bauer, a Data Science Manager at Reddit, suggests distributed data teams maintain a central document with expected delivery dates for important projects, and review that document weekly.
“Instead of pinging my team for updates throughout the week when questions arise from stakeholders, I can easily visit this document for answers,” she said. “This keeps us focused on delivering our work and avoids unnecessary diversions.”
Invest in self-serve tooling
Investing in self-serve data tools (including cloud warehouses like Snowflake and Redshift, as well as data analytics solutions, like Mode, Tableau, and Looker) will streamline data democratization no matter the location or persona of the data user.
Similarly, self-serve version control systems help everyone stay on the same page when collaborating on larger workflows, which becomes extremely important when leveraging real-time data across time zones.
Prioritize data reliability
Industries that are responsible for managing PII and other sensitive customer information, like healthcare and financial services, have a low tolerance for mistakes. Data teams need confidence that data is secure and accurate across their pipeline, from consumption to output. The right processes and procedures around data reliability can prevent data downtime incidents and restore trust in your data.
For many years, data quality monitoring was the primary way in which data teams caught broken data, but this isn’t cutting it anymore, particularly when real-time data and distributed teams are the norm. Our remote-first world calls for a more comprehensive solution that can seamlessly track the five pillars of data observability and other important data health metrics tailored to the needs of your organization.
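One of the health metrics mentioned above, distribution, can be illustrated with a very small check: compare the share of missing values in today's load against a known-good baseline. This is a sketch under stated assumptions, not how any observability product works. The `null_rate` and `null_rate_drifted` helpers, the sample values, and the 5-percentage-point tolerance are all hypothetical.

```python
from typing import Optional, Sequence


def null_rate(values: Sequence[Optional[float]]) -> float:
    """Fraction of values that are missing (None); 0.0 for an empty column."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def null_rate_drifted(
    baseline: Sequence[Optional[float]],
    current: Sequence[Optional[float]],
    tolerance: float = 0.05,
) -> bool:
    """True when the null rate moved by more than `tolerance` from the baseline."""
    return abs(null_rate(current) - null_rate(baseline)) > tolerance


# Illustrative samples of one numeric column.
baseline = [10, 12, None, 11, 9, 10, 13, 12, 11, 10]      # 1 of 10 missing
current = [10, None, None, 11, None, 10, None, 12, 11, 10]  # 4 of 10 missing

drifted = null_rate_drifted(baseline, current)
```

Here `drifted` is `True`, because the null rate rose from 10% to 40%. A check like this runs without anyone being online, which is the point for a team spread across time zones.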
Remember: it’s OK to not be OK
We hope these tips help you accept and even embrace the data world’s new normal.
On top of this more tactical advice, however, it never hurts to remember that it’s OK to not be OK. Emilie Schario, GitLab’s first data analyst who is now an internal strategy consultant, put it best: “This is not normal remote work. What it takes to be successful during a period of forced remote work in a global pandemic is different from what it means to be remote-as-usual.”
We’d love to hear your advice for leading distributed teams! Reach out to Barr Moses with your words of wisdom.
This article was written by Will Robins & Barr Moses.
Translated from: https://towardsdatascience.com/4-essential-tactics-for-managing-a-great-distributed-data-team-e7df9f85e6fa