Building a Big Data Platform: How to Build Your Data Platform Like a Product
Over the past few years, many companies have embraced data platforms as an effective way to aggregate, handle, and utilize data at scale. Yet despite their rising popularity, little has been written about what it actually takes to build one successfully.
Barr Moses, CEO & co-founder of Monte Carlo, and Atul Gupte, former Product Manager for Uber’s Data Platform Team, share advice for designing a data platform that will maximize the value and impact of data on your organization.
Your company likes data. A lot. Your boss requested additional headcount this year to beef up your data engineering team (Presto and Kafka and Hadoop, oh my!). Your VP of Data is constantly lurking in your company’s Eng-Team Slack channel to see “how people feel” about migrating to Snowflake. Your CEO even wants to become data-driven, whatever that means. To say that data is a priority for your company would be an understatement.
To satisfy your company’s insatiable appetite for data, you may even be building a complex, multi-layered data ecosystem: in other words, a data platform.
At its core, a data platform is a central repository for all data, handling the collection, cleansing, transformation, and application of data to generate business insights. For most organizations, building a data platform is no longer a nice-to-have but a necessity, with many businesses distinguishing themselves from the competition based on their ability to glean actionable insights from their data, whether to improve the customer experience, increase revenue, or even define their brand.
Much in the same way that many view data itself as a product, data-first companies like Uber, LinkedIn, and Facebook increasingly view data platforms as “products,” too, with dedicated engineering, product, and operational teams. Despite their ubiquity and popularity, however, data platforms are often spun up with little foresight into who is using them, how they’re being used, and what engineers and product managers can do to optimize these experiences.
Whether you’re just getting started or are in the process of scaling one, we share five best practices for avoiding these common pitfalls and building the data platform of your dreams:
Align your product’s goals with the goals of the business
It’s important to align your platform’s goals with the overarching data goals of your business. Image courtesy of John Schnobirch on Unsplash.
For several decades, data platforms were viewed as a means to an end rather than as “the end,” that is, the core product you’re building. In fact, although data platforms powered many services and fueled rich insights for the applications that power our lives, they weren’t given the respect and attention they truly deserved until very recently.
When you’re building or scaling your data platform, the first question you should ask is: how does data map to your company’s goals?
To answer this question, you have to put on your data platform product manager hat. Unlike product managers for specific products or features, a data platform product manager must understand the big picture rather than only area-specific goals, since data feeds the needs of every other functional team, from marketing and recruiting to business development and sales.
For instance, if your business’s goal is to increase revenue (go big or go home!), how does data help you achieve it? As a thought exercise, consider the following questions:
- What services or products drive revenue growth?
- What data do these services or products collect?
- What do we need to do with the data before we can use it?
- Which teams need this data? What will they do with it?
- Who will have access to this data or the analytics it generates?
- How quickly do these users need access to this data?
- What, if any, compliance or governance checks does the platform need to address?
By answering these questions, you’ll have a better understanding of how to prioritize your product roadmap, as well as who you need to build for (often, the engineers) versus design for (the day-to-day platform users, including analysts). Moreover, this holistic approach to KPI development and execution strategy sets your platform up for a more scalable impact across teams.
Gain feedback and buy-in from the right stakeholders
It goes without saying that getting buy-in up front and iterative feedback throughout the product development process are both necessary parts of the data platform journey. What isn’t as widely understood is whose voice you should care about.
Yes, you need the ultimate sign-off from your CTO or VP of Data on the finished product, but their decisions are often informed by their trusted advisors: staff engineers, technical program managers, and other day-to-day data practitioners.
One product manager we spoke with at a leading transportation company spent three months trying to sell her VP of Engineering on her team’s idea for a new data cataloging system, only to have it shut down in a single email by his chief of staff.
Consider different tactics based on the DNA of your company. We suggest following these three concurrent steps:
Apply a customer-centric approach, no matter who you’re talking to. Position the platform as a means of empowering different types of personas in your data ecosystem, including both your data team (data engineers, data scientists, analysts, and researchers) and data consumers (program managers, executives, business development, and sales, to name a few categories). A great data platform will enable the technical users to do their work easily and efficiently, while also allowing less technical personas to leverage rich insights or put together visualizations based on data without much assistance from engineers and analysts.
At the end of the day, it’s important that this experience nurtures a community of data enthusiasts that build, share, and learn together. Since your platform has the potential to serve the entire company, everyone should feel invested in its success, even if that means making some compromises along the way.
Prioritize long-term growth and sustainability vs. short-term gains
Data solutions built with short-term usability in mind are often easier to get off the ground but, over time, end up being more costly than platforms built with sustainability in mind. (Image courtesy of Atul Gupte.)
Unlike other types of products, data platforms don’t succeed simply because they were first to market. Since data platforms are almost exclusively internal tools, we’ve found that the best ones are built with sustainability in mind rather than for feature-specific wins.
Remember: your customer is your company, and your company’s success is your success. This is not to say that your roadmap won’t change several times over (it will), but when you do make changes, do it with growth and maturation in mind.
For instance, Uber’s big data platform was built over the course of five years, constantly evolving with the needs of the business; Pinterest has gone through several iterations of their core data analytics product; and leading the pack, LinkedIn has been building and iterating on its data platform since 2008!
Our suggestion: choose solutions that make sense in the context of your organization, and align your plan with these expectations and deadlines. Sometimes, quick wins that are part of a larger product development strategy can help you achieve internal buy-in, as long as they aren’t shortsighted. Rome wasn’t built in a day, and your data platform won’t be either.
Sign off on baseline metrics for your data and how you measure them
It doesn’t matter how great your data platform is if you can’t trust your data, but data quality means different things to different stakeholders. Consequently, your data platform won’t be successful if you and your stakeholders aren’t aligned on this definition.
To address this, it’s important to set baseline expectations for your data reliability: your organization’s ability to deliver high data availability and health throughout the entire data life cycle. Setting clear Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for software application reliability is a no-brainer, and data teams should do the same for their data pipelines.
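To make the SLO/SLI idea concrete for a pipeline, here is a minimal Python sketch of one possible indicator: data freshness. It assumes a monitor records, at each check, when it looked and when the table was last successfully loaded. The names `FreshnessSLO`, `freshness_sli`, and `evaluate`, the `orders` table, and the one-hour threshold are all hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FreshnessSLO:
    """Hypothetical freshness objective for a single table."""
    table: str
    max_staleness: timedelta  # data older than this at check time counts as stale
    target_ratio: float       # e.g. 0.99 -> fresh in at least 99% of checks


def freshness_sli(check_times, last_load_times, max_staleness):
    """SLI: fraction of checks at which the table was fresh.

    check_times[i] is when a monitor looked at the table, and
    last_load_times[i] is the latest successful load it saw at that moment.
    """
    if not check_times or len(check_times) != len(last_load_times):
        raise ValueError("need matching, non-empty lists of observations")
    fresh = sum(
        1
        for checked, loaded in zip(check_times, last_load_times)
        if checked - loaded <= max_staleness
    )
    return fresh / len(check_times)


def evaluate(slo, check_times, last_load_times):
    """Return (sli, met) for one table's observation window."""
    sli = freshness_sli(check_times, last_load_times, slo.max_staleness)
    return sli, sli >= slo.target_ratio


if __name__ == "__main__":
    t0 = datetime(2021, 1, 1, tzinfo=timezone.utc)
    checks = [t0 + timedelta(hours=h) for h in range(4)]
    # No load lands between 00:00 and 03:00, so the 02:00 check sees 2h-old data.
    loads = [t0, t0, t0, t0 + timedelta(hours=3)]
    slo = FreshnessSLO("orders", timedelta(hours=1), target_ratio=0.99)
    print(evaluate(slo, checks, loads))  # (0.75, False)
```

The SLI is a plain ratio so that different stakeholders can disagree about the target (the SLO) while still agreeing on how the indicator is measured.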
This isn’t to say that all stakeholders will share the same vision of what “good data” looks like; in fact, they probably won’t, and that’s OK. Instead of forcing square pegs into round holes, create a baseline metric of data reliability and, as you would when building a new platform feature, get sign-off on the lowest common denominator.
We suggest choosing a novel measurement (such as one for data downtime) that will help data practitioners across the company align on baseline quality metrics.
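One common formulation of data downtime, associated with Monte Carlo’s writing on the topic and assumed here rather than stated in this article, is: number of incidents × (time to detection + time to resolution). The sketch below computes it from per-incident timestamps. The `Incident` record and the `data_downtime` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """One data incident (hypothetical record shape)."""
    started: datetime   # when the data actually went bad
    detected: datetime  # when a person or monitor noticed
    resolved: datetime  # when the data was correct again

    @property
    def time_to_detection(self) -> timedelta:
        return self.detected - self.started

    @property
    def time_to_resolution(self) -> timedelta:
        return self.resolved - self.detected


def data_downtime(incidents):
    """Sum of (TTD + TTR) over all incidents.

    With N incidents this equals N * (mean TTD + mean TTR), which is the
    "incidents x (time to detection + time to resolution)" formulation.
    """
    return sum(
        (i.time_to_detection + i.time_to_resolution for i in incidents),
        timedelta(0),
    )


if __name__ == "__main__":
    day = datetime(2021, 1, 1)
    incidents = [
        # Noticed after 2h, fixed 1h later: 3h of downtime.
        Incident(day, day + timedelta(hours=2), day + timedelta(hours=3)),
        # Noticed after 30m, fixed 90m later: 2h of downtime.
        Incident(day + timedelta(hours=10),
                 day + timedelta(hours=10, minutes=30),
                 day + timedelta(hours=12)),
    ]
    print(data_downtime(incidents))  # 5:00:00
```

Summing per-incident durations, rather than multiplying a count by separately tracked averages, keeps the metric exact when incidents vary widely in length.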
Know when to build vs. buy
One of the first decisions you have to make is whether to build the platform from scratch or to purchase the technology (or several supporting technologies) from a vendor.
Companies like (you guessed it) Uber, LinkedIn, and Facebook have opted to build their own data platforms, often on top of open source solutions, but that doesn’t always make sense for your needs. There isn’t a magic formula that will tell you whether to build or buy, but we’ve found that there is value in buying until you’re convinced that:
- The product needs to operate on sensitive or classified information (e.g., financial or health records) that cannot be shared with external vendors for regulatory reasons
- Specific customizations are required for it to work well with other internal tools/systems
- These customizations are niche enough that a vendor may not prioritize them
- There is some other strategic value to building vs. buying (e.g., a competitive advantage for the business or a benefit for hiring talent)
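The criteria above can be read as a default-to-buy rule. The sketch below is a hypothetical encoding for illustration, not a formula the authors propose; in particular, treating the two customization points as one combined condition is our interpretation, and the field names are invented.

```python
from dataclasses import dataclass


@dataclass
class BuildVsBuyAnswers:
    """Answers to the four questions above; field names are hypothetical."""
    data_cannot_leave_for_regulatory_reasons: bool
    needs_custom_internal_integration: bool
    customization_too_niche_for_vendor: bool
    strategic_value_in_building: bool


def recommend(a: BuildVsBuyAnswers) -> str:
    """Default to "buy"; switch to "build" only when a criterion is met."""
    if a.data_cannot_leave_for_regulatory_reasons:
        return "build"
    # Our reading: customization only tips the scale if a vendor is
    # unlikely to prioritize it, so the two criteria are combined here.
    if a.needs_custom_internal_integration and a.customization_too_niche_for_vendor:
        return "build"
    if a.strategic_value_in_building:
        return "build"
    return "buy"


if __name__ == "__main__":
    # Needs customization, but a vendor would likely ship it: keep buying.
    print(recommend(BuildVsBuyAnswers(False, True, False, False)))
```

Writing the rule down this way makes the default explicit: absent a concrete reason to build, the answer stays "buy".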
One VP of Data Engineering at a healthcare startup we spoke with noted that in his 20s, he would have wanted to build. But now, in his late 30s, he would almost exclusively buy.
“I get the enthusiasm,” he says, “But I’ll be darned if I have the time, energy, and resources to build a data platform from scratch. I’m older and wiser now — I know better than to NOT trust the experts.”
When it comes to where you spend your time (and, more importantly, your money), it often makes more sense to buy a tried-and-true solution backed by a dedicated team that can help you solve any issues that arise.
What’s next?
Building a data platform is an exciting journey that benefits from a product development perspective. Image courtesy of memegenerator.net.
Building your data platform as a product will help you build greater consensus around data priorities, standardize on data quality and other key KPIs, foster greater collaboration, and, as a result, bring unprecedented value to your company.
In addition to serving as a vehicle for effective data management, reliability, and democratization, a data platform built as a product can help with:
- Guiding sales efforts (giving you insights into where to focus based on how prospective customers are responding)
- Driving application product roadmaps
- Improving the customer experience (helping teams learn what your service’s pain points are, what’s working, and what’s not)
- Standardizing data governance and compliance measures (GDPR, CCPA, etc.) across the company
Building a data platform might seem overwhelming at first blush, but with the right approach, your solution has the potential to become a force multiplier for your entire organization.
Want to learn more about building a reliable data platform? Reach out to Barr Moses and the Monte Carlo Team.
This article was co-written by Barr Moses and Atul Gupte.
Source: https://towardsdatascience.com/how-to-build-your-data-platform-like-a-product-6677e8abe318