How to Build an Ethical Data Science System Without Losing Money?
Inspired by the article Decolonial AI, by Shakir Mohamed and William Isaac of Google DeepMind and Marie-Therese Png, founder of Implikit, and by my own experience and reading in Data Science, I'll try to propose a production strategy that compensates for the lack of scalable ethics in Data Science systems and embeds ethics from the beginning of development, saving the cost of change later.
The Problem
The main problem I'll approach might seem obvious.
Data Science does not implement efficient and scalable ethical guidelines. At least, not yet.
But, in my opinion, it could be reformulated as:
Today, Data Science is not Customer-Centric.
The reason, which I'll detail throughout the article, is:
Implicitly, our work might be motivated solely by optimizing revenue, costs, and human and non-human operational resources, under the facade of enriching Customer Experience, when we launch data-based products.
This is as complicated to say as it is to tackle.
But I'll explain it, and argue throughout this article that, with architectural principles from Software Engineering, there might be a light at the end of the tunnel.
Without losing money (or our jobs, for the sake of productivity).
The Context
We can see it in how we live, and it's pretty clear.
Data Science and Software Engineering novelties are bringing lots of innovations and opportunities, but they are also reinforcing social inequalities and dishonesty.
For example, thanks to social network recommendation systems, radical skepticism is becoming an almost necessary practice for consuming information on the web.
And it shouldn't be like this, because doing so is exhausting, and not everyone is willing to do it.
Those of us who do it are getting tired, and we are already tired of other consumption cycles, personal problems, and the current international healthcare and political crisis.
Unfortunately, who endorsed this situation, specifically in the information domain?
In my opinion, we, the Data practitioners.
Possibly, this is you right now.
I agree, it's not like we made it, but let's be frank: we've let things get out of control.
You too.
We are acting like Software Engineers once did in enterprises.
It really cost them to build stable, good practices, but eventually it paid off. Usually, they are seen as independent workers, with their own practices and ethics. Practices and ethics that, actually, we use almost in full.
But this independence does not yet hold for Data practitioners.
We import many practices from Software Engineering, but our dependencies behave very differently, since we deal with personal data artifacts.
And from what I've seen at other companies, in interviews, and talking to friends in the field, the production cycle's train of thought ends up being "Will it pay off for the company or for my team?".
The thing is that we usually put the customer only at the beginning and at the end of the process.
We should be thinking about them throughout the whole process.
Take, for instance, the recent GPT-3, made by an enormous company like OpenAI: I didn't see any evidence of ethical practices in production. The model is too complex and too big to be debiased effectively.
Possibly they've used explainability techniques in the model, but is this enough?
Could this keep GPT-3 from producing sophisticated fake news?
Can underdeveloped countries fight this effectively? Or will they be subjected to the will of an elite that will act according to its own ethics? Is it in the interest of countries at war to discover whether each other's news is fake, or will they use fake news to fuel rage against each other?
Can an underdeveloped country build its own GPT-3 model to compete with OpenAI?
My point is that maybe we're not thinking enough about the implications of our models.
And that while we continue to be educated by governments and their laws, we are not maturing the field enough.
Good professional ethics in the field should transcend our local domain problems, and we should start to effectively embed ethical concerns in our practice and advocate for them.
For starters, just because we use data from users, or talk with them from time to time, doesn't mean we really care, or that we are thinking about the user, that we are being Customer-Centric.
Maybe that's a mixture of Product Centrism and Direct Marketing, but it is not Customer-Centric.
For me, being Customer-Centric is when the Customer is in our mind throughout the whole production cycle, when we embed care and wish them success, and not merely because we use them for production.
We should really learn the impact of our work, positive or negative, and not make ethics a rhetorical tool for positive, humane marketing.
Are we becoming technocrats?
And this rhetoric might be endorsed by the predominant technocratic perspective in the field.
While we excessively value the technical side of Data Science, with pure data as our guide and complex Machine Learning models as our craft, we are not being serious about our social responsibilities.
Many of us suppose our work is neutral, but as I see it:
Data = People.
And not thinking like this seems to me to be a step backward, especially when we're establishing ethical directives. Ethics starts to look like a detail of our system, a marketing tool.
Do you know that Christmas ornament? The one we put on at the very end? That looks like what we're making of ethics.
But shouldn't it be?
I don't think so.
And to showcase the lack of neutrality of our work, and how power underlies every relationship, I'll try to summarize my point with two subjects that are commonly seen as neutral in society and that relate very closely to Data Science: Science and Language.
First, Science.
Like any other institution, science is moved by the engine of interests and desires. There are those who decide what is relevant or not, and what gets published or not, based on variable criteria.
Kevin C. Elliott and Daniel J. McKaughan describe this well in the Philosophy of Science paper "Nonepistemic Values and the Multiple Goals of Science".
In summary, they argue that non-epistemic values (those not related to knowledge itself) also direct the growth of science, not only pure, quality knowledge, since someone has to approve the definition of what "purity" and "quality" are.
Second, Language.
Another example is mathematical language. If we consider mathematics a language, a way of modeling phenomena, then it filters information like any other model, and like any other language it has a group of people who speak it.
We could ask:
Who usually practices mathematics?
What's the ratio of Black women to white men in math academia? And LGBTQI+ people?
Segmenting by country, how is it in the USA? In Brazil? In Venezuela? In Argentina? In China?
Another dimension of language is regional:
What proportion of mathematical articles written in Portuguese have more than 1,000 citations, compared with those written in English?
Is English really that "universal"? Or is it "dominant"?
If we assume that English is "universal", what does that mean for the approximately 95% of Brazilians who don't speak it?
Are they inferior? Or do they lack opportunity and infrastructure?
As we question and question, power structures start to be unveiled, even in language itself.
In the context of Data Science, sure, the poor, the uninformed, and those who are minorities in terms of power can produce Data Science.
But what are the odds that they will create their own best practices for their context? Do they have sustainable infrastructure to practice it?
Or will they follow international guidelines that probably weren't written with them in mind, and implement solutions in their context that might do more harm than good in the long run?
For me, it's pretty clear that:
Data Science is far from neutral.
And this is something we need to act upon if we are serious about it.
If we continue to act and think based only on technical valuation and conception, on pure data, we will inevitably end up excluding others, excluding minorities from our workflow, and producing biased products and experiences.
That's why Ethics and a Customer-Centric philosophy are so important for a sustainable Data Science practice.
For me, this situation persists today because of two factors:
1st Reason: "Data Science" for what purpose?
There is a possibility that we are heirs to problems left unsolved by another data-based field, Operational Research (OR).
For those who don't know, Operational Research emerged during World War II as the main data-based technique in use, focused on optimizing resource allocation to win the war.
Maximizing or minimizing resources for some specific goal, or, as we might call it, the objective function.
The philosophy of optimal production, of performance independent of the circumstances, was really attractive then and later, in the '60s.
No surprise that it became a success. It has been a powerful tool for optimizing cost-benefit relationships in enterprise production ever since.
Usually, an Operational Research model is structured around three elements:
- The decision variables, or the resource variables that we will use to reach our objectives;
- The objective function, usually a function of the decision variables that we want to minimize or maximize;
- The restrictions, which draw the contour of the solution space of our problem.
There are commercial solutions such as IBM's ILOG CPLEX and the Gurobi solvers, which apply specific methods to each kind of problem, such as Dual Simplex, Interior Point methods, and others, to obtain the optimal solution efficiently.
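To make the three elements concrete, here is a minimal sketch in plain Python. It is not how CPLEX or Gurobi work: it simply enumerates every candidate plan, which is only viable for toy problems. The products, profits, and machine hours are made-up assumptions for illustration.

```python
from itertools import product

# Hypothetical toy problem: how many units of products "a" and "b" to make.
PROFIT = {"a": 3, "b": 5}            # objective coefficients (made up)
HOURS_PER_UNIT = {"a": 1, "b": 2}    # machine hours each unit consumes
MACHINE_HOURS = 8                    # restriction: hours available
MAX_UNITS = 6                        # restriction: upper bound per product


def objective(units_a, units_b):
    """Objective function: total profit of a production plan."""
    return PROFIT["a"] * units_a + PROFIT["b"] * units_b


def feasible(units_a, units_b):
    """Restriction: the plan must fit in the available machine hours."""
    used = HOURS_PER_UNIT["a"] * units_a + HOURS_PER_UNIT["b"] * units_b
    return used <= MACHINE_HOURS


def solve():
    """Enumerate all integer values of the decision variables, keep the best."""
    candidates = product(range(MAX_UNITS + 1), repeat=2)
    feasible_plans = [plan for plan in candidates if feasible(*plan)]
    return max(feasible_plans, key=lambda plan: objective(*plan))


best = solve()
print(best, objective(*best))  # (6, 1) 23
```

Notice that nothing in this model mentions a person: the decision variables, the objective, and the restrictions are all about resources.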
The OR workflow, extremely simplified, goes like this:
(please, OR practitioners, don't kill me)
First, you model the problem. Say we want the optimal share for certain users in a Revenue Sharing model.
Second, specify the decision variables for attaining optimality.
Third, define the restrictions of the model based on enterprise resources.
Write the model in a solver, and press enter :)
In practice, is it as easy as it looks?
No, far from it. It takes time to build an efficient enterprise solution based on MILP (Mixed-Integer Linear Programming), though it also depends on the problem. I just needed to summarize so I don't end up writing a full epic.
And well, I don't know what you think, but to me this process is certainly not Customer-Centric.
It is 100% Product, Capital, and Enterprise Centric.
And this was the core of Operational Research.
But how could we make the modeling process more Customer-Centric?
One solution could be to enforce restrictions that consider human health, age, time spent producing, and mental conditions. All of these can be modeled mathematically; we just need to make them part of the development of the solution.
When we apply OR to factories or to enterprise productivity optimization, we could include humans and what they need for their well-being in the restrictions.
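As a rough sketch of what a human-centered restriction could look like, the toy model below assigns weekly hours to two workers and is solved twice: once with only a hard limit, and once with a well-being cap. The worker names, productivity figures, and both limits are hypothetical values chosen for illustration, and the brute-force search stands in for a real solver.

```python
from itertools import product

# Hypothetical workers and their output per hour worked.
OUTPUT_PER_HOUR = {"ana": 4, "bruno": 3}
TOTAL_HOURS_BUDGET = 80   # restriction: hours the plan may allocate in total
HARD_LIMIT = 60           # assumed absolute weekly maximum per person
WELLBEING_CAP = 44        # assumed healthy weekly maximum per person


def plan(max_hours_per_person):
    """Maximize total output by brute force over integer hour assignments."""
    best_plan, best_output = None, -1
    hours = range(max_hours_per_person + 1)
    for ana, bruno in product(hours, repeat=2):
        if ana + bruno > TOTAL_HOURS_BUDGET:
            continue  # violates the resource restriction
        output = OUTPUT_PER_HOUR["ana"] * ana + OUTPUT_PER_HOUR["bruno"] * bruno
        if output > best_output:
            best_plan, best_output = {"ana": ana, "bruno": bruno}, output
    return best_plan, best_output


print(plan(HARD_LIMIT))     # ({'ana': 60, 'bruno': 20}, 300)
print(plan(WELLBEING_CAP))  # ({'ana': 44, 'bruno': 36}, 284)
```

In this made-up case, the well-being cap costs 16 units of output out of 300, and it stops the model from loading 60 hours onto the most productive person. The restriction is one extra number in the model, but someone has to decide that it belongs there.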
But usually, we don't.
In practice, Operational Researchers usually deal with data as pure resources.
And Data Scientists who deal with customers should see data as living behavior.
But, not coincidentally, both end up with the same scope today.
Optimizing, scaling processes, reaching the state of the art. But where is the Customer in this process?
It seems to me that we deal with data today as if it were only resources, not behaviors. In the end, we're reproducing the Operational Research way of thinking about data.
We're thinking about speed, optimizing metrics.
We're being agile, but in the wrong sense.
Usually, there are two common misinterpretations of the Agile production model:
1. Speed:
Agile translates to continuous iteration and evolutionary design; it does not necessarily mean producing things fast.
When we think of Data Science as something purely technical, achieving full speed and metrics optimization looks like the pinnacle of our art. But as we've discussed, it's not.
When we combine this Agile misinterpretation with seeing Data Science as something purely technical, good professional ethics toward customers usually becomes a "nice to have", when it should come up front.
2. The absence of long-term planning:
Iterations, but how small should they be?
What's the right scope, so that we don't over-engineer, yet don't lose track of what really delivers value with professional ethics?
That might be what is lacking in today's engineering practices in Data Science: an architectural perspective. And it might be the secret to reducing the cost of implementing ethical principles in our Data Platform.
2nd Reason: Our Agile is not that agile
Since the beginning, the Agile and eXtreme Programming philosophies have advocated incremental and continuous development. Solve a problem when it emerges: YAGNI and KISS.
That's productive, but it might become a problem when a system has no long-term architectural guidelines.
And in Data Science in particular, we have very few good architecture references.
Don't we have lots of Data Pipelines and Machine Learning processes?
The way I see it, these are operational pipelines, not architectural projects.
Agile architectures should be built incrementally, with small and complete iterations, to maximize value delivery and maintain close contact with customers.
They call it evolutionary design, and I think it makes total sense.
A good Software Architect (or Data Architect) should have a horizon for the system in mind as soon as possible, because the architecture will guide them through the constraints of the system.
If we don't have this, we will postpone invisible problems that are usually not measurable at the beginning and that end up surfacing at a harsh cost.
In our case, recent events have shown how data-based platforms influenced social behavior during the Coronavirus crisis, how recent elections were won with the help of fake news and automated bots, and other consequences.
These invisible problems are getting big. But even so, enterprises still don't want to, or can't, approach those problems effectively, because of cost-of-change constraints.
So
A Solution: Understand the customer, then build the System around them
A possible solution might be based on the center of a Software Architecture:
The Domain Layer.
Those who have already studied the Clean Architecture model of software systems, Hexagonal, Ports and Adapters, etc., know that the most stable part of the system is its domain.
The business rules that we programmers and Data Scientists follow are ruled by the customer experience, problems, and desires that define the use cases.
The rest of the system is developed around it.
If the Customer is at the center of the system's domain, then ethics should also be in the Domain region of our architecture.
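One way to picture this is the small sketch below, written in the spirit of Clean Architecture rather than taken from any specific codebase. The entity fields, the two ethical rules, and the names `assert_may_personalize`, `recommend`, and `dummy_model` are all hypothetical. The point is only the direction of the dependencies: the model is a replaceable detail, while the ethical rule sits in the domain and is called by the use case.

```python
from dataclasses import dataclass

# ---- Domain layer: no frameworks, databases, or ML libraries in here ----


@dataclass(frozen=True)
class Customer:
    customer_id: str
    consented_to_personalization: bool
    is_minor: bool


class EthicsViolation(Exception):
    """Raised when a use case would break a domain-level ethical rule."""


def assert_may_personalize(customer: Customer) -> None:
    # Hypothetical ethical business rules, written as plain domain logic.
    if not customer.consented_to_personalization:
        raise EthicsViolation("customer did not consent to personalization")
    if customer.is_minor:
        raise EthicsViolation("no behavioral targeting of minors")


# ---- Use case: depends only on the domain and on an injected model ----


def recommend(customer: Customer, model) -> list:
    assert_may_personalize(customer)   # the rule runs before any model call
    return model(customer.customer_id)


# ---- Outer layer (a detail): any model can be plugged in here ----


def dummy_model(customer_id: str) -> list:
    return ["item-1", "item-2"]


print(recommend(Customer("c1", True, False), dummy_model))
```

Swapping `dummy_model` for a real recommender would not touch the rule, while changing the rule would affect every use case that depends on the domain.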
That might be the road to a good, sustainable solution in today's production model.
Because when we treat Customer Centrism and Ethics as a detail, as Robert Martin puts it in Clean Architecture, we make the system depend on them less and less, and that means we are implicitly saying:
This does not matter now, we can delegate.
And this is actually wrong from a professional perspective.
By telling customers that we want to make their experience the best with their data, while we are not thinking about them in the process or about the social implications of our work, and while we use their information and behavior solely for our own production and profit optimization, we are lying.
If we don't make Ethics part of the core of our development cycle, we won't really make our proposal to the customer concrete. We are just using them for profit, for their data assets, and they benefit somehow while they use our product.
Sometime in the near future, it's quite possible that someone will complain about systems built like this and say:
How did they not think about this? I'll have to fix this mess somehow…
New data usage legislation and regulations will come, and we could already be prepared for them.
This is somewhat similar to what already happened with Software Engineering in production environments.
Programmers were postponing bug detection and test implementation, and building monolithic systems that had an absurd cost of change whenever they needed to refactor, fix bugs, or implement business rules concisely and scalably in production.
Until a certain point in history, they used to delegate the responsibility, and someone else, the one who was going to fix that mess, would feel like this:
And because of these problems, especially in production environments, Test-Driven Development was formalized and is evangelized to this day.
They embedded professional ethics and anticipation in their practice, which, if well implemented, didn't increase software production time.
You don't ship production code without testing.
Those who are ashamed of their past could say:
We didn't know that. It was not a good practice then…
But I think you probably knew somehow, as much as we Data Scientists know.
It's like we fear the speed of the production engine. And I get it, it's quite scary and big.
It can take your job and your salary if you don't go along with it, if you're not as fast as they think you should be, so you postpone invisible but important things like tests, ethical data usage, and feature engineering.
Eventually, this will not hold. It's not sustainable; the same thing is happening again in Data Science, and the consequences are showing up very quickly.
But if we embed the ethical logic in the core of the system, the Domain Layer, and in our core practices, we enforce ethical values not only in our Data Platform but in our daily practice.
And that could be our strategy.
Why and how should this work?
The architectural reason is that, ideally, the core of the system is as abstract as it is stable.
That means that most of the modules depend on the business rules. In our case, all data products depend on the Customer Problem, Use Cases, and Business Rules.
If we include Customer Ethics in this domain, we protect it and make it almost obligatory.
So, in summary, all you need to do is include Customer Ethics in the Domain Layer. Then we'll have an Ethical Data Science Platform.
Once you make Ethics part of the Domain Layer, it's not a detail anymore and there is no escape, because it will belong to the most stable part of the system.
But how will I unify them?
Create an interface called Ethics, have CustomerEthics implement it, and compose it with Customer? What else?
You could do that, but I don't see the need for it yet. Maybe it's a solution I haven't thought through, and it might be good in the future.
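For readers who want to see what that interface option might look like, here is one possible shape of it, offered as a sketch and not as a recommendation. Only the names `Ethics`, `CustomerEthics`, and `Customer` come from the paragraph above; the `allows` method, the idea of "purposes", and `may_use_data_for` are assumptions added for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class Ethics(ABC):
    """Interface: answers whether data may be used for a given purpose."""

    @abstractmethod
    def allows(self, purpose: str) -> bool:
        ...


@dataclass
class CustomerEthics(Ethics):
    # Hypothetical rule: an explicit allow-list of data-usage purposes.
    allowed_purposes: set = field(default_factory=set)

    def allows(self, purpose: str) -> bool:
        return purpose in self.allowed_purposes


@dataclass
class Customer:
    customer_id: str
    ethics: Ethics   # composition: every Customer carries its ethics

    def may_use_data_for(self, purpose: str) -> bool:
        return self.ethics.allows(purpose)


customer = Customer("c1", CustomerEthics({"recommendations"}))
print(customer.may_use_data_for("recommendations"))  # True
print(customer.may_use_data_for("ad_targeting"))     # False
```

Because `Customer` cannot be constructed without an `Ethics` object, any code that handles a customer has the rules at hand. Whether that is worth the extra abstraction is exactly the open question raised above.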
For now, my view is that you need to construct the system while thinking about building a culture around the user.
Gather knowledge from the customer, directly and indirectly, understand their pain, and understand how power inequalities might affect them.
Understand the socioeconomic profile of your users and its proportions, and design alongside product priorities, making ethics part of the Business Rules.
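As one small, concrete reading of "understand the proportions", the sketch below compares the share of each group in the user base with its share in the data a model is built from, and flags groups that fall short. The field name `income`, the sample records, and the 10-point tolerance are invented for illustration; a real check would need categories and thresholds agreed with the people who know the users.

```python
from collections import Counter


def proportions(records, key):
    """Share of each group (by `key`) among the given records."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(user_base, training_sample, key, tolerance=0.10):
    """Groups whose share in the sample trails their share in the user base."""
    base = proportions(user_base, key)
    sample = proportions(training_sample, key)
    return sorted(
        group
        for group, share in base.items()
        if share - sample.get(group, 0.0) > tolerance
    )


# Made-up data: 50% of users are low income, but only 10% of the sample is.
users = [{"income": "low"}] * 5 + [{"income": "middle"}] * 3 + [{"income": "high"}] * 2
sample = [{"income": "low"}] * 1 + [{"income": "middle"}] * 4 + [{"income": "high"}] * 5

print(underrepresented(users, sample, "income"))  # ['low']
```

A rule like "no group may be flagged here before a model ships" is an example of ethics written as a Business Rule.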
This should push the production train of thought to include the best ethical practices in the Data system.
By shifting Data Platform development from 100% technical, functional, pure data to 50% technical and 50% user profiling and experience (or something like that), the design starts to change, and the way the team thinks will start to change too.
Put it in the Domain, and the natural evolution of the system should take care of it, given good Data Architecture and Agile done right.
That's how a Data Science System could start to evolve naturally with Ethics, without having to implement big changes later: you already made them in the first place.
For references and different use cases, the Decolonial AI article has a good catalog that might help you ideate and define specific strategies based on this architectural approach.
As enterprise domains vary a lot, understanding the possible biases imposed on users could also help change the data culture in your company, making the Data-Driven philosophy more mature.
Teaming up with a UX team would probably be very effective for this, since they are specialists in User Stories. New strategies might emerge in your company, since ethical guidelines will differ from Use Case to Use Case.
Conclusions
And that's it. After a long read, I hope I have contributed to the discussion with my point of view on how we can implement an efficient Data Science System without suffering later from ethical concerns.
The objective is to put Ethics in the Domain Layer, from an architectural perspective of the system, and build the system around it with responsible Agile development.
Get together with UX researchers, your PMs, and your Data fellows to understand the profile of the user, because your tools interact directly with them.
That's why we should
Make Data products Customer-Centric.
Data is unstable because it has a bijective relationship with the Customer; that's why we should invest in deeply understanding them.
The more we realize this, the more I think we will mature our practices and how we are seen professionally.
If you agree, disagree, think there are any historical or logical misconceptions in the text, or want to contribute somehow, I'll be glad to discuss. You can e-mail me at victor.souza@passeidireto.com, or talk with me on my LinkedIn page.
[1] Mohamed, S., Png, M. & Isaac, W., Decolonial AI: Decolonial Theory as Sociotechnical Foresight in Artificial Intelligence (2020), Philos. Technol.
[2] Kevin C. Elliott and Daniel J. McKaughan, Nonepistemic Values and the Multiple Goals of Science (2014), Philosophy of Science 81:1, 1–21
[3] Robert C. Martin, Clean Architecture: A Craftsman's Guide to Software Structure and Design (2017), Prentice Hall
Translated from: https://towardsdatascience.com/how-to-build-an-ethical-data-science-system-without-losing-money-b5a72015ea8f