How to build an ethical Data Science System without losing money?

Published: 2023/12/15

Inspired by the article Decolonial AI [1], by Shakir Mohamed and William Isaac of Google DeepMind and Marie-Therese Png, founder of Implikit, and by my own experience and readings in Data Science, I'll propose a production strategy that compensates for the lack of scalable ethics in Data Science Systems by embedding it from the beginning of development, saving the cost of change later.

The Problem

The main problem I'll address might seem obvious.

Data Science does not implement efficient and scalable ethical guidelines. At least, not yet.

But, in my opinion, it could be reformulated as:

Today, Data Science is not Customer Centric.

The reason, which I'll detail throughout the article, is:

When we launch Data-based products, our work might be implicitly motivated solely by optimizing revenue, costs, and human and non-human operational resources, under the facade of enriching the Customer Experience.

This is as complicated to say as it is to tackle.

But I'll explain it and argue throughout this article that, with Architectural principles from Software Engineering, there might be a light at the end of the tunnel.

Without losing money (or our jobs, for the sake of productivity).

The Context

We see it in our daily lives, and it's pretty clear.

Novelties in Data Science and Software Engineering are bringing many innovations and opportunities, but they are also reinforcing social inequalities and dishonesty.

For example, because of social network recommendation systems, radical skepticism is becoming an almost necessary practice for consuming information on the web.

And it shouldn't be like this, because doing so is exhausting, and not everyone is willing to do it.

Those of us who do it are getting tired, and we are already tired from other consumption cycles, personal problems, and the current international healthcare and political crises.

Who, unfortunately, endorsed this situation, specifically in the information domain?

In my opinion, we, the Data practitioners.

Possibly, this is you right now.

I agree, it's not as if we created it, but let's be frank: we've let things get out of control.

You too.

We are acting like Software Engineers once did in enterprises.

Building stable, good practices really cost them, but eventually it paid off. Today they are usually seen as independent professionals with their own practices and ethics, and we actually use almost all of those practices and ethics ourselves.

But this independence does not yet extend to Data practitioners.

We import many practices from Software Engineering, but our dependencies have very different behaviors since we deal with personal data artifacts.

And from what I've seen in other companies, in interviews, and in conversations with friends in the field, the production cycle's train of thought ends up being: "will it pay off for the company or for my team?"

The problem is that we usually consider the customer only at the beginning and at the end of the process.

We should be thinking about them throughout the whole process.

Take, for instance, the recent GPT-3, made by a large company like OpenAI: I didn't see any evidence of ethical practices in its production. The model is too complex and too big to be effectively debiased.

Possibly they’ve used explainability techniques in the model, but is this enough?

Could this prevent GPT-3 from producing sophisticated fake news?

Can underdeveloped countries fight this effectively? Or will they be subjected to the will of an elite that acts according to its own ethics? Is it in the interest of countries at war to find out whether each other's news is fake, or will they use fake news to fuel rage against each other?

Can an underdeveloped country build its own GPT-3-like model to compete with OpenAI?

My point is that maybe we're not thinking enough about the implications of our models.

And while we continue to be educated by governments and their laws, we are not maturing the field enough.

Good professional ethics in the field should transcend our local domain problems, and we should start effectively embedding ethical concerns in our practice and advocating for them.

For starters, just because we use Data from users or talk with them from time to time doesn't mean we really care, or that we are thinking about the user and being Customer Centric.

Maybe that's a mixture of Product Centrism and Direct Marketing, but it isn't Customer Centrism.

For me, being Customer Centric means keeping the Customer in mind throughout the whole production cycle, caring about them and wishing them success, and not only because we use them for production.

We should really learn the impact, positive or negative, of our work, and not turn ethics into a rhetorical tool for positive, humane marketing.

Are we becoming technocrats?

And this rhetoric might be endorsed by the predominant technocratic perspective in the field.

While we overvalue the technical side of Data Science (pure Data as our guide, building complex Machine Learning models), we are not taking our social responsibilities seriously.

Many of us suppose our work is neutral, but as I see it:

Data = People.

Not thinking this way seems backward to me, especially when we're establishing ethical directives: ethics starts to look like a detail of our system, a marketing tool.

You know that Christmas ornament, the one we put on at the very end? That's what we're making of ethics.

But should it be?

I don't think so.

To show the lack of neutrality in our work, and how power underlies every relationship, I'll summarize my point with two subjects that are commonly seen as neutral in society and are closely related to Data Science: Science and Language.

First, Science.

Like any other institution, Science is driven by the engine of interests and desires. There are people who decide what is relevant and what gets published, based on variable criteria.

Kevin C. Elliott and Daniel J. McKaughan described this well in their Philosophy of Science paper “Nonepistemic Values and the Multiple Goals of Science” [2].

In summary, they argue that the growth of science is directed not only by pure, high-quality knowledge but also by non-epistemic values (those not related to knowledge itself), since someone has to approve the definitions of “purity” and “quality”.

Second, Language.

Another example is mathematical language. If we consider mathematics a language for modeling phenomena, then, as with any other model and any group of people who speak a language, it filters information.

We could ask:

Who usually practices mathematics?

What's the ratio of Black women to white men in math academia? What about LGBTQI+ people?

Segmenting by country, how is it in the USA? In Brazil? In Venezuela? In Argentina? In China?

Another dimension of language is regional:

What proportion of Portuguese-language mathematics articles have more than 1,000 citations, compared with English-language ones?

Is English really that “universal”? Or is it “dominant”?

If we assume that English is “universal”, what does that mean for the roughly 95% of Brazilians who don't speak it?

Are they inferior? Or do they lack opportunity and infrastructure?

As we keep questioning, power structures start to reveal themselves, even in language itself.

In the context of Data Science, sure, the poor, the uninformed, and minorities with little power can produce Data Science.

But what are the odds that they will develop their own best practices for their context? Do they have sustainable infrastructure to practice with?

Or will they follow international guidelines that probably don't take them into account, and implement solutions in their context that might do more harm than good in the long run?

For me, it's pretty clear that:

Data Science is far from neutral.

And if we are serious about it, this is something we need to act upon.

If we continue to act and think based only on technical valuation and conception, on pure data, we will inevitably end up excluding others, excluding minorities from our workflow, and producing biased products and experiences.

That's why Ethics and a Customer-Centric philosophy are crucial for a sustainable Data Science practice.

For me, this situation persists today because of two factors:

The way Data practices were built around Operational Research in the corporate context, and how that didn't address the Customer-Centric model;

The way we are implementing our systems in Agile practice today.

1st Reason: “Data Science” for what purpose?

It's possible that we inherited problems left unsolved by another data-based field, Operational Research (OR).

For those who don't know, Operational Research became the main data-based technique during World War II, focused on optimizing resource allocation to win the war.

That is, maximizing or minimizing resources for some specific goal, or, as we might call it, an objective function.

The philosophy of optimal production, with performance independent of circumstances, was really attractive then and later, in the '60s.

It's no surprise that it became a success. Since then, it has remained a powerful tool for optimizing cost-benefit relationships in enterprise production.

Usually, an Operational Research model is structured around three components:

Decision variables: the resource variables we will use to achieve our objectives;
Objective function: a function of the decision variables that we want to minimize or maximize;
Restrictions: the constraints that draw the contour of our problem's solution space.
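In compact form, and assuming the common linear case (restricting some variables to integers turns it into the MILP discussed later), these three components are usually written as follows, where x holds the decision variables, c^T x is the objective function, and A x ≤ b together with x ≥ 0 are the restrictions:

```latex
\begin{aligned}
\max_{x}\ (\text{or } \min_{x})\quad & c^{\top} x \\
\text{subject to}\quad & A x \le b, \\
& x \ge 0.
\end{aligned}
```

Here x and c are n-dimensional vectors, A is an m-by-n matrix and b an m-dimensional vector; each row of A x ≤ b is one restriction, such as available labor hours or raw material.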

There are commercial solutions such as IBM's ILOG CPLEX and the Gurobi solvers; they apply methods suited to each kind of problem, such as Dual Simplex and Interior Point methods, to obtain the optimal solution as efficiently as possible.

The OR workflow, extremely simplified, goes like this:

(OR practitioners, please don't kill me)

First, you model the problem: say, we want the optimal share for certain users in a Revenue Sharing model.

Second, specify the decision variables for attaining optimality.

Third, define the restrictions of the model based on enterprise resources.

Finally, write the model in a solver and press Enter :)

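To make these steps concrete, here is a deliberately tiny, hypothetical example: the products, numbers, and function names are invented for illustration and are not taken from any real Revenue Sharing model. It brute-forces a small grid of integer plans in plain Python instead of calling a commercial solver, so it only shows the shape of the workflow: decision variables, objective function, and restrictions.

```python
from __future__ import annotations

from itertools import product

# Hypothetical toy problem (all names and numbers are illustrative):
# a factory makes two products, A and B, each week.
PROFIT = {"A": 30, "B": 20}    # profit per unit: objective coefficients
HOURS = {"A": 2, "B": 1}       # labor hours needed per unit
MATERIAL = {"A": 1, "B": 2}    # kg of raw material needed per unit
MAX_LABOR_HOURS = 100          # restriction: labor capacity per week
MAX_MATERIAL = 80              # restriction: raw material per week


def profit(a: int, b: int) -> int:
    """Objective function: weekly profit of the plan (a units of A, b of B)."""
    return PROFIT["A"] * a + PROFIT["B"] * b


def feasible(a: int, b: int) -> bool:
    """Restrictions: the plan must fit the enterprise's resources."""
    labor = HOURS["A"] * a + HOURS["B"] * b
    material = MATERIAL["A"] * a + MATERIAL["B"] * b
    return labor <= MAX_LABOR_HOURS and material <= MAX_MATERIAL


def solve(max_units: int = 100) -> tuple[int, int, int]:
    """'Press Enter': exhaustively search a small grid of integer plans.

    Real solvers such as CPLEX or Gurobi use simplex, interior-point and
    branch-and-bound methods instead; brute force only works for toy sizes.
    """
    plans = (
        (a, b)
        for a, b in product(range(max_units + 1), repeat=2)
        if feasible(a, b)
    )
    a, b = max(plans, key=lambda plan: profit(*plan))
    return a, b, profit(a, b)


if __name__ == "__main__":
    a, b, p = solve()
    print(f"Produce {a} x A and {b} x B for a weekly profit of {p}")
```

Running it prints `Produce 40 x A and 20 x B for a weekly profit of 1600`; at that plan both resource restrictions are exactly binding (2·40 + 20 = 100 hours, 40 + 2·20 = 80 kg). Nothing in the model refers to the people doing the work.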

In practice, is this as easy as it looks?

No, far from it. Building an efficient enterprise solution based on MILP (Mixed-Integer Linear Programming) takes time, though it also depends on the problem. I just needed to summarize, so I don't end up writing a full epic.

And well, I don't know what you think, but to me this process is certainly not Customer-Centric.

It's 100% Product-, Capital-, and Enterprise-Centric.

And this was the core of Operational Research.

But how could we make the modeling process more Customer-Centric?

One solution could be enforcing restrictions that consider human health, age, time spent producing, and mental conditions. All of these can be modeled mathematically; we just need to make them part of the solution's development.

When we apply OR to factories and to enterprise productivity optimization, we could include humans and their well-being needs in the restrictions.

But usually, we don't.

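As a sketch of that idea, the hypothetical toy factory below (all names and numbers are invented for illustration) gets one extra restriction protecting its workers: assuming three workers share the load evenly, none may work more than 30 hours a week. The well-being rule is just another linear restriction, so the modeling technique does not change; only what we choose to protect does.

```python
from __future__ import annotations

from itertools import product

# Hypothetical toy factory: a pure enterprise model plus one
# human-centred restriction. All names and numbers are illustrative.
PROFIT = {"A": 30, "B": 20}       # profit per unit
HOURS = {"A": 2, "B": 1}           # labor hours per unit
MATERIAL = {"A": 1, "B": 2}        # kg of raw material per unit
MAX_LABOR_HOURS = 100              # shift capacity per week
MAX_MATERIAL = 80                  # raw material available per week
WORKERS = 3                        # assumed team size
MAX_HOURS_PER_WORKER = 30          # assumed well-being limit per week


def feasible(a: int, b: int, protect_workers: bool) -> bool:
    labor = HOURS["A"] * a + HOURS["B"] * b
    material = MATERIAL["A"] * a + MATERIAL["B"] * b
    ok = labor <= MAX_LABOR_HOURS and material <= MAX_MATERIAL
    if protect_workers:
        # Human restriction: if work is spread evenly, nobody works
        # more than MAX_HOURS_PER_WORKER hours per week.
        ok = ok and labor <= WORKERS * MAX_HOURS_PER_WORKER
    return ok


def best_plan(protect_workers: bool, max_units: int = 100) -> tuple[int, int, int]:
    """Brute-force search over integer plans; returns (a, b, profit)."""
    plans = (
        (a, b)
        for a, b in product(range(max_units + 1), repeat=2)
        if feasible(a, b, protect_workers)
    )
    a, b = max(plans, key=lambda p: PROFIT["A"] * p[0] + PROFIT["B"] * p[1])
    return a, b, PROFIT["A"] * a + PROFIT["B"] * b


if __name__ == "__main__":
    print("enterprise only:", best_plan(protect_workers=False))
    print("with well-being:", best_plan(protect_workers=True))
```

The enterprise-only optimum is (40, 20) with a profit of 1600; with the well-being restriction it becomes (34, 22) with a profit of 1460. Whether that trade-off is acceptable is a business and ethical decision, which is exactly why it can live inside the model rather than outside it.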

In practice, Operational Researchers usually treat data as pure resources.

And Data Scientists who deal with customers should see data as living behavior.

But, not coincidentally, both end up with the same scope today.

Optimizing and scaling processes, reaching the state of the art. But where is the Customer in this process?

It seems to me that today we deal with data as if it were only resources, not behaviors. In the end, we're more or less reproducing the Operational Research way of thinking about data.

We're thinking about speed and optimizing metrics.

We're being agile, but in the wrong sense.

There are two common misinterpretations of the Agile production model:

1. Speed:

Agile means continuous iteration and evolutionary design; it does not necessarily mean producing things fast.

When we think of Data Science as something purely technical, reaching full speed and optimized metrics seems like the pinnacle of our art. But as we've discussed, it's not.

When we combine this Agile misinterpretation with seeing Data Science as something purely technical, good professional ethics toward customers usually becomes a "nice to have", when it should come first.

2. The absence of long-term planning:

Iterations, yes, but how small should they be?

What scope keeps us from over-engineering while not losing track of what really delivers value with professional ethics?

An Architectural perspective might be what is lacking in today's Engineering practices in Data Science. And it might be the secret to reducing the cost of implementing ethical principles in our Data Platform.

2nd Reason: Our Agile is not that agile

From the beginning, the Agile and eXtreme Programming philosophies advocated incremental and continuous development: solving a problem when it emerges, following YAGNI ("You Aren't Gonna Need It") and KISS ("Keep It Simple, Stupid").

This is productive, but it can become a problem when a system has no long-term Architectural guidelines.

And in Data Science in particular, we have very few good architecture references.

Don't we have lots of Data Pipelines and Machine Learning processes?

The way I see it, these are operational pipelines, not Architectural projects.

Agile architectures should be built incrementally, with small and complete iterations, to maximize value delivery and maintain close contact with customers.

This is called evolutionary design, and I think it makes total sense.

A good Software Architect (or Data Architect) should have the system's horizon in mind as early as possible, because the Architecture will guide them through the system's constraints.

Without it, we postpone invisible problems that are usually not measurable at the beginning and that will end up surfacing with harsh costs.

In our case, recent events have shown how Data-based Platforms influenced social behavior during the Coronavirus crisis, in recent elections won with Fake News and automated bots, and in other ways.

These invisible problems are getting big. Even so, because of cost-of-change constraints, enterprises so far either don't want to or can't address them effectively.

A Solution: Understand the customer, then build the System around them

A possible solution might lie at the center of a Software Architecture:

The Domain Layer.

Those who have already studied Clean Architecture, Hexagonal Architecture (Ports and Adapters), and similar models of software systems know that the most stable part of a system is its domain.

The business rules that we programmers and Data Scientists follow are governed by the customer experience, problems, and desires that define the use cases.

The rest of the system is developed around it.

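A minimal Python sketch of that dependency rule, assuming a hypothetical recommendation use case (every class name here is illustrative, not taken from Clean Architecture itself): the domain layer defines the Customer entity, the use case, and an abstract port, while the outer adapter implements the port, so source-code dependencies point inward toward the domain.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass

# ---- Domain layer (innermost, most stable): imports nothing from outer layers ----

@dataclass(frozen=True)
class Customer:
    customer_id: str
    segment: str


class CustomerRepository(ABC):
    """Port: an interface the domain defines and outer layers implement."""

    @abstractmethod
    def get(self, customer_id: str) -> Customer: ...


class RecommendProducts:
    """Use case: a business rule expressed only in domain terms."""

    def __init__(self, customers: CustomerRepository) -> None:
        self._customers = customers

    def execute(self, customer_id: str) -> list[str]:
        customer = self._customers.get(customer_id)
        # Hypothetical rule: premium customers also see the pro plan,
        # and every customer is offered the basic plan.
        catalog = {"premium": ["plan-pro", "plan-basic"]}
        return catalog.get(customer.segment, ["plan-basic"])


# ---- Infrastructure layer (outer, volatile): depends inward on the domain ----

class InMemoryCustomerRepository(CustomerRepository):
    """Adapter: could be swapped for a database or feature-store adapter."""

    def __init__(self, rows: dict[str, str]) -> None:
        self._rows = rows

    def get(self, customer_id: str) -> Customer:
        return Customer(customer_id, self._rows[customer_id])


if __name__ == "__main__":
    repo = InMemoryCustomerRepository({"c1": "premium", "c2": "student"})
    use_case = RecommendProducts(repo)
    print(use_case.execute("c1"))  # ['plan-pro', 'plan-basic']
    print(use_case.execute("c2"))  # ['plan-basic']
```

Swapping the in-memory adapter for a real data source changes nothing in the domain classes, which is what makes them the stable center the rest of the system is built around.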

If the Customer is at the center of the system's domain, then ethics should also live in the Domain Layer of our Architecture.

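To illustrate, here is a hypothetical sketch of ethical rules expressed as domain logic. The Purpose values, the age threshold of 18, and the rule set are assumptions chosen for the example, not a legal or universal standard. The use case that builds an advertising audience has to go through check_data_use, instead of ethics being an optional filter bolted on at the end.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Purpose(Enum):
    SERVICE = "service"                  # delivering what the customer asked for
    PERSONALIZATION = "personalization"
    ADVERTISING = "advertising"


@dataclass(frozen=True)
class Customer:
    customer_id: str
    age: int
    consents: frozenset[Purpose] = field(default_factory=frozenset)


class EthicsViolation(Exception):
    """Raised when a use case would break a domain-level ethical rule."""


def check_data_use(customer: Customer, purpose: Purpose) -> None:
    """Hypothetical ethical business rules living in the domain layer:

    - data may always be used to provide the service itself;
    - minors are never targeted with advertising, even with consent;
    - any other purpose requires explicit consent.
    """
    if purpose is Purpose.SERVICE:
        return
    if purpose is Purpose.ADVERTISING and customer.age < 18:
        raise EthicsViolation("no advertising to minors")
    if purpose not in customer.consents:
        raise EthicsViolation(f"no consent for {purpose.value}")


def build_ad_audience(customers: list[Customer]) -> list[str]:
    """Use case: it cannot skip the rule, because the rule is part of the domain."""
    audience = []
    for customer in customers:
        try:
            check_data_use(customer, Purpose.ADVERTISING)
        except EthicsViolation:
            continue  # excluded by an ethical rule, not by a later filter
        audience.append(customer.customer_id)
    return audience


if __name__ == "__main__":
    customers = [
        Customer("ana", 34, frozenset({Purpose.ADVERTISING})),
        Customer("leo", 16, frozenset({Purpose.ADVERTISING})),
        Customer("bia", 29),
    ]
    print(build_ad_audience(customers))  # ['ana']
```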

That might be the road to a good, sustainable solution in today's production model.

Because when we relegate Customer Centrism and Ethics to a detail (in the sense Robert C. Martin uses in Clean Architecture [3]), we make the system less and less dependent on them, and we are implicitly saying:

This does not matter now; we can defer it.

And this is actually wrong from a professional perspective.

If we tell customers we want to use their data to give them the best experience, while not thinking about them or about the social implications of our work during the process, and use their information and behavior solely to optimize our own production and profit, we are lying.

If we don't make Ethics part of the core of our development cycle, we won't truly deliver on our promise to the customer. We'll just be using them for profit and for their data assets, while they benefit somehow from using our product.

In the near future, someone will quite possibly complain about systems built like this and say:

How did they not think about this? I’ll have to fix this mess somehow…

New Data usage legislation and regulations will come, and we could already be prepared for them.

This resembles what already happened in Software Engineering, in production environments.

Programmers used to postpone detecting bugs and writing tests, building monolithic systems with a bizarre cost of change whenever they needed to refactor, fix bugs, or implement business rules concisely and scalably in production.

Up to a certain point in history, they used to delegate the responsibility, and whoever later had to fix that mess was left frustrated.

Because of these problems, especially in production environments, Test-Driven Development was formalized, and it is still evangelized today.

Programmers embedded professional ethics and anticipation into their practice, which, when well implemented, didn't increase software production time.

You don't ship production code without testing.

Those who are ashamed of their past could say:

We didn't know that. It wasn't a good practice back then…

But I think you probably knew, somehow, just as much as we Data Scientists know now.

It's like we fear the speed of the production engine. And I get it: it's big and quite scary.

It can cost you your job or your salary if you don't go along with it, or if you're not as fast as others think you should be, so you postpone invisible but important things like tests, ethical data usage, and feature engineering.

Eventually, this will not hold. It's not sustainable: the same thing is happening again in Data Science, and the consequences are showing up very quickly.

But if we embed ethical logic in the core of the system, the Domain Layer, and in our core practices, we enforce ethical values not only in our Data Platform but also in our daily practice.

And that could be our strategy.

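Borrowing the TDD analogy, an ethical requirement can be written as a test before any model exists. The rule below is purely hypothetical: the segment names, the use of accuracy as the metric, and the 0.05 threshold are assumptions for illustration. The point is that a violation fails the build the same way a broken unit test would.

```python
from __future__ import annotations

# Hypothetical domain rule under test (illustrative, not a standard):
# a model may go to production only if the accuracy gap between its
# best- and worst-served user segments is at most MAX_GAP.
MAX_GAP = 0.05


def accuracy_gap(accuracy_by_segment: dict[str, float]) -> float:
    """Largest difference in accuracy between any two segments."""
    values = accuracy_by_segment.values()
    return max(values) - min(values)


def can_ship(accuracy_by_segment: dict[str, float]) -> bool:
    return accuracy_gap(accuracy_by_segment) <= MAX_GAP


# Written first, TDD-style: these tests state the ethical requirement
# before any model is trained. pytest collects the test_* functions.
def test_blocks_model_that_underserves_a_segment() -> None:
    assert not can_ship({"urban": 0.91, "rural": 0.78})


def test_allows_model_with_similar_performance() -> None:
    assert can_ship({"urban": 0.90, "rural": 0.88})


if __name__ == "__main__":
    test_blocks_model_that_underserves_a_segment()
    test_allows_model_with_similar_performance()
    print("ethical shipping rule: all tests passed")
```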

Why and how should this work?

The architectural reason is that, ideally, the core of the system is as abstract as it is stable.

This means that most modules depend on the business rules; in our case, all data products depend on the Customer Problem, Use Cases, and Business Rules.

If we include Customer Ethics in this domain, we protect it and make it almost obligatory.

So, in summary, all you need to do is include Customer Ethics in the Domain Layer. Then you'll have an Ethical Data Science Platform.

Once you make Ethics part of the Domain Layer, it's no longer a detail, and there is no escape, because it becomes part of the most stable part of the system.

But how will I unify them?

Create an interface called Ethics, have CustomerEthics implement it, and compose it with Customer? What else?

You could do that, but I don't see the need for it yet; maybe it's a solution I haven't thought through, and it might prove useful in the future.

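For readers who do want to try the design raised in that question, here is a minimal sketch of it: Ethics as an abstract interface, CustomerEthics implementing it, and Customer composed with it. The method name allows, the action strings, and the rule about minors are hypothetical; as noted above, this may be more structure than you need at first.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


class Ethics(ABC):
    """Hypothetical domain interface: any ethical policy the domain enforces."""

    @abstractmethod
    def allows(self, action: str) -> bool: ...


@dataclass(frozen=True)
class CustomerEthics(Ethics):
    """One implementation: what this particular customer has agreed to."""

    consented_actions: frozenset[str]
    is_minor: bool = False

    def allows(self, action: str) -> bool:
        if self.is_minor and action == "targeted_ads":
            return False  # never allowed for minors, whatever the consent
        return action in self.consented_actions


@dataclass(frozen=True)
class Customer:
    """Customer is composed with an Ethics policy instead of inheriting it."""

    customer_id: str
    ethics: Ethics

    def may(self, action: str) -> bool:
        return self.ethics.allows(action)


if __name__ == "__main__":
    alice = Customer("alice", CustomerEthics(frozenset({"recommendations"})))
    print(alice.may("recommendations"))  # True
    print(alice.may("targeted_ads"))     # False
```

Composition keeps the policy swappable: a different Ethics implementation (for example, one driven by regional regulation) can be plugged into the same Customer without touching the entity.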

For now, I think you need to build the system with a culture around the user in mind.

Gather knowledge from the customer, directly and indirectly, understand their pain, and understand how power inequalities might affect them.

Understand the socioeconomic profile of your users and its proportions, and design around it together with product priorities, making ethics part of the Business Rules.

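As one concrete, hypothetical way to turn "understand the proportions" into a Business Rule, the sketch below compares the share of each socioeconomic segment in the user base with its share in a model's training data, and flags segments that are under-represented. The segment labels and the 0.5 tolerance are assumptions, not recommendations.

```python
from __future__ import annotations

from collections import Counter


def segment_shares(segments: list[str]) -> dict[str, float]:
    """Proportion of records that fall into each socioeconomic segment."""
    counts = Counter(segments)
    total = sum(counts.values())
    return {seg: n / total for seg, n in counts.items()}


def underrepresented(
    user_base: list[str], training_data: list[str], tolerance: float = 0.5
) -> list[str]:
    """Segments whose share in the training data is below `tolerance` times
    their share in the real user base (hypothetical business rule).
    A segment missing from the training data counts as share 0."""
    population = segment_shares(user_base)
    sample = segment_shares(training_data)
    return sorted(
        seg
        for seg, share in population.items()
        if sample.get(seg, 0.0) < tolerance * share
    )


if __name__ == "__main__":
    users = ["low"] * 30 + ["middle"] * 50 + ["high"] * 20
    train = ["low"] * 5 + ["middle"] * 60 + ["high"] * 35
    print(underrepresented(users, train))  # ['low']
```

In this toy data the low-income segment is 30% of users but only 5% of the training set, so it is flagged; a team could make such a check a release gate, alongside product priorities.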

This should push the production train of thought to include ethical best practices in the Data system.

When you shift Data Platform development from 100% technical, functional, pure data to something like 50% technical and 50% user profiling and experience, the design starts to change, and so does the way the team thinks.

Put it in the Domain, and the natural evolution of the system should take care of it, given good Data Architecture and Agile done right.

That's how a Data Science System could start to evolve naturally with Ethics, without big changes later, because you made them in the first place.

For references and different use cases, the Decolonial AI article [1] contains a good catalog that might help you ideate and define specific strategies based on this architectural approach.

As enterprise domains vary a lot, understanding the possible biases imposed on users could also help change the Data culture in your company, making its Data-Driven philosophy more mature.

Partnering with a UX team might be very effective for this, since they specialize in User Stories; new strategies might emerge in your company, as Ethical guidelines will differ from Use Case to Use Case.

Conclusions

And that's it. After this long read, I hope I've contributed to the discussion with my point of view on how we can implement an efficient Data Science System without suffering from Ethical concerns later.

The objective is to put Ethics in the Domain Layer of the system's Architecture and build the system around it with responsible Agile development.

Get together with UX researchers, your PMs, and your Data colleagues to understand the user's profile, because your tools interact directly with users.

That's why we should make Data products Customer-Centric.

Data is unstable because it has a bijective relationship with the Customer; that's why we should invest in deeply understanding them.

I think the more we realize this, the more we will mature our practices and the way we are seen professionally.

If you agree or disagree, think there are historical or logical misconceptions in the text, or want to contribute somehow, I'll be glad to discuss: you can e-mail me at victor.souza@passeidireto.com or talk with me on my LinkedIn page.

[1] Shakir Mohamed, Marie-Therese Png, and William Isaac, Decolonial AI: Decolonial Theory as Sociotechnical Foresight in Artificial Intelligence (2020), Philosophy & Technology

[2] Kevin C. Elliott and Daniel J. McKaughan, Nonepistemic Values and the Multiple Goals of Science (2014), Philosophy of Science 81:1, 1–21

[3] Robert C. Martin, Clean Architecture: A Craftsman’s Guide to Software Structure and Design (2017), Prentice Hall

Originally published at: https://towardsdatascience.com/how-to-build-an-ethical-data-science-system-without-losing-money-b5a72015ea8f
