The Problem with GPT-3 Reporting
I’ve recently seen a massive number of articles about GPT-3, on Medium and elsewhere. I even wrote one. The language model is a significant development in AI, so it’s only natural that writers want to share their excitement with the world.
Here’s the problem: the ability of GPT-3 — namely the quality of its writing — is often exaggerated by published samples. In fact, there are not one, but two filters keeping the AI’s worst results from wide dissemination.
Selection bias wouldn’t be a problem if any interested reader could access the GPT-3 API and make their own observations of its ability. However, access is currently severely limited. (AI Dungeon is often used to test GPT-3 by those of us without the full version, but its creator has recently outlined ways backdoor access to GPT-3 is being prevented.)
When reporting — and I use that term in its broadest possible interpretation to mean any writing about GPT-3 — is the only source of public information, selection biases ought to be considered in our understanding of the product. Here, I outline the obvious bias, and a less-obvious bias which exacerbates the issue.
1. Writing samples are selected for quality
Say I’m writing an informative piece on GPT-3. I want to demonstrate that it can put together coherent strings of sentences, so I give it a prompt and examine the output.
If I don’t like what I see, I’m likely to try again with a slightly different (perhaps longer) prompt. Even if I’m not actively selecting particular sentences that suit the purpose of my article, massaging the output creates a biased sample of writing that is not representative of GPT-3’s overall quality.
In the context of creating a narrative about the AI, it makes sense to showcase its best work rather than a fair representation of its limitations. This is the first problem.
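This regenerate-until-satisfied workflow is a classic best-of-k selection effect, and its size is easy to see in a toy simulation. The sketch below is purely illustrative (the uniform "quality score" is an assumption standing in for any rating of an output, not real GPT-3 data): a writer who quietly takes the best of five generations publishes samples whose average quality far exceeds the model's true average.

```python
import random

random.seed(0)

def generate_sample():
    """Stand-in for one model generation: a quality score in [0, 1].
    (Illustrative assumption, not a real GPT-3 metric.)"""
    return random.random()

def published_quality(attempts):
    """A writer regenerates `attempts` times and publishes only the best output."""
    return max(generate_sample() for _ in range(attempts))

# True average quality across all generations, published or not.
true_mean = sum(generate_sample() for _ in range(10_000)) / 10_000

# Average quality of what readers actually see (best of 5 tries).
published_mean = sum(published_quality(5) for _ in range(10_000)) / 10_000

print(f"true mean quality:      {true_mean:.2f}")
print(f"published mean quality: {published_mean:.2f}")
```

With uniform scores, the best of five tries averages about 0.83 against a true mean of 0.50, so even mild "massaging" inflates the apparent quality substantially.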
2. The cooler the article, the more views
Consider the case where something does get written about a function GPT-3 cannot perform. It might be a list of writing fails, or code that doesn't compile.
To me, that wouldn’t be an interesting piece, and I suspect it wouldn’t intrigue others either. I’m sure Tweets, Reddit posts, and longer articles detailing GPT-3’s unexpected failures are out there, but the fact of the matter is they’re not getting read.
On the surface, this doesn’t seem like a problem. It definitely isn’t necessary to read about everything that GPT-3 can’t do. The real problem is when positive results are favoured over negative ones for the same task. For example, if someone reported positive results for getting GPT-3 to write a legal document, this would undoubtedly receive more attention than an instance where the AI fails to generate a coherent document.
In essence, the way GPT-3 reporting currently works is analogous to running scientific trials without pre-registration. Publication bias, where statistically insignificant results don’t get published, can cause absurd findings to be accepted as solid research.
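The publication-bias analogy can be made concrete with another small simulation. Suppose, purely as an assumption for illustration, that GPT-3 succeeds at some hard task only 5% of the time, but only the successes are interesting enough to write up. Readers who see only the published attempts infer a success rate of 100%.

```python
import random

random.seed(1)

TRIALS = 1000
SUCCESS_RATE = 0.05  # assumed true rate: the task fails 95% of the time

# Each trial: did this attempt at the task succeed?
results = [random.random() < SUCCESS_RATE for _ in range(TRIALS)]

# Only successes get written up and shared; failures go unpublished.
published = [r for r in results if r]

actual_rate = sum(results) / len(results)
apparent_rate = sum(published) / len(published)  # what readers see

print(f"actual success rate:   {actual_rate:.0%}")
print(f"apparent success rate: {apparent_rate:.0%}")
```

The filter is total: no matter how low the true success rate, the published record shows nothing but successes, which is exactly why unregistered trials can make a weak effect look like solid research.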
To be clear, I don’t think there is an imperative for writers to publish more negative results from GPT-3. There is, however, an obligation to contextualize samples with the way in which they were generated and how many negative results were obtained in the process.
After all, human selection of an AI's output (whether at the level of individual pieces of writing or of how the larger body of work gets consumed) is a combination of our intelligence with that of a computer program, and that's a beautiful thing.
Translated from: https://towardsdatascience.com/the-problem-with-gpt-3-reporting-93c7b5b58400