A list of things you can build with language models - not just GPT-3
Natural language processing (NLP) is everywhere lately, with OpenAI’s GPT-3 generating as much hype as we’ve ever seen from a single model.
As I’ve written about before, the flood of projects being built on GPT-3 is down not just to its computational power, but to its accessibility. The fact that it was released as a simple API means that anyone who can query an endpoint can use state-of-the-art machine learning.
Experiencing machine learning as “just another web service” has opened the eyes of many engineers, who previously experienced machine learning as an arcane, unapproachable field.
將機(jī)器學(xué)習(xí)體驗(yàn)為“僅僅是另一個(gè)Web服務(wù)”已經(jīng)打開了許多工程師的視野,他們以前曾將機(jī)器學(xué)習(xí)視為一個(gè)不可思議的,不可接近的領(lǐng)域。
Suddenly, machine learning is something you can build things with.
And while GPT-3 is an incredible accomplishment, it’s far from the only impressive language model in the world. If you’re interested in machine learning engineering, my goal in this article is to introduce you to a number of open source language models you can use today to build software, using some of the most popular ML applications in the world as examples.
Before we start, I need to give a little context to our approach.
Deploying a model as an API for realtime inference
What made GPT-3 so accessible to engineers was that to use it, you just queried an endpoint with some text, and it sent back a response. This on-demand, web service interface is called realtime inference.
In the case of GPT-3, the API was deployed for us by the team at OpenAI. However, deploying a model as an API on our own is fairly trivial, with the right tools.
對(duì)于GPT-3,OpenAI團(tuán)隊(duì)已為我們部署了API。 但是,使用合適的工具將模型本身作為API部署是相當(dāng)瑣碎的。
We’re going to use two main tools in these examples. First is Hugging Face’s Transformers, a library that provides a very easy-to-use interface for working with popular language models. Second is Cortex, an open source machine learning engineering platform I maintain, designed to make it as easy as possible to put models into production.
To deploy a model as an API for realtime inference, we need to do three things.
First, we need to write the API. With Cortex’s Predictor interface, our API is just a Python class with an __init__() function, which initializes our model, and a predict() function, which does the predicting. It looks something like this:
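A minimal sketch of such a Predictor: the `PythonPredictor` class name and method signatures follow Cortex’s documented interface, while the model itself is a stand-in lambda so the sketch runs on its own.

```python
# A minimal Cortex Predictor. The class name and method signatures follow
# Cortex's PythonPredictor interface; the "model" is a stand-in lambda so
# this sketch is self-contained.
class PythonPredictor:
    def __init__(self, config):
        # Runs once at startup: load the model into memory here.
        self.model = lambda text: text.upper()  # stand-in for a real model

    def predict(self, payload):
        # Runs per request: payload is the parsed JSON request body.
        return self.model(payload["text"])
```

In a real deployment, `__init__()` would download weights and move the model to the configured device, and `predict()` would run actual inference.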
Cortex will then use this to create and deploy a web service. Under the hood, it’s doing a bunch of things with Docker, FastAPI, Kubernetes, and various AWS Services, but you don’t have to worry about the underlying infrastructure (unless you want to).
One thing Cortex needs to turn this Python API into a web service, however, is a configuration file, which we write in YAML:
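A sketch of that configuration file; the top-level keys follow Cortex’s API spec, while the filename `predictor.py` is an assumption.

```yaml
# cortex.yaml - deployment config for the API above.
# Key names follow Cortex's API spec; predictor.py is an assumed filename.
- name: text-generator
  predictor:
    type: python
    path: predictor.py
  compute:
    cpu: 1
```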
Nothing too crazy. We give our API a name, tell Cortex where to find the Python API, and allocate some compute resources, in this case, one CPU. You can configure in much more depth if you’d like, but this will suffice.
Then, we run $ cortex deploy using the Cortex CLI, and that’s it. Our model is now a functioning web service, a la GPT-3:
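Querying the deployed API then looks like any other web service call. The endpoint below is a placeholder; the Cortex CLI prints the real URL for your cluster after deploying.

```shell
# Query the deployed text generator like any other web service.
# The endpoint is a placeholder; the Cortex CLI prints the real URL.
API_URL="http://localhost:8888/text-generator"
PAYLOAD='{"text": "machine learning is"}'

# --max-time keeps the sketch from hanging, and the fallback message
# fires when no API is actually running at the placeholder URL.
curl -s --max-time 5 -X POST "$API_URL" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "no API running at $API_URL"
```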
This is the general approach we will take to deployment throughout this list, though the emphasis will be on the models themselves, the tasks they’re suited to, and the projects you can build with them.
1. Gmail Smart Compose
Smart Compose is responsible for those eerily-accurate email suggestions Gmail throws out while you type:
Smart Compose負(fù)責(zé)在您鍵入時(shí)Gmail拋出的那些錯(cuò)誤準(zhǔn)確的電子郵件建議:
(Image source: Google’s The Keyword)

Even though Smart Compose is the result of huge budgets and engineering teams, you can build your own version in a couple of hours.
架構(gòu)智能撰寫 (Architecting Smart Compose)
Architecturally, Smart Compose is a straightforward example of realtime inference:
從結(jié)構(gòu)上講,Smart Compose是實(shí)時(shí)推斷的簡(jiǎn)單示例:
- As you type, Gmail pings a web service with the text of your email chain.
- The web service feeds the text to a model, predicting the next sequence.
- The web service delivers the predicted text back to the Gmail client.
The biggest technical challenge to Smart Compose is actually latency. Predicting a probable sequence of text is a fairly routine task in ML, but delivering a prediction as fast as someone types is much harder.
To build our own Smart Compose, we’ll need to select a model, deploy it as an API, and build some kind of text editor frontend to query the API, but I’ll leave that last part to you.
要構(gòu)建自己的Smart Compose,我們需要選擇一個(gè)模型,將其部署為API,并構(gòu)建某種文本編輯器前端來查詢API,但我將把最后一部分留給您。
構(gòu)建文本預(yù)測(cè)API (Building a text prediction API)
Let’s start by picking a model. We need one that is accurate enough to generate good suggestions from potentially very little input. We also, however, need one that can serve predictions quickly.
Now, latency isn’t all about the model—the resources you allocate to your API (GPU vs. CPU, for example) play a major role—but the model itself is still important.
There are a bunch of models capable of doing text generation, but for this task, we’re going to use DistilGPT-2, via Hugging Face’s Transformers library.
GPT-2 is, shockingly, the predecessor to GPT-3. Until GPT-3’s release, it was widely regarded as the best model for text generation. The tradeoff with GPT-2, however, is performance. It’s really big—like 6 GB—and even with GPUs, it can be slow in generating predictions. DistilGPT-2, as the name suggests, is a distilled version of GPT-2. It retains most of GPT-2’s accuracy while running roughly twice as fast (according to Hugging Face, it can run on an iPhone 7).
We can write a prediction API for DistilGPT-2 in barely 15 lines of Python:
Most of that should be intuitive.
其中大多數(shù)應(yīng)該是直觀的。
We initialize our predictor in __init__(), wherein we declare our device (in this case a CPU, but you can change that to GPU), load our model into memory, and load our tokenizer. A tokenizer encodes text into tokens the model can understand, and decodes predictions into text we can understand.
Then, our predict() function handles requests. It tokenizes our request, feeds it to the model, and returns a decoded prediction.
Once you deploy that API with Cortex, all you need to do is connect it to your frontend. Under the hood, that’s all Smart Compose is. A single text generator, deployed as an API. The rest is just normal web development.
2. Siri-esque question answering
Virtual assistants, from Siri to Alexa to Google Assistant, are ubiquitous. And while many actually rely on machine learning for multiple tasks—speech-to-text, voice recognition, text-to-speech, and more—they all have one core task in common:
Question answering.
For many, question answering is one of the more sci-fi-seeming ML tasks, because it fits our pop culture image of a robot that knows more about the world than we do. As it turns out, however, setting up a question answering model is relatively straightforward.
對(duì)于許多人來說,問題解答是看起來更科學(xué)的ML任務(wù)之一,因?yàn)樗衔覀兊牧餍形幕蜗?#xff0c;即機(jī)器人比我們更了解世界。 然而,事實(shí)證明,建立問題回答模型相對(duì)簡(jiǎn)單。
設(shè)計(jì)提取性問題解答 (Architecting extractive question answering)
There are a few different approaches to this task, but we’re going to focus on extractive question answering, in which a model answers questions by extracting relevant summarizations from a body of reference material (documentation, Wikipedia, etc.).
Our API will be a model trained for extractive question answering, initialized with a body of reference material. We’ll then send it inputs via the API, and return predictions.
For this example, I’ll use the Wikipedia article on machine learning.
對(duì)于此示例,我將使用有關(guān)機(jī)器學(xué)習(xí)的Wikipedia文章。
構(gòu)建提取性問答API: (Building an extractive question answering API:)
This time, we aren’t going to select a model at all. Instead, we’ll be using Hugging Face’s Pipeline, which allows us to download a pretrained model by specifying the task we want to accomplish, not the specific model.
我們不會(huì)一直在選擇模型。 相反,我們將使用Hugging Face的管道,該管道允許我們通過指定要完成的任務(wù)而不是特定模型來下載預(yù)訓(xùn)練的模型。
That’s roughly 7 lines of Python to implement machine learning that, just a few years ago, you would have needed a team of researchers to develop.
Testing out the API, when I ping it with “What is machine learning,” it responds:
當(dāng)我用“什么是機(jī)器學(xué)習(xí)”對(duì)它進(jìn)行測(cè)試時(shí),對(duì)API進(jìn)行測(cè)試,它將響應(yīng):
"the study of computer algorithms that improve automatically through experience."

3. Google Translate
Translation is an incredibly complicated task, and the fact that Google Translate is so reliable is a testament to how powerful production machine learning has become over the last decade.
(Image source: Google Translate)

And while Google Translate represents the pinnacle of machine translation in production, you can still build your own Google Translate without becoming an expert in the field.
架構(gòu)語言翻譯 (Architecting language translation)
To understand why translation is such a difficult task to model computationally, think about what constitutes a correct translation.
A phrase can be translated into any number of equivalent sentences in another language, all of which could be “correct,” but some of which would sound better to a native speaker.
可以將一個(gè)短語翻譯成另一種語言的任意數(shù)量的對(duì)等句子,所有這些句子都可能是“正確的”,但其中某些對(duì)母語為母語的人聽起來會(huì)更好 。
These phrases wouldn’t sound better because they were more grammatically correct; they would sound better because they agreed with a wide variety of implicit rules, patterns, and trends in the language, all of which are fluid and change constantly.
The best approach to modeling this complexity is called sequence-to-sequence learning. Writing a primer on sequence-to-sequence learning is beyond the scope of this article, but if you’re interested, I’ve written an article about how it’s used in both Google Translate and, oddly enough, drug development.
對(duì)這種復(fù)雜性進(jìn)行建模的最佳方法稱為序列到序列學(xué)習(xí)。 撰寫有關(guān)序列到序列學(xué)習(xí)的入門知識(shí)超出了本文的范圍,但是,如果您有興趣,我已經(jīng)寫了一篇文章,介紹了它如何在Google Translate和藥物開發(fā)中使用。
We need to find a sequence-to-sequence model pretrained for translations between two languages, and deploy it as an API.
構(gòu)建語言翻譯API: (Building a language translation API:)
For this task, we can again use Hugging Face’s Transformers pipeline to initialize a model fine-tuned for the exact language translation we need. I’ll be using an English-to-German model here.
對(duì)于此任務(wù),我們可以再次使用Hugging Face的Transfomers管道來初始化針對(duì)我們所需的精確語言翻譯進(jìn)行微調(diào)的模型。 我將在這里使用英語到德語的模型。
The code is very similar to before: just import the model from the pipeline and serve the request:
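A sketch under the same assumptions; `translation_en_to_de` is one of the pipeline’s built-in task names, and it downloads a default pretrained model for the language pair.

```python
from transformers import pipeline

class PythonPredictor:
    def __init__(self, config):
        # Downloads a default pretrained English-to-German model
        self.translator = pipeline("translation_en_to_de")

    def predict(self, payload):
        result = self.translator(payload["text"])
        return result[0]["translation_text"]
```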
Now, you can ping that API with any English text, and it will respond with a German translation. For other languages, you can simply load a different model (there are many available through Hugging Face’s library).
機(jī)器學(xué)習(xí)工程不僅限于FAANG公司 (Machine learning engineering isn’t just for FAANG companies)
All of these products are developed by tech giants, because for years, they’ve been the only ones able to build them. This is no longer the case.
Every one of these examples has implemented state of the art machine learning in less than 20 lines of Python. Any engineer can build them.
A natural objection here would be that we only used pretrained models, and that to build a “real” product, you’d need to develop a new model from scratch. This is a common line of thinking, but doesn’t square with the reality of the field.
這里的自然反對(duì)意見是,我們僅使用預(yù)先訓(xùn)練的模型,并且要構(gòu)建“真實(shí)的”產(chǎn)品,您需要從頭開始開發(fā)新模型。 這是一條常見的思路,但與該領(lǐng)域的實(shí)際情況不符。
For example, AI Dungeon is a dungeon explorer built on machine learning. The game went viral last year, quickly racking up over 1,000,000 players, and it is still one of the most popular examples of ML text generation.
While the game has recently transitioned to using GPT-3, it was originally “just” a fine-tuned GPT-2 model. The creator scraped text from a choose-your-own-adventure site, used the gpt-2-simple library to fine-tune GPT-2 with the text, and then deployed it as an API with Cortex.
You don’t need Google’s budget. You don’t need early access to the GPT-3 API. You don’t need a PhD in computer science. If you know how to write code, you can build state-of-the-art machine learning applications right now.
您不需要Google的預(yù)算。 您不需要及早訪問GPT-3 API。 您不需要計(jì)算機(jī)科學(xué)博士學(xué)位。 如果您知道如何編寫代碼,則可以立即構(gòu)建最新的機(jī)器學(xué)習(xí)應(yīng)用程序。
Translated from: https://towardsdatascience.com/a-list-of-things-you-can-build-with-language-models-not-just-gpt-3-e6fcac85cef1