A list of things you can build with language models - not just GPT-3
Natural language processing (NLP) is everywhere lately, with OpenAI’s GPT-3 generating as much hype as we’ve ever seen from a single model.
As I’ve written about before, the flood of projects being built on GPT-3 is down not just to its computational power, but to its accessibility. The fact that it was released as a simple API means that anyone who can query an endpoint can use state-of-the-art machine learning.
Experiencing machine learning as “just another web service” has opened the eyes of many engineers, who previously experienced machine learning as an arcane, unapproachable field.
將機(jī)器學(xué)習(xí)體驗(yàn)為“僅僅是另一個(gè)Web服務(wù)”已經(jīng)打開了許多工程師的視野,他們以前曾將機(jī)器學(xué)習(xí)視為一個(gè)不可思議的,不可接近的領(lǐng)域。
Suddenly, machine learning is something you can build things with.
And while GPT-3 is an incredible accomplishment, it’s far from the only impressive language model in the world. If you’re interested in machine learning engineering, my goal in this article is to introduce you to a number of open source language models you can use today to build software, using some of the most popular ML applications in the world as examples.
Before we start, I need to give a little context to our approach.
Deploying a model as an API for realtime inference
What made GPT-3 so accessible to engineers was that to use it, you just queried an endpoint with some text, and it sent back a response. This on-demand, web service interface is called realtime inference.
In the case of GPT-3, the API was deployed for us by the team at OpenAI. However, deploying a model as an API on our own is fairly trivial, with the right tools.
對(duì)于GPT-3,OpenAI團(tuán)隊(duì)已為我們部署了API。 但是,使用合適的工具將模型本身作為API部署是相當(dāng)瑣碎的。
We’re going to use two main tools in these examples. First is Hugging Face’s Transformers, a library that provides a very easy-to-use interface for working with popular language models. Second is Cortex, an open source machine learning engineering platform I maintain, designed to make it as easy as possible to put models into production.
To deploy a model as an API for realtime inference, we need to do three things.
First, we need to write the API. With Cortex’s Predictor interface, our API is just a Python class with an __init__() function, which initializes our model, and a predict() function, which does the predicting. It looks something like this:
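A minimal sketch of such a Predictor: the `PythonPredictor` class name and method signatures follow Cortex’s documented interface, while the model itself is a stand-in lambda so the sketch runs on its own.

```python
# A minimal Cortex Predictor. The class name and method signatures follow
# Cortex's PythonPredictor interface; the "model" is a stand-in lambda so
# this sketch is self-contained.
class PythonPredictor:
    def __init__(self, config):
        # Runs once at startup: load the model into memory here.
        self.model = lambda text: text.upper()  # stand-in for a real model

    def predict(self, payload):
        # Runs per request: payload is the parsed JSON request body.
        return self.model(payload["text"])
```

In a real deployment, `__init__()` would download weights and move the model to the configured device, and `predict()` would run actual inference.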
Cortex will then use this to create and deploy a web service. Under the hood, it’s doing a bunch of things with Docker, FastAPI, Kubernetes, and various AWS Services, but you don’t have to worry about the underlying infrastructure (unless you want to).
One thing Cortex needs to turn this Python API into a web service, however, is a configuration file, which we write in YAML:
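A sketch of that configuration file; the top-level keys follow Cortex’s API spec, while the filename `predictor.py` is an assumption.

```yaml
# cortex.yaml - deployment config for the API above.
# Key names follow Cortex's API spec; predictor.py is an assumed filename.
- name: text-generator
  predictor:
    type: python
    path: predictor.py
  compute:
    cpu: 1
```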
Nothing too crazy. We give our API a name, tell Cortex where to find the Python API, and allocate some compute resources, in this case, one CPU. You can configure in much more depth if you’d like, but this will suffice.
Then, we run $ cortex deploy using the Cortex CLI, and that’s it. Our model is now a functioning web service, a la GPT-3:
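Querying the deployed API then looks like any other web service call. The endpoint below is a placeholder; the Cortex CLI prints the real URL for your cluster after deploying.

```shell
# Query the deployed text generator like any other web service.
# The endpoint is a placeholder; the Cortex CLI prints the real URL.
API_URL="http://localhost:8888/text-generator"
PAYLOAD='{"text": "machine learning is"}'

# --max-time keeps the sketch from hanging, and the fallback message
# fires when no API is actually running at the placeholder URL.
curl -s --max-time 5 -X POST "$API_URL" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "no API running at $API_URL"
```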
This is the general approach we will take to deployment throughout this list, though the emphasis will be on the models themselves, the tasks they’re suited to, and the projects you can build with them.
1. Gmail Smart Compose
Smart Compose is responsible for those eerily-accurate email suggestions Gmail throws out while you type:
Smart Compose負(fù)責(zé)在您鍵入時(shí)Gmail拋出的那些錯(cuò)誤準(zhǔn)確的電子郵件建議:
(Image source: Google’s The Keyword)

Even though Smart Compose is the result of huge budgets and engineering teams, you can build your own version in a couple of hours.
架構(gòu)智能撰寫 (Architecting Smart Compose)
Architecturally, Smart Compose is a straightforward example of realtime inference:
從結(jié)構(gòu)上講,Smart Compose是實(shí)時(shí)推斷的簡(jiǎn)單示例:
- As you type, Gmail pings a web service with the text of your email chain.
- The web service feeds the text to a model, predicting the next sequence.
- The web service delivers the predicted text back to the Gmail client.
The biggest technical challenge to Smart Compose is actually latency. Predicting a probable sequence of text is a fairly routine task in ML, but delivering a prediction as fast as someone types is much harder.
To build our own Smart Compose, we’ll need to select a model, deploy it as an API, and build some kind of text editor frontend to query the API, but I’ll leave that last part to you.
要構(gòu)建自己的Smart Compose,我們需要選擇一個(gè)模型,將其部署為API,并構(gòu)建某種文本編輯器前端來查詢API,但我將把最后一部分留給您。
構(gòu)建文本預(yù)測(cè)API (Building a text prediction API)
Let’s start by picking a model. We need one that is accurate enough to generate good suggestions from potentially very little input. We also, however, need one that can serve predictions quickly.
Now, latency isn’t all about the model—the resources you allocate to your API (GPU vs. CPU, for example) play a major role—but the model itself is still important.
There are a bunch of models capable of doing text generation, but for this task, we’re going to use DistilGPT-2, via Hugging Face’s Transformers library.
GPT-2 is, shockingly, the predecessor to GPT-3. Until GPT-3’s release, it was widely regarded as the best model for text generation. The tradeoff with GPT-2, however, is performance. It’s really big—like 6 GB—and even with GPUs, it can be slow in generating predictions. DistilGPT-2, as the name suggests, is a distilled version of GPT-2. It retains most of GPT-2’s accuracy while running roughly twice as fast (according to Hugging Face, it can run on an iPhone 7).
We can write a prediction API for DistilGPT-2 in barely 15 lines of Python:
Most of that should be intuitive.
其中大多數(shù)應(yīng)該是直觀的。
We initialize our predictor in __init__(), wherein we declare our device (in this case a CPU, but you can change that to GPU), load our model into memory, and load our tokenizer. A tokenizer encodes text into tokens the model can understand, and decodes predictions into text we can understand.
Then, our predict() function handles requests. It tokenizes our request, feeds it to the model, and returns a decoded prediction.
Once you deploy that API with Cortex, all you need to do is connect it to your frontend. Under the hood, that’s all Smart Compose is. A single text generator, deployed as an API. The rest is just normal web development.
2. Siri-esque question answering
Virtual assistants, from Siri to Alexa to Google Assistant, are ubiquitous. And while many actually rely on machine learning for multiple tasks—speech-to-text, voice recognition, text-to-speech, and more—they all have one core task in common:
Question answering.
For many, question answering is one of the more sci-fi-seeming ML tasks, because it fits our pop culture image of a robot that knows more about the world than we do. As it turns out, however, setting up a question answering model is relatively straightforward.
對(duì)于許多人來說,問題解答是看起來更科學(xué)的ML任務(wù)之一,因?yàn)樗衔覀兊牧餍形幕蜗?#xff0c;即機(jī)器人比我們更了解世界。 然而,事實(shí)證明,建立問題回答模型相對(duì)簡(jiǎn)單。
設(shè)計(jì)提取性問題解答 (Architecting extractive question answering)
There are a few different approaches to this task, but we’re going to focus on extractive question answering, in which a model answers questions by extracting relevant summarizations from a body of reference material (documentation, Wikipedia, etc.).
Our API will be a model trained for extractive question answering, initialized with a body of reference material. We’ll then send it inputs via the API, and return predictions.
For this example, I’ll use the Wikipedia article on machine learning.
對(duì)于此示例,我將使用有關(guān)機(jī)器學(xué)習(xí)的Wikipedia文章。
構(gòu)建提取性問答API: (Building an extractive question answering API:)
This time, we aren’t going to select a model at all. Instead, we’ll be using Hugging Face’s Pipeline, which allows us to download a pretrained model by specifying the task we want to accomplish, not the specific model.
我們不會(huì)一直在選擇模型。 相反,我們將使用Hugging Face的管道,該管道允許我們通過指定要完成的任務(wù)而不是特定模型來下載預(yù)訓(xùn)練的模型。
That’s roughly 7 lines of Python to implement machine learning that, just a few years ago, you would have needed a team of researchers to develop.
Testing out the API, when I ping it with “What is machine learning,” it responds:
當(dāng)我用“什么是機(jī)器學(xué)習(xí)”對(duì)它進(jìn)行測(cè)試時(shí),對(duì)API進(jìn)行測(cè)試,它將響應(yīng):
"the study of computer algorithms that improve automatically through experience."

3. Google Translate
Translation is an incredibly complicated task, and the fact that Google Translate is so reliable is a testament to how powerful production machine learning has become over the last decade.
(Image source: Google Translate)

And while Google Translate represents the pinnacle of machine translation in production, you can still build your own Google Translate without becoming an expert in the field.
架構(gòu)語言翻譯 (Architecting language translation)
To understand why translation is such a difficult task to model computationally, think about what constitutes a correct translation.
A phrase can be translated into any number of equivalent sentences in another language, all of which could be “correct,” but some of which would sound better to a native speaker.
可以將一個(gè)短語翻譯成另一種語言的任意數(shù)量的對(duì)等句子,所有這些句子都可能是“正確的”,但其中某些對(duì)母語為母語的人聽起來會(huì)更好 。
These phrases wouldn’t sound better because they were more grammatically correct; they would sound better because they agreed with a wide variety of implicit rules, patterns, and trends in the language, all of which are fluid and change constantly.
The best approach to modeling this complexity is called sequence-to-sequence learning. Writing a primer on sequence-to-sequence learning is beyond the scope of this article, but if you’re interested, I’ve written an article about how it’s used in both Google Translate and, oddly enough, drug development.
對(duì)這種復(fù)雜性進(jìn)行建模的最佳方法稱為序列到序列學(xué)習(xí)。 撰寫有關(guān)序列到序列學(xué)習(xí)的入門知識(shí)超出了本文的范圍,但是,如果您有興趣,我已經(jīng)寫了一篇文章,介紹了它如何在Google Translate和藥物開發(fā)中使用。
We need to find a sequence-to-sequence model pretrained for translations between two languages, and deploy it as an API.
構(gòu)建語言翻譯API: (Building a language translation API:)
For this task, we can again use Hugging Face’s Transformers pipeline to initialize a model fine-tuned for the exact language translation we need. I’ll be using an English-to-German model here.
對(duì)于此任務(wù),我們可以再次使用Hugging Face的Transfomers管道來初始化針對(duì)我們所需的精確語言翻譯進(jìn)行微調(diào)的模型。 我將在這里使用英語到德語的模型。
The code is very similar to before: just import the model from the pipeline and serve the request:
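A sketch under the same assumptions; `translation_en_to_de` is one of the pipeline’s built-in task names, and it downloads a default pretrained model for the language pair.

```python
from transformers import pipeline

class PythonPredictor:
    def __init__(self, config):
        # Downloads a default pretrained English-to-German model
        self.translator = pipeline("translation_en_to_de")

    def predict(self, payload):
        result = self.translator(payload["text"])
        return result[0]["translation_text"]
```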
Now, you can ping that API with any English text, and it will respond with a German translation. For other languages, you can simply load a different model (there are many available through Hugging Face’s library).
機(jī)器學(xué)習(xí)工程不僅限于FAANG公司 (Machine learning engineering isn’t just for FAANG companies)
All of these products are developed by tech giants, because for years, they’ve been the only ones able to build them. This is no longer the case.
Every one of these examples has implemented state of the art machine learning in less than 20 lines of Python. Any engineer can build them.
A natural objection here would be that we only used pretrained models, and that to build a “real” product, you’d need to develop a new model from scratch. This is a common line of thinking, but doesn’t square with the reality of the field.
這里的自然反對(duì)意見是,我們僅使用預(yù)先訓(xùn)練的模型,并且要構(gòu)建“真實(shí)的”產(chǎn)品,您需要從頭開始開發(fā)新模型。 這是一條常見的思路,但與該領(lǐng)域的實(shí)際情況不符。
For example, AI Dungeon is a dungeon explorer built on machine learning. The game went viral last year, quickly racking up over 1,000,000 players, and it is still one of the most popular examples of ML text generation.
While the game has recently transitioned to using GPT-3, it was originally “just” a fine-tuned GPT-2 model. The creator scraped text from a choose-your-own-adventure site, used the gpt-2-simple library to fine-tune GPT-2 with the text, and then deployed it as an API with Cortex.
You don’t need Google’s budget. You don’t need early access to the GPT-3 API. You don’t need a PhD in computer science. If you know how to write code, you can build state-of-the-art machine learning applications right now.
您不需要Google的預(yù)算。 您不需要及早訪問GPT-3 API。 您不需要計(jì)算機(jī)科學(xué)博士學(xué)位。 如果您知道如何編寫代碼,則可以立即構(gòu)建最新的機(jī)器學(xué)習(xí)應(yīng)用程序。
Translated from: https://towardsdatascience.com/a-list-of-things-you-can-build-with-language-models-not-just-gpt-3-e6fcac85cef1