

BERT论文阅读(二): CG-BERT:Conditional Text Generation with BERT for Generalized Few-shot Intent Detection

Published: 2025/4/5

Contents

The proposed method

Input Representation

The Encoder

The Decoder

Fine-tuning


The task: discriminate over a joint label space consisting of both existing intents, which have enough labeled data, and novel intents, which have only a few examples per class.

==> Conditional Text Generation with BERT

The proposed method

CG-BERT adopts the CVAE (Conditional Variational AutoEncoder) framework and incorporates BERT into both the encoder and the decoder.

  • the encoder: encodes the utterance x and its intent y together into a latent variable z and models the (approximate) posterior distribution q(z|x,y), where y is the condition in the CVAE model ==> the encoder models the data distribution of the few-shot intents

  • the decoder: decodes z and the intent y together to reconstruct the input utterance x ==> masked attention restricts which tokens may attend to each other, preserving the left-to-right, autoregressive property that text generation requires

  • to generate new utterances for a novel intent y, we sample the latent variable z from the prior distribution p(z|y) and use the decoder to decode z and y into new utterances

This makes it possible to generate more utterances for a novel intent by sampling from the learned distribution.
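A rough sketch of this generation step: sample z from the standard-Gaussian prior p(z|y) and decode it together with the intent. The `decode` function here is a hypothetical stand-in for the trained CG-BERT decoder, not the paper's implementation.

```python
import numpy as np

# Hypothetical stand-in for the trained decoder: the real model maps (z, intent)
# to an utterance; here we just derive a pseudo-utterance id from z.
def decode(z, intent):
    return f"{intent}_utt_{np.argmax(z)}"

def generate_utterances(intent, latent_dim=64, n_samples=5, seed=0):
    rng = np.random.default_rng(seed)
    utterances = []
    for _ in range(n_samples):
        # prior p(z|y) is a multivariate standard Gaussian, so sample z ~ N(0, I)
        z = rng.standard_normal(latent_dim)
        utterances.append(decode(z, intent))
    return utterances

new_utts = generate_utterances("book_flight", n_samples=3)
print(len(new_utts))  # 3
```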

Input Representation

input: intent + utterance text sentences (concatenated)

Sentence S1: [CLS] token + intent y + [SEP] token --> the first (intent) sentence

Sentence S2: utterance x + [SEP] --> the second (utterance) sentence

whole input: S1 + S2

[CLS]: serves as the representation of the whole input

latent variable z: the embedding of [CLS] is encoded into the latent variable z

Text is tokenized into subword units by WordPiece.

embedding: obtained for each token --> token embeddings, position embeddings, segment embeddings

a given token's input representation: constructed by summing these three embeddings; the whole input (with a total length of T tokens) is denoted H0.
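A minimal numpy sketch of this input construction, with toy token ids and randomly initialized lookup tables standing in for BERT's learned embeddings (the ids and sizes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_len, n_segments, d = 100, 16, 2, 8

# Randomly initialized tables standing in for BERT's learned embeddings.
token_emb = rng.standard_normal((vocab_size, d))
pos_emb = rng.standard_normal((max_len, d))
seg_emb = rng.standard_normal((n_segments, d))

# Toy ids for "[CLS] intent [SEP] utterance [SEP]"; real ids come from WordPiece.
token_ids = np.array([1, 10, 2, 40, 41, 42, 2])   # hypothetical ids
segment_ids = np.array([0, 0, 0, 1, 1, 1, 1])     # S1 = intent, S2 = utterance
positions = np.arange(len(token_ids))

# Each token's input representation is the sum of the three embeddings.
H0 = token_emb[token_ids] + pos_emb[positions] + seg_emb[segment_ids]
print(H0.shape)  # (7, 8)
```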

The Encoder

models the distribution of diverse utterances for a given (few-shot) intent

to obtain deep bidirectional context information <-- model the attention between the intent tokens and the utterance tokens

the input representation: H0 (the summed embeddings above)

multiple self-attention heads:

output of the previous layer --> projected into a triple of queries, keys and values (Q, K, V)

embedding of the [CLS] token in the 6-th transformer block --> sentence-level representation

sentence-level representation --> a latent vector z, where the prior distribution p(z|y) is a multivariate standard Gaussian distribution

μ and σ of the Gaussian distribution q(z|x,y) = N(μ, σ²I) --> used to sample z
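Sampling z from N(μ, σ²I) is typically done with the reparameterization trick so that gradients flow through μ and σ; a minimal sketch (the μ and log σ² heads on the [CLS] embedding are assumed here, not taken from the paper):

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    so the sampling step stays differentiable w.r.t. mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
mu = np.zeros(4)        # would come from a linear head on the [CLS] embedding
log_var = np.zeros(4)   # log sigma^2, from a second linear head
z = sample_latent(mu, log_var, rng)
print(z.shape)  # (4,)
```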

The Decoder

aims to reconstruct the input utterance x using the latent variable z and the intent y

a residual connection from the input representation H0 (combining z with H0) ==> forms the decoder input H6'

left-to-right manner ==> masked attention

the attention mask --> helps the transformer blocks fit the conditional text generation task

not full bidirectional attention over the input ==> instead, a mask matrix determines whether a pair of tokens can attend to each other

updated attention:

Attention(Q, K, V) = softmax(QKᵀ/√dk + M) V, where M_ij = 0 if token i is allowed to attend to token j and −∞ otherwise
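The mask matrix M can be sketched as follows, assuming a UniLM-style seq-to-seq layout (condition tokens attend bidirectionally within the condition; utterance tokens attend to the condition plus their own left context); the paper's exact layout may differ:

```python
import numpy as np

def seq2seq_attention_mask(len_cond, len_utt):
    """Build M with M[i, j] = 0 if token i may attend to token j, -inf otherwise.
    Condition (intent) tokens see only the condition, bidirectionally; utterance
    tokens see the condition plus their left context, keeping generation
    left-to-right."""
    T = len_cond + len_utt
    M = np.full((T, T), -np.inf)
    M[:, :len_cond] = 0.0                  # every row may attend to the condition
    for i in range(len_cond, T):
        M[i, len_cond:i + 1] = 0.0         # utterance rows: left-to-right only
    return M

M = seq2seq_attention_mask(3, 4)
# M is added to QK^T / sqrt(d_k) before the softmax
print(M[2, 5] == -np.inf, M[5, 4] == 0.0)  # True True
```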

the output of the 12-th transformer block in the decoder is H12; the latent variable z also has an embedding.

To further increase the impact of z and alleviate the vanishing-latent-variable problem, the embedding of z is concatenated with the embeddings of all the tokens.

Two fully-connected layers with a layer normalization produce the final representation Hf.

to predict the next token at position t+1 <-- the embedding in Hf at position t
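A toy numpy sketch of this step, with random weights in place of the learned fully-connected layers and output head (all shapes here are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, dz, vocab = 5, 8, 4, 50

H12 = rng.standard_normal((T, d))   # decoder output (12th block), toy values
z = rng.standard_normal(dz)         # latent vector

# Concatenate z's embedding with every token embedding to strengthen its impact.
H_cat = np.concatenate([H12, np.tile(z, (T, 1))], axis=1)    # (T, d + dz)

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# Two fully-connected layers (random weights here) with layer normalization.
W1 = rng.standard_normal((d + dz, d))
W2 = rng.standard_normal((d, d))
Hf = layer_norm(np.maximum(H_cat @ W1, 0) @ W2)              # (T, d)

# The embedding at position t is fed to an output head to predict token t+1.
W_out = rng.standard_normal((d, vocab))
logits_next = Hf[2] @ W_out         # logits for the token at position 3
print(Hf.shape, logits_next.shape)  # (5, 8) (50,)
```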

Fine-tuning

The model is first learned from existing intents with enough labeled data and then fine-tuned to improve its performance on the few-shot intents.

reference: Cross-Lingual Natural Language Generation via Pre-training
