A Categorized Collection of Pre-trained Language Model Papers: Surveys, Benchmark Datasets, PLM Design and Analysis


Author | Xiaolei Wang (王晓磊)

Affiliation | Ph.D. student, Renmin University of China

Research focus | Dialogue systems

1. Introduction

In recent years, large-scale pre-trained language models (Pre-trained Language Models, PLMs) such as the BERT and GPT families have achieved remarkable success across NLP. This post collects the PLM-related papers published since BERT and GPT appeared, selects 163 representative works by citation count, and groups them into six categories: surveys, benchmark datasets, PLM design, PLM analysis, efficient PLMs, and PLM usage.

The full paper list has also been published on GitHub and will be updated continuously; you are welcome to follow and star the repository.

https://github.com/RUCAIBox/PLMPapers

Wherever possible, each paper is listed with its PDF link, code implementation, and project homepage so that readers can explore the work further.
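As a quick orientation before the list, the snippet below is a minimal sketch of querying a BERT-style PLM through the HuggingFace transformers fill-mask pipeline. The checkpoint name and example sentence are illustrative assumptions, not taken from any of the papers below.

```python
# Minimal sketch: query a masked language model with HuggingFace transformers.
# Assumes `pip install transformers torch`; the checkpoint name is illustrative.
from transformers import pipeline

# "fill-mask" wraps a BERT-style encoder and its tokenizer in a single call.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The pipeline returns the top candidate tokens for the [MASK] position.
for prediction in unmasker("Pre-trained language models have changed [MASK] processing."):
    print(f"{prediction['token_str']:>12s}  score={prediction['score']:.3f}")
```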

2. Surveys

  • "Pre-trained models for natural language processing: A survey". Science China Technological Sciences(2020)

  • "Which *BERT? A Survey Organizing Contextualized Encoders". EMNLP(2020)

  • "A Primer in BERTology: What We Know About How BERT Works". TACL(2020)

  • "From static to dynamic word representations: a survey". International Journal of Machine Learning and Cybernetics(2020)

  • "Overview of the Transformer-based Models for NLP Tasks". 2020 15th Conference on Computer Science and Information Systems (FedCSIS)

  • "A Survey on Contextual Embeddings". arXiv(2020)

  • "The NLP Cookbook: Modern Recipes for Transformer Based Deep Learning Architectures". IEEE Access(2021)

  • "Pre-Trained Models: Past, Present and Future". arXiv(2021)

  • "A Survey of Transformers". arXiv(2021)

3. Benchmark Datasets

  • XNLI: "XNLI: Evaluating Cross-lingual Sentence Representations". EMNLP(2018)

  • GLUE: "GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding". ICLR(2019)

  • SuperGLUE: "SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems". NeurIPS(2019)

  • CLUE: "CLUE: A Chinese Language Understanding Evaluation Benchmark". COLING(2020)

  • XTREME: "XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization". ICML(2020)

  • XGLUE: "XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation". EMNLP(2020)

  • DialoGLUE: "DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue". arXiv(2020)
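Most of the benchmarks above (GLUE, SuperGLUE, XTREME, and so on) can be pulled programmatically. The sketch below assumes the HuggingFace `datasets` library and uses the GLUE MRPC task purely as an arbitrary example.

```python
# Minimal sketch: load one GLUE task with the HuggingFace `datasets` library.
# Assumes `pip install datasets`; "mrpc" is used only as an example subtask.
from datasets import load_dataset

mrpc = load_dataset("glue", "mrpc")   # DatasetDict with train/validation/test splits
print(mrpc["train"].features)         # sentence fields and label names
print(mrpc["train"][0])               # one paraphrase-detection example
```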

4. PLM Design

4.1 General Design

  • GPT: "Improving Language Understanding by Generative Pre-Training". OpenAI(2018)

  • GPT-2: "Language Models are Unsupervised Multitask Learners". OpenAI(2019)

  • BERT: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". NAACL(2019)

  • XLNet: "XLNet: Generalized Autoregressive Pretraining for Language Understanding". NeurIPS(2019)

  • SBERT: "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks". EMNLP(2019)

  • UniLM: "Unified Language Model Pre-training for Natural Language Understanding and Generation". NeurIPS(2019)

  • MASS: "MASS: Masked Sequence to Sequence Pre-training for Language Generation". ICML(2019)

  • Chinese-BERT-wwm: "Pre-Training with Whole Word Masking for Chinese BERT". arXiv(2019)

  • "Cloze-driven Pretraining of Self-attention Networks". EMNLP(2019)

  • "BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model". Workshop on Methods for Optimizing and Evaluating Neural Language Generation(2019)

  • GPT-3: "Language Models are Few-Shot Learners". arXiv(2020)

  • T5: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". JMLR(2020)

  • BART: "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension". ACL(2020)

  • Poly-encoders: "Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring". ICLR(2020)

  • SpanBERT: "SpanBERT: Improving Pre-training by Representing and Predicting Spans". TACL(2020)

  • ERNIE 2.0: "ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding". AAAI(2020)

  • SemBERT: "Semantics-Aware BERT for Language Understanding". AAAI(2020)

  • "Leveraging Pre-trained Checkpoints for Sequence Generation Tasks". TACL(2020)

  • ProphetNet: "ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training". EMNLP(2020)

  • UniLMv2: "UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training". ICML(2020)

  • MacBERT: "Revisiting Pre-Trained Models for Chinese Natural Language Processing". EMNLP(2020)

  • MPNet: "MPNet: Masked and Permuted Pre-training for Language Understanding". arXiv(2020)

  • DEBERTA: "DeBERTa: Decoding-enhanced BERT with Disentangled Attention". ICLR(2021)

  • PALM: "PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation". EMNLP(2020)
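Many of the encoder-style models listed above are trained with BERT's masked language modeling (MLM) objective: mask roughly 15% of tokens, and of those replace 80% with [MASK], 10% with a random token, and leave 10% unchanged. The sketch below implements that corruption step in plain PyTorch; the vocabulary size, mask token id, and input batch are illustrative assumptions.

```python
# Minimal sketch of BERT-style masked language modeling (MLM) corruption.
# Illustrative only: vocab_size, mask_token_id, and the input batch are made up.
import torch

def mask_tokens(input_ids, mask_token_id=103, vocab_size=30522, mlm_prob=0.15):
    labels = input_ids.clone()
    # Choose ~15% of positions as prediction targets.
    masked = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~masked] = -100                       # ignore non-masked positions in the loss

    corrupted = input_ids.clone()
    # 80% of the chosen positions -> [MASK]
    replace_mask = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    corrupted[replace_mask] = mask_token_id
    # 10% of the chosen positions -> a random token (half of the remaining 20%)
    random_mask = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~replace_mask
    corrupted[random_mask] = torch.randint(vocab_size, input_ids.shape)[random_mask]
    # The remaining 10% keep their original token.
    return corrupted, labels

batch = torch.randint(1000, 2000, (2, 16))       # fake token ids, batch of 2 sequences
inputs, labels = mask_tokens(batch)
print(inputs[0])
print(labels[0])
```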

4.2 Knowledge-Enhanced

  • ERNIE(Baidu): "ERNIE: Enhanced Representation through Knowledge Integration". arXiv(2019)

  • KnowBert: "Knowledge Enhanced Contextual Word Representations". EMNLP(2019)

  • ERNIE(Tsinghua): "ERNIE: Enhanced Language Representation with Informative Entities". ACL(2019)

  • COMET: "COMET: Commonsense Transformers for Automatic Knowledge Graph Construction". ACL(2019)

  • K-BERT: "K-BERT: Enabling Language Representation with Knowledge Graph". AAAI(2020)

  • WKLM: "Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model". ICLR(2020)

  • LUKE: "LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention". EMNLP(2020)

  • K-Adapter: "K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters". ICLR(2021)

  • KEPLER: "KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation". TACL(2021)

4.3 Multilingual

  • XLM: "Cross-lingual Language Model Pretraining". arXiv(2019)

  • "Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond". TACL(2019)

  • UDify: "75 Languages, 1 Model: Parsing Universal Dependencies Universally". EMNLP(2019)

  • Unicoder: "Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks". EMNLP(2019)

  • XLM-R: "Unsupervised Cross-lingual Representation Learning at Scale". ACL(2020)

  • "Multilingual Alignment of Contextual Word Representations". ICLR(2020)

  • mBART: "Multilingual Denoising Pre-training for Neural Machine Translation". TACL(2020)

  • mT5: "mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer". NAACL(2021)

  • InfoXLM: "InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training". NAACL(2021)

4.4 Multimodal

  • ViLBERT: "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks". NeurIPS(2019)

  • LXMERT: "LXMERT: Learning Cross-Modality Encoder Representations from Transformers". EMNLP(2019)

  • VideoBERT: "VideoBERT: A Joint Model for Video and Language Representation Learning". ICCV(2019)

  • MulT: "Multimodal Transformer for Unaligned Multimodal Language Sequences". ACL(2019)

  • VisualBERT: "VisualBERT: A Simple and Performant Baseline for Vision and Language". arXiv(2019)

  • B2T2: "Fusion of Detected Objects in Text for Visual Question Answering". EMNLP(2019)

  • VL-BERT: "VL-BERT: Pre-training of Generic Visual-Linguistic Representations". ICLR(2020)

  • Unicoder-VL: "Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training". AAAI(2020)

  • VLP: "Unified Vision-Language Pre-Training for Image Captioning and VQA". AAAI(2020)

  • UNITER: "UNITER: UNiversal Image-TExt Representation Learning". ECCV(2020)

  • Oscar: "Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks". ECCV(2020)

  • "12-in-1: Multi-Task Vision and Language Representation Learning". CVPR(2020)

  • ActBERT: "ActBERT: Learning Global-Local Video-Text Representations". CVPR(2020)

  • VLN: "Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks". CVPR(2020)

  • VILLA: "Large-Scale Adversarial Training for Vision-and-Language Representation Learning". arXiv(2020)

  • ImageBERT: "ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data". arXiv(2020)

  • ALIGN: "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision". ICML(2021)

  • ClipBERT: "Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling". CVPR(2021)

  • DALL·E: "Zero-Shot Text-to-Image Generation". arXiv(2021)

  • CLIP: "Learning Transferable Visual Models From Natural Language Supervision". arXiv(2021)
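Several of the image-text models above (CLIP, ALIGN) score images against free-form text. The sketch below shows that general pattern with the CLIP checkpoint available on the HuggingFace hub; the image path and candidate captions are illustrative assumptions.

```python
# Minimal sketch: zero-shot image-text matching with a CLIP checkpoint.
# Assumes `pip install transformers torch pillow`; "cat.jpg" is a placeholder file.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")                        # any local image
captions = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image: similarity of the image to each candidate caption.
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.3f}  {caption}")
```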

4.5 Information Retrieval

  • ORQA: "Latent Retrieval for Weakly Supervised Open Domain Question Answering". ACL(2019)

  • REALM: "REALM: Retrieval-Augmented Language Model Pre-Training". arXiv(2020)

  • RAG: "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks". NeurIPS(2020)

  • DPR: "Dense Passage Retrieval for Open-Domain Question Answering". EMNLP(2020)
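DPR-style retrievers encode questions and passages into the same vector space and rank passages by dot product. The sketch below uses the DPR encoders released on the HuggingFace hub; the question and passages are made-up examples.

```python
# Minimal sketch of DPR-style dense passage retrieval: dot-product ranking
# between a question embedding and passage embeddings.
# Assumes `pip install transformers torch`; the texts are illustrative.
import torch
from transformers import (DPRContextEncoder, DPRContextEncoderTokenizer,
                          DPRQuestionEncoder, DPRQuestionEncoderTokenizer)

q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

passages = [
    "BERT was introduced by researchers at Google in 2018.",
    "The Transformer architecture relies entirely on attention.",
]
question = "Who introduced BERT?"

with torch.no_grad():
    p_emb = c_enc(**c_tok(passages, return_tensors="pt", padding=True, truncation=True)).pooler_output
    q_emb = q_enc(**q_tok(question, return_tensors="pt")).pooler_output

scores = q_emb @ p_emb.T                 # dot-product relevance scores
best = scores.argmax().item()
print(scores, "->", passages[best])
```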

5. PLM Analysis

5.1 Knowledge

  • "What Does BERT Look at? An Analysis of BERT’s Attention". BlackBoxNLP(2019)

  • "BERT Rediscovers the Classical NLP Pipeline". ACL(2019)

  • "How Multilingual is Multilingual BERT?". ACL(2019)

  • "A Structural Probe for Finding Syntax in Word Representations". NAACL(2019)

  • "Language Models as Knowledge Bases?". EMNLP(2019)

  • "What Does BERT Learn about the Structure of Language?". ACL(2019)

  • "Linguistic Knowledge and Transferability of Contextual Representations". NAACL(2019)

  • "Assessing BERT's Syntactic Abilities". arXiv(2019)

  • "Probing Neural Network Comprehension of Natural Language Arguments". ACL(2019)

  • "How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings". EMNLP(2019)

  • "Visualizing and Measuring the Geometry of BERT". NeurIPS(2019)

  • "Designing and Interpreting Probes with Control Tasks". EMNLP(2019)

  • "Open Sesame: Getting inside BERT’s Linguistic Knowledge". BlackboxNLP(2019)

  • "What do you learn from context? Probing for sentence structure in contextualized word representations". ICLR(2019)

  • "Commonsense Knowledge Mining from Pretrained Models". EMNLP(2019)

  • "Do NLP Models Know Numbers? Probing Numeracy in Embeddings". EMNLP(2019)

  • "On the Cross-lingual Transferability of Monolingual Representations". ACL(2020)

  • "Cross-Lingual Ability of Multilingual BERT: An Empirical Study". ICLR(2020)

  • "What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models". TACL(2020)

  • "How Much Knowledge Can You Pack Into the Parameters of a Language Model?". EMNLP(2020)

  • "How Can We Know What Language Models Know?". TACL(2020)

  • "oLMpics-On What Language Model Pre-training Captures". TACL(2020)

  • "Information-Theoretic Probing with Minimum Description Length". EMNLP(2020)

  • "Inducing Relational Knowledge from BERT". AAAI(2020)

  • AutoPrompt: "AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts". EMNLP(2020)

  • "Emergent linguistic structure in artificial neural networks trained by self-supervision". PNAS(2020)

  • "Evaluating Commonsense in Pre-Trained Language Models". AAAI(2020)
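Several of these papers ("Language Models as Knowledge Bases?", "How Can We Know What Language Models Know?") probe factual knowledge by turning relations into cloze statements. The sketch below reproduces that probing pattern with the fill-mask pipeline; the prompts and checkpoint are illustrative, not the papers' official templates.

```python
# Minimal sketch of LAMA-style cloze probing for factual knowledge.
# Assumes `pip install transformers torch`; prompts are illustrative templates.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

cloze_prompts = [
    "The capital of France is [MASK].",
    "Dante was born in [MASK].",
]
for prompt in cloze_prompts:
    top = unmasker(prompt, top_k=3)
    guesses = ", ".join(f"{p['token_str']} ({p['score']:.2f})" for p in top)
    print(f"{prompt}  ->  {guesses}")
```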


5.2 Robustness

  • "Universal Adversarial Triggers for Attacking and Analyzing NLP". EMNLP(2019)

  • "Pretrained Transformers Improve Out-of-Distribution Robustness". ACL(2020)

  • BERT-ATTACK: "BERT-ATTACK: Adversarial Attack Against BERT Using BERT". EMNLP(2020)

  • "Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment". AAAI(2020)

5.3 Sparsity

  • "Are Sixteen Heads Really Better than One?". NeurIPS(2019)

  • "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned". ACL(2019)

  • "Revealing the Dark Secrets of BERT". EMNLP(2019)

  • "The Lottery Ticket Hypothesis for Pre-trained BERT Networks". NeurIPS(2020)

  • "When BERT Plays the Lottery, All Tickets Are Winning". EMNLP(2020)

5.4 Others

  • "Scaling Laws for Neural Language Models". arXiv(2020)

  • "Extracting Training Data from Large Language Models". arXiv(2020)

6. Efficient PLMs

6.1 Model Training

  • RoBERTa: "RoBERTa: A Robustly Optimized BERT Pretraining Approach". arXiv(2019)

  • "Efficient Training of BERT by Progressively Stacking". ICML(2019)

  • Megatron-LM: "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism". arXiv(2019)

  • ELECTRA: "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators". ICLR(2020)

  • "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes". ICLR(2020)

  • GShard: "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding". arXiv(2020)

  • Admin: "Understanding the Difficulty of Training Transformers". EMNLP(2020)

  • ZeRO: "ZeRO: Memory optimizations Toward Training Trillion Parameter Models". SC20: International Conference for High Performance Computing, Networking, Storage and Analysis

  • Switch Transformers: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity". arXiv(2021)

6.2 Model Compression

  • DistilBERT: "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter". arXiv(2019)

  • PKD: "Patient Knowledge Distillation for BERT Model Compression". EMNLP(2019)

  • "Distilling Task-Specific Knowledge from BERT into Simple Neural Networks". arXiv(2019)

  • Q8BERT: "Q8BERT: Quantized 8Bit BERT". 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS 2019

  • ALBERT: "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations". ICLR(2020)

  • TinyBERT: "TinyBERT: Distilling BERT for Natural Language Understanding". EMNLP(2020)

  • Layerdrop: "Reducing Transformer Depth on Demand with Structured Dropout". ICLR(2020)

  • Q-BERT: "Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT". AAAI(2020)

  • MobileBERT: "MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices". ACL(2020)

  • "Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning". 5th Workshop on Representation Learning for NLP(2020)

  • MiniLM: "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers". arXiv(2020)

  • FastBERT: "FastBERT: a Self-distilling BERT with Adaptive Inference Time". ACL(2020)

  • DeeBERT: "DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference". ACL(2020)
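Distillation-based compression (DistilBERT, PKD, TinyBERT, MiniLM) trains a small student to match a large teacher's output distribution. Below is a minimal sketch of the temperature-scaled soft-label loss that these methods share, in plain PyTorch; the logits, temperature, and weighting are illustrative rather than any paper's exact recipe.

```python
# Minimal sketch of the soft-label knowledge-distillation loss used (in various
# forms) by DistilBERT-style compression. All tensors and hyperparameters are
# illustrative; real methods add task losses, hidden-state losses, etc.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: student matches the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

student_logits = torch.randn(4, 3, requires_grad=True)   # batch of 4, 3 classes
teacher_logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
print(distillation_loss(student_logits, teacher_logits, labels))
```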

7. Using PLMs

7.1 Two-Stage

  • "Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks". arXiv(2018)

  • "How to Fine-Tune BERT for Text Classification?". CCL(2019)

  • "Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks". ACL(2020)

  • "Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?". ACL(2020)

7.2 Multi-Task

  • MT-DNN: "Multi-Task Deep Neural Networks for Natural Language Understanding". ACL(2019)

  • "BAM! Born-Again Multi-Task Networks for Natural Language Understanding". ACL(2019)

  • "Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding". arXiv(2019)

7.3 Adapter

  • "BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning". ICML(2019)

  • Adapter: "Parameter-Efficient Transfer Learning for NLP". ICML(2019)
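Adapters keep the PLM frozen and insert small bottleneck layers that are the only trained parameters. The sketch below is a minimal PyTorch version of the down-project / nonlinearity / up-project / residual block described in "Parameter-Efficient Transfer Learning for NLP"; the hidden sizes are illustrative.

```python
# Minimal sketch of an adapter block (Houlsby et al., "Parameter-Efficient
# Transfer Learning for NLP"): bottleneck MLP with a residual connection,
# inserted into a frozen Transformer layer. Sizes are illustrative.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)   # down-projection
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, hidden_size)     # up-projection

    def forward(self, hidden_states):
        # Residual connection keeps the adapter close to identity at initialization.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = Adapter()
x = torch.randn(2, 16, 768)          # (batch, sequence length, hidden size)
print(adapter(x).shape)              # torch.Size([2, 16, 768])
```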

7.4 Prompt

  • PET: "Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference". EACL(2021)

  • "It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners". NAACL(2021)

  • "Prefix-Tuning: Optimizing Continuous Prompts for Generation". arXiv(2021)

  • LM-BFF: "Making Pre-trained Language Models Better Few-shot Learners". ACL(2021)

  • "What Makes Good In-Context Examples for GPT-3?". arXiv(2021)

  • "The Power of Scale for Parameter-Efficient Prompt Tuning". arXiv(2021)
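PET-style methods reformulate classification as a cloze task: a pattern wraps the input around a mask token, and a verbalizer maps label words to classes. The sketch below shows that pattern-verbalizer idea with a masked LM; the pattern, verbalizer words, and checkpoint are illustrative assumptions, not PET's exact setup.

```python
# Minimal sketch of PET-style pattern-verbalizer prompting for sentiment:
# compare the masked-LM scores of the label words "great" vs. "terrible".
# The pattern, verbalizer, and checkpoint are illustrative choices.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

verbalizer = {"positive": "great", "negative": "terrible"}
review = "The plot was thin but the acting carried the film."
pattern = f"{review} It was {tokenizer.mask_token}."

inputs = tokenizer(pattern, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
scores = {
    label: logits[0, mask_pos, tokenizer.convert_tokens_to_ids(word)].item()
    for label, word in verbalizer.items()
}
print(max(scores, key=scores.get), scores)
```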

7.5 Others

  • "To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks". RepL4NLP(2019)

  • "An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models". NAACL(2019)

  • "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping". arXiv(2020)

  • SMART: "SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization". EMNLP(2020)

  • "Revisiting Few-sample BERT Fine-tuning". ICLR(2021)

Acknowledgments

Thanks to TCCI (天桥脑科学研究院) for its support of PaperWeekly. TCCI focuses on brain exploration, brain function, and brain health.
