- Dataset
- Machine Translation
- Machine Translation (Non-Autoregressive)
- Machine Translation (Low-Resource)
- Model Compression
- Attention
- Transformers
- Training Tips for Transformers
- Explanation
- Rich Answer Type
- Optimizer
- Text Attribute Transfer
- Layer Analysis
- Pre-Finetuning
- January 2021: An Efficient Transformer Decoder with Compressed Sub-layers
- December 2020: Train Once, and Decode As You Like
- November 2020: Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling
- October 2020: Nearest Neighbor Machine Translation
- October 2020: Inference Strategies for Machine Translation with Conditional Masking
- October 2020: Multi-task Learning for Multilingual Neural Machine Translation
- September 2020: Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models
- September 2020: Softmax Tempering for Training Neural Machine Translation Models
- September 2020: CSP: Code-Switching Pre-training for Neural Machine Translation
- June 2020: Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation
- November 2020: Context-Aware Cross-Attention for Non-Autoregressive Translation
- April 2020: Non-Autoregressive Machine Translation with Latent Alignments
- October 2020: Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation
- October 2020: Improving Target-side Lexical Transfer in Multilingual Neural Machine Translation
- September 2020: Harnessing Multilinguality in Unsupervised Machine Translation for Rare Languages
- October 2020: Adversarial Self-Supervised Data-Free Distillation for Text Classification
- October 2020: Optimizing Transformers with Approximate Computing for Faster, Smaller and more Accurate NLP Models
- September 2020: Contrastive Distillation on Intermediate Representations for Language Model Compression
- September 2020: Weight Distillation: Transferring the Knowledge in Neural Network Parameters
- June 2020: SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
- February 2020: BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
- February 2020: Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
- October 2020: Long Document Ranking with Query-Directed Sparse Transformer
- October 2020: SMYRF: Efficient Attention using Asymmetric Clustering
- October 2020: Improving Attention Mechanism with Query-Value Interaction
- October 2020: Guiding Attention for Self-Supervised Learning with Transformers
- September 2020: An Attention Free Transformer
- September 2020: Sparsifying Transformer Models with Differentiable Representation Pooling
- September 2020: Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference
- June 2020: Limits to Depth Efficiencies of Self-Attention
- May 2020: Hard-Coded Gaussian Attention for Neural Machine Translation
- November 2019: Location Attention for Extrapolation to Longer Sequences
- November 2020: Long Range Arena: A Benchmark for Efficient Transformers
- November 2020: Colorization Transformer
- October 2020: N-ODE Transformer: A Depth-Adaptive Variant of the Transformer Using Neural Ordinary Differential Equations
- August 2020: DeLighT: Very Deep and Light-weight Transformer
- April 2020: Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning
- May 2019: Unified Language Model Pre-training for Natural Language Understanding and Generation
- November 2020: CharBERT: Character-aware Pre-trained Language Model
- October 2020: Long Document Ranking with Query-Directed Sparse Transformer
- June 2020: Progressive Generation of Long Text
- November 2020: Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks
- October 2020: On Losses for Modern Language Models
- October 2020: Cross-Thought for Sentence Encoder Pre-training
- October 2020: VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation
- September 2019: You Only Train Once: Loss-Conditional Training of Deep Networks
- October 2020: Explaining and Improving Model Behavior with k Nearest Neighbor Representations
- April 2020: Attention Module is Not Only a Weight: Analyzing Transformers with Vector Norms
- September 2019: Learning to Deceive with Attention-Based Explanations
- September 2020: No Answer is Better Than Wrong Answer: A Reflection Model for Document Level Machine Reading Comprehension
- September 2020: Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization
- November 2020: Deep Learning for Text Attribute Transfer: A Survey
- October 2020: Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth