REALM: Retrieval-Augmented Language Model Pre-training | Open Question Answering SOTA #OpenQA
Deep Learning Explainer
This paper introduces a way to use unsupervised learning to train a neural knowledge retriever. The approach achieves state-of-the-art results on three popular Open-QA benchmarks, outperforming all previous methods by a significant margin (4-16% absolute accuracy).
0:00 - Intro
2:30 - Language models vs. world knowledge
5:16 - REALM: Retrieve-then-predict
7:13 - Masked language modelling for knowledge retrieval
10:54 - Neural knowledge retriever
16:11 - Knowledge-augmented encoder
17:00 - Pre-training: MLM loss function
21:19 - Fine-tuning loss function
25:42 - Training
27:31 - Summation over all documents problem
31:03 - Maximum inner product search (MIPS)
34:11 - Asynchronous MIPS
35:31 - Injecting inductive biases into pre-training
38:50 - Cold start problem
41:50 - Experiments
46:03 - Ablation study
48:50 - How much do retrieved docs help?
50:08 - Different scopes of language modelling
52:08 - Implications
Paper: REALM: Retrieval-Augmented Language Model Pre-Training https://arxiv.org/abs/2002.08909
Abstract
Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents. We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity. ...
Video: https://www.youtube.com/watch?v=JQ-bxQT5Qsw
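To make the retrieve-then-predict idea from the abstract concrete: REALM treats the retrieved document z as a latent variable and marginalizes, p(y|x) = sum over z of p(y|x,z) * p(z|x), where p(z|x) is a softmax over inner products of query and document embeddings. Below is a minimal NumPy sketch of that marginalization, not the authors' implementation; the names realm_predict, predict_fn, and top_k are hypothetical, and the top-k inner-product search stands in for the MIPS index the paper uses over millions of documents.

import numpy as np

def softmax(s):
    e = np.exp(s - s.max())  # numerically stable softmax
    return e / e.sum()

def realm_predict(query_vec, doc_vecs, predict_fn, top_k=5):
    """Sketch of REALM's retrieve-then-predict step.

    query_vec : embedding of the input x (from the query encoder)
    doc_vecs  : (num_docs, dim) matrix of document embeddings
    predict_fn: returns p(y | x, z) for a retrieved document index z

    p(y|x) is approximated by marginalizing over the top-k documents
    ranked by inner product (found via MIPS in the paper).
    """
    scores = doc_vecs @ query_vec        # relevance score f(x, z) for every doc
    top = np.argsort(scores)[-top_k:]    # approximate the sum with the top-k docs
    p_z = softmax(scores[top])           # retrieval distribution p(z|x)
    # Marginal likelihood: sum_z p(y|x,z) * p(z|x)
    return sum(p * predict_fn(z) for z, p in zip(top, p_z))

# Toy usage with random embeddings and a dummy predictor:
rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 128))
q = rng.normal(size=128)
p_y = realm_predict(q, docs, predict_fn=lambda z: 1.0 / (1 + z))

Because p(z|x) is a differentiable softmax over embedding inner products, the MLM loss can backpropagate into the retriever itself, which is the key trick the video spends the 17:00-35:31 chapters unpacking.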