Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning (Paper Explained)
Deep Learning Explainer
This paper proposes a zero-shot method that achieves strong results on commonsense reasoning tasks without any fine-tuning. With this novel scoring mechanism, RoBERTa-large (355M parameters) performs surprisingly well in a zero-shot setup.
0:00 - Intro
3:08 - Commonsense reasoning
5:15 - Proposed method
8:29 - Sequence scoring method
12:59 - SSM-based fine-tuning
15:28 - Task probing
19:23 - Experiment results
24:58 - Future work
25:31 - Takeaways
Paper: https://arxiv.org/abs/2004.14074
Abstract: Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks. Most of the existing approaches rely on a randomly initialized classifier on top of such networks. We argue that this fine-tuning procedure is sub-optimal as the pre-trained model has no prior on the specific classifier labels, while it might have already learned an intrinsic textual representation of the task. In this paper, we introduce a new scoring method that casts a plausibility ranking task in a full-text format and leverages the masked language modeling head tuned during the pre-training phase. We study commonsense reasoning tasks where the model must rank a set of hypotheses given a premise, focusing on the COPA, Swag, HellaSwag and CommonsenseQA datasets. By exploiting our scoring method without fine-tuning, we are able to produce strong baselines (e.g. 80% test accuracy on COPA) that are comparable to supervised approaches. Moreover, when fine-tuning directly on the proposed scoring function, we show that our method provides a much more stable training phase across random restarts (e.g. 10× standard deviation reduction on COPA test accuracy) and requires less annotated data than the standard classifier approach to reach equivalent performance.
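The scoring idea in the abstract can be illustrated with a short masked-LM pseudo-log-likelihood sketch: each hypothesis token is masked in turn, its log-probability given the premise and the rest of the hypothesis is summed, and the highest-scoring hypothesis is chosen. This is a minimal sketch assuming the HuggingFace transformers library; the concatenation format, example sentences, and function names are illustrative, not the paper's released code.

```python
# Minimal sketch of masked-LM sequence scoring for zero-shot plausibility ranking.
# Assumes HuggingFace transformers; details are illustrative, not the paper's code.
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaForMaskedLM.from_pretrained("roberta-large")
model.eval()

def sequence_score(premise: str, hypothesis: str) -> float:
    """Sum log P(token | context) over hypothesis tokens, masking one token at a time."""
    premise_ids = tokenizer.encode(premise, add_special_tokens=False)
    hypo_ids = tokenizer.encode(" " + hypothesis, add_special_tokens=False)
    # Layout: <s> premise hypothesis </s>
    input_ids = [tokenizer.cls_token_id] + premise_ids + hypo_ids + [tokenizer.sep_token_id]
    hypo_start = 1 + len(premise_ids)

    total = 0.0
    with torch.no_grad():
        for i, token_id in enumerate(hypo_ids):
            masked = list(input_ids)
            masked[hypo_start + i] = tokenizer.mask_token_id  # mask the i-th hypothesis token
            logits = model(torch.tensor([masked])).logits
            log_probs = torch.log_softmax(logits[0, hypo_start + i], dim=-1)
            total += log_probs[token_id].item()  # log-probability of the true token
    return total

# Zero-shot ranking: pick the hypothesis the masked LM finds most plausible.
premise = "The man broke his toe."
hypotheses = ["He got a hole in his sock.", "He dropped a hammer on his foot."]
best = max(hypotheses, key=lambda h: sequence_score(premise, h))
print(best)
```

Because the ranking uses only the pre-trained masked language modeling head, no randomly initialized classifier is added, which is what allows the zero-shot baselines reported in the abstract.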
Deep Learning Explainer Twitter: https://twitter.com/DeepExplainer ... https://www.youtube.com/watch?v=Ijrdm0Nb_k0