Contribution to Book
Tiny Language Models Enriched with Multimodal Knowledge from Multiplex Networks
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning
  • Clayton Fields, Boise State University
  • Osama Natouf, Boise State University
  • Andrew McMains, Boise State University
  • Catherine Henry, Boise State University
  • Casey Kennington, Boise State University
Document Type
Conference Proceeding
Publication Date
1-1-2023
Abstract

Large transformer language models trained exclusively on massive quantities of text are now the standard in NLP. Beyond the impractical amounts of data required to train them, they demand enormous computational resources, and they lack the rich array of sensory information available to humans, who learn language from far less linguistic exposure. In this study, conducted for submission to the BabyLM Challenge, we show that we can improve a small transformer model's data efficiency by enriching its embeddings: we swap the learned word embeddings of a tiny transformer for vectors extracted from a custom multiplex network that encodes visual and sensorimotor information. We use a custom variant of the ELECTRA model that contains fewer than 7 million parameters and can be trained end-to-end on a single GPU. Our experiments show that models using these embeddings outperform equivalent models when both are pretrained on only the small BabyLM dataset of 10 million words of text, across a variety of natural language understanding tasks from the GLUE and SuperGLUE benchmarks and a variation of the BLiMP task.
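The core operation the abstract describes, replacing a small transformer's learned word-embedding matrix with externally derived multimodal vectors, can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the vocabulary size, embedding dimension, encoder depth, and the decision to freeze the swapped embeddings are assumptions, the `TinyEncoder` class is a hypothetical stand-in for the paper's ELECTRA variant, and the random `multiplex_vectors` tensor stands in for the vectors actually extracted from the multiplex network.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration; the paper's exact sizes are not
# reproduced here (its full model has fewer than 7 million parameters).
VOCAB_SIZE = 16_384
EMBED_DIM = 128

# Stand-in for vectors extracted from a multiplex network encoding visual and
# sensorimotor information (random here, purely for illustration).
multiplex_vectors = torch.randn(VOCAB_SIZE, EMBED_DIM)


class TinyEncoder(nn.Module):
    """Minimal transformer encoder standing in for the ELECTRA-style model."""

    def __init__(self, vocab_size: int, embed_dim: int):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=4, dim_feedforward=256, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.encoder(self.embeddings(input_ids))


model = TinyEncoder(VOCAB_SIZE, EMBED_DIM)

# Swap the randomly initialized, learnable word embeddings for the
# multiplex-derived vectors before pretraining begins.
with torch.no_grad():
    model.embeddings.weight.copy_(multiplex_vectors)

# Optionally freeze them so pretraining does not overwrite the multimodal signal
# (whether to freeze is a design choice, not something the abstract specifies).
model.embeddings.weight.requires_grad = False

# Quick shape check on a dummy batch of token ids.
dummy_ids = torch.randint(0, VOCAB_SIZE, (2, 16))
print(model(dummy_ids).shape)  # torch.Size([2, 16, 128])
```

In practice the same swap can be applied to a Hugging Face ELECTRA checkpoint by copying the multimodal vectors into the model's input embedding matrix, but the exact integration used in the paper is not detailed in this abstract.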

Creative Commons License
Creative Commons Attribution 4.0 International
Citation Information
Fields, Clayton; Natouf, Osama; McMains, Andrew; Henry, Catherine; and Kennington, Casey. (2023). "Tiny Language Models Enriched with Multimodal Knowledge from Multiplex Networks". In A. Warstadt, A. Mueller, L. Choshen, E. Wilcox, C. Zhuang, J. Ciro, R. Mosquera, B. Paranjape, A. Williams, T. Linzen, and R. Cotterell (Eds.), Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning (pp. 47-57). https://doi.org/10.18653/v1/2023.conll-babylm.3