Contribution to Book
Tiny Language Models Enriched with Multimodal Knowledge from Multiplex Networks
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning
  • Clayton Fields, Boise State University
  • Osama Natouf, Boise State University
  • Andrew McMains, Boise State University
  • Catherine Henry, Boise State University
  • Casey Kennington, Boise State University
Document Type
Conference Proceeding
Publication Date
1-1-2023
Abstract

Large transformer language models trained exclusively on massive quantities of text are now the standard in NLP. Beyond the impractical amounts of data required to train them, they demand enormous computational resources, and they lack the rich array of sensory information available to humans, who learn language from far less linguistic exposure. In this study, conducted for submission to the BabyLM Challenge, we show that we can improve a small transformer model's data efficiency by enriching its embeddings: we swap the learned word embeddings of a tiny transformer for vectors extracted from a custom multiplex network that encodes visual and sensorimotor information. We use a custom variant of the ELECTRA model that contains fewer than 7 million parameters and can be trained end-to-end on a single GPU. Our experiments show that models using these embeddings outperform equivalent models when both are pretrained on only the small BabyLM dataset of 10 million words of text, across a variety of natural language understanding tasks from the GLUE and SuperGLUE benchmarks and a variation of the BLiMP task.
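The core operation the abstract describes, replacing a small transformer's learned word-embedding matrix with externally derived multimodal vectors, can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the vocabulary size, embedding dimension, encoder depth, and the decision to freeze the swapped embeddings are assumptions, the `TinyEncoder` class is a hypothetical stand-in for the paper's ELECTRA variant, and the random `multiplex_vectors` tensor stands in for the vectors actually extracted from the multiplex network.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration; the paper's exact sizes are not
# reproduced here (its full model has fewer than 7 million parameters).
VOCAB_SIZE = 16_384
EMBED_DIM = 128

# Stand-in for vectors extracted from a multiplex network encoding visual and
# sensorimotor information (random here, purely for illustration).
multiplex_vectors = torch.randn(VOCAB_SIZE, EMBED_DIM)


class TinyEncoder(nn.Module):
    """Minimal transformer encoder standing in for the ELECTRA-style model."""

    def __init__(self, vocab_size: int, embed_dim: int):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=4, dim_feedforward=256, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.encoder(self.embeddings(input_ids))


model = TinyEncoder(VOCAB_SIZE, EMBED_DIM)

# Swap the randomly initialized, learnable word embeddings for the
# multiplex-derived vectors before pretraining begins.
with torch.no_grad():
    model.embeddings.weight.copy_(multiplex_vectors)

# Optionally freeze them so pretraining does not overwrite the multimodal signal
# (whether to freeze is a design choice, not something the abstract specifies).
model.embeddings.weight.requires_grad = False

# Quick shape check on a dummy batch of token ids.
dummy_ids = torch.randint(0, VOCAB_SIZE, (2, 16))
print(model(dummy_ids).shape)  # torch.Size([2, 16, 128])
```

In practice the same swap can be applied to a Hugging Face ELECTRA checkpoint by copying the multimodal vectors into the model's input embedding matrix, but the exact integration used in the paper is not detailed in this abstract.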

Creative Commons License
Creative Commons Attribution 4.0 International
Citation Information
Fields, Clayton; Natouf, Osama; McMains, Andrew; Henry, Catherine; and Kennington, Casey. (2023). "Tiny Language Models Enriched with Multimodal Knowledge from Multiplex Networks". In A. Warstadt, A. Mueller, L. Choshen, E. Wilcox, C. Zhuang, J. Ciro, R. Mosquera, B. Paranjape, A. Williams, T. Linzen, and R. Cotterell (Eds.), Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning (pp. 47-57). https://doi.org/10.18653/v1/2023.conll-babylm.3