In this paper, we describe a spoken Arabic dialect identification (ADI) model that consistently outperforms previously published results on two benchmark datasets: ADI-5 and ADI-17. We explore two architectural variations, ResNet and ECAPA-TDNN, coupled with two types of acoustic features, MFCCs and features extracted from the pre-trained self-supervised model UniSpeech-SAT Large, as well as a fusion of all four variants. We find that, individually, the ECAPA-TDNN architecture outperforms ResNet, and models with UniSpeech-SAT features outperform models with MFCCs by a large margin. Furthermore, a fusion of all four variants consistently outperforms the individual models. Our best models surpass previously reported results on both datasets, with accuracies of 84.7% and 96.9% on ADI-5 and ADI-17, respectively.
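The abstract reports that fusing the four system variants (ResNet/ECAPA-TDNN crossed with MFCC/UniSpeech-SAT features) beats each individual model. The exact fusion rule is not specified here; the sketch below illustrates one common, hypothetical choice for such a setup, score-level fusion by averaging per-system class posteriors (all probability values are toy data):

```python
import numpy as np

def fuse_posteriors(posteriors):
    """Average per-system dialect posteriors and return the top class index.

    posteriors: list of 1-D arrays, each of shape (n_dialects,),
    with entries summing to 1 for each system.
    """
    fused = np.mean(np.stack(posteriors), axis=0)
    return int(np.argmax(fused))

# Toy example with 5 dialect classes (as in ADI-5) and four systems;
# the per-system scores below are illustrative, not from the paper.
systems = [
    np.array([0.60, 0.10, 0.10, 0.10, 0.10]),   # ResNet + MFCC
    np.array([0.50, 0.20, 0.10, 0.10, 0.10]),   # ResNet + UniSpeech-SAT
    np.array([0.70, 0.10, 0.10, 0.05, 0.05]),   # ECAPA-TDNN + MFCC
    np.array([0.80, 0.05, 0.05, 0.05, 0.05]),   # ECAPA-TDNN + UniSpeech-SAT
]
print(fuse_posteriors(systems))
```

Averaging posteriors is only one option; weighted averaging or logistic-regression calibration over the system scores are other standard fusion strategies.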
- Acoustic features
- Arabic dialects
- Benchmark datasets
- Dialect identification
arXiv link: https://doi.org/10.48550/arXiv.2310.13812
Preprint License: CC BY NC SA 4.0
Uploaded 30 November 2023