A code-switching-based approach for low-resource language visual question answering
Graphical Abstract
Abstract
To address the challenges that vision-language models face in low-resource scenarios, such as the lack of large-scale annotated data and of effective transfer methods, a code-switching Chinese Minority pre-trained language model visual question answering (CCMPLM-VQA) method is proposed in this work. A cross-lingual masked modeling approach based on code-switching reduces the model's dependence on annotated training data, and a language adapter (LA) with a novel structure is introduced to improve the multimodal alignment of CCMPLM-VQA. Experiments verify the effectiveness of the proposed method: compared with the best benchmark model, CCMPLM-VQA improves zero-shot performance on a real-world general visual reasoning dataset by approximately 12%, and its zero-shot performance on cross-lingual real-world general visual reasoning datasets exceeds that of existing methods by about 1%.
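As a rough illustration of the code-switching cross-lingual masked modeling idea summarized above, the Python sketch below substitutes words from a bilingual lexicon into a sentence and then masks tokens for a masked-language-modeling objective. The lexicon entries, substitution ratio, and mask ratio are placeholder assumptions for exposition, not the actual data or settings used by CCMPLM-VQA.

```python
import random

MASK_TOKEN = "[MASK]"

def code_switch(tokens, lexicon, switch_prob=0.3):
    """Build a code-switched sentence by replacing some source-language
    tokens with their dictionary translations (illustrative ratio)."""
    switched = []
    for tok in tokens:
        if tok in lexicon and random.random() < switch_prob:
            switched.append(lexicon[tok])   # swap in the low-resource-language word
        else:
            switched.append(tok)
    return switched

def mask_for_mlm(tokens, mask_prob=0.15):
    """Randomly mask tokens; return the corrupted input and the MLM labels."""
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append(MASK_TOKEN)
            labels.append(tok)              # model must recover the original token
        else:
            inputs.append(tok)
            labels.append(None)             # positions ignored by the MLM loss
    return inputs, labels

if __name__ == "__main__":
    # Toy bilingual lexicon with placeholder target-language entries.
    lexicon = {"狗": "<minority_word_dog>", "猫": "<minority_word_cat>"}
    sentence = ["图片", "里", "有", "一只", "狗"]
    switched = code_switch(sentence, lexicon)
    inputs, labels = mask_for_mlm(switched)
    print(inputs, labels)
```

Training on such code-switched, masked sentences lets a single model see source- and target-language words in shared contexts, which is what reduces the need for annotated data in the target language.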