LIU Zheng, DONG Jun, JIALE Dongzhu, CHAOMU Rilige, LIU Xuan, WENG Yu. A code-switching-based approach for low-resource language visual question answering[J]. Journal of Beijing Normal University (Natural Science). DOI: 10.12202/j.0476-0301.2025054

A code-switching-based approach for low-resource language visual question answering

  • To address the challenges that vision-language models face in low-resource scenarios, such as the lack of large-scale annotated data and of effective transfer methods, a Code-switching Chinese Minority Pre-trained Language Model-Visual Question Answering (CCMPLM-VQA) method is proposed. A cross-lingual masked modeling approach based on code-switching reduces the model's dependence on annotated training data. In addition, a Language Adapter (LA) with a novel structure is introduced to improve the multimodal alignment of CCMPLM-VQA. Experiments verify the effectiveness of the proposed method: compared with the best baseline model, CCMPLM-VQA improves zero-shot performance on the real-world general visual reasoning dataset by approximately 12%, and its zero-shot performance on cross-lingual real-world general visual reasoning datasets also exceeds that of existing comparable methods by about 1%.
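The abstract does not describe how the code-switching masked modeling data are built. As a rough illustration only, the sketch below shows one common way to construct such training examples. Words in a pivot-language sentence are randomly swapped for their dictionary translations, and then a fraction of the mixed-language tokens is masked for the model to predict. The bilingual lexicon, the placeholder `<L2:...>` tokens, the probabilities `switch_prob` and `mask_prob`, and all function names are hypothetical assumptions. They are not the actual CCMPLM-VQA implementation, and the visual inputs of the VQA model are omitted entirely.

```python
import random
from typing import Dict, List, Optional, Tuple

MASK = "[MASK]"  # hypothetical mask token

def code_switch(tokens: List[str], lexicon: Dict[str, str],
                switch_prob: float, rng: random.Random) -> List[str]:
    """Replace each token that has a dictionary translation with that
    translation, independently with probability switch_prob (assumed scheme)."""
    return [lexicon[t] if t in lexicon and rng.random() < switch_prob else t
            for t in tokens]

def mask_tokens(tokens: List[str], mask_prob: float,
                rng: random.Random) -> Tuple[List[str], List[Optional[str]]]:
    """Simplified BERT-style masking: each selected token becomes [MASK]
    (no random-replace/keep split). Labels hold the original token at masked
    positions and None elsewhere (no loss at those positions)."""
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for t in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(t)
        else:
            inputs.append(t)
            labels.append(None)
    return inputs, labels

def make_example(tokens: List[str], lexicon: Dict[str, str],
                 switch_prob: float = 0.5, mask_prob: float = 0.15,
                 seed: int = 0) -> Tuple[List[str], List[Optional[str]]]:
    """Code-switch first, then mask, so masked targets may be in either language."""
    rng = random.Random(seed)
    switched = code_switch(tokens, lexicon, switch_prob, rng)
    return mask_tokens(switched, mask_prob, rng)

# Usage with a hypothetical two-entry lexicon.
tokens = ["a", "dog", "on", "the", "sofa"]
lexicon = {"dog": "<L2:dog>", "sofa": "<L2:sofa>"}
inputs, labels = make_example(tokens, lexicon, switch_prob=1.0, mask_prob=0.0)
print(inputs)  # ['a', '<L2:dog>', 'on', 'the', '<L2:sofa>']
```

In this sketch, masking is applied after switching. The model therefore has to recover tokens of either language from mixed-language context, which is the general intuition behind using code-switching to transfer from a high-resource language without parallel annotated data.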