Paper:
ViConBERT: Context-Gloss Aligned Vietnamese Word Embedding for Polysemous and Sense-Aware Representations
Student author:
• Huỳnh Tấn Khang, BCU International Joint Program 2023, lead author
Supervisors:
• Dr. Nguyễn Thanh Bình
• Nguyễn Hà Dung, MSc
Abstract:
Recent advances in contextualized word embeddings have significantly improved semantic tasks such as Word Sense Disambiguation (WSD) and contextual similarity, yet these advances are largely confined to high-resource languages like English. Vietnamese still lacks robust models and large-scale evaluation resources for fine-grained semantic understanding. This paper introduces ViConBERT, a novel framework for learning Vietnamese contextualized word embeddings by integrating contrastive learning (SimCLR) with gloss-based distillation to better capture word meaning. In addition, the study proposes ViConWSD, the first large-scale synthetic dataset designed to evaluate semantic understanding in Vietnamese, covering both WSD and contextual similarity tasks. Experimental results demonstrate that ViConBERT outperforms strong baselines on WSD (F1 = 0.87) and achieves competitive performance on ViCon (AP = 0.88) and ViSim-400 (Spearman’s rho = 0.60), highlighting its effectiveness in modeling both discrete word senses and graded semantic relations.
The 14th International Symposium on Information and Communication Technology (SOICT 2025) is a reputable international scientific conference covering areas such as AI Foundations and Big Data, Networking and Communication Technologies, Multimedia Processing, AI Applications, Generative AI, Applied Operations Research and Optimization, and Cyber Security. The conference proceedings are published in Springer's Communications in Computer and Information Science (CCIS) series and indexed by DBLP, Google Scholar, EI-Compendex, Mathematical Reviews, SCImago, and Scopus; they are also submitted for consideration for inclusion in ISI Proceedings.
More details: https://www.facebook.com/share/p/1A3M1y8ZRY/



