A Unified Multimodal Framework for Joint Visual Question Answering and Image Captioning. PIJSSH. 2026;5(1):1-14. doi:10.59088/7m3hce68