A Unified Multimodal Framework for Joint Visual Question Answering and Image Captioning