A Comparative Study on Reinforcement Learning Based Visual Dialog Systems

Elshamy, Ghada M.; Alfonse, Marco; Islam Hegazy; Aref, Mostafa M.

Abstract


Recently, the conjunction of vision and language has given rise to many intersecting tasks, such as visual question answering and image captioning. In particular, dialog systems that depend on a visual scene play an important role in improving human-computer interaction technology. At the same time, reinforcement learning has emerged as a very successful paradigm for a variety of machine learning tasks, especially those that aim to develop smart, humanoid machines. In this paper, we show how reinforcement learning is applied to conversational agents to build a powerful visual dialog agent. The Visual Dialog task requires the agent to hold a meaningful conversation about visual content in natural language. Given an image, its caption, a dialog history (question/answer pairs), and a question about the scene, the agent should comprehend the question, extract the relevant context from the history, and ground this information in the image to answer the current question correctly. Two main visual dialog tasks have been introduced: a free-form dialog task known as “Visual Dialog” and a goal-oriented dialog task formulated as a guessing game. Two datasets have been introduced to address these tasks: the VisDial and GuessWhat?! datasets. For evaluation, some approaches use the accuracy metric, while others use four metrics proposed specifically for this task. Several approaches have been proposed for tackling this task, based on supervised learning, reinforcement learning, or a combination of both. This paper presents a comparative study of eleven important reinforcement learning approaches for visual dialog.


Other data

Title: A Comparative Study on Reinforcement Learning Based Visual Dialog Systems
Authors: Elshamy, Ghada M.; Alfonse, Marco; Islam Hegazy; Aref, Mostafa M.
Keywords: Visual Dialog; Guessing Game; Guess What?!; Guess Which; Attention Mechanism
Issue Date: Jun-2024
Publisher: Faculty of Computer and Information Sciences, Ain Shams University
Journal: International Journal of Intelligent Computing and Information Sciences
Volume: 24
Issue: 2
Start page: 58
End page: 79
ISSN: 2535-1710
DOI: 10.21608/ijicis.2024.295310.1339

Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.