A Comparative Study on Reinforcement Learning Based Visual Dialog Systems
Elshamy, Ghada M.; Alfonse, Marco; Islam Hegazy; Aref, Mostafa M.
Abstract
Recently, the conjunction of vision and language has given rise to many intersecting tasks, such as visual question answering and image captioning. In particular, dialog systems that depend on a visual scene play an important role in improving human-computer interaction technology. At the same time, reinforcement learning has emerged as a very successful paradigm for a variety of machine learning tasks, especially those that aim to develop smart, humanoid machines. In this paper, we show how reinforcement learning is applied to conversational agents to build a powerful visual dialog agent. The visual dialog task requires the agent to hold a meaningful conversation about visual content in natural language. Given an image, its caption, a dialog history (question/answer pairs), and a question about the scene, the agent should comprehend the question, extract the relevant context from the history, and ground this information in the image to answer the current question correctly. Two main visual dialog tasks have been introduced: a free-form dialog task known as “Visual Dialog” and a goal-oriented dialog task formulated as a guessing game. Two datasets have been introduced to address these tasks: VisDial and GuessWhat?!. For evaluation, some approaches use accuracy, while others use four metrics proposed specifically for this task. Several approaches have been proposed for tackling this task, based on supervised learning, reinforcement learning, or a combination of both. This paper presents a comparative study of eleven important reinforcement learning approaches for visual dialog.
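The input described in the abstract (an image, its caption, a dialog history of question/answer pairs, and the current question) can be pictured as a small record. The following is a minimal illustrative sketch, not taken from the paper or from any of the surveyed approaches: the class name `VisualDialogInput`, its field names, and the helper `build_text_context` are all hypothetical, and the image is represented only by a file path.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VisualDialogInput:
    """One query to a visual dialog agent (hypothetical field names)."""
    image_path: str                  # reference to the image being discussed
    caption: str                     # caption of the image
    history: List[Tuple[str, str]]   # earlier (question, answer) pairs, oldest first
    question: str                    # the current question to be answered


def build_text_context(sample: VisualDialogInput) -> str:
    """Flatten caption, history and current question into one text block.

    This is only the textual side of the input; a real agent would also
    encode the image and ground the text in it.
    """
    lines = [f"Caption: {sample.caption}"]
    for past_question, past_answer in sample.history:
        lines.append(f"Q: {past_question}")
        lines.append(f"A: {past_answer}")
    lines.append(f"Q: {sample.question}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = VisualDialogInput(
        image_path="example.jpg",  # placeholder path, no file is read
        caption="a cat on a sofa",
        history=[("what color is the cat?", "black")],
        question="is it sleeping?",
    )
    print(build_text_context(sample))
```

Keeping the history as an ordered list of pairs mirrors the abstract's description of extracting relevant context from earlier turns before answering the current question.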
Other data
Title | A Comparative Study on Reinforcement Learning Based Visual Dialog Systems
Authors | Elshamy, Ghada M.; Alfonse, Marco; Islam Hegazy; Aref, Mostafa M.
Keywords | Visual Dialog; Guessing Game; Guess What?!; Guess Which; Attention Mechanism
Issue Date | Jun-2024
Publisher | Faculty of Computer and Information Sciences, Ain Shams University
Journal | International Journal of Intelligent Computing and Information Sciences
Volume | 24
Issue | 2
Start page | 58
End page | 79
ISSN | 2535-1710
DOI | 10.21608/ijicis.2024.295310.1339