Augmented TIRG for CBIR Using Combined Text and Image Features

Aboali, Mohamed; Elmaddah, Islam; Hossam El Din Hassan Abdelmunim

Abstract


In this paper we propose a methodology for Content-Based Image Retrieval (CBIR) using query inputs in the form of a source image and text modifiers. The proposed methodology augments the TIRG methodology of [21] with a trained module that strengthens the relationship between (a) the composed image-text features and (b) the target image features (e.g. given an image of a blue dress plus a textual description, retrieve the same dress in red). Our study used two trained modules: Linear Regression (LR) and a Non-linear Multilayer Perceptron (NMLP). The proposed models were tested on the well-known Fashion 200K dataset. The LR module reduced the Mean Squared Error (MSE) significantly, and a joint LR model outperformed TIRG on the test dataset. Two NMLP models were trained, one optimized for MSE and one for cosine similarity; their performance was very similar, and the NMLP models generally outperformed TIRG on the training dataset. The study also indicates that combining image and text features should be deferred to later stages so that the intersections of their recall sets can be exploited. Moreover, the study showed that text-feature generation based on word indices unrelated to word semantics requires a semantic hub that bridges the words to semantically meaningful numbers.
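The LR augmentation described above can be sketched as a learned linear map from composed image-text features to target image features. The following is a minimal illustration, not the paper's implementation: the feature dimension, sample count, and synthetic data are assumptions; in the actual system the inputs would be TIRG composed features and target image embeddings from Fashion 200K.

```python
import numpy as np

# Hypothetical sketch of the LR augmentation module: learn a linear map W
# that carries composed image+text features toward the target image
# features, reducing the MSE between them. All data here is synthetic.

rng = np.random.default_rng(0)
dim = 64          # assumed embedding dimension (illustration only)
n_pairs = 500     # assumed number of (composed, target) training pairs

# Synthetic stand-ins for TIRG composed features and target image features.
composed = rng.normal(size=(n_pairs, dim))
true_map = rng.normal(size=(dim, dim)) / np.sqrt(dim)
target = composed @ true_map + 0.05 * rng.normal(size=(n_pairs, dim))

# Closed-form least-squares fit of the LR module: W = argmin ||XW - Y||^2.
W, *_ = np.linalg.lstsq(composed, target, rcond=None)

mse_before = np.mean((composed - target) ** 2)   # identity-map baseline
mse_after = np.mean((composed @ W - target) ** 2)
print(f"MSE before: {mse_before:.4f}, after LR: {mse_after:.4f}")
```

At retrieval time, the mapped features `composed @ W` would be scored against candidate target-image features, which is where a cosine-similarity objective (as in the NMLP variant) becomes a natural alternative to MSE.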


Other data

Title Augmented TIRG for CBIR Using Combined Text and Image Features
Authors Aboali, Mohamed; Elmaddah, Islam; Hossam El Din Hassan Abdelmunim
Issue Date 1-Jan-2021
Conference International Conference on Electrical, Computer and Energy Technologies (ICECET) 2021
ISBN 9781665442312
DOI 10.1109/ICECET52533.2021.9698617
Scopus ID 2-s2.0-85127030138



