Augmented TIRG for CBIR Using Combined Text and Image Features
Aboali, Mohamed; Elmaddah, Islam; Hossam El Din Hassan Abdelmunim
Abstract
In this paper we propose a methodology for Content-Based Image Retrieval (CBIR) using query inputs in the form of a source image and text modifiers. The proposed methodology augments the TIRG methodology proposed in [21] with a trained module. The trained module aims at strengthening the relationship between (a) the composed image-text features and (b) the target-image features (e.g., given an image of a blue dress and a textual modifier, retrieve the same dress in red). Our study used two trained modules: Linear Regression (LR) and a Non-Linear Multilayer Perceptron (NMLP). The proposed models were tested on the well-known Fashion200K dataset. The LR model reduced the Mean Squared Error (MSE) significantly, and a joint LR model outperformed TIRG on the test dataset. Two NMLP models were trained, one optimized for MSE and one for cosine similarity; the performance of the two was very similar, and the NMLP models generally outperformed TIRG on the training dataset. The study also indicates that the combination of image and text features should be deferred to later stages so that their recall intersections can be exploited. Moreover, the study shows that text-feature generation based on word indices unrelated to word semantics requires a semantic hub to bridge those indices to semantically meaningful numbers.
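The core idea of the trained LR module, as described above, is to learn a mapping from the TIRG-composed image-text features toward the target-image features so that the MSE between the two is reduced. The sketch below illustrates this with synthetic features; the dimensions, noise level, and the closed-form least-squares fit are illustrative assumptions, not the paper's actual training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): 'phi_composed' plays the role of the
# TIRG-composed image-text features, 'phi_target' the target-image
# features; sizes and noise are illustrative only.
n, d = 200, 16
phi_composed = rng.normal(size=(n, d))
W_true = rng.normal(size=(d, d))
phi_target = phi_composed @ W_true + 0.1 * rng.normal(size=(n, d))

# Linear Regression module: least-squares map from the composed
# feature space toward the target feature space.
W, *_ = np.linalg.lstsq(phi_composed, phi_target, rcond=None)
phi_aligned = phi_composed @ W

mse_before = float(np.mean((phi_composed - phi_target) ** 2))
mse_after = float(np.mean((phi_aligned - phi_target) ** 2))
print(f"MSE before LR module: {mse_before:.4f}")
print(f"MSE after LR module:  {mse_after:.4f}")
```

At retrieval time, the aligned feature (rather than the raw composed feature) would be matched against the gallery of target-image features; an NMLP variant would replace the linear map with a small perceptron trained under an MSE or cosine-similarity objective.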
Other data
| Title | Augmented TIRG for CBIR Using Combined Text and Image Features |
| Authors | Aboali, Mohamed; Elmaddah, Islam; Hossam El Din Hassan Abdelmunim |
| Issue Date | 1-Jan-2021 |
| Conference | International Conference on Electrical, Computer and Energy Technologies (ICECET 2021) |
| ISBN | 9781665442312 |
| DOI | 10.1109/ICECET52533.2021.9698617 |
| Scopus ID | 2-s2.0-85127030138 |
Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.