Generation of Image Captions Using VGG and ResNet CNN Models Cascaded with RNN Approach
, Shubham Sureka, Shaunak Joshi,
Published in Springer
Volume: 1085
Pages: 27 - 42
Recent advancements in technology have made available a variety of image-capturing devices, ranging from handheld mobiles to space-grade rovers. These devices generate a tremendous amount of visual data, creating a need to organize and understand it, and thus to caption thousands of such images. This need has driven extensive research in computer vision and deep learning. Inspired by such recent work, we present an image caption generating system that uses a convolutional neural network (CNN) to extract the feature embedding of an image and feeds it as input to long short-term memory (LSTM) cells that generate a caption. We use two CNN models pre-trained on ImageNet: VGG16 and ResNet-101. Both models were tested and compared on the Flickr8K dataset. Experimental results on this public benchmark dataset demonstrate that the proposed architecture performs as efficiently as state-of-the-art image captioning models. © Springer Nature Singapore Pte Ltd. 2020.
About the journal
Journal: International Conference on Machine Intelligence and Signal Processing
Publisher: Springer
Open Access: No