Generation and Evaluation of Hindi Image Captions of Visual Genome

Abstract

Automatic generation of fluent and expressive image captions is an emerging area of research. Much work has been done on image caption generation for English, but very little has addressed generating and evaluating captions in Hindi. This paper addresses the generation and evaluation of Hindi captions using a framework based on a convolutional neural network (CNN) and long short-term memory (LSTM). The model is trained to maximize the likelihood of the target caption given an input image. The framework is evaluated on the Hindi Visual Genome dataset, using both human evaluation and established automatic evaluation metrics. The experimental results show that the model generates reasonably good Hindi captions.
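
The likelihood objective mentioned in the abstract is commonly written as follows for CNN-LSTM captioning models. This is a sketch under assumptions: the symbols $I$ (input image), $S = (S_1, \dots, S_N)$ (the tokens of a Hindi caption), and $\theta$ (model parameters) are notation introduced here for illustration, and the exact formulation used in the paper may differ.

```latex
% Assumed notation: I = image, S = caption tokens, \theta = model parameters
\theta^{*} = \arg\max_{\theta} \sum_{(I, S)} \log p(S \mid I; \theta),
\qquad
\log p(S \mid I; \theta) = \sum_{t=1}^{N} \log p(S_t \mid I, S_1, \dots, S_{t-1}; \theta)
```

Under this reading, the CNN would encode the image $I$ and the LSTM would model each conditional term, predicting the next caption token from the image representation and the tokens generated so far.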