In recent years, we have witnessed significant progress in various fields of AI, such as computer vision and language understanding. This progress has motivated researchers to address a more challenging problem: question answering and question generation on visual data, a task that combines both image understanding and language understanding.
Essentially, this task is defined as follows: an image, along with a question about that image, is the input to the AI system, and the system is expected to output a correct answer to the question with respect to the input image. Taking a step beyond that, through this project we aim to create a system that, given an image as input, generates a question-answer pair and, when the user submits an answer, verifies that answer against the generated one.
Our solution thus uses both a Visual Question Generation (VQG) module and a Visual Question Answering (VQA) module in sequence. The VQG module generates reasonable questions given an image, whereas the VQA module generates natural answers given a question and an image. For each input image, the VQG module first produces a question, and the VQA module then produces the corresponding answer.
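The sequential flow described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the two module functions below are hypothetical placeholders standing in for trained VQG and VQA models, and the example question and answer are invented for demonstration.

```python
# Hypothetical sketch of the proposed VQG -> VQA -> verification pipeline.
# The two "model" functions are placeholders; a real system would wrap
# trained VQG and VQA networks here.

def generate_question(image):
    # Placeholder VQG module: a trained model would generate a
    # question conditioned on the input image.
    return "How many dogs are in the picture?"

def generate_answer(image, question):
    # Placeholder VQA module: a trained model would answer the
    # question given the image.
    return "2"

def verify(image, user_answer):
    # Full pipeline: VQG produces a question, VQA produces the
    # reference answer, and the user's submission is checked
    # against it (case-insensitive, whitespace-trimmed).
    question = generate_question(image)
    reference = generate_answer(image, question)
    return question, user_answer.strip().lower() == reference.strip().lower()

if __name__ == "__main__":
    question, correct = verify("dog_photo.jpg", "2")
    print(question, correct)
```

In a deployed system, the string comparison in `verify` would likely be replaced by a softer matching scheme (e.g. normalizing synonyms or numeric forms), since VQA answers are free-form text.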
The proposed solution aims to bridge this gap by integrating answer verification into existing question-answer generation platforms. The most immediate application is as an automatic dataset annotation tool: with the growing demand for varied datasets in deep learning, such a system can reduce the need for human labour in annotation. The solution can also benefit the educational sector and help improve existing CAPTCHA-based authentication. In education, children can use such an application to learn to answer basic questions about images, while the same images can serve as an extra authentication step to increase the security of a system.
This work received a grant from the University of Mumbai.