Visual understanding is an important area in artificial intelligence. Many researchers have proposed novel deep learning models for this, which have achieved impressive results. However, deep learning models of visual understanding do not model human-like cognition and are still far from human-level ability. Standard deep learning models do not model...
Visual Question Answering (VQA) increasingly attracts industry and academia attention. It requires the model to provide a natural language answer by an image and a related natural language question. Meanwhile, it relates to multidisciplinary research such as natural language understanding, visual information retrieval, and multimodal reasoning. As a multimodality task,...