How do we represent the visual world? My research focuses on developing representation and reasoning approaches for a deeper understanding of scenes. I am interested in formulating the scene understanding problem in terms of the underlying 3D scene and in developing reasoning approaches based on the physical, functional and causal relationships between the different elements in the scene. The key idea is to have a representation that is qualitative and yet meaningfully grounded in the physical scene.
What is the link between language and vision? What role does language play in visual learning? I am interested in exploring how declarative and other linguistic information can be harnessed to efficiently learn how the world works (structural information). I am also interested in exploring how we can obtain such linguistic information.
How are actions and objects related to each other? I have been studying how humans interact with their environment and how their perception of the visual world depends on these interactions and on their abilities. Building upon Gibson’s idea of affordances, we have recently proposed the concept of human-centric scene understanding.