Deep Learning for Scene Understanding (3D, Functional, Semantics)

Abstract

The goal of this project is to develop a geometric and functional representation of our visual world for scene understanding. This project aims to harness the recent advancements in deep learning (convolutional neural networks) and explore the possible improvements in scene understanding. Another goal of this project is to explore how reasoning can be performed using convolutional neural networks.

Intellectual Merit

This project attempts to use deep learning based approaches to integrate physical and visual representations of the visual world with action modeling. We propose research divided into three task areas:
Task 1. Physical Representation: Exploring how CNNs can be used to predict surface normals for an input image.
Task 2. Functional Representation: Exploring how CNNs can be used for direct perception of affordances.
Task 3. Reasoning: Exploring how CNNs can be used to combine multiple tasks.
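
To make Task 1 concrete, the sketch below shows one common way to compare a predicted surface normal with a reference normal: normalize both 3-vectors to unit length and measure the angle between them. This is an illustrative assumption, not the project's stated method. The project description does not specify an output representation or an evaluation metric, and the function names (normalize, angular_error_deg, mean_angular_error_deg) are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length (assumed normal representation)."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / norm for c in v)


def angular_error_deg(predicted, target):
    """Angle in degrees between a predicted and a reference normal."""
    p = normalize(predicted)
    t = normalize(target)
    dot = sum(a * b for a, b in zip(p, t))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))


def mean_angular_error_deg(pred_map, target_map):
    """Average per-pixel angular error over two same-sized normal maps.

    Each map is a list of rows, and each row is a list of 3-vectors.
    """
    errors = [
        angular_error_deg(p, t)
        for pred_row, target_row in zip(pred_map, target_map)
        for p, t in zip(pred_row, target_row)
    ]
    return sum(errors) / len(errors)
```

For example, a hypothetical 1x2 normal map whose first pixel matches the reference and whose second pixel is perpendicular to it has a mean error of 45 degrees under this measure.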

Broader Impact

This project is anticipated to result in major advances within the image understanding community, bringing it closer to researchers in deep learning and robotics. It is anticipated to result in improvements in (a) 3D scene understanding, (b) recognition, and (c) human activity understanding, and hence could be a critical enabling technology for applications such as autonomous systems, surveillance, and personal robotics.

This project is also expected to contribute to education through course development, student projects (see https://sites.google.com/site/16899fall2014/), workshops, and tutorials that reach a broader audience, including through popular online media (e.g., YouTube).

Use of FutureGrid

We plan to use the GPU cloud for developing CNN architectures for Scene Understanding. We also plan to use the cloud for demos in the new deep learning course being taught at CMU.

Scale Of Use

We will need a few VMs for a few months to run experiments with different architectures. We will also need a few TB of storage.

Publications

Project Number: FG-457

Project Lead: Abhinav Gupta

Institution: Carnegie Mellon University

Project Status: Active

Project Members

Abhinav Shrivastava

Naiyan Wang

Xiaolong Wang

Keywords

Deep Learning, Scene understanding

Timeline

Updated: 2 weeks 44 min ago