Deep Learning for Scene Understanding (3D, Functional, Semantics)

Project Information

Discipline
Computer Science (401) 
Subdiscipline
11.04 Information Sciences and Systems 
Orientation
Research 
Abstract

The goal of this project is to develop a geometric and functional representation of our visual world for scene understanding. It aims to harness recent advances in deep learning, specifically convolutional neural networks (CNNs), and to explore the improvements they make possible in scene understanding. A further goal is to explore how reasoning can be performed using CNNs.

Intellectual Merit

This project is an attempt to use deep learning based approaches to integrate physical and visual representations of the visual world with action modeling. We propose research divided into three task areas:
  • Task 1. Physical Representation: Exploring how CNNs can be used to predict surface normals for an input image.
  • Task 2. Functional Representation: Exploring how CNNs can be used for direct perception of affordances.
  • Task 3. Reasoning: Exploring how CNNs can be used to combine multiple tasks.
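
To make the target of Task 1 concrete: a surface normal prediction assigns a unit 3-vector to every pixel, and a prediction can be compared to a reference by the angle between the two vectors. The sketch below is only an illustration under that assumption. It is not the project's method, the function names are hypothetical, and the choice of mean angular error as the comparison is ours, made for the example. Normal maps are represented as nested lists of 3-tuples (rows of pixels) to keep the example free of dependencies.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        return tuple(v)
    return tuple(c / norm for c in v)


def angular_error_deg(pred, gt):
    """Angle in degrees between two 3-vectors after normalization."""
    p, g = normalize(pred), normalize(gt)
    dot = sum(a * b for a, b in zip(p, g))
    dot = max(-1.0, min(1.0, dot))  # clamp against floating-point rounding
    return math.degrees(math.acos(dot))


def mean_angular_error_deg(pred_map, gt_map):
    """Mean per-pixel angular error over two H x W maps of 3-vectors."""
    errors = [
        angular_error_deg(p, g)
        for pred_row, gt_row in zip(pred_map, gt_map)
        for p, g in zip(pred_row, gt_row)
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical 1 x 2 "image": first pixel agrees, second is perpendicular.
    predicted = [[(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)]]
    reference = [[(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]]
    print(mean_angular_error_deg(predicted, reference))
```

In practice the predicted map would come from a network's raw output, which is why the sketch normalizes each vector before comparing it.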

Broader Impacts

This project is anticipated to result in major advances within the image understanding community, bringing it closer to researchers in deep learning and robotics. It is anticipated to result in improvements in (a) 3D scene understanding, (b) recognition, and (c) human activity understanding, and hence could be a critical enabling technology for applications such as autonomous systems, surveillance, and personal robotics. This project is also expected to contribute to education through course development, student projects (see https://sites.google.com/site/16899fall2014/), workshops, and tutorials involving a broader audience, as well as through popular online media (e.g., YouTube).

Project Contact

Project Lead
Abhinav Gupta (abhinavg) 
Project Manager
Abhinav Gupta (abhinavg) 
Project Members
Xiaolong Wang, Abhinav Shrivastava, Naiyan Wang  

Resource Requirements

Hardware System
  • delta (GPU Cloud)
 
Use of FutureGrid

We plan to use the GPU cloud for developing CNN architectures for Scene Understanding. We also plan to use the cloud for demos in the new deep learning course being taught at CMU.

Scale of Use

We will need a few VMs for a few months to run experiments with different architectures. We will also need a few TB of storage.

Project Timeline

Submitted
09/26/2014 - 00:04