Deep Learning for Scene Understanding (3D, Functional, Semantics)

Project Information

Discipline: Computer Science (401)
Subdiscipline: 11.04 Information Sciences and Systems
Orientation: Research

Abstract

The goal of this project is to develop a geometric and functional representation of our visual world for scene understanding. The project aims to harness recent advances in deep learning (convolutional neural networks, CNNs) and to explore the improvements they can bring to scene understanding. A further goal is to explore how reasoning can be performed using convolutional neural networks.

Intellectual Merit

This project uses deep-learning-based approaches to integrate physical and visual representations of the visual world with action modeling. We propose research divided into three task areas:

Task 1. Physical Representation: exploring how CNNs can be used to predict surface normals from an input image.
Task 2. Functional Representation: exploring how CNNs can be used for direct perception of affordances.
Task 3. Reasoning: exploring how CNNs can be used to combine multiple tasks.

Broader Impacts

This project is anticipated to result in major advances within the image understanding community, bringing it closer to researchers in deep learning and robotics. It is expected to yield improvements in (a) 3D scene understanding, (b) recognition, and (c) human activity understanding, and could therefore be a critical enabling technology for applications such as autonomous systems, surveillance, and personal robotics. The project is also expected to contribute to education through course development, student projects (see: https://sites.google.com/site/16899fall2014/), workshops, and tutorials that reach a broader audience, as well as through popular online media (e.g., YouTube).

Project Contact

Project Lead: Abhinav Gupta (abhinavg)
Project Manager: Abhinav Gupta (abhinavg)
Project Members: Xiaolong Wang, Abhinav Shrivastava, Naiyan Wang

Resource Requirements

Hardware System

delta (GPU Cloud)

Use of FutureGrid

We plan to use the GPU cloud to develop CNN architectures for scene understanding. We also plan to use the cloud for demos in the new deep learning course being taught at CMU.

Scale of Use

We will need a few VMs for a few months to run experiments with different architectures. We will also need only a few TB of storage.

Project Timeline

Submitted: 09/26/2014 - 00:04
