NIPS 2016 — Day 3 Highlights: Robots that know, Cars that see, and more!

Dec 9, 2016

Jeremy Karnowski & Ross Fadely, Insight Artificial Intelligence

Missed our highlights from NIPS 2016? Check out Day 1 and Day 2. Want to learn about applied Artificial Intelligence from leading practitioners in Silicon Valley or New York? Learn more about the Insight Artificial Intelligence Fellows Program.

Integrating deep learning, mechanical systems, and the real world

The robots are coming! Or rather, the robots are poking? One of the emerging themes of Day 3 was deep learning integrated into hardware applications, specifically robots and cars. A standout talk was given by Pulkit Agrawal about his team’s work Learning to Poke by Poking: Experiential Learning of Intuitive Physics. Using hundreds of hours of data in which a robot was tasked with moving objects by poking them, they built a system that lets the robot move an object around even if it has never seen it before. The system uses convolutional neural nets (CNNs) to see the world and has two models for understanding the physics involved: a forward model, which predicts the outcome of an action (a poke), and an inverse model, which maps the current and desired states of the world to the action connecting them. A series of convincing videos makes it clear that the robot has learned to move objects around fairly generally. Incredibly impressive!
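
To make the forward/inverse pairing concrete, here is a minimal PyTorch-style sketch of the idea. The layer sizes, the continuous action vector, and the module names are our own illustrative choices, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class PokeModel(nn.Module):
    """Sketch of a joint forward/inverse dynamics model (hypothetical sizes)."""

    def __init__(self, feat_dim=256, action_dim=4):
        super().__init__()
        # Shared CNN encoder: image -> latent state vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, feat_dim), nn.ReLU(),
        )
        # Forward model: (state, action) -> predicted next state.
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + action_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        # Inverse model: (state, next state) -> action that caused the change.
        self.inverse_model = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, action_dim),
        )

    def forward(self, img_t, img_t1, action):
        s_t, s_t1 = self.encoder(img_t), self.encoder(img_t1)
        pred_next_state = self.forward_model(torch.cat([s_t, action], dim=1))
        pred_action = self.inverse_model(torch.cat([s_t, s_t1], dim=1))
        return pred_next_state, s_t1, pred_action
```

Training would minimize a forward loss (distance between pred_next_state and s_t1) plus an inverse loss on pred_action; the two objectives regularize each other, which is the intuition behind the system generalizing to unseen objects.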

One author (Ross) jumping in to have a look through a car’s eyes.

On our way to Agrawal’s talk we ran into Insight alum Michael Beaumier (Machine Learning Engineer, Mercedes-Benz), whose colleagues were showing off their latest object detection system. The key innovation they have been developing is the ability to detect small objects on the road. Combining scene segmentation (a CNN trained on ImageNet) with geometric information from stereo imaging, they built a Bayesian model that detects objects as small as 5 cm at a range of 100 m. Such work paves the way toward driverless cars that are both practical and safe.
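
We did not get implementation details, but here is a hypothetical sketch of one textbook way to fuse two such cues, a per-pixel CNN segmentation score and a stereo-derived geometric score, under a naive Bayes (conditional independence) assumption. The function, its inputs, and the prior value are illustrative only; this is not the Mercedes-Benz team’s actual model.

```python
import numpy as np

def fuse_obstacle_evidence(p_seg, p_stereo, prior=0.01):
    """Naive Bayes fusion of two per-pixel obstacle probabilities.

    p_seg:    P(obstacle | segmentation CNN), per pixel
    p_stereo: P(obstacle | stereo height-above-road evidence), per pixel
    prior:    prior probability that a pixel belongs to an obstacle

    Assumes the two cues are conditionally independent given the label
    and combines them in log-odds space.
    """
    def log_odds(p):
        p = np.clip(p, 1e-6, 1 - 1e-6)
        return np.log(p / (1 - p))

    prior_lo = log_odds(np.full_like(np.asarray(p_seg, dtype=float), prior))
    # Each cue contributes its log-odds shift relative to the prior.
    fused_lo = prior_lo + (log_odds(p_seg) - prior_lo) + (log_odds(p_stereo) - prior_lo)
    return 1.0 / (1.0 + np.exp(-fused_lo))

# Toy usage: a pixel the CNN is lukewarm about, but stereo strongly flags.
print(fuse_obstacle_evidence(np.array([0.3]), np.array([0.9])))
```

The appeal of this kind of fusion is that a weak visual cue (a small, distant object covers only a few pixels) can still yield a confident detection once independent geometric evidence agrees with it.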

Quick hits: Wrapping up the 2016 posters

For our Day 1 post, we had the chance to highlight three of the 170+ papers in that poster session. Over Days 2 and 3, another 390+ were presented, for a grand total of 567 awesome papers! As on Day 1, here are some quick highlights that caught our eye:

  • Attend, Infer, Repeat: Fast Scene Understanding with Generative Models presents an inspiring approach to understanding the scene in an image. Using Bayesian and variational inference, the authors construct models that infer the number, location, and type of objects in a picture without any supervision. We are intrigued because the models can reason about scenes outside of the training examples. They still require careful model specification, but nonetheless open up interesting avenues for exploration.
  • DeepMath — Deep Sequence Models for Premise Selection Deep learning is constantly breaking into new domains. A team from Google Research (including François Chollet and Geoffrey Irving) demonstrated the first example of using deep learning for automated theorem proving. This work will help accelerate system-correctness proofs and remove the need for expert-engineered features in a domain whose structure is similar to, yet distinct from, natural language. Their process of automatically choosing which premises are relevant to the current state of a reasoning process could extend to other domains, and it is a very exciting direction.
  • We love steady progress. Word embeddings helped transform many NLP tasks, and last year the Word Mover’s Distance (paper) provided a way to summarize the difference between documents using their embeddings. For supervised tasks (e.g., text classification) this can be taken one step further: the Supervised Word Mover’s Distance (paper) learns an affine transformation and word re-weighting that sharpen class separation, leading to efficient, state-of-the-art performance (a minimal sketch of the base distance follows this list).
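
As a refresher on the base distance, below is a minimal sketch of the plain (unsupervised) Word Mover’s Distance posed as an optimal-transport linear program, assuming you already have word embeddings and normalized word frequencies for each document. It uses scipy’s general-purpose linprog for clarity rather than a specialized solver, and it omits the supervised affine transform and re-weighting introduced in the new paper.

```python
import numpy as np
from scipy.optimize import linprog

def word_movers_distance(emb_a, weights_a, emb_b, weights_b):
    """Word Mover's Distance between two documents.

    emb_a:     (n, d) embeddings of the words in document A
    weights_a: (n,) normalized word frequencies of document A (sums to 1)
    emb_b, weights_b: the same for document B
    """
    n, m = len(weights_a), len(weights_b)
    # Transport cost: Euclidean distance between every pair of embeddings.
    cost = np.linalg.norm(emb_a[:, None, :] - emb_b[None, :, :], axis=-1)

    # Flow constraints: mass leaving each word of A equals its weight,
    # mass arriving at each word of B equals its weight.
    rows = []
    for i in range(n):
        t = np.zeros((n, m)); t[i, :] = 1.0
        rows.append(t.ravel())
    for j in range(m):
        t = np.zeros((n, m)); t[:, j] = 1.0
        rows.append(t.ravel())
    # Drop one redundant constraint (both weight vectors sum to 1).
    A_eq = np.array(rows)[:-1]
    b_eq = np.concatenate([weights_a, weights_b])[:-1]

    result = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                     bounds=(0, None), method="highs")
    return result.fun

# Toy usage with random vectors standing in for real word embeddings.
rng = np.random.default_rng(0)
emb_a, emb_b = rng.normal(size=(3, 50)), rng.normal(size=(4, 50))
print(word_movers_distance(emb_a, np.full(3, 1 / 3), emb_b, np.full(4, 1 / 4)))
```

The supervised variant keeps this transport problem but learns a linear map and per-word importance weights on top of it, so that documents from the same class end up closer together than documents from different classes.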
