Building and Working in Environments for Embodied AI

Introduction

In recent years, interest in embodied AI research has grown within the computer vision community. Multiple embodied AI workshops and challenges have taken place, including Generalizable Policy Learning in the Physical World at ICLR 2022, OCRTOC: Open Cloud Robot Table Organization Challenge at IROS 2020, Habitat: Embodied Agents Challenge and Workshop at CVPR 2019, and the Embodied AI Workshop at CVPR 2020 and 2021. Computer vision is now an essential module in embodied AI research, but the field still lacks a basic tutorial that helps researchers, especially those from vision and machine learning backgrounds, get started.

In particular, much of the impressive progress in embodied AI has been made in virtual environments, which are powered by recent advances in physical simulation and rendering technologies. These platforms allow the study of many vision-robotics problems that previously could not be studied at scale in the real world. Faster speed, easier parallelization, simpler data collection, and lower cost allow embodied AI research in simulation to build larger communities with diverse researcher backgrounds, improved code sharing, and standard benchmarks. However, virtual environments come with their own issues, such as the choice of simulation parameters and domain gaps, which are worth noting when building and using them.

Our tutorial aims to provide a getting-started guide for computer vision researchers who want to study vision problems on embodied agents in these environments, and to highlight common issues encountered when using them. The tutorial will focus on the principles shared across platforms and teach concepts using multiple simulation environments.

Syllabus

The course will cover the following units:

Overview of Embodied AI

Embodied AI involves a wide range of topics. Here we provide a broad overview of the embodied AI field, including the following components.

Simulators and environments
Robots and controllers
Datasets and assets
Tasks and approaches
Challenges

The Basic Frameworks and techniques for Embodied AI

The details of how the visual system connects to the control and actuation system are often unclear to researchers in the vision community. Here we introduce common frameworks for composing a system with both components.

Reinforcement learning: OpenAI Gym interface
Overview of motion planning and control
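
As an illustration of the Gym-style interface listed above, here is a minimal sketch of an agent-environment loop. It assumes the classic Gym convention in which reset() returns an observation and step() returns (observation, reward, done, info); newer Gym and Gymnasium releases use a slightly different return signature. ToyReachEnv, proportional_policy, and run_episode are hypothetical names invented for this example and are not part of any simulator covered in the tutorial.

```python
class ToyReachEnv:
    """Hypothetical 1-D reaching task that mimics the classic Gym reset/step interface."""

    def __init__(self, goal=1.0, max_steps=50, tolerance=0.05):
        self.goal = goal
        self.max_steps = max_steps
        self.tolerance = tolerance
        self.position = 0.0
        self.steps = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.position = 0.0
        self.steps = 0
        return self._observe()

    def step(self, action):
        # Clip the commanded displacement, as a simple stand-in for actuator limits.
        action = max(-0.1, min(0.1, float(action)))
        self.position += action
        self.steps += 1
        distance = abs(self.goal - self.position)
        reward = -distance  # dense reward: closer to the goal is better
        success = distance < self.tolerance
        done = success or self.steps >= self.max_steps
        return self._observe(), reward, done, {"success": success}

    def _observe(self):
        # The observation here is just (current position, goal position).
        return (self.position, self.goal)


def proportional_policy(observation):
    """Hypothetical controller: move toward the goal by the remaining distance."""
    position, goal = observation
    return goal - position


def run_episode(env, policy):
    """Roll out one episode and return the total reward and the final info dict."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    info = {}
    while not done:
        observation, reward, done, info = env.step(policy(observation))
        total_reward += reward
    return total_reward, info


if __name__ == "__main__":
    total_reward, info = run_episode(ToyReachEnv(), proportional_policy)
    print(total_reward, info)
```

A learned policy that maps camera images to actions would plug into the same loop in place of proportional_policy; only the observation type changes.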

When vision researchers use embodied AI environments, they also need a basic knowledge of how the simulator works. This allows them to understand the capabilities and limitations of the simulation, so that they can make full use of these environments and ensure the simulation is correct. We provide a summary of the key parameters and guidance on how to debug issues independently.

A glance at rigid-body simulation
Visual sensor simulation (RGB and Depth)
Simulatable 3D asset representation and construction
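
To make the idea of a "key parameter" concrete, the sketch below shows how the simulation timestep alone changes the result of a rigid-body integration. It is a toy example under stated assumptions: a single body dropped from rest, integrated with semi-implicit Euler steps, with no contacts or joints. Real simulators use more elaborate integrators and contact solvers, and the function names here are hypothetical.

```python
def simulate_free_fall(duration, dt, gravity=-9.81):
    """Integrate a body dropped from rest using semi-implicit Euler steps."""
    height = 0.0
    velocity = 0.0
    num_steps = round(duration / dt)
    for _ in range(num_steps):
        velocity += gravity * dt  # update velocity first
        height += velocity * dt   # then update position with the new velocity
    return height


def analytic_free_fall(duration, gravity=-9.81):
    """Closed-form displacement of a body dropped from rest."""
    return 0.5 * gravity * duration ** 2


def timestep_error(duration, dt):
    """Absolute gap between the simulated and the closed-form displacement."""
    return abs(simulate_free_fall(duration, dt) - analytic_free_fall(duration))


if __name__ == "__main__":
    for dt in (0.1, 0.01, 0.001):
        print(dt, timestep_error(1.0, dt))
```

In this toy setting the error shrinks roughly in proportion to the timestep, while the number of steps, and therefore the compute cost, grows in inverse proportion. The same trade-off between accuracy and speed appears when choosing a timestep in a full simulator.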

Design Choices in Modern Embodied AI Environments

Building an embodied AI environment requires much more than knowing the underlying simulation technologies. To study vision problems under useful setups and at the proper level of abstraction, we introduce the common design choices. We will also explain the choices made in common embodied AI challenges so that the audience can quickly start working on them.

Overview of Design Dimensions: Embodiment (Sensor, Actuator), Task Specification, Metric
Case Study: Habitat Challenge (based on Habitat), Rearrangement Challenge (based on AI2THOR), ManiSkill Challenge (based on SAPIEN)
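
As one example of the "Metric" design dimension, the sketch below computes SPL (Success weighted by Path Length), a metric commonly reported for navigation tasks. It is included only as an illustration; the text above does not say which metrics the tutorial covers, and the function and argument names are assumptions made for this example. For each episode, SPL weights the binary success flag by the ratio of the shortest path length to the larger of the shortest path length and the path the agent actually took, then averages over episodes.

```python
def success_weighted_path_length(successes, shortest_lengths, path_lengths):
    """Average of success * shortest / max(taken, shortest) over episodes."""
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have one entry per episode")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")
    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest path lengths must be positive")
        # An efficient successful episode scores 1; detours lower the score;
        # a failed episode scores 0 regardless of path length.
        total += float(success) * shortest / max(taken, shortest)
    return total / len(successes)


if __name__ == "__main__":
    # Three episodes: optimal success, success with a 2x detour, failure.
    print(success_weighted_path_length([1, 1, 0], [10.0, 10.0, 10.0], [10.0, 20.0, 10.0]))
```

A plain success rate would score the first two episodes identically; a path-weighted metric like this one also rewards efficiency, which is the kind of trade-off a metric choice encodes.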

Experiences and Practices to Debug Simulators

Virtual environments are not perfect, and vision researchers who are new to them often have difficulty using them correctly. Our team has gained extensive experience from the feedback of the user community of SAPIEN, a widely used simulator that supports the ManiSkill Embodied AI Challenge. We will share this experience.

Common issues in simulators
Causes of the issues related to simulation techniques
Tips and tricks to tackle these issues

Real World Robotics and Sim2Real

Sim2Real transfer is one of the questions most frequently asked by users of simulation environments. In this section, we demonstrate through case studies how sim2real domain gaps arise in vision and robot control, and share our experience in deploying policies trained in simulation to the real world.
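
One widely used way to reduce sensitivity to such domain gaps is domain randomization: simulation parameters are resampled for every training episode so that a policy does not overfit to a single configuration. The sketch below is an illustration of that general idea, not a description of the method presented in this section. The parameter names, the ranges, and the EpisodeParams class are all hypothetical, and applying the sampled values to an actual simulator is left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeParams:
    """Hypothetical per-episode simulation settings."""
    friction: float
    mass_scale: float
    light_intensity: float
    depth_noise_std: float


# Illustrative (low, high) ranges; real ranges depend on the robot and the scene.
PARAM_RANGES = {
    "friction": (0.5, 1.5),
    "mass_scale": (0.8, 1.2),
    "light_intensity": (0.3, 1.0),
    "depth_noise_std": (0.0, 0.01),
}


def sample_episode_params(rng):
    """Draw one set of parameters uniformly from the ranges above."""
    values = {name: rng.uniform(low, high) for name, (low, high) in PARAM_RANGES.items()}
    return EpisodeParams(**values)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that runs are reproducible
    for episode in range(3):
        print(episode, sample_episode_params(rng))
```

Physical parameters such as friction and mass target the control gap, while lighting and sensor noise target the visual gap; both kinds can be sampled in the same way.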

Embodied AI Tasks in ManiSkill and Visual Learning Challenges

We will summarize our findings from hosting the ManiSkill challenge, including:

A comparison of imitation learning, reinforcement learning, and classic robotics
A summary of performance by tasks and their key challenges
An analysis of generalization performance

Material

Section	Slides	Video
Overview of Embodied AI	PDF Google Slides	YouTube
The Basic Frameworks and techniques for Embodied AI	PDF Google Slides	YouTube
Design Choices in Embodied AI Environments	PDF Google Slides	YouTube
Experience and Practices to Debug Simulators	PDF Google Slides	YouTube
Real World Robotics and Sim2Real	PDF Google Slides	YouTube
Embodied AI Tasks in ManiSkill and Visual Learning Challenges	PDF Google Slides	YouTube

Code: https://github.com/haosulab/cvpr-tutorial-2022

Schedule

Start	End	Section	Speaker
13:00	13:45	Overview of Embodied AI	Zhiwei Jia (video)
13:45	14:30	The Basic Frameworks and techniques for Embodied AI	Fanbo Xiang (in person)
14:30	15:15	Design Choices in Embodied AI Environments	Jiayuan Gu (video)
15:15	15:30	Break
15:30	16:15	Experience and Practices to Debug Simulators	Fanbo Xiang (in person)
16:15	16:35	Real World Robotics and Sim2Real	Rui Chen (video)
16:35	17:00	Embodied AI Tasks in ManiSkill and Visual Learning Challenges	Fanbo Xiang (in person)