Building and Working in Environments for Embodied AI

A CVPR 2022 Tutorial, June 20


In recent years, there has been growing interest in embodied AI research within computer vision. Multiple embodied AI workshops and challenges have taken place in the research community, including Generalizable Policy Learning in the Physical World at ICLR 2022, OCRTOC: Open Cloud Robot Table Organization Challenge at IROS 2020, the Habitat: Embodied Agents Challenge and Workshop at CVPR 2019, and the Embodied AI Workshop at CVPR 2020 and 2021. Computer vision is now an essential module in embodied AI research, but the field still lacks a basic tutorial to help researchers, especially those from vision and machine learning backgrounds, get started.

In particular, much of the impressive progress in embodied AI has been made in virtual environments, powered by the latest advances in physical simulation and rendering technologies. These platforms allow the study of many vision-robotics problems that previously could not be studied at scale in the real world. Faster execution, easier parallelization, simpler data collection, and lower cost allow embodied AI research in simulation to build larger communities with diverse researcher backgrounds, improved code sharing, and standard benchmarks. However, virtual environments come with their own issues, such as simulation parameters and domain gaps, which are worth noting when building and using them.

Our tutorial aims to provide a getting-started guide for computer vision researchers to study vision problems on embodied agents in these environments, and to highlight common issues encountered when using them. The tutorial focuses on principles shared across platforms and teaches concepts using multiple simulation environments.


The course will cover the following units:

The Basic Frameworks for Embodied AI

How the visual system and the control and actuation system are connected is often unclear to researchers in the vision community. Here we introduce common frameworks for composing a system from both components.
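As a sketch of how these components typically connect, consider the observation-action loop common to most embodied AI frameworks: the simulator renders observations for a policy, and the policy's actions drive the simulated actuators. The environment and policy below are toy stand-ins, not the API of any particular platform.

```python
# Minimal sketch of the perception-control loop in embodied AI frameworks.
# ToyEnv and policy are hypothetical stand-ins; real platforms expose
# similar but much richer interfaces.

class ToyEnv:
    """Simulates physics and renders observations for the agent."""

    def __init__(self):
        self.agent_pos = 0.0
        self.goal_pos = 5.0

    def reset(self):
        self.agent_pos = 0.0
        return self._render()

    def step(self, action):
        # Actuation: apply the control command to the simulated body.
        self.agent_pos += action
        obs = self._render()  # vision: observe the new state
        done = abs(self.agent_pos - self.goal_pos) < 0.5
        reward = 1.0 if done else 0.0
        return obs, reward, done

    def _render(self):
        # A real simulator would return camera images here; we return a
        # low-dimensional observation for brevity.
        return {"agent_to_goal": self.goal_pos - self.agent_pos}


def policy(obs):
    """Maps observations to actions; in practice a learned vision model."""
    return 1.0 if obs["agent_to_goal"] > 0 else -1.0


env = ToyEnv()
obs = env.reset()
done = False
for _ in range(100):
    obs, reward, done = env.step(policy(obs))
    if done:
        break
```

The key structural point is the interface boundary: the vision side only sees what `_render` returns, and the control side only receives what the policy emits, so the two systems can be developed and swapped independently.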

Techniques Behind Embodied AI Environments and Sources of Domain Gaps

When vision researchers use embodied AI environments, they need a basic understanding of how the simulator works. This lets them grasp the capabilities and limitations of the simulation, so that they can leverage the full power of these environments and ensure correct simulation. We provide a summary of the key parameters, along with guidance on how to debug issues independently.
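One concrete example of such a parameter is the integration timestep: an otherwise well-posed physical system can blow up if the step is too coarse. The toy spring below uses explicit (forward) Euler integration; real engines use more robust integrators, so the exact thresholds differ, but the sensitivity to the timestep is representative.

```python
import math

# Toy illustration of why simulation parameters matter: forward Euler
# integration of an undamped spring (x'' = -k * x) gains energy every
# step, and diverges quickly when the timestep is too large for the
# spring stiffness.

def simulate_spring(dt, total_time=1.0, k=100.0):
    omega = math.sqrt(k)
    x, v = 1.0, 0.0
    for _ in range(int(total_time / dt)):
        # Forward Euler: both updates use the state from the previous step.
        x, v = x + v * dt, v - k * x * dt
    # Oscillation amplitude; stays near 1.0 for a well-behaved run.
    return math.sqrt(x * x + (v / omega) ** 2)

amp_small = simulate_spring(dt=0.001)  # fine timestep: amplitude ~1
amp_large = simulate_spring(dt=0.1)    # coarse timestep: energy explodes
```

A stiff contact or joint in a real simulator plays the role of the large `k` here, which is why contact-rich manipulation scenes often need smaller timesteps (or more solver iterations) than free-space navigation.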

Design Choices in Modern Embodied AI Environments

Building an embodied AI environment requires much more than knowing the underlying simulation technologies. To study vision problems under useful setups and at the proper abstraction level, we introduce the common design choices. We also explain the choices made in popular embodied AI challenges so that the audience can quickly start working on them.
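To make "design choices" concrete, the same manipulation task can be exposed at very different abstraction levels: low-level torque control with raw images versus high-level end-effector control with richer observations. The sketch below names a few common axes of choice; the option names are illustrative, not drawn from any specific benchmark.

```python
from dataclasses import dataclass

# Illustrative (hypothetical) axes of design choice when defining an
# embodied AI environment; real benchmarks pick one point per axis.

ACTION_SPACES = ("joint_torque", "joint_position", "delta_end_effector_pose")
OBSERVATION_MODES = ("rgb", "rgbd", "point_cloud", "ground_truth_state")
REWARD_MODES = ("sparse", "dense")

@dataclass
class EnvDesign:
    action_space: str
    observation_mode: str
    reward_mode: str
    max_episode_steps: int = 200

    def __post_init__(self):
        # Validate that each choice comes from a known axis.
        assert self.action_space in ACTION_SPACES
        assert self.observation_mode in OBSERVATION_MODES
        assert self.reward_mode in REWARD_MODES

# A low-level, vision-driven variant: realistic sensing, hard exploration.
low_level = EnvDesign("joint_torque", "rgbd", "sparse")

# A higher-level variant that abstracts away low-level control, making
# the task more approachable for vision-focused research.
high_level = EnvDesign("delta_end_effector_pose", "point_cloud", "dense")
```

The trade-off is the usual one: lower-level choices are closer to real robot deployment, while higher-level choices isolate the vision problem and make learning tractable.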

Experiences and Practices to Debug Simulators

Virtual environments are not perfect, and vision researchers new to them often struggle to use them correctly. Our team has accumulated rich experience from feedback from the user community of SAPIEN (a widely used simulator that supports the ManiSkill Embodied AI Challenge). We will share these experiences.
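One widely applicable practice of this kind is a determinism check: reset with the same seed twice, replay the same actions, and verify the trajectories match exactly. Divergence often reveals unseeded randomness or state leaking across resets. The environment below is a hypothetical stand-in for a real simulator.

```python
import random

# Generic sanity check for debugging simulators: with a fixed seed and a
# fixed action sequence, two rollouts should produce identical states.
# NoisyEnv is a toy stand-in; the same check applies to any simulator.

class NoisyEnv:
    def reset(self, seed):
        # Seeding here is what makes rollouts reproducible; forgetting it
        # is a common source of non-determinism bugs.
        self.rng = random.Random(seed)
        self.state = 0.0
        return self.state

    def step(self, action):
        # Simulated physics/sensor noise drawn from the seeded generator.
        self.state += action + self.rng.gauss(0.0, 0.01)
        return self.state

def rollout(env, seed, actions):
    env.reset(seed)
    return [env.step(a) for a in actions]

env = NoisyEnv()
actions = [0.1] * 50
traj_a = rollout(env, seed=42, actions=actions)
traj_b = rollout(env, seed=42, actions=actions)
deterministic = (traj_a == traj_b)  # True when seeding is done correctly
```

When this check fails on a real platform, the usual suspects are global RNGs shared across environments, GPU non-determinism, or cached simulation state that survives `reset`.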

Embodied AI Tasks and Visual Learning Challenges

We will introduce the major tasks in embodied AI and offer a perspective on the interesting and challenging vision problems these tasks pose.


(coming soon)

Tentative Schedule

Start End Section
13:00 13:45 The Basic Frameworks for Embodied AI
13:45 14:30 Techniques Behind Embodied AI Environments
14:30 15:15 Design Choices in Embodied AI Environments
15:15 15:30 Break
15:30 16:15 Experiences and Practices to Debug Simulators
16:15 17:00 Embodied AI Tasks and Visual Learning Challenges

Organizers and Speakers

listed alphabetically

© 2022 Building and Working in Environments for Embodied AI