In recent years, there has been growing interest in embodied AI research within computer vision. Multiple embodied AI workshops and challenges have taken place in the research community, including Generalizable Policy Learning in the Physical World at ICLR 2022, OCRTOC: Open Cloud Robot Table Organization Challenge at IROS 2020, Habitat: Embodied Agents Challenge and Workshop at CVPR 2019, and the Embodied AI Workshop at CVPR 2020 and 2021. Computer vision is now an essential module in embodied AI research, but the field still lacks a basic tutorial to guide researchers, especially those from vision and machine learning backgrounds, in getting started.
In particular, much of the impressive progress in embodied AI has been made in virtual environments, powered by the latest advances in physical simulation and rendering technologies. These platforms allow researchers to study many vision-robotics problems that previously could not be studied at scale in the real world. Their faster speed, easier parallelization, simpler data collection, and lower cost allow embodied AI research in simulation to build larger communities with diverse researcher backgrounds, improved code sharing, and standard benchmarks. However, virtual environments come with their own issues, such as simulation parameter tuning and domain gaps, which are worth noting when building and using them.
Our tutorial aims to provide a getting-started guide for computer vision researchers who want to study vision problems on embodied agents in these environments, and to highlight common issues encountered when using them. The tutorial focuses on the principles shared across platforms and teaches concepts using multiple simulation environments.
The course will cover the following units:
Embodied AI involves a wide range of topics. Here we provide a broad overview of the embodied AI field, including the following components.
The details of how the visual system and the control and actuation system are connected are often unclear to researchers in the vision community. Here we introduce common frameworks for composing a system with both components.
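The connection between the two systems usually takes the form of a perception-action loop: the simulator renders an observation, a policy maps it to a low-level action, and the action is fed back to the simulator. Below is a minimal sketch of that loop; the `ToyEnv` and `DummyPolicy` classes are placeholders we invented for illustration, not the API of any specific platform (real platforms such as Habitat or ManiSkill differ in names but share this shape).

```python
class ToyEnv:
    """Stand-in for a simulator: returns fake images, terminates after 10 steps."""
    def reset(self):
        self.t = 0
        return [[0.0] * 4] * 4            # fake 4x4 "image" observation

    def step(self, action):
        self.t += 1
        done = self.t >= 10
        return [[0.0] * 4] * 4, 1.0, done  # (observation, reward, done)

class DummyPolicy:
    """Maps a visual observation to a low-level action (placeholder)."""
    def act(self, observation):
        # A real policy would be, e.g., a CNN; here we return a zero action.
        return [0.0] * 7                  # e.g., 7-DoF joint velocities

def run_episode(env, policy, max_steps=100):
    """The standard perception-action loop shared across platforms."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy.act(obs)          # vision -> control connection point
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

The key design point is the interface boundary: the policy sees only observations and emits only actions, so the vision module and the control stack can be developed and swapped independently.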
When vision researchers use embodied AI environments, they also need a basic understanding of how the simulator works. This allows them to understand the capabilities and limitations of the simulation, so that they can leverage the full power of these environments and ensure correct simulation results. We provide a summary of the key parameters and guidance on how to debug issues independently.
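One such key parameter is the simulation timestep. To make its effect concrete, here is a small self-contained demo (not tied to any particular simulator) that integrates a spring-mass system with semi-implicit Euler, the scheme underlying many physics engines: a small timestep keeps the motion bounded, while a timestep beyond the stability limit makes the state blow up.

```python
def simulate_spring(dt, steps, k=100.0, m=1.0):
    """Integrate a unit spring-mass system with semi-implicit Euler.

    Returns |x| after `steps` steps, starting from x=1, v=0.
    For a harmonic oscillator this scheme is stable only when
    dt < 2 / omega, where omega = sqrt(k / m) (here omega = 10).
    """
    x, v = 1.0, 0.0
    for _ in range(steps):
        v += (-k / m) * x * dt   # update velocity first (semi-implicit)
        x += v * dt              # then position, using the new velocity
    return abs(x)

# dt = 0.01 is well inside the stability limit: the amplitude stays near 1.
# dt = 0.3 exceeds 2 / omega = 0.2: the solution diverges within a few steps.
```

The same reasoning explains why shrinking the timestep (or raising solver iteration counts) often fixes jittering or exploding contacts in practice, at the cost of slower simulation.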
Building an embodied AI environment involves much more than knowing the underlying simulation technologies. To study vision problems under useful setups and at the proper abstraction level, we introduce the common design choices. We also explain the choices made in common embodied AI challenges so that the audience can quickly start working on them.
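A recurring design choice is the observation mode: ground-truth state observations are fast to train on, while visual (e.g., RGB-D) observations are closer to what a real robot perceives. The sketch below shows how an environment might expose both behind one gym-style interface; `PickCubeEnv` and all its fields are illustrative names we made up, not any challenge's actual API.

```python
import random

class PickCubeEnv:
    """Illustrative task environment with switchable observation modes."""

    def __init__(self, obs_mode="state"):
        assert obs_mode in ("state", "rgbd")   # two common abstraction levels
        self.obs_mode = obs_mode

    def _get_obs(self):
        if self.obs_mode == "state":
            # Low-level ground-truth state: easy to learn from, unrealistic.
            return {"cube_pos": self.cube_pos, "gripper_pos": self.gripper_pos}
        # Visual observation: closer to a real robot's sensors (placeholder arrays).
        return {"rgb": [[0] * 8] * 8, "depth": [[0.0] * 8] * 8}

    def reset(self):
        # Randomize the initial layout so policies cannot memorize one scene.
        self.cube_pos = [random.uniform(-0.1, 0.1), 0.0, 0.02]
        self.gripper_pos = [0.0, 0.0, 0.3]
        return self._get_obs()

    def step(self, action):
        # A real environment would integrate the action in the simulator and
        # compute a dense (e.g., distance-based) or sparse reward here.
        reward, done = 0.0, False
        return self._get_obs(), reward, done, {}
```

Keeping both modes behind one interface lets researchers prototype on state observations and then switch to visual observations without changing the training loop.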
Virtual environments are not perfect, and vision researchers new to them often struggle to use them correctly. Our team has gained rich experience from the feedback of the SAPIEN user community (SAPIEN is a widely used simulator that powers the ManiSkill Embodied AI Challenge). We will share these experiences.
Sim2real transfer is one of the most common questions raised by users of simulation environments. In this section, we demonstrate through case studies how sim2real domain gaps arise in vision and robot control, and share our experience deploying policies trained in simulation to the real world.
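One widely used technique for narrowing such gaps is domain randomization: sampling physical and visual parameters at every episode so a policy cannot overfit to a single simulator configuration. The sketch below illustrates the idea; the parameter names and ranges are assumptions chosen for illustration, not values from any specific system or paper.

```python
import random

def randomized_sim_params(rng=random):
    """Sample one illustrative set of simulator parameters per episode.

    Physical randomization (friction, mass) targets control-side gaps;
    visual randomization (lighting, camera pose) targets perception-side gaps.
    """
    return {
        "friction": rng.uniform(0.5, 1.2),          # contact friction coefficient
        "mass_scale": rng.uniform(0.8, 1.2),        # scale applied to object masses
        "light_intensity": rng.uniform(0.3, 1.5),   # rendering-side randomization
        "camera_jitter_deg": rng.uniform(-2.0, 2.0) # small camera pose perturbation
    }

# At each env.reset(), a training loop would apply a fresh sample of these
# parameters to the scene before rolling out the policy.
```

The ranges themselves are a design decision: too narrow and the policy still overfits to the simulator; too wide and training becomes unnecessarily hard.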
Goal: we will summarize our findings from hosting the ManiSkill challenge.
| Talk | Slides | Video |
| --- | --- | --- |
| Overview of Embodied AI | PDF / Google Slides | YouTube |
| The Basic Frameworks and Techniques for Embodied AI | PDF / Google Slides | YouTube |
| Design Choices in Embodied AI Environments | PDF / Google Slides | YouTube |
| Experience and Practices to Debug Simulators | PDF / Google Slides | YouTube |
| Real World Robotics and Sim2Real | PDF / Google Slides | YouTube |
| Embodied AI Tasks in ManiSkill and Visual Learning Challenges | PDF / Google Slides | YouTube |
| Start | End | Talk | Speaker |
| --- | --- | --- | --- |
| 13:00 | 13:45 | Overview of Embodied AI | Zhiwei Jia (video) |
| 13:45 | 14:30 | The Basic Frameworks and Techniques for Embodied AI | Fanbo Xiang (in person) |
| 14:30 | 15:15 | Design Choices in Embodied AI Environments | Jiayuan Gu (video) |
| 15:30 | 16:15 | Experience and Practices to Debug Simulators | Fanbo Xiang (in person) |
| 16:15 | 16:35 | Real World Robotics and Sim2Real | Rui Chen (video) |
| 16:35 | 17:00 | Embodied AI Tasks in ManiSkill and Visual Learning Challenges | Fanbo Xiang (in person) |