In recent years, there has been growing interest in embodied AI research within computer vision. Multiple embodied AI workshops and challenges have taken place in the research community, including Generalizable Policy Learning in the Physical World at ICLR 2022, OCRTOC: Open Cloud Robot Table Organization Challenge at IROS 2020, Habitat: Embodied Agents Challenge and Workshop at CVPR 2019, and the Embodied AI Workshop at CVPR 2020 and 2021. Computer vision is now an essential module in embodied AI research, but we are still missing a basic tutorial to guide researchers, especially those from vision and machine learning backgrounds, in getting started in this field.
In particular, much of the impressive progress in embodied AI has been made in virtual environments, which are powered by the latest advances in physical simulation and rendering technologies. These platforms allow the study of many vision-robotics problems that previously could not be studied at scale in the real world. Faster speed, easier parallelization, simpler data collection, and lower cost allow embodied AI research in simulation to build larger communities with diverse researcher backgrounds, improved code sharing, and standard benchmarks. However, virtual environments do come with their own issues, such as simulation parameters and domain gaps, which are worth noting when building and using them.
Our tutorial aims to provide a getting-started guide for computer vision researchers studying vision problems on embodied agents in these environments, and to highlight common issues encountered when using them. The tutorial will focus on principles shared across platforms and teach concepts using multiple simulation environments.
The course will cover the following units:
The details of how the visual system and the control and actuation system are connected are often unclear to researchers in the vision community. Here we introduce common frameworks for composing a system with both components.
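The composition described above can be illustrated with a minimal perception-action loop. This is a toy sketch, not any particular platform's API: `ToyEnv`, `vision_module`, and `policy` are all hypothetical stand-ins for a simulated environment, a perception network, and a controller.

```python
import numpy as np

np.random.seed(0)  # make the noisy observations reproducible

class ToyEnv:
    """Hypothetical stand-in for a simulated embodied-AI environment."""
    def __init__(self):
        self.state = np.zeros(2)  # e.g., an object position on a table

    def render(self):
        # Real environments return camera images; here we return the state
        # plus noise to mimic an imperfect visual observation.
        return self.state + np.random.normal(0.0, 0.01, size=2)

    def step(self, action):
        # Trivial "actuation": the action directly displaces the state.
        self.state = self.state + action
        return self.render()

def vision_module(image):
    # Placeholder for a perception network: pixels -> state estimate.
    return image

def policy(state_estimate, goal):
    # Placeholder controller: move a fraction of the way toward the goal.
    return 0.5 * (goal - state_estimate)

env = ToyEnv()
goal = np.array([1.0, 1.0])
obs = env.render()
for _ in range(20):
    estimate = vision_module(obs)   # perception: observation -> state
    action = policy(estimate, goal) # control: state -> action
    obs = env.step(action)          # actuation: action -> new observation
```

The point of the sketch is the interface: the visual system and the control system communicate only through the state estimate, which is the boundary most platforms expose in one form or another.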
When vision researchers use embodied AI environments, they need a basic knowledge of how the simulator works. This allows them to understand the capabilities and limitations of the simulation, so that they can leverage the full capabilities of these environments and ensure correct simulation. We provide a summary of the key parameters, and guidance on how to debug issues independently.
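One such key parameter is the simulation timestep. The toy example below (plain explicit Euler integration, not any specific simulator's API) shows why: integrating a stiff spring with a small timestep stays well-behaved, while a timestep too coarse for the system's stiffness makes the trajectory blow up, a failure mode that in a real simulator shows up as objects jittering or exploding.

```python
def simulate_spring(dt, steps=1000, k=100.0, m=1.0):
    """Explicit-Euler integration of x'' = -(k/m) x, starting at x = 1.

    Returns the peak |x| over the run; a stable integration keeps it
    near the initial amplitude, an unstable one lets it grow without bound.
    """
    x, v = 1.0, 0.0
    peak = abs(x)
    for _ in range(steps):
        a = -(k / m) * x          # spring acceleration
        x, v = x + dt * v, v + dt * a
        peak = max(peak, abs(x))
    return peak

small = simulate_spring(dt=0.001)  # fine resolution: amplitude stays ~1
large = simulate_spring(dt=0.05)   # too coarse: the trajectory diverges
```

Real simulators expose this same trade-off through their timestep and solver-iteration settings; halving the timestep roughly doubles the cost per simulated second, so users must balance stability against speed.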
Building an embodied AI environment is much more than knowing the underlying simulation technologies. To study vision problems under useful setups and at the proper abstraction level, we introduce the common design choices. We will also explain the choices made in common embodied AI challenges so that the audience can quickly start working on them.
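These design choices often surface as environment configuration options. The sketch below is a hypothetical example (the class and option names are ours, not any benchmark's) of the axes a task designer typically fixes: what the agent observes, at what level it acts, and how it is rewarded.

```python
from dataclasses import dataclass

@dataclass
class EnvConfig:
    """Hypothetical configuration capturing common environment design choices."""
    # Observation abstraction: ground-truth state vs. raw sensing.
    obs_mode: str = "rgbd"        # e.g. "state", "rgbd", "pointcloud"
    # Action abstraction: low-level torques vs. higher-level commands.
    control_mode: str = "joint_position"  # e.g. "joint_torque", "ee_delta_pose"
    # Reward abstraction: shaped feedback vs. success-only signal.
    reward_mode: str = "dense"    # e.g. "dense", "sparse"

cfg = EnvConfig(obs_mode="pointcloud", reward_mode="sparse")
```

Each axis trades realism against learnability: for example, state observations and dense rewards make a task far easier to learn but move it further from deployable perception.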
Virtual environments are not perfect, and vision researchers new to them often struggle to use them correctly. Our team has gained extensive experience from feedback from the user community of SAPIEN (a simulator used by many groups, which powers the ManiSkill Embodied AI Challenge). We will share these experiences.
Goal: We will introduce major tasks in embodied AI and provide perspective on some interesting and challenging vision problems offered by these tasks.
| Start | End | Topic |
|-------|-----|-------|
| 13:00 | 13:45 | The Basic Frameworks and Embodied AI |
| 13:45 | 14:30 | Techniques Behind Embodied AI Environments |
| 14:30 | 15:15 | Design Choices in Embodied AI Environments |
| 15:30 | 16:15 | Experience and Practices to Debug Simulators |
| 16:15 | 17:00 | Embodied AI Tasks and Visual Learning Challenges |