Software & Data Downloads — AVLEN

Audio-Visual-Language Embodied Navigation in 3D Environments.

Recent years have seen embodied visual navigation advance in two distinct directions: (i) equipping the AI agent to follow natural language instructions, and (ii) making the navigable world multimodal, e.g., audio-visual navigation. However, the real world is not only multimodal but also often complex; in spite of these advances, agents still need to understand the uncertainty in their actions and seek instructions to navigate. To this end, we present AVLEN -- an interactive agent for Audio-Visual-Language Embodied Navigation. As in audio-visual navigation tasks, the goal of our embodied agent is to localize an audio event by navigating the 3D visual world; however, the agent may also seek help from a human (oracle), who provides assistance in free-form natural language. To realize these abilities, AVLEN uses a multimodal hierarchical reinforcement learning backbone that learns: (a) high-level policies that choose between navigating from audio cues and querying the oracle, and (b) lower-level policies that select navigation actions based on audio-visual and language inputs. The policies are trained by rewarding success on the navigation task while minimizing the number of queries to the oracle.
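The two-level control flow and the query-penalized reward described above can be sketched as follows. This is a minimal illustration only, not the released AVLEN implementation: every name here (`Observation`, `high_level_policy`, `low_level_policy`, `episode_reward`), the confidence threshold, the keyword matching, and the reward constants are assumptions made for the example. In AVLEN both levels are learned with reinforcement learning rather than hand-coded.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical option and action names, chosen for illustration only.
AUDIO_GOAL, QUERY_ORACLE = "audio_goal", "query_oracle"
ACTIONS = ["move_forward", "turn_left", "turn_right", "stop"]


@dataclass
class Observation:
    audio_confidence: float            # assumed proxy for audio-cue reliability, in [0, 1]
    instruction: Optional[str] = None  # free-form oracle instruction, if one was given


def high_level_policy(obs: Observation, threshold: float = 0.5) -> str:
    """Choose between following audio cues and querying the oracle.

    A fixed threshold stands in for the learned high-level policy.
    """
    return AUDIO_GOAL if obs.audio_confidence >= threshold else QUERY_ORACLE


def low_level_policy(option: str, obs: Observation) -> str:
    """Select a navigation action for the chosen option.

    Keyword matching stands in for learned language grounding.
    """
    if option == QUERY_ORACLE and obs.instruction:
        text = obs.instruction.lower()
        for action in ACTIONS:
            if action.replace("_", " ") in text:
                return action
    return "move_forward"  # placeholder for an audio-visual goal-seeking step


def episode_reward(success: bool, num_queries: int,
                   success_reward: float = 1.0, query_penalty: float = 0.1) -> float:
    """Reward success on the task while penalizing each oracle query (assumed linear form)."""
    return success_reward * float(success) - query_penalty * num_queries


if __name__ == "__main__":
    obs = Observation(audio_confidence=0.2, instruction="Turn left at the door")
    option = high_level_policy(obs)
    print(option, low_level_policy(option, obs), episode_reward(True, 1))
```

The linear penalty is one simple way to express the trade-off between task success and oracle queries; the actual reward shaping is defined in the paper and the released code.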

In this release, we are making public our pretrained models and auxiliary data that are useful for training and evaluating our PyTorch-based AVLEN implementation.

  •  Paul, S., Roy Chowdhury, A.K., Cherian, A., "AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments", Advances in Neural Information Processing Systems (NeurIPS), October 2022, pp. 6236-6249.
    @inproceedings{Paul2022oct2,
      author = {Paul, Sudipta and Roy Chowdhury, Amit K and Cherian, Anoop},
      title = {AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      year = 2022,
      pages = {6236--6249},
      month = oct,
      url = {https://www.merl.com/publications/TR2022-131}
    }

Access software at https://github.com/merlresearch/avlen.

Access data at https://doi.org/10.5281/zenodo.7871764.