LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior

LLawCo enables embodied agents to autonomously align their thoughts with both their partners and task objectives.

MERL Researchers: Anoop Cherian.
Joint work with:
Qinhong Zhou (University of Massachusetts Amherst),
Chuang Gan (University of Massachusetts Amherst)

Embodied agents operating in decentralized and partially observable environments must jointly reason, act, and communicate with their partners. However, existing LLM-based agents often exhibit behaviors that are misaligned with their partners or inconsistent with the environment state, leading to inefficient cooperation and poor task success.

Inspired by Isaac Asimov's Three Laws of Robotics, we propose to use high-level laws for guiding robot behavior. However, instead of using a fixed set of laws, our laws are derived from task-specific experience and performance. To this end, we introduce LLawCo, a law learning framework that enables embodied agents to autonomously align their thoughts with both their partners and task objectives. Unlike manually specified universal rules, LLawCo learns task- and partner-specific cooperation laws from agents' own interaction failures. It reflects on past failures, extracts recurring behavioral failure patterns, and summarizes them into high-level cooperation laws, such as "talk when necessary", "plan before acting", and "confirm completion". These laws are then incorporated into the agents' reasoning traces through supervised fine-tuning.

LLawCo: Learning Laws from Failures

LLawCo is built around a simple idea: failed interactions reveal what agents should avoid, while successful law-aligned interactions reveal what agents should learn.

Given multi-agent training episodes, LLawCo first separates successful and failed traces according to task completion. Failed traces are analyzed to identify recurring failure reasons, such as poor coordination, ambiguous communication, redundant exploration, invalid actions, or premature task completion. These failure reasons are then summarized into a compact set of high-level behavioral laws.
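The first stage can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `Episode` structure, the pre-attached `failure_reasons` labels, and the hand-written `REASON_TO_LAW` table are all hypothetical stand-ins, since in LLawCo the failure reasons and the laws come out of reflection over the traces rather than a fixed lookup.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One multi-agent training episode (hypothetical structure)."""
    episode_id: str
    success: bool
    # Failure reasons attached by a reflection step; empty for successes.
    failure_reasons: list[str] = field(default_factory=list)


# Hypothetical, hand-written mapping from a failure reason to a law.
# It stands in for the summarization step, which is not a lookup table.
REASON_TO_LAW = {
    "ambiguous communication": "talk when necessary",
    "poor coordination": "plan before acting",
    "redundant exploration": "plan before acting",
    "premature task completion": "confirm completion",
}


def split_traces(episodes):
    """Separate episodes into (successful, failed) by task completion."""
    successful = [e for e in episodes if e.success]
    failed = [e for e in episodes if not e.success]
    return successful, failed


def induce_laws(failed, min_count=2):
    """Keep laws supported by at least `min_count` failed episodes."""
    counts = Counter()
    for episode in failed:
        # Count each law at most once per episode.
        laws_here = {
            REASON_TO_LAW[r] for r in episode.failure_reasons if r in REASON_TO_LAW
        }
        counts.update(laws_here)
    recurring = [law for law, n in counts.items() if n >= min_count]
    # Most frequent first; ties broken alphabetically for determinism.
    return sorted(recurring, key=lambda law: (-counts[law], law))
```

The `min_count` threshold is an assumed way of expressing "recurring": a failure pattern seen in a single episode does not become a law in this sketch.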

Successful traces are further filtered according to these laws. LLawCo retains successful episodes whose behaviors are consistent with the induced laws, and uses them to generate law-guided reasoning traces. The resulting data is used to fine-tune the LLM-based planner, so that during inference the agent can explicitly reason with laws before selecting communication or physical actions.
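A minimal sketch of this filtering and data-construction step follows, again in Python. The `Step` and `SuccessfulEpisode` structures, the precomputed `violated_laws` and `followed_laws` judgments, and the "Thought / Action" target format are assumptions made for illustration; the source does not specify how consistency is judged or how the reasoning traces are worded.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One decision step of one agent (hypothetical structure)."""
    observation: str
    action: str          # e.g. "Talk [I take the kitchen]"
    followed_laws: set   # laws this step was judged consistent with (assumed given)


@dataclass
class SuccessfulEpisode:
    """A successful episode plus law judgments (hypothetical structure)."""
    steps: list
    violated_laws: set   # laws judged violated anywhere in the episode


def filter_law_consistent(episodes, laws):
    """Retain successful episodes that violate none of the induced laws."""
    law_set = set(laws)
    return [e for e in episodes if not (e.violated_laws & law_set)]


def to_sft_examples(episode, laws):
    """Build (prompt, target) pairs whose target cites laws before the action."""
    examples = []
    for step in episode.steps:
        cited = [law for law in laws if law in step.followed_laws]
        reasoning = "; ".join(f"Law: {law}" for law in cited) or "No law applies"
        examples.append({
            "prompt": step.observation,
            "target": f"Thought: {reasoning}\nAction: {step.action}",
        })
    return examples
```

Placing the law citation before the action in the target mirrors the idea that the fine-tuned planner reasons with laws first and only then commits to a communication or physical action.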

The learned laws serve as an interpretable and controllable interface. They help agents coordinate with partners, communicate more effectively, avoid repeated failures, and verify task completion. They can also be manually edited to enforce user-specified preferences or domain-specific constraints.
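To make the "editable interface" idea concrete, here is a small Python sketch of a law collection that a user could modify before it is shown to the planner. The `LawBook` class, its method names, and the rendered prompt format are hypothetical and are not taken from the LLawCo implementation.

```python
class LawBook:
    """An ordered, editable collection of cooperation laws (hypothetical helper)."""

    def __init__(self, laws=()):
        # De-duplicate while keeping the original order.
        self._laws = list(dict.fromkeys(laws))

    def add(self, law):
        """Append a user-specified law if it is not already present."""
        if law not in self._laws:
            self._laws.append(law)

    def remove(self, law):
        """Drop a law, for example one that conflicts with a domain constraint."""
        if law in self._laws:
            self._laws.remove(law)

    def render(self):
        """Format the laws as a numbered block for the planner's context."""
        lines = [f"{i}. {law}" for i, law in enumerate(self._laws, start=1)]
        return "Cooperation laws:\n" + "\n".join(lines)
```

For instance, starting from learned laws, a user might remove one and add a domain constraint such as "never enter the bathroom" before calling `render()`.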

PARTNR-Dialog: A Benchmark for Communicative Embodied Cooperation

We introduce PARTNR-Dialog, a large-scale benchmark for communicative embodied cooperation. PARTNR-Dialog extends the original PARTNR environment with explicit communication between agents. Agents can invoke a new Talk [message] action to broadcast messages to their partners. These messages are handled through an event buffer and incorporated into each agent's future observations and planning context.
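The message flow can be sketched in Python as follows. This is not the PARTNR-Dialog code: the `MessageBuffer` class, the `step_action` and `build_observation` helpers, the agent identifiers, and the exact textual form of the action are assumptions chosen to show how a broadcast buffer could feed messages into later observations.

```python
import re
from collections import deque

# Assumed textual form of the action: "Talk [some message]".
TALK_PATTERN = re.compile(r"^Talk\s*\[(?P<message>.*)\]$", re.DOTALL)


class MessageBuffer:
    """Per-agent queues of messages that have not been observed yet."""

    def __init__(self, agent_ids):
        self._agents = list(agent_ids)
        self._pending = {agent: deque() for agent in self._agents}

    def broadcast(self, sender, message):
        """Queue the message for every agent except the sender."""
        for agent in self._agents:
            if agent != sender:
                self._pending[agent].append((sender, message))

    def drain(self, agent):
        """Return and clear the messages waiting for this agent."""
        messages = list(self._pending[agent])
        self._pending[agent].clear()
        return messages


def step_action(buffer, sender, action):
    """Route a Talk action into the buffer; return True if it was a Talk."""
    match = TALK_PATTERN.match(action.strip())
    if match is None:
        return False
    buffer.broadcast(sender, match.group("message"))
    return True


def build_observation(buffer, agent, world_observation):
    """Append any pending partner messages to the agent's next observation."""
    heard = buffer.drain(agent)
    lines = [world_observation] + [f"{s} said: {m}" for s, m in heard]
    return "\n".join(lines)
```

Draining the queue when the observation is built means each message appears in exactly one future observation in this sketch; a real environment might instead keep a running dialog history in the planning context.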

Experimental Evaluation

We evaluate LLawCo on two communicative embodied multi-agent benchmarks: PARTNR-Dialog and TDW-MAT. Across four LLM backbones, LLawCo consistently improves task success and cooperation efficiency compared with strong communicative baselines.

Table 1. Results on PARTNR-Dialog.

Table 2. Results on TDW-MAT.

Case Study: Laws Guide Communication and Planning

An example from the TDW-MAT benchmark shows how laws influence concrete agent behavior. Rather than only improving final success rates, LLawCo changes how agents cooperate: they communicate with clearer intent, avoid redundant actions, adapt to partner behavior, and make more efficient task progress.

MERL Publications

Zhou, Q., Gan, C., Cherian, A., "LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior", International Conference on Machine Learning (ICML), June 2026.
BibTeX (TR2026-081):
@inproceedings{Zhou2026jun,
  author = {Zhou, Qinhong and Gan, Chuang and Cherian, Anoop},
  title = {{LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior}},
  booktitle = {International Conference on Machine Learning (ICML)},
  year = 2026,
  month = jun,
  url = {https://www.merl.com/publications/TR2026-081}
}