TR2025-053

FDPP: Fine-tune Diffusion Policy with Human Preference


    •  Chen, Y., Jha, D.K., Tomizuka, M., Romeres, D., "FDPP: Fine-tune Diffusion Policy with Human Preference", IEEE International Conference on Robotics and Automation (ICRA), May 2025.
      @inproceedings{Chen2025may,
        author = {Chen, Yuxin and Jha, Devesh K. and Tomizuka, Masayoshi and Romeres, Diego},
        title = {{FDPP: Fine-tune Diffusion Policy with Human Preference}},
        booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
        year = 2025,
        month = may,
        url = {https://www.merl.com/publications/TR2025-053}
      }
  • Research Areas: Machine Learning, Optimization

Abstract:

Imitation learning from human demonstrations enables robots to perform complex manipulation tasks and has recently seen considerable success. However, these techniques often struggle to adapt behavior to new preferences or changes in the environment. To address these limitations, we propose Fine-tuning Diffusion Policy with Human Preference (FDPP). FDPP learns a reward function through preference-based learning. This reward is then used to fine-tune the pre-trained policy with reinforcement learning (RL), aligning the pre-trained policy with new human preferences while still solving the original task. Our experiments across various robotic tasks and preferences demonstrate that FDPP effectively customizes policy behavior without compromising performance. Additionally, we show that incorporating Kullback–Leibler (KL) regularization during fine-tuning prevents over-fitting and helps maintain the competencies of the initial policy.
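
As a rough illustration of the two ingredients the abstract names, preference-based reward learning and KL-regularized fine-tuning, the sketch below assumes a Bradley-Terry preference model over summed segment rewards and a per-sample log-ratio KL penalty. The abstract does not state either choice, so both are assumptions, as are all names (`reward_fn`, `preference_loss`, `kl_regularized_reward`, `beta`). How log-probabilities are obtained from a diffusion policy is left outside the sketch.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def segment_return(reward_fn, segment):
    """Sum a (hypothetical) learned per-step reward over a trajectory segment."""
    return sum(reward_fn(obs) for obs in segment)


def preference_loss(reward_fn, seg_a, seg_b, label):
    """Cross-entropy loss under an ASSUMED Bradley-Terry preference model.

    label is 1.0 if the human prefers seg_a and 0.0 if they prefer seg_b.
    The model sets P(a preferred) = sigmoid(R_a - R_b).
    """
    diff = segment_return(reward_fn, seg_a) - segment_return(reward_fn, seg_b)
    # -log(sigmoid(diff)) == softplus(-diff); -log(1 - sigmoid(diff)) == softplus(diff)
    return label * softplus(-diff) + (1.0 - label) * softplus(diff)


def kl_regularized_reward(reward, logp_finetuned, logp_pretrained, beta=0.1):
    """Learned reward minus an ASSUMED single-sample KL penalty.

    The log-ratio (log pi - log pi_0) on an action sampled from the fine-tuned
    policy is a one-sample estimate of KL(pi || pi_0); beta is a hypothetical
    weight trading preference alignment against staying close to pi_0.
    """
    return reward - beta * (logp_finetuned - logp_pretrained)
```

Minimizing `preference_loss` over labeled segment pairs pushes the learned reward to rank preferred segments higher. During RL fine-tuning, a larger `beta` keeps the policy closer to the pre-trained one, which is the role the abstract attributes to KL regularization.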

 

  • Related Publication

  •  Chen, Y., Jha, D.K., Tomizuka, M., Romeres, D., "FDPP: Fine-tune Diffusion Policy with Human Preference", arXiv, January 2025.
    @article{Chen2025jan,
      author = {Chen, Yuxin and Jha, Devesh K. and Tomizuka, Masayoshi and Romeres, Diego},
      title = {{FDPP: Fine-tune Diffusion Policy with Human Preference}},
      journal = {arXiv},
      year = 2025,
      month = jan,
      url = {https://arxiv.org/abs/2501.08259}
    }