TR2023-150

LoDA: Low-Dimensional Adaptation of Large Language Models


    •  Liu, J., Koike-Akino, T., Wang, P., Brand, M., Wang, Y., Parsons, K., "LoDA: Low-Dimensional Adaptation of Large Language Models", Advances in Neural Information Processing Systems (NeurIPS) workshop, December 2023.
      BibTeX:
      @inproceedings{Liu2023dec,
        author    = {Liu, Jing and Koike-Akino, Toshiaki and Wang, Pu and Brand, Matthew and Wang, Ye and Parsons, Kieran},
        title     = {LoDA: Low-Dimensional Adaptation of Large Language Models},
        booktitle = {Advances in Neural Information Processing Systems (NeurIPS) workshop},
        year      = {2023},
        month     = dec,
        url       = {https://www.merl.com/publications/TR2023-150}
      }
Research Areas: Artificial Intelligence, Machine Learning

Abstract:

Parameter-Efficient Fine-Tuning (PEFT) has recently garnered significant attention due to the enormous size of Large Language Models (LLMs). Among various PEFT methods, Low-Rank Adaptation (LoRA) demonstrates performance comparable to full fine-tuning despite having significantly fewer trainable parameters. In this work, we first generalize LoRA from a low-rank linear adaptation/mapping to a low-dimensional, non-linear adaptation/mapping, called Low-Dimensional Adaptation (LoDA). We further propose LoDA+, which improves the expressiveness of the non-linear adaptation while using almost the same number of tunable parameters as LoRA. Both LoDA and LoDA+ include LoRA as a special case. To improve computational efficiency at inference, we further propose R-LoDA(+) and S-LoDA(+), which replace the pretrained weight matrix with its low-rank or sparse approximation, frozen during fine-tuning. Empirical evaluations on Natural Language Generation tasks show that LoDA(+) and some variants outperform LoRA as well as other baselines. We will release a package that facilitates the integration of LoDA(+) and their variants with PyTorch models.
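
To make the adapter structure described above concrete, below is a minimal PyTorch sketch of a LoDA-style layer. The architecture shown (a frozen pretrained linear layer plus a trainable down-projection, non-linearity, and up-projection) is only one plausible reading of "low-dimensional, non-linear adaptation"; the class name LoDALinear, the GELU activation, and the rank r are illustrative assumptions, not the paper's specification. With the non-linearity replaced by the identity, the update reduces to LoRA's low-rank linear form, consistent with LoDA including LoRA as a special case.

    # Minimal sketch of a LoDA-style adapter layer (assumed structure, not the
    # paper's exact parameterization): a frozen pretrained linear layer plus a
    # trainable low-dimensional, non-linear adaptation path.
    import torch
    import torch.nn as nn

    class LoDALinear(nn.Module):
        def __init__(self, base: nn.Linear, r: int = 8):
            super().__init__()
            self.base = base                             # pretrained weights, kept frozen
            for p in self.base.parameters():
                p.requires_grad = False
            d_in, d_out = base.in_features, base.out_features
            self.down = nn.Linear(d_in, r, bias=False)   # project into low-dimensional space
            self.act = nn.GELU()                         # non-linearity (hypothetical choice)
            self.up = nn.Linear(r, d_out, bias=False)    # project back to the output space
            nn.init.zeros_(self.up.weight)               # adapter starts as a zero update

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Frozen pretrained path plus the low-dimensional non-linear adaptation;
            # removing self.act recovers a LoRA-style low-rank linear update.
            return self.base(x) + self.up(self.act(self.down(x)))

    if __name__ == "__main__":
        layer = LoDALinear(nn.Linear(768, 768), r=8)
        print(layer(torch.randn(2, 768)).shape)          # torch.Size([2, 768])

In this sketch only the down- and up-projections (and the activation, if parameterized) are trained, mirroring the abstract's point that the pretrained weight matrix stays frozen; the R-LoDA(+)/S-LoDA(+) variants would additionally replace self.base with a low-rank or sparse approximation of the pretrained weights.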