Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy


Pseudo-labeling (PL), a semi-supervised learning (SSL) method in which a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR). Our prior work proposed momentum pseudo-labeling (MPL), which performs PL-based SSL via an interaction between online and offline models, inspired by the mean teacher framework. MPL achieves remarkable results in various semi-supervised settings, showing robustness to variations in the amount of data and domain mismatch severity. However, there is further room for improving the seed model used to initialize the MPL training, as it is in general critical for a PL-based method to start training from high-quality pseudo-labels. To this end, we propose to enhance MPL by (1) introducing the Conformer architecture to boost the overall recognition accuracy and (2) exploiting iterative pseudo-labeling with a language model (LM) to improve the seed model before applying MPL. The experimental results demonstrate that the proposed approaches effectively improve MPL performance, outperforming other PL-based methods. We also present in-depth investigations to make our improvements effective, e.g., with regard to the batch normalization typically used in Conformer and to LM quality.
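
The interaction between the online and offline models can be illustrated with a small sketch. The abstract only states that MPL is inspired by the mean teacher framework, so the exponential-moving-average update below, the function name momentum_update, and the momentum value alpha are assumptions made for illustration, not the authors' implementation. The sketch treats each model as a flat list of weights and moves the offline weights a small step toward the online weights.

```python
def momentum_update(offline, online, alpha=0.999):
    """Return offline weights after one mean-teacher-style EMA step.

    Hypothetical helper: new_offline = alpha * offline + (1 - alpha) * online.
    `alpha` is an assumed momentum coefficient in [0, 1].
    """
    if len(offline) != len(online):
        raise ValueError("parameter lists must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * w_off + (1.0 - alpha) * w_on
            for w_off, w_on in zip(offline, online)]


# Toy usage: the offline model drifts slowly toward the online model.
offline_weights = [0.0, 1.0]
online_weights = [1.0, 1.0]
updated = momentum_update(offline_weights, online_weights, alpha=0.9)
```

With a momentum close to 1, the offline model changes slowly, which in a mean-teacher-style setup is what keeps the pseudo-labels it generates stable while the online model is trained on them.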


  • Related News & Events

    •  NEWS    MERL presenting 8 papers at ICASSP 2022
      Date: May 22, 2022 - May 27, 2022
      Where: Singapore
      MERL Contacts: Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Tim K. Marks; Philip V. Orlik; Kuan-Chuan Peng; Pu (Perry) Wang; Gordon Wichern
      Research Areas: Artificial Intelligence, Computer Vision, Signal Processing, Speech & Audio
      • MERL researchers are presenting 8 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Singapore from May 22-27, 2022. A week of virtual presentations also took place earlier this month.

        Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and classification.

        ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on research advances and the latest technological developments in signal and information processing. The event attracts more than 2000 participants each year.
    •  AWARD    Japan Telecommunications Advancement Foundation Award
      Date: March 15, 2022
      Awarded to: Yukimasa Nagai, Jianlin Guo, Philip Orlik, Takenori Sumi, Benjamin A. Rolfe and Hiroshi Mineno
      MERL Contacts: Jianlin Guo; Philip V. Orlik
      Research Areas: Communications, Machine Learning
      • The MELCO/MERL research paper “Sub-1 GHz Frequency Band Wireless Coexistence for the Internet of Things” has won the 37th Telecommunications Advancement Foundation Award (Telecom System Technology Award) in Japan. This award, established in 1984, is given to research papers and works related to information and telecommunications that have made significant contributions to the advancement, development, and standardization of information and telecommunications from technical and engineering perspectives. The award recognizes both the IEEE 802.19.3 standardization efforts and the technological advancements using reinforcement learning and robust access methodologies for wireless communication systems. This year, there were 43 entries, of which 5 won awards and 3 won encouragement awards. This is the first time MELCO/MERL has received this award. The paper was published in IEEE Access in 2021; its authors are Yukimasa Nagai, Jianlin Guo, Philip Orlik, Takenori Sumi, Benjamin A. Rolfe and Hiroshi Mineno.
  • Related Publications

  •  Higuchi, Y., Moritz, N., Le Roux, J., Hori, T., "Momentum Pseudo-Labelingによる半教師ありEnd-to-End音声認識" [Semi-Supervised End-to-End Speech Recognition with Momentum Pseudo-Labeling], Acoustical Society of Japan Spring Meeting (ASJ), February 2022.
    @inproceedings{Higuchi2022feb,
      author = {Higuchi, Yosuke and Moritz, Niko and Le Roux, Jonathan and Hori, Takaaki},
      title = {Momentum Pseudo-Labelingによる半教師ありEnd-to-End音声認識},
      booktitle = {Acoustical Society of Japan Spring Meeting (ASJ)},
      year = 2022,
      month = feb
    }
  •  Higuchi, Y., Moritz, N., Le Roux, J., Hori, T., "Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy", arXiv, DOI: 10.48550/arXiv.2110.04948, December 2021.
    @article{Higuchi2021dec,
      author = {Higuchi, Yosuke and Moritz, Niko and Le Roux, Jonathan and Hori, Takaaki},
      title = {Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy},
      journal = {arXiv},
      year = 2021,
      month = dec,
      doi = {10.48550/arXiv.2110.04948}
    }