Francois Germain

- Phone: 617-621-7506
Position:
Research / Technical Staff
Visiting Research Scientist
Education:
Ph.D., Stanford University, 2019
Biography
During his graduate studies, François worked on advancing the state of the art in efficient modelling of analog audio systems. Concurrently, he made important contributions to audio signal processing and spatial audio rendering during internships at Adobe Research, Dolby Laboratories and Intel Labs. Before joining MERL, he led research on music source separation and speech enhancement at iZotope. His research interests focus on efficient and robust signal processing and machine learning methods applied to speech, music, and audio content in general.
Recent News & Events
EVENT: MERL Contributes to ICASSP 2023
Date: Sunday, June 4, 2023 - Saturday, June 10, 2023
Location: Rhodes Island, Greece
MERL Contacts: Petros T. Boufounos; Francois Germain; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Suhas Lohit; Yanting Ma; Hassan Mansour; Joshua Rapp; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Computational Sensing, Machine Learning, Signal Processing, Speech & Audio
MERL has made numerous contributions to both the organization and technical program of ICASSP 2023, which is being held in Rhodes Island, Greece, from June 4-10, 2023.
Organization
Petros Boufounos is serving as General Co-Chair of the conference this year, where he has been involved in all aspects of conference planning and execution.
Perry Wang is the organizer of a special session on Radar-Assisted Perception (RAP), which will be held on Wednesday, June 7. The session will feature talks on signal processing and deep learning for radar perception, pose estimation, and mutual interference mitigation with speakers from both academia (Carnegie Mellon University, Virginia Tech, University of Illinois Urbana-Champaign) and industry (Mitsubishi Electric, Bosch, Waveye).
Anthony Vetro is the co-organizer of the Workshop on Signal Processing for Autonomous Systems (SPAS), which will be held on Monday, June 5, and feature invited talks from leaders in both academia and industry on timely topics related to autonomous systems.
Sponsorship
MERL is proud to be a Silver Patron of the conference and will participate in the student job fair on Thursday, June 8. Please join this session to learn more about employment opportunities at MERL, including openings for research scientists, post-docs, and interns.
MERL is pleased to be the sponsor of two IEEE Awards that will be presented at the conference. We congratulate Prof. Rabab Ward, the recipient of the 2023 IEEE Fourier Award for Signal Processing, and Prof. Alexander Waibel, the recipient of the 2023 IEEE James L. Flanagan Speech and Audio Processing Award.
Technical Program
MERL is presenting 13 papers in the main conference on a wide range of topics, including source separation and speech enhancement, radar imaging, depth estimation, motor fault detection, time series recovery, and point clouds. One workshop paper on self-supervised music source separation has also been accepted for presentation.
Perry Wang has been invited to give a keynote talk on Wi-Fi sensing and related standards activities at the Workshop on Integrated Sensing and Communications (ISAC), which will be held on Sunday, June 4.
Additionally, Anthony Vetro will present a Perspective Talk on Physics-Grounded Machine Learning, which is scheduled for Thursday, June 8.
About ICASSP
ICASSP is the flagship conference of the IEEE Signal Processing Society and the world's largest and most comprehensive technical conference focused on research advances and the latest technological developments in signal and information processing. The event attracts more than 2000 participants each year.
NEWS: Members of the Speech & Audio team elected to IEEE Technical Committee
Date: November 28, 2022
MERL Contacts: Francois Germain; Gordon Wichern
Research Area: Speech & Audio
Gordon Wichern and François Germain have been elected for 3-year terms to the IEEE Audio and Acoustic Signal Processing Technical Committee (AASP TC) of the IEEE Signal Processing Society.
The AASP TC's mission is to support, nourish, and lead scientific and technological development in all areas of audio and acoustic signal processing. It has about 30 appointed volunteer members, drawn in roughly equal numbers from leading academic and industrial organizations around the world and united by the common aim of offering their expertise in service of the scientific community.
Awards
AWARD: Joint CMU-MERL team wins DCASE2023 Challenge on Automated Audio Captioning
Date: June 1, 2023
Awarded to: Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, Francois Germain, Jonathan Le Roux, Shinji Watanabe
MERL Contacts: Francois Germain; Jonathan Le Roux; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
A joint team consisting of members of CMU Professor and MERL alumnus Shinji Watanabe's WavLab and members of MERL's Speech & Audio team ranked 1st out of 11 teams in Task 6A, "Automated Audio Captioning", of the DCASE2023 Challenge. The team was led by student Shih-Lun Wu and also included Ph.D. candidate Xuankai Chang, postdoctoral research associate Jee-weon Jung, Prof. Shinji Watanabe, and MERL researchers Gordon Wichern, Francois Germain, and Jonathan Le Roux.
The IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE Challenge), started in 2013, has been organized yearly since 2016, and gathers challenges on multiple tasks related to the detection, analysis, and generation of sound events. This year, the DCASE2023 Challenge received over 428 submissions from 123 teams across seven tasks.
The CMU-MERL team competed in Task 6A, Automated Audio Captioning, which aims to generate informative descriptions of various sounds from nature and/or human activities. The team's system relied heavily on large pretrained models: a BEATs transformer as part of the audio encoder stack; an Instructor Transformer that encodes ground-truth captions to derive an audio-text contrastive loss on the audio encoder; and ChatGPT, used to produce caption mix-ups (i.e., grammatical and compact combinations of two captions) which, together with the corresponding audio mixtures, increase not only the amount but also the complexity and diversity of the training data. The team's best submission obtained a SPIDEr-FL score of 0.327 on the hidden test set, ahead of the second-best team's 0.315.
Internships with Francois
SA2067: Sound event and anomaly detection
We are seeking graduate students interested in helping advance the fields of sound event detection/localization and sound anomaly detection. The interns will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which may later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, microphone array processing, probabilistic modeling, sequence-to-sequence models, and deep learning techniques, in particular those involving minimal supervision (e.g., unsupervised, weakly-supervised, self-supervised, or few-shot learning). Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2024) and durations (typically 3-6 months).
MERL Publications
- "On the Use of Pretrained Deep Audio Encoders for Automated Audio Captioning Tasks", International Symposium on Future Active Safety Technology toward zero traffic accidents (FAST-zero), November 2023.
  @inproceedings{Wu2023nov,
    author = {Wu, Shih-Lun and Chang, Xuankai and Wichern, Gordon and Jung, Jee-weon and Germain, François G and Le Roux, Jonathan and Watanabe, Shinji},
    title = {On the Use of Pretrained Deep Audio Encoders for Automated Audio Captioning Tasks},
    booktitle = {International Symposium on Future Active Safety Technology toward zero traffic accidents (FAST-zero)},
    year = 2023,
    month = nov,
    url = {https://www.merl.com/publications/TR2023-141}
  }
- "Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction", arXiv, October 2023.
  @article{Pan2023oct,
    author = {Pan, Zexu and Wichern, Gordon and Masuyama, Yoshiki and Germain, François G and Khurana, Sameer and Hori, Chiori and Le Roux, Jonathan},
    title = {Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction},
    journal = {arXiv},
    year = 2023,
    month = oct,
    url = {http://arxiv.org/abs/2310.19644}
  }
- "Generation or Replication: Auscultating Audio Latent Diffusion Models", arXiv, October 2023.
  @article{Bralios2023oct,
    author = {Bralios, Dimitrios and Wichern, Gordon and Germain, François G and Pan, Zexu and Khurana, Sameer and Hori, Chiori and Le Roux, Jonathan},
    title = {Generation or Replication: Auscultating Audio Latent Diffusion Models},
    journal = {arXiv},
    year = 2023,
    month = oct,
    url = {https://arxiv.org/abs/2310.10604}
  }
- "Late Audio-Visual Fusion for In-The-Wild Speaker Diarization", arXiv, DOI: 10.48550/arXiv.2211.01299, September 2023.
  @article{Pan2023sep,
    author = {Pan, Zexu and Wichern, Gordon and Germain, Francois and Subramanian, Aswin and Le Roux, Jonathan},
    title = {Late Audio-Visual Fusion for In-The-Wild Speaker Diarization},
    journal = {arXiv},
    year = 2023,
    month = sep,
    doi = {10.48550/arXiv.2211.01299},
    url = {https://arxiv.org/abs/2211.01299}
  }
- "Location as supervision for weakly supervised multi-channel source separation of machine sounds", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), DOI: 10.1109/WASPAA58266.2023.10248128, September 2023.
  @inproceedings{FalconPerez2023aug,
    author = {Falcon Perez, Ricardo and Wichern, Gordon and Germain, Francois and Le Roux, Jonathan},
    title = {Location as supervision for weakly supervised multi-channel source separation of machine sounds},
    booktitle = {IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
    year = 2023,
    month = sep,
    publisher = {IEEE},
    doi = {10.1109/WASPAA58266.2023.10248128},
    issn = {1947-1629},
    isbn = {979-8-3503-2372-6},
    url = {https://www.merl.com/publications/TR2023-119}
  }
Other Publications
- "Periodic Analysis of Nonlinear Virtual Analog Models", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), October 2021, pp. 321-325.
  @Inproceedings{Germain:PeriodicAnalysisNonlinear:2021,
    author = {Germain, Fran\c{c}ois G.},
    title = {Periodic Analysis of Nonlinear Virtual Analog Models},
    booktitle = {{IEEE} Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
    year = 2021,
    pages = {321--325},
    month = oct
  }
- "Practical Virtual Analog Modeling Using Möbius Transforms", International Conference on Digital Audio Effects (DAFx), September 2021, pp. 49-56.
  @Inproceedings{Germain:PracticalVirtualAnalog:2021,
    author = {Germain, Fran\c{c}ois G.},
    title = {Practical Virtual Analog Modeling Using Möbius Transforms},
    booktitle = {International Conference on Digital Audio Effects (DAFx)},
    year = 2021,
    pages = {49--56},
    month = sep
  }
- "Energy-preserving Time-varying Schroeder Allpass Filters and Multichannel Extensions", Journal of the Audio Engineering Society (AES), Vol. 69, No. 7/8, pp. 465-485, 2021.
  @Article{WernerGermainGoldsmith:EnergypreservingTime:2021,
    author = {Werner, Kurt James and Germain, Francois G. and Goldsmith, Cory S.},
    title = {Energy-preserving Time-varying Schroeder Allpass Filters and Multichannel Extensions},
    journal = {Journal of the Audio Engineering Society (AES)},
    year = 2021,
    volume = 69,
    number = {7/8},
    pages = {465--485}
  }
- "Non-oversampled Physical Modeling for Virtual Analog Simulations", 2019, Stanford University.
  @Phdthesis{Germain:NonoversampledPhysical:2019,
    author = {Germain, Fran\c{c}ois G.},
    title = {Non-oversampled Physical Modeling for Virtual Analog Simulations},
    school = {{S}tanford University},
    year = 2019
  }
- "Speech Denoising with Deep Feature Losses", INTERSPEECH Conference, September 2018, pp. 2723-2727.
  @Inproceedings{GermainChenKoltun:SpeechDenoisingDeep:2018,
    author = {Germain, Francois G. and Chen, Qifeng and Koltun, Vladlen},
    title = {Speech Denoising with Deep Feature Losses},
    booktitle = {{INTERSPEECH} Conference},
    year = 2018,
    pages = {2723--2727},
    month = sep
  }
- "Optimizing Differentiated Discretization for Audio Circuits beyond Driving Point Transfer Functions", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), October 2017, pp. 384-388.
  @Inproceedings{GermainWerner:OptimizingDifferentiatedDiscretization:2017,
    author = {Germain, Fran\c{c}ois G. and Werner, Kurt James},
    title = {Optimizing Differentiated Discretization for Audio Circuits beyond Driving Point Transfer Functions},
    booktitle = {{IEEE} Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
    year = 2017,
    pages = {384--388},
    month = oct
  }
- "Fixed-rate Modeling of Audio Lumped Systems: A Comparison between Trapezoidal and Implicit Midpoint Methods", International Conference on Digital Audio Effects (DAFx), September 2017, pp. 168-175.
  @Inproceedings{Germain:FixedrateModeling:2017,
    author = {Germain, Fran\c{c}ois G.},
    title = {Fixed-rate Modeling of Audio Lumped Systems: A Comparison between Trapezoidal and Implicit Midpoint Methods},
    booktitle = {International Conference on Digital Audio Effects (DAFx)},
    year = 2017,
    pages = {168--175},
    month = sep
  }
- "Network Variable Preserving Step-size Control in Wave Digital Filters", International Conference on Digital Audio Effects (DAFx), September 2017, pp. 200-207.
  @Inproceedings{OlsenWernerGermain:NetworkVariablePreserving:2017,
    author = {Olsen, Michael J{\o}rgen and Werner, Kurt James and Germain, Fran{\c{c}}ois G.},
    title = {Network Variable Preserving Step-size Control in Wave Digital Filters},
    booktitle = {International Conference on Digital Audio Effects (DAFx)},
    year = 2017,
    pages = {200--207},
    month = sep
  }
- "Joint Parameter Optimization of Differentiated Discretization Schemes for Audio Circuits", Audio Engineering Society (AES) Convention, May 2017.
  @Inproceedings{GermainWerner:JointParameterOptimization:2017,
    author = {Germain, Fran\c{c}ois G. and Werner, Kurt James},
    title = {Joint Parameter Optimization of Differentiated Discretization Schemes for Audio Circuits},
    booktitle = {Audio Engineering Society (AES) Convention},
    year = 2017,
    month = may
  }
- "A Computational Model of the Hammond Organ Vibrato/chorus Using Wave Digital Filters", International Conference on Digital Audio Effects (DAFx), September 2016, pp. 271-278.
  @Inproceedings{WernerDunkelGermain:ComputationalModelHammond:2016,
    author = {Werner, Kurt James and Dunkel, W. Ross and Germain, Fran{\c{c}}ois G.},
    title = {A Computational Model of the Hammond Organ Vibrato/chorus Using Wave Digital Filters},
    booktitle = {International Conference on Digital Audio Effects (DAFx)},
    year = 2016,
    pages = {271--278},
    month = sep
  }
- "Equalization Matching of Speech Recordings in Real-world Environments", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2016, pp. 609-613.
  @Inproceedings{GermainMysoreFujioka:EqualizationMatchingSpeech:2016,
    author = {Germain, Fran\c{c}ois G. and Mysore, Gautham J. and Fujioka, Takako},
    title = {Equalization Matching of Speech Recordings in Real-world Environments},
    booktitle = {{IEEE} International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    year = 2016,
    pages = {609--613},
    month = mar
  }
- "Sinusoidal Parameter Estimation Using Quadratic Interpolation around Power-scaled Magnitude Spectrum Peaks", Applied Sciences, Vol. 6, No. 10, pp. 306, 2016.
  @Article{WernerGermain:SinusoidalParameterEstimation:2016,
    author = {Werner, Kurt James and Germain, Fran{\c{c}}ois Georges},
    title = {Sinusoidal Parameter Estimation Using Quadratic Interpolation around Power-scaled Magnitude Spectrum Peaks},
    journal = {Applied Sciences},
    year = 2016,
    volume = 6,
    number = 10,
    pages = 306,
    publisher = {MDPI}
  }
- "Design Principles for Lumped Model Discretization Using Möbius Transforms", International Conference on Digital Audio Effects (DAFx), December 2015, pp. 371-378.
  @Inproceedings{GermainWerner:DesignPrinciplesLumped:2015,
    author = {Germain, Fran\c{c}ois G. and Werner, Kurt James},
    title = {Design Principles for Lumped Model Discretization Using Möbius Transforms},
    booktitle = {International Conference on Digital Audio Effects (DAFx)},
    year = 2015,
    pages = {371--378},
    month = dec
  }
- "Speaker and Noise Independent Online Single-channel Speech Enhancement", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2015, pp. 71-75.
  @Inproceedings{GermainMysore:SpeakerNoiseIndependent:2015,
    author = {Germain, Fran{\c{c}}ois G. and Mysore, Gautham J.},
    title = {Speaker and Noise Independent Online Single-channel Speech Enhancement},
    booktitle = {{IEEE} International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    year = 2015,
    pages = {71--75},
    month = apr
  }
- "Efficient Illuminant Correction in the Local, Linear, Learned (L3) Method", Digital Photography XI, February 2015, vol. 9404, pp. 24-30.
  @Inproceedings{GermainAkinolaTianEtAl:EfficientIlluminantCorrection:2015,
    author = {Germain, Francois G. and Akinola, Iretiayo A. and Tian, Qiyuan and Lansel, Steven P. and Wandell, Brian A.},
    title = {Efficient Illuminant Correction in the Local, Linear, Learned (L3) Method},
    booktitle = {Digital Photography XI},
    year = 2015,
    volume = 9404,
    pages = {24--30},
    month = feb
  }
- "Stopping Criteria for Non-negative Matrix Factorization Based Supervised and Semi-supervised Source Separation", IEEE Signal Processing Letters, Vol. 21, No. 10, pp. 1284-1288, 2014.
  @Article{GermainMysore:StoppingCriteriaNon:2014,
    author = {Germain, Fran\c{c}ois G. and Mysore, Gautham J.},
    title = {Stopping Criteria for Non-negative Matrix Factorization Based Supervised and Semi-supervised Source Separation},
    journal = {{IEEE} Signal Processing Letters},
    year = 2014,
    volume = 21,
    number = 10,
    pages = {1284--1288},
    publisher = {IEEE}
  }
- "Combining Modeling of Singing Voice and Background Music for Automatic Separation of Musical Mixtures", International Society for Music Information Retrieval (ISMIR) Conference, November 2013, pp. 41-46.
  @Inproceedings{RafiiGermainSunEtAl:CombiningModelingSinging:2013,
    author = {Rafii, Zafar and Germain, Fran{\c{c}}ois G. and Sun, Dennis L. and Mysore, Gautham J.},
    title = {Combining Modeling of Singing Voice and Background Music for Automatic Separation of Musical Mixtures},
    booktitle = {International Society for Music Information Retrieval (ISMIR) Conference},
    year = 2013,
    pages = {41--46},
    month = nov
  }
- "Speaker and Noise Independent Voice Activity Detection", INTERSPEECH Conference, August 2013, pp. 732-736.
  @Inproceedings{GermainSunMysore:SpeakerNoiseIndependent:2013,
    author = {Germain, François G. and Sun, Dennis L. and Mysore, Gautham J.},
    title = {Speaker and Noise Independent Voice Activity Detection},
    booktitle = {{INTERSPEECH} Conference},
    year = 2013,
    pages = {732--736},
    month = aug
  }
- "Uniform Noise Sequencers for Nonlinear System Identification", International Conference on Digital Audio Effects (DAFx), September 2012, pp. 241-244.
  @Inproceedings{GermainAbelDepalleEtAl:UniformNoiseSequencers:2012,
    author = {Germain, Fran\c{c}ois G. and Abel, Jonathan S. and Depalle, Philippe and Wanderley, Marcelo M.},
    title = {Uniform Noise Sequencers for Nonlinear System Identification},
    booktitle = {International Conference on Digital Audio Effects (DAFx)},
    year = 2012,
    pages = {241--244},
    address = {York, United Kingdom},
    month = sep
  }
- "A Nonlinear Analysis Framework for Electronic Synthesizer Circuits", October 2011, McGill University.
  @Mastersthesis{Germain:NonlinearAnalysisFramework:2011,
    author = {Germain, Fran\c{c}ois Georges},
    title = {A Nonlinear Analysis Framework for Electronic Synthesizer Circuits},
    school = {McGill University},
    year = 2011,
    address = {Montr{\'e}al, Canada},
    month = oct
  }
- "Acoustical Properties of the Vocal-tract in Trombone Performance", Forum Acusticum, June 2011, pp. 625-630.
  @Inproceedings{FreourScavoneLefebvreEtAl:AcousticalPropertiesVocal:2011,
    author = {Freour, Vincent and Scavone, Gary P. and Lefebvre, Antoine and Germain, Fran{\c{c}}ois},
    title = {Acoustical Properties of the Vocal-tract in Trombone Performance},
    booktitle = {Forum Acusticum},
    year = 2011,
    pages = {625--630},
    month = jun
  }
- "Synthesis of Guitar by Digital Waveguides: Modeling the Plectrum in the Physical Interaction of the Player with the Instrument", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), October 2009, pp. 25-28.
  @Inproceedings{GermainEvangelista:SynthesisGuitarDigital:2009,
    author = {Germain, Fran{\c{c}}ois and Evangelista, Gianpaolo},
    title = {Synthesis of Guitar by Digital Waveguides: Modeling the Plectrum in the Physical Interaction of the Player with the Instrument},
    booktitle = {{IEEE} Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
    year = 2009,
    pages = {25--28},
    month = oct
  }