
Machine Learning Researcher - Audio
Protege
Completely Remote | Full Time | Engineering & Architecture
Posted Today
Job description
Responsibilities
- Research audio data quality for machine learning by investigating how signal properties and dataset composition affect downstream model training.
- Develop new metrics, benchmarks, and evaluation frameworks to measure audio quality in ways that predict ML model performance.
- Characterize speech datasets by analyzing acoustic properties such as effective bandwidth, spectral energy, noise, and codec artifacts.
- Build workflows for segment-level quality evaluation to detect localized degradation in diarized or segmented speech regions.
- Design and run targeted evaluations connecting audio quality issues to downstream behaviors in ASR, TTS, and speaker modeling.
- Translate research findings into reproducible filtering rules, quality gates, and scalable evaluation infrastructure.
- Collaborate with ML researchers, data engineers, and operations teams to communicate the value of audio data assets.
Requirements
- PhD or equivalent Master’s degree plus 4+ years of industry experience in machine learning, audio signal processing, or speech technology.
- Proven experience designing and running data evaluations, audio analyses, or benchmarks.
- Strong understanding of speech/audio signal properties, including sampling rates, codecs, spectrograms, and perceptual quality.
- Experience developing or evaluating metrics and measurement frameworks for ML systems or audio signal analysis.
- Ability to connect low-level signal properties to downstream machine learning behavior and model robustness.
- Proficiency in moving between research exploration and production implementation of scalable tools.
- Excellent technical communication skills and a high degree of ownership.
Preferred Qualifications
- Experience with ASR, TTS, speaker modeling, self-supervised speech models, or multimodal audio models.
- Experience developing evaluation frameworks specifically for training data.
- Publications or open-source contributions in speech, audio ML, or data-centric AI.
- Experience studying the relationship between dataset quality and downstream model performance.
About the Company
Protege is building a platform to solve the biggest unmet need in AI: access to high-quality training data. We facilitate the secure, efficient, and privacy-centric exchange of AI training data, helping ambitious teams power their models with the best possible signals. We are a lean, fast-moving, high-trust team of builders obsessed with velocity and impact.
Skills & tools
Machine Learning, Audio Signal Processing, Speech Technology
What the team is looking for
Use this list as a quick fit check before you apply.
- PhD or Master's with 4+ years of experience
- Experience in ML, audio signal processing, or speech technology
- Strong understanding of speech/audio signal properties
- Experience designing data evaluations or benchmarks
- Ability to implement research into scalable tools
