
Senior Software Engineer, Data Processing
Protege
Completely Remote, Full Time, Engineering & Architecture
Posted Today
Job description
Responsibilities
- Design, build, and operate ingestion systems that process large volumes of multimodal data into structured, AI-ready datasets
- Own the end-to-end ingestion path, including data validation, processing, tracking, and downstream availability
- Build modality-specific processing steps for imaging, audio, video, and other unstructured data formats
- Develop parsers, validators, and normalization logic to handle messy and high-variance source data
- Optimize systems for high throughput, reliability, and cost-efficiency using distributed and parallel compute
- Implement rigorous data quality checks and security protocols to handle sensitive and regulated data (e.g., PHI)
- Partner with Product and Data Lab teams to standardize reusable processing patterns and internal tooling
Requirements
- 5+ years of experience building and operating production backend or data systems
- Proven experience designing and running large-scale data pipelines
- Strong programming skills in Python
- Experience with distributed data processing
- Strong proficiency with AWS
- Ability to thrive in high-ambiguity environments with messy, high-volume data
Preferred Qualifications
- Experience processing specific modalities such as medical imaging (DICOM), text, audio, or video
- Background working with regulated data environments (HIPAA, healthcare compliance, PHI)
- Experience with workflow orchestration tools like Airflow or Dagster
- Experience with GCP or Azure
- Prior experience as an early engineer in a startup environment
- Familiarity with ML, NLP, or LLM-based systems, including embeddings and fine-tuning
About the Company
Protege is building a platform to solve AI's biggest unmet need: access to high-quality training data. We facilitate the secure, efficient, and privacy-centric exchange of AI training data, connecting organizations with high-value data to the AI builders who need it. We are a lean, fast-moving team of builders obsessed with velocity and impact.
Skills & tools
Python, AWS, Data Pipelines
What the team is looking for
Use this list as a quick fit check before you apply.
- 5+ years building production backend or data systems
- Experience designing large-scale data pipelines
- Strong Python programming skills
- Experience with distributed data processing
- Proficiency with AWS

Job details
- Work model: Completely Remote
- Commitment: Full Time
- Category: Engineering & Architecture
- Posted: Today