Senior Software Engineer, Data Processing

Protege

Completely Remote, Full Time, Engineering & Architecture

Posted 1 month ago

This role is no longer accepting applications.

Job description

Responsibilities

Design, build, and operate ingestion systems that process large volumes of multimodal data into structured, AI-ready datasets
Own the end-to-end ingestion path, including data validation, processing, tracking, and downstream availability
Build modality-specific processing steps for imaging, audio, video, and other unstructured data formats
Develop parsers, validators, and normalization logic to handle messy and high-variance source data
Optimize systems for high throughput, reliability, and cost-efficiency using distributed and parallel compute
Implement rigorous data quality checks and security protocols to handle sensitive and regulated data (e.g., PHI)
Partner with Product and Data Lab teams to standardize reusable processing patterns and internal tooling

Requirements

5+ years of experience building and operating production backend or data systems
Proven experience designing and running large-scale data pipelines
Strong programming skills in Python
Experience with distributed data processing
Strong proficiency with AWS
Ability to thrive in high-ambiguity environments with messy, high-volume data

Preferred Qualifications

Experience processing specific modalities such as medical imaging (DICOM), text, audio, or video
Background working with regulated data environments (HIPAA, healthcare compliance, PHI)
Experience with workflow orchestration tools like Airflow or Dagster
Experience with GCP or Azure
Prior experience as an early engineer in a startup environment
Familiarity with ML, NLP, or LLM-based systems, including embeddings and fine-tuning

About the Company

Protege is building a platform to solve AI's biggest unmet need: access to high-quality training data. We facilitate the secure, efficient, and privacy-centric exchange of AI training data, connecting organizations with high-value data to the AI builders who need it. We are a lean, fast-moving team of builders obsessed with velocity and impact.

Skills & tools

Python, AWS, Data Pipelines

What the team is looking for

Use this list as a quick fit check before you apply.

01. 5+ years building production backend or data systems
02. Experience designing large-scale data pipelines
03. Strong Python programming skills
04. Experience with distributed data processing
05. Proficiency with AWS

Job details

Work model: Completely Remote
Commitment: Full Time
Category: Engineering & Architecture
Posted: 1 month ago
