
Solutions Applied Data Scientist
Protege
Completely Remote | Full Time | Engineering & Architecture
Posted Yesterday
Job description
Responsibilities
- Design, construct, and validate complex healthcare data cohorts used for AI model training
- Translate research requirements and customer needs into practical dataset definitions and SQL logic
- Act as a technical partner to Solutions Leads to solve complex data challenges, including multi-source joins and dataset linkage
- Perform data completeness analysis, investigate anomalies, and verify cohort logic results
- Evaluate the feasibility of requested variables and labels against available real-world data sources
- Collaborate with delivery engineers to implement necessary changes to data pipelines and infrastructure
- Develop reusable SQL templates, automated validation checks, and scripts to improve delivery workflows
Requirements
- Experience working with large, structured healthcare datasets
- Strong SQL and Python skills, including experience writing complex queries and data transformations
- Experience using Claude Code or Codex
- Experience with data validation, exploratory analysis, and performing completeness checks
- Proficiency with structured file formats such as CSV and Parquet
- Ability to translate ambiguous research requirements into concrete, actionable data logic
- Strong communication skills for collaborating with both technical and non-technical stakeholders
About the Company
Protege is building a platform to solve AI's biggest unmet need: access to high-quality training data. We facilitate the secure, efficient, and privacy-centric exchange of AI training data, helping ambitious teams power their models with the right information. We are a lean, fast-moving, high-trust team of builders obsessed with velocity and impact.
Skills & tools
SQL, Python
What the team is looking for
Use this list as a quick fit check before you apply.
- Experience with large, structured healthcare datasets
- Strong SQL and Python skills
- Experience with complex queries and data transformation
- Experience using Claude Code or Codex
- Experience with data validation and exploratory analysis
- Experience with CSV and Parquet formats

Job details
- Work model: Completely Remote
- Commitment: Full Time
- Category: Engineering & Architecture
- Posted: Yesterday