Solutions Applied Data Scientist

Protege

Completely Remote · Full Time · Engineering & Architecture
Posted Yesterday

Job description

Responsibilities

  • Design, construct, and validate complex healthcare data cohorts used for AI model training
  • Translate research requirements and customer needs into practical dataset definitions and SQL logic
  • Act as a technical partner to Solutions Leads to solve complex data challenges, including multi-source joins and dataset linkage
  • Perform data completeness analysis, investigate anomalies, and verify cohort logic results
  • Evaluate the feasibility of requested variables and labels against available real-world data sources
  • Collaborate with delivery engineers to implement necessary changes to data pipelines and infrastructure
  • Develop reusable SQL templates, automated validation checks, and scripts to improve delivery workflows

Requirements

  • Experience working with large, structured healthcare datasets
  • Strong SQL and Python skills, including experience writing complex queries and data transformations
  • Experience using Claude Code or Codex
  • Experience with data validation, exploratory analysis, and performing completeness checks
  • Proficiency with structured file formats such as CSV and Parquet
  • Ability to translate ambiguous research requirements into concrete, actionable data logic
  • Strong communication skills for collaborating with both technical and non-technical stakeholders

About the Company

Protege is building a platform to solve AI's biggest unmet need: access to high-quality training data. We facilitate the secure, efficient, and privacy-centric exchange of AI training data, helping ambitious teams power their models with the right information. We are a lean, fast-moving, high-trust team of builders obsessed with velocity and impact.

Skills & tools

SQL · Python

What the team is looking for

Use this list as a quick fit check before you apply.

  1. Experience with large structured healthcare datasets
  2. Strong SQL and Python skills
  3. Experience with complex queries and data transformation
  4. Experience using Claude Code or Codex
  5. Experience with data validation and exploratory analysis
  6. Experience with CSV and Parquet formats