Solutions Engineer

Protege

Completely Remote · Full Time · Engineering & Architecture
Posted Today

Job description

Responsibilities

  • Execute and monitor large-scale data transfers across AWS S3, Google Cloud Storage, Azure Blob, Snowflake, and customer environments
  • Use Python and SQL to join datasets, clean data, and validate CSV, Parquet, and database tables
  • Manage credentials, permissions, manifests, and delivery packaging artifacts for ingestion and handoff workflows
  • Leverage Dagster-based platforms to orchestrate data processing and delivery
  • Build lightweight scripts and command-line workflows for filtering, manifest generation, and recovery
  • Document steps, outputs, and recovery paths to ensure auditability and repeatability
  • Partner with Engineering to turn manual operational work into repeatable platform capabilities
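A flavor of the validation and manifest-generation work described above can be sketched in stdlib Python alone. This is an illustrative sketch only; the file layout, manifest fields, and helper names (`build_manifest`, `file_sha256`) are assumptions, not Protege's actual delivery format:

```python
import csv
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Stream a file through SHA-256 so a transfer can be verified after copy."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def csv_row_count(path: Path) -> int:
    """Count data rows (excluding the header) as a basic integrity check."""
    with path.open(newline="") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)


def build_manifest(files: list[Path]) -> dict:
    """Produce one manifest entry per file: size, checksum, and row count."""
    return {
        "entries": [
            {
                "name": p.name,
                "bytes": p.stat().st_size,
                "sha256": file_sha256(p),
                "rows": csv_row_count(p),
            }
            for p in files
        ]
    }


if __name__ == "__main__":
    # Hypothetical usage: build a manifest for every CSV in a delivery directory.
    delivery = Path("delivery")
    manifest = build_manifest(sorted(delivery.glob("*.csv")))
    print(json.dumps(manifest, indent=2))
```

Recording sizes, checksums, and row counts at handoff time is what makes a later recovery or audit step possible: the receiving side can recompute the same values and diff them against the manifest.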

Requirements

  • Strong hands-on experience with production data pipelines
  • Proficiency in Python, SQL, and Bash/shell
  • Command-line fluency in Linux or macOS
  • Experience with cloud storage systems and large-scale cross-cloud data movement
  • High bar for data integrity, validation, and auditability, especially for regulated data
  • Calm, methodical debugging instincts and strong operational judgment

Preferred Qualifications

  • Experience with AWS S3, GCS, Azure Blob, or Snowflake
  • Experience with IAM debugging
  • Familiarity with Dagster or Airflow
  • Experience working with healthcare data or AI training data

About the Company

Protege is building a platform to solve the biggest unmet need in AI: getting access to the right training data. We facilitate the secure, efficient, and privacy-centric exchange of AI training data. We are a lean, fast-moving, high-trust team of builders obsessed with velocity and impact.

Skills & tools

Python, SQL, AWS, GCP, Azure, Snowflake, Dagster, Bash

What the team is looking for

Use this list as a quick fit check before you apply.

  1. Experience with production data pipelines
  2. Proficiency in Python, SQL, and Bash/shell
  3. Command-line fluency in Linux or macOS
  4. Experience with cloud storage systems
  5. Knowledge of large-scale cross-cloud data movement