Staff Site Reliability Engineer

Remote

Completely Remote · Full Time · Engineering & Architecture
Posted Today

Job description

Responsibilities

  • Own the technical direction of Remote's SRE/Platform domain, including architecture, tooling, and long-term roadmap
  • Define and drive the reliability strategy across the platform using SLOs/SLIs, error budgets, and observability
  • Lead complex, cross-team infrastructure initiatives from discovery through delivery
  • Identify and lead AI enablement initiatives to reduce operational toil and accelerate development workflows
  • Drive AI-powered automation for platform operations, such as intelligent alerting and self-healing infrastructure
  • Mentor senior engineers and raise the technical bar through design feedback and hands-on guidance
  • Collaborate with Security teams on platform hardening, threat mitigation, and compliance

Requirements

  • 8+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering
  • Deep expertise in Kubernetes: operating, designing, and scaling production clusters
  • Proven experience managing large-scale cloud infrastructure on AWS
  • Strong infrastructure-as-code practice using Terraform
  • Experience defining and operating reliability frameworks (SLOs, SLIs, error budgets)
  • Solid observability background with tools like Datadog, Grafana, or Prometheus
  • Proficiency with CI/CD platforms (GitLab CI, GitHub Actions) and deployment automation
  • Proficiency with Bash/scripting and container tooling (Docker)
  • Practical experience applying AI tools to infrastructure, operations, or developer tooling

Preferred Qualifications

  • Excellent communication and interpersonal skills for aligning stakeholders
  • Holistic debugging skills and strong security knowledge (defensive and offensive)
  • Experience navigating ambiguity and translating vague requirements into concrete solutions

Benefits

  • Work from anywhere with a fully remote culture
  • Flexible paid time off and flexible working hours (async-first)
  • 16 weeks of paid parental leave
  • Mental health support services
  • Stock options
  • Learning budget
  • Home office budget and IT equipment
  • Budget for local in-person social events or co-working spaces

About the Company

Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly and with ease. We make it possible for businesses of all sizes to recruit, pay, and manage international teams. Our team works asynchronously from across the globe to build a best-in-class HR platform.

Skills & tools

Kubernetes, AWS, Terraform, Datadog, Docker, CI/CD

What the team is looking for

Use this list as a quick fit check before you apply.

  1. 8+ years SRE/DevOps/Platform experience
  2. Deep Kubernetes expertise
  3. AWS infrastructure management
  4. Terraform IaC
  5. SLO/SLI/error budget operation
  6. Observability (Datadog/Grafana/Prometheus)
  7. CI/CD proficiency
  8. Bash/scripting
  9. AI tool application experience