
Site Reliability Engineer
Chess.com
Completely Remote, Full Time, Engineering & Architecture
Posted Today
Job description
Responsibilities
- Design and implement multi-regional resilient infrastructure to handle millions of concurrent sessions.
- Lead hybrid cloud migration strategies, integrating bare-metal resources with cloud services.
- Own on-call rotations and incident response procedures to maintain high-availability SLAs.
- Architect monitoring and alerting systems to proactively identify performance bottlenecks.
- Collaborate with development teams to implement infrastructure-as-code and CI/CD pipelines.
- Optimize system performance through capacity planning, load testing, and resource allocation.
- Drive automation initiatives to reduce manual operational overhead.
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related technical field.
- 5+ years of experience in SRE, DevOps, or infrastructure engineering.
- Strong proficiency with UNIX/Linux operating systems and command-line administration.
- Experience with cloud platforms (GCP, AWS, or Azure) and Infrastructure-as-Code (Terraform, CloudFormation).
- Hands-on experience with configuration management (Ansible, Chef, or Puppet).
- Solid understanding of networking fundamentals (TCP/IP, HTTP, DNS).
- Experience with containerization and orchestration (Docker, Kubernetes).
- Proficiency with monitoring and observability tools (Datadog, Prometheus, Grafana).
Preferred Qualifications
- Experience managing bare-metal server infrastructure and datacenter operations.
- Proficiency with scripting languages such as Python, Go, or Bash.
- Background in high-availability architectures and disaster recovery planning.
- Experience with game server infrastructure or real-time application hosting.
- Previous experience working in a fully remote, distributed environment.
About the Company
Chess.com is one of the largest gaming sites in the world and the #1 platform for playing, learning, and enjoying chess. We are a fully remote team of over 600 people in 60+ countries, working to support 250M+ chess players worldwide. We prize our mission-driven, flat, and no-corporate culture.
Skills & tools
Linux, Terraform, Kubernetes, Docker, AWS, GCP, Python, Go, Ansible
What the team is looking for
Use this list as a quick fit check before you apply.
- Bachelor's degree in Computer Science or related field
- 5+ years in SRE, DevOps, or infrastructure engineering
- Proficiency with UNIX/Linux
- Experience with cloud platforms (GCP, AWS, or Azure)
- Infrastructure-as-code expertise (Terraform, CloudFormation)
- Configuration management (Ansible, Chef, Puppet)
- Networking fundamentals (TCP/IP, HTTP, DNS)
- Containerization (Docker, Kubernetes)
- Monitoring tools (Datadog, Prometheus, Grafana)

Job details
- Work model: Completely Remote
- Commitment: Full Time
- Category: Engineering & Architecture
- Posted: Today