Cohere

Member of Technical Staff, Training Infra Engineer

Posted 2 months ago

Employment Type

Full Time

Location

Dubai

Job Listing No Longer Available

This job posting is no longer accepting applications. It may be more than 30 days old or the position has been filled.

Requirements

Python, JAX / PyTorch, XLA/MLIR, Distributed Training, Kubernetes, Slurm, Ray, Performance Tuning, Systems Debugging, Software Engineering

Job Description

Responsibilities

  • Design and implement high-performance, scalable software for large-scale model training.
  • Improve training infrastructure, codebase performance, and orchestration to enable faster iteration.
  • Build tools and automation to speed training cycles and improve reliability on supercompute resources.
  • Research and prototype infrastructure and data-platform improvements (XLA/MLIR, compilation, I/O).
  • Collaborate closely with research scientists and production engineers to ship state-of-the-art models.
  • Support distributed training stacks (Kubernetes, Slurm, Ray) and debug issues at scale.
  • Maintain and document training pipelines, benchmarks, and operational runbooks.

Requirements

  • Strong software engineering skills
  • Python proficiency
  • JAX / PyTorch
  • XLA/MLIR experience
  • Distributed training
  • Kubernetes / Slurm
  • Ray experience
  • Large-scale training
  • Performance tuning
  • Systems debugging

Preferred Qualifications

  • Experience training large language models at scale
  • Contributions to training tooling or infrastructure
  • Publications in top ML/Systems venues (NeurIPS, ICLR, MLSys, etc.)
  • Background in compiler/runtime optimization for ML
  • Familiarity with supercompute and GPU/TPU fleets
  • Experience bridging research and production systems

Benefits

  • Competitive health and dental coverage
  • Family medical insurance
  • Generous paid leave and annual leave allowance
  • Annual flight / ticket allowance
  • Remote-flexible / hybrid working model with office presence in Dubai
  • Parental leave top-up and personal enrichment stipends

About the Company

Cohere builds and ships frontier AI models and infrastructure to scale intelligence for developers and enterprises. We combine world-class research and engineering to power applications like content generation, semantic search, RAG, and agents. The team operates with a high compute-to-engineer ratio and encourages engineers to contribute across research and production. This opening is based in Dubai, UAE (hybrid / remote-friendly) and is ideal for engineers who enjoy working at the intersection of large-scale ML training, tooling, and systems engineering.
