Member of Technical Staff, Training Infra Engineer
Cohere
Employment Type: Full Time
Location: Dubai
Job Listing No Longer Available
This job posting is no longer accepting applications. It may be more than 30 days old or the position has been filled.
Job Description
Responsibilities
- Design and implement high-performance, scalable software for large-scale model training.
- Improve training infrastructure, codebase performance, and orchestration for faster iterations.
- Build tools and automation to speed training cycles and improve reliability on supercompute resources.
- Research and prototype infrastructure and data-platform improvements (XLA/MLIR, compilation, I/O).
- Collaborate closely with research scientists and production engineers to ship state-of-the-art models.
- Support distributed training stacks (Kubernetes, Slurm, Ray) and debug issues at scale.
- Maintain and document training pipelines, benchmarks, and operational runbooks.
Requirements
- Strong software engineering skills
- Python proficiency
- JAX / PyTorch
- XLA/MLIR experience
- Distributed training
- Kubernetes / Slurm
- Ray experience
- Large-scale training
- Performance tuning
- Systems debugging
Preferred Qualifications
- Experience training large language models at scale
- Contributions to training tooling or infrastructure
- Publications in top ML/Systems venues (NeurIPS, ICLR, MLSys, etc.)
- Background in compiler/runtime optimization for ML
- Familiarity with supercompute and GPU/TPU fleets
- Experience bridging research and production systems
Benefits
- Competitive health and dental coverage
- Family medical insurance
- Generous paid leave and annual leave allowance
- Annual flight / ticket allowance
- Remote-flexible / hybrid working model with office presence in Dubai
- Parental leave top-up and personal enrichment stipends
About the Company
Cohere builds and ships frontier AI models and infrastructure to scale intelligence for developers and enterprises. We combine world-class research and engineering to power applications like content generation, semantic search, RAG, and agents. The team operates with a high compute-to-engineer ratio and encourages engineers to contribute across research and production. This opening is based in Dubai, UAE (hybrid / remote-friendly) and is ideal for engineers who enjoy working at the intersection of large-scale ML training, tooling, and systems engineering.