DevOps / Platform Engineer (Fintech + AI Infrastructure)
OnHires
Employment Type: Full Time
Location: Dubai
Job Description
Responsibilities
- GPU Infrastructure: Deploy and maintain high-performance GPU clusters.
- AI Lifecycle: Manage the full lifecycle of AI services, including inference deployment (Triton, vLLM, custom services), autoscaling, and rollout/rollback strategies.
- Data Management: Manage model storage, artifact versioning, caching, and high-speed data access via S3-compatible storage.
- Observability: Monitor performance metrics including latency, throughput, error budgets, resource limits, and cost/performance ratios.
- High Availability: Ensure fault tolerance for payment services (SLA/SLO management, redundancy, Disaster Recovery planning, and regular recovery testing).
- Fintech-Grade Security: Implement secrets management, HSM/managed KMS integration, infrastructure hardening, and audit logging.
- Secure CI/CD: Build secure pipelines featuring artifact signing, vulnerability scanning, policy gates, and isolated environments.
- Node Operations: Deploy and maintain crypto nodes (Full, Archive, RPC) across various networks.
- Automation: Automate node updates, synchronization monitoring, and health checks.
- Storage & Performance: Tune disk I/O (IOPS/RAID), protect RPC endpoints, and manage access controls.
- Metrics: Monitor for sync lag, chain forks, and consensus issues.
Requirements
- 5+ years in DevOps, SRE, or Platform Engineering (Fintech experience is mandatory)
- Deep expertise in Linux, networking (TCP/IP, DNS, TLS, routing), and complex troubleshooting
- Production experience with Kubernetes (K8s), Helm, Ingress, autoscaling, network policies, and resource management
- Proficiency in GitHub Actions, GitLab CI, or Jenkins
- Hands-on experience with Prometheus + Grafana, logging (Loki/ELK), and tracing (OpenTelemetry/Jaeger)
- Experience with GPU clusters and ML stacks (NVIDIA drivers, CUDA, MIG, GPU monitoring)
- Production-level operation of Postgres, Redis, Kafka, or RabbitMQ
- Practical knowledge of Vault, KMS, RBAC, OPA/Gatekeeper/Kyverno, Trivy, and SBOM
About the Company
The company is a fintech innovator operating a proprietary Payment Service Provider (PSP) platform, advanced AI infrastructure (including on-prem GPU/bare-metal servers), and a dedicated crypto division focused on node infrastructure. It runs a multi-cloud environment (AWS/Hetzner/DigitalOcean) and is looking for a seasoned engineer to build and maintain a resilient, secure, and scalable platform that powers production payments and high-performance AI services.