OnHires

DevOps / Platform Engineer (Fintech + AI Infrastructure)

Posted 10 hours ago

Employment Type

Full Time

Location

Dubai

Key Skills

Linux, Networking, Kubernetes, CI/CD, Observability, AI Infrastructure, Security

Job Description

Responsibilities

  • GPU Infrastructure: Deploy and maintain high-performance GPU clusters.
  • AI Lifecycle: Manage the full lifecycle of AI services, including inference deployment (Triton, vLLM, custom services), autoscaling, and seamless rollout/rollback strategies.
  • Data Management: Manage model storage, artifact versioning, caching, and high-speed data access via S3-compatible storage.
  • Observability: Monitor performance metrics including latency, throughput, error budgets, resource limits, and cost/performance ratios.
  • High Availability: Ensure fault tolerance for payment services (SLA/SLO management, redundancy, Disaster Recovery planning, and regular recovery testing).
  • Fintech-Grade Security: Implement secrets management, HSM/managed KMS integration, infrastructure hardening, and audit logging.
  • Secure CI/CD: Build secure pipelines featuring artifact signing, vulnerability scanning, policy gates, and isolated environments.
  • Node Operations: Deploy and maintain crypto nodes (Full, Archive, RPC) across various networks.
  • Automation: Automate node updates, synchronization monitoring, and health checks.
  • Storage & Performance: Manage disk I/O (IOPS/RAID), protect RPC endpoints, and manage access controls.
  • Metrics: Monitor for sync lag, chain forks, and consensus issues.

Requirements

  • 5+ years in DevOps, SRE, or Platform Engineering (Fintech experience is mandatory)
  • Deep expertise in Linux, networking (TCP/IP, DNS, TLS, routing), and complex troubleshooting
  • Production experience with K8s, Helm, Ingress, autoscaling, network policies, and resource management
  • Proficiency in GitHub Actions, GitLab CI, or Jenkins
  • Hands-on experience with Prometheus + Grafana, logging (Loki/ELK), and tracing (OpenTelemetry/Jaeger)
  • Experience with GPU clusters and ML stacks (NVIDIA drivers, CUDA, MIG, GPU monitoring)
  • Production-level operation of Postgres, Redis, Kafka, or RabbitMQ
  • Practical knowledge of Vault, KMS, RBAC, OPA/Gatekeeper/Kyverno, Trivy, and SBOM

About the Company

The company is a fintech innovator operating a proprietary Payment Service Provider (PSP) platform, advanced AI infrastructure (including on-prem GPU/bare-metal servers), and a dedicated crypto division focused on node infrastructure. It runs a multi-cloud environment (AWS/Hetzner/DigitalOcean) and is looking for a seasoned engineer to build and maintain a resilient, secure, and scalable platform that powers production payments and high-performance AI services.
