Senior MLOps/LLMOps Engineer

Kindred Group plc
London

About Us

At FDJ UNITED, we don't just follow the game, we reinvent it.

FDJ UNITED is one of Europe’s leading betting and gaming operators, with a vast portfolio of iconic brands and a reputation for technological excellence. With more than 5,000 employees and a presence in around fifteen regulated markets, the Group offers a diversified, responsible range of games, both under exclusive rights and open to competition. We set new standards, proving that entertainment and safety can go hand in hand. Here, you’ll work alongside a team of passionate individuals dedicated to delivering the best and safest entertainment experiences for our customers every day.

We’re looking for bold people who are eager to succeed and ready to level-up the game. If you thrive on innovation, embrace challenges, and want to make a real impact at all levels, FDJ UNITED is your playing field.

Join us in shaping the future of gaming. Are you ready to LEVEL-UP THE GAME?

The Role

As a Senior MLOps/LLMOps Engineer, you will be at the forefront of building and scaling our AI/ML infrastructure, bridging the gap between cutting-edge large language models and production-ready systems. You will play a pivotal role in designing, deploying, and operating the platforms that power our AI-driven products, working at the intersection of DevOps, MLOps, and emerging LLM technologies.

In this role, you'll architect robust, scalable infrastructure for deploying and monitoring large language models (LLMs) such as GPT and Claude-family models in AWS Bedrock and Azure AI Foundry, while ensuring security, observability, and reliability across multi-tenant ML workloads. You will collaborate closely with data scientists, ML engineers, platform teams, and product stakeholders to create seamless, self-serve experiences that accelerate AI innovation across the organization.

This is a hands-on leadership role that blends strategic thinking with deep technical execution. You'll own the end-to-end ML platform lifecycle, from infrastructure provisioning and CI/CD automation to model deployment, monitoring, and cost optimization. As a senior technical leader, you'll champion best practices, mentor team members, and drive a culture of continuous improvement, experimentation, and operational excellence.

Key Responsibilities

Platform Infrastructure & Deployment

  • Run and evolve our ML/LLM compute infrastructure on Kubernetes/EKS (CPU/GPU) for multi-tenant workloads, ensuring portability across AWS/Azure AI Foundry regions with region-aware scheduling, cross-region data access, and artifact management
  • Engage with platform and infrastructure teams to provision and maintain access to cloud environments (AWS, Azure), ensuring seamless integration with existing systems
  • Set up and maintain deployment workflows for LLM-powered applications, handling environment-specific configurations across development, staging/UAT, and production
  • Build and operate GitOps-native delivery pipelines using GitLab CI, Jenkins, ArgoCD, Helm, and FluxCD to enable fast, safe rollouts and automated rollbacks

LLM Operations & Optimization

  • Deploy, scale, and optimize large language models (GPT, Claude, and similar) with deep consideration for prompt engineering, latency/performance tradeoffs, and cost efficiency
  • Operate and maintain Argo Workflows as reliable, self-serve orchestration platforms for data preparation, model training, evaluation, and large-scale batch compute
  • Implement and evaluate models using AI Observability frameworks to track model performance, drift, and quality in production

CI/CD & Infrastructure as Code

  • Design and maintain robust CI/CD pipelines with isolated development, staging, and production environments to support safe iteration, reproducibility, and full lifecycle observability
  • Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, and Helm to automate provisioning, configuration, and scaling of cloud resources
  • Manage container orchestration, secrets management (e.g., AWS Secrets Manager), and secure deployment practices across all environments

Observability, Monitoring & Reliability

  • Set up and analyze comprehensive observability stacks using Prometheus/Grafana and Splunk to monitor model health, infrastructure performance, and system reliability
  • Support system monitoring for health, usage, and cost across AWS and Azure environments, including CloudWatch, ELK Stack, and custom alerting solutions
  • Implement sensible alerting strategies to proactively detect and resolve incidents, minimizing downtime and ensuring high availability
  • Proactively troubleshoot production issues, manage release cycles, and provide on-call support as necessary

Data Platform & Experiment Reproducibility

  • Design and maintain a modern data platform built on Apache Iceberg to enable experiment reproducibility, data lineage tracking, and automated governance
  • Build data pipelines with strong principles of idempotency, retries, backfills, and reproducibility to support ML workflows
  • Collaborate with data engineers to ensure seamless integration between data ingestion, transformation, and model training processes

Developer Experience & Enablement

  • Own developer experience by creating intuitive APIs, CLIs, and minimal UIs that enable engineers and data scientists to self-serve infrastructure and deployment needs
  • Develop comprehensive, modular documentation covering system architecture, deployment processes, model usage guidelines, onboarding playbooks, and operational runbooks
  • Treat the ML platform as a product: engage with internal users (engineers, data scientists), gather feedback, remove friction points, and continuously improve usability
  • Create reusable templates, standards, and best practices to ensure maintainability, consistency, and scalability across teams

Architecture, Security & Governance

  • Define and refine platform architecture with a focus on scalability, security, and compliance with organizational and regulatory standards
  • Engage in security approval conversations, ensuring that infrastructure, deployments, and data handling meet security and governance requirements
  • Implement FinOps best practices, including cost attribution, budget monitoring, and optimization strategies for multi-tenant ML infrastructure
  • Champion a culture of continuous integration, continuous delivery, and continuous improvement across engineering teams

Skills, Knowledge, and Experience

Essential Experience

  • 8+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering, with at least 2 years focused on MLOps/LLMOps
  • Deep hands-on expertise with AWS services, including Bedrock, S3, EC2, EKS, RDS/PostgreSQL, ECR, IAM, Lambda, Step Functions, and CloudWatch
  • Production experience managing Kubernetes workloads in EKS, including GPU workloads, autoscaling, resource quotas, and multi-tenant configurations
  • Proficient in container orchestration (Docker, Kubernetes), secrets management, and implementing GitOps-style deployments using Jenkins, ArgoCD, FluxCD, or similar tools
  • Practical understanding of deploying and scaling LLMs (e.g., GPT and Claude-family models), including prompt engineering, latency/performance tradeoffs, and model evaluation
  • Strong programming skills in Python (FastAPI, Django, Pydantic, boto3, Pandas, NumPy) with solid computer science fundamentals (performance, concurrency, data structures)
  • Working knowledge of Machine Learning techniques and frameworks (e.g., scikit-learn, TensorFlow, PyTorch)
  • Experience building and operating data pipelines with principles of idempotency, retries, backfills, and reproducibility
  • Expertise in Infrastructure as Code (IaC) using Terraform, CloudFormation, and Helm
  • Proven track record designing and maintaining CI/CD pipelines with GitLab CI, Jenkins, or similar tools
  • Observability experience with Prometheus/Grafana, Splunk, Datadog, Loki/Promtail, OpenTelemetry, and Sentry, including implementing sensible alerting strategies
  • Strong grasp of networking, security concepts, and Linux systems administration
  • Excellent communication skills with ability to collaborate across development, QA, operations, and product teams
  • Self-motivated, proactive, with a strong sense of ownership and a passion for removing friction and improving developer experience

Nice to Have

  • Experience with distributed compute frameworks such as Dask, Spark, or Ray
  • Familiarity with NVIDIA Triton, TorchServe, or other inference servers
  • Experience with ML experiment tracking platforms like Weights & Biases, MLflow, or Kubeflow
  • FinOps best practices and cost attribution strategies for multi-tenant ML infrastructure
  • Exposure to multi-region and multi-cloud designs, including dataset replication strategies, compute placement, and latency optimization
  • Experience with LakeFS, Apache Iceberg, or Delta Lake for data versioning and lakehouse architectures
  • Knowledge of data transformation tools such as dbt
  • Experience with data pipeline orchestration tools like Airflow or Prefect
  • Familiarity with Snowflake or other cloud data warehouses
  • Understanding of responsible AI practices, model governance, and compliance frameworks

Our Way Of Working

Our world is hybrid.

A career is not a sprint. It's a marathon. One of the perks of joining us is that we value you as a person first. Our hybrid world allows you to focus on your goals and responsibilities and lets you self-organize to improve your deliveries and get the work done in your own way.

Application Process

We believe talent knows no boundaries. Our hiring process focuses solely on your skills, experience, and potential to contribute to our team. We welcome applicants from all backgrounds and evaluate each candidate based on merit, regardless of personal characteristics such as age, gender, origin, religion, sexual orientation, neurodiversity, or disability.

Why Join FDJ UNITED?

  • Work on cutting-edge AI/ML technologies at scale in a regulated, high-stakes industry
  • Technical leadership opportunities with visibility across the organization
  • Collaborate with world-class engineers, data scientists, and product teams
  • Influence the architecture and strategy of our AI platform from the ground up
  • Continuous learning environment with access to the latest tools, technologies, and practices

Posted 2025-12-09
