Service Reliability Eng - London, N1C 4AG

Universal Music Group
London

Service Reliability Eng - London, N1C 4AG, United Kingdom

Job Summary:

We are UMG, the Universal Music Group. We are the world’s leading music company. In everything we do, we are committed to artistry, innovation and entrepreneurship. We own and operate a broad array of businesses engaged in recorded music, music publishing, merchandising, and audiovisual content in more than 60 countries. We identify and develop recording artists and songwriters, and we produce, distribute and promote the most critically acclaimed and commercially successful music to delight and entertain fans around the world.

As a key member of our Global Technical Operations team, you will be responsible for the reliability, scalability, and performance of the critical systems that power a global enterprise. By blending a software engineering mindset with operational expertise, you will engineer solutions that improve system reliability, automate complex processes, and reduce manual toil. You will be an essential partner to our development, infrastructure, and security teams, driving a culture of resilience and continuous improvement across the organization.

As a Site Reliability Engineer, you won't just be supporting systems; you'll be ensuring the services that connect artists and fans around the globe are always on.

Job Functions:


Key Responsibilities:


System Reliability & Performance:

  • Design, build, and maintain critical services, ensuring their availability, scalability, and performance.

  • Develop and maintain robust monitoring, alerting, and observability systems (e.g., using AWS CloudWatch, Dynatrace) to ensure rapid issue detection and resolution.

  • Monitor infrastructure capacity and performance, providing analysis and recommendations to improve service delivery.

Automation & Efficiency:

  • Drive the automation of repetitive operational tasks, including infrastructure provisioning, deployments, and scaling.

  • Create and maintain scripts and custom code to support and enhance our operational toolset.

  • Support and optimize CI/CD pipelines to improve deployment speed and reliability.

Incident Management & Collaboration:

  • Participate in an on-call rotation to troubleshoot and mitigate production incidents.

  • Lead post-incident reviews and root cause analyses to implement lasting solutions.

  • Partner with engineering and IT stakeholders to embed SRE best practices (SLOs, error budgets) into the design and development lifecycle.

Job Requirements:

Required Experience & Skills:

  • A strong background in systems administration (Linux/Windows) in a large-scale environment.

  • Proficiency in at least one programming language (e.g., Python, Go, Java).

  • Hands-on experience with a major cloud platform (AWS, GCP, or Azure), with a strong preference for AWS.

  • Solid understanding of networking, containers (Docker, Kubernetes), and Infrastructure as Code (e.g., Terraform, Ansible).

  • Experience with modern monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, Dynatrace).

  • Proven analytical and problem-solving abilities, with experience working in high-pressure environments.

  • Excellent communication skills and the ability to foster a collaborative team environment.

Preferred Experience & Skills:

  • Bachelor's degree in an IT-related field.

  • Experience managing large-scale, distributed systems for a global organization.

  • Familiarity with IT service management frameworks such as ITIL.

  • Direct experience with ServiceNow for IT service management.

  • Knowledge of chaos engineering, resilience testing, and advanced capacity planning.

Posted 2026-02-24
