Site Reliability & Observability Engineer
Overview
Home. There’s no place like it. And there’s no feeling like helping people create the joy of feeling truly at home. At Dunelm, that’s what we do. We’re the UK’s number one choice for homewares because we make home life lovelier for our customers. And we’ve crafted a workplace that feels just as welcoming, where you can bring your ideas, be yourself, and feel right at home.
Remaining the first choice for savvy homeware shoppers also means making use of advanced technology. We have embraced serverless, event-driven architecture and container orchestration, and are moving from a monolithic front end to micro front ends. You’ll join a talented and collaborative group of engineers and architects who care about quality and reliability. Learn more on our Engineering Blog.
Site Reliability Engineering
Our SRE team is a high-trust, high-impact group of engineers who bring software engineering principles to operational reliability. We are hands-on developers and systems thinkers who build scalable, observable, and resilient platforms. We work closely with other Engineering, Data, Platform and Operations teams to help them build reliable, observable, and cost-effective systems. We lead incident response, improve deployment safety, and guide teams toward sustainable service ownership.
We process large volumes of telemetry data every day and are constantly evolving our approach to cost-efficient observability, adaptive sampling, and meaningful tracing. Observability is not a bolt-on; it is a first-class concern that shapes how we build and support systems across the business.
Ways of Working
This is a hybrid role, with time split between working from home and our London or Leicester offices. We get together as a team for two days every month, and you may occasionally be expected to attend additional ad-hoc office days where necessary.
Interview Process
- Step 1: Introductory video call (around 45 minutes) with the Principal Engineer and Delivery Lead to get to know each other, explain the role, and hear about your experience, goals and approach to work.
- Step 2: A 90-minute technical discussion with a few members of the SRE team. You will work through scenario-based questions designed to help you demonstrate your knowledge and approach, and to discuss where you feel improvements could be made.
Skills and Experience
- TypeScript or a similar strongly typed programming language
- Ability to write idiomatic, pragmatic, and testable code, backed by appropriate automated testing
- Understanding of SRE principles, namely: embracing risk, service level objectives, eliminating toil, monitoring distributed systems, automation and release engineering
- AWS expertise, including serverless services and general networking principles
- Linux system administration knowledge - able to use a command line to navigate and troubleshoot a server or container running a Linux OS
- Configuring and using SaaS observability back-end platforms, such as Datadog or Grafana
- Infrastructure-as-Code tools, such as Pulumi and Terraform
- Kubernetes fundamentals (deploying and monitoring workloads)
- CI/CD pipelines (GitLab or similar) and build/test/deploy automation
- Participation in incident response, root cause analysis and post-incident reviews
- Strong problem-solving and investigative mindset, with high attention to detail
Nice to Have
- Rust or a similar compiled language (e.g. Go)
- Knowledge of OpenTelemetry (tools, specification and APIs)
- Instrumenting and running OpenTelemetry in production at scale
- Distributed tracing and trace sampling
- Cost optimisation for observability and cloud services
- Exposure to Google Cloud Platform (GCP)
- Deep Kubernetes observability (e.g. metrics exporters, service mesh)
- Familiarity with challenges in the retail sector is a bonus but not expected
- Support and build trust with teammates, always assuming positive intent
- Communicate clearly and share knowledge to build shared understanding
- Stay curious, ask why, and always look to improve how things work
- Embrace change, adapt quickly, and take on a variety of challenges
- Drive innovation by looking for better ways forward and pushing for progress