Staff Linux Systems Engineer
Meet the team behind this journey
Within the Infrastructure Operations and Security (IOPS) department, our Data Center Unit manages all infrastructure systems across our remote sites. As a key member of the Research Infrastructure Operations (RIO) team, you will architect and design systems to help us operate our research GPU infrastructure, support the Research department and make fundamental contributions to our AI development. You will be one of the first in Europe to work hands-on with NVIDIA's latest AI systems, the GB200 NVL72. Given the scale and complexity of our infrastructure, the job is not just about maintaining our systems but about advancing them. You will use your expertise in tooling and automation to improve the efficiency, reliability and performance of our infrastructure, taking our operations to the next level. In this role, you will also coordinate with on-site personnel and work closely with various teams within our organization. Joining our team means becoming part of a skilled group of engineers ready to support and kick-start your journey with us.
Your responsibilities
- Co-own the architecture and roadmap for the model-training infrastructure with the Engineering Manager.
- Lead cross-team project implementations end to end: align stakeholders, define scope and milestones, manage dependencies, and drive on-time delivery.
- Provide technical mentorship through design reviews, documentation and hands-on coaching, without managing direct reports.
- Build and own automation tooling for provisioning, maintenance and troubleshooting of our GPU infrastructure while continuously improving team tooling.
- Plan and execute fleet upgrades (kernels, NVIDIA drivers, BIOS/NIC/HBA firmware) with minimal disruption; keep sites consistent.
- Establish observability across the whole GPU cluster including storage and network by extending and optimizing our monitoring systems.
- Lead cross-team incident response and drive root-cause analysis.
- Benchmark and optimize cluster performance.
- Partner with the network team to design and tune the fabric for high-performance workloads.
- Participate in our on-call rotation: you'll ensure the reliability and availability of our services by joining the team's shared on-call rotation as needed.
- Staff-level individual contributor with a proven track record of setting and implementing technical strategy and leading cross-team technical projects
- Extensive experience in management and troubleshooting of GPU compute clusters, being able to architect solutions that scale
- Proficiency in containerization and container orchestration technologies such as Docker and Kubernetes
- Software engineering expertise and fluency in at least one programming language, preferably in Go.
- Expertise in patch and OS management at scale
- Experienced in Linux performance benchmarking, tuning and troubleshooting
- Familiarity with distributed storage solutions like Lustre and Ceph
- Knowledgeable in networking technologies and protocols, including Ethernet and ideally InfiniBand
- Proactive and solution-oriented mindset
- Excellent problem-solving skills
- Initiative-driven and able to take ownership
- Diverse and internationally distributed team: joining our team means becoming part of a large, global community with people of more than 90 nationalities. We're more than just colleagues; we're a group of professionals with a shared mission to connect diverse cultures. Our global presence is growing: we've doubled in size nearly every year, with our employees based in the UK, Germany, the Netherlands, Poland, the US, and Japan, and we continue to expand our network.
- Open communication, regular feedback: as a language-focused company, we value clear, honest communication. We value smooth collaboration and direct, actionable feedback, and believe that leading with empathy and a growth mindset makes us better together.
- Hybrid work, flexible hours: we offer a hybrid work schedule, with team members coming into the office twice a week. This allows you to engage directly with your team and experience the unique energy of our workspace, while still enjoying the flexibility and comfort of working from home. With flexible working hours and trust in your productivity, we are in sync with your team's general locations and time zones to foster effective and seamless collaboration.
- Regular in-person team events: we bond over vibrant events that are as unique as our team, from local team and business unit gatherings, to new-joiner onboardings, to company-wide events that bring us all together, literally.
- Monthly full-day hacking sessions: every month, we have Hack Fridays, where you can spend your time diving into a project you're passionate about and get the opportunity to work with other teams. We value your initiatives, impact, and creativity.
- 30 days of annual leave: we value your peace of mind. With 30 days off (excluding public holidays) and access to mental health resources, we make sure you're as strong mentally as you are professionally.
- Virtual Shares: An ownership mindset in every role. We believe everyone should share in our success, and that's why every employee receives Virtual Shares, linking your contribution directly to DeepL's growth and rewarding you with a stake in our future.
- Competitive benefits: just as our team spans the globe, so does our benefits package. We've crafted it to reflect the diversity of our team and tailored it to align with your unique location, to ensure you feel supported every step of the way.