Senior ML Systems Engineer - Simulations
We are looking for a Senior ML Systems Engineer to build and validate simulation infrastructure for large-scale machine learning systems. This role focuses on modelling the compute and communication behaviour of systems used for ML training and inference, and using simulation to guide architecture, performance optimization, and capacity planning.
The ideal candidate combines strong systems experience with hands-on experience in measurement, benchmarking, and performance analysis of modern ML systems.
What You’ll Do:
Build simulation models for compute, memory, interconnect, and communication behavior in ML systems.
Develop tools to simulate performance for training and inference workloads.
Model distributed execution across accelerators, hosts, and network fabrics, including collectives, synchronization, and communication bottlenecks.
Use simulation and analytical modelling to evaluate tradeoffs, identify bottlenecks, and guide system design.
Run performance experiments and benchmarks on real ML systems to calibrate and validate simulation models.
Analyze end-to-end performance, including throughput, latency, scaling efficiency, utilization, and cost/performance tradeoffs.
Partner with hardware, software, networking, and ML teams to align simulation with real workloads and constraints.
Create reproducible benchmarking methodologies across models and system configurations, and compare simulation results against real system measurements to demonstrate validity.
Communicate findings through technical reports and design recommendations.
Qualifications
Required:
Master’s or PhD in Computer Science, Electrical Engineering, Computer Engineering, or a related field.
Strong experience in ML systems, distributed systems, performance engineering, computer architecture, or simulation.
Understanding of systems used for machine learning training and inference.
Experience analyzing compute, communication, and memory behavior in large-scale ML systems.
Hands-on experience with performance benchmarking, profiling, and measurement of ML systems.
Experience with distributed training concepts such as data parallelism, tensor/model parallelism, pipeline parallelism, collectives, and synchronization overheads.
Proficiency in one of the following: Python, C++, or Rust.
Strong analytical skills and the ability to connect simulation results to real system behavior.
Preferred:
Experience with system performance modelling, network simulation, or architecture evaluation tools (this background is ideal).
Familiarity with accelerator-based systems such as GPUs, TPUs, or custom ML hardware.
Experience with PyTorch, JAX, TensorFlow, NCCL, XLA, CUDA, or similar tools.
Knowledge of interconnect and networking technologies such as InfiniBand, Ethernet/RDMA, NVLink, PCIe, or equivalent.
Experience evaluating both training throughput and inference latency/serving efficiency.
Background in workload characterization, trace-driven simulation, or model calibration.
Ability to work across hardware and software boundaries in a cross-functional environment.
What Success Looks Like:
Build simulation models that accurately predict performance trends and inform architectural decisions.
Identify compute and communication bottlenecks in ML training and inference systems.
Correlate simulation outputs with real-world benchmark data.
Improve system efficiency, scalability, and cost effectiveness through data-driven insights.