Senior AI Research Engineer, Model Inference (Remote)
- Implement and optimize custom inference and fine-tuning kernels for small and large language models across multiple hardware backends.
- Implement and optimize full and LoRA fine-tuning for small and large language models across multiple hardware backends.
- Design and extend datatype and precision support (int, float, mixed precision, ternary QTypes, etc.).
- Design, customize, and optimize Vulkan compute shaders for quantized operators and fine-tuning workflows.
- Investigate and resolve GPU acceleration issues on Vulkan and integrated/mobile GPUs.
- Architect and prepare support for advanced quantization techniques to improve efficiency and memory usage.
- Debug and optimize GPU operators (e.g., int8, fp16, fp4, ternary).
- Integrate and validate quantization workflows for training and inference.
- Conduct evaluation and benchmarking (e.g., perplexity testing, fine-tuned adapter performance).
- Conduct GPU testing across desktop and mobile devices.
- Collaborate with research and engineering teams to prototype, benchmark, and scale new model optimization methods.
- Deliver production-grade, efficient language model deployment for mobile and edge use cases.
- Work closely with cross-functional teams to integrate optimized serving and inference frameworks into production pipelines designed for edge and on-device applications. Define clear success metrics, such as improved real-world performance, low error rates, robust scalability, and optimal memory usage, and ensure continuous monitoring and iterative refinement for sustained improvement.
- Proficiency in C++ and GPU kernel programming.
- Proven expertise in GPU acceleration with the Vulkan framework.
- Strong background in quantization and mixed-precision model optimization.
- Experience and expertise in Vulkan compute shader development and customization.
- Familiarity with LoRA fine-tuning and parameter-efficient training methods.
- Ability to debug GPU-specific performance and stability issues on desktop and mobile devices.
- Hands-on experience with mobile GPU acceleration and model inference.
- Familiarity with large language model architectures (e.g., Qwen, Gemma, LLaMA, Falcon).
- Experience implementing custom backward operators for fine-tuning.
- Experience creating and curating custom datasets for style transfer and domain-specific fine-tuning.
- Demonstrated ability to apply empirical research to overcome challenges in model
Recruitment scams have become increasingly common. To protect yourself, please keep the following in mind when applying for roles:
- Apply only through our official channels. We do not use third-party platforms or agencies for recruitment unless clearly stated. All open roles are listed on our official careers page:
- Verify the recruiter's identity. All our recruiters have verified LinkedIn profiles. If you're unsure, you can confirm their identity by checking their profile or contacting us through our website.
- Be cautious of unusual communication methods. We do not conduct interviews over WhatsApp, Telegram, or SMS. All communication is done through official company emails and platforms.
- Double-check email addresses. All communication from us will come from emails ending in @tether.to or @tether.io.
- We will never request payment or financial details. If someone asks for personal financial information or payment at any point during the hiring process, it is a scam. Please report it immediately.