Staff Technical Lead for Inference & ML Performance

We're looking for a Staff Technical Lead for Inference & ML Performance to guide a team in building and optimizing state-of-the-art inference systems. This role is intense yet deeply impactful.

You'll shape the future of fal's inference engine and ensure our generative models achieve best-in-class performance. Your work directly impacts our ability to rapidly deliver cutting-edge creative solutions to users, from individual creators to global brands.

Day-to-day, you'll set technical direction, guide your team in building high-performance inference solutions, and personally contribute to critical inference optimizations. You'll collaborate closely with research and applied ML teams and influence model inference strategies and deployment techniques.

As a leader, you'll mentor, coach, and grow a team of performance-focused engineers, helping them innovate, solve complex performance challenges, and level up their skills.

To succeed in this role, you'll need deep experience in ML performance optimization, an understanding of the full ML performance stack, and inside-out knowledge of inference. You'll also need to thrive in cross-functional collaboration and have excellent leadership skills.

If you're ready to lead the future of inference performance at a fast-paced, high-growth company, apply now!


Employment type: full-time
Level: staff
Work arrangement: onsite
Skills: ML performance optimization, PyTorch, TensorRT, TransformerEngine, Triton, CUTLASS kernels, Quantization, Kernel authoring, Compilation, Model parallelism, Distributed serving, Profiling
Category: Engineering, Technology
Company: fal (https://fal.com)
About fal: fal is a fast-growing company pioneering the next generation of generative-media infrastructure.
Apply: https://job-boards.greenhouse.io/fal/jobs/4012780009
Location: San Francisco
Date: 2026-04-18