<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>2bc6ae79-8ee</externalid>
      <Title>Staff Technical Lead for Inference &amp; ML Performance</Title>
      <Description><![CDATA[<p>We&#39;re looking for a Staff Technical Lead for Inference &amp; ML Performance to guide a team in building and optimizing state-of-the-art inference systems. This role is intense yet deeply impactful.</p>
<p>You&#39;ll shape the future of fal&#39;s inference engine and ensure our generative models achieve best-in-class performance. Your work directly impacts our ability to rapidly deliver cutting-edge creative solutions to users, from individual creators to global brands.</p>
<p>Day-to-day, you&#39;ll set technical direction, guide your team in building high-performance inference solutions, and personally contribute to critical performance optimizations. You&#39;ll collaborate closely with research &amp; applied ML teams and influence model inference strategies and deployment techniques.</p>
<p>As a leader, you&#39;ll mentor, coach, and grow a team of performance-focused engineers, helping them innovate, solve complex performance challenges, and level up their skills.</p>
<p>To succeed in this role, you&#39;ll need deep experience in ML performance optimization, an understanding of the full ML performance stack, and inside-out knowledge of inference. You&#39;ll also need to thrive in cross-functional collaboration and have excellent leadership skills.</p>
<p>If you&#39;re ready to lead the future of inference performance at a fast-paced, high-growth frontier, apply now!</p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>ML performance optimization, PyTorch, TensorRT, TransformerEngine, Triton, CUTLASS kernels, Quantization, Kernel authoring, Compilation, Model parallelism, Distributed serving, Profiling</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>fal</Employername>
      <Employerlogo>https://logos.yubhub.co/fal.com.png</Employerlogo>
      <Employerdescription>fal is a fast-growing company pioneering the next generation of generative-media infrastructure.</Employerdescription>
      <Employerwebsite>https://fal.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/fal/jobs/4012780009</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
  </jobs>
</source>