<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>71554e46-b64</externalid>
      <Title>Senior Engineering Manager, AI Runtime</Title>
      <Description><![CDATA[<p>At Databricks, we are committed to enabling data teams to solve the world&#39;s toughest problems. As a Senior Engineering Manager, you will lead the team owning both the product experience and the foundational infrastructure of our AI Runtime (AIR) product.</p>
<p>You will be responsible for shaping customer-facing capabilities while designing for scalability, extensibility, and performance of GPU training and adjacent areas. This will involve collaborating closely across the platform, product, infrastructure, and research organisations.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Leading, mentoring, and growing a high-performing engineering team responsible for the Custom Training product and its foundational infrastructure</li>
<li>Defining and owning the product and technical roadmap for AIR, balancing customer experience, functionality, and foundational investments</li>
<li>Collaborating closely with product, research, platform, infrastructure teams, and customers to drive end-to-end delivery</li>
<li>Driving architectural decisions and product design for managed GPU training at scale</li>
<li>Advocating for customer needs through direct engagement, ensuring engineering decisions translate to clear product impact</li>
</ul>
<p>We are looking for someone with 8+ years of software engineering experience, with 3+ years in engineering management. You should have a track record of building and operating managed GPU training infrastructure at scale, as well as deep familiarity with distributed training frameworks and parallelism strategies.</p>
<p>In addition, you should have experience with training resilience patterns, such as checkpointing, elastic training, and automated failure recovery for long-running jobs. You should also have a strong understanding of GPU performance fundamentals, including NCCL, interconnect topologies, and memory optimisation.</p>
<p>Experience building platform products with clear SLAs is also essential, as is strong cross-functional leadership across platform, product, and research teams. Excellent collaboration and communication skills are also required.</p>
<p>The pay range for this role is $228,600-$314,250 USD per year, depending on location. The total compensation package may also include eligibility for annual performance bonus, equity, and benefits.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$228,600-$314,250 USD per year</Salaryrange>
      <Skills>software engineering, engineering management, distributed training frameworks, parallelism strategies, GPU training infrastructure, checkpointing, elastic training, automated failure recovery, GPU performance fundamentals, NCCL, interconnect topologies, memory optimisation</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Databricks</Employername>
      <Employerlogo>https://logos.yubhub.co/databricks.com.png</Employerlogo>
      <Employerdescription>Databricks is a data and AI company that provides a unified platform for data, analytics, and AI. It was founded by the original creators of Lakehouse, Apache Spark, Delta Lake, and MLflow.</Employerdescription>
      <Employerwebsite>https://databricks.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/databricks/jobs/8490282002</Applyto>
      <Location>Mountain View, California; San Francisco, California</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
  </jobs>
</source>