Senior Engineering Manager, AI Runtime

At Databricks, we are committed to enabling data teams to solve the world's toughest problems. As a Senior Engineering Manager, you will lead the team owning both the product experience and the foundational infrastructure of our AI Runtime (AIR) product.

You will be responsible for shaping customer-facing capabilities while designing for scalability, extensibility, and performance of GPU training and adjacent areas. This will involve collaborating closely across the platform, product, infrastructure, and research organisations.

Key responsibilities include:

Leading, mentoring, and growing a high-performing engineering team responsible for the Custom Training product and its foundational infrastructure
Defining and owning the product and technical roadmap for AIR, balancing customer experience, functionality, and foundational investments
Collaborating closely with product, research, platform, infrastructure teams, and customers to drive end-to-end delivery
Driving architectural decisions and product design for managed GPU training at scale
Advocating for customer needs through direct engagement, ensuring engineering decisions translate to clear product impact

We are looking for someone with 8+ years of software engineering experience, with 3+ years in engineering management. You should have a track record of building and operating managed GPU training infrastructure at scale, as well as deep familiarity with distributed training frameworks and parallelism strategies.

In addition, you should have experience with training resilience patterns, such as checkpointing, elastic training, and automated failure recovery for long-running jobs. You should also have a strong understanding of GPU performance fundamentals, including NCCL, interconnect topologies, and memory optimisation.

Experience building platform products with clear SLAs is essential, as are strong cross-functional leadership across platform, product, and research teams and excellent collaboration and communication skills.

The pay range for this role is $228,600-$314,250 USD per year, depending on location. The total compensation package may also include eligibility for an annual performance bonus, equity, and benefits.

full-time, senior, onsite
$228,600-$314,250 USD per year
software engineering, engineering management, distributed training frameworks, parallelism strategies, GPU training infrastructure, checkpointing, elastic training, automated failure recovery, GPU performance fundamentals, NCCL, interconnect topologies, memory optimisation
Engineering, Technology
Databricks
Databricks is a data and AI company that provides a unified platform for data, analytics, and AI. It was founded by the original creators of Lakehouse, Apache Spark, Delta Lake, and MLflow.
https://databricks.com
https://job-boards.greenhouse.io/databricks/jobs/8490282002
Mountain View, California; San Francisco, California
2026-04-18