<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>c220dc7b-fb8</externalid>
      <Title>Model Routing &amp; Inference Team Lead</Title>
      <Description><![CDATA[<p><strong>About the Role</strong></p>
<p>You will lead the Model Routing &amp; Inference team at Cursor, owning the inference platform that powers every AI interaction in the product. The team is responsible for the full inference path, making Cursor&#39;s AI faster, more reliable, and more cost-effective at a scale few teams in the world get to operate. Every agent session, every tab completion, and every chat message flows through your stack.</p>
<p>You&#39;ll set technical direction for cluster management, inference optimization, and traffic egress, building the platform that lets the rest of the company move fast without worrying about provider complexity. You&#39;ll lead a team of strong engineers, shape direction for the business, and make the calls that balance latency, cost, reliability, and user experience across millions of daily requests.</p>
<p><strong>What you&#39;ll do</strong></p>
<ul>
<li>Build and evolve our inference gateway, a single abstraction over every provider&#39;s API semantics, so that onboarding a new model becomes a config change.</li>
<li>Build the systems that dynamically select the best model for each request based on cost, latency, and quality.</li>
<li>Manage GPU cluster utilization and capacity planning across providers, optimizing for cost and performance.</li>
<li>Design routing backpressure and admission control so traffic spikes don&#39;t cascade into providers.</li>
<li>Hire and grow the team: source, interview, and close top inference and systems talent, and develop your engineers through coaching, mentorship, and high-leverage project assignments.</li>
</ul>
<p><strong>You may be a fit if</strong></p>
<ul>
<li>You have led engineering teams building high-throughput, low-latency distributed systems, especially in inference serving, traffic routing, or real-time data pipelines.</li>
<li>You&#39;re comfortable reasoning about cost/performance tradeoffs at scale (GPU utilization, provider economics, capacity planning) and making decisions with incomplete information.</li>
<li>You have strong software engineering fundamentals and enjoy shipping production systems that handle millions of requests.</li>
<li>You have experience with model serving frameworks (vLLM, TensorRT-LLM, TGI), load balancing, or resilient multi-provider architectures (a plus, not a requirement).</li>
<li>You make good calls in the gray area: weighing reliability, cost, latency, and user experience when there isn&#39;t a single &#39;right&#39; answer.</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>model serving frameworks, load balancing, resilient multi-provider architectures, cost/performance tradeoffs, GPU utilization</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Cursor</Employername>
      <Employerlogo>https://logos.yubhub.co/cursor.com.png</Employerlogo>
      <Employerdescription>Cursor is the company behind the AI-powered code editor of the same name, operating AI systems at large scale.</Employerdescription>
      <Employerwebsite>https://cursor.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://cursor.com/careers/engineering-manager-model-routing-inference?utm_source=yubhub.co&amp;utm_medium=jobs_feed&amp;utm_campaign=apply</Applyto>
      <Location></Location>
      <Country></Country>
      <Postedate>2026-04-24</Postedate>
    </job>
  </jobs>
</source>