{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/resilient-multi-provider-architectures"},"x-facet":{"type":"skill","slug":"resilient-multi-provider-architectures","display":"resilient multi-provider architectures","count":1},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c220dc7b-fb8"},"title":"Model Routing & Inference Team Lead","description":"<p><strong>About the Role</strong></p>\n<p>You will lead the Model Routing &amp; Inference team at Cursor, owning the inference platform that powers every AI interaction in the product. The team is responsible for the full inference path: making Cursor&#39;s AI faster, more reliable, and more cost-effective at a scale few teams in the world ever reach. Every agent session, every tab completion, and every chat message flows through your stack.</p>\n<p>You&#39;ll set technical direction for cluster management, inference optimization, and traffic egress, building the platform that lets the rest of the company move fast without worrying about provider complexity. 
You&#39;ll lead a team of strong engineers, set clear direction for the business, and make the calls that balance latency, cost, reliability, and user experience across millions of daily requests.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<ul>\n<li>Build and evolve our inference gateway, a single abstraction over every provider&#39;s API semantics, so model onboarding becomes a config change.</li>\n<li>Build the systems that dynamically select the best model for each request based on cost, latency, and quality.</li>\n<li>Manage GPU cluster utilization and capacity planning across providers, optimizing for cost and performance.</li>\n<li>Design routing backpressure and admission control so traffic spikes don&#39;t cascade into providers.</li>\n<li>Hire and grow the team: sourcing, interviewing, and closing top inference and systems talent, while developing your engineers through coaching, mentorship, and high-leverage project assignments.</li>\n</ul>\n<p><strong>You may be a fit if</strong></p>\n<ul>\n<li>You have led engineering teams building high-throughput, low-latency distributed systems, especially in inference serving, traffic routing, or real-time data pipelines.</li>\n<li>You&#39;re comfortable reasoning about cost/performance tradeoffs at scale (GPU utilization, provider economics, capacity planning) and making decisions with incomplete information.</li>\n<li>You have strong software engineering fundamentals and enjoy shipping production systems that handle millions of requests.</li>\n<li>Experience with model serving frameworks (vLLM, TensorRT-LLM, TGI), load balancing, or building resilient multi-provider architectures is a plus.</li>\n<li>You make good calls in the gray area: weighing reliability, cost, latency, and user experience when there isn&#39;t a single &#39;right&#39; answer.</li>\n</ul>","url":"https://yubhub.co/jobs/job_c220dc7b-fb8","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Cursor","sameAs":"https://cursor.com","logo":"https://logos.yubhub.co/cursor.com.png"},"x-apply-url":"https://cursor.com/careers/engineering-manager-model-routing-inference?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["model serving frameworks","load balancing","resilient multi-provider architectures","cost/performance tradeoffs","GPU utilization"],"x-skills-preferred":[],"datePosted":"2026-04-24T14:14:59.582Z","jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"model serving frameworks, load balancing, resilient multi-provider architectures, cost/performance tradeoffs, GPU utilization"}]}