<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>74be15a1-bce</externalid>
      <Title>Software Engineer, Inference Deployment</Title>
      <Description><![CDATA[<p>Our mandate is to make inference deployment boring and unattended. We serve Claude to millions of users across GPUs, TPUs, and Trainium, and every model update must reach production safely, quickly, and without disrupting service. As a Software Engineer on the Launch Engineering team, you&#39;ll design and build the deployment infrastructure that moves inference code from merge to production.</p>
<p>This is a resource-constrained optimization problem at its core: validation and deployment consume the same accelerator chips that serve customer traffic, so your deploys compete with live user requests for the same hardware. Every model brings different fleet sizes, startup times, and correctness requirements, so the system must adapt continuously. You&#39;ll build systems that navigate these constraints by orchestrating validation, scheduling deployments intelligently, and driving down cycle time from merge to production.</p>
<p>Responsibilities:</p>
<ul>
<li>Own deployment orchestration that continuously moves validated inference builds into production across GPU, TPU, and Trainium fleets, unattended under normal conditions</li>
</ul>
<ul>
<li>Improve capacity-aware deployment scheduling to maximize deployment throughput against constrained accelerator budgets and variable fleet sizes</li>
</ul>
<ul>
<li>Extend deployment observability: dashboards and tooling that answer &quot;what code is running in production,&quot; &quot;where is my commit,&quot; and &quot;what validation passed for this deploy&quot;</li>
</ul>
<ul>
<li>Drive down cycle time from code merge to production with pipeline architectures that minimize serial dependencies and maximize parallelism</li>
</ul>
<ul>
<li>Optimize fleet rollout strategies for large-scale deployments across thousands of GPU, TPU, and Trainium chips, minimizing disruption to serving capacity</li>
</ul>
<ul>
<li>Evolve self-service model onboarding so that new models can be added to the continuous deployment pipeline without Launch Engineering involvement</li>
</ul>
<ul>
<li>Partner across the Inference organization with teams owning validation, autoscaling, and model routing to integrate deployment automation with their systems</li>
</ul>
<p>You May Be a Good Fit If You Have:</p>
<ul>
<li>5+ years of experience building deployment, release, or delivery infrastructure at scale</li>
</ul>
<ul>
<li>Strong software engineering skills with experience designing systems that manage complex state machines and multi-stage pipelines</li>
</ul>
<ul>
<li>Experience with deployment systems where resource constraints shape the design, whether that&#39;s fleet capacity, network bandwidth, hardware availability, or coordinated rollout windows</li>
</ul>
<ul>
<li>A track record of building automation that measurably improves deployment velocity and reliability</li>
</ul>
<ul>
<li>Proficiency with Kubernetes-based deployments, rolling update mechanics, and container orchestration</li>
</ul>
<ul>
<li>Comfort working across the stack, from backend services and databases to CLI tools and web UIs</li>
</ul>
<ul>
<li>Strong communication skills and the ability to work closely with oncall engineers, model teams, and infrastructure partners</li>
</ul>
<p>Strong Candidates May Also Have:</p>
<ul>
<li>Experience with ML inference or training infrastructure deployment, particularly across multiple accelerator types (GPU, TPU, Trainium)</li>
</ul>
<ul>
<li>Background in capacity planning or resource-constrained scheduling (e.g., bin-packing, fleet management, job scheduling with hardware affinity)</li>
</ul>
<ul>
<li>Experience with progressive delivery in systems with long validation cycles: canary/soak testing, blue-green deployments, traffic shifting, automated rollback</li>
</ul>
<ul>
<li>Experience at companies with large-scale release engineering challenges (mobile release trains, monorepo deployments, multi-datacenter rollouts)</li>
</ul>
<ul>
<li>Experience with Python and/or Rust in production systems</li>
</ul>
<p>The annual compensation range for this role is $320,000-$485,000 USD.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$320,000-$485,000 USD</Salaryrange>
      <Skills>deployment infrastructure, software engineering, complex state machines, multi-stage pipelines, Kubernetes-based deployments, container orchestration, backend services, databases, CLI tools, web UIs, ML inference, training infrastructure deployment, capacity planning, resource-constrained scheduling, deployments, progressive delivery, Python, Rust</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>320000</Compensationmin>
      <Compensationmax>485000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5111745008</Applyto>
      <Location>San Francisco, CA | New York City, NY | Seattle, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
  </jobs>
</source>