{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/trainium"},"x-facet":{"type":"skill","slug":"trainium","display":"Trainium","count":9},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_70e2591f-d7d"},"title":"Technical Program Manager, Infrastructure","description":"<p>As a Technical Program Manager for Infrastructure, you&#39;ll work across multiple infrastructure domains to coordinate complex programs that have broad organisational impact. 
You&#39;ll be solving novel scaling challenges at the frontier of what&#39;s possible, all while maintaining the security and reliability our mission demands.</p>\n<p>Developer Productivity &amp; Tooling</p>\n<ul>\n<li>Drive cross-functional programs to improve developer environments, CI/CD infrastructure, and release processes that enable rapid innovation while maintaining high security standards</li>\n</ul>\n<ul>\n<li>Coordinate large-scale migrations and platform modernization efforts across engineering teams</li>\n</ul>\n<ul>\n<li>Partner with teams to measure and improve developer productivity metrics, identifying bottlenecks and driving systematic improvements</li>\n</ul>\n<ul>\n<li>Lead initiatives to integrate AI tools into development workflows, helping Anthropic be at the forefront of AI-assisted research and engineering</li>\n</ul>\n<p>Infrastructure Reliability &amp; Operations</p>\n<ul>\n<li>Drive programs to establish and achieve reliability targets across training infrastructure and production services</li>\n</ul>\n<ul>\n<li>Coordinate incident response improvements, post-mortem processes, and on-call rotations that help teams operate effectively</li>\n</ul>\n<ul>\n<li>Establish metrics and dashboards to track infrastructure health, capacity utilisation, and operational excellence</li>\n</ul>\n<p>Cross-functional Coordination</p>\n<ul>\n<li>Serve as the critical bridge between infrastructure teams, research, and product, translating technical complexities into clear updates for a variety of audiences</li>\n</ul>\n<ul>\n<li>Consult with stakeholders to deeply understand infrastructure, data, and compute needs, identifying solutions to support frontier research and product development</li>\n</ul>\n<ul>\n<li>Drive alignment on priorities and timelines across teams with competing constraints</li>\n</ul>\n<p>You&#39;ll be a good fit if you have 5+ years of technical program management experience, with a track record of successfully delivering complex 
infrastructure programs in ML/AI systems or large-scale distributed systems. You&#39;ll also need a deep technical understanding of infrastructure systems, strong stakeholder management skills, and the ability to navigate competing priorities-confirming data-driven technical decisions.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_70e2591f-d7d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5111783008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$290,000-$365,000 USD","x-skills-required":["Kubernetes","Cloud platforms (AWS, GCP, Azure)","ML infrastructure (GPU/TPU/Trainium clusters)","Developer productivity initiatives","CI/CD systems","Infrastructure scaling"],"x-skills-preferred":["Observability tooling and practices","AI tools to improve engineering productivity","Research teams and translating their needs into concrete technical requirements"],"datePosted":"2026-04-18T15:57:52.097Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Cloud platforms (AWS, GCP, Azure), ML infrastructure (GPU/TPU/Trainium clusters), Developer productivity initiatives, CI/CD systems, Infrastructure scaling, Observability tooling and practices, AI tools to improve engineering productivity, Research teams and translating their needs into concrete technical 
requirements","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":290000,"maxValue":365000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ac45e205-e7d"},"title":"Engineering Manager, Inference Routing and Performance","description":"<p><strong>About the role\\nEvery request that hits Claude , from claude.ai, the API, our cloud partners, or internal research , passes through a routing decision. Not a generic load balancer round-robin, but a decision that accounts for what&#39;s already cached where, which accelerator the request runs best on, and what else is in flight across the fleet.\\n\\nGet it right and you extract meaningfully more throughput from the same hardware. Get it wrong and you burn capacity, miss latency SLOs, or shed load that shouldn&#39;t have been shed.\\n\\nThe Inference Routing team owns this layer. We build the cluster-level routing and coordination plane for Anthropic&#39;s inference fleet , the system that sits between the API surface and the inference engines themselves, making fleet-wide efficiency decisions in real time.\\n\\nAs Anthropic moves from &quot;many independent inference replicas&quot; toward &quot;a single warehouse-scale computer running a coordinated program,&quot; Dystro is the coordination layer. 
This is a deeply technical team.\\n\\nThe engineers here design custom load-balancing algorithms, build quantitative models of system performance, debug latency spikes that cross kernel, network, and framework boundaries, and reason carefully about cache placement across thousands of accelerators.\\n\\nThey work shoulder-to-shoulder with teams that write kernels and ML framework internals.\\n\\nThe EM for this team doesn&#39;t need to write kernels , but they do need the systems depth to make architectural calls, evaluate deeply technical candidates, and spot when a proposed optimization will have second-order effects on the fleet.\\n\\nYou&#39;ll inherit a strong team of distributed-systems engineers, and you&#39;ll be accountable for two things that pull in different directions: shipping system-level performance improvements that measurably increase fleet throughput and efficiency, and running the team operationally so that deploys are safe, incidents are rare, and the teams who depend on Dystro can plan around you with confidence.\\n\\nThe job is holding both.\\n\\n## Representative work:\\nThings the Inference Routing EM actually spends time on:\\n- Deciding whether a proposed routing algorithm change is worth the deploy risk, given the modeled throughput gain and the blast radius if it regresses\\n- Sequencing a quarter where KV-cache offload, a new coordination protocol, and two model launches all compete for the same engineers\\n- Working through a persistent tail-latency regression with the team , walking down from fleet-level metrics to per-replica behavior to a root cause in the networking stack\\n- Building the case (with numbers) to peer teams for why a cross-team protocol change unlocks the next efficiency win\\n- Running the post-incident review after a cache-eviction bug caused a capacity event, and turning it into process changes that stick\\n- Interviewing a candidate who has built schedulers at supercomputing scale, and deciding whether they&#39;d 
be additive to a team that already goes deep\\n\\n## What you&#39;ll do:\\nDrive system-level performance\\n- Own the technical roadmap for cluster-level inference efficiency , routing decisions, cache placement and eviction, cross-replica coordination, and the protocols that keep routing and inference engines in sync\\n- Partner with the inference engine, kernels, and performance teams to identify fleet-level throughput and latency wins, then turn those into shipped improvements with measurable results\\n- Build the team&#39;s habit of quantitative performance modeling: claim a win only when you can measure it, and know before you ship what the expected effect is\\n\\nDeliver reliably and operate cleanly\\n- Set technical strategy for how routing evolves across heterogeneous hardware (GPUs, TPUs, Trainium) and across all our serving surfaces\\n- Run the team&#39;s operational backbone , on-call rotation, incident response, postmortem review, deploy safety , so the team can ship aggressively without the system becoming fragile\\n- Create clarity at a seam: Inference Routing sits between the API surface, the inference engines, and the cloud deployment teams. You&#39;ll make sure commitments are realistic, dependencies are understood, and nobody is surprised\\n\\nBuild and grow the team\\n- Develop and retain a strong existing team, and hire against the bar described above: people who can go to the OS and framework level when the problem demands it, and who care about production reliability\\n- Coach engineers through a roadmap where priorities shift with model launches, new hardware, and scaling demands. We pair a lot here , you&#39;ll help make that collaboration pattern productive\\n- Pick up slack when it matters. 
This is a small team in a critical path; sometimes the EM is the one unblocking a stuck deploy or synthesizing a design debate\\n\\n## You may be a good fit if you:\\n- Have 5+ years of engineering management experience, ideally with at least part of that leading teams on critical-path production infrastructure at scale\\n- Have a deep systems background , load balancing, scheduling, cache-coherent distributed state, high-performance networking, or similar. You need enough depth to make architectural calls about routing and efficiency, and to evaluate candidates who go to the kernel and framework level\\n- Have shipped performance improvements in large-scale systems and can explain, with numbers, what the impact was\\n- Have run production infrastructure with real operational stakes: on-call, incident response, capacity events, deploy discipline\\n- Are results-oriented with a bias toward impact, and comfortable working in a space where throughput, latency, stability, and feature velocity all pull in different directions\\n- Build strong relationships across team boundaries , this is a seam role, and much of the job is making sure other teams can rely on yours\\n- Are curious about machine learning systems. 
You don&#39;t need an ML research background, but you should want to learn how transformer inference actually works and how that shapes the systems problems\\n\\nStrong candidates may also have:\\n- Experience with LLM inference serving , KV caching, continuous batching, request scheduling, prefill/decode disaggregation\\n- Background in cluster schedulers, load balancers, service meshes, or coordination planes at scale\\n- Familiarity with heterogeneous accelerator fleets (GPU/TPU/Trainium) and how hardware differences affect workload placement\\n- Experience with GPU/accelerator programming, ML framework internals, or OS-level performance debugging , enough to follow and evaluate the technical work, not necessarily to do it daily\\n- Led teams at supercomputing or hyperscaler infrastructure scale\\n- Led teams through rapid-growth periods where hiring and onboarding competed with roadmap delivery\\n\\nThe annual compensation range for this role is listed below. For sales roles, the range provided is the role’s On Target Earnings (&quot;OTE&quot;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.\\nAnnual Salary: $405,000-$485,000 USD</strong></p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ac45e205-e7d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5155391008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$405,000-$485,000 USD","x-skills-required":["engineering management","distributed systems","load balancing","scheduling","cache-coherent distributed state","high-performance networking","machine learning 
systems"],"x-skills-preferred":["LLM inference serving","cluster schedulers","load balancers","service meshes","coordination planes","heterogeneous accelerator fleets","GPU/TPU/Trainium","GPU/accelerator programming","ML framework internals","OS-level performance debugging"],"datePosted":"2026-04-18T15:56:48.587Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"engineering management, distributed systems, load balancing, scheduling, cache-coherent distributed state, high-performance networking, machine learning systems, LLM inference serving, cluster schedulers, load balancers, service meshes, coordination planes, heterogeneous accelerator fleets, GPU/TPU/Trainium, GPU/accelerator programming, ML framework internals, OS-level performance debugging","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_59e88547-efc"},"title":"Senior Software Engineer, Systems","description":"<p>About Anthropic</p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole.</p>\n<p>About the Role</p>\n<p>Anthropic&#39;s Infrastructure organization is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users, demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand. 
The Systems engineering team owns compute uptime and resilience at massive scale, building the clusters, automation, and observability that make frontier AI research possible and safely deployable to customers.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Lead infrastructure projects from design through delivery, owning scope, execution, and outcomes</li>\n<li>Build and maintain systems that support AI clusters at massive scale (thousands to hundreds of thousands of machines)</li>\n<li>Partner with cloud providers and internal teams to solve compute, networking, and reliability challenges</li>\n<li>Tackle difficult technical problems in your domain and proactively fill gaps in tooling, documentation, and processes</li>\n<li>Contribute to operational practices including incident response, postmortems, and on-call rotations</li>\n</ul>\n<p>Benefits</p>\n<ul>\n<li>Competitive compensation and benefits</li>\n<li>Optional equity donation matching</li>\n<li>Generous vacation and parental leave</li>\n<li>Flexible working hours</li>\n<li>Lovely office space in which to collaborate with colleagues</li>\n</ul>\n<p>Requirements</p>\n<ul>\n<li>6+ years of software engineering experience</li>\n<li>Have led technical projects end-to-end over multiple months, including scoping, breaking down work, and driving delivery</li>\n<li>Have deep knowledge of distributed systems, reliability, and cloud platforms (Kubernetes, IaC, AWS/GCP)</li>\n<li>Are strong in at least one systems language (Python, Rust, Go, Java)</li>\n<li>Solve hard problems independently and know when to pull others in</li>\n<li>Help teammates grow through knowledge sharing and thoughtful technical guidance</li>\n<li>Communicate clearly in design docs, presentations, and cross-functional discussions</li>\n</ul>\n<p>Preferred Qualifications</p>\n<ul>\n<li>Security and privacy best practice expertise</li>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking 
infrastructure like NCCL</li>\n<li>Low-level systems experience, for example Linux kernel tuning and eBPF</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_59e88547-efc","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4915842008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£240,000-£325,000 GBP","x-skills-required":["Distributed systems","Reliability","Cloud platforms","Kubernetes","IaC","AWS/GCP","Systems language","Python","Rust","Go","Java"],"x-skills-preferred":["Security and privacy best practice","Machine learning infrastructure","GPUs","TPUs","Trainium","Networking infrastructure","NCCL","Low level systems experience","Linux kernel tuning","eBPF"],"datePosted":"2026-04-18T15:48:47.617Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Distributed systems, Reliability, Cloud platforms, Kubernetes, IaC, AWS/GCP, Systems language, Python, Rust, Go, Java, Security and privacy best practice, Machine learning infrastructure, GPUs, TPUs, Trainium, Networking infrastructure, NCCL, Low level systems experience, Linux kernel tuning, eBPF","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":240000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_63af8568-789"},"title":"Engineering Manager, Inference Routing and Performance","description":"<p><strong>About the role\\nEvery request that hits 
Claude , from claude.ai, the API, our cloud partners, or internal research , passes through a routing decision. Not a generic load balancer round-robin, but a decision that accounts for what&#39;s already cached where, which accelerator the request runs best on, and what else is in flight across the fleet.\\n\\nGet it right and you extract meaningfully more throughput from the same hardware. Get it wrong and you burn capacity, miss latency SLOs, or shed load that shouldn&#39;t have been shed.\\n\\nThe Inference Routing team owns this layer. We build the cluster-level routing and coordination plane for Anthropic&#39;s inference fleet , the system that sits between the API surface and the inference engines themselves, making fleet-wide efficiency decisions in real time.\\n\\nAs Anthropic moves from &quot;many independent inference replicas&quot; toward &quot;a single warehouse-scale computer running a coordinated program,&quot; Dystro is the coordination layer. This is a deeply technical team.\\n\\nThe engineers here design custom load-balancing algorithms, build quantitative models of system performance, debug latency spikes that cross kernel, network, and framework boundaries, and reason carefully about cache placement across thousands of accelerators.\\n\\nThey work shoulder-to-shoulder with teams that write kernels and ML framework internals.\\n\\nThe EM for this team doesn&#39;t need to write kernels , but they do need the systems depth to make architectural calls, evaluate deeply technical candidates, and spot when a proposed optimization will have second-order effects on the fleet.\\n\\nYou&#39;ll inherit a strong team of distributed-systems engineers, and you&#39;ll be accountable for two things that pull in different directions: shipping system-level performance improvements that measurably increase fleet throughput and efficiency, and running the team operationally so that deploys are safe, incidents are rare, and the teams who depend on Dystro can plan 
around you with confidence.\\n\\nThe job is holding both.\\n\\n## Representative work:\\nThings the Inference Routing EM actually spends time on:\\n- Deciding whether a proposed routing algorithm change is worth the deploy risk, given the modeled throughput gain and the blast radius if it regresses\\n- Sequencing a quarter where KV-cache offload, a new coordination protocol, and two model launches all compete for the same engineers\\n- Working through a persistent tail-latency regression with the team , walking down from fleet-level metrics to per-replica behavior to a root cause in the networking stack\\n- Building the case (with numbers) to peer teams for why a cross-team protocol change unlocks the next efficiency win\\n- Running the post-incident review after a cache-eviction bug caused a capacity event, and turning it into process changes that stick\\n- Interviewing a candidate who has built schedulers at supercomputing scale, and deciding whether they&#39;d be additive to a team that already goes deep\\n\\n## What you&#39;ll do:\\nDrive system-level performance\\n- Own the technical roadmap for cluster-level inference efficiency , routing decisions, cache placement and eviction, cross-replica coordination, and the protocols that keep routing and inference engines in sync\\n- Partner with the inference engine, kernels, and performance teams to identify fleet-level throughput and latency wins, then turn those into shipped improvements with measurable results\\n- Build the team&#39;s habit of quantitative performance modeling: claim a win only when you can measure it, and know before you ship what the expected effect is\\n\\nDeliver reliably and operate cleanly\\n- Set technical strategy for how routing evolves across heterogeneous hardware (GPUs, TPUs, Trainium) and across all our serving surfaces\\n- Run the team&#39;s operational backbone , on-call rotation, incident response, postmortem review, deploy safety , so the team can ship aggressively without the 
system becoming fragile\\n- Create clarity at a seam: Inference Routing sits between the API surface, the inference engines, and the cloud deployment teams. You&#39;ll make sure commitments are realistic, dependencies are understood, and nobody is surprised\\n\\nBuild and grow the team\\n- Develop and retain a strong existing team, and hire against the bar described above: people who can go to the OS and framework level when the problem demands it, and who care about production reliability\\n- Coach engineers through a roadmap where priorities shift with model launches, new hardware, and scaling demands. We pair a lot here , you&#39;ll help make that collaboration pattern productive\\n- Pick up slack when it matters. This is a small team in a critical path; sometimes the EM is the one unblocking a stuck deploy or synthesizing a design debate\\n\\n## You may be a good fit if you:\\n- Have 5+ years of engineering management experience, ideally with at least part of that leading teams on critical-path production infrastructure at scale\\n- Have a deep systems background , load balancing, scheduling, cache-coherent distributed state, high-performance networking, or similar. You need enough depth to make architectural calls about routing and efficiency, and to evaluate candidates who go to the kernel and framework level\\n- Have shipped performance improvements in large-scale systems and can explain, with numbers, what the impact was\\n- Have run production infrastructure with real operational stakes: on-call, incident response, capacity events, deploy discipline\\n- Are results-oriented with a bias toward impact, and comfortable working in a space where throughput, latency, stability, and feature velocity all pull in different directions\\n- Build strong relationships across team boundaries , this is a seam role, and much of the job is making sure other teams can rely on yours\\n- Are curious about machine learning systems. 
You don&#39;t need an ML research background, but you should want to learn how transformer inference actually works and how that shapes the systems problems\\n\\nStrong candidates may also have:\\n- Experience with LLM inference serving , KV caching, continuous batching, request scheduling, prefill/decode disaggregation\\n- Background in cluster schedulers, load balancers, service meshes, or coordination planes at scale\\n- Familiarity with heterogeneous accelerator fleets (GPU/TPU/Trainium) and how hardware differences affect workload placement\\n- Experience with GPU/accelerator programming, ML framework internals, or OS-level performance debugging , enough to follow and evaluate the technical work, not necessarily to do it daily\\n- Led teams at supercomputing or hyperscaler infrastructure scale\\n- Led teams through rapid-growth periods where hiring and onboarding competed with roadmap delivery\\n\\nThe annual compensation range for this role is listed below. For sales roles, the range provided is the role’s On Target Earnings (&quot;OTE&quot;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.\\nAnnual Salary: $405,000-$485,000 USD</strong></p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_63af8568-789","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5155391008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$405,000-$485,000 USD","x-skills-required":["engineering management","deep systems background","load balancing","scheduling","cache-coherent distributed state","high-performance networking"],"x-skills-preferred":["LLM 
inference serving","cluster schedulers","load balancers","service meshes","coordination planes","heterogeneous accelerator fleets","GPU/TPU/Trainium","GPU/accelerator programming","ML framework internals","OS-level performance debugging"],"datePosted":"2026-04-18T15:37:38.038Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"engineering management, deep systems background, load balancing, scheduling, cache-coherent distributed state, high-performance networking, LLM inference serving, cluster schedulers, load balancers, service meshes, coordination planes, heterogeneous accelerator fleets, GPU/TPU/Trainium, GPU/accelerator programming, ML framework internals, OS-level performance debugging","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0a872d93-7f6"},"title":"Engineering Manager, Cloud Inference AWS","description":"<p>We are seeking an experienced Engineering Manager to lead the Cloud Inference team for AWS. You will lead your team to scale and optimize Claude to serve the massive audiences of developers and enterprise companies using AWS.</p>\n<p>As an Engineering Manager, you will own the end-to-end product of Claude on AWS, including API, load balancing, inference, capacity and operations. 
Your team will ensure our LLMs meet rigorous performance, safety and security standards and enhance our core infrastructure for packaging, testing, and deploying inference technology across the globe.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Set technical strategy and oversee development of Claude on AWS across all layers of the technical stack.</li>\n<li>Collaborate across teams and companies to deeply understand product, infrastructure, operations and capacity needs, identifying potential solutions to support frontier LLM serving</li>\n<li>Work closely with cross-functional stakeholders across companies to align on goals and drive outcomes</li>\n<li>Create clarity for the team and stakeholders in an ambiguous and evolving environment</li>\n<li>Take an inclusive approach to hiring and coaching top technical talent, and support a high performing team</li>\n<li>Design and run processes (e.g. postmortem review, incident response, on-call rotations) that help the team operate effectively and never fail the same way twice</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>10+ years of experience in high-scale, high-reliability software development, particularly infrastructure or capacity management</li>\n<li>5+ years of engineering management experience</li>\n<li>Experience recruiting, scaling, and retaining engineering talent in a high growth environment</li>\n<li>Have experience scaling products, resources and operations to accommodate rapid growth</li>\n<li>Are deeply interested in the potential transformative effects of advanced AI systems and are committed to ensuring their safe development</li>\n<li>Excel at building strong relationships and strategy with stakeholders across engineering, product, finance, and sales</li>\n<li>Have experience working with external partners to align goals and deliver impact</li>\n<li>Enjoy working in a fast-paced, early environment; comfortable with adapting priorities as driven by the rapidly evolving AI space</li>\n<li>Have excellent 
written and verbal communication skills</li>\n<li>Demonstrated success building a culture of belonging and engineering excellence</li>\n<li>Are motivated by developing AI responsibly and safely</li>\n<li>Are willing and able to travel frequently between Seattle and the SF Bay Area</li>\n</ul>\n<p>Strong candidates may also have experience with:</p>\n<ul>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Experience as a Product Manager</li>\n<li>Experience with deployment and capacity management automation</li>\n<li>Security and privacy best practice expertise</li>\n</ul>\n<p>Annual compensation range for this role is $405,000-$485,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0a872d93-7f6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5141377008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$405,000-$485,000 USD","x-skills-required":["Cloud Inference","AWS","Machine Learning","Infrastructure Management","Capacity Planning","Security and Privacy","Leadership","Communication","Collaboration"],"x-skills-preferred":["GPU","TPU","Trainium","NCCL","Product Management","Deployment Automation","Security Best Practices"],"datePosted":"2026-04-18T15:37:12.539Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cloud Inference, AWS, Machine Learning, Infrastructure Management, Capacity Planning, Security and Privacy, Leadership, 
Communication, Collaboration, GPU, TPU, Trainium, NCCL, Product Management, Deployment Automation, Security Best Practices","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0a7113f5-76c"},"title":"Engineering Manager, Cloud Inference AWS","description":"<p><strong>About the role</strong></p>\n<p>We are seeking an experienced Engineering Manager to lead the Cloud Inference team for AWS. You will lead your team to scale and optimize Claude to serve the massive audiences of developers and enterprise companies using AWS. You will own the end-to-end product of Claude on AWS, including API, load balancing, inference, capacity and operations. Your team will ensure our LLMs meet rigorous performance, safety and security standards and enhance our core infrastructure for packaging, testing, and deploying inference technology across the globe. Your work will increase the scale at which Anthropic operates and accelerate our ability to reliably launch new frontier models and innovative features to customers across all platforms.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Set technical strategy and oversee development of Claude on AWS across all layers of the technical stack.</li>\n<li>Collaborate across teams and companies to deeply understand product, infrastructure, operations and capacity needs, identifying potential solutions to support frontier LLM serving</li>\n<li>Work closely with cross-functional stakeholders across companies to align on goals and drive outcomes</li>\n<li>Create clarity for the team and stakeholders in an ambiguous and evolving environment</li>\n<li>Take an inclusive approach to hiring and coaching top technical talent, and support a high performing team</li>\n<li>Design and run processes (e.g. 
postmortem review, incident response, on-call rotations) that help the team operate effectively and never fail the same way twice</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 10+ years of experience in high-scale, high-reliability software development, particularly infrastructure or capacity management</li>\n<li>Have 5+ years of engineering management experience</li>\n<li>Experience recruiting, scaling, and retaining engineering talent in a high growth environment</li>\n<li>Have experience scaling products, resources and operations to accommodate rapid growth</li>\n<li>Are deeply interested in the potential transformative effects of advanced AI systems and are committed to ensuring their safe development</li>\n<li>Excel at building strong relationships and strategy with stakeholders across engineering, product, finance, and sales</li>\n<li>Have experience working with external partners to align goals and deliver impact</li>\n<li>Enjoy working in a fast-paced, early environment; comfortable with adapting priorities as driven by the rapidly evolving AI space</li>\n<li>Have excellent written and verbal communication skills</li>\n<li>Demonstrated success building a culture of belonging and engineering excellence</li>\n<li>Are motivated by developing AI responsibly and safely</li>\n<li>Are willing and able to travel frequently between Seattle and the SF Bay Area</li>\n</ul>\n<p><strong>Strong candidates may also have experience with:</strong></p>\n<ul>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Experience as a Product Manager</li>\n<li>Experience with deployment and capacity management automation</li>\n<li>Security and privacy best practice expertise</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent 
experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong> Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>\n<p><strong>Your safety matters to us.</strong> To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. 
We view AI research as a collaborative effort, and we work closely with other researchers, engineers, and experts to advance our understanding of AI and its applications.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0a7113f5-76c","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5141377008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$405,000 - $485,000 USD","x-skills-required":["high-scale, high-reliability software development","infrastructure or capacity management","engineering management","recruiting, scaling, and retaining engineering talent","scaling products, resources and operations","machine learning infrastructure","deployment and capacity management automation","security and privacy best practice expertise"],"x-skills-preferred":["experience with GPUs, TPUs, or Trainium","experience as a Product Manager","experience with networking infrastructure like NCCL"],"datePosted":"2026-03-08T13:56:51.226Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"high-scale, high-reliability software development, infrastructure or capacity management, engineering management, recruiting, scaling, and retaining engineering talent, scaling products, resources and operations, machine learning infrastructure, deployment and capacity management automation, security and privacy best practice expertise, experience with GPUs, TPUs, or Trainium, experience as a Product Manager, experience with networking infrastructure like 
NCCL","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_25934fbc-c50"},"title":"Staff / Senior Software Engineer, Cloud Inference","description":"<p><strong>About the Role</strong></p>\n<p>The Cloud Inference team scales and optimizes Claude to serve the massive audiences of developers and enterprise companies across AWS, GCP, Azure, and future cloud service providers (CSPs). We own the end-to-end product of Claude on each cloud platform—from API integration and intelligent request routing to inference execution, capacity management, and day-to-day operations.</p>\n<p>Our engineers are extremely high leverage: we simultaneously drive multiple major revenue streams while optimizing one of Anthropic&#39;s most precious resources—compute. As we expand to more cloud platforms, the complexity of managing inference efficiently across providers with different hardware, networking stacks, and operational models grows significantly. 
We need engineers who can navigate these platform differences, build robust abstractions that work across providers, and make smart infrastructure decisions that keep us cost-effective at massive scale.</p>\n<p>Your work will increase the scale at which our services operate, accelerate our ability to reliably launch new frontier models and innovative features to customers across all platforms, and ensure our LLMs meet rigorous safety, performance, and security standards.</p>\n<p><strong>What You&#39;ll Do</strong></p>\n<ul>\n<li>Design and build infrastructure that serves Claude across multiple CSPs, accounting for differences in compute hardware, networking, APIs, and operational models</li>\n<li>Collaborate with CSP partner engineering teams to resolve operational issues, influence provider roadmaps, and stand up end-to-end serving on new cloud platforms</li>\n<li>Design and evolve CI/CD automation systems, including validation and deployment pipelines, that reliably ship new model versions to millions of users across cloud platforms without regressions</li>\n<li>Design interfaces and tooling abstractions across CSPs that enable cost-effective inference management, scale across providers, and reduce per-platform complexity</li>\n<li>Contribute to capacity planning and autoscaling strategies that dynamically match supply with demand across CSP validation and production workloads</li>\n<li>Optimize inference cost and performance across providers—designing workload placement and routing systems that direct requests to the most cost-effective accelerator and region</li>\n<li>Contribute to inference features that must work consistently across all platforms</li>\n<li>Analyze observability data across providers to identify performance bottlenecks, cost anomalies, and regressions, and drive remediation based on real-world production workloads</li>\n</ul>\n<p><strong>You May Be a Good Fit If You:</strong></p>\n<ul>\n<li>Have significant software engineering experience, 
with a strong background in high-performance, large-scale distributed systems serving millions of users</li>\n<li>Have experience building or operating services on at least one major cloud platform (AWS, GCP, or Azure), with exposure to Kubernetes, Infrastructure as Code or container orchestration</li>\n<li>Have strong interest in inference</li>\n<li>Thrive in cross-functional collaboration with both internal teams and external partners</li>\n<li>Are a fast learner who can quickly ramp up on new technologies, hardware platforms, and provider ecosystems</li>\n<li>Are highly autonomous and self-driven, taking ownership of problems end-to-end with a bias toward flexibility and high-impact work</li>\n<li>Pick up slack, even when it goes outside your job description</li>\n</ul>\n<p><strong>Strong Candidates May Also Have Experience With</strong></p>\n<ul>\n<li>Direct experience working with CSP partner teams to scale infrastructure or products across multiple platforms, navigating differences in networking, security, privacy, billing, and managed service offerings</li>\n<li>A background in building platform-agnostic tooling or abstraction layers that work across cloud providers</li>\n<li>Hands-on experience with capacity management, cost optimization, or resource planning at scale across heterogeneous environments</li>\n<li>Strong familiarity with LLM inference optimization, batching, caching, and serving strategies</li>\n<li>Experience with Machine learning infrastructure including GPUs, TPUs, Trainium, or other AI accelerators</li>\n<li>Background designing and building CI/CD systems that automate deployment and validation across cloud environments</li>\n<li>Solid understanding of multi-region deployments, geographic routing, and global traffic management</li>\n<li>Proficiency in Python or Rust</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent 
experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_25934fbc-c50","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5107466008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$300,000 - $485,000 USD","x-skills-required":["Software engineering","Cloud infrastructure","Kubernetes","Infrastructure as Code","Container orchestration","LLM inference optimization","Batching","Caching","Serving strategies","Machine learning infrastructure","GPUs","TPUs","Trainium","AI accelerators","CI/CD systems","Deployment and validation","Cloud environments","Multi-region deployments","Geographic routing","Global traffic management"],"x-skills-preferred":["Python","Rust","Cloud platforms","Networking","Security","Privacy","Billing","Managed service offerings","Platform-agnostic tooling","Abstraction layers","Capacity management","Cost optimization","Resource planning"],"datePosted":"2026-03-08T13:49:59.956Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Software 
engineering, Cloud infrastructure, Kubernetes, Infrastructure as Code, Container orchestration, LLM inference optimization, Batching, Caching, Serving strategies, Machine learning infrastructure, GPUs, TPUs, Trainium, AI accelerators, CI/CD systems, Deployment and validation, Cloud environments, Multi-region deployments, Geographic routing, Global traffic management, Python, Rust, Cloud platforms, Networking, Security, Privacy, Billing, Managed service offerings, Platform-agnostic tooling, Abstraction layers, Capacity management, Cost optimization, Resource planning","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":300000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3b20b513-ea1"},"title":"Staff+ Software Engineer, Systems","description":"<p><strong>About Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>About the Role</strong></p>\n<p>Anthropic&#39;s Infrastructure organisation is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. 
The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users — demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand.</p>\n<p>The Systems engineering team owns compute uptime and resilience at massive scale, building the clusters, automation, and observability that make frontier AI research possible and safely deployable to customers.</p>\n<p>_Team Matching: Team matching is determined after the interview process based on interview performance, interests, and business priorities. Please note we may also consider you for different Infrastructure teams._</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Own the technical strategy and roadmap for your area, translating team-level goals into concrete execution plans</li>\n<li>Drive cross-team initiatives to build and scale AI clusters (thousands to hundreds of thousands of machines)</li>\n<li>Define infrastructure architecture, ensuring the hardest problems get solved — whether by you directly or by working through others</li>\n<li>Partner with cloud providers and internal stakeholders to shape long-term compute, data, and infrastructure strategy</li>\n<li>Establish and evolve operational excellence practices (incident response, postmortem culture, on-call)</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 10+ years of software engineering experience</li>\n<li>Have led complex, multi-quarter technical initiatives that span multiple teams or systems</li>\n<li>Can set technical direction for a team, not just execute within it</li>\n<li>Have deep expertise in distributed systems, reliability, and cloud platforms (Kubernetes, IaC, AWS/GCP)</li>\n<li>Are strong in at least one systems language (Python, Rust, Go, Java)</li>\n<li>Naturally uplevel the engineers around you and can redirect efforts when things are heading off 
track</li>\n<li>Build alignment across senior stakeholders and communicate effectively at all levels</li>\n</ul>\n<p><strong>Strong candidates may have:</strong></p>\n<ul>\n<li>Security and privacy best practice expertise</li>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Low level systems experience, for example linux kernel tuning and eBPF</li>\n<li>Technical expertise: Quickly understanding systems design tradeoffs, keeping track of rapidly evolving software systems</li>\n</ul>\n<p>_Deadline to apply: None. Applications will be reviewed on a rolling basis._</p>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</strong></p>\n<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. 
In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</strong></p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. 
This re</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3b20b513-ea1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5108817008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$405,000 - $485,000 USD","x-skills-required":["distributed systems","reliability","cloud platforms","Kubernetes","IaC","AWS/GCP","Python","Rust","Go","Java"],"x-skills-preferred":["security and privacy best practice expertise","machine learning infrastructure","GPUs","TPUs","Trainium","NCCL","low level systems experience","linux kernel tuning","eBPF"],"datePosted":"2026-03-08T13:49:17.054Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed systems, reliability, cloud platforms, Kubernetes, IaC, AWS/GCP, Python, Rust, Go, Java, security and privacy best practice expertise, machine learning infrastructure, GPUs, TPUs, Trainium, NCCL, low level systems experience, linux kernel tuning, eBPF","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_886a66bf-10d"},"title":"Senior Software Engineer, Systems","description":"<p><strong>About Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. 
We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>About the Role</strong></p>\n<p>Anthropic&#39;s Infrastructure organisation is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users — demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand.</p>\n<p>The Systems engineering team owns compute uptime and resilience at massive scale, building the clusters, automation, and observability that make frontier AI research possible and safely deployable to customers.</p>\n<p>_Team Matching: Team matching is determined after the interview process based on interview performance, interests, and business priorities. 
Please note we may also consider you for different Infrastructure teams._</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Lead infrastructure projects from design through delivery, owning scope, execution, and outcomes</li>\n<li>Build and maintain systems that support AI clusters at massive scale (thousands to hundreds of thousands of machines)</li>\n<li>Partner with cloud providers and internal teams to solve compute, networking, and reliability challenges</li>\n<li>Tackle difficult technical problems in your domain and proactively fill gaps in tooling, documentation, and processes</li>\n<li>Contribute to operational practices including incident response, postmortems, and on-call rotations</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 6+ years of software engineering experience</li>\n<li>Have led technical projects end-to-end over multiple months, including scoping, breaking down work, and driving delivery</li>\n<li>Have deep knowledge of distributed systems, reliability, and cloud platforms (Kubernetes, IaC, AWS/GCP)</li>\n<li>Are strong in at least one systems language (Python, Rust, Go, Java)</li>\n<li>Solve hard problems independently and know when to pull others in</li>\n<li>Help teammates grow through knowledge sharing and thoughtful technical guidance</li>\n<li>Communicate clearly in design docs, presentations, and cross-functional discussions</li>\n</ul>\n<p><strong>Strong candidates may have:</strong></p>\n<ul>\n<li>Security and privacy best practice expertise</li>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Low level systems experience, for example linux kernel tuning and eBPF</li>\n<li>Technical expertise: Quickly understanding systems design tradeoffs, keeping track of rapidly evolving software systems</li>\n</ul>\n<p>_Deadline to apply: None. 
Applications will be reviewed on a rolling basis._</p>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</strong></p>\n<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</strong></p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. 
At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_886a66bf-10d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4915842008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£240,000 - £325,000GBP","x-skills-required":["distributed systems","reliability","cloud platforms","Kubernetes","IaC","AWS/GCP","Python","Rust","Go","Java"],"x-skills-preferred":["security and privacy best practice expertise","machine learning infrastructure","GPUs","TPUs","Trainium","NCCL","low level systems experience","linux kernel tuning","eBPF"],"datePosted":"2026-03-08T13:46:27.991Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed systems, reliability, cloud platforms, Kubernetes, IaC, AWS/GCP, Python, Rust, Go, Java, 
security and privacy best practice expertise, machine learning infrastructure, GPUs, TPUs, Trainium, NCCL, low level systems experience, linux kernel tuning, eBPF","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":240000,"maxValue":325000,"unitText":"YEAR"}}}]}