{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/infrastructure-migrations"},"x-facet":{"type":"skill","slug":"infrastructure-migrations","display":"Infrastructure Migrations","count":2},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c4e35d55-5d1"},"title":"Technical Program Manager, Safeguards (Infrastructure & Evals)","description":"<p>Job Title: Technical Program Manager, Safeguards (Infrastructure &amp; Evals)</p>\n<p>About Anthropic</p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole.</p>\n<p>About the Role</p>\n<p>Safeguards Engineering builds and operates the infrastructure that keeps Anthropic&#39;s AI systems safe in production , the classifiers, detection pipelines, evaluation platforms, and monitoring systems that sit between our models and the real world. That infrastructure needs to be not just correct, but reliable: when a safety-critical pipeline goes down or degrades, the consequences can be serious, and they can be invisible until someone looks closely.</p>\n<p>As a Technical Program Manager for Safeguards Infrastructure and Evals, you&#39;ll own the operational health and forward momentum of this stack. 
Your primary responsibility is driving reliability: owning the incident-response and post-mortem process, ensuring SLOs are defined and met in partnership with various teams, and making sure that when things go wrong, the right people know, the right actions get taken, and those actions actually get closed out.</p>\n<p>Alongside that ongoing operational rhythm, you&#39;ll coordinate the larger platform investments: migrations, eval-platform improvements, and the cross-team dependencies that connect them. This role sits at the intersection of operations and program management. It requires genuine technical depth: you need to understand how these systems work well enough to triage effectively, judge what&#39;s actually safety-critical versus what can wait, and have informed conversations with the engineers building and maintaining them. But the core of the job is keeping the machine running well and the work moving.</p>\n<p>What You&#39;ll Do:</p>\n<ul>\n<li>Own the Safeguards Engineering ops review</li>\n<li>Drive the recurring cadence that keeps the team informed and coordinated: surfacing recent incidents and failures, bringing visibility to reliability trends, and making sure the right people are in the room when decisions need to be made.</li>\n<li>Drive incident tracking and post-mortem execution</li>\n<li>Establish and maintain SLOs with partner teams</li>\n<li>Maintain runbook quality and incident-ownership clarity</li>\n<li>Drive platform migrations and infrastructure projects</li>\n<li>Coordinate evals platform improvements</li>\n</ul>\n<p>You might be a good fit if you:</p>\n<ul>\n<li>Have solid technical program management experience, particularly in operational or infrastructure-heavy environments; you&#39;re comfortable owning a mix of ongoing operational cadences and discrete project work simultaneously.</li>\n<li>Understand how production ML systems work well enough to triage incidents intelligently and have substantive conversations with engineers 
about what&#39;s going wrong and why; you don&#39;t need to write the code, but you need to follow the technical thread.</li>\n<li>Are energized by closing loops. Post-mortem action items that never get done, SLOs that no one checks, runbooks that go stale: these things bother you, and you know how to build the processes and follow-ups that fix them.</li>\n<li>Can work effectively across team boundaries, comfortable coordinating with partner teams (like Inference) where you don&#39;t have direct authority, and skilled at keeping shared work moving through influence and clear communication.</li>\n<li>Thrive in environments where the work shifts between &#39;keep the lights on&#39; and &#39;build something new&#39;, and can context-switch between incident follow-ups and longer-horizon platform projects without dropping either.</li>\n<li>Have experience with or strong interest in AI safety: you understand why the reliability of a safety-critical pipeline is a different kind of problem than the reliability of a product feature, and that distinction motivates you.</li>\n</ul>\n<p>Strong candidates may also:</p>\n<ul>\n<li>Have experience with SRE practices, incident management frameworks, or on-call operations at scale.</li>\n<li>Have worked on or with evaluation infrastructure for ML systems, understanding how evals get designed, run, and interpreted.</li>\n<li>Have experience driving infrastructure migrations in complex, multi-team environments, particularly where the migration touches operational systems that can&#39;t go offline.</li>\n<li>Be familiar with monitoring and alerting tooling (PagerDuty, Datadog, or equivalents) and the operational culture around them.</li>\n</ul>\n<p>Deadline to apply: None, applications will be received on a rolling basis.</p>\n<p>The annual compensation range for this role is listed below. 
For sales roles, the range provided is the role&#39;s On Target Earnings (&#39;OTE&#39;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.</p>\n<p>Annual Salary: $290,000-$365,000 USD</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_c4e35d55-5d1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5108695008","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$290,000-$365,000 USD","x-skills-required":["Technical Program Management","Operational or Infrastructure-heavy environments","Production ML systems","Incident management frameworks","On-call operations","Evaluation infrastructure for ML systems","Infrastructure migrations","Monitoring and alerting tooling"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:56:34.910Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Technical Program Management, Operational or Infrastructure-heavy environments, Production ML systems, Incident management frameworks, On-call operations, Evaluation infrastructure for ML systems, Infrastructure migrations, Monitoring and alerting tooling","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":290000,"maxValue":365000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ca221b6f-dca"},"title":"Technical Program Manager, Safeguards 
(Infrastructure & Evals)","description":"<p><strong>About the Role</strong></p>\n<p>Safeguards Engineering builds and operates the infrastructure that keeps Anthropic&#39;s AI systems safe in production. As a Technical Program Manager for Safeguards Infrastructure and Evals, you&#39;ll own the operational health and forward momentum of this stack.</p>\n<p>Your primary responsibility is driving reliability: owning the incident-response and post-mortem process, ensuring SLOs are defined and met in partnership with various teams, and making sure that when things go wrong, the right people know, the right actions get taken, and those actions actually get closed out.</p>\n<p>Alongside that ongoing operational rhythm, you&#39;ll coordinate the larger platform investments: migrations, eval-platform improvements, and the cross-team dependencies that connect them.</p>\n<p>This role sits at the intersection of operations and program management. It requires genuine technical depth: you need to understand how these systems work well enough to triage effectively, judge what&#39;s actually safety-critical versus what can wait, and have informed conversations with the engineers building and maintaining them.</p>\n<p>But the core of the job is keeping the machine running well and the work moving.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Own the Safeguards Engineering ops review</li>\n<li>Drive the recurring cadence that keeps the team informed and coordinated: surfacing recent incidents and failures, bringing visibility to reliability trends, and making sure the right people are in the room when decisions need to be made.</li>\n<li>Drive incident tracking and post-mortem execution</li>\n<li>Establish and maintain SLOs with partner teams</li>\n<li>Maintain runbook quality and incident-ownership clarity</li>\n<li>Drive platform migrations and infrastructure projects</li>\n<li>Coordinate evals platform 
improvements</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Solid technical program management experience, particularly in operational or infrastructure-heavy environments</li>\n<li>Understanding of how production ML systems work well enough to triage incidents intelligently and have substantive conversations with engineers about what&#39;s going wrong and why</li>\n<li>Ability to work effectively across team boundaries</li>\n<li>Experience with or strong interest in AI safety</li>\n</ul>\n<p><strong>Nice to Have</strong></p>\n<ul>\n<li>Experience with SRE practices, incident management frameworks, or on-call operations at scale</li>\n<li>Familiarity with monitoring and alerting tooling (PagerDuty, Datadog, or equivalents)</li>\n<li>Experience driving infrastructure migrations in complex, multi-team environments</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ca221b6f-dca","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://anthropic.ai/","logo":"https://logos.yubhub.co/anthropic.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5108695008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$290,000-$365,000 USD","x-skills-required":["Technical Program Management","Operational or Infrastructure-heavy Environments","Production ML Systems","Incident Tracking and Post-Mortem Execution","Service-Level Objectives (SLOs)","Runbook Quality and Incident-Ownership Clarity","Platform Migrations and Infrastructure Projects","Evals Platform Improvements"],"x-skills-preferred":["SRE Practices","Incident Management Frameworks","On-Call Operations at Scale","Monitoring and Alerting Tooling","Infrastructure Migrations in Complex, Multi-Team 
Environments"],"datePosted":"2026-04-18T15:55:20.655Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Technical Program Management, Operational or Infrastructure-heavy Environments, Production ML Systems, Incident Tracking and Post-Mortem Execution, Service-Level Objectives (SLOs), Runbook Quality and Incident-Ownership Clarity, Platform Migrations and Infrastructure Projects, Evals Platform Improvements, SRE Practices, Incident Management Frameworks, On-Call Operations at Scale, Monitoring and Alerting Tooling, Infrastructure Migrations in Complex, Multi-Team Environments","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":290000,"maxValue":365000,"unitText":"YEAR"}}}]}