{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/production-ml-systems"},"x-facet":{"type":"skill","slug":"production-ml-systems","display":"Production Ml Systems","count":6},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b1be4c11-417"},"title":"Senior Research Scientist, Reward Models","description":"<p>As a Senior Research Scientist on our Reward Models team, you&#39;ll lead research efforts to improve how we specify and learn human preferences at scale. Your work will directly shape how our models understand and optimize for what humans actually want , enabling Claude to be more useful, more reliable, and better aligned with human values.</p>\n<p>This role focuses on pushing the frontier of reward modeling for large language models. You&#39;ll develop novel architectures and training methodologies for RLHF, research new approaches to LLM-based evaluation and grading (including rubric-based methods), and investigate techniques to identify and mitigate reward hacking. You&#39;ll collaborate closely with teams across Anthropic, including Finetuning, Alignment Science, and our broader research organization, to ensure your work translates into concrete improvements in both model capabilities and safety.</p>\n<p>We&#39;re looking for someone who can drive ambitious research agendas while also shipping practical improvements to production systems. You&#39;ll have the opportunity to work on some of the most important open problems in AI alignment, with access to frontier models and significant computational resources. 
Your work will directly advance the science of how we train AI systems to be both highly capable and safe.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Lead research on novel reward model architectures and training approaches for RLHF</li>\n<li>Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches that improve consistency and interpretability</li>\n<li>Research techniques to detect, characterize, and mitigate reward hacking and specification gaming</li>\n<li>Design experiments to understand reward model generalization, robustness, and failure modes</li>\n<li>Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines</li>\n<li>Contribute to research publications, blog posts, and internal documentation</li>\n<li>Mentor other researchers and help build institutional knowledge around reward modeling</li>\n</ul>\n<p>You may be a good fit if you</p>\n<ul>\n<li>Have a track record of research contributions in reward modeling, RLHF, or closely related areas of machine learning</li>\n<li>Have experience training and evaluating reward models for large language models</li>\n<li>Are comfortable designing and running large-scale experiments with significant computational resources</li>\n<li>Can work effectively across research and engineering, iterating quickly while maintaining scientific rigor</li>\n<li>Enjoy collaborative research and can communicate complex ideas clearly to diverse audiences</li>\n<li>Care deeply about building AI systems that are both highly capable and safe</li>\n</ul>\n<p>Strong candidates may also</p>\n<ul>\n<li>Have published research on reward modeling, preference learning, or RLHF</li>\n<li>Have experience with LLM-as-judge approaches, including calibration and reliability challenges</li>\n<li>Have worked on reward hacking, specification gaming, or related robustness problems</li>\n<li>Have experience with constitutional AI, debate, or other scalable oversight approaches</li>\n<li>Have contributed to production ML systems at scale</li>\n<li>Have familiarity with interpretability techniques as applied to understanding reward model behavior</li>\n</ul>\n<p>The annual compensation range for this role is $350,000-$500,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b1be4c11-417","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5024835008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000-$500,000 USD","x-skills-required":["reward modeling","RLHF","LLM-based evaluation and grading","rubric-driven approaches","reward hacking","specification gaming","large-scale experiments","computational resources","research and engineering","collaborative research","complex ideas communication","AI systems development"],"x-skills-preferred":["published research","LLM-as-judge approaches","calibration and reliability challenges","constitutional AI","debate","scalable oversight approaches","production ML systems","interpretability techniques"],"datePosted":"2026-04-18T15:57:50.755Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote-Friendly (Travel Required) | San Francisco, 
CA"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"reward modeling, RLHF, LLM-based evaluation and grading, rubric-driven approaches, reward hacking, specification gaming, large-scale experiments, computational resources, research and engineering, collaborative research, complex ideas communication, AI systems development, published research, LLM-as-judge approaches, calibration and reliability challenges, constitutional AI, debate, scalable oversight approaches, production ML systems, interpretability techniques","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":500000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c4e35d55-5d1"},"title":"Technical Program Manager, Safeguards (Infrastructure & Evals)","description":"<p>Job Title: Technical Program Manager, Safeguards (Infrastructure &amp; Evals)</p>\n<p>About Anthropic</p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole.</p>\n<p>About the Role</p>\n<p>Safeguards Engineering builds and operates the infrastructure that keeps Anthropic&#39;s AI systems safe in production , the classifiers, detection pipelines, evaluation platforms, and monitoring systems that sit between our models and the real world. That infrastructure needs to be not just correct, but reliable: when a safety-critical pipeline goes down or degrades, the consequences can be serious, and they can be invisible until someone looks closely.</p>\n<p>As a Technical Program Manager for Safeguards Infrastructure and Evals, you&#39;ll own the operational health and forward momentum of this stack. Your primary responsibility is driving reliability , owning the incident-response and post-mortem process, ensuring SLOs are defined and met in partnership with various teams, and making sure that when things go wrong, the right people know, the right actions get taken, and those actions actually get closed out.</p>\n<p>Alongside that ongoing operational rhythm, you&#39;ll coordinate the larger platform investments: migrations, eval-platform improvements, and the cross-team dependencies that connect them. This role sits at the intersection of operations and program management. It requires genuine technical depth , you need to understand how these systems work well enough to triage effectively, judge what&#39;s actually safety-critical versus what can wait, and have informed conversations with the engineers building and maintaining them. 
But the core of the job is keeping the machine running well and the work moving.</p>\n<p>What You&#39;ll Do:</p>\n<ul>\n<li>Own the Safeguards Engineering ops review</li>\n<li>Drive the recurring cadence that keeps the team informed and coordinated: surfacing recent incidents and failures, bringing visibility to reliability trends, and making sure the right people are in the room when decisions need to be made.</li>\n<li>Drive incident tracking and post-mortem execution</li>\n<li>Establish and maintain SLOs with partner teams</li>\n<li>Maintain runbook quality and incident-ownership clarity</li>\n<li>Drive platform migrations and infrastructure projects</li>\n<li>Coordinate evals platform improvements</li>\n</ul>\n<p>You might be a good fit if you:</p>\n<ul>\n<li>Have solid technical program management experience, particularly in operational or infrastructure-heavy environments; you&#39;re comfortable owning a mix of ongoing operational cadences and discrete project work simultaneously.</li>\n<li>Understand how production ML systems work well enough to triage incidents intelligently and have substantive conversations with engineers about what&#39;s going wrong and why; you don&#39;t need to write the code, but you need to follow the technical thread.</li>\n<li>Are energized by closing loops. Post-mortem action items that never get done, SLOs that no one checks, runbooks that go stale: these things bother you, and you know how to build the processes and follow-ups that fix them.</li>\n<li>Can work effectively across team boundaries, comfortable coordinating with partner teams (like Inference) where you don&#39;t have direct authority, and skilled at keeping shared work moving through influence and clear communication.</li>\n<li>Thrive in environments where the work shifts between &#39;keep the lights on&#39; and &#39;build something new&#39;, and can context-switch between incident follow-ups and longer-horizon platform projects without dropping either.</li>\n<li>Have experience with or strong interest in AI safety; you understand why the reliability of a safety-critical pipeline is a different kind of problem than the reliability of a product feature, and that distinction motivates you.</li>\n</ul>\n<p>Strong candidates may also:</p>\n<ul>\n<li>Have experience with SRE practices, incident management frameworks, or on-call operations at scale.</li>\n<li>Have worked on or with evaluation infrastructure for ML systems, understanding how evals get designed, run, and interpreted.</li>\n<li>Have experience driving infrastructure migrations in complex, multi-team environments, particularly where the migration touches operational systems that can&#39;t go offline.</li>\n<li>Be familiar with monitoring and alerting tooling (PagerDuty, Datadog, or equivalents) and the operational culture around them.</li>\n</ul>\n<p>Deadline to apply: None, applications will be received on a rolling basis.</p>\n<p>The annual compensation range for this role is listed below. 
For sales roles, the range provided is the role&#39;s On Target Earnings (&#39;OTE&#39;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.</p>\n<p>Annual Salary: $290,000-$365,000 USD</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_c4e35d55-5d1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5108695008","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$290,000-$365,000 USD","x-skills-required":["Technical Program Management","Operational or Infrastructure-heavy environments","Production ML systems","Incident management frameworks","On-call operations","Evaluation infrastructure for ML systems","Infrastructure migrations","Monitoring and alerting tooling"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:56:34.910Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Technical Program Management, Operational or Infrastructure-heavy environments, Production ML systems, Incident management frameworks, On-call operations, Evaluation infrastructure for ML systems, Infrastructure migrations, Monitoring and alerting tooling","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":290000,"maxValue":365000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ca221b6f-dca"},"title":"Technical Program Manager, Safeguards (Infrastructure & Evals)","description":"<p><strong>About the Role</strong></p>\n<p>Safeguards Engineering builds and operates the infrastructure that keeps Anthropic&#39;s AI systems safe in production. As a Technical Program Manager for Safeguards Infrastructure and Evals, you&#39;ll own the operational health and forward momentum of this stack.</p>\n<p>Your primary responsibility is driving reliability: owning the incident-response and post-mortem process, ensuring SLOs are defined and met in partnership with various teams, and making sure that when things go wrong, the right people know, the right actions get taken, and those actions actually get closed out.</p>\n<p>Alongside that ongoing operational rhythm, you&#39;ll coordinate the larger platform investments: migrations, eval-platform improvements, and the cross-team dependencies that connect them.</p>\n<p>This role sits at the intersection of operations and program management. 
It requires genuine technical depth: you need to understand how these systems work well enough to triage effectively, judge what&#39;s actually safety-critical versus what can wait, and have informed conversations with the engineers building and maintaining them.</p>\n<p>But the core of the job is keeping the machine running well and the work moving.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Own the Safeguards Engineering ops review</li>\n<li>Drive the recurring cadence that keeps the team informed and coordinated: surfacing recent incidents and failures, bringing visibility to reliability trends, and making sure the right people are in the room when decisions need to be made.</li>\n<li>Drive incident tracking and post-mortem execution</li>\n<li>Establish and maintain SLOs with partner teams</li>\n<li>Maintain runbook quality and incident-ownership clarity</li>\n<li>Drive platform migrations and infrastructure projects</li>\n<li>Coordinate evals platform improvements</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Solid technical program management experience, particularly in operational or infrastructure-heavy environments</li>\n<li>Understanding of how production ML systems work well enough to triage incidents intelligently and have substantive conversations with engineers about what&#39;s going wrong and why</li>\n<li>Ability to work effectively across team boundaries</li>\n<li>Experience with or strong interest in AI safety</li>\n</ul>\n<p><strong>Nice to Have</strong></p>\n<ul>\n<li>Experience with SRE practices, incident management frameworks, or on-call operations at scale</li>\n<li>Familiarity with monitoring and alerting tooling (PagerDuty, Datadog, or equivalents)</li>\n<li>Experience driving infrastructure migrations in complex, multi-team environments</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ca221b6f-dca","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://anthropic.ai/","logo":"https://logos.yubhub.co/anthropic.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5108695008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$290,000-$365,000 USD","x-skills-required":["Technical Program Management","Operational or Infrastructure-heavy Environments","Production ML Systems","Incident Tracking and Post-Mortem Execution","Service-Level Objectives (SLOs)","Runbook Quality and Incident-Ownership Clarity","Platform Migrations and Infrastructure Projects","Evals Platform Improvements"],"x-skills-preferred":["SRE Practices","Incident Management Frameworks","On-Call Operations at Scale","Monitoring and Alerting Tooling","Infrastructure Migrations in Complex, Multi-Team Environments"],"datePosted":"2026-04-18T15:55:20.655Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Technical Program Management, Operational or Infrastructure-heavy Environments, Production ML Systems, Incident Tracking and Post-Mortem Execution, Service-Level Objectives (SLOs), Runbook Quality and Incident-Ownership Clarity, Platform Migrations and Infrastructure Projects, Evals Platform Improvements, SRE Practices, Incident Management 
Frameworks, On-Call Operations at Scale, Monitoring and Alerting Tooling, Infrastructure Migrations in Complex, Multi-Team Environments","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":290000,"maxValue":365000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5d38ab71-400"},"title":"Research Engineer, Pretraining Scaling","description":"<p><strong>About Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>About the Role:</strong></p>\n<p>Anthropic&#39;s ML Performance and Scaling team trains our production pretrained models, work that directly shapes the company&#39;s future and our mission to build safe, beneficial AI systems. As a Research Engineer on this team, you&#39;ll ensure our frontier models train reliably, efficiently, and at scale. This is demanding, high-impact work that requires both deep technical expertise and a genuine passion for the craft of large-scale ML systems.</p>\n<p>This role lives at the boundary between research and engineering. You&#39;ll work across our entire production training stack: performance optimisation, hardware debugging, experimental design, and launch coordination. During launches, the team works in tight lockstep, responding to production issues that can&#39;t wait for tomorrow.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Own critical aspects of our production pretraining pipeline, including model operations, performance optimisation, observability, and reliability</li>\n<li>Debug and resolve complex issues across the full stack—from hardware errors and networking to training dynamics and evaluation infrastructure</li>\n<li>Design and run experiments to improve training efficiency, reduce step time, increase uptime, and enhance model performance</li>\n<li>Respond to on-call incidents during model launches, diagnosing problems quickly and coordinating solutions across teams</li>\n<li>Build and maintain production logging, monitoring dashboards, and evaluation infrastructure</li>\n<li>Add new capabilities to the training codebase, such as long context support or novel architectures</li>\n<li>Collaborate closely with teammates across SF and London, as well as with Tokens, Architectures, and Systems teams</li>\n<li>Contribute to the team&#39;s institutional knowledge by documenting systems, debugging approaches, and lessons learned</li>\n</ul>\n<p><strong>You May Be a Good Fit If You:</strong></p>\n<ul>\n<li>Have hands-on experience training large language models, or deep expertise with JAX, TPU, PyTorch, or large-scale distributed systems</li>\n<li>Genuinely enjoy both research and engineering work—you&#39;d describe your ideal split as roughly 50/50 rather than heavily weighted toward one or the other</li>\n<li>Are excited about being on-call for production systems, working long days during launches, and solving hard problems under pressure</li>\n<li>Thrive when working on whatever is most impactful, even if that changes day-to-day based on what the production model needs</li>\n<li>Excel at debugging complex, ambiguous problems across multiple layers of the 
stack</li>\n<li>Communicate clearly and collaborate effectively, especially when coordinating across time zones or during high-stress incidents</li>\n<li>Are passionate about the work itself and want to refine your craft as a research engineer</li>\n<li>Care about the societal impacts of AI and responsible scaling</li>\n</ul>\n<p><strong>Strong Candidates May Also Have:</strong></p>\n<ul>\n<li>Previous experience training LLMs or working extensively with JAX/TPU, PyTorch, or other ML frameworks at scale</li>\n<li>Contributed to open-source LLM frameworks (e.g., open_lm, llm-foundry, mesh-transformer-jax)</li>\n<li>Published research on model training, scaling laws, or ML systems</li>\n<li>Experience with production ML systems, observability tools, or evaluation infrastructure</li>\n<li>Background as a systems engineer, quant, or in other roles requiring both technical depth and operational excellence</li>\n</ul>\n<p><strong>What Makes This Role Unique:</strong></p>\n<p>This is not a typical research engineering role. The work is highly operational—you&#39;ll be deeply involved in keeping our production models training smoothly, which means being responsive to incidents, flexible about priorities, and comfortable with uncertainty. During launches, the team often works extended hours and may need to respond to issues on evenings and weekends.</p>\n<p>However, this operational intensity comes with extraordinary learning opportunities. You&#39;ll gain hands-on experience with some of the largest, most sophisticated training runs in the industry. You&#39;ll work alongside world-class researchers and engineers, and the institutional knowledge you build will compound in ways that can&#39;t be easily transferred. For people who thrive on this type of work, it&#39;s uniquely rewarding.</p>\n<p>We&#39;re building a close-knit team of people who genuinely care about doing excellent work together. If you&#39;re someone who wants to be part of training the models that will define the future of AI—and you&#39;re excited about the full reality of what that entails—we&#39;d love to hear from you.</p>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. 
But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong></p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_5d38ab71-400","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4938432008","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000 - $850,000 USD","x-skills-required":["JAX","TPU","PyTorch","large-scale distributed systems","model operations","performance optimisation","observability","reliability","model training","scaling laws","ML systems"],"x-skills-preferred":["open-source LLM frameworks","production ML systems","observability tools","evaluation infrastructure","systems engineer","quant"],"datePosted":"2026-03-08T13:48:54.589Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"JAX, TPU, PyTorch, large-scale distributed systems, model operations, performance optimisation, observability, reliability, model training, scaling laws, ML systems, open-source LLM frameworks, production ML systems, observability tools, evaluation infrastructure, systems engineer, quant","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_912450ea-c61"},"title":"Research Engineer, Environment Scaling","description":"<p><strong>About the role</strong></p>\n<p>The Environment Scaling team is a team of researchers and engineers whose goal is to improve the intelligence of our public models for novel verticals and use cases. The team builds the training environments that fuel RL at scale. This is a unique role that combines executing directly on ML research, data operations, and project management to improve our models. 
You&#39;ll own the end-to-end process of creating RL environments for new capabilities: identifying high-value tasks, designing reward signals, managing vendor relationships, and measuring impact on model performance.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Improve and execute our fine-tuning strategies for adapting Claude to new domains and tasks</li>\n<li>Manage technical relationships with external data vendors, including evaluation of data quality and reward design</li>\n<li>Collaborate with domain experts to design data pipelines and evaluations</li>\n<li>Explore novel ways of creating RL environments for high value tasks</li>\n<li>Develop and improve QA frameworks to catch reward hacking and ensure environment quality</li>\n<li>Partner with other RL research teams and product teams to translate capability goals into training environments and evals</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have experience with fine-tuning large language models for specific domains or real-world use cases and/or domain expertise in an area where we would like to make our models more useful.</li>\n<li>Have experience with reinforcement learning, reward design, or training data curation for LLMs</li>\n<li>Are comfortable managing technical vendor relationships and iterating quickly on feedback</li>\n<li>Find value in reading through datasets to understand them and spot issues</li>\n<li>Have strong project management and interpersonal skills</li>\n<li>Are passionate about making AI more useful and accessible across different industries</li>\n<li>Are excited about a role that includes a combination of ML research, data operations, and project management</li>\n</ul>\n<p><strong>Strong candidates may also:</strong></p>\n<ul>\n<li>Have experience training production ML systems</li>\n<li>Be familiar with distributed systems and cloud infrastructure</li>\n<li>Have domain expertise in an area where we would like to make our models more useful</li>\n<li>Have experience working with external vendors or technical partners</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<ul>\n<li>Education requirements: We require at least a Bachelor&#39;s degree in a related field or equivalent experience.</li>\n<li>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>\n<li>Visa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</strong></p>\n<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. 
Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</strong></p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI &amp; Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.</p>\n<p><strong>Come work with us!</strong></p>\n<p>Anthropic is a public benefit corporation headquartered in San Francisco, CA.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_912450ea-c61","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4951064008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000 - $850,000 USD","x-skills-required":["fine-tuning large language models","reinforcement learning","reward design","training data curation","project management","interpersonal skills"],"x-skills-preferred":["experience training production ML systems","distributed systems and cloud infrastructure","domain expertise in an area where we would like to make our models more useful","experience working with external vendors or technical partners"],"datePosted":"2026-03-08T13:47:17.433Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"fine-tuning large language models, reinforcement learning, reward design, training data curation, project management, interpersonal skills, experience training production ML systems, distributed systems and cloud infrastructure, domain expertise in an area where we would like to make our models more useful, experience working with external vendors or technical partners","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a05bfa1a-d23"},"title":"Research Engineer, Pretraining Scaling","description":"<p><strong>About Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. 
We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>About the Role:</strong></p>\n<p>Anthropic&#39;s ML Performance and Scaling team trains our production pretrained models, work that directly shapes the company&#39;s future and our mission to build safe, beneficial AI systems. As a Research Engineer on this team, you&#39;ll ensure our frontier models train reliably, efficiently, and at scale. This is demanding, high-impact work that requires both deep technical expertise and a genuine passion for the craft of large-scale ML systems.</p>\n<p>This role lives at the boundary between research and engineering. You&#39;ll work across our entire production training stack: performance optimization, hardware debugging, experimental design, and launch coordination. During launches, the team works in tight lockstep, responding to production issues that can&#39;t wait for tomorrow.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Own critical aspects of our production pretraining pipeline, including model operations, performance optimization, observability, and reliability</li>\n<li>Debug and resolve complex issues across the full stack—from hardware errors and networking to training dynamics and evaluation infrastructure</li>\n<li>Design and run experiments to improve training efficiency, reduce step time, increase uptime, and enhance model performance</li>\n<li>Respond to on-call incidents during model launches, diagnosing problems quickly and coordinating solutions across teams</li>\n<li>Build and maintain production logging, monitoring dashboards, and evaluation infrastructure</li>\n<li>Add new capabilities to the training codebase, such as long context support or novel architectures</li>\n<li>Collaborate closely with teammates across SF and London, as well as with Tokens, Architectures, and Systems teams</li>\n<li>Contribute to the team&#39;s institutional knowledge by documenting systems, debugging approaches, and lessons learned</li>\n</ul>\n<p><strong>You May Be a Good Fit If You:</strong></p>\n<ul>\n<li>Have hands-on experience training large language models, or deep expertise with JAX, TPU, PyTorch, or large-scale distributed systems</li>\n<li>Genuinely enjoy both research and engineering work—you&#39;d describe your ideal split as roughly 50/50 rather than heavily weighted toward one or the other</li>\n<li>Are excited about being on-call for production systems, working long days during launches, and solving hard problems under pressure</li>\n<li>Thrive when working on whatever is most impactful, even if that changes day-to-day based on what the production model needs</li>\n<li>Excel at debugging complex, ambiguous problems across multiple layers of the stack</li>\n<li>Communicate clearly and collaborate effectively, especially when coordinating across time zones or during high-stress incidents</li>\n<li>Are passionate about the work itself and want to refine your craft as a research engineer</li>\n<li>Care about the societal impacts of AI and responsible scaling</li>\n</ul>\n<p><strong>Strong Candidates May Also Have:</strong></p>\n<ul>\n<li>Previous experience training LLMs or working extensively with JAX/TPU, PyTorch, or other ML frameworks at scale</li>\n<li>Contributed to open-source LLM frameworks (e.g., open_lm, llm-foundry, mesh-transformer-jax)</li>\n<li>Published 
research on model training, scaling laws, or ML systems</li>\n<li>Experience with production ML systems, observability tools, or evaluation infrastructure</li>\n<li>Background as a systems engineer, quant, or in other roles requiring both technical depth and operational excellence</li>\n</ul>\n<p><strong>What Makes This Role Unique:</strong></p>\n<p>This is not a typical research engineering role. The work is highly operational—you&#39;ll be deeply involved in keeping our production models training smoothly, which means being responsive to incidents, flexible about priorities, and comfortable with uncertainty. During launches, the team often works extended hours and may need to respond to issues on evenings and weekends.</p>\n<p>However, this operational intensity comes with extraordinary learning opportunities. You&#39;ll gain hands-on experience with some of the largest, most sophisticated training runs in the industry. You&#39;ll work alongside world-class researchers and engineers, and the institutional knowledge you build will compound in ways that can&#39;t be easily transferred. For people who thrive on this type of work, it&#39;s uniquely rewarding.</p>\n<p>We&#39;re building a close-knit team of people who genuinely care about doing excellent work together. If you&#39;re someone who wants to be part of training the models that will define the future of AI—and you&#39;re excited about the full reality of what that entails—we&#39;d love to hear from you.</p>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. 
But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a05bfa1a-d23","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4938436008","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"£260,000 - £630,000 GBP","x-skills-required":["JAX","TPU","PyTorch","large-scale distributed systems","model operations","performance optimization","observability","reliability","debugging","experimental design","launch coordination","production logging","monitoring dashboards","evaluation infrastructure","collaboration","communication"],"x-skills-preferred":["open-source LLM frameworks","research on model training","scaling laws","ML systems","production ML systems","observability tools","evaluation infrastructure","systems engineering","quant","operational excellence"],"datePosted":"2026-03-08T13:44:15.893Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"JAX, TPU, PyTorch, large-scale distributed systems, model operations, performance optimization, observability, reliability, debugging, experimental design, launch coordination, production logging, monitoring dashboards, evaluation infrastructure, collaboration, communication, open-source LLM frameworks, research on model training, scaling laws, ML systems, production ML systems, observability tools, evaluation infrastructure, systems engineering, quant, operational excellence","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":260000,"maxValue":630000,"unitText":"YEAR"}}}]}