{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/interpretability"},"x-facet":{"type":"skill","slug":"interpretability","display":"Interpretability","count":11},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_1c4de3ab-a58"},"title":"Machine Learning Engineer, Global Public Sector","description":"<p>We&#39;re hiring a Machine Learning Engineer to bridge the gap between frontier research and real-world impact. 
As a key member of our GPS Engineering team, you will lead the charge in research into Agent design, Deep Research and AI Safety/reliability, developing novel methodologies that not only power public sector applications but set new standards across the entire Scale organisation.</p>\n<p>Your mission is threefold:</p>\n<ul>\n<li>Frontier Research &amp; Publication: Leading research into LLM/agent capabilities, reasoning, and safety, with the goal of publishing at top-tier venues (NeurIPS, ICML, ICLR).</li>\n<li>Cross-Org Impact: Developing generalised techniques in Agent design, AI Safety and Deep Research agents that scale across our commercial and government platforms.</li>\n<li>Mission-Critical Applications: Engineering high-stakes AI systems that impact millions of citizens globally.</li>\n</ul>\n<p>You will:</p>\n<ul>\n<li>Pioneer Novel Architectures: Design and train state-of-the-art models and agents, moving beyond “off-the-shelf” solutions to create custom architectures for complex public sector reasoning tasks.</li>\n<li>Lead AI Safety Initiatives: Research and implement robust safety frameworks, including red teaming, alignment (RLHF/DPO), and bias mitigation strategies essential for sovereign AI.</li>\n<li>Drive Deep Research Capabilities: Develop agents capable of long-horizon reasoning and autonomous information synthesis to solve complex problems for national security and public policy.</li>\n<li>Publish and Contribute: Represent Scale in the broader research community by publishing high-impact papers and contributing to open-source breakthroughs.</li>\n<li>Consult as a Subject Matter Expert: Act as a technical authority for public sector leaders, advising on the theoretical limits and safety requirements of emerging AI.</li>\n<li>Build Evaluation Frontiers: Create new benchmarks and evaluation protocols that define what success looks like for high-stakes, non-commercial AI applications.</li>\n</ul>\n<p>Ideally, you’d have:</p>\n<ul>\n<li>Advanced 
Degree: PhD or Master’s in Computer Science, Mathematics, or a related field with a focus on Deep Learning.</li>\n<li>Research Track Record: A portfolio of first-author publications at major conferences (NeurIPS, ICML, CVPR, EMNLP, etc.).</li>\n<li>Engineering Rigour: Strong proficiency in Python, deep learning frameworks (PyTorch/JAX), with the ability to write production-ready code that scales.</li>\n<li>Safety Expertise: Experience in alignment, robustness, or interpretability research.</li>\n</ul>\n<p>Nice to haves:</p>\n<ul>\n<li>Experience with large-scale distributed training on massive clusters.</li>\n<li>Experience in building agentic systems that are reliable.</li>\n<li>Experience in Sovereign AI or working with highly regulated data environments.</li>\n<li>A zero-to-one mindset: Comfortable navigating ambiguity and defining research directions from scratch.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_1c4de3ab-a58","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4413274005","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Python","Deep Learning","PyTorch","JAX","AI Safety","Alignment","Robustness","Interpretability"],"x-skills-preferred":["Large-scale Distributed Training","Agentic Systems","Sovereign AI","Regulated Data Environments"],"datePosted":"2026-04-18T15:59:21.005Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Doha, Qatar; London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Deep Learning, PyTorch, JAX, AI Safety, Alignment, Robustness, Interpretability, 
Large-scale Distributed Training, Agentic Systems, Sovereign AI, Regulated Data Environments"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ba66dcb1-8d9"},"title":"Research Scientist, AI Controls and Monitoring","description":"<p>We&#39;re seeking a Research Scientist to join our team focused on AI Controls and Monitoring. As a key member of our team, you will design methods, systems, and experiments to ensure that advanced AI models and agents remain aligned with intended goals, even in high-stakes or adversarial environments.</p>\n<p>Your responsibilities will include developing monitoring techniques and observability methods, researching mechanisms for layered control, and designing red-team simulations to probe weaknesses in oversight and control mechanisms.</p>\n<p>To succeed in this role, you&#39;ll need a strong background in machine learning, particularly in generative AI, and at least three years of experience addressing sophisticated ML problems. 
You should be comfortable designing control and monitoring experiments for AI systems, building prototype systems, and quickly turning new ideas from the research literature into working prototypes.</p>\n<p>In addition to your technical expertise, you&#39;ll need strong written and verbal communication skills to operate in a cross-functional team.</p>\n<p>This role offers a competitive salary range of $216,000-$270,000 USD, depending on location and experience, as well as equity-based compensation and benefits, including comprehensive health, dental, and vision coverage, retirement benefits, a learning and development stipend, and generous PTO.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ba66dcb1-8d9","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4675694005","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$216,000-$270,000 USD","x-skills-required":["Machine Learning","Generative AI","AI Control Protocols","AI Risk Evaluations","Runtime Monitoring","Anomaly Detection","Observability"],"x-skills-preferred":["Post-Training and RL Techniques","Scalable Oversight","Interpretability","Debate"],"datePosted":"2026-04-18T15:58:38.219Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA; New York, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Machine Learning, Generative AI, AI Control Protocols, AI Risk Evaluations, Runtime Monitoring, Anomaly Detection, Observability, Post-Training and RL Techniques, Scalable Oversight, Interpretability, 
Debate","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":216000,"maxValue":270000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b1be4c11-417"},"title":"Senior Research Scientist, Reward Models","description":"<p>As a Senior Research Scientist on our Reward Models team, you&#39;ll lead research efforts to improve how we specify and learn human preferences at scale. Your work will directly shape how our models understand and optimize for what humans actually want, enabling Claude to be more useful, more reliable, and better aligned with human values.</p>\n<p>This role focuses on pushing the frontier of reward modeling for large language models. You&#39;ll develop novel architectures and training methodologies for RLHF, research new approaches to LLM-based evaluation and grading (including rubric-based methods), and investigate techniques to identify and mitigate reward hacking. You&#39;ll collaborate closely with teams across Anthropic, including Finetuning, Alignment Science, and our broader research organization, to ensure your work translates into concrete improvements in both model capabilities and safety.</p>\n<p>We&#39;re looking for someone who can drive ambitious research agendas while also shipping practical improvements to production systems. You&#39;ll have the opportunity to work on some of the most important open problems in AI alignment, with access to frontier models and significant computational resources. 
Your work will directly advance the science of how we train AI systems to be both highly capable and safe.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Lead research on novel reward model architectures and training approaches for RLHF</li>\n<li>Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches that improve consistency and interpretability</li>\n<li>Research techniques to detect, characterize, and mitigate reward hacking and specification gaming</li>\n<li>Design experiments to understand reward model generalization, robustness, and failure modes</li>\n<li>Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines</li>\n<li>Contribute to research publications, blog posts, and internal documentation</li>\n<li>Mentor other researchers and help build institutional knowledge around reward modeling</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have a track record of research contributions in reward modeling, RLHF, or closely related areas of machine learning</li>\n<li>Have experience training and evaluating reward models for large language models</li>\n<li>Are comfortable designing and running large-scale experiments with significant computational resources</li>\n<li>Can work effectively across research and engineering, iterating quickly while maintaining scientific rigor</li>\n<li>Enjoy collaborative research and can communicate complex ideas clearly to diverse audiences</li>\n<li>Care deeply about building AI systems that are both highly capable and safe</li>\n</ul>\n<p>Strong candidates may also:</p>\n<ul>\n<li>Have published research on reward modeling, preference learning, or RLHF</li>\n<li>Have experience with LLM-as-judge approaches, including calibration and reliability challenges</li>\n<li>Have worked on reward hacking, specification gaming, or related robustness problems</li>\n<li>Have experience with constitutional AI, debate, or other scalable 
oversight approaches</li>\n<li>Have contributed to production ML systems at scale</li>\n<li>Have familiarity with interpretability techniques as applied to understanding reward model behavior</li>\n</ul>\n<p>The annual compensation range for this role is $350,000-$500,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b1be4c11-417","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5024835008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000-$500,000 USD","x-skills-required":["reward modeling","RLHF","LLM-based evaluation and grading","rubric-driven approaches","reward hacking","specification gaming","large-scale experiments","computational resources","research and engineering","collaborative research","complex ideas communication","AI systems development"],"x-skills-preferred":["published research","LLM-as-judge approaches","calibration and reliability challenges","constitutional AI","debate","scalable oversight approaches","production ML systems","interpretability techniques"],"datePosted":"2026-04-18T15:57:50.755Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote-Friendly (Travel Required) | San Francisco, CA"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"reward modeling, RLHF, LLM-based evaluation and grading, rubric-driven approaches, reward hacking, specification gaming, large-scale experiments, computational resources, research and engineering, collaborative research, complex ideas communication, AI systems development, published research, LLM-as-judge 
approaches, calibration and reliability challenges, constitutional AI, debate, scalable oversight approaches, production ML systems, interpretability techniques","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":500000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_cd02d1a1-0e8"},"title":"Communications Lead, Claude Code","description":"<p>We&#39;re looking for a Communications Lead to own comms for Claude Code. You&#39;ll sit on the Product Communications team, working day-to-day with the Claude Code product team, developer relations, and marketing.</p>\n<p>The media landscape for developer tools doesn&#39;t look like it did five years ago. We need someone who understands both traditional press and the channels where developers form opinions. You might have come up through an in-house comms team, or you might have run launches inside product marketing, handled press from a DevRel role, or found your way to this work from somewhere adjacent.</p>\n<p>You should be a Claude Code user yourself and know the product well.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Own communications for Claude Code, from the big launches to the steady rhythm of updates, community moments, and everything in between</li>\n<li>Build and maintain strong relationships with journalists, newsletter writers, podcasters, and creators covering dev tools and the AI ecosystem</li>\n<li>Lead cross-functional product launch communications, coordinating messaging across comms, marketing, developer relations, and product</li>\n<li>Advise leadership and DevRel when things move fast or catch fire, whether it’s an incident or a community thread</li>\n<li>Translate complex technical work into stories that land with developers and still make sense to broader audiences</li>\n<li>Develop messaging frameworks and content strategies that work across 
technical and non-technical audiences</li>\n<li>Prepare Claude Code engineers and product leads for external moments: podcasts, talks, press, etc.</li>\n<li>Think across channels (press, social, community, owned) and know which lever to pull for each moment</li>\n<li>Pay attention to what&#39;s actually working and build the program from there</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 8–12 years of experience in communications, PR, or developer marketing, with meaningful time focused on technical products or developer audiences</li>\n<li>Use Claude Code heavily and can talk specifically about how you use it in your day-to-day</li>\n<li>Are high-agency and low-ego, with a bias to action</li>\n<li>Write clearly and concisely, whether it&#39;s a launch post or a cross-functional update; a lot of context moves through this role and people need to be able to follow it</li>\n<li>Have a deep understanding of both traditional media channels and the emerging platforms where technical communities engage</li>\n<li>Are very online, follow the right people, know what&#39;s moving through Hacker News and developer social chatter, and catch things early</li>\n<li>Have real fluency in developer culture and know how trust gets earned there</li>\n</ul>\n<p>Strong candidates may also:</p>\n<ul>\n<li>Have experience at developer tools companies, infrastructure products, or open source projects</li>\n<li>Have an existing network in developer media, technical journalism, or the creator space</li>\n<li>Have experience managing communications for AI or ML products</li>\n</ul>\n<p>The annual compensation range for this role is $185,000-$255,000 USD.</p>\n<p>Logistics</p>\n<ul>\n<li>Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience</li>\n<li>Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience</li>\n<li>Minimum years of experience: Years 
of experience required will correlate with the internal job level requirements for the position</li>\n<li>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>\n<li>Visa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>\n<p>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links; visit anthropic.com/careers directly for confirmed position openings.</p>\n<p>How we&#39;re different</p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact, advancing our long-term goals of steerable, trustworthy AI, rather than work on smaller and more specific puzzles. 
We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI &amp; Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.</p>\n<p>Come work with us!</p>\n<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_cd02d1a1-0e8","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5153586008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$185,000-$255,000 USD","x-skills-required":["communications","PR","developer marketing","technical products","developer audiences","AI","ML","GPT-3","Circuit-Based Interpretability","Multimodal Neurons","Scaling Laws","AI & Compute","Concrete Problems in AI Safety","Learning from Human 
Preferences"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:57:22.697Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"communications, PR, developer marketing, technical products, developer audiences, AI, ML, GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, Learning from Human Preferences","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":185000,"maxValue":255000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_e0907526-c49"},"title":"Senior Privacy Architect Manager","description":"<p>We are looking for a Senior Manager, Privacy Architect to join our Privacy &amp; Data Security Team. As our growth accelerates through AI-powered personalization and innovative social features, privacy takes on new importance, fueling our ability to deliver magical, personalized experiences while ensuring our users feel safe, respected, and in control.</p>\n<p>The ideal candidate will have deep knowledge of current privacy and technology trends, with a strong passion for data governance/management and AI/ML. They will have demonstrated experience in ensuring privacy-by-design principles are applied throughout the design, construction, and operation of digital products and services at scale.</p>\n<p>Responsibilities include leading high-impact initiatives and defining technical requirements for compliance and the responsible use of technology at Airbnb. 
The candidate will work closely with the Chief Privacy Officer, Legal, and other Privacy &amp; Data Security Team members, as well as engineering and data science teams.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Collaborating with technical teams in the identification and effective management of company-wide risks to privacy and the responsible use of technology</li>\n<li>Leading definition and implementation of company-wide standards, practices, and patterns to protect and manage personal data in accordance with privacy and AI regulations</li>\n<li>Working with Legal, Data Science, Data Governance, and InfoSec to introduce Privacy by Design principles in company products and infrastructure</li>\n<li>Creating privacy training for technical roles, including data engineers, developers, and data scientists</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>15+ years of total experience, with 5+ years of experience in technical program/project management or privacy engineering focused on building technology products and/or systems</li>\n<li>Deep understanding of large-scale, “Big Data” data stores and technologies</li>\n<li>Strong familiarity with the AI/ML development lifecycle: from data collection and curation, through model architecture selection, training, testing, A/B testing, and deployment</li>\n<li>Solid understanding of Large Language Models (LLMs), Generative AI and AI Agents, including compliance and responsible use challenges arising from their deployment in B2C services</li>\n<li>Strong familiarity with Privacy Enhancing Technologies (PETs), such as various types of encryption, de-identification methods (e.g., k-anonymity, differential privacy), and AI/ML interpretability techniques (e.g., SHAP, LIME)</li>\n</ul>\n<p>Preferred qualifications include:</p>\n<ul>\n<li>Professional certifications such as Certified Information Privacy Professional (CIPP), Certified Information Privacy Manager (CIPM), or AI Governance Professional (AIGP) or 
equivalent</li>\n<li>BA/BS and/or advanced degree in engineering, computer science, mathematics, statistics, physics, or a related field</li>\n<li>Experience with programming languages and tools commonly used in AI, such as R, Python and Github</li>\n</ul>\n<p>This position is US - Remote Eligible. The role may include occasional work at an Airbnb office or attendance at offsites, as agreed to with your manager.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_e0907526-c49","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Airbnb","sameAs":"https://www.airbnb.com/","logo":"https://logos.yubhub.co/airbnb.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/airbnb/jobs/7782533","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Privacy","Data Governance","AI/ML","Large Scale Data Stores","Big Data","Large Language Models","Generative AI","AI Agents","Privacy Enhancing Technologies","Encryption","De-identification","AI/ML Interpretability"],"x-skills-preferred":["Certified Information Privacy Professional (CIPP)","Certified Information Privacy Manager (CIPM)","AI Governance Professional (AIGP)","R","Python","Github"],"datePosted":"2026-04-18T15:50:10.170Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Privacy, Data Governance, AI/ML, Large Scale Data Stores, Big Data, Large Language Models, Generative AI, AI Agents, Privacy Enhancing Technologies, Encryption, De-identification, AI/ML Interpretability, Certified Information Privacy Professional (CIPP), Certified Information Privacy Manager (CIPM), AI Governance Professional (AIGP), R, Python, 
Github"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_8549c317-12f"},"title":"Senior Research Scientist, Reward Models","description":"<p>As a Senior Research Scientist on our Reward Models team, you&#39;ll lead research efforts to improve how we specify and learn human preferences at scale.</p>\n<p>Your work will directly shape how our models understand and optimize for what humans actually want, enabling Claude to be more useful, more reliable, and better aligned with human values.</p>\n<p>This role focuses on pushing the frontier of reward modeling for large language models. You&#39;ll develop novel architectures and training methodologies for RLHF, research new approaches to LLM-based evaluation and grading (including rubric-based methods), and investigate techniques to identify and mitigate reward hacking.</p>\n<p>You&#39;ll collaborate closely with teams across Anthropic, including Finetuning, Alignment Science, and our broader research organization, to ensure your work translates into concrete improvements in both model capabilities and safety.</p>\n<p>We&#39;re looking for someone who can drive ambitious research agendas while also shipping practical improvements to production systems. 
You&#39;ll have the opportunity to work on some of the most important open problems in AI alignment, with access to frontier models and significant computational resources.</p>\n<p>Your work will directly advance the science of how we train AI systems to be both highly capable and safe.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Lead research on novel reward model architectures and training approaches for RLHF</li>\n<li>Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches that improve consistency and interpretability</li>\n<li>Research techniques to detect, characterize, and mitigate reward hacking and specification gaming</li>\n<li>Design experiments to understand reward model generalization, robustness, and failure modes</li>\n<li>Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines</li>\n<li>Contribute to research publications, blog posts, and internal documentation</li>\n<li>Mentor other researchers and help build institutional knowledge around reward modeling</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have a track record of research contributions in reward modeling, RLHF, or closely related areas of machine learning</li>\n<li>Have experience training and evaluating reward models for large language models</li>\n<li>Are comfortable designing and running large-scale experiments with significant computational resources</li>\n<li>Can work effectively across research and engineering, iterating quickly while maintaining scientific rigor</li>\n<li>Enjoy collaborative research and can communicate complex ideas clearly to diverse audiences</li>\n<li>Care deeply about building AI systems that are both highly capable and safe</li>\n</ul>\n<p>Strong candidates may also:</p>\n<ul>\n<li>Have published research on reward modeling, preference learning, or RLHF</li>\n<li>Have experience with LLM-as-judge approaches, including calibration and reliability challenges</li>\n<li>Have worked on reward hacking, specification gaming, or related robustness problems</li>\n<li>Have experience with constitutional AI, debate, or other scalable oversight approaches</li>\n<li>Have contributed to production ML systems at scale</li>\n<li>Have familiarity with interpretability techniques as applied to understanding reward model behavior</li>\n</ul>\n<p>The annual compensation range for this role is $350,000-$500,000 USD.</p>\n<p>Logistics:</p>\n<ul>\n<li>Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience</li>\n<li>Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience</li>\n<li>Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position</li>\n</ul>\n<p>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p>Visa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. 
Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>\n<p>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links; visit anthropic.com/careers directly for confirmed position openings.</p>\n<p>How we&#39;re different:</p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact, advancing our long-term goals of steerable, trustworthy AI, rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. 
This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI &amp; Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.</p>\n<p>Come work with us!</p>\n<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_8549c317-12f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5024835008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000-$500,000 USD","x-skills-required":["reward modeling","RLHF","large language models","novel architectures","training methodologies","evaluation and grading","rubric-based methods","reward hacking","specification gaming","generalization","robustness","failure modes","computational resources","scientific rigor","communication skills","interpretability techniques"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:47:13.514Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote-Friendly (Travel Required) | San Francisco, CA"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"reward modeling, RLHF, large language models, novel architectures, training methodologies, evaluation and grading, rubric-based methods, reward hacking, specification 
gaming, generalization, robustness, failure modes, computational resources, scientific rigor, communication skills, interpretability techniques","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":500000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_60da952d-d37"},"title":"Research Scientist, Interpretability","description":"<p><strong>About the role</strong></p>\n<p>When you see what modern language models are capable of, do you wonder, &quot;How do these things work? How can we trust them?&quot; The Interpretability team at Anthropic is working to reverse-engineer how trained models work because we believe that a mechanistic understanding is the most robust way to make advanced systems safe.</p>\n<p>We&#39;re looking for researchers and engineers to join our efforts. People mean many different things by &quot;interpretability&quot;. 
We&#39;re focused on mechanistic interpretability, which aims to discover how neural network parameters map to meaningful algorithms.</p>\n<p>A few places to learn more about our work and team at a high level are this introduction to Interpretability from our research lead, Chris Olah; a discussion of our work on the Hard Fork podcast produced by the New York Times; and this blog post (and accompanying video) sharing more about some of the engineering challenges we’ve had to solve to get these results.</p>\n<p>Some of our team&#39;s notable publications include A Mathematical Framework for Transformer Circuits, In-context Learning and Induction Heads, Toy Models of Superposition, Scaling Monosemanticity, and our Circuits Methods and Biology papers.</p>\n<p>This work builds on ideas from members&#39; work prior to Anthropic such as the original circuits thread, Multimodal Neurons, Activation Atlases, and Building Blocks.</p>\n<p>We aim to create a solid foundation for mechanistically understanding neural networks and making them safe (see our vision post).</p>\n<p>In the short term, we have focused on resolving the issue of &quot;superposition&quot; (see Toy Models of Superposition, Superposition, Memorization, and Double Descent, and our May 2023 update), which causes the computational units of the models, like neurons and attention heads, to be individually uninterpretable, and on finding ways to decompose models into more interpretable components.</p>\n<p>Our subsequent work, which found millions of features in Sonnet, one of our production language models, represents progress in this direction.</p>\n<p>In our most recent work, we develop methods that allow us to build circuits using features, use these circuits to understand the mechanisms associated with a model&#39;s computation, and study specific examples of multi-hop reasoning, planning, and chain-of-thought faithfulness on Haiku 3.5, one of our production models.</p>\n<p>This is a stepping stone towards our 
overall goal of mechanistically understanding neural networks.</p>\n<p>We often collaborate with teams across Anthropic, such as Alignment Science and Societal Impacts, to use our work to make Anthropic’s models safer.</p>\n<p>We also have an Interpretability Architectures project that involves collaborating with Pretraining.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Develop methods for understanding LLMs by reverse engineering algorithms learned in their weights</li>\n<li>Design and run robust experiments, both quickly in toy scenarios and at scale in large models</li>\n<li>Create and analyze new interpretability features and circuits to better understand how models work</li>\n<li>Build infrastructure for running experiments and visualizing results</li>\n<li>Work with colleagues to communicate results internally and publicly</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have a strong track record of scientific research (in any field), and have done some work on Interpretability</li>\n<li>Enjoy team science – working collaboratively to make big discoveries</li>\n<li>Are comfortable with messy experimental science. We&#39;re inventing the field as we work, and the first textbook is years away</li>\n<li>View research and engineering as two sides of the same coin. Every team member writes code, designs and runs experiments, and interprets results</li>\n<li>Can clearly articulate and discuss the motivations behind your work, and teach us about what you&#39;ve learned. 
You like writing up and communicating your results, even when they&#39;re null</li>\n</ul>\n<p>To learn more about the skills we look for and how to prepare for this role, see our blog post – So You Want to Work in Mechanistic Interpretability?</p>\n<p>Familiarity with Python is required for this role.</p>\n<p><strong>Role Specific Location Policy:</strong></p>\n<ul>\n<li>This role is based in our San Francisco office; however, we are open to considering exceptional candidates for remote work on a case-by-case basis.</li>\n</ul>\n<p>The annual compensation range for this role is listed below. For sales roles, the range provided is the role’s On Target Earnings (&quot;OTE&quot;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.</p>\n<p>Annual Salary: $350,000-$850,000 USD</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_60da952d-d37","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4980427008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000-$850,000 USD","x-skills-required":["Python","Mechanistic Interpretability","LLMs","Neural Networks","Circuits","Features","Model Computation"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:44:56.628Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Mechanistic Interpretability, LLMs, Neural Networks, Circuits, Features, Model 
Computation","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_6aa46bac-783"},"title":"Software Engineer, Cybersecurity Products","description":"<p><strong>About Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>About the Role</strong></p>\n<p>We&#39;re looking for engineers to join a new effort building AI-powered products and capabilities for cybersecurity. You&#39;ll work across the stack to prototype new ideas and build from the ground up.</p>\n<p>This role sits at the intersection of research, product, and go-to-market. You&#39;ll work closely with research teams to develop new model capabilities for security applications, prototype and iterate quickly to validate ideas, and engage directly with customers and partners to inform what we build. 
The right candidate has the technical depth to engage with research, the product instincts to know what&#39;s worth building, and the drive to move fast.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Prototype and build new AI-powered products for cybersecurity</li>\n</ul>\n<ul>\n<li>Iterate quickly based on customer feedback and what you learn</li>\n</ul>\n<ul>\n<li>Collaborate with research teams to identify and develop new model capabilities for security applications</li>\n</ul>\n<ul>\n<li>Engage directly with customers and partners to understand workflows and inform product direction</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 7+ years of experience as a software engineer</li>\n</ul>\n<ul>\n<li>Have experience developing cybersecurity products</li>\n</ul>\n<ul>\n<li>Enjoy fast iteration and are energized by prototyping new ideas</li>\n</ul>\n<ul>\n<li>Have strong product instincts and enjoy defining what to build, not just how to build it</li>\n</ul>\n<ul>\n<li>Are comfortable working closely with research and go-to-market teams</li>\n</ul>\n<ul>\n<li>Have strong communication skills and can work effectively across functions</li>\n</ul>\n<p><strong>Strong candidates may also have:</strong></p>\n<ul>\n<li>Experience in incident response, reverse engineering, network analysis, penetration testing, or similar fields</li>\n</ul>\n<ul>\n<li>Experience working with AI/ML models and building products on top of them</li>\n</ul>\n<ul>\n<li>Experience building agentic applications</li>\n</ul>\n<p><strong>Deadline to apply:</strong></p>\n<p>None. Applications will be reviewed on a rolling basis.</p>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. 
However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</strong></p>\n<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</strong></p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. 
We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI &amp; Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.</p>\n<p><strong>Come work with us!</strong></p>\n<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues. <strong>Guidance on Candidates&#39; AI Usage:</strong> Learn about our policy for using AI in our application process</p>\n<p>Interested in building your career at Anthropic? 
Get future opportunities by following us on LinkedIn and Twitter.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_6aa46bac-783","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5063007008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000 - $405,000 USD","x-skills-required":["software engineer","cybersecurity products","AI/ML models","incident response","reverse engineering","network analysis","penetration testing"],"x-skills-preferred":["agentic applications","circuit-based interpretability","multimodal neurons","scaling laws","AI & compute","concrete problems in AI safety","learning from human preferences"],"datePosted":"2026-03-08T13:52:59.143Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA | Washington, DC"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineer, cybersecurity products, AI/ML models, incident response, reverse engineering, network analysis, penetration testing, agentic applications, circuit-based interpretability, multimodal neurons, scaling laws, AI & compute, concrete problems in AI safety, learning from human preferences","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_20e650c2-d9c"},"title":"Research Scientist, Interpretability","description":"<p><strong>About the role:</strong></p>\n<p>When you see 
what modern language models are capable of, do you wonder, &#39;How do these things work? How can we trust them?&#39;</p>\n<p>The Interpretability team at Anthropic is working to reverse-engineer how trained models work because we believe that a mechanistic understanding is the most robust way to make advanced systems safe. We’re looking for researchers and engineers to join our efforts.</p>\n<p>People mean many different things by &#39;interpretability&#39;. We&#39;re focused on mechanistic interpretability, which aims to discover how neural network parameters map to meaningful algorithms. Some useful analogies might be to think of us as trying to do &#39;biology&#39; or &#39;neuroscience&#39; of neural networks using “microscopes” we build, or as treating neural networks as binary computer programs we&#39;re trying to &#39;reverse engineer&#39;.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Develop methods for understanding LLMs by reverse engineering algorithms learned in their weights</li>\n</ul>\n<ul>\n<li>Design and run robust experiments, both quickly in toy scenarios and at scale in large models</li>\n</ul>\n<ul>\n<li>Create and analyse new interpretability features and circuits to better understand how models work</li>\n</ul>\n<ul>\n<li>Build infrastructure for running experiments and visualising results</li>\n</ul>\n<ul>\n<li>Work with colleagues to communicate results internally and publicly</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have a strong track record of scientific research (in any field), and have done <em>some</em> work on Interpretability</li>\n</ul>\n<ul>\n<li>Enjoy team science – working collaboratively to make big discoveries</li>\n</ul>\n<ul>\n<li>Are comfortable with messy experimental science. We&#39;re inventing the field as we work, and the first textbook is years away</li>\n</ul>\n<ul>\n<li>View research and engineering as two sides of the same coin. 
Every team member writes code, designs and runs experiments, and interprets results</li>\n</ul>\n<ul>\n<li>Can clearly articulate and discuss the motivations behind your work, and teach us about what you&#39;ve learned. You like writing up and communicating your results, even when they&#39;re null</li>\n</ul>\n<p><strong>Role Specific Location Policy:</strong></p>\n<ul>\n<li>This role is based in our San Francisco office; however, we are open to considering exceptional candidates for remote work on a case-by-case basis.</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. 
But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong> Not all strong candidates will meet every single qualification as listed.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_20e650c2-d9c","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4980427008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000 - $850,000 USD","x-skills-required":["Python","Mechanistic Interpretability","Neural Networks","Reverse Engineering","Experimental Science"],"x-skills-preferred":["Research","Engineering","Team Science","Communication"],"datePosted":"2026-03-08T13:48:39.765Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Mechanistic Interpretability, Neural Networks, Reverse Engineering, Experimental Science, Research, Engineering, Team Science, Communication","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_716d3247-e3f"},"title":"ML/Research Engineer, Safeguards","description":"<p><strong>About Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. 
We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>About the role</strong></p>\n<p>We are looking for ML Engineers and Research Engineers to help detect and mitigate misuse of our AI systems. As a member of the Safeguards ML team, you will build systems that identify harmful use—from individual policy violations to sophisticated, coordinated attacks—and develop defenses that keep our products safe as capabilities advance. You will also work on systems that protect user wellbeing and ensure our models behave appropriately across a wide range of contexts. This work feeds directly into Anthropic&#39;s Responsible Scaling Policy commitments.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Develop classifiers to detect misuse and anomalous behavior at scale. This includes developing synthetic data pipelines for training classifiers and methods to automatically source representative evaluations to iterate on</li>\n<li>Build systems to monitor for harms that span multiple exchanges, such as coordinated cyber attacks and influence operations, and develop new methods for aggregating and analyzing signals across contexts</li>\n<li>Evaluate and improve the safety of agentic products—developing both threat models and environments to test for agentic risks, and developing and deploying mitigations for prompt injection attacks</li>\n<li>Conduct research on automated red-teaming, adversarial robustness, and other research that helps test for or find misuse</li>\n</ul>\n<p><strong>You may be a good fit if you</strong></p>\n<ul>\n<li>Have 4+ years of experience in ML engineering, research engineering, or applied research, in academia or industry</li>\n<li>Have proficiency in Python and experience building ML systems</li>\n<li>Are comfortable working across the 
research-to-deployment pipeline, from exploratory experiments to production systems</li>\n<li>Are worried about misuse risks of AI systems, and want to work to mitigate them</li>\n<li>Have strong communication skills and ability to explain complex technical concepts to non-technical stakeholders</li>\n</ul>\n<p><strong>Strong candidates may also have experience with</strong></p>\n<ul>\n<li>Language modeling and transformers</li>\n<li>Building classifiers, anomaly detection systems, or behavioral ML</li>\n<li>Adversarial machine learning or red-teaming</li>\n<li>Interpretability or probes</li>\n<li>Reinforcement learning</li>\n<li>High-performance, large-scale ML systems</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship</strong></p>\n<p>We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong></p>\n<p>Not all strong candidates will meet every single qualification as listed. 
Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>\n<p><strong>Your safety matters to us.</strong></p>\n<p>To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. 
This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI &amp; Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.</p>\n<p><strong>Come work with us!</strong></p>\n<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_716d3247-e3f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4949336008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000 - $500,000 USD","x-skills-required":["Python","Machine Learning","Research Engineering","Adversarial Machine Learning","Red-teaming","Interpretability","Probes","Reinforcement Learning","High-performance, large-scale ML systems"],"x-skills-preferred":["Language modeling and transformers","Building classifiers, anomaly detection systems, or behavioral ML"],"datePosted":"2026-03-08T13:46:45.711Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Machine Learning, Research Engineering, Adversarial Machine Learning, Red-teaming, Interpretability, Probes, Reinforcement Learning, High-performance, large-scale ML systems, Language modeling and transformers, Building classifiers, anomaly detection systems, or behavioral 
ML","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":500000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_61280fe7-04a"},"title":"Researcher, Interpretability","description":"<p><strong>Job Posting</strong></p>\n<p><strong>Researcher, Interpretability</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Location Type</strong></p>\n<p>Hybrid</p>\n<p><strong>Department</strong></p>\n<p>Safety Systems</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$295K – $445K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or 
more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the Team</strong></p>\n<p>The Interpretability team studies internal representations of deep learning models. We are interested in using representations to understand model behavior, and in engineering models to have more understandable representations. We are particularly interested in applying our understanding to ensure the safety of powerful AI systems. Our working style is collaborative and curiosity-driven.</p>\n<p><strong>About the Role</strong></p>\n<p>OpenAI is seeking a researcher passionate about understanding deep networks, with a strong background in engineering, quantitative reasoning, and the research process. You will develop and carry out a research plan in mechanistic interpretability, in close collaboration with a highly motivated team. You will play a critical role in helping OpenAI ensure future models remain safe even as they grow in capability. 
This will make a significant impact on our goal of building and deploying safe AGI.</p>\n<p>In this role, you will:</p>\n<ul>\n<li>Develop and publish research on techniques for understanding representations of deep networks.</li>\n</ul>\n<ul>\n<li>Engineer infrastructure for studying model internals at scale.</li>\n</ul>\n<ul>\n<li>Collaborate across teams to work on projects that OpenAI is uniquely suited to pursue.</li>\n</ul>\n<ul>\n<li>Guide research directions toward demonstrable usefulness and/or long-term scalability.</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Are excited about OpenAI’s mission of ensuring AGI benefits all of humanity, and are aligned with OpenAI’s charter.</li>\n</ul>\n<ul>\n<li>Show enthusiasm for long-term AI safety, and have thought deeply about technical paths to safe AGI.</li>\n</ul>\n<ul>\n<li>Bring experience in the field of AI safety, mechanistic interpretability, or spiritually related disciplines.</li>\n</ul>\n<ul>\n<li>Hold a Ph.D. or have research experience in computer science, machine learning, or a related field.</li>\n</ul>\n<ul>\n<li>Thrive in environments involving large-scale AI systems, and are excited to make use of OpenAI’s unique resources in this area.</li>\n</ul>\n<ul>\n<li>Possess 2+ years of research engineering experience and proficiency in Python or similar languages.</li>\n</ul>\n<ul>\n<li>Are deeply curious.</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>","url":"https://yubhub.co/jobs/job_61280fe7-04a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/c44268f1-717b-4da3-9943-2557f7d739f0","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$295K – $445K • Offers Equity","x-skills-required":["Python","Machine Learning","Deep Learning","Research Engineering","Computer Science"],"x-skills-preferred":["AI Safety","Mechanistic Interpretability","Quantitative Reasoning","Engineering"],"datePosted":"2026-03-06T18:39:59.202Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Machine Learning, Deep Learning, Research Engineering, Computer Science, AI Safety, Mechanistic Interpretability, Quantitative Reasoning, Engineering","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":295000,"maxValue":445000,"unitText":"YEAR"}}}]}