<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>b1be4c11-417</externalid>
      <Title>Senior Research Scientist, Reward Models</Title>
<Description><![CDATA[<p>As a Senior Research Scientist on our Reward Models team, you&#39;ll lead research efforts to improve how we specify and learn human preferences at scale. Your work will directly shape how our models understand and optimize for what humans actually want, enabling Claude to be more useful, more reliable, and better aligned with human values.</p>
<p>This role focuses on pushing the frontier of reward modeling for large language models. You&#39;ll develop novel architectures and training methodologies for RLHF, research new approaches to LLM-based evaluation and grading (including rubric-based methods), and investigate techniques to identify and mitigate reward hacking. You&#39;ll collaborate closely with teams across Anthropic, including Finetuning, Alignment Science, and our broader research organization, to ensure your work translates into concrete improvements in both model capabilities and safety.</p>
<p>We&#39;re looking for someone who can drive ambitious research agendas while also shipping practical improvements to production systems. You&#39;ll have the opportunity to work on some of the most important open problems in AI alignment, with access to frontier models and significant computational resources. Your work will directly advance the science of how we train AI systems to be both highly capable and safe.</p>
<p>Responsibilities:</p>
<ul>
<li>Lead research on novel reward model architectures and training approaches for RLHF</li>
<li>Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches that improve consistency and interpretability</li>
<li>Research techniques to detect, characterize, and mitigate reward hacking and specification gaming</li>
<li>Design experiments to understand reward model generalization, robustness, and failure modes</li>
<li>Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines</li>
<li>Contribute to research publications, blog posts, and internal documentation</li>
<li>Mentor other researchers and help build institutional knowledge around reward modeling</li>
</ul>
<p>You may be a good fit if you:</p>
<ul>
<li>Have a track record of research contributions in reward modeling, RLHF, or closely related areas of machine learning</li>
<li>Have experience training and evaluating reward models for large language models</li>
<li>Are comfortable designing and running large-scale experiments with significant computational resources</li>
<li>Can work effectively across research and engineering, iterating quickly while maintaining scientific rigor</li>
<li>Enjoy collaborative research and can communicate complex ideas clearly to diverse audiences</li>
<li>Care deeply about building AI systems that are both highly capable and safe</li>
</ul>
<p>Strong candidates may also:</p>
<ul>
<li>Have published research on reward modeling, preference learning, or RLHF</li>
<li>Have experience with LLM-as-judge approaches, including calibration and reliability challenges</li>
<li>Have worked on reward hacking, specification gaming, or related robustness problems</li>
<li>Have experience with constitutional AI, debate, or other scalable oversight approaches</li>
<li>Have contributed to production ML systems at scale</li>
<li>Have familiarity with interpretability techniques as applied to understanding reward model behavior</li>
</ul>
<p>The annual compensation range for this role is $350,000-$500,000 USD.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$350,000-$500,000 USD</Salaryrange>
      <Skills>reward modeling, RLHF, LLM-based evaluation and grading, rubric-driven approaches, reward hacking, specification gaming, large-scale experiments, computational resources, research and engineering, collaborative research, complex ideas communication, AI systems development, published research, LLM-as-judge approaches, calibration and reliability challenges, constitutional AI, debate, scalable oversight approaches, production ML systems, interpretability techniques</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic creates reliable, interpretable, and steerable AI systems. It is a public benefit corporation headquartered in San Francisco.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5024835008</Applyto>
      <Location>Remote-Friendly (Travel Required) | San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>8549c317-12f</externalid>
      <Title>Senior Research Scientist, Reward Models</Title>
      <Description><![CDATA[<p>As a Senior Research Scientist on our Reward Models team, you&#39;ll lead research efforts to improve how we specify and learn human preferences at scale.</p>
<p>Your work will directly shape how our models understand and optimize for what humans actually want, enabling Claude to be more useful, more reliable, and better aligned with human values.</p>
<p>This role focuses on pushing the frontier of reward modeling for large language models. You&#39;ll develop novel architectures and training methodologies for RLHF, research new approaches to LLM-based evaluation and grading (including rubric-based methods), and investigate techniques to identify and mitigate reward hacking.</p>
<p>You&#39;ll collaborate closely with teams across Anthropic, including Finetuning, Alignment Science, and our broader research organization, to ensure your work translates into concrete improvements in both model capabilities and safety.</p>
<p>We&#39;re looking for someone who can drive ambitious research agendas while also shipping practical improvements to production systems. You&#39;ll have the opportunity to work on some of the most important open problems in AI alignment, with access to frontier models and significant computational resources.</p>
<p>Your work will directly advance the science of how we train AI systems to be both highly capable and safe.</p>
<p>Responsibilities:</p>
<ul>
<li>Lead research on novel reward model architectures and training approaches for RLHF</li>
<li>Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches that improve consistency and interpretability</li>
<li>Research techniques to detect, characterize, and mitigate reward hacking and specification gaming</li>
<li>Design experiments to understand reward model generalization, robustness, and failure modes</li>
<li>Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines</li>
<li>Contribute to research publications, blog posts, and internal documentation</li>
<li>Mentor other researchers and help build institutional knowledge around reward modeling</li>
</ul>
<p>You may be a good fit if you:</p>
<ul>
<li>Have a track record of research contributions in reward modeling, RLHF, or closely related areas of machine learning</li>
<li>Have experience training and evaluating reward models for large language models</li>
<li>Are comfortable designing and running large-scale experiments with significant computational resources</li>
<li>Can work effectively across research and engineering, iterating quickly while maintaining scientific rigor</li>
<li>Enjoy collaborative research and can communicate complex ideas clearly to diverse audiences</li>
<li>Care deeply about building AI systems that are both highly capable and safe</li>
</ul>
<p>Strong candidates may also:</p>
<ul>
<li>Have published research on reward modeling, preference learning, or RLHF</li>
<li>Have experience with LLM-as-judge approaches, including calibration and reliability challenges</li>
<li>Have worked on reward hacking, specification gaming, or related robustness problems</li>
<li>Have experience with constitutional AI, debate, or other scalable oversight approaches</li>
<li>Have contributed to production ML systems at scale</li>
<li>Have familiarity with interpretability techniques as applied to understanding reward model behavior</li>
</ul>
<p>The annual compensation range for this role is $350,000-$500,000 USD.</p>
<p>Logistics:</p>
<ul>
<li>Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience</li>
<li>Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience</li>
<li>Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position</li>
</ul>
<p>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>
<p>Visa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>
<p>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>
<p>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links; visit anthropic.com/careers directly for confirmed position openings.</p>
<p>How we&#39;re different:</p>
<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact, advancing our long-term goals of steerable, trustworthy AI, rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>
<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI &amp; Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.</p>
<p>Come work with us!</p>
<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$350,000-$500,000 USD</Salaryrange>
      <Skills>reward modeling, RLHF, large language models, novel architectures, training methodologies, evaluation and grading, rubric-based methods, reward hacking, specification gaming, generalization, robustness, failure modes, computational resources, scientific rigor, communication skills, interpretability techniques</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that aims to create reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5024835008</Applyto>
      <Location>Remote-Friendly (Travel Required) | San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>d2f5b1e5-545</externalid>
      <Title>Research Scientist, Gemini Safety</Title>
<Description><![CDATA[<p>We&#39;re seeking a versatile Research Scientist to join our Gemini Safety team. As a Research Scientist, you will apply and develop cutting-edge data and algorithmic solutions to advance our latest user-facing models. Your work will focus on advancing the safety and fairness behavior of state-of-the-art AI models, driving the development of foundational technology adopted by numerous product areas, including Gemini App, Cloud API, and Search.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Post-training/instruction tuning state-of-the-art LLMs, focusing on text-to-text, image/video/audio-to-text modalities and agentic capabilities</li>
<li>Exploring data, reasoning, and algorithmic solutions to ensure Gemini Models are safe, maximally helpful, and work for everyone</li>
<li>Improve Gemini&#39;s adversarial robustness, with a focus on high-stakes abuse risks</li>
<li>Design and maintain high-quality evaluation protocols to assess model behavior gaps and headroom related to safety and fairness</li>
<li>Develop and execute experimental plans to address known gaps, or construct entirely new capabilities</li>
<li>Drive innovation and enhance understanding of Supervised Fine Tuning and Reinforcement Learning fine-tuning at scale</li>
</ul>
<p>To succeed as a Research Scientist in the Gemini Safety team, we look for the following skills and experience:</p>
<ul>
<li>PhD in Computer Science, a related field, or equivalent practical experience</li>
<li>Significant LLM post-training experience</li>
<li>Experience in Reward Modeling and Reinforcement Learning for LLM instruction tuning</li>
<li>Experience with Long-range Reinforcement Learning</li>
<li>Experience in areas such as Safety, Fairness, and Alignment</li>
<li>Track record of publications at NeurIPS, ICLR, ICML</li>
<li>Experience taking research from concept to product</li>
<li>Experience with collaborating or leading an applied research project</li>
<li>Strong experimental taste: Good judgment regarding baselines, ablations, and what is worth testing</li>
<li>Experience with JAX</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>PhD in Computer Science, LLM post-training experience, Reward modeling and Reinforcement Learning for LLMs Instruction tuning, Long-range Reinforcement learning, Safety, Fairness, and Alignment, NeurIPS, ICLR, ICML publications, Research from concept to product, Collaborating or leading an applied research project, JAX</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Google DeepMind</Employername>
      <Employerlogo>https://logos.yubhub.co/deepmind.com.png</Employerlogo>
      <Employerdescription>Google DeepMind is a subsidiary of Alphabet Inc., a multinational conglomerate.</Employerdescription>
      <Employerwebsite>https://deepmind.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/deepmind/jobs/7731944</Applyto>
      <Location>Zurich, Switzerland</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>540ce49c-271</externalid>
      <Title>Member of Technical Staff - Multimodal Understanding</Title>
      <Description><![CDATA[<p><strong>About the Role</strong></p>
<p>You will join the multimodal team to push toward superhuman multimodal intelligence. You&#39;ll advance understanding and generation across modalities (image, video, audio, and text), spanning the full stack: data curation/acquisition, tokenizer training, large-scale pre-training, post-training/alignment, infrastructure/scaling, evaluation, tooling/demos, and end-to-end product experiences.</p>
<p>Collaborate cross-functionally with pre-training, post-training, reasoning, data, applied, and product teams to deliver frontier capabilities in multimodal reasoning, world modeling, tool use, agentic behaviors, and interactive human-AI collaboration. Contribute to building models that can see, hear, reason about, and interact with the world in real time at unprecedented levels.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Design, build, and optimize large-scale distributed systems for multimodal pre-training, post-training, inference, data processing, and tokenization at web/petabyte scale.</li>
<li>Develop high-throughput pipelines for data acquisition, preprocessing, filtering, generation, decoding, loading, crawling, visualization, and management (images, videos, audio + text).</li>
<li>Advance multimodal capabilities including spatial-temporal compression, cross-modal alignment, world modeling, reasoning, emergent abilities, audio/image/video understanding &amp; generation, real-time video processing, and noisy data handling.</li>
<li>Drive data quality and studies: curation (human/synthetic), filtering techniques, analysis, and scalable pipelines to support trillion-parameter models.</li>
<li>Create evaluation frameworks, internal benchmarks, reward models, and metrics that capture real-world usage, failure modes, interactive dynamics, and human-AI synergy.</li>
<li>Innovate on algorithms, modeling approaches, hardware/software/algorithm co-design, and scaling paradigms for state-of-the-art performance.</li>
<li>Build research tooling, user-friendly interfaces, prototypes/demos, full-stack applications, and enable rapid iteration based on feedback.</li>
<li>Work across the stack (pre-training → SFT/RL/post-training) to enable reasoning, tool calling, agentic behaviors, orchestration, and seamless real-time interactions.</li>
</ul>
<p><strong>Basic Qualifications</strong></p>
<ul>
<li>Hands-on experience with multimodal pre-training, post-training, or fine-tuning (vision, audio, video, or cross-modal).</li>
<li>Expert-level proficiency in Python (core language), with strong experience in at least one of: JAX / PyTorch / XLA.</li>
<li>Proven track record building or optimizing large-scale distributed ML systems (training/inference optimization, GPU utilization, multi-GPU/TPU setups, hardware co-design).</li>
<li>Deep experience designing and running data pipelines at scale: curation, filtering, generation, quality studies, especially for noisy/real-world multimodal data.</li>
<li>Strong fundamentals in evaluation design, benchmarks, reward modeling, or RL techniques (particularly for interactive/agentic behaviors).</li>
<li>Proactive self-starter who thrives in high-intensity environments and is passionate about pushing multimodal AI frontiers.</li>
<li>Willingness to own end-to-end initiatives and do whatever it takes to deliver breakthrough user experiences.</li>
</ul>
<p><strong>Preferred Skills and Experience</strong></p>
<ul>
<li>Experience leading major improvements in model capabilities through better data, modeling, algorithms, or scaling.</li>
<li>Familiarity with state-of-the-art in multimodal LLMs, scaling laws, tokenizers, compression techniques, reasoning, or agentic systems.</li>
<li>Proficiency in Rust and/or C++ for performance-critical components.</li>
<li>Hands-on work with large-scale orchestration tools such as Spark, Ray, or Kubernetes.</li>
<li>Background building full-stack tooling: performant interfaces, real-time research demos/apps, or end-to-end product ownership.</li>
<li>Passion for end-to-end user experience in interactive, real-time multimodal AI systems.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$180,000 - $440,000 USD</Salaryrange>
      <Skills>Multimodal pre-training, Post-training, Fine-tuning, Python, JAX, PyTorch, XLA, Large-scale distributed ML systems, Data pipelines, Evaluation design, Benchmarks, Reward modeling, RL techniques, State-of-the-art in multimodal LLMs, Scaling laws, Tokenizers, Compression techniques, Reasoning, Agentic systems, Rust, C++, Spark, Ray, Kubernetes, Full-stack tooling</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>xAI</Employername>
      <Employerlogo>https://logos.yubhub.co/xai.com.png</Employerlogo>
      <Employerdescription>xAI creates AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.</Employerdescription>
      <Employerwebsite>https://www.xai.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/xai/jobs/5111374007</Applyto>
      <Location>Palo Alto, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>a48bc0a6-719</externalid>
      <Title>Research Scientist, Gemini Safety</Title>
      <Description><![CDATA[<p>Job Title: Research Scientist, Gemini Safety</p>
<p>We&#39;re looking for a versatile Research Scientist to join our Gemini Safety team at Google DeepMind. As a Research Scientist, you will be responsible for applying and developing cutting-edge data and algorithmic solutions to advance the safety and fairness behavior of our latest user-facing models.</p>
<p>The Gemini Safety team is accountable for the safety and fairness behavior of GDM&#39;s latest Gemini models. Our team focuses on advancing the safety and fairness behavior of state-of-the-art AI models, driving the development of foundational technology adopted by numerous product areas, including Gemini App, Cloud API, and Search.</p>
<p>Key Responsibilities:</p>
<ul>
<li>Post-training/instruction tuning state-of-the-art LLMs, focusing on text-to-text, image/video/audio-to-text modalities and agentic capabilities</li>
<li>Exploring data, reasoning, and algorithmic solutions to ensure Gemini Models are safe, maximally helpful, and work for everyone</li>
<li>Improve Gemini&#39;s adversarial robustness, with a focus on high-stakes abuse risks</li>
<li>Design and maintain high-quality evaluation protocols to assess model behavior gaps and headroom related to safety and fairness</li>
<li>Develop and execute experimental plans to address known gaps, or construct entirely new capabilities</li>
<li>Drive innovation and enhance understanding of Supervised Fine Tuning and Reinforcement Learning fine-tuning at scale</li>
</ul>
<p>About You:</p>
<ul>
<li>PhD in Computer Science, a related field, or equivalent practical experience</li>
<li>Significant LLM post-training experience</li>
<li>Experience in Reward Modeling and Reinforcement Learning for LLM instruction tuning</li>
<li>Experience with Long-range Reinforcement Learning</li>
<li>Experience in areas such as Safety, Fairness, and Alignment</li>
<li>Track record of publications at NeurIPS, ICLR, ICML, RL/DL, EMNLP, AAAI, UAI</li>
<li>Experience taking research from concept to product</li>
<li>Experience with collaborating or leading an applied research project</li>
<li>Experience with JAX</li>
</ul>
<p>At Google DeepMind, we value diversity of experience, knowledge, backgrounds, and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>PhD in Computer Science, LLM post-training experience, Reward modeling and Reinforcement Learning for LLMs Instruction tuning, Long-range Reinforcement learning, Safety, Fairness, and Alignment, JAX</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Google DeepMind</Employername>
      <Employerlogo>https://logos.yubhub.co/deepmind.com.png</Employerlogo>
      <Employerdescription>Google DeepMind is a subsidiary of Alphabet Inc., a multinational conglomerate, and is involved in the development of artificial intelligence.</Employerdescription>
      <Employerwebsite>https://deepmind.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/deepmind/jobs/7421111</Applyto>
      <Location>Mountain View, California, US</Location>
      <Country></Country>
      <Postedate>2026-03-16</Postedate>
    </job>
    <job>
      <externalid>ca908406-7b8</externalid>
      <Title>Member of Technical Staff - Post Training - MAI Superintelligence Team</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft AI is looking for a talented Member of Technical Staff - Post Training - MAI Superintelligence Team at their New York office. This role sits at the heart of strategic decision-making for a company that&#39;s revolutionising AI technology. You&#39;ll work directly with leadership to shape the company&#39;s direction in the AI market.</p>
<p><strong>About the Role</strong></p>
<p>This role involves contributions to all stages of the post-training process: driving data collection and acquisition, building evaluations of model capabilities, and applying advanced reward modeling and RL techniques to develop and improve the post-training recipe. We work on the bleeding edge and leverage the most powerful pretrained models and algorithms for our needs. We are an interdisciplinary team of engineers and scientists, learning from each other and collaborating to create the best models.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Develop data collection, evaluation, and post-training methods for models.</li>
<li>Design hypotheses and experiment plans for rapidly iterating on model performance.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>Bachelor’s Degree in Computer Science, Machine Learning, Mathematics, or related technical discipline AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Experience with reward modeling, RL, or other post-training techniques.</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Passionate about advancing the state of post-training research.</li>
<li>Willing to contribute meaningfully as individuals and take end-to-end ownership of projects.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Competitive salary range: $119,800 - $234,700 per year.</li>
<li>Comprehensive benefits package, including health insurance, retirement plan, and paid time off.</li>
<li>Opportunities for professional growth and development.</li>
<li>Collaborative and inclusive work environment.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$119,800 - $234,700 per year</Salaryrange>
      <Skills>reward modeling, RL, post-training techniques, C, C++, C#, Java, JavaScript, Python, conversational AI, deployment</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft AI is a leading technology company that specializes in developing cutting-edge algorithms for post-training large language models. They aim to empower every person and every organization on the planet to achieve more. Their mission is to push the boundaries of AI toward Humanist Superintelligence—ultra-capable systems that remain controllable, safety-aligned, and anchored to human values.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/member-of-technical-staff-post-training-mai-superintelligence-team-3/</Applyto>
      <Location>New York</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>a98b937b-085</externalid>
      <Title>Member of Technical Staff - Post Training - MAI Superintelligence Team</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft AI is looking for a talented Member of Technical Staff - Post Training - MAI Superintelligence Team at their Redmond office. This role sits at the heart of strategic decision-making for a company that&#39;s revolutionising AI technology. You&#39;ll work directly with leadership to shape the company&#39;s direction in the AI market.</p>
<p><strong>About the Role</strong></p>
<p>This role involves contributions to all stages of the post-training process: driving data collection and acquisition, building evaluations of model capabilities, and applying advanced reward modeling and RL techniques to develop and improve the post-training recipe. We work on the bleeding edge and leverage the most powerful pretrained models and algorithms for our needs. We are an interdisciplinary team of engineers and scientists, learning from each other and collaborating to create the best models.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Develop data collection, evaluation, and post-training methods for models.</li>
<li>Design hypotheses and experiment plans for rapidly iterating on model performance.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>Bachelor’s Degree in Computer Science, Machine Learning, Mathematics, or related technical discipline AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Experience with reward modeling, RL, or other post-training techniques.</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Passionate about advancing the state of post-training research.</li>
<li>Willing to contribute meaningfully as an individual and take end-to-end ownership of projects.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Competitive salary range: $119,800 - $234,700 per year.</li>
<li>Comprehensive benefits package, including health insurance, retirement plan, and paid time off.</li>
<li>Opportunities for professional growth and development.</li>
<li>Collaborative and inclusive work environment.</li>
<li>Access to cutting-edge technology and resources.</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$119,800 - $234,700 per year</Salaryrange>
      <Skills>reward modeling, RL, post-training techniques, C, C++, C#, Java, JavaScript, Python, conversational AI, deployment</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft AI is a leading technology company that specializes in developing artificial intelligence (AI) and machine learning (ML) solutions. They are known for their cutting-edge research and development in AI, and their products are used by millions of users worldwide. Microsoft AI is committed to empowering every person and every organization on the planet to achieve more.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/member-of-technical-staff-post-training-mai-superintelligence-team-2/</Applyto>
      <Location>Redmond</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>99319c10-68b</externalid>
      <Title>Member of Technical Staff - Post Training - MAI Superintelligence Team</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft AI is looking for a talented Member of Technical Staff - Post Training - MAI Superintelligence Team at their Mountain View office. This role sits at the heart of post-training, improving pre-trained models to advance the state of the art on a wide variety of internal and external benchmarks. You&#39;ll work on the bleeding edge and leverage the most powerful pretrained models and algorithms for your needs.</p>
<p><strong>About the Role</strong></p>
<p>This role involves contributions to all stages of the post-training process: driving data collection and acquisition, building evaluations of model capabilities, and applying advanced reward modeling and RL techniques to develop and improve the post-training recipe. You will design hypotheses and experiment plans for rapidly iterating on model performance.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Develop data collection, evaluation, and post-training methods for models.</li>
<li>Design hypotheses and experiment plans for rapidly iterating on model performance.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>Bachelor’s Degree in Computer Science, Machine Learning, Mathematics, or related technical discipline AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Experience with reward modeling, RL, or other post-training techniques.</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Passionate about advancing the state of post-training research.</li>
<li>Will thrive in a highly collaborative, fast-paced environment.</li>
</ul>
<p><strong>Additional Information</strong></p>
<ul>
<li>Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location.</li>
<li>The Microsoft Superintelligence team&#39;s mission is to empower every person and every organization on the planet to achieve more.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>USD $119,800 – </Salaryrange>
      <Skills>reward modeling, RL, post-training techniques, C, C++, C#, Java, JavaScript, Python, conversational AI, large-scale AI</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft AI is a leading technology company that specializes in developing cutting-edge algorithms for post-training large language models. They aim to push the boundaries of AI toward Humanist Superintelligence—ultra-capable systems that remain controllable, safety-aligned, and anchored to human values.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/member-of-technical-staff-post-training-mai-superintelligence-team/</Applyto>
      <Location>Mountain View</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
  </jobs>
</source>