<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>3f9344a5-f6e</externalid>
      <Title>[Expression of Interest] Research Scientist / Engineer, Honesty</Title>
      <Description><![CDATA[<p><strong>About the role</strong></p>
<p>As a Research Scientist/Engineer focused on honesty within the Finetuning Alignment team, you&#39;ll spearhead the development of techniques to minimize hallucinations and enhance truthfulness in language models.</p>
<p>You&#39;ll focus on building robust systems that are accurate, reflect their true levels of confidence, and avoid being deceptive or misleading.</p>
<p>This work is critical for ensuring our models maintain high standards of accuracy and honesty across diverse domains.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Design and implement novel data curation pipelines to identify, verify, and filter training data for accuracy given the model’s knowledge</li>
<li>Develop specialized classifiers to detect potential hallucinations or miscalibrated claims made by the model</li>
<li>Create and maintain comprehensive honesty benchmarks and evaluation frameworks</li>
<li>Implement techniques to ground model outputs in verified information, such as search and retrieval-augmented generation (RAG) systems</li>
<li>Design and deploy human feedback collection specifically for identifying and correcting miscalibrated responses</li>
<li>Design and implement prompting pipelines to generate data that improves model accuracy and honesty</li>
<li>Develop and test novel RL environments that reward truthful outputs and penalize fabricated claims</li>
<li>Create tools to help human evaluators efficiently assess model outputs for accuracy</li>
</ul>
<p><strong>Requirements</strong></p>
<ul>
<li>Have an MS/PhD in Computer Science, ML, or a related field</li>
<li>Possess strong programming skills in Python</li>
<li>Have industry experience with language model finetuning and classifier training</li>
<li>Show proficiency in experimental design and statistical analysis for measuring improvements in calibration and accuracy</li>
<li>Care about AI safety and the accuracy and honesty of both current and future AI systems</li>
<li>Have experience in data science or the creation and curation of datasets for finetuning LLMs</li>
<li>Have an understanding of various metrics of uncertainty, calibration, and truthfulness in model outputs</li>
</ul>
<p><strong>Preferred qualifications</strong></p>
<ul>
<li>Published work on hallucination prevention, factual grounding, or knowledge integration in language models</li>
<li>Experience with fact-grounding techniques</li>
<li>Background in developing confidence estimation or calibration methods for ML models</li>
<li>A track record of creating and maintaining factual knowledge bases</li>
<li>Familiarity with RLHF specifically applied to improving model truthfulness</li>
<li>Experience with crowd-sourcing platforms and human feedback collection systems</li>
<li>Experience developing evaluations of model accuracy or hallucinations</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Competitive compensation and benefits</li>
<li>Optional equity donation matching</li>
<li>Generous vacation and parental leave</li>
<li>Flexible working hours</li>
<li>Lovely office space in which to collaborate with colleagues</li>
</ul>
<p><strong>Visa sponsorship</strong></p>
<p>We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. If we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$350,000-$500,000 USD</Salaryrange>
      <Skills>Python, Language model finetuning, Classifier training, Experimental design, Statistical analysis, Data science, Dataset creation, Uncertainty metrics, Calibration methods, Hallucination prevention, Factual grounding, Confidence estimation, Knowledge integration, RLHF, Crowd-sourcing platforms, Human feedback collection, Evaluations of model accuracy</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>350000</Compensationmin>
      <Compensationmax>500000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4532887008</Applyto>
      <Location>New York City, NY; San Francisco, CA</Location>
      <Country>United States</Country>
      <Postedate>2026-04-18</Postedate>
    </job>
  </jobs>
</source>