{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/long-context"},"x-facet":{"type":"skill","slug":"long-context","display":"Long Context","count":4},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_dc6154f8-cff"},"title":"Research Engineer, Pretraining Scaling - London","description":"<p>About Anthropic\\n\\nAnthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems.\\n\\nAbout the Role:\\n\\nAs a Research Engineer on Anthropic&#39;s ML Performance and Scaling team, you&#39;ll ensure our frontier models train reliably, efficiently, and at scale. 
This is demanding, high-impact work that requires both deep technical expertise and a genuine passion for the craft of large-scale ML systems.\\n\\nResponsibilities:\\n\\n- Own critical aspects of our production pretraining pipeline, including model operations, performance optimization, observability, and reliability\\n- Debug and resolve complex issues across the full stack, from hardware errors and networking to training dynamics and evaluation infrastructure\\n- Design and run experiments to improve training efficiency, reduce step time, increase uptime, and enhance model performance\\n- Respond to on-call incidents during model launches, diagnosing problems quickly and coordinating solutions across teams\\n- Build and maintain production logging, monitoring dashboards, and evaluation infrastructure\\n- Add new capabilities to the training codebase, such as long context support or novel architectures\\n- Collaborate closely with teammates across SF and London, as well as with Tokens, Architectures, and Systems teams\\n- Contribute to the team&#39;s institutional knowledge by documenting systems, debugging approaches, and lessons learned\\n\\nYou May Be a Good Fit If You:\\n\\n- Have hands-on experience training large language models, or deep expertise with JAX, TPU, PyTorch, or large-scale distributed systems\\n- Genuinely enjoy both research and engineering work; you&#39;d describe your ideal split as roughly 50/50 rather than heavily weighted toward one or the other\\n- Are excited about being on-call for production systems, working long days during launches, and solving hard problems under pressure\\n- Thrive when working on whatever is most impactful, even if that changes day-to-day based on what the production model needs\\n- Excel at debugging complex, ambiguous problems across multiple layers of the stack\\n- Communicate clearly and collaborate effectively, especially when coordinating across time zones or during high-stress incidents\\n- Are passionate 
about the work itself and want to refine your craft as a research engineer\\n- Care about the societal impacts of AI and responsible scaling\\n\\nStrong Candidates May Also Have:\\n\\n- Previous experience training LLMs or working extensively with JAX/TPU, PyTorch, or other ML frameworks at scale\\n- Contributed to open-source LLM frameworks (e.g., open_lm, llm-foundry, mesh-transformer-jax)\\n- Published research on model training, scaling laws, or ML systems\\n- Experience with production ML systems, observability tools, or evaluation infrastructure\\n- Background as a systems engineer, quant, or in other roles requiring both technical depth and operational excellence\\n\\nWhat Makes This Role Unique:\\n\\nThis is not a typical research engineering role. The work is highly operational: you&#39;ll be deeply involved in keeping our production models training smoothly, which means being responsive to incidents, flexible about priorities, and comfortable with uncertainty. During launches, the team often works extended hours and may need to respond to issues on evenings and weekends.\\n\\nHowever, this operational intensity comes with extraordinary learning opportunities. You&#39;ll gain hands-on experience with some of the largest, most sophisticated training runs in the industry. You&#39;ll work alongside world-class researchers and engineers, and the institutional knowledge you build will compound in ways that can&#39;t be easily transferred. For people who thrive on this type of work, it&#39;s uniquely rewarding.\\n\\nWe&#39;re building a close-knit team of people who genuinely care about doing excellent work together. If you&#39;re someone who wants to be part of training the models that will define the future of AI, and you&#39;re excited about the full reality of what that entails, we&#39;d love to hear from you.\\n\\nLocation:\\n\\nThis role requires working in-office 5 days per week in London.\\n\\nDeadline to apply:\\n\\nNone. 
Applications will be reviewed on a rolling basis.\\n\\nThe annual compensation range for this role is listed below.\\n\\nFor sales roles, the range provided is the role’s On Target Earnings (&quot;OTE&quot;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.\\n\\nAnnual Salary:\\n\\n£260,000-£630,000 GBP\\n\\nLogistics\\n\\nMinimum education:\\n\\nBachelor’s degree or an equivalent combination of education, training, and/or experience\\n\\nRequired field of study:\\n\\nA field relevant to the role as demonstrated through coursework, training, or professional experience\\n\\nMinimum years of experience:\\n\\nYears of experience required will correlate with the internal job level requirements for the position\\n\\nLocation-based hybrid policy:\\n\\nCurrently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.\\n\\nVisa sponsorship:\\n\\nWe do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.\\n\\nWe encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work. We think AI systems like the ones we&#39;re building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.\\n\\nYour safety matters to us. 
To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links; visit anthropic.com/careers directly for confirmed position openings.\\n\\nHow we&#39;re different\\n\\nWe believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact, advancing our long-term goals of steerable, trustworthy AI, rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. 
We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the h</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_dc6154f8-cff","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4938436008","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£260,000-£630,000 GBP","x-skills-required":["JAX","TPU","PyTorch","large-scale distributed systems","model operations","performance optimization","observability","reliability","debugging","complex issues","hardware errors","networking","training dynamics","evaluation infrastructure","experiments","training efficiency","step time","uptime","model performance","production logging","monitoring dashboards","codebase","long context support","novel architectures","collaboration","institutional knowledge","documentation","debugging approaches","lessons learned"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:42:55.023Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"JAX, TPU, PyTorch, large-scale distributed systems, model operations, performance optimization, observability, reliability, debugging, complex issues, hardware errors, networking, training dynamics, evaluation infrastructure, experiments, training efficiency, step time, uptime, model performance, production logging, monitoring dashboards, codebase, long context support, novel architectures, collaboration, institutional knowledge, documentation, debugging approaches, lessons 
learned","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":260000,"maxValue":630000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_6960fd5f-0e8"},"title":"Research Engineer, Pretraining Scaling","description":"<p><strong>About the Role:\\n\\nAs a Research Engineer on Anthropic&#39;s ML Performance and Scaling team, you&#39;ll ensure our frontier models train reliably, efficiently, and at scale. This is demanding, high-impact work that requires both deep technical expertise and a genuine passion for the craft of large-scale ML systems.\\n\\n## Responsibilities:\\n\\n- Own critical aspects of our production pretraining pipeline, including model operations, performance optimization, observability, and reliability\\n- Debug and resolve complex issues across the full stack, from hardware errors and networking to training dynamics and evaluation infrastructure\\n- Design and run experiments to improve training efficiency, reduce step time, increase uptime, and enhance model performance\\n- Respond to on-call incidents during model launches, diagnosing problems quickly and coordinating solutions across teams\\n- Build and maintain production logging, monitoring dashboards, and evaluation infrastructure\\n- Add new capabilities to the training codebase, such as long context support or novel architectures\\n- Collaborate closely with teammates across SF and London, as well as with Tokens, Architectures, and Systems teams\\n- Contribute to the team&#39;s institutional knowledge by documenting systems, debugging approaches, and lessons learned\\n\\n## You May Be a Good Fit If You:\\n\\n- Have hands-on experience training large language models, or deep expertise with JAX, TPU, PyTorch, or large-scale distributed systems\\n- Genuinely enjoy both research and engineering work; you&#39;d describe your ideal split as roughly 50/50 
rather than heavily weighted toward one or the other\\n- Are excited about being on-call for production systems, working long days during launches, and solving hard problems under pressure\\n- Thrive when working on whatever is most impactful, even if that changes day-to-day based on what the production model needs\\n- Excel at debugging complex, ambiguous problems across multiple layers of the stack\\n- Communicate clearly and collaborate effectively, especially when coordinating across time zones or during high-stress incidents\\n- Are passionate about the work itself and want to refine your craft as a research engineer\\n- Care about the societal impacts of AI and responsible scaling\\n\\n## Strong Candidates May Also Have:\\n\\n- Previous experience training LLMs or working extensively with JAX/TPU, PyTorch, or other ML frameworks at scale\\n- Contributed to open-source LLM frameworks (e.g., open_lm, llm-foundry, mesh-transformer-jax)\\n- Published research on model training, scaling laws, or ML systems\\n- Experience with production ML systems, observability tools, or evaluation infrastructure\\n- Background as a systems engineer, quant, or in other roles requiring both technical depth and operational excellence\\n\\n## What Makes This Role Unique:\\n\\nThis is not a typical research engineering role. The work is highly operational: you&#39;ll be deeply involved in keeping our production models training smoothly, which means being responsive to incidents, flexible about priorities, and comfortable with uncertainty. During launches, the team often works extended hours and may need to respond to issues on evenings and weekends.\\n\\nHowever, this operational intensity comes with extraordinary learning opportunities. You&#39;ll gain hands-on experience with some of the largest, most sophisticated training runs in the industry. 
You&#39;ll work alongside world-class researchers and engineers, and the institutional knowledge you build will compound in ways that can&#39;t be easily transferred. For people who thrive on this type of work, it&#39;s uniquely rewarding.\\n\\nWe&#39;re building a close-knit team of people who genuinely care about doing excellent work together. If you&#39;re someone who wants to be part of training the models that will define the future of AI, and you&#39;re excited about the full reality of what that entails, we&#39;d love to hear from you.\\n\\nLocation: This role requires working in-office 5 days per week in San Francisco.\\n\\nDeadline to apply: None. Applications will be reviewed on a rolling basis.\\n\\nThe annual compensation range for this role is listed below.\\n\\nFor sales roles, the range provided is the role’s On Target Earnings (&quot;OTE&quot;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.\\n\\nAnnual Salary: $350,000-$850,000 USD\\n\\n## Logistics\\n\\nMinimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience\\n\\nRequired field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience\\n\\nMinimum years of experience: Years of experience required will correlate with the internal job level requirements for the position\\n\\nLocation-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.\\n\\nVisa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.\\n\\nWe encourage you to apply even if you do not believe you meet every single qualification. 
Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work. We think AI systems like the ones we&#39;re building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.\\n\\nYour safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links; visit anthropic.com/careers directly for confirmed position openings.\\n\\n## How we&#39;re different\\n\\nWe believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact, advancing our long-term goals of steerable, trustworthy AI, rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. 
We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing</strong></p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_6960fd5f-0e8","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4938432008","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000-$850,000 USD","x-skills-required":["JAX","TPU","PyTorch","large-scale distributed systems","model operations","performance optimization","observability","reliability","debugging","complex issues","hardware errors","networking","training dynamics","evaluation infrastructure","experiments","training efficiency","step time","uptime","model performance","production logging","monitoring dashboards","new capabilities","long context support","novel architectures","collaboration","institutional knowledge","documentation","debugging approaches","lessons learned"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:42:31.268Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"JAX, TPU, PyTorch, large-scale distributed systems, model operations, performance optimization, observability, reliability, debugging, complex issues, hardware errors, networking, training dynamics, evaluation infrastructure, experiments, training efficiency, step time, uptime, model performance, production logging, monitoring dashboards, new capabilities, long context support, novel architectures, collaboration, institutional knowledge, documentation, debugging approaches, lessons 
learned","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a938a934-817"},"title":"Software Engineer, Applied Evals","description":"<p><strong>Software Engineer, Applied Evals</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Location Type</strong></p>\n<p>Hybrid</p>\n<p><strong>Department</strong></p>\n<p>Applied AI</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$230K – $325K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by 
applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the team</strong></p>\n<p>Applied Evals defines what good looks like for safe, advanced AI systems. We turn complex, high-value workflows into clear, reproducible signals that guide model training and product quality. Our work bridges frontier customers and models, ensuring improvements show up where users experience them. We combine hands-on, unscalable efforts with systems that others can extend, creating a compounding loop of model improvement.</p>\n<p><strong>About the Role</strong></p>\n<p>We’re hiring product-minded engineers to design and build evals and harnesses that capture real-world quality for advanced AI systems. You’ll own the loop from prototyping with users to building reliable pipelines and integrating signals into training stacks. This role sits at the center of model improvement. The systems you design will directly shape how models behave, accelerate their reliability, and raise the standard for what customers expect.</p>\n<p>You’ll collaborate closely with research and product teams and work across the stack, from backend pipelines to user-facing interfaces. 
The work includes evaluating multi-turn and tool-using systems, designing agent harnesses, and applying reinforcement learning and related methods in production settings. Engineers who succeed in this role bring both a builder’s mindset and the judgment to create reusable systems that others can build on. Many thrive here by operating like founders or founding engineers, taking initiative, moving quickly, and creating structure where none exists.</p>\n<p>This role is based in our San Francisco HQ. We use a hybrid work model of 3 days in the office per week and offer relocation assistance.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Define the core evaluation signals that drive model improvement at OpenAI, turning vague product gaps into crisp, defensible measures of quality</li>\n</ul>\n<ul>\n<li>Design agents, harnesses, and eval pipelines that are reliable, reproducible, and extendable</li>\n</ul>\n<ul>\n<li>Prototype solutions with real workflows and convert them into scalable feedback loops</li>\n</ul>\n<ul>\n<li>Connect evaluation signals directly to research and training systems so product improvements show up in what users experience</li>\n</ul>\n<ul>\n<li>Shape model interaction paradigms by partnering with engineering, research, and product teams on how models are deployed and measured</li>\n</ul>\n<ul>\n<li>Build reusable systems and tools that enable contributions from across the company and steadily raise the quality bar</li>\n</ul>\n<p><strong>You’ll thrive in this role if you:</strong></p>\n<ul>\n<li>Bring 4+ years of experience in software engineering with strong fundamentals and a track record of shipping production systems end-to-end</li>\n</ul>\n<ul>\n<li>Have experience building AI agents or applications, including designing evals and improving performance through prompting or scaffolding</li>\n</ul>\n<ul>\n<li>Are familiar with evaluation methods for LLMs and have worked with patterns like multi-agent workflows, tool use, or 
long context.</li>\n</ul>\n<ul>\n<li>Are familiar with deep learning concepts or have prior exposure to training models.</li>\n</ul>\n<ul>\n<li>Communicate clearly across technical and non-technical audiences across levels</li>\n</ul>\n<ul>\n<li>Are motivated by high-impact collaboration with research and product teams and thrive in ambiguity</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a938a934-817","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/99121e6d-a542-4881-968f-4cd89d9f583c","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$230K – $325K","x-skills-required":["Software engineering","AI agents or applications","Evaluation methods for LLMs","Deep learning concepts","Training models"],"x-skills-preferred":["Reinforcement learning","Multi-agent workflows","Tool use","Long context"],"datePosted":"2026-03-06T18:26:55.038Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Software engineering, AI agents or 
applications, Evaluation methods for LLMs, Deep learning concepts, Training models, Reinforcement learning, Multi-agent workflows, Tool use, Long context","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":230000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c224e1d4-cc6"},"title":"Backend Software Engineer (Evals)","description":"<p><strong>Location</strong></p>\n<p>San Francisco; Seattle</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Applied AI</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$230K – $385K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the Team</strong></p>\n<p>The Support Automation team at OpenAI scales the organization by 
applying cutting-edge AI models to real-world challenges, automating and enhancing work across the organization. From customer operations to engineering, we develop an ecosystem of automation products that empower our colleagues and drive impact. We&#39;re passionate about crafting products that serve those around us, blending rapid prototyping with a focus on long-term quality and reliability. By creating reusable solutions, we create patterns that can be applied across diverse domains within OpenAI.</p>\n<p>TLDR: this team leverages OpenAI technology to improve OpenAI, and you’ll have the opportunity to leverage the full extent of our tech (both public and pre-released) to accomplish this mission.</p>\n<p><strong>About the Role</strong></p>\n<p>We’re looking for a <strong>Backend Software Engineer</strong> with experience working in ML/LLM-heavy domains to help design and build evals infrastructure that measures the quality of OpenAI’s support automation. This is a deeply technical and highly cross-functional role where you’ll build robust systems and backend services that serve as the foundation for how knowledge is created, accessed, and applied across OpenAI. 
The role will especially focus on working closely with Data Science and Research partners to design and build evals at scale.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Design eval pipelines that are reliable, reproducible, and extendable</li>\n</ul>\n<ul>\n<li>Build the infrastructure for continuous eval monitoring frameworks (regression/drift monitoring, building robust golden datasets) along with feedback loops that ultimately strengthen support automation</li>\n</ul>\n<ul>\n<li>Design, build, and maintain backend services and APIs to support intelligent automation and knowledge systems</li>\n</ul>\n<ul>\n<li>Integrate and structure data across internal platforms, transforming it into formats optimized for use by downstream systems and AI workflows.</li>\n</ul>\n<ul>\n<li>Collaborate closely with data, research, and engineering teams to integrate OpenAI models into high-leverage workflows</li>\n</ul>\n<ul>\n<li>Own the full development lifecycle of new backend systems and internal platform capabilities</li>\n</ul>\n<ul>\n<li>Build with scale and maintainability in mind, while rapidly iterating on new ideas</li>\n</ul>\n<p><strong>You might be a great fit if you have:</strong></p>\n<ul>\n<li>4+ years of backend engineering experience at product-driven companies (excluding internships)</li>\n</ul>\n<ul>\n<li>Proficiency in backend technologies. 
Our tech stack includes Python, FastAPI, and Postgres</li>\n</ul>\n<ul>\n<li>Experience designing and scaling distributed systems, APIs, or data processing pipelines</li>\n</ul>\n<ul>\n<li>Experience building AI agents or applications, including designing evals and improving performance through prompting or scaffolding</li>\n</ul>\n<ul>\n<li>Familiarity with evaluation methods for LLMs and experience with patterns like multi-agent workflows, tool use, or long context</li>\n</ul>\n<ul>\n<li>Experience creating production evals and/or measuring performance of ML/LLM models at scale</li>\n</ul>\n<ul>\n<li>A pragmatic mindset. You’re comfortable shipping iteratively while building toward a long-term vision</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_c224e1d4-cc6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/3d064454-c0c3-4225-bc2c-6d8c0f8735b2","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$230K – $385K","x-skills-required":["backend engineering","Python","FastAPI","Postgres","distributed systems","APIs","data processing pipelines","AI agents","evaluation methods for LLMs"],"x-skills-preferred":["ML/LLM-heavy domains","designing evals","improving performance through prompting or scaffolding","multi-agent workflows","tool use","long context"],"datePosted":"2026-03-06T18:19:49.073Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco; Seattle"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"backend engineering, Python, FastAPI, Postgres, distributed systems, APIs, data processing pipelines, AI agents, evaluation methods for LLMs, ML/LLM-heavy domains, designing evals, improving performance through prompting or scaffolding, multi-agent workflows, tool use, long context","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":230000,"maxValue":385000,"unitText":"YEAR"}}}]}