<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>f28927b0-573</externalid>
      <Title>Machine Learning Systems Research Engineer, Agent Post-training - Enterprise GenAI</Title>
      <Description><![CDATA[<p>At Scale, our mission is to accelerate the development of AI applications. We are building an arsenal of proprietary research and resources that serves all of our enterprise clients. As a Machine Learning Systems Research Engineer, you&#39;ll build the algorithms for our next-gen Agent RL training platform, support large-scale training, and research and integrate state-of-the-art technologies to optimize our ML system.</p>
<p>Your customers will be other MLREs and AAIs on the Enterprise AI team, who take these training algorithms and apply them to client use cases ranging from next-generation AI cybersecurity firewall LLMs to foundation healthtech search models.</p>
<p>If you are excited about shaping the future of the modern AI movement, we would love to hear from you!</p>
<p>Key Responsibilities:</p>
<ul>
<li>Build, profile and optimize our training and inference framework.</li>
<li>Post-train state-of-the-art models, developed both internally and from the community, to define stable post-training recipes for our enterprise engagements.</li>
<li>Collaborate with ML teams to accelerate their research and development, and enable them to develop the next generation of models and data curation.</li>
<li>Create a next-gen agent training algorithm for multi-agent/multi-tool rollouts.</li>
</ul>
<p>Ideal Candidate:</p>
<ul>
<li>1-3 years of experience training LLMs in a production environment.</li>
<li>Passionate about system optimization.</li>
<li>Experience with post-training methods such as RLHF/RLVR and related algorithms such as PPO and GRPO.</li>
<li>Demonstrated understanding of modern GPU cluster architecture and how to operate it.</li>
<li>Experience with multi-node LLM training and inference.</li>
<li>Strong software engineering skills; proficient in frameworks and tools such as CUDA, PyTorch, transformers, and flash attention.</li>
<li>Strong written and verbal communication skills to operate in a cross-functional team environment.</li>
<li>PhD or Master&#39;s degree in Computer Science or a related field.</li>
</ul>
<p>Compensation:</p>
<p>We offer competitive compensation packages, including base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.</p>
<p>Benefits:</p>
<ul>
<li>Comprehensive health, dental and vision coverage.</li>
<li>Retirement benefits.</li>
<li>A learning and development stipend.</li>
<li>Generous PTO.</li>
<li>Commuter stipend.</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$189,600-$237,000 USD</Salaryrange>
      <Skills>LLM training, System optimization, Post-training methods, GPU cluster operation, Multi-node LLM training, Inference, CUDA, PyTorch, Transformers, Flash attention</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Scale</Employername>
      <Employerlogo>https://logos.yubhub.co/scale.com.png</Employerlogo>
      <Employerdescription>Scale is a leading AI data foundry that helps fuel advancements in AI, including generative AI, defense applications, and autonomous vehicles.</Employerdescription>
      <Employerwebsite>https://www.scale.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/scaleai/jobs/4625341005</Applyto>
      <Location>San Francisco, CA; New York, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>539e2a23-ddf</externalid>
      <Title>Tech Lead Manager - MLRE, ML Systems</Title>
      <Description><![CDATA[<p>You will lead the development of our internal distributed framework for large language model training. The platform powers MLEs, researchers, data scientists, and operators for fast and automatic training and evaluation of LLMs. It also serves as the underlying training framework for the data quality evaluation pipeline.</p>
<p>You will work closely with Scale’s ML teams and researchers to build the foundational platform that supports all our ML research and development work. You will be building and optimising the platform to enable our next-generation LLM training, inference and data curation.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Building, profiling and optimising our training and inference framework.</li>
<li>Collaborating with ML and research teams to accelerate their research and development, and enable them to develop the next generation of models and data curation.</li>
<li>Researching and integrating state-of-the-art technologies to optimise our ML system.</li>
</ul>
<p>The ideal candidate will have:</p>
<ul>
<li>A passion for system optimisation.</li>
<li>Experience with multi-node LLM training and inference.</li>
<li>Experience with developing large-scale distributed ML systems.</li>
<li>Experience with post-training methods such as RLHF/RLVR and related algorithms such as PPO and GRPO.</li>
<li>Strong software engineering skills, proficient in frameworks and tools such as CUDA, PyTorch, transformers, flash attention, etc.</li>
</ul>
<p>Nice to haves include demonstrated expertise in post-training methods and/or next-generation use cases for large language models, including instruction tuning, RLHF, tool use, reasoning, agents, and multimodality.</p>
<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$264,800-$331,000 USD</Salaryrange>
      <Skills>system optimisation, multi-node LLM training and inference, large-scale distributed ML systems, post-training methods, software engineering skills, CUDA, PyTorch, transformers, flash attention, next generation use cases for large language models, instruction tuning, RLHF, tool use, reasoning, agents, multimodal</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Scale</Employername>
      <Employerlogo>https://logos.yubhub.co/scale.com.png</Employerlogo>
      <Employerdescription>Scale provides training and evaluation data and end-to-end solutions for the ML lifecycle.</Employerdescription>
      <Employerwebsite>https://scale.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/scaleai/jobs/4618046005</Applyto>
      <Location>San Francisco, CA; New York, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>57a8aa85-77e</externalid>
      <Title>Staff Machine Learning Research Engineer, Agent Post-training - Enterprise GenAI</Title>
      <Description><![CDATA[<p>We are seeking a Staff Machine Learning Research Engineer to join our Enterprise ML Research Lab. As a key member of our team, you will build out our next-gen Agent RL training platform, integrating cutting-edge research into our training stack. You will train state-of-the-art models, design solutions for complex multi-agent systems, and collaborate with our team to deploy use cases ranging from next-generation AI cybersecurity firewall LLMs to foundation healthtech search models.</p>
<p>The ideal candidate will have 5+ years of LLM training experience in a production environment, experience with post-training methods such as RLHF/RLVR and related algorithms such as PPO and GRPO, and publications within the last two years in top conferences such as NeurIPS, ICLR, or ICML. A PhD or Master&#39;s degree in Computer Science or a related field is required.</p>
<p>In addition to a competitive salary, you will receive equity-based compensation, comprehensive health, dental, and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. This role may also be eligible for additional benefits such as a commuter stipend.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$189,600-$237,000 USD</Salaryrange>
      <Skills>LLM training, Post-training methods, RLHF/RLVR, PPO/GRPO, NeurIPS, ICLR, ICML, Computer Science, PhD, Master&apos;s</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Scale</Employername>
      <Employerlogo>https://logos.yubhub.co/scale.com.png</Employerlogo>
      <Employerdescription>Scale is a leading AI data foundry, helping fuel the most exciting advancements in AI, including generative AI, defense applications, and autonomous vehicles.</Employerdescription>
      <Employerwebsite>https://www.scale.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/scaleai/jobs/4625337005</Applyto>
      <Location>San Francisco, CA; New York, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>840bab06-7be</externalid>
      <Title>ML Research Engineer, ML Systems</Title>
      <Description><![CDATA[<p>Job Description:</p>
<p>Scale&#39;s ML platform (RLXF) team builds our internal distributed framework for large language model training and inference. The platform has been powering MLEs, researchers, data scientists and operators for fast and automatic training and evaluation of LLMs, as well as evaluation of data quality.</p>
<p>At Scale, we&#39;re uniquely positioned at the heart of the field of AI as an indispensable provider of training and evaluation data and end-to-end solutions for the ML lifecycle. You will work closely across Scale&#39;s ML teams and researchers to build the foundation platform that supports all our ML research and development. You will be building and optimizing the platform to enable our next generation of LLM training, inference and data curation.</p>
<p>Responsibilities:</p>
<ul>
<li>Build, profile and optimize our training and inference framework</li>
<li>Collaborate with ML teams to accelerate their research and development and enable them to develop the next generation of models and data curation</li>
<li>Research and integrate state-of-the-art technologies to optimize our ML system</li>
</ul>
<p>Ideal Candidate:</p>
<ul>
<li>Strong excitement about system optimization</li>
<li>Experience with multi-node LLM training and inference</li>
<li>Experience with developing large-scale distributed ML systems</li>
<li>Strong software engineering skills, proficient in frameworks and tools such as CUDA, PyTorch, transformers, and flash attention</li>
<li>Strong written and verbal communication skills and the ability to operate in a cross-functional team environment</li>
</ul>
<p>Nice to Have:</p>
<ul>
<li>Demonstrated expertise in post-training methods and/or next-generation use cases for large language models, including instruction tuning, RLHF, tool use, reasoning, agents, and multimodality</li>
</ul>
<p>Compensation Packages:</p>
<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity-based compensation, subject to Board of Directors approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for an equity grant. You&#39;ll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend.</p>
<p>Please note that our policy requires a 90-day waiting period before reconsidering candidates for the same role. This allows us to ensure a fair and thorough evaluation of all applicants.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$189,600-$237,000 USD</Salaryrange>
      <Skills>System Optimization, Multi-node LLM Training and Inference, Large-Scale Distributed ML Systems, CUDA, PyTorch, Transformers, Flash Attention, Post-Training Methods, Next Generation Use Cases for Large Language Models, Instruction Tuning, RLHF, Tool Use, Reasoning, Agents, Multimodal</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Scale</Employername>
      <Employerlogo>https://logos.yubhub.co/scale.com.png</Employerlogo>
      <Employerdescription>Scale develops reliable AI systems for the world&apos;s most important decisions, providing high-quality data and full-stack technologies for leading models.</Employerdescription>
      <Employerwebsite>https://scale.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/scaleai/jobs/4534631005</Applyto>
      <Location>San Francisco, CA; Seattle, WA; New York, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>b2637f59-e14</externalid>
      <Title>Full-Stack Software Engineer, Reinforcement Learning</Title>
      <Description><![CDATA[<p>As a Full-Stack Software Engineer in RL, you&#39;ll build the platforms, tools, and interfaces that power environment creation, data collection, and training observability. The quality of Claude&#39;s next generation depends on the quality of the data we train it on, and the systems you build are what make that data possible. You&#39;ll own product surfaces end-to-end, from backend services and APIs to the web UIs that researchers, external vendors, and thousands of data labelers use every day.</p>
<p>You don&#39;t need a background in ML research. What matters is that you can take an ambiguous, high-stakes problem and ship a polished, reliable product against it, fast. This team moves very quickly. Claude writes a lot of the code we commit, which means the bottleneck isn&#39;t typing; it&#39;s judgment, taste, and the ability to react to what researchers need next.</p>
<p>You&#39;ll iterate on data collection strategies to distill the knowledge of thousands of human experts around the world into our models, and you&#39;ll do it in a loop that closes in hours and days, not quarters or months.</p>
<p>Anthropic&#39;s Reinforcement Learning organization leads the research and development that trains Claude to be capable, reliable, and safe. We&#39;ve contributed to every Claude model, with significant impact on the autonomy and coding capabilities of our most advanced models.</p>
<p>Our work spans teaching models to use computers effectively, advancing code generation through RL, pioneering fundamental RL research for large language models, and building the scalable training methodologies behind our frontier production models.</p>
<p>The RL org is organized around four goals: solving the science of long-horizon tasks and continual learning, scaling RL data and environments to be comprehensive and diverse, automating software engineering end-to-end, and training the frontier production model.</p>
<p>Our engineering teams build the environments, evaluation systems, data pipelines, and tooling that make all of this possible, from realistic agentic training environments and scalable code data generation to human data collection platforms and production training operations.</p>
<p>Responsibilities:</p>
<ul>
<li>Build and extend web platforms for RL environment creation, management, and quality review, including environment configuration, versioning, and validation workflows</li>
<li>Develop vendor-facing interfaces and tooling that let external partners create, submit, and iterate on training environments with minimal friction</li>
<li>Design and implement platforms for human data collection at scale, including labeling workflows, quality assurance systems, and feedback mechanisms that surface reward signal integrity issues early</li>
<li>Build evaluation dashboards and observability UIs that give researchers real-time insight into environment quality, training run health, and reward hacking</li>
<li>Create backend services and APIs that connect environment authoring tools, data collection systems, and RL training infrastructure</li>
<li>Build and expand scalable code data generation pipelines, producing diverse programming tasks with robust reward signals across languages and difficulty levels</li>
<li>Develop onboarding automation and documentation tooling so new vendors and internal users ramp up in hours, not weeks</li>
<li>Partner closely with RL researchers, data operations, and vendor management to translate ambiguous requirements into well-scoped, well-designed products</li>
</ul>
<p>Requirements:</p>
<ul>
<li>Strong software engineering fundamentals and real full-stack range; you&#39;re comfortable owning a surface from database schema to frontend</li>
<li>Proficiency in Python and a modern web stack (React, TypeScript, or similar)</li>
<li>A track record of shipping systems that solved a hard problem, not just shipped on time; e.g. you built the thing that made your team 10x faster, or the internal tool nobody thought was possible</li>
<li>High agency: you identify what needs to be done and drive it forward without waiting for a ticket</li>
<li>A history of wondering &quot;why isn&#39;t this moving faster?&quot; in previous roles, and then doing something about it</li>
<li>Care for UX and the ability to build interfaces that are intuitive for both technical researchers and non-technical labelers</li>
<li>Clear communication with researchers, operations teams, and engineers, and the ability to turn vague asks into well-scoped work</li>
<li>Comfort in a fast-moving environment where priorities shift, Claude is your pair programmer, and the next problem is often one nobody has solved before</li>
<li>Care for Anthropic&#39;s mission to build safe, beneficial AI and a desire for your work to contribute directly to it</li>
</ul>
<p>Nice to Have:</p>
<ul>
<li>Built data collection, labeling, or annotation platforms, ideally ones that had to scale across many vendors or many task types</li>
<li>Background building multi-tenant platforms with role-based access, audit trails, and vendor management workflows</li>
<li>Experience with cloud infrastructure (GCP or AWS), Docker, and CI/CD pipelines</li>
<li>Familiarity with LLM training, fine-tuning, or evaluation workflows</li>
<li>Experience with async Python (Trio, asyncio) or high-throughput API design</li>
<li>Background in dashboards, monitoring, or observability tooling</li>
<li>Experience working directly with external vendors or partners on technical integrations</li>
<li>A background that isn&#39;t a straight line, e.g. math or physics into SWE, competitive programming, research into engineering, or a side project that outgrew its scope</li>
</ul>
<p>Representative Projects:</p>
<ul>
<li>Building a unified platform for human data collection that integrates labeling workflows, vendor management, and QA for complex agentic tasks</li>
<li>Developing vendor onboarding automation that handles Docker registry access, API token management, and environment validation</li>
<li>Creating evaluation and observability dashboards that catch reward hacks, measure environment difficulty, and give real-time feedback during production training</li>
<li>Building environment quality review workflows that let researchers browse, grade, and provide feedback on training environments</li>
<li>Developing automated environment quality pipelines that validate correctness and difficulty calibration before environments hit production training</li>
<li>Building internal tools for browsing and analyzing training run results, environment statistics, and data collection progress</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$300,000-$405,000 USD</Salaryrange>
      <Skills>Python, Modern web stack, React, TypeScript, Strong software engineering fundamentals, Full-stack range, Database schema, Frontend, Cloud infrastructure, Docker, CI/CD pipelines, LLM training, Fine-tuning, Evaluation workflows, Async Python, High-throughput API design, Dashboards, Monitoring, Observability tooling, Data collection, Labeling, Annotation platforms, Multi-tenant platforms, Role-based access, Audit trails, Vendor management workflows</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a company working on developing artificial intelligence systems. It has a quickly growing team of researchers, engineers, and business leaders.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5186067008</Applyto>
      <Location>San Francisco, CA | New York City, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>c9ab5cbc-dd6</externalid>
      <Title>Research Engineer, Performance RL</Title>
      <Description><![CDATA[<p>We&#39;re hiring a Research Engineer to join our Code RL team within the RL organization. As a Research Engineer, you&#39;ll advance our models&#39; ability to safely write correct, fast code for accelerators.</p>
<p>You&#39;ll need a deep understanding of accelerator performance in order to turn it into tasks and signals models can learn from. Specifically, you will:</p>
<ul>
<li>Invent, design and implement RL environments and evaluations.</li>
<li>Conduct experiments and shape our research roadmap.</li>
<li>Deliver your work into training runs.</li>
<li>Collaborate with other researchers, engineers, and performance engineering specialists across and outside Anthropic.</li>
</ul>
<p>We&#39;re looking for someone with expertise in accelerators (CUDA, ROCm, Triton, Pallas), ML framework programming (JAX or PyTorch), and experience with balancing research exploration with engineering implementation.</p>
<p>Strong candidates may also have experience with reinforcement learning, porting ML workloads between different types of accelerators, and familiarity with LLM training methodologies.</p>
<p>The annual compensation range for this role is $350,000-$850,000 USD.</p>
<p>Please note that we&#39;re an extremely collaborative group, and we value communication skills. The easiest way to understand our research directions is to read our recent research.</p>
<p>We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$350,000-$850,000 USD</Salaryrange>
      <Skills>accelerator performance, ML framework programming, reinforcement learning, RL environments and evaluations, experiments and research roadmap, training runs, collaboration with researchers and engineers, CUDA, ROCm, Triton, Pallas, JAX, PyTorch, LLM training methodologies</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that focuses on creating reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5160330008</Applyto>
      <Location>San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>faffcca4-e94</externalid>
      <Title>Research Engineer, Cybersecurity Reinforcement Learning</Title>
      <Description><![CDATA[<p>About the role</p>
<p>We&#39;re hiring for the Cybersecurity RL team within Horizons. As a Research Engineer, you&#39;ll help to safely advance the capabilities of our models in secure coding, vulnerability remediation, and other areas of defensive cybersecurity.</p>
<p>This role blends research and engineering, requiring you to both develop novel approaches and realize them in code. Your work will include designing and implementing RL environments, conducting experiments and evaluations, delivering your work into production training runs, and collaborating with other researchers, engineers, and cybersecurity specialists across and outside Anthropic.</p>
<p>The role requires domain expertise in cybersecurity paired with interest or experience in training safe AI models. For example, you might be a white hat hacker who&#39;s curious about how LLMs could augment or transform your work, a security engineer interested in how AI could help harden systems at scale, or a detection and response professional wondering how models could enhance defensive workflows.</p>
<p>Responsibilities</p>
<ul>
<li>Design and implement RL environments for secure coding and vulnerability remediation</li>
<li>Conduct experiments and evaluations to assess the effectiveness of our models</li>
<li>Deliver your work into production training runs to advance the capabilities of our models</li>
<li>Collaborate with other researchers, engineers, and cybersecurity specialists across and outside Anthropic</li>
</ul>
<p>Requirements</p>
<ul>
<li>Experience in cybersecurity research</li>
<li>Experience with machine learning</li>
<li>Strong software engineering skills</li>
<li>Ability to balance research exploration with engineering implementation</li>
<li>Passion for AI&#39;s potential and commitment to developing safe and beneficial systems</li>
</ul>
<p>Strong candidates may also have:</p>
<ul>
<li>Professional experience in security engineering, fuzzing, detection and response, or other applied defensive work</li>
<li>Experience participating in or building CTF competitions and cyber ranges</li>
<li>Academic research experience in cybersecurity</li>
<li>Familiarity with RL techniques and environments</li>
<li>Familiarity with LLM training methodologies</li>
</ul>
<p>Logistics</p>
<ul>
<li>Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience</li>
<li>Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience</li>
<li>Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position</li>
<li>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>
<li>Visa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>
</ul>
<p>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>
<p>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links; visit anthropic.com/careers directly for confirmed position openings.</p>
<p>How we&#39;re different</p>
<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact, advancing our long-term goals of steerable, trustworthy AI, rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>
<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI &amp; Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.</p>
<p>Come work with us!</p>
<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$300,000-$405,000 USD</Salaryrange>
      <Skills>cybersecurity research, machine learning, software engineering, research exploration, engineering implementation, security engineering, fuzzing, detection and response, RL techniques, LLM training methodologies</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5025624008</Applyto>
      <Location>San Francisco, CA | New York City, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>157be224-49f</externalid>
      <Title>Machine Learning Systems Engineer, RL Engineering</Title>
      <Description><![CDATA[<p>About the role:</p>
<p>You want to build the cutting-edge systems that train AI models like Claude. You&#39;re excited to work at the frontier of machine learning, implementing and improving advanced techniques to create ever more capable, reliable and steerable AI.</p>
<p>As an ML Systems Engineer on our Reinforcement Learning Engineering team, you&#39;ll be responsible for the critical algorithms and infrastructure that our researchers depend on to train models. Your work will directly enable breakthroughs in AI capabilities and safety.</p>
<p>You&#39;ll focus obsessively on improving the performance, robustness, and usability of these systems so our research can progress as quickly as possible.</p>
<p>Our finetuning researchers train our production Claude models, and internal research models, using RLHF and other related methods. Your job will be to build, maintain, and improve the algorithms and systems that these researchers use to train models.</p>
<p>You&#39;ll be responsible for improving the speed, reliability, and ease-of-use of these systems.</p>
<p>Strong candidates may also have experience with:</p>
<ul>
<li>High performance, large scale distributed systems</li>
<li>Large scale LLM training</li>
<li>Python</li>
<li>Implementing LLM finetuning algorithms, such as RLHF</li>
</ul>
<p>Representative projects:</p>
<ul>
<li>Profiling our reinforcement learning pipeline to find opportunities for improvement</li>
<li>Building a system that regularly launches training jobs in a test environment so that we can quickly detect problems in the training pipeline</li>
<li>Making changes to our finetuning systems so they work on new model architectures</li>
<li>Building instrumentation to detect and eliminate Python GIL contention in our training code</li>
<li>Diagnosing why training runs have started slowing down after some number of steps, and fixing it</li>
<li>Implementing a stable, fast version of a new training algorithm proposed by a researcher</li>
</ul>
<p>Deadline to apply: None. Applications will be reviewed on a rolling basis.</p>
<p>The annual compensation range for this role is $500,000-$850,000 USD.</p>
<p>Logistics:</p>
<ul>
<li><strong>Minimum education:</strong> Bachelor’s degree or an equivalent combination of education, training, and/or experience</li>
<li><strong>Required field of study:</strong> A field relevant to the role as demonstrated through coursework, training, or professional experience</li>
<li><strong>Minimum years of experience:</strong> Years of experience required will correlate with the internal job level requirements for the position</li>
<li><strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>
<li><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. If we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>
</ul>
<p>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed.</p>
<p>Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>
<p>We think AI systems like the ones we&#39;re building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.</p>
<p>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links; visit anthropic.com/careers directly for confirmed position openings.</p>
<p>How we&#39;re different:</p>
<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact, advancing our long-term goals of steerable, trustworthy AI, rather than work on smaller and more specific puzzles.</p>
<p>We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time.</p>
<p>As such, we greatly value communication skills.</p>
<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI &amp; Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.</p>
<p>Come work with us!</p>
<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>
<p>Guidance on Candidates&#39; AI Usage: Learn about our policy for using AI in our application process</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$500,000-$850,000 USD</Salaryrange>
      <Skills>High performance, large scale distributed systems, Large scale LLM training, Python, Implementing LLM finetuning algorithms, such as RLHF</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4952051008</Applyto>
      <Location>San Francisco, CA | New York City, NY | Seattle, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>1507524b-770</externalid>
      <Title>Research Engineer, Performance RL</Title>
      <Description><![CDATA[<p>We&#39;re hiring a Research Engineer to join our Code RL team within the RL organization. As a Research Engineer, you&#39;ll advance our models&#39; ability to safely write correct, fast code for accelerators.</p>
<p>You&#39;ll need to know accelerator performance well to turn it into tasks and signals models can learn from. Specifically, you will:</p>
<ul>
<li>Invent, design and implement RL environments and evaluations.</li>
<li>Conduct experiments and shape our research roadmap.</li>
<li>Deliver your work into training runs.</li>
<li>Collaborate with other researchers, engineers, and performance engineering specialists across and outside Anthropic.</li>
</ul>
<p>You may be a good fit if you:</p>
<ul>
<li>Have expertise with accelerators (CUDA, ROCm, Triton, Pallas), ML framework programming (JAX or PyTorch).</li>
<li>Have worked across the stack – kernels, model code, distributed systems.</li>
<li>Know how to balance research exploration with engineering implementation.</li>
<li>Are passionate about AI&#39;s potential and committed to developing safe and beneficial systems.</li>
</ul>
<p>Strong candidates may also have:</p>
<ul>
<li>Experience with reinforcement learning.</li>
<li>Experience porting ML workloads between different types of accelerators.</li>
<li>Familiarity with LLM training methodologies.</li>
</ul>
<p>The annual compensation range for this role is $350,000-$850,000 USD.</p>
<p>We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>
<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact, advancing our long-term goals of steerable, trustworthy AI, rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science.</p>
<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$350,000-$850,000 USD</Salaryrange>
      <Skills>accelerators, ML framework programming, distributed systems, reinforcement learning, LLM training methodologies, CUDA, ROCm, Triton, Pallas, JAX, PyTorch</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5160330008</Applyto>
      <Location>San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>76fd624c-e23</externalid>
      <Title>Full-Stack Software Engineer, Reinforcement Learning</Title>
      <Description><![CDATA[<p>As a Full-Stack Software Engineer in RL, you&#39;ll build the platforms, tools, and interfaces that power environment creation, data collection, and training observability. The quality of Claude&#39;s next generation depends on the quality of the data we train it on , and the systems you build are what make that data possible. You&#39;ll own product surfaces end-to-end , from backend services and APIs to the web UIs that researchers, external vendors, and thousands of data labelers use every day. You don&#39;t need a background in ML research. What matters is that you can take an ambiguous, high-stakes problem and ship a polished, reliable product against it, fast.</p>
<p>This team moves very quickly. Claude writes a lot of the code we commit, which means the bottleneck isn&#39;t typing; it&#39;s judgment, taste, and the ability to react to what researchers need next. You&#39;ll iterate on data collection strategies to distill the knowledge of thousands of human experts around the world into our models, and you&#39;ll do it in a loop that closes in hours and days, not quarters or months.</p>
<p>Our work spans teaching models to use computers effectively, advancing code generation through RL, pioneering fundamental RL research for large language models, and building the scalable training methodologies behind our frontier production models. The RL org is organized around four goals: solving the science of long-horizon tasks and continual learning, scaling RL data and environments to be comprehensive and diverse, automating software engineering end-to-end, and training the frontier production model.</p>
<p>Our engineering teams build the environments, evaluation systems, data pipelines, and tooling that make all of this possible, from realistic agentic training environments and scalable code data generation to human data collection platforms and production training operations.</p>
<p>Responsibilities:</p>
<ul>
<li>Build and extend web platforms for RL environment creation, management, and quality review, including environment configuration, versioning, and validation workflows</li>
<li>Develop vendor-facing interfaces and tooling that let external partners create, submit, and iterate on training environments with minimal friction</li>
<li>Design and implement platforms for human data collection at scale, including labeling workflows, quality assurance systems, and feedback mechanisms that surface reward signal integrity issues early</li>
<li>Build evaluation dashboards and observability UIs that give researchers real-time insight into environment quality, training run health, and reward hacking</li>
<li>Create backend services and APIs that connect environment authoring tools, data collection systems, and RL training infrastructure</li>
<li>Build and expand scalable code data generation pipelines, producing diverse programming tasks with robust reward signals across languages and difficulty levels</li>
<li>Develop onboarding automation and documentation tooling so new vendors and internal users ramp up in hours, not weeks</li>
<li>Partner closely with RL researchers, data operations, and vendor management to translate ambiguous requirements into well-scoped, well-designed products</li>
</ul>
<p>You May Be a Good Fit If You:</p>
<ul>
<li>Have strong software engineering fundamentals and real full-stack range: you&#39;re comfortable owning a surface from database schema to frontend</li>
<li>Are proficient in Python and a modern web stack (React, TypeScript, or similar)</li>
<li>Have a track record of shipping systems that solved a hard problem, not just shipped on time, e.g. you built the thing that made your team 10x faster, or the internal tool nobody thought was possible</li>
<li>Operate with high agency: you identify what needs to be done and drive it forward without waiting for a ticket</li>
<li>Have found yourself wondering &quot;why isn&#39;t this moving faster?&quot; in previous roles, and then have done something about it</li>
<li>Care about UX and can build interfaces that are intuitive for both technical researchers and non-technical labelers</li>
<li>Communicate clearly with researchers, operations teams, and engineers, and can turn vague asks into well-scoped work</li>
<li>Thrive in a fast-moving environment where priorities shift, Claude is your pair programmer, and the next problem is often one nobody has solved before</li>
<li>Care about Anthropic&#39;s mission to build safe, beneficial AI and want your work to contribute directly to it</li>
</ul>
<p>Strong Candidates May Also Have:</p>
<ul>
<li>Built data collection, labeling, or annotation platforms, ideally ones that had to scale across many vendors or many task types</li>
<li>Background building multi-tenant platforms with role-based access, audit trails, and vendor management workflows</li>
<li>Experience with cloud infrastructure (GCP or AWS), Docker, and CI/CD pipelines</li>
<li>Familiarity with LLM training, fine-tuning, or evaluation workflows</li>
<li>Experience with async Python (Trio, asyncio) or high-throughput API design</li>
<li>Background in dashboards, monitoring, or observability tooling</li>
<li>Experience working directly with external vendors or partners on technical integrations</li>
<li>A background that isn&#39;t a straight line, e.g. math or physics into SWE, competitive programming, research into engineering, or a side project that outgrew its scope</li>
</ul>
<p>Representative Projects:</p>
<ul>
<li>Building a unified platform for human data collection that integrates labeling workflows, vendor management, and QA for complex agentic tasks</li>
<li>Developing vendor onboarding automation that handles Docker registry access, API token management, and environment validation</li>
<li>Creating evaluation and observability dashboards that catch reward hacks, measure environment difficulty, and give real-time feedback during production training</li>
<li>Building environment quality review workflows that let researchers browse, grade, and provide feedback on training environments</li>
<li>Developing automated environment quality pipelines that validate correctness and difficulty calibration before environments hit production training</li>
<li>Building internal tools for browsing and analyzing training run results, environment statistics, and data collection progress</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$300,000-$405,000 USD</Salaryrange>
      <Skills>Python, Modern web stack, React, TypeScript, Cloud infrastructure, Docker, CI/CD pipelines, LLM training, Fine-tuning, Evaluation workflows, Async Python, High-throughput API design, Dashboards, Monitoring, Observability tooling, Data collection, Labeling, Annotation, Multi-tenant platforms, Role-based access, Audit trails, Vendor management workflows</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a company that creates reliable, interpretable, and steerable AI systems. It has a quickly growing team of researchers, engineers, policy experts, and business leaders.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5186067008</Applyto>
      <Location>San Francisco, CA | New York City, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>2907e75d-d4e</externalid>
      <Title>Research Engineer, Frontier Safety Risk Assessment</Title>
      <Description><![CDATA[<p>Job Title: Research Engineer, Frontier Safety Risk Assessment</p>
<p>We are seeking two Research Engineers for the Frontier Safety Risk Assessment team within the AGI Safety and Alignment Team.</p>
<p>As a Research Engineer, you will contribute novel research towards our ability to measure and assess risk from frontier models. This might include:</p>
<ul>
<li>Identifying new risk pathways within current areas (loss of control, ML R&amp;D, cyber, CBRN, harmful manipulation) or in new ones;</li>
<li>Conceiving of, designing, and developing new ways to measure pre-mitigation and post-mitigation risk;</li>
<li>Forecasting and scenario planning for future risks which are not yet material.</li>
</ul>
<p>Your work will involve complex conceptual thinking as well as engineering. You should be comfortable with research that is uncertain, under-constrained, and which does not have an achievable “right answer”. You should also be skilled at engineering, especially using Python, and able to rapidly familiarise yourself with internal and external codebases. Lastly, you should be able to adapt to pragmatic constraints around compute and researcher time that require us to prioritise effort based on the value of information.</p>
<p>Although this job description is written for a Research Engineer, all members of this team are better thought of as members of technical staff. We expect everyone to contribute to the research as well as the engineering and to be strong in both areas.</p>
<p>The role will mostly depend on your general ability to assess and manage future risks, rather than on specialist knowledge within the risk domains; insofar as specialist knowledge is helpful, expertise in ML R&amp;D and loss of control as risk domains is likely the most valuable.</p>
<p>About You</p>
<p>In order to set you up for success as a Research Engineer at Google DeepMind, we look for the following skills and experience:</p>
<ul>
<li>You have extensive research experience with deep learning and/or foundation models (for example, but not necessarily, a PhD in machine learning).</li>
<li>You are adept at generating ideas and designing experiments, and implementing these in Python with real AI systems.</li>
<li>You are keen to address risks from foundation models, and have thought about how to do so. You plan for your research to impact production systems on a timescale between “immediately” and “a few years”.</li>
<li>You are excited to work with strong contributors to make progress towards a shared ambitious goal.</li>
<li>With strong, clear communication skills, you are confident engaging technical stakeholders to share research insights tailored to their background.</li>
</ul>
<p>In addition, any of the following would be an advantage:</p>
<ul>
<li>Experience in areas such as frontier risk assessment and/or mitigations, safety, and alignment.</li>
<li>Engineering experience with LLM training and inference.</li>
<li>PhD in Computer Science or Machine Learning related field.</li>
<li>A track record of publications at venues such as NeurIPS, ICLR, ICML, RL/DL, EMNLP, AAAI and UAI.</li>
<li>Experience with collaborating or leading an applied research project.</li>
</ul>
<p>At Google DeepMind, we value diversity of experience, knowledge, backgrounds and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.</p>
<p>At Google DeepMind, we want employees and their families to live happier and healthier lives, both in and out of work, and our benefits reflect that. Some select benefits we offer: enhanced maternity, paternity, adoption, and shared parental leave; private medical and dental insurance for yourself and any dependents; and flexible working options. We strive to continually improve our working environment, and provide you with excellent facilities such as healthy food, an on-site gym, faith rooms, and terraces.</p>
<p>We are also open to relocating candidates and offer a bespoke service and immigration support to make it as easy as possible (depending on eligibility).</p>
<p>The US base salary range for this full-time position is between $136,000 - $245,000 + bonus + equity + benefits. Your recruiter can share more about the specific salary range for your targeted location during the hiring process.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$136,000 - $245,000 + bonus + equity + benefits</Salaryrange>
      <Skills>Python, Deep learning, Foundation models, Risk assessment, Mitigation, Forecasting, Scenario planning, LLM training and inference, PhD in Computer Science or Machine Learning related field, Track record of publications at venues such as NeurIPS, ICLR, ICML, RL/DL, EMNLP, AAAI and UAI, Experience with collaborating or leading an applied research project</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Google DeepMind</Employername>
      <Employerlogo>https://logos.yubhub.co/deepmind.com.png</Employerlogo>
      <Employerdescription>Google DeepMind is a subsidiary of Alphabet Inc., a multinational conglomerate headquartered in Mountain View, California.</Employerdescription>
      <Employerwebsite>https://deepmind.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/deepmind/jobs/7493360</Applyto>
      <Location>London, UK; New York City, New York, US; San Francisco, California, US</Location>
      <Country></Country>
      <Postedate>2026-03-16</Postedate>
    </job>
    <job>
      <externalid>b151fcc2-2fb</externalid>
      <Title>Member of Technical Staff, High Performance Computing Engineer</Title>
      <Description><![CDATA[<p>We are looking for experienced Member of Technical Staff, High Performance Computing Engineers to help build and scale the infrastructure that trains our frontier models and powers the next evolution of our personal AI, Copilot.</p>
<p>This role offers the unique opportunity to work on some of the largest scale supercomputers in the world – a rare chance to operate at such a significant scale.</p>
<p><strong>Responsibilities</strong></p>
<p>Design, operate, and maintain large-scale HPC environments, drawing on hands-on engineering experience in production settings.</p>
<p>Own the deployment, configuration, and day-to-day operation of HPC schedulers (e.g., SLURM, Kubernetes), ensuring reliable and efficient job scheduling at scale.</p>
<p>Serve as a technical owner for at least one core HPC domain (GPU compute, high-performance storage, networking, or similar), including ongoing maintenance, performance tuning, and troubleshooting of massive clusters.</p>
<p>Develop and maintain automation and tooling using Bash and/or Python to improve cluster reliability, observability, and operational efficiency.</p>
<p>Partner closely with researchers and engineers to support their workloads, troubleshoot cluster usage issues, and triage failed or underperforming jobs to resolution.</p>
<p>Drive work forward independently by navigating ambiguity and technical roadblocks, delivering incremental improvements that get capabilities into users’ hands quickly.</p>
<p><strong>Qualifications</strong></p>
<p>Bachelor&#39;s degree in computer science or a related technical field, AND 4+ years of technical engineering experience deploying or operating on-premise or cloud high-performance clusters, AND 4+ years of experience working with high-scale training clusters (e.g., frameworks/tools such as NVIDIA InfiniBand clusters, SLURM, Kubernetes, Ray), AND 4+ years of experience building scalable services on top of public cloud infrastructure such as Azure, AWS, or GCP, OR equivalent experience.</p>
<p><strong>Preferred Qualifications</strong></p>
<p>Master&#39;s degree in computer science or a related technical field, AND 6+ years of technical engineering experience deploying or operating on-premise or cloud high-performance clusters, AND 6+ years of experience working with high-scale training clusters (e.g., frameworks/tools such as NVIDIA InfiniBand clusters, SLURM, Kubernetes, Ray), AND 6+ years of experience building scalable services on top of public cloud infrastructure such as Azure, AWS, or GCP, OR equivalent experience.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>HPC, SLURM, Kubernetes, GPU compute, high-performance storage, networking, Bash, Python, nvidia InfiniBand clusters, Ray, LLM training clusters, AI platforms, Machine Learning frameworks, large-scale HPC or GPU systems</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft AI is a leading technology company that develops and markets software, services, and solutions for personal and business use. It is one of the largest and most influential technology companies in the world.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/member-of-technical-staff-high-performance-computing-engineer-mai-superintelligence-team-3/</Applyto>
      <Location>Zürich</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>34f86990-9e0</externalid>
      <Title>Machine Learning Systems Engineer, RL Engineering</Title>
      <Description><![CDATA[<p><strong>About the role:</strong></p>
<p>You want to build the cutting-edge systems that train AI models like Claude. You&#39;re excited to work at the frontier of machine learning, implementing and improving advanced techniques to create ever more capable, reliable and steerable AI. As an ML Systems Engineer on our Reinforcement Learning Engineering team, you&#39;ll be responsible for the critical algorithms and infrastructure that our researchers depend on to train models. Your work will directly enable breakthroughs in AI capabilities and safety. You&#39;ll focus obsessively on improving the performance, robustness, and usability of these systems so our research can progress as quickly as possible. You&#39;re energized by the challenge of supporting and empowering our research team in the mission to build beneficial AI systems.</p>
<p>Our finetuning researchers train our production Claude models and internal research models using RLHF and other related methods. Your job will be to build, maintain, and improve the algorithms and systems that these researchers use to train models. You’ll be responsible for improving the speed, reliability, and ease-of-use of these systems.</p>
<p><strong>You may be a good fit if you:</strong></p>
<ul>
<li>Have 4+ years of software engineering experience</li>
<li>Like working on systems and tools that make other people more productive</li>
<li>Are results-oriented, with a bias towards flexibility and impact</li>
<li>Pick up slack, even if it goes outside your job description</li>
<li>Enjoy pair programming (we love to pair!)</li>
<li>Want to learn more about machine learning research</li>
<li>Care about the societal impacts of your work</li>
</ul>
<p><strong>Strong candidates may also have experience with:</strong></p>
<ul>
<li>High performance, large scale distributed systems</li>
<li>Large scale LLM training</li>
<li>Python</li>
<li>Implementing LLM finetuning algorithms, such as RLHF</li>
</ul>
<p><strong>Representative projects:</strong></p>
<ul>
<li>Profiling our reinforcement learning pipeline to find opportunities for improvement</li>
<li>Building a system that regularly launches training jobs in a test environment so that we can quickly detect problems in the training pipeline</li>
<li>Making changes to our finetuning systems so they work on new model architectures</li>
<li>Building instrumentation to detect and eliminate Python GIL contention in our training code</li>
<li>Diagnosing why training runs have started slowing down after some number of steps, and fixing it</li>
<li>Implementing a stable, fast version of a new training algorithm proposed by a researcher</li>
</ul>
<p><strong>Logistics</strong></p>
<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>
<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>
<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong> Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work. We think AI systems like the ones we&#39;re building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.</p>
<p><strong>Your safety matters to us.</strong> To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</p>
<p><strong>How we&#39;re different</strong></p>
<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>
<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$500,000 - $850,000 USD</Salaryrange>
      <Skills>High performance, large scale distributed systems, Large scale LLM training, Python, Implementing LLM finetuning algorithms, such as RLHF</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a company that aims to create reliable, interpretable, and steerable AI systems. It has a team of researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</Employerdescription>
      <Employerwebsite>https://job-boards.greenhouse.io</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4952051008</Applyto>
      <Location>San Francisco, CA | New York City, NY | Seattle, WA</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>b0188062-45f</externalid>
      <Title>Research Engineer, Cybersecurity Reinforcement Learning</Title>
      <Description><![CDATA[<p><strong>About the role</strong></p>
<p>We&#39;re hiring for the Cybersecurity RL team within Horizons. As a Research Engineer, you&#39;ll help to safely advance the capabilities of our models in secure coding, vulnerability remediation, and other areas of defensive cybersecurity.</p>
<p>This role blends research and engineering, requiring you to both develop novel approaches and realise them in code. Your work will include designing and implementing RL environments, conducting experiments and evaluations, delivering your work into production training runs, and collaborating with other researchers, engineers, and cybersecurity specialists across and outside Anthropic.</p>
<p><strong>You may be a good fit if you:</strong></p>
<ul>
<li>Have experience in cybersecurity research.</li>
<li>Have experience with machine learning.</li>
<li>Have strong software engineering skills.</li>
<li>Can balance research exploration with engineering implementation.</li>
<li>Are passionate about AI&#39;s potential and committed to developing safe and beneficial systems.</li>
</ul>
<p><strong>Strong candidates may also have:</strong></p>
<ul>
<li>Professional experience in security engineering, fuzzing, detection and response, or other applied defensive work.</li>
<li>Experience participating in or building CTF competitions and cyber ranges.</li>
<li>Academic research experience in cybersecurity.</li>
<li>Familiarity with RL techniques and environments.</li>
<li>Familiarity with LLM training methodologies.</li>
</ul>
<p><strong>Logistics</strong></p>
<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>
<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>
<p><strong>How we&#39;re different</strong></p>
<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time.</p>
<p><strong>Come work with us!</strong></p>
<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lot more.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$300,000 - $405,000 USD</Salaryrange>
      <Skills>cybersecurity research, machine learning, software engineering, RL techniques and environments, LLM training methodologies, security engineering, fuzzing, detection and response, CTF competitions and cyber ranges, academic research in cybersecurity</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation headquartered in San Francisco, focused on creating reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://job-boards.greenhouse.io</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5025624008</Applyto>
      <Location>San Francisco, CA, New York City, NY</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>c1d20281-7ee</externalid>
      <Title>Member of Technical Staff, High Performance Computing Engineer</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft AI are looking for a talented Member of Technical Staff, High Performance Computing Engineer at their London office. This role sits at the heart of building and scaling the infrastructure that trains their frontier models and powers the next evolution of their personal AI, Copilot. You&#39;ll work directly with researchers and engineers to support their workloads, troubleshoot cluster usage issues, and triage failed or underperforming jobs to resolution.</p>
<p><strong>About the Role</strong></p>
<p>As a Member of Technical Staff, High Performance Computing Engineer, you will design, operate, and maintain large-scale HPC environments, drawing on hands-on engineering experience in production settings. You will own the deployment, configuration, and day-to-day operation of HPC schedulers (e.g., SLURM, Kubernetes), ensuring reliable and efficient job scheduling at scale. You will serve as a technical owner for at least one core HPC domain (GPU compute, high-performance storage, networking, or similar), including ongoing maintenance, performance tuning, and troubleshooting of massive clusters.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Design, operate, and maintain large-scale HPC environments, drawing on hands-on engineering experience in production settings.</li>
<li>Own the deployment, configuration, and day-to-day operation of HPC schedulers (e.g., SLURM, Kubernetes), ensuring reliable and efficient job scheduling at scale.</li>
<li>Serve as a technical owner for at least one core HPC domain (GPU compute, high-performance storage, networking, or similar), including ongoing maintenance, performance tuning, and troubleshooting of massive clusters.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>4+ years of technical engineering experience deploying or operating on-premise or cloud high-performance clusters.</li>
<li>4+ years of experience working with high-scale training clusters and related frameworks/tools such as NVIDIA InfiniBand clusters, SLURM, Kubernetes, and Ray.</li>
<li>4+ years of experience building scalable services on top of public cloud infrastructure such as Azure, AWS, or GCP.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Experience with LLM training clusters.</li>
<li>Experience working with AI platforms, frameworks, and APIs.</li>
<li>Experience using Machine Learning frameworks, including using, deploying, and scaling large language models, either personally or professionally.</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Ability to identify, analyze, and resolve complex technical issues, ensuring optimal performance, scalability, and user experience.</li>
<li>Dedication to writing clean, maintainable, and well-documented code with a focus on application quality, performance, and security.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Competitive salary and benefits package.</li>
<li>Opportunity to work with a leading technology company and contribute to Microsoft AI&#39;s mission.</li>
<li>Collaborative and dynamic work environment.</li>
<li>Professional development opportunities.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>Competitive salary and benefits package</Salaryrange>
      <Skills>High Performance Computing, Cloud Infrastructure, Machine Learning, AI Platforms, Frameworks and APIs, LLM Training Clusters</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft AI is a leading technology company that empowers every person and every organization on the planet to achieve more. They come together with a growth mindset, innovate to empower others, and collaborate to realize their shared goals.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/member-of-technical-staff-high-performance-computing-engineer-mai-superintelligence-team-2/</Applyto>
      <Location>London</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>7abfb827-590</externalid>
      <Title>Member of Technical Staff, High Performance Computing Engineer</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft AI are looking for experienced High Performance Computing Engineers (Members of Technical Staff) to help build and scale the infrastructure that trains their frontier models and powers the next evolution of their personal AI, Copilot.</p>
<p><strong>About the Role</strong></p>
<p>This role offers the unique opportunity to work on some of the largest scale supercomputers in the world – a rare chance to operate at such a significant scale. As a Member of Technical Staff, High Performance Computing Engineer, you will design, operate, and maintain large-scale HPC environments, drawing on hands-on engineering experience in production settings. You will own the deployment, configuration, and day-to-day operation of HPC schedulers (e.g., SLURM, Kubernetes), ensuring reliable and efficient job scheduling at scale.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Design, operate, and maintain large-scale HPC environments, drawing on hands-on engineering experience in production settings.</li>
<li>Own the deployment, configuration, and day-to-day operation of HPC schedulers (e.g., SLURM, Kubernetes), ensuring reliable and efficient job scheduling at scale.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>4+ years of technical engineering experience deploying or operating on-premise or cloud high-performance clusters.</li>
<li>4+ years of experience working with high-scale training clusters and related frameworks/tools such as NVIDIA InfiniBand clusters, SLURM, Kubernetes, and Ray.</li>
<li>4+ years of experience building scalable services on top of public cloud infrastructure such as Azure, AWS, or GCP.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Experience with LLM training clusters.</li>
<li>Experience working with AI platforms, frameworks, and APIs.</li>
<li>Experience using Machine Learning frameworks, including using, deploying, and scaling large language models, either personally or professionally.</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Ability to identify, analyze, and resolve complex technical issues, ensuring optimal performance, scalability, and user experience.</li>
<li>Dedication to writing clean, maintainable, and well-documented code with a focus on application quality, performance, and security.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Competitive salary.</li>
<li>Comprehensive benefits package.</li>
<li>Opportunities for professional growth and development.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>Competitive salary</Salaryrange>
      <Skills>High Performance Computing, Cloud Infrastructure, Machine Learning, AI Platforms, Frameworks and APIs, LLM Training Clusters</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft AI is a leading technology company that empowers every person and every organization on the planet to achieve more. They come together with a growth mindset, innovate to empower others, and collaborate to realize their shared goals.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/member-of-technical-staff-high-performance-computing-engineer-mai-superintelligence-team/</Applyto>
      <Location>Multiple Locations, United States</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>ba52acc3-4fd</externalid>
      <Title>Engineering Site Lead</Title>
      <Description><![CDATA[<p>We&#39;re seeking an exceptional Site Lead to establish and scale our London office. This is a unique opportunity to shape Perplexity&#39;s presence in one of the world&#39;s leading tech hubs, building teams and culture from the ground up while driving technical excellence in infrastructure and AI systems.</p>
<p><strong>What you&#39;ll do</strong></p>
<p>As Site Lead, you&#39;ll serve as the face of Perplexity in London, responsible for building our technical organization, fostering a world-class engineering culture, and directly managing one or more infrastructure teams. You&#39;ll report to senior leadership and work cross-functionally with teams across our global footprint.</p>
<p><strong>What you need</strong></p>
<ul>
<li>10+ years of experience in software engineering with 5+ years in infrastructure, cloud infrastructure, or AI infrastructure roles</li>
<li>3+ years of people management experience, including building and scaling teams</li>
<li>Proven track record of establishing or significantly growing an engineering site or office</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>distributed systems, cloud platforms, infrastructure automation, GPU infrastructure and orchestration, ML training and inference pipelines, Model serving and deployment at scale, Kubernetes, Terraform, container orchestration, CI/CD systems, experience at companies focused on AI/ML, search, or large-scale consumer applications, previous experience as a site lead, office lead, or similar multi-team leadership role, background in building infrastructure for LLM training or inference, contributions to open-source infrastructure or AI infrastructure projects, experience scaling teams from 0 to 20+ engineers, active involvement in the London or European tech community</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Perplexity</Employername>
      <Employerlogo>https://logos.yubhub.co/perplexity.com.png</Employerlogo>
      <Employerdescription>Perplexity is revolutionizing how people discover and interact with information through AI-powered search and knowledge tools. As we expand our global footprint, we&apos;re establishing a strategic presence in London to drive innovation and growth across Europe.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/perplexity/638e6823-be7f-46c6-9675-7b1197fc9b8c</Applyto>
      <Location>London</Location>
      <Country></Country>
      <Postedate>2026-03-04</Postedate>
    </job>
    <job>
      <externalid>46711770-4ab</externalid>
      <Title>AI Researcher</Title>
      <Description><![CDATA[<p>Perplexity is seeking top-tier AI Research Scientists and Engineers to advance our AI products and capabilities. We&#39;re building the future of AI-powered search and agent experiences through our Sonar models, Deep Research Agent, Comet Agent, and Search products. Join us in creating SOTA experiences that handle hundreds of millions of queries and continue to scale rapidly.</p>
<p><strong>What you&#39;ll do</strong></p>
<p>Research &amp; Development</p>
<ul>
<li>Post-train SOTA LLMs using the latest supervised and reinforcement learning techniques (SFT/DPO/GRPO)</li>
<li>Leverage our rich query/answer dataset to scale model performance across Sonar, Deep Research, Comet, and Search products</li>
</ul>
<p><strong>What you need</strong></p>
<ul>
<li>Proven experience with large-scale LLMs and Deep Learning systems</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$220K – $485K</Salaryrange>
      <Skills>large-scale LLMs, Deep Learning systems, Python/PyTorch, post-training techniques, reinforcement learning, PhD in Machine Learning, AI, Systems, or related areas, C++/CUDA programming skills, experience building LLM training frameworks, academic publications and research impact, experience with agent systems and multi-step reasoning, background in personalization and preference learning</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Perplexity</Employername>
      <Employerlogo>https://logos.yubhub.co/perplexity.com.png</Employerlogo>
      <Employerdescription>Perplexity is a company seeking top-tier AI Research Scientists and Engineers to advance their AI products and capabilities. They&apos;re building the future of AI-powered search and agent experiences through their Sonar models, Deep Research Agent, Comet Agent, and Search products.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/perplexity/8fe61c73-0daf-4432-a47d-44714c1ef764</Applyto>
      <Location>San Francisco, Palo Alto</Location>
      <Country></Country>
      <Postedate>2026-03-04</Postedate>
    </job>
  </jobs>
</source>