Evals Engineer, Applied AI

9a42f26c-511 Evals Engineer, Applied AI

We are seeking a technically rigorous and driven AI Research Engineer to join our Enterprise Evaluations team. This high-impact role is critical to our mission of delivering the industry's leading GenAI Evaluation Suite.

As a hands-on contributor to the core systems that ensure the safety, reliability, and continuous improvement of LLM-powered workflows and agents for the enterprise, you will partner with Scale's Operations team and enterprise customers to translate ambiguity into structured evaluation data. This involves guiding the creation and maintenance of gold-standard human-rated datasets and expert rubrics that anchor AI evaluation systems.

Your responsibilities will also include analysing feedback and collected data to identify patterns, refine evaluation frameworks, and establish iterative improvement loops that enhance the quality and relevance of human-curated assessments. You will design, research, and develop LLM-as-a-Judge autorater frameworks and AI-assisted evaluation systems, including creating models that critique, grade, and explain agent outputs.

To succeed in this role, you will need a strong foundational knowledge of large language models, a passion for tackling complex evaluation challenges, and the ability to thrive in a dynamic, fast-paced research environment. You should be able to think outside the box, stay current with the latest literature in AI evaluation, and be passionate about integrating novel research ideas into our workflows to build best-in-class evaluation systems.

In addition to your technical expertise, you will need excellent communication and collaboration skills, as you will work closely with cross-functional teams to drive project success.

If you are a motivated and detail-oriented individual with a passion for AI research and evaluation, we encourage you to apply for this exciting opportunity.

XML job scraping automation by YubHub

full-time mid hybrid $216,000-$270,000 USD Python, PyTorch, TensorFlow, Large Language Models, Generative AI, Machine Learning, Applied Research, Evaluation Infrastructure, Advanced degree in Computer Science, Machine Learning, or a related quantitative field, Published research in leading ML or AI conferences, Experience designing, building, or deploying LLM-as-a-Judge frameworks or other automated evaluation systems, Experience collaborating with operations or external teams to define high-quality human annotator guidelines, Expertise in ML research engineering, stochastic systems, observability, or LLM-powered applications for model evaluation and analysis Engineering Technology Scale AI https://logos.yubhub.co/scale.com.png Scale AI develops reliable AI systems for the world's most important decisions. https://scale.com/ https://job-boards.greenhouse.io/scaleai/jobs/4629589005 San Francisco, CA; New York, NY 2026-04-18

78eea632-7b6 Deep Research Agent Tech Lead

We're seeking a highly technical and strategic Staff/Senior Staff Machine Learning Engineer to act as the Tech Lead for our next-generation deep research agents for the Enterprise.

This high-impact role will drive the technical direction and oversight for Deep Research Agent Development, translating cutting-edge research in Generative AI, Large Language Models (LLMs), and Agentic Frameworks into robust, scalable, and high-impact production systems that enhance enterprise operations, analytics, and core efficiency.

The ideal candidate thrives in a fast-paced environment, has a passion for both deep technical work and mentoring, and is capable of setting a long-term technical strategy for a critical domain while maintaining a strong, hands-on delivery focus.

Responsibilities

Technical Leadership & Vision

Set the Technical Roadmap: Define and own the technical strategy, architecture, and roadmap for Deep Research Agents for the Enterprise, ensuring alignment with Scale AI’s overall AI strategy and business goals.

Drive Breakthrough Research to Production: Lead end-to-end development, from initial research through production deployment to customer impact, with a focus on integrating diverse data modalities.

Core Agent Capabilities Development:

Advanced Knowledge Retrieval: Architect and implement state-of-the-art retrieval systems to ensure the agents provide accurate and comprehensive answers from public and proprietary enterprise data sources.

Data Analysis: Design and champion the development of data analysis agents that accurately translate complex natural language queries into executable SQL/code against diverse enterprise data schemas.

Multimodal Intelligence: Lead the integration of Multimodal AI capabilities to process and extract structured information from visual documents, tables, and forms, enriching the agent's knowledge base.

Architecture & Design: Design and champion highly scalable, reliable, and low-latency infrastructure and frameworks for building, orchestrating, and evaluating multi-agent systems at enterprise scale.

Technical Excellence: Serve as the technical authority for the team, leading design reviews, defining ML engineering best practices, and ensuring code quality, security, and operational excellence for all agent systems.

Team Leadership & Mentorship

Lead and Mentor: Technically lead and mentor a team of Machine Learning Engineers and Research Scientists, fostering a culture of innovation, rigorous engineering, rapid iteration, and technical depth.

Recruiting & Growth: Partner with management to hire, onboard, and grow top-tier talent, helping to shape the long-term structure and capabilities of the team.

Cross-Functional Influence: Collaborate effectively with Product Managers, Data Scientists, and other engineering/science teams to translate ambiguous, high-level business problems into concrete, executable technical specifications and impactful agent solutions.

Basic Qualifications

Bachelor's degree in Computer Science, Electrical Engineering, a related field, or equivalent practical experience.

8+ years of experience in software development, with at least 6 years focused on Machine Learning, Deep Learning, or Applied Research in a production environment.

2+ years of experience in a formal or informal Technical Leadership role (Team Lead, Tech Lead) with a focus on setting technical direction for a domain.

Deep expertise in Generative AI and Large Language Models (LLMs).

Demonstrated experience designing, building, and deploying AI Agents or complex Agentic systems in production at scale.

Experience with large-scale distributed systems and real-time data processing.

Preferred Qualifications

Advanced degree (Master's or Ph.D.) in Computer Science, Machine Learning, or a related quantitative field.

Demonstrated experience designing and deploying production-grade Text-to-SQL systems, including handling complex schema linking and query optimization.

Practical experience with Multimodal AI, specifically integrating OCR and vision-language models for document intelligence and structured data extraction from images/forms.

Proven experience in one or more relevant deep research areas: Reinforcement Learning (RL), Reasoning and Planning, Agentic Systems.

Experience with vector databases and advanced retrieval techniques.

A track record of publishing research papers in top-tier ML/AI conferences (e.g., NeurIPS, ICML, ICLR, KDD).

Excellent written and verbal communication skills, with the ability to articulate complex technical vision to executive stakeholders and technical peers.

Experience driving cross-team technical initiatives that have delivered significant business impact.

Compensation

Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity-based compensation, subject to Board of Directors approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for an equity grant. You’ll also receive benefits including, but not limited to: comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend.

About Us

At Scale, our mission is to develop reliable AI systems for the world's most important decisions. Our products provide the high-quality data and full-stack technologies that power the world's leading models, and help enterprises and governments build, deploy, and oversee AI applications that deliver real impact. We work closely with industry leaders like Meta, Cisco, DLA Piper, Mayo Clinic, Time Inc., the Government of Qatar, and U.S. government agencies including the Army and Air Force. We are expanding our team to accelerate the development of AI applications.

full-time staff onsite $264,800-$331,000 USD Generative AI, Large Language Models (LLMs), Agentic Frameworks, Machine Learning, Deep Learning, Applied Research, Distributed Systems, Real-time Data Processing, Text-to-SQL Systems, Multimodal AI, Reinforcement Learning (RL), Reasoning and Planning, Agentic Systems, Vector Databases, Advanced Retrieval Techniques Engineering Technology Scale AI https://logos.yubhub.co/scale.com.png Scale AI develops reliable AI systems for the world's most important decisions, providing high-quality data and full-stack technologies to power leading models. https://scale.com/ https://job-boards.greenhouse.io/scaleai/jobs/4623590005 San Francisco, CA; New York, NY 2026-04-18

cd3b618b-96d Security Labs Engineer

Job Title: Security Labs Engineer

About Anthropic

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole.

About the Role

Security at Anthropic is not a compliance exercise. It is a core part of how we stay safe as we build increasingly capable systems. Our Responsible Scaling Policy commits us to launching structured security R&D projects: ambitious, time-boxed experiments designed to resolve high-uncertainty questions about our long-term security posture.

Each project runs for roughly 6 months with defined exit criteria. Some will succeed and move toward production. Others will fail, and we'll treat that as a useful signal. The questions these projects are designed to answer include:

Can our core research workflows survive extreme isolation?

Can we get cryptographic guarantees where we currently rely on trust?

Can AI become our most effective security control?

As a Security Labs Engineer, you own one or more projects end-to-end: scoping the experiment, building the infrastructure, coordinating across teams, running the pilot, documenting results, and where the experiment succeeds, helping scale it into production. This is 0-to-1 and 1-to-10 work.

Current Project Areas

The portfolio evolves based on what we learn. Current areas include:

Designing and operating a mock high-assurance research environment: simulating what our infrastructure would look like under extreme isolation and physical security controls, with real measurement of productivity impact

Exploring cryptographic verification of model integrity using techniques like zero-knowledge proofs to provide mathematical guarantees about what is running in production

Assessing the feasibility of confidential computing across the full model lifecycle (note: this is an open question, not a committed roadmap item)

Piloting AI-assisted security tooling including vulnerability discovery, automated patching, anomaly detection, and adaptive behavioral monitoring

Prototyping API-only access regimes where even internal research workflows never touch raw model weights

Part of your job is helping shape what comes next based on gaps uncovered in the current round.

Responsibilities

Own the end-to-end execution of a Security Labs project: refine the hypothesis, design the experiment, build the prototype, run the pilot, and write up the results

Build novel security infrastructure under real time pressure: isolated clusters, hardened access controls, cryptographic verification layers, with a bias toward learning fast

Where experiments succeed, drive them toward production scale. An experiment that works on one cluster but not a hundred is not a finished result.

Work embedded with research teams (Pretraining, RL, Inference) to stress-test whether their core workflows can function under extreme security controls, and document precisely where they break

Evaluate and integrate emerging security technologies through coordination with external vendors and research groups

Turn experimental results into clear, decision-ready writeups that inform Anthropic's long-term security architecture and RSP commitments

Maintain a pain-point registry and feasibility assessment for each project, feeding directly into the design of production high-assurance environments

Help scope and prioritize the next wave of Labs projects based on what the current round uncovers

Requirements

7+ years of software or security engineering experience, with a solid foundation in production systems

Some of that time spent on pilots, prototypes, or applied research work where shipping a working answer to a hard question was the explicit goal

Strong programming skills in Python and at least one systems language (Go, Rust, or C/C++)

Hands-on experience with cloud infrastructure (AWS, GCP, or Azure), Kubernetes, and networking fundamentals sufficient to stand up and tear down isolated environments quickly

A track record of cross-functional execution: you can walk into a room with ML researchers, infrastructure engineers, and vendors and leave with a shared plan

Clear written communication: you know how to turn six weeks of experimentation into a two-page memo someone can act on

Comfort with ambiguity and iteration, having run experiments that failed, extracted the lesson, and moved forward

Genuine curiosity about what it would actually take to defend against a nation-state-level adversary

Passion for AI safety and a real understanding of the role security plays in making frontier AI development go well

Bachelor's degree in Computer Science, a related field, or equivalent industry experience required.

Preferred Qualifications

Prior experience in offensive security, red teaming, or security research, having thought adversarially about systems and knowing which threats actually matter

Familiarity with airgapped or high-side environments (classified networks, ICS/SCADA, financial trading infrastructure, or similar) and the operational realities of working inside them

Knowledge of applied cryptography: zero-knowledge proofs, attestation protocols, secure enclaves, TPMs, or confidential computing primitives

Experience with ML infrastructure (training pipelines, inference serving, model packaging) sufficient for grounded conversations with researchers about what their workflows actually need

Background building or operating security systems in environments that demand rapid iteration rather than rigid change control

Prior work at a startup, on an innovation team, or in an applied research group where shipping a working v0 to answer a real question was explicitly the goal

Location

This role is based in our San Francisco office (500 Howard St). Several Labs projects involve physical secure facilities on-site, so expect to be in-office more frequently than Anthropic's standard 25% hybrid baseline.

What We Offer

Competitive salary and equity package

Comprehensive health insurance and retirement plans

Flexible work arrangements, including remote work options

Professional development opportunities, including training and conference attendance

Collaborative and dynamic work environment

Access to cutting-edge technology and resources

Opportunity to work on challenging and impactful projects

Recognition and rewards for outstanding performance

If you're excited about the opportunity to join our team and contribute to the development of secure and beneficial AI systems, please submit your application. We can't wait to hear from you!

Deadline to Apply

None. Applications will be received on a rolling basis.

Annual Compensation Range

$405,000 - $485,000 USD

Logistics

Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience

Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience

Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position

Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with the process.

full-time senior hybrid $405,000 - $485,000 USD Python, Go, Rust, C/C++, Cloud infrastructure, Kubernetes, Networking fundamentals, Cross-functional execution, Clear written communication, Comfort with ambiguity and iteration, Genuine curiosity about what it would actually take to defend against a nation-state-level adversary, Passion for AI safety, Real understanding of the role security plays in making frontier AI development go well, Offensive security, Red teaming, Security research, Applied cryptography, ML infrastructure, Background building or operating security systems in environments that demand rapid iteration rather than rigid change control, Prior work at a startup, on an innovation team, or in an applied research group where shipping a working v0 to answer a real question was explicitly the goal Engineering Technology Anthropic https://logos.yubhub.co/anthropic.com.png Anthropic is a technology company that specializes in developing artificial intelligence systems. https://www.anthropic.com/ https://job-boards.greenhouse.io/anthropic/jobs/5153564008 San Francisco, CA 2026-04-18

a29e2179-b95 Analyst, The Anthropic Institute

About the Role

As a Member of Analytical Staff for The Anthropic Institute, you'll spend your time researching how Anthropic is tackling the challenges the Institute is focused on, synthesizing what you learn from across the organization, and turning that into rigorous analysis that shapes both internal decisions and public understanding.

You'll be talking to colleagues all over Anthropic, building a picture of how different teams are approaching some of the most consequential questions in AI, and then helping the Institute communicate that work and its implications to the world. You will use Claude aggressively, creatively, and daily to help you surface insights about what Anthropic is doing with regard to these problem areas.

Responsibilities

Research how Anthropic's teams are working on the Institute's challenges and synthesize findings from across the organization into a coherent picture.
Partner with teams to help them surface their insights to the world, often working to act as the 'connective tissue' between them and other teams to bring different insights together.
Produce written analysis and memos about how Anthropic is approaching these problems, for both internal leadership and public audiences.
Partner with relevant teams to develop and publish public outputs.
Come up with creative ways to carry our work into the world: sometimes the most impactful way to talk about an issue is through a technical demonstration rather than a blog post or research paper (e.g., Golden Gate Claude, Project Vend, Robodog).

What We're Looking For

7+ years of experience in technical policy research, think tank work, or applied research in a domain relevant to the Institute's focus areas (AI, labor economics, national security, or emerging technology governance).
Track record of publishing or producing work for external audiences, whether policy memos, research reports, white papers, or public-facing analysis.
Comfort operating at the intersection of technical and policy audiences. You don't need to be an ML researcher, but you can read technical work, ask good questions of the people who produce it, and translate findings accurately for non-technical stakeholders without losing the nuance.
Demonstrated ability to synthesize across disciplines and bodies of work. You've produced analysis that draws on economics, political science, technology, or organizational behavior, rather than staying siloed in one field.
Deep experience using Claude as a tool for research and organizational knowledge-gathering.

Benefits

The annual compensation range for this role is $295,000-$345,000 USD.

Logistics

Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience
Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position
Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

full-time senior hybrid $295,000-$345,000 USD Technical policy research, Think tank work, Applied research, Claude, Policy analysis, Research synthesis, Communication Engineering Technology The Anthropic Institute https://logos.yubhub.co/anthropic.com.png A new externally-facing function within Anthropic focused on generating and releasing information about the impact of AI systems on the economy, threat landscape, and society. https://www.anthropic.com/ https://job-boards.greenhouse.io/anthropic/jobs/5123742008 San Francisco, CA 2026-04-18

1410a549-44e Director of Machine Learning, Safety & Mods

We're looking for a Director of Machine Learning to lead Reddit's efforts in building industry-leading ML systems that keep our platform safe and foster healthy online communities.

This leader will drive the strategy, development, and deployment of machine learning models that detect and prevent harmful content and behavior at scale.

In this role, you will own the roadmap for Safety and moderation ML, lead a team of applied scientists and engineers, and partner cross-functionally across Product, Engineering, Safety operations, Trust & Community, and AI/ML Platform to innovate on real-time detection, automation, and user protection systems.

You will leverage modern ML, including fine-tuned LLMs, to ensure Reddit remains a safe, welcoming, and positive environment for our global user base.

Responsibilities:

Set the vision and strategy for applying ML to Trust & Safety, ensuring scalable, proactive protection against evolving abuse patterns.

Lead and grow a high-performing Safety ML organization, including applied research, model development, productionization, and continuous improvement.

Develop and deploy cutting-edge Safety ML systems (including fine-tuned LLMs and transformer models) that outperform state-of-the-art solutions in quality, latency, and efficiency.

Partner with Trust & Safety, Product, Moderation, and AI/ML Platform teams to identify safety risks, emerging harm vectors, and ML opportunities that improve detection, enforcement, and user experience.

Drive successful experimentation, evaluation, and model lifecycle management, ensuring high precision, recall, explainability, and policy alignment.

Champion ethical and responsible AI practices in all Safety ML solutions.

Track performance through metrics, research-based iteration, and alignment with Reddit’s safety policies and regulatory standards.

Represent Safety ML leadership internally and externally, including conferences, publications, industry groups, and cross-company collaboration initiatives.

Required Qualifications:

10+ years of experience in Machine Learning, AI, or applied research, with a strong background in Trust & Safety, abuse prevention, detection, or content integrity.

5+ years of experience leading multi-disciplinary ML teams (applied science, engineering, analytics) in a high-growth or high-impact environment.

Proven track record of shipping ML systems at scale in production, ideally including transformer-based models and LLM fine-tuning.

Depth in NLP, content understanding, detection systems, supervised and weak-supervision techniques.

Strong cross-functional leadership skills, with ability to influence executives and foster alignment across Safety, Product, and Engineering.

Thought leadership in responsible AI, safety ML research, or safety measurement frameworks.

Bonus points if you have:

Experience building or operating real-time abuse detection and automated moderation systems in a complex user-generated content ecosystem.

Prior work in consumer-facing tech, social platforms, or large-scale community-driven products.

Benefits:

Comprehensive Healthcare Benefits and Income Replacement Programs

401k with Employer Match

Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support

Family Planning Support

Gender-Affirming Care

Mental Health & Coaching Benefits

Flexible Vacation & Paid Volunteer Time Off

Generous Paid Parental Leave

full-time executive remote $265,800-$365,100 USD Machine Learning, AI, Applied Research, Trust & Safety, Abuse Prevention, Detection, Content Integrity, NLP, Content Understanding, Detection Systems, Supervised and Weak-Supervision Techniques Engineering Technology Reddit https://logos.yubhub.co/redditinc.com.png Reddit is a community-driven platform with over 100,000 active communities and 121 million daily active unique visitors. https://www.redditinc.com https://job-boards.greenhouse.io/reddit/jobs/7430544 Remote - United States 2026-04-18

d2f5b1e5-545 Research Scientist, Gemini Safety

We're seeking a versatile Research Scientist to join our Gemini Safety team. As a Research Scientist, you will apply and develop cutting-edge data and algorithmic solutions to advance our latest user-facing models. Your work will focus on advancing the safety and fairness behavior of state-of-the-art AI models, driving the development of foundational technology adopted by numerous product areas, including Gemini App, Cloud API, and Search.

Key responsibilities include:

Post-train and instruction-tune state-of-the-art LLMs, focusing on text-to-text, image/video/audio-to-text modalities and agentic capabilities
Explore data, reasoning, and algorithmic solutions to ensure Gemini Models are safe, maximally helpful, and work for everyone
Improve Gemini's adversarial robustness, with a focus on high-stakes abuse risks
Design and maintain high-quality evaluation protocols to assess model behavior gaps and headroom related to safety and fairness
Develop and execute experimental plans to address known gaps, or construct entirely new capabilities
Drive innovation and enhance understanding of Supervised Fine Tuning and Reinforcement Learning fine-tuning at scale

To succeed as a Research Scientist in the Gemini Safety team, we look for the following skills and experience:

PhD in Computer Science, a related field, or equivalent practical experience
Significant LLM post-training experience
Experience in reward modeling and reinforcement learning for LLM instruction tuning
Experience with Long-range Reinforcement learning
Experience in areas such as Safety, Fairness, and Alignment
Track record of publications at NeurIPS, ICLR, ICML
Experience taking research from concept to product
Experience collaborating on or leading an applied research project
Strong experimental taste: Good judgment regarding baselines, ablations, and what is worth testing
Experience with JAX

full-time senior onsite PhD in Computer Science, LLM post-training experience, Reward modeling and Reinforcement Learning for LLMs Instruction tuning, Long-range Reinforcement learning, Safety, Fairness, and Alignment, NeurIPS, ICLR, ICML publications, Research from concept to product, Collaborating or leading an applied research project, JAX Engineering Technology Google DeepMind https://logos.yubhub.co/deepmind.com.png Google DeepMind is a subsidiary of Alphabet Inc., a multinational conglomerate. https://deepmind.com/ https://job-boards.greenhouse.io/deepmind/jobs/7731944 Zurich, Switzerland 2026-04-18

81443981-bfe Analyst, The Anthropic Institute

As a Member of Analytical Staff for The Anthropic Institute, you'll spend your time researching how Anthropic is tackling the challenges the Institute is focused on, synthesizing what you learn from across the organization, and turning that into rigorous analysis that shapes both internal decisions and public understanding.

Some representative projects we can imagine working on include:

Clearly explaining the implications of automating AI research with AI systems.

Convening outside and internal experts to generate policy options a government may want to utilize if GDP growth rates in developed economies exceed 10% a year.

Learning from our customers about how using AI technology has changed their own organizations and then turning that into lessons which others can benefit from.

Asking and prototyping how the regulation of AI companies and AI systems could be done via AI systems themselves.

You'll also conduct your own research in these areas, working to utilize the AI systems you have access to at Anthropic as well as your colleagues to sharpen and improve your thinking.

Key responsibilities include:

Researching how Anthropic's teams are working on the Institute's challenges and synthesizing findings from across the organization into a coherent picture.

Partnering with teams to help them surface their insights to the world, often working to act as the 'connective tissue' between them and other teams to bring different insights together.

Producing written analysis and memos about how Anthropic is approaching these problems, for both internal leadership and public audiences.

Partnering with relevant teams to develop and publish public outputs.

Coming up with creative ways to carry our work into the world: sometimes the most impactful way to talk about an issue is through a technical demonstration rather than a blog post or research paper.

full-time senior hybrid $295,000-$345,000 USD Technical policy research, Think tank work, Applied research, Claude, Machine learning, Data analysis, Communication, Public policy, Economics, Political science, Organizational behavior Engineering Technology The Anthropic Institute https://logos.yubhub.co/anthropic.com.png The Anthropic Institute is a new externally-facing function within Anthropic that aims to benefit the public by providing information about the impact of Anthropic's AI systems. https://www.anthropic.com/ https://job-boards.greenhouse.io/anthropic/jobs/5123742008 San Francisco, CA 2026-04-18

2907e75d-d4e Research Engineer, Frontier Safety Risk Assessment

Job Title: Research Engineer, Frontier Safety Risk Assessment

We are seeking 2 Research Engineers for the Frontier Safety Risk Assessment team within the AGI Safety and Alignment Team.

As a Research Engineer, you will contribute novel research towards our ability to measure and assess risk from frontier models. This might include:

Identifying new risk pathways within current areas (loss of control, ML R&D, cyber, CBRN, harmful manipulation) or in new ones;
Conceiving of, designing, and developing new ways to measure pre-mitigation and post-mitigation risk;
Forecasting and scenario planning for future risks which are not yet material.

Your work will involve complex conceptual thinking as well as engineering. You should be comfortable with research that is uncertain, under-constrained, and which does not have an achievable “right answer”. You should also be skilled at engineering, especially using Python, and able to rapidly familiarise yourself with internal and external codebases. Lastly, you should be able to adapt to pragmatic constraints around compute and researcher time that require us to prioritise effort based on the value of information.

Although this job description is written for a Research Engineer, all members of this team are better thought of as members of technical staff. We expect everyone to contribute to the research as well as the engineering and to be strong in both areas.

The role will mostly depend on your general ability to assess and manage future risks, rather than on specialist knowledge within the risk domains. Insofar as specialist knowledge is helpful, knowledge of ML R&D and loss of control as risk domains is likely to be the most valuable.

About You

In order to set you up for success as a Research Engineer at Google DeepMind, we look for the following skills and experience:

You have extensive research experience with deep learning and/or foundation models (for example, but not necessarily, a PhD in machine learning).
You are adept at generating ideas and designing experiments, and implementing these in Python with real AI systems.
You are keen to address risks from foundation models, and have thought about how to do so. You plan for your research to impact production systems on a timescale between “immediately” and “a few years”.
You are excited to work with strong contributors to make progress towards a shared ambitious goal.
With strong, clear communication skills, you are confident engaging technical stakeholders to share research insights tailored to their background.

In addition, any of the following would be an advantage:

Experience in areas such as frontier risk assessment and/or mitigations, safety, and alignment.
Engineering experience with LLM training and inference.
PhD in Computer Science or Machine Learning related field.
A track record of publications at venues such as NeurIPS, ICLR, ICML, RL/DL, EMNLP, AAAI and UAI.
Experience collaborating on or leading an applied research project.

At Google DeepMind, we value diversity of experience, knowledge, backgrounds and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.

At Google DeepMind, we want employees and their families to live happier and healthier lives, both in and out of work, and our benefits reflect that. Some select benefits we offer: enhanced maternity, paternity, adoption, and shared parental leave, private medical and dental insurance for yourself and any dependents, and flexible working options. We strive to continually improve our working environment, and provide you with excellent facilities such as healthy food, an on-site gym, faith rooms, terraces etc.

We are also open to relocating candidates and offer a bespoke service and immigration support to make it as easy as possible (depending on eligibility).

The US base salary range for this full-time position is between $136,000 and $245,000 + bonus + equity + benefits. Your recruiter can share more about the specific salary range for your targeted location during the hiring process.

]]> full-time staff onsite $136,000 - $245,000 + bonus + equity + benefits Python, Deep learning, Foundation models, Risk assessment, Mitigation, Forecasting, Scenario planning, LLM training and inference, PhD in Computer Science or Machine Learning related field, Track record of publications at venues such as NeurIPS, ICLR, ICML, RL/DL, EMNLP, AAAI and UAI, Experience with collaborating or leading an applied research project Engineering Technology Google DeepMind https://logos.yubhub.co/deepmind.com.png Google DeepMind is a subsidiary of Alphabet Inc., a multinational conglomerate headquartered in Mountain View, California. https://deepmind.com/ https://job-boards.greenhouse.io/deepmind/jobs/7493360 London, UK; New York City, New York, US; San Francisco, California, US 2026-03-16 8ee55a18-4c1 Researcher, Automated Red Teaming Location

San Francisco

Employment Type

Full time

Department

Safety Systems

Compensation

Estimated Base Salary $295K – $445K

The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts

Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)

401(k) retirement plan with employer match

Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)

Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees

13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)

Mental health and wellness support

Employer-paid basic life and disability coverage

Annual learning and development stipend to fuel your professional growth

Daily meals in our offices, and meal delivery credits as eligible

Relocation support for eligible employees

Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

More details about our benefits are available to candidates during the hiring process.

This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.

About the team

The Safety Systems org ensures that OpenAI’s most capable models can be responsibly developed and deployed. We build evaluations, safeguards, and safety frameworks that help our models behave as intended in real-world settings.

The Preparedness team is an important part of the Safety Systems org at OpenAI, and is guided by OpenAI’s Preparedness Framework.

Frontier AI models have the potential to benefit all of humanity, but also pose increasingly severe risks. To ensure that AI promotes positive change, the Preparedness team helps us prepare for the development of increasingly capable frontier AI models. This team is tasked with identifying, tracking, and preparing for catastrophic risks related to frontier AI models.

The mission of the Preparedness team is to:

Closely monitor and predict the evolving capabilities of frontier AI systems, with an eye towards risks whose impact could be catastrophic
Ensure we have concrete procedures, infrastructure and partnerships to mitigate these risks and to safely handle the development of powerful AI systems

Preparedness tightly connects capability assessment, evaluations, internal red teaming, and mitigations for frontier models, as well as overall coordination on AGI preparedness. This is fast-paced, exciting work that has far-reaching importance for the company and for society.

About the role

This role leads the Automated Red Teaming (ART) effort: building scalable, research-driven systems that continuously discover failure modes in our models and mitigations — and translate those findings into actionable, production-facing improvements. The goal is to maximize counterfactual reduction in expected harm by finding the highest-leverage, least-covered weaknesses early and reliably.

In this role you will

You will own the research and technical direction for automated red teaming across catastrophic risk areas, with an initial emphasis on:

Automated classifier jailbreak discovery (cyber and bio)
Automated bio threat-development elicitation (worst-feasible planning uplift)
CoT monitoring evasion probing (and adjacent loss-of-control evaluations)

You will partner tightly with:

Vertical risk teams (Cyber, Bio, Loss of Control) to define threat models, prioritize targets, and land mitigations
The Classifiers team to turn discovered attacks into training data, evals, and measurable robustness gains
Product / eng / safety stakeholders to ensure ART outputs are operationally useful (not just interesting)

You might thrive in this role if you:

Feel a strong pull toward AI safety, and you’re motivated by reducing real-world catastrophic risk (not just publishing cool results)
Love breaking systems (responsibly) — you get energy from finding weird, high-severity failure modes and turning them into concrete fixes
Have strong applied research instincts, especially around evaluations: you’re good at designing experiments that are reproducible, interpretable, and hard to fool
Bring hands-on experience with LLMs and agents, including multi-turn behaviors, tool use, and the ways models adapt to constraints
Are comfortable building scalable automation, not just prototypes — you can turn red-teaming ideas into pipelines that run continuously and produce high-signal outputs
Have solid software engineering fundamentals (data structures, algorithms, testing discipline) and you can work effectively in a production-adjacent environment
Think in threat models and incentives, and you naturally ask “what would an attacker do next?” or “how would this fail under pressure?”
Can translate messy findings into action, communicating clearly with researchers, engineers, product, and policy — and driving alignment on what to fix first
Care about efficiency and prioritization, and you’re happy to say “no” to low-level

]]> full-time senior onsite $295K – $445K Applied research, Automated red teaming, Catastrophic risk assessment, Classifier jailbreak discovery, Cybersecurity, Data structures, Evaluations, LLMs and agents, Loss-of-control evaluations, Multi-turn behaviors, Red teaming, Scalable automation, Software engineering, Threat models, Tool use, Bio threat-development elicitation, CoT monitoring evasion probing, Loss-of-control evaluations, Multi-turn behaviors, Red teaming, Scalable automation, Software engineering, Threat models, Tool use Engineering Technology OpenAI https://logos.yubhub.co/openai.com.png OpenAI is a technology company that specializes in developing and training artificial intelligence models. It was founded in 2015 and is headquartered in San Francisco, California. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/openai/bf7d2623-7846-410c-87f8-c628915ec16c San Francisco 2026-03-06 1875654e-29d Data Scientist, Foundation AI - PhD Early Career [2026] Data Scientist, Foundation AI - PhD Early Career

San Mateo, CA, United States

Early Career

ID: 5825

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators.

At Roblox, we’re building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device.

We’re on a mission to connect a billion people with optimism and civility, and looking for amazing talent to help us get there.

A career at Roblox means you’ll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.

WHY DATA SCIENCE & ANALYTICS?

The Data Science & Analytics organization's mission is to increase our speed, frequency, and acumen in making decisions at scale by instilling a data-influenced approach to building products. We cover a wide area of the data spectrum, including analytical data engineering, product analytics, experimentation, causal inference, statistical modeling, and machine learning. Aligned and partnered with product verticals, we use this extensive tool belt to discover new opportunities and unmet use cases, influence and craft the product roadmap, prioritize and build data products, and measure impact on our community of players and developers.

WHY GENERATIVE AI?

Our team’s mission is to enable Roblox Creators to bring GenAI capabilities to millions of users. We drive this innovation with a core commitment to safety, responsibility, and quality.

As a Data Scientist, you will play a critical role in evaluation and optimization for user-facing GenAI systems (such as text, image, video, 3D, 4D). You will define how we measure safety, responsibility, quality, and efficiency. You will combine annotation analysis, design of experiments, causal inference, model-based evaluation methods (such as LLM-as-a-judge), optimization algorithms, and AI models to drive product decisions and model improvements.

You Will:

Develop Evaluation Frameworks: Design and operationalize rigorous evaluation systems for GenAI features (text, image, video, 3D, 4D). This includes eval experiment design, dataset design, label reliability analysis, and implementing and fine-tuning LLM-as-a-judge methods.

Run Rigorous Experiments: Conduct online experiments (A/B tests) and causal inference to quantify the impact of GenAI features. You will identify opportunities, measure lift, and ensure statistical rigor.

Define Success Metrics: Partner with cross-functional teams to define leading/lagging indicators for GenAI feature user satisfaction, business success, and safety.

Build Automated Systems: Research and apply state-of-the-art methodologies to build reproducible evaluation tooling that lifts rigor and efficiency across the company.

Conduct Applied Research at the Frontier: Maintain an active pulse on the intersection of GenAI and Data Science. You will innovate on methodology and techniques to solve unique business challenges while contributing to the broader technical community.

You Have:

Hold or are pursuing a PhD or equivalent in Statistics, Economics, Computer Science, Applied Math, Physics, Engineering, or a related quantitative field.

Technical Proficiency: Strong proficiency in SQL (Hive/Spark) for manipulating large datasets and scripting languages (Python or R) for analysis and modeling.

Experimentation and Causal Inference: A solid grounding in experimentation, causal inference, and statistical analysis, including test design and metric design for feature impact.

Problem Solving: A demonstrated track record of framing ambiguous problems, designing analytical approaches, and solving open-ended data science problems that drive business impact.

Learning Agility: Ability to effectively and responsibly use AI tools to enhance productivity and a passion for continuously improving methods in a fast-evolving field.

GenAI Familiarity: Familiarity with GenAI models and safety/quality evaluation methods. Expertise in the model training lifecycle is a plus (e.g., fine-tuning, RLHF, or synthetic data generation).

Applied Research Background: A track record of applied research or publications in relevant technical fields is highly valued.

You may redact age, date of birth, and dates of attendance/graduation from your resume if you prefer.

For roles that are based at our headquarters in San Mateo, CA: The starting base pay for this position is as shown below. The actual base pay is dependent upon a variety of job-related factors such as professional background, training, work experience, location, business needs and market demand. Therefore, in some circumstances, the actual salary could fall outside of this expected range. This pay range is subject to change and may be modified in the future. All full-time employees are also eligible for equity compensation and for benefits as described on this page.

Annual Salary Range

$185,860—$221,380 USD

Roles that are based in an office are onsite Tuesday, Wednesday, and Thursday, with optional presence on Monday and Friday (unless otherwise noted).

]]> full-time entry hybrid $185,860—$221,380 USD SQL, Hive/Spark, Python, R, Statistics, Economics, Computer Science, Applied Math, Physics, Engineering, Experimentation, Causal Inference, Statistical Analysis, Test Design, Metric Design, Feature Impact, Problem Solving, Learning Agility, AI Tools, GenAI Models, Safety/Quality Evaluation Methods, Model Training Lifecycle, GenAI Familiarity, Applied Research Background, Publications in Relevant Technical Fields Engineering Technology Roblox https://logos.yubhub.co/careers.roblox.com.png Roblox is a global online platform that allows users to create and play a wide variety of games and experiences. With tens of millions of users, it is one of the largest online gaming platforms in the world. https://careers.roblox.com https://careers.roblox.com/jobs/7577436 San Mateo, CA 2026-03-06 f631833c-2da AI Software Engineer - Agents Perplexity is seeking an energetic engineer to join our highly driven Agents engineering team. The Agents team consists of AI/ML, backend, and full-stack engineers who collaborate to build delightful agentic experiences within our Comet ecosystem and Perplexity Computer, our platform for generalized frontier intelligence.

What you'll do

As an engineer on our Agents team, you will bring AI expertise, sharp product intuition, and a tinkerer's mindset to advance the frontier of what agents can accomplish for our millions of devoted users. You will work across applied research and engineering to solve many open problems in AI, including:

Designing AI agents to navigate the digital world and perform increasingly valuable units of work for our users;

Training action and decision models that determine, based on complex multimodal states, how to accomplish user-specified objectives;

Providing consistently excellent experiences across desktop, mobile, headless cloud, and other environments through flexible abstractions and frictionless backgrounding;

Developing permission architectures, payload classifiers, and other methods to implement secure-by-design agentic capabilities;

Designing optimal data representations and modes of interaction between agents and their environments;

and much, much more.

What you need

Strong foundational familiarity with the full AI product stack;

Proficiency in Python (bonus points for TypeScript, Go, and/or Rust);

Significant experience in at least one of the following areas:

Context engineering and tool interfaces for frontier AI models

Post-training and reinforcement learning (particularly for multimodal models)

Browser technologies (CDP, Playwright, extension development, etc.)

]]> full-time mid onsite $220K – $405K AI expertise, sharp product intuition, Python, TypeScript, Go, Rust, context engineering, post-training and reinforcement learning, browser technologies, AI/ML, backend, full-stack, applied research, engineering Engineering Technology Perplexity https://logos.yubhub.co/perplexity.ai.png Perplexity is a company that empowers users with AI agents that can faithfully actualize their intent, however and wherever expressed, through open-ended interactions with the world. The company is seeking an energetic engineer to join its highly driven Agents engineering team. https://www.perplexity.ai/ https://jobs.ashbyhq.com/perplexity/bc1a6878-8de9-48c2-a791-95b2f8f27261 San Francisco 2026-03-04 ede762ec-1df Principal Research Programmer We are seeking a highly skilled Principal Research Programmer to join our Research & Development team. As a Principal Research Programmer, you will be responsible for researching and prototyping machine learning and generative approaches for game content creation, working with development programmers and product programmers to deploy and productize novel technology results in Epic products, and engaging with the scientific community to contribute domain knowledge and advice in support of strategically relevant R&D efforts.

What you'll do

Research and prototype machine learning and generative approaches for game content creation
Work with development programmers and product programmers to deploy and productize novel technology results in Epic products

What you need

PhD degree in Computer Science, Machine Learning, Mathematics, Programming, or a related discipline
Proven applied research impact in shipping titles

]]> full-time senior onsite PhD degree in Computer Science, Machine Learning, Mathematics, Programming, or a related discipline, Proven applied research impact in shipping titles, Strong analytical and reasoning skills with an emphasis on innovative and practical solutions, Excellent programming skills with a preference for experience with Python and C++ Engineering Technology Epic Games https://logos.yubhub.co/epicgames.com.png Epic Games is a leading game development company that creates award-winning games and engine technology. The company is known for its collaborative and welcoming environment, and it prides itself on creating a space for talented and passionate individuals to innovate and grow. https://www.epicgames.com https://www.epicgames.com/en-US/careers/jobs/5552623004 Montreal, Canada 2026-01-08 9468a7ec-252 Principal Research Engineer We are looking for a Principal Research Engineer to join our team. In this role, you will research and prototype machine learning and generative approaches for game content creation, work with development engineers and product engineers to deploy and productize novel technology results in Epic products, and engage with the scientific community to contribute domain knowledge and advice in support of strategically relevant R&D efforts.

What you'll do

Research and prototype machine learning and generative approaches for game content creation
Work with development engineers and product engineers to deploy and productize novel technology results in Epic products

What you need

PhD degree in Computer Science, Machine Learning, Mathematics, Engineering, or a related discipline
Proven applied research impact in shipping titles

]]> full-time senior onsite PhD degree in Computer Science, Machine Learning, Mathematics, Engineering, or a related discipline, Proven applied research impact in shipping titles, Strong analytical and reasoning skills with an emphasis on innovative and practical solutions, Excellent programming skills with a preference for experience with Python and C++ Engineering Technology Epic Games https://logos.yubhub.co/epicgames.com.png Epic Games is a leading game development company that creates award-winning games and engine technology. The company prides itself on creating a collaborative, welcoming, and creative environment. https://www.epicgames.com https://www.epicgames.com/en-US/careers/jobs/5553148004 Porto Alegre, Brazil 2026-01-08 c98c4304-e19 Principal Research Engineer The Epic Games Research & Development team is searching for an experienced hands-on Research Programmer. The ideal candidate will bring a passion for digital humans and experience with machine learning and animation tech.

What you'll do

The Research Programmer will be responsible for researching and prototyping machine learning and generative approaches for game content creation, working with development programmers and product programmers to deploy and productize novel technology results in Epic products, and engaging with the scientific community to contribute domain knowledge and advice in support of strategically relevant R&D efforts.

What you need

PhD degree in Computer Science, Machine Learning, Mathematics, Programming, or a related discipline

]]> full-time senior onsite $270,572—$396,838 USD (California Base Pay Range) PhD degree in Computer Science, Machine Learning, Mathematics, Programming, or a related discipline, Proven applied research impact in shipping titles, Strong analytical and reasoning skills with an emphasis on innovative and practical solutions, Excellent programming skills with a preference for experience with Python and C++, Experience working in common ML frameworks such as PyTorch or TensorFlow, and rapid prototyping in Python, In-depth knowledge in one or more of the following areas: real-time graphics, computer vision, machine learning, animation, large-language models, speech recognition, speech synthesis, linguistics, performance-capture, 3D reconstruction Engineering Technology Epic Games https://logos.yubhub.co/epicgames.com.png Epic Games is a leading game development company that creates award-winning games and engine technology. The company is known for its collaborative and welcoming environment, and it prides itself on creating a space for talented individuals to innovate and push the boundaries of game development. https://www.epicgames.com https://www.epicgames.com/en-US/careers/jobs/5500537004 Multiple Locations 2026-01-08 ced9832a-a8f Principal Research Engineer The Special Projects team at Epic is responsible for executing high-impact projects that push the envelope to define the future of real-time graphics and gaming technology. In this role, you will research and prototype machine learning and generative approaches for game content creation, work with development engineers and product engineers to deploy and productize novel technology results in Epic products, and engage with the scientific community to contribute domain knowledge and advice in support of strategically relevant R&D efforts.

What you'll do

Research and prototype machine learning and generative approaches for game content creation
Work with development engineers and product engineers to deploy and productize novel technology results in Epic products

What you need

PhD degree in Computer Science, Machine Learning, Mathematics, Engineering, or a related discipline
Proven applied research impact in shipping titles
