<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>9a42f26c-511</externalid>
      <Title>Evals Engineer, Applied AI</Title>
      <Description><![CDATA[<p>We are seeking a technically rigorous and driven AI Research Engineer to join our Enterprise Evaluations team. This high-impact role is critical to our mission of delivering the industry&#39;s leading GenAI Evaluation Suite.</p>
<p>As a hands-on contributor to the core systems that ensure the safety, reliability, and continuous improvement of LLM-powered workflows and agents for the enterprise, you will partner with Scale&#39;s Operations team and enterprise customers to translate ambiguity into structured evaluation data. This involves guiding the creation and maintenance of gold-standard human-rated datasets and expert rubrics that anchor AI evaluation systems.</p>
<p>Your responsibilities will also include analyzing feedback and collected data to identify patterns, refine evaluation frameworks, and establish iterative improvement loops that enhance the quality and relevance of human-curated assessments. You will design, research, and develop LLM-as-a-Judge autorater frameworks and AI-assisted evaluation systems, including creating models that critique, grade, and explain agent outputs.</p>
<p>To succeed in this role, you will need a strong foundational knowledge of large language models, a passion for tackling complex evaluation challenges, and the ability to thrive in a dynamic, fast-paced research environment. You should be able to think outside the box, stay current with the latest literature in AI evaluation, and be passionate about integrating novel research ideas into our workflows to build best-in-class evaluation systems.</p>
<p>In addition to your technical expertise, you will need excellent communication and collaboration skills, as you will work closely with cross-functional teams to drive project success.</p>
<p>If you are a motivated and detail-oriented individual with a passion for AI research and evaluation, we encourage you to apply for this exciting opportunity.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$216,000-$270,000 USD</Salaryrange>
      <Skills>Python; PyTorch; TensorFlow; Large Language Models; Generative AI; Machine Learning; Applied Research; Evaluation Infrastructure; Advanced degree in Computer Science, Machine Learning, or a related quantitative field; Published research in leading ML or AI conferences; Experience designing, building, or deploying LLM-as-a-Judge frameworks or other automated evaluation systems; Experience collaborating with operations or external teams to define high-quality human annotator guidelines; Expertise in ML research engineering, stochastic systems, observability, or LLM-powered applications for model evaluation and analysis</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Scale AI</Employername>
      <Employerlogo>https://logos.yubhub.co/scale.com.png</Employerlogo>
      <Employerdescription>Scale AI develops reliable AI systems for the world&apos;s most important decisions.</Employerdescription>
      <Employerwebsite>https://scale.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/scaleai/jobs/4629589005</Applyto>
      <Location>San Francisco, CA; New York, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>683a40cb-69e</externalid>
      <Title>Machine Learning Research Scientist / Research Engineer, Post-Training</Title>
      <Description><![CDATA[<p>We are seeking a Research Scientist / Research Engineer to join our team. In this role, you will develop novel methods to improve the alignment and generalization of large-scale generative models, collaborate with researchers and engineers to define best practices in data-driven AI development, and partner with top foundation model labs to provide both technical and strategic input on the development of the next generation of generative AI models.</p>
<p>Key Responsibilities:</p>
<ul>
<li>Research and develop novel post-training techniques, including SFT, RLHF, and reward modeling, to enhance core LLM capabilities across both text and multimodal settings.</li>
<li>Design and experiment with new approaches to preference optimization.</li>
<li>Analyze model behavior, identify weaknesses, and propose solutions for bias mitigation and model robustness.</li>
<li>Publish research findings in top-tier AI conferences.</li>
</ul>
<p>Ideal Candidate:</p>
<ul>
<li>Ph.D. or Master&#39;s degree in Computer Science, Machine Learning, AI, or a related field.</li>
<li>Deep understanding of deep learning, reinforcement learning, and large-scale model fine-tuning.</li>
<li>Experience with post-training techniques such as RLHF, preference modeling, or instruction tuning.</li>
<li>Excellent written and verbal communication skills</li>
<li>Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals</li>
<li>Previous experience in a customer-facing role.</li>
</ul>
<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$252,000-$315,000 USD</Salaryrange>
      <Skills>deep learning, reinforcement learning, large-scale model fine-tuning, post-training techniques, RLHF, preference modeling, instruction tuning, published research, customer-facing role</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Scale</Employername>
      <Employerlogo>https://logos.yubhub.co/scale.com.png</Employerlogo>
      <Employerdescription>Scale develops reliable AI systems for the world&apos;s most important decisions.</Employerdescription>
      <Employerwebsite>https://scale.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/scaleai/jobs/4528009005</Applyto>
      <Location>San Francisco, CA; Seattle, WA; New York, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>60a7e1e6-b51</externalid>
      <Title>Tech Lead/Manager, Machine Learning Research Scientist- LLM Evals</Title>
      <Description><![CDATA[<p>As the leading data and evaluation partner for frontier AI companies, we&#39;re dedicated to advancing the evaluation and benchmarking of large language models (LLMs). Our Research teams work with the industry&#39;s leading AI labs to provide high-quality data and accelerate progress in GenAI research.</p>
<p>We&#39;re seeking a Tech Lead Manager to lead a talented team of research scientists and research engineers focused on developing and implementing novel evaluation methodologies, metrics, and benchmarks to assess the capabilities and limitations of our cutting-edge LLMs.</p>
<p>Key responsibilities:</p>
<ul>
<li>Lead a team of highly effective research scientists and research engineers on LLM evals.</li>
<li>Conduct research on the effectiveness and limitations of existing LLM evaluation techniques.</li>
<li>Design and develop novel evaluation benchmarks for large language models, covering areas such as instruction following, factuality, robustness, and fairness.</li>
<li>Communicate, collaborate, and build relationships with clients and peer teams to facilitate cross-functional projects.</li>
<li>Collaborate with internal teams and external partners to refine metrics and create standardized evaluation protocols.</li>
<li>Implement scalable and reproducible evaluation pipelines using modern ML frameworks.</li>
<li>Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives.</li>
</ul>
<p>The ideal candidate has 5+ years of hands-on experience with large language models, NLP, and Transformer modeling, in both research and engineering settings. Experience supporting and leading a team of research scientists and research engineers is also required.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$264,800-$331,000 USD</Salaryrange>
      <Skills>large language model, NLP, Transformer modeling, research and engineering development, team leadership, cross-functional collaboration, evaluation methodologies, metrics and benchmarks, scalable and reproducible evaluation pipelines, modern ML frameworks, published research in top-tier AI conferences, open-source benchmarking initiatives, customer-facing role</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Scale</Employername>
      <Employerlogo>https://logos.yubhub.co/scale.com.png</Employerlogo>
      <Employerdescription>Scale develops reliable AI systems for the world&apos;s most important decisions, providing high-quality data and full-stack technologies.</Employerdescription>
      <Employerwebsite>https://scale.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/scaleai/jobs/4304790005</Applyto>
      <Location>San Francisco, CA; Seattle, WA; New York, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>b1be4c11-417</externalid>
      <Title>Senior Research Scientist, Reward Models</Title>
      <Description><![CDATA[<p>As a Senior Research Scientist on our Reward Models team, you&#39;ll lead research efforts to improve how we specify and learn human preferences at scale. Your work will directly shape how our models understand and optimize for what humans actually want, enabling Claude to be more useful, more reliable, and better aligned with human values.</p>
<p>This role focuses on pushing the frontier of reward modeling for large language models. You&#39;ll develop novel architectures and training methodologies for RLHF, research new approaches to LLM-based evaluation and grading (including rubric-based methods), and investigate techniques to identify and mitigate reward hacking. You&#39;ll collaborate closely with teams across Anthropic, including Finetuning, Alignment Science, and our broader research organization, to ensure your work translates into concrete improvements in both model capabilities and safety.</p>
<p>We&#39;re looking for someone who can drive ambitious research agendas while also shipping practical improvements to production systems. You&#39;ll have the opportunity to work on some of the most important open problems in AI alignment, with access to frontier models and significant computational resources. Your work will directly advance the science of how we train AI systems to be both highly capable and safe.</p>
<p>Responsibilities:</p>
<ul>
<li>Lead research on novel reward model architectures and training approaches for RLHF</li>
<li>Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches that improve consistency and interpretability</li>
<li>Research techniques to detect, characterize, and mitigate reward hacking and specification gaming</li>
<li>Design experiments to understand reward model generalization, robustness, and failure modes</li>
<li>Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines</li>
<li>Contribute to research publications, blog posts, and internal documentation</li>
<li>Mentor other researchers and help build institutional knowledge around reward modeling</li>
</ul>
<p>You may be a good fit if you:</p>
<ul>
<li>Have a track record of research contributions in reward modeling, RLHF, or closely related areas of machine learning</li>
<li>Have experience training and evaluating reward models for large language models</li>
<li>Are comfortable designing and running large-scale experiments with significant computational resources</li>
<li>Can work effectively across research and engineering, iterating quickly while maintaining scientific rigor</li>
<li>Enjoy collaborative research and can communicate complex ideas clearly to diverse audiences</li>
<li>Care deeply about building AI systems that are both highly capable and safe</li>
</ul>
<p>Strong candidates may also:</p>
<ul>
<li>Have published research on reward modeling, preference learning, or RLHF</li>
<li>Have experience with LLM-as-judge approaches, including calibration and reliability challenges</li>
<li>Have worked on reward hacking, specification gaming, or related robustness problems</li>
<li>Have experience with constitutional AI, debate, or other scalable oversight approaches</li>
<li>Have contributed to production ML systems at scale</li>
<li>Have familiarity with interpretability techniques as applied to understanding reward model behavior</li>
</ul>
<p>The annual compensation range for this role is $350,000-$500,000 USD.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$350,000-$500,000 USD</Salaryrange>
      <Skills>reward modeling, RLHF, LLM-based evaluation and grading, rubric-driven approaches, reward hacking, specification gaming, large-scale experiments, computational resources, research and engineering, collaborative research, complex ideas communication, AI systems development, published research, LLM-as-judge approaches, calibration and reliability challenges, constitutional AI, debate, scalable oversight approaches, production ML systems, interpretability techniques</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic creates reliable, interpretable, and steerable AI systems. It is a public benefit corporation headquartered in San Francisco.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5024835008</Applyto>
      <Location>Remote-Friendly (Travel Required) | San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>769c0070-5b2</externalid>
      <Title>Research Scientist, Agent Robustness</Title>
      <Description><![CDATA[<p>As a Research Scientist working on Agent Robustness, you will work on the fundamental challenges of building AI agents that are safe and aligned with humans.</p>
<p>For example, you might:</p>
<ul>
<li>Research the science of AI agent capabilities with a focus on how they relate to safety, risk factors, and methodologies for benchmarking them;</li>
<li>Design and build harnesses to test AI agents&#39; tendency to take harmful actions when pressured to do so by users or tricked into doing so by elements of their environment;</li>
<li>Design and build exploits and mitigations for new and unique failure modes that arise as AI agents gain affordances like coding, web browsing, and computer use;</li>
<li>Characterize and design mitigations for potential failure modes or broader risks of systems involving multiple interacting AI agents.</li>
</ul>
<p>Ideally you&#39;d have:</p>
<ul>
<li>Commitment to our mission of promoting safe, secure, and trustworthy AI deployments in the industry as frontier AI capabilities continue to advance;</li>
<li>Practical experience conducting technical research collaboratively;</li>
<li>Experience with post-training and RL techniques such as RLHF, DPO, GRPO, and similar approaches;</li>
<li>A track record of published research in machine learning, particularly in generative AI;</li>
<li>At least three years of experience addressing sophisticated ML problems, whether in a research setting or in product development;</li>
<li>Strong written and verbal communication skills to operate in a cross-functional team.</li>
</ul>
<p>Nice to have:</p>
<ul>
<li>Hands-on experience with agent evaluation frameworks such as SWE-bench, WebArena, OSWorld, Inspect, or similar tools;</li>
<li>Experience with red-teaming, prompt injection, or adversarial testing of AI systems.</li>
</ul>
<p>Our research interviews are crafted to assess candidates&#39; skills in practical ML prototyping and debugging, their grasp of research concepts, and their alignment with our organizational culture. We will not ask any LeetCode-style questions. If you&#39;re excited about advancing AI safety and contributing to our mission, we encourage you to apply, even if your experience doesn&#39;t perfectly align with every requirement.</p>
<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity-based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You&#39;ll also receive benefits including, but not limited to: Comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$216,000-$270,000 USD</Salaryrange>
      <Skills>Commitment to our mission of promoting safe, secure, and trustworthy AI deployments in the industry as frontier AI capabilities continue to advance; Practical experience conducting technical research collaboratively; Experience with post-training and RL techniques such as RLHF, DPO, GRPO, and similar approaches; A track record of published research in machine learning, particularly in generative AI; At least three years of experience addressing sophisticated ML problems, whether in a research setting or in product development; Hands-on experience with agent evaluation frameworks such as SWE-bench, WebArena, OSWorld, Inspect, or similar tools; Experience with red-teaming, prompt injection, or adversarial testing of AI systems</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Scale</Employername>
      <Employerlogo>https://logos.yubhub.co/scale.com.png</Employerlogo>
      <Employerdescription>Scale develops reliable AI systems for the world&apos;s most important decisions.</Employerdescription>
      <Employerwebsite>https://scale.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/scaleai/jobs/4675684005</Applyto>
      <Location>San Francisco, CA; New York, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>e197305b-444</externalid>
      <Title>Model Behavior Tutor - Social Cognition &amp; EQ</Title>
      <Description><![CDATA[<p><strong>About the Role</strong></p>
<p>You will make Grok extraordinarily adept at reading emotional subtext, social context, and user intent, then responding with perfect calibration.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Detect and interpret implied emotional states, attachment styles, and unspoken needs.</li>
<li>Teach models appropriate mirroring, boundary-setting, defusing, challenging, or comforting.</li>
<li>Detect and correct emotionally tone-deaf responses in AI-generated outputs.</li>
<li>Build datasets exposing dark-pattern social tactics (concern-trolling, gaslighting, etc.) so Grok remains immune.</li>
<li>Ensure cross-cultural sensitivity without performative overcorrection.</li>
</ul>
<p><strong>Basic Qualifications</strong></p>
<ul>
<li>Direct clinical experience with diverse populations (trauma, PTSD, personality disorders, neurodivergence, etc.).</li>
<li>Expertise in attachment theory, polyvagal theory, mentalization, or equivalent frameworks.</li>
<li>Demonstrated ability to read subtext and calibrate responses in high-stakes therapeutic or educational settings.</li>
<li>Familiarity with DSM-5 diagnostics and evidence-based interventions (CBT, DBT, trauma-focused modalities).</li>
<li>Strong ethical grounding and understanding of scope-of-practice boundaries.</li>
</ul>
<p><strong>Preferred Skills and Experience</strong></p>
<ul>
<li>Active or past clinical licensure.</li>
<li>Experience supervising clinicians or teaching at the postgraduate level.</li>
<li>Published research or clinical writing on trauma, emotional regulation, or social cognition.</li>
<li>Public policy or advocacy experience around mental health.</li>
</ul>
<p><strong>Location and Other Expectations</strong></p>
<ul>
<li>Tutor roles may be offered as full-time, part-time, or contractor positions, depending on role needs and candidate fit.</li>
<li>For contractor positions, hours vary based on project scope and contractor availability, with no fixed commitments. On average, most projects involve at least 10 hours per week to achieve deliverables effectively, though this depends on the scope of work; contractors have full flexibility to set their own hours and determine the time needed to complete deliverables.</li>
<li>Tutor roles may be performed remotely from any location worldwide, subject to legal eligibility, time-zone compatibility, and role-specific needs.</li>
<li>For US-based candidates, please note we are unable to hire in the states of Wyoming and Illinois at this time.</li>
<li>We are unable to provide visa sponsorship.</li>
<li>For those who will be working from a personal device, your computer must be a Chromebook, a Mac running macOS 11.0 or later, or a PC running Windows 10 or later.</li>
</ul>
<p><strong>Compensation and Benefits</strong></p>
<p>US-based candidates: $45/hour - $70/hour, depending on factors including relevant experience, skills, education, geographic location, and qualifications. International candidates: information will be provided during the recruitment process.</p>
<p>Benefits vary based on employment type, location, and jurisdiction. Benefits for eligible U.S.-based positions include health insurance, a 401(k) plan, and paid sick leave. Specific details and role-specific information will be provided during the interview process.</p>
]]></Description>
      <Jobtype>full-time|part-time|contract</Jobtype>
      <Experiencelevel></Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange>$45/hour - $70/hour</Salaryrange>
      <Skills>Clinical experience with diverse populations, Attachment theory, Polyvagal theory, Mentalization, DSM-5 diagnostics, Evidence-based interventions, Emotional regulation, Social cognition, Active or past clinical licensure, Experience supervising clinicians, Published research or clinical writing, Public policy or advocacy experience</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>xAI</Employername>
      <Employerlogo>https://logos.yubhub.co/xai.com.png</Employerlogo>
      <Employerdescription>xAI creates AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.</Employerdescription>
      <Employerwebsite>https://www.xai.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/xai/jobs/5017526007</Applyto>
      <Location>Remote</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>6e5d1aa4-d3c</externalid>
      <Title>Policy Manager, Chemical Weapons and High Yield Explosives</Title>
      <Description><![CDATA[<p><strong>About this Role</strong></p>
<p>This role offers a unique opportunity to shape how AI systems handle sensitive chemical and explosives information. You&#39;ll work with leading AI safety researchers while tackling critical problems in preventing catastrophic misuse. If you&#39;re excited about using your expertise to ensure AI systems remain safe and beneficial, we want to hear from you.</p>
<p><strong>Responsibilities:</strong></p>
<ul>
<li>Design and implement evaluation methodologies for assessing AI model capabilities relevant to chemical weapons, explosives synthesis, and energetic materials</li>
<li>Develop and execute strategies to identify and mitigate potential C/E misuse in model outputs</li>
<li>Create C/E threat models, including precursor identification, synthesis routes, and weaponization techniques</li>
<li>Review and analyze traffic to identify potential policy violations related to C/E content</li>
<li>Collaborate with software engineers to develop and refine detection systems and automated enforcement tools for C/E threats</li>
<li>Conduct rapid response to escalations involving dangerous C/E queries</li>
<li>Collaborate across teams to establish safety benchmarks and develop appropriate model guardrails</li>
<li>Translate C/E domain knowledge into actionable safety requirements</li>
<li>Develop approaches to assess C/E model knowledge boundaries for dual-use chemical information</li>
<li>Monitor emerging threats in the C/E landscape to inform policy development</li>
</ul>
<p><strong>You may be a good fit if you:</strong></p>
<ul>
<li>Have a Ph.D. in Chemistry, Chemical Engineering, or a related field with focus on energetic materials, explosives, and/or chemical weapons</li>
<li>Have 5-8+ years of experience in chemical weapons and/or explosives defense, with deep expertise in energetic materials, chemical weapon agents, or related areas</li>
<li>Have knowledge of high yield explosives application to radiological dispersal devices (dirty bombs) and related radiological weapons</li>
<li>Have a track record of translating specialized technical knowledge into actionable safety policies or guidelines</li>
<li>Are comfortable navigating ambiguity and developing solutions for novel safety challenges</li>
<li>Can work independently while maintaining strong collaboration with cross-functional teams including engineering, enforcement, and research</li>
<li>Thrive in a fast-paced environment where you balance rigorous scientific standards with rapid threat response</li>
<li>Are passionate about preventing misuse of dangerous technical knowledge while enabling beneficial applications</li>
</ul>
<p><strong>Strong candidates may have:</strong></p>
<ul>
<li>Experience with both chemical weapons and high yield explosives defense</li>
<li>Experience working with defense, intelligence, or nonproliferation organizations (e.g., OPCW, IAEA, national labs, defense contractors)</li>
<li>Published research or practical experience in explosives characterization, chemical weapons detection, or related security applications</li>
<li>Knowledge of international chemical weapons conventions (CWC) and controlled substances regulations</li>
<li>Demonstrated ability to communicate complex technical concepts to non-specialist audiences</li>
<li>Experience with chemical databases (PubChem, Reaxys, SciFinder) and computational chemistry tools</li>
<li>Understanding of radiological materials and their interaction with explosive dispersal mechanisms</li>
<li>Familiarity with dual-use C/E research concerns and responsible disclosure practices</li>
</ul>
<p>The annual compensation range for this role is $245,000-$285,000 USD.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$245,000-$285,000 USD</Salaryrange>
      <Skills>Ph.D. in Chemistry, Chemical Engineering, or a related field, 5-8+ years of experience in chemical weapons and/or explosives defense, Knowledge of high yield explosives application to radiological dispersal devices, Experience with chemical databases (PubChem, Reaxys, SciFinder), Understanding of radiological materials and their interaction with explosive dispersal mechanisms, Experience with both chemical weapons and high yield explosives defense, Experience working with defense, intelligence, or nonproliferation organizations, Published research or practical experience in explosives characterization, chemical weapons detection, or related security applications, Knowledge of international chemical weapons conventions (CWC) and controlled substances regulations, Demonstrated ability to communicate complex technical concepts to non-specialist audiences</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that focuses on creating reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5140226008</Applyto>
      <Location>San Francisco, CA | New York City, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>9f6fed50-cc0</externalid>
      <Title>Applied AI, AI Engineer</Title>
      <Description><![CDATA[<p>About the Job</p>
<p>We are seeking an Applied AI, AI Engineer to join our customer-facing technical organization. As a member of our team, you will work directly with enterprise clients from pre-sales through implementation to deploy cutting-edge AI solutions that deliver measurable business impact.</p>
<p>Your primary responsibility will be to identify high-value internal use cases across engineering, legal, HR, sales, and operations, and build or vibe code end-to-end LLM applications. You will own the full lifecycle of these applications, from prototype to production, maintenance, and iteration.</p>
<p>In addition to your technical skills, you will also be responsible for documenting learnings and sharing insights with product and research teams, and converting successful internal tools into customer demos or case studies where appropriate.</p>
<p>How We Work in Applied AI</p>
<p>We care about people and outputs:</p>
<ul>
<li>What matters is what you ship, not the time you spend on it.</li>
<li>Bureaucracy is where urgency goes to vanish. You talk to whoever you need to talk to.</li>
<li>The best idea wins, whether it comes from a principal engineer or someone in their first week.</li>
<li>Always ask why. The best solutions come from deep understanding, not from copying what worked before.</li>
<li>We say what we mean. Feedback is direct, timely, and given because we care.</li>
<li>No politics. Low ego, high standards.</li>
<li>We embrace an unstructured environment and find joy in it.</li>
</ul>
<p>About You</p>
<p>You are fluent in English and have 3+ years of experience building production software, with meaningful experience deploying LLM applications. You have a bias toward shipping, preferring a working prototype over a perfect specification. You possess strong technical coding skills in Python and front-end skills with React Frameworks. You are comfortable working autonomously across teams with different needs and constraints, and have strong communication skills to bridge non-technical teams and AI capabilities.</p>
<p>Ideally, you have contributions to open-source evaluation frameworks or published research on LLM evaluation, experience as a Customer Engineer, Forward Deployed Engineer, Sales Engineer, Solutions Architect, or Technical Product Manager, and experience with ML frameworks (PyTorch, HuggingFace Transformers).</p>
<p>Benefits</p>
<p>PTO: The CDI contract is a &#39;Forfait 218 jours&#39;, corresponding to 25 days of holiday plus, on average, 8 to 10 RTT days, with complete autonomy over working hours.</p>
<p>Health: Full health insurance coverage for you and your family.</p>
<p>Transportation: We offer a €600 annual mobility allowance, covering 50% of your public transportation costs and including the Sustainable Mobility Allowance (FMD), encouraging eco-friendly travel options such as cycling or carpooling.</p>
<p>Food: Swile meal vouchers worth €10.83 per day worked, 60% of which is covered by the company.</p>
<p>Sport: Gymlib - Mistral sponsors a significant part of the monthly fee (depending on the program you choose).</p>
<p>Parental policy: 4 additional weeks for parents on top of what is offered by the French state.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Python, React Frameworks, LLM applications, PyTorch, HuggingFace Transformers, Open-source evaluation frameworks, Published research on LLM evaluation, Customer Engineer, Forward Deployed Engineer, Sales Engineer, Solutions Architect, Technical Product Manager</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Mistral AI</Employername>
      <Employerlogo>https://logos.yubhub.co/mistral.ai.png</Employerlogo>
      <Employerdescription>Mistral AI develops and provides high-performance, optimized, open-source, and cutting-edge AI models, products, and solutions for enterprise needs.</Employerdescription>
      <Employerwebsite>https://mistral.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.lever.co/mistral/3d9a6ece-1f8c-4e0b-a275-fde6300ed1f8</Applyto>
      <Location>Paris</Location>
      <Country></Country>
      <Postedate>2026-04-17</Postedate>
    </job>
    <job>
      <externalid>5c28c97d-fc5</externalid>
      <Title>Member of Technical Staff - Image / Video Generation</Title>
      <Description><![CDATA[<p><strong>Job Title</strong></p>
<p>Member of Technical Staff - Image / Video Generation</p>
<p><strong>Job Description</strong></p>
<p>We&#39;re the team behind Latent Diffusion, Stable Diffusion, and FLUX: foundational technologies that changed how the world creates images and video. We&#39;re creating the generative models that power how people make images and video, used by millions of creators, developers, and businesses worldwide. Our FLUX models are among the most advanced in the world, and we&#39;re just getting started.</p>
<p><strong>Why This Role</strong></p>
<p>You&#39;ll train large-scale diffusion models for image and video generation, exploring new approaches while maintaining the rigor that helps us distinguish meaningful progress from incremental tweaks. This isn&#39;t about following established recipes; it&#39;s about running the experiments that clarify which architectural choices matter and which are less impactful.</p>
<p><strong>What You’ll Work On</strong></p>
<ul>
<li>Train large-scale diffusion transformer models for image and video data, working at the scale where intuitions break and empirical evidence matters</li>
<li>Rigorously ablate design choices: run experiments that isolate variables, control for confounds, and produce insights you can actually trust, then communicate those results to shape our research direction</li>
<li>Reason about the speed-quality tradeoffs of neural network architectures in production settings where both constraints matter simultaneously</li>
<li>Fine-tune diffusion models for specialized applications such as image and video upscalers, inpainting/outpainting models, and other tasks where general-purpose models aren&#39;t enough</li>
</ul>
<p><strong>What We’re Looking For</strong></p>
<ul>
<li>You&#39;ve trained large-scale diffusion models and developed strong intuitions about what matters. You know that at research scale, every design choice has tradeoffs, and the only way to know which ones are worth making is through careful ablation. You&#39;re comfortable debugging distributed training issues and presenting research findings to the team.</li>
</ul>
<p><strong>Required Skills</strong></p>
<ul>
<li>Hands-on experience training large-scale diffusion models for image and video data, with practical knowledge of common failure modes and what matters most in training</li>
<li>Experience fine-tuning diffusion models for specialized applications: upscalers, inpainting, outpainting, or other tasks where understanding the domain matters as much as understanding the architecture</li>
<li>Deep understanding of how to effectively evaluate image and video generative models, knowing which metrics correlate with quality and which are just convenient proxies</li>
<li>Strong proficiency in PyTorch, transformer architectures, and the full ecosystem of modern deep learning</li>
<li>Solid understanding of distributed training techniques (FSDP, low-precision training, model parallelism), because our models don&#39;t fit on one GPU and training decisions impact research outcomes</li>
</ul>
<p><strong>Preferred Skills</strong></p>
<ul>
<li>Experience writing forward and backward Triton kernels and ensuring their correctness while considering floating point errors</li>
<li>Proficiency with profiling, debugging, and optimizing single and multi-GPU operations using tools like Nsight or stack trace viewers</li>
<li>Knowledge of the performance characteristics of different architectural choices at scale</li>
<li>Published research that contributed to how people think about generative models</li>
</ul>
<p><strong>How We Work Together</strong></p>
<p>We’re a distributed team with real offices that people actually use. Depending on your role, you’ll either join us in Freiburg or SF at least 2 days a week (or one full week every other week), or work remotely with a monthly in-person week to stay connected. We’ll cover reasonable travel costs to make this possible. We think in-person time matters, and we’ve structured things to make it accessible to all. We’ll discuss what this will look like for the role during our interview process.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>large-scale diffusion models, image and video data, PyTorch, transformer architectures, distributed training techniques, writing forward and backward Triton kernels, profiling, debugging, and optimizing single and multi-GPU operations, published research on generative models</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Black Forest Labs</Employername>
      <Employerlogo>https://logos.yubhub.co/blackforestlabs.com.png</Employerlogo>
      <Employerdescription>Black Forest Labs is a research lab developing foundational technologies for image and video generation. They have a growing presence in San Francisco and headquarters in Freiburg, Germany.</Employerdescription>
      <Employerwebsite>https://www.blackforestlabs.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/blackforestlabs/jobs/4132217008</Applyto>
      <Location>Freiburg (Germany)</Location>
      <Country></Country>
      <Postedate>2026-04-17</Postedate>
    </job>
    <job>
      <externalid>e3cc8654-159</externalid>
      <Title>Policy Manager, Chemical Weapons and High Yield Explosives</Title>
      <Description><![CDATA[<p><strong>About this Role</strong></p>
<p>This role offers a unique opportunity to shape how AI systems handle sensitive chemical and explosives information. You&#39;ll work with leading AI safety researchers while tackling critical problems in preventing catastrophic misuse. If you&#39;re excited about using your expertise to ensure AI systems remain safe and beneficial, we want to hear from you.</p>
<p><strong>Responsibilities:</strong></p>
<ul>
<li>Design and implement evaluation methodologies for assessing AI model capabilities relevant to chemical weapons, explosives synthesis, and energetic materials</li>
<li>Develop and execute strategies to identify and mitigate potential C/E misuse in model outputs</li>
<li>Create C/E threat models, including precursor identification, synthesis routes, and weaponization techniques</li>
<li>Review and analyze traffic to identify potential policy violations related to C/E content</li>
<li>Collaborate with software engineers to develop and refine detection systems and automated enforcement tools for C/E threats</li>
<li>Conduct rapid response to escalations involving dangerous C/E queries</li>
<li>Collaborate across teams to establish safety benchmarks and develop appropriate model guardrails</li>
<li>Translate C/E domain knowledge into actionable safety requirements</li>
<li>Develop approaches to assess C/E model knowledge boundaries for dual-use chemical information</li>
<li>Monitor emerging threats in the C/E landscape to inform policy development</li>
</ul>
<p><strong>You may be a good fit if you:</strong></p>
<ul>
<li>Have a Ph.D. in Chemistry, Chemical Engineering, or a related field with focus on energetic materials, explosives, and/or chemical weapons</li>
<li>Have 5-8+ years of experience in chemical weapons and/or explosives defense, with deep expertise in energetic materials, chemical weapon agents, or related areas</li>
<li>Have knowledge of high yield explosives application to radiological dispersal devices (dirty bombs) and related radiological weapons</li>
<li>Have a track record of translating specialized technical knowledge into actionable safety policies or guidelines</li>
<li>Are comfortable navigating ambiguity and developing solutions for novel safety challenges</li>
<li>Can work independently while maintaining strong collaboration with cross-functional teams including engineering, enforcement, and research</li>
<li>Thrive in a fast-paced environment where you balance rigorous scientific standards with rapid threat response</li>
<li>Are passionate about preventing misuse of dangerous technical knowledge while enabling beneficial applications</li>
</ul>
<p><strong>Strong candidates may have:</strong></p>
<ul>
<li>Experience with both chemical weapons and high yield explosives defense</li>
<li>Experience working with defense, intelligence, or nonproliferation organizations (e.g., OPCW, IAEA, national labs, defense contractors)</li>
<li>Published research or practical experience in explosives characterization, chemical weapons detection, or related security applications</li>
<li>Knowledge of international chemical weapons conventions (CWC) and controlled substances regulations</li>
<li>Demonstrated ability to communicate complex technical concepts to non-specialist audiences</li>
<li>Experience with chemical databases (PubChem, Reaxys, SciFinder) and computational chemistry tools</li>
<li>Understanding of radiological materials and their interaction with explosive dispersal mechanisms</li>
<li>Familiarity with dual-use C/E research concerns and responsible disclosure practices</li>
</ul>
<p><strong>Logistics</strong></p>
<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience.</p>
<p><strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>
<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>
<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong> Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>
<p><strong>Your safety matters to us.</strong> To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$245,000 - $285,000 USD</Salaryrange>
      <Skills>Ph.D. in Chemistry, Chemical Engineering, or a related field, 5-8+ years of experience in chemical weapons and/or explosives defense, Knowledge of high yield explosives application to radiological dispersal devices (dirty bombs) and related radiological weapons, Experience with chemical databases (PubChem, Reaxys, SciFinder) and computational chemistry tools, Understanding of radiological materials and their interaction with explosive dispersal mechanisms, Familiarity with dual-use C/E research concerns and responsible disclosure practices, Experience with both chemical weapons and high yield explosives defense, Experience working with defense, intelligence, or nonproliferation organizations (e.g., OPCW, IAEA, national labs, defense contractors), Published research or practical experience in explosives characterization, chemical weapons detection, or related security applications, Knowledge of international chemical weapons conventions (CWC) and controlled substances regulations, Demonstrated ability to communicate complex technical concepts to non-specialist audiences</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic&apos;s mission is to create reliable, interpretable, and steerable AI systems. The company is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>245000</Compensationmin>
      <Compensationmax>285000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5140226008</Applyto>
      <Location>San Francisco, CA | New York City, NY</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>eb1455d9-a55</externalid>
      <Title>Technical Advisor, Microsoft Superintelligence (Office of the CEO)</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft is looking for a talented Technical Advisor to join the Microsoft Superintelligence team in the Office of the CEO. This role requires someone who can provide direct technical and strategic support to the CEO of Microsoft AI, identifying opportunities and risks across our research portfolio.</p>
<p><strong>About the Role</strong></p>
<p>As a Technical Advisor, you will engage deeply with research teams on pre-training, post-training, multimodal systems, infrastructure, and safety/alignment work. You will prototype and evaluate models, techniques, and infrastructure to develop insights and inform strategic decisions. You will analyze technical tradeoffs and synthesize findings into clear, actionable recommendations for leadership.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Provide direct technical and strategic support to the CEO of Microsoft AI—identifying opportunities and risks across our research portfolio</li>
<li>Engage deeply with research teams on pre-training, post-training, multimodal systems, infrastructure, and safety/alignment work</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>4+ years of technical engineering experience, including hands-on coding</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Software engineering skills with the ability to rapidly prototype and evaluate technical solutions</li>
<li>Understanding of modern ML techniques, including large language models, multimodal systems, agents, or related areas</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Strong written and verbal communication skills, with the ability to distill complex technical concepts for executive audiences</li>
<li>Track record of operating effectively in fast-paced, ambiguous environments</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Software Engineering IC4 – The typical base pay range for this role across the U.S. is USD $119,800 – $234,700 per year</li>
<li>Certain roles may be eligible for benefits and other compensation</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>USD $119,800 – $234,700 per year</Salaryrange>
      <Skills>software engineering, machine learning, large language models, multimodal systems, agents, published research at top ML/AI venues, experience at a leading AI research lab or startup</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft is a multinational technology company that develops, manufactures, licenses, and supports a wide range of software products, services, and devices. They are a leader in the technology industry and have a strong presence in the global market. Microsoft is known for its innovative products and services, such as Windows, Office, and Azure.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>119800</Compensationmin>
      <Compensationmax>234700</Compensationmax>
      <Applyto>https://microsoft.ai/job/technical-advisor-microsoft-superintelligence-office-of-the-ceo/</Applyto>
      <Location>Mountain View</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
  </jobs>
</source>