Senior Machine Learning Engineer, Public Sector

1ccfb615-468 Senior Machine Learning Engineer, Public Sector

We are seeking a Senior Machine Learning Engineer to join our Public Sector team. As a Senior Machine Learning Engineer, you will leverage techniques in generative AI, computer vision, reinforcement learning, and agentic AI to improve Scale's products and customer experience in production environments.

Our Public Sector Machine Learning team is focused on deploying cutting-edge models to mission-critical government systems through products like Donovan and Thunderforge. You will take state-of-the-art models, developed internally and by the community, and use them in production to solve problems for our customers and taskers.

Key responsibilities include:

Improving and maintaining production models through retraining, hyperparameter tuning, and architectural updates, while preserving core performance characteristics
Collaborating with product and research teams to identify and prototype ML-driven product enhancements, including for upcoming product lines
Working with massive datasets both to develop generic models and to fine-tune models for specific products
Building scalable machine learning infrastructure to automate and optimize our ML services
Serving as a cross-functional representative and advocate for machine learning techniques across engineering and product organizations

Ideal candidates will have extensive experience using computer vision, deep learning, deep reinforcement learning, or natural language processing in a production environment. A solid background in algorithms, data structures, and object-oriented programming is also required.

Nice-to-haves include a graduate degree in Computer Science, Machine Learning, or an Artificial Intelligence specialization, experience working with cloud platforms, and familiarity with ML evaluation frameworks and agentic model design.

Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.

You'll also receive benefits including comprehensive health, dental, and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. This role may be eligible for additional benefits such as a commuter stipend.

XML job scraping automation by YubHub

]]> full-time senior onsite $216,300-$300,300 USD computer vision, deep learning, deep reinforcement learning, natural language processing, algorithms, data structures, object-oriented programming, Python, TensorFlow, PyTorch, graduate degree in Computer Science, Machine Learning, or Artificial Intelligence specialization, experience working with cloud platforms, familiarity with ML evaluation frameworks and agentic model design Engineering Technology Scale https://logos.yubhub.co/scale.com.png Scale develops reliable AI systems for the world's most important decisions. https://scale.com/ https://job-boards.greenhouse.io/scaleai/jobs/4281519005 San Francisco, CA; New York, NY; Washington, DC 2026-04-18

c7da135a-ebe Applied AI, Evaluation Engineer

About the Job

The Applied AI team is Mistral's customer-facing technical organization. We work directly with enterprise clients from pre-sales through implementation to deploy cutting-edge AI solutions that deliver measurable business impact.

As our first Evaluation Engineer, you'll design the methodology, build the infrastructure, and define what 'ready for production' means across verticals and use cases. You will design and implement evaluation systems that help our customers understand model performance across their specific use cases, build robust evaluation infrastructure, and work closely with both research and customer-facing teams.

Research builds evals for frontier capabilities, but customers don't care about MMLU scores. In Applied AI, we need evals and frameworks for customer reality: domain-specific, risk-aware, and production-grade. These are the kind that tell you whether your medical summarization model will hallucinate drug interactions, or whether your legal assistant will invent case citations.

This role sits at the intersection of research, engineering, and solutions. You will play a critical cross-functional role in measuring, understanding, and improving the capabilities of our models for our enterprise customers.

Responsibilities

Design and implement comprehensive evaluation frameworks to measure LLM capabilities across diverse customer use cases, including text generation, reasoning, code, and domain-specific applications
Build scalable evaluation infrastructure and pipelines that enable rapid, reproducible assessment of model performance
Develop novel evaluation methodologies to assess emerging capabilities or verticalized use cases (cybersecurity, finance, healthcare, etc.) and enable the Solutions teams (Deployment Strategist and Applied AI) on these topics
Create custom evaluation suites tailored to enterprise customers' specific needs, working closely with them to understand their requirements and success criteria
Collaborate with research teams to translate evaluation insights into model improvements and training decisions
Partner with product teams to continuously improve our evaluation tooling based on customer feedback

How We Work in Applied AI

We care about people and outputs
What matters is what you ship, not the time you spend on it
Bureaucracy is where urgency goes to vanish. You talk to whoever you need to talk to
The best idea wins, whether it comes from a principal engineer or someone in their first week
Always ask why. The best solutions come from deep understanding, not from copying what worked before
We say what we mean. Feedback is direct, timely, and given because we care
No politics. Low ego, high standards
We embrace an unstructured environment and find joy in it

About You

You are fluent in English
You have 3+ years of experience in ML evaluation and benchmarking for LLM or agentic systems
You have proven experience in AI or machine learning product implementation with APIs and back-end systems
You have a deep understanding of the concepts and algorithms underlying machine learning and LLMs
You have strong technical coding skills in Python
You have strong communication skills and can explain complex technical concepts in simple terms to technical and non-technical audiences

Ideally You Have:

Contributions to open-source evaluation frameworks (e.g., LM Eval Harness, OpenAI Evals) or published research on LLM evaluation
Experience as a Customer Engineer, Forward Deployed Engineer, Sales Engineer, Solutions Architect or Technical Product Manager
Experience with ML frameworks (PyTorch, HuggingFace Transformers)

Benefits

PTO: The CDI contract will be a 'Forfait 218 jours', corresponding to 25 days of holiday and, on average, 8 to 10 RTT days, with complete autonomy over working hours
Health: Full health insurance coverage for you and your family
Transportation: We offer a €600 annual mobility allowance. This package covers 50% of your public transportation costs and includes the Sustainable Mobility Allowance (FMD), encouraging eco-friendly travel options such as cycling or carpooling
Food: Swile meal vouchers worth €10.83 per worked day, 60% of which is covered by the company
Sport: Gymlib, with Mistral covering a significant part of the monthly fee (depending on the program you choose)
Parental policy: 4 additional weeks for parents on top of what is offered by the French state

]]> full-time entry onsite ML evaluation, benchmarking for LLM or agentic systems, AI or machine learning product implementation with APIs, back-end, Python, evaluation frameworks, open-source evaluation frameworks, PyTorch, HuggingFace Transformers Engineering Technology Mistral AI https://logos.yubhub.co/mistral.ai.png Mistral AI develops and integrates AI technology into daily working life, providing high-performance, optimized, open-source and cutting-edge models, products and solutions. https://mistral.ai https://jobs.lever.co/mistral/e0db3860-0a80-47a8-958a-f8e62f3bb59c Paris 2026-04-17

67fcb604-29e Applied AI, Evaluation Engineer

About Mistral AI

At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.

We are a global organisation with teams distributed across France, the USA, the UK, Germany, and Singapore. Our comprehensive AI platform meets enterprise needs, whether on-premises or in cloud environments.

Our offerings include le Chat, the AI assistant for life and work.

About The Job

The Applied AI team is Mistral's customer-facing technical organisation. We work directly with enterprise clients from pre-sales through implementation to deploy cutting-edge AI solutions that deliver measurable business impact.

Responsibilities

Design and implement comprehensive evaluation frameworks to measure LLM capabilities across diverse customer use cases, including text generation, reasoning, code, and domain-specific applications
Build scalable evaluation infrastructure and pipelines that enable rapid, reproducible assessment of model performance
Develop novel evaluation methodologies to assess emerging capabilities or verticalized use cases (cybersecurity, finance, healthcare, etc.) and enable the Solutions teams (Deployment Strategist and Applied AI) on these topics
Create custom evaluation suites tailored to enterprise customers' specific needs, working closely with them to understand their requirements and success criteria
Collaborate with research teams to translate evaluation insights into model improvements and training decisions
Partner with product teams to continuously improve our evaluation tooling based on customer feedback

How We Work in Applied AI

We care about people and outputs
What matters is what you ship, not the time you spend on it
Bureaucracy is where urgency goes to vanish. You talk to whoever you need to talk to
The best idea wins, whether it comes from a principal engineer or someone in their first week
Always ask why. The best solutions come from deep understanding, not from copying what worked before
We say what we mean. Feedback is direct, timely, and given because we care
No politics. Low ego, high standards
We embrace an unstructured environment and find joy in it

About You

You are fluent in English
You have 3+ years of experience in ML evaluation and benchmarking for LLM or agentic systems
You have proven experience in AI or machine learning product implementation with APIs and back-end systems
You have a deep understanding of the concepts and algorithms underlying machine learning and LLMs
You have strong technical coding skills in Python

Ideally You Have:

Contributions to open-source evaluation frameworks (e.g., LM Eval Harness, OpenAI Evals) or published research on LLM evaluation
Experience as a Customer Engineer, Forward Deployed Engineer, Sales Engineer, Solutions Architect or Technical Product Manager
Experience with ML frameworks (PyTorch, HuggingFace Transformers)

Benefits

PTO: The CDI contract will be a 'Forfait 218 jours', corresponding to 25 days of holiday and, on average, 8 to 10 RTT days, with complete autonomy over working hours

Health: Full health insurance coverage for you and your family

Transportation: We offer a €600 annual mobility allowance. This package covers 50% of your public transportation costs and includes the Sustainable Mobility Allowance (FMD), encouraging eco-friendly travel options such as cycling or carpooling

Food: Swile meal vouchers worth €10.83 per worked day, 60% of which is covered by the company

Sport: Gymlib, with Mistral covering a significant part of the monthly fee (depending on the program you choose)

Parental policy: 4 additional weeks for parents on top of what is offered by the French state

]]> full-time entry onsite ML evaluation, benchmarking, LLM, agentic systems, AI, machine learning, APIs, back-end, Python, PyTorch, HuggingFace Transformers Engineering Technology Mistral AI Mistral AI develops and provides high-performance, optimized, open-source, and cutting-edge AI models, products, and solutions for enterprise needs. https://mistral.ai https://jobs.lever.co/mistral/e0db3860-0a80-47a8-958a-f8e62f3bb59c Paris 2026-03-10