Model Behavior Tutor - Epistemic Rigor & Truthfulness

You will ensure Grok reasons carefully, resists motivated reasoning, and communicates uncertainty and evidence proportionately.

Responsibilities: Assess model outputs for factual accuracy, logical coherence, fallacious reasoning, and hidden assumptions. Identify subtle ideological capture, statistical fallacies, and rhetorical sleights of hand. Write exemplary reasoning that models intellectual honesty, source evaluation, nuanced weighing of primary and secondary sources, and scoping of confidence. Construct adversarial examples and red-team prompts to expose remaining epistemic weaknesses. Contribute to the definition and scaling of constitutional principles for truth-seeking behavior.

Basic Qualifications: Published analytical work and academic training in a high-rigor field. Strong forecasting track record (e.g., Metaculus, Good Judgment), rigorous analysis, or public updating on errors. Deep knowledge in at least three of: philosophy of science, cognitive psychology, statistics, logic, linguistics, history, economics, or related disciplines. Ability to steel-man opposing views and separate settled knowledge from speculation. Habitual reliance on primary sources and base rates.

Preferred Skills and Experience: Experience in intelligence analysis, investigative journalism, or academic peer review.

XML job scraping automation by YubHub

Employment type: full-time, part-time, contract
Seniority: senior
Workplace: remote
Pay: $40/hour - $70/hour
Skills: factual accuracy, logical coherence, fallacious reasoning, ideological capture, statistical fallacies, rhetorical sleights of hand, intellectual honesty, source evaluation, nuanced weighing of primary and secondary sources, scoping of confidence, adversarial examples, red-team prompts, epistemic weaknesses, definition and scaling of constitutional principles for truth-seeking behavior, intelligence analysis, investigative journalism, academic peer review
Category: Engineering Technology
Company: xAI
Logo: https://logos.yubhub.co/xai.com.png
About the company: xAI creates AI systems to understand the universe and aid humanity in its pursuit of knowledge.
Website: https://www.xai.com/
Job posting: https://job-boards.greenhouse.io/xai/jobs/5017518007
Location: Remote
Date: 2026-04-18

Applied AI, AI Engineer

About the Job

We are seeking an Applied AI, AI Engineer to join our customer-facing technical organization. As a member of our team, you will work directly with enterprise clients from pre-sales through implementation to deploy cutting-edge AI solutions that deliver measurable business impact.

Your primary responsibility will be to identify high-value internal use cases across engineering, legal, HR, sales, and operations, and build or vibe code end-to-end LLM applications. You will own the full lifecycle of these applications, from prototype to production, maintenance, and iteration.

Beyond building, you will document learnings, share insights with product and research teams, and convert successful internal tools into customer demos or case studies where appropriate.

How We Work in Applied AI

We care about people and outputs. What matters is what you ship, not the time you spend on it. Bureaucracy is where urgency goes to vanish. You talk to whoever you need to talk to. The best idea wins, whether it comes from a principal engineer or someone in their first week. Always ask why. The best solutions come from deep understanding, not from copying what worked before. We say what we mean. Feedback is direct, timely, and given because we care. No politics. Low ego, high standards. We embrace an unstructured environment and find joy in it.

About You

You are fluent in English and have 3+ years of experience building production software, with meaningful experience deploying LLM applications. You have a bias toward shipping, preferring a working prototype over a perfect specification. You have strong coding skills in Python and front-end skills with React frameworks. You are comfortable working autonomously across teams with different needs and constraints, and have the communication skills to bridge non-technical teams and AI capabilities.

Ideally, you have contributions to open-source evaluation frameworks or published research on LLM evaluation, experience as a Customer Engineer, Forward Deployed Engineer, Sales Engineer, Solutions Architect, or Technical Product Manager, and experience with ML frameworks (PyTorch, HuggingFace Transformers).

Benefits

PTO: The CDI contract will be a 'Forfait 218 jours', corresponding to 25 days of holidays and, on average, 8 to 10 RTT days, with complete autonomy over working hours.

Health: Full health insurance coverage for you and your family.

Transportation: We offer a €600 annual mobility allowance, covering 50% of your public transportation costs and including the Sustainable Mobility Allowance (FMD), encouraging eco-friendly travel options such as cycling or carpooling.

Food: Swile meal vouchers worth €10.83 per worked day, 60% of which is covered by the company.

Sport: Gymlib - sponsorship by Mistral of a significant part of the monthly fee (depending on the program you choose).

Parental policy: 4 additional weeks for parents on top of what is offered by the French state.

Employment type: full-time
Seniority: mid
Workplace: onsite
Skills: Python, React Frameworks, LLM applications, PyTorch, HuggingFace Transformers, Open-source evaluation frameworks, Published research on LLM evaluation, Customer Engineer, Forward Deployed Engineer, Sales Engineer, Solutions Architect, Technical Product Manager
Category: Engineering Technology
Company: Mistral AI
Logo: https://logos.yubhub.co/mistral.ai.png
About the company: Mistral AI develops and provides high-performance, optimized, open-source, and cutting-edge AI models, products, and solutions for enterprise needs.
Website: https://mistral.ai
Job posting: https://jobs.lever.co/mistral/3d9a6ece-1f8c-4e0b-a275-fde6300ed1f8
Location: Paris
Date: 2026-04-17

Applied AI, Evaluation Engineer

About the Job

The Applied AI team is Mistral's customer-facing technical organization. We work directly with enterprise clients from pre-sales through implementation to deploy cutting-edge AI solutions that deliver measurable business impact.

As our first Evaluation Engineer, you'll design the methodology, build the infrastructure, and define what 'ready for production' means across verticals and use cases. You will design and implement evaluation systems that help our customers understand model performance across their specific use cases, build robust evaluation infrastructure, and work closely with both research and customer-facing teams.

Research builds evals for frontier capabilities, but customers don't care about MMLU scores. In Applied AI, we need evals and frameworks for customer reality: domain-specific, risk-aware, production-grade. The kind that tell you whether your medical summarization model will hallucinate drug interactions, or whether your legal assistant will invent case citations.

This role sits at the intersection of research, engineering, and solutions. You will play a critical cross-functional role in measuring, understanding, and improving the capabilities of our models for our enterprise customers.

Responsibilities

Design and implement comprehensive evaluation frameworks to measure LLM capabilities across diverse customer use cases, including text generation, reasoning, code, and domain-specific applications
Build scalable evaluation infrastructure and pipelines that enable rapid, reproducible assessment of model performance
Develop novel evaluation methodologies to assess emerging capabilities or verticalized use cases (cybersecurity, finance, healthcare, etc.) and enable the Solutions teams (Deployment Strategist and Applied AI) on these topics
Create custom evaluation suites tailored to enterprise customers' specific needs, working closely with them to understand their requirements and success criteria
Collaborate with research teams to translate evaluation insights into model improvements and training decisions
Partner with product teams to continuously improve our evaluation tooling based on customer feedback

How We Work in Applied AI

We care about people and outputs
What matters is what you ship, not the time you spend on it
Bureaucracy is where urgency goes to vanish. You talk to whoever you need to talk to
The best idea wins, whether it comes from a principal engineer or someone in their first week
Always ask why. The best solutions come from deep understanding, not from copying what worked before
We say what we mean. Feedback is direct, timely, and given because we care
No politics. Low ego, high standards
We embrace an unstructured environment and find joy in it

About You

You are fluent in English
You have 3+ years of experience in ML evaluation and benchmarking for LLM or agentic systems
You have proven experience implementing AI or machine learning products with APIs and back-end systems
You have a deep understanding of the concepts and algorithms underlying machine learning and LLMs
You have strong technical coding skills in Python
You have strong communication skills and can explain complex technical concepts in simple terms to both technical and non-technical audiences

Ideally You Have:

Contributions to open-source evaluation frameworks (e.g., LM Eval Harness, OpenAI Evals) or published research on LLM evaluation
Experience as a Customer Engineer, Forward Deployed Engineer, Sales Engineer, Solutions Architect or Technical Product Manager
Experience with ML frameworks (PyTorch, HuggingFace Transformers)

Benefits

PTO: The CDI contract will be a 'Forfait 218 jours', corresponding to 25 days of holidays and, on average, 8 to 10 RTT days, with complete autonomy over working hours
Health: Full health insurance coverage for you and your family
Transportation: We offer a €600 annual mobility allowance. This package covers 50% of your public transportation costs and includes the Sustainable Mobility Allowance (FMD), encouraging eco-friendly travel options such as cycling or carpooling
Food: Swile meal vouchers worth €10.83 per worked day, 60% of which is covered by the company
Sport: Gymlib - sponsorship by Mistral of a significant part of the monthly fee (depending on the program you choose)
Parental policy: 4 additional weeks for parents on top of what is offered by the French state

Employment type: full-time
Seniority: entry
Workplace: onsite
Skills: ML evaluation, benchmarking for LLM or agentic systems, AI or machine learning product implementation with APIs, back-end, Python, evaluation frameworks, open-source evaluation frameworks, PyTorch, HuggingFace Transformers
Category: Engineering Technology
Company: Mistral AI
Logo: https://logos.yubhub.co/mistral.ai.png
About the company: Mistral AI develops and integrates AI technology into daily working life, providing high-performance, optimized, open-source and cutting-edge models, products and solutions.
Website: https://mistral.ai
Job posting: https://jobs.lever.co/mistral/e0db3860-0a80-47a8-958a-f8e62f3bb59c
Location: Paris
Date: 2026-04-17