{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/title/model-behavior-architect"},"x-facet":{"type":"title","slug":"model-behavior-architect","display":"Model Behavior Architect","count":2},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d9383bcf-242"},"title":"Model Behavior Architect","description":"<p>About this role</p>\n<p>As a Model Behavior Architect at Mistral AI, you will be at the forefront of defining and measuring Large Language Model (LLM) behavior. You will work closely with our Science team to define what &#39;good&#39; looks like for various tasks, including Reasoning, Audio, Alignment, Tools, and Frontier bets.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Interact with models to identify areas for improvement in model behavior</li>\n<li>Gather internal and external feedback on model behavior to scope areas for improvement</li>\n<li>Design and implement evaluation pipelines, data guidelines, data generation, and synthetic testing environments</li>\n<li>Identify and fix edge case behaviors through rigorous testing</li>\n<li>Develop robust evaluation pipelines for model candidates</li>\n<li>Collaborate with AI Scientists</li>\n</ul>\n<p>About you</p>\n<ul>\n<li>You have a deep understanding of linguistics, language, and translation, engineering and code behavior, or LLM agents at work, including reasoning and tool use</li>\n<li>You have prior knowledge in training and optimizing model behavior</li>\n<li>You are an expert at building robust evaluations</li>\n<li>You thrive in dynamic and technically complex environments</li>\n<li>You have a track record of delivering innovative, out-of-the-box solutions to address real-world constraints</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d9383bcf-242","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/4337cebc-b951-4528-98f8-ebcb45db5645","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Large Language Models","Model Evaluation","Policy Writing","Evaluation Pipelines","Data Generation","Synthetic Testing Environments"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:47:33.755Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Large Language Models, Model Evaluation, Policy Writing, Evaluation Pipelines, Data Generation, Synthetic Testing Environments"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_dbcceacb-d90"},"title":"Model Behavior 
Architect","description":"<p>We&#39;re looking for a Model Behavior Architect to help build Perplexity&#39;s AI products and evaluations. You&#39;ll sit within our AI team and collaborate closely with research and product teams, designing prompt and context engineering strategies to deliver high quality user experiences across multiple domains and models.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<ul>\n<li>Context Engineering: Design, test, and optimize context strategies and system prompts that shape answer engine behavior across products, features, and use cases.</li>\n<li>Evaluation Systems: Build automated and semi-automated evaluation pipelines that measure model quality, catch regressions, and scale across product surfaces.</li>\n<li>Model Launch Support: Partner with research and engineering to validate model behavior before and during rollouts, ensuring smooth transitions with no degradation.</li>\n<li>Research &amp; Analysis: Identify inconsistencies and failure modes in model outputs through well-designed research projects — for both internal and production-facing systems.</li>\n<li>Cross-functional Collaboration: Work closely with design, product, and research teams to translate product goals into concrete model behavior requirements.</li>\n<li>Knowledge Sharing: Help engineers across teams build intuition for prompt design, context engineering, and evaluation best practices.</li>\n<li>Staying Current: Track the latest alignment, evaluation, and prompting techniques from industry and academia, and bring the best ideas back to the team.</li>\n</ul>\n<p><strong>What you need</strong></p>\n<ul>\n<li>Experience designing evaluations, benchmarks, or metrics for AI systems.</li>\n<li>Strong written and verbal communication skills, particularly in explaining complex concepts to diverse stakeholders.</li>\n<li>Ability to manage multiple concurrent projects in a fast-moving environment.</li>\n<li>Strong experience with Perplexity or other frontier AI models in production settings.</li>\n<li>Demonstrated experience with Python — you&#39;ll prototype, debug, automate, and build systems at scale.</li>\n<li>3+ years of experience working with LLMs in a product or research setting.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_dbcceacb-d90","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Perplexity","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/perplexity.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/perplexity/9904db61-b8ca-4207-8f93-88ab6f0cd3fd","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$180K - $270K","x-skills-required":["experience designing evaluations, benchmarks, or metrics for AI systems","strong written and verbal communication skills","ability to manage multiple concurrent projects","strong experience with Perplexity or other frontier AI models","demonstrated experience with Python","3+ years of experience working with LLMs"],"x-skills-preferred":["experience with A/B testing or experimentation frameworks","track record of improving AI system performance through systematic evaluation and iteration"],"datePosted":"2026-03-04T12:25:33.674Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San 
Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"experience designing evaluations, benchmarks, or metrics for AI systems, strong written and verbal communication skills, ability to manage multiple concurrent projects, strong experience with Perplexity or other frontier AI models, demonstrated experience with Python, 3+ years of experience working with LLMs, experience with A/B testing or experimentation frameworks, track record of improving AI system performance through systematic evaluation and iteration","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":270000,"unitText":"YEAR"}}}]}