{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/offline-evaluation"},"x-facet":{"type":"skill","slug":"offline-evaluation","display":"Offline Evaluation","count":2},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5c7e3c9c-ece"},"title":"AI Product Engineer - Agentic AI Platforms (Financial Services)","description":"<p>Capgemini is at the forefront of Generative AI innovation, helping Financial Services clients industrialize GenAI and Agentic AI platforms at enterprise scale.</p>\n<p>We are seeking an experienced and innovative AI Product Engineer – Agentic Platforms to join our Financial Services Artificial Intelligence &amp; Business Lines (FS-ABL) practice. This role is ideal for a consulting technologist with deep expertise in modern GenAI tooling, agentic system design, and enterprise SDLC, who can partner directly with clients to envision, design, develop, and deploy Agentic AI platforms in regulated environments.</p>\n<p>In this role, you will work at the intersection of client advisory, AI product engineering, and delivery execution, helping banks, insurers, and capital markets firms transition from GenAI pilots to production-grade, governed, multi-agent systems. 
You will apply leading GenAI frameworks and LLM platforms, including Anthropic, OpenAI, LangChain, LangGraph, DSPy, and vector databases, while operating across the full Agentic SDLC.</p>\n<p>P&amp;C Insurance knowledge and experience is a significant plus. Additionally, familiarity with core insurance platforms like Guidewire, Duck Creek, or Majesco will be extremely helpful to succeed in this role.</p>\n<p>We are looking for candidates across all levels of experience and expertise - junior through senior level AI Product Engineers.</p>\n<p><strong>Responsibilities</strong></p>\n<p>Client Advisory &amp; Product Vision</p>\n<p>Partner directly with Financial Services clients to identify, prioritize, and shape Agentic AI use cases across customer operations, underwriting, claims, risk, compliance, finance, and technology.</p>\n<p>Lead client workshops to define agent personas, responsibilities, autonomy boundaries, human-in-the-loop checkpoints, and escalation logic.</p>\n<p>Translate evolving business needs into agentic product backlogs, roadmaps, and MVP definitions.</p>\n<p>Support executive conversations around GenAI platform strategy, operating models, vendor selection, and scale-out approaches.</p>\n<p>Agentic Platform &amp; Architecture Design</p>\n<p>Design and implement multi-agent architectures using modern GenAI tooling, including:</p>\n<p>Planner, executor, reviewer/critic, and supervisor agents</p>\n<p>Tool-calling and function-calling agents</p>\n<p>Memory-enabled agents (conversation, semantic, episodic, and structured memory)</p>\n<p>Leverage LangChain and LangGraph for agent orchestration, workflows, and control flow.</p>\n<p>Apply DSPy and declarative prompt optimization techniques for repeatability, performance tuning, and regression control.</p>\n<p>Design agent interaction patterns such as hierarchical agents, collaborating agents, and event-driven agent workflows.</p>\n<p>Define standardized agent contracts, interfaces, and schemas to enable 
reuse and scale.</p>\n<p>Agentic SDLC &amp; Engineering Delivery</p>\n<p>Own delivery across the full Software Development Lifecycle (SDLC), extending it into a formal Agentic SDLC, including:</p>\n<p>Agent design specifications and behavior contracts</p>\n<p>Prompt, policy, and tool versioning</p>\n<p>Simulation environments and offline evaluation</p>\n<p>Automated testing of agent flows and guardrails</p>\n<p>Controlled rollout, telemetry-driven optimization, and continuous learning</p>\n<p>Build production-grade AI services primarily using Python, integrating:</p>\n<p>LLM providers such as Anthropic (Claude), OpenAI, and open-source models</p>\n<p>Retrieval-Augmented Generation (RAG) using vector databases (e.g., Pinecone, FAISS, Milvus, Weaviate)</p>\n<p>Implement CI/CD pipelines for agent code, prompts, and policies.</p>\n<p>Integrate GenAI agents with client systems via APIs, workflow engines, event streams, and data platforms.</p>\n<p>Observability, Evaluation &amp; Optimization</p>\n<p>Implement agent observability including tracing, decision logging, tool usage, and failure analysis.</p>\n<p>Apply evaluation frameworks for hallucination detection, consistency checks, and fitness scoring.</p>\n<p>Design feedback loops incorporating human-in-the-loop review and reinforcement.</p>\n<p>Monitor cost, latency, throughput, and behavioral drift across deployed agents.</p>\n<p>Governance, Risk &amp; Financial Services Compliance</p>\n<p>Design Agentic AI platforms aligned with Financial Services regulatory expectations, including:</p>\n<p>Auditability and traceability of agent decisions</p>\n<p>Model and prompt explainability</p>\n<p>Data privacy and security controls</p>\n<p>Resilience and fail-safe mechanisms</p>\n<p>Embed guardrails and policies addressing hallucination risk, bias, unauthorized actions, and escalation failures.</p>\n<p>Produce documentation supporting risk, compliance, internal audit, and regulator engagement.</p>\n<p><strong>Team Leadership 
&amp; Firm Contribution</strong></p>\n<p>Provide technical leadership and mentorship to consulting delivery teams.</p>\n<p>Contribute to internal GenAI accelerators, agent frameworks, and reusable assets.</p>\n<p>Support RFPs, proposals, and client solution designs with credible GenAI and agentic architectures.</p>\n<p>Participate in thought leadership on Agentic SDLC, GenAI engineering, and responsible autonomy.</p>\n<p><strong>Benefits</strong></p>\n<p>This position comes with a competitive compensation and benefits package:</p>\n<ol>\n<li>Competitive salary and performance-based bonuses</li>\n</ol>\n<ol>\n<li>Comprehensive benefits package</li>\n</ol>\n<ol>\n<li>Career development and training opportunities</li>\n</ol>\n<ol>\n<li>Flexible work arrangements (remote and/or office-based)</li>\n</ol>\n<ol>\n<li>Dynamic and inclusive work culture within a globally known group</li>\n</ol>\n<ol>\n<li>Private Health Insurance</li>\n</ol>\n<ol>\n<li>Retirement Benefits</li>\n</ol>\n<ol>\n<li>Paid Time Off</li>\n</ol>\n<ol>\n<li>Training &amp; Development</li>\n</ol>\n<ol>\n<li>Note: Benefits differ based on employee level</li>\n</ol>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_5c7e3c9c-ece","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Capgemini","sameAs":"https://www.capgemini.com/","logo":"https://logos.yubhub.co/capgemini.com.png"},"x-apply-url":"https://jobs.workable.com/view/dX77bfYLcJf1VCF2yXNUEe/hybrid-ai-product-engineer---agentic-ai-platforms-(financial-services)-in-mexico-city-at-capgemini","x-work-arrangement":"hybrid","x-experience-level":null,"x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Python","GenAI","LLM","LangChain","LangGraph","DSPy","Vector Databases","APIs","Workflow Engines","Event Streams","Data Platforms","Agentic SDLC","Agent Design","Behavior Contracts","Prompt Policy","Tool 
Versioning","Simulation Environments","Offline Evaluation","Automated Testing","Controlled Rollout","Telemetry-Driven Optimization","Continuous Learning","Production-Grade AI Services","Retrieval-Augmented Generation","Human-in-the-Loop Review","Reinforcement","Cost Latency Throughput","Behavioral Drift","Auditability","Traceability","Model Explainability","Data Privacy","Security Controls","Resilience","Fail-Safe Mechanisms","Guardrails","Policies","Risk Compliance","Internal Audit","Regulator Engagement"],"x-skills-preferred":[],"datePosted":"2026-04-24T14:19:00.539Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mexico City"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, GenAI, LLM, LangChain, LangGraph, DSPy, Vector Databases, APIs, Workflow Engines, Event Streams, Data Platforms, Agentic SDLC, Agent Design, Behavior Contracts, Prompt Policy, Tool Versioning, Simulation Environments, Offline Evaluation, Automated Testing, Controlled Rollout, Telemetry-Driven Optimization, Continuous Learning, Production-Grade AI Services, Retrieval-Augmented Generation, Human-in-the-Loop Review, Reinforcement, Cost Latency Throughput, Behavioral Drift, Auditability, Traceability, Model Explainability, Data Privacy, Security Controls, Resilience, Fail-Safe Mechanisms, Guardrails, Policies, Risk Compliance, Internal Audit, Regulator Engagement"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_e3b1c38b-ef1"},"title":"Staff Software Engineer, Communication Products","description":"<p>Job Title: Staff Software Engineer, Communication Products</p>\n<p>We are seeking a highly skilled and experienced Staff Software Engineer to join our Communication Products team. 
As a Staff Engineer, you will be responsible for leading the technical vision for ML-powered messaging features, architecting and delivering intelligent capabilities end-to-end, and partnering deeply with ML and product teams.</p>\n<p>The Difference You Will Make:</p>\n<p>As a Staff Engineer on the team, you will define and drive the technical strategy for integrating ML capabilities into Airbnb&#39;s messaging products, including smart replies, message classification, content moderation, translation, and conversational assistance. You will also own the full lifecycle of ML-powered features: from prototyping and experimentation through launch, monitoring, and iteration.</p>\n<p>A Typical Day:</p>\n<ul>\n<li>Design, build, and operate the systems that serve ML models within the messaging stack, with a focus on latency, reliability, and scalability</li>\n<li>Write and review technical designs that solve large, open-ended problems at the intersection of ML and product engineering without clearly-known solutions</li>\n<li>Partner with ML, data science, and product teams to identify high-value opportunities, establish evaluation criteria, and close the gap between offline model performance and production impact</li>\n<li>Collaborate with other engineers and cross-functional partners across Messaging, Trust &amp; Safety, Localization, and Platform organizations to align on long-term technical solutions</li>\n<li>Mentor, guide, advocate, and support the career growth of individual contributors</li>\n<li>Establish engineering standards for ML integration across the messaging surface, including feature flagging, A/B testing, observability, and graceful degradation</li>\n</ul>\n<p>Your Expertise:</p>\n<ul>\n<li>9+ years of relevant hands-on engineering experience</li>\n<li>Bachelor&#39;s, Master&#39;s, or PhD in CS or related field</li>\n<li>Demonstrated experience building and shipping ML-powered product features in production environments, including model serving, feature 
pipelines, online/offline evaluation, and monitoring</li>\n<li>Exceptional architecture abilities and experience with architectural patterns of large, high-scale applications</li>\n<li>Familiarity with NLP/NLU techniques and large language models, particularly as applied to messaging, conversational AI, or content understanding</li>\n<li>Shipped several large-scale projects with multiple dependencies across teams, specifically at the intersection of ML infrastructure and product engineering</li>\n<li>Technical leadership and strong communication skills with the ability to translate between ML research, product goals, and engineering execution</li>\n<li>Experience operating distributed, real-time systems at scale with high reliability requirements</li>\n<li>Experience with real-time messaging systems or event-driven architectures</li>\n<li>Familiarity with ML infrastructure at scale (e.g., feature stores, model registries, online inference platforms)</li>\n<li>Prior work on trust &amp; safety, content moderation, or internationalization in a messaging context</li>\n<li>Experience with LLM-based product features, including prompt engineering, retrieval-augmented generation, or fine-tuning</li>\n</ul>\n<p>How We&#39;ll Take Care of You:</p>\n<p>Our job titles may span more than one career level. The actual base pay is dependent upon many factors, such as: training, transferable skills, work experience, business needs and market demands. The base pay range is subject to change and may be modified in the future. 
This role may also be eligible for bonus, equity, benefits, and Employee Travel Credits.</p>\n<p>Pay Range: $204,000-$255,000 USD</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_e3b1c38b-ef1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Airbnb","sameAs":"https://www.airbnb.com/","logo":"https://logos.yubhub.co/airbnb.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/airbnb/jobs/7655958","x-work-arrangement":"remote","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$204,000-$255,000 USD","x-skills-required":["ML-powered product features","model serving","feature pipelines","online/offline evaluation","monitoring","architectural patterns","NLP/NLU techniques","large language models","messaging","conversational AI","content understanding","distributed, real-time systems","real-time messaging systems","event-driven architectures","ML infrastructure","feature stores","model registries","online inference platforms","trust & safety","content moderation","internationalization","LLM-based product features","prompt engineering","retrieval-augmented generation","fine-tuning"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:49:16.839Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote - USA"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"ML-powered product features, model serving, feature pipelines, online/offline evaluation, monitoring, architectural patterns, NLP/NLU techniques, large language models, messaging, conversational AI, content understanding, distributed, real-time systems, real-time messaging systems, event-driven architectures, ML infrastructure, feature stores, model registries, online inference platforms, trust & safety, content moderation, 
internationalization, LLM-based product features, prompt engineering, retrieval-augmented generation, fine-tuning","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":204000,"maxValue":255000,"unitText":"YEAR"}}}]}