{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/evals"},"x-facet":{"type":"skill","slug":"evals","display":"Evals","count":20},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_8a3caae4-044"},"title":"Member of Technical Staff - Imagine Model","description":"<p>As a Member of Technical Staff on the Imagine Model Team, you will develop cutting-edge AI experiences beyond text, with a strong focus on enabling high-fidelity understanding and generation across image and video modalities. Responsibilities span data curation, modeling, training, inference serving, and product integration, covering both pretraining and post-training phases. 
You will collaborate closely with product teams to push model frontiers and deliver exceptional end-to-end user experiences.</p>\n<p>Key responsibilities include creating and driving engineering agendas to advance multimodal capabilities, improving data quality through annotation, filtering, augmentation, synthetic generation, captioning, and in-depth data studies, designing evaluation frameworks, metrics, benchmarks, evals, and reward models tailored to image/video/audio quality and coherence, implementing efficient algorithms for state-of-the-art model performance, and developing scalable data collection and processing pipelines for multimodal (primarily image/video-focused) datasets.</p>\n<p>The ideal candidate will have a track record in leading studies that significantly improve neural network capabilities and performance through better data or modeling, experience in data-driven experiment designs, systematic analysis, and iterative model debugging, experience developing or working with large-scale distributed machine learning systems, and ability to deliver optimal end-to-end user experiences.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_8a3caae4-044","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/5051985007","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000 - $440,000 USD","x-skills-required":["data curation","modeling","training","inference serving","product integration","large-scale distributed machine learning systems"],"x-skills-preferred":["SFT","RL","evals","human/synthetic data collection","agentic 
systems","Python","JAX/XLA","PyTorch","Rust/C++","Spark","Ray"],"datePosted":"2026-04-18T15:58:43.641Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo Alto, CA; Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"data curation, modeling, training, inference serving, product integration, large-scale distributed machine learning systems, SFT, RL, evals, human/synthetic data collection, agentic systems, Python, JAX/XLA, PyTorch, Rust/C++, Spark, Ray","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":440000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_8a1df8fb-ff4"},"title":"Principal Engineer, Fin AI Agent","description":"<p>We&#39;re looking for a Principal Engineer to join our AI Group in Berlin. As a Principal Engineer, you will be responsible for leading the development of our Fin AI agent, which is the #1 AI agent for customer service. 
You will partner at the strategic pillar level, having broad context across work streams and using that to inform technical strategy and investment priorities.</p>\n<p>Your responsibilities will include:</p>\n<ul>\n<li>Partnering at the strategic pillar level to inform technical strategy and investment priorities</li>\n<li>Spinning up 0-to-1 work streams, bringing together engineers who&#39;ve never worked as a team, disambiguating the problem space, building momentum under aggressive timelines, setting high expectations, and driving execution</li>\n<li>Executing on the most ambiguous, highest-stakes problems, writing code, shipping features, and being deep in the weeds</li>\n<li>Leading experimental work at the AI frontier, running your own A/B tests, doing prompt engineering, building evals, and calibrating accuracy, cost, and latency for LLM-powered features</li>\n<li>Shaping long-term technical strategy through execution, building and thinking about what needs to change about how we build products – data models, system design, the shift from GUI-first to agent-first interfaces</li>\n<li>Working across the full stack in an AI-first development environment, pushing the boundaries of what&#39;s possible with AI-assisted development and helping shape how the entire engineering org works</li>\n<li>Raising the bar for the people around you, giving direct, actionable feedback that changes outcomes</li>\n</ul>\n<p>We&#39;re looking for someone with:</p>\n<ul>\n<li>Engineering depth and product thinking, combining deep engineering ability with strong product and design instincts</li>\n<li>Experience operating at real scale and having builder energy, with a bias toward building over discussing</li>\n<li>AI fluency, actively experimenting with AI-assisted development and pushing the boundaries of what&#39;s possible</li>\n<li>Deep technical depth with breadth, navigating complex multi-team systems with ease</li>\n<li>Communication as a superpower, explaining to leadership 
why a technical investment matters, aligning multiple teams around a complex project, and walking an engineer through the gnarly implementation details</li>\n<li>Extreme autonomy, partnering with the Engineering Director on where you think the pillar needs to go next</li>\n<li>Critical thinking about the business, understanding what Intercom is optimizing for and translating that into technical decisions</li>\n<li>At least 10+ years of experience, with significant time as a technical leader driving complex projects across multiple teams and stakeholders</li>\n<li>Stack agnostic, with experience working with Ruby on Rails, React, and AWS, and being fluent with AI-assisted development tools like Claude Code</li>\n</ul>\n<p>If you&#39;re looking for a challenging role that will push you to grow and develop as an engineer, and you&#39;re passionate about AI and customer service, we&#39;d love to hear from you!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_8a1df8fb-ff4","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Intercom","sameAs":"https://www.intercom.com/","logo":"https://logos.yubhub.co/intercom.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/intercom/jobs/7725837","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Ruby on Rails","React","AWS","AI-assisted development","Claude Code","LLM-powered features","A/B testing","Prompt engineering","Evals","Accuracy","Cost","Latency"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:58:31.198Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Berlin, Germany"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Ruby on Rails, React, AWS, AI-assisted development, Claude Code, LLM-powered features, A/B 
testing, Prompt engineering, Evals, Accuracy, Cost, Latency"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ca221b6f-dca"},"title":"Technical Program Manager, Safeguards (Infrastructure & Evals)","description":"<p><strong>About the Role</strong></p>\n<p>Safeguards Engineering builds and operates the infrastructure that keeps Anthropic&#39;s AI systems safe in production. As a Technical Program Manager for Safeguards Infrastructure and Evals, you&#39;ll own the operational health and forward momentum of this stack.</p>\n<p>Your primary responsibility is driving reliability: owning the incident-response and post-mortem process, ensuring SLOs are defined and met in partnership with various teams, and making sure that when things go wrong, the right people know, the right actions get taken, and those actions actually get closed out.</p>\n<p>Alongside that ongoing operational rhythm, you&#39;ll coordinate the larger platform investments: migrations, eval-platform improvements, and the cross-team dependencies that connect them.</p>\n<p>This role sits at the intersection of operations and program management. 
It requires genuine technical depth: you need to understand how these systems work well enough to triage effectively, judge what&#39;s actually safety-critical versus what can wait, and have informed conversations with the engineers building and maintaining them.</p>\n<p>But the core of the job is keeping the machine running well and the work moving.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Own the Safeguards Engineering ops review</li>\n<li>Drive the recurring cadence that keeps the team informed and coordinated: surfacing recent incidents and failures, bringing visibility to reliability trends, and making sure the right people are in the room when decisions need to be made.</li>\n<li>Drive incident tracking and post-mortem execution</li>\n<li>Establish and maintain SLOs with partner teams</li>\n<li>Maintain runbook quality and incident-ownership clarity</li>\n<li>Drive platform migrations and infrastructure projects</li>\n<li>Coordinate evals platform improvements</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Solid technical program management experience, particularly in operational or infrastructure-heavy environments</li>\n<li>Understanding of how production ML systems work well enough to triage incidents intelligently and have substantive conversations with engineers about what&#39;s going wrong and why</li>\n<li>Ability to work effectively across team boundaries</li>\n<li>Experience with or strong interest in AI safety</li>\n</ul>\n<p><strong>Nice to Have</strong></p>\n<ul>\n<li>Experience with SRE practices, incident management frameworks, or on-call operations at scale</li>\n<li>Familiarity with monitoring and alerting tooling (PagerDuty, Datadog, or equivalents)</li>\n<li>Experience driving infrastructure migrations in complex, multi-team environments</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ca221b6f-dca","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://anthropic.ai/","logo":"https://logos.yubhub.co/anthropic.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5108695008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$290,000-$365,000 USD","x-skills-required":["Technical Program Management","Operational or Infrastructure-heavy Environments","Production ML Systems","Incident Tracking and Post-Mortem Execution","Service-Level Objectives (SLOs)","Runbook Quality and Incident-Ownership Clarity","Platform Migrations and Infrastructure Projects","Evals Platform Improvements"],"x-skills-preferred":["SRE Practices","Incident Management Frameworks","On-Call Operations at Scale","Monitoring and Alerting Tooling","Infrastructure Migrations in Complex, Multi-Team Environments"],"datePosted":"2026-04-18T15:55:20.655Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Technical Program Management, Operational or Infrastructure-heavy Environments, Production ML Systems, Incident Tracking and Post-Mortem Execution, Service-Level Objectives (SLOs), Runbook Quality and Incident-Ownership Clarity, Platform Migrations and Infrastructure Projects, Evals Platform Improvements, SRE Practices, Incident Management Frameworks, On-Call Operations at Scale, Monitoring and Alerting Tooling, Infrastructure Migrations in Complex, Multi-Team 
Environments","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":290000,"maxValue":365000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d08d38d2-b72"},"title":"Engineering Manager, Agent Prompts & Evals","description":"<p><strong>About the Role</strong></p>\n<p>Anthropic is looking for an Engineering Manager to lead the Agent Prompts &amp; Evals team. This team owns the infrastructure that lets Anthropic ship model and prompt changes with confidence: the eval frameworks, system prompt pipelines, and regression-detection systems that every model launch depends on.</p>\n<p>When a new Claude model is ready to ship, this team is the one answering “is it actually better in our products?” When a product team wants to change how Claude behaves, this team owns the tooling that tells them whether they broke something. It’s a platform team whose platform is model behavior itself.</p>\n<p>The team sits deliberately at the seam between product engineering and research. You’ll partner closely with other evals groups across the company on shared infrastructure and methodology, with product teams who are shipping features on top of Claude, and with the TPMs and research PMs driving model launches. 
The pace is set by the model release cadence, and the team operates as both a platform owner and a hands-on partner during launch periods.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Lead and grow a team of prompt engineers and platform software engineers</li>\n<li>Own the product-side eval platform: the frameworks, dashboards, bulk runners, and CI integrations that product teams use to measure Claude’s behavior and catch regressions before they ship</li>\n<li>Own system prompt infrastructure: versioning, deployment, rollback, and review tooling for the prompts that run in production across claude.ai, the API, and agentic surfaces</li>\n<li>Be a steady hand through model launches: these are the team’s highest-stakes operational moments and the EM is the backstop when things get chaotic</li>\n<li>Build durable collaboration with other evals groups across the company; this means real work on ownership boundaries, shared roadmaps, and avoiding tragedy-of-the-commons on shared eval infrastructure</li>\n<li>Recruit, close, and retain engineers who want to work at the intersection of product engineering and model behavior</li>\n<li>Shape where the team invests next: there are credible paths into frontier eval development, model launch automation, and deeper prompt engineering support, and part of the job is sequencing them</li>\n<li>Push the team toward measuring things that are hard to measure (behavioral drift, prompt quality, harness parity), not just things that are easy</li>\n</ul>\n<p><strong>You May Be a Good Fit If You Have</strong></p>\n<ul>\n<li>8+ years in software engineering with 3+ years managing engineering teams, including experience leading a platform, infra, or developer-tooling team where your customers were other engineers</li>\n<li>A track record of building “pits of success”: tooling and process that made it easy for other teams to do the right thing without needing to understand all the details</li>\n<li>Comfort managing a team with 
a mixed charter: platform ownership, service-to-other-teams, and a launch-driven operational rhythm, all at once</li>\n<li>Enough technical depth to engage on system design, review pipeline architecture, and be credible in debates with strong ICs; you don’t need to be writing code by hand every day, but you should be able to read it, review it, and be comfortable leveraging Claude to understand, design, and occasionally build.</li>\n<li>A product mindset and willingness to wear multiple hats when the work calls for it</li>\n<li>Demonstrated ability to build and maintain peer relationships with partner orgs that have different cultures and incentives: negotiating ownership, aligning roadmaps, and holding ground when it matters without being territorial about it</li>\n<li>Experience recruiting and closing senior ICs in a competitive market</li>\n</ul>\n<p><strong>Strong Candidates May Also Have</strong></p>\n<ul>\n<li>Prior exposure to LLM evals, ML experimentation platforms, or model quality work, even tangentially</li>\n<li>Experience with A/B testing infrastructure, feature flagging, or gradual rollout systems</li>\n<li>Background in devtools, CI/CD platforms, or testing infrastructure at scale</li>\n<li>A history of managing teams that sit between two larger orgs and making that position an asset rather than a liability</li>\n<li>Interest in AI safety and alignment (not required, but it makes the “why” of the work land harder)</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<ul>\n<li>Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience</li>\n<li>Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience</li>\n<li>Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position</li>\n<li>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices 
at least 25% of the time. However, some roles may require more time in our offices.</li>\n<li>Visa sponsorship: We do sponsor visas! However, we aren’t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p><strong>How we’re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact (advancing our long-term goals of steerable, trustworthy AI) rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We’re an extremely collaborative group, and we host frequent research discussions</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d08d38d2-b72","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5159608008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$405,000 USD","x-skills-required":["software engineering","team management","platform ownership","service-to-other-teams","launch-driven operational rhythm","system design","pipeline architecture","product mindset","recruiting and closing senior ICs"],"x-skills-preferred":["LLM evals","ML experimentation platforms","model quality work","A/B testing infrastructure","feature flagging","gradual rollout systems","devtools","CI/CD platforms","testing infrastructure at 
scale"],"datePosted":"2026-04-18T15:54:35.018Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, team management, platform ownership, service-to-other-teams, launch-driven operational rhythm, system design, pipeline architecture, product mindset, recruiting and closing senior ICs, LLM evals, ML experimentation platforms, model quality work, A/B testing infrastructure, feature flagging, gradual rollout systems, devtools, CI/CD platforms, testing infrastructure at scale","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b372d3eb-ee1"},"title":"Staff Research Engineer, Applied AI","description":"<p>We are seeking a Staff Research Engineer, Applied AI to lead the development and deployment of novel applications, leveraging Google&#39;s generative AI models.</p>\n<p>This role focuses on rapidly developing new features, and working across partner teams to deliver solutions, and maximize impact for Google and top customers.</p>\n<p>You will be instrumental in translating cutting-edge AI research into real-world products, and demonstrating the capabilities of latest-generation models.</p>\n<p>We are looking for engineers with a strong track record of building and shipping AI-powered software, ideally with experience in early-stage environments where they have contributed to scaling products from initial concept to production.</p>\n<p>The ideal candidate will be motivated by the opportunity to drive product &amp; business impact.</p>\n<p>Key responsibilities:</p>\n<ul>\n<li>Harness frontier models to drive real-world high-impact 
outcomes</li>\n</ul>\n<ul>\n<li>Build evaluations, training data, and infrastructure to support AI deployments and rapid iterations</li>\n</ul>\n<ul>\n<li>Collaborate with researchers and product managers to translate research advancements into tangible product features.</li>\n</ul>\n<ul>\n<li>Contribute to the development of best practices for building and deploying generative AI applications.</li>\n</ul>\n<ul>\n<li>Contribute signal to influence the development of frontier models</li>\n</ul>\n<ul>\n<li>Lead the architecture and development of new products &amp; features from 0 to 1.</li>\n</ul>\n<p>About you:</p>\n<p>In order to set you up for success as a Staff Research Engineer, Applied AI at Google DeepMind, we look for the following skills and experience:</p>\n<p>Required Skills:</p>\n<ul>\n<li>Bachelor&#39;s degree or equivalent practical experience.</li>\n</ul>\n<ul>\n<li>8 years of experience in software development, and with data structures/algorithms.</li>\n</ul>\n<ul>\n<li>5 years of hands-on experience in AI research (e.g. 
RL, finetuning, evals), AI applications, or model deployment</li>\n</ul>\n<ul>\n<li>Proven experience in rapidly developing and shipping software products.</li>\n</ul>\n<ul>\n<li>Deep understanding of software development best practices, including testing &amp; deployment.</li>\n</ul>\n<ul>\n<li>Experience with cloud computing platforms and infrastructure (e.g., Google Cloud Platform, AWS, Azure).</li>\n</ul>\n<ul>\n<li>Substantial experience with machine learning frameworks and libraries such as TensorFlow, PyTorch, Hugging Face, etc.</li>\n</ul>\n<ul>\n<li>Ability to work in a fast-paced environment and adapt to changing priorities.</li>\n</ul>\n<p>Preferred Skills:</p>\n<ul>\n<li>Experience with generative AI research or applications.</li>\n</ul>\n<ul>\n<li>Contributions to open-source projects.</li>\n</ul>\n<ul>\n<li>Experience working in, or founding early stage startups.</li>\n</ul>\n<ul>\n<li>Experience delivering software solutions in a fast-paced, customer-facing environment.</li>\n</ul>\n<p>If you are a passionate machine learning engineer with a drive to build innovative products and a desire to work at the forefront of AI, we encourage you to apply!</p>\n<p>The US base salary range for this full-time position is between $197,000 - $291,000 + bonus + equity + benefits.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b372d3eb-ee1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Google DeepMind","sameAs":"https://deepmind.com/","logo":"https://logos.yubhub.co/deepmind.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/deepmind/jobs/7561938","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$197,000 - $291,000 + bonus + equity + benefits","x-skills-required":["Bachelor's degree or equivalent practical experience","8 years of experience in software 
development, and with data structures/algorithms","5 years of hands-on experience in AI research (e.g. RL, finetuning, evals), AI applications, or model deployment","Proven experience in rapidly developing and shipping software products","Deep understanding of software development best practices, including testing & deployment","Experience with cloud computing platforms and infrastructure (e.g., Google Cloud Platform, AWS, Azure)","Substantial experience with machine learning frameworks and libraries such as TensorFlow, PyTorch, Hugging Face, etc","Ability to work in a fast-paced environment and adapt to changing priorities"],"x-skills-preferred":["Experience with generative AI research or applications","Contributions to open-source projects","Experience working in, or founding early stage startups","Experience delivering software solutions in a fast-paced, customer-facing environment"],"datePosted":"2026-04-18T15:54:04.942Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View, California, US"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Bachelor's degree or equivalent practical experience, 8 years of experience in software development, and with data structures/algorithms, 5 years of hands-on experience in AI research (e.g. 
RL, finetuning, evals), AI applications, or model deployment, Proven experience in rapidly developing and shipping software products, Deep understanding of software development best practices, including testing & deployment, Experience with cloud computing platforms and infrastructure (e.g., Google Cloud Platform, AWS, Azure), Substantial experience with machine learning frameworks and libraries such as TensorFlow, PyTorch, Hugging Face, etc, Ability to work in a fast-paced environment and adapt to changing priorities, Experience with generative AI research or applications, Contributions to open-source projects, Experience working in, or founding early stage startups, Experience delivering software solutions in a fast-paced, customer-facing environment","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":197000,"maxValue":291000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_4249dbdd-13b"},"title":"Product Operations Manager, Feedback Loops","description":"<p>We&#39;re hiring a Product Operations Manager, Feedback Loops to own and continuously improve how customer signal flows into product and research decisions at Anthropic. This is a horizontal, org-wide role: you won&#39;t be embedded in a single product team, you&#39;ll build the shared operating system for voice of the customer that every product team, every surface, and every GTM motion plugs into.</p>\n<p>Feedback at Anthropic is uniquely high-leverage. We&#39;re building on frontier models that evolve constantly, serving customers from individual developers to the largest enterprises, across multiple surfaces (API, claude.ai, Claude Code). 
Customer signal arrives from everywhere (field conversations, support interactions, early access programs, in-product telemetry), and the opportunity is to make that signal a first-class, structured input to every product and research decision.</p>\n<p>This role will build the system that makes customer voice as easy to act on as any other data source. You treat feedback loops as a product. You&#39;re obsessed with making it effortless for the field to share what they&#39;re hearing and for product teams to know what matters most. You build AI-enabled systems that do the first pass so humans can focus on judgment, not triage. You think like a product manager, not a process administrator.</p>\n<p>Your work will directly impact how fast Anthropic learns from its customers and how reliably that learning shapes what we build next.</p>\n<p><strong>Key Responsibilities</strong></p>\n<p>You&#39;ll own the operating system for customer feedback across all of Anthropic: one shared platform, not a collection of per-team processes. Working horizontally across every Product team, Research PM, GTM, Customer Success, and Support, you&#39;ll establish the intake, synthesis, and routing infrastructure that makes voice of the customer a first-class input to every roadmap.</p>\n<p>You&#39;ll drive adoption through influence, making it so obviously useful that teams pull from it rather than get pushed to it.</p>\n<p><strong>Feedback Intake &amp; System of Record</strong></p>\n<p>Own the single, org-wide pipeline that captures customer feedback from every channel (field teams, support, early access programs, in-product signals) into one structured system of record that serves every product surface.</p>\n<p>Build intake workflows that meet teams where they already work (Slack, Gong, CRM) without creating a documentation tax. 
Obsess over the submitter experience so that sharing feedback is faster than not sharing it.</p>\n<p><strong>AI-Enabled Synthesis &amp; Triage</strong></p>\n<p>Build Claude-powered pipelines that enrich, tag, cluster, and summarize unstructured feedback into trackable issues, doing the first-pass work so humans focus on verification and judgment.</p>\n<p>Design the human-in-the-loop model: Claude proposes, PMs and field teams correct, and the system learns from those corrections over time.</p>\n<p>Partner with Engineering and Research on tooling strategy, evals, and the closed-loop data that makes synthesis quality measurably improve.</p>\n<p><strong>Routing &amp; Closing the Loop</strong></p>\n<p>Establish clear routing so the right feedback reaches the right product or research owner at the right time, including the path from product signal back into model training priorities.</p>\n<p>Build the visibility layer that gives GTM and Support a clear line of sight from customer input to roadmap outcome, so they can close the loop with customers confidently and in real time.</p>\n<p><strong>Voice of the Customer Programs</strong></p>\n<p>Partner deeply with GTM, Customer Success, and Sales to design and run structured voice of the customer programs (customer advisory boards, early access programs, design partner cohorts) that generate high-signal feedback by design.</p>\n<p>Define what &#39;high-signal&#39; means: feedback tied to specific use cases, blocker severity, revenue context, and customer segments so product teams can make confident tradeoffs.</p>\n<p><strong>Continuous Improvement</strong></p>\n<p>Define and track success metrics for feedback loop health (time-to-triage, signal quality, roadmap influence, field satisfaction) and use them to identify bottlenecks.</p>\n<p>Run regular retros with Product and GTM partners and feed learnings back into process and tooling improvements. 
Scale what works through documentation and enablement.</p>\n<p><strong>You may be a good fit if you:</strong></p>\n<p>Have 7+ years in product operations, customer insights, voice of the customer programs, or related roles in fast-paced tech companies.</p>\n<p>Have personally shipped AI-enabled processes and systems , you&#39;ve written the prompts, built the evals, and iterated on production LLM workflows yourself. You can talk about model behavior with specificity, not just direct others to build.</p>\n<p>Have owned a customer feedback program end-to-end , intake, synthesis, routing, and closing the loop , that product teams actually used to make decisions. The customer mix can be enterprise, PLG, design partner, or dev community; what matters is that you designed it and ran it.</p>\n<p>Have operated at earlier-stage and scaling companies (Series B-D or equivalent) where you built things that didn&#39;t exist yet, shipped v1s in weeks not quarters, and iterated in public.</p>\n<p>Have operated in horizontal, cross-org roles before , you know how to build shared infrastructure that many teams depend on, drive adoption through influence rather than mandate, and earn trust across functions that don&#39;t report to you.</p>\n<p>Are comfortable with ambiguity and can create structure where none exists , you&#39;ve built the v1 of a system and iterated it into something teams rely on.</p>\n<p>Are service-oriented and obsessed with making it easy for others to do great work.</p>\n<p><strong>Strong candidates may also have experience with:</strong></p>\n<p>Building AI-native workflows end-to-end , prompt design, evals, closed-loop improvement , and pushing the boundaries of what automation can own.</p>\n<p>Product Management, Customer Success Operations, or Research Operations.</p>\n<p>Feedback tooling ecosystems (Productboard, Dovetail, or homegrown equivalents) and the tradeoffs between buy vs. 
build.</p>\n<p>Treating process as a product with users, metrics, and continuous iteration.</p>\n<p>Track record of building and scaling operations programs from zero to one.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_4249dbdd-13b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.co/","logo":"https://logos.yubhub.co/anthropic.co.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5179882008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$260,000-$325,000 USD","x-skills-required":["Product Operations","Customer Insights","Voice of the Customer Programs","AI-Enabled Systems","Process Automation","Collaboration Tools","Data Analysis","Metrics Tracking","Continuous Improvement"],"x-skills-preferred":["LLM Workflows","Prompt Design","Evals","Closed-Loop Improvement","Product Management","Customer Success Operations","Research Operations","Feedback Tooling Ecosystems"],"datePosted":"2026-04-18T15:54:03.763Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Product Operations, Customer Insights, Voice of the Customer Programs, AI-Enabled Systems, Process Automation, Collaboration Tools, Data Analysis, Metrics Tracking, Continuous Improvement, LLM Workflows, Prompt Design, Evals, Closed-Loop Improvement, Product Management, Customer Success Operations, Research Operations, Feedback Tooling 
Ecosystems","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":260000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a9559937-5dc"},"title":"Engineering Manager, Agent","description":"<p>Job Title: Engineering Manager, Agent</p>\n<p>We are hiring an Engineering Manager to lead our Agent team, focused on building the world’s best Agentic Video Editor, Underlord. This is a rapidly changing landscape, and there are many unique technical challenges that come with building high-quality, cost-effective Agents for handling the multimodal nature of video.</p>\n<p>As an Engineering Manager, you will manage a team of 5-6 software engineers, report to the Head of AI Engineering, and partner closely with others across engineering, product, design, and AI research teams. Agent-driven editing is a major company focus and this role requires strong product thinking and strategic partnership skills as the team continues to develop novel, high-quality experiences around Agentic Video Editing.</p>\n<p>In 2026, your primary focus will be driving quality to move from beta to GA and delivering material impact for the business. You will:</p>\n<ul>\n<li>Own execution: Manage team priorities, drive alignment across stakeholders, adapt plans to maximize outcomes, and hold a high bar for delivery through active planning, prioritization, and follow-through</li>\n</ul>\n<ul>\n<li>Define team direction: Connect customer needs, product vision, and technical realities in close collaboration with Product, Design, and AI Research by setting clear priorities and making tradeoffs to deliver results</li>\n</ul>\n<ul>\n<li>Develop your team: Recruit, mentor, and grow engineers, fostering a culture of continuous learning and high performance. 
Create stretch opportunities for ICs, deliver clear and direct feedback, and manage underperformance early</li>\n</ul>\n<ul>\n<li>Drive operational excellence: Ensure execution is predictable, reliable, and sustainable, implementing best practices in project management and engineering processes</li>\n</ul>\n<ul>\n<li>Maintain the technical bar: Guide architectural decisions, ensure sound design tradeoffs, address tech debt that slows execution, and drive delivery of high-quality, reliable systems</li>\n</ul>\n<p>The base salary range for this role is $222,431-$261,684/year. Final offer amounts will carefully consider multiple factors, including prior experience, expertise, location, level, and may vary from the amount above.</p>\n<p>Benefits include a generous healthcare package, 401k matching program, catered lunches, and flexible vacation time. Our headquarters are located in the Mission District of San Francisco, CA. We&#39;re hiring for a mix of remote roles and hybrid roles. For those who are remote, we have a handful of opportunities throughout the year for in person collaboration. 
For our hybrid roles, we&#39;re flexible, and you&#39;re an adult; we don&#39;t expect or mandate that you&#39;re in the office every day.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a9559937-5dc","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Descript","sameAs":"https://descript.com/","logo":"https://logos.yubhub.co/descript.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/descript/jobs/7617845003","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$222,431-$261,684/year","x-skills-required":["software engineering","engineering management","product thinking","strategic partnership","team management","project management","engineering processes"],"x-skills-preferred":["agent development","evals","LLM-powered products","creative tools","video editing"],"datePosted":"2026-04-18T15:49:35.717Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | Remote"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, engineering management, product thinking, strategic partnership, team management, project management, engineering processes, agent development, evals, LLM-powered products, creative tools, video editing","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":222431,"maxValue":261684,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_47588e09-b9f"},"title":"Product Operations Manager, Feedback Loops","description":"<p>We&#39;re hiring a Product Operations Manager, Feedback Loops to own and continuously improve how customer signal flows into product and research 
decisions at Anthropic.</p>\n<p>This is a horizontal, org-wide role: you won&#39;t be embedded in a single product team; you&#39;ll build the shared operating system for voice of the customer that every product team, every surface, and every GTM motion plugs into.</p>\n<p>Feedback at Anthropic is uniquely high-leverage. We&#39;re building on frontier models that evolve constantly, serving customers from individual developers to the largest enterprises, across multiple surfaces (API, claude.ai, Claude Code).</p>\n<p>Customer signal arrives from everywhere (field conversations, support interactions, early access programs, in-product telemetry), and the opportunity is to make that signal a first-class, structured input to every product and research decision.</p>\n<p>You treat feedback loops as a product. You&#39;re obsessed with making it effortless for the field to share what they&#39;re hearing and for product teams to know what matters most.</p>\n<p>You build AI-enabled systems that do the first pass so humans can focus on judgment, not triage. 
You think like a product manager, not a process administrator.</p>\n<p>Your work will directly impact how fast Anthropic learns from its customers and how reliably that learning shapes what we build next.</p>\n<p><strong>Key Responsibilities</strong></p>\n<p>You&#39;ll own the operating system for customer feedback across all of Anthropic: one shared platform, not a collection of per-team processes.</p>\n<p>Working horizontally across every Product team, Research PM, GTM, Customer Success, and Support, you&#39;ll establish the intake, synthesis, and routing infrastructure that makes voice of the customer a first-class input to every roadmap.</p>\n<p>You&#39;ll drive adoption through influence, making it so obviously useful that teams pull from it rather than get pushed to it.</p>\n<p><strong>Feedback Intake &amp; System of Record</strong></p>\n<ul>\n<li>Own the single, org-wide pipeline that captures customer feedback from every channel (field teams, support, early access programs, in-product signals) into one structured system of record that serves every product surface.</li>\n</ul>\n<ul>\n<li>Build intake workflows that meet teams where they already work (Slack, Gong, CRM) without creating a documentation tax. 
Obsess over the submitter experience so that sharing feedback is faster than not sharing it.</li>\n</ul>\n<p><strong>AI-Enabled Synthesis &amp; Triage</strong></p>\n<ul>\n<li>Build Claude-powered pipelines that enrich, tag, cluster, and summarize unstructured feedback into trackable issues , doing the first-pass work so humans focus on verification and judgment.</li>\n</ul>\n<ul>\n<li>Design the human-in-the-loop model: Claude proposes, PMs and field teams correct, and the system learns from those corrections over time.</li>\n</ul>\n<ul>\n<li>Partner with Engineering and Research on tooling strategy, evals, and the closed-loop data that makes synthesis quality measurably improve.</li>\n</ul>\n<p><strong>Routing &amp; Closing the Loop</strong></p>\n<ul>\n<li>Establish clear routing so the right feedback reaches the right product or research owner at the right time , including the path from product signal back into model training priorities.</li>\n</ul>\n<ul>\n<li>Build the visibility layer that gives GTM and Support a clear line of sight from customer input to roadmap outcome, so they can close the loop with customers confidently and in real time.</li>\n</ul>\n<p><strong>Voice of the Customer Programs</strong></p>\n<ul>\n<li>Partner deeply with GTM, Customer Success, and Sales to design and run structured voice of the customer programs , customer advisory boards, early access programs, design partner cohorts , that generate high-signal feedback by design.</li>\n</ul>\n<ul>\n<li>Define what &#39;high-signal&#39; means: feedback tied to specific use cases, blocker severity, revenue context, and customer segments so product teams can make confident tradeoffs.</li>\n</ul>\n<p><strong>Continuous Improvement</strong></p>\n<ul>\n<li>Define and track success metrics for feedback loop health , time-to-triage, signal quality, roadmap influence, field satisfaction , and use them to identify bottlenecks.</li>\n</ul>\n<ul>\n<li>Run regular retros with Product and GTM partners 
and feed learnings back into process and tooling improvements. Scale what works through documentation and enablement.</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 7+ years in product operations, customer insights, voice of the customer programs, or related roles in fast-paced tech companies.</li>\n</ul>\n<ul>\n<li>Have personally shipped AI-enabled processes and systems , you&#39;ve written the prompts, built the evals, and iterated on production LLM workflows yourself.</li>\n</ul>\n<ul>\n<li>Have owned a customer feedback program end-to-end , intake, synthesis, routing, and closing the loop , that product teams actually used to make decisions.</li>\n</ul>\n<ul>\n<li>Have operated at earlier-stage and scaling companies (Series B-D or equivalent) where you built things that didn&#39;t exist yet, shipped v1s in weeks not quarters, and iterated in public.</li>\n</ul>\n<ul>\n<li>Have operated in horizontal, cross-org roles before , you know how to build shared infrastructure that many teams depend on, drive adoption through influence rather than mandate, and earn trust across functions that don&#39;t report to you.</li>\n</ul>\n<ul>\n<li>Are comfortable with ambiguity and can create structure where none exists , you&#39;ve built the v1 of a system and iterated it into something teams rely on.</li>\n</ul>\n<ul>\n<li>Are service-oriented and obsessed with making it easy for others to do great work.</li>\n</ul>\n<p><strong>Strong candidates may also have experience with:</strong></p>\n<ul>\n<li>Building AI-native workflows end-to-end , prompt design, evals, closed-loop improvement , and pushing the boundaries of what automation can own.</li>\n</ul>\n<ul>\n<li>Product Management, Customer Success Operations, or Research Operations.</li>\n</ul>\n<ul>\n<li>Feedback tooling ecosystems (Productboard, Dovetail, or homegrown equivalents) and the tradeoffs between buy vs. 
build.</li>\n</ul>\n<ul>\n<li>Treating process as a product with users, metrics, and continuous iteration.</li>\n</ul>\n<ul>\n<li>Track record of building and scaling operations programs from zero to one.</li>\n</ul>\n<p>Annual compensation range for this role is $260,000-$325,000 USD</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_47588e09-b9f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.co/","logo":"https://logos.yubhub.co/anthropic.co.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5179882008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$260,000-$325,000 USD","x-skills-required":["AI-enabled processes","Customer insights","Voice of the customer programs","Product operations","Customer feedback","Synthesis and triage","Routing and closing the loop","Continuous improvement","Metrics tracking","Process management"],"x-skills-preferred":["Prompt design","Evals","Closed-loop improvement","Automation","Product management","Customer success operations","Research operations","Feedback tooling ecosystems","Process as a product","Metrics-driven approach"],"datePosted":"2026-04-18T15:44:07.529Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"AI-enabled processes, Customer insights, Voice of the customer programs, Product operations, Customer feedback, Synthesis and triage, Routing and closing the loop, Continuous improvement, Metrics tracking, Process management, Prompt design, Evals, Closed-loop improvement, Automation, Product management, Customer success operations, Research operations, Feedback tooling ecosystems, Process 
as a product, Metrics-driven approach","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":260000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_aa1a6f6f-fee"},"title":"Staff Research Engineer, Applied AI","description":"<p>We are seeking a Staff Research Engineer, Applied AI to lead the development and deployment of novel applications, leveraging Google&#39;s generative AI models.</p>\n<p>This role focuses on rapidly developing new features, and working across partner teams to deliver solutions, and maximize impact for Google and top customers.</p>\n<p>You will be instrumental in translating cutting-edge AI research into real-world products, and demonstrating the capabilities of latest-generation models.</p>\n<p>We are looking for engineers with a strong track record of building and shipping AI-powered software, ideally with experience in early-stage environments where they have contributed to scaling products from initial concept to production.</p>\n<p>The ideal candidate will be motivated by the opportunity to drive product &amp; business impact.</p>\n<p>Key responsibilities:</p>\n<ul>\n<li>Harness frontier models to drive real-world high-impact outcomes</li>\n</ul>\n<ul>\n<li>Build evaluations, training data, and infrastructure to support AI deployments and rapid iterations</li>\n</ul>\n<ul>\n<li>Collaborate with researchers and product managers to translate research advancements into tangible product features.</li>\n</ul>\n<ul>\n<li>Contribute to the development of best practices for building and deploying generative AI applications.</li>\n</ul>\n<ul>\n<li>Contribute signal to influence the development of frontier models</li>\n</ul>\n<ul>\n<li>Lead the architecture and development of new products &amp; features from 0 to 1.</li>\n</ul>\n<p>About you:</p>\n<p>In order to set you up for success as a 
Staff Research Engineer, Applied AI at Google DeepMind, we look for the following skills and experience:</p>\n<p>Required Skills:</p>\n<ul>\n<li>Bachelor&#39;s degree or equivalent practical experience.</li>\n</ul>\n<ul>\n<li>8 years of experience in software development, and with data structures/algorithms.</li>\n</ul>\n<ul>\n<li>5 years of hands-on experience in AI research (e.g. RL, finetuning, evals), AI applications, or model deployment</li>\n</ul>\n<ul>\n<li>Proven experience in rapidly developing and shipping software products.</li>\n</ul>\n<ul>\n<li>Deep understanding of software development best practices, including testing &amp; deployment.</li>\n</ul>\n<ul>\n<li>Experience with cloud computing platforms and infrastructure (e.g., Google Cloud Platform, AWS, Azure).</li>\n</ul>\n<ul>\n<li>Substantial experience with machine learning frameworks and libraries such as TensorFlow, PyTorch, Hugging Face, etc.</li>\n</ul>\n<ul>\n<li>Ability to work in a fast-paced environment and adapt to changing priorities.</li>\n</ul>\n<p>Preferred Skills:</p>\n<ul>\n<li>Experience with generative AI research or applications.</li>\n</ul>\n<ul>\n<li>Contributions to open-source projects.</li>\n</ul>\n<ul>\n<li>Experience working in, or founding early stage startups.</li>\n</ul>\n<ul>\n<li>Experience delivering software solutions in a fast-paced, customer-facing environment.</li>\n</ul>\n<p>If you are a passionate machine learning engineer with a drive to build innovative products and a desire to work at the forefront of AI, we encourage you to apply!</p>\n<p>The US base salary range for this full-time position is between $197,000 - $291,000 + bonus + equity + benefits.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_aa1a6f6f-fee","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Google 
DeepMind","sameAs":"https://deepmind.com/","logo":"https://logos.yubhub.co/deepmind.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/deepmind/jobs/7561938","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$197,000 - $291,000 + bonus + equity + benefits","x-skills-required":["Bachelor's degree or equivalent practical experience","8 years of experience in software development, and with data structures/algorithms","5 years of hands-on experience in AI research (e.g. RL, finetuning, evals), AI applications, or model deployment","Proven experience in rapidly developing and shipping software products","Deep understanding of software development best practices, including testing & deployment","Experience with cloud computing platforms and infrastructure (e.g., Google Cloud Platform, AWS, Azure)","Substantial experience with machine learning frameworks and libraries such as TensorFlow, PyTorch, Hugging Face, etc.","Ability to work in a fast-paced environment and adapt to changing priorities"],"x-skills-preferred":["Experience with generative AI research or applications","Contributions to open-source projects","Experience working in, or founding early stage startups","Experience delivering software solutions in a fast-paced, customer-facing environment"],"datePosted":"2026-04-18T15:40:05.366Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View, California, US"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Bachelor's degree or equivalent practical experience, 8 years of experience in software development, and with data structures/algorithms, 5 years of hands-on experience in AI research (e.g. 
RL, finetuning, evals), AI applications, or model deployment, Proven experience in rapidly developing and shipping software products, Deep understanding of software development best practices, including testing & deployment, Experience with cloud computing platforms and infrastructure (e.g., Google Cloud Platform, AWS, Azure), Substantial experience with machine learning frameworks and libraries such as TensorFlow, PyTorch, Hugging Face, etc., Ability to work in a fast-paced environment and adapt to changing priorities, Experience with generative AI research or applications, Contributions to open-source projects, Experience working in, or founding early stage startups, Experience delivering software solutions in a fast-paced, customer-facing environment","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":197000,"maxValue":291000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_cc98c40d-f7e"},"title":"Partner Business Systems & AI Operations Lead","description":"<p>The Partner Business Systems &amp; AI Operations Lead will own the foundation of the Claude Partner Network, including the Salesforce partner data model, the partner platform stack, and the integrations between them. 
This role will also define and own the partner data quality standard, administer the partner platform stack, and build and operate the AI automation layer across the partner workflow stack.</p>\n<p>Key responsibilities include owning the Salesforce partner data model end to end, administering the partner platform stack, defining and owning the partner data quality standard, partnering with the Business Process Manager to instrument every partner process, running access and configuration governance for partner systems, and building and operating the AI automation layer.</p>\n<p>The ideal candidate will have five or more years in revenue systems, partner systems, or business systems roles with hands-on Salesforce administration or architecture experience, and will be able to translate a program rule into a schema, a validation rule, and an entitlement flow without a detailed specification.</p>\n<p>Strong candidates may also have Salesforce Administrator or Platform App Builder certification, or experience with Experience Cloud or a PRM such as Impartner or Salesforce PRM.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_cc98c40d-f7e","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5191437008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$215,000-$300,000 USD","x-skills-required":["Salesforce administration","Salesforce architecture","Data quality standard","AI automation","Workflow automation","LLMs","Agentic workflows","Prompt engineering","Evals","SQL fluency"],"x-skills-preferred":["Salesforce Administrator or Platform App Builder certification","Experience with Experience Cloud or a PRM such as 
Impartner or Salesforce PRM","Prior partner program or channel operations experience","Experience standing up a data quality program from the ground up","Shipped an AI-powered or LLM-driven workflow into a production ops environment"],"datePosted":"2026-04-18T15:39:44.195Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Salesforce administration, Salesforce architecture, Data quality standard, AI automation, Workflow automation, LLMs, Agentic workflows, Prompt engineering, Evals, SQL fluency, Salesforce Administrator or Platform App Builder certification, Experience with Experience Cloud or a PRM such as Impartner or Salesforce PRM, Prior partner program or channel operations experience, Experience standing up a data quality program from the ground up, Shipped an AI-powered or LLM-driven workflow into a production ops environment","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":215000,"maxValue":300000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0806749e-694"},"title":"Engineering Manager, Agent Prompts & Evals","description":"<p><strong>About the Role</strong></p>\n<p>Anthropic is looking for an Engineering Manager to lead the Agent Prompts &amp; Evals team. This team owns the infrastructure that lets Anthropic ship model and prompt changes with confidence , the eval frameworks, system prompt pipelines, and regression-detection systems that every model launch depends on.</p>\n<p>When a new Claude model is ready to ship, this team is the one answering “is it actually better in our products?” When a product team wants to change how Claude behaves, this team owns the tooling that tells them whether they broke something. 
It’s a platform team whose platform is model behavior itself.</p>\n<p>The team sits deliberately at the seam between product engineering and research. You’ll partner closely with other evals groups across the company on shared infrastructure and methodology, with product teams who are shipping features on top of Claude, and with the TPMs and research PMs driving model launches. The pace is set by the model release cadence, and the team operates as both a platform owner and a hands-on partner during launch periods.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Lead and grow a team of prompt engineers and platform software engineers</li>\n<li>Own the product-side eval platform: the frameworks, dashboards, bulk runners, and CI integrations that product teams use to measure Claude’s behavior and catch regressions before they ship</li>\n<li>Own system prompt infrastructure: versioning, deployment, rollback, and review tooling for the prompts that run in production across claude.ai, the API, and agentic surfaces</li>\n<li>Be a steady hand through model launches , these are the team’s highest-stakes operational moments and the EM is the backstop when things get chaotic</li>\n<li>Build durable collaboration with other evals groups across the company; this means real work on ownership boundaries, shared roadmaps, and avoiding tragedy-of-the-commons on shared eval infrastructure</li>\n<li>Recruit, close, and retain engineers who want to work at the intersection of product engineering and model behavior</li>\n<li>Shape where the team invests next: there are credible paths into frontier eval development, model launch automation, and deeper prompt engineering support, and part of the job is sequencing them</li>\n<li>Push the team toward measuring things that are hard to measure , behavioral drift, prompt quality, harness parity , not just things that are easy</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>8+ years in software engineering with 3+ years 
managing engineering teams, including experience leading a platform, infra, or developer-tooling team where your customers were other engineers</li>\n<li>A track record of building “pits of success” , tooling and process that made it easy for other teams to do the right thing without needing to understand all the details</li>\n<li>Comfort managing a team with a mixed charter: platform ownership, service-to-other-teams, and a launch-driven operational rhythm, all at once</li>\n<li>Enough technical depth to engage on system design, review pipeline architecture, and be credible in debates with strong ICs , you don’t need to be writing code by hand every day, but you should be able to read it, review it, and be comfortable leveraging Claude to understand, design, and occasionally build.</li>\n<li>A product mindset and willingness to wear multiple hats when the work calls for it</li>\n<li>Demonstrated ability to build and maintain peer relationships with partner orgs that have different cultures and incentives , negotiating ownership, aligning roadmaps, and holding ground when it matters without being territorial about it</li>\n<li>Experience recruiting and closing senior ICs in a competitive market</li>\n</ul>\n<p><strong>Nice to Have</strong></p>\n<ul>\n<li>Prior exposure to LLM evals, ML experimentation platforms, or model quality work , even tangentially</li>\n<li>Experience with A/B testing infrastructure, feature flagging, or gradual rollout systems</li>\n<li>Background in devtools, CI/CD platforms, or testing infrastructure at scale</li>\n<li>A history of managing teams that sit between two larger orgs and making that position an asset rather than a liability</li>\n<li>Interest in AI safety and alignment , not required, but it makes the “why” of the work land harder</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<ul>\n<li>Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience</li>\n<li>Required field of study: 
A field relevant to the role as demonstrated through coursework, training, or professional experience</li>\n<li>Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position</li>\n<li>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>\n<li>Visa sponsorship: We do sponsor visas! However, we aren’t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0806749e-694","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5159608008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$405,000 USD","x-skills-required":["Software engineering","Team management","Platform ownership","Service-to-other-teams","Launch-driven operational rhythm","System design","Pipeline architecture","Product mindset","Peer relationships","Recruiting and closing senior ICs"],"x-skills-preferred":["LLM evals","ML experimentation platforms","Model quality work","A/B testing infrastructure","Feature flagging","Gradual rollout systems","Devtools","CI/CD platforms","Testing infrastructure","AI safety and alignment"],"datePosted":"2026-04-18T15:39:18.064Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, 
NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Software engineering, Team management, Platform ownership, Service-to-other-teams, Launch-driven operational rhythm, System design, Pipeline architecture, Product mindset, Peer relationships, Recruiting and closing senior ICs, LLM evals, ML experimentation platforms, Model quality work, A/B testing infrastructure, Feature flagging, Gradual rollout systems, Devtools, CI/CD platforms, Testing infrastructure, AI safety and alignment","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_28b01ce3-8a3"},"title":"Member of Technical Staff - Imagine Model","description":"<p>As a Member of Technical Staff on the Imagine Model Team, you will develop cutting-edge AI experiences beyond text, with a strong focus on enabling high-fidelity understanding and generation across image and video modalities, while also incorporating audio where it enhances visual content.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Create and drive engineering agendas to advance multimodal capabilities, with emphasis on image and video generation, editing, understanding, controllable/long-horizon synthesis, agentic planning, RL training, and world simulation (including audio integration for richer video experiences).</li>\n<li>Improve data quality through annotation, filtering, augmentation, synthetic generation, captioning, and in-depth data studies, particularly for visual and audio data.</li>\n<li>Design evaluation frameworks, metrics, benchmarks, evals, and reward models tailored to image/video/audio quality and coherence.</li>\n<li>Implement efficient algorithms for state-of-the-art model performance, including real-time inference, distillation, and scalable serving for visual 
content.</li>\n<li>Develop scalable data collection and processing pipelines for multimodal (primarily image/video-focused) datasets.</li>\n<li>Collaborate cross-functionally to integrate AI solutions into production and rapidly iterate based on user feedback.</li>\n</ul>\n<p>Basic Qualifications:</p>\n<ul>\n<li>Track record in leading studies that significantly improve neural network capabilities and performance through better data or modeling.</li>\n<li>Experience in data-driven experiment designs, systematic analysis, and iterative model debugging.</li>\n<li>Experience developing or working with large-scale distributed machine learning systems.</li>\n<li>Ability to deliver optimal end-to-end user experiences.</li>\n<li>Hands-on contributor with initiative, excellence, strong work ethic, prioritization skills, and excellent communication.</li>\n</ul>\n<p>Preferred Skills and Experience:</p>\n<ul>\n<li>Experience in SFT, RL, evals, human/synthetic data collection, or agentic systems.</li>\n<li>Proficiency in Python, JAX/XLA, PyTorch, Rust/C++, Spark, Ray, and related large-scale frameworks.</li>\n<li>Domain expertise in multimodal applications such as graphics engines, rendering techniques, image/video understanding and generation, world models, real-time simulation, or controllable/long-horizon visual content creation (audio/speech processing or music/audio generation experience is a plus where it supports video).</li>\n<li>Experience with agentic RL training, controllable/long-horizon generation, or multimodal agents that reason and act across modalities (especially in visual domains).</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_28b01ce3-8a3","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/5051985007","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000 - $440,000 USD","x-skills-required":["Python","JAX/XLA","PyTorch","Rust/C++","Spark","Ray","multimodal applications","agentic systems","RL training","controllable/long-horizon generation"],"x-skills-preferred":["SFT","evals","human/synthetic data collection","graphics engines","rendering techniques","image/video understanding and generation","world models","real-time simulation"],"datePosted":"2026-04-18T15:24:12.847Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo Alto, CA; Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, JAX/XLA, PyTorch, Rust/C++, Spark, Ray, multimodal applications, agentic systems, RL training, controllable/long-horizon generation, SFT, evals, human/synthetic data collection, graphics engines, rendering techniques, image/video understanding and generation, world models, real-time simulation","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":440000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_58b03260-1e2"},"title":"AI Engineer, Product","description":"<p>About Mistral AI</p>\n<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. 
Our technology is designed to integrate seamlessly into daily working life.</p>\n<p>We are a global company with a diverse workforce.</p>\n<p>Embedded directly in a product team as search, chat, documents, or audio, you&#39;ll improve AI-powered features through rigorous evaluation, prompt and orchestration design, and rapid experimentation. You&#39;ll own your domain&#39;s AI quality end-to-end: define what &quot;good&quot; looks like, measure it, run experiments, and ship what works.</p>\n<p>Responsibilities</p>\n<p>• Design and run evaluations for your product area: reference tests, heuristics, model-graded checks tailored to search relevance, chat quality, document understanding, or audio performance.</p>\n<p>• Define and track metrics that matter: task success, helpfulness, hallucination proxies, safety flags, latency, cost.</p>\n<p>• Own prompt and orchestration design: write, test, and iterate on prompts and system prompts as a core part of your work.</p>\n<p>• Run A/B tests on prompts, models, and configurations; analyze results; make rollout or rollback decisions from data.</p>\n<p>• Set up observability for LLM calls: structured logging, tracing, dashboards, alerts.</p>\n<p>• Operate model releases: canary and shadow traffic, sign-offs, SLO-based rollback criteria, regression detection.</p>\n<p>• Improve core behaviors in your product area, whether that&#39;s memory policies, intent classification, routing, tool-call reliability, or retrieval quality.</p>\n<p>• Create templates and documentation so other teams can author evals and ship safely.</p>\n<p>• Partner with Science to diagnose regressions and lead post-mortems.</p>\n<p>About you</p>\n<p>• 3-4 years of experience; backgrounds that fit well include ML engineers moving closer to product, or software engineers with real AI/ML production experience.</p>\n<p>• Strong TypeScript or Python skills - we have both tracks depending on team fit.</p>\n<p>• Production LLM experience: prompts, tool/function 
calling, system prompts.</p>\n<p>• Hands-on with evals and A/B testing; you can design metrics, not just run them.</p>\n<p>• Comfortable implementing directly in product code, not only notebooks.</p>\n<p>• Observability experience: logging, tracing, dashboards, alerting.</p>\n<p>• Product mindset: form hypotheses, run experiments, interpret results, ship.</p>\n<p>• Clear communication, autonomous, and oriented toward production impact over experimentation for its own sake.</p>\n<p>It would be ideal if you also have:</p>\n<p>• Safety systems experience: moderation, PII handling/redaction, guardrails.</p>\n<p>• Release operations: canary/shadowing, automated rollbacks, experiment platforms.</p>\n<p>• Prior work on search ranking, chat systems, document AI, or audio ML features.</p>\n<p>Hiring Process</p>\n<p>• Introduction call - 30 min</p>\n<p>• Hiring Manager interview - 30 min</p>\n<p>• Technical Rounds - Live-coding Interview - 45 min - AI Engineering Interview - 45 min</p>\n<p>• Culture-fit discussion - 30 min</p>\n<p>• References</p>\n<p>By applying, you agree to our Applicant Privacy Policy.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_58b03260-1e2","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/c79ff8ed-6689-4dda-aec6-979a5dc767d0","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["TypeScript","Python","Production LLM experience","Evals and A/B testing","Observability","Product mindset","Clear communication"],"x-skills-preferred":["Safety systems experience","Release operations","Search ranking","Chat systems","Document AI","Audio ML 
features"],"datePosted":"2026-04-17T12:46:01.954Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"TypeScript, Python, Production LLM experience, Evals and A/B testing, Observability, Product mindset, Clear communication, Safety systems experience, Release operations, Search ranking, Chat systems, Document AI, Audio ML features"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_6663d8f4-ea5"},"title":"AI Engineer, Product","description":"<p>About Mistral AI</p>\n<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.</p>\n<p>We are a global company with teams distributed between France, USA, UK, Germany, and Singapore. Our diverse workforce thrives in competitive environments and is committed to driving innovation.</p>\n<p>Role Summary</p>\n<p>Embedded directly in a product team as search, chat, documents, or audio, you&#39;ll improve AI-powered features through rigorous evaluation, prompt and orchestration design, and rapid experimentation. 
You&#39;ll own your domain&#39;s AI quality end-to-end: define what &#39;good&#39; looks like, measure it, run experiments, and ship what works.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Design and run evaluations for your product area: reference tests, heuristics, model-graded checks tailored to search relevance, chat quality, document understanding, or audio performance.</li>\n<li>Define and track metrics that matter: task success, helpfulness, hallucination proxies, safety flags, latency, cost.</li>\n<li>Own prompt and orchestration design: write, test, and iterate on prompts and system prompts as a core part of your work.</li>\n<li>Run A/B tests on prompts, models, and configurations; analyze results; make rollout or rollback decisions from data.</li>\n<li>Set up observability for LLM calls: structured logging, tracing, dashboards, alerts.</li>\n<li>Operate model releases: canary and shadow traffic, sign-offs, SLO-based rollback criteria, regression detection.</li>\n<li>Improve core behaviors in your product area, whether that&#39;s memory policies, intent classification, routing, tool-call reliability, or retrieval quality.</li>\n<li>Create templates and documentation so other teams can author evals and ship safely.</li>\n<li>Partner with Science to diagnose regressions and lead post-mortems.</li>\n</ul>\n<p>About You</p>\n<ul>\n<li>3-4 years of experience; backgrounds that fit well include ML engineers moving closer to product, or software engineers with real AI/ML production experience.</li>\n<li>Strong TypeScript or Python skills - we have both tracks depending on team fit.</li>\n<li>Production LLM experience: prompts, tool/function calling, system prompts.</li>\n<li>Hands-on with evals and A/B testing; you can design metrics, not just run them.</li>\n<li>Comfortable implementing directly in product code, not only notebooks.</li>\n<li>Observability experience: logging, tracing, dashboards, alerting.</li>\n<li>Product mindset: form hypotheses, run experiments, 
interpret results, ship.</li>\n<li>Clear communication, autonomous, and oriented toward production impact over experimentation for its own sake.</li>\n</ul>\n<p>Benefits</p>\n<ul>\n<li>Competitive salary and equity package</li>\n<li>Health insurance</li>\n<li>Transportation allowance</li>\n<li>Sport allowance</li>\n<li>Meal vouchers</li>\n<li>Private pension plan</li>\n<li>Generous parental leave policy</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_6663d8f4-ea5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai"},"x-apply-url":"https://jobs.lever.co/mistral/c79ff8ed-6689-4dda-aec6-979a5dc767d0","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["TypeScript","Python","Production LLM experience","Evals and A/B testing","Observability","Product mindset"],"x-skills-preferred":["Safety systems experience","Release operations","Search ranking","Chat systems","Document AI","Audio ML features"],"datePosted":"2026-03-10T11:22:23.831Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"TypeScript, Python, Production LLM experience, Evals and A/B testing, Observability, Product mindset, Safety systems experience, Release operations, Search ranking, Chat systems, Document AI, Audio ML features"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_273a6027-d6f"},"title":"Provider Operations & Support","description":"<p><strong>About the Role</strong></p>\n<p>OpenRouter is an LLM marketplace that lets developers use frontier models in one place. 
This means shipping frequent model launches and making integration dead simple.</p>\n<p>We’re looking for an AI enthusiast to manage model launches and provider onboarding. You’ll help bring new models to market and improve the onboarding process for new providers. This is an ideal role if you love tinkering with AI models, want to develop technical depth, and want to work directly with all of the model labs and providers in the AI ecosystem.</p>\n<p><strong>Key Responsibilities</strong></p>\n<p><strong>Model Launches &amp; Provider Partnerships (~60%)</strong></p>\n<p>Run end-to-end launch playbooks: scoping, test plans, latency/quality checks, pricing &amp; quotas, docs, and announcement assets.</p>\n<p>Coordinate with model providers to integrate, QA, and hit ship dates.</p>\n<p>Maintain clear versioning and release notes; manage deprecations and migrations.</p>\n<p><strong>Tooling for benchmarks, onboarding and internal operations (~40%)</strong></p>\n<p>Build internal tools to help speed up the model onboarding process.</p>\n<p>Build internal evals to evaluate models and endpoints quickly.</p>\n<p>Document internal processes and prime them for automation.</p>\n<p><strong>About You</strong></p>\n<ul>\n<li>Experience: 2-3 years in a startup, solutions engineering, product ops, or similar.</li>\n</ul>\n<ul>\n<li>Technical Skills: Comfortable building demos and scripts; can read API docs and troubleshoot with logs/cURL/Postman. Experience with TypeScript/JavaScript and/or Python. Git-literate. Bonus: familiarity with LangChain/LlamaIndex, benchmarking/evals, or basic cloud/observability.</li>\n</ul>\n<ul>\n<li>Mindset: Ambitious owner, fast learner, bias to ship. 
Passionate about AI and developer experience.</li>\n</ul>\n<ul>\n<li>Education: CS/Engineering degree is a plus, not required with demonstrated technical aptitude.</li>\n</ul>\n<p><strong>What We Offer</strong></p>\n<ul>\n<li>A front-row seat to real-world LLM adoption and access to cutting-edge models.</li>\n</ul>\n<ul>\n<li>Growth into specialized or senior roles as we scale.</li>\n</ul>\n<ul>\n<li>Collaborative, high-ownership environment.</li>\n</ul>\n<ul>\n<li>Competitive compensation, benefits, and equity.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_273a6027-d6f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenRouter","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openrouter.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openrouter/58dd70b9-f387-4ba1-8bee-1033f91e76ee","x-work-arrangement":"Remote","x-experience-level":"mid","x-job-type":"Full time","x-salary-range":null,"x-skills-required":["TypeScript/JavaScript","Python","Git","API docs","logs/cURL/Postman"],"x-skills-preferred":["LangChain/LlamaIndex","benchmarking/evals","cloud/observability"],"datePosted":"2026-03-09T09:47:53.961Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote (US)"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"TypeScript/JavaScript, Python, Git, API docs, logs/cURL/Postman, LangChain/LlamaIndex, benchmarking/evals, cloud/observability"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_030fe4a1-0c7"},"title":"Platform Engineer, Forward Deployed Engineering (FDE) - NYC","description":"<p><strong>Job Posting</strong></p>\n<p><strong>Platform Engineer, Forward Deployed Engineering (FDE) - 
NYC</strong></p>\n<p><strong>Location</strong></p>\n<p>New York City</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Location Type</strong></p>\n<p>Hybrid</p>\n<p><strong>Department</strong></p>\n<p>Model Deployment for Business</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$230K – $385K</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for 
eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the team</strong></p>\n<p>OpenAI’s Forward Deployed Engineering (FDE) org sits at the intersection of product, engineering, research, and go-to-market. We take frontier platform capabilities into the real world with design partners, turning raw customer signal into shipped software, repeatable patterns, and durable products.</p>\n<p>The FDE Platform team is primarily a leverage function that scales the FDE org’s impact to OpenAI’s platform and products. We provide hands-on leverage by embedding with customer-tagged FDE pods to aid in architecting, product shaping, refactoring, and building. This team is perfect for highly collaborative software engineers who love innovating on cutting-edge products with other builders.</p>\n<p><strong>About the role</strong></p>\n<p>Platform Engineer is a role within Forward Deployed Engineering (FDE) for strong software and ML engineers who want to build new platform capabilities from scratch, grounded in real customer deployments.</p>\n<p>You will partner with customer-tagged FDEs who are driving delivery and customer outcomes, and embed where you can provide the highest leverage. In practice that means working in the trenches on architecture, product shaping, refactoring, hardening, and reusable abstractions, while preserving the pod’s ownership of customer understanding and day-to-day execution. 
You will also collaborate closely with our B2B Platform Team and other long-term owners to align early on what should generalize, what should remain customer-specific, and what “ready for handoff” looks like.</p>\n<p><strong>This role does not require travel. It is based in San Francisco or New York. We use a hybrid work model of 3 days in the office per week. We offer relocation assistance. Travel is optional-by-project and typically &lt;%, with occasional spikes for key embeds or launches.</strong></p>\n<p><strong>In this role you will</strong></p>\n<ul>\n<li><strong>Provide hands-on leverage to customer pods:</strong> embed with customer-tagged FDE teams to support generalization, contributing directly in architecture, product shaping, refactoring, and implementation.</li>\n</ul>\n<ul>\n<li><strong>Turn repeated signals into platform bets:</strong> translate cross-customer patterns into crisp hypotheses with clear success criteria, scope, and a validation plan that fits real account constraints.</li>\n</ul>\n<ul>\n<li><strong>Raise the engineering bar through tooling and mentorship:</strong> set org-wide quality norms through high-signal code review and pairing, and build lightweight developer tooling that makes good architecture, readability, and correctness the default across FDE.</li>\n</ul>\n<ul>\n<li><strong>Collaborate as part of cross-functional platform teams:</strong> partner closely with B2B Product, customer-tagged FDEs, ops, and business partners to bring the right products and platform capabilities to market.</li>\n</ul>\n<ul>\n<li><strong>Lead complex platform capabilities end-to-end when needed:</strong> for high-leverage primitives like our Context Platform, act as DRI from requirements through implementation, make key tradeoffs explicit, and pull in customer pods early to keep the work grounded in real deployments.</li>\n</ul>\n<p><strong>You might thrive in this role if you</strong></p>\n<ul>\n<li>Bring <strong>5+ years of software engineering 
or ML engineering experience</strong> with a track record of <strong>shipping 0→1 capabilities</strong> that other engineers or customers depend on. Experience in high-ambiguity, fast-iteration environments (startups or product-centric teams) is a plus.</li>\n</ul>\n<ul>\n<li>Have owned <strong>customer-adjacent technical work</strong> end-to-end, from scoping and hypothesis-setting through production adoption, and improved outcomes through structured iteration (instrumentation, evals, error analysis, and tightening success criteria over time).</li>\n</ul>\n<ul>\n<li>Have built or operated systems where <strong>reliability, security, and governance</strong> materially shaped design (permissions/RBAC, auditability, data access boundaries, rollout safety, observability, and incident-driven hardening).</li>\n</ul>\n<ul>\n<li>Communicate clearly across <strong>engineering, product, go-to-market, and customer-facing teams</strong> to drive alignment and shared understanding of customer needs and technical tradeoffs.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_030fe4a1-0c7","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/45ab8896-06bd-4c8e-bb76-914483d5d180","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$230K – $385K","x-skills-required":["software engineering","ML engineering","architecture","product shaping","refactoring","implementation","cross-functional collaboration","customer-facing communication","reliability","security","governance"],"x-skills-preferred":["high-ambiguity, fast-iteration environments","startups or product-centric teams","customer-adjacent technical work","structured 
iteration","instrumentation","evals","error analysis","tightening success criteria"],"datePosted":"2026-03-06T18:44:04.355Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York City"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, ML engineering, architecture, product shaping, refactoring, implementation, cross-functional collaboration, customer-facing communication, reliability, security, governance, high-ambiguity, fast-iteration environments, startups or product-centric teams, customer-adjacent technical work, structured iteration, instrumentation, evals, error analysis, tightening success criteria","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":230000,"maxValue":385000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f54a7413-225"},"title":"Research Engineer, Frontier Evals & Environments - Finance","description":"<p><strong>Job Posting</strong></p>\n<p><strong>Research Engineer, Frontier Evals &amp; Environments - Finance</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Location Type</strong></p>\n<p>Hybrid</p>\n<p><strong>Department</strong></p>\n<p>Research</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$205K – $380K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the team</strong></p>\n<p>The Frontier Evals team builds north star model evaluations to drive 
progress towards safe AGI/ASI. This team builds ambitious evaluations to measure and steer our models, and creates self-improvement loops to steer our training, safety, and launch decisions. Some of the team&#39;s open-sourced evaluations include SWE-bench Verified, MLE-bench, PaperBench, and SWE-Lancer, and the team built and ran frontier evaluations for GPT4o, o1, o3, GPT 4.5, ChatGPT Agent, and GPT5. If you are interested in feeling firsthand the fast progress of our models, and steering them towards good, this is the team for you.</p>\n<p><strong>About you</strong></p>\n<p>We seek exceptional research engineers that can push the boundaries of our frontier models in the finance domain. We are looking for those who will help shape AI evaluations of financial reasoning and related capabilities, and will own individual threads within this endeavor end-to-end.</p>\n<p><strong>In this role, you&#39;ll:</strong></p>\n<ul>\n<li>Identify important model capabilities, skills, and behaviors that are crucial to financial workflows, and design methods to quantify performance in these areas</li>\n</ul>\n<ul>\n<li>Own and pursue a research agenda to identify an important model capability (especially as it relates to financial reasoning) and build evals to measure it</li>\n</ul>\n<ul>\n<li>Continuously refine evaluations of frontier AI models to assess the extent of frontier capabilities</li>\n</ul>\n<p><strong>We expect you to:</strong></p>\n<ul>\n<li>Have strong engineering and statistical analysis skills (with at least 2-3 years of full-time technical experience)</li>\n</ul>\n<ul>\n<li>Be passionate about evals for real world applications and knowledge work</li>\n</ul>\n<ul>\n<li>Be detail-oriented and thorough</li>\n</ul>\n<ul>\n<li>Be a team player / willing to do a variety of tasks to move the team forward</li>\n</ul>\n<ul>\n<li>Be passionate and knowledgeable about AGI/ASI measurement</li>\n</ul>\n<ul>\n<li>Be able to operate effectively in a dynamic and extremely 
fast-paced research environment as well as scope and deliver projects end-to-end</li>\n</ul>\n<p><strong>It would be great if you also have:</strong></p>\n<ul>\n<li>An ability to work cross-functionally</li>\n</ul>\n<ul>\n<li>Excellent communication skills</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f54a7413-225","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/9708b52d-909d-4a3c-a21e-f8043c3679ce","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$205K – $380K • Offers Equity","x-skills-required":["strong engineering and statistical analysis skills","passionate about evals for real world applications and knowledge work","detail-oriented and thorough","team player","passionate and knowledgeable about AGI/ASI measurement"],"x-skills-preferred":["ability to work cross-functionally","excellent communication skills"],"datePosted":"2026-03-06T18:37:37.177Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"strong engineering and 
statistical analysis skills, passionate about evals for real world applications and knowledge work, detail-oriented and thorough, team player, passionate and knowledgeable about AGI/ASI measurement, ability to work cross-functionally, excellent communication skills","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":205000,"maxValue":380000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2bfc37e4-bc3"},"title":"Researcher, Pretraining Safety","description":"<p><strong>Job Posting</strong></p>\n<p><strong>Researcher, Pretraining Safety</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Safety Systems</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$295K – $445K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong><strong>About the Team</strong></strong></p>\n<p>The Safety Systems team is responsible for various 
safety work to ensure our best models can be safely deployed to the real world to benefit society and is at the forefront of OpenAI&#39;s mission to build and deploy safe AGI, driving our commitment to AI safety and fostering a culture of trust and transparency.</p>\n<p>The Pretraining Safety team’s goal is to build safer, more capable base models and enable earlier, more reliable safety evaluation during training. We aim to:</p>\n<ol>\n<li><strong>Develop upstream safety evaluations</strong> to monitor how and when unsafe behaviors and goals emerge</li>\n</ol>\n<ol>\n<li><strong>Create safer priors</strong> through targeted pretraining and mid-training interventions that make downstream alignment more effective and efficient</li>\n</ol>\n<ol>\n<li><strong>Design safe-by-design architectures</strong> that allow for more controllability of model capabilities</li>\n</ol>\n<p>In addition, we will conduct the foundational research necessary for understanding how behaviors emerge, generalize, and can be reliably measured throughout training.</p>\n<p><strong><strong>About the Role</strong></strong></p>\n<p>The Pretraining Safety team is pioneering how safety is built into models before they reach post-training and deployment. 
In this role, you will work throughout the full stack of model development with a focus on pre-training:</p>\n<ul>\n<li>Identify safety-relevant behaviors as they first emerge in base models</li>\n</ul>\n<ul>\n<li>Evaluate and reduce risk without waiting for full-scale training runs</li>\n</ul>\n<ul>\n<li>Design architectures and training setups that make safer behavior the default</li>\n</ul>\n<ul>\n<li>Strengthen models by incorporating richer, earlier safety signals</li>\n</ul>\n<p>We collaborate across OpenAI’s safety ecosystem—from Safety Systems to Training—to ensure that safety foundations are robust, scalable, and grounded in real-world risks.</p>\n<p><strong><strong>In this role, you will:</strong></strong></p>\n<ul>\n<li>Develop new techniques to predict, measure, and evaluate unsafe behavior in early-stage models</li>\n</ul>\n<ul>\n<li>Design data curation strategies that improve pretraining priors and reduce downstream risk</li>\n</ul>\n<ul>\n<li>Explore safe-by-design architectures and training configurations that improve controllability</li>\n</ul>\n<ul>\n<li>Introduce novel safety-oriented loss functions, metrics, and evals into the pretraining stack</li>\n</ul>\n<ul>\n<li>Work closely with cross-functional safety teams to unify pre- and post-training risk reduction</li>\n</ul>\n<p><strong><strong>You might thrive in this role if you:</strong></strong></p>\n<ul>\n<li>Have experience developing or scaling pretraining architectures (LLMs, diffusion models, multimodal models, etc.)</li>\n</ul>\n<ul>\n<li>Are comfortable working with training infrastructure, data pipelines, and evaluation frameworks (e.g., Python, PyTorch/JAX, Apache Beam)</li>\n</ul>\n<ul>\n<li>Enjoy hands-on research — designing, implementing, and iterating on experiments</li>\n</ul>\n<ul>\n<li>Enjoy collaborating with diverse technical and cross-functional partners (e.g., policy, legal, training)</li>\n</ul>\n<ul>\n<li>Are data-driven with strong statistical reasoning and rigor in 
experimental design</li>\n</ul>\n<ul>\n<li>Value building clean, scalable research workflows and streamlining processes for yourself and others</li>\n</ul>\n<p><strong><strong>About OpenAI</strong></strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2bfc37e4-bc3","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/d829b701-5ee2-414f-8596-ef94911a168a","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$295K – $445K • Offers Equity","x-skills-required":["pretraining architectures","training infrastructure","data pipelines","evaluation frameworks","Python","PyTorch/JAX","Apache Beam","hands-on research","collaboration","data-driven","statistical reasoning"],"x-skills-preferred":["LLMs","diffusion models","multimodal models","safe-by-design architectures","training configurations","loss functions","metrics","evals"],"datePosted":"2026-03-06T18:36:25.493Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"pretraining architectures, training infrastructure, data pipelines, 
evaluation frameworks, Python, PyTorch/JAX, Apache Beam, hands-on research, collaboration, data-driven, statistical reasoning, LLMs, diffusion models, multimodal models, safe-by-design architectures, training configurations, loss functions, metrics, evals","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":295000,"maxValue":445000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c224e1d4-cc6"},"title":"Backend Software Engineer (Evals)","description":"<p><strong>Location</strong></p>\n<p>San Francisco; Seattle</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Applied AI</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$230K – $385K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the Team</strong></p>\n<p>The Support Automation team at OpenAI scales the organization by 
applying cutting-edge AI models to real-world challenges, automating and enhancing work across the organization. From customer operations to engineering, we develop an ecosystem of automation products that empower our colleagues and drive impact. We&#39;re passionate about crafting products that serve those around us, blending rapid prototyping with a focus on long-term quality and reliability. By creating reusable solutions, we create patterns that can be applied across diverse domains within OpenAI.</p>\n<p>TLDR: this team leverages OpenAI technology to improve OpenAI, and you’ll have the opportunity to leverage the full extent of our tech (both public and pre-released) to accomplish this mission.</p>\n<p><strong>About the Role</strong></p>\n<p>We’re looking for a <strong>Backend Software Engineer</strong> with experience working in ML/LLM-heavy domains to help design and build evals infrastructure that measures the quality of OpenAI’s support automation. This is a deeply technical and highly cross-functional role where you’ll build robust systems and backend services that serve as the foundation for how knowledge is created, accessed, and applied across OpenAI. 
The role will especially focus on working closely with Data Science and Research partners to design and build evals at scale.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Design eval pipelines that are reliable, reproducible, and extendable</li>\n</ul>\n<ul>\n<li>Build the infrastructure for continuous eval monitoring frameworks (regression/drift monitoring, building robust golden datasets) along with feedback loops that ultimately strengthen support automation</li>\n</ul>\n<ul>\n<li>Design, build, and maintain backend services and APIs to support intelligent automation and knowledge systems</li>\n</ul>\n<ul>\n<li>Integrate and structure data across internal platforms, transforming it into formats optimized for use by downstream systems and AI workflows.</li>\n</ul>\n<ul>\n<li>Collaborate closely with data, research, and engineering teams to integrate OpenAI models into high-leverage workflows</li>\n</ul>\n<ul>\n<li>Own the full development lifecycle of new backend systems and internal platform capabilities</li>\n</ul>\n<ul>\n<li>Build with scale and maintainability in mind, while rapidly iterating on new ideas</li>\n</ul>\n<p><strong>You might be a great fit if you have:</strong></p>\n<ul>\n<li>4+ years of backend engineering experience at product-driven companies (excluding internships)</li>\n</ul>\n<ul>\n<li>Proficiency in backend technologies. 
Our tech stack includes Python, FastAPI, and Postgres</li>\n</ul>\n<ul>\n<li>Experience designing and scaling distributed systems, APIs, or data processing pipelines</li>\n</ul>\n<ul>\n<li>Experience building AI agents or applications, including designing evals and improving performance through prompting or scaffolding</li>\n</ul>\n<ul>\n<li>Familiarity with evaluation methods for LLMs and hands-on work with patterns like multi-agent workflows, tool use, or long context</li>\n</ul>\n<ul>\n<li>Experience creating production evals and/or measuring performance of ML/LLM models at scale</li>\n</ul>\n<ul>\n<li>A pragmatic mindset. You’re comfortable shipping iteratively while building toward a long-term vision</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_c224e1d4-cc6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/3d064454-c0c3-4225-bc2c-6d8c0f8735b2","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$230K – $385K","x-skills-required":["backend engineering","Python","FastAPI","Postgres","distributed systems","APIs","data processing pipelines","AI agents","evaluation methods for LLMs"],"x-skills-preferred":["ML/LLM-heavy domains","designing evals","improving performance through prompting or scaffolding","multi-agent workflows","tool use","long context"],"datePosted":"2026-03-06T18:19:49.073Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco; Seattle"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"backend engineering, Python, FastAPI, Postgres, distributed systems, APIs, data processing pipelines, AI agents, evaluation methods for LLMs, ML/LLM-heavy domains, designing evals, improving performance through prompting or scaffolding, multi-agent workflows, tool use, long context","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":230000,"maxValue":385000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_fce9fa40-da1"},"title":"Principal Product 
Manager","description":"<p><strong>Summary</strong></p>\n<p>Microsoft AI is looking for a talented Principal Product Manager at its Hyderabad office. This role sits at the heart of strategic decision-making, turning market data into actionable insights for a company that&#39;s revolutionising AI-powered consumer health experiences. You&#39;ll work directly with leadership to shape the company&#39;s direction in the AI and health markets.</p>\n<p><strong>About the Role</strong></p>\n<p>This is a rare opportunity for a PM with both breadth and depth who works across large web-scale search products as well as breakthrough Copilot product areas. We are looking for a Principal Product Manager (IC) to drive a core product area within Microsoft’s AI-powered consumer health experiences across Bing and Copilot. This role sits in Hyderabad and operates in a highly collaborative, cross-geo environment with close partners in the UK and US.</p>\n<p><strong>Accountabilities</strong></p>\n<ul>\n<li>Own a product area end-to-end, developing deep understanding of user needs and behaviors, defining product strategy and requirements, and driving execution for AI-powered consumer health experiences including AI workflows, Evals etc.</li>\n<li>Translate business objectives into clear product strategy, user experience direction, and technical requirements in close partnership with design, engineering, data science, and AI model teams.</li>\n</ul>\n<p><strong>The Candidate we&#39;re looking for</strong></p>\n<p><strong>Experience:</strong></p>\n<ul>\n<li>10+ years of experience in product management, with deep technical fluency (DS &amp; Engg.)</li>\n</ul>\n<p><strong>Technical skills:</strong></p>\n<ul>\n<li>Deep technical fluency in DS &amp; Engg.</li>\n</ul>\n<p><strong>Personal attributes:</strong></p>\n<ul>\n<li>Exceptional written and verbal communication skills, with the ability to clearly articulate high-altitude strategy as well as the lowest-level technical 
details.</li>\n</ul>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Competitive salary and benefits package</li>\n<li>Opportunity to work on cutting-edge AI and health projects</li>\n<li>Collaborative and dynamic work environment</li>\n<li>Professional development opportunities</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_fce9fa40-da1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Microsoft AI","sameAs":"https://microsoft.ai","logo":"https://logos.yubhub.co/microsoft.ai.png"},"x-apply-url":"https://microsoft.ai/job/principal-product-manager-9/","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"Competitive salary and benefits package","x-skills-required":["product management","deep technical fluency","DS & Engg.","AI workflows","Evals"],"x-skills-preferred":["AI","machine learning","data science","engineering"],"datePosted":"2026-03-06T07:33:27.787Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Hyderabad"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"product management, deep technical fluency, DS & Engg., AI workflows, Evals, AI, machine learning, data science, engineering"}]}