{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/machine-learning-infrastructure"},"x-facet":{"type":"skill","slug":"machine-learning-infrastructure","display":"Machine Learning Infrastructure","count":11},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d50772ab-afe"},"title":"Staff / Senior Software Engineer, Cloud Inference","description":"<p>We are seeking a Staff / Senior Software Engineer to join our Cloud Inference team. The successful candidate will design and build infrastructure that serves Claude across multiple cloud service providers (CSPs), accounting for differences in compute hardware, networking, APIs, and operational models.</p>\n<p>The ideal candidate will have significant software engineering experience, with a strong background in high-performance, large-scale distributed systems serving millions of users. 
They will also have experience building or operating services on at least one major cloud platform (AWS, GCP, or Azure), with exposure to Kubernetes, Infrastructure as Code or container orchestration.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and build infrastructure that serves Claude across multiple CSPs, accounting for differences in compute hardware, networking, APIs, and operational models</li>\n</ul>\n<ul>\n<li>Collaborate with CSP partner engineering teams to resolve operational issues, influence provider roadmaps, and stand up end-to-end serving on new cloud platforms</li>\n</ul>\n<ul>\n<li>Design and evolve CI/CD automation systems, including validation and deployment pipelines, that reliably ship new model versions to millions of users across cloud platforms without regressions</li>\n</ul>\n<ul>\n<li>Design interfaces and tooling abstractions across CSPs that enable cost-effective inference management, scale across providers, and reduce per-platform complexity</li>\n</ul>\n<ul>\n<li>Contribute to capacity planning and autoscaling strategies that dynamically match supply with demand across CSP validation and production workloads</li>\n</ul>\n<ul>\n<li>Optimise inference cost and performance across providers, designing workload placement and routing systems that direct requests to the most cost-effective accelerator and region</li>\n</ul>\n<ul>\n<li>Contribute to inference features that must work consistently across all platforms</li>\n</ul>\n<ul>\n<li>Analyse observability data across providers to identify performance bottlenecks, cost anomalies, and regressions, and drive remediation based on real-world production workloads</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>Significant software engineering experience, with a strong background in high-performance, large-scale distributed systems serving millions of users</li>\n</ul>\n<ul>\n<li>Experience building or operating services on at least one major cloud platform (AWS, GCP, or Azure), with exposure 
to Kubernetes, Infrastructure as Code or container orchestration</li>\n</ul>\n<ul>\n<li>Strong interest in inference</li>\n</ul>\n<ul>\n<li>Thrive in cross-functional collaboration with both internal teams and external partners</li>\n</ul>\n<ul>\n<li>Are a fast learner who can quickly ramp up on new technologies, hardware platforms, and provider ecosystems</li>\n</ul>\n<ul>\n<li>Are highly autonomous and self-driven, taking ownership of problems end-to-end with a bias toward flexibility and high-impact work</li>\n</ul>\n<ul>\n<li>Pick up slack, even when it goes outside your job description</li>\n</ul>\n<p>Preferred skills:</p>\n<ul>\n<li>Direct experience working with CSP partner teams to scale infrastructure or products across multiple platforms, navigating differences in networking, security, privacy, billing, and managed service offerings</li>\n</ul>\n<ul>\n<li>A background in building platform-agnostic tooling or abstraction layers that work across cloud providers</li>\n</ul>\n<ul>\n<li>Hands-on experience with capacity management, cost optimisation, or resource planning at scale across heterogeneous environments</li>\n</ul>\n<ul>\n<li>Strong familiarity with LLM inference optimisation, batching, caching, and serving strategies</li>\n</ul>\n<ul>\n<li>Experience with Machine learning infrastructure including GPUs, TPUs, Trainium, or other AI accelerators</li>\n</ul>\n<ul>\n<li>Background designing and building CI/CD systems that automate deployment and validation across cloud environments</li>\n</ul>\n<ul>\n<li>Solid understanding of multi-region deployments, geographic routing, and global traffic management</li>\n</ul>\n<ul>\n<li>Proficiency in Python or Rust</li>\n</ul>\n<p>Salary Range: $300,000-$485,000 USD</p>\n<p>Experience Level: Staff</p>\n<p>Employment Type: Full-time</p>\n<p>Workplace Type: Hybrid</p>\n<p>Category: Engineering</p>\n<p>Industry: Technology</p>\n<p>Required Skills:</p>\n<ul>\n<li>High-performance, large-scale distributed 
systems</li>\n</ul>\n<ul>\n<li>Cloud computing (AWS, GCP, Azure)</li>\n</ul>\n<ul>\n<li>Kubernetes</li>\n</ul>\n<ul>\n<li>Infrastructure as Code</li>\n</ul>\n<ul>\n<li>Container orchestration</li>\n</ul>\n<ul>\n<li>Inference</li>\n</ul>\n<ul>\n<li>Cross-functional collaboration</li>\n</ul>\n<ul>\n<li>Autonomy and self-driven</li>\n</ul>\n<ul>\n<li>Platform-agnostic tooling</li>\n</ul>\n<ul>\n<li>Capacity management</li>\n</ul>\n<ul>\n<li>Cost optimisation</li>\n</ul>\n<ul>\n<li>Resource planning</li>\n</ul>\n<ul>\n<li>LLM inference optimisation</li>\n</ul>\n<ul>\n<li>Machine learning infrastructure</li>\n</ul>\n<ul>\n<li>CI/CD systems</li>\n</ul>\n<ul>\n<li>Multi-region deployments</li>\n</ul>\n<ul>\n<li>Geographic routing</li>\n</ul>\n<ul>\n<li>Global traffic management</li>\n</ul>\n<ul>\n<li>Python</li>\n</ul>\n<ul>\n<li>Rust</li>\n</ul>\n<p>Preferred Skills:</p>\n<ul>\n<li>Direct experience working with CSP partner teams</li>\n</ul>\n<ul>\n<li>Building platform-agnostic tooling</li>\n</ul>\n<ul>\n<li>Hands-on experience with capacity management</li>\n</ul>\n<ul>\n<li>Strong familiarity with LLM inference optimisation</li>\n</ul>\n<ul>\n<li>Experience with Machine learning infrastructure</li>\n</ul>\n<ul>\n<li>Background designing and building CI/CD systems</li>\n</ul>\n<ul>\n<li>Solid understanding of multi-region deployments</li>\n</ul>\n<ul>\n<li>Proficiency in Python or Rust</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d50772ab-afe","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5107466008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$300,000-$485,000 
USD","x-skills-required":["high-performance, large-scale distributed systems","cloud computing (AWS, GCP, Azure)","kubernetes","infrastructure as code","container orchestration","inference","cross-functional collaboration","autonomy and self-driven","platform-agnostic tooling","capacity management","cost optimisation","resource planning","llm inference optimisation","machine learning infrastructure","ci/cd systems","multi-region deployments","geographic routing","global traffic management","python","rust"],"x-skills-preferred":["direct experience working with csp partner teams","building platform-agnostic tooling","hands-on experience with capacity management","strong familiarity with llm inference optimisation","experience with machine learning infrastructure","background designing and building ci/cd systems","solid understanding of multi-region deployments","proficiency in python or rust"],"datePosted":"2026-04-18T15:53:24.048Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"engineering","industry":"technology","skills":"high-performance, large-scale distributed systems, cloud computing (AWS, GCP, Azure), kubernetes, infrastructure as code, container orchestration, inference, cross-functional collaboration, autonomy and self-driven, platform-agnostic tooling, capacity management, cost optimisation, resource planning, llm inference optimisation, machine learning infrastructure, ci/cd systems, multi-region deployments, geographic routing, global traffic management, python, rust, direct experience working with csp partner teams, building platform-agnostic tooling, hands-on experience with capacity management, strong familiarity with llm inference optimisation, experience with machine learning infrastructure, background designing and building ci/cd systems, solid understanding of multi-region deployments, proficiency in python or 
rust","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":300000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3c6419c4-a9b"},"title":"Software Engineer, Compute Efficiency","description":"<p>As a Software Engineer for Compute Efficiency on the Capacity team, you will play a central role in making our systems more performant, cost-effective, and sustainable, without compromising reliability or latency.</p>\n<p>You will work across the full infrastructure stack, from cloud platforms and networking to application-level performance, and will bridge the gap between high-level research needs and low-level hardware constraints to build the most efficient AI infrastructure in the world. You will help with building the telemetry, cost attribution, and optimization frameworks that ensure every dollar of our infrastructure investment delivers maximum value.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Build and evolve telemetry and monitoring systems to provide deep visibility into infrastructure performance, utilization, and costs across our cloud and datacenter fleets.</li>\n</ul>\n<ul>\n<li>Design and implement cost attribution frameworks for our multi-tenant infrastructure, enabling teams to understand and optimize their resource consumption.</li>\n</ul>\n<ul>\n<li>Identify and resolve performance bottlenecks and capacity hotspots through deep analysis of distributed systems at scale.</li>\n</ul>\n<ul>\n<li>Partner closely with cloud service providers and internal stakeholders to optimize cluster configurations, workload placement, and resource utilization across AI training and inference workloads, including large-scale clusters spanning thousands to hundreds of thousands of machines.</li>\n</ul>\n<ul>\n<li>Develop and champion engineering practices around efficiency, driving a culture of performance awareness 
and cost-conscious design across Anthropic.</li>\n</ul>\n<ul>\n<li>Collaborate with research and product teams to deeply understand their infrastructure needs, and design solutions that balance performance with cost efficiency.</li>\n</ul>\n<ul>\n<li>Drive architectural improvements and code-level optimizations across multiple services and platforms to deliver measurable utilization and performance gains.</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 6+ years of relevant industry experience, 1+ year leading large scale, complex projects or teams as a software engineer or tech lead</li>\n</ul>\n<ul>\n<li>Deep expertise in distributed systems at scale, with a strong focus on infrastructure reliability, scalability, and continuous improvement.</li>\n</ul>\n<ul>\n<li>Strong proficiency in at least one programming language (e.g., Python, Rust, Go, Java)</li>\n</ul>\n<ul>\n<li>Hands-on experience with cloud infrastructure, including Kubernetes, Infrastructure as Code, and major cloud providers such as AWS or GCP.</li>\n</ul>\n<ul>\n<li>Experience optimizing end-to-end performance of distributed systems, including workload right-sizing and resource utilization tuning.</li>\n</ul>\n<ul>\n<li>You possess a deep curiosity for how things work under the hood and have a proven ability to work independently to solve opaque performance issues</li>\n</ul>\n<ul>\n<li>Experience designing or working with performance and utilization monitoring tools in large-scale, distributed environments.</li>\n</ul>\n<ul>\n<li>Strong problem-solving skills with the ability to work independently and navigate ambiguity.</li>\n</ul>\n<ul>\n<li>Excellent communication and collaboration skills; you will work closely with internal and external stakeholders to build consensus and drive projects forward.</li>\n</ul>\n<p>Strong candidates may have:</p>\n<ul>\n<li>Experience with machine learning infrastructure workloads as well as associated networking technologies like 
NCCL.</li>\n</ul>\n<ul>\n<li>Low level systems experience, for example linux kernel tuning and eBPF</li>\n</ul>\n<ul>\n<li>Quickly understanding systems design tradeoffs, keeping track of rapidly evolving software systems</li>\n</ul>\n<ul>\n<li>Published work in performance optimization and scaling distributed systems</li>\n</ul>\n<p>The annual compensation range for this role is $320,000-$405,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3c6419c4-a9b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5108982008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$405,000 USD","x-skills-required":["distributed systems","cloud infrastructure","Kubernetes","Infrastructure as Code","AWS","GCP","Python","Rust","Go","Java"],"x-skills-preferred":["machine learning infrastructure workloads","NCCL","linux kernel tuning","eBPF","performance optimization","scaling distributed systems"],"datePosted":"2026-04-18T15:49:18.293Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed systems, cloud infrastructure, Kubernetes, Infrastructure as Code, AWS, GCP, Python, Rust, Go, Java, machine learning infrastructure workloads, NCCL, linux kernel tuning, eBPF, performance optimization, scaling distributed 
systems","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_59e88547-efc"},"title":"Senior Software Engineer, Systems","description":"<p>About Anthropic</p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole.</p>\n<p>About the Role</p>\n<p>Anthropic&#39;s Infrastructure organization is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users, demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand. The Systems engineering team owns compute uptime and resilience at massive scale, building the clusters, automation, and observability that make frontier AI research possible and safely deployable to customers.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Lead infrastructure projects from design through delivery, owning scope, execution, and outcomes</li>\n<li>Build and maintain systems that support AI clusters at massive scale (thousands to hundreds of thousands of machines)</li>\n<li>Partner with cloud providers and internal teams to solve compute, networking, and reliability challenges</li>\n<li>Tackle difficult technical problems in your domain and proactively fill gaps in tooling, documentation, and processes</li>\n<li>Contribute to operational practices including incident response, postmortems, and on-call rotations</li>\n</ul>\n<p>Benefits</p>\n<ul>\n<li>Competitive compensation and benefits</li>\n<li>Optional equity donation matching</li>\n<li>Generous vacation and parental 
leave</li>\n<li>Flexible working hours</li>\n<li>Lovely office space in which to collaborate with colleagues</li>\n</ul>\n<p>Requirements</p>\n<ul>\n<li>6+ years of software engineering experience</li>\n<li>Have led technical projects end-to-end over multiple months, including scoping, breaking down work, and driving delivery</li>\n<li>Have deep knowledge of distributed systems, reliability, and cloud platforms (Kubernetes, IaC, AWS/GCP)</li>\n<li>Are strong in at least one systems language (Python, Rust, Go, Java)</li>\n<li>Solve hard problems independently and know when to pull others in</li>\n<li>Help teammates grow through knowledge sharing and thoughtful technical guidance</li>\n<li>Communicate clearly in design docs, presentations, and cross-functional discussions</li>\n</ul>\n<p>Preferred Qualifications</p>\n<ul>\n<li>Security and privacy best practice expertise</li>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Low level systems experience, for example linux kernel tuning and eBPF</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_59e88547-efc","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4915842008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£240,000-£325,000 GBP","x-skills-required":["Distributed systems","Reliability","Cloud platforms","Kubernetes","IaC","AWS/GCP","Systems language","Python","Rust","Go","Java"],"x-skills-preferred":["Security and privacy best practice","Machine learning infrastructure","GPUs","TPUs","Trainium","Networking infrastructure","NCCL","Low level 
systems experience","Linux kernel tuning","eBPF"],"datePosted":"2026-04-18T15:48:47.617Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Distributed systems, Reliability, Cloud platforms, Kubernetes, IaC, AWS/GCP, Systems language, Python, Rust, Go, Java, Security and privacy best practice, Machine learning infrastructure, GPUs, TPUs, Trainium, Networking infrastructure, NCCL, Low level systems experience, Linux kernel tuning, eBPF","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":240000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_08f992cf-0e9"},"title":"CyberSecurity Team Lead, Infrastructure and Application","description":"<p>About Mistral AI</p>\n<p>Mistral AI is a technology company that develops and provides AI-powered solutions and platforms for enterprise use. Our technology is designed to integrate seamlessly into daily working life.</p>\n<p>Role Summary</p>\n<p>As a CyberSecurity Team Lead, you will be responsible for architecting and enforcing the security posture of our entire technical stack, from on-premise foundations to cloud-native deployments. 
You will oversee the identification, prioritization, and remediation of vulnerabilities across both On-Prem and Cloud infrastructures as well as internal applications.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Oversee the identification, prioritization, and remediation of vulnerabilities across both On-Prem and Cloud infrastructures as well as internal applications.</li>\n<li>Select, deploy, and maintain the tools needed for visibility and protection, including CNAPP, CSPM, SAST/DAST, secret scanning, and SBOM/CVE tracking.</li>\n<li>Integrate security controls and automated gates directly into CI/CD pipelines to catch vulnerabilities before deployment (Shift Left).</li>\n<li>Partner with engineering teams to interpret findings and &#39;ease the fix,&#39; providing patches, code snippets, or architectural advice to resolve issues quickly.</li>\n<li>Define and maintain rigorous security guidelines and best practices for developers and system administrators.</li>\n<li>Design and lead security awareness programs and technical training tailored for developers and admins to reduce human risk.</li>\n<li>Track and define key security metrics (MTTR, coverage, vulnerability density) to visualize posture and progress to leadership.</li>\n</ul>\n<p>Requirements</p>\n<ul>\n<li>6+ years of experience in Information Security, with a specific focus on Application Security, Cloud Security, or DevSecOps.</li>\n<li>Strong scripting skills (Python, Go, or Bash) to automate security tasks and integrate tools.</li>\n<li>Deep understanding of CI/CD ecosystems and container orchestration (Kubernetes/Docker).</li>\n<li>Hands-on experience with modern security tooling (e.g., Wiz, Snyk, SonarQube, Prisma, or similar enterprise tools).</li>\n<li>Collaborative mindset: you view developers as partners, not adversaries, and focus on enabling them to code securely.</li>\n<li>Clear communication, autonomous, and capable of translating technical security risks into actionable engineering 
tasks.</li>\n</ul>\n<p>Benefits</p>\n<ul>\n<li>Competitive salary</li>\n<li>Comprehensive health insurance</li>\n<li>Flexible working hours</li>\n<li>Professional development opportunities</li>\n</ul>\n<p>Note: The company may offer additional benefits not listed here.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_08f992cf-0e9","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai/","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/c9b75928-dd48-4432-b6f1-fc0b24e51657","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Application Security","Cloud Security","DevSecOps","CI/CD ecosystems","Container orchestration","Modern security tooling","Scripting skills","Collaborative mindset","Clear communication"],"x-skills-preferred":["Industry certifications","Infrastructure as Code","Offensive security","Prior experience securing large-scale AI or Machine Learning infrastructure"],"datePosted":"2026-04-17T12:46:50.079Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Application Security, Cloud Security, DevSecOps, CI/CD ecosystems, Container orchestration, Modern security tooling, Scripting skills, Collaborative mindset, Clear communication, Industry certifications, Infrastructure as Code, Offensive security, Prior experience securing large-scale AI or Machine Learning infrastructure"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_7bce292a-74f"},"title":"CyberSecurity Team Lead, Infrastructure and Application","description":"<p>Role 
summary</p>\n<p>Embedded directly within Mistral&#39;s Security Engineering ecosystem, you will architect and enforce the security posture of our entire technical stack, from on-premise foundations to cloud-native deployments.</p>\n<p>As a CyberSecurity Team Lead, you will oversee the identification, prioritization, and remediation of vulnerabilities across both On-Prem and Cloud infrastructures as well as internal applications.</p>\n<p>You will select, deploy, and maintain the tools needed for visibility and protection, including CNAPP, CSPM, SAST/DAST, secret scanning, and SBOM/CVE tracking.</p>\n<p>Integrate security controls and automated gates directly into CI/CD pipelines to catch vulnerabilities before deployment (Shift Left).</p>\n<p>Partner with engineering teams to interpret findings and &#39;ease the fix,&#39; providing patches, code snippets, or architectural advice to resolve issues quickly.</p>\n<p>Define and maintain rigorous security guidelines and best practices for developers and system administrators.</p>\n<p>Design and lead security awareness programs and technical training tailored for developers and admins to reduce human risk.</p>\n<p>Track and define key security metrics (MTTR, coverage, vulnerability density) to visualize posture and progress to leadership.</p>\n<p>Who you are</p>\n<p>• 6+ years of experience in Information Security, with a specific focus on Application Security, Cloud Security, or DevSecOps.</p>\n<p>• Strong scripting skills (Python, Go, or Bash) to automate security tasks and integrate tools.</p>\n<p>• Deep understanding of CI/CD ecosystems and container orchestration (Kubernetes/Docker).</p>\n<p>• Hands-on experience with modern security tooling (e.g., Wiz, Snyk, SonarQube, Prisma, or similar enterprise tools).</p>\n<p>• Collaborative mindset: you view developers as partners, not adversaries, and focus on enabling them to code securely.</p>\n<p>• Clear communication, autonomous, and capable of translating technical 
security risks into actionable engineering tasks.</p>\n<p>It would be ideal if you also have:</p>\n<p>• Industry certifications such as CISSP, CCSP, OSCP, or cloud-specific security certifications.</p>\n<p>• Strong Infrastructure as Code (IaC) experience with Terraform or Ansible.</p>\n<p>• Experience in offensive security (Penetration Testing) to better understand attacker mindsets.</p>\n<p>• Prior experience securing large-scale AI or Machine Learning infrastructure.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_7bce292a-74f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai"},"x-apply-url":"https://jobs.lever.co/mistral/c9b75928-dd48-4432-b6f1-fc0b24e51657","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"hybrid","x-salary-range":null,"x-skills-required":["Application Security","Cloud Security","DevSecOps","CI/CD","Container Orchestration","Modern Security Tooling","Scripting Skills","Infrastructure as Code"],"x-skills-preferred":["Industry Certifications","Infrastructure as Code","Offensive Security","Large-Scale AI or Machine Learning Infrastructure"],"datePosted":"2026-03-10T11:24:46.918Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"occupationalCategory":"Engineering","industry":"Technology","skills":"Application Security, Cloud Security, DevSecOps, CI/CD, Container Orchestration, Modern Security Tooling, Scripting Skills, Infrastructure as Code, Industry Certifications, Infrastructure as Code, Offensive Security, Large-Scale AI or Machine Learning Infrastructure"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_139cd1f4-231"},"title":"Software Engineer, Compute Efficiency","description":"<p><strong>About 
Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole.</p>\n<p>At Anthropic, we are building some of the most complex and large-scale AI infrastructure in the world. As that infrastructure scales rapidly, so does the imperative to optimise how we use it. As a Software Engineer for Compute Efficiency on the Capacity team, you will play a central role in making our systems more performant, cost-effective, and sustainable—without compromising reliability or latency.</p>\n<p>You will work across the full infrastructure stack, from cloud platforms and networking to application-level performance, and will bridge the gap between high-level research needs and low-level hardware constraints to build the most efficient AI infrastructure in the world. You will help with building the telemetry, cost attribution, and optimisation frameworks that ensure every dollar of our infrastructure investment delivers maximum value. 
This is a high-impact, cross-functional role at the intersection of systems engineering, financial optimisation, and AI infrastructure.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Build and evolve telemetry and monitoring systems to provide deep visibility into infrastructure performance, utilisation, and costs across our cloud and datacentre fleets.</li>\n</ul>\n<ul>\n<li>Design and implement cost attribution frameworks for our multi-tenant infrastructure, enabling teams to understand and optimise their resource consumption.</li>\n</ul>\n<ul>\n<li>Identify and resolve performance bottlenecks and capacity hotspots through deep analysis of distributed systems at scale.</li>\n</ul>\n<ul>\n<li>Partner closely with cloud service providers and internal stakeholders to optimise cluster configurations, workload placement, and resource utilisation across AI training and inference workloads—including large-scale clusters spanning thousands to hundreds of thousands of machines.</li>\n</ul>\n<ul>\n<li>Develop and champion engineering practices around efficiency, driving a culture of performance awareness and cost-conscious design across Anthropic.</li>\n</ul>\n<ul>\n<li>Collaborate with research and product teams to deeply understand their infrastructure needs, and design solutions that balance performance with cost efficiency.</li>\n</ul>\n<ul>\n<li>Drive architectural improvements and code-level optimisations across multiple services and platforms to deliver measurable utilisation and performance gains.</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 6+ years of relevant industry experience, 1+ year leading large scale, complex projects or teams as a software engineer or tech lead</li>\n</ul>\n<ul>\n<li>Deep expertise in distributed systems at scale, with a strong focus on infrastructure reliability, scalability, and continuous improvement.</li>\n</ul>\n<ul>\n<li>Strong proficiency in at least one programming language (e.g., 
Python, Rust, Go, Java)</li>\n</ul>\n<ul>\n<li>Hands-on experience with cloud infrastructure, including Kubernetes, Infrastructure as Code, and major cloud providers such as AWS or GCP.</li>\n</ul>\n<ul>\n<li>Experience optimising end-to-end performance of distributed systems, including workload right-sizing and resource utilisation tuning.</li>\n</ul>\n<ul>\n<li>You possess a deep curiosity for how things work under the hood and have a proven ability to work independently to solve opaque performance issues</li>\n</ul>\n<ul>\n<li>Experience designing or working with performance and utilisation monitoring tools in large-scale, distributed environments.</li>\n</ul>\n<ul>\n<li>Strong problem-solving skills with the ability to work independently and navigate ambiguity.</li>\n</ul>\n<ul>\n<li>Excellent communication and collaboration skills—you will work closely with internal and external stakeholders to build consensus and drive projects forward.</li>\n</ul>\n<p><strong>Strong candidates may have:</strong></p>\n<ul>\n<li>Experience with machine learning infrastructure workloads as well as associated networking technologies like NCCL.</li>\n</ul>\n<ul>\n<li>Low level systems experience, for example linux kernel tuning and eBPF</li>\n</ul>\n<ul>\n<li>Quickly understanding systems design tradeoffs, keeping track of rapidly evolving software systems</li>\n</ul>\n<ul>\n<li>Published work in performance optimisation and scaling distributed systems</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. 
But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong> Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work. We think AI systems like the ones we&#39;re building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.</p>\n<p><strong>Your safety matters to us.</strong> To protect yourself from potential scams, remember that Anthropic</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_139cd1f4-231","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5108982008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000 - $405,000 USD","x-skills-required":["distributed systems","cloud infrastructure","Kubernetes","Infrastructure as Code","AWS","GCP","Python","Rust","Go","Java","performance optimisation","scalability","continuous improvement"],"x-skills-preferred":["machine learning infrastructure workloads","NCCL","linux kernel tuning","eBPF","systems design tradeoffs","published work in performance 
optimisation"],"datePosted":"2026-03-08T13:56:57.417Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed systems, cloud infrastructure, Kubernetes, Infrastructure as Code, AWS, GCP, Python, Rust, Go, Java, performance optimisation, scalability, continuous improvement, machine learning infrastructure workloads, NCCL, linux kernel tuning, eBPF, systems design tradeoffs, published work in performance optimisation","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0a7113f5-76c"},"title":"Engineering Manager, Cloud Inference AWS","description":"<p><strong>About the role</strong></p>\n<p>We are seeking an experienced Engineering Manager to lead the Cloud Inference team for AWS. You will lead your team to scale and optimize Claude to serve the massive audiences of developers and enterprise companies using AWS. You will own the end-to-end product of Claude on AWS, including API, load balancing, inference, capacity and operations. Your team will ensure our LLMs meet rigorous performance, safety and security standards and enhance our core infrastructure for packaging, testing, and deploying inference technology across the globe. 
Your work will increase the scale at which Anthropic operates and accelerate our ability to reliably launch new frontier models and innovative features to customers across all platforms.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Set technical strategy and oversee development of Claude on AWS across all layers of the technical stack.</li>\n<li>Collaborate across teams and companies to deeply understand product, infrastructure, operations and capacity needs, identifying potential solutions to support frontier LLM serving</li>\n<li>Work closely with cross-functional stakeholders across companies to align on goals and drive outcomes</li>\n<li>Create clarity for the team and stakeholders in an ambiguous and evolving environment</li>\n<li>Take an inclusive approach to hiring and coaching top technical talent, and support a high performing team</li>\n<li>Design and run processes (e.g. postmortem review, incident response, on-call rotations) that help the team operate effectively and never fail the same way twice</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 10+ years of experience in high-scale, high-reliability software development, particularly infrastructure or capacity management</li>\n<li>Have 5+ years of engineering management experience</li>\n<li>Experience recruiting, scaling, and retaining engineering talent in a high growth environment</li>\n<li>Have experience scaling products, resources and operations to accommodate rapid growth</li>\n<li>Are deeply interested in the potential transformative effects of advanced AI systems and are committed to ensuring their safe development</li>\n<li>Excel at building strong relationships and strategy with stakeholders across engineering, product, finance, and sales</li>\n<li>Have experience working with external partners to align goals and deliver impact</li>\n<li>Enjoy working in a fast-paced, early environment; comfortable with adapting priorities as driven by the rapidly 
evolving AI space</li>\n<li>Have excellent written and verbal communication skills</li>\n<li>Demonstrated success building a culture of belonging and engineering excellence</li>\n<li>Are motivated by developing AI responsibly and safely</li>\n<li>Are willing and able to travel frequently between Seattle and the SF Bay Area</li>\n</ul>\n<p><strong>Strong candidates may also have experience with:</strong></p>\n<ul>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Experience as a Product Manager</li>\n<li>Experience with deployment and capacity management automation</li>\n<li>Security and privacy best practice expertise</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong> Not all strong candidates will meet every single qualification as listed. 
Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>\n<p><strong>Your safety matters to us.</strong> To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. 
We view AI research as a collaborative effort, and we work closely with other researchers, engineers, and experts to advance our understanding of AI and its applications.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0a7113f5-76c","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5141377008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$405,000 - $485,000 USD","x-skills-required":["high-scale, high-reliability software development","infrastructure or capacity management","engineering management","recruiting, scaling, and retaining engineering talent","scaling products, resources and operations","machine learning infrastructure","deployment and capacity management automation","security and privacy best practice expertise"],"x-skills-preferred":["experience with GPUs, TPUs, or Trainium","experience as a Product Manager","experience with networking infrastructure like NCCL"],"datePosted":"2026-03-08T13:56:51.226Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"high-scale, high-reliability software development, infrastructure or capacity management, engineering management, recruiting, scaling, and retaining engineering talent, scaling products, resources and operations, machine learning infrastructure, deployment and capacity management automation, security and privacy best practice expertise, experience with GPUs, TPUs, or Trainium, experience as a Product Manager, experience with networking infrastructure like 
NCCL","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_25934fbc-c50"},"title":"Staff / Senior Software Engineer, Cloud Inference","description":"<p><strong>About the Role</strong></p>\n<p>The Cloud Inference team scales and optimizes Claude to serve the massive audiences of developers and enterprise companies across AWS, GCP, Azure, and future cloud service providers (CSPs). We own the end-to-end product of Claude on each cloud platform—from API integration and intelligent request routing to inference execution, capacity management, and day-to-day operations.</p>\n<p>Our engineers are extremely high leverage: we simultaneously drive multiple major revenue streams while optimizing one of Anthropic&#39;s most precious resources—compute. As we expand to more cloud platforms, the complexity of managing inference efficiently across providers with different hardware, networking stacks, and operational models grows significantly. 
We need engineers who can navigate these platform differences, build robust abstractions that work across providers, and make smart infrastructure decisions that keep us cost-effective at massive scale.</p>\n<p>Your work will increase the scale at which our services operate, accelerate our ability to reliably launch new frontier models and innovative features to customers across all platforms, and ensure our LLMs meet rigorous safety, performance, and security standards.</p>\n<p><strong>What You&#39;ll Do</strong></p>\n<ul>\n<li>Design and build infrastructure that serves Claude across multiple CSPs, accounting for differences in compute hardware, networking, APIs, and operational models</li>\n<li>Collaborate with CSP partner engineering teams to resolve operational issues, influence provider roadmaps, and stand up end-to-end serving on new cloud platforms</li>\n<li>Design and evolve CI/CD automation systems, including validation and deployment pipelines, that reliably ship new model versions to millions of users across cloud platforms without regressions</li>\n<li>Design interfaces and tooling abstractions across CSPs that enable cost-effective inference management, scale across providers, and reduce per-platform complexity</li>\n<li>Contribute to capacity planning and autoscaling strategies that dynamically match supply with demand across CSP validation and production workloads</li>\n<li>Optimize inference cost and performance across providers—designing workload placement and routing systems that direct requests to the most cost-effective accelerator and region</li>\n<li>Contribute to inference features that must work consistently across all platforms</li>\n<li>Analyze observability data across providers to identify performance bottlenecks, cost anomalies, and regressions, and drive remediation based on real-world production workloads</li>\n</ul>\n<p><strong>You May Be a Good Fit If You:</strong></p>\n<ul>\n<li>Have significant software engineering experience, 
with a strong background in high-performance, large-scale distributed systems serving millions of users</li>\n<li>Have experience building or operating services on at least one major cloud platform (AWS, GCP, or Azure), with exposure to Kubernetes, Infrastructure as Code or container orchestration</li>\n<li>Have a strong interest in inference</li>\n<li>Thrive in cross-functional collaboration with both internal teams and external partners</li>\n<li>Are a fast learner who can quickly ramp up on new technologies, hardware platforms, and provider ecosystems</li>\n<li>Are highly autonomous and self-driven, taking ownership of problems end-to-end with a bias toward flexibility and high-impact work</li>\n<li>Pick up slack, even when it goes outside your job description</li>\n</ul>\n<p><strong>Strong Candidates May Also Have Experience With</strong></p>\n<ul>\n<li>Direct experience working with CSP partner teams to scale infrastructure or products across multiple platforms, navigating differences in networking, security, privacy, billing, and managed service offerings</li>\n<li>A background in building platform-agnostic tooling or abstraction layers that work across cloud providers</li>\n<li>Hands-on experience with capacity management, cost optimization, or resource planning at scale across heterogeneous environments</li>\n<li>Strong familiarity with LLM inference optimization, batching, caching, and serving strategies</li>\n<li>Experience with machine learning infrastructure, including GPUs, TPUs, Trainium, or other AI accelerators</li>\n<li>Background designing and building CI/CD systems that automate deployment and validation across cloud environments</li>\n<li>Solid understanding of multi-region deployments, geographic routing, and global traffic management</li>\n<li>Proficiency in Python or Rust</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent 
experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_25934fbc-c50","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5107466008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$300,000 - $485,000 USD","x-skills-required":["Software engineering","Cloud infrastructure","Kubernetes","Infrastructure as Code","Container orchestration","LLM inference optimization","Batching","Caching","Serving strategies","Machine learning infrastructure","GPUs","TPUs","Trainium","AI accelerators","CI/CD systems","Deployment and validation","Cloud environments","Multi-region deployments","Geographic routing","Global traffic management"],"x-skills-preferred":["Python","Rust","Cloud platforms","Networking","Security","Privacy","Billing","Managed service offerings","Platform-agnostic tooling","Abstraction layers","Capacity management","Cost optimization","Resource planning"],"datePosted":"2026-03-08T13:49:59.956Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Software 
engineering, Cloud infrastructure, Kubernetes, Infrastructure as Code, Container orchestration, LLM inference optimization, Batching, Caching, Serving strategies, Machine learning infrastructure, GPUs, TPUs, Trainium, AI accelerators, CI/CD systems, Deployment and validation, Cloud environments, Multi-region deployments, Geographic routing, Global traffic management, Python, Rust, Cloud platforms, Networking, Security, Privacy, Billing, Managed service offerings, Platform-agnostic tooling, Abstraction layers, Capacity management, Cost optimization, Resource planning","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":300000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3b20b513-ea1"},"title":"Staff+ Software Engineer, Systems","description":"<p><strong>About Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>About the Role</strong></p>\n<p>Anthropic&#39;s Infrastructure organisation is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. 
The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users — demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand.</p>\n<p>The Systems engineering team owns compute uptime and resilience at massive scale, building the clusters, automation, and observability that make frontier AI research possible and safely deployable to customers.</p>\n<p>_Team Matching: Team matching is determined after the interview process based on interview performance, interests, and business priorities. Please note we may also consider you for different Infrastructure teams._</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Own the technical strategy and roadmap for your area, translating team-level goals into concrete execution plans</li>\n<li>Drive cross-team initiatives to build and scale AI clusters (thousands to hundreds of thousands of machines)</li>\n<li>Define infrastructure architecture, ensuring the hardest problems get solved — whether by you directly or by working through others</li>\n<li>Partner with cloud providers and internal stakeholders to shape long-term compute, data, and infrastructure strategy</li>\n<li>Establish and evolve operational excellence practices (incident response, postmortem culture, on-call)</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 10+ years of software engineering experience</li>\n<li>Have led complex, multi-quarter technical initiatives that span multiple teams or systems</li>\n<li>Can set technical direction for a team, not just execute within it</li>\n<li>Have deep expertise in distributed systems, reliability, and cloud platforms (Kubernetes, IaC, AWS/GCP)</li>\n<li>Are strong in at least one systems language (Python, Rust, Go, Java)</li>\n<li>Naturally uplevel the engineers around you and can redirect efforts when things are heading off 
track</li>\n<li>Build alignment across senior stakeholders and communicate effectively at all levels</li>\n</ul>\n<p><strong>Strong candidates may have:</strong></p>\n<ul>\n<li>Security and privacy best practice expertise</li>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Low level systems experience, for example linux kernel tuning and eBPF</li>\n<li>Technical expertise: Quickly understanding systems design tradeoffs, keeping track of rapidly evolving software systems</li>\n</ul>\n<p>_Deadline to apply: None. Applications will be reviewed on a rolling basis._</p>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</strong></p>\n<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. 
In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</strong></p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. 
This re</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3b20b513-ea1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5108817008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$405,000 - $485,000 USD","x-skills-required":["distributed systems","reliability","cloud platforms","Kubernetes","IaC","AWS/GCP","Python","Rust","Go","Java"],"x-skills-preferred":["security and privacy best practice expertise","machine learning infrastructure","GPUs","TPUs","Trainium","NCCL","low level systems experience","linux kernel tuning","eBPF"],"datePosted":"2026-03-08T13:49:17.054Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed systems, reliability, cloud platforms, Kubernetes, IaC, AWS/GCP, Python, Rust, Go, Java, security and privacy best practice expertise, machine learning infrastructure, GPUs, TPUs, Trainium, NCCL, low level systems experience, linux kernel tuning, eBPF","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_886a66bf-10d"},"title":"Senior Software Engineer, Systems","description":"<p><strong>About Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. 
We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>About the Role</strong></p>\n<p>Anthropic&#39;s Infrastructure organisation is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users — demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand.</p>\n<p>The Systems engineering team owns compute uptime and resilience at massive scale, building the clusters, automation, and observability that make frontier AI research possible and safely deployable to customers.</p>\n<p>_Team Matching: Team matching is determined after the interview process based on interview performance, interests, and business priorities. 
Please note we may also consider you for different Infrastructure teams._</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Lead infrastructure projects from design through delivery, owning scope, execution, and outcomes</li>\n<li>Build and maintain systems that support AI clusters at massive scale (thousands to hundreds of thousands of machines)</li>\n<li>Partner with cloud providers and internal teams to solve compute, networking, and reliability challenges</li>\n<li>Tackle difficult technical problems in your domain and proactively fill gaps in tooling, documentation, and processes</li>\n<li>Contribute to operational practices including incident response, postmortems, and on-call rotations</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 6+ years of software engineering experience</li>\n<li>Have led technical projects end-to-end over multiple months, including scoping, breaking down work, and driving delivery</li>\n<li>Have deep knowledge of distributed systems, reliability, and cloud platforms (Kubernetes, IaC, AWS/GCP)</li>\n<li>Are strong in at least one systems language (Python, Rust, Go, Java)</li>\n<li>Solve hard problems independently and know when to pull others in</li>\n<li>Help teammates grow through knowledge sharing and thoughtful technical guidance</li>\n<li>Communicate clearly in design docs, presentations, and cross-functional discussions</li>\n</ul>\n<p><strong>Strong candidates may have:</strong></p>\n<ul>\n<li>Security and privacy best practice expertise</li>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Low level systems experience, for example linux kernel tuning and eBPF</li>\n<li>Technical expertise: Quickly understanding systems design tradeoffs, keeping track of rapidly evolving software systems</li>\n</ul>\n<p>_Deadline to apply: None. 
Applications will be reviewed on a rolling basis._</p>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</strong></p>\n<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</strong></p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. 
At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_886a66bf-10d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4915842008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£240,000 - £325,000 GBP","x-skills-required":["distributed systems","reliability","cloud platforms","Kubernetes","IaC","AWS/GCP","Python","Rust","Go","Java"],"x-skills-preferred":["security and privacy best practice expertise","machine learning infrastructure","GPUs","TPUs","Trainium","NCCL","low level systems experience","linux kernel tuning","eBPF"],"datePosted":"2026-03-08T13:46:27.991Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed systems, reliability, cloud platforms, Kubernetes, IaC, AWS/GCP, Python, Rust, Go, Java, 
security and privacy best practice expertise, machine learning infrastructure, GPUs, TPUs, Trainium, NCCL, low level systems experience, linux kernel tuning, eBPF","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":240000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_520ca95e-75f"},"title":"Software Engineer, Agent Infrastructure","description":"<p><strong>Software Engineer, Agent Infrastructure</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco; New York City</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$230K – $385K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>The Agent Infrastructure team at OpenAI is responsible for building systems that enable training and deployment of highly useful AI agents, both internally and for the world.</p>\n<p>We work hand-in-hand with researchers to design and scale the environment in which agentic models are trained – providing a workspace for AI models to 
execute code, debug issues, and develop software just as human SWEs do. Our training environment for agentic models operates at an extremely high scale and has the flexibility to emulate any environment in which an agent might work.</p>\n<p>At the same time, our team builds and maintains OpenAI’s core platform for the deployment and execution of agents in production. Our systems power products such as Codex, Operator, tool use in ChatGPT, and future agentic products.</p>\n<p><strong>About the Role</strong></p>\n<p>As a Software Engineer on the Agent Infrastructure team, you will have the opportunity to work closely with both research and product at OpenAI - building and scaling systems to train highly capable agentic models, and building the platform and integrations to launch new agents to hundreds of millions of users worldwide.</p>\n<p>Your work will consist of both building new capabilities - standing up the infrastructure and integrations needed to train more complex agentic models - and rapidly scaling these new capabilities to some of the largest compute clusters in the world. At the same time, you’ll be instrumental to the launch of agentic products at OpenAI - building, maintaining, and scaling the production platform on which all agents run.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Push massive compute clusters to their limits. 
You will be a core contributor to a novel container orchestration platform built in-house by our team to scale far beyond what’s possible with systems like Kubernetes.</li>\n</ul>\n<ul>\n<li>Develop and maintain FastAPI and gRPC APIs that serve as the interface for our agentic infrastructure used both in training and production.</li>\n</ul>\n<ul>\n<li>Use Terraform to stand up and evolve complex infrastructure for both research and production.</li>\n</ul>\n<ul>\n<li>Collaborate with research teams to stand up and optimize systems for novel AI training runs and experimental applications.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Have deep experience working on large-scale machine learning infrastructure. You know how to reason about training at scale, identifying bottlenecks and engineering solutions to optimize system performance in training environments.</li>\n</ul>\n<ul>\n<li>Know how to build new things from 0-1 quickly, and then scale them 1,000,000x.</li>\n</ul>\n<ul>\n<li>Have a keen eye for performance and optimization. You know how to squeeze the most performance out of complex, globally-distributed systems.</li>\n</ul>\n<ul>\n<li>Know your way around cloud platforms and work with infrastructure-as-code tech like Terraform.</li>\n</ul>\n<ul>\n<li>Are driven by solving complex, ambiguous problems at the intersection of infrastructure scalability, virtualization efficiency, and agentic capabilities.</li>\n</ul>\n<ul>\n<li>Have deep technical expertise in virtualization and containerization technologies (e.g. 
Kata, Firecracker, gVisor, Sysbox) and are passionate about optimizing runtime performance.</li>\n</ul>\n<p><strong>What We Offer</strong></p>\n<ul>\n<li>Competitive salary and equity package</li>\n</ul>\n<ul>\n<li>Opportunity to work on cutting-edge AI infrastructure</li>\n</ul>\n<ul>\n<li>Collaborative and dynamic team environment</li>\n</ul>\n<ul>\n<li>Flexible work arrangements</li>\n</ul>\n<ul>\n<li>Professional development opportunities</li>\n</ul>\n<ul>\n<li>Access to the latest technology and tools</li>\n</ul>\n<p><strong>How to Apply</strong></p>\n<p>If you are a motivated and experienced software engineer looking to join a dynamic team and work on cutting-edge AI infrastructure, please submit your application. We look forward to hearing from you!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_520ca95e-75f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/c1316397-25bb-4add-9e9d-0e3ea8ba929a","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$230K – $385K","x-skills-required":["large-scale machine learning infrastructure","container orchestration","FastAPI","gRPC","Terraform","cloud platforms","infrastructure-as-code","virtualization","containerization","Kata","Firecracker","gVisor","Sysbox"],"x-skills-preferred":["AI infrastructure","agentic models","training environments","compute clusters","performance optimization","runtime performance"],"datePosted":"2026-03-06T18:41:05.385Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco; New York City"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"large-scale machine learning 
infrastructure, container orchestration, FastAPI, gRPC, Terraform, cloud platforms, infrastructure-as-code, virtualization, containerization, Kata, Firecracker, gVisor, Sysbox, AI infrastructure, agentic models, training environments, compute clusters, performance optimization, runtime performance","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":230000,"maxValue":385000,"unitText":"YEAR"}}}]}