{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/gpus"},"x-facet":{"type":"skill","slug":"gpus","display":"Gpus","count":22},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2d4635c3-8a5"},"title":"Director, Compute & Infrastructure FP&A","description":"<p>As a Director, Compute &amp; Infrastructure FP&amp;A, you will own and drive the monthly forecasting process for the Compute &amp; Infrastructure org by partnering with various stakeholders across Finance, Accounting, Tax and Engineering. You will play a critical role in planning and forecasting the company&#39;s largest and most complex cost center (Compute &amp; Infrastructure).</p>\n<p>You will collaborate cross-functionally to develop long-range infrastructure investment plans, evaluate build vs. buy decisions, and ensure capital is deployed efficiently to support rapid growth. You will also provide strategic financial guidance through scenario modeling, ROI analysis, and performance tracking, enabling leadership to make high-stakes decisions under uncertainty.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Own compute financial planning &amp; Forecasting.</li>\n<li>Build and manage consolidation models for GPU/CPU capacity, storage, networking, and data center investments.</li>\n<li>Translate infrastructure roadmaps into short- and long-term financial forecasts (LRP, annual planning)</li>\n<li>Coordinate closely with Corporate FP&amp;A on timelines and process</li>\n<li>Present insights on a monthly basis to senior management.</li>\n<li>Drive infrastructure investment decisions.</li>\n<li>Evaluate build vs. buy, vendor vs. owned infrastructure, and capacity allocation tradeoffs.</li>\n<li>Develop frameworks for investment trade-offs to guide executive decision making.</li>\n<li>Build scalable tooling &amp; reporting.</li>\n<li>Implement stakeholder-facing dashboards to track compute spend, utilization, and efficiency metrics.</li>\n<li>Improve visibility into unit economics (e.g., cost per training run, cost per inference, cost per customer).</li>\n<li>Drive forecasting accuracy &amp; accountability.</li>\n<li>Lead budget vs. 
actual analysis for compute and infrastructure spend.</li>\n<li>Identify key cost drivers (utilization, pricing, efficiency gains) and reduce forecast variance.</li>\n<li>Support close &amp; financial reporting.</li>\n<li>Partner with Accounting to ensure accurate classification of infrastructure spend (OpEx vs CapEx).</li>\n<li>Translate complex infrastructure costs into clear insights for leadership.</li>\n<li>Enable strategic decision-making.</li>\n<li>Build scenario models to support leadership decisions on capacity scaling, new model launches, and infrastructure investments.</li>\n<li>Lead ad hoc analyses on emerging topics.</li>\n</ul>\n<p>You might thrive in this role if you have:</p>\n<ul>\n<li>10+ years in strategic finance, with experience in infrastructure, cloud, hardware, or compute-intensive environments</li>\n<li>2+ years in investment banking</li>\n<li>Must have experience running an FP&amp;A team at the corporate level or business unit level with significant scale.</li>\n<li>Strong financial modeling skills, particularly in capacity planning, unit economics, and scenario analysis under uncertainty.</li>\n<li>Experience supporting large-scale infrastructure or cloud spend (e.g., AWS/GCP/Azure, GPUs, data centers).</li>\n<li>Ability to translate technical concepts (compute usage, model training/inference, system architecture) into financial insights.</li>\n<li>Proficiency in Excel/Sheets, SQL, and BI tools (e.g., Tableau); experience with planning systems like Anaplan is a plus.</li>\n<li>Strong cross-functional partnership skills, especially with Engineering, Product, and Supply Chain.</li>\n<li>Familiarity with AI/ML infrastructure cost drivers and the economics of training and serving models.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2d4635c3-8a5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://openai.com/","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/7536171d-0f98-4964-8f22-7968db062105","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"Full time","x-salary-range":"$234K – $325K","x-skills-required":["strategic finance","infrastructure","cloud","hardware","compute-intensive environments","investment banking","financial modeling","capacity planning","unit economics","scenario analysis","large-scale infrastructure","cloud spend","AWS","GCP","Azure","GPUs","data centers","Excel","SQL","BI tools","Tableau","planning systems","Anaplan","cross-functional partnership","engineering","product","supply chain","AI/ML infrastructure cost drivers","economics of training and serving models"],"x-skills-preferred":[],"datePosted":"2026-04-24T12:23:57.567Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"strategic finance, infrastructure, cloud, hardware, compute-intensive environments, investment banking, financial modeling, capacity planning, unit economics, scenario analysis, large-scale infrastructure, cloud spend, AWS, GCP, Azure, GPUs, data centers, Excel, SQL, BI tools, Tableau, planning systems, Anaplan, cross-functional partnership, engineering, product, supply chain, AI/ML infrastructure cost drivers, economics of training and serving 
models","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":234000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b5023ab2-eae"},"title":"TL, Research Inference","description":"<p><strong>Compensation</strong></p>\n<p>$380K – $555K • Offers Equity</p>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>The Foundations team focuses on how model behavior changes as we scale models, data, and compute. The team studies the interactions between model architecture, optimization, and training data, and uses those insights to guide how new models are designed and trained.</p>\n<p><strong>About the Role</strong></p>\n<p>In this role, you will build the systems that enable advanced AI models to run efficiently at scale. You will operate at the intersection of model research and systems engineering, translating new architectural ideas into high-performance inference systems that surface real tradeoffs in performance, memory, and scalability.</p>\n<p>Your work will directly influence how models are designed, evaluated, and iterated on across the research organization. By developing and evolving high-performance inference infrastructure, you will enable researchers to explore new ideas with a clear understanding of their computational and systems implications.</p>\n<p>This is not a product-serving role. 
Instead, it is a research-enabling systems role focused on performance, correctness, and realism - ensuring that AI research is grounded in what can actually scale.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Design and build high-performance inference runtimes for large-scale AI models, with a focus on efficiency, reliability, and scalability.</li>\n</ul>\n<ul>\n<li>Own and optimize core execution paths, including model execution, memory management, batching, and scheduling.</li>\n</ul>\n<ul>\n<li>Develop and improve distributed inference across multiple GPUs, including parallelism strategies, communication patterns, and runtime coordination.</li>\n</ul>\n<ul>\n<li>Implement and optimize inference-critical operators and kernels informed by real-world workloads.</li>\n</ul>\n<ul>\n<li>Partner closely with research teams to ensure new model architectures are supported accurately and efficiently in inference systems.</li>\n</ul>\n<ul>\n<li>Diagnose and resolve performance bottlenecks through profiling, benchmarking, and low-level debugging.</li>\n</ul>\n<ul>\n<li>Contribute to observability, correctness, and reliability of large-scale AI systems.</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Have experience building production inference systems, not just training or running models.</li>\n</ul>\n<ul>\n<li>Are comfortable with GPU-centric performance engineering, including memory behavior and latency/throughput tradeoffs.</li>\n</ul>\n<ul>\n<li>Have worked on multi-GPU or distributed systems involving batching, scheduling, or runtime coordination.</li>\n</ul>\n<ul>\n<li>Can reason end-to-end about inference pipelines, from request handling through execution and output streaming.</li>\n</ul>\n<ul>\n<li>Are able to understand research ideas and implement them within real system and performance constraints.</li>\n</ul>\n<ul>\n<li>Enjoy solving hard, ambiguous systems problems that only emerge at scale.</li>\n</ul>\n<ul>\n<li>Prefer hands-on technical ownership and execution over abstract design work.</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>Required Skills</strong></p>\n<ul>\n<li>Experience building production inference systems, not just training or running models</li>\n</ul>\n<ul>\n<li>Comfortable with GPU-centric performance engineering, including memory behavior and latency/throughput tradeoffs</li>\n</ul>\n<ul>\n<li>Multi-GPU or distributed systems involving batching, scheduling, or runtime coordination</li>\n</ul>\n<ul>\n<li>Reasoning end-to-end about inference pipelines, from request handling through execution and output streaming</li>\n</ul>\n<ul>\n<li>Understanding research ideas and implementing them within real system and performance constraints</li>\n</ul>\n<ul>\n<li>Solving hard, ambiguous systems problems that only emerge at scale</li>\n</ul>\n<ul>\n<li>Hands-on technical ownership and execution over abstract design work</li>\n</ul>\n<p><strong>Preferred Skills</strong></p>\n<ul>\n<li>Experience working with large-scale AI models</li>\n</ul>\n<ul>\n<li>Distributed inference across multiple GPUs</li>\n</ul>\n<ul>\n<li>Parallelism strategies, communication patterns, and runtime coordination</li>\n</ul>\n<ul>\n<li>Implementing and optimizing inference-critical operators and kernels</li>\n</ul>\n<ul>\n<li>Observability, correctness, and reliability of large-scale AI systems</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping 
automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b5023ab2-eae","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://openai.com/","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/50aab80a-fa60-4fcc-882d-18ea76db5f11","x-work-arrangement":null,"x-experience-level":null,"x-job-type":"Full time","x-salary-range":"$380K – $555K","x-skills-required":["Experience building production inference systems, not just training or running models","Comfortable with GPU-centric performance engineering, including memory behavior and latency/throughput tradeoffs","Multi-GPU or distributed systems involving batching, scheduling, or runtime coordination","Reasoning end-to-end about inference pipelines, from request handling through execution and output streaming","Understanding research ideas and implementing them within real system and performance constraints","Solving hard, ambiguous systems problems that only emerge at scale","Hands-on technical ownership and execution over abstract design work"],"x-skills-preferred":["Experience working with large-scale AI models","Distributed inference across multiple GPUs","Parallelism strategies, communication patterns, and runtime coordination","Implementing and optimizing inference-critical operators and kernels","Observability, correctness, and reliability of large-scale AI systems","Mental health and wellness support","Employer-paid basic life and disability coverage","Annual learning and development stipend to fuel your professional growth","Daily meals in our offices, and meal delivery credits as eligible","Relocation support for eligible employees","Additional taxable fringe benefits, such as charitable donation matching and wellness stipends"],"datePosted":"2026-04-24T12:21:17.917Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Experience building production inference systems, not just training or running models, Comfortable with GPU-centric performance engineering, including memory behavior and latency/throughput tradeoffs, Multi-GPU or distributed systems involving batching, scheduling, or runtime coordination, Reasoning end-to-end about inference pipelines, from request handling through execution and output streaming, Understanding research ideas and implementing them within real system and performance constraints, Solving hard, ambiguous systems problems that only emerge at scale, Hands-on technical ownership and execution over abstract design work, Experience working with large-scale AI models, Distributed inference across multiple GPUs, Parallelism strategies, communication patterns, and runtime coordination, Implementing and optimizing inference-critical operators and kernels, Observability, correctness, and reliability of large-scale AI systems, Mental health and wellness support, Employer-paid basic life and disability coverage, Annual learning and development stipend to fuel your professional growth, Daily meals in our offices, and meal delivery credits as eligible, Relocation support for eligible employees, Additional taxable fringe benefits, such as charitable donation matching and wellness 
stipends","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":380000,"maxValue":555000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0f00522c-1ea"},"title":"Inference Technical Lead, On-Device Transformers","description":"<p>Job Title: Inference Technical Lead, On-Device Transformers</p>\n<p>Location: San Francisco</p>\n<p>Department: Consumer Products</p>\n<p>Job Type: Full time</p>\n<p>Workplace Type: Hybrid</p>\n<p><strong>Compensation</strong></p>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>The Future of Computing Research team is an applied research team in the Consumer Devices group focused on developing new methods and models to support our vision as we advance forward in our mission of building AGI that benefits all of humanity.</p>\n<p><strong>About the Role</strong></p>\n<p>As a Technical Lead on the Future of Computing Research team, you will work together with both the best ML researchers in the world and the greatest design talent of our generation to push the frontier of model capabilities.</p>\n<p><strong>This role is based in San Francisco, CA. 
We follow a hybrid model with 4 days a week in the office and offer relocation assistance to new employees.</strong></p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Evaluate and select silicon platforms (GPUs, NPUs, and specialized accelerators) for on-device and edge deployment of OpenAI models.</li>\n</ul>\n<ul>\n<li>Work closely with research teams to co-design model architectures that meet real-world deployment constraints such as latency, memory, power, and bandwidth.</li>\n</ul>\n<ul>\n<li>Analyze and model system performance, identifying tradeoffs between model design, memory hierarchy, compute throughput, and hardware capabilities.</li>\n</ul>\n<ul>\n<li>Partner with hardware vendors and internal infrastructure teams to bring up new accelerators and ensure efficient execution of transformer workloads.</li>\n</ul>\n<ul>\n<li>Build and lead a team of engineers responsible for implementing the low-level inference stack, including kernel development and runtime systems.</li>\n</ul>\n<ul>\n<li>Run through the necessary walls to take nascent research capabilities and turn them into capabilities we can build on top of.</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Have experience evaluating or deploying workloads on GPUs, NPUs, or other specialized accelerators.</li>\n</ul>\n<ul>\n<li>Understand the performance characteristics of transformer models, including attention, KV-cache behavior, and memory bandwidth requirements.</li>\n</ul>\n<ul>\n<li>Have designed or optimized high-performance compute systems, such as inference engines, distributed runtimes, or hardware-aware ML pipelines.</li>\n</ul>\n<ul>\n<li>Have experience building or leading teams working on low-level performance-critical software such as CUDA kernels, compilers, or ML runtimes.</li>\n</ul>\n<ul>\n<li>Have already spent time in the weeds teaching models to speak and perceive.</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p><strong>Salary</strong></p>\n<p>Compensation Range: $445K</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0f00522c-1ea","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://openai.com/","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/a653b035-a866-4a5c-9c2a-fda3c2950eee","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"Full time","x-salary-range":"$445K","x-skills-required":["Experience evaluating or deploying workloads on GPUs, NPUs, or other specialized accelerators","Understanding the performance characteristics of transformer models, including attention, KV-cache behavior, and memory bandwidth requirements","Designing or optimizing high-performance compute systems, such as inference engines, distributed runtimes, or hardware-aware ML pipelines","Building or leading teams working on low-level performance-critical software such as CUDA kernels, compilers, or ML runtimes","Teaching models to speak and perceive"],"x-skills-preferred":[],"datePosted":"2026-04-24T12:20:13.092Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Experience evaluating or deploying workloads on GPUs, NPUs, or other specialized accelerators, Understanding the performance characteristics of transformer models, including attention, KV-cache behavior, and memory bandwidth requirements, Designing or optimizing high-performance compute systems, such as inference engines, distributed runtimes, or hardware-aware ML pipelines, Building or leading teams working on low-level performance-critical software such as CUDA kernels, compilers, or ML runtimes, Teaching models to speak and perceive","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":445000,"maxValue":445000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ac45e205-e7d"},"title":"Engineering Manager, Inference Routing and Performance","description":"<p><strong>About the role</strong></p>\n<p>Every request that hits Claude , from claude.ai, the API, our cloud partners, or internal research , passes through a routing decision. Not a generic load balancer round-robin, but a decision that accounts for what&#39;s already cached where, which accelerator the request runs best on, and what else is in flight across the fleet.</p>\n<p>The Inference Routing team owns this layer. 
We build the cluster-level routing and coordination plane for Anthropic&#39;s inference fleet , the system that sits between the API surface and the inference engines themselves, making fleet-wide efficiency decisions in real time.</p>\n<p><strong>Representative work:</strong></p>\n<p>Things the Inference Routing EM actually spends time on:</p>\n<ul>\n<li>Deciding whether a proposed routing algorithm change is worth the deploy risk, given the modeled throughput gain and the blast radius if it regresses</li>\n</ul>\n<ul>\n<li>Sequencing a quarter where KV-cache offload, a new coordination protocol, and two model launches all compete for the same engineers</li>\n</ul>\n<ul>\n<li>Working through a persistent tail-latency regression with the team , walking down from fleet-level metrics to per-replica behavior to a root cause in the networking stack</li>\n</ul>\n<ul>\n<li>Building the case (with numbers) to peer teams for why a cross-team protocol change unlocks the next efficiency win</li>\n</ul>\n<ul>\n<li>Running the post-incident review after a cache-eviction bug caused a capacity event, and turning it into process changes that stick</li>\n</ul>\n<ul>\n<li>Interviewing a candidate who has built schedulers at supercomputing scale, and deciding whether they&#39;d be additive to a team that already goes deep</li>\n</ul>\n<p><strong>Drive system-level performance</strong></p>\n<ul>\n<li>Own the technical roadmap for cluster-level inference efficiency , routing decisions, cache placement and eviction, cross-replica coordination, and the protocols that keep routing and inference engines in sync</li>\n</ul>\n<ul>\n<li>Partner with the inference engine, kernels, and performance teams to identify fleet-level throughput and latency wins, then turn those into shipped improvements with measurable results</li>\n</ul>\n<ul>\n<li>Build the team&#39;s habit of quantitative performance modeling: claim a win only when you can measure it, and know before you ship what the expected effect is</li>\n</ul>\n<p><strong>Deliver reliably and operate cleanly</strong></p>\n<ul>\n<li>Set technical strategy for how routing evolves across heterogeneous hardware (GPUs, TPUs, Trainium) and across all our serving surfaces</li>\n</ul>\n<ul>\n<li>Run the team&#39;s operational backbone , on-call rotation, incident response, postmortem review, deploy safety , so the team can ship aggressively without the system becoming fragile</li>\n</ul>\n<ul>\n<li>Create clarity at a seam: Inference Routing sits between the API surface, the inference engines, and the cloud deployment teams. You&#39;ll make sure commitments are realistic, dependencies are understood, and nobody is surprised</li>\n</ul>\n<p><strong>Build and grow the team</strong></p>\n<ul>\n<li>Develop and retain a strong existing team, and hire against the bar described above: people who can go to the OS and framework level when the problem demands it, and who care about production reliability</li>\n</ul>\n<ul>\n<li>Coach engineers through a roadmap where priorities shift with model launches, new hardware, and scaling demands. We pair a lot here , you&#39;ll help make that collaboration pattern productive</li>\n</ul>\n<ul>\n<li>Pick up slack when it matters. 
This is a small team in a critical path; sometimes the EM is the one unblocking a stuck deploy or synthesizing a design debate</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 5+ years of engineering management experience, ideally with at least part of that leading teams on critical-path production infrastructure at scale</li>\n</ul>\n<ul>\n<li>Have a deep systems background , load balancing, scheduling, cache-coherent distributed state, high-performance networking, or similar. You need enough depth to make architectural calls about routing and efficiency, and to evaluate candidates who go to the kernel and framework level</li>\n</ul>\n<ul>\n<li>Have shipped performance improvements in large-scale systems and can explain, with numbers, what the impact was</li>\n</ul>\n<ul>\n<li>Have run production infrastructure with real operational stakes: on-call, incident response, capacity events, deploy discipline</li>\n</ul>\n<ul>\n<li>Are results-oriented with a bias toward impact, and comfortable working in a space where throughput, latency, stability, and feature velocity all pull in different directions</li>\n</ul>\n<ul>\n<li>Build strong relationships across team boundaries , this is a seam role, and much of the job is making sure other teams can rely on yours</li>\n</ul>\n<ul>\n<li>Are curious about machine learning systems. You don&#39;t need an ML research background, but you should want to learn how transformer inference actually works and how that shapes the systems problems</li>\n</ul>\n<p><strong>Strong candidates may also have:</strong></p>\n<ul>\n<li>Experience with LLM inference serving , KV caching, continuous batching, request scheduling, prefill/decode disaggregation</li>\n</ul>\n<ul>\n<li>Background in cluster schedulers, load balancers, service meshes, or coordination planes at scale</li>\n</ul>\n<ul>\n<li>Familiarity with heterogeneous accelerator fleets (GPU/TPU/Trainium) and how hardware differences affect workload placement</li>\n</ul>\n<ul>\n<li>Experience with GPU/accelerator programming, ML framework internals, or OS-level performance debugging , enough to follow and evaluate the technical work, not necessarily to do it daily</li>\n</ul>\n<ul>\n<li>Led teams at supercomputing or hyperscaler infrastructure scale</li>\n</ul>\n<ul>\n<li>Led teams through rapid-growth periods where hiring and onboarding competed with roadmap delivery</li>\n</ul>\n<p>The annual compensation range for this role is listed below.</p>\n<p>For sales roles, the range provided is the role’s On Target Earnings (“OTE”) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.</p>\n<p>Annual Salary: $405,000-$485,000 USD</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ac45e205-e7d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5155391008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"Annual Salary: $405,000-$485,000 USD","x-skills-required":["engineering management","inference routing","cluster-level routing","cache placement and eviction","cross-replica coordination","protocols","heterogeneous hardware","GPUs","TPUs","Trainium","machine 
learning systems","transformer inference","LLM inference serving","KV caching","continuous batching","request scheduling","prefill/decode disaggregation","cluster schedulers","load balancers","service meshes","coordination planes","GPU/accelerator programming","ML framework internals","OS-level performance debugging"],"x-skills-preferred":[],"datePosted":"2026-04-24T11:25:04.722Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"engineering management, inference routing, cluster-level routing, cache placement and eviction, cross-replica coordination, protocols, heterogeneous hardware, GPUs, TPUs, Trainium, machine learning systems, transformer inference, LLM inference serving, KV caching, continuous batching, request scheduling, prefill/decode disaggregation, cluster schedulers, load balancers, service meshes, coordination planes, GPU/accelerator programming, ML framework internals, OS-level performance debugging","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_588dfb0e-611"},"title":"Solutions Architect - Kubernetes","description":"<p>As a Solutions Architect at CoreWeave, you will play a vital role in helping customers succeed with our cloud infrastructure offerings, focusing on Kubernetes solutions within high-performance compute (HPC) environments.</p>\n<p>Your responsibilities will include serving as the primary technical point of contact for customers, establishing strong technical relationships and ensuring their success with CoreWeave&#39;s cloud infrastructure offerings.</p>\n<p>You will collaborate closely with customers to understand their unique business needs and create, prototype, and deploy tailored solutions that align with their requirements.</p>\n<p>You will lead proof of concept initiatives to showcase the value and viability of CoreWeave&#39;s solutions within specific environments.</p>\n<p>You will drive technical leadership and direction during customer meetings, presentations, and workshops, addressing any technical queries or concerns that arise.</p>\n<p>You will act as a virtual member of CoreWeave&#39;s Kubernetes product and engineering teams, identifying opportunities for product enhancement and collaborating with engineers to implement your suggestions.</p>\n<p>You will offer valuable insights on product features, functionality, and performance, contributing regularly to discussions about product strategy and architecture.</p>\n<p>You will conduct periodic technical reviews and assessments of customer workloads, pinpointing opportunities for workload optimization and suggesting suitable solutions.</p>\n<p>You will stay informed of the latest developments and trends in Kubernetes, cloud computing and infrastructure, sharing your thought leadership with customers and internal stakeholders.</p>\n<p>You will lead the prototyping and initiation of research and development efforts for emerging products and solutions, delivering prototypes and key insights for internal consumption.</p>\n<p>You will represent CoreWeave at conferences and industry events, with occasional travel as required.</p>\n<p>To be successful in this role, you will need to have a B.S. 
in Computer Science or a related technical discipline, or equivalent experience.</p>\n<p>You will also need to have 7+ years of proven experience as a Solutions Architect, engineer, researcher, or technical account manager in cloud infrastructure, focusing on building distributed systems or HPC/cloud services, with an expertise focused on scalable Kubernetes solutions.</p>\n<p>You will need to be fluent in cloud computing concepts, architecture, and technologies with hands-on experience in designing and implementing cloud solutions.</p>\n<p>You will need to have a proven track record with building customer relationships, communicating clearly and the ability to break down complex technical concepts to both technical and non-technical audiences.</p>\n<p>You will need to be familiar with NVIDIA GPUs typically used in AI/ML applications and associated technologies such as Infiniband and NVIDIA Collective Communications Library (NCCL).</p>\n<p>You will need to have experience with running large-scale Artificial Intelligence/Machine Learning (AI/ML) training and inference workloads on technologies such as Slurm and Kubernetes.</p>\n<p>Preferred qualifications include code contributions to open-source inference frameworks, experience with scripting and automation related to Kubernetes clusters and workloads, experience with building solutions across multi-cloud environments, and client or customer-facing publications/talks on latency, optimization, or advanced model-server architectures.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_588dfb0e-611","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4557835006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $220,000","x-skills-required":["Kubernetes","Cloud Computing","High-Performance Compute (HPC)","Distributed Systems","Cloud Infrastructure","Scalable Solutions","NVIDIA GPUs","Infiniband","NVIDIA Collective Communications Library (NCCL)","Slurm","Kubernetes Clusters"],"x-skills-preferred":["Code Contributions to Open-Source Inference Frameworks","Scripting and Automation Related to Kubernetes Clusters and Workloads","Building Solutions Across Multi-Cloud Environments","Client or Customer-Facing Publications/Talks on Latency, Optimization, or Advanced Model-Server Architectures"],"datePosted":"2026-04-18T15:57:29.779Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Cloud Computing, High-Performance Compute (HPC), Distributed Systems, Cloud Infrastructure, Scalable Solutions, NVIDIA GPUs, Infiniband, NVIDIA Collective Communications Library (NCCL), Slurm, Kubernetes Clusters, Code Contributions to Open-Source Inference Frameworks, Scripting and Automation Related to Kubernetes Clusters and Workloads, Building Solutions Across Multi-Cloud Environments, Client or Customer-Facing Publications/Talks on Latency, Optimization, or Advanced Model-Server 
Architectures","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":220000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d799d883-0dd"},"title":"Solutions Architect- Networking","description":"<p>As a Solutions Architect at CoreWeave, you will play a vital role in leading innovation at every turn. You will have the opportunity to demonstrate thought leadership and engage hands-on throughout our customers&#39; entire lifecycle. From establishing their Kubernetes environment to developing proofs of concept, onboarding, and optimizing workloads, you will lead innovation at every turn.</p>\n<p>In this role, you will:</p>\n<p>Serve as the primary technical point of contact for customers, establishing strong technical relationships and ensuring their success with CoreWeave&#39;s cloud infrastructure offerings, focusing on networking technologies within high-performance compute (HPC) environments Collaborate closely with customers to understand their unique business needs and create, prototype, and deploy tailored solutions that align with their requirements. Lead proof of concept initiatives to showcase the value and viability of CoreWeave&#39;s solutions within specific environments. Drive technical leadership and direction during customer meetings, presentations, and workshops, addressing any technical queries or concerns that arise. Act as a virtual member of CoreWeave&#39;s Networking product and engineering teams, identifying opportunities for product enhancement and collaborating with engineers to implement your suggestions. Offer valuable insights on product features, functionality, and performance, contributing regularly to discussions about product strategy and architecture. Conduct periodic technical reviews and assessments of customer workloads, pinpointing opportunities for workload optimization and suggesting suitable solutions. Stay informed of the latest developments and trends in Kubernetes, cloud computing and infrastructure, sharing your thought leadership with customers and internal stakeholders. Lead the prototyping and initiation of research and development efforts for emerging products and solutions, delivering prototypes and key insights for internal consumption. Represent CoreWeave at conferences and industry events, with occasional travel as required.</p>\n<p>Who You Are:</p>\n<p>B.S. in Computer Science or a related technical discipline, or equivalent experience 7+ years of proven experience as a Solutions Architect, engineer, researcher, or technical account manager in cloud infrastructure focusing on building distributed systems or HPC/cloud services, with an expertise focused on infrastructure networking. Fluency in cloud computing concepts, architecture, and technologies with hands-on experience in designing and implementing cloud solutions Proven track record with building customer relationships, communicating clearly and the ability to break down complex technical concepts to both technical and non-technical audiences Expertise with a broad range of networking technologies and topics, with a familiarity to understand the needs and use cases is it relates to securing and enabling high performance networking environments. 
Experience with managing infrastructure networking, Kubernnetes CSI management, and private networking concepts Familiar with NVIDIA GPUs typically used in AI/ML applications and associated technologies such as Infiniband and NVIDIA Collective Communications Library (NCCL)</p>\n<p>Preferred:</p>\n<p>Code contributions to open-source inference frameworks Experience with scripting and automation related to network technologies Experience with building solutions across multi-cloud environments Client or customer-facing publications/talks on latency, optimization, or advanced model-server architectures</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d799d883-0dd","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4568528006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $220,000","x-skills-required":["cloud computing","Kubernetes","infrastructure networking","high-performance computing","networking technologies","NVIDIA GPUs","Infiniband","NVIDIA Collective Communications Library (NCCL)"],"x-skills-preferred":["open-source inference frameworks","scripting and automation","multi-cloud environments","latency, optimization, or advanced model-server architectures"],"datePosted":"2026-04-18T15:56:27.053Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud computing, Kubernetes, infrastructure networking, high-performance computing, networking technologies, NVIDIA GPUs, Infiniband, NVIDIA Collective Communications Library (NCCL), open-source inference frameworks, scripting and automation, multi-cloud environments, latency, optimization, or advanced model-server architectures","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":220000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9166d234-4c5"},"title":"Solutions Architect - HPC/AI/ML","description":"<p>As a Solutions Architect at CoreWeave, you will play a vital and dynamic role in helping customers establish their Kubernetes environment, develop proofs of concept, onboard, and optimise workloads. You will serve as the primary technical point of contact for customers, establishing strong technical relationships and ensuring their success with CoreWeave&#39;s cloud infrastructure offerings, focusing on AI/ML workloads within high-performance compute (HPC) environments.</p>\n<p>Collaborate closely with customers to understand their unique business needs and create, prototype, and deploy tailored solutions that align with their requirements. Lead proof of concept initiatives to showcase the value and viability of CoreWeave&#39;s solutions within specific environments.</p>\n<p>Drive technical leadership and direction during customer meetings, presentations, and workshops, addressing any technical queries or concerns that arise. 
Act as a virtual member of CoreWeave&#39;s Kubernetes product and engineering teams, identifying opportunities for product enhancement and collaborating with engineers to implement your suggestions.</p>\n<p>Offer valuable insights on product features, functionality, and performance, contributing regularly to discussions about product strategy and architecture. Conduct periodic technical reviews and assessments of customer workloads, pinpointing opportunities for workload optimisation and suggesting suitable solutions.</p>\n<p>Stay informed of the latest developments and trends in Kubernetes, cloud computing and infrastructure, sharing your thought leadership with customers and internal stakeholders. Lead the prototyping and initiation of research and development efforts for emerging products and solutions, delivering prototypes and key insights for internal consumption.</p>\n<p>Represent CoreWeave at conferences and industry events, with occasional travel as required.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_9166d234-4c5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4649044006","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $225,000 SGD","x-skills-required":["cloud computing concepts","architecture","technologies","NVIDIA GPUs","Infiniband","NVIDIA Collective Communications Library (NCCL)","Slurm","Kubernetes"],"x-skills-preferred":["code contributions to open-source inference frameworks","scripting and automation related to AI/ML workloads","building solutions across multi-cloud environments","client or customer-facing publications/talks on latency, optimisation, or advanced model-server architectures"],"datePosted":"2026-04-18T15:51:30.371Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Singapore"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud computing concepts, architecture, technologies, NVIDIA GPUs, Infiniband, NVIDIA Collective Communications Library (NCCL), Slurm, Kubernetes, code contributions to open-source inference frameworks, scripting and automation related to AI/ML workloads, building solutions across multi-cloud environments, client or customer-facing publications/talks on latency, optimisation, or advanced model-server architectures","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":225000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a8092b6e-7f5"},"title":"Bare Metal Support Engineer","description":"<p>As a Bare Metal Support Engineer at CoreWeave, you will be responsible for supporting, operating, and maintaining CoreWeave&#39;s extensive GPU fleet across our growing data centers in the U.S., Europe, and beyond.</p>\n<p>You will work closely with customers, data center technicians, and engineering teams to ensure the reliability, performance, and scalability of our infrastructure.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Providing high-level support for customers utilizing bare-metal GPU fleets on CoreWeave 
Cloud.</li>\n<li>Diagnosing, triaging, and investigating reported customer issues and high-priority incidents, identifying root causes and escalating when necessary.</li>\n<li>Developing a deep understanding of customer workloads and use cases to provide tailored technical support.</li>\n<li>Coordinating remote troubleshooting and hardware interventions with Data Center Technicians.</li>\n<li>Creating and maintaining internal documentation, including troubleshooting guides, best practices, and knowledge base articles.</li>\n<li>Participating in an on-call rotation to support production clusters and ensure operational reliability.</li>\n<li>Collaborating with engineering teams to improve hardware reliability, software stability, and system performance.</li>\n<li>Implementing automation and scripting to streamline support workflows and reduce manual interventions.</li>\n<li>Performing in-depth log analysis and debugging across multiple layers of the stack (firmware, drivers, hardware).</li>\n<li>Providing feedback to internal teams on common support issues to drive continuous improvements.</li>\n<li>Working with networking teams to troubleshoot connectivity issues affecting customer workloads.</li>\n<li>Supporting supercomputing infrastructure running GPU workloads at scale.</li>\n<li>Driving operational excellence by refining internal processes and support methodologies.</li>\n</ul>\n<p>To succeed in this role, you will need:</p>\n<ul>\n<li>Experience in data centers, GPU clusters, server deployments, system administration, or hardware troubleshooting.</li>\n<li>Demonstrated experience driving resolutions and continuous improvements across cross-functional environments and teams within a data center environment.</li>\n<li>Intermediate knowledge of Linux (Ubuntu, CentOS, or similar), including command-line proficiency.</li>\n<li>Experience with NVIDIA GPUs, SuperMicro systems, Dell systems, high-performance computing (HPC), and large-scale data center environments.</li>\n<li>Experience in networking fundamentals (TCP/IP, VLANs, DNS, DHCP) and troubleshooting tools.</li>\n<li>Hands-on experience with firmware updates, BIOS configurations, and driver management.</li>\n<li>Experience analyzing system logs and debugging issues across firmware, drivers, and hardware layers.</li>\n<li>Experience working with Jira, Confluence, Notion, or other issue-tracking and documentation platforms.</li>\n<li>Experience in scripting and automation (Python, Bash, Ansible, or similar).</li>\n</ul>\n<p>If you&#39;re a curious and analytical individual with a passion for problem-solving and a desire to work in a fast-paced environment, we&#39;d love to hear from you!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a8092b6e-7f5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4560350006","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$83,000 to $132,000","x-skills-required":["Linux","GPU clusters","server deployments","system administration","hardware troubleshooting","NVIDIA GPUs","SuperMicro systems","Dell systems","high-performance computing","large-scale data center environments","networking fundamentals","troubleshooting tools","firmware updates","BIOS 
configurations","driver management","system logs","debugging issues","Jira","Confluence","Notion","issue-tracking","documentation platforms","scripting","automation"],"x-skills-preferred":["Kubernetes","Docker","containerized infrastructure"],"datePosted":"2026-04-18T15:49:58.535Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux, GPU clusters, server deployments, system administration, hardware troubleshooting, NVIDIA GPUs, SuperMicro systems, Dell systems, high-performance computing, large-scale data center environments, networking fundamentals, troubleshooting tools, firmware updates, BIOS configurations, driver management, system logs, debugging issues, Jira, Confluence, Notion, issue-tracking, documentation platforms, scripting, automation, Kubernetes, Docker, containerized infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":83000,"maxValue":132000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_59e88547-efc"},"title":"Senior Software Engineer, Systems","description":"<p>About Anthropic</p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole.</p>\n<p>About the Role</p>\n<p>Anthropic&#39;s Infrastructure organization is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users , demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand. 
The Systems engineering team owns compute uptime and resilience at massive scale, building the clusters, automation, and observability that make frontier AI research possible and safely deployable to customers.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Lead infrastructure projects from design through delivery, owning scope, execution, and outcomes</li>\n<li>Build and maintain systems that support AI clusters at massive scale (thousands to hundreds of thousands of machines)</li>\n<li>Partner with cloud providers and internal teams to solve compute, networking, and reliability challenges</li>\n<li>Tackle difficult technical problems in your domain and proactively fill gaps in tooling, documentation, and processes</li>\n<li>Contribute to operational practices including incident response, postmortems, and on-call rotations</li>\n</ul>\n<p>Benefits</p>\n<ul>\n<li>Competitive compensation and benefits</li>\n<li>Optional equity donation matching</li>\n<li>Generous vacation and parental leave</li>\n<li>Flexible working hours</li>\n<li>Lovely office space in which to collaborate with colleagues</li>\n</ul>\n<p>Requirements</p>\n<ul>\n<li>6+ years of software engineering experience</li>\n<li>Have led technical projects end-to-end over multiple months, including scoping, breaking down work, and driving delivery</li>\n<li>Have deep knowledge of distributed systems, reliability, and cloud platforms (Kubernetes, IaC, AWS/GCP)</li>\n<li>Are strong in at least one systems language (Python, Rust, Go, Java)</li>\n<li>Solve hard problems independently and know when to pull others in</li>\n<li>Help teammates grow through knowledge sharing and thoughtful technical guidance</li>\n<li>Communicate clearly in design docs, presentations, and cross-functional discussions</li>\n</ul>\n<p>Preferred Qualifications</p>\n<ul>\n<li>Security and privacy best practice expertise</li>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Low level systems experience, for example linux kernel tuning and eBPF</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_59e88547-efc","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4915842008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£240,000-£325,000 GBP","x-skills-required":["Distributed systems","Reliability","Cloud platforms","Kubernetes","IaC","AWS/GCP","Systems language","Python","Rust","Go","Java"],"x-skills-preferred":["Security and privacy best practice","Machine learning infrastructure","GPUs","TPUs","Trainium","Networking infrastructure","NCCL","Low level systems experience","Linux kernel tuning","eBPF"],"datePosted":"2026-04-18T15:48:47.617Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Distributed systems, Reliability, Cloud platforms, Kubernetes, IaC, AWS/GCP, Systems language, Python, Rust, Go, Java, Security and privacy best practice, Machine learning infrastructure, GPUs, TPUs, Trainium, Networking infrastructure, NCCL, Low level systems experience, Linux 
kernel tuning, eBPF","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":240000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9ecceef8-349"},"title":"Research Engineer/Research Scientist, Audio","description":"<p>We are seeking a Research Engineer/Research Scientist to join our Audio team. As a member of this team, you will work across the full stack of audio ML, developing audio codecs and representations, sourcing and synthesizing high-quality audio data, training large-scale speech language models and large audio diffusion models, and developing novel architectures for incorporating continuous signals into LLMs.</p>\n<p>Our team focuses primarily but not exclusively on speech, building advanced steerable systems spanning end-to-end conversational systems, speech and audio understanding models, and speech synthesis capabilities. The team works closely with many collaborators across pretraining, finetuning, reinforcement learning, production inference, and product to get advanced audio technologies from early research to high-impact real-world deployments.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Develop and train audio models, including conversational speech-to-speech, speech translation, speech recognition, text-to-speech, diarization, codecs, and generative audio models</li>\n<li>Work across abstraction levels, from signal processing fundamentals to large-scale model training and inference optimization</li>\n<li>Collaborate with teams across the company to develop and deploy audio technologies</li>\n<li>Communicate clearly and effectively with colleagues and stakeholders</li>\n</ul>\n<p>Strong candidates may also have experience with:</p>\n<ul>\n<li>Large language model pretraining and finetuning</li>\n<li>Training diffusion models for image and audio generation</li>\n<li>Reinforcement learning for large language models and diffusion models</li>\n<li>End-to-end system optimization, from performance benchmarking to kernel optimization</li>\n<li>GPUs, Kubernetes, PyTorch, or distributed training infrastructure</li>\n</ul>\n<p>Representative projects:</p>\n<ul>\n<li>Training state-of-the-art neural audio codecs for 48 kHz stereo audio</li>\n<li>Developing novel algorithms for diffusion pretraining and reinforcement learning</li>\n<li>Scaling audio datasets to millions of hours of high-quality audio</li>\n<li>Creating robust evaluation methodologies for hard-to-measure qualities such as naturalness or expressiveness</li>\n<li>Studying training dynamics of mixed audio-text language models</li>\n<li>Optimizing latency and inference throughput for deployed streaming audio systems</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_9ecceef8-349","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5074815008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000-$500,000 USD","x-skills-required":["JAX","PyTorch","large-scale distributed training","signal processing fundamentals","speech language models","audio diffusion models","continuous 
signals","LLMs"],"x-skills-preferred":["large language model pretraining","diffusion models","reinforcement learning","end-to-end system optimization","GPUs","Kubernetes","distributed training infrastructure"],"datePosted":"2026-04-18T15:42:59.425Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"JAX, PyTorch, large-scale distributed training, signal processing fundamentals, speech language models, audio diffusion models, continuous signals, LLMs, large language model pretraining, diffusion models, reinforcement learning, end-to-end system optimization, GPUs, Kubernetes, distributed training infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":500000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_6be97e03-e54"},"title":"ML HW-SW Co-Design Software Tech Lead Manager (TLM)","description":"<p>We are seeking a highly motivated ML Software Tech Lead Manager to join our HW-SW Co-design team. As a technical expert, you will lead a small, high-impact team to drive advances in machine learning acceleration. Your primary responsibilities will include direct technical contribution, technical team leadership, architectural alignment, HW-SW strategy, and execution management.</p>\n<p>You will spend a significant portion of your time on technical execution while managing a multi-disciplinary team to evolve our software stack. You will directly contribute to the codebase and technical strategy, focusing on acting as a Mountain View-based bridge between our co-design team and the Gemini core team.</p>\n<p>You will lead a small team of ML software engineers across numerics, performance optimization, novel training techniques, and novel model exploration. You will drive team cohesion by synthesizing fragmented technical opinions into a single, high-quality execution plan.</p>\n<p>You will partner closely with the hardware team to define requirements for next-generation ML accelerators. You will oversee technical execution across a virtual team including Google-internal and external partners.</p>\n<p>We value diversity of experience, knowledge, backgrounds, and perspectives and harness these qualities to create extraordinary impact. 
We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_6be97e03-e54","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Google DeepMind","sameAs":"https://deepmind.com/","logo":"https://logos.yubhub.co/deepmind.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/deepmind/jobs/7509867","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["high-performance software","AI/ML","technical leadership","architectural alignment","HW-SW strategy","execution management"],"x-skills-preferred":["Master's or Ph.D. in a related field","hands-on experience with high-performance compute IPs (GPUs, ML accelerators)","experience contributing to silicon development","expertise in at least one core silicon engineering discipline (e.g., RTL, PD, DV) and familiarity with the full ASIC flow"],"datePosted":"2026-04-18T15:42:14.076Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View, California, US"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"high-performance software, AI/ML, technical leadership, architectural alignment, HW-SW strategy, execution management, Master's or Ph.D. in a related field, hands-on experience with high-performance compute IPs (GPUs, ML accelerators), experience contributing to silicon development, expertise in at least one core silicon engineering discipline (e.g., RTL, PD, DV) and familiarity with the full ASIC flow"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c1dcea75-d5a"},"title":"Member of Technical Staff - Infrastructure Engineer","description":"<p>We&#39;re looking for an experienced engineer to join our team in Freiburg, Germany or San Francisco, USA. As a Member of Technical Staff - Infrastructure Engineer, you will be responsible for maintaining and scaling our research infrastructure, ensuring health and optimizing components to extract peak performance from the system. 
You will also collaborate with research teams to deeply understand their infrastructure needs and design solutions that balance performance with cost efficiency.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Maintaining research infrastructure, ensuring health, and optimizing components to extract peak performance from the system (both on application and infrastructure side)</li>\n<li>Scaling infrastructure to meet growing research demands while maintaining reliability and performance</li>\n<li>Collaborating with research teams to deeply understand their infrastructure needs, and design solutions that balance performance with cost efficiency</li>\n<li>Identifying and resolving performance bottlenecks and capacity hotspots through deep analysis of distributed systems at scale</li>\n<li>Building and evolving telemetry and monitoring systems to provide deep visibility into infrastructure performance, utilization, and costs across our cloud and datacenter fleets</li>\n<li>Participating in on-call rotations and incident response to maintain system reliability</li>\n</ul>\n<p>Technical focus includes:</p>\n<ul>\n<li>Python, Bash, Go</li>\n<li>Kubernetes</li>\n<li>Nvidia GPU drivers and operators</li>\n<li>OTel, Prometheus</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>Experience building or operating large-scale training platforms</li>\n<li>Worked with large-scale compute clusters (GPUs)</li>\n<li>Proven ability to debug performance and reliability issues across large distributed fleets</li>\n<li>Strong problem-solving skills and ability to work independently</li>\n<li>Strong communication skills and the ability to work effectively with both internal and external partners</li>\n<li>Deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP</li>\n<li>Experience with SLURM</li>\n</ul>\n<p>We offer a competitive base annual salary of $180,000-$300,000 USD and a hybrid work model with a meaningful in-person presence.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_c1dcea75-d5a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Black Forest Labs","sameAs":"https://www.blackforestlabs.com/","logo":"https://logos.yubhub.co/blackforestlabs.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/blackforestlabs/jobs/4925659008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000-$300,000 USD","x-skills-required":["Python","Bash","Go","Kubernetes","Nvidia GPU drivers","Nvidia GPU operators","OTel","Prometheus","Experience building or operating large-scale training platforms","Worked with large-scale compute clusters (GPUs)","Proven ability to debug performance and reliability issues across large distributed fleets","Strong problem-solving skills and ability to work independently","Strong communication skills and the ability to work effectively with both internal and external partners","Deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP","Experience with SLURM"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:25:55.745Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Freiburg (Germany), San Francisco (USA)"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Bash, Go, Kubernetes, Nvidia GPU 
drivers, Nvidia GPU operators, OTel, Prometheus, Experience building or operating large-scale training platforms, Worked with large-scale compute clusters (GPUs), Proven ability to debug performance and reliability issues across large distributed fleets, Strong problem-solving skills and ability to work independently, Strong communication skills and the ability to work effectively with both internal and external partners, Deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP, Experience with SLURM","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":300000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0a7113f5-76c"},"title":"Engineering Manager, Cloud Inference AWS","description":"<p><strong>About the role</strong></p>\n<p>We are seeking an experienced Engineering Manager to lead the Cloud Inference team for AWS. You will lead your team to scale and optimize Claude to serve the massive audiences of developers and enterprise companies using AWS. You will own the end-to-end product of Claude on AWS, including API, load balancing, inference, capacity and operations. Your team will ensure our LLMs meet rigorous performance, safety and security standards and enhance our core infrastructure for packaging, testing, and deploying inference technology across the globe. Your work will increase the scale at which Anthropic operates and accelerate our ability to reliably launch new frontier models and innovative features to customers across all platforms.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Set technical strategy and oversee development of Claude on AWS across all layers of the technical stack.</li>\n<li>Collaborate across teams and companies to deeply understand product, infrastructure, operations and capacity needs, identifying potential solutions to support frontier LLM serving</li>\n<li>Work closely with cross-functional stakeholders across companies to align on goals and drive outcomes</li>\n<li>Create clarity for the team and stakeholders in an ambiguous and evolving environment</li>\n<li>Take an inclusive approach to hiring and coaching top technical talent, and support a high performing team</li>\n<li>Design and run processes (e.g. 
postmortem review, incident response, on-call rotations) that help the team operate effectively and never fail the same way twice</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 10+ years of experience in high-scale, high-reliability software development, particularly infrastructure or capacity management</li>\n<li>Have 5+ years of engineering management experience</li>\n<li>Experience recruiting, scaling, and retaining engineering talent in a high growth environment</li>\n<li>Have experience scaling products, resources and operations to accommodate rapid growth</li>\n<li>Are deeply interested in the potential transformative effects of advanced AI systems and are committed to ensuring their safe development</li>\n<li>Excel at building strong relationships and strategy with stakeholders across engineering, product, finance, and sales</li>\n<li>Have experience working with external partners to align goals and deliver impact</li>\n<li>Enjoy working in a fast-paced, early environment; comfortable with adapting priorities as driven by the rapidly evolving AI space</li>\n<li>Have excellent written and verbal communication skills</li>\n<li>Demonstrated success building a culture of belonging and engineering excellence</li>\n<li>Are motivated by developing AI responsibly and safely</li>\n<li>Are willing and able to travel frequently between Seattle and the SF Bay Area</li>\n</ul>\n<p><strong>Strong candidates may also have experience with:</strong></p>\n<ul>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Experience as a Product Manager</li>\n<li>Experience with deployment and capacity management automation</li>\n<li>Security and privacy best practice expertise</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong> Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>\n<p><strong>Your safety matters to us.</strong> To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. 
If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as a collaborative effort, and we work closely with other researchers, engineers, and experts to advance our understanding of AI and its applications.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0a7113f5-76c","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5141377008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$405,000 - $485,000 USD","x-skills-required":["high-scale, high-reliability software development","infrastructure or capacity management","engineering management","recruiting, scaling, and retaining engineering talent","scaling products, resources and operations","machine learning infrastructure","deployment and capacity management automation","security and privacy best practice expertise"],"x-skills-preferred":["experience with GPUs, TPUs, or Trainium","experience as a Product Manager","experience with networking infrastructure like NCCL"],"datePosted":"2026-03-08T13:56:51.226Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"high-scale, high-reliability software development, infrastructure or capacity management, engineering management, recruiting, scaling, and retaining engineering talent, scaling products, resources and operations, machine learning infrastructure, deployment and capacity management automation, security and privacy best practice expertise, experience with GPUs, TPUs, or Trainium, experience as a Product Manager, experience with networking infrastructure like NCCL","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_25934fbc-c50"},"title":"Staff / Senior Software Engineer, Cloud Inference","description":"<p><strong>About the Role</strong></p>\n<p>The Cloud Inference team scales and optimizes Claude to serve the massive audiences of developers and enterprise companies across AWS, GCP, Azure, and future cloud service providers (CSPs). We own the end-to-end product of Claude on each cloud platform—from API integration and intelligent request routing to inference execution, capacity management, and day-to-day operations.</p>\n<p>Our engineers are extremely high leverage: we simultaneously drive multiple major revenue streams while optimizing one of Anthropic&#39;s most precious resources—compute. 
As we expand to more cloud platforms, the complexity of managing inference efficiently across providers with different hardware, networking stacks, and operational models grows significantly. We need engineers who can navigate these platform differences, build robust abstractions that work across providers, and make smart infrastructure decisions that keep us cost-effective at massive scale.</p>\n<p>Your work will increase the scale at which our services operate, accelerate our ability to reliably launch new frontier models and innovative features to customers across all platforms, and ensure our LLMs meet rigorous safety, performance, and security standards.</p>\n<p><strong>What You&#39;ll Do</strong></p>\n<ul>\n<li>Design and build infrastructure that serves Claude across multiple CSPs, accounting for differences in compute hardware, networking, APIs, and operational models</li>\n<li>Collaborate with CSP partner engineering teams to resolve operational issues, influence provider roadmaps, and stand up end-to-end serving on new cloud platforms</li>\n<li>Design and evolve CI/CD automation systems, including validation and deployment pipelines, that reliably ship new model versions to millions of users across cloud platforms without regressions</li>\n<li>Design interfaces and tooling abstractions across CSPs that enable cost-effective inference management, scale across providers, and reduce per-platform complexity</li>\n<li>Contribute to capacity planning and autoscaling strategies that dynamically match supply with demand across CSP validation and production workloads</li>\n<li>Optimize inference cost and performance across providers—designing workload placement and routing systems that direct requests to the most cost-effective accelerator and region</li>\n<li>Contribute to inference features that must work consistently across all platforms</li>\n<li>Analyze observability data across providers to identify performance bottlenecks, cost anomalies, and regressions, and drive remediation based on real-world production workloads</li>\n</ul>\n<p><strong>You May Be a Good Fit If You:</strong></p>\n<ul>\n<li>Have significant software engineering experience, with a strong background in high-performance, large-scale distributed systems serving millions of users</li>\n<li>Have experience building or operating services on at least one major cloud platform (AWS, GCP, or Azure), with exposure to Kubernetes, Infrastructure as Code or container orchestration</li>\n<li>Have strong interest in inference</li>\n<li>Thrive in cross-functional collaboration with both internal teams and external partners</li>\n<li>Are a fast learner who can quickly ramp up on new technologies, hardware platforms, and provider ecosystems</li>\n<li>Are highly autonomous and self-driven, taking ownership of problems end-to-end with a bias toward flexibility and high-impact work</li>\n<li>Pick up slack, even when it goes outside your job description</li>\n</ul>\n<p><strong>Strong Candidates May Also Have Experience With</strong></p>\n<ul>\n<li>Direct experience working with CSP partner teams to scale infrastructure or products across multiple platforms, navigating differences in networking, security, privacy, billing, and managed service offerings</li>\n<li>A background in building platform-agnostic tooling or abstraction layers that work across cloud providers</li>\n<li>Hands-on experience with capacity management, cost optimization, or resource planning at scale across heterogeneous environments</li>\n<li>Strong familiarity with 
LLM inference optimization, batching, caching, and serving strategies</li>\n<li>Experience with Machine learning infrastructure including GPUs, TPUs, Trainium, or other AI accelerators</li>\n<li>Background designing and building CI/CD systems that automate deployment and validation across cloud environments</li>\n<li>Solid understanding of multi-region deployments, geographic routing, and global traffic management</li>\n<li>Proficiency in Python or Rust</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_25934fbc-c50","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5107466008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$300,000 - $485,000 USD","x-skills-required":["Software engineering","Cloud infrastructure","Kubernetes","Infrastructure as Code","Container orchestration","LLM inference optimization","Batching","Caching","Serving strategies","Machine learning infrastructure","GPUs","TPUs","Trainium","AI accelerators","CI/CD systems","Deployment and validation","Cloud environments","Multi-region deployments","Geographic routing","Global traffic management"],"x-skills-preferred":["Python","Rust","Cloud platforms","Networking","Security","Privacy","Billing","Managed service offerings","Platform-agnostic tooling","Abstraction layers","Capacity management","Cost optimization","Resource planning"],"datePosted":"2026-03-08T13:49:59.956Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Software engineering, Cloud infrastructure, Kubernetes, Infrastructure as Code, Container orchestration, LLM inference optimization, Batching, Caching, Serving strategies, Machine learning infrastructure, GPUs, TPUs, Trainium, AI accelerators, CI/CD systems, Deployment and validation, Cloud environments, Multi-region deployments, Geographic routing, Global traffic management, Python, Rust, Cloud platforms, Networking, Security, Privacy, Billing, Managed service offerings, Platform-agnostic tooling, Abstraction layers, Capacity management, Cost optimization, Resource planning","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":300000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3b20b513-ea1"},"title":"Staff+ Software Engineer, Systems","description":"<p><strong>About 
Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>About the Role</strong></p>\n<p>Anthropic&#39;s Infrastructure organisation is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users — demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand.</p>\n<p>The Systems engineering team owns compute uptime and resilience at massive scale, building the clusters, automation, and observability that make frontier AI research possible and safely deployable to customers.</p>\n<p>_Team Matching: Team matching is determined after the interview process based on interview performance, interests, and business priorities. Please note we may also consider you for different Infrastructure teams._</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Own the technical strategy and roadmap for your area, translating team-level goals into concrete execution plans</li>\n<li>Drive cross-team initiatives to build and scale AI clusters (thousands to hundreds of thousands of machines)</li>\n<li>Define infrastructure architecture, ensuring the hardest problems get solved — whether by you directly or by working through others</li>\n<li>Partner with cloud providers and internal stakeholders to shape long-term compute, data, and infrastructure strategy</li>\n<li>Establish and evolve operational excellence practices (incident response, postmortem culture, on-call)</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 10+ years of software engineering experience</li>\n<li>Have led complex, multi-quarter technical initiatives that span multiple teams or systems</li>\n<li>Can set technical direction for a team, not just execute within it</li>\n<li>Have deep expertise in distributed systems, reliability, and cloud platforms (Kubernetes, IaC, AWS/GCP)</li>\n<li>Are strong in at least one systems language (Python, Rust, Go, Java)</li>\n<li>Naturally uplevel the engineers around you and can redirect efforts when things are heading off track</li>\n<li>Build alignment across senior stakeholders and communicate effectively at all levels</li>\n</ul>\n<p><strong>Strong candidates may have:</strong></p>\n<ul>\n<li>Security and privacy best practice expertise</li>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Low level systems experience, for example linux kernel tuning and eBPF</li>\n<li>Technical expertise: Quickly understanding systems design tradeoffs, keeping track of rapidly evolving software systems</li>\n</ul>\n<p>_Deadline to apply: None. Applications will be reviewed on a rolling basis._</p>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. 
However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</strong></p>\n<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</strong></p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. 
This re</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3b20b513-ea1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5108817008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$405,000 - $485,000 USD","x-skills-required":["distributed systems","reliability","cloud platforms","Kubernetes","IaC","AWS/GCP","Python","Rust","Go","Java"],"x-skills-preferred":["security and privacy best practice expertise","machine learning infrastructure","GPUs","TPUs","Trainium","NCCL","low level systems experience","linux kernel tuning","eBPF"],"datePosted":"2026-03-08T13:49:17.054Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed systems, reliability, cloud platforms, Kubernetes, IaC, AWS/GCP, Python, Rust, Go, Java, security and privacy best practice expertise, machine learning infrastructure, GPUs, TPUs, Trainium, NCCL, low level systems experience, linux kernel tuning, eBPF","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_886a66bf-10d"},"title":"Senior Software Engineer, Systems","description":"<p><strong>About Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>About the Role</strong></p>\n<p>Anthropic&#39;s Infrastructure organisation is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users — demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand.</p>\n<p>The Systems engineering team owns compute uptime and resilience at massive scale, building the clusters, automation, and observability that make frontier AI research possible and safely deployable to customers.</p>\n<p>_Team Matching: Team matching is determined after the interview process based on interview performance, interests, and business priorities. 
Please note we may also consider you for different Infrastructure teams._</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Lead infrastructure projects from design through delivery, owning scope, execution, and outcomes</li>\n<li>Build and maintain systems that support AI clusters at massive scale (thousands to hundreds of thousands of machines)</li>\n<li>Partner with cloud providers and internal teams to solve compute, networking, and reliability challenges</li>\n<li>Tackle difficult technical problems in your domain and proactively fill gaps in tooling, documentation, and processes</li>\n<li>Contribute to operational practices including incident response, postmortems, and on-call rotations</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 6+ years of software engineering experience</li>\n<li>Have led technical projects end-to-end over multiple months, including scoping, breaking down work, and driving delivery</li>\n<li>Have deep knowledge of distributed systems, reliability, and cloud platforms (Kubernetes, IaC, AWS/GCP)</li>\n<li>Are strong in at least one systems language (Python, Rust, Go, Java)</li>\n<li>Solve hard problems independently and know when to pull others in</li>\n<li>Help teammates grow through knowledge sharing and thoughtful technical guidance</li>\n<li>Communicate clearly in design docs, presentations, and cross-functional discussions</li>\n</ul>\n<p><strong>Strong candidates may have:</strong></p>\n<ul>\n<li>Security and privacy best practice expertise</li>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Low level systems experience, for example linux kernel tuning and eBPF</li>\n<li>Technical expertise: Quickly understanding systems design tradeoffs, keeping track of rapidly evolving software systems</li>\n</ul>\n<p>_Deadline to apply: None. Applications will be reviewed on a rolling basis._</p>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</strong></p>\n<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. 
Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</strong></p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_886a66bf-10d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4915842008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£240,000 - £325,000GBP","x-skills-required":["distributed systems","reliability","cloud platforms","Kubernetes","IaC","AWS/GCP","Python","Rust","Go","Java"],"x-skills-preferred":["security and privacy best practice expertise","machine learning infrastructure","GPUs","TPUs","Trainium","NCCL","low level systems experience","linux kernel tuning","eBPF"],"datePosted":"2026-03-08T13:46:27.991Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed systems, reliability, cloud platforms, Kubernetes, IaC, AWS/GCP, Python, Rust, Go, Java, security and privacy best practice expertise, machine learning infrastructure, GPUs, TPUs, Trainium, NCCL, low level systems experience, linux kernel tuning, eBPF","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":240000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_58928a28-64d"},"title":"Research Engineer/Research Scientist, Audio","description":"<p><strong>About Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. 
Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have hands-on experience with training audio models, whether that&#39;s conversational speech-to-speech, speech translation, speech recognition, text-to-speech, diarization, codecs, or generative audio models</li>\n<li>Genuinely enjoy both research and engineering work, and you&#39;d describe your ideal split as roughly 50/50 rather than heavily weighted toward one or the other</li>\n<li>Are comfortable working across abstraction levels, from signal processing fundamentals to large-scale model training and inference optimization</li>\n<li>Have deep expertise with JAX, PyTorch, or large-scale distributed training, and can debug performance issues across the full stack</li>\n<li>Thrive in fast-moving environments where the most important problem might shift as we learn more about what works</li>\n<li>Communicate clearly and collaborate effectively; audio touches many parts of our systems, so you&#39;ll work closely with teams across the company</li>\n<li>Are passionate about building conversational AI that feels natural, steerable, and safe</li>\n<li>Care about the societal impacts of voice AI and want to help shape how these systems are developed responsibly</li>\n</ul>\n<p><strong>Strong candidates may also have experience with:</strong></p>\n<ul>\n<li>Large language model pretraining and finetuning</li>\n<li>Training diffusion models for image and audio generation</li>\n<li>Reinforcement learning for large language models and diffusion models</li>\n<li>End-to-end system optimization, from performance benchmarking to kernel optimization</li>\n<li>GPUs, Kubernetes, PyTorch, or distributed training infrastructure</li>\n</ul>\n<p><strong>Representative projects:</strong></p>\n<ul>\n<li>Training state-of-the art neural audio codecs for 48 kHz stereo audio</li>\n<li>Developing novel algorithms for diffusion pretraining and reinforcement learning</li>\n<li>Scaling audio datasets to millions of hours of high quality audio</li>\n<li>Creating robust evaluation methodologies for hard-to-measure qualities such as naturalness or expressiveness</li>\n<li>Studying training dynamics of mixed audio-text language models</li>\n<li>Optimizing latency and inference throughput for deployed streaming audio systems</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong> Not all strong candidates will meet every single qualification as listed. 
Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>\n<p><strong>Your safety matters to us.</strong> To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI systems that benefit society.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_58928a28-64d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5074815008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000 - $500,000 USD","x-skills-required":["audio models","speech-to-speech","speech translation","speech recognition","text-to-speech","diarization","codecs","generative audio models","JAX","PyTorch","large-scale distributed training"],"x-skills-preferred":["large language model pretraining","training diffusion models","reinforcement learning","end-to-end system optimization","GPUs","Kubernetes","PyTorch","distributed training infrastructure"],"datePosted":"2026-03-08T13:46:24.550Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"audio models, speech-to-speech, speech translation, speech recognition, text-to-speech, diarization, codecs, generative audio models, JAX, PyTorch, large-scale distributed training, large language model pretraining, training diffusion models, reinforcement learning, end-to-end system optimization, GPUs, Kubernetes, PyTorch, distributed training infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":500000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9d8e34bd-10a"},"title":"Research Engineer / Research Scientist, Tokens","description":"<p><strong>About Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. 
Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have significant software engineering experience</li>\n<li>Are results-oriented, with a bias towards flexibility and impact</li>\n<li>Pick up slack, even if it goes outside your job description</li>\n<li>Enjoy pair programming (we love to pair!)</li>\n<li>Want to learn more about machine learning research</li>\n<li>Care about the societal impacts of your work</li>\n</ul>\n<p><strong>Strong candidates may also have experience with:</strong></p>\n<ul>\n<li>High performance, large-scale ML systems</li>\n<li>GPUs, Kubernetes, Pytorch, or OS internals</li>\n<li>Language modeling with transformers</li>\n<li>Reinforcement learning</li>\n<li>Large-scale ETL</li>\n</ul>\n<p><strong>Representative projects:</strong></p>\n<ul>\n<li>Optimizing the throughput of a new attention mechanism</li>\n<li>Comparing the compute efficiency of two Transformer variants</li>\n<li>Making a Wikipedia dataset in a format models can easily consume</li>\n<li>Scaling a distributed training job to thousands of GPUs</li>\n<li>Writing a design doc for fault tolerance strategies</li>\n<li>Creating an interactive visualization of attention between tokens in a language model</li>\n</ul>\n<p><strong>Annual compensation range for this role is listed below.</strong></p>\n<p>Annual Salary:</p>\n<p>$350,000 - $500,000USD</p>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</strong></p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. 
As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI &amp; Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.</p>\n<p><strong>Come work with us!</strong></p>\n<p>Anthropic is a public benefit corporation headquartered in California, USA.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_9d8e34bd-10a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4951814008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000 - $500,000USD","x-skills-required":["software engineering","machine learning research","high performance","large-scale ML systems","GPUs","Kubernetes","Pytorch","OS internals","language modeling","reinforcement learning","large-scale ETL"],"x-skills-preferred":["pair programming","collaboration","communication skills"],"datePosted":"2026-03-08T13:46:19.922Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York City, NY; Seattle, WA; San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, machine learning research, high performance, large-scale ML systems, GPUs, Kubernetes, Pytorch, OS internals, language modeling, reinforcement learning, large-scale ETL, pair programming, collaboration, communication skills","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":500000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_11a60d5a-f54"},"title":"Performance Engineer, GPU","description":"<p><strong>About the role:</strong></p>\n<p>Pioneering the next generation of AI requires breakthrough innovations in GPU performance and systems engineering. As a GPU Performance Engineer, you&#39;ll architect and implement the foundational systems that power Claude and push the frontiers of what&#39;s possible with large language models. You&#39;ll be responsible for maximizing GPU utilization and performance at unprecedented scale, developing cutting-edge optimizations that directly enable new model capabilities and dramatically improve inference efficiency.</p>\n<p>Working at the intersection of hardware and software, you&#39;ll implement state-of-the-art techniques from custom kernel development to distributed system architectures. 
Your work will span the entire stack—from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization.</p>\n<p>Strong candidates will have a track record of delivering transformative GPU performance improvements in production ML systems and will be excited to shape the future of AI infrastructure alongside world-class researchers and engineers.</p>\n<p><strong>You might be a good fit if you:</strong></p>\n<ul>\n<li>Have deep experience with GPU programming and optimization at scale</li>\n<li>Are impact-driven, passionate about delivering measurable performance breakthroughs</li>\n<li>Can navigate complex systems from hardware interfaces to high-level ML frameworks</li>\n<li>Enjoy collaborative problem-solving and pair programming</li>\n<li>Want to work on state-of-the-art language models with real-world impact</li>\n<li>Care about the societal impacts of your work</li>\n<li>Thrive in ambiguous environments where you define the path forward</li>\n</ul>\n<p><strong>Strong candidates may also have experience with:</strong></p>\n<ul>\n<li>GPU Kernel Development: CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization</li>\n<li>ML Compilers &amp; Frameworks: PyTorch/JAX internals, torch.compile, XLA, custom operators</li>\n<li>Performance Engineering: Kernel fusion, memory bandwidth optimization, profiling with Nsight</li>\n<li>Distributed Systems: NCCL, NVLink, collective communication, model parallelism</li>\n<li>Low-Precision: INT8/FP8 quantization, mixed-precision techniques</li>\n<li>Production Systems: Large-scale training infrastructure, fault tolerance, cluster orchestration</li>\n</ul>\n<p><strong>Representative projects:</strong></p>\n<ul>\n<li>Co-design attention mechanisms and algorithms for next-generation hardware architectures</li>\n<li>Develop custom kernels for emerging quantization formats and mixed-precision techniques</li>\n<li>Design distributed communication strategies for multi-node GPU clusters</li>\n<li>Optimize end-to-end training and inference pipelines for frontier language models</li>\n<li>Build performance modeling frameworks to predict and optimize GPU utilization</li>\n<li>Implement kernel fusion strategies to minimize memory bandwidth bottlenecks</li>\n<li>Create resilient systems for planet-scale distributed training infrastructure</li>\n<li>Profile and eliminate performance bottlenecks in production serving infrastructure</li>\n<li>Partner with hardware vendors to influence future accelerator capabilities and software stacks</li>\n</ul>\n<p><strong>Deadline to apply:</strong> None. 
Applications will be reviewed on a rolling basis.</p>\n<p>The expected salary range for this position is:</p>\n<p>Annual Salary: $280,000 - $850,000USD</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_11a60d5a-f54","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4926227008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$280,000 - $850,000USD","x-skills-required":["GPU programming","optimization at scale","custom kernel development","distributed system architectures","low-level tensor core optimizations","orchestrating thousands of GPUs","GPU kernel development","CUDA","Triton","CUTLASS","Flash Attention","tensor core optimization","ML compilers & frameworks","PyTorch/JAX internals","torch.compile","XLA","custom operators","performance engineering","kernel fusion","memory bandwidth optimization","profiling with Nsight","distributed systems","NCCL","NVLink","collective communication","model parallelism","low-precision","INT8/FP8 quantization","mixed-precision techniques","production systems","large-scale training infrastructure","fault tolerance","cluster orchestration"],"x-skills-preferred":["GPU programming","optimization at scale","custom kernel development","distributed system architectures","low-level tensor core optimizations","orchestrating thousands of GPUs","GPU kernel development","CUDA","Triton","CUTLASS","Flash Attention","tensor core optimization","ML compilers & frameworks","PyTorch/JAX internals","torch.compile","XLA","custom operators","performance engineering","kernel fusion","memory bandwidth optimization","profiling with Nsight","distributed systems","NCCL","NVLink","collective communication","model parallelism","low-precision","INT8/FP8 quantization","mixed-precision techniques","production systems","large-scale training infrastructure","fault tolerance","cluster orchestration"],"datePosted":"2026-03-08T13:45:05.412Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"GPU programming, optimization at scale, custom kernel development, distributed system architectures, low-level tensor core optimizations, orchestrating thousands of GPUs, GPU kernel development, CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization, ML compilers & frameworks, PyTorch/JAX internals, torch.compile, XLA, custom operators, performance engineering, kernel fusion, memory bandwidth optimization, profiling with Nsight, distributed systems, NCCL, NVLink, collective communication, model parallelism, low-precision, INT8/FP8 quantization, mixed-precision techniques, production systems, large-scale training infrastructure, fault tolerance, cluster orchestration, GPU programming, optimization at scale, custom kernel development, distributed system architectures, low-level tensor core optimizations, orchestrating thousands of GPUs, GPU kernel development, CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization, ML compilers & frameworks, PyTorch/JAX internals, torch.compile, XLA, custom operators, performance engineering, kernel 
fusion, memory bandwidth optimization, profiling with Nsight, distributed systems, NCCL, NVLink, collective communication, model parallelism, low-precision, INT8/FP8 quantization, mixed-precision techniques, production systems, large-scale training infrastructure, fault tolerance, cluster orchestration","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":280000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5d37a7c7-d2a"},"title":"ML Infrastructure Engineer","description":"<p><strong>About the role</strong></p>\n<p>The ML Infrastructure team at Cursor builds large-scale compute, storage, and software infrastructure to support the company&#39;s work building the world&#39;s best agentic coding model. We&#39;re looking for strong engineers who are interested in building high-performance infrastructure and the software to support it. This role works closely with ML researchers and engineers to enable their work through improvements to our training framework, systems reliability/performance, and developer experience.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<ul>\n<li>Collaborate with ML researchers to improve the throughput and reliability of training</li>\n<li>Work with OEMs, cloud service providers, and others to plan and build cutting-edge GPU infrastructure</li>\n<li>Improve the density and scalability of compute environments to enable increasingly large RL workloads</li>\n<li>Create software and systems to automate building, monitoring, and running GPU clusters</li>\n<li>Build workload scheduling and data movement systems to support Cursor&#39;s growing training footprint</li>\n</ul>\n<p><strong>You may be a fit if you have:</strong></p>\n<ul>\n<li>A strong background in systems and infrastructure-focused software engineering, particularly in Python, Typescript, Rust, and Golang</li>\n<li>Experience with distributed storage and networking infrastructure, particularly on Linux systems across cloud and bare metal environments</li>\n<li>Exposure to large-scale systems and their unique challenges, ideally across thousands of nodes with significant resource footprints</li>\n</ul>\n<p><strong>Nice to have</strong></p>\n<ul>\n<li>Operational exposure to Nvidia GPUs with Infiniband or RoCE, particularly with Blackwell and Hopper-class hardware</li>\n<li>Exposure to Ray, Slurm, or other common compute and runtime schedulers</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_5d37a7c7-d2a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Cursor","sameAs":"https://cursor.com","logo":"https://logos.yubhub.co/cursor.com.png"},"x-apply-url":"https://cursor.com/careers/software-engineer-ml-infrastructure","x-work-arrangement":"remote","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Python","Typescript","Rust","Golang","Distributed storage","Networking infrastructure","Linux systems","Kubernetes"],"x-skills-preferred":["Nvidia GPUs","Infiniband","RoCE","Blackwell","Hopper-class hardware","Ray","Slurm"],"datePosted":"2026-03-08T00:17:18.553Z","jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Typescript, Rust, Golang, Distributed storage, Networking infrastructure, Linux systems, Kubernetes, Nvidia GPUs, Infiniband, RoCE, Blackwell, Hopper-class hardware, Ray, Slurm"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f5e7e195-679"},"title":"Datacenter Hardware Operations Technician, AI Compute Infrastructure - Stargate","description":"<p><strong>Job Posting</strong></p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$86.4K – $228K</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>OpenAI, in close collaboration with our capital partners, is embarking on a journey to build the world’s most advanced AI 
infrastructure ecosystem. Our Stargate program develops and deploys massive, state-of-the-art data center campuses in partnership with industry leaders such as Oracle today—and through future OpenAI infrastructure projects tomorrow. We design for scale, speed, and reliability, and we need experienced hardware professionals who can help ensure our high-density compute environment operates at peak performance.</p>\n<p><strong>About the Role</strong></p>\n<p>We are seeking a senior datacenter hardware operations technician to coordinate physical hardware activities at a large partner-operated campus. In this role you will work side-by-side with Oracle and their delivery teams, helping align OpenAI’s compute requirements with day-to-day hardware work on the ground. Rather than directing partner personnel, you will focus on collaboration, technical alignment, and shared problem solving, ensuring that maintenance, repairs, and lifecycle activities support the performance and reliability goals of both organizations. As the campus matures, you will help capture lessons learned and develop standards and playbooks to guide hardware operations at future OpenAI infrastructure projects.</p>\n<p><em>Candidates must be able to sit onsite in Abilene, Texas 5 days per week</em></p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Serve as OpenAI’s primary on-site hardware contact, collaborating with Oracle teams and vendors to plan and coordinate maintenance, repairs, and lifecycle activities.</li>\n</ul>\n<ul>\n<li>Share technical requirements and verify that work performed supports OpenAI’s compute needs and agreed quality targets.</li>\n</ul>\n<ul>\n<li>Coordinate schedules, spare-parts planning, and issue escalation with partner teams to minimize downtime and keep operations running smoothly.</li>\n</ul>\n<ul>\n<li>Work with OpenAI fleet-health engineers to translate software-detected issues into on-site hardware actions in partnership with Oracle.</li>\n</ul>\n<ul>\n<li>Track hardware trends and provide joint recommendations with partner teams for design or operational improvements.</li>\n</ul>\n<ul>\n<li>Prepare documentation and runbooks that capture joint best practices and can be applied at additional campuses.</li>\n</ul>\n<ul>\n<li>Offer technical guidance and context to partner personnel while respecting their operational ownership.</li>\n</ul>\n<ul>\n<li>Collaborate with supply-chain teams to plan spares and manage hardware lifecycle activities.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Have 7+ years of experience in datacenter hardware operations, hardware engineering, or large-scale server maintenance, with at least 2 years in a senior or lead technician capacity.</li>\n</ul>\n<ul>\n<li>Bring deep knowledge of high-density server hardware, including x86 platforms, GPUs, storage devices, and power/cooling systems.</li>\n</ul>\n<ul>\n<li>Excel at diagnosing hardware issues, coordinating complex repairs, and maintaining strong working relationships across organizations.</li>\n</ul>\n<ul>\n<li>Are comfortable setting technical expectations and validating outcomes through collaboration, not direct management.</li>\n</ul>\n<ul>\n<li>Adapt quickly to changing operational conditions and enjoy solving problems at both the strategic and on-site levels.</li>\n</ul>\n<ul>\n<li>Communicate clearly and build trust across partner teams, vendors, and internal engineering stakeholders.</li>\n</ul>\n<ul>\n<li>Are willing to be based full-time at a partner-operated 
campus</li>\n</ul>\n<p><strong>Preferred Skills</strong></p>\n<ul>\n<li>Familiarity with large-scale cluster management or monitoring tools (IPMI, BMC, Prometheus, Nagios) to interpret alerts and coordinate partner responses.</li>\n</ul>\n<ul>\n<li>Experience with GPU-accelerated compute clusters or other high-performance computing hardware.</li>\n</ul>\n<ul>\n<li>Knowledge of Linux/Unix system administration and command-line diagnostic tools for hardware validation.</li>\n</ul>\n<ul>\n<li>Industry certifications such as CompTIA Server+, OEM hardware certifications, or equivalent.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f5e7e195-679","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/b9a4a809-a965-4dbe-aeef-6ce1593903dd","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$86.4K – $228K","x-skills-required":["datacenter hardware operations","hardware engineering","large-scale server maintenance","high-density server hardware","x86 platforms","GPUs","storage devices","power/cooling systems"],"x-skills-preferred":["large-scale cluster management","monitoring tools","IPMI","BMC","Prometheus","Nagios","GPU-accelerated compute clusters","Linux/Unix system administration","command-line diagnostic tools","industry certifications"],"datePosted":"2026-03-06T18:43:34.654Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote - US"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"datacenter hardware operations, hardware engineering, large-scale server maintenance, high-density server hardware, x86 platforms, GPUs, storage devices, power/cooling systems, large-scale cluster management, monitoring tools, IPMI, BMC, Prometheus, Nagios, GPU-accelerated compute clusters, Linux/Unix system administration, command-line diagnostic tools, industry certifications","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":86400,"maxValue":228000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d5390946-539"},"title":"Software Engineer, Model Inference","description":"<p><strong>Software Engineer, Model Inference</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$295K – $555K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>Our Inference team brings OpenAI’s most capable research and technology to the world through our products. We empower consumers, enterprise and developers alike to use and access our state-of-the-art AI models, allowing them to do things that they’ve never been able to before. 
We focus on performant and efficient model inference, as well as accelerating research progression via model inference.</p>\n<p><strong>About the Role</strong></p>\n<p>We are looking for an engineer who wants to take the world&#39;s largest and most capable AI models and optimize them for use in a high-volume, low-latency, and high-availability production and research environment.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Work alongside machine learning researchers, engineers, and product managers to bring our latest technologies into production.</li>\n</ul>\n<ul>\n<li>Work alongside researchers to enable advanced research through awesome engineering.</li>\n</ul>\n<ul>\n<li>Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of our model inference stack.</li>\n</ul>\n<ul>\n<li>Build tools to give us visibility into our bottlenecks and sources of instability and then design and implement solutions to address the highest priority issues.</li>\n</ul>\n<ul>\n<li>Optimize our code and fleet of Azure VMs to utilize every FLOP and every GB of GPU RAM of our hardware.</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Have an understanding of modern ML architectures and an intuition for how to optimize their performance, particularly for inference.</li>\n</ul>\n<ul>\n<li>Own problems end-to-end, and are willing to pick up whatever knowledge you&#39;re missing to get the job done.</li>\n</ul>\n<ul>\n<li>Have at least 5 years of professional software engineering experience.</li>\n</ul>\n<ul>\n<li>Have or can quickly gain familiarity with PyTorch, NVidia GPUs and the software stacks that optimize them (e.g. NCCL, CUDA), as well as HPC technologies such as InfiniBand, MPI, NVLink, etc.</li>\n</ul>\n<ul>\n<li>Have experience architecting, building, observing, and debugging production distributed systems. Bonus points if you have worked on performance-critical distributed systems.</li>\n</ul>\n<ul>\n<li>Have needed to rebuild or substantially refactor production systems several times over due to rapidly increasing scale.</li>\n</ul>\n<ul>\n<li>Are self-directed and enjoy figuring out the most important problem to work on.</li>\n</ul>\n<ul>\n<li>Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d5390946-539","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/83b6755d-7785-4186-9050-5ef3ad127941","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$295K – $555K • Offers Equity","x-skills-required":["PyTorch","NVidia GPUs","NCCL","CUDA","HPC technologies","InfiniBand","MPI","NVLink","Azure VMs","GPU RAM","FLOP"],"x-skills-preferred":["modern ML architectures","intuition for optimizing performance","distributed systems","performance-critical distributed systems"],"datePosted":"2026-03-06T18:31:29.482Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"PyTorch, NVidia GPUs, NCCL, CUDA, HPC technologies, InfiniBand, MPI, NVLink, Azure VMs, GPU RAM, FLOP, modern ML architectures, intuition for optimizing performance, distributed systems, performance-critical distributed systems","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":295000,"maxValue":555000,"unitText":"YEAR"}}}]}