{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/compute-clusters"},"x-facet":{"type":"skill","slug":"compute-clusters","display":"Compute Clusters","count":6},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_18013f3c-904"},"title":"Cluster Deployment Engineer","description":"<p>As a Cluster Deployment Engineer at Anthropic, you will own how large-scale AI compute clusters physically come together inside our datacenter fleet.</p>\n<p>You will set the deployment-engineering strategy for cluster build-out: how racks are organized into pods, halls, and sites; how compute, network, power, and cooling systems interface at the rack boundary; and how deployment scope flows cleanly from hardware specification to facility delivery to a running cluster.</p>\n<p>This role is focused on deployment engineering, not on datacenter network or systems design; your scope is making sure clusters land cleanly and predictably, not designing the fabrics or facilities themselves.</p>\n<p>You will work across hardware, networking, facilities, supply chain, and construction to ensure that every generation of accelerator we deploy lands in a datacenter that is ready for it: on schedule, at full density, and with every piece of required infrastructure accounted for.</p>\n<p>You will be the person who sees around corners: anticipating how next-generation rack designs will stress our facilities, 
where our deployment model will break at scale, and what needs to change now so that the next cluster turn-up is faster and more predictable than the last.</p>\n<p>You will operate at the intersection of engineering strategy and execution discipline, partnering with internal research and systems teams, external developers, engineering firms, and OEM partners to deliver cluster capacity at the speed the frontier demands.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Own cluster-level deployment strategy: define how AI compute clusters are organized across the floor, how racks interconnect, and how cluster topology requirements translate into facility and deployment scope across a portfolio of sites.</li>\n</ul>\n<ul>\n<li>Set rack interface standards spanning power, network, mechanical, thermal, and spatial domains, and ensure that every deployment includes the complete set of infrastructure required to bring a cluster online.</li>\n</ul>\n<ul>\n<li>Drive multi-threaded cluster bring-up programs across hardware, networking, power, and cooling, owning plans, dependencies, and critical paths from hardware specification through energization and turn-up.</li>\n</ul>\n<ul>\n<li>Partner with internal engineering teams (research, systems, networking, and hardware) to translate cluster requirements into deployable facility scope, and to derisk onboarding of new hardware platforms well ahead of delivery.</li>\n</ul>\n<ul>\n<li>Lead external partner execution with developers, engineering firms, OEMs, and construction teams, driving technical reviews, deviation management, and handoffs that keep deployments on schedule and within specification.</li>\n</ul>\n<ul>\n<li>Improve cluster turn-up reliability and repeatability: identify systemic gaps in deployment scope, tooling, and partner interfaces, and drive durable fixes that reduce time-to-serve for new capacity.</li>\n</ul>\n<ul>\n<li>Define and track deployment KPIs, such as cluster readiness, schedule adherence, scope 
completeness, and time-to-first-packet, and use historical trends to forecast risk and inform capacity planning.</li>\n</ul>\n<ul>\n<li>Coordinate cross-functional readiness across supply chain, security, operations, and construction to ship production-ready compute capacity.</li>\n</ul>\n<ul>\n<li>Provide crisp executive visibility on deployment progress, tradeoffs, and risks across a portfolio of concurrent cluster programs.</li>\n</ul>\n<ul>\n<li>Design cluster interfaces for durability: define rack and cluster-level interfaces that remain robust across hardware generations, so that facility scope and deployment models do not need to be reinvented every time the underlying hardware changes.</li>\n</ul>\n<ul>\n<li>Build cluster layout and BOM tooling: create and maintain the tools, templates, and data models that turn cluster topology and rack specifications into accurate floor layouts, deployment sequences, and complete bills of materials, replacing one-off spreadsheets with repeatable, auditable workflows.</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 10+ years of experience in hyperscale datacenter environments, with senior-level responsibility for cluster deployment, large-scale IT integration, or equivalent infrastructure programs.</li>\n</ul>\n<ul>\n<li>Have delivered AI, HPC, or high-density compute clusters at scale and developed a strong intuition for the constraints that govern cluster deployment: interconnect reach, adjacency, power density, and thermal limits.</li>\n</ul>\n<ul>\n<li>Can operate fluently across the boundary between IT hardware and facility infrastructure, and have set interface standards that held up across multiple hardware generations and sites.</li>\n</ul>\n<ul>\n<li>Have led cross-functional programs with both internal engineering teams and external developers, engineering firms, and OEM partners, and are effective at driving alignment across organizational levels.</li>\n</ul>\n<ul>\n<li>Combine strong systems 
thinking with execution discipline, comfortable zooming from cluster topology and portfolio strategy down to the specific interface detail that will otherwise become a field issue.</li>\n</ul>\n<ul>\n<li>Communicate clearly with technical and executive audiences, and can distill complex, multi-disciplinary programs into decisions and tradeoffs leadership can act on.</li>\n</ul>\n<ul>\n<li>Thrive in ambiguous, fast-moving environments where the hardware, the scale, and the requirements are all changing simultaneously.</li>\n</ul>\n<ul>\n<li>Hold a Bachelor&#39;s degree in Electrical Engineering, Mechanical Engineering, Computer Engineering, or equivalent practical experience.</li>\n</ul>\n<p>Strong candidates may also:</p>\n<ul>\n<li>Have direct experience deploying leading-edge AI accelerator clusters at hyperscale.</li>\n</ul>\n<ul>\n<li>Have shaped reference designs, deployment standards, or cluster-level playbooks that were adopted across a fleet.</li>\n</ul>\n<ul>\n<li>Have experience working across multiple geographies and understand how regional codes, climate, utility constraints, and supply chains shape cluster-level decisions.</li>\n</ul>\n<ul>\n<li>Have partnered closely with hardware and system providers on long-term platform onboarding and bring-up.</li>\n</ul>\n<ul>\n<li>Have experience building the program mechanisms (roadmaps, milestones, risk registers, runbooks) that make delivery predictable at massive scale.</li>\n</ul>\n<p>The annual compensation range for this role is listed below. 
For sales roles, the range provided is the role’s On Target Earnings (“OTE”) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.</p>\n<p>Annual Salary: $320,000-$405,000 USD</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_18013f3c-904","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5191638008","x-work-arrangement":"remote-hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$405,000 USD","x-skills-required":["Hyperscale datacenter environments","Cluster deployment","Large-scale IT integration","Infrastructure programs","AI","HPC","High-density compute clusters","Interconnect reach","Adjacency","Power density","Thermal limits","IT hardware","Facility infrastructure","Interface standards","Cluster topology","Portfolio strategy","Execution discipline","Systems thinking","Communication","Technical audiences","Executive audiences","Complex programs","Decisions","Tradeoffs","Leadership","Bachelor's degree","Electrical Engineering","Mechanical Engineering","Computer Engineering","Practical experience"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:51:42.505Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote-Friendly, United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Hyperscale datacenter environments, Cluster deployment, Large-scale IT integration, Infrastructure programs, AI, HPC, High-density compute clusters, Interconnect reach, Adjacency, Power density, Thermal limits, IT hardware, Facility infrastructure, 
Interface standards, Cluster topology, Portfolio strategy, Execution discipline, Systems thinking, Communication, Technical audiences, Executive audiences, Complex programs, Decisions, Tradeoffs, Leadership, Bachelor's degree, Electrical Engineering, Mechanical Engineering, Computer Engineering, Practical experience","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_60082588-bf0"},"title":"Cluster Deployment Engineer","description":"<p>As a Cluster Deployment Engineer at Anthropic, you will own how large-scale AI compute clusters physically come together inside our datacenter fleet.</p>\n<p>You will set the deployment-engineering strategy for cluster build-out: how racks are organized into pods, halls, and sites; how compute, network, power, and cooling systems interface at the rack boundary; and how deployment scope flows cleanly from hardware specification to facility delivery to a running cluster.</p>\n<p>This role is focused on deployment engineering, not on datacenter network or systems design; your scope is making sure clusters land cleanly and predictably, not designing the fabrics or facilities themselves.</p>\n<p>You will work across hardware, networking, facilities, supply chain, and construction to ensure that every generation of accelerator we deploy lands in a datacenter that is ready for it: on schedule, at full density, and with every piece of required infrastructure accounted for.</p>\n<p>You will be the person who sees around corners: anticipating how next-generation rack designs will stress our facilities, where our deployment model will break at scale, and what needs to change now so that the next cluster turn-up is faster and more predictable than the last.</p>\n<p>You will operate at the intersection of engineering strategy 
and execution discipline, partnering with internal research and systems teams, external developers, engineering firms, and OEM partners to deliver cluster capacity at the speed the frontier demands.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Own cluster-level deployment strategy: define how AI compute clusters are organized across the floor, how racks interconnect, and how cluster topology requirements translate into facility and deployment scope across a portfolio of sites.</li>\n</ul>\n<ul>\n<li>Set rack interface standards spanning power, network, mechanical, thermal, and spatial domains, and ensure that every deployment includes the complete set of infrastructure required to bring a cluster online.</li>\n</ul>\n<ul>\n<li>Drive multi-threaded cluster bring-up programs across hardware, networking, power, and cooling, owning plans, dependencies, and critical paths from hardware specification through energization and turn-up.</li>\n</ul>\n<ul>\n<li>Partner with internal engineering teams (research, systems, networking, and hardware) to translate cluster requirements into deployable facility scope, and to derisk onboarding of new hardware platforms well ahead of delivery.</li>\n</ul>\n<ul>\n<li>Lead external partner execution with developers, engineering firms, OEMs, and construction teams, driving technical reviews, deviation management, and handoffs that keep deployments on schedule and within specification.</li>\n</ul>\n<ul>\n<li>Improve cluster turn-up reliability and repeatability: identify systemic gaps in deployment scope, tooling, and partner interfaces, and drive durable fixes that reduce time-to-serve for new capacity.</li>\n</ul>\n<ul>\n<li>Define and track deployment KPIs, such as cluster readiness, schedule adherence, scope completeness, and time-to-first-packet, and use historical trends to forecast risk and inform capacity planning.</li>\n</ul>\n<ul>\n<li>Coordinate cross-functional readiness across supply chain, security, operations, and construction to 
ship production-ready compute capacity.</li>\n</ul>\n<ul>\n<li>Provide crisp executive visibility on deployment progress, tradeoffs, and risks across a portfolio of concurrent cluster programs.</li>\n</ul>\n<ul>\n<li>Design cluster interfaces for durability: define rack and cluster-level interfaces that remain robust across hardware generations, so that facility scope and deployment models do not need to be reinvented every time the underlying hardware changes.</li>\n</ul>\n<ul>\n<li>Build cluster layout and BOM tooling: create and maintain the tools, templates, and data models that turn cluster topology and rack specifications into accurate floor layouts, deployment sequences, and complete bills of materials, replacing one-off spreadsheets with repeatable, auditable workflows.</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 10+ years of experience in hyperscale datacenter environments, with senior-level responsibility for cluster deployment, large-scale IT integration, or equivalent infrastructure programs.</li>\n</ul>\n<ul>\n<li>Have delivered AI, HPC, or high-density compute clusters at scale and developed a strong intuition for the constraints that govern cluster deployment: interconnect reach, adjacency, power density, and thermal limits.</li>\n</ul>\n<ul>\n<li>Can operate fluently across the boundary between IT hardware and facility infrastructure, and have set interface standards that held up across multiple hardware generations and sites.</li>\n</ul>\n<ul>\n<li>Have led cross-functional programs with both internal engineering teams and external developers, engineering firms, and OEM partners, and are effective at driving alignment across organizational levels.</li>\n</ul>\n<ul>\n<li>Combine strong systems thinking with execution discipline, comfortable zooming from cluster topology and portfolio strategy down to the specific interface detail that will otherwise become a field issue.</li>\n</ul>\n<ul>\n<li>Communicate clearly with 
technical and executive audiences, and can distill complex, multi-disciplinary programs into decisions and tradeoffs leadership can act on.</li>\n</ul>\n<ul>\n<li>Thrive in ambiguous, fast-moving environments where the hardware, the scale, and the requirements are all changing simultaneously.</li>\n</ul>\n<ul>\n<li>Hold a Bachelor&#39;s degree in Electrical Engineering, Mechanical Engineering, Computer Engineering, or equivalent practical experience.</li>\n</ul>\n<p>Strong candidates may also:</p>\n<ul>\n<li>Have direct experience deploying leading-edge AI accelerator clusters at hyperscale.</li>\n</ul>\n<ul>\n<li>Have shaped reference designs, deployment standards, or cluster-level playbooks that were adopted across a fleet.</li>\n</ul>\n<ul>\n<li>Have experience working across multiple geographies and understand how regional codes, climate, utility constraints, and supply chains shape cluster-level decisions.</li>\n</ul>\n<ul>\n<li>Have partnered closely with hardware and system providers on long-term platform onboarding and bring-up.</li>\n</ul>\n<ul>\n<li>Have experience building the program mechanisms (roadmaps, milestones, risk registers, runbooks) that make delivery predictable at massive scale.</li>\n</ul>\n<p>The annual compensation range for this role is listed below. 
For sales roles, the range provided is the role’s On Target Earnings (“OTE”) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.</p>\n<p>Annual Salary: $320,000-$405,000 USD</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_60082588-bf0","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5191638008","x-work-arrangement":"remote-hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$405,000 USD","x-skills-required":["Hyperscale datacenter environments","Cluster deployment","Large-scale IT integration","Infrastructure programs","AI","HPC","High-density compute clusters","Interconnect reach","Adjacency","Power density","Thermal limits","IT hardware","Facility infrastructure","Interface standards","Cluster topology","Portfolio strategy","Execution discipline","Systems thinking","Communication","Technical audiences","Executive audiences","Decision-making","Trade-offs","Leadership","Bachelor's degree","Electrical Engineering","Mechanical Engineering","Computer Engineering","Practical experience"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:36:06.517Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote-Friendly, United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Hyperscale datacenter environments, Cluster deployment, Large-scale IT integration, Infrastructure programs, AI, HPC, High-density compute clusters, Interconnect reach, Adjacency, Power density, Thermal limits, IT hardware, Facility infrastructure, Interface 
standards, Cluster topology, Portfolio strategy, Execution discipline, Systems thinking, Communication, Technical audiences, Executive audiences, Decision-making, Trade-offs, Leadership, Bachelor's degree, Electrical Engineering, Mechanical Engineering, Computer Engineering, Practical experience","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c1dcea75-d5a"},"title":"Member of Technical Staff - Infrastructure Engineer","description":"<p>We&#39;re looking for an experienced engineer to join our team in Freiburg, Germany or San Francisco, USA. As a Member of Technical Staff - Infrastructure Engineer, you will be responsible for maintaining and scaling our research infrastructure, ensuring health and optimizing components to extract peak performance from the system. You will also collaborate with research teams to deeply understand their infrastructure needs and design solutions that balance performance with cost efficiency.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Maintaining research infrastructure, ensuring health, and optimizing components to extract peak performance from the system (both on application and infrastructure side)</li>\n<li>Scaling infrastructure to meet growing research demands while maintaining reliability and performance</li>\n<li>Collaborating with research teams to deeply understand their infrastructure needs, and design solutions that balance performance with cost efficiency</li>\n<li>Identifying and resolving performance bottlenecks and capacity hotspots through deep analysis of distributed systems at scale</li>\n<li>Building and evolving telemetry and monitoring systems to provide deep visibility into infrastructure performance, utilization, and costs across our cloud and datacenter 
fleets</li>\n<li>Participating in on-call rotations and incident response to maintain system reliability</li>\n</ul>\n<p>Technical focus includes:</p>\n<ul>\n<li>Python, Bash, Go</li>\n<li>Kubernetes</li>\n<li>Nvidia GPU drivers and operators</li>\n<li>OTel, Prometheus</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>Experience building or operating large-scale training platforms</li>\n<li>Worked with large-scale compute clusters (GPUs)</li>\n<li>Proven ability to debug performance and reliability issues across large distributed fleets</li>\n<li>Strong problem-solving skills and ability to work independently</li>\n<li>Strong communication skills and the ability to work effectively with both internal and external partners</li>\n<li>Deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP</li>\n<li>Experience with SLURM</li>\n</ul>\n<p>We offer a competitive base annual salary of $180,000-$300,000 USD and a hybrid work model with a meaningful in-person presence.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_c1dcea75-d5a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Black Forest Labs","sameAs":"https://www.blackforestlabs.com/","logo":"https://logos.yubhub.co/blackforestlabs.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/blackforestlabs/jobs/4925659008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000-$300,000 USD","x-skills-required":["Python","Bash","Go","Kubernetes","Nvidia GPU drivers","Nvidia GPU operators","OTel","Prometheus","Experience building or operating large-scale training platforms","Worked with large-scale compute clusters (GPUs)","Proven ability to debug performance and reliability issues across large distributed fleets","Strong problem-solving skills and ability to work 
independently","Strong communication skills and the ability to work effectively with both internal and external partners","Deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP","Experience with SLURM"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:25:55.745Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Freiburg (Germany), San Francisco (USA)"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Bash, Go, Kubernetes, Nvidia GPU drivers, Nvidia GPU operators, OTel, Prometheus, Experience building or operating large-scale training platforms, Worked with large-scale compute clusters (GPUs), Proven ability to debug performance and reliability issues across large distributed fleets, Strong problem-solving skills and ability to work independently, Strong communication skills and the ability to work effectively with both internal and external partners, Deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP, Experience with SLURM","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":300000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f5e7e195-679"},"title":"Datacenter Hardware Operations Technician, AI Compute Infrastructure - Stargate","description":"<p><strong>Job Posting</strong></p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$86.4K – $228K</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>OpenAI, in close collaboration with our capital partners, is embarking on a journey to build the world’s most advanced AI infrastructure ecosystem. 
Our Stargate program develops and deploys massive, state-of-the-art data center campuses in partnership with industry leaders such as Oracle today—and through future OpenAI infrastructure projects tomorrow. We design for scale, speed, and reliability, and we need experienced hardware professionals who can help ensure our high-density compute environment operates at peak performance.</p>\n<p><strong>About the Role</strong></p>\n<p>We are seeking a senior datacenter hardware operations technician to coordinate physical hardware activities at a large partner-operated campus. In this role you will work side-by-side with Oracle and their delivery teams, helping align OpenAI’s compute requirements with day-to-day hardware work on the ground. Rather than directing partner personnel, you will focus on collaboration, technical alignment, and shared problem solving, ensuring that maintenance, repairs, and lifecycle activities support the performance and reliability goals of both organizations. As the campus matures, you will help capture lessons learned and develop standards and playbooks to guide hardware operations at future OpenAI infrastructure projects.</p>\n<p>_Candidates must be able to sit onsite in Abilene, Texas 5 days per week_</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Serve as OpenAI’s primary on-site hardware contact, collaborating with Oracle teams and vendors to plan and coordinate maintenance, repairs, and lifecycle activities.</li>\n</ul>\n<ul>\n<li>Share technical requirements and verify that work performed supports OpenAI’s compute needs and agreed quality targets.</li>\n</ul>\n<ul>\n<li>Coordinate schedules, spare-parts planning, and issue escalation with partner teams to minimize downtime and keep operations running smoothly.</li>\n</ul>\n<ul>\n<li>Work with OpenAI fleet-health engineers to translate software-detected issues into on-site hardware actions in partnership with Oracle.</li>\n</ul>\n<ul>\n<li>Track hardware trends and provide 
joint recommendations with partner teams for design or operational improvements.</li>\n</ul>\n<ul>\n<li>Prepare documentation and runbooks that capture joint best practices and can be applied at additional campuses.</li>\n</ul>\n<ul>\n<li>Offer technical guidance and context to partner personnel while respecting their operational ownership.</li>\n</ul>\n<ul>\n<li>Collaborate with supply-chain teams to plan spares and manage hardware lifecycle activities.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Have 7+ years of experience in datacenter hardware operations, hardware engineering, or large-scale server maintenance, with at least 2 years in a senior or lead technician capacity.</li>\n</ul>\n<ul>\n<li>Bring deep knowledge of high-density server hardware, including x86 platforms, GPUs, storage devices, and power/cooling systems.</li>\n</ul>\n<ul>\n<li>Excel at diagnosing hardware issues, coordinating complex repairs, and maintaining strong working relationships across organizations.</li>\n</ul>\n<ul>\n<li>Are comfortable setting technical expectations and validating outcomes through collaboration, not direct management.</li>\n</ul>\n<ul>\n<li>Adapt quickly to changing operational conditions and enjoy solving problems at both the strategic and on-site levels.</li>\n</ul>\n<ul>\n<li>Communicate clearly and build trust across partner teams, vendors, and internal engineering stakeholders.</li>\n</ul>\n<ul>\n<li>Are willing to be based full-time at a partner-operated campus</li>\n</ul>\n<p><strong>Preferred Skills</strong></p>\n<ul>\n<li>Familiarity with large-scale cluster management or monitoring tools (IPMI, BMC, Prometheus, Nagios) to interpret alerts and coordinate partner responses.</li>\n</ul>\n<ul>\n<li>Experience with GPU-accelerated compute clusters or other high-performance computing hardware.</li>\n</ul>\n<ul>\n<li>Knowledge of Linux/Unix system administration and command-line diagnostic tools for hardware 
validation.</li>\n</ul>\n<ul>\n<li>Industry certifications such as CompTIA Server+, OEM hardware certifications, or equivalent.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f5e7e195-679","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/b9a4a809-a965-4dbe-aeef-6ce1593903dd","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$86.4K – $228K","x-skills-required":["datacenter hardware operations","hardware engineering","large-scale server maintenance","high-density server hardware","x86 platforms","GPUs","storage devices","power/cooling systems"],"x-skills-preferred":["large-scale cluster management","monitoring tools","IPMI","BMC","Prometheus","Nagios","GPU-accelerated compute clusters","Linux/Unix system administration","command-line diagnostic tools","industry certifications"],"datePosted":"2026-03-06T18:43:34.654Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote - US"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"datacenter hardware operations, hardware engineering, large-scale server maintenance, high-density server hardware, x86 platforms, GPUs, storage devices, power/cooling systems, large-scale cluster management, monitoring tools, IPMI, BMC, Prometheus, Nagios, GPU-accelerated compute clusters, Linux/Unix system administration, command-line diagnostic tools, industry 
certifications","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":86400,"maxValue":228000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_520ca95e-75f"},"title":"Software Engineer, Agent Infrastructure","description":"<p><strong>Software Engineer, Agent Infrastructure</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco; New York City</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$230K – $385K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required 
by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>The Agent Infrastructure team at OpenAI is responsible for building systems that enable training and deployment of highly useful AI agents, both internally and for the world.</p>\n<p>We work hand-in-hand with researchers to design and scale the environment in which agentic models are trained – providing a workspace for AI models to execute code, debug issues, and develop software just as human SWEs do. Our training environment for agentic models operates at an extremely high scale and has the flexibility to emulate any environment in which an agent might work.</p>\n<p>At the same time, our team builds and maintains OpenAI’s core platform for the deployment and execution of agents in production. 
Our systems power products such as Codex, Operator, tool use in ChatGPT, and future agentic products.</p>\n<p><strong>About the Role</strong></p>\n<p>As a Software Engineer on the Agent Infrastructure team, you will have the opportunity to work closely with both research and product at OpenAI - building and scaling systems to train highly capable agentic models, and building the platform and integrations to launch new agents to hundreds of millions of users worldwide.</p>\n<p>Your work will consist of both building new capabilities - standing up the infrastructure and integrations needed to train more complex agentic models - and rapidly scaling these new capabilities to some of the largest compute clusters in the world. At the same time, you’ll be instrumental to the launch of agentic products at OpenAI - building, maintaining, and scaling the production platform on which all agents run.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Push massive compute clusters to their limits. You will be a core contributor to a novel container orchestration platform built in-house by our team to scale far beyond what’s possible with systems like Kubernetes.</li>\n</ul>\n<ul>\n<li>Develop and maintain FastAPI and gRPC APIs that serve as the interface for our agentic infrastructure used both in training and production.</li>\n</ul>\n<ul>\n<li>Use Terraform to stand up and evolve complex infrastructure for both research and production.</li>\n</ul>\n<ul>\n<li>Collaborate with research teams to stand up and optimize systems for novel AI training runs and experimental applications.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Have deep experience working on large-scale machine learning infrastructure. 
You know how to reason about training at scale, identifying bottlenecks and engineering solutions to optimize system performance in training environments.</li>\n</ul>\n<ul>\n<li>Know how to build new things from 0-1 quickly, and then scale them 1,000,000x.</li>\n</ul>\n<ul>\n<li>Have a keen eye for performance and optimization. You know how to squeeze the most performance out of complex, globally-distributed systems.</li>\n</ul>\n<ul>\n<li>Know your way around cloud platforms and work with infrastructure-as-code tech like Terraform.</li>\n</ul>\n<ul>\n<li>Are driven by solving complex, ambiguous problems at the intersection of infrastructure scalability, virtualization efficiency, and agentic capabilities.</li>\n</ul>\n<ul>\n<li>Have deep technical expertise in virtualization and containerization technologies (e.g. Kata, Firecracker, gVisor, Sysbox) and are passionate about optimizing runtime performance.</li>\n</ul>\n<p><strong>What We Offer</strong></p>\n<ul>\n<li>Competitive salary and equity package</li>\n</ul>\n<ul>\n<li>Opportunity to work on cutting-edge AI infrastructure</li>\n</ul>\n<ul>\n<li>Collaborative and dynamic team environment</li>\n</ul>\n<ul>\n<li>Flexible work arrangements</li>\n</ul>\n<ul>\n<li>Professional development opportunities</li>\n</ul>\n<ul>\n<li>Access to the latest technology and tools</li>\n</ul>\n<p><strong>How to Apply</strong></p>\n<p>If you are a motivated and experienced software engineer looking to join a dynamic team and work on cutting-edge AI infrastructure, please submit your application. 
We look forward to hearing from you!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_520ca95e-75f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/c1316397-25bb-4add-9e9d-0e3ea8ba929a","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$230K – $385K","x-skills-required":["large-scale machine learning infrastructure","container orchestration","FastAPI","gRPC","Terraform","cloud platforms","infrastructure-as-code","virtualization","containerization","Kata","Firecracker","gVisor","Sysbox"],"x-skills-preferred":["AI infrastructure","agentic models","training environments","compute clusters","performance optimization","runtime performance"],"datePosted":"2026-03-06T18:41:05.385Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco; New York City"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"large-scale machine learning infrastructure, container orchestration, FastAPI, gRPC, Terraform, cloud platforms, infrastructure-as-code, virtualization, containerization, Kata, Firecracker, gVisor, Sysbox, AI infrastructure, agentic models, training environments, compute clusters, performance optimization, runtime performance","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":230000,"maxValue":385000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_e31a2c4e-190"},"title":"ASIC Firmware Engineer, Modeling","description":"<p><strong>Job Posting</strong></p>\n<p><strong>ASIC Firmware Engineer, 
Modeling</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$226K – $445K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable 
fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the Team</strong></p>\n<p>OpenAI’s Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI-native silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI’s supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI.</p>\n<p><strong>About the Role</strong></p>\n<p>We are looking for an embedded engineer to help build firmware and associated modeling software for OpenAI’s in house AI accelerator. 
This role involves designing and developing drivers and functional models for a large array of HW components, writing high throughput and low latency firmware code, investigating bring-up and production issues.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and implement drivers for hardware peripherals, including those related to AI chips.</li>\n</ul>\n<ul>\n<li>Design and implement functional software models to simulate SoC uncore logic and enable FW testing against the model</li>\n</ul>\n<ul>\n<li>Design and implement low-latency and high throughput embedded SW to manage HW resources.</li>\n</ul>\n<ul>\n<li>Work with adjacent software and hardware teams to implement requirements, debug issues and shape future generations of the hardware.</li>\n</ul>\n<ul>\n<li>Collaborate with vendors to integrate their technologies within our systems.</li>\n</ul>\n<ul>\n<li>Bring up and debug firmware/driver on new platforms.</li>\n</ul>\n<ul>\n<li>Come up with processes and debug issues raised in the field.</li>\n</ul>\n<ul>\n<li>Set up monitoring, integration testing and diagnostics tools.</li>\n</ul>\n<p><strong>Qualifications</strong></p>\n<ul>\n<li>5+ years of experience working in embedded SW space.</li>\n</ul>\n<ul>\n<li>Ability to thrive in ambiguity and learn new technologies.</li>\n</ul>\n<ul>\n<li>Strong programming skills in C/C++ and/or Rust.</li>\n</ul>\n<ul>\n<li>Experience developing high throughput, low latency and multi-threaded code.</li>\n</ul>\n<ul>\n<li>Experience working with real time operating systems (RTOS).</li>\n</ul>\n<ul>\n<li>Experience developing hardware drivers and working with hardware</li>\n</ul>\n<ul>\n<li>Experience with HW/SW co-design</li>\n</ul>\n<ul>\n<li>Knowledge of common embedded protocols, e.g. UART, I2C, SPI, etc.</li>\n</ul>\n<ul>\n<li>Knowledge of microprocessor and common ARM architectures (e.g. 
AMBA) is a plus.</li>\n</ul>\n<ul>\n<li>Knowledge of PCIe, ethernet and other high BW communication protocols is a plus.</li>\n</ul>\n<ul>\n<li>Experience with GPUs or other compute hardware is a plus.</li>\n</ul>\n<ul>\n<li>Experience deploying large compute clusters is a plus.</li>\n</ul>\n<p><em>To comply with U.S. export control laws and regulations, candidates for this role may need to meet certain legal status requirements as provided in those laws and regulations.</em></p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_e31a2c4e-190","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/e4ef18a1-f2f7-4920-a53c-aeadd184d124","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$226K – $445K • Offers Equity","x-skills-required":["C/C++","Rust","Embedded SW","Real time operating systems (RTOS)","Hardware drivers","HW/SW co-design","Common embedded protocols (UART, I2C, SPI, etc.)","Microprocessor and common ARM architectures (e.g. AMBA)","PCIe, ethernet and other high BW communication protocols"],"x-skills-preferred":["GPU","Compute hardware","Large compute clusters"],"datePosted":"2026-03-06T18:40:36.430Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"C/C++, Rust, Embedded SW, Real time operating systems (RTOS), Hardware drivers, HW/SW co-design, Common embedded protocols (UART, I2C, SPI, etc.), Microprocessor and common ARM architectures (e.g. 
AMBA), PCIe, ethernet and other high BW communication protocols, GPU, Compute hardware, Large compute clusters","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":226000,"maxValue":445000,"unitText":"YEAR"}}}]}