{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/server-hardware"},"x-facet":{"type":"skill","slug":"server-hardware","display":"Server Hardware","count":13},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_13f90298-71b"},"title":"Hardware Engineer","description":"<p>We are seeking a highly skilled and motivated Hardware Engineer to join our Hardware Provisioning team. In this role, you will play a crucial part in the design, development, and optimization of our server hardware infrastructure.</p>\n<p>Your responsibilities will include developing and maintaining hardware/firmware management services, automating all aspects of the server hardware lifecycle, serving as the senior point of contact for hardware escalation and troubleshooting, collaborating with cross-functional teams to define hardware requirements, specifications, and system architecture, creating and maintaining accurate documentation of hardware designs, specifications, test procedures, and results, analyzing and optimizing the performance of hardware systems, identifying bottlenecks, and proposing improvements for enhanced efficiency, and establishing processes for internal hardware testing, deployment, and performance optimization.</p>\n<p>To be successful in this role, you will need to have proficiency in Ansible/Python and experience with programmatically interacting with server BMCs, using IPMI or 
Redfish, in-depth knowledge of server hardware, components, and management technologies, proven ability to stay updated with the latest industry technologies and trends, previous experience collaborating with hardware vendors, strong passion for automation, excellent documentation skills and attention to detail, and strong analytical and problem-solving abilities.</p>\n<p>The base annual salary range for this role is $109,000 to $204,000, and we offer a variety of benefits to support your needs, including medical, dental, and vision insurance, company-paid life insurance, voluntary supplemental life insurance, short and long-term disability insurance, flexible spending account, health savings account, tuition reimbursement, ability to participate in employee stock purchase program (ESPP), mental wellness benefits through Spring Health, family-forming support provided by Carrot, paid parental leave, flexible, full-service childcare support with Kinside, 401(k) with a generous employer match, and flexible PTO.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_13f90298-71b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4644828006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$109,000 to $204,000","x-skills-required":["Ansible","Python","IPMI","Redfish","server hardware","components","management technologies"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:50:44.597Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York, NY / Sunnyvale, CA / Bellevue, 
WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Ansible, Python, IPMI, Redfish, server hardware, components, management technologies","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":109000,"maxValue":204000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3d3e5c3d-569"},"title":"Senior Engineer, Datacenter Server Lifecycle","description":"<p>As a Senior Engineer on the Datacenter Machine Lifecycle team, you will own the end-to-end operational journey of every machine in our facility, from initial provisioning and deployment, across its working life, through maintenance and refresh, and all the way to decommissioning.</p>\n<p>This is greenfield work: you will help define the processes, tooling, and operational standards that govern how we run and retire hardware at scale.</p>\n<p>A distinguishing aspect of this role is its deep intersection with security. 
The machines in our datacenter handle some of the most sensitive workloads in AI, training frontier models and serving millions of users interacting with Claude.</p>\n<p>Ensuring that every machine in the fleet is trusted, attested, and operating with a verified chain of integrity from the hardware up is a core part of the job, not an afterthought.</p>\n<p>You will partner closely with our Infrastructure Security team to define and enforce trusted compute standards across the lifecycle, from secure provisioning through end-of-life handling.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Lead the build-out of automation to support datacenters containing tens of thousands of servers.</li>\n</ul>\n<ul>\n<li>Own and define the end-to-end machine lifecycle strategy, from provisioning and deployment through operation, maintenance, refresh, and decommissioning, and maintain automation and operational procedures for common lifecycle events (e.g. hardware failures, firmware upgrades, fleet rotations).</li>\n</ul>\n<ul>\n<li>Partner closely with Infrastructure Security to design and enforce trusted compute standards across the machine lifecycle.</li>\n</ul>\n<ul>\n<li>Work closely with our Networking team to ensure end-to-end connectivity across all sites.</li>\n</ul>\n<ul>\n<li>Build and maintain tooling to track machine health, configuration, and operational status across the full datacenter fleet.</li>\n</ul>\n<p>You May Be a Good Fit If You:</p>\n<ul>\n<li>Have 5+ years of experience in datacenter operations, hardware infrastructure management, or a closely related discipline.</li>\n</ul>\n<ul>\n<li>Have deep, hands-on experience with server hardware, including rack deployment, cabling, troubleshooting, and understanding failure modes at scale.</li>\n</ul>\n<ul>\n<li>Understand hardware lifecycle management end-to-end: asset tracking, provisioning workflows, maintenance scheduling, and decommissioning practices.</li>\n</ul>\n<ul>\n<li>Have strong proficiency in at least 
one programming language (e.g., Python, Rust, Go, or Java).</li>\n</ul>\n<ul>\n<li>Are comfortable navigating ambiguity and working independently to drive progress on complex, cross-functional problems.</li>\n</ul>\n<ul>\n<li>Communicate clearly and can build consensus with a wide range of stakeholders.</li>\n</ul>\n<ul>\n<li>Have working knowledge of modern cloud infrastructure, including Kubernetes, Infrastructure as Code, AWS, and GCP.</li>\n</ul>\n<ul>\n<li>Are comfortable with occasional travel to datacenter sites across North America.</li>\n</ul>\n<p>Strong Candidates May Also Have:</p>\n<ul>\n<li>Hands-on experience with GPU or AI accelerator hardware (e.g. NVIDIA A100/H100, AMD MI300, Google TPUs, or AWS Trainium) and an understanding of their operational demands.</li>\n</ul>\n<ul>\n<li>Familiarity with modern provisioning tooling such as coreboot, LinuxBoot, or u-root.</li>\n</ul>\n<ul>\n<li>Experience building or contributing to datacenter automation or fleet management platforms.</li>\n</ul>\n<ul>\n<li>Experience building and deploying server operating system distributions across thousands of hosts.</li>\n</ul>\n<ul>\n<li>A background in large-scale capacity planning and hardware refresh strategy, ideally at a hyperscaler or large cloud provider.</li>\n</ul>\n<ul>\n<li>Experience with trusted compute and hardware security concepts such as secure boot, TPM, hardware attestation, and firmware verification, or a strong desire to develop deep expertise in this area.</li>\n</ul>\n<p>The annual compensation range for this role is £255,000-£325,000 GBP.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3d3e5c3d-569","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5131038008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£255,000-£325,000 GBP","x-skills-required":["datacenter operations","hardware infrastructure management","server hardware","programming language","cloud infrastructure","Kubernetes","Infrastructure as Code","AWS","GCP"],"x-skills-preferred":["GPU or AI accelerator hardware","modern provisioning tooling","datacenter automation","fleet management platforms","trusted compute and hardware security concepts"],"datePosted":"2026-04-18T15:47:48.808Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"datacenter operations, hardware infrastructure management, server hardware, programming language, cloud infrastructure, Kubernetes, Infrastructure as Code, AWS, GCP, GPU or AI accelerator hardware, modern provisioning tooling, datacenter automation, fleet management platforms, trusted compute and hardware security concepts","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":255000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_61e08612-10d"},"title":"Data Center Operations Technician","description":"<p>As a Data Center Operations Technician at xAI, you will be responsible for the health of our server and network infrastructure for Data Centers and Global Points of Presence. 
You will be responsible for our two most important data center operations metrics: mean time to detect (MTTD) and mean time to repair (MTTR).</p>\n<p>Your primary responsibilities will include:</p>\n<ul>\n<li>Reporting to the job site during initial construction and reporting back to the engineering team as required.</li>\n<li>Performing troubleshooting and monitoring of the servers and network in our data centers and global points of presence.</li>\n<li>Rack and stacking of data center network equipment.</li>\n<li>Maintaining Warehouse inventory and asset management using our internal application.</li>\n<li>Labelling and troubleshooting for fibre/optics cables.</li>\n<li>Power supply cabling, installation, troubleshooting and repair.</li>\n<li>Installation of racks, servers and switches; this includes staging racks in place, cabling, power up and handoff of hardware to the provisioning team for customer capacity allocation.</li>\n<li>Managing, responding and resolving of data center operations tickets used cross functionally within xAI via Jira.</li>\n<li>Creating and maintaining documentation of tasks and standard operating procedures.</li>\n<li>Receipt and decommissioning of data center hardware.</li>\n<li>Vendor returns for infrastructure under and out of warranty.</li>\n<li>Managing spare parts inventory within the data center.</li>\n<li>Defining, designing, and implementing network layouts and solutions within our data centers.</li>\n</ul>\n<p>To be successful in this role, you will need:</p>\n<ul>\n<li>A high school diploma or equivalency certificate.</li>\n<li>2+ years of experience working with server, storage, compute and network hardware.</li>\n<li>2+ years of experience troubleshooting and repairing servers and networking infrastructure.</li>\n<li>2+ years of experience in Inventory Management, and ordering, receiving and shipping server and network equipment.</li>\n<li>Strong Linux skills, including navigating the system&#39;s directories and filing 
system, manipulating files in the Linux shell, user permission configuration, package installation and software management.</li>\n<li>Ability to identify and apply different filesystem types, using Linux commands for process management, basic troubleshooting and debugging, and Bash or other scripting.</li>\n<li>Experience being on-call and ability to respond to critical events as needed.</li>\n<li>Experience leading Data Center Infrastructure projects.</li>\n<li>Curious to always learn new things within the Data Center World.</li>\n<li>Excellent prioritization and time management skills.</li>\n<li>Able to work in a fast-paced environment.</li>\n<li>Detail-oriented.</li>\n<li>Oracle Experience.</li>\n<li>Inventory Management.</li>\n<li>4+ years of experience in Structured Cabling Copper/Fibre.</li>\n<li>4+ years of experience in Power and Cooling concepts inside the data center.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_61e08612-10d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/4741579007","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Linux","Server Hardware","Network Hardware","Inventory Management","Troubleshooting","Debugging","Scripting","Oracle Experience","Structured Cabling Copper/Fibre","Power and Cooling Concepts"],"x-skills-preferred":["Strong Linux skills","Experience leading Data Center Infrastructure projects","Curious to always learn new things within the Data Center World","Excellent prioritization and time management skills","Able to work in a fast-paced 
environment","Detail-oriented"],"datePosted":"2026-04-18T15:34:28.202Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Memphis, TN; Southaven, MS"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux, Server Hardware, Network Hardware, Inventory Management, Troubleshooting, Debugging, Scripting, Oracle Experience, Structured Cabling Copper/Fibre, Power and Cooling Concepts, Strong Linux skills, Experience leading Data Center Infrastructure projects, Curious to always learn new things within the Data Center World, Excellent prioritization and time management skills, Able to work in a fast-paced environment, Detail-oriented"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_7cd8e557-30a"},"title":"Data Center Operations Technician","description":"<p>As a Data Center Operations Technician at xAI, you will be responsible for the health of our server and network infrastructure for Data Centers. 
Your primary focus will be on maintaining our network infrastructure, performing troubleshooting and monitoring of servers, and managing data center operations tickets.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Perform troubleshooting and monitoring of servers, diagnose and repair issues.</li>\n<li>Maintain our network infrastructure, including network gear swaps, troubleshooting optics/fiber links, and repairing them.</li>\n<li>Manage, respond, and resolve data center operations tickets used cross-functionally via Jira.</li>\n<li>Create and maintain documentation of tasks and standard operating procedures.</li>\n<li>Receive and decommission data center hardware.</li>\n<li>Install racks, servers, and switches, including staging racks in place, cabling, power up, and handoff of hardware to the provisioning team for customer capacity allocation.</li>\n<li>Maintain warehouse inventory and asset management using our internal application.</li>\n<li>Vendor returns for infrastructure under and out of warranty.</li>\n<li>Manage spare parts inventory within the data center.</li>\n<li>Define, design, and implement network layouts and solutions within our data centers.</li>\n</ul>\n<p><strong>Qualifications</strong></p>\n<ul>\n<li>2+ years of experience working with server, storage, compute, and network hardware.</li>\n<li>2+ years of experience troubleshooting and repairing servers and networking infrastructure.</li>\n<li>1+ year of experience in inventory management, and ordering, receiving, and shipping server and network equipment.</li>\n<li>Strong Linux skills and the ability to lift 75 lbs.</li>\n<li>Ability to work 24/7 in a fast-paced environment with excellent prioritization and time management skills.</li>\n</ul>\n<p><strong>Preferred Skills and Experience</strong></p>\n<ul>\n<li>Experience being on-call and responding to critical events as needed.</li>\n<li>Experience leading Data Center Infrastructure projects.</li>\n<li>Curious to always 
learn new things within the Data Center World.</li>\n<li>Excellent prioritization and time management skills.</li>\n<li>Able to work in a fast-paced environment and detail-oriented.</li>\n<li>1+ year of experience in structured cabling copper/fiber.</li>\n<li>1+ year of experience in power and cooling concepts inside the data center.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_7cd8e557-30a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/5085890007","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Linux","Server hardware","Network hardware","Troubleshooting","Inventory management","Lifting 75 lbs"],"x-skills-preferred":["On-call experience","Data Center Infrastructure project leadership","Structured cabling copper/fiber","Power and cooling concepts"],"datePosted":"2026-04-18T15:23:26.236Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Hillsboro, OR"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux, Server hardware, Network hardware, Troubleshooting, Inventory management, Lifting 75 lbs, On-call experience, Data Center Infrastructure project leadership, Structured cabling copper/fiber, Power and cooling concepts"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_058f6a10-283"},"title":"Field Hardware Engineer, HPC","description":"<p>Our compute footprint is growing fast to support our science and engineering teams. 
We&#39;re hiring a Field HW Engineer to understand end-to-end systems, execute complex/vendor-level interventions, and guide L1 engineers on site, without direct line management.</p>\n<p>You&#39;ll work hands-on across compute, storage, interconnect and cooling to keep one of France&#39;s largest GPU/CPU clusters healthy and scalable.</p>\n<p>Location: Bruyères-le-Châtel, on-site, field role (multi-site mobility: Paris area and nearby)</p>\n<p>Reporting line: Hardware Ops</p>\n<p>Impact:</p>\n<p>• Compute is a key lever for Mistral&#39;s success and our largest spend item.</p>\n<p>• Direct impact on scale: you&#39;ll restore service on complex incidents and raise the bar on reliability as we grow.</p>\n<p>• Enable breakthrough AI: your work unlocks science &amp; engineering teams to deliver state-of-the-art AI.</p>\n<p>What you will do:</p>\n<p>• Lead complex interventions: plan and execute vendor-level or multi-node operations (e.g., full rack work, intricate recabling, post-restart diagnosis), own risk assessment/rollback, and coordinate with vendors (RMA/escalations).</p>\n<p>• Advanced diagnostics: correlate symptoms across compute, storage, interconnect, cooling; read system indicators (LED/POST/beep), BMC/IPMI consoles, and logs to identify root causes.</p>\n<p>• Guide and uplift L1s: coach on safe practices (ESD/LOTO), first-line triage, rack craftsmanship, documentation quality; pair on tricky procedures.</p>\n<p>• Process &amp; automation: improve SOPs/checklists; propose/build small automation (Python/Bash) for photo/serial capture, inventory sync, dashboards/alerts; shorten MTTR.</p>\n<p>• Safety &amp; compliance: enforce lockout/tagout, ESD, PPE; ensure audit-ready tickets, evidence and change traces.</p>\n<p>• Parts &amp; logistics (advanced): plan spares strategy, track failure trends, and drive proactive vendor actions.</p>\n<p>About you:</p>\n<p>• 5+ years in data center/server hardware or L2/L3 hardware support, with proven complex hands-on work in 
production (HPC/AI/Cloud at scale).</p>\n<p>• End-to-end hardware expertise: comfortable across CPU/memory/PCIe cards (incl. accelerators), NICs, PSUs, drives, network, power and cooling (including DLC); strong judgment on when/how to escalate.</p>\n<p>• Diagnostics depth: confident in analyzing BMC/IPMI logs, Linux software logs and crashes, and simple CLI checks; methodical root cause analysis.</p>\n<p>• Safety &amp; discipline: impeccable ESD/LOTO/PPE habits; zero rough handling; clean, labeled, auditable work.</p>\n<p>• Communication &amp; mentoring: crisp status/handovers; able to coach L1s during live operations.</p>\n<p>Provide technical documentation to L1s or other teams</p>\n<p>Mobility: willing to travel between sites (Paris area or nearby regions, occasionally in Europe or US)</p>\n<p>Nice to have:</p>\n<p>• Vendor tools (iDRAC/iLO/IPMI), RAID/storage basics (NVMe/SAS/SATA), high-speed interconnect (Ethernet/InfiniBand).</p>\n<p>• Coding/automation (Python/Bash) for small ops tools and reporting.</p>\n<p>• Experience with ticketing (Jira/ServiceNow), inventory/RMA flows, vendor coordination.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_058f6a10-283","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/ea94b55b-58e1-437b-bf3d-07ed150308e3","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["data center/server hardware","L2/L3 hardware support","HPC/AI/Cloud at scale","end-to-end hardware expertise","diagnostics depth","safety & discipline","communication & mentoring"],"x-skills-preferred":["vendor tools","RAID/storage basics","high-speed interconnect","coding/automation","ticketing","inventory/RMA 
flows","vendor coordination"],"datePosted":"2026-04-17T12:47:46.512Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"data center/server hardware, L2/L3 hardware support, HPC/AI/Cloud at scale, end-to-end hardware expertise, diagnostics depth, safety & discipline, communication & mentoring, vendor tools, RAID/storage basics, high-speed interconnect, coding/automation, ticketing, inventory/RMA flows, vendor coordination"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b7bde4cf-9c8"},"title":"Datacenter Hardware Engineer, HPC","description":"<p>About Mistral</p>\n<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.</p>\n<p>Our compute footprint is growing fast to support our science and engineering teams. 
We’re hiring a Datacenter HW Engineer to maintain, troubleshoot, and scale our GPU/CPU clusters safely and reliably.</p>\n<p>You’ll execute hands-on hardware work in our Paris-area datacenter and partner with hardware owners, DC operations, and vendors to keep one of France’s largest GPU clusters healthy.</p>\n<p>Location: Bruyères-le-Châtel, on-site, field role</p>\n<p>Reporting line: Hardware Ops</p>\n<p>Impact</p>\n<p>• Compute is a key lever for Mistral’s success and our largest spend item.\n• Direct impact on scale: your work keeps one of France’s largest AI clusters healthy as we grow to unprecedented scale.\n• Enable breakthrough AI: you unlock our science &amp; engineering teams to deliver groundbreaking AI solutions.</p>\n<p>Responsibilities</p>\n<p>• Diagnose &amp; operate core server/cluster components - Investigate and handle compute/storage hardware issues (CPU, memory, drives, NICs, GPUs, PSUs) and interconnect problems (switches, cables, transceivers; Ethernet/InfiniBand).\n• Safety &amp; procedures - Apply lockout/tagout (LOTO) and ESD discipline; follow pre/post-work checklists; maintain tidy, safe work areas.\n• First-line diagnostics - Triage using LEDs, POST, beep codes and basic tests; capture evidence (photos, serials, results); open/update/close tickets with clear notes.\n• Preventive maintenance - Provide feedback and ideas to improve proactive activities, monitoring, and targeted follow-ups on recurring or specific anomalies; help turn ad-hoc checks into SOPs, alerts, and dashboards.\n• Parts &amp; logistics - Receive and track parts, keep labeled inventory accurate, manage simple RMAs, and coordinate with vendors.\n• Collaboration &amp; escalation - Partner with senior hardware/firmware owners on complex or multi-node issues; communicate status and next steps crisply.\n• Documentation &amp; quality - Keep SOPs/checklists current; ensure zero undocumented changes and consistent, audit-ready records.</p>\n<p>About you</p>\n<p>• Hands-on 
mindset in datacenters/server hardware: you can install/re-seat/swap GPU/PCIe cards, NICs, PSUs, drives, and work cleanly in racks (rails, cabling, labeling).\n• Disciplined and meticulous: follows checklists, ESD/LOTO; no rough handling; careful with all high-value server components.\n• Practical electrical basics: power-off, PPE, short-circuit risk awareness.\n• Comfortable in racks: cooling, network, storage, PDU, cable management; can lift/mount safely (within HSE limits).\n• Clear communicator: short factual updates; reliable teammate; punctual and process-minded.\n• Hardware-passionate, professionally grounded: strong curiosity and craft mindset.</p>\n<p>Nice to have</p>\n<p>• HPC/AI/Cloud at scale experience (production environments), large-fleet/server install &amp; maintenance in datacenters.\n• Basic networking (Ethernet/InfiniBand) and basic Linux (boot/check; no coding needed).\n• Coding/automation skills (Python/Bash): small tools/scripts to improve checklists, photo/serial capture, inventory sync, or simple monitoring/reporting.\n• Experience with inventory/RMA tools and vendor coordination.\n• Exposure to HPC/research/industrial environments.</p>\n<p>What we offer</p>\n<p>💰 Competitive salary and equity package</p>\n<p>🧑‍⚕️ Health insurance</p>\n<p>🚴 Transportation allowance</p>\n<p>🥎 Sport allowance</p>\n<p>🥕 Meal vouchers</p>\n<p>💰 Private pension plan</p>\n<p>🍼 Generous parental leave policy</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b7bde4cf-9c8","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral 
AI","sameAs":"https://mistral.ai/careers","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/ddf7bcbb-e223-4768-a553-6e95df472cf7","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["GPU/CPU clusters","server hardware","Linux fundamentals","scripting","electrical basics","networking","inventory management"],"x-skills-preferred":["HPC/AI/Cloud at scale experience","basic Linux","coding/automation skills","inventory/RMA tools","vendor coordination"],"datePosted":"2026-04-17T12:47:08.660Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"GPU/CPU clusters, server hardware, Linux fundamentals, scripting, electrical basics, networking, inventory management, HPC/AI/Cloud at scale experience, basic Linux, coding/automation skills, inventory/RMA tools, vendor coordination"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_24be48df-238"},"title":"Field Hardware Engineer, HPC","description":"<p>We&#39;re hiring a Field HW Engineer to work on-site at our data centre in Bruyères-le-Châtel. As a Field HW Engineer, you will be responsible for understanding end-to-end systems, executing complex/vendor-level interventions, and guiding L1 engineers on site.</p>\n<p>Your work will involve hands-on troubleshooting and repair of compute, storage, interconnect and cooling systems to keep our large GPU/CPU cluster healthy and scalable. 
You will also be responsible for leading complex interventions, advanced diagnostics, guiding and uplifting L1s, process and automation, safety and compliance, and parts and logistics.</p>\n<p>To be successful in this role, you will need 5+ years of experience in data center/server hardware or L2/L3 hardware support, with proven complex hands-on work in production (HPC/AI/Cloud at scale). You should have end-to-end hardware expertise, including comfort with CPU/memory/PCIe cards, NICs, PSUs, drives, network, power and cooling. You should also be confident in analyzing BMC/IPMI logs, Linux software logs and crashes, and simple CLI checks, and have methodical root cause analysis skills.</p>\n<p>The ideal candidate will be willing to travel between sites (Paris area or nearby regions, occasionally in Europe or US) and have a strong understanding of safety and discipline, including impeccable ESD/LOTO/PPE habits, zero rough handling, and clean, labeled, auditable work.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_24be48df-238","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai"},"x-apply-url":"https://jobs.lever.co/mistral/ea94b55b-58e1-437b-bf3d-07ed150308e3","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["data center/server hardware","L2/L3 hardware support","complex hands-on work in production (HPC/AI/Cloud at scale)","end-to-end hardware expertise","CPU/memory/PCIe cards","NICs","PSUs","drives","network","power and cooling","BMC/IPMI logs","linux software logs","crashes simple CLI checks","root cause analysis"],"x-skills-preferred":["vendor tools (iDRAC/iLO/IPMI)","RAID/storage basics (NVMe/SAS/SATA)","high-speed interconnect (Ethernet/InfiniBand)","coding/automation 
(Python/Bash)"],"datePosted":"2026-03-10T11:27:14.542Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Bruyères-le-Châtel"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"data center/server hardware, L2/L3 hardware support, complex hands-on work in production (HPC/AI/Cloud at scale), end-to-end hardware expertise, CPU/memory/PCIe cards, NICs, PSUs, drives, network, power and cooling, BMC/IPMI logs, linux software logs, crashes simple CLI checks, root cause analysis, vendor tools (iDRAC/iLO/IPMI), RAID/storage basics (NVMe/SAS/SATA), high-speed interconnect (Ethernet/InfiniBand), coding/automation (Python/Bash)"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c8c20fa9-7f3"},"title":"Datacenter Hardware Engineer, HPC","description":"<p>About Mistral AI</p>\n<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.</p>\n<p>We are a company that democratizes AI through high-performance, optimized, open-source and cutting-edge models, products and solutions. Our comprehensive AI platform is designed to meet enterprise needs, whether on-premises or in cloud environments.</p>\n<p>Our offerings include le Chat, the AI assistant for life and work. We are a team passionate about AI and its potential to transform society.</p>\n<p>Role Summary</p>\n<p>Our compute footprint is growing fast to support our science and engineering teams. 
We’re hiring a Datacenter HW Engineer to maintain, troubleshoot, and scale our GPU/CPU clusters safely and reliably.</p>\n<p>What you will do</p>\n<ul>\n<li>Diagnose &amp; operate core server/cluster components - Investigate and handle compute/storage hardware issues (CPU, memory, drives, NICs, GPUs, PSUs) and interconnect problems (switches, cables, transceivers; Ethernet/InfiniBand).</li>\n<li>Safety &amp; procedures - Apply lockout/tagout (LOTO) and ESD discipline; follow pre/post-work checklists; maintain tidy, safe work areas.</li>\n<li>First-line diagnostics - Triage using LEDs, POST, beep codes and basic tests; capture evidence (photos, serials, results); open/update/close tickets with clear notes.</li>\n<li>Preventive maintenance - Provide feedback and ideas to improve proactive activities, monitoring, and targeted follow-ups on recurring or specific anomalies; help turn ad-hoc checks into SOPs, alerts, and dashboards.</li>\n<li>Parts &amp; logistics - Receive and track parts, keep labeled inventory accurate, manage simple RMAs, and coordinate with vendors.</li>\n<li>Collaboration &amp; escalation - Partner with senior hardware/firmware owners on complex or multi-node issues; communicate status and next steps crisply.</li>\n<li>Documentation &amp; quality - Keep SOPs/checklists current; ensure zero undocumented changes and consistent, audit-ready records.</li>\n</ul>\n<p>About you</p>\n<ul>\n<li>Hands-on mindset in datacenters/server hardware: you can install/re-seat/swap GPU/PCIe cards, NICs, PSUs, drives, and work cleanly in racks (rails, cabling, labeling).</li>\n<li>Disciplined and meticulous: follows checklists, ESD/LOTO; no rough handling; careful with all high-value server components.</li>\n<li>Practical electrical basics: power-off, PPE, short-circuit risk awareness.</li>\n<li>Comfortable in racks: cooling, network, storage, PDU, cable management; can lift/mount safely (within HSE limits).</li>\n<li>Clear communicator: short factual updates; 
reliable teammate; punctual and process-minded.</li>\n<li>Hardware-passionate, professionally grounded: strong curiosity and craft mindset.</li>\n</ul>\n<p>Nice to have</p>\n<ul>\n<li>HPC/AI/Cloud at scale experience (production environments), large-fleet/server install &amp; maintenance in datacenters.</li>\n<li>Basic networking (Ethernet/InfiniBand) and basic Linux (boot/check; no coding needed).</li>\n<li>Coding/automation skills (Python/Bash): small tools/scripts to improve checklists, photo/serial capture, inventory sync, or simple monitoring/reporting.</li>\n<li>Experience with inventory/RMA tools and vendor coordination.</li>\n<li>Exposure to HPC/research/industrial environments.</li>\n</ul>\n<p>What we offer</p>\n<ul>\n<li>Competitive salary and equity package</li>\n<li>Health insurance</li>\n<li>Transportation allowance</li>\n<li>Sport allowance</li>\n<li>Meal vouchers</li>\n<li>Private pension plan</li>\n<li>Generous parental leave policy</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_c8c20fa9-7f3","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai"},"x-apply-url":"https://jobs.lever.co/mistral/ddf7bcbb-e223-4768-a553-6e95df472cf7","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Datacenter hardware","Server hardware","GPU/CPU clusters","Networking","Linux","Scripting (Python/Bash)","Inventory/RMA tools","Vendor coordination"],"x-skills-preferred":["HPC/AI/Cloud at scale experience","Basic networking (Ethernet/InfiniBand)","Basic Linux (boot/check; no coding needed)","Coding/automation skills 
(Python/Bash)"],"datePosted":"2026-03-10T11:25:48.956Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Datacenter hardware, Server hardware, GPU/CPU clusters, Networking, Linux, Scripting (Python/Bash), Inventory/RMA tools, Vendor coordination, HPC/AI/Cloud at scale experience, Basic networking (Ethernet/InfiniBand), Basic Linux (boot/check; no coding needed), Coding/automation skills (Python/Bash)"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_7143feef-7ec"},"title":"Active Directory / Identity Engineer","description":"<p>We are seeking an experienced Active Directory SME and Azure identity lead to join our Global IT team. This is a lead role to work closely with Global IT members and regional staff.</p>\n<p>The role will have primary responsibility for technical decisions and guiding the company-wide AD architecture. 
This position will have a focus on on-premises directory services and identity management, as well as their cloud-based counterparts, such as Azure Active Directory.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Overall responsibility for the company-wide Active Directory infrastructure</li>\n<li>Plan and deploy Active Directory Domain Services inter-forest and intra-forest migrations</li>\n<li>Plan and deploy the extension of active directory to Azure domain services and future-proof the existing environment</li>\n<li>Plan, coordinate and execute any life cycle activities with regard to Active Directory and surrounding systems</li>\n<li>Plan and execute large-scale Active Directory Domain Services, Federation Services, and Certificate Services upgrades</li>\n<li>Assess the sizing and health of acquisition Active Directory deployments in support of consolidation efforts</li>\n<li>Design and deploy various identity management solutions such as Azure Active Directory Premium</li>\n<li>Design and implement solutions for customers based around core Microsoft technologies, such as DFS, DHCP, DNS, File and Print Services, IIS, etc.</li>\n<li>Keep up on emerging technologies and understand how they can add value to existing infrastructures</li>\n<li>Work with regional and infosec teams closely as SME for active directory</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Strong Risk &amp; Security awareness</li>\n<li>Strong verbal and written communication skills</li>\n<li>The ability to translate business needs into technical requirements</li>\n<li>Must love documentation. Creating detailed design or migration documents is a regular occurrence</li>\n<li>Expert-level Active Directory Domain Services experience including Azure AD</li>\n<li>Capable of designing and deploying AD Certificate Services and Azure domain services</li>\n<li>Strong scripting skills. 
PowerShell 3.0+ experience is very strongly preferred</li>\n<li>Must be able to speak about topics such as routing and switching, storage, virtualization, and server hardware as it relates to the above technologies and responsibilities</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_7143feef-7ec","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Keywords Group","sameAs":"https://apply.workable.com","logo":"https://logos.yubhub.co/j.com.png"},"x-apply-url":"https://apply.workable.com/j/BBAE60898D","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Active Directory Domain Services","Azure AD","PowerShell","AD Certificate Services","Azure domain services","DFS","DHCP","DNS","File and Print Services","IIS"],"x-skills-preferred":["Routing and switching","Storage","Virtualization","Server hardware"],"datePosted":"2026-03-09T10:50:34.529Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"IT","industry":"Technology","skills":"Active Directory Domain Services, Azure AD, PowerShell, AD Certificate Services, Azure domain services, DFS, DHCP, DNS, File and Print Services, IIS, Routing and switching, Storage, Virtualization, Server hardware"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9d1edc39-95f"},"title":"Senior Engineer, Datacenter Server Lifecycle","description":"<p><strong>About the Role</strong></p>\n<p>Anthropic is expanding beyond cloud infrastructure, and this role sits at the heart of that effort. 
As a Senior Engineer on the Datacenter Machine Lifecycle team, you will own the end-to-end operational journey of every machine in our facility — from initial provisioning and deployment, across its working life, through maintenance and refresh, and all the way to decommissioning. This is greenfield work: you will help define the processes, tooling, and operational standards that govern how we run and retire hardware at scale.</p>\n<p>A distinguishing aspect of this role is its deep intersection with security. The machines in our datacenter handle some of the most sensitive workloads in AI — training frontier models and serving millions of users interacting with Claude. Ensuring that every machine in the fleet is trusted, attested, and operating with a verified chain of integrity from the hardware up is a core part of the job, not an afterthought. You will partner closely with our Infrastructure Security team to define and enforce trusted compute standards across the lifecycle, from secure provisioning through end-of-life handling.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Lead the build-out of automation to support datacenters containing tens of thousands of servers.</li>\n<li>Own and define the end-to-end machine lifecycle strategy — from provisioning and deployment through operation, maintenance, refresh, and decommissioning — and maintain automation and operational procedures for common lifecycle events (e.g. 
hardware failures, firmware upgrades, fleet rotations).</li>\n<li>Partner closely with Infrastructure Security to design and enforce trusted compute standards across the machine lifecycle.</li>\n<li>Work closely with our Networking team to ensure end-to-end connectivity across all sites.</li>\n<li>Build and maintain tooling to track machine health, configuration, and operational status across the full datacenter fleet.</li>\n</ul>\n<p><strong>You May Be a Good Fit If You</strong></p>\n<ul>\n<li>Have 5+ years of experience in datacenter operations, hardware infrastructure management, or a closely related discipline.</li>\n<li>Have deep, hands-on experience with server hardware — including rack deployment, cabling, troubleshooting, and understanding failure modes at scale.</li>\n<li>Understand hardware lifecycle management end-to-end: asset tracking, provisioning workflows, maintenance scheduling, and decommissioning practices.</li>\n<li>Have strong proficiency in at least one programming language (e.g., Python, Rust, Go, or Java).</li>\n<li>Are comfortable navigating ambiguity and working independently to drive progress on complex, cross-functional problems.</li>\n<li>Communicate clearly and can build consensus with a wide range of stakeholders.</li>\n<li>Have working knowledge of modern cloud infrastructure, including Kubernetes, Infrastructure as Code, AWS, and GCP.</li>\n<li>Are comfortable with occasional travel to datacenter sites across North America.</li>\n</ul>\n<p><strong>Strong Candidates May Also Have</strong></p>\n<ul>\n<li>Hands-on experience with GPU or AI accelerator hardware (e.g. 
NVIDIA A100/H100, AMD MI300, Google TPUs, or AWS Trainium) and an understanding of their operational demands.</li>\n<li>Familiarity with modern provisioning tooling such as coreboot, LinuxBoot, or u-root.</li>\n<li>Experience building or contributing to datacenter automation or fleet management platforms.</li>\n<li>Experience building and deploying server operating system distributions across thousands of hosts.</li>\n<li>A background in large-scale capacity planning and hardware refresh strategy, ideally at a hyperscaler or large cloud provider.</li>\n<li>Experience with trusted compute and hardware security concepts such as secure boot, TPM, hardware attestation, and firmware verification — or a strong desire to develop deep expertise in this area.</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong> Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work. 
We think AI systems like the ones we&#39;re building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.</p>\n<p><strong>Your safety matters to us.</strong> To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_9d1edc39-95f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5131038008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£255,000 - £325,000GBP","x-skills-required":["datacenter operations","hardware infrastructure management","server hardware","programming language","cloud infrastructure","Kubernetes","Infrastructure as Code","AWS","GCP"],"x-skills-preferred":["GPU or AI accelerator hardware","modern provisioning tooling","datacenter automation","fleet management platforms","server operating system distributions","trusted compute and hardware security concepts"],"datePosted":"2026-03-08T13:45:45.465Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"datacenter operations, hardware infrastructure management, server hardware, programming language, cloud infrastructure, Kubernetes, Infrastructure as Code, AWS, GCP, GPU or AI accelerator hardware, modern provisioning tooling, 
datacenter automation, fleet management platforms, server operating system distributions, trusted compute and hardware security concepts","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":255000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f5e7e195-679"},"title":"Datacenter Hardware Operations Technician, AI Compute Infrastructure - Stargate","description":"<p><strong>Job Posting</strong></p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$86.4K – $228K</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local 
law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>OpenAI, in close collaboration with our capital partners, is embarking on a journey to build the world’s most advanced AI infrastructure ecosystem. Our Stargate program develops and deploys massive, state-of-the-art data center campuses in partnership with industry leaders such as Oracle today—and through future OpenAI infrastructure projects tomorrow. We design for scale, speed, and reliability, and we need experienced hardware professionals who can help ensure our high-density compute environment operates at peak performance.</p>\n<p><strong>About the Role</strong></p>\n<p>We are seeking a senior datacenter hardware operations technician to coordinate physical hardware activities at a large partner-operated campus. In this role you will work side-by-side with Oracle and their delivery teams, helping align OpenAI’s compute requirements with day-to-day hardware work on the ground. Rather than directing partner personnel, you will focus on collaboration, technical alignment, and shared problem solving, ensuring that maintenance, repairs, and lifecycle activities support the performance and reliability goals of both organizations. 
As the campus matures, you will help capture lessons learned and develop standards and playbooks to guide hardware operations at future OpenAI infrastructure projects.</p>\n<p><em>Candidates must be able to work onsite in Abilene, Texas, 5 days per week.</em></p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Serve as OpenAI’s primary on-site hardware contact, collaborating with Oracle teams and vendors to plan and coordinate maintenance, repairs, and lifecycle activities.</li>\n</ul>\n<ul>\n<li>Share technical requirements and verify that work performed supports OpenAI’s compute needs and agreed quality targets.</li>\n</ul>\n<ul>\n<li>Coordinate schedules, spare-parts planning, and issue escalation with partner teams to minimize downtime and keep operations running smoothly.</li>\n</ul>\n<ul>\n<li>Work with OpenAI fleet-health engineers to translate software-detected issues into on-site hardware actions in partnership with Oracle.</li>\n</ul>\n<ul>\n<li>Track hardware trends and provide joint recommendations with partner teams for design or operational improvements.</li>\n</ul>\n<ul>\n<li>Prepare documentation and runbooks that capture joint best practices and can be applied at additional campuses.</li>\n</ul>\n<ul>\n<li>Offer technical guidance and context to partner personnel while respecting their operational ownership.</li>\n</ul>\n<ul>\n<li>Collaborate with supply-chain teams to plan spares and manage hardware lifecycle activities.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Have 7+ years of experience in datacenter hardware operations, hardware engineering, or large-scale server maintenance, with at least 2 years in a senior or lead technician capacity.</li>\n</ul>\n<ul>\n<li>Bring deep knowledge of high-density server hardware, including x86 platforms, GPUs, storage devices, and power/cooling systems.</li>\n</ul>\n<ul>\n<li>Excel at diagnosing hardware issues, coordinating complex repairs, and maintaining strong working relationships across 
organizations.</li>\n</ul>\n<ul>\n<li>Are comfortable setting technical expectations and validating outcomes through collaboration, not direct management.</li>\n</ul>\n<ul>\n<li>Adapt quickly to changing operational conditions and enjoy solving problems at both the strategic and on-site levels.</li>\n</ul>\n<ul>\n<li>Communicate clearly and build trust across partner teams, vendors, and internal engineering stakeholders.</li>\n</ul>\n<ul>\n<li>Are willing to be based full-time at a partner-operated campus</li>\n</ul>\n<p><strong>Preferred Skills</strong></p>\n<ul>\n<li>Familiarity with large-scale cluster management or monitoring tools (IPMI, BMC, Prometheus, Nagios) to interpret alerts and coordinate partner responses.</li>\n</ul>\n<ul>\n<li>Experience with GPU-accelerated compute clusters or other high-performance computing hardware.</li>\n</ul>\n<ul>\n<li>Knowledge of Linux/Unix system administration and command-line diagnostic tools for hardware validation.</li>\n</ul>\n<ul>\n<li>Industry certifications such as CompTIA Server+, OEM hardware certifications, or equivalent.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f5e7e195-679","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/b9a4a809-a965-4dbe-aeef-6ce1593903dd","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$86.4K – $228K","x-skills-required":["datacenter hardware operations","hardware engineering","large-scale server maintenance","high-density server hardware","x86 platforms","GPUs","storage devices","power/cooling systems"],"x-skills-preferred":["large-scale cluster management","monitoring tools","IPMI","BMC","Prometheus","Nagios","GPU-accelerated compute 
clusters","Linux/Unix system administration","command-line diagnostic tools","industry certifications"],"datePosted":"2026-03-06T18:43:34.654Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote - US"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"datacenter hardware operations, hardware engineering, large-scale server maintenance, high-density server hardware, x86 platforms, GPUs, storage devices, power/cooling systems, large-scale cluster management, monitoring tools, IPMI, BMC, Prometheus, Nagios, GPU-accelerated compute clusters, Linux/Unix system administration, command-line diagnostic tools, industry certifications","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":86400,"maxValue":228000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_1c21aa9c-b37"},"title":"Software Engineer, Fleet Hardware Health","description":"<p><strong>Software Engineer, Fleet Hardware Health</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$230K – $490K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the team</strong></p>\n<p>The Fleet team at OpenAI supports the computing environment that powers 
our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Our work enables OpenAI’s models to operate seamlessly at scale, supporting both internal research and external products like ChatGPT. We prioritize safety, reliability, and responsible AI deployment over unchecked growth.</p>\n<p><strong>About the role</strong></p>\n<p>As a software engineer on the Fleet Hardware team, you will be responsible for the reliability and uptime of all of OpenAI’s compute fleet. Minimizing hardware failure is key to research training progress and stable services, as even a single hardware hiccup can cause significant disruptions. With increasingly large supercomputers, the stakes continue to rise.</p>\n<p>Being at the forefront of technology means that we are often the pioneers in troubleshooting these state-of-the-art systems at scale. This is a unique opportunity to work with cutting-edge technologies and devise innovative solutions to maintain the health and efficiency of our supercomputing infrastructure.</p>\n<p>Our team empowers strong engineers with a high degree of autonomy and ownership, as well as the ability to effect change. This role will require a keen focus on comprehensive system-level investigations and the development of automated solutions. 
We want people who go deep on problems, investigate as thoroughly as possible, and build automation for detection and remediation at scale.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Build and maintain automation systems for provisioning and managing server fleets.</li>\n</ul>\n<ul>\n<li>Develop tools to monitor server health, performance, and lifecycle events.</li>\n</ul>\n<ul>\n<li>Collaborate with clusters, networking, and infrastructure teams.</li>\n</ul>\n<ul>\n<li>Partner with external operators to ensure a high level of quality.</li>\n</ul>\n<ul>\n<li>Identify and fix performance bottlenecks and inefficiencies.</li>\n</ul>\n<ul>\n<li>Continuously improve automation to reduce manual work.</li>\n</ul>\n<p><strong>You might thrive in this role if you have:</strong></p>\n<ul>\n<li>Experience managing large-scale server environments.</li>\n</ul>\n<ul>\n<li>A balance of strengths in building and operationalizing.</li>\n</ul>\n<ul>\n<li>Proficiency in Python, Go, or similar languages.</li>\n</ul>\n<ul>\n<li>Strong Linux, networking, and server hardware knowledge.</li>\n</ul>\n<ul>\n<li>Comfort digging into noisy data with SQL, PromQL, and Pandas or any other tool.</li>\n</ul>\n<p><strong>Prior hardware expertise is not required for this role.</strong></p>\n<p><strong>Bonus Skills:</strong></p>\n<ul>\n<li>Experience with low level details of hardware components, protocols, and associated Linux tooling (e.g., PCIe, Infiniband, networking, power management, kernel perf tuning)</li>\n</ul>\n<ul>\n<li>Knowledge of hardware management protocols (e.g., IPMI, Redfish).</li>\n</ul>\n<ul>\n<li>High-performance computing (HPC) or distributed systems experience.</li>\n</ul>\n<ul>\n<li>Prior experience developing, managing, or designing hardware.</li>\n</ul>\n<ul>\n<li>Familiarity with monitoring tools (e.g., Prometheus, Grafana).</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that 
general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_1c21aa9c-b37","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/57551641-208c-48d9-bfb8-9a298d7e7510","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$230K – $490K • Offers Equity","x-skills-required":["Python","Go","Linux","networking","server hardware","SQL","PromQL","Pandas"],"x-skills-preferred":["low level details of hardware components","protocols","Linux tooling","hardware management protocols","high-performance computing","distributed systems","monitoring tools"],"datePosted":"2026-03-06T18:32:47.315Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Go, Linux, networking, server hardware, SQL, PromQL, Pandas, low level details of hardware components, protocols, Linux tooling, hardware management protocols, high-performance computing, distributed systems, monitoring 
tools","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":230000,"maxValue":490000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_795b8789-196"},"title":"Software Engineer, GPU Infrastructure - HPC","description":"<p><strong>Software Engineer, GPU Infrastructure - HPC</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$230K – $490K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local 
law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the team</strong></p>\n<p>The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Our work enables OpenAI’s models to operate seamlessly at scale, supporting both internal research and external products like ChatGPT. We prioritize safety, reliability, and responsible AI deployment over unchecked growth.</p>\n<p><strong>About the role</strong></p>\n<p>As a software engineer on the Fleet High Performance Computing (HPC) team, you will be responsible for the reliability and uptime of all of OpenAI’s compute fleet. Minimizing hardware failure is key to research training progress and stable services, as even a single hardware hiccup can cause significant disruptions. With increasingly large supercomputers, the stakes continue to rise.</p>\n<p>Being at the forefront of technology means that we are often the pioneers in troubleshooting these state-of-the-art systems at scale. 
This is a unique opportunity to work with cutting-edge technologies and devise innovative solutions to maintain the health and efficiency of our supercomputing infrastructure.</p>\n<p>Our team empowers strong engineers with a high degree of autonomy and ownership, as well as the ability to effect change. This role will require a keen focus on comprehensive system-level investigations and the development of automated solutions. We want people who go deep on problems, investigate as thoroughly as possible, and build automation for detection and remediation at scale.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Build and maintain automation systems for provisioning and managing server fleets.</li>\n</ul>\n<ul>\n<li>Develop tools to monitor server health, performance, and lifecycle events.</li>\n</ul>\n<ul>\n<li>Collaborate with clusters, networking, and infrastructure teams.</li>\n</ul>\n<ul>\n<li>Partner with external operators to ensure a high level of quality.</li>\n</ul>\n<ul>\n<li>Identify and fix performance bottlenecks and inefficiencies.</li>\n</ul>\n<ul>\n<li>Continuously improve automation to reduce manual work.</li>\n</ul>\n<p><strong>You might thrive in this role if you have:</strong></p>\n<ul>\n<li>Experience managing large-scale server environments.</li>\n</ul>\n<ul>\n<li>A balance of strengths in building and operationalizing.</li>\n</ul>\n<ul>\n<li>Proficiency in Python, Go, or similar languages.</li>\n</ul>\n<ul>\n<li>Strong Linux, networking, and server hardware knowledge.</li>\n</ul>\n<ul>\n<li>Comfort digging into noisy data with SQL, PromQL, and Pandas or any other tool.</li>\n</ul>\n<p><strong>Prior hardware expertise is not required for this role.</strong></p>\n<p><strong>Bonus Skills:</strong></p>\n<ul>\n<li>Experience with low-level details of hardware components, protocols, and associated Linux tooling (e.g., PCIe, InfiniBand, networking, power management, kernel perf tuning).</li>\n</ul>\n<ul>\n<li>Knowledge of hardware 
management protocols (e.g., IPMI, Redfish).</li>\n</ul>\n<ul>\n<li>High-performance computing (HPC) or distributed systems experience.</li>\n</ul>\n<ul>\n<li>Prior experience developing, managing, or designing hardware.</li>\n</ul>\n<ul>\n<li>Familiarity with monitoring tools (e.g., Prometheus, Grafana).</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_795b8789-196","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/f58cb1eb-9642-4a4d-a14d-d7a57d583a11","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$230K – $490K • Offers Equity","x-skills-required":["Python","Go","Linux","networking","server hardware","SQL","PromQL","Pandas"],"x-skills-preferred":["low level details of hardware components","protocols","Linux tooling","hardware management protocols","high-performance computing","distributed systems","monitoring tools"],"datePosted":"2026-03-06T18:28:35.701Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Go, Linux, 
networking, server hardware, SQL, PromQL, Pandas, low level details of hardware components, protocols, Linux tooling, hardware management protocols, high-performance computing, distributed systems, monitoring tools","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":230000,"maxValue":490000,"unitText":"YEAR"}}}]}