{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/low-level-configuration"},"x-facet":{"type":"skill","slug":"low-level-configuration","display":"Low Level Configuration","count":1},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_7520a7f6-8b6"},"title":"Member of Technical Staff - Infrastructure Reliability","description":"<p>We are seeking a Member of Technical Staff - Infrastructure Reliability to join our team. As a key member of our infrastructure team, you will own the availability, performance, and evolution of our core compute, storage, and networking infrastructure. 
This is a joint xAI/X role: you will own 24×7 reliability for the world&#39;s largest GPU training superclusters and one of the highest-QPS production systems on the planet.</p>\n<p>You will define and execute the technical strategy for infrastructure reliability and scalability; build and maintain the automation, observability, and control planes that keep multi-datacenter, hybrid cloud/on-prem environments healthy; lead incident response, deep-dive root cause analysis, and post-mortems that drive real fixes; identify, instrument, and eliminate systemic failure patterns; design and implement high-leverage systems software in Python and Rust; and push the state of the art in large-scale GPU cluster operations and AI workload reliability.</p>\n<p>To succeed in this role, you will need 5+ years of experience shipping production software and/or operating distributed infrastructure at scale; expert-level knowledge of Linux systems, TCP/IP networking, and systems programming; strong coding skills with proven production experience in Rust (strongly preferred) and at least one of Python, Go, or C++; deep experience with large-scale distributed systems in on-prem and cloud environments; hands-on expertise with container orchestration, container runtimes, and infrastructure-as-code; an intimate understanding of common failure modes in distributed systems and how to mitigate them; and a track record of participating in (or building) effective on-call rotations in high-stakes environments.</p>\n<p>In addition to a competitive base salary, you will receive equity; comprehensive medical, vision, and dental coverage; access to a 401(k) retirement plan; short- and long-term disability insurance; life insurance; and various other discounts and perks.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_7520a7f6-8b6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/4801451007","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000 - $400,000 USD","x-skills-required":["Linux systems","TCP/IP networking","systems programming","Rust","Python","Go","C++","container orchestration","container runtimes","infrastructure-as-code"],"x-skills-preferred":["high-performance networking","low level configuration","deployment","support","monitoring","administration","troubleshooting"],"datePosted":"2026-04-18T15:55:02.425Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo Alto, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux systems, TCP/IP networking, systems programming, Rust, Python, Go, C++, container orchestration, container runtimes, infrastructure-as-code, high-performance networking, low level configuration, deployment, support, monitoring, administration, troubleshooting","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":400000,"unitText":"YEAR"}}}]}