<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>2ab9c635-07a</externalid>
      <Title>Operations Engineer, Fleet Reliability</Title>
      <Description><![CDATA[<p>The Fleet Reliability Operations team is responsible for the day-to-day provisioning, management, and uptime of CoreWeave&#39;s ever-expanding fleet of server nodes. This team plays a central role in CoreWeave&#39;s growth strategy, configuring, updating, and remotely troubleshooting our highest-tier supercomputing clusters and their networking, delivery platform, and tooling dependencies.</p>
<p>We are seeking curious, creative, and persistent problem solvers to join our Fleet Reliability Operations team to help drive batches of server nodes through our provisioning and validation processes while efficiently and effectively troubleshooting node or cluster problems as they arise.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Configuring and maintaining large-scale high-performance supercomputing clusters running state-of-the-art GPUs</li>
<li>Troubleshooting hardware and software issues; escalating and coordinating as needed with data center, network, hardware, and platform teams to drive resolution</li>
<li>Monitoring and analyzing system performance and taking appropriate remediation actions for cloud health</li>
<li>Approaching work with flexibility and optimism, anticipating shifting business and technical priorities</li>
<li>Creating and maintaining documentation of team processes, knowledge, and best practices for system management</li>
<li>Thinking critically about day-to-day work and working collaboratively to improve team processes and efficiency</li>
</ul>
<p>As a member of our team, you will be part of a dynamic and fast-paced environment where you will have the opportunity to grow and develop your skills. We offer a competitive salary range of $83,000 to $110,000, as well as a comprehensive benefits package, including medical, dental, and vision insurance, company-paid life insurance, and flexible PTO.</p>
<p>If you are a motivated and detail-oriented individual who is passionate about working with cutting-edge technology, we encourage you to apply for this exciting opportunity.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$83,000 to $110,000</Salaryrange>
      <Skills>Linux system administration, Troubleshooting hardware and software issues, System maintenance tasks, Scripting languages (Bash, Python, PowerShell, etc.), Grafana, Prometheus, PromQL queries or similar observability platforms, Kubernetes administration, HPC - administering GPU-related workloads, Data center environments including server racks, HVAC systems, fiber trays</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud computing company that provides a platform for building and scaling AI applications.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4617382006</Applyto>
      <Location>New York, NY / Plano, TX / Bellevue, WA / Sunnyvale, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>1868194d-726</externalid>
      <Title>Operations Engineer, HPC Networking</Title>
      <Description><![CDATA[<p>In this role, you will support the deployment, monitoring, troubleshooting, and maintenance of large-scale InfiniBand fabrics, ensuring their stability and performance.</p>
<p>The ideal candidate will have a strong operations mindset, effective collaboration skills, and the ability to solve complex issues in a dynamic environment.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Regularly monitoring the performance and health of InfiniBand fabrics, including switches, host adapters, and nodes.</li>
<li>Investigating and resolving operational issues within InfiniBand fabrics, such as network connectivity problems and performance bottlenecks.</li>
<li>Assisting with the installation and operational bring-up of large InfiniBand fabrics in collaboration with onsite personnel and customer teams.</li>
<li>Performing routine maintenance and upgrades on InfiniBand switches and control plane components.</li>
<li>Collaborating with HPC cluster operations teams to provide troubleshooting and operational expertise.</li>
</ul>
<p>Investing in our people is one of our top priorities, and we value candidates who can bring their diverse experiences to our teams.</p>
<p>Minimum Qualifications:</p>
<ul>
<li>At least 1 year of experience with InfiniBand or similar networking technologies.</li>
<li>Solid understanding of networking concepts, including architectures, topologies, operational best practices, and troubleshooting.</li>
<li>Experience with Linux system administration and maintenance.</li>
<li>Proficiency in at least one scripting language.</li>
</ul>
<p>Preferred Qualifications:</p>
<ul>
<li>Hands-on experience with Nvidia UFM or similar fabric management tools.</li>
<li>Familiarity with the SLURM job scheduler and its role in HPC environments.</li>
<li>Experience with monitoring and visualization platforms such as Grafana or Prometheus.</li>
<li>Experience with operational tooling and automation frameworks like Ansible.</li>
<li>Knowledge of data center operations, including server racks and cabling.</li>
<li>Python or Bash scripting.</li>
</ul>
<p>Why CoreWeave? At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:</p>
<ul>
<li>Be Curious at Your Core</li>
<li>Act Like an Owner</li>
<li>Empower Employees</li>
<li>Deliver Best-in-Class Client Experiences</li>
<li>Achieve More Together</li>
</ul>
<p>We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization&#39;s growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too.</p>
<p>Come join us!</p>
<p>The base salary range for this role is $110,000 to $179,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$110,000 to $179,000</Salaryrange>
      <Skills>InfiniBand, Linux system administration, Scripting language, Networking concepts, Architectures, Topologies, Operational best practices, Troubleshooting, Nvidia UFM, SLURM job scheduler, Grafana, Prometheus, Ansible, Data center operations, Server racks, Cabling, Python, Bash scripting</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud computing company that provides a platform for building and scaling AI applications.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4673462006</Applyto>
      <Location>Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
  </jobs>
</source>