<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>8eda17fb-432</externalid>
      <Title>Senior HPC Engineer</Title>
      <Description><![CDATA[<p>We are seeking a Senior HPC Engineer for a hands-on role building and evolving large-scale, high-throughput HPC and GPU platforms that underpin AI- and machine-learning-driven research.</p>
<p>In this role, you will be part of a small, senior HPC team, taking end-to-end ownership of a significant area of the platform while collaborating closely with other subject-matter experts.</p>
<p>You will be a systems-level engineer who is comfortable owning complex technical decisions and designing and building production infrastructure, rather than advising from the sidelines.</p>
<p>We aim to build infrastructure that is reliable, understandable, and adaptable, and we value engineers who care about simplicity, clarity, and maintainability as much as raw performance.</p>
<p>Responsibilities:</p>
<ul>
<li>Design, build, and operate large-scale, high-throughput HPC and GPU clusters (for example, tens of thousands of CPU cores and hundreds of GPUs) supporting AI and machine-learning workloads.</li>
</ul>
<ul>
<li>Collaborate with other HPC engineers and subject-matter experts to co-design system architectures, review designs, and share knowledge.</li>
</ul>
<ul>
<li>Partner with storage specialists to architect and maintain high-performance, low-latency storage solutions, including parallel or scale-out file systems.</li>
</ul>
<ul>
<li>Work closely with researchers, data scientists, and engineers to understand computational needs and translate them into effective, scalable system designs.</li>
</ul>
<ul>
<li>Monitor, analyze, and optimize performance across compute, scheduling, networking, and storage layers.</li>
</ul>
<ul>
<li>Build and maintain automation and infrastructure-as-code for provisioning, configuration, monitoring, and lifecycle management, with an emphasis on repeatability and simplicity.</li>
</ul>
<ul>
<li>Participate in design reviews, operational discussions, and post-incident reviews with a focus on learning, collaboration, and system improvement rather than blame.</li>
</ul>
<ul>
<li>Explore alternative approaches to scheduling, data layout, cluster architectures, and GPU utilization through small experiments or prototypes, using data to guide decisions.</li>
</ul>
<ul>
<li>Produce clear documentation, diagrams, and reusable tooling that enable others to operate, debug, and extend the platform.</li>
</ul>
<ul>
<li>Stay current with advancements in HPC, GPU computing, networking, and storage, and help assess where new technologies can add real value.</li>
</ul>
<p>What you&#39;ll bring:</p>
<ul>
<li>Bachelor’s degree in Computer Science, Engineering, or a related technical field; a Master’s or PhD is a plus.</li>
</ul>
<ul>
<li>Typically 7+ years of hands-on experience designing, building, and operating HPC or large-scale compute environments.</li>
</ul>
<ul>
<li>Deep, practical experience with at least one major HPC scheduler (such as Slurm), including using it to operate large-scale or high-throughput clusters in production.</li>
</ul>
<ul>
<li>Hands-on experience with GPU-accelerated computing, including NVIDIA GPUs and associated software ecosystems.</li>
</ul>
<ul>
<li>Strong Linux systems engineering skills and comfort working close to the operating system, drivers, and hardware.</li>
</ul>
<ul>
<li>Experience designing or operating high-performance storage systems, including parallel or scale-out file systems.</li>
</ul>
<ul>
<li>Curious, evidence-driven problem solving, including experimenting with different approaches and using data to inform decisions.</li>
</ul>
<ul>
<li>A collaborative working style that values listening, respectful discussion, and incorporating different perspectives, whether you are more quiet and reflective or more vocal in group settings.</li>
</ul>
<ul>
<li>Clear written and verbal communication skills, and an ability to explain complex ideas in a way that works for different audiences.</li>
</ul>
<ul>
<li>A strong sense of ownership for outcomes, paired with openness to feedback, learning, and evolving systems over time.</li>
</ul>
<p>Additional experience that may be helpful:</p>
<ul>
<li>Experience with Kubernetes, Run:ai, or other workload orchestration platforms alongside traditional HPC schedulers.</li>
</ul>
<ul>
<li>Familiarity with Lustre, GPFS / Spectrum Scale, or similar high-performance storage technologies.</li>
</ul>
<ul>
<li>Exposure to cloud-based HPC environments (e.g., GCP or other major cloud providers).</li>
</ul>
<ul>
<li>Experience supporting quantitative research, finance, or other demanding compute-intensive workloads.</li>
</ul>
<ul>
<li>Interest in applying AI or ML techniques to infrastructure (for example, optimization, anomaly detection, or predictive analysis).</li>
</ul>
<p>The estimated base salary range for this position is $175,000 to $250,000, which is specific to New York and may change in the future. Millennium offers a total compensation package that includes a base salary, a discretionary performance bonus, and comprehensive benefits.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$175,000 to $250,000</Salaryrange>
      <Skills>HPC, GPU, Linux, Slurm, NVIDIA, GPU-accelerated computing, High-performance storage systems, Parallel or scale-out file systems, Automation and infrastructure-as-code, Provisioning, Configuration, Monitoring, Lifecycle management</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>IT Infrastructure</Employername>
      <Employerlogo>https://logos.yubhub.co/mlp.eightfold.ai.png</Employerlogo>
      <Employerdescription>Millennium&apos;s Infrastructure organization designs, engineers, and operates a robust global computing platform supporting WorldQuant&apos;s quantitative research.</Employerdescription>
      <Employerwebsite>https://mlp.eightfold.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://mlp.eightfold.ai/careers/job/755955818333</Applyto>
      <Location>New York, New York, United States of America</Location>
      <Country></Country>
      <Postedate>2026-04-25</Postedate>
    </job>
  </jobs>
</source>