<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>c1dcea75-d5a</externalid>
      <Title>Member of Technical Staff - Infrastructure Engineer</Title>
      <Description><![CDATA[<p>We&#39;re looking for an experienced engineer to join our team in Freiburg, Germany or San Francisco, USA. As a Member of Technical Staff - Infrastructure Engineer, you will be responsible for maintaining and scaling our research infrastructure, ensuring health and optimizing components to extract peak performance from the system. You will also collaborate with research teams to deeply understand their infrastructure needs and design solutions that balance performance with cost efficiency.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Maintaining research infrastructure, ensuring health, and optimizing components to extract peak performance from the system (both on application and infrastructure side)</li>
<li>Scaling infrastructure to meet growing research demands while maintaining reliability and performance</li>
<li>Collaborating with research teams to deeply understand their infrastructure needs, and design solutions that balance performance with cost efficiency</li>
<li>Identifying and resolving performance bottlenecks and capacity hotspots through deep analysis of distributed systems at scale</li>
<li>Building and evolving telemetry and monitoring systems to provide deep visibility into infrastructure performance, utilization, and costs across our cloud and datacenter fleets</li>
<li>Participating in on-call rotations and incident response to maintain system reliability</li>
</ul>
<p>Technical focus includes:</p>
<ul>
<li>Python, Bash, Go</li>
<li>Kubernetes</li>
<li>Nvidia GPU drivers and operators</li>
<li>OTel, Prometheus</li>
</ul>
<p>Requirements include:</p>
<ul>
<li>Experience building or operating large-scale training platforms</li>
<li>Worked with large-scale compute clusters (GPUs)</li>
<li>Proven ability to debug performance and reliability issues across large distributed fleets</li>
<li>Strong problem-solving skills and ability to work independently</li>
<li>Strong communication skills and the ability to work effectively with both internal and external partners</li>
<li>Deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP</li>
<li>Experience with SLURM</li>
</ul>
<p>We offer a competitive base annual salary of $180,000-$300,000 USD and a hybrid work model with a meaningful in-person presence.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$180,000-$300,000 USD</Salaryrange>
      <Skills>Python, Bash, Go, Kubernetes, Nvidia GPU drivers, Nvidia GPU operators, OTel, Prometheus, Experience building or operating large-scale training platforms, Worked with large-scale compute clusters (GPUs), Proven ability to debug performance and reliability issues across large distributed fleets, Strong problem-solving skills and ability to work independently, Strong communication skills and the ability to work effectively with both internal and external partners, Deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP, Experience with SLURM</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Black Forest Labs</Employername>
      <Employerlogo>https://logos.yubhub.co/blackforestlabs.com.png</Employerlogo>
      <Employerdescription>Black Forest Labs develops foundational technologies for image and video creation, including Latent Diffusion, Stable Diffusion, and FLUX.</Employerdescription>
      <Employerwebsite>https://www.blackforestlabs.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/blackforestlabs/jobs/4925659008</Applyto>
      <Location>Freiburg (Germany), San Francisco (USA)</Location>
      <Country></Country>
      <Postedate>2026-04-17</Postedate>
    </job>
  </jobs>
</source>