<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>3b419874-946</externalid>
      <Title>Senior Production Engineer</Title>
      <Description><![CDATA[<p>Production Engineering ensures CoreWeave&#39;s cloud delivers world-class reliability, performance, and operational excellence. We are hiring a Senior Production Engineer to take direct, hands-on ownership of critical tooling that drives reliability and delivery success.</p>
<p>In this role, you will work broadly across the cloud stack designing, implementing, deploying, and operating systems that improve delivery velocity, service availability, and operational safety. You’ll be responsible for leading end-to-end technical projects, maintaining long-lived systems the team owns, and strengthening our operational foundations through durable engineering investments.</p>
<p>This is a role for someone who enjoys building, debugging, and operating production systems. You will collaborate closely with service owners, but your primary impact comes from the reliability, quality, and maturity of the systems you deliver and maintain over time.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Take hands-on ownership of critical systems and frameworks, driving their architecture, implementation, and long-term evolution.</li>
<li>Lead end-to-end delivery of engineering projects that improve availability, scalability, operational automation, and failure recovery.</li>
<li>Build and maintain observability, alerting, automated remediation, and resilience testing for the systems you support.</li>
<li>Participate in incident response as a subject-matter expert; drive deep root-cause investigations and implement lasting fixes.</li>
<li>Improve runbooks, sources of truth, deployment workflows, and operational tooling to harden production readiness.</li>
<li>Eliminate single points of failure and reduce operational toil through automation, refactors, and system redesigns.</li>
<li>Ship production code regularly in Python, Go, or similar languages, and participate in on-call rotations.</li>
<li>Maintain and mature long-term projects and frameworks owned by the team, ensuring they remain reliable, well-instrumented, and easy to operate.</li>
<li>Collaborate with platform teams to ensure new features and services integrate cleanly with our reliability best-practices and tooling.</li>
</ul>
<p><strong>What You’ve Worked On (Minimum Qualifications)</strong></p>
<ul>
<li>7+ years of engineering experience building and operating distributed systems or cloud platforms.</li>
<li>Demonstrated ability to debug complex production issues end-to-end, across services, infrastructure layers, and automation.</li>
<li>Strong programming or scripting ability (Python, Go, or similar), with experience shipping and operating production services and tools.</li>
<li>Deep knowledge of cloud-native technologies and distributed system patterns, particularly Kubernetes.</li>
<li>Experience with modern observability stacks: metrics, tracing, structured logs, SLOs/SLIs, and incident lifecycle practices.</li>
<li>A track record of successfully delivering hands-on reliability improvements through engineering execution.</li>
</ul>
<p><strong>Preferred Qualifications</strong></p>
<ul>
<li>Experience building internal tooling, frameworks, or automation that supports high-availability cloud operations.</li>
<li>Familiarity with DR/BCP, service tiering, capacity planning, or chaos engineering.</li>
<li>Background operating or building large-scale AI or GPU-accelerated infrastructure.</li>
<li>Experience maintaining multi-year ownership of foundational production systems.</li>
</ul>
<p><strong>Why CoreWeave</strong></p>
<p>At CoreWeave, we work hard, have fun, and move fast. You’ll join a team that values curiosity, ownership, and creative problem-solving. Production Engineering sits at the intersection of reliability and AI infrastructure, building systems that enable the world’s most powerful AI cloud.</p>
<p><strong>Core Values</strong></p>
<ul>
<li>Be Curious at Your Core</li>
<li>Act Like an Owner</li>
<li>Empower Employees</li>
<li>Deliver Best-in-Class Client Experiences</li>
<li>Achieve More Together</li>
</ul>
<p>We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization&#39;s growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!</p>
<p><strong>Compensation</strong></p>
<p>The base salary range for this role is 160,000 to 214,000 SGD. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>
<p><strong>What We Offer</strong></p>
<p>The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate, taking into account factors such as qualifications, experience, interview performance, and location.</p>
<p>In addition to a competitive salary, we offer a variety of benefits to support your needs, including:</p>
<ul>
<li>Medical, dental, and vision insurance - 100% paid for by CoreWeave</li>
<li>Company-paid Life Insurance</li>
<li>Voluntary supplemental life insurance</li>
<li>Short and long-term disability insurance</li>
<li>Flexible Spending Account</li>
<li>Health Savings Account</li>
<li>Tuition Reimbursement</li>
<li>Ability to Participate in Employee Stock Purchase Program (ESPP)</li>
<li>Mental Wellness Benefits through Spring Health</li>
<li>Family-Forming support provided by Carrot</li>
<li>Paid Parental Leave</li>
<li>Flexible, full-service childcare support with Kinside</li>
<li>401(k) with a generous employer match</li>
<li>Flexible PTO</li>
<li>Catered lunch each day in our office and data center locations</li>
<li>A casual work environment</li>
<li>A work culture focused on innovative disruption</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>160,000 to 214,000 SGD</Salaryrange>
      <Skills>cloud computing, distributed systems, Kubernetes, observability stacks, metrics, tracing, structured logs, SLOs/SLIs, incident lifecycle practices, Python, Go, engineering experience, internal tooling, frameworks, automation, DR/BCP, service tiering, capacity planning, chaos engineering, large-scale AI, GPU-accelerated infrastructure</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud computing company that provides a platform for building and scaling AI applications.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4675297006?utm_source=yubhub.co&amp;utm_medium=jobs_feed&amp;utm_campaign=apply</Applyto>
      <Location>Singapore</Location>
      <Country></Country>
      <Postedate>2026-04-24</Postedate>
    </job>
    <job>
      <externalid>c0569537-539</externalid>
      <Title>Staff Backend Engineer, GitLab Delivery: Upgrades</Title>
      <Description><![CDATA[<p>As a Staff Engineer on the GitLab Delivery - Upgrades team, you&#39;ll guide the technical direction for GitLab&#39;s self-managed deployment strategy so customers can deploy, upgrade, and run GitLab reliably in their own infrastructure with minimal disruption.</p>
<p>You&#39;ll serve as a technical anchor for the team, working closely with your engineering manager, product manager, and partners across Site Reliability Engineering, Release, Security, and Development to shape cloud-native, operator-driven deployment patterns that reduce operational complexity and upgrade friction.</p>
<p>In your first year, you&#39;ll help define the architecture for zero-downtime upgrades, strengthen observability and reliability practices, and guide the next generation of deployment automation for self-managed GitLab environments.</p>
<p>Some examples of our projects:</p>
<ul>
<li>Evolving GitLab Operator and Helm charts to support zero-downtime upgrades for complex, stateful GitLab installations</li>
<li>Advancing the GitLab Environment Toolkit to simplify large-scale, production-ready self-managed deployments</li>
</ul>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Guide the technical vision and architecture for GitLab&#39;s cloud-native, self-managed deployments and upgrade workflows.</li>
<li>Establish operational maturity standards, service integration patterns, and deployment models that help development teams manage the lifecycle of their components.</li>
<li>Design and maintain Kubernetes Operators, Helm charts, and upgrade orchestration tooling for self-managed GitLab deployments across varied environments.</li>
<li>Develop automation and integration frameworks for database migrations, rolling deployments, compatibility checks, and rollback paths.</li>
<li>Define database and application lifecycle strategies, including safe PostgreSQL migration approaches and validation mechanisms that reduce downtime risk.</li>
<li>Work with Product Management, GitLab.com Site Reliability Engineering, GitLab Dedicated, and development teams to align deployment patterns with customer needs.</li>
<li>Mentor engineers and enable customer-facing teams through design reviews, code reviews, documentation, and runbooks.</li>
<li>Drive observability, testing, performance, and resilience practices for self-managed deployments, and contribute to incident response and post-incident learning.</li>
</ul>
<p><strong>Requirements</strong></p>
<ul>
<li>Strong software engineering experience designing and delivering production systems that customers install and operate in their own infrastructure.</li>
<li>Proficiency in Go for large, complex codebases; familiarity with Ruby on Rails and Rails application architecture is a plus.</li>
<li>Hands-on experience with Kubernetes in production, including building and maintaining Operators, designing Helm charts for stateful applications, and working with Custom Resource Definitions, admission controllers, and controller patterns.</li>
<li>Knowledge of cloud-native systems and tooling, such as service mesh, observability stacks, infrastructure as code, and automation tools like Terraform or Ansible.</li>
<li>Experience with stateful workloads and databases, including PostgreSQL schema design and migrations, persistent volumes, storage classes, and approaches for reducing downtime during upgrades.</li>
<li>Understanding of Linux systems and production operations, including package management, systemd, system-level debugging, observability, incident response, and on-call participation.</li>
<li>Ability to guide through influence, including writing clear technical proposals, documenting decisions, mentoring engineers, and working effectively across teams.</li>
<li>Interest in open source infrastructure or deployment tooling, or transferable experience from adjacent domains, with the ability to explain technical concepts clearly to different audiences.</li>
</ul>
<p><strong>About the Team</strong></p>
<p>The Delivery - Upgrades team sits within GitLab Delivery and focuses on delivering GitLab to self-managed users through supported, validated deployment tooling. We own and evolve the GitLab Omnibus package, Helm charts, GitLab Operator, and the GitLab Environment Toolkit, and we work asynchronously across regions with partners in Site Reliability Engineering, Release, Security, and Development.</p>
<p>Our work centers on enabling zero-downtime upgrades, reducing operational complexity at scale, supporting GitLab’s cloud-native transition while continuing to serve existing deployments, and improving the upgrade experience for customers running GitLab in diverse environments.</p>
<p>For more on how we work, see [Link: Team Handbook Page].</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Go, Ruby on Rails, Kubernetes, Cloud-native systems, Service mesh, Observability stacks, Infrastructure as code, Automation tools, Linux systems, Production operations, Package management, Systemd, System-level debugging, Incident response, On-call participation</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>GitLab</Employername>
      <Employerlogo>https://logos.yubhub.co/about.gitlab.com.png</Employerlogo>
      <Employerdescription>GitLab is a software development platform that provides tools for version control, issue tracking, and project management. With over 50 million registered users and more than 50% of the Fortune 100 trusting GitLab, it is a large and established company.</Employerdescription>
      <Employerwebsite>https://about.gitlab.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/gitlab/jobs/8463922002?utm_source=yubhub.co&amp;utm_medium=jobs_feed&amp;utm_campaign=apply</Applyto>
      <Location>Remote, India</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>15a29cc3-0bf</externalid>
      <Title>Senior Production Engineer</Title>
      <Description><![CDATA[<p>CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025.</p>
<p><strong>About the Role</strong></p>
<p>Production Engineering ensures CoreWeave’s cloud delivers world-class reliability, performance, and operational excellence. We are hiring a Senior Production Engineer to take direct, hands-on ownership of critical tooling that drives reliability and delivery success.</p>
<p>In this role, you will work broadly across the cloud stack designing, implementing, deploying, and operating systems that improve delivery velocity, service availability, and operational safety. You’ll be responsible for leading end-to-end technical projects, maintaining long-lived systems the team owns, and strengthening our operational foundations through durable engineering investments.</p>
<p>This is a role for someone who enjoys building, debugging, and operating production systems. You will collaborate closely with service owners, but your primary impact comes from the reliability, quality, and maturity of the systems you deliver and maintain over time.</p>
<p><strong>What You’ll Do</strong></p>
<ul>
<li>Take hands-on ownership of critical systems and frameworks, driving their architecture, implementation, and long-term evolution.</li>
<li>Lead end-to-end delivery of engineering projects that improve availability, scalability, operational automation, and failure recovery.</li>
<li>Build and maintain observability, alerting, automated remediation, and resilience testing for the systems you support.</li>
<li>Participate in incident response as a subject-matter expert; drive deep root-cause investigations and implement lasting fixes.</li>
<li>Improve runbooks, sources of truth, deployment workflows, and operational tooling to harden production readiness.</li>
<li>Eliminate single points of failure and reduce operational toil through automation, refactors, and system redesigns.</li>
<li>Ship production code regularly in Python, Go, or similar languages, and participate in on-call rotations.</li>
<li>Maintain and mature long-term projects and frameworks owned by the team, ensuring they remain reliable, well-instrumented, and easy to operate.</li>
<li>Collaborate with platform teams to ensure new features and services integrate cleanly with our reliability best-practices and tooling.</li>
</ul>
<p><strong>What You’ve Worked On (Minimum Qualifications)</strong></p>
<ul>
<li>7+ years of engineering experience building and operating distributed systems or cloud platforms.</li>
<li>Demonstrated ability to debug complex production issues end-to-end, across services, infrastructure layers, and automation.</li>
<li>Strong programming or scripting ability (Python, Go, or similar), with experience shipping and operating production services and tools.</li>
<li>Deep knowledge of cloud-native technologies and distributed system patterns, particularly Kubernetes.</li>
<li>Experience with modern observability stacks: metrics, tracing, structured logs, SLOs/SLIs, and incident lifecycle practices.</li>
<li>A track record of successfully delivering hands-on reliability improvements through engineering execution.</li>
</ul>
<p><strong>Preferred Qualifications</strong></p>
<ul>
<li>Experience building internal tooling, frameworks, or automation that supports high-availability cloud operations.</li>
<li>Familiarity with DR/BCP, service tiering, capacity planning, or chaos engineering.</li>
<li>Background operating or building large-scale AI or GPU-accelerated infrastructure.</li>
<li>Experience maintaining multi-year ownership of foundational production systems.</li>
</ul>
<p><strong>Why CoreWeave?</strong></p>
<p>At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:</p>
<ul>
<li>Be Curious at Your Core</li>
<li>Act Like an Owner</li>
<li>Empower Employees</li>
<li>Deliver Best-in-Class Client Experiences</li>
<li>Achieve More Together</li>
</ul>
<p>We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization&#39;s growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!</p>
<p>The base salary range for this role is $139,000 to $204,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>
<p><strong>What We Offer</strong></p>
<p>The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate, taking into account factors such as qualifications, experience, interview performance, and location.</p>
<p>In addition to a competitive salary, we offer a variety of benefits to support your needs, including:</p>
<ul>
<li>Medical, dental, and vision insurance - 100% paid for by CoreWeave</li>
<li>Company-paid Life Insurance</li>
<li>Voluntary supplemental life insurance</li>
<li>Short and long-term disability insurance</li>
<li>Flexible Spending Account</li>
<li>Health Savings Account</li>
<li>Tuition Reimbursement</li>
<li>Ability to Participate in Employee Stock Purchase Program (ESPP)</li>
<li>Mental Wellness Benefits through Spring Health</li>
<li>Family-Forming support provided by Carrot</li>
<li>Paid Parental Leave</li>
<li>Flexible, full-service childcare support with Kinside</li>
<li>401(k) with a generous employer match</li>
<li>Flexible PTO</li>
<li>Catered lunch each day in our office and data center locations</li>
<li>A casual work environment</li>
<li>A work culture focused on innovative disruption</li>
</ul>
<p><strong>Our Workplace</strong></p>
<p>While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$139,000 to $204,000</Salaryrange>
      <Skills>cloud computing, distributed systems, cloud platforms, Kubernetes, observability stacks, metrics, tracing, structured logs, SLOs/SLIs, incident lifecycle practices, Python, Go, programming, scripting, production services, tools, internal tooling, frameworks, automation, high-availability cloud operations, DR/BCP, service tiering, capacity planning, chaos engineering, large-scale AI, GPU-accelerated infrastructure</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud computing company that provides a platform for building and scaling AI applications.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4670172006?utm_source=yubhub.co&amp;utm_medium=jobs_feed&amp;utm_campaign=apply</Applyto>
      <Location>Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>9701c504-1a6</externalid>
      <Title>Senior Software Engineer I, Inference</Title>
      <Description><![CDATA[<p>We&#39;re looking for a Senior Software Engineer I to join our team. As a senior engineer, you&#39;ll lead designs, raise engineering standards, and deliver measurable improvements to latency, throughput, and reliability across multiple services. You&#39;ll partner with product, orchestration, and hardware teams to evolve our Kubernetes-native inference platform and meet strict P99 SLAs at scale.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Lead design reviews and drive architecture within the team; decompose multi-service work into clear milestones.</li>
<li>Define and own SLIs/SLOs; ensure post-incident actions land and reliability improves release-over-release.</li>
<li>Implement advanced optimizations (e.g., micro-batch schedulers, speculative decoding, KV-cache reuse) and quantify impact.</li>
<li>Strengthen incident posture: capacity planning, autoscaling policy, graceful degradation, rollback/traffic-shift strategies.</li>
<li>Mentor IC1/IC2 engineers; review cross-team designs and elevate coding/testing standards.</li>
</ul>
<p>Requirements include:</p>
<ul>
<li>3-5 years of industry experience building distributed systems or cloud services.</li>
<li>Strong coding in Python or Go (C++ a plus) and deep familiarity with networked systems and performance.</li>
<li>Hands-on experience with Kubernetes at production scale, CI/CD, and observability stacks (Prometheus, Grafana, OpenTelemetry).</li>
<li>Practical knowledge of inference internals: batching, caching, mixed precision (BF16/FP8), streaming token delivery.</li>
<li>Proven track record improving tail latency (P95/P99) and service reliability through metrics-driven work.</li>
</ul>
<p>Preferred qualifications include contributions to inference frameworks, experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies, and leading multi-team initiatives or partnering with customers on mission-critical launches.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$139,000 to $204,000</Salaryrange>
      <Skills>Python, Go, Kubernetes, CI/CD, Observability stacks, Inference internals, Batching, Caching, Mixed precision, Streaming token delivery, Contributions to inference frameworks, CUDA kernels, NCCL/SHARP, RDMA/NUMA, GPU interconnect topologies</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud computing company that provides a platform for building and scaling AI. It was founded in 2017 and became a publicly traded company in March 2025.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4647603006?utm_source=yubhub.co&amp;utm_medium=jobs_feed&amp;utm_campaign=apply</Applyto>
      <Location>Sunnyvale, CA / Bellevue, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>40d32156-365</externalid>
      <Title>Reliability Lead, Common Services</Title>
      <Description><![CDATA[<p>As Reliability Lead, Common Services, you will establish and lead the Reliability Engineering and production operations practice for the Common Services organization. You&#39;ll partner closely with engineering leaders and teams across Common Services to define how we build, release, monitor, and operate critical services, raising the bar on reliability, availability, and operational excellence across the board.</p>
<p>In this role, you will:</p>
<ul>
<li>Establish and lead the SRE / production engineering practice for the Common Services organization, including standards for reliability, incident management, and on-call, in partnership with the central Product Engineering organization.</li>
<li>Develop an Operational Excellence strategy that focuses not only on improving system performance but also on monitoring and reducing operational toil.</li>
<li>Partner with engineering and product teams to define SLOs, SLIs, and error budgets for critical Common Services, and ensure these become part of how teams plan and make tradeoffs.</li>
<li>Own and improve the incident management lifecycle for Common Services, including on-call rotations, escalation paths, incident tooling, post-incident reviews, and follow-through on corrective actions.</li>
<li>Drive the observability strategy (metrics, logs, traces, dashboards, alerts) for Common Services, ensuring we have actionable visibility into the health, performance, and capacity of key systems.</li>
<li>Collaborate with engineering leads to design and review architectures for reliability, scalability, resilience, and operability, including failure modes, redundancy, and graceful degradation.</li>
<li>Lead efforts to automate and harden operational workflows, including deployments, rollbacks, configuration management, change management, and routine maintenance tasks.</li>
<li>Build strong, trust-based relationships with partner teams and stakeholders, becoming a go-to leader for production readiness and operational risk within Common Services.</li>
<li>Hire, mentor, and develop SRE and production engineering talent, fostering a culture of continuous improvement, learning from incidents, and humane on-call.</li>
<li>Partner with other SRE and production engineering leaders across CoreWeave to align on global practices, tools, and reliability goals, representing the needs and constraints of Common Services.</li>
</ul>
<p>You will be responsible for defining the reliability strategy, processes, and standards for the Common Services portfolio and driving consistent, high-quality operational practices across multiple teams.</p>
<p>The base salary range for this role is $206,000 to $303,000.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$206,000 to $303,000</Salaryrange>
      <Skills>Site Reliability Engineering, Production Engineering, Linux-based production environments, Containers, Orchestration technologies, Observability stacks, Alerting systems, SLIs/SLOs, Error budgets, Incident management, On-call rotations, Escalation paths, Post-incident reviews, Corrective actions, Automation tooling, Infrastructure-as-code, CI/CD pipelines, GPU workloads, High-performance computing, Latency/throughput-sensitive systems, Multi-tenant environments, Multi-region environments, Regulated environments, Service ownership models, Mentoring, Managing senior engineers</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud computing company that provides a platform for AI development and deployment.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4650165006?utm_source=yubhub.co&amp;utm_medium=jobs_feed&amp;utm_campaign=apply</Applyto>
      <Location>New York, NY / Sunnyvale, CA / Bellevue, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>86622b48-10e</externalid>
      <Title>Software Engineer, Site Reliability</Title>
      <Description><![CDATA[<p>We are looking for a Site Reliability Engineer who thinks like a software engineer first. You will own critical production systems end-to-end, designing, building, and improving them rather than simply operating them. You will write production-quality code that keeps the platform reliable at scale, embed with product engineering teams to influence architecture from the start, and build the internal tooling that every engineer at Hebbia depends on.</p>
<p>Responsibilities:</p>
<ul>
<li>Own critical production services end-to-end, from design and code review through deployment, operation, and incident response</li>
<li>Profile, benchmark, and rewrite hot paths to eliminate bottlenecks as Hebbia scales</li>
<li>Lead incident response and drive post-mortem culture, translating findings into code changes and architectural improvements rather than runbooks</li>
<li>Design and build observability frameworks from scratch, writing custom instrumentation, alerting logic, and debugging tooling that surfaces production issues before customers feel them</li>
<li>Define and enforce SLOs across platform services and build the feedback loops that keep engineering teams accountable to them</li>
<li>Own capacity planning and cost efficiency: model growth, right-size infrastructure, and write automation that prevents over-provisioning and resource exhaustion</li>
<li>Build robust, well-tested internal platforms and deployment tooling held to the same engineering standards as customer-facing code</li>
<li>Own and continuously improve CI/CD systems so engineering teams can ship safely and quickly</li>
<li>Embed with product engineering teams as a peer software engineer, contributing directly to production codebases and co-designing systems for reliability from the start</li>
<li>Partner on infrastructure security through threat modeling, hardening, and automated compliance tooling</li>
</ul>
<p>Who You Are:</p>
<ul>
<li>5+ years of software development experience, with a track record of writing, shipping, and maintaining production services, not just operating infrastructure</li>
<li>Production-grade proficiency in at least one systems or backend language: Go, Python, C++, or Rust</li>
<li>Proven experience as a Production Engineer, SRE, or software engineer with a deep infrastructure focus, comfortable owning services end-to-end across the full stack</li>
<li>Deep understanding of distributed systems</li>
<li>Container orchestration expertise and hands-on experience debugging complex distributed failures in production</li>
<li>Working knowledge of OS-level concepts</li>
<li>Cloud platform fluency (AWS preferred)</li>
<li>Experience in building and maintaining observability stacks</li>
<li>Strong CI/CD pipeline expertise and a track record of improving developer velocity without sacrificing safety</li>
<li>Background at a company with a Production Engineering or software-focused SRE culture is a strong plus</li>
<li>Experience building platforms for AI/ML workloads or high-throughput document processing pipelines is a plus</li>
</ul>
<p>Compensation:
The salary range for this role is $160,000 to $300,000. This range may be inclusive of several career levels at Hebbia and will be narrowed during the interview process based on the candidate’s experience and qualifications. Adjustments outside of this range may be considered for candidates whose qualifications significantly differ from those outlined in the job description.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$160,000 - $300,000</Salaryrange>
      <Skills>Go, Python, C++, Rust, Distributed systems, Container orchestration, OS-level concepts, Cloud platform fluency (AWS), Observability stacks, CI/CD pipeline expertise</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Hebbia</Employername>
      <Employerlogo>https://logos.yubhub.co/hebbia.com.png</Employerlogo>
      <Employerdescription>Hebbia is an AI platform that generates alpha and drives upside for investors and bankers. It was founded in 2020 and is backed by Peter Thiel and Andreessen Horowitz.</Employerdescription>
      <Employerwebsite>https://hebbia.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/hebbia/jobs/4666955005?utm_source=yubhub.co&amp;utm_medium=jobs_feed&amp;utm_campaign=apply</Applyto>
      <Location>New York City; San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-04-17</Postedate>
    </job>
    <job>
      <externalid>089e27b0-40a</externalid>
      <Title>Backend Engineer</Title>
      <Description><![CDATA[<p>We&#39;re looking for a skilled Backend Engineer to join our Data Infrastructure engineering organisation. As a member of this team, you will play a key role in helping us build analytics for internal and music industry-facing tools. Our platform enables capabilities ranging from showing artists how many streams their latest release has to informing internal teams about their cloud resource usage.</p>
<p>As a Backend Engineer, you will help us exemplify, measure, and raise the reliability of the data infrastructure used by squads across different verticals within Spotify. You&#39;ll work closely with engineers to provide the OLAP capabilities they need to build dynamic, reliable data visualizations, and you will share responsibility with them for diagnosing, resolving, and preventing production issues.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Building, operating, and evolving data analytics platforms that include backend services as well as OLAP data stores (Druid) for teams building analytics across Spotify.</li>
<li>Building internal tooling, libraries, and services that streamline integration patterns with our analytics platform.</li>
<li>Advocating for best practices in service design, data modeling, schema evolution, and contract testing to ensure long-term maintainability.</li>
<li>Working in an autonomous, multi-functional environment and collaborating with squads across Spotify to continuously iterate and deliver on new product objectives.</li>
</ul>
<p>To succeed in this role, you will need:</p>
<ul>
<li>3+ years of relevant experience with distributed datastores and backend services.</li>
<li>Proficiency in Java and a willingness to learn Kubernetes and Terraform.</li>
<li>Understanding of data modeling, dimensional schemas, and analytical query patterns.</li>
<li>Experience building internal developer tools, libraries, or shared services that support large engineering organisations.</li>
<li>A strong sense of ownership of service quality, SLOs, and operational excellence.</li>
<li>Familiarity with OLAP databases or analytics warehouses (e.g., Druid, ClickHouse, Pinot, BigQuery, Snowflake).</li>
<li>Comfort with metrics-driven development and observability stacks (Prometheus, Grafana, similar).</li>
<li>Excellent communication and interpersonal skills, with the ability to work effectively with cross-functional teams.</li>
</ul>
<p>In return, we offer a competitive salary range of $125,562-$179,374, plus equity, as well as a comprehensive benefits package including health insurance, six months&#39; paid parental leave, 401(k) retirement plan, monthly meal allowance, 23 paid days off, 13 paid flexible holidays, and paid sick leave.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$125,562-$179,374</Salaryrange>
      <Skills>Java, Kubernetes, Terraform, OLAP databases, Analytics warehouses, Metrics-driven development, Observability stacks</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Spotify</Employername>
      <Employerlogo>https://logos.yubhub.co/spotify.com.png</Employerlogo>
      <Employerdescription>Spotify is a music streaming service with millions of users worldwide.</Employerdescription>
      <Employerwebsite>https://www.spotify.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.lever.co/spotify/66492688-d5b0-4cf8-b1a4-4a715157edd9?utm_source=yubhub.co&amp;utm_medium=jobs_feed&amp;utm_campaign=apply</Applyto>
      <Location>NYC</Location>
      <Country></Country>
      <Postedate>2026-03-31</Postedate>
    </job>
    <job>
      <externalid>2d29b93a-388</externalid>
      <Title>Software Engineer III</Title>
      <Description><![CDATA[<p>We are seeking an experienced Software Engineer skilled in Python and/or Golang to build and maintain automation, APIs, and services that manage the lifecycle of our infrastructure platforms.</p>
<p><strong>What you&#39;ll do</strong></p>
<ul>
<li>Design, develop, and maintain scalable APIs and automation services in Python and/or Go to manage platform operations (e.g., provisioning, configuration, access control, DNS updates).</li>
<li>Automate infrastructure lifecycle workflows across Kubernetes/OpenShift and related systems.</li>
</ul>
<p><strong>What you need</strong></p>
<ul>
<li>5+ years of professional software engineering experience using Python and/or Go (Golang).</li>
<li>Proven experience designing and implementing RESTful or gRPC APIs.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$119,600 - $167,300 CAD</Salaryrange>
      <Skills>Python, Golang, Kubernetes, OpenShift, GitOps, CI/CD pipelines, containerization, Linux-based systems, PKI, certificate management, secrets automation, infrastructure-as-code, policy-as-code, DNS automation, RBAC, access control systems, observability stacks</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Electronic Arts</Employername>
      <Employerlogo>https://logos.yubhub.co/jobs.ea.com.png</Employerlogo>
      <Employerdescription>Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. Here, everyone is part of the story. Part of a community that connects across the globe. A place where creativity thrives, new perspectives are invited, and ideas matter.</Employerdescription>
      <Employerwebsite>https://jobs.ea.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ea.com/en_US/careers/JobDetail/Software-Engineer-III/212100?utm_source=yubhub.co&amp;utm_medium=jobs_feed&amp;utm_campaign=apply</Applyto>
      <Location>Vancouver</Location>
      <Country></Country>
      <Postedate>2026-01-24</Postedate>
    </job>
  </jobs>
</source>