{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/observability-stacks"},"x-facet":{"type":"skill","slug":"observability-stacks","display":"Observability Stacks","count":11},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c0569537-539"},"title":"Staff Backend Engineer, Gitlab Delivery: Upgrades","description":"<p>As a Staff Engineer on the GitLab Delivery - Upgrades team, you&#39;ll guide the technical direction for GitLab&#39;s self-managed deployment strategy so customers can deploy, upgrade, and run GitLab reliably in their own infrastructure with minimal disruption.</p>\n<p>You&#39;ll serve as a technical anchor for the team, working closely with your engineering manager, product manager, and partners across Site Reliability Engineering, Release, Security, and Development to shape cloud-native, operator-driven deployment patterns that reduce operational complexity and upgrade friction.</p>\n<p>In your first year, you&#39;ll help define the architecture for zero-downtime upgrades, strengthen observability and reliability practices, and guide the next generation of deployment automation for self-managed GitLab environments.</p>\n<p>Some examples of our projects:</p>\n<ul>\n<li>Evolving GitLab Operator and Helm charts to support zero-downtime upgrades for complex, stateful GitLab installations</li>\n</ul>\n<ul>\n<li>Advancing the GitLab Environment Toolkit to simplify 
large-scale, production-ready self-managed deployments</li>\n</ul>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Guide the technical vision and architecture for GitLab&#39;s cloud-native, self-managed deployments and upgrade workflows.</li>\n</ul>\n<ul>\n<li>Establish operational maturity standards, service integration patterns, and deployment models that help development teams manage the lifecycle of their components.</li>\n</ul>\n<ul>\n<li>Design and maintain Kubernetes Operators, Helm charts, and upgrade orchestration tooling for self-managed GitLab deployments across varied environments.</li>\n</ul>\n<ul>\n<li>Develop automation and integration frameworks for database migrations, rolling deployments, compatibility checks, and rollback paths.</li>\n</ul>\n<ul>\n<li>Define database and application lifecycle strategies, including safe PostgreSQL migration approaches and validation mechanisms that reduce downtime risk.</li>\n</ul>\n<ul>\n<li>Work with Product Management, GitLab.com Site Reliability Engineering, GitLab Dedicated, and development teams to align deployment patterns with customer needs.</li>\n</ul>\n<ul>\n<li>Mentor engineers and enable customer-facing teams through design reviews, code reviews, documentation, and runbooks.</li>\n</ul>\n<ul>\n<li>Drive observability, testing, performance, and resilience practices for self-managed deployments, and contribute to incident response and post-incident learning.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Strong software engineering experience designing and delivering production systems that customers install and operate in their own infrastructure.</li>\n</ul>\n<ul>\n<li>Proficiency in Go for large, complex codebases, with familiarity with Ruby on Rails and Rails application architecture as a useful addition.</li>\n</ul>\n<ul>\n<li>Hands-on experience with Kubernetes in production, including building and maintaining Operators, designing Helm charts for stateful applications, and 
working with Custom Resource Definitions, admission controllers, and controller patterns.</li>\n</ul>\n<ul>\n<li>Knowledge of cloud-native systems and tooling, such as service mesh, observability stacks, infrastructure as code, and automation tools like Terraform or Ansible.</li>\n</ul>\n<ul>\n<li>Experience with stateful workloads and databases, including PostgreSQL schema design and migrations, persistent volumes, storage classes, and approaches for reducing downtime during upgrades.</li>\n</ul>\n<ul>\n<li>Understanding of Linux systems and production operations, including package management, systemd, system-level debugging, observability, incident response, and on-call participation.</li>\n</ul>\n<ul>\n<li>Ability to guide through influence, including writing clear technical proposals, documenting decisions, mentoring engineers, and working effectively across teams.</li>\n</ul>\n<ul>\n<li>Interest in open source infrastructure or deployment tooling, or transferable experience from adjacent domains, with the ability to explain technical concepts clearly to different audiences.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>The Delivery - Upgrades team sits within GitLab Delivery and focuses on delivering GitLab to self-managed users through supported, validated deployment tooling. 
We own and evolve the GitLab Omnibus package, Helm charts, GitLab Operator, and the GitLab Environment Toolkit, and we work asynchronously across regions with partners in Site Reliability Engineering, Release, Security, and Development.</p>\n<p>Our work centers on enabling zero-downtime upgrades, reducing operational complexity at scale, supporting GitLab’s cloud-native transition while continuing to serve existing deployments, and improving the upgrade experience for customers running GitLab in diverse environments.</p>\n<p>For more on how we work, see [Link: Team Handbook Page].</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_c0569537-539","directApply":true,"hiringOrganization":{"@type":"Organization","name":"GitLab","sameAs":"https://about.gitlab.com/","logo":"https://logos.yubhub.co/about.gitlab.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/gitlab/jobs/8463922002","x-work-arrangement":"remote","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Go","Ruby on Rails","Kubernetes","Cloud-native systems","Service mesh","Observability stacks","Infrastructure as code","Automation tools","Linux systems","Production operations","Package management","Systemd","System-level debugging","Incident response","On-call participation"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:52:40.073Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote, India"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Go, Ruby on Rails, Kubernetes, Cloud-native systems, Service mesh, Observability stacks, Infrastructure as code, Automation tools, Linux systems, Production operations, Package management, Systemd, System-level debugging, Incident response, On-call 
participation"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_15a29cc3-0bf"},"title":"Senior Production Engineer","description":"<p>CORPORATION</p>\n<p>CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025.</p>\n<p><strong>About the Role</strong></p>\n<p>Production Engineering ensures CoreWeave’s cloud delivers world-class reliability, performance, and operational excellence. We are hiring a Senior Production Engineer to take direct, hands-on ownership of critical tooling that drives reliability and delivery success.</p>\n<p>In this role, you will work broadly across the cloud stack designing, implementing, deploying, and operating systems that improve delivery velocity, service availability, and operational safety. You’ll be responsible for leading end-to-end technical projects, maintaining long-lived systems the team owns, and strengthening our operational foundations through durable engineering investments.</p>\n<p>This is a role for someone who enjoys building, debugging, and operating production systems. 
You will collaborate closely with service owners, but your primary impact comes from the reliability, quality, and maturity of the systems you deliver and maintain over time.</p>\n<p><strong>What You’ll Do</strong></p>\n<ul>\n<li>Take hands-on ownership of critical systems and frameworks, driving their architecture, implementation, and long-term evolution.</li>\n</ul>\n<ul>\n<li>Lead end-to-end delivery of engineering projects that improve availability, scalability, operational automation, and failure recovery.</li>\n</ul>\n<ul>\n<li>Build and maintain observability, alerting, automated remediation, and resilience testing for the systems you support.</li>\n</ul>\n<ul>\n<li>Participate in incident response as a subject-matter expert; drive deep root-cause investigations and implement lasting fixes.</li>\n</ul>\n<ul>\n<li>Improve runbooks, sources of truth, deployment workflows, and operational tooling to harden production readiness.</li>\n</ul>\n<ul>\n<li>Eliminate single points of failure and reduce operational toil through automation, refactors, and system redesigns.</li>\n</ul>\n<ul>\n<li>Ship production code regularly in Python, Go, or similar languages, and participate in on-call rotations.</li>\n</ul>\n<ul>\n<li>Maintain and mature long-term projects and frameworks owned by the team, ensuring they remain reliable, well-instrumented, and easy to operate.</li>\n</ul>\n<ul>\n<li>Collaborate with platform teams to ensure new features and services integrate cleanly with our reliability best-practices and tooling.</li>\n</ul>\n<p><strong>What You’ve Worked On (Minimum Qualifications)</strong></p>\n<ul>\n<li>7+ years of engineering experience building and operating distributed systems or cloud platforms.</li>\n</ul>\n<ul>\n<li>Demonstrated ability to debug complex production issues end-to-end, across services, infrastructure layers, and automation.</li>\n</ul>\n<ul>\n<li>Strong programming or scripting ability (Python, Go, or similar), with experience shipping and 
operating production services and tools.</li>\n</ul>\n<ul>\n<li>Deep knowledge of cloud-native technologies and distributed system patterns, particularly Kubernetes.</li>\n</ul>\n<ul>\n<li>Experience with modern observability stacks: metrics, tracing, structured logs, SLOs/SLIs, and incident lifecycle practices.</li>\n</ul>\n<ul>\n<li>A track record of successfully delivering hands-on reliability improvements through engineering execution.</li>\n</ul>\n<p><strong>Preferred Qualifications</strong></p>\n<ul>\n<li>Experience building internal tooling, frameworks, or automation that supports high-availability cloud operations.</li>\n</ul>\n<ul>\n<li>Familiarity with DR/BCP, service tiering, capacity planning, or chaos engineering.</li>\n</ul>\n<ul>\n<li>Background operating or building large-scale AI or GPU-accelerated infrastructure.</li>\n</ul>\n<ul>\n<li>Experience maintaining multi-year ownership of foundational production systems.</li>\n</ul>\n<p>Why CoreWeave?</p>\n<p>At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:</p>\n<ul>\n<li>Be Curious at Your Core</li>\n</ul>\n<ul>\n<li>Act Like an Owner</li>\n</ul>\n<ul>\n<li>Empower Employees</li>\n</ul>\n<ul>\n<li>Deliver Best-in-Class Client Experiences</li>\n</ul>\n<ul>\n<li>Achieve More Together</li>\n</ul>\n<p>We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization&#39;s growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. 
Come join us!</p>\n<p>The base salary range for this role is $139,000 to $204,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>\n<p>What We Offer</p>\n<p>The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location.</p>\n<p>In addition to a competitive salary, we offer a variety of benefits to support your needs, including:</p>\n<ul>\n<li>Medical, dental, and vision insurance - 100% paid for by CoreWeave</li>\n</ul>\n<ul>\n<li>Company-paid Life Insurance</li>\n</ul>\n<ul>\n<li>Voluntary supplemental life insurance</li>\n</ul>\n<ul>\n<li>Short and long-term disability insurance</li>\n</ul>\n<ul>\n<li>Flexible Spending Account</li>\n</ul>\n<ul>\n<li>Health Savings Account</li>\n</ul>\n<ul>\n<li>Tuition Reimbursement</li>\n</ul>\n<ul>\n<li>Ability to Participate in Employee Stock Purchase Program (ESPP)</li>\n</ul>\n<ul>\n<li>Mental Wellness Benefits through Spring Health</li>\n</ul>\n<ul>\n<li>Family-Forming support provided by Carrot</li>\n</ul>\n<ul>\n<li>Paid Parental Leave</li>\n</ul>\n<ul>\n<li>Flexible, full-service childcare support with Kinside</li>\n</ul>\n<ul>\n<li>401(k) with a generous employer match</li>\n</ul>\n<ul>\n<li>Flexible PTO</li>\n</ul>\n<ul>\n<li>Catered lunch each day in our office and data center locations</li>\n</ul>\n<ul>\n<li>A casual work environment</li>\n</ul>\n<ul>\n<li>A work culture focused on innovative disruption</li>\n</ul>\n<p>Our Workplace</p>\n<p>While we prioritize a hybrid work environment, 
remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.</p>\n<p>California Consumer Privacy Act - California applicants only</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_15a29cc3-0bf","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4670172006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$139,000 to $204,000","x-skills-required":["cloud computing","distributed systems","cloud platforms","Kubernetes","observability stacks","metrics","tracing","structured logs","SLOs/SLIs","incident lifecycle practices","Python","Go","programming","scripting","production services","tools"],"x-skills-preferred":["internal tooling","frameworks","automation","high-availability cloud operations","DR/BCP","service tiering","capacity planning","chaos engineering","large-scale AI","GPU-accelerated infrastructure"],"datePosted":"2026-04-18T15:52:09.786Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud computing, distributed systems, cloud platforms, Kubernetes, observability stacks, metrics, tracing, structured logs, SLOs/SLIs, incident lifecycle practices, Python, Go, programming, scripting, production services, tools, internal tooling, frameworks, automation, high-availability cloud 
operations, DR/BCP, service tiering, capacity planning, chaos engineering, large-scale AI, GPU-accelerated infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139000,"maxValue":204000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b687767a-7a1"},"title":"Director of Engineering, Security Risk Management","description":"<p>We&#39;re seeking an exceptional Engineering Lead to drive the evolution of GitLab&#39;s Security Risk Management (SRM) stage into a world-class platform for vulnerability analysis and remediation at enterprise scale.</p>\n<p>This is a rare opportunity to architect and build distributed systems that will fundamentally change how large organisations approach application security and developer security workflows.</p>\n<p>As the SRM Stage Lead, you&#39;ll be responsible for transforming our engineering culture toward high-performance distributed systems while delivering an exceptional user experience for both Application Security professionals and Developers.</p>\n<p>You&#39;ll own the technical strategy for processing, analysing, and remediating vulnerabilities across massive codebases and complex enterprise environments.</p>\n<p><strong>Technical Leadership &amp; Architecture</strong></p>\n<ul>\n<li>Design distributed systems architecture capable of processing vulnerability data from thousands of repositories, millions of commits, and complex dependency graphs in real-time</li>\n<li>Drive storage system decisions for multi-petabyte security datasets, balancing query performance, cost efficiency, and data retention requirements across time-series, graph, and document storage paradigms</li>\n<li>Architect scalable analysis pipelines that can ingest vulnerability feeds, correlate findings across multiple security tools, and provide actionable intelligence to both security teams and 
individual developers</li>\n<li>Lead the technical evolution from monolithic security scanning to microservices-based, event-driven vulnerability management systems</li>\n</ul>\n<p><strong>Engineering Culture Transformation</strong></p>\n<ul>\n<li>Champion high-performance systems thinking throughout the team, establishing patterns for horizontal scaling, efficient resource utilisation, and fault-tolerant distributed computing</li>\n<li>Establish technical standards for system observability, chaos engineering, and performance optimisation in security-critical systems</li>\n<li>Mentor and develop senior engineers in distributed systems design, database optimisation, and large-scale system architecture</li>\n<li>Drive architectural decision records (ADRs) for major technical decisions, particularly around data storage, processing frameworks, and system boundaries</li>\n</ul>\n<p><strong>Product &amp; User Experience Excellence</strong></p>\n<ul>\n<li>Own the end-to-end user journey (in partnership with PM) for both AppSec professionals managing enterprise-wide risk and developers receiving actionable security feedback in their workflow</li>\n<li>Design APIs and interfaces that abstract complexity while providing the power and flexibility that security professionals demand</li>\n<li>Collaborate with Product Management, UX and Product Design to translate complex technical capabilities into intuitive user experiences</li>\n<li>Establish feedback loops with large enterprise customers to ensure our technical solutions scale with their organisational complexity</li>\n</ul>\n<p><strong>Strategic Technical Execution</strong></p>\n<ul>\n<li>Evaluate and integrate cutting-edge technologies in areas such as graph databases, stream processing, machine learning inference at scale, and distributed caching, in collaboration with GitLab’s Infrastructure, Data and AI teams</li>\n<li>Own the technical roadmap for vulnerability correlation, risk scoring, and automated remediation 
workflows</li>\n<li>Drive partnerships with other GitLab stages to ensure seamless integration across the DevSecOps platform</li>\n<li>Lead incident response for availability and performance issues in customer-facing security systems</li>\n</ul>\n<p><strong>What You’ll Bring</strong></p>\n<ul>\n<li>10+ years of software engineering experience with 5+ years leading distributed systems at scale (&gt;100M daily operations)</li>\n<li>Deep expertise in designing and operating high-throughput, low-latency distributed systems with complex data models</li>\n<li>Proven experience with polyglot persistence strategies, including relational databases (PostgreSQL, Cloud Spanner), time-series databases, graph databases, and distributed key-value stores</li>\n<li>Strong background in stream processing frameworks (Apache Kafka, Apache Flink, or similar) and event-driven architectures</li>\n<li>Hands-on experience with container orchestration (Kubernetes) and cloud-native observability stacks</li>\n<li>Security domain knowledge with understanding of vulnerability assessment, static analysis, dependency scanning, or application security testing</li>\n</ul>\n<p><strong>Leadership &amp; Communication</strong></p>\n<ul>\n<li>Proven track record of leading and growing high-performing engineering teams (40+ engineers)</li>\n<li>Experience transforming engineering culture and establishing technical excellence standards in fast-growing organisations</li>\n<li>Strong technical communication skills with ability to present complex architectural decisions to executive stakeholders</li>\n<li>Collaborative leadership style with experience working across multiple engineering teams and product stakeholders</li>\n</ul>\n<p><strong>Problem-Solving &amp; Innovation</strong></p>\n<ul>\n<li>Systems thinking approach to complex technical problems with demonstrated ability to make appropriate trade-offs between performance, scalability, and maintainability</li>\n<li>Experience with A/B testing frameworks 
and data-driven decision making in technical contexts</li>\n<li>Track record of successfully delivering large-scale technical migrations or architectural transformations</li>\n<li>Startup or high-growth company experience with ability to balance technical debt with rapid feature delivery</li>\n</ul>\n<p><strong>About the team</strong></p>\n<p>Security Risk Management sits at the heart of modern DevSecOps. The systems you build will directly impact how Fortune 500 companies protect their applications and how millions of developers integrate security into their daily workflow.</p>\n<p>You&#39;ll have the opportunity to define the future of application security tooling while working with some of the most challenging distributed systems problems in the industry.</p>\n<p>The Technical Challenge</p>\n<p>You&#39;ll be solving some of the most interesting distributed systems problems in the security space:</p>\n<ul>\n<li>Scale: Processing vulnerability data for organisations with 100,000+ repositories and millions of developers</li>\n<li>Performance: Sub-second query response times for complex security analytics across massive datasets</li>\n<li>Reliability: 99.95%+ uptime SLAs for security-critical workflows that can&#39;t afford downtime</li>\n<li>Complexity: Correlating findings across 20+ different security tools while maintaining data lineage and audit trails</li>\n<li>User Experience: Making complex security data accessible to both security experts and developers with varying security expertise</li>\n</ul>\n<p><strong>Salary</strong></p>\n<p>The base salary range for this role’s listed level is currently for residents of the United States.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b687767a-7a1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"GitLab","sameAs":"https://about.gitlab.com/","logo":"https://logos.yubhub.co/about.gitlab.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/gitlab/jobs/8195921002","x-work-arrangement":"remote","x-experience-level":"executive","x-job-type":"full-time","x-salary-range":"Base salary range for this role’s listed level is currently for residents of the United States.","x-skills-required":["Distributed systems","Polyglot persistence strategies","Stream processing frameworks","Event-driven architectures","Container orchestration","Cloud-native observability stacks","Security domain knowledge","Vulnerability assessment","Static analysis","Dependency scanning","Application security testing"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:48:19.166Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote, Canada; Remote, EMEA; Remote, US"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Distributed systems, Polyglot persistence strategies, Stream processing frameworks, Event-driven architectures, Container orchestration, Cloud-native observability stacks, Security domain knowledge, Vulnerability assessment, Static analysis, Dependency scanning, Application security testing"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9701c504-1a6"},"title":"Senior Software Engineer I, Inference","description":"<p>We&#39;re looking for a Senior Software Engineer I to join our team. As a senior engineer, you&#39;ll lead designs, raise engineering standards, and deliver measurable improvements to latency, throughput, and reliability across multiple services. 
You&#39;ll partner with product, orchestration, and hardware teams to evolve our Kubernetes-native inference platform and meet strict P99 SLAs at scale.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Lead design reviews and drive architecture within the team; decompose multi-service work into clear milestones.</li>\n<li>Define and own SLIs/SLOs; ensure post-incident actions land and reliability improves release-over-release.</li>\n<li>Implement advanced optimizations (e.g., micro-batch schedulers, speculative decoding, KV-cache reuse) and quantify impact.</li>\n<li>Strengthen incident posture: capacity planning, autoscaling policy, graceful degradation, rollback/traffic-shift strategies.</li>\n<li>Mentor IC1/IC2 engineers; review cross-team designs and elevate coding/testing standards.</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>3-5 years of industry experience building distributed systems or cloud services.</li>\n<li>Strong coding in Python or Go (C++ a plus) and deep familiarity with networked systems and performance.</li>\n<li>Hands-on experience with Kubernetes at production scale, CI/CD, and observability stacks (Prometheus, Grafana, OpenTelemetry).</li>\n<li>Practical knowledge of inference internals: batching, caching, mixed precision (BF16/FP8), streaming token delivery.</li>\n<li>Proven track record improving tail latency (P95/P99) and service reliability through metrics-driven work.</li>\n</ul>\n<p>Preferred qualifications include contributions to inference frameworks, experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies, and leading multi-team initiatives or partnering with customers on mission-critical launches.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_9701c504-1a6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4647603006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$139,000 to $204,000","x-skills-required":["Python","Go","Kubernetes","CI/CD","Observability stacks","Inference internals","Batching","Caching","Mixed precision","Streaming token delivery"],"x-skills-preferred":["Contributions to inference frameworks","CUDA kernels","NCCL/SHARP","RDMA/NUMA","GPU interconnect topologies"],"datePosted":"2026-04-18T15:48:09.297Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Go, Kubernetes, CI/CD, Observability stacks, Inference internals, Batching, Caching, Mixed precision, Streaming token delivery, Contributions to inference frameworks, CUDA kernels, NCCL/SHARP, RDMA/NUMA, GPU interconnect topologies","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139000,"maxValue":204000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_40d32156-365"},"title":"Reliability Lead, Common Services","description":"<p>As Reliability Lead, Common Services, you will establish and lead the Reliability Engineering and production operations practice for the Common Services organization. 
You&#39;ll partner closely with engineering leaders and teams across Common Services to define how we build, release, monitor, and operate critical services, raising the bar on reliability, availability, and operational excellence across the board.</p>\n<p>In this role, you will:</p>\n<ul>\n<li>Establish and lead the SRE / production engineering practice for the Common Services organization, including standards for reliability, incident management, and on-call, in partnership with the central Product Engineering organization.</li>\n<li>Develop an Operational Excellence strategy that focuses on not only improving system performance but also monitoring and reducing operational toil.</li>\n<li>Partner with engineering and product teams to define SLOs, SLIs, and error budgets for critical Common Services, and ensure these become part of how teams plan and make tradeoffs.</li>\n<li>Own and improve the incident management lifecycle for Common Services, including on-call rotations, escalation paths, incident tooling, post-incident reviews, and follow-through on corrective actions.</li>\n<li>Drive the observability strategy (metrics, logs, traces, dashboards, alerts) for Common Services, ensuring we have actionable visibility into the health, performance, and capacity of key systems.</li>\n<li>Collaborate with engineering leads to design and review architectures for reliability, scalability, resilience, and operability, including failure modes, redundancy, and graceful degradation.</li>\n<li>Lead efforts to automate and harden operational workflows, including deployments, rollbacks, configuration management, change management, and routine maintenance tasks.</li>\n<li>Build strong, trust-based relationships with partner teams and stakeholders, becoming a go-to leader for production readiness and operational risk within Common Services.</li>\n<li>Hire, mentor, and develop SRE and production engineering talent, fostering a culture of continuous improvement, learning from 
incidents, and humane on-call.</li>\n<li>Partner with other SRE and production engineering leaders across CoreWeave to align on global practices, tools, and reliability goals, representing the needs and constraints of Common Services.</li>\n</ul>\n<p>You will be responsible for defining the reliability strategy, processes, and standards for the Common Services portfolio and driving consistent, high-quality operational practices across multiple teams.</p>\n<p>The base salary range for this role is $206,000 to $303,000.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_40d32156-365","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4650165006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$206,000 to $303,000","x-skills-required":["Site Reliability Engineering","Production Engineering","Linux-based production environments","Containers","Orchestration technologies","Observability stacks","Alerting systems","SLIs/SLOs","Error budgets","Incident management","On-call rotations","Escalation paths","Post-incident reviews","Corrective actions","Automation tooling","Infrastructure-as-code","CI/CD pipelines"],"x-skills-preferred":["GPU workloads","High-performance computing","Latency/throughput-sensitive systems","Multi-tenant environments","Multi-region environments","Regulated environments","Service ownership models","Mentoring","Managing senior engineers"],"datePosted":"2026-04-18T15:47:45.370Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York, NY / Sunnyvale, CA / Bellevue, 
WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Site Reliability Engineering, Production Engineering, Linux-based production environments, Containers, Orchestration technologies, Observability stacks, Alerting systems, SLIs/SLOs, Error budgets, Incident management, On-call rotations, Escalation paths, Post-incident reviews, Corrective actions, Automation tooling, Infrastructure-as-code, CI/CD pipelines, GPU workloads, High-performance computing, Latency/throughput-sensitive systems, Multi-tenant environments, Multi-region environments, Regulated environments, Service ownership models, Mentoring, Managing senior engineers","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":206000,"maxValue":303000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_86622b48-10e"},"title":"Software Engineer, Site Reliability","description":"<p>We are looking for a Site Reliability Engineer who thinks like a software engineer first. You will own critical production systems end-to-end, designing, building, and improving them rather than simply operating them. 
You will write production-quality code that keeps the platform reliable at scale, embed with product engineering teams to influence architecture from the start, and build the internal tooling that every engineer at Hebbia depends on.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Own critical production services end-to-end, from design and code review through deployment, operation, and incident response</li>\n<li>Profile, benchmark, and rewrite hot paths to eliminate bottlenecks as Hebbia scales</li>\n<li>Lead incident response and drive post-mortem culture, translating findings into code changes and architectural improvements rather than runbooks</li>\n<li>Design and build observability frameworks from scratch, writing custom instrumentation, alerting logic, and debugging tooling that surfaces production issues before customers feel them</li>\n<li>Define and enforce SLOs across platform services and build the feedback loops that keep engineering teams accountable to them</li>\n<li>Own capacity planning and cost efficiency: model growth, right-size infrastructure, and write automation that prevents over-provisioning and resource exhaustion</li>\n<li>Build robust, well-tested internal platforms and deployment tooling held to the same engineering standards as customer-facing code</li>\n<li>Own and continuously improve CI/CD systems so engineering teams can ship safely and quickly</li>\n<li>Embed with product engineering teams as a peer software engineer, contributing directly to production codebases and co-designing systems for reliability from the start</li>\n<li>Partner on infrastructure security through threat modeling, hardening, and automated compliance tooling</li>\n</ul>\n<p>Who You Are:</p>\n<ul>\n<li>5+ years software development with a track record of writing, shipping, and maintaining production services, not just operating infrastructure</li>\n<li>Production-grade proficiency in at least one systems or backend language: Go, Python, C++, or Rust</li>\n<li>Proven 
experience as a Production Engineer, SRE, or software engineer with a deep infrastructure focus, comfortable owning services end-to-end across the full stack</li>\n<li>Deep understanding of distributed systems</li>\n<li>Container orchestration expertise and hands-on experience debugging complex distributed failures in production</li>\n<li>Working knowledge of OS-level concepts</li>\n<li>Cloud platform fluency (AWS preferred)</li>\n<li>Experience in building and maintaining observability stacks</li>\n<li>Strong CI/CD pipeline expertise and a track record of improving developer velocity without sacrificing safety</li>\n<li>Background at a company with a Production Engineering or software-focused SRE culture is a strong plus</li>\n<li>Experience building platforms for AI/ML workloads or high-throughput document processing pipelines is a plus</li>\n</ul>\n<p>Compensation:\nThe salary range for this role is $160,000 to $300,000. This range may be inclusive of several career levels at Hebbia and will be narrowed during the interview process based on the candidate’s experience and qualifications. 
Adjustments outside of this range may be considered for candidates whose qualifications significantly differ from those outlined in the job description.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_86622b48-10e","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Hebbia","sameAs":"https://hebbia.com","logo":"https://logos.yubhub.co/hebbia.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/hebbia/jobs/4666955005","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$160,000 - $300,000","x-skills-required":["Go","Python","C++","Rust","Distributed systems","Container orchestration","OS-level concepts","Cloud platform fluency (AWS)","Observability stacks","CI/CD pipeline expertise"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:37:23.089Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York City; San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Go, Python, C++, Rust, Distributed systems, Container orchestration, OS-level concepts, Cloud platform fluency (AWS), Observability stacks, CI/CD pipeline expertise","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":160000,"maxValue":300000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_089e27b0-40a"},"title":"Backend Engineer","description":"<p>We&#39;re looking for a skilled Backend Engineer to join our Data Infrastructure engineering organisation. As a member of this team, you will play a key role in helping us build analytics for internal and music industry-facing tools. 
Our platform enables capabilities ranging from showing artists how many streams their latest release has to informing internal teams about their cloud resource usage.</p>\n<p>As a Backend Engineer, you will help us exemplify, measure and raise the reliability of data infrastructure of squads across different verticals within Spotify. You&#39;ll work closely with engineers to provide OLAP capabilities to build dynamic, reliable data visualizations and share responsibility with them in diagnosing, resolving, and preventing production issues.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Building, operating, and evolving data analytics platforms that include backend services as well as OLAP data stores (Druid) for teams building analytics across Spotify.</li>\n<li>Building internal tooling, libraries, and services that streamline integration patterns with our analytics platform.</li>\n<li>Advocating for best practices in service design, data modeling, schema evolution, and contract testing to ensure long-term maintainability.</li>\n<li>Working in an autonomous, multi-functional environment and collaborating with squads across Spotify to continuously iterate and deliver on new product objectives.</li>\n</ul>\n<p>To succeed in this role, you will need:</p>\n<ul>\n<li>3+ years of relevant experience with distributed datastores and backend services.</li>\n<li>Proficiency in Java and a willingness to learn Kubernetes and Terraform.</li>\n<li>Understanding of data modeling, dimensional schemas, and analytical query patterns.</li>\n<li>Experience building internal developer tools, libraries, or shared services that support large engineering organisations.</li>\n<li>A strong sense of ownership of service quality, SLOs, and operational excellence.</li>\n<li>Familiarity with OLAP databases or analytics warehouses (e.g., Druid, ClickHouse, Pinot, BigQuery, Snowflake).</li>\n<li>Comfort with metrics-driven development and observability stacks (Prometheus, Grafana,
similar).</li>\n<li>Excellent communication and interpersonal skills, with the ability to work effectively with cross-functional teams.</li>\n</ul>\n<p>In return, we offer a competitive salary range of $125,562-$179,374, plus equity, as well as a comprehensive benefits package including health insurance, six months&#39; paid parental leave, 401(k) retirement plan, monthly meal allowance, 23 paid days off, 13 paid flexible holidays, and paid sick leave.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_089e27b0-40a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Spotify","sameAs":"https://www.spotify.com","logo":"https://logos.yubhub.co/spotify.com.png"},"x-apply-url":"https://jobs.lever.co/spotify/66492688-d5b0-4cf8-b1a4-4a715157edd9","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$125,562-$179,374","x-skills-required":["Java","Kubernetes","Terraform","OLAP databases","Analytics warehouses","Metrics-driven development","Observability stacks"],"x-skills-preferred":[],"datePosted":"2026-03-31T18:16:24.884Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"NYC"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Java, Kubernetes, Terraform, OLAP databases, Analytics warehouses, Metrics-driven development, Observability stacks","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":125562,"maxValue":179374,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_e8d98a5b-1ea"},"title":"AI & ML Engineer","description":"<p>About Charlotte Tilbury Beauty</p>\n<p>Founded by British makeup artist and beauty entrepreneur Charlotte Tilbury MBE in 2013, Charlotte 
Tilbury Beauty has revolutionised the face of the global beauty industry.</p>\n<p>The AI &amp; ML Engineering team accelerates the adoption of AI across the business, championing innovation while ensuring our machine learning products are robust, scalable, and cost-efficient.</p>\n<p>Responsibilities:</p>\n<p>Partner with stakeholders to scope problems and identify the right solution - whether leveraging existing AI tools or building custom workflows &amp; solutions.</p>\n<p>Design and implement agentic systems using techniques spanning RAG, grounding, prompt engineering, and orchestration on a GCP-first stack.</p>\n<p>Build and maintain production ML pipelines and services for non-GenAI use cases (e.g. recommender systems, customer segmentation models, marketing optimisation modules, leveraging supervised, unsupervised and/or econometric modelling approaches).</p>\n<p>Develop APIs and microservices for AI/ML solutions, ensuring security, scalability, and observability.</p>\n<p>Implement CI/CD for ML services, writing infrastructure as code, and monitoring for model/data drift and performance.</p>\n<p>Establish robust guardrails for safe AI usage, including prompt security, practical evaluation frameworks, and compliance with privacy regulations.</p>\n<p>Drive and evangelize best practices, reusable templates, and documentation to scale AI/ML delivery across the business.</p>\n<p>Collaborate with data engineers, data scientists, front &amp; back-end engineers, product managers, legal &amp; infosec colleagues to deliver impactful solutions end-to-end.</p>\n<p>Who you will work with</p>\n<p>The AI &amp; ML Engineer Lead and the wider data team.</p>\n<p>About you</p>\n<p>The role requires a blend of technical depth and product sense, including:</p>\n<p>Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.</p>\n<p>Strong Python engineering skills (FastAPI, testing, typing) and experience with cloud-native development (GCP 
preferred).</p>\n<p>Hands-on experience with GCP Vertex AI (model endpoints, pipelines, embeddings, vector search) or equivalent cloud-native ML platforms (e.g. AWS SageMaker, Azure ML) and agent orchestration frameworks such as LangChain and LangGraph.</p>\n<p>Solid understanding of MLOps - CI/CD, IaC (Terraform), experiment tracking, model registry, and monitoring.</p>\n<p>Proven experience deploying and operating ML systems in production (batch and real-time).</p>\n<p>Familiarity with RAG architectures, prompt engineering, and evaluation techniques.</p>\n<p>Strong grasp of security, privacy, and governance principles (IAM, secrets, PII handling).</p>\n<p>Excellent communication skills and ability to work with non-technical stakeholders.</p>\n<p>In addition to the above, we would LOVE if you have:</p>\n<p>Experience with vector databases and retrieval strategies.</p>\n<p>Knowledge of recommender systems and ranking models.</p>\n<p>Familiarity with LLM evaluation tools (e.g., RAGAS, TruLens, LangSmith, Arize).</p>\n<p>Exposure to feature stores, data lineage, and observability stacks.</p>\n<p>Experience in e-commerce or retail environments.</p>\n<p>Demonstrable ability to weigh up build/buy/configure decisions in the LLM space.</p>\n<p>Why join us?</p>\n<p>Be a part of this values driven, high growth, magical journey with an ultimate vision to empower everyone, everywhere to be the best version of themselves.</p>\n<p>We’re a hybrid model with flexibility, allowing you to work how best suits you.</p>\n<p>25 days holiday (plus bank holidays) with an additional day to celebrate your birthday.</p>\n<p>Inclusive parental leave policy that supports all parents and carers throughout their parenting and caring journey.</p>\n<p>Financial security and planning with our pension and life assurance for all.</p>\n<p>Wellness and social benefits including Medicash, Employee Assist Programs and regular social connects with colleagues.</p>\n<p>Bring your furry friend to work
with you on our allocated dog friendly days and spaces.</p>\n<p>And not to forget our generous product discount and gifting!</p>\n<p>At Charlotte Tilbury Beauty, our mission is to empower everybody in the world to be the most beautiful version of themselves.</p>\n<p>We celebrate and support this by encouraging and hiring people with diverse backgrounds, cultures, voices, beliefs, and perspectives into our growing global workforce.</p>\n<p>By doing so, we better serve our communities, customers, employees - and the candidates that take part in our recruitment process.</p>\n<p>If you want to learn more about life at Charlotte Tilbury Beauty please follow our LinkedIn page!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_e8d98a5b-1ea","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Charlotte Tilbury Beauty","sameAs":"https://www.charlottetilbury.com/","logo":"https://logos.yubhub.co/charlottetilbury.com.png"},"x-apply-url":"https://apply.workable.com/j/243770B17B","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Python","GCP","Vertex AI","LangChain","LangGraph","MLOps","CI/CD","IaC","Experiment tracking","Model registry","Monitoring"],"x-skills-preferred":["Vector databases","Recommender systems","Ranking models","LLM evaluation tools","Feature stores","Data lineage","Observability stacks","E-commerce","Retail environments"],"datePosted":"2026-03-20T16:05:50.155Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Beauty","skills":"Python, GCP, Vertex AI, LangChain, LangGraph, MLOps, CI/CD, IaC, Experiment tracking, Model registry, Monitoring, Vector databases, Recommender systems, Ranking models, LLM evaluation tools, Feature 
stores, Data lineage, Observability stacks, E-commerce, Retail environments"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3de2c475-9ca"},"title":"Software Engineer, Database Systems","description":"<p><strong>Software Engineer, Database Systems</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Applied AI</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$230K – $385K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness 
support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team:</strong></p>\n<p>The Database Systems team specializes in high-performance distributed databases. Our team built Rockset, the real-time search, analytics, and vector database that powers all vector search and retrieval augmented generation (RAG) at OpenAI. In addition to retrieval, as an online database, Rockset powers core functionality across all of OpenAI&#39;s product lines and many critical internal use cases.</p>\n<p><strong>About the Role:</strong></p>\n<p>We are looking for engineers passionate about distributed systems, close-to-the-metal performance optimization (our core engine is written in C++), and building scalable database infrastructure from the ground up. As an engineer on the Database Systems team, you&#39;ll contribute to the core database engine, driving improvements across ingestion, query execution, indexing, and storage. 
You&#39;ll partner with teams across OpenAI to unlock new product capabilities and help scale online database reliability and throughput as usage grows by orders of magnitude.</p>\n<p><strong>In this role you will:</strong></p>\n<ul>\n<li>Design, build, and operate high-performance distributed systems</li>\n</ul>\n<ul>\n<li>Identify and resolve performance bottlenecks to scale infrastructure to the next order of magnitude</li>\n</ul>\n<ul>\n<li>Define long-term technical direction and guide system evolution</li>\n</ul>\n<ul>\n<li>Collaborate with product, engineering, and research teams to deliver scalable and reliable infrastructure</li>\n</ul>\n<ul>\n<li>Dig deep into complex production issues across the stack</li>\n</ul>\n<ul>\n<li>Contribute to incident response, postmortems, and best practices for system reliability</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Have significant experience building, scaling, and optimizing distributed systems at scale</li>\n</ul>\n<ul>\n<li>Are curious about database internals, storage engines, or low-latency query systems</li>\n</ul>\n<ul>\n<li>Enjoy debugging challenging performance issues in complex, high-throughput systems</li>\n</ul>\n<ul>\n<li>Have experience operating production clusters at scale (e.g., Kubernetes or other orchestration systems)</li>\n</ul>\n<ul>\n<li>Think rigorously about scalability, correctness, and reliability</li>\n</ul>\n<ul>\n<li>Thrive in fast-paced environments with high autonomy and impact</li>\n</ul>\n<p><strong>Qualifications:</strong></p>\n<ul>\n<li>4+ years of relevant industry experience, with 2+ years leading large scale, complex projects or teams as an engineer or tech lead</li>\n</ul>\n<ul>\n<li>Experience with distributed systems at scale, with a strong focus on performance, reliability, and scalability</li>\n</ul>\n<ul>\n<li>Strong communication skills and ability to collaborate across highly technical and cross-functional 
teams</li>\n</ul>\n<ul>\n<li>Proficiency in a systems programming language such as C++ (our core engine is written in C++) is strongly preferred</li>\n</ul>\n<ul>\n<li>Fluency in cloud environments (AWS, GCP, Azure) and IaC tools (Terraform or similar)</li>\n</ul>\n<ul>\n<li>Experience with Linux systems, CI/CD pipelines, and modern observability stacks (Prometheus, Grafana, etc.)</li>\n</ul>\n<ul>\n<li>Domain knowledge in areas such as databases, data systems, storage engines, indexing, and query processing is a plus but not required</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3de2c475-9ca","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/2b5e8e15-7952-4170-a927-2ad68e318ed6","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$230K – $385K • Offers Equity","x-skills-required":["distributed systems","C++","cloud environments","IaC tools","Linux systems","CI/CD pipelines","modern observability stacks"],"x-skills-preferred":["database internals","storage engines","low-latency query systems","Kubernetes","orchestration 
systems"],"datePosted":"2026-03-06T18:24:14.702Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed systems, C++, cloud environments, IaC tools, Linux systems, CI/CD pipelines, modern observability stacks, database internals, storage engines, low-latency query systems, Kubernetes, orchestration systems","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":230000,"maxValue":385000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2d29b93a-388"},"title":"Software Engineer III","description":"<p>We are seeking an experienced Software Engineer skilled in Python and/or Golang to build and maintain automation, APIs, and services that manage the lifecycle of our infrastructure platforms.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<ul>\n<li>Design, develop, and maintain scalable APIs and automation services in Python and/or Go to manage platform operations (e.g., provisioning, configuration, access control, DNS updates).</li>\n<li>Automate infrastructure lifecycle workflows across Kubernetes/OpenShift and related systems.</li>\n</ul>\n<p><strong>What you need</strong></p>\n<ul>\n<li>5+ years of professional software engineering experience using Python and/or Go (Golang).</li>\n<li>Proven experience designing and implementing RESTful or gRPC APIs.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2d29b93a-388","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Electronic 
Arts","sameAs":"https://jobs.ea.com","logo":"https://logos.yubhub.co/jobs.ea.com.png"},"x-apply-url":"https://jobs.ea.com/en_US/careers/JobDetail/Software-Engineer-III/212100","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$119,600 - $167,300 CAD","x-skills-required":["Python","Golang","Kubernetes","OpenShift","GitOps","CI/CD pipelines","containerization","Linux-based systems"],"x-skills-preferred":["PKI","certificate management","secrets automation","infrastructure-as-code","policy-as-code","DNS automation","RBAC","access control systems","observability stacks"],"datePosted":"2026-01-24T06:04:42.662Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Vancouver"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Golang, Kubernetes, OpenShift, GitOps, CI/CD pipelines, containerization, Linux-based systems, PKI, certificate management, secrets automation, infrastructure-as-code, policy-as-code, DNS automation, RBAC, access control systems, observability stacks","baseSalary":{"@type":"MonetaryAmount","currency":"CAD","value":{"@type":"QuantitativeValue","minValue":119600,"maxValue":167300,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b050a65c-f0a"},"title":"Senior SRE 1","description":"<p>We are seeking an accomplished Senior Site Reliability Engineer (SRE) with 12–15 years of experience to lead the reliability, scalability, and performance engineering of our critical infrastructure and production systems.
As a Senior SRE, you will play a strategic and technical leadership role — driving reliability practices, mentoring SRE teams, and influencing the adoption of automation, observability, and resilience engineering across the organization.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<ul>\n<li>Architect, implement, and manage resilient, scalable, and highly available infrastructure systems.</li>\n<li>Lead initiatives to automate manual operations, deployment, and monitoring processes to improve reliability and reduce toil.</li>\n</ul>\n<p><strong>What you need</strong></p>\n<ul>\n<li>Strong proficiency in Linux/Unix system administration and internals.</li>\n<li>Proven experience in cloud platforms — AWS, Azure, or GCP.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b050a65c-f0a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Electronic Arts","sameAs":"https://jobs.ea.com","logo":"https://logos.yubhub.co/jobs.ea.com.png"},"x-apply-url":"https://jobs.ea.com/en_US/careers/JobDetail/Senior-SRE-I/211515","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Linux/Unix system administration","Cloud platforms","Automation"],"x-skills-preferred":["Containerization and orchestration","Monitoring and observability stacks","Configuration management and IaC tools"],"datePosted":"2026-01-05T21:08:11.258Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Hyderabad"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux/Unix system administration, Cloud platforms, Automation, Containerization and orchestration, Monitoring and observability stacks, Configuration management and IaC tools"}]}