{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/agentic-workloads"},"x-facet":{"type":"skill","slug":"agentic-workloads","display":"Agentic Workloads","count":1},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ffccb977-f95"},"title":"Senior Site Reliability Engineer","description":"<p>Are you excited by the idea of building fast, reliable, and intelligent infrastructure for a product used by engineering teams around the world? We&#39;re looking for a Senior Site Reliability Engineer to join the Backstage team at Spotify. We&#39;re building the next generation of our developer platform, one that doesn&#39;t just manage software, but actively helps create and maintain it through AI-native workflows.</p>\n<p>In 2026, SRE isn&#39;t just about uptime; it&#39;s about symbiosis. As part of our growing engineering team, you&#39;ll design, build, and operate the cloud infrastructure behind our external developer portal product and our internal fleet of background coding agents. You&#39;ll collaborate closely with experienced engineers (both human and AI-assisted) while operating at real-world scale, with deep observability, strong safety boundaries, and the unique reliability challenges of agentic production systems.</p>\n<p>Backstage is more than just a platform; it&#39;s a foundational force in the developer community. 
Born out of Spotify&#39;s quest for better developer tooling, Backstage now powers developer portals across the globe. But we didn&#39;t stop at catalogs and templates. Today, Backstage is becoming the command center for AI-native engineering. From enterprises orchestrating large-scale migrations to fast-moving teams using AI to improve velocity and quality, our solutions are redefining what great developer experience looks like.</p>\n<p>As part of the Backstage team, you&#39;ll shape developer experience for companies large and small, for our thriving open-source community, and for Spotify itself. You&#39;ll help define how reliable, secure infrastructure enables the next wave of agentic developer tooling.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Own fleet reliability. Lead the reliability, security, and scalability strategy for Portal&#39;s SaaS infrastructure, including the runtime environments that power our platform and LLM-driven agent workflows. Define SLOs, drive capacity planning, and ensure our systems meet the demands of a rapidly growing product.</li>\n</ul>\n<ul>\n<li>Architect for the agentic era. Design and evolve infrastructure on GCP and AWS using Terraform and infrastructure-from-code patterns. Shape how we structure environments for non-deterministic AI workloads, including sandboxing, resource isolation, cost governance, and security boundaries.</li>\n</ul>\n<ul>\n<li>Drive operational excellence. Evolve our incident management, on-call, and postmortem practices. Leverage AI assistants to accelerate root cause analysis and build increasingly self-healing capabilities into our production systems.</li>\n</ul>\n<ul>\n<li>Lead fullstack reliability. Operate across a modern web stack (TypeScript, React, Python). While not frontend-heavy, you&#39;ll diagnose and resolve issues across the stack and drive reliability improvements end-to-end.</li>\n</ul>\n<ul>\n<li>Mentor and multiply. 
Raise the reliability IQ of the broader engineering team. Establish SRE best practices, conduct production-readiness reviews, and mentor engineers on operational thinking.</li>\n</ul>\n<ul>\n<li>Shape the roadmap. Partner with engineering and product leadership to evolve our infrastructure in step with generative AI features. Translate operational insights into strategic input on the product roadmap.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>You have 5+ years of hands-on experience operating cloud infrastructure (GCP and/or AWS), using Terraform and Kubernetes to run production systems at scale.</li>\n</ul>\n<ul>\n<li>You have practical experience, or a strong demonstrated interest, in operating LLM-based systems, RAG pipelines, or agentic workloads, and understand the reliability challenges of non-deterministic systems.</li>\n</ul>\n<ul>\n<li>You think in distributed systems first principles (consistency, availability, partition tolerance) and translate that thinking into pragmatic infrastructure decisions.</li>\n</ul>\n<ul>\n<li>You are proficient in at least one modern language (TypeScript, Java, Go, or Python) and comfortable navigating large, heterogeneous codebases, including environments where AI-generated PRs are common.</li>\n</ul>\n<ul>\n<li>You build automation and improve systems so that whole categories of operational issues disappear over time.</li>\n</ul>\n<ul>\n<li>You communicate complex infrastructure trade-offs clearly to both technical and non-technical stakeholders, and you write postmortems that lead to meaningful change.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ffccb977-f95","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Spotify","sameAs":"https://www.spotify.com","logo":"https://logos.yubhub.co/spotify.com.png"},"x-apply-url":"https://jobs.lever.co/spotify/fdfe281d-889c-478a-8f27-c9bc36b2b0cf","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$164,448–$234,926 USD","x-skills-required":["cloud infrastructure","Terraform","Kubernetes","LLM-based systems","RAG pipelines","agentic workloads","distributed systems","TypeScript","Java","Go","Python"],"x-skills-preferred":[],"datePosted":"2026-03-31T18:18:50.967Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud infrastructure, Terraform, Kubernetes, LLM-based systems, RAG pipelines, agentic workloads, distributed systems, TypeScript, Java, Go, Python","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":164448,"maxValue":234926,"unitText":"YEAR"}}}]}