{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/title/senior-site-reliability-engineer"},"x-facet":{"type":"title","slug":"senior-site-reliability-engineer","display":"Senior Site Reliability Engineer","count":4},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_eff95313-cdc"},"title":"Senior Site Reliability Engineer","description":"<p>The Senior Site Reliability Engineer will play a key role in developing scalable, reliable, and efficient infrastructure that powers the entire company. This includes building and scaling internal platform offerings, designing and implementing monitoring, alerting, and incident response systems, and collaborating with application software engineers to guide their design and ensure it scales for what Carta needs in the long run.</p>\n<p>The ideal candidate will have extensive experience with cloud services such as AWS, Google Cloud Platform, or Azure, including services like EC2, S3, RDS, and Lambda. They will also be proficient in using tools such as Terraform, Ansible, or CloudFormation for managing and provisioning cloud infrastructure.</p>\n<p>The team is responsible for providing secure, reliable, scalable, and performant infrastructure to Carta&#39;s customers and developers. 
The successful candidate will be a strong communicator who enjoys collaborating to solve complex problems and has familiarity with infrastructure best practices on performance, reliability, and security and their associated tools.</p>\n<p>Our stack is Python, Java, Terraform, gRPC, Docker, Kubernetes, Postgres, running on AWS. Come join us!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_eff95313-cdc","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Carta","sameAs":"https://carta.com/","logo":"https://logos.yubhub.co/carta.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/carta/jobs/7688689003","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$181,688 - $225,000","x-skills-required":["Cloud Platforms","Infrastructure as Code (IaC)","Networking","Monitoring and Observability","Software Development","API Services","AI Fluency"],"x-skills-preferred":["Experience operating CI/CD and its associated best practices"],"datePosted":"2026-04-18T15:55:48.770Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, California; Santa Clara, California; Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cloud Platforms, Infrastructure as Code (IaC), Networking, Monitoring and Observability, Software Development, API Services, AI Fluency, Experience operating CI/CD and its associated best practices","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":181688,"maxValue":225000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9d27e558-af6"},"title":"Senior Site Reliability 
Engineer","description":"<p><strong>Role</strong></p>\n<p>We are building a global operating network that finally enables supply-chain companies to collaborate within one platform. Our workflow engine empowers non-technical industry experts to model their complex manufacturing and operational processes. Our forms engine enables unprecedented data exchange between companies. And our upcoming AI engine can generate entire new processes and summarize the complex goings-on across thousands of workflows, identifying inefficiencies and driving optimization as companies react to a constantly-shifting global landscape.</p>\n<p>As an SRE you will have the opportunity to shape our developer platform, work directly with customers, and architect solutions that balance the rigorous security and reliability requirements of global enterprises with the speed and flexibility of a rapidly growing series A organization.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Contribute to SRE-owned portions of application codebases related to infrastructure clients, SaaS clients, observability, and reliability patterns.</li>\n<li>Contribute to the developer platform interfaces to enable a growing number of engineers, microservices, and environments (helm charts, CI platform, and deploy processes).</li>\n<li>Advocate for new tools and processes that will help Regrello grow.</li>\n<li>Take part in on-call rotations.</li>\n<li>Collaborate with cross-functional teams, including Development, QA, Product Management, to ensure successful releases.</li>\n</ul>\n<p><strong>Stack</strong></p>\n<ul>\n<li>GCP: GKE, CloudRun, Memorystore, CloudSQL, BigQuery</li>\n<li>Kubernetes: helm, helmfile</li>\n<li>Automation: Terraform, shell</li>\n<li>Queue: Temporal, Machinery, Celery</li>\n<li>Launchdarkly</li>\n<li>Otel / Prometheus / Grafana / Splunk</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Bachelor’s degree in Computer Science or a related field.</li>\n<li>4-8 years of experience 
in site reliability, software engineering, or a related role.</li>\n<li>Strong understanding of software development lifecycle (SDLC) and Agile methodologies.</li>\n<li>Experience with CI/CD tools such as Github Actions, GitLab CI, or CircleCI.</li>\n<li>Proficiency in scripting languages for automation tasks.</li>\n<li>Fluency with cloud platforms (AWS, Azure, GCP), kubernetes, feature flags, and modern backend technologies (experience with Go is strongly preferred, with the ability to quickly learn new technologies as needed).</li>\n<li>A builder’s spirit (you have a track record of building projects for fun, staying updated with open-source developments, etc.)</li>\n<li>Excellent problem-solving and communications skills, and attention to detail, with the ability to work effectively in a remote team environment.</li>\n</ul>\n<p><strong>Culture and Compensation</strong></p>\n<p>We are a customer-obsessed, product-driven company that is building a flexible, hybrid/remote culture to enable the brightest minds in the industry. We are particularly interested in candidates based in our hubs of Seattle, San Francisco, and New York, but we will consider candidates who live anywhere in the US, Canada, or Mexico. We have industry-leading compensation packages, including equity and health benefits. 
We are willing to sponsor US work authorization if needed.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_9d27e558-af6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Regrello","sameAs":"https://regrello.com","logo":"https://logos.yubhub.co/regrello.com.png"},"x-apply-url":"https://jobs.lever.co/regrello/e4222908-c38b-4c4c-9067-9f66d94c0be2","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$150,000-200,000 per year","x-skills-required":["Bachelor’s degree in Computer Science or a related field","4-8 years of experience in site reliability, software engineering, or a related role","Strong understanding of software development lifecycle (SDLC) and Agile methodologies","Experience with CI/CD tools such as Github Actions, GitLab CI, or CircleCI","Proficiency in scripting languages for automation tasks","Fluency with cloud platforms (AWS, Azure, GCP), kubernetes, feature flags, and modern backend technologies (experience with Go is strongly preferred, with the ability to quickly learn new technologies as needed)","A builder’s spirit (you have a track record of building projects for fun, staying updated with open-source developments, etc.)","Excellent problem-solving and communications skills, and attention to detail, with the ability to work effectively in a remote team environment"],"x-skills-preferred":["GCP: GKE, CloudRun, Memorystore, CloudSQL, BigQuery","Kubernetes: helm, helmfile","Automation: Terraform, shell","Queue: Temporal, Machinery, Celery","Launchdarkly","Otel / Prometheus / Grafana / Splunk"],"datePosted":"2026-04-17T12:54:41.965Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"United 
States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Bachelor’s degree in Computer Science or a related field, 4-8 years of experience in site reliability, software engineering, or a related role, Strong understanding of software development lifecycle (SDLC) and Agile methodologies, Experience with CI/CD tools such as Github Actions, GitLab CI, or CircleCI, Proficiency in scripting languages for automation tasks, Fluency with cloud platforms (AWS, Azure, GCP), kubernetes, feature flags, and modern backend technologies (experience with Go is strongly preferred, with the ability to quickly learn new technologies as needed), A builder’s spirit (you have a track record of building projects for fun, staying updated with open-source developments, etc.), Excellent problem-solving and communications skills, and attention to detail, with the ability to work effectively in a remote team environment, GCP: GKE, CloudRun, Memorystore, CloudSQL, BigQuery, Kubernetes: helm, helmfile, Automation: Terraform, shell, Queue: Temporal, Machinery, Celery, Launchdarkly, Otel / Prometheus / Grafana / Splunk","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":150000,"maxValue":200000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_77a449f6-0b3"},"title":"Senior Site Reliability Engineer","description":"<p>We are building a global operating network that enables supply-chain companies to collaborate within one platform. Our workflow engine empowers non-technical industry experts to model their complex manufacturing and operational processes. Our forms engine enables unprecedented data exchange between companies. 
And our upcoming AI engine can generate entire new processes and summarize the complex goings-on across thousands of workflows, identifying inefficiencies and driving optimization as companies react to a constantly-shifting global landscape.</p>\n<p>Role:\nAs an SRE at Regrello, you will oversee release management across a growing fleet of SaaS and customer environments. You will collaborate with various teams to ensure that Regrello releases are efficient, consistent, and high-quality, enabling us to scale our engineering and product teams to deliver more features.</p>\n<p>Some of the projects the team is working on include becoming cloud-neutral by turning all application infrastructure clients into plugins, helping our AI team manage GPU infrastructure to train new foundational models, and applying changes to our architecture to increase reliability.</p>\n<p>Culture and Compensation:\nWe are a customer-obsessed, product-driven company that is building a flexible, hybrid/remote culture to enable the brightest minds in the industry. We have industry-leading compensation packages, including equity and health benefits. 
We are willing to sponsor US work authorization if needed.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_77a449f6-0b3","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Regrello","sameAs":"https://regrello.com","logo":"https://logos.yubhub.co/regrello.com.png"},"x-apply-url":"https://jobs.lever.co/regrello/b59962e0-e244-4f03-9073-dfc4c362055f","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"MXN 85,000-105,000 per-month-salary","x-skills-required":["Bachelor's degree in Computer Science or a related field","4-8 years of experience in site reliability, software engineering, or a related role","Strong understanding of software development lifecycle (SDLC) and Agile methodologies","Experience with CI/CD tools such as Github Actions, GitLab CI, or CircleCI","Proficiency in scripting languages for automation tasks"],"x-skills-preferred":["Fluency with cloud platforms (AWS, Azure, GCP)","Kubernetes","Feature flags","Modern backend technologies (experience with Go is strongly preferred)"],"datePosted":"2026-04-17T12:53:28.057Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Monterrey"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Bachelor's degree in Computer Science or a related field, 4-8 years of experience in site reliability, software engineering, or a related role, Strong understanding of software development lifecycle (SDLC) and Agile methodologies, Experience with CI/CD tools such as Github Actions, GitLab CI, or CircleCI, Proficiency in scripting languages for automation tasks, Fluency with cloud platforms (AWS, Azure, GCP), Kubernetes, Feature flags, Modern backend technologies (experience with Go is strongly 
preferred)","baseSalary":{"@type":"MonetaryAmount","currency":"MXN","value":{"@type":"QuantitativeValue","minValue":85000,"maxValue":105000,"unitText":"MONTH"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ffccb977-f95"},"title":"Senior Site Reliability Engineer","description":"<p>Are you excited by the idea of building fast, reliable, and intelligent infrastructure for a product used by engineering teams around the world? We&#39;re looking for a Senior Site Reliability Engineer to join the Backstage team at Spotify. We&#39;re building the next generation of our developer platform, one that doesn&#39;t just manage software, but actively helps create and maintain it through AI-native workflows.</p>\n<p>In 2026, SRE isn&#39;t just about uptime; it&#39;s about symbiosis. As part of our growing engineering team, you&#39;ll design, build, and operate the cloud infrastructure behind our external developer portal product and our internal fleet of background coding agents. You&#39;ll collaborate closely with experienced engineers (both human and AI-assisted) while operating at real-world scale, with deep observability, strong safety boundaries, and the unique reliability challenges of agentic production systems.</p>\n<p>Backstage is more than just a platform; it&#39;s a foundational force in the developer community. Born out of Spotify&#39;s quest for better developer tooling, Backstage now powers developer portals across the globe. But we didn&#39;t stop at catalogs and templates. Today, Backstage is becoming the command center for AI-native engineering. 
From enterprises orchestrating large-scale migrations to fast-moving teams using AI to improve velocity and quality, our solutions are redefining what great developer experience looks like.</p>\n<p>As part of the Backstage team, you&#39;ll shape developer experience for companies large and small, for our thriving open-source community, and for Spotify itself. You&#39;ll help define how reliable, secure infrastructure enables the next wave of agentic developer tooling.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Own fleet reliability. Lead the reliability, security, and scalability strategy for Portal&#39;s SaaS infrastructure, including the runtime environments that power our platform and LLM-driven agent workflows. Define SLOs, drive capacity planning, and ensure our systems meet the demands of a rapidly growing product.</li>\n</ul>\n<ul>\n<li>Architect for the agentic era. Design and evolve infrastructure on GCP and AWS using Terraform and infrastructure-from-code patterns. Shape how we structure environments for non-deterministic AI workloads, including sandboxing, resource isolation, cost governance, and security boundaries.</li>\n</ul>\n<ul>\n<li>Drive operational excellence. Evolve our incident management, on-call, and postmortem practices. Leverage AI assistants to accelerate root cause analysis and build increasingly self-healing capabilities into our production systems.</li>\n</ul>\n<ul>\n<li>Lead fullstack reliability. Operate across a modern web stack (TypeScript, React, Python). While not frontend-heavy, you&#39;ll diagnose and resolve issues across the stack and drive reliability improvements end-to-end.</li>\n</ul>\n<ul>\n<li>Mentor and multiply. Raise the reliability IQ of the broader engineering team. Establish SRE best practices, conduct production-readiness reviews, and mentor engineers on operational thinking.</li>\n</ul>\n<ul>\n<li>Shape the roadmap. 
Partner with engineering and product leadership to evolve our infrastructure in step with generative AI features. Translate operational insights into strategic input on the product roadmap.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>You have 5+ years of hands-on experience operating cloud infrastructure (GCP and/or AWS), using Terraform and Kubernetes to run production systems at scale.</li>\n</ul>\n<ul>\n<li>You have practical experience, or a strong demonstrated interest, in operating LLM-based systems, RAG pipelines, or agentic workloads, and understand the reliability challenges of non-deterministic systems.</li>\n</ul>\n<ul>\n<li>You think in distributed systems first principles (consistency, availability, partition tolerance) and translate that thinking into pragmatic infrastructure decisions.</li>\n</ul>\n<ul>\n<li>You are proficient in at least one modern language (TypeScript, Java, Go, or Python) and comfortable navigating large, heterogeneous codebases, including environments where AI-generated PRs are common.</li>\n</ul>\n<ul>\n<li>You build automation and improve systems so that whole categories of operational issues disappear over time.</li>\n</ul>\n<ul>\n<li>You communicate complex infrastructure trade-offs clearly to both technical and non-technical stakeholders, and you write postmortems that lead to meaningful change.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ffccb977-f95","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Spotify","sameAs":"https://www.spotify.com","logo":"https://logos.yubhub.co/spotify.com.png"},"x-apply-url":"https://jobs.lever.co/spotify/fdfe281d-889c-478a-8f27-c9bc36b2b0cf","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$164,448–$234,926 USD","x-skills-required":["cloud 
infrastructure","Terraform","Kubernetes","LLM-based systems","RAG pipelines","agentic workloads","distributed systems","TypeScript","Java","Go","Python"],"x-skills-preferred":[],"datePosted":"2026-03-31T18:18:50.967Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud infrastructure, Terraform, Kubernetes, LLM-based systems, RAG pipelines, agentic workloads, distributed systems, TypeScript, Java, Go, Python","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":164448,"maxValue":234926,"unitText":"YEAR"}}}]}