{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/observability-solutions"},"x-facet":{"type":"skill","slug":"observability-solutions","display":"Observability Solutions","count":5},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_4df3e714-829"},"title":"Sr. Software Engineer","description":"<p>We are seeking a skilled and motivated Sr. Software Engineer to join our team. As a Sr. Software Engineer, you will be responsible for developing and maintaining our Payments services, including Card Attributes, Webhooks, and Event Pipeline. You will collaborate with cross-functional teams to design, build, and optimize high-throughput, fault-tolerant services within the VGS platform.</p>\n<p>Your responsibilities will include engaging in all phases of the software lifecycle - design, implement, test, deploy, and support services in production. You will maintain a culture of code quality through rigorous testing, automation, and code reviews. You will also be proactive and innovative, relying on your feedback to build a world-class product.</p>\n<p>We are looking for a candidate with deep hands-on expertise in Java and the Spring Framework, strong practical experience working with Kafka, and solid understanding and hands-on experience working with cloud-native architecture, microservices, CI/CD, GitOps, APIs, and API Gateway. You should also have strong experience implementing and leveraging Observability solutions, strong written and verbal communication skills, and bonus points if you have familiarity with the payment processing ecosystem.</p>\n<p>In addition to a competitive salary, you will receive flexible work hours, flexible PTO, competitive health benefits, VGS stock options, 401k plan with employer matching, life and disability insurance, pre-tax flexible spending accounts, global parental leave program, employee assistance program, home internet reimbursement, new hire home office set-up allowance, and professional learning reimbursement.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_4df3e714-829","directApply":true,"hiringOrganization":{"@type":"Organization","name":"VGS","sameAs":"https://www.vgs.io/","logo":"https://logos.yubhub.co/vgs.io.png"},"x-apply-url":"https://jobs.lever.co/verygoodsecurity/21eae4be-c4cb-48d3-9e08-c3923f3cf081","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Java","Spring Framework","Kafka","Cloud-native architecture","Microservices","CI/CD","GitOps","APIs","API Gateway","Observability solutions"],"x-skills-preferred":[],"datePosted":"2026-04-17T13:09:47.168Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Java, Spring Framework, Kafka, Cloud-native architecture, Microservices, CI/CD, GitOps, APIs, API Gateway, Observability solutions"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5dd5f58c-c07"},"title":"Principal Engineer","description":"<p>We&#39;re looking for a well-versed Principal Engineer to play a key role in architecting and building highly available, reliable, and scalable payments applications. Collaborate with Payments Engineering teams to design, develop, and champion best-practices, patterns, and standards for all payments applications. Work closely with our CTO and other architects to create holistic technology solutions for our customers.</p>\n<p>As a Principal Engineer, you will:</p>\n<ul>\n<li>Collaborate and communicate with Payments Engineering teams to design, develop, and champion best-practices, patterns, and standards for all payments applications.</li>\n<li>Work closely with our CTO and other architects to create holistic technology solutions for our customers.</li>\n<li>Be part of the Tech Leads group, driving measurable outcomes and iterative delivery strategy, removing roadblocks, empowering others, and mentoring high-potential engineers.</li>\n<li>Produce clear, detailed, and actionable design documents, architecture blueprints, architectural decisions with context, decision, and tradeoffs.</li>\n<li>Be involved in hands-on development of proof-of-concepts, prototypes, and real production-ready code.</li>\n<li>Mentor engineers on architecture best practices and standards.</li>\n<li>Engage in all phases of the software lifecycle - design, implement, test, deploy, and support services in production.</li>\n<li>Maintain a culture of code quality through rigorous testing, automation, and code reviews.</li>\n<li>Be proactive and innovative - we rely on your feedback to build a world-class product.</li>\n</ul>\n<p>We&#39;re seeking individuals with an equal flair for creative problem-solving, enthusiasm for new technologies, and a desire to contribute to our product. You will likely be successful in this role if you identify with the following traits: attention to detail, problem solver, customer-oriented, versatile, resilient, and confident.</p>\n<p>If all of this sounds interesting to you, we&#39;d love to hear from you.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_5dd5f58c-c07","directApply":true,"hiringOrganization":{"@type":"Organization","name":"VGS","sameAs":"https://www.vgs.com","logo":"https://logos.yubhub.co/vgs.com.png"},"x-apply-url":"https://jobs.lever.co/verygoodsecurity/33e033b6-ae9b-4d51-b190-262a2cb83d96","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Cloud SaaS environment","Highly available, reliable, and scalable SaaS applications/platforms","Backend API specs, mocks, and service implementations","Cloud-native architecture, microservices, CI/CD (GitHub Actions, Argo), GitOps, Authentication and Authorization, APIs and API Gateway, Docker, Kubernetes (EKS), Kafka (MSK), Java, Spring Framework, Python, and AWS services","Observability solutions using Grafana and Open Telemetry","DevOps, SRE, Configuration Management, and Release Management","Payments technologies and ecosystem (card networks, PSP integration)"],"x-skills-preferred":[],"datePosted":"2026-04-17T13:09:07.462Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cloud SaaS environment, Highly available, reliable, and scalable SaaS applications/platforms, Backend API specs, mocks, and service implementations, Cloud-native architecture, microservices, CI/CD (GitHub Actions, Argo), GitOps, Authentication and Authorization, APIs and API Gateway, Docker, Kubernetes (EKS), Kafka (MSK), Java, Spring Framework, Python, and AWS services, Observability solutions using Grafana and Open Telemetry, DevOps, SRE, Configuration Management, and Release Management, Payments technologies and ecosystem (card networks, PSP integration)"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_8c164f95-f8d"},"title":"Senior Infrastructure Engineer","description":"<p>Join our Infrastructure Engineering team and help ensure the reliability, scalability, and performance of Replit&#39;s infrastructure that serves millions of developers worldwide. As a Senior Infrastructure Engineer, you will bridge the gap between development and operations, implementing automation and establishing best practices that enable our platform to scale efficiently while maintaining high availability.</p>\n<p>We are seeking Senior Infrastructure Engineers who are passionate about building and maintaining resilient systems at scale. Your mission will be to proactively find and analyse reliability problems across our stack, then design and implement software and systems to address them. You will build robust monitoring solutions, automate operational tasks, and continuously improve our infrastructure&#39;s reliability.</p>\n<p><strong>You Will:</strong></p>\n<ul>\n<li>Drive Automation and Infrastructure as Code: Build and improve automation to eliminate toil and operational work. Maintain CI/CD pipelines and infrastructure automation using tools like Terraform or Pulumi. Create self-healing systems that can automatically respond to common failure scenarios.</li>\n<li>Optimise Performance and Infrastructure: Collaborate with core infrastructure and product teams to performance tune and optimise our cloud deployments (Kubernetes, Docker, GCP). Identify and resolve performance bottlenecks and implement capacity planning strategies.</li>\n<li>Elevate Developer Experience: Design and implement improvements to our build, test, and deployment systems to make software delivery faster, safer, and more reliable for all engineers.</li>\n<li>Drive Cross-Team Improvements: Partner with service owners across Replit to understand their pain points, and collaborate on implementing build/test/deploy enhancements within their specific services.</li>\n<li>Build Shared Tooling: Create and maintain centralized tooling and automation that improves the engineering lifecycle, from local development to production monitoring.</li>\n<li>Debug and Harden Systems: Dive deep into debugging difficult technical problems, making our systems and products more robust, operable, and easier to diagnose.</li>\n<li>Collaborate on Design Reviews: Participate in feature and system design reviews, contributing expertise on security, scale, and operational considerations.</li>\n<li>Build and Integrate: Write high-quality, well-tested code to meet the needs of your customers, including building pipelines to integrate with 3rd party vendors.</li>\n</ul>\n<p><strong>Required Skills and Experience:</strong></p>\n<ul>\n<li>4+ years of experience in Site Reliability Engineering or similar roles (DevOps, Systems Engineering, Infrastructure Engineering).</li>\n<li>Strong programming skills in languages like Python or Go.</li>\n<li>You write high-quality, well-tested code.</li>\n<li>Solid understanding of distributed systems. You&#39;ve built, scaled, and maintained production services and understand service-oriented architecture.</li>\n<li>Experience with container orchestration platforms (Kubernetes) and cloud-native technologies.</li>\n<li>Experience implementing and maintaining monitoring/observability solutions, with strong skills in debugging and performance tuning.</li>\n<li>Strong incident management skills with experience participating in incident response and demonstrated critical thinking under pressure.</li>\n<li>Experience with infrastructure as code (e.g., Terraform) and configuration management tools.</li>\n<li>Excellent written and verbal communication skills, with an ability to explain technical concepts clearly.</li>\n<li>A willingness to dive into understanding, debugging, and improving any layer of the stack.</li>\n<li>You&#39;re passionate about making software creation accessible and empowering the next generation of builders.</li>\n</ul>\n<p><strong>Bonus Points:</strong></p>\n<ul>\n<li>Experience with Google Cloud Platform (GCP) services and tools.</li>\n<li>Knowledge of modern observability platforms (Prometheus, Grafana, Datadog, etc.).</li>\n<li>Experience building reliable systems capable of handling high throughput and low latency.</li>\n<li>Experience with Go and Terraform.</li>\n<li>Familiarity with working in rapid-growth environments.</li>\n</ul>\n<p>_This is a full-time role that can be held from our Foster City, CA office. The role has an in-office requirement of Monday, Wednesday, and Friday._</p>\n<p><strong>Full-Time Employee Benefits Include:</strong></p>\n<ul>\n<li>Competitive Salary &amp; Equity</li>\n<li>401(k) Program with a 4% match</li>\n<li>Health, Dental, Vision and Life Insurance</li>\n<li>Short Term and Long Term Disability</li>\n<li>Paid Parental, Medical, Caregiver Leave</li>\n<li>Commuter Benefits</li>\n<li>Monthly Wellness Stipend</li>\n<li>Autonomous Work Environment</li>\n<li>In Office Set-Up Reimbursement</li>\n<li>Flexible Time Off (FTO) + Holidays</li>\n<li>Quarterly Team Gatherings</li>\n<li>In Office Amenities</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_8c164f95-f8d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Replit","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/replit.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/replit/16c85abc-763c-4f36-ab67-64f416343384","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$190K - $240K","x-skills-required":["Site Reliability Engineering","DevOps","Systems Engineering","Infrastructure Engineering","Python","Go","Terraform","Kubernetes","Docker","GCP","Monitoring/observability solutions","Debugging and performance tuning","Incident management","Infrastructure as code","Configuration management tools"],"x-skills-preferred":["Google Cloud Platform (GCP) services and tools","Modern observability platforms (Prometheus, Grafana, Datadog, etc.)","Building reliable systems capable of handling high throughput and low latency","Go and Terraform","Familiarity with working in rapid-growth environments"],"datePosted":"2026-03-07T15:20:28.138Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Foster City, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Site Reliability Engineering, DevOps, Systems Engineering, Infrastructure Engineering, Python, Go, Terraform, Kubernetes, Docker, GCP, Monitoring/observability solutions, Debugging and performance tuning, Incident management, Infrastructure as code, Configuration management tools, Google Cloud Platform (GCP) services and tools, Modern observability platforms (Prometheus, Grafana, Datadog, etc.), Building reliable systems capable of handling high throughput and low latency, Go and Terraform, Familiarity with working in rapid-growth environments","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":190000,"maxValue":240000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b7de618e-5e1"},"title":"Site Reliability Engineer","description":"<p>Join our Site Reliability Engineering team and help ensure the reliability, scalability, and performance of Replit&#39;s infrastructure that serves millions of developers worldwide. As a Site Reliability Engineer, you will bridge the gap between development and operations, implementing automation and establishing best practices that enable our platform to scale efficiently while maintaining high availability.</p>\n<p>We are seeking SREs who are passionate about building and maintaining resilient systems at scale. Your mission will be to design and implement robust monitoring solutions, automate operational tasks, and continuously improve our infrastructure&#39;s reliability and performance.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and Implement Observability Solutions: Develop comprehensive monitoring and alerting systems using modern observability tools. Create dashboards and metrics that provide real-time visibility into system health and performance. Implement logging strategies that enable quick problem identification and resolution.</li>\n</ul>\n<ul>\n<li>Drive Automation and Infrastructure as Code: Architect and implement infrastructure automation solutions using tools like Terraform, Ansible, or Pulumi. Design and maintain CI/CD pipelines that enable reliable and consistent deployments. Create self-healing systems that can automatically respond to common failure scenarios.</li>\n</ul>\n<ul>\n<li>Establish SLOs and SLIs: Work with product and engineering teams to define and implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Build systems to track and report on these metrics, ensuring we maintain high reliability standards while balancing innovation speed.</li>\n</ul>\n<ul>\n<li>Incident Management and Response: Lead incident response efforts, conducting thorough post-mortems, and implementing improvements to prevent future occurrences. Develop and maintain runbooks for critical services. Build tools and processes that reduce Mean Time To Recovery (MTTR).</li>\n</ul>\n<ul>\n<li>Performance Optimization: Identify and resolve performance bottlenecks across our infrastructure. Implement capacity planning strategies and optimize resource utilization. Work on reducing latency and improving system efficiency across global regions.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>4-8 years of experience in Site Reliability Engineering or similar roles (DevOps, Systems Engineering, Infrastructure Engineering)</li>\n</ul>\n<ul>\n<li>Strong programming skills in languages commonly used for automation (Python, Go, or similar)</li>\n</ul>\n<ul>\n<li>Deep understanding of distributed systems</li>\n</ul>\n<ul>\n<li>Experience with container orchestration platforms (Kubernetes) and cloud-native technologies</li>\n</ul>\n<ul>\n<li>Proven track record of implementing and maintaining monitoring/observability solutions</li>\n</ul>\n<ul>\n<li>Strong incident management skills with experience leading incident response</li>\n</ul>\n<ul>\n<li>Experience with infrastructure as code and configuration management tools</li>\n</ul>\n<p><strong>Bonus Points</strong></p>\n<ul>\n<li>Experience with Google Cloud Platform (GCP) services and tools</li>\n</ul>\n<ul>\n<li>Knowledge of modern observability platforms (Prometheus, Grafana, Datadog, etc.)</li>\n</ul>\n<p><strong>What We Value</strong></p>\n<ul>\n<li>Problem-solving mindset: Ability to approach complex operational challenges systematically and devise effective solutions</li>\n</ul>\n<ul>\n<li>Self-directed and autonomous: Capable of working independently while collaborating effectively with cross-functional teams</li>\n</ul>\n<ul>\n<li>Strong communication skills: Ability to explain complex technical concepts to both technical and non-technical audiences</li>\n</ul>\n<ul>\n<li>Continuous learning: Passion for staying current with industry best practices and new technologies</li>\n</ul>\n<ul>\n<li>Focus on automation: Strong belief in automating repetitive tasks and building self-healing systems</li>\n</ul>\n<p><strong>Full-Time Employee Benefits Include</strong></p>\n<ul>\n<li>Competitive Salary &amp; Equity</li>\n</ul>\n<ul>\n<li>401(k) Program with a 4% match</li>\n</ul>\n<ul>\n<li>Health, Dental, Vision and Life Insurance</li>\n</ul>\n<ul>\n<li>Short Term and Long Term Disability</li>\n</ul>\n<ul>\n<li>Paid Parental, Medical, Caregiver Leave</li>\n</ul>\n<ul>\n<li>Commuter Benefits</li>\n</ul>\n<ul>\n<li>Monthly Wellness Stipend</li>\n</ul>\n<ul>\n<li>Autonomous Work Environment</li>\n</ul>\n<ul>\n<li>In Office Set-Up Reimbursement</li>\n</ul>\n<ul>\n<li>Flexible Time Off (FTO) + Holidays</li>\n</ul>\n<ul>\n<li>Quarterly Team Gatherings</li>\n</ul>\n<ul>\n<li>In Office Amenities</li>\n</ul>\n<p><strong>Want to Learn More About What We Are Up To?</strong></p>\n<ul>\n<li>Meet the Replit Agent</li>\n</ul>\n<ul>\n<li>Replit: Make an app for that</li>\n</ul>\n<ul>\n<li>Replit Blog</li>\n</ul>\n<ul>\n<li>Amjad TED Talk</li>\n</ul>\n<p><strong>Interviewing + Culture at Replit</strong></p>\n<ul>\n<li>Operating Principles</li>\n</ul>\n<ul>\n<li>Reasons not to work at Replit</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b7de618e-5e1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Replit","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/replit.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/replit/f6e6158e-eb89-4008-81ea-1b7512bc509d","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$160K - $250K","x-skills-required":["Site Reliability Engineering","DevOps","Systems Engineering","Infrastructure Engineering","Python","Go","Distributed systems","Container orchestration platforms","Cloud-native technologies","Monitoring/observability solutions","Incident management","Infrastructure as code","Configuration management tools"],"x-skills-preferred":["Google Cloud Platform","Prometheus","Grafana","Datadog"],"datePosted":"2026-03-07T15:20:24.140Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Site Reliability Engineering, DevOps, Systems Engineering, Infrastructure Engineering, Python, Go, Distributed systems, Container orchestration platforms, Cloud-native technologies, Monitoring/observability solutions, Incident management, Infrastructure as code, Configuration management tools, Google Cloud Platform, Prometheus, Grafana, Datadog","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":160000,"maxValue":250000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_323bc85d-b69"},"title":"Staff Infrastructure Engineer","description":"<p><strong>About the Role:</strong></p>\n<p>Join our Infrastructure Engineering team and help ensure the reliability, scalability, and performance of Replit&#39;s infrastructure that serves millions of developers worldwide. As a Staff Infrastructure Engineer, you will bridge the gap between development and operations, implementing automation and establishing best practices that enable our platform to scale efficiently while maintaining high availability.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Drive Automation and Infrastructure as Code: Architect, build, and improve automation to eliminate toil and operational work. Design and maintain CI/CD pipelines and infrastructure automation using tools like Terraform or Pulumi. Create self-healing systems that can automatically respond to common failure scenarios.</li>\n</ul>\n<ul>\n<li>Optimise Performance and Infrastructure: Collaborate with core infrastructure and product teams to performance tune and optimise our cloud deployments (Kubernetes, Docker, GCP). Identify and resolve performance bottlenecks, implement capacity planning strategies, and reduce latency across global regions.</li>\n</ul>\n<ul>\n<li>Elevate Developer Experience: Design and implement improvements to our build, test, and deployment systems to make software delivery faster, safer, and more reliable for all engineers.</li>\n</ul>\n<ul>\n<li>Drive Cross-Company Improvements: Partner directly with service owners across Replit to understand their pain points, and collaborate on implementing build/test/deploy enhancements within their specific services.</li>\n</ul>\n<ul>\n<li>Build Shared Tooling: Create and maintain centralized tooling and automation that improves the entire engineering lifecycle, from local development to production monitoring.</li>\n</ul>\n<ul>\n<li>Debug and Harden Systems: Dive deep into debugging extremely difficult technical problems, making our systems and products more robust, operable, and easier to diagnose.</li>\n</ul>\n<ul>\n<li>Provide Staff-Level Guidance: Review feature and system designs, acting as an owner for the security, scale, and operational integrity of those designs.</li>\n</ul>\n<ul>\n<li>Educate and Mentor: Educate, mentor, and hold accountable the engineering team to improve the reliability of our systems, making reliability a core value of the Replit engineering culture.</li>\n</ul>\n<ul>\n<li>Build and Integrate: Write high-quality, well-tested code to meet the needs of your customers, including building pipelines to integrate with 3rd party vendors.</li>\n</ul>\n<p><strong>Required Skills and Experience:</strong></p>\n<ul>\n<li>8-10 years of experience in Infrastructure Engineering or similar roles (DevOps, Systems Engineering, Site Reliability Engineering).</li>\n</ul>\n<ul>\n<li>Strong programming skills in languages like Python or Go.</li>\n</ul>\n<ul>\n<li>You write high-quality, well-tested code.</li>\n</ul>\n<ul>\n<li>Deep understanding of distributed systems. You&#39;ve designed, built, scaled, and maintained production services and know how to compose a service-oriented architecture.</li>\n</ul>\n<ul>\n<li>Experience with container orchestration platforms (Kubernetes) and cloud-native technologies.</li>\n</ul>\n<ul>\n<li>Proven track record of implementing and maintaining monitoring/observability solutions, with strong skills in debugging and performance tuning.</li>\n</ul>\n<ul>\n<li>Strong incident management skills with experience leading incident response and demonstrated critical thinking under pressure.</li>\n</ul>\n<ul>\n<li>Experience with infrastructure as code (e.g., Terraform) and configuration management tools.</li>\n</ul>\n<ul>\n<li>Excellent written and verbal communication skills, with an ability to explain technical concepts clearly and simply and a bias toward open, transparent cultural practices.</li>\n</ul>\n<ul>\n<li>Strong interpersonal skills, with experience working with engineers from junior to principal levels.</li>\n</ul>\n<ul>\n<li>A willingness to dive into understanding, debugging, and improving any layer of the stack.</li>\n</ul>\n<ul>\n<li>You&#39;re passionate about making software creation accessible and empowering the next generation of builders.</li>\n</ul>\n<p><strong>Bonus Points:</strong></p>\n<ul>\n<li>Deep experience with Google Cloud Platform (GCP) services and tools.</li>\n</ul>\n<ul>\n<li>Knowledge of modern observability platforms (Prometheus, Grafana, Datadog, etc.).</li>\n</ul>\n<ul>\n<li>Experience designing and building reliable systems capable of handling high throughput and low latency.</li>\n</ul>\n<ul>\n<li>Experience with Go and Terraform.</li>\n</ul>\n<ul>\n<li>Familiarity with working in rapid-growth environments.</li>\n</ul>\n<ul>\n<li>Experience writing company-facing blog posts and training materials.</li>\n</ul>\n<p><strong>Full-Time Employee Benefits Include:</strong></p>\n<ul>\n<li>Competitive Salary &amp; Equity</li>\n</ul>\n<ul>\n<li>401(k) Program with a 4% match</li>\n</ul>\n<ul>\n<li>Health, Dental, Vision and Life Insurance</li>\n</ul>\n<ul>\n<li>Short Term and Long Term Disability</li>\n</ul>\n<ul>\n<li>Paid Parental, Medical, Caregiver Leave</li>\n</ul>\n<ul>\n<li>Commuter Benefits</li>\n</ul>\n<ul>\n<li>Monthly Wellness Stipend</li>\n</ul>\n<ul>\n<li>Autonomous Work Environment</li>\n</ul>\n<ul>\n<li>In Office Set-Up Reimbursement</li>\n</ul>\n<ul>\n<li>Flexible Time Off (FTO) + Holidays</li>\n</ul>\n<ul>\n<li>Quarterly Team Gatherings</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_323bc85d-b69","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Replit","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/replit.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/replit/6481ec1e-527c-4c1f-a041-2fb5021e7bd5","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$220K – $325K","x-skills-required":["Infrastructure Engineering","DevOps","Systems Engineering","Site Reliability Engineering","Python","Go","Distributed systems","Container orchestration platforms","Cloud-native technologies","Monitoring/observability solutions","Infrastructure as code","Configuration management tools"],"x-skills-preferred":["Google Cloud Platform","Prometheus","Grafana","Datadog","Go","Terraform","Rapid-growth environments","Company-facing blog posts","Training materials"],"datePosted":"2026-03-07T15:18:43.191Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Foster City, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Infrastructure Engineering, DevOps, Systems Engineering, Site Reliability Engineering, Python, Go, Distributed systems, Container orchestration platforms, Cloud-native technologies, Monitoring/observability solutions, Infrastructure as code, Configuration management tools, Google Cloud Platform, Prometheus, Grafana, Datadog, Go, Terraform, Rapid-growth environments, Company-facing blog posts, Training materials","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":220000,"maxValue":325000,"unitText":"YEAR"}}}]}