{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/opentelemetry"},"x-facet":{"type":"skill","slug":"opentelemetry","display":"OpenTelemetry","count":21},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_07c95966-8e7"},"title":"Backend Developer - Host Experience (all genders)","description":"<p>Join our Host Experience department as a Backend Developer and become part of the team that brings new vacation rental properties to life on Holidu.</p>\n<p>You&#39;ll be working at the heart of our property acquisition engine, where we take hosts from their very first sign-up all the way to their first booking, making that journey as fast and seamless as possible.</p>\n<p>This team sits at a uniquely strategic intersection of product and growth. 
You will build and optimize the systems that every new host flows through: from onboarding and listing creation, to property configuration, content quality, and referral programs.</p>\n<p>The work demands reliability and attention to detail, because the time between a host signing up and welcoming their first guest, and how well their property performs from day one, is directly shaped by the quality of what you build.</p>\n<p><strong>Our Tech Stack</strong></p>\n<ul>\n<li>Backend written in Kotlin and Java 21+ (with Spring Boot), with Gradle.</li>\n<li>Deployed as microservices on an AWS-hosted Kubernetes cluster (EKS).</li>\n<li>Internal and external web applications written with ReactJS.</li>\n<li>Event-driven communication between services through EventBridge with SQS / ActiveMQ.</li>\n<li>Usage of a diverse set of technologies depending on the use case, such as PostgreSQL, S3, Valkey, ElasticSearch, GraphQL, and many more.</li>\n<li>Monitoring with OpenTelemetry, Grafana, Prometheus, ELK, APM, and CloudWatch.</li>\n</ul>\n<p><strong>Your role in this journey</strong></p>\n<ul>\n<li>Design, build, evolve, and maintain our services, creating a great user experience for our hosts.</li>\n<li>Build a strong understanding of the product, use it to drive initiatives end-to-end, and contribute to shaping the team&#39;s direction as you grow.</li>\n<li>Work AI-first: use AI to accelerate not just coding, but data exploration, codebase understanding, technical design, and decision-making, and continuously sharpen how you use these tools.</li>\n</ul>\n<p><strong>Your backpack is filled with</strong></p>\n<ul>\n<li>A passion for great user experience and drive to deliver world-class products.</li>\n<li>Early experience delivering product impact through engineering - you&#39;ve shipped things that real users depend on.</li>\n<li>Experience with Java or Kotlin with Spring is a plus.</li>\n<li>Experience with relational databases and deploying apps in cloud environments. 
NoSQL experience is a plus.</li>\n<li>Familiarity with various API types and integration best practices.</li>\n<li>Strong problem-solving skills and a team-oriented mindset.</li>\n<li>Curiosity for the business side - you want to understand the “why” behind the features.</li>\n<li>A love for coding and building high-quality products that make a difference.</li>\n<li>High motivation to learn and experiment with new technologies.</li>\n</ul>\n<p><strong>Our adventure includes</strong></p>\n<ul>\n<li>Impact: Shape the future of travel with products used by millions of guests and thousands of hosts. At Holidu ideas become products, data drives decisions, and iteration fuels fast learning. Your work matters - and you’ll see the impact.</li>\n<li>Learning: Grow professionally in a culture that thrives on curiosity and feedback. You’ll learn from outstanding colleagues, collaborate across disciplines, and benefit from mentorship and personal learning budgets - with a strong focus on AI.</li>\n<li>Great People: Join a team of smart, motivated and international colleagues who challenge and support each other. We celebrate wins and keep our culture fun, ambitious and human. Our customers are guests and hosts - people we can all relate to - making work meaningful and energizing.</li>\n<li>Technology: Work in a modern tech environment. You’ll experience the pace of a scale-up combined with the stability of a proven business model, enabling you to build, test, and improve continuously.</li>\n<li>Flexibility: Work a hybrid setup with 50% in-office time for collaboration, and spend up to 8 weeks a year from other inspiring locations. 
You’ll stay connected through regular events and meet-ups across our almost 30 offices.</li>\n<li>Perks on Top: Of course, we also offer travel benefits, gym discounts, and other perks to keep you energized - but what truly sets us apart is the chance to grow in a dynamic industry, alongside amazing people, while having fun along the way.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_07c95966-8e7","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Holidu Hosts GmbH","sameAs":"https://holidu.jobs.personio.com","logo":"https://logos.yubhub.co/holidu.jobs.personio.com.png"},"x-apply-url":"https://holidu.jobs.personio.com/job/2589679","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"Full-time","x-salary-range":null,"x-skills-required":["Java","Kotlin","Spring Boot","Gradle","AWS","Kubernetes","ReactJS","EventBridge","SQS","ActiveMQ","PostgreSQL","S3","Valkey","ElasticSearch","GraphQL","OpenTelemetry","Grafana","Prometheus","ELK","APM","CloudWatch"],"x-skills-preferred":[],"datePosted":"2026-04-18T22:14:06.987Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Munich, Germany"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Java, Kotlin, Spring Boot, Gradle, AWS, Kubernetes, ReactJS, EventBridge, SQS, ActiveMQ, PostgreSQL, S3, Valkey, ElasticSearch, GraphQL, OpenTelemetry, Grafana, Prometheus, ELK, APM, CloudWatch"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f6deb282-e3c"},"title":"Senior Backend Developer (all genders)","description":"<p>Join our Host Experience department as a Senior Backend Developer and become part of the team that powers how our hosts&#39; vacation rentals reach the world.</p>\n<p>You&#39;ll be working at the 
core of our distribution engine - where we take tens of thousands of homes and make them bookable on major travel platforms such as Holidu, Booking.com, Airbnb, VRBO, HomeToGo, and Check24.</p>\n<p>This team operates in one of the most technically dynamic areas of our product. You will work with systems that synchronize large volumes of updates at high speed and maintain high availability, while integrating with a wide variety of partner APIs - each with its own structure and complexity.</p>\n<p>It&#39;s work that demands precision, scalability, and smart engineering decisions, and it plays a crucial role in helping our hosts reach millions of guests worldwide.</p>\n<p><strong>Our Tech Stack</strong></p>\n<ul>\n<li>Backend written in Kotlin and Java 21+ (with Spring Boot), with Gradle.</li>\n<li>Deployed as microservices on an AWS-hosted Kubernetes cluster (EKS).</li>\n<li>Internal and external web applications written with ReactJS.</li>\n<li>Event-driven communication between services through EventBridge with SQS / ActiveMQ.</li>\n<li>Usage of a diverse set of technologies depending on the use case, such as PostgreSQL, S3, Valkey, ElasticSearch, GraphQL, and many more.</li>\n<li>Monitoring with OpenTelemetry, Grafana, Prometheus, ELK, APM, and CloudWatch.</li>\n</ul>\n<p><strong>Your role in this journey</strong></p>\n<ul>\n<li>Design, build, evolve, and maintain our services, creating a great user experience for our hosts.</li>\n<li>Build a strong understanding of the product, use it to drive initiatives end-to-end, and actively shape the team&#39;s direction - not just execute on it.</li>\n<li>Work AI-first: use AI to accelerate not just coding, but data exploration, codebase understanding, technical design, and decision-making, and continuously sharpen how you use these tools.</li>\n<li>Ensure our applications are highly scalable, capable of handling tens of thousands of properties and millions of bookings.</li>\n<li>Work with data persistence - whether in 
PostgreSQL, Redis, S3, or new state-of-the-art technologies you help us evaluate.</li>\n<li>Ship to production daily - deploying to our AWS Kubernetes cluster is part of the routine, not a special occasion.</li>\n<li>Own the reliability of your services - set up monitoring, define SLOs, and drive incident resolution so your team can move fast with confidence.</li>\n<li>Collaborate in a supportive, cross-functional team that values knowledge sharing and improving together.</li>\n<li>Apply engineering best practices, and stay curious by experimenting with new technologies.</li>\n</ul>\n<p><strong>Your backpack is filled with</strong></p>\n<ul>\n<li>A passion for great user experience and drive to deliver world-class products.</li>\n<li>Proven track record of delivering product impact through engineering - not just building services, but solving real problems for users.</li>\n<li>Experience with Java or Kotlin with Spring is a plus.</li>\n<li>Experience with relational databases and deploying apps in cloud environments. NoSQL experience is a plus.</li>\n<li>Familiarity with various API types and integration best practices.</li>\n<li>Strong problem-solving skills and a team-oriented mindset.</li>\n<li>Curiosity for the business side - you want to understand the “why” behind the features.</li>\n<li>A love for coding and building high-quality products that make a difference.</li>\n<li>High motivation to learn and experiment with new technologies.</li>\n</ul>\n<p><strong>Our adventure includes</strong></p>\n<ul>\n<li>Impact: Shape the future of travel with products used by millions of guests and thousands of hosts. At Holidu ideas become products, data drives decisions, and iteration fuels fast learning. Your work matters - and you’ll see the impact.</li>\n<li>Learning: Grow professionally in a culture that thrives on curiosity and feedback. 
You’ll learn from outstanding colleagues, collaborate across disciplines, and benefit from mentorship and personal learning budgets - with a strong focus on AI.</li>\n<li>Great People: Join a team of smart, motivated and international colleagues who challenge and support each other. We celebrate wins and keep our culture fun, ambitious and human. Our customers are guests and hosts - people we can all relate to - making work meaningful and energizing.</li>\n<li>Technology: Work in a modern tech environment. You’ll experience the pace of a scale-up combined with the stability of a proven business model, enabling you to build, test, and improve continuously.</li>\n<li>Flexibility: Work a hybrid setup with 50% in-office time for collaboration, and spend up to 8 weeks a year from other inspiring locations. You’ll stay connected through regular events and meet-ups across our almost 30 offices.</li>\n<li>Perks on Top: Of course, we also offer travel benefits, gym discounts, and other perks to keep you energized - but what truly sets us apart is the chance to grow in a dynamic industry, alongside amazing people, while having fun along the way.</li>\n</ul>","url":"https://yubhub.co/jobs/job_f6deb282-e3c","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Holidu Hosts GmbH","sameAs":"https://holidu.jobs.personio.com","logo":"https://logos.yubhub.co/holidu.jobs.personio.com.png"},"x-apply-url":"https://holidu.jobs.personio.com/job/2573674","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"Full-time","x-salary-range":null,"x-skills-required":["Java","Kotlin","Spring Boot","Gradle","AWS-hosted Kubernetes 
cluster","ReactJS","EventBridge","SQS","ActiveMQ","PostgreSQL","S3","Valkey","ElasticSearch","GraphQL","OpenTelemetry","Grafana","Prometheus","ELK","APM","CloudWatch"],"x-skills-preferred":[],"datePosted":"2026-04-18T22:09:50.075Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Munich, Germany"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Java, Kotlin, Spring Boot, Gradle, AWS-hosted Kubernetes cluster, ReactJS, EventBridge, SQS, ActiveMQ, PostgreSQL, S3, Valkey, ElasticSearch, GraphQL, OpenTelemetry, Grafana, Prometheus, ELK, APM, CloudWatch"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_95c49f85-a98"},"title":"Staff+ Software Engineer, Observability","description":"<p><strong>About the Role</strong></p>\n<p>Anthropic is seeking talented and experienced Software Engineers to join our Observability team within the Infrastructure organization. The Observability team owns the monitoring and telemetry infrastructure that every engineer and researcher at Anthropic depends on - from metrics and logging pipelines to distributed tracing, error analytics, alerting, and the dashboards and query interfaces that make it all actionable.</p>\n<p>As Anthropic scales its infrastructure across massive GPU, TPU, and Trainium clusters, the volume and complexity of operational data are growing by orders of magnitude. 
We’re building next-generation observability systems - high-throughput ingest pipelines, cost-efficient columnar storage, unified query layers across signals, and agentic diagnostic tools - to ensure that engineers can detect, diagnose, and resolve issues in minutes rather than hours, even as the systems they operate become exponentially more complex.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and build scalable telemetry ingest and storage pipelines for metrics, logs, traces, and error data across Anthropic’s multi-cluster infrastructure</li>\n</ul>\n<ul>\n<li>Own and evolve core observability platforms, driving migrations and architectural improvements that improve reliability, reduce cost, and scale with organisational growth</li>\n</ul>\n<ul>\n<li>Build instrumentation libraries, SDKs, and integrations that make it easy for engineering teams to emit high-quality telemetry from their services</li>\n</ul>\n<ul>\n<li>Drive alerting and SLO infrastructure that enables teams to define, monitor, and respond to reliability targets with minimal noise</li>\n</ul>\n<ul>\n<li>Reduce mean time to detection and resolution by building cross-signal correlation, unified query interfaces, and AI-assisted diagnostic tooling</li>\n</ul>\n<ul>\n<li>Partner with Research, Inference, Product, and Infrastructure teams to ensure observability solutions meet the unique needs of each organisation</li>\n</ul>\n<p><strong>You May Be a Good Fit If You</strong></p>\n<ul>\n<li>Have 10+ years of relevant industry experience building and operating large-scale observability or monitoring infrastructure</li>\n</ul>\n<ul>\n<li>Have deep experience with at least one observability signal area (metrics, logging, tracing, or error analytics) and familiarity with the others</li>\n</ul>\n<ul>\n<li>Understand high-throughput data pipelines, columnar storage engines, and the tradeoffs involved in ingesting and querying telemetry data at scale</li>\n</ul>\n<ul>\n<li>Have experience 
operating or building on top of observability platforms such as Prometheus, Grafana, ClickHouse, OpenTelemetry, or similar systems</li>\n</ul>\n<ul>\n<li>Have strong proficiency in at least one of Python, Rust, or Go</li>\n</ul>\n<ul>\n<li>Have excellent communication skills and enjoy partnering with internal teams to improve their operational visibility and incident response capabilities</li>\n</ul>\n<ul>\n<li>Are excited about building foundational infrastructure and are comfortable working independently on ambiguous, high-impact technical challenges</li>\n</ul>\n<p><strong>Strong Candidates May Also Have</strong></p>\n<ul>\n<li>Experience operating metrics systems at very high cardinality (hundreds of millions of active time series or more)</li>\n</ul>\n<ul>\n<li>Experience with log storage migrations or operating columnar databases (ClickHouse, BigQuery, or similar) for analytics workloads</li>\n</ul>\n<ul>\n<li>Experience with OpenTelemetry instrumentation, collector pipelines, and tail-based sampling strategies</li>\n</ul>\n<ul>\n<li>Experience building or operating alerting platforms, on-call tooling, or SLO frameworks at scale</li>\n</ul>\n<ul>\n<li>Experience with Kubernetes-native monitoring, eBPF-based observability, or continuous profiling</li>\n</ul>\n<ul>\n<li>Interest in applying AI/LLMs to operational workflows such as automated root cause analysis, anomaly detection, or intelligent alerting</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<ul>\n<li>Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience</li>\n</ul>\n<ul>\n<li>Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience</li>\n</ul>\n<ul>\n<li>Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position</li>\n</ul>\n<ul>\n<li>Location-based hybrid policy: Currently, we expect all staff to be in one 
of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>\n</ul>\n<ul>\n<li>Visa sponsorship: We do sponsor visas! However, we aren’t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact - advancing our long-term goals of steerable, trustworthy AI - rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We’re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p><strong>Come work with us!</strong></p>\n<p>Anthropic is a public benefit corporation headquartered in San Francisco. 
We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>","url":"https://yubhub.co/jobs/job_95c49f85-a98","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5102440008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"£325,000-£390,000 GBP","x-skills-required":["observability","telemetry","metrics","logging","tracing","error analytics","alerting","SLO infrastructure","cross-signal correlation","unified query interfaces","AI-assisted diagnostic tooling","Python","Rust","Go","Prometheus","Grafana","ClickHouse","OpenTelemetry"],"x-skills-preferred":["high-throughput data pipelines","columnar storage engines","Kubernetes-native monitoring","eBPF-based observability","continuous profiling","AI/LLMs","automated root cause analysis","anomaly detection","intelligent alerting"],"datePosted":"2026-04-18T15:57:27.177Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"observability, telemetry, metrics, logging, tracing, error analytics, alerting, SLO infrastructure, cross-signal correlation, unified query interfaces, AI-assisted diagnostic tooling, Python, Rust, Go, Prometheus, Grafana, ClickHouse, OpenTelemetry, high-throughput data pipelines, columnar storage engines, Kubernetes-native monitoring, eBPF-based observability, continuous profiling, AI/LLMs, automated root cause analysis, anomaly detection, 
intelligent alerting","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":325000,"maxValue":390000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_067a9092-157"},"title":"Manager, Software Engineering - Observability","description":"<p>We are seeking a Manager, Software Engineering - Observability to lead our team of engineers responsible for the reliability, scalability, and evolution of Figma&#39;s observability and cost engineering platforms.</p>\n<p>As a key member of our engineering team, you will own and operate Figma&#39;s core observability stack, including vendor platforms such as Datadog, ensuring high availability, strong data quality, and effective signal-to-noise across metrics, logs, and traces.</p>\n<p>You will define and drive the technical strategy for instrumentation standards, observability libraries, agents, and operators used to monitor internal and external facing services. 
You will also explore and implement innovative, AI-driven approaches to anomaly detection, root cause analysis, signal correlation, and operational automation.</p>\n<p>In addition, you will establish clear frameworks for cost attribution, budgeting, forecasting, and alerting across infrastructure and observability spend, enabling teams to make informed tradeoffs.</p>\n<p>You will partner with infrastructure, product engineering, finance, and security teams to improve visibility into system health and cost efficiency at scale.</p>\n<p>You will lead initiatives to optimize observability footprint and spend, balancing depth of insight with performance and cost considerations.</p>\n<p>You will coach and mentor engineers through career development, performance feedback, and technical leadership, fostering a culture of ownership, collaboration, and high-quality execution.</p>\n<p>We are looking for someone with 4+ years of experience leading infrastructure, observability, or platform engineering teams, with a track record of delivering highly reliable production systems.</p>\n<p>You should have deep hands-on experience with modern observability platforms (e.g., Datadog, OpenTelemetry) across metrics, logs, and distributed tracing.</p>\n<p>You should have a strong understanding of distributed systems, instrumentation best practices, SLO design, and incident response workflows.</p>\n<p>Experience driving cost transparency and accountability initiatives, including cost attribution, budgeting, forecasting, and alerting in cloud environments is also required.</p>\n<p>Preferred skills include experience designing or evolving company-wide observability standards, shared libraries, and agent/operator-based integrations, background in cost optimization for infrastructure or observability tooling, including vendor negotiations and usage modeling, and experience applying AI or machine learning techniques to anomaly detection, root cause analysis, or operational automation.</p>","url":"https://yubhub.co/jobs/job_067a9092-157","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Figma","sameAs":"https://www.figma.com/","logo":"https://logos.yubhub.co/figma.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/figma/jobs/5807963004","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$258,000-$376,000 USD","x-skills-required":["observability","datadog","opentelemetry","distributed systems","instrumentation best practices","slo design","incident response workflows","cost transparency","accountability initiatives","cost attribution","budgeting","forecasting","alerting"],"x-skills-preferred":["designing or evolving company-wide observability standards","shared libraries","agent/operator-based integrations","cost optimization for infrastructure or observability tooling","vendor negotiations","usage modeling","applying ai or machine learning techniques to anomaly detection","root cause analysis","operational automation"],"datePosted":"2026-04-18T15:55:20.408Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA • New York, NY • United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"observability, datadog, opentelemetry, distributed systems, instrumentation best practices, slo design, incident response workflows, cost transparency, accountability initiatives, cost attribution, budgeting, forecasting, alerting, designing or evolving company-wide observability standards, shared libraries, agent/operator-based integrations, cost optimization for infrastructure or observability tooling, vendor negotiations, usage modeling, applying ai or machine learning techniques to anomaly detection, root cause analysis, 
operational automation","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":258000,"maxValue":376000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_491db8e9-776"},"title":"Staff Site Reliability Engineer- Splunk Expert","description":"<p>We are seeking a highly technical Staff Site Reliability Engineer with deep expertise in Splunk and Grafana to own and evolve our observability ecosystem.</p>\n<p>As a Staff Site Reliability Engineer, you will move beyond simple monitoring to architect a comprehensive, scalable telemetry platform. You will be our subject-matter expert in Splunk optimisation, ensuring our logging architecture is performant, cost-effective, and deeply integrated with our automated workflows.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Splunk Architecture &amp; Optimisation: Lead the design and tuning of Splunk environments. 
Optimise indexer performance, search efficiency, and data models to ensure rapid troubleshooting and cost-efficiency.</li>\n</ul>\n<ul>\n<li>Advanced Visualisation: Architect and maintain sophisticated Grafana dashboards that correlate disparate data sources into a single pane of glass for real-time system health.</li>\n</ul>\n<ul>\n<li>Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.</li>\n</ul>\n<ul>\n<li>Pipeline Engineering: Optimise the collection, processing, and storage of telemetry data (Metrics, Logs, Traces) to ensure high reliability and low latency.</li>\n</ul>\n<ul>\n<li>Workflow Automation: Develop custom Splunk workflows and integrations that trigger automated responses to system events, reducing Mean Time to Resolution (MTTR).</li>\n</ul>\n<ul>\n<li>Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements through &#39;observability-driven development.&#39;</li>\n</ul>\n<p>Required skills and experience include:</p>\n<ul>\n<li>Splunk Mastery: Deep, hands-on experience with Splunk administration, search optimisation (SPL), and architecting complex data pipelines.</li>\n</ul>\n<ul>\n<li>Grafana Expertise: Proven ability to build actionable, intuitive dashboards in Grafana that go beyond simple charts to provide deep operational insights.</li>\n</ul>\n<ul>\n<li>SRE Mindset: Minimum 8+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.</li>\n</ul>\n<ul>\n<li>Programming Proficiency: Strong coding skills in Go, Python, or Ruby for building internal tools and automating observability workflows.</li>\n</ul>\n<ul>\n<li>Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Prometheus, or similar frameworks for instrumenting applications.</li>\n</ul>\n<ul>\n<li>Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), 
and container orchestration (Kubernetes/EKS).</li>\n</ul>\n<p>Bonus skills include:</p>\n<ul>\n<li>Tracing: Implementation of distributed tracing (Jaeger, Tempo, or Honeycomb) to visualise request flow across microservices.</li>\n</ul>\n<ul>\n<li>Security Observability: Experience using Splunk for security orchestration (SOAR) or SIEM-related workflows.</li>\n</ul>\n<ul>\n<li>Cloud Platforms: Experience managing observability-native tools within AWS, Azure, or GCP.</li>\n</ul>","url":"https://yubhub.co/jobs/job_491db8e9-776","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Okta","sameAs":"https://www.okta.com/","logo":"https://logos.yubhub.co/okta.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/okta/jobs/6874616","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Splunk","Grafana","SRE","Go","Python","Ruby","OpenTelemetry","Prometheus","Linux","Networking","Container Orchestration"],"x-skills-preferred":["Tracing","Security Observability","Cloud Platforms"],"datePosted":"2026-04-18T15:54:34.221Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Bengaluru, India"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Splunk, Grafana, SRE, Go, Python, Ruby, OpenTelemetry, Prometheus, Linux, Networking, Container Orchestration, Tracing, Security Observability, Cloud Platforms"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_190bd9e9-0d1"},"title":"Staff+ Software Engineer, Observability","description":"<p><strong>About the Role</strong></p>\n<p>Anthropic is seeking talented and experienced Software Engineers to join our Observability team within the Infrastructure 
organization. The Observability team owns the monitoring and telemetry infrastructure that every engineer and researcher at Anthropic depends on - from metrics and logging pipelines to distributed tracing, error analytics, alerting, and the dashboards and query interfaces that make it all actionable.</p>\n<p>By joining this team, you’ll have a direct impact on the reliability and operational excellence of Anthropic’s research and product systems.</p>\n<p>As Anthropic scales its infrastructure across massive GPU, TPU, and Trainium clusters, the volume and complexity of operational data are growing by orders of magnitude. We’re building next-generation observability systems - high-throughput ingest pipelines, cost-efficient columnar storage, unified query layers across signals, and agentic diagnostic tools - to ensure that engineers can detect, diagnose, and resolve issues in minutes rather than hours, even as the systems they operate become exponentially more complex.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and build scalable telemetry ingest and storage pipelines for metrics, logs, traces, and error data across Anthropic’s multi-cluster infrastructure</li>\n</ul>\n<ul>\n<li>Own and evolve core observability platforms, driving migrations and architectural improvements that improve reliability, reduce cost, and scale with organisational growth</li>\n</ul>\n<ul>\n<li>Build instrumentation libraries, SDKs, and integrations that make it easy for engineering teams to emit high-quality telemetry from their services</li>\n</ul>\n<ul>\n<li>Drive alerting and SLO infrastructure that enables teams to define, monitor, and respond to reliability targets with minimal noise</li>\n</ul>\n<ul>\n<li>Reduce mean time to detection and resolution by building cross-signal correlation, unified query interfaces, and AI-assisted diagnostic tooling</li>\n</ul>\n<ul>\n<li>Partner with Research, Inference, Product, and Infrastructure teams to ensure observability solutions 
meet the unique needs of each organisation</li>\n</ul>\n<p><strong>You May Be a Good Fit If You</strong></p>\n<ul>\n<li>Have 10+ years of relevant industry experience building and operating large-scale observability or monitoring infrastructure</li>\n</ul>\n<ul>\n<li>Have deep experience with at least one observability signal area (metrics, logging, tracing, or error analytics) and familiarity with the others</li>\n</ul>\n<ul>\n<li>Understand high-throughput data pipelines, columnar storage engines, and the tradeoffs involved in ingesting and querying telemetry data at scale</li>\n</ul>\n<ul>\n<li>Have experience operating or building on top of observability platforms such as Prometheus, Grafana, ClickHouse, OpenTelemetry, or similar systems</li>\n</ul>\n<ul>\n<li>Have strong proficiency in at least one of Python, Rust, or Go</li>\n</ul>\n<ul>\n<li>Have excellent communication skills and enjoy partnering with internal teams to improve their operational visibility and incident response capabilities</li>\n</ul>\n<ul>\n<li>Are excited about building foundational infrastructure and are comfortable working independently on ambiguous, high-impact technical challenges</li>\n</ul>\n<p><strong>Strong Candidates May Also Have</strong></p>\n<ul>\n<li>Experience operating metrics systems at very high cardinality (hundreds of millions of active time series or more)</li>\n</ul>\n<ul>\n<li>Experience with log storage migrations or operating columnar databases (ClickHouse, BigQuery, or similar) for analytics workloads</li>\n</ul>\n<ul>\n<li>Experience with OpenTelemetry instrumentation, collector pipelines, and tail-based sampling strategies</li>\n</ul>\n<ul>\n<li>Experience building or operating alerting platforms, on-call tooling, or SLO frameworks at scale</li>\n</ul>\n<ul>\n<li>Experience with Kubernetes-native monitoring, eBPF-based observability, or continuous profiling</li>\n</ul>\n<ul>\n<li>Interest in applying AI/LLMs to operational workflows such as automated root cause 
analysis, anomaly detection, or intelligent alerting</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<ul>\n<li>Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience</li>\n</ul>\n<ul>\n<li>Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience</li>\n</ul>\n<ul>\n<li>Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position</li>\n</ul>\n<ul>\n<li>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>\n</ul>\n<ul>\n<li>Visa sponsorship: We do sponsor visas! However, we aren’t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p><strong>How we’re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact, advancing our long-term goals of steerable, trustworthy AI, rather than working on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We’re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. 
This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI &amp; Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.</p>\n<p><strong>Come work with us!</strong></p>\n<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_190bd9e9-0d1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5102440008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"£325,000-£390,000 GBP","x-skills-required":["Python","Rust","Go","Prometheus","Grafana","ClickHouse","OpenTelemetry"],"x-skills-preferred":["Kubernetes-native monitoring","eBPF-based observability","continuous profiling","AI/LLMs","automated root cause analysis","anomaly detection","intelligent alerting"],"datePosted":"2026-04-18T15:54:10.425Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Rust, Go, Prometheus, Grafana, ClickHouse, OpenTelemetry, Kubernetes-native monitoring, eBPF-based observability, continuous profiling, AI/LLMs, automated root cause analysis, anomaly detection, intelligent 
alerting","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":325000,"maxValue":390000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ebdf39f1-65a"},"title":"Senior Product Manager, APM - Observability","description":"<p>We&#39;re eager to welcome an outstanding Product Manager to join the Elastic Observability product team, with a focus on APM, Synthetics and RUM capabilities. With your knowledge and understanding of the Observability domain, you&#39;ll help ensure that we give our customers the best end to end full stack Observability experience for the applications we monitor.</p>\n<p>This role significantly impacts the success of our Observability product!</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Evolving our APM, Synthetics, and RUM experience, aligning at every step with organisational and company strategy.</li>\n</ul>\n<ul>\n<li>Working closely across teams, collaborating with Engineering, UX, Design and other teams to bring top-notch application monitoring experiences to our users.</li>\n</ul>\n<ul>\n<li>Working with our Sales &amp; Marketing teams on go-to-market, messaging, competitive analysis, enablement, and feature adoption.</li>\n</ul>\n<ul>\n<li>Tracking user engagement and feedback to inform product and feature evolution.</li>\n</ul>\n<p><strong>What You Bring</strong></p>\n<ul>\n<li>Ideally 3+ years in product management or closely related area.</li>\n</ul>\n<ul>\n<li>Experience in APM and distributed tracing.</li>\n</ul>\n<ul>\n<li>Synthetic transaction monitoring and RUM experience are optional but highly desired.</li>\n</ul>\n<ul>\n<li>Comfort working with and converting abstract problems into concrete, prioritised solutions, working from the customer backwards.</li>\n</ul>\n<ul>\n<li>Outstanding spoken and written communication skills.</li>\n</ul>\n<ul>\n<li>Passion for industry 
trends and the competitive landscape.</li>\n</ul>\n<ul>\n<li>Experience working in a remote, globally distributed work environment is optional but desired.</li>\n</ul>\n<ul>\n<li>Knowledge of OpenTelemetry and of AI is optional but desired.</li>\n</ul>\n<p><strong>Additional Information</strong></p>\n<p>As a distributed company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. Your age is only a number. It doesn’t matter if you’re just out of college or your children are; we need you for what you can do.</p>\n<p>We strive to have parity of benefits across regions and while regulations differ from place to place, we believe taking care of our people is the right thing to do.</p>\n<ul>\n<li>Competitive pay based on the work you do here and not your previous salary.</li>\n</ul>\n<ul>\n<li>Health coverage for you and your family in many locations.</li>\n</ul>\n<ul>\n<li>Ability to craft your calendar with flexible locations and schedules for many roles.</li>\n</ul>\n<ul>\n<li>Generous number of vacation days each year.</li>\n</ul>\n<ul>\n<li>Increase your impact - We match up to $2000 (or local currency equivalent) for financial donations and service.</li>\n</ul>\n<ul>\n<li>Up to 40 hours each year to use toward volunteer projects you love.</li>\n</ul>\n<ul>\n<li>Embracing parenthood with a minimum of 16 weeks of parental leave.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ebdf39f1-65a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Elastic, the Search AI 
Company","sameAs":"https://www.elastic.co/","logo":"https://logos.yubhub.co/elastic.co.png"},"x-apply-url":"https://job-boards.greenhouse.io/elastic/jobs/7677216","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["APM","Distributed tracing","Synthetic transaction monitoring","RUM","OpenTelemetry","AI"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:52:11.433Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Poland"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"APM, Distributed tracing, Synthetic transaction monitoring, RUM, OpenTelemetry, AI"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_fa9a54d7-549"},"title":"Senior Site Reliability Engineer, Data Infrastructure","description":"<p>As a Senior Site Reliability Engineer, you will own the reliability and performance of our Kubernetes-based data platform. You will design and operate highly available, multi-region systems, ensuring our services meet strict uptime and latency targets.</p>\n<p>Day-to-day, you’ll work on scaling infrastructure, improving deployment pipelines, and hardening our security posture. You’ll play a key role in evolving our DevSecOps practices while partnering closely with engineering teams to ensure services are built for reliability from day one.</p>\n<p>We operate with production-grade discipline, supporting mission-critical services with stringent uptime requirements and a focus on automation, observability, and resilience.</p>\n<p>The Platform &amp; Infrastructure Engineering team in the Data Infrastructure organization is responsible for the reliability, scalability, and security of the company’s data platform. 
The team builds and operates the foundational systems that power data ingestion, transformation, analytics, and internal AI workloads at scale.</p>\n<p>About the role:</p>\n<ul>\n<li>5+ years of experience in Site Reliability Engineering, Platform Engineering, or Infrastructure Engineering roles</li>\n<li>Deep expertise in Kubernetes and containerized software services, including cluster design, operations, and troubleshooting in production environments</li>\n<li>Strong experience building and operating CI/CD systems, including tools such as Argo CD and GitHub Actions</li>\n<li>Proven experience owning production systems with high availability requirements (≥99.99% uptime), including incident response, SLI/SLO/SLA definition, error budgets, and postmortems</li>\n<li>Hands-on experience designing and operating geo-replicated, multi-region, active-active systems, including traffic routing, failover strategies, and data consistency tradeoffs</li>\n<li>Strong experience building and owning observability components, including metrics, logging, and tracing (e.g., Prometheus, Grafana, OpenTelemetry).</li>\n<li>Experience with infrastructure as code (e.g., Helm, Terraform, Pulumi) and automated environment provisioning</li>\n<li>Strong understanding of system performance tuning, capacity planning, and resource optimization in distributed systems</li>\n<li>Experience implementing and operating security best practices in cloud-native environments (e.g., secrets management, network policies, vulnerability scanning)</li>\n</ul>\n<p>Preferred:</p>\n<ul>\n<li>Experience operating data platforms or data-intensive workloads (e.g., Spark, Airflow, Kafka, Flink)</li>\n<li>Familiarity with service mesh technologies (e.g., Istio, Linkerd)</li>\n<li>Experience working in regulated environments with compliance frameworks such as GDPR, SOC 2, HIPAA, or SOX</li>\n<li>Background in building internal developer platforms or self-service infrastructure</li>\n</ul>\n<p>Wondering if you’re a 
good fit?</p>\n<p>We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren’t a 100% skill or experience match.</p>\n<p>Here are a few qualities we’ve found compatible with our team. If some of this describes you, we’d love to talk.</p>\n<ul>\n<li>You love building highly reliable systems that operate at scale</li>\n<li>You’re curious about how to continuously improve system resilience, security, and operations</li>\n<li>You’re an expert in diagnosing and solving complex distributed systems problems</li>\n</ul>\n<p>Why CoreWeave?</p>\n<p>At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning.</p>\n<p>Our team cares deeply about how we build our product and how we work together, which is represented through our core values:</p>\n<ul>\n<li>Be Curious at Your Core</li>\n<li>Act Like an Owner</li>\n<li>Empower Employees</li>\n<li>Deliver Best-in-Class Client Experiences</li>\n<li>Achieve More Together</li>\n</ul>\n<p>We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and provides the opportunity to develop innovative solutions to complex problems.</p>\n<p>As we get set for takeoff, the growth opportunities within the organization are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too.</p>\n<p>Come join us!</p>\n<p>The base salary range for this role is $165,000 to $242,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. 
We strive for both market alignment and internal equity when determining compensation.</p>\n<p>In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>\n<p>What We Offer</p>\n<p>The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location.</p>\n<p>In addition to a competitive salary, we offer a variety of benefits to support your needs, including:</p>\n<ul>\n<li>Medical, dental, and vision insurance</li>\n<li>100% paid for by CoreWeave</li>\n<li>Company-paid Life Insurance</li>\n<li>Voluntary supplemental life insurance</li>\n<li>Short and long-term disability insurance</li>\n<li>Flexible Spending Account</li>\n<li>Health Savings Account</li>\n<li>Tuition Reimbursement</li>\n<li>Ability to Participate in Employee Stock Purchase Program (ESPP)</li>\n<li>Mental Wellness Benefits through Spring Health</li>\n<li>Family-Forming support provided by Carrot</li>\n<li>Paid Parental Leave</li>\n<li>Flexible, full-service childcare support with Kinside</li>\n<li>401(k) with a generous employer match</li>\n<li>Flexible PTO</li>\n<li>Catered lunch each day in our office and data center locations</li>\n<li>A casual work environment</li>\n<li>A work culture focused on innovative disruption</li>\n</ul>\n<p>Our Workplace</p>\n<p>While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets.</p>\n<p>New hires will be invited to attend onboarding at one of our hubs within their first month.</p>\n<p>Teams also gather quarterly to support collaboration.</p>\n<p>California Consumer Privacy Act - California applicants only</p>\n<p>CoreWeave 
is an equal opportunity employer, committed to fostering an inclusive and supportive workplace.</p>\n<p>All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information.</p>\n<p>As part of this commitment and consistent with the Americans with Disabilities Act (ADA), CoreWeave will ensure that qualified applicants and candidates with disabilities are provided reasonable accommodations for the hiring process, unless such accommodation would cause an undue hardship.</p>\n<p>If reasonable accommodation is needed, please contact: careers@coreweave.com.</p>\n<p>Export Control Compliance</p>\n<p>This position requires access to export controlled information.</p>\n<p>To conform to U.S. Government export regulations applicable to that information, applicant must either be (A) a U.S. person, defined as a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, (B) eligible to access the export controlled information without restrictions, or (C) otherwise exempt from the export regulations.</p>\n<p>If you are not a U.S. person, you will be required to provide documentation of your eligibility to access the export controlled information before being considered for this position.</p>\n<p>Please note that CoreWeave is subject to the requirements of the U.S. Department of Commerce&#39;s Export Administration Regulations (EAR) and the U.S. 
Department of State&#39;s International Traffic in Arms Regulations (ITAR).</p>\n<p>By applying for this position, you acknowledge that you have read and understood the export control requirements and that you will comply with them.</p>\n<p>If you have any questions or concerns regarding the export control requirements, please contact: careers@coreweave.com.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_fa9a54d7-549","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4671535006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $242,000","x-skills-required":["Kubernetes","containerized software services","cluster design","operations","troubleshooting","CI/CD systems","Argo CD","GitHub Actions","production systems","high availability","incident response","SLI/SLO/SLA definition","error budgets","postmortems","geo-replicated","multi-region","active-active systems","traffic routing","failover strategies","data consistency tradeoffs","observability components","metrics","logging","tracing","Prometheus","Grafana","OpenTelemetry","infrastructure as code","Helm","Terraform","Pulumi","automated environment provisioning","system performance tuning","capacity planning","resource optimization","distributed systems","security best practices","cloud-native environments","secrets management","network policies","vulnerability scanning"],"x-skills-preferred":["Spark","Airflow","Kafka","Flink","service mesh technologies","Istio","Linkerd","regulated environments","compliance frameworks","GDPR","SOC 2","HIPAA","SOX","internal developer platforms","self-service 
infrastructure"],"datePosted":"2026-04-18T15:51:59.035Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York, NY / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, containerized software services, cluster design, operations, troubleshooting, CI/CD systems, Argo CD, GitHub Actions, production systems, high availability, incident response, SLI/SLO/SLA definition, error budgets, postmortems, geo-replicated, multi-region, active-active systems, traffic routing, failover strategies, data consistency tradeoffs, observability components, metrics, logging, tracing, Prometheus, Grafana, OpenTelemetry, infrastructure as code, Helm, Terraform, Pulumi, automated environment provisioning, system performance tuning, capacity planning, resource optimization, distributed systems, security best practices, cloud-native environments, secrets management, network policies, vulnerability scanning, Spark, Airflow, Kafka, Flink, service mesh technologies, Istio, Linkerd, regulated environments, compliance frameworks, GDPR, SOC 2, HIPAA, SOX, internal developer platforms, self-service infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":242000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a1ba5c28-9ce"},"title":"Senior Software Engineer, Observability","description":"<p>Join CoreWeave&#39;s Observability team, responsible for building the systems that give our customers and internal teams unparalleled visibility into complex AI workloads.</p>\n<p>Our team empowers engineers to understand, troubleshoot, and optimize high-performance infrastructure at massive scale.</p>\n<p>As a Senior Software Engineer on the Observability team, you will design, build, and maintain core observability 
infrastructure spanning metrics, logging, tracing, and telemetry pipelines.</p>\n<p>Your day-to-day will involve developing highly reliable and scalable systems, collaborating with internal engineering teams to embed observability best practices, and tackling performance and reliability challenges across clusters of thousands of GPUs.</p>\n<p>You&#39;ll also contribute to platform strategy and participate in on-call rotations to ensure critical production systems remain robust and operational.</p>\n<p>The base salary range for this role is $139,000 to $220,000.</p>\n<p>In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>\n<p>We offer a variety of benefits to support your needs, including medical, dental, and vision insurance, 100% paid for by CoreWeave, company-paid Life Insurance, voluntary supplemental life insurance, short and long-term disability insurance, flexible Spending Account, Health Savings Account, tuition reimbursement, ability to participate in Employee Stock Purchase Program (ESPP), mental wellness benefits through Spring Health, family-forming support provided by Carrot, paid parental leave, flexible, full-service childcare support with Kinside, 401(k) with a generous employer match, flexible PTO, catered lunch each day in our office and data center locations, a casual work environment, and a work culture focused on innovative disruption.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a1ba5c28-9ce","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4554201006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$139,000 to $220,000","x-skills-required":["Go","Python","Kubernetes","containerization","microservices architectures","Helm","YAML-based configurations","automated testing","progressive release strategies","on-call rotations"],"x-skills-preferred":["designing, operating, or scaling logging, metrics, or tracing platforms","data streaming systems for observability pipelines","automating infrastructure provisioning","OpenTelemetry for unified telemetry collection and instrumentation","exposure to modern AI workloads and GPU-based infrastructure"],"datePosted":"2026-04-18T15:51:55.238Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York, NY / Sunnyvale, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Go, Python, Kubernetes, containerization, microservices architectures, Helm, YAML-based configurations, automated testing, progressive release strategies, on-call rotations, designing, operating, or scaling logging, metrics, or tracing platforms, data streaming systems for observability pipelines, automating infrastructure provisioning, OpenTelemetry for unified telemetry collection and instrumentation, exposure to modern AI workloads and GPU-based 
infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139000,"maxValue":220000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_72ebb09d-b37"},"title":"Staff+ Software Engineer, Observability","description":"<p>We&#39;re seeking talented and experienced Software Engineers to join our Observability team within the Infrastructure organization. The Observability team owns the monitoring and telemetry infrastructure that every engineer and researcher at Anthropic depends on, from metrics and logging pipelines to distributed tracing, error analytics, alerting, and the dashboards and query interfaces that make it all actionable.</p>\n<p>As Anthropic scales its infrastructure across massive GPU, TPU, and Trainium clusters, the volume and complexity of operational data are growing by orders of magnitude. We&#39;re building next-generation observability systems (high-throughput ingest pipelines, cost-efficient columnar storage, unified query layers across signals, and agentic diagnostic tools) to ensure that engineers can detect, diagnose, and resolve issues in minutes rather than hours, even as the systems they operate become exponentially more complex.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and build scalable telemetry ingest and storage pipelines for metrics, logs, traces, and error data across Anthropic&#39;s multi-cluster infrastructure</li>\n<li>Own and evolve core observability platforms, driving migrations and architectural improvements that improve reliability, reduce cost, and scale with organisational growth</li>\n<li>Build instrumentation libraries, SDKs, and integrations that make it easy for engineering teams to emit high-quality telemetry from their services</li>\n<li>Drive alerting and SLO infrastructure that enables teams to define, monitor, and respond to reliability targets with minimal 
noise</li>\n<li>Reduce mean time to detection and resolution by building cross-signal correlation, unified query interfaces, and AI-assisted diagnostic tooling</li>\n<li>Partner with Research, Inference, Product, and Infrastructure teams to ensure observability solutions meet the unique needs of each organisation</li>\n</ul>\n<p>You May Be a Good Fit If You:</p>\n<ul>\n<li>Have 10+ years of relevant industry experience building and operating large-scale observability or monitoring infrastructure</li>\n<li>Have deep experience with at least one observability signal area (metrics, logging, tracing, or error analytics) and familiarity with the others</li>\n<li>Understand high-throughput data pipelines, columnar storage engines, and the tradeoffs involved in ingesting and querying telemetry data at scale</li>\n<li>Have experience operating or building on top of observability platforms such as Prometheus, Grafana, ClickHouse, OpenTelemetry, or similar systems</li>\n<li>Have strong proficiency in at least one of Python, Rust, or Go</li>\n<li>Have excellent communication skills and enjoy partnering with internal teams to improve their operational visibility and incident response capabilities</li>\n<li>Are excited about building foundational infrastructure and are comfortable working independently on ambiguous, high-impact technical challenges</li>\n</ul>\n<p>Strong Candidates May Also Have:</p>\n<ul>\n<li>Experience operating metrics systems at very high cardinality (hundreds of millions of active time series or more)</li>\n<li>Experience with log storage migrations or operating columnar databases (ClickHouse, BigQuery, or similar) for analytics workloads</li>\n<li>Experience with OpenTelemetry instrumentation, collector pipelines, and tail-based sampling strategies</li>\n<li>Experience building or operating alerting platforms, on-call tooling, or SLO frameworks at scale</li>\n<li>Experience with Kubernetes-native monitoring, eBPF-based observability, or continuous 
profiling</li>\n<li>Interest in applying AI/LLMs to operational workflows such as automated root cause analysis, anomaly detection, or intelligent alerting</li>\n</ul>\n<p>The annual compensation range for this role is $405,000-$485,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_72ebb09d-b37","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5139910008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$405,000-$485,000 USD","x-skills-required":["observability","monitoring","telemetry","metrics","logging","tracing","error analytics","alerting","SLO infrastructure","cross-signal correlation","unified query interfaces","AI-assisted diagnostic tooling","Python","Rust","Go","Prometheus","Grafana","ClickHouse","OpenTelemetry"],"x-skills-preferred":["high-throughput data pipelines","columnar storage engines","operating system administration","cloud computing","containerization","DevOps"],"datePosted":"2026-04-18T15:51:29.494Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"observability, monitoring, telemetry, metrics, logging, tracing, error analytics, alerting, SLO infrastructure, cross-signal correlation, unified query interfaces, AI-assisted diagnostic tooling, Python, Rust, Go, Prometheus, Grafana, ClickHouse, OpenTelemetry, high-throughput data pipelines, columnar storage engines, operating system administration, cloud computing, containerization, 
DevOps","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_60aae9e8-e8b"},"title":"Software Engineer, Observability","description":"<p>We&#39;re looking for a skilled Software Engineer to join our Observability team. As a member of this team, you will be responsible for designing and evolving logging, metrics, and tracing pipelines to handle massive data volumes. You will also evaluate and integrate new technologies to enhance Airtable&#39;s observability posture.</p>\n<p>Your responsibilities will include guiding and mentoring a growing team of infrastructure engineers, defining and upholding coding standards, partnering with other teams to embed observability throughout the development lifecycle, and owning end-to-end reliability for observability tools.</p>\n<p>You will also extend observability to LLM and AI features by instrumenting prompts, model calls, and RAG pipelines to capture latency, reliability, cost, and safety signals. You will design online and offline evaluation loops for LLM quality, build dashboards and alerts for token usage, error rates, and model performance, and connect these signals to tracing for prompt lineage.</p>\n<p>To succeed in this role, you will need 6+ years of software engineering experience, with 3+ years focused on observability or infrastructure at scale. 
You will also need demonstrated success implementing and running production-grade logging, metrics, or tracing systems, proficiency in distributed systems concepts, data streaming pipelines, and container orchestration, and deep hands-on knowledge of tools such as Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack, Loki, or ClickHouse.</p>\n<p>This is a high-impact role that will allow you to lead the modernization of Airtable&#39;s observability stack, influence how every engineer monitors and debugs mission-critical systems, and drive major projects across the engineering organization to build platforms and services that solve observability problems.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_60aae9e8-e8b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Airtable","sameAs":"https://airtable.com/","logo":"https://logos.yubhub.co/airtable.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/airtable/jobs/8400374002","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Distributed systems concepts","Data streaming pipelines","Container orchestration","Prometheus","Grafana","Datadog","OpenTelemetry","ELK Stack","Loki","ClickHouse"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:47:22.779Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA; New York, NY; Remote (Seattle, WA only)"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Distributed systems concepts, Data streaming pipelines, Container orchestration, Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack, Loki, 
ClickHouse"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_cbeabfab-916"},"title":"Software Engineer, Observability","description":"<p>As a Software Engineer on the Observability team, you will design, build, and maintain scalable systems that process and surface telemetry data across distributed environments.</p>\n<p>You&#39;ll contribute production-quality code in languages like Go and Python, while improving system reliability through enhanced monitoring, alerting, and incident response practices.</p>\n<p>Day to day, you&#39;ll collaborate with cross-functional engineering teams to implement observability best practices, support production systems, and help optimize performance across large-scale infrastructure.</p>\n<p>You will also participate in on-call rotations and contribute to continuous improvements based on real-world system behavior.</p>\n<p>CoreWeave is looking for a talented software engineer to join our Observability team. 
</p>\n<p>The ideal candidate will have experience with Go and Python, as well as a strong understanding of system reliability and observability best practices.</p>\n<p>In addition to your technical skills, you should be able to collaborate effectively with cross-functional teams and communicate complex technical concepts to non-technical stakeholders.</p>\n<p>If you&#39;re passionate about building scalable systems and improving system reliability, we&#39;d love to hear from you!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_cbeabfab-916","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4587675006","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$109,000 to $145,000","x-skills-required":["Go","Python","Kubernetes","containerization","microservices architectures","observability systems","metrics","logging","tracing"],"x-skills-preferred":["ClickHouse","Elastic","Loki","VictoriaMetrics","Prometheus","Thanos","OpenTelemetry","Grafana","Terraform","modern testing frameworks","deployment strategies","data streaming technologies","AI/ML infrastructure"],"datePosted":"2026-04-18T15:46:41.788Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York, NY / Sunnyvale, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Go, Python, Kubernetes, containerization, microservices architectures, observability systems, metrics, logging, tracing, ClickHouse, Elastic, Loki, 
VictoriaMetrics, Prometheus, Thanos, OpenTelemetry, Grafana, Terraform, modern testing frameworks, deployment strategies, data streaming technologies, AI/ML infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":109000,"maxValue":145000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0ef1d7d5-e0a"},"title":"Member of Technical Staff - Observability","description":"<p>We&#39;re looking for a skilled engineer to join our small, high-impact Observability team. As a Member of Technical Staff, you&#39;ll design and implement scalable observability infrastructure for metrics, logging, and tracing. You&#39;ll build high-performance telemetry pipelines, develop APIs and query engines, and define best practices for instrumentation and alerting. Your work will enable engineering teams to operate services at scale, identify issues before they impact users, and drive systemic reliability improvements.</p>\n<p>Our team operates with a flat organisational structure, and leadership is given to those who show initiative and consistently deliver excellence. We value strong communication skills, and all employees are expected to contribute directly to the company&#39;s mission.</p>\n<p>You&#39;ll be working with a range of technologies, including Go, Rust, Scala, Prometheus, Grafana, OpenTelemetry, VictoriaMetrics, and ClickHouse. Experience with Kafka, Redis, and large-scale time series databases is also essential.</p>\n<p>In this role, you&#39;ll own the reliability, scalability, and performance of the observability stack end-to-end. 
You&#39;ll partner with infrastructure and product teams to deeply integrate observability into our internal platforms.</p>\n<p>We offer a competitive salary of $180,000 - $440,000 USD, plus equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short &amp; long-term disability insurance, life insurance, and various other discounts and perks.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0ef1d7d5-e0a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/4803905007","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000 - $440,000 USD","x-skills-required":["Go","Rust","Scala","Prometheus","Grafana","OpenTelemetry","VictoriaMetrics","ClickHouse","Kafka","Redis","large-scale time series databases"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:43:49.694Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo Alto, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Go, Rust, Scala, Prometheus, Grafana, OpenTelemetry, VictoriaMetrics, ClickHouse, Kafka, Redis, large-scale time series databases","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":440000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_bd4ea9f9-369"},"title":"Staff Software Engineer","description":"<p>Omada Health is on a mission to inspire and engage people in lifelong health, one step at a time.</p>\n<p>We&#39;re seeking a Staff Software Engineer to lead the modernization, 
optimization, and scalability of Omada&#39;s B2B platform. This role is ideal for someone who combines deep technical expertise with strong leadership: someone eager to design for scale, mentor others, and influence technical direction across teams.</p>\n<p>You&#39;ll play a central role in re-architecting complex legacy systems, designing high-performance data pipelines (batch and real-time), and ensuring our core B2B capabilities (file ingestion, marketing outreach, eligibility, and billing) are robust, performant, and ready for the next wave of growth.</p>\n<p><strong>About You:</strong></p>\n<p>You&#39;re a systems thinker who thrives on solving hard technical challenges at scale. You have a strong foundation in distributed systems, database performance, and architectural design patterns, and you naturally guide teams toward simpler, more scalable solutions.</p>\n<p>You&#39;re both a technical expert and a connector, equally comfortable deep in the code or collaborating across disciplines. You&#39;re passionate about leading by example, mentoring others, and helping engineers across Omada level up their craft. 
You&#39;re also motivated by impact: building systems that help improve health outcomes for millions.</p>\n<p><strong>What You&#39;ll Be Doing:</strong></p>\n<ul>\n<li>Lead architecture, system design, and engineering efforts for high-scale, data-intensive B2B systems supporting eligibility, billing, marketing, and file ingestion.</li>\n<li>Design and implement batch and real-time processing architectures that are reliable, observable, and performant.</li>\n<li>Drive efforts in database performance optimization, schema design, and long-term scalability planning across multi-terabyte PostgreSQL and other persistent stores.</li>\n<li>Partner closely with product, infrastructure, and operations teams to deliver resilient, maintainable systems that balance business needs with technical excellence.</li>\n<li>Identify and lead engineering-wide initiatives that improve scalability, developer efficiency, or data quality.</li>\n<li>Mentor and coach engineers at all levels, and actively contribute to Omada’s engineering community through design reviews, technical talks, and shared best practices.</li>\n<li>Contribute to modern, cloud-forward architecture across multiple product domains, ensuring our systems are designed to evolve gracefully and scale efficiently.</li>\n<li>Use and advocate for AI-assisted development tools (e.g., Cursor, Claude) to enhance individual and team productivity.</li>\n<li>Champion a culture of quality, observability, and reliability through strong DevOps principles and continuous improvement.</li>\n</ul>\n<p><strong>What You Need for This Role:</strong></p>\n<ul>\n<li>10+ years of software engineering experience, with a significant portion spent on scalable systems architecture and performance optimization.</li>\n<li>Proven success in re-architecting complex legacy platforms and implementing modern, maintainable solutions.</li>\n<li>Strong programming experience with Ruby and Python, and comfort working across a modern stack 
(Rails, GraphQL, Django, Sidekiq).</li>\n<li>Deep understanding of relational databases (PostgreSQL, MySQL), performance tuning, and data modeling.</li>\n<li>Hands-on experience with both batch and streaming data pipelines (e.g., SQS, Kafka, Kinesis, Airflow).</li>\n<li>Demonstrable mastery of API design, distributed systems, and cloud-native architecture (preferably AWS).</li>\n<li>Fluency in CI/CD, containerization, and infrastructure-as-code (Docker, Kubernetes, Terraform).</li>\n<li>Familiarity with monitoring and observability frameworks (Datadog, OpenTelemetry).</li>\n<li>Excellent communication and collaboration skills, with a proven ability to influence and deliver through others.</li>\n<li>Growth mindset and genuine curiosity about new technologies, tools, and team approaches.</li>\n</ul>\n<p><strong>Technologies We Use:</strong></p>\n<ul>\n<li>Ruby on Rails</li>\n<li>Sidekiq</li>\n<li>AWS Managed Datastores (RDS with PostgreSQL, Elasticache, ElasticSearch, SNS/SQS)</li>\n<li>GraphQL</li>\n<li>Docker</li>\n<li>Kubernetes</li>\n</ul>\n<p><strong>Benefits:</strong></p>\n<ul>\n<li>Competitive salary with generous annual cash bonus</li>\n<li>Equity grants</li>\n<li>Remote-first, work-from-home culture</li>\n<li>Flexible Time Off to help you rest, recharge, and connect with loved ones</li>\n<li>Generous parental leave</li>\n<li>Health, dental, and vision insurance (and above-market employer contributions)</li>\n<li>401k retirement savings plan</li>\n<li>Lifestyle Spending Account (LSA)</li>\n<li>Mental Health Support Solutions</li>\n<li>...and more!</li>\n</ul>\n<p><strong>It Takes a Village to Change Healthcare:</strong></p>\n<p>At Omada, we strive to embody the following values in our day-to-day work. We hope these hold meaning for you as well as you consider Omada!</p>\n<ul>\n<li>Cultivate Trust. We listen closely and we operate with kindness. 
We provide respectful and candid feedback to each other.</li>\n<li>Seek Context. We ask to understand and we build connections. We do our research up front to move faster down the road.</li>\n<li>Act Boldly. We innovate daily to solve problems, improve processes, and find new opportunities for our members and customers.</li>\n<li>Deliver Results. We reward impact above output. We set a high bar, we’re not afraid to fail, and we take pride in our work.</li>\n<li>Succeed Together. We prioritize Omada’s progress above team or individual. We have fun as we get stuff done, and we celebrate together.</li>\n<li>Remember Why We’re Here. We push through the challenges of changing healthcare because we know the destination is worth it.</li>\n</ul>\n<p><strong>About Omada Health:</strong></p>\n<p>Omada Health is a between-visit healthcare provider that addresses lifestyle and behavior change elements for individuals managing chronic conditions. Omada’s multi-condition platform treats diabetes, hypertension, prediabetes, musculoskeletal conditions, and GLP-1 management. With insights from connected devices and AI-supported tools, Omada care teams deliver care that is rooted in evidence and unique to every member, unlocking results at scale. With more than a decade of experience and data, and 29 peer-reviewed publications showcasing clinical and economic proof points, Omada’s approach is designed to improve health outcomes and contain costs. Our customers include health plans, pharmacy benefit managers, health systems, and employers ranging from small businesses to Fortune 500s. At Omada, we aim to inspire and empower people to make lasting health changes on their own terms. 
For more information, visit: https://www.omadahealth.com/</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_bd4ea9f9-369","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Omada Health","sameAs":"https://www.omadahealth.com/","logo":"https://logos.yubhub.co/omadahealth.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/omadahealth/jobs/7611424","x-work-arrangement":"remote","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Ruby","Python","Ruby on Rails","GraphQL","Django","Sidekiq","PostgreSQL","MySQL","API design","distributed systems","cloud-native architecture","AWS","CI/CD","containerization","infrastructure-as-code","Docker","Kubernetes","monitoring and observability frameworks","Datadog","OpenTelemetry"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:50:06.108Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote, USA"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Healthcare","skills":"Ruby, Python, Ruby on Rails, GraphQL, Django, Sidekiq, PostgreSQL, MySQL, API design, distributed systems, cloud-native architecture, AWS, CI/CD, containerization, infrastructure-as-code, Docker, Kubernetes, monitoring and observability frameworks, Datadog, OpenTelemetry"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_245477ba-29a"},"title":"Senior Software Engineer - Stability","description":"<p>The Stability team at Mercury champions and improves observability. We&#39;ve helped define incident response. We have introduced and support robust background work processing. 
We monitor and build tooling around platform and database health.</p>\n<p>As a Senior Software Engineer - Stability, you will lead projects end-to-end, driving them from concept to production. You will define solutions, analyze tradeoffs, make critical decisions, and deliver software that works today and is sustainable for tomorrow.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Championing reliability by making technical choices that improve the reliability of Mercury&#39;s systems and make reliability the default.</li>\n<li>Measuring outcomes by defining and collecting metrics that show how your work creates value for the business.</li>\n<li>Approaching code with craft by writing clear, testable, and maintainable code.</li>\n<li>Building for quality and sustainability by designing extensible systems, making balanced decisions on tech debt, planning careful rollouts, and owning the quality of your work through post-launch monitoring.</li>\n<li>Improving the developer experience by approaching problems with a product mindset and staying close to internal customers, supporting them and gathering their feedback.</li>\n</ul>\n<p>The ideal candidate for this role has expertise in PostgreSQL with query optimization, tuning, replication, pooling/proxying, or client-side libraries. They have worked with other data systems supporting a relational database: event streaming, OLAP, caches, etc. 
They have authored and operated Temporal workflows, are familiar with tracing and OpenTelemetry, and have learned by leading moderate-to-large technical projects, including planning, execution, and stakeholder management.</p>\n<p>The salary range for this role is $166,600 - 250,900 for US employees and CAD $157,400 - 237,100 for Canadian employees.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_245477ba-29a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mercury","sameAs":"https://www.mercury.com/","logo":"https://logos.yubhub.co/mercury.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/mercury/jobs/5969193004","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$166,600 - 250,900 (US) | CAD $157,400 - 237,100 (Canada)","x-skills-required":["PostgreSQL","query optimization","tuning","replication","pooling/proxying","client-side libraries","Temporal workflows","tracing","OpenTelemetry"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:46:59.417Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA, New York, NY, Portland, OR, or Remote within Canada or United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Finance","skills":"PostgreSQL, query optimization, tuning, replication, pooling/proxying, client-side libraries, Temporal workflows, tracing, OpenTelemetry","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":157400,"maxValue":250900,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_cc2c1709-591"},"title":"Senior Infrastructure Engineer","description":"<p>Imagine being a pioneer, venturing 
through the uncharted territories of the cloud. You&#39;re not just navigating; you&#39;re shaping the landscape, constructing robust architectures that withstand the tests of time and scale.</p>\n<p>At Mercury, your mission, should you choose to accept it, is to help steer our cloud infrastructure into the future. With projects as dynamic as migrating our entire fleet to ECS and building out our golden paths for service deployment, your role is pivotal. This isn&#39;t just a job; it&#39;s an epic tale of transformation and triumph.</p>\n<p>As a senior member of our infrastructure team, you will be equipped with essential tools and technologies designed for scaling and enhancing Mercury&#39;s infrastructure:</p>\n<ul>\n<li>AWS Services: Proficiently utilize EC2, RDS, IAM, Networking, Opensearch, and ECS to build and manage robust cloud environments.</li>\n<li>Terraform: Leverage Terraform for infrastructure as code to efficiently manage and provision our cloud resources.</li>\n<li>Agentic Infrastructure: Build the frameworks around using AI safely in our infrastructure, both for the agents and the users that kick off those agents.</li>\n<li>Monitoring and Observability Tools: Employ Prometheus, Grafana, Opensearch, and OpenTelemetry to maintain high availability and monitor system health.</li>\n<li>Version Control and CI/CD: Manage code and automate deployments using GitHub &amp; GitHub Actions.</li>\n</ul>\n<p>As we gear up for the next stages of Mercury&#39;s growth, you will:</p>\n<ul>\n<li>Build our “Infrastructure Platform” to support the growing needs of the Engineering Organization.</li>\n<li>Focus on building a platform that is AI friendly while still usable for engineers. 
We want our users to be humans and Agents.</li>\n<li>Lead key infrastructure projects, break down complex initiatives, and define our infrastructure strategy through detailed RFCs and technical specifications.</li>\n</ul>\n<p>Must haves:</p>\n<ul>\n<li>You have 5+ years of experience with AWS.</li>\n<li>You have extensive experience, ideally 3 years or more, with observability and monitoring tools like Prometheus, Grafana, and OpenTelemetry, optimizing system performance and reliability.</li>\n<li>You have demonstrated ability in technical writing, with at least 3 years of experience creating detailed technical documentation, RFCs, and tech specs that clearly communicate complex ideas.</li>\n</ul>\n<p>Nice to haves:</p>\n<ul>\n<li>You bring at least 2 years of experience leading infrastructure projects in regulated environments such as HITRUST or SOC2, ensuring compliance and security.</li>\n<li>You have 3+ years of experience managing large-scale Terraform implementations, including the setup and maintenance of Terraform CI/CD pipelines.</li>\n<li>You have 2+ years of experience writing code. We are building an Infrastructure Platform from scratch and there is plenty of code to write to support that.</li>\n<li>You have experience mentoring and elevating those around you; we are force multipliers for the engineering org.</li>\n</ul>\n<p>If this role interests you, we invite you to explore our public demo at demo.mercury.com.</p>\n<p>The total rewards package at Mercury includes base salary, equity, and benefits. Our salary and equity ranges are highly competitive within the SaaS and fintech industry and are updated regularly using the most reliable compensation survey data for our industry. 
New hire offers are made based on a candidate’s experience, expertise, geographic location, and internal pay equity relative to peers.</p>\n<p>Our target new hire base salary ranges for this role are the following:</p>\n<ul>\n<li>US employees: $200,700 - $250,900</li>\n<li>Canadian employees: CAD $189,700 - $237,100</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_cc2c1709-591","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mercury","sameAs":"https://demo.mercury.com","logo":"https://logos.yubhub.co/demo.mercury.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/mercury/jobs/5832466004","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$200,700 - $250,900 (US employees), CAD $189,700 - $237,100 (Canadian employees)","x-skills-required":["AWS","EC2","RDS","IAM","Networking","Opensearch","ECS","Terraform","Prometheus","Grafana","OpenTelemetry","GitHub","GitHub Actions"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:44:40.102Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA, New York, NY, Portland, OR, or Remote within Canada or United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"AWS, EC2, RDS, IAM, Networking, Opensearch, ECS, Terraform, Prometheus, Grafana, OpenTelemetry, GitHub, GitHub Actions","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":189700,"maxValue":250900,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_eebf21c4-d1f"},"title":"Staff Site Reliability Engineer","description":"<p>Join our Site Reliability Engineering (SRE) team and help ensure 
the reliability, scalability, and performance of Replit&#39;s infrastructure that serves millions of developers worldwide.</p>\n<p>As a Staff Site Reliability Engineer, you will bridge the gap between development and operations, implementing automation and establishing best practices that enable our platform to scale efficiently while maintaining high availability.</p>\n<p>We are seeking Staff SREs who are passionate about building and maintaining resilient systems at scale. Your mission will be to proactively find and analyze reliability problems across our stack, then design and implement software and systems to create step-function improvements.</p>\n<p>You will design robust observability solutions, lead incident response, automate operational tasks, and continuously improve our infrastructure&#39;s reliability, all while mentoring and educating the broader engineering team to make reliability a core value at Replit.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Architect and Implement Observability: Design, build, and lead the implementation of comprehensive monitoring, logging, and tracing solutions. Create dashboards and metrics that provide real-time visibility into system health and performance, enabling proactive issue detection.</li>\n</ul>\n<ul>\n<li>Define and Drive Reliability Standards: Work with product and engineering teams to define, implement, and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Build systems to monitor and report on these metrics, holding teams accountable and ensuring we maintain high reliability standards while balancing innovation speed.</li>\n</ul>\n<ul>\n<li>Lead Incident Management and Response: Act as a senior leader during high-impact incidents, guiding the team to rapid resolution. Conduct thorough, blameless post-mortems and drive the implementation of preventative measures. 
Develop and refine runbooks and build automation to reduce Mean Time To Recovery (MTTR).</li>\n</ul>\n<ul>\n<li>Drive Automation and Infrastructure as Code: Architect, build, and improve automation to eliminate toil and operational work. Design and maintain CI/CD pipelines and infrastructure automation using tools like Terraform or Pulumi. Create self-healing systems that can automatically respond to common failure scenarios.</li>\n</ul>\n<ul>\n<li>Optimize Performance on Kubernetes: Collaborate with core infrastructure and product teams to performance-tune and optimize our large-scale cloud deployments, with a deep focus on Kubernetes, Docker, and GCP. Identify and resolve performance bottlenecks, implement capacity planning strategies, and reduce latency across global regions.</li>\n</ul>\n<ul>\n<li>Debug and Harden Distributed Systems: Dive deep into debugging extremely difficult technical problems across the stack. Use your findings to design and implement long-term fixes that make our systems and products more robust, operable, and easier to diagnose.</li>\n</ul>\n<ul>\n<li>Provide Staff-Level Guidance: Review feature and system designs from across the company, acting as a key owner for the reliability, scalability, security, and operational integrity of those designs.</li>\n</ul>\n<ul>\n<li>Educate and Mentor: Educate, mentor, and hold accountable the broader engineering team to improve the reliability of our systems, making reliability a core value of the Replit engineering culture.</li>\n</ul>\n<ul>\n<li>Build and Integrate: Write high-quality, well-tested code in Python or Go to meet the needs of your customers, whether it&#39;s building new internal tools or integrating with third-party vendors.</li>\n</ul>\n<p><strong>Required Skills and Experience</strong></p>\n<ul>\n<li>8-10 years of experience in Site Reliability Engineering or similar roles (e.g., DevOps, Systems Engineering, Infrastructure Engineering).</li>\n</ul>\n<ul>\n<li>Strong programming 
skills in languages like Python or Go. You write high-quality, well-tested code.</li>\n</ul>\n<ul>\n<li>Deep understanding of distributed systems. You’ve designed, built, scaled, and maintained production services and know how to compose a service-oriented architecture.</li>\n</ul>\n<ul>\n<li>Deep experience with container orchestration platforms, specifically Kubernetes, and cloud-native technologies.</li>\n</ul>\n<ul>\n<li>Proven track record of designing, implementing, and maintaining sophisticated monitoring and observability solutions (e.g., metrics, logging, tracing).</li>\n</ul>\n<ul>\n<li>Strong incident management skills with extensive experience leading incident response for complex systems and demonstrated critical thinking under pressure.</li>\n</ul>\n<ul>\n<li>Experience with infrastructure as code (e.g., Terraform, Pulumi) and configuration management tools.</li>\n</ul>\n<ul>\n<li>Excellent written and verbal communication skills, with an ability to explain complex technical concepts clearly and simply and a bias toward open, transparent cultural practices.</li>\n</ul>\n<ul>\n<li>Strong interpersonal skills, with experience working with and mentoring engineers from junior to principal levels.</li>\n</ul>\n<ul>\n<li>A willingness to dive into understanding, debugging, and improving any layer of the stack.</li>\n</ul>\n<ul>\n<li>You&#39;re passionate about making software creation accessible and empowering the next generation of builders.</li>\n</ul>\n<p><strong>Bonus Points</strong></p>\n<ul>\n<li>Deep experience with Google Cloud Platform (GCP) services and tools.</li>\n</ul>\n<ul>\n<li>Expert-level knowledge of modern observability platforms (e.g., Prometheus, Grafana, Datadog, OpenTelemetry).</li>\n</ul>\n<ul>\n<li>Experience designing and building reliable systems capable of handling high throughput and low latency.</li>\n</ul>\n<ul>\n<li>Significant experience with Go and Terraform.</li>\n</ul>\n<ul>\n<li>Familiarity with working in rapid-growth, 
startup environments.</li>\n</ul>\n<ul>\n<li>Experience writing company-facing blog posts and training materials.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_eebf21c4-d1f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Replit","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/replit.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/replit/d50ad15b-82d4-452f-b4ea-2a7f5e796170","x-work-arrangement":"remote","x-experience-level":"staff","x-job-type":"Full time","x-salary-range":"$220K - $325K","x-skills-required":["Site Reliability Engineering","DevOps","Systems Engineering","Infrastructure Engineering","Python","Go","Distributed Systems","Container Orchestration","Kubernetes","Cloud-Native Technologies","Monitoring and Observability","Incident Management","Infrastructure as Code","Terraform","Pulumi","Configuration Management"],"x-skills-preferred":["Google Cloud Platform","Prometheus","Grafana","Datadog","OpenTelemetry","Go","Terraform"],"datePosted":"2026-03-08T22:20:23.639Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote (United States)"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Site Reliability Engineering, DevOps, Systems Engineering, Infrastructure Engineering, Python, Go, Distributed Systems, Container Orchestration, Kubernetes, Cloud-Native Technologies, Monitoring and Observability, Incident Management, Infrastructure as Code, Terraform, Pulumi, Configuration Management, Google Cloud Platform, Prometheus, Grafana, Datadog, OpenTelemetry, Go, 
Terraform","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":220000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f6c11430-460"},"title":"Software Engineer, Observability","description":"<p><strong>Compensation</strong></p>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. The total compensation includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as 
charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Role</strong></p>\n<p>We’re building the observability product for OpenAI—from scalable infrastructure to a rich, AI-powered UI. Our systems ingest petabytes of logs and billions of time series metrics across our fleet. We&#39;re now layering intelligence on top—think agents that summarize SEVs, auto-generate dashboards, or help engineers debug through notebook-like UIs.</p>\n<p><strong>What You’ll Do</strong></p>\n<ul>\n<li>Own core observability infrastructure, including distributed logging, time series, and trace storage</li>\n</ul>\n<ul>\n<li>Build AI-native tools that help engineers detect, understand, and resolve issues autonomously.</li>\n</ul>\n<ul>\n<li>Contribute to UI experiences like dashboards, notebooking, or interactive debugging</li>\n</ul>\n<ul>\n<li>Collaborate closely with engineers, researchers, user ops, and other teams across the company to build the next-generation observability product</li>\n</ul>\n<p><strong>You Might Be a Fit If You:</strong></p>\n<ul>\n<li>Have operated large-scale distributed systems in production (especially logging systems or other time series databases).</li>\n</ul>\n<ul>\n<li>Thrive in ambiguous environments and roll up your sleeves to solve unscoped problems.</li>\n</ul>\n<ul>\n<li>Have full-stack chops or product sensibilities—you&#39;re excited to build real tools people use.</li>\n</ul>\n<ul>\n<li>Have strong fundamentals in systems, networking, and cloud infra (Kubernetes, AWS, etc).</li>\n</ul>\n<ul>\n<li>Bonus: built or contributed to observability systems (e.g.
Prometheus, OpenTelemetry, etc).</li>\n</ul>\n<p><strong>Why This Team</strong></p>\n<ul>\n<li>We’re both an infra and product team—building a real AI application for internal use.</li>\n</ul>\n<ul>\n<li>Your work will directly power the reliability of GPT-based products at massive scale.</li>\n</ul>\n<ul>\n<li>You&#39;ll help define what &#39;AI-powered observability&#39; looks like at one of the world’s most advanced AI labs.</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p><strong>Additional Information</strong></p>\n<p>For additional information, please see [OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement](https://cdn.openai.com/policies/eeo-policy-statement.pdf).</p>\n<p>Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates. 
For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.</p>\n<p>To notify OpenAI that you believe this job posting is non-compliant, please submit a report through [this form](https://form.asana.com/?d=57018692298241&amp;k=5MqR40fZd7jlxVUh5J-UeA). No response will be provided to inquiries unrelated to job posting compliance.</p>\n<p>We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this [link](https://form.asana.com/?k=bQ7w9h3iexRlicUdWRiwvg&amp;d=57018692298241).</p>\n<p>[OpenAI Global Applicant Privacy Policy](https://cdn.openai.com/policies/global-employee-and-contractor-privacy-policy.pdf)</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f6c11430-460","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/d4dcd344-40cf-44d6-a7dd-172118eb0842","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"Full time","x-salary-range":"$255K – $405K","x-skills-required":["distributed systems","logging systems","time series 
databases","Kubernetes","AWS","Prometheus","OpenTelemetry"],"x-skills-preferred":["full-stack chops","product sensibilities"],"datePosted":"2026-03-08T22:19:31.048Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed systems, logging systems, time series databases, Kubernetes, AWS, Prometheus, OpenTelemetry, full-stack chops, product sensibilities","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":255000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f70dd4a2-526"},"title":"Staff+ Software Engineer, Observability","description":"<p><strong>About the Role</strong></p>\n<p>Anthropic is seeking talented and experienced Software Engineers to join our Observability team within the Infrastructure organisation. 
The Observability team owns the monitoring and telemetry infrastructure that every engineer and researcher at Anthropic depends on—from metrics and logging pipelines to distributed tracing, error analytics, alerting, and the dashboards and query interfaces that make it all actionable.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and build scalable telemetry ingest and storage pipelines for metrics, logs, traces, and error data across Anthropic&#39;s multi-cluster infrastructure</li>\n<li>Own and evolve core observability platforms, driving migrations and architectural improvements that improve reliability, reduce cost, and scale with organisational growth</li>\n<li>Build instrumentation libraries, SDKs, and integrations that make it easy for engineering teams to emit high-quality telemetry from their services</li>\n<li>Drive alerting and SLO infrastructure that enables teams to define, monitor, and respond to reliability targets with minimal noise</li>\n<li>Reduce mean time to detection and resolution by building cross-signal correlation, unified query interfaces, and AI-assisted diagnostic tooling</li>\n<li>Partner with Research, Inference, Product, and Infrastructure teams to ensure observability solutions meet the unique needs of each organisation</li>\n</ul>\n<p><strong>You May Be a Good Fit If You:</strong></p>\n<ul>\n<li>Have 10+ years of relevant industry experience building and operating large-scale observability or monitoring infrastructure</li>\n<li>Have deep experience with at least one observability signal area (metrics, logging, tracing, or error analytics) and familiarity with the others</li>\n<li>Understand high-throughput data pipelines, columnar storage engines, and the tradeoffs involved in ingesting and querying telemetry data at scale</li>\n<li>Have experience operating or building on top of observability platforms such as Prometheus, Grafana, ClickHouse, OpenTelemetry, or similar systems</li>\n<li>Have strong proficiency in 
at least one of Python, Rust, or Go</li>\n<li>Have excellent communication skills and enjoy partnering with internal teams to improve their operational visibility and incident response capabilities</li>\n<li>Are excited about building foundational infrastructure and are comfortable working independently on ambiguous, high-impact technical challenges</li>\n</ul>\n<p><strong>Strong Candidates May Also Have:</strong></p>\n<ul>\n<li>Experience operating metrics systems at very high cardinality (hundreds of millions of active time series or more)</li>\n<li>Experience with log storage migrations or operating columnar databases (ClickHouse, BigQuery, or similar) for analytics workloads</li>\n<li>Experience with OpenTelemetry instrumentation, collector pipelines, and tail-based sampling strategies</li>\n<li>Experience building or operating alerting platforms, on-call tooling, or SLO frameworks at scale</li>\n<li>Experience with Kubernetes-native monitoring, eBPF-based observability, or continuous profiling</li>\n<li>Interest in applying AI/LLMs to operational workflows such as automated root cause analysis, anomaly detection, or intelligent alerting</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<ul>\n<li>Education requirements: We require at least a Bachelor&#39;s degree in a related field or equivalent experience.</li>\n<li>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>\n<li>Visa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. 
Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</strong></p>\n<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses.</strong></p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f70dd4a2-526","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5139910008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$405,000 - $485,000 USD","x-skills-required":["observability","metrics","logging","tracing","error analytics","alerting","SLO infrastructure","cross-signal correlation","unified query interfaces","AI-assisted diagnostic tooling","Python","Rust","Go","Prometheus","Grafana","ClickHouse","OpenTelemetry"],"x-skills-preferred":["OpenTelemetry instrumentation","collector pipelines","tail-based sampling strategies","Kubernetes-native monitoring","eBPF-based observability","continuous profiling","AI/LLMs","automated root cause analysis","anomaly detection","intelligent alerting"],"datePosted":"2026-03-08T13:52:33.217Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"observability, metrics, logging, tracing, error analytics, alerting, SLO infrastructure, cross-signal correlation, 
unified query interfaces, AI-assisted diagnostic tooling, Python, Rust, Go, Prometheus, Grafana, ClickHouse, OpenTelemetry, OpenTelemetry instrumentation, collector pipelines, tail-based sampling strategies, Kubernetes-native monitoring, eBPF-based observability, continuous profiling, AI/LLMs, automated root cause analysis, anomaly detection, intelligent alerting","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2b3a3ab9-2bc"},"title":"Member of Technical Staff, HPC Operations Engineering Manager","description":"<p><strong>Summary</strong></p>\n<p>Microsoft AI are looking for a talented Member of Technical Staff, HPC Operations Engineering Manager to join their MAI SuperIntelligence Team.</p>\n<p><strong>About the Role</strong></p>\n<p>In this role, you&#39;ll lead a team of Site Reliability Engineers who blend software engineering and systems engineering to keep our large-scale distributed AI infrastructure reliable and efficient.
You&#39;ll work closely with ML researchers, data engineers, and product developers to design and operate the platforms that power training, fine-tuning, and serving generative AI models.</p>\n<p><strong>Accountabilities</strong></p>\n<ul>\n<li>Lead a team of experienced SREs to ensure uptime, resiliency and fault tolerance of AI model training and inference systems</li>\n</ul>\n<p><strong>The Candidate we&#39;re looking for</strong></p>\n<p><strong>Experience:</strong></p>\n<ul>\n<li>8+ years technical engineering experience with Site Reliability Engineering, DevOps, or Infrastructure Engineering Leadership roles</li>\n</ul>\n<p><strong>Technical skills:</strong></p>\n<ul>\n<li>Kubernetes, Docker, and container orchestration</li>\n<li>Public cloud platforms like Azure/AWS/GCP and infrastructure-as-code</li>\n</ul>\n<p><strong>Personal attributes:</strong></p>\n<ul>\n<li>Low ego individual</li>\n</ul>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Competitive salary</li>\n<li>Benefits and other compensation</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2b3a3ab9-2bc","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Microsoft AI","sameAs":"https://microsoft.ai","logo":"https://logos.yubhub.co/microsoft.ai.png"},"x-apply-url":"https://microsoft.ai/job/member-of-technical-staff-hpc-operations-engineering-manager-mai-superintelligence-team/","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"USD $139,900 – $274,800 per year","x-skills-required":["Kubernetes","Docker","container orchestration","public cloud
platforms","infrastructure-as-code"],"x-skills-preferred":["monitoring & observability tools","Grafana","Datadog","OpenTelemetry"],"datePosted":"2026-03-06T07:26:34.569Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Docker, container orchestration, public cloud platforms, infrastructure-as-code, monitoring & observability tools, Grafana, Datadog, OpenTelemetry","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139900,"maxValue":274800,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_7205bdf6-abf"},"title":"Tools Programmer","description":"<p>We are looking for a Tools Programmer to join the team responsible for developing our internal crash reporting system, one of the most useful services at Ubisoft.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<p>This system is composed of multiple components, including user interfaces written in C# that will undergo significant revamp to better serve our thousands of users, minimize the amount of effort necessary to report bugs through best-in-class UX and AI assistance.</p>\n<ul>\n<li>Responsibility 1: Play a key part in modernizing this strategic tool and making it a cutting-edge technology.</li>\n<li>Responsibility 2: Collaborate with the team to develop and implement new features and improvements.</li>\n</ul>\n<p><strong>What you need</strong></p>\n<ul>\n<li>Excellent proficiency in C#</li>\n<li>Excellent knowledge of desktop and web application development</li>\n</ul>\n<p><strong>Why this matters</strong></p>\n<p>This role is crucial in helping us create a better user experience and improve the overall efficiency of our game development process.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML 
job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_7205bdf6-abf","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Ubisoft","sameAs":"https://jobs.smartrecruiters.com","logo":"https://logos.yubhub.co/ubisoft2.com.png"},"x-apply-url":"https://jobs.smartrecruiters.com/Ubisoft2/744000110158945-tools-programmer","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["C#","desktop and web application development","REST API design"],"x-skills-preferred":["Blazor","SQL","Elastic Search","Docker","Kubernetes","OpenAPI","OpenTelemetry"],"datePosted":"2026-02-20T00:04:50.365Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Bucharest"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"C#, desktop and web application development, REST API design, Blazor, SQL, Elastic Search, Docker, Kubernetes, OpenAPI, OpenTelemetry"}]}