<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>bee517db-e9c</externalid>
      <Title>DevOps Engineer (all genders)</Title>
      <Description><![CDATA[<p>Join our DevOps team at Holidu, a central team across the entire tech organisation, responsible for creating and maintaining the infrastructure that powers all of our products and services.</p>
<p>In this role, you will contribute to the continuous improvement of our DevOps processes, collaborate with cross-functional teams, and apply best practices for scalable, reliable, and secure systems.</p>
<p>Our ideal candidate has a solid technical foundation, a strong hands-on approach, and the ability to deliver results with minimal supervision.</p>
<p><strong>Our Tech Stack</strong></p>
<ul>
<li>Cloud: AWS (EC2, S3, RDS, EKS, ElastiCache, Lambda)</li>
<li>Container Orchestration: Kubernetes with Helm</li>
<li>Infrastructure as Code: Terraform + Terragrunt, Pulumi/CDK</li>
<li>Monitoring &amp; Observability: Prometheus, Grafana, Elastic Stack, OpenTelemetry</li>
<li>CI/CD: Jenkins, GitHub Actions, ArgoCD, ArgoRollouts</li>
<li>Scripting: Python, Go, Bash</li>
<li>Version Control: GitHub</li>
<li>Collaboration: Jira (Agile)</li>
<li>Automation: N8N, AI-assisted tooling (Agentic ADK)</li>
</ul>
<p><strong>Your role in this journey</strong></p>
<p>As a DevOps Engineer, you will be responsible for:</p>
<ul>
<li>Implementing and maintaining infrastructure definitions using Terraform, Pulumi, or similar tools</li>
<li>Ensuring IaC standards are followed and contributing improvements to existing modules and patterns</li>
<li>Managing and monitoring AWS services, ensuring system performance, availability, and adherence to best practices</li>
<li>Troubleshooting production issues and participating in capacity planning</li>
<li>Maintaining and troubleshooting Kubernetes clusters: deploying workloads, managing configurations, scaling services, and resolving incidents to support high-availability applications</li>
<li>Maintaining and improving CI/CD pipelines to ensure smooth, automated software delivery</li>
<li>Identifying bottlenecks and implementing enhancements across Jenkins, GitHub Actions, ArgoRollouts and ArgoCD</li>
<li>Maintaining and extending our monitoring stack (Prometheus, Grafana)</li>
<li>Building dashboards, configuring alerts, and improving observability to ensure comprehensive visibility into system health and performance</li>
</ul>
<p><strong>Your backpack is filled with</strong></p>
<ul>
<li>4+ years of experience in a DevOps, SRE, or cloud engineering role with hands-on production experience</li>
<li>Solid working experience with AWS services (EC2, EKS, S3, RDS, Lambda) and cloud infrastructure management</li>
<li>Hands-on experience with Docker and Kubernetes in production environments: deploying, scaling, and troubleshooting containerized workloads</li>
<li>Practical experience with at least one Infrastructure as Code tool (Terraform, Pulumi, or AWS CDK)</li>
<li>Experience maintaining and improving CI/CD pipelines using tools like Jenkins, GitHub Actions, or ArgoCD</li>
<li>Proficiency in scripting with Python, Bash, or Go for operational automation</li>
<li>Working knowledge of monitoring and observability tools such as Prometheus, Grafana, or similar platforms</li>
<li>Familiarity with logging and log aggregation systems (Elastic Stack, OpenTelemetry, or similar)</li>
<li>Solid understanding of Linux administration, networking fundamentals, and system security basics</li>
<li>Strong communication skills with the ability to collaborate across teams and explain technical decisions clearly</li>
</ul>
<p><strong>Nice to Have</strong></p>
<ul>
<li>Experience with Helm charts and Kubernetes package management</li>
<li>Familiarity with GitOps workflows (e.g., GitHub Actions, ArgoCD, Flux)</li>
<li>Experience designing architectures based on AWS services</li>
<li>Experience with AI automation or low-code/no-code platforms such as N8N</li>
<li>Familiarity with prompt engineering and using AI tools to augment DevOps workflows</li>
<li>Exposure to cost optimization strategies for cloud infrastructure</li>
<li>Experience with incident response, on-call rotations, or SRE practices (SLOs, error budgets)</li>
<li>Experience with DevSecOps practices: integrating security scanning and compliance into CI/CD pipelines</li>
</ul>
<p><strong>Our adventure includes</strong></p>
<ul>
<li>Impact: Shape the future of travel with products used by millions of guests and thousands of hosts</li>
<li>Learning: Grow professionally in a culture that thrives on curiosity and feedback</li>
<li>Great People: Join a team of smart, motivated, and international colleagues who challenge and support each other</li>
<li>Technology: Work in a modern tech environment</li>
<li>Flexibility: Work in a hybrid setup with 50% in-office time for collaboration, and work up to 8 weeks a year from other inspiring locations</li>
<li>Perks on Top: Of course, we also offer travel benefits, gym discounts, and other perks to keep you energized</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Cloud, Container Orchestration, Infrastructure as Code, Monitoring &amp; Observability, CI/CD, Scripting, Version Control, Collaboration, Automation, Helm, GitOps, AI automation, Low-code/no-code platforms, Prompt engineering, Cost optimization strategies, Incident response, SRE practices, DevSecOps practices</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Holidu Hosts GmbH</Employername>
      <Employerlogo>https://logos.yubhub.co/holidu.jobs.personio.com.png</Employerlogo>
      <Employerdescription>Holidu is a travel technology company that provides search engines for vacation rentals.</Employerdescription>
      <Employerwebsite>https://holidu.jobs.personio.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://holidu.jobs.personio.com/job/2595036</Applyto>
      <Location>Munich, Germany</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>790269e4-0f2</externalid>
      <Title>Associate Director, Software Engineering</Title>
      <Description><![CDATA[<p>Join HSBC and fulfil your potential in the role of Associate Director, Software Engineering.</p>
<p>We are currently seeking an experienced professional to lead our software engineering team and drive practical improvement initiatives to address SDLC bottlenecks, inefficiencies, and friction points across teams.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Partnering with Engineering, Platform, and Risk and Control stakeholders to improve delivery flow, change quality, stability, resiliency, and operational effectiveness.</li>
<li>Defining and driving the adoption of DORA, SPACE, and broader engineering metrics to create visibility, support prioritisation, and improve performance outcomes.</li>
<li>Establishing and maintaining automated reporting to provide clear views of current performance, root-cause analysis, trends, and recommended actions.</li>
<li>Leading engineering and operational automation initiatives across areas such as testing, deployment, patching, recovery, and health checks.</li>
<li>Creating and maintaining a central engineering knowledge space and operating cadence to support governance, transparency, and continuous improvement.</li>
</ul>
<p>To be successful, you will have 12+ years of engineering experience across the full software delivery lifecycle, with strong engineering leadership capability and hands-on experience in coding.</p>
<p>You will also bring proven experience across engineering excellence, DevOps, platform engineering, SRE, or software delivery improvement roles, and demonstrate strong ability to identify SDLC bottlenecks, prioritise improvement opportunities, and convert insight into practical cross-team action.</p>
<p>Additional requirements include:</p>
<ul>
<li>Strong understanding of DORA metrics and good knowledge of SPACE or broader engineering productivity and developer experience measures.</li>
<li>Solid knowledge of software development, testing, release management, incident management, service recovery, and operational resilience practices.</li>
<li>Experience leading automation initiatives across testing, deployment, patching, recovery, and operational health checks.</li>
<li>An AI-driven mindset, with the ability to identify practical opportunities to use AI to improve engineering efficiency, analysis, decision-making, and delivery effectiveness.</li>
<li>Excellent analytical, communication, problem-solving, and delivery leadership skills.</li>
</ul>
<p>You’ll achieve more when you join HSBC.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>SDLC, DORA, SPACE, engineering metrics, automated reporting, engineering and operational automation, testing, deployment, patching, recovery, health checks, central engineering knowledge space, operating cadence, governance, transparency, continuous improvement, DevOps, platform engineering, SRE, software delivery improvement, AI-driven mindset, engineering efficiency, analysis, decision-making, delivery effectiveness</Skills>
      <Category>Engineering</Category>
      <Industry>Finance</Industry>
      <Employername>HSBC</Employername>
      <Employerlogo>https://logos.yubhub.co/portal.careers.hsbc.com.png</Employerlogo>
      <Employerdescription>HSBC is one of the largest banking and financial services organisations in the world, with operations in 64 countries and territories.</Employerdescription>
      <Employerwebsite>https://portal.careers.hsbc.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://portal.careers.hsbc.com/careers/job/563774610662004</Applyto>
      <Location>Pune</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>770c5fe8-cce</externalid>
      <Title>Staff Security Engineer, Vulnerability Management</Title>
      <Description><![CDATA[<p>We are seeking a Staff Security Engineer to lead the most complex technical work in CoreWeave&#39;s Vulnerability Management program.</p>
<p>As a Staff Security Engineer, you will design and implement scalable triage, prioritization, and remediation-tracking systems across application, infrastructure, and hardware domains. You will set technical standards, drive high-impact initiatives, and mentor engineers through technical leadership, while partnering with leadership on priorities and execution risks.</p>
<p>Key Responsibilities:</p>
<ul>
<li>Lead high-complexity VM technical initiatives and deliver architecture decisions for assigned program areas</li>
<li>Design and build scalable triage automation, including integrations, decision logic, and production hardening</li>
<li>Implement end-to-end workflow components from assessment and detection to ticket routing and remediation tracking</li>
<li>Provide deep technical leadership on hardware-adjacent vulnerabilities (GPU firmware, DPU firmware/BlueField, and BMC surfaces)</li>
<li>Act as senior technical responder for embargoed disclosures and zero-day events, coordinating with owner teams that deploy fixes</li>
<li>Improve prioritization logic, severity models, and exception workflows through code, design reviews, and technical proposals</li>
<li>Produce actionable technical metrics and risk insights for leadership consumption</li>
<li>Lead root-cause analysis for high-impact vulnerability incidents and implement durable technical improvements</li>
<li>Mentor IC3/IC4/IC5 engineers through design guidance, code review, and incident coaching</li>
<li>Partner with security, engineering, and operational stakeholders to improve workflow reliability and accelerate remediation outcomes</li>
</ul>
<p>Requirements:</p>
<ul>
<li>9+ years of relevant experience with demonstrated strategic impact in vulnerability management, application security, platform security, or cloud security engineering</li>
<li>Proven track record building and scaling security automation (SOAR workflows, AI/ML systems, detection pipelines) in production environments</li>
<li>Deep subject matter expertise with vulnerability management best practices: CVSS, EPSS, CISA KEV, threat intelligence integration, and risk-based prioritization frameworks</li>
<li>Excellent development background with strong coding skills in Python, Go, or similar languages for building scalable, production-grade security systems</li>
<li>Significant experience with modern vulnerability management tooling (for example Wiz, Semgrep, Rapid7, Tenable, or equivalent)</li>
<li>Experience with specialized infrastructure: GPU/DPU environments, firmware security, hardware vulnerabilities, or high-performance computing</li>
<li>Demonstrated track record mentoring engineers across levels and driving cross-functional technical initiatives at organizational scale</li>
<li>Strong business acumen and understanding of how security decisions impact engineering velocity, customer trust, and business outcomes</li>
</ul>
<p>Preferred Qualifications:</p>
<ul>
<li>Practical experience building AI/ML-powered security systems (LLM integration, automated decision-making, human-in-the-loop validation) in production</li>
<li>Experience managing hardware vendor security partnerships (embargoed disclosures and pre-release collaboration)</li>
<li>Production experience with security automation platforms such as TINES and serverless frameworks (AWS Lambda, GCP Cloud Functions)</li>
<li>Strong DevOps, DevSecOps, or SRE background with deep experience in AWS/GCP/Azure cloud services and Infrastructure as Code (Terraform, CloudFormation)</li>
<li>Deep understanding of Kubernetes security (container scanning, admission controllers, supply chain security, runtime protection)</li>
<li>Experience leading security programs through rapid hypergrowth (10x+ infrastructure scaling) in startup or cloud-native environments</li>
<li>Practical experience managing vulnerabilities within a FedRAMP-certified environment or similar regulatory frameworks</li>
</ul>
<p>Salary and Benefits: The base salary range for this role is $188,000 to $275,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>
<p>Work Environment:</p>
<p>While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$188,000 to $275,000</Salaryrange>
      <Skills>vulnerability management, application security, platform security, cloud security engineering, security automation, AI/ML systems, detection pipelines, Python, Go, modern vulnerability management tooling, GPU/DPU environments, firmware security, hardware vulnerabilities, high-performance computing, AI/ML-powered security systems, LLM integration, automated decision-making, human-in-the-loop validation, security automation platforms, TINES, serverless frameworks, AWS Lambda, GCP Cloud Functions, DevOps, DevSecOps, SRE, Kubernetes security, container scanning, admission controllers, supply chain security, runtime protection</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud computing company that provides a platform for building and scaling AI applications.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4653130006</Applyto>
      <Location>Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>7bc4518a-7e3</externalid>
      <Title>AI Applications Ops Lead, GPS</Title>
      <Description><![CDATA[<p><strong>Role Overview</strong></p>
<p>Scale&#39;s rapidly growing Global Public Sector team is focused on using AI to address critical challenges facing the public sector around the world.</p>
<p>Our core work consists of creating custom AI applications that will impact millions of citizens, generating high-quality training data for national LLMs, and providing upskilling and advisory services to spread the impact of AI.</p>
<p>As a Production AI Ops Lead, you will design and develop the production lifecycle of full-stack AI applications, while supporting end-to-end system reliability, real-time inference observability, sovereign data orchestration, high-security software integration, and the resilient cloud infrastructure required for our international government partners.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Own the production outcome: Take full accountability for the long-term performance and reliability of AI use cases deployed across international government agencies.</li>
<li>Ensure Full-Stack integrity: Oversee the end-to-end health of the platform, ensuring seamless integration between the AI core and all full-stack components, from APIs to UI, to maintain a responsive and production-ready environment.</li>
<li>Scale the feedback loop: Build automated systems to monitor model performance and data drift across geographically dispersed environments, ensuring the right levels of reliability.</li>
<li>Navigate global compliance: Manage the technical lifecycle within diverse regulatory frameworks.</li>
<li>Incident command: Lead the response for production issues in mission-critical environments, ensuring rapid resolution and building the guardrails to prevent them from happening again.</li>
<li>Bridge the gap: Translate deep technical performance metrics into clear insights for senior international government officials.</li>
<li>Drive product evolution: Partner with our Engineering and ML teams to ensure the lessons learned in the field directly influence the technical architecture and decisions of future use cases.</li>
</ul>
<p><strong>Ideal Candidate</strong></p>
<ul>
<li>Experience: 6+ years in a high-impact technical role (SRE, FDE or MLOps) with experience in the public sector.</li>
<li>Global perspective: Familiarity with international government security standards and the complexities of deploying sovereign AI.</li>
<li>System architecture proficiency: Proven experience maintaining production-grade applications with a deep understanding of the full request lifecycle, connecting frontend/API layers to the backend and AI core.</li>
<li>Modern AI Stack expertise: Proficiency in coding and the modern AI infrastructure, including Kubernetes, vector databases, agentic development, and LLM observability tools.</li>
<li>Ownership: You treat every production deployment as your own. You race toward solving hard problems before the customer even sees them.</li>
<li>Reliability: You understand that in the public sector, a model failure may be a risk to public safety or privacy.</li>
<li>Customer communication: The ability to explain to a high-ranking official why the performance of the system has degraded and how we are fixing it.</li>
</ul>
<p><strong>About Us</strong></p>
<p>At Scale, our mission is to develop reliable AI systems for the world&#39;s most important decisions. Our products provide the high-quality data and full-stack technologies that power the world&#39;s leading models, and help enterprises and governments build, deploy, and oversee AI applications that deliver real impact.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Kubernetes, Vector databases, Agentic development, LLM observability tools, SRE, FDE, MLOps</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Scale</Employername>
      <Employerlogo>https://logos.yubhub.co/scale.com.png</Employerlogo>
      <Employerdescription>Scale develops reliable AI systems for the world&apos;s most important decisions.</Employerdescription>
      <Employerwebsite>https://scale.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/scaleai/jobs/4654510005</Applyto>
      <Location>Doha, Qatar; London, UK</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>ca221b6f-dca</externalid>
      <Title>Technical Program Manager, Safeguards (Infrastructure &amp; Evals)</Title>
      <Description><![CDATA[<p><strong>About the Role</strong></p>
<p>Safeguards Engineering builds and operates the infrastructure that keeps Anthropic&#39;s AI systems safe in production. As a Technical Program Manager for Safeguards Infrastructure and Evals, you&#39;ll own the operational health and forward momentum of this stack.</p>
<p>Your primary responsibility is driving reliability: owning the incident-response and post-mortem process, ensuring SLOs are defined and met in partnership with various teams, and making sure that when things go wrong, the right people know, the right actions get taken, and those actions actually get closed out.</p>
<p>Alongside that ongoing operational rhythm, you&#39;ll coordinate the larger platform investments: migrations, eval-platform improvements, and the cross-team dependencies that connect them.</p>
<p>This role sits at the intersection of operations and program management. It requires genuine technical depth: you need to understand how these systems work well enough to triage effectively, judge what&#39;s actually safety-critical versus what can wait, and have informed conversations with the engineers building and maintaining them.</p>
<p>But the core of the job is keeping the machine running well and the work moving.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Own the Safeguards Engineering ops review: drive the recurring cadence that keeps the team informed and coordinated by surfacing recent incidents and failures, bringing visibility to reliability trends, and making sure the right people are in the room when decisions need to be made</li>
<li>Drive incident tracking and post-mortem execution</li>
<li>Establish and maintain SLOs with partner teams</li>
<li>Maintain runbook quality and incident-ownership clarity</li>
<li>Drive platform migrations and infrastructure projects</li>
<li>Coordinate evals platform improvements</li>
</ul>
<p><strong>Requirements</strong></p>
<ul>
<li>Solid technical program management experience, particularly in operational or infrastructure-heavy environments</li>
<li>Understanding of how production ML systems work well enough to triage incidents intelligently and have substantive conversations with engineers about what&#39;s going wrong and why</li>
<li>Ability to work effectively across team boundaries</li>
<li>Experience with or strong interest in AI safety</li>
</ul>
<p><strong>Nice to Have</strong></p>
<ul>
<li>Experience with SRE practices, incident management frameworks, or on-call operations at scale</li>
<li>Familiarity with monitoring and alerting tooling (PagerDuty, Datadog, or equivalents)</li>
<li>Experience driving infrastructure migrations in complex, multi-team environments</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$290,000-$365,000 USD</Salaryrange>
      <Skills>Technical Program Management, Operational or Infrastructure-heavy Environments, Production ML Systems, Incident Tracking and Post-Mortem Execution, Service-Level Objectives (SLOs), Runbook Quality and Incident-Ownership Clarity, Platform Migrations and Infrastructure Projects, Evals Platform Improvements, SRE Practices, Incident Management Frameworks, On-Call Operations at Scale, Monitoring and Alerting Tooling, Infrastructure Migrations in Complex Multi-Team Environments</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.ai.png</Employerlogo>
      <Employerdescription>Anthropic develops artificial intelligence systems. It has a growing team of researchers, engineers, and business leaders.</Employerdescription>
      <Employerwebsite>https://anthropic.ai/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5108695008</Applyto>
      <Location>San Francisco, CA | New York City, NY | Seattle, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>491db8e9-776</externalid>
      <Title>Staff Site Reliability Engineer - Splunk Expert</Title>
      <Description><![CDATA[<p>We are seeking a highly technical Staff Site Reliability Engineer with deep expertise in Splunk and Grafana to own and evolve our observability ecosystem.</p>
<p>As a Staff Site Reliability Engineer, you will move beyond simple monitoring to architect a comprehensive, scalable telemetry platform. You will be our subject-matter expert in Splunk optimisation, ensuring our logging architecture is performant, cost-effective, and deeply integrated with our automated workflows.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Splunk Architecture &amp; Optimisation: Lead the design and tuning of Splunk environments. Optimise indexer performance, search efficiency, and data models to ensure rapid troubleshooting and cost-efficiency.</li>
<li>Advanced Visualisation: Architect and maintain sophisticated Grafana dashboards that correlate disparate data sources into a single pane of glass for real-time system health.</li>
<li>Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.</li>
<li>Pipeline Engineering: Optimise the collection, processing, and storage of telemetry data (Metrics, Logs, Traces) to ensure high reliability and low latency.</li>
<li>Workflow Automation: Develop custom Splunk workflows and integrations that trigger automated responses to system events, reducing Mean Time to Resolution (MTTR).</li>
<li>Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements through &#39;observability-driven development.&#39;</li>
</ul>
<p>Required skills and experience include:</p>
<ul>
<li>Splunk Mastery: Deep, hands-on experience with Splunk administration, search optimisation (SPL), and architecting complex data pipelines.</li>
<li>Grafana Expertise: Proven ability to build actionable, intuitive dashboards in Grafana that go beyond simple charts to provide deep operational insights.</li>
<li>SRE Mindset: 8+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.</li>
<li>Programming Proficiency: Strong coding skills in Go, Python, or Ruby for building internal tools and automating observability workflows.</li>
<li>Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Prometheus, or similar frameworks for instrumenting applications.</li>
<li>Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/EKS).</li>
</ul>
<p>Bonus skills include:</p>
<ul>
<li>Tracing: Implementation of distributed tracing (Jaeger, Tempo, or Honeycomb) to visualise request flow across microservices.</li>
<li>Security Observability: Experience using Splunk for security orchestration (SOAR) or SIEM-related workflows.</li>
<li>Cloud Platforms: Experience managing native observability tools within AWS, Azure, or GCP.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Splunk, Grafana, SRE, Go, Python, Ruby, OpenTelemetry, Prometheus, Linux, Networking, Container Orchestration, Tracing, Security Observability, Cloud Platforms</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Okta</Employername>
      <Employerlogo>https://logos.yubhub.co/okta.com.png</Employerlogo>
      <Employerdescription>Okta is a publicly traded software company that specialises in identity and access management.</Employerdescription>
      <Employerwebsite>https://www.okta.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/okta/jobs/6874616</Applyto>
      <Location>Bengaluru, India</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>6f3a053e-c43</externalid>
      <Title>Staff Software Engineer, AI Reliability Engineering</Title>
      <Description><![CDATA[<p>We&#39;re seeking a Staff Software Engineer to join our AI Reliability Engineering team. As a key member of our team, you will develop Service Level Objectives for large language model serving systems, design and implement monitoring and observability systems, and lead incident response for critical AI services.</p>
<p>You will work closely with teams across Anthropic to improve reliability across our most critical serving paths. You will be responsible for making the systems that deliver Claude more robust and resilient, whether during an incident or while collaborating on projects.</p>
<p>To be successful in this role, you should have strong distributed systems, infrastructure, or reliability backgrounds. You should be curious and brave, comfortable jumping into unfamiliar systems during an incident and helping drive resolution even when you don&#39;t have deep expertise yet.</p>
<p>You will be working on high-availability serving infrastructure across multiple regions and cloud providers. You will support the reliability of safeguard model serving, which is critical for both site reliability and Anthropic&#39;s safety commitments.</p>
<p>If you&#39;re committed to creating reliable, interpretable, and steerable AI systems, and you&#39;re passionate about working on complex technical problems, we&#39;d love to hear from you.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>€235.000-€295.000 EUR</Salaryrange>
      <Skills>distributed systems, infrastructure, reliability, Service Level Objectives, monitoring, observability, incident response, high-availability serving infrastructure, cloud providers, SRE, Production Engineer, chaos engineering, systematic resilience testing, AI-specific observability tools and frameworks, ML hardware accelerators, RDMA, InfiniBand</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5101169008</Applyto>
      <Location>Dublin, IE</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>ac14f361-5b8</externalid>
      <Title>Network Engineer, Capacity and Efficiency</Title>
      <Description><![CDATA[<p>We&#39;re looking for a network engineer who thinks in metrics first. You will use deep networking knowledge and rigorous measurement to figure out where and how bandwidth, latency, and dollars are being used, find optimization opportunities and land them.</p>
<p>You should understand spine-leaf fabrics, BGP, SDN overlays, and cloud interconnect products well enough to build them, and you will instrument them. You&#39;ll own the observability and efficiency surface for Anthropic&#39;s network: from per-flow telemetry on backbone routers, to QoS policy on cross-region links carrying inference traffic, to cost attribution that tells a research team exactly what their checkpoint sync is costing.</p>
<p>This is a hands-on IC role. You&#39;ll write code (Python, Go), build dashboards, model capacity, and ship config changes to production routers. You&#39;ll also influence architecture: when the data says a traffic pattern is pathological, you&#39;ll be in the room root causing it and fixing it.</p>
<p>You will be working across three areas: network telemetry and observability, traffic engineering, and cost modeling and attribution. We expect you to be strong in at least two and willing to grow into the third.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>BGP, ECMP, VXLAN/EVPN, QoS, L1/optical basics, CSP networking model, network telemetry, flow export, eBPF-based host-side instrumentation, Python, Go, SRE experience for large-scale network infrastructure, cloud provider&apos;s networking team or a cloud networking product team, AI/ML infrastructure traffic patterns, HPC fabrics, traffic engineering for large backbones</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a technology company that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://anthropic.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5177143008</Applyto>
      <Location>San Francisco, CA | New York City, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>ebf95cea-76b</externalid>
      <Title>Technical Escalation Manager</Title>
      <Description><![CDATA[<p>As a Technical Escalation Manager at Databricks, you will be responsible for coordinating efforts to resolve critical customer issues, customer-impacting situations, and major incidents. You will work with multiple internal teams (engineering, product management, Customer Success Engineering, and Support) and external partners to effectively resolve these customer-impacting situations.</p>
<p>Your key responsibilities will include:</p>
<ul>
<li>Managing support escalation in partnership with engineering, product management, Customer Success Engineering, Support, Customers, and Partners until resolution.</li>
<li>Achieving customer satisfaction by ensuring incidents or escalations (and related cases) are well and fully documented with the timely execution of action items.</li>
<li>Creating and executing a data-driven customer recovery plan for every escalation and incident that is addressed.</li>
<li>Utilizing business and technical skills to manage customer escalations, coordinate meetings and deliverables, and analyze trends and patterns for reporting purposes.</li>
<li>Using data, metrics, and feedback to inform operational and tactical decisions that improve incident and escalation management.</li>
<li>Coordinating all necessary resources to fast-track and resolve new incidents and escalations from customers with a clear and detailed plan.</li>
</ul>
<p>We are looking for a candidate with at least 8 years of experience in customer support, escalation, SRE, or incident management. You should have excellent contextual interpretation and writing skills, as well as the ability to effectively summarize and communicate to both technical and business audiences.</p>
<p>You will also need experience with a distributed big data computing environment, SQL-based databases, and data warehousing and ETL technologies such as Informatica, DataStage, Oracle, Teradata, SQL Server, and MySQL. Linux/Unix administration skills, networking, and hands-on cloud experience with AWS, Azure, or GCP are required.</p>
<p>Experience working cross-functionally with support, engineering, product management, and directly with customers; ability to deeply understand product and customer personas is also essential.</p>
<p>A Bachelor&#39;s or Master&#39;s degree in Computer Science, Computer Engineering, or a related engineering field is preferred. Written and spoken proficiency in both Japanese and English is also required.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>customer support, escalation, SRE, incident management, distributed big data computing, SQL-based databases, data warehousing, ETL technologies, Linux/Unix administration, networking, cloud experience</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Databricks</Employername>
      <Employerlogo>https://logos.yubhub.co/databricks.com.png</Employerlogo>
      <Employerdescription>Databricks builds and operates the world&apos;s leading data and AI infrastructure platform, enabling customers to leverage deep data insights and enhance their business.</Employerdescription>
      <Employerwebsite>https://databricks.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/databricks/jobs/8407911002</Applyto>
      <Location>Japan</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>b5ce114e-dac</externalid>
      <Title>Cloud Engineer – Factory Systems and Operational Technology</Title>
      <Description><![CDATA[<p>Anduril Industries is a defence technology company with a mission to transform U.S. and allied military capabilities with advanced technology. By bringing the expertise, technology and business model of the 21st century&#39;s most innovative companies to the defence industry, Anduril is changing how military systems are designed, built and sold.</p>
<p>The company&#39;s family of systems is powered by Lattice OS, an AI-powered operating system that turns thousands of data streams into a real-time, 3D command and control centre.</p>
<p>As the world enters an era of strategic competition, Anduril is committed to bringing cutting-edge autonomy, AI, computer vision, sensor fusion and networking technology to the military in months, not years.</p>
<p>We are seeking a mission-driven Cloud Infrastructure Engineer to take a leading role in designing and implementing world-class defensive controls. This is a high-impact role with the autonomy to shape security architecture and protect the technology that is changing the future of defence.</p>
<p>Key Responsibilities:</p>
<ul>
<li>Design and Own Security Architecture: Architect, build and deploy robust, scalable security controls for our corporate, development and production cloud environments (AWS, Azure, GCP).</li>
<li>Automate Everything: Develop and automate infrastructure-as-code (IaC) to manage and scale our cloud deployments securely and efficiently.</li>
<li>Proactively Defend: Continuously monitor, identify and remediate security weaknesses and configuration drift across our entire cloud footprint.</li>
<li>Be a Force Multiplier: Partner with infrastructure, application and product teams to embed security best practices into their workflows and secure environments holding mission-critical data.</li>
<li>Enable Scale and Reliability: Engineer systems and processes that ensure our platforms are highly available, resilient and prepared for rapid growth.</li>
<li>Serve as a Cloud Security Expert: Act as the go-to subject matter expert for teams across Anduril, providing guidance, mentorship and paved-road solutions for building securely in the cloud.</li>
</ul>
<p>Requirements:</p>
<ul>
<li>Proven experience building and securing complex cloud environments, typically gained through 3+ years in a Cloud Security, DevOps or SRE role.</li>
<li>Deep proficiency in at least one major cloud provider (AWS, Azure or GCP).</li>
<li>Strong hands-on experience with Infrastructure as Code (e.g., Terraform, CloudFormation, Bicep).</li>
<li>Solid programming/scripting ability in one or more languages (e.g., Python, Go, Rust).</li>
<li>Firm understanding of public cloud networking principles (e.g., VPCs, subnets, routing, security groups).</li>
<li>Must be a U.S. Person and eligible to obtain and maintain a U.S. Top Secret security clearance.</li>
</ul>
<p>Preferred Qualifications:</p>
<ul>
<li>Experience hardening and monitoring Kubernetes clusters (EKS, GKE, AKS).</li>
<li>Experience with cloud security posture management (CSPM) or threat detection tooling.</li>
<li>Familiarity with CI/CD pipelines and securing the software supply chain.</li>
<li>Knowledge of compliance frameworks such as FedRAMP, MRL, SOC 2 or CMMC.</li>
<li>On-premises network engineering experience.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$129,000-$193,000 USD</Salaryrange>
      <Skills>Cloud Security, DevOps, SRE, Infrastructure as Code, Terraform, CloudFormation, Bicep, Python, Go, Rust, Public Cloud Networking, VPCs, Subnets, Routing, Security Groups, Kubernetes, Cloud Security Posture Management, Threat Detection Tooling, CI/CD Pipelines, Software Supply Chain Security, Compliance Frameworks, FedRAMP, MRL, SOC 2, CMMC, On-Premises Network Engineering</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anduril Industries</Employername>
      <Employerlogo>https://logos.yubhub.co/anduril.com.png</Employerlogo>
      <Employerdescription>Anduril Industries is a defence technology company that designs, builds and sells advanced military systems.</Employerdescription>
      <Employerwebsite>https://www.anduril.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/andurilindustries/jobs/5087348007</Applyto>
      <Location>Costa Mesa, California, United States</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>9e667b9c-eb8</externalid>
      <Title>Senior Security Engineer II, Vulnerability Management</Title>
      <Description><![CDATA[<p>We are seeking a Senior Security Engineer to build the Vulnerability Management program protecting CoreWeave&#39;s AI infrastructure. You will architect intelligent automation systems that defend the GPU clusters powering breakthrough AI research and enterprise AI applications.</p>
<p>This role combines technical depth, strategic thinking, and the autonomy to design workflows that will protect infrastructure driving the future of AI.</p>
<p><strong>Key Responsibilities:</strong></p>
<ul>
<li>Build and scale AI-powered triage workflows: evaluate tools (LLM integration, TINES orchestration), architect solutions, and deploy to production</li>
<li>Drive intelligent, risk-based vulnerability prioritization while simultaneously training AI models; your assessments become the foundation for automation</li>
<li>Influence automation priorities: recommend which areas of the vulnerability pipeline would most benefit from automation to improve team efficiency</li>
<li>Design and implement automated detection-to-ticket pipelines: build workflows that generate vulnerability detections, test them, scale across the environment, and auto-create Jira tickets</li>
<li>Execute remediation campaigns: build automated workflows for EOL product removal, vulnerable software upgrades, and OS migrations at scale</li>
<li>Manage embargoed vendor disclosures from hardware partners, including embargo verification and zero-day response coordination</li>
<li>Lead security incident investigations related to high-profile vulnerabilities, coordinating cross-functional response and impact assessment</li>
<li>Participate in on-call rotation for rapid-response vulnerability analysis during active zero-day events or critical security incidents</li>
<li>Partner with IT, Infrastructure, and Engineering teams to drive remediation efforts, enforce SLAs, and escalate blockers strategically</li>
<li>Write daily operations reports documenting vulnerability trends, remediation velocity, and emerging threats for security leadership</li>
<li>Drive process improvements and workflow automation to improve operational efficiency and reduce manual toil</li>
</ul>
<p><strong>Requirements:</strong></p>
<ul>
<li>7+ years of relevant experience with demonstrated impact in vulnerability management, application security, platform security, or cloud security engineering</li>
<li>Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience</li>
<li>Proven hands-on experience building security automation (SOAR workflows, detection pipelines, or vulnerability prioritization frameworks)</li>
<li>Deep subject matter expertise with vulnerability management best practices: CVSS, EPSS, CISA KEV, exploit intelligence, and compensating controls</li>
<li>Strong development background with proficiency in Python, Go, or similar languages for building production-grade security tools</li>
<li>Experience with modern vulnerability management tooling such as Wiz, Semgrep, Rapid7, or similar platforms</li>
<li>Demonstrated ability to partner with cross-functional teams (IT, SRE, Engineering) to drive remediation without formal authority</li>
<li>Strong familiarity with common security vulnerabilities and the ability to judge their severity and business impact</li>
</ul>
<p><strong>Preferred Qualifications:</strong></p>
<ul>
<li>Practical experience building AI/ML-powered security workflows (LLM integration, automated triage, human-in-the-loop validation)</li>
<li>Experience managing hardware security vulnerabilities (GPU/DPU firmware, BMC/IPMI, specialized compute environments)</li>
<li>Production experience with security automation platforms such as TINES, Splunk SOAR, or serverless frameworks (AWS Lambda)</li>
<li>Strong DevOps, DevSecOps, or SRE background with experience in AWS/GCP/Azure cloud services and Infrastructure as Code (Terraform, CloudFormation)</li>
<li>Deep understanding of container security and Kubernetes (image scanning, admission control, runtime protection, supply chain security)</li>
<li>Experience supporting customer audits (SOC 2, ISO 27001, FedRAMP) with vulnerability evidence and control validation</li>
<li>Experience integrating vulnerability management into modern CI/CD pipelines with a &#39;shift-left&#39; mentality</li>
</ul>
<p><strong>What We Offer:</strong></p>
<p>The base salary range for this role is $165,000 to $242,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>
<p>The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$165,000-$242,000 USD</Salaryrange>
      <Skills>vulnerability management, application security, platform security, cloud security engineering, Python, Go, security automation, SOAR workflows, detection pipelines, vulnerability prioritization frameworks, CVSS, EPSS, CISA KEV, exploit intelligence, compensating controls, Wiz, Semgrep, Rapid7, AI/ML-powered security workflows, hardware security vulnerabilities, security automation platforms, DevOps, DevSecOps, SRE, container security, Kubernetes, customer audits, CI/CD pipelines</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud computing company that provides a platform for AI development and deployment.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4650290006</Applyto>
      <Location>Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>f516f0ef-a2d</externalid>
      <Title>Senior Site Reliability Engineer (Auth0)</Title>
      <Description><![CDATA[<p>Secure Every Identity, from AI to Human. Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organisations to safely embrace this new era.</p>
<p>This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence. This is an opportunity to do career-defining work. We&#39;re all in on this mission.</p>
<p>As a Senior Site Reliability Engineer, you&#39;ll join our SRE team based in Europe to ensure our production systems are not only operational but also resilient, scalable, and ready for exponential growth. This isn&#39;t just about keeping the lights on; it&#39;s about directly contributing to the platform&#39;s core resiliency and robustness.</p>
<p>You&#39;ll be a hands-on builder, crafting solutions that make our system more reliable by design.</p>
<p><strong>Key Responsibilities:</strong></p>
<ul>
<li>Design and build custom software in Go to enhance the platform&#39;s reliability, resiliency, and redundancy.</li>
<li>Partner with engineering teams to embed reliability principles, improving the availability, performance, and observability of our services.</li>
<li>Use your deep understanding of infrastructure and observability principles to identify opportunities for improvement within the product and implement solutions.</li>
<li>Contribute to our on-call rotation, providing rapid, effective response to critical incidents and using your expertise to troubleshoot, mitigate or accurately escalate production issues.</li>
<li>Develop and refine our SRE tooling and processes, focusing on automation and operational efficiency.</li>
<li>Define, document, and champion reliability best practices across the organisation.</li>
</ul>
<p><strong>Requirements:</strong></p>
<ul>
<li>A proactive and systematic approach to problem-solving, with a high degree of ownership.</li>
<li>Proven experience in a production environment supporting large-scale, mission-critical applications with a high degree of autonomy.</li>
<li>Proficiency in at least one programming language, with a preference for Go. You should be comfortable writing custom applications, not just scripts.</li>
<li>Experience with infrastructure as code (Terraform), container orchestration (Kubernetes, Docker) and GitOps (ArgoCD).</li>
<li>Demonstrable expertise in a major cloud provider (Azure, AWS, or GCP).</li>
<li>A strong grasp of microservices architecture, databases (SQL, NoSQL), and networking fundamentals, so you can understand how custom code can solve platform-level issues.</li>
<li>An understanding of core SRE principles, including SLIs, SLOs, and error budgets.</li>
<li>Experience in an on-call rotation for a 24/7 cloud-based environment.</li>
<li>Exceptional communication and collaboration skills, with a proven ability to work effectively in a remote, distributed team, where tasks may be self-driven.</li>
</ul>
<p>We&#39;re looking for someone who is not just looking for a job, but a career-defining opportunity to tackle complex challenges at a massive scale. If you&#39;re a curious and motivated engineer who&#39;s passionate about building reliability directly into the platform, we&#39;d love to hear from you.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$136,000-$187,000 CAD</Salaryrange>
      <Skills>Go, Terraform, Kubernetes, Docker, GitOps, Cloud provider (Azure, AWS, or GCP), Microservices architecture, Databases (SQL, NoSQL), Networking fundamentals, Core SRE principles (SLIs, SLOs, error budgets)</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Okta</Employername>
      <Employerlogo>https://logos.yubhub.co/okta.com.png</Employerlogo>
      <Employerdescription>Okta provides an unparalleled authentication experience for hundreds of millions of users worldwide.</Employerdescription>
      <Employerwebsite>https://www.okta.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/okta/jobs/7791590</Applyto>
      <Location>Toronto, Ontario, Canada</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>fd1da18e-84d</externalid>
      <Title>Principal Software Engineer II - Observability</Title>
      <Description><![CDATA[<p>We&#39;re looking for a Principal Software Engineer to join the Observability Experience Team as one of the Tech Leads. As part of this team, you will work at the intersection of big data engineering, backend architecture, and experiences to help users obtain the best insights from their Observability signals, especially logs, metrics, and traces.</p>
<p>Key responsibilities include collaborating with product management, product design, and multiple teams across Elastic to define and evolve the end-to-end experiences for Observability. You will also be a contact point for other teams within Elastic, providing hands-on support and guidance. Additionally, you will help the team define coding practices and standards, foster a culture of mutual respect, collaboration, and consensus-based decision-making, and stay true to the principles of software development as adopted by the team.</p>
<p>The ideal candidate will have experience leading technical projects in the data and enterprise architecture areas, with proven knowledge of building and running sophisticated technical infrastructures and engineering sound software systems. They should also have hands-on experience using and developing Observability tools, preferably in the Logs space, and experience mentoring expert engineers, providing technical and professional guidance. Furthermore, they should be able to define a long-term technical vision for an area of a data-intensive application, working across teams and organizations to collaboratively build the technical roadmap.</p>
<p>Bonus points for experience as a user of the Elastic Stack and experience in SRE roles.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Observability tools, Logs space, Big data engineering, Backend architecture, Experiences, Elastic Stack, SRE roles</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Elastic, the Search AI Company</Employername>
      <Employerlogo>https://logos.yubhub.co/elastic.co.png</Employerlogo>
      <Employerdescription>Elastic enables everyone to find the answers they need in real time, using all their data, at scale. The Elastic Search AI Platform is used by more than 50% of the Fortune 500.</Employerdescription>
      <Employerwebsite>https://www.elastic.co/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/elastic/jobs/7635297</Applyto>
      <Location>Greece</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>da7679a6-e4f</externalid>
      <Title>Senior Technical Operations Lead</Title>
      <Description><![CDATA[<p>Job Title: Senior Technical Operations Lead</p>
<p>We are seeking an experienced Senior Technical Operations Lead to drive operational excellence across our Infrastructure Engineering organization.</p>
<p>As a Senior Technical Operations Lead, you will design and implement world-class operational processes, establish SRE best practices, and mentor technical teams to achieve exceptional reliability and efficiency.</p>
<p>Key Responsibilities:</p>
<p>SRE Leadership &amp; Transformation</p>
<ul>
<li>Lead the design and implementation of SRE practices and tooling across Infrastructure Engineering</li>
<li>Establish and cultivate an SRE-focused culture at ZoomInfo</li>
</ul>
<p>Operational Process Design &amp; Governance</p>
<ul>
<li>Establish clear governance frameworks and procedural consistency</li>
<li>Make decisions about process exceptions and/or changes to accommodate different team contexts</li>
<li>Design and/or implement process automations using scripts and integrations</li>
<li>Define functional requirements and goals for process automations</li>
<li>Conduct hands-on and/or automated audits to ensure process adherence and identify improvement opportunities</li>
</ul>
<p>Incident Management &amp; Root Cause Analysis</p>
<ul>
<li>Design, implement, and continuously improve Incident Management and Change Management procedures that scale across the organization, using tools such as PagerDuty, Slack, Jira, ServiceNow, and custom integrations</li>
<li>Lead and participate in root cause analysis sessions, driving teams toward systemic improvements rather than blame</li>
<li>Design and execute incident dry runs and tabletop exercises to build organizational resilience</li>
<li>Establish metrics and KPIs that measure incident response effectiveness and drive continuous improvement</li>
</ul>
<p>Enable Data-Driven Decision Making</p>
<ul>
<li>Identify, define, and automate the tracking of operational KPIs and departmental metrics that matter, enabling senior managers to make informed decisions on the basis of data</li>
<li>Build and maintain metric dashboards and automated reporting systems that provide real-time visibility into operational health</li>
<li>Analyze trends and surface opportunities for optimization</li>
</ul>
<p>Stakeholder Engagement, Training &amp; Mentorship</p>
<ul>
<li>Build and maintain strong relationships with Engineering managers, Product Managers, and cross-functional stakeholders across geographies</li>
<li>Maintain a feedback loop by meeting with stakeholders to understand process pain points.</li>
<li>Influence others by fostering trust, leading by example, and inspiring them with your expertise and passion for reliability practices.</li>
<li>Enhance internal knowledge of third-party tools such as PagerDuty and Datadog by educating ZoomInfo employees on these tools.</li>
<li>Deliver training sessions that make Operational Excellence engaging and motivating for diverse audiences.</li>
</ul>
<p>Required Experience &amp; Qualifications:</p>
<ul>
<li>Bachelor’s degree in Software Engineering, Operations Management, or related field</li>
</ul>
<ul>
<li>7+ years of hands-on experience in technical operations, Site Reliability Engineering (SRE), Incident Management, or IT Service Management roles within SaaS or technical organizations</li>
</ul>
<ul>
<li>Fluent English proficiency (written and verbal)</li>
</ul>
<ul>
<li>Proven track record designing and implementing operational processes at scale</li>
</ul>
<ul>
<li>Demonstrated expertise in SRE principles, practices, and tooling</li>
</ul>
<ul>
<li>Strong data analysis skills with ability to define metrics, build or design dashboards, and use data to drive strategic decisions</li>
</ul>
<ul>
<li>Proven ability to work effectively in a matrix organizational structure</li>
</ul>
<ul>
<li>Ability and experience working with senior management at global organizations</li>
</ul>
<ul>
<li>Hands-on experience with monitoring and observability tools such as PagerDuty and/or Datadog</li>
</ul>
<ul>
<li>Familiarity with Jira, Confluence, Google Data Studio, or Tableau</li>
</ul>
<ul>
<li>Experience with scripting and integrations (Python, JavaScript, Google AppScript, or similar)</li>
</ul>
<ul>
<li>Background in SRE transformation or organizational process improvement initiatives</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Site Reliability Engineering (SRE), Technical Operations, Incident Management, IT Service Management, Monitoring and Observability Tools, Jira, Confluence, Google Data Studio, Tableau, Scripting and Integrations, Python, JavaScript, Google AppScript</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>ZoomInfo</Employername>
      <Employerlogo>https://logos.yubhub.co/zoominfo.com.png</Employerlogo>
      <Employerdescription>ZoomInfo is a technology company that provides a go-to-market intelligence platform. It has over 35,000 customers worldwide.</Employerdescription>
      <Employerwebsite>https://www.zoominfo.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/zoominfo/jobs/8451386002</Applyto>
      <Location>Ra&apos;anana, Israel</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>81e928a2-c9f</externalid>
      <Title>Senior Site Reliability Engineer (Auth0)</Title>
      <Description><![CDATA[<p>Secure Every Identity</p>
<p>We are looking for a Senior Site Reliability Engineer to join our SRE team based in Europe. As a Senior Site Reliability Engineer, you&#39;ll ensure our production systems are not only operational but also resilient, scalable, and ready for exponential growth.</p>
<p>This isn&#39;t just about keeping the lights on; it&#39;s about directly contributing to the platform&#39;s core resiliency and robustness. You&#39;ll be a hands-on builder, crafting solutions that make our system more reliable by design.</p>
<p>Responsibilities</p>
<ul>
<li>Design and build custom software in Go to enhance the platform&#39;s reliability, resiliency, and redundancy.</li>
<li>Partner with engineering teams to embed reliability principles, improving the availability, performance, and observability of our services.</li>
<li>Use your deep understanding of infrastructure and observability principles to identify opportunities for improvement within the product and implement solutions.</li>
<li>Contribute to our on-call rotation, providing rapid, effective response to critical incidents and using your expertise to troubleshoot, mitigate or accurately escalate production issues.</li>
<li>Develop and refine our SRE tooling and processes, focusing on automation and operational efficiency.</li>
<li>Define, document, and champion reliability best practices across the organisation.</li>
</ul>
<p>What you&#39;ll need to be successful</p>
<p>This role requires a unique blend of a software engineer&#39;s mindset and operational expertise. You&#39;ll thrive in this role if you have:</p>
<ul>
<li>A proactive and systematic approach to problem-solving, with a high degree of ownership.</li>
<li>Proven experience in a production environment supporting large-scale, mission-critical applications with a high degree of autonomy.</li>
<li>Proficiency in at least one programming language, with a preference for Go. You should be comfortable writing custom applications, not just scripts.</li>
<li>Experience with infrastructure as code (Terraform), container orchestration (Kubernetes, Docker) and GitOps (ArgoCD).</li>
<li>Demonstrable expertise in a major cloud provider (Azure, AWS, or GCP).</li>
<li>A strong grasp of microservices architecture, databases (SQL, NoSQL), and networking fundamentals, so you can understand how custom code can solve platform-level issues.</li>
<li>An understanding of core SRE principles, including SLIs, SLOs, and error budgets.</li>
<li>Experience in an on-call rotation for a 24/7 cloud-based environment.</li>
<li>Exceptional communication and collaboration skills, with a proven ability to work effectively in a remote, distributed team, where tasks may be self-driven.</li>
</ul>
<p>The Okta Experience</p>
<ul>
<li>Supporting Your Well-Being</li>
<li>Driving Social Impact</li>
<li>Developing Talent and Fostering Connection + Community</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Go, Terraform, Kubernetes, Docker, GitOps, Cloud provider (Azure, AWS, or GCP), Microservices architecture, Databases (SQL, NoSQL), Networking fundamentals, Core SRE principles (SLIs, SLOs, error budgets)</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Okta</Employername>
      <Employerlogo>https://logos.yubhub.co/okta.com.png</Employerlogo>
      <Employerdescription>Okta provides an unparalleled authentication experience for hundreds of millions of users worldwide. It is a large technology company.</Employerdescription>
      <Employerwebsite>https://www.okta.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/okta/jobs/7418982</Applyto>
      <Location>Barcelona, Spain</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>9cd0420a-99d</externalid>
      <Title>Network Engineer, Capacity and Efficiency</Title>
      <Description><![CDATA[<p><strong>About the Role</strong></p>
<p>We&#39;re looking for a network engineer who thinks in metrics first. You will use deep networking knowledge and rigorous measurement to figure out where and how bandwidth, latency, and dollars are being used, then find optimization opportunities and land them.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Build the network observability stack. Design and deploy telemetry pipelines (sFlow/IPFIX, gNMI streaming, eBPF host probes) that turn packet counters into per-flow, per-tenant, per-workload cost and utilization data. Own the SLIs for backbone and DCN fabric health.</li>
<li>Hunt for efficiency. Analyze inter-region traffic patterns, identify hot links and stranded capacity, and quantify the dollar impact. Build the models that tell us whether we should buy more capacity, or move the workload.</li>
<li>Own QoS and traffic engineering. Design and operate traffic classification, marking, and shaping across the backbone. Make sure bulk checkpoint transfers don’t starve latency-sensitive inference, and that we’re not paying premium cross-region rates for traffic that could take the cheap path.</li>
<li>Drive cost attribution. Tie network spend (egress, interconnect ports, transit, optical leases) back to the teams and workloads that generate it. Make network cost a first-class input to capacity planning and workload placement decisions.</li>
<li>Influence decisions you don&#39;t own. A large fraction of this role is convincing other teams to act on what your data shows: making the case to research that a traffic pattern needs to change, to finance that an interconnect tranche is worth buying, to Systems Networking that a QoS policy needs rewriting.</li>
</ul>
<p><strong>Requirements</strong></p>
<ul>
<li>Have 5+ years operating large-scale production networks: data center fabrics (spine-leaf, Clos), backbone/WAN, or hyperscaler-adjacent environments.</li>
<li>Are genuinely fluent across the stack: BGP (including policy and communities), ECMP, VXLAN/EVPN or equivalent overlays, QoS (DSCP, queuing, shaping), and L1/optical basics (DWDM, coherent, LAGs).</li>
<li>Know at least one major CSP’s networking model deeply, whether AWS (VPC, TGW, Direct Connect, Gateway Load Balancer) or GCP (Shared VPC, Interconnect, Cloud Router, Network Connectivity Center), and understand how their overlays interact with physical underlays.</li>
<li>Have built or operated network telemetry at scale: streaming telemetry (gNMI/OpenConfig), flow export (sFlow, IPFIX, NetFlow), or eBPF-based host-side instrumentation. You can reason about sampling, cardinality, and storage tradeoffs.</li>
<li>Are comfortable writing Python or Go to build tooling, telemetry pipelines, infrastructure-as-code, and config management and automation for network devices that you’ll ship to production.</li>
<li>Think quantitatively by default. You reach for a notebook or a Grafana query before you reach for an opinion, and you can turn messy counter data into a defensible cost model.</li>
<li>Communicate crisply. You can explain to a finance partner why a 10% egress reduction matters, and to a network engineer why a specific ECMP imbalance is costing real money.</li>
</ul>
<p><strong>Nice to Have</strong></p>
<ul>
<li>SRE experience for large-scale network infrastructure: designing for reliability, defining SLOs/SLIs for network services, capacity planning with error budgets, and incident response for network-impacting outages at scale.</li>
<li>Background on a cloud provider&#39;s networking team or a cloud networking product team: building or operating the interconnect, backbone, or SDN control plane from the provider side, not just consuming it as a customer.</li>
<li>Familiarity with AI/ML infrastructure traffic patterns like collective communication (all-reduce, all-gather), checkpoint/weight transfer, and inference serving, and with how these stress networks differently from traditional workloads in terms of burst behavior, flow synchronization, and bandwidth symmetry.</li>
<li>Experience with HPC fabrics like InfiniBand, RoCE v2, lossless Ethernet, or custom high-radix topologies and an understanding of how job placement, congestion management, and adaptive routing interact at scale.</li>
<li>Background in traffic engineering for large backbones and the operational judgment to know when TE is worth the complexity.</li>
<li>Hands-on time with multi-cloud connectivity: cross-cloud peering, private interconnect products, and the billing models that come with them.</li>
<li>Experience building cost/chargeback systems for shared infrastructure, or FinOps exposure in a large cloud environment.</li>
</ul>
<p><strong>Representative Projects</strong></p>
<ul>
<li>Build a per-flow cost attribution pipeline that traces every byte of cross-region egress back to the team and workload that generated it</li>
<li>Design QoS policy for the private backbone that prevents bulk checkpoint transfers from starving inference traffic</li>
<li>Model whether it&#39;s cheaper to buy an additional 1.6Tb interconnect tranche or to re-route traffic through existing capacity</li>
<li>Instrument DCN fabric utilization with streaming telemetry and build the Grafana dashboards that become the team&#39;s source of truth for network observability</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>network engineering, network observability, telemetry pipelines, sFlow/IPFIX, gNMI streaming, eBPF host probes, BGP, ECMP, VXLAN/EVPN, QoS, DSCP, queuing, shaping, L1/optical basics, DWDM, coherent, LAGs, AWS, GCP, cloud networking, infrastructure-as-code, config management, automation, Python, Go, quantitative analysis, cost modeling, communication, SRE, cloud provider&apos;s networking team, cloud networking product team, AI/ML infrastructure traffic patterns, HPC fabrics, traffic engineering, multi-cloud connectivity, cost/chargeback systems, FinOps</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a technology company that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://anthropic.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5177143008</Applyto>
      <Location>San Francisco, CA | New York City, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>9b8fb427-b59</externalid>
      <Title>Elastic AI Engineer - Canada (Remote)</Title>
      <Description><![CDATA[<p>We are looking for an innovative Elastic AI Engineer to join our team to build autonomous, enterprise-grounded agents that don&#39;t just answer questions; they complete complex business tasks to accelerate productivity across the entire organization.</p>
<p>The ideal candidate is an Elastic product expert (including but not limited to Agent Builder and Workflows), using the full power of the Elastic Stack to provide the &#39;brain&#39; and &#39;memory&#39; for our agentic ecosystem.</p>
<p>As the company behind the popular open-source projects (Elasticsearch, Kibana, Logstash, and Beats), we help people around the world do great things with their data.</p>
<p>The Elastic family unites employees across 40+ countries into one coherent team, while the broader community spans across over 100 countries.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Agentic Strategy &amp; Design: Invent and implement sophisticated agentic workflows that use reasoning and tools to complete end-to-end business processes.</li>
<li>Enterprise Grounding: Apply Retrieval Augmented Generation (RAG) and the Elasticsearch Relevance Engine (ESRE) to ensure agents are deeply grounded in enterprise knowledge for high-accuracy task completion.</li>
<li>AI Model &amp; Tool Integration: Develop and fine-tune LLMs and integrate them with internal APIs and third-party SaaS tools to enable autonomous action.</li>
<li>Scalable Infrastructure: Firm understanding of cloud-based environments (AWS, Azure, GCP) in order to support the high-concurrency demands of enterprise agents.</li>
<li>Lifecycle Management: Oversee the training, deployment, and performance optimization of agents, ensuring they remain secure, reliable, and compliant.</li>
<li>Technical Leadership: Act as a domain expert on the Elastic Stack, making technical recommendations that push the boundaries of AI-driven productivity.</li>
<li>Documentation: Maintain comprehensive documentation of AI workflows, cloud infrastructure, and deployment processes.</li>
<li>Security: Implement standards for security and data privacy to protect sensitive information and ensure compliance with relevant regulations.</li>
</ul>
<p><strong>Requirements</strong></p>
<ul>
<li>3-5 years of work experience in a relevant field.</li>
<li>Minimum 1 year experience building with the Elastic Stack.</li>
<li>Knowledge of Elasticsearch Relevance Engine (ESRE), Jina AI, and advanced RAG patterns is critical.</li>
<li>Proven success in delivering independent GenAI projects, specifically those involving autonomous task completion or complex workflow automation.</li>
<li>Agentic Frameworks: Familiarity with LangGraph, LangChain, and LangSmith for building and debugging multi-agent systems.</li>
<li>Expertise in Enterprise Agentic &amp; Workflow Platforms: Deep familiarity with leading agentic AI and workflow automation platforms (such as Microsoft Copilot Studio, Salesforce Agentforce, ServiceNow AI Agents).</li>
<li>Market Trend Integration: Proven ability to apply emerging market trends, such as Multi-Agent Orchestration and Model Context Protocol (MCP), to build high-impact, cost-optimized solutions that scale across the enterprise.</li>
<li>Programming: Experience with Python or TypeScript for backend logic and agent orchestration.</li>
<li>Cloud &amp; Orchestration: Familiarity with Kubernetes (Operators/Controllers), Docker, and Terraform for automated deployment.</li>
<li>Model Expertise: Hands-on experience with LLM providers.</li>
</ul>
<p><strong>Bonus Points</strong></p>
<ul>
<li>Bachelor’s or Master’s degree in Computer Science or a related engineering field.</li>
<li>Strong communication skills with the ability to translate business requirements into technical agent architectures.</li>
<li>A commitment to Ethical AI and responsible development practices.</li>
<li>Experience with containerization and orchestration (e.g., Docker, Kubernetes).</li>
<li>Knowledge of DevOps practices for model deployment and automation.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange>$101,900-$161,200 CAD</Salaryrange>
      <Skills>Elasticsearch Relevance Engine (ESRE), Jina AI, advanced RAG patterns, LangGraph, LangChain, LangSmith, Microsoft Copilot Studio, Salesforce Agentforce, ServiceNow AI Agents, Multi-Agent Orchestration, Model Context Protocol (MCP), Python, TypeScript, Kubernetes, Docker, Terraform, LLM providers</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Elastic</Employername>
      <Employerlogo>https://logos.yubhub.co/elastic.co.png</Employerlogo>
      <Employerdescription>Elastic is a software company that provides a platform for search, security, and observability. It has a global presence with employees across 40+ countries.</Employerdescription>
      <Employerwebsite>https://www.elastic.co/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/elastic/jobs/7792839</Applyto>
      <Location>Canada</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>6274ee2d-545</externalid>
      <Title>Elastic AI Engineer</Title>
      <Description><![CDATA[<p>We are looking for an innovative Elastic AI Engineer to join our team to build autonomous, enterprise-grounded agents that don&#39;t just answer questions; they complete complex business tasks to accelerate productivity across the entire organization.</p>
<p>As the company behind the popular open-source projects (Elasticsearch, Kibana, Logstash, and Beats), we help people around the world do great things with their data. From stock quotes to Twitter streams, Apache logs to WordPress blogs, our products are extending what&#39;s possible with data, delivering on the promise that good things come from connecting the dots.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Agentic Strategy &amp; Design: Invent and implement sophisticated agentic workflows that use reasoning and tools to complete end-to-end business processes.</li>
<li>Enterprise Grounding: Apply Retrieval Augmented Generation (RAG) and the Elasticsearch Relevance Engine (ESRE) to ensure agents are deeply grounded in enterprise knowledge for high-accuracy task completion.</li>
<li>AI Model &amp; Tool Integration: Develop and fine-tune LLMs and integrate them with internal APIs and third-party SaaS tools to enable autonomous action.</li>
<li>Scalable Infrastructure: Firm understanding of cloud-based environments (AWS, Azure, GCP) in order to support the high-concurrency demands of enterprise agents.</li>
<li>Lifecycle Management: Oversee the training, deployment, and performance optimization of agents, ensuring they remain secure, reliable, and compliant.</li>
<li>Technical Leadership: Act as a domain expert on the Elastic Stack, making technical recommendations that push the boundaries of AI-driven productivity.</li>
<li>Documentation: Maintain comprehensive documentation of AI workflows, cloud infrastructure, and deployment processes.</li>
<li>Security: Implement standards for security and data privacy to protect sensitive information and ensure compliance with relevant regulations.</li>
</ul>
<p><strong>Requirements</strong></p>
<ul>
<li>3-5 years of work experience in a relevant field.</li>
<li>Minimum 1 year experience building with the Elastic Stack.</li>
<li>Knowledge of Elasticsearch Relevance Engine (ESRE), Jina AI, and advanced RAG patterns is critical.</li>
<li>Proven success in delivering independent GenAI projects, specifically those involving autonomous task completion or complex workflow automation.</li>
<li>Agentic Frameworks: Familiarity with LangGraph, LangChain, and LangSmith for building and debugging multi-agent systems.</li>
<li>Expertise in Enterprise Agentic &amp; Workflow Platforms: Deep familiarity with leading agentic AI and workflow automation platforms (such as Microsoft Copilot Studio, Salesforce Agentforce, ServiceNow AI Agents).</li>
<li>Market Trend Integration: Proven ability to apply emerging market trends, such as Multi-Agent Orchestration and Model Context Protocol (MCP), to build high-impact, cost-optimized solutions that scale across the enterprise.</li>
<li>Programming: Experience with Python or TypeScript for backend logic and agent orchestration.</li>
<li>Cloud &amp; Orchestration: Familiarity with Kubernetes (Operators/Controllers), Docker, and Terraform for automated deployment.</li>
<li>Model Expertise: Hands-on experience with LLM providers.</li>
</ul>
<p><strong>Bonus Points</strong></p>
<ul>
<li>Bachelor’s or Master’s degree in Computer Science or a related engineering field.</li>
<li>Strong communication skills with the ability to translate business requirements into technical agent architectures.</li>
<li>A commitment to Ethical AI and responsible development practices.</li>
<li>Experience with containerization and orchestration (e.g., Docker, Kubernetes).</li>
<li>Knowledge of DevOps practices for model deployment and automation.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange>$94,300-$149,200 USD</Salaryrange>
      <Skills>Elasticsearch Relevance Engine (ESRE), Jina AI, Advanced RAG patterns, Python, TypeScript, LangGraph, LangChain, LangSmith, Microsoft Copilot Studio, Salesforce Agentforce, ServiceNow AI Agents</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Elastic</Employername>
      <Employerlogo>https://logos.yubhub.co/elastic.co.png</Employerlogo>
      <Employerdescription>Elastic is a software company that provides a platform for search, security, and observability. The company has a global presence with employees across 40+ countries.</Employerdescription>
      <Employerwebsite>https://www.elastic.co/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/elastic/jobs/7607148</Applyto>
      <Location>United States</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>51758515-c12</externalid>
      <Title>Member of Technical Staff</Title>
      <Description><![CDATA[<p>We are seeking a highly skilled Member of Technical Staff to join our team in managing and enhancing reliability across a multi-data center environment.</p>
<p>This role focuses on automating processes, building and implementing robust observability solutions, and ensuring seamless operations for mission-critical AI infrastructure.</p>
<p>The ideal candidate will combine strong coding abilities with hands-on data center experience to build scalable reliability services, optimize system performance, and minimize downtime, including close partnership with facility operations to address physical infrastructure impacts.</p>
<p>In an era where AI workloads demand near-zero downtime, this position plays a pivotal role in bridging software engineering principles with physical data center realities.</p>
<p>By prioritizing automation and observability, team members in this role can reduce mean time to recovery (MTTR) by up to 50% through proactive monitoring and automated remediation, based on industry benchmarks from high-scale environments like those at hyperscale cloud providers.</p>
<p>Responsibilities:</p>
<ul>
<li>Design, develop, and deploy scalable code and services (primarily in Python and Rust, with flexibility for emerging languages) to automate reliability workflows, including monitoring, alerting, incident response, and infrastructure provisioning.</li>
</ul>
<ul>
<li>Implement and maintain observability tools and practices, such as metrics collection, logging, tracing, and dashboards, to provide real-time insights into system health across multiple data centers, with openness to innovative stacks beyond traditional ones like ELK.</li>
</ul>
<ul>
<li>Collaborate with cross-functional teams, including software development, network engineering, site operations, and facility operations (critical facilities, mechanical/electrical teams, and data center infrastructure management), to identify reliability bottlenecks and automate solutions for fault tolerance, disaster recovery, capacity planning, and physical/environmental risk mitigation (e.g., power redundancy, cooling efficiency, and environmental monitoring integration).</li>
</ul>
<ul>
<li>Troubleshoot and resolve complex issues in data center environments, including hardware failures, environmental anomalies, software bugs, and network-related problems, while adhering to reliability principles like error budgets and SLAs.</li>
</ul>
<ul>
<li>Optimize Linux-based systems for performance, security, and reliability, including kernel tuning, container orchestration (e.g., Kubernetes or emerging alternatives), and scripting for automation.</li>
</ul>
<ul>
<li>Understand network topologies and concepts in large-scale, multi-data center environments to effectively troubleshoot connectivity, routing, redundancy, and performance issues; integrate observability into data center interconnects and facility-level controls for rapid diagnosis and automation.</li>
</ul>
<ul>
<li>Participate in on-call rotations, post-incident reviews (blameless postmortems), and continuous improvement initiatives to enhance overall site reliability, including joint exercises with facility teams for physical failover and recovery scenarios.</li>
</ul>
<ul>
<li>Mentor junior team members and document processes to foster a culture of automation, knowledge sharing, and adaptability to new technologies.</li>
</ul>
<p>Basic Qualifications:</p>
<ul>
<li>Bachelor&#39;s degree in Computer Science, Computer Engineering, Electrical Engineering, or a closely related technical field (or equivalent professional experience).</li>
</ul>
<ul>
<li>5+ years of hands-on experience in site reliability engineering (SRE), infrastructure engineering, DevOps, or systems engineering, preferably supporting large-scale, distributed, or production environments.</li>
</ul>
<ul>
<li>Strong programming skills with proven production experience in Python (required for automation and tooling); experience with Rust or willingness to work in Rust is a plus, but strong coding fundamentals in at least one systems-level language (e.g., Python, Go, C++) are essential.</li>
</ul>
<ul>
<li>Solid experience with Linux systems administration, performance tuning, kernel-level understanding, and scripting/automation in production environments.</li>
</ul>
<ul>
<li>Practical knowledge of containerization and orchestration technologies, such as Docker and Kubernetes (or similar systems).</li>
</ul>
<ul>
<li>Experience implementing observability solutions, including metrics, logging, tracing, monitoring tools (e.g., Prometheus, Grafana, or alternatives), alerting, and dashboards.</li>
</ul>
<ul>
<li>Familiarity with troubleshooting complex issues in distributed systems, including software bugs, hardware failures, network problems, and environmental factors.</li>
</ul>
<ul>
<li>Understanding of networking fundamentals (TCP/IP, routing, redundancy, DNS) in large-scale or multi-site environments.</li>
</ul>
<ul>
<li>Experience participating in on-call rotations, incident response, post-incident reviews (blameless postmortems), and reliability practices such as error budgets or SLAs.</li>
</ul>
<ul>
<li>Ability to collaborate effectively with cross-functional teams (software engineers, network teams, site/facility operations, mechanical/electrical teams).</li>
</ul>
<p>Preferred Skills and Experience:</p>
<ul>
<li>7+ years of experience in SRE or infrastructure roles, ideally in hyperscale, cloud, or AI/ML training infrastructure environments with multi-data center setups.</li>
</ul>
<ul>
<li>Hands-on experience operating or scaling Kubernetes clusters (or equivalent orchestration) at large scale, including automation for provisioning, lifecycle management, and high-availability.</li>
</ul>
<ul>
<li>Proficiency in Rust for systems programming and performance-critical components.</li>
</ul>
<ul>
<li>Direct experience integrating software reliability tools with physical data center infrastructure.</li>
</ul>
<ul>
<li>Experience with observability tools and practices, such as metrics collection, logging, tracing, and dashboards.</li>
</ul>
<ul>
<li>Familiarity with containerization and orchestration technologies, such as Docker and Kubernetes (or similar systems).</li>
</ul>
<ul>
<li>Experience with Linux systems administration, performance tuning, kernel-level understanding, and scripting/automation in production environments.</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Python, Rust, Linux systems administration, performance tuning, kernel-level understanding, scripting/automation, containerization, orchestration, observability, metrics collection, logging, tracing, dashboards, networking fundamentals, TCP/IP, routing, redundancy, DNS, Kubernetes, Docker, Grafana, Prometheus, ELK, DevOps, SRE, infrastructure engineering, systems engineering</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>xAI</Employername>
      <Employerlogo>https://logos.yubhub.co/xai.com.png</Employerlogo>
      <Employerdescription>xAI creates AI systems to understand the universe and aid humanity in its pursuit of knowledge.</Employerdescription>
      <Employerwebsite>https://www.xai.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/xai/jobs/5044403007</Applyto>
      <Location>Memphis, TN</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>5dd5f58c-c07</externalid>
      <Title>Principal Engineer</Title>
      <Description><![CDATA[<p>We&#39;re looking for a well-versed Principal Engineer to play a key role in architecting and building highly available, reliable, and scalable payments applications. Collaborate with Payments Engineering teams to design, develop, and champion best practices, patterns, and standards for all payments applications. Work closely with our CTO and other architects to create holistic technology solutions for our customers.</p>
<p>As a Principal Engineer, you will:</p>
<ul>
<li>Collaborate and communicate with Payments Engineering teams to design, develop, and champion best practices, patterns, and standards for all payments applications.</li>
<li>Work closely with our CTO and other architects to create holistic technology solutions for our customers.</li>
<li>Be part of the Tech Leads group, driving measurable outcomes and iterative delivery strategy, removing roadblocks, empowering others, and mentoring high-potential engineers.</li>
<li>Produce clear, detailed, and actionable design documents, architecture blueprints, and architectural decisions documented with context, decision, and tradeoffs.</li>
<li>Be involved in hands-on development of proof-of-concepts, prototypes, and real production-ready code.</li>
<li>Mentor engineers on architecture best practices and standards.</li>
<li>Engage in all phases of the software lifecycle - design, implement, test, deploy, and support services in production.</li>
<li>Maintain a culture of code quality through rigorous testing, automation, and code reviews.</li>
<li>Be proactive and innovative - we rely on your feedback to build a world-class product.</li>
</ul>
<p>We&#39;re seeking individuals with an equal flair for creative problem-solving, enthusiasm for new technologies, and a desire to contribute to our product. You will likely be successful in this role if you identify with the following traits: detail-oriented, problem-solving, customer-oriented, versatile, resilient, and confident.</p>
<p>If all of this sounds interesting to you, we&#39;d love to hear from you.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Cloud SaaS environment, Highly available, reliable, and scalable SaaS applications/platforms, Backend API specs, mocks, and service implementations, Cloud-native architecture, microservices, CI/CD (GitHub Actions, Argo), GitOps, Authentication and Authorization, APIs and API Gateway, Docker, Kubernetes (EKS), Kafka (MSK), Java, Spring Framework, Python, and AWS services, Observability solutions using Grafana and OpenTelemetry, DevOps, SRE, Configuration Management, and Release Management, Payments technologies and ecosystem (card networks, PSP integration)</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>VGS</Employername>
      <Employerlogo>https://logos.yubhub.co/vgs.com.png</Employerlogo>
      <Employerdescription>VGS is the world&apos;s leader in payment tokenization, providing processor-agnostic tokenization solutions to large banks, fintechs, and merchants.</Employerdescription>
      <Employerwebsite>https://www.vgs.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.lever.co/verygoodsecurity/33e033b6-ae9b-4d51-b190-262a2cb83d96</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-17</Postedate>
    </job>
    <job>
      <externalid>a632e52b-c63</externalid>
      <Title>Site Reliability Engineer</Title>
      <Description><![CDATA[<p>About Mistral AI</p>
<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.</p>
<p>We are a dynamic team passionate about AI and its potential to transform society. Our diverse workforce thrives in competitive environments and is committed to driving innovation.</p>
<p>Role Summary</p>
<p>We are seeking highly experienced Site Reliability Engineers (SRE) to shape the reliability, scalability and performance of our platform and customer-facing applications. You will work closely with our software engineers and research teams to ensure our systems meet and exceed our internal and external customers&#39; expectations.</p>
<p>Responsibilities</p>
<p>As a Site Reliability Engineer, you balance the day-to-day operations on production systems with long-term software engineering improvements to reduce operational toil and foster the reliability, availability, and performance of these systems.</p>
<p>Operations</p>
<p>• Design, build, and maintain scalable, highly available and fault-tolerant infrastructures to support our web services and ML workloads</p>
<p>• Make sure our platform, inference and model training environments are always highly available and enable seamless replication of work environments across several HPC clusters</p>
<p>• Operate systems and troubleshoot issues in production environments (interrupts, on-call responses, user administration, data extraction, infrastructure scaling, etc.)</p>
<p>• Implement and improve monitoring, alerting, and incident response systems to ensure optimal system performance and minimize downtime</p>
<p>• Implement and maintain workflows and tools (CI/CD, containerization, orchestration, monitoring, logging and alerting systems) for both our client-facing APIs and large training runs</p>
<p>• Participate occasionally in on-call rotations to respond to incidents and perform root cause analysis to prevent future occurrences</p>
<p>Development</p>
<p>• Drive continuous improvement in infrastructure automation, deployment, and orchestration using tools like Kubernetes, Flux, Terraform</p>
<p>• Collaborate with AI/ML researchers to develop and implement solutions that enable safe and reproducible model-training experiments</p>
<p>• Build a cloud-agnostic platform offering an abstraction layer between science and infrastructure</p>
<p>• Design and develop new workflows and tooling to improve the reliability, availability and performance of our systems (automation scripts, refactoring, new API-based features, web apps, dashboards, etc.)</p>
<p>• Collaborate with the security team to ensure infrastructure adheres to best security practices and compliance requirements</p>
<p>• Document processes and procedures to ensure consistency and knowledge sharing across the team</p>
<p>• Contribute to open-source projects, research publications, blog articles and conferences</p>
<p>About You</p>
<p>• Master’s degree in Computer Science, Engineering or a related field</p>
<p>• 7+ years of experience in a DevOps/SRE role</p>
<p>• Strong experience with cloud computing and highly available distributed systems</p>
<p>• Exposure to site reliability issues in critical environments (issue root cause analysis, in-production troubleshooting, on-call rotations...)</p>
<p>• Experience working against reliability KPIs (observability, alerting, SLAs)</p>
<p>• Hands-on experience with CI/CD, containerization and orchestration tools (Docker, Kubernetes...)</p>
<p>• Knowledge of monitoring, logging, alerting and observability tools (Prometheus, Grafana, ELK Stack, Datadog...)</p>
<p>• Familiarity with infrastructure-as-code tools like Terraform or CloudFormation</p>
<p>• Proficiency in scripting languages (Python, Go, Bash...) and knowledge of software development best practices</p>
<p>• Strong understanding of networking, security, and system administration concepts</p>
<p>• Excellent problem-solving and communication skills</p>
<p>• Self-motivated and able to work well in a fast-paced startup environment</p>
<p>Your Application Will Be All The More Interesting If You Also Have:</p>
<p>• Experience in an AI/ML environment</p>
<p>• Experience with high-performance computing (HPC) systems and workload managers (Slurm)</p>
<p>• Experience with modern AI-oriented solutions (Fluidstack, Coreweave, Vast...)</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>cloud computing, highly available distributed systems, DevOps, SRE, Kubernetes, Flux, Terraform, CI/CD, containerization, orchestration, monitoring, logging, alerting, observability, infrastructure-as-code, scripting languages, software development best practices, networking, security, system administration, AI/ML environment, high-performance computing (HPC) systems, workload managers, modern AI-oriented solutions</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Mistral AI</Employername>
      <Employerlogo>https://logos.yubhub.co/mistral.ai.png</Employerlogo>
      <Employerdescription>Mistral AI is a company that develops and provides artificial intelligence (AI) technology to simplify tasks, save time, and enhance learning and creativity.</Employerdescription>
      <Employerwebsite>https://mistral.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.lever.co/mistral/6e16e4fa-a60b-4270-a815-06b0450fb597</Applyto>
      <Location>Paris</Location>
      <Country></Country>
      <Postedate>2026-04-17</Postedate>
    </job>
    <job>
      <externalid>419c1058-a0b</externalid>
      <Title>Site Reliability Engineer</Title>
      <Description><![CDATA[<p>About Mistral AI</p>
<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life. We democratize AI through high-performance, optimized, open-source and cutting-edge models, products and solutions. Our comprehensive AI platform is designed to meet enterprise needs, whether on-premises or in cloud environments.</p>
<p>Role Summary</p>
<p>We are seeking highly experienced Site Reliability Engineers (SRE) to shape the reliability, scalability and performance of our platform and customer-facing applications. You will work closely with our software engineers and research teams to ensure our systems meet and exceed our internal and external customers&#39; expectations.</p>
<p>Responsibilities</p>
<p>As a Site Reliability Engineer, you balance the day-to-day operations on production systems with long-term software engineering improvements to reduce operational toil and foster the reliability, availability, and performance of these systems.</p>
<p>Operations (50%)</p>
<ul>
<li>Design, build, and maintain scalable, highly available and fault-tolerant infrastructures to support our web services and ML workloads</li>
<li>Make sure our platform, inference and model training environments are always highly available and enable seamless replication of work environments across several HPC clusters</li>
<li>Operate systems and troubleshoot issues in production environments (interrupts, on-call responses, user administration, data extraction, infrastructure scaling, etc.)</li>
<li>Implement and improve monitoring, alerting, and incident response systems to ensure optimal system performance and minimize downtime</li>
<li>Implement and maintain workflows and tools (CI/CD, containerization, orchestration, monitoring, logging and alerting systems) for both our client-facing APIs and large training runs</li>
<li>Participate occasionally in on-call rotations to respond to incidents and perform root cause analysis to prevent future occurrences</li>
</ul>
<p>Development (50%)</p>
<ul>
<li>Drive continuous improvement in infrastructure automation, deployment, and orchestration using tools like Kubernetes, Flux, Terraform</li>
<li>Collaborate with AI/ML researchers to develop and implement solutions that enable safe and reproducible model-training experiments</li>
<li>Build a cloud-agnostic platform offering an abstraction layer between science and infrastructure</li>
<li>Design and develop new workflows and tooling to improve the reliability, availability and performance of our systems (automation scripts, refactoring, new API-based features, web apps, dashboards, etc.)</li>
<li>Collaborate with the security team to ensure infrastructure adheres to best security practices and compliance requirements</li>
<li>Document processes and procedures to ensure consistency and knowledge sharing across the team</li>
<li>Contribute to open-source projects, research publications, blog articles and conferences</li>
</ul>
<p>About You</p>
<ul>
<li>Master’s degree in Computer Science, Engineering or a related field</li>
<li>7+ years of experience in a DevOps/SRE role</li>
<li>Strong experience with cloud computing and highly available distributed systems</li>
<li>Exposure to site reliability issues in critical environments (issue root cause analysis, in-production troubleshooting, on-call rotations...)</li>
<li>Experience working against reliability KPIs (observability, alerting, SLAs)</li>
<li>Hands-on experience with CI/CD, containerization and orchestration tools (Docker, Kubernetes...)</li>
<li>Knowledge of monitoring, logging, alerting and observability tools (Prometheus, Grafana, ELK Stack, Datadog...)</li>
<li>Familiarity with infrastructure-as-code tools like Terraform or CloudFormation</li>
<li>Proficiency in scripting languages (Python, Go, Bash...) and knowledge of software development best practices</li>
<li>Strong understanding of networking, security, and system administration concepts</li>
<li>Excellent problem-solving and communication skills</li>
<li>Self-motivated and able to work well in a fast-paced startup environment</li>
</ul>
<p>Your Application Will Be All The More Interesting If You Also Have:</p>
<ul>
<li>Experience in an AI/ML environment</li>
<li>Experience with high-performance computing (HPC) systems and workload managers (Slurm)</li>
<li>Experience with modern AI-oriented solutions (Fluidstack, Coreweave, Vast...)</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>cloud computing, highly available distributed systems, DevOps, SRE, Kubernetes, Flux, Terraform, CI/CD, containerization, orchestration, monitoring, logging, alerting, observability, infrastructure-as-code, scripting languages, software development best practices, networking, security, system administration, AI/ML environment, high-performance computing, workload managers, modern AI-oriented solutions</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Mistral AI</Employername>
      <Employerlogo></Employerlogo>
      <Employerdescription>Mistral AI is a company that develops and provides artificial intelligence (AI) technology to simplify tasks, save time, and enhance learning and creativity. It has a diverse workforce with teams distributed across multiple countries.</Employerdescription>
      <Employerwebsite>https://mistral.ai/careers</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.lever.co/mistral/6e16e4fa-a60b-4270-a815-06b0450fb597</Applyto>
      <Location>Paris</Location>
      <Country></Country>
      <Postedate>2026-03-10</Postedate>
    </job>
    <job>
      <externalid>d2955c92-774</externalid>
      <Title>Network Security Engineering Enterprise Architect (GSR8)</Title>
      <Description><![CDATA[<p>As a Network Security Engineering Enterprise Architect (GSR8), you will be a technical lead supporting Ford&#39;s complete Enterprise Network &amp; Security architecture transformation. You will guide the Network Security Engineering Products team (Security Firewalls, Proxy, ISE, SDN Networks, Wireless) toward becoming a centre of technical excellence and customer advocacy.</p>
<p>You will identify, analyse, and resolve existing network security design weaknesses and vulnerabilities that could pose a risk to existing infrastructure. You are an expert in closing zero-day security vulnerabilities that could damage Ford&#39;s reputation across the globe, working together with all infrastructure domain teams.</p>
<p>As a Network Security Engineering Enterprise Architect, you will lead future network security product development by contributing to the network design (architecture) and automation used across multiple engineering branches, data centres, manufacturing plants, and remote users.</p>
<p>This role requires defining the roadmap for ZTNA/SASE deployment using Prisma Access/Cloud, setting up the support model, and building automation to accelerate the end-user experience. The Global Network Security Engineering Enterprise Architect is responsible for the successful setup of the products, working closely with software developers from Ford and OEMs in consultation with Ford&#39;s Network and Security Operations Team.</p>
<p>This position will be part of Ford&#39;s Enterprise Tech department and will report to the Regional Network Delivery Manager, based in the same or another region. The lead needs to ensure &#39;Always On&#39; (24 x 7) availability of Ford Global Network Product offerings, working with Network &amp; Security peers from other regions.</p>
<p><strong>Responsibilities</strong></p>
<p>This role will also work toward full observability and monitoring, process response, and technical capability to ensure customer uptime of 99.999%+. This position requires a wide range of skills and experience:</p>
<ul>
<li>This role involves collaborating closely with the network operations team to identify continuous improvement opportunities and working with the network engineering team and OEMs to devise and implement solutions. The implementation will be driven through automation in partnership with Ford&#39;s developers.</li>
<li>Design and implement robust security architectures and frameworks to protect against threats and vulnerabilities.</li>
<li>Ensure timely proactive identification and reporting of security gaps and vulnerabilities to the critical business information, systems and network infrastructure.</li>
<li>Plan end-to-end implementation of Network &amp; Security projects.</li>
</ul>
<p><strong>Qualifications</strong></p>
<ul>
<li>Support major technical Incident Management calls and Change Controls through strong technical network knowledge, operational capability, and communication skills.</li>
<li>Perform configuration updates, such as modifying configurations, signature definitions or implementing new policies on various network security tools, as directed.</li>
<li>Demonstrate technical excellence through technical knowledge.</li>
<li>Collaborate with global leaders to support 24/7 network availability on a worldwide scale.</li>
<li>Advocate for and ensure that high-quality Follow the Sun (FTS) support is delivered to receiving teams, and that on-call schedules and shifts are covered.</li>
<li>Support continuous improvement in service management for Network Services, leveraging enterprise tools and processes (Incident, Problem &amp; Change) and focusing on customer value optimization.</li>
<li>Support the implementation of best practices and processes for Network &amp; Security Operations services to maintain availability, reliability, scalability, and security.</li>
<li>Support effective SRE monitoring and FSO (Full Stack Observability) of system performance and overall health, troubleshoot issues, and implement corrective actions.</li>
<li>Collaborate with the Network LAN/WAN &amp; security Engineering/development teams to optimize infrastructure for application performance and scalability.</li>
<li>Support team members in achieving technical network excellence through experience and network certifications, and support training requirements.</li>
<li>Support the team in developing continued improvements leading to an &#39;always on&#39; network capability.</li>
<li>Leverage other network management tools used by the NOC to identify and respond to security connectivity incidents and faults.</li>
<li>Develop security policies, standards, and procedures.</li>
<li>Assist with security compliance audits to verify completeness of required configurations and verify system hardening.</li>
<li>Participate in problem investigation for connectivity incidents related to security devices, and provide recommendations to improve reliability and availability or reduce recovery time.</li>
<li>Support assurance of up-to-date SW releases, targeted LDOS, and PSIRTS (security updates).</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Network Security Engineering, Enterprise Architecture, Security Firewalls, Proxy, ISE, SDN Networks, Wireless, Prisma Access/Cloud, ZTNA/SASE, Automation, Network Design, Network Security, Security Operations, Incident Management, Change Controls, Technical Knowledge, Global Leadership, Follow the Sun, SRE Monitoring, FSO, Full Stack Observability, System Performance, Network Certifications, Training Requirements</Skills>
      <Category>Engineering</Category>
      <Industry>Automotive</Industry>
      <Employername>Ford</Employername>
      <Employerlogo></Employerlogo>
      <Employerdescription>Ford is a multinational automaker that designs, manufactures, and markets vehicles and automotive-related products. It is one of the largest automakers in the world.</Employerdescription>
      <Employerwebsite>https://efds.fa.em5.oraclecloud.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://efds.fa.em5.oraclecloud.com/hcmUI/CandidateExperience/en/sites/CX_1/job/56878</Applyto>
      <Location>Chennai, Tamil Nadu, India</Location>
      <Country></Country>
      <Postedate>2026-03-09</Postedate>
    </job>
    <job>
      <externalid>c0069e5d-01b</externalid>
      <Title>Forward Deployment Engineer (Developer Success)</Title>
      <Description><![CDATA[<p>You&#39;ll work directly with customers to deploy Firecrawl in real-world environments, unblock integrations, and turn customer needs into repeatable solutions and product improvements. This is a highly hands-on, customer-facing engineering role, ideal for someone who likes solving complex problems live and shipping pragmatic solutions fast.</p>
<ul>
<li><strong>Salary Range:</strong> $150,000–$250,000/year (Range shown is for U.S.-based employees. Compensation outside the U.S. is adjusted fairly based on your country&#39;s cost of living.)</li>
<li><strong>Equity Range:</strong> Up to 0.10%</li>
<li><strong>Location:</strong> San Francisco, CA (Hybrid) OR Remote</li>
<li><strong>Job Type:</strong> Full-Time (SF) OR Contract (Remote)</li>
<li><strong>Experience:</strong> 3+ years or equivalent shipped systems</li>
<li><strong>Visa:</strong> US Citizenship/Visa required for SF; N/A for Remote</li>
</ul>
<ul>
<li>You&#39;ll work directly with customers to deploy, customize, and troubleshoot Firecrawl in production environments.</li>
<li>You&#39;ll own technical delivery for priority accounts, from first integration through ongoing optimization.</li>
<li>You&#39;ll debug complex real-world issues involving APIs, crawling, data pipelines, and infra constraints.</li>
<li>You&#39;ll build reusable solutions, templates, and playbooks based on customer needs.</li>
<li>You&#39;ll translate customer feedback into clear product and engineering insights.</li>
<li>You&#39;ll collaborate closely with core engineers to improve reliability, performance, and usability.</li>
<li>You&#39;ll help define best practices for how Firecrawl is implemented at scale.</li>
</ul>
<p><strong>A strong engineer who likes being close to customers.</strong> You have solid fundamentals (APIs, systems, debugging) and you&#39;re comfortable explaining technical concepts clearly to people who aren&#39;t engineers. You&#39;d rather hop on a call and unblock someone than file a ticket and wait.</p>
<p><strong>Calm and effective in ambiguity.</strong> You don&#39;t need a runbook for every situation. You diagnose fast, communicate clearly, and make good decisions with incomplete information.</p>
<p><strong>Biased toward action.</strong> You unblock first, optimize later. You ship pragmatic solutions over perfect abstractions and know when &quot;good enough now&quot; beats &quot;ideal next quarter.&quot;</p>
<p><strong>Comfortable in a small, high-trust team.</strong> You don&#39;t need layers of process. You work directly with founders and core engineers, own your domain, and move fast.</p>
<p><strong>Backgrounds that often do well:</strong> Solutions engineers, SREs, devrel engineers, or customer-facing infra roles. Engineers at startups who wore multiple hats. Ex-founders who&#39;ve debugged customer problems at 2am because the customer mattered.</p>
<p><strong>Benefits &amp; Perks</strong></p>
<ul>
<li><strong>Salary that makes sense:</strong> $170,000–215,000/year (SF, U.S.-based), based on impact, not tenure</li>
<li><strong>Own a piece:</strong> Up to 0.20% equity in what you&#39;re helping build</li>
<li><strong>Generous PTO:</strong> 15 days mandatory, anything after 24 days, just ask (holidays excluded); take the time you need to recharge</li>
<li><strong>Parental leave:</strong> 12 weeks fully paid, for moms and dads</li>
<li><strong>Wellness stipend:</strong> $100/month for the gym, therapy, massages, or whatever keeps you human</li>
<li><strong>Learning &amp; Development:</strong> Expense up to $1000/year toward anything that helps you grow professionally</li>
<li><strong>Team offsites:</strong> A change of scenery, minus the trust falls</li>
<li><strong>Sabbatical:</strong> 3 paid months off after 4 years, do something fun and new</li>
</ul>
<p><strong>Interview Process</strong></p>
<ol>
<li><strong>Application Review:</strong> Send us your stuff + a quick note on why this excites you (plus links to things you&#39;ve built or deployed).</li>
<li><strong>Technical + Customer Scenario Interview (~45 min):</strong> Real-world problem solving: we&#39;ll walk through a customer deployment scenario and see how you debug, communicate, and prioritize live. We&#39;re looking for engineering depth and customer instincts, not trivia.</li>
<li><strong>Founder Chat (~30 min):</strong> Culture, pace, ownership, and how you like to work. Time for your questions too.</li>
<li><strong>Paid Work Trial (1–2 weeks):</strong> Test drive the real thing: work on a real customer deployment or integration with measurable impact.</li>
<li><strong>Decision:</strong> We move fast after the trial.</li>
</ol>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange>$150K - $250K</Salaryrange>
      <Skills>APIs, systems, debugging, customer-facing engineering, solutions engineering, SREs, devrel engineers, customer-facing infra roles, engineering depth, customer instincts, problem-solving, communication, prioritization</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Firecrawl</Employername>
      <Employerlogo>https://logos.yubhub.co/firecrawl.com.png</Employerlogo>
      <Employerdescription>Firecrawl is a small, fast-moving, technical team building the essential infrastructure that superintelligence will use to gather data on the web.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/firecrawl/bda40f47-a69b-44d4-ac1a-3f86f20d802d</Applyto>
      <Location>San Francisco, CA (Hybrid) OR Remote-Global</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>84b511f8-598</externalid>
      <Title>Field Engineer</Title>
      <Description><![CDATA[<p><strong>Compensation</strong></p>
<ul>
<li>Compensation is determined based on career level, with the OTE for this role being between $150K and $250K • Offers Equity</li>
</ul>
<p>Replit is the agentic software creation platform that enables anyone to build applications using natural language. With millions of users worldwide, Replit is democratizing software development by removing traditional barriers to application creation.</p>
<p><strong>About the Role</strong></p>
<p>As an <strong>Enterprise/Strategic Field Engineer (L5)</strong>, you&#39;ll be the technical cornerstone for Replit&#39;s largest and most strategic accounts. This is a hybrid role: <strong>high-impact pre-sales</strong> (closing complex technical evaluations) and <strong>post-sales</strong> (driving adoption, expansion, and retention). You&#39;ll own the end-to-end technical relationship, from pre-sales architecture discussions through multi-year expansion, ensuring our enterprise customers don&#39;t just use Replit, but become Replit-powered companies.</p>
<p>You&#39;ll partner with Account Executives and Account Managers in a high-accountability <strong>Pod structure</strong>. This is not a reactive support role; it calls for a proactive, strategic technical leader who identifies blockers before they become problems, champions new use cases, and directly influences $5M+ in annual recurring revenue.</p>
<p><strong>In this role you will:</strong></p>
<p><strong>Pre-Sales</strong></p>
<ul>
<li><strong>Strategic Technical Discovery:</strong> When you are pulled into complex deals, you join as the expert closer. You run deep discovery on their stack and constraints, then design the winning technical strategy.</li>
<li><strong>Proof of Value (POV) &amp; Live Building:</strong> You build live, functional applications on the fly during executive meetings to prove immediate value and technical feasibility to VPs and C-suite stakeholders.</li>
<li><strong>Context &amp; Connectivity (MCP):</strong> You write and deploy <strong>Model Context Protocol (MCP)</strong> servers to securely connect Replit Agents to customer-specific data, making Replit the central hub for their internal development.</li>
<li><strong>Enterprise Governance:</strong> You own the &quot;Guardrails&quot; mission. You configure workspace policies and AI governance templates that solve for data safety, compliance, and CISO approval.</li>
<li><strong>Infrastructure Strategy:</strong> You lead deep-dive reviews for Single-Tenant/VPC deployments, ensuring Replit fits the customer’s security posture.</li>
<li><strong>Hackathons &amp; Trials:</strong> You lead high-energy Hackathons alongside Account Executives to drive hands-on experience and excitement about Replit.</li>
</ul>
<p><strong>Post-Sales</strong></p>
<ul>
<li><strong>Production Foundation:</strong> You lead the technical kickoff, ensuring production-ready <strong>SSO/SCIM</strong> provisioning, guardrails, and security setup.</li>
<li><strong>Technical Onboarding &amp; Enablement:</strong> You ensure customers learn how to build in Replit to its maximum capabilities. You enable technical and non-technical teams by running training sessions and workshops that establish enterprise workflows with Replit.</li>
<li><strong>Design Systems:</strong> You build Design Systems and starter templates tailored to the customer’s stack to accelerate their internal development time.</li>
<li><strong>Drive Viral Growth:</strong> You build trust with key technical stakeholders, proactively running enablement sessions to keep teams building. You act as the spark for viral growth within the account.</li>
<li><strong>Source &amp; Qualify Expansion:</strong> Work with Account Managers to proactively identify new teams, new use cases, and new projects. You provide the technical validation to support the commercial close.</li>
<li><strong>Run Value-Based QBRs:</strong> Co-lead quarterly business reviews, shifting the conversation to business value delivered and strategic roadmap alignment.</li>
</ul>
<p><strong>Required skills and experience:</strong></p>
<ul>
<li><strong>5-7+ years in technical customer-facing roles</strong> such as Solutions Engineer, Sales Engineer, Implementation Engineer, Forward Deployed Engineer, Technical Account Manager, or Customer Success Engineer at a high-growth B2B SaaS or dev tools company.</li>
<li><strong>Replit Power User:</strong> You understand Replit better than 99% of our users. You have likely built Replit apps before. To facilitate this, we provide interviewees with credits/access to the platform.</li>
<li><strong>Enterprise Depth:</strong> You have worked with enterprise prospects to drive adoption, expansion, and renewal. You can explain complex technical concepts to non-technical executives and translate business requirements into technical architecture.</li>
<li><strong>Live Builder:</strong> You&#39;ve run POCs, onboarding sessions, and workshops. You can listen to vague requirements and translate them on the fly into technical concepts, creating live apps in real time.</li>
<li><strong>Production Engineering:</strong> You can read and write code (JavaScript, Python, or similar). You understand APIs, databases, CI/CD pipelines, and modern cloud architecture.</li>
<li><strong>Pod Mentality:</strong> You thrive in a high-accountability Pod structure.</li>
<li><strong>Military Experience:</strong> Relevant military experience with technology is counted as background and experience.</li>
<li>Comfort with up to 25% travel (expect 30%+).</li>
</ul>
<p><strong>Nice to have:</strong></p>
<ul>
<li>Experience with AI-powered dev tools (Cursor, Windsurf, Lovable, Claude Code, Zapier, etc.)</li>
<li>Understanding of AI Evaluation patterns (Evals) and Context Management (RAG, System Prompts).</li>
<li>Background in DevOps, cloud infrastructure (AWS/GCP/Azure), or SRE.</li>
</ul>
<p><strong>Tools + Tech Stack for this role:</strong></p>
<ul>
<li>Replit</li>
<li>HubSpot CRM</li>
<li>SSO/SAML/SCIM identity systems</li>
<li>Cloud platforms (AWS, GCP, Azure)</li>
<li>Slack</li>
<li>Claude</li>
<li>ChatGPT</li>
<li>Gemini</li>
<li>Notion</li>
<li>Superhuman</li>
<li>ZoomInfo</li>
<li>Hex</li>
</ul>
<p><strong>This role may</strong> <em><strong>not</strong></em> <strong>be a fit if:</strong></p>
<ul>
<li><strong>You haven&#39;t had your &quot;Replit Moment&quot;.</strong> You didn&#39;t explore the product on your own, get mind-blown by the speed of creation, and immediately start showing your apps to friends and family.</li>
<li>You are uncomfortable building functional apps on the fly in front of executives or need a script to demonstrate value.</li>
<li>You prefer reactive troubleshooting over proactively identifying architectural blockers and owning the technical strategy.</li>
<li>You struggle to debug code, configure SSO, or understand cloud infrastructure without significant hand-holding.</li>
<li>You aren&#39;t obsessed with how Agents are rewriting the SDLC, and you don&#39;t use AI tools to build in your own spare time.</li>
<li>You struggle in ambiguous, high-growth environments where you often have to build the tool or process required to solve the problem.</li>
<li>You&#39;re not comfortable with significant travel (up to 30%+) for customer meetings and on-site engagements.</li>
<li>You prefer transactional interactions over building deep, multi-year customer relationships.</li>
</ul>
<p><em>This is a full-time role based in San Francisco Bay Area, New York City, or Remote (US-based). Travel up to 25% required (expect 30%+). 2-week onsite onboarding in Foster City, CA required.</em></p>
<p><strong>Full-Time Employee Benefits Include:</strong></p>
<ul>
<li>💰 Competitive Salary &amp; Equity</li>
<li>💹 401(k) Program with a 4% match</li>
<li>⚕️ Health, Dental, Vision and Life Insurance</li>
<li>🩼 Short Term and Long Term Disability</li>
<li>🚼 Paid Parental, Medical, Caregiver Leave</li>
<li>🚗 Commuter Benefits</li>
<li>📱 Monthly Wellness Stipend</li>
<li>🧑‍💻 Autonomous Work Environment</li>
<li>🖥 In Office Set-Up Reimbursement</li>
<li>🏝 Flexible Time Off (FTO) + Holidays</li>
<li>🚀 Quarterly Team Gatherings</li>
<li>☕ In Office Amenities</li>
</ul>
<p><strong>Want to learn more about what we are up to?</strong></p>
<ul>
<li><a href="https://www.youtube.com/watch?v=IYiVPrxY8-Y">Meet the Replit Agent</a></li>
<li><a href="https://www.youtube.com/watch?v=4zd9hzngFwY">Replit: Make an app for that</a></li>
<li><a href="https://blog.replit.com/">Replit Blog</a></li>
<li><a href="https://youtu.be/kCudFI4tcpg?si=l4ViCejV_f2RZkDi">Amjad TED Talk</a></li>
</ul>
<p><strong>Interviewing + Culture at Replit</strong></p>
<ul>
<li><a href="https://blog.replit.com/operating-principles">Operating Principles</a></li>
<li><a href="https://blog.replit.com/reasons-not-to-join-replit">Reasons not to work at Replit</a></li>
</ul>
<p>To achieve our mission of making programming more accessible around the world, we need our team to be representative of the world. We welcome your unique perspective and experiences in shaping this product. We encourage people from all kinds of backgrounds and experiences to apply.</p>
]]></Description>
      <Jobtype>Full time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>Hybrid</Workarrangement>
      <Salaryrange>Competitive Salary &amp; Equity</Salaryrange>
      <Skills>5-7+ years in technical customer-facing roles, Replit Power User, Enterprise Depth, Live Builder, Production Engineering, Pod Mentality, Military Experience, Experience with AI-powered dev tools, Understanding of AI Evaluation patterns, Background in DevOps, cloud infrastructure (AWS/GCP/Azure), or SRE</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Replit</Employername>
      <Employerlogo>https://logos.yubhub.co/replit.com.png</Employerlogo>
      <Employerdescription>Replit is a software creation platform that enables anyone to build applications using natural language. With millions of users worldwide, Replit is a leading provider of dev tools.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/replit/df87458e-b5e9-4dbc-85a4-54c508d040b7</Applyto>
      <Location>NYC (SoHo) Hybrid</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>055260e3-5e7</externalid>
      <Title>Field Engineer</Title>
      <Description><![CDATA[<p><strong>Compensation</strong></p>
<ul>
<li>Compensation is determined based on career level, with the OTE for this role being between $150K and $250K • Offers Equity</li>
</ul>
<p>Replit is the agentic software creation platform that enables anyone to build applications using natural language. With millions of users worldwide, Replit is democratizing software development by removing traditional barriers to application creation.</p>
<p><strong>About the Role</strong></p>
<p>As an <strong>Enterprise/Strategic Field Engineer (L5)</strong>, you&#39;ll be the technical cornerstone for Replit&#39;s largest and most strategic accounts. This is a hybrid role: <strong>high-impact pre-sales</strong> (closing complex technical evaluations) and <strong>post-sales</strong> (driving adoption, expansion, and retention). You&#39;ll own the end-to-end technical relationship, from pre-sales architecture discussions through multi-year expansion, ensuring our enterprise customers don&#39;t just use Replit, but become Replit-powered companies.</p>
<p>You&#39;ll partner with Account Executives and Account Managers in a high-accountability <strong>Pod structure</strong>. This is not a reactive support role; it calls for a proactive, strategic technical leader who identifies blockers before they become problems, champions new use cases, and directly influences $5M+ in annual recurring revenue.</p>
<p><strong>In this role you will:</strong></p>
<p><strong>Pre-Sales</strong></p>
<ul>
<li><strong>Strategic Technical Discovery:</strong> When you are pulled into complex deals, you join as the expert closer. You run deep discovery on their stack and constraints, then design the winning technical strategy.</li>
<li><strong>Proof of Value (POV) &amp; Live Building:</strong> You build live, functional applications on the fly during executive meetings to prove immediate value and technical feasibility to VPs and C-suite stakeholders.</li>
<li><strong>Context &amp; Connectivity (MCP):</strong> You write and deploy <strong>Model Context Protocol (MCP)</strong> servers to securely connect Replit Agents to customer-specific data, making Replit the central hub for their internal development.</li>
<li><strong>Enterprise Governance:</strong> You own the &quot;Guardrails&quot; mission. You configure workspace policies and AI governance templates that solve for data safety, compliance, and CISO approval.</li>
<li><strong>Infrastructure Strategy:</strong> You lead deep-dive reviews for Single-Tenant/VPC deployments, ensuring Replit fits the customer’s security posture.</li>
<li><strong>Hackathons &amp; Trials:</strong> You lead high-energy Hackathons alongside Account Executives to drive hands-on experience and excitement about Replit.</li>
</ul>
<p><strong>Post-Sales</strong></p>
<ul>
<li><strong>Production Foundation:</strong> You lead the technical kickoff, ensuring production-ready <strong>SSO/SCIM</strong> provisioning, guardrails, and security setup.</li>
<li><strong>Technical Onboarding &amp; Enablement:</strong> You ensure customers learn how to build in Replit to its maximum capabilities. You enable technical and non-technical teams by running training sessions and workshops that establish enterprise workflows with Replit.</li>
<li><strong>Design Systems:</strong> You build Design Systems and starter templates tailored to the customer’s stack to accelerate their internal development time.</li>
<li><strong>Drive Viral Growth:</strong> You build trust with key technical stakeholders, proactively running enablement sessions to keep teams building. You act as the spark for viral growth within the account.</li>
<li><strong>Source &amp; Qualify Expansion:</strong> Work with Account Managers to proactively identify new teams, new use cases, and new projects. You provide the technical validation to support the commercial close.</li>
<li><strong>Run Value-Based QBRs:</strong> Co-lead quarterly business reviews, shifting the conversation to business value delivered and strategic roadmap alignment.</li>
</ul>
<p><strong>Required skills and experience:</strong></p>
<ul>
<li><strong>5-7+ years in technical customer-facing roles</strong> such as Solutions Engineer, Sales Engineer, Implementation Engineer, Forward Deployed Engineer, Technical Account Manager, or Customer Success Engineer at a high-growth B2B SaaS or dev tools company.</li>
<li><strong>Replit Power User:</strong> You understand Replit better than 99% of our users. You have likely built Replit apps before. To facilitate this, we provide interviewees with credits/access to the platform.</li>
<li><strong>Enterprise Depth:</strong> You have worked with enterprise prospects to drive adoption, expansion, and renewal. You can explain complex technical concepts to non-technical executives and translate business requirements into technical architecture.</li>
<li><strong>Live Builder:</strong> You&#39;ve run POCs, onboarding sessions, and workshops. You can listen to vague requirements and translate them on the fly into technical concepts, creating live apps in real time.</li>
<li><strong>Production Engineering:</strong> You can read and write code (JavaScript, Python, or similar). You understand APIs, databases, CI/CD pipelines, and modern cloud architecture.</li>
<li><strong>Pod Mentality:</strong> You thrive in a high-accountability Pod structure.</li>
<li><strong>Military Experience:</strong> Relevant military experience with technology is counted as background and experience.</li>
<li>Comfort with up to 25% travel (expect 30%+).</li>
</ul>
<p><strong>Nice to have:</strong></p>
<ul>
<li>Experience with AI-powered dev tools (Cursor, Windsurf, Lovable, Claude Code, Zapier, etc.)</li>
<li>Understanding of AI Evaluation patterns (Evals) and Context Management (RAG, System Prompts).</li>
<li>Background in DevOps, cloud infrastructure (AWS/GCP/Azure), or SRE.</li>
</ul>
<p><strong>Tools + Tech Stack for this role:</strong></p>
<ul>
<li>Replit</li>
<li>HubSpot CRM</li>
<li>SSO/SAML/SCIM identity systems</li>
<li>Cloud platforms (AWS, GCP, Azure)</li>
<li>Slack</li>
<li>Claude</li>
<li>ChatGPT</li>
<li>Gemini</li>
<li>Notion</li>
<li>Superhuman</li>
<li>ZoomInfo</li>
<li>Hex</li>
</ul>
<p><strong>This role may</strong> <em><strong>not</strong></em> <strong>be a fit if:</strong></p>
<ul>
<li><strong>You haven&#39;t had your &quot;Replit Moment&quot;.</strong> You didn&#39;t explore the product on your own, get mind-blown by the speed of creation, and immediately start showing your apps to friends and family.</li>
<li>You are uncomfortable building functional apps on the fly in front of executives or need a script to demonstrate value.</li>
<li>You prefer reactive troubleshooting over proactively identifying architectural blockers and owning the technical strategy.</li>
<li>You struggle to debug code, configure SSO, or understand cloud infrastructure without significant hand-holding.</li>
<li>You aren&#39;t obsessed with how Agents are rewriting the SDLC, and you don&#39;t use AI tools to build in your own spare time.</li>
<li>You struggle in ambiguous, high-growth environments where you often have to build the tool or process required to solve the problem.</li>
<li>You&#39;re not comfortable with significant travel (up to 30%+) for customer meetings and on-site engagements.</li>
<li>You prefer transactional interactions over building deep, multi-year customer relationships.</li>
</ul>
<p><em>This is a full-time role based in San Francisco Bay Area, New York City, or Remote (US-based). Travel up to 25% required (expect 30%+). 2-week onsite onboarding in Foster City, CA required.</em></p>
<p><strong>Full-Time Employee Benefits Include:</strong></p>
<ul>
<li>💰 Competitive Salary &amp; Equity</li>
<li>💹 401(k) Program with a 4% match</li>
<li>⚕️ Health, Dental, Vision and Life Insurance</li>
<li>🩼 Short Term and Long Term Disability</li>
<li>🚼 Paid Parental, Medical, Caregiver Leave</li>
<li>🚗 Commuter Benefits</li>
<li>📱 Monthly Wellness Stipend</li>
<li>🧑‍💻 Autonomous Work Environment</li>
<li>🖥 In Office Set-Up Reimbursement</li>
<li>🏝 Flexible Time Off (FTO) + Holidays</li>
<li>🚀 Quarterly Team Gatherings</li>
<li>☕ In Office Amenities</li>
</ul>
<p><strong>Want to learn more about what we are up to?</strong></p>
<ul>
<li><a href="https://www.youtube.com/watch?v=IYiVPrxY8-Y">Meet the Replit Agent</a></li>
<li><a href="https://www.youtube.com/watch?v=4zd9hzngFwY">Replit: Make an app for that</a></li>
<li><a href="https://blog.replit.com/">Replit Blog</a></li>
<li><a href="https://youtu.be/kCudFI4tcpg?si=l4ViCejV_f2RZkDi">Amjad TED Talk</a></li>
</ul>
<p><strong>Interviewing + Culture at Replit</strong></p>
<ul>
<li><a href="https://blog.replit.com/operating-principles">Operating Principles</a></li>
<li><a href="https://blog.replit.com/reasons-not-to-join-replit">Reasons not to work at Replit</a></li>
</ul>
<p>To achieve our mission of making programming more accessible around the world, we need our team to be representative of the world. We welcome your unique perspective and experiences in shaping this product. We encourage people from all kinds of backgrounds and experiences to apply.</p>
]]></Description>
      <Jobtype>Full time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>Hybrid</Workarrangement>
      <Salaryrange>Competitive Salary &amp; Equity</Salaryrange>
      <Skills>5-7+ years in technical customer-facing roles, Replit Power User, Enterprise Depth, Live Builder, Production Engineering, Pod Mentality, Military Experience, Experience with AI-powered dev tools, Understanding of AI Evaluation patterns, Background in DevOps, cloud infrastructure, or SRE</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Replit</Employername>
      <Employerlogo>https://logos.yubhub.co/replit.com.png</Employerlogo>
      <Employerdescription>Replit is an agentic software creation platform that enables anyone to build applications using natural language. With millions of users worldwide, Replit is a high-growth B2B SaaS company.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/replit/2c1463ab-05a4-482a-a605-013403a41e80</Applyto>
      <Location>Foster City, CA (Hybrid) In office M,W,F</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>73ff6f07-c0e</externalid>
      <Title>Staff Software Engineer, AI Reliability Engineering</Title>
      <Description><![CDATA[<p><strong>About Anthropic</strong></p>
<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>
<p><strong>About the Role</strong></p>
<p>Claude has your back. AIRE has Claude&#39;s. Help us keep Claude reliable for everyone who depends on it.</p>
<p>AIRE (AI Reliability Engineering) partners with teams across Anthropic to improve reliability across our most critical serving paths -- every hop from the SDK through our network, API layers, serving infrastructure, and accelerators and back. We jump into the trenches alongside partner teams to make the systems that deliver Claude more robust and resilient, be it during an incident or collaborating on projects.</p>
<p>Reliability here is an emergent phenomenon that transcends any single team&#39;s boundaries, so someone has to zoom out and look at the whole picture. That&#39;s us -- and it means few teams at Anthropic offer this kind of dynamic, cross-cutting exposure to the systems that matter most.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Develop appropriate Service Level Objectives for large language model serving systems, balancing availability and latency with development velocity.</li>
<li>Design and implement monitoring and observability systems across the token path.</li>
<li>Assist in the design and implementation of high-availability serving infrastructure across multiple regions and cloud providers.</li>
<li>Lead incident response for critical AI services, ensuring rapid recovery, thorough incident reviews, and systematic improvements.</li>
<li>Support the reliability of safeguard model serving -- critical for both site reliability and Anthropic&#39;s safety commitments.</li>
</ul>
<p><strong>You may be a good fit if you</strong></p>
<ul>
<li>Have strong distributed systems, infrastructure, or reliability backgrounds -- we&#39;re looking for reliability-minded software engineers and SREs.</li>
<li>Are curious and brave -- comfortable jumping into unfamiliar systems during an incident and helping drive resolution even when you don&#39;t have deep expertise yet.</li>
<li>Think holistically about how systems compose and where the seams are.</li>
<li>Can build lasting relationships across teams -- our engagement model depends on being welcomed as teammates, not outsiders with opinions.</li>
<li>Care about users and feel ownership over outcomes, even for systems you don&#39;t own.</li>
<li>Have excellent communication and collaboration skills -- you&#39;ll be partnering across the entire company.</li>
<li>Bring diverse experience -- the team&#39;s strength comes from people who&#39;ve built product stacks, scaled databases, run massive distributed systems, and everything in between.</li>
</ul>
<p><strong>Strong candidates may also</strong></p>
<ul>
<li>Have been an SRE, Production Engineer, or in similar reliability-focused roles on large-scale systems.</li>
<li>Have experience operating large-scale model serving or training infrastructure (&gt;1000 GPUs).</li>
<li>Have experience with one or more ML hardware accelerators (GPUs, TPUs, Trainium).</li>
<li>Understand ML-specific networking optimizations like RDMA and InfiniBand.</li>
<li>Have expertise in AI-specific observability tools and frameworks.</li>
<li>Have experience with chaos engineering and systematic resilience testing.</li>
<li>Have contributed to open-source infrastructure or ML tooling.</li>
</ul>
<p><strong>Logistics</strong></p>
<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>
<p><strong>Visa sponsorship</strong></p>
<p>We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>
<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong></p>
<p>Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>
<p><strong>Your safety matters to us.</strong></p>
<p>To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</p>
<p><strong>How we&#39;re different</strong></p>
<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and engineering as it does with computer science.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>£325,000 - £390,000 GBP</Salaryrange>
      <Skills>distributed systems, infrastructure, reliability, software engineering, SRE, large scale systems, model serving, training infrastructure, ML hardware accelerators, RDMA, InfiniBand, AI-specific observability tools, chaos engineering, resilience testing, open-source infrastructure, ML tooling, Production Engineer, reliability-focused roles</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a company that creates reliable, interpretable, and steerable AI systems. It has a quickly growing team of researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</Employerdescription>
      <Employerwebsite>https://job-boards.greenhouse.io</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5101173008</Applyto>
      <Location>London, UK</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>c930b80e-7a6</externalid>
      <Title>Staff / Senior Software Engineer, AI Reliability</Title>
      <Description><![CDATA[<p><strong>About Anthropic</strong></p>
<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>
<p><strong>About the Role</strong></p>
<p>AIRE (AI Reliability Engineering) partners with teams across Anthropic to improve reliability across our most critical serving paths -- every hop from the SDK through our network, API layers, serving infrastructure, and accelerators and back. We jump into the trenches alongside partner teams to make the systems that deliver Claude more robust and resilient, be it during an incident or collaborating on projects.</p>
<p>Reliability here is an emergent phenomenon that transcends any single team&#39;s boundaries, so someone has to zoom out and look at the whole picture. That&#39;s us -- and it means few teams at Anthropic offer this kind of dynamic, cross-cutting exposure to the systems that matter most.</p>
<p>Claude has your back. AIRE has Claude&#39;s. Help us keep Claude reliable for everyone who depends on it.</p>
<p><strong>Responsibilities:</strong></p>
<ul>
<li>Develop appropriate Service Level Objectives for large language model serving systems, balancing availability and latency with development velocity.</li>
<li>Design and implement monitoring and observability systems across the token path.</li>
<li>Assist in the design and implementation of high-availability serving infrastructure across multiple regions and cloud providers.</li>
<li>Lead incident response for critical AI services, ensuring rapid recovery, thorough incident reviews, and systematic improvements.</li>
<li>Support the reliability of safeguard model serving -- critical for both site reliability and Anthropic&#39;s safety commitments.</li>
</ul>
<p><strong>You may be a good fit if you:</strong></p>
<ul>
<li>Have strong distributed systems, infrastructure, or reliability backgrounds -- we&#39;re looking for reliability-minded software engineers and SREs.</li>
<li>Are curious and brave -- comfortable jumping into unfamiliar systems during an incident and helping drive resolution even when you don&#39;t have deep expertise yet.</li>
<li>Think holistically about how systems compose and where the seams are.</li>
<li>Can build lasting relationships across teams -- our engagement model depends on being welcomed as teammates, not outsiders with opinions.</li>
<li>Care about users and feel ownership over outcomes, even for systems you don&#39;t own.</li>
<li>Have excellent communication and collaboration skills -- you&#39;ll be partnering across the entire company.</li>
<li>Bring diverse experience -- the team&#39;s strength comes from people who&#39;ve built product stacks, scaled databases, run massive distributed systems, and everything in between.</li>
</ul>
<p><strong>Strong candidates may also:</strong></p>
<ul>
<li>Have been an SRE, Production Engineer, or in similar reliability-focused roles on large-scale systems.</li>
<li>Have experience operating large-scale model serving or training infrastructure (&gt;1000 GPUs).</li>
<li>Have experience with one or more ML hardware accelerators (GPUs, TPUs, Trainium).</li>
<li>Understand ML-specific networking optimizations like RDMA and InfiniBand.</li>
<li>Have expertise in AI-specific observability tools and frameworks.</li>
<li>Have experience with chaos engineering and systematic resilience testing.</li>
<li>Have contributed to open-source infrastructure or ML tooling.</li>
</ul>
<p><strong>Logistics</strong></p>
<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>
<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>
<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong> Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>
<p><strong>Your safety matters to us.</strong> To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links; visit anthropic.com/careers directly for confirmed position openings.</p>
<p><strong>How we&#39;re different</strong></p>
<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as a team sport, where everyone contributes to the overall success of the team.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$325,000 - $485,000 USD</Salaryrange>
      <Skills>distributed systems, infrastructure, reliability, large language model serving systems, monitoring and observability systems, high-availability serving infrastructure, incident response, safeguard model serving, SRE, Production Engineer, ML hardware accelerators, ML-specific networking optimizations, AI-specific observability tools and frameworks, chaos engineering, systematic resilience testing, open-source infrastructure or ML tooling</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a company that creates reliable, interpretable, and steerable AI systems. It has a quickly growing team of researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</Employerdescription>
      <Employerwebsite>https://job-boards.greenhouse.io</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>325000</Compensationmin>
      <Compensationmax>485000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5113224008</Applyto>
      <Location>San Francisco, CA | New York City, NY | Seattle, WA</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>10798a1e-9fa</externalid>
      <Title>Staff Software Engineer, AI Reliability Engineering</Title>
      <Description><![CDATA[<p><strong>About Anthropic</strong></p>
<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>
<p><strong>About the Role</strong></p>
<p>Claude has your back. AIRE has Claude&#39;s. Help us keep Claude reliable for everyone who depends on it.</p>
<p>AIRE (AI Reliability Engineering) partners with teams across Anthropic to improve reliability across our most critical serving paths -- every hop from the SDK through our network, API layers, serving infrastructure, and accelerators and back. We jump into the trenches alongside partner teams to make the systems that deliver Claude more robust and resilient, be it during an incident or collaborating on projects.</p>
<p>Reliability here is an emergent phenomenon that transcends any single team&#39;s boundaries, so someone has to zoom out and look at the whole picture. That&#39;s us -- and it means few teams at Anthropic offer this kind of dynamic, cross-cutting exposure to the systems that matter most.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Develop appropriate Service Level Objectives for large language model serving systems, balancing availability and latency with development velocity.</li>
<li>Design and implement monitoring and observability systems across the token path.</li>
<li>Assist in the design and implementation of high-availability serving infrastructure across multiple regions and cloud providers.</li>
<li>Lead incident response for critical AI services, ensuring rapid recovery, thorough incident reviews, and systematic improvements.</li>
<li>Support the reliability of safeguard model serving -- critical for both site reliability and Anthropic&#39;s safety commitments.</li>
</ul>
<p><strong>You may be a good fit if you</strong></p>
<ul>
<li>Have strong distributed systems, infrastructure, or reliability backgrounds -- we&#39;re looking for reliability-minded software engineers and SREs.</li>
<li>Are curious and brave -- comfortable jumping into unfamiliar systems during an incident and helping drive resolution even when you don&#39;t have deep expertise yet.</li>
<li>Think holistically about how systems compose and where the seams are.</li>
<li>Can build lasting relationships across teams -- our engagement model depends on being welcomed as teammates, not outsiders with opinions.</li>
<li>Care about users and feel ownership over outcomes, even for systems you don&#39;t own.</li>
<li>Have excellent communication and collaboration skills -- you&#39;ll be partnering across the entire company.</li>
<li>Bring diverse experience -- the team&#39;s strength comes from people who&#39;ve built product stacks, scaled databases, run massive distributed systems, and everything in between.</li>
</ul>
<p><strong>Strong candidates may also</strong></p>
<ul>
<li>Have been an SRE, Production Engineer, or in similar reliability-focused roles on large-scale systems.</li>
<li>Have experience operating large-scale model serving or training infrastructure (&gt;1000 GPUs).</li>
<li>Have experience with one or more ML hardware accelerators (GPUs, TPUs, Trainium).</li>
<li>Understand ML-specific networking optimizations like RDMA and InfiniBand.</li>
<li>Have expertise in AI-specific observability tools and frameworks.</li>
<li>Have experience with chaos engineering and systematic resilience testing.</li>
<li>Have contributed to open-source infrastructure or ML tooling.</li>
</ul>
<p><strong>Logistics</strong></p>
<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience.</p>
<p><strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>
<p><strong>Salary</strong></p>
<p>The annual compensation range for this role is €235.000 - €295.000 EUR.</p>
<p><strong>How we&#39;re different</strong></p>
<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and engineering as it does with computer science. We strive to build a team that reflects this perspective, with people from a wide range of backgrounds and disciplines.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>€235.000 - €295.000 EUR</Salaryrange>
      <Skills>distributed systems, infrastructure, reliability, software engineering, SRE, large-scale systems, model serving, training infrastructure, ML hardware accelerators, RDMA, InfiniBand, AI-specific observability tools, chaos engineering, resilience testing, open-source infrastructure, ML tooling, communication, collaboration, diverse experience, product stacks, databases</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a company that creates reliable, interpretable, and steerable AI systems. It has a quickly growing team of researchers, engineers, policy experts, and business leaders.</Employerdescription>
      <Employerwebsite>https://job-boards.greenhouse.io</Employerwebsite>
      <Compensationcurrency>EUR</Compensationcurrency>
      <Compensationmin>235000</Compensationmin>
      <Compensationmax>295000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5101169008</Applyto>
      <Location>Dublin</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>3514d749-08c</externalid>
      <Title>Senior Support Engineer</Title>
      <Description><![CDATA[<p><strong>Senior Support Engineer - San Francisco</strong></p>
<p><strong>Location</strong></p>
<p>San Francisco</p>
<p><strong>Employment Type</strong></p>
<p>Full time</p>
<p><strong>Department</strong></p>
<p><strong>Compensation</strong></p>
<ul>
<li>$234K – $260K • Offers Equity</li>
</ul>
<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>
<p><strong>Benefits</strong></p>
<ul>
<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>
<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>
<li>401(k) retirement plan with employer match</li>
<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>
<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>
<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>
<li>Mental health and wellness support</li>
<li>Employer-paid basic life and disability coverage</li>
<li>Annual learning and development stipend to fuel your professional growth</li>
<li>Daily meals in our offices, and meal delivery credits as eligible</li>
<li>Relocation support for eligible employees</li>
<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>
</ul>
<p><strong>About the Team</strong></p>
<p>The Technical Support team is responsible for ensuring that developers and enterprises can reliably build mission-critical solutions using OpenAI models. We provide technical guidance, resolve complex issues, and support customers in maximizing value and adoption from deploying our highly capable models. We work closely with Technical Success, Product, Engineering, and others to deliver the best possible experience to our customers at scale. We think from an automation-first mindset and leverage the latest in AI to scale our support operations. Join the Senior Support Engineering (SSE) team at OpenAI and help shape the future of Technical Support in the age of AI.</p>
<p><strong>About the Role</strong></p>
<p>We are looking for a Senior Support Engineer to collaborate directly with our strategic enterprise accounts and product teams, helping solve some of the most difficult problems faced by our customers. You will be part of the best technical troubleshooting team at OpenAI, and our customers and Engineering teams will look to you for technical guidance in addressing the most technically difficult issues in our environment.</p>
<p>As a Senior Support Engineer, you will design and run operational processes to monitor our top strategic customers and a 24x7 response team. You’ll work closely with our Infrastructure and Engineering teams to deliver the best possible experience to customers at scale. Working directly with our most strategic customers, you will be crucial to the success of the most innovative, disruptive, and high-scale AI solutions being built with the OpenAI API platform.</p>
<p>The nature of this role will be low volume, high difficulty.</p>
<p>This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.</p>
<p><strong>In this role, you will:</strong></p>
<ul>
<li>Be among the foremost technical and troubleshooting experts for our API platform at OpenAI. You are the last line of defense before the core Engineering team.</li>
<li>Proactively identify and implement opportunities to scale support operations by leveraging automation and advancements in AI technologies. Contribute to shaping the future of technical support in an AI-driven era.</li>
<li>Configure and use advanced monitoring and alerting workflows to proactively detect customer impacting issues in real time.</li>
<li>In partnership with engineering, contribute to reliability reviews and preparedness for new features, launches, or strategic customer requirement updates. Ensure that operational readiness (monitoring, alerting, and fallback plans) is in place for any such changes.</li>
<li>Design and refine incident response processes and documentation across strategic customers, engineering and support teams.</li>
<li>Analyze operational metrics and incident RCAs to identify areas for improvement. Proactively recommend and implement enhancements to monitoring dashboards, alert configurations, and support workflows.</li>
<li>Provide support coverage during holidays and weekends based on business needs.</li>
</ul>
<p><strong>You might thrive in this role if you:</strong></p>
<ul>
<li>Have a Bachelor’s degree in Computer Science or a related field. A strong software engineering foundation is important for this role’s success.</li>
<li>Have 8+ years of experience in technical operations roles such as SRE/NOC, designing monitoring systems and resolving production issues in fast-paced and mission-critical environments. A strong track record of troubleshooting complex technical problems at the systems level.</li>
<li>Have deep familiarity with modern monitoring, alerting, and observability practices. Hands‑on experience setting up or managing metrics, logging, and tracing for distributed systems (e.g., understanding of SLIs/SLOs, alert tuning, dashboard creation).</li>
<li>Have proven experience leading incident response for high‑severity outages or service disruptions. Able to perform real‑time incident coordination, root cause analysis, and communication with stakeholders.</li>
<li>Are able to work effectively in a fast-paced environment, prioritize tasks, and manage multiple projects simultaneously.</li>
<li>Are a strong communicator and team player, with excellent written and verbal communication skills.</li>
<li>Are able to adapt to changing priorities and requirements, and are flexible in your approach to problem-solving.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$234K – $260K</Salaryrange>
      <Skills>Bachelor’s degree in Computer Science or a related field, 8+ years of experience in technical operations roles such as SRE/NOC, Designing monitoring systems and resolving production issues in fast-paced and mission-critical environments, Troubleshooting complex technical problems at the systems level, Modern monitoring, alerting, and observability practices, Metrics, logging, and tracing for distributed systems, SLIs/SLOs, alert tuning, dashboard creation, Incident response for high‑severity outages or service disruptions, Real-time incident coordination, root cause analysis, and communication with stakeholders, Automation and advancements in AI technologies, Automation-first mindset and leveraging the latest in AI to scale support operations, Technical and troubleshooting expertise for API platform at OpenAI, Proactive identification and implementation of opportunities to scale support operations, Advanced monitoring and alerting workflows to proactively detect customer impacting issues in real time, Reliability reviews and preparedness for new features, launches, or strategic customer requirement updates, Operational readiness (monitoring, alerting, and fallback plans), Incident response processes and documentation across strategic customers, engineering and support teams, Operational metrics and incident RCAs to identify areas for improvement, Enhancements to monitoring dashboards, alert configurations, and support workflows</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>OpenAI</Employername>
      <Employerlogo>https://logos.yubhub.co/openai.com.png</Employerlogo>
      <Employerdescription>OpenAI is a technology company that develops and offers artificial intelligence (AI) models and tools. It was founded in 2015 and is headquartered in San Francisco, California.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/openai/5431666c-530b-49c0-b67e-32477f9eaf5e</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>237ffb32-054</externalid>
      <Title>Software Engineer, Security Observability</Title>
      <Description><![CDATA[<p><strong>Software Engineer, Security Observability</strong></p>
<p><strong>Location</strong></p>
<p>Remote - US</p>
<p><strong>Employment Type</strong></p>
<p>Full time</p>
<p><strong>Location Type</strong></p>
<p>Remote</p>
<p><strong>Department</strong></p>
<p>Security</p>
<p><strong>Compensation</strong></p>
<ul>
<li>$234.4K – $385K • Offers Equity</li>
</ul>
<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>
<p><strong>Benefits</strong></p>
<ul>
<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>
<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>
<li>401(k) retirement plan with employer match</li>
<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>
<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>
<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>
<li>Mental health and wellness support</li>
<li>Employer-paid basic life and disability coverage</li>
<li>Annual learning and development stipend to fuel your professional growth</li>
<li>Daily meals in our offices, and meal delivery credits as eligible</li>
<li>Relocation support for eligible employees</li>
<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>
</ul>
<p>More details about our benefits are available to candidates during the hiring process.</p>
<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>
<p><strong>About the Team</strong></p>
<p>Security is at the foundation of OpenAI’s mission to ensure that artificial general intelligence benefits all of humanity.</p>
<p>The Security team protects OpenAI’s technology, people, and products. We are technical in what we build but are operational in how we do our work, and are committed to supporting all products and research at OpenAI. Our Security team tenets include: prioritizing for impact, enabling researchers, preparing for future transformative technologies, and engaging a robust security culture.</p>
<p><strong>About the Role</strong></p>
<p>We are seeking a Software Engineer, Security Observability to join our Security team. In this role, you will be responsible for building secure, scalable systems that enhance our security observability infrastructure. Leveraging your strong engineering skills, you will collaborate with cross-functional teams to develop, deploy, and maintain robust software solutions that support our security and detection capabilities.</p>
<p>This role is open to remote employees, or relocation assistance is available to one of our OpenAI offices in San Francisco, Seattle, or New York City.</p>
<p><strong>In this role, you will:</strong></p>
<ul>
<li>Design and develop scalable software systems that facilitate security observability across our infrastructure.</li>
<li>Build and maintain data pipelines that centralize and store security-relevant data from diverse sources.</li>
<li>Proactively improve the resilience and reliability of data systems to ensure high platform availability.</li>
<li>Collaborate closely with Detection &amp; Response (D&amp;R) and other security teams to reduce the company’s security risk.</li>
<li>Contribute to data engineering in support of forensic investigations and compliance efforts.</li>
</ul>
<p><strong>You might thrive in this role if you have:</strong></p>
<ul>
<li>Strong software engineering experience, with proficiency in programming languages such as Python, Golang, or similar.</li>
<li>A background in infrastructure as code, with experience using tools like Terraform and working with cloud platforms such as Azure.</li>
<li>Experience with building and maintaining data pipelines, particularly for security-related use cases.</li>
<li>A generalist engineering mindset, with the flexibility to pivot between various technical domains such as databases, site reliability engineering (SRE), or security.</li>
<li>The ability to collaborate effectively with security and engineering teams to understand evolving data needs and implement scalable solutions.</li>
<li>A proactive and detail-oriented approach to problem-solving, with a focus on improving security data visibility and forensic capabilities.</li>
</ul>
<p><strong>About OpenAI</strong></p>
<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange>$234.4K – $385K</Salaryrange>
      <Skills>Python, Golang, Terraform, Azure, data pipelines, security-related use cases, databases, site reliability engineering (SRE), security, infrastructure as code, cloud platforms, forensic investigations, compliance efforts</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>OpenAI</Employername>
      <Employerlogo>https://logos.yubhub.co/openai.com.png</Employerlogo>
      <Employerdescription>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/openai/92bf4ff3-7acf-4e49-8e09-47e4e8bd1f83</Applyto>
      <Location>Remote - US</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>edcdad0c-360</externalid>
      <Title>Software Engineer, Security Observability</Title>
      <Description><![CDATA[<p><strong>Software Engineer, Security Observability</strong></p>
<p><strong>Location</strong></p>
<p>San Francisco</p>
<p><strong>Employment Type</strong></p>
<p>Full time</p>
<p><strong>Location Type</strong></p>
<p>Hybrid</p>
<p><strong>Department</strong></p>
<p>Security</p>
<p><strong>Compensation</strong></p>
<ul>
<li>$234.4K – $385K • Offers Equity</li>
</ul>
<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>
<p><strong>Benefits</strong></p>
<ul>
<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>
<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>
<li>401(k) retirement plan with employer match</li>
<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>
<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>
<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>
<li>Mental health and wellness support</li>
<li>Employer-paid basic life and disability coverage</li>
<li>Annual learning and development stipend to fuel your professional growth</li>
<li>Daily meals in our offices, and meal delivery credits as eligible</li>
<li>Relocation support for eligible employees</li>
<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>
</ul>
<p><strong>About the Team</strong></p>
<p>Security is at the foundation of OpenAI’s mission to ensure that artificial general intelligence benefits all of humanity.</p>
<p>The Security team protects OpenAI’s technology, people, and products. We are technical in what we build but are operational in how we do our work, and are committed to supporting all products and research at OpenAI. Our Security team tenets include: prioritizing for impact, enabling researchers, preparing for future transformative technologies, and engaging a robust security culture.</p>
<p><strong>About the Role</strong></p>
<p>We are seeking a Software Engineer, Security Observability to join our Security team. In this role, you will be responsible for building secure, scalable systems that enhance our security observability infrastructure. Leveraging your strong engineering skills, you will collaborate with cross-functional teams to develop, deploy, and maintain robust software solutions that support our security and detection capabilities.</p>
<p>This role is open to remote employees, or relocation assistance is available to one of our OpenAI offices in San Francisco, Seattle, or New York City.</p>
<p><strong>In this role, you will:</strong></p>
<ul>
<li>Design and develop scalable software systems that facilitate security observability across our infrastructure.</li>
<li>Build and maintain data pipelines that centralize and store security-relevant data from diverse sources.</li>
<li>Proactively improve the resilience and reliability of data systems to ensure high platform availability.</li>
<li>Collaborate closely with Detection &amp; Response (D&amp;R) and other security teams to reduce the company’s security risk.</li>
<li>Contribute to data engineering in support of forensic investigations and compliance efforts.</li>
</ul>
<p><strong>You might thrive in this role if you have:</strong></p>
<ul>
<li>Strong software engineering experience, with proficiency in programming languages such as Python, Golang, or similar.</li>
<li>A background in infrastructure as code, with experience using tools like Terraform and working with cloud platforms such as Azure.</li>
<li>Experience with building and maintaining data pipelines, particularly for security-related use cases.</li>
<li>A generalist engineering mindset, with the flexibility to pivot between various technical domains such as databases, site reliability engineering (SRE), or security.</li>
<li>The ability to collaborate effectively with security and engineering teams to understand evolving data needs and implement scalable solutions.</li>
<li>A proactive and detail-oriented approach to problem-solving, with a focus on improving security data visibility and forensic capabilities.</li>
</ul>
<p><strong>About OpenAI</strong></p>
<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$234.4K – $385K</Salaryrange>
      <Skills>Python, Golang, Terraform, Azure, data pipelines, security-related use cases, databases, site reliability engineering (SRE), security, infrastructure as code, cloud platforms, data engineering, forensic investigations, compliance efforts</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>OpenAI</Employername>
      <Employerlogo>https://logos.yubhub.co/openai.com.png</Employerlogo>
      <Employerdescription>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The company pushes the boundaries of the capabilities of AI systems and seeks to safely deploy them to the world through its products.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/openai/3e254907-5101-438d-8708-f6f34e5c75ea</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>88643d65-f58</externalid>
      <Title>Software Engineer, Security Observability</Title>
      <Description><![CDATA[<p><strong>Software Engineer, Security Observability</strong></p>
<p><strong>Location</strong></p>
<p>Seattle</p>
<p><strong>Employment Type</strong></p>
<p>Full time</p>
<p><strong>Department</strong></p>
<p>Security</p>
<p><strong>Compensation</strong></p>
<ul>
<li>$234.4K – $385K • Offers Equity</li>
</ul>
<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>
<p><strong>Benefits</strong></p>
<ul>
<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>
<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>
<li>401(k) retirement plan with employer match</li>
<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>
<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>
<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>
<li>Mental health and wellness support</li>
<li>Employer-paid basic life and disability coverage</li>
<li>Annual learning and development stipend to fuel your professional growth</li>
<li>Daily meals in our offices, and meal delivery credits as eligible</li>
<li>Relocation support for eligible employees</li>
<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>
</ul>
<p><strong>About the Team</strong></p>
<p>Security is at the foundation of OpenAI’s mission to ensure that artificial general intelligence benefits all of humanity.</p>
<p>The Security team protects OpenAI’s technology, people, and products. We are technical in what we build but are operational in how we do our work, and are committed to supporting all products and research at OpenAI. Our Security team tenets include: prioritizing for impact, enabling researchers, preparing for future transformative technologies, and engaging a robust security culture.</p>
<p><strong>About the Role</strong></p>
<p>We are seeking a Software Engineer, Security Observability to join our Security team. In this role, you will be responsible for building secure, scalable systems that enhance our security observability infrastructure. Leveraging your strong engineering skills, you will collaborate with cross-functional teams to develop, deploy, and maintain robust software solutions that support our security and detection capabilities.</p>
<p>This role is open to remote employees, or relocation assistance is available to one of our OpenAI offices in San Francisco, Seattle, or New York City.</p>
<p><strong>In this role, you will:</strong></p>
<ul>
<li>Design and develop scalable software systems that facilitate security observability across our infrastructure.</li>
<li>Build and maintain data pipelines that centralize and store security-relevant data from diverse sources.</li>
<li>Proactively improve the resilience and reliability of data systems to ensure high platform availability.</li>
<li>Collaborate closely with Detection &amp; Response (D&amp;R) and other security teams to reduce the company’s security risk.</li>
<li>Contribute to data engineering in support of forensic investigations and compliance efforts.</li>
</ul>
<p><strong>You might thrive in this role if you have:</strong></p>
<ul>
<li>Strong software engineering experience, with proficiency in programming languages such as Python, Golang, or similar.</li>
<li>A background in infrastructure as code, with experience using tools like Terraform and working with cloud platforms such as Azure.</li>
<li>Experience with building and maintaining data pipelines, particularly for security-related use cases.</li>
<li>A generalist engineering mindset, with the flexibility to pivot between various technical domains such as databases, site reliability engineering (SRE), or security.</li>
<li>The ability to collaborate effectively with security and engineering teams to understand evolving data needs and implement scalable solutions.</li>
<li>A proactive and detail-oriented approach to problem-solving, with a focus on improving security data visibility and forensic capabilities.</li>
</ul>
<p><strong>About OpenAI</strong></p>
<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange>$234.4K – $385K</Salaryrange>
      <Skills>Python, Golang, Terraform, Azure, data pipelines, security-related use cases, databases, site reliability engineering (SRE), security, infrastructure as code, cloud platforms, data engineering, forensic investigations, compliance efforts</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>OpenAI</Employername>
      <Employerlogo>https://logos.yubhub.co/openai.com.png</Employerlogo>
      <Employerdescription>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The company was founded in 2015 and has since grown to become a leading player in the field of artificial intelligence.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/openai/747bb870-4ef1-4bfd-b2c0-d48042a85080</Applyto>
      <Location>Seattle</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>7f4e2dd8-338</externalid>
      <Title>Software Engineer, Security Observability</Title>
      <Description><![CDATA[<p><strong>Software Engineer, Security Observability</strong></p>
<p><strong>Location</strong></p>
<p>New York City</p>
<p><strong>Employment Type</strong></p>
<p>Full time</p>
<p><strong>Location Type</strong></p>
<p>Hybrid</p>
<p><strong>Department</strong></p>
<p>Security</p>
<p><strong>Compensation</strong></p>
<ul>
<li>$325K – $405K • Offers Equity</li>
</ul>
<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>
<ul>
<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>
<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>
<li>401(k) retirement plan with employer match</li>
<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>
<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>
<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>
<li>Mental health and wellness support</li>
<li>Employer-paid basic life and disability coverage</li>
<li>Annual learning and development stipend to fuel your professional growth</li>
<li>Daily meals in our offices, and meal delivery credits as eligible</li>
<li>Relocation support for eligible employees</li>
<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>
</ul>
<p>More details about our benefits are available to candidates during the hiring process.</p>
<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>
<p><strong>About the Team</strong></p>
<p>Security is at the foundation of OpenAI’s mission to ensure that artificial general intelligence benefits all of humanity.</p>
<p>The Security team protects OpenAI’s technology, people, and products. We are technical in what we build but are operational in how we do our work, and are committed to supporting all products and research at OpenAI. Our Security team tenets include: prioritizing for impact, enabling researchers, preparing for future transformative technologies, and engaging a robust security culture.</p>
<p><strong>About the Role</strong></p>
<p>We are seeking a Software Engineer, Security Observability to join our Security team. In this role, you will be responsible for building secure, scalable systems that enhance our security observability infrastructure. Leveraging your strong engineering skills, you will collaborate with cross-functional teams to develop, deploy, and maintain robust software solutions that support our security and detection capabilities.</p>
<p>This role is open to remote employees, or relocation assistance is available to one of our OpenAI offices in San Francisco, Seattle, or New York City.</p>
<p><strong>In this role, you will:</strong></p>
<ul>
<li>Design and develop scalable software systems that facilitate security observability across our infrastructure.</li>
<li>Build and maintain data pipelines that centralize and store security-relevant data from diverse sources.</li>
<li>Proactively improve the resilience and reliability of data systems to ensure high platform availability.</li>
<li>Collaborate closely with Detection &amp; Response (D&amp;R) and other security teams to reduce the company’s security risk.</li>
<li>Contribute to data engineering in support of forensic investigations and compliance efforts.</li>
</ul>
<p><strong>You might thrive in this role if you have:</strong></p>
<ul>
<li>Strong software engineering experience, with proficiency in programming languages such as Python, Golang, or similar.</li>
<li>A background in infrastructure as code, with experience using tools like Terraform and working with cloud platforms such as Azure.</li>
<li>Experience with building and maintaining data pipelines, particularly for security-related use cases.</li>
<li>A generalist engineering mindset, with the flexibility to pivot between various technical domains such as databases, site reliability engineering (SRE), or security.</li>
<li>The ability to collaborate effectively with security and engineering teams to understand evolving data needs and implement scalable solutions.</li>
<li>A proactive and detail-oriented approach to problem-solving, with a focus on improving security data visibility and forensic capabilities.</li>
</ul>
<p><strong>About OpenAI</strong></p>
<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$325K – $405K • Offers Equity</Salaryrange>
      <Skills>Python, Golang, Terraform, Azure, data pipelines, security-related use cases, databases, site reliability engineering (SRE), security, infrastructure as code, cloud platforms, forensic investigations, compliance efforts</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>OpenAI</Employername>
      <Employerlogo>https://logos.yubhub.co/openai.com.png</Employerlogo>
      <Employerdescription>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/openai/1e4e9985-babf-4bd9-8fe8-a2016250780d</Applyto>
      <Location>New York City</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>6308fa9f-2f4</externalid>
      <Title>Member of Technical Staff - Principal Data Infrastructure Engineer</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft AI is looking for a Member of Technical Staff - Principal Data Infrastructure Engineer at its Redmond office. The role centres on architecting and maintaining scalable, reliable, and observable Big Data Infrastructure for mission-critical AI applications, and on building a self-service big data platform for data and platform engineers and researchers.</p>
<p><strong>About the Role</strong></p>
<p>As a Member of Technical Staff - Principal Data Infrastructure Engineer, you will be responsible for architecting and maintaining scalable, reliable, and observable Big Data Infrastructure for mission-critical AI applications. You will champion DevOps and SRE best practices—automated deployments, service monitoring, and incident response. You will build a self-service big data platform that empowers data and platform engineers and researchers. You will develop robust CI/CD pipelines and automate infrastructure provisioning using Infrastructure as Code tools (Bicep, Terraform, ARM).</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Architect and maintain scalable, reliable, and observable Big Data Infrastructure for mission-critical AI applications.</li>
<li>Champion DevOps and SRE best practices—automated deployments, service monitoring, and incident response.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>Master’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years experience in business analytics, data science, software development, data modeling, or data engineering OR Bachelor’s Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years experience in business analytics, data science, software development, data modeling, or data engineering OR equivalent experience.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>4+ years in Big Data Infrastructure, DevOps, SRE, or Platform Engineering.</li>
<li>3+ years of hands-on experience managing and scaling distributed systems—from bare-metal to cloud-native environments.</li>
<li>2+ years deploying containerized applications using Kubernetes and Helm/Kustomize.</li>
<li>Solid scripting and automation skills using Python, Bash, or PowerShell.</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Excellent interpersonal and communication skills, with a solid passion for mentorship and continuous learning.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Starting January 26, 2026, Microsoft AI employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week.</li>
<li>Microsoft’s mission is to empower every person and every organization on the planet to achieve more.</li>
<li>Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Big Data Infrastructure, DevOps, SRE, Platform Engineering, Python, Bash, PowerShell, Kubernetes, Helm/Kustomize, Databricks, IAM, OAuth, Kerberos, Azure, AWS, GCP</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft continues to push the boundaries of AI, aiming to build systems with true artificial intelligence across agents, applications, services, and infrastructure, making AI accessible to all.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/member-of-technical-staff-principal-data-infrastructure-engineer/</Applyto>
      <Location>Redmond</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>bed7736d-0a7</externalid>
      <Title>Browser Infrastructure Engineer</Title>
      <Description><![CDATA[<p>This role exists to build reliable, automated, and scalable infrastructure for Chromium-based browser teams. As a Browser Infrastructure Engineer, you will focus on CI/CD pipelines, monitoring, and development environments to support fast-paced browser innovation.</p>
<p><strong>What you&#39;ll do</strong></p>
<p>You will set up and maintain CI/CD pipelines for builds and testing, support and evolve Chromium browser development infrastructure, configure monitoring and alerting systems, manage cloud infrastructure, develop automation scripts, and ensure high availability, resilience, and security of development infrastructure.</p>
<p><strong>What you need</strong></p>
<p>You will need 3+ years of experience in software development infrastructure, preferably for Chromium-based browsers; hands-on DevOps and SRE experience, including monitoring and incident management; proficiency in k8s, Terraform, Datadog, Sentry, AWS, Unix, and TeamCity; strong CI/CD implementation skills; and the ability to thrive in Agile teams with excellent communication.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>software development infrastructure, CI/CD pipelines, monitoring and alerting systems, cloud infrastructure, automation scripts, DevOps and SRE experience, k8s, Terraform, Datadog, Sentry, AWS, Unix, TeamCity</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Perplexity</Employername>
      <Employerlogo>https://logos.yubhub.co/perplexity.com.png</Employerlogo>
      <Employerdescription>Perplexity is a young, fast-growing AI company whose browser teams build on Chromium. The company is committed to building reliable, automated, and scalable infrastructure for its browser development teams.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/perplexity/7bce0fcf-eef6-41aa-9243-896f07a0316e</Applyto>
      <Location>Belgrade</Location>
      <Country></Country>
      <Postedate>2026-03-04</Postedate>
    </job>
    <job>
      <externalid>c4e68b15-a2a</externalid>
      <Title>Production Planner / Production Controller (m/f/d)</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>FUCHS LUBRICANTS GERMANY GmbH is looking for a Production Planner / Production Controller (m/f/d) for its manufacturing site in Wedel. The role plans and coordinates production and filling orders and works closely with Purchasing, Logistics, Production, Quality, Customer Service, and Sales.</p>
<p><strong>About the Role</strong></p>
<p>We are looking for a Production Planner / Production Controller (m/f/d) for our manufacturing site in Wedel. In this role, you are responsible for planning and coordinating production and filling orders. You monitor all production orders and ensure they are completed on time. You coordinate closely with Purchasing, Logistics, Production, Quality, Customer Service, and Sales.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Plan and coordinate production and filling orders to achieve optimal capacity utilisation</li>
<li>Track all production orders and ensure they are completed on time</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>Completed commercial or technical vocational training, or a degree (e.g. business administration with a focus on supply chain, logistics, or production)</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Experience in planning and scheduling functions, e.g. production planning, production control, operational purchasing, or sales</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Analytical thinking, a structured way of working, high resilience under stress, and assertiveness</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Work-life balance (including flexible working time models, flexitime, 30 days of annual leave, and leave-of-absence options)</li>
<li>A secure future in a dynamic, globally operating company</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Production planning, production control, purchasing, logistics, production, quality, customer service, sales, analytical thinking, structured working, high resilience under stress, assertiveness</Skills>
      <Category>Operations</Category>
      <Industry>Manufacturing</Industry>
      <Employername>FUCHS LUBRICANTS GERMANY GmbH</Employername>
      <Employerlogo>https://logos.yubhub.co/jobs.fuchs.com.png</Employerlogo>
      <Employerdescription>FUCHS LUBRICANTS GERMANY GmbH is the largest operating company of the global FUCHS Group, headquartered in Mannheim, and develops, produces, and distributes high-quality lubricants and adjacent chemical specialties for the German and international market.</Employerdescription>
      <Employerwebsite>https://jobs.fuchs.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.fuchs.com/job/Wedel-Produktionsplaner-Produktionssteurer-%28mwd%29-SH-22880/1365989033/</Applyto>
      <Location>Wedel</Location>
      <Country></Country>
      <Postedate>2026-02-19</Postedate>
    </job>
    <job>
      <externalid>7634df8f-923</externalid>
      <Title>HR Manager (m/f/d)</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>FUCHS LUBRICANTS GERMANY GmbH is looking for an HR Manager (m/f/d) at its Mannheim office. The role is the first point of contact for employees and management in day-to-day HR matters and supports employees in the assigned department across the entire employee lifecycle.</p>
<p><strong>About the Role</strong></p>
<p>As HR Manager (m/f/d), you will be the first point of contact for employees and management in all basic HR matters of day-to-day business (e.g. labour and collective bargaining law, remuneration issues, performance and potential management). You will guide and support employees in your assigned department across the entire employee lifecycle, from planning and recruitment through development to exit. You will plan and implement personnel measures, taking into account legal, contractual, collective bargaining, and personnel requirements. You will work closely with the HR team at all German locations and maintain a trusting and constructive relationship with the works council. You will participate in HR projects (e.g. process optimisation, digitalisation, employer branding) with a focus on operational implementation, and support cross-location HR initiatives.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Act as the first point of contact for employees and management in all basic HR matters of day-to-day business</li>
<li>Plan and implement personnel measures, taking into account legal, contractual, collective bargaining, and personnel requirements</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>Completed degree in economics, psychology, social sciences, or comparable training, with initial professional experience in human resources</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Basic knowledge of labour law and a basic understanding of works constitution procedures</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Strong service orientation, strong communication skills, and enjoyment of working with employees and managers in day-to-day operations</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Work-life balance (including flexible working time models, flexitime, 30 days of annual leave, and leave-of-absence options)</li>
<li>A secure future in a dynamic, globally operating company</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Labour law, works constitution law, human resources, digitalisation, employer branding</Skills>
      <Category>HR</Category>
      <Industry>Manufacturing</Industry>
      <Employername>FUCHS LUBRICANTS GERMANY GmbH</Employername>
      <Employerlogo>https://logos.yubhub.co/jobs.fuchs.com.png</Employerlogo>
      <Employerdescription>FUCHS LUBRICANTS GERMANY GmbH is the largest operating company of the global FUCHS Group with its headquarters in Mannheim and develops, produces and distributes high-quality lubricants and adjacent chemical specialties for the German and international market.</Employerdescription>
      <Employerwebsite>https://jobs.fuchs.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.fuchs.com/job/Mannheim-HR-Manager-%28mwd%29-BW-68169/1291919601/</Applyto>
      <Location>Mannheim</Location>
      <Country></Country>
      <Postedate>2026-02-12</Postedate>
    </job>
    <job>
      <externalid>938f7e18-10b</externalid>
      <Title>Internship HR Business Partner - Production &amp; Logistics</Title>
      <Description><![CDATA[<p><strong>What you&#39;ll do</strong></p>
<p>You&#39;ll manage daily operations of the facility, ensuring equipment runs smoothly and maintenance schedules stay on track.</p>
<ul>
<li>Coordinate maintenance schedules and ensure equipment operates efficiently throughout the day</li>
<li>Respond to urgent repair requests within 2-hour SLA windows using our ticketing system</li>
<li>Manage relationships with external contractors and vendors, getting quotes and overseeing work quality</li>
<li>Track facility costs and identify opportunities to reduce waste whilst maintaining standards</li>
<li>Lead weekly safety inspections and ensure compliance with health and safety regulations</li>
</ul>
<p><strong>What you need</strong></p>
<p>To succeed in this role, you&#39;ll need hands-on facilities experience and strong problem-solving skills.</p>
<ul>
<li>3+ years facilities maintenance experience in a commercial or industrial environment</li>
<li>Proven ability to manage contractors and vendors effectively whilst staying within budget</li>
<li>Strong electrical and mechanical troubleshooting skills - you can diagnose issues quickly</li>
<li>Comfortable using CMMS software (Maximo, SAP, or similar) to log jobs and track work</li>
<li>Understanding of health and safety regulations and how they apply to facilities work</li>
</ul>
<p><strong>Why this matters</strong></p>
<p>This role keeps a world-championship-winning F1 team running. When equipment fails, races can be lost, so your work directly impacts performance. You&#39;ll develop deep expertise in high-spec facilities and have clear progression into senior facilities management roles. The F1 environment means you&#39;ll work with cutting-edge building systems and learn from the best in the industry.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>entry</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Holistic HR management, HR process management and optimisation, Advising managers, Transformation processes, Risk identification and analysis in HR, Personnel and labour law, HR administration, SAP skills</Skills>
      <Category>HR</Category>
      <Industry>Automotive</Industry>
      <Employername>Dr. Ing. h.c. F. Porsche AG</Employername>
      <Employerlogo>https://logos.yubhub.co/jobs.porsche.com.png</Employerlogo>
      <Employerdescription>Porsche is a valuable brand with worldwide appeal and a loyal customer base around the globe. The way we work together and hold together as a team is unique. Our Miteinander is shaped by our strong Porsche culture: Heartblood | Sportiness | Pioneer spirit | A family</Employerdescription>
      <Employerwebsite>https://jobs.porsche.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.porsche.com/index.php?ac=jobad&amp;id=18021</Applyto>
      <Location>Sachsenheim bei Stuttgart</Location>
      <Country></Country>
      <Postedate>2025-12-08</Postedate>
    </job>
    <job>
      <externalid>c900cb93-d8d</externalid>
      <Title>Empowering Climate-Positive Generations</Title>
      <Description><![CDATA[<p><strong>What you&#39;ll do</strong></p>
<ul>
<li>Create, check, and evaluate protection, measurement, and control concepts as well as circuit diagrams for electrical systems</li>
<li>Plan and carry out tenders, including the creation of technical specifications, performance schedules, and job descriptions</li>
<li>Lead bidder discussions, evaluate offers, and prepare well-founded decision documents for project management</li>
<li>Manage and coordinate medium- and high-voltage electrical projects in compliance with schedule, cost, and quality requirements</li>
<li>Manage interfaces and coordinate with customers, network operators, certifiers, and executing companies to ensure smooth project progress</li>
<li>Accompany conformity tests and the commissioning of switchgear and generators, taking relevant standards and market requirements into account</li>
</ul>
<p><strong>What you need</strong></p>
<ul>
<li>Ability to read, create, and evaluate protection, measurement, and control concepts (e.g. E-Plan, SLD)</li>
<li>Knowledge of current communication protocols (IEC 61850, IEC 60870-5-101/104, Modbus, etc.)</li>
<li>Familiarity with energy market interfaces and their integration into network and control technology</li>
<li>In-depth knowledge of technical connection rules in medium-voltage networks (VDE-AR-N 4105/4110)</li>
<li>Experience in project management of electrical projects and switchgear at medium and high voltage (schedule, cost, resource management)</li>
<li>A degree in electrical engineering, energy technology, or a similar field, or comparable training</li>
<li>Several years of professional experience in consulting, at an engineering service provider, and/or in the energy sector</li>
<li>Practical experience in project management, e.g. in the energy sector</li>
<li>Entrepreneurial thinking and sales affinity, ideally with an existing network in the relevant field</li>
<li>Strong interest in current trends in the energy industry and new technologies</li>
<li>A structured and independent working style</li>
<li>Strong communication and coordination skills</li>
<li>Ability to address complex technical issues in a target-oriented manner</li>
<li>Confident spoken and written German and English</li>
<li>Basic willingness to travel and flexibility</li>
</ul>
<p><strong>Why this matters</strong></p>
<p>We work together on complex technical, energy-economic, and conceptual challenges in shaping a sustainable future. Our actions are guided by our IE2S purpose: EMPOWERING CLIMATE-POSITIVE GENERATIONS!</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Protection, measurement and control concepts, current communication protocols, energy market interfaces, technical connection rules, project management, electrical engineering, energy sector, entrepreneurial thinking, sales affinity, communication and coordination skills</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>MHP - A Porsche Company</Employername>
      <Employerlogo>https://logos.yubhub.co/jobs.porsche.com.png</Employerlogo>
      <Employerdescription>As Intelligent Energy System Services GmbH, we know that the mobility and energy transition can only be achieved with concentrated power: by bundling the right competences and skills. We work together on complex technical, energy-economic, and conceptual challenges in shaping a sustainable future.</Employerdescription>
      <Employerwebsite>https://jobs.porsche.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.porsche.com/index.php?ac=jobad&amp;id=17773</Applyto>
      <Location>Stuttgart</Location>
      <Country></Country>
      <Postedate>2025-12-08</Postedate>
    </job>
  </jobs>
</source>