DevOps Engineer (all genders)

Join our DevOps team at Holidu, a central team across the entire tech organisation, responsible for creating and maintaining the infrastructure that powers all of our products and services.

In this role, you will contribute to the continuous improvement of our DevOps processes, collaborate with cross-functional teams, and apply best practices for scalable, reliable, and secure systems.

Our ideal candidate has a solid technical foundation, a strong hands-on approach, and the ability to deliver results with minimal supervision.

Our Tech Stack

Cloud: AWS (EC2, S3, RDS, EKS, ElastiCache, Lambda)
Container Orchestration: Kubernetes with Helm
Infrastructure as Code: Terraform + Terragrunt, Pulumi/CDK
Monitoring & Observability: Prometheus, Grafana, Elastic Stack, OpenTelemetry
CI/CD: Jenkins, GitHub Actions, ArgoCD, ArgoRollouts
Scripting: Python, Go, Bash
Version Control: GitHub
Collaboration: Jira (Agile)
Automation: N8N, AI-assisted tooling (Agentic ADK)

Your role in this journey

As a DevOps Engineer, you will be responsible for:

Implementing and maintaining infrastructure definitions using Terraform, Pulumi, or similar tools
Ensuring IaC standards are followed and contributing improvements to existing modules and patterns
Managing and monitoring AWS services, ensuring system performance, availability, and adherence to best practices
Troubleshooting production issues and participating in capacity planning
Maintaining and troubleshooting Kubernetes clusters: deploying workloads, managing configurations, scaling services, and resolving incidents to support high-availability applications
Maintaining and improving CI/CD pipelines to ensure smooth, automated software delivery
Identifying bottlenecks and implementing enhancements across Jenkins, GitHub Actions, ArgoRollouts and ArgoCD
Maintaining and extending our monitoring stack (Prometheus, Grafana)
Building dashboards, configuring alerts, and improving observability to ensure comprehensive visibility into system health and performance

Your backpack is filled with

4+ years of experience in a DevOps, SRE, or cloud engineering role with hands-on production experience
Solid working experience with AWS services (EC2, EKS, S3, RDS, Lambda) and cloud infrastructure management
Hands-on experience with Docker and Kubernetes in production environments, including deploying, scaling, and troubleshooting containerized workloads
Practical experience with at least one Infrastructure as Code tool (Terraform, Pulumi, or AWS CDK)
Experience maintaining and improving CI/CD pipelines using tools like Jenkins, GitHub Actions, or ArgoCD
Proficiency in scripting with Python, Bash, or Go for operational automation
Working knowledge of monitoring and observability tools such as Prometheus, Grafana, or similar platforms
Familiarity with logging and log aggregation systems (Elastic Stack, OpenTelemetry, or similar)
Solid understanding of Linux administration, networking fundamentals, and system security basics
Strong communication skills with the ability to collaborate across teams and explain technical decisions clearly

Nice to Have

Experience with Helm charts and Kubernetes package management
Familiarity with GitOps workflows (e.g., GitHub Actions, ArgoCD, Flux)
Experience designing architectures based on AWS services
Experience with AI automation or low-code/no-code platforms such as N8N
Familiarity with prompt engineering and using AI tools to augment DevOps workflows
Exposure to cost optimization strategies for cloud infrastructure
Experience with incident response, on-call rotations, or SRE practices (SLOs, error budgets)
Experience with DevSecOps practices, such as integrating security scanning and compliance into CI/CD pipelines

Our adventure includes

Impact: Shape the future of travel with products used by millions of guests and thousands of hosts
Learning: Grow professionally in a culture that thrives on curiosity and feedback
Great People: Join a team of smart, motivated, and international colleagues who challenge and support each other
Technology: Work in a modern tech environment
Flexibility: Work in a hybrid setup with 50% in-office time for collaboration, and spend up to 8 weeks a year working from other inspiring locations
Perks on Top: Of course, we also offer travel benefits, gym discounts, and other perks to keep you energized

XML job scraping automation by YubHub

Full-time, mid, hybrid
Skills: Cloud, Container Orchestration, Infrastructure as Code, Monitoring & Observability, CI/CD, Scripting, Version Control, Collaboration, Automation, Helm, GitOps, AI automation, Low-code/no-code platforms, Prompt engineering, Cost optimization strategies, Incident response, SRE practices, DevSecOps practices
Engineering, Technology
Holidu Hosts GmbH
Holidu is a travel technology company that provides search engines for vacation rentals.
https://holidu.jobs.personio.com
https://holidu.jobs.personio.com/job/2595036
Munich, Germany
2026-04-18

Technical Program Manager, Safeguards (Infrastructure & Evals)

About the Role

Safeguards Engineering builds and operates the infrastructure that keeps Anthropic's AI systems safe in production. As a Technical Program Manager for Safeguards Infrastructure and Evals, you'll own the operational health and forward momentum of this stack.

Your primary responsibility is driving reliability: owning the incident-response and post-mortem process, ensuring SLOs are defined and met in partnership with various teams, and making sure that when things go wrong, the right people know, the right actions get taken, and those actions actually get closed out.

Alongside that ongoing operational rhythm, you'll coordinate the larger platform investments: migrations, eval-platform improvements, and the cross-team dependencies that connect them.

This role sits at the intersection of operations and program management. It requires genuine technical depth: you need to understand how these systems work well enough to triage effectively, judge what's actually safety-critical versus what can wait, and have informed conversations with the engineers building and maintaining them.

But the core of the job is keeping the machine running well and the work moving.

Responsibilities

Own the Safeguards Engineering ops review
Drive the recurring cadence that keeps the team informed and coordinated: surfacing recent incidents and failures, bringing visibility to reliability trends, and making sure the right people are in the room when decisions need to be made.
Drive incident tracking and post-mortem execution
Establish and maintain SLOs with partner teams
Maintain runbook quality and incident-ownership clarity
Drive platform migrations and infrastructure projects
Coordinate evals platform improvements

Requirements

Solid technical program management experience, particularly in operational or infrastructure-heavy environments
Enough understanding of how production ML systems work to triage incidents intelligently and have substantive conversations with engineers about what's going wrong and why
Ability to work effectively across team boundaries
Experience with or strong interest in AI safety

Nice to Have

Experience with SRE practices, incident management frameworks, or on-call operations at scale
Familiarity with monitoring and alerting tooling (PagerDuty, Datadog, or equivalents)
Experience driving infrastructure migrations in complex, multi-team environments

Full-time, senior, hybrid
$290,000-$365,000 USD
Skills: Technical Program Management, Operational or Infrastructure-heavy Environments, Production ML Systems, Incident Tracking and Post-Mortem Execution, Service-Level Objectives (SLOs), Runbook Quality and Incident-Ownership Clarity, Platform Migrations and Infrastructure Projects, Evals Platform Improvements, SRE Practices, Incident Management Frameworks, On-Call Operations at Scale, Monitoring and Alerting Tooling, Infrastructure Migrations in Complex, Multi-Team Environments
Engineering, Technology
Anthropic
Anthropic develops artificial intelligence systems. It has a growing team of researchers, engineers, and business leaders.
https://anthropic.ai/
https://job-boards.greenhouse.io/anthropic/jobs/5108695008
San Francisco, CA | New York City, NY | Seattle, WA
2026-04-18