{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/container-orchestration"},"x-facet":{"type":"skill","slug":"container-orchestration","display":"Container Orchestration","count":63},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_bee517db-e9c"},"title":"DevOps Engineer (all genders)","description":"<p>Join our DevOps team at Holidu, a central team across the entire tech organisation, responsible for creating and maintaining the infrastructure that powers all of our products and services.</p>\n<p>In this role, you will contribute to the continuous improvement of our DevOps processes, collaborate with cross-functional teams, and apply best practices for scalable, reliable, and secure systems.</p>\n<p>Our ideal candidate has a solid technical foundation, a strong hands-on approach, and the ability to deliver results with minimal supervision.</p>\n<p><strong>Our Tech Stack</strong></p>\n<ul>\n<li>Cloud: AWS (EC2, S3, RDS, EKS, Elasticache, Lambda)</li>\n<li>Container Orchestration: Kubernetes with Helm</li>\n<li>Infrastructure as Code: Terraform + Terragrunt, Pulumi/ CDK</li>\n<li>Monitoring &amp; Observability: Prometheus, Grafana, Elastic Stack, OpenTelemetry</li>\n<li>CI/CD: Jenkins, GitHub Actions, ArgoCD, ArgoRollouts</li>\n<li>Scripting: Python, Go, Bash</li>\n<li>Version Control: GitHub</li>\n<li>Collaboration: Jira (Agile)</li>\n<li>Automation: N8N, AI-assisted tooling 
(Agentic ADK)</li>\n</ul>\n<p><strong>Your role in this journey</strong></p>\n<p>As a DevOps Engineer, you will be responsible for:</p>\n<ul>\n<li>Implementing and maintaining infrastructure definitions using Terraform, Pulumi, or similar tools</li>\n<li>Ensuring IaC standards are followed and contributing improvements to existing modules and patterns</li>\n<li>Managing and monitoring AWS services, ensuring system performance, availability, and adherence to best practices</li>\n<li>Troubleshooting production issues and participating in capacity planning</li>\n<li>Maintaining and troubleshooting Kubernetes clusters, deploying workloads, managing configurations, scaling services, and resolving incidents to support high-availability applications</li>\n<li>Maintaining and improving CI/CD pipelines to ensure smooth, automated software delivery</li>\n<li>Identifying bottlenecks and implementing enhancements across Jenkins, GitHub Actions, ArgoRollouts and ArgoCD</li>\n<li>Maintaining and extending our monitoring stack (Prometheus, Grafana)</li>\n<li>Building dashboards, configuring alerts, and improving observability to ensure comprehensive visibility into system health and performance</li>\n</ul>\n<p><strong>Your backpack is filled with</strong></p>\n<ul>\n<li>4+ years of experience in a DevOps, SRE, or cloud engineering role with hands-on production experience</li>\n<li>Solid working experience with AWS services (EC2, EKS, S3, RDS, Lambda) and cloud infrastructure management</li>\n<li>Hands-on experience with Docker and Kubernetes in production environments, deploying, scaling, and troubleshooting containerized workloads</li>\n<li>Practical experience with at least one Infrastructure as Code tool (Terraform, Pulumi, or AWS CDK)</li>\n<li>Experience maintaining and improving CI/CD pipelines using tools like Jenkins, GitHub Actions, or ArgoCD</li>\n<li>Proficiency in scripting with Python, Bash, or Go for operational automation</li>\n<li>Working knowledge of monitoring 
and observability tools such as Prometheus, Grafana, or similar platforms</li>\n<li>Familiarity with logging and log aggregation systems (Elastic Stack, OpenTelemetry, or similar)</li>\n<li>Solid understanding of Linux administration, networking fundamentals, and system security basics</li>\n<li>Strong communication skills with the ability to collaborate across teams and explain technical decisions clearly</li>\n</ul>\n<p><strong>Nice to Have</strong></p>\n<ul>\n<li>Experience with Helm charts and Kubernetes package management</li>\n<li>Familiarity with GitOps workflows (e.g., GitHub Actions, ArgoCD, Flux)</li>\n<li>Experience with designing AWS services-based architectures is a plus</li>\n<li>Experience with AI automation or low-code/no-code platforms such as N8N is a plus</li>\n<li>Familiarity with prompt engineering and using AI tools to augment DevOps workflows</li>\n<li>Exposure to cost optimization strategies for cloud infrastructure</li>\n<li>Experience with incident response, on-call rotations, or SRE practices (SLOs, error budgets)</li>\n<li>Experience with DevSecOps practices, integrating security scanning and compliance into CI/CD pipelines</li>\n</ul>\n<p><strong>Our adventure includes</strong></p>\n<ul>\n<li>Impact: Shape the future of travel with products used by millions of guests and thousands of hosts</li>\n<li>Learning: Grow professionally in a culture that thrives on curiosity and feedback</li>\n<li>Great People: Join a team of smart, motivated, and international colleagues who challenge and support each other</li>\n<li>Technology: Work in a modern tech environment</li>\n<li>Flexibility: Work a hybrid setup with 50% in-office time for collaboration, and spend up to 8 weeks a year from other inspiring locations</li>\n<li>Perks on Top: Of course, we also offer travel benefits, gym discounts, and other perks to keep you energized</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_bee517db-e9c","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Holidu Hosts GmbH","sameAs":"https://holidu.jobs.personio.com","logo":"https://logos.yubhub.co/holidu.jobs.personio.com.png"},"x-apply-url":"https://holidu.jobs.personio.com/job/2595036","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"Full-time","x-salary-range":null,"x-skills-required":["Cloud","Container Orchestration","Infrastructure as Code","Monitoring & Observability","CI/CD","Scripting","Version Control","Collaboration","Automation"],"x-skills-preferred":["Helm","GitOps","AI automation","Low-code/no-code platforms","Prompt engineering","Cost optimization strategies","Incident response","SRE practices","DevSecOps practices"],"datePosted":"2026-04-18T22:14:30.429Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Munich, Germany"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cloud, Container Orchestration, Infrastructure as Code, Monitoring & Observability, CI/CD, Scripting, Version Control, Collaboration, Automation, Helm, GitOps, AI automation, Low-code/no-code platforms, Prompt engineering, Cost optimization strategies, Incident response, SRE practices, DevSecOps practices"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3fa0b80f-842"},"title":"Staff Software Engineer, Public Sector","description":"<p>Job Title: Staff Software Engineer, Public Sector</p>\n<p>We are seeking a highly skilled Staff Software Engineer to join our Public Sector team. As a Staff Software Engineer, you will be responsible for designing and implementing software solutions for the public sector. 
You will work closely with cross-functional teams to develop and deploy software applications that meet the needs of government agencies.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and implement software solutions for the public sector</li>\n<li>Work closely with cross-functional teams to develop and deploy software applications</li>\n<li>Collaborate with stakeholders to understand their needs and develop software solutions that meet those needs</li>\n<li>Develop and maintain software documentation</li>\n<li>Participate in code reviews and ensure that code meets quality standards</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>Bachelor&#39;s degree in Computer Science or related field</li>\n<li>5+ years of experience in software development</li>\n<li>Proficiency in programming languages such as Java, Python, or C++</li>\n<li>Experience with Agile development methodologies</li>\n<li>Strong understanding of software design patterns and principles</li>\n<li>Excellent communication and collaboration skills</li>\n</ul>\n<p>Preferred Qualifications:</p>\n<ul>\n<li>Master&#39;s degree in Computer Science or related field</li>\n<li>10+ years of experience in software development</li>\n<li>Experience with cloud-based technologies such as AWS or Azure</li>\n<li>Experience with DevOps practices</li>\n</ul>\n<p>Benefits:</p>\n<ul>\n<li>Competitive salary and benefits package</li>\n<li>Opportunities for professional growth and development</li>\n<li>Collaborative and dynamic work environment</li>\n</ul>\n<p>Salary Range: $252,000-$362,000 USD</p>\n<p>Required Skills:</p>\n<ul>\n<li>Full Stack Development</li>\n<li>Cloud-Native Technologies</li>\n<li>Data Engineering</li>\n<li>AI Application Integration</li>\n<li>Problem Solving</li>\n<li>Collaboration and Communication</li>\n<li>Adaptability and Learning Agility</li>\n</ul>\n<p>Preferred Skills:</p>\n<ul>\n<li>Experience with modern web development frameworks</li>\n<li>Familiarity with cloud platforms</li>\n<li>Understanding 
of containerization and container orchestration</li>\n<li>Knowledge of ETL processes</li>\n<li>Understanding of data modeling, data warehousing, and data governance principles</li>\n<li>Familiarity with integrating Large Language Models</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3fa0b80f-842","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://www.scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4674913005","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$252,000-$362,000 USD","x-skills-required":["Full Stack Development","Cloud-Native Technologies","Data Engineering","AI Application Integration","Problem Solving","Collaboration and Communication","Adaptability and Learning Agility"],"x-skills-preferred":["Experience with modern web development frameworks","Familiarity with cloud platforms","Understanding of containerization and container orchestration","Knowledge of ETL processes","Understanding of data modeling, data warehousing, and data governance principles","Familiarity with integrating Large Language Models"],"datePosted":"2026-04-18T16:00:27.694Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA; St. 
Louis, MO; New York, NY; Washington, DC"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Full Stack Development, Cloud-Native Technologies, Data Engineering, AI Application Integration, Problem Solving, Collaboration and Communication, Adaptability and Learning Agility, Experience with modern web development frameworks, Familiarity with cloud platforms, Understanding of containerization and container orchestration, Knowledge of ETL processes, Understanding of data modeling, data warehousing, and data governance principles, Familiarity with integrating Large Language Models","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":252000,"maxValue":362000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c64368dd-789"},"title":"Software Engineer, ARC Team","description":"<p>We are seeking a highly skilled and motivated Software Engineer, ARC (Architecture, Reliability, &amp; Compute) to join our dynamic Public Sector Engineering team.</p>\n<p>As a part of this team, you will define how the company ships software, establishing the patterns for deploying into complex government and high-security environments, rather than just running Terraform scripts.</p>\n<p>You will build and maintain internal CLIs/tools that standardize testing, deployment, environment management and are tools that engineering relies on to prevent downstream breakages.</p>\n<p>You will execute on automated deployment efforts to pay down tech debt, creating fully functional staging/testing environments, and defining the company&#39;s standard for safe deployments.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and implement secure scalable backend systems for Public Sector customers, leveraging Scale&#39;s modern and cloud-native AI infrastructure.</li>\n</ul>\n<ul>\n<li>Own services or systems and define 
their long-term health goals, while also improving the health of surrounding components.</li>\n</ul>\n<ul>\n<li>Re-architect the stack to run in compliant or restrictive environments. This requires designing swappable components (auth, storage, logging) to meet government/security mandates without breaking the product.</li>\n</ul>\n<ul>\n<li>Collaborate with cross-functional teams to define and execute the vision for backend solutions, ensuring they meet the unique needs of government agencies operating in secure environments.</li>\n</ul>\n<ul>\n<li>Participate actively in customer engagements, working closely with stakeholders to understand requirements and deliver innovative solutions.</li>\n</ul>\n<ul>\n<li>Contribute to the platform roadmap and product strategy for Scale AI&#39;s Public Sector business, playing a key role in shaping the future direction of our offerings.</li>\n</ul>\n<p>Must have:</p>\n<ul>\n<li>At least an active secret clearance and the ability &amp; willingness to up level to TS/SCI with CI Poly. This is a requirement and candidates will not be considered who do not hold at least a secret clearance</li>\n</ul>\n<p>Ideally you&#39;d have:</p>\n<ul>\n<li>Full Stack Development: Proficiency in both front-end and back-end development, including experience with modern web development frameworks, programming languages, and databases. Experience with developing &amp; delivering software to air-gapped &amp; isolated environments is a plus.</li>\n</ul>\n<ul>\n<li>Cloud-Native Technologies: Understanding of containerization (e.g., Docker) and container orchestration (e.g., Kubernetes) is desired. Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and experience in developing and deploying applications in a cloud-native environment.</li>\n</ul>\n<ul>\n<li>Security Focused: Experience with Federal Compliance frameworks and requirements (e.g., Cloud SRG, FedRAMP, STIG Benchmarks, etc.). 
Experience developing software &amp; technical solutions that meet strict security &amp; regulatory compliance requirements.</li>\n</ul>\n<ul>\n<li>Problem Solving: Strong analytical and problem-solving skills to understand complex challenges and devise effective solutions. Ability to think critically, identify root causes, and propose innovative approaches to overcome technical obstacles.</li>\n</ul>\n<ul>\n<li>Collaboration and Communication: Excellent interpersonal and communication skills to effectively collaborate with cross-functional teams, stakeholders, and customers. Ability to clearly articulate technical concepts to non-technical audiences and foster a collaborative work environment.</li>\n</ul>\n<ul>\n<li>Adaptability and Learning Agility: Willingness to embrace new technologies, learn new skills, and adapt to evolving project requirements. Ability to quickly grasp and apply new concepts and stay up-to-date with emerging trends in software engineering.</li>\n</ul>\n<ul>\n<li>Must be able to support work 3-4 days a week from the DC, SF, NYC, or STL office.</li>\n</ul>\n<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.</p>\n<p>You’ll also receive benefits including, but not limited to: Comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. 
Additionally, this role may be eligible for additional benefits such as a commuter stipend.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_c64368dd-789","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale AI","sameAs":"https://www.scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4673771005","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$138,000-$259,440 USD","x-skills-required":["Cloud-Native Technologies","Containerization","Container Orchestration","Cloud Platforms","Federal Compliance Frameworks","Security Focused","Problem Solving","Collaboration and Communication","Adaptability and Learning Agility"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:59:38.809Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA; St. Louis, MO; New York, NY; Washington, DC"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cloud-Native Technologies, Containerization, Container Orchestration, Cloud Platforms, Federal Compliance Frameworks, Security Focused, Problem Solving, Collaboration and Communication, Adaptability and Learning Agility","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":138000,"maxValue":259440,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b255adba-bf4"},"title":"Field Engineer, Public Sector","description":"<p>We&#39;re looking for a Field Engineer to join our Public Sector team. 
As a Field Engineer, you will be on the front lines of our field engineering efforts for our federal AI projects, working closely with our largest public sector customers to ensure seamless and optimized experiences with Scale&#39;s technology.</p>\n<p>Your primary responsibilities will include implementing end-to-end data integrations, syncing customer&#39;s data to Scale&#39;s platform and back, and working closely with our customer&#39;s engineering teams to optimize data pipelines. You will also design, develop and maintain playbooks, internal tools, Scale&#39;s documentation and SDKs to quickly get customers set up for long-term success.</p>\n<p>In addition, you will partner with Software Engineers and Operations to remove any technical hurdles customers may face, debug technical issues impacting delivery and own technical escalations coming from the customer. You will be accountable for the customer&#39;s technical experience throughout their time with Scale.</p>\n<p>The ideal candidate will have a track record of success as a hybrid customer-facing engineer or similar function, wearing multiple hats along the way. Prior technical hands-on experience working with clients in a pre or post-sales capacity to realize business goals is also required.</p>\n<p>We offer a competitive compensation package, including base salary, equity, and benefits. The base salary range for this full-time position is $190,000-$290,000 USD in San Francisco, New York, and Seattle, $170,000-$260,000 USD in Hawaii, Washington DC, Texas, and Colorado, and $140,000-$220,000 USD in St. 
Louis.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b255adba-bf4","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://www.scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4518690005","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$190,000-$290,000 USD in San Francisco, New York, and Seattle, $170,000-$260,000 USD in Hawaii, Washington DC, Texas, and Colorado, and $140,000-$220,000 USD in St. Louis","x-skills-required":["Python","JavaScript","API integrations","Large Language Models","2D Image Annotation","Container orchestration with Kubernetes","Helm charts for application deployment","Ansible or similar tools for automation"],"x-skills-preferred":["Experience in AI","Experience working in classified environments","Previous experience as a technical go-to-market resource","Understanding of DevSecOps principles"],"datePosted":"2026-04-18T15:58:59.499Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA; New York, NY; Honolulu, Hawaii, St. 
Louis, MO; Washington, DC"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, JavaScript, API integrations, Large Language Models, 2D Image Annotation, Container orchestration with Kubernetes, Helm charts for application deployment, Ansible or similar tools for automation, Experience in AI, Experience working in classified environments, Previous experience as a technical go-to-market resource, Understanding of DevSecOps principles","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":140000,"maxValue":290000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5717691a-508"},"title":"Staff Infrastructure Software Engineer, Enterprise AI","description":"<p>We are looking for a Staff Infrastructure Software Engineer to act as a primary technical lead, engineering the &#39;paved road&#39; for our knowledge retrieval and inference engines. 
You will define the deployment standards for Agentic workflows at scale, bridging the gap between complex AI orchestration and world-class infrastructure.</p>\n<p>The ideal candidate thrives in a fast-paced environment, has a passion for both deep technical work and mentoring, and is capable of setting a long-term technical strategy for a critical domain while maintaining a strong, hands-on delivery focus.</p>\n<p>You will architect and implement solutions across multiple cloud providers (GCP, Azure, AWS) for customers in diverse, highly-regulated industries like healthcare, telecom, finance, and retail.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Architecting multi-cloud systems and abstractions to allow the SGP platform to run on top of existing Cloud providers.</li>\n<li>Using our own data and AI platform to analyse build and test logs and metrics to identify areas for improvement.</li>\n<li>Defining the architectural patterns for our multi-cloud infrastructure to support secure, reliable, and scalable Agentic workflows for enterprise customers.</li>\n<li>Enhancing engineering and infrastructure efficiency, reliability, accuracy, and response times, including CI/CD processes, test frameworks, data quality assurance, end-to-end reconciliation, and anomaly detection.</li>\n<li>Collaborating with platform and product teams to develop and implement innovative infrastructure that scales to meet evolving needs.</li>\n<li>Designing and championing highly scalable, reliable, and low-latency infrastructure and frameworks for building, orchestrating, and evaluating multi-agent systems at enterprise scale.</li>\n<li>Leading the infrastructure roadmap with a strong focus on compliance, privacy, and security standards, including designing change management and data isolation strategies.</li>\n<li>Owning the development and maintenance of our best-in-class Agentic observability platform (logging, metrics, tracing, and analytics) to proactively ensure system health 
and enable rapid incident response.</li>\n<li>Driving developer efficiency by building automated tooling and championing Infrastructure-as-Code (IaC) paradigms throughout the engineering organization to improve workflows and operational efficiency.</li>\n</ul>\n<p>The ideal candidate has proven experience in a senior role, with 5+ years of full-time software engineering experience, and a deep understanding of modern infrastructure practices, including CI/CD, IaC (e.g., Terraform, Helm Charts), container orchestration (e.g., Kubernetes) and observability platforms (e.g., Datadog, Prometheus, Grafana).</p>\n<p>Extensive experience with at least one major cloud provider (AWS, Azure, or GCP) and strong knowledge of security and compliance in enterprise environments, with a focus on access management, data isolation, and customer-specific VPC setups is required.</p>\n<p>Proficiency in Python or JavaScript/TypeScript, and SQL is also necessary.</p>\n<p>Bonus points for hands-on experience and a passion for working with Agents, LLMs, vector databases, and other emerging AI technologies.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_5717691a-508","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4599700005","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$216,200-$310,500 USD","x-skills-required":["Cloud computing","Infrastructure as Code","Container orchestration","Observability platforms","Security and compliance","Access management","Data isolation","Customer-specific VPC setups","Python","JavaScript/TypeScript","SQL"],"x-skills-preferred":["Agents","LLMs","Vector databases","Emerging AI 
technologies"],"datePosted":"2026-04-18T15:58:05.354Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York, NY; San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cloud computing, Infrastructure as Code, Container orchestration, Observability platforms, Security and compliance, Access management, Data isolation, Customer-specific VPC setups, Python, JavaScript/TypeScript, SQL, Agents, LLMs, Vector databases, Emerging AI technologies","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":216200,"maxValue":310500,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d9b7d5ae-6bf"},"title":"Software Engineer, Distributed Systems","description":"<p>We&#39;re growing our team of passionate creatives and builders on a mission to make design accessible to all. Our platform helps teams bring ideas to life, whether you&#39;re brainstorming, creating a prototype, translating designs into code, or iterating with AI. From idea to product, Figma empowers teams to streamline workflows, move faster, and work together in real time from anywhere in the world.</p>\n<p>As a Software Engineer on our Infrastructure team, you’ll help design, build, and operate the systems that power our real-time collaborative design tools used by millions of people worldwide. We’re scaling fast, and we’re looking for experienced distributed systems engineers across a variety of teams. 
Whether you’re passionate about storage, compute orchestration, developer tooling, networking, or real-time data systems, this role offers an opportunity to shape the technical foundation of one of the most beloved design platforms in the world.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design, build, and maintain scalable and reliable infrastructure systems that support product innovation and user collaboration at scale.</li>\n</ul>\n<ul>\n<li>Architect and evolve distributed systems including storage platforms, streaming infrastructure, and compute orchestration.</li>\n</ul>\n<ul>\n<li>Improve developer experience by building internal platforms, CI/CD systems, build tools, and APIs.</li>\n</ul>\n<ul>\n<li>Collaborate across product and infrastructure teams to design secure, maintainable, and performant systems.</li>\n</ul>\n<ul>\n<li>Participate in shaping platform strategy, roadmaps, and engineering best practices across the organization.</li>\n</ul>\n<ul>\n<li>Debug and resolve complex production issues that span services and layers of the stack.</li>\n</ul>\n<ul>\n<li>Mentor engineers and foster a culture of collaboration, inclusivity, and technical excellence.</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>5+ years of Software Engineering experience, specifically in backend or infrastructure engineering.</li>\n</ul>\n<ul>\n<li>Deep understanding of distributed systems concepts such as sharding, replication, consistency, and eventual convergence.</li>\n</ul>\n<ul>\n<li>Experience with cloud-native environments (AWS, GCP, or Azure), infrastructure-as-code, and container orchestration.</li>\n</ul>\n<ul>\n<li>Proficiency in languages such as Go, TypeScript, Python, Rust, or Ruby.</li>\n</ul>\n<ul>\n<li>Strong system design skills and a track record of architecting resilient production systems.</li>\n</ul>\n<ul>\n<li>Excellent communication skills, with experience collaborating across teams and mentoring others.</li>\n</ul>\n<p>Preferred 
Qualifications:</p>\n<ul>\n<li>Experience scaling storage platforms (e.g., Postgres, Redis, S3, DynamoDB) or operating streaming systems like Kafka.</li>\n</ul>\n<ul>\n<li>Background in traffic management, DDoS mitigation, or service mesh technologies (e.g., Envoy, Istio).</li>\n</ul>\n<ul>\n<li>A history of developing complex, real-time distributed systems at scale.</li>\n</ul>\n<ul>\n<li>A passion for building developer productivity tools, including development environments, CI/CD pipelines, and build systems.</li>\n</ul>\n<ul>\n<li>Experience with evolving large-scale, shared developer platforms to improve reliability and developer velocity.</li>\n</ul>\n<ul>\n<li>Strong problem-solving skills and a bias for action, especially when tackling high-impact, gritty challenges.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d9b7d5ae-6bf","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Figma","sameAs":"https://www.figma.com/","logo":"https://logos.yubhub.co/figma.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/figma/jobs/5552549004","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$153,000-$376,000 USD","x-skills-required":["distributed systems","cloud-native environments","infrastructure-as-code","container orchestration","Go","TypeScript","Python","Rust","Ruby","system design","resilient production systems"],"x-skills-preferred":["storage platforms","streaming infrastructure","compute orchestration","developer tooling","networking","real-time data systems","traffic management","DDoS mitigation","service mesh technologies","complex distributed systems","developer productivity tools"],"datePosted":"2026-04-18T15:56:47.168Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA • New York, NY • United 
States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed systems, cloud-native environments, infrastructure-as-code, container orchestration, Go, TypeScript, Python, Rust, Ruby, system design, resilient production systems, storage platforms, streaming infrastructure, compute orchestration, developer tooling, networking, real-time data systems, traffic management, DDoS mitigation, service mesh technologies, complex distributed systems, developer productivity tools","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":153000,"maxValue":376000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0ed46937-df6"},"title":"Staff Developer Success Engineer - West","description":"<p>We&#39;re looking for a Staff Developer Success Engineer to join our team. As a frontline technical expert for our developer community, you will help users deploy and scale Temporal in cloud-native environments. You will also troubleshoot complex infrastructure issues, optimize performance, and develop automation solutions.</p>\n<p>At Temporal, you&#39;ll work with cloud-native, highly scalable infrastructure spanning AWS, GCP, Kubernetes, and microservices. You&#39;ll gain deep expertise in container orchestration, networking, and observability while learning from complex, real-world customer use cases.</p>\n<p>As a Staff Developer Success Engineer, you&#39;ll work directly with developers to debug complex infrastructure issues, optimize cloud performance, and enhance reliability for Temporal users. 
You&#39;ll develop observability solutions (Grafana, Prometheus), improve networking (load balancing, DNS, ingress/egress), and automate infrastructure operations (Terraform, IaC) to help customers run Temporal efficiently at scale.</p>\n<p>Once ramped up, we expect you to independently drive technical solutions, whether debugging complex production issues or designing infrastructure best practices. Don&#39;t worry, we have seasoned engineers and mentors to support you along the way!</p>\n<p>As a Staff Developer Success Engineer you will engage directly with developers, engineering teams, and product teams to understand infrastructure challenges and provide solutions that enhance scalability, performance, and reliability.</p>\n<p>Your insights will influence platform improvements, from enhancing observability tooling to developing self-service infrastructure solutions that simplify troubleshooting (e.g., building diagnostic tools similar to Twilio’s Network Test).</p>\n<p>You’ll serve as a bridge between developers and infrastructure, ensuring that reliability, performance, and developer experience remain top priorities as Temporal scales.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0ed46937-df6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Temporal","sameAs":"https://temporal.io/","logo":"https://logos.yubhub.co/temporal.io.png"},"x-apply-url":"https://job-boards.greenhouse.io/temporaltechnologies/jobs/5076742007","x-work-arrangement":"remote","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$170,000 - $215,000","x-skills-required":["cloud-native infrastructure","container orchestration","networking","observability","infrastructure automation","Terraform","IaC","Kubernetes","AWS","GCP","Python","Java","Go","Grafana","Prometheus"],"x-skills-preferred":["security certificate 
management","security implementation","use case analysis","Temporal design decisions","architecture best practices","EKS","GKE","OpenTracing","Ansible","CDK"],"datePosted":"2026-04-18T15:56:34.606Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"United States - Remote Opportunity"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud-native infrastructure, container orchestration, networking, observability, infrastructure automation, Terraform, IaC, Kubernetes, AWS, GCP, Python, Java, Go, Grafana, Prometheus, security certificate management, security implementation, use case analysis, Temporal design decisions, architecture best practices, EKS, GKE, OpenTracing, Ansible, CDK","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":170000,"maxValue":215000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_e948a283-667"},"title":"Staff Software Engineer, Platform Security","description":"<p>We are seeking a Staff Software Engineer to join our Platform Security Engineering team. 
As a key member of this team, you will be responsible for advancing our mission through security expertise, software development, and operational excellence.</p>\n<p>In this technical leadership role, you will articulate and pursue the most leveraged opportunities to reduce security risk across Engineering, designing and building lovable &#39;paved paths&#39; for managing identities and access, shipping code, configuring cloud infrastructure, and operating services.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Developing and applying best-in-class secure baselines for cloud infrastructure</li>\n<li>Securing first- and third-party software supply chains, from the dev environment through CI/CD and into production</li>\n<li>Building and owning identity and access management (IAM) systems that are user-friendly and promote least privilege</li>\n<li>Managing infrastructure vulnerabilities while supporting rapid growth for Engineering</li>\n<li>Consulting on risk assessments, architectural designs, threat models, code reviews, and more, pragmatically balancing security with other business considerations</li>\n</ul>\n<p>Example projects include:</p>\n<ul>\n<li>Supporting IAM with scalable platform solutions</li>\n<li>Building tooling to prevent and address vulnerabilities across our infrastructure</li>\n<li>Integrating service-to-service authentication and authorization into Discord&#39;s internal developer platform</li>\n</ul>\n<p>What we look for in a candidate includes:</p>\n<ul>\n<li>5+ years of experience building and operating production systems or infrastructure</li>\n<li>5+ years of experience writing software in a general-purpose programming language</li>\n<li>4+ years of experience securing systems with millions of users</li>\n<li>Experience mentoring junior ICs and leading technical projects involving multiple engineers and spanning multiple quarters</li>\n<li>Experience designing and building software for customers (internal or external) beyond your 
immediate team</li>\n<li>Experience securing cloud environments</li>\n<li>Experience defining and orchestrating containers</li>\n<li>Familiarity with build and CI/CD technologies</li>\n<li>Understanding of modern authentication and authorization concepts</li>\n</ul>\n<p>Bonus points if you have experience developing and debugging distributed systems atop GCP and Cloudflare, leading complex migrations or risk management programs across an engineering organization, or managing and securing VMs or bare-metal hosts.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_e948a283-667","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Discord","sameAs":"https://discord.com","logo":"https://logos.yubhub.co/discord.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/discord/jobs/8177912002","x-work-arrangement":"remote","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$248,000 to $279,000 + equity + benefits","x-skills-required":["cloud infrastructure","identity and access management","software development","operational excellence","security expertise"],"x-skills-preferred":["container orchestration","build and CI/CD technologies","modern authentication and authorization concepts","distributed systems","GCP and Cloudflare"],"datePosted":"2026-04-18T15:55:59.878Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco Bay Area or Remote"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud infrastructure, identity and access management, software development, operational excellence, security expertise, container orchestration, build and CI/CD technologies, modern authentication and authorization concepts, distributed systems, GCP and 
Cloudflare","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":248000,"maxValue":279000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_1fa6d45d-1b7"},"title":"Senior Software Engineer, United Kingdom","description":"<p>We are hiring Software Engineers to accelerate our mission. At KoBold, software engineers have the unique opportunity to embed directly with their users and learn the ins and outs of mineral exploration and geology while developing state-of-the-art technology solutions.</p>\n<p>Unlike traditional software engineering roles, we don&#39;t simply ship code and passively wait for feedback about its utility: our userbase includes our colleagues... and ourselves!</p>\n<p>While there are real technical challenges in making mineral exploration data broadly searchable and accessible to both humans and machines, we believe that solving these technical challenges cannot be done without &quot;getting our hands dirty&quot; – sometimes literally! 
– by embedding directly with the exploration teams and even occasionally (~once a year) joining our colleagues in the field, be it in Zambia, Canada, or Arizona, to experience the impact of our software in real time.</p>\n<p>As a Software Engineer on the Data Systems Engineering team at KoBold, your main role will be to enable systematic exploration and materially improve exploration success rates by making mineral exploration data broadly accessible to humans and machines.</p>\n<p>Past projects have included SIP (the Structured Ingest Pipeline), DataKit generation (producing curated sets of data on demand), and RAG (Retrieval-Augmented Generation, utilizing natural language processing on unstructured data).</p>\n<p>Our tech stack is primarily Python and includes Django, React, AWS, and additional technologies like Retool and Prefect.</p>\n<p>Your work will empower KoBold to unlock invaluable insights and streamline intricate scientific processes.</p>\n<p>Collaborating with our exceptional team of data scientists, geologists, and other software engineers, you will have the opportunity to tackle complex problems head-on and collectively pave the way for the discoveries of vital energy transition metals like lithium, copper, nickel, and cobalt.</p>\n<p>Together we can shape the future of mineral exploration and contribute to building a sustainable world.</p>\n<p>This role will be responsible for:</p>\n<p>Deep engagement with exploration geologists and data scientists, continual learning about mineral exploration, and tailoring technology development to the needs of exploration project scientists</p>\n<p>Building data pipelines and tooling for deriving advanced human and machine insights from exploration data, often leading a small group of software engineers to successful delivery</p>\n<p>Developing expertise in KoBold&#39;s Data Systems and deeply understanding how they impact exploration</p>\n<p>End-to-end ownership of projects from design to implementation and testing to continued engagement 
with colleagues on exploration teams using your solutions</p>\n<p>Responding well to design and code feedback, also providing feedback to teammates</p>\n<p>Operationally managing the team&#39;s services and assisting scientific colleagues with our tooling</p>\n<p>Qualifications:</p>\n<p>4+ years of software engineering experience, ideally building production cloud data systems</p>\n<p>Proficiency with Python</p>\n<p>Ability to write production-quality code that is correct, readable, well-tested, scalable and extensible</p>\n<p>Skilled in large-scale system design</p>\n<p>A track record of taking ownership from definition of the problem and delivering projects with demonstrated impact in an iterative manner</p>\n<p>Intellectual curiosity and eagerness to learn about all aspects of mineral exploration, particularly in the geology domain.</p>\n<p>Enjoys constantly learning such that you are driving insights through using our tools in exploration and willing to work directly with geologists in the field.</p>\n<p>Ability to explain technical problems to and collaborate on solutions with domain experts who are not software developers.</p>\n<p>A strong communicator who enjoys working with colleagues across the company.</p>\n<p>Excitement about joining a fast-growing early-stage company, comfort with a dynamic work environment, and eagerness to take on an evolving range of responsibilities.</p>\n<p>Keen not just to build cool technology, but to figure out what technical product to build to best achieve the business objectives of the company.</p>\n<p>Nice to Haves:</p>\n<p>Experience with modern frontend frameworks such as React</p>\n<p>Experience with geospatial data and building map-based experiences</p>\n<p>Familiarity with containerization and container orchestration platforms, such as Docker, AWS ECS, Kubernetes, etc.</p>\n<p>Formal education or job exposure to natural sciences</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_1fa6d45d-1b7","directApply":true,"hiringOrganization":{"@type":"Organization","name":"KoBold","sameAs":"https://www.kobold.com/","logo":"https://logos.yubhub.co/kobold.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/koboldmetals/jobs/4678367005","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$120,000 - $210,000 USD","x-skills-required":["Python","Django","React","AWS","Retool","Prefect","Geospatial data","Containerization","Container orchestration"],"x-skills-preferred":["Modern frontend frameworks","Geospatial data and map-based experiences","Containerization and container orchestration platforms"],"datePosted":"2026-04-18T15:55:22.022Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote, United Kingdom"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Django, React, AWS, Retool, Prefect, Geospatial data, Containerization, Container orchestration, Modern frontend frameworks, Geospatial data and map-based experiences, Containerization and container orchestration platforms","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":120000,"maxValue":210000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_7520a7f6-8b6"},"title":"Member of Technical Staff - Infrastructure Reliability","description":"<p>We are seeking a Member of Technical Staff - Infrastructure Reliability to join our team. As a key member of our infrastructure team, you will own the availability, performance, and evolution of our core compute, storage, and networking infrastructure. 
This is a joint xAI/X role: you will own 24×7 reliability for the world&#39;s largest GPU training superclusters and one of the highest-QPS production systems on the planet.</p>\n<p>You will define and execute the technical strategy for infrastructure reliability and scalability, build and maintain the automation, observability, and control planes that keep multi-datacenter, hybrid cloud/on-prem environments healthy, lead incident response, deep-dive root cause analysis, and post-mortems that drive real fixes, identify, instrument, and eliminate systemic failure patterns, design and implement high-leverage systems software in Python and Rust, and push the state of the art in large-scale GPU cluster operations and AI workload reliability.</p>\n<p>To succeed in this role, you will need 5+ years shipping production software and/or operating distributed infrastructure at scale, expert-level knowledge of Linux systems, TCP/IP networking, and systems programming, strong coding skills with proven production experience in Rust (strongly preferred) and at least one of Python, Go, or C++, deep experience with large-scale distributed systems in on-prem and cloud environments, hands-on expertise with container orchestration, container runtimes, and infrastructure-as-code, intimate understanding of common failure modes in distributed systems and how to mitigate them, and a track record of participating in (or building) effective on-call rotations in high-stakes environments.</p>\n<p>In addition to a competitive base salary, you will receive equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short &amp; long-term disability insurance, life insurance, and various other discounts and perks.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_7520a7f6-8b6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/4801451007","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000 - $400,000 USD","x-skills-required":["Linux systems","TCP/IP networking","systems programming","Rust","Python","Go","C++","container orchestration","container runtimes","infrastructure-as-code"],"x-skills-preferred":["high-performance networking","low level configuration","deployment","support","monitoring","administration","troubleshooting"],"datePosted":"2026-04-18T15:55:02.425Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo Alto, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux systems, TCP/IP networking, systems programming, Rust, Python, Go, C++, container orchestration, container runtimes, infrastructure-as-code, high-performance networking, low level configuration, deployment, support, monitoring, administration, troubleshooting","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":400000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0f3a04da-d45"},"title":"Software Engineer, Platform","description":"<p><strong>About the role</strong></p>\n<p>We are looking for software engineers to join our Platform organisation. 
We build the foundational primitives that accelerate product development across Anthropic, and own infrastructure and systems that teams depend on to ship reliably and at scale.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Architect and optimise the critical development infrastructure that powers our AI product development, including dev environments, observability, and CI/CD pipelines.</li>\n<li>Partner closely with product teams to understand their development workflow and eliminate friction points.</li>\n<li>Work on problems where reliability and enterprise trust are the bar: token refresh at scale, admin controls that let IT govern what agents can do, proxy infrastructure that stays up when partner servers don&#39;t.</li>\n</ul>\n<p><strong>Platforms</strong></p>\n<ul>\n<li>Platform Acceleration: We work on maximising the developer productivity of product engineers at Anthropic.</li>\n<li>Service Infra: We build and maintain the core infrastructure that powers Anthropic&#39;s engineering organisation, from service mesh and observability systems to deployment pipelines and shared libraries.</li>\n<li>Multicloud: We build and maintain the infrastructure that enables Anthropic to operate across multiple cloud providers.</li>\n<li>Auth &amp; Identity: We build and maintain the critical infrastructure that powers identity and authentication across Anthropic&#39;s product suite.</li>\n<li>Connectivity: Our mission is to make Claude the most connected AI.</li>\n<li>API Distributability: The Claude API today is a rapidly growing platform serving developers and enterprises at scale.</li>\n<li>Platform Intelligence: We build the training systems that adapt Claude to specific customer workloads.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Have a minimum of 5 years of practical experience building backend product or platform systems, distributed systems, cloud-native products, developer tools, or external developer-facing 
products.</li>\n<li>Have strong fundamentals in service-oriented architectures, networking, and systems design.</li>\n<li>Are proficient in Python, Go, Rust, or similar systems languages.</li>\n<li>Have experience with cloud infrastructure (GCP, AWS, or Azure), container orchestration (Kubernetes), and/or multi-cloud networking.</li>\n</ul>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Annual compensation range: $320,000-$320,000 USD.</li>\n<li>Visa sponsorship available.</li>\n<li>Flexible work arrangements, including remote work options.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0f3a04da-d45","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5157844008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$320,000 USD","x-skills-required":["Python","Go","Rust","Cloud infrastructure","Container orchestration","Multi-cloud networking","Service-oriented architectures","Networking","Systems design"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:54:40.638Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Go, Rust, Cloud infrastructure, Container orchestration, Multi-cloud networking, Service-oriented architectures, Networking, Systems 
design","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":320000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_491db8e9-776"},"title":"Staff Site Reliability Engineer- Splunk Expert","description":"<p>We are seeking a highly technical Staff Site Reliability Engineer with deep expertise in Splunk and Grafana to own and evolve our observability ecosystem.</p>\n<p>As a Staff Site Reliability Engineer, you will move beyond simple monitoring to architect a comprehensive, scalable telemetry platform. You will be our subject-matter expert in Splunk optimisation, ensuring our logging architecture is performant, cost-effective, and deeply integrated with our automated workflows.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Splunk Architecture &amp; Optimisation: Lead the design and tuning of Splunk environments. Optimise indexer performance, search efficiency, and data models to ensure rapid troubleshooting and cost-efficiency.</li>\n</ul>\n<ul>\n<li>Advanced Visualisation: Architect and maintain sophisticated Grafana dashboards that correlate disparate data sources into a single pane of glass for real-time system health.</li>\n</ul>\n<ul>\n<li>Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.</li>\n</ul>\n<ul>\n<li>Pipeline Engineering: Optimise the collection, processing, and storage of telemetry data (Metrics, Logs, Traces) to ensure high reliability and low latency.</li>\n</ul>\n<ul>\n<li>Workflow Automation: Develop custom Splunk workflows and integrations that trigger automated responses to system events, reducing Mean Time to Resolution (MTTR).</li>\n</ul>\n<ul>\n<li>Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements through &#39;observability-driven 
development.&#39;</li>\n</ul>\n<p>Required skills and experience include:</p>\n<ul>\n<li>Splunk Mastery: Deep, hands-on experience with Splunk administration, search optimisation (SPL), and architecting complex data pipelines.</li>\n</ul>\n<ul>\n<li>Grafana Expertise: Proven ability to build actionable, intuitive dashboards in Grafana that go beyond simple charts to provide deep operational insights.</li>\n</ul>\n<ul>\n<li>SRE Mindset: 8+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.</li>\n</ul>\n<ul>\n<li>Programming Proficiency: Strong coding skills in Go, Python, or Ruby for building internal tools and automating observability workflows.</li>\n</ul>\n<ul>\n<li>Telemetry Standards: Hands-on experience with OpenTelemetry (OTel), Prometheus, or similar frameworks for instrumenting applications.</li>\n</ul>\n<ul>\n<li>Distributed Systems: Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/EKS).</li>\n</ul>\n<p>Bonus skills include:</p>\n<ul>\n<li>Tracing: Implementation of distributed tracing (Jaeger, Tempo, or Honeycomb) to visualise request flow across microservices.</li>\n</ul>\n<ul>\n<li>Security Observability: Experience using Splunk for security orchestration (SOAR) or SIEM-related workflows.</li>\n</ul>\n<ul>\n<li>Cloud Platforms: Experience managing observability native tools within AWS, Azure, or GCP.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_491db8e9-776","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Okta","sameAs":"https://www.okta.com/","logo":"https://logos.yubhub.co/okta.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/okta/jobs/6874616","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Splunk","Grafana","SRE","Go","Python","Ruby","OpenTelemetry","Prometheus","Linux","Networking","Container Orchestration"],"x-skills-preferred":["Tracing","Security Observability","Cloud Platforms"],"datePosted":"2026-04-18T15:54:34.221Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Bengaluru, India"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Splunk, Grafana, SRE, Go, Python, Ruby, OpenTelemetry, Prometheus, Linux, Networking, Container Orchestration, Tracing, Security Observability, Cloud Platforms"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d50772ab-afe"},"title":"Staff / Senior Software Engineer, Cloud Inference","description":"<p>We are seeking a Staff / Senior Software Engineer to join our Cloud Inference team. The successful candidate will design and build infrastructure that serves Claude across multiple cloud service providers (CSPs), accounting for differences in compute hardware, networking, APIs, and operational models.</p>\n<p>The ideal candidate will have significant software engineering experience, with a strong background in high-performance, large-scale distributed systems serving millions of users. 
They will also have experience building or operating services on at least one major cloud platform (AWS, GCP, or Azure), with exposure to Kubernetes, Infrastructure as Code or container orchestration.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and build infrastructure that serves Claude across multiple CSPs, accounting for differences in compute hardware, networking, APIs, and operational models</li>\n</ul>\n<ul>\n<li>Collaborate with CSP partner engineering teams to resolve operational issues, influence provider roadmaps, and stand up end-to-end serving on new cloud platforms</li>\n</ul>\n<ul>\n<li>Design and evolve CI/CD automation systems, including validation and deployment pipelines, that reliably ship new model versions to millions of users across cloud platforms without regressions</li>\n</ul>\n<ul>\n<li>Design interfaces and tooling abstractions across CSPs that enable cost-effective inference management, scale across providers, and reduce per-platform complexity</li>\n</ul>\n<ul>\n<li>Contribute to capacity planning and autoscaling strategies that dynamically match supply with demand across CSP validation and production workloads</li>\n</ul>\n<ul>\n<li>Optimise inference cost and performance across providers, designing workload placement and routing systems that direct requests to the most cost-effective accelerator and region</li>\n</ul>\n<ul>\n<li>Contribute to inference features that must work consistently across all platforms</li>\n</ul>\n<ul>\n<li>Analyse observability data across providers to identify performance bottlenecks, cost anomalies, and regressions, and drive remediation based on real-world production workloads</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>Significant software engineering experience, with a strong background in high-performance, large-scale distributed systems serving millions of users</li>\n</ul>\n<ul>\n<li>Experience building or operating services on at least one major cloud platform (AWS, GCP, or Azure), with exposure 
to Kubernetes, Infrastructure as Code or container orchestration</li>\n</ul>\n<ul>\n<li>Strong interest in inference</li>\n</ul>\n<ul>\n<li>Thrive in cross-functional collaboration with both internal teams and external partners</li>\n</ul>\n<ul>\n<li>Are a fast learner who can quickly ramp up on new technologies, hardware platforms, and provider ecosystems</li>\n</ul>\n<ul>\n<li>Are highly autonomous and self-driven, taking ownership of problems end-to-end with a bias toward flexibility and high-impact work</li>\n</ul>\n<ul>\n<li>Pick up slack, even when it goes outside your job description</li>\n</ul>\n<p>Preferred skills:</p>\n<ul>\n<li>Direct experience working with CSP partner teams to scale infrastructure or products across multiple platforms, navigating differences in networking, security, privacy, billing, and managed service offerings</li>\n</ul>\n<ul>\n<li>A background in building platform-agnostic tooling or abstraction layers that work across cloud providers</li>\n</ul>\n<ul>\n<li>Hands-on experience with capacity management, cost optimisation, or resource planning at scale across heterogeneous environments</li>\n</ul>\n<ul>\n<li>Strong familiarity with LLM inference optimisation, batching, caching, and serving strategies</li>\n</ul>\n<ul>\n<li>Experience with Machine learning infrastructure including GPUs, TPUs, Trainium, or other AI accelerators</li>\n</ul>\n<ul>\n<li>Background designing and building CI/CD systems that automate deployment and validation across cloud environments</li>\n</ul>\n<ul>\n<li>Solid understanding of multi-region deployments, geographic routing, and global traffic management</li>\n</ul>\n<ul>\n<li>Proficiency in Python or Rust</li>\n</ul>\n<p>Salary Range: $300,000-$485,000 USD</p>\n<p>Experience Level: Staff</p>\n<p>Employment Type: Full-time</p>\n<p>Workplace Type: Hybrid</p>\n<p>Category: Engineering</p>\n<p>Industry: Technology</p>\n<p>Required Skills:</p>\n<ul>\n<li>High-performance, large-scale distributed 
systems</li>\n</ul>\n<ul>\n<li>Cloud computing (AWS, GCP, Azure)</li>\n</ul>\n<ul>\n<li>Kubernetes</li>\n</ul>\n<ul>\n<li>Infrastructure as Code</li>\n</ul>\n<ul>\n<li>Container orchestration</li>\n</ul>\n<ul>\n<li>Inference</li>\n</ul>\n<ul>\n<li>Cross-functional collaboration</li>\n</ul>\n<ul>\n<li>Autonomy and self-driven</li>\n</ul>\n<ul>\n<li>Platform-agnostic tooling</li>\n</ul>\n<ul>\n<li>Capacity management</li>\n</ul>\n<ul>\n<li>Cost optimisation</li>\n</ul>\n<ul>\n<li>Resource planning</li>\n</ul>\n<ul>\n<li>LLM inference optimisation</li>\n</ul>\n<ul>\n<li>Machine learning infrastructure</li>\n</ul>\n<ul>\n<li>CI/CD systems</li>\n</ul>\n<ul>\n<li>Multi-region deployments</li>\n</ul>\n<ul>\n<li>Geographic routing</li>\n</ul>\n<ul>\n<li>Global traffic management</li>\n</ul>\n<ul>\n<li>Python</li>\n</ul>\n<ul>\n<li>Rust</li>\n</ul>\n<p>Preferred Skills:</p>\n<ul>\n<li>Direct experience working with CSP partner teams</li>\n</ul>\n<ul>\n<li>Building platform-agnostic tooling</li>\n</ul>\n<ul>\n<li>Hands-on experience with capacity management</li>\n</ul>\n<ul>\n<li>Strong familiarity with LLM inference optimisation</li>\n</ul>\n<ul>\n<li>Experience with Machine learning infrastructure</li>\n</ul>\n<ul>\n<li>Background designing and building CI/CD systems</li>\n</ul>\n<ul>\n<li>Solid understanding of multi-region deployments</li>\n</ul>\n<ul>\n<li>Proficiency in Python or Rust</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d50772ab-afe","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5107466008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$300,000-$485,000 
USD","x-skills-required":["high-performance, large-scale distributed systems","cloud computing (AWS, GCP, Azure)","kubernetes","infrastructure as code","container orchestration","inference","cross-functional collaboration","autonomy and self-driven","platform-agnostic tooling","capacity management","cost optimisation","resource planning","llm inference optimisation","machine learning infrastructure","ci/cd systems","multi-region deployments","geographic routing","global traffic management","python","rust"],"x-skills-preferred":["direct experience working with csp partner teams","building platform-agnostic tooling","hands-on experience with capacity management","strong familiarity with llm inference optimisation","experience with machine learning infrastructure","background designing and building ci/cd systems","solid understanding of multi-region deployments","proficiency in python or rust"],"datePosted":"2026-04-18T15:53:24.048Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"engineering","industry":"technology","skills":"high-performance, large-scale distributed systems, cloud computing (AWS, GCP, Azure), kubernetes, infrastructure as code, container orchestration, inference, cross-functional collaboration, autonomy and self-driven, platform-agnostic tooling, capacity management, cost optimisation, resource planning, llm inference optimisation, machine learning infrastructure, ci/cd systems, multi-region deployments, geographic routing, global traffic management, python, rust, direct experience working with csp partner teams, building platform-agnostic tooling, hands-on experience with capacity management, strong familiarity with llm inference optimisation, experience with machine learning infrastructure, background designing and building ci/cd systems, solid understanding of multi-region deployments, proficiency in python or 
rust","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":300000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_42af3f66-4fc"},"title":"AI Infrastructure Architect","description":"<p>Secure Every Identity, from AI to Human</p>\n<p>Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organisations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.</p>\n<p>This is an opportunity to do career-defining work. We&#39;re all in on this mission. If you are too, let&#39;s talk.</p>\n<p><strong>AI Infrastructure Architect</strong></p>\n<p>About the Role</p>\n<p>We are looking for a smart and versatile AI Infrastructure Architect to build and evolve the AI infrastructure and platform that powers our identity security solutions. Your work will enable internal teams and product groups to integrate AI capabilities safely, securely, and at scale, empowering Okta’s mission to protect millions of digital identities worldwide. 
While your primary focus will be to architect scalable, secure, and resilient infrastructure supporting AI-driven tools, frameworks, and identity services, we value someone who isn’t afraid to get hands-on when needed to help solve complex challenges and drive projects forward.</p>\n<p><strong>Key Responsibilities</strong></p>\n<ul>\n<li>Lead AI enablement initiatives, including proof-of-concepts for emerging AI infrastructure technologies and integration approaches.</li>\n</ul>\n<ul>\n<li>Collaborate cross-functionally with engineering, security, data science, and product teams to align AI platform architecture with business and security goals.</li>\n</ul>\n<ul>\n<li>Architect scalable, resilient, and secure AI infrastructure that supports AI-powered tools and features across Okta’s Identity Platform.</li>\n</ul>\n<ul>\n<li>Lead infrastructure decisions across AWS, GCP, or hybrid environments with a focus on secure identity data handling</li>\n</ul>\n<ul>\n<li>Develop and maintain infrastructure-as-code frameworks (e.g., Terraform, Helm) to ensure consistent, reproducible deployment of AI services</li>\n</ul>\n<ul>\n<li>Champion security and compliance by embedding data privacy and identity protection standards directly into the AI platform and infrastructure design.</li>\n</ul>\n<ul>\n<li>Serve as the key advocate and strategist for AI-driven efficiency initiatives across infrastructure platform teams and pre-production systems.</li>\n</ul>\n<ul>\n<li>Implement robust MLOps practices, such as model evaluation, rollback strategies, and A/B testing, to guarantee the reliability and governance of AI in production.</li>\n</ul>\n<ul>\n<li>Drive continuous innovation by staying current with AI and cloud infrastructure trends and evangelizing best practices internally.</li>\n</ul>\n<p><strong>Desired Qualifications</strong></p>\n<ul>\n<li>10+ years in infrastructure or software engineering, with ≥ 2 years building AI/ML systems</li>\n</ul>\n<ul>\n<li>Exceptional systems 
level thinking and a track record in architecting and building enterprise grade infrastructure</li>\n</ul>\n<ul>\n<li>Deep expertise in cloud platforms (AWS, GCP), distributed systems, and container orchestration (Kubernetes)</li>\n</ul>\n<ul>\n<li>Expected to be very hands-on in order to create, review, and contribute large chunks of quality code</li>\n</ul>\n<p><strong>Preferred</strong></p>\n<ul>\n<li>Experience in identity, security, fraud, or risk analytics domains.</li>\n</ul>\n<ul>\n<li>Experience operationalizing large language models or foundation models in production environments.</li>\n</ul>\n<ul>\n<li>Contributions to MLOps or infrastructure open-source projects.</li>\n</ul>\n<p><strong>What You’ll Gain</strong></p>\n<ul>\n<li>Opportunity to lead infrastructure shaping AI systems that protect millions of identity transactions.</li>\n</ul>\n<ul>\n<li>Be at the core of building efficient and AI powered enterprise grade solutions that touch internal and external customers alike.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_42af3f66-4fc","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Okta","sameAs":"https://www.okta.com/","logo":"https://logos.yubhub.co/okta.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/okta/jobs/7122284","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$235,000-$353,000 USD","x-skills-required":["cloud platforms","distributed systems","container orchestration","infrastructure-as-code","MLOps","AI infrastructure","security and compliance","data privacy and identity protection"],"x-skills-preferred":["identity and security","fraud and risk analytics","large language models and foundation models","open-source 
projects"],"datePosted":"2026-04-18T15:53:19.138Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Bellevue, Washington"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud platforms, distributed systems, container orchestration, infrastructure-as-code, MLOps, AI infrastructure, security and compliance, data privacy and identity protection, identity and security, fraud and risk analytics, large language models and foundation models, open-source projects","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":235000,"maxValue":353000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_74be15a1-bce"},"title":"Software Engineer, Inference Deployment","description":"<p>Our mandate is to make inference deployment boring and unattended. We serve Claude to millions of users across GPUs, TPUs, and Trainium, and every model update must reach production safely, quickly, and without disrupting service. As a Software Engineer on the Launch Engineering team, you&#39;ll design and build the deployment infrastructure that moves inference code from merge to production.</p>\n<p>This is a resource-constrained optimization problem at its core: validation and deployment consume the same accelerator chips that serve customer traffic: your deploys compete with live user requests for the same hardware. Every model brings different fleet sizes, startup times, and correctness requirements, so the system must adapt continuously. 
You&#39;ll build systems that navigate these constraints: orchestrating validation, scheduling deployments intelligently, and driving down cycle time from merge to production.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Own deployment orchestration that continuously moves validated inference builds into production across GPU, TPU, and Trainium fleets, unattended under normal conditions</li>\n</ul>\n<ul>\n<li>Improve capacity-aware deployment scheduling to maximize deployment throughput against constrained accelerator budgets and variable fleet sizes</li>\n</ul>\n<ul>\n<li>Extend deployment observability: dashboards and tooling that answer &quot;what code is running in production,&quot; &quot;where is my commit,&quot; and &quot;what validation passed for this deploy&quot;</li>\n</ul>\n<ul>\n<li>Drive down cycle time from code merge to production with pipeline architectures that minimize serial dependencies and maximize parallelism</li>\n</ul>\n<ul>\n<li>Optimize fleet rollout strategies for large-scale deployments across thousands of GPU, TPU, and Trainium chips, minimizing disruption to serving capacity</li>\n</ul>\n<ul>\n<li>Evolve self-service model onboarding so that new models can be added to the continuous deployment pipeline without Launch Engineering involvement</li>\n</ul>\n<ul>\n<li>Partner across the Inference organization with teams owning validation, autoscaling, and model routing to integrate deployment automation with their systems</li>\n</ul>\n<p>You May Be a Good Fit If You Have:</p>\n<ul>\n<li>5+ years of experience building deployment, release, or delivery infrastructure at scale</li>\n</ul>\n<ul>\n<li>Strong software engineering skills with experience designing systems that manage complex state machines and multi-stage pipelines</li>\n</ul>\n<ul>\n<li>Experience with deployment systems where resource constraints shape the design, whether that&#39;s fleet capacity, network bandwidth, hardware availability, or coordinated rollout 
windows</li>\n</ul>\n<ul>\n<li>A track record of building automation that measurably improves deployment velocity and reliability</li>\n</ul>\n<ul>\n<li>Proficiency with Kubernetes-based deployments, rolling update mechanics, and container orchestration</li>\n</ul>\n<ul>\n<li>Comfort working across the stack, from backend services and databases to CLI tools and web UIs</li>\n</ul>\n<ul>\n<li>Strong communication skills and the ability to work closely with oncall engineers, model teams, and infrastructure partners</li>\n</ul>\n<p>Strong Candidates May Also Have:</p>\n<ul>\n<li>Experience with ML inference or training infrastructure deployment, particularly across multiple accelerator types (GPU, TPU, Trainium)</li>\n</ul>\n<ul>\n<li>Background in capacity planning or resource-constrained scheduling (e.g., bin-packing, fleet management, job scheduling with hardware affinity)</li>\n</ul>\n<ul>\n<li>Experience with progressive delivery in systems with long validation cycles: canary/soak testing, blue-green deployments, traffic shifting, automated rollback</li>\n</ul>\n<ul>\n<li>Experience at companies with large-scale release engineering challenges (mobile release trains, monorepo deployments, multi-datacenter rollouts)</li>\n</ul>\n<ul>\n<li>Experience with Python and/or Rust in production systems</li>\n</ul>\n<p>The annual compensation range for this role is $320,000-$485,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_74be15a1-bce","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5111745008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$485,000 USD","x-skills-required":["deployment 
infrastructure","software engineering","complex state machines","multi-stage pipelines","Kubernetes-based deployments","container orchestration","backend services","databases","CLI tools","web UIs"],"x-skills-preferred":["ML inference","training infrastructure deployment","capacity planning","resource-constrained scheduling"," deployments","progressive delivery","Python","Rust"],"datePosted":"2026-04-18T15:53:04.252Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"deployment infrastructure, software engineering, complex state machines, multi-stage pipelines, Kubernetes-based deployments, container orchestration, backend services, databases, CLI tools, web UIs, ML inference, training infrastructure deployment, capacity planning, resource-constrained scheduling,  deployments, progressive delivery, Python, Rust","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0396ac1c-dad"},"title":"Senior Staff Engineer, Cloud Economics","description":"<p>Reddit is a community of communities. It&#39;s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet.</p>\n<p>The Ads Foundations organization is responsible for the technical backbone powering Ads Monetization at scale. Within this ecosystem, efficient resource utilization is critical.</p>\n<p>We are seeking a Senior Staff Engineer to serve as the Cloud Resources Technical Owner for the Ads Domain. 
You will be the primary engineering point of contact for the Senior Director in Ads and Cloud Operations/Resources (COR &amp; Opex) stakeholders.</p>\n<p><strong>Responsibilities</strong></p>\n<p>Technical Vision &amp; Strategy</p>\n<ul>\n<li>Define and drive the technical strategy for Cloud Resource management within Ad first, ensuring that cost accountability is built into the architecture of our systems.</li>\n<li>High-Fidelity Investment Modeling: Elevate cloud estimation from guesswork to a rigorous engineering discipline. You will lead the high-quality forecasting of new cloud investments and efficiency projects, designing data-driven models to validate technical ROI before builds happen</li>\n<li>Design and implement a roadmap for Cost Observability 2.0, moving beyond simple reporting to real-time, service/team-level spend attribution and automated anomaly detection.</li>\n</ul>\n<p>Engineering &amp; Tooling Leadership</p>\n<ul>\n<li>Design and build internal platforms that programmatically enforce PnL accountability. You will engineer (or collaborate with Core Infrastructure partners) to deliver the dashboards, alerts, and governance tools that every Ads team relies on to manage their cloud footprint.</li>\n<li>Architect automated frameworks for validating cost estimates and forecasting, replacing manual spreadsheets with data-driven software solutions.</li>\n</ul>\n<p>Scale &amp; Optimization</p>\n<ul>\n<li>Fight for observability by instrumenting deep telemetry into our cloud infrastructure. You will be hands-on in identifying inefficiencies (e.g., underutilized clusters, uncompressed data flows) and re-architecting critical paths for cost reduction.</li>\n<li>Lead the technical validation of vendor and 3rd-party tool integration, ensuring we extract maximum engineering value from every dollar spent.</li>\n</ul>\n<p>Cultural &amp; Technical Stewardship</p>\n<ul>\n<li>Act as a role model for the Ads domain and the wider company. 
You will set the standard for how engineering teams think about Cost as a Non Functional Requirement, eventually scaling these patterns to other domains.</li>\n<li>Partner with Finance and Engineering leadership to translate Cloud Spend into actionable engineering tasks (e.g., refactor Service X to use Spot instances).</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>10+ years of software engineering experience, with a strong focus on public cloud infrastructure (AWS/GCP/Azure) and large-scale distributed systems.</li>\n<li>Engineer-First Mindset: You are comfortable writing code (Go, Python, Java) to solve infrastructure problems. You don&#39;t just ask for a report; you build the API that generates it.</li>\n<li>Deep Cloud Expertise: You have mastery over Kubernetes, container orchestration, and cloud-native storage, understanding exactly how architectural choices impact the bottom line.</li>\n<li>Operational Excellence: Proven track record of building observability pipelines (Prometheus, Grafana, Datadog) that drive operational and financial alerts.</li>\n<li>Influential Leader: Skilled at driving clarity in ambiguous spaces. 
You can convince a Principal Engineer to refactor their service for cost efficiency because you can prove the technical and business value.</li>\n</ul>\n<p><strong>Bonus Points</strong></p>\n<ul>\n<li>Experience building custom FinOps tooling or internal developer platforms.</li>\n<li>Background in performance engineering or capacity planning for high-traffic ad tech environments.</li>\n<li>Contributions to open-source projects related to cloud efficiency or observability.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0396ac1c-dad","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Reddit Inc.","sameAs":"https://www.redditinc.com","logo":"https://logos.yubhub.co/redditinc.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/reddit/jobs/7628291","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$232,500-$325,500 USD","x-skills-required":["public cloud infrastructure","large-scale distributed systems","Kubernetes","container orchestration","cloud-native storage","observability pipelines","Prometheus","Grafana","Datadog"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:51:43.900Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote - United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"public cloud infrastructure, large-scale distributed systems, Kubernetes, container orchestration, cloud-native storage, observability pipelines, Prometheus, Grafana, 
Datadog","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":232500,"maxValue":325500,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_755c5895-997"},"title":"Manager, Product Engineering","description":"<p>At Instabase, we&#39;re committed to democratizing access to cutting-edge AI innovation. Our market opportunity is vast, with customers representing some of the largest and most complex organisations in the world. As a Manager, Product Engineering, you will lead a team responsible for the full-stack development of enterprise software, working closely with cross-functional teams to design and deliver high-impact solutions.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Team Leadership – Build, manage, and develop a team of high-performing engineers, providing mentorship and career development while fostering a collaborative and inclusive culture.</li>\n<li>Cross-Functional Collaboration – Partner with product, design, and technical writing teams to define the roadmap and drive execution.</li>\n<li>End-to-End Execution – Oversee the entire software development lifecycle, from capacity planning and roadmapping to prototyping and production deployment.</li>\n<li>Technical Leadership – Contribute to technical discussions and architectural decisions within your product area.</li>\n<li>Quality &amp; Operational Excellence – Establish and uphold best practices to maintain a high-quality bar for all deliverables, ensuring reliability, scalability, and usability.</li>\n<li>Innovation &amp; AI Integration – Leverage modern AI tools to improve team productivity and enhance product capabilities.</li>\n</ul>\n<p>About You:</p>\n<ul>\n<li>Experience – 5+ years of engineering management experience, with a track record of building and leading high-performing teams.</li>\n<li>AI &amp; Data Expertise – Strong background in AI, ML, and 
data-driven products, with experience building and scaling intelligent applications.</li>\n<li>Startup Mentality – Comfortable operating in a fast-paced startup environment, navigating ambiguity, and driving impactful results.</li>\n<li>Technical Proficiency – Deep knowledge of modern technology stacks, including cloud infrastructure, container orchestration systems, TypeScript, React, and related tools.</li>\n<li>SaaS &amp; Enterprise Experience – Proven ability to deliver SaaS-based enterprise software solutions at scale.</li>\n<li>Process &amp; Productivity – Experience implementing SDLC, and leveraging modern productivity software (Jira, Confluence, Figma, etc.).</li>\n<li>AI-Driven Development – Passion for integrating modern AI tools to optimise development workflows.</li>\n</ul>\n<p>Compensation: The base salary range for this role is $280,000 to $300,000 + bonus, equity, and benefits.</p>\n<p>Benefits:</p>\n<ul>\n<li>Flexible PTO: Because life is better when you actually live it!</li>\n<li>Comprehensive Coverage: Top-notch medical, dental, and vision insurance.</li>\n<li>401(k) with Matching: We’ve got your back for a secure future.</li>\n<li>Parental Leave &amp; Fertility Benefits: Supporting you in growing your family, your way.</li>\n<li>Therapy Sessions Covered: Mental health matters, 10 free sessions through Samata Health.</li>\n<li>Wellness Stipend: For gym memberships, fitness tech, or whatever keeps you thriving.</li>\n<li>Lunch on Us: Enjoy a lunch credit when you’re in the office.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_755c5895-997","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Instabase","sameAs":"https://www.instabase.com/","logo":"https://logos.yubhub.co/instabase.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/instabase/jobs/8419974002","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$280,000 to $300,000 + bonus, equity, and benefits","x-skills-required":["AI","ML","data-driven products","cloud infrastructure","container orchestration systems","TypeScript","React","SaaS-based enterprise software solutions","SDLC","productivity software"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:51:42.202Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"AI, ML, data-driven products, cloud infrastructure, container orchestration systems, TypeScript, React, SaaS-based enterprise software solutions, SDLC, productivity software","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":280000,"maxValue":300000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_6556c9a6-357"},"title":"Senior Professional Services, Technical Architect - AI","description":"<p>As a Senior Professional Services Technical Architect, AI at GitLab, you&#39;ll be an embedded expert who helps customers move from ideas to production. 
You&#39;ll work directly with customer teams as a consultative partner, running in-depth discovery to understand their environment and priorities, then designing and delivering solutions that connect business goals to architecture and implementation.</p>\n<p>This is a deeply technical, customer-facing role where you&#39;ll build and deploy Custom Agents, Custom Flows, and CI/CD integrations. You&#39;ll own delivery end-to-end, from prototype through production support. You&#39;ll partner closely with Professional Services and Customer Success stakeholders, including Professional Services Engineers, Project Managers, Customer Success Managers, and Solution Architects.</p>\n<p>Some examples of our projects include leading customer discovery and defining a prioritized GitLab Duo Agent Platform use case roadmap tied to clear success criteria, designing and delivering production-ready GitLab Duo Agent Platform implementations, building rapid prototypes to demonstrate the art of the possible with agentic AI, and integrating the GitLab Duo Agent Platform with customer systems and workflows using GitLab APIs, pipeline configuration, and infrastructure as code.</p>\n<p>What you&#39;ll do:</p>\n<p>Conduct deep customer discovery to understand business goals, technical constraints, and organizational dynamics, and translate them into clear problem statements and a prioritized use case plan for GitLab Duo Agent Platform.</p>\n<p>Partner with customer stakeholders across engineering, security, compliance, and business teams to align on success criteria, milestones, and adoption strategy for AI workflows in production.</p>\n<p>Design, build, and deploy production-ready GitLab Duo Agent Platform solutions, including Custom Agents, Custom Flows, and CI/CD integrations that map to validated customer use cases.</p>\n<p>Embed with customer engineering teams to deliver hands-on implementations end-to-end, from prototype to production rollout, troubleshooting, and 
optimization.</p>\n<p>Configure and integrate platform foundations such as runners, network access, runtime sandboxing, GitLab APIs (REST and GraphQL), and AI governance controls (for example, role-based access control and model policies) to meet enterprise requirements.</p>\n<p>Measure and communicate impact using DORA (DevOps Research and Assessment) metrics, AI Impact Analytics, and Value Stream Analytics, and use those insights to guide iteration and expansion of successful use cases.</p>\n<p>Codify repeatable deployment patterns, reusable assets, and lessons learned, contributing back to GitLab through documentation, accelerators, and product feedback informed by field experience.</p>\n<p>Travel up to 50% for customer site engagements and company onsite events to support delivery, onboarding, and stakeholder alignment.</p>\n<p>What you&#39;ll bring:</p>\n<p>Demonstrated experience leading customer-facing technical engagements, from discovery through production rollout, with ownership of outcomes.</p>\n<p>Proficiency in Python, with experience building and operating production-grade applications and integrations.</p>\n<p>Experience delivering with GitLab CI/CD, including pipeline design, YAML configuration, and using GitLab APIs (REST and GraphQL).</p>\n<p>Hands-on experience with infrastructure as code (for example, Terraform or Ansible) and deploying solutions into enterprise environments.</p>\n<p>Working knowledge of large language model (LLM) capabilities and limitations, including prompt engineering and building agentic workflows (such as Custom Agents and Custom Flows).</p>\n<p>Experience with Docker, container orchestration concepts, and runner configuration in secure environments.</p>\n<p>Familiarity with DevSecOps practices, including security controls, access management, and compliance requirements that impact deployment design.</p>\n<p>Strong written and verbal communication skills, with the ability to partner closely with customer stakeholders and 
translate business goals into technical plans in a remote, asynchronous environment.</p>\n<p>About the team:</p>\n<p>GitLab&#39;s Professional Services organization within Customer Success helps customers get value from the GitLab Duo Agent Platform. We&#39;re a remote, asynchronous team that works closely with customer-facing colleagues to support successful deployments. We focus on turning what we learn in the field into reusable assets, clearer documentation, and product feedback that helps improve GitLab Duo Agent Platform for future customers.</p>\n<p>The base salary range for this role’s listed level is currently for residents of the United States only. This range is intended to reflect the role&#39;s base salary rate in locations throughout the US. Grade level and salary ranges are determined through interviews and a review of education, experience, knowledge, skills, abilities of the applicant, equity with other team members, alignment with market data, and geographic location. The base salary range does not include any bonuses, equity, or benefits. See more information on our benefits and equity. Sales roles are also eligible for incentive pay targeted at up to 100% of the offered base salary. 
United States Salary Range $164,880-$247,320 USD</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_6556c9a6-357","directApply":true,"hiringOrganization":{"@type":"Organization","name":"GitLab","sameAs":"https://about.gitlab.com/","logo":"https://logos.yubhub.co/about.gitlab.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/gitlab/jobs/8334735002","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$164,880-$247,320 USD","x-skills-required":["Python","GitLab CI/CD","Infrastructure as Code","Docker","Container Orchestration","DevSecOps","Large Language Model (LLM)","Prompt Engineering","Agentic Workflows"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:51:17.877Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote, US"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, GitLab CI/CD, Infrastructure as Code, Docker, Container Orchestration, DevSecOps, Large Language Model (LLM), Prompt Engineering, Agentic Workflows","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":164880,"maxValue":247320,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f296b6b0-e66"},"title":"Senior Software Security Engineer","description":"<p>Job Title: Senior Software Security Engineer</p>\n<p>About the Role: The Security Engineering team&#39;s mission is to safeguard our AI systems and maintain the trust of our users and society at large. 
Whether we&#39;re developing critical security infrastructure, building secure development practices, or partnering with our research and product teams, we are committed to operating as a world-class security organization and keeping the safety and trust of our users at the forefront of everything we do.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Build security for large-scale AI clusters, implementing robust cloud security architecture including IAM, network segmentation, and encryption controls</li>\n</ul>\n<ul>\n<li>Design secure-by-design workflows, secure CI/CD pipelines across our services, help build secure cloud infrastructure, with expertise in various cloud environments, Kubernetes security, container orchestration and identity management</li>\n</ul>\n<ul>\n<li>Ship and operate secure, high-reliability services using Infrastructure-as-Code (IaC) practices and GitOps workflows</li>\n</ul>\n<ul>\n<li>Apply deep expertise in threat modeling and risk assessment to secure complex multi cloud environments</li>\n</ul>\n<ul>\n<li>Mentor engineers and contribute to hiring and growth of the Security team</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>5-15+ years of software engineering experience implementing and maintaining critical systems at scale</li>\n</ul>\n<ul>\n<li>Bachelor&#39;s degree in Computer Science/Software Engineering or equivalent industry experience</li>\n</ul>\n<ul>\n<li>Strong software engineering skills in Python or at least one systems language (Go, Rust, C/C++)</li>\n</ul>\n<ul>\n<li>Experience managing infrastructure at scale with DevOps and cloud automation best practices</li>\n</ul>\n<ul>\n<li>Track record of driving engineering excellence through high standards, constructive code reviews, and mentorship</li>\n</ul>\n<ul>\n<li>Proven ability to lead cross-functional security initiatives and navigate complex organizational dynamics</li>\n</ul>\n<ul>\n<li>Outstanding communication skills, translating technical concepts effectively across all 
organizational levels</li>\n</ul>\n<ul>\n<li>Demonstrated success in bringing clarity and ownership to ambiguous technical problems</li>\n</ul>\n<ul>\n<li>Strong systems thinking with ability to identify and mitigate risks in complex environments</li>\n</ul>\n<ul>\n<li>Low ego, high empathy engineer who attracts talent and supports diverse, inclusive teams</li>\n</ul>\n<ul>\n<li>Experience supporting fast-paced startup engineering teams</li>\n</ul>\n<ul>\n<li>Passionate about AI safety and alignment, with keen interest in making AI systems more interpretable and aligned with human values</li>\n</ul>\n<p>Salary: The annual compensation range for this role is £240,000-£325,000 GBP.</p>\n<p>Experience Level: senior Employment Type: full-time Workplace Type: hybrid Category: Engineering Industry: Technology Salary Range: £240,000-£325,000 GBP Required Skills:</p>\n<ul>\n<li>Cloud security architecture</li>\n<li>IAM</li>\n<li>Network segmentation</li>\n<li>Encryption controls</li>\n<li>Kubernetes security</li>\n<li>Container orchestration</li>\n<li>Identity management</li>\n<li>Infrastructure-as-Code (IaC)</li>\n<li>GitOps</li>\n<li>Threat modeling</li>\n<li>Risk assessment</li>\n<li>DevOps</li>\n<li>Cloud automation</li>\n<li>Python</li>\n<li>Go</li>\n<li>Rust</li>\n<li>C/C++</li>\n</ul>\n<p>Preferred Skills:</p>\n<ul>\n<li>Secure-by-design workflows</li>\n<li>CI/CD pipelines</li>\n<li>Secure cloud infrastructure</li>\n<li>Cloud environments</li>\n<li>Containerization</li>\n<li>Identity and access management</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f296b6b0-e66","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5022845008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£240,000-£325,000 GBP","x-skills-required":["Cloud security architecture","IAM","Network segmentation","Encryption controls","Kubernetes security","Container orchestration","Identity management","Infrastructure-as-Code (IaC)","GitOps","Threat modeling","Risk assessment","DevOps","Cloud automation","Python","Go","Rust","C/C++"],"x-skills-preferred":["Secure-by-design workflows","CI/CD pipelines","Secure cloud infrastructure","Cloud environments","Containerization","Identity and access management"],"datePosted":"2026-04-18T15:51:17.687Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cloud security architecture, IAM, Network segmentation, Encryption controls, Kubernetes security, Container orchestration, Identity management, Infrastructure-as-Code (IaC), GitOps, Threat modeling, Risk assessment, DevOps, Cloud automation, Python, Go, Rust, C/C++, Secure-by-design workflows, CI/CD pipelines, Secure cloud infrastructure, Cloud environments, Containerization, Identity and access management","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":240000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f838587f-1ee"},"title":"Software Engineer, Kubernetes","description":"<p>We&#39;re looking for a skilled Software Engineer to join our 
team and help us build and scale our Kubernetes environment. As a Software Engineer, you will play a key part in ensuring the availability, reliability, and scalability of our cloud infrastructure. You will drive operational excellence, implement robust automation, and help shape the systems that keep our cloud running smoothly.</p>\n<p>Key Responsibilities:</p>\n<ul>\n<li>Build, operate, and scale Kubernetes-based production infrastructure that delivers our products with high reliability and performance.</li>\n<li>Develop automation, tooling, and infrastructure as code in Go and other infrastructure-focused languages to enable zero-touch operations, rapid recovery, and seamless deployments.</li>\n<li>Design, implement, and maintain monitoring, alerting, and observability solutions, leveraging the Grafana ecosystem and related tools, to proactively identify and resolve production issues.</li>\n<li>Drive incident response efforts, participate in on-call rotations, and lead root cause analysis to prevent recurrence and improve incident handling processes.</li>\n<li>Partner with internal and cross-functional teams to ensure platform capabilities meet rigorous operational requirements and customer SLAs.</li>\n<li>Engineer for resiliency, implementing best practices for redundancy, fault tolerance, and disaster recovery across complex distributed systems.</li>\n<li>Advocate for security, reliability, and performance improvements throughout the stack, continuously seeking opportunities to strengthen operational standards.</li>\n<li>Contribute to the development of custom Kubernetes operators and intelligent orchestration frameworks that optimize AI workload performance and resource utilization at scale.</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>3+ years of experience in production engineering, SRE, or large-scale infrastructure/platform roles.</li>\n<li>Knowledgeable in Kubernetes administration, container orchestration, and microservices architectures, with a bias for 
automating every aspect of operations.</li>\n<li>Proven track record managing high-uptime, customer-facing systems in a fast-moving environment, with experience delivering measurable improvements in reliability and performance.</li>\n<li>Experience in monitoring, observability, and incident management using tools like Prometheus, Grafana, Datadog, Splunk, Loki, or VictoriaMetrics.</li>\n<li>Deep understanding of Linux systems and infrastructure-focused programming, especially in Go and Bash.</li>\n<li>Strong analytical skills and ability to troubleshoot complex production issues.</li>\n<li>Excellent communication skills and ability to share knowledge with technical and non-technical stakeholders.</li>\n</ul>\n<p>What Success Looks Like:</p>\n<ul>\n<li>Deliver stable, robust, and highly-available systems that consistently meet or exceed uptime and performance targets.</li>\n<li>Champion initiatives that drive automation, reduce operational toil, and increase the efficiency of incident response.</li>\n<li>Actively contribute to a blameless culture of learning, mentoring others in operational best practices and production engineering principles.</li>\n<li>Help CoreWeave maintain industry leadership through flawless execution in supporting demanding, AI-powered workloads at scale.</li>\n</ul>\n<p>Why CoreWeave?</p>\n<ul>\n<li>We work hard, have fun, and move fast!</li>\n<li>We&#39;re in an exciting stage of hyper-growth that you won&#39;t want to miss out on.</li>\n<li>We&#39;re not afraid of a little chaos, and we&#39;re constantly learning.</li>\n<li>Our team cares deeply about how we build our product and how we work together, which is represented through our core values:</li>\n</ul>\n<ul>\n<li>Be Curious at Your Core</li>\n<li>Act Like an Owner</li>\n<li>Empower Employees</li>\n<li>Deliver Best-in-Class Client Experiences</li>\n<li>Achieve More Together</li>\n</ul>\n<p>We support and encourage an entrepreneurial outlook and independent thinking. 
We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization&#39;s growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!</p>\n<p>The base salary range for this role is $120,000 to $176,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>\n<p>What We Offer:</p>\n<ul>\n<li>The range we&#39;ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. 
These include qualifications, experience, interview performance, and location.</li>\n<li>In addition to a competitive salary, we offer a variety of benefits to support your needs, including:</li>\n</ul>\n<ul>\n<li>Medical, dental, and vision insurance - 100% paid for by CoreWeave</li>\n<li>Company-paid Life Insurance</li>\n<li>Voluntary supplemental life insurance</li>\n<li>Short and long-term disability insurance</li>\n<li>Flexible Spending Account</li>\n<li>Health Savings Account</li>\n<li>Tuition Reimbursement</li>\n<li>Ability to Participate in Employee Stock Purchase Program (ESPP)</li>\n<li>Mental Wellness Benefits through Spring Health</li>\n<li>Family-Forming support provided by Carrot</li>\n<li>Paid Parental Leave</li>\n<li>Flexible, full-service childcare support with Kinside</li>\n<li>401(k) with a generous employer match</li>\n<li>Flexible PTO</li>\n<li>Catered lunch each day in our office and data center locations</li>\n<li>A casual work environment</li>\n<li>A work culture focused on innovative disruption</li>\n</ul>\n<p>Our Workplace:</p>\n<ul>\n<li>While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. 
Teams also gather quarterly to support collaboration.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f838587f-1ee","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4577764006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$120,000 to $176,000","x-skills-required":["Kubernetes administration","container orchestration","microservices architectures","Go","Bash","Linux systems","monitoring","observability","incident management","Prometheus","Grafana","Datadog","Splunk","Loki","VictoriaMetrics"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:49:38.881Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes administration, container orchestration, microservices architectures, Go, Bash, Linux systems, monitoring, observability, incident management, Prometheus, Grafana, Datadog, Splunk, Loki, VictoriaMetrics","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":120000,"maxValue":176000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_248927c8-76d"},"title":"Software Engineer, Platform","description":"<p><strong>About the role</strong></p>\n<p>We are looking for software engineers to join our Platform organisation. 
We build the foundational primitives that accelerate product development across Anthropic, and own infrastructure and systems that teams depend on to ship reliably and at scale.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Architect and optimise the critical development infrastructure that powers our AI product development, including dev environments, observability, and CI/CD pipelines.</li>\n<li>Partner closely with product teams to understand their development workflow and eliminate friction points.</li>\n<li>Work on problems where reliability and enterprise trust are the bar: token refresh at scale, admin controls that let IT govern what agents can do, proxy infrastructure that stays up when partner servers don&#39;t.</li>\n</ul>\n<p><strong>Platform Acceleration</strong></p>\n<p>We work on maximising the developer productivity of product engineers at Anthropic. You&#39;ll help define performance quality and standard for the company, power the next gen of LLM-first products, and redefine best-in-class developer experience.</p>\n<p><strong>Service Infra</strong></p>\n<p>We build and maintain the core infrastructure that powers Anthropic&#39;s engineering organisation, from service mesh and observability systems to deployment pipelines and shared libraries.</p>\n<p><strong>Multicloud</strong></p>\n<p>We build and maintain the infrastructure that enables Anthropic to operate across multiple cloud providers. We focus on cloud-agnostic tooling, cross-cloud networking, and multi-region deployments.</p>\n<p><strong>Auth &amp; Identity</strong></p>\n<p>We build and maintain the critical infrastructure that powers identity and authentication across Anthropic&#39;s product suite. We work closely with product teams, security, support, and trust &amp; safety as customers.</p>\n<p><strong>Connectivity</strong></p>\n<p>Our mission is to make Claude the most connected AI. 
We own the MCP proxy that routes every tool call and the OAuth and token management that keeps connections authenticated.</p>\n<p><strong>API Distributability</strong></p>\n<p>The Claude API today is a rapidly growing platform serving developers and enterprises at scale, but reaching the next tier of enterprise customers requires transforming how and where we deploy it.</p>\n<p><strong>Platform Intelligence</strong></p>\n<p>We build the training systems that adapt Claude to specific customer workloads. The core problem is task-specific adaptation: getting the right intelligence, cost, and latency profile for a particular use case, and building toward systems where that adaptation can deepen as the customer&#39;s usage grows.</p>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>A minimum of 5 years of practical experience building backend product or platform systems, distributed systems, cloud-native products, developer tools, or external developer facing products.</li>\n<li>Strong fundamentals in service-oriented architectures, networking, and systems design.</li>\n<li>Proficiency in Python, Go, Rust, or similar systems languages.</li>\n<li>Experience with cloud infrastructure (GCP, AWS, or Azure), container orchestration (Kubernetes), and/or multi-cloud networking.</li>\n<li>Take full ownership of your work, from design through deployment and operations.</li>\n<li>Can navigate ambiguity and make sound technical decisions independently.</li>\n</ul>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Annual compensation range: $320,000-$320,000 USD</li>\n<li>Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience</li>\n<li>Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience</li>\n<li>Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position</li>\n<li>Location-based hybrid 
policy: Currently, we expect all staff to be in one of our offices at least 25% of the time.</li>\n<li>Visa sponsorship: We do sponsor visas!</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_248927c8-76d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5157844008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$320,000 USD","x-skills-required":["Python","Go","Rust","cloud infrastructure","container orchestration","multi-cloud networking","service-oriented architectures","networking","systems design"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:49:27.252Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Go, Rust, cloud infrastructure, container orchestration, multi-cloud networking, service-oriented architectures, networking, systems design","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":320000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b687767a-7a1"},"title":"Director of Engineering, Security Risk Management","description":"<p>We&#39;re seeking an exceptional Engineering Lead to drive the evolution of GitLab&#39;s Security Risk Management (SRM) stage into a world-class platform for vulnerability analysis and remediation at enterprise scale.</p>\n<p>This is a rare opportunity to architect and build distributed 
systems that will fundamentally change how large organisations approach application security and developer security workflows.</p>\n<p>As the SRM Stage Lead, you&#39;ll be responsible for transforming our engineering culture toward high-performance distributed systems while delivering an exceptional user experience for both Application Security professionals and Developers.</p>\n<p>You&#39;ll own the technical strategy for processing, analysing, and remediating vulnerabilities across massive codebases and complex enterprise environments.</p>\n<p><strong>Technical Leadership &amp; Architecture</strong></p>\n<ul>\n<li>Design distributed systems architecture capable of processing vulnerability data from thousands of repositories, millions of commits, and complex dependency graphs in real-time</li>\n<li>Drive storage system decisions for multi-petabyte security datasets, balancing query performance, cost efficiency, and data retention requirements across time-series, graph, and document storage paradigms</li>\n<li>Architect scalable analysis pipelines that can ingest vulnerability feeds, correlate findings across multiple security tools, and provide actionable intelligence to both security teams and individual developers</li>\n<li>Lead the technical evolution from monolithic security scanning to microservices-based, event-driven vulnerability management systems</li>\n</ul>\n<p><strong>Engineering Culture Transformation</strong></p>\n<ul>\n<li>Champion high-performance systems thinking throughout the team, establishing patterns for horizontal scaling, efficient resource utilisation, and fault-tolerant distributed computing</li>\n<li>Establish technical standards for system observability, chaos engineering, and performance optimisation in security-critical systems</li>\n<li>Mentor and develop senior engineers in distributed systems design, database optimisation, and large-scale system architecture</li>\n<li>Drive architectural decision records (ADRs) for major technical 
decisions, particularly around data storage, processing frameworks, and system boundaries</li>\n</ul>\n<p><strong>Product &amp; User Experience Excellence</strong></p>\n<ul>\n<li>Own the end-to-end user journey (in partnership with PM) for both AppSec professionals managing enterprise-wide risk and developers receiving actionable security feedback in their workflow</li>\n<li>Design APIs and interfaces that abstract complexity while providing the power and flexibility that security professionals demand</li>\n<li>Collaborate with Product Management, UX and Product Design to translate complex technical capabilities into intuitive user experiences</li>\n<li>Establish feedback loops with large enterprise customers to ensure our technical solutions scale with their organisational complexity</li>\n</ul>\n<p><strong>Strategic Technical Execution</strong></p>\n<ul>\n<li>Evaluate and integrate cutting-edge technologies in areas such as graph databases, stream processing, machine learning inference at scale, and distributed caching, in collaboration with GitLab’s Infrastructure, Data and AI teams</li>\n<li>Own the technical roadmap for vulnerability correlation, risk scoring, and automated remediation workflows</li>\n<li>Drive partnerships with other GitLab stages to ensure seamless integration across the DevSecOps platform</li>\n<li>Lead incident response for availability and performance issues in customer-facing security systems</li>\n</ul>\n<p><strong>What You’ll Bring</strong></p>\n<ul>\n<li>10+ years of software engineering experience with 5+ years leading distributed systems at scale (&gt;100M daily operations)</li>\n<li>Deep expertise in designing and operating high-throughput, low-latency distributed systems with complex data models</li>\n<li>Proven experience with polyglot persistence strategies, including relational databases (PostgreSQL, Cloud Spanner), time-series databases, graph databases, and distributed key-value stores</li>\n<li>Strong background in stream 
processing frameworks (Apache Kafka, Apache Flink, or similar) and event-driven architectures</li>\n<li>Hands-on experience with container orchestration (Kubernetes) and cloud-native observability stacks</li>\n<li>Security domain knowledge with understanding of vulnerability assessment, static analysis, dependency scanning, or application security testing</li>\n</ul>\n<p><strong>Leadership &amp; Communication</strong></p>\n<ul>\n<li>Proven track record of leading and growing high-performing engineering teams (40+ engineers)</li>\n<li>Experience transforming engineering culture and establishing technical excellence standards in fast-growing organisations</li>\n<li>Strong technical communication skills with ability to present complex architectural decisions to executive stakeholders</li>\n<li>Collaborative leadership style with experience working across multiple engineering teams and product stakeholders</li>\n</ul>\n<p><strong>Problem-Solving &amp; Innovation</strong></p>\n<ul>\n<li>Systems thinking approach to complex technical problems with demonstrated ability to make appropriate trade-offs between performance, scalability, and maintainability</li>\n<li>Experience with A/B testing frameworks and data-driven decision making in technical contexts</li>\n<li>Track record of successfully delivering large-scale technical migrations or architectural transformations</li>\n<li>Startup or high-growth company experience with ability to balance technical debt with rapid feature delivery</li>\n</ul>\n<p><strong>About the team</strong></p>\n<p>Security Risk Management sits at the heart of modern DevSecOps. 
The systems you build will directly impact how Fortune 500 companies protect their applications and how millions of developers integrate security into their daily workflow.</p>\n<p>You&#39;ll have the opportunity to define the future of application security tooling while working with some of the most challenging distributed systems problems in the industry.</p>\n<p>The Technical Challenge</p>\n<p>You&#39;ll be solving some of the most interesting distributed systems problems in the security space:</p>\n<ul>\n<li>Scale: Processing vulnerability data for organisations with 100,000+ repositories and millions of developers</li>\n<li>Performance: Sub-second query response times for complex security analytics across massive datasets</li>\n<li>Reliability: 99.95%+ uptime SLAs for security-critical workflows that can&#39;t afford downtime</li>\n<li>Complexity: Correlating findings across 20+ different security tools while maintaining data lineage and audit trails</li>\n<li>User Experience: Making complex security data accessible to both security experts and developers with varying security expertise</li>\n</ul>\n<p><strong>Salary</strong></p>\n<p>The base salary range for this role’s listed level is currently for residents of the United States.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b687767a-7a1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"GitLab","sameAs":"https://about.gitlab.com/","logo":"https://logos.yubhub.co/about.gitlab.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/gitlab/jobs/8195921002","x-work-arrangement":"remote","x-experience-level":"executive","x-job-type":"full-time","x-salary-range":"Base salary range for this role’s listed level is currently for residents of the United States.","x-skills-required":["Distributed systems","Polyglot persistence strategies","Stream processing 
frameworks","Event-driven architectures","Container orchestration","Cloud-native observability stacks","Security domain knowledge","Vulnerability assessment","Static analysis","Dependency scanning","Application security testing"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:48:19.166Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote, Canada; Remote, EMEA; Remote, US"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Distributed systems, Polyglot persistence strategies, Stream processing frameworks, Event-driven architectures, Container orchestration, Cloud-native observability stacks, Security domain knowledge, Vulnerability assessment, Static analysis, Dependency scanning, Application security testing"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_fff47210-64d"},"title":"Senior Software Engineer, Applied AI (Fullstack)","description":"<p>Secure Every Identity, from AI to Human</p>\n<p>Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organisations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.</p>\n<p>This is an opportunity to do career-defining work. We&#39;re all in on this mission. If you are too, let&#39;s talk.</p>\n<p>Okta&#39;s Business Technology organisation builds secure and intelligent internal platforms that power our global workforce. 
Our AI &amp; Automation team is delivering next-generation tools and experiences by integrating GenAI and intelligent automation into workflows across IT, HR, Finance, Sales, Marketing and Customer Support.</p>\n<p>We focus on real-world applications: virtual agents, AI copilots, internal RAG services, and AI-augmented self-service portals, all with scale, governance, and user experience in mind.</p>\n<p><strong>The Opportunity</strong></p>\n<p>As a Senior Software Engineer, Applied AI, you&#39;ll play a key role in building user-facing and backend systems that leverage GenAI to improve internal experiences and operations. This role requires strong full-stack engineering skills, with an emphasis on both AI integration and building intuitive, performant UIs that make AI accessible and useful to our internal customers.</p>\n<p>You&#39;ll work closely with software engineers, product managers, and designers to build secure, intelligent tools for employees across Okta.</p>\n<p><strong>What You&#39;ll Do</strong></p>\n<ul>\n<li>Design and build end-to-end GenAI-powered applications, including web-based UIs, API services, and backend orchestration.</li>\n</ul>\n<ul>\n<li>Implement and integrate LLM-based experiences using frameworks like LangChain, LlamaIndex, and tools like OpenAI, Claude, or Gemini.</li>\n</ul>\n<ul>\n<li>Define, implement, and champion operational excellence standards (SLOs, observability, incident response frameworks) for all services deployed.</li>\n</ul>\n<ul>\n<li>Develop responsive, accessible, and modern frontend interfaces using frameworks like React or Vue, with a focus on usability, performance, and trust in AI outputs.</li>\n</ul>\n<ul>\n<li>Build and maintain a library of reusable frontend components and hooks that allow other business delivery teams to easily &#39;drop in&#39; GenAI capabilities into their own applications.</li>\n</ul>\n<ul>\n<li>Build and maintain retrieval-augmented generation (RAG) pipelines with vector search and 
embedding strategies (e.g., Pinecone, FAISS, Qdrant).</li>\n</ul>\n<ul>\n<li>Collaborate with designers and product managers to rapidly iterate on UX patterns for AI-powered experiences (e.g., prompt inputs, citations, summaries).</li>\n</ul>\n<ul>\n<li>Ensure security, privacy, observability, and test coverage across the full stack.</li>\n</ul>\n<ul>\n<li>Contribute to architecture decisions, engineering standards, and best practices for AI/automation systems.</li>\n</ul>\n<ul>\n<li>Partner with platform and infrastructure teams to ensure services scale reliably across the org.</li>\n</ul>\n<p><strong>What You&#39;ll Bring</strong></p>\n<ul>\n<li>5–8 years of software engineering experience with full-stack development, including 2+ years of building AI/ML-driven applications.</li>\n</ul>\n<ul>\n<li>Strong Python development skills and 5+ years of experience building cloud-based services using AWS, Docker, and RESTful APIs.</li>\n</ul>\n<ul>\n<li>2+ years of experience in frontend technologies like React, TypeScript, or Vue, and comfort working on UI/UX for internal tools or enterprise applications.</li>\n</ul>\n<ul>\n<li>Hands-on experience with LLM integration, RAG pipelines, prompt engineering, or orchestration frameworks like LangChain or LlamaIndex.</li>\n</ul>\n<ul>\n<li>Strong background in distributed systems, APIs, microservices, container orchestration (ECS/EKS), and cloud platforms (AWS/GCP/Azure).</li>\n</ul>\n<ul>\n<li>Familiarity with secure coding, authentication/authorisation, and internal data governance best practices.</li>\n</ul>\n<ul>\n<li>Ability to collaborate across engineering, design, and product teams, with a strong sense of user empathy and technical ownership.</li>\n</ul>\n<ul>\n<li>Bonus: Exposure to design systems, AI evaluation tooling, or real-time application performance monitoring.</li>\n</ul>\n<p><strong>Why Join Okta</strong></p>\n<ul>\n<li>Make AI Real: Design and build AI-powered apps used daily by Okta 
employees.</li>\n</ul>\n<ul>\n<li>Full-Stack Challenge: Tackle end-to-end problems, from LLM orchestration to intuitive UIs.</li>\n</ul>\n<ul>\n<li>Trusted Innovation: Join a team committed to security, ethics, and technical excellence in AI.</li>\n</ul>\n<p>#LI-MK1</p>\n<p>#LI-hybrid</p>\n<p>P24739_3355024</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_fff47210-64d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Okta","sameAs":"https://www.okta.com/","logo":"https://logos.yubhub.co/okta.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/okta/jobs/7589781","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000-$247,000 USD","x-skills-required":["Python","AWS","Docker","RESTful APIs","React","TypeScript","Vue","LLM integration","RAG pipelines","prompt engineering","orchestration frameworks","distributed systems","APIs","microservices","container orchestration","cloud platforms"],"x-skills-preferred":["design systems","AI evaluation tooling","real-time application performance monitoring"],"datePosted":"2026-04-18T15:47:54.323Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, AWS, Docker, RESTful APIs, React, TypeScript, Vue, LLM integration, RAG pipelines, prompt engineering, orchestration frameworks, distributed systems, APIs, microservices, container orchestration, cloud platforms, design systems, AI evaluation tooling, real-time application performance 
monitoring","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":247000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_60aae9e8-e8b"},"title":"Software Engineer, Observability","description":"<p>We&#39;re looking for a skilled Software Engineer to join our Observability team. As a member of this team, you will be responsible for designing and evolving logging, metrics, and tracing pipelines to handle massive data volumes. You will also evaluate and integrate new technologies to enhance Airtable&#39;s observability posture.</p>\n<p>Your responsibilities will include guiding and mentoring a growing team of infrastructure engineers, defining and upholding coding standards, partnering with other teams to embed observability throughout the development lifecycle, and owning end-to-end reliability for observability tools.</p>\n<p>You will also extend observability to LLM and AI features by instrumenting prompts, model calls, and RAG pipelines to capture latency, reliability, cost, and safety signals. You will design online and offline evaluation loops for LLM quality, build dashboards and alerts for token usage, error rates, and model performance, and connect these signals to tracing for prompt lineage.</p>\n<p>To succeed in this role, you will need 6+ years of software engineering experience, with 3+ years focused on observability or infrastructure at scale. 
You will also need demonstrated success implementing and running production-grade logging, metrics, or tracing systems, proficiency in distributed systems concepts, data streaming pipelines, and container orchestration, and deep hands-on knowledge of tools such as Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack, Loki, or ClickHouse.</p>\n<p>This is a high-impact role that will allow you to lead the modernization of Airtable&#39;s observability stack, influence how every engineer monitors and debugs mission-critical systems, and drive major projects across the engineering organization to build platforms and services that solve observability problems.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_60aae9e8-e8b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Airtable","sameAs":"https://airtable.com/","logo":"https://logos.yubhub.co/airtable.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/airtable/jobs/8400374002","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Distributed systems concepts","Data streaming pipelines","Container orchestration","Prometheus","Grafana","Datadog","OpenTelemetry","ELK Stack","Loki","ClickHouse"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:47:22.779Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA; New York, NY; Remote (Seattle, WA only)"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Distributed systems concepts, Data streaming pipelines, Container orchestration, Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack, Loki, 
ClickHouse"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2d1d4a4d-c55"},"title":"Senior Software Engineer, Applied AI (Fullstack)","description":"<p>We are looking for a Senior Software Engineer, Applied AI to join our team. As a key member of our AI &amp; Automation team, you will play a crucial role in building user-facing and backend systems that leverage GenAI to improve internal experiences and operations.</p>\n<p>Your primary responsibilities will include designing and building end-to-end GenAI-powered applications, implementing and integrating LLM-based experiences, defining and implementing operational excellence standards, developing responsive and accessible frontend interfaces, and ensuring security, privacy, and test coverage across the full stack.</p>\n<p>To be successful in this role, you will need strong full-stack engineering skills, with an emphasis on both AI integration and building intuitive, performant UIs that make AI accessible and useful to our internal customers. You will also need to collaborate with software engineers, product managers, and designers to build secure, intelligent tools for employees across Okta.</p>\n<p>In addition to your technical skills, you will need to have excellent communication and teamwork skills, with a strong sense of user empathy and technical ownership. 
You will also need to be able to contribute to architecture decisions, engineering standards, and best practices for AI/automation systems.</p>\n<p>If you are a motivated and experienced software engineer with a passion for AI and automation, we encourage you to apply for this exciting opportunity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2d1d4a4d-c55","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Okta","sameAs":"https://www.okta.com/","logo":"https://logos.yubhub.co/okta.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/okta/jobs/7603595","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$147,000-$247,000 USD","x-skills-required":["Python","Full-stack development","GenAI","LLM-based experiences","Operational excellence standards","Frontend development","Security","Privacy","Test coverage"],"x-skills-preferred":["LangChain","LlamaIndex","OpenAI","Claude","Gemini","Distributed systems","APIs","Microservices","Container orchestration","Cloud platforms"],"datePosted":"2026-04-18T15:47:16.294Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Chicago, Illinois"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Full-stack development, GenAI, LLM-based experiences, Operational excellence standards, Frontend development, Security, Privacy, Test coverage, LangChain, LlamaIndex, OpenAI, Claude, Gemini, Distributed systems, APIs, Microservices, Container orchestration, Cloud 
platforms","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":147000,"maxValue":247000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9f4d3e45-9fb"},"title":"Staff Software Engineer, Front-End (AI Engineering)","description":"<p>We&#39;re looking for a Staff AI Engineer (Front end) to help design and scale our Agentic AI Platform, the foundation that enables enterprise teams to build, deploy, and manage intelligent AI agents securely and at scale.</p>\n<p>As part of a global AI engineering team spanning the US and India, you&#39;ll provide technical leadership, write hands-on code, and mentor peers while shaping platform capabilities that empower applied AI use cases across Okta.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and develop platform services for AI agents, including orchestration, lifecycle management, observability, and guardrails.</li>\n<li>Collaborate daily with US counterparts to align on architecture, share roadmaps, and ensure global consistency.</li>\n<li>Build scalable APIs, frameworks, and reusable components that enable teams to rapidly create and deploy secure AI agents.</li>\n<li>Contribute to monitoring, evaluation, and governance features that make AI adoption safe, reliable, and enterprise-ready.</li>\n<li>Partner with Applied AI engineers, product managers, and designers to ensure the platform supports real-world use cases.</li>\n<li>Mentor engineers within the India team, raising the bar on technical craftsmanship and platform design.</li>\n<li>Stay current with advances in agentic AI, orchestration frameworks, and enterprise AI infrastructure.</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>8+ years of software engineering experience, with at least 3 years in AI/ML platforms or intelligent automation.</li>\n<li>5+ years in modern frontend frameworks (e.g., React, Vue) with the ability to 
build functional UI components, integrate APIs, and maintain design consistency.</li>\n<li>Knowledge of Python (preferred), Java, Node.js, or Go.</li>\n<li>Expertise in AI agent frameworks (LangGraph preferred), orchestration systems, vector databases, RAG pipelines, and monitoring/observability.</li>\n<li>Strong background in distributed systems, APIs, microservices, container orchestration (ECS/EKS), and cloud platforms (AWS/GCP/Azure).</li>\n<li>Experience building secure, enterprise-grade infrastructure for AI or ML workloads.</li>\n<li>Ability to influence architecture decisions, align with a global team, and set technical direction locally.</li>\n<li>Strong communication and collaboration skills to work effectively across time zones and global teams.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_9f4d3e45-9fb","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Okta","sameAs":"https://www.okta.com/","logo":"https://logos.yubhub.co/okta.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/okta/jobs/6949710","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Python","Java","Node.js","Go","React","Vue","AI agent frameworks","Orchestration systems","Vector databases","RAG pipelines","Monitoring/observability","Distributed systems","APIs","Microservices","Container orchestration","Cloud platforms"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:46:37.864Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Bengaluru, India"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Java, Node.js, Go, React, Vue, AI agent frameworks, Orchestration systems, Vector databases, RAG pipelines, Monitoring/observability, Distributed systems, APIs, 
Microservices, Container orchestration, Cloud platforms"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_bc3394a5-691"},"title":"Senior Software Engineer, Applied AI (Fullstack)","description":"<p>Secure Every Identity, from AI to Human</p>\n<p>Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organisations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.</p>\n<p>This is an opportunity to do career-defining work. We&#39;re all in on this mission. If you are too, let&#39;s talk.</p>\n<p>Okta&#39;s Business Technology organisation builds secure and intelligent internal platforms that power our global workforce. Our AI &amp; Automation team is delivering next-generation tools and experiences by integrating GenAI and intelligent automation into workflows across IT, HR, Finance, Sales, Marketing and Customer Support.</p>\n<p>We focus on real-world applications: virtual agents, AI copilots, internal RAG services, and AI-augmented self-service portals, all with scale, governance, and user experience in mind.</p>\n<p><strong>The Opportunity</strong></p>\n<p>As a Senior Software Engineer, Applied AI, you&#39;ll play a key role in building user-facing and backend systems that leverage GenAI to improve internal experiences and operations. 
This role requires strong full-stack engineering skills, with an emphasis on both AI integration and building intuitive, performant UIs that make AI accessible and useful to our internal customers.</p>\n<p>You&#39;ll work closely with software engineers, product managers, and designers to build secure, intelligent tools for employees across Okta.</p>\n<p><strong>What You&#39;ll Do</strong></p>\n<ul>\n<li>Design and build end-to-end GenAI-powered applications, including web-based UIs, API services, and backend orchestration.</li>\n</ul>\n<ul>\n<li>Implement and integrate LLM-based experiences using frameworks like LangChain, LlamaIndex, and tools like OpenAI, Claude, or Gemini.</li>\n</ul>\n<ul>\n<li>Define, implement, and champion operational excellence standards (SLOs, observability, incident response frameworks) for all services deployed.</li>\n</ul>\n<ul>\n<li>Develop responsive, accessible, and modern frontend interfaces using frameworks like React or Vue, with a focus on usability, performance, and trust in AI outputs.</li>\n</ul>\n<ul>\n<li>Build and maintain a library of reusable frontend components and hooks that allow other business delivery teams to easily &#39;drop in&#39; GenAI capabilities into their own applications.</li>\n</ul>\n<ul>\n<li>Build and maintain retrieval-augmented generation (RAG) pipelines with vector search and embedding strategies (e.g., Pinecone, FAISS, Qdrant).</li>\n</ul>\n<ul>\n<li>Collaborate with designers and product managers to rapidly iterate on UX patterns for AI-powered experiences (e.g., prompt inputs, citations, summaries).</li>\n</ul>\n<ul>\n<li>Ensure security, privacy, observability, and test coverage across the full stack.</li>\n</ul>\n<ul>\n<li>Contribute to architecture decisions, engineering standards, and best practices for AI/automation systems.</li>\n</ul>\n<ul>\n<li>Partner with platform and infrastructure teams to ensure services scale reliably across the org.</li>\n</ul>\n<p><strong>What You&#39;ll 
Bring</strong></p>\n<ul>\n<li>5–8 years of software engineering experience with full-stack development, including 2+ years of building AI/ML-driven applications.</li>\n</ul>\n<ul>\n<li>Strong Python development skills and 5+ years of experience building cloud-based services using AWS, Docker, and RESTful APIs.</li>\n</ul>\n<ul>\n<li>2+ years of experience in frontend technologies like React, TypeScript, or Vue, and comfort working on UI/UX for internal tools or enterprise applications.</li>\n</ul>\n<ul>\n<li>Hands-on experience with LLM integration, RAG pipelines, prompt engineering, or orchestration frameworks like LangChain or LlamaIndex.</li>\n</ul>\n<ul>\n<li>Strong background in distributed systems, APIs, microservices, container orchestration (ECS/EKS), and cloud platforms (AWS/GCP/Azure).</li>\n</ul>\n<ul>\n<li>Familiarity with secure coding, authentication/authorisation, and internal data governance best practices.</li>\n</ul>\n<ul>\n<li>Ability to collaborate across engineering, design, and product teams, with a strong sense of user empathy and technical ownership.</li>\n</ul>\n<ul>\n<li>Bonus: Exposure to design systems, AI evaluation tooling, or real-time application performance monitoring.</li>\n</ul>\n<p><strong>Why Join Okta</strong></p>\n<ul>\n<li>Make AI Real: Design and build AI-powered apps used daily by Okta employees.</li>\n</ul>\n<ul>\n<li>Full-Stack Challenge: Tackle end-to-end problems, from LLM orchestration to intuitive UIs.</li>\n</ul>\n<ul>\n<li>Trusted Innovation: Join a team committed to security, ethics, and technical excellence in AI.</li>\n</ul>\n<p>#LI-MK1</p>\n<p>#LI-hybrid</p>\n<p>P24739_3355024</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_bc3394a5-691","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Okta","sameAs":"https://www.okta.com/","logo":"https://logos.yubhub.co/okta.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/okta/jobs/7599857","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000-$247,000 USD","x-skills-required":["Python","AWS","Docker","RESTful APIs","React","TypeScript","Vue","LLM integration","RAG pipelines","prompt engineering","orchestration frameworks","distributed systems","APIs","microservices","container orchestration","cloud platforms"],"x-skills-preferred":["design systems","AI evaluation tooling","real-time application performance monitoring"],"datePosted":"2026-04-18T15:45:45.352Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Bellevue, Washington"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, AWS, Docker, RESTful APIs, React, TypeScript, Vue, LLM integration, RAG pipelines, prompt engineering, orchestration frameworks, distributed systems, APIs, microservices, container orchestration, cloud platforms, design systems, AI evaluation tooling, real-time application performance monitoring","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":247000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_bd9625d9-99b"},"title":"ML Infrastructure Engineer, Safeguards","description":"<p>We are seeking a Machine Learning Infrastructure Engineer to join our Safeguards organization, where you&#39;ll build and scale the critical infrastructure that powers our AI safety systems.</p>\n<p>As part of the Safeguards team, you&#39;ll design and implement ML 
infrastructure that powers Claude safety. Your work will directly contribute to making AI systems more trustworthy and aligned with human values, ensuring our models operate safely as they become more capable.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and build scalable ML infrastructure to support real-time and batch classifier and safety evaluations across our model ecosystem</li>\n<li>Build monitoring and observability tools to track model performance, data quality, and system health for safety-critical applications</li>\n<li>Collaborate with research teams to productionize safety research, translating experimental safety techniques into robust, scalable systems</li>\n<li>Optimize inference latency and throughput for real-time safety evaluations while maintaining high reliability standards</li>\n<li>Implement automated testing, deployment, and rollback systems for ML models in production safety applications</li>\n<li>Partner with Safeguards, Security, and Alignment teams to understand requirements and deliver infrastructure that meets safety and production needs</li>\n<li>Contribute to the development of internal tools and frameworks that accelerate safety research and deployment</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 5+ years of experience building production ML infrastructure, ideally in safety-critical domains like fraud detection, content moderation, or risk assessment</li>\n<li>Are proficient in Python and have experience with ML frameworks like PyTorch, TensorFlow, or JAX</li>\n<li>Have hands-on experience with cloud platforms (AWS, GCP) and container orchestration (Kubernetes)</li>\n<li>Understand distributed systems principles and have built systems that handle high-throughput, low-latency workloads</li>\n<li>Have experience with data engineering tools and building robust data pipelines (e.g., Spark, Airflow, streaming systems)</li>\n<li>Are results-oriented, with a bias towards reliability and impact in safety-critical 
systems</li>\n<li>Enjoy collaborating with researchers and translating cutting-edge research into production systems</li>\n<li>Care deeply about AI safety and the societal impacts of your work</li>\n</ul>\n<p>Strong candidates may have experience with:</p>\n<ul>\n<li>Working with large language models and modern transformer architectures</li>\n<li>Implementing A/B testing frameworks and experimentation infrastructure for ML systems</li>\n<li>Developing monitoring and alerting systems for ML model performance and data drift</li>\n<li>Building automated labeling systems and human-in-the-loop workflows</li>\n<li>Experience in trust &amp; safety, fraud prevention, or content moderation domains</li>\n<li>Knowledge of privacy-preserving ML techniques and compliance requirements</li>\n<li>Contributing to open-source ML infrastructure projects</li>\n</ul>\n<p>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_bd9625d9-99b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4778843008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$405,000 USD","x-skills-required":["Python","PyTorch","TensorFlow","JAX","Cloud platforms (AWS, GCP)","Container orchestration (Kubernetes)","Distributed systems principles","Data engineering tools (Spark, Airflow, streaming systems)"],"x-skills-preferred":["Large language models and modern transformer architectures","A/B testing frameworks and experimentation infrastructure for ML systems","Monitoring and alerting 
systems for ML model performance and data drift","Automated labeling systems and human-in-the-loop workflows","Trust & safety, fraud prevention, or content moderation domains","Privacy-preserving ML techniques and compliance requirements","Open-source ML infrastructure projects"],"datePosted":"2026-04-18T15:44:06.907Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, PyTorch, TensorFlow, JAX, Cloud platforms (AWS, GCP), Container orchestration (Kubernetes), Distributed systems principles, Data engineering tools (Spark, Airflow, streaming systems), Large language models and modern transformer architectures, A/B testing frameworks and experimentation infrastructure for ML systems, Monitoring and alerting systems for ML model performance and data drift, Automated labeling systems and human-in-the-loop workflows, Trust & safety, fraud prevention, or content moderation domains, Privacy-preserving ML techniques and compliance requirements, Open-source ML infrastructure projects","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_022d9aef-8cd"},"title":"Member of Technical Staff - Infrastructure Reliability","description":"<p><strong>About the Role</strong></p>\n<p>We are training some of the largest models in the world on the latest hardware across multiple environments. 
To do this reliably at xAI&#39;s pace, we need engineers who have battle-tested experience keeping massive distributed infrastructure up and running 24/7, including on-prem and cloud-based infrastructure.</p>\n<p>You will own the availability, performance, and evolution of xAI&#39;s core compute, storage, and networking infrastructure. This is not an ops-only role; strong coding is a hard requirement. You will design, implement, and ship systems software, automation, and tooling in Python and/or Rust that directly impact training throughput and cluster utilization.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Define and execute the technical strategy for infrastructure reliability and scalability</li>\n<li>Build and maintain the automation, observability, and control planes that keep multi-datacenter, hybrid cloud/on-prem environments healthy</li>\n<li>Lead incident response, deep-dive root cause analysis, and post-mortems that drive real fixes</li>\n<li>Identify, instrument, and eliminate systemic failure patterns (capacity, network, hardware, storage, software)</li>\n<li>Design and implement high-leverage systems software (daemons, controllers, schedulers, etc.) 
in Python and Rust.</li>\n</ul>\n<p><strong>Basic Qualifications</strong></p>\n<ul>\n<li>5+ years shipping production software and/or operating distributed infrastructure at scale</li>\n<li>Expert-level knowledge of Linux systems, TCP/IP networking, and systems programming</li>\n<li>Strong coding skills with proven production experience in Rust (strongly preferred) and at least one of Python, Go, or C++.</li>\n</ul>\n<p><strong>Preferred Skills and Experience</strong></p>\n<ul>\n<li>Significant contributions to large-scale GPU clusters or AI/ML infrastructure</li>\n<li>Experience in on-call rotations and incident response in high-stakes environments.</li>\n</ul>\n<p><strong>Compensation and Benefits</strong></p>\n<p>$180,000 - $400,000 USD</p>\n<p>Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short &amp; long-term disability insurance, life insurance, and various other discounts and perks.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_022d9aef-8cd","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/4801451007","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000 - $400,000 USD","x-skills-required":["Linux systems","TCP/IP networking","systems programming","Rust","Python","Go","C++","container orchestration","container runtimes","infrastructure-as-code"],"x-skills-preferred":["large-scale GPU clusters","AI/ML infrastructure","on-call rotations","incident response"],"datePosted":"2026-04-18T15:42:36.486Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo 
Alto, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux systems, TCP/IP networking, systems programming, Rust, Python, Go, C++, container orchestration, container runtimes, infrastructure-as-code, large-scale GPU clusters, AI/ML infrastructure, on-call rotations, incident response","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":400000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_18ae1499-b22"},"title":"Research Engineer, Discovery","description":"<p>As a Research Engineer on our team, you will work end-to-end across the whole model stack, identifying and addressing key infra blockers on the path to scientific AGI. Strong candidates should be familiar with elements of language model training, evaluation, and inference, and be eager to dive in quickly and get up to speed in areas where they are not yet experts.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and implement large-scale infrastructure systems to support AI scientist training, evaluation, and deployment across distributed environments</li>\n<li>Identify and resolve infrastructure bottlenecks impeding progress toward scientific capabilities</li>\n<li>Develop robust and reliable evaluation frameworks for measuring progress towards scientific AGI</li>\n<li>Build scalable and performant VM/sandboxing/container architectures to safely execute long-horizon AI tasks and scientific workflows</li>\n<li>Collaborate to translate experimental requirements into production-ready infrastructure</li>\n<li>Develop large scale data pipelines to handle advanced language model training requirements</li>\n<li>Optimize large scale training and inference pipelines for stable and efficient reinforcement learning</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 6+ years of 
highly-relevant experience in infrastructure engineering with demonstrated expertise in large-scale distributed systems</li>\n<li>Are a strong communicator and enjoy working collaboratively</li>\n<li>Possess deep knowledge of performance optimization techniques and system architectures for high-throughput ML workloads</li>\n<li>Have experience with containerization technologies (Docker, Kubernetes) and orchestration at scale</li>\n<li>Have proven track record of building large-scale data pipelines and distributed storage systems</li>\n<li>Excel at diagnosing and resolving complex infrastructure challenges in production environments</li>\n<li>Can work effectively across the full ML stack from data pipelines to performance optimization</li>\n<li>Have experience collaborating with other researchers to scale experimental ideas</li>\n<li>Thrive in fast-paced environments and can rapidly iterate from experimentation to production</li>\n</ul>\n<p>Strong candidates may also have:</p>\n<ul>\n<li>Experience with language model training infrastructure and distributed ML frameworks (PyTorch, JAX, etc.)</li>\n<li>Background in building infrastructure for AI research labs or large-scale ML organizations</li>\n<li>Knowledge of GPU/TPU architectures and language model inference optimization</li>\n<li>Experience with cloud platforms (AWS, GCP) at enterprise scale</li>\n<li>Familiarity with VM and container orchestration</li>\n<li>Experience with workflow orchestration tools and experiment management systems</li>\n<li>History working with large scale reinforcement learning</li>\n<li>Comfort with large scale data pipelines (Beam, Spark, Dask, …)</li>\n</ul>\n<p>The annual compensation range for this role is $350,000-$850,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_18ae1499-b22","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4669581008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000-$850,000 USD","x-skills-required":["large-scale distributed systems","containerization technologies (Docker, Kubernetes)","performance optimization techniques","system architectures for high-throughput ML workloads","data pipelines","distributed storage systems","ML frameworks (PyTorch, JAX, etc.)","GPU/TPU architectures","cloud platforms (AWS, GCP)","VM and container orchestration","workflow orchestration tools","experiment management systems","reinforcement learning","large scale data pipelines (Beam, Spark, Dask, …)"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:41:42.408Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"large-scale distributed systems, containerization technologies (Docker, Kubernetes), performance optimization techniques, system architectures for high-throughput ML workloads, data pipelines, distributed storage systems, ML frameworks (PyTorch, JAX, etc.), GPU/TPU architectures, cloud platforms (AWS, GCP), VM and container orchestration, workflow orchestration tools, experiment management systems, reinforcement learning, large scale data pipelines (Beam, Spark, Dask, 
…)","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_61be0866-2b0"},"title":"Principal Software Engineer, Performance","description":"<p>We are seeking a highly experienced Principal Software Engineer to join our Infrastructure Performance team. As a key member of this team, you will be responsible for defining and driving Airbnb&#39;s long-term performance strategy, spanning product performance, infrastructure efficiency, and business objectives for scale and growth.</p>\n<p>In this role, you will lead the architecture and development of performance profiling and instrumentation infrastructure, covering CPU, GPU, memory, request hot paths, utilization, and deployment events, making these capabilities available to all backend teams.</p>\n<p>You will partner with infrastructure teams across compute, reliability, backend frameworks, and AI Infra to ensure the fleet operates at optimal utilization.</p>\n<p>You will connect performance outcomes to business objectives and company-wide SLOs, and guide engineering teams in keeping the stack scalable and efficient.</p>\n<p>You will evaluate emerging hardware and software technologies, engage with the external solutions ecosystem, and advise on build vs. 
buy decisions in areas of strategic importance.</p>\n<p>As a mentor and technical leader, you will uplevel engineers across the organization through design reviews, architectural guidance, and performance best practices.</p>\n<p>To be successful in this role, you will need to have 12+ years of performance engineering experience in high-scale, high-growth production environments.</p>\n<p>You will need to have a deep understanding of how software and hardware systems interact at scale, including architectural patterns for performance-critical stacks.</p>\n<p>You will need to have strong familiarity with public cloud infrastructure (AWS, GCP, or Azure) and container orchestration (Docker, Kubernetes).</p>\n<p>You will need to have experience with profiling and instrumentation tooling across CPU, GPU, memory, and distributed request tracing.</p>\n<p>You will need to have demonstrated ability to define performance objectives and drive delivery against company-wide SLOs across multiple organizations.</p>\n<p>You will need to have strong communication and influence skills; comfortable driving technical direction with senior engineering and product leadership.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_61be0866-2b0","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Airbnb","sameAs":"https://www.airbnb.com/","logo":"https://logos.yubhub.co/airbnb.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/airbnb/jobs/7826679","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$292,000-$365,000 USD","x-skills-required":["performance engineering","software engineering","infrastructure performance","public cloud infrastructure","container orchestration","profiling and instrumentation tooling","distributed request tracing","cloud 
computing","containerization"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:41:00.673Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote-US"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"performance engineering, software engineering, infrastructure performance, public cloud infrastructure, container orchestration, profiling and instrumentation tooling, distributed request tracing, cloud computing, containerization","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":292000,"maxValue":365000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_08f992cf-0e9"},"title":"CyberSecurity Team Lead, Infrastructure and Application","description":"<p>About Mistral AI</p>\n<p>Mistral AI is a technology company that develops and provides AI-powered solutions and platforms for enterprise use. Our technology is designed to integrate seamlessly into daily working life.</p>\n<p>Role Summary</p>\n<p>As a CyberSecurity Team Lead, you will be responsible for architecting and enforcing the security posture of our entire technical stack, from on-premise foundations to cloud-native deployments. 
You will oversee the identification, prioritization, and remediation of vulnerabilities across both On-Prem and Cloud infrastructures as well as internal applications.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Oversee the identification, prioritization, and remediation of vulnerabilities across both On-Prem and Cloud infrastructures as well as internal applications.</li>\n<li>Select, deploy, and maintain the tools needed for visibility and protection, including CNAPP, CSPM, SAST/DAST, secret scanning, and SBOM/CVE tracking.</li>\n<li>Integrate security controls and automated gates directly into CI/CD pipelines to catch vulnerabilities before deployment (Shift Left).</li>\n<li>Partner with engineering teams to interpret findings and &#39;ease the fix,&#39; providing patches, code snippets, or architectural advice to resolve issues quickly.</li>\n<li>Define and maintain rigorous security guidelines and best practices for developers and system administrators.</li>\n<li>Design and lead security awareness programs and technical training tailored for developers and admins to reduce human risk.</li>\n<li>Track and define key security metrics (MTTR, coverage, vulnerability density) to visualize posture and progress to leadership.</li>\n</ul>\n<p>Requirements</p>\n<ul>\n<li>6+ years of experience in Information Security, with a specific focus on Application Security, Cloud Security, or DevSecOps.</li>\n<li>Strong scripting skills (Python, Go, or Bash) to automate security tasks and integrate tools.</li>\n<li>Deep understanding of CI/CD ecosystems and container orchestration (Kubernetes/Docker).</li>\n<li>Hands-on experience with modern security tooling (e.g., Wiz, Snyk, SonarQube, Prisma, or similar enterprise tools).</li>\n<li>Collaborative mindset: you view developers as partners, not adversaries, and focus on enabling them to code securely.</li>\n<li>Clear communication, autonomous, and capable of translating technical security risks into actionable engineering 
tasks.</li>\n</ul>\n<p>Benefits</p>\n<ul>\n<li>Competitive salary</li>\n<li>Comprehensive health insurance</li>\n<li>Flexible working hours</li>\n<li>Professional development opportunities</li>\n</ul>\n<p>Note: The company may offer additional benefits not listed here.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_08f992cf-0e9","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai/","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/c9b75928-dd48-4432-b6f1-fc0b24e51657","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Application Security","Cloud Security","DevSecOps","CI/CD ecosystems","Container orchestration","Modern security tooling","Scripting skills","Collaborative mindset","Clear communication"],"x-skills-preferred":["Industry certifications","Infrastructure as Code","Offensive security","Prior experience securing large-scale AI or Machine Learning infrastructure"],"datePosted":"2026-04-17T12:46:50.079Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Application Security, Cloud Security, DevSecOps, CI/CD ecosystems, Container orchestration, Modern security tooling, Scripting skills, Collaborative mindset, Clear communication, Industry certifications, Infrastructure as Code, Offensive security, Prior experience securing large-scale AI or Machine Learning infrastructure"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a3d60aab-0bb"},"title":"Research Platform Engineer","description":"<p><strong>About Mistral AI</strong></p>\n<p>Mistral AI is 
an AI technology company that develops high-performance, optimized, open-source and cutting-edge models, products and solutions.</p>\n<p><strong>Role Summary – Software Engineering track</strong></p>\n<p>As a Research Engineer on the software side, you will design and harden the codebase, tools and distributed services that let our scientists train and ship frontier-scale models. You do not need prior ML experience; what matters is writing clean, reliable code that scales. You will join our Platform REs team to build and maintain shared dev-tools, evaluation &amp; data pipelines, training framework, cluster tooling and CI/CD.</p>\n<p><strong>Responsibilities</strong></p>\n<p>• Accelerate researchers by owning the complex parts of large-scale pipelines and delivering robust internal tooling.\n• Interface research with product: expose clean APIs, automate model pushes, surface live metrics.\n• Write efficient, well-tested Python and systems code; enforce code review, CI, and observability.\n• Design and optimise distributed services (Kubernetes / SLURM, thousands-of-GPU jobs).\n• Prototype utilities (CLI, dashboards) and carry them through to stable, shared libraries.</p>\n<p><strong>About the Research Engineering team</strong></p>\n<p>Based in Paris and London, our REs move fluidly along the research ↔ production spectrum. 
Engineers can rotate between Platform and Embedded tracks as their interests evolve.</p>\n<p><strong>About you</strong></p>\n<p>• Master’s in Computer Science (or equivalent experience).\n• 4+ years building and operating large-scale or distributed systems.\n• Strong software-design instincts: modular code, tests, CI/CD, observability.\n• Fluency in Python plus one systems language (C++, Rust, Go or Java).\n• Hands-on with container orchestration and schedulers (Kubernetes / K8s, SLURM, or similar).\n• Comfortable profiling performance, optimising I/O, and automating workflows.\n• Self-starter, low-ego, collaborative, high-energy.</p>\n<p><strong>Benefits</strong></p>\n<p>France:\n• Competitive cash salary and equity\n• Food: Daily lunch vouchers\n• Sport: Monthly contribution to a Gympass subscription\n• Transportation: Monthly contribution to a mobility pass\n• Health: Full health insurance for you and your family\n• Parental: Generous parental leave policy</p>\n<p>UK:\n• Competitive cash salary and equity\n• Insurance\n• Transportation: Reimburse office parking charges, or £90 per month for public transport\n• Sport: £90 per month reimbursement for gym membership\n• Meal voucher: £200 monthly allowance for meals\n• Pension plan: SmartPension (percentages are 5% Employee &amp; 3% Employer)</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a3d60aab-0bb","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai/careers","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/df0d75c1-97ef-4e50-85e6-0ffd8f5b7d7c","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Python","C++","Rust","Go","Java","Kubernetes","SLURM","container 
orchestration","schedulers"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:46:02.806Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, C++, Rust, Go, Java, Kubernetes, SLURM, container orchestration, schedulers"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2e8a2997-260"},"title":"Senior Infrastructure Engineer","description":"<p>We are open to hiring at multiple levels for this role, depending on experience, impact, and demonstrated ownership. While this role is level-agnostic, it is best suited for engineers with experience owning and working in highly ambiguous problem spaces.</p>\n<p>About the company:\nThe mining industry has steadily become worse at finding new ore deposits, requiring &gt;10X more capital to make discoveries compared to 30 years ago. KoBold Metals builds AI models for mineral exploration and deploys those models, alongside our novel sensors, to guide decisions on KoBold-owned-and-operated exploration programs.</p>\n<p>About The Role:\nIn this role, you will partner with exploration and engineering teams to build reliable, scalable infrastructure that makes it easier to turn data and models into real-world exploration insights. You will improve observability, streamline MLOps workflows, and maintain shared tools like JupyterHub that enable faster experimentation and collaboration. 
Your work will help create a solid foundation for scientists and engineers to focus on discovery instead of infrastructure.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Design, build, and operate compute infrastructure that is both scalable and reliable to support critical services.</li>\n<li>Work closely with engineering teams to embed observability, reliability, and security throughout the software development process.</li>\n<li>Create and maintain automation for monitoring, deployments, and incident response to keep operations efficient and predictable.</li>\n<li>Lead or support capacity planning, performance reviews, and system tuning to ensure stable and efficient systems.</li>\n<li>Join the on-call rotation and take part in incident response, troubleshooting, and resolution.</li>\n<li>Develop and refine monitoring and alerting to catch issues early and reduce downtime.</li>\n<li>Establish and maintain disaster recovery and business continuity practices that protect the organization against failures.</li>\n<li>Regularly review and improve our tools and processes to strengthen system visibility and reliability.</li>\n<li>Investigate points of fragility in distributed systems and understand how complex systems behave under stress in order to improve resilience.</li>\n<li>Continually learn about mineral exploration through reading, discussions with exploration team members, periodic rotation on an exploration team and time in the field with geologists.</li>\n</ul>\n<p>Qualifications</p>\n<ul>\n<li>5+ years of experience as an Infrastructure Engineer, Site Reliability Engineer or in a similar role</li>\n<li>Strong scripting and programming skills (Python, Go, Java or JavaScript/Node.js)</li>\n<li>Experience with IaC tools like Terraform and container orchestration tools like Kubernetes and Docker</li>\n<li>Experience with cloud platforms such as AWS</li>\n<li>Experience operating or administering JupyterHub in a multi-user environment</li>\n<li>Understanding of MLOps 
workflows, including model training, deployment, and related tooling</li>\n<li>Excellent communication &amp; collaboration skills and a continuous improvement mindset</li>\n<li>Proven ability to troubleshoot complex issues and implement effective solutions</li>\n<li>Proven ability to thrive in dynamic and evolving environments, effectively navigating uncertainty and incomplete information.</li>\n<li>Proven ability to grow expertise, influence &amp; educate others</li>\n<li>Comfortable making informed decisions with limited data, adapting quickly to new circumstances, and maintaining focus on strategic objectives while driving clarity for the team.</li>\n<li>Intellectual curiosity and eagerness to learn about all aspects of mineral exploration, particularly in the geology domain. Enjoys constantly learning such that you are driving insights through using our tools in exploration and willing to work directly with geologists in the field.</li>\n<li>Ability to explain technical problems to and collaborate on solutions with domain experts who are not infrastructure engineers. 
A strong communicator who enjoys working with colleagues across the company.</li>\n<li>Excitement about joining a fast-growing early-stage company, comfort with a dynamic work environment, and eagerness to take on an evolving range of responsibilities.</li>\n<li>Keen not just to build cool technology, but to figure out what technical product to build to best achieve the business objectives of the company.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2e8a2997-260","directApply":true,"hiringOrganization":{"@type":"Organization","name":"KoBold Metals","sameAs":"https://koboldmetals.com/","logo":"https://logos.yubhub.co/koboldmetals.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/koboldmetals/jobs/4002126005","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$170,000 - $230,000","x-skills-required":["scripting","programming","IaC","container orchestration","cloud platforms","MLOps workflows","observability","reliability","security","automation","monitoring","deployments","incident response","capacity planning","performance reviews","system tuning","disaster recovery","business continuity","tools","processes","distributed systems","complex systems","resilience","mineral exploration","geology"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:40:33.164Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"scripting, programming, IaC, container orchestration, cloud platforms, MLOps workflows, observability, reliability, security, automation, monitoring, deployments, incident response, capacity planning, performance reviews, system tuning, disaster recovery, business continuity, tools, processes, 
distributed systems, complex systems, resilience, mineral exploration, geology","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":170000,"maxValue":230000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_86622b48-10e"},"title":"Software Engineer, Site Reliability","description":"<p>We are looking for a Site Reliability Engineer who thinks like a software engineer first. You will own critical production systems end-to-end, designing, building, and improving them rather than simply operating them. You will write production-quality code that keeps the platform reliable at scale, embed with product engineering teams to influence architecture from the start, and build the internal tooling that every engineer at Hebbia depends on.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Own critical production services end-to-end, from design and code review through deployment, operation, and incident response</li>\n<li>Profile, benchmark, and rewrite hot paths to eliminate bottlenecks as Hebbia scales</li>\n<li>Lead incident response and drive post-mortem culture, translating findings into code changes and architectural improvements rather than runbooks</li>\n<li>Design and build observability frameworks from scratch, writing custom instrumentation, alerting logic, and debugging tooling that surfaces production issues before customers feel them</li>\n<li>Define and enforce SLOs across platform services and build the feedback loops that keep engineering teams accountable to them</li>\n<li>Own capacity planning and cost efficiency: model growth, right-size infrastructure, and write automation that prevents over-provisioning and resource exhaustion</li>\n<li>Build robust, well-tested internal platforms and deployment tooling held to the same engineering standards as customer-facing code</li>\n<li>Own and continuously improve CI/CD systems so 
engineering teams can ship safely and quickly</li>\n<li>Embed with product engineering teams as a peer software engineer, contributing directly to production codebases and co-designing systems for reliability from the start</li>\n<li>Partner on infrastructure security through threat modeling, hardening, and automated compliance tooling</li>\n</ul>\n<p>Who You Are:</p>\n<ul>\n<li>5+ years software development with a track record of writing, shipping, and maintaining production services, not just operating infrastructure</li>\n<li>Production-grade proficiency in at least one systems or backend language: Go, Python, C++, or Rust</li>\n<li>Proven experience as a Production Engineer, SRE, or software engineer with a deep infrastructure focus, comfortable owning services end-to-end across the full stack</li>\n<li>Deep understanding of distributed systems</li>\n<li>Container orchestration expertise and hands-on experience debugging complex distributed failures in production</li>\n<li>Working knowledge of OS-level concepts</li>\n<li>Cloud platform fluency (AWS preferred)</li>\n<li>Experience in building and maintaining observability stacks</li>\n<li>Strong CI/CD pipeline expertise and a track record of improving developer velocity without sacrificing safety</li>\n<li>Background at a company with a Production Engineering or software-focused SRE culture is a strong plus</li>\n<li>Experience building platforms for AI/ML workloads or high-throughput document processing pipelines is a plus</li>\n</ul>\n<p>Compensation:\nThe salary range for this role is $160,000 to $300,000. This range may be inclusive of several career levels at Hebbia and will be narrowed during the interview process based on the candidate’s experience and qualifications. 
Adjustments outside of this range may be considered for candidates whose qualifications significantly differ from those outlined in the job description.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_86622b48-10e","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Hebbia","sameAs":"https://hebbia.com","logo":"https://logos.yubhub.co/hebbia.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/hebbia/jobs/4666955005","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$160,000 - $300,000","x-skills-required":["Go","Python","C++","Rust","Distributed systems","Container orchestration","OS-level concepts","Cloud platform fluency (AWS)","Observability stacks","CI/CD pipeline expertise"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:37:23.089Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York City; San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Go, Python, C++, Rust, Distributed systems, Container orchestration, OS-level concepts, Cloud platform fluency (AWS), Observability stacks, CI/CD pipeline expertise","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":160000,"maxValue":300000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2a88ee59-dc6"},"title":"Full Stack Engineer (Serverless)","description":"<p>We&#39;re building the fastest and most scalable infrastructure for AI inference. As a Full Stack Engineer on Serverless, you will build the core product across frontend and backend that powers our Serverless platform. 
This is a deeply product-focused role where you will work side-by-side with Product and Infrastructure to design and ship reusable, scalable systems that enterprise customers rely on in production every day.</p>\n<p>You will be a foundational technical owner of our Serverless product as it scales to thousands of enterprise customers, with real responsibility, autonomy, and impact. This is a chance to help build a new product vertical from the ground up inside a company that is already scaling at rocket-ship speed.</p>\n<p>Your responsibilities will include:</p>\n<ul>\n<li>Building and maintaining core Serverless UI features (dashboards, logs, observability, configuration, usage)</li>\n<li>Designing and implementing backend APIs that power the Serverless product experience</li>\n<li>Improving performance, reliability, and scalability of customer-facing systems</li>\n<li>Working closely with Infrastructure to ensure product features align with platform capabilities</li>\n<li>Owning features end-to-end, from design through production and iteration</li>\n</ul>\n<p>We&#39;re looking for a strong experience working across both frontend and backend, proficiency with TypeScript, Python, Postgres, and Next.js, and experience owning features end-to-end in production systems. 
Ability to context switch between UI, backend, and performance work, product-minded engineer who values clean abstractions and long-term maintainability, comfortable working in a fast-moving, low-process environment.</p>\n<p>Nice to have experience building developer platforms or infrastructure-adjacent products, familiarity with observability tooling (logging, metrics, tracing) in production environments, background in distributed systems, container orchestration, or cloud-native architectures, experience with real-time systems, streaming logs, or high-throughput data pipelines, exposure to technologies such as Kubernetes, Prometheus, Datadog, gRPC, or similar systems, entrepreneurial mindset and strong ownership mentality.</p>\n<p>We offer interesting and challenging work, competitive salary and equity, a lot of learning and growth opportunities, visa sponsorship and relocation assistance, health, dental, and vision insurance, regular team events and offsite.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2a88ee59-dc6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Fal","sameAs":"https://www.fal.com/","logo":"https://logos.yubhub.co/fal.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/fal/jobs/4112697009","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$150,000 - $230,000 + equity + comprehensive benefits package","x-skills-required":["TypeScript","Python","Postgres","Next.js","serverless","backend APIs","frontend development"],"x-skills-preferred":["observability tooling","distributed systems","container orchestration","cloud-native architectures","real-time systems","streaming logs","high-throughput data 
pipelines","Kubernetes","Prometheus","Datadog","gRPC"],"datePosted":"2026-04-17T12:32:02.355Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"TypeScript, Python, Postgres, Next.js, serverless, backend APIs, frontend development, observability tooling, distributed systems, container orchestration, cloud-native architectures, real-time systems, streaming logs, high-throughput data pipelines, Kubernetes, Prometheus, Datadog, gRPC","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":150000,"maxValue":230000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_fecd9424-b1a"},"title":"DevOps Engineer","description":"<p>Espresso Systems is looking for a DevOps Engineer to assist the development team in building infrastructure to support production of the sequencer software, as well as to build tooling for the deployment of test networks.</p>\n<p>As a DevOps Engineer, you will be responsible for monitoring and management of cloud environments (AWS, Azure, GCP), assisting with CI/CD pipelines and general code management practices, and the development and maintenance of tooling for deployment of Espresso Systems services.</p>\n<p>We are a fully remote team with flexible hours, and offer a competitive salary + equity package, regular team off-sites to international locations, unlimited vacation policy, and top-tier health, dental, and vision coverage for US employees.</p>\n<p>Responsibilities:\nMonitoring and management of cloud environments (AWS, Azure, GCP)\nAssisting with CI/CD pipelines and general code management practices\nDevelopment and maintenance of tooling for deployment of Espresso Systems services</p>\n<p>Requirements:\nExperience working with containers in hosted 
environments\nExperience with Terraform or other similar declarative cloud operations technologies\nComfort working in the Linux command line</p>\n<p>Preferred:\nExperience with Ansible, AWS container orchestration, Docker, Github Actions, Nix, Rust, and Terraform</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_fecd9424-b1a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Espresso Systems","sameAs":"https://www.espressosystems.com/","logo":"https://logos.yubhub.co/espressosystems.com.png"},"x-apply-url":"https://jobs.lever.co/Espresso/928cd4da-e76d-4d6a-ba58-fba57e6ae81a","x-work-arrangement":"remote","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["containers","Terraform","Linux command line"],"x-skills-preferred":["Ansible","AWS container orchestration","Docker","Github Actions","Nix","Rust"],"datePosted":"2026-04-17T12:30:48.829Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"containers, Terraform, Linux command line, Ansible, AWS container orchestration, Docker, Github Actions, Nix, Rust"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_534baf2e-262"},"title":"Member of Technical Staff, Platform Engineering","description":"<p>At Anchorage Digital, we are building the world’s most advanced digital asset platform for institutions to participate in crypto. 
Anchorage Digital is a crypto platform that enables institutions to participate in digital assets through custody, staking, trading, governance, settlement, and the industry&#39;s leading security infrastructure.</p>\n<p>As a Member of Technical Staff on the Platform team, you’ll work at the intersection of infrastructure and developer experience (DevEx). You will help define the strategy for our monorepo build pipelines while simultaneously ensuring our hybrid cloud infrastructure is secure and performant.</p>\n<p>Internal developers will rely on the tools you build to get code to production smoothly, while the business will depend on the reliable runtime environment you provision. This role offers high cross-team exposure and the opportunity to solve complex distributed systems problems in a security-first environment.</p>\n<p><strong>Responsibilities</strong></p>\n<p><strong>Technical Skills:</strong></p>\n<ul>\n<li>Build the Platform: Design and implement next-gen CI/CD pipelines focusing on security, speed, and reliability.</li>\n<li>Hybrid Networking: Define network architecture initiatives, creating abstractions for connectivity between physical Data Centers and Kubernetes clusters.</li>\n<li>Cloud Architecture: Work with GCP to architect secure, scalable runtime environments and operate cost-efficient infrastructure.</li>\n</ul>\n<p><strong>Complexity and Impact of Work:</strong></p>\n<ul>\n<li>Developer Experience (DevEx): Measure and improve &quot;DORA metrics&quot; (Deployment Frequency, Lead Time for Changes).</li>\n<li>Modernization: Lead initiatives to migrate legacy network setups to modern, automated architectures.</li>\n<li>Test Infrastructure: Build robust local and remote test environments (ephemeral environments) to boost developer confidence.</li>\n</ul>\n<p><strong>Communication and Influence</strong></p>\n<ul>\n<li>Collaboration: Work with external partners (ISPs, DC providers) and internal security teams to deliver secure 
solutions.</li>\n<li>Advocacy: Champion best practices across CI/CD, network security, and observability.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Strong programming skills and a desire to stop manually SSH-ing into routers to start writing code instead.</li>\n<li>Experience with Kubernetes, Docker, and container orchestration.</li>\n<li>Experience building complex, performant CI/CD systems using toolchains like Bazel or Gradle.</li>\n</ul>\n<p><strong>Bonus Points</strong></p>\n<ul>\n<li>Experience with setting up and monitoring private B2B connectivity.</li>\n<li>Experience with ChromeOS or a background in the finance industry.</li>\n<li>A passion for reading blockchain protocol white papers for fun</li>\n<li>You were emotionally moved by the soundtrack to Hamilton, which chronicles the founding of a new financial system.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_534baf2e-262","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anchorage Digital","sameAs":"https://anchorage.com","logo":"https://logos.yubhub.co/anchorage.com.png"},"x-apply-url":"https://jobs.lever.co/anchorage/339e1838-3da6-4331-b7ea-72799d153bd1","x-work-arrangement":"remote","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Kubernetes","Docker","container orchestration","CI/CD","Bazel","Gradle","GCP","cloud architecture","network security","observability"],"x-skills-preferred":["setting up and monitoring private B2B connectivity","ChromeOS","finance industry","blockchain protocol white papers"],"datePosted":"2026-04-17T12:26:07.283Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Finance","skills":"Kubernetes, 
Docker, container orchestration, CI/CD, Bazel, Gradle, GCP, cloud architecture, network security, observability, setting up and monitoring private B2B connectivity, ChromeOS, finance industry, blockchain protocol white papers"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f3e50e3a-313"},"title":"Member of Technical Staff, Financial Infrastructure","description":"<p>At Anchorage Digital, we are building the world&#39;s most advanced digital asset platform for institutions to participate in crypto.</p>\n<p>We are building infrastructure that enables the world&#39;s largest financial institutions (banks, broker-dealers, fintechs, and payments companies) to offer digital asset services to their end clients. Think banking-as-a-service, but for crypto custody, trading, staking, and more.</p>\n<p>This is an opportunity to grow as an engineer while contributing to a platform that will power the next generation of institutional digital asset adoption. You&#39;ll work at the intersection of cutting-edge crypto technology and the rigorous standards required by regulated financial services.</p>\n<p>As an engineer on this team, you will contribute to the technical direction of our infrastructure services platform, collaborate with experienced engineers, and help build critical systems. 
You&#39;ll own features from development through deployment and deepen your expertise in financial infrastructure.</p>\n<p><strong>Technical Skills:</strong></p>\n<ul>\n<li>Develop and maintain infrastructure that powers digital asset custody, trading, staking, and settlement for enterprise financial institutions</li>\n<li>Contribute to virtual accounting layers, API integrations, and multi-tenant platform architectures with guidance from senior engineers</li>\n<li>Write clear, concise, tested code and participate in code reviews to improve engineering practices</li>\n<li>Build understanding of security, reliability, and testing best practices in a regulated environment</li>\n</ul>\n<p><strong>Complexity and Impact of Work:</strong></p>\n<ul>\n<li>Take ownership of the full lifecycle of features, from development through deployment, with increasing autonomy over time</li>\n<li>Apply problem-solving skills to troubleshoot issues, document technical improvements, and contribute to solutions</li>\n<li>Contribute to cross-functional projects and collaborate with adjacent teams to deliver on platform goals</li>\n<li>Participate in on-call rotation to provide product support for the software and infrastructure the team owns</li>\n</ul>\n<p><strong>Organizational Knowledge:</strong></p>\n<ul>\n<li>Understand how Anchorage&#39;s company priorities relate to your area of work and clearly communicate the &#39;why&#39; behind the work</li>\n<li>Work closely with compliance, legal, and security teams to understand regulatory requirements affecting the platform</li>\n<li>Participate in the hiring process through referrals, conducting interviews, or attending recruiting events</li>\n</ul>\n<p><strong>Communication and Influence:</strong></p>\n<ul>\n<li>Communicate clearly and proactively share updates that impact colleagues, managers, and leads</li>\n<li>Contribute actively to team discussions and problem-solving efforts</li>\n<li>Collaborate with your team and adjacent teams to 
solve problems, and assist or teach other team members when possible</li>\n</ul>\n<p><strong>You may be a fit for this role if you have:</strong></p>\n<ul>\n<li>2-5 years of professional experience building backend services and distributed systems</li>\n<li>Experience contributing to customer-facing APIs or multi-tenant platform architectures</li>\n<li>Familiarity with cloud-native microservices (we use GCP, but experience with other cloud providers is sufficient)</li>\n<li>A track record of writing automated tests alongside your features and supporting production systems</li>\n<li>Experience participating in on-call support and incident response for production systems</li>\n<li>Strong computer science fundamentals (algorithms, data structures, systems design); formal CS degree not required</li>\n<li>A genuine interest in code quality, infrastructure reliability, and security</li>\n<li>The ability to prioritize end-user experience and business value</li>\n</ul>\n<p><strong>Although not a requirement, bonus points if:</strong></p>\n<ul>\n<li>Professional experience with C++, Go, or Rust</li>\n<li>Experience with Kubernetes and container orchestration</li>\n<li>Exposure to ledger systems, virtual accounting layers, or custody infrastructure</li>\n<li>Background in financial services, fintech, or banking-as-a-service platforms</li>\n<li>Experience with system security (authentication, authorization, identity management)</li>\n<li>Familiarity with regulatory requirements for financial institutions (SOC 2, bank examinations, etc.)</li>\n<li>Interest in or exposure to the cryptocurrency or digital assets industry</li>\n</ul>\n<p><strong>Additional Information About Anchorage Digital:</strong></p>\n<p>Who we are The Anchorage Village, what we call our team, brings together the brightest minds from platform security, financial services, and distributed ledger technology to provide the building blocks that empower institutions to safely participate in the evolving digital 
asset ecosystem.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f3e50e3a-313","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anchorage Digital","sameAs":"https://anchorage.com","logo":"https://logos.yubhub.co/anchorage.com.png"},"x-apply-url":"https://jobs.lever.co/anchorage/f2f96be1-0e48-4eee-919c-af581ee649ae","x-work-arrangement":"remote","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["cloud-native microservices","API integrations","multi-tenant platform architectures","security, reliability, and testing best practices","distributed systems","backend services","customer-facing APIs","automated tests","on-call support","incident response"],"x-skills-preferred":["C++","Go","Rust","Kubernetes","container orchestration","ledger systems","virtual accounting layers","custody infrastructure","financial services","fintech","banking-as-a-service platforms","system security","regulatory requirements"],"datePosted":"2026-04-17T12:25:24.846Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Finance","skills":"cloud-native microservices, API integrations, multi-tenant platform architectures, security, reliability, and testing best practices, distributed systems, backend services, customer-facing APIs, automated tests, on-call support, incident response, C++, Go, Rust, Kubernetes, container orchestration, ledger systems, virtual accounting layers, custody infrastructure, financial services, fintech, banking-as-a-service platforms, system security, regulatory 
requirements"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_651ca674-a3b"},"title":"Member of Technical Staff, Financial Infrastructure","description":"<p>At Anchorage Digital, we are building the world&#39;s most advanced digital asset platform for institutions to participate in crypto.</p>\n<p>We are building infrastructure that enables the world&#39;s largest financial institutions (banks, broker-dealers, fintechs, and payments companies) to offer digital asset services to their end clients.</p>\n<p>This is a greenfield opportunity to architect and build a platform that will power the next generation of institutional digital asset adoption.</p>\n<p>As an engineer on this team, you will drive the technical direction of our infrastructure services platform, mentor other engineers, and help shape the product roadmap.</p>\n<p><strong>Technical Skills:</strong></p>\n<ul>\n<li>Build secure, scalable infrastructure that powers digital asset custody, trading, staking, and settlement for enterprise financial institutions</li>\n<li>Design and implement virtual accounting layers, API integrations, and multi-tenant platform architectures</li>\n<li>Drive technical excellence through code reviews, design specs, and mentorship across the engineering organization</li>\n<li>Foster an efficient testing culture with emphasis on security, reliability, and minimizing tech debt in a regulated environment</li>\n</ul>\n<p><strong>Complexity and Impact of Work:</strong></p>\n<ul>\n<li>Lead complex cross-functional projects spanning custody, trading, compliance, and client integration</li>\n<li>Break down large initiatives into well-scoped deliverables; accurately estimate multi-person, multi-quarter projects</li>\n<li>Own the end-to-end lifecycle of platform features, from architecture through production support and on-call rotation</li>\n<li>Find the right balance between shipping velocity and the perfection 
required for regulated financial infrastructure</li>\n</ul>\n<p><strong>Organizational Knowledge:</strong></p>\n<ul>\n<li>Collaborate across product lines and with enterprise clients to understand and deliver on their infrastructure needs</li>\n<li>Work closely with compliance, legal, and security teams to ensure platform meets regulatory requirements</li>\n<li>Contribute to scaling the team through hiring, onboarding, and knowledge sharing</li>\n</ul>\n<p><strong>Communication and Influence:</strong></p>\n<ul>\n<li>Influence architecture decisions and product roadmap; have a seat at the table with leadership</li>\n<li>Mentor and guide engineers on the team; help others connect their work to Anchorage&#39;s strategic goals</li>\n<li>Communicate effectively with external stakeholders, including enterprise clients and partners</li>\n</ul>\n<p><strong>You may be a fit for this role if you have:</strong></p>\n<ul>\n<li>8+ years of professional experience building backend services and distributed systems</li>\n<li>Experience designing and delivering customer-facing APIs and multi-tenant platform architectures</li>\n<li>Experience with cloud-native microservices (we use GCP, but experience with other cloud providers is sufficient)</li>\n<li>A track record of building automated tests alongside your features and maintaining production systems over time</li>\n<li>Experience performing on-call support and incident response for production systems</li>\n<li>Strong computer science fundamentals (algorithms, data structures, systems design); formal CS degree not required</li>\n<li>A genuine interest in code quality, infrastructure reliability, and security</li>\n<li>The ability to prioritize end-user experience and business value over &#39;cool tech&#39;</li>\n</ul>\n<p><strong>Although not a requirement, bonus points if:</strong></p>\n<ul>\n<li>Professional experience with C++, Go, or Rust</li>\n<li>Professional experience with Kubernetes and container 
orchestration</li>\n<li>Experience building ledger systems, virtual accounting layers, or custody infrastructure</li>\n<li>Background in financial services, fintech, or banking-as-a-service platforms</li>\n<li>Experience with system security (authentication, authorization, identity management, API keys)</li>\n<li>Familiarity with regulatory requirements for financial institutions (SOC 2, bank examinations, etc.)</li>\n<li>Background in the cryptocurrency or digital assets industry</li>\n<li>You were emotionally moved by the soundtrack to Hamilton, which chronicles the founding of a new financial system.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_651ca674-a3b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anchorage Digital","sameAs":"https://anchorage.com","logo":"https://logos.yubhub.co/anchorage.com.png"},"x-apply-url":"https://jobs.lever.co/anchorage/1407f1e9-8a54-4c3f-afe0-62b96579e815","x-work-arrangement":"remote","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Cloud-native microservices","API integrations","Multi-tenant platform architectures","Code reviews","Design specs","Mentorship","Security","Reliability","Tech debt","Regulated environment","Custody","Trading","Staking","Settlement","Enterprise financial institutions","Virtual accounting layers","Strong computer science fundamentals","Algorithms","Data structures","Systems design"],"x-skills-preferred":["C++","Go","Rust","Kubernetes","Container orchestration","Ledger systems","Custody infrastructure","Financial services","Fintech","Banking-as-a-service platforms","System security","Authentication","Authorization","Identity management","API keys","Regulatory requirements","SOC 2","Bank examinations","Cryptocurrency","Digital 
assets"],"datePosted":"2026-04-17T12:25:10.161Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Finance","skills":"Cloud-native microservices, API integrations, Multi-tenant platform architectures, Code reviews, Design specs, Mentorship, Security, Reliability, Tech debt, Regulated environment, Custody, Trading, Staking, Settlement, Enterprise financial institutions, Virtual accounting layers, Strong computer science fundamentals, Algorithms, Data structures, Systems design, C++, Go, Rust, Kubernetes, Container orchestration, Ledger systems, Custody infrastructure, Financial services, Fintech, Banking-as-a-service platforms, System security, Authentication, Authorization, Identity management, API keys, Regulatory requirements, SOC 2, Bank examinations, Cryptocurrency, Digital assets"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_7bce292a-74f"},"title":"CyberSecurity Team Lead, Infrastructure and Application","description":"<p>Role summary</p>\n<p>Embedded directly within Mistral&#39;s Security Engineering ecosystem, you will architect and enforce the security posture of our entire technical stack, from on-premise foundations to cloud-native deployments.</p>\n<p>As a CyberSecurity Team Lead, you will oversee the identification, prioritization, and remediation of vulnerabilities across both On-Prem and Cloud infrastructures as well as internal applications.</p>\n<p>You will select, deploy, and maintain the tools needed for visibility and protection, including CNAPP, CSPM, SAST/DAST, secret scanning, and SBOM/CVE tracking.</p>\n<p>Integrate security controls and automated gates directly into CI/CD pipelines to catch vulnerabilities before deployment (Shift Left).</p>\n<p>Partner with engineering teams to interpret findings and 
&#39;ease the fix,&#39; providing patches, code snippets, or architectural advice to resolve issues quickly.</p>\n<p>Define and maintain rigorous security guidelines and best practices for developers and system administrators.</p>\n<p>Design and lead security awareness programs and technical training tailored for developers and admins to reduce human risk.</p>\n<p>Track and define key security metrics (MTTR, coverage, vulnerability density) to visualize posture and progress to leadership.</p>\n<p>Who you are</p>\n<p>• 6+ years of experience in Information Security, with a specific focus on Application Security, Cloud Security, or DevSecOps.</p>\n<p>• Strong scripting skills (Python, Go, or Bash) to automate security tasks and integrate tools.</p>\n<p>• Deep understanding of CI/CD ecosystems and container orchestration (Kubernetes/Docker).</p>\n<p>• Hands-on experience with modern security tooling (e.g., Wiz, Snyk, SonarQube, Prisma, or similar enterprise tools).</p>\n<p>• Collaborative mindset: you view developers as partners, not adversaries, and focus on enabling them to code securely.</p>\n<p>• Clear communication, autonomous, and capable of translating technical security risks into actionable engineering tasks.</p>\n<p>It would be ideal if you also have:</p>\n<p>• Industry certifications such as CISSP, CCSP, OSCP, or cloud-specific security certifications.</p>\n<p>• Strong Infrastructure as Code (IaC) experience with Terraform or Ansible.</p>\n<p>• Experience in offensive security (Penetration Testing) to better understand attacker mindsets.</p>\n<p>• Prior experience securing large-scale AI or Machine Learning infrastructure.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_7bce292a-74f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral 
AI","sameAs":"https://mistral.ai"},"x-apply-url":"https://jobs.lever.co/mistral/c9b75928-dd48-4432-b6f1-fc0b24e51657","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"hybrid","x-salary-range":null,"x-skills-required":["Application Security","Cloud Security","DevSecOps","CI/CD","Container Orchestration","Modern Security Tooling","Scripting Skills","Infrastructure as Code"],"x-skills-preferred":["Industry Certifications","Infrastructure as Code","Offensive Security","Large-Scale AI or Machine Learning Infrastructure"],"datePosted":"2026-03-10T11:24:46.918Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"occupationalCategory":"Engineering","industry":"Technology","skills":"Application Security, Cloud Security, DevSecOps, CI/CD, Container Orchestration, Modern Security Tooling, Scripting Skills, Infrastructure as Code, Industry Certifications, Infrastructure as Code, Offensive Security, Large-Scale AI or Machine Learning Infrastructure"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_eebf21c4-d1f"},"title":"Staff Site Reliability Engineer","description":"<p>Join our Site Reliability Engineering (SRE) team and help ensure the reliability, scalability, and performance of Replit&#39;s infrastructure that serves millions of developers worldwide.</p>\n<p>As a Staff Site Reliability Engineer, you will bridge the gap between development and operations, implementing automation and establishing best practices that enable our platform to scale efficiently while maintaining high availability.</p>\n<p>We are seeking Staff SREs who are passionate about building and maintaining resilient systems at scale. 
Your mission will be to proactively find and analyze reliability problems across our stack, then design and implement software and systems to create step-function improvements.</p>\n<p>You will design robust observability solutions, lead incident response, automate operational tasks, and continuously improve our infrastructure&#39;s reliability, all while mentoring and educating the broader engineering team to make reliability a core value at Replit.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Architect and Implement Observability: Design, build, and lead the implementation of comprehensive monitoring, logging, and tracing solutions. Create dashboards and metrics that provide real-time visibility into system health and performance, enabling proactive issue detection.</li>\n</ul>\n<ul>\n<li>Define and Drive Reliability Standards: Work with product and engineering teams to define, implement, and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Build systems to monitor and report on these metrics, holding teams accountable and ensuring we maintain high reliability standards while balancing innovation speed.</li>\n</ul>\n<ul>\n<li>Lead Incident Management and Response: Act as a senior leader during high-impact incidents, guiding the team to rapid resolution. Conduct thorough, blameless post-mortems and drive the implementation of preventative measures. Develop and refine runbooks and build automation to reduce Mean Time To Recovery (MTTR).</li>\n</ul>\n<ul>\n<li>Drive Automation and Infrastructure as Code: Architect, build, and improve automation to eliminate toil and operational work. Design and maintain CI/CD pipelines and infrastructure automation using tools like Terraform or Pulumi. 
Create self-healing systems that can automatically respond to common failure scenarios.</li>\n</ul>\n<ul>\n<li>Optimize Performance on Kubernetes: Collaborate with core infrastructure and product teams to performance-tune and optimize our large-scale cloud deployments, with a deep focus on Kubernetes, Docker, and GCP. Identify and resolve performance bottlenecks, implement capacity planning strategies, and reduce latency across global regions.</li>\n</ul>\n<ul>\n<li>Debug and Harden Distributed Systems: Dive deep into debugging extremely difficult technical problems across the stack. Use your findings to design and implement long-term fixes that make our systems and products more robust, operable, and easier to diagnose.</li>\n</ul>\n<ul>\n<li>Provide Staff-Level Guidance: Review feature and system designs from across the company, acting as a key owner for the reliability, scalability, security, and operational integrity of those designs.</li>\n</ul>\n<ul>\n<li>Educate and Mentor: Educate, mentor, and hold accountable the broader engineering team to improve the reliability of our systems, making reliability a core value of the Replit engineering culture.</li>\n</ul>\n<ul>\n<li>Build and Integrate: Write high-quality, well-tested code in Python or Go to meet the needs of your customers, whether it&#39;s building new internal tools or integrating with third-party vendors.</li>\n</ul>\n<p><strong>Required Skills and Experience</strong></p>\n<ul>\n<li>8-10 years of experience in Site Reliability Engineering or similar roles (e.g., DevOps, Systems Engineering, Infrastructure Engineering).</li>\n</ul>\n<ul>\n<li>Strong programming skills in languages like Python or Go. You write high-quality, well-tested code.</li>\n</ul>\n<ul>\n<li>Deep understanding of distributed systems. 
You’ve designed, built, scaled, and maintained production services and know how to compose a service-oriented architecture.</li>\n</ul>\n<ul>\n<li>Deep experience with container orchestration platforms, specifically Kubernetes, and cloud-native technologies.</li>\n</ul>\n<ul>\n<li>Proven track record of designing, implementing, and maintaining sophisticated monitoring and observability solutions (e.g., metrics, logging, tracing).</li>\n</ul>\n<ul>\n<li>Strong incident management skills with extensive experience leading incident response for complex systems and demonstrated critical thinking under pressure.</li>\n</ul>\n<ul>\n<li>Experience with infrastructure as code (e.g., Terraform, Pulumi) and configuration management tools.</li>\n</ul>\n<ul>\n<li>Excellent written and verbal communication skills, with an ability to explain complex technical concepts clearly and simply and a bias toward open, transparent cultural practices.</li>\n</ul>\n<ul>\n<li>Strong interpersonal skills, with experience working with and mentoring engineers from junior to principal levels.</li>\n</ul>\n<ul>\n<li>A willingness to dive into understanding, debugging, and improving any layer of the stack.</li>\n</ul>\n<ul>\n<li>You&#39;re passionate about making software creation accessible and empowering the next generation of builders.</li>\n</ul>\n<p><strong>Bonus Points</strong></p>\n<ul>\n<li>Deep experience with Google Cloud Platform (GCP) services and tools.</li>\n</ul>\n<ul>\n<li>Expert-level knowledge of modern observability platforms (e.g., Prometheus, Grafana, Datadog, OpenTelemetry).</li>\n</ul>\n<ul>\n<li>Experience designing and building reliable systems capable of handling high throughput and low latency.</li>\n</ul>\n<ul>\n<li>Significant experience with Go and Terraform.</li>\n</ul>\n<ul>\n<li>Familiarity with working in rapid-growth, startup environments.</li>\n</ul>\n<ul>\n<li>Experience writing company-facing blog posts and training materials.</li>\n</ul>\n<p 
style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_eebf21c4-d1f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Replit","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/replit.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/replit/d50ad15b-82d4-452f-b4ea-2a7f5e796170","x-work-arrangement":"remote","x-experience-level":"staff","x-job-type":"Full time","x-salary-range":"$220K - $325K","x-skills-required":["Site Reliability Engineering","DevOps","Systems Engineering","Infrastructure Engineering","Python","Go","Distributed Systems","Container Orchestration","Kubernetes","Cloud-Native Technologies","Monitoring and Observability","Incident Management","Infrastructure as Code","Terraform","Pulumi","Configuration Management"],"x-skills-preferred":["Google Cloud Platform","Prometheus","Grafana","Datadog","OpenTelemetry","Go","Terraform"],"datePosted":"2026-03-08T22:20:23.639Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote (United States)"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Site Reliability Engineering, DevOps, Systems Engineering, Infrastructure Engineering, Python, Go, Distributed Systems, Container Orchestration, Kubernetes, Cloud-Native Technologies, Monitoring and Observability, Incident Management, Infrastructure as Code, Terraform, Pulumi, Configuration Management, Google Cloud Platform, Prometheus, Grafana, Datadog, OpenTelemetry, Go, Terraform","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":220000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_672557eb-bee"},"title":"Engineering Manager, Data 
Platform","description":"<p><strong>Engineering Manager, Data Platform</strong></p>\n<p>We&#39;re looking for an experienced Engineering Manager to lead our Data Interfaces team, responsible for enabling users and systems to leverage our core data platform. The team owns the collection of operational telemetry data, the UI for interacting with the Data Platform, as well as APIs and plugins for querying data out of the Data Platform for visualization, alerting, and integration into internal services.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Lead, mentor, and grow a team of senior and principal engineers</li>\n<li>Foster an inclusive, collaborative, and feedback-driven engineering culture</li>\n<li>Drive continuous improvement in the team&#39;s processes, delivery, and impact</li>\n<li>Collaborate with stakeholders in engineering, data science, and analytics to shape and communicate the team&#39;s vision, strategy, and roadmap</li>\n<li>Bridge strategic vision and tactical execution by breaking down long-term goals into achievable, well-scoped iterations that deliver continuous value</li>\n<li>Ensure high standards in system architecture, code quality, and operational excellence</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>3+ years of engineering management experience leading high-performing teams in data platform or infrastructure environments</li>\n<li>Proven track record navigating complex systems, ambiguous requirements, and high-pressure situations with confidence and clarity</li>\n<li>Deep experience in architecting, building, and operating scalable, distributed data platforms</li>\n<li>Strong technical leadership skills, including the ability to review architecture/design documents and provide actionable feedback on code and systems</li>\n<li>Ability to engage deeply in technical discussions, review architecture and design documents, evaluate pull requests, and step in during high-priority incidents when needed — even if 
hands-on coding isn’t a part of the day-to-day</li>\n<li>Hands-on experience with distributed event streaming systems like Apache Kafka</li>\n<li>Familiarity with OLAP databases such as Apache Pinot or ClickHouse</li>\n<li>Proficient in modern data lake and warehouse tools such as S3, Databricks, or Snowflake</li>\n<li>Strong foundation in the .NET ecosystem, container orchestration with Kubernetes, and cloud platforms, especially AWS</li>\n<li>Experience with distributed data processing engines like Apache Flink or Apache Spark is nice to have</li>\n</ul>\n<p><strong>Benefits</strong></p>\n<p>Epic Games offers a comprehensive benefits package, including:</p>\n<ul>\n<li>100% coverage of medical, dental, and vision premiums for you and your dependents</li>\n<li>Long-term disability and life insurance</li>\n<li>401k with competitive match</li>\n<li>Unlimited PTO and sick time</li>\n<li>Paid sabbatical after 7 years of employment</li>\n<li>Robust mental well-being program through Modern Health</li>\n<li>Company-wide paid breaks and events throughout the year</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_672557eb-bee","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Epic Games","sameAs":"https://www.epicgames.com","logo":"https://logos.yubhub.co/epicgames.com.png"},"x-apply-url":"https://www.epicgames.com/en-US/careers/jobs/5818031004","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["engineering management","data platform","distributed event streaming systems","OLAP databases","modern data lake and warehouse tools",".NET ecosystem","container orchestration","cloud platforms"],"x-skills-preferred":["Apache Kafka","Apache Pinot","ClickHouse","S3","Databricks","Snowflake","Kubernetes","AWS","Apache Flink","Apache 
Spark"],"datePosted":"2026-03-08T22:16:11.037Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Cary"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"engineering management, data platform, distributed event streaming systems, OLAP databases, modern data lake and warehouse tools, .NET ecosystem, container orchestration, cloud platforms, Apache Kafka, Apache Pinot, ClickHouse, S3, Databricks, Snowflake, Kubernetes, AWS, Apache Flink, Apache Spark"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_93a4ece6-182"},"title":"Member of Technical Staff, Site Reliability Engineer (HPC)","description":"<p>As Microsoft continues to push the boundaries of AI, we are on the lookout for experienced individuals to work with us on the most interesting and challenging AI questions of our time. Our vision is to build systems that have true artificial intelligence across agents, applications, services, and infrastructure. We&#39;re looking for an experienced HPC Site Reliability Engineer (SRE) to join our High Performance Computing (HPC) infrastructure team. In this role, you&#39;ll blend software engineering and systems engineering to keep our large-scale distributed AI infrastructure reliable and efficient. You&#39;ll ensure that AI systems stay efficient and reliable with very high uptimes.</p>\n<p>Microsoft&#39;s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.</p>\n<p>This role is part of Microsoft AI&#39;s Superintelligence Team. 
The MAIST is a startup-like team inside Microsoft AI, created to push the boundaries of AI toward Humanist Superintelligence: ultra-capable systems that remain controllable, safety-aligned, and anchored to human values. Our mission is to create AI that amplifies human potential while ensuring humanity remains firmly in control. We aim to deliver breakthroughs that benefit society by advancing science, education, and global well-being.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Reliability &amp; Availability: Ensure uptime, resiliency, and fault tolerance of HPC clusters powering MAI model training and inference.</li>\n<li>Observability: Design and maintain monitoring, alerting, and logging systems to provide real-time visibility into all aspects of HPC systems, including GPUs, clusters, storage, and networking.</li>\n<li>Automation &amp; Tooling: Build automation for deployments, incident response, scaling, and failover in CPU+GPU environments.</li>\n<li>Incident Management: Lead on-call rotations, troubleshoot production issues, conduct blameless postmortems, and drive continuous improvements.</li>\n<li>Security &amp; Compliance: Ensure data privacy, compliance, and secure operations across model training and serving environments.</li>\n<li>Collaboration: Partner with ML engineers and platform teams to improve developer experience and accelerate research-to-production workflows.</li>\n</ul>\n<p><strong>Qualifications</strong></p>\n<p><strong>Required Qualifications</strong></p>\n<p>Master’s Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering OR Bachelor’s Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering OR equivalent experience.</p>\n<p><strong>Preferred Qualifications</strong></p>\n<ul>\n<li>Strong proficiency in Kubernetes, Docker, and container orchestration.</li>\n<li>Knowledge of CI/CD pipelines for inference and ML model deployment.</li>\n<li>Hands-on experience with public cloud 
platforms like Azure/AWS/GCP and infrastructure-as-code.</li>\n<li>Expertise in monitoring &amp; observability tools (Grafana, Datadog, OpenTelemetry, etc.).</li>\n<li>Strong programming/scripting skills in Python, Go, or Bash.</li>\n<li>Solid knowledge of distributed systems, networking, and storage.</li>\n<li>Experience running large-scale GPU clusters for ML/AI workloads (preferred).</li>\n<li>Familiarity with ML training/inference pipelines.</li>\n<li>Experience with high-performance computing (HPC) and workload schedulers (Kubernetes operators).</li>\n<li>Background in capacity planning &amp; cost optimization for GPU-heavy environments.</li>\n</ul>\n<p>Work on cutting-edge infrastructure that powers the future of Generative AI. Collaborate with world-class researchers and engineers. Impact millions of users through reliable and responsible AI deployments. Competitive compensation, equity options, and comprehensive benefits.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_93a4ece6-182","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Microsoft","sameAs":"https://microsoft.ai","logo":"https://logos.yubhub.co/microsoft.ai.png"},"x-apply-url":"https://microsoft.ai/job/member-of-technical-staff-site-reliability-engineer-hpc-mai-superintelligence-team/","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$139,900 – $274,800 per year","x-skills-required":["Kubernetes","Docker","container orchestration","CI/CD pipelines","public cloud platforms","infrastructure-as-code","monitoring & observability tools","programming/scripting skills in Python, Go, or Bash","distributed systems","networking","storage","GPU clusters","ML training/inference pipelines","high-performance computing","workload schedulers"],"x-skills-preferred":["strong proficiency in Kubernetes","knowledge of CI/CD pipelines","hands-on experience with public cloud 
platforms","expertise in monitoring & observability tools","strong programming/scripting skills in Python, Go, or Bash","solid knowledge of distributed systems","experience running large-scale GPU clusters","familiarity with ML training/inference pipelines","experience with high-performance computing"],"datePosted":"2026-03-08T22:09:23.399Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Docker, container orchestration, CI/CD pipelines, public cloud platforms, infrastructure-as-code, monitoring & observability tools, programming/scripting skills in Python, Go, or Bash, distributed systems, networking, storage, GPU clusters, ML training/inference pipelines, high-performance computing, workload schedulers, strong proficiency in Kubernetes, knowledge of CI/CD pipelines, hands-on experience with public cloud platforms, expertise in monitoring & observability tools, strong programming/scripting skills in Python, Go, or Bash, solid knowledge of distributed systems, experience running large-scale GPU clusters, familiarity with ML training/inference pipelines, experience with high-performance computing","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139900,"maxValue":274800,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_82598f5e-54b"},"title":"Senior Software Security Engineer","description":"<p><strong>About Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. 
Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>About the role</strong></p>\n<p>The Security Engineering team&#39;s mission is to safeguard our AI systems and maintain the trust of our users and society at large. Whether we&#39;re developing critical security infrastructure, building secure development practices, or partnering with our research and product teams, we are committed to operating as a world-class security organisation and keeping the safety and trust of our users at the forefront of everything we do.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Build security for large-scale AI clusters, implementing robust cloud security architecture including IAM, network segmentation, and encryption controls</li>\n<li>Design secure-by-design workflows and secure CI/CD pipelines across our services, and help build secure cloud infrastructure, drawing on expertise in cloud environments, Kubernetes security, container orchestration, and identity management</li>\n<li>Ship and operate secure, high-reliability services using Infrastructure-as-Code (IaC) practices and GitOps workflows</li>\n<li>Apply deep expertise in threat modeling and risk assessment to secure complex multi-cloud environments</li>\n<li>Mentor engineers and contribute to hiring and growth of the Security team</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>5-15+ years of software engineering experience implementing and maintaining critical systems at scale</li>\n<li>Bachelor&#39;s degree in Computer Science/Software Engineering or equivalent industry experience</li>\n<li>Strong software engineering skills in Python or at least one systems language (Go, Rust, C/C++)</li>\n<li>Experience managing infrastructure at scale with DevOps and cloud automation best practices</li>\n<li>Track record of driving engineering excellence through high 
standards, constructive code reviews, and mentorship</li>\n<li>Proven ability to lead cross-functional security initiatives and navigate complex organisational dynamics</li>\n<li>Outstanding communication skills, translating technical concepts effectively across all organisational levels</li>\n<li>Demonstrated success in bringing clarity and ownership to ambiguous technical problems</li>\n<li>Strong systems thinking with ability to identify and mitigate risks in complex environments</li>\n<li>Low ego, high empathy engineer who attracts talent and supports diverse, inclusive teams</li>\n<li>Experience supporting fast-paced startup engineering teams</li>\n<li>Passionate about AI safety and alignment, with keen interest in making AI systems more interpretable and aligned with human values</li>\n</ul>\n<p><strong>Strong candidates may also have experience with:</strong></p>\n<ul>\n<li>Designing and hardening CI/CD pipelines against supply chain attacks through isolated environments, signed attestations, dependency verification, and automated policy enforcement</li>\n<li>Building secure development workflows through hardened remote environments</li>\n<li>Implementing network segmentation and access controls in cloud environments</li>\n<li>Managing infrastructure through automated configuration and policy enforcement</li>\n<li>Hardening containerized applications and enforcing security policies</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. 
However, some roles may require more time in our offices.</p>\n<p><strong>Salary</strong></p>\n<p>The annual compensation range for this role is £240,000 - £325,000GBP.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_82598f5e-54b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5022845008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£240,000 - £325,000GBP","x-skills-required":["Python","Go","Rust","C/C++","DevOps","Cloud automation","Kubernetes security","Container orchestration","Identity management"],"x-skills-preferred":["Threat modeling","Risk assessment","Secure-by-design workflows","CI/CD pipelines","Infrastructure-as-Code","GitOps workflows"],"datePosted":"2026-03-08T13:59:11.086Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Go, Rust, C/C++, DevOps, Cloud automation, Kubernetes security, Container orchestration, Identity management, Threat modeling, Risk assessment, Secure-by-design workflows, CI/CD pipelines, Infrastructure-as-Code, GitOps workflows","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":240000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_11a9548c-a4f"},"title":"Staff+ Software Engineer, Developer Productivity","description":"<p><strong>About Anthropic</strong></p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI 
systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>\n<p><strong>About the Role</strong></p>\n<p>Anthropic&#39;s Infrastructure organisation is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users — demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand.</p>\n<p>Developer Productivity owns the end-to-end experience of how engineers and researchers at Anthropic develop, build, test, and ship code at scale — from the source control and language ecosystems that underpin our monorepo, to the build and CI infrastructure that keeps thousands of daily builds running reliably across multiple cloud providers, to the developer acceleration tooling that deeply integrates Claude into engineering workflows.</p>\n<p>_Team Matching: Team matching is determined after the interview process based on interview performance, interests, and business priorities. 
Please note we may also consider you for different Infrastructure teams._</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Own the technical strategy and roadmap for your area, translating team-level goals into concrete execution plans</li>\n<li>Define infrastructure architecture, ensuring the hardest problems get solved — whether by you directly or by working through others</li>\n<li>Design and build scalable, reliable distributed infrastructure and shared libraries that support high-volume workloads across all engineering teams</li>\n<li>Own and evolve build environments, package management, and dependency systems to enable fast, reproducible builds</li>\n<li>Define and implement language ecosystem standards, tooling, and frameworks that drive developer productivity across research and production workloads</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 10+ years (not including internships or co-ops) of experience in a Software Engineer role, building and operating large-scale developer infrastructure</li>\n<li>Have 3+ years (not including internships or co-ops) of experience leading large scale, complex projects or teams as an engineer or tech lead</li>\n<li>Have deep experience with build systems, CI/CD pipelines, and/or developer tooling in a large monorepo environment</li>\n<li>Have strong proficiency in Python, Rust and/or Go</li>\n<li>Are obsessed with developer productivity and reducing friction in the software development lifecycle</li>\n<li>Have experience with container orchestration and infrastructure at scale</li>\n<li>Have excellent communication skills and enjoy supporting internal partners to improve their development experience</li>\n<li>Are excited about designing foundational systems and are comfortable working independently on ambiguous, high-impact technical challenges</li>\n</ul>\n<p><strong>Strong candidates may have:</strong></p>\n<ul>\n<li>Experience with CI orchestration tools (Buildkite, 
Jenkins, GitHub Actions, or similar) and merge queue management at scale</li>\n<li>Experience building or operating remote build execution systems (Bazel Remote Execution API, BuildBarn, BuildBuddy, or similar)</li>\n<li>Experience with Nix/NixOS/Docker and managing large image / package sets at scale</li>\n<li>Experience building CLI tools, developer-facing services, and GitHub API and automation workflows</li>\n</ul>\n<p>_Deadline to apply: None. Applications will be reviewed on a rolling basis._</p>\n<p>The annual compensation range for this role is listed below.</p>\n<p>For sales roles, the range provided is the role’s On Target Earnings (&quot;OTE&quot;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.</p>\n<p>Annual Salary:</p>\n<p>$405,000 - $485,000USD</p>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience.</p>\n<p><strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong> Not all strong candidates will meet every single qualification as listed. 
Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>\n<p><strong>Your safety matters to us.</strong> To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact work in AI safety and development happens at the intersection of technical expertise and societal responsibility. We&#39;re committed to building a team that reflects a wide range of backgrounds, perspectives, and experiences. We believe that diversity in all its forms drives better decision-making, more innovative solutions, and greater impact.</p>\n<p>We&#39;re an equal opportunities employer and welcome applications from all qualified candidates.</p>\n<p>If you&#39;re excited about this role and want to learn more, please don&#39;t hesitate to reach out to us. 
We look forward to hearing from you!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_11a9548c-a4f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5110511008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$405,000 - $485,000USD","x-skills-required":["Python","Rust","Go","Build systems","CI/CD pipelines","Developer tooling","Container orchestration","Infrastructure at scale"],"x-skills-preferred":["CI orchestration tools","Merge queue management","Remote build execution systems","Nix/NixOS/Docker","Large image/package sets","CLI tools","Developer-facing services","GitHub API and automation workflows"],"datePosted":"2026-03-08T13:53:03.879Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Rust, Go, Build systems, CI/CD pipelines, Developer tooling, Container orchestration, Infrastructure at scale, CI orchestration tools, Merge queue management, Remote build execution systems, Nix/NixOS/Docker, Large image/package sets, CLI tools, Developer-facing services, GitHub API and automation workflows","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_25934fbc-c50"},"title":"Staff / Senior Software Engineer, Cloud Inference","description":"<p><strong>About the Role</strong></p>\n<p>The Cloud 
Inference team scales and optimizes Claude to serve the massive audiences of developers and enterprise companies across AWS, GCP, Azure, and future cloud service providers (CSPs). We own the end-to-end product of Claude on each cloud platform—from API integration and intelligent request routing to inference execution, capacity management, and day-to-day operations.</p>\n<p>Our engineers are extremely high leverage: we simultaneously drive multiple major revenue streams while optimizing one of Anthropic&#39;s most precious resources—compute. As we expand to more cloud platforms, the complexity of managing inference efficiently across providers with different hardware, networking stacks, and operational models grows significantly. We need engineers who can navigate these platform differences, build robust abstractions that work across providers, and make smart infrastructure decisions that keep us cost-effective at massive scale.</p>\n<p>Your work will increase the scale at which our services operate, accelerate our ability to reliably launch new frontier models and innovative features to customers across all platforms, and ensure our LLMs meet rigorous safety, performance, and security standards.</p>\n<p><strong>What You&#39;ll Do</strong></p>\n<ul>\n<li>Design and build infrastructure that serves Claude across multiple CSPs, accounting for differences in compute hardware, networking, APIs, and operational models</li>\n<li>Collaborate with CSP partner engineering teams to resolve operational issues, influence provider roadmaps, and stand up end-to-end serving on new cloud platforms</li>\n<li>Design and evolve CI/CD automation systems, including validation and deployment pipelines, that reliably ship new model versions to millions of users across cloud platforms without regressions</li>\n<li>Design interfaces and tooling abstractions across CSPs that enable cost-effective inference management, scale across providers, and reduce per-platform 
complexity</li>\n<li>Contribute to capacity planning and autoscaling strategies that dynamically match supply with demand across CSP validation and production workloads</li>\n<li>Optimize inference cost and performance across providers—designing workload placement and routing systems that direct requests to the most cost-effective accelerator and region</li>\n<li>Contribute to inference features that must work consistently across all platforms</li>\n<li>Analyze observability data across providers to identify performance bottlenecks, cost anomalies, and regressions, and drive remediation based on real-world production workloads</li>\n</ul>\n<p><strong>You May Be a Good Fit If You:</strong></p>\n<ul>\n<li>Have significant software engineering experience, with a strong background in high-performance, large-scale distributed systems serving millions of users</li>\n<li>Have experience building or operating services on at least one major cloud platform (AWS, GCP, or Azure), with exposure to Kubernetes, Infrastructure as Code or container orchestration</li>\n<li>Have strong interest in inference</li>\n<li>Thrive in cross-functional collaboration with both internal teams and external partners</li>\n<li>Are a fast learner who can quickly ramp up on new technologies, hardware platforms, and provider ecosystems</li>\n<li>Are highly autonomous and self-driven, taking ownership of problems end-to-end with a bias toward flexibility and high-impact work</li>\n<li>Pick up slack, even when it goes outside your job description</li>\n</ul>\n<p><strong>Strong Candidates May Also Have Experience With</strong></p>\n<ul>\n<li>Direct experience working with CSP partner teams to scale infrastructure or products across multiple platforms, navigating differences in networking, security, privacy, billing, and managed service offerings</li>\n<li>A background in building platform-agnostic tooling or abstraction layers that work across cloud providers</li>\n<li>Hands-on experience with capacity 
management, cost optimization, or resource planning at scale across heterogeneous environments</li>\n<li>Strong familiarity with LLM inference optimization, batching, caching, and serving strategies</li>\n<li>Experience with machine learning infrastructure, including GPUs, TPUs, Trainium, or other AI accelerators</li>\n<li>Background designing and building CI/CD systems that automate deployment and validation across cloud environments</li>\n<li>Solid understanding of multi-region deployments, geographic routing, and global traffic management</li>\n<li>Proficiency in Python or Rust</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience.</p>\n<p><strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. 
But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_25934fbc-c50","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5107466008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$300,000 - $485,000 USD","x-skills-required":["Software engineering","Cloud infrastructure","Kubernetes","Infrastructure as Code","Container orchestration","LLM inference optimization","Batching","Caching","Serving strategies","Machine learning infrastructure","GPUs","TPUs","Trainium","AI accelerators","CI/CD systems","Deployment and validation","Cloud environments","Multi-region deployments","Geographic routing","Global traffic management"],"x-skills-preferred":["Python","Rust","Cloud platforms","Networking","Security","Privacy","Billing","Managed service offerings","Platform-agnostic tooling","Abstraction layers","Capacity management","Cost optimization","Resource planning"],"datePosted":"2026-03-08T13:49:59.956Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Software engineering, Cloud infrastructure, Kubernetes, Infrastructure as Code, Container orchestration, LLM inference optimization, Batching, Caching, Serving strategies, Machine learning infrastructure, GPUs, TPUs, Trainium, AI accelerators, CI/CD systems, Deployment and validation, Cloud environments, Multi-region deployments, Geographic routing, Global traffic management, 
Python, Rust, Cloud platforms, Networking, Security, Privacy, Billing, Managed service offerings, Platform-agnostic tooling, Abstraction layers, Capacity management, Cost optimization, Resource planning","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":300000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_da726093-b19"},"title":"Research Engineer, Discovery","description":"<p><strong>About the Role</strong></p>\n<p>As a Research Engineer on our team, you will work end to end across the whole model stack, identifying and addressing key infra blockers on the path to scientific AGI. Strong candidates should have familiarity with elements of language model training, evaluation, and inference and eagerness to quickly dive and get up to speed in areas they are not yet an expert on. This may include performance optimization, distributed systems, VM/sandboxing/container deployment, and large scale data pipelines.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Design and implement large-scale infrastructure systems to support AI scientist training, evaluation, and deployment across distributed environments</li>\n<li>Identify and resolve infrastructure bottlenecks impeding progress toward scientific capabilities</li>\n<li>Develop robust and reliable evaluation frameworks for measuring progress towards scientific AGI.</li>\n<li>Build scalable and performant VM/sandboxing/container architectures to safely execute long-horizon AI tasks and scientific workflows</li>\n<li>Collaborate to translate experimental requirements into production-ready infrastructure</li>\n<li>Develop large scale data pipelines to handle advanced language model training requirements</li>\n<li>Optimize large scale training and inference pipelines for stable and efficient reinforcement learning</li>\n</ul>\n<p><strong>You may be a 
good fit if you:</strong></p>\n<ul>\n<li>Have 6+ years of highly-relevant experience in infrastructure engineering with demonstrated expertise in large-scale distributed systems</li>\n<li>Are a strong communicator and enjoy working collaboratively</li>\n<li>Possess deep knowledge of performance optimization techniques and system architectures for high-throughput ML workloads</li>\n<li>Have experience with containerization technologies (Docker, Kubernetes) and orchestration at scale</li>\n<li>Have proven track record of building large-scale data pipelines and distributed storage systems</li>\n<li>Excel at diagnosing and resolving complex infrastructure challenges in production environments</li>\n<li>Can work effectively across the full ML stack from data pipelines to performance optimization</li>\n<li>Have experience collaborating with other researchers to scale experimental ideas</li>\n<li>Thrive in fast-paced environments and can rapidly iterate from experimentation to production</li>\n</ul>\n<p><strong>Strong candidates may also have:</strong></p>\n<ul>\n<li>Experience with language model training infrastructure and distributed ML frameworks (PyTorch, JAX, etc.)</li>\n<li>Background in building infrastructure for AI research labs or large-scale ML organizations</li>\n<li>Knowledge of GPU/TPU architectures and language model inference optimization</li>\n<li>Experience with cloud platforms (AWS, GCP) at enterprise scale</li>\n<li>Familiarity with VM and container orchestration.</li>\n<li>Experience with workflow orchestration tools and experiment management systems</li>\n<li>History working with large scale reinforcement learning</li>\n<li>Comfort with large scale data pipelines (Beam, Spark, Dask, …)</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<ul>\n<li>Education requirements: We require at least a Bachelor&#39;s degree in a related field or equivalent experience.</li>\n<li>Location-based hybrid policy: Currently, we expect all staff to be in one of our 
offices at least 25% of the time. However, some roles may require more time in our offices.</li>\n<li>Visa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</strong></p>\n<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</strong></p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. 
At Anthropic we work as a single cohesive team on just a few large-scale projects, and we&#39;re committed to making a positive impact on the world.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_da726093-b19","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4669581008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000 - $850,000 USD","x-skills-required":["infrastructure engineering","large-scale distributed systems","performance optimization","containerization technologies","orchestration at scale","data pipelines","distributed storage systems","complex infrastructure challenges","ML stack","workflow orchestration tools","experiment management systems","reinforcement learning","large scale data pipelines"],"x-skills-preferred":["language model training infrastructure","distributed ML frameworks","GPU/TPU architectures","language model inference optimization","cloud platforms","VM and container orchestration","workflow orchestration tools","experiment management systems","large scale reinforcement learning","large scale data pipelines"],"datePosted":"2026-03-08T13:46:32.661Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"infrastructure engineering, large-scale distributed systems, performance optimization, containerization technologies, orchestration at scale, data pipelines, distributed storage systems, complex infrastructure challenges, ML stack, workflow orchestration tools, experiment management systems, reinforcement learning, 
large scale data pipelines, language model training infrastructure, distributed ML frameworks, GPU/TPU architectures, language model inference optimization, cloud platforms, VM and container orchestration, workflow orchestration tools, experiment management systems, large scale reinforcement learning, large scale data pipelines","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b7de618e-5e1"},"title":"Site Reliability Engineer","description":"<p>Join our Site Reliability Engineering team and help ensure the reliability, scalability, and performance of Replit&#39;s infrastructure that serves millions of developers worldwide. As a Site Reliability Engineer, you will bridge the gap between development and operations, implementing automation and establishing best practices that enable our platform to scale efficiently while maintaining high availability.</p>\n<p>We are seeking SREs who are passionate about building and maintaining resilient systems at scale. Your mission will be to design and implement robust monitoring solutions, automate operational tasks, and continuously improve our infrastructure&#39;s reliability and performance.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and Implement Observability Solutions: Develop comprehensive monitoring and alerting systems using modern observability tools. Create dashboards and metrics that provide real-time visibility into system health and performance. Implement logging strategies that enable quick problem identification and resolution.</li>\n</ul>\n<ul>\n<li>Drive Automation and Infrastructure as Code: Architect and implement infrastructure automation solutions using tools like Terraform, Ansible, or Pulumi. 
Design and maintain CI/CD pipelines that enable reliable and consistent deployments. Create self-healing systems that can automatically respond to common failure scenarios.</li>\n</ul>\n<ul>\n<li>Establish SLOs and SLIs: Work with product and engineering teams to define and implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Build systems to track and report on these metrics, ensuring we maintain high reliability standards while balancing innovation speed.</li>\n</ul>\n<ul>\n<li>Incident Management and Response: Lead incident response efforts, conducting thorough post-mortems, and implementing improvements to prevent future occurrences. Develop and maintain runbooks for critical services. Build tools and processes that reduce Mean Time To Recovery (MTTR).</li>\n</ul>\n<ul>\n<li>Performance Optimization: Identify and resolve performance bottlenecks across our infrastructure. Implement capacity planning strategies and optimize resource utilization. Work on reducing latency and improving system efficiency across global regions.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>4-8 years of experience in Site Reliability Engineering or similar roles (DevOps, Systems Engineering, Infrastructure Engineering)</li>\n</ul>\n<ul>\n<li>Strong programming skills in languages commonly used for automation (Python, Go, or similar)</li>\n</ul>\n<ul>\n<li>Deep understanding of distributed systems</li>\n</ul>\n<ul>\n<li>Experience with container orchestration platforms (Kubernetes) and cloud-native technologies</li>\n</ul>\n<ul>\n<li>Proven track record of implementing and maintaining monitoring/observability solutions</li>\n</ul>\n<ul>\n<li>Strong incident management skills with experience leading incident response</li>\n</ul>\n<ul>\n<li>Experience with infrastructure as code and configuration management tools</li>\n</ul>\n<p><strong>Bonus Points</strong></p>\n<ul>\n<li>Experience with Google Cloud Platform (GCP) services and 
tools</li>\n</ul>\n<ul>\n<li>Knowledge of modern observability platforms (Prometheus, Grafana, Datadog, etc.)</li>\n</ul>\n<p><strong>What We Value</strong></p>\n<ul>\n<li>Problem-solving mindset: Ability to approach complex operational challenges systematically and devise effective solutions</li>\n</ul>\n<ul>\n<li>Self-directed and autonomous: Capable of working independently while collaborating effectively with cross-functional teams</li>\n</ul>\n<ul>\n<li>Strong communication skills: Ability to explain complex technical concepts to both technical and non-technical audiences</li>\n</ul>\n<ul>\n<li>Continuous learning: Passion for staying current with industry best practices and new technologies</li>\n</ul>\n<ul>\n<li>Focus on automation: Strong belief in automating repetitive tasks and building self-healing systems</li>\n</ul>\n<p><strong>Full-Time Employee Benefits Include</strong></p>\n<ul>\n<li>Competitive Salary &amp; Equity</li>\n</ul>\n<ul>\n<li>401(k) Program with a 4% match</li>\n</ul>\n<ul>\n<li>Health, Dental, Vision and Life Insurance</li>\n</ul>\n<ul>\n<li>Short Term and Long Term Disability</li>\n</ul>\n<ul>\n<li>Paid Parental, Medical, Caregiver Leave</li>\n</ul>\n<ul>\n<li>Commuter Benefits</li>\n</ul>\n<ul>\n<li>Monthly Wellness Stipend</li>\n</ul>\n<ul>\n<li>Autonomous Work Environment</li>\n</ul>\n<ul>\n<li>In Office Set-Up Reimbursement</li>\n</ul>\n<ul>\n<li>Flexible Time Off (FTO) + Holidays</li>\n</ul>\n<ul>\n<li>Quarterly Team Gatherings</li>\n</ul>\n<ul>\n<li>In Office Amenities</li>\n</ul>\n<p><strong>Want to Learn More About What We Are Up To?</strong></p>\n<ul>\n<li>Meet the Replit Agent</li>\n</ul>\n<ul>\n<li>Replit: Make an app for that</li>\n</ul>\n<ul>\n<li>Replit Blog</li>\n</ul>\n<ul>\n<li>Amjad TED Talk</li>\n</ul>\n<p><strong>Interviewing + Culture at Replit</strong></p>\n<ul>\n<li>Operating Principles</li>\n</ul>\n<ul>\n<li>Reasons not to work at Replit</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML 
job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b7de618e-5e1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Replit","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/replit.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/replit/f6e6158e-eb89-4008-81ea-1b7512bc509d","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$160K - $250K","x-skills-required":["Site Reliability Engineering","DevOps","Systems Engineering","Infrastructure Engineering","Python","Go","Distributed systems","Container orchestration platforms","Cloud-native technologies","Monitoring/observability solutions","Incident management","Infrastructure as code","Configuration management tools"],"x-skills-preferred":["Google Cloud Platform","Prometheus","Grafana","Datadog"],"datePosted":"2026-03-07T15:20:24.140Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Site Reliability Engineering, DevOps, Systems Engineering, Infrastructure Engineering, Python, Go, Distributed systems, Container orchestration platforms, Cloud-native technologies, Monitoring/observability solutions, Incident management, Infrastructure as code, Configuration management tools, Google Cloud Platform, Prometheus, Grafana, Datadog","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":160000,"maxValue":250000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_323bc85d-b69"},"title":"Staff Infrastructure Engineer","description":"<p><strong>About the Role:</strong></p>\n<p>Join our Infrastructure Engineering team and help ensure the reliability, scalability, and 
performance of Replit&#39;s infrastructure that serves millions of developers worldwide. As a Staff Infrastructure Engineer, you will bridge the gap between development and operations, implementing automation and establishing best practices that enable our platform to scale efficiently while maintaining high availability.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Drive Automation and Infrastructure as Code: Architect, build, and improve automation to eliminate toil and operational work. Design and maintain CI/CD pipelines and infrastructure automation using tools like Terraform or Pulumi. Create self-healing systems that can automatically respond to common failure scenarios.</li>\n</ul>\n<ul>\n<li>Optimise Performance and Infrastructure: Collaborate with core infrastructure and product teams to performance tune and optimise our cloud deployments (Kubernetes, Docker, GCP). Identify and resolve performance bottlenecks, implement capacity planning strategies, and reduce latency across global regions.</li>\n</ul>\n<ul>\n<li>Elevate Developer Experience: Design and implement improvements to our build, test, and deployment systems to make software delivery faster, safer, and more reliable for all engineers.</li>\n</ul>\n<ul>\n<li>Drive Cross-Company Improvements: Partner directly with service owners across Replit to understand their pain points, and collaborate on implementing build/test/deploy enhancements within their specific services.</li>\n</ul>\n<ul>\n<li>Build Shared Tooling: Create and maintain centralized tooling and automation that improves the entire engineering lifecycle, from local development to production monitoring.</li>\n</ul>\n<ul>\n<li>Debug and Harden Systems: Dive deep into debugging extremely difficult technical problems, making our systems and products more robust, operable, and easier to diagnose.</li>\n</ul>\n<ul>\n<li>Provide Staff-Level Guidance: Review feature and system designs, acting as an owner for the security, scale, and 
operational integrity of those designs.</li>\n</ul>\n<ul>\n<li>Educate and Mentor: Educate, mentor, and hold accountable the engineering team to improve the reliability of our systems, making reliability a core value of the Replit engineering culture.</li>\n</ul>\n<ul>\n<li>Build and Integrate: Write high-quality, well-tested code to meet the needs of your customers, including building pipelines to integrate with 3rd party vendors.</li>\n</ul>\n<p><strong>Required Skills and Experience:</strong></p>\n<ul>\n<li>8-10 years of experience in Infrastructure Engineering or similar roles (DevOps, Systems Engineering, Site Reliability Engineering).</li>\n</ul>\n<ul>\n<li>Strong programming skills in languages like Python or Go.</li>\n</ul>\n<ul>\n<li>You write high-quality, well-tested code.</li>\n</ul>\n<ul>\n<li>Deep understanding of distributed systems. You&#39;ve designed, built, scaled, and maintained production services and know how to compose a service-oriented architecture.</li>\n</ul>\n<ul>\n<li>Experience with container orchestration platforms (Kubernetes) and cloud-native technologies.</li>\n</ul>\n<ul>\n<li>Proven track record of implementing and maintaining monitoring/observability solutions, with strong skills in debugging and performance tuning.</li>\n</ul>\n<ul>\n<li>Strong incident management skills with experience leading incident response and demonstrated critical thinking under pressure.</li>\n</ul>\n<ul>\n<li>Experience with infrastructure as code (e.g., Terraform) and configuration management tools.</li>\n</ul>\n<ul>\n<li>Excellent written and verbal communication skills, with an ability to explain technical concepts clearly and simply and a bias toward open, transparent cultural practices.</li>\n</ul>\n<ul>\n<li>Strong interpersonal skills, with experience working with engineers from junior to principal levels.</li>\n</ul>\n<ul>\n<li>A willingness to dive into understanding, debugging, and improving any layer of the 
stack.</li>\n</ul>\n<ul>\n<li>You&#39;re passionate about making software creation accessible and empowering the next generation of builders.</li>\n</ul>\n<p><strong>Bonus Points:</strong></p>\n<ul>\n<li>Deep experience with Google Cloud Platform (GCP) services and tools.</li>\n</ul>\n<ul>\n<li>Knowledge of modern observability platforms (Prometheus, Grafana, Datadog, etc.).</li>\n</ul>\n<ul>\n<li>Experience designing and building reliable systems capable of handling high throughput and low latency.</li>\n</ul>\n<ul>\n<li>Experience with Go and Terraform.</li>\n</ul>\n<ul>\n<li>Familiarity with working in rapid-growth environments.</li>\n</ul>\n<ul>\n<li>Experience writing company-facing blog posts and training materials.</li>\n</ul>\n<p><strong>Full-Time Employee Benefits Include:</strong></p>\n<ul>\n<li>Competitive Salary &amp; Equity</li>\n</ul>\n<ul>\n<li>401(k) Program with a 4% match</li>\n</ul>\n<ul>\n<li>Health, Dental, Vision and Life Insurance</li>\n</ul>\n<ul>\n<li>Short Term and Long Term Disability</li>\n</ul>\n<ul>\n<li>Paid Parental, Medical, Caregiver Leave</li>\n</ul>\n<ul>\n<li>Commuter Benefits</li>\n</ul>\n<ul>\n<li>Monthly Wellness Stipend</li>\n</ul>\n<ul>\n<li>Autonomous Work Environment</li>\n</ul>\n<ul>\n<li>In Office Set-Up Reimbursement</li>\n</ul>\n<ul>\n<li>Flexible Time Off (FTO) + Holidays</li>\n</ul>\n<ul>\n<li>Quarterly Team Gatherings</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_323bc85d-b69","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Replit","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/replit.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/replit/6481ec1e-527c-4c1f-a041-2fb5021e7bd5","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$220K – $325K","x-skills-required":["Infrastructure 
Engineering","DevOps","Systems Engineering","Site Reliability Engineering","Python","Go","Distributed systems","Container orchestration platforms","Cloud-native technologies","Monitoring/observability solutions","Infrastructure as code","Configuration management tools"],"x-skills-preferred":["Google Cloud Platform","Prometheus","Grafana","Datadog","Go","Terraform","Rapid-growth environments","Company-facing blog posts","Training materials"],"datePosted":"2026-03-07T15:18:43.191Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Foster City, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Infrastructure Engineering, DevOps, Systems Engineering, Site Reliability Engineering, Python, Go, Distributed systems, Container orchestration platforms, Cloud-native technologies, Monitoring/observability solutions, Infrastructure as code, Configuration management tools, Google Cloud Platform, Prometheus, Grafana, Datadog, Go, Terraform, Rapid-growth environments, Company-facing blog posts, Training materials","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":220000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2fd7fc02-3ed"},"title":"Security Engineer, Agent Security","description":"<p><strong>Security Engineer, Agent Security</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Security</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$293K – $385K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. 
If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>The team’s mission is to accelerate the secure evolution of agentic AI systems at OpenAI. 
To achieve this, the team designs, implements, and continuously refines security policies, frameworks, and controls that defend OpenAI’s most critical assets—including the user and customer data embedded within them—against the unique risks introduced by agentic AI.</p>\n<p><strong>About the Role</strong></p>\n<p><strong>As a Security Engineer on the Agent Security Team</strong>, you will be at the forefront of securing OpenAI’s cutting-edge agentic AI systems. Your role will involve designing and implementing robust security frameworks, policies, and controls to safeguard OpenAI’s critical assets and ensure the safe deployment of agentic systems. You will develop comprehensive threat models, partner tightly with our Agent Infrastructure group to fortify the platforms that power OpenAI’s most advanced agentic systems, and lead efforts to enhance safety monitoring pipelines at scale.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Architecting security controls for agentic AI – design, implement, and iterate on identity, network, and runtime-level defenses (e.g., sandboxing, policy enforcement) that integrate directly with the Agent Infrastructure stack.</li>\n</ul>\n<ul>\n<li>Building production-grade security tooling – ship code that hardens safety monitoring pipelines across agent executions at scale.</li>\n</ul>\n<ul>\n<li>Collaborating cross-functionally – work daily with Agent Infrastructure, product, research, safety, and security teams to balance security, performance, and usability.</li>\n</ul>\n<ul>\n<li>Influencing strategy &amp; standards – shape the long-term Agent Security roadmap, publish best practices internally and externally, and help define industry standards for securing autonomous AI.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Strong software-engineering skills in Python or at least one systems language (Go, Rust, C/C++), plus a track record of shipping and operating secure, high-reliability 
services.</li>\n</ul>\n<ul>\n<li>Deep expertise in modern isolation techniques – experience with container security, kernel-level hardening, and other isolation methods.</li>\n</ul>\n<ul>\n<li>Hands-on network security experience – implementing identity-based controls, policy enforcement, and secure large-scale telemetry pipelines.</li>\n</ul>\n<ul>\n<li>Clear, concise communication that bridges engineering, research, and leadership audiences; comfort influencing roadmaps and driving consensus.</li>\n</ul>\n<ul>\n<li>Bias for action &amp; ownership – you thrive in ambiguity, move quickly without sacrificing rigor, and elevate the security bar company-wide from day one.</li>\n</ul>\n<ul>\n<li>Cloud security depth on at least one major provider (Azure, AWS, GCP), including identity federation, workload IAM, and infrastructure-as-code best practices.</li>\n</ul>\n<ul>\n<li>Familiarity with AI/ML security challenges – experience addressing risks associated with advanced AI systems (nice-to-have but valuable)</li>\n</ul>\n<p><strong>Preferred Qualifications</strong></p>\n<ul>\n<li>Experience with container orchestration (e.g., Kubernetes) and service mesh technologies (e.g., Istio, Linkerd).</li>\n</ul>\n<ul>\n<li>Knowledge of cloud security frameworks and compliance standards (e.g., HIPAA, PCI-DSS).</li>\n</ul>\n<ul>\n<li>Familiarity with machine learning and AI frameworks (e.g., TensorFlow, PyTorch).</li>\n</ul>\n<ul>\n<li>Experience with DevOps tools and practices (e.g., CI/CD pipelines, containerization).</li>\n</ul>\n<p><strong>What We Offer</strong></p>\n<ul>\n<li>Competitive salary and benefits package</li>\n</ul>\n<ul>\n<li>Opportunity to work with a talented team of engineers and researchers</li>\n</ul>\n<ul>\n<li>Collaborative and dynamic work environment</li>\n</ul>\n<ul>\n<li>Professional growth and development opportunities</li>\n</ul>\n<ul>\n<li>Flexible work arrangements</li>\n</ul>\n<ul>\n<li>Access to cutting-edge technology and 
tools</li>\n</ul>\n<p><strong>How to Apply</strong></p>\n<p>If you are a motivated and experienced security engineer looking to join a dynamic team, please submit your application, including your resume and a cover letter, to [insert contact information]. We look forward to hearing from you!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2fd7fc02-3ed","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/e9bea775-7eb6-438a-ab96-27d5f941e69d","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$293K – $385K • Offers Equity","x-skills-required":["Python","Go","Rust","C/C++","container security","kernel-level hardening","isolation methods","identity-based controls","policy enforcement","telemetry pipelines","cloud security","identity federation","workload IAM","infrastructure-as-code"],"x-skills-preferred":["container orchestration","service mesh technologies","cloud security frameworks","compliance standards","machine learning","AI frameworks","DevOps tools","CI/CD pipelines","containerization"],"datePosted":"2026-03-06T18:44:49.390Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Go, Rust, C/C++, container security, kernel-level hardening, isolation methods, identity-based controls, policy enforcement, telemetry pipelines, cloud security, identity federation, workload IAM, infrastructure-as-code, container orchestration, service mesh technologies, cloud security frameworks, compliance standards, machine learning, AI frameworks, DevOps tools, CI/CD pipelines, 
containerization","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":293000,"maxValue":385000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_520ca95e-75f"},"title":"Software Engineer, Agent Infrastructure","description":"<p><strong>Software Engineer, Agent Infrastructure</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco; New York City</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$230K – $385K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as 
required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>The Agent Infrastructure team at OpenAI is responsible for building systems that enable training and deployment of highly useful AI agents, both internally and for the world.</p>\n<p>We work hand-in-hand with researchers to design and scale the environment in which agentic models are trained – providing a workspace for AI models to execute code, debug issues, and develop software just as human SWEs do. Our training environment for agentic models operates at an extremely high scale and has the flexibility to emulate any environment in which an agent might work.</p>\n<p>At the same time, our team builds and maintains OpenAI’s core platform for the deployment and execution of agents in production. 
Our systems power products such as Codex, Operator, tool use in ChatGPT, and future agentic products.</p>\n<p><strong>About the Role</strong></p>\n<p>As a Software Engineer on the Agent Infrastructure team, you will have the opportunity to work closely with both research and product at OpenAI - building and scaling systems to train highly capable agentic models, and building the platform and integrations to launch new agents to hundreds of millions of users worldwide.</p>\n<p>Your work will consist of both building new capabilities - standing up the infrastructure and integrations needed to train more complex agentic models - and rapidly scaling these new capabilities to some of the largest compute clusters in the world. At the same time, you’ll be instrumental to the launch of agentic products at OpenAI - building, maintaining, and scaling the production platform on which all agents run.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Push massive compute clusters to their limits. You will be a core contributor to a novel container orchestration platform built in-house by our team to scale far beyond what’s possible with systems like Kubernetes.</li>\n</ul>\n<ul>\n<li>Develop and maintain FastAPI and gRPC APIs that serve as the interface for our agentic infrastructure used both in training and production.</li>\n</ul>\n<ul>\n<li>Use Terraform to stand up and evolve complex infrastructure for both research and production.</li>\n</ul>\n<ul>\n<li>Collaborate with research teams to stand up and optimize systems for novel AI training runs and experimental applications.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Have deep experience working on large-scale machine learning infrastructure. 
You know how to reason about training at scale, identifying bottlenecks and engineering solutions to optimize system performance in training environments.</li>\n</ul>\n<ul>\n<li>Know how to build new things from 0-1 quickly, and then scale them 1,000,000x.</li>\n</ul>\n<ul>\n<li>Have a keen eye for performance and optimization. You know how to squeeze the most performance out of complex, globally-distributed systems.</li>\n</ul>\n<ul>\n<li>Know your way around cloud platforms and work with infrastructure-as-code tech like Terraform.</li>\n</ul>\n<ul>\n<li>Are driven by solving complex, ambiguous problems at the intersection of infrastructure scalability, virtualization efficiency, and agentic capabilities.</li>\n</ul>\n<ul>\n<li>Have deep technical expertise in virtualization and containerization technologies (e.g. Kata, Firecracker, gVisor, Sysbox) and are passionate about optimizing runtime performance.</li>\n</ul>\n<p><strong>What We Offer</strong></p>\n<ul>\n<li>Competitive salary and equity package</li>\n</ul>\n<ul>\n<li>Opportunity to work on cutting-edge AI infrastructure</li>\n</ul>\n<ul>\n<li>Collaborative and dynamic team environment</li>\n</ul>\n<ul>\n<li>Flexible work arrangements</li>\n</ul>\n<ul>\n<li>Professional development opportunities</li>\n</ul>\n<ul>\n<li>Access to the latest technology and tools</li>\n</ul>\n<p><strong>How to Apply</strong></p>\n<p>If you are a motivated and experienced software engineer looking to join a dynamic team and work on cutting-edge AI infrastructure, please submit your application. 
We look forward to hearing from you!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_520ca95e-75f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/c1316397-25bb-4add-9e9d-0e3ea8ba929a","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$230K – $385K","x-skills-required":["large-scale machine learning infrastructure","container orchestration","FastAPI","gRPC","Terraform","cloud platforms","infrastructure-as-code","virtualization","containerization","Kata","Firecracker","gVisor","Sysbox"],"x-skills-preferred":["AI infrastructure","agentic models","training environments","compute clusters","performance optimization","runtime performance"],"datePosted":"2026-03-06T18:41:05.385Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco; New York City"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"large-scale machine learning infrastructure, container orchestration, FastAPI, gRPC, Terraform, cloud platforms, infrastructure-as-code, virtualization, containerization, Kata, Firecracker, gVisor, Sysbox, AI infrastructure, agentic models, training environments, compute clusters, performance optimization, runtime performance","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":230000,"maxValue":385000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_8a823cc4-381"},"title":"Full-Stack SWE, Data Acquisition (Foundations)","description":"<p><strong>Job Posting</strong></p>\n<p><strong>Full-Stack SWE, Data 
Acquisition (Foundations)</strong></p>\n<p><strong>Overview:</strong></p>\n<p>The Data Acquisition team within the Foundations organization at OpenAI is responsible for all aspects of data collection to support our model training operations. Our team manages web crawling and GPTBot services and works closely with Data Processing, Architecture, and Scaling teams. We are looking for a skilled Full-Stack Engineer to join our Data Acquisition team to build and optimize the interfaces and tools that power our data infrastructure.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Develop and maintain full-stack applications that support data acquisition, including internal tools and dashboards.</li>\n<li>Collaborate closely with cross-functional teams, including Data Processing, Architecture, and Scaling, to ensure seamless data ingestion and workflow management.</li>\n<li>Design and implement APIs to facilitate data interactions between internal services and external data sources.</li>\n<li>Enhance user experience by developing intuitive web-based interfaces for managing and monitoring data pipelines.</li>\n<li>Optimize backend services for performance, scalability, and security in a distributed computing environment.</li>\n<li>Work with legal and compliance teams to ensure our data acquisition processes adhere to privacy regulations and best practices.</li>\n<li>Deploy and maintain infrastructure using Kubernetes and Infrastructure-as-Code (IaC) methodologies.</li>\n<li>Analyze system performance, conduct experiments, and improve data workflows to maximize efficiency.</li>\n</ul>\n<p><strong>Qualifications:</strong></p>\n<ul>\n<li>BS/MS/PhD in Computer Science or a related field.</li>\n<li>4+ years of industry experience in full-stack development.</li>\n<li>Proficiency in frontend frameworks (React, Vue, or similar) and backend technologies such as Python, Node.js, or Go.</li>\n<li>Strong expertise in RESTful APIs, GraphQL, and database design (SQL and 
NoSQL).</li>\n<li>Experience building data-intensive applications that handle large-scale datasets.</li>\n<li>Familiarity with cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes, Docker).</li>\n<li>Prior experience with web crawling and large-scale data processing is a plus.</li>\n<li>Strong problem-solving skills and ability to balance multiple tasks in a fast-moving environment.</li>\n<li>Excellent communication and collaboration skills.</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$293K – $385K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n<li>401(k) retirement plan with employer match</li>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n<li>Mental health and wellness support</li>\n<li>Employer-paid basic life and disability coverage</li>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n<li>Relocation support for eligible employees</li>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Experience Level</strong></p>\n<p>Mid</p>\n<p><strong>Workplace Type</strong></p>\n<p>Hybrid</p>\n<p><strong>Category</strong></p>\n<p>Engineering</p>\n<p><strong>Industry</strong></p>\n<p>Technology</p>\n<p><strong>Salary Range</strong></p>\n<p>$293K – $385K • Offers Equity</p>\n<p><strong>Required Skills</strong></p>\n<ul>\n<li>Full-stack development</li>\n<li>Frontend frameworks (React, Vue, or similar)</li>\n<li>Backend technologies (Python, 
Node.js, or Go)</li>\n<li>RESTful APIs</li>\n<li>GraphQL</li>\n<li>Database design (SQL and NoSQL)</li>\n<li>Cloud platforms (AWS, GCP, or Azure)</li>\n<li>Container orchestration (Kubernetes, Docker)</li>\n</ul>\n<p><strong>Preferred Skills</strong></p>\n<ul>\n<li>Web crawling</li>\n<li>Large-scale data processing</li>\n<li>Problem-solving skills</li>\n<li>Communication and collaboration skills</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_8a823cc4-381","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/a886ff48-b8a1-4e28-b468-296713a5ad78","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$293K – $385K • Offers Equity","x-skills-required":["Full-stack development","Frontend frameworks (React, Vue, or similar)","Backend technologies (Python, Node.js, or Go)","RESTful APIs","GraphQL","Database design (SQL and NoSQL)","Cloud platforms (AWS, GCP, or Azure)","Container orchestration (Kubernetes, Docker)"],"x-skills-preferred":["Web crawling","Large-scale data processing","Problem-solving skills","Communication and collaboration skills"],"datePosted":"2026-03-06T18:39:01.903Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Full-stack development, Frontend frameworks (React, Vue, or similar), Backend technologies (Python, Node.js, or Go), RESTful APIs, GraphQL, Database design (SQL and NoSQL), Cloud platforms (AWS, GCP, or Azure), Container orchestration (Kubernetes, Docker), Web crawling, Large-scale data processing, Problem-solving skills, Communication and collaboration 
skills","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":293000,"maxValue":385000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_fb4acb2b-bab"},"title":"Security Reliability Engineering, Lead","description":"<p><strong>Security Reliability Engineering, Lead</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Security</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$293K – $385K</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health 
and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the Team</strong></p>\n<p>The Infrastructure Engineering function sits within IT and is responsible for reliably building, deploying, and operating critical on prem and hybrid environments that power internal services and critical R&amp;D environments.</p>\n<p>This is a new, bootstrap team focused on applying strong Site Reliability Engineering discipline to environments where uptime, safety, recoverability, and security are non-negotiable. The team replaces bespoke, one off infrastructure with standardized infrastructure-as-code building blocks that compound reliability and operational leverage as OpenAI scales.</p>\n<p><strong>About the Role</strong></p>\n<p>We are looking for a Security Reliability Engineering Lead to design, build, and operate reliable, secure, and scalable infrastructure that underpins identity, access, endpoint, and shared platform services across the company.</p>\n<p>In this role, you will own infrastructure and identity systems end to end, from foundational design and provisioning through policy enforcement, upgrades, recovery, and day two operations. 
You will establish durable, production grade platforms that remove operational friction, enforce security by default, and enable teams to move faster with confidence.</p>\n<p>This role is well suited for a senior engineer who thrives in ambiguity, enjoys owning complex systems end to end, and raises the reliability and security bar by replacing fragile implementations with standardized, repeatable infrastructure.</p>\n<p>This role is based in our San Francisco HQ and requires in-office presence.</p>\n<p><strong>In this role, you will:</strong></p>\n<p><strong>Set direction and establish strong foundations</strong></p>\n<ul>\n<li>Define and evolve infrastructure patterns for on prem and hybrid environments, including self hosted platforms, vendor supported systems, and lab environments.</li>\n</ul>\n<ul>\n<li>Establish standardized, production grade deployment and operational models that replace bespoke implementations.</li>\n</ul>\n<ul>\n<li>Partner with IT, Security, Identity, and Network teams to ensure infrastructure meets reliability, security, and access requirements by design.</li>\n</ul>\n<ul>\n<li>Design and mature the production architecture for IAM adjacent platforms such as Microsoft Entra using SRE principles.</li>\n</ul>\n<ul>\n<li>Establish common management rules and shared resources within Azure subscriptions to ensure consistent, policy aligned operations.</li>\n</ul>\n<p><strong>Build, operate, and scale reliably</strong></p>\n<ul>\n<li>Own the full lifecycle of infrastructure systems, including deployment, upgrades, patching, recovery, and ongoing operations.</li>\n</ul>\n<ul>\n<li>Operate and harden shared infrastructure provisioned through Infra Terraform, ensuring repeatability, auditability, and safe change management.</li>\n</ul>\n<ul>\n<li>Design and implement infrastructure as code and configuration management to support shared services, identity adjacent systems, and endpoint platforms using tools like Chef, Ansible and 
Terraform.</li>\n</ul>\n<ul>\n<li>Build and operate monitoring, alerting, and incident response mechanisms to meet high availability and recoverability targets.</li>\n</ul>\n<ul>\n<li>Lead incident response and postmortems across infrastructure, identity adjacent platforms, and fleet systems, driving durable fixes and shared learning.</li>\n</ul>\n<ul>\n<li>Build and operate containerized and platform services, including Kubernetes and Docker-based workloads, using DevOps practices that emphasize reliability, repeatability, and safe change management.</li>\n</ul>\n<ul>\n<li>Use Git-based workflows as the source of truth for infrastructure and policy changes, enabling review, auditability, and safe, reversible automation.</li>\n</ul>\n<p><strong>Automate for leverage and safety</strong></p>\n<ul>\n<li>Identify high leverage automation opportunities that eliminate manual toil and reduce operational risk across infrastructure and access related systems.</li>\n</ul>\n<ul>\n<li>Implement guardrails, safety mechanisms, and progressive rollout patterns for infrastructure and policy enforcement changes.</li>\n</ul>\n<ul>\n<li>Ensure automation is safe, observable, and resilient under failure conditions, particularly for shared services and high blast radius systems.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_fb4acb2b-bab","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/645ccd65-eb60-4eb7-b094-b01c2269638c","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$293K – $385K","x-skills-required":["Security Reliability Engineering","Infrastructure as Code","Cloud 
Computing","Containerization","DevOps","Git","Terraform","Ansible","Chef","Kubernetes","Docker","Microsoft Entra","Azure","Identity and Access Management","Endpoint Security","Platform Services"],"x-skills-preferred":["Site Reliability Engineering","Cloud Security","Container Orchestration","Infrastructure Automation","Monitoring and Alerting","Incident Response","Postmortem Analysis","DevOps Practices","Cloud-Native Applications","Microservices Architecture"],"datePosted":"2026-03-06T18:29:47.579Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Security Reliability Engineering, Infrastructure as Code, Cloud Computing, Containerization, DevOps, Git, Terraform, Ansible, Chef, Kubernetes, Docker, Microsoft Entra, Azure, Identity and Access Management, Endpoint Security, Platform Services, Site Reliability Engineering, Cloud Security, Container Orchestration, Infrastructure Automation, Monitoring and Alerting, Incident Response, Postmortem Analysis, DevOps Practices, Cloud-Native Applications, Microservices Architecture","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":293000,"maxValue":385000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_33c8b32c-a06"},"title":"Site Reliability Engineer, Frontier Systems Infrastructure","description":"<p><strong>Site Reliability Engineer, Frontier Systems Infrastructure</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$255K – $490K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market 
location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company 
results, or market conditions.</p>\n<p><strong>About the Team</strong></p>\n<p>The Frontier Systems team at OpenAI builds, launches, and supports the largest supercomputers in the world that OpenAI uses for its most cutting-edge model training.</p>\n<p>We take data center designs, turn them into real, working systems and build any software needed for running large-scale frontier model training runs.</p>\n<p>Our mission is to bring up, stabilize and keep these hyperscale supercomputers reliable and efficient during the training of the frontier models.</p>\n<p><strong>About the Role</strong></p>\n<p>We are looking for engineers to operate the next generation of compute clusters that power OpenAI’s frontier research.</p>\n<p>This role blends distributed systems engineering with hands-on infrastructure work in our largest data centers. You will grow Kubernetes clusters to massive scale, automate bare-metal bring-up, and build the software layer that hides the complexity of a very large number of nodes across multiple data centers.</p>\n<p>You will work at the intersection of hardware and software, where speed and reliability are critical. 
Expect to manage fast-moving operations, quickly diagnose and fix issues when things are on fire, and continuously raise the bar for automation and uptime.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Spin up and scale large Kubernetes clusters, including automation for provisioning, bootstrapping, and cluster lifecycle management</li>\n</ul>\n<ul>\n<li>Build software abstractions that unify multiple clusters and present a seamless interface to training workloads</li>\n</ul>\n<ul>\n<li>Own node bring-up from bare metal through firmware upgrades, ensuring fast, repeatable deployment at massive scale</li>\n</ul>\n<ul>\n<li>Improve operational metrics such as reducing cluster restart times (e.g., from hours to minutes) and accelerating firmware or OS upgrade cycles</li>\n</ul>\n<ul>\n<li>Integrate networking and hardware health systems to deliver end-to-end reliability across servers, switches, and data center infrastructure</li>\n</ul>\n<ul>\n<li>Develop monitoring and observability systems to detect issues early and keep clusters stable under extreme load</li>\n</ul>\n<ul>\n<li>Be expected to execute at the same level as a software engineer</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Have deep experience operating or scaling Kubernetes clusters or similar container orchestration systems in high-growth or hyperscale environments</li>\n</ul>\n<ul>\n<li>Bring strong programming or scripting skills (Python, Go, or similar) and familiarity with Infrastructure-as-Code tools such as Terraform or CloudFormation</li>\n</ul>\n<ul>\n<li>Are comfortable with bare-metal Linux environments, GPU hardware, and large-scale networking</li>\n</ul>\n<ul>\n<li>Enjoy solving fast-moving, high-impact operational problems and building automation to eliminate manual work</li>\n</ul>\n<ul>\n<li>Can balance careful engineering with the urgency of keeping mission-critical systems 
running</li>\n</ul>\n<p><strong>Qualifications</strong></p>\n<ul>\n<li>Experience as an infrastructure, systems, or distributed systems engineer in large-scale or high-availability environments</li>\n</ul>\n<ul>\n<li>Strong knowledge of Kubernetes internals, cluster scaling patterns, and containerized workloads</li>\n</ul>\n<ul>\n<li>Proficiency in cloud infrastructure concepts (compute, networking, storage, security) and in automating cluster or data center operations</li>\n</ul>\n<p>_Bonus: background with GPU workloads, firmware management, or high-performance computing_</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_33c8b32c-a06","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/ad2cf782-15a4-48c7-9133-1788e3f33bbb","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$255K – $490K • Offers Equity","x-skills-required":["Kubernetes","Python","Go","Terraform","CloudFormation","Linux","GPU hardware","Large-scale networking"],"x-skills-preferred":["Infrastructure-as-Code","Container orchestration","Distributed systems engineering","Cloud infrastructure concepts"],"datePosted":"2026-03-06T18:29:28.224Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Python, Go, Terraform, CloudFormation, Linux, GPU hardware, Large-scale networking, Infrastructure-as-Code, Container orchestration, Distributed systems engineering, Cloud infrastructure 
concepts","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":255000,"maxValue":490000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_8481d62a-9bf"},"title":"Software Engineer, Reliability","description":"<p><strong>Job Posting</strong></p>\n<p><strong>Software Engineer, Reliability</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Applied AI</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$230K – $490K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local 
law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p><strong>Job Description</strong></p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p>Join the engineering teams that bring OpenAI’s ideas safely to the world!</p>\n<p>The Applied Engineering team works across research, engineering, product, and design to bring OpenAI’s technology to consumers and businesses. We seek to learn from deployment and distribute the benefits of AI, while ensuring that this powerful tool is used responsibly and safely. Safety is more important to us than unfettered growth.</p>\n<p><strong>About the Role</strong></p>\n<p>As OpenAI continues to grow, we are looking for experienced, problem-solving engineers to ensure our systems scale. Our success depends on our ability to quickly iterate on products while also ensuring that they are performant and reliable. You will work in a deeply iterative, collaborative, fast-paced environment to bring our technology to millions of users around the world, and ensure it’s delivered with safety and reliability in mind. Successful candidates will play a crucial role in ensuring the reliability, scalability, and performance of our systems as we continue to expand. 
As a reliability expert, you will be at the forefront of maintaining and enhancing the stability, scalability, and performance of our rapidly evolving infrastructure. You will work closely with cross-functional teams, including software engineers, product managers, and data scientists, to build and maintain resilient systems that can handle our growing user base and workload.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Design and implement solutions to ensure the scalability of our infrastructure to meet rapidly increasing demands.</li>\n</ul>\n<ul>\n<li>Build and maintain the load, chaos and synthetic testing software leveraged by development teams to make the systems they design and operate more reliable.</li>\n</ul>\n<ul>\n<li>Build and maintain automation tools to streamline repetitive tasks and improve system reliability.</li>\n</ul>\n<ul>\n<li>Build and maintain the platform for CPU/storage, GPU, and network lifecycle management to drive efficiency, accountability and support dynamic optimization of our resources.</li>\n</ul>\n<ul>\n<li>Implement fault-tolerant and resilient design patterns to minimize service disruptions.</li>\n</ul>\n<ul>\n<li>Develop and maintain service level objectives (SLOs) and service level indicators (SLIs) to measure and ensure system reliability.</li>\n</ul>\n<ul>\n<li>Partner with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world.</li>\n</ul>\n<ul>\n<li>Participate in an on-call rotation to respond to critical incidents and ensure 24/7 system availability.</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Have a track record of accelerating engineering reliability by empowering your fellow engineers with excellent tooling and systems.</li>\n</ul>\n<ul>\n<li>Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.</li>\n</ul>\n<ul>\n<li>Own 
problems end-to-end, and are willing to pick up whatever knowledge you&#39;re missing to get the job done.</li>\n</ul>\n<ul>\n<li>Enjoy seeking out and addressing bottlenecks and areas for performance improvement in our systems.</li>\n</ul>\n<ul>\n<li>Utilize Infrastructure as Code (IaC) principles to automate infrastructure provisioning and configuration management.</li>\n</ul>\n<ul>\n<li>Are experienced in collaborating with cross-functional teams to ensure that reliability and scalability are considered in the design and development of new features and services.</li>\n</ul>\n<p><strong>Qualifications:</strong></p>\n<ul>\n<li>Bachelor&#39;s degree in Computer Science, Information Technology, or a related field (or equivalent work experience).</li>\n</ul>\n<ul>\n<li>Proven experience as an SWE focused on reliability or a similar role in a fast-paced, rapidly scaling company.</li>\n</ul>\n<ul>\n<li>Strong proficiency in cloud infrastructure.</li>\n</ul>\n<ul>\n<li>Proficiency in programming languages.</li>\n</ul>\n<ul>\n<li>Experience with containerization technologies and container orchestration platforms like Kubernetes.</li>\n</ul>\n<ul>\n<li>Knowledge of IaC tools such as Terraform or CloudFormation.</li>\n</ul>\n<ul>\n<li>Excellent problem-solving and troubleshooting skills.</li>\n</ul>\n<ul>\n<li>Strong communication and collaboration skills.</li>\n</ul>\n<ul>\n<li>Experience with</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_8481d62a-9bf","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/1faee5e7-3b2f-4d8c-9a6f-ff0f2a4a42a7","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$230K – $490K","x-skills-required":["cloud 
infrastructure","programming languages","containerization technologies","container orchestration platforms","IaC tools","problem-solving and troubleshooting skills","communication and collaboration skills"],"x-skills-preferred":["Infrastructure as Code (IaC) principles","automated infrastructure provisioning and configuration management"],"datePosted":"2026-03-06T18:26:04.351Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud infrastructure, programming languages, containerization technologies, container orchestration platforms, IaC tools, problem-solving and troubleshooting skills, communication and collaboration skills, Infrastructure as Code (IaC) principles, automated infrastructure provisioning and configuration management","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":230000,"maxValue":490000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_cdafe464-67b"},"title":"Software Engineer, Codex Cloud","description":"<p><strong>Software Engineer, Codex Cloud</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco; Seattle</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Location Type</strong></p>\n<p>On-site</p>\n<p><strong>Department</strong></p>\n<p>Applied AI</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$230K – $325K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>With Codex we’re building an AI software engineer. One that you can pair with, delegate to, or even ask to take on future tasks proactively. Our team is a fast-moving group within OpenAI, bringing together research, engineering, design, and product. 
We iteratively build the Codex agent harness and product to get the most out of the model, and we iteratively train the model to be great at complex software engineering tasks.</p>\n<p><strong>About the Role</strong></p>\n<p>Codex Cloud owns cloud-based agentic experiences like code review and cloud tasks (with many more on the way), but also the underlying runtime/orchestration layer for cloud-based agentic tasks. That runtime is the execution and infrastructure layer that turns Codex from a code generator into a real AI software engineer operating at scale. It provides secure, sandboxed environments where Codex can run commands, read and write files, execute tests, and iteratively improve its work across real codebases — just like a human developer. More broadly, Codex Runtime is the distributed systems substrate that enables large numbers of supervised AI agents to run inside OpenAI’s and customers’ data centers, tackling increasingly complex goals over long time horizons. By giving agents a real working context and reliable execution environment, Codex Runtime makes it possible for Codex to validate its own changes, debug failures, and deliver production-ready results, defining how AI agents safely, reliably, and at scale interact with the world’s software.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Shape the evolution of Codex itself by identifying how teams actually use (and break) AI-powered software engineering, and driving changes across product, infrastructure, and model behavior to make Codex a truly reliable teammate for organizations.</li>\n</ul>\n<ul>\n<li>Design customer-facing (across consumer and enterprise segments) software engineering experiences end-to-end to automate and empower engineers to safely build and deliver software to their customers.</li>\n</ul>\n<ul>\n<li>Build the core team and enterprise primitives that make Codex usable at scale including container orchestration, virtual machine provisioning/configuration, 
execution sandboxes, shared block storage, RBAC, admin and audit surfaces, usage, rate limits and pricing controls, managed configuration and constraints, and analytics that give teams and operators deep visibility into how Codex is being used.</li>\n</ul>\n<ul>\n<li>Design and own secure, observable, full-stack systems that power Codex across web, IDEs, CLI, and CI/CD, integrating with enterprise identity and governance systems (SSO/SAML/OIDC, SCIM, policy enforcement) and building data-access patterns that are performant, compliant, and trustworthy.</li>\n</ul>\n<ul>\n<li>Lead real-world deployments and launches by working directly with customers and GTM to roll Codex out across teams, using live usage and operational signals to rapidly iterate and turn messy, real-world feedback into scalable product and platform improvements.</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Have strong software engineering fundamentals and experience turning ideas into production-grade large-scale distributed systems, thinking holistically about balancing speed, performance, costs, and user experience.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_cdafe464-67b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/130a5389-83e1-493f-9205-542d3ff53afb","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$230K – $325K • Offers Equity","x-skills-required":["software engineering","artificial intelligence","distributed systems","container orchestration","virtual machine provisioning","execution sandboxes","shared block storage","RBAC","admin and audit surfaces","usage","rate limits and pricing 
controls","managed configuration and constraints","analytics","SSO/SAML/OIDC","SCIM","policy enforcement","data-access patterns","performant","compliant","trustworthy"],"x-skills-preferred":["cloud computing","machine learning","natural language processing","computer vision","robotics","cybersecurity","DevOps","agile development","scrum","kanban"],"datePosted":"2026-03-06T18:25:00.902Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco; Seattle"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, artificial intelligence, distributed systems, container orchestration, virtual machine provisioning, execution sandboxes, shared block storage, RBAC, admin and audit surfaces, usage, rate limits and pricing controls, managed configuration and constraints, analytics, SSO/SAML/OIDC, SCIM, policy enforcement, data-access patterns, performant, compliant, trustworthy, cloud computing, machine learning, natural language processing, computer vision, robotics, cybersecurity, DevOps, agile development, scrum, kanban","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":230000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3f16d353-491"},"title":"Software Engineer, Infrastructure Reliability","description":"<p><strong>Software Engineer, Infrastructure Reliability</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Applied AI</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$255K – $385K</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. 
If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>We’re hiring Software Engineers to join our Applied Infrastructure organization, and more specifically for our Database Systems and Online Storage teams. 
These teams operate with a high degree of autonomy and are deeply collaborative, with a shared mandate to raise the bar on safety, reliability, and velocity across OpenAI.</p>\n<p><strong>About the Role</strong></p>\n<p>You’ll be at the heart of scaling and hardening the infrastructure that powers some of the most widely used AI systems in the world. You’ll help ensure our systems are highly reliable, observable, performant, and secure—so researchers can iterate quickly, and products like ChatGPT and the OpenAI API can serve millions of users safely and effectively.</p>\n<p>This is a hands-on, high-leverage role for engineers who thrive on ownership, love solving deep technical problems across the stack, and want to work on systems that support cutting-edge research and deploy at global scale. You’ll play a key part in shaping technical direction, proactively improving system resilience, and collaborating closely with infra, product, and research teams to turn complex infrastructure into reliable platforms.</p>\n<p><strong>In this role you will:</strong></p>\n<ul>\n<li>Design, build, and operate reliable and performant systems used across engineering.</li>\n</ul>\n<ul>\n<li>Identify and fix performance bottlenecks and inefficiencies, ensuring our infrastructure can scale to the next order of magnitude.</li>\n</ul>\n<ul>\n<li>Dig deep to resolve complex issues.</li>\n</ul>\n<ul>\n<li>Continuously improve automation to reduce manual work. Improve internal tooling and our developer experience.</li>\n</ul>\n<ul>\n<li>Contribute to incident response, postmortems, and the development of best practices around system reliability and scalability.</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Have a deep understanding of distributed systems principles and a proven track record in building and operating scalable and reliable systems.</li>\n</ul>\n<ul>\n<li>Have a keen eye for performance and optimization. 
You know how to squeeze the most performance out of complex, globally-distributed systems.</li>\n</ul>\n<ul>\n<li>Have experience operating orchestration systems such as Kubernetes at scale and building abstractions over cloud platforms</li>\n</ul>\n<ul>\n<li>Are comfortable working in Linux environments, and with tools like Kubernetes, Terraform, CI/CD pipelines, and modern observability stacks.</li>\n</ul>\n<ul>\n<li>Are experienced in collaborating with cross-functional teams to ensure that reliability and scalability are considered in the design and development of new features and services.</li>\n</ul>\n<ul>\n<li>Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.</li>\n</ul>\n<ul>\n<li>Own problems end-to-end, and are willing to pick up whatever knowledge you&#39;re missing to get the job done.</li>\n</ul>\n<ul>\n<li>Are comfortable with ambiguity and rapid change.</li>\n</ul>\n<p><strong>Qualifications:</strong></p>\n<ul>\n<li>4+ years of relevant industry experience, with 2+ years leading large scale, complex projects or teams as an engineer or tech lead</li>\n</ul>\n<ul>\n<li>A passion for distributed systems at scale with a focus on reliability, scalability, security, and continuous improvement.</li>\n</ul>\n<ul>\n<li>Proven experience as a reliability engineer, production engineer, or a similar role in a fast-paced, rapidly scaling company.</li>\n</ul>\n<ul>\n<li>Strong proficiency in cloud infrastructure (like AWS, GCP, Azure) and IaC tools such as Terraform. 
Proficiency in programming / scripting languages.</li>\n</ul>\n<ul>\n<li>Experience with containerization technologies and container orchestration platforms like Kubernetes.</li>\n</ul>\n<ul>\n<li>Experience with observability tools such as Datadog, Prometheus, Grafana, Splunk and ELK stack.</li>\n</ul>\n<ul>\n<li>Experience with microservices architecture and service mesh technologies.</li>\n</ul>\n<ul>\n<li>Knowledge of security best practices in cloud environments.</li>\n</ul>\n<ul>\n<li>Strong understanding of distributed systems, networking, and database technologies.</li>\n</ul>\n<ul>\n<li>Excellent problem-solving skills and ability to work in a fast-paced environment.</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company that aims to develop and apply general-purpose technologies to align with human values.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3f16d353-491","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/779b340d-e645-4da1-a923-b3070a26d936","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$255K – $385K","x-skills-required":["cloud infrastructure","IaC tools","programming/scripting languages","containerization technologies","container orchestration platforms","observability tools","microservices architecture","service mesh technologies","security best practices","distributed systems","networking","database technologies"],"x-skills-preferred":["Kubernetes","Terraform","Datadog","Prometheus","Grafana","Splunk","ELK stack"],"datePosted":"2026-03-06T18:24:50.552Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San 
Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud infrastructure, IaC tools, programming/scripting languages, containerization technologies, container orchestration platforms, observability tools, microservices architecture, service mesh technologies, security best practices, distributed systems, networking, database technologies, Kubernetes, Terraform, Datadog, Prometheus, Grafana, Splunk, ELK stack","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":255000,"maxValue":385000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2b3a3ab9-2bc"},"title":"Member of Technical Staff, HPC Operations Engineering Manager","description":"<p><strong>Summary</strong></p>\n<p>Microsoft AI are looking for a talented Member of Technical Staff, HPC Operations Engineering Manager to join their MAI SuperIntelligence Team. This role sits at the heart of strategic decision-making, turning market data into actionable insights for a company that&#39;s revolutionising haptic entertainment technology. You&#39;ll work directly with leadership to shape the company&#39;s direction in the cinema and simulation markets.</p>\n<p><strong>About the Role</strong></p>\n<p>In this role, you&#39;ll lead a team of Site Reliability Engineers who blend software engineering and systems engineering to keep our large-scale distributed AI infrastructure reliable and efficient. 
You&#39;ll work closely with ML researchers, data engineers, and product developers to design and operate the platforms that power training, fine-tuning, and serving generative AI models.</p>\n<p><strong>Accountabilities</strong></p>\n<ul>\n<li>Conduct in-depth market research across cinema and simulation sectors, identifying emerging trends, competitive threats, and partnership opportunities that directly inform the company&#39;s quarterly strategic planning sessions</li>\n<li>Lead a team of experienced SREs to ensure uptime, resiliency and fault tolerance of AI model training and inference systems</li>\n</ul>\n<p><strong>The Candidate we&#39;re looking for</strong></p>\n<p><strong>Experience:</strong></p>\n<ul>\n<li>8+ years technical engineering experience with Site Reliability Engineering, DevOps, or Infrastructure Engineering Leadership roles</li>\n</ul>\n<p><strong>Technical skills:</strong></p>\n<ul>\n<li>Kubernetes, Docker, and container orchestration</li>\n<li>Public cloud platforms like Azure/AWS/GCP and infrastructure-as-code</li>\n</ul>\n<p><strong>Personal attributes:</strong></p>\n<ul>\n<li>Low ego individual</li>\n</ul>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Competitive salary</li>\n<li>Benefits and other compensation</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2b3a3ab9-2bc","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Microsoft AI","sameAs":"https://microsoft.ai","logo":"https://logos.yubhub.co/microsoft.ai.png"},"x-apply-url":"https://microsoft.ai/job/member-of-technical-staff-hpc-operations-engineering-manager-mai-superintelligence-team/","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"USD $139,900 – $274,800 per year","x-skills-required":["Kubernetes","Docker","container orchestration","public cloud 
platforms","infrastructure-as-code"],"x-skills-preferred":["monitoring & observability tools","Grafana","Datadog","OpenTelemetry"],"datePosted":"2026-03-06T07:26:34.569Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Docker, container orchestration, public cloud platforms, infrastructure-as-code, monitoring & observability tools, Grafana, Datadog, OpenTelemetry","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139900,"maxValue":274800,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a6ce6c28-ca1"},"title":"Search Rust Engineer","description":"<p>Perplexity AI is seeking a talented Search Rust Engineer to join our rapidly growing team, driving innovation in AI-powered search experiences. As a Search Rust Engineer, your main mission will be to relentlessly optimise performance - squeezing every millisecond of latency from our search stack, while implementing robust, scalable, and reliable systems.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<p>Your main responsibilities will be to architect, build, and optimise ultra-low-latency search infrastructure using Rust. 
You will also profile and instrument services, continuously driving down response times at scale, and develop and maintain distributed backend components powering real-time search and retrieval.</p>\n<ul>\n<li>Architect, build, and optimise ultra-low-latency search infrastructure using Rust</li>\n<li>Profile and instrument services, continuously driving down response times at scale</li>\n<li>Develop and maintain distributed backend components powering real-time search and retrieval</li>\n</ul>\n<p><strong>What you need</strong></p>\n<p>To be successful in this role, you will need to have deep expertise in Rust programming, especially for backend/search systems. You will also need experience profiling and tuning high-load, low-latency distributed services, and a strong understanding of systems design, Linux internals, and performance debugging.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a6ce6c28-ca1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Perplexity AI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/perplexity.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/perplexity/a19f1774-5944-4981-b446-e3e40d0dd281","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Rust programming","backend/search systems","systems design","Linux internals","performance debugging"],"x-skills-preferred":["cloud infrastructure","container orchestration","benchmarking","instrumentation","continuous performance improvement"],"datePosted":"2026-03-04T12:27:33.366Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Belgrade, Berlin, London"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Rust programming, backend/search systems, systems design, Linux 
internals, performance debugging, cloud infrastructure, container orchestration, benchmarking, instrumentation, continuous performance improvement"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ba52acc3-4fd"},"title":"Engineering Site Lead","description":"<p>We&#39;re seeking an exceptional Site Lead to establish and scale our London office. This is a unique opportunity to shape Perplexity&#39;s presence in one of the world&#39;s leading tech hubs, building teams and culture from the ground up while driving technical excellence in infrastructure and AI systems.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<p>As Site Lead, you&#39;ll serve as the face of Perplexity in London, responsible for building our technical organization, fostering a world-class engineering culture, and directly managing one or more infrastructure teams. You&#39;ll report to senior leadership and work cross-functionally with teams across our global footprint.</p>\n<p><strong>What you need</strong></p>\n<ul>\n<li>10+ years of experience in software engineering with 5+ years in infrastructure, cloud infrastructure, or AI infrastructure roles</li>\n<li>3+ years of people management experience, including building and scaling teams</li>\n<li>Proven track record of establishing or significantly growing an engineering site or office</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ba52acc3-4fd","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Perplexity","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/perplexity.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/perplexity/638e6823-be7f-46c6-9675-7b1197fc9b8c","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["distributed 
systems","cloud platforms","infrastructure automation","GPU infrastructure and orchestration","ML training and inference pipelines","Model serving and deployment at scale","Kubernetes","Terraform","container orchestration","CI/CD systems"],"x-skills-preferred":["experience at companies focused on AI/ML, search, or large-scale consumer applications","previous experience as a site lead, office lead, or similar multi-team leadership role","background in building infrastructure for LLM training or inference","contributions to open-source infrastructure or AI infrastructure projects","experience scaling teams from 0 to 20+ engineers","active involvement in the London or European tech community"],"datePosted":"2026-03-04T12:25:03.700Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed systems, cloud platforms, infrastructure automation, GPU infrastructure and orchestration, ML training and inference pipelines, Model serving and deployment at scale, Kubernetes, Terraform, container orchestration, CI/CD systems, experience at companies focused on AI/ML, search, or large-scale consumer applications, previous experience as a site lead, office lead, or similar multi-team leadership role, background in building infrastructure for LLM training or inference, contributions to open-source infrastructure or AI infrastructure projects, experience scaling teams from 0 to 20+ engineers, active involvement in the London or European tech community"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_4a7597fd-d7a"},"title":"Senior Data Engineer","description":"<p>Joining Razer will place you on a global mission to revolutionize the way the world games. 
Razer is a place to do great work, offering you the opportunity to make an impact globally while working with a global team spread across 5 continents. Razer is also a great place to work, providing you the unique, gamer-centric #LifeAtRazer experience that will put you on a path of accelerated growth, both personally and professionally.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<p>We are looking for a Senior Data Engineer to lead the technical initiatives for AI Data Engineering, enabling scalable, high-performance data pipelines that power AI and machine learning applications. This role will focus on architecting, optimizing, and managing data infrastructure to support AI model training, feature engineering, and real-time inference. You will collaborate closely with AI/ML engineers, data scientists, and platform teams to build the next generation of AI-driven products.</p>\n<ul>\n<li>Lead AI Data Engineering initiatives by driving the design and development of robust data pipelines for AI/ML workloads, ensuring efficiency, scalability, and reliability.</li>\n<li>Design and implement data architectures that support AI model training, including feature stores, vector databases, and real-time streaming solutions.</li>\n<li>Develop high-performance data pipelines that process structured, semi-structured, and unstructured data at scale, supporting a range of AI applications.</li>\n</ul>\n<p><strong>What you need</strong></p>\n<ul>\n<li>Hands-on experience working with Vector/Graph;Neo4j</li>\n<li>3+ years of experience in data engineering, working on AI/ML-driven data architectures</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_4a7597fd-d7a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Razer","sameAs":"https://razer.wd3.myworkdayjobs.com","logo":"https://logos.yubhub.co/razer.com.png"},"x-apply-url":"https://razer.wd3.myworkdayjobs.com/en-US/Careers/job/Singapore/Senior-Data-Engineer_JR2025005485","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Hands on experience working with Vector/Graph;Neo4j","3+ years of experience in data engineering, working on AI/ML-driven data architectures"],"x-skills-preferred":["Python","SQL","Experience in developing and deploying applications running on cloud infrastructure such as AWS, Azure or Google Cloud Platform using Infrastructure as code tools such as Terraform, containerization tools like Dockers, container orchestration platforms like Kubernetes","Experience using orchestration tools like Airflow or Prefect, distributed computing framework like Spark or Dask, data transformation tool like Data Build Tool (DBT)","Excellent with various data processing techniques (both streaming and batch), managing and optimizing data storage (Data Lake, Lake House and Database, SQL, and NoSQL) is essential."],"datePosted":"2026-01-01T15:49:59.491Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Singapore"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Hands on experience working with Vector/Graph;Neo4j, 3+ years of experience in data engineering, working on AI/ML-driven data architectures, Python, SQL, Experience in developing and deploying applications running on cloud infrastructure such as AWS, Azure or Google Cloud Platform using Infrastructure as code tools such as Terraform, containerization tools like Dockers, container orchestration platforms like Kubernetes, Experience using 
orchestration tools like Airflow or Prefect, distributed computing framework like Spark or Dask, data transformation tool like Data Build Tool (DBT), Excellent with various data processing techniques (both streaming and batch), managing and optimizing data storage (Data Lake, Lake House and Database, SQL, and NoSQL) is essential."},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_e5eb908e-6f9"},"title":"Senior Data Engineer","description":"<p>We are looking for a Senior Data Engineer to lead the technical initiatives for AI Data Engineering, enabling scalable, high-performance data pipelines that power AI and machine learning applications. This role will focus on architecting, optimizing, and managing data infrastructure to support AI model training, feature engineering, and real-time inference.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<ul>\n<li>Lead AI Data Engineering initiatives by driving the design and development of robust data pipelines for AI/ML workloads, ensuring efficiency, scalability, and reliability.</li>\n<li>Design and implement data architectures that support AI model training, including feature stores, vector databases, and real-time streaming solutions.</li>\n</ul>\n<p><strong>What you need</strong></p>\n<ul>\n<li>Hands-on experience working with Vector/Graph;Neo4j</li>\n<li>3+ years of experience in data engineering, working on AI/ML-driven data architectures</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_e5eb908e-6f9","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Razer","sameAs":"https://razer.wd3.myworkdayjobs.com","logo":"https://logos.yubhub.co/razer.com.png"},"x-apply-url":"https://razer.wd3.myworkdayjobs.com/en-US/Careers/job/Singapore/Senior-Data-Engineer_JR2025005485","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Vector/Graph;Neo4j","data engineering","AI/ML-driven data architectures"],"x-skills-preferred":["Python","SQL","Terraform","containerization tools like Dockers","container orchestration platforms like Kubernetes","orchestration tools like Airflow or Prefect","distributed computing framework like Spark or Dask","data transformation tool like Data Build Tool (DBT)"],"datePosted":"2025-12-26T10:53:07.867Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Singapore"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Vector/Graph;Neo4j, data engineering, AI/ML-driven data 
architectures, Python, SQL, Terraform, containerization tools like Dockers, container orchestration platforms like Kubernetes, orchestration tools like Airflow or Prefect, distributed computing framework like Spark or Dask, data transformation tool like Data Build Tool (DBT)"}]}