Technical Lead, MFT MDE Analytics Engineering

d6e7c226-e8c Technical Lead, MFT MDE Analytics Engineering

The SPEED Market Data team at Equity IT is seeking a hands-on Technical Lead to own and drive a critical workstream focused on architecting, implementing, monitoring, and supporting low-latency C++ systems. As a Technical Lead, you will shape the future of the industry by working alongside exceptional engineers and strategists to solve significant engineering problems.

We are looking for a strong technical leader with financial markets technology experience and real-time market data expertise to design, build, and support our global real-time market data platform. This role emphasizes technical leadership, architectural ownership, and cross-team coordination rather than people management.

Principal Responsibilities:

Act as the technical owner for a major market data workstream, setting technical direction, defining architecture, and driving execution across the full lifecycle.
Collaborate with hardware and software teams across divisions to design and build real-time market data processing and distribution systems.
Lead and drive new technical initiatives for the team, including evaluating technologies, defining standards, and establishing best practices.
Design and develop systems, interfaces, and tools for historical market data and trading simulations that increase research productivity.
Architect and implement components of an enterprise market data platform, including components for caching, aggregation, conflation and value-added data enrichment.
Optimise platform performance using network and systems programming, and advanced low-latency techniques (CPU, NIC, kernel, and application-level tuning).
Lead the design and maintenance of automated test and benchmark frameworks, and tools for risk management, performance tracking, and system validation.
Provide technical leadership for the support and operation of both enterprise real-time market data environments, including coordinating internal, vendor, and exchange-driven changes.
Design and engineer components to automate support and management of the market data platform, including monitoring, real-time and historical metrics collection/visualisation, and self-service administrative/user tools.
Serve as a primary technical liaison for users of the market data environment (Portfolio Managers, trading desks, and core technology teams), translating requirements into robust technical solutions.
Lead the enhancement of processes and workflows for operating the market data platform (release/deployment, incident management and remediation, exchange notification handling, defining and enforcing SLAs).
Mentor and influence other engineers through code reviews, design reviews, and hands-on guidance, fostering a culture of technical excellence and accountability.

Qualifications / Skills Required:

Degree in Computer Science or a related field with a strong background in data structures, algorithms, and object-oriented programming in modern C++.
Deep understanding of Linux system internals and networking, especially in low-latency and high-throughput environments.
Strong knowledge of CPU architecture and the ability to leverage CPU capabilities for performance optimisation.
Demonstrated experience acting as a technical lead or senior engineer owning complex systems or workstreams end-to-end (design, delivery, and operations).
Able to prioritise and make trade-offs in a fast-moving, high-pressure, constantly changing environment; strong sense of urgency, ownership, and follow-through.
Strong belief in and practice of extreme ownership, with a track record of taking accountability for systems in production.
Effective communication and stakeholder management skills: able to work closely with business and technology users, understand their needs, and drive appropriate technical solutions.
Experience building solutions on cloud environments such as GCP and AWS.
Knowledge of additional programming languages such as Java, Python, or scripting (Perl, shell).
Technical background in application development on complex market data systems (e.g., Bloomberg, Thomson Reuters, etc.).
Experience supporting market data environments within a global organisation, including internally developed DMA feed handlers and distribution infrastructure.
Strong understanding of market data concepts and functionality, including data models (fields/messages), protocols (e.g., snapshot + delta), order book representations (L1/L2/L3), recovery, and reliability.
Hands-on Site Reliability Engineering or DevOps experience, including system administration, automation, measurement, and release/deployment management.
Experience with monitoring, metrics, and command/control tooling for distributed market data platforms, with the ability to evaluate existing solutions and drive enhancements across development and operations.
Ability to operate with a high level of thoroughness and attention to detail, demonstrating strong ownership of deliverables and production systems.

Millennium pays a total compensation package which includes a base salary, discretionary performance bonus, and a comprehensive benefits package. The estimated base salary range for this position is $175,000 to $250,000, which is specific to New York and may change in the future. When finalising an offer, we take into consideration an individual's experience level and the qualifications they bring to the role to formulate a competitive total compensation package.

XML job scraping automation by YubHub

]]> full-time senior onsite $175,000 to $250,000 C++, Linux system internals, Networking, CPU architecture, Object-oriented programming, Cloud environments, Java, Python, Scripting, Market data systems, Site Reliability Engineering, DevOps, Monitoring, Metrics, Command/control tooling Engineering Finance Equity IT https://logos.yubhub.co/mlp.eightfold.ai.png Equity IT is a technology company that provides services to the financial industry. https://mlp.eightfold.ai https://mlp.eightfold.ai/careers/job/755954905529 New York, New York, United States of America 2026-04-18

262aa1cb-01c Head of Corporate Engineering

As Head of Corporate Engineering, you will be responsible for Enterprise engineering and operations globally. You will be responsible for building and managing a highly technical enterprise engineering team, developing first-principles-based strategies, and enabling strong enterprise security.

Key responsibilities include engineering, securing, and optimizing cloud infrastructure, Identity and Access Management, endpoints, and collaboration tools, and ensuring compliance with SOX, PCI DSS, and FedRAMP. The Head of Corporate Engineering will work closely with R&D on managing engineering tools like Jira, Confluence, and GitHub, driving efficient adoption and integration.

Strong technical and influencing leadership principles coupled with the ability to manage a complex, scaling, and fast-moving enterprise environment are essential. This role reports directly to the Vice President, Infrastructure and Operations.

Responsibilities:

In this influential role, you will be responsible for:

Securing the Enterprise: Working closely with the Enterprise Security organization to harden and secure our cloud environments, secret management, collaboration tools, endpoints, SaaS environments, IAM tools, and more. Success is measured by continuous improvement of our enterprise security hardening standards.

Building and Scaling our Cloud Infrastructure: Your team will be responsible for establishing and implementing enterprise cloud infrastructure, including Infrastructure Provisioning, SRE services, 24/7 on-call support, Infra as Code, observability, and more. In addition, you will be responsible for managing cloud budgets, vendor management, and establishing cost optimization initiatives. Success is measured by increased developer velocity while securing and scaling the cloud infrastructure.

Engineering Tooling: Partner closely with R&D teams to establish policies, configurations, run-books, SLAs, hardening, scalability, and availability of engineering tools like GitHub, Jira, Atlassian, and more.

Endpoint Engineering: Enable extreme automation for endpoint management with zero-touch deployment, observability (synthetic and real-time), provisioning/de-provisioning, and establishing standards / SLAs. Enforce security policies, configure and manage security settings, and ensure compliance across all endpoints and mobile devices. Success is measured in terms of end-user satisfaction and % of manual touch.

Collaboration Management: Ensure we provide world-class tools to our employees so they can be extremely productive and collaborative. This includes, but is not limited to, managing and scaling internal workplace products like Gmail, Slack, Atlassian, Moveworks, Glean, and more. Success is measured by user satisfaction.

Identity & Access Management: Manage the IAM team across IAM implementation, access standards enforcement, SLA management, and compliance with standards like FedRAMP, IL5, PCI, and more. This includes managing both internal and external identity providers. Success is measured by compliance, identity governance, and availability.

Desired Success Outcomes

A high-performing enterprise engineering team capable of handling complex technical projects with agility and high quality

Well-defined cloud strategy ensuring the stability, scalability, and security of cloud infrastructure. Overhaul of current processes and workflows to address inefficiencies and increase team velocity

Robust endpoint security with implementation of comprehensive security measures for all endpoints, including Mac, Windows, and mobile devices

Deliver high-quality employee experience with productivity tools (Gmail, Slack, Atlassian tools, Moveworks, GitHub) with a robust forward-looking roadmap

Efficient operational support for Tier 3 IT services with minimized production incidents. Implementation of robust incident and change management processes with mature operational practice

Efficient and mature processes for system integrations related to Mergers and Acquisitions (M&As), ensuring timely, smooth transitions during M&A integrations

Development and implementation of automation tools and frameworks; identification of automation opportunities to reduce manual toil and improve accuracy

Qualifications:

10 years of experience managing Cloud infrastructure at large enterprises. Extensive experience managing public cloud implementations in AWS. Experience with GCP and Azure will be a plus

In-depth understanding of Cloud native technologies to lead and guide the team. Must have hands-on experience in troubleshooting and debugging issues in production environments

Working experience in managing DevOps/SRE practices: OKRs (Objectives and Key Results), Agile development, Infra-as-code, SRE (Site Reliability Engineering), and DevOps measurement such as DORA KPIs

In-depth understanding of each collaboration tool's features, functionalities, and configurations (e.g., Gmail for email, Slack for messaging). Ability to identify, integrate, and optimize the use of various tools for seamless collaboration (e.g., connecting Jira with GitHub for Dev metrics)

Experience leading a team of senior professionals working asynchronously in a remote, distributed team. Strong verbal and written communication skills

Collaborative style: partners well with cross-functional teams to solve hard problems and to complete complex deliverables with quality and business outcomes

Provide mentorship and guidance to team members to ensure that their skills and knowledge are kept up-to-date

Pay Range Transparency

Databricks is committed to fair and equitable compensation practices. The pay range(s) for this role is listed below and represents the expected salary range for non-commissionable roles or on-target earnings for commissionable roles. Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, relevant certifications and training, and specific work location. Based on the factors above, Databricks anticipates utilizing the full width of the range. The total compensation package for this position may also include eligibility for annual performance bonus, equity, and the benefits listed above. For more information regarding which range your location is in visit our page here.

Zone 1 Pay Range $265,000-$364,300 USD

]]> full-time executive remote $265,000-$364,300 USD Cloud infrastructure, Identity and Access Management, Endpoint security, Collaboration tools, DevOps, Site Reliability Engineering, Agile development, Infrastructure as Code, Observability, Automation, Scripting languages, Cloud native technologies, Public cloud implementations, AWS, GCP, Azure, Jira, Confluence, GitHub, Atlassian, Moveworks, Glean, Slack, Gmail, Microsoft Office Engineering Technology Databricks https://logos.yubhub.co/databricks.com.png Databricks is a data and AI company that provides a unified platform for data, analytics, and AI. https://databricks.com https://job-boards.greenhouse.io/databricks/jobs/7293607002 San Francisco, California 2026-04-18

b05b9f90-7d3 Data Center Engineer, Resource Efficiency – Compute Supply

About the Role

As a Power & Resource Efficiency Engineer, you'll sit at the intersection of IT and facilities, building the systems, models, and control loops that optimize how we allocate and consume power, cooling, and physical capacity across our TPU/GPU fleet.

You'll own the technical strategy for turning raw data center capacity into reliable, efficient compute, working across power topology, workload scheduling, and real-time telemetry to push utilization as close to the physical envelope as possible while maintaining our availability commitments.

Responsibilities

Build models that forecast consumption across electrical and mechanical subsystems, informing capacity planning, energy procurement, oversubscription targets and risks, including statistical modeling of cluster utilization, workload profiles, and failure modes.

Design IT/OT interfaces that bridge compute orchestration with facility controls, enabling real-time telemetry across accelerator hardware, power distribution, cooling, and schedulers.

Build and operate load management systems that use power and cooling topology to enable power/thermal-aware placement, maximizing throughput while meeting SLOs.

Partner with data center providers to drive design optimizations and hold them accountable to SLA-grade performance standards, providing technical diligence on partner architectures.

What We're Looking For

Deep knowledge of data center power distribution and cooling architectures, and how they interact with IT load profiles. Experience with reliability engineering, SLA development, and failure-mode analysis.

Proficiency in statistical modeling and simulation for infrastructure capacity or power utilization.

Familiarity with SCADA/BMS/EPMS, telemetry pipelines, and control systems. Experience building software that bridges IT and OT.

Exposure to accelerator deployments and their power management interfaces strongly preferred.

Demand response, grid interaction, or behind-the-meter generation experience is a plus.

Ability to translate between infrastructure engineering, software teams, and external partners.

Required Qualifications

Bachelor's degree in Electrical Engineering, Mechanical Engineering, Power Systems, Controls Engineering, or a related field.

5+ years of experience in data center infrastructure or facility engineering.

Demonstrated experience with data center power distribution and cooling system architectures.

Experience building or operating software-based power management, load scheduling, or control systems.

Proficiency in Python or similar languages for statistical modeling, simulation, or automation of data center infrastructure optimizations.

Familiarity with SCADA, BMS, EPMS, or industrial control systems and associated protocols (Modbus, BACnet, SNMP).

Track record of cross-functional collaboration across hardware, software, and facilities teams.

Preferred Qualifications

Master's or PhD in Controls, Power Systems, or related discipline and 3+ years of experience in data center infrastructure or facility engineering.

Experience with accelerator-class deployments and their power management interfaces.

Background in control theory, dynamical systems, or cyber-physical systems design.

Experience with energy storage, microgrid integration, demand response, or behind-the-meter generation.

Familiarity with reliability engineering methods.

Experience with SLA development, availability modeling, or service credit frameworks.

Exposure to ML/optimization techniques applied to infrastructure or energy systems.

Salary

The annual compensation range for this role is $320,000-$405,000 USD.

Benefits

We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with our team.

]]> full-time senior hybrid $320,000-$405,000 USD data center power distribution, cooling architectures, IT load profiles, reliability engineering, SLA development, failure-mode analysis, statistical modeling, simulation, infrastructure capacity, power utilization, SCADA/BMS/EPMS, telemetry pipelines, control systems, accelerator deployments, power management interfaces, demand response, grid interaction, behind-the-meter generation, Python, automation, data center infrastructure optimizations, SCADA, BMS, EPMS, industrial control systems, Modbus, BACnet, SNMP, accelerator-class deployments, control theory, dynamical systems, cyber-physical systems design, energy storage, microgrid integration, reliability engineering methods, availability modeling, service credit frameworks, ML/optimization techniques Engineering Technology Anthropic https://logos.yubhub.co/anthropic.com.png Anthropic creates reliable, interpretable, and steerable AI systems. It operates at massive scale, with a focus on extracting maximum compute throughput from every watt. https://www.anthropic.com/ https://job-boards.greenhouse.io/anthropic/jobs/5159642008 Remote-Friendly, United States 2026-04-18

c81cbaa1-56a Engineering Technical Program Manager - W&B Platform

The Weights & Biases (W&B) team builds the developer platform trusted by machine learning practitioners to track, manage, and scale their ML workflows. As a Technical Program Manager focused on platform reliability and release management, you'll be at the centre of our platform's growth and stability.

You will partner with engineering teams within W&B and CoreWeave AI/ML Platform Services (AMPS) to ensure W&B integrates seamlessly into the broader ML ecosystem, while maintaining high reliability and predictable releases.

This role is ideal for someone who thrives in cross-functional environments, has a strong grasp of developer workflows, and excels at creating repeatable, reliable program structures that scale.

Responsibilities

Drive end-to-end program management for critical platform initiatives.
Build and run release management processes, ensuring predictable and high-quality delivery cycles.
Partner with engineering and product to define success metrics, manage risks, and ensure on-time delivery.
Build and scale incident management and RCA processes for W&B services.
Improve the predictability and visibility of releases across teams, introducing dashboards, retrospectives, and program forums.
Collaborate with TPMs and engineering leaders across W&B and CoreWeave to ensure end-to-end reliability across the ML developer stack.

Qualifications

Bachelor's degree in a technical field or equivalent experience.
5+ years of program management experience in SaaS, developer tools, or ML/AI platforms.
Proven experience running release management programs and incident management processes.
Strong technical fluency in cloud computing, developer workflows, and CI/CD practices.
Excellent communication and facilitation skills with diverse technical and non-technical audiences.
Track record of improving reliability, efficiency, and predictability in software delivery.

Additional Qualifications

Familiarity with ML workflows, model training/inference, and developer productivity tools.
Experience building integrations between SaaS platforms, APIs, and cloud services.
Strong background in reliability engineering practices and DevOps program leadership.

]]> full-time senior hybrid $177,000 to $237,000 cloud computing, developer workflows, CI/CD practices, program management, release management, incident management, reliability engineering, ML workflows, model training/inference, developer productivity tools, integration between SaaS platforms, APIs, and cloud services Engineering Technology CoreWeave https://logos.yubhub.co/coreweave.com.png CoreWeave is a cloud computing company that provides a platform for artificial intelligence and machine learning. https://www.coreweave.com https://job-boards.greenhouse.io/coreweave/jobs/4610109006 San Francisco, CA 2026-04-18

38a5c86c-54e Senior Compliance Engineer

JOB TITLE: Senior Compliance Engineer
LOCATION: Costa Mesa, California, United States
DEPARTMENT: Corporate Technology : Information Security : Corporate Assurance

As a Senior Compliance Engineer at Anduril Industries, you will be responsible for driving automation, compliance, and security engineering principles into the design, integration, and operation of Anduril's internal systems. This is a technically hands-on role that requires a strong DevSecOps background with deep expertise in cloud infrastructure security, embedded systems security, and federal compliance frameworks.

Key Responsibilities

Design, develop, and maintain Infrastructure as Code (IaC) and Policy as Code (PaC) that enforce compliance with NIST SP 800-171 and 800-53, CMMC, and other applicable frameworks, enabling developers to deploy CMMC-certified applications using pre-packaged, compliant infrastructure templates.
Architect, build, and deploy robust, scalable security controls across Anduril's corporate, development, and production cloud environments (AWS, Azure, GCP) and on-premise environments.
Develop and automate IaC pipelines for managing and scaling cloud deployments securely and efficiently, including automated pipelines for deploying infrastructure, applications, and updates.
Build automation for procedural compliance controls, generating compliance and audit artifacts at scale without manual intervention.
Develop security models that integrate Continuous Monitoring (ConMon), DISA STIG scanning, and compliance reporting into a unified, automated workflow.

Compliance Engineering & Framework Implementation

Analyze, interpret, and operationalize federal and industry cybersecurity regulations, including NIST SP 800-171 and 800-53, CMMC, FedRAMP, and SOC 2, translating regulatory language into actionable engineering guidance and enforceable technical controls.
Evaluate system architectures and configurations to ensure alignment with required security controls for moderate-impact information systems.
Interface directly with infrastructure teams to verify and enforce compliance across existing on-premise and cloud stacks, identifying gaps and driving remediation.

Cross-Functional Collaboration & Enablement

Partner with engineers, the DevSecOps Team, and the Automation Team to implement and verify security controls in both corporate and product software environments.
Act as a force multiplier by embedding security best practices into the workflows of infrastructure, application, and product teams, particularly for environments holding mission-critical data.

Strategic & Advisory

Develop strategies and implementation plans for compliance-related matters, advising management on risk posture, regulatory changes, and investment priorities.
Institute best-practice procedures for compliance and risk mitigation across the organization.

Required Qualifications

3+ years of professional experience in Cloud Security, DevSecOps, Site Reliability Engineering (SRE), or a related security engineering role.
Background in one or more of the following disciplines: Systems Security Engineering, Cybersecurity, Systems Engineering, Software Engineering, Computer Engineering, or Computer Science.
Proven experience building and securing complex cloud environments at scale.
3+ years of hands-on experience working with compliance frameworks such as CMMC, NIST SP 800-171 and/or 800-53, and FedRAMP.

]]> full-time senior onsite Cloud Security, DevSecOps, Site Reliability Engineering, Systems Security Engineering, Cybersecurity, Systems Engineering, Software Engineering, Computer Engineering, Computer Science, Compliance Frameworks, NIST SP 800-171, NIST SP 800-53, CMMC, FedRAMP Engineering Technology Anduril Industries https://logos.yubhub.co/anduril.com.png Anduril Industries is a defense technology company that designs, builds, and sells advanced technology systems for the U.S. and allied military. https://www.anduril.com/ https://job-boards.greenhouse.io/andurilindustries/jobs/5087188007 Costa Mesa, California, United States 2026-04-18

500752e2-a61 Manager, Software Engineering - Interaction Design

We are growing our team of software engineers and are looking for a manager to lead our interaction design team. As a manager, you will be responsible for building and executing a long-term roadmap to improve the platform, features, and runtimes supporting interactive and animated experiences in Figma products like Prototyping and Slides. You will also hire, manage, support and develop a team of engineers, including staff level engineers.

You will partner with product and engineering leadership to set strategy, priorities, and mission for teams and projects. You will roll up your sleeves as needed to get involved in the technical details of solving some of the most complex technical challenges at Figma. You will establish trust within and across teams by creating accountability and a positive work environment in partnership with other leaders in the organization.

You will grow your career in an engaged and creative engineering community. Figma is committed to building an inclusive and diverse team and culture. We expect all of our leaders to play a role in helping to build and drive these initiatives through hiring, community events, and other programs in partnership with teams across all of Figma.

To be successful in this role, you will need 2+ years of experience managing and leading a high-output engineering team and 5+ years of engineering experience working on complex systems with an emphasis on performance, reliability, quality, and extensibility. You will also need demonstrated leadership skills in building a high-performing and highly engaged engineering team, including a proven track record of motivating, mentoring, and guiding senior engineers.

Experience building performant animation and interaction frameworks, tooling, and foundations is a plus. Deep knowledge of runtime environments and how they operate, including game, application, or browser engines is also a plus. Experience with creative coding frameworks used for building interaction, animation, and time-based media experiences is a plus.

]]> full-time senior remote $258,000-$376,000 USD Leadership, Software engineering, Complex system design, Performance optimization, Reliability engineering, Quality assurance, Extensibility, Animation and interaction frameworks, Runtime environments, Creative coding frameworks Engineering Technology Figma https://logos.yubhub.co/figma.com.png Figma is a software company that provides a platform for design collaboration. https://www.figma.com/ https://job-boards.greenhouse.io/figma/jobs/5778796004 San Francisco, CA • New York, NY • United States 2026-04-18

99fa9996-114 Reliability Engineer

We are seeking an experienced Reliability Engineer to join our team in support of our Group 1-2 Unmanned Aircraft Systems (UAS) programs. As a Reliability Engineer at Anduril, you will leverage your extensive knowledge of advanced Reliability tools and methodologies to drive excellence in our Reliability programs.

Support and review system block diagrams, interface control documents, and schematics for compliance, redundancy, and accuracy against requirements and concept of operations. Lead Failure Modes & Effects Analysis efforts and collaborate with cross-functional Engineering teams to implement product improvements for Reliability.

Perform predictive reliability analysis to calculate probability of loss of control, loss of asset, and MTTF/MTBF utilizing Anduril requirements, specifications, and relevant industry standards (e.g., MIL-HDBK-217Plus). Develop Qualification Test plans to ensure appropriate requirements coverage, and work with Test Engineering to fixture, instrument, and execute test campaigns.

Develop Maintainability prediction models using MIL-HDBK-472, and support the development of Preventative Maintenance & Sparing plans. Perform Weibull analysis utilizing data from Qualification campaigns and from fielded assets. Work with Quality and Manufacturing Engineering to ensure inspection and acceptance test gates are in place and documented against requirements.

Use data analysis techniques to highlight key problem areas for deployed products, and drive root cause analysis & corrective action implementation to closure. Develop and implement Reliability processes for the business to improve and streamline our Design, Manufacturing, and Deployment Operations efforts for rapid development.

Support Development Milestone Reviews through identification of appropriate entry and exit criteria, and work with Quality Engineering to guide the hand off process to production.

Required qualifications include a minimum of 5 years of experience as a Reliability Engineer or Design/Development Engineer, a B.S. degree in Mechanical Engineering, Aerospace Engineering, Systems Engineering, or an equivalent technical discipline, and experience with safety-critical hardware and software systems in the defense or aerospace industry.

Preferred qualifications include an M.S. degree in Mechanical Engineering, Aerospace Engineering, Systems Engineering, or an equivalent technical discipline, and experience with DO-254, DO-178, MIL-HDBK-217Plus, MIL-HDBK-472, and ARP-4754.

The salary range for this role is $146,000-$194,000 USD. Anduril offers top-tier benefits for full-time employees, including comprehensive medical, dental, and vision plans, income protection, generous time off, caregiver & wellness leave, family planning & parenting support, mental health resources, professional development, commuter benefits, and relocation assistance.

]]> full-time senior onsite $146,000-$194,000 USD Reliability Engineering, Failure Modes & Effects Analysis, Predictive Reliability Analysis, Qualification Test Planning, Maintainability Prediction Models, Weibull Analysis, Data Analysis Techniques, Root Cause Analysis & Corrective Action Implementation, Reliability Processes Development, Development Milestone Reviews, DO-254, DO-178, MIL-HDBK-217Plus, MIL-HDBK-472, ARP-4754 Engineering Defense Technology Anduril Industries https://logos.yubhub.co/anduril.com.png Anduril Industries is a defense technology company that designs, builds, and sells military systems. https://www.anduril.com/ https://job-boards.greenhouse.io/andurilindustries/jobs/5029462007 Costa Mesa, California, United States 2026-04-18

a7d0cf0f-a3a Senior Engineer- Data Platforms

The Data Platform Team serves as the experts on managing data infrastructure for CoreWeave. Our data infrastructure includes managed databases, data ingestion, data flow, data lakes, and other data retrieval for CoreWeave and its customers.

We are seeking senior software engineers who specialize in database and stream processing and can help us fulfill our global datastore strategy and establish communication models for our data flow. This individual will work with a team of engineers with mixed skill sets and have the opportunity to work on the full range of rewarding challenges that come with the business of building a cloud in a communicative, supportive, and high-performing environment.

As a member of the Data Platform Team you will have the opportunity to:

Design and implement the platform to deliver data to teams with a focus on providing managed solutions through APIs
Participate in operations and scaling of relational data platforms
Develop a stream processing architecture and solve for scalability and reliability
Improve the performance, security, reliability, and scalability of our data platforms and related services, and participate in the team’s on-call rotation
Establish guidelines and guard rails for data access and storage for stakeholder teams
Ensure compliance with standards for data protection regulation
Grow, change, invest in your teammates, be invested-in, share your ideas, listen to others, be curious, have fun, and, above all, be yourself

The ideal candidate will have 5+ years of experience in software or infrastructure engineering, including operating services in production at scale, and familiarity with reliability engineering concepts such as different types of testing, progressive deployments, error budgets, observability, and fault-tolerant design.

The base salary range for this role is $175,000 to $210,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location.

]]> full-time senior hybrid $175,000 to $210,000 database and stream processing, managed databases, data ingestion, data flow, data lakes, APIs, operational experience, reliability engineering, testing, progressive deployments, error budgets, observability, fault-tolerant design, Kubernetes, Go, Linux distributions, shell scripting, Linux storage and networking stacks Engineering Technology CoreWeave https://logos.yubhub.co/coreweave.com.png CoreWeave is a cloud computing company that provides a platform for building and scaling AI. It was founded in 2017 and became a publicly traded company in March 2025. https://www.coreweave.com https://job-boards.greenhouse.io/coreweave/jobs/4562276006 Bellevue, WA / Sunnyvale, CA 2026-04-18 6be81135-7d1 Product Training Engineer In this role, you will support the Altius business line by contributing your technical expertise and fostering strategic collaboration across various hardware and software development teams.

As a valued member of our team, you will play an integral role in developing and testing innovative solutions designed to address real-world challenges. This position underscores the importance of an iterative approach to our operations - design, test, field, and refine - thereby driving continuous improvement of our systems and services.

You will own various approaches to achieve product development goals through end-to-end product testing and the definition of operating procedures. You will interface with internal operations and product teams to identify and solve system or product reliability issues.

You will conduct end-to-end testing to establish robust QA matrices for our products. You will drive continuous improvement initiatives to enhance engineering processes and efficiency.

You will formulate and refine operational procedures and workflows. You will train colleagues on the procedures and workflows developed for new engineering capabilities.

You will provide valuable feedback to engineering teams to improve product functionality. You will lead new project initiatives, focusing on product operations and user maintenance planning.

You will architect and implement cutting-edge sustainment strategies for advanced defense systems. You will spearhead the development of comprehensive lifecycle support plans for Anduril's innovative products.

You will collaborate with cross-functional teams to optimize product design for long-term supportability.

Required qualifications include a strong aptitude for problem-solving in unstructured situations at the interface of hardware, software, and networking. You should have experience working on advanced hardware systems.

Familiarity with government testing protocols, compliance, and regulations is also required. Experience with Aircraft Reliability Engineering is preferred.

You must be eligible to obtain and maintain a U.S. Secret security clearance. Ability to travel as required for planning and testing (30%) is also necessary.

Preferred qualifications include technical writing experience developing standards, specifications, user guides, and policies. Proficiency in CAD software and product lifecycle management tools is also preferred.

Aircraft, spacecraft, or UAV operations experience, training experience, and prior military or government agency exposure/experience are also desirable.

The salary range for this role is $98,000-$129,000 USD.

]]> full-time mid onsite $98,000-$129,000 USD problem-solving, hardware systems, government testing protocols, compliance, regulations, Aircraft Reliability Engineering, U.S. Secret security clearance, technical writing, CAD software, product lifecycle management tools, Aircraft operations, spacecraft operations, UAV operations, training experience, prior military or government agency exposure/experience Engineering Technology Altius https://logos.yubhub.co/anduril.com.png Altius is a hardware and software development company focused on creating innovative solutions for real-world challenges. https://www.anduril.com/ https://job-boards.greenhouse.io/andurilindustries/jobs/5055326007 Atlanta, Georgia, United States 2026-04-18 4a6e477d-541 Reliability Engineer - Road Runner We are seeking an experienced Reliability Engineer to join our team in support of our Roadrunner product line. Roadrunner is a reusable, vertical take-off and landing (VTOL), operator-supervised Autonomous Air Vehicle (AAV) with twin turbojet engines and modular payload configurations that can support a variety of missions.

As a Reliability Engineer at Anduril, you will leverage your extensive knowledge of advanced Reliability tools and methodologies to drive excellence in our Reliability programs. You will support the Systems, Hardware, & Software Engineering teams across the product development lifecycle by guiding Failure Modes & Effects Analysis, predictive analysis, qualification test planning, field performance monitoring, and root cause & corrective action activities for your product(s).

The right person for this role has knowledge of design and development with a reliability focus on unmanned aircraft systems, as well as exposure to test, analysis, manufacturing, and continuous improvement in a production environment. If you are someone who has hands-on experience throughout the new product development life cycle from concept to customer delivery, loves to build world-class Reliability processes, can work efficiently across multi-disciplinary teams, and be accountable for results, then this role is for you.

Responsibilities:

Support and review system block diagrams, interface control documents, and schematics for compliance, redundancy, and accuracy against requirements and concept of operations.
Lead Failure Modes & Effects Analysis efforts and collaborate with cross-functional Engineering teams to implement product improvements for Reliability.
Perform predictive reliability analysis to calculate probability of loss of control, loss of asset, and MTTF/MTBF utilizing Anduril requirements, specifications, and relevant industry standards (e.g., MIL-HDBK-217Plus).
Develop Qualification Test plans to ensure appropriate requirements coverage, and work with Test Engineering to fixture, instrument, and execute test campaigns.
Develop Maintainability prediction models using MIL-HDBK-472, and support the development of Preventative Maintenance & Sparing plans.
Perform Weibull analysis utilizing data from Qualification campaigns and from fielded assets.
Work with Quality and Manufacturing Engineering to ensure inspection and acceptance test gates are in place and documented against requirements.
Use data analysis techniques to highlight key problem areas for deployed products, and drive root cause analysis & corrective action implementation to closure.
Develop and implement Reliability processes for the business to improve and streamline our Design, Manufacturing, and Deployment Operations efforts for rapid development.
Support Development Milestone Reviews through identification of appropriate entry and exit criteria, and work with Quality Engineering to guide the hand off process to production.

Required Qualifications:

Minimum of 5 years experience as a Reliability Engineer, Design/Development Engineer, or Test Engineer
B.S. Degree in Mechanical Engineering, Aerospace Engineering, Systems Engineering, or equivalent technical discipline
Experience with safety-critical hardware & software systems in the defense or aerospace industry, with a preference for experience with unmanned drone or aircraft systems
Experience or familiarity with MIL-STD-810, MIL-HDBK-217, MIL-STD-461, MIL-STD-516C, and MIL-STD-1629
Experience setting up a Reliability Engineering framework for a product or program, including support of production testing, field testing, data acquisition, and data analysis
Expertise in reliability analysis techniques, including FMEA/FMECA, predictive modeling, Weibull analysis, and Fault Tree Analysis
Experience with generating qualification testing requirements and executing tests to proactively characterize product behavior and performance
Failure investigation/analysis experience with a proven track record of solving problems and preventing recurrence
Design review experience (PDR, CDR, MRR) and confidence presenting to all levels of leadership
Eligibility to obtain/maintain a U.S. Secret clearance, as required

Preferred Qualifications:

M.S. Degree in Mechanical Engineering, Aerospace Engineering, Systems Engineering, or equivalent technical discipline
Experience with DO-254, DO-178, MIL-HDBK-217Plus, MIL-HDBK-472, ARP-4754
Experience with Manufacturing Readiness Level (MRL) Assessments, Manufacturing Review Board (MRB), Quality Inspection, Tooling, or Calibration
Experience with risk management, change control/change management reviews, and software/firmware HITL/SITL
Experience with HALT (Highly Accelerated Life Testing), HASS (Highly Accelerated Stress Screening), Environmental Testing, and/or Mechanical Testing
Technical writing experience developing requirements, standards, specifications, user guides, and policies

]]> full-time senior onsite $146,000-$193,000 USD Reliability Engineering, Failure Modes & Effects Analysis, Predictive Analysis, Qualification Test Planning, Field Performance Monitoring, Root Cause & Corrective Action, Reliability Processes, Design Development, Test Analysis, Manufacturing Continuous Improvement, Production Environment, Unmanned Aircraft Systems, Safety-Critical Hardware & Software Systems, MIL-STD-810, MIL-HDBK-217, MIL-STD-461, MIL-STD-516C, MIL-STD-1629, Reliability Analysis Techniques, FMEA/FMECA, Predictive Modeling, Weibull Analysis, Fault Tree Analysis, Qualification Testing Requirements, Test Execution, Product Behavior Characterization, Failure Investigation/Analysis, Design Review Experience, Presentation Skills, U.S. Secret Clearance, DO-254, DO-178, MIL-HDBK-217Plus, MIL-HDBK-472, ARP-4754, Manufacturing Readiness Level (MRL) Assessments, Manufacturing Review Board (MRB), Quality Inspection, Tooling, Calibration, Risk Management, Change Control/Change Management Reviews, Software/Firmware HITL/SITL, HALT (Highly Accelerated Life Testing), HASS (Highly Accelerated Stress Screening), Environmental Testing, Mechanical Testing, Technical Writing Engineering Anduril Industries https://logos.yubhub.co/anduril.com.png Anduril Industries is a defense technology company that designs, builds, and sells advanced military systems. https://www.anduril.com/ https://job-boards.greenhouse.io/andurilindustries/jobs/5022513007 Costa Mesa, California, United States 2026-04-18 67b4ccd7-51d Senior Software Engineer, Observability Insights Join CoreWeave's Observability team, where we are building the next-generation insights layer for AI systems.

Our team empowers internal and external users to understand, troubleshoot, and optimize complex AI workloads by transforming telemetry into actionable insights.

As a Senior Software Engineer on the Observability Insights team, you will lead the development of agentic interfaces and product experiences that sit atop CoreWeave's telemetry layer.

You'll design multi-tenant APIs, managed Grafana experiences, and MCP-based tool servers to help customers and internal teams interact with data in innovative ways.

Collaborating closely with PMs and engineering leadership, your work will shape the end-to-end observability experience and influence how people engage with cutting-edge AI infrastructure.

About the role

6+ years of experience in software or infrastructure engineering building production-grade backend systems and distributed APIs.

Strong focus on developer-facing infrastructure, with a customer-obsessed approach to SDKs, CLIs, and APIs.

Proficient in reliability engineering, including fault-tolerant design, SLOs, error budgets, and multi-tenant system resilience.

Familiar with observability systems such as ClickHouse, Loki, VictoriaMetrics, Prometheus, and Grafana.

Experienced in agentic applications or LLM-based features, including grounding, tool calling, and operational safety.

Comfortable writing production code primarily in Go, with the ability to integrate Python components when needed.

Collaborative experience in agile teams delivering end-to-end telemetry-to-insights pipelines.

Preferred

Experience operating Kubernetes clusters at scale, especially for AI workloads.

Hands-on experience with logging, tracing, and metrics platforms in production, with deep knowledge of cardinality, indexing, and query optimization.

Experienced in running distributed systems or API services at cloud scale, including event streaming and data pipeline management.

Familiarity with LLM frameworks, MCP, and agentic tooling (e.g., Langchain, AgentCore).

Why CoreWeave?

At CoreWeave, we work hard, have fun, and move fast!

We're in an exciting stage of hyper-growth that you will not want to miss out on.

We're not afraid of a little chaos, and we're constantly learning.

Our team cares deeply about how we build our product and how we work together, which is represented through our core values:

Be Curious at Your Core

Act Like an Owner

Empower Employees

Deliver Best-in-Class Client Experiences

Achieve More Together

We support and encourage an entrepreneurial outlook and independent thinking.

We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems.

As we get set for takeoff, the organization's growth opportunities are constantly expanding.

You will be surrounded by some of the best talent in the industry, who will want to learn from you, too.

Come join us!

]]> full-time senior hybrid $165,000 to $242,000 software engineering, infrastructure engineering, backend systems, distributed APIs, reliability engineering, fault-tolerant design, SLOs, error budgets, multi-tenant system resilience, observability systems, ClickHouse, Loki, VictoriaMetrics, Prometheus, Grafana, agentic applications, LLM-based features, grounding, tool calling, operational safety, Go, Python, Kubernetes, logging, tracing, metrics platforms, cardinality, indexing, query optimization, event streaming, data pipeline management, LLM frameworks, MCP, agent tooling, operating Kubernetes clusters Engineering Technology CoreWeave https://logos.yubhub.co/coreweave.com.png CoreWeave is a cloud computing company that provides a platform for building and scaling AI. https://www.coreweave.com https://job-boards.greenhouse.io/coreweave/jobs/4650163006 New York, NY / Sunnyvale, CA 2026-04-18 40d32156-365 Reliability Lead, Common Services As Reliability Lead, Common Services, you will establish and lead the Reliability Engineering and production operations practice for the Common Services organization. You'll partner closely with engineering leaders and teams across Common Services to define how we build, release, monitor, and operate critical services, raising the bar on reliability, availability, and operational excellence across the board.

In this role, you will:

Establish and lead the SRE / production engineering practice for the Common Services organization, including standards for reliability, incident management, and on-call, in partnership with the central Product Engineering organization.
Develop an Operational Excellence strategy that focuses not only on improving system performance but also on monitoring and reducing operational toil.
Partner with engineering and product teams to define SLOs, SLIs, and error budgets for critical Common Services, and ensure these become part of how teams plan and make tradeoffs.
Own and improve the incident management lifecycle for Common Services, including on-call rotations, escalation paths, incident tooling, post-incident reviews, and follow-through on corrective actions.
Drive the observability strategy (metrics, logs, traces, dashboards, alerts) for Common Services, ensuring we have actionable visibility into the health, performance, and capacity of key systems.
Collaborate with engineering leads to design and review architectures for reliability, scalability, resilience, and operability, including failure modes, redundancy, and graceful degradation.
Lead efforts to automate and harden operational workflows, including deployments, rollbacks, configuration management, change management, and routine maintenance tasks.
Build strong, trust-based relationships with partner teams and stakeholders, becoming a go-to leader for production readiness and operational risk within Common Services.
Hire, mentor, and develop SRE and production engineering talent, fostering a culture of continuous improvement, learning from incidents, and humane on-call.
Partner with other SRE and production engineering leaders across CoreWeave to align on global practices, tools, and reliability goals, representing the needs and constraints of Common Services.

You will be responsible for defining the reliability strategy, processes, and standards for the Common Services portfolio and driving consistent, high-quality operational practices across multiple teams.

The base salary range for this role is $206,000 to $303,000.

]]> full-time senior hybrid $206,000 to $303,000 Site Reliability Engineering, Production Engineering, Linux-based production environments, Containers, Orchestration technologies, Observability stacks, Alerting systems, SLIs/SLOs, Error budgets, Incident management, On-call rotations, Escalation paths, Post-incident reviews, Corrective actions, Automation tooling, Infrastructure-as-code, CI/CD pipelines, GPU workloads, High-performance computing, Latency/throughput-sensitive systems, Multi-tenant environments, Multi-region environments, Regulated environments, Service ownership models, Mentoring, Managing senior engineers Engineering Technology CoreWeave https://logos.yubhub.co/coreweave.com.png CoreWeave is a cloud computing company that provides a platform for AI development and deployment. https://www.coreweave.com https://job-boards.greenhouse.io/coreweave/jobs/4650165006 New York, NY / Sunnyvale, CA / Bellevue, WA 2026-04-18 0a2267d9-4e5 Senior Software Engineer, Reliability Experience We're looking for a Senior Software Engineer to join our Reliability Experience team. As a member of this team, you will be responsible for designing, developing, and maintaining opinionated UX across the Reliability Engineering ecosystem at Airbnb.

Our team charts the paved path that all platform, infra, and product engineers rely upon to effectively monitor, investigate, and debug system health across Airbnb's wide-ranging tech stack. We partner closely with the rest of Reliability Engineering and Infrastructure while serving all engineers as customers.

As a Senior Backend (or Fullstack) Engineer, you will partner with the Reliability, Platform, and Infrastructure teams and use your extensive knowledge of web technologies to lead and execute on building the paved path for Airbnb's current and future internal needs. Your primary objective will be to make it easier to understand what's happening in production and quickly triage bugs and outages.

Responsibilities:

Collaborate with the Reliability Experience, Incident Management, Observability, and Resiliency teams to design and develop high-quality UX.
Be an active contributor to your projects by creating high-quality, tested pull requests and reviewing others' designs and code.
Build appropriate tests to ensure the reliability and performance of the software you create.
Create and present your own design, product, and architecture documents and provide feedback on others.
Stay up-to-date with the latest industry trends, technologies, and best practices in Web development and performance engineering, particularly in the Reliability and Observability space.

Requirements:

5+ years of industry engineering experience
Experience building internal infrastructure, particularly in Data or Observability spaces (Prometheus is a plus)
Strong collaboration with colleagues across multiple timezones
Fluency in Java, Python, or another object-oriented language
Experience with airbnb.io/visx/ is preferred but not required
Experience with Grafana and similar solutions is preferred but not required
Deep experience understanding and solving engineering productivity pain points
Solid engineering and coding skills. Demonstrated knowledge of practical data structures and asynchronous programming
Strong communication and organizational skills
Ability to work in areas outside of your usual comfort zone and show motivation for personal growth without a dedicated product manager
Fluency in English (reading, writing, and speaking) is essential

]]> full-time senior remote Java, Python, Web development, Performance engineering, Reliability engineering, Observability, Data infrastructure, Prometheus, Grafana, Asynchronous programming, Data structures, airbnb.io/visx/ Engineering Technology Airbnb https://logos.yubhub.co/airbnb.com.png Airbnb is a global online marketplace for short-term vacation rentals. It has grown to over 5 million hosts who have welcomed over 2 billion guest arrivals. https://www.airbnb.com/ https://job-boards.greenhouse.io/airbnb/jobs/7756712 Brazil 2026-04-18 c3299844-c42 Senior Software Engineer Secure Every Identity, from AI to Human

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organisations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.

The Opportunity

The Migration Services team builds the critical, data-driven services that seamlessly move customers across environments in real-time. We are looking for a Senior Software Engineer who is passionate about crafting elegant solutions to complex distributed systems problems. You will be a key player in driving innovation, collaborating with architects and product managers to build and own the crucial infrastructure that underpins the Auth0 ecosystem. If you are excited by the prospect of making a massive impact, we want to hear from you!

What You'll Achieve

Build for scale. You will develop and operate highly scalable, data-intensive services, demonstrating code craftsmanship and an eye for detail.
Master the data stream. You'll leverage streaming technologies and implement advanced change data capture (CDC) strategies to ensure the secure, reliable, and efficient transfer of data.
Drive operational excellence. Through continuous monitoring and performance tuning, you will enhance the reliability of our migration processes and participate in our team's on-call rotation to ensure our services are always on.

What You'll Bring

Proven engineering background. With 3+ years of experience in fast-paced, agile environments, you have a proven track record of shipping high-quality software.
Database familiarity. You possess a strong understanding of database fundamentals and have hands-on experience with datastores like MongoDB and PostgreSQL.
Go is your go-to. You have strong proficiency in Golang or, optionally, in Node.js.
A passion for reliability. You have interest and experience in reliability engineering and are familiar with observability and incident management.
Collaborative skills. Your excellent written and verbal communication skills enable you to collaborate effectively with cross-functional and geo-dispersed teams.

Bonus Points

Experience with distributed streaming platforms like Kafka.
Familiarity with concepts in the IAM (Identity and Access Management) domain.
Experience with cloud providers (AWS, Azure) and container technologies such as Kubernetes and Docker.

#Hybrid

The Okta Experience

Supporting Your Well-Being
Driving Social Impact
Developing Talent and Fostering Connection + Community

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

]]> full-time senior hybrid Golang, MongoDB, PostgreSQL, Distributed systems, Reliability engineering, Observability, Incident management, Kafka, IAM, Cloud providers, Container technologies, Kubernetes, Docker Engineering Technology Okta https://logos.yubhub.co/okta.com.png Okta is a technology company that provides identity and access management solutions. https://www.okta.com/ https://job-boards.greenhouse.io/okta/jobs/7809897 Bengaluru, India 2026-04-18 da7679a6-e4f Senior Technical Operations Lead Job Title: Senior Technical Operations Lead

We are seeking an experienced Senior Technical Operations Lead to drive operational excellence across our Infrastructure Engineering organization.

As a Senior Technical Operations Lead, you will design and implement world-class operational processes, establish SRE best practices, and mentor technical teams to achieve exceptional reliability and efficiency.

Key Responsibilities:

SRE Leadership & Transformation

Lead the design and implementation of SRE practices and tooling across Infrastructure Engineering

Establish and cultivate an SRE-focused culture at ZoomInfo

Operational Process Design & Governance

Establish clear governance frameworks and procedural consistency

Make decisions about process exceptions and/or changes to accommodate different team contexts

Design and/or implement process automations using scripts and integrations

Define functional requirements and goals for process automations

Conduct hands-on and/or automated audits to ensure process adherence and identify improvement opportunities

Incident Management & Root Cause Analysis

Design, implement, and continuously improve Incident Management and Change Management procedures that scale across the organization, using tools such as PagerDuty, Slack, Jira, ServiceNow, and custom integrations

Lead and participate in root cause analysis sessions, driving teams toward systemic improvements rather than blame

Design and execute incident dry runs and tabletop exercises to build organizational resilience

Establish metrics and KPIs that measure incident response effectiveness and drive continuous improvement

Enable Data-Driven Decision Making

Identify, define, and automate the tracking of operational KPIs and departmental metrics that matter, enabling senior managers to make informed decisions on the basis of data

Build and maintain metric dashboards and automated reporting systems that provide real-time visibility into operational health

Analyze trends and surface opportunities for optimization

Stakeholder Engagement, Training & Mentorship

Build and maintain strong relationships with Engineering managers, Product Managers, and cross-functional stakeholders across geographies

Maintain a feedback loop. Meet with stakeholders to understand process pain points.

Influence others by fostering trust, leading by example, and inspiring them with your expertise and passion for reliability practices.

Enhance internal knowledge of third-party tools such as PagerDuty and Datadog by educating ZoomInfo employees on these tools.

Deliver training sessions that make Operational Excellence engaging and motivating for diverse audiences.

Required Experience & Qualifications:

Bachelor’s degree in Software Engineering, Operations Management, or related field

7+ years of hands-on experience in technical operations, Site Reliability Engineering (SRE), Incident Management, or IT Service Management roles within SaaS or technical organizations

Fluent English proficiency (written and verbal)

Proven track record designing and implementing operational processes at scale

Demonstrated expertise in SRE principles, practices, and tooling

Strong data analysis skills with ability to define metrics, build or design dashboards, and use data to drive strategic decisions

Proven ability to work effectively in a matrix organizational structure

Ability and experience working with senior management at global organizations

Hands-on experience with monitoring and observability tools such as PagerDuty and/or Datadog

Familiarity with Jira, Confluence, Google Data Studio, or Tableau

Experience with scripting and integrations (Python, JavaScript, Google AppScript, or similar)

Background in SRE transformation or organizational process improvement initiatives

#LI-SS4 #LI-Hybrid

]]> full-time senior hybrid Site Reliability Engineering (SRE), Technical Operations, Incident Management, IT Service Management, Monitoring and Observability Tools, Jira, Confluence, Google Data Studio, Tableau, Scripting and Integrations, Python, JavaScript, Google AppScript Engineering Technology ZoomInfo https://logos.yubhub.co/zoominfo.com.png ZoomInfo is a technology company that provides a go-to-market intelligence platform. It has over 35,000 customers worldwide. https://www.zoominfo.com/ https://job-boards.greenhouse.io/zoominfo/jobs/8451386002 Ra'anana, Israel 2026-04-18 1a4d732c-42c Principal Site Reliability Engineer - Observability We're looking for a Principal Site Reliability Engineer to join the Observability Solution team. As a key member of the team, you will collaborate with product management, product design, customers, and multiple teams across Elastic to define and evolve end-to-end InfraObs experiences. You will deliver and continually evolve these experiences leveraging the Elastic Platform capabilities and coding agents.

Key responsibilities include being a contact point for other teams within Elastic, fostering a culture of mutual respect, collaboration, and consensus-based decision-making, and being an awesome person to work with.

To be successful in this role, you will need an SRE background and experience operating large-scale production services with the help of Observability tools. You should be proficient in operating production infrastructure in K8s and at least one of the three major CSPs. You will also need to be able to use AI coding agents in the delivery workflow and have excellent verbal and written communication skills.

Bonus points will be given to those with experience as a user of the Elastic Stack.

]]> full-time senior remote Site Reliability Engineering, Observability tools, Kubernetes, Cloud Service Providers, AI coding agents, Elastic Stack, Product management, Product design, Collaboration, Communication Engineering Technology Elastic, the Search AI Company https://logos.yubhub.co/elastic.co.png Elastic provides a cloud-based platform for search, security, and observability, used by over 50% of the Fortune 500. https://www.elastic.co/ https://job-boards.greenhouse.io/elastic/jobs/7721575 Spain 2026-04-18

982dd81e-416 Principal Database Engineer, Data Engineering As a Principal Database Engineer, you'll design and lead the evolution of the PostgreSQL backbone that powers GitLab.com and thousands of self-managed enterprise deployments. You'll solve critical challenges around uncontrolled data growth, complex upgrades and migrations, and always-on reliability at global scale, creating the database patterns and platforms that keep GitLab fast, resilient, and cost efficient as usage grows.

You'll architect scalable, distributed database solutions, build proactive health and reliability frameworks, and drive adoption of modern database technologies and data stores that improve both product capabilities and production stability. Working hands-on in the codebase and partnering closely with product and infrastructure teams, you'll turn long-term database strategy into incremental, customer-visible improvements, shift incident response from reactive to proactive, and help define GitLab's next-generation data architecture, including sharding and multi-database support.

Key Responsibilities:

Lead the architecture and strategy for GitLab.com's PostgreSQL infrastructure, designing scalable, resilient solutions for both SaaS and self-managed deployments.

Build proactive database health and reliability frameworks using continuous monitoring, automated remediation, and predictive analytics to prevent customer-impacting incidents.

Drive database best practices across engineering by guiding schema design, migrations, and query optimization, and by creating self-service tools and guardrails for product teams.

Own end-to-end observability for database systems, designing symptom-based monitoring, leading incident response, and turning learnings into automated, repeatable workflows.

Shape the evolution of GitLab’s database platform by evaluating and implementing modern database technologies and data stores that improve reliability, performance, and product capabilities.

Design solutions and patterns that address uncontrolled data growth, cost efficiency, sharding, multi-database support, and other next-generation data architecture needs.

Collaborate closely with product and infrastructure teams to align product decisions with platform constraints and priorities, breaking down long-term goals into incremental, customer-visible outcomes.

Contribute directly to the codebase to prototype and ship working solutions, maintain technical credibility, and deep-dive into complex production issues when needed.

Requirements:

Experience architecting, operating, and optimizing PostgreSQL in large-scale, distributed production environments with high availability and disaster recovery requirements.

Deep knowledge of PostgreSQL internals, including the query planner, write-ahead logging, vacuum processes, and storage engine behavior.

Background designing and maintaining highly distributed database platforms with automated failover, robust monitoring, and self-healing capabilities.

Hands-on coding skills and comfort working across the stack, from low-level database and search systems to backend and frontend services.

Familiarity with infrastructure-as-code, GitOps practices, security hardening, and site reliability engineering principles applied to database operations.

Ability to debug complex, cross-system issues, translate findings into durable technical solutions, and turn incident learnings into repeatable automation.

Experience influencing technical direction across multiple teams, providing practical guidance on migrations, query optimization, and database best practices.

Openness to collaborating with people from diverse technical backgrounds, with a focus on clear communication, shared ownership, and learning transferable skills.

]]> full-time staff remote $157,900-$338,400 USD PostgreSQL, database architecture, data engineering, infrastructure-as-code, GitOps, security hardening, site reliability engineering, database operations, query optimization, schema design, migrations, query planning, write-ahead logging, vacuum processes, storage engine behavior Engineering Technology GitLab https://logos.yubhub.co/about.gitlab.com.png GitLab is a software development platform that provides tools for version control, issue tracking, and project management. It has over 50 million registered users and is trusted by more than 50% of the Fortune 100. https://about.gitlab.com/ https://job-boards.greenhouse.io/gitlab/jobs/8231379002 Remote, EMEA; Remote, North America 2026-04-18

777a6e79-5d9 Senior Software Engineer, Security Engineering

Secure Every Identity

Okta secures AI by building the trusted, neutral infrastructure that enables organisations to safely embrace this new era.

We are looking for builders and owners who operate with speed and urgency and execute with excellence. This is an opportunity to do career-defining work.

The Role

We seek a knowledgeable and development-focused Security Engineer who will build microservices to secure Customer Identity Products and Infrastructure.

Responsibilities:

Work across a globally distributed product-aligned team of security engineers.
Establish a deep understanding of Okta Customer Identity products and infrastructure.
Collaborate when necessary with the Okta Security team on security operations.
Build, deploy & maintain scalable and reliable infrastructure services as well as security solutions for customer identity products.
Build, deploy & maintain automation to improve platform security capabilities at scale including logging, threat detection and compliance benchmarks to increase our security posture.
Help meet our operational security commitments by thinking like an attacker, assessing the risk, and advising on mitigation strategies.
Support security investigations in coordination with the Okta Security team, participate in root cause analysis and perform necessary remediations.
Support stakeholders by proposing mitigation strategies for end-of-life software and security vulnerability and patch management.

Requirements:

You have 3+ years of hands-on development experience writing microservices with Golang.
You have 3+ years of experience in cloud infrastructure security and product security.
You have working knowledge and hands-on development experience with one or more of the following: AWS and/or Azure security, Kubernetes.
You have strong knowledge of the OWASP Top 10 and secure coding best practices.
You have a strong foundation in secure software development lifecycle best practices.
You have strong written and verbal communication skills.
You have experience working with a globally distributed and remote team.

Bonus points if:

You have working knowledge and experience with one or more of the following:
Full-stack engineering
Site reliability engineering
Identity and access management
Vulnerability and threat management
Security detection and response
Governance, risk and compliance

]]> full-time senior hybrid Golang, Cloud infrastructure security, Product security, AWS security, Azure security, Kubernetes, OWASP Top 10, Secure coding best practices, Secure software development lifecycle best practices, Full-stack engineering, Site reliability engineering, Identity and access management, Vulnerability and threat management, Security detection and response, Governance, risk and compliance Engineering Technology Okta https://logos.yubhub.co/okta.com.png Okta is a company that provides identity and access management solutions. It has a global presence with over 20 offices worldwide. https://www.okta.com https://job-boards.greenhouse.io/okta/jobs/7744352 Bengaluru, India 2026-04-18

9238107d-204 Software Architect, Reliability Engineering Join the team as Twilio's next Reliability Architect.

As an Architect in SRE, you will drive the technical strategy, vision and outcomes for Twilio's Reliability Engineering organisation. You will define and lead solutions and initiatives that ensure Twilio products are reliable worldwide, and you will define standards and guide engineering teams on best practices for designing, building, and operating resilient systems.

This role is pivotal to Twilio's commitment to operational excellence, scalability, and pragmatic, large-scale systems design in the cloud.

Responsibilities:

Partner with senior technical leaders across Twilio to set and communicate the reliability strategy, translating business goals into measurable outcomes.
Influence company-wide architectural decisions while balancing long-term vision with near-term and compliance needs.
Lead the design, implementation, and operation of scalable solutions and paved roads that enable reliable, high-traffic services.
Influence company-wide architectural decisions to focus on availability, performance, resilience, and cost efficiency using Kubernetes, AWS, Terraform, and modern observability.
Ensure integrity and quality across the service lifecycle; design fault-tolerant architectures, incident response, disaster recovery, and capacity/cost management.
Collaborate with product and cross-functional teams to identify reliability risks and convert them into actionable designs, programs, and tooling.
Establish and champion reliability practices and drive systemic improvements.
Mentor and grow engineers and technical leaders.
Track and apply emerging SRE, cloud, and large-scale systems best practices; introduce pragmatic innovations that improve reliability at scale.

Qualifications:

15+ years of experience in Reliability Engineering, Software Engineering, or DevOps roles with a focus on infrastructure, backend systems, and reliability, including time as a principal engineer or architect.
Strong experience in driving strategic technical decisions and defining long-term technical vision.
In-depth understanding of the role of Reliability Engineering in a large and diverse SaaS organisation.
Experience driving cross-org technical architecture outcomes.
Knowledge of cloud architecture, devops practices, and large-scale systems design with microservices.
Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience).
Strong production experience, including operational management, scaling, partitioning strategies, and tuning for performance and reliability in high-scale environments.
Hands-on experience with Kubernetes (e.g., EKS), deploying and managing stateful services, and cloud services like AWS.
Proficiency in infrastructure-as-code tools such as Terraform or CloudFormation for automating infrastructure.
Expertise in observability tools (e.g., Prometheus, Grafana, Datadog) for monitoring distributed systems and setting up alerting.
Proficient in at least one programming language (e.g., Go, Python, Java) for building automation and tooling.
Experience designing incident response processes, SLOs/SLIs, runbooks, and participating in on-call rotations.
Experience running cross-functional post-incident reviews and driving improvements.
Strong understanding of distributed systems principles, including consensus, durability, throughput, and availability tradeoffs.
Proven track record of leading reliability improvements in data-intensive or mission-critical systems and collaborating with engineering teams.
Excellent problem-solving, analytical, verbal, and written communication skills, with the ability to work in cross-functional and distributed environments.
Demonstrated leadership in mentoring teams, influencing decisions, and balancing long-term objectives with short-term needs.
Ability to influence and build effective working relationships with all levels of the organisation.

Desired:

Specific experience owning and operating large AWS footprints.
Knowledge of Kubernetes architecture and concepts.
Experience with data technologies like Apache Kafka, AWS MSK, or similar for reliable streaming.
Passion for building reliable products, with prior projects in high-availability systems.

]]> full-time senior remote $227,840.00 - $284,800.00 per year Reliability Engineering, Software Engineering, DevOps, Cloud Architecture, Microservices, Kubernetes, AWS, Terraform, Observability Tools, Programming Languages, Incident Response, Distributed Systems Principles, Apache Kafka, AWS MSK, Kubernetes Architecture, Data Technologies Engineering Technology Twilio https://logos.yubhub.co/twilio.com.png Twilio is a communications platform that provides cloud communication APIs for building, scaling, and operating real-time communication and collaboration applications. https://www.twilio.com/ https://job-boards.greenhouse.io/twilio/jobs/7658259 Remote - US 2026-04-18

e53014e6-57c Data Center Engineer, Resource Efficiency – Compute Supply As a Power & Resource Efficiency Engineer, you'll sit at the intersection of IT and facilities, building the systems, models, and control loops that optimize how we allocate and consume power, cooling, and physical capacity across our TPU/GPU fleet.

Key responsibilities include:

Building models that forecast consumption across electrical and mechanical subsystems, informing capacity planning, energy procurement, oversubscription targets and risks, including statistical modeling of cluster utilization, workload profiles, and failure modes.

Designing IT/OT interfaces that bridge compute orchestration with facility controls, enabling real-time telemetry across accelerator hardware, power distribution, cooling, and schedulers.

Building and operating load management systems that use power and cooling topology to enable load management and power/thermal-aware placement to maximize throughput while meeting SLOs.

Partnering with data center providers to drive design optimizations and hold them accountable to SLA-grade performance standards, providing technical diligence on partner architectures.

In this role, you'll need to have deep knowledge of data center power distribution and cooling architectures, and how they interact with IT load profiles. Experience with reliability engineering, SLA development, and failure-mode analysis is also essential.

Additionally, proficiency in statistical modeling and simulation for infrastructure capacity or power utilization, familiarity with SCADA/BMS/EPMS, telemetry pipelines, and control systems, and exposure to accelerator deployments and their power management interfaces are highly desirable.

This is a challenging and rewarding role that requires a unique blend of technical expertise, business acumen, and collaboration skills. If you're passionate about data center infrastructure, AI, and sustainability, we encourage you to apply.

]]> full-time senior hybrid $320,000-$405,000 USD data center power distribution and cooling architectures, _SYSTEMS, reliability engineering, SLA development, failure-mode analysis, statistical modeling and simulation, SCADA/BMS/EPMS, telemetry pipelines, control systems, accelerator deployments, power management interfaces, Python, similar languages, control theory, dynamical systems, cyber-physical systems design, energy storage, microgrid integration, demand response, behind-the-meter generation Engineering Technology Anthropic https://logos.yubhub.co/anthropic.com.png Anthropic creates reliable, interpretable, and steerable AI systems. It operates at massive scale, with a focus on extracting maximum compute throughput from every watt. https://www.anthropic.com/ https://job-boards.greenhouse.io/anthropic/jobs/5159642008 Remote-Friendly, United States 2026-04-18

c72e1616-491 Staff Hardware Reliability Engineer As a Hardware Reliability Engineer at Shield AI, you will be responsible for ensuring the robustness and long-term performance of our VBAT flight hardware. You'll work closely with design, manufacturing, and supply chain teams to implement design-for-reliability best practices and perform reliability verification from concept through production.

You will lead environmental and stress testing efforts, including temperature cycling, vibration, HALT, and HASS, conduct failure analysis and materials characterization, and analyze root cause investigations for manufacturing non-conformances and field returns. You'll participate in design reviews and FMEA activities, shape material selection and manufacturing requirements, analyze test and field data using reliability modeling tools, and help develop corrective actions and process improvements that elevate hardware reliability across the program.

Responsibilities:

Implement design-for-reliability best practices, conduct rigorous testing, shape manufacturing requirements, select materials, and analyze field data to enhance the robustness of VBAT hardware.
Perform stress screening and environmental testing, and drive failure analysis to ensure flight hardware meets reliability and performance targets.
Analyze designs and test results to identify potential failure modes and mitigations.
Collaborate with design engineers to implement design-for-reliability best practices early in design.
Act as a key stakeholder in reviewing and approving designs for release.
Participate in design reviews and failure mode effects analysis (FMEA) to assess potential reliability issues.
Investigate manufacturing non-conformances and field hardware failures to determine root cause.
Travel as needed to perform deep dives into supplier processes.
Develop and recommend corrective actions to address identified reliability issues.
Utilize reliability modeling and simulation tools to predict system performance and lifespan.
Stay current with industry trends, advancements, and best practices in hardware reliability engineering.
Propose and implement process improvements that raise reliability across the program.

]]> full-time staff onsite $158,542 - $237,812 a year Materials science, Electronics manufacturing processes (PCB fabrication and assembly), Hardware reliability concepts, Environmental test practices, Python, NumPy, Pandas, SciPy, Plotly, Matplotlib, IPC, JEDEC, AIAA, AEC, MIL, SMC standards, Master's degree in Materials Engineering, 3+ years of experience in hardware reliability engineering, Failure analysis techniques and materials characterization methods, Environmental testing, including temperature cycling, vibration, highly accelerated limit testing (HALT), and highly-accelerated stress screening (HASS), PCB fabrication, SMT, and polymerics application manufacturing processes, Significant knowledge of reliability engineering principles, methods, and tools Engineering Technology Shield AI https://logos.yubhub.co/shield.ai.png Shield AI is a venture-backed deep-tech company founded in 2015 with a mission to protect service members and civilians with intelligent systems. https://www.shield.ai https://jobs.lever.co/shieldai/88a4633a-d0b1-4025-b3ff-cb4c976fadc9 Dallas, Texas / Boston, MA 2026-04-17

7f43bb14-3c4 Senior Cloud Engineer Shield AI is seeking a Senior Cloud Engineer to support its leadership in applied artificial intelligence development. In this role, you will be responsible for engineering, deploying, provisioning, and managing critical cloud systems that drive innovation across Shield AI's public and private cloud environments, both domestically and internationally.

As part of the Cloud and Infrastructure team within Enterprise Operations, you will play a key role in ensuring the performance, scalability, and reliability of these systems to support various business units. This position may involve occasional travel to Shield AI locations.

Responsibilities:

Engineering:

Manage and optimize multi-cloud infrastructure (Azure, AWS) for performance, reliability, and scalability.
Support and optimize cloud and virtual machine environments, assisting with capacity planning, performance monitoring, security compliance, and vulnerability remediation.
Assist in implementing and maintaining infrastructure systems, including servers, storage, backup solutions, and disaster recovery processes, for both public and private clouds.
Continuously learn and adapt to emerging technologies and platforms, leveraging automation wherever possible.
Author and produce the necessary documentation for engineered and maintained systems along with associated processes that supporting teams can leverage.
Assist in researching, recommending, and developing innovative solutions for complex requirements and issue resolution.
Collaborate cross-functionally with AI, DevOps, and Security teams to ensure compliance, observability, and resilience in mission-critical environments.
Work within Agile methodologies and apply sound engineering principles.

Operations and Support:

Perform daily system monitoring, verifying the integrity and availability of all server resources, systems and key processes, reviewing system and application logs.
Support system maintenance and upgrades, including OS patching, software configuration, hardware updates, and performance tuning to ensure optimal cloud infrastructure performance.
Provide escalated support for operational issues possibly during and after normal business hours for systems, workloads, and Kubernetes AI infrastructure.
Analyze, troubleshoot and resolve system infrastructure and software issues.
Be able to participate in on-call, emergency, or maintenance roles.

Requirements:

Bachelor’s degree in Computer Science or related field, or equivalent experience (4+ years) plus an engineer level certification, Azure/AWS Associate, or another similar level certification.
4 years’ experience supporting applications and systems in a production environment; experience in high-availability, mission-critical, or defense-grade environments preferred.
Comfortable with operational efficiencies utilizing Infrastructure as Code (IaC) solutions (e.g., Terraform, Ansible).
Strong understanding of networking concepts (VPCs, VPNs, subnets, routing, firewalls).
Experience in automating repetitive tasks using scripting languages such as PowerShell, Python, or Bash.
Experience with deployment and systems administration of at least one Linux distribution (e.g., RHEL, Ubuntu).
Experience with concepts of Microsoft Windows Server administration, Azure and Active Directory environments.
Possesses organizational skills, with a process-oriented mindset, attention to detail, and effective verbal and written communication abilities.
Ability to work independently to accomplish assigned tasks.
Solution-oriented, constructive approach to problem-solving.

Preferred Qualifications:

Experience deploying and maintaining workloads in Azure public cloud environments.
Hands-on experience with containerization and Kubernetes-based workloads.
Strong understanding of virtualization and private cloud platforms (e.g., VMware, Hyper-V, KVM).
Background in DevOps, Site Reliability Engineering (SRE), or cloud infrastructure roles.
Proficiency with configuration management and automation tools (e.g., Ansible, Chef, Puppet, Terraform).
Experience building and optimizing CI/CD pipelines.

Salary and Benefits:

$110,000 - $170,000 a year
Full-time regular employee offer package: Pay within range listed + Bonus + Benefits + Equity
Temporary employee offer package: Pay within range listed above + temporary benefits package (applicable after 60 days of employment)

]]> full-time senior onsite $110,000 - $170,000 a year Cloud Engineering, Multi-cloud infrastructure, Azure, AWS, Networking concepts, Infrastructure as Code, Scripting languages, Linux distribution, Microsoft Windows Server administration, Active Directory environments, Containerization, Kubernetes-based workloads, Virtualization, Private cloud platforms, DevOps, Site Reliability Engineering, Configuration management, Automation tools, CI/CD pipelines Engineering Technology Shield AI https://logos.yubhub.co/shield.ai.png Shield AI is a venture-backed deep-tech company founded in 2015, developing intelligent systems for military and civilian use. https://www.shield.ai https://jobs.lever.co/shieldai/702e2609-db48-49ab-8bec-d405c956a6ce San Diego, California / Dallas, Texas / San Francisco, California 2026-04-17

82dbc383-af5 Marine Compliance Engineer We are seeking a talented Marine Compliance Engineer to support the design, analysis, and development of autonomous surface vessels. This role will provide hands-on experience in traditional ship design and advanced autonomous technologies.

Key responsibilities include advising the engineering team on applicable maritime standards and regulations, supporting reliability engineering on requirements definition and tracking, and participating in systems engineering activities.

The ideal candidate will have a Bachelor's degree in Naval Architecture, Marine Engineering, or a related field, and 5+ years of engineering experience in designing to or verifying class society rules.

Additional qualifications include strong fundamentals in core marine design principles, experience working on various vessel sizes and types, and expertise with marine standards and regulations.

In this role, you will also be responsible for performing internal audits for compliance against maritime standards, tracking and adjudicating comments received from class society reviews, and maintaining up-to-date technical documentation.

If you are a motivated and detail-oriented engineer with a passion for maritime compliance, we encourage you to apply.

]]> full-time mid onsite Naval Architecture, Marine Engineering, Autonomous Technology, Systems Engineering, Reliability Engineering, Marine Standards and Regulations, Class Society Rules, Technical Documentation Engineering Technology Saronic Technologies https://logos.yubhub.co/saronic-tech.com.png Saronic Technologies develops state-of-the-art solutions for autonomous and intelligent maritime operations. https://www.saronic-tech.com/ https://jobs.lever.co/saronic/da3961bf-bb32-430c-b6ac-ab7beae07123 London 2026-04-17

5532a7a5-18c Electrical Engineer We are seeking an experienced Electrical Engineer with strong PCB schematic and layout expertise to join our engineering team. In this role, you will take ownership of fast-paced board-level design from concept through production, working on high-reliability systems for demanding environments.

You will collaborate cross-functionally with other electrical, firmware/software, mechanical, and manufacturing/quality teams to design, validate, and deliver robust electronic solutions.

Key Responsibilities:

Work primarily on digital and power-based designs, with analog design a possibility.
Own the entire PCB design lifecycle, including requirements, major component selection, block diagrams, schematic, circuit analysis, placement, routing, DFA/DFM review, prototype testing, and documentation.
Create and maintain BOMs, design files, netlists, and manufacturing documentation.
Work closely with mechanical engineers to integrate PCBs into enclosures and assemblies.
Lead or support prototype bring-up, debugging, and testing with support from the firmware/software teams (Capability to write firmware/software for bring-up is a plus).
Perform signal integrity, power budgeting, and thermal analysis as needed.
Review and optimize designs for EMC/EMI compliance and environmental robustness.
Interface with fabrication and assembly vendors, addressing build issues and driving quality improvements.
Participate in design reviews and peer reviews across teams to ensure high standards.
Contribute to and improve engineering processes, libraries, and design guidelines.
Support ITAR/export compliance and follow relevant standards such as MIL-STDs, IPC-610/6012, etc.

Required Qualifications:

Bachelor’s degree in Electrical Engineering or related field.
3+ years of professional experience in PCB schematic and layout design.
Proficiency with Altium Designer (or other equivalent EDA tools).
Experience with multi-layer board design and high-speed or mixed-signal circuitry.
Strong understanding of electronic components, circuit theory, and board-level design principles.
Ability to interpret datasheets, standards, and mechanical constraints.
Proven experience with design for manufacturing (DFM) and design for test (DFT).

Preferred Qualifications:

Experience in defense, aerospace, or other safety/mission-critical industries.
Familiarity with ruggedized systems, high-temperature settings, water-intrusion protection, and other environmental requirements.
Knowledge of MIL-STD, IPC standards, or similar frameworks.
Experience with SPICE simulations, Signal Integrity Analysis, or reliability engineering.
Ability to work with firmware, FPGA, or embedded teams.
Active or previous security clearance (or ability to obtain one).
Ownership mentality and attention to detail.
Ability to work both independently and in multidisciplinary teams.
Strong organizational and time management skills.
Comfortable working in a fast-paced, product-focused environment.

Benefits:

Medical Insurance: Comprehensive health insurance plans covering a range of services
Dental and Vision Insurance: Coverage for routine dental check-ups, orthodontics, and vision care
Saronic pays 100% of the premium for employees and 80% for dependents
Time Off: Generous PTO and Holidays
Parental Leave: Paid maternity and paternity leave to support new parents
Competitive Salary: Industry-standard salaries with opportunities for performance-based bonuses
Retirement Plan: 401(k) plan
Stock Options: Equity options to give employees a stake in the company’s success
Life and Disability Insurance: Basic life insurance and short- and long-term disability coverage
Additional Perks: Free lunch benefit and unlimited free drinks and snacks in the office

Physical Demands:

Prolonged periods of sitting at a desk and working on a computer.
Occasional standing and walking within the office.
Manual dexterity to operate a computer keyboard, mouse, and other office equipment.
Visual acuity to read screens, documents, and reports.
Occasional reaching, bending, or stooping to access file drawers, cabinets, or office supplies.
Lifting and carrying items up to 20 pounds occasionally (e.g., office supplies, packages).

Additional Information:

This role requires access to export-controlled information or items that require “U.S. Person” status. As defined by U.S. law, individuals who are any one of the following are considered to be a “U.S. Person”: (1) U.S. citizens, (2) legal permanent residents (a.k.a. green card holders), and (3) certain protected classes of asylees and refugees, as defined in 8 U.S.C. 1324b(a)(3).

]]> full-time mid onsite Altium Designer, PCB schematic and layout design, Multi-layer board design, High-speed or mixed-signal circuitry, Electronic components, Circuit theory, Board-level design principles, Design for manufacturing (DFM), Design for test (DFT), Defense, Aerospace, Ruggedized systems, High-temperature settings, Water-intrusion protection, MIL-STD, IPC standards, SPICE simulations, Signal Integrity Analysis, Reliability engineering, Firmware, FPGA, Embedded teams Engineering Technology Saronic Technologies https://logos.yubhub.co/saronictechnologies.com.png Saronic Technologies develops state-of-the-art solutions for maritime operations through autonomous and intelligent platforms. https://www.saronictechnologies.com/ https://jobs.lever.co/saronic/7f7a08af-f20e-4091-84df-be2d50ab5bc3 San Francisco 2026-04-17

68a06d9f-c72 Director of Shipyard Facilities Operations & Asset Reliability The Director of Shipyard Facilities Operations & Asset Reliability owns infrastructure uptime, asset lifecycle performance, and execution of capital programs across the shipyard. This role ensures that critical production-enabling assets operate reliably while delivering capital projects and yard expansions safely, on schedule, and within approved budgets.

Responsibilities:

Own reliability and availability of all production-critical assets, including cranes, hoists, and heavy lift systems; shiplifts, docks, transfer systems, and waterfront infrastructure; electrical power distribution, water, compressed air, and utilities; and buildings, foundations, pavements, and fixed facilities.
Establish and enforce a disciplined preventive and predictive maintenance strategy to prevent unplanned outages.
Ensure corrective maintenance is prioritized, executed safely, and escalated appropriately when scope exceeds routine maintenance.
Partner with production leadership to align maintenance windows with operational demand.
Implement reliability metrics and use data to drive decisions.
Ensure new assets are onboarded with defined maintenance plans, documentation, and ownership.
Oversee asset condition assessments, lifecycle planning, and replacement strategies.
Own asset inventory accuracy, labeling, auditing, impairment review, and disposition decisions in coordination with Finance.
Capital Programs & Yard Expansion:
Own end-to-end execution of capital programs, including yard expansion and facility build-out; major equipment purchases and infrastructure upgrades; and utility capacity expansions and site development.
Lead annual capital planning and prioritization aligned with production forecasts and long-term strategy.
Ensure projects are scoped, budgeted, approved, and executed in accordance with governance and delegation of authority.
Establish clear project controls for cost, schedule, scope, and risk.
Ensure projects transition cleanly from execution into operations with defined ownership and maintenance plans.
Coordinate closely with civil engineers, designers, contractors, and regulators to deliver compliant facilities.
Interface with permitting authorities and ensure capital work complies with environmental, safety, and building requirements.
Financial & Program Controls:
Own capital and facilities budgets, including forecasting, tracking, and variance management.
Ensure appropriate classification and governance of maintenance, OPEX projects, and CAPEX investments.
Review estimates, commitments, and EACs to prevent cost overruns.
Drive value by balancing repair vs replacement decisions using lifecycle cost analysis.
Ensure timely closeout of projects and accurate capitalization of assets.
Cross-Functional Integration:
Partner with Manufacturing, Engineering, EHS, and Quality to ensure infrastructure supports safe, efficient execution.
Align asset reliability priorities with throughput, schedule, and vessel delivery commitments.
Serve as escalation point when infrastructure risk threatens production continuity.
Team Leadership:
Lead facilities, maintenance, reliability, and capital project teams.
Set clear expectations for execution discipline, technical rigor, and accountability.
Develop internal capability in maintenance planning, reliability engineering, and project management.

Qualifications:

Bachelor’s degree in Engineering, Facilities Management, Construction Management, or related technical discipline.
10+ years of experience in heavy industrial, shipyard, manufacturing, or infrastructure-intensive environments.
5+ years leading facilities, maintenance, reliability, or capital programs.
Demonstrated experience owning uptime of critical industrial assets.
Proven track record delivering capital projects on time and on budget.
Strong understanding of preventive, corrective, and predictive maintenance systems; asset lifecycle management and reliability principles; and capital project controls and financial governance.
Experience interfacing with engineering, finance, contractors, and regulators.
Ability to balance short-term operational risk with long-term capital strategy.

Preferred Qualifications:
Experience supporting shipyard, port, or waterfront industrial operations.
Background in reliability engineering or asset management programs.
Experience with large mobile cranes, ship lifts, docks, or heavy industrial utilities.
Familiarity with environmental, building, and infrastructure permitting.
Lean, Six Sigma, or continuous improvement experience.
PMP or equivalent project management certification.

]]> full-time senior onsite preventive maintenance, corrective maintenance, predictive maintenance, asset lifecycle management, reliability engineering, capital project controls, financial governance, project management, lean, six sigma, continuous improvement, large mobile cranes, ship lifts, docks, heavy industrial utilities, environmental permitting, building permitting, infrastructure permitting Engineering Manufacturing Saronic Technologies https://logos.yubhub.co/saronictechnologies.com.png Saronic Technologies develops state-of-the-art solutions for maritime operations through autonomous and intelligent platforms. https://www.saronictechnologies.com/ https://jobs.lever.co/saronic/94c9cd99-7b77-4cb8-8dbb-d993eb8f8bb1 Piraeus 2026-04-17 e308ff1b-d8b Software Engineer, DevOps, Research Platform

About Mistral AI

At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.

We are a team passionate about AI and its potential to transform society. Our diverse workforce thrives in competitive environments and is committed to driving innovation.

Role Summary

We are seeking a talented and experienced software engineer to join our Research Platform team. You'll work closely with our R&D team to build a cloud agnostic platform that improves the stability, scalability and velocity across the research department.

Responsibilities

As a DevOps/Platform Engineer, your responsibilities will include:

* Designing and implementing complex systems (e.g. scaling our research CI with a strong focus on reliability, reproducibility and speed).
* Building a flexible yet solid and accessible development environment for researchers, so they can focus on their core mission.
* Designing, implementing and advocating for solutions addressing large amounts of data and maintainable data pipelines.
* Optimizing a variety of builds: container images, large libraries compilation times, python environments...
* Building strong relationships with researchers, understanding their workflow and enabling them to achieve more by leveraging your expertise.
* Communicating and producing documentation or any content that will help them to make the most out of the tools and systems you'll build.
* Being part of the team that "platformizes" research and constantly improves the daily experience for researchers while avoiding future roadblocks.

About You

* 5+ years of successful experience in a similar DX / DevOps / SRE role.
* Proficiency in software development (Python, Go...) and programming best practices.
* Exposure to site reliability engineering: root cause analysis, in-production troubleshooting, on-call rotations...
* Exposure to infrastructure management: CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability...
* Technical product mindset (e.g. understanding how to debug poor adoption).
* Excellent problem-solving and communication skills (ability to contextualize, gauge risks and get buy-in for high-stakes and impactful solutions).
* Ownership, high agency and constantly seeking to learn and improve things for others.
* Autonomous, self-driven and able to work well in a fast-paced startup environment.
* Low ego and team spirit mindset.

Your Application Will Be All The More Interesting If You Also Have:

* First hand Bazel (or equivalent) experience.
* Strong knowledge of Python's ecosystem.
* Familiarity with GPU based workloads and ecosystems.
* Experience of full remote environments (you're comfortable with having some of your users on the other side of the globe).

Hiring Process

* Intro Call - 30 min
* Tech Culture Interview - 30 min
* Technical Rounds - 2 x 45 min
* Culture-fit Discussion - 30 min
* Reference Calls

By Applying, You Agree To Our Applicant Privacy Policy.

Additional Information

Location & Remote

This role is primarily based at one of our European offices (Paris, France and London, UK). We will prioritize candidates who either reside there or are open to relocating. We strongly believe in the value of in-person collaboration to foster strong relationships and seamless communication within our team. In certain specific situations, we will also consider remote candidates based in one of the countries listed in this job posting, currently France & UK. In that case, we ask all new hires to visit our local office:

* for the first week of their onboarding (accommodation and travelling covered)
* then at least 3 days per month

What We Offer

* Competitive salary and equity
* Health insurance
* Transportation allowance
* Sport allowance
* Meal vouchers
* Private pension plan
* Generous parental leave policy
* Visa sponsorship

]]> full-time senior remote software development, python, go, site reliability engineering, infrastructure management, CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability, bazel, python's ecosystem, gpu based workloads, full remote environments Engineering Technology Mistral AI https://logos.yubhub.co/mistral.ai.png Mistral AI develops high-performance, open-source AI models and products for enterprise use. The company has various locations worldwide. https://mistral.ai https://jobs.lever.co/mistral/18be2b70-c05d-48e4-82ac-e5cb462c96c0 Paris 2026-04-17 e4891ce8-465 Software Engineer, DevEx We are seeking an experienced Software Engineer, Developer Experience to own and foster a collaborative, automated, and efficient software development lifecycle. In this role, you will collaborate closely with product engineering teams to ensure consistent code health, accelerate development velocity through well-maintained CI pipelines, faster builds, and secure release processes.

Your mission is to empower our software engineering team with seamless workflows while securing our production environments.

Responsibilities:

Build, monitor, and enhance CI/CD pipelines to streamline development workflows and accelerate deployments.
Design, operate, and maintain scalable, reliable, and secure multi-cloud infrastructures.
Identify areas for improvement and create innovative solutions that enable high developer velocity.

Team Collaboration & Advocacy:

Standardize DevOps practices to ensure consistency across all engineering teams.
Establish measurable KPIs for security performance, reliability, and compliance adherence.
Partner with development and operations teams to embed security into daily workflows.
Lead training initiatives to upskill teams on secure coding, threat modeling, and incident response.
Champion a security-first mindset, driving cultural adoption of DevSecOps principles across the organization.

About you:

5+ years of successful experience in a similar role (DevOps, Developer Experience, Platform Engineer, Internal tooling engineer, SRE...).
Strong proficiency in scripting languages (Go, Python...) and software development best practices.
Developer experience engineering: developer workflow optimization, tooling, and automation for productivity, real-time developer support, and escalation paths.
Site Reliability Engineering: CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability...
Exposure to multi-cloud infrastructures (AWS / GCP / Azure or On-Prem).
Security Tools & Approaches: OWASP, SAST, DAST, SCA, vulnerability scanners.
Proven problem-solving and communication skills: ability to contextualize, gauge risks, and get buy-in for high-stakes and impactful solutions.
Ownership, high agency, and desire to improve things for others.
Autonomy, self-drive, and ability to work well in a fast-paced startup environment.
Low ego and team spirit mindset.

]]> full-time senior onsite scripting languages (Go, Python...), software development best practices, developer experience engineering, site reliability engineering, multi-cloud infrastructures, security tools & approaches Engineering Technology Mistral AI https://logos.yubhub.co/mistral.ai.png Mistral AI develops and provides high-performance, open-source AI models, products, and solutions for enterprise use. https://mistral.ai/ https://jobs.lever.co/mistral/c9e16eb0-0cb9-423d-8495-a96d10782622 Paris 2026-04-17 26bff84c-def Senior/Staff Platform Engineer/SRE About the Role We are seeking a Senior Platform Engineer who will design, develop, and deploy robust platform solutions to ensure the reliability, scalability, and security of our system.

Responsibilities

Identify and build AI-powered capabilities into Flow's platform, from intelligent automation in building operations to personalized resident experiences.
Use AI-assisted development tools (e.g., Cursor, Claude Code) as part of your daily workflow to accelerate development, improve code quality, and push the boundaries of what a small team can ship.
Collaborate with product and engineering teams to define clear requirements and translate them into software solutions.
Serve as a core contributor in implementing foundational infrastructure, tooling, and automation that is scalable, reliable, and secure.
Elevate site reliability engineering best practices while collaborating with back-end developers.
Develop service-level tooling to enhance productionization, data migrations, system hardening, and related initiatives.
Manage and optimize a multi-region environment.
Be available for on-call activities for infrastructure and services.

Ideal Background

A minimum of 10 years in software engineering, site reliability engineering, or platform engineering.
Fluency with AI-assisted development tools and a strong point of view on how AI changes the way software gets built.
Ability to design, implement and maintain the tools and systems that support service reliability, monitoring, and alerting.
Deep understanding of the principles of ensuring high availability, fault tolerance, and efficiency in distributed systems.
Experience with Infrastructure as Code (IaC): Proficiency with Terraform.
Experience with Kubernetes.
Experience administering cloud-based infrastructure (GCP preferred).
Experience troubleshooting production issues related to cloud infrastructure, configuration, monitoring, deployments, continuous integration and delivery.
A keen ability to balance elegant design with pragmatic tradeoffs, prioritizing continuous delivery of business value.
Ability to quickly learn and adapt to new skillsets.
Experience building software in fast-moving startup environments.
Participate in incident response and post-mortems to identify and address systemic issues.

Additional Information

Benefits

Comprehensive Benefits Package (Medical / Dental / Vision / Disability / Life)
Paid time off and 13 paid holidays
401(k) retirement plan
Healthcare and Dependent Care Flexible Spending Accounts (FSAs)
Access to HSA-compatible plans
Pre-tax commuter benefits
Employee Assistance Program (EAP), free therapy through SpringHealth, acupuncture, and other wellness offerings

]]> full-time senior hybrid $180,000-275,000 per year AI-assisted development tools, Terraform, Kubernetes, Cloud-based infrastructure administration, Site reliability engineering, Monitoring and alerting, Service-level tooling, Multi-region environment management Engineering Technology Flow https://logos.yubhub.co/flow.com.png Flow is a real estate company that operates a technology platform and operations ecosystem spanning condominiums, hotels, multifamily residences, and office spaces. https://flow.com https://jobs.lever.co/flowlife/3ae47b09-e4b4-41be-9312-fafb1d85cf4d Palo Alto 2026-04-17 dc8cf321-418 Staff Software Engineer, Backend (AI Agent Integrations) Join us on this thrilling journey to revolutionize the workforce with AI.

Cresta's AI Agent team is building enterprise-grade AI Agents that can operate inside real-world contact center environments. A critical part of that mission is enabling our AI Agents to seamlessly integrate with customers' CCaaS platforms (Contact Center as a Service), including voice and digital channels, and to smoothly transition conversations between AI and human agents when needed.

This team is focused on building the backend systems that allow our AI Agents to:

Integrate deeply with leading CCaaS platforms
Participate in live customer conversations across voice and chat
Maintain full conversation state and context
Perform real-time actions within the CCaaS ecosystem
Seamlessly hand off conversations to human agents without losing context, history, or workflow state
Support human agents with AI assistance after transfer

We are looking for strong backend engineers who want to work at the intersection of distributed systems, real-time communication, enterprise integrations, and AI Agent orchestration.

As a Staff Backend Engineer, you will lead the architecture and technical direction of Cresta’s AI Agent integration platform. You will define how our AI Agents connect to, operate within, and scale across complex enterprise ecosystems.

Responsibilities:

Lead the architecture and evolution of Cresta’s AI Agent integration framework across CCaaS platforms
Design scalable, extensible backend systems that manage real-time conversation state, session lifecycle, and context propagation
Establish architectural patterns for AI-to-human handoff that ensure durability, reliability, and seamless customer experience
Define integration strategies for voice, chat, messaging, routing, and agent desktop APIs across enterprise platforms
Drive system design for high availability, low latency, and fault tolerance in real-time environments
Set standards for observability, monitoring, incident response, and operational excellence
Partner closely with ML engineers to operationalize AI Agent capabilities into production-grade systems
Influence technical roadmap and prioritization in collaboration with engineering leadership
Mentor senior engineers and raise the bar for backend engineering excellence across the organization
Lead complex cross-team technical initiatives from design through production rollout

Qualifications We Value:

Bachelor’s degree in Computer Science or related field
8+ years of experience building scalable backend systems in production environments
Demonstrated experience leading architecture for large-scale distributed systems
Deep expertise in API design (REST, gRPC) and service-oriented architectures
Strong understanding of real-time communication systems and low-latency system design
Experience designing integrations with third-party enterprise platforms and APIs
Proven track record of driving technical direction across teams
Experience with containerized environments (Kubernetes, Docker)
Experience with cloud platforms such as AWS, GCP, or Azure
Strong expertise in reliability engineering, observability, and enterprise-grade security
Experience with CCaaS platforms, contact center systems, or real-time communications is highly valued
Familiarity with AI Agents, LLM-based systems, or AI orchestration platforms is a strong plus

Perks & Benefits:

We offer Cresta employees a variety of medical, dental, and vision plans, designed to fit you and your family’s needs
Paid parental leave to support you and your family
Monthly Health & Wellness allowance
Work from home office stipend to help you succeed in a remote environment
Lunch reimbursement for in-office employees
PTO: 3 weeks in Canada

Compensation for this position includes a base salary, equity, and a variety of benefits. Actual base salaries will be based on candidate-specific factors, including experience, skillset, and location, and local minimum pay requirements as applicable.

]]> full-time staff remote API design, Service-oriented architectures, Real-time communication systems, Low-latency system design, Containerized environments, Cloud platforms, Reliability engineering, Observability, Enterprise-grade security, CCaaS platforms, Contact center systems, Real-time communications, AI Agents, LLM-based systems, AI orchestration platforms Engineering Technology Cresta https://logos.yubhub.co/cresta.ai.png Cresta is a technology company that specializes in developing AI-powered contact center solutions. https://www.cresta.ai/ https://job-boards.greenhouse.io/cresta/jobs/5137152008 Canada (Remote) 2026-04-17 871d4845-25a Software Engineer, DevOps, Research Platform We are seeking a talented and experienced software engineer to join our Research Platform team. You'll work closely with our R&D team to build a cloud agnostic platform that improves the stability, scalability and velocity across the research department.

As a DevOps/Platform Engineer, your responsibilities will include designing and implementing complex systems; building a flexible yet solid and accessible development environment for researchers; designing, implementing and advocating for solutions addressing large amounts of data and maintainable data pipelines; optimizing a variety of builds; building strong relationships with researchers; and communicating and producing documentation or any other content that helps them make the most of the tools and systems you'll build.

About you:

5+ years of successful experience in a similar DX / DevOps / SRE role.
Proficiency in software development (Python, Go...) and programming best practices.
Exposure to site reliability engineering: root cause analysis, in-production troubleshooting, on-call rotations...
Exposure to infrastructure management: CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability...
Technical product mindset (e.g. understanding how to debug poor adoption).
Excellent problem-solving and communication skills (ability to contextualize, gauge risks and get buy-in for high-stakes and impactful solutions).
Ownership, high agency and constantly seeking to learn and improve things for others.
Autonomous, self-driven and able to work well in a fast-paced startup environment.
Low ego and team spirit mindset.

Your application will be all the more interesting if you also have:

First hand Bazel (or equivalent) experience.
Strong knowledge of Python's ecosystem.
Familiarity with GPU based workloads and ecosystems.
Experience of full remote environments (you're comfortable with having some of your users on the other side of the globe).

]]> full-time senior remote software development, Python, Go, site reliability engineering, infrastructure management, CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability, Bazel, Python's ecosystem, GPU based workloads and ecosystems, full remote environments Engineering Technology Mistral AI Mistral AI is an AI technology company that provides high-performance, optimized, open-source and cutting-edge models, products and solutions. https://mistral.ai/careers https://jobs.lever.co/mistral/18be2b70-c05d-48e4-82ac-e5cb462c96c0 Paris 2026-03-10 92a78695-a57 Software Engineer, DevEx We are seeking an experienced Software Engineer, Developer Experience to own and foster a collaborative, automated, and efficient software development lifecycle. In this role, you will collaborate closely with product engineering teams to ensure consistent code health, accelerate development velocity through well-maintained CI pipelines, faster builds, and secure release processes.

Your mission is to empower our software engineering team with seamless workflows while securing our production environments.

Responsibilities:

Build, monitor, and enhance CI/CD pipelines to streamline development workflows and accelerate deployments.
Design, operate and maintain scalable, reliable and secure multi-cloud infrastructures.
Identify areas for improvement and create innovative solutions that enable high developer velocity.

Team Collaboration & Advocacy:

Standardize DevOps practices to ensure consistency across all engineering teams.
Establish measurable KPIs for security performance, reliability, and compliance adherence.
Partner with development and operations teams to embed security into daily workflows.
Lead training initiatives to upskill teams on secure coding, threat modeling, and incident response.
Champion a security-first mindset, driving cultural adoption of DevSecOps principles across the organization.

About you:

5+ years of successful experience in a similar role (DevOps, Developer Experience, Platform Engineer, Internal tooling engineer, SRE...).
Strong proficiency in scripting languages (Go, Python...) and software development best practices.
Developer experience engineering: developer workflow optimization, tooling and automation for productivity, real-time developer support and escalation paths.
Site Reliability Engineering: CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability...
Exposure to multi-cloud infrastructures (AWS / GCP / Azure or On-Prem).
Security Tools & Approaches: OWASP, SAST, DAST, SCA, vulnerability scanners.
Proven problem-solving and communication skills: ability to contextualize, gauge risks and get buy-in for high-stakes and impactful solutions.
Ownership, high agency and desire to improve things for others.
Autonomy, self-drive and ability to work well in a fast-paced startup environment.
Low ego and team spirit mindset.

]]> full-time senior onsite scripting languages (Go, Python...), software development best practices, developer experience engineering, site reliability engineering, multi-cloud infrastructures (AWS / GCP / Azure or On-Prem), security tools & approaches (OWASP, SAST, DAST, SCA, vulnerability scanners) Engineering Technology Mistral AI Mistral AI is an AI technology company that provides high-performance, optimized, open-source and cutting-edge models, products and solutions for enterprise needs. https://mistral.ai https://jobs.lever.co/mistral/c9e16eb0-0cb9-423d-8495-a96d10782622 Paris 2026-03-10 eafe9949-c5e Cybersecurity Engineer, SIEM

About Mistral AI
====================

At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.

We are a global company with teams distributed between France, USA, UK, Germany and Singapore. Our comprehensive AI platform meets enterprise needs, whether on-premises or in cloud environments.

Role Summary
============

Mistral is looking for a Security Platform Engineer to architect and maintain the infrastructure ensuring the observability of our production systems. You will treat the SIEM and logging infrastructure as a high-performance data product.

Responsibilities
---------------

* Own the set-up, lifecycle, availability, and performance of the SIEM solution, ensuring 99.9% uptime for log ingestion and query availability.
* Design and maintain high-throughput data pipelines to collect, buffer, and transport logs from distributed systems to the SIEM.
* Implement parsing logic and schema standardization to ensure unstructured logs are searchable and actionable for analysts.
* Manage alert rules, connectors, and dashboard configurations, avoiding manual console configuration ("ClickOps").
* Analyze ingestion patterns to identify noisy, low-value data. Implement filtering and aggregation at the source to maximize signal-to-noise ratio.
* Architect data tiers to balance query performance with compliance retention requirements and cloud costs.

About You
========

* 5+ years of experience in Site Reliability Engineering (SRE), Data Engineering, or Security Engineering with a focus on logging infrastructure.
* Deep understanding of log management challenges at scale (indexing strategies, sharding, partitioning, throughput tuning).
* Strong experience deploying and monitoring stateful workloads on Kubernetes, cloud providers (Azure/GCP), and on-prem.
* Ability to write production-grade Python or Go for automation and custom log exporters.
* Experience managing monitoring, alerting, and on-call rotations for critical infrastructure.

Hiring Process
============

* Introduction call - 30 min
* Hiring Manager interview - 30 min
* Technical Rounds I - 45 min
* Technical Rounds II - 60 min
* Culture-fit discussion - 30 min
* References

Additional Information
====================

Location & Remote
-----------------

The position is based in our Paris HQ offices and we encourage going to the office as much as we can (at least 3 days per week) to create bonds and smooth communication. Our remote policy aims to provide flexibility, improve work-life balance and increase productivity. Each manager can decide the number of days worked remotely based on autonomy and a specific context (e.g. more flexibility can occur during summer). In any case, employees are expected to maintain regular communication with their teams and be available during core working hours.

What We Offer
============

* Competitive salary and equity package
* Health insurance
* Transportation allowance
* Sport allowance
* Meal vouchers
* Private pension plan
* Generous parental leave policy

]]> full-time senior hybrid Site Reliability Engineering, Data Engineering, Security Engineering, Logging infrastructure, Kubernetes, Cloud providers, Python, Go, Monitoring, Alerting, On-call rotations Engineering Technology Mistral AI Mistral AI is an AI platform provider that offers high-performance, optimized, open-source and cutting-edge models, products and solutions for enterprise needs. https://mistral.ai https://jobs.lever.co/mistral/6f7f6e7a-3dc4-430b-8957-a64450a10066 Paris 2026-03-10 eebf21c4-d1f Staff Site Reliability Engineer Join our Site Reliability Engineering (SRE) team and help ensure the reliability, scalability, and performance of Replit's infrastructure that serves millions of developers worldwide.

As a Staff Site Reliability Engineer, you will bridge the gap between development and operations, implementing automation and establishing best practices that enable our platform to scale efficiently while maintaining high availability.

We are seeking Staff SREs who are passionate about building and maintaining resilient systems at scale. Your mission will be to proactively find and analyze reliability problems across our stack, then design and implement software and systems to create step-function improvements.

You will design robust observability solutions, lead incident response, automate operational tasks, and continuously improve our infrastructure's reliability, all while mentoring and educating the broader engineering team to make reliability a core value at Replit.

Responsibilities

Architect and Implement Observability: Design, build, and lead the implementation of comprehensive monitoring, logging, and tracing solutions. Create dashboards and metrics that provide real-time visibility into system health and performance, enabling proactive issue detection.

Define and Drive Reliability Standards: Work with product and engineering teams to define, implement, and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Build systems to monitor and report on these metrics, holding teams accountable and ensuring we maintain high reliability standards while balancing innovation speed.

Lead Incident Management and Response: Act as a senior leader during high-impact incidents, guiding the team to rapid resolution. Conduct thorough, blameless post-mortems and drive the implementation of preventative measures. Develop and refine runbooks and build automation to reduce Mean Time To Recovery (MTTR).

Drive Automation and Infrastructure as Code: Architect, build, and improve automation to eliminate toil and operational work. Design and maintain CI/CD pipelines and infrastructure automation using tools like Terraform or Pulumi. Create self-healing systems that can automatically respond to common failure scenarios.

Optimize Performance on Kubernetes: Collaborate with core infrastructure and product teams to performance-tune and optimize our large-scale cloud deployments, with a deep focus on Kubernetes, Docker, and GCP. Identify and resolve performance bottlenecks, implement capacity planning strategies, and reduce latency across global regions.

Debug and Harden Distributed Systems: Dive deep into debugging extremely difficult technical problems across the stack. Use your findings to design and implement long-term fixes that make our systems and products more robust, operable, and easier to diagnose.

Provide Staff-Level Guidance: Review feature and system designs from across the company, acting as a key owner for the reliability, scalability, security, and operational integrity of those designs.

Educate and Mentor: Educate, mentor, and hold accountable the broader engineering team to improve the reliability of our systems, making reliability a core value of the Replit engineering culture.

Build and Integrate: Write high-quality, well-tested code in Python or Go to meet the needs of your customers, whether it's building new internal tools or integrating with third-party vendors.

Required Skills and Experience

8-10 years of experience in Site Reliability Engineering or similar roles (e.g., DevOps, Systems Engineering, Infrastructure Engineering).

Strong programming skills in languages like Python or Go. You write high-quality, well-tested code.

Deep understanding of distributed systems. You’ve designed, built, scaled, and maintained production services and know how to compose a service-oriented architecture.

Deep experience with container orchestration platforms, specifically Kubernetes, and cloud-native technologies.

Proven track record of designing, implementing, and maintaining sophisticated monitoring and observability solutions (e.g., metrics, logging, tracing).

Strong incident management skills with extensive experience leading incident response for complex systems and demonstrated critical thinking under pressure.

Experience with infrastructure as code (e.g., Terraform, Pulumi) and configuration management tools.

Excellent written and verbal communication skills, with an ability to explain complex technical concepts clearly and simply and a bias toward open, transparent cultural practices.

Strong interpersonal skills, with experience working with and mentoring engineers from junior to principal levels.

A willingness to dive into understanding, debugging, and improving any layer of the stack.

You're passionate about making software creation accessible and empowering the next generation of builders.

Bonus Points

Deep experience with Google Cloud Platform (GCP) services and tools.

Expert-level knowledge of modern observability platforms (e.g., Prometheus, Grafana, Datadog, OpenTelemetry).

Experience designing and building reliable systems capable of handling high throughput and low latency.

Significant experience with Go and Terraform.

Familiarity with working in rapid-growth, startup environments.

Experience writing company-facing blog posts and training materials.

XML job scraping automation by YubHub

]]> Full time staff remote $220K - $325K Site Reliability Engineering, DevOps, Systems Engineering, Infrastructure Engineering, Python, Go, Distributed Systems, Container Orchestration, Kubernetes, Cloud-Native Technologies, Monitoring and Observability, Incident Management, Infrastructure as Code, Terraform, Pulumi, Configuration Management, Google Cloud Platform, Prometheus, Grafana, Datadog, OpenTelemetry, Go, Terraform Engineering Technology Replit https://logos.yubhub.co/replit.com.png Replit is an agentic software creation platform that enables anyone to build applications using natural language, with millions of users worldwide. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/replit/d50ad15b-82d4-452f-b4ea-2a7f5e796170 Remote (United States) 2026-03-08 1f6d8d36-cd5 Data Center Incident Program Manager Compensation

The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. The salary range is $125.6K – $228K. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts

Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)

401(k) retirement plan with employer match

Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)

Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees

13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)

Mental health and wellness support

Employer-paid basic life and disability coverage

Annual learning and development stipend to fuel your professional growth

Daily meals in our offices, and meal delivery credits as eligible

Relocation support for eligible employees

Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

About the Team:

OpenAI, in close collaboration with our capital partners, is embarking on a journey to build the world’s most advanced AI infrastructure ecosystem. Our Stargate program develops and deploys massive, state-of-the-art data center campuses in partnership with industry leaders such as Oracle today—and through future OpenAI infrastructure projects tomorrow. We design for scale, speed, and reliability, and we need experienced hardware professionals who can help ensure our high-density compute environment operates at peak performance.

About the Role:

The Data Center Incident Program Manager is responsible for designing, operating, and continuously improving the end-to-end incident management lifecycle across mission-critical data center environments. This role owns the “before, during, and after” mechanics of incidents — establishing standards and playbooks in steady state, serving as (or designating) Incident Commander during active events, and driving structured post-incident review and corrective action to closure.

In this role you will:

Define and maintain incident severity levels (SEV definitions), classification criteria, and escalation thresholds.

Establish end-to-end incident response standards: protocols, lifecycle stages (declare → stabilize → mitigate → recover → close), and operating cadence.

Build and maintain governance artifacts: runbooks, war room formats, reporting templates, and decision/communication standards.

Create and operationalize notification trees, stakeholder comms templates (initial, periodic updates, recovery/closure), and executive escalation criteria.

Define clear RACI across Facilities, Hardware Ops, Network, Security, and vendor/partner teams, including handoffs and accountability paths.

Set and manage SLAs/OLAs for acknowledgment, escalation, containment, mitigation, and reporting.

Implement and run incident management tooling (ticketing, paging, logging) and ensure integrations with monitoring and workflow systems.

Establish dashboards and program health metrics to track incident performance and readiness.

Lead readiness activities: tabletop exercises, cross-functional simulations, IC/Deputy training, and a rotating on-call IC bench with certification standards.

Serve as Incident Commander as needed: declare severity, stand up the war room, assign functional leads, and drive structured execution under pressure.

Maintain real-time documentation (decisions, timelines, impact scope) and ensure clear restoration objectives and scope control during active events.

Run post-incident reviews (PIRs), validate timelines, drive structured RCA (e.g., 5 Whys, Fault Tree), and separate root cause vs contributing factors.

Define corrective/preventative actions (CAPAs), assign accountable owners, track to verified closure, and escalate overdue actions.

Publish trend reporting (incident taxonomy, counts by severity, MTTA/MTTR, repeat failure domains) and feed systemic gaps back into design and operations teams.

You might thrive in this role if you have:

7+ years of experience in mission-critical infrastructure, data center operations, or reliability engineering

Direct experience leading major incidents (P1/P0 equivalent)

Strong familiarity with facilities systems, hardware operations, or network infrastructure

Demonstrated experience running war rooms and executive updates

Experience conducting root cause analysis and corrective action tracking

Ability to remain calm and decisive under high-pressure conditions

Preferred Skills:

Experience in hyperscale or high-density AI compute environments

Background in facilities commissioning, facility operations, hardware operations, or network reliability

Familiarity with ISO-based quality systems or structured operational documentation frameworks

Experience implementing incident tooling (PagerDuty, ServiceNow, Jira, etc.)

]]> Full time senior Remote $125.6K – $228K incident management, data center operations, reliability engineering, facilities systems, hardware operations, network infrastructure, root cause analysis, corrective action tracking, hyperscale, high-density AI compute environments, facilities commissioning, facility operations, ISO-based quality systems, structured operational documentation frameworks, incident tooling Engineering Technology OpenAI https://logos.yubhub.co/openai.com.png OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. It is a large-scale organisation. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/openai/16aaa47f-596d-4bbd-a02a-b03db3f40c23 Remote - US 2026-03-08 8c164f95-f8d Senior Infrastructure Engineer Join our Infrastructure Engineering team and help ensure the reliability, scalability, and performance of Replit's infrastructure that serves millions of developers worldwide. As a Senior Infrastructure Engineer, you will bridge the gap between development and operations, implementing automation and establishing best practices that enable our platform to scale efficiently while maintaining high availability.

We are seeking Senior Infrastructure Engineers who are passionate about building and maintaining resilient systems at scale. Your mission will be to proactively find and analyse reliability problems across our stack, then design and implement software and systems to address them. You will build robust monitoring solutions, automate operational tasks, and continuously improve our infrastructure's reliability.

You Will:

Drive Automation and Infrastructure as Code: Build and improve automation to eliminate toil and operational work. Maintain CI/CD pipelines and infrastructure automation using tools like Terraform or Pulumi. Create self-healing systems that can automatically respond to common failure scenarios.
Optimise Performance and Infrastructure: Collaborate with core infrastructure and product teams to performance tune and optimise our cloud deployments (Kubernetes, Docker, GCP). Identify and resolve performance bottlenecks and implement capacity planning strategies.
Elevate Developer Experience: Design and implement improvements to our build, test, and deployment systems to make software delivery faster, safer, and more reliable for all engineers.
Drive Cross-Team Improvements: Partner with service owners across Replit to understand their pain points, and collaborate on implementing build/test/deploy enhancements within their specific services.
Build Shared Tooling: Create and maintain centralized tooling and automation that improves the engineering lifecycle, from local development to production monitoring.
Debug and Harden Systems: Dive deep into debugging difficult technical problems, making our systems and products more robust, operable, and easier to diagnose.
Collaborate on Design Reviews: Participate in feature and system design reviews, contributing expertise on security, scale, and operational considerations.
Build and Integrate: Write high-quality, well-tested code to meet the needs of your customers, including building pipelines to integrate with third-party vendors.

Required Skills and Experience:

4+ years of experience in Site Reliability Engineering or similar roles (DevOps, Systems Engineering, Infrastructure Engineering).
Strong programming skills in languages like Python or Go.
You write high-quality, well-tested code.
Solid understanding of distributed systems. You've built, scaled, and maintained production services and understand service-oriented architecture.
Experience with container orchestration platforms (Kubernetes) and cloud-native technologies.
Experience implementing and maintaining monitoring/observability solutions, with strong skills in debugging and performance tuning.
Strong incident management skills with experience participating in incident response and demonstrated critical thinking under pressure.
Experience with infrastructure as code (e.g., Terraform) and configuration management tools.
Excellent written and verbal communication skills, with an ability to explain technical concepts clearly.
A willingness to dive into understanding, debugging, and improving any layer of the stack.
You're passionate about making software creation accessible and empowering the next generation of builders.

Bonus Points:

Experience with Google Cloud Platform (GCP) services and tools.
Knowledge of modern observability platforms (Prometheus, Grafana, Datadog, etc.).
Experience building reliable systems capable of handling high throughput and low latency.
Experience with Go and Terraform.
Familiarity with working in rapid-growth environments.

_This is a full-time role that can be held from our Foster City, CA office. The role has an in-office requirement of Monday, Wednesday, and Friday._

Full-Time Employee Benefits Include:

Competitive Salary & Equity
401(k) Program with a 4% match
Health, Dental, Vision and Life Insurance
Short Term and Long Term Disability
Paid Parental, Medical, Caregiver Leave
Commuter Benefits
Monthly Wellness Stipend
Autonomous Work Environment
In Office Set-Up Reimbursement
Flexible Time Off (FTO) + Holidays
Quarterly Team Gatherings
In Office Amenities

]]> full-time senior hybrid $190K - $240K Site Reliability Engineering, DevOps, Systems Engineering, Infrastructure Engineering, Python, Go, Terraform, Kubernetes, Docker, GCP, Monitoring/observability solutions, Debugging and performance tuning, Incident management, Infrastructure as code, Configuration management tools, Google Cloud Platform (GCP) services and tools, Modern observability platforms (Prometheus, Grafana, Datadog, etc.), Building reliable systems capable of handling high throughput and low latency, Go and Terraform, Familiarity with working in rapid-growth environments Engineering Technology Replit https://logos.yubhub.co/replit.com.png Replit is a software creation platform that enables anyone to build applications using natural language. With millions of users worldwide, Replit is a leading platform in the software development industry. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/replit/16c85abc-763c-4f36-ab67-64f416343384 Foster City, CA 2026-03-07 b7de618e-5e1 Site Reliability Engineer Join our Site Reliability Engineering team and help ensure the reliability, scalability, and performance of Replit's infrastructure that serves millions of developers worldwide. As a Site Reliability Engineer, you will bridge the gap between development and operations, implementing automation and establishing best practices that enable our platform to scale efficiently while maintaining high availability.

We are seeking SREs who are passionate about building and maintaining resilient systems at scale. Your mission will be to design and implement robust monitoring solutions, automate operational tasks, and continuously improve our infrastructure's reliability and performance.

Responsibilities

Design and Implement Observability Solutions: Develop comprehensive monitoring and alerting systems using modern observability tools. Create dashboards and metrics that provide real-time visibility into system health and performance. Implement logging strategies that enable quick problem identification and resolution.

Drive Automation and Infrastructure as Code: Architect and implement infrastructure automation solutions using tools like Terraform, Ansible, or Pulumi. Design and maintain CI/CD pipelines that enable reliable and consistent deployments. Create self-healing systems that can automatically respond to common failure scenarios.

Establish SLOs and SLIs: Work with product and engineering teams to define and implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Build systems to track and report on these metrics, ensuring we maintain high reliability standards while balancing innovation speed.

Incident Management and Response: Lead incident response efforts, conduct thorough post-mortems, and implement improvements to prevent future occurrences. Develop and maintain runbooks for critical services. Build tools and processes that reduce Mean Time To Recovery (MTTR).

Performance Optimization: Identify and resolve performance bottlenecks across our infrastructure. Implement capacity planning strategies and optimize resource utilization. Work on reducing latency and improving system efficiency across global regions.

Requirements

4-8 years of experience in Site Reliability Engineering or similar roles (DevOps, Systems Engineering, Infrastructure Engineering)

Strong programming skills in languages commonly used for automation (Python, Go, or similar)

Deep understanding of distributed systems

Experience with container orchestration platforms (Kubernetes) and cloud-native technologies

Proven track record of implementing and maintaining monitoring/observability solutions

Strong incident management skills with experience leading incident response

Experience with infrastructure as code and configuration management tools

Bonus Points

Experience with Google Cloud Platform (GCP) services and tools

Knowledge of modern observability platforms (Prometheus, Grafana, Datadog, etc.)

What We Value

Problem-solving mindset: Ability to approach complex operational challenges systematically and devise effective solutions

Self-directed and autonomous: Capable of working independently while collaborating effectively with cross-functional teams

Strong communication skills: Ability to explain complex technical concepts to both technical and non-technical audiences

Continuous learning: Passion for staying current with industry best practices and new technologies

Focus on automation: Strong belief in automating repetitive tasks and building self-healing systems

Full-Time Employee Benefits Include

Competitive Salary & Equity

401(k) Program with a 4% match

Health, Dental, Vision and Life Insurance

Short Term and Long Term Disability

Paid Parental, Medical, Caregiver Leave

Commuter Benefits

Monthly Wellness Stipend

Autonomous Work Environment

In Office Set-Up Reimbursement

Flexible Time Off (FTO) + Holidays

Quarterly Team Gatherings

In Office Amenities

Want to Learn More About What We Are Up To?

Meet the Replit Agent

Replit: Make an app for that

Replit Blog

Amjad TED Talk

Interviewing + Culture at Replit

Operating Principles

Reasons not to work at Replit

]]> full-time senior remote $160K - $250K Site Reliability Engineering, DevOps, Systems Engineering, Infrastructure Engineering, Python, Go, Distributed systems, Container orchestration platforms, Cloud-native technologies, Monitoring/observability solutions, Incident management, Infrastructure as code, Configuration management tools, Google Cloud Platform, Prometheus, Grafana, Datadog Engineering Technology Replit https://logos.yubhub.co/replit.com.png Replit is a software creation platform that enables anyone to build applications using natural language. With millions of users worldwide, Replit is a leading provider of software development tools. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/replit/f6e6158e-eb89-4008-81ea-1b7512bc509d United States 2026-03-07 323bc85d-b69 Staff Infrastructure Engineer About the Role:

Join our Infrastructure Engineering team and help ensure the reliability, scalability, and performance of Replit's infrastructure that serves millions of developers worldwide. As a Staff Infrastructure Engineer, you will bridge the gap between development and operations, implementing automation and establishing best practices that enable our platform to scale efficiently while maintaining high availability.

Responsibilities:

Drive Automation and Infrastructure as Code: Architect, build, and improve automation to eliminate toil and operational work. Design and maintain CI/CD pipelines and infrastructure automation using tools like Terraform or Pulumi. Create self-healing systems that can automatically respond to common failure scenarios.

Optimise Performance and Infrastructure: Collaborate with core infrastructure and product teams to performance tune and optimise our cloud deployments (Kubernetes, Docker, GCP). Identify and resolve performance bottlenecks, implement capacity planning strategies, and reduce latency across global regions.

Elevate Developer Experience: Design and implement improvements to our build, test, and deployment systems to make software delivery faster, safer, and more reliable for all engineers.

Drive Cross-Company Improvements: Partner directly with service owners across Replit to understand their pain points, and collaborate on implementing build/test/deploy enhancements within their specific services.

Build Shared Tooling: Create and maintain centralized tooling and automation that improves the entire engineering lifecycle, from local development to production monitoring.

Debug and Harden Systems: Dive deep into debugging extremely difficult technical problems, making our systems and products more robust, operable, and easier to diagnose.

Provide Staff-Level Guidance: Review feature and system designs, acting as an owner for the security, scale, and operational integrity of those designs.

Educate and Mentor: Educate, mentor, and hold accountable the engineering team to improve the reliability of our systems, making reliability a core value of the Replit engineering culture.

Build and Integrate: Write high-quality, well-tested code to meet the needs of your customers, including building pipelines to integrate with third-party vendors.

Required Skills and Experience:

8-10 years of experience in Infrastructure Engineering or similar roles (DevOps, Systems Engineering, Site Reliability Engineering).

Strong programming skills in languages like Python or Go.

You write high-quality, well-tested code.

Deep understanding of distributed systems. You've designed, built, scaled, and maintained production services and know how to compose a service-oriented architecture.

Experience with container orchestration platforms (Kubernetes) and cloud-native technologies.

Proven track record of implementing and maintaining monitoring/observability solutions, with strong skills in debugging and performance tuning.

Strong incident management skills with experience leading incident response and demonstrated critical thinking under pressure.

Experience with infrastructure as code (e.g., Terraform) and configuration management tools.

Excellent written and verbal communication skills, with an ability to explain technical concepts clearly and simply and a bias toward open, transparent cultural practices.

Strong interpersonal skills, with experience working with engineers from junior to principal levels.

A willingness to dive into understanding, debugging, and improving any layer of the stack.

You're passionate about making software creation accessible and empowering the next generation of builders.

Bonus Points:

Deep experience with Google Cloud Platform (GCP) services and tools.

Knowledge of modern observability platforms (Prometheus, Grafana, Datadog, etc.).

Experience designing and building reliable systems capable of handling high throughput and low latency.

Experience with Go and Terraform.

Familiarity with working in rapid-growth environments.

Experience writing company-facing blog posts and training materials.

Full-Time Employee Benefits Include:

Competitive Salary & Equity

401(k) Program with a 4% match

Health, Dental, Vision and Life Insurance

Short Term and Long Term Disability

Paid Parental, Medical, Caregiver Leave

Commuter Benefits

Monthly Wellness Stipend

Autonomous Work Environment

In Office Set-Up Reimbursement

Flexible Time Off (FTO) + Holidays

Quarterly Team Gatherings

]]> full-time staff hybrid $220K – $325K Infrastructure Engineering, DevOps, Systems Engineering, Site Reliability Engineering, Python, Go, Distributed systems, Container orchestration platforms, Cloud-native technologies, Monitoring/observability solutions, Infrastructure as code, Configuration management tools, Google Cloud Platform, Prometheus, Grafana, Datadog, Go, Terraform, Rapid-growth environments, Company-facing blog posts, Training materials Engineering Technology Replit https://logos.yubhub.co/replit.com.png Replit is a software creation platform that enables anyone to build applications using natural language. With millions of users worldwide, Replit is democratizing software development by removing traditional barriers to application creation. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/replit/6481ec1e-527c-4c1f-a041-2fb5021e7bd5 Foster City, CA 2026-03-07 237ffb32-054 Software Engineer, Security Observability Software Engineer, Security Observability

Location: Remote - US

Employment Type: Full time

Location Type: Remote

Department: Security

Compensation: $234.4K – $385K • Offers Equity

The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts

Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)

401(k) retirement plan with employer match

Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)

Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees

13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)

Mental health and wellness support

Employer-paid basic life and disability coverage

Annual learning and development stipend to fuel your professional growth

Daily meals in our offices, and meal delivery credits as eligible

Relocation support for eligible employees

Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

More details about our benefits are available to candidates during the hiring process.

This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.

About the Team

Security is at the foundation of OpenAI’s mission to ensure that artificial general intelligence benefits all of humanity.

The Security team protects OpenAI’s technology, people, and products. We are technical in what we build but are operational in how we do our work, and are committed to supporting all products and research at OpenAI. Our Security team tenets include: prioritizing for impact, enabling researchers, preparing for future transformative technologies, and engaging a robust security culture.

About the Role

We are seeking a Software Engineer, Security Observability to join our Security team. In this role, you will be responsible for building secure, scalable systems that enhance our security observability infrastructure. Leveraging your strong engineering skills, you will collaborate with cross-functional teams to develop, deploy, and maintain robust software solutions that support our security and detection capabilities.

This role is open to remote employees; alternatively, relocation assistance is available for those moving to one of our OpenAI offices in San Francisco, Seattle, or New York City.

In this role, you will:

Design and develop scalable software systems that facilitate security observability across our infrastructure.

Build and maintain data pipelines that centralize and store security-relevant data from diverse sources.

Proactively improve the resilience and reliability of data systems to ensure high platform availability.

Collaborate closely with Detection & Response (D&R) and other security teams to reduce the company’s security risk.

Contribute to data engineering in support of forensic investigations and compliance efforts.

You might thrive in this role if you have:

Strong software engineering experience, with proficiency in programming languages such as Python, Golang, or similar.

A background in infrastructure as code, with experience using tools like Terraform and working with cloud platforms such as Azure.

Experience with building and maintaining data pipelines, particularly for security-related use cases.

A generalist engineering mindset, with the flexibility to pivot between various technical domains such as databases, site reliability engineering (SRE), or security.

The ability to collaborate effectively with security and engineering teams to understand evolving data needs and implement scalable solutions.

A proactive and detail-oriented approach to problem-solving, with a focus on improving security data visibility and forensic capabilities.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

XML job scraping automation by YubHub

]]> full-time mid remote $234.4K – $385K Python, Golang, Terraform, Azure, data pipelines, security-related use cases, databases, site reliability engineering (SRE), security, infrastructure as code, cloud platforms, forensic investigations, compliance efforts Engineering Technology OpenAI https://logos.yubhub.co/openai.com.png OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/openai/92bf4ff3-7acf-4e49-8e09-47e4e8bd1f83 Remote - US 2026-03-06 edcdad0c-360 Software Engineer, Security Observability Software Engineer, Security Observability

Location

San Francisco

Employment Type

Full time

Location Type

Hybrid

Department

Security

Compensation

$234.4K – $385K • Offers Equity

Benefits

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts

Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)

401(k) retirement plan with employer match

Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)

Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees

13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)

Mental health and wellness support

Employer-paid basic life and disability coverage

Annual learning and development stipend to fuel your professional growth

Daily meals in our offices, and meal delivery credits as eligible

Relocation support for eligible employees

Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

About the Team

Security is at the foundation of OpenAI’s mission to ensure that artificial general intelligence benefits all of humanity.

About the Role

This role is open to remote employees; relocation assistance is also available for one of our OpenAI offices in San Francisco, Seattle, or New York City.

In this role, you will:

Design and develop scalable software systems that facilitate security observability across our infrastructure.

Build and maintain data pipelines that centralize and store security-relevant data from diverse sources.

Proactively improve the resilience and reliability of data systems to ensure high platform availability.

Collaborate closely with Detection & Response (D&R) and other security teams to reduce the company’s security risk.

Contribute to data engineering in support of forensic investigations and compliance efforts.

You might thrive in this role if you have:

Strong software engineering experience, with proficiency in programming languages such as Python, Golang, or similar.

A background in infrastructure as code, with experience using tools like Terraform and working with cloud platforms such as Azure.

Experience with building and maintaining data pipelines, particularly for security-related use cases.

A generalist engineering mindset, with the flexibility to pivot between various technical domains such as databases, site reliability engineering (SRE), or security.

The ability to collaborate effectively with security and engineering teams to understand evolving data needs and implement scalable solutions.

A proactive and detail-oriented approach to problem-solving, with a focus on improving security data visibility and forensic capabilities.

About OpenAI

]]> full-time mid hybrid $234.4K – $385K • Offers Equity Python, Golang, Terraform, Azure, data pipelines, security-related use cases, databases, site reliability engineering (SRE), security, infrastructure as code, cloud platforms, data engineering, forensic investigations, compliance efforts Engineering Technology OpenAI https://logos.yubhub.co/openai.com.png OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The company pushes the boundaries of the capabilities of AI systems and seeks to safely deploy them to the world through their products. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/openai/3e254907-5101-438d-8708-f6f34e5c75ea San Francisco 2026-03-06 88643d65-f58 Software Engineer, Security Observability Software Engineer, Security Observability

Location

Seattle

Employment Type

Full time

Department

Security

Compensation

$234.4K – $385K • Offers Equity

Benefits

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts

Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)

401(k) retirement plan with employer match

Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)

Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees

13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)

Mental health and wellness support

Employer-paid basic life and disability coverage

Annual learning and development stipend to fuel your professional growth

Daily meals in our offices, and meal delivery credits as eligible

Relocation support for eligible employees

Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

About the Team

Security is at the foundation of OpenAI’s mission to ensure that artificial general intelligence benefits all of humanity.

About the Role

This role is open to remote employees; relocation assistance is also available for one of our OpenAI offices in San Francisco, Seattle, or New York City.

In this role, you will:

Design and develop scalable software systems that facilitate security observability across our infrastructure.

Build and maintain data pipelines that centralize and store security-relevant data from diverse sources.

Proactively improve the resilience and reliability of data systems to ensure high platform availability.

Collaborate closely with Detection & Response (D&R) and other security teams to reduce the company’s security risk.

Contribute to data engineering in support of forensic investigations and compliance efforts.

You might thrive in this role if you have:

Strong software engineering experience, with proficiency in programming languages such as Python, Golang, or similar.

A background in infrastructure as code, with experience using tools like Terraform and working with cloud platforms such as Azure.

Experience with building and maintaining data pipelines, particularly for security-related use cases.

A generalist engineering mindset, with the flexibility to pivot between various technical domains such as databases, site reliability engineering (SRE), or security.

The ability to collaborate effectively with security and engineering teams to understand evolving data needs and implement scalable solutions.

A proactive and detail-oriented approach to problem-solving, with a focus on improving security data visibility and forensic capabilities.

About OpenAI

]]> full-time mid remote $234.4K – $385K Python, Golang, Terraform, Azure, data pipelines, security-related use cases, databases, site reliability engineering (SRE), security, infrastructure as code, cloud platforms, data engineering, forensic investigations, compliance efforts Engineering Technology OpenAI https://logos.yubhub.co/openai.com.png OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The company was founded in 2015 and has since grown to become a leading player in the field of artificial intelligence. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/openai/747bb870-4ef1-4bfd-b2c0-d48042a85080 Seattle 2026-03-06 7f4e2dd8-338 Software Engineer, Security Observability Software Engineer, Security Observability

Location

New York City

Employment Type

Full time

Location Type

Hybrid

Department

Security

Compensation

$325K – $405K • Offers Equity

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts

Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)

401(k) retirement plan with employer match

Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)

Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees

13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)

Mental health and wellness support

Employer-paid basic life and disability coverage

Annual learning and development stipend to fuel your professional growth

Daily meals in our offices, and meal delivery credits as eligible

Relocation support for eligible employees

Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

More details about our benefits are available to candidates during the hiring process.

This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.

About the Team

Security is at the foundation of OpenAI’s mission to ensure that artificial general intelligence benefits all of humanity.

About the Role

This role is open to remote employees; relocation assistance is also available for one of our OpenAI offices in San Francisco, Seattle, or New York City.

In this role, you will:

Design and develop scalable software systems that facilitate security observability across our infrastructure.

Build and maintain data pipelines that centralize and store security-relevant data from diverse sources.

Proactively improve the resilience and reliability of data systems to ensure high platform availability.

Collaborate closely with Detection & Response (D&R) and other security teams to reduce the company’s security risk.

Contribute to data engineering in support of forensic investigations and compliance efforts.

You might thrive in this role if you have:

Strong software engineering experience, with proficiency in programming languages such as Python, Golang, or similar.

A background in infrastructure as code, with experience using tools like Terraform and working with cloud platforms such as Azure.

Experience with building and maintaining data pipelines, particularly for security-related use cases.

A generalist engineering mindset, with the flexibility to pivot between various technical domains such as databases, site reliability engineering (SRE), or security.

The ability to collaborate effectively with security and engineering teams to understand evolving data needs and implement scalable solutions.

A proactive and detail-oriented approach to problem-solving, with a focus on improving security data visibility and forensic capabilities.

About OpenAI

]]> full-time mid hybrid $325K – $405K • Offers Equity Python, Golang, Terraform, Azure, data pipelines, security-related use cases, databases, site reliability engineering (SRE), security, infrastructure as code, cloud platforms, forensic investigations, compliance efforts Engineering Technology OpenAI https://logos.yubhub.co/openai.com.png OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/openai/1e4e9985-babf-4bd9-8fe8-a2016250780d New York City 2026-03-06 fb4acb2b-bab Security Reliability Engineering, Lead Security Reliability Engineering, Lead

Location

San Francisco

Employment Type

Full time

Department

Security

Compensation

$293K – $385K

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts

Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)

401(k) retirement plan with employer match

Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)

Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees

13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)

Mental health and wellness support

Employer-paid basic life and disability coverage

Annual learning and development stipend to fuel your professional growth

Daily meals in our offices, and meal delivery credits as eligible

Relocation support for eligible employees

Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

More details about our benefits are available to candidates during the hiring process.

This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.

About the Team

The Infrastructure Engineering function sits within IT and is responsible for reliably building, deploying, and operating critical on-prem and hybrid environments that power internal services and critical R&D environments.

This is a new, bootstrap team focused on applying strong Site Reliability Engineering discipline to environments where uptime, safety, recoverability, and security are non-negotiable. The team replaces bespoke, one-off infrastructure with standardized infrastructure-as-code building blocks that compound reliability and operational leverage as OpenAI scales.

About the Role

We are looking for a Security Reliability Engineering Lead to design, build, and operate reliable, secure, and scalable infrastructure that underpins identity, access, endpoint, and shared platform services across the company.

In this role, you will own infrastructure and identity systems end to end, from foundational design and provisioning through policy enforcement, upgrades, recovery, and day two operations. You will establish durable, production grade platforms that remove operational friction, enforce security by default, and enable teams to move faster with confidence.

This role is well suited for a senior engineer who thrives in ambiguity, enjoys owning complex systems end to end, and raises the reliability and security bar by replacing fragile implementations with standardized, repeatable infrastructure.

This role is based in our San Francisco HQ and requires in-office presence.

In this role, you will:

Set direction and establish strong foundations

Define and evolve infrastructure patterns for on prem and hybrid environments, including self hosted platforms, vendor supported systems, and lab environments.

Establish standardized, production grade deployment and operational models that replace bespoke implementations.

Partner with IT, Security, Identity, and Network teams to ensure infrastructure meets reliability, security, and access requirements by design.

Design and mature the production architecture for IAM adjacent platforms such as Microsoft Entra using SRE principles.

Establish common management rules and shared resources within Azure subscriptions to ensure consistent, policy aligned operations.

Build, operate, and scale reliably

Own the full lifecycle of infrastructure systems, including deployment, upgrades, patching, recovery, and ongoing operations.

Operate and harden shared infrastructure provisioned through Infra Terraform, ensuring repeatability, auditability, and safe change management.

Design and implement infrastructure as code and configuration management to support shared services, identity adjacent systems, and endpoint platforms using tools like Chef, Ansible and Terraform.

Build and operate monitoring, alerting, and incident response mechanisms to meet high availability and recoverability targets.

Lead incident response and postmortems across infrastructure, identity adjacent platforms, and fleet systems, driving durable fixes and shared learning.

Build and operate containerized and platform services, including Kubernetes and Docker-based workloads, using DevOps practices that emphasize reliability, repeatability, and safe change management.

Use Git-based workflows as the source of truth for infrastructure and policy changes, enabling review, auditability, and safe, reversible automation.

Automate for leverage and safety

Identify high leverage automation opportunities that eliminate manual toil and reduce operational risk across infrastructure and access related systems.

Implement guardrails, safety mechanisms, and progressive rollout patterns for infrastructure and policy enforcement changes.

Ensure automation is safe, observable, and resilient under failure conditions, particularly for shared services and high blast radius systems.

]]> full-time senior onsite $293K – $385K Security Reliability Engineering, Infrastructure as Code, Cloud Computing, Containerization, DevOps, Git, Terraform, Ansible, Chef, Kubernetes, Docker, Microsoft Entra, Azure, Identity and Access Management, Endpoint Security, Platform Services, Site Reliability Engineering, Cloud Security, Container Orchestration, Infrastructure Automation, Monitoring and Alerting, Incident Response, Postmortem Analysis, DevOps Practices, Cloud-Native Applications, Microservices Architecture Engineering Technology OpenAI https://logos.yubhub.co/openai.com.png OpenAI is a technology company that specializes in artificial intelligence. It was founded in 2015 and is headquartered in San Francisco. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/openai/645ccd65-eb60-4eb7-b094-b01c2269638c San Francisco 2026-03-06 d4efa5c8-cef Offensive Security Engineer, Hardware Job Posting

Offensive Security Engineer, Hardware

Location

San Francisco

Employment Type

Full time

Department

Security

Compensation

San Francisco: $293K – $490K • Offers Equity

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts

Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)

401(k) retirement plan with employer match

Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)

Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees

13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)

Mental health and wellness support

Employer-paid basic life and disability coverage

Annual learning and development stipend to fuel your professional growth

Daily meals in our offices, and meal delivery credits as eligible

Relocation support for eligible employees

Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

More details about our benefits are available to candidates during the hiring process.

This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.

About the Team

Security is at the foundation of OpenAI’s mission to ensure that artificial general intelligence benefits all of humanity. The Security team protects OpenAI’s technology, people, and products. We are technical in what we build but are operational in how we do our work, and are committed to supporting all products and research at OpenAI. Our Security team tenets include: prioritizing for impact, enabling researchers, preparing for future transformative technologies, and engaging a robust security culture.

About the Role

We're seeking an exceptional Principal-level Offensive Security Engineer to challenge and strengthen OpenAI's security posture. This role isn't your typical red team job - it's an opportunity to engage broadly and deeply, craft innovative attack simulations, collaborate closely with defensive teams, and influence strategic security improvements across the organization.

You'll have the chance to not only find vulnerabilities but actively drive their resolution, automate offensive techniques with cutting-edge technologies, and use your unique attacker perspective to shape our security strategy. This role will be primarily focused on continuously testing our hardware products and related services.

In this role you will:

Collaborate proactively with engineering teams to enhance security and mitigate risks in hardware, firmware, and software.

Perform comprehensive penetration testing on our diverse suite of products.

Leverage advanced automation and OpenAI technologies to optimize your offensive security work.

Present insightful, actionable findings clearly and compellingly to inspire impactful change.

Influence security strategy by providing attacker-driven insights into risk and threat modeling.

You might thrive in this role if you have:

7+ years of hands-on experience or exceptional accomplishments demonstrating equivalent expertise.

Exceptional skill in code review, identifying novel and subtle vulnerabilities.

Demonstrated mastery assessing complex technology stacks, including:

Proven ability to reverse engineer bootrom images, firmware, or silicon-level components.

Deep familiarity with low-level kernel operations, secure boot processes, and hardware-software interactions.

Hands-on experience building and validating secure boot chains and threat models.

Proficiency with hardware debugging tools (UART, JTAG, SWD, oscilloscopes, logic analyzers).

Solid programming skills in C/C++, Python, or assembly for embedded systems.

Industry experience securing consumer hardware (e.g., mobile devices, IoT, chipsets).

Excellent written and verbal communication skills for technical and non-technical audiences.

Strong intuitive understanding of trust boundaries and risk assessment in dynamic contexts.

Excellent coding skills, capable of writing robust tools and automation for offensive operations.

Ability to communicate complex technical concepts effectively through compelling storytelling.

Proven track record of not just finding vulnerabilities but actively contributing to solutions in complex codebases.

Bonus points:

Prior experience working in tech startups or fast-paced technology environments.

Experience in related disciplines such as Software Engineering (SWE), Detection Engineering, Site Reliability Engineering (SRE), Security Engineering, or IT Infrastructure.

About OpenAI

]]> full-time senior onsite $293K – $490K code review, penetration testing, advanced automation, secure boot processes, hardware debugging tools, C/C++, Python, assembly, embedded systems, consumer hardware, firmware, silicon-level components, low-level kernel operations, secure boot chains, threat models, UART, JTAG, SWD, oscilloscopes, logic analyzers, solid programming skills, industry experience, excellent written and verbal communication skills, trust boundaries, risk assessment, dynamic contexts, compelling storytelling, complex technical concepts, offensive operations, robust tools and automation, tech startups, fast-paced technology environments, Software Engineering, Detection Engineering, Site Reliability Engineering, Security Engineering, IT Infrastructure Engineering Technology OpenAI https://logos.yubhub.co/openai.com.png OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/openai/f123bbe4-7f19-46c8-a6ab-4a5d7b714988 San Francisco 2026-03-06 5e6602c2-e9d Full Stack Engineer, Health AI Full Stack Engineer, Health AI

Location

San Francisco

Employment Type

Full time

Department

Applied AI

Compensation

$293K – $325K • Offers Equity

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts

Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)

401(k) retirement plan with employer match

Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)

Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees

13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)

Mental health and wellness support

Employer-paid basic life and disability coverage

Annual learning and development stipend to fuel your professional growth

Daily meals in our offices, and meal delivery credits as eligible

Relocation support for eligible employees

Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

More details about our benefits are available to candidates during the hiring process.

This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.

Job Description

OpenAI’s charter calls on us to ensure the benefits of AI are distributed broadly and safely. Our Health AI team focuses on expanding access to high-quality medical expertise and aims to set a high standard for deploying AI responsibly in high-stakes domains.

Improving health will be one of the defining impacts of AGI. Today, millions of people lack access to reliable medical information, and clinicians around the world face increasing time and resource constraints. We are building AI systems that support patients, clinicians, and health workers, while meeting the highest standards for safety, reliability, and privacy.

We are seeking full stack software engineers to help build and scale products used by consumers and care providers globally. You will work closely with product, design, and research teams to ship real systems in a fast-moving, high-impact environment.

Responsibilities

Design and build scalable full-stack systems for consumer and enterprise health.

Own end-to-end feature development—from early design and implementation through deployment, monitoring, and iteration.

Build and maintain data pipelines and services that meet strict privacy, security, and compliance requirements (e.g., HIPAA).

Collaborate closely with researchers and safety teams to integrate reliability, evaluation, and guardrails into production systems.

Debug, optimize, and harden systems to support high availability, performance, and global scale.

Take ownership of ambiguous problems and drive them to practical, high-quality solutions.

Requirements

Are deeply motivated by improving health outcomes and expanding access to medical expertise.

Are a strong engineer who enjoys building durable, well-designed systems.

Have 5+ years of experience writing maintainable, production-quality code.

Can operate with high agency—owning problems end-to-end with minimal supervision.

Enjoy working in fast-moving, cross-functional teams with engineers, product managers, designers, and researchers.

Are comfortable doing unglamorous but essential work when it is needed for the mission (e.g., compliance, data plumbing, reliability).

Stay goal-oriented rather than tool-oriented, choosing what works over what is fashionable.

Are thoughtful about the societal impact of AI and aligned with OpenAI’s mission to ensure AGI benefits all of humanity.

Bonus if you have

Strong product engineering instincts—able to take a rough spec and ship a polished solution.

Entrepreneurial drive or previous experience as a founder.

Prior experience with health-related products.

High-quality public code (e.g., GitHub projects, open-source contributions).

Experience building compliant or regulated products (healthcare, enterprise, finance, or similar).

Experience with distributed systems, backend architecture, or large-scale data pipelines.

About OpenAI

]]> full-time senior onsite $293K – $325K • Offers Equity Full Stack Development, Scalable System Design, Data Pipelines, HIPAA Compliance, Reliability Engineering, Distributed Systems, Backend Architecture, Large-Scale Data Pipelines, Product Engineering, Entrepreneurial Drive, Health-Related Products, Public Code, Compliant or Regulated Products Engineering Technology OpenAI https://logos.yubhub.co/openai.com.png OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/openai/c2aeb70d-3eca-4c4f-a414-6394b30fea80 San Francisco 2026-03-06 b447a8bc-5f1 Backend Software Engineer - B2B Connectors

Location

San Francisco; New York City

Employment Type

Full time

Location Type

Hybrid

Department

Applied AI

Compensation

$230K – $385K • Offers Equity

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts

Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)

401(k) retirement plan with employer match

Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)

Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees

13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)

Mental health and wellness support

Employer-paid basic life and disability coverage

Annual learning and development stipend to fuel your professional growth

Daily meals in our offices, and meal delivery credits as eligible

Relocation support for eligible employees

Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

More details about our benefits are available to candidates during the hiring process.

This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.

About the Team

OpenAI’s mission is to make AGI beneficial for all of humanity, and that mission succeeds only if AGI drives real benefits across every industry in the world. Our goal in B2B applications is to advance this mission by helping businesses, enterprises, and governments redefine how they operate to empower people and accelerate economic growth.

Connectors are the bridge between OpenAI products (ChatGPT Enterprise, Frontier, and the API) and the systems where work actually happens—documents, tickets, messages, CRM records, knowledge bases, and more. The Connectors Platform team builds the infrastructure and control plane that makes these integrations reliable, secure, scalable, and enterprise-ready across a wide range of partners and customer environments.

About the Role

We’re looking for an infrastructure-focused engineer to build and operate the systems that make Connectors dependable at global scale. In this role, you’ll design the control plane, reliability foundations, and operational tooling that power connector execution—auth flows, sync and indexing pipelines, rate limiting, isolation, observability, incident response, and safe rollouts. You’ll work closely with product engineering, partner teams, and security to ship enterprise-grade connectivity while meeting high bars for privacy, compliance, and uptime.

In this role, you will:

Design and operate the infrastructure that powers connector sync, indexing, and retrieval at scale (job orchestration, queues, storage, caching, backpressure).

Build the “control plane” primitives for connectors: rollout controls, configuration management, permissions, policy enforcement, and kill switches.

Own reliability and operational excellence: SLOs, monitoring/alerting, incident response, postmortems, on-call health, and capacity planning.

Create guardrails for safe multi-tenant execution: isolation boundaries, secrets handling, rate limits, abuse prevention, and blast-radius reduction.

Partner with security and compliance teams to ensure enterprise requirements are met (auditability, least privilege, data retention, and secure-by-default architecture).

Improve developer velocity via internal tooling: local dev workflows, canary environments, load testing, and observability dashboards.

Your background might look something like:

5+ years of professional engineering experience (excluding internships) in infra / SRE / platform roles at tech and product-driven companies.

Strong distributed systems fundamentals and production instincts (availability, latency, correctness, resilience).

Experience building and operating services with meaningful uptime and scale requirements (multi-region is a plus).

Proficient in one or more backend languages (e.g. Python, Rust) and comfortable working close to systems concerns (networking, storage, queueing).

Deep familiarity with observability (metrics, logs, tracing), incident management, and reliability engineering practices.

Comfortable navigating ambiguous problem spaces and pushing pragmatic solutions into production.

Interest in AI/ML is a plus, but not required.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

]]> full-time senior hybrid $230K – $385K Backend languages (e.g. Python, Rust), Distributed systems fundamentals, Production instincts (availability, latency, correctness, resilience), Experience building and operating services with meaningful uptime and scale requirements, Proficient in one or more backend languages and comfortable working close to systems concerns (networking, storage, queueing), Deep familiarity with observability (metrics, logs, tracing), incident management, and reliability engineering practices, AI/ML Engineering Technology OpenAI https://logos.yubhub.co/openai.com.png OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The company is focused on developing and deploying AI systems that are safe and beneficial to society. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/openai/cbacb6bd-aa41-41af-a5d5-13515a1be72b San Francisco; New York City 2026-03-06