<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>588dfb0e-611</externalid>
      <Title>Solutions Architect - Kubernetes</Title>
      <Description><![CDATA[<p>As a Solutions Architect at CoreWeave, you will play a vital role in helping customers succeed with our cloud infrastructure offerings, focusing on Kubernetes solutions within high-performance compute (HPC) environments.</p>
<p>Your responsibilities will include serving as the primary technical point of contact for customers, establishing strong technical relationships and ensuring their success with CoreWeave&#39;s cloud infrastructure offerings.</p>
<p>You will collaborate closely with customers to understand their unique business needs and create, prototype, and deploy tailored solutions that align with their requirements.</p>
<p>You will lead proof of concept initiatives to showcase the value and viability of CoreWeave&#39;s solutions within specific environments.</p>
<p>You will drive technical leadership and direction during customer meetings, presentations, and workshops, addressing any technical queries or concerns that arise.</p>
<p>You will act as a virtual member of CoreWeave&#39;s Kubernetes product and engineering teams, identifying opportunities for product enhancement and collaborating with engineers to implement your suggestions.</p>
<p>You will offer valuable insights on product features, functionality, and performance, contributing regularly to discussions about product strategy and architecture.</p>
<p>You will conduct periodic technical reviews and assessments of customer workloads, pinpointing opportunities for workload optimization and suggesting suitable solutions.</p>
<p>You will stay informed of the latest developments and trends in Kubernetes, cloud computing and infrastructure, sharing your thought leadership with customers and internal stakeholders.</p>
<p>You will lead the prototyping and initiation of research and development efforts for emerging products and solutions, delivering prototypes and key insights for internal consumption.</p>
<p>You will represent CoreWeave at conferences and industry events, with occasional travel as required.</p>
<p>To be successful in this role, you will need to have a B.S. in Computer Science or a related technical discipline, or equivalent experience.</p>
<p>You will also need 7+ years of proven experience as a Solutions Architect, engineer, researcher, or technical account manager in cloud infrastructure, focusing on building distributed systems or HPC/cloud services, with expertise in scalable Kubernetes solutions.</p>
<p>You will need to be fluent in cloud computing concepts, architecture, and technologies with hands-on experience in designing and implementing cloud solutions.</p>
<p>You will need a proven track record of building customer relationships, communicating clearly, and breaking down complex technical concepts for both technical and non-technical audiences.</p>
<p>You will need to be familiar with NVIDIA GPUs typically used in AI/ML applications and associated technologies such as Infiniband and NVIDIA Collective Communications Library (NCCL).</p>
<p>You will need to have experience with running large-scale Artificial Intelligence/Machine Learning (AI/ML) training and inference workloads on technologies such as Slurm and Kubernetes.</p>
<p>Preferred qualifications include code contributions to open-source inference frameworks, experience with scripting and automation related to Kubernetes clusters and workloads, experience with building solutions across multi-cloud environments, and client or customer-facing publications/talks on latency, optimization, or advanced model-server architectures.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$165,000 to $220,000</Salaryrange>
      <Skills>Kubernetes, Cloud Computing, High-Performance Compute (HPC), Distributed Systems, Cloud Infrastructure, Scalable Solutions, NVIDIA GPUs, Infiniband, NVIDIA Collective Communications Library (NCCL), Slurm, Kubernetes Clusters, Code Contributions to Open-Source Inference Frameworks, Scripting and Automation Related to Kubernetes Clusters and Workloads, Building Solutions Across Multi-Cloud Environments, Client or Customer-Facing Publications/Talks on Latency, Optimization, or Advanced Model-Server Architectures</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud infrastructure provider that offers a platform for building and scaling AI workloads.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4557835006</Applyto>
      <Location>Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>d799d883-0dd</externalid>
<Title>Solutions Architect - Networking</Title>
<Description><![CDATA[<p>As a Solutions Architect at CoreWeave, you will lead innovation at every turn. You will have the opportunity to demonstrate thought leadership and engage hands-on throughout our customers&#39; entire lifecycle, from establishing their Kubernetes environment to developing proofs of concept, onboarding, and optimizing workloads.</p>
<p>In this role, you will:</p>
<ul>
<li>Serve as the primary technical point of contact for customers, establishing strong technical relationships and ensuring their success with CoreWeave&#39;s cloud infrastructure offerings, focusing on networking technologies within high-performance compute (HPC) environments.</li>
<li>Collaborate closely with customers to understand their unique business needs and create, prototype, and deploy tailored solutions that align with their requirements.</li>
<li>Lead proof of concept initiatives to showcase the value and viability of CoreWeave&#39;s solutions within specific environments.</li>
<li>Drive technical leadership and direction during customer meetings, presentations, and workshops, addressing any technical queries or concerns that arise.</li>
<li>Act as a virtual member of CoreWeave&#39;s Networking product and engineering teams, identifying opportunities for product enhancement and collaborating with engineers to implement your suggestions.</li>
<li>Offer valuable insights on product features, functionality, and performance, contributing regularly to discussions about product strategy and architecture.</li>
<li>Conduct periodic technical reviews and assessments of customer workloads, pinpointing opportunities for workload optimization and suggesting suitable solutions.</li>
<li>Stay informed of the latest developments and trends in Kubernetes, cloud computing, and infrastructure, sharing your thought leadership with customers and internal stakeholders.</li>
<li>Lead the prototyping and initiation of research and development efforts for emerging products and solutions, delivering prototypes and key insights for internal consumption.</li>
<li>Represent CoreWeave at conferences and industry events, with occasional travel as required.</li>
</ul>
<p>Who You Are:</p>
<ul>
<li>B.S. in Computer Science or a related technical discipline, or equivalent experience.</li>
<li>7+ years of proven experience as a Solutions Architect, engineer, researcher, or technical account manager in cloud infrastructure, focusing on building distributed systems or HPC/cloud services, with expertise in infrastructure networking.</li>
<li>Fluency in cloud computing concepts, architecture, and technologies, with hands-on experience designing and implementing cloud solutions.</li>
<li>Proven track record of building customer relationships, communicating clearly, and breaking down complex technical concepts for both technical and non-technical audiences.</li>
<li>Expertise with a broad range of networking technologies and topics, with the familiarity to understand needs and use cases as they relate to securing and enabling high-performance networking environments.</li>
<li>Experience with managing infrastructure networking, Kubernetes CSI management, and private networking concepts.</li>
<li>Familiarity with NVIDIA GPUs typically used in AI/ML applications and associated technologies such as InfiniBand and NVIDIA Collective Communications Library (NCCL).</li>
</ul>
<p>Preferred:</p>
<ul>
<li>Code contributions to open-source inference frameworks.</li>
<li>Experience with scripting and automation related to network technologies.</li>
<li>Experience building solutions across multi-cloud environments.</li>
<li>Client or customer-facing publications/talks on latency, optimization, or advanced model-server architectures.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$165,000 to $220,000</Salaryrange>
      <Skills>cloud computing, Kubernetes, infrastructure networking, high-performance computing, networking technologies, NVIDIA GPUs, Infiniband, NVIDIA Collective Communications Library (NCCL), open-source inference frameworks, scripting and automation, multi-cloud environments, latency, optimization, or advanced model-server architectures</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud infrastructure provider that enables innovators to build and scale AI with confidence.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4568528006</Applyto>
      <Location>Livingston, NJ / New York, NY / Sunnyvale, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>9166d234-4c5</externalid>
      <Title>Solutions Architect - HPC/AI/ML</Title>
      <Description><![CDATA[<p>As a Solutions Architect at CoreWeave, you will play a vital and dynamic role in helping customers establish their Kubernetes environment, develop proofs of concept, onboard, and optimise workloads. You will serve as the primary technical point of contact for customers, establishing strong technical relationships and ensuring their success with CoreWeave&#39;s cloud infrastructure offerings, focusing on AI/ML workloads within high-performance compute (HPC) environments.</p>
<p>Collaborate closely with customers to understand their unique business needs and create, prototype, and deploy tailored solutions that align with their requirements. Lead proof of concept initiatives to showcase the value and viability of CoreWeave&#39;s solutions within specific environments.</p>
<p>Drive technical leadership and direction during customer meetings, presentations, and workshops, addressing any technical queries or concerns that arise. Act as a virtual member of CoreWeave&#39;s Kubernetes product and engineering teams, identifying opportunities for product enhancement and collaborating with engineers to implement your suggestions.</p>
<p>Offer valuable insights on product features, functionality, and performance, contributing regularly to discussions about product strategy and architecture. Conduct periodic technical reviews and assessments of customer workloads, pinpointing opportunities for workload optimisation and suggesting suitable solutions.</p>
<p>Stay informed of the latest developments and trends in Kubernetes, cloud computing and infrastructure, sharing your thought leadership with customers and internal stakeholders. Lead the prototyping and initiation of research and development efforts for emerging products and solutions, delivering prototypes and key insights for internal consumption.</p>
<p>Represent CoreWeave at conferences and industry events, with occasional travel as required.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
<Salaryrange>S$165,000 to S$225,000</Salaryrange>
      <Skills>cloud computing concepts, architecture, technologies, NVIDIA GPUs, Infiniband, NVIDIA Collective Communications Library (NCCL), Slurm, Kubernetes, code contributions to open-source inference frameworks, scripting and automation related to AI/ML workloads, building solutions across multi-cloud environments, client or customer-facing publications/talks on latency, optimisation, or advanced model-server architectures</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud infrastructure provider specialising in artificial intelligence and machine learning workloads.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4649044006</Applyto>
      <Location>Singapore</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>a8092b6e-7f5</externalid>
      <Title>Bare Metal Support Engineer</Title>
      <Description><![CDATA[<p>As a Bare Metal Support Engineer at CoreWeave, you will be responsible for supporting, operating, and maintaining CoreWeave&#39;s extensive GPU fleet across our growing data centers in the U.S., Europe, and beyond.</p>
<p>You will work closely with customers, data center technicians, and engineering teams to ensure the reliability, performance, and scalability of our infrastructure.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Providing high-level support for customers utilizing bare-metal GPU fleets on CoreWeave Cloud.</li>
<li>Diagnosing, triaging, and investigating reported customer issues and high-priority incidents, identifying root causes and escalating when necessary.</li>
<li>Developing a deep understanding of customer workloads and use cases to provide tailored technical support.</li>
<li>Coordinating remote troubleshooting and hardware interventions with Data Center Technicians.</li>
<li>Creating and maintaining internal documentation, including troubleshooting guides, best practices, and knowledge base articles.</li>
<li>Participating in an on-call rotation to support production clusters and ensure operational reliability.</li>
<li>Collaborating with engineering teams to improve hardware reliability, software stability, and system performance.</li>
<li>Implementing automation and scripting to streamline support workflows and reduce manual interventions.</li>
<li>Performing in-depth log analysis and debugging across multiple layers of the stack (firmware, drivers, hardware).</li>
<li>Providing feedback to internal teams on common support issues to drive continuous improvements.</li>
<li>Working with networking teams to troubleshoot connectivity issues affecting customer workloads.</li>
<li>Supporting supercomputing infrastructure running GPU workloads at scale.</li>
<li>Driving operational excellence by refining internal processes and support methodologies.</li>
</ul>
<p>To succeed in this role, you will need:</p>
<ul>
<li>Experience in data centers, GPU clusters, server deployments, system administration, or hardware troubleshooting.</li>
<li>Demonstrated experience driving resolutions and continuous improvements across cross-functional environments and teams within a data center environment.</li>
<li>Intermediate knowledge of Linux (Ubuntu, CentOS, or similar), including command-line proficiency.</li>
<li>Experience with NVIDIA GPUs, SuperMicro systems, Dell systems, high-performance computing (HPC), and large-scale data center environments.</li>
<li>Experience in networking fundamentals (TCP/IP, VLANs, DNS, DHCP) and troubleshooting tools.</li>
<li>Hands-on experience with firmware updates, BIOS configurations, and driver management.</li>
<li>Experience analyzing system logs and debugging issues across firmware, drivers, and hardware layers.</li>
<li>Experience working with Jira, Confluence, Notion, or other issue-tracking and documentation platforms.</li>
<li>Experience in scripting and automation (Python, Bash, Ansible, or similar).</li>
</ul>
<p>If you&#39;re a curious and analytical individual with a passion for problem-solving and a desire to work in a fast-paced environment, we&#39;d love to hear from you!</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$83,000 to $132,000</Salaryrange>
      <Skills>Linux, GPU clusters, server deployments, system administration, hardware troubleshooting, NVIDIA GPUs, SuperMicro systems, Dell systems, high-performance computing, large-scale data center environments, networking fundamentals, troubleshooting tools, firmware updates, BIOS configurations, driver management, system logs, debugging issues, Jira, Confluence, Notion, issue-tracking, documentation platforms, scripting, automation, Kubernetes, Docker, containerized infrastructure</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud computing company that delivers a platform of technology, tools, and teams to enable innovators to build and scale AI with confidence.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4560350006</Applyto>
      <Location>Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>ce88828f-470</externalid>
      <Title>Solutions Architect, AI and ML</Title>
      <Description><![CDATA[<p>We are building the world&#39;s leading AI company and are looking for an experienced Cloud Solution Architect to help assist customers with adoption of GPU hardware and Software, as well as building and deploying Machine Learning (ML), Deep Learning (DL), data analytics solutions on various Cloud Computing Platforms.</p>
<p>As part of the Solutions Architecture team, we work with some of the most exciting computing hardware and software technologies including the latest breakthroughs in machine learning and data science. A Solutions Architect is the first line of technical expertise between NVIDIA and our customers so you will engage directly with developers, researchers, and data scientists with some of NVIDIA&#39;s most strategic technology customers as well as work directly with business and engineering teams on product strategy.</p>
<p><strong>What you will be doing:</strong></p>
<ul>
<li>Working with Cloud Service Providers to develop and demonstrate solutions based on NVIDIA&#39;s ML/DL and data science software and hardware technologies</li>
<li>Building and deploying AI/ML solutions at scale using NVIDIA&#39;s AI software on cloud-based GPU platforms</li>
<li>Building custom PoCs for solutions that address customers&#39; critical business needs, applying NVIDIA hardware and software technology</li>
<li>Partnering with Sales Account Managers or Developer Relations Managers to identify and secure new business opportunities for NVIDIA products and solutions for ML/DL and other software solutions</li>
<li>Preparing and delivering technical content to customers, including presentations about purpose-built solutions, workshops about NVIDIA products and solutions, etc.</li>
<li>Conducting regular technical customer meetings for project/product roadmaps, feature discussions, and introductions to new technologies, and establishing close technical ties to the customer to facilitate rapid resolution of customer issues</li>
</ul>
<p><strong>What we need to see:</strong></p>
<ul>
<li>3+ years of Solutions Engineering (or similar Sales Engineering roles) or equivalent experience</li>
<li>3+ years of work-related experience in Deep Learning and Machine Learning, including deep learning frameworks such as TensorFlow or PyTorch; GPU and CUDA experience is extremely helpful</li>
<li>BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Statistics, Physics, or other Engineering fields, or equivalent experience</li>
<li>Established track record of deploying solutions in cloud computing environments including AWS, GCP, or Azure</li>
<li>Knowledge of DevOps/MLOps technologies such as Docker/containers, Kubernetes, and data center deployments</li>
<li>Ability to use at least one scripting language (e.g., Python)</li>
<li>Good programming and debugging skills</li>
<li>Ability to communicate your ideas/code clearly through documents, presentations, etc.</li>
</ul>
<p><strong>Ways to stand out from the crowd:</strong></p>
<ul>
<li>AWS, GCP, or Azure Professional Solutions Architect Certification</li>
<li>Hands-on experience with NVIDIA GPUs and SDKs (e.g., CUDA, RAPIDS, Triton)</li>
<li>System-level experience, specifically with GPU-based systems</li>
<li>Experience with Deep Learning at scale</li>
<li>Familiarity with parallel programming and distributed computing platforms</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Solutions Engineering, Deep Learning and Machine Learning, TensorFlow or PyTorch, GPU and CUDA experience, BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Statistics, Physics, or other Engineering fields, DevOps/ML Ops technologies, Docker/containers, Kubernetes, data center deployments, Scripting language (i.e., Python), Good programming and debugging skills, Ability to communicate your ideas/code clearly through documents, presentation etc., AWS, GCP or Azure Professional Solution Architect Certification, Hands-on experience with NVIDIA GPUs and SDKs (e.g. CUDA, RAPIDS, Triton etc.), System-level experience specifically GPU-based systems, Experience with Deep Learning at scale, Familiarity with parallel programming and distributed computing platforms</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>NVIDIA</Employername>
      <Employerlogo>https://logos.yubhub.co/nvidia.com.png</Employerlogo>
      <Employerdescription>NVIDIA is a leading technology company that specialises in designing and manufacturing graphics processing units (GPUs) and high-performance computing hardware.</Employerdescription>
      <Employerwebsite>https://nvidia.wd5.myworkdayjobs.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-WA-Redmond/Solutions-Architect--AI-and-ML_JR2000691</Applyto>
      <Location>Redmond, Santa Clara, Seattle</Location>
      <Country></Country>
      <Postedate>2026-03-09</Postedate>
    </job>
    <job>
      <externalid>cf4fd05b-818</externalid>
      <Title>Senior Software Engineer, NCCL</Title>
      <Description><![CDATA[<p>We are looking for a highly motivated senior software engineer to join our communication libraries and network software team. The position will be part of a fast-paced crew that develops and maintains software for complex heterogeneous computing systems that power disruptive products in High Performance Computing and Deep Learning.</p>
<p><strong>Responsibilities:</strong></p>
<ul>
<li>Design, implement, and maintain highly optimized communication runtimes for Deep Learning frameworks (e.g., NCCL for TensorFlow/PyTorch) and HPC programming interfaces (e.g., UCX for MPI/OpenSHMEM) on GPU clusters.</li>
<li>Participate in and contribute to parallel programming interface specifications like MPI/OpenSHMEM.</li>
<li>Design, implement and maintain system software that enables interactions among GPUs and interactions between GPUs and other system components.</li>
<li>Create proof-of-concepts to evaluate and motivate extensions in programming models, new designs in runtimes and new features in hardware.</li>
</ul>
<p><strong>Requirements:</strong></p>
<ul>
<li>M.S./Ph.D. degree in CS/CE or equivalent experience.</li>
<li>5+ years of relevant experience.</li>
<li>Excellent C/C++ programming and debugging skills.</li>
<li>Strong experience with Linux.</li>
<li>Expert understanding of computer system architecture and operating systems.</li>
<li>Experience with parallel programming interfaces and communication runtimes.</li>
<li>Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.</li>
</ul>
<p><strong>Nice to Have:</strong></p>
<ul>
<li>Deep understanding of technology and passionate about what you do.</li>
<li>Experience with CUDA programming and NVIDIA GPUs.</li>
<li>Knowledge of high-performance networks such as InfiniBand, iWARP, etc.</li>
<li>Experience with HPC applications.</li>
<li>Experience with Deep Learning frameworks such as PyTorch, TensorFlow, etc.</li>
<li>Strong collaborative and interpersonal skills, specifically a proven ability to effectively guide and influence within a dynamic matrix environment.</li>
</ul>
<p><strong>Benefits:</strong></p>
<ul>
<li>Highly competitive salaries.</li>
<li>Comprehensive benefits package.</li>
<li>Eligibility for equity.</li>
<li>Opportunity to work with a world-class engineering team.</li>
<li>Ability to work in a dynamic matrix environment.</li>
<li>Opportunity to contribute to cutting-edge technology.</li>
<li>Flexible work arrangements.</li>
<li>Professional development opportunities.</li>
</ul>
<p><strong>How to Apply:</strong></p>
<p>Applications for this job will be accepted at least until March 13, 2026. NVIDIA uses AI tools in its recruiting processes.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>C/C++, Linux, Computer system architecture, Operating systems, Parallel programming interfaces, Communication runtimes, CUDA programming, NVIDIA GPUs, High-performance networks, HPC applications, Deep Learning Frameworks</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>NVIDIA</Employername>
      <Employerlogo>https://logos.yubhub.co/nvidia.com.png</Employerlogo>
      <Employerdescription>NVIDIA is a leading developer of graphics processing units (GPUs) and high-performance computing hardware and software. The company&apos;s products are used in a wide range of applications, including artificial intelligence, high-performance computing, and visualization.</Employerdescription>
      <Employerwebsite>https://nvidia.wd5.myworkdayjobs.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Software-Engineer--GPU-Communications-and-Networking_JR1997186</Applyto>
      <Location>Santa Clara</Location>
      <Country></Country>
      <Postedate>2026-03-09</Postedate>
    </job>
    <job>
      <externalid>5d37a7c7-d2a</externalid>
      <Title>ML Infrastructure Engineer</Title>
      <Description><![CDATA[<p><strong>About the role</strong></p>
<p>The ML Infrastructure team at Cursor builds large-scale compute, storage, and software infrastructure to support the company&#39;s work building the world&#39;s best agentic coding model. We&#39;re looking for strong engineers who are interested in building high-performance infrastructure and the software to support it. This role works closely with ML researchers and engineers to enable their work through improvements to our training framework, systems reliability/performance, and developer experience.</p>
<p><strong>What you&#39;ll do</strong></p>
<ul>
<li>Collaborate with ML researchers to improve the throughput and reliability of training</li>
<li>Work with OEMs, cloud service providers, and others to plan and build cutting-edge GPU infrastructure</li>
<li>Improve the density and scalability of compute environments to enable increasingly large RL workloads</li>
<li>Create software and systems to automate building, monitoring, and running GPU clusters</li>
<li>Build workload scheduling and data movement systems to support Cursor&#39;s growing training footprint</li>
</ul>
<p><strong>You may be a fit if you have</strong></p>
<ul>
<li>A strong background in systems and infrastructure-focused software engineering, particularly in Python, TypeScript, Rust, and Go</li>
<li>Experience with distributed storage and networking infrastructure, particularly on Linux systems across cloud and bare-metal environments</li>
<li>Exposure to large-scale systems and their unique challenges, ideally across thousands of nodes with significant resource footprints</li>
</ul>
<p><strong>Nice to have</strong></p>
<ul>
<li>Operational exposure to NVIDIA GPUs with InfiniBand or RoCE, particularly with Blackwell- and Hopper-class hardware</li>
<li>Exposure to Ray, Slurm, or other common compute and runtime schedulers</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
<Skills>Python, TypeScript, Rust, Golang, Distributed storage, Networking infrastructure, Linux systems, Kubernetes, NVIDIA GPUs, InfiniBand, RoCE, Blackwell, Hopper-class hardware, Ray, Slurm</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Cursor</Employername>
      <Employerlogo>https://logos.yubhub.co/cursor.com.png</Employerlogo>
      <Employerdescription>Cursor is a technology organisation building the world&apos;s best agentic coding model. The company operates large-scale compute, storage, and software infrastructure to support its work.</Employerdescription>
      <Employerwebsite>https://cursor.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://cursor.com/careers/software-engineer-ml-infrastructure</Applyto>
      <Location></Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>f2722128-3e2</externalid>
      <Title>Inference Runtime, Engineering Manager</Title>
      <Description><![CDATA[<p><strong>Inference Runtime, Engineering Manager</strong></p>
<p><strong>Location</strong></p>
<p>San Francisco</p>
<p><strong>Employment Type</strong></p>
<p>Full time</p>
<p><strong>Department</strong></p>
<p>Scaling</p>
<p><strong>Compensation</strong></p>
<ul>
<li>$455K – $555K</li>
</ul>
<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>
<ul>
<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>
<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>
<li>401(k) retirement plan with employer match</li>
<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>
<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>
<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>
<li>Mental health and wellness support</li>
<li>Employer-paid basic life and disability coverage</li>
<li>Annual learning and development stipend to fuel your professional growth</li>
<li>Daily meals in our offices, and meal delivery credits as eligible</li>
<li>Relocation support for eligible employees</li>
<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>
</ul>
<p>More details about our benefits are available to candidates during the hiring process.</p>
<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>
<p><strong>About the Team</strong></p>
<p>Our Inference team brings OpenAI’s most capable research and technology to the world through our products. We empower consumers, enterprises, and developers alike to use and access our state-of-the-art AI models, allowing them to do things that they’ve never been able to do before. We focus on performant and efficient model inference, as well as accelerating research progress via model inference.</p>
<p><strong>About the Role</strong></p>
<p>We are looking for an engineering leader to build and lead a team of the world&#39;s leading AI systems and modeling engineers, who take the world&#39;s largest and most capable AI models and optimize them for use in a high-volume, low-latency, and high-availability production and research environment.</p>
<p>In this role, you will:</p>
<ul>
<li>Lead a team of engineers who are experts in distributed systems, with a deep understanding of model architecture and of system co-design with research and production teams.</li>
<li>Work alongside partners in machine learning research, engineering, and product management to bring our latest technologies into production.</li>
<li>Work in an outcome-oriented environment where everyone contributes across layers of the stack, from infra plumbing to performance tuning.</li>
<li>Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of our model inference stack.</li>
<li>Build tools to give us visibility into our bottlenecks and sources of instability and then design and implement solutions to address the highest-priority issues.</li>
<li>Optimize our code and fleet of GPUs to utilize every FLOP and every GB of GPU RAM of our hardware.</li>
</ul>
<p><strong>You might thrive in this role if you:</strong></p>
<ul>
<li>Have an understanding of modern ML architectures and an intuition for how to optimize their performance, particularly for inference.</li>
<li>Own problems end-to-end, and are willing to pick up whatever knowledge you&#39;re missing to get the job done.</li>
<li>Have at least 15 years of professional software engineering experience.</li>
<li>Have or can quickly gain familiarity with PyTorch, NVIDIA GPUs and the software stacks that optimize them (e.g. NCCL, CUDA), as well as HPC technologies such as InfiniBand, MPI, NVLink, etc.</li>
<li>Have experience architecting, building, observing, and debugging production distributed systems. Bonus points if you have worked on performance-critical distributed systems.</li>
<li>Have needed to rebuild or substantially refactor production systems several times over due to rapidly increasing scale.</li>
<li>Are self-directed and enjoy figuring out the most important problem to work on.</li>
<li>Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.</li>
</ul>
<p><strong>About OpenAI</strong></p>
<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$455K – $555K</Salaryrange>
      <Skills>PyTorch, NVIDIA GPUs, NCCL, CUDA, InfiniBand, MPI, NVLink, HPC technologies, Distributed systems, Model architecture, System co-design, Machine learning, Research, Production, Software engineering, GPU optimization</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>OpenAI</Employername>
      <Employerlogo>https://logos.yubhub.co/openai.com.png</Employerlogo>
      <Employerdescription>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/openai/4f998abb-4510-4bd3-9922-161599625171</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>d5390946-539</externalid>
      <Title>Software Engineer, Model Inference</Title>
      <Description><![CDATA[<p><strong>Software Engineer, Model Inference</strong></p>
<p><strong>Location</strong></p>
<p>San Francisco</p>
<p><strong>Employment Type</strong></p>
<p>Full time</p>
<p><strong>Department</strong></p>
<p>Scaling</p>
<p><strong>Compensation</strong></p>
<ul>
<li>$295K – $555K • Offers Equity</li>
</ul>
<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>
<p><strong>Benefits</strong></p>
<ul>
<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>
<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>
<li>401(k) retirement plan with employer match</li>
<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>
<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>
<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>
<li>Mental health and wellness support</li>
<li>Employer-paid basic life and disability coverage</li>
<li>Annual learning and development stipend to fuel your professional growth</li>
<li>Daily meals in our offices, and meal delivery credits as eligible</li>
<li>Relocation support for eligible employees</li>
<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>
</ul>
<p><strong>About the Team</strong></p>
<p>Our Inference team brings OpenAI’s most capable research and technology to the world through our products. We empower consumers, enterprises, and developers alike to use and access our state-of-the-art AI models, allowing them to do things that they’ve never been able to do before. We focus on performant and efficient model inference, as well as accelerating research progress via model inference.</p>
<p><strong>About the Role</strong></p>
<p>We are looking for an engineer who wants to take the world&#39;s largest and most capable AI models and optimize them for use in a high-volume, low-latency, and high-availability production and research environment.</p>
<p><strong>In this role, you will:</strong></p>
<ul>
<li>Work alongside machine learning researchers, engineers, and product managers to bring our latest technologies into production.</li>
<li>Work alongside researchers to enable advanced research through awesome engineering.</li>
<li>Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of our model inference stack.</li>
<li>Build tools to give us visibility into our bottlenecks and sources of instability and then design and implement solutions to address the highest-priority issues.</li>
<li>Optimize our code and fleet of Azure VMs to utilize every FLOP and every GB of GPU RAM of our hardware.</li>
</ul>
<p><strong>You might thrive in this role if you:</strong></p>
<ul>
<li>Have an understanding of modern ML architectures and an intuition for how to optimize their performance, particularly for inference.</li>
<li>Own problems end-to-end, and are willing to pick up whatever knowledge you&#39;re missing to get the job done.</li>
<li>Have at least 5 years of professional software engineering experience.</li>
<li>Have or can quickly gain familiarity with PyTorch, NVIDIA GPUs and the software stacks that optimize them (e.g. NCCL, CUDA), as well as HPC technologies such as InfiniBand, MPI, NVLink, etc.</li>
<li>Have experience architecting, building, observing, and debugging production distributed systems. Bonus points if you have worked on performance-critical distributed systems.</li>
<li>Have needed to rebuild or substantially refactor production systems several times over due to rapidly increasing scale.</li>
<li>Are self-directed and enjoy figuring out the most important problem to work on.</li>
<li>Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.</li>
</ul>
<p><strong>About OpenAI</strong></p>
<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$295K – $555K • Offers Equity</Salaryrange>
      <Skills>PyTorch, NVidia GPUs, NCCL, CUDA, HPC technologies, InfiniBand, MPI, NVLink, Azure VMs, GPU RAM, FLOP, modern ML architectures, intuition for optimizing performance, distributed systems, performance-critical distributed systems</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>OpenAI</Employername>
      <Employerlogo>https://logos.yubhub.co/openai.com.png</Employerlogo>
      <Employerdescription>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. It pushes the boundaries of the capabilities of AI systems and seeks to safely deploy them to the world through its products.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/openai/83b6755d-7785-4186-9050-5ef3ad127941</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
  </jobs>
</source>