Member of Technical Staff - Compute Infrastructure

We're seeking a highly skilled Member of Technical Staff to join our Compute Infrastructure team. As a key member of this team, you will design, build, and operate massive-scale clusters and orchestration platforms that power frontier AI training, inference, and agent workloads at unprecedented scale.

In this role, you will push the boundaries of container orchestration far beyond existing systems like Kubernetes, manage exascale compute resources, optimize for high-performance training runs and production serving, and collaborate closely with research and systems teams to deliver reliable, ultra-scalable infrastructure that enables xAI's next-generation models and applications.

Responsibilities include building and managing massive-scale clusters; designing, developing, and extending an in-house container orchestration platform; collaborating with research teams to architect and optimize compute clusters; profiling, debugging, and resolving complex system-level performance bottlenecks; and owning end-to-end infrastructure initiatives.

To succeed in this role, you will need deep expertise in virtualization technologies and advanced containerization/sandboxing; strong proficiency in systems programming languages such as C/C++ and Rust; and a proven track record of profiling, debugging, and optimizing complex system-level performance issues.

Preferred skills and experience include Linux kernel development, hypervisor extensions, or low-level systems programming for compute-intensive workloads; operating or designing large-scale AI training/inference clusters; and familiarity with performance tools, tracing, and debugging in production distributed environments.

XML job scraping automation by YubHub

Employment type: full-time
Level: staff
Workplace: onsite
Salary: $180,000 - $440,000 USD
Skills:
- Deep expertise in virtualization technologies (KVM, Xen, QEMU) and advanced containerization/sandboxing (Kata, Firecracker, gVisor, Sysbox, or equivalent)
- Strong proficiency in systems programming languages such as C/C++ and Rust
- Proven track record of profiling, debugging, and optimizing complex system-level performance issues, with deep knowledge of Linux kernel internals, resource management, scheduling, memory management, and low-level engineering
- Hands-on experience building or significantly enhancing distributed compute platforms, orchestration systems, or high-performance infrastructure at scale
- Experience in Linux kernel development, hypervisor extensions, or low-level system programming for compute-intensive workloads
- Proven track record of operating or designing large-scale AI training/inference clusters (GPU/TPU scale)
- Experience with custom runtimes, isolation techniques, or bespoke platforms for specialized AI compute
- Familiarity with performance tools, tracing, and debugging in production distributed environments
Categories: Engineering, Technology
Company: xAI
Logo: https://logos.yubhub.co/xai.com.png
About: xAI creates AI systems to understand the universe and aid humanity in its pursuit of knowledge.
Website: https://www.xai.com/
Job posting: https://job-boards.greenhouse.io/xai/jobs/5052040007
Location: Palo Alto, CA
Date: 2026-04-18

Senior Software Engineer - Artifact Management

CoreWeave is seeking a Senior Software Engineer - Artifact Management to join our team.
As a Senior Software Engineer - Artifact Management, you will be responsible for designing and implementing distributed storage and caching solutions for artifacts; evaluating and exploring third-party solutions; developing APIs and services for artifact publishing, retrieval, and version management; optimizing performance, reliability, and cost efficiency across multi-region deployments; working closely with build, release, and infrastructure teams to ensure seamless integration into developer workflows; driving observability, automation, and resilience in a high-traffic production environment by creating dashboards, metrics, and alerts; and partnering with cross-functional teams to implement best practices and drive migration from legacy systems.

The ideal candidate will have a bachelor's degree in Computer Science, Software Engineering, or a related field; 4+ years of experience in software or infrastructure engineering; strong experience operating services in production and at scale; deep experience with Go as their primary programming language; experience with infrastructure-as-code, CI/CD systems, and containerization; an understanding of system design, scalability, and efficiency; extensive experience with Artifactory and Cloudsmith; and a passion for improving developer experience and enabling other engineers to do their best work.

In addition to the above requirements, preferred qualifications include experience integrating or enabling tools that leverage LLMs or code intelligence for developers; experience with KubeVirt and Kata Containers; and a willingness to learn and adapt to new technologies and processes.

Employment type: full-time
Level: senior
Workplace: hybrid
Salary: $139,000 to $204,000
Skills: Go, Infrastructure-as-code, CI/CD systems, Containerization, System design, Scalability, Efficiency, Artifactory, Cloudsmith, LLMs or code intelligence for developers, KubeVirt, Kata Containers
Categories: Engineering, Technology
Company: CoreWeave
Logo: https://logos.yubhub.co/coreweave.com.png
About: CoreWeave is a cloud computing company that provides a platform for artificial intelligence (AI) development and deployment.
Website: https://www.coreweave.com
Job posting: https://job-boards.greenhouse.io/coreweave/jobs/4612039006
Location: Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA
Date: 2026-04-18

Staff Software Engineer - Artifact Management

CoreWeave is seeking a Staff Software Engineer - Artifact Management to join our team. As a Staff Software Engineer, you will be responsible for designing and implementing distributed storage and caching solutions for artifacts; evaluating and exploring third-party solutions; developing APIs and services for artifact publishing, retrieval, and version management; optimizing performance, reliability, and cost efficiency across multi-region deployments; working closely with build, release, and infrastructure teams to ensure seamless integration into developer workflows; driving observability, automation, and resilience in a high-traffic production environment by creating dashboards, metrics, and alerts; diagnosing and resolving system bottlenecks, storage issues, and dependency-related failures; and driving and implementing best practices in artifact creation and lifecycle management. The role also involves growing, changing, investing in your teammates, being invested in, sharing your ideas, listening to others, being curious, having fun, and being yourself.

The ideal candidate will have at least 7 years of experience in software or infrastructure engineering; deep experience operating services in production and at scale; proficiency in Go as their primary programming language; strong experience with infrastructure-as-code, CI/CD systems (e.g., GitHub Actions, ArgoCD), and containerization (e.g., Docker, Kubernetes); expertise in leading large-scale system design, with attention to scalability and efficiency; experience with third-party vendors such as Artifactory; and a passion for improving developer experience and enabling other engineers to do their best work.

In addition to the required skills, preferred skills include experience integrating or enabling tools that leverage LLMs or code intelligence for developers (e.g., GitHub Copilot, Cody, custom LLM integrations); experience with KubeVirt and Kata Containers; and experience with LangGraph/LangChain.

Employment type: full-time
Level: staff
Workplace: hybrid
Salary: $188,000 to $275,000
Skills: Go, Infrastructure-as-code, CI/CD systems, Containerization, Leading large-scale system design, Scalability, Efficiency, Third-party vendors, Artifactory, LLMs or code intelligence, GitHub Copilot, Cody, KubeVirt, Kata Containers, LangGraph/LangChain
Categories: Engineering, Technology
Company: CoreWeave
Logo: https://logos.yubhub.co/coreweave.com.png
About: CoreWeave is a cloud computing company that provides a platform for artificial intelligence (AI) development and deployment.
Website: https://www.coreweave.com
Job posting: https://job-boards.greenhouse.io/coreweave/jobs/4612032006
Location: Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA
Date: 2026-04-18

Director of Engineering, Media & Entertainment (M&E)

CoreWeave is seeking a Director of Engineering, Media & Entertainment (M&E) to lead the development of next-generation cloud platforms and tools that power modern content creation workflows. This role will drive the engineering strategy and execution for solutions that support visual effects (VFX), animation, rendering, and post-production pipelines used by studios, artists, and creative teams worldwide.

As a senior engineering leader, you will build and lead high-performing engineering teams responsible for designing scalable infrastructure, developer tools, and user-facing systems that enable creative professionals to run complex production workloads in the cloud. You will collaborate closely with product, design, infrastructure, and customer teams to translate real-world production workflows into reliable, high-performance software platforms.

This role combines deep engineering leadership with domain expertise in M&E workflows, ensuring that the platform delivers exceptional performance, reliability, and usability for demanding creative workloads.

Leadership & Strategy

- Build and scale high-performing engineering teams focused on cloud platforms for media production workloads, including rendering, simulation, and content processing.
- Recruit, mentor, and develop engineering managers and senior engineers while fostering a culture of innovation, accountability, and collaboration.
- Define and execute the long-term engineering strategy for Media & Entertainment products and services.
- Partner with Product and Design leaders to translate industry workflows and customer needs into scalable platform capabilities.
- Establish engineering best practices for reliability, security, observability, and operational excellence.
- Drive roadmap alignment between engineering initiatives and strategic business objectives.

Technical Leadership

- Lead the design and development of scalable backend services, APIs, and developer interfaces that power M&E cloud workflows.
- Build platforms that support demanding workloads such as rendering, asset processing, and distributed compute pipelines.
- Drive architecture decisions for cloud-native systems leveraging technologies such as Kubernetes, distributed services, and infrastructure-as-code.
- Ensure the platform enables self-service provisioning, automation, and repeatable workflows for production pipelines.
- Establish engineering standards around performance, scalability, and security for enterprise-grade SaaS/PaaS systems.
- Oversee system reliability and operational readiness through clear SLOs, monitoring, and runbook-driven on-call practices.

Product & Workflow Collaboration

- Work closely with product leadership to define technical requirements aligned with real customer workflows in animation, VFX, and media production.
- Engage directly with studios, artists, and technical directors to understand pipeline challenges and incorporate feedback into product development.
- Translate industry needs into clear engineering priorities and technical roadmaps.
- Guide development teams through product milestones, including specification, development, testing, and release.
- Ensure engineering efforts balance customer requirements, technical feasibility, and business goals.

Customer and industry collaboration is critical in identifying workflow needs and transforming them into actionable development plans for engineering teams.

Operational Excellence

- Implement engineering processes that support scalable development, including CI/CD pipelines, testing strategies, and code review standards.
- Manage development timelines and resource allocation across multiple engineering teams.
- Track key operational and customer metrics, including performance, reliability, and cost efficiency.
- Drive continuous improvement in engineering productivity and system performance.
- Partner with QA, support, and customer success teams to ensure high-quality releases and strong user satisfaction.

Who You Are:

Required Qualifications

- 10+ years of software engineering experience, including leadership of engineering teams and managers.
- Proven experience building and scaling cloud-based platforms or distributed systems.
- Strong understanding of cloud infrastructure, microservices architecture, and automation technologies.
- Experience delivering enterprise SaaS or PaaS products used by external customers.
- Excellent leadership, communication, and cross-functional collaboration skills.
- Ability to operate strategically while remaining deeply technical and hands-on with architecture decisions.

Preferred Qualifications

- Experience building platforms or tools for Media & Entertainment workflows such as VFX, animation, rendering, or post-production pipelines.
- Familiarity with industry tools such as Maya, Houdini, Katana, Cinema 4D, V-Ray, Arnold, or RenderMan.
- Experience designing APIs, developer platforms, or automation frameworks used by technical users.
- Knowledge of GPU-accelerated compute workloads and distributed rendering systems.
- Experience working with Kubernetes, infrastructure-as-code, and large-scale cloud environments.

What Success Looks Like

- Engineering teams delivering reliable, scalable platforms used by media studios and creative teams globally.
- Clear alignment between product vision, customer workflows, and engineering execution.
- Platforms capable of supporting large-scale production workloads with high performance and reliability.
- Strong engineering culture focused on innovation, collaboration, and operational excellence.

Wondering if you’re a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren't a 100% skill or experience match.

Why CoreWeave?

At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:

- Be Curious at Your Core
- Act Like an Owner
- Empower Employees
- Deliver Best-in-Class Client Experiences
- Achieve More Together

We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and provides the opportunity to develop innovative solutions to complex problems. As we get set for takeoff, the growth opportunities within the organization are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!

The base salary range for this role is $206,000 to $303,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation.

Employment type: full-time
Level: senior
Workplace: hybrid
Salary: $206,000 - $303,000
Skills: Cloud infrastructure; Microservices architecture; Automation technologies; Enterprise SaaS or PaaS products; Leadership; Communication; Cross-functional collaboration; Strategic decision-making; Media & Entertainment workflows; VFX, animation, rendering, or post-production pipelines; Industry tools such as Maya, Houdini, Katana, Cinema 4D, V-Ray, Arnold, or RenderMan; APIs, developer platforms, or automation frameworks; GPU-accelerated compute workloads and distributed rendering systems; Kubernetes, infrastructure-as-code, and large-scale cloud environments
Categories: Engineering, Technology
Company: CoreWeave
Logo: https://logos.yubhub.co/coreweave.com.png
About: CoreWeave is a cloud computing company that provides a platform for artificial intelligence (AI) and machine learning (ML) workloads.
Website: https://www.coreweave.com
Job posting: https://job-boards.greenhouse.io/coreweave/jobs/4666156006
Location: Livingston, NJ / New York, NY / San Francisco, CA / Sunnyvale, CA / Bellevue, WA
Date: 2026-04-18

Software Engineer, Agent Infrastructure

Location

San Francisco; New York City

Employment Type

Full time

Department

Scaling

Compensation

$230K – $385K • Offers Equity

The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.

Benefits

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts

Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)

401(k) retirement plan with employer match

Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)

Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees

13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)

Mental health and wellness support

Employer-paid basic life and disability coverage

Annual learning and development stipend to fuel your professional growth

Daily meals in our offices, and meal delivery credits as eligible

Relocation support for eligible employees

Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

About the Team

The Agent Infrastructure team at OpenAI is responsible for building systems that enable training and deployment of highly useful AI agents, both internally and for the world.

We work hand-in-hand with researchers to design and scale the environment in which agentic models are trained – providing a workspace for AI models to execute code, debug issues, and develop software just as human SWEs do. Our training environment for agentic models operates at an extremely high scale and has the flexibility to emulate any environment in which an agent might work.

At the same time, our team builds and maintains OpenAI’s core platform for the deployment and execution of agents in production. Our systems power products such as Codex, Operator, tool use in ChatGPT, and future agentic products.

About the Role

As a Software Engineer on the Agent Infrastructure team, you will have the opportunity to work closely with both research and product at OpenAI - building and scaling systems to train highly capable agentic models, and building the platform and integrations to launch new agents to hundreds of millions of users worldwide.

Your work will consist of both building new capabilities - standing up the infrastructure and integrations needed to train more complex agentic models - and rapidly scaling these new capabilities to some of the largest compute clusters in the world. At the same time, you’ll be instrumental to the launch of agentic products at OpenAI - building, maintaining, and scaling the production platform on which all agents run.

Responsibilities

Push massive compute clusters to their limits. You will be a core contributor to a novel container orchestration platform built in-house by our team to scale far beyond what’s possible with systems like Kubernetes.

Develop and maintain FastAPI and gRPC APIs that serve as the interface for our agentic infrastructure used both in training and production.

Use Terraform to stand up and evolve complex infrastructure for both research and production.

Collaborate with research teams to stand up and optimize systems for novel AI training runs and experimental applications.

Requirements

Have deep experience working on large-scale machine learning infrastructure. You know how to reason about training at scale, identifying bottlenecks and engineering solutions to optimize system performance in training environments.

Know how to build new things from 0-1 quickly, and then scale them 1,000,000x.

Have a keen eye for performance and optimization. You know how to squeeze the most performance out of complex, globally-distributed systems.

Know your way around cloud platforms and work with infrastructure-as-code tech like Terraform.

Are driven by solving complex, ambiguous problems at the intersection of infrastructure scalability, virtualization efficiency, and agentic capabilities.

Have deep technical expertise in virtualization and containerization technologies (e.g. Kata, Firecracker, gVisor, Sysbox) and are passionate about optimizing runtime performance.

What We Offer

Competitive salary and equity package

Opportunity to work on cutting-edge AI infrastructure

Collaborative and dynamic team environment

Flexible work arrangements

Professional development opportunities

Access to the latest technology and tools

How to Apply

If you are a motivated and experienced software engineer looking to join a dynamic team and work on cutting-edge AI infrastructure, please submit your application. We look forward to hearing from you!

Employment type: full-time
Level: senior
Workplace: hybrid
Salary: $230K – $385K
Skills: large-scale machine learning infrastructure, container orchestration, FastAPI, gRPC, Terraform, cloud platforms, infrastructure-as-code, virtualization, containerization, Kata, Firecracker, gVisor, Sysbox, AI infrastructure, agentic models, training environments, compute clusters, performance optimization, runtime performance
Categories: Engineering, Technology
Company: OpenAI
Logo: https://logos.yubhub.co/openai.com.png
About: OpenAI is a technology company that specializes in artificial intelligence. It was founded in 2015 and is headquartered in San Francisco, California.
Website: https://jobs.ashbyhq.com
Job posting: https://jobs.ashbyhq.com/openai/c1316397-25bb-4add-9e9d-0e3ea8ba929a
Location: San Francisco; New York City
Date: 2026-03-06