{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/gpu-performance"},"x-facet":{"type":"skill","slug":"gpu-performance","display":"Gpu Performance","count":7},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_04ee7215-acf"},"title":"Sr. Manager, Engineering - Model Serving","description":"<p>At Databricks, we enable data teams to solve the world&#39;s toughest problems by building and running the world&#39;s best data and AI infrastructure platform. Our Model Serving product provides enterprises with a unified, scalable, and governed platform to deploy and manage AI/ML models. As a Senior Engineering Manager, you will lead the team owning both the product experience and the foundational infrastructure of Model Serving, shaping customer-facing capabilities while designing for scalability, extensibility, and performance across both CPU and GPU inference. The impact you will have includes leading, mentoring, and growing a high-performing engineering team, defining and owning the product and technical roadmap for Model Serving, collaborating closely with product, research, platform, and infrastructure teams, and ensuring Model Serving meets stringent SLAs, SLOs, and performance and reliability goals.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Leading, mentoring, and growing a high-performing engineering team responsible for both the customer-facing Model Serving product and its foundational infrastructure.</li>\n<li>Defining and owning the product and technical roadmap for Model Serving, balancing customer experience, functionality, and foundational investments across deployment, inference, monitoring, and scaling.</li>\n<li>Collaborating closely with product, research, platform, and infrastructure teams to drive end-to-end delivery from ideation and prioritization to launch and operation.</li>\n<li>Ensuring Model Serving meets stringent SLAs, SLOs, and performance and reliability goals, continuously improving operational efficiency and customer experience.</li>\n<li>Driving architectural decisions and product design around latency, throughput, autoscaling, GPU/CPU placement, and cost optimization.</li>\n<li>Advocating for customer needs through direct engagement, ensuring engineering decisions translate to clear product impact.</li>\n<li>Promoting best practices in code quality, testing, observability, and operational readiness.</li>\n<li>Fostering a culture of excellence, inclusion, and continuous improvement across the team.</li>\n<li>Partnering with recruiting to attract, hire, and develop top-tier engineering talent.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_04ee7215-acf","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Databricks","sameAs":"https://databricks.com","logo":"https://logos.yubhub.co/databricks.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/databricks/jobs/8211957002","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$217,000-$312,200 USD","x-skills-required":["technical leadership","large-scale distributed systems","real-time serving systems","architectural design","operational excellence","production systems","SLAs","SLOs","GPU performance optimization","concurrency","caching","scalability concepts"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:58:02.797Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"technical leadership, large-scale distributed systems, real-time serving systems, architectural design, operational excellence, production systems, SLAs, SLOs, GPU performance optimization, concurrency, caching, scalability concepts","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":217000,"maxValue":312200,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5f6e6eac-370"},"title":"Sr GPU Infrastructure Software Engineer","description":"<p>CoreWeave is seeking a Senior GPU Infrastructure Software Engineer to join our team. As a senior engineer, you will be responsible for leading designs, raising engineering standards, and delivering measurable improvements to latency, throughput, and reliability across multiple services. 
You will partner with fleet, product, and hardware teams to evolve our GPU performance testing platform to ensure we deliver a reliable and performant experience to our customers.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Design and implement solutions to problems of scale for testing and validation of CoreWeave&#39;s global infrastructure.</li>\n<li>Design and develop Kubernetes-native controllers and operators to automate infrastructure workflows.</li>\n<li>Build and maintain scalable backend services and APIs (gRPC/REST) in Go or Python.</li>\n<li>Develop performance tests and automation workflows to expand hardware validation across the CoreWeave fleet.</li>\n<li>Write and maintain Kubernetes custom controllers and operators to automate infrastructure testing.</li>\n<li>Adapt and extend open source tooling to enhance visibility into system metrics, performance, and health.</li>\n</ul>\n<p>To be successful in this role, you should have:</p>\n<ul>\n<li>5 to 8 years of experience.</li>\n<li>Proficiency in Go and/or Python software development.</li>\n<li>Hands-on experience with writing Kubernetes operators/controllers.</li>\n</ul>\n<p>Nice to have:</p>\n<ul>\n<li>Experience testing hardware at scale.</li>\n<li>HPC Experience.</li>\n<li>Experience with AI/ML infrastructure and training / inference.</li>\n</ul>","url":"https://yubhub.co/jobs/job_5f6e6eac-370","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4627277006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $242,000","x-skills-required":["Go","Python","Kubernetes","GPU performance testing","infrastructure validation"],"x-skills-preferred":["HPC Experience","AI/ML infrastructure","training / inference"],"datePosted":"2026-04-18T15:53:07.770Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Go, Python, Kubernetes, GPU performance testing, infrastructure validation, HPC Experience, AI/ML infrastructure, training / inference","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":242000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ff4d3a91-b20"},"title":"Principal Engineer - Perf and Benchmarking","description":"<p>We&#39;re looking for a Principal Engineer to be the technical lead of CoreWeave&#39;s Benchmarking &amp; Performance team. 
You will be responsible for our planet-scale performance data warehouse: ingesting, storing, transforming, and analyzing performance events in all the data centers across our global infrastructure.</p>\n<p>You will also be an integral part of achieving industry-leading end-to-end performance benchmarking publications: if MLPerf (Training &amp; Inference) submissions, working closely with NVIDIA (Megatron-LM, TensorRT-LLM &amp; DGX Cloud), and collaborating with the open-source community (llm-d, vLLM, and all popular ML frameworks) speak to you, come help us demonstrate CoreWeave&#39;s performance and reliability leadership in the field.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Strategy &amp; Leadership - Define the multi-year benchmarking strategy and roadmap; prioritize models/workloads (LLMs, diffusion, vision, speech) and hardware tiers. Build, lead, and mentor a high-performing team of performance engineers and data analysts. Establish governance for claims: documented methodologies, versioning, reproducibility, and audit trails.</li>\n</ul>\n<ul>\n<li>Perf Ownership - Lead end-to-end MLPerf Inference and Training submissions: workload selection, cluster planning, runbooks, audits, and result publication. Coordinate optimization tracks with NVIDIA (CUDA, cuDNN, TensorRT/TensorRT-LLM, Triton, NCCL) to hit competitive results; drive upstream fixes where needed.</li>\n</ul>\n<ul>\n<li>Internal Latency &amp; Throughput Benchmarks - Design a Kubernetes-native, repeatable benchmarking service that exercises CoreWeave stacks across SUNK (Slurm on Kubernetes), Kueue, and Kubeflow pipelines. Measure and report p50/p95/p99 latency, jitter, tokens/s, time-to-first-token, cold-start/warm-start, and cost-per-token/request across models, precisions (BF16/FP8/FP4), batch sizes, and GPU types. Maintain a corpus of representative scenarios (streaming, batch, multi-tenant) and data sets; automate comparisons across software releases and hardware generations.</li>\n</ul>\n<ul>\n<li>Tooling &amp; Automation - Build CI/CD pipelines and K8s controllers/operators to schedule benchmarks at scale; integrate with observability stacks (Prometheus, Grafana, OpenTelemetry) and results warehouses. Implement supply-chain integrity for benchmark artifacts (SBOMs, Cosign signatures).</li>\n</ul>\n<ul>\n<li>Cross-functional &amp; Community - Partner with NVIDIA, key ISVs, and OSS projects (vLLM, Triton, KServe, PyTorch/DeepSpeed, ONNX Runtime) to co-develop optimizations and upstream improvements. 
Support Sales/SEs with authoritative numbers for RFPs and competitive evaluations; brief analysts and press with rigorous, defensible data.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>10+ years building distributed systems or HPC/cloud services, with deep expertise on large-scale ML training or similar high-performance workloads.</li>\n</ul>\n<ul>\n<li>Proven track record of architecting or building planet-scale data systems (e.g., telemetry platforms, observability stacks, cloud data warehouses, large-scale OLAP engines).</li>\n</ul>\n<ul>\n<li>Deep understanding of GPU performance (CUDA, NCCL, RDMA, NVLink/PCIe, memory bandwidth), model-server stacks (Triton, vLLM, TensorRT-LLM, TorchServe), and distributed training frameworks (PyTorch FSDP/DeepSpeed/Megatron-LM).</li>\n</ul>\n<ul>\n<li>Proficient with Kubernetes and ML control planes; familiarity with SUNK, Kueue, and Kubeflow in production environments.</li>\n</ul>\n<ul>\n<li>Excellent communicator able to interface with executives, customers, auditors, and OSS communities.</li>\n</ul>\n<p><strong>Nice to have</strong></p>\n<ul>\n<li>Experience with time-series databases, log-structured merge trees (LSM), or custom storage engine development.</li>\n</ul>\n<ul>\n<li>Experience running MLPerf submissions (Inference and/or Training) or equivalent audited benchmarks at scale.</li>\n</ul>\n<ul>\n<li>Contributions to MLPerf, Triton, vLLM, PyTorch, KServe, or similar OSS projects.</li>\n</ul>\n<ul>\n<li>Experience benchmarking multi-region fleets and large clusters (thousands of GPUs).</li>\n</ul>\n<ul>\n<li>Publications/talks on ML performance, latency engineering, or large-scale benchmarking methodology.</li>\n</ul>","url":"https://yubhub.co/jobs/job_ff4d3a91-b20","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4627302006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$206,000 to $333,000","x-skills-required":["Distributed systems","HPC/cloud services","Large-scale ML training","GPU performance","Model-server stacks","Distributed training frameworks","Kubernetes","ML control planes","Time-series databases","Log-structured merge trees","Custom storage engine development"],"x-skills-preferred":["MLPerf submissions","Audited benchmarks","Contributions to OSS projects","Benchmarking multi-region fleets","Large clusters","Publications/talks on ML performance"],"datePosted":"2026-04-18T15:51:17.448Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Distributed systems, HPC/cloud services, Large-scale ML training, GPU performance, Model-server stacks, Distributed training frameworks, Kubernetes, ML control planes, Time-series databases, Log-structured merge trees, Custom storage engine development, MLPerf submissions, Audited benchmarks, Contributions to OSS projects, Benchmarking multi-region fleets, Large clusters, Publications/talks on ML 
performance","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":206000,"maxValue":333000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_71554e46-b64"},"title":"Senior Engineering Manager, AI Runtime","description":"<p>At Databricks, we are committed to enabling data teams to solve the world&#39;s toughest problems. As a Senior Engineering Manager, you will lead the team owning both the product experience and the foundational infrastructure of our AI Runtime (AIR) product.</p>\n<p>You will be responsible for shaping customer-facing capabilities while designing for scalability, extensibility, and performance of GPU training and adjacent areas. This will involve collaborating closely across the platform, product, infrastructure, and research organisations.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Leading, mentoring, and growing a high-performing engineering team responsible for the Custom Training product and its foundational infrastructure</li>\n<li>Defining and owning the product and technical roadmap for AIR, balancing customer experience, functionality, and foundational investments</li>\n<li>Collaborating closely with product, research, platform, infrastructure teams, and customers to drive end-to-end delivery</li>\n<li>Driving architectural decisions and product design for managed GPU training at scale</li>\n<li>Advocating for customer needs through direct engagement, ensuring engineering decisions translate to clear product impact</li>\n</ul>\n<p>We are looking for someone with 8+ years of software engineering experience, with 3+ years in engineering management. You should have a track record of building and operating managed GPU training infrastructure at scale, as well as deep familiarity with distributed training frameworks and parallelism strategies.</p>\n<p>In addition, you should have experience with training resilience patterns, such as checkpointing, elastic training, and automated failure recovery for long-running jobs. You should also have a strong understanding of GPU performance fundamentals, including NCCL, interconnect topologies, and memory optimisation.</p>\n<p>Experience building platform products with clear SLAs is also essential, as is strong cross-functional leadership across platform, product, and research teams. Excellent collaboration and communication skills are also required.</p>\n<p>The pay range for this role is $228,600-$314,250 USD per year, depending on location. 
The total compensation package may also include eligibility for annual performance bonus, equity, and benefits.</p>","url":"https://yubhub.co/jobs/job_71554e46-b64","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Databricks","sameAs":"https://databricks.com","logo":"https://logos.yubhub.co/databricks.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/databricks/jobs/8490282002","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$228,600-$314,250 USD per year","x-skills-required":["software engineering","engineering management","distributed training frameworks","parallelism strategies","GPU training infrastructure","checkpointing","elastic training","automated failure recovery","GPU performance fundamentals","NCCL","interconnect topologies","memory optimisation"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:45:28.312Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View, California; San Francisco, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, engineering management, distributed training frameworks, parallelism strategies, GPU training infrastructure, checkpointing, elastic training, automated failure recovery, GPU performance fundamentals, NCCL, interconnect topologies, memory optimisation","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":228600,"maxValue":314250,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f8883394-0fc"},"title":"Solutions Architect, AI and ML","description":"<p>We are looking for an experienced Cloud Solution Architect to help customers with adoption of GPU hardware and software, as well as building and deploying Machine Learning (ML), Deep Learning (DL), and data analytics solutions on various cloud computing platforms.</p>\n<p>As a Solutions Architect, you will engage directly with developers, researchers, and data scientists at some of NVIDIA’s most strategic technology customers, as well as work directly with business and engineering teams on product strategy.</p>\n<p><strong>Key Responsibilities:</strong></p>\n<ul>\n<li>Help cloud customers craft, deploy, and maintain scalable, GPU-accelerated inference pipelines on cloud ML services and Kubernetes for large language models (LLMs) and generative AI workloads.</li>\n<li>Enhance performance tuning using TensorRT/TensorRT-LLM, vLLM, Dynamo, and Triton Inference Server to improve GPU utilization and model efficiency.</li>\n<li>Collaborate with multi-functional teams (engineering, product) and offer technical mentorship to cloud customers implementing AI inference at scale.</li>\n<li>Build custom PoCs for solutions that address customers’ critical business needs, applying NVIDIA hardware and software technology.</li>\n<li>Partner with Sales Account Managers or Developer Relations Managers to identify and secure new business opportunities for NVIDIA products and solutions for ML/DL and other software solutions.</li>\n<li>Prepare and deliver technical content to customers including presentations about purpose-built solutions, workshops about NVIDIA products and solutions, etc.</li>\n<li>Conduct regular technical customer 
meetings for project/product roadmap, feature discussions, and intro to new technologies. Establish close technical ties to the customer to facilitate rapid resolution of customer issues.</li>\n</ul>\n<p><strong>Requirements:</strong></p>\n<ul>\n<li>BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Statistics, Physics, or other Engineering fields or equivalent experience.</li>\n<li>3+ years in Solutions Architecture with a proven track record of moving AI inference from PoC to production in cloud computing environments including AWS, GCP, or Azure.</li>\n<li>3+ years of hands-on experience with Deep Learning frameworks such as PyTorch and TensorFlow.</li>\n<li>Excellent knowledge of the theory and practice of LLM and DL inference.</li>\n<li>Strong fundamentals in programming, optimizations, and software design, especially in Python.</li>\n<li>Experience with containerization and orchestration technologies like Docker and Kubernetes, monitoring, and observability solutions for AI deployments.</li>\n<li>Knowledge of Inference technologies - NVIDIA NIM, TensorRT-LLM, Dynamo, Triton Inference Server, vLLM, etc.</li>\n<li>Strong problem-solving and debugging skills in GPU environments.</li>\n<li>Excellent presentation, communication, and collaboration skills.</li>\n</ul>\n<p><strong>Nice to Have:</strong></p>\n<ul>\n<li>AWS, GCP, or Azure Professional Solution Architect Certification.</li>\n<li>Experience optimizing and deploying large MoE LLMs at scale.</li>\n<li>Active contributions to open-source AI inference projects (e.g., vLLM, TensorRT-LLM, Dynamo, SGLang, Triton, or similar).</li>\n<li>Experience with Multi-GPU Multi-node Inference technologies like Tensor Parallelism/Expert Parallelism, Disaggregated Serving, LWS, MPI, EFA/Infiniband, NVLink/PCIe, etc.</li>\n<li>Experience in developing and integrating monitoring and alerting solutions using Prometheus, Grafana, and NVIDIA DCGM, and GPU performance analysis with tools like NVIDIA Nsight Systems.</li>\n</ul>","url":"https://yubhub.co/jobs/job_f8883394-0fc","directApply":true,"hiringOrganization":{"@type":"Organization","name":"NVIDIA","sameAs":"https://nvidia.wd5.myworkdayjobs.com","logo":"https://logos.yubhub.co/nvidia.com.png"},"x-apply-url":"https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-WA-Redmond/Solutions-Architect--AI-and-ML_JR2005988-1","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Cloud Solution Architecture","GPU hardware and Software","Machine Learning (ML)","Deep Learning (DL)","Data Analytics","Cloud Computing Platforms","Kubernetes","TensorRT","TensorRT-LLM","vLLM","Dynamo","Triton Inference Server","Python","Containerization","Orchestration","Monitoring","Observability","Inference technologies","NVIDIA NIM","Problem-solving","Debugging","GPU environments"],"x-skills-preferred":["AWS","GCP","Azure","Professional Solution Architect Certification","Large MoE LLMs","Open-source AI inference projects","Multi-GPU Multi-node Inference technologies","Monitoring and alerting solutions","Prometheus","Grafana","NVIDIA DCGM","GPU performance Analysis","NVIDIA Nsight Systems"],"datePosted":"2026-03-09T20:45:22.711Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Redmond, WA, Santa Clara, 
Seattle"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cloud Solution Architecture, GPU hardware and Software, Machine Learning (ML), Deep Learning (DL), Data Analytics, Cloud Computing Platforms, Kubernetes, TensorRT, TensorRT-LLM, vLLM, Dynamo, Triton Inference Server, Python, Containerization, Orchestration, Monitoring, Observability, Inference technologies, NVIDIA NIM, Problem-solving, Debugging, GPU environments, AWS, GCP, Azure, Professional Solution Architect Certification, Large MoE LLMs, Open-source AI inference projects, Multi-GPU Multi-node Inference technologies, Monitoring and alerting solutions, Prometheus, Grafana, NVIDIA DCGM, GPU performance Analysis, NVIDIA Nsight Systems"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_989f992b-6b2"},"title":"Software Engineer, Inference – AMD GPU Enablement","description":"<p><strong>Software Engineer, Inference – AMD GPU Enablement</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$295K – $555K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the Team</strong></p>\n<p>Our Inference team brings OpenAI’s most capable research and technology to the 
world through our products. We empower consumers, enterprises and developers alike to use and access our state-of-the-art AI models, allowing them to do things that they’ve never been able to before. We focus on performant and efficient model inference, as well as accelerating research progression via model inference.</p>\n<p><strong>About the Role</strong></p>\n<p>We’re hiring engineers to scale and optimize OpenAI’s inference infrastructure across emerging GPU platforms. You’ll work across the stack - from low-level kernel performance to high-level distributed execution - and collaborate closely with research, infra, and performance teams to ensure our largest models run smoothly on new hardware.</p>\n<p>This is a high-impact opportunity to shape OpenAI’s multi-platform inference capabilities from the ground up with a particular focus on advancing inference performance on AMD accelerators.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Own bring-up, correctness and performance of the OpenAI inference stack on AMD hardware.</li>\n</ul>\n<ul>\n<li>Integrate internal model-serving infrastructure (e.g., vLLM, Triton) into a variety of GPU-backed systems.</li>\n</ul>\n<ul>\n<li>Debug and optimize distributed inference workloads across memory, network, and compute layers.</li>\n</ul>\n<ul>\n<li>Validate correctness, performance, and scalability of model execution on large GPU clusters.</li>\n</ul>\n<ul>\n<li>Collaborate with partner teams to design and optimize high-performance GPU kernels for accelerators using HIP, Triton, or other performance-focused frameworks.</li>\n</ul>\n<ul>\n<li>Collaborate with partner teams to build, integrate and tune collective communication libraries (e.g., RCCL) used to parallelize model execution across many GPUs.</li>\n</ul>\n<p><strong>You can thrive in this role if you:</strong></p>\n<ul>\n<li>Have experience writing or porting GPU kernels using HIP, CUDA, or Triton, and care deeply about low-level performance.</li>\n</ul>\n<ul>\n<li>Are familiar with communication libraries like NCCL/RCCL and understand their role in high-throughput model serving.</li>\n</ul>\n<ul>\n<li>Have worked on distributed inference systems and are comfortable scaling models across fleets of accelerators.</li>\n</ul>\n<ul>\n<li>Enjoy solving end-to-end performance challenges across hardware, system libraries, and orchestration layers.</li>\n</ul>\n<ul>\n<li>Are excited to be part of a small, fast-moving team building new infrastructure from first principles.</li>\n</ul>\n<p><strong>Nice to Have:</strong></p>\n<ul>\n<li>Contributions to open-source libraries like RCCL, Triton, or vLLM.</li>\n</ul>\n<ul>\n<li>Experience with GPU performance tools (Nsight, rocprof, perf) and memory/comms profiling.</li>\n</ul>\n<ul>\n<li>Prior experience deploying inference on other non-NVIDIA GPU environments.</li>\n</ul>\n<ul>\n<li>Knowledge of model/tensor parallelism, mixed precision, and serving 10B+ parameter models.</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>","url":"https://yubhub.co/jobs/job_989f992b-6b2","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/9b79406c-89a8-49bd-8a38-e72db80996e9","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$295K – $555K • Offers Equity","x-skills-required":["GPU kernels","HIP","CUDA","Triton","NCCL/RCCL","distributed inference systems","GPU performance tools","memory/comms profiling"],"x-skills-preferred":["open-source libraries","GPU performance tools","memory/comms profiling"],"datePosted":"2026-03-06T18:28:36.084Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"GPU kernels, HIP, CUDA, Triton, NCCL/RCCL, distributed inference systems, GPU performance tools, memory/comms profiling, open-source libraries","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":295000,"maxValue":555000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_4d9cbf90-719"},"title":"Principal Software Engineer","description":"<p><strong>Summary</strong></p>\n<p>Microsoft is looking for a highly experienced Principal Software Engineer to join its Ads Engineering Platform team to design and build next-generation Ads products that drive revenue growth and create innovative advertising experiences for users and advertisers.</p>\n<p><strong>About the Role</strong></p>\n<p>We are seeking a highly experienced software engineer to join our Ads Engineering Platform team to design and build next-generation Ads products that drive revenue growth and create innovative advertising experiences for users and advertisers. You will play a key role in evolving the core capabilities of our ad-serving infrastructure—the engine that powers advertising across Bing Search, MSN, Microsoft Start, and shopping experiences in Microsoft Edge. Our serving stack operates at massive global scale, delivering millions of ad requests per second through a geo-distributed, low-latency system that integrates real-time bidding, intelligent ranking, and ML-driven decisioning pipelines. We leverage a mix of CPU and GPU-based inference to balance latency, throughput, and cost efficiency. This role combines product innovation, distributed systems architecture, and performance engineering. 
You will help shape both new monetization capabilities and the next generation of model serving infrastructure that powers them.</p>\n<p><strong>Accountabilities</strong></p>\n<ul>\n<li>Design and build new Ads products and monetization capabilities that unlock incremental revenue and enhance advertiser and end-user experiences.</li>\n<li>Lead the development of large-scale, distributed online serving systems to process millions of ad requests per second with ultra-low latency and high reliability.</li>\n</ul>\n<p><strong>The Candidate we&#39;re looking for</strong></p>\n<p><strong>Experience:</strong></p>\n<ul>\n<li>10+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.</li>\n</ul>\n<p><strong>Technical skills:</strong></p>\n<ul>\n<li>Proven experience designing and operating real-time online serving or ranking systems.</li>\n<li>Strong understanding of distributed systems fundamentals: concurrency, multi-threading, memory management, networking, and fault tolerance.</li>\n</ul>\n<p><strong>Personal attributes:</strong></p>\n<ul>\n<li>Demonstrated ability to diagnose performance bottlenecks and improve latency, throughput, and cost efficiency in high-traffic systems.</li>\n</ul>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Competitive salary range of $163,000 - $296,400 per year.</li>\n<li>Benefits and other compensation.</li>\n<li>Opportunities for professional growth and development.</li>\n</ul>","url":"https://yubhub.co/jobs/job_4d9cbf90-719","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Microsoft","sameAs":"https://microsoft.ai","logo":"https://logos.yubhub.co/microsoft.ai.png"},"x-apply-url":"https://microsoft.ai/job/principal-software-engineer-32/","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$163,000 - $296,400 per year","x-skills-required":["C","C++","C#","Java","JavaScript","Python","distributed systems","real-time online serving","ranking systems"],"x-skills-preferred":["GPU performance optimization","model quantization","efficient resource scheduling"],"datePosted":"2026-03-06T07:34:43.749Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Redmond"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"C, C++, C#, Java, JavaScript, Python, distributed systems, real-time online serving, ranking systems, GPU performance optimization, model quantization, efficient resource scheduling","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":163000,"maxValue":296400,"unitText":"YEAR"}}}]}