{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/runtime-systems"},"x-facet":{"type":"skill","slug":"runtime-systems","display":"Runtime Systems","count":3},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_faffae87-882"},"title":"Staff Software Engineer - GenAI Performance and Kernel","description":"<p>As a staff software engineer for GenAI Performance and Kernel, you will own the design, implementation, optimization, and correctness of the high-performance GPU kernels powering our GenAI inference stack. You will lead development of highly-tuned, low-level compute paths, manage trade-offs between hardware efficiency and generality, and mentor others in kernel-level performance engineering.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Leading the design, implementation, benchmarking, and maintenance of core compute kernels optimized for various hardware backends (GPU, accelerators)</li>\n<li>Driving the performance roadmap for kernel-level improvements: vectorization, tensorization, tiling, fusion, mixed precision, sparsity, quantization, memory reuse, scheduling, auto-tuning, etc.</li>\n<li>Integrating kernel optimizations with higher-level ML systems</li>\n<li>Building and maintaining profiling, instrumentation, and verification tooling to detect correctness, performance regressions, numerical issues, and hardware utilization gaps</li>\n<li>Leading performance investigations and root-cause analysis on inference bottlenecks, e.g. memory bandwidth, cache contention, kernel launch overhead, tensor fragmentation</li>\n<li>Establishing coding patterns, abstractions, and frameworks to modularize kernels for reuse, cross-backend portability, and maintainability</li>\n<li>Influencing system architecture decisions to make kernel improvements more effective (e.g. memory layout, dataflow scheduling, kernel fusion boundaries)</li>\n<li>Mentoring and guiding other engineers working on lower-level performance, providing code reviews, and helping set best practices</li>\n<li>Collaborating with infrastructure, tooling, and ML teams to roll out kernel-level optimizations into production, and monitoring their impact</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>BS/MS/PhD in Computer Science, or a related field</li>\n<li>Deep hands-on experience writing and tuning compute kernels (CUDA, Triton, OpenCL, LLVM IR, assembly or similar sort) for ML workloads</li>\n<li>Strong knowledge of GPU/accelerator architecture: warp structure, memory hierarchy (global, shared, register, L1/L2 caches), tensor cores, scheduling, SM occupancy, etc.</li>\n<li>Experience with advanced optimization techniques: tiling, blocking, software pipelining, vectorization, fusion, loop transformations, auto-tuning</li>\n<li>Familiarity with ML-specific kernel libraries (cuBLAS, cuDNN, CUTLASS, oneDNN, etc.) 
or open kernels</li>\n<li>Strong debugging and profiling skills (Nsight, NVProf, perf, VTune, custom instrumentation)</li>\n<li>Experience reasoning about numerical stability, mixed precision, quantization, and error propagation</li>\n<li>Experience in integrating optimized kernels into real-world ML inference systems; exposure to distributed inference pipelines, memory management, and runtime systems</li>\n<li>Experience building high-performance products leveraging GPU acceleration</li>\n<li>Excellent communication and leadership skills, with the ability to drive design discussions, mentor colleagues, and make trade-offs visible</li>\n<li>A track record of shipping performance-critical, high-quality production software</li>\n<li>Bonus: published in systems/ML performance venues (e.g. MLSys, ASPLOS, ISCA, PPoPP), experience with custom accelerators or FPGAs, experience with sparsity or model compression techniques</li>\n</ul>\n<p>The pay range for this role is $190,900-$232,800 USD per year, depending on location and experience.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_faffae87-882","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Databricks","sameAs":"https://databricks.com","logo":"https://logos.yubhub.co/databricks.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/databricks/jobs/8202700002","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$190,900-$232,800 USD per year","x-skills-required":["Compute kernels","GPU/accelerator architecture","Advanced optimization techniques","ML-specific kernel libraries","Debugging and profiling skills","Numerical stability","Mixed precision","Quantization","Error propagation","Distributed inference pipelines","Memory management","Runtime systems","High-performance products","GPU acceleration"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:46:07.442Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Compute kernels, GPU/accelerator architecture, Advanced optimization techniques, ML-specific kernel libraries, Debugging and profiling skills, Numerical stability, Mixed precision, Quantization, Error propagation, Distributed inference pipelines, Memory management, Runtime systems, High-performance products, GPU acceleration","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":190900,"maxValue":232800,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_35f8645a-02b"},"title":"Software Engineer, Hardware","description":"<p><strong>Job Posting</strong></p>\n<p><strong>Software Engineer, Hardware</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$266K – $455K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong><strong>About the Team</strong></strong></p>\n<p>OpenAI’s Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI-native silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI’s supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI.</p>\n<p><strong><strong>About the Role</strong></strong></p>\n<p>As a software engineer on the Scaling team, you’ll help build and optimize the low-level stack that orchestrates computation and data movement across OpenAI’s supercomputing clusters. Your work will involve designing high-performance runtimes, building custom kernels, contributing to compiler infrastructure, and developing scalable simulation systems to validate and optimize distributed training workloads.</p>\n<p>You will work at the intersection of systems programming, ML infrastructure, and high-performance computing, helping to create both ergonomic developer APIs and highly efficient runtime systems. This means balancing ease of use and introspection with the need for stability and performance on our evolving hardware fleet.</p>\n<p>This role is based in San Francisco, CA, with a hybrid work model (3 days/week in-office). 
Relocation assistance is available.</p>\n<p><strong><strong>In this role, you will:</strong></strong></p>\n<ul>\n<li>Design and build APIs and runtime components to orchestrate computation and data movement across heterogeneous ML workloads.</li>\n</ul>\n<ul>\n<li>Contribute to compiler infrastructure, including the development of optimizations and compiler passes to support evolving hardware.</li>\n</ul>\n<ul>\n<li>Engineer and optimize compute and data kernels, ensuring correctness, high performance, and portability across simulation and production environments.</li>\n</ul>\n<ul>\n<li>Profile and optimize system bottlenecks, especially around I/O, memory hierarchy, and interconnects, at both local and distributed scales.</li>\n</ul>\n<ul>\n<li>Develop simulation infrastructure to validate runtime behaviors, test training stack changes, and support early-stage hardware and system development.</li>\n</ul>\n<ul>\n<li>Rapidly deploy runtime and compiler updates to new supercomputing builds in close collaboration with hardware and research teams.</li>\n</ul>\n<ul>\n<li>Work across a diverse stack, primarily using Rust and Python, with opportunities to influence architecture decisions across the training framework.</li>\n</ul>\n<p><strong><strong>You might thrive in this role if you:</strong></strong></p>\n<ul>\n<li>Have a deep curiosity for how large-scale systems work and enjoy making them faster, simpler, and more reliable.</li>\n</ul>\n<ul>\n<li>Are proficient in systems programming (e.g., Rust, C++) and scripting languages like Python.</li>\n</ul>\n<ul>\n<li>Have experience in one or more of the following areas: compiler development, kernel authoring, accelerator programming, runtime systems, distributed systems, or high-performance simulation.</li>\n</ul>\n<ul>\n<li>Are excited to work in a fast-paced, highly collaborative environment with evolving hardware and ML system demands.</li>\n</ul>\n<ul>\n<li>Value engineering excellence, technical leadership, and thoughtful system design.</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_35f8645a-02b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/778e3a4f-c318-4a79-a745-00e722e5ee47","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$266K – $455K • Offers Equity","x-skills-required":["Rust","C++","Python","compiler development","kernel authoring","accelerator programming","runtime systems","distributed systems","high-performance simulation"],"x-skills-preferred":["systems programming","ML infrastructure","high-performance computing"],"datePosted":"2026-03-06T18:28:45.571Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Rust, C++, Python, compiler development, kernel authoring, accelerator programming, runtime systems, distributed systems, high-performance simulation, systems programming, ML infrastructure, high-performance computing","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":266000,"maxValue":455000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_4e51470c-8f1"},"title":"Software Engineer, Accelerators","description":"<p><strong>Software Engineer, Accelerators</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$295K – $380K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>The Kernels team at OpenAI builds the low-level software that accelerates our most ambitious AI research.</p>\n<p>We work at the boundary of hardware and software, developing high-performance kernels, distributed system optimizations, and runtime improvements to make large-scale training and inference more efficient.</p>\n<p>Our work enables OpenAI to push the limits by ensuring models - from LLMs to recommender systems - run reliably on advanced supercomputing platforms. That includes adapting our software stack to new types of accelerators, tuning system performance end-to-end, and removing bottlenecks across every layer of the stack.</p>\n<p><strong>About the Role</strong></p>\n<p>On the Accelerators team, you will help OpenAI evaluate and bring up new compute platforms that can support large-scale AI training and inference.</p>\n<p>Your work will range from prototyping system software on new accelerators to enabling performance optimizations across our AI workloads.</p>\n<p>You’ll work across the stack, covering both hardware and software aspects - working on kernels, sharding strategies, scaling across distributed systems, and performance modeling.</p>\n<p>You&#39;ll help adapt OpenAI&#39;s software stack to non-traditional hardware and drive efficiency improvements in core AI workloads. 
This is not a compiler-focused role; rather, it bridges ML algorithms with system performance - especially at scale.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Prototype and enable OpenAI&#39;s AI software stack on new, exploratory accelerator platforms.</li>\n</ul>\n<ul>\n<li>Optimize large-scale model performance (LLMs, recommender systems, distributed AI workloads) for diverse hardware environments.</li>\n</ul>\n<ul>\n<li>Develop kernels, sharding mechanisms, and system scaling strategies tailored to emerging accelerators.</li>\n</ul>\n<ul>\n<li>Collaborate on optimizations at the model code level (e.g. PyTorch) and below to enhance performance on non-traditional hardware.</li>\n</ul>\n<ul>\n<li>Perform system-level performance modeling, debug bottlenecks, and drive end-to-end optimization.</li>\n</ul>\n<ul>\n<li>Work with hardware teams and vendors to evaluate alternatives to existing platforms and adapt the software stack to their architectures.</li>\n</ul>\n<ul>\n<li>Contribute to runtime improvements, compute/communication overlapping, and scaling efforts for frontier AI workloads.</li>\n</ul>\n<p><strong>You might thrive in this role if you have:</strong></p>\n<ul>\n<li>3+ years of experience working on AI infrastructure, including kernels, systems, or hardware-software co-design.</li>\n</ul>\n<ul>\n<li>Hands-on experience with accelerator platforms for AI at data center scale (e.g., TPUs, custom silicon, exploratory architectures).</li>\n</ul>\n<ul>\n<li>Strong understanding of kernels, sharding, runtime systems, or distributed scaling techniques.</li>\n</ul>\n<ul>\n<li>Familiarity with optimizing LLMs, CNNs, or recommender models for hardware efficiency.</li>\n</ul>\n<ul>\n<li>Experience with performance modeling, system debugging, and software stack adaptation for novel architectures.</li>\n</ul>\n<ul>\n<li>Exposure to mobile accelerators is welcome, but experience enabling data center-scale AI hardware is preferred.</li>\n</ul>\n<ul>\n<li>Ability to operate across multiple levels of the stack, rapidly prototype solutions, and navigate ambiguity in early hardware bring-up phases.</li>\n</ul>\n<ul>\n<li>Interest in shaping the future of AI compute through exploration of alternatives to mainstream accelerators.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_4e51470c-8f1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/f386b209-1259-4b79-bf5a-aa97fc7ce77b","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$295K – $380K • Offers Equity","x-skills-required":["AI infrastructure","kernels","systems","hardware-software co-design","accelerator platforms","TPUs","custom silicon","exploratory architectures","kernels","sharding","runtime systems","distributed scaling techniques","LLMs","CNNs","recommender models","hardware efficiency","performance modeling","system debugging","software stack adaptation","novel architectures"],"x-skills-preferred":["mobile accelerators","data center-scale AI hardware"],"datePosted":"2026-03-06T18:27:12.141Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"AI 
infrastructure, kernels, systems, hardware-software co-design, accelerator platforms, TPUs, custom silicon, exploratory architectures, kernels, sharding, runtime systems, distributed scaling techniques, LLMs, CNNs, recommender models, hardware efficiency, performance modeling, system debugging, software stack adaptation, novel architectures, mobile accelerators, data center-scale AI hardware","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":295000,"maxValue":380000,"unitText":"YEAR"}}}]}