{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/kernels"},"x-facet":{"type":"skill","slug":"kernels","display":"Kernels","count":14},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_7bde3fd8-78f"},"title":"Principal VM Engineer – Workers Runtime Team","description":"<p>About Us</p>\n<p>At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world&#39;s largest networks that powers millions of websites and other Internet properties for customers ranging from individual bloggers to SMBs to Fortune 500 companies.</p>\n<p>We protect and accelerate any Internet application online without adding hardware, installing software, or changing a line of code. Internet properties powered by Cloudflare all have web traffic routed through its intelligent global network, which gets smarter with every request. As a result, they see significant improvement in performance and a decrease in spam and other attacks.</p>\n<p>We were named to Entrepreneur Magazine&#39;s Top Company Cultures list and ranked among the World&#39;s Most Innovative Companies by Fast Company.</p>\n<p><strong>Available Locations:</strong></p>\n<p>Remote in US and Europe</p>\n<p><strong>Principal VM Engineer – Workers Runtime Team</strong></p>\n<p>About the Department</p>\n<p>The Emerging Technologies &amp; Incubation (ETI) team at Cloudflare builds and launches bold, new products that push the boundaries of what&#39;s possible on the internet. By leveraging Cloudflare&#39;s massive network and edge computing capabilities, we solve complex problems at a scale few others can achieve.</p>\n<p>About the Team</p>\n<p>The Workers Runtime team is responsible for the execution environment that runs customer code at the edge. We focus on performance, security, and scalability, enhancing JavaScript APIs, WebAssembly support, and system optimizations to prepare for the next 10x scale increase. Our runtime operates in a resource-constrained, highly secure environment, requiring careful management of memory, CPU, and I/O.</p>\n<p>What You&#39;ll Do</p>\n<p>We are looking for a VM Engineer to help improve and embed the V8 virtual machine in our runtime. You&#39;ll work on low-level optimizations, performance enhancements, garbage collection, and language support to ensure our platform remains cutting-edge. 
This role is ideal for engineers who love tackling high-performance, low-latency challenges in distributed environments.</p>\n<p>Key Responsibilities</p>\n<ul>\n<li>Optimize and embed the V8 VM within Cloudflare&#39;s Workers Runtime.</li>\n<li>Improve JavaScript execution performance and WebAssembly integration.</li>\n<li>Debug, optimize, and enhance low-latency, real-time environments.</li>\n<li>Ensure the reliability and efficiency of large-scale, Linux-based distributed systems.</li>\n<li>Collaborate with engineers across runtime, security, and networking teams to push the boundaries of edge computing.</li>\n</ul>\n<p>What We&#39;re Looking For</p>\n<ul>\n<li>6+ years of professional experience with C++.</li>\n<li>4+ years of hands-on VM/compiler experience, ideally with V8.</li>\n<li>Strong knowledge of computer science fundamentals, including data structures, algorithms, and system architecture.</li>\n<li>Experience with low-latency environments (e.g., game streaming, trading systems, high-performance computing).</li>\n<li>Operational mindset – you build scalable, production-ready solutions.</li>\n<li>Deep understanding of web technologies (HTTP, JavaScript, WASM).</li>\n</ul>\n<p>Bonus Points</p>\n<ul>\n<li>Experience working with Rust in high-performance distributed systems.</li>\n<li>Familiarity with serverless platforms and cloud computing.</li>\n<li>Deep knowledge of JS engine internals (V8, SpiderMonkey, JavaScriptCore).</li>\n<li>Experience with standalone WebAssembly runtimes (Wasmtime, Wasmer, Lucet).</li>\n<li>Strong expertise in Linux/UNIX systems, kernels, and networking.</li>\n<li>Contributions to large open-source projects.</li>\n</ul>\n<p>This is an exciting opportunity to work on cutting-edge compiler and runtime technologies at an unmatched scale. If you&#39;re passionate about high-performance computing, distributed systems, and compilers, we’d love to hear from you!</p>\n<p>What Makes Cloudflare Special?</p>\n<p>We’re not just a highly ambitious, large-scale technology company. We’re a highly ambitious, large-scale technology company with a soul. Fundamental to our mission to help build a better Internet is protecting the free and open Internet.</p>\n<p>Project Galileo: Since 2014, we&#39;ve equipped more than 2,400 journalism and civil society organizations in 111 countries with powerful tools to defend themselves against attacks that would otherwise censor their work, technology already used by Cloudflare’s enterprise customers - at no cost.</p>\n<p>Athenian Project: In 2017, we created the Athenian Project to ensure that state and local governments have the highest level of protection and reliability for free, so that their constituents have access to election information and voter registration. Since the project launched, we&#39;ve provided services to more than 425 local government election websites in 33 states.</p>\n<p>1.1.1.1: We released 1.1.1.1 to help fix the foundation of the Internet by building a faster, more secure and privacy-centric public DNS resolver. This is available publicly for everyone to use - it is the first consumer-focused service Cloudflare has ever released.</p>\n<p>Here’s the deal - we never, ever store client IP addresses. We will continue to abide by our privacy commitment and ensure that no user data is sold to advertisers or used to target consumers.</p>\n<p>Sound like something you’d like to be a part of? 
We’d love to hear from you!</p>\n<p>This position may require access to information protected under U.S. export control laws, including the U.S. Export Administration Regulations. Please note that any offer of employment may be conditioned on your authorization to receive software or technology controlled under these U.S. export laws without sponsorship for an export license.</p>\n<p>Cloudflare is proud to be an equal opportunity employer. We are committed to providing equal employment opportunity for all people and place great value in both diversity and inclusiveness. All qualified applicants will be considered for employment without regard to their, or any other person&#39;s, perceived or actual race, color, religion, sex, gender, gender identity, gender expression, sexual orientation, national origin, ancestry, citizenship, age, physical or mental disability, medical condition, family care status, or any other basis protected by law. We are an AA/Veterans/Disabled Employer. Cloudflare provides reasonable accommodations to qualified individuals with disabilities. Please tell us if you require a reasonable accommodation to apply for a job. Examples of reasonable accommodations include, but are not limited to, changing the application process, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment. If you require a reasonable accommodation to apply for a job, please contact us via e-mail at hr@cloudflare.com or via mail at 101 Townsend St. San Francisco, CA 94107.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_7bde3fd8-78f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Cloudflare","sameAs":"https://www.cloudflare.com/","logo":"https://logos.yubhub.co/cloudflare.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/cloudflare/jobs/6718312","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["C++","VM/compiler experience","V8","computer science fundamentals","data structures","algorithms","system architecture","low-latency environments","game streaming","trading systems","high-performance computing","web technologies","HTTP","JavaScript","WASM"],"x-skills-preferred":["Rust","serverless platforms","cloud computing","JS engine internals","WebAssembly runtimes","Linux/UNIX systems","kernels","networking","open-source projects"],"datePosted":"2026-04-18T15:55:33.444Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Distributed"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"C++, VM/compiler experience, V8, computer science fundamentals, data structures, algorithms, system architecture, low-latency environments, game streaming, trading systems, high-performance computing, web technologies, HTTP, JavaScript, WASM, Rust, serverless platforms, cloud computing, JS engine internals, WebAssembly runtimes, Linux/UNIX systems, kernels, networking, open-source projects"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_cba88898-896"},"title":"Research Engineer, Infrastructure, Kernels","description":"<p>We&#39;re looking for an infrastructure research engineer to design, optimize, and maintain the compute foundations that power 
large-scale language model training. You will develop high-performance ML kernels (e.g., CUDA, CuTe, Triton), enable efficient low-precision arithmetic, and improve the distributed compute stack that makes training large models possible.</p>\n<p>This role is perfect for an engineer who enjoys working close to the metal and across the research boundary. You&#39;ll collaborate with researchers and systems architects to bridge algorithmic design with hardware efficiency. You&#39;ll prototype new kernel implementations, profile performance across hardware generations, and help define the numerical and parallelism strategies that determine how we scale next-generation AI systems.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and implement custom ML kernels (e.g., CUDA, CuTe, Triton) for core LLM operations such as attention, matrix multiplication, gating, and normalization, optimized for modern GPU and accelerator architectures.</li>\n<li>Design and think through compute primitives to reduce memory bandwidth bottlenecks and improve kernel compute efficiency.</li>\n<li>Collaborate with research teams to align kernel-level optimizations with model architecture and algorithmic goals.</li>\n<li>Develop and maintain a library of reusable kernels and performance benchmarks that serve as the foundation for internal model training.</li>\n<li>Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources.</li>\n<li>Document and share insights through internal talks, technical papers, or open-source contributions to strengthen the broader ML systems community.</li>\n</ul>\n<p><strong>Skills and Qualifications</strong></p>\n<p>Minimum qualifications:</p>\n<ul>\n<li>Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar.</li>\n<li>Strong engineering skills, with the ability to contribute performant, maintainable code and to debug in complex codebases.</li>\n<li>Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.</li>\n<li>Thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts.</li>\n<li>A bias for action: the initiative to work across different stacks and different teams wherever you spot an opportunity, and the drive to make sure it ships.</li>\n<li>Proficiency in CUDA, CuTe, Triton, or other GPU programming frameworks.</li>\n<li>Demonstrated ability to analyze, profile, and optimize compute-intensive workloads.</li>\n</ul>\n<p>Preferred qualifications:</p>\n<ul>\n<li>Experience training or supporting large-scale language models with tens of billions of parameters or more.</li>\n<li>Track record of improving research productivity through infrastructure design or process improvements.</li>\n<li>Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators.</li>\n<li>Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks.</li>\n<li>Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM).</li>\n<li>Contributions to open-source GPU, ML systems, or compiler optimization projects.</li>\n<li>Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI 
infrastructure.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_cba88898-896","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Thinking Machines Lab","sameAs":"https://thinkingmachines.ai/","logo":"https://logos.yubhub.co/thinkingmachines.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/thinkingmachines/jobs/5013934008","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000 - $475,000 USD","x-skills-required":["CUDA","CuTe","Triton","GPU programming frameworks","Deep learning frameworks (e.g., PyTorch, JAX)","Computer science","Electrical engineering","Statistics","Machine learning","Physics","Robotics"],"x-skills-preferred":["Experience training or supporting large-scale language models with tens of billions of parameters or more","Track record of improving research productivity through infrastructure design or process improvements","Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators","Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks","Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM)","Contributions to open-source GPU, ML systems, or compiler optimization projects","Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI infrastructure"],"datePosted":"2026-04-18T15:54:38.498Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"CUDA, CuTe, Triton, GPU programming frameworks, Deep learning frameworks (e.g., PyTorch, JAX), Computer science, Electrical engineering, Statistics, Machine learning, Physics, Robotics, Experience training or supporting large-scale language models with tens of billions of parameters or more, Track record of improving research productivity through infrastructure design or process improvements, Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators, Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks, Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM), Contributions to open-source GPU, ML systems, or compiler optimization projects, Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":475000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2bc6ae79-8ee"},"title":"Staff Technical Lead for Inference & ML Performance","description":"<p>We&#39;re looking for a Staff Technical Lead for Inference &amp; ML Performance to guide a team in building and optimizing state-of-the-art inference systems. 
This role is intense yet deeply impactful.</p>\n<p>You&#39;ll shape the future of fal&#39;s inference engine and ensure our generative models achieve best-in-class performance. Your work directly impacts our ability to rapidly deliver cutting-edge creative solutions to users, from individual creators to global brands.</p>\n<p>Day-to-day, you&#39;ll set technical direction, guide your team to build high-performance inference solutions, and personally contribute to critical inference performance enhancements and optimizations. You&#39;ll collaborate closely with research &amp; applied ML teams, influence model inference strategies and deployment techniques, and drive advanced performance optimizations.</p>\n<p>As a leader, you&#39;ll mentor and scale your team, coach and expand your team of performance-focused engineers, and help them innovate, solve complex performance challenges, and level up their skills.</p>\n<p>To succeed in this role, you&#39;ll need to be deeply experienced in ML performance optimization, understand the full ML performance stack, and know inference inside-out. You&#39;ll also need to thrive in cross-functional collaboration and have excellent leadership skills.</p>\n<p>If you&#39;re ready to lead the future of inference performance at a fast-paced, high-growth frontier, apply now!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2bc6ae79-8ee","directApply":true,"hiringOrganization":{"@type":"Organization","name":"fal","sameAs":"https://fal.com","logo":"https://logos.yubhub.co/fal.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/fal/jobs/4012780009","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["ML performance optimization","PyTorch","TensorRT","TransformerEngine","Triton","CUTLASS kernels","Quantization","Kernel authoring","Compilation","Model parallelism","Distributed serving","Profiling"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:50:42.839Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"ML performance optimization, PyTorch, TensorRT, TransformerEngine, Triton, CUTLASS kernels, Quantization, Kernel authoring, Compilation, Model parallelism, Distributed serving, Profiling"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ec7cc743-ef4"},"title":"Senior Software Engineer II, Inference","description":"<p>We&#39;re seeking a senior software engineer to join our team and lead the design and development of our Kubernetes-native inference platform. 
As a senior engineer, you will be responsible for leading design reviews, driving architecture, and ensuring the reliability and scalability of our platform.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Leading design reviews and driving architecture within the team</li>\n<li>Defining and owning SLIs/SLOs and ensuring post-incident actions land and reliability improves release-over-release</li>\n<li>Implementing advanced optimizations such as micro-batch schedulers, speculative decoding, and KV-cache reuse</li>\n<li>Strengthening incident posture through capacity planning, autoscaling policy, and rollback/traffic-shift strategies</li>\n<li>Mentoring IC1/IC2 engineers and reviewing cross-team designs to elevate coding/testing standards</li>\n</ul>\n<p>We&#39;re looking for someone with strong coding skills in Python or Go, deep familiarity with networked systems and performance, and hands-on experience with Kubernetes at production scale. If you have experience with inference internals, batching, caching, mixed precision, and streaming token delivery, that&#39;s a plus.</p>\n<p>In addition to a competitive salary, we offer a range of benefits including medical, dental, and vision insurance, company-paid life insurance, and flexible PTO. We&#39;re committed to creating a work environment that&#39;s inclusive, diverse, and supportive of our employees&#39; well-being.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ec7cc743-ef4","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4604832006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $242,000","x-skills-required":["Python","Go","Kubernetes","Networked systems","Performance","Inference internals","Batching","Caching","Mixed precision","Streaming token delivery"],"x-skills-preferred":["CUDA kernels","NCCL/SHARP","RDMA/NUMA","GPU interconnect topologies","Contributions to inference frameworks","Experience with multi-team initiatives"],"datePosted":"2026-04-18T15:50:27.738Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Go, Kubernetes, Networked systems, Performance, Inference internals, Batching, Caching, Mixed precision, Streaming token delivery, CUDA kernels, NCCL/SHARP, RDMA/NUMA, GPU interconnect topologies, Contributions to inference frameworks, Experience with multi-team initiatives","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":242000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9701c504-1a6"},"title":"Senior Software Engineer I, Inference","description":"<p>We&#39;re looking for a Senior Software Engineer I to join our team. As a senior engineer, you&#39;ll lead designs, raise engineering standards, and deliver measurable improvements to latency, throughput, and reliability across multiple services. 
You&#39;ll partner with product, orchestration, and hardware teams to evolve our Kubernetes-native inference platform and meet strict P99 SLAs at scale.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Lead design reviews and drive architecture within the team; decompose multi-service work into clear milestones.</li>\n<li>Define and own SLIs/SLOs; ensure post-incident actions land and reliability improves release-over-release.</li>\n<li>Implement advanced optimizations (e.g., micro-batch schedulers, speculative decoding, KV-cache reuse) and quantify impact.</li>\n<li>Strengthen incident posture: capacity planning, autoscaling policy, graceful degradation, rollback/traffic-shift strategies.</li>\n<li>Mentor IC1/IC2 engineers; review cross-team designs and elevate coding/testing standards.</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>3-5 years of industry experience building distributed systems or cloud services.</li>\n<li>Strong coding in Python or Go (C++ a plus) and deep familiarity with networked systems and performance.</li>\n<li>Hands-on experience with Kubernetes at production scale, CI/CD, and observability stacks (Prometheus, Grafana, OpenTelemetry).</li>\n<li>Practical knowledge of inference internals: batching, caching, mixed precision (BF16/FP8), streaming token delivery.</li>\n<li>Proven track record improving tail latency (P95/P99) and service reliability through metrics-driven work.</li>\n</ul>\n<p>Preferred qualifications include contributions to inference frameworks, experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies, and leading multi-team initiatives or partnering with customers on mission-critical launches.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_9701c504-1a6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4647603006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$139,000 to $204,000","x-skills-required":["Python","Go","Kubernetes","CI/CD","Observability stacks","Inference internals","Batching","Caching","Mixed precision","Streaming token delivery"],"x-skills-preferred":["Contributions to inference frameworks","CUDA kernels","NCCL/SHARP","RDMA/NUMA","GPU interconnect topologies"],"datePosted":"2026-04-18T15:48:09.297Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Go, Kubernetes, CI/CD, Observability stacks, Inference internals, Batching, Caching, Mixed precision, Streaming token delivery, Contributions to inference frameworks, CUDA kernels, NCCL/SHARP, RDMA/NUMA, GPU interconnect topologies","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139000,"maxValue":204000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_faffae87-882"},"title":"Staff Software Engineer - GenAI Performance and Kernel","description":"<p>As a staff software engineer for GenAI Performance and Kernel, you will own the design, implementation, optimization, and 
correctness of the high-performance GPU kernels powering our GenAI inference stack. You will lead development of highly-tuned, low-level compute paths, manage trade-offs between hardware efficiency and generality, and mentor others in kernel-level performance engineering.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Leading the design, implementation, benchmarking, and maintenance of core compute kernels optimized for various hardware backends (GPU, accelerators)</li>\n<li>Driving the performance roadmap for kernel-level improvements: vectorization, tensorization, tiling, fusion, mixed precision, sparsity, quantization, memory reuse, scheduling, auto-tuning, etc.</li>\n<li>Integrating kernel optimizations with higher-level ML systems</li>\n<li>Building and maintaining profiling, instrumentation, and verification tooling to detect correctness, performance regressions, numerical issues, and hardware utilization gaps</li>\n<li>Leading performance investigations and root-cause analysis on inference bottlenecks, e.g. memory bandwidth, cache contention, kernel launch overhead, tensor fragmentation</li>\n<li>Establishing coding patterns, abstractions, and frameworks to modularize kernels for reuse, cross-backend portability, and maintainability</li>\n<li>Influencing system architecture decisions to make kernel improvements more effective (e.g. memory layout, dataflow scheduling, kernel fusion boundaries)</li>\n<li>Mentoring and guiding other engineers working on lower-level performance, providing code reviews, and helping set best practices</li>\n<li>Collaborating with infrastructure, tooling, and ML teams to roll out kernel-level optimizations into production, and monitoring their impact</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>BS/MS/PhD in Computer Science, or a related field</li>\n<li>Deep hands-on experience writing and tuning compute kernels (CUDA, Triton, OpenCL, LLVM IR, assembly or similar sort) for ML workloads</li>\n<li>Strong knowledge of GPU/accelerator architecture: warp structure, memory hierarchy (global, shared, register, L1/L2 caches), tensor cores, scheduling, SM occupancy, etc.</li>\n<li>Experience with advanced optimization techniques: tiling, blocking, software pipelining, vectorization, fusion, loop transformations, auto-tuning</li>\n<li>Familiarity with ML-specific kernel libraries (cuBLAS, cuDNN, CUTLASS, oneDNN, etc.) or open kernels</li>\n<li>Strong debugging and profiling skills (Nsight, NVProf, perf, vtune, custom instrumentation)</li>\n<li>Experience reasoning about numerical stability, mixed precision, quantization, and error propagation</li>\n<li>Experience in integrating optimized kernels into real-world ML inference systems; exposure to distributed inference pipelines, memory management, and runtime systems</li>\n<li>Experience building high-performance products leveraging GPU acceleration</li>\n<li>Excellent communication and leadership skills , able to drive design discussions, mentor colleagues, and make trade-offs visible</li>\n<li>A track record of shipping performance-critical, high-quality production software</li>\n<li>Bonus: published in systems/ML performance venues (e.g. 
MLSys, ASPLOS, ISCA, PPoPP), experience with custom accelerators or FPGA, experience with sparsity or model compression techniques</li>\n</ul>\n<p>The pay range for this role is $190,900-$232,800 USD per year, depending on location and experience.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_faffae87-882","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Databricks","sameAs":"https://databricks.com","logo":"https://logos.yubhub.co/databricks.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/databricks/jobs/8202700002","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$190,900-$232,800 USD per year","x-skills-required":["Compute kernels","GPU/accelerator architecture","Advanced optimization techniques","ML-specific kernel libraries","Debugging and profiling skills","Numerical stability","Mixed precision","Quantization","Error propagation","Distributed inference pipelines","Memory management","Runtime systems","High-performance products","GPU acceleration"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:46:07.442Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Compute kernels, GPU/accelerator architecture, Advanced optimization techniques, ML-specific kernel libraries, Debugging and profiling skills, Numerical stability, Mixed precision, Quantization, Error propagation, Distributed inference pipelines, Memory management, Runtime systems, High-performance products, GPU acceleration","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":190900,"maxValue":232800,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_44251c7b-221"},"title":"Member of Technical Staff - Recommendation Systems","description":"<p>We&#39;re seeking exceptional Applied engineers to join a high-priority project used by approximately 600 million monthly users. 
This is an exciting opportunity for individuals with an engineer or scientist background to apply their skills to recommendation systems, ranking algorithms, search technologies, and many other systems.</p>\n<p>You&#39;ll work at the intersection of advanced AI development and real-world impact, enhancing the ability to connect users with relevant content, accounts, and experiences.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Designing and architecting recommendation algorithms across various product surfaces</li>\n</ul>\n<ul>\n<li>Leveraging all of xAI&#39;s infrastructure and AI stacks to dramatically enhance the user experience</li>\n</ul>\n<ul>\n<li>Writing data pipelines and training jobs that continuously learn from product data</li>\n</ul>\n<ul>\n<li>Iterating and improving the algorithm by gathering user feedback in real time through experimentation</li>\n</ul>\n<ul>\n<li>Ensuring scalability and efficiency of machine learning systems</li>\n</ul>\n<p>Basic Qualifications:</p>\n<ul>\n<li>Knowledge of data infrastructure like Kafka, Clickhouse, and Spark</li>\n</ul>\n<ul>\n<li>Experienced in implementing recommender systems and/or deep learning applications at industrial scale</li>\n</ul>\n<ul>\n<li>Skilled in one or more DL software frameworks such as JAX or PyTorch</li>\n</ul>\n<ul>\n<li>Exceptional candidates may be experienced in writing CUDA kernels</li>\n</ul>\n<p>Compensation and Benefits:</p>\n<p>$180,000 - $440,000 USD</p>\n<p>Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short &amp; long-term disability insurance, life insurance, and various other discounts and perks.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_44251c7b-221","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/4703144007","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000 - $440,000 USD","x-skills-required":["data infrastructure","recommender systems","deep learning","DL software frameworks","CUDA kernels"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:45:00.153Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo Alto, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"data infrastructure, recommender systems, deep learning, DL software frameworks, CUDA kernels","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":440000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5c28c97d-fc5"},"title":"Member of Technical Staff - Image / Video Generation","description":"<p><strong>Job Title</strong></p>\n<p>Member of Technical Staff - Image / Video Generation</p>\n<p><strong>Job Description</strong></p>\n<p>We&#39;re the team behind Latent Diffusion, Stable Diffusion, and FLUX,foundational technologies that changed how the world creates images and video. 
We&#39;re creating the generative models that power how people make images and video - tools used by millions of creators, developers, and businesses worldwide. Our FLUX models are among the most advanced in the world, and we’re just getting started.</p>\n<p><strong>Why This Role</strong></p>\n<p>You&#39;ll train large-scale diffusion models for image and video generation, exploring new approaches while maintaining the rigor that helps us distinguish meaningful progress from incremental tweaks. This isn&#39;t about following established recipes - it&#39;s about running the experiments that clarify which architectural choices matter and which are less impactful.</p>\n<p><strong>What You’ll Work On</strong></p>\n<ul>\n<li>Trains large-scale diffusion transformer models for image and video data, working at the scale where intuitions break and empirical evidence matters</li>\n<li>Rigorously ablates design choices - running experiments that isolate variables, control for confounds, and produce insights you can actually trust - then communicating those results to shape our research direction</li>\n<li>Reasons about the speed-quality tradeoffs of neural network architectures in production settings where both constraints matter simultaneously</li>\n<li>Fine-tunes diffusion models for specialized applications like image and video upscalers, inpainting/outpainting models, and other tasks where general-purpose models aren&#39;t enough</li>\n</ul>\n<p><strong>What We’re Looking For</strong></p>\n<ul>\n<li>You&#39;ve trained large-scale diffusion models and developed strong intuitions about what matters. You know that at research scale, every design choice has tradeoffs, and the only way to know which ones are worth making is through careful ablation. You&#39;re comfortable debugging distributed training issues and presenting research findings to the team.</li>\n</ul>\n<p><strong>Required Skills</strong></p>\n<ul>\n<li>Hands-on experience training large-scale diffusion models for image and video data, with practical knowledge of common failure modes and what matters most in training</li>\n<li>Experience fine-tuning diffusion models for specialized applications - upscalers, inpainting, outpainting, or other tasks where understanding the domain matters as much as understanding the architecture</li>\n<li>Deep understanding of how to effectively evaluate image and video generative models - knowing which metrics correlate with quality and which are just convenient proxies</li>\n<li>Strong proficiency in PyTorch, transformer architectures, and the full ecosystem of modern deep learning</li>\n<li>Solid understanding of distributed training techniques - FSDP, low precision training, model parallelism - because our models don&#39;t fit on one GPU and training decisions impact research outcomes</li>\n</ul>\n<p><strong>Preferred Skills</strong></p>\n<ul>\n<li>Experience writing forward and backward Triton kernels and ensuring their correctness while considering floating point errors</li>\n<li>Proficiency with profiling, debugging, and optimizing single and multi-GPU operations using tools like Nsight or stack trace viewers</li>\n<li>Know the performance characteristics of different architectural choices at scale</li>\n<li>Have published research that contributed to how people think about generative models</li>\n</ul>\n<p><strong>How We Work Together</strong></p>\n<p>We’re a distributed team with real offices that people actually use. 
Depending on your role, you’ll either join us in Freiburg or SF at least 2 days a week (or one full week every other week), or work remotely with a monthly in-person week to stay connected. We’ll cover reasonable travel costs to make this possible. We think in-person time matters, and we’ve structured things to make it accessible to all. We’ll discuss what this will look like for the role during our interview process.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_5c28c97d-fc5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Black Forest Labs","sameAs":"https://www.blackforestlabs.com/","logo":"https://logos.yubhub.co/blackforestlabs.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/blackforestlabs/jobs/4132217008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["large-scale diffusion models","image and video data","PyTorch","transformer architectures","distributed training techniques"],"x-skills-preferred":["writing forward and backward Triton kernels","profiling, debugging, and optimizing single and multi-GPU operations","published research on generative models"],"datePosted":"2026-04-17T12:25:33.116Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Freiburg (Germany)"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"large-scale diffusion models, image and video data, PyTorch, transformer architectures, distributed training techniques, writing forward and backward Triton kernels, profiling, debugging, and optimizing single and multi-GPU operations, published research on generative models"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_17d653d0-9ba"},"title":"Distributed Training Engineer, Sora","description":"<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Research</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$293K – $490K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the Team</strong></p>\n<p>The Sora team is working on making video a key capability of OpenAI’s foundation models. We are a hybrid research and product team that seeks to understand and expand the capabilities of our video models, while ensuring their reliability and safety. We accomplish this both through directly studying and experimenting with the models, as well as deploying them into the real-world to distribute their benefits widely.</p>\n<p><strong>About the Role</strong></p>\n<p>As a Distributed Systems/ML engineer, you will work on improving the training throughput for our internal training framework and enable researchers to experiment with new ideas. This requires good engineering (for example designing, implementing, and optimizing state-of-the-art AI models), writing bug-free machine learning code (surprisingly difficult!), and acquiring deep knowledge of the performance of supercomputers. We’re looking for people who love optimizing performance, understanding distributed systems, and who cannot stand having bugs in their code.</p>\n<p>This role is based in San Francisco, CA. 
We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Collaborate with researchers to enable them to develop systems-efficient video models and architectures</li>\n<li>Apply the latest techniques to our internal training framework to achieve impressive hardware efficiency for our training runs</li>\n<li>Profile and optimize our training framework</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Have experience working with multi-modal ML pipelines</li>\n<li>Love diving deep into systems implementations and understanding their fundamentals in order to improve their performance and maintainability</li>\n<li>Have strong software engineering skills and are proficient in Python.</li>\n<li>Have experience understanding and optimizing training kernels</li>\n<li>Are passionate about understanding stable training dynamics</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_17d653d0-9ba","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/2f1c59a8-570b-4192-9b5b-422f1a632cb6","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$293K – $490K • Offers Equity","x-skills-required":["Python","Multi-modal ML pipelines","Distributed systems","Software engineering","Training kernels"],"x-skills-preferred":["Experience working with multi-modal ML pipelines","Strong software engineering skills","Experience understanding and optimizing training kernels"],"datePosted":"2026-03-06T18:38:36.058Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Multi-modal ML pipelines, Distributed systems, Software engineering, Training kernels, Experience working with multi-modal ML pipelines, Strong software engineering skills, Experience understanding and optimizing training kernels","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":293000,"maxValue":490000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_1ef31769-74d"},"title":"Software Engineer, Fleet Management","description":"<p><strong>Software Engineer, Fleet Management</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$230K – $490K • Offers 
Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Role</strong></p>\n<p>The Fleet team at OpenAI supports the computing environment that powers our cutting-edge research and product development. We oversee large-scale systems that span data centers, GPUs, networking, and more, ensuring high availability, performance, and efficiency. Our work enables OpenAI’s models to operate seamlessly at scale, supporting both internal research and external products like ChatGPT. 
We prioritize safety, reliability, and responsible AI deployment over unchecked growth.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and build systems to manage both cloud and bare-metal fleets at scale.</li>\n</ul>\n<ul>\n<li>Develop tools that integrate low-level hardware metrics with high-level job scheduling and cluster management algorithms.</li>\n</ul>\n<ul>\n<li>Leverage LLMs to coordinate vendor operations and optimize infrastructure workflows.</li>\n</ul>\n<ul>\n<li>Automate infrastructure processes, reducing repetitive toil and improving system reliability.</li>\n</ul>\n<ul>\n<li>Collaborate with hardware, infrastructure, and research teams to ensure seamless integration across the stack.</li>\n</ul>\n<ul>\n<li>Continuously improve tools, automation, processes, and documentation to enhance operational efficiency.</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Have strong software engineering skills with experience in large-scale infrastructure environments.</li>\n</ul>\n<ul>\n<li>Possess broad knowledge of cluster-level systems (e.g., Kubernetes, CI/CD pipelines, Terraform, cloud providers).</li>\n</ul>\n<ul>\n<li>Have deep expertise in server-level systems (e.g., containerization, Chef, Linux kernels, firmware management, host routing).</li>\n</ul>\n<ul>\n<li>Are passionate about optimizing the performance and reliability of large compute fleets.</li>\n</ul>\n<ul>\n<li>Thrive in dynamic environments and are eager to solve complex infrastructure challenges.</li>\n</ul>\n<ul>\n<li>Value automation, efficiency, and continuous improvement in everything you build.</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_1ef31769-74d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/7809102e-e82a-4678-bf7c-221de8acc0d6","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$230K – $490K","x-skills-required":["software engineering","large-scale infrastructure environments","cluster-level systems","server-level systems","LLMs","infrastructure workflows","automation","operational efficiency"],"x-skills-preferred":["Kubernetes","CI/CD pipelines","Terraform","cloud providers","Chef","Linux kernels","firmware management","host routing"],"datePosted":"2026-03-06T18:29:06.599Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, large-scale infrastructure environments, cluster-level systems, server-level systems, LLMs, infrastructure workflows, automation, operational efficiency, Kubernetes, CI/CD pipelines, Terraform, cloud providers, Chef, Linux kernels, firmware management, host routing","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":230000,"maxValue":490000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_989f992b-6b2"},"title":"Software Engineer, Inference – AMD GPU Enablement","description":"<p><strong>Software Engineer, Inference – AMD GPU Enablement</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$295K – $555K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the Team</strong></p>\n<p>Our Inference team brings OpenAI’s most capable research and technology to the world through our products. We empower consumers, enterprises and developers alike to use and access our state-of-the-art AI models, allowing them to do things that they’ve never been able to before. We focus on performant and efficient model inference, as well as accelerating research progression via model inference.</p>\n<p><strong>About the Role</strong></p>\n<p>We’re hiring engineers to scale and optimize OpenAI’s inference infrastructure across emerging GPU platforms. 
You’ll work across the stack - from low-level kernel performance to high-level distributed execution - and collaborate closely with research, infra, and performance teams to ensure our largest models run smoothly on new hardware.</p>\n<p>This is a high-impact opportunity to shape OpenAI’s multi-platform inference capabilities from the ground up with a particular focus on advancing inference performance on AMD accelerators.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Own bring-up, correctness and performance of the OpenAI inference stack on AMD hardware.</li>\n</ul>\n<ul>\n<li>Integrate internal model-serving infrastructure (e.g., vLLM, Triton) into a variety of GPU-backed systems.</li>\n</ul>\n<ul>\n<li>Debug and optimize distributed inference workloads across memory, network, and compute layers.</li>\n</ul>\n<ul>\n<li>Validate correctness, performance, and scalability of model execution on large GPU clusters.</li>\n</ul>\n<ul>\n<li>Collaborate with partner teams to design and optimize high-performance GPU kernels for accelerators using HIP, Triton, or other performance-focused frameworks.</li>\n</ul>\n<ul>\n<li>Collaborate with partner teams to build, integrate and tune collective communication libraries (e.g., RCCL) used to parallelize model execution across many GPUs.</li>\n</ul>\n<p><strong>You can thrive in this role if you:</strong></p>\n<ul>\n<li>Have experience writing or porting GPU kernels using HIP, CUDA, or Triton, and care deeply about low-level performance.</li>\n</ul>\n<ul>\n<li>Are familiar with communication libraries like NCCL/RCCL and understand their role in high-throughput model serving.</li>\n</ul>\n<ul>\n<li>Have worked on distributed inference systems and are comfortable scaling models across fleets of accelerators.</li>\n</ul>\n<ul>\n<li>Enjoy solving end-to-end performance challenges across hardware, system libraries, and orchestration layers.</li>\n</ul>\n<ul>\n<li>Are excited to be part of a small, fast-moving team building new infrastructure from first principles.</li>\n</ul>\n<p><strong>Nice to Have:</strong></p>\n<ul>\n<li>Contributions to open-source libraries like RCCL, Triton, or vLLM.</li>\n</ul>\n<ul>\n<li>Experience with GPU performance tools (Nsight, rocprof, perf) and memory/comms profiling.</li>\n</ul>\n<ul>\n<li>Prior experience deploying inference on other non-NVIDIA GPU environments.</li>\n</ul>\n<ul>\n<li>Knowledge of model/tensor parallelism, mixed precision, and serving 10B+ parameter models.</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_989f992b-6b2","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/9b79406c-89a8-49bd-8a38-e72db80996e9","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$295K – $555K • Offers Equity","x-skills-required":["GPU kernels","HIP","CUDA","Triton","NCCL/RCCL","distributed inference systems","GPU performance tools","memory/comms profiling"],"x-skills-preferred":["open-source libraries","GPU performance tools","memory/comms profiling"],"datePosted":"2026-03-06T18:28:36.084Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"GPU kernels, HIP, CUDA, Triton, NCCL/RCCL, distributed inference systems, GPU performance tools, memory/comms profiling, open-source libraries, GPU performance tools, memory/comms profiling","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":295000,"maxValue":555000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_4e51470c-8f1"},"title":"Software Engineer, Accelerators","description":"<p><strong>Software Engineer, Accelerators</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$295K – $380K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>The Kernels team at OpenAI builds the low-level software that accelerates our most ambitious AI research.</p>\n<p>We work at the boundary of hardware and software, developing high-performance kernels, distributed system optimizations, and runtime improvements to make large-scale training and inference more efficient.</p>\n<p>Our work enables OpenAI to push the limits by ensuring that models - from LLMs to recommender systems - run reliably on advanced supercomputing platforms. That includes adapting our software stack to new types of accelerators, tuning system performance end-to-end, and removing bottlenecks across every layer of the stack.</p>\n<p><strong>About the Role</strong></p>\n<p>On the Accelerators team, you will help OpenAI evaluate and bring up new compute platforms that can support large-scale AI training and inference.</p>\n<p>Your work will range from prototyping system software on new accelerators to enabling performance optimizations across our AI workloads.</p>\n<p>You’ll work across the stack, covering both hardware and software aspects - working on kernels, sharding strategies, scaling across distributed systems, and performance modeling.</p>\n<p>You&#39;ll help adapt OpenAI&#39;s software stack to non-traditional hardware and drive efficiency improvements in core AI workloads. 
This is not a compiler-focused role; rather, it bridges ML algorithms with system performance - especially at scale.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Prototype and enable OpenAI&#39;s AI software stack on new, exploratory accelerator platforms.</li>\n</ul>\n<ul>\n<li>Optimize large-scale model performance (LLMs, recommender systems, distributed AI workloads) for diverse hardware environments.</li>\n</ul>\n<ul>\n<li>Develop kernels, sharding mechanisms, and system scaling strategies tailored to emerging accelerators.</li>\n</ul>\n<ul>\n<li>Collaborate on optimizations at the model code level (e.g. PyTorch) and below to enhance performance on non-traditional hardware.</li>\n</ul>\n<ul>\n<li>Perform system-level performance modeling, debug bottlenecks, and drive end-to-end optimization.</li>\n</ul>\n<ul>\n<li>Work with hardware teams and vendors to evaluate alternatives to existing platforms and adapt the software stack to their architectures.</li>\n</ul>\n<ul>\n<li>Contribute to runtime improvements, compute/communication overlapping, and scaling efforts for frontier AI workloads.</li>\n</ul>\n<p><strong>You might thrive in this role if you have:</strong></p>\n<ul>\n<li>3+ years of experience working on AI infrastructure, including kernels, systems, or hardware-software co-design.</li>\n</ul>\n<ul>\n<li>Hands-on experience with accelerator platforms for AI at data center scale (e.g., TPUs, custom silicon, exploratory architectures).</li>\n</ul>\n<ul>\n<li>Strong understanding of kernels, sharding, runtime systems, or distributed scaling techniques.</li>\n</ul>\n<ul>\n<li>Familiarity with optimizing LLMs, CNNs, or recommender models for hardware efficiency.</li>\n</ul>\n<ul>\n<li>Experience with performance modeling, system debugging, and software stack adaptation for novel architectures.</li>\n</ul>\n<ul>\n<li>Exposure to mobile accelerators is welcome, but experience enabling data center-scale AI hardware is preferred.</li>\n</ul>\n<ul>\n<li>Ability to operate across multiple levels of the stack, rapidly prototype solutions, and navigate ambiguity in early hardware bring-up phases.</li>\n</ul>\n<ul>\n<li>Interest in shaping the future of AI compute through exploration of alternatives to mainstream accelerators.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_4e51470c-8f1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/f386b209-1259-4b79-bf5a-aa97fc7ce77b","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$295K – $380K • Offers Equity","x-skills-required":["AI infrastructure","kernels","systems","hardware-software co-design","accelerator platforms","TPUs","custom silicon","exploratory architectures","kernels","sharding","runtime systems","distributed scaling techniques","LLMs","CNNs","recommender models","hardware efficiency","performance modeling","system debugging","software stack adaptation","novel architectures"],"x-skills-preferred":["mobile accelerators","data center-scale AI hardware"],"datePosted":"2026-03-06T18:27:12.141Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"AI 
infrastructure, kernels, systems, hardware-software co-design, accelerator platforms, TPUs, custom silicon, exploratory architectures, kernels, sharding, runtime systems, distributed scaling techniques, LLMs, CNNs, recommender models, hardware efficiency, performance modeling, system debugging, software stack adaptation, novel architectures, mobile accelerators, data center-scale AI hardware","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":295000,"maxValue":380000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3c0a8f07-6b9"},"title":"Principal Software Engineer","description":"<p><strong>Summary</strong></p>\n<p>Microsoft is looking for a talented Principal Software Engineer at its Beijing office. This role sits at the heart of AI infrastructure development, driving innovation in large-scale AI infrastructure. You will be instrumental in designing and implementing high-performance, massively scalable infrastructure required to deploy frontier LLM models.</p>\n<p><strong>About the Role</strong></p>\n<p>As a Principal Software Engineer on the AI Infrastructure team, you will be responsible for designing and implementing innovative system optimization solutions for internal LLM workloads. You will optimize LLM inference workloads through innovative kernel, algorithm, scheduling, and parallelization technologies. You will also continuously develop and maintain internal LLM inference infrastructure, discovering new LLM system optimization needs and innovations.</p>\n<p><strong>Accountabilities</strong></p>\n<ul>\n<li>Keep up to date with and utilize the latest developments in LLM system optimization.</li>\n<li>Take the lead in designing innovative system optimization solutions for internal LLM workloads.</li>\n<li>Optimize LLM inference workloads through innovative kernel, algorithm, scheduling, and parallelization technologies.</li>\n<li>Continuously develop and maintain internal LLM inference infrastructure.</li>\n<li>Discover new LLM system optimization needs and innovations.</li>\n</ul>\n<p><strong>The Candidate we&#39;re looking for</strong></p>\n<p><strong>Experience:</strong></p>\n<ul>\n<li>A bachelor&#39;s degree or higher in computer science, engineering, or a related field; a PhD is preferred.</li>\n</ul>\n<p><strong>Technical skills:</strong></p>\n<ul>\n<li>Strong programming skills in Python and C/C++.</li>\n<li>5+ years of experience in machine learning system development and optimization.</li>\n</ul>\n<p><strong>Personal attributes:</strong></p>\n<ul>\n<li>A growth mindset and a passion for learning new things.</li>\n</ul>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Competitive salary and benefits package.</li>\n<li>Opportunities for professional growth and development.</li>\n<li>Collaborative and dynamic work environment.</li>\n<li>Access to cutting-edge technology and resources.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3c0a8f07-6b9","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Microsoft","sameAs":"https://microsoft.ai","logo":"https://logos.yubhub.co/microsoft.ai.png"},"x-apply-url":"https://microsoft.ai/job/principal-software-engineer-28/","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"Competitive salary and 
benefits package","x-skills-required":["Python","C/C++","Machine learning system development and optimization"],"x-skills-preferred":["CUDA kernel development and optimization","Experience in optimizing communication layer / kernels for deep learning systems"],"datePosted":"2026-03-06T07:32:22.965Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Beijing"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, C/C++, Machine learning system development and optimization, CUDA kernel development and optimization, Experience in optimizing communication layer / kernels for deep learning systems"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_96cf54a4-999"},"title":"Senior Software Engineer","description":"<p><strong>Summary</strong></p>\n<p>Microsoft is looking for a talented Senior Software Engineer at its Beijing office. This role sits at the heart of AI Infrastructure development, driving innovation in large-scale AI infrastructure. You will be instrumental in designing and implementing high-performance, massively scalable infrastructure required to deploy frontier LLM models.</p>\n<p><strong>About the Role</strong></p>\n<p>We are seeking brilliant and passionate engineers to work with us on the most interesting and challenging problems of AI Infrastructure development. As a Senior Software Engineer, you will be responsible for designing and implementing the high-performance, massively scalable infrastructure required to deploy frontier LLM models through innovative GPU kernel, compression, scheduling and parallelization optimizations.</p>\n<p><strong>Accountabilities</strong></p>\n<ul>\n<li>Keep up to date with and utilize the latest developments in LLM system optimization.</li>\n<li>Discover/solve impactful technical problems, advance state-of-the-art LLM technologies, and translate ideas into production.</li>\n<li>Optimize LLM inference workloads through innovative kernel, algorithm, scheduling, and parallelization technologies.</li>\n<li>Continuously maintain internal LLM inference infrastructure.</li>\n</ul>\n<p><strong>The Candidate we&#39;re looking for</strong></p>\n<p><strong>Experience:</strong></p>\n<ul>\n<li>A bachelor&#39;s degree or higher in computer science, engineering, or a related field; a PhD is preferred.</li>\n</ul>\n<p><strong>Technical skills:</strong></p>\n<ul>\n<li>Strong programming skills in Python and C/C++.</li>\n<li>2+ years of experience in machine learning system development and optimization.</li>\n</ul>\n<p><strong>Personal attributes:</strong></p>\n<ul>\n<li>A growth mindset and a passion for learning new things.</li>\n</ul>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Competitive salary and benefits package.</li>\n<li>Opportunities for professional growth and development.</li>\n<li>Collaborative and dynamic work environment.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_96cf54a4-999","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Microsoft","sameAs":"https://microsoft.ai","logo":"https://logos.yubhub.co/microsoft.ai.png"},"x-apply-url":"https://microsoft.ai/job/senior-software-engineer-64/","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"Competitive salary and benefits 
package","x-skills-required":["Python","C/C++","Machine learning system development and optimization"],"x-skills-preferred":["CUDA kernel development and optimization","Experience in optimizing communication layer / kernels for deep learning systems"],"datePosted":"2026-03-06T07:32:05.702Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Beijing"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, C/C++, Machine learning system development and optimization, CUDA kernel development and optimization, Experience in optimizing communication layer / kernels for deep learning systems"}]}