{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/triton"},"x-facet":{"type":"skill","slug":"triton","display":"Triton","count":17},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2c095439-13b"},"title":"Principal Software Engineer","description":"<p>Microsoft Advertising is seeking a Principal Software Engineer to join our Ads Engineering Platform team and advance the core capabilities of our ad-serving infrastructure, the engine that powers advertising across Bing Search, MSN, Microsoft Start, and shopping experiences in the Edge browser.</p>\n<p>Our serving stack operates at massive global scale, delivering millions of ad requests per second through a geo-distributed, low-latency system that combines large-scale GPU/CPU inference, real-time bidding, and intelligent ranking pipelines.</p>\n<p>This role focuses on advancing the performance, efficiency, and scalability of the next generation of model serving and inference platforms for Ads.</p>\n<p>As a senior technical leader, you’ll design and optimize high-performance serving systems and GPU inference frameworks that drive measurable latency improvements and cost efficiency across Microsoft’s ad ecosystem.</p>\n<p>You’ll work across the stack, from CUDA kernel tuning and NUMA-aware threading to large-scale distributed orchestration and model deployment for deep learning and LLM workloads.</p>\n<p>This is a rare opportunity 
to shape the architecture of one of the world’s most advanced, mission-critical online serving platforms, collaborating with world-class engineers to deliver innovation at Internet scale.</p>\n<p>Microsoft’s mission is to empower every person and every organization on the planet to achieve more.</p>\n<p>As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals.</p>\n<p>Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.</p>\n<p>Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week.</p>\n<p>This expectation is subject to local law and may vary by jurisdiction.</p>\n<p>Responsibilities:</p>\n<p>Design and lead the development of large-scale, distributed online serving systems, including GPU-accelerated and CPU-based ranking/inference pipelines, to process millions of ad requests per second with ultra-low latency, high throughput, and solid reliability.</p>\n<p>Architect and optimize end-to-end inference infrastructure, including model serving, batching/streaming, caching, scheduling, and resource orchestration across heterogeneous hardware (GPU, CPU, and memory tiers).</p>\n<p>Profile and optimize performance across the full stack, from CUDA kernels and GPU pipelines to CPU threads and OS-level scheduling, identifying bottlenecks, tuning latency tails, and improving cost efficiency through advanced profiling and instrumentation.</p>\n<p>Own live-site reliability as a DRI: design telemetry, alerting, and fault-tolerance mechanisms; drive rapid diagnosis and mitigation of performance regressions or outages in globally distributed systems.</p>\n<p>Collaborate and mentor across teams, driving architecture 
reviews, enforcing engineering excellence, promoting system-level optimization practices, and mentoring others in deep debugging, profiling, and performance engineering.</p>\n<p>Qualifications:</p>\n<p>Required Qualifications:</p>\n<p>Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.</p>\n<p>Preferred Qualifications:</p>\n<p>Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.</p>\n<p>Industry experience in advertising or search engine backend systems, such as large-scale ad ranking, real-time bidding (RTB), or relevance-serving infrastructure.</p>\n<p>Hands-on experience with real-time data streaming systems (Kafka, Flink, Spark Streaming), feature-store integration, and multi-region deployment for low-latency, globally distributed services.</p>\n<p>Familiarity with LLM inference optimization: model sharding, tensor/kv-cache parallelism, paged attention, continuous batching, quantization (AWQ/FP8), and hybrid CPU–GPU orchestration.</p>\n<p>Demonstrated success operating large-scale systems with SLA-based capacity forecasting, autoscaling, and performance telemetry; proven leadership in cross-functional architecture initiatives and technical mentorship.</p>\n<p>Passion for performance engineering, observability, and deep systems debugging, with a solid drive to push the limits of serving infrastructure for the next generation of ads and AI models.</p>\n<p>Deep expertise in GPU inference frameworks such as 
NVIDIA Triton Inference Server, CUDA, and TensorRT, including hands-on experience implementing custom CUDA kernels, optimizing memory movement (H2D/D2H), overlapping compute and I/O, and maximizing GPU occupancy and kernel fusion for deep learning and LLM workloads.</p>\n<p>Solid understanding of model-serving trade-offs: batching vs. streaming, latency vs. throughput, quantization (FP16/BF16/INT8), dynamic batching, continuous model rollout, and adaptive inference scheduling across CPU/GPU tiers.</p>\n<p>Proven ability to profile and optimize GPU and system workloads, including tensor/memory alignment, compute–memory balancing, embedding table management, parameter servers, hierarchical caching, and vectorized inference for transformer/LLM architectures.</p>\n<p>Expertise in low-level system and OS internals, including multi-threading, process scheduling, NUMA-aware memory allocation, lock-free data structures, context switching, I/O stack tuning (NVMe, RDMA), kernel bypass (DPDK, io_uring), and CPU/GPU affinity optimization for large-scale serving pipelines.</p>\n<p>#MicrosoftAI Software Engineering IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. 
A different range applies to specific work locations within the San Francisco Bay Area and New York City metropolitan area; the base pay range for this role in those locations is USD $188,000 – $304,200 per year.</p>\n<p>Certain roles may be eligible for benefits and other compensation.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2c095439-13b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Microsoft AI","sameAs":"https://microsoft.ai","logo":"https://logos.yubhub.co/microsoft.ai.png"},"x-apply-url":"https://microsoft.ai/job/principal-software-engineer-41/","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$139,900 - $274,800 per year","x-skills-required":["C","C++","C#","Java","JavaScript","Python","NVIDIA Triton Inference Server","CUDA","TensorRT","Kafka","Flink","Spark Streaming","GPU inference frameworks","LLM inference optimization","model sharding","tensor/kv-cache parallelism","paged attention","continuous batching","quantization","AWQ/FP8","hybrid CPU–GPU orchestration","SLA-based capacity forecasting","autoscaling","performance telemetry","cross-functional architecture initiatives","technical mentorship","performance engineering","observability","deep systems debugging","low-level system and OS internals","multi-threading","process scheduling","NUMA-aware memory allocation","lock-free data structures","context switching","I/O stack tuning","kernel bypass","CPU/GPU affinity optimization"],"x-skills-preferred":[],"datePosted":"2026-04-24T12:12:57.301Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Redmond"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"C, C++, C#, Java, JavaScript, Python, NVIDIA Triton Inference Server, CUDA, TensorRT, Kafka, 
Flink, Spark Streaming, GPU inference frameworks, LLM inference optimization, model sharding, tensor/kv-cache parallelism, paged attention, continuous batching, quantization, AWQ/FP8, hybrid CPU–GPU orchestration, SLA-based capacity forecasting, autoscaling, performance telemetry, cross-functional architecture initiatives, technical mentorship, performance engineering, observability, deep systems debugging, low-level system and OS internals, multi-threading, process scheduling, NUMA-aware memory allocation, lock-free data structures, context switching, I/O stack tuning, kernel bypass, CPU/GPU affinity optimization","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139900,"maxValue":274800,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5e20ca92-993"},"title":"Principal Software Engineer","description":"<p>Monetization Engineering is responsible for building a unified, intelligent, and resilient monetization platform that drives revenue across Microsoft’s AI-native surfaces, including Copilot, Search, MSN, Shopping, and both first-party and third-party ecosystems.</p>\n<p>Our mission is to enhance advertiser value, optimize platform performance, and achieve long-term revenue growth through large-scale systems, machine learning-driven optimization, experimentation, and cross-surface innovation.</p>\n<p>We are seeking an experienced professional with expertise in GPU inference optimization and a deep understanding of LLM/SLM architecture to join our team.</p>\n<p>This is a unique opportunity to contribute to cutting-edge advancements in AI and deep learning while driving impactful solutions for Microsoft’s advertising and monetization platforms.</p>\n<p>Microsoft’s mission is to empower every person and every organization on the planet to achieve more.</p>\n<p>As employees we come together with a growth mindset, innovate 
to empower others, and collaborate to realize our shared goals.</p>\n<p>Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.</p>\n<p>Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week.</p>\n<p>This expectation is subject to local law and may vary by jurisdiction.</p>\n<p>Responsibilities:</p>\n<p>Serves as the technological core of Microsoft’s rapidly expanding digital advertising business.</p>\n<p>Focus on accelerating Microsoft’s large-scale deep learning inference for Ads, Shopping, Copilot, and other surfaces, including both offline and online applications that support OpenAI LLM models and next-generation LLMs/SLMs.</p>\n<p>Play a pivotal role in bridging state-of-the-art GPU and deep learning technologies with critical business applications.</p>\n<p>Qualifications:</p>\n<p>Required Qualifications:</p>\n<p>Bachelor’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.</p>\n<p>Ability to meet Microsoft, customer and/or government security screening requirements is required for this role.</p>\n<p>These requirements include but are not limited to the following specialized security screenings:</p>\n<p>Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.</p>\n<p>Preferred Qualifications:</p>\n<p>Master’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, 
or Python OR Bachelor’s Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.</p>\n<p>Solid experience in GPU inference optimization (CUDA, TensorRT, Triton, or custom GPU kernels).</p>\n<p>Proficiency in profiling tools (Nsight, TensorBoard, PyTorch profiler) and ability to identify CPU/GPU bottlenecks.</p>\n<p>Deep understanding of LLM/SLM architectures (attention, embeddings, MoE, decoders).</p>\n<p>Experience optimizing latency-critical online services.</p>\n<p>Experience with model compression (quantization, distillation, SVD, low-rank methods).</p>\n<p>Experience in building high-throughput inference serving stacks (continuous batching, KV-cache optimizations, routing).</p>\n<p>Familiarity with Microsoft’s DLIS, Talon routing, Triton/TensorRT-LLM stack, and Azure/H100/A100 GPU environments.</p>\n<p>Publications, competition wins, or real-world deployments related to model efficiency.</p>\n<p>#MicrosoftAI</p>","url":"https://yubhub.co/jobs/job_5e20ca92-993","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Microsoft","sameAs":"https://microsoft.ai","logo":"https://logos.yubhub.co/microsoft.ai.png"},"x-apply-url":"https://microsoft.ai/job/principal-software-engineer-47/","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$163,000 - $296,400 per year","x-skills-required":["GPU inference optimization","LLM/SLM architecture","C","C++","C#","Java","JavaScript","Python","CUDA","TensorRT","Triton","custom GPU kernels","profiling tools","CPU/GPU bottlenecks","model compression","high-throughput inference serving stacks","DLIS","Talon routing","Triton/TensorRT-LLM stack","Azure/H100/A100 GPU 
environments"],"x-skills-preferred":[],"datePosted":"2026-04-24T12:10:41.636Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Redmond"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"GPU inference optimization, LLM/SLM architecture, C, C++, C#, Java, JavaScript, Python, CUDA, TensorRT, Triton, custom GPU kernels, profiling tools, CPU/GPU bottlenecks, model compression, high-throughput inference serving stacks, DLIS, Talon routing, Triton/TensorRT-LLM stack, Azure/H100/A100 GPU environments","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":163000,"maxValue":296400,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9af8d812-df8"},"title":"AI Infrastructure Engineer","description":"<p>We&#39;re looking for Senior+ AI Infrastructure Engineers to build the systems that train and serve Intercom&#39;s next generation of AI products.</p>\n<p>As a Senior AI Infrastructure Engineer focused on model training and inference, you will:</p>\n<p>Implement and scale training pipelines for large transformer and LLM models, from data ingestion and preprocessing through distributed training and evaluation.</p>\n<p>Build and optimize inference services that deliver low-latency, high-reliability experiences for our customers, including autoscaling, routing, and fallbacks.</p>\n<p>Work on GPU-level performance: tuning kernels, improving utilization, and identifying bottlenecks across our training and inference stack.</p>\n<p>Collaborate closely with ML scientists to implement cutting edge training and inference methods and bring them to production.</p>\n<p>Play an active role in hiring, mentoring, and developing other engineers on the team.</p>\n<p>Raise the bar for technical standards, reliability, and operational excellence across Intercom’s AI 
platform.</p>\n<p>We’re looking to hire Senior+ AI Infrastructure Engineers. You’re likely a great fit if:</p>\n<p>You have 5+ years of experience in software engineering, with a strong track record of shipping high-quality products or platforms.</p>\n<p>You hold a degree in Computer Science, Computer Engineering, or a related field (or you have equivalent experience with very strong fundamentals).</p>\n<p>You have hands-on experience with one or more of the following:</p>\n<p>Model training (especially transformers and LLMs).</p>\n<p>Model inference at scale (again, especially transformers and LLMs).</p>\n<p>Low-level GPU work, such as writing CUDA or Triton kernels.</p>\n<p>Comfortable working in production environments at meaningful scale (traffic, data, or organizational).</p>\n<p>You communicate clearly, can explain complex technical topics to different audiences, and enjoy close collaboration with both engineers and non-engineers.</p>\n<p>You take pride in strong technical fundamentals, love learning, and are willing to invest in your own development.</p>\n<p>Have deep knowledge of at least one programming language (for example Python, Ruby, Java, Go, etc.). Specific language experience is less important than your ability to write clean, reliable code and learn new stacks quickly.</p>\n<p>We are a well-treated bunch, with awesome benefits! 
If there’s something important to you that’s not on this list, talk to us!</p>\n<p>Competitive salary, annual bonus and equity</p>\n<p>Regular compensation reviews - we reward great work!</p>\n<p>Unlimited access to Claude Code and best-in-class AI tools; experimentation &amp; building is encouraged &amp; celebrated.</p>\n<p>Generous paid time off above statutory minimum</p>\n<p>Hybrid working</p>\n<p>MacBooks are our standard, but we also offer Windows for certain roles when needed.</p>\n<p>Fun events for employees, friends, and family!</p>","url":"https://yubhub.co/jobs/job_9af8d812-df8","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Intercom","sameAs":"https://www.intercom.com/","logo":"https://logos.yubhub.co/intercom.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/intercom/jobs/7824142","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["model training","model inference","low-level GPU work","CUDA","Triton","Python","Ruby","Java","Go"],"x-skills-preferred":["experience at AI native companies","running training or inference workloads on Kubernetes","AWS","cloud providers","production experience with Python in ML or infrastructure contexts"],"datePosted":"2026-04-18T15:57:33.379Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Berlin, Germany"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"model training, model inference, low-level GPU work, CUDA, Triton, Python, Ruby, Java, Go, experience at AI native companies, running training or inference workloads on Kubernetes, AWS, cloud providers, production experience with Python in ML or infrastructure 
contexts"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_cba88898-896"},"title":"Research Engineer, Infrastructure, Kernels","description":"<p>We&#39;re looking for an infrastructure research engineer to design, optimize, and maintain the compute foundations that power large-scale language model training. You will develop high-performance ML kernels (e.g., CUDA, CuTe, Triton), enable efficient low-precision arithmetic, and improve the distributed compute stack that makes training large models possible.</p>\n<p>This role is perfect for an engineer who enjoys working close to the metal and across the research boundary. You&#39;ll collaborate with researchers and systems architects to bridge algorithmic design with hardware efficiency. You&#39;ll prototype new kernel implementations, profile performance across hardware generations, and help define the numerical and parallelism strategies that determine how we scale next-generation AI systems.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and implement custom ML kernels (e.g., CUDA, CuTe, Triton) for core LLM operations such as attention, matrix multiplication, gating, and normalization, optimized for modern GPU and accelerator architectures.</li>\n<li>Design and think through compute primitives to reduce memory bandwidth bottlenecks and improve kernel compute efficiency.</li>\n<li>Collaborate with research teams to align kernel-level optimizations with model architecture and algorithmic goals.</li>\n<li>Develop and maintain a library of reusable kernels and performance benchmarks that serve as the foundation for internal model training.</li>\n<li>Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources.</li>\n<li>Document and share insights through internal talks, technical papers, or open-source contributions to strengthen 
the broader ML systems community.</li>\n</ul>\n<p><strong>Skills and Qualifications</strong></p>\n<p>Minimum qualifications:</p>\n<ul>\n<li>Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar.</li>\n<li>Strong engineering skills, with the ability to contribute performant, maintainable code and debug in complex codebases.</li>\n<li>Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.</li>\n<li>Thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts.</li>\n<li>A bias for action and the initiative to work across different stacks and teams wherever you spot an opportunity to make sure something ships.</li>\n<li>Proficiency in CUDA, CuTe, Triton, or other GPU programming frameworks.</li>\n<li>Demonstrated ability to analyze, profile, and optimize compute-intensive workloads.</li>\n</ul>\n<p>Preferred qualifications:</p>\n<ul>\n<li>Experience training or supporting large-scale language models with tens of billions of parameters or more.</li>\n<li>Track record of improving research productivity through infrastructure design or process improvements.</li>\n<li>Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators.</li>\n<li>Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks.</li>\n<li>Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM).</li>\n<li>Contributions to open-source GPU, ML systems, or compiler optimization projects.</li>\n<li>Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI infrastructure.</li>\n</ul>","url":"https://yubhub.co/jobs/job_cba88898-896","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Thinking Machines Lab","sameAs":"https://thinkingmachines.ai/","logo":"https://logos.yubhub.co/thinkingmachines.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/thinkingmachines/jobs/5013934008","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000 - $475,000 USD","x-skills-required":["CUDA","CuTe","Triton","GPU programming frameworks","Deep learning frameworks (e.g., PyTorch, JAX)","Computer science","Electrical engineering","Statistics","Machine learning","Physics","Robotics"],"x-skills-preferred":["Experience training or supporting large-scale language models with tens of billions of parameters or more","Track record of improving research productivity through infrastructure design or process improvements","Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators","Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks","Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM)","Contributions to open-source GPU, ML systems, or compiler optimization projects","Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI infrastructure"],"datePosted":"2026-04-18T15:54:38.498Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"CUDA, CuTe, Triton, GPU programming frameworks, Deep learning frameworks (e.g., PyTorch, JAX), Computer science, Electrical engineering, Statistics, Machine learning, Physics, Robotics, Experience training or 
supporting large-scale language models with tens of billions of parameters or more, Track record of improving research productivity through infrastructure design or process improvements, Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators, Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks, Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM), Contributions to open-source GPU, ML systems, or compiler optimization projects, Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":475000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c9ab5cbc-dd6"},"title":"Research Engineer, Performance RL","description":"<p>We&#39;re hiring a Research Engineer to join our Code RL team within the RL organization. As a Research Engineer, you&#39;ll advance our models&#39; ability to safely write correct, fast code for accelerators.</p>\n<p>You&#39;ll need to know accelerator performance well to turn it into tasks and signals models can learn from. 
Specifically, you will:</p>\n<ul>\n<li>Invent, design and implement RL environments and evaluations.</li>\n<li>Conduct experiments and shape our research roadmap.</li>\n<li>Deliver your work into training runs.</li>\n<li>Collaborate with other researchers, engineers, and performance engineering specialists across and outside Anthropic.</li>\n</ul>\n<p>We&#39;re looking for someone with expertise in accelerators (CUDA, ROCm, Triton, Pallas), ML framework programming (JAX or PyTorch), and experience balancing research exploration with engineering implementation.</p>\n<p>Strong candidates may also have experience with reinforcement learning, porting ML workloads between different types of accelerators, and familiarity with LLM training methodologies.</p>\n<p>The annual compensation range for this role is $350,000-$850,000 USD.</p>\n<p>Please note that we&#39;re an extremely collaborative group, and we value communication skills. The easiest way to understand our research directions is to read our recent research.</p>\n<p>We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>","url":"https://yubhub.co/jobs/job_c9ab5cbc-dd6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5160330008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000-$850,000 USD","x-skills-required":["accelerator performance","ML framework programming","reinforcement learning","RL environments and evaluations","experiments and research roadmap","training 
runs","collaboration with researchers and engineers"],"x-skills-preferred":["CUDA","ROCm","Triton","Pallas","JAX","PyTorch","LLM training methodologies"],"datePosted":"2026-04-18T15:54:02.762Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"accelerator performance, ML framework programming, reinforcement learning, RL environments and evaluations, experiments and research roadmap, training runs, collaboration with researchers and engineers, CUDA, ROCm, Triton, Pallas, JAX, PyTorch, LLM training methodologies","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2bc6ae79-8ee"},"title":"Staff Technical Lead for Inference & ML Performance","description":"<p>We&#39;re looking for a Staff Technical Lead for Inference &amp; ML Performance to guide a team in building and optimizing state-of-the-art inference systems. This role is intense yet deeply impactful.</p>\n<p>You&#39;ll shape the future of fal&#39;s inference engine and ensure our generative models achieve best-in-class performance. Your work directly impacts our ability to rapidly deliver cutting-edge creative solutions to users, from individual creators to global brands.</p>\n<p>Day-to-day, you&#39;ll set technical direction, guide your team to build high-performance inference solutions, and personally contribute to critical inference performance enhancements and optimizations. 
You&#39;ll collaborate closely with research &amp; applied ML teams, influence model inference strategies and deployment techniques, and drive advanced performance optimizations.</p>\n<p>As a leader, you&#39;ll mentor, coach, and grow your team of performance-focused engineers, helping them innovate, solve complex performance challenges, and level up their skills.</p>\n<p>To succeed in this role, you&#39;ll need to be deeply experienced in ML performance optimization, understand the full ML performance stack, and know inference inside-out. You&#39;ll also need to thrive in cross-functional collaboration and have excellent leadership skills.</p>\n<p>If you&#39;re ready to lead the future of inference performance at a fast-paced, high-growth frontier, apply now!</p>","url":"https://yubhub.co/jobs/job_2bc6ae79-8ee","directApply":true,"hiringOrganization":{"@type":"Organization","name":"fal","sameAs":"https://fal.com","logo":"https://logos.yubhub.co/fal.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/fal/jobs/4012780009","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["ML performance optimization","PyTorch","TensorRT","TransformerEngine","Triton","CUTLASS kernels","Quantization","Kernel authoring","Compilation","Model parallelism","Distributed serving","Profiling"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:50:42.839Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"ML performance optimization, PyTorch, TensorRT, TransformerEngine, Triton, CUTLASS kernels, Quantization, Kernel authoring, Compilation, Model parallelism, Distributed serving, 
Profiling"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_1507524b-770"},"title":"Research Engineer, Performance RL","description":"<p>We&#39;re hiring a Research Engineer to join our Code RL team within the RL organization. As a Research Engineer, you&#39;ll advance our models&#39; ability to safely write correct, fast code for accelerators.</p>\n<p>You&#39;ll need to know accelerator performance well to turn it into tasks and signals models can learn from. Specifically, you will:</p>\n<ul>\n<li>Invent, design, and implement RL environments and evaluations.</li>\n<li>Conduct experiments and shape our research roadmap.</li>\n<li>Deliver your work into training runs.</li>\n<li>Collaborate with other researchers, engineers, and performance engineering specialists across and outside Anthropic.</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have expertise with accelerators (CUDA, ROCm, Triton, Pallas) and ML framework programming (JAX or PyTorch).</li>\n<li>Have worked across the stack – kernels, model code, distributed systems.</li>\n<li>Know how to balance research exploration with engineering implementation.</li>\n<li>Are passionate about AI&#39;s potential and committed to developing safe and beneficial systems.</li>\n</ul>\n<p>Strong candidates may also have:</p>\n<ul>\n<li>Experience with reinforcement learning.</li>\n<li>Experience porting ML workloads between different types of accelerators.</li>\n<li>Familiarity with LLM training methodologies.</li>\n</ul>\n<p>The annual compensation range for this role is $350,000-$850,000 USD.</p>\n<p>We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>We believe that the highest-impact AI research will be big science. 
At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact, advancing our long-term goals of steerable, trustworthy AI, rather than working on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science.</p>\n<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_1507524b-770","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5160330008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000-$850,000 USD","x-skills-required":["accelerators","ML framework programming","distributed systems","reinforcement learning","LLM training methodologies"],"x-skills-preferred":["CUDA","ROCm","Triton","Pallas","JAX","PyTorch"],"datePosted":"2026-04-18T15:42:09.925Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"accelerators, ML framework programming, distributed systems, reinforcement learning, LLM training methodologies, CUDA, ROCm, Triton, Pallas, JAX, 
PyTorch","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_28107212-128"},"title":"Performance Engineer, GPU","description":"<p>As a GPU Performance Engineer at Anthropic, you will be responsible for architecting and implementing the foundational systems that power Claude and push the frontiers of what&#39;s possible with large language models. You will maximize GPU utilization and performance at unprecedented scale, develop cutting-edge optimizations that directly enable new model capabilities, and dramatically improve inference efficiency.</p>\n<p>Working at the intersection of hardware and software, you will implement state-of-the-art techniques from custom kernel development to distributed system architectures. Your work will span the entire stack, from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization.</p>\n<p>Strong candidates will have a track record of delivering transformative GPU performance improvements in production ML systems and will be excited to shape the future of AI infrastructure alongside world-class researchers and engineers.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Architect and implement foundational systems that power Claude</li>\n<li>Maximize GPU utilization and performance at unprecedented scale</li>\n<li>Develop cutting-edge optimizations that directly enable new model capabilities</li>\n<li>Dramatically improve inference efficiency</li>\n<li>Implement state-of-the-art techniques from custom kernel development to distributed system architectures</li>\n<li>Work at the intersection of hardware and software</li>\n<li>Span the entire stack, from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect 
synchronization</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>Deep experience with GPU programming and optimization at scale</li>\n<li>Impact-driven, passionate about delivering measurable performance breakthroughs</li>\n<li>Ability to navigate complex systems from hardware interfaces to high-level ML frameworks</li>\n<li>Enjoy collaborative problem-solving and pair programming</li>\n<li>Want to work on state-of-the-art language models with real-world impact</li>\n<li>Care about the societal impacts of your work</li>\n<li>Thrive in ambiguous environments where you define the path forward</li>\n</ul>\n<p>Nice to have:</p>\n<ul>\n<li>Experience with GPU Kernel Development: CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization</li>\n<li>ML Compilers &amp; Frameworks: PyTorch/JAX internals, torch.compile, XLA, custom operators</li>\n<li>Performance Engineering: Kernel fusion, memory bandwidth optimization, profiling with Nsight</li>\n<li>Distributed Systems: NCCL, NVLink, collective communication, model parallelism</li>\n<li>Low-Precision: INT8/FP8 quantization, mixed-precision techniques</li>\n<li>Production Systems: Large-scale training infrastructure, fault tolerance, cluster orchestration</li>\n</ul>\n<p>Representative projects:</p>\n<ul>\n<li>Co-design attention mechanisms and algorithms for next-generation hardware architectures</li>\n<li>Develop custom kernels for emerging quantization formats and mixed-precision techniques</li>\n<li>Design distributed communication strategies for multi-node GPU clusters</li>\n<li>Optimize end-to-end training and inference pipelines for frontier language models</li>\n<li>Build performance modeling frameworks to predict and optimize GPU utilization</li>\n<li>Implement kernel fusion strategies to minimize memory bandwidth bottlenecks</li>\n<li>Create resilient systems for planet-scale distributed training infrastructure</li>\n<li>Profile and eliminate performance bottlenecks in production serving 
infrastructure</li>\n<li>Partner with hardware vendors to influence future accelerator capabilities and software stacks</li>\n</ul>\n<p>Note: The salary range for this position is $280,000-$850,000 USD per year.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_28107212-128","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4926227008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$280,000-$850,000 USD per year","x-skills-required":["GPU programming","optimization at scale","CUDA","Triton","CUTLASS","Flash Attention","tensor core optimization","PyTorch/JAX internals","torch.compile","XLA","custom operators","kernel fusion","memory bandwidth optimization","profiling with Nsight","NCCL","NVLink","collective communication","model parallelism","INT8/FP8 quantization","mixed-precision techniques","large-scale training infrastructure","fault tolerance","cluster orchestration"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:40:11.758Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"GPU programming, optimization at scale, CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization, PyTorch/JAX internals, torch.compile, XLA, custom operators, kernel fusion, memory bandwidth optimization, profiling with Nsight, NCCL, NVLink, collective communication, model parallelism, INT8/FP8 quantization, mixed-precision techniques, large-scale training infrastructure, fault tolerance, cluster 
orchestration","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":280000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_586b9fef-509"},"title":"Senior Software Engineer - Network Enablement (Applied ML)","description":"<p>We believe that the way people interact with their finances will drastically improve in the next few years. We&#39;re dedicated to empowering this transformation by building the tools and experiences that thousands of developers use to create their own products.</p>\n<p>On this team, you will build and operate the ML infrastructure and product services that enable trust and intelligence across Plaid&#39;s network. You&#39;ll own feature engineering, offline training and batch scoring, online feature serving, and real-time inference so model outputs directly power partner-facing fraud &amp; trust products and bank intelligence features.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Embed model inference into Network Enablement product flows and decision logic (APIs, feature flags, backend flows).</li>\n<li>Define and instrument product + ML success metrics (fraud reduction, retention lift, false positives, downstream impact).</li>\n<li>Design and run experiments and rollout plans (backtesting, shadow scoring, A/B tests, feature-flagged releases) to validate product hypotheses.</li>\n<li>Build and operate offline training pipelines and production batch scoring for bank intelligence products.</li>\n<li>Ship and maintain online feature serving and low-latency model inference endpoints for real-time partner/bank scoring.</li>\n<li>Implement model CI/CD, model/version registry, and safe rollout/rollback strategies.</li>\n<li>Monitor model/data health: drift/regression detection, model-quality dashboards, alerts, and SLOs targeted to partner product needs.</li>\n<li>Ensure offline 
and online parity, data lineage, and automated validation / data contracts to reduce regressions.</li>\n<li>Optimize inference performance and cost for real-time scoring (batching, caching, runtime selection).</li>\n<li>Ensure fairness, explainability and PII-aware handling for partner-facing ML features; maintain auditability for compliance.</li>\n<li>Partner with platform and cross-functional teams to scale the ML/data foundation (graph features, sequence embeddings, unified pipelines).</li>\n<li>Mentor engineers and document team standards for ML productization and operations.</li>\n</ul>\n<p><strong>Qualifications</strong></p>\n<ul>\n<li>Must-haves:</li>\n<li>Strong software engineering skills including systems design, APIs, and building reliable backend services (Go or Python preferred).</li>\n<li>Production experience with batch and streaming data pipelines and orchestration tools such as Airflow or Spark.</li>\n<li>Experience building or operating real-time scoring and online feature-serving systems, including feature stores and low-latency model inference.</li>\n<li>Experience integrating model outputs into product flows (APIs, feature flags) and measuring impact through experiments and product metrics.</li>\n<li>Experience with model lifecycle and operations: model registries, CI/CD for models, reproducible training, offline &amp; online parity, monitoring and incident response.</li>\n<li>Nice to have:</li>\n<li>Experience in fraud, risk, or marketing intelligence domains.</li>\n<li>Experience with feature-store products (Tecton / Chronon / Feast / internal) and unified pipelines.</li>\n<li>Experience with graph frameworks, graph feature engineering, or sequence embeddings.</li>\n<li>Experience optimizing inference at scale (Triton/ONNX/quantization, batching, caching).</li>\n</ul>\n<p><strong>Additional Information</strong></p>\n<p>Our mission at Plaid is to unlock financial freedom for everyone. 
To support that mission, we seek to build a diverse team of driven individuals who care deeply about making the financial ecosystem more equitable.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_586b9fef-509","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Plaid","sameAs":"https://plaid.com/","logo":"https://logos.yubhub.co/plaid.com.png"},"x-apply-url":"https://jobs.lever.co/plaid/43b1374d-5c5e-4b63-b710-a95e3cb76bbe","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$190,800-$286,800 per year","x-skills-required":["software engineering","systems design","APIs","backend services","Go","Python","batch and streaming data pipelines","orchestration tools","Airflow","Spark","real-time scoring","online feature-serving systems","feature stores","low-latency model inference","model outputs","product flows","experiments","product metrics","model lifecycle","operations","model registries","CI/CD","reproducible training","offline & online parity","monitoring","incident response"],"x-skills-preferred":["fraud","risk","marketing intelligence","feature-store products","unified pipelines","graph frameworks","graph feature engineering","sequence embeddings","inference at scale","Triton","ONNX","quantization","batching","caching"],"datePosted":"2026-04-17T12:51:26.228Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, systems design, APIs, backend services, Go, Python, batch and streaming data pipelines, orchestration tools, Airflow, Spark, real-time scoring, online feature-serving systems, feature stores, low-latency model inference, model outputs, product flows, experiments, product metrics, model lifecycle, 
operations, model registries, CI/CD, reproducible training, offline & online parity, monitoring, incident response, fraud, risk, marketing intelligence, feature-store products, unified pipelines, graph frameworks, graph feature engineering, sequence embeddings, inference at scale, Triton, ONNX, quantization, batching, caching","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":190800,"maxValue":286800,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5c28c97d-fc5"},"title":"Member of Technical Staff - Image / Video Generation","description":"<p><strong>Job Title</strong></p>\n<p>Member of Technical Staff - Image / Video Generation</p>\n<p><strong>Job Description</strong></p>\n<p>We&#39;re the team behind Latent Diffusion, Stable Diffusion, and FLUX, foundational technologies that changed how the world creates images and video. We&#39;re creating the generative models that power how people make images and video, tools used by millions of creators, developers, and businesses worldwide. Our FLUX models are among the most advanced in the world, and we’re just getting started.</p>\n<p><strong>Why This Role</strong></p>\n<p>You&#39;ll train large-scale diffusion models for image and video generation, exploring new approaches while maintaining the rigor that helps us distinguish meaningful progress from incremental tweaks. 
This isn&#39;t about following established recipes; it&#39;s about running the experiments that clarify which architectural choices matter and which are less impactful.</p>\n<p><strong>What You’ll Work On</strong></p>\n<ul>\n<li>Train large-scale diffusion transformer models for image and video data, working at the scale where intuitions break and empirical evidence matters</li>\n<li>Rigorously ablate design choices, running experiments that isolate variables, control for confounds, and produce insights you can actually trust, then communicating those results to shape our research direction</li>\n<li>Reason about the speed-quality tradeoffs of neural network architectures in production settings where both constraints matter simultaneously</li>\n<li>Fine-tune diffusion models for specialized applications like image and video upscalers, inpainting/outpainting models, and other tasks where general-purpose models aren&#39;t enough</li>\n</ul>\n<p><strong>What We’re Looking For</strong></p>\n<ul>\n<li>You&#39;ve trained large-scale diffusion models and developed strong intuitions about what matters. You know that at research scale, every design choice has tradeoffs, and the only way to know which ones are worth making is through careful ablation. 
You&#39;re comfortable debugging distributed training issues and presenting research findings to the team.</li>\n</ul>\n<p><strong>Required Skills</strong></p>\n<ul>\n<li>Hands-on experience training large-scale diffusion models for image and video data, with practical knowledge of common failure modes and what matters most in training</li>\n<li>Experience fine-tuning diffusion models for specialized applications: upscalers, inpainting, outpainting, or other tasks where understanding the domain matters as much as understanding the architecture</li>\n<li>Deep understanding of how to effectively evaluate image and video generative models, knowing which metrics correlate with quality and which are just convenient proxies</li>\n<li>Strong proficiency in PyTorch, transformer architectures, and the full ecosystem of modern deep learning</li>\n<li>Solid understanding of distributed training techniques (FSDP, low precision training, model parallelism), because our models don&#39;t fit on one GPU and training decisions impact research outcomes</li>\n</ul>\n<p><strong>Preferred Skills</strong></p>\n<ul>\n<li>Experience writing forward and backward Triton kernels and ensuring their correctness while considering floating point errors</li>\n<li>Proficiency with profiling, debugging, and optimizing single and multi-GPU operations using tools like Nsight or stack trace viewers</li>\n<li>Knowledge of the performance characteristics of different architectural choices at scale</li>\n<li>Published research that contributed to how people think about generative models</li>\n</ul>\n<p><strong>How We Work Together</strong></p>\n<p>We’re a distributed team with real offices that people actually use. Depending on your role, you’ll either join us in Freiburg or SF at least 2 days a week (or one full week every other week), or work remotely with a monthly in-person week to stay connected. We’ll cover reasonable travel costs to make this possible. 
We think in-person time matters, and we’ve structured things to make it accessible to all. We’ll discuss what this will look like for the role during our interview process.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_5c28c97d-fc5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Black Forest Labs","sameAs":"https://www.blackforestlabs.com/","logo":"https://logos.yubhub.co/blackforestlabs.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/blackforestlabs/jobs/4132217008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["large-scale diffusion models","image and video data","PyTorch","transformer architectures","distributed training techniques"],"x-skills-preferred":["writing forward and backward Triton kernels","profiling, debugging, and optimizing single and multi-GPU operations","published research on generative models"],"datePosted":"2026-04-17T12:25:33.116Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Freiburg (Germany)"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"large-scale diffusion models, image and video data, PyTorch, transformer architectures, distributed training techniques, writing forward and backward Triton kernels, profiling, debugging, and optimizing single and multi-GPU operations, published research on generative models"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3c4831ed-fa8"},"title":"Technical Product Manager","description":"<p>We&#39;re hiring a Technical Product Manager to define and execute Alluxio&#39;s AI systems strategy, spanning inference, training, and emerging agentic workloads. 
This role bridges the worlds of AI infrastructure and distributed data systems, guiding how Alluxio evolves to serve next-generation model architectures and large-scale data flows.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Define the long-term vision and roadmap for Alluxio&#39;s AI data platform, covering inference, training, and agentic workloads.</li>\n<li>Collaborate with engineering to design features that deliver high-throughput, low-latency data access (e.g., GPU-aware caching, streaming reads, tiered prefetching).</li>\n<li>Ensure seamless integration with frameworks like PyTorch, TensorFlow, Ray, and Triton; evolve Alluxio&#39;s APIs for AI-native workloads.</li>\n<li>Engage directly with enterprise AI teams to understand workload patterns, validate impact, and prioritize roadmap direction.</li>\n<li>Stay ahead of trends in multi-model serving, retrieval-augmented generation (RAG), and agentic orchestration; translate them into actionable product plans.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>5–9 years of experience in product management or technical leadership within AI infrastructure, ML platforms, or distributed systems.</li>\n<li>Strong understanding of AI/ML workflows, from model training and deployment to inference and data-access pipelines.</li>\n<li>Proven track record of delivering infrastructure features that improve latency, GPU utilization, or total cost of ownership.</li>\n<li>Technical fluency with distributed systems, caching, and cloud orchestration (Kubernetes, AWS/GCP/Azure).</li>\n<li>Familiarity with AI frameworks such as PyTorch, TensorFlow, Triton, Ray, or LangChain.</li>\n<li>Exceptional communication and strategic thinking: the ability to translate complex systems work into clear, prioritized roadmaps.</li>\n</ul>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Shape how the world&#39;s most advanced AI systems access and process data.</li>\n<li>Work at the intersection of distributed systems, AI 
acceleration, and open source.</li>\n<li>Collaborate with world-class engineers, researchers, and customers driving the AI frontier.</li>\n<li>Competitive compensation and equity package with comprehensive benefits.</li>\n<li>A culture built on curiosity, empathy, and deep technical rigor.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3c4831ed-fa8","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Alluxio","sameAs":"https://alluxio.io","logo":"https://logos.yubhub.co/alluxio.io.png"},"x-apply-url":"https://jobs.lever.co/alluxio/e7e0f8a4-83ed-416b-9f95-7f3ed8abfa52","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Distributed systems","Caching","Cloud orchestration","Kubernetes","AWS/GCP/Azure","PyTorch","TensorFlow","Triton","Ray","LangChain"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:15:34.231Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Distributed systems, Caching, Cloud orchestration, Kubernetes, AWS/GCP/Azure, PyTorch, TensorFlow, Triton, Ray, LangChain"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_11a60d5a-f54"},"title":"Performance Engineer, GPU","description":"<p><strong>About the role:</strong></p>\n<p>Pioneering the next generation of AI requires breakthrough innovations in GPU performance and systems engineering. As a GPU Performance Engineer, you&#39;ll architect and implement the foundational systems that power Claude and push the frontiers of what&#39;s possible with large language models. 
You&#39;ll be responsible for maximizing GPU utilization and performance at unprecedented scale, developing cutting-edge optimizations that directly enable new model capabilities and dramatically improve inference efficiency.</p>\n<p>Working at the intersection of hardware and software, you&#39;ll implement state-of-the-art techniques from custom kernel development to distributed system architectures. Your work will span the entire stack—from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization.</p>\n<p>Strong candidates will have a track record of delivering transformative GPU performance improvements in production ML systems and will be excited to shape the future of AI infrastructure alongside world-class researchers and engineers.</p>\n<p><strong>You might be a good fit if you:</strong></p>\n<ul>\n<li>Have deep experience with GPU programming and optimization at scale</li>\n<li>Are impact-driven, passionate about delivering measurable performance breakthroughs</li>\n<li>Can navigate complex systems from hardware interfaces to high-level ML frameworks</li>\n<li>Enjoy collaborative problem-solving and pair programming</li>\n<li>Want to work on state-of-the-art language models with real-world impact</li>\n<li>Care about the societal impacts of your work</li>\n<li>Thrive in ambiguous environments where you define the path forward</li>\n</ul>\n<p><strong>Strong candidates may also have experience with:</strong></p>\n<ul>\n<li>GPU Kernel Development: CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization</li>\n<li>ML Compilers &amp; Frameworks: PyTorch/JAX internals, torch.compile, XLA, custom operators</li>\n<li>Performance Engineering: Kernel fusion, memory bandwidth optimization, profiling with Nsight</li>\n<li>Distributed Systems: NCCL, NVLink, collective communication, model parallelism</li>\n<li>Low-Precision: INT8/FP8 quantization, mixed-precision techniques</li>\n<li>Production Systems: Large-scale 
training infrastructure, fault tolerance, cluster orchestration</li>\n</ul>\n<p><strong>Representative projects:</strong></p>\n<ul>\n<li>Co-design attention mechanisms and algorithms for next-generation hardware architectures</li>\n<li>Develop custom kernels for emerging quantization formats and mixed-precision techniques</li>\n<li>Design distributed communication strategies for multi-node GPU clusters</li>\n<li>Optimize end-to-end training and inference pipelines for frontier language models</li>\n<li>Build performance modeling frameworks to predict and optimize GPU utilization</li>\n<li>Implement kernel fusion strategies to minimize memory bandwidth bottlenecks</li>\n<li>Create resilient systems for planet-scale distributed training infrastructure</li>\n<li>Profile and eliminate performance bottlenecks in production serving infrastructure</li>\n<li>Partner with hardware vendors to influence future accelerator capabilities and software stacks</li>\n</ul>\n<p><strong>Deadline to apply:</strong> None. 
Applications will be reviewed on a rolling basis.</p>\n<p>The expected salary range for this position is:</p>\n<p>Annual Salary: $280,000 - $850,000 USD</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_11a60d5a-f54","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4926227008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$280,000 - $850,000 USD","x-skills-required":["GPU programming","optimization at scale","custom kernel development","distributed system architectures","low-level tensor core optimizations","orchestrating thousands of GPUs","GPU kernel development","CUDA","Triton","CUTLASS","Flash Attention","tensor core optimization","ML compilers & frameworks","PyTorch/JAX internals","torch.compile","XLA","custom operators","performance engineering","kernel fusion","memory bandwidth optimization","profiling with Nsight","distributed systems","NCCL","NVLink","collective communication","model parallelism","low-precision","INT8/FP8 quantization","mixed-precision techniques","production systems","large-scale training infrastructure","fault tolerance","cluster orchestration"],"x-skills-preferred":["GPU programming","optimization at scale","custom kernel development","distributed system architectures","low-level tensor core optimizations","orchestrating thousands of GPUs","GPU kernel development","CUDA","Triton","CUTLASS","Flash Attention","tensor core optimization","ML compilers & frameworks","PyTorch/JAX internals","torch.compile","XLA","custom operators","performance engineering","kernel fusion","memory bandwidth optimization","profiling with Nsight","distributed systems","NCCL","NVLink","collective 
communication","model parallelism","low-precision","INT8/FP8 quantization","mixed-precision techniques","production systems","large-scale training infrastructure","fault tolerance","cluster orchestration"],"datePosted":"2026-03-08T13:45:05.412Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"GPU programming, optimization at scale, custom kernel development, distributed system architectures, low-level tensor core optimizations, orchestrating thousands of GPUs, GPU kernel development, CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization, ML compilers & frameworks, PyTorch/JAX internals, torch.compile, XLA, custom operators, performance engineering, kernel fusion, memory bandwidth optimization, profiling with Nsight, distributed systems, NCCL, NVLink, collective communication, model parallelism, low-precision, INT8/FP8 quantization, mixed-precision techniques, production systems, large-scale training infrastructure, fault tolerance, cluster 
orchestration","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":280000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_7badeaf5-492"},"title":"Hardware / Software CoDesign Engineer","description":"<p><strong>Hardware / Software CoDesign Engineer</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Location Type</strong></p>\n<p>Hybrid</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$342K – $555K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 
hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>OpenAI’s Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI-native silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI’s supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI.</p>\n<p><strong>About the Role</strong></p>\n<p>As an Engineer on our hardware optimization and co-design team, you will co-design future hardware from different vendors for programmability and performance. You will work with our kernel, compiler and machine learning engineers to understand their unique needs related to ML techniques, algorithms, numerical approximations, programming expressivity, and compiler optimizations. You will evangelize these constraints to various vendors to develop and influence future hardware architectures towards efficient training and inference on our models. 
If you are excited about efficiently distributing a large language model across devices, dealing with and optimizing system-wide/rack-wide networking bottlenecks and eventually tailoring the compute pipe and memory hierarchy of the hardware platform, simulating workloads at different abstractions and working closely with our partners, this is the perfect opportunity!</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Co-design future hardware for programmability and performance with our hardware vendors</li>\n</ul>\n<ul>\n<li>Assist hardware vendors in developing optimal kernels and add support for them in our compiler</li>\n</ul>\n<ul>\n<li>Develop performance estimates for critical kernels for different hardware configurations and drive decisions on compute core and memory hierarchy features</li>\n</ul>\n<ul>\n<li>Build system performance models at different abstraction levels and carry out analysis to drive decisions on scale up, scale out, front end networking</li>\n</ul>\n<ul>\n<li>Work with machine learning engineers, kernel engineers and compiler developers to understand their vision and needs from high performance accelerators</li>\n</ul>\n<ul>\n<li>Manage communication and coordination with internal and external partners</li>\n</ul>\n<ul>\n<li>Influence the roadmaps of hardware partners to optimize them for OpenAI’s workloads.</li>\n</ul>\n<ul>\n<li>Evaluate potential partners’ accelerators and platforms.</li>\n</ul>\n<ul>\n<li>As the scope of the role and team grows, understand and influence roadmaps for hardware partners for our datacenter networks, racks, and buildings.</li>\n</ul>\n<p><strong>You might thrive in this role if you have:</strong></p>\n<ul>\n<li>4+ years of industry experience, including experience harnessing compute at scale and optimizing ML platform code to run efficiently on target hardware.</li>\n</ul>\n<ul>\n<li>Strong experience in software/hardware co-design</li>\n</ul>\n<ul>\n<li>Deep understanding of GPU and/or other AI 
accelerators</li>\n</ul>\n<ul>\n<li>Experience with CUDA, Triton or a related accelerator programming language</li>\n</ul>\n<ul>\n<li>Experience driving Machine Learning accuracy with low precision formats</li>\n</ul>\n<ul>\n<li>Experience with system performance modeling and analysis to optimize ML model deployment</li>\n</ul>\n<ul>\n<li>Strong coding skills in C/C++ and Python</li>\n</ul>\n<ul>\n<li>Familiarity with the fundamentals of deep learning computing and chip architecture/microarchitecture.</li>\n</ul>\n<p><strong>These attributes are nice to have:</strong></p>\n<ul>\n<li>PhD in Computer Science and Engineering with a specialization in Computer Architecture, Parallel Computing, Compilers, or other Systems</li>\n</ul>\n<ul>\n<li>Strong understanding of LLMs and challenges related to their training and inference</li>\n</ul>","url":"https://yubhub.co/jobs/job_7badeaf5-492","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/bdbb2292-ecb3-42dc-ba89-65edf397d8f8","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$342K – $555K • Offers Equity","x-skills-required":["software/hardware co-design","GPU and/or other AI accelerators","CUDA, Triton or a related accelerator programming language","Machine Learning accuracy with low precision formats","system performance modeling and analysis to optimize ML model deployment","C/C++ and Python"],"x-skills-preferred":["PhD in Computer Science and Engineering with a specialization in Computer Architecture, Parallel Computing. 
Compilers or other Systems","Strong understanding of LLMs and challenges related to their training and inference"],"datePosted":"2026-03-06T18:39:51.459Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software/hardware co-design, GPU and/or other AI accelerators, CUDA, Triton or a related accelerator programming language, Machine Learning accuracy with low precision formats, system performance modeling and analysis to optimize ML model deployment, C/C++ and Python, PhD in Computer Science and Engineering with a specialization in Computer Architecture, Parallel Computing. Compilers or other Systems, Strong understanding of LLMs and challenges related to their training and inference","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":342000,"maxValue":555000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_989f992b-6b2"},"title":"Software Engineer, Inference – AMD GPU Enablement","description":"<p><strong>Software Engineer, Inference – AMD GPU Enablement</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$295K – $555K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. 
In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the Team</strong></p>\n<p>Our Inference team brings OpenAI’s most capable research and technology 
to the world through our products. We empower consumers, enterprises and developers alike to use and access our state-of-the-art AI models, allowing them to do things that they’ve never been able to before. We focus on performant and efficient model inference, as well as accelerating research progression via model inference.</p>\n<p><strong>About the Role</strong></p>\n<p>We’re hiring engineers to scale and optimize OpenAI’s inference infrastructure across emerging GPU platforms. You’ll work across the stack - from low-level kernel performance to high-level distributed execution - and collaborate closely with research, infra, and performance teams to ensure our largest models run smoothly on new hardware.</p>\n<p>This is a high-impact opportunity to shape OpenAI’s multi-platform inference capabilities from the ground up with a particular focus on advancing inference performance on AMD accelerators.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Own bring-up, correctness and performance of the OpenAI inference stack on AMD hardware.</li>\n</ul>\n<ul>\n<li>Integrate internal model-serving infrastructure (e.g., vLLM, Triton) into a variety of GPU-backed systems.</li>\n</ul>\n<ul>\n<li>Debug and optimize distributed inference workloads across memory, network, and compute layers.</li>\n</ul>\n<ul>\n<li>Validate correctness, performance, and scalability of model execution on large GPU clusters.</li>\n</ul>\n<ul>\n<li>Collaborate with partner teams to design and optimize high-performance GPU kernels for accelerators using HIP, Triton, or other performance-focused frameworks.</li>\n</ul>\n<ul>\n<li>Collaborate with partner teams to build, integrate and tune collective communication libraries (e.g., RCCL) used to parallelize model execution across many GPUs.</li>\n</ul>\n<p><strong>You can thrive in this role if you:</strong></p>\n<ul>\n<li>Have experience writing or porting GPU kernels using HIP, CUDA, or Triton, and care deeply about low-level 
performance.</li>\n</ul>\n<ul>\n<li>Are familiar with communication libraries like NCCL/RCCL and understand their role in high-throughput model serving.</li>\n</ul>\n<ul>\n<li>Have worked on distributed inference systems and are comfortable scaling models across fleets of accelerators.</li>\n</ul>\n<ul>\n<li>Enjoy solving end-to-end performance challenges across hardware, system libraries, and orchestration layers.</li>\n</ul>\n<ul>\n<li>Are excited to be part of a small, fast-moving team building new infrastructure from first principles.</li>\n</ul>\n<p><strong>Nice to Have:</strong></p>\n<ul>\n<li>Contributions to open-source libraries like RCCL, Triton, or vLLM.</li>\n</ul>\n<ul>\n<li>Experience with GPU performance tools (Nsight, rocprof, perf) and memory/comms profiling.</li>\n</ul>\n<ul>\n<li>Prior experience deploying inference on other non-NVIDIA GPU environments.</li>\n</ul>\n<ul>\n<li>Knowledge of model/tensor parallelism, mixed precision, and serving 10B+ parameter models.</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>","url":"https://yubhub.co/jobs/job_989f992b-6b2","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/9b79406c-89a8-49bd-8a38-e72db80996e9","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$295K – $555K • Offers Equity","x-skills-required":["GPU kernels","HIP","CUDA","Triton","NCCL/RCCL","distributed inference systems","GPU performance tools","memory/comms profiling"],"x-skills-preferred":["open-source libraries","GPU performance tools","memory/comms profiling"],"datePosted":"2026-03-06T18:28:36.084Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"GPU kernels, HIP, CUDA, Triton, NCCL/RCCL, distributed inference systems, GPU performance tools, memory/comms profiling, open-source libraries","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":295000,"maxValue":555000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_46bb9922-091"},"title":"ML Research Engineer - Hardware Codesign","description":"<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full 
time</p>\n<p><strong>Department</strong></p>\n<p>Scaling</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$185K – $455K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details 
about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the Team</strong></p>\n<p>OpenAI’s Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI’s supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI.</p>\n<p><strong>About the Role</strong></p>\n<p>We’re seeking a Research-Hardware Codesign Engineer to operate at the boundary between model research and silicon/system architecture. You’ll help shape the numerics, architecture, and technology bets of future OpenAI silicon in collaboration with both Research and Hardware.</p>\n<p>Your work will include debugging gaps between rooflines and reality, writing quantization kernels, derisking numerics via model evals, quantifying system architecture tradeoffs, and implementing novel numeric RTL. This is a hands-on role for people who go looking for hard problems, get to ground truth, and drive it to production. 
Strong prioritization and clear, honest communication are essential.</p>\n<p>Location: San Francisco, CA (Hybrid: 3 days/week onsite)</p>\n<p>Relocation assistance available.</p>\n<p><strong>In this role you will:</strong></p>\n<ul>\n<li>Build on our roofline simulator to track evolving workloads, and deliver analyses that quantify the impact of system architecture decisions and support technology pathfinding.</li>\n</ul>\n<ul>\n<li>Debug gaps between performance simulation and real measurements; clearly communicate root cause, bottlenecks, and invalid assumptions.</li>\n</ul>\n<ul>\n<li>Write emulation kernels for low-precision numerics and lossy compression schemes, and get Research the information they need to trade off efficiency against model quality.</li>\n</ul>\n<ul>\n<li>Prototype numerics modules by pushing RTL through synthesis; hand off novel numerics cleanly, or occasionally own an RTL module end-to-end.</li>\n</ul>\n<ul>\n<li>Proactively pull in new ML workloads, prototype them with rooflines and/or functional simulation, and drive initial evaluation of new opportunities or risks.</li>\n</ul>\n<ul>\n<li>Understand the whole picture from ML science to hardware optimization, and slice this end-to-end objective into near-term deliverables.</li>\n</ul>\n<ul>\n<li>Build ad-hoc collaborations across teams with very different goals and areas of expertise, and keep progress unblocked.</li>\n</ul>\n<ul>\n<li>Communicate design tradeoffs clearly with explicit assumptions and confidence levels; produce a trail of evidence that enables confident execution.</li>\n</ul>\n<p><strong>You will thrive in this role if you have:</strong></p>\n<ul>\n<li>An exceptional track record of high-quality technical output, and a bias for shipping a prototype now and iterating later in the absence of clear requirements.</li>\n</ul>\n<ul>\n<li>Strong Python, and C++ or Rust, with a cautious attitude toward correctness and an intuition for clean 
extensibility.</li>\n</ul>\n<ul>\n<li>Experience writing Triton, CUDA, or similar, and an understanding of the resulting mapping of tensor ops to functional units.</li>\n</ul>\n<ul>\n<li>Working knowledge of PyTorch or JAX; experience in large ML codebases is a plus.</li>\n</ul>\n<ul>\n<li>Practical understanding of floating point numerics, the ML tradeoffs of reduced precision, and the current state of the art in model quantization.</li>\n</ul>\n<ul>\n<li>Deep understanding of transformer models, and strong intuition for transformer rooflines and the tradeoffs of sharded training and inference in large-scale ML systems.</li>\n</ul>\n<ul>\n<li>Experience writing RTL (especially for floating point logic) and understanding of PPA tradeoffs is a plus.</li>\n</ul>\n<ul>\n<li>Strong cross-functional communication (e.g. across ML researchers and hardware engineers); ability to slice ambiguous early-incubation ideas into concrete arenas in which progress can be made.</li>\n</ul>\n<p>To comply with U.S. 
export control laws and regulations, candidates for this role may need to meet certain legal status requirements.</p>","url":"https://yubhub.co/jobs/job_46bb9922-091","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/5931abef-191b-417e-89f1-1d06f00e908c","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$185K – $455K","x-skills-required":["Python","C++","Rust","Triton","CUDA","PyTorch","JAX","Floating point numerics","Model quantization","Transformer models","RTL","PPA tradeoffs"],"x-skills-preferred":["Strong Python","C++ or Rust","Experience writing Triton","CUDA or similar","Working knowledge of PyTorch or JAX","Experience in large ML codebases"],"datePosted":"2026-03-06T18:28:06.437Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, C++, Rust, Triton, CUDA, PyTorch, JAX, Floating point numerics, Model quantization, Transformer models, RTL, PPA tradeoffs, Strong Python, C++ or Rust, Experience writing Triton, CUDA or similar, Working knowledge of PyTorch or JAX, Experience in large ML codebases","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":185000,"maxValue":455000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c041d54a-929"},"title":"Internship Program","description":"<p>Perplexity is excited to announce the Internship Program for exceptional Master’s or PhD students studying Computer Science or Engineering in 
the UK, enrolled in the 2025-2026 academic year. This is an intensive program in which you will work directly with our AI Inference team.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<ul>\n<li>Work with the inference team to improve serving latency and throughput</li>\n<li>Bring up support for new models and state-of-the-art inference optimizations or quantization schemes</li>\n<li>Optimize inference across the entire stack, from GPU kernels to serving endpoints</li>\n</ul>\n<p><strong>What you need</strong></p>\n<ul>\n<li>Strong engineering track record with proven knowledge of fundamentals and programming languages (multi-threaded programming, networking, compilation, systems programming, etc)</li>\n<li>Pursuing a Master&#39;s or PhD in Computer Science with a focus on performance-related subjects (HPC, Compilers, Distributed Systems)</li>\n</ul>","url":"https://yubhub.co/jobs/job_c041d54a-929","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Perplexity","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/perplexity.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/perplexity/79a07e2d-6150-4929-80fe-bbe13a641763","x-work-arrangement":"hybrid","x-experience-level":"entry","x-job-type":"internship","x-salary-range":null,"x-skills-required":["strong engineering track record","proven knowledge of fundamentals and programming languages","pursuing a Master's or PhD in Computer Science"],"x-skills-preferred":["experience with ML frameworks (Torch, JAX)","experience with GPU programming (CUDA, Triton)","experience with High-Performance Computing (OpenMPI)"],"datePosted":"2026-03-04T12:25:51.516Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London"}},"employmentType":"INTERN","occupationalCategory":"Engineering","industry":"Technology","skills":"strong 
engineering track record, proven knowledge of fundamentals and programming languages, pursuing a Master's or PhD in Computer Science, experience with ML frameworks (Torch, JAX), experience with GPU programming (CUDA, Triton), experience with High-Performance Computing (OpenMPI)"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_7917d1eb-6e2"},"title":"Engineering Manager - Inference","description":"<p>We are looking for an Inference Engineering Manager to lead our AI Inference team. This is a unique opportunity to build and scale the infrastructure that powers Perplexity&#39;s products and APIs, serving millions of users with state-of-the-art AI capabilities.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<p>You will own the technical direction and execution of our inference systems while building and leading a world-class team of inference engineers. Our current stack includes Python, PyTorch, Rust, C++, and Kubernetes.</p>\n<ul>\n<li>Lead and grow a high-performing team of AI inference engineers</li>\n<li>Develop APIs for AI inference used by both internal and external customers</li>\n<li>Architect and scale our inference infrastructure for reliability and efficiency</li>\n</ul>\n<p><strong>What you need</strong></p>\n<ul>\n<li>5+ years of engineering experience with 2+ years in a technical leadership or management role</li>\n<li>Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)</li>\n<li>Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers</li>\n</ul>","url":"https://yubhub.co/jobs/job_7917d1eb-6e2","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Perplexity","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/perplexity.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/perplexity/2a87ccbf-82ef-4fc7-b1ed-4dd18b11baf9","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$300K - $405K","x-skills-required":["ML systems","inference frameworks","LLM architecture"],"x-skills-preferred":["CUDA","Triton","custom kernel development"],"datePosted":"2026-03-04T12:24:50.159Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"ML systems, inference frameworks, LLM architecture, CUDA, Triton, custom kernel development","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":300000,"maxValue":405000,"unitText":"YEAR"}}}]}