<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>9af8d812-df8</externalid>
      <Title>AI Infrastructure Engineer</Title>
      <Description><![CDATA[<p>We&#39;re looking for Senior+ AI Infrastructure Engineers to build the systems that train and serve Intercom&#39;s next generation of AI products.</p>
<p>As a Senior AI Infrastructure Engineer focused on model training and inference, you will:</p>
<ul>
<li>Implement and scale training pipelines for large transformer and LLM models, from data ingestion and preprocessing through distributed training and evaluation.</li>
<li>Build and optimize inference services that deliver low-latency, high-reliability experiences for our customers, including autoscaling, routing, and fallbacks.</li>
<li>Work on GPU-level performance: tuning kernels, improving utilization, and identifying bottlenecks across our training and inference stack.</li>
<li>Collaborate closely with ML scientists to implement cutting-edge training and inference methods and bring them to production.</li>
<li>Play an active role in hiring, mentoring, and developing other engineers on the team.</li>
<li>Raise the bar for technical standards, reliability, and operational excellence across Intercom’s AI platform.</li>
</ul>
<p>We’re looking to hire Senior+ AI Infrastructure Engineers. You’re likely a great fit if:</p>
<ul>
<li>You have 5+ years of experience in software engineering, with a strong track record of shipping high-quality products or platforms.</li>
<li>You hold a degree in Computer Science, Computer Engineering, or a related field (or you have equivalent experience with very strong fundamentals).</li>
<li>You have hands-on experience with one or more of the following:
<ul>
<li>Model training (especially transformers and LLMs).</li>
<li>Model inference at scale (again, especially transformers and LLMs).</li>
<li>Low-level GPU work, such as writing CUDA or Triton kernels.</li>
</ul>
</li>
<li>You are comfortable working in production environments at meaningful scale (traffic, data, or organizational).</li>
<li>You communicate clearly, can explain complex technical topics to different audiences, and enjoy close collaboration with both engineers and non-engineers.</li>
<li>You take pride in strong technical fundamentals, love learning, and are willing to invest in your own development.</li>
<li>You have deep knowledge of at least one programming language (for example Python, Ruby, Java, or Go). Specific language experience is less important than your ability to write clean, reliable code and learn new stacks quickly.</li>
</ul>
<p>We are a well-treated bunch, with awesome benefits! If there’s something important to you that’s not on this list, talk to us!</p>
<ul>
<li>Competitive salary, annual bonus, and equity.</li>
<li>Regular compensation reviews - we reward great work!</li>
<li>Unlimited access to Claude Code and best-in-class AI tools; experimentation &amp; building is encouraged &amp; celebrated.</li>
<li>Generous paid time off above the statutory minimum.</li>
<li>Hybrid working.</li>
<li>MacBooks are our standard, but we also offer Windows for certain roles when needed.</li>
<li>Fun events for employees, friends, and family!</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>model training, model inference, low-level GPU work, CUDA, Triton, Python, Ruby, Java, Go, experience at AI native companies, running training or inference workloads on Kubernetes, AWS, cloud providers, production experience with Python in ML or infrastructure contexts</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Intercom</Employername>
      <Employerlogo>https://logos.yubhub.co/intercom.com.png</Employerlogo>
      <Employerdescription>Intercom is an AI company that builds customer service solutions. It was founded in 2011 and serves nearly 30,000 global businesses.</Employerdescription>
      <Employerwebsite>https://www.intercom.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/intercom/jobs/7824142</Applyto>
      <Location>Berlin, Germany</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>cba88898-896</externalid>
      <Title>Research Engineer, Infrastructure, Kernels</Title>
      <Description><![CDATA[<p>We&#39;re looking for an infrastructure research engineer to design, optimize, and maintain the compute foundations that power large-scale language model training. You will develop high-performance ML kernels (e.g., CUDA, CuTe, Triton), enable efficient low-precision arithmetic, and improve the distributed compute stack that makes training large models possible.</p>
<p>This role is perfect for an engineer who enjoys working close to the metal and across the research boundary. You&#39;ll collaborate with researchers and systems architects to bridge algorithmic design with hardware efficiency. You&#39;ll prototype new kernel implementations, profile performance across hardware generations, and help define the numerical and parallelism strategies that determine how we scale next-generation AI systems.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Design and implement custom ML kernels (e.g., CUDA, CuTe, Triton) for core LLM operations such as attention, matrix multiplication, gating, and normalization, optimized for modern GPU and accelerator architectures.</li>
<li>Design and think through compute primitives to reduce memory bandwidth bottlenecks and improve kernel compute efficiency.</li>
<li>Collaborate with research teams to align kernel-level optimizations with model architecture and algorithmic goals.</li>
<li>Develop and maintain a library of reusable kernels and performance benchmarks that serve as the foundation for internal model training.</li>
<li>Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources.</li>
<li>Document and share insights through internal talks, technical papers, or open-source contributions to strengthen the broader ML systems community.</li>
</ul>
<p><strong>Skills and Qualifications</strong></p>
<p>Minimum qualifications:</p>
<ul>
<li>Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar.</li>
<li>Strong engineering skills, with the ability to contribute performant, maintainable code and debug in complex codebases.</li>
<li>Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.</li>
<li>Thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts.</li>
<li>A bias for action: you take the initiative to work across different stacks and teams when you spot an opportunity to make sure something ships.</li>
<li>Proficiency in CUDA, CuTe, Triton, or other GPU programming frameworks.</li>
<li>Demonstrated ability to analyze, profile, and optimize compute-intensive workloads.</li>
</ul>
<p>Preferred qualifications:</p>
<ul>
<li>Experience training or supporting large-scale language models with tens of billions of parameters or more.</li>
<li>Track record of improving research productivity through infrastructure design or process improvements.</li>
<li>Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators.</li>
<li>Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks.</li>
<li>Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM).</li>
<li>Contributions to open-source GPU, ML systems, or compiler optimization projects.</li>
<li>Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI infrastructure.</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$350,000 - $475,000 USD</Salaryrange>
      <Skills>CUDA, CuTe, Triton, GPU programming frameworks, Deep learning frameworks (e.g., PyTorch, JAX), Computer science, Electrical engineering, Statistics, Machine learning, Physics, Robotics, Experience training or supporting large-scale language models with tens of billions of parameters or more, Track record of improving research productivity through infrastructure design or process improvements, Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators, Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks, Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM), Contributions to open-source GPU, ML systems, or compiler optimization projects, Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI infrastructure</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachines.ai.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is an AI research and product company whose team previously created widely used AI products, including ChatGPT and Character.ai, and open-source projects like PyTorch.</Employerdescription>
      <Employerwebsite>https://thinkingmachines.ai/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5013934008</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>c9ab5cbc-dd6</externalid>
      <Title>Research Engineer, Performance RL</Title>
      <Description><![CDATA[<p>We&#39;re hiring a Research Engineer to join our Code RL team within the RL organization. As a Research Engineer, you&#39;ll advance our models&#39; ability to safely write correct, fast code for accelerators.</p>
<p>You&#39;ll need to know accelerator performance well to turn it into tasks and signals models can learn from. Specifically, you will:</p>
<ul>
<li>Invent, design and implement RL environments and evaluations.</li>
<li>Conduct experiments and shape our research roadmap.</li>
<li>Deliver your work into training runs.</li>
<li>Collaborate with other researchers, engineers, and performance engineering specialists across and outside Anthropic.</li>
</ul>
<p>We&#39;re looking for someone with expertise in accelerators (CUDA, ROCm, Triton, Pallas) and ML framework programming (JAX or PyTorch), and experience balancing research exploration with engineering implementation.</p>
<p>Strong candidates may also have experience with reinforcement learning, experience porting ML workloads between different types of accelerators, and familiarity with LLM training methodologies.</p>
<p>The annual compensation range for this role is $350,000-$850,000 USD.</p>
<p>Please note that we&#39;re an extremely collaborative group, and we value communication skills. The easiest way to understand our research directions is to read our recent research.</p>
<p>We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$350,000-$850,000 USD</Salaryrange>
      <Skills>accelerator performance, ML framework programming, reinforcement learning, RL environments and evaluations, experiments and research roadmap, training runs, collaboration with researchers and engineers, CUDA, ROCm, Triton, Pallas, JAX, PyTorch, LLM training methodologies</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that focuses on creating reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5160330008</Applyto>
      <Location>San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>2bc6ae79-8ee</externalid>
      <Title>Staff Technical Lead for Inference &amp; ML Performance</Title>
      <Description><![CDATA[<p>We&#39;re looking for a Staff Technical Lead for Inference &amp; ML Performance to guide a team in building and optimizing state-of-the-art inference systems. This role is intense yet deeply impactful.</p>
<p>You&#39;ll shape the future of fal&#39;s inference engine and ensure our generative models achieve best-in-class performance. Your work directly impacts our ability to rapidly deliver cutting-edge creative solutions to users, from individual creators to global brands.</p>
<p>Day-to-day, you&#39;ll set technical direction, guide your team to build high-performance inference solutions, and personally contribute to critical inference performance enhancements and optimizations. You&#39;ll collaborate closely with research &amp; applied ML teams, influence model inference strategies and deployment techniques, and drive advanced performance optimizations.</p>
<p>As a leader, you&#39;ll mentor, coach, and scale your team of performance-focused engineers, and help them innovate, solve complex performance challenges, and level up their skills.</p>
<p>To succeed in this role, you&#39;ll need to be deeply experienced in ML performance optimization, understand the full ML performance stack, and know inference inside-out. You&#39;ll also need to thrive in cross-functional collaboration and have excellent leadership skills.</p>
<p>If you&#39;re ready to lead the future of inference performance at a fast-paced, high-growth frontier, apply now!</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>ML performance optimization, PyTorch, TensorRT, TransformerEngine, Triton, CUTLASS kernels, Quantization, Kernel authoring, Compilation, Model parallelism, Distributed serving, Profiling</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>fal</Employername>
      <Employerlogo>https://logos.yubhub.co/fal.com.png</Employerlogo>
      <Employerdescription>fal is a fast-growing company pioneering the next generation of generative-media infrastructure.</Employerdescription>
      <Employerwebsite>https://fal.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/fal/jobs/4012780009</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>1507524b-770</externalid>
      <Title>Research Engineer, Performance RL</Title>
      <Description><![CDATA[<p>We&#39;re hiring a Research Engineer to join our Code RL team within the RL organization. As a Research Engineer, you&#39;ll advance our models&#39; ability to safely write correct, fast code for accelerators.</p>
<p>You&#39;ll need to know accelerator performance well to turn it into tasks and signals models can learn from. Specifically, you will:</p>
<ul>
<li>Invent, design and implement RL environments and evaluations.</li>
<li>Conduct experiments and shape our research roadmap.</li>
<li>Deliver your work into training runs.</li>
<li>Collaborate with other researchers, engineers, and performance engineering specialists across and outside Anthropic.</li>
</ul>
<p>You may be a good fit if you:</p>
<ul>
<li>Have expertise with accelerators (CUDA, ROCm, Triton, Pallas), ML framework programming (JAX or PyTorch).</li>
<li>Have worked across the stack – kernels, model code, distributed systems.</li>
<li>Know how to balance research exploration with engineering implementation.</li>
<li>Are passionate about AI&#39;s potential and committed to developing safe and beneficial systems.</li>
</ul>
<p>Strong candidates may also have:</p>
<ul>
<li>Experience with reinforcement learning.</li>
<li>Experience porting ML workloads between different types of accelerators.</li>
<li>Familiarity with LLM training methodologies.</li>
</ul>
<p>The annual compensation range for this role is $350,000-$850,000 USD.</p>
<p>We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>
<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact, advancing our long-term goals of steerable, trustworthy AI, rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science.</p>
<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$350,000-$850,000 USD</Salaryrange>
      <Skills>accelerators, ML framework programming, distributed systems, reinforcement learning, LLM training methodologies, CUDA, ROCm, Triton, Pallas, JAX, PyTorch</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5160330008</Applyto>
      <Location>San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>28107212-128</externalid>
      <Title>Performance Engineer, GPU</Title>
      <Description><![CDATA[<p>As a GPU Performance Engineer at Anthropic, you will be responsible for architecting and implementing the foundational systems that power Claude and push the frontiers of what&#39;s possible with large language models. You will maximize GPU utilization and performance at unprecedented scale, develop cutting-edge optimizations that directly enable new model capabilities, and dramatically improve inference efficiency.</p>
<p>Working at the intersection of hardware and software, you will implement state-of-the-art techniques from custom kernel development to distributed system architectures. Your work will span the entire stack, from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization.</p>
<p>Strong candidates will have a track record of delivering transformative GPU performance improvements in production ML systems and will be excited to shape the future of AI infrastructure alongside world-class researchers and engineers.</p>
<p>Responsibilities:</p>
<ul>
<li>Architect and implement foundational systems that power Claude</li>
<li>Maximize GPU utilization and performance at unprecedented scale</li>
<li>Develop cutting-edge optimizations that directly enable new model capabilities</li>
<li>Dramatically improve inference efficiency</li>
<li>Implement state-of-the-art techniques from custom kernel development to distributed system architectures</li>
<li>Work at the intersection of hardware and software</li>
<li>Span the entire stack, from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization</li>
</ul>
<p>Requirements:</p>
<ul>
<li>Deep experience with GPU programming and optimization at scale</li>
<li>Impact-driven, passionate about delivering measurable performance breakthroughs</li>
<li>Ability to navigate complex systems from hardware interfaces to high-level ML frameworks</li>
<li>Enjoy collaborative problem-solving and pair programming</li>
<li>Want to work on state-of-the-art language models with real-world impact</li>
<li>Care about the societal impacts of your work</li>
<li>Thrive in ambiguous environments where you define the path forward</li>
</ul>
<p>Nice to have:</p>
<ul>
<li>Experience with GPU Kernel Development: CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization</li>
<li>ML Compilers &amp; Frameworks: PyTorch/JAX internals, torch.compile, XLA, custom operators</li>
<li>Performance Engineering: Kernel fusion, memory bandwidth optimization, profiling with Nsight</li>
<li>Distributed Systems: NCCL, NVLink, collective communication, model parallelism</li>
<li>Low-Precision: INT8/FP8 quantization, mixed-precision techniques</li>
<li>Production Systems: Large-scale training infrastructure, fault tolerance, cluster orchestration</li>
</ul>
<p>Representative projects:</p>
<ul>
<li>Co-design attention mechanisms and algorithms for next-generation hardware architectures</li>
<li>Develop custom kernels for emerging quantization formats and mixed-precision techniques</li>
<li>Design distributed communication strategies for multi-node GPU clusters</li>
<li>Optimize end-to-end training and inference pipelines for frontier language models</li>
<li>Build performance modeling frameworks to predict and optimize GPU utilization</li>
<li>Implement kernel fusion strategies to minimize memory bandwidth bottlenecks</li>
<li>Create resilient systems for planet-scale distributed training infrastructure</li>
<li>Profile and eliminate performance bottlenecks in production serving infrastructure</li>
<li>Partner with hardware vendors to influence future accelerator capabilities and software stacks</li>
</ul>
<p>Note: The salary range for this position is $280,000-$850,000 USD per year.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$280,000-$850,000 USD per year</Salaryrange>
      <Skills>GPU programming, optimization at scale, CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization, PyTorch/JAX internals, torch.compile, XLA, custom operators, kernel fusion, memory bandwidth optimization, profiling with Nsight, NCCL, NVLink, collective communication, model parallelism, INT8/FP8 quantization, mixed-precision techniques, large-scale training infrastructure, fault tolerance, cluster orchestration</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4926227008</Applyto>
      <Location>San Francisco, CA | New York City, NY | Seattle, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>586b9fef-509</externalid>
      <Title>Senior Software Engineer - Network Enablement (Applied ML)</Title>
      <Description><![CDATA[<p>We believe that the way people interact with their finances will drastically improve in the next few years. We&#39;re dedicated to empowering this transformation by building the tools and experiences that thousands of developers use to create their own products.</p>
<p>On this team, you will build and operate the ML infrastructure and product services that enable trust and intelligence across Plaid&#39;s network. You&#39;ll own feature engineering, offline training and batch scoring, online feature serving, and real-time inference so model outputs directly power partner-facing fraud &amp; trust products and bank intelligence features.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Embed model inference into Network Enablement product flows and decision logic (APIs, feature flags, backend flows).</li>
<li>Define and instrument product + ML success metrics (fraud reduction, retention lift, false positives, downstream impact).</li>
<li>Design and run experiments and rollout plans (backtesting, shadow scoring, A/B tests, feature-flagged releases) to validate product hypotheses.</li>
<li>Build and operate offline training pipelines and production batch scoring for bank intelligence products.</li>
<li>Ship and maintain online feature serving and low-latency model inference endpoints for real-time partner/bank scoring.</li>
<li>Implement model CI/CD, model/version registry, and safe rollout/rollback strategies.</li>
<li>Monitor model/data health: drift/regression detection, model-quality dashboards, alerts, and SLOs targeted to partner product needs.</li>
<li>Ensure offline and online parity, data lineage, and automated validation / data contracts to reduce regressions.</li>
<li>Optimize inference performance and cost for real-time scoring (batching, caching, runtime selection).</li>
<li>Ensure fairness, explainability and PII-aware handling for partner-facing ML features; maintain auditability for compliance.</li>
<li>Partner with platform and cross-functional teams to scale the ML/data foundation (graph features, sequence embeddings, unified pipelines).</li>
<li>Mentor engineers and document team standards for ML productization and operations.</li>
</ul>
<p><strong>Qualifications</strong></p>
<p>Must-haves:</p>
<ul>
<li>Strong software engineering skills including systems design, APIs, and building reliable backend services (Go or Python preferred).</li>
<li>Production experience with batch and streaming data pipelines and orchestration tools such as Airflow or Spark.</li>
<li>Experience building or operating real-time scoring and online feature-serving systems, including feature stores and low-latency model inference.</li>
<li>Experience integrating model outputs into product flows (APIs, feature flags) and measuring impact through experiments and product metrics.</li>
<li>Experience with model lifecycle and operations: model registries, CI/CD for models, reproducible training, offline &amp; online parity, monitoring and incident response.</li>
</ul>
<p>Nice to have:</p>
<ul>
<li>Experience in fraud, risk, or marketing intelligence domains.</li>
<li>Experience with feature-store products (Tecton / Chronon / Feast / internal) and unified pipelines.</li>
<li>Experience with graph frameworks, graph feature engineering, or sequence embeddings.</li>
<li>Experience optimizing inference at scale (Triton/ONNX/quantization, batching, caching).</li>
</ul>
<p><strong>Additional Information</strong></p>
<p>Our mission at Plaid is to unlock financial freedom for everyone. To support that mission, we seek to build a diverse team of driven individuals who care deeply about making the financial ecosystem more equitable.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$190,800-$286,800 per year</Salaryrange>
      <Skills>software engineering, systems design, APIs, backend services, Go, Python, batch and streaming data pipelines, orchestration tools, Airflow, Spark, real-time scoring, online feature-serving systems, feature stores, low-latency model inference, model outputs, product flows, experiments, product metrics, model lifecycle, operations, model registries, CI/CD, reproducible training, offline &amp; online parity, monitoring, incident response, fraud, risk, marketing intelligence, feature-store products, unified pipelines, graph frameworks, graph feature engineering, sequence embeddings, inference at scale, Triton, ONNX, quantization, batching, caching</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Plaid</Employername>
      <Employerlogo>https://logos.yubhub.co/plaid.com.png</Employerlogo>
      <Employerdescription>Plaid is a technology company that powers the tools millions of people rely on to live a healthier financial life. The company has a presence in multiple countries and works with thousands of companies.</Employerdescription>
      <Employerwebsite>https://plaid.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.lever.co/plaid/43b1374d-5c5e-4b63-b710-a95e3cb76bbe</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-17</Postedate>
    </job>
    <job>
      <externalid>5c28c97d-fc5</externalid>
      <Title>Member of Technical Staff - Image / Video Generation</Title>
      <Description><![CDATA[<p><strong>Job Title</strong></p>
<p>Member of Technical Staff - Image / Video Generation</p>
<p><strong>Job Description</strong></p>
<p>We&#39;re the team behind Latent Diffusion, Stable Diffusion, and FLUX, foundational technologies that changed how the world creates images and video. We&#39;re creating the generative models that power how people make images and video, tools used by millions of creators, developers, and businesses worldwide. Our FLUX models are among the most advanced in the world, and we’re just getting started.</p>
<p><strong>Why This Role</strong></p>
<p>You&#39;ll train large-scale diffusion models for image and video generation, exploring new approaches while maintaining the rigor that helps us distinguish meaningful progress from incremental tweaks. This isn&#39;t about following established recipes; it&#39;s about running the experiments that clarify which architectural choices matter and which are less impactful.</p>
<p><strong>What You’ll Work On</strong></p>
<ul>
<li>Train large-scale diffusion transformer models for image and video data, working at the scale where intuitions break and empirical evidence matters</li>
<li>Rigorously ablate design choices, running experiments that isolate variables, control for confounds, and produce insights you can actually trust, then communicate those results to shape our research direction</li>
<li>Reason about the speed-quality tradeoffs of neural network architectures in production settings where both constraints matter simultaneously</li>
<li>Fine-tune diffusion models for specialized applications like image and video upscalers, inpainting/outpainting models, and other tasks where general-purpose models aren&#39;t enough</li>
</ul>
<p><strong>What We’re Looking For</strong></p>
<ul>
<li>You&#39;ve trained large-scale diffusion models and developed strong intuitions about what matters. You know that at research scale, every design choice has tradeoffs, and the only way to know which ones are worth making is through careful ablation. You&#39;re comfortable debugging distributed training issues and presenting research findings to the team.</li>
</ul>
<p><strong>Required Skills</strong></p>
<ul>
<li>Hands-on experience training large-scale diffusion models for image and video data, with practical knowledge of common failure modes and what matters most in training</li>
<li>Experience fine-tuning diffusion models for specialized applications such as upscalers, inpainting, outpainting, or other tasks where understanding the domain matters as much as understanding the architecture</li>
<li>Deep understanding of how to effectively evaluate image and video generative models, knowing which metrics correlate with quality and which are just convenient proxies</li>
<li>Strong proficiency in PyTorch, transformer architectures, and the full ecosystem of modern deep learning</li>
<li>Solid understanding of distributed training techniques (FSDP, low-precision training, model parallelism), because our models don&#39;t fit on one GPU and training decisions impact research outcomes</li>
</ul>
<p><strong>Preferred Skills</strong></p>
<ul>
<li>Experience writing forward and backward Triton kernels and ensuring their correctness while considering floating point errors</li>
<li>Proficiency with profiling, debugging, and optimizing single and multi-GPU operations using tools like Nsight or stack trace viewers</li>
<li>Know the performance characteristics of different architectural choices at scale</li>
<li>Have published research that contributed to how people think about generative models</li>
</ul>
<p><strong>How We Work Together</strong></p>
<p>We’re a distributed team with real offices that people actually use. Depending on your role, you’ll either join us in Freiburg or SF at least 2 days a week (or one full week every other week), or work remotely with a monthly in-person week to stay connected. We’ll cover reasonable travel costs to make this possible. We think in-person time matters, and we’ve structured things to make it accessible to all. We’ll discuss what this will look like for the role during our interview process.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>large-scale diffusion models, image and video data, PyTorch, transformer architectures, distributed training techniques, writing forward and backward Triton kernels, profiling, debugging, and optimizing single and multi-GPU operations, published research on generative models</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Black Forest Labs</Employername>
      <Employerlogo>https://logos.yubhub.co/blackforestlabs.com.png</Employerlogo>
      <Employerdescription>Black Forest Labs is a research lab developing foundational technologies for image and video generation. It is headquartered in Freiburg, Germany, with a growing presence in San Francisco.</Employerdescription>
      <Employerwebsite>https://www.blackforestlabs.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/blackforestlabs/jobs/4132217008</Applyto>
      <Location>Freiburg (Germany)</Location>
      <Country></Country>
      <Postedate>2026-04-17</Postedate>
    </job>
    <job>
      <externalid>3c4831ed-fa8</externalid>
      <Title>Technical Product Manager</Title>
      <Description><![CDATA[<p>We&#39;re hiring a Technical Product Manager to define and execute Alluxio&#39;s AI systems strategy , spanning inference, training, and emerging agentic workloads. This role bridges the worlds of AI infrastructure and distributed data systems, guiding how Alluxio evolves to serve next-generation model architectures and large-scale data flows.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Define the long-term vision and roadmap for Alluxio&#39;s AI data platform, covering inference, training, and agentic workloads.</li>
<li>Collaborate with engineering to design features that deliver high-throughput, low-latency data access (e.g., GPU-aware caching, streaming reads, tiered prefetching).</li>
<li>Ensure seamless integration with frameworks like PyTorch, TensorFlow, Ray, and Triton; evolve Alluxio&#39;s APIs for AI-native workloads.</li>
<li>Engage directly with enterprise AI teams to understand workload patterns, validate impact, and prioritize roadmap direction.</li>
<li>Stay ahead of trends in multi-model serving, retrieval-augmented generation (RAG), and agentic orchestration; translate them into actionable product plans.</li>
</ul>
<p><strong>Requirements</strong></p>
<ul>
<li>5–9 years of experience in product management or technical leadership within AI infrastructure, ML platforms, or distributed systems.</li>
<li>Strong understanding of AI/ML workflows, from model training and deployment to inference and data-access pipelines.</li>
<li>Proven track record of delivering infrastructure features that improve latency, GPU utilization, or total cost of ownership.</li>
<li>Technical fluency with distributed systems, caching, and cloud orchestration (Kubernetes, AWS/GCP/Azure).</li>
<li>Familiarity with AI frameworks such as PyTorch, TensorFlow, Triton, Ray, or LangChain.</li>
<li>Exceptional communication and strategic thinking, with the ability to translate complex systems work into clear, prioritized roadmaps.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Shape how the world&#39;s most advanced AI systems access and process data.</li>
<li>Work at the intersection of distributed systems, AI acceleration, and open source.</li>
<li>Collaborate with world-class engineers, researchers, and customers driving the AI frontier.</li>
<li>Competitive compensation and equity package with comprehensive benefits.</li>
<li>A culture built on curiosity, empathy, and deep technical rigor.</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Distributed systems, Caching, Cloud orchestration, Kubernetes, AWS/GCP/Azure, PyTorch, TensorFlow, Triton, Ray, LangChain</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Alluxio</Employername>
      <Employerlogo>https://logos.yubhub.co/alluxio.io.png</Employerlogo>
      <Employerdescription>Alluxio powers the data layer for modern AI and analytics, with proven production at eight of the top ten internet companies and seven of the ten highest-valued enterprises globally.</Employerdescription>
      <Employerwebsite>https://alluxio.io</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.lever.co/alluxio/e7e0f8a4-83ed-416b-9f95-7f3ed8abfa52</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-17</Postedate>
    </job>
    <job>
      <externalid>ce88828f-470</externalid>
      <Title>Solutions Architect, AI and ML</Title>
      <Description><![CDATA[<p>We are building the world&#39;s leading AI company and are looking for an experienced Cloud Solution Architect to help assist customers with adoption of GPU hardware and Software, as well as building and deploying Machine Learning (ML), Deep Learning (DL), data analytics solutions on various Cloud Computing Platforms.</p>
<p>As part of the Solutions Architecture team, we work with some of the most exciting computing hardware and software technologies including the latest breakthroughs in machine learning and data science. A Solutions Architect is the first line of technical expertise between NVIDIA and our customers so you will engage directly with developers, researchers, and data scientists with some of NVIDIA&#39;s most strategic technology customers as well as work directly with business and engineering teams on product strategy.</p>
<p><strong>What you will be doing:</strong></p>
<ul>
<li>Work with Cloud Service Providers to develop and demonstrate solutions based on NVIDIA&#39;s ML/DL and data science software and hardware technologies</li>
<li>Build and deploy AI/ML solutions at scale using NVIDIA&#39;s AI software on cloud-based GPU platforms.</li>
<li>Build custom PoCs for solutions that address customers&#39; critical business needs, applying NVIDIA hardware and software technology</li>
<li>Partner with Sales Account Managers or Developer Relations Managers to identify and secure new business opportunities for NVIDIA products and solutions for ML/DL and other software solutions</li>
<li>Prepare and deliver technical content to customers including presentations about purpose-built solutions, workshops about NVIDIA products and solutions, etc.</li>
<li>Conduct regular technical customer meetings for project/product roadmap, feature discussions, and intro to new technologies. Establish close technical ties to the customer to facilitate rapid resolution of customer issues</li>
</ul>
<p><strong>What we need to see:</strong></p>
<ul>
<li>3+ years of Solutions Engineering (or similar Sales Engineering roles) or equivalent experience</li>
<li>3+ years of work-related experience in Deep Learning and Machine Learning, including deep learning frameworks such as TensorFlow or PyTorch; GPU and CUDA experience is extremely helpful.</li>
<li>BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Statistics, Physics, or other Engineering fields or equivalent experience.</li>
<li>Established track record of deploying solutions in cloud computing environments including AWS, GCP, or Azure</li>
<li>Knowledge of DevOps/MLOps technologies such as Docker/containers, Kubernetes, data center deployments</li>
<li>Ability to use at least one scripting language (e.g., Python)</li>
<li>Good programming and debugging skills</li>
<li>Ability to communicate your ideas/code clearly through documents, presentations, etc.</li>
</ul>
<p><strong>Ways to stand out from the crowd:</strong></p>
<ul>
<li>AWS, GCP or Azure Professional Solution Architect Certification.</li>
<li>Hands-on experience with NVIDIA GPUs and SDKs (e.g., CUDA, RAPIDS, Triton, etc.)</li>
<li>System-level experience, specifically with GPU-based systems</li>
<li>Experience with Deep Learning at scale</li>
<li>Familiarity with parallel programming and distributed computing platforms</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Solutions Engineering, Deep Learning and Machine Learning, TensorFlow or PyTorch, GPU and CUDA experience, BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Statistics, Physics, or other Engineering fields, DevOps/ML Ops technologies, Docker/containers, Kubernetes, data center deployments, Scripting language (i.e., Python), Good programming and debugging skills, Ability to communicate your ideas/code clearly through documents, presentation etc., AWS, GCP or Azure Professional Solution Architect Certification, Hands-on experience with NVIDIA GPUs and SDKs (e.g. CUDA, RAPIDS, Triton etc.), System-level experience specifically GPU-based systems, Experience with Deep Learning at scale, Familiarity with parallel programming and distributed computing platforms</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>NVIDIA</Employername>
      <Employerlogo>https://logos.yubhub.co/nvidia.com.png</Employerlogo>
      <Employerdescription>NVIDIA is a leading technology company that specialises in designing and manufacturing graphics processing units (GPUs) and high-performance computing hardware.</Employerdescription>
      <Employerwebsite>https://nvidia.wd5.myworkdayjobs.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-WA-Redmond/Solutions-Architect--AI-and-ML_JR2000691</Applyto>
      <Location>Redmond, Santa Clara, Seattle</Location>
      <Country></Country>
      <Postedate>2026-03-09</Postedate>
    </job>
    <job>
      <externalid>f8883394-0fc</externalid>
      <Title>Solutions Architect, AI and ML</Title>
      <Description><![CDATA[<p>We are looking for an experienced Cloud Solution Architect to help assist customers with adoption of GPU hardware and Software, as well as building and deploying Machine Learning (ML) , Deep Learning (DL), data analytics solutions on various Cloud Computing Platforms.</p>
<p>As a Solutions Architect, you will engage directly with developers, researchers, and data scientists with some of NVIDIA’s most strategic technology customers as well as work directly with business and engineering teams on product strategy.</p>
<p><strong>Key Responsibilities:</strong></p>
<ul>
<li>Help cloud customers craft, deploy, and maintain scalable, GPU-accelerated inference pipelines on cloud ML services and Kubernetes for large language models (LLMs) and generative AI workloads.</li>
<li>Enhance performance tuning using TensorRT/TensorRT-LLM, vLLM, Dynamo, and Triton Inference Server to improve GPU utilization and model efficiency.</li>
<li>Collaborate with multi-functional teams (engineering, product) and offer technical mentorship to cloud customers implementing AI inference at scale.</li>
<li>Build custom PoCs for solutions that address customers’ critical business needs, applying NVIDIA hardware and software technology</li>
<li>Partner with Sales Account Managers or Developer Relations Managers to identify and secure new business opportunities for NVIDIA products and solutions for ML/DL and other software solutions</li>
<li>Prepare and deliver technical content to customers including presentations about purpose-built solutions, workshops about NVIDIA products and solutions, etc.</li>
<li>Conduct regular technical customer meetings for project/product roadmap, feature discussions, and intro to new technologies. Establish close technical ties to the customer to facilitate rapid resolution of customer issues</li>
</ul>
<p><strong>Requirements:</strong></p>
<ul>
<li>BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Statistics, Physics, or other Engineering fields or equivalent experience.</li>
<li>3+ Years in Solutions Architecture with a proven track record of moving AI inference from POC to production in cloud computing environments including AWS, GCP, or Azure</li>
<li>3+ years of hands-on experience with Deep Learning frameworks such as PyTorch and TensorFlow</li>
<li>Excellent knowledge of the theory and practice of LLM and DL inference</li>
<li>Strong fundamentals in programming, optimizations, and software design, especially in Python</li>
<li>Experience with containerization and orchestration technologies like Docker and Kubernetes, monitoring, and observability solutions for AI deployments</li>
<li>Knowledge of Inference technologies - NVIDIA NIM, TensorRT-LLM, Dynamo, Triton Inference Server, vLLM, etc</li>
<li>Proficiency in problem-solving and debugging skills in GPU environments</li>
<li>Excellent presentation, communication and collaboration skills</li>
</ul>
<p><strong>Nice to Have:</strong></p>
<ul>
<li>AWS, GCP or Azure Professional Solution Architect Certification.</li>
<li>Experience optimizing and deploying large MoE LLMs at scale</li>
<li>Active contributions to open-source AI inference projects (e.g., vLLM, TensorRT-LLM, Dynamo, SGLang, Triton, or similar)</li>
<li>Experience with Multi-GPU Multi-node Inference technologies like Tensor Parallelism/Expert Parallelism, Disaggregated Serving, LWS, MPI, EFA/Infiniband, NVLink/PCIe, etc</li>
<li>Experience in developing and integrating monitoring and alerting solutions using Prometheus, Grafana, and NVIDIA DCGM and GPU performance Analysis and tools like NVIDIA Nsight Systems</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Cloud Solution Architecture, GPU hardware and Software, Machine Learning (ML), Deep Learning (DL), Data Analytics, Cloud Computing Platforms, Kubernetes, TensorRT, TensorRT-LLM, vLLM, Dynamo, Triton Inference Server, Python, Containerization, Orchestration, Monitoring, Observability, Inference technologies, NVIDIA NIM, Problem-solving, Debugging, GPU environments, AWS, GCP, Azure, Professional Solution Architect Certification, Large MoE LLMs, Open-source AI inference projects, Multi-GPU Multi-node Inference technologies, Monitoring and alerting solutions, Prometheus, Grafana, NVIDIA DCGM, GPU performance Analysis, NVIDIA Nsight Systems</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>NVIDIA</Employername>
      <Employerlogo>https://logos.yubhub.co/nvidia.com.png</Employerlogo>
      <Employerdescription>NVIDIA is a leading technology company that specializes in designing and manufacturing graphics processing units (GPUs) and high-performance computing hardware.</Employerdescription>
      <Employerwebsite>https://nvidia.wd5.myworkdayjobs.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-WA-Redmond/Solutions-Architect--AI-and-ML_JR2005988-1</Applyto>
      <Location>Redmond, Santa Clara, Seattle</Location>
      <Country></Country>
      <Postedate>2026-03-09</Postedate>
    </job>
    <job>
      <externalid>11a60d5a-f54</externalid>
      <Title>Performance Engineer, GPU</Title>
      <Description><![CDATA[<p><strong>About the role:</strong></p>
<p>Pioneering the next generation of AI requires breakthrough innovations in GPU performance and systems engineering. As a GPU Performance Engineer, you&#39;ll architect and implement the foundational systems that power Claude and push the frontiers of what&#39;s possible with large language models. You&#39;ll be responsible for maximizing GPU utilization and performance at unprecedented scale, developing cutting-edge optimizations that directly enable new model capabilities and dramatically improve inference efficiency.</p>
<p>Working at the intersection of hardware and software, you&#39;ll implement state-of-the-art techniques from custom kernel development to distributed system architectures. Your work will span the entire stack—from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization.</p>
<p>Strong candidates will have a track record of delivering transformative GPU performance improvements in production ML systems and will be excited to shape the future of AI infrastructure alongside world-class researchers and engineers.</p>
<p><strong>You might be a good fit if you:</strong></p>
<ul>
<li>Have deep experience with GPU programming and optimization at scale</li>
<li>Are impact-driven, passionate about delivering measurable performance breakthroughs</li>
<li>Can navigate complex systems from hardware interfaces to high-level ML frameworks</li>
<li>Enjoy collaborative problem-solving and pair programming</li>
<li>Want to work on state-of-the-art language models with real-world impact</li>
<li>Care about the societal impacts of your work</li>
<li>Thrive in ambiguous environments where you define the path forward</li>
</ul>
<p><strong>Strong candidates may also have experience with:</strong></p>
<ul>
<li>GPU Kernel Development: CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization</li>
<li>ML Compilers &amp; Frameworks: PyTorch/JAX internals, torch.compile, XLA, custom operators</li>
<li>Performance Engineering: Kernel fusion, memory bandwidth optimization, profiling with Nsight</li>
<li>Distributed Systems: NCCL, NVLink, collective communication, model parallelism</li>
<li>Low-Precision: INT8/FP8 quantization, mixed-precision techniques</li>
<li>Production Systems: Large-scale training infrastructure, fault tolerance, cluster orchestration</li>
</ul>
<p><strong>Representative projects:</strong></p>
<ul>
<li>Co-design attention mechanisms and algorithms for next-generation hardware architectures</li>
<li>Develop custom kernels for emerging quantization formats and mixed-precision techniques</li>
<li>Design distributed communication strategies for multi-node GPU clusters</li>
<li>Optimize end-to-end training and inference pipelines for frontier language models</li>
<li>Build performance modeling frameworks to predict and optimize GPU utilization</li>
<li>Implement kernel fusion strategies to minimize memory bandwidth bottlenecks</li>
<li>Create resilient systems for planet-scale distributed training infrastructure</li>
<li>Profile and eliminate performance bottlenecks in production serving infrastructure</li>
<li>Partner with hardware vendors to influence future accelerator capabilities and software stacks</li>
</ul>
<p><strong>Deadline to apply:</strong> None. Applications will be reviewed on a rolling basis.</p>
<p>The expected salary range for this position is:</p>
<p>Annual Salary: $280,000 - $850,000 USD</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$280,000 - $850,000 USD</Salaryrange>
      <Skills>GPU programming, optimization at scale, custom kernel development, distributed system architectures, low-level tensor core optimizations, orchestrating thousands of GPUs, GPU kernel development, CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization, ML compilers &amp; frameworks, PyTorch/JAX internals, torch.compile, XLA, custom operators, performance engineering, kernel fusion, memory bandwidth optimization, profiling with Nsight, distributed systems, NCCL, NVLink, collective communication, model parallelism, low-precision, INT8/FP8 quantization, mixed-precision techniques, production systems, large-scale training infrastructure, fault tolerance, cluster orchestration</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic&apos;s mission is to create reliable, interpretable, and steerable AI systems. The company is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</Employerdescription>
      <Employerwebsite>https://job-boards.greenhouse.io</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4926227008</Applyto>
      <Location>San Francisco, CA | New York City, NY | Seattle, WA</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>7badeaf5-492</externalid>
      <Title>Hardware / Software CoDesign Engineer</Title>
      <Description><![CDATA[<p><strong>Hardware / Software CoDesign Engineer</strong></p>
<p><strong>Location</strong></p>
<p>San Francisco</p>
<p><strong>Employment Type</strong></p>
<p>Full time</p>
<p><strong>Location Type</strong></p>
<p>Hybrid</p>
<p><strong>Department</strong></p>
<p>Scaling</p>
<p><strong>Compensation</strong></p>
<ul>
<li>$342K – $555K • Offers Equity</li>
</ul>
<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>
<p><strong>Benefits</strong></p>
<ul>
<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>
</ul>
<ul>
<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>
</ul>
<ul>
<li>401(k) retirement plan with employer match</li>
</ul>
<ul>
<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>
</ul>
<ul>
<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>
</ul>
<ul>
<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>
</ul>
<ul>
<li>Mental health and wellness support</li>
</ul>
<ul>
<li>Employer-paid basic life and disability coverage</li>
</ul>
<ul>
<li>Annual learning and development stipend to fuel your professional growth</li>
</ul>
<ul>
<li>Daily meals in our offices, and meal delivery credits as eligible</li>
</ul>
<ul>
<li>Relocation support for eligible employees</li>
</ul>
<ul>
<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>
</ul>
<p><strong>About the Team</strong></p>
<p>OpenAI’s Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI-native silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI’s supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI.</p>
<p><strong>About the Role</strong></p>
<p>As an Engineer on our hardware optimization and co-design team, you will co-design future hardware from different vendors for programmability and performance. You will work with our kernel, compiler, and machine learning engineers to understand their unique needs around ML techniques, algorithms, numerical approximations, programming expressivity, and compiler optimizations. You will champion these constraints with various vendors to influence future hardware architectures toward efficient training and inference on our models. If you are excited about efficiently distributing large language models across devices, optimizing system-wide and rack-wide networking bottlenecks, tailoring the compute pipeline and memory hierarchy of a hardware platform, simulating workloads at different levels of abstraction, and working closely with our partners, this is the perfect opportunity!</p>
<p><strong>In this role, you will:</strong></p>
<ul>
<li>Co-design future hardware for programmability and performance with our hardware vendors</li>
</ul>
<ul>
<li>Assist hardware vendors in developing optimal kernels and add support for them in our compiler</li>
</ul>
<ul>
<li>Develop performance estimates for critical kernels for different hardware configurations and drive decisions on compute core and memory hierarchy features</li>
</ul>
<ul>
<li>Build system performance models at different abstraction levels and carry out analysis to drive decisions on scale-up, scale-out, and front-end networking (see the sketch after this list)</li>
</ul>
<ul>
<li>Work with machine learning engineers, kernel engineers and compiler developers to understand their vision and needs from high performance accelerators</li>
</ul>
<ul>
<li>Manage communication and coordination with internal and external partners</li>
</ul>
<ul>
<li>Influence the roadmaps of hardware partners to optimize their platforms for OpenAI’s workloads.</li>
</ul>
<ul>
<li>Evaluate potential partners’ accelerators and platforms.</li>
</ul>
<ul>
<li>As the scope of the role and team grows, understand and influence roadmaps for hardware partners for our datacenter networks, racks, and buildings.</li>
</ul>
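<p>To make the scale-up versus scale-out trade-off above concrete, here is a first-order model of ring all-reduce time; the link bandwidths, latencies, and gradient size are assumptions chosen only for illustration.</p>
<pre><code># First-order ring all-reduce model, the kind of estimate used to compare
# scale-up vs. scale-out interconnects. All bandwidths and latencies are assumptions.

def ring_allreduce_seconds(message_bytes, n_ranks, link_gb_per_s, step_latency_us):
    """Each rank moves 2*(n-1)/n of the buffer across its link, over 2*(n-1) steps."""
    transfer = 2.0 * (n_ranks - 1) / n_ranks * message_bytes / (link_gb_per_s * 1e9)
    latency = 2.0 * (n_ranks - 1) * step_latency_us * 1e-6
    return transfer + latency

grad_bytes = 2 * 8_000_000_000   # e.g. gradients for 8B parameters in bf16 (illustrative)
configs = [
    ("scale-up domain (assumed 400 GB/s links, 8 ranks)", 8, 400, 3),
    ("scale-out fabric (assumed 50 GB/s NICs, 64 ranks)", 64, 50, 10),
]
for name, ranks, gbps, lat_us in configs:
    t = ring_allreduce_seconds(grad_bytes, ranks, gbps, lat_us)
    print(f"{name}: ~{t * 1e3:.0f} ms per all-reduce")
</code></pre>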
<p><strong>You might thrive in this role if you have:</strong></p>
<ul>
<li>4+ years of industry experience, including experience harnessing compute at scale and optimizing ML platform code to run efficiently on target hardware.</li>
</ul>
<ul>
<li>Strong experience in software/hardware co-design</li>
</ul>
<ul>
<li>Deep understanding of GPU and/or other AI accelerators</li>
</ul>
<ul>
<li>Experience with CUDA, Triton or a related accelerator programming language</li>
</ul>
<ul>
<li>Experience driving Machine Learning accuracy with low precision formats</li>
</ul>
<ul>
<li>Experience with system performance modeling and analysis to optimize ML model deployment</li>
</ul>
<ul>
<li>Strong coding skills in C/C++ and Python</li>
</ul>
<ul>
<li>Familiarity with the fundamentals of deep learning computing and chip architecture/microarchitecture.</li>
</ul>
<p><strong>These attributes are nice to have:</strong></p>
<ul>
<li>PhD in Computer Science and Engineering with a specialization in Computer Architecture, Parallel Computing, Compilers, or other Systems areas</li>
</ul>
<ul>
<li>Strong understanding of LLMs and challenges related to their training and inference</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$342K – $555K • Offers Equity</Salaryrange>
      <Skills>software/hardware co-design, GPU and/or other AI accelerators, CUDA, Triton or a related accelerator programming language, Machine Learning accuracy with low precision formats, system performance modeling and analysis to optimize ML model deployment, C/C++ and Python, PhD in Computer Science and Engineering with a specialization in Computer Architecture, Parallel Computing, Compilers, or other Systems areas, strong understanding of LLMs and challenges related to their training and inference</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>OpenAI</Employername>
      <Employerlogo>https://logos.yubhub.co/openai.com.png</Employerlogo>
      <Employerdescription>OpenAI is a technology company that develops and commercializes advanced artificial intelligence (AI) systems. The company was founded in 2015 and is headquartered in San Francisco, California.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/openai/bdbb2292-ecb3-42dc-ba89-65edf397d8f8</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>989f992b-6b2</externalid>
      <Title>Software Engineer, Inference – AMD GPU Enablement</Title>
      <Description><![CDATA[<p><strong>Software Engineer, Inference – AMD GPU Enablement</strong></p>
<p><strong>Location</strong></p>
<p>San Francisco</p>
<p><strong>Employment Type</strong></p>
<p>Full time</p>
<p><strong>Department</strong></p>
<p>Scaling</p>
<p><strong>Compensation</strong></p>
<ul>
<li>$295K – $555K • Offers Equity</li>
</ul>
<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>
<ul>
<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>
</ul>
<ul>
<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>
</ul>
<ul>
<li>401(k) retirement plan with employer match</li>
</ul>
<ul>
<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>
</ul>
<ul>
<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>
</ul>
<ul>
<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>
</ul>
<ul>
<li>Mental health and wellness support</li>
</ul>
<ul>
<li>Employer-paid basic life and disability coverage</li>
</ul>
<ul>
<li>Annual learning and development stipend to fuel your professional growth</li>
</ul>
<ul>
<li>Daily meals in our offices, and meal delivery credits as eligible</li>
</ul>
<ul>
<li>Relocation support for eligible employees</li>
</ul>
<ul>
<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>
</ul>
<p>More details about our benefits are available to candidates during the hiring process.</p>
<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>
<p><strong>About the Team</strong></p>
<p>Our Inference team brings OpenAI’s most capable research and technology to the world through our products. We empower consumers, enterprises and developers alike to use and access our state-of-the-art AI models, allowing them to do things that they’ve never been able to before. We focus on performant and efficient model inference, as well as accelerating research progress via model inference.</p>
<p><strong>About the Role</strong></p>
<p>We’re hiring engineers to scale and optimize OpenAI’s inference infrastructure across emerging GPU platforms. You’ll work across the stack - from low-level kernel performance to high-level distributed execution - and collaborate closely with research, infra, and performance teams to ensure our largest models run smoothly on new hardware.</p>
<p>This is a high-impact opportunity to shape OpenAI’s multi-platform inference capabilities from the ground up with a particular focus on advancing inference performance on AMD accelerators.</p>
<p><strong>In this role, you will:</strong></p>
<ul>
<li>Own bring-up, correctness and performance of the OpenAI inference stack on AMD hardware.</li>
</ul>
<ul>
<li>Integrate internal model-serving infrastructure (e.g., vLLM, Triton) into a variety of GPU-backed systems.</li>
</ul>
<ul>
<li>Debug and optimize distributed inference workloads across memory, network, and compute layers.</li>
</ul>
<ul>
<li>Validate correctness, performance, and scalability of model execution on large GPU clusters.</li>
</ul>
<ul>
<li>Collaborate with partner teams to design and optimize high-performance GPU kernels for accelerators using HIP, Triton, or other performance-focused frameworks.</li>
</ul>
<ul>
<li>Collaborate with partner teams to build, integrate and tune collective communication libraries (e.g., RCCL) used to parallelize model execution across many GPUs.</li>
</ul>
<p><strong>You can thrive in this role if you:</strong></p>
<ul>
<li>Have experience writing or porting GPU kernels using HIP, CUDA, or Triton, and care deeply about low-level performance.</li>
</ul>
<ul>
<li>Are familiar with communication libraries like NCCL/RCCL and understand their role in high-throughput model serving.</li>
</ul>
<ul>
<li>Have worked on distributed inference systems and are comfortable scaling models across fleets of accelerators.</li>
</ul>
<ul>
<li>Enjoy solving end-to-end performance challenges across hardware, system libraries, and orchestration layers.</li>
</ul>
<ul>
<li>Are excited to be part of a small, fast-moving team building new infrastructure from first principles.</li>
</ul>
<p><strong>Nice to Have:</strong></p>
<ul>
<li>Contributions to open-source libraries like RCCL, Triton, or vLLM.</li>
</ul>
<ul>
<li>Experience with GPU performance tools (Nsight, rocprof, perf) and memory/comms profiling.</li>
</ul>
<ul>
<li>Prior experience deploying inference on other non-NVIDIA GPU environments.</li>
</ul>
<ul>
<li>Knowledge of model/tensor parallelism, mixed precision, and serving 10B+ parameter models.</li>
</ul>
<p><strong>About OpenAI</strong></p>
<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$295K – $555K • Offers Equity</Salaryrange>
      <Skills>GPU kernels, HIP, CUDA, Triton, NCCL/RCCL, distributed inference systems, GPU performance tools, memory/comms profiling, open-source libraries</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>OpenAI</Employername>
      <Employerlogo>https://logos.yubhub.co/openai.com.png</Employerlogo>
      <Employerdescription>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/openai/9b79406c-89a8-49bd-8a38-e72db80996e9</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>46bb9922-091</externalid>
      <Title>ML Research Engineer - Hardware Codesign</Title>
      <Description><![CDATA[<p><strong>Location</strong></p>
<p>San Francisco</p>
<p><strong>Employment Type</strong></p>
<p>Full time</p>
<p><strong>Department</strong></p>
<p>Scaling</p>
<p><strong>Compensation</strong></p>
<ul>
<li>$185K – $455K • Offers Equity</li>
</ul>
<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>
<ul>
<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>
</ul>
<ul>
<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>
</ul>
<ul>
<li>401(k) retirement plan with employer match</li>
</ul>
<ul>
<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>
</ul>
<ul>
<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>
</ul>
<ul>
<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>
</ul>
<ul>
<li>Mental health and wellness support</li>
</ul>
<ul>
<li>Employer-paid basic life and disability coverage</li>
</ul>
<ul>
<li>Annual learning and development stipend to fuel your professional growth</li>
</ul>
<ul>
<li>Daily meals in our offices, and meal delivery credits as eligible</li>
</ul>
<ul>
<li>Relocation support for eligible employees</li>
</ul>
<ul>
<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>
</ul>
<p>More details about our benefits are available to candidates during the hiring process.</p>
<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>
<p><strong>About the Team</strong></p>
<p>OpenAI’s Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI’s supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI.</p>
<p><strong>About the Role</strong></p>
<p>We’re seeking a Research-Hardware Codesign Engineer to operate at the boundary between model research and silicon/system architecture. You’ll help shape the numerics, architecture, and technology bets of future OpenAI silicon in collaboration with both Research and Hardware.</p>
<p>Your work will include debugging gaps between rooflines and reality, writing quantization kernels, derisking numerics via model evals, quantifying system architecture tradeoffs, and implementing novel numeric RTL. This is a hands-on role for people who go looking for hard problems, get to ground truth, and drive solutions to production. Strong prioritization and clear, honest communication are essential.</p>
<p>Location: San Francisco, CA (Hybrid: 3 days/week onsite)</p>
<p>Relocation assistance available.</p>
<p><strong>In this role you will:</strong></p>
<ul>
<li>Build on our roofline simulator to track evolving workloads, and deliver analyses that quantify the impact of system architecture decisions and support technology pathfinding.</li>
</ul>
<ul>
<li>Debug gaps between performance simulation and real measurements; clearly communicate root cause, bottlenecks, and invalid assumptions.</li>
</ul>
<ul>
<li>Write emulation kernels for low-precision numerics and lossy compression schemes, and give Research the information they need to trade efficiency against model quality (see the sketch after this list).</li>
</ul>
<ul>
<li>Prototype numerics modules by pushing RTL through synthesis; hand off novel numerics cleanly, or occasionally own an RTL module end-to-end.</li>
</ul>
<ul>
<li>Proactively pull in new ML workloads, prototype them with rooflines and/or functional simulation, and drive initial evaluation of new opportunities or risks.</li>
</ul>
<ul>
<li>Understand the whole picture from ML science to hardware optimization, and slice this end-to-end objective into near-term deliverables.</li>
</ul>
<ul>
<li>Build ad-hoc collaborations across teams with very different goals and areas of expertise, and keep progress unblocked.</li>
</ul>
<ul>
<li>Communicate design tradeoffs clearly with explicit assumptions and confidence levels; produce a trail of evidence that enables confident execution.</li>
</ul>
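<p>As a toy illustration of the low-precision emulation work above, the following sketch fake-quantizes a weight tensor to a symmetric 4-bit grid and reports the resulting error; the tensor shape, bit width, and scaling scheme are simplified assumptions, not the production emulation path.</p>
<pre><code>import numpy as np

# Toy low-precision emulation: symmetric per-tensor 4-bit fake quantization,
# the kind of numeric experiment used to study efficiency vs. model-quality trade-offs.

def fake_quant_int4(x):
    """Round to a symmetric 4-bit grid, then dequantize back to float."""
    qmax = 7                                    # use the symmetric part of the int4 range
    scale = max(float(np.max(np.abs(x))) / qmax, 1e-12)
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)
w_hat, scale = fake_quant_int4(w)
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"scale={scale:.4f}, relative error ~{rel_err:.1%}")
</code></pre>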
<p><strong>You will thrive in this role if you have:</strong></p>
<ul>
<li>An exceptional track record of high-quality technical output, and a bias for shipping a prototype now and iterating later in the absence of clear requirements.</li>
</ul>
<ul>
<li>Strong Python, and C++ or Rust, with a cautious attitude toward correctness and an intuition for clean extensibility.</li>
</ul>
<ul>
<li>Experience writing Triton, CUDA, or similar, and an understanding of the resulting mapping of tensor ops to functional units.</li>
</ul>
<ul>
<li>Working knowledge of PyTorch or JAX; experience in large ML codebases is a plus.</li>
</ul>
<ul>
<li>Practical understanding of floating point numerics, the ML tradeoffs of reduced precision, and the current state of the art in model quantization.</li>
</ul>
<ul>
<li>Deep understanding of transformer models, and strong intuition for transformer rooflines and the tradeoffs of sharded training and inference in large-scale ML systems.</li>
</ul>
<ul>
<li>Experience writing RTL (especially for floating point logic) and understanding of PPA tradeoffs is a plus.</li>
</ul>
<ul>
<li>Strong cross-functional communication (e.g. across ML researchers and hardware engineers); ability to slice ambiguous early-incubation ideas into concrete arenas in which progress can be made.</li>
</ul>
<p>To comply with U.S. export control laws and regulations, candidates for this role may need to meet certain legal status requirements.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$185K – $455K</Salaryrange>
      <Skills>Python, C++, Rust, Triton, CUDA, PyTorch, JAX, Floating point numerics, Model quantization, Transformer models, RTL, PPA tradeoffs, Experience in large ML codebases</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>OpenAI</Employername>
      <Employerlogo>https://logos.yubhub.co/openai.com.png</Employerlogo>
      <Employerdescription>OpenAI is a technology company that develops and commercializes advanced artificial intelligence (AI) systems. It was founded in 2015 and is headquartered in San Francisco, California.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/openai/5931abef-191b-417e-89f1-1d06f00e908c</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>7f56054b-d77</externalid>
      <Title>Principal Software Engineer</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft AI is looking for a talented Principal Software Engineer at their Mountain View office. This role sits at the heart of strategic decision-making, driving innovations in AI infrastructure. You&#39;ll work directly with key partners to understand, design, and implement complex inferencing capabilities for state-of-the-art deep learning models.</p>
<p><strong>About the Role</strong></p>
<p>As a Principal Software Engineer, you will be responsible for engaging directly with key partners to understand, design, and implement complex inferencing capabilities for state-of-the-art deep learning models. You will work with cutting-edge hardware and software stacks to deliver best-in-class inference performance while optimizing for cost, leveraging open-source projects to advance deep learning applications. You will collaborate with external and internal teams to identify new areas for improvement and contribute to innovations that enhance model performance and deployment.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Engage directly with key partners to understand, design, and implement complex inferencing capabilities for state-of-the-art deep learning models.</li>
<li>Work with cutting-edge hardware and software stacks to deliver best-in-class inference performance while optimizing for cost.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Experience with model compression (quantization, distillation, SVD, low-rank methods).</li>
<li>Experience in building high-throughput inference serving stacks (continuous batching, KV-cache optimizations, routing); see the sizing sketch after this list.</li>
</ul>
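<p>To give a sense of the KV-cache arithmetic behind continuous-batching capacity planning, here is a back-of-the-envelope sketch; the model shape, context length, and memory budget are assumed values for illustration only.</p>
<pre><code># Back-of-the-envelope KV-cache sizing for continuous batching.
# The model shape and the leftover HBM budget below are assumptions.

def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Each layer caches K and V: 2 * n_kv_heads * head_dim elements per token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

per_tok = kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)  # GQA-style config
budget_gb = 40.0          # HBM assumed left over after weights and activations
ctx_len = 8192
concurrent = int(budget_gb * 1e9 // (per_tok * ctx_len))
print(f"{per_tok / 1024:.0f} KiB of KV cache per token; "
      f"roughly {concurrent} concurrent {ctx_len}-token sequences fit in {budget_gb:.0f} GB")
</code></pre>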
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Solid experience in GPU inference optimization (CUDA, TensorRT, Triton, or custom GPU kernels).</li>
<li>Proficiency in profiling tools (Nsight, TensorBoard, PyTorch profiler) and ability to identify CPU/GPU bottlenecks.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Competitive salary range of USD $139,900 – $274,800 per year.</li>
<li>Comprehensive benefits package, including health insurance, retirement plan, and paid time off.</li>
<li>Opportunities for professional growth and development.</li>
<li>Collaborative and dynamic work environment.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>USD $139,900 – $274,800 per year</Salaryrange>
      <Skills>C, C++, C#, Java, JavaScript, Python, model compression, GPU inference optimization, TensorRT, Triton, CUDA, Nsight, TensorBoard, PyTorch profiler</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft AI is a leading technology company that specializes in artificial intelligence and machine learning. They are known for their innovative products and services that aim to make a positive impact on society. With a strong focus on research and development, Microsoft AI is constantly pushing the boundaries of what is possible with AI.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/principal-software-engineer-24/</Applyto>
      <Location>Mountain View</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>961a53f3-82e</externalid>
      <Title>Senior Software Engineer</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft is looking for a talented Senior Software Engineer at their Suzhou office. This role sits at the heart of Bing&#39;s Search Ads engineering, building the GPU-accelerated systems that match ads to user intent at web scale. You&#39;ll work closely with researchers and platform teams to shape how Bing serves the search and advertising markets.</p>
<p><strong>About the Role</strong></p>
<p>Search Ads R&amp;D aims to build an online advertising ecosystem connecting users, advertisers, and the search engine. The Bing Search Ads Understanding team is chartered to deliver world-class algorithms using web-scale data. Our mission is to drive user satisfaction, advertiser ROI, and Bing revenue. A core challenge is to match advertisers’ ads to users’ queries by building an intelligent system that truly understands user intent. This is a very hard problem that demands the most advanced AI models and sophisticated engineering systems. Join us to work on projects highly strategic to Bing search in a fun and fast-paced environment!</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Design, develop, and maintain high-performance software in C/C++ and Python, including GPU programming with CUDA, ROCm, or Triton.</li>
<li>Optimize model inference and training pipelines for speed, throughput, memory efficiency, and cost across GPU platforms.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>Bachelor’s Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, CUDA, or ROCm OR equivalent experience.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Practical experience writing new GPU kernels, going beyond running GPU workloads with existing library kernels.</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Work on projects highly strategic to Bing search in a fun and fast-paced environment.</li>
<li>Collaborate with platform teams to integrate and tune solutions on emerging accelerator stacks and rapidly evolving toolchains.</li>
<li>Partner with internal and external stakeholders to translate requirements into scalable performance features and optimizations for state-of-the-art models.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>C/C++, Python, CUDA, ROCm, Triton, GPU programming, High-performance software development, Deep learning frameworks, Inference optimization, GPU profiling tools</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft is a multinational technology company that develops, manufactures, licenses, and supports a wide range of software products, services, and devices. The company is known for its Windows operating system, Office software suite, and Xbox gaming console. Microsoft is headquartered in Redmond, Washington, and is one of the largest and most successful technology companies in the world.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/senior-software-engineer-76/</Applyto>
      <Location>Suzhou</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>a15b11dd-765</externalid>
      <Title>Principal Software Engineer</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft AI is looking for a talented Principal Software Engineer at their Redmond office. This role sits at the heart of strategic decision-making, driving innovations in AI inference infrastructure. You&#39;ll work directly with key partners to deliver best-in-class inference performance for state-of-the-art deep learning models.</p>
<p><strong>About the Role</strong></p>
<p>As a Principal Software Engineer, you will be responsible for designing and implementing complex software systems that drive innovation in AI infrastructure. You will work with cutting-edge hardware and software stacks to deliver best-in-class inference performance while optimizing for cost, leveraging open-source projects to advance deep learning applications. You will collaborate with external and internal teams to identify new areas for improvement and contribute to innovations that enhance model performance and deployment.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Engage directly with key partners to understand, design, and implement complex inferencing capabilities for state-of-the-art deep learning models, driving innovations in AI infrastructure.</li>
<li>Work with cutting-edge hardware and software stacks to deliver best-in-class inference performance while optimizing for cost, leveraging open-source projects to advance deep learning applications.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Experience with model compression (quantization, distillation, SVD, low-rank methods).</li>
<li>Experience in building high-throughput inference serving stacks (continuous batching, KV-cache optimizations, routing).</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Solid experience in GPU inference optimization (CUDA, TensorRT, Triton, or custom GPU kernels).</li>
<li>Proficiency in profiling tools (Nsight, TensorBoard, PyTorch profiler) and ability to identify CPU/GPU bottlenecks.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Competitive salary</li>
<li>Comprehensive benefits package</li>
<li>Opportunities for professional growth and development</li>
<li>Collaborative and dynamic work environment</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>USD $139,900 – $274,800 per year</Salaryrange>
      <Skills>C, C++, C#, Java, JavaScript, Python, model compression, GPU inference optimization, profiling tools, TensorRT, Triton, CUDA, TensorBoard, PyTorch profiler</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft AI is a leading technology company that specializes in artificial intelligence and machine learning. They are known for their innovative products and services that aim to make a positive impact on society. With a strong focus on research and development, Microsoft AI is constantly pushing the boundaries of what is possible with AI.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/principal-software-engineer-23/</Applyto>
      <Location>Redmond</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>426a1b6c-bb9</externalid>
      <Title>Senior Software Engineer</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft is looking for a talented Senior Software Engineer at their Beijing office. This role sits at the heart of Bing&#39;s Search Ads engineering, building the GPU-accelerated systems that match ads to user intent at web scale. You&#39;ll work closely with researchers and platform teams to shape how Bing serves the search engine and online advertising markets.</p>
<p><strong>About the Role</strong></p>
<p>Search Ads R&amp;D aims to build an online advertising ecosystem connecting users, advertisers, and the search engine. The Bing Search Ads Understanding team is chartered to deliver world-class algorithms using web-scale data. Our mission is to drive user satisfaction, advertiser ROI, and Bing revenue. A core challenge is to match advertisers’ ads to users’ queries by building an intelligent system that truly understands user intent. This is a very hard problem that demands the most advanced AI models and sophisticated engineering systems. Join us to work on projects highly strategic to Bing search in a fun and fast-paced environment!</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Design, develop, and maintain high-performance software in C/C++ and Python, including GPU programming with CUDA, ROCm, or Triton.</li>
<li>Optimize model inference and training pipelines for speed, throughput, memory efficiency, and cost across GPU platforms.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>Bachelor’s Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, CUDA, or ROCm OR equivalent experience.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Practical experience writing new GPU kernels, going beyond running GPU workloads with existing library kernels.</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Work on projects highly strategic to Bing search in a fun and fast-paced environment.</li>
<li>Collaborate with platform teams to integrate and tune solutions on emerging accelerator stacks and rapidly evolving toolchains.</li>
<li>Partner with internal and external stakeholders to translate requirements into scalable performance features and optimizations for state-of-the-art models.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>C/C++, Python, CUDA, ROCm, Triton, GPU programming, High-performance software development, Deep learning frameworks, Inference optimization, Software engineering principles, Architecture design</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft is a multinational technology company that develops, manufactures, licenses, and supports a wide range of software products, services, and devices. The company is known for its Windows operating system, Office software suite, and Xbox gaming console. Microsoft is a leader in the technology industry and is committed to innovation and customer satisfaction.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/senior-software-engineer-75/</Applyto>
      <Location>Beijing</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>7c0b682d-d0b</externalid>
      <Title>Senior Software Engineer</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft AI is looking for a talented Senior Software Engineer at their Beijing office. This role sits at the heart of the AI Infrastructure team, optimizing the core inference engine that powers large-scale AI models. You&#39;ll work at the intersection of deep learning algorithms and low-level hardware to push latency and throughput to their limits.</p>
<p><strong>About the Role</strong></p>
<p>We are seeking an expert Senior GPU Engineer to join our AI Infrastructure team. In this role, you will architect and optimize the core inference engine that powers our large-scale AI models. You will be responsible for pushing the boundaries of hardware performance, reducing latency, and maximizing throughput for Generative AI and Deep Learning workloads. You will work at the intersection of Deep Learning algorithms and low-level hardware, designing custom operators and building a highly efficient training/inference execution engine from the ground up.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Custom Operator Development: Design and implement highly optimized GPU kernels (CUDA/Triton) for critical deep learning operations (e.g., FlashAttention, GEMM, LayerNorm) to outperform standard libraries (see the sketch after this list).</li>
<li>Inference Engine Architecture: Contribute to the development of our high-performance inference engine, focusing on graph optimizations, operator fusion, and dynamic memory management (e.g., KV Cache optimization).</li>
</ul>
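<p>For context on the FlashAttention-style kernels mentioned above, the sketch below shows the online-softmax rescaling those kernels rely on, written in plain NumPy for clarity; the shapes and block size are arbitrary, and a real kernel would implement this per tile in CUDA or Triton.</p>
<pre><code>import numpy as np

# Online-softmax accumulation, the core idea behind FlashAttention-style kernels:
# attention is computed block by block without materializing the full score matrix.

def attention_online(q, K, V, block=64):
    m, denom = -np.inf, 0.0            # running max and softmax denominator
    acc = np.zeros(V.shape[1])         # running weighted sum of values
    for start in range(0, K.shape[0], block):
        s = K[start:start + block] @ q          # scores for this block of keys
        m_new = max(m, float(s.max()))
        rescale = np.exp(m - m_new)             # correct previously accumulated partials
        p = np.exp(s - m_new)
        denom = denom * rescale + p.sum()
        acc = acc * rescale + p @ V[start:start + block]
        m = m_new
    return acc / denom

rng = np.random.default_rng(0)
q, K, V = rng.standard_normal(64), rng.standard_normal((512, 64)), rng.standard_normal((512, 64))
scores = K @ q
ref = (np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()) @ V
print("max abs diff vs. naive attention:", float(np.max(np.abs(attention_online(q, K, V) - ref))))
</code></pre>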
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Expertise in the CUDA programming model and NVIDIA GPU architectures (specifically Ampere/Hopper).</li>
<li>Deep understanding of the memory hierarchy (Shared Memory, L2 cache, Registers), warp-level primitives, occupancy optimization, and bank conflict resolution.</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Proven ability to navigate and modify complex, large-scale codebases (e.g., PyTorch internals, Linux kernel).</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Starting January 26, 2026, Microsoft AI employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>C, C++, CUDA, Triton, PyTorch, Linux, CMake, pybind11, CI/CD, GPU workloads</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft&apos;s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/senior-software-engineer-17/</Applyto>
      <Location>Beijing</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>c041d54a-929</externalid>
      <Title>Internship Program</Title>
      <Description><![CDATA[<p>Perplexity is excited to announce the Internship Program for exceptional Master’s or PhD students studying Computer Science or Engineering in the UK, enrolled in the 2025-2026 academic year. This is an intensive program in which you will work directly with our AI Inference team.</p>
<p><strong>What you&#39;ll do</strong></p>
<ul>
<li>Work with the inference team to improve serving latency and throughput</li>
<li>Bring up support for new models and state-of-the-art inference optimizations or quantization schemes</li>
<li>Optimize inference across the entire stack, from GPU kernels to serving endpoints</li>
</ul>
<p><strong>What you need</strong></p>
<ul>
<li>Strong engineering track record with proven knowledge of fundamentals and programming languages (multi-threaded programming, networking, compilation, systems programming, etc)</li>
<li>Pursuing a Master&#39;s or PhD in Computer Science with a focus on performance-related subjects (HPC, Compilers, Distributed Systems)</li>
</ul>
]]></Description>
      <Jobtype>internship</Jobtype>
      <Experiencelevel>entry</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>strong engineering track record, proven knowledge of fundamentals and programming languages, pursuing a Master&apos;s or PhD in Computer Science, experience with ML frameworks (Torch, JAX), experience with GPU programming (CUDA, Triton), experience with High-Performance Computing (OpenMPI)</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Perplexity</Employername>
      <Employerlogo>https://logos.yubhub.co/perplexity.com.png</Employerlogo>
      <Employerdescription>Perplexity is a rapidly growing AI startup that has experienced tremendous growth and adoption since publicly launching the world&apos;s first fully functional conversational answer engine in 2022.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/perplexity/79a07e2d-6150-4929-80fe-bbe13a641763</Applyto>
      <Location>London</Location>
      <Country></Country>
      <Postedate>2026-03-04</Postedate>
    </job>
    <job>
      <externalid>7917d1eb-6e2</externalid>
      <Title>Engineering Manager - Inference</Title>
      <Description><![CDATA[<p>We are looking for an Inference Engineering Manager to lead our AI Inference team. This is a unique opportunity to build and scale the infrastructure that powers Perplexity&#39;s products and APIs, serving millions of users with state-of-the-art AI capabilities.</p>
<p><strong>What you&#39;ll do</strong></p>
<p>You will own the technical direction and execution of our inference systems while building and leading a world-class team of inference engineers. Our current stack includes Python, PyTorch, Rust, C++, and Kubernetes.</p>
<ul>
<li>Lead and grow a high-performing team of AI inference engineers</li>
<li>Develop APIs for AI inference used by both internal and external customers</li>
<li>Architect and scale our inference infrastructure for reliability and efficiency</li>
</ul>
<p><strong>What you need</strong></p>
<ul>
<li>5+ years of engineering experience with 2+ years in a technical leadership or management role</li>
<li>Deep experience with ML systems and inference frameworks (PyTorch, TensorFlow, ONNX, TensorRT, vLLM)</li>
<li>Strong understanding of LLM architecture: Multi-Head Attention, Multi/Grouped-Query Attention, and common layers (see the sketch after this list)</li>
</ul>
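<p>As a small illustration of the Multi-Head versus Grouped-Query Attention distinction above, the sketch below shows how query heads map onto shared KV heads and what that sharing does to per-token KV-cache size; the head counts and head dimension are illustrative assumptions.</p>
<pre><code># How grouped-query attention (GQA) shares KV heads among query heads,
# and the per-layer KV-cache reduction that follows. Head counts are illustrative.

n_q_heads, n_kv_heads, head_dim = 32, 8, 128

# Each group of n_q_heads // n_kv_heads query heads reads the same K/V head.
group = n_q_heads // n_kv_heads
q_to_kv = {q: q // group for q in range(n_q_heads)}
print("query head -> kv head:", q_to_kv)

mha_kv_per_token = 2 * n_q_heads * head_dim    # MHA caches K and V for every query head
gqa_kv_per_token = 2 * n_kv_heads * head_dim   # GQA caches them only per KV head
print(f"KV cache per layer shrinks by {mha_kv_per_token / gqa_kv_per_token:.0f}x under GQA")
</code></pre>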
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$300K - $405K</Salaryrange>
      <Skills>ML systems, inference frameworks, LLM architecture, CUDA, Triton, custom kernel development</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Perplexity</Employername>
      <Employerlogo>https://logos.yubhub.co/perplexity.com.png</Employerlogo>
      <Employerdescription>Perplexity is a rapidly growing company that is building and scaling the infrastructure that powers its products and APIs, serving millions of users with state-of-the-art AI capabilities.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/perplexity/2a87ccbf-82ef-4fc7-b1ed-4dd18b11baf9</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-03-04</Postedate>
    </job>
  </jobs>
</source>