<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>01794f13-11a</externalid>
      <Title>TPU Kernel Engineer</Title>
      <Description><![CDATA[<p>As a TPU Kernel Engineer at Anthropic, you&#39;ll be responsible for identifying and addressing performance issues across many different ML systems, including research, training, and inference. A significant portion of this work will involve designing and optimizing kernels for the TPU. You will also provide feedback to researchers about how model changes impact performance.</p>
<p>Strong candidates will have a track record of solving large-scale systems problems and low-level optimization. They should have significant experience optimizing ML systems for TPUs, GPUs, or other accelerators, and be results-oriented with a bias towards flexibility and impact.</p>
<p>Responsibilities:</p>
<ul>
<li>Identify and address performance issues across multiple ML systems</li>
<li>Design and optimize kernels for the TPU</li>
<li>Provide feedback to researchers on model changes and their impact on performance</li>
</ul>
<p>Requirements:</p>
<ul>
<li>Bachelor&#39;s degree or equivalent combination of education, training, and/or experience</li>
<li>Degree in a field of study relevant to the role</li>
<li>Years of experience required will correspond to the internal job level for the position</li>
</ul>
<p>Benefits:</p>
<ul>
<li>Competitive compensation and benefits</li>
<li>Optional equity donation matching</li>
<li>Generous vacation and parental leave</li>
<li>Flexible working hours</li>
<li>Lovely office space in which to collaborate with colleagues</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$280,000-$850,000 USD</Salaryrange>
      <Skills>ML systems optimization, TPU kernel design and optimization, Large-scale systems problem-solving, Low-level optimization, Results-oriented approach, High-performance computing, Machine learning framework internals, Language modeling with transformers, Accelerator architecture, Collective communication algorithms</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic creates reliable, interpretable, and steerable AI systems. It is a public benefit corporation headquartered in San Francisco.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>280000</Compensationmin>
      <Compensationmax>850000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4720576008</Applyto>
      <Location>San Francisco, CA | New York City, NY | Seattle, WA</Location>
      <Country>United States</Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>faffae87-882</externalid>
      <Title>Staff Software Engineer - GenAI Performance and Kernel</Title>
      <Description><![CDATA[<p>As a staff software engineer for GenAI Performance and Kernel, you will own the design, implementation, optimization, and correctness of the high-performance GPU kernels powering our GenAI inference stack. You will lead development of highly-tuned, low-level compute paths, manage trade-offs between hardware efficiency and generality, and mentor others in kernel-level performance engineering.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Leading the design, implementation, benchmarking, and maintenance of core compute kernels optimized for various hardware backends (GPU, accelerators)</li>
<li>Driving the performance roadmap for kernel-level improvements: vectorization, tensorization, tiling, fusion, mixed precision, sparsity, quantization, memory reuse, scheduling, auto-tuning, etc.</li>
<li>Integrating kernel optimizations with higher-level ML systems</li>
<li>Building and maintaining profiling, instrumentation, and verification tooling to detect correctness, performance regressions, numerical issues, and hardware utilization gaps</li>
<li>Leading performance investigations and root-cause analysis on inference bottlenecks, e.g. memory bandwidth, cache contention, kernel launch overhead, tensor fragmentation</li>
<li>Establishing coding patterns, abstractions, and frameworks to modularize kernels for reuse, cross-backend portability, and maintainability</li>
<li>Influencing system architecture decisions to make kernel improvements more effective (e.g. memory layout, dataflow scheduling, kernel fusion boundaries)</li>
<li>Mentoring and guiding other engineers working on lower-level performance, providing code reviews, and helping set best practices</li>
<li>Collaborating with infrastructure, tooling, and ML teams to roll out kernel-level optimizations into production, and monitoring their impact</li>
</ul>
<p>Requirements include:</p>
<ul>
<li>BS/MS/PhD in Computer Science, or a related field</li>
<li>Deep hands-on experience writing and tuning compute kernels (CUDA, Triton, OpenCL, LLVM IR, assembly, or similar) for ML workloads</li>
<li>Strong knowledge of GPU/accelerator architecture: warp structure, memory hierarchy (global, shared, register, L1/L2 caches), tensor cores, scheduling, SM occupancy, etc.</li>
<li>Experience with advanced optimization techniques: tiling, blocking, software pipelining, vectorization, fusion, loop transformations, auto-tuning</li>
<li>Familiarity with ML-specific kernel libraries (cuBLAS, cuDNN, CUTLASS, oneDNN, etc.) or open kernels</li>
<li>Strong debugging and profiling skills (Nsight, nvprof, perf, VTune, custom instrumentation)</li>
<li>Experience reasoning about numerical stability, mixed precision, quantization, and error propagation</li>
<li>Experience in integrating optimized kernels into real-world ML inference systems; exposure to distributed inference pipelines, memory management, and runtime systems</li>
<li>Experience building high-performance products leveraging GPU acceleration</li>
<li>Excellent communication and leadership skills: able to drive design discussions, mentor colleagues, and make trade-offs visible</li>
<li>A track record of shipping performance-critical, high-quality production software</li>
<li>Bonus: published in systems/ML performance venues (e.g. MLSys, ASPLOS, ISCA, PPoPP), experience with custom accelerators or FPGA, experience with sparsity or model compression techniques</li>
</ul>
<p>The pay range for this role is $190,900-$232,800 USD per year, depending on location and experience.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$190,900-$232,800 USD per year</Salaryrange>
      <Skills>Compute kernels, GPU/accelerator architecture, Advanced optimization techniques, ML-specific kernel libraries, Debugging and profiling skills, Numerical stability, Mixed precision, Quantization, Error propagation, Distributed inference pipelines, Memory management, Runtime systems, High-performance products, GPU acceleration</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Databricks</Employername>
      <Employerlogo>https://logos.yubhub.co/databricks.com.png</Employerlogo>
      <Employerdescription>Databricks is a data and AI company that provides a unified platform for data, analytics, and AI.</Employerdescription>
      <Employerwebsite>https://databricks.com</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>190900</Compensationmin>
      <Compensationmax>232800</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/databricks/jobs/8202700002</Applyto>
      <Location>San Francisco, California</Location>
      <Country>United States</Country>
      <Postedate>2026-04-18</Postedate>
    </job>
  </jobs>
</source>