<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>01794f13-11a</externalid>
      <Title>TPU Kernel Engineer</Title>
      <Description><![CDATA[<p>As a TPU Kernel Engineer at Anthropic, you&#39;ll be responsible for identifying and addressing performance issues across many different ML systems, including research, training, and inference. A significant portion of this work will involve designing and optimizing kernels for the TPU. You will also provide feedback to researchers about how model changes impact performance.</p>
<p>Strong candidates will have a track record of solving large-scale systems problems and low-level optimization. They should have significant experience optimizing ML systems for TPUs, GPUs, or other accelerators, and be results-oriented with a bias towards flexibility and impact.</p>
<p>Responsibilities:</p>
<ul>
<li>Identify and address performance issues across multiple ML systems</li>
<li>Design and optimize kernels for the TPU</li>
<li>Provide feedback to researchers on model changes and their impact on performance</li>
</ul>
<p>Requirements:</p>
<ul>
<li>Bachelor&#39;s degree or equivalent combination of education, training, and/or experience</li>
<li>Degree in a field of study relevant to the role</li>
<li>Years of experience required will correspond to the internal job level for the position</li>
</ul>
<p>Benefits:</p>
<ul>
<li>Competitive compensation and benefits</li>
<li>Optional equity donation matching</li>
<li>Generous vacation and parental leave</li>
<li>Flexible working hours</li>
<li>Lovely office space in which to collaborate with colleagues</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$280,000-$850,000 USD</Salaryrange>
      <Skills>ML systems optimization, TPU kernel design and optimization, Large-scale systems problem-solving, Low-level optimization, Results-oriented approach, High-performance computing, Machine learning framework internals, Language modeling with transformers, Accelerator architecture, Collective communication algorithms</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic creates reliable, interpretable, and steerable AI systems. It is a public benefit corporation headquartered in San Francisco.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>280000</Compensationmin>
      <Compensationmax>850000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4720576008</Applyto>
      <Location>San Francisco, CA | New York City, NY | Seattle, WA</Location>
      <Country>United States</Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>faffae87-882</externalid>
      <Title>Staff Software Engineer - GenAI Performance and Kernel</Title>
      <Description><![CDATA[<p>As a staff software engineer for GenAI Performance and Kernel, you will own the design, implementation, optimization, and correctness of the high-performance GPU kernels powering our GenAI inference stack. You will lead development of highly-tuned, low-level compute paths, manage trade-offs between hardware efficiency and generality, and mentor others in kernel-level performance engineering.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Leading the design, implementation, benchmarking, and maintenance of core compute kernels optimized for various hardware backends (GPU, accelerators)</li>
<li>Driving the performance roadmap for kernel-level improvements: vectorization, tensorization, tiling, fusion, mixed precision, sparsity, quantization, memory reuse, scheduling, auto-tuning, etc.</li>
<li>Integrating kernel optimizations with higher-level ML systems</li>
<li>Building and maintaining profiling, instrumentation, and verification tooling to detect correctness, performance regressions, numerical issues, and hardware utilization gaps</li>
<li>Leading performance investigations and root-cause analysis on inference bottlenecks, e.g. memory bandwidth, cache contention, kernel launch overhead, tensor fragmentation</li>
<li>Establishing coding patterns, abstractions, and frameworks to modularize kernels for reuse, cross-backend portability, and maintainability</li>
<li>Influencing system architecture decisions to make kernel improvements more effective (e.g. memory layout, dataflow scheduling, kernel fusion boundaries)</li>
<li>Mentoring and guiding other engineers working on lower-level performance, providing code reviews, and helping set best practices</li>
<li>Collaborating with infrastructure, tooling, and ML teams to roll out kernel-level optimizations into production, and monitoring their impact</li>
</ul>
<p>Requirements include:</p>
<ul>
<li>BS/MS/PhD in Computer Science, or a related field</li>
<li>Deep hands-on experience writing and tuning compute kernels (CUDA, Triton, OpenCL, LLVM IR, assembly, or similar) for ML workloads</li>
<li>Strong knowledge of GPU/accelerator architecture: warp structure, memory hierarchy (global, shared, register, L1/L2 caches), tensor cores, scheduling, SM occupancy, etc.</li>
<li>Experience with advanced optimization techniques: tiling, blocking, software pipelining, vectorization, fusion, loop transformations, auto-tuning</li>
<li>Familiarity with ML-specific kernel libraries (cuBLAS, cuDNN, CUTLASS, oneDNN, etc.) or open kernels</li>
<li>Strong debugging and profiling skills (Nsight, nvprof, perf, VTune, custom instrumentation)</li>
<li>Experience reasoning about numerical stability, mixed precision, quantization, and error propagation</li>
<li>Experience in integrating optimized kernels into real-world ML inference systems; exposure to distributed inference pipelines, memory management, and runtime systems</li>
<li>Experience building high-performance products leveraging GPU acceleration</li>
<li>Excellent communication and leadership skills: able to drive design discussions, mentor colleagues, and make trade-offs visible</li>
<li>A track record of shipping performance-critical, high-quality production software</li>
<li>Bonus: published in systems/ML performance venues (e.g. MLSys, ASPLOS, ISCA, PPoPP), experience with custom accelerators or FPGA, experience with sparsity or model compression techniques</li>
</ul>
<p>The pay range for this role is $190,900-$232,800 USD per year, depending on location and experience.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$190,900-$232,800 USD per year</Salaryrange>
      <Skills>Compute kernels, GPU/accelerator architecture, Advanced optimization techniques, ML-specific kernel libraries, Debugging and profiling skills, Numerical stability, Mixed precision, Quantization, Error propagation, Distributed inference pipelines, Memory management, Runtime systems, High-performance products, GPU acceleration</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Databricks</Employername>
      <Employerlogo>https://logos.yubhub.co/databricks.com.png</Employerlogo>
      <Employerdescription>Databricks is a data and AI company that provides a unified platform for data, analytics, and AI.</Employerdescription>
      <Employerwebsite>https://databricks.com</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>190900</Compensationmin>
      <Compensationmax>232800</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/databricks/jobs/8202700002</Applyto>
      <Location>San Francisco, California</Location>
      <Country>United States</Country>
      <Postedate>2026-04-18</Postedate>
    </job>
  </jobs>
</source>