<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>f0f66ce3-d78</externalid>
      <Title>Senior GenAI Research Engineer - Optimization and Kernels</Title>
      <Description><![CDATA[<p>As a research engineer on the Scaling team at Databricks, you will be responsible for keeping up with the latest developments in deep learning and advancing the scientific frontier by creating new techniques that go beyond the state of the art.</p>
<p>You will work on a collaborative team of researchers and engineers with diverse backgrounds and technical training. Your goal will be to help our customers succeed in applying state-of-the-art LLMs and AI systems; to make that possible, we encode our scientific expertise into our products.</p>
<p>Your responsibilities will include:</p>
<ul>
<li>Driving performance improvements through advanced optimization techniques including kernel fusion, mixed precision, memory layout optimization, tiling strategies, and tensorization for training-specific patterns</li>
<li>Designing, implementing, and optimizing high-performance GPU kernels for training workloads (e.g., attention mechanisms, custom layers, gradient computation, activation functions) targeting NVIDIA architectures</li>
<li>Designing and implementing distributed training frameworks for large language models, including parallelism strategies (data, tensor, pipeline, ZeRO-based) and optimized communication patterns for gradient synchronization and collective operations</li>
<li>Profiling, debugging, and optimizing end-to-end training workflows to identify and resolve performance bottlenecks, applying memory optimization techniques like activation checkpointing, gradient sharding, and mixed precision training</li>
</ul>
<p>We look for candidates with a strong background in computer science or a related field, hands-on experience writing and tuning CUDA kernels for ML training applications, and a deep understanding of parallelism techniques and memory optimization strategies for large-scale model training.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$166,000-$225,000 USD</Salaryrange>
      <Skills>CUDA, NVIDIA GPU architecture, PyTorch, distributed training frameworks, parallelism techniques, memory optimization strategies</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Databricks</Employername>
      <Employerlogo>https://logos.yubhub.co/databricks.com.png</Employerlogo>
      <Employerdescription>Databricks is a data and AI company that provides a unified platform for data, analytics, and AI. It was founded by the original creators of Lakehouse, Apache Spark, Delta Lake, and MLflow.</Employerdescription>
      <Employerwebsite>https://databricks.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/databricks/jobs/8297797002</Applyto>
      <Location>San Francisco, California</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>cba88898-896</externalid>
      <Title>Research Engineer, Infrastructure, Kernels</Title>
      <Description><![CDATA[<p>We&#39;re looking for an infrastructure research engineer to design, optimize, and maintain the compute foundations that power large-scale language model training. You will develop high-performance ML kernels (e.g., CUDA, CuTe, Triton), enable efficient low-precision arithmetic, and improve the distributed compute stack that makes training large models possible.</p>
<p>This role is perfect for an engineer who enjoys working close to the metal and across the research boundary. You&#39;ll collaborate with researchers and systems architects to bridge algorithmic design with hardware efficiency. You&#39;ll prototype new kernel implementations, profile performance across hardware generations, and help define the numerical and parallelism strategies that determine how we scale next-generation AI systems.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Design and implement custom ML kernels (e.g., CUDA, CuTe, Triton) for core LLM operations such as attention, matrix multiplication, gating, and normalization, optimized for modern GPU and accelerator architectures.</li>
<li>Design and reason through compute primitives that reduce memory-bandwidth bottlenecks and improve kernel compute efficiency.</li>
<li>Collaborate with research teams to align kernel-level optimizations with model architecture and algorithmic goals.</li>
<li>Develop and maintain a library of reusable kernels and performance benchmarks that serve as the foundation for internal model training.</li>
<li>Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources.</li>
<li>Document and share insights through internal talks, technical papers, or open-source contributions to strengthen the broader ML systems community.</li>
</ul>
<p><strong>Skills and Qualifications</strong></p>
<p>Minimum qualifications:</p>
<ul>
<li>Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar.</li>
<li>Strong engineering skills, including the ability to contribute performant, maintainable code and to debug complex codebases.</li>
<li>Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.</li>
<li>Ability to thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts.</li>
<li>A bias for action, with the initiative to work across stacks and teams wherever you spot an opportunity to make sure something ships.</li>
<li>Proficiency in CUDA, CuTe, Triton, or other GPU programming frameworks.</li>
<li>Demonstrated ability to analyze, profile, and optimize compute-intensive workloads.</li>
</ul>
<p>Preferred qualifications:</p>
<ul>
<li>Experience training or supporting large-scale language models with tens of billions of parameters or more.</li>
<li>Track record of improving research productivity through infrastructure design or process improvements.</li>
<li>Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators.</li>
<li>Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks.</li>
<li>Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM).</li>
<li>Contributions to open-source GPU, ML systems, or compiler optimization projects.</li>
<li>Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI infrastructure.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$350,000 - $475,000 USD</Salaryrange>
      <Skills>CUDA, CuTe, Triton, GPU programming frameworks, Deep learning frameworks (e.g., PyTorch, JAX), Computer science, Electrical engineering, Statistics, Machine learning, Physics, Robotics, Experience training or supporting large-scale language models with tens of billions of parameters or more, Track record of improving research productivity through infrastructure design or process improvements, Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators, Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks, Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM), Contributions to open-source GPU, ML systems, or compiler optimization projects, Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI infrastructure</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachines.ai.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is a technology company whose team members have created widely used AI products, including ChatGPT and Character.ai, and open-source projects like PyTorch.</Employerdescription>
      <Employerwebsite>https://thinkingmachines.ai/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5013934008</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>dc17980d-461</externalid>
      <Title>Research Engineer, Interpretability</Title>
      <Description><![CDATA[<p>JOB TITLE: Research Engineer, Interpretability</p>
<p>LOCATION: San Francisco, CA</p>
<p>DEPARTMENT: AI Research &amp; Engineering</p>
<p>JOB DESCRIPTION:</p>
<p>When you see what modern language models are capable of, do you wonder, &quot;How do these things work? How can we trust them?&quot;</p>
<p>The Interpretability team at Anthropic is working to reverse-engineer how trained models work because we believe that a mechanistic understanding is the most robust way to make advanced systems safe.</p>
<p>Think of us as doing &quot;neuroscience&quot; of neural networks using &quot;microscopes&quot; we build - or reverse-engineering neural networks like binary programs.</p>
<p>More resources to learn about our work:</p>
<ul>
<li>Our research blog - covering advances including Monosemantic Features and Circuits</li>
<li>An Introduction to Interpretability from our research lead, Chris Olah</li>
<li>The Urgency of Interpretability from CEO Dario Amodei</li>
<li>Engineering Challenges Scaling Interpretability - directly relevant to this role</li>
<li>60 Minutes segment - Around 8:07, see a demo of tooling our team built</li>
<li>New Yorker article - what it&#39;s like to work on one of AI&#39;s hardest open problems</li>
</ul>
<p>Even if you haven&#39;t worked on interpretability before, the infrastructure expertise is similar to what&#39;s needed across the lifecycle of a production language model:</p>
<ul>
<li>Pretraining: Training dictionary learning models looks a lot like model pretraining - creating stable, performant training jobs for massively parameterized models across thousands of chips</li>
<li>Inference: Interp runs a customized inference stack. Day-to-day analysis requires services that allow editing a model&#39;s internal activations mid-forward-pass - for example, adding a &quot;steering vector&quot;</li>
<li>Performance: Like all LLM work, we push up against the limits of hardware and software. Rather than squeezing the last 0.1%, we are focused on finding bottlenecks, fixing them, and moving ahead given our rapidly evolving research and safety mission</li>
</ul>
<p>The science keeps scaling - and it&#39;s now applied directly in safety audits on frontier models, with real deadlines. As our research has matured, engineering and infrastructure have become a bottleneck. Your work will have a direct impact on one of the most important open problems in AI.</p>
<p>RESPONSIBILITIES:</p>
<ul>
<li>Build and maintain the specialized inference and training infrastructure that powers interpretability research - including instrumented forward/backward passes, activation extraction, and steering vector application</li>
<li>Resolve scaling and efficiency bottlenecks through profiling, optimization, and close collaboration with peer infrastructure teams</li>
<li>Design tools, abstractions, and platforms that enable researchers to rapidly experiment without hitting engineering barriers</li>
<li>Help bring interpretability research into production safety audits - with real deadlines and high reliability expectations</li>
<li>Work across the stack - from model internals and accelerator-level optimization to user-facing research tooling</li>
</ul>
<p>YOU MAY BE A GOOD FIT IF YOU:</p>
<ul>
<li>Have 5-10+ years of experience building software</li>
<li>Are highly proficient in at least one programming language (e.g., Python, Rust, Go, Java) and productive with Python</li>
<li>Are extremely curious about unfamiliar domains; can quickly learn and put that knowledge to work, e.g. diving into new layers of the stack to find bottlenecks</li>
<li>Have a strong ability to prioritize the most impactful work and are comfortable operating with ambiguity and questioning assumptions</li>
<li>Prefer fast-moving collaborative projects to extensive solo efforts</li>
<li>Are curious about interpretability research and its role in AI safety (though no research experience is required!)</li>
<li>Care about the societal impacts and ethics of your work</li>
<li>Are comfortable working closely with researchers, translating research needs into engineering solutions.</li>
</ul>
<p>STRONG CANDIDATES MAY ALSO HAVE EXPERIENCE WITH:</p>
<ul>
<li>Optimizing the performance of large-scale distributed systems</li>
<li>Language modeling fundamentals with transformers</li>
<li>High Performance LLM optimization: memory management, compute efficiency, parallelism strategies, inference throughput optimization</li>
<li>Working hands-on in a mainstream ML stack - PyTorch/CUDA on GPUs or JAX/XLA on TPUs</li>
<li>Collaborating closely with researchers and building tooling to support research teams; or directly performed research with complex engineering challenges</li>
</ul>
<p>REPRESENTATIVE PROJECTS:</p>
<ul>
<li>Building Garcon, a tool that allows researchers to easily instrument LLMs to extract internal activations</li>
<li>Designing and optimizing a pipeline to efficiently collect petabytes of transformer activations and shuffle them</li>
<li>Profiling and optimizing ML training jobs, including multi-GPU parallelism and memory optimization</li>
<li>Building a steered inference system that applies targeted interventions to model internals at scale (conceptually similar to Golden Gate Claude but for safety research)</li>
</ul>
<p>ROLE-SPECIFIC LOCATION POLICY:</p>
<ul>
<li>This role is based in the San Francisco office; however, we are open to considering exceptional candidates for remote work on a case-by-case basis.</li>
</ul>
<p>The annual compensation range for this role is listed below.</p>
<p>For sales roles, the range provided is the role&#39;s On Target Earnings (&quot;OTE&quot;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.</p>
<p>Annual Salary: $315,000-$560,000 USD</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$315,000-$560,000 USD</Salaryrange>
      <Skills>Python, Rust, Go, Java, PyTorch, CUDA, JAX, XLA, High Performance LLM optimization, memory management, compute efficiency, parallelism strategies, inference throughput optimization, large-scale distributed systems, language modeling fundamentals, transformers, collaborating closely with researchers, building tooling to support research teams</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a company that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4980430008</Applyto>
      <Location>San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>19c6b9e4-ff6</externalid>
      <Title>Foundation and generative models for biomolecules</Title>
      <Description><![CDATA[<p>At Inceptive, you will drive forward development that could help billions of people. You will be part of a collaborative, interdisciplinary team building our biological software.</p>
<p>The design space of biomolecules is unimaginably vast, far beyond what can be explored experimentally. Yet within this space lie molecules with properties essential for new medicines. Our machine learning models learn to design therapeutic biomolecules with specific, desirable functions.</p>
<p>We advance the state of the art in molecular design by training large-scale foundation models and developing cutting-edge generative approaches. The models learn from diverse, heterogeneous datasets and are refined through focused fine-tuning and feedback from experiments. Key to progress is a team that combines exceptional machine learning expertise with thorough domain understanding.</p>
<p>You will collaborate closely with other machine learning researchers and engineers, as well as computational and experimental biologists, to advance these models and translate their capabilities into real therapeutic designs.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Embody our vision of an interdisciplinary environment and embrace learning about areas outside of your traditional area of expertise</li>
<li>Develop, implement, train, and iteratively improve state-of-the-art models for biomolecule design</li>
<li>Analyze, visualize, and communicate results to support team efforts in improving models and data</li>
<li>Create, deploy, and refine tools for efficient, reliable machine learning experimentation and production</li>
<li>Work with biologists to collect data for the training and evaluation of generative models of biomolecules</li>
<li>Provide mentorship and technical direction to team members as appropriate</li>
</ul>
<p><strong>Qualifications</strong></p>
<ul>
<li>3+ years of hands-on experience developing ML models</li>
<li>Demonstrated track record of implementing, training, and improving advanced machine learning models</li>
<li>Highly capable programmer fluent in the Python ecosystem and in PyTorch or a similar deep learning framework</li>
<li>Availability to work with team members across the US and Europe, with meetings starting at 8am PT and ending at 7pm CET</li>
<li>Readiness to travel several times a year for company retreats and business events</li>
</ul>
<p><strong>Compensation</strong></p>
<p>$200K – $275K + Bonus + Equity</p>
<p><strong>Benefits</strong></p>
<ul>
<li>A competitive compensation package</li>
<li>30 days of paid vacation per year</li>
<li>Comprehensive health insurance for US-based employees</li>
<li>401(k) with company match for US-based employees and Direktversicherung for German employees</li>
<li>Quarterly company-wide retreats</li>
<li>Monthly wellness benefit</li>
<li>Budget for multiple visits per year to our offices in Berlin, Palo Alto or Switzerland</li>
<li>Learning &amp; Development budget to attend conferences, take courses, or otherwise invest in your professional growth, as well as access to the Learning &amp; Development platforms edX and Hone</li>
<li>A buddy to help you get settled</li>
</ul>
<p>At Inceptive, we are creating tools to develop increasingly powerful biological software for the rational design of novel, broadly accessible medicines and biotechnologies previously out of reach. Our team brings together vast expertise in molecular biology, machine learning, and software engineering, and we are all working towards becoming interdisciplinary, meaning we deepen the knowledge we have in our area of expertise while also expanding our knowledge of completely new fields.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>entry|mid|senior|staff|executive</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$200K – $275K + Bonus + Equity</Salaryrange>
      <Skills>Python, PyTorch, Machine Learning, Deep Learning, Biological Software, Molecular Design, Generative Models, Domain Understanding, Interdisciplinary Teamwork, PhD in AI/ML, computer science, computational biology, physics, or a related field, Strong skills in designing, executing, and documenting machine learning experiments, Practical experience with modern generative models, Strong software engineering skills, in particular for data processing, evaluation of ML models, compute cluster orchestration, Experience with large-scale model training, foundation models, model parallelism, multi-node training, Experience with bio sequence data and datasets — various genomic and protein data, sequencing, functional assays, etc, Knowledge of biochemistry, molecular/cell biology, and drug development</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Inceptive</Employername>
      <Employerlogo>https://logos.yubhub.co/inceptive.com.png</Employerlogo>
      <Employerdescription>Inceptive is a company creating tools to develop increasingly powerful biological software for the rational design of novel, broadly accessible medicines and biotechnologies.</Employerdescription>
      <Employerwebsite>https://inceptive.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/inceptive/jobs/4961579007</Applyto>
      <Location>Berlin, Germany or Palo Alto, CA or Zurich, Switzerland</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>2bc6ae79-8ee</externalid>
      <Title>Staff Technical Lead for Inference &amp; ML Performance</Title>
      <Description><![CDATA[<p>We&#39;re looking for a Staff Technical Lead for Inference &amp; ML Performance to guide a team in building and optimizing state-of-the-art inference systems. This role is intense yet deeply impactful.</p>
<p>You&#39;ll shape the future of fal&#39;s inference engine and ensure our generative models achieve best-in-class performance. Your work directly impacts our ability to rapidly deliver cutting-edge creative solutions to users, from individual creators to global brands.</p>
<p>Day-to-day, you&#39;ll set technical direction, guide your team to build high-performance inference solutions, and personally contribute to critical inference performance enhancements and optimizations. You&#39;ll collaborate closely with research &amp; applied ML teams, influence model inference strategies and deployment techniques, and drive advanced performance optimizations.</p>
<p>As a leader, you&#39;ll mentor, coach, and grow your team of performance-focused engineers, helping them innovate, solve complex performance challenges, and level up their skills.</p>
<p>To succeed in this role, you&#39;ll need to be deeply experienced in ML performance optimization, understand the full ML performance stack, and know inference inside-out. You&#39;ll also need to thrive in cross-functional collaboration and have excellent leadership skills.</p>
<p>If you&#39;re ready to lead the future of inference performance at a fast-paced, high-growth frontier, apply now!</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>ML performance optimization, PyTorch, TensorRT, TransformerEngine, Triton, CUTLASS kernels, Quantization, Kernel authoring, Compilation, Model parallelism, Distributed serving, Profiling</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>fal</Employername>
      <Employerlogo>https://logos.yubhub.co/fal.com.png</Employerlogo>
      <Employerdescription>fal is a fast-growing company pioneering the next generation of generative-media infrastructure.</Employerdescription>
      <Employerwebsite>https://fal.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/fal/jobs/4012780009</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>c078633c-28c</externalid>
      <Title>Senior Engineer, Core API - W&amp;B</Title>
      <Description><![CDATA[<p>You will be responsible for building and evolving the core backend systems and shared infrastructure that power our platform.</p>
<p>A significant portion of backend logic is shared across services, and this role will help define, maintain, and scale that foundation.</p>
<p>You will own and improve internal schema and code generation tooling that ensures consistency and correctness across services.</p>
<p>You will work on and extend our custom job scheduler, improving reliability, observability, and execution guarantees for distributed workloads.</p>
<p>You will contribute to the safe execution of large-scale concurrent and distributed operations.</p>
<p>You will play a key role in defining and maintaining API standards across teams, ensuring performance, backward compatibility, and clear evolution strategies.</p>
<p>You will collaborate closely with Product and various Engineering teams to design systems that are reliable, scalable, and maintainable over time.</p>
<p>The Core Systems team is responsible for the foundational backend infrastructure that powers Weights &amp; Biases within CoreWeave.</p>
<p>Much of the platform&#39;s critical logic is shared across services, and this role sits at the center of that foundation.</p>
<p>You will work on the systems that other engineers build upon, from execution frameworks and schedulers to schema tooling and API standards.</p>
<p>This is a high-leverage role focused on durability, scalability, and long-term maintainability.</p>
<p>The systems you design and evolve will directly impact reliability, developer velocity, and the ability of the platform to scale with growing workloads.</p>
<p>You&#39;ll collaborate across teams to ensure that shared backend abstractions remain clean, performant, and consistent as we continue to expand our adoption of technologies like GraphQL and gRPC.</p>
<p>If you enjoy owning deep technical infrastructure, shaping engineering standards, and building systems that other engineers depend on every day, this role offers meaningful scope and impact.</p>
<p>You will be surrounded by some of the best talent in the industry, who will want to learn from you, too.</p>
<p>Come join us!</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$165,000 to $242,000</Salaryrange>
      <Skills>backend engineering experience, designing and maintaining distributed systems, hands-on experience designing and evolving APIs, strong proficiency in Go, Python, or a comparable backend systems language, experience implementing concurrency and parallelism patterns in production systems, familiarity with schema management, code generation tools, or interface definition systems, experience building or operating custom job schedulers, workflow engines, or execution frameworks, experience defining cross-team API standards and governance models, background in high-scale data or ML infrastructure systems, experience improving reliability through observability, metrics, and SLO-driven development practices</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud computing company that provides a platform for building and scaling AI applications.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4658736006</Applyto>
      <Location>Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>97212bdf-dd1</externalid>
      <Title>Research Engineer, Interpretability</Title>
      <Description><![CDATA[<p>Job Title: Research Engineer, Interpretability</p>
<p>About the Role:</p>
<p>When you see what modern language models are capable of, do you wonder, &quot;How do these things work? How can we trust them?&quot; The Interpretability team at Anthropic is working to reverse-engineer how trained models work because we believe that a mechanistic understanding is the most robust way to make advanced systems safe.</p>
<p>Think of us as doing &quot;neuroscience&quot; of neural networks using &quot;microscopes&quot; we build - or reverse-engineering neural networks like binary programs.</p>
<p>More resources to learn about our work:</p>
<ul>
<li>Our research blog - covering advances including Monosemantic Features and Circuits</li>
<li>An Introduction to Interpretability from our research lead, Chris Olah</li>
<li>The Urgency of Interpretability from CEO Dario Amodei</li>
<li>Engineering Challenges Scaling Interpretability - directly relevant to this role</li>
<li>60 Minutes segment - Around 8:07, see a demo of tooling our team built</li>
<li>New Yorker article - what it&#39;s like to work on one of AI&#39;s hardest open problems</li>
</ul>
<p>Even if you haven&#39;t worked on interpretability before, the infrastructure expertise is similar to what&#39;s needed across the lifecycle of a production language model:</p>
<ul>
<li>Pretraining: Training dictionary learning models looks a lot like model pretraining - creating stable, performant training jobs for massively parameterized models across thousands of chips</li>
<li>Inference: Interp runs a customized inference stack. Day-to-day analysis requires services that allow editing a model&#39;s internal activations mid-forward-pass - for example, adding a &quot;steering vector&quot;</li>
<li>Performance: Like all LLM work, we push up against the limits of hardware and software. Rather than squeezing the last 0.1%, we are focused on finding bottlenecks, fixing them, and moving ahead given our rapidly evolving research and safety mission</li>
</ul>
<p>The science keeps scaling - and it&#39;s now applied directly in safety audits on frontier models, with real deadlines. As our research has matured, engineering and infrastructure have become a bottleneck. Your work will have a direct impact on one of the most important open problems in AI.</p>
<p>Responsibilities:</p>
<ul>
<li>Build and maintain the specialized inference and training infrastructure that powers interpretability research - including instrumented forward/backward passes, activation extraction, and steering vector application</li>
<li>Resolve scaling and efficiency bottlenecks through profiling, optimization, and close collaboration with peer infrastructure teams</li>
<li>Design tools, abstractions, and platforms that enable researchers to rapidly experiment without hitting engineering barriers</li>
<li>Help bring interpretability research into production safety audits - with real deadlines and high reliability expectations</li>
<li>Work across the stack - from model internals and accelerator-level optimization to user-facing research tooling</li>
</ul>
<p>You may be a good fit if you:</p>
<ul>
<li>Have 5-10+ years of experience building software</li>
<li>Are highly proficient in at least one programming language (e.g., Python, Rust, Go, Java) and productive with Python</li>
<li>Are extremely curious about unfamiliar domains and can quickly learn and apply new knowledge, e.g., by diving into new layers of the stack to find bottlenecks</li>
<li>Have a strong ability to prioritize the most impactful work and are comfortable operating with ambiguity and questioning assumptions</li>
<li>Prefer fast-moving collaborative projects to extensive solo efforts</li>
<li>Are curious about interpretability research and its role in AI safety (though no research experience is required!)</li>
<li>Care about the societal impacts and ethics of your work</li>
<li>Are comfortable working closely with researchers, translating research needs into engineering solutions</li>
</ul>
<p>Strong candidates may also have experience with:</p>
<ul>
<li>Optimizing the performance of large-scale distributed systems</li>
<li>Language modeling fundamentals with transformers</li>
<li>High-performance LLM optimization: memory management, compute efficiency, parallelism strategies, inference throughput optimization</li>
<li>Working hands-on in a mainstream ML stack - PyTorch/CUDA on GPUs or JAX/XLA on TPUs</li>
<li>Collaborating closely with researchers and building tooling to support research teams, or directly performing research that involves complex engineering challenges</li>
</ul>
<p>Representative Projects:</p>
<ul>
<li>Building Garcon, a tool that allows researchers to easily instrument LLMs to extract internal activations</li>
<li>Designing and optimizing a pipeline to efficiently collect petabytes of transformer activations and shuffle them</li>
<li>Profiling and optimizing ML training jobs, including multi-GPU parallelism and memory optimization</li>
<li>Building a steered inference system that applies targeted interventions to model internals at scale (conceptually similar to Golden Gate Claude but for safety research)</li>
</ul>
<p>Role-Specific Location Policy:</p>
<ul>
<li>This role is based in the San Francisco office; however, we are open to considering exceptional candidates for remote work on a case-by-case basis.</li>
</ul>
<p>The annual compensation range for this role is listed below.</p>
<p>For sales roles, the range provided is the role&#39;s On Target Earnings (&quot;OTE&quot;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.</p>
<p>Annual Salary: $315,000-$560,000 USD</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$315,000-$560,000 USD</Salaryrange>
      <Skills>Python, Rust, Go, Java, PyTorch, CUDA, JAX, XLA, Transformers, High Performance LLM optimization, Memory management, Compute efficiency, Parallelism strategies, Inference throughput optimization, Optimizing the performance of large-scale distributed systems, Language modeling fundamentals, Collaborating closely with researchers and building tooling to support research teams</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a company that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4980430008</Applyto>
      <Location>San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>71554e46-b64</externalid>
      <Title>Senior Engineering Manager, AI Runtime</Title>
      <Description><![CDATA[<p>At Databricks, we are committed to enabling data teams to solve the world&#39;s toughest problems. As a Senior Engineering Manager, you will lead the team owning both the product experience and the foundational infrastructure of our AI Runtime (AIR) product.</p>
<p>You will be responsible for shaping customer-facing capabilities while designing for scalability, extensibility, and performance of GPU training and adjacent areas. This will involve collaborating closely across the platform, product, infrastructure, and research organisations.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Leading, mentoring, and growing a high-performing engineering team responsible for the Custom Training product and its foundational infrastructure</li>
<li>Defining and owning the product and technical roadmap for AIR, balancing customer experience, functionality, and foundational investments</li>
<li>Collaborating closely with product, research, platform, infrastructure teams, and customers to drive end-to-end delivery</li>
<li>Driving architectural decisions and product design for managed GPU training at scale</li>
<li>Advocating for customer needs through direct engagement, ensuring engineering decisions translate to clear product impact</li>
</ul>
<p>We are looking for someone with 8+ years of software engineering experience, with 3+ years in engineering management. You should have a track record of building and operating managed GPU training infrastructure at scale, as well as deep familiarity with distributed training frameworks and parallelism strategies.</p>
<p>In addition, you should have experience with training resilience patterns, such as checkpointing, elastic training, and automated failure recovery for long-running jobs. You should also have a strong understanding of GPU performance fundamentals, including NCCL, interconnect topologies, and memory optimisation.</p>
<p>Experience building platform products with clear SLAs is also essential, as is strong cross-functional leadership across platform, product, and research teams. Excellent collaboration and communication skills are required.</p>
<p>The pay range for this role is $228,600-$314,250 USD per year, depending on location. The total compensation package may also include eligibility for annual performance bonus, equity, and benefits.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$228,600-$314,250 USD per year</Salaryrange>
      <Skills>software engineering, engineering management, distributed training frameworks, parallelism strategies, GPU training infrastructure, checkpointing, elastic training, automated failure recovery, GPU performance fundamentals, NCCL, interconnect topologies, memory optimisation</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Databricks</Employername>
      <Employerlogo>https://logos.yubhub.co/databricks.com.png</Employerlogo>
      <Employerdescription>Databricks is a data and AI company that provides a unified platform for data, analytics, and AI. It was founded by the original creators of Lakehouse, Apache Spark, Delta Lake, and MLflow.</Employerdescription>
      <Employerwebsite>https://databricks.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/databricks/jobs/8490282002</Applyto>
      <Location>Mountain View, California; San Francisco, California</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>28107212-128</externalid>
      <Title>Performance Engineer, GPU</Title>
      <Description><![CDATA[<p>As a GPU Performance Engineer at Anthropic, you will be responsible for architecting and implementing the foundational systems that power Claude and push the frontiers of what&#39;s possible with large language models. You will maximize GPU utilization and performance at unprecedented scale, develop cutting-edge optimizations that directly enable new model capabilities, and dramatically improve inference efficiency.</p>
<p>Working at the intersection of hardware and software, you will implement state-of-the-art techniques from custom kernel development to distributed system architectures. Your work will span the entire stack, from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization.</p>
<p>Strong candidates will have a track record of delivering transformative GPU performance improvements in production ML systems and will be excited to shape the future of AI infrastructure alongside world-class researchers and engineers.</p>
<p>Responsibilities:</p>
<ul>
<li>Architect and implement foundational systems that power Claude</li>
<li>Maximize GPU utilization and performance at unprecedented scale</li>
<li>Develop cutting-edge optimizations that directly enable new model capabilities</li>
<li>Dramatically improve inference efficiency</li>
<li>Implement state-of-the-art techniques from custom kernel development to distributed system architectures</li>
<li>Work at the intersection of hardware and software</li>
<li>Span the entire stack, from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization</li>
</ul>
<p>Requirements:</p>
<ul>
<li>Deep experience with GPU programming and optimization at scale</li>
<li>Impact-driven, passionate about delivering measurable performance breakthroughs</li>
<li>Ability to navigate complex systems from hardware interfaces to high-level ML frameworks</li>
<li>Enjoy collaborative problem-solving and pair programming</li>
<li>Want to work on state-of-the-art language models with real-world impact</li>
<li>Care about the societal impacts of your work</li>
<li>Thrive in ambiguous environments where you define the path forward</li>
</ul>
<p>Nice to have:</p>
<ul>
<li>GPU Kernel Development: CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization</li>
<li>ML Compilers &amp; Frameworks: PyTorch/JAX internals, torch.compile, XLA, custom operators</li>
<li>Performance Engineering: Kernel fusion, memory bandwidth optimization, profiling with Nsight</li>
<li>Distributed Systems: NCCL, NVLink, collective communication, model parallelism</li>
<li>Low-Precision: INT8/FP8 quantization, mixed-precision techniques</li>
<li>Production Systems: Large-scale training infrastructure, fault tolerance, cluster orchestration</li>
</ul>
<p>Representative projects:</p>
<ul>
<li>Co-design attention mechanisms and algorithms for next-generation hardware architectures</li>
<li>Develop custom kernels for emerging quantization formats and mixed-precision techniques</li>
<li>Design distributed communication strategies for multi-node GPU clusters</li>
<li>Optimize end-to-end training and inference pipelines for frontier language models</li>
<li>Build performance modeling frameworks to predict and optimize GPU utilization</li>
<li>Implement kernel fusion strategies to minimize memory bandwidth bottlenecks</li>
<li>Create resilient systems for planet-scale distributed training infrastructure</li>
<li>Profile and eliminate performance bottlenecks in production serving infrastructure</li>
<li>Partner with hardware vendors to influence future accelerator capabilities and software stacks</li>
</ul>
<p>Note: The salary range for this position is $280,000-$850,000 USD per year.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$280,000-$850,000 USD per year</Salaryrange>
      <Skills>GPU programming, optimization at scale, CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization, PyTorch/JAX internals, torch.compile, XLA, custom operators, kernel fusion, memory bandwidth optimization, profiling with Nsight, NCCL, NVLink, collective communication, model parallelism, INT8/FP8 quantization, mixed-precision techniques, large-scale training infrastructure, fault tolerance, cluster orchestration</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4926227008</Applyto>
      <Location>San Francisco, CA | New York City, NY | Seattle, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>7b2b97d5-0a1</externalid>
      <Title>Software Engineer, Inference Deployment</Title>
      <Description><![CDATA[<p><strong>About Anthropic</strong></p>
<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>
<p><strong>About the Role</strong></p>
<p>Our mandate is to make inference deployment boring and unattended.</p>
<p>Anthropic serves Claude to millions of users across GPUs, TPUs, and Trainium — and every model update must reach production safely, quickly, and without disrupting service. We&#39;re building the systems that make inference deployment continuous and unattended.</p>
<p>As a Software Engineer on the Launch Engineering team, you&#39;ll design and build the deployment infrastructure that moves inference code from merge to production. This is a resource-constrained optimization problem at its core: validation and deployment consume the same accelerator chips that serve customer traffic — your deploys compete with live user requests for the same hardware. Every model brings different fleet sizes, startup times, and correctness requirements, so the system must adapt continuously. You&#39;ll build systems that navigate these constraints — orchestrating validation, scheduling deployments intelligently, and driving down cycle time from merge to production.</p>
<p>If you&#39;ve built deployment systems at scale and gravitate toward the hardest problems at the intersection of automation and resource management, this team will give you an outsized scope to work on them.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li><strong>Own deployment orchestration</strong> that continuously moves validated inference builds into production across GPU, TPU, and Trainium fleets, unattended under normal conditions</li>
<li><strong>Improve capacity-aware deployment scheduling</strong> to maximize deployment throughput against constrained accelerator budgets and variable fleet sizes</li>
<li><strong>Extend deployment observability</strong> — dashboards and tooling that answer &quot;what code is running in production,&quot; &quot;where is my commit,&quot; and &quot;what validation passed for this deploy&quot;</li>
<li><strong>Drive down cycle time</strong> from code merge to production with pipeline architectures that minimize serial dependencies and maximize parallelism</li>
<li><strong>Optimize fleet rollout strategies</strong> for large-scale deployments across thousands of GPU, TPU, and Trainium chips, minimizing disruption to serving capacity</li>
<li><strong>Evolve self-service model onboarding</strong> so that new models can be added to the continuous deployment pipeline without Launch Engineering involvement</li>
<li><strong>Partner across the Inference organization</strong> with teams owning validation, autoscaling, and model routing to integrate deployment automation with their systems</li>
</ul>
<p><strong>You May Be a Good Fit If You Have</strong></p>
<ul>
<li>5+ years of experience building deployment, release, or delivery infrastructure at scale</li>
<li>Strong software engineering skills with experience designing systems that manage complex state machines and multi-stage pipelines</li>
<li>Experience with deployment systems where resource constraints shape the design — whether that&#39;s fleet capacity, network bandwidth, hardware availability, or coordinated rollout windows</li>
<li>A track record of building automation that measurably improves deployment velocity and reliability</li>
<li>Proficiency with Kubernetes-based deployments, rolling update mechanics, and container orchestration</li>
<li>Comfort working across the stack — from backend services and databases to CLI tools and web UIs</li>
<li>Strong communication skills and the ability to work closely with oncall engineers, model teams, and infrastructure partners</li>
</ul>
<p><strong>Strong Candidates May Also Have</strong></p>
<ul>
<li>Experience with ML inference or training infrastructure deployment, particularly across multiple accelerator types (GPU, TPU, Trainium)</li>
<li>Background in capacity planning or resource-constrained scheduling (e.g., bin-packing, fleet management, job scheduling with hardware affinity)</li>
<li>Experience with progressive delivery in systems with long validation cycles: canary/soak testing, blue-green deployments, traffic shifting, automated rollback</li>
<li>Experience at companies with large-scale release engineering challenges (mobile release trains, monorepo deployments, multi-datacenter rollouts)</li>
<li>Experience with Python and/or Rust in production systems</li>
</ul>
<p><strong>Logistics</strong></p>
<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience.</p>
<p><strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>
<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>
<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong> Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$320,000 - $485,000 USD</Salaryrange>
      <Skills>deployment, release, delivery, infrastructure, Kubernetes, container orchestration, state machines, multi-stage pipelines, parallelism, optimization, resource management, automation, velocity, reliability, communication, collaboration, oncall, model teams, infrastructure partners, ML inference, training infrastructure, capacity planning, resource-constrained scheduling, bin-packing, fleet management, job scheduling, hardware affinity, progressive delivery, canary/soak testing, blue-green deployments, traffic shifting, automated rollback, mobile release trains, monorepo deployments, multi-datacenter rollouts, Python, Rust</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic&apos;s mission is to create reliable, interpretable, and steerable AI systems. The company is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5111745008</Applyto>
      <Location>San Francisco, CA | New York City, NY | Seattle, WA</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>11a60d5a-f54</externalid>
      <Title>Performance Engineer, GPU</Title>
      <Description><![CDATA[<p><strong>About the role:</strong></p>
<p>Pioneering the next generation of AI requires breakthrough innovations in GPU performance and systems engineering. As a GPU Performance Engineer, you&#39;ll architect and implement the foundational systems that power Claude and push the frontiers of what&#39;s possible with large language models. You&#39;ll be responsible for maximizing GPU utilization and performance at unprecedented scale, developing cutting-edge optimizations that directly enable new model capabilities and dramatically improve inference efficiency.</p>
<p>Working at the intersection of hardware and software, you&#39;ll implement state-of-the-art techniques from custom kernel development to distributed system architectures. Your work will span the entire stack—from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization.</p>
<p>Strong candidates will have a track record of delivering transformative GPU performance improvements in production ML systems and will be excited to shape the future of AI infrastructure alongside world-class researchers and engineers.</p>
<p><strong>You might be a good fit if you:</strong></p>
<ul>
<li>Have deep experience with GPU programming and optimization at scale</li>
<li>Are impact-driven, passionate about delivering measurable performance breakthroughs</li>
<li>Can navigate complex systems from hardware interfaces to high-level ML frameworks</li>
<li>Enjoy collaborative problem-solving and pair programming</li>
<li>Want to work on state-of-the-art language models with real-world impact</li>
<li>Care about the societal impacts of your work</li>
<li>Thrive in ambiguous environments where you define the path forward</li>
</ul>
<p><strong>Strong candidates may also have experience with:</strong></p>
<ul>
<li>GPU Kernel Development: CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization</li>
<li>ML Compilers &amp; Frameworks: PyTorch/JAX internals, torch.compile, XLA, custom operators</li>
<li>Performance Engineering: Kernel fusion, memory bandwidth optimization, profiling with Nsight</li>
<li>Distributed Systems: NCCL, NVLink, collective communication, model parallelism</li>
<li>Low-Precision: INT8/FP8 quantization, mixed-precision techniques</li>
<li>Production Systems: Large-scale training infrastructure, fault tolerance, cluster orchestration</li>
</ul>
<p><strong>Representative projects:</strong></p>
<ul>
<li>Co-design attention mechanisms and algorithms for next-generation hardware architectures</li>
<li>Develop custom kernels for emerging quantization formats and mixed-precision techniques</li>
<li>Design distributed communication strategies for multi-node GPU clusters</li>
<li>Optimize end-to-end training and inference pipelines for frontier language models</li>
<li>Build performance modeling frameworks to predict and optimize GPU utilization</li>
<li>Implement kernel fusion strategies to minimize memory bandwidth bottlenecks</li>
<li>Create resilient systems for planet-scale distributed training infrastructure</li>
<li>Profile and eliminate performance bottlenecks in production serving infrastructure</li>
<li>Partner with hardware vendors to influence future accelerator capabilities and software stacks</li>
</ul>
<p><strong>Deadline to apply:</strong> None. Applications will be reviewed on a rolling basis.</p>
<p>The expected salary range for this position is:</p>
<p>Annual Salary: $280,000 - $850,000 USD</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$280,000 - $850,000 USD</Salaryrange>
      <Skills>GPU programming, optimization at scale, custom kernel development, distributed system architectures, low-level tensor core optimizations, orchestrating thousands of GPUs, GPU kernel development, CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization, ML compilers &amp; frameworks, PyTorch/JAX internals, torch.compile, XLA, custom operators, performance engineering, kernel fusion, memory bandwidth optimization, profiling with Nsight, distributed systems, NCCL, NVLink, collective communication, model parallelism, low-precision, INT8/FP8 quantization, mixed-precision techniques, production systems, large-scale training infrastructure, fault tolerance, cluster orchestration</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic&apos;s mission is to create reliable, interpretable, and steerable AI systems. The company is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4926227008</Applyto>
      <Location>San Francisco, CA | New York City, NY | Seattle, WA</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>d3a39f4c-d95</externalid>
      <Title>Software Engineer, Inference - Multi Modal</Title>
      <Description><![CDATA[<p><strong>Software Engineer, Inference - Multi Modal</strong></p>
<p><strong>Location</strong></p>
<p>San Francisco</p>
<p><strong>Employment Type</strong></p>
<p>Full time</p>
<p><strong>Department</strong></p>
<p>Scaling</p>
<p><strong>Compensation</strong></p>
<ul>
<li>$295K – $555K • Offers Equity</li>
</ul>
<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>
<ul>
<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>
</ul>
<ul>
<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>
</ul>
<ul>
<li>401(k) retirement plan with employer match</li>
</ul>
<ul>
<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>
</ul>
<ul>
<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>
</ul>
<ul>
<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>
</ul>
<ul>
<li>Mental health and wellness support</li>
</ul>
<ul>
<li>Employer-paid basic life and disability coverage</li>
</ul>
<ul>
<li>Annual learning and development stipend to fuel your professional growth</li>
</ul>
<ul>
<li>Daily meals in our offices, and meal delivery credits as eligible</li>
</ul>
<ul>
<li>Relocation support for eligible employees</li>
</ul>
<ul>
<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>
</ul>
<p>More details about our benefits are available to candidates during the hiring process.</p>
<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>
<p><strong>About the Team</strong></p>
<p>OpenAI’s Inference team powers the deployment of our most advanced models - including our GPT models, 4o Image Generation, and Whisper - across a variety of platforms. Our work ensures these models are available, performant, and scalable in production, and we partner closely with Research to bring the next generation of models into the world. We&#39;re a small, fast-moving team of engineers focused on delivering a world-class developer experience while pushing the boundaries of what AI can do.</p>
<p>We’re expanding into multimodal inference, building the infrastructure needed to serve models that handle image, audio, and other non-text modalities. These workloads are inherently more heterogeneous and experimental, involving diverse model sizes and interactions, more complex input/output formats, and tighter coordination with product and research.</p>
<p><strong>About the Role</strong></p>
<p>We’re looking for a software engineer to help us serve OpenAI’s multimodal models at scale. You’ll be part of a small team responsible for building reliable, high-performance infrastructure for serving real-time audio, image, and other multimodal workloads in production.</p>
<p>This work is inherently cross-functional: you’ll collaborate directly with researchers training these models and with product teams defining new modalities of interaction. You&#39;ll build and optimize the systems that let users generate speech, understand images, and interact with models in ways far beyond text.</p>
<p><strong>In this role, you will:</strong></p>
<ul>
<li>Design and implement inference infrastructure for large-scale multimodal models.</li>
</ul>
<ul>
<li>Optimize systems for high-throughput, low-latency delivery of image and audio inputs and outputs.</li>
</ul>
<ul>
<li>Enable experimental research workflows to transition into reliable production services.</li>
</ul>
<ul>
<li>Collaborate closely with researchers, infra teams, and product engineers to deploy state-of-the-art capabilities.</li>
</ul>
<ul>
<li>Contribute to system-level improvements including GPU utilization, tensor parallelism, and hardware abstraction layers.</li>
</ul>
<p><strong>You might thrive in this role if you:</strong></p>
<ul>
<li>Have experience building and scaling inference systems for LLMs or multimodal models.</li>
</ul>
<ul>
<li>Have worked with GPU-based ML workloads and understand the performance dynamics of large models, especially with complex data like images or audio.</li>
</ul>
<ul>
<li>Enjoy experimental, fast-evolving work and collaborating closely with research.</li>
</ul>
<ul>
<li>Are comfortable dealing with systems that span networking, distributed compute, and high-throughput data handling.</li>
</ul>
<ul>
<li>Have familiarity with inference tooling like vLLM, TensorRT-LLM, or custom model parallel systems.</li>
</ul>
<ul>
<li>Own problems end-to-end and are excited to operate in ambiguous, fast-moving spaces.</li>
</ul>
<p><strong>Nice to Have:</strong></p>
<ul>
<li>Experience working with image generation or audio synthesis models in production.</li>
</ul>
<ul>
<li>Exposure to distributed ML training or system-efficient model design.</li>
</ul>
<p><strong>About OpenAI</strong></p>
<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$295K – $555K • Offers Equity</Salaryrange>
      <Skills>Software Engineer, Inference Infrastructure, GPU-based ML Workloads, Tensor Parallelism, Hardware Abstraction Layers, vLLM, TensorRT-LLM, Custom Model Parallel Systems, Image Generation, Audio Synthesis, Distributed ML Training, System-Efficient Model Design</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>OpenAI</Employername>
      <Employerlogo>https://logos.yubhub.co/openai.com.png</Employerlogo>
      <Employerdescription>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/openai/4d14449e-5e7f-45d4-b103-8776a6c87086</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
  </jobs>
</source>