<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>f2196e99-854</externalid>
      <Title>Software Engineer - GenAI inference</Title>
      <Description><![CDATA[<p>As a software engineer for GenAI inference, you will help design, develop, and optimize the inference engine that powers Databricks&#39; Foundation Model API. You&#39;ll work at the intersection of research and production, ensuring our large language model (LLM) serving systems are fast, scalable, and efficient.</p>
<p>Your work will span the full GenAI inference stack, from kernels and runtimes to orchestration and memory management. You will contribute to the design and implementation of the inference engine and collaborate on a model-serving stack optimized for large-scale LLM inference.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Collaborating with researchers to bring new model architectures or features (sparsity, activation compression, mixture-of-experts) into the engine</li>
<li>Optimizing for latency, throughput, memory efficiency, and hardware utilization across GPUs and other accelerators</li>
<li>Building and maintaining instrumentation, profiling, and tracing tooling to uncover bottlenecks and guide optimizations</li>
<li>Developing and enhancing scalable routing, batching, scheduling, memory management, and dynamic loading mechanisms for inference workloads</li>
<li>Supporting reliability, reproducibility, and fault tolerance in the inference pipelines, including A/B launches, rollback, and model versioning</li>
<li>Integrating with federated, distributed inference infrastructure: orchestrating across nodes, balancing load, and handling communication overhead</li>
<li>Collaborating cross-functionally with platform engineering, cloud infrastructure, and security/compliance teams</li>
<li>Documenting and sharing learnings, contributing to internal best practices and open-source efforts when possible</li>
</ul>
<p>Requirements include:</p>
<ul>
<li>BS/MS/PhD in Computer Science or a related field</li>
<li>Strong software engineering background (3+ years or equivalent) in performance-critical systems</li>
<li>Solid understanding of ML inference internals: attention, MLPs, recurrent modules, quantization, sparse operations, etc.</li>
<li>Hands-on experience with CUDA, GPU programming, and key libraries (cuBLAS, cuDNN, NCCL, etc.)</li>
<li>Comfortable designing and operating distributed systems, including RPC frameworks, queuing, RPC batching, sharding, and memory partitioning</li>
<li>Demonstrated ability to uncover and solve performance bottlenecks across layers (kernel, memory, networking, scheduler)</li>
<li>Experience building instrumentation, tracing, and profiling tools for ML models</li>
<li>Ability to work closely with ML researchers and translate novel model ideas into production systems</li>
<li>Ownership mindset and eagerness to dive deep into complex system challenges</li>
<li>Bonus: published research or open-source contributions in ML systems, inference optimization, or model serving</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$142,200-$204,600 USD</Salaryrange>
      <Skills>software engineering, performance-critical systems, ML inference internals, CUDA, GPU programming, distributed systems, RPC frameworks, queuing, RPC batching, sharding, memory partitioning, instrumentation, tracing, profiling tools, ML researchers, complex system challenges</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Databricks</Employername>
      <Employerlogo>https://logos.yubhub.co/databricks.com.png</Employerlogo>
      <Employerdescription>Databricks is a data and AI company that provides a unified platform for data, analytics, and AI. It was founded by the original creators of Lakehouse, Apache Spark, Delta Lake, and MLflow.</Employerdescription>
      <Employerwebsite>https://databricks.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/databricks/jobs/8202670002</Applyto>
      <Location>San Francisco, California</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
  </jobs>
</source>