Senior Software Engineer II, Inference

We're seeking a senior software engineer to join our team and lead the design and development of our Kubernetes-native inference platform. As a senior engineer, you will be responsible for leading design reviews, driving architecture, and ensuring the reliability and scalability of our platform.

Key responsibilities include:

Leading design reviews and driving architecture within the team
Defining and owning SLIs/SLOs and ensuring post-incident actions land and reliability improves release-over-release
Implementing advanced optimizations such as micro-batch schedulers, speculative decoding, and KV-cache reuse
Strengthening incident posture through capacity planning, autoscaling policy, and rollback/traffic-shift strategies
Mentoring IC1/IC2 engineers and reviewing cross-team designs to elevate coding/testing standards

We're looking for someone with strong coding skills in Python or Go, deep familiarity with networked systems and performance, and hands-on experience with Kubernetes at production scale. If you have experience with inference internals, batching, caching, mixed precision, and streaming token delivery, that's a plus.

In addition to a competitive salary, we offer a range of benefits including medical, dental, and vision insurance, company-paid life insurance, and flexible PTO. We're committed to creating a work environment that's inclusive, diverse, and supportive of our employees' well-being.

XML job scraping automation by YubHub

full-time
senior
hybrid
$165,000 to $242,000
Python, Go, Kubernetes, Networked systems, Performance, Inference internals, Batching, Caching, Mixed precision, Streaming token delivery, CUDA kernels, NCCL/SHARP, RDMA/NUMA, GPU interconnect topologies, Contributions to inference frameworks, Experience with multi-team initiatives
Engineering Technology
CoreWeave
https://logos.yubhub.co/coreweave.com.png
CoreWeave is a cloud computing company that provides a platform for building and scaling AI applications.
https://www.coreweave.com
https://job-boards.greenhouse.io/coreweave/jobs/4604832006
Sunnyvale, CA / Bellevue, WA
2026-04-18

Senior Software Engineer I, Inference

We're looking for a Senior Software Engineer I to join our team. As a senior engineer, you'll lead designs, raise engineering standards, and deliver measurable improvements to latency, throughput, and reliability across multiple services. You'll partner with product, orchestration, and hardware teams to evolve our Kubernetes-native inference platform and meet strict P99 SLAs at scale.

Key responsibilities include:

Lead design reviews and drive architecture within the team; decompose multi-service work into clear milestones.
Define and own SLIs/SLOs; ensure post-incident actions land and reliability improves release-over-release.
Implement advanced optimizations (e.g., micro-batch schedulers, speculative decoding, KV-cache reuse) and quantify impact.
Strengthen incident posture: capacity planning, autoscaling policy, graceful degradation, rollback/traffic-shift strategies.
Mentor IC1/IC2 engineers; review cross-team designs and elevate coding/testing standards.

Requirements include:

3-5 years of industry experience building distributed systems or cloud services.
Strong coding in Python or Go (C++ a plus) and deep familiarity with networked systems and performance.
Hands-on experience with Kubernetes at production scale, CI/CD, and observability stacks (Prometheus, Grafana, OpenTelemetry).
Practical knowledge of inference internals: batching, caching, mixed precision (BF16/FP8), streaming token delivery.
Proven track record improving tail latency (P95/P99) and service reliability through metrics-driven work.

Preferred qualifications include contributions to inference frameworks, experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies, and leading multi-team initiatives or partnering with customers on mission-critical launches.

full-time
senior
hybrid
$139,000 to $204,000
Python, Go, Kubernetes, CI/CD, Observability stacks, Inference internals, Batching, Caching, Mixed precision, Streaming token delivery, Contributions to inference frameworks, CUDA kernels, NCCL/SHARP, RDMA/NUMA, GPU interconnect topologies
Engineering Technology
CoreWeave
https://logos.yubhub.co/coreweave.com.png
CoreWeave is a cloud computing company that provides a platform for building and scaling AI. It was founded in 2017 and became a publicly traded company in March 2025.
https://www.coreweave.com
https://job-boards.greenhouse.io/coreweave/jobs/4647603006
Sunnyvale, CA / Bellevue, WA
2026-04-18

Senior Engineering Manager, AI Runtime

At Databricks, we are committed to enabling data teams to solve the world's toughest problems. As a Senior Engineering Manager, you will lead the team owning both the product experience and the foundational infrastructure of our AI Runtime (AIR) product.

You will be responsible for shaping customer-facing capabilities while designing for scalability, extensibility, and performance of GPU training and adjacent areas. This will involve collaborating closely across the platform, product, infrastructure, and research organisations.

Key responsibilities include:

Leading, mentoring, and growing a high-performing engineering team responsible for the Custom Training product and its foundational infrastructure
Defining and owning the product and technical roadmap for AIR, balancing customer experience, functionality, and foundational investments
Collaborating closely with product, research, platform, infrastructure teams, and customers to drive end-to-end delivery
Driving architectural decisions and product design for managed GPU training at scale
Advocating for customer needs through direct engagement, ensuring engineering decisions translate to clear product impact

We are looking for someone with 8+ years of software engineering experience, with 3+ years in engineering management. You should have a track record of building and operating managed GPU training infrastructure at scale, as well as deep familiarity with distributed training frameworks and parallelism strategies.

In addition, you should have experience with training resilience patterns, such as checkpointing, elastic training, and automated failure recovery for long-running jobs. You should also have a strong understanding of GPU performance fundamentals, including NCCL, interconnect topologies, and memory optimisation.

Experience building platform products with clear SLAs is essential, as are strong cross-functional leadership across platform, product, and research teams and excellent collaboration and communication skills.

The pay range for this role is $228,600-$314,250 USD per year, depending on location. The total compensation package may also include eligibility for annual performance bonus, equity, and benefits.

full-time
senior
onsite
$228,600-$314,250 USD per year
software engineering, engineering management, distributed training frameworks, parallelism strategies, GPU training infrastructure, checkpointing, elastic training, automated failure recovery, GPU performance fundamentals, NCCL, interconnect topologies, memory optimisation
Engineering Technology
Databricks
https://logos.yubhub.co/databricks.com.png
Databricks is a data and AI company that provides a unified platform for data, analytics, and AI. It was founded by the original creators of Lakehouse, Apache Spark, Delta Lake, and MLflow.
https://databricks.com
https://job-boards.greenhouse.io/databricks/jobs/8490282002
Mountain View, California; San Francisco, California
2026-04-18