{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/custom-storage-engine-development"},"x-facet":{"type":"skill","slug":"custom-storage-engine-development","display":"Custom Storage Engine Development","count":1},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ff4d3a91-b20"},"title":"Principal Engineer - Perf and Benchmarking","description":"<p>We&#39;re looking for a Principal Engineer to be the technical lead of CoreWeave&#39;s Benchmarking &amp; Performance team. You will be responsible for our planet-scale performance data warehouse: ingesting, storing, transforming and analyzing performance events from every data center across our global infrastructure.</p>\n<p>You will also play an integral part in producing industry-leading, end-to-end performance benchmark publications. If MLPerf (Training &amp; Inference) and close collaboration with NVIDIA (Megatron-LM, TensorRT-LLM &amp; DGX Cloud) and the open-source community (llm-d, vLLM and all popular ML frameworks) speak to you, come help us demonstrate CoreWeave&#39;s leadership in performance and reliability.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Strategy &amp; Leadership - Define the multi-year benchmarking strategy and roadmap; prioritize models/workloads (LLMs, diffusion, vision, speech) and hardware tiers. Build, lead, and mentor a high-performing team of performance engineers and data analysts. 
Establish governance for claims: documented methodologies, versioning, reproducibility, and audit trails.</li>\n</ul>\n<ul>\n<li>Perf Ownership - Lead end-to-end MLPerf Inference and Training submissions: workload selection, cluster planning, runbooks, audits, and result publication. Coordinate optimization tracks with NVIDIA (CUDA, cuDNN, TensorRT/TensorRT-LLM, Triton, NCCL) to hit competitive results; drive upstream fixes where needed.</li>\n</ul>\n<ul>\n<li>Internal Latency &amp; Throughput Benchmarks - Design a Kubernetes-native, repeatable benchmarking service that exercises CoreWeave stacks across SUNK (Slurm on Kubernetes), Kueue, and Kubeflow pipelines. Measure and report p50/p95/p99 latency, jitter, tokens/s, time-to-first-token, cold-start/warm-start, and cost-per-token/request across models, precisions (BF16/FP8/FP4), batch sizes, and GPU types. Maintain a corpus of representative scenarios (streaming, batch, multi-tenant) and data sets; automate comparisons across software releases and hardware generations.</li>\n</ul>\n<ul>\n<li>Tooling &amp; Automation - Build CI/CD pipelines and K8s controllers/operators to schedule benchmarks at scale; integrate with observability stacks (Prometheus, Grafana, OpenTelemetry) and results warehouses. Implement supply-chain integrity for benchmark artifacts (SBOMs, Cosign signatures).</li>\n</ul>\n<ul>\n<li>Cross-functional &amp; Community - Partner with NVIDIA, key ISVs, and OSS projects (vLLM, Triton, KServe, PyTorch/DeepSpeed, ONNX Runtime) to co-develop optimizations and upstream improvements. 
Support Sales/SEs with authoritative numbers for RFPs and competitive evaluations; brief analysts and press with rigorous, defensible data.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>10+ years building distributed systems or HPC/cloud services, with deep expertise in large-scale ML training or similar high-performance workloads.</li>\n</ul>\n<ul>\n<li>Proven track record of architecting or building planet-scale data systems (e.g., telemetry platforms, observability stacks, cloud data warehouses, large-scale OLAP engines).</li>\n</ul>\n<ul>\n<li>Deep understanding of GPU performance (CUDA, NCCL, RDMA, NVLink/PCIe, memory bandwidth), model-server stacks (Triton, vLLM, TensorRT-LLM, TorchServe), and distributed training frameworks (PyTorch FSDP/DeepSpeed/Megatron-LM).</li>\n</ul>\n<ul>\n<li>Proficiency with Kubernetes and ML control planes, and familiarity with SUNK, Kueue, and Kubeflow in production environments.</li>\n</ul>\n<ul>\n<li>Excellent communicator, able to work with executives, customers, auditors, and OSS communities.</li>\n</ul>\n<p><strong>Nice to have</strong></p>\n<ul>\n<li>Experience with time-series databases, log-structured merge trees (LSM), or custom storage engine development.</li>\n</ul>\n<ul>\n<li>Experience running MLPerf submissions (Inference and/or Training) or equivalent audited benchmarks at scale.</li>\n</ul>\n<ul>\n<li>Contributions to MLPerf, Triton, vLLM, PyTorch, KServe, or similar OSS projects.</li>\n</ul>\n<ul>\n<li>Experience benchmarking multi-region fleets and large clusters (thousands of GPUs).</li>\n</ul>\n<ul>\n<li>Publications/talks on ML performance, latency engineering, or large-scale benchmarking methodology.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ff4d3a91-b20","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4627302006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$206,000 to $333,000","x-skills-required":["Distributed systems","HPC/cloud services","Large-scale ML training","GPU performance","Model-server stacks","Distributed training frameworks","Kubernetes","ML control planes","Time-series databases","Log-structured merge trees","Custom storage engine development"],"x-skills-preferred":["MLPerf submissions","Audited benchmarks","Contributions to OSS projects","Benchmarking multi-region fleets","Large clusters","Publications/talks on ML performance"],"datePosted":"2026-04-18T15:51:17.448Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Distributed systems, HPC/cloud services, Large-scale ML training, GPU performance, Model-server stacks, Distributed training frameworks, Kubernetes, ML control planes, Time-series databases, Log-structured merge trees, Custom storage engine development, MLPerf submissions, Audited benchmarks, Contributions to OSS projects, Benchmarking multi-region fleets, Large clusters, Publications/talks on ML performance","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":206000,"maxValue":333000,"unitText":"YEAR"}}}]}