{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/blackwell"},"x-facet":{"type":"skill","slug":"blackwell","display":"Blackwell","count":2},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_854e95b5-76b"},"title":"Sr. Director of Product, Research and Training Infrastructure","description":"<p>CoreWeave is seeking a visionary Sr. Director of Product, Research Training Infrastructure to lead the product strategy and engineering execution for the services that power the most ambitious AI research labs in the world.</p>\n<p>This executive leader will own the product strategy and engineering execution for the Research Training Stack, focusing on the specialized orchestration, evaluation, and iteration tools required for massive-scale pre-training and post-training.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Frontier Orchestration: Oversee the evolution of SUNK (Slurm on Kubernetes) to provide researchers with deterministic, bare-metal performance through a cloud-native interface.</li>\n</ul>\n<ul>\n<li>Holistic Training Services: Drive the development of next-generation orchestrators and automated training-based evaluation frameworks that ensure model quality throughout the lifecycle.</li>\n</ul>\n<ul>\n<li>Post-Training Excellence: Build the infrastructure required for sophisticated Reinforcement Learning (RL) and RLHF pipelines, enabling labs to refine foundation 
models with maximum efficiency.</li>\n</ul>\n<ul>\n<li>Customer Advocacy: Act as the primary technical partner for lead researchers at global AI labs, translating their &#39;future-state&#39; requirements into actionable product roadmaps.</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>Proven engineering leadership experience, with 5+ years managing large-scale infrastructure at a top-tier research lab or an AI-native cloud provider.</li>\n</ul>\n<ul>\n<li>Deep, hands-on knowledge of Slurm, Kubernetes, and the specific networking requirements (InfiniBand/RDMA) for distributed training clusters.</li>\n</ul>\n<ul>\n<li>Research mindset and understanding of the &#39;pain points&#39; of a research scientist.</li>\n</ul>\n<ul>\n<li>Scaling experience delivering mission-critical services on multi-thousand-GPU clusters (H100/Blackwell/Rubin architectures).</li>\n</ul>\n<ul>\n<li>Strategic vision to define &#39;what&#39;s next&#39; in the AI stack, from automated RL loops to specialized sandbox environments.</li>\n</ul>\n<p>Why CoreWeave?</p>\n<p>In 2026, CoreWeave is the foundation of the largest infrastructure buildout in human history. 
We are building AI Factories, not just data centers.</p>\n<ul>\n<li>Silicon-Up Innovation: Work directly with the latest NVIDIA architectures.</li>\n</ul>\n<ul>\n<li>Impact: You will be the architect of the environment that enables the next new discovery.</li>\n</ul>\n<ul>\n<li>Velocity: We move at the speed of the researchers we support, bypassing legacy cloud bottlenecks to deliver raw power.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_854e95b5-76b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4665964006","x-work-arrangement":"hybrid","x-experience-level":"executive","x-job-type":"full-time","x-salary-range":"$233,000 to $341,000","x-skills-required":["Slurm","Kubernetes","InfiniBand/RDMA","Distributed training clusters","GPU clusters","H100/Blackwell/Rubin architectures","Reinforcement Learning (RL)","RLHF pipelines"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:50:11.130Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Slurm, Kubernetes, InfiniBand/RDMA, Distributed training clusters, GPU clusters, H100/Blackwell/Rubin architectures, Reinforcement Learning (RL), RLHF pipelines","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":233000,"maxValue":341000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5d37a7c7-d2a"},"title":"ML Infrastructure Engineer","description":"<p><strong>About the 
role</strong></p>\n<p>The ML Infrastructure team at Cursor builds large-scale compute, storage, and software infrastructure to support the company&#39;s work building the world&#39;s best agentic coding model. We&#39;re looking for strong engineers who are interested in building high-performance infrastructure and the software to support it. This role works closely with ML researchers and engineers to enable their work through improvements to our training framework, systems reliability/performance, and developer experience.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<ul>\n<li>Collaborate with ML researchers to improve the throughput and reliability of training</li>\n<li>Work with OEMs, cloud service providers, and others to plan and build cutting-edge GPU infrastructure</li>\n<li>Improve the density and scalability of compute environments to enable increasingly large RL workloads</li>\n<li>Create software and systems to automate building, monitoring, and running GPU clusters</li>\n<li>Build workload scheduling and data movement systems to support Cursor&#39;s growing training footprint</li>\n</ul>\n<p><strong>You may be a fit if</strong></p>\n<ul>\n<li>You have a strong background in systems and infrastructure-focused software engineering, particularly in Python, TypeScript, Rust, and Golang</li>\n<li>You have experience with distributed storage and networking infrastructure, particularly on Linux systems across cloud and bare metal environments</li>\n<li>You have exposure to large-scale systems and their unique challenges, ideally across thousands of nodes with significant resource footprints</li>\n</ul>\n<p><strong>Nice to have</strong></p>\n<ul>\n<li>Operational exposure to NVIDIA GPUs with InfiniBand or RoCE, particularly with Blackwell and Hopper-class hardware</li>\n<li>Exposure to Ray, Slurm, or other common compute and runtime schedulers</li>\n</ul>\n
<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_5d37a7c7-d2a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Cursor","sameAs":"https://cursor.com","logo":"https://logos.yubhub.co/cursor.com.png"},"x-apply-url":"https://cursor.com/careers/software-engineer-ml-infrastructure","x-work-arrangement":"remote","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Python","Typescript","Rust","Golang","Distributed storage","Networking infrastructure","Linux systems","Kubernetes"],"x-skills-preferred":["Nvidia GPUs","Infiniband","RoCE","Blackwell","Hopper-class hardware","Ray","Slurm"],"datePosted":"2026-03-08T00:17:18.553Z","jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Typescript, Rust, Golang, Distributed storage, Networking infrastructure, Linux systems, Kubernetes, Nvidia GPUs, Infiniband, RoCE, Blackwell, Hopper-class hardware, Ray, Slurm"}]}