{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/deep-learning-infrastructure"},"x-facet":{"type":"skill","slug":"deep-learning-infrastructure","display":"Deep Learning Infrastructure","count":2},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_6821b43f-4f8"},"title":"Senior Software Engineer, RL Post-Training Frameworks","description":"<p>Reinforcement learning post-training is driving some of the most significant capability gains in AI today. It is the process that teaches a model to reason through hard problems, follow complex instructions, and act as an autonomous agent. We are building an RL Frameworks engineering team to develop the open-source tools and infrastructure that AI researchers and post-training teams depend on.</p>\n<p>As a Senior Software Engineer on our team, you will architect and build RL post-training infrastructure that scales efficiently from experimentation on a single GPU to production across thousands of nodes. This means tuning RL training-inference-rollout loops on GPUs, CPUs, and LPUs for performance where it matters, contributing to and improving the performance and usability of open-source RL frameworks, and partnering with the teams who own them.</p>\n<p>The role also spans fault tolerance, elastic scaling, and fast restarts so long-running distributed training jobs survive failures, stragglers, and resource contention. Beyond GPU-accelerated training, this work includes partnering with teams building CPU-driven rollout workloads, including tool-use, code execution, and agentic environments, supplying the systems and framework engineering needed to run them efficiently alongside GPU- or LPU-accelerated generation and GPU-accelerated training.</p>\n<p>We are looking for a highly skilled engineer with experience in distributed systems, high-performance computing, deep learning infrastructure, or ML systems engineering. You should have strong proficiency in Python and C/C++, and demonstrated experience building or contributing to large-scale distributed systems or runtime frameworks in production at a frontier AI lab, hyperscaler, or major technology company.</p>\n<p>In addition to the core responsibilities, you will have the opportunity to work on various technical areas such as reinforcement learning for LLM post-training, PyTorch internals, Kubernetes runtime internals, and end-to-end distributed systems design. 
You will also have the chance to contribute to open-source projects and participate in the development of new technologies and tools.</p>\n<p>If you are a motivated and experienced engineer who is passionate about building scalable and efficient systems, we encourage you to apply for this exciting opportunity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_6821b43f-4f8","directApply":true,"hiringOrganization":{"@type":"Organization","name":"NVIDIA","sameAs":"https://www.nvidia.com","logo":"https://logos.yubhub.co/nvidia.com.png"},"x-apply-url":"https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Senior-Software-Engineer--RL-Post-Training-Frameworks_JR2015863","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$150,000 - $250,000 per year","x-skills-required":["Python","C/C++","Distributed Systems","High-Performance Computing","Deep Learning Infrastructure","ML Systems Engineering"],"x-skills-preferred":["Reinforcement Learning","PyTorch","Kubernetes","End-to-End Distributed Systems Design"],"datePosted":"2026-04-24T12:14:17.716Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Santa Clara"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, C/C++, Distributed Systems, High-Performance Computing, Deep Learning Infrastructure, ML Systems Engineering, Reinforcement Learning, PyTorch, Kubernetes, End-to-End Distributed Systems Design","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":150000,"maxValue":250000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_07a3c83e-51e"},"title":"Research Engineer, Infrastructure, Numerics","description":"<p>We&#39;re looking for an infrastructure research engineer to design and build the core systems that enable efficient large-scale model training with a focus on numerics. 
{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_07a3c83e-51e"},"title":"Research Engineer, Infrastructure, Numerics","description":"<p>We&#39;re looking for an infrastructure research engineer to design and build the core systems that enable efficient large-scale model training with a focus on numerics. You will improve the numerical foundations of our distributed training stack, from precision formats and kernel optimizations to communication frameworks that make training trillion-parameter models stable, scalable, and fast.</p>\n<p>This role is ideal for someone who thrives at the intersection of research and systems engineering: a builder who understands both the math of optimization and the realities of distributed compute.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and optimize distributed training infrastructure for large-scale LLMs, focusing on performance, stability, and reproducibility across multi-GPU and multi-node setups.</li>\n<li>Implement and evaluate low-precision numerics (for example, BF16, MXFP8, NVFP4) to improve efficiency without sacrificing model quality.</li>\n<li>Develop kernels and communication primitives that use hardware-level support for mixed and low-precision arithmetic.</li>\n<li>Collaborate with research teams to co-design model architectures and training recipes that align with emerging numeric formats and stability constraints.</li>\n<li>Prototype and benchmark scaling strategies such as data, tensor, and pipeline parallelism that integrate precision-adaptive computation and quantized communication.</li>\n<li>Contribute to the design of our internal orchestration and monitoring systems to ensure that thousands of distributed experiments can run efficiently and reproducibly.</li>\n<li>Publish and share learnings through internal documentation, open-source libraries, or technical reports that advance the field of scalable AI infrastructure.</li>\n</ul>\n<p>Skills and Qualifications:</p>\n<p>Minimum qualifications:</p>\n<ul>\n<li>Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar.</li>\n<li>Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.</li>\n<li>Thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts.</li>\n<li>A bias for action: you take initiative across different stacks and teams wherever you spot an opportunity, and you make sure things ship.</li>\n<li>Strong engineering skills: the ability to contribute performant, maintainable code and to debug complex codebases in areas such as floating-point numerics, low-precision arithmetic, and distributed systems.</li>\n</ul>\n<p>Preferred qualifications (we encourage you to apply if you meet some but not all of these):</p>\n<ul>\n<li>Familiarity with distributed frameworks such as PyTorch/XLA, DeepSpeed, Megatron-LM.</li>\n<li>Experience implementing FP8, INT8, or block floating-point (MX) formats and understanding their numerical trade-offs.</li>\n<li>Prior contributions to open-source deep learning infrastructure such as PyTorch, DeepSpeed, or XLA.</li>\n<li>Publications, patents, or projects related to numerical optimization, communication-efficient training, or systems for large models.</li>\n<li>Experience training and supporting large-scale AI models.</li>\n<li>Track record of improving research productivity through infrastructure design or process improvements.</li>\n</ul>\n<p>Logistics:</p>\n<ul>\n<li>Location: This role is based in San Francisco, California.</li>\n<li>Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $350,000 - $475,000 USD.</li>\n<li>Visa sponsorship: We sponsor visas. While we can&#39;t guarantee success for every candidate or role, if you&#39;re the right fit, we&#39;re committed to working through the visa process together.</li>\n<li>Benefits: Thinking Machines offers generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.</li>\n</ul>","url":"https://yubhub.co/jobs/job_07a3c83e-51e","directApply":true,
"hiringOrganization":{"@type":"Organization","name":"Thinking Machines Lab","sameAs":"https://thinkingmachines.ai/","logo":"https://logos.yubhub.co/thinkingmachines.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/thinkingmachines/jobs/5013937008","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000 - $475,000 USD","x-skills-required":["Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar","Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures","Thriving in a highly collaborative environment involving many different cross-functional partners and subject matter experts","Strong engineering skills, ability to contribute performant, maintainable code and debug in complex codebases in areas such as floating-point numerics, low-precision arithmetic, and distributed systems","Familiarity with distributed frameworks such as PyTorch/XLA, DeepSpeed, Megatron-LM"],"x-skills-preferred":["Experience implementing FP8, INT8, or block floating-point (MX) formats and understanding their numerical trade-offs","Prior contributions to open-source deep learning infrastructure such as PyTorch, DeepSpeed, or XLA","Publications, patents, or projects related to numerical optimization, communication-efficient training, or systems for large models","Experience training and supporting large-scale AI models","Track record of improving research productivity through infrastructure design or process improvements"],"datePosted":"2026-04-18T15:56:14.922Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar, Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures, Thriving in a highly collaborative environment involving many different cross-functional partners and subject matter experts, Strong engineering skills, ability to contribute performant, maintainable code and debug in complex codebases in areas such as floating-point numerics, low-precision arithmetic, and distributed systems, Familiarity with distributed frameworks such as PyTorch/XLA, DeepSpeed, Megatron-LM, Experience implementing FP8, INT8, or block floating-point (MX) formats and understanding their numerical trade-offs, Prior contributions to open-source deep learning infrastructure such as PyTorch, DeepSpeed, or XLA, Publications, patents, or projects related to numerical optimization, communication-efficient training, or systems for large models, Experience training and supporting large-scale AI models, Track record of improving research productivity through infrastructure design or process improvements","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":475000,"unitText":"YEAR"}}}]}