{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/deepspeed"},"x-facet":{"type":"skill","slug":"deepspeed","display":"Deepspeed","count":3},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_07a3c83e-51e"},"title":"Research Engineer, Infrastructure, Numerics","description":"<p>We&#39;re looking for an infrastructure research engineer to design and build the core systems that enable efficient large-scale model training with a focus on numerics. You will focus on improving the numerical foundations of our distributed training stack, from precision formats and kernel optimizations to communication frameworks that make training trillion-parameter models stable, scalable, and fast.</p>\n<p>This role is ideal for someone who thrives at the intersection of research and systems engineering: a builder who understands both the math of optimization and the realities of distributed compute.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and optimize distributed training infrastructure for large-scale LLMs, focusing on performance, stability, and reproducibility across multi-GPU and multi-node setups.</li>\n<li>Implement and evaluate low-precision numerics (for example, BF16, MXFP8, NVFP4) to improve efficiency without sacrificing model quality.</li>\n<li>Develop kernels and communication primitives that use hardware-level support for mixed and low-precision arithmetic.</li>\n<li>Collaborate with research teams to co-design model architectures and training recipes that align with emerging numeric formats and stability constraints.</li>\n<li>Prototype and benchmark scaling strategies such as data, tensor, and pipeline parallelism that integrate precision-adaptive computation and quantized communication.</li>\n<li>Contribute to the design of our internal orchestration and monitoring systems to ensure that thousands of distributed experiments can run efficiently and reproducibly.</li>\n<li>Publish and share learnings through internal documentation, open-source libraries, or technical reports that advance the field of scalable AI infrastructure.</li>\n</ul>\n<p>Skills and Qualifications:</p>\n<p>Minimum qualifications:</p>\n<ul>\n<li>Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar.</li>\n<li>Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.</li>\n<li>Thrive in a highly collaborative environment involving many, different cross-functional partners and subject matter experts.</li>\n<li>A bias for action with a mindset to take initiative to work across different stacks and different teams where you spot the opportunity to make sure something ships.</li>\n<li>Strong engineering skills, ability to contribute performant, maintainable code and debug in complex codebases in areas such as floating-point numerics, low-precision arithmetic, and distributed 
systems.</li>\n</ul>\n<p>Preferred qualifications (we encourage you to apply even if you meet only some of these):</p>\n<ul>\n<li>Familiarity with distributed frameworks such as PyTorch/XLA, DeepSpeed, Megatron-LM.</li>\n<li>Experience implementing FP8, INT8, or block-floating point (MX) formats and understanding their numerical trade-offs.</li>\n<li>Prior contributions to open-source deep learning infrastructure such as PyTorch, DeepSpeed, or XLA.</li>\n<li>Publications, patents, or projects related to numerical optimization, communication-efficient training, or systems for large models.</li>\n<li>Experience training and supporting large-scale AI models.</li>\n<li>Track record of improving research productivity through infrastructure design or process improvements.</li>\n</ul>\n<p>Logistics:</p>\n<ul>\n<li>Location: This role is based in San Francisco, California.</li>\n<li>Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $350,000 - $475,000 USD.</li>\n<li>Visa sponsorship: We sponsor visas. While we can&#39;t guarantee success for every candidate or role, if you&#39;re the right fit, we&#39;re committed to working through the visa process together.</li>\n<li>Benefits: Thinking Machines offers generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.</li>\n</ul>","url":"https://yubhub.co/jobs/job_07a3c83e-51e","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Thinking Machines Lab","sameAs":"https://thinkingmachines.ai/","logo":"https://logos.yubhub.co/thinkingmachines.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/thinkingmachines/jobs/5013937008","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000 - $475,000 USD","x-skills-required":["Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar","Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures","Thriving in a highly collaborative environment involving many different cross-functional partners and subject matter experts","Strong engineering skills, ability to contribute performant, maintainable code and debug complex codebases in areas such as floating-point numerics, low-precision arithmetic, and distributed systems","Familiarity with distributed frameworks such as PyTorch/XLA, DeepSpeed, Megatron-LM"],"x-skills-preferred":["Experience implementing FP8, INT8, or block-floating point (MX) formats and understanding their numerical trade-offs","Prior contributions to open-source deep learning infrastructure such as PyTorch, DeepSpeed, or XLA","Publications, patents, or projects related to numerical optimization, communication-efficient training, or systems for large models","Experience training and supporting large-scale AI models","Track record of improving research productivity through infrastructure design or process improvements"],"datePosted":"2026-04-18T15:56:14.922Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Bachelor’s degree or equivalent experience in computer science, electrical
engineering, statistics, machine learning, physics, robotics, or similar, Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures, Thriving in a highly collaborative environment involving many different cross-functional partners and subject matter experts, Strong engineering skills, ability to contribute performant, maintainable code and debug complex codebases in areas such as floating-point numerics, low-precision arithmetic, and distributed systems, Familiarity with distributed frameworks such as PyTorch/XLA, DeepSpeed, Megatron-LM, Experience implementing FP8, INT8, or block-floating point (MX) formats and understanding their numerical trade-offs, Prior contributions to open-source deep learning infrastructure such as PyTorch, DeepSpeed, or XLA, Publications, patents, or projects related to numerical optimization, communication-efficient training, or systems for large models, Experience training and supporting large-scale AI models, Track record of improving research productivity through infrastructure design or process improvements","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":475000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_50cacac8-b47"},"title":"Research Engineer, Machine Learning","description":"<p><strong>About the Role</strong></p>\n<p>We are seeking a Research Engineer to join our Machine Learning team. As a Research Engineer, you will work on building and optimizing large-scale learning systems that power our open-weight models.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Accelerate researchers by taking on the heavy parts of large-scale ML pipelines and building robust tools.</li>\n<li>Interface cutting-edge research with production: integrate checkpoints, streamline evaluation, and expose APIs.</li>\n<li>Conduct experiments on the latest deep-learning techniques.</li>\n<li>Design, implement, and benchmark ML algorithms; write clear, efficient code in Python.</li>\n<li>Deliver prototypes that become production-grade components for Le Chat and our enterprise API.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Master&#39;s or PhD in Computer Science (or equivalent proven track record).</li>\n<li>4+ years working on large-scale ML codebases.</li>\n<li>Hands-on with PyTorch, JAX or TensorFlow; comfortable with distributed training (DeepSpeed / FSDP / SLURM / K8s).</li>\n<li>Experience in deep learning, NLP or LLMs; bonus for CUDA or data-pipeline chops.</li>\n<li>Strong software-design instincts: testing, code review, CI/CD.</li>\n<li>Self-starter, low-ego, collaborative.</li>\n</ul>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Competitive cash salary and equity.</li>\n<li>Food: Daily lunch vouchers.</li>\n<li>Sport: Monthly contribution to a Gympass subscription.</li>\n<li>Transportation: Monthly contribution to a mobility pass.</li>\n<li>Health: Full health insurance for you and your family.</li>\n<li>Parental: Generous parental leave policy.</li>\n</ul>\n<p>Note: Benefits may vary depending on location.</p>","url":"https://yubhub.co/jobs/job_50cacac8-b47","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral
AI","sameAs":"https://mistral.ai/careers","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/07447e1d-7900-46d4-b61b-186f2f76847f","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["PyTorch","JAX","TensorFlow","DeepSpeed","FSDP","SLURM","K8s","Python","CUDA","data-pipeline"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:47:05.094Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"PyTorch, JAX, TensorFlow, DeepSpeed, FSDP, SLURM, K8s, Python, CUDA, data-pipeline"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2bc207d0-89b"},"title":"Senior Machine Learning Engineer","description":"<p>We are seeking a Senior Machine Learning Research Engineer to join the Machine Learning Science (MLS) team, within the Computational Science department. The ideal candidate has a strong knowledge in designing and building deep learning (DL) pipelines, and expertise in creating reliable, scalable artificial intelligence/machine learning (AI/ML) systems in a cloud environment.</p>\n<p>The MLS team at Freenome develops DL models using massive-scale genomic data that presents significant challenges for current training paradigms. The Senior Machine Learning Research Engineer will primarily be responsible for developing and deploying the infrastructure needed to support development of such DL models: enabling distributed DL pipelines, optimising hardware utilisation for efficient training, and performing model optimisations.</p>\n<p>As part of an interdisciplinary R&amp;D team, they will work in close collaboration with machine learning scientists, computational biologists and software engineers to accelerate the development of state-of-the-art ML/AI models and help Freenome achieve its mission.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Implementing and refining DL pipelines on distributed computing platforms to enhance the speed and efficiency of DL operations, including model training, data handling, model management, and inference.</li>\n<li>Collaborating closely with ML scientists and software engineers to understand current challenges and requirements and ensure that the DL model development pipelines created are perfectly aligned with scientific goals and operational needs.</li>\n<li>Continuously monitoring, evaluating, and optimising DL model training pipelines for performance and scalability.</li>\n<li>Staying up to date with the latest advancements in AI, ML, and related technologies, and quickly learning and adapting new tools and frameworks, if necessary.</li>\n<li>Developing and maintaining robust and reproducible DL pipelines that guarantee that DL pipelines can be reliably executed, maintaining consistency and accuracy of results.</li>\n<li>Driving performance improvements across our stack through profiling, optimisation, and benchmarking. 
Implementing efficient caching solutions and debugging distributed systems to accelerate both training and evaluation pipelines.</li>\n<li>Acting as a bridge facilitating communication between the engineering and scientific teams, documenting and sharing best practices to foster a culture of learning and continuous improvement.</li>\n</ul>\n<p>Must-haves include:</p>\n<ul>\n<li>MS or equivalent experience in a relevant, quantitative field such as Computer Science, Statistics, Mathematics, or Software Engineering, with an emphasis on AI/ML theory and/or practical development.</li>\n<li>5+ years of post-MS industry experience working on developing AI/ML software engineering pipelines.</li>\n<li>Proficiency in a general-purpose programming language: Python (preferred), Java, Julia, C, C++, etc.</li>\n<li>Strong knowledge of ML and DL fundamentals and hands-on experience with machine learning frameworks such as PyTorch, TensorFlow, JAX, or scikit-learn.</li>\n<li>In-depth knowledge of scalable and distributed computing platforms that support complex model training (such as Ray or DeepSpeed) and their integration with ML developer tools like TensorBoard, Wandb, or MLflow.</li>\n<li>Experience with cloud platforms (e.g., AWS, Google Cloud, Azure) and with deploying and managing AI/ML models and pipelines in a cloud environment.</li>\n<li>Understanding of containerisation technologies (e.g., Docker) and computing resource orchestration tools (e.g., Kubernetes) for deploying scalable ML/AI solutions.</li>\n<li>Proven track record of developing and optimising workflows for training DL models, large language models (LLMs), or similar for problems with high data complexity and volume.</li>\n<li>Experience managing large datasets, including data storage (such as HDFS or Parquet on S3), retrieval, and efficient data processing techniques (via libraries and executors such as PyArrow and Spark).</li>\n<li>Proficiency in version control systems (e.g., Git) and continuous integration/continuous deployment (CI/CD) practices to maintain code quality and automate development workflows.</li>\n<li>Expertise in building and launching large-scale ML frameworks in a scientific environment that supports the needs of a research team.</li>\n<li>Excellent ability to work effectively with cross-functional teams and communicate across disciplines.</li>\n</ul>\n<p>Nice-to-haves include:</p>\n<ul>\n<li>Experience working with large-scale genomics or biological datasets.</li>\n<li>Experience managing multimodal datasets, such as combinations of sequence, text, image, and other data.</li>\n<li>Experience with GPU/accelerator programming and kernel development (such as CUDA, Triton, or XLA).</li>\n<li>Experience with infrastructure-as-code and configuration management.</li>\n<li>Experience cultivating MLOps and ML infrastructure best practices, especially around reliability, provisioning and monitoring.</li>\n<li>Strong track record of contributions to relevant DL projects, e.g., on GitHub.</li>\n</ul>\n<p>The US target range of our base salary for new hires is $161,925 - $227,325. You will also be eligible to receive equity, cash bonuses, and a full range of medical, financial, and other benefits depending on the position offered.</p>\n<p>Freenome is proud to be an equal-opportunity employer, and we value diversity.
Freenome does not discriminate on the basis of race, colour, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law.</p>","url":"https://yubhub.co/jobs/job_2bc207d0-89b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Freenome","sameAs":"https://freenome.com/","logo":"https://logos.yubhub.co/freenome.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/freenome/jobs/8013673002","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$161,925 - $227,325","x-skills-required":["Python","Java","Julia","C","C++","PyTorch","TensorFlow","Jax","Scikit-learn","Ray","DeepSpeed","TensorBoard","Wandb","MLflow","AWS","Google Cloud","Azure","Docker","Kubernetes","Git","Continuous Integration/Continuous Deployment"],"x-skills-preferred":["Large-scale genomics or biological datasets","Multimodal datasets","GPU/Accelerator programming and kernel development","Infrastructure-as-code and configuration management","MLOps and ML infrastructure best practices"],"datePosted":"2026-04-17T12:35:01.240Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Brisbane, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Java, Julia, C, C++, PyTorch, TensorFlow, Jax, Scikit-learn, Ray, DeepSpeed, TensorBoard, Wandb, MLflow, AWS, Google Cloud, Azure, Docker, Kubernetes, Git, Continuous Integration/Continuous Deployment, Large-scale genomics or biological datasets, Multimodal datasets, GPU/Accelerator programming and kernel development, Infrastructure-as-code and configuration management, MLOps and ML infrastructure best practices","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":161925,"maxValue":227325,"unitText":"YEAR"}}}]}