{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/instruction-tuning"},"x-facet":{"type":"skill","slug":"instruction-tuning","display":"Instruction Tuning","count":6},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_683a40cb-69e"},"title":"Machine Learning Research Scientist / Research Engineer, Post-Training","description":"<p>We are seeking a Research Scientist/Research Engineer to join our team. As a Research Scientist/Research Engineer, you will develop novel methods to improve the alignment and generalization of large-scale generative models. You will collaborate with researchers and engineers to define best practices in data-driven AI development. You will also partner with top foundation model labs to provide both technical and strategic input on the development of the next generation of generative AI models.</p>\n<p>Key Responsibilities:</p>\n<ul>\n<li>Research and develop novel post-training techniques, including SFT, RLHF, and reward modeling, to enhance LLM core capabilities in both text and multimodal modalities.</li>\n<li>Design and experiment new approaches to preference optimization.</li>\n<li>Analyze model behavior, identify weaknesses, and propose solutions for bias mitigation and model robustness.</li>\n<li>Publish research findings in top-tier AI conferences.</li>\n</ul>\n<p>Ideal Candidate:</p>\n<ul>\n<li>Ph.D. 
or Master&#39;s degree in Computer Science, Machine Learning, AI, or a related field.</li>\n<li>Deep understanding of deep learning, reinforcement learning, and large-scale model fine-tuning.</li>\n<li>Experience with post-training techniques such as RLHF, preference modeling, or instruction tuning.</li>\n<li>Excellent written and verbal communication skills</li>\n<li>Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals</li>\n<li>Previous experience in a customer-facing role.</li>\n</ul>\n<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_683a40cb-69e","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4528009005","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$252,000-$315,000 USD","x-skills-required":["deep learning","reinforcement learning","large-scale model fine-tuning","post-training techniques","RLHF","preference modeling","instruction tuning"],"x-skills-preferred":["published research","customer-facing role"],"datePosted":"2026-04-18T15:59:43.190Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA; Seattle, WA; New York, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"deep 
learning, reinforcement learning, large-scale model fine-tuning, post-training techniques, RLHF, preference modeling, instruction tuning, published research, customer-facing role","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":252000,"maxValue":315000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_539e2a23-ddf"},"title":"Tech Lead Manager- MLRE, ML Systems","description":"<p>You will lead the development of our internal distributed framework for large language model training. The platform powers MLEs, researchers, data scientists, and operators for fast and automatic training and evaluation of LLMs. It also serves as the underlying training framework for the data quality evaluation pipeline.</p>\n<p>You will work closely with Scale’s ML teams and researchers to build the foundation platform which supports all our ML research and development works. 
You will be building and optimising the platform to enable our next generation LLM training, inference and data curation.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Building, profiling and optimising our training and inference framework.</li>\n<li>Collaborating with ML and research teams to accelerate their research and development, and enable them to develop the next generation of models and data curation.</li>\n<li>Researching and integrating state-of-the-art technologies to optimise our ML system.</li>\n</ul>\n<p>The ideal candidate will have:</p>\n<ul>\n<li>A passion for system optimisation.</li>\n<li>Experience with multi-node LLM training and inference.</li>\n<li>Experience with developing large-scale distributed ML systems.</li>\n<li>Experience with post-training methods like RLHF/RLVR and related algorithms like PPO/GRPO etc.</li>\n<li>Strong software engineering skills, proficient in frameworks and tools such as CUDA, PyTorch, transformers, flash attention, etc.</li>\n</ul>\n<p>Nice to haves include demonstrated expertise in post-training methods and/or next generation use cases for large language models including instruction tuning, RLHF, tool use, reasoning, agents, and multimodal, etc.</p>\n<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. 
The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_539e2a23-ddf","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4618046005","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$264,800-$331,000 USD","x-skills-required":["system optimisation","multi-node LLM training and inference","large-scale distributed ML systems","post-training methods","software engineering skills","CUDA","PyTorch","transformers","flash attention"],"x-skills-preferred":["next generation use cases for large language models","instruction tuning","RLHF","tool use","reasoning","agents","multimodal"],"datePosted":"2026-04-18T15:59:21.558Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA; New York, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"system optimisation, multi-node LLM training and inference, large-scale distributed ML systems, post-training methods, software engineering skills, CUDA, PyTorch, transformers, flash attention, next generation use cases for large language models, instruction tuning, RLHF, tool use, reasoning, agents, 
multimodal","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":264800,"maxValue":331000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_840bab06-7be"},"title":"ML Research Engineer, ML Systems","description":"<p>Job Description:</p>\n<p>Scale&#39;s ML platform (RLXF) team builds our internal distributed framework for large language model training and inference. The platform has been powering MLEs, researchers, data scientists and operators for fast and automatic training and evaluation of LLM&#39;s, as well as evaluation of data quality.</p>\n<p>At Scale, we&#39;re uniquely positioned at the heart of the field of AI as an indispensable provider of training and evaluation data and end-to-end solutions for the ML lifecycle. You will work closely across Scale&#39;s ML teams and researchers to build the foundation platform that supports all our ML research and development. 
You will be building and optimizing the platform to enable our next generation of LLM training, inference and data curation.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Build, profile and optimize our training and inference framework</li>\n<li>Collaborate with ML teams to accelerate their research and development and enable them to develop the next generation of models and data curation</li>\n<li>Research and integrate state-of-the-art technologies to optimize our ML system</li>\n</ul>\n<p>Ideal Candidate:</p>\n<ul>\n<li>Strong excitement about system optimization</li>\n<li>Experience with multi-node LLM training and inference</li>\n<li>Experience with developing large-scale distributed ML systems</li>\n<li>Strong software engineering skills, proficient in frameworks and tools such as CUDA, PyTorch, transformers, flash attention, etc.</li>\n<li>Strong written and verbal communication skills and the ability to operate in a cross-functional team environment</li>\n</ul>\n<p>Nice to Have:</p>\n<ul>\n<li>Demonstrated expertise in post-training methods and/or next generation use cases for large language models including instruction tuning, RLHF, tool use, reasoning, agents, and multimodal, etc.</li>\n</ul>\n<p>Compensation Packages:</p>\n<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity-based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for an equity grant. 
You&#39;ll also receive benefits including, but not limited to: Comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend.</p>\n<p>Please note that our policy requires a 90-day waiting period before reconsidering candidates for the same role. This allows us to ensure a fair and thorough evaluation of all applicants.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_840bab06-7be","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4534631005","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$189,600-$237,000 USD","x-skills-required":["System Optimization","Multi-node LLM Training and Inference","Large-Scale Distributed ML Systems","CUDA","Pytorch","Transformers","Flash Attention"],"x-skills-preferred":["Post-Training Methods","Next Generation Use Cases for Large Language Models","Instruction Tuning","RLHF","Tool Use","Reasoning","Agents","Multimodal"],"datePosted":"2026-04-18T15:58:47.020Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA; Seattle, WA; New York, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"System Optimization, Multi-node LLM Training and Inference, Large-Scale Distributed ML Systems, CUDA, Pytorch, Transformers, Flash Attention, Post-Training Methods, Next Generation Use Cases for Large Language Models, Instruction Tuning, RLHF, Tool Use, Reasoning, Agents, 
Multimodal","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":189600,"maxValue":237000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d2f5b1e5-545"},"title":"Research Scientist, Gemini Safety","description":"<p>We&#39;re seeking a versatile Research Scientist to join our Gemini Safety team. As a Research Scientist, you will apply and develop data and algorithmic cutting-edge solutions to advance our latest user-facing models. Your work will focus on advancing the safety and fairness behavior of state-of-the-art AI models, driving the development of foundational technology adopted by numerous product areas, including Gemini App, Cloud API, and Search.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Post-training/instruction tuning state-of-the-art LLMs, focusing on text-to-text, image/video/audio-to-text modalities and agentic capabilities</li>\n<li>Exploring data, reasoning, and algorithmic solutions to ensure Gemini Models are safe, maximally helpful, and work for everyone</li>\n<li>Improve Gemini&#39;s adversarial robustness, with a focus on high-stakes abuse risks</li>\n<li>Design and maintain high-quality evaluation protocols to assess model behavior gaps and headroom related to safety and fairness</li>\n<li>Develop and execute experimental plans to address known gaps, or construct entirely new capabilities</li>\n<li>Drive innovation and enhance understanding of Supervised Fine Tuning and Reinforcement Learning fine-tuning at scale</li>\n</ul>\n<p>To succeed as a Research Scientist in the Gemini Safety team, we look for the following skills and experience:</p>\n<ul>\n<li>PhD in Computer Science, a related field, or equivalent practical experience</li>\n<li>Significant LLM post-training experience</li>\n<li>Experience in Reward modeling and Reinforcement Learning for LLMs Instruction tuning</li>\n<li>Experience 
with Long-range Reinforcement learning</li>\n<li>Experience in areas such as Safety, Fairness, and Alignment</li>\n<li>Track record of publications at NeurIPS, ICLR, ICML</li>\n<li>Experience taking research from concept to product</li>\n<li>Experience with collaborating or leading an applied research project</li>\n<li>Strong experimental taste: Good judgment regarding baselines, ablations, and what is worth testing</li>\n<li>Experience with JAX</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d2f5b1e5-545","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Google DeepMind","sameAs":"https://deepmind.com/","logo":"https://logos.yubhub.co/deepmind.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/deepmind/jobs/7731944","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["PhD in Computer Science","LLM post-training experience","Reward modeling and Reinforcement Learning for LLMs Instruction tuning","Long-range Reinforcement learning","Safety, Fairness, and Alignment","NeurIPS, ICLR, ICML publications","Research from concept to product","Collaborating or leading an applied research project","JAX"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:40:08.109Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Zurich, Switzerland"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"PhD in Computer Science, LLM post-training experience, Reward modeling and Reinforcement Learning for LLMs Instruction tuning, Long-range Reinforcement learning, Safety, Fairness, and Alignment, NeurIPS, ICLR, ICML publications, Research from concept to product, Collaborating or leading an applied research project, 
JAX"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_1e0f3b52-1ae"},"title":"Research Scientist, Gemini Safety","description":"<p>We&#39;re seeking a versatile Research Scientist to join our Gemini Safety team, responsible for advancing the safety and fairness behaviour of state-of-the-art AI models. As a key member of our team, you will apply and develop cutting-edge data and algorithmic solutions to ensure Gemini models are safe, maximally helpful, and work for everyone.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Post-training/instruction tuning state-of-the-art language models, focusing on text-to-text, image/video/audio-to-text modalities and agentic capabilities</li>\n<li>Exploring data, reasoning, and algorithmic solutions to ensure Gemini models are safe and work for everyone</li>\n<li>Improving Gemini&#39;s adversarial robustness, with a focus on high-stakes abuse risks</li>\n<li>Designing and maintaining high-quality evaluation protocols to assess model behaviour gaps and headroom related to safety and fairness</li>\n<li>Developing and executing experimental plans to address known gaps or construct entirely new capabilities</li>\n</ul>\n<p>To succeed in this role, you should have a PhD in Computer Science or a related field, significant LLM post-training experience, and a track record of publications at top conferences. 
Experience in reward modelling and reinforcement learning for LLM instruction tuning, long-range reinforcement learning, safety, fairness, and alignment is an advantage.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_1e0f3b52-1ae","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Google DeepMind","sameAs":"https://deepmind.com/","logo":"https://logos.yubhub.co/deepmind.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/deepmind/jobs/7421111","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["PhD in Computer Science or a related field","Significant LLM post-training experience","Post-training/instruction tuning state-of-the-art language models","Exploring data, reasoning, and algorithmic solutions","Improving Gemini's adversarial robustness"],"x-skills-preferred":["Reward modelling and reinforcement learning for LLMs instruction tuning","Long-range reinforcement learning","Safety, fairness, and alignment"],"datePosted":"2026-03-31T18:27:50.311Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View, California, US"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"PhD in Computer Science or a related field, Significant LLM post-training experience, Post-training/instruction tuning state-of-the-art language models, Exploring data, reasoning, and algorithmic solutions, Improving Gemini's adversarial robustness, Reward modelling and reinforcement learning for LLMs instruction tuning, Long-range reinforcement learning, Safety, fairness, and alignment"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a48bc0a6-719"},"title":"Research Scientist, Gemini 
Safety","description":"<p>Job Title: Research Scientist, Gemini Safety</p>\n<p>We&#39;re looking for a versatile Research Scientist to join our Gemini Safety team at Google DeepMind. As a Research Scientist, you will be responsible for applying and developing data and algorithmic cutting-edge solutions to advance the safety and fairness behavior of our latest user-facing models.</p>\n<p>The Gemini Safety team is accountable for the safety and fairness behavior of GDM&#39;s latest Gemini models. Our team focuses on advancing the safety and fairness behavior of state-of-the-art AI models, driving the development of foundational technology adopted by numerous product areas, including Gemini App, Cloud API, and Search.</p>\n<p>Key Responsibilities:</p>\n<ul>\n<li>Post-training/instruction tuning state-of-the-art LLMs, focusing on text-to-text, image/video/audio-to-text modalities and agentic capabilities</li>\n<li>Exploring data, reasoning, and algorithmic solutions to ensure Gemini Models are safe, maximally helpful, and work for everyone</li>\n<li>Improve Gemini&#39;s adversarial robustness, with a focus on high-stakes abuse risks</li>\n<li>Design and maintain high-quality evaluation protocols to assess model behavior gaps and headroom related to safety and fairness</li>\n<li>Develop and execute experimental plans to address known gaps, or construct entirely new capabilities</li>\n<li>Drive innovation and enhance understanding of Supervised Fine Tuning and Reinforcement Learning fine-tuning at scale</li>\n</ul>\n<p>About You:</p>\n<ul>\n<li>PhD in Computer Science, a related field, or equivalent practical experience</li>\n<li>Significant LLM post-training experience</li>\n<li>Experience in Reward modeling and Reinforcement Learning for LLMs Instruction tuning</li>\n<li>Experience with Long-range Reinforcement learning</li>\n<li>Experience in areas such as Safety, Fairness, and Alignment</li>\n<li>Track record of publications at NeurIPS, ICLR, ICML, RL/DL, EMNLP, 
AAAI, UAI</li>\n<li>Experience taking research from concept to product</li>\n<li>Experience with collaborating or leading an applied research project</li>\n<li>Experience with JAX</li>\n</ul>\n<p>At Google DeepMind, we value diversity of experience, knowledge, backgrounds, and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a48bc0a6-719","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Google DeepMind","sameAs":"https://deepmind.com/","logo":"https://logos.yubhub.co/deepmind.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/deepmind/jobs/7421111","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["PhD in Computer Science","LLM post-training experience","Reward modeling and Reinforcement Learning for LLMs Instruction tuning","Long-range Reinforcement learning","Safety, Fairness, and Alignment"],"x-skills-preferred":["JAX"],"datePosted":"2026-03-16T14:40:47.021Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View, California, US"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"PhD in Computer Science, LLM post-training experience, Reward modeling and Reinforcement Learning for LLMs Instruction tuning, Long-range 
Reinforcement learning, Safety, Fairness, and Alignment, JAX"}]}