{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/vision-language-models"},"x-facet":{"type":"skill","slug":"vision-language-models","display":"Vision Language Models","count":4},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_cc9d92de-913"},"title":"Research Engineer / Research Scientist, Vision","description":"<p>We&#39;re looking for research engineers with a strong computer vision background to work on research, development, and evaluation for state-of-the-art Claude models. In this role, you&#39;ll run experiments to evaluate architectural variants, data strategies, and SL and RL techniques to improve Claude&#39;s vision. You&#39;ll also develop and test tools, skills, and agentic infrastructure that enable Claude to reason over visual inputs. Additionally, you&#39;ll create evaluations and benchmarks that measure progress on multimodal capabilities across training and deployment.</p>\n<p>As a research engineer, you&#39;ll partner with the product org to ensure that the vision improvements you deliver impact Claude&#39;s performance on real-world tasks. You&#39;ll also work with our product org to find solutions to our most vexing API customer challenges related to vision and spatial reasoning.</p>\n<p>Strong candidates may also have experience with large-scale pretraining, SL, and RL on language models, deep learning research on images, video, or other modalities, developing complex agentic systems using LLMs, high-performance ML systems (GPUs, TPUs, JAX, PyTorch), and large-scale ETL and data pipeline development.</p>\n<p>The annual compensation range for this role is $350,000-$850,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_cc9d92de-913","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5074217008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000-$850,000 USD","x-skills-required":["computer vision","ML","software engineering","large vision language models","synthetic and real-world visual training datasets","systematic prompting, finetuning, or evaluation"],"x-skills-preferred":["large-scale pretraining","SL","RL","deep learning research","agentic systems","high-performance ML systems","ETL and data pipeline development"],"datePosted":"2026-04-18T15:42:18.530Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York City, NY; San Francisco, CA; Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"computer vision, ML, software engineering, large vision language models, synthetic and real-world visual training datasets, systematic prompting, finetuning, or evaluation, large-scale pretraining, SL, RL, deep learning research, agentic systems, high-performance ML systems, ETL and data pipeline development","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_e121da52-304"},"title":"Research Engineer, Human Understanding","description":"<p>We are seeking a highly motivated Research Engineer with a strong background in multi-modal modelling for humans and a focus on speech &amp; audio/visual to join the effort within Google DeepMind&#39;s Frontier AI unit.</p>\n<p>This role is pivotal in developing foundational multimodal AI capabilities to understand, generate, and protect human likeness. As a key contributor, you will design and implement cutting-edge models and frameworks, pushing the boundaries of AI to enable foundational capabilities for human-centric understanding and generation.</p>\n<p>This is a unique opportunity to contribute to impactful research and advance Google DeepMind&#39;s mission towards Artificial General Intelligence (AGI).</p>\n<p><strong>Key Responsibilities</strong></p>\n<ul>\n<li>Advance multimodal human representations &amp; understanding: Research and implement novel models and other multimodal techniques for a more holistic understanding of humans across visual, audio, and textual data.</li>\n<li>Conduct applied research: Conduct experimental research cycles from hypothesis to deployment.</li>\n<li>Drive technical projects: Take ownership of substantial technical projects within the effort, from ideation and design to implementation and evaluation, often involving cross-functional collaboration.</li>\n<li>Contribute to Infrastructure: Inform and contribute to the development of scalable and efficient research infrastructure for multimodal human understanding models and datasets.</li>\n<li>Design and execute strategies for tuning and adapting VLMs and other foundation models for specific tasks</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>PhD degree in Computer Science, Machine Learning, or a related technical field with 3+ years of relevant experience.</li>\n<li>Experience in developing machine learning models, such as audio &amp; speech-visual models.</li>\n<li>Experience in working with and tuning large-scale vision language models.</li>\n<li>Strong programming skills in Python and experience with at least one major deep learning framework (e.g., JAX)</li>\n<li>Experience conducting independent research and development, including experimental design, implementation, and analysis.</li>\n</ul>\n<p><strong>Salary</strong></p>\n<p>The US base salary range for this full-time position is between $174,000 USD - $252,000 USD + bonus + equity + benefits.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_e121da52-304","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Google DeepMind","sameAs":"https://deepmind.com/","logo":"https://logos.yubhub.co/deepmind.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/deepmind/jobs/7669433","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$174,000 USD - $252,000 USD","x-skills-required":["Python","JAX","Machine Learning","Deep Learning","Vision Language Models","Audio & Speech-Visual Models"],"x-skills-preferred":["Generative AI","Reinforcement Learning","Alignment Methods","Multimodal Learning","Privacy-Preserving Machine Learning"],"datePosted":"2026-04-18T15:38:13.994Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Los Angeles, California, US; Mountain View, California, US"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, JAX, Machine Learning, Deep Learning, Vision Language Models, Audio & Speech-Visual Models, Generative AI, Reinforcement Learning, Alignment Methods, Multimodal Learning, Privacy-Preserving Machine Learning","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":174000,"maxValue":252000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_dc117b6b-1b7"},"title":"Research Scientist, Multimodal Alignment, Safety, and Fairness","description":"<p>We are seeking strong Research Scientists with expertise in AI research and experience in interdisciplinary sociotechnical modeling to join a multimodal safety research effort within Google DeepMind&#39;s Frontier AI unit.</p>\n<p>This role requires a passion for understanding and modeling the interactions between AI and society, a strong awareness of the AI alignment and safety landscape, and a penchant for developing novel ideas, methods, interfaces, and tools.</p>\n<p>As a Research Scientist at Google DeepMind, you will join a team working to supercharge exploration, assessment, and steering of evolving AI behaviours, with a focus on subjective and creative tasks. You will tackle the underlying research questions to improve collaborative specification of alignment objectives and assessment of adherence to desired behaviours.</p>\n<p>Key responsibilities include generating new ideas, executing cutting-edge ideas, communicating research findings, collaborating with other researchers, and driving technical projects.</p>\n<p>To be successful in this role, you will need a PhD degree in Computer Science, Machine Learning, or a related technical field, a strong publication record in top machine learning conferences, and demonstrated hands-on experience in developing multimodal AI models and systems.</p>\n<p>In addition, experience with large-scale vision language models, fine-tuning and post-training LLMs using RL, and developing agentic AI solutions to complex problems would be an advantage.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_dc117b6b-1b7","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Google DeepMind","sameAs":"https://deepmind.com/","logo":"https://logos.yubhub.co/deepmind.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/deepmind/jobs/7680885","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$147,000 USD - $211,000 USD + bonus + equity + benefits","x-skills-required":["Python","Deep learning frameworks (e.g., JAX/Flax/Gemax)","Multimodal AI models and systems","Experimental design, implementation, and analysis","Large-scale vision language models"],"x-skills-preferred":["Proven expertise in working with and tuning large-scale vision language models","Experience prototyping with VLMs with modern prompting strategies","Experience finetuning and post-training LLMs using RL","Experience with developing agentic AI solutions to complex problems","Interest and a strong awareness of the AI alignment / safety / responsibility / fairness landscape"],"datePosted":"2026-03-16T14:42:43.157Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Kirkland, Washington, US; Mountain View, California, US; New York City, New York, US"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Deep learning frameworks (e.g., JAX/Flax/Gemax), Multimodal AI models and systems, Experimental design, implementation, and analysis, Large-scale vision language models, Proven expertise in working with and tuning large-scale vision language models, Experience prototyping with VLMs with modern prompting strategies, Experience finetuning and post-training LLMs using RL, Experience with developing agentic AI solutions to complex problems, Interest and a strong awareness of the AI alignment / safety / responsibility / fairness landscape","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":147000,"maxValue":211000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f0ed63ad-d69"},"title":"Research Engineer / Research Scientist, Vision","description":"<p><strong>About the role</strong></p>\n<p>We&#39;re looking for research engineers with a strong computer vision background who believe that visual and spatial reasoning are core to fully unlocking the capabilities of LLMs. In this role, you&#39;ll work on research, development, and evaluation for state-of-the-art Claude models, with a focus on visual and spatial capabilities.</p>\n<p><strong>What you&#39;ll do:</strong></p>\n<ul>\n<li>Run experiments to evaluate architectural variants, data strategies, and SL and RL techniques to improve Claude&#39;s vision</li>\n</ul>\n<ul>\n<li>Develop and test tools, skills, and agentic infrastructure that enable Claude to reason over visual inputs</li>\n</ul>\n<ul>\n<li>Create evaluations and benchmarks that measure progress on multimodal capabilities across training and deployment</li>\n</ul>\n<ul>\n<li>Work with our product org to find solutions to our most vexing API customer challenges related to vision and spatial reasoning</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 7+ years of ML, computer vision, and software engineering experience through industry, academia, or other projects</li>\n</ul>\n<ul>\n<li>Are familiar with the architecture, training, and operation of large vision language models</li>\n</ul>\n<ul>\n<li>Have experience creating and evaluating large synthetic and real-world visual training datasets</li>\n</ul>\n<ul>\n<li>Have experience engaging in systematic prompting, finetuning, or evaluation</li>\n</ul>\n<ul>\n<li>Are results-oriented, with a bias towards flexibility and impact</li>\n</ul>\n<ul>\n<li>Enjoy pair programming and cross-team collaboration</li>\n</ul>\n<ul>\n<li>Care about the societal impacts of your work</li>\n</ul>\n<p><strong>Strong candidates may also have experience with:</strong></p>\n<ul>\n<li>Large-scale pretraining, SL, and RL on language models</li>\n</ul>\n<ul>\n<li>Deep learning research on images, video, or other modalities</li>\n</ul>\n<ul>\n<li>Developing complex agentic systems using LLMs</li>\n</ul>\n<ul>\n<li>High-performance ML systems (GPUs, TPUs, JAX, PyTorch)</li>\n</ul>\n<ul>\n<li>Large-scale ETL and data pipeline development</li>\n</ul>\n<p><strong>Representative projects:</strong></p>\n<ul>\n<li>Running experiments to determine ideal training datamixes and parameters for a synthetically generated vision dataset</li>\n</ul>\n<ul>\n<li>Finetuning Claude to maximise its performance using a particular set of agent tools/skills</li>\n</ul>\n<ul>\n<li>Building a pipeline to ingest and process a novel source of visual training data</li>\n</ul>\n<ul>\n<li>Designing and running experiments to evaluate the scalability of two architectural variants</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<ul>\n<li>Education requirements: We require at least a Bachelor&#39;s degree in a related field or equivalent experience.</li>\n</ul>\n<ul>\n<li>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>\n</ul>\n<ul>\n<li>Visa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. This research can be found on our website.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f0ed63ad-d69","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5074217008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000 - $850,000 USD","x-skills-required":["computer vision","large vision language models","deep learning research","high-performance ML systems","large-scale ETL and data pipeline development"],"x-skills-preferred":["large-scale pretraining","SL and RL on language models","agentic systems using LLMs"],"datePosted":"2026-03-08T13:45:30.573Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York City, NY; San Francisco, CA; Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"computer vision, large vision language models, deep learning research, high-performance ML systems, large-scale ETL and data pipeline development, large-scale pretraining, SL and RL on language models, agentic systems using LLMs","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":850000,"unitText":"YEAR"}}}]}