{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/parallelism-strategies"},"x-facet":{"type":"skill","slug":"parallelism-strategies","display":"Parallelism Strategies","count":3},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_dc17980d-461"},"title":"Research Engineer, Interpretability","description":"<p>JOB TITLE: Research Engineer, Interpretability</p>\n<p>LOCATION: San Francisco, CA</p>\n<p>DEPARTMENT: AI Research &amp; Engineering</p>\n<p>JOB DESCRIPTION:</p>\n<p>When you see what modern language models are capable of, do you wonder, &quot;How do these things work? How can we trust them?&quot;</p>\n<p>The Interpretability team at Anthropic is working to reverse-engineer how trained models work because we believe that a mechanistic understanding is the most robust way to make advanced systems safe.</p>\n<p>Think of us as doing &quot;neuroscience&quot; of neural networks using &quot;microscopes&quot; we build - or reverse-engineering neural networks like binary programs.</p>\n
<p>More resources to learn about our work:</p>\n<ul>\n<li>Our research blog - covering advances including Monosemantic Features and Circuits</li>\n<li>An Introduction to Interpretability from our research lead, Chris Olah</li>\n<li>The Urgency of Interpretability from CEO Dario Amodei</li>\n<li>Engineering Challenges Scaling Interpretability - directly relevant to this role</li>\n<li>60 Minutes segment - Around 8:07, see a demo of tooling our team built</li>\n<li>New Yorker article - what it&#39;s like to work on one of AI&#39;s hardest open problems</li>\n</ul>\n<p>Even if you haven&#39;t worked on interpretability before, the infrastructure expertise is similar to what&#39;s needed across the lifecycle of a production language model:</p>\n<ul>\n<li>Pretraining: Training dictionary learning models looks a lot like model pretraining - creating stable, performant training jobs for massively parameterized models across thousands of chips</li>\n<li>Inference: Interp runs a customized inference stack. Day-to-day analysis requires services that allow editing a model&#39;s internal activations mid-forward-pass - for example, adding a &quot;steering vector&quot;</li>\n<li>Performance: Like all LLM work, we push up against the limits of hardware and software. Rather than squeezing the last 0.1%, we are focused on finding bottlenecks, fixing them and moving ahead given rapidly evolving research and safety mission</li>\n</ul>\n<p>The science keeps scaling - and it&#39;s now applied directly in safety audits on frontier models, with real deadlines. As our research has matured, engineering and infrastructure have become a bottleneck. Your work will have a direct impact on one of the most important open problems in AI.</p>\n
<p>RESPONSIBILITIES:</p>\n<ul>\n<li>Build and maintain the specialized inference and training infrastructure that powers interpretability research - including instrumented forward/backward passes, activation extraction, and steering vector application</li>\n<li>Resolve scaling and efficiency bottlenecks through profiling, optimization, and close collaboration with peer infrastructure teams</li>\n<li>Design tools, abstractions, and platforms that enable researchers to rapidly experiment without hitting engineering barriers</li>\n<li>Help bring interpretability research into production safety audits - with real deadlines and high reliability expectations</li>\n<li>Work across the stack - from model internals and accelerator-level optimization to user-facing research tooling</li>\n</ul>\n<p>YOU MAY BE A GOOD FIT IF YOU:</p>\n<ul>\n<li>Have 5-10+ years of experience building software</li>\n<li>Are highly proficient in at least one programming language (e.g., Python, Rust, Go, Java) and productive with Python</li>\n<li>Are extremely curious about unfamiliar domains; can quickly learn and put that knowledge to work, e.g. diving into new layers of the stack to find bottlenecks</li>\n<li>Have a strong ability to prioritize the most impactful work and are comfortable operating with ambiguity and questioning assumptions</li>\n<li>Prefer fast-moving collaborative projects to extensive solo efforts</li>\n<li>Are curious about interpretability research and its role in AI safety (though no research experience is required!)</li>\n<li>Care about the societal impacts and ethics of your work</li>\n<li>Are comfortable working closely with researchers, translating research needs into engineering solutions.</li>\n</ul>\n
<p>STRONG CANDIDATES MAY ALSO HAVE EXPERIENCE WITH:</p>\n<ul>\n<li>Optimizing the performance of large-scale distributed systems</li>\n<li>Language modeling fundamentals with transformers</li>\n<li>High Performance LLM optimization: memory management, compute efficiency, parallelism strategies, inference throughput optimization</li>\n<li>Working hands-on in a mainstream ML stack - PyTorch/CUDA on GPUs or JAX/XLA on TPUs</li>\n<li>Collaborating closely with researchers and building tooling to support research teams; or directly performed research with complex engineering challenges</li>\n</ul>\n<p>REPRESENTATIVE PROJECTS:</p>\n<ul>\n<li>Building Garcon, a tool that allows researchers to easily instrument LLMs to extract internal activations</li>\n<li>Designing and optimizing a pipeline to efficiently collect petabytes of transformer activations and shuffle them</li>\n<li>Profiling and optimizing ML training jobs, including multi-GPU parallelism and memory optimization</li>\n<li>Building a steered inference system that applies targeted interventions to model internals at scale (conceptually similar to Golden Gate Claude but for safety research)</li>\n</ul>\n<p>ROLE SPECIFIC LOCATION POLICY:</p>\n<ul>\n<li>This role is based in the San Francisco office; however, we are open to considering exceptional candidates for remote work on a case-by-case basis.</li>\n</ul>\n<p>The annual compensation range for this role is listed below.</p>\n<p>For sales roles, the range provided is the role&#39;s On Target Earnings (&quot;OTE&quot;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.</p>\n
<p>Annual Salary: $315,000-$560,000 USD</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_dc17980d-461","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4980430008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$315,000-$560,000 USD","x-skills-required":["Python","Rust","Go","Java","PyTorch","CUDA","JAX","XLA","High Performance LLM optimization","memory management","compute efficiency","parallelism strategies","inference throughput optimization"],"x-skills-preferred":["large-scale distributed systems","language modeling fundamentals","transformers","collaborating closely with researchers","building tooling to support research teams"],"datePosted":"2026-04-18T15:53:01.682Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Rust, Go, Java, PyTorch, CUDA, JAX, XLA, High Performance LLM optimization, memory management, compute efficiency, parallelism strategies, inference throughput optimization, large-scale distributed systems, language modeling fundamentals, transformers, collaborating closely with researchers, building tooling to support research teams","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":315000,"maxValue":560000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_97212bdf-dd1"},"title":"Research Engineer, Interpretability","description":"<p>Job Title: Research Engineer, 
Interpretability</p>\n<p>About the Role:</p>\n<p>When you see what modern language models are capable of, do you wonder, &quot;How do these things work? How can we trust them?&quot; The Interpretability team at Anthropic is working to reverse-engineer how trained models work because we believe that a mechanistic understanding is the most robust way to make advanced systems safe.</p>\n<p>Think of us as doing &quot;neuroscience&quot; of neural networks using &quot;microscopes&quot; we build - or reverse-engineering neural networks like binary programs.</p>\n<p>More resources to learn about our work:</p>\n<ul>\n<li>Our research blog - covering advances including Monosemantic Features and Circuits</li>\n</ul>\n<ul>\n<li>An Introduction to Interpretability from our research lead, Chris Olah</li>\n</ul>\n<ul>\n<li>The Urgency of Interpretability from CEO Dario Amodei</li>\n</ul>\n<ul>\n<li>Engineering Challenges Scaling Interpretability - directly relevant to this role</li>\n</ul>\n<ul>\n<li>60 Minutes segment - Around 8:07, see a demo of tooling our team built</li>\n</ul>\n<ul>\n<li>New Yorker article - what it&#39;s like to work on one of AI&#39;s hardest open problems</li>\n</ul>\n<p>Even if you haven&#39;t worked on interpretability before, the infrastructure expertise is similar to what&#39;s needed across the lifecycle of a production language model:</p>\n<ul>\n<li>Pretraining: Training dictionary learning models looks a lot like model pretraining - creating stable, performant training jobs for massively parameterized models across thousands of chips</li>\n</ul>\n<ul>\n<li>Inference: Interp runs a customized inference stack. Day-to-day analysis requires services that allow editing a model&#39;s internal activations mid-forward-pass - for example, adding a &quot;steering vector&quot;</li>\n</ul>\n<ul>\n<li>Performance: Like all LLM work, we push up against the limits of hardware and software. 
Rather than squeezing the last 0.1%, we are focused on finding bottlenecks, fixing them and moving ahead given rapidly evolving research and safety mission</li>\n</ul>\n<p>The science keeps scaling - and it&#39;s now applied directly in safety audits on frontier models, with real deadlines. As our research has matured, engineering and infrastructure have become a bottleneck. Your work will have a direct impact on one of the most important open problems in AI.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Build and maintain the specialized inference and training infrastructure that powers interpretability research - including instrumented forward/backward passes, activation extraction, and steering vector application</li>\n</ul>\n<ul>\n<li>Resolve scaling and efficiency bottlenecks through profiling, optimization, and close collaboration with peer infrastructure teams</li>\n</ul>\n<ul>\n<li>Design tools, abstractions, and platforms that enable researchers to rapidly experiment without hitting engineering barriers</li>\n</ul>\n<ul>\n<li>Help bring interpretability research into production safety audits - with real deadlines and high reliability expectations</li>\n</ul>\n<ul>\n<li>Work across the stack - from model internals and accelerator-level optimization to user-facing research tooling</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 5-10+ years of experience building software</li>\n</ul>\n<ul>\n<li>Are highly proficient in at least one programming language (e.g., Python, Rust, Go, Java) and productive with Python</li>\n</ul>\n<ul>\n<li>Are extremely curious about unfamiliar domains; can quickly learn and put that knowledge to work, e.g. 
diving into new layers of the stack to find bottlenecks</li>\n</ul>\n<ul>\n<li>Have a strong ability to prioritize the most impactful work and are comfortable operating with ambiguity and questioning assumptions</li>\n</ul>\n<ul>\n<li>Prefer fast-moving collaborative projects to extensive solo efforts</li>\n</ul>\n<ul>\n<li>Are curious about interpretability research and its role in AI safety (though no research experience is required!)</li>\n</ul>\n<ul>\n<li>Care about the societal impacts and ethics of your work</li>\n</ul>\n<ul>\n<li>Are comfortable working closely with researchers, translating research needs into engineering solutions.</li>\n</ul>\n<p>Strong candidates may also have experience with:</p>\n<ul>\n<li>Optimizing the performance of large-scale distributed systems</li>\n</ul>\n<ul>\n<li>Language modeling fundamentals with transformers</li>\n</ul>\n<ul>\n<li>High Performance LLM optimization: memory management, compute efficiency, parallelism strategies, inference throughput optimization</li>\n</ul>\n<ul>\n<li>Working hands-on in a mainstream ML stack - PyTorch/CUDA on GPUs or JAX/XLA on TPUs</li>\n</ul>\n<ul>\n<li>Collaborating closely with researchers and building tooling to support research teams; or directly performed research with complex engineering challenges</li>\n</ul>\n<p>Representative Projects:</p>\n<ul>\n<li>Building Garcon, a tool that allows researchers to easily instrument LLMs to extract internal activations</li>\n</ul>\n<ul>\n<li>Designing and optimizing a pipeline to efficiently collect petabytes of transformer activations and shuffle them</li>\n</ul>\n<ul>\n<li>Profiling and optimizing ML training jobs, including multi-GPU parallelism and memory optimization</li>\n</ul>\n<ul>\n<li>Building a steered inference system that applies targeted interventions to model internals at scale (conceptually similar to Golden Gate Claude but for safety research)</li>\n</ul>\n<p>Role Specific Location Policy:</p>\n<ul>\n<li>This role is based in 
the San Francisco office; however, we are open to considering exceptional candidates for remote work on a case-by-case basis.</li>\n</ul>\n<p>The annual compensation range for this role is listed below.</p>\n<p>For sales roles, the range provided is the role&#39;s On Target Earnings (&quot;OTE&quot;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.</p>\n<p>Annual Salary: $315,000-$560,000 USD</p>","url":"https://yubhub.co/jobs/job_97212bdf-dd1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4980430008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$315,000-$560,000 USD","x-skills-required":["Python","Rust","Go","Java","PyTorch","CUDA","JAX","XLA","Transformers","High Performance LLM optimization","Memory management","Compute efficiency","Parallelism strategies","Inference throughput optimization"],"x-skills-preferred":["Optimizing the performance of large-scale distributed systems","Language modeling fundamentals","Collaborating closely with researchers and building tooling to support research teams"],"datePosted":"2026-04-18T15:46:01.999Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Rust, Go, Java, PyTorch, CUDA, JAX, XLA, Transformers, High Performance LLM optimization, Memory management, Compute efficiency, Parallelism strategies, Inference throughput optimization, Optimizing the performance of large-scale distributed systems, Language modeling 
fundamentals, Collaborating closely with researchers and building tooling to support research teams","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":315000,"maxValue":560000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_71554e46-b64"},"title":"Senior Engineering Manager, AI Runtime","description":"<p>At Databricks, we are committed to enabling data teams to solve the world&#39;s toughest problems. As a Senior Engineering Manager, you will lead the team owning both the product experience and the foundational infrastructure of our AI Runtime (AIR) product.</p>\n<p>You will be responsible for shaping customer-facing capabilities while designing for scalability, extensibility, and performance of GPU training and adjacent areas. This will involve collaborating closely across the platform, product, infrastructure, and research organisations.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Leading, mentoring, and growing a high-performing engineering team responsible for the Custom Training product and its foundational infrastructure</li>\n<li>Defining and owning the product and technical roadmap for AIR, balancing customer experience, functionality, and foundational investments</li>\n<li>Collaborating closely with product, research, platform, infrastructure teams, and customers to drive end-to-end delivery</li>\n<li>Driving architectural decisions and product design for managed GPU training at scale</li>\n<li>Advocating for customer needs through direct engagement, ensuring engineering decisions translate to clear product impact</li>\n</ul>\n<p>We are looking for someone with 8+ years of software engineering experience, with 3+ years in engineering management. 
You should have a track record of building and operating managed GPU training infrastructure at scale, as well as deep familiarity with distributed training frameworks and parallelism strategies.</p>\n<p>In addition, you should have experience with training resilience patterns, such as checkpointing, elastic training, and automated failure recovery for long-running jobs. You should also have a strong understanding of GPU performance fundamentals, including NCCL, interconnect topologies, and memory optimisation.</p>\n<p>Experience building platform products with clear SLAs is also essential, as is strong cross-functional leadership across platform, product, and research teams. Excellent collaboration and communication skills are also required.</p>\n<p>The pay range for this role is $228,600-$314,250 USD per year, depending on location. The total compensation package may also include eligibility for annual performance bonus, equity, and benefits.</p>","url":"https://yubhub.co/jobs/job_71554e46-b64","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Databricks","sameAs":"https://databricks.com","logo":"https://logos.yubhub.co/databricks.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/databricks/jobs/8490282002","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$228,600-$314,250 USD per year","x-skills-required":["software engineering","engineering management","distributed training frameworks","parallelism strategies","GPU training infrastructure","checkpointing","elastic training","automated failure recovery","GPU performance fundamentals","NCCL","interconnect topologies","memory optimisation"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:45:28.312Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View, 
California; San Francisco, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, engineering management, distributed training frameworks, parallelism strategies, GPU training infrastructure, checkpointing, elastic training, automated failure recovery, GPU performance fundamentals, NCCL, interconnect topologies, memory optimisation","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":228600,"maxValue":314250,"unitText":"YEAR"}}}]}