{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/tokenization-algorithms"},"x-facet":{"type":"skill","slug":"tokenization-algorithms","display":"Tokenization Algorithms","count":1},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3b359ef2-6f8"},"title":"Machine Learning Systems Engineer, Research Tools","description":"<p>We are seeking an experienced Machine Learning Systems Engineer to join our Encodings and Tokenization team at Anthropic. This cross-functional role will be instrumental in developing and optimizing the encodings and tokenization systems used throughout our Finetuning workflows. As a bridge between our Pretraining and Finetuning teams, you&#39;ll build critical infrastructure that directly impacts how our models learn from and interpret data.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design, develop, and maintain tokenization systems used across Pretraining and Finetuning workflows</li>\n<li>Optimize encoding techniques to improve model training efficiency and performance</li>\n<li>Collaborate closely with research teams to understand their evolving needs around data representation</li>\n<li>Build infrastructure that enables researchers to experiment with novel tokenization approaches</li>\n<li>Implement systems for monitoring and debugging tokenization-related issues in the model training pipeline</li>\n<li>Create robust testing frameworks to validate tokenization systems across diverse languages and data types</li>\n<li>Identify and address bottlenecks in data processing pipelines related to tokenization</li>\n<li>Document systems thoroughly and communicate technical decisions clearly to stakeholders across teams</li>\n</ul>\n<p>You May Be a Good Fit If You:</p>\n<ul>\n<li>Have significant software engineering experience with demonstrated machine learning expertise</li>\n<li>Are comfortable navigating ambiguity and developing solutions in rapidly evolving research environments</li>\n<li>Can work independently while maintaining strong collaboration with cross-functional teams</li>\n<li>Are results-oriented, with a bias towards flexibility and impact</li>\n<li>Have experience with machine learning systems, data pipelines, or ML infrastructure</li>\n<li>Are proficient in Python and familiar with modern ML development practices</li>\n<li>Have strong analytical skills and can evaluate the impact of engineering changes on research outcomes</li>\n<li>Pick up slack, even if it goes outside your job description</li>\n<li>Enjoy pair programming (we love to pair!)</li>\n<li>Care about the societal impacts of your work and are committed to developing AI responsibly</li>\n</ul>\n<p>Strong Candidates May Also Have Experience With:</p>\n<ul>\n<li>Working with machine learning data processing pipelines</li>\n<li>Building or optimizing data encodings for ML applications</li>\n<li>Implementing or working with BPE, WordPiece, or other tokenization algorithms</li>\n<li>Performance optimization of ML data processing systems</li>\n<li>Multi-language tokenization challenges and solutions</li>\n<li>Research environments where engineering directly enables scientific progress</li>\n<li>Distributed systems and parallel computing for ML workflows</li>\n<li>Large language models or other transformer-based architectures (not required)</li>\n</ul>\n<p>The annual compensation range for this role is $320,000-$405,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3b359ef2-6f8","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4952079008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$405,000 USD","x-skills-required":["Machine Learning","Software Engineering","Python","Data Pipelines","ML Infrastructure"],"x-skills-preferred":["BPE","WordPiece","Tokenization Algorithms","Performance Optimization","Distributed Systems"],"datePosted":"2026-04-18T15:42:42.125Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Machine Learning, Software Engineering, Python, Data Pipelines, ML Infrastructure, BPE, WordPiece, Tokenization Algorithms, Performance Optimization, Distributed Systems","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}}]}