{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/storage-formats"},"x-facet":{"type":"skill","slug":"storage-formats","display":"Storage Formats","count":4},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_8f03ad2d-96f"},"title":"Software Engineer, Research Data Platform","description":"<p>We&#39;re looking for engineers who love working directly with users and who excel at building data products. 
The Research Data Platform team builds the tools that Anthropic&#39;s researchers use every day to manage, query, and analyze the data that goes into training and evaluating frontier models.</p>\n<p>As a Software Engineer on the Research Data Platform team, you will:</p>\n<ul>\n<li>Build and operate data pipelines that extract data from research training runs and land it in storage systems that are easy and fast to query</li>\n<li>Work closely with researchers to design and build APIs, libraries, and web interfaces that support data management, exploration, and analysis</li>\n<li>Develop dataset management, data cataloging, and provenance tooling that researchers use in their day-to-day work</li>\n<li>Embed with research teams to understand their workflows, identify high-leverage tooling opportunities, and ship solutions quickly</li>\n<li>Collaborate with adjacent teams to build on existing systems rather than reinventing them</li>\n</ul>\n<p>We do not require prior ML or AI training experience. 
If you enjoy working closely with technical users, learning new domains quickly, and building tools people actually want to use, you&#39;ll pick up the research context fast.</p>\n<p>Strong candidates may also have experience with large-scale ETL, columnar storage formats, and query engines (e.g., Spark, BigQuery, DuckDB, Parquet); high-volume time series data ingestion, storage, and efficient querying; data cataloging, lineage, or metadata management systems; or ML experiment tracking or metrics platforms.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_8f03ad2d-96f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5191226008","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$320,000-$405,000 USD","x-skills-required":["large-scale ETL","columnar storage formats","query engines","high-volume time series data","data cataloging","lineage","metadata management systems","ML experiment tracking"],"x-skills-preferred":["Spark","BigQuery","DuckDB","Parquet"],"datePosted":"2026-04-18T15:55:38.971Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"large-scale ETL, columnar storage formats, query engines, high-volume time series data, data cataloging, lineage, metadata management systems, ML experiment tracking, Spark, BigQuery, DuckDB, 
Parquet","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_22ff82ac-40b"},"title":"Software Engineer, Research Data Platform","description":"<p>We&#39;re looking for engineers who love working directly with users and who excel at building data products. The Research Data Platform team builds the tools that Anthropic&#39;s researchers use every day to manage, query, and analyze the data that goes into training and evaluating frontier models.</p>\n<p>As a software engineer on this team, you will:</p>\n<ul>\n<li>Build and operate data pipelines that extract data from research training runs and land it in storage systems that are easy and fast to query</li>\n<li>Work closely with researchers to design and build APIs, libraries, and web interfaces that support data management, exploration, and analysis</li>\n<li>Develop dataset management, data cataloging, and provenance tooling that researchers use in their day-to-day work</li>\n<li>Embed with research teams to understand their workflows, identify high-leverage tooling opportunities, and ship solutions quickly</li>\n<li>Collaborate with adjacent teams to build on existing systems rather than reinventing them</li>\n</ul>\n<p>You may be a good fit if you have significant software engineering experience, particularly building data-intensive applications or internal tooling. You should enjoy working directly with users, gathering requirements iteratively, and shipping things that get adopted. 
You should also be results-oriented, with a bias towards flexibility and impact.</p>\n<p>Strong candidates may also have experience with large-scale ETL, columnar storage formats, and query engines; high-volume time series data; data cataloging, lineage, or metadata management systems; ML experiment tracking or metrics platforms; complex data visualization; and full-stack web application development.</p>","url":"https://yubhub.co/jobs/job_22ff82ac-40b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5191226008","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$320,000-$405,000 USD","x-skills-required":["software engineering","data-intensive applications","internal tooling","data pipelines","storage systems","APIs","libraries","web interfaces","dataset management","data cataloging","provenance tooling","research workflows","adjacent teams"],"x-skills-preferred":["large-scale ETL","columnar storage formats","query engines","high-volume time series data","lineage","metadata management systems","ML experiment tracking","metrics platforms","complex data visualization","full-stack web application development"],"datePosted":"2026-04-18T15:51:29.293Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, data-intensive applications, internal tooling, data pipelines, storage systems, APIs, libraries, web interfaces, dataset management, data cataloging, provenance tooling, research workflows, adjacent teams, large-scale 
ETL, columnar storage formats, query engines, high-volume time series data, lineage, metadata management systems, ML experiment tracking, metrics platforms, complex data visualization, full-stack web application development","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d8e9f3a9-77b"},"title":"Intermediate Backend Engineer (C), Tenant Scale: Git","description":"<p>As a Backend Engineer on the Tenant Scale: Git team, you&#39;ll contribute to improving one of the most widely used foundations of modern software development: Git. Git is at the core of how developers collaborate, and this role focuses on making Git and Gitaly more capable, reliable, and efficient for GitLab and the people who use our platform.</p>\n<p>You&#39;ll participate in architectural discussions and technical decisions related to Git and Gitaly, helping drive implementation choices that improve correctness, performance, and maintainability. You&#39;ll contribute features, bug fixes, and performance improvements to upstream Git in line with team and community goals, delivering changes that improve repository access and reliability for users.</p>\n<p>Adapt Gitaly to make effective use of Git capabilities, including integrating newly available features to improve scalability, efficiency, and long-term maintainability. Connect discussions in the open source Git project with GitLab&#39;s product direction and engineering work, helping align upstream contributions with product and platform needs.</p>\n<p>Scope tasks, estimate effort, and describe implementation plans that support the team&#39;s priorities and enable predictable delivery of technical work. 
Test and validate the features you build and integrate, with a focus on correctness and reliability to reduce regressions and support stable production use.</p>\n<p>Collaborate with team members, contributors, and the Git ecosystem. Represent GitLab as a constructive participant in the open source ecosystem, building productive relationships that support ongoing collaboration with the Git community.</p>","url":"https://yubhub.co/jobs/job_d8e9f3a9-77b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"GitLab","sameAs":"https://about.gitlab.com/","logo":"https://logos.yubhub.co/about.gitlab.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/gitlab/jobs/8497793002","x-work-arrangement":"remote","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Git internals","C programming language","Go for backend development","Linux internals","Open source projects","Distributed systems","Storage formats","Graph theory","Highly available production environments"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:43:30.153Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote, India"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Git internals, C programming language, Go for backend development, Linux internals, Open source projects, Distributed systems, Storage formats, Graph theory, Highly available production environments"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c6f5337c-c2f"},"title":"Research Engineer (Scaling Multimodal Data)","description":"<p>We&#39;re looking for a research engineer to help improve our in-house world models through better multimodal data. 
This role is about figuring out what data actually moves model quality, then building the datasets, pipelines, and experiments to prove it.</p>\n<p>The best generative models aren’t just a product of model architecture and compute; they are a product of the training data. The model output reflects someone’s obsession over what goes into the data, how it’s processed, and what gets thrown away. We’re looking for the person who does the obsessing and builds the tools to act on it at scale.</p>\n<p>This isn’t a role where someone hands you a dataset and asks you to clean it. You will decide what data we need, figure out where to get it, build the processing and curation systems, and close the loop with model training to make sure it actually works.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Discover, evaluate, and acquire training data</li>\n<li>Build data processing and curation systems</li>\n<li>Look at the actual data constantly</li>\n<li>Close the data → model → evaluation loop</li>\n<li>Deploy ML models for data enrichment</li>\n<li>Make systematic, documented decisions</li>\n</ul>\n<p><strong>Requirements:</strong></p>\n<ul>\n<li>Strong software engineering fundamentals</li>\n<li>Deep experience with image and video data at scale</li>\n<li>Experience with distributed computing</li>\n<li>Experience using ML models as components</li>\n<li>A research-oriented approach to data decisions</li>\n<li>Familiarity with the model training lifecycle</li>\n</ul>\n<p><strong>Nice to Have:</strong></p>\n<ul>\n<li>Familiarity with columnar and large-scale data storage formats and libraries</li>\n<li>Track record of independently discovering and integrating new data sources into a training pipeline</li>\n<li>Direct experience closing the data → model quality loop</li>\n<li>Strong visual intuition for data quality and diversity</li>\n</ul>\n<p><strong>What This Isn’t:</strong></p>\n<ul>\n<li>Not infrastructure</li>\n<li>Not pure research</li>\n<li>Not a role where 
you wait for instructions</li>\n</ul>","url":"https://yubhub.co/jobs/job_c6f5337c-c2f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"World Labs","sameAs":"https://world-labs.com/","logo":"https://logos.yubhub.co/world-labs.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/worldlabs/jobs/4164503009","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["software engineering fundamentals","image and video data at scale","distributed computing","ML models as components","research-oriented approach to data decisions","model training lifecycle"],"x-skills-preferred":["columnar and large-scale data storage formats and libraries","independently discovering and integrating new data sources","closing the data → model quality loop","visual intuition for data quality and diversity"],"datePosted":"2026-04-17T13:09:48.326Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering fundamentals, image and video data at scale, distributed computing, ML models as components, research-oriented approach to data decisions, model training lifecycle, columnar and large-scale data storage formats and libraries, independently discovering and integrating new data sources, closing the data → model quality loop, visual intuition for data quality and diversity"}]}