{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/indexing-systems"},"x-facet":{"type":"skill","slug":"indexing-systems","display":"Indexing Systems","count":2},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f29672c2-395"},"title":"Software Engineer, Foundations Retrieval","description":"<p>We&#39;re looking for a Software Engineer focused on building and scaling retrieval systems. You&#39;ll work with a team of researchers and engineers to develop infrastructure that enables models to retrieve and act on the right information at the right time. 
This includes designing and operating indexing systems, retrieval pipelines, and serving layers.</p>\n<p>This work supports retrieval across OpenAI products and research, with direct impact on system performance, reliability, and scale.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Build and scale retrieval infrastructure across indexing, serving, and query execution.</li>\n<li>Develop low-latency, high-throughput systems for real-time model interaction.</li>\n<li>Partner with research to productionize embedding and retrieval techniques.</li>\n<li>Support dense, sparse, and hybrid retrieval pipelines.</li>\n<li>Own system performance, reliability, and observability at scale.</li>\n<li>Collaborate across Pretraining, Inference, and Product teams to integrate retrieval end-to-end.</li>\n<li>Contribute to model-system interfaces for agentic workflows.</li>\n</ul>\n<p>You Might Thrive in This Role If You Have:</p>\n<ul>\n<li>Experience building and scaling distributed systems.</li>\n<li>Background in search, retrieval, or indexing systems.</li>\n<li>Familiarity with embedding-based or ML-powered systems.</li>\n<li>Experience with performance optimization and production reliability.</li>\n<li>Ability to work across ML and systems boundaries.</li>\n<li>First-principles thinking in ambiguous problem spaces.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f29672c2-395","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://openai.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/40ed6975-ef61-4807-b748-37c2fa2b76c7","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"Full time","x-salary-range":"$380K – $555K","x-skills-required":["distributed systems","search","retrieval","indexing systems","embedding-based systems","ML-powered 
systems","performance optimization","production reliability"],"x-skills-preferred":[],"datePosted":"2026-04-24T12:24:02.234Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed systems, search, retrieval, indexing systems, embedding-based systems, ML-powered systems, performance optimization, production reliability","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":380000,"maxValue":555000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ceba9e5b-250"},"title":"Senior Backend Engineer, Product and Infra","description":"<p>We&#39;re looking for a Senior Backend Engineer to build the systems and services that power our product experience. You&#39;ll own the backend infrastructure that makes our content discoverable, our features responsive, and our platform reliable at scale.</p>\n<p>Your work will directly shape what users experience: designing APIs that serve rich content, building services that handle real-time interactions, implementing content-matching systems for rights and safety, and ensuring our platform performs under load. You&#39;ll architect systems that are fast, correct, and maintainable.</p>\n<p>You&#39;ll collaborate closely with Product, ML Research, and Mobile/Web teams to ship features that matter. 
We use Python, Go, BigQuery, Pub/Sub, and a microservices architecture, but we care more about good judgment than specific tool experience.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and maintain application-level data models that organize rich content into canonical structures optimized for product features, search, and retrieval.</li>\n<li>Build high-reliability ETLs and streaming pipelines to process usage events, analytics data, behavioral signals, and application logs.</li>\n<li>Develop data services that expose unified content to the application, such as metadata access APIs, indexing workflows, and retrieval-ready representations.</li>\n<li>Implement and refine fingerprinting pipelines used for deduplication, rights attribution, safety checks, and provenance validation.</li>\n<li>Own data consistency between ingestion systems, application surfaces, metadata storage, and downstream reporting environments.</li>\n<li>Define and track key operational metrics, including latency, completeness, accuracy, and event health.</li>\n<li>Collaborate with Product teams to ensure content structures and APIs support evolving features and high-quality user experiences.</li>\n<li>Partner with Analytics and Research teams to deliver clean usage datasets for experimentation, model evaluation, reporting, and internal insights.</li>\n<li>Operate large analytical workloads in BigQuery and build reusable Dataflow/Beam components for structured processing.</li>\n<li>Improve reliability and scale by designing robust schema evolution strategies, idempotent pipelines, and well-instrumented operational flows.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Experience building production backend services and APIs at scale</li>\n<li>Experience building ETL/ELT pipelines, event processing systems, and structured data models for applications or analytics</li>\n<li>Strong background in data modeling, metadata systems, indexing, or building canonical 
representations for heterogeneous content</li>\n<li>Proficiency in Python, Go, SQL, and scalable data-processing frameworks (Dataflow/Beam, Spark, or similar)</li>\n<li>Familiarity with BigQuery or other analytical data warehouses and strong comfort optimizing large queries and schemas</li>\n<li>Experience with event-driven architectures, Pub/Sub, or Kafka-like systems</li>\n<li>Strong understanding of data quality, schema evolution, lineage, and operational reliability</li>\n<li>Ability to design pipelines that balance cost, latency, correctness, and scale</li>\n<li>Clear communication skills and an ability to collaborate closely with Product, Research, and Analytics stakeholders</li>\n</ul>\n<p><strong>Nice to Have</strong></p>\n<ul>\n<li>Experience building application-facing APIs or microservices that expose structured content</li>\n<li>Background in information retrieval, indexing systems, or search infrastructure</li>\n<li>Experience with fingerprinting, perceptual hashing, audio similarity metrics, or content-matching algorithms</li>\n<li>Familiarity with ML workflows and how downstream analytics and usage data feed back into research pipelines</li>\n<li>Understanding of batch + streaming architectures and how to blend them effectively</li>\n<li>Experience with Go, Next.js, or React Native for occasional full-stack contributions</li>\n</ul>\n<p><strong>Why Join Us</strong></p>\n<p>You will design the core data services and pipelines that power our product experience, analytics, and business operations. You’ll work on high-impact data challenges involving real-time signals, large-scale metadata systems, and cross-platform consistency. 
You’ll join a small, fast-moving team where you’ll shape the structure, reliability, and intelligence of our downstream data ecosystem.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Highly competitive salary and equity</li>\n<li>Quarterly productivity budget</li>\n<li>Flexible time off</li>\n<li>Fantastic office location in Manhattan</li>\n<li>Productivity package, including ChatGPT Plus, Claude Code, and Copilot</li>\n<li>Top-notch private health, dental, and vision insurance for you and your dependents</li>\n<li>401(k) plan options with employer matching</li>\n<li>Concierge medical/primary care through One Medical and Rightway</li>\n<li>Mental health support from Spring Health</li>\n<li>Personalized life insurance, travel assistance, and many other perks</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ceba9e5b-250","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Udio","sameAs":"https://www.udio.com/","logo":"https://logos.yubhub.co/udio.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/udio/jobs/4987729008","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$180,000 - $220,000","x-skills-required":["Python","Go","BigQuery","Pub/Sub","Data modeling","Metadata systems","Indexing","Canonical representations","ETL/ELT pipelines","Event processing systems","Structured data models","Scalable data-processing frameworks","Analytical data warehouses","Event-driven architectures","Kafka-like systems","Data quality","Schema evolution","Lineage","Operational reliability"],"x-skills-preferred":["Application-facing APIs","Microservices","Information retrieval","Indexing systems","Search infrastructure","Fingerprinting","Perceptual hashing","Audio similarity metrics","Content-matching algorithms","ML workflows","Batch + streaming 
architectures"],"datePosted":"2026-04-17T13:05:20.076Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Go, BigQuery, Pub/Sub, Data modeling, Metadata systems, Indexing, Canonical representations, ETL/ELT pipelines, Event processing systems, Structured data models, Scalable data-processing frameworks, Analytical data warehouses, Event-driven architectures, Kafka-like systems, Data quality, Schema evolution, Lineage, Operational reliability, Application-facing APIs, Microservices, Information retrieval, Indexing systems, Search infrastructure, Fingerprinting, Perceptual hashing, Audio similarity metrics, Content-matching algorithms, ML workflows, Batch + streaming architectures","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":220000,"unitText":"YEAR"}}}]}