{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/synthetic-data-generation"},"x-facet":{"type":"skill","slug":"synthetic-data-generation","display":"Synthetic Data Generation","count":1},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_cdd2fbec-490"},"title":"Member of Technical Staff - Mid-training","description":"<p><strong>About xAI</strong></p>\n<p>xAI&#39;s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.</p>\n<p>Our team is small and highly motivated, focused on engineering excellence. We operate with a flat organisational structure, where all employees are expected to be hands-on and contribute directly to the company&#39;s mission.</p>\n<p><strong>Responsibilities</strong></p>\n<p>We&#39;re looking for a Member of Technical Staff to join our team. Key responsibilities include:</p>\n<p>Scaling synthetic coding data to trillions of tokens with large-scale Docker verification. Distilling the intelligence of flagship models into flash models through synthetic data generation. Optimising mid-training data mixtures to boost the ceiling for RL. Engineering long-context data recipes. Developing robust and diverse evaluation for mid-training checkpoints.</p>\n<p><strong>Basic Qualifications</strong></p>\n<p>To be successful in this role, you&#39;ll need:</p>\n<p>Expertise in ML and large model scaling, with familiarity across all kinds of scaling laws. Strong ability to design ML experiments. Familiarity with state-of-the-art techniques for curating AI training data for text, image, audio, and video modalities. Strong engineering abilities in Spark, Ray, and other frameworks for large-scale data processing.</p>\n<p><strong>Compensation and Benefits</strong></p>\n<p>The base salary for this role is $180,000 - $440,000 USD. Our total rewards package includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short &amp; long-term disability insurance, life insurance, and various other discounts and perks.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_cdd2fbec-490","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/4965893007","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$180,000 - $440,000 USD","x-skills-required":["ML","large model scaling","Docker verification","synthetic data generation","Spark","Ray"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:43:42.841Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo Alto, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"ML, large model scaling, Docker verification, synthetic data generation, Spark, Ray","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":440000,"unitText":"YEAR"}}}]}