{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/cluster-operation"},"x-facet":{"type":"skill","slug":"cluster-operation","display":"Cluster Operation","count":2},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f28927b0-573"},"title":"Machine Learning Systems Research Engineer, Agent Post-training - Enterprise GenAI","description":"<p>At Scale, our mission is to accelerate the development of AI applications. We are working on an arsenal of proprietary research and resources that serve all of our enterprise clients. 
As an ML Sys Research Engineer, you&#39;ll build out the algorithms for our next-gen Agent RL training platform, support large-scale training, and research and integrate state-of-the-art technologies to optimize our ML system.</p>\n<p>Your customers will be other MLREs and AAIs on the Enterprise AI team who take the training algorithms and apply them to client use cases ranging from next-generation AI cybersecurity firewall LLMs to foundation healthtech search models.</p>\n<p>If you are excited about shaping the future of the modern AI movement, we would love to hear from you!</p>\n<p>Key Responsibilities:</p>\n<ul>\n<li>Build, profile, and optimize our training and inference framework.</li>\n<li>Post-train state-of-the-art models, developed both internally and by the community, to define stable post-training recipes for our enterprise engagements.</li>\n<li>Collaborate with ML teams to accelerate their research and development, and enable them to develop the next generation of models and data curation.</li>\n<li>Create a next-gen agent training algorithm for multi-agent/multi-tool rollouts.</li>\n</ul>\n<p>Ideal Candidate:</p>\n<ul>\n<li>1-3 years of experience with LLM training in a production environment.</li>\n<li>Passionate about system optimization.</li>\n<li>Experience with post-training methods such as RLHF/RLVR and related algorithms such as PPO and GRPO.</li>\n<li>Demonstrable knowledge of how to operate a modern GPU cluster.</li>\n<li>Experience with multi-node LLM training and inference.</li>\n<li>Strong software engineering skills, with proficiency in frameworks and tools such as CUDA, PyTorch, transformers, and flash attention.</li>\n<li>Strong written and verbal communication skills to operate in a cross-functional team environment.</li>\n<li>PhD or Master&#39;s degree in Computer Science or a related field.</li>\n</ul>\n<p>Compensation:</p>\n<p>We offer competitive compensation packages, including base salary, equity, 
and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.</p>\n<p>Benefits:</p>\n<ul>\n<li>Comprehensive health, dental and vision coverage.</li>\n<li>Retirement benefits.</li>\n<li>A learning and development stipend.</li>\n<li>Generous PTO.</li>\n<li>Commuter stipend.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f28927b0-573","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://www.scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4625341005","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$189,600-$237,000 USD","x-skills-required":["LLM training","System optimization","Post-training methods","GPU cluster operation","Multi-node LLM training","Inference","CUDA","Pytorch","Transformers","Flash attention"],"x-skills-preferred":[],"datePosted":"2026-04-18T16:00:01.664Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA; New York, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"LLM training, System optimization, Post-training methods, GPU cluster operation, Multi-node LLM training, Inference, CUDA, Pytorch, Transformers, Flash attention","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":189600,"maxValue":237000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_1abf8c6b-772"},"title":"Server 
Software Engineer","description":"<p>Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. Here, everyone is part of the story. Part of a community that connects across the globe. A place where creativity thrives, new perspectives are invited, and ideas matter. A team where everyone makes play happen.</p>\n<p>Our team is responsible for designing, implementing, and deploying server-side features for real-time scalable games. We manage various backend services to ensure stability and analyze issues that occur in live environments. We also operate large-scale Kubernetes clusters across multiple countries and develop real-time scalable game servers.</p>\n<p>As a Server Software Engineer, you will be part of a team that is passionate about creating innovative and engaging experiences for players. You will work closely with other engineers, designers, and artists to bring new ideas to life and collaborate with other teams to ensure seamless integration.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design, implement, and deploy server-side features for real-time scalable games</li>\n<li>Manage various backend services to ensure stability</li>\n<li>Analyze issues that occur in live environments</li>\n<li>Operate large-scale Kubernetes clusters across multiple countries</li>\n<li>Develop real-time scalable game servers</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>7+ years of experience in related fields</li>\n<li>Proficiency in TypeScript and JavaScript</li>\n<li>Familiarity with microservices architecture, gRPC, and RESTful APIs</li>\n<li>Experience with NoSQL or RDBMS development or operation</li>\n</ul>\n<p>Preferred Skills:</p>\n<ul>\n<li>Experience with web services or distributed servers that handle large traffic</li>\n<li>Knowledge of container technology and cluster operation</li>\n<li>Experience with data collection and monitoring tools</li>\n<li>Computer science background</li>\n<li>Experience with various 
programming languages (C, C++, Python, Go, TypeScript)</li>\n<li>Experience with online or mobile game service launch and live operation</li>\n<li>Strong communication and problem-solving skills</li>\n</ul>\n<p>Submission Documents:</p>\n<ul>\n<li>Korean resume and career portfolio</li>\n</ul>\n<p>Hiring Process:</p>\n<ul>\n<li>Initial screening</li>\n<li>Coding test</li>\n<li>1st interview</li>\n<li>2nd interview</li>\n<li>3rd interview (subject to change)</li>\n</ul>\n<p>About Electronic Arts:</p>\n<p>We value adaptability, resilience, creativity, and curiosity. From leadership that brings out your potential, to creating space for learning and experimenting, we empower you to do great work and pursue opportunities for growth.</p>\n<p>We adopt a holistic approach to our benefits programs, emphasizing physical, emotional, financial, career, and community wellness to support a balanced life. Our packages are tailored to meet local needs and may include healthcare coverage, mental well-being support, retirement savings, paid time off, family leaves, complimentary games, and more.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_1abf8c6b-772","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Electronic Arts","sameAs":"https://jobs.ea.com","logo":"https://logos.yubhub.co/jobs.ea.com.png"},"x-apply-url":"https://jobs.ea.com/en_US/careers/JobDetail/RP-Server-SE/212941","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["TypeScript","JavaScript","microservices architecture","gRPC","RESTful APIs","NoSQL","RDBMS"],"x-skills-preferred":["web services","distributed servers","container technology","cluster operation","data collection","monitoring tools","computer science","programming languages","online game service","mobile game 
service"],"datePosted":"2026-03-09T11:13:20.826Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Seoul, Korea, Republic of"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"TypeScript, JavaScript, microservices architecture, gRPC, RESTful APIs, NoSQL, RDBMS, web services, distributed servers, container technology, cluster operation, data collection, monitoring tools, computer science, programming languages, online game service, mobile game service"}]}