{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/gpu"},"x-facet":{"type":"skill","slug":"gpu","display":"GPU","count":100},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f28927b0-573"},"title":"Machine Learning Systems Research Engineer, Agent Post-training - Enterprise GenAI","description":"<p>At Scale, our mission is to accelerate the development of AI applications. We are working on an arsenal of proprietary research and resources that serve all of our enterprise clients. 
As an ML Sys Research Engineer, you&#39;ll work on building out the algorithms for our next-gen Agent RL training platform, support large-scale training, and research and integrate state-of-the-art technologies to optimize our ML system.</p>\n<p>Your customer will be other MLREs and AAIs on the Enterprise AI team who are taking the training algorithms and applying them to client use-cases ranging from next-generation AI cybersecurity firewall LLMs to training foundation healthtech search models.</p>\n<p>If you are excited about shaping the future of the modern AI movement, we would love to hear from you!</p>\n<p>Key Responsibilities:</p>\n<ul>\n<li>Build, profile, and optimize our training and inference framework.</li>\n<li>Post-train state-of-the-art models, developed both internally and from the community, to define stable post-training recipes for our enterprise engagements.</li>\n<li>Collaborate with ML teams to accelerate their research and development, and enable them to develop the next generation of models and data curation.</li>\n<li>Create a next-gen agent training algorithm for multi-agent/multi-tool rollouts.</li>\n</ul>\n<p>Ideal Candidate:</p>\n<ul>\n<li>1-3 years of LLM training experience in a production environment.</li>\n<li>Passionate about system optimization.</li>\n<li>Experience with post-training methods like RLHF/RLVR and related algorithms like PPO/GRPO, etc.</li>\n<li>Demonstrated know-how in operating modern GPU cluster architectures.</li>\n<li>Experience with multi-node LLM training and inference.</li>\n<li>Strong software engineering skills, proficient in frameworks and tools such as CUDA, PyTorch, Transformers, FlashAttention, etc.</li>\n<li>Strong written and verbal communication skills to operate in a cross-functional team environment.</li>\n<li>PhD or Master&#39;s in Computer Science or a related field.</li>\n</ul>\n<p>Compensation:</p>\n<p>We offer competitive compensation packages, including base salary, equity, 
and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.</p>\n<p>Benefits:</p>\n<ul>\n<li>Comprehensive health, dental and vision coverage.</li>\n<li>Retirement benefits.</li>\n<li>A learning and development stipend.</li>\n<li>Generous PTO.</li>\n<li>Commuter stipend.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f28927b0-573","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://www.scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4625341005","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$189,600-$237,000 USD","x-skills-required":["LLM training","System optimization","Post-training methods","GPU cluster operation","Multi-node LLM training","Inference","CUDA","Pytorch","Transformers","Flash attention"],"x-skills-preferred":[],"datePosted":"2026-04-18T16:00:01.664Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA; New York, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"LLM training, System optimization, Post-training methods, GPU cluster operation, Multi-node LLM training, Inference, CUDA, Pytorch, Transformers, Flash attention","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":189600,"maxValue":237000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_460d00aa-b48"},"title":"Senior / 
Staff+ Software Engineer, Voice Platform","description":"<p>About the role</p>\n<p>We&#39;re building the infrastructure that lets people talk to Claude: real-time, bidirectional voice conversations that feel natural, responsive, and safe. This is foundational work for how millions of people will interact with AI.</p>\n<p>The Voice Platform team designs and operates the serving systems, streaming pipelines, and APIs that bring Anthropic&#39;s audio models from research into production across Claude.ai, our mobile apps, and the Anthropic API. You&#39;ll work at the intersection of real-time media, low-latency inference, and distributed systems, building infrastructure where every millisecond of latency is felt by the user.</p>\n<p>We partner closely with the Audio research team, who train the speech understanding and generation models, and with product teams shipping voice experiences to users. Your job is to make those models fast, reliable, and delightful to talk to at scale.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Design and build the real-time streaming infrastructure that powers voice conversations with Claude: ingesting microphone audio, orchestrating model inference, and streaming synthesized speech back with minimal latency</li>\n</ul>\n<ul>\n<li>Build low-latency serving systems for speech models, optimizing time-to-first-audio and end-to-end conversational responsiveness</li>\n</ul>\n<ul>\n<li>Develop the public and internal APIs that expose voice capabilities to Claude.ai, mobile clients, and third-party developers</li>\n</ul>\n<ul>\n<li>Own the audio transport layer (codecs, jitter buffers, adaptive bitrate, packet loss recovery) so conversations stay smooth across unreliable networks</li>\n</ul>\n<ul>\n<li>Build observability and quality-measurement systems for voice: latency distributions, audio quality metrics, interruption handling, and turn-taking accuracy</li>\n</ul>\n<ul>\n<li>Partner with Audio research to move new model architectures from experiment 
to production, and feed real-world performance data back into research</li>\n</ul>\n<ul>\n<li>Collaborate with mobile and product engineering on client-side audio capture, playback, and the end-to-end user experience</li>\n</ul>\n<p>You may be a good fit if you</p>\n<ul>\n<li>Have 6+ years of experience building distributed systems, real-time infrastructure, or platform services at scale</li>\n</ul>\n<ul>\n<li>Have shipped production systems where latency is measured in tens of milliseconds and users notice when you miss</li>\n</ul>\n<ul>\n<li>Are comfortable working across the stack, from transport protocols and serving infrastructure up to the APIs product teams build on</li>\n</ul>\n<ul>\n<li>Are results-oriented, with a bias toward flexibility and impact</li>\n</ul>\n<ul>\n<li>Pick up slack, even if it goes outside your job description</li>\n</ul>\n<ul>\n<li>Enjoy pair programming (we love to pair!)</li>\n</ul>\n<ul>\n<li>Care about the societal impacts of voice AI and want to help shape how these systems are developed responsibly</li>\n</ul>\n<ul>\n<li>Are comfortable with ambiguity: voice is a fast-moving space, and you&#39;ll help define the architecture as we learn what works</li>\n</ul>\n<p>Strong candidates may also have experience with</p>\n<ul>\n<li>Real-time media protocols and stacks: WebRTC, RTP, gRPC bidirectional streaming, or WebSockets at scale</li>\n</ul>\n<ul>\n<li>Audio engineering fundamentals: codecs (Opus, AAC), voice activity detection, echo cancellation, jitter buffering, or audio DSP</li>\n</ul>\n<ul>\n<li>Low-latency ML inference serving, streaming model outputs, or GPU-based serving infrastructure</li>\n</ul>\n<ul>\n<li>Telephony, live streaming, video conferencing, or voice assistant platforms</li>\n</ul>\n<ul>\n<li>Mobile audio pipelines on iOS (AVAudioEngine, AudioUnits) or Android (Oboe, AAudio)</li>\n</ul>\n<ul>\n<li>Working alongside ML researchers to productionize models; speech experience is a plus but not 
required</li>\n</ul>\n<p>Representative projects</p>\n<ul>\n<li>Driving time-to-first-audio below human perceptual thresholds by co-designing the serving pipeline with the Audio research team</li>\n</ul>\n<ul>\n<li>Building a streaming inference orchestrator that interleaves speech recognition, LLM reasoning, and speech synthesis with overlapping execution</li>\n</ul>\n<ul>\n<li>Designing the voice mode API surface for the Anthropic API so developers can build their own voice agents on Claude</li>\n</ul>\n<ul>\n<li>Implementing graceful barge-in and interruption handling so users can cut Claude off mid-sentence naturally</li>\n</ul>\n<ul>\n<li>Instrumenting end-to-end audio quality metrics and building dashboards that catch regressions before users do</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_460d00aa-b48","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5172245008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$485,000 USD","x-skills-required":["Real-time media protocols and stacks","Audio engineering fundamentals","Low-latency ML inference serving","Distributed systems","Streaming pipelines","APIs"],"x-skills-preferred":["WebRTC","RTP","gRPC bidirectional streaming","WebSockets","Opus","AAC","Voice activity detection","Echo cancellation","Jitter buffering","Audio DSP","GPU-based serving infrastructure","Telephony","Live streaming","Video conferencing","Voice assistant platforms","Mobile audio pipelines on iOS","Android","Working alongside ML 
researchers"],"datePosted":"2026-04-18T15:59:54.712Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Real-time media protocols and stacks, Audio engineering fundamentals, Low-latency ML inference serving, Distributed systems, Streaming pipelines, APIs, WebRTC, RTP, gRPC bidirectional streaming, WebSockets, Opus, AAC, Voice activity detection, Echo cancellation, Jitter buffering, Audio DSP, GPU-based serving infrastructure, Telephony, Live streaming, Video conferencing, Voice assistant platforms, Mobile audio pipelines on iOS, Android, Working alongside ML researchers","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_6ddce508-2c7"},"title":"ML Systems Engineer, Robotics","description":"<p>We&#39;re looking for an experienced ML Systems Engineer to join our Physical AI team. As an ML Systems Engineer, you will design and build platforms for scalable, reliable, and efficient serving of foundation models specifically tailored for physical agents. 
Our platform powers cutting-edge research and production systems, supporting both internal research discovery and external customer use cases for autonomous vehicles and robotics.</p>\n<p>In this role, you will:</p>\n<ul>\n<li>Build &amp; Scale: Maintain fault-tolerant, high-performance systems for serving robotics-related models and foundation models at scale, ensuring low latency for real-time applications.</li>\n<li>Platform Development: Build an internal platform to empower model capability discovery, enabling faster iteration cycles for research teams working on robotics.</li>\n<li>Collaborate: Work closely with Robotics researchers and Computer Vision engineers to integrate and optimize models for production and research environments.</li>\n<li>Design Excellence: Conduct architecture and design reviews to uphold best practices in system scalability, reliability, and security.</li>\n<li>Observability: Develop monitoring and observability solutions to ensure system health and real-time performance tracking of model inference.</li>\n<li>Lead: Own projects end-to-end, from requirements gathering to implementation, in a fast-paced, cross-functional environment.</li>\n</ul>\n<p>Ideally, you&#39;d have:</p>\n<ul>\n<li>Experience: 4+ years of experience building large-scale, high-performance backend systems, with deep experience in machine learning infrastructure.</li>\n<li>Algorithm Optimization: Deep experience optimizing computer vision and other machine learning algorithms for cloud environments, including GPU-level algorithm optimizations (e.g., CUDA, kernel tuning).</li>\n<li>Programming: Strong skills in one or more systems-level languages (e.g., Python, Go, Rust, C++).</li>\n<li>Systems Fundamentals: Deep understanding of serving and routing fundamentals (e.g., rate limiting, load balancing, compute budgets, concurrency) for data-intensive applications.</li>\n<li>Infrastructure: Experience with containers (Docker), orchestration (Kubernetes), and cloud 
providers (AWS/GCP).</li>\n<li>IaC: Familiarity with infrastructure as code (e.g., Terraform).</li>\n<li>Mindset: Proven ability to solve complex problems and work independently in fast-moving environments.</li>\n</ul>\n<p>Nice to Haves:</p>\n<ul>\n<li>Exposure to Vision-Language-Action (VLA) models.</li>\n<li>Knowledge of high-performance video processing (e.g., FFmpeg, NVDEC/NVENC) or 3D data handling (point clouds).</li>\n<li>Familiarity with robotics middleware (e.g., ROS/ROS2) or AV data formats.</li>\n</ul>\n<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_6ddce508-2c7","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://www.scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4663053005","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$227,200-$284,000 USD","x-skills-required":["Machine Learning","Backend Systems","Cloud Environments","GPU-Level Algorithm Optimizations","Systems-Level Languages","Containerization","Orchestration","Cloud Providers","Infrastructure as Code"],"x-skills-preferred":["Vision-Language-Action Models","High-Performance Video Processing","3D Data Handling","Robotics Middleware","AV Data Formats"],"datePosted":"2026-04-18T15:59:25.195Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, 
CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Machine Learning, Backend Systems, Cloud Environments, GPU-Level Algorithm Optimizations, Systems-Level Languages, Containerization, Orchestration, Cloud Providers, Infrastructure as Code, Vision-Language-Action Models, High-Performance Video Processing, 3D Data Handling, Robotics Middleware, AV Data Formats","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":227200,"maxValue":284000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_770c5fe8-cce"},"title":"Staff Security Engineer, Vulnerability Management","description":"<p>We are seeking a Staff Security Engineer to lead the most complex technical work in CoreWeave&#39;s Vulnerability Management program.</p>\n<p>As a Staff Security Engineer, you will design and implement scalable triage, prioritization, and remediation-tracking systems across application, infrastructure, and hardware domains. 
You will set technical standards, drive high-impact initiatives, and mentor engineers through technical leadership, while partnering with leadership on priorities and execution risks.</p>\n<p>Key Responsibilities:</p>\n<ul>\n<li>Lead high-complexity VM technical initiatives and deliver architecture decisions for assigned program areas</li>\n<li>Design and build scalable triage automation, including integrations, decision logic, and production hardening</li>\n<li>Implement end-to-end workflow components from assessment and detection to ticket routing and remediation tracking</li>\n<li>Provide deep technical leadership on hardware-adjacent vulnerabilities (GPU firmware, DPU firmware/BlueField, and BMC surfaces)</li>\n<li>Act as senior technical responder for embargoed disclosures and zero-day events, coordinating with owner teams that deploy fixes</li>\n<li>Improve prioritization logic, severity models, and exception workflows through code, design reviews, and technical proposals</li>\n<li>Produce actionable technical metrics and risk insights for leadership consumption</li>\n<li>Lead root-cause analysis for high-impact vulnerability incidents and implement durable technical improvements</li>\n<li>Mentor IC3/IC4/IC5 engineers through design guidance, code review, and incident coaching</li>\n<li>Partner with security, engineering, and operational stakeholders to improve workflow reliability and accelerate remediation outcomes</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>9+ years of relevant experience with demonstrated strategic impact in vulnerability management, application security, platform security, or cloud security engineering</li>\n<li>Proven track record building and scaling security automation (SOAR workflows, AI/ML systems, detection pipelines) in production environments</li>\n<li>Deep subject matter expertise with vulnerability management best practices: CVSS, EPSS, CISA KEV, threat intelligence integration, and risk-based prioritization 
frameworks</li>\n<li>Excellent development background with strong coding skills in Python, Go, or similar languages for building scalable, production-grade security systems</li>\n<li>Significant experience with modern vulnerability management tooling (for example Wiz, Semgrep, Rapid7, Tenable, or equivalent)</li>\n<li>Experience with specialized infrastructure: GPU/DPU environments, firmware security, hardware vulnerabilities, or high-performance computing</li>\n<li>Demonstrated track record mentoring engineers across levels and driving cross-functional technical initiatives at organizational scale</li>\n<li>Strong business acumen and understanding of how security decisions impact engineering velocity, customer trust, and business outcomes</li>\n</ul>\n<p>Preferred Qualifications:</p>\n<ul>\n<li>Practical experience building AI/ML-powered security systems (LLM integration, automated decision-making, human-in-the-loop validation) in production</li>\n<li>Experience managing hardware vendor security partnerships (embargoed disclosures and pre-release collaboration)</li>\n<li>Production experience with security automation platforms such as TINES and serverless frameworks (AWS Lambda, GCP Cloud Functions)</li>\n<li>Strong DevOps, DevSecOps, or SRE background with deep experience in AWS/GCP/Azure cloud services and Infrastructure as Code (Terraform, CloudFormation)</li>\n<li>Deep understanding of Kubernetes security (container scanning, admission controllers, supply chain security, runtime protection)</li>\n<li>Experience leading security programs through rapid hypergrowth (10x+ infrastructure scaling) in startup or cloud-native environments</li>\n<li>Practical experience managing vulnerabilities within a FedRAMP-certified environment or similar regulatory frameworks</li>\n</ul>\n<p>Salary and Benefits: The base salary range for this role is $188,000 to $275,000. 
The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>\n<p>Work Environment:</p>\n<p>While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_770c5fe8-cce","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4653130006","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$188,000 to $275,000","x-skills-required":["vulnerability management","application security","platform security","cloud security engineering","security automation","AI/ML systems","detection pipelines","Python","Go","modern vulnerability management tooling","GPU/DPU environments","firmware security","hardware vulnerabilities","high-performance computing"],"x-skills-preferred":["AI/ML-powered security systems","LLM integration","automated decision-making","human-in-the-loop validation","security automation platforms","TINES","serverless frameworks","AWS Lambda","GCP Cloud Functions","DevOps","DevSecOps","SRE","Kubernetes security","container scanning","admission controllers","supply chain security","runtime 
protection"],"datePosted":"2026-04-18T15:59:06.360Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"vulnerability management, application security, platform security, cloud security engineering, security automation, AI/ML systems, detection pipelines, Python, Go, modern vulnerability management tooling, GPU/DPU environments, firmware security, hardware vulnerabilities, high-performance computing, AI/ML-powered security systems, LLM integration, automated decision-making, human-in-the-loop validation, security automation platforms, TINES, serverless frameworks, AWS Lambda, GCP Cloud Functions, DevOps, DevSecOps, SRE, Kubernetes security, container scanning, admission controllers, supply chain security, runtime protection","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":188000,"maxValue":275000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_057b8651-835"},"title":"AI Strategy Consultant, Frontier Tech","description":"<p>As a member of our Frontier Tech Consultant team, you will play a critical role in advancing cutting-edge AI innovations by conducting high-impact experiments and ensuring seamless execution at the highest quality standards.</p>\n<p>Your work will directly contribute to Scale AI’s growth, shaping the future of artificial intelligence. 
In this role, you will be working on various types of projects, including but not limited to: research experiments, dataset generation, data quality improvements, and in-depth technical analysis.</p>\n<p>You will tackle complex technical and operational challenges while collaborating closely with Scale’s ML research scientists and SPM team.</p>\n<p>The ideal candidate is analytical, detail-oriented, and results-driven, with strong problem-solving abilities and excellent communication skills.</p>\n<p>We are looking for someone who thrives in a fast-paced environment, is proactive in overcoming challenges, and is committed to delivering exceptional outcomes.</p>\n<p>If you are eager to contribute to the forefront of AI innovation, we encourage you to apply.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and execute research experiments</li>\n<li>Build and evaluate frontier LLM datasets</li>\n<li>Develop training and testing material for frontier pipelines</li>\n<li>Improve quality of existing and new products</li>\n</ul>\n<p>Ideally you’d have:</p>\n<ul>\n<li>Strong machine learning knowledge, either by being in the final years of an ML PhD or having already graduated</li>\n<li>Strong writing and verbal communication skills</li>\n<li>An action-oriented mindset that balances creative problem solving with the scrappiness to ultimately deliver results</li>\n<li>Analytical, planning, and process improvement capability</li>\n<li>Experience working in a fast-paced, entrepreneurial environment</li>\n<li>Technical skills including familiarity with Python, GPU, AWS, API, LLM, ML, and SQL</li>\n</ul>\n<p>Pay: $60-80/hr</p>\n<p>Commitment: This is a fully remote, US-based part-time (10-20 hours per week), ongoing contract position staffed via HireArt.</p>\n<p>HireArt values diversity and is an Equal Opportunity Employer. We are interested in every qualified candidate who is eligible to work in the United States. 
Unfortunately, we are not able to sponsor visas (including CPT/OPT) or employ candidates on a corp-to-corp basis.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_057b8651-835","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4472223005","x-work-arrangement":"remote","x-experience-level":null,"x-job-type":"contract","x-salary-range":"$60-80/hr","x-skills-required":["Python","GPU","AWS","API","LLM","ML","SQL"],"x-skills-preferred":["Machine Learning","Data Analysis","Problem Solving"],"datePosted":"2026-04-18T15:59:01.983Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"jobLocationType":"TELECOMMUTE","employmentType":"CONTRACTOR","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, GPU, AWS, API, LLM, ML, SQL, Machine Learning, Data Analysis, Problem Solving"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_04ee7215-acf"},"title":"Sr. Manager, Engineering - Model Serving","description":"<p>At Databricks, we enable data teams to solve the world&#39;s toughest problems by building and running the world&#39;s best data and AI infrastructure platform. Our Model Serving product provides enterprises with a unified, scalable, and governed platform to deploy and manage AI/ML models. As a Senior Engineering Manager, you will lead the team owning both the product experience and the foundational infrastructure of Model Serving, shaping customer-facing capabilities while designing for scalability, extensibility, and performance across both CPU and GPU inference. 
The impact you will have includes leading, mentoring, and growing a high-performing engineering team, defining and owning the product and technical roadmap for Model Serving, collaborating closely with product, research, platform, and infrastructure teams, and ensuring Model Serving meets stringent SLAs, SLOs, and performance and reliability goals.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Leading, mentoring, and growing a high-performing engineering team responsible for both the customer-facing Model Serving product and its foundational infrastructure.</li>\n<li>Defining and owning the product and technical roadmap for Model Serving, balancing customer experience, functionality, and foundational investments across deployment, inference, monitoring, and scaling.</li>\n<li>Collaborating closely with product, research, platform, and infrastructure teams to drive end-to-end delivery from ideation and prioritization to launch and operation.</li>\n<li>Ensuring Model Serving meets stringent SLAs, SLOs, and performance and reliability goals, continuously improving operational efficiency and customer experience.</li>\n<li>Driving architectural decisions and product design around latency, throughput, autoscaling, GPU/CPU placement, and cost optimization.</li>\n<li>Advocating for customer needs through direct engagement, ensuring engineering decisions translate to clear product impact.</li>\n<li>Promoting best practices in code quality, testing, observability, and operational readiness.</li>\n<li>Fostering a culture of excellence, inclusion, and continuous improvement across the team.</li>\n<li>Partnering with recruiting to attract, hire, and develop top-tier engineering talent.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_04ee7215-acf","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Databricks","sameAs":"https://databricks.com","logo":"https://logos.yubhub.co/databricks.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/databricks/jobs/8211957002","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$217,000-$312,200 USD","x-skills-required":["technical leadership","large-scale distributed systems","real-time serving systems","architectural design","operational excellence","production systems","SLAs","SLOs","GPU performance optimization","concurrency","caching","scalability concepts"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:58:02.797Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"technical leadership, large-scale distributed systems, real-time serving systems, architectural design, operational excellence, production systems, SLAs, SLOs, GPU performance optimization, concurrency, caching, scalability concepts","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":217000,"maxValue":312200,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_7dc0b69a-5b8"},"title":"Senior Engineer, Storage Control Plane","description":"<p>We&#39;re looking for a Senior Storage Engineer to play a key role in designing, building, and operating the control plane for our high-performance AI storage platform. 
You&#39;ll help evolve CoreWeave&#39;s storage systems by building reliable, scalable, and high-throughput solutions that power some of the largest and most innovative AI workloads in the world.</p>\n<p>This role involves close collaboration with teams across infrastructure, compute, and platform to ensure our storage services scale automatically and seamlessly while maximizing performance and reliability.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Design and implement a highly scalable multi-tenant control plane that supports CoreWeave&#39;s growing AI storage and cloud infrastructure needs.</li>\n<li>Contribute to the development of exabyte-scale, S3-compatible object storage and distributed file systems, and integrate dedicated storage clusters into diverse customer environments.</li>\n<li>Work with technologies such as RDMA, GPU Direct Storage, RoCE, InfiniBand, SPDK, and distributed filesystems to optimize storage performance and efficiency.</li>\n<li>Participate in efforts to improve the reliability, durability, and observability of our storage stack.</li>\n<li>Collaborate with operations teams to monitor, analyze, and optimize storage systems using telemetry, metrics, and dashboards to improve performance, latency, and resilience.</li>\n<li>Work cross-functionally with platform, product, and infrastructure teams to deliver seamless storage capabilities across the stack.</li>\n<li>Share your knowledge and mentor other engineers on best practices in building distributed, high-performance systems.</li>\n</ul>\n<p>The ideal candidate will have:</p>\n<ul>\n<li>A Bachelor&#39;s or Master&#39;s degree in Computer Science, Engineering, or a related field.</li>\n<li>6–10 years of experience working in storage systems engineering or infrastructure.</li>\n<li>Strong hands-on experience with object storage or distributed filesystems in production environments.</li>\n<li>Experience with one or more storage protocols (e.g., 
S3, NFS) and file systems such as Ceph, DAOS, or similar.</li>\n<li>Proficiency in a systems programming language such as Go, C, or Rust.</li>\n<li>Familiarity with storage observability tools and telemetry pipelines (e.g., ClickHouse, Prometheus, Grafana).</li>\n<li>Solid understanding of cloud-native infrastructure, Kubernetes, and scalable system architecture.</li>\n<li>Strong debugging and problem-solving skills in distributed, high-performance environments.</li>\n<li>Clear communicator, able to work collaboratively across teams and share technical insights effectively.</li>\n</ul>","url":"https://yubhub.co/jobs/job_7dc0b69a-5b8","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4611874006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$139,000 to $204,000","x-skills-required":["object storage","distributed filesystems","RDMA","GPU Direct Storage","RoCE","InfiniBand","SPDK","cloud-native infrastructure","Kubernetes","scalable system architecture"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:57:57.450Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"object storage, distributed filesystems, RDMA, GPU Direct Storage, RoCE, InfiniBand, SPDK, cloud-native infrastructure, Kubernetes, scalable system 
architecture","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139000,"maxValue":204000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_70e2591f-d7d"},"title":"Technical Program Manager, Infrastructure","description":"<p>As a Technical Program Manager for Infrastructure, you&#39;ll work across multiple infrastructure domains to coordinate complex programs that have broad organisational impact. You&#39;ll be solving novel scaling challenges at the frontier of what&#39;s possible, all while maintaining the security and reliability our mission demands.</p>\n<p>Developer Productivity &amp; Tooling</p>\n<ul>\n<li>Drive cross-functional programs to improve developer environments, CI/CD infrastructure, and release processes that enable rapid innovation while maintaining high security standards</li>\n</ul>\n<ul>\n<li>Coordinate large-scale migrations and platform modernization efforts across engineering teams</li>\n</ul>\n<ul>\n<li>Partner with teams to measure and improve developer productivity metrics, identifying bottlenecks and driving systematic improvements</li>\n</ul>\n<ul>\n<li>Lead initiatives to integrate AI tools into development workflows, helping Anthropic be at the forefront of AI-assisted research and engineering</li>\n</ul>\n<p>Infrastructure Reliability &amp; Operations</p>\n<ul>\n<li>Drive programs to establish and achieve reliability targets across training infrastructure and production services</li>\n</ul>\n<ul>\n<li>Coordinate incident response improvements, post-mortem processes, and on-call rotations that help teams operate effectively</li>\n</ul>\n<ul>\n<li>Establish metrics and dashboards to track infrastructure health, capacity utilisation, and operational excellence</li>\n</ul>\n<p>Cross-functional Coordination</p>\n<ul>\n<li>Serve as the critical bridge between infrastructure teams, research, and 
product, translating technical complexities into clear updates for a variety of audiences</li>\n</ul>\n<ul>\n<li>Consult with stakeholders to deeply understand infrastructure, data, and compute needs, identifying solutions to support frontier research and product development</li>\n</ul>\n<ul>\n<li>Drive alignment on priorities and timelines across teams with competing constraints</li>\n</ul>\n<p>You&#39;ll be a good fit if you have 5+ years of technical program management experience, with a track record of successfully delivering complex infrastructure programs in ML/AI systems or large-scale distributed systems. You&#39;ll also need a deep technical understanding of infrastructure systems, strong stakeholder management skills, and the ability to navigate competing priorities while making data-driven technical decisions.</p>","url":"https://yubhub.co/jobs/job_70e2591f-d7d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5111783008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$290,000-$365,000 USD","x-skills-required":["Kubernetes","Cloud platforms (AWS, GCP, Azure)","ML infrastructure (GPU/TPU/Trainium clusters)","Developer productivity initiatives","CI/CD systems","Infrastructure scaling"],"x-skills-preferred":["Observability tooling and practices","AI tools to improve engineering productivity","Research teams and translating their needs into concrete technical requirements"],"datePosted":"2026-04-18T15:57:52.097Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, 
WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Cloud platforms (AWS, GCP, Azure), ML infrastructure (GPU/TPU/Trainium clusters), Developer productivity initiatives, CI/CD systems, Infrastructure scaling, Observability tooling and practices, AI tools to improve engineering productivity, Research teams and translating their needs into concrete technical requirements","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":290000,"maxValue":365000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9af8d812-df8"},"title":"AI Infrastructure Engineer","description":"<p>We&#39;re looking for Senior+ AI Infrastructure Engineers to build the systems that train and serve Intercom&#39;s next generation of AI products.</p>\n<p>As a Senior AI Infrastructure Engineer focused on model training and inference, you will:</p>\n<p>Implement and scale training pipelines for large transformer and LLM models, from data ingestion and preprocessing through distributed training and evaluation.</p>\n<p>Build and optimize inference services that deliver low-latency, high-reliability experiences for our customers, including autoscaling, routing, and fallbacks.</p>\n<p>Work on GPU-level performance: tuning kernels, improving utilization, and identifying bottlenecks across our training and inference stack.</p>\n<p>Collaborate closely with ML scientists to implement cutting edge training and inference methods and bring them to production.</p>\n<p>Play an active role in hiring, mentoring, and developing other engineers on the team.</p>\n<p>Raise the bar for technical standards, reliability, and operational excellence across Intercom’s AI platform.</p>\n<p>We’re looking to hire Senior+ AI Infrastructure Engineers. 
You’re likely a great fit if:</p>\n<p>You have 5+ years of experience in software engineering, with a strong track record of shipping high-quality products or platforms.</p>\n<p>You hold a degree in Computer Science, Computer Engineering, or a related field (or you have equivalent experience with very strong fundamentals).</p>\n<p>You have hands-on experience with one or more of the following:</p>\n<ul>\n<li>Model training (especially transformers and LLMs).</li>\n<li>Model inference at scale (again, especially transformers and LLMs).</li>\n<li>Low-level GPU work, such as writing CUDA or Triton kernels.</li>\n</ul>\n<p>You are comfortable working in production environments at meaningful scale (traffic, data, or organizational).</p>\n<p>You communicate clearly, can explain complex technical topics to different audiences, and enjoy close collaboration with both engineers and non-engineers.</p>\n<p>You take pride in strong technical fundamentals, love learning, and are willing to invest in your own development.</p>\n<p>You have deep knowledge of at least one programming language (for example Python, Ruby, Java, Go, etc.). Specific language experience is less important than your ability to write clean, reliable code and learn new stacks quickly.</p>\n<p>We are a well-treated bunch, with awesome benefits! 
If there’s something important to you that’s not on this list, talk to us!</p>\n<p>Competitive salary, annual bonus and equity</p>\n<p>Regular compensation reviews - we reward great work!</p>\n<p>Unlimited access to Claude Code and best-in-class AI tools; experimentation &amp; building is encouraged &amp; celebrated.</p>\n<p>Generous paid time off above statutory minimum</p>\n<p>Hybrid working</p>\n<p>MacBooks are our standard, but we also offer Windows for certain roles when needed.</p>\n<p>Fun events for employees, friends, and family!</p>","url":"https://yubhub.co/jobs/job_9af8d812-df8","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Intercom","sameAs":"https://www.intercom.com/","logo":"https://logos.yubhub.co/intercom.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/intercom/jobs/7824142","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["model training","model inference","low-level GPU work","CUDA","Triton","Python","Ruby","Java","Go"],"x-skills-preferred":["experience at AI native companies","running training or inference workloads on Kubernetes","AWS","cloud providers","production experience with Python in ML or infrastructure contexts"],"datePosted":"2026-04-18T15:57:33.379Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Berlin, Germany"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"model training, model inference, low-level GPU work, CUDA, Triton, Python, Ruby, Java, Go, experience at AI native companies, running training or inference workloads on Kubernetes, AWS, cloud providers, production experience with Python in ML or infrastructure 
contexts"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_588dfb0e-611"},"title":"Solutions Architect - Kubernetes","description":"<p>As a Solutions Architect at CoreWeave, you will play a vital role in helping customers succeed with our cloud infrastructure offerings, focusing on Kubernetes solutions within high-performance compute (HPC) environments.</p>\n<p>Your responsibilities will include serving as the primary technical point of contact for customers, establishing strong technical relationships and ensuring their success with CoreWeave&#39;s cloud infrastructure offerings.</p>\n<p>You will collaborate closely with customers to understand their unique business needs and create, prototype, and deploy tailored solutions that align with their requirements.</p>\n<p>You will lead proof of concept initiatives to showcase the value and viability of CoreWeave&#39;s solutions within specific environments.</p>\n<p>You will drive technical leadership and direction during customer meetings, presentations, and workshops, addressing any technical queries or concerns that arise.</p>\n<p>You will act as a virtual member of CoreWeave&#39;s Kubernetes product and engineering teams, identifying opportunities for product enhancement and collaborating with engineers to implement your suggestions.</p>\n<p>You will offer valuable insights on product features, functionality, and performance, contributing regularly to discussions about product strategy and architecture.</p>\n<p>You will conduct periodic technical reviews and assessments of customer workloads, pinpointing opportunities for workload optimization and suggesting suitable solutions.</p>\n<p>You will stay informed of the latest developments and trends in Kubernetes, cloud computing and infrastructure, sharing your thought leadership with customers and internal stakeholders.</p>\n<p>You will lead the prototyping and initiation of research and 
development efforts for emerging products and solutions, delivering prototypes and key insights for internal consumption.</p>\n<p>You will represent CoreWeave at conferences and industry events, with occasional travel as required.</p>\n<p>To be successful in this role, you will need to have a B.S. in Computer Science or a related technical discipline, or equivalent experience.</p>\n<p>You will also need to have 7+ years of proven experience as a Solutions Architect, engineer, researcher, or technical account manager in cloud infrastructure, focusing on building distributed systems or HPC/cloud services, with an expertise focused on scalable Kubernetes solutions.</p>\n<p>You will need to be fluent in cloud computing concepts, architecture, and technologies with hands-on experience in designing and implementing cloud solutions.</p>\n<p>You will need to have a proven track record of building customer relationships, communicating clearly, and breaking down complex technical concepts for both technical and non-technical audiences.</p>\n<p>You will need to be familiar with NVIDIA GPUs typically used in AI/ML applications and associated technologies such as Infiniband and NVIDIA Collective Communications Library (NCCL).</p>\n<p>You will need to have experience with running large-scale Artificial Intelligence/Machine Learning (AI/ML) training and inference workloads on technologies such as Slurm and Kubernetes.</p>\n<p>Preferred qualifications include code contributions to open-source inference frameworks, experience with scripting and automation related to Kubernetes clusters and workloads, experience with building solutions across multi-cloud environments, and client or customer-facing publications/talks on latency, optimization, or advanced model-server architectures.</p>",
"url":"https://yubhub.co/jobs/job_588dfb0e-611","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4557835006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $220,000","x-skills-required":["Kubernetes","Cloud Computing","High-Performance Compute (HPC)","Distributed Systems","Cloud Infrastructure","Scalable Solutions","NVIDIA GPUs","Infiniband","NVIDIA Collective Communications Library (NCCL)","Slurm","Kubernetes Clusters"],"x-skills-preferred":["Code Contributions to Open-Source Inference Frameworks","Scripting and Automation Related to Kubernetes Clusters and Workloads","Building Solutions Across Multi-Cloud Environments","Client or Customer-Facing Publications/Talks on Latency, Optimization, or Advanced Model-Server Architectures"],"datePosted":"2026-04-18T15:57:29.779Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Cloud Computing, High-Performance Compute (HPC), Distributed Systems, Cloud Infrastructure, Scalable Solutions, NVIDIA GPUs, Infiniband, NVIDIA Collective Communications Library (NCCL), Slurm, Kubernetes Clusters, Code Contributions to Open-Source Inference Frameworks, Scripting and Automation Related to Kubernetes Clusters and Workloads, Building Solutions Across Multi-Cloud Environments, Client or Customer-Facing Publications/Talks on Latency, Optimization, or Advanced Model-Server 
Architectures","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":220000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f0f66ce3-d78"},"title":"Senior GenAI Research Engineer - Optimization and Kernels","description":"<p>As a research engineer on the Scaling team at Databricks, you will be responsible for keeping up with the latest developments in deep learning and advancing the scientific frontier by creating new techniques that go beyond the state of the art.</p>\n<p>You will work together on a collaborative team of researchers and engineers with diverse backgrounds and technical training. Your goal will be to make our customers successful in applying state-of-the-art LLMs and AI systems, and we encode our scientific expertise into our products to make that possible.</p>\n<p>Your responsibilities will include:</p>\n<ul>\n<li>Driving performance improvements through advanced optimization techniques including kernel fusion, mixed precision, memory layout optimization, tiling strategies, and tensorization for training-specific patterns</li>\n</ul>\n<ul>\n<li>Designing, implementing, and optimizing high-performance GPU kernels for training workloads (e.g., attention mechanisms, custom layers, gradient computation, activation functions) targeting NVIDIA architectures</li>\n</ul>\n<ul>\n<li>Designing and implementing distributed training frameworks for large language models, including parallelism strategies (data, tensor, pipeline, ZeRO-based) and optimized communication patterns for gradient synchronization and collective operations</li>\n</ul>\n<ul>\n<li>Profiling, debugging, and optimizing end-to-end training workflows to identify and resolve performance bottlenecks, applying memory optimization techniques like activation checkpointing, gradient sharding, and mixed precision training</li>\n</ul>\n<p>We 
look for candidates with a strong background in computer science or a related field, hands-on experience writing and tuning CUDA kernels for ML training applications, and a deep understanding of parallelism techniques and memory optimization strategies for large-scale model training.</p>","url":"https://yubhub.co/jobs/job_f0f66ce3-d78","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Databricks","sameAs":"https://databricks.com","logo":"https://logos.yubhub.co/databricks.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/databricks/jobs/8297797002","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$166,000-$225,000 USD","x-skills-required":["CUDA","NVIDIA GPU architecture","PyTorch","distributed training frameworks","parallelism techniques","memory optimization strategies"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:57:26.571Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"CUDA, NVIDIA GPU architecture, PyTorch, distributed training frameworks, parallelism techniques, memory optimization strategies","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":166000,"maxValue":225000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_53bd182c-902"},"title":"DSP Engineer, EW","description":"<p>Anduril Industries is seeking a highly skilled DSP Engineer to join our team. 
As a DSP Engineer, you will design, develop, and optimize digital signal processing algorithms and systems for radio direction finding and direction-of-arrival estimation in defense applications.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Collaborating with a multidisciplinary team of software and hardware engineers to develop software defined radios;</li>\n<li>Implementing high-performance, real-time signal processing chains on embedded and hardware platforms to support mission-critical sensing capabilities;</li>\n<li>Developing Modeling and Simulation (M&amp;S) code for RADAR techniques and data analysis including Hardware-in-the-Loop / Software-in-the-Loop (HIL/SIL) testing;</li>\n<li>Participating in laboratory and field testing of RF systems and techniques;</li>\n<li>Participating in the maturation of RF systems into deployable systems and products.</li>\n</ul>\n<p>Required qualifications include:</p>\n<ul>\n<li>5+ years of experience with a BSEE or related field;</li>\n<li>Strong foundation in digital signal processing, comms theory, and system engineering with emphasis on direction finding algorithm implementation;</li>\n<li>Hands-on experience with direction finding, angle-of-arrival estimation, and multi-antenna signal processing;</li>\n<li>Strong experience with DSP implementation for embedded devices including FPGA, Nvidia Jetson, and software defined radios;</li>\n<li>Strong knowledge of Python and MATLAB;</li>\n<li>Experience with CUDA or GPU accelerated frameworks like cuSignal is preferred;</li>\n<li>Familiar with deep learning algorithms;</li>\n<li>Familiar with wireless communication standards (Bluetooth, 3G/4G/5G, Wi-Fi, SINCGARS, MUOS, etc.).</li>\n</ul>\n<p>Preferred qualifications include:</p>\n<ul>\n<li>Masters or PhD degree in Electrical, Electronics, Computer Engineering, or related fields;</li>\n<li>Experience with ML frameworks such as TensorFlow and PyTorch;</li>\n<li>Defense, national security, or 
aerospace domain familiarity through industry or education;</li>\n<li>Extensive Digital Signal Processing (DSP) knowledge and experience;</li>\n<li>Expertise in Synthetic Aperture Radar (SAR) and/or Inverse SAR (ISAR): Image formation, waveforms, phenomenology, modeling and simulation.</li>\n</ul>","url":"https://yubhub.co/jobs/job_53bd182c-902","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anduril Industries","sameAs":"https://anduril.com","logo":"https://logos.yubhub.co/anduril.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/andurilindustries/jobs/5031495007","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$166,000-$220,000 USD","x-skills-required":["Digital Signal Processing","Comms Theory","System Engineering","Direction Finding Algorithm Implementation","Embedded Devices","FPGA","Nvidia Jetson","Software Defined Radios","Python","MATLAB","CUDA","GPU Accelerated Frameworks","Deep Learning Algorithms","Wireless Communication Standards"],"x-skills-preferred":["ML Frameworks","TensorFlow","PyTorch","Defense Domain","National Security","Aerospace Domain","Synthetic Aperture Radar","Inverse SAR","Image Formation","Waveforms","Phenomenology","Modeling and Simulation"],"datePosted":"2026-04-18T15:57:17.065Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Costa Mesa, California, United States"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Digital Signal Processing, Comms Theory, System Engineering, Direction Finding Algorithm Implementation, Embedded Devices, FPGA, Nvidia Jetson, Software Defined Radios, Python, MATLAB, CUDA, GPU Accelerated Frameworks, Deep Learning Algorithms, Wireless Communication Standards, ML Frameworks, TensorFlow, PyTorch, Defense 
Domain, National Security, Aerospace Domain, Synthetic Aperture Radar, Inverse SAR, Image Formation, Waveforms, Phenomenology, Modeling and Simulation","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":166000,"maxValue":220000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ac45e205-e7d"},"title":"Engineering Manager, Inference Routing and Performance","description":"<p><strong>About the role\\nEvery request that hits Claude , from claude.ai, the API, our cloud partners, or internal research , passes through a routing decision. Not a generic load balancer round-robin, but a decision that accounts for what&#39;s already cached where, which accelerator the request runs best on, and what else is in flight across the fleet.\\n\\nGet it right and you extract meaningfully more throughput from the same hardware. Get it wrong and you burn capacity, miss latency SLOs, or shed load that shouldn&#39;t have been shed.\\n\\nThe Inference Routing team owns this layer. We build the cluster-level routing and coordination plane for Anthropic&#39;s inference fleet , the system that sits between the API surface and the inference engines themselves, making fleet-wide efficiency decisions in real time.\\n\\nAs Anthropic moves from &quot;many independent inference replicas&quot; toward &quot;a single warehouse-scale computer running a coordinated program,&quot; Dystro is the coordination layer. 
This is a deeply technical team.\\n\\nThe engineers here design custom load-balancing algorithms, build quantitative models of system performance, debug latency spikes that cross kernel, network, and framework boundaries, and reason carefully about cache placement across thousands of accelerators.\\n\\nThey work shoulder-to-shoulder with teams that write kernels and ML framework internals.\\n\\nThe EM for this team doesn&#39;t need to write kernels , but they do need the systems depth to make architectural calls, evaluate deeply technical candidates, and spot when a proposed optimization will have second-order effects on the fleet.\\n\\nYou&#39;ll inherit a strong team of distributed-systems engineers, and you&#39;ll be accountable for two things that pull in different directions: shipping system-level performance improvements that measurably increase fleet throughput and efficiency, and running the team operationally so that deploys are safe, incidents are rare, and the teams who depend on Dystro can plan around you with confidence.\\n\\nThe job is holding both.\\n\\n## Representative work:\\nThings the Inference Routing EM actually spends time on:\\n- Deciding whether a proposed routing algorithm change is worth the deploy risk, given the modeled throughput gain and the blast radius if it regresses\\n- Sequencing a quarter where KV-cache offload, a new coordination protocol, and two model launches all compete for the same engineers\\n- Working through a persistent tail-latency regression with the team , walking down from fleet-level metrics to per-replica behavior to a root cause in the networking stack\\n- Building the case (with numbers) to peer teams for why a cross-team protocol change unlocks the next efficiency win\\n- Running the post-incident review after a cache-eviction bug caused a capacity event, and turning it into process changes that stick\\n- Interviewing a candidate who has built schedulers at supercomputing scale, and deciding whether they&#39;d 
be additive to a team that already goes deep\\n\\n## What you&#39;ll do:\\nDrive system-level performance\\n- Own the technical roadmap for cluster-level inference efficiency , routing decisions, cache placement and eviction, cross-replica coordination, and the protocols that keep routing and inference engines in sync\\n- Partner with the inference engine, kernels, and performance teams to identify fleet-level throughput and latency wins, then turn those into shipped improvements with measurable results\\n- Build the team&#39;s habit of quantitative performance modeling: claim a win only when you can measure it, and know before you ship what the expected effect is\\n\\nDeliver reliably and operate cleanly\\n- Set technical strategy for how routing evolves across heterogeneous hardware (GPUs, TPUs, Trainium) and across all our serving surfaces\\n- Run the team&#39;s operational backbone , on-call rotation, incident response, postmortem review, deploy safety , so the team can ship aggressively without the system becoming fragile\\n- Create clarity at a seam: Inference Routing sits between the API surface, the inference engines, and the cloud deployment teams. You&#39;ll make sure commitments are realistic, dependencies are understood, and nobody is surprised\\n\\nBuild and grow the team\\n- Develop and retain a strong existing team, and hire against the bar described above: people who can go to the OS and framework level when the problem demands it, and who care about production reliability\\n- Coach engineers through a roadmap where priorities shift with model launches, new hardware, and scaling demands. We pair a lot here , you&#39;ll help make that collaboration pattern productive\\n- Pick up slack when it matters. 
This is a small team in a critical path; sometimes the EM is the one unblocking a stuck deploy or synthesizing a design debate\\n\\n## You may be a good fit if you:\\n- Have 5+ years of engineering management experience, ideally with at least part of that leading teams on critical-path production infrastructure at scale\\n- Have a deep systems background , load balancing, scheduling, cache-coherent distributed state, high-performance networking, or similar. You need enough depth to make architectural calls about routing and efficiency, and to evaluate candidates who go to the kernel and framework level\\n- Have shipped performance improvements in large-scale systems and can explain, with numbers, what the impact was\\n- Have run production infrastructure with real operational stakes: on-call, incident response, capacity events, deploy discipline\\n- Are results-oriented with a bias toward impact, and comfortable working in a space where throughput, latency, stability, and feature velocity all pull in different directions\\n- Build strong relationships across team boundaries , this is a seam role, and much of the job is making sure other teams can rely on yours\\n- Are curious about machine learning systems. 
You don&#39;t need an ML research background, but you should want to learn how transformer inference actually works and how that shapes the systems problems\\n\\nStrong candidates may also have:\\n- Experience with LLM inference serving: KV caching, continuous batching, request scheduling, prefill/decode disaggregation\\n- Background in cluster schedulers, load balancers, service meshes, or coordination planes at scale\\n- Familiarity with heterogeneous accelerator fleets (GPU/TPU/Trainium) and how hardware differences affect workload placement\\n- Experience with GPU/accelerator programming, ML framework internals, or OS-level performance debugging; enough to follow and evaluate the technical work, not necessarily to do it daily\\n- Led teams at supercomputing or hyperscaler infrastructure scale\\n- Led teams through rapid-growth periods where hiring and onboarding competed with roadmap delivery\\n\\nThe annual compensation range for this role is listed below. For sales roles, the range provided is the role’s On Target Earnings (&quot;OTE&quot;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.\\nAnnual Salary: $405,000-$485,000 USD</strong></p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ac45e205-e7d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5155391008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$405,000-$485,000 USD","x-skills-required":["engineering management","distributed systems","load balancing","scheduling","cache-coherent distributed state","high-performance networking","machine learning 
systems"],"x-skills-preferred":["LLM inference serving","cluster schedulers","load balancers","service meshes","coordination planes","heterogeneous accelerator fleets","GPU/TPU/Trainium","GPU/accelerator programming","ML framework internals","OS-level performance debugging"],"datePosted":"2026-04-18T15:56:48.587Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"engineering management, distributed systems, load balancing, scheduling, cache-coherent distributed state, high-performance networking, machine learning systems, LLM inference serving, cluster schedulers, load balancers, service meshes, coordination planes, heterogeneous accelerator fleets, GPU/TPU/Trainium, GPU/accelerator programming, ML framework internals, OS-level performance debugging","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c7de81b4-bec"},"title":"Security Engineer, Infrastructure","description":"<p>We are seeking a highly skilled Infrastructure Security Engineer to join our team. This role is integral to ensuring the security and integrity of our platform.</p>\n<p>You will be responsible for securing large cloud environments, orchestrating and securing various compute clusters, and reviewing infrastructure as code. 
Your expertise in cloud security, infrastructure automation, and advanced security practices will be essential in maintaining and enhancing our security posture.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Securing infrastructure across large cloud hosting providers (e.g., AWS, Azure, GCP).</li>\n<li>Implementing and maintaining robust security configurations and policies for cloud environments.</li>\n<li>Conducting regular security assessments and audits of infrastructure to identify vulnerabilities and areas for improvement.</li>\n<li>Developing and enforcing security best practices for infrastructure automation and orchestration.</li>\n<li>Collaborating with Developer Experience, IT, and product teams to integrate security into all stages of the infrastructure lifecycle.</li>\n<li>Reviewing and securing infrastructure as code (e.g., Terraform, CloudFormation).</li>\n<li>Educating and mentoring team members on infrastructure security best practices and emerging threats.</li>\n</ul>\n<p>Ideally, you&#39;d have:</p>\n<ul>\n<li>Proven experience as a Security Engineer with a focus on product security.</li>\n<li>Proficiency in NodeJS, TypeScript, and Kubernetes.</li>\n<li>Experience with orchestrating and securing GPU clusters.</li>\n<li>Proficiency in infrastructure as code tools such as Terraform and CloudFormation.</li>\n<li>Excellent communication skills, with the ability to clearly explain technical concepts and their implications to both technical and non-technical stakeholders.</li>\n<li>Demonstrated ability to influence security strategies and drive improvements within an organization.</li>\n<li>Relevant security certifications (e.g., AWS Certified Security Specialty, Certified Cloud Security Professional) are a plus.</li>\n<li>Experience in a senior or lead security role is preferred.</li>\n</ul>\n<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. 
The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_c7de81b4-bec","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://www.scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4646888005","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$237,600-$297,000 USD","x-skills-required":["cloud security","infrastructure automation","advanced security practices","NodeJS","TypeScript","Kubernetes","Terraform","CloudFormation"],"x-skills-preferred":["orchestrating and securing GPU clusters","relevant security certifications"],"datePosted":"2026-04-18T15:56:27.426Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York, NY; San Francisco, CA; Seattle, WA; Washington, DC"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud security, infrastructure automation, advanced security practices, NodeJS, TypeScript, Kubernetes, Terraform, CloudFormation, orchestrating and securing GPU clusters, relevant security certifications","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":237600,"maxValue":297000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d799d883-0dd"},"title":"Solutions Architect- Networking","description":"<p>As a Solutions Architect at CoreWeave, you will play a vital role in leading 
innovation at every turn. You will have the opportunity to demonstrate thought leadership and engage hands-on throughout our customers&#39; entire lifecycle, from establishing their Kubernetes environment to developing proofs of concept, onboarding, and optimizing workloads.</p>\n<p>In this role, you will:</p>\n<ul>\n<li>Serve as the primary technical point of contact for customers, establishing strong technical relationships and ensuring their success with CoreWeave&#39;s cloud infrastructure offerings, focusing on networking technologies within high-performance compute (HPC) environments.</li>\n<li>Collaborate closely with customers to understand their unique business needs and create, prototype, and deploy tailored solutions that align with their requirements.</li>\n<li>Lead proof of concept initiatives to showcase the value and viability of CoreWeave&#39;s solutions within specific environments.</li>\n<li>Drive technical leadership and direction during customer meetings, presentations, and workshops, addressing any technical queries or concerns that arise.</li>\n<li>Act as a virtual member of CoreWeave&#39;s Networking product and engineering teams, identifying opportunities for product enhancement and collaborating with engineers to implement your suggestions.</li>\n<li>Offer valuable insights on product features, functionality, and performance, contributing regularly to discussions about product strategy and architecture.</li>\n<li>Conduct periodic technical reviews and assessments of customer workloads, pinpointing opportunities for workload optimization and suggesting suitable solutions.</li>\n<li>Stay informed of the latest developments and trends in Kubernetes, cloud computing, and infrastructure, sharing your thought leadership with customers and internal stakeholders.</li>\n<li>Lead the prototyping and initiation of research and development efforts for emerging products and solutions, delivering prototypes and key insights for internal consumption.</li>\n<li>Represent CoreWeave at conferences and industry events, with occasional travel as required.</li>\n</ul>\n<p>Who You Are:</p>\n<ul>\n<li>B.S. in Computer Science or a related technical discipline, or equivalent experience.</li>\n<li>7+ years of proven experience as a Solutions Architect, engineer, researcher, or technical account manager in cloud infrastructure, focused on building distributed systems or HPC/cloud services, with expertise in infrastructure networking.</li>\n<li>Fluency in cloud computing concepts, architecture, and technologies, with hands-on experience designing and implementing cloud solutions.</li>\n<li>Proven track record of building customer relationships, communicating clearly, and breaking down complex technical concepts for both technical and non-technical audiences.</li>\n<li>Expertise with a broad range of networking technologies and topics, with the familiarity to understand customer needs and use cases as they relate to securing and enabling high-performance networking environments.</li>\n<li>Experience managing infrastructure networking, Kubernetes CSI management, and private networking concepts.</li>\n<li>Familiarity with NVIDIA GPUs typically used in AI/ML applications and associated technologies such as InfiniBand and the NVIDIA Collective Communications Library (NCCL).</li>\n</ul>\n<p>Preferred:</p>\n<ul>\n<li>Code contributions to open-source inference frameworks.</li>\n<li>Experience with scripting and automation related to network technologies.</li>\n<li>Experience building solutions across multi-cloud environments.</li>\n<li>Client- or customer-facing publications/talks on latency, optimization, or advanced model-server architectures.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d799d883-0dd","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4568528006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $220,000","x-skills-required":["cloud computing","Kubernetes","infrastructure networking","high-performance computing","networking technologies","NVIDIA GPUs","Infiniband","NVIDIA Collective Communications Library (NCCL)"],"x-skills-preferred":["open-source inference frameworks","scripting and automation","multi-cloud environments","latency, optimization, or advanced model-server architectures"],"datePosted":"2026-04-18T15:56:27.053Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud computing, Kubernetes, infrastructure networking, high-performance computing, networking technologies, NVIDIA GPUs, Infiniband, NVIDIA Collective Communications Library (NCCL), open-source inference frameworks, scripting and automation, multi-cloud environments, latency, optimization, or advanced model-server architectures","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":220000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_16599c27-a87"},"title":"Senior Infrastructure Engineer/SRE","description":"<p>We&#39;re on a mission to revolutionize the workforce with AI. 
As a member of the infrastructure team, you&#39;ll design, build, and advance our core infrastructure that allows the engineering team to execute quickly, productively, and securely.</p>\n<p>You&#39;ll partner with engineers to build dev tools that empower developer workflows and deployment infrastructure. Ensure reliability of multi-cloud Kubernetes clusters and pipelines. Implement metrics, logging, analytics, and alerting for performance and security across all endpoints and applications. Automate operations and engineering work so we can spend energy where it matters.</p>\n<p>You&#39;ll also build machine learning infrastructure that enables AI teams to train, test, and deploy on large-scale datasets.</p>\n<p>We&#39;re looking for someone with 5+ years of experience in DevOps, Site Reliability Engineering, Production Engineering, or an equivalent field. You should have deep proficiency with coding languages such as Golang or Python, and deep familiarity with container-related security best practices. You should also have production experience with Kubernetes and a deep understanding of the Kubernetes ecosystem, including popular open-source tooling such as cert-manager and external-dns. 
Experience with GPU-enabled clusters is a bonus.</p>\n<p>Perks &amp; Benefits:</p>\n<ul>\n<li>Comprehensive medical, dental, and vision coverage with plans to fit you and your family</li>\n<li>Flexible PTO to take the time you need, when you need it</li>\n<li>Paid parental leave for all new parents welcoming a new child</li>\n<li>Retirement savings plan to help you plan for the future</li>\n<li>Remote work setup budget to help you create a productive home office</li>\n<li>Monthly wellness and communication stipend to keep you connected and balanced</li>\n<li>In-office meal program and commuter benefits provided for onsite employees</li>\n</ul>\n<p>Compensation at Cresta:</p>\n<p>Cresta&#39;s approach to compensation is simple: recognize impact, reward excellence, and invest in our people. We offer competitive, location-based pay that reflects the market and what each individual brings to the table. The posted base salary range represents what we expect to pay for this role in a given location. Final offers are shaped by factors like experience, skills, education, and geography. 
In addition to base pay, total compensation includes equity and a comprehensive benefits package for you and your family.</p>\n<p>OTE Range: $205,000–$270,000 + Offers Equity</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_16599c27-a87","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Cresta","sameAs":"https://www.cresta.ai/","logo":"https://logos.yubhub.co/cresta.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/cresta/jobs/5137153008","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$205,000–$270,000","x-skills-required":["Golang","Python","Kubernetes","cert-manager","external-dns","GPU-enabled clusters","Terraform","CloudFormation","AWS","IAM","S3","EC2","EKS","PostgreSQL","GitOps","Flux","Argo","CI/CD","GitHub Actions"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:55:52.459Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"United States (Remote)"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Golang, Python, Kubernetes, cert-manager, external-dns, GPU-enabled clusters, Terraform, CloudFormation, AWS, IAM, S3, EC2, EKS, PostgreSQL, GitOps, Flux, Argo, CI/CD, GitHub Actions","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":205000,"maxValue":270000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_24176cb8-311"},"title":"Member of Technical Staff - Compute Infrastructure","description":"<p>We&#39;re seeking a highly skilled Member of Technical Staff to join our Compute Infrastructure team. 
As a key member of this team, you will design, build, and operate massive-scale clusters and orchestration platforms that power frontier AI training, inference, and agent workloads at unprecedented scale.</p>\n<p>In this role, you will push the boundaries of container orchestration far beyond existing systems like Kubernetes, manage exascale compute resources, optimize for high-performance training runs and production serving, and collaborate closely with research and systems teams to deliver reliable, ultra-scalable infrastructure that enables xAI&#39;s next-generation models and applications.</p>\n<p>Responsibilities include building and managing massive-scale clusters; designing, developing, and extending an in-house container orchestration platform; collaborating with research teams to architect and optimize compute clusters; profiling, debugging, and resolving complex system-level performance bottlenecks; and owning end-to-end infrastructure initiatives.</p>\n<p>To succeed in this role, you will need deep expertise in virtualization technologies and advanced containerization/sandboxing, strong proficiency in systems programming languages such as C/C++ and Rust, and a proven track record of profiling, debugging, and optimizing complex system-level performance issues.</p>\n<p>Preferred skills include Linux kernel development, hypervisor extensions, or low-level system programming for compute-intensive workloads; experience operating or designing large-scale AI training/inference clusters; and familiarity with performance tools, tracing, and debugging in production distributed environments.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_24176cb8-311","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/5052040007","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000 - $440,000 USD","x-skills-required":["Deep expertise in virtualization technologies (KVM, Xen, QEMU) and advanced containerization/sandboxing (Kata, Firecracker, gVisor, Sysbox, or equivalent)","Strong proficiency in systems programming languages such as C/C++ and Rust","Proven track record profiling, debugging, and optimizing complex system-level performance issues, with deep knowledge of Linux kernel internals, resource management, scheduling, memory management, and low-level engineering","Hands-on experience building or significantly enhancing distributed compute platforms, orchestration systems, or high-performance infrastructure at scale"],"x-skills-preferred":["Experience in Linux kernel development, hypervisor extensions, or low-level system programming for compute-intensive workloads","Proven track record operating or designing large-scale AI training/inference clusters (GPU/TPU scale)","Experience with custom runtimes, isolation techniques, or bespoke platforms for specialized AI compute","Familiarity with performance tools, tracing, and debugging in production distributed environments"],"datePosted":"2026-04-18T15:55:50.213Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo Alto, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Deep expertise in virtualization technologies (KVM, Xen, QEMU) and advanced containerization/sandboxing (Kata, Firecracker, gVisor, Sysbox, or equivalent), Strong proficiency in systems programming languages such as 
C/C++ and Rust, Proven track record profiling, debugging, and optimizing complex system-level performance issues, with deep knowledge of Linux kernel internals, resource management, scheduling, memory management, and low-level engineering, Hands-on experience building or significantly enhancing distributed compute platforms, orchestration systems, or high-performance infrastructure at scale, Experience in Linux kernel development, hypervisor extensions, or low-level system programming for compute-intensive workloads, Proven track record operating or designing large-scale AI training/inference clusters (GPU/TPU scale), Experience with custom runtimes, isolation techniques, or bespoke platforms for specialized AI compute, Familiarity with performance tools, tracing, and debugging in production distributed environments","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":440000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_6d4292d1-227"},"title":"Software Engineer, Sandboxing (Systems)","description":"<p>We are seeking a Linux OS and System Programming Subject Matter Expert to join our Infrastructure team. 
In this role, you&#39;ll work on accelerating and optimizing our virtualization and VM workloads that power our AI infrastructure.</p>\n<p>Your expertise in low-level system programming, kernel optimization, and virtualization technologies will be crucial in ensuring Anthropic can scale our compute infrastructure efficiently and reliably for training and serving frontier AI models.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Optimize our virtualization stack, improving performance, reliability, and efficiency of our VM environments</li>\n<li>Design and implement kernel modules, drivers, and system-level components to enhance our compute infrastructure</li>\n<li>Investigate and resolve performance bottlenecks in virtualized environments</li>\n<li>Collaborate with cloud engineering teams to optimize interactions between our workloads and underlying hardware</li>\n<li>Develop tooling for monitoring and improving virtualization performance</li>\n<li>Work with our ML engineers to understand their computational needs and optimize our systems accordingly</li>\n<li>Contribute to the design and implementation of our next-generation compute infrastructure</li>\n<li>Share knowledge with team members on low-level systems programming and Linux kernel internals</li>\n<li>Partner with cloud providers to influence hardware and platform features for AI workloads</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have experience with Linux kernel development, system programming, or related low-level software engineering</li>\n<li>Understand virtualization technologies (KVM, Xen, QEMU, etc.) and their performance characteristics</li>\n<li>Have experience optimizing system performance for compute-intensive workloads</li>\n<li>Are familiar with modern CPU architectures and memory systems</li>\n<li>Have strong C/C++ programming skills and ideally experience with systems languages like Rust</li>\n<li>Understand Linux resource management, scheduling, and memory management</li>\n<li>Have experience profiling and debugging system-level performance issues</li>\n<li>Are comfortable diving into unfamiliar codebases and technical domains</li>\n<li>Are results-oriented, with a bias towards practical solutions and measurable impact</li>\n<li>Care about the societal impacts of AI and are passionate about building safe, reliable systems</li>\n</ul>\n<p>Strong candidates may also have experience with:</p>\n<ul>\n<li>GPU virtualization and acceleration technologies</li>\n<li>Cloud infrastructure at scale (AWS, GCP)</li>\n<li>Container technologies and their underlying implementation (Docker, containerd, runc, OCI)</li>\n<li>eBPF programming and kernel tracing tools</li>\n<li>OS-level security hardening and isolation techniques</li>\n<li>Developing custom scheduling algorithms for specialized workloads</li>\n<li>Performance optimization for ML/AI specific workloads</li>\n<li>Network stack optimization and high-performance networking</li>\n<li>Experience with TPUs, custom ASICs, or other ML accelerators</li>\n</ul>\n<p>Representative projects:</p>\n<ul>\n<li>Optimizing kernel parameters and VM configurations to reduce inference latency for large language models</li>\n<li>Implementing custom memory management schemes for large-scale distributed training</li>\n<li>Developing specialized I/O schedulers to prioritize ML workloads</li>\n<li>Creating lightweight virtualization solutions tailored for AI inference</li>\n<li>Building monitoring and instrumentation tools to identify system-level bottlenecks</li>\n<li>Enhancing communication between VMs for distributed training workloads</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML 
job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_6d4292d1-227","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5025591008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$300,000-$405,000 USD","x-skills-required":["Linux kernel development","System programming","Virtualization technologies","C/C++ programming","Rust programming","Linux resource management","Scheduling","Memory management"],"x-skills-preferred":["GPU virtualization","Cloud infrastructure","Container technologies","eBPF programming","Kernel tracing tools","OS-level security hardening","Custom scheduling algorithms","Performance optimization for ML/AI","Network stack optimization"],"datePosted":"2026-04-18T15:55:40.026Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux kernel development, System programming, Virtualization technologies, C/C++ programming, Rust programming, Linux resource management, Scheduling, Memory management, GPU virtualization, Cloud infrastructure, Container technologies, eBPF programming, Kernel tracing tools, OS-level security hardening, Custom scheduling algorithms, Performance optimization for ML/AI, Network stack optimization","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":300000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ce9f3d34-c8a"},"title":"Senior / Staff+ Software Engineer, Voice Platform","description":"<p>We&#39;re 
building the infrastructure that lets people talk to Claude: real-time, bidirectional voice conversations that feel natural, responsive, and safe. This is foundational work for how millions of people will interact with AI.</p>\n<p>The Voice Platform team designs and operates the serving systems, streaming pipelines, and APIs that bring Anthropic&#39;s audio models from research into production across Claude.ai, our mobile apps, and the Anthropic API. You&#39;ll work at the intersection of real-time media, low-latency inference, and distributed systems, building infrastructure where every millisecond of latency is felt by the user.</p>\n<p>We partner closely with the Audio research team, who train the speech understanding and generation models, and with product teams shipping voice experiences to users. Your job is to make those models fast, reliable, and delightful to talk to at scale.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and build the real-time streaming infrastructure that powers voice conversations with Claude: ingesting microphone audio, orchestrating model inference, and streaming synthesized speech back with minimal latency</li>\n</ul>\n<ul>\n<li>Build low-latency serving systems for speech models, optimizing time-to-first-audio and end-to-end conversational responsiveness</li>\n</ul>\n<ul>\n<li>Develop the public and internal APIs that expose voice capabilities to Claude.ai, mobile clients, and third-party developers</li>\n</ul>\n<ul>\n<li>Own the audio transport layer (codecs, jitter buffers, adaptive bitrate, packet loss recovery) so conversations stay smooth across unreliable networks</li>\n</ul>\n<ul>\n<li>Build observability and quality-measurement systems for voice: latency distributions, audio quality metrics, interruption handling, and turn-taking accuracy</li>\n</ul>\n<ul>\n<li>Partner with Audio research to move new model architectures from experiment to production, and feed real-world performance data back into 
research</li>\n</ul>\n<ul>\n<li>Collaborate with mobile and product engineering on client-side audio capture, playback, and the end-to-end user experience</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 6+ years of experience building distributed systems, real-time infrastructure, or platform services at scale</li>\n</ul>\n<ul>\n<li>Have shipped production systems where latency is measured in tens of milliseconds and users notice when you miss</li>\n</ul>\n<ul>\n<li>Are comfortable working across the stack, from transport protocols and serving infrastructure up to the APIs product teams build on</li>\n</ul>\n<ul>\n<li>Are results-oriented, with a bias toward flexibility and impact</li>\n</ul>\n<ul>\n<li>Pick up slack, even if it goes outside your job description</li>\n</ul>\n<ul>\n<li>Enjoy pair programming (we love to pair!)</li>\n</ul>\n<ul>\n<li>Care about the societal impacts of voice AI and want to help shape how these systems are developed responsibly</li>\n</ul>\n<ul>\n<li>Are comfortable with ambiguity: voice is a fast-moving space, and you&#39;ll help define the architecture as we learn what works</li>\n</ul>\n<p>Strong candidates may also have experience with:</p>\n<ul>\n<li>Real-time media protocols and stacks: WebRTC, RTP, gRPC bidirectional streaming, or WebSockets at scale</li>\n</ul>\n<ul>\n<li>Audio engineering fundamentals: codecs (Opus, AAC), voice activity detection, echo cancellation, jitter buffering, or audio DSP</li>\n</ul>\n<ul>\n<li>Low-latency ML inference serving, streaming model outputs, or GPU-based serving infrastructure</li>\n</ul>\n<ul>\n<li>Telephony, live streaming, video conferencing, or voice assistant platforms</li>\n</ul>\n<ul>\n<li>Mobile audio pipelines on iOS (AVAudioEngine, AudioUnits) or Android (Oboe, AAudio)</li>\n</ul>\n<ul>\n<li>Working alongside ML researchers to productionize models; speech experience is a plus but not required</li>\n</ul>\n<p>Representative projects:</p>\n<ul>\n<li>Driving 
time-to-first-audio below human perceptual thresholds by co-designing the serving pipeline with the Audio research team</li>\n</ul>\n<ul>\n<li>Building a streaming inference orchestrator that interleaves speech recognition, LLM reasoning, and speech synthesis with overlapping execution</li>\n</ul>\n<ul>\n<li>Designing the voice mode API surface for the Anthropic API so developers can build their own voice agents on Claude</li>\n</ul>\n<ul>\n<li>Implementing graceful barge-in and interruption handling so users can cut Claude off mid-sentence naturally</li>\n</ul>\n<ul>\n<li>Instrumenting end-to-end audio quality metrics and building dashboards that catch regressions before users do</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ce9f3d34-c8a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5172245008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$485,000 USD","x-skills-required":["Real-time media protocols and stacks","Audio engineering fundamentals","Low-latency ML inference serving","Distributed systems","API design"],"x-skills-preferred":["WebRTC","RTP","gRPC bidirectional streaming","WebSockets","Opus","AAC","voice activity detection","echo cancellation","jitter buffering","audio DSP","GPU-based serving infrastructure","telephony","live streaming","video conferencing","voice assistant platforms","mobile audio pipelines on iOS","Android","pair programming"],"datePosted":"2026-04-18T15:55:09.622Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, 
WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Real-time media protocols and stacks, Audio engineering fundamentals, Low-latency ML inference serving, Distributed systems, API design, WebRTC, RTP, gRPC bidirectional streaming, WebSockets, Opus, AAC, voice activity detection, echo cancellation, jitter buffering, audio DSP, GPU-based serving infrastructure, telephony, live streaming, video conferencing, voice assistant platforms, mobile audio pipelines on iOS, Android, pair programming","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_cba88898-896"},"title":"Research Engineer, Infrastructure, Kernels","description":"<p>We&#39;re looking for an infrastructure research engineer to design, optimize, and maintain the compute foundations that power large-scale language model training. You will develop high-performance ML kernels (e.g., CUDA, CuTe, Triton), enable efficient low-precision arithmetic, and improve the distributed compute stack that makes training large models possible.</p>\n<p>This role is perfect for an engineer who enjoys working close to the metal and across the research boundary. You&#39;ll collaborate with researchers and systems architects to bridge algorithmic design with hardware efficiency. 
You&#39;ll prototype new kernel implementations, profile performance across hardware generations, and help define the numerical and parallelism strategies that determine how we scale next-generation AI systems.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and implement custom ML kernels (e.g., CUDA, CuTe, Triton) for core LLM operations such as attention, matrix multiplication, gating, and normalization, optimized for modern GPU and accelerator architectures.</li>\n<li>Design and think through compute primitives to reduce memory bandwidth bottlenecks and improve kernel compute efficiency.</li>\n<li>Collaborate with research teams to align kernel-level optimizations with model architecture and algorithmic goals.</li>\n<li>Develop and maintain a library of reusable kernels and performance benchmarks that serve as the foundation for internal model training.</li>\n<li>Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources.</li>\n<li>Document and share insights through internal talks, technical papers, or open-source contributions to strengthen the broader ML systems community.</li>\n</ul>\n<p><strong>Skills and Qualifications</strong></p>\n<p>Minimum qualifications:</p>\n<ul>\n<li>Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar.</li>\n<li>Strong engineering skills, with the ability to contribute performant, maintainable code and debug in complex codebases.</li>\n<li>Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.</li>\n<li>Thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts.</li>\n<li>A bias for action with a mindset to take initiative to work across different stacks and different teams where you spot the opportunity to make sure 
something ships.</li>\n<li>Proficiency in CUDA, CuTe, Triton, or other GPU programming frameworks.</li>\n<li>Demonstrated ability to analyze, profile, and optimize compute-intensive workloads.</li>\n</ul>\n<p>Preferred qualifications:</p>\n<ul>\n<li>Experience training or supporting large-scale language models with tens of billions of parameters or more.</li>\n<li>Track record of improving research productivity through infrastructure design or process improvements.</li>\n<li>Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators.</li>\n<li>Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks.</li>\n<li>Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM).</li>\n<li>Contributions to open-source GPU, ML systems, or compiler optimization projects.</li>\n<li>Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI infrastructure.</li>\n</ul>","url":"https://yubhub.co/jobs/job_cba88898-896","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Thinking Machines Lab","sameAs":"https://thinkingmachines.ai/","logo":"https://logos.yubhub.co/thinkingmachines.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/thinkingmachines/jobs/5013934008","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000 - $475,000 USD","x-skills-required":["CUDA","CuTe","Triton","GPU programming frameworks","Deep learning frameworks (e.g., PyTorch, JAX)","Computer science","Electrical engineering","Statistics","Machine learning","Physics","Robotics"],"x-skills-preferred":["Experience training or supporting large-scale language models with 
tens of billions of parameters or more","Track record of improving research productivity through infrastructure design or process improvements","Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators","Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks","Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM)","Contributions to open-source GPU, ML systems, or compiler optimization projects","Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI infrastructure"],"datePosted":"2026-04-18T15:54:38.498Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"CUDA, CuTe, Triton, GPU programming frameworks, Deep learning frameworks (e.g., PyTorch, JAX), Computer science, Electrical engineering, Statistics, Machine learning, Physics, Robotics, Experience training or supporting large-scale language models with tens of billions of parameters or more, Track record of improving research productivity through infrastructure design or process improvements, Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators, Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks, Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM), Contributions to open-source GPU, ML systems, or compiler optimization projects, Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI 
infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":475000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f2196e99-854"},"title":"Software Engineer - GenAI inference","description":"<p>As a software engineer for GenAI inference, you will help design, develop, and optimize the inference engine that powers Databricks&#39; Foundation Model API. You&#39;ll work at the intersection of research and production, ensuring our large language model (LLM) serving systems are fast, scalable, and efficient.</p>\n<p>Your work will touch the full GenAI inference stack, from kernels and runtimes to orchestration and memory management. You will contribute to the design and implementation of the inference engine, and collaborate on the model-serving stack optimized for large-scale LLM inference.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Collaborating with researchers to bring new model architectures or features (sparsity, activation compression, mixture-of-experts) into the engine</li>\n<li>Optimizing for latency, throughput, memory efficiency, and hardware utilization across GPUs and accelerators</li>\n<li>Building and maintaining instrumentation, profiling, and tracing tooling to uncover bottlenecks and guide optimizations</li>\n<li>Developing and enhancing scalable routing, batching, scheduling, memory management, and dynamic loading mechanisms for inference workloads</li>\n<li>Supporting reliability, reproducibility, and fault tolerance in the inference pipelines, including A/B launches, rollback, and model versioning</li>\n<li>Integrating with federated, distributed inference infrastructure – orchestrating across nodes, balancing load, and handling communication overhead</li>\n<li>Collaborating cross-functionally with platform engineers, cloud infrastructure, and security/compliance 
teams</li>\n<li>Documenting and sharing learnings, contributing to internal best practices and open-source efforts when possible</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>BS/MS/PhD in Computer Science or a related field</li>\n<li>Strong software engineering background (3+ years or equivalent) in performance-critical systems</li>\n<li>Solid understanding of ML inference internals: attention, MLPs, recurrent modules, quantization, sparse operations, etc.</li>\n<li>Hands-on experience with CUDA, GPU programming, and key libraries (cuBLAS, cuDNN, NCCL, etc.)</li>\n<li>Comfortable designing and operating distributed systems, including RPC frameworks, queuing, RPC batching, sharding, memory partitioning</li>\n<li>Demonstrated ability to uncover and solve performance bottlenecks across layers (kernel, memory, networking, scheduler)</li>\n<li>Experience building instrumentation, tracing, and profiling tools for ML models</li>\n<li>Ability to work closely with ML researchers and translate novel model ideas into production systems</li>\n<li>Ownership mindset and eagerness to dive deep into complex system challenges</li>\n<li>Bonus: published research or open-source contributions in ML systems, inference optimization, or model serving</li>\n</ul>","url":"https://yubhub.co/jobs/job_f2196e99-854","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Databricks","sameAs":"https://databricks.com","logo":"https://logos.yubhub.co/databricks.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/databricks/jobs/8202670002","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$142,200-$204,600 USD","x-skills-required":["software engineering","performance-critical systems","ML inference internals","CUDA","GPU programming","distributed systems","RPC frameworks","queuing","RPC 
batching","sharding","memory partitioning","instrumentation","tracing","profiling tools","ML researchers","complex system challenges"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:54:17.777Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, performance-critical systems, ML inference internals, CUDA, GPU programming, distributed systems, RPC frameworks, queuing, RPC batching, sharding, memory partitioning, instrumentation, tracing, profiling tools, ML researchers, complex system challenges","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":142200,"maxValue":204600,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c1903386-87b"},"title":"Staff Infrastructure Software Engineer (Kubernetes)","description":"<p>As a member of the infrastructure team, you will design, build, and advance our core infrastructure that allows the engineering team to execute quickly, productively, and securely.</p>\n<p>You will partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.</p>\n<p>Ensure reliability of multi-cloud Kubernetes clusters and pipelines.</p>\n<p>Metrics, logging, analytics, and alerting for performance and security across all endpoints and applications.</p>\n<p>Infrastructure-as-code deployment tooling and supporting services on multiple cloud providers.</p>\n<p>Automate operations and engineering.</p>\n<p>Focus on automation so we can spend energy where it matters.</p>\n<p>Building machine learning infrastructure that enables AI teams to train, test, and deploy on large-scale datasets.</p>\n<p>We are looking for a highly skilled engineer with 5+ years of experience in DevOps, Site 
Reliability Engineering, Production Engineering, or equivalent field.</p>\n<p>Deep proficiency with coding languages such as Golang or Python.</p>\n<p>Deep familiarity with container-related security best practices.</p>\n<p>Production experience working with Kubernetes, and a deep understanding of the Kubernetes ecosystem, including popular open-source tooling such as cert-manager or external-dns.</p>\n<p>Experience with GPU-enabled clusters is a bonus.</p>\n<p>Production experience with Kubernetes templating tools such as Helm or Kustomize.</p>\n<p>Production experience with IAC tools such as Terraform or CloudFormation.</p>\n<p>Production experience working with AWS and services such as IAM, S3, EC2, and EKS.</p>\n<p>Production experience with other cloud providers such as Google Cloud and Azure is a bonus.</p>\n<p>Production experience with database software such as PostgreSQL.</p>\n<p>Experience with GitOps tooling such as Flux or Argo.</p>\n<p>Experience with CI/CD such as GitHub Actions.</p>\n<p>Perks and benefits include paid parental leave, monthly health and wellness allowance, and PTO.</p>\n<p>Compensation includes a base salary, equity, and a variety of benefits.</p>","url":"https://yubhub.co/jobs/job_c1903386-87b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Cresta","sameAs":"https://www.cresta.ai/","logo":"https://logos.yubhub.co/cresta.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/cresta/jobs/4535898008","x-work-arrangement":"remote","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Golang","Python","Kubernetes","cert-manager","external-dns","GPU-enabled clusters","Helm","Kustomize","Terraform","CloudFormation","AWS","IAM","S3","EC2","EKS","Google Cloud","Azure","PostgreSQL","GitOps","Flux","Argo","CI/CD","GitHub 
Actions"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:53:57.717Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Germany (Remote)"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Golang, Python, Kubernetes, cert-manager, external-dns, GPU-enabled clusters, Helm, Kustomize, Terraform, CloudFormation, AWS, IAM, S3, EC2, EKS, Google Cloud, Azure, PostgreSQL, GitOps, Flux, Argo, CI/CD, GitHub Actions"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_26212e9e-5a8"},"title":"Infrastructure Engineer/SRE","description":"<p>We&#39;re seeking an experienced Infrastructure Engineer/SRE to join our engineering team. As a key member of our infrastructure team, you will be responsible for designing, building, and advancing our core infrastructure that allows the engineering team to execute quickly, productively, and securely.</p>\n<p>We offer a collaborative but highly autonomous working environment in which each member has a defined role with clear expectations, as well as the freedom to pursue projects they find interesting.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.</li>\n<li>Ensure reliability of multi-cloud Kubernetes clusters and pipelines.</li>\n<li>Provide metrics, logging, analytics, and alerting for performance and security across all endpoints and applications.</li>\n<li>Build infrastructure-as-code deployment tooling and supporting services on multiple cloud providers.</li>\n<li>Automate operations and engineering. 
Focus on automation so we can spend energy where it matters.</li>\n<li>Building machine learning infrastructure that enables AI teams to train, test, and deploy on large-scale datasets.</li>\n</ul>\n<p>What we are looking for:</p>\n<ul>\n<li>5+ years experience in DevOps, Site Reliability Engineering, Production Engineering, or equivalent field.</li>\n<li>Deep proficiency with coding languages such as Golang or Python.</li>\n<li>Deep familiarity with container-related security best practices.</li>\n<li>Production experience working with Kubernetes, and a deep understanding of the Kubernetes ecosystem, including popular open-source tooling such as cert-manager or external-dns.</li>\n<li>Experience with GPU-enabled clusters is a bonus.</li>\n<li>Production experience with Kubernetes templating tools such as Helm or Kustomize.</li>\n<li>Production experience with IAC tools such as Terraform or CloudFormation.</li>\n<li>Production experience working with AWS and services such as IAM, S3, EC2, and EKS.</li>\n<li>Production experience with other cloud providers such as Google Cloud and Azure is a bonus.</li>\n<li>Production experience with database software such as PostgreSQL.</li>\n<li>Experience with GitOps tooling such as Flux or Argo.</li>\n<li>Experience with CI/CD such as GitHub Actions.</li>\n</ul>\n<p>Perks &amp; Benefits:</p>\n<ul>\n<li>We offer Cresta employees a variety of medical benefits designed to fit your stage of life.</li>\n<li>Flexible vacation time to promote a healthy work-life blend.</li>\n<li>Paid parental leave to support you and your family.</li>\n</ul>\n<p>Compensation for this position includes a base salary, equity, and a variety of benefits. 
Actual base salaries will be based on candidate-specific factors, including experience, skillset, and location, and local minimum pay requirements as applicable.</p>","url":"https://yubhub.co/jobs/job_26212e9e-5a8","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Cresta","sameAs":"https://www.cresta.ai/","logo":"https://logos.yubhub.co/cresta.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/cresta/jobs/5113847008","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Golang","Python","Kubernetes","cert-manager","external-dns","GPU-enabled clusters","Helm","Kustomize","Terraform","CloudFormation","AWS","IAM","S3","EC2","EKS","Google Cloud","Azure","PostgreSQL","GitOps","Flux","Argo","CI/CD","GitHub Actions"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:53:55.875Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Australia (Remote)"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Golang, Python, Kubernetes, cert-manager, external-dns, GPU-enabled clusters, Helm, Kustomize, Terraform, CloudFormation, AWS, IAM, S3, EC2, EKS, Google Cloud, Azure, PostgreSQL, GitOps, Flux, Argo, CI/CD, GitHub Actions"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_8f6ef3b1-c9b"},"title":"Technical Program Manager, Compute","description":"<p>As a Technical Program Manager on the Compute team, you will help drive the planning, coordination, and execution of programs that keep Anthropic&#39;s compute infrastructure running efficiently at scale.</p>\n<p>Our compute fleet is the foundation on which every model training run, evaluation, and inference workload 
depends. You&#39;ll join a small, high-impact TPM team and take ownership of critical workstreams across the compute lifecycle, from how supply is procured and brought online, to how capacity is allocated and utilized across teams.</p>\n<p>You&#39;ll partner with Infrastructure, Systems, Research, Finance, and Capacity Engineering to shape the processes, tooling, and coordination mechanisms that allow Anthropic to move fast while managing an increasingly complex compute environment.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Own and drive critical programs across the compute lifecycle, coordinating execution across multiple engineering, research, and operations teams</li>\n<li>Build and maintain operational visibility into the compute fleet, ensuring the organization has a clear picture of supply, demand, utilization, and health</li>\n<li>Lead cross-functional coordination for compute transitions: bringing new capacity online, migrating workloads, and managing decommissions across cloud providers and hardware platforms</li>\n<li>Partner with engineering and research leadership to navigate competing priorities and drive alignment on how compute resources are planned, allocated, and used</li>\n<li>Identify and close operational gaps across the compute pipeline, whether through new tooling, improved processes, or better cross-team communication</li>\n<li>Own trade-off discussions between utilization, cost, latency, and reliability, synthesizing inputs from technical and business stakeholders and communicating decisions to leadership</li>\n<li>Develop and improve the processes and frameworks the team uses to plan, track, and execute compute programs at increasing scale and complexity</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 7+ years of technical program management experience in infrastructure, platform engineering, or compute-intensive environments</li>\n<li>Have led complex, cross-functional programs involving multiple engineering teams with 
competing priorities and ambiguous requirements</li>\n<li>Have experience working with research or ML teams and translating their needs into operational plans and technical requirements</li>\n<li>Are comfortable diving deep into technical details (cloud infrastructure, cluster management, job scheduling, resource orchestration) while maintaining program-level visibility</li>\n<li>Thrive in ambiguous, fast-moving environments where you need to define scope and build processes from the ground up</li>\n<li>Have strong communication skills and can engage credibly with engineers, researchers, finance, and executive leadership</li>\n<li>Have a track record of building trust with engineering teams and driving changes through influence rather than authority</li>\n</ul>\n<p>Strong candidates may also have:</p>\n<ul>\n<li>Experience managing compute capacity across multiple cloud providers (AWS, GCP, Azure) or hybrid cloud/on-premises environments</li>\n<li>Familiarity with job scheduling, resource orchestration, or workload management systems (Kubernetes, Slurm, Borg, YARN, or custom schedulers)</li>\n<li>Experience with GPU or accelerator infrastructure, including the unique challenges of large-scale ML training and inference workloads</li>\n<li>Built or improved observability for infrastructure systems: dashboards, alerting, efficiency metrics, or cost attribution</li>\n<li>Capacity planning experience including demand forecasting, cost modeling, or hardware lifecycle management</li>\n<li>Scaled through hypergrowth in AI/ML, HPC, or large-scale cloud environments</li>\n</ul>","url":"https://yubhub.co/jobs/job_8f6ef3b1-c9b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5138044008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$290,000-$365,000 USD","x-skills-required":["Technical Program Management","Cloud Infrastructure","Cluster Management","Job Scheduling","Resource Orchestration","Compute Capacity Management","GPU or Accelerator Infrastructure","Observability for Infrastructure Systems","Capacity Planning"],"x-skills-preferred":["Kubernetes","Slurm","Borg","YARN","Custom Schedulers","Demand Forecasting","Cost Modeling","Hardware Lifecycle Management"],"datePosted":"2026-04-18T15:53:42.458Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Technical Program Management, Cloud Infrastructure, Cluster Management, Job Scheduling, Resource Orchestration, Compute Capacity Management, GPU or Accelerator Infrastructure, Observability for Infrastructure Systems, Capacity Planning, Kubernetes, Slurm, Borg, YARN, Custom Schedulers, Demand Forecasting, Cost Modeling, Hardware Lifecycle Management","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":290000,"maxValue":365000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5f6e6eac-370"},"title":"Sr GPU Infrastructure Software Engineer","description":"<p>CoreWeave is seeking a Senior GPU Infrastructure Software Engineer to join our team. 
As a senior engineer, you will be responsible for leading designs, raising engineering standards, and delivering measurable improvements to latency, throughput, and reliability across multiple services. You will partner with fleet, product, and hardware teams to evolve our GPU performance testing platform to ensure we deliver a reliable and performant experience to our customers.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Design and implement solutions to problems of scale for testing and validation of CoreWeave&#39;s global infrastructure.</li>\n<li>Design and develop Kubernetes-native controllers and operators to automate infrastructure workflows.</li>\n<li>Build and maintain scalable backend services and APIs (gRPC/REST) in Go or Python.</li>\n<li>Develop performance tests and automation workflows to expand hardware validation across the CoreWeave fleet.</li>\n<li>Write and maintain Kubernetes custom controllers and operators to automate infrastructure testing.</li>\n<li>Adapt and extend open source tooling to enhance visibility into system metrics, performance, and health.</li>\n</ul>\n<p>To be successful in this role, you should have:</p>\n<ul>\n<li>~5 to 8 years of experience.</li>\n<li>Proficiency in Go and/or Python software development.</li>\n<li>Hands-on experience with writing Kubernetes operators/controllers.</li>\n</ul>\n<p>Nice to have:</p>\n<ul>\n<li>Experience testing hardware at scale.</li>\n<li>HPC Experience.</li>\n<li>Experience with AI/ML infrastructure and training / inference.</li>\n</ul>","url":"https://yubhub.co/jobs/job_5f6e6eac-370","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4627277006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $242,000","x-skills-required":["Go","Python","Kubernetes","GPU performance testing","infrastructure validation"],"x-skills-preferred":["HPC Experience","AI/ML infrastructure","training / inference"],"datePosted":"2026-04-18T15:53:07.770Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Go, Python, Kubernetes, GPU performance testing, infrastructure validation, HPC Experience, AI/ML infrastructure, training / inference","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":242000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a966b1bf-e76"},"title":"Staff Software Engineer, Compute Infrastructure","description":"<p>As a Staff Software Engineer, you will shape the backbone of our GPU-driven data centers, powering some of the most advanced workloads in AI and large-scale computing. This isn&#39;t just about keeping the lights on; it&#39;s about architecting the next generation of reliable, secure, and massively scalable infrastructure.</p>\n<p>The MetalDev team builds and operates a suite of Go-based services that power large-scale datacenter deployments. 
These platforms automate complex workflows while providing deep observability and monitoring for tens of thousands of GPU servers and diverse infrastructure components, including CDUs, PDUs, and NVLink switches. Our tooling is designed for next-generation rack systems like NVIDIA GB200 and GB300, as well as a broad range of GPU server platforms.</p>\n<p>Your responsibilities will include:</p>\n<ul>\n<li>Providing technical leadership in designing, architecting, and operating large-scale infrastructure services for GPU servers, with a focus on security, reliability, and scalability.</li>\n<li>Building and enhancing infrastructure services and automation, including inventory management systems and lifecycle management solutions using open source technologies.</li>\n<li>Driving strategic direction for infrastructure automation, lifecycle management, and service orchestration, making MetalDev core services more scalable and resilient.</li>\n<li>Defining best practices for API development (REST/gRPC), distributed databases, and Kubernetes orchestration, while mentoring engineers to follow your lead.</li>\n<li>Partnering with hardware, software, and operations teams to align infrastructure with business impact.</li>\n<li>Contributing to open source communities (e.g., Go, Redfish) through collaboration and technical thought leadership.</li>\n<li>Leading and improving CI/CD pipelines for hardware compliance, firmware management, and data systems.</li>\n<li>Championing reliability and operational excellence by driving observability (Prometheus/Grafana), production incident response, and continuous service improvement.</li>\n</ul>\n<p>We&#39;re looking for someone with a strong background in software engineering, particularly in infrastructure, cloud engineering, and distributed databases. You should have experience with Go and a proven track record of building REST/gRPC APIs for mission-critical platforms. 
Additionally, you should be familiar with architecting and scaling cloud-native Kubernetes infrastructure and distributed services.</p>","url":"https://yubhub.co/jobs/job_a966b1bf-e76","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4603505006","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$188,000 to $275,000","x-skills-required":["Go","REST/gRPC","Distributed databases","Kubernetes orchestration","API development","Infrastructure services","Automation","Inventory management","Lifecycle management","CI/CD pipelines","Hardware compliance","Firmware management","Data systems","Observability","Production incident response","Continuous service improvement"],"x-skills-preferred":["Kafka","ClickHouse","CRDB","DMTF","RedFish APIs","GPU servers"],"datePosted":"2026-04-18T15:53:06.173Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Manhattan, NY / Sunnyvale, CA / Bellevue, WA / Livingston, NJ"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Go, REST/gRPC, Distributed databases, Kubernetes orchestration, API development, Infrastructure services, Automation, Inventory management, Lifecycle management, CI/CD pipelines, Hardware compliance, Firmware management, Data systems, Observability, Production incident response, Continuous service improvement, Kafka, ClickHouse, CRDB, DMTF, RedFish APIs, GPU 
servers","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":188000,"maxValue":275000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_e8e9acc0-a63"},"title":"Technical Program Manager, Compute","description":"<p>As a Technical Program Manager on the Compute team, you will help drive the planning, coordination, and execution of programs that keep Anthropic&#39;s compute infrastructure running efficiently at scale.</p>\n<p>Our compute fleet is the foundation on which every model training run, evaluation, and inference workload depends. You&#39;ll join a small, high-impact TPM team and take ownership of critical workstreams across the compute lifecycle, from how supply is procured and brought online, to how capacity is allocated and utilized across teams.</p>\n<p>You&#39;ll partner with Infrastructure, Systems, Research, Finance, and Capacity Engineering to shape the processes, tooling, and coordination mechanisms that allow Anthropic to move fast while managing an increasingly complex compute environment.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Own and drive critical programs across the compute lifecycle, coordinating execution across multiple engineering, research, and operations teams</li>\n<li>Build and maintain operational visibility into the compute fleet, ensuring the organization has a clear picture of supply, demand, utilization, and health</li>\n<li>Lead cross-functional coordination for compute transitions: bringing new capacity online, migrating workloads, and managing decommissions across cloud providers and hardware platforms</li>\n<li>Partner with engineering and research leadership to navigate competing priorities and drive alignment on how compute resources are planned, allocated, and used</li>\n<li>Identify and close operational gaps across the compute pipeline, whether through new tooling, improved 
processes, or better cross-team communication</li>\n<li>Own trade-off discussions between utilization, cost, latency, and reliability, synthesizing inputs from technical and business stakeholders and communicating decisions to leadership</li>\n<li>Develop and improve the processes and frameworks the team uses to plan, track, and execute compute programs at increasing scale and complexity</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 7+ years of technical program management experience in infrastructure, platform engineering, or compute-intensive environments</li>\n<li>Have led complex, cross-functional programs involving multiple engineering teams with competing priorities and ambiguous requirements</li>\n<li>Have experience working with research or ML teams and translating their needs into operational plans and technical requirements</li>\n<li>Are comfortable diving deep into technical details (cloud infrastructure, cluster management, job scheduling, resource orchestration) while maintaining program-level visibility</li>\n<li>Thrive in ambiguous, fast-moving environments where you need to define scope and build processes from the ground up</li>\n<li>Have strong communication skills and can engage credibly with engineers, researchers, finance, and executive leadership</li>\n<li>Have a track record of building trust with engineering teams and driving changes through influence rather than authority</li>\n</ul>\n<p>Strong candidates may also have:</p>\n<ul>\n<li>Experience managing compute capacity across multiple cloud providers (AWS, GCP, Azure) or hybrid cloud/on-premises environments</li>\n<li>Familiarity with job scheduling, resource orchestration, or workload management systems (Kubernetes, Slurm, Borg, YARN, or custom schedulers)</li>\n<li>Experience with GPU or accelerator infrastructure, including the unique challenges of large-scale ML training and inference workloads</li>\n<li>Built or improved observability for infrastructure systems: 
dashboards, alerting, efficiency metrics, or cost attribution</li>\n<li>Capacity planning experience including demand forecasting, cost modeling, or hardware lifecycle management</li>\n<li>Scaled through hypergrowth in AI/ML, HPC, or large-scale cloud environments</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_e8e9acc0-a63","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5138044008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$290,000-$365,000 USD","x-skills-required":["Technical Program Management","Compute Infrastructure","Cloud Providers","Job Scheduling","Resource Orchestration","Workload Management","GPU or Accelerator Infrastructure","Observability","Capacity Planning"],"x-skills-preferred":["Kubernetes","Slurm","Borg","YARN","Custom Schedulers","Demand Forecasting","Cost Modeling","Hardware Lifecycle Management"],"datePosted":"2026-04-18T15:52:47.770Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Technical Program Management, Compute Infrastructure, Cloud Providers, Job Scheduling, Resource Orchestration, Workload Management, GPU or Accelerator Infrastructure, Observability, Capacity Planning, Kubernetes, Slurm, Borg, YARN, Custom Schedulers, Demand Forecasting, Cost Modeling, Hardware Lifecycle 
Management","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":290000,"maxValue":365000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_15a29cc3-0bf"},"title":"Senior Production Engineer","description":"<p>CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025.</p>\n<p><strong>About the Role</strong></p>\n<p>Production Engineering ensures CoreWeave’s cloud delivers world-class reliability, performance, and operational excellence. We are hiring a Senior Production Engineer to take direct, hands-on ownership of critical tooling that drives reliability and delivery success.</p>\n<p>In this role, you will work broadly across the cloud stack, designing, implementing, deploying, and operating systems that improve delivery velocity, service availability, and operational safety. You’ll be responsible for leading end-to-end technical projects, maintaining long-lived systems the team owns, and strengthening our operational foundations through durable engineering investments.</p>\n<p>This is a role for someone who enjoys building, debugging, and operating production systems. 
You will collaborate closely with service owners, but your primary impact comes from the reliability, quality, and maturity of the systems you deliver and maintain over time.</p>\n<p><strong>What You’ll Do</strong></p>\n<ul>\n<li>Take hands-on ownership of critical systems and frameworks, driving their architecture, implementation, and long-term evolution.</li>\n<li>Lead end-to-end delivery of engineering projects that improve availability, scalability, operational automation, and failure recovery.</li>\n<li>Build and maintain observability, alerting, automated remediation, and resilience testing for the systems you support.</li>\n<li>Participate in incident response as a subject-matter expert; drive deep root-cause investigations and implement lasting fixes.</li>\n<li>Improve runbooks, sources of truth, deployment workflows, and operational tooling to harden production readiness.</li>\n<li>Eliminate single points of failure and reduce operational toil through automation, refactors, and system redesigns.</li>\n<li>Ship production code regularly in Python, Go, or similar languages, and participate in on-call rotations.</li>\n<li>Maintain and mature long-term projects and frameworks owned by the team, ensuring they remain reliable, well-instrumented, and easy to operate.</li>\n<li>Collaborate with platform teams to ensure new features and services integrate cleanly with our reliability best practices and tooling.</li>\n</ul>\n<p><strong>What You’ve Worked On (Minimum Qualifications)</strong></p>\n<ul>\n<li>7+ years of engineering experience building and operating distributed systems or cloud platforms.</li>\n<li>Demonstrated ability to debug complex production issues end-to-end, across services, infrastructure layers, and automation.</li>\n<li>Strong programming or scripting ability (Python, Go, or similar), with experience shipping and 
operating production services and tools.</li>\n<li>Deep knowledge of cloud-native technologies and distributed system patterns, particularly Kubernetes.</li>\n<li>Experience with modern observability stacks: metrics, tracing, structured logs, SLOs/SLIs, and incident lifecycle practices.</li>\n<li>A track record of successfully delivering hands-on reliability improvements through engineering execution.</li>\n</ul>\n<p><strong>Preferred Qualifications</strong></p>\n<ul>\n<li>Experience building internal tooling, frameworks, or automation that supports high-availability cloud operations.</li>\n<li>Familiarity with DR/BCP, service tiering, capacity planning, or chaos engineering.</li>\n<li>Background operating or building large-scale AI or GPU-accelerated infrastructure.</li>\n<li>Experience maintaining multi-year ownership of foundational production systems.</li>\n</ul>\n<p>Why CoreWeave?</p>\n<p>At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:</p>\n<ul>\n<li>Be Curious at Your Core</li>\n<li>Act Like an Owner</li>\n<li>Empower Employees</li>\n<li>Deliver Best-in-Class Client Experiences</li>\n<li>Achieve More Together</li>\n</ul>\n<p>We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and enables the development of innovative solutions to complex problems. As we get set for takeoff, the organization&#39;s growth opportunities are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. 
Come join us!</p>\n<p>The base salary range for this role is $139,000 to $204,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>\n<p>What We Offer</p>\n<p>The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location.</p>\n<p>In addition to a competitive salary, we offer a variety of benefits to support your needs, including:</p>\n<ul>\n<li>Medical, dental, and vision insurance - 100% paid for by CoreWeave</li>\n<li>Company-paid Life Insurance</li>\n<li>Voluntary supplemental life insurance</li>\n<li>Short and long-term disability insurance</li>\n<li>Flexible Spending Account</li>\n<li>Health Savings Account</li>\n<li>Tuition Reimbursement</li>\n<li>Ability to Participate in Employee Stock Purchase Program (ESPP)</li>\n<li>Mental Wellness Benefits through Spring Health</li>\n<li>Family-Forming support provided by Carrot</li>\n<li>Paid Parental Leave</li>\n<li>Flexible, full-service childcare support with Kinside</li>\n<li>401(k) with a generous employer match</li>\n<li>Flexible PTO</li>\n<li>Catered lunch each day in our office and data center locations</li>\n<li>A casual work environment</li>\n<li>A work culture focused on innovative disruption</li>\n</ul>\n<p>Our Workplace</p>\n<p>While we prioritize a hybrid work environment, 
remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.</p>\n<p>California Consumer Privacy Act - California applicants only</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_15a29cc3-0bf","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4670172006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$139,000 to $204,000","x-skills-required":["cloud computing","distributed systems","cloud platforms","Kubernetes","observability stacks","metrics","tracing","structured logs","SLOs/SLIs","incident lifecycle practices","Python","Go","programming","scripting","production services","tools"],"x-skills-preferred":["internal tooling","frameworks","automation","high-availability cloud operations","DR/BCP","service tiering","capacity planning","chaos engineering","large-scale AI","GPU-accelerated infrastructure"],"datePosted":"2026-04-18T15:52:09.786Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud computing, distributed systems, cloud platforms, Kubernetes, observability stacks, metrics, tracing, structured logs, SLOs/SLIs, incident lifecycle practices, Python, Go, programming, scripting, production services, tools, internal tooling, frameworks, automation, high-availability cloud 
operations, DR/BCP, service tiering, capacity planning, chaos engineering, large-scale AI, GPU-accelerated infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139000,"maxValue":204000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2ab9c635-07a"},"title":"Operations Engineer, Fleet Reliability","description":"<p>The Fleet Reliability Operations team is responsible for the day-to-day provisioning, management, and uptime of CoreWeave&#39;s ever-expanding fleet of server nodes. This team plays a central role in CoreWeave&#39;s growth strategy, configuring, updating, and remotely troubleshooting our highest-tier supercomputing clusters and their networking, delivery platforms, and tools dependencies.</p>\n<p>We are seeking curious, creative, and persistent problem solvers to join our Fleet Reliability Operations team to help drive batches of server nodes through our provisioning and validation processes while efficiently and effectively troubleshooting node or cluster problems as they arise.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Configuring and maintaining large-scale high-performance supercomputing clusters running state-of-the-art GPUs</li>\n<li>Troubleshooting hardware and software issues; escalating and coordinating as needed with data center, network, hardware, and platform teams to drive resolution</li>\n<li>Monitoring and analyzing system performance and taking appropriate remediation actions for cloud health</li>\n<li>Approaching work with flexibility and optimism, anticipating shifting business and technical priorities</li>\n<li>Creating and maintaining documentation of team processes, knowledge, and best practices for system management</li>\n<li>Thinking critically about day-to-day work and working collaboratively to improve team processes and efficiency</li>\n</ul>\n<p>As a member of 
our team, you will be part of a dynamic and fast-paced environment where you will have the opportunity to grow and develop your skills. We offer a competitive salary range of $83,000 to $110,000, as well as a comprehensive benefits package, including medical, dental, and vision insurance, company-paid life insurance, and flexible PTO.</p>\n<p>If you are a motivated and detail-oriented individual who is passionate about working with cutting-edge technology, we encourage you to apply for this exciting opportunity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2ab9c635-07a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4617382006","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$83,000 to $110,000","x-skills-required":["Linux system administration","Troubleshooting hardware and software issues","System maintenance tasks","Scripting languages (bash, python, powershell, etc)","Grafana, Prometheus, PromQL queries or similar observability platforms"],"x-skills-preferred":["Kubernetes administration","HPC - administering GPU-related workloads","Data center environments including server racks, HVAC systems, fiber trays"],"datePosted":"2026-04-18T15:51:55.238Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York, NY / Plano, TX / Bellevue, WA / Sunnyvale, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux system administration, Troubleshooting hardware and software issues, System maintenance tasks, Scripting languages (bash, python, powershell, etc), Grafana, Prometheus, PromQL queries or similar observability 
platforms, Kubernetes administration, HPC - administering GPU-related workloads, Data center environments including server racks, HVAC systems, fiber trays","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":83000,"maxValue":110000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a1ba5c28-9ce"},"title":"Senior Software Engineer, Observability","description":"<p>Join CoreWeave&#39;s Observability team, responsible for building the systems that give our customers and internal teams unparalleled visibility into complex AI workloads.</p>\n<p>Our team empowers engineers to understand, troubleshoot, and optimize high-performance infrastructure at massive scale.</p>\n<p>As a Senior Software Engineer on the Observability team, you will design, build, and maintain core observability infrastructure spanning metrics, logging, tracing, and telemetry pipelines.</p>\n<p>Your day-to-day will involve developing highly reliable and scalable systems, collaborating with internal engineering teams to embed observability best practices, and tackling performance and reliability challenges across clusters of thousands of GPUs.</p>\n<p>You&#39;ll also contribute to platform strategy and participate in on-call rotations to ensure critical production systems remain robust and operational.</p>\n<p>The base salary range for this role is $139,000 to $220,000.</p>\n<p>In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>\n<p>We offer a variety of benefits to support your needs, including medical, dental, and vision insurance, 100% paid for by CoreWeave, company-paid Life Insurance, voluntary supplemental life insurance, short and long-term disability insurance, flexible Spending Account, Health Savings Account, tuition reimbursement, 
ability to participate in Employee Stock Purchase Program (ESPP), mental wellness benefits through Spring Health, family-forming support provided by Carrot, paid parental leave, flexible, full-service childcare support with Kinside, 401(k) with a generous employer match, flexible PTO, catered lunch each day in our office and data center locations, a casual work environment, and a work culture focused on innovative disruption.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a1ba5c28-9ce","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4554201006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$139,000 to $220,000","x-skills-required":["Go","Python","Kubernetes","containerization","microservices architectures","Helm","YAML-based configurations","automated testing","progressive release strategies","on-call rotations"],"x-skills-preferred":["designing, operating, or scaling logging, metrics, or tracing platforms","data streaming systems for observability pipelines","automating infrastructure provisioning","OpenTelemetry for unified telemetry collection and instrumentation","exposure to modern AI workloads and GPU-based infrastructure"],"datePosted":"2026-04-18T15:51:55.238Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York, NY / Sunnyvale, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Go, Python, Kubernetes, containerization, microservices architectures, Helm, YAML-based configurations, automated testing, progressive release strategies, on-call rotations, designing, operating, or scaling logging, metrics, 
or tracing platforms, data streaming systems for observability pipelines, automating infrastructure provisioning, OpenTelemetry for unified telemetry collection and instrumentation, exposure to modern AI workloads and GPU-based infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139000,"maxValue":220000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_33821044-320"},"title":"Principal Engineer, Storage","description":"<p>We&#39;re looking for a Principal Engineer to play a key role in designing, building, and operating the data plane for our high-performance AI storage platform.</p>\n<p>You&#39;ll develop CoreWeave&#39;s storage systems by building reliable, scalable, and high-throughput solutions that power some of the largest and most innovative AI workloads in the world.</p>\n<p>This role involves close collaboration with teams across infrastructure, compute, and platform to ensure our storage services scale automatically and seamlessly while maximizing performance and reliability.</p>\n<p>About the role:</p>\n<ul>\n<li>Design and implement a highly scalable multi-tenant control plane that supports CoreWeave&#39;s growing AI storage and cloud infrastructure needs.</li>\n<li>Contribute to the development of exabyte-scale, S3-compatible object storage and distributed file systems, and integrate dedicated storage clusters into diverse customer environments.</li>\n<li>Work with technologies such as RDMA, GPU Direct Storage, RoCE, InfiniBand, SPDK, and distributed filesystems to optimize storage performance and efficiency.</li>\n<li>Participate in efforts to improve the reliability, durability, and observability of our storage stack.</li>\n<li>Collaborate with operations teams to monitor, analyze, and optimize storage systems using telemetry, metrics, and dashboards to improve performance, latency, and 
resilience.</li>\n<li>Work cross-functionally with platform, product, and infrastructure teams to deliver seamless storage capabilities across the stack.</li>\n<li>Share your knowledge and mentor other engineers on best practices in building distributed, high-performance systems, especially focusing on low-level storage details that improve performance and durability.</li>\n</ul>\n<p>Who You Are:</p>\n<ul>\n<li>Bachelor’s, Master’s, or PhD degree in Computer Science, Engineering, or a related field.</li>\n<li>8–10+ years of experience working in storage systems engineering.</li>\n<li>Strong hands-on experience with object storage, block storage, or distributed filesystems in production environments.</li>\n<li>Proficiency in a systems programming language such as Go, C, or Rust.</li>\n<li>Solid understanding of cloud-native infrastructure, Kubernetes, and scalable system architecture.</li>\n<li>Strong debugging and problem-solving skills in distributed, high-performance environments.</li>\n<li>Clear communicator, able to work collaboratively across teams and share technical insights effectively.</li>\n<li>Familiarity with the trade-offs between HDD- and SSD-based storage systems.</li>\n</ul>\n<p>The base salary range for this role is $206,000 to $303,000.</p>\n<p>In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>\n<p>What We Offer</p>\n<p>The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. 
These include qualifications, experience, interview performance, and location.</p>\n<p>In addition to a competitive salary, we offer a variety of benefits to support your needs, including:</p>\n<ul>\n<li>Medical, dental, and vision insurance - 100% paid for by CoreWeave</li>\n<li>Company-paid Life Insurance</li>\n<li>Voluntary supplemental life insurance</li>\n<li>Short and long-term disability insurance</li>\n<li>Flexible Spending Account</li>\n<li>Health Savings Account</li>\n<li>Tuition Reimbursement</li>\n<li>Ability to Participate in Employee Stock Purchase Program (ESPP)</li>\n<li>Mental Wellness Benefits through Spring Health</li>\n<li>Family-Forming support provided by Carrot</li>\n<li>Paid Parental Leave</li>\n<li>Flexible, full-service childcare support with Kinside</li>\n<li>401(k) with a generous employer match</li>\n<li>Flexible PTO</li>\n<li>Catered lunch each day in our office and data center locations</li>\n<li>A casual work environment</li>\n<li>A work culture focused on innovative disruption</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_33821044-320","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4646276006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$206,000 to $303,000","x-skills-required":["object storage","block storage","distributed filesystems","RDMA","GPU Direct Storage","RoCE","InfiniBand","SPDK","cloud-native infrastructure","Kubernetes","scalable system architecture","systems programming 
language","Go","C","Rust"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:51:53.363Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"object storage, block storage, distributed filesystems, RDMA, GPU Direct Storage, RoCE, InfiniBand, SPDK, cloud-native infrastructure, Kubernetes, scalable system architecture, systems programming language, Go, C, Rust","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":206000,"maxValue":303000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d547efb6-77f"},"title":"Senior Linux Systems Engineer","description":"<p>We are looking for a highly motivated Senior Linux Systems Engineer to join our Computing Team!</p>\n<p>You will work on high-performance computing (HPC) systems that are part of our sequencing platform. 
The ideal candidate is a hands-on Linux expert who thrives on optimizing performance and building secure, scalable and reliable systems in a fast-paced environment.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design, build, and maintain high-performance Linux systems supporting compute and data-intensive workloads</li>\n<li>Optimise system performance through kernel and filesystem tuning; identify and eliminate I/O, memory, or network bottlenecks</li>\n<li>Automate provisioning and configuration management using orchestration tools such as Ansible and Salt</li>\n<li>Monitor and troubleshoot kernel, driver, and hardware issues; perform root cause analysis in partnership with data and engineering teams and propose long-term solutions</li>\n<li>Ensure system reliability through regular patching, monitoring, and performance tuning</li>\n<li>Maintain accurate system documentation, runbooks, and configuration baselines</li>\n<li>Collaborate with software, hardware, and scientific teams to ensure platform reliability and scalability</li>\n</ul>\n<p>Qualifications, Skills, Knowledge &amp; Abilities:</p>\n<ul>\n<li>BS in Computer Science, Engineering, or related field</li>\n<li>5+ years of experience designing and building high-performance physical Linux systems in high-throughput or mission-critical environments</li>\n<li>Deep knowledge of Linux kernel, NFS and Linux file system performance tuning</li>\n<li>Solid background in TCP/IP networking, routing, VLANs, and firewall rules</li>\n<li>Experience with the latest CPU and GPU technologies</li>\n<li>Proficiency in shell scripting (bash), working knowledge of Python, and familiarity with Ansible or similar configuration management tools</li>\n<li>Proven hands-on experience building servers from components, diagnosing hardware failures, and working with vendors</li>\n<li>Excellent documentation and communication skills</li>\n<li>May occasionally be exposed to activity that requires pulling/lifting/moving/carrying up to 50 
lbs</li>\n<li>Experience with cloud computing infrastructure (e.g. AWS) and Docker desirable</li>\n<li>Familiarity with security frameworks and compliance standards (e.g. ISO 27001) a plus</li>\n</ul>\n<p>At Ultima Genomics, your base pay is one part of your total compensation package. This role pays between $125,000 and $150,000, if performed in California, and your actual base pay will depend on your skills, qualifications, experience, and location.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d547efb6-77f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Ultima Genomics","sameAs":"https://www.ultimagen.com/","logo":"https://logos.yubhub.co/ultimagen.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/ultimagenomics/jobs/5649426004","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$125,000 - $150,000","x-skills-required":["Linux","High-performance computing","Kernel and filesystem tuning","Ansible and Salt","TCP/IP networking","Routing","VLANs","Firewall rules","CPU and GPU technologies","Shell scripting","Python","Cloud computing infrastructure","Docker","Security frameworks and compliance standards"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:51:48.011Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Fremont, California, United States"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Life Sciences","skills":"Linux, High-performance computing, Kernel and filesystem tuning, Ansible and Salt, TCP/IP networking, Routing, VLANs, Firewall rules, CPU and GPU technologies, Shell scripting, Python, Cloud computing infrastructure, Docker, Security frameworks and compliance 
standards","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":125000,"maxValue":150000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9166d234-4c5"},"title":"Solutions Architect - HPC/AI/ML","description":"<p>As a Solutions Architect at CoreWeave, you will play a vital and dynamic role in helping customers establish their Kubernetes environment, develop proofs of concept, onboard, and optimise workloads. You will serve as the primary technical point of contact for customers, establishing strong technical relationships and ensuring their success with CoreWeave&#39;s cloud infrastructure offerings, focusing on AI/ML workloads within high-performance compute (HPC) environments.</p>\n<p>Collaborate closely with customers to understand their unique business needs and create, prototype, and deploy tailored solutions that align with their requirements. Lead proof of concept initiatives to showcase the value and viability of CoreWeave&#39;s solutions within specific environments.</p>\n<p>Drive technical leadership and direction during customer meetings, presentations, and workshops, addressing any technical queries or concerns that arise. Act as a virtual member of CoreWeave&#39;s Kubernetes product and engineering teams, identifying opportunities for product enhancement and collaborating with engineers to implement your suggestions.</p>\n<p>Offer valuable insights on product features, functionality, and performance, contributing regularly to discussions about product strategy and architecture. 
Conduct periodic technical reviews and assessments of customer workloads, pinpointing opportunities for workload optimisation and suggesting suitable solutions.</p>\n<p>Stay informed of the latest developments and trends in Kubernetes, cloud computing and infrastructure, sharing your thought leadership with customers and internal stakeholders. Lead the prototyping and initiation of research and development efforts for emerging products and solutions, delivering prototypes and key insights for internal consumption.</p>\n<p>Represent CoreWeave at conferences and industry events, with occasional travel as required.</p>","url":"https://yubhub.co/jobs/job_9166d234-4c5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4649044006","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $225,000 SGD","x-skills-required":["cloud computing concepts","architecture","technologies","NVIDIA GPUs","Infiniband","NVIDIA Collective Communications Library (NCCL)","Slurm","Kubernetes"],"x-skills-preferred":["code contributions to open-source inference frameworks","scripting and automation related to AI/ML workloads","building solutions across multi-cloud environments","client or customer-facing publications/talks on latency, optimisation, or advanced model-server architectures"],"datePosted":"2026-04-18T15:51:30.371Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Singapore"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud computing concepts, architecture, technologies, NVIDIA GPUs, Infiniband, NVIDIA Collective 
Communications Library (NCCL), Slurm, Kubernetes, code contributions to open-source inference frameworks, scripting and automation related to AI/ML workloads, building solutions across multi-cloud environments, client or customer-facing publications/talks on latency, optimisation, or advanced model-server architectures","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":225000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0ae48270-bef"},"title":"Senior Software Engineer, Storage Engineer","description":"<p>The Storage Engine Organisation at CoreWeave is responsible for the product capabilities and data plane function of CoreWeave&#39;s managed storage products.</p>\n<p>We build reliable, scalable storage solutions with segment leading performance. Storage engine works with engineering teams across infrastructure, compute, and platform to ensure our storage services meet the needs of the world&#39;s most demanding AI workloads.</p>\n<p>The role involves designing and implementing distributed storage solutions to support scaling data-intensive AI workloads, contributing to the development of exabyte-scale, S3-compatible object storage, and integrating dedicated storage clusters into diverse customer environments.</p>\n<p>Key responsibilities include working with technologies such as RDMA, GPU Direct Storage, and distributed filesystems protocols like NFS or FUSE to optimise storage performance and efficiency, participating in efforts to improve the reliability, durability, and observability of our storage stack, collaborating with operations teams to monitor, troubleshoot, and improve storage systems in production environments, and helping develop metrics and dashboards to provide visibility into storage performance and health.</p>\n<p>The ideal candidate will have a strong background in storage systems 
engineering or infrastructure, with experience working with object storage or distributed filesystems in production environments, proficiency in a systems programming language like Go, C, or Rust, and familiarity with storage observability tools and telemetry pipelines.</p>\n<p>As a senior software engineer, you will be responsible for designing, developing, and deploying scalable and efficient storage solutions, working closely with cross-functional teams to ensure seamless integration with other components of the platform, and mentoring junior engineers to help them grow in their roles.</p>\n<p>If you are passionate about building high-performance storage solutions and have a strong background in software engineering, we encourage you to apply for this exciting opportunity.</p>","url":"https://yubhub.co/jobs/job_0ae48270-bef","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4643524006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$139,000 to $204,000","x-skills-required":["Storage systems engineering","Infrastructure","Object storage","Distributed filesystems","RDMA","GPU Direct Storage","NFS","FUSE","Systems programming languages (Go, C, Rust)","Storage observability tools","Telemetry pipelines"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:51:26.395Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ/ New York , NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Storage systems engineering, Infrastructure, Object storage, Distributed filesystems, 
RDMA, GPU Direct Storage, NFS, FUSE, Systems programming languages (Go, C, Rust), Storage observability tools, Telemetry pipelines","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139000,"maxValue":204000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_01102ded-ef1"},"title":"Senior Manager, Technical Solutions Manager","description":"<p>The Customer Experience (CX) Organisation at CoreWeave is dedicated to ensuring every client running AI workloads at scale has a seamless, reliable, and high-performance experience.</p>\n<p>We are on the search for a remarkable Senior Manager of Technical Solutions Management (TSM) who shares our passion and has an understanding of GPU infrastructure and AI Applications to join the team.</p>\n<p>This critical leadership role will oversee the TSM function, which is responsible for managing technical relationships with strategic customers, defining and delivering on technical requirements, and driving the execution of complex programs from concept to successful completion.</p>\n<p>In this role, you will:</p>\n<ul>\n<li>Lead the TSM function within CoreWeave by building, leading, and focusing on hiring top talent and fostering their growth to ensure they excel as the primary technical advocates for CoreWeave&#39;s most strategic customers</li>\n</ul>\n<ul>\n<li>Collaborate across functions, working closely with leaders in Solutions Architecture, Support, Sales, and Product Engineering to elevate and enhance the CoreWeave customer experience</li>\n</ul>\n<ul>\n<li>Directly engage and collaborate with key customers to understand their AI workloads, pain points, and future requirements to continuously improve our service offerings</li>\n</ul>\n<ul>\n<li>Define and monitor key performance indicators (KPIs) to evaluate program success and effectiveness through leveraging multiple 
insights</li>\n</ul>\n<p>You will identify and eliminate inefficiencies, accelerate operational speed, and deliver exceptional results that reinforce CoreWeave&#39;s position as a market leader.</p>\n<p>The base salary range for this role is $207,000 to $275,000.</p>","url":"https://yubhub.co/jobs/job_01102ded-ef1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4646569006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$207,000 to $275,000","x-skills-required":["GPU infrastructure","AI Applications","Cloud infrastructure","Kubernetes","High-performance computing"],"x-skills-preferred":["Leadership","Technical program management","Product management","Delivery management","Communication"],"datePosted":"2026-04-18T15:51:22.460Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"GPU infrastructure, AI Applications, Cloud infrastructure, Kubernetes, High-performance computing, Leadership, Technical program management, Product management, Delivery management, Communication","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":207000,"maxValue":275000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_09c520cf-f62"},"title":"Systems Engineer, Kernel","description":"<p>CoreWeave is seeking a highly skilled and motivated Systems Kernel Engineer to join our HAVOCK Team, reporting into the Manager of Systems 
Engineering. In this role, you will be a key contributor to the stability, performance, and evolution of CoreWeave&#39;s Linux-based infrastructure.</p>\n<p>As a kernel generalist, you will be responsible for debugging kernel-level issues, analysing and fixing crashes, panics, dumps, and upstreaming fixes and features that improve the performance and reliability of our stack.</p>\n<p>This position is ideal for someone who thrives in low-level systems engineering, and understands how modern workloads stress kernels, and is excited to work across a diverse hardware/software ecosystem including CPUs, GPUs, DPUs, networking, and storage.</p>\n<p>Kernel Hardware - Acceleration - Virtualization - Operating Systems - Containerization - Kubelet</p>\n<p>Our Team&#39;s Stack:</p>\n<ul>\n<li>Python, Go, bash/sh, C</li>\n</ul>\n<ul>\n<li>Prometheus, Victoria Metrics, Grafana</li>\n</ul>\n<ul>\n<li>Linux Kernel (custom build), Ubuntu</li>\n</ul>\n<ul>\n<li>Intel/AMD/ARM CPUs, Nvidia GPUs, DPUs, Infiniband and Ethernet NICs</li>\n</ul>\n<ul>\n<li>Docker, kubernetes (k8s), KubeVirt, containerd, kubelet</li>\n</ul>\n<p>Focus Areas:</p>\n<ul>\n<li>Kernel Debugging – Analyse kernel crashes, oopses, panics, and dumps to identify root causes and propose fixes.</li>\n</ul>\n<ul>\n<li>Upstream Contributions – Develop patches for the Linux kernel and upstream them where applicable (networking, storage, virtualization, GPU/DPU enablement).</li>\n</ul>\n<ul>\n<li>Stack-Wide Support – Ensure kernel support and stability across:</li>\n</ul>\n<ul>\n<li>Virtualization (KubeVirt, QEMU, vFIO)</li>\n</ul>\n<ul>\n<li>Container runtimes (containerd, nydus, kubelet)</li>\n</ul>\n<ul>\n<li>HPC/AI workloads (CUDA, GPUDirect, RoCE/InfiniBand)</li>\n</ul>\n<ul>\n<li>Kernel-Hardware Enablement – Support new hardware bring-up across Intel, AMD, ARM CPUs, NVIDIA GPUs, DPUs, and NICs.</li>\n</ul>\n<ul>\n<li>Performance &amp; Stability – Tune kernel subsystems for latency, throughput, and scalability in 
distributed HPC/AI clusters.</li>\n</ul>\n<p>About the role:</p>\n<ul>\n<li>Triage and fix kernel crashes and performance regressions.</li>\n</ul>\n<ul>\n<li>Develop, test, and upstream kernel patches relevant to CoreWeave’s hardware/software environment.</li>\n</ul>\n<ul>\n<li>Collaborate with hardware vendors and the Linux community on feature enablement.</li>\n</ul>\n<ul>\n<li>Implement diagnostics and tooling for kernel-level observability.</li>\n</ul>\n<ul>\n<li>Work closely with HPC and Fleet teams to ensure kernel readiness for production workloads.</li>\n</ul>\n<ul>\n<li>Provide kernel-level expertise during incident response and root-cause investigations.</li>\n</ul>\n<p>Who You Are:</p>\n<ul>\n<li>5+ years of professional experience in Linux kernel engineering or systems-level development.</li>\n</ul>\n<ul>\n<li>Deep understanding of kernel internals (memory management, scheduling, networking, storage, drivers).</li>\n</ul>\n<ul>\n<li>Experience debugging kernel crashes, dumps, and panics using tools like crash, gdb, kdump.</li>\n</ul>\n<ul>\n<li>Strong C programming skills with the ability to write maintainable and upstream-quality code.</li>\n</ul>\n<ul>\n<li>Experience working with kernel modules, drivers, and subsystems.</li>\n</ul>\n<ul>\n<li>Strong problem-solving abilities with a “full-stack” systems perspective.</li>\n</ul>\n<p>Preferred:</p>\n<ul>\n<li>Contributions to the Linux kernel or related open-source projects.</li>\n</ul>\n<ul>\n<li>Familiarity with virtualization (KVM, QEMU, VFIO) and container runtimes.</li>\n</ul>\n<ul>\n<li>Networking stack expertise (InfiniBand, RoCE, TCP/IP performance tuning).</li>\n</ul>\n<ul>\n<li>GPU/DPU bring-up and driver experience.</li>\n</ul>\n<ul>\n<li>Experience in HPC or large-scale distributed systems.</li>\n</ul>\n<ul>\n<li>Familiarity with QA/QE best practices</li>\n</ul>\n<ul>\n<li>Experience working in Cloud environments</li>\n</ul>\n<ul>\n<li>Experience as a software engineer writing large-scale 
applications</li>\n</ul>\n<ul>\n<li>Experience with machine learning is a huge bonus</li>\n</ul>\n<p>The base salary range for this role is $165,000 to $242,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>\n<p>What We Offer</p>\n<p>The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location.</p>\n<p>In addition to a competitive salary, we offer a variety of benefits to support your needs, including:</p>\n<ul>\n<li>Medical, dental, and vision insurance - 100% paid for by CoreWeave</li>\n</ul>\n<ul>\n<li>Company-paid Life Insurance</li>\n</ul>\n<ul>\n<li>Voluntary supplemental life insurance</li>\n</ul>\n<ul>\n<li>Short and long-term disability insurance</li>\n</ul>\n<ul>\n<li>Flexible Spending Account</li>\n</ul>\n<ul>\n<li>Health Savings Account</li>\n</ul>\n<ul>\n<li>Tuition Reimbursement</li>\n</ul>\n<ul>\n<li>Ability to Participate in Employee Stock Purchase Program (ESPP)</li>\n</ul>\n<ul>\n<li>Mental Wellness Benefits through Spring Health</li>\n</ul>\n<ul>\n<li>Family-Forming support provided by Carrot</li>\n</ul>\n<ul>\n<li>Paid Parental Leave</li>\n</ul>\n<ul>\n<li>Flexible, full-service childcare support with Kinside</li>\n</ul>\n<ul>\n<li>401(k) with a generous employer match</li>\n</ul>\n<ul>\n<li>Flexible PTO</li>\n</ul>\n<ul>\n<li>Catered lunch each day in our office and data center locations</li>\n</ul>\n<ul>\n<li>A casual work environment</li>\n</ul>\n<ul>\n<li>A work culture focused on innovative 
disruption</li>\n</ul>\n<p>Our Workplace</p>\n<p>While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.</p>\n<p>California Consumer Privacy Act - California applicants only</p>","url":"https://yubhub.co/jobs/job_09c520cf-f62","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4599319006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $242,000","x-skills-required":["Linux kernel engineering","Systems-level development","C programming","Kernel modules","Drivers","Subsystems","Kernel debugging","Upstream contributions","Stack-wide support","Virtualization","Container runtimes","HPC/AI workloads","Kernel-hardware enablement","Performance & stability"],"x-skills-preferred":["Contributions to the Linux kernel","Networking stack expertise","GPU/DPU bring-up and driver experience","Experience in HPC or large-scale distributed systems","QA/QE best practices","Cloud environments","Software engineer writing large-scale applications","Machine learning"],"datePosted":"2026-04-18T15:51:21.252Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux kernel engineering, Systems-level development, C programming, Kernel modules, 
Drivers, Subsystems, Kernel debugging, Upstream contributions, Stack-wide support, Virtualization, Container runtimes, HPC/AI workloads, Kernel-hardware enablement, Performance & stability, Contributions to the Linux kernel, Networking stack expertise, GPU/DPU bring-up and driver experience, Experience in HPC or large-scale distributed systems, QA/QE best practices, Cloud environments, Software engineer writing large-scale applications, Machine learning","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":242000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ff4d3a91-b20"},"title":"Principal Engineer - Perf and Benchmarking","description":"<p>We&#39;re looking for a Principal Engineer to be the technical lead of CoreWeave&#39;s Benchmarking &amp; Performance team. You will be responsible for our planet-scale performance data warehouse: ingesting, storing, transforming and analyzing performance events in all the data centers across our global infrastructure.</p>\n<p>You will also be an integral part of achieving industry-leading end-to-end performance benchmarking publications: if MLPerf (Training &amp; Inference) submissions, close collaboration with NVIDIA (Megatron-LM, TensorRT-LLM &amp; DGX Cloud), and the open-source community (llm-d, vLLM and all popular ML frameworks) speak to you, come help us demonstrate CoreWeave&#39;s performance and reliability leadership in the field.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Strategy &amp; Leadership - Define the multi-year benchmarking strategy and roadmap; prioritize models/workloads (LLMs, diffusion, vision, speech) and hardware tiers. Build, lead, and mentor a high-performing team of performance engineers and data analysts. 
Establish governance for claims: documented methodologies, versioning, reproducibility, and audit trails.</li>\n</ul>\n<ul>\n<li>Perf Ownership - Lead end-to-end MLPerf Inference and Training submissions: workload selection, cluster planning, runbooks, audits, and result publication. Coordinate optimization tracks with NVIDIA (CUDA, cuDNN, TensorRT/TensorRT-LLM, Triton, NCCL) to hit competitive results; drive upstream fixes where needed.</li>\n</ul>\n<ul>\n<li>Internal Latency &amp; Throughput Benchmarks - Design a Kubernetes-native, repeatable benchmarking service that exercises CoreWeave stacks across SUNK (Slurm on Kubernetes), Kueue, and Kubeflow pipelines. Measure and report p50/p95/p99 latency, jitter, tokens/s, time-to-first-token, cold-start/warm-start, and cost-per-token/request across models, precisions (BF16/FP8/FP4), batch sizes, and GPU types. Maintain a corpus of representative scenarios (streaming, batch, multi-tenant) and data sets; automate comparisons across software releases and hardware generations.</li>\n</ul>\n<ul>\n<li>Tooling &amp; Automation - Build CI/CD pipelines and K8s controllers/operators to schedule benchmarks at scale; integrate with observability stacks (Prometheus, Grafana, OpenTelemetry) and results warehouses. Implement supply-chain integrity for benchmark artifacts (SBOMs, Cosign signatures).</li>\n</ul>\n<ul>\n<li>Cross-functional &amp; Community - Partner with NVIDIA, key ISVs, and OSS projects (vLLM, Triton, KServe, PyTorch/DeepSpeed, ONNX Runtime) to co-develop optimizations and upstream improvements. 
Support Sales/SEs with authoritative numbers for RFPs and competitive evaluations; brief analysts and press with rigorous, defensible data.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>10+ years building distributed systems or HPC/cloud services, with deep expertise on large-scale ML training or similar high-performance workloads.</li>\n</ul>\n<ul>\n<li>Proven track record of architecting or building planet-scale data systems (e.g., telemetry platforms, observability stacks, cloud data warehouses, large-scale OLAP engines).</li>\n</ul>\n<ul>\n<li>Deep understanding of GPU performance (CUDA, NCCL, RDMA, NVLink/PCIe, memory bandwidth), model-server stacks (Triton, vLLM, TensorRT-LLM, TorchServe), and distributed training frameworks (PyTorch FSDP/DeepSpeed/Megatron-LM).</li>\n</ul>\n<ul>\n<li>Proficient with Kubernetes and ML control planes; familiarity with SUNK, Kueue, and Kubeflow in production environments.</li>\n</ul>\n<ul>\n<li>Excellent communicator able to interface with executives, customers, auditors, and OSS communities.</li>\n</ul>\n<p><strong>Nice to have</strong></p>\n<ul>\n<li>Experience with time-series databases, log-structured merge trees (LSM), or custom storage engine development.</li>\n</ul>\n<ul>\n<li>Experience running MLPerf submissions (Inference and/or Training) or equivalent audited benchmarks at scale.</li>\n</ul>\n<ul>\n<li>Contributions to MLPerf, Triton, vLLM, PyTorch, KServe, or similar OSS projects.</li>\n</ul>\n<ul>\n<li>Experience benchmarking multi-region fleets and large clusters (thousands of GPUs).</li>\n</ul>\n<ul>\n<li>Publications/talks on ML performance, latency engineering, or large-scale benchmarking methodology.</li>\n</ul>","url":"https://yubhub.co/jobs/job_ff4d3a91-b20","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4627302006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$206,000 to $333,000","x-skills-required":["Distributed systems","HPC/cloud services","Large-scale ML training","GPU performance","Model-server stacks","Distributed training frameworks","Kubernetes","ML control planes","Time-series databases","Log-structured merge trees","Custom storage engine development"],"x-skills-preferred":["MLPerf submissions","Audited benchmarks","Contributions to OSS projects","Benchmarking multi-region fleets","Large clusters","Publications/talks on ML performance"],"datePosted":"2026-04-18T15:51:17.448Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Distributed systems, HPC/cloud services, Large-scale ML training, GPU performance, Model-server stacks, Distributed training frameworks, Kubernetes, ML control planes, Time-series databases, Log-structured merge trees, Custom storage engine development, MLPerf submissions, Audited benchmarks, Contributions to OSS projects, Benchmarking multi-region fleets, Large clusters, Publications/talks on ML performance","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":206000,"maxValue":333000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5b6f9322-a9a"},"title":"Staff Engineer, Storage Engine","description":"<p>CoreWeave is seeking a Staff Engineer, Storage 
Engine to join their team. The successful candidate will design and implement distributed storage solutions to support scaling data-intensive AI workloads. They will contribute to the development of exabyte-scale, S3-compatible object storage and integrate dedicated storage clusters into diverse customer environments.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Designing and implementing distributed storage solutions to support scaling data-intensive AI workloads</li>\n<li>Contributing to the development of exabyte-scale, S3-compatible object storage</li>\n<li>Integrating dedicated storage clusters into diverse customer environments</li>\n<li>Working with technologies such as RDMA, GPU Direct Storage, and distributed filesystems protocols such as NFS or FUSE to optimize storage performance and efficiency</li>\n<li>Leading efforts to improve the reliability, durability, security, and observability of the storage stack</li>\n<li>Collaborating with operations teams to monitor, troubleshoot, and improve storage systems in production environments</li>\n<li>Setting the bar for developing metrics and dashboards to provide visibility into storage performance and health</li>\n<li>Analyzing telemetry and system data to drive improvements in throughput, latency, and resilience</li>\n<li>Working cross-functionally with platform, product, and infrastructure teams to deliver seamless storage capabilities across the stack</li>\n<li>Sharing knowledge and mentoring other engineers on best practices in building distributed, high-performance systems</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>Bachelor&#39;s, Master&#39;s, or PhD degree in Computer Science, Engineering, or a related field</li>\n<li>8-10+ years of experience working in storage systems engineering or infrastructure</li>\n<li>Strong hands-on experience with object storage or distributed filesystems in production environments</li>\n<li>Experience with one or more storage protocols (e.g. 
S3, NFS) and file systems such as Ceph, DAOS, or similar</li>\n<li>Proficiency in a systems programming language such as Go, C, or Rust</li>\n<li>Proficiency leveraging AI tools to augment software development</li>\n<li>Familiarity with storage observability tools and telemetry pipelines (e.g., ClickHouse, Prometheus, Grafana)</li>\n<li>Experience working with cloud-native infrastructure, Kubernetes, and scalable system architectures</li>\n</ul>\n<p>The base salary range for this role is $188,000 to $275,000.</p>","url":"https://yubhub.co/jobs/job_5b6f9322-a9a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4612047006","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$188,000 to $275,000","x-skills-required":["distributed storage","object storage","S3-compatible object storage","RDMA","GPU Direct Storage","distributed filesystems protocols","NFS","FUSE","storage performance and efficiency","reliability","durability","security","observability","telemetry","system data","throughput","latency","resilience","cloud-native infrastructure","Kubernetes","scalable system architectures"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:50:33.024Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"distributed storage, object storage, S3-compatible object storage, RDMA, GPU Direct Storage, distributed filesystems protocols, NFS, FUSE, storage performance and efficiency, reliability, durability, security, 
observability, telemetry, system data, throughput, latency, resilience, cloud-native infrastructure, Kubernetes, scalable system architectures","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":188000,"maxValue":275000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ec7cc743-ef4"},"title":"Senior Software Engineer II, Inference","description":"<p>We&#39;re seeking a senior software engineer to join our team and lead the design and development of our Kubernetes-native inference platform. As a senior engineer, you will be responsible for leading design reviews, driving architecture, and ensuring the reliability and scalability of our platform.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Leading design reviews and driving architecture within the team</li>\n<li>Defining and owning SLIs/SLOs and ensuring post-incident actions land and reliability improves release-over-release</li>\n<li>Implementing advanced optimizations such as micro-batch schedulers, speculative decoding, and KV-cache reuse</li>\n<li>Strengthening incident posture through capacity planning, autoscaling policy, and rollback/traffic-shift strategies</li>\n<li>Mentoring IC1/IC2 engineers and reviewing cross-team designs to elevate coding/testing standards</li>\n</ul>\n<p>We&#39;re looking for someone with strong coding skills in Python or Go, deep familiarity with networked systems and performance, and hands-on experience with Kubernetes at production scale. If you have experience with inference internals, batching, caching, mixed precision, and streaming token delivery, that&#39;s a plus.</p>\n<p>In addition to a competitive salary, we offer a range of benefits including medical, dental, and vision insurance, company-paid life insurance, and flexible PTO. 
We&#39;re committed to creating a work environment that&#39;s inclusive, diverse, and supportive of our employees&#39; well-being.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ec7cc743-ef4","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4604832006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $242,000","x-skills-required":["Python","Go","Kubernetes","Networked systems","Performance","Inference internals","Batching","Caching","Mixed precision","Streaming token delivery"],"x-skills-preferred":["CUDA kernels","NCCL/SHARP","RDMA/NUMA","GPU interconnect topologies","Contributions to inference frameworks","Experience with multi-team initiatives"],"datePosted":"2026-04-18T15:50:27.738Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Go, Kubernetes, Networked systems, Performance, Inference internals, Batching, Caching, Mixed precision, Streaming token delivery, CUDA kernels, NCCL/SHARP, RDMA/NUMA, GPU interconnect topologies, Contributions to inference frameworks, Experience with multi-team initiatives","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":242000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_594b20c4-c28"},"title":"Infrastructure Engineer, Security","description":"<p>We&#39;re looking for an infrastructure engineer to own and evolve the 
security infrastructure that underpins our foundation models. In this role, you&#39;ll work across compute, storage, networking, and data platforms, making sure our systems are secure, reliable, and built to scale.</p>\n<p>You&#39;ll shape controls, architecture, and tooling so that security is part of how the platform works by default. You&#39;ll partner closely with research and product teams, enabling them to move quickly while keeping our models, data, and environments protected.</p>\n<p>Key responsibilities include:</p>\n<p>Architecting security patterns for platforms and services, including network segmentation, service-to-service authentication, RBAC, and policy enforcement in Kubernetes and cloud environments.</p>\n<p>Managing identity, access, and secrets for humans and services: workload and cross-cloud identity, least-privilege IAM, and secrets management.</p>\n<p>Building secure platforms for data ingestion, processing, and curation: classification, encryption, access controls, and safe sharing patterns across teams.</p>\n<p>Writing threat models and reviewing designs with researchers and engineers to help them ship features and experiments in a safe, scalable way.</p>\n<p>Automating security checks and building guardrails: policy-as-code, secure infrastructure baselines, validation in CI/CD, and tools that make the secure path the easiest one.</p>\n<p>Requirements include:</p>\n<p>Bachelor&#39;s degree or equivalent experience in engineering, or similar.</p>\n<p>Strong background with containers and orchestration (e.g., Kubernetes) and how to secure them (namespaces, network policies, pod security, admission controls, etc.).</p>\n<p>Practical experience with Infrastructure as Code (Terraform or similar), including secure patterns for provisioning networks, IAM, and shared services.</p>\n<p>Solid understanding of cloud networking and security: VPCs, load balancers, service discovery, mTLS, firewalls, and zero-trust-style 
architectures.</p>\n<p>Proficiency with a systems language such as Rust and scripting in Python for building platform components and internal tools.</p>\n<p>Evidence of owning complex, production-critical systems, including debugging issues that span infra, security, and application layers.</p>\n<p>Preferred qualifications include experience with ML infrastructure, GPU clusters, or large-scale training environments, as well as background in AI labs, HPC environments, or ML-heavy organizations.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_594b20c4-c28","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Thinking Machines Lab","sameAs":"https://thinkingmachineslab.com/","logo":"https://logos.yubhub.co/thinkingmachineslab.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/thinkingmachines/jobs/5015964008","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$200,000 - $475,000 USD","x-skills-required":["Kubernetes","Infrastructure as Code","Cloud Networking and Security","Systems Language (Rust)","Scripting (Python)"],"x-skills-preferred":["ML Infrastructure","GPU Clusters","Large-Scale Training Environments","AI Labs","HPC Environments"],"datePosted":"2026-04-18T15:50:20.174Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Infrastructure as Code, Cloud Networking and Security, Systems Language (Rust), Scripting (Python), ML Infrastructure, GPU Clusters, Large-Scale Training Environments, AI Labs, HPC 
Environments","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":200000,"maxValue":475000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_372999e8-579"},"title":"Senior Software Engineer II, AI Workload Orchestration","description":"<p>As a Senior Software Engineer II on the AI Workload Orchestration team, you will help build and operate CoreWeave&#39;s Kubernetes-native platform for admitting, scheduling, and operating AI workloads at scale.</p>\n<p>This platform integrates multiple orchestration and scheduling frameworks such as Kueue, Volcano, and Ray to support modern AI training and inference workflows. It complements SUNK (Slurm on Kubernetes) by providing a Kubernetes-first, cloud-native orchestration layer with deep platform integration.</p>\n<p>You will own meaningful components of the platform, drive reliability and performance improvements, and help scale the system as customer demand and workload complexity continue to grow.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design, build, and operate Kubernetes-native services for AI workload orchestration and scheduling</li>\n<li>Own one or more platform components end-to-end, including design, implementation, testing, and on-call support</li>\n<li>Improve scheduling latency, cluster utilization, and workload reliability through metrics-driven engineering</li>\n<li>Contribute to architectural discussions across services and influence design decisions within the platform</li>\n<li>Work closely with adjacent teams (CKS, infrastructure, managed inference) to ensure clean interfaces and integrations</li>\n<li>Mentor junior engineers and raise the quality bar for code, design, and operations</li>\n</ul>\n<p>About the role:</p>\n<ul>\n<li>5–8 years of professional software engineering experience in distributed systems, cloud infrastructure, or platform 
engineering</li>\n<li>Strong experience building production systems in Go (Python or C++ a plus)</li>\n<li>Solid understanding of Kubernetes fundamentals, APIs, controllers, and operating services in production</li>\n<li>Experience working with scheduling, resource management, or quota-based systems</li>\n<li>Proven ability to improve system reliability and performance using data and operational metrics</li>\n<li>Comfortable owning services in production and participating in on-call rotations</li>\n</ul>\n<p>Preferred:</p>\n<ul>\n<li>Experience with Kubernetes-native orchestration frameworks such as Kueue, Volcano, Ray, Kubeflow, or Argo Workflows</li>\n<li>Familiarity with GPU-based workloads, ML training, or inference pipelines</li>\n<li>Knowledge of scheduling concepts such as quota enforcement, pre-emption, and backfilling</li>\n<li>Experience with reliability practices including SLOs, alerting, and incident response</li>\n<li>Exposure to AI infrastructure, HPC, or large-scale distributed compute environments</li>\n</ul>\n<p>Why CoreWeave?</p>\n<p>At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:</p>\n<ul>\n<li>Be Curious at Your Core</li>\n<li>Act Like an Owner</li>\n<li>Empower Employees</li>\n<li>Deliver Best-in-Class Client Experiences</li>\n<li>Achieve More Together</li>\n</ul>\n<p>The base salary range for this role is $165,000 to $242,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. 
In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>\n<p>What We Offer</p>\n<p>The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location.</p>\n<p>In addition to a competitive salary, we offer a variety of benefits to support your needs, including:</p>\n<ul>\n<li>Medical, dental, and vision insurance - 100% paid for by CoreWeave</li>\n<li>Company-paid Life Insurance</li>\n<li>Voluntary supplemental life insurance</li>\n<li>Short and long-term disability insurance</li>\n<li>Flexible Spending Account</li>\n<li>Health Savings Account</li>\n<li>Tuition Reimbursement</li>\n<li>Ability to Participate in Employee Stock Purchase Program (ESPP)</li>\n<li>Mental Wellness Benefits through Spring Health</li>\n<li>Family-Forming support provided by Carrot</li>\n<li>Paid Parental Leave</li>\n<li>Flexible, full-service childcare support with Kinside</li>\n<li>401(k) with a generous employer match</li>\n<li>Flexible PTO</li>\n<li>Catered lunch each day in our office and data center locations</li>\n<li>A casual work environment</li>\n<li>A work culture focused on innovative disruption</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_372999e8-579","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4647595006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to 
$242,000","x-skills-required":["Kubernetes","Go","Distributed systems","Cloud infrastructure","Platform engineering","Scheduling","Resource management","Quota-based systems"],"x-skills-preferred":["Kueue","Volcano","Ray","Kubeflow","Argo Workflows","GPU-based workloads","ML training","Inference pipelines","SLOs","Alerting","Incident response","AI infrastructure","HPC","Large-scale distributed compute environments"],"datePosted":"2026-04-18T15:50:19.636Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Go, Distributed systems, Cloud infrastructure, Platform engineering, Scheduling, Resource management, Quota-based systems, Kueue, Volcano, Ray, Kubeflow, Argo Workflows, GPU-based workloads, ML training, Inference pipelines, SLOs, Alerting, Incident response, AI infrastructure, HPC, Large-scale distributed compute environments","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":242000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_854e95b5-76b"},"title":"Sr. Director of Product, Research and Training Infrastructure","description":"<p>CoreWeave is seeking a visionary Sr. 
Director of Product, Research Training Infrastructure to lead the product strategy and engineering execution for the services that power the most ambitious AI research labs in the world.</p>\n<p>This executive leader will own the product strategy and engineering execution for the Research Training Stack, focusing on the specialized orchestration, evaluation, and iteration tools required for massive-scale pre-training and post-training.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Frontier Orchestration: Oversee the evolution of SUNK (Slurm on Kubernetes) to provide researchers with deterministic, bare-metal performance through a cloud-native interface.</li>\n</ul>\n<ul>\n<li>Holistic Training Services: Drive the development of next-generation orchestrators and automated training-based evaluation frameworks that ensure model quality throughout the lifecycle.</li>\n</ul>\n<ul>\n<li>Post-Training Excellence: Build the infrastructure required for sophisticated Reinforcement Learning (RL) and RLHF pipelines, enabling labs to refine foundation models with maximum efficiency.</li>\n</ul>\n<ul>\n<li>Customer Advocacy: Act as the primary technical partner for lead researchers at global AI labs, translating their &#39;future-state&#39; requirements into actionable product roadmaps.</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>Proven leadership experience in engineering leadership, with at least 5+ years managing large-scale infrastructure at a top-tier research lab or an AI-native cloud provider.</li>\n</ul>\n<ul>\n<li>Deep, hands-on knowledge of Slurm, Kubernetes, and the specific networking requirements (InfiniBand/RDMA) for distributed training clusters.</li>\n</ul>\n<ul>\n<li>Research mindset and understanding of the &#39;pain points&#39; of a research scientist.</li>\n</ul>\n<ul>\n<li>Scaling experience delivering mission-critical services on multi-thousand GPU clusters (H100/Blackwell/Rubin architectures).</li>\n</ul>\n<ul>\n<li>Strategic vision to 
define &#39;what&#39;s next&#39; in the AI stack, from automated RL loops to specialized sandbox environments.</li>\n</ul>\n<p>Why CoreWeave?</p>\n<p>In 2026, CoreWeave is the foundation of the largest infrastructure buildout in human history. We are building AI Factories, not just data centers.</p>\n<ul>\n<li>Silicon-Up Innovation: Work directly with the latest NVIDIA architectures.</li>\n</ul>\n<ul>\n<li>Impact: You will be the architect of the environment that enables the next new discovery.</li>\n</ul>\n<p>Velocity: We move at the speed of the researchers we support, bypassing legacy cloud bottlenecks to deliver raw power.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_854e95b5-76b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4665964006","x-work-arrangement":"hybrid","x-experience-level":"executive","x-job-type":"full-time","x-salary-range":"$233,000 to $341,000","x-skills-required":["Slurm","Kubernetes","InfiniBand/RDMA","Distributed training clusters","GPU clusters","H100/Blackwell/Rubin architectures","Reinforcement Learning (RL)","RLHF pipelines"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:50:11.130Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Slurm, Kubernetes, InfiniBand/RDMA, Distributed training clusters, GPU clusters, H100/Blackwell/Rubin architectures, Reinforcement Learning (RL), RLHF 
pipelines","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":233000,"maxValue":341000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_07256a9e-2a8"},"title":"Senior Electrical Engineer","description":"<p>We&#39;re seeking a skilled Senior Electrical Engineer to join our team. As a senior member of our engineering team, you will design servers, develop and review board designs, collaborate with exceptional engineers developing cutting-edge AI/ML hardware, and review JDM&#39;s high-speed design. You will also conduct schematics, board design, and power design reviews, take design from concept to mass production, and work closely with manufacturing teams. To be successful in this role, you should have at least 5 years of internal design experience developing complex hardware systems with high-speed design interfaces, strong skills in electrical board design, and solid experience with high-speed interfaces. 
You should also be able to negotiate and reach consensus with developers and fellow colleagues from interdisciplinary teams, as well as have excellent documentation skills.</p>\n<p>In addition to a competitive salary, we offer a variety of benefits to support your needs, including medical, dental, and vision insurance, company-paid life insurance, voluntary supplemental life insurance, short and long-term disability insurance, flexible spending account, health savings account, tuition reimbursement, ability to participate in employee stock purchase program (ESPP), mental wellness benefits through Spring Health, family-forming support provided by Carrot, paid parental leave, flexible, full-service childcare support with Kinside, 401(k) with a generous employer match, flexible PTO, catered lunch each day in our office and data center locations, and a casual work environment.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_07256a9e-2a8","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4606485006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $242,000","x-skills-required":["electrical board design","high-speed interfaces","server design","power design","thermal design","mechanical design","signal integrity","PCB design","PCBA design","system assembly manufacturing","testing","design for mass manufacturing","reliability"],"x-skills-preferred":["hyperscaler space","GPU systems","ODM/JDM design model","high-speed SI simulation tools"],"datePosted":"2026-04-18T15:50:03.750Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York, NY / Sunnyvale, CA / Bellevue, 
WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"electrical board design, high-speed interfaces, server design, power design, thermal design, mechanical design, signal integrity, PCB design, PCBA design, system assembly manufacturing, testing, design for mass manufacturing, reliability, hyperscaler space, GPU systems, ODM/JDM design model, high-speed SI simulation tools","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":242000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a8092b6e-7f5"},"title":"Bare Metal Support Engineer","description":"<p>As a Bare Metal Support Engineer at CoreWeave, you will be responsible for supporting, operating, and maintaining CoreWeave&#39;s extensive GPU fleet across our growing data centers in the U.S., Europe, and beyond.</p>\n<p>You will work closely with customers, data center technicians, and engineering teams to ensure the reliability, performance, and scalability of our infrastructure.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Providing high-level support for customers utilizing bare-metal GPU fleets on CoreWeave Cloud.</li>\n<li>Diagnosing, triaging, and investigating reported customer issues and high-priority incidents, identifying root causes and escalating when necessary.</li>\n<li>Developing a deep understanding of customer workloads and use cases to provide tailored technical support.</li>\n<li>Coordinating remote troubleshooting and hardware interventions with Data Center Technicians.</li>\n<li>Creating and maintaining internal documentation, including troubleshooting guides, best practices, and knowledge base articles.</li>\n<li>Participating in an on-call rotation to support production clusters and ensure operational reliability.</li>\n<li>Collaborating with engineering teams to improve 
hardware reliability, software stability, and system performance.</li>\n<li>Implementing automation and scripting to streamline support workflows and reduce manual interventions.</li>\n<li>Performing in-depth log analysis and debugging across multiple layers of the stack (firmware, drivers, hardware).</li>\n<li>Providing feedback to internal teams on common support issues to drive continuous improvements.</li>\n<li>Working with networking teams to troubleshoot connectivity issues affecting customer workloads.</li>\n<li>Supporting supercomputing infrastructure running GPU workloads at scale.</li>\n<li>Driving operational excellence by refining internal processes and support methodologies.</li>\n</ul>\n<p>To succeed in this role, you will need:</p>\n<ul>\n<li>Experience in data centers, GPU clusters, server deployments, system administration, or hardware troubleshooting.</li>\n<li>Demonstrated experience driving resolutions and continuous improvements across cross-functional environments and teams within a data center environment.</li>\n<li>Intermediate knowledge of Linux (Ubuntu, CentOS, or similar), including command-line proficiency.</li>\n<li>Experience with NVIDIA GPUs, SuperMicro systems, Dell systems, high-performance computing (HPC), and large-scale data center environments.</li>\n<li>Experience in networking fundamentals (TCP/IP, VLANs, DNS, DHCP) and troubleshooting tools.</li>\n<li>Hands-on experience with firmware updates, BIOS configurations, and driver management.</li>\n<li>Experience analyzing system logs and debugging issues across firmware, drivers, and hardware layers.</li>\n<li>Experience working with Jira, Confluence, Notion, or other issue-tracking and documentation platforms.</li>\n<li>Experience in scripting and automation (Python, Bash, Ansible, or similar).</li>\n</ul>\n<p>If you&#39;re a curious and analytical individual with a passion for problem-solving and a desire to work in a fast-paced environment, we&#39;d love to hear from 
you!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a8092b6e-7f5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4560350006","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$83,000 to $132,000","x-skills-required":["Linux","GPU clusters","server deployments","system administration","hardware troubleshooting","NVIDIA GPUs","SuperMicro systems","Dell systems","high-performance computing","large-scale data center environments","networking fundamentals","troubleshooting tools","firmware updates","BIOS configurations","driver management","system logs","debugging issues","Jira","Confluence","Notion","issue-tracking","documentation platforms","scripting","automation"],"x-skills-preferred":["Kubernetes","Docker","containerized infrastructure"],"datePosted":"2026-04-18T15:49:58.535Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux, GPU clusters, server deployments, system administration, hardware troubleshooting, NVIDIA GPUs, SuperMicro systems, Dell systems, high-performance computing, large-scale data center environments, networking fundamentals, troubleshooting tools, firmware updates, BIOS configurations, driver management, system logs, debugging issues, Jira, Confluence, Notion, issue-tracking, documentation platforms, scripting, automation, Kubernetes, Docker, containerized 
infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":83000,"maxValue":132000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a2b0b667-b4a"},"title":"Senior DSP Engineer","description":"<p>Anduril Industries is seeking a Senior DSP Engineer to join their team. As a Senior DSP Engineer, you will guide DSP engineers in the execution of DSP trade studies and optimization of signal processing and machine learning algorithms for deployment on FPGAs and GPUs. You will collaborate with a multidisciplinary team of software and hardware engineers to develop software defined radios; and direct DSP team in the engagement with the software &amp; hardware team, including the implementation of DSP techniques into software and firmware and integration activities. You will design and implement algorithms and techniques for RADAR systems, develop Modeling and Simulation (M&amp;S) code for RADAR techniques and data analysis including Hardware-in-the Loop / Software-in-the-loop (HIL/SIL) testing, participate in laboratory and field testing of RF systems and techniques, and participate in the maturation of RF systems into deployable systems and products.</p>\n<p>The ideal candidate will have 7+ years of experience with a BSEE or related field, strong experience with DSP implementation for embedded devices and/or software defined radios, strong knowledge of Python and MATLAB, experience with CUDA or GPU accelerated frameworks like cuSignal, experience with embedded devices, including FPGA, Nvidia Jetson, and Software Defined Radios, skilled with Modeling and Simulation of RF systems including Radar and SAR, familiar with deep learning algorithms, experience with ML frameworks such as TensorFlow and PyTorch, familiar with wireless communication standards (Bluetooth, 3G/4G/5G, Wi-Fi, SINCGARS, MUOS, etc.), excellent at balancing 
multiple projects at any given time and/or managing a larger team for a larger program, enthusiastic about both working with a team and executing some work individually (depending on program scope), experience with Electronic Warfare systems, and currently possesses and is able to maintain an active U.S. Secret security clearance.</p>\n<p>Preferred qualifications include a Master&#39;s or PhD degree in Electrical, Electronics, Computer Engineering, or related fields, defense, national security, or aerospace domain familiarity through industry or education, extensive Digital Signal Processing (DSP) knowledge and experience, expertise in Synthetic Aperture Radar (SAR) and/or Inverse SAR (ISAR): Image formation, waveforms, phenomenology, modeling and simulation.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a2b0b667-b4a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anduril Industries","sameAs":"https://anduril.com","logo":"https://logos.yubhub.co/anduril.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/andurilindustries/jobs/5031497007","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$191,000-$253,000 USD","x-skills-required":["Digital Signal Processing","Embedded Devices","Software Defined Radios","Python","MATLAB","CUDA","GPU Accelerated Frameworks","Modeling and Simulation","RF Systems","Radar and SAR","Deep Learning Algorithms","ML Frameworks","Wireless Communication Standards","Electronic Warfare Systems"],"x-skills-preferred":["Synthetic Aperture Radar (SAR)","Inverse SAR (ISAR)","Image Formation","Waveforms","Phenomenology"],"datePosted":"2026-04-18T15:49:39.269Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Costa Mesa, California, United 
States"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Digital Signal Processing, Embedded Devices, Software Defined Radios, Python, MATLAB, CUDA, GPU Accelerated Frameworks, Modeling and Simulation, RF Systems, Radar and SAR, Deep Learning Algorithms, ML Frameworks, Wireless Communication Standards, Electronic Warfare Systems, Synthetic Aperture Radar (SAR), Inverse SAR (ISAR), Image Formation, Waveforms, Phenomenology","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":191000,"maxValue":253000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_46f8a259-843"},"title":"Technical Project Manager - Afton","description":"<p>We are seeking a highly skilled Technical Project Manager with expertise in data center deployments to join our team. The ideal candidate will have a deep understanding of data center infrastructure, including power, cooling, networking, and server technologies, combined with strong project management skills to ensure projects are delivered on time, within scope, and on budget.</p>\n<p>The role will require you to be 100% on-site in Afton/Lubbock, TX. 
As a Technical Project Manager, you will lead the full lifecycle of multiple simultaneous data center deployment projects, including design, construction, testing, commissioning, and handover to operations.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Project planning and execution</li>\n<li>Stakeholder management</li>\n<li>Resource management</li>\n<li>Risk management</li>\n<li>Scheduling and budgeting</li>\n<li>Quality assurance</li>\n<li>Documentation</li>\n<li>Technical oversight and collaboration</li>\n<li>Continual improvement</li>\n</ul>\n<p>To be successful in this role, you will need to have 5+ years of direct, hands-on experience in data center deployment, data center construction/whitespace project management, or technical project management in data centers. You will also need to have experience with high-performance compute environments and GPU technologies, as well as proven project management skills and a strong understanding of project management methodologies and tools.</p>\n<p>In addition to your technical skills and experience, you will need to have excellent leadership and team management skills, strong communication and interpersonal skills, and the ability to analyze complex technical issues, troubleshoot problems, and propose innovative solutions.</p>\n<p>If you are a motivated and experienced Technical Project Manager looking for a new challenge, please apply for this exciting opportunity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_46f8a259-843","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4625187006","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$122,000 to 
$179,000","x-skills-required":["data center deployment","project management","high-performance compute environments","GPU technologies","project management methodologies and tools"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:49:18.824Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Afton, TX"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"data center deployment, project management, high-performance compute environments, GPU technologies, project management methodologies and tools","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":122000,"maxValue":179000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_95061695-858"},"title":"Director of Engineering, Media & Entertainment (M&E)","description":"<p>CoreWeave is seeking a Director of Engineering, Media &amp; Entertainment (M&amp;E) to lead the development of next-generation cloud platforms and tools that power modern content creation workflows. This role will drive the engineering strategy and execution for solutions that support visual effects (VFX), animation, rendering, and post-production pipelines used by studios, artists, and creative teams worldwide.</p>\n<p>As a senior engineering leader, you will build and lead high-performing engineering teams responsible for designing scalable infrastructure, developer tools, and user-facing systems that enable creative professionals to run complex production workloads in the cloud. 
You will collaborate closely with product, design, infrastructure, and customer teams to translate real-world production workflows into reliable, high-performance software platforms.</p>\n<p>This role combines deep engineering leadership with domain expertise in M&amp;E workflows, ensuring that the platform delivers exceptional performance, reliability, and usability for demanding creative workloads.</p>\n<p><strong>Leadership &amp; Strategy</strong></p>\n<p>-Build and scale high-performing engineering teams focused on cloud platforms for media production workloads including rendering, simulation, and content processing. -Recruit, mentor, and develop engineering managers and senior engineers while fostering a culture of innovation, accountability, and collaboration. -Define and execute the long-term engineering strategy for Media &amp; Entertainment products and services. -Partner with Product and Design leaders to translate industry workflows and customer needs into scalable platform capabilities. -Establish engineering best practices for reliability, security, observability, and operational excellence. -Drive roadmap alignment between engineering initiatives and strategic business objectives.</p>\n<p><strong>Technical Leadership</strong></p>\n<p>-Lead the design and development of scalable backend services, APIs, and developer interfaces that power M&amp;E cloud workflows. -Build platforms that support demanding workloads such as rendering, asset processing, and distributed compute pipelines. -Drive architecture decisions for cloud-native systems leveraging technologies such as Kubernetes, distributed services, and infrastructure-as-code. -Ensure the platform enables self-service provisioning, automation, and repeatable workflows for production pipelines. -Establish engineering standards around performance, scalability, and security for enterprise-grade SaaS/PaaS systems. 
-Oversee system reliability and operational readiness through clear SLOs, monitoring, and runbook-driven on-call practices.</p>\n<p><strong>Product &amp; Workflow Collaboration</strong></p>\n<p>-Work closely with product leadership to define technical requirements aligned with real customer workflows in animation, VFX, and media production. -Engage directly with studios, artists, and technical directors to understand pipeline challenges and incorporate feedback into product development. -Translate industry needs into clear engineering priorities and technical roadmaps. -Guide development teams through product milestones including specification, development, testing, and release. -Ensure engineering efforts balance customer requirements, technical feasibility, and business goals.</p>\n<p>Customer and industry collaboration is critical in identifying workflow needs and transforming them into actionable development plans for engineering teams.</p>\n<p><strong>Operational Excellence</strong></p>\n<p>-Implement engineering processes that support scalable development, including CI/CD pipelines, testing strategies, and code review standards. -Manage development timelines and resource allocation across multiple engineering teams. -Track key operational and customer metrics including performance, reliability, and cost efficiency. -Drive continuous improvement in engineering productivity and system performance. -Partner with QA, support, and customer success teams to ensure high-quality releases and strong user satisfaction.</p>\n<p><strong>Who You Are:</strong></p>\n<p><strong>Required Qualifications</strong></p>\n<p>-10+ years of software engineering experience, including leadership of engineering teams and managers -Proven experience building and scaling cloud-based platforms or distributed systems. -Strong understanding of cloud infrastructure, microservices architecture, and automation technologies. 
-Experience delivering enterprise SaaS or PaaS products used by external customers. -Excellent leadership, communication, and cross-functional collaboration skills. -Ability to operate strategically while remaining deeply technical and hands-on with architecture decisions.</p>\n<p><strong>Preferred Qualifications</strong></p>\n<p>-Experience building platforms or tools for Media &amp; Entertainment workflows such as VFX, animation, rendering, or post-production pipelines. -Familiarity with industry tools such as Maya, Houdini, Katana, Cinema 4D, V-Ray, Arnold, or RenderMan. -Experience designing APIs, developer platforms, or automation frameworks used by technical users. -Knowledge of GPU-accelerated compute workloads and distributed rendering systems. -Experience working with Kubernetes, infrastructure-as-code, and large-scale cloud environments.</p>\n<p><strong>What Success Looks Like</strong></p>\n<p>-Engineering teams delivering reliable, scalable platforms used by media studios and creative teams globally. -Clear alignment between product vision, customer workflows, and engineering execution. -Platforms capable of supporting large-scale production workloads with high performance and reliability. -Strong engineering culture focused on innovation, collaboration, and operational excellence.</p>\n<p>Wondering if you’re a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren&#39;t a 100% skill or experience match.</p>\n<p><strong>Why CoreWeave?</strong></p>\n<p>At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. 
Our team cares deeply about how we build our product and how we work together, which is represented through our core values:</p>\n<p>-Be Curious at Your Core -Act Like an Owner -Empower Employees -Deliver Best-in-Class Client Experiences -Achieve More Together</p>\n<p>We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and provides the opportunity to develop innovative solutions to complex problems. As we get set for take off, the growth opportunities within the organization are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!</p>\n<p>The base salary range for this role is $206,000 to $303,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation.</p>","url":"https://yubhub.co/jobs/job_95061695-858","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4666156006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$206,000 - $303,000","x-skills-required":["Cloud infrastructure","Microservices architecture","Automation technologies","Enterprise SaaS or PaaS products","Leadership","Communication","Cross-functional collaboration","Strategic decision-making"],"x-skills-preferred":["Media & Entertainment workflows","VFX, animation, rendering, or post-production pipelines","Industry tools such as Maya, Houdini, Katana, Cinema 4D, V-Ray, Arnold, or RenderMan","APIs, developer platforms, or automation 
frameworks","GPU-accelerated compute workloads and distributed rendering systems","Kubernetes, infrastructure-as-code, and large-scale cloud environments"],"datePosted":"2026-04-18T15:49:14.916Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / San Francisco, CA / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cloud infrastructure, Microservices architecture, Automation technologies, Enterprise SaaS or PaaS products, Leadership, Communication, Cross-functional collaboration, Strategic decision-making, Media & Entertainment workflows, VFX, animation, rendering, or post-production pipelines, Industry tools such as Maya, Houdini, Katana, Cinema 4D, V-Ray, Arnold, or RenderMan, APIs, developer platforms, or automation frameworks, GPU-accelerated compute workloads and distributed rendering systems, Kubernetes, infrastructure-as-code, and large-scale cloud environments","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":206000,"maxValue":303000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2d198020-3d5"},"title":"Sr. Engineer, Storage","description":"<p>The Storage Engine Team at CoreWeave is responsible for the product capabilities and data plane function of CoreWeave&#39;s managed storage products. We build reliable, scalable storage solutions with segment leading performance. 
The Storage Engine team works with engineering teams across infrastructure, compute, and platform to ensure our storage services meet the needs of the world&#39;s most demanding AI workloads.</p>\n<p>The primary responsibilities of this role include designing and implementing distributed storage solutions to support scaling data-intensive AI workloads, contributing to the development of exabyte-scale, S3-compatible object storage, and integrating dedicated storage clusters into diverse customer environments. Additionally, the successful candidate will work with technologies such as RDMA, GPU Direct Storage, and distributed filesystem protocols such as NFS or FUSE to optimize storage performance and efficiency.</p>\n<p>Key responsibilities also include leading efforts to improve the reliability, durability, security, and observability of our storage stack, collaborating with operations teams to monitor, troubleshoot, and improve storage systems in production environments, setting the bar for developing metrics and dashboards to provide visibility into storage performance and health, analyzing telemetry and system data to drive improvements in throughput, latency, and resilience, and working cross-functionally with platform, product, and infrastructure teams to deliver seamless storage capabilities across the stack.</p>\n<p>A key aspect of this role is sharing knowledge and mentoring other engineers on best practices in building distributed, high-performance systems.</p>\n<p>To be successful in this role, the ideal candidate will have a strong background in storage systems engineering or infrastructure, with a minimum of 8-10 years of experience. They will also have hands-on experience with object storage or distributed filesystems in production environments, as well as proficiency in a systems programming language such as Go, C, or Rust. 
Additionally, they will have experience working with cloud-native infrastructure, Kubernetes, and scalable system architectures, and familiarity with storage observability tools and telemetry pipelines.</p>\n<p>If you&#39;re a motivated and experienced engineer looking to join a dynamic team and contribute to the development of cutting-edge storage solutions, we encourage you to apply for this exciting opportunity.</p>","url":"https://yubhub.co/jobs/job_2d198020-3d5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4664429006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$143,000 to $210,000","x-skills-required":["storage systems engineering","infrastructure","object storage","distributed filesystems","RDMA","GPU Direct Storage","NFS","FUSE","cloud-native infrastructure","Kubernetes","scalable system architectures","storage observability tools","telemetry pipelines"],"x-skills-preferred":["Go","C","Rust","distributed systems","high-performance systems","storage performance and efficiency"],"datePosted":"2026-04-18T15:49:07.662Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"storage systems engineering, infrastructure, object storage, distributed filesystems, RDMA, GPU Direct Storage, NFS, FUSE, cloud-native infrastructure, Kubernetes, scalable system architectures, storage observability tools, telemetry pipelines, Go, C, Rust, distributed systems, high-performance systems, storage performance and 
efficiency","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":143000,"maxValue":210000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d6f9b362-dbe"},"title":"Senior Machine Learning Engineer, ML Training Platform","description":"<p>As a Senior Machine Learning Engineer on the Machine Learning Platform team at Reddit, you will be instrumental in architecting, implementing, and maintaining foundational Machine Learning (ML) infrastructure that powers Feeds Ranking, Content Understanding, Recommendations and more.</p>\n<p>You will deliver a self-service ML platform that enables the continuous iteration and improvement of systems that use ML techniques including Deep Learning, Natural Language Processing, Recommendation Systems, Representation Learning and Computer Vision.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Leading the building, testing, and maintenance of ML training infrastructure at Reddit</li>\n<li>Designing, building, and optimizing the infrastructure and tooling required to support large-scale machine learning workflows</li>\n<li>Evolving the MLE experience, from provisioning interactive GPU environments through large-scale training, supporting on-demand and self-service workflows</li>\n</ul>\n<p>You will work closely with the underlying compute team to ensure MLEs have efficient access to training hardware resources and handle resource contention gracefully.</p>\n<p>In addition to technical expertise, you will treat internal MLEs as your customers, conducting user research, reducing friction in the &#39;Idea-to-Prototype&#39; loop, and standardizing software environments (Docker images, Python dependency management).</p>\n<p>To be successful in this role, you will have 5+ years of software engineering experience, with a focus on Platform Engineering, ML Infrastructure, or Backend Systems. 
You will also have deep Kubernetes expertise, Jupyter Ecosystem knowledge, strong coding skills in Python and Go, and experience with GPU environments, cloud providers, and distributed training frameworks.</p>","url":"https://yubhub.co/jobs/job_d6f9b362-dbe","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Reddit","sameAs":"https://www.redditinc.com","logo":"https://logos.yubhub.co/redditinc.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/reddit/jobs/7074776","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$216,700-$303,400 USD","x-skills-required":["Kubernetes","Jupyter Ecosystem","Python","Go","GPU environments","Cloud providers","Distributed training frameworks"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:48:57.345Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote - United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Jupyter Ecosystem, Python, Go, GPU environments, Cloud providers, Distributed training frameworks","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":216700,"maxValue":303400,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_59e88547-efc"},"title":"Senior Software Engineer, Systems","description":"<p>About Anthropic</p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. 
We want AI to be safe and beneficial for our users and for society as a whole.</p>\n<p>About the Role</p>\n<p>Anthropic&#39;s Infrastructure organization is foundational to our mission of developing AI systems that are reliable, interpretable, and steerable. The systems we build determine how quickly we can train new models, how reliably we can run safety experiments, and how effectively we can scale Claude to millions of users, demonstrating that safe, reliable infrastructure and frontier capabilities can go hand in hand. The Systems engineering team owns compute uptime and resilience at massive scale, building the clusters, automation, and observability that make frontier AI research possible and safely deployable to customers.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Lead infrastructure projects from design through delivery, owning scope, execution, and outcomes</li>\n<li>Build and maintain systems that support AI clusters at massive scale (thousands to hundreds of thousands of machines)</li>\n<li>Partner with cloud providers and internal teams to solve compute, networking, and reliability challenges</li>\n<li>Tackle difficult technical problems in your domain and proactively fill gaps in tooling, documentation, and processes</li>\n<li>Contribute to operational practices including incident response, postmortems, and on-call rotations</li>\n</ul>\n<p>Benefits</p>\n<ul>\n<li>Competitive compensation and benefits</li>\n<li>Optional equity donation matching</li>\n<li>Generous vacation and parental leave</li>\n<li>Flexible working hours</li>\n<li>Lovely office space in which to collaborate with colleagues</li>\n</ul>\n<p>Requirements</p>\n<ul>\n<li>6+ years of software engineering experience</li>\n<li>Have led technical projects end-to-end over multiple months, including scoping, breaking down work, and driving delivery</li>\n<li>Have deep knowledge of distributed systems, reliability, and cloud platforms (Kubernetes, IaC, AWS/GCP)</li>\n<li>Are strong in at least 
one systems language (Python, Rust, Go, Java)</li>\n<li>Solve hard problems independently and know when to pull others in</li>\n<li>Help teammates grow through knowledge sharing and thoughtful technical guidance</li>\n<li>Communicate clearly in design docs, presentations, and cross-functional discussions</li>\n</ul>\n<p>Preferred Qualifications</p>\n<ul>\n<li>Security and privacy best practice expertise</li>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Low level systems experience, for example linux kernel tuning and eBPF</li>\n</ul>","url":"https://yubhub.co/jobs/job_59e88547-efc","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4915842008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£240,000-£325,000 GBP","x-skills-required":["Distributed systems","Reliability","Cloud platforms","Kubernetes","IaC","AWS/GCP","Systems language","Python","Rust","Go","Java"],"x-skills-preferred":["Security and privacy best practice","Machine learning infrastructure","GPUs","TPUs","Trainium","Networking infrastructure","NCCL","Low level systems experience","Linux kernel tuning","eBPF"],"datePosted":"2026-04-18T15:48:47.617Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Distributed systems, Reliability, Cloud platforms, Kubernetes, IaC, AWS/GCP, Systems language, Python, Rust, Go, Java, Security and privacy best practice, Machine learning infrastructure, 
GPUs, TPUs, Trainium, Networking infrastructure, NCCL, Low level systems experience, Linux kernel tuning, eBPF","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":240000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f93bc55a-0cb"},"title":"Data Center Operations Cost Manager","description":"<p>As a Data Center Operations Cost Manager at CoreWeave, you will own and manage data center construction cost planning, forecasting, and control processes from feasibility through final closeout. Your key responsibilities will include leading pre-construction budgeting and detailed cost modeling to optimize capital efficiency, implementing and enforcing financial controls and leadership reporting mechanisms, developing and maintaining program-level cost forecasts across multiple projects and regions, providing cost estimates and commercial risk assessments to inform location strategy and investment decisions, benchmarking construction costs, escalation trends, and market rates across regions to maintain current cost intelligence, and managing internal cost control resources and external cost consultants to ensure consistency and excellence.</p>\n<p>You will be responsible for coordinating and executing complex, multi-team technology and operational initiatives, interpreting technical data, mapping workflows, and driving continuous process improvements. 
You will also be an expert in managing project timelines, planning resources, and ensuring clear communication with all stakeholders.</p>\n<p>In this role, you will work closely with the DC (Data Center) Program and Cost Management team to ensure strategic program objectives are translated into successful outcomes and the cost structure of building and managing data centers is optimized.</p>\n<p>CoreWeave is a rapidly growing company that is looking for a highly skilled and experienced professional to join our team. We offer a competitive salary range of $157,000 to $210,000, a discretionary bonus, equity awards, and a comprehensive benefits program.</p>","url":"https://yubhub.co/jobs/job_f93bc55a-0cb","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4670115006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$157,000 to $210,000","x-skills-required":["Cost management","Construction cost planning","Analysis","Data center deployment","Infrastructure project implementation","Cost management in a data center environment"],"x-skills-preferred":["Data Center Infrastructure","GPU-based technologies","Software tools such as NetSuite, Oracle, SAP, Maximo, Tableau, AutoCAD Construction Cloud, Procore, Sunbird, or building analytics/AI platforms"],"datePosted":"2026-04-18T15:48:09.854Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / San Francisco, CA / Bellevue, WA / Richmond, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cost management, Construction cost planning, 
Analysis, Data center deployment, Infrastructure project implementation, Cost management in a data center environment, Data Center Infrastructure, GPU-based technologies, Software tools such as NetSuite, Oracle, SAP, Maximo, Tableau, AutoCAD Construction Cloud, Procore, Sunbird, or building analytics/AI platforms","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":157000,"maxValue":210000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9701c504-1a6"},"title":"Senior Software Engineer I, Inference","description":"<p>We&#39;re looking for a Senior Software Engineer I to join our team. As a senior engineer, you&#39;ll lead designs, raise engineering standards, and deliver measurable improvements to latency, throughput, and reliability across multiple services. You&#39;ll partner with product, orchestration, and hardware teams to evolve our Kubernetes-native inference platform and meet strict P99 SLAs at scale.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Lead design reviews and drive architecture within the team; decompose multi-service work into clear milestones.</li>\n<li>Define and own SLIs/SLOs; ensure post-incident actions land and reliability improves release-over-release.</li>\n<li>Implement advanced optimizations (e.g., micro-batch schedulers, speculative decoding, KV-cache reuse) and quantify impact.</li>\n<li>Strengthen incident posture: capacity planning, autoscaling policy, graceful degradation, rollback/traffic-shift strategies.</li>\n<li>Mentor IC1/IC2 engineers; review cross-team designs and elevate coding/testing standards.</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>3-5 years of industry experience building distributed systems or cloud services.</li>\n<li>Strong coding in Python or Go (C++ a plus) and deep familiarity with networked systems and performance.</li>\n<li>Hands-on experience 
with Kubernetes at production scale, CI/CD, and observability stacks (Prometheus, Grafana, OpenTelemetry).</li>\n<li>Practical knowledge of inference internals: batching, caching, mixed precision (BF16/FP8), streaming token delivery.</li>\n<li>Proven track record improving tail latency (P95/P99) and service reliability through metrics-driven work.</li>\n</ul>\n<p>Preferred qualifications include contributions to inference frameworks, experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies, and leading multi-team initiatives or partnering with customers on mission-critical launches.</p>","url":"https://yubhub.co/jobs/job_9701c504-1a6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4647603006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$139,000 to $204,000","x-skills-required":["Python","Go","Kubernetes","CI/CD","Observability stacks","Inference internals","Batching","Caching","Mixed precision","Streaming token delivery"],"x-skills-preferred":["Contributions to inference frameworks","CUDA kernels","NCCL/SHARP","RDMA/NUMA","GPU interconnect topologies"],"datePosted":"2026-04-18T15:48:09.297Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Go, Kubernetes, CI/CD, Observability stacks, Inference internals, Batching, Caching, Mixed precision, Streaming token delivery, Contributions to inference frameworks, CUDA kernels, NCCL/SHARP, RDMA/NUMA, GPU interconnect 
topologies","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139000,"maxValue":204000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_126e36d8-668"},"title":"Perception Engineering Intern","description":"<p>We are seeking a perception engineer with a strong background in computer vision to join our rapidly growing team in Costa Mesa, CA. In this role, you will be at the forefront of developing advanced perception systems for complex autonomous aerial platforms.</p>\n<p>Your expertise in computer vision algorithms, combined with your understanding of robotics principles, will be crucial in solving a wide variety of challenges involving visual perception, SLAM, motion planning, controls, and state estimation. This role requires not only technical expertise in computer vision and robotics but also the ability to make pragmatic engineering tradeoffs, considering the unique constraints of aerial platforms.</p>\n<p>Your work will directly contribute to the seamless integration of Anduril&#39;s products, achieving critical outcomes in autonomous operations. 
This position demands strong systems-level knowledge and experience, as you&#39;ll be working at the intersection of computer vision, robotics, and autonomous systems.</p>\n<p>If you are passionate about pushing the boundaries of computer vision in robotics, possess a &#39;Whatever It Takes&#39; mindset, and can execute in an expedient, scalable, and pragmatic way while keeping the mission top-of-mind and making sound engineering decisions, then this role is for you.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Work at the intersection of 3D perception and computer vision, developing robust algorithms that power real-time decision-making for autonomous aerial systems.</li>\n</ul>\n<ul>\n<li>Develop and implement advanced structure from motion and SLAM algorithms to create accurate 3D models from multiple camera inputs in real-time.</li>\n</ul>\n<ul>\n<li>Integrate perception outputs with path planning algorithms to enable autonomous navigation in complex, unstructured environments.</li>\n</ul>\n<ul>\n<li>Design experiments, data collection efforts, and curate training/evaluation sets to develop insights for both internal purposes and customers.</li>\n</ul>\n<ul>\n<li>Collaborate closely with robotics, software, and hardware teams to integrate perception algorithms into autonomous aerial systems.</li>\n</ul>\n<ul>\n<li>Work with vendors and government stakeholders to advance the state-of-the-art in perception and world modeling for autonomous aerial systems.</li>\n</ul>\n<p>Required Qualifications:</p>\n<ul>\n<li>BS in Robotics, Computer Science, Mechatronics, Electrical Engineering, Mechanical Engineering, or related field.</li>\n</ul>\n<ul>\n<li>Strong knowledge of 3D computer vision concepts, including multi-view geometry, camera models, photogrammetry, depth estimation, and 3D reconstruction techniques.</li>\n</ul>\n<ul>\n<li>Fluency in standard domain libraries (numpy, opencv, pytorch, etc).</li>\n</ul>\n<ul>\n<li>Proven understanding of data structures, 
algorithms, concurrency, and code optimization.</li>\n</ul>\n<ul>\n<li>Experience working with Python, PyTorch, or C++ programming languages.</li>\n</ul>\n<ul>\n<li>Experience deploying software to end customers, internal or external.</li>\n</ul>\n<ul>\n<li>Must be willing to travel 25%.</li>\n</ul>\n<ul>\n<li>Eligible to obtain an active U.S. Secret security clearance.</li>\n</ul>\n<p>Preferred Qualifications:</p>\n<ul>\n<li>MS or PhD in Robotics, Computer Science, Engineering, or related field.</li>\n</ul>\n<ul>\n<li>Experience with perception systems for aerial robotics or other highly dynamic platforms.</li>\n</ul>\n<ul>\n<li>Experience with real-world sensor integrations, including LiDAR, RGB-D cameras, IR cameras, stereo cameras, or TOF cameras.</li>\n</ul>\n<ul>\n<li>Experience with GPU / CUDA programming for accelerated computer vision processing.</li>\n</ul>\n<ul>\n<li>Knowledge of path planning algorithms and their integration with perception systems in dynamic environments.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_126e36d8-668","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anduril Industries","sameAs":"https://www.anduril.com/","logo":"https://logos.yubhub.co/anduril.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/andurilindustries/jobs/4830032007","x-work-arrangement":"onsite","x-experience-level":"entry","x-job-type":"internship","x-salary-range":null,"x-skills-required":["computer vision","robotics","Python","PyTorch","C++","numpy","opencv","data structures","algorithms","concurrency","code optimization"],"x-skills-preferred":["perception systems","aerial robotics","LiDAR","RGB-D cameras","IR cameras","stereo cameras","TOF cameras","GPU","CUDA"],"datePosted":"2026-04-18T15:48:07.380Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Costa Mesa, 
California, United States"}},"employmentType":"INTERN","occupationalCategory":"Engineering","industry":"Technology","skills":"computer vision, robotics, Python, PyTorch, C++, numpy, opencv, data structures, algorithms, concurrency, code optimization, perception systems, aerial robotics, LiDAR, RGB-D cameras, IR cameras, stereo cameras, TOF cameras, GPU, CUDA"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0f249232-d14"},"title":"Principal Engineer, Cluster Orchestration","description":"<p>As a Principal Engineer in AI Infrastructure, you will lead the design and evolution of the cluster orchestration systems that make this possible. This includes Slurm, Kubernetes, SUNK, and the control planes that support AI training, inference, and model onboarding at scale.</p>\n<p>You will define long-term architecture, solve hard scaling problems, and set technical direction across teams. Your work will directly affect how quickly customers can run models, how efficiently we use GPUs, and how reliably the platform behaves at scale.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Defining the long-term architecture for CoreWeave&#39;s orchestration platforms across Kubernetes, Slurm, SUNK, Kueue, and related systems.</li>\n<li>Acting as a technical authority on scheduling, quota enforcement, fairness, pre-emption, and multi-tenant GPU isolation.</li>\n<li>Making design decisions that balance performance, reliability, cost, and operational complexity.</li>\n</ul>\n<p>In addition to these responsibilities, you will also lead the evolution of Kubernetes-native control planes, including SUNK and custom operators, and design systems that support workload admission, validation, and rollout, including model onboarding flows.</p>\n<p>You will work closely with cross-functional teams to ensure that the systems you design and implement meet the needs of our customers and are scalable, reliable, and 
efficient.</p>\n<p>If you have a passion for building large-scale distributed systems and are looking for a challenging and rewarding role, we encourage you to apply.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0f249232-d14","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4658799006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$206,000 to $303,000","x-skills-required":["Kubernetes","Slurm","SUNK","Go","Cloud-native systems development","GPU-heavy platforms for AI training, inference, or HPC workloads"],"x-skills-preferred":["Kueue","Kubeflow","Argo Workflows","Ray","Istio","Knative"],"datePosted":"2026-04-18T15:48:07.140Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Bellevue, WA / Sunnyvale, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Slurm, SUNK, Go, Cloud-native systems development, GPU-heavy platforms for AI training, inference, or HPC workloads, Kueue, Kubeflow, Argo Workflows, Ray, Istio, Knative","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":206000,"maxValue":303000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a14533c3-732"},"title":"Senior Engineer, Cilium CNI & Cloud Networking","description":"<p>Network Services Team</p>\n<p>The Network Services team builds and operates the foundational networking that powers CoreWeave&#39;s Kubernetes platforms at cloud scale. 
The team is responsible for container networking, connectivity, and network services that support large-scale, GPU-driven workloads across regions and environments. They focus on scalability, reliability, security, and performance while delivering intuitive platforms for internal teams and customers.</p>\n<p>About the Role</p>\n<p>As a Senior Engineer focused on our Cilium-based CNI, you will design, build, and operate the container networking layer that underpins CoreWeave&#39;s Kubernetes platforms. Day to day, you will work on evolving our CNI stack to support large, high-density GPU clusters with demanding throughput and latency requirements. You will partner closely with Kubernetes, Infrastructure, and Network Services engineers to ensure the platform is highly available, observable, and secure. This role spans architecture, implementation, and operations, with ownership from prototype through production. You will also help shape how our networking platform scales for future growth.</p>\n<p>Who You Are</p>\n<ul>\n<li>5+ years of experience as a Software Engineer or Systems Engineer working on cloud infrastructure or large-scale distributed systems.</li>\n<li>Hands-on production experience with Cilium CNI (or equivalent advanced CNIs), including cluster configuration and lifecycle management.</li>\n<li>Strong understanding of Cilium&#39;s eBPF datapath, policy model, and load-balancing mechanisms.</li>\n<li>Deep knowledge of cloud networking concepts, including VPCs, subnets, routing, security groups/ACLs, NAT, and ingress/egress architectures.</li>\n<li>Experience designing multi-tenant network architectures with strong isolation and security.</li>\n<li>Solid grounding in TCP/IP, dynamic routing (e.g., BGP), ECMP, MTU/fragmentation, and overlay/underlay networking (VXLAN, Geneve, encapsulation).</li>\n<li>Experience with network observability and troubleshooting across L3–L7.</li>\n<li>Proficiency in at least one systems language such as Golang or 
C/C++.</li>\n<li>Experience working in modern CI/CD environments.</li>\n<li>Experience operating Kubernetes at scale, including cluster lifecycle management and debugging networking issues across pods, nodes, and external services.</li>\n<li>Demonstrated ownership of complex systems end-to-end.</li>\n</ul>\n<p>Preferred</p>\n<ul>\n<li>Experience operating cloud-scale network services across tens of thousands of nodes and multiple regions.</li>\n<li>Contributions to Cilium, Kubernetes, or related open-source networking projects.</li>\n<li>Experience with eBPF development and performance tuning.</li>\n<li>Experience building Kubernetes operators or controllers.</li>\n<li>Familiarity with service meshes, multi-cluster networking, or cluster mesh solutions.</li>\n<li>Experience in GPU-heavy, HPC, or other performance-sensitive environments.</li>\n</ul>\n<p>Wondering if you’re a good fit?</p>\n<p>We believe in investing in our people and value candidates who bring diverse experiences, even if you’re not a 100% match on paper. If some of this sounds like you, we’d love to talk.</p>\n<ul>\n<li>You love solving complex distributed systems and networking challenges at scale.</li>\n<li>You’re curious about cloud-native networking, eBPF, and Kubernetes internals.</li>\n<li>You’re an expert in building reliable, scalable infrastructure that runs in production.</li>\n</ul>\n<p>Why CoreWeave?</p>\n<p>At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:</p>\n<ul>\n<li>Be Curious at Your Core</li>\n<li>Act Like an Owner</li>\n<li>Empower Employees</li>\n<li>Deliver Best-in-Class Client Experiences</li>\n<li>Achieve More Together</li>\n</ul>\n<p>The base salary range for this role is $165,000 to $242,000. 
The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>\n<p>What We Offer</p>\n<p>The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location. In addition to a competitive salary, we offer a variety of benefits to support your needs, including:</p>\n<ul>\n<li>Medical, dental, and vision insurance</li>\n<li>100% paid for by CoreWeave</li>\n<li>Company-paid Life Insurance</li>\n<li>Voluntary supplemental life insurance</li>\n<li>Short and long-term disability insurance</li>\n<li>Flexible Spending Account</li>\n<li>Health Savings Account</li>\n<li>Tuition Reimbursement</li>\n<li>Ability to Participate in Employee Stock Purchase Program (ESPP)</li>\n<li>Mental Wellness Benefits through Spring Health</li>\n<li>Family-Forming support provided by Carrot</li>\n<li>Paid Parental Leave</li>\n<li>Flexible, full-service childcare support with Kinside</li>\n<li>401(k) with a generous employer match</li>\n<li>Flexible PTO</li>\n<li>Catered lunch each day in our office and data center locations</li>\n<li>A casual work environment</li>\n<li>A work culture focused on innovative disruption</li>\n</ul>\n<p>Our Workplace</p>\n<p>While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. 
Teams also gather quarterly to support collaboration.</p>\n<p>California Consumer Privacy Act - California applicants only</p>\n<p>CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information. As part of this commitment and consistent with the Americans with Disabilities Act (ADA), CoreWeave will ensure that qualified applicants and candidates with disabilities are provided reasonable accommodations for the hiring process, unless such accommodation would cause an undue hardship. If reasonable accommodation is needed, please contact: careers@coreweave.com.</p>\n<p>Export Control Compliance</p>\n<p>This position requires access to export controlled information. To conform to U.S. Government export regulations applicable to that information, applicant must either be (A) a U.S. person, defined as a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, (B) eligible to access the export controlled information without a required export authorization, or (C) eligible and reasonably likely to obtain the required export authorization from the applicable U.S. government agency. 
CoreWeave may, for legitimate business reasons, decline to pursue any export licensing process.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a14533c3-732","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4653971006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $242,000","x-skills-required":["Cilium CNI","cloud infrastructure","large-scale distributed systems","container networking","connectivity","network services","Kubernetes","eBPF datapath","policy model","load-balancing mechanisms","cloud networking concepts","VPCs","subnets","routing","security groups/ACLs","NAT","ingress/egress architectures","TCP/IP","dynamic routing","ECMP","MTU/fragmentation","overlay/underlay networking","Golang","C/C++","CI/CD environments","Kubernetes at scale","cluster lifecycle management","debugging networking issues"],"x-skills-preferred":["cloud-scale network services","Cilium","eBPF development","performance tuning","Kubernetes operators","controllers","service meshes","multi-cluster networking","cluster mesh solutions","GPU-heavy","HPC","performance-sensitive environments"],"datePosted":"2026-04-18T15:47:58.336Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cilium CNI, cloud infrastructure, large-scale distributed systems, container networking, connectivity, network services, Kubernetes, eBPF datapath, policy model, load-balancing mechanisms, cloud networking concepts, VPCs, subnets, routing, security 
groups/ACLs, NAT, ingress/egress architectures, TCP/IP, dynamic routing, ECMP, MTU/fragmentation, overlay/underlay networking, Golang, C/C++, CI/CD environments, Kubernetes at scale, cluster lifecycle management, debugging networking issues, cloud-scale network services, Cilium, eBPF development, performance tuning, Kubernetes operators, controllers, service meshes, multi-cluster networking, cluster mesh solutions, GPU-heavy, HPC, performance-sensitive environments","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":242000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3d3e5c3d-569"},"title":"Senior Engineer, Datacenter Server Lifecycle","description":"<p>As a Senior Engineer on the Datacenter Machine Lifecycle team, you will own the end-to-end operational journey of every machine in our facility, from initial provisioning and deployment, across its working life, through maintenance and refresh, and all the way to decommissioning.</p>\n<p>This is greenfield work: you will help define the processes, tooling, and operational standards that govern how we run and retire hardware at scale.</p>\n<p>A distinguishing aspect of this role is its deep intersection with security. 
The machines in our datacenter handle some of the most sensitive workloads in AI: training frontier models and serving millions of users interacting with Claude.</p>\n<p>Ensuring that every machine in the fleet is trusted, attested, and operating with a verified chain of integrity from the hardware up is a core part of the job, not an afterthought.</p>\n<p>You will partner closely with our Infrastructure Security team to define and enforce trusted compute standards across the lifecycle, from secure provisioning through end-of-life handling.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Lead the build-out of automation to support datacenters containing tens of thousands of servers.</li>\n</ul>\n<ul>\n<li>Own and define the end-to-end machine lifecycle strategy, from provisioning and deployment through operation, maintenance, refresh, and decommissioning, and maintain automation and operational procedures for common lifecycle events (e.g. hardware failures, firmware upgrades, fleet rotations).</li>\n</ul>\n<ul>\n<li>Partner closely with Infrastructure Security to design and enforce trusted compute standards across the machine lifecycle.</li>\n</ul>\n<ul>\n<li>Work closely with our Networking team to ensure end-to-end connectivity across all sites.</li>\n</ul>\n<ul>\n<li>Build and maintain tooling to track machine health, configuration, and operational status across the full datacenter fleet.</li>\n</ul>\n<p>You May Be a Good Fit If You:</p>\n<ul>\n<li>Have 5+ years of experience in datacenter operations, hardware infrastructure management, or a closely related discipline.</li>\n</ul>\n<ul>\n<li>Have deep, hands-on experience with server hardware, including rack deployment, cabling, troubleshooting, and understanding failure modes at scale.</li>\n</ul>\n<ul>\n<li>Understand hardware lifecycle management end-to-end: asset tracking, provisioning workflows, maintenance scheduling, and decommissioning practices.</li>\n</ul>\n<ul>\n<li>Have strong proficiency in at least 
one programming language (e.g., Python, Rust, Go, or Java).</li>\n</ul>\n<ul>\n<li>Are comfortable navigating ambiguity and working independently to drive progress on complex, cross-functional problems.</li>\n</ul>\n<ul>\n<li>Communicate clearly and can build consensus with a wide range of stakeholders.</li>\n</ul>\n<ul>\n<li>Have working knowledge of modern cloud infrastructure, including Kubernetes, Infrastructure as Code, AWS, and GCP.</li>\n</ul>\n<ul>\n<li>Are comfortable with occasional travel to datacenter sites across North America.</li>\n</ul>\n<p>Strong Candidates May Also Have:</p>\n<ul>\n<li>Hands-on experience with GPU or AI accelerator hardware (e.g. NVIDIA A100/H100, AMD MI300, Google TPUs, or AWS Trainium) and an understanding of their operational demands.</li>\n</ul>\n<ul>\n<li>Familiarity with modern provisioning tooling such as coreboot, LinuxBoot, or u-root.</li>\n</ul>\n<ul>\n<li>Experience building or contributing to datacenter automation or fleet management platforms.</li>\n</ul>\n<ul>\n<li>Experience building and deploying server operating system distributions across thousands of hosts.</li>\n</ul>\n<ul>\n<li>A background in large-scale capacity planning and hardware refresh strategy, ideally at a hyperscaler or large cloud provider.</li>\n</ul>\n<ul>\n<li>Experience with trusted compute and hardware security concepts such as secure boot, TPM, hardware attestation, and firmware verification, or a strong desire to develop deep expertise in this area.</li>\n</ul>\n<p>The annual compensation range for this role is £255,000-£325,000 GBP.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3d3e5c3d-569","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5131038008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£255,000-£325,000 GBP","x-skills-required":["datacenter operations","hardware infrastructure management","server hardware","programming language","cloud infrastructure","Kubernetes","Infrastructure as Code","AWS","GCP"],"x-skills-preferred":["GPU or AI accelerator hardware","modern provisioning tooling","datacenter automation","fleet management platforms","trusted compute and hardware security concepts"],"datePosted":"2026-04-18T15:47:48.808Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"datacenter operations, hardware infrastructure management, server hardware, programming language, cloud infrastructure, Kubernetes, Infrastructure as Code, AWS, GCP, GPU or AI accelerator hardware, modern provisioning tooling, datacenter automation, fleet management platforms, trusted compute and hardware security concepts","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":255000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_40d32156-365"},"title":"Reliability Lead, Common Services","description":"<p>As Reliability Lead, Common Services, you will establish and lead the Reliability Engineering and production operations practice for the Common Services organization. 
You&#39;ll partner closely with engineering leaders and teams across Common Services to define how we build, release, monitor, and operate critical services, raising the bar on reliability, availability, and operational excellence across the board.</p>\n<p>In this role, you will:</p>\n<ul>\n<li>Establish and lead the SRE / production engineering practice for the Common Services organization, including standards for reliability, incident management, and on-call, in partnership with the central Product Engineering organization.</li>\n<li>Develop an Operational Excellence strategy that focuses not only on improving system performance but also on monitoring and reducing operational toil.</li>\n<li>Partner with engineering and product teams to define SLOs, SLIs, and error budgets for critical Common Services, and ensure these become part of how teams plan and make tradeoffs.</li>\n<li>Own and improve the incident management lifecycle for Common Services, including on-call rotations, escalation paths, incident tooling, post-incident reviews, and follow-through on corrective actions.</li>\n<li>Drive the observability strategy (metrics, logs, traces, dashboards, alerts) for Common Services, ensuring we have actionable visibility into the health, performance, and capacity of key systems.</li>\n<li>Collaborate with engineering leads to design and review architectures for reliability, scalability, resilience, and operability, including failure modes, redundancy, and graceful degradation.</li>\n<li>Lead efforts to automate and harden operational workflows, including deployments, rollbacks, configuration management, change management, and routine maintenance tasks.</li>\n<li>Build strong, trust-based relationships with partner teams and stakeholders, becoming a go-to leader for production readiness and operational risk within Common Services.</li>\n<li>Hire, mentor, and develop SRE and production engineering talent, fostering a culture of continuous improvement, learning from 
incidents, and humane on-call.</li>\n<li>Partner with other SRE and production engineering leaders across CoreWeave to align on global practices, tools, and reliability goals, representing the needs and constraints of Common Services.</li>\n</ul>\n<p>You will be responsible for defining the reliability strategy, processes, and standards for the Common Services portfolio and driving consistent, high-quality operational practices across multiple teams.</p>\n<p>The base salary range for this role is $206,000 to $303,000.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_40d32156-365","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4650165006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$206,000 to $303,000","x-skills-required":["Site Reliability Engineering","Production Engineering","Linux-based production environments","Containers","Orchestration technologies","Observability stacks","Alerting systems","SLIs/SLOs","Error budgets","Incident management","On-call rotations","Escalation paths","Post-incident reviews","Corrective actions","Automation tooling","Infrastructure-as-code","CI/CD pipelines"],"x-skills-preferred":["GPU workloads","High-performance computing","Latency/throughput-sensitive systems","Multi-tenant environments","Multi-region environments","Regulated environments","Service ownership models","Mentoring","Managing senior engineers"],"datePosted":"2026-04-18T15:47:45.370Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York, NY / Sunnyvale, CA / Bellevue, 
WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Site Reliability Engineering, Production Engineering, Linux-based production environments, Containers, Orchestration technologies, Observability stacks, Alerting systems, SLIs/SLOs, Error budgets, Incident management, On-call rotations, Escalation paths, Post-incident reviews, Corrective actions, Automation tooling, Infrastructure-as-code, CI/CD pipelines, GPU workloads, High-performance computing, Latency/throughput-sensitive systems, Multi-tenant environments, Multi-region environments, Regulated environments, Service ownership models, Mentoring, Managing senior engineers","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":206000,"maxValue":303000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_abff148c-cfd"},"title":"Staff Machine Learning Engineer, GenAI Platform","description":"<p>As a Staff Machine Learning Engineer on the Machine Learning Platform team, you will be a key technical leader architecting and scaling our Generative AI and LLM platform capabilities.</p>\n<p>Training and deploying foundation models places unprecedented demands on our systems. 
You will define the technical strategy and build the core infrastructure that enables machine learning engineers and researchers to seamlessly train, evaluate, and iterate on large language models at Reddit scale.</p>\n<ul>\n<li>Drive GenAI Infrastructure Strategy: Propose, design, and lead the architecture of our next-generation LLM platform, significantly advancing our capabilities to support large-scale foundation models that serve millions of redditors.</li>\n<li>Design Resilient, Large-Scale Distributed Systems: Architect highly fault-tolerant training infrastructure capable of supporting multi-week, distributed workloads across massive GPU clusters.</li>\n<li>Build Self-Serve LLM Workflows: Design and implement robust, production-grade pipelines for LLM fine-tuning (e.g., SFT, RLHF/DPO).</li>\n<li>Develop Comprehensive Evaluation &amp; Benchmarking Infrastructure: Treat model evaluation as a first-class platform capability.</li>\n<li>Architect Advanced Data Ingestion Pipelines: Extend our distributed data platforms to natively and efficiently handle the massive, multimodal datasets (text, image, video) required for modern GenAI workloads.</li>\n</ul>\n<p>You will have 10+ years of work experience in a production software development environment or building complex distributed data systems, plus a degree in ML, Engineering, Computer Science, or a related discipline.</p>\n<p>GenAI/LLM Infrastructure Expertise: Proven track record of designing and operating large-scale ML systems, specifically working with distributed training frameworks (e.g., FSDP, DeepSpeed, Megatron-LM) and LLM serving/inference optimization (e.g., vLLM, TensorRT-LLM).</p>\n<p>Distributed Systems Mastery: Hands-on experience managing fault-tolerant, petabyte-scale distributed systems and multi-node/multi-GPU training clusters.</p>\n<p>Advanced MLOps Knowledge: Deep understanding of modern ML orchestration, fine-tuning pipelines, and model evaluation methodologies.</p>\n<p>GPU Experience: 
Hands-on practice with CUDA environments, GPU virtualization/containerization, and doing it all within Kubernetes.</p>\n<p>Production Engineering Fundamentals: Hands-on experience with Kubernetes, Docker, and building production-quality, object-oriented code in Python and/or Go.</p>\n<p>Strong focus on scalability, reliability, performance, and ease of use.</p>\n<p>You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle.</p>\n<p>Strong organizational &amp; communication skills.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_abff148c-cfd","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Reddit","sameAs":"https://www.redditinc.com","logo":"https://logos.yubhub.co/redditinc.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/reddit/jobs/7772523","x-work-arrangement":"remote","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$253,300-$354,600 USD","x-skills-required":["GenAI/LLM Infrastructure Expertise","Distributed Systems Mastery","Advanced MLOps Knowledge","GPU Experience","Production Engineering Fundamentals"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:47:35.489Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote - United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"GenAI/LLM Infrastructure Expertise, Distributed Systems Mastery, Advanced MLOps Knowledge, GPU Experience, Production Engineering 
Fundamentals","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":253300,"maxValue":354600,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_67759024-e54"},"title":"Technical Solutions Manager","description":"<p>The Customer Experience (CX) Organisation at CoreWeave is dedicated to ensuring every client running AI workloads at scale has a seamless, reliable, and high-performance experience.</p>\n<p>This team supports the infrastructure that powers the AI revolution, working across data centres, hardware systems, and customer workloads to maintain the integrity of our cloud platform. The CX organisation aligns closely with the internal and customer engineering teams, offering valuable insights from the field and having the chance to contribute to the CoreWeave product roadmap and development.</p>\n<p>We are seeking a remarkable Technical Solutions Manager who shares our passion and has a deep understanding of GPU infrastructure &amp; AI applications to join our CX Organisation. 
The team is responsible for educating prospective customers on the technical value of CoreWeave, designing and defining customer deliverables and integration points, onboarding and enabling customers, and ensuring the successful ongoing operations of CoreWeave within customer environments.</p>\n<p>In this role, you will:</p>\n<ul>\n<li>Lead Strategic Customer Relationships: Ownership of technical customer relationships to ensure successful adoption and customer satisfaction.</li>\n<li>Define Customer Requirements: Collaborate with customers and partners to define technical requirements that meet the customer&#39;s needs for AI/ML.</li>\n<li>Drive End-to-End Program Execution: Oversee the execution of complex programs, including planning, resource management, risk assessment, and internal/external stakeholder engagement to ensure successful outcomes.</li>\n<li>Engage with Stakeholders/Influence product strategy: Gather, document, and communicate program requirements to ensure clarity, feasibility, and alignment with critical objectives. Share customer feedback with Product Management and Engineering, influencing product direction.</li>\n<li>Foster Collaboration: Facilitate effective communication among various teams, including engineering, product management, operations, support, and sales.</li>\n<li>Build Strong Relationships: Establish and maintain strong relationships with stakeholders to align program objectives and secure necessary resources and support.</li>\n<li>Proactively Manage Risks: Identify potential risks and issues throughout the program and proactively communicate to relevant stakeholders to drive resolutions and minimise impact.</li>\n<li>Measure Success: Define and track key performance indicators (KPIs) and metrics to measure program success and effectiveness.</li>\n<li>Drive Improvements: Identify and address inefficiencies to enhance operational speed and quality outcomes.</li>\n</ul>\n<p>Who You Are:</p>\n<ul>\n<li>B.S. 
in Computer Science or a related technical discipline, or equivalent experience</li>\n<li>5+ years of experience in technical program management, customer success management, or professional services delivery management, with a focus on cloud infrastructure and AI/ML applications</li>\n<li>Strong communication skills through both long-form documents and short-form/asynchronous communications with internal and external stakeholders</li>\n<li>Proven track record of successfully organising and coordinating the efforts of multiple teams to deliver long-running, complex projects with visibility to senior stakeholders.</li>\n<li>Experience with multiple staples of leadership, with the ability to work in a bottom-up leadership-style organisation that focuses on enablement, communication, organisation, and inspiration over task management.</li>\n<li>Demonstrated experience with proactive self-management, with examples of recognising when to seek help and being willing to ask for it in a timely manner within a safe environment.</li>\n<li>Experience with client management within a cloud infrastructure landscape, ideally with an understanding of the fundamentals of Kubernetes, GPU compute, AI/ML, and high-performance computing</li>\n</ul>\n<p>Why CoreWeave? At CoreWeave, we work hard, have fun, and move fast! We&#39;re in an exciting stage of hyper-growth that you will not want to miss out on. We&#39;re not afraid of a little chaos, and we&#39;re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:</p>\n<ul>\n<li>Be Curious at Your Core</li>\n<li>Act Like an Owner</li>\n<li>Empower Employees</li>\n<li>Deliver Best-in-Class Client Experiences</li>\n<li>Achieve More Together</li>\n</ul>\n<p>We support and encourage an entrepreneurial outlook and independent thinking. 
We foster an environment that encourages collaboration and provides the opportunity to develop innovative solutions to complex problems. As we get set for take off, the growth opportunities within the organisation are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!</p>\n<p>The base salary range for this role is $185,000 to $215,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits programme (all based on eligibility).</p>\n<p>What We Offer The range we&#39;ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location. 
In addition to a competitive salary, we offer a variety of benefits to support your needs, including:</p>\n<ul>\n<li>Medical, dental, and vision insurance</li>\n<li>100% paid for by CoreWeave</li>\n<li>Company-paid Life Insurance</li>\n<li>Voluntary supplemental life insurance</li>\n<li>Short and long-term disability insurance</li>\n<li>Flexible Spending Account</li>\n<li>Health Savings Account</li>\n<li>Tuition Reimbursement</li>\n<li>Ability to Participate in Employee Stock Purchase Programme (ESPP)</li>\n<li>Mental Wellness Benefits through Spring Health</li>\n<li>Family-Forming support provided by Carrot</li>\n<li>Paid Parental Leave</li>\n<li>Flexible, full-service childcare support with Kinside</li>\n<li>401(k) with a generous employer match</li>\n<li>Flexible PTO</li>\n<li>Catered lunch each day in our office and data centre locations</li>\n<li>A casual work environment</li>\n<li>A work culture focused on innovative disruption</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_67759024-e54","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4380852006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$185,000 to $215,000","x-skills-required":["Cloud infrastructure","AI and machine learning","GPU infrastructure","Kubernetes","GPU compute","High-performance computing"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:47:13.234Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cloud infrastructure, AI and machine learning, GPU infrastructure, 
Kubernetes, GPU compute, High-performance computing","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":185000,"maxValue":215000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_040a59f5-1d2"},"title":"Research Engineer, Pretraining","description":"<p>We are seeking a Research Engineer to join our Pretraining team. In this role, you will conduct research and implement solutions in areas such as model architecture, algorithms, data processing, and optimizer development. You will also independently lead small research projects while collaborating with team members on larger initiatives.</p>\n<p>Key responsibilities include designing, running, and analyzing scientific experiments to advance our understanding of large language models. Additionally, you will optimize and scale our training infrastructure to improve efficiency and reliability, and develop and improve dev tooling to enhance team productivity.</p>\n<p>As a Research Engineer, you will contribute to the entire stack, from low-level optimizations to high-level model design. You will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems.</p>\n<p>The ideal candidate will have an advanced degree in Computer Science, Machine Learning, or a related field, and strong software engineering skills with a proven track record of building complex systems. You should be familiar with Python and have experience with deep learning frameworks, particularly PyTorch. 
Additionally, you should have expertise in large-scale machine learning, particularly in the context of language models.</p>\n<p>You will thrive in this role if you have significant software engineering experience, are results-oriented with a bias towards flexibility and impact, willing to take on tasks outside your job description to support the team, enjoy pair programming and collaborative work, and are eager to learn more about machine learning research.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_040a59f5-1d2","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5119713008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"£260,000-£630,000 GBP","x-skills-required":["Python","PyTorch","Machine Learning","Deep Learning","Software Engineering","Computer Science"],"x-skills-preferred":["GPU","Kubernetes","OS Internals","Reinforcement Learning","Language Modeling","Transformer Architectures"],"datePosted":"2026-04-18T15:46:36.925Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, PyTorch, Machine Learning, Deep Learning, Software Engineering, Computer Science, GPU, Kubernetes, OS Internals, Reinforcement Learning, Language Modeling, Transformer 
Architectures","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":260000,"maxValue":630000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_7d1e1517-7a3"},"title":"Senior Supply Materials Manager","description":"<p>You will join the Global Supplier Management and Technical Sourcing organization, a team responsible for ensuring CoreWeave&#39;s rapidly growing hardware demand converts into clear-to-build, on-time shipments. The team works cross-functionally with Strategic Sourcing, Engineering, Program Management, Operations, and a global supplier base to support aggressive AI infrastructure deployment schedules in a highly supply-constrained environment.</p>\n<p>As a Senior Supply Materials Manager, you will own end-to-end supply execution for one or more strategic OEM and ODM partners across multiple concurrent hardware programs. You will ensure allocation commitments, material readiness, and manufacturing capacity align with CoreWeave&#39;s deployment plans. This role operates at the intersection of operations, sourcing, engineering, and suppliers, requiring strong execution discipline and comfort navigating ambiguity.</p>\n<p>You will act as the single-threaded owner for supply execution, proactively identifying risks, driving recovery actions, and maintaining operational stability as new NVIDIA-based platforms ramp.</p>\n<p>We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren&#39;t a 100% skill or experience match. Here are a few qualities we&#39;ve found compatible with our team. 
If some of this describes you, we&#39;d love to talk.</p>\n<ul>\n<li>You love driving execution in complex, supply-constrained environments</li>\n<li>You&#39;re curious about how hardware, manufacturing, and supply chains come together at scale</li>\n<li>You&#39;re an expert at turning demand signals into executable supply plans</li>\n</ul>\n<p>Why CoreWeave? At CoreWeave, we work hard, have fun, and move fast! We&#39;re in an exciting stage of hyper-growth that you will not want to miss out on. We&#39;re not afraid of a little chaos, and we&#39;re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:</p>\n<ul>\n<li>Be Curious at Your Core</li>\n<li>Act Like an Owner</li>\n<li>Empower Employees</li>\n<li>Deliver Best-in-Class Client Experiences</li>\n<li>Achieve More Together</li>\n</ul>\n<p>We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and provides the opportunity to develop innovative solutions to complex problems.</p>\n<p>As we get set for take off, the growth opportunities within the organization are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. 
Come join us!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_7d1e1517-7a3","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4655160006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["supply chain management","materials management","manufacturing operations","data center soit","hyperscaler","cloud service provider","AI hardware","OEM and ODM supply execution","allocation-constrained environments","end-to-end supply execution","single-threaded owner","supply execution","risk identification","recovery actions","operational stability"],"x-skills-preferred":["experience working directly with Taiwan-based ODMs or global OEM manufacturing partners","familiarity with server, rack-level, or AI system builds including GPU, memory, power, and thermal components","experience supporting multiple overlapping NPI and production ramps"],"datePosted":"2026-04-18T15:46:22.456Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Taiwan"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"supply chain management, materials management, manufacturing operations, data center soit, hyperscaler, cloud service provider, AI hardware, OEM and ODM supply execution, allocation-constrained environments, end-to-end supply execution, single-threaded owner, supply execution, risk identification, recovery actions, operational stability, experience working directly with Taiwan-based ODMs or global OEM manufacturing partners, familiarity with server, rack-level, or AI system builds including GPU, memory, power, and thermal 
components, experience supporting multiple overlapping NPI and production ramps"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_faffae87-882"},"title":"Staff Software Engineer - GenAI Performance and Kernel","description":"<p>As a staff software engineer for GenAI Performance and Kernel, you will own the design, implementation, optimization, and correctness of the high-performance GPU kernels powering our GenAI inference stack. You will lead development of highly-tuned, low-level compute paths, manage trade-offs between hardware efficiency and generality, and mentor others in kernel-level performance engineering.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Leading the design, implementation, benchmarking, and maintenance of core compute kernels optimized for various hardware backends (GPU, accelerators)</li>\n<li>Driving the performance roadmap for kernel-level improvements: vectorization, tensorization, tiling, fusion, mixed precision, sparsity, quantization, memory reuse, scheduling, auto-tuning, etc.</li>\n<li>Integrating kernel optimizations with higher-level ML systems</li>\n<li>Building and maintaining profiling, instrumentation, and verification tooling to detect correctness, performance regressions, numerical issues, and hardware utilization gaps</li>\n<li>Leading performance investigations and root-cause analysis on inference bottlenecks, e.g. memory bandwidth, cache contention, kernel launch overhead, tensor fragmentation</li>\n<li>Establishing coding patterns, abstractions, and frameworks to modularize kernels for reuse, cross-backend portability, and maintainability</li>\n<li>Influencing system architecture decisions to make kernel improvements more effective (e.g. 
memory layout, dataflow scheduling, kernel fusion boundaries)</li>\n<li>Mentoring and guiding other engineers working on lower-level performance, providing code reviews, and helping set best practices</li>\n<li>Collaborating with infrastructure, tooling, and ML teams to roll out kernel-level optimizations into production, and monitoring their impact</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>BS/MS/PhD in Computer Science, or a related field</li>\n<li>Deep hands-on experience writing and tuning compute kernels (CUDA, Triton, OpenCL, LLVM IR, assembly, or similar) for ML workloads</li>\n<li>Strong knowledge of GPU/accelerator architecture: warp structure, memory hierarchy (global, shared, register, L1/L2 caches), tensor cores, scheduling, SM occupancy, etc.</li>\n<li>Experience with advanced optimization techniques: tiling, blocking, software pipelining, vectorization, fusion, loop transformations, auto-tuning</li>\n<li>Familiarity with ML-specific kernel libraries (cuBLAS, cuDNN, CUTLASS, oneDNN, etc.) or open kernels</li>\n<li>Strong debugging and profiling skills (Nsight, NVProf, perf, VTune, custom instrumentation)</li>\n<li>Experience reasoning about numerical stability, mixed precision, quantization, and error propagation</li>\n<li>Experience in integrating optimized kernels into real-world ML inference systems; exposure to distributed inference pipelines, memory management, and runtime systems</li>\n<li>Experience building high-performance products leveraging GPU acceleration</li>\n<li>Excellent communication and leadership skills, able to drive design discussions, mentor colleagues, and make trade-offs visible</li>\n<li>A track record of shipping performance-critical, high-quality production software</li>\n<li>Bonus: published in systems/ML performance venues (e.g. 
MLSys, ASPLOS, ISCA, PPoPP), experience with custom accelerators or FPGA, experience with sparsity or model compression techniques</li>\n</ul>\n<p>The pay range for this role is $190,900-$232,800 USD per year, depending on location and experience.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_faffae87-882","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Databricks","sameAs":"https://databricks.com","logo":"https://logos.yubhub.co/databricks.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/databricks/jobs/8202700002","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$190,900-$232,800 USD per year","x-skills-required":["Compute kernels","GPU/accelerator architecture","Advanced optimization techniques","ML-specific kernel libraries","Debugging and profiling skills","Numerical stability","Mixed precision","Quantization","Error propagation","Distributed inference pipelines","Memory management","Runtime systems","High-performance products","GPU acceleration"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:46:07.442Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Compute kernels, GPU/accelerator architecture, Advanced optimization techniques, ML-specific kernel libraries, Debugging and profiling skills, Numerical stability, Mixed precision, Quantization, Error propagation, Distributed inference pipelines, Memory management, Runtime systems, High-performance products, GPU 
acceleration","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":190900,"maxValue":232800,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a561c761-1f3"},"title":"Manager, Bare Metal Support Engineering","description":"<p>The Customer Experience (CX) Organisation at CoreWeave is dedicated to ensuring every client running AI workloads at scale has a seamless, reliable, and high-performance experience.</p>\n<p>As a Manager of Bare Metal Support Engineering, you&#39;ll be at the centre of ensuring our dedicated infrastructure remains stable, reliable, and performant. You&#39;ll lead daily support operations, triage incidents, drive escalations, and ensure that hardware is monitored, maintained, and delivered effectively for our clients.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Leading a skilled team responsible for maintaining and optimising physical infrastructure across multiple client environments.</li>\n<li>Building, developing, and leading a dedicated Infrastructure Support team focused on supporting key infrastructure, handling escalations, and ensuring smooth hardware operations.</li>\n<li>Overseeing the resolution of infrastructure-related incidents, escalation management, and collaborating with internal teams to deliver effective solutions.</li>\n<li>Improving support processes to enhance efficiency and reduce downtime, ensuring the infrastructure meets client expectations.</li>\n</ul>\n<p>The ideal candidate will have 5+ years of experience leading teams responsible for infrastructure support, data centre operations, or physical compute environments. 
They should be hands-on with Linux system administration and command-line tools, familiar with hardware-level diagnostics, troubleshooting, and replacement, and have experience working with high-performance rack-scale hardware.</p>\n<p>In addition to the required skills, preferred skills include experience managing infrastructure support teams in high-growth or rapidly evolving environments, proven ability to develop and implement operational processes that scale with business needs, and strong familiarity with server and GPU hardware lifecycle management.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a561c761-1f3","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4649055006","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$170,000 to $240,000 SGD","x-skills-required":["Linux system administration","Command-line tools","Hardware-level diagnostics","Troubleshooting and replacement","High-performance rack-scale hardware"],"x-skills-preferred":["Managing infrastructure support teams","Developing and implementing operational processes","Server and GPU hardware lifecycle management"],"datePosted":"2026-04-18T15:45:59.370Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Singapore"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux system administration, Command-line tools, Hardware-level diagnostics, Troubleshooting and replacement, High-performance rack-scale hardware, Managing infrastructure support teams, Developing and implementing operational processes, Server and GPU hardware lifecycle 
management","baseSalary":{"@type":"MonetaryAmount","currency":"SGD","value":{"@type":"QuantitativeValue","minValue":170000,"maxValue":240000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_71554e46-b64"},"title":"Senior Engineering Manager, AI Runtime","description":"<p>At Databricks, we are committed to enabling data teams to solve the world&#39;s toughest problems. As a Senior Engineering Manager, you will lead the team owning both the product experience and the foundational infrastructure of our AI Runtime (AIR) product.</p>\n<p>You will be responsible for shaping customer-facing capabilities while designing for scalability, extensibility, and performance of GPU training and adjacent areas. This will involve collaborating closely across the platform, product, infrastructure, and research organisations.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Leading, mentoring, and growing a high-performing engineering team responsible for the Custom Training product and its foundational infrastructure</li>\n<li>Defining and owning the product and technical roadmap for AIR, balancing customer experience, functionality, and foundational investments</li>\n<li>Collaborating closely with product, research, platform, infrastructure teams, and customers to drive end-to-end delivery</li>\n<li>Driving architectural decisions and product design for managed GPU training at scale</li>\n<li>Advocating for customer needs through direct engagement, ensuring engineering decisions translate to clear product impact</li>\n</ul>\n<p>We are looking for someone with 8+ years of software engineering experience, with 3+ years in engineering management. 
You should have a track record of building and operating managed GPU training infrastructure at scale, as well as deep familiarity with distributed training frameworks and parallelism strategies.</p>\n<p>In addition, you should have experience with training resilience patterns, such as checkpointing, elastic training, and automated failure recovery for long-running jobs. You should also have a strong understanding of GPU performance fundamentals, including NCCL, interconnect topologies, and memory optimisation.</p>\n<p>Experience building platform products with clear SLAs is also essential, as is strong cross-functional leadership across platform, product, and research teams. Excellent collaboration and communication skills are also required.</p>\n<p>The pay range for this role is $228,600-$314,250 USD per year, depending on location. The total compensation package may also include eligibility for annual performance bonus, equity, and benefits.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_71554e46-b64","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Databricks","sameAs":"https://databricks.com","logo":"https://logos.yubhub.co/databricks.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/databricks/jobs/8490282002","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$228,600-$314,250 USD per year","x-skills-required":["software engineering","engineering management","distributed training frameworks","parallelism strategies","GPU training infrastructure","checkpointing","elastic training","automated failure recovery","GPU performance fundamentals","NCCL","interconnect topologies","memory optimisation"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:45:28.312Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View, 
California; San Francisco, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, engineering management, distributed training frameworks, parallelism strategies, GPU training infrastructure, checkpointing, elastic training, automated failure recovery, GPU performance fundamentals, NCCL, interconnect topologies, memory optimisation","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":228600,"maxValue":314250,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a45e2e8c-400"},"title":"Staff Software Engineer, Foundational Model Serving","description":"<p>At Databricks, we are enabling data teams to solve the world&#39;s toughest problems by building and running the world&#39;s best data and AI infrastructure platform. Foundation Model Serving is the API Product for hosting and serving frontier AI model inference for open source models like Llama, Qwen, and GPT OSS as well as proprietary models like Claude and OpenAI GPT.</p>\n<p>We&#39;re looking for engineers who have owned high-scale, operationally sensitive systems like customer-facing APIs, Edge Gateways, ML Inference, or similar services and have an interest in getting deep into building LLM APIs and runtimes at scale. 
As a Staff Engineer, you&#39;ll play a critical role in shaping both the product experience and core infrastructure.</p>\n<p>The impact you will have:</p>\n<ul>\n<li>Design and implement core systems and APIs that power Databricks Foundation Model Serving, ensuring scalability, reliability, and operational excellence.</li>\n<li>Partner with product and engineering leadership to define the technical roadmap and long-term architecture for serving workloads.</li>\n<li>Drive architectural decisions and trade-offs to optimize performance, throughput, autoscaling, and operational efficiency for GPU serving workloads.</li>\n<li>Contribute directly to key components across the serving infrastructure, from working in systems like vLLM and SGLang to creating token-based rate limiters and optimizers, ensuring smooth and efficient operations at scale.</li>\n<li>Collaborate cross-functionally with product, platform, and research teams to translate customer needs into reliable and performant systems.</li>\n<li>Establish best practices for code quality, testing, and operational readiness, and mentor other engineers through design reviews and technical guidance.</li>\n<li>Represent the team in cross-organizational technical discussions and influence Databricks&#39; broader AI platform strategy.</li>\n</ul>\n<p>What we look for:</p>\n<ul>\n<li>10+ years of experience building and operating large-scale distributed systems.</li>\n<li>Experience leading high-scale operationally sensitive backend systems.</li>\n<li>A track record of up-leveling teams&#39; engineering excellence.</li>\n<li>Strong foundation in algorithms, data structures, and system design as applied to large-scale, low-latency serving systems.</li>\n<li>Proven ability to deliver technically complex, high-impact initiatives that create measurable customer or business value.</li>\n<li>Strong communication skills and ability to collaborate across teams in fast-moving environments.</li>\n<li>Strategic and product-oriented mindset 
with the ability to align technical execution with long-term vision.</li>\n<li>Passion for mentoring, growing engineers, and fostering technical excellence.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a45e2e8c-400","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Databricks","sameAs":"https://databricks.com","logo":"https://logos.yubhub.co/databricks.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/databricks/jobs/8224683002","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$192,000-$260,000 USD","x-skills-required":["large-scale distributed systems","high-scale operationally sensitive backend systems","algorithms","data structures","system design","low-latency serving systems","GPU serving workloads","vLLM","SGLang","token based rate limiters","optimizers"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:44:55.798Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"large-scale distributed systems, high-scale operationally sensitive backend systems, algorithms, data structures, system design, low-latency serving systems, GPU serving workloads, vLLM, SGLang, token based rate limiters, optimizers","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":192000,"maxValue":260000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9ecceef8-349"},"title":"Research Engineer/Research Scientist, Audio","description":"<p>We are seeking a Research Engineer/Research Scientist to join our Audio team. 
As a member of this team, you will work across the full stack of audio ML, developing audio codecs and representations, sourcing and synthesizing high-quality audio data, training large-scale speech language models and large audio diffusion models, and developing novel architectures for incorporating continuous signals into LLMs.</p>\n<p>Our team focuses primarily but not exclusively on speech, building advanced steerable systems spanning end-to-end conversational systems, speech and audio understanding models, and speech synthesis capabilities. The team works closely with many collaborators across pretraining, finetuning, reinforcement learning, production inference, and product to get advanced audio technologies from early research to high-impact real-world deployments.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Develop and train audio models, including conversational speech-to-speech, speech translation, speech recognition, text-to-speech, diarization, codecs, and generative audio models</li>\n<li>Work across abstraction levels, from signal processing fundamentals to large-scale model training and inference optimization</li>\n<li>Collaborate with teams across the company to develop and deploy audio technologies</li>\n<li>Communicate clearly and effectively with colleagues and stakeholders</li>\n</ul>\n<p>Strong candidates may also have experience with:</p>\n<ul>\n<li>Large language model pretraining and finetuning</li>\n<li>Training diffusion models for image and audio generation</li>\n<li>Reinforcement learning for large language models and diffusion models</li>\n<li>End-to-end system optimization, from performance benchmarking to kernel optimization</li>\n<li>GPUs, Kubernetes, PyTorch, or distributed training infrastructure</li>\n</ul>\n<p>Representative projects:</p>\n<ul>\n<li>Training state-of-the-art neural audio codecs for 48 kHz stereo audio</li>\n<li>Developing novel algorithms for diffusion pretraining and reinforcement learning</li>\n<li>Scaling audio 
datasets to millions of hours of high-quality audio</li>\n<li>Creating robust evaluation methodologies for hard-to-measure qualities such as naturalness or expressiveness</li>\n<li>Studying training dynamics of mixed audio-text language models</li>\n<li>Optimizing latency and inference throughput for deployed streaming audio systems</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_9ecceef8-349","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5074815008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000-$500,000 USD","x-skills-required":["JAX","PyTorch","large-scale distributed training","signal processing fundamentals","speech language models","audio diffusion models","continuous signals","LLMs"],"x-skills-preferred":["large language model pretraining","diffusion models","reinforcement learning","end-to-end system optimization","GPUs","Kubernetes","distributed training infrastructure"],"datePosted":"2026-04-18T15:42:59.425Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"JAX, PyTorch, large-scale distributed training, signal processing fundamentals, speech language models, audio diffusion models, continuous signals, LLMs, large language model pretraining, diffusion models, reinforcement learning, end-to-end system optimization, GPUs, Kubernetes, distributed training 
infrastructure","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":500000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_be62286d-940"},"title":"Member of Technical Staff - Internal Tools","description":"<p><strong>About the Role</strong></p>\n<p>We are seeking a backend engineer to join our xAI Tooling team (Starfleet). In this role, you will design, develop, and maintain robust, scalable backend systems that empower our researchers, engineers, and product teams to build, experiment with, and deploy cutting-edge AI models and applications more effectively.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and build robust, scalable backend systems to power our AI-driven tools and platforms.</li>\n<li>Develop APIs and distributed systems that handle high-throughput data processing for model training and evaluation.</li>\n<li>Create secure, reliable pipelines to support innovative human data generation and agentic workflows.</li>\n<li>Optimise infrastructure to leverage cutting-edge GPU resources for maximum performance.</li>\n<li>Innovate new approaches that bring us closer to our goal: to develop AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.</li>\n</ul>\n<p><strong>Basic Qualifications</strong></p>\n<ul>\n<li>Experience building high-performance backend systems in dynamic, fast-paced environments - as a tech lead, former founder, etc.</li>\n<li>Expertise in designing scalable distributed systems and APIs with a compiled language like Rust or C++.</li>\n<li>Experience optimising data pipelines for machine learning workloads or real-time applications.</li>\n</ul>\n<p><strong>Preferred Skills and Experience</strong></p>\n<ul>\n<li>Familiarity with frontend technologies (e.g., TypeScript, React) to collaborate seamlessly with 
cross-functional teams.</li>\n</ul>\n<p><strong>Compensation and Benefits</strong></p>\n<p>$180,000 - $440,000 USD</p>\n<p>Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short &amp; long-term disability insurance, life insurance, and various other discounts and perks.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_be62286d-940","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/5052038007","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000 - $440,000 USD","x-skills-required":["Rust","C++","APIs","Distributed Systems","Machine Learning","GPU Resources"],"x-skills-preferred":["TypeScript","React"],"datePosted":"2026-04-18T15:42:56.459Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo Alto, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Rust, C++, APIs, Distributed Systems, Machine Learning, GPU Resources, TypeScript, React","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":440000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_022d9aef-8cd"},"title":"Member of Technical Staff - Infrastructure Reliability","description":"<p><strong>About the Role</strong></p>\n<p>We are training some of the largest models in the world on the latest hardware across multiple environments. 
To do this reliably at xAI&#39;s pace, we need engineers who have battle-tested experience keeping massive distributed infrastructure up and running 24/7, including on-prem and cloud-based infrastructure.</p>\n<p>You will own the availability, performance, and evolution of xAI&#39;s core compute, storage, and networking infrastructure. This is not an ops-only role; strong coding is a hard requirement. You will design, implement, and ship systems software, automation, and tooling in Python and/or Rust that directly impacts training throughput and cluster utilization.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Define and execute the technical strategy for infrastructure reliability and scalability</li>\n<li>Build and maintain the automation, observability, and control planes that keep multi-datacenter, hybrid cloud/on-prem environments healthy</li>\n<li>Lead incident response, deep-dive root cause analysis, and post-mortems that drive real fixes</li>\n<li>Identify, instrument, and eliminate systemic failure patterns (capacity, network, hardware, storage, software)</li>\n<li>Design and implement high-leverage systems software (daemons, controllers, schedulers, etc.) 
in Python and Rust.</li>\n</ul>\n<p><strong>Basic Qualifications</strong></p>\n<ul>\n<li>5+ years shipping production software and/or operating distributed infrastructure at scale</li>\n<li>Expert-level knowledge of Linux systems, TCP/IP networking, and systems programming</li>\n<li>Strong coding skills with proven production experience in Rust (strongly preferred) and at least one of Python, Go, or C++.</li>\n</ul>\n<p><strong>Preferred Skills and Experience</strong></p>\n<ul>\n<li>Significant contributions to large-scale GPU clusters or AI/ML infrastructure</li>\n<li>Experience in on-call rotations and incident response in high-stakes environments.</li>\n</ul>\n<p><strong>Compensation and Benefits</strong></p>\n<p>$180,000 - $400,000 USD</p>\n<p>Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short &amp; long-term disability insurance, life insurance, and various other discounts and perks.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_022d9aef-8cd","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/4801451007","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000 - $400,000 USD","x-skills-required":["Linux systems","TCP/IP networking","systems programming","Rust","Python","Go","C++","container orchestration","container runtimes","infrastructure-as-code"],"x-skills-preferred":["large-scale GPU clusters","AI/ML infrastructure","on-call rotations","incident response"],"datePosted":"2026-04-18T15:42:36.486Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo 
Alto, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux systems, TCP/IP networking, systems programming, Rust, Python, Go, C++, container orchestration, container runtimes, infrastructure-as-code, large-scale GPU clusters, AI/ML infrastructure, on-call rotations, incident response","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":400000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_279d67f2-5b5"},"title":"Research Engineer / Research Scientist, Tokens","description":"<p>We&#39;re looking for a Research Engineer / Research Scientist to join our team. As a Research Engineer, you&#39;ll touch all parts of our code and infrastructure, whether that&#39;s making the cluster more reliable for our big jobs, improving throughput and efficiency, running and designing scientific experiments, or improving our dev tooling.</p>\n<p>You&#39;ll be working on large-scale ML systems from the ground up, making safe, steerable, trustworthy systems. 
You&#39;ll be excited to write code when you understand the research context and more broadly why it&#39;s important.</p>\n<p>Strong candidates may also have experience with high performance, large-scale ML systems, GPUs, Kubernetes, Pytorch, or OS internals, language modeling with transformers, reinforcement learning, and large-scale ETL.</p>\n<p>Representative projects may include optimizing the throughput of a new attention mechanism, comparing the compute efficiency of two Transformer variants, making a Wikipedia dataset in a format models can easily consume, scaling a distributed training job to thousands of GPUs, writing a design doc for fault tolerance strategies, and creating an interactive visualization of attention between tokens in a language model.</p>\n<p>The annual compensation range for this role is $350,000-$500,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_279d67f2-5b5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4951814008","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$350,000-$500,000 USD","x-skills-required":["software engineering","machine learning","high performance computing","Kubernetes","Pytorch","OS internals","language modeling","reinforcement learning","large-scale ETL"],"x-skills-preferred":["GPU","transformers","distributed training"],"datePosted":"2026-04-18T15:42:28.391Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York City, NY; New York City, NY | Seattle, WA; San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, machine 
learning, high performance computing, Kubernetes, Pytorch, OS internals, language modeling, reinforcement learning, large-scale ETL, GPU, transformers, distributed training","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":500000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_6be97e03-e54"},"title":"ML HW-SW Co-Design Software Tech Lead Manager (TLM)","description":"<p>We are seeking a highly motivated ML Software Tech Lead Manager to join our HW-SW Co-design team. As a technical expert, you will lead a small, high-impact team to drive advances in machine learning acceleration. Your primary responsibilities will include direct technical contribution, technical team leadership, architectural alignment, HW-SW strategy, and execution management.</p>\n<p>You will spend a significant portion of your time on technical execution while managing a multi-disciplinary team to evolve our software stack. You will directly contribute to the codebase and technical strategy, focusing on acting as a Mountain View-based bridge between our co-design team and the Gemini core team.</p>\n<p>You will lead a small team of ML software engineers across numerics, performance optimization, novel training techniques, and novel model exploration. You will drive team cohesion by synthesizing fragmented technical opinions into a single, high-quality execution plan.</p>\n<p>You will partner closely with the hardware team to define requirements for next-generation ML accelerators. You will oversee technical execution across a virtual team including Google-internal and external partners.</p>\n<p>We value diversity of experience, knowledge, backgrounds, and perspectives and harness these qualities to create extraordinary impact. 
We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_6be97e03-e54","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Google DeepMind","sameAs":"https://deepmind.com/","logo":"https://logos.yubhub.co/deepmind.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/deepmind/jobs/7509867","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["high-performance software","AI/ML","technical leadership","architectural alignment","HW-SW strategy","execution management"],"x-skills-preferred":["Master's or Ph.D. in a related field","hands-on experience with high-performance compute IPs (GPUs, ML accelerators)","experience contributing to silicon development","expertise in at least one core silicon engineering discipline (e.g., RTL, PD, DV) and familiarity with the full ASIC flow"],"datePosted":"2026-04-18T15:42:14.076Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View, California, US"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"high-performance software, AI/ML, technical leadership, architectural alignment, HW-SW strategy, execution management, Master's or Ph.D. 
in a related field, hands-on experience with high-performance compute IPs (GPUs, ML accelerators), experience contributing to silicon development, expertise in at least one core silicon engineering discipline (e.g., RTL, PD, DV) and familiarity with the full ASIC flow"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0a39ade0-46f"},"title":"Principal Engineer, Compute Fleet Management","description":"<p>We are seeking a seasoned Principal Engineer to join the Core of Databricks Infrastructure. As the Technical Lead for Compute Fleet Management, you will set the standard for how Databricks consumes and optimizes compute across all three major clouds (AWS, Azure, and GCP).</p>\n<p>Your mandate includes pioneering fleet optimization, delivering hyper-scale resilience, and owning the critical path. This is a mission-critical role with direct impact on our gross margin and customer experience.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Provisioning and pooling of O(Billion)s of cloud resources to achieve peak workload performance, industry-leading efficiency, and robust resource isolation.</li>\n<li>Building the architecture that guarantees horizontal scaling and resilience against zonal or even cloud account-level failures, ensuring Databricks is always on.</li>\n<li>Leading the development of the lowest-dependency systems required to bootstrap and manage our massive compute platform.</li>\n</ul>\n<p>The ideal candidate will have a track record of leading transformative projects, distributed systems mastery, influence without authority, and execution discipline.</p>\n<p>In addition, highly desirable experience includes managing and scaling a massive fleet of GPUs for AI/ML workloads and developing and operating large-scale distributed systems across all major clouds (AWS, Azure, and GCP).</p>\n<p>Databricks is committed to fair and equitable compensation practices. 
The pay range for this role is $264,300-$322,300 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0a39ade0-46f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Databricks","sameAs":"https://databricks.com","logo":"https://logos.yubhub.co/databricks.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/databricks/jobs/8334738002","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$264,300-$322,300 USD","x-skills-required":["Distributed systems","Cloud computing","Fleet management","Resilience","Scalability"],"x-skills-preferred":["GPU management","Large-scale system development","Cloud operations"],"datePosted":"2026-04-18T15:42:05.071Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Bellevue, Washington"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Distributed systems, Cloud computing, Fleet management, Resilience, Scalability, GPU management, Large-scale system development, Cloud operations","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":264300,"maxValue":322300,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_18ae1499-b22"},"title":"Research Engineer, Discovery","description":"<p>As a Research Engineer on our team, you will work end-to-end across the whole model stack, identifying and addressing key infra blockers on the path to scientific AGI. 
Strong candidates should have familiarity with elements of language model training, evaluation, and inference, and eagerness to dive in quickly and get up to speed in areas where they are not yet experts.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and implement large-scale infrastructure systems to support AI scientist training, evaluation, and deployment across distributed environments</li>\n<li>Identify and resolve infrastructure bottlenecks impeding progress toward scientific capabilities</li>\n<li>Develop robust and reliable evaluation frameworks for measuring progress towards scientific AGI</li>\n<li>Build scalable and performant VM/sandboxing/container architectures to safely execute long-horizon AI tasks and scientific workflows</li>\n<li>Collaborate to translate experimental requirements into production-ready infrastructure</li>\n<li>Develop large scale data pipelines to handle advanced language model training requirements</li>\n<li>Optimize large scale training and inference pipelines for stable and efficient reinforcement learning</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 6+ years of highly-relevant experience in infrastructure engineering with demonstrated expertise in large-scale distributed systems</li>\n<li>Are a strong communicator and enjoy working collaboratively</li>\n<li>Possess deep knowledge of performance optimization techniques and system architectures for high-throughput ML workloads</li>\n<li>Have experience with containerization technologies (Docker, Kubernetes) and orchestration at scale</li>\n<li>Have a proven track record of building large-scale data pipelines and distributed storage systems</li>\n<li>Excel at diagnosing and resolving complex infrastructure challenges in production environments</li>\n<li>Can work effectively across the full ML stack from data pipelines to performance optimization</li>\n<li>Have experience collaborating with other researchers to scale experimental ideas</li>\n<li>Thrive in 
fast-paced environments and can rapidly iterate from experimentation to production</li>\n</ul>\n<p>Strong candidates may also have:</p>\n<ul>\n<li>Experience with language model training infrastructure and distributed ML frameworks (PyTorch, JAX, etc.)</li>\n<li>Background in building infrastructure for AI research labs or large-scale ML organizations</li>\n<li>Knowledge of GPU/TPU architectures and language model inference optimization</li>\n<li>Experience with cloud platforms (AWS, GCP) at enterprise scale</li>\n<li>Familiarity with VM and container orchestration</li>\n<li>Experience with workflow orchestration tools and experiment management systems</li>\n<li>History working with large scale reinforcement learning</li>\n<li>Comfort with large scale data pipelines (Beam, Spark, Dask, …)</li>\n</ul>\n<p>The annual compensation range for this role is $350,000-$850,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_18ae1499-b22","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4669581008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$350,000-$850,000 USD","x-skills-required":["large-scale distributed systems","containerization technologies (Docker, Kubernetes)","performance optimization techniques","system architectures for high-throughput ML workloads","data pipelines","distributed storage systems","ML frameworks (PyTorch, JAX, etc.)","GPU/TPU architectures","cloud platforms (AWS, GCP)","VM and container orchestration","workflow orchestration tools","experiment management systems","reinforcement learning","large scale data pipelines (Beam, Spark, Dask, 
…)"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:41:42.408Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"large-scale distributed systems, containerization technologies (Docker, Kubernetes), performance optimization techniques, system architectures for high-throughput ML workloads, data pipelines, distributed storage systems, ML frameworks (PyTorch, JAX, etc.), GPU/TPU architectures, cloud platforms (AWS, GCP), VM and container orchestration, workflow orchestration tools, experiment management systems, reinforcement learning, large scale data pipelines (Beam, Spark, Dask, …)","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":350000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_1ee5ad51-8f0"},"title":"SWE - Grids - Fixed Term Contract - 6 Months - London, UK","description":"<p>We are seeking an experienced and hands-on Software Engineer for a fixed-term contract to join the Energy Grids team at Google DeepMind. In this individual contributor role, you will work at the cutting edge of power systems and machine learning, developing and deploying innovative AI solutions to optimize the operation of electrical power grids.</p>\n<p>Your work will be critical to delivering a real-world validation of our approach, with a primary focus on core software engineering tasks to:</p>\n<p>Enable rapid, trustworthy experimentation. Maintain rigorous benchmarking and testing. Manage scale for both data and model size. 
Ensure and maintain high data quality for both real-world and synthetic data.</p>\n<p><strong>Key Responsibilities</strong></p>\n<ul>\n<li>Design, implement, and maintain robust and reliable systems and workflows for generating large-scale synthetic and real datasets of power grid optimization problems.</li>\n<li>Design and implement rigorous unit, integration, and system tests to ensure the reliability, accuracy, and maintained performance of our models and software, with a focus on data pipelines.</li>\n<li>Maintain and contribute to our machine learning codebase, ensuring efficient data structures and seamless integration with our power system models and optimization solvers.</li>\n<li>Ensure the codebase supports ongoing experimentation, while simultaneously increasing scalability, robustness, and reliability via improved integration testing and performance benchmarking.</li>\n<li>Work closely and collaboratively with a team of engineers, research scientists, and product managers to deliver real-world impact.</li>\n</ul>\n<p><strong>Minimum Qualifications</strong></p>\n<ul>\n<li>Bachelor&#39;s degree in Computer Science, Software Engineering, or equivalent practical experience.</li>\n<li>Excellent proficiency in C++, Python, or Jax.</li>\n<li>Demonstrated experience developing or utilizing solutions for robustness or quality assurance within software and/or ML systems.</li>\n<li>Experience processing, generating, and analyzing large-scale data, e.g. 
for ML applications.</li>\n<li>Proven ability to discuss technical ideas effectively and collaborate in interdisciplinary teams.</li>\n<li>Motivated by the prospect of real-world impact and focused on excellence in software development.</li>\n</ul>\n<p><strong>Preferred Qualifications</strong></p>\n<ul>\n<li>Experience with Google&#39;s technical stack and/or Google Cloud Platform (GCP).</li>\n<li>Familiarity with modern hardware accelerators (GPU / TPU).</li>\n<li>Experience with modern ML training frameworks, such as Jax.</li>\n<li>Experience in developing software in a translational research or production setting.</li>\n<li>Proficiency in Julia</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_1ee5ad51-8f0","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Google DeepMind","sameAs":"https://deepmind.com/","logo":"https://logos.yubhub.co/deepmind.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/deepmind/jobs/7750738","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"contract","x-salary-range":null,"x-skills-required":["C++","Python","Jax","Robustness","Quality Assurance","Software Development","Machine Learning","Data Analysis"],"x-skills-preferred":["Google's technical stack","Google Cloud Platform (GCP)","Modern hardware accelerators (GPU / TPU)","Modern ML training frameworks (Jax)","Software development in a translational research or production setting","Proficiency in Julia"],"datePosted":"2026-04-18T15:40:16.781Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"CONTRACTOR","occupationalCategory":"Engineering","industry":"Technology","skills":"C++, Python, Jax, Robustness, Quality Assurance, Software Development, Machine Learning, Data Analysis, Google's technical stack, Google Cloud Platform (GCP), Modern 
hardware accelerators (GPU / TPU), Modern ML training frameworks (Jax), Software development in a translational research or production setting, Proficiency in Julia"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_8eec7f08-8c5"},"title":"Engineering Manager, Inference","description":"<p><strong>About the role:</strong></p>\n<p>As an Engineering Manager on Anthropic&#39;s performance and scaling teams, you will be responsible for ensuring the team is identifying and removing bottlenecks, building robust and durable solutions, and maximizing the efficiency of our systems.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Provide front-line leadership of engineering efforts to improve model performance and scale our inference and training systems</li>\n<li>Become familiar with the team&#39;s technical stack enough to make targeted contributions as an individual contributor</li>\n<li>Manage day-to-day execution of the team&#39;s work</li>\n<li>Prioritize the team&#39;s work and manage projects in a highly dynamic, fast-paced environment</li>\n<li>Coach and support your reports in understanding, and pursuing, their professional growth</li>\n<li>Maintain a deep understanding of the team&#39;s technical work and its implications for AI safety</li>\n</ul>\n<p><strong>Requirements:</strong></p>\n<ul>\n<li>1+ years of management experience in a technical environment, particularly performance or distributed systems</li>\n<li>Background in machine learning, AI, or a similar related technical field</li>\n<li>Deeply interested in the potential transformative effects of advanced AI systems and committed to ensuring their safe development</li>\n<li>Excel at building strong relationships with stakeholders at all levels</li>\n<li>Quick learner, capable of understanding and contributing to discussions on complex technical topics</li>\n<li>Experience managing teams through periods of rapid growth and 
change</li>\n</ul>\n<p><strong>Nice to have:</strong></p>\n<ul>\n<li>High performance, large-scale ML systems</li>\n<li>GPU/Accelerator programming</li>\n<li>ML framework internals</li>\n<li>OS internals</li>\n<li>Language modeling with transformers</li>\n</ul>\n<p><strong>Compensation:</strong></p>\n<p>The annual compensation range for this role is $425,000-$560,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_8eec7f08-8c5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4741102008","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$425,000-$560,000 USD","x-skills-required":["Machine Learning","AI","Performance Optimization","Distributed Systems","Leadership","Communication"],"x-skills-preferred":["High Performance Computing","GPU Programming","ML Frameworks","OS Internals","Language Modeling"],"datePosted":"2026-04-18T15:40:14.477Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Machine Learning, AI, Performance Optimization, Distributed Systems, Leadership, Communication, High Performance Computing, GPU Programming, ML Frameworks, OS Internals, Language Modeling","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":425000,"maxValue":560000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_28107212-128"},"title":"Performance Engineer, GPU","description":"<p>As a GPU 
Performance Engineer at Anthropic, you will be responsible for architecting and implementing the foundational systems that power Claude and push the frontiers of what&#39;s possible with large language models. You will maximize GPU utilization and performance at unprecedented scale, develop cutting-edge optimizations that directly enable new model capabilities, and dramatically improve inference efficiency.</p>\n<p>Working at the intersection of hardware and software, you will implement state-of-the-art techniques from custom kernel development to distributed system architectures. Your work will span the entire stack, from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization.</p>\n<p>Strong candidates will have a track record of delivering transformative GPU performance improvements in production ML systems and will be excited to shape the future of AI infrastructure alongside world-class researchers and engineers.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Architect and implement foundational systems that power Claude</li>\n<li>Maximize GPU utilization and performance at unprecedented scale</li>\n<li>Develop cutting-edge optimizations that directly enable new model capabilities</li>\n<li>Dramatically improve inference efficiency</li>\n<li>Implement state-of-the-art techniques from custom kernel development to distributed system architectures</li>\n<li>Work at the intersection of hardware and software</li>\n<li>Span the entire stack, from low-level tensor core optimizations to orchestrating thousands of GPUs in perfect synchronization</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>Deep experience with GPU programming and optimization at scale</li>\n<li>Impact-driven, passionate about delivering measurable performance breakthroughs</li>\n<li>Ability to navigate complex systems from hardware interfaces to high-level ML frameworks</li>\n<li>Enjoy collaborative problem-solving and pair programming</li>\n<li>Want to work on 
state-of-the-art language models with real-world impact</li>\n<li>Care about the societal impacts of your work</li>\n<li>Thrive in ambiguous environments where you define the path forward</li>\n</ul>\n<p>Nice to have:</p>\n<ul>\n<li>Experience with GPU Kernel Development: CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization</li>\n<li>ML Compilers &amp; Frameworks: PyTorch/JAX internals, torch.compile, XLA, custom operators</li>\n<li>Performance Engineering: Kernel fusion, memory bandwidth optimization, profiling with Nsight</li>\n<li>Distributed Systems: NCCL, NVLink, collective communication, model parallelism</li>\n<li>Low-Precision: INT8/FP8 quantization, mixed-precision techniques</li>\n<li>Production Systems: Large-scale training infrastructure, fault tolerance, cluster orchestration</li>\n</ul>\n<p>Representative projects:</p>\n<ul>\n<li>Co-design attention mechanisms and algorithms for next-generation hardware architectures</li>\n<li>Develop custom kernels for emerging quantization formats and mixed-precision techniques</li>\n<li>Design distributed communication strategies for multi-node GPU clusters</li>\n<li>Optimize end-to-end training and inference pipelines for frontier language models</li>\n<li>Build performance modeling frameworks to predict and optimize GPU utilization</li>\n<li>Implement kernel fusion strategies to minimize memory bandwidth bottlenecks</li>\n<li>Create resilient systems for planet-scale distributed training infrastructure</li>\n<li>Profile and eliminate performance bottlenecks in production serving infrastructure</li>\n<li>Partner with hardware vendors to influence future accelerator capabilities and software stacks</li>\n</ul>\n<p>Note: The salary range for this position is $280,000-$850,000 USD per year.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_28107212-128","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4926227008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$280,000-$850,000 USD per year","x-skills-required":["GPU programming","optimization at scale","CUDA","Triton","CUTLASS","Flash Attention","tensor core optimization","PyTorch/JAX internals","torch.compile","XLA","custom operators","kernel fusion","memory bandwidth optimization","profiling with Nsight","NCCL","NVLink","collective communication","model parallelism","INT8/FP8 quantization","mixed-precision techniques","large-scale training infrastructure","fault tolerance","cluster orchestration"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:40:11.758Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"GPU programming, optimization at scale, CUDA, Triton, CUTLASS, Flash Attention, tensor core optimization, PyTorch/JAX internals, torch.compile, XLA, custom operators, kernel fusion, memory bandwidth optimization, profiling with Nsight, NCCL, NVLink, collective communication, model parallelism, INT8/FP8 quantization, mixed-precision techniques, large-scale training infrastructure, fault tolerance, cluster orchestration","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":280000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_e9e3cff7-d9b"},"title":"Performance 
Engineer","description":"<p>As a Performance Engineer at Anthropic, you will be responsible for identifying and solving novel systems problems that arise when running machine learning algorithms at scale. Your expertise will be crucial in developing systems that optimize the throughput and robustness of our largest distributed systems.</p>\n<p>You will work closely with our team of researchers, engineers, and policy experts to build beneficial AI systems. Your contributions will have a direct impact on the development of our AI technology and its applications.</p>\n<p>We are looking for a highly motivated and experienced engineer who is passionate about solving complex systems problems and has a strong background in software engineering or machine learning. If you are excited about the opportunity to work on cutting-edge AI technology and make a meaningful contribution to the field, we encourage you to apply.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Identify and solve novel systems problems that arise when running machine learning algorithms at scale</li>\n<li>Develop systems that optimize the throughput and robustness of our largest distributed systems</li>\n<li>Collaborate with our team of researchers, engineers, and policy experts to build beneficial AI systems</li>\n<li>Contribute to the development of our AI technology and its applications</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>Significant software engineering or machine learning experience, particularly at supercomputing scale</li>\n<li>Results-oriented, with a bias towards flexibility and impact</li>\n<li>Ability to pick up slack, even if it goes outside your job description</li>\n<li>Enjoy pair programming</li>\n<li>Want to learn more about machine learning research</li>\n<li>Care about the societal impacts of your work</li>\n</ul>\n<p>Preferred qualifications:</p>\n<ul>\n<li>Experience with high-performance, large-scale ML systems</li>\n<li>GPU/Accelerator programming</li>\n<li>ML framework 
internals</li>\n<li>OS internals</li>\n<li>Language modeling with transformers</li>\n</ul>\n<p>Benefits:</p>\n<ul>\n<li>Competitive compensation and benefits</li>\n<li>Optional equity donation matching</li>\n<li>Generous vacation and parental leave</li>\n<li>Flexible working hours</li>\n<li>Lovely office space in which to collaborate with colleagues</li>\n</ul>\n<p>Guidance on Candidates&#39; AI Usage: Learn about our policy for using AI in our application process</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_e9e3cff7-d9b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4020350008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$280,000-$850,000 USD","x-skills-required":["software engineering","machine learning","high-performance computing","GPU/Accelerator programming","ML framework internals","OS internals","language modeling with transformers"],"x-skills-preferred":["pair programming","results-oriented","flexibility and impact","ability to pick up slack","enjoy learning"],"datePosted":"2026-04-18T15:40:07.874Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, machine learning, high-performance computing, GPU/Accelerator programming, ML framework internals, OS internals, language modeling with transformers, pair programming, results-oriented, flexibility and impact, ability to pick up slack, enjoy 
learning","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":280000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_63af8568-789"},"title":"Engineering Manager, Inference Routing and Performance","description":"<p><strong>About the role\\nEvery request that hits Claude, whether from claude.ai, the API, our cloud partners, or internal research, passes through a routing decision. Not a generic load balancer round-robin, but a decision that accounts for what&#39;s already cached where, which accelerator the request runs best on, and what else is in flight across the fleet.\\n\\nGet it right and you extract meaningfully more throughput from the same hardware. Get it wrong and you burn capacity, miss latency SLOs, or shed load that shouldn&#39;t have been shed.\\n\\nThe Inference Routing team owns this layer. We build the cluster-level routing and coordination plane for Anthropic&#39;s inference fleet: the system that sits between the API surface and the inference engines themselves, making fleet-wide efficiency decisions in real time.\\n\\nAs Anthropic moves from &quot;many independent inference replicas&quot; toward &quot;a single warehouse-scale computer running a coordinated program,&quot; Dystro is the coordination layer. 
This is a deeply technical team.\\n\\nThe engineers here design custom load-balancing algorithms, build quantitative models of system performance, debug latency spikes that cross kernel, network, and framework boundaries, and reason carefully about cache placement across thousands of accelerators.\\n\\nThey work shoulder-to-shoulder with teams that write kernels and ML framework internals.\\n\\nThe EM for this team doesn&#39;t need to write kernels, but they do need the systems depth to make architectural calls, evaluate deeply technical candidates, and spot when a proposed optimization will have second-order effects on the fleet.\\n\\nYou&#39;ll inherit a strong team of distributed-systems engineers, and you&#39;ll be accountable for two things that pull in different directions: shipping system-level performance improvements that measurably increase fleet throughput and efficiency, and running the team operationally so that deploys are safe, incidents are rare, and the teams who depend on Dystro can plan around you with confidence.\\n\\nThe job is holding both.\\n\\n## Representative work:\\nThings the Inference Routing EM actually spends time on:\\n- Deciding whether a proposed routing algorithm change is worth the deploy risk, given the modeled throughput gain and the blast radius if it regresses\\n- Sequencing a quarter where KV-cache offload, a new coordination protocol, and two model launches all compete for the same engineers\\n- Working through a persistent tail-latency regression with the team, walking down from fleet-level metrics to per-replica behavior to a root cause in the networking stack\\n- Building the case (with numbers) to peer teams for why a cross-team protocol change unlocks the next efficiency win\\n- Running the post-incident review after a cache-eviction bug caused a capacity event, and turning it into process changes that stick\\n- Interviewing a candidate who has built schedulers at supercomputing scale, and deciding whether they&#39;d 
be additive to a team that already goes deep\\n\\n## What you&#39;ll do:\\nDrive system-level performance\\n- Own the technical roadmap for cluster-level inference efficiency: routing decisions, cache placement and eviction, cross-replica coordination, and the protocols that keep routing and inference engines in sync\\n- Partner with the inference engine, kernels, and performance teams to identify fleet-level throughput and latency wins, then turn those into shipped improvements with measurable results\\n- Build the team&#39;s habit of quantitative performance modeling: claim a win only when you can measure it, and know before you ship what the expected effect is\\n\\nDeliver reliably and operate cleanly\\n- Set technical strategy for how routing evolves across heterogeneous hardware (GPUs, TPUs, Trainium) and across all our serving surfaces\\n- Run the team&#39;s operational backbone (on-call rotation, incident response, postmortem review, deploy safety) so the team can ship aggressively without the system becoming fragile\\n- Create clarity at a seam: Inference Routing sits between the API surface, the inference engines, and the cloud deployment teams. You&#39;ll make sure commitments are realistic, dependencies are understood, and nobody is surprised\\n\\nBuild and grow the team\\n- Develop and retain a strong existing team, and hire against the bar described above: people who can go to the OS and framework level when the problem demands it, and who care about production reliability\\n- Coach engineers through a roadmap where priorities shift with model launches, new hardware, and scaling demands. We pair a lot here; you&#39;ll help make that collaboration pattern productive\\n- Pick up slack when it matters. 
This is a small team in a critical path; sometimes the EM is the one unblocking a stuck deploy or synthesizing a design debate\\n\\n## You may be a good fit if you:\\n- Have 5+ years of engineering management experience, ideally with at least part of that leading teams on critical-path production infrastructure at scale\\n- Have a deep systems background: load balancing, scheduling, cache-coherent distributed state, high-performance networking, or similar. You need enough depth to make architectural calls about routing and efficiency, and to evaluate candidates who go to the kernel and framework level\\n- Have shipped performance improvements in large-scale systems and can explain, with numbers, what the impact was\\n- Have run production infrastructure with real operational stakes: on-call, incident response, capacity events, deploy discipline\\n- Are results-oriented with a bias toward impact, and comfortable working in a space where throughput, latency, stability, and feature velocity all pull in different directions\\n- Build strong relationships across team boundaries; this is a seam role, and much of the job is making sure other teams can rely on yours\\n- Are curious about machine learning systems. 
You don&#39;t need an ML research background, but you should want to learn how transformer inference actually works and how that shapes the systems problems\\n\\nStrong candidates may also have:\\n- Experience with LLM inference serving: KV caching, continuous batching, request scheduling, prefill/decode disaggregation\\n- Background in cluster schedulers, load balancers, service meshes, or coordination planes at scale\\n- Familiarity with heterogeneous accelerator fleets (GPU/TPU/Trainium) and how hardware differences affect workload placement\\n- Experience with GPU/accelerator programming, ML framework internals, or OS-level performance debugging, enough to follow and evaluate the technical work, not necessarily to do it daily\\n- Led teams at supercomputing or hyperscaler infrastructure scale\\n- Led teams through rapid-growth periods where hiring and onboarding competed with roadmap delivery\\n\\nThe annual compensation range for this role is listed below. For sales roles, the range provided is the role&#39;s On Target Earnings (&quot;OTE&quot;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.\\nAnnual Salary: $405,000-$485,000 USD</strong></p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_63af8568-789","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5155391008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$405,000-$485,000 USD","x-skills-required":["engineering management","deep systems background","load balancing","scheduling","cache-coherent distributed state","high-performance networking"],"x-skills-preferred":["LLM 
inference serving","cluster schedulers","load balancers","service meshes","coordination planes","heterogeneous accelerator fleets","GPU/TPU/Trainium","GPU/accelerator programming","ML framework internals","OS-level performance debugging"],"datePosted":"2026-04-18T15:37:38.038Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"engineering management, deep systems background, load balancing, scheduling, cache-coherent distributed state, high-performance networking, LLM inference serving, cluster schedulers, load balancers, service meshes, coordination planes, heterogeneous accelerator fleets, GPU/TPU/Trainium, GPU/accelerator programming, ML framework internals, OS-level performance debugging","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_1819a743-ca5"},"title":"Engineering Manager, GPU (ML Accelerator)","description":"<p>About the role:</p>\n<p>As an Engineering Manager on Anthropic&#39;s performance and scaling teams, you will be responsible for ensuring your team identifies and removes bottlenecks, builds robust and durable solutions, and maximizes the efficiency of our systems.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Provide front-line leadership of engineering efforts to improve model performance and scale our inference and training systems</li>\n<li>Become familiar with the team&#39;s technical stack enough to make targeted contributions as an individual contributor</li>\n<li>Manage day-to-day execution of the team&#39;s work</li>\n<li>Prioritize the team&#39;s work and manage projects in a highly dynamic, fast-paced environment</li>\n<li>Coach and support your 
reports in understanding, and pursuing, their professional growth</li>\n<li>Maintain a deep understanding of the team&#39;s technical work and its implications for AI safety</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 1+ years of management experience in a technical environment, particularly performance or distributed systems</li>\n<li>Have a background in machine learning, AI, or a similar related technical field</li>\n<li>Are deeply interested in the potential transformative effects of advanced AI systems and are committed to ensuring their safe development</li>\n<li>Excel at building strong relationships with stakeholders at all levels</li>\n<li>Are a quick learner, capable of understanding and contributing to discussions on complex technical topics</li>\n<li>Have experience managing teams through periods of rapid growth and change</li>\n</ul>\n<p>Strong candidates may also have experience with:</p>\n<ul>\n<li>High-performance, large-scale ML systems</li>\n<li>GPU/Accelerator programming</li>\n<li>ML framework internals</li>\n<li>OS internals</li>\n<li>Language modeling with transformers</li>\n</ul>\n<p>The annual compensation range for this role is $500,000-$850,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_1819a743-ca5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4741104008","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$500,000-$850,000 USD","x-skills-required":["Machine Learning","AI","Performance or Distributed Systems","GPU/Accelerator Programming","ML Framework Internals","OS Internals","Language Modeling with 
Transformers"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:37:19.926Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Machine Learning, AI, Performance or Distributed Systems, GPU/Accelerator Programming, ML Framework Internals, OS Internals, Language Modeling with Transformers","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":500000,"maxValue":850000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0a872d93-7f6"},"title":"Engineering Manager, Cloud Inference AWS","description":"<p>We are seeking an experienced Engineering Manager to lead the Cloud Inference team for AWS. You will lead your team to scale and optimize Claude to serve the massive audiences of developers and enterprise companies using AWS.</p>\n<p>As an Engineering Manager, you will own the end-to-end product of Claude on AWS, including API, load balancing, inference, capacity and operations. 
Your team will ensure our LLMs meet rigorous performance, safety and security standards and enhance our core infrastructure for packaging, testing, and deploying inference technology across the globe.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Set technical strategy and oversee development of Claude on AWS across all layers of the technical stack.</li>\n<li>Collaborate across teams and companies to deeply understand product, infrastructure, operations and capacity needs, identifying potential solutions to support frontier LLM serving</li>\n<li>Work closely with cross-functional stakeholders across companies to align on goals and drive outcomes</li>\n<li>Create clarity for the team and stakeholders in an ambiguous and evolving environment</li>\n<li>Take an inclusive approach to hiring and coaching top technical talent, and support a high performing team</li>\n<li>Design and run processes (e.g. postmortem review, incident response, on-call rotations) that help the team operate effectively and never fail the same way twice</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>10+ years of experience in high-scale, high-reliability software development, particularly infrastructure or capacity management</li>\n<li>5+ years of engineering management experience</li>\n<li>Experience recruiting, scaling, and retaining engineering talent in a high growth environment</li>\n<li>Have experience scaling products, resources and operations to accommodate rapid growth</li>\n<li>Are deeply interested in the potential transformative effects of advanced AI systems and are committed to ensuring their safe development</li>\n<li>Excel at building strong relationships and strategy with stakeholders across engineering, product, finance, and sales</li>\n<li>Have experience working with external partners to align goals and deliver impact</li>\n<li>Enjoy working in a fast-paced, early environment; comfortable with adapting priorities as driven by the rapidly evolving AI space</li>\n<li>Have excellent 
written and verbal communication skills</li>\n<li>Demonstrated success building a culture of belonging and engineering excellence</li>\n<li>Are motivated by developing AI responsibly and safely</li>\n<li>Are willing and able to travel frequently between Seattle and the SF Bay Area</li>\n</ul>\n<p>Strong candidates may also have experience with:</p>\n<ul>\n<li>Experience with machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL</li>\n<li>Experience as a Product Manager</li>\n<li>Experience with deployment and capacity management automation</li>\n<li>Security and privacy best practice expertise</li>\n</ul>\n<p>Annual compensation range for this role is $405,000-$485,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0a872d93-7f6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5141377008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$405,000-$485,000 USD","x-skills-required":["Cloud Inference","AWS","Machine Learning","Infrastructure Management","Capacity Planning","Security and Privacy","Leadership","Communication","Collaboration"],"x-skills-preferred":["GPU","TPU","Trainium","NCCL","Product Management","Deployment Automation","Security Best Practices"],"datePosted":"2026-04-18T15:37:12.539Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cloud Inference, AWS, Machine Learning, Infrastructure Management, Capacity Planning, Security and Privacy, Leadership, 
Communication, Collaboration, GPU, TPU, Trainium, NCCL, Product Management, Deployment Automation, Security Best Practices","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a0051ff6-ddf"},"title":"Facilities Operations Manager","description":"<p>We&#39;re seeking a driven Facilities Operations Manager to join our team and ensure the relentless performance of our data center infrastructure. This role is critical to maintaining the uptime and efficiency of the systems powering our AI breakthroughs.</p>\n<p>As a Facilities Operations Manager, you&#39;ll lead teams, oversee cutting-edge facilities, and solve complex problems in real time to keep our mission on track. You&#39;ll own the operation of power, cooling, and monitoring systems at scale, bringing technical depth and a no-excuses mindset to our facility.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Manage all aspects of data center critical infrastructure (switchgear, generators, UPS systems, chillers, liquid cooling, and building monitoring), ensuring 99.999%+ uptime.</li>\n<li>Lead 24x7 teams of facility technicians and vendors, driving safety, execution, and a culture of accountability.</li>\n<li>Troubleshoot and resolve facility emergencies using root cause analysis, acting as the go-to escalation point.</li>\n<li>Spearhead optimization projects, collaborating with engineers to integrate next-gen tech and cut operational costs.</li>\n<li>Own the operations budget, balancing efficiency with performance under tight deadlines.</li>\n<li>Enforce compliance with safety and operational protocols, anticipating regulatory shifts.</li>\n<li>Coordinate with cross-functional teams to deliver high-quality outcomes and boost team morale.</li>\n<li>Support multi-site operations and new facility build-outs as 
xAI scales.</li>\n</ul>\n<p>Basic Qualifications:</p>\n<ul>\n<li>Minimum of 5 years in data center operations or facility management, ideally with hyperscaler or industrial systems.</li>\n<li>Strong grasp of critical infrastructure: power, cooling, and monitoring systems.</li>\n<li>Proven ability to lead teams and manage projects under pressure.</li>\n<li>Sharp analytical and communication skills.</li>\n</ul>\n<p>Preferred Skills and Experience:</p>\n<ul>\n<li>B.S. in Engineering, Facilities Management, or related field; advanced degree a plus.</li>\n<li>Experience with GPU clusters or AI-driven data center environments.</li>\n<li>Methodical troubleshooting and technical leadership chops.</li>\n<li>Familiarity with Southaven, MS area regulations and practices is a bonus.</li>\n<li>Comfort with Excel, Word, and operational tools; CAD or monitoring software knowledge is a plus.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a0051ff6-ddf","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com/","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/4685202007","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["data center operations","facility management","critical infrastructure","team leadership","project management","analytical skills","communication skills"],"x-skills-preferred":["GPU clusters","AI-driven data center environments","methodical troubleshooting","technical leadership","CAD or monitoring software"],"datePosted":"2026-04-18T15:35:02.637Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Southaven, MS"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"data center 
operations, facility management, critical infrastructure, team leadership, project management, analytical skills, communication skills, GPU clusters, AI-driven data center environments, methodical troubleshooting, technical leadership, CAD or monitoring software"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2e513a92-ec5"},"title":"Research Scientist (Generative Modeling)","description":"<p>We are seeking a talented Research Scientist with a strong background in generative modeling, particularly diffusion models, to join our modeling team. This role is ideal for candidates with deep expertise in diffusion models applied to images, videos, or 3D assets and scenes.</p>\n<p>Experience in one or more of the following areas is a strong plus: large-scale model training and research in 3D computer vision.</p>\n<p>You will collaborate closely with researchers, engineers, and product teams to bring advanced 3D modeling and machine learning techniques into real-world applications, ensuring that our technology remains at the forefront of visual innovation. 
This role involves significant hands-on research and engineering work, driving projects from conceptualization through to production deployment.</p>\n<p>Key responsibilities:</p>\n<ul>\n<li>Design, implement, and train large-scale diffusion models for generating 3D worlds.</li>\n<li>Develop and experiment with large-scale diffusion models to add novel control signals, adapt to target aesthetic preferences, or distill for efficient inference.</li>\n<li>Collaborate closely with research and product teams to understand and translate product requirements into effective technical roadmaps.</li>\n<li>Contribute hands-on to all stages of model development, including data curation, experimentation, evaluation, and deployment.</li>\n<li>Continuously explore and integrate cutting-edge research in diffusion and generative AI more broadly.</li>\n<li>Act as a key technical resource within the team, mentoring colleagues and driving best practices in generative modeling and ML engineering.</li>\n</ul>\n<p>Ideal candidate profile:</p>\n<ul>\n<li>3+ years of experience in generative modeling or applied ML roles.</li>\n<li>Extensive experience with machine learning frameworks such as PyTorch or TensorFlow, especially in the context of diffusion models and other generative models.</li>\n<li>Deep expertise in at least one area of generative modeling.</li>\n<li>Strong history of publications or open-source contributions involving large-scale diffusion models.</li>\n<li>Strong coding proficiency in Python and experience with GPU-accelerated computing.</li>\n<li>Ability to engage effectively with researchers and cross-functional teams, clearly translating complex technical ideas into actionable tasks and outcomes.</li>\n<li>Comfortable operating within a dynamic startup environment with high levels of ambiguity, ownership, and innovation.</li>\n</ul>\n<p>Nice to have:</p>\n<ul>\n<li>Contributions to open-source projects in the fields of computer vision, graphics, or ML.</li>\n<li>Familiarity with large-scale training infrastructure.</li>\n<li>Experience integrating machine learning models into production environments.</li>\n<li>Having led or been involved with the development or training of large-scale, state-of-the-art generative models.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2e513a92-ec5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"World Labs","sameAs":"https://worldlabs.ai","logo":"https://logos.yubhub.co/worldlabs.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/worldlabs/jobs/4089324009","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$250,000 - $325,000 base salary (good-faith estimate for San Francisco Bay Area upon hire; actual offer based on experience, skills, and qualifications)","x-skills-required":["generative modeling","diffusion models","PyTorch","TensorFlow","machine learning frameworks","large-scale model training","research in 3D computer vision","data curation","experimentation","evaluation","deployment","GPU-accelerated computing","Python"],"x-skills-preferred":["open-source contributions","large-scale training infrastructure","integrating machine learning models into production environments","leading or being involved with the development or training of large-scale, state-of-the-art generative models"],"datePosted":"2026-04-17T13:09:56.134Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"generative modeling, diffusion models, PyTorch, TensorFlow, machine learning frameworks, large-scale model training, research in 3D computer vision, data curation, experimentation, evaluation, deployment, GPU-accelerated computing, Python, open-source contributions, large-scale training infrastructure, integrating machine learning models into production environments, leading or being involved with the development or 
training of large-scale, state-of-the-art generative models","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":250000,"maxValue":325000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3a7e27c3-92a"},"title":"Pipeline Engineer (Graphics/3D)","description":"<p>We are building a production-grade web application for 3D Gaussian Splat scene generation, editing, and publishing. We&#39;re looking for a Pipeline Engineer to help integrate cutting-edge research features and make them reliable, debuggable, and delightful to use.</p>\n<p>This is a high-ownership, fullstack-but-backend-heavy role that sits between R&amp;D and frontend. You will work end-to-end across graphics/ML algorithms, backend services, and frontend UI, turning proofs of concept into shipped features that users can rely on. The ideal candidate enjoys making complex, messy systems work smoothly in production and improving them continuously based on both internal testing and external user feedback.</p>\n<p><strong>Key Responsibilities</strong></p>\n<ul>\n<li>Bridge research and product by working closely with both graphics/computer vision researchers and frontend engineers to ship usable features.</li>\n<li>Turn standalone Python scripts into clean, production-ready systems with clear inputs, outputs, validations, and failure modes.</li>\n<li>Develop backend services, APIs, and tooling that expose complex 3D workflows in a reliable and scalable way.</li>\n<li>Assist in integrations across the 3D ecosystem, including asset import/export and format conversion with common DCC tools.</li>\n</ul>\n<p><strong>Ideal Candidate Profile</strong></p>\n<ul>\n<li>You have a strong pipeline mindset, with experience turning scripts into production systems with clear inputs, outputs, validations, and failure modes.</li>\n<li>You enjoy building tools and infrastructure that 
enable others, and you take pride in making complex systems understandable and usable.</li>\n<li>You have fluency in the 3D ecosystem, including familiarity with 3D algorithms, DCC tools and common 3D file formats, sufficient to design integrations and debug workflow issues.</li>\n</ul>\n<p><strong>Minimum Qualifications</strong></p>\n<ul>\n<li>Strong proficiency in Python, including packaging, typing, tooling, debugging, and performance profiling.</li>\n<li>Strong literacy in core 3D graphics and computer vision concepts, such as transforms, cameras, coordinate systems, rendering, and visual artifact debugging.</li>\n<li>Demonstrated experience taking prototypes to production, including refactoring, testing, CI/CD, versioned artifacts, and reproducibility.</li>\n<li>Solid backend fundamentals, including HTTP APIs, FastAPI (or similar frameworks), async/concurrency basics, cloud deployment, and service reliability.</li>\n</ul>\n<p><strong>Strongly Preferred / Nice-to-Haves</strong></p>\n<ul>\n<li>Experience with photogrammetry, 3D reconstruction, or Gaussian splat rendering pipelines.</li>\n<li>Hands-on experience with DCC tools such as Blender, Maya, Houdini, Unreal, or Unity.</li>\n<li>Familiarity with the GPU stack (CUDA, PyTorch), batch/queue systems, and containerization (Docker, Kubernetes).</li>\n<li>Frontend adjacency, with comfort collaborating on React-based parameter plumbing and UX for technical controls.</li>\n<li>Experience with production pipelines at VFX, animation, or gaming studios.</li>\n<li>A production support mindset, including willingness to iterate on documentation, tutorials, and error messages to improve usability and reduce misuse.</li>\n</ul>\n<p><strong>Example Projects You Might Work On</strong></p>\n<ul>\n<li>Packaging ML and 3D Python pipelines into GPU-backed FastAPI services with request validation, reproducible outputs, and well-defined request/response schemas.</li>\n<li>Designing parameter schemas and defaults that map cleanly 
from frontend controls to backend APIs and internal pipeline configurations.</li>\n<li>Integrating import/export workflows with popular DCC tools (e.g., Blender, Maya, Houdini, Unity, Unreal, USD), identifying workflow friction, and producing lightweight documentation, tutorials, and example code/scripts to help users succeed.</li>\n</ul>\n<p><strong>Who You Are</strong></p>\n<ul>\n<li>Fearless Innovator: We need people who thrive on challenges and aren&#39;t afraid to tackle the impossible.</li>\n<li>Resilient Builder: Impacting Large World Models isn&#39;t a sprint; it&#39;s a marathon with hurdles. We&#39;re looking for builders who can weather the storms of groundbreaking research and come out stronger.</li>\n<li>Mission-Driven Mindset: Everything we do is in service of creating the best spatially intelligent AI systems, and using them to empower people.</li>\n<li>Collaborative Spirit: We&#39;re building something bigger than any one person. We need team players who can harness the power of collective intelligence.</li>\n</ul>\n<p>We&#39;re hiring the brightest minds from around the globe to bring diverse perspectives to our cutting-edge work. 
If you&#39;re ready to work on technology that will reshape how machines perceive and interact with the world - then World Labs is your launchpad.</p>\n<p>Join us, and let&#39;s make history together.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3a7e27c3-92a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"World Labs","sameAs":"https://www.worldlabs.ai/","logo":"https://logos.yubhub.co/worldlabs.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/worldlabs/jobs/4093035009","x-work-arrangement":"remote","x-experience-level":null,"x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Python","3D graphics","computer vision","FastAPI","HTTP APIs","async/concurrency basics","cloud deployment","service reliability"],"x-skills-preferred":["photogrammetry","3D reconstruction","Gaussian splat rendering","Blender","Maya","Houdini","Unreal","Unity","GPU stack","batch/queue systems","containerization","React-based parameter plumbing","UX for technical controls","production pipelines","VFX","animation","gaming studios"],"datePosted":"2026-04-17T13:09:23.876Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, 3D graphics, computer vision, FastAPI, HTTP APIs, async/concurrency basics, cloud deployment, service reliability, photogrammetry, 3D reconstruction, Gaussian splat rendering, Blender, Maya, Houdini, Unreal, Unity, GPU stack, batch/queue systems, containerization, React-based parameter plumbing, UX for technical controls, production pipelines, VFX, animation, gaming 
studios"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d04b0aff-768"},"title":"Senior Engineer, Radar Modeling & Simulation","description":"<p>The Software Integration &amp; Operations group turns frontier autonomy into mission-ready aircraft. We own the commit-to-flight pipeline: deterministic aircraft and mission simulation, HIL/VIL integration, CI/CD, automated flight qualification testing, and release engineering. Our goal is simple: make AI fly safely, repeatably, and fast.</p>\n<p>As a Modeling &amp; Simulation Engineer, you will be responsible for improving and adding to our sensor and communications model suite so that our operator training and internal engineering pipelines have a seamless translation from sim to real results.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Develop and enhance radar sensor models for use in simulation and evaluation of aeronautical vehicles.</li>\n<li>Translate theoretical models into efficient, reliable C++ implementations with a focus on numerical accuracy and performance.</li>\n<li>Validate models against real-world data and authoritative references, including field test data and calibration procedures.</li>\n<li>Collaborate with simulation and training application teams to ensure models integrate cleanly into operator-facing tools.</li>\n<li>Design automated validation and regression testing strategies for mathematical models to ensure fidelity across releases.</li>\n<li>Prototype and evaluate new modeling techniques (reduced-order models, uncertainty quantification, machine learning–based surrogates) to push the state of the art.</li>\n<li>Document assumptions, equations, and validation results so that both engineers and operators can trust model outputs.</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>BS or higher in Aerospace Engineering, Applied Math, Physics, or related field with 5+ years of aerospace modeling 
experience.</li>\n<li>C++ foundation with experience implementing numerical methods.</li>\n<li>Demonstrated experience with aerospace models such as radar sensors and radio communications systems.</li>\n<li>Experience validating simulations against real-world or experimental data.</li>\n<li>Ability to write clear documentation explaining assumptions, limitations, and expected behaviors of models.</li>\n</ul>\n<p><strong>Preferences</strong></p>\n<ul>\n<li>1+ years of experience working on pilot/operator training systems.</li>\n<li>Experience with Eigen or SciPy for model prototyping and validation.</li>\n<li>Familiarity with state estimation sensor models (GPS, IMU, Gyro, etc.) for simulation environments.</li>\n<li>Demonstrated experience with payload sensor models, including laser sensors and IR and optical cameras.</li>\n<li>Knowledge of uncertainty quantification and statistical analysis methods.</li>\n<li>Experience with parallelization or GPU acceleration for compute-heavy models.</li>\n<li>Strong problem-solving mindset with a collaborative and detail-oriented approach.</li>\n<li>Familiarity with Python for test automation and data analysis pipelines.</li>\n<li>Passion for aerospace and autonomous vehicle systems.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d04b0aff-768","directApply":true,"hiringOrganization":{"@type":"Organization","name":"X-BAT Division – X-BAT Engineering - Software","sameAs":"http://bit.ly/shieldai_lever_homepage","logo":"https://logos.yubhub.co/bit.ly.png"},"x-apply-url":"https://jobs.lever.co/shieldai/5118a08f-4ae8-431f-a06f-6dba3eaff113","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$105,000 - $200,000 a year","x-skills-required":["C++","Numerical methods","Aerospace models","Radar sensors","Radio communications 
systems"],"x-skills-preferred":["Eigen","SciPy","State estimation sensor models","Laser sensors","IR and optical cameras","Uncertainty quantification","Statistical analysis methods","Parallelization","GPU acceleration","Python","Test automation","Data analysis pipelines"],"datePosted":"2026-04-17T13:05:08.450Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Dallas, Texas / San Diego, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Automotive","skills":"C++, Numerical methods, Aerospace models, Radar sensors, Radio communications systems, Eigen, SciPy, State estimation sensor models, Laser sensors, IR and optical cameras, Uncertainty quantification, Statistical analysis methods, Parallelization, GPU acceleration, Python, Test automation, Data analysis pipelines","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":105000,"maxValue":200000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d5b743bb-d8f"},"title":"Product Manager, AI Platforms","description":"<p>The AI Platform Product Manager will drive the strategy and execution of Shield AI&#39;s next-generation autonomy intelligence stack. This PM owns the product vision and roadmap for the Hivemind AI Platform, ensuring we can manufacture, govern, and field advanced world models, robotics foundation models, and vision-language-action systems safely and at scale.</p>\n<p>This role sits at the intersection of AI/ML, autonomy, model lifecycle, infrastructure, and product strategy. 
The PM partners closely with engineering, AI research, Hivemind Solutions, and field teams to deliver the tooling that enables sovereign autonomy, AI Factories at the edge, and continuous learning, capabilities that are central to Shield AI&#39;s strategic direction.</p>\n<p>This is a high-impact role for an experienced product leader excited to define how foundation models are trained, validated, governed, and deployed across thousands of autonomous systems in highly contested environments.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>AI Model Development &amp; Training Platform</li>\n</ul>\n<p>Own the roadmap for foundation model training workflows, including dataset ingestion, curation, labeling, synthetic data generation, domain model training, and distillation pipelines. Define requirements for world models, robotics models, and VLA-based training, evaluation, and specialization. Lead the evolution of MLOps capabilities in Forge, including data lineage, experiment tracking, model versioning, and scalable evaluation suites.</p>\n<ul>\n<li>Data, Simulation &amp; Synthetic Data Factory</li>\n</ul>\n<p>Define product requirements for synthetic data generation, simulation-integrated data flywheels, and automated scenario generation. Partner with Digital Twin, Simulation, and autonomy teams to convert natural-language mission inputs into data needs, training procedures, and model variants.</p>\n<ul>\n<li>Safe Deployment &amp; Model Governance</li>\n</ul>\n<p>Lead the development of model governance and auditability tooling, including model cards, dataset rights, lineage tracking, safety gates, and compliance evidence. Build guardrails and workflows to safely deploy models onto edge hardware in disconnected, GPS- or comms-denied environments. 
Partner with Safety, Certification, Cyber, and Engineering teams to ensure traceability and evaluation pipelines meet operational and accreditation requirements.</p>\n<ul>\n<li>Edge Deployment &amp; AI Factory Integration</li>\n</ul>\n<p>Partner with Pilot, EdgeOS, and hardware teams to integrate foundation-model-based perception and reasoning into autonomy behaviors. Define requirements for distillation, quantization, and inference tooling as part of the “three-computer” development and deployment model. Ensure closed-loop workflows between cloud model training and edge-native execution.</p>\n<ul>\n<li>Cross-Functional Leadership</li>\n</ul>\n<p>Collaborate with Engineering, Research, Product, Customer Engagement, and Solutions teams to ensure model outputs meet mission and platform constraints. Translate advanced AI capabilities into intuitive workflows that platform OEMs and partner nations can use to build sovereign AI factories. Sequence foundational capabilities that unblock autonomy, simulation, and customer-facing product teams.</p>\n<ul>\n<li>User &amp; Customer Impact</li>\n</ul>\n<p>Develop deep empathy for ML engineers, autonomy developers, and Solutions engineers who rely on the platform. Capture operational data gaps, mission-driven model needs, and domain-specific specialization requirements. 
Lead demos and onboarding for model-development capabilities across internal and external teams.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d5b743bb-d8f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Shield AI","sameAs":"https://www.shield.ai","logo":"https://logos.yubhub.co/shield.ai.png"},"x-apply-url":"https://jobs.lever.co/shieldai/7886f437-2d5e-4616-8dcb-3dc488f1f585","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$190,000 - $290,000 a year","x-skills-required":["AI Model Development & Training Platform","Data, Simulation & Synthetic Data Factory","Safe Deployment & Model Governance","Edge Deployment & AI Factory Integration","Cross-Functional Leadership","User & Customer Impact","Strong engineering background","Deep understanding of foundation models, robotics models, multimodal models, MLOps, and training infrastructure","Experience managing complex products spanning data pipelines, cloud training clusters, model governance, and edge deployments","Proven success partnering with research teams to transition ML innovations into stable, production-grade workflows"],"x-skills-preferred":["Experience working on autonomy, robotics, embedded AI, or mission-critical systems","Hands-on familiarity with GPU infrastructure, distributed training, or data lakehouse architectures","Experience supporting defense, dual-use, or safety-critical AI systems","Background designing or operating AI Factory–style pipelines (data → training → evaluation → distillation → edge deployment)","Advanced degree in engineering, ML/AI, robotics, or a related field"],"datePosted":"2026-04-17T13:02:54.419Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San 
Diego"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"AI Model Development & Training Platform, Data, Simulation & Synthetic Data Factory, Safe Deployment & Model Governance, Edge Deployment & AI Factory Integration, Cross-Functional Leadership, User & Customer Impact, Strong engineering background, Deep understanding of foundation models, robotics models, multimodal models, MLOps, and training infrastructure, Experience managing complex products spanning data pipelines, cloud training clusters, model governance, and edge deployments, Proven success partnering with research teams to transition ML innovations into stable, production-grade workflows, Experience working on autonomy, robotics, embedded AI, or mission-critical systems, Hands-on familiarity with GPU infrastructure, distributed training, or data lakehouse architectures, Experience supporting defense, dual-use, or safety-critical AI systems, Background designing or operating AI Factory–style pipelines (data → training → evaluation → distillation → edge deployment), Advanced degree in engineering, ML/AI, robotics, or a related field","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":190000,"maxValue":290000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_756d7161-c40"},"title":"Graphics Programmer","description":"<p>We are seeking a talented Graphics Programmer to join our team in building immersive web applications. 
You will be responsible for leveraging modern WebGL technologies to create engaging 3D experiences that push the boundaries of what&#39;s possible in the browser.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Develop and optimize 3D graphics systems for web applications using ThreeJS.</li>\n<li>Implement advanced rendering techniques to achieve high visual quality while maintaining performance.</li>\n<li>Design and build reusable graphics components and shader systems.</li>\n<li>Collaborate with designers and other developers to turn visual concepts into technical implementations.</li>\n<li>Debug and optimize rendering performance across different browsers and devices.</li>\n<li>Stay current with emerging graphics technologies and best practices in the web space.</li>\n</ul>\n<p>Qualifications:</p>\n<ul>\n<li>Strong programming skills (Javascript, Typescript, etc.) with experience building production web applications.</li>\n<li>Experience with ThreeJS, WebGL, or other web-based graphics libraries.</li>\n<li>Understanding of fundamental computer graphics concepts (rendering pipelines, shaders, 3D math).</li>\n<li>Experience with GLSL shader programming and graphics optimizations.</li>\n<li>Knowledge of modern web standards and browser capabilities.</li>\n<li>Ability to balance visual quality with performance constraints.</li>\n<li>Ability to balance known/reliable solutions with tracking new technologies to improve usability.</li>\n<li>Familiarity with web-based mapping SDKs (Mapbox, MapLibre, etc.).</li>\n<li>Familiarity with additional 3D frameworks (Babylon.js, PlayCanvas, etc.).</li>\n<li>Background in OpenGL, DirectX, or other graphics APIs is a plus.</li>\n<li>Experience with WebGPU for next-generation web graphics.</li>\n<li>Knowledge of physics simulations or procedural content generation.</li>\n<li>Familiarity with asset pipelines and 3D modeling tools.</li>\n<li>Previous work with AR/VR web applications is a plus.</li>\n</ul>\n<p>Physical Demands:</p>\n<ul>\n<li>Prolonged periods of sitting at a desk and working on a computer.</li>\n<li>Occasional standing and walking within the office.</li>\n<li>Manual dexterity to operate a computer keyboard, mouse, and other office equipment.</li>\n<li>Visual acuity to read screens, documents, and reports.</li>\n<li>Occasional reaching, bending, or stooping to access file drawers, cabinets, or office supplies.</li>\n<li>Lifting and carrying items up to 20 pounds occasionally (e.g., office supplies, packages).</li>\n</ul>\n<p>Benefits:</p>\n<ul>\n<li>Medical Insurance: Comprehensive health insurance plans covering a range of services.</li>\n<li>Dental and Vision Insurance: Coverage for routine dental check-ups, orthodontics, and vision care.</li>\n<li>Saronic pays 100% of the premium for employees and 80% for dependents.</li>\n<li>Time Off: Generous PTO and Holidays.</li>\n<li>Parental Leave: Paid maternity and paternity leave to support new parents.</li>\n<li>Competitive Salary: Industry-standard salaries with opportunities for performance-based bonuses.</li>\n<li>Retirement Plan: 401(k) plan.</li>\n<li>Stock Options: Equity options to give employees a stake in the company’s success.</li>\n<li>Life and Disability Insurance: Basic life insurance and short- and long-term disability coverage.</li>\n<li>Additional Perks: Free lunch benefit and unlimited free drinks and snacks in the office.</li>\n</ul>\n<p>Additional Information:</p>\n<p>This role requires access to export-controlled information or items that require “U.S. Person” status. As defined by U.S. law, individuals who are any one of the following are considered to be a “U.S. Person”: (1) U.S. citizens, (2) legal permanent residents (a.k.a. green card holders), and (3) certain protected classes of asylees and refugees, as defined in 8 U.S.C. 
1324b(a)(3)</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_756d7161-c40","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Saronic Technologies","sameAs":"https://www.saronictechnologies.com/","logo":"https://logos.yubhub.co/saronictechnologies.com.png"},"x-apply-url":"https://jobs.lever.co/saronic/9441b432-c0ec-44b6-9fa7-53be3cd63db0","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["ThreeJS","WebGL","GLSL","Javascript","Typescript","WebGPU","Physics simulations","Procedural content generation","Asset pipelines","3D modeling tools"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:56:10.012Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Athens"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"ThreeJS, WebGL, GLSL, Javascript, Typescript, WebGPU, Physics simulations, Procedural content generation, Asset pipelines, 3D modeling tools"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ed7f6866-84d"},"title":"Lead Quantum Device Theorist","description":"<p>As a Lead Quantum Device Theorist, you will play a central role in advancing the performance of our superconducting quantum processors.</p>\n<p>This position requires deep expertise in circuit QED, quantum device physics, and noise modeling for quantum error correction (QEC).</p>\n<p>You will work closely with experimental teams to model processor dynamics and lead efforts to improve qubit readout fidelity and quantum gate performance across our R&amp;D platforms.</p>\n<p>This role demands strong cross-functional collaboration with specialists in qubit readout, gate calibration, control systems, and superconducting circuit 
design.</p>\n<p>Together, you will drive innovations that support scalable architectures, quantum advantage, and fault-tolerant error correction.</p>\n<p>Key Responsibilities:</p>\n<ul>\n<li><p>Develop and maintain advanced simulation tools to accurately model noise sources in flux-tunable superconducting qubits</p>\n</li>\n<li><p>Model and analyze entangling gate operations on superconducting quantum processors</p>\n</li>\n<li><p>Apply optimal control techniques to improve quantum gate and readout performance</p>\n</li>\n<li><p>Develop analytical tools to interpret experimental measurements and diagnose performance anomalies</p>\n</li>\n<li><p>Perform detailed error-budget modeling to support quantum error correction (QEC) efforts</p>\n</li>\n<li><p>Collaborate cross-functionally with teams in gate operations, measurement, device design, applications, algorithms, and control engineering</p>\n</li>\n</ul>\n<p>Required Qualifications:</p>\n<ul>\n<li><p>Ph.D. in Physics, Applied Physics, Electrical Engineering, or a related field, plus 5+ years of relevant work experience</p>\n</li>\n<li><p>Experience modeling noise in large-scale processors and informing Hamiltonian designs</p>\n</li>\n<li><p>Experience simulating open quantum systems</p>\n</li>\n<li><p>Experience collaborating with experimentalists on readout and noise characterization, analyzing and interpreting experimental data, and predicting anomalies</p>\n</li>\n<li><p>Background in gate-based quantum computing or superconducting circuits, either academically or in industry</p>\n</li>\n<li><p>Demonstrated depth and breadth in circuit QED physics, including Hamiltonian modeling, dispersive readout theory, and multi-qubit coupling architectures</p>\n</li>\n<li><p>Proven expertise in noise modeling for quantum error correction, including coherent and incoherent error channels, leakage, crosstalk, and correlated noise</p>\n</li>\n<li><p>Strong programming skills in Python for scientific 
applications</p>\n</li>\n</ul>\n<p>Preferred Qualifications:</p>\n<ul>\n<li><p>Experience with optimal control theory applied to superconducting qubits</p>\n</li>\n<li><p>Familiarity with quantum error correction codes and fault-tolerant architectures</p>\n</li>\n<li><p>Track record of publications in relevant peer-reviewed journals</p>\n</li>\n<li><p>Experience with high-performance computing or GPU-accelerated simulations</p>\n</li>\n<li><p>Proficiency with scientific computing libraries such as QuTiP or Stim</p>\n</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ed7f6866-84d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Rigetti Computing","sameAs":"https://www.rigetti.com","logo":"https://logos.yubhub.co/rigetti.com.png"},"x-apply-url":"https://jobs.lever.co/rigetti/ff81d9dd-c93b-47d9-ac8f-fd04f60f1bd2","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["circuit QED","quantum device physics","noise modeling","Python","optimal control theory","quantum error correction"],"x-skills-preferred":["quantum error correction codes","high-performance computing","GPU-accelerated simulations","scientific computing libraries"],"datePosted":"2026-04-17T12:54:44.321Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Berkeley"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"circuit QED, quantum device physics, noise modeling, Python, optimal control theory, quantum error correction, quantum error correction codes, high-performance computing, GPU-accelerated simulations, scientific computing libraries"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_e308ff1b-d8b"},"title":"Software 
Engineer, DevOps, Research Platform","description":"<p>About Mistral AI\\n\\nAt Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.\\n\\nWe are a team passionate about AI and its potential to transform society. Our diverse workforce thrives in competitive environments and is committed to driving innovation.\\n\\nRole Summary\\n\\nWe are seeking a talented and experienced software engineer to join our Research Platform team. You&#39;ll work closely with our R&amp;D team to build a cloud agnostic platform that improves the stability, scalability and velocity across the research department.\\n\\nResponsibilities\\n\\nAs a DevOps/Platform Engineer, your responsibilities will include:\\n\\n* Designing and implementing complex systems (e.g. scale our research CI with a strong focus toward reliability, reproducibility and speed)\\n\\n* Building flexible yet solid and accessible development environment for researchers, so they can focus on core mission.\\n\\n* Designing, implementing and advocating for solutions addressing large amounts of data and maintainable data pipelines.\\n\\n* Optimizing a variety of builds: container images, large libraries compilation times, python environments...\\n\\n* Building strong relationships with researchers, understanding their workflow and enabling them to achieve more by leveraging your expertise.\\n\\n* Communicating and producing documentation or any content that will help them to make the most out of the tools and systems you&#39;ll build.\\n\\n* Being part of the team that &quot;platformizes&quot; research and constantly improve the daily experience for researchers while avoiding future roadblocks.\\n\\nAbout You\\n\\n* 5+ years of successful experience in a similar DX / DevOps / SRE role.\\n\\n* Proficiency in software development (Python, Go...) 
and programming best practices.\\n\\n* Exposure to site reliability engineering: root cause analysis, in-production troubleshooting, on-call rotations...\\n\\n* Exposure to infrastructure management: CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability...\\n\\n* Technical product mindset (e.g. understanding how to debug poor adoption).\\n\\n* Excellent problem-solving and communication skills (ability to contextualize, gauge risks, and get buy-in for high-stakes and impactful solutions).\\n\\n* Ownership, high agency and constantly seeking to learn and improve things for others.\\n\\n* Autonomous, self-driven and able to work well in a fast-paced startup environment.\\n\\n* Low ego and team spirit mindset.\\n\\nYour Application Will Be All The More Interesting If You Also Have:\\n\\n* First-hand Bazel (or equivalent) experience.\\n\\n* Strong knowledge of Python&#39;s ecosystem.\\n\\n* Familiarity with GPU-based workloads and ecosystems.\\n\\n* Experience of full remote environments (you&#39;re comfortable with having some of your users on the other side of the globe).\\n\\nHiring Process\\n\\n* Intro Call - 30 min\\n\\n* Tech Culture Interview - 30 min\\n\\n* Technical Rounds - 2 x 45 min\\n\\n* Culture-fit Discussion - 30 min\\n\\n* Reference Calls\\n\\nBy Applying, You Agree To Our Applicant Privacy Policy.\\n\\nAdditional Information\\n\\nLocation &amp; Remote\\n\\nThis role is primarily based at one of our European offices (Paris, France and London, UK). We will prioritize candidates who either reside there or are open to relocating. We strongly believe in the value of in-person collaboration to foster strong relationships and seamless communication within our team. In certain specific situations, we will also consider remote candidates based in one of the countries listed in this job posting, currently France &amp; UK. 
In that case, we ask all new hires to visit our local office:\\n\\n* for the first week of their onboarding (accommodation and travelling covered)\\n\\n* then at least 3 days per month\\n\\nWhat We Offer\\n\\n* Competitive salary and equity\\n\\n* Health insurance\\n\\n* Transportation allowance\\n\\n* Sport allowance\\n\\n* Meal vouchers\\n\\n* Private pension plan\\n\\n* Parental: Generous parental leave policy\\n\\n* Visa sponsorship\\n\\nBy Applying, You Agree To Our Applicant Privacy Policy.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_e308ff1b-d8b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/18be2b70-c05d-48e4-82ac-e5cb462c96c0","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["software development","python","go","site reliability engineering","infrastructure management","CI/CD","containerization","orchestration","infra-as-code","monitoring","logging","alerting","observability"],"x-skills-preferred":["bazel","python's ecosystem","gpu based workloads","full remote environments"],"datePosted":"2026-04-17T12:48:20.869Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software development, python, go, site reliability engineering, infrastructure management, CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability, bazel, python's ecosystem, gpu based workloads, full remote 
environments"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9e926934-312"},"title":"Applied Scientist / Research Engineer (Internship)","description":"<p>About Mistral AI</p>\n<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.</p>\n<p>We are a global company with teams distributed between France, USA, UK, Germany, and Singapore. We offer a comprehensive AI platform that meets enterprise needs, whether on-premises or in cloud environments. Our offerings include le Chat, the AI assistant for life and work.</p>\n<p>Role Summary</p>\n<p>Mistral AI is seeking Applied Scientists Interns and Research Engineers Interns to drive innovative research and collaborate with clients on complex research projects. You will develop SOTA models across different modalities such as text, image, and speech. By developing novel methods and research ideas, you will apply these models across a diverse set of use cases and domains.</p>\n<p>Responsibilities</p>\n<p>• Run pre-training, post-training, and deploy state-of-the-art models on clusters with thousands of GPUs.\n• Generate and curate data for pre-training and post-training, working on evaluations and making sure the model&#39;s performance beats expectations.\n• Develop the necessary tools and frameworks to facilitate data generation, model training, evaluation, and deployment.\n• Collaborate with cross-functional teams to tackle complex use cases using agents and RAG pipelines.\n• Manage research projects and communications with client research teams.</p>\n<p>About You</p>\n<p>• You are fluent in English, and have excellent communication skills. 
You are at ease explaining complex technical concepts to both technical and non-technical audiences.\n• You&#39;re an expert with PyTorch or JAX.\n• You&#39;re not afraid of contributing to a big codebase and can find yourself around independently with little guidance.\n• You write clean, readable, high-performance, fault-tolerant Python code.\n• You don&#39;t need roadmaps: you just do. You don&#39;t need a manager: you just ship.\n• Low-ego, collaborative, and eager to learn.\n• You have a track record of success through personal projects, professional projects, or in academia.</p>\n<p>Benefits</p>\n<p>• Competitive salary\n• Food: Daily lunch vouchers\n• Sport: Monthly contribution to a Gympass subscription\n• Transportation: Monthly contribution to a mobility pass</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_9e926934-312","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/426ef8c0-eb26-4004-a690-f33c62b445a7","x-work-arrangement":"onsite","x-experience-level":"entry","x-job-type":"internship","x-salary-range":null,"x-skills-required":["PyTorch","JAX","Python","GPU","data generation","model training","evaluation","deployment"],"x-skills-preferred":["agents","multi-modality","robotics","diffusion models","time-series analysis"],"datePosted":"2026-04-17T12:47:54.108Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"INTERN","occupationalCategory":"Engineering","industry":"Technology","skills":"PyTorch, JAX, Python, GPU, data generation, model training, evaluation, deployment, agents, multi-modality, robotics, diffusion models, time-series 
analysis"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_6f25b435-69f"},"title":"Technical Support Engineer – On-Premise","description":"<p>We are seeking a Technical Support Engineer - On-Premise Infrastructure to join our Support team in France. This role is ideal for someone who excels at technical troubleshooting, incident investigation, and customer communication in a B2B environment.</p>\n<p>As a key member of the support team, you will be responsible for handling escalated technical issues from on-premise enterprise clients, reproducing complex problems, and collaborating with engineering, data, and product teams to ensure swift resolution. You will report directly to the Head of Support, and play a critical role in maintaining customer satisfaction and improving our support operations.</p>\n<p>Key Responsibilities:</p>\n<ul>\n<li>Frontline Investigation: Handle escalated tickets from enterprise clients via Intercom, focusing on on-premise infrastructure and AI-related issues (e.g., deployment, performance, integration, security).</li>\n<li>Root Cause Analysis: Ask the right questions to gather context, reproduce issues in test environments, and diagnose technical problems (systems, networks, storage, GPU clusters, AI models).</li>\n<li>Cross-Team Collaboration: Work closely with engineering, and deployment teams to escalate, track, and resolve incidents efficiently.</li>\n<li>Proactive Communication: Provide clear, empathetic, and timely updates to clients and internal stakeholders, ensuring transparency throughout the resolution process.</li>\n</ul>\n<p>Knowledge Sharing &amp; Process Improvement:</p>\n<ul>\n<li>Documentation: Create and update technical FAQs, troubleshooting guides, and internal knowledge base articles to empower self-serve/L1 team and reduce recurrence of issues.</li>\n<li>Feedback Loop: Identify recurring pain points in on-premise deployments and suggest 
improvements to product, documentation, or support workflows.</li>\n</ul>\n<p>Customer-Centric Approach:</p>\n<ul>\n<li>Empathy &amp; Ownership: Maintain a customer-first mindset, ensuring clients feel heard and supported, even in high-pressure situations.</li>\n<li>Solution-Oriented: Proactively propose workarounds, fixes, or process optimizations to enhance the customer experience and reduce incident resolution time.</li>\n</ul>\n<p>Technical Expertise:</p>\n<ul>\n<li>On-Premise &amp; Cloud Environments: Deep understanding of Linux/Windows servers, networking, virtualization, storage, security (firewalls, RGPD compliance), and cloud providers (AWS, GCP, Azure).</li>\n<li>Kubernetes/Helm: Experience with deployment, scaling, and troubleshooting of applications in Kubernetes clusters using Helm charts.</li>\n<li>Terraform: Familiarity with Infrastructure as Code (IaC) for managing cloud resources is a strong plus.</li>\n<li>AI Infrastructure: Knowledge of AI/ML pipelines, LLM/RAG deployments, GPU acceleration, and data storage solutions for enterprise clients.</li>\n<li>Tooling: Proficiency in Intercom, monitoring tools, scripting (Bash/Python), and diagnostic utilities (logs, performance metrics).</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_6f25b435-69f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai/careers","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/f00a13aa-61f1-4c56-993c-20846adc2b15","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Linux/Windows servers","Networking","Virtualization","Storage","Security","Kubernetes/Helm","Terraform","AI/ML pipelines","LLM/RAG deployments","GPU acceleration","Data storage 
solutions","Intercom","Monitoring tools","Scripting (Bash/Python)","Diagnostic utilities"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:47:50.345Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Linux/Windows servers, Networking, Virtualization, Storage, Security, Kubernetes/Helm, Terraform, AI/ML pipelines, LLM/RAG deployments, GPU acceleration, Data storage solutions, Intercom, Monitoring tools, Scripting (Bash/Python), Diagnostic utilities"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_81c1220e-c71"},"title":"Applied Scientist / Research Engineer","description":"<p>About the Job</p>\n<p>Mistral AI is seeking Applied Scientists and Research Engineers to drive innovative research and collaborate with clients on complex research projects. You will develop SOTA models across different modalities such as text, image, and speech. By developing novel methods and research ideas you will apply these models across a diverse set of use cases and domains.</p>\n<p>Responsibilities</p>\n<p>• Run pre-training, post-training and deploy state of the art models on clusters with thousands of GPUs. You don’t panic when you see OOM errors or when NCCL feels like not wanting to talk.\n• Generate and curate data for pre-training and post-training, working on evaluations and making sure the model’s performance beats expectations.\n• Develop the necessary tools and frameworks to facilitate data generation, model training, evaluation and deployment.\n• Collaborate with cross-functional teams to tackle complex use cases using agents and RAG pipelines.\n• Manage research projects and communications with client research teams.</p>\n<p>About You</p>\n<p>• You are fluent in English, and have excellent communication skills. 
You are at ease explaining complex technical concepts to both technical and non-technical audiences.\n• You’re an expert with PyTorch or JAX.\n• You’re not afraid of contributing to a big codebase and can find yourself around independently with little guidance.\n• You write clean, readable, high-performance, fault-tolerant Python code.\n• You don’t need roadmaps: you just do. You don’t need a manager: you just ship.\n• Low-ego, collaborative and eager to learn.\n• You have a track record of success through personal projects, professional projects or in academia.</p>\n<p>Benefits</p>\n<p>• Competitive cash salary and equity\n• Health Insurance\n• Sport: $90 for gym membership allowance\n• Food: $200 monthly allowance for meals (solution might evolve as we grow bigger)\n• Transportation: $120/month for public transport or Parking charges reimbursed</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_81c1220e-c71","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/c41d9d9e-f0ea-4621-a4a9-3f10dfa9ae84","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["PyTorch","JAX","Python","GPU","OOM","NCCL","Agents","RAG pipelines"],"x-skills-preferred":["PhD","Master","Mathematics","Physics","Machine Learning"],"datePosted":"2026-04-17T12:47:44.697Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Singapore"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"PyTorch, JAX, Python, GPU, OOM, NCCL, Agents, RAG pipelines, PhD, Master, Mathematics, Physics, Machine 
Learning"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b7bde4cf-9c8"},"title":"Datacenter Hardware Engineer, HPC","description":"<p>About Mistral</p>\n<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.</p>\n<p>Our compute footprint is growing fast to support our science and engineering teams. We’re hiring a Datacenter HW Engineer to maintain, troubleshoot, and scale our GPU/CPU clusters safely and reliably.</p>\n<p>You’ll execute hands-on hardware work in our Paris-area datacenter and partner with hardware owners, DC operations, and vendors to keep one of France’s largest GPU clusters healthy.</p>\n<p>Location: Bruyères-le-Châtel, on-site, field role</p>\n<p>Reporting line: Hardware Ops</p>\n<p>Impact</p>\n<p>• Compute is a key lever for Mistral’s success and our largest spend item.\n• Direct impact on scale: your work keeps one of France’s largest AI clusters healthy as we grow to unprecedented scale.\n• Enable breakthrough AI: you unlock our science &amp; engineering teams to deliver groundbreaking AI solutions.</p>\n<p>Responsibilities</p>\n<p>• Diagnose &amp; operate core server/cluster components - Investigate and handle compute/storage hardware issues (CPU, memory, drives, NICs, GPUs, PSUs) and interconnect problems (switches, cables, transceivers; Ethernet/InfiniBand).\n• Safety &amp; procedures - Apply lockout/tagout (LOTO) and ESD discipline; follow pre/post-work checklists; maintain tidy, safe work areas.\n• First-line diagnostics - Triage using LEDs, POST, beep codes and basic tests; capture evidence (photos, serials, results); open/update/close tickets with clear notes.\n• Preventive maintenance - Provide feedback and ideas to improve proactive activities, monitoring, and targeted follow-ups on recurring or specific anomalies; help turn 
ad-hoc checks into SOPs, alerts, and dashboards.\n• Parts &amp; logistics - Receive and track parts, keep labeled inventory accurate, manage simple RMAs, and coordinate with vendors.\n• Collaboration &amp; escalation - Partner with senior hardware/firmware owners on complex or multi-node issues; communicate status and next steps crisply.\n• Documentation &amp; quality - Keep SOPs/checklists current; ensure zero undocumented changes and consistent, audit-ready records.</p>\n<p>About you</p>\n<p>• Hands-on mindset in datacenters/server hardware: you can install/re-seat/swap GPU/PCIe cards, NICs, PSUs, drives, and work cleanly in racks (rails, cabling, labeling).\n• Disciplined and meticulous: follows checklists, ESD/LOTO; no rough handling; careful with all high-value server components.\n• Practical electrical basics: power-off, PPE, short-circuit risk awareness.\n• Comfortable in racks: cooling, network, storage, PDU, cable management; can lift/mount safely (within HSE limits).\n• Clear communicator: short factual updates; reliable teammate; punctual and process-minded.\n• Hardware-passionate, professionally grounded: strong curiosity and craft mindset.</p>\n<p>Nice to have</p>\n<p>• HPC/AI/Cloud at scale experience (production environments), large-fleet/server install &amp; maintenance in datacenters.\n• Basic networking (Ethernet/InfiniBand) and basic Linux (boot/check; no coding needed).\n• Coding/automation skills (Python/Bash): small tools/scripts to improve checklists, photo/serial capture, inventory sync, or simple monitoring/reporting.\n• Experience with inventory/RMA tools and vendor coordination.\n• Exposure to HPC/research/industrial environments.</p>\n<p>What we offer</p>\n<p>💰 Competitive salary and equity package</p>\n<p>🧑‍⚕️ Health insurance</p>\n<p>🚴 Transportation allowance</p>\n<p>🥎 Sport allowance</p>\n<p>🥕 Meal vouchers</p>\n<p>💰 Private pension plan</p>\n<p>🍼 Generous parental leave policy</p>\n<p 
style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b7bde4cf-9c8","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai/careers","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/ddf7bcbb-e223-4768-a553-6e95df472cf7","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["GPU/CPU clusters","server hardware","Linux fundamentals","scripting","electrical basics","networking","inventory management"],"x-skills-preferred":["HPC/AI/Cloud at scale experience","basic Linux","coding/automation skills","inventory/RMA tools","vendor coordination"],"datePosted":"2026-04-17T12:47:08.660Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"GPU/CPU clusters, server hardware, Linux fundamentals, scripting, electrical basics, networking, inventory management, HPC/AI/Cloud at scale experience, basic Linux, coding/automation skills, inventory/RMA tools, vendor coordination"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c569b4c8-3f2"},"title":"Applied Scientist / Research Engineer - EMEA","description":"<p>About Mistral AI</p>\n<p>Mistral AI is a pioneering company shaping the future of AI. We believe in the power of AI to simplify tasks, save time, and enhance learning and creativity.</p>\n<p>About The Job</p>\n<p>We are seeking Applied Scientists and Research Engineers to drive innovative research and collaborate with clients on complex research projects. You will develop SOTA models across different modalities such as text, image, and speech. 
By developing novel methods and research ideas you will apply these models across a diverse set of use cases and domains.</p>\n<p>Responsibilities</p>\n<p>• Run pre-training, post-training and deploy state of the art models on clusters with thousands of GPUs. You don’t panic when you see OOM errors or when NCCL feels like not wanting to talk.\n• Generate and curate data for pre-training and post-training, working on evaluations and making sure the model’s performance beats expectations.\n• Develop the necessary tools and frameworks to facilitate data generation, model training, evaluation and deployment.\n• Collaborate with cross-functional teams to tackle complex use cases using agents and RAG pipelines.\n• Manage research projects and communications with client research teams.</p>\n<p>About You</p>\n<p>• You are fluent in English, and have excellent communication skills. You are at ease explaining complex technical concepts to both technical and non-technical audiences.\n• You’re an expert with PyTorch or JAX.\n• You’re not afraid of contributing to a big codebase and can find yourself around independently with little guidance.\n• You write clean, readable, high-performance, fault-tolerant Python code.\n• You don’t need roadmaps: you just do. You don’t need a manager: you just ship.\n• Low-ego, collaborative and eager to learn.\n• You have a track record of success through personal projects, professional projects or in academia.</p>\n<p>Benefits</p>\n<p>We have local offices in Paris, London, Marseille, Singapore and Palo Alto. 
France:\n• Competitive cash salary and equity\n• Food: Daily lunch vouchers\n• Sport: Monthly contribution to a Gympass subscription\n• Transportation: Monthly contribution to a mobility pass\n• Health: Full health insurance for you and your family\n• Parental: Generous parental leave policy\n• Visa sponsorship</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_c569b4c8-3f2","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai/careers","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/b7ae8fc4-5779-4ad2-8f5b-632b4d9498cf","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["PyTorch","JAX","Python","GPU","OOM errors","NCCL"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:47:06.801Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"PyTorch, JAX, Python, GPU, OOM errors, NCCL"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_2bc207d0-89b"},"title":"Senior Machine Learning Engineer","description":"<p>We are seeking a Senior Machine Learning Research Engineer to join the Machine Learning Science (MLS) team, within the Computational Science department. The ideal candidate has a strong knowledge in designing and building deep learning (DL) pipelines, and expertise in creating reliable, scalable artificial intelligence/machine learning (AI/ML) systems in a cloud environment.</p>\n<p>The MLS team at Freenome develops DL models using massive-scale genomic data that presents significant challenges for current training paradigms. 
The Senior Machine Learning Research Engineer will primarily be responsible for developing and deploying the infrastructure needed to support development of such DL models: enabling distributed DL pipelines, optimising hardware utilisation for efficient training, and performing model optimisations.</p>\n<p>As part of an interdisciplinary R&amp;D team, they will work in close collaboration with machine learning scientists, computational biologists and software engineers to accelerate the development of state-of-the-art ML/AI models and help Freenome achieve its mission.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Implementing and refining DL pipelines on distributed computing platforms to enhance the speed and efficiency of DL operations, including model training, data handling, model management, and inference.</li>\n<li>Collaborating closely with ML scientists and software engineers to understand current challenges and requirements and ensure that the DL model development pipelines created are perfectly aligned with scientific goals and operational needs.</li>\n<li>Continuously monitoring, evaluating, and optimising DL model training pipelines for performance and scalability.</li>\n<li>Staying up to date with the latest advancements in AI, ML, and related technologies, and quickly learning and adapting new tools and frameworks, if necessary.</li>\n<li>Developing and maintaining robust and reproducible DL pipelines that guarantee that DL pipelines can be reliably executed, maintaining consistency and accuracy of results.</li>\n<li>Driving performance improvements across our stack through profiling, optimisation, and benchmarking. 
Implementing efficient caching solutions and debugging distributed systems to accelerate both training and evaluation pipelines.</li>\n<li>Acting as a bridge facilitating communication between the engineering and scientific teams, documenting and sharing best practices to foster a culture of learning and continuous improvement.</li>\n</ul>\n<p>Must-haves include:</p>\n<ul>\n<li>MS or equivalent experience in a relevant, quantitative field such as Computer Science, Statistics, Mathematics, Software Engineering, with an emphasis on AI/ML theory and/or practical development.</li>\n<li>5+ years of post-MS industry experience working on developing AI/ML software engineering pipelines.</li>\n<li>Proficiency in a general-purpose programming language: Python (preferred), Java, Julia, C, C++, etc.</li>\n<li>Strong knowledge of ML and DL fundamentals and hands-on experience with machine learning frameworks such as PyTorch, TensorFlow, Jax or Scikit-learn.</li>\n<li>In-depth knowledge of scalable and distributed computing platforms that support complex model training (such as Ray or DeepSpeed) and their integration with ML developer tools like TensorBoard, Wandb, or MLflow.</li>\n<li>Experience with cloud platforms (e.g., AWS, Google Cloud, Azure) and how to deploy and manage AI/ML models and pipelines in a cloud environment.</li>\n<li>Understanding of containerisation technologies (e.g., Docker) and computing resource orchestration tools (e.g., Kubernetes) for deploying scalable ML/AI solutions.</li>\n<li>Proven track record of developing and optimising workflows for training DL models, large language models (LLMs), or similar for problems with high data complexity and volume.</li>\n<li>Experience managing large datasets, including data storage (such as HDFS or Parquet on S3), retrieval, and efficient data processing techniques (via libraries and executors such as PyArrow and Spark).</li>\n<li>Proficiency in version control systems (e.g., Git) and continuous 
integration/continuous deployment (CI/CD) practices to maintain code quality and automate development workflows.</li>\n<li>Expertise in building and launching large-scale ML frameworks in a scientific environment that supports the needs of a research team.</li>\n<li>Excellent ability to work effectively with cross-functional teams and communicate across disciplines.</li>\n</ul>\n<p>Nice-to-haves include:</p>\n<ul>\n<li>Experience working with large-scale genomics or biological datasets.</li>\n<li>Experience managing multimodal datasets, such as combinations of sequence, text, image, and other data.</li>\n<li>Experience with GPU/Accelerator programming and kernel development (such as CUDA, Triton or XLA).</li>\n<li>Experience with infrastructure-as-code and configuration management.</li>\n<li>Experience cultivating MLOps and ML infrastructure best practices, especially around reliability, provisioning and monitoring.</li>\n<li>Strong track record of contributions to relevant DL projects, e.g. on GitHub.</li>\n</ul>\n<p>The US target range of our base salary for new hires is $161,925 - $227,325. You will also be eligible to receive equity, cash bonuses, and a full range of medical, financial, and other benefits depending on the position offered.</p>\n<p>Freenome is proud to be an equal-opportunity employer, and we value diversity. 
Freenome does not discriminate on the basis of race, colour, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_2bc207d0-89b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Freenome","sameAs":"https://freenome.com/","logo":"https://logos.yubhub.co/freenome.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/freenome/jobs/8013673002","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$161,925 - $227,325","x-skills-required":["Python","Java","Julia","C","C++","PyTorch","TensorFlow","Jax","Scikit-learn","Ray","DeepSpeed","TensorBoard","Wandb","MLflow","AWS","Google Cloud","Azure","Docker","Kubernetes","Git","Continuous Integration/Continuous Deployment"],"x-skills-preferred":["Large-scale genomics or biological datasets","Multimodal datasets","GPU/Accelerator programming and kernel development","Infrastructure-as-code and configuration management","MLOps and ML infrastructure best practices"],"datePosted":"2026-04-17T12:35:01.240Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Brisbane, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Java, Julia, C, C++, PyTorch, TensorFlow, Jax, Scikit-learn, Ray, DeepSpeed, TensorBoard, Wandb, MLflow, AWS, Google Cloud, Azure, Docker, Kubernetes, Git, Continuous Integration/Continuous Deployment, Large-scale genomics or biological datasets, Multimodal datasets, GPU/Accelerator programming and kernel development, Infrastructure-as-code and 
configuration management, MLOps and ML infrastructure best practices","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":161925,"maxValue":227325,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_eeeb517e-3c5"},"title":"Staff Security Engineer, Infrastructure","description":"<p>We&#39;re looking for a Staff Security Engineer, Infrastructure to secure the core systems that power our platform: GPU compute, multi-cloud environments, networking, and data pipelines. You&#39;ll operate across the full stack, from cloud and Kubernetes to identity, networking, and secrets, designing and implementing security controls that scale with a high-performance AI platform.</p>\n<p>This role is highly hands-on and systems-oriented, sitting at the intersection of security, infrastructure, and distributed systems.</p>\n<p>Your primary responsibilities will be to:</p>\n<ul>\n<li>Build and harden infrastructure security by designing and implementing security controls across cloud infrastructure, Kubernetes and containerized workloads, networking, service meshes, and edge systems, CI/CD pipelines and deployment systems, and secure compute environments for GPU workloads and model execution.</li>\n<li>Implement identity, secrets, and access controls, including machine identity and workload authentication, secrets management and encryption, least-privilege access, and short-lived credentials.</li>\n<li>Protect model weights, inference endpoints, and customer data, design secure data access pathways and isolation mechanisms, and ensure safe multi-tenant execution environments.</li>\n<li>Automate security guardrails directly into infrastructure and CI/CD, use Infrastructure-as-Code to enforce secure defaults, and continuously identify and remediate security gaps through automation.</li>\n<li>Identify and mitigate risks across infrastructure layers, 
defend against both external attackers and insider threats, and drive projects like network isolation, encryption, and secure service communication.</li>\n</ul>\n<p>To succeed in this role, you&#39;ll need to have:</p>\n<ul>\n<li>8+ years in security engineering, infrastructure, or SRE.</li>\n<li>Strong understanding of cloud security, networking fundamentals, Linux systems, and container security.</li>\n<li>Experience building or securing production infrastructure at scale.</li>\n<li>Deep knowledge of authentication and authorization systems, secrets management and cryptography basics, common vulnerabilities and attack vectors, and ability to design security controls across multiple layers.</li>\n<li>Proficiency in at least one language, experience with Infrastructure-as-Code, and strong automation mindset.</li>\n</ul>\n<p>Nice to have experience with GPU infrastructure, multi-tenant platform isolation, service mesh architectures, and high-growth startup environments.</p>\n<p>What makes this role unique is that you&#39;ll work on cutting-edge AI infrastructure security, secure GPU clusters, model execution, and real-time inference systems, have high ownership, and direct impact on developer trust and platform reliability.</p>\n<p>Our security philosophy is to enable developers, automate everything, assume breach, and design for resilience.</p>\n<p>In terms of compensation and benefits, we offer competitive salary, equity, full health, dental, and vision coverage, and opportunity to work on frontier AI infrastructure.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_eeeb517e-3c5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"fal.ai","sameAs":"https://fal.ai","logo":"https://logos.yubhub.co/fal.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/fal/jobs/4200560009","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["cloud security","networking fundamentals","Linux systems","container security","Infrastructure-as-Code","authentication and authorization systems","secrets management and cryptography basics","common vulnerabilities and attack vectors"],"x-skills-preferred":["GPU infrastructure","multi-tenant platform isolation","service mesh architectures","high-growth startup environments"],"datePosted":"2026-04-17T12:32:36.163Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud security, networking fundamentals, Linux systems, container security, Infrastructure-as-Code, authentication and authorization systems, secrets management and cryptography basics, common vulnerabilities and attack vectors, GPU infrastructure, multi-tenant platform isolation, service mesh architectures, high-growth startup environments"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_690339e7-e86"},"title":"Senior Software Engineer, Autonomy - Calibration, Mapping & Localization","description":"<p>About Cyngn</p>\n<p>Based in Mountain View, CA, Cyngn is a publicly-traded autonomous technology company. 
We deploy self-driving industrial vehicles like forklifts and tuggers to factories, warehouses, and other facilities throughout North America.</p>\n<p>To build this emergent technology, we are looking for innovative, motivated, and experienced leaders to join us and move this field forward. If you like to build, tinker, and create with a team of trusted and passionate colleagues, then Cyngn is the place for you.</p>\n<p>Key reasons to join Cyngn:</p>\n<p>We are small and big. With under 100 employees, Cyngn operates with the energy of a startup. On the other hand, we’re publicly traded. This means our employees not only work in close-knit teams with mentorship from company leaders; they also get access to the liquidity of our publicly-traded equity.</p>\n<p>We build today and deploy tomorrow. Our autonomous vehicles aren’t just test concepts; they’re deployed to real clients right now. That means your work will have a tangible, visible impact.</p>\n<p>We aren’t robots. We just develop them. We’re a welcoming, diverse team of sharp thinkers and kind humans. Collaboration and trust drive our creative environment. At Cyngn, everyone’s perspective matters, and that’s what powers our innovation.</p>\n<p>About this role:</p>\n<p>As a Staff/Senior Software Engineer on our Calibration, Localization, &amp; Mapping (CLAM) team, you will be responsible for delivering mission-critical improvements and new features to our calibration, localization, and mapping subsystems. 
You will work on a small, highly focused team developing production-quality software that enables efficient and accurate creation of HD maps at Cyngn deployment-sites and robust localization for Cyngn’s autonomous vehicle fleets.</p>\n<p>Responsibilities</p>\n<ul>\n<li><p>Design, implement, tune, and test mapping, localization, and sensor calibration algorithms for our autonomous vehicle platforms using C++ and Python.</p>\n</li>\n<li><p>Develop tooling and metrics for performance validation and continuous testing frameworks.</p>\n</li>\n<li><p>Balance project tasks, code reviews, and research to meet product-driven milestones in a fast-paced startup environment.</p>\n</li>\n</ul>\n<p>Qualifications</p>\n<ul>\n<li><p>MS/PhD with a focus in robotics or a similar technical field of study</p>\n</li>\n<li><p>Solid foundation in probability theory, linear algebra, 3D geometry, and spatial coordinate transformations.</p>\n</li>\n<li><p>In-depth understanding of matrix factorization algorithms and Lie algebras/groups.</p>\n</li>\n<li><p>Solid theoretical knowledge of state-of-the-art techniques in 3D Lidar-based mapping and localization for autonomous vehicles (LOAM series, GICP, FastLIO, bundle adjustment)</p>\n</li>\n<li><p>Familiarity with state estimation frameworks such as EKFs as well as modern nonlinear optimization libraries (GTSAM, G2O, Ceres-Solver, GNC-Solver, etc.)</p>\n</li>\n<li><p>6+ years of industry experience as an autonomous vehicle or robotics software engineering professional, including hands-on implementation and tuning on production hardware.</p>\n</li>\n<li><p>6+ years of industry experience writing C++ software in a production environment - architecture design, unit testing, code review, algorithm performance trade-offs, etc.</p>\n</li>\n<li><p>Proficiency in Python.</p>\n</li>\n<li><p>Excellent written &amp; verbal communication skills.</p>\n</li>\n</ul>\n<p>Bonus Qualifications</p>\n<ul>\n<li><p>Proven record of top-tier publications or 
patents.</p>\n</li>\n<li><p>Experience with GPU programming, CUDA.</p>\n</li>\n<li><p>Experience in implementing automated map change detection and updating techniques.</p>\n</li>\n<li><p>Experience implementing modern multi-sensor calibration and sensor mis-alignment detection algorithms.</p>\n</li>\n<li><p>Experience with camera-based SLAM and 3D multi-view geometry.</p>\n</li>\n<li><p>Experience working with ROS2 to design, build, and operate robotic systems.</p>\n</li>\n<li><p>Exposure to modern software development version control and project management tools - Git, Jira, etc.</p>\n</li>\n</ul>\n<p>Benefits &amp; Perks</p>\n<ul>\n<li><p>Health benefits (Medical, Dental, Vision, HSA and FSA (Health &amp; Dependent Daycare), Employee Assistance Program, 1:1 Health Concierge)</p>\n</li>\n<li><p>Life, Short-term and long-term disability insurance (Cyngn funds 100% of premiums)</p>\n</li>\n<li><p>Company 401(k)</p>\n</li>\n<li><p>Commuter Benefits</p>\n</li>\n<li><p>Flexible vacation policy</p>\n</li>\n<li><p>Sabbatical leave opportunity after 5 years with the company</p>\n</li>\n<li><p>Paid Parental Leave</p>\n</li>\n<li><p>Daily lunches for in-office employees and fully-stocked kitchen with snacks and beverages</p>\n</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_690339e7-e86","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Cyngn","sameAs":"https://www.cyngn.com/","logo":"https://logos.yubhub.co/cyngn.com.png"},"x-apply-url":"https://jobs.lever.co/cyngn/716dbe41-cac5-4d23-9ec3-cc05b32322b4","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$180,000-198,000 per year","x-skills-required":["C++","Python","Probability theory","Linear algebra","3D geometry","Spatial coordinate transformations","Matrix factorization algorithms","Lie algebra/groups","State 
estimation frameworks","Nonlinear optimization libraries"],"x-skills-preferred":["GPU programming","CUDA","Automated map change detection and updating techniques","Modern multi-sensor calibration and sensor mis-alignment detection algorithms","Camera-based SLAM and 3D multi-view geometry","ROS2","Git","Jira"],"datePosted":"2026-04-17T12:28:37.248Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"C++, Python, Probability theory, Linear algebra, 3D geometry, Spatial coordinate transformations, Matrix factorization algorithms, Lie algebra/groups, State estimation frameworks, Nonlinear optimization libraries, GPU programming, CUDA, Automated map change detection and updating techniques, Modern multi-sensor calibration and sensor mis-alignment detection algorithms, Camera-based SLAM and 3D multi-view geometry, ROS2, Git, Jira","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":198000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_8e582153-6af"},"title":"Senior DevOps Lead - Cloud & Autonomous System","description":"<p>About Cyngn</p>\n<p>Cyngn is a publicly-traded autonomous technology company that deploys self-driving industrial vehicles to factories, warehouses, and other facilities throughout North America.</p>\n<p>We are a small company with under 100 employees, operating with the energy of a startup. However, we&#39;re also publicly traded, which means our employees get access to the liquidity of our publicly-traded equity.</p>\n<p>As a Senior DevOps Lead at Cyngn, you will play a vital role in architecting and managing infrastructure across cloud and autonomous vehicle systems. 
This position combines traditional cloud DevOps leadership with specialized expertise in robotics and autonomous systems infrastructure.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Lead and architect cloud and vehicle infrastructure initiatives across AWS and ROS/Linux environments</li>\n<li>Design and implement scalable solutions for both cloud services and autonomous vehicle systems</li>\n<li>Establish and maintain DevOps best practices, CI/CD pipelines, and infrastructure as code</li>\n<li>Drive observability, monitoring, and incident response strategies</li>\n<li>Optimize performance and cost efficiency of cloud and edge computing resources</li>\n<li>Mentor team members and foster a developer-friendly environment</li>\n<li>Manage on-call rotations and incident response processes</li>\n<li>Architect solutions for processing and storing large-scale vehicle telemetry data</li>\n<li>Lead security initiatives and compliance efforts across infrastructure</li>\n</ul>\n<p>Requirements</p>\n<ul>\n<li>10+ years of relevant DevOps/Infrastructure experience</li>\n<li>Proven track record as a technical lead in platform or infrastructure teams</li>\n<li>Advanced expertise in AWS services, infrastructure as code (Terraform), and Kubernetes</li>\n<li>Strong experience with service mesh (Istio) and Helm/Kustomize</li>\n<li>Deep understanding of ROS/ROS2 and Linux kernel configurations</li>\n<li>Experience with GPU configurations and ML infrastructure</li>\n<li>Expertise in ARM and NVIDIA CUDA platform configurations</li>\n<li>Strong programming skills in Python and shell scripting</li>\n<li>Experience with infrastructure automation (Ansible)</li>\n<li>Expertise in CI/CD tools (Jenkins, GitHub Actions)</li>\n<li>Strong system architecture and design skills</li>\n<li>Excellence in technical documentation</li>\n<li>Outstanding problem-solving abilities</li>\n<li>Strong leadership and mentoring capabilities</li>\n</ul>\n<p>Nice to haves</p>\n<ul>\n<li>Experience with autonomous vehicle 
systems</li>\n<li>Track record of optimizing GPU-based ML infrastructure</li>\n<li>Experience with large-scale IoT deployments</li>\n<li>Contributions to open-source projects</li>\n<li>Experience with real-time systems and low-latency requirements</li>\n<li>Expertise in security implementations including SSO, IdP, and AWS Cognito</li>\n<li>Experience with JFrog artifactory and container registry management</li>\n<li>Proficiency in AWS IoT Greengrass</li>\n<li>Experience with container resource management on edge devices</li>\n<li>Understanding of CPU affinity and priority scheduling</li>\n<li>Track record of implementing cost optimization strategies</li>\n<li>Experience with scaling systems both horizontally and vertically</li>\n</ul>\n<p>Benefits &amp; Perks</p>\n<ul>\n<li>Health benefits (Medical, Dental, Vision, HSA and FSA (Health &amp; Dependent Daycare), Employee Assistance Program, 1:1 Health Concierge)</li>\n<li>Life, Short-term, and long-term disability insurance (Cyngn funds 100% of premiums)</li>\n<li>Company 401(k)</li>\n<li>Commuter Benefits</li>\n<li>Flexible vacation policy</li>\n<li>Sabbatical leave opportunity after five years with the company</li>\n<li>Paid Parental Leave</li>\n<li>Daily lunches for in-office employees</li>\n<li>Monthly meal and tech allowances for remote employees</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_8e582153-6af","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Cyngn","sameAs":"https://www.cyngn.com/","logo":"https://logos.yubhub.co/cyngn.com.png"},"x-apply-url":"https://jobs.lever.co/cyngn/1c31b7d8-cf85-472f-9358-1e10189cf815","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$198,000-225,000 per year","x-skills-required":["AWS services","infrastructure as code (Terraform)","Kubernetes","service mesh 
(Istio)","Helm/Kustomize","ROS/ROS2","Linux kernel configurations","GPU configurations","ML infrastructure","ARM","NVIDIA CUDA platform configurations","Python","shell scripting","infrastructure automation (Ansible)","CI/CD tools (Jenkins, GitHub Actions)","system architecture and design skills","technical documentation","problem-solving abilities","leadership and mentoring capabilities"],"x-skills-preferred":["autonomous vehicle systems","optimizing GPU-based ML infrastructure","large-scale IoT deployments","open-source projects","real-time systems and low-latency requirements","security implementations including SSO, IdP, and AWS Cognito","JFrog artifactory and container registry management","AWS IoT Greengrass","container resource management on edge devices","CPU affinity and priority scheduling","cost optimization strategies","scaling systems both horizontally and vertically"],"datePosted":"2026-04-17T12:27:09.593Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Mountain View"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"AWS services, infrastructure as code (Terraform), Kubernetes, service mesh (Istio), Helm/Kustomize, ROS/ROS2, Linux kernel configurations, GPU configurations, ML infrastructure, ARM, NVIDIA CUDA platform configurations, Python, shell scripting, infrastructure automation (Ansible), CI/CD tools (Jenkins, GitHub Actions), system architecture and design skills, technical documentation, problem-solving abilities, leadership and mentoring capabilities, autonomous vehicle systems, optimizing GPU-based ML infrastructure, large-scale IoT deployments, open-source projects, real-time systems and low-latency requirements, security implementations including SSO, IdP, and AWS Cognito, JFrog artifactory and container registry management, AWS IoT Greengrass, container resource management on edge devices, CPU affinity and priority 
scheduling, cost optimization strategies, scaling systems both horizontally and vertically","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":198000,"maxValue":225000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c1dcea75-d5a"},"title":"Member of Technical Staff - Infrastructure Engineer","description":"<p>We&#39;re looking for an experienced engineer to join our team in Freiburg, Germany or San Francisco, USA. As a Member of Technical Staff - Infrastructure Engineer, you will be responsible for maintaining and scaling our research infrastructure, ensuring health and optimizing components to extract peak performance from the system. You will also collaborate with research teams to deeply understand their infrastructure needs and design solutions that balance performance with cost efficiency.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Maintaining research infrastructure, ensuring health, and optimizing components to extract peak performance from the system (both on application and infrastructure side)</li>\n<li>Scaling infrastructure to meet growing research demands while maintaining reliability and performance</li>\n<li>Collaborating with research teams to deeply understand their infrastructure needs, and design solutions that balance performance with cost efficiency</li>\n<li>Identifying and resolving performance bottlenecks and capacity hotspots through deep analysis of distributed systems at scale</li>\n<li>Building and evolving telemetry and monitoring systems to provide deep visibility into infrastructure performance, utilization, and costs across our cloud and datacenter fleets</li>\n<li>Participating in on-call rotations and incident response to maintain system reliability</li>\n</ul>\n<p>Technical focus includes:</p>\n<ul>\n<li>Python, Bash, Go</li>\n<li>Kubernetes</li>\n<li>Nvidia GPU drivers and 
operators</li>\n<li>OTel, Prometheus</li>\n</ul>\n<p>Requirements include:</p>\n<ul>\n<li>Experience building or operating large-scale training platforms</li>\n<li>Worked with large-scale compute clusters (GPUs)</li>\n<li>Proven ability to debug performance and reliability issues across large distributed fleets</li>\n<li>Strong problem-solving skills and ability to work independently</li>\n<li>Strong communication skills and the ability to work effectively with both internal and external partners</li>\n<li>Deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP</li>\n<li>Experience with SLURM</li>\n</ul>\n<p>We offer a competitive base annual salary of $180,000-$300,000 USD and a hybrid work model with a meaningful in-person presence.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_c1dcea75-d5a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Black Forest Labs","sameAs":"https://www.blackforestlabs.com/","logo":"https://logos.yubhub.co/blackforestlabs.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/blackforestlabs/jobs/4925659008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000-$300,000 USD","x-skills-required":["Python","Bash","Go","Kubernetes","Nvidia GPU drivers","Nvidia GPU operators","OTel","Prometheus","Experience building or operating large-scale training platforms","Worked with large-scale compute clusters (GPUs)","Proven ability to debug performance and reliability issues across large distributed fleets","Strong problem-solving skills and ability to work independently","Strong communication skills and the ability to work effectively with both internal and external partners","Deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP","Experience 
with SLURM"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:25:55.745Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Freiburg (Germany), San Francisco (USA)"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Bash, Go, Kubernetes, Nvidia GPU drivers, Nvidia GPU operators, OTel, Prometheus, Experience building or operating large-scale training platforms, Worked with large-scale compute clusters (GPUs), Proven ability to debug performance and reliability issues across large distributed fleets, Strong problem-solving skills and ability to work independently, Strong communication skills and the ability to work effectively with both internal and external partners, Deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP, Experience with SLURM","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":300000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5c28c97d-fc5"},"title":"Member of Technical Staff - Image / Video Generation","description":"<p><strong>Job Title</strong></p>\n<p>Member of Technical Staff - Image / Video Generation</p>\n<p><strong>Job Description</strong></p>\n<p>We&#39;re the team behind Latent Diffusion, Stable Diffusion, and FLUX: foundational technologies that changed how the world creates images and video. We&#39;re creating the generative models that power how people make images and video: tools used by millions of creators, developers, and businesses worldwide. 
Our FLUX models are among the most advanced in the world, and we’re just getting started.</p>\n<p><strong>Why This Role</strong></p>\n<p>You&#39;ll train large-scale diffusion models for image and video generation, exploring new approaches while maintaining the rigor that helps us distinguish meaningful progress from incremental tweaks. This isn&#39;t about following established recipes; it&#39;s about running the experiments that clarify which architectural choices matter and which are less impactful.</p>\n<p><strong>What You’ll Work On</strong></p>\n<ul>\n<li>Train large-scale diffusion transformer models for image and video data, working at the scale where intuitions break and empirical evidence matters</li>\n<li>Rigorously ablate design choices: run experiments that isolate variables, control for confounds, and produce insights you can actually trust, then communicate those results to shape our research direction</li>\n<li>Reason about the speed-quality tradeoffs of neural network architectures in production settings where both constraints matter simultaneously</li>\n<li>Fine-tune diffusion models for specialized applications like image and video upscalers, inpainting/outpainting models, and other tasks where general-purpose models aren&#39;t enough</li>\n</ul>\n<p><strong>What We’re Looking For</strong></p>\n<ul>\n<li>You&#39;ve trained large-scale diffusion models and developed strong intuitions about what matters. You know that at research scale, every design choice has tradeoffs, and the only way to know which ones are worth making is through careful ablation. 
You&#39;re comfortable debugging distributed training issues and presenting research findings to the team.</li>\n</ul>\n<p><strong>Required Skills</strong></p>\n<ul>\n<li>Hands-on experience training large-scale diffusion models for image and video data, with practical knowledge of common failure modes and what matters most in training</li>\n<li>Experience fine-tuning diffusion models for specialized applications (upscalers, inpainting, outpainting, or other tasks where understanding the domain matters as much as understanding the architecture)</li>\n<li>Deep understanding of how to effectively evaluate image and video generative models: knowing which metrics correlate with quality and which are just convenient proxies</li>\n<li>Strong proficiency in PyTorch, transformer architectures, and the full ecosystem of modern deep learning</li>\n<li>Solid understanding of distributed training techniques (FSDP, low-precision training, model parallelism), because our models don&#39;t fit on one GPU and training decisions impact research outcomes</li>\n</ul>\n<p><strong>Preferred Skills</strong></p>\n<ul>\n<li>Experience writing forward and backward Triton kernels and ensuring their correctness while considering floating-point errors</li>\n<li>Proficiency with profiling, debugging, and optimizing single- and multi-GPU operations using tools like Nsight or stack trace viewers</li>\n<li>Knowledge of the performance characteristics of different architectural choices at scale</li>\n<li>Published research that contributed to how people think about generative models</li>\n</ul>\n<p><strong>How We Work Together</strong></p>\n<p>We’re a distributed team with real offices that people actually use. Depending on your role, you’ll either join us in Freiburg or SF at least 2 days a week (or one full week every other week), or work remotely with a monthly in-person week to stay connected. We’ll cover reasonable travel costs to make this possible. 
We think in-person time matters, and we’ve structured things to make it accessible to all. We’ll discuss what this will look like for the role during our interview process.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_5c28c97d-fc5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Black Forest Labs","sameAs":"https://www.blackforestlabs.com/","logo":"https://logos.yubhub.co/blackforestlabs.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/blackforestlabs/jobs/4132217008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["large-scale diffusion models","image and video data","PyTorch","transformer architectures","distributed training techniques"],"x-skills-preferred":["writing forward and backward Triton kernels","profiling, debugging, and optimizing single and multi-GPU operations","published research on generative models"],"datePosted":"2026-04-17T12:25:33.116Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Freiburg (Germany)"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"large-scale diffusion models, image and video data, PyTorch, transformer architectures, distributed training techniques, writing forward and backward Triton kernels, profiling, debugging, and optimizing single and multi-GPU operations, published research on generative models"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_38debdc4-b87"},"title":"GPU R&D Engineer (CUDA programming)","description":"<p>You are a passionate technology leader with deep expertise in GPU-accelerated computing and algorithm design. 
With over a decade of experience in software engineering, you thrive in environments that challenge you to innovate and push boundaries.</p>\n<p>As a GPU R&amp;D Engineer at Synopsys, you will be responsible for optimizing and enhancing existing GPU implementations for cutting-edge ILT (Inverse Lithography Technology) software. You will also design, develop, and deploy new GPU-accelerated algorithms for handling large-scale geometric data in mask synthesis tools.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Optimizing and enhancing existing GPU implementations for cutting-edge ILT software</li>\n<li>Designing, developing, and deploying new GPU-accelerated algorithms for handling large-scale geometric data in mask synthesis tools</li>\n<li>Collaborating with software, hardware, and QA teams to ensure seamless integration of advanced GPU features into Synopsys solutions</li>\n<li>Leading benchmarking and performance testing efforts to maximize throughput and efficiency of GPU algorithms</li>\n<li>Conducting research and staying current on GPU technology advancements, integrating the latest trends into Synopsys EDA products</li>\n<li>Interfacing with customers and hardware vendors to deliver optimal solutions and support rapid chip manufacturing cycles</li>\n</ul>\n<p>This role requires a strong foundation in algorithms and data structures, with proven experience optimizing for performance. You should also have exceptional troubleshooting skills and the ability to resolve complex integration challenges.</p>\n<p>In return, you will have the opportunity to make a tangible impact in the world of electronic design automation and lead initiatives that shape the next generation of semiconductor technology.</p>\n<p>The team you will be a part of is a dynamic, diverse group of engineers focused on advancing mask synthesis and lithography solutions within Synopsys. 
The team is renowned for its innovative spirit, technical excellence, and collaborative approach, working closely with customers and hardware partners to deliver industry-leading EDA tools.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_38debdc4-b87","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Synopsys","sameAs":"https://careers.synopsys.com","logo":"https://logos.yubhub.co/careers.synopsys.com.png"},"x-apply-url":"https://careers.synopsys.com/job/bengaluru/gpu-r-and-d-engineer-cuda-programming/44408/91681543296","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Advanced knowledge of CUDA or similar GPU computing technologies","Proficiency in C/C++, Python, and distributed computing environments","Strong foundation in algorithms and data structures, with proven experience optimizing for performance","Exceptional troubleshooting skills and ability to resolve complex integration challenges","Experience with computational geometry algorithms, including Beziers, NURBS, and B-splines"],"x-skills-preferred":["Background in designing algorithms for Optical Proximity Correction and Inverse Lithography Technology","Experience with large-scale data handling and distributed systems"],"datePosted":"2026-04-05T13:22:03.873Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Bengaluru"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Advanced knowledge of CUDA or similar GPU computing technologies, Proficiency in C/C++, Python, and distributed computing environments, Strong foundation in algorithms and data structures, with proven experience optimizing for performance, Exceptional troubleshooting skills and ability to resolve complex integration challenges, 
Experience with computational geometry algorithms, including Bézier curves, NURBS, and B-splines, Background in designing algorithms for Optical Proximity Correction and Inverse Lithography Technology, Experience with large-scale data handling and distributed systems"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_1662ffb6-3c9"},"title":"R&D Engineering, Sr Staff Engineer","description":"<p>You will work as a senior staff engineer in the R&amp;D engineering team at Synopsys. As a member of this team, you will be responsible for architecting and optimizing high-performance simulation kernels for the Synopsys VCS RTL simulator using advanced C++ techniques. You will also explore and implement GPU acceleration strategies with CUDA to significantly reduce simulation runtimes for customers. Additionally, you will leverage deep knowledge of the Verilog/SystemVerilog LRM to ensure accurate and reliable simulation across diverse design environments.</p>\n<p>Your responsibilities will include:</p>\n<ul>\n<li>Architecting and optimizing high-performance simulation kernels for the Synopsys VCS RTL simulator using advanced C++ techniques.</li>\n<li>Exploring and implementing GPU acceleration strategies with CUDA to significantly reduce simulation runtimes for customers.</li>\n<li>Leveraging deep knowledge of the Verilog/SystemVerilog LRM to ensure accurate and reliable simulation across diverse design environments.</li>\n<li>Integrating AI-powered tools (such as Cursor, GitHub Copilot, and generative AI assistants) to automate code generation and debugging processes.</li>\n<li>Mentoring and guiding junior engineers, fostering skills development and technical growth within the team.</li>\n<li>Collaborating with distributed R&amp;D teams to maintain Synopsys&#39; leadership and drive innovation in the EDA industry.</li>\n</ul>\n<p>As a senior staff engineer, you will have a significant impact on the company&#39;s 
success. You will be responsible for driving the evolution of the world&#39;s fastest Verilog simulator, setting new industry standards for performance and reliability. You will also empower customers to achieve greater productivity and efficiency through advanced simulation capabilities and reduced runtimes.</p>\n<p>To be successful in this role, you will need to have:</p>\n<ul>\n<li>8-10 years of relevant experience.</li>\n<li>Expert-level proficiency in C++ with proven experience in performance-critical software development.</li>\n<li>Deep understanding of the Verilog/SystemVerilog Language Reference Manual (LRM) and simulation methodologies.</li>\n<li>Hands-on experience with GPU programming, especially using CUDA for parallel acceleration.</li>\n<li>Familiarity with AI-powered development tools such as Cursor, GitHub Copilot, and generative AI assistants.</li>\n<li>Strong architectural design skills and ability to analyze and optimize complex software systems.</li>\n<li>Experience in mentoring and guiding junior engineers within an R&amp;D environment.</li>\n</ul>","url":"https://yubhub.co/jobs/job_1662ffb6-3c9","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Synopsys","sameAs":"https://careers.synopsys.com","logo":"https://logos.yubhub.co/careers.synopsys.com.png"},"x-apply-url":"https://careers.synopsys.com/job/sunnyvale/r-and-d-engineering-sr-staff-engineer/44408/92995225280","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 - $248,000","x-skills-required":["C++","Verilog/SystemVerilog LRM","GPU programming","AI-powered development tools","architectural design skills"],"x-skills-preferred":["CUDA","Cursor","GitHub Copilot","generative AI 
assistants"],"datePosted":"2026-04-05T13:21:56.685Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"C++, Verilog/SystemVerilog LRM, GPU programming, AI-powered development tools, architectural design skills, CUDA, Cursor, GitHub Copilot, generative AI assistants","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":248000,"unitText":"YEAR"}}}]}