<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>1c4de3ab-a58</externalid>
      <Title>Machine Learning Engineer, Global Public Sector</Title>
      <Description><![CDATA[<p>We&#39;re hiring a Machine Learning Engineer to bridge the gap between frontier research and real-world impact. As a key member of our GPS Engineering team, you will lead the charge in research into Agent design, Deep Research and AI Safety/reliability, developing novel methodologies that not only power public sector applications but set new standards across the entire Scale organisation.</p>
<p>Your mission is threefold:</p>
<ul>
<li>Frontier Research &amp; Publication: Leading research into LLM/agent capabilities, reasoning, and safety, with the goal of publishing at top-tier venues (NeurIPS, ICML, ICLR).</li>
<li>Cross-Org Impact: Developing generalised techniques in Agent design, AI Safety and Deep Research agents that scale across our commercial and government platforms.</li>
<li>Mission-Critical Applications: Engineering high-stakes AI systems that impact millions of citizens globally.</li>
</ul>
<p>You will:</p>
<ul>
<li>Pioneer Novel Architectures: Design and train state-of-the-art models and agents, moving beyond “off-the-shelf” solutions to create custom architectures for complex public sector reasoning tasks.</li>
<li>Lead AI Safety Initiatives: Research and implement robust safety frameworks, including red teaming, alignment (RLHF/DPO), and bias mitigation strategies essential for sovereign AI.</li>
<li>Drive Deep Research Capabilities: Develop agents capable of long-horizon reasoning and autonomous information synthesis to solve complex problems for national security and public policy.</li>
<li>Publish and Contribute: Represent Scale in the broader research community by publishing high-impact papers and contributing to open-source breakthroughs.</li>
<li>Consult as a Subject Matter Expert: Act as a technical authority for public sector leaders, advising on the theoretical limits and safety requirements of emerging AI.</li>
<li>Build Evaluation Frontiers: Create new benchmarks and evaluation protocols that define what success looks like for high-stakes, non-commercial AI applications.</li>
</ul>
<p>Ideally, you’d have:</p>
<ul>
<li>Advanced Degree: PhD or Master’s in Computer Science, Mathematics, or a related field with a focus on Deep Learning.</li>
<li>Research Track Record: A portfolio of first-author publications at major conferences (NeurIPS, ICML, CVPR, EMNLP, etc.).</li>
<li>Engineering Rigour: Strong proficiency in Python, deep learning frameworks (PyTorch/JAX), with the ability to write production-ready code that scales.</li>
<li>Safety Expertise: Experience in alignment, robustness, or interpretability research.</li>
</ul>
<p>Nice to haves:</p>
<ul>
<li>Experience with large-scale distributed training on massive clusters.</li>
<li>Experience building reliable agentic systems.</li>
<li>Experience in Sovereign AI or working with highly regulated data environments.</li>
<li>A zero-to-one mindset: Comfortable navigating ambiguity and defining research directions from scratch.</li>
</ul>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Python, Deep Learning, PyTorch, JAX, AI Safety, Alignment, Robustness, Interpretability, Large-scale Distributed Training, Agentic Systems, Sovereign AI, Regulated Data Environments</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Scale</Employername>
      <Employerlogo>https://logos.yubhub.co/scale.com.png</Employerlogo>
      <Employerdescription>Scale develops reliable AI systems for the world&apos;s most important decisions.</Employerdescription>
      <Employerwebsite>https://scale.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/scaleai/jobs/4413274005</Applyto>
      <Location>Doha, Qatar; London, UK</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>b79d9627-55a</externalid>
      <Title>Research Engineer, Infrastructure, Training Systems</Title>
      <Description><![CDATA[<p>We&#39;re seeking an infrastructure research engineer to design and build scalable, efficient training systems for large models. As a key member of our team, you&#39;ll take ownership of the training stack end-to-end, ensuring every GPU cycle drives scientific progress. Your goal is to make experimentation and training at Thinking Machines fast and reliable, allowing our research teams to focus on science, not system bottlenecks.</p>
<p>Key responsibilities include designing, implementing, and optimizing distributed training systems, developing high-performance optimizations, and establishing standards for reliability, maintainability, and security. You&#39;ll collaborate with researchers and engineers to build scalable infrastructure and publish learnings through internal documentation, open-source libraries, or technical reports.</p>
<p>We&#39;re looking for someone who blends deep systems and performance expertise with a curiosity for machine learning at scale. A strong understanding of deep learning frameworks, such as PyTorch, and experience working on distributed training for large models are preferred. If you have a track record of improving research productivity through infrastructure design or process improvements, that&#39;s a plus.</p>
<p>This role is based in San Francisco, California, and offers a competitive salary range of $350,000 - $475,000 USD per year, depending on background, skills, and experience. We sponsor visas and offer generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$350,000 - $475,000 USD per year</Salaryrange>
      <Skills>deep learning frameworks (PyTorch), distributed training for large models, high-performance optimization, scalable infrastructure, reliability, maintainability, security, open-source ML infrastructure contributions</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachines.ai.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is an AI research and product company whose team previously helped build products such as ChatGPT and Character.ai and open-source projects like PyTorch.</Employerdescription>
      <Employerwebsite>https://thinkingmachines.ai/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5013932008</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>f0f66ce3-d78</externalid>
      <Title>Senior GenAI Research Engineer - Optimization and Kernels</Title>
      <Description><![CDATA[<p>As a research engineer on the Scaling team at Databricks, you will be responsible for keeping up with the latest developments in deep learning and advancing the scientific frontier by creating new techniques that go beyond the state of the art.</p>
<p>You will work together on a collaborative team of researchers and engineers with diverse backgrounds and technical training. Your goal will be to make our customers successful in applying state-of-the-art LLMs and AI systems, and we encode our scientific expertise into our products to make that possible.</p>
<p>Your responsibilities will include:</p>
<ul>
<li>Driving performance improvements through advanced optimization techniques including kernel fusion, mixed precision, memory layout optimization, tiling strategies, and tensorization for training-specific patterns</li>
<li>Designing, implementing, and optimizing high-performance GPU kernels for training workloads (e.g., attention mechanisms, custom layers, gradient computation, activation functions) targeting NVIDIA architectures</li>
<li>Designing and implementing distributed training frameworks for large language models, including parallelism strategies (data, tensor, pipeline, ZeRO-based) and optimized communication patterns for gradient synchronization and collective operations</li>
<li>Profiling, debugging, and optimizing end-to-end training workflows to identify and resolve performance bottlenecks, applying memory optimization techniques like activation checkpointing, gradient sharding, and mixed precision training</li>
</ul>
<p>We look for candidates with a strong background in computer science or a related field, hands-on experience writing and tuning CUDA kernels for ML training applications, and a deep understanding of parallelism techniques and memory optimization strategies for large-scale model training.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$166,000-$225,000 USD</Salaryrange>
      <Skills>CUDA, NVIDIA GPU architecture, PyTorch, distributed training frameworks, parallelism techniques, memory optimization strategies</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Databricks</Employername>
      <Employerlogo>https://logos.yubhub.co/databricks.com.png</Employerlogo>
      <Employerdescription>Databricks is a data and AI company that provides a unified, lakehouse-based platform for data, analytics, and AI. It was founded by the original creators of Apache Spark, Delta Lake, and MLflow.</Employerdescription>
      <Employerwebsite>https://databricks.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/databricks/jobs/8297797002</Applyto>
      <Location>San Francisco, California</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>ff4d3a91-b20</externalid>
      <Title>Principal Engineer - Perf and Benchmarking</Title>
      <Description><![CDATA[<p>We&#39;re looking for a Principal Engineer to be the technical lead of CoreWeave&#39;s Benchmarking &amp; Performance team. You will be responsible for our planet-scale performance data warehouse: Ingesting, storing, transforming and analyzing performance events in all the data centers across our global infrastructure.</p>
<p>You will also be an integral part of achieving industry-leading end-to-end performance benchmarking publications: If MLPerf (Training &amp; Inference), Working closely with NVIDIA (Megatron-LM, TensorRT-LLM &amp; DGX cloud) and the open-source community (llm-d, vLLM and all popular ML frameworks) speak to you, come help us demonstrate CoreWeave&#39;s performance reliability leadership in the field.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Strategy &amp; Leadership - Define the multi-year benchmarking strategy and roadmap; prioritize models/workloads (LLMs, diffusion, vision, speech) and hardware tiers. Build, lead, and mentor a high-performing team of performance engineers and data analysts. Establish governance for claims: documented methodologies, versioning, reproducibility, and audit trails.</li>
<li>Perf Ownership - Lead end-to-end MLPerf Inference and Training submissions: workload selection, cluster planning, runbooks, audits, and result publication. Coordinate optimization tracks with NVIDIA (CUDA, cuDNN, TensorRT/TensorRT-LLM, Triton, NCCL) to hit competitive results; drive upstream fixes where needed.</li>
<li>Internal Latency &amp; Throughput Benchmarks - Design a Kubernetes-native, repeatable benchmarking service that exercises CoreWeave stacks across SUNK (Slurm on Kubernetes), Kueue, and Kubeflow pipelines. Measure and report p50/p95/p99 latency, jitter, tokens/s, time-to-first-token, cold-start/warm-start, and cost-per-token/request across models, precisions (BF16/FP8/FP4), batch sizes, and GPU types. Maintain a corpus of representative scenarios (streaming, batch, multi-tenant) and data sets; automate comparisons across software releases and hardware generations.</li>
<li>Tooling &amp; Automation - Build CI/CD pipelines and K8s controllers/operators to schedule benchmarks at scale; integrate with observability stacks (Prometheus, Grafana, OpenTelemetry) and results warehouses. Implement supply-chain integrity for benchmark artifacts (SBOMs, Cosign signatures).</li>
<li>Cross-functional &amp; Community - Partner with NVIDIA, key ISVs, and OSS projects (vLLM, Triton, KServe, PyTorch/DeepSpeed, ONNX Runtime) to co-develop optimizations and upstream improvements. Support Sales/SEs with authoritative numbers for RFPs and competitive evaluations; brief analysts and press with rigorous, defensible data.</li>
</ul>
<p><strong>Requirements</strong></p>
<ul>
<li>10+ years building distributed systems or HPC/cloud services, with deep expertise in large-scale ML training or similar high-performance workloads.</li>
<li>Proven track record of architecting or building planet-scale data systems (e.g., telemetry platforms, observability stacks, cloud data warehouses, large-scale OLAP engines).</li>
<li>Deep understanding of GPU performance (CUDA, NCCL, RDMA, NVLink/PCIe, memory bandwidth), model-server stacks (Triton, vLLM, TensorRT-LLM, TorchServe), and distributed training frameworks (PyTorch FSDP/DeepSpeed/Megatron-LM).</li>
<li>Proficient with Kubernetes and ML control planes; familiarity with SUNK, Kueue, and Kubeflow in production environments.</li>
<li>Excellent communicator able to interface with executives, customers, auditors, and OSS communities.</li>
</ul>
<p><strong>Nice to have</strong></p>
<ul>
<li>Experience with time-series databases, log-structured merge trees (LSM), or custom storage engine development.</li>
<li>Experience running MLPerf submissions (Inference and/or Training) or equivalent audited benchmarks at scale.</li>
<li>Contributions to MLPerf, Triton, vLLM, PyTorch, KServe, or similar OSS projects.</li>
<li>Experience benchmarking multi-region fleets and large clusters (thousands of GPUs).</li>
<li>Publications/talks on ML performance, latency engineering, or large-scale benchmarking methodology.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$206,000 to $333,000</Salaryrange>
      <Skills>Distributed systems, HPC/cloud services, Large-scale ML training, GPU performance, Model-server stacks, Distributed training frameworks, Kubernetes, ML control planes, Time-series databases, Log-structured merge trees, Custom storage engine development, MLPerf submissions, Audited benchmarks, Contributions to OSS projects, Benchmarking multi-region fleets, Large clusters, Publications/talks on ML performance</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud-based platform for artificial intelligence that provides technology, tools, and teams to enable innovators to build and scale AI with confidence.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4627302006</Applyto>
      <Location>Sunnyvale, CA / Bellevue, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>854e95b5-76b</externalid>
      <Title>Sr. Director of Product, Research and Training Infrastructure</Title>
      <Description><![CDATA[<p>CoreWeave is seeking a visionary Sr. Director of Product, Research Training Infrastructure to lead the product strategy and engineering execution for the services that power the most ambitious AI research labs in the world.</p>
<p>This executive leader will own the product strategy and engineering execution for the Research Training Stack, focusing on the specialized orchestration, evaluation, and iteration tools required for massive-scale pre-training and post-training.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Frontier Orchestration: Oversee the evolution of SUNK (Slurm on Kubernetes) to provide researchers with deterministic, bare-metal performance through a cloud-native interface.</li>
<li>Holistic Training Services: Drive the development of next-generation orchestrators and automated training-based evaluation frameworks that ensure model quality throughout the lifecycle.</li>
<li>Post-Training Excellence: Build the infrastructure required for sophisticated Reinforcement Learning (RL) and RLHF pipelines, enabling labs to refine foundation models with maximum efficiency.</li>
<li>Customer Advocacy: Act as the primary technical partner for lead researchers at global AI labs, translating their &#39;future-state&#39; requirements into actionable product roadmaps.</li>
</ul>
<p>Requirements include:</p>
<ul>
<li>Proven engineering leadership experience, with 5+ years managing large-scale infrastructure at a top-tier research lab or an AI-native cloud provider.</li>
<li>Deep, hands-on knowledge of Slurm, Kubernetes, and the specific networking requirements (InfiniBand/RDMA) for distributed training clusters.</li>
<li>Research mindset and understanding of the &#39;pain points&#39; of a research scientist.</li>
<li>Scaling experience delivering mission-critical services on multi-thousand GPU clusters (H100/Blackwell/Rubin architectures).</li>
<li>Strategic vision to define &#39;what&#39;s next&#39; in the AI stack, from automated RL loops to specialized sandbox environments.</li>
</ul>
<p>Why CoreWeave?</p>
<p>In 2026, CoreWeave is the foundation of the largest infrastructure buildout in human history. We are building AI Factories, not just data centers.</p>
<ul>
<li>Silicon-Up Innovation: Work directly with the latest NVIDIA architectures.</li>
<li>Impact: You will be the architect of the environment that enables the next new discovery.</li>
<li>Velocity: We move at the speed of the researchers we support, bypassing legacy cloud bottlenecks to deliver raw power.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>executive</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$233,000 to $341,000</Salaryrange>
      <Skills>Slurm, Kubernetes, InfiniBand/RDMA, Distributed training clusters, GPU clusters, H100/Blackwell/Rubin architectures, Reinforcement Learning (RL), RLHF pipelines</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>CoreWeave</Employername>
      <Employerlogo>https://logos.yubhub.co/coreweave.com.png</Employerlogo>
      <Employerdescription>CoreWeave is a cloud computing company that provides infrastructure and tools for artificial intelligence research and development.</Employerdescription>
      <Employerwebsite>https://www.coreweave.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/coreweave/jobs/4665964006</Applyto>
      <Location>Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>d6f9b362-dbe</externalid>
      <Title>Senior Machine Learning Engineer, ML Training Platform</Title>
      <Description><![CDATA[<p>As a Senior Machine Learning Engineer on the Machine Learning Platform team at Reddit, you will be instrumental in architecting, implementing, and maintaining foundational Machine Learning (ML) infrastructure that powers Feeds Ranking, Content Understanding, Recommendations and more.</p>
<p>You will deliver a self-service ML platform that enables the continuous iteration and improvement of systems that use ML techniques including Deep Learning, Natural Language Processing, Recommendation Systems, Representation Learning and Computer Vision.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Leading the building, testing, and maintenance of ML training infrastructure at Reddit</li>
<li>Designing, building, and optimizing the infrastructure and tooling required to support large-scale machine learning workflows</li>
<li>Evolving the MLE experience, from provisioning interactive GPU environments through large-scale training, supporting on-demand and self-service workflows</li>
</ul>
<p>You will work closely with the underlying compute team to ensure MLEs have efficient access to training hardware resources and handle resource contention gracefully.</p>
<p>In addition to technical expertise, you will treat internal MLEs as your customers, conducting user research, reducing friction in the &#39;Idea-to-Prototype&#39; loop, and standardizing software environments (Docker images, Python dependency management).</p>
<p>To be successful in this role, you will have 5+ years of software engineering experience, with a focus on Platform Engineering, ML Infrastructure, or Backend Systems. You will also have deep Kubernetes expertise, Jupyter Ecosystem knowledge, strong coding skills in Python and Go, and experience with GPU environments, cloud providers, and distributed training frameworks.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange>$216,700-$303,400 USD</Salaryrange>
      <Skills>Kubernetes, Jupyter Ecosystem, Python, Go, GPU environments, Cloud providers, Distributed training frameworks</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Reddit</Employername>
      <Employerlogo>https://logos.yubhub.co/redditinc.com.png</Employerlogo>
      <Employerdescription>Reddit is a community-driven platform with over 100,000 active communities and 121 million daily active unique visitors.</Employerdescription>
      <Employerwebsite>https://www.redditinc.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/reddit/jobs/7074776</Applyto>
      <Location>Remote - United States</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>f723a069-05a</externalid>
      <Title>Engineering Manager, Notifications Relevance</Title>
      <Description><![CDATA[<p>We are looking for an Engineering Manager to lead our Notifications Relevance team, shaping the future of Notifications at Reddit. In this role, you will lead a team of machine learning engineers dedicated to advancing our current Notifications Relevance systems.</p>
<p>This is a high-impact team driving DAU growth and long-term user retention by connecting users to what matters most to them. If applying ML / AI in production to improve the relevance of Reddit Notifications excites you, then you’ve found the right place.</p>
<p>Responsibilities:</p>
<ul>
<li>Lead the team that architects and designs notifications relevance at Reddit.</li>
<li>Guide team on holistic, adaptive systems covering budgeting optimization, candidate retrieval, and ranking.</li>
<li>Work with ML engineers to design, implement, and optimize machine-learning models that drive personalization and user re-engagement.</li>
<li>Participate in the full development cycle: design, develop, QA, experiment, analyze, and deploy.</li>
<li>Build and maintain a diverse team that can collaborate across disciplines to find technical solutions to complex challenges.</li>
<li>Serve as a thought partner to product and upper management to ensure your team’s plans align with company goals.</li>
<li>Communicate your team’s work and set expectations with cross-functional stakeholders.</li>
<li>Help your engineers identify career goals and create development plans to achieve them.</li>
<li>Constantly seek opportunities to push your engineers &amp; managers outside their comfort zone and turn followers into leaders.</li>
</ul>
<p>Requirements:</p>
<ul>
<li>2+ years of experience building and managing engineering teams.</li>
<li>5+ years of experience as a Machine Learning Engineer or Software Engineer working on large-scale machine learning systems.</li>
<li>Deep understanding of building and deploying large-scale recommender systems (retrieval + ranking) in production.</li>
<li>Hands-on experience working with deep learning models, sequential features and real-time systems.</li>
<li>Experience with distributed training and inference using tools like Ray, PyTorch Distributed, or similar.</li>
<li>Familiarity with reinforcement learning or multi-objective optimization in recommendation systems.</li>
<li>Entrepreneurial and self-directed, innovative, results-oriented, biased towards action in fast-paced environments.</li>
<li>Able to communicate and discuss complex topics with technical and non-technical audiences.</li>
<li>Able to tackle ambiguous and undefined problems.</li>
</ul>
<p>Benefits:</p>
<ul>
<li>Comprehensive Healthcare Benefits and Income Replacement Programs</li>
<li>401k with Employer Match</li>
<li>Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support</li>
<li>Family Planning Support</li>
<li>Gender-Affirming Care</li>
<li>Mental Health &amp; Coaching Benefits</li>
<li>Flexible Vacation &amp; Paid Volunteer Time Off</li>
<li>Generous Paid Parental Leave</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange>$230,000-$322,000 USD</Salaryrange>
      <Skills>Machine Learning, Software Engineering, Deep Learning Models, Sequential Features, Real-Time Systems, Distributed Training, Inference, Reinforcement Learning, Multi-Objective Optimization</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Reddit</Employername>
      <Employerlogo>https://logos.yubhub.co/redditinc.com.png</Employerlogo>
      <Employerdescription>Reddit is a community-driven platform with over 100,000 active communities and 121 million daily active unique visitors.</Employerdescription>
      <Employerwebsite>https://www.redditinc.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/reddit/jobs/7340793</Applyto>
      <Location>Remote - United States</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>71554e46-b64</externalid>
      <Title>Senior Engineering Manager, AI Runtime</Title>
      <Description><![CDATA[<p>At Databricks, we are committed to enabling data teams to solve the world&#39;s toughest problems. As a Senior Engineering Manager, you will lead the team owning both the product experience and the foundational infrastructure of our AI Runtime (AIR) product.</p>
<p>You will be responsible for shaping customer-facing capabilities while designing for scalability, extensibility, and performance of GPU training and adjacent areas. This will involve collaborating closely across the platform, product, infrastructure, and research organisations.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Leading, mentoring, and growing a high-performing engineering team responsible for the Custom Training product and its foundational infrastructure</li>
<li>Defining and owning the product and technical roadmap for AIR, balancing customer experience, functionality, and foundational investments</li>
<li>Collaborating closely with product, research, platform, infrastructure teams, and customers to drive end-to-end delivery</li>
<li>Driving architectural decisions and product design for managed GPU training at scale</li>
<li>Advocating for customer needs through direct engagement, ensuring engineering decisions translate to clear product impact</li>
</ul>
<p>We are looking for someone with 8+ years of software engineering experience, with 3+ years in engineering management. You should have a track record of building and operating managed GPU training infrastructure at scale, as well as deep familiarity with distributed training frameworks and parallelism strategies.</p>
<p>In addition, you should have experience with training resilience patterns, such as checkpointing, elastic training, and automated failure recovery for long-running jobs. You should also have a strong understanding of GPU performance fundamentals, including NCCL, interconnect topologies, and memory optimisation.</p>
<p>Experience building platform products with clear SLAs is also essential, as is strong cross-functional leadership across platform, product, and research teams. Excellent collaboration and communication skills are also required.</p>
<p>The pay range for this role is $228,600-$314,250 USD per year, depending on location. The total compensation package may also include eligibility for annual performance bonus, equity, and benefits.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$228,600-$314,250 USD per year</Salaryrange>
      <Skills>software engineering, engineering management, distributed training frameworks, parallelism strategies, GPU training infrastructure, checkpointing, elastic training, automated failure recovery, GPU performance fundamentals, NCCL, interconnect topologies, memory optimisation</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Databricks</Employername>
      <Employerlogo>https://logos.yubhub.co/databricks.com.png</Employerlogo>
      <Employerdescription>Databricks is a data and AI company that provides a unified platform for data, analytics, and AI. It was founded by the original creators of Apache Spark, Delta Lake, and MLflow, and pioneered the lakehouse architecture.</Employerdescription>
      <Employerwebsite>https://databricks.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/databricks/jobs/8490282002</Applyto>
      <Location>Mountain View, California; San Francisco, California</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>d1728879-43b</externalid>
      <Title>Staff Product Manager, AI Platform</Title>
      <Description><![CDATA[<p>At Databricks, we are building the world&#39;s best data and AI infrastructure platform. The AI Platform team builds the infrastructure that powers machine learning and AI at scale on Databricks. Our products span the full ML lifecycle, from feature engineering and model training to model serving and monitoring, enabling data and AI teams to build, deploy, and operate production ML systems with confidence.</p>
<p>You will join a team that ships products used by thousands of the world&#39;s most sophisticated data and AI organizations. You will drive the vision and roadmap for AI platform product areas and define how customers build, train, deploy, and monitor AI and ML systems on Databricks. You will collaborate across engineering teams to deliver an integrated and powerful path from experimentation to production.</p>
<p>The impact you will have:</p>
<ul>
<li>Own the product roadmap for AI platform areas, defining what we build, why, and in what order, to accelerate customer adoption of AI and ML in production.</li>
<li>Drive strategy for key AI platform capabilities, shaping how enterprises operationalize AI at scale.</li>
<li>Partner closely with engineering teams to make deeply technical decisions about ML infrastructure, from distributed training architectures to real-time serving systems.</li>
<li>Represent the voice of the customer by engaging directly with enterprise ML teams, translating their pain points and workflows into platform capabilities that simplify the path to production AI.</li>
<li>Collaborate with GTM, Solutions Architecture, and Customer Success teams to drive enterprise adoption, shape field enablement, and inform competitive positioning.</li>
<li>Define pricing, packaging, and commercialization strategy for AI platform features, working with business teams to maximize value capture.</li>
<li>Grow end-user engagement with Databricks AI tools by identifying adoption bottlenecks and partnering cross-functionally to remove them.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange>$172,600-$237,325 USD</Salaryrange>
      <Skills>Product Management, AI Platform, Machine Learning, Data Science, Cloud Services, ML/AI Infrastructure, Distributed Training Architectures, Real-Time Serving Systems, Recommendation Systems, Feature Stores, Vector Search, LLM Infrastructure</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Databricks</Employername>
      <Employerlogo>https://logos.yubhub.co/databricks.com.png</Employerlogo>
      <Employerdescription>Databricks is a data and AI company that provides a unified platform for data and AI workloads. It was founded by the original creators of Apache Spark, Delta Lake, and MLflow.</Employerdescription>
      <Employerwebsite>https://databricks.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/databricks/jobs/8427940002</Applyto>
      <Location>Seattle, Washington</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>9ecceef8-349</externalid>
      <Title>Research Engineer/Research Scientist, Audio</Title>
      <Description><![CDATA[<p>We are seeking a Research Engineer/Research Scientist to join our Audio team. As a member of this team, you will work across the full stack of audio ML, developing audio codecs and representations, sourcing and synthesizing high-quality audio data, training large-scale speech language models and large audio diffusion models, and developing novel architectures for incorporating continuous signals into LLMs.</p>
<p>Our team focuses primarily but not exclusively on speech, building advanced steerable systems spanning end-to-end conversational systems, speech and audio understanding models, and speech synthesis capabilities. The team works closely with many collaborators across pretraining, finetuning, reinforcement learning, production inference, and product to get advanced audio technologies from early research to high-impact real-world deployments.</p>
<p>Responsibilities:</p>
<ul>
<li>Develop and train audio models, including conversational speech-to-speech, speech translation, speech recognition, text-to-speech, diarization, codecs, and generative audio models</li>
<li>Work across abstraction levels, from signal processing fundamentals to large-scale model training and inference optimization</li>
<li>Collaborate with teams across the company to develop and deploy audio technologies</li>
<li>Communicate clearly and effectively with colleagues and stakeholders</li>
</ul>
<p>Strong candidates may also have experience with:</p>
<ul>
<li>Large language model pretraining and finetuning</li>
<li>Training diffusion models for image and audio generation</li>
<li>Reinforcement learning for large language models and diffusion models</li>
<li>End-to-end system optimization, from performance benchmarking to kernel optimization</li>
<li>GPUs, Kubernetes, PyTorch, or distributed training infrastructure</li>
</ul>
<p>Representative projects:</p>
<ul>
<li>Training state-of-the-art neural audio codecs for 48 kHz stereo audio</li>
<li>Developing novel algorithms for diffusion pretraining and reinforcement learning</li>
<li>Scaling audio datasets to millions of hours of high-quality audio</li>
<li>Creating robust evaluation methodologies for hard-to-measure qualities such as naturalness or expressiveness</li>
<li>Studying training dynamics of mixed audio-text language models</li>
<li>Optimizing latency and inference throughput for deployed streaming audio systems</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$350,000-$500,000 USD</Salaryrange>
      <Skills>JAX, PyTorch, large-scale distributed training, signal processing fundamentals, speech language models, audio diffusion models, continuous signals, LLMs, large language model pretraining, diffusion models, reinforcement learning, end-to-end system optimization, GPUs, Kubernetes, distributed training infrastructure</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5074815008</Applyto>
      <Location>San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>f49203e0-6c6</externalid>
      <Title>Research Engineer, Science of Scaling</Title>
      <Description><![CDATA[<p>We are seeking a Research Engineer/Scientist to join the Science of Scaling team, responsible for developing the next generation of large language models. In this role, you will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems.</p>
<p>Responsibilities:</p>
<ul>
<li>Conduct research into the science of converting compute into intelligence</li>
<li>Independently lead small research projects while collaborating with team members on larger initiatives</li>
<li>Design, run, and analyze scientific experiments to advance our understanding of large language models</li>
<li>Optimize training infrastructure to improve efficiency and reliability</li>
<li>Develop dev tooling to enhance team productivity</li>
</ul>
<p>You may be a good fit if you:</p>
<ul>
<li>Have significant software engineering experience and a proven track record of building complex systems</li>
<li>Hold an advanced degree (MS or PhD) in Computer Science, Machine Learning, or a related field</li>
<li>Are proficient in Python and experienced with deep learning frameworks</li>
<li>Are results-oriented with a bias towards flexibility and impact</li>
<li>Enjoy pair programming and collaborative work, and are willing to take on tasks outside your job description to support the team</li>
<li>View research and engineering as two sides of the same coin, seeking to understand all aspects of the research program to maximize impact</li>
<li>Care about the societal impacts of your work and have ambitious goals for AI safety and general progress</li>
</ul>
<p>Strong candidates may have:</p>
<ul>
<li>Experience with JAX</li>
<li>Experience with reinforcement learning</li>
<li>Experience working on high-performance, large-scale ML systems</li>
<li>Familiarity with accelerators, Kubernetes, and OS internals</li>
<li>Experience with language modeling using transformer architectures</li>
<li>Background in large-scale ETL processes</li>
<li>Experience with distributed training at scale (thousands of accelerators)</li>
</ul>
<p>Strong candidates need not have:</p>
<ul>
<li>Experience in all of the above areas; we value breadth of interest and willingness to learn over checking every box</li>
<li>Prior work specifically on language models or transformers; strong engineering fundamentals and ML knowledge transfer well</li>
<li>An advanced degree; exceptional engineers with strong research instincts are equally encouraged to apply</li>
</ul>
<p>The annual compensation range for this role is £260,000-£630,000 GBP.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>£260,000-£630,000 GBP</Salaryrange>
      <Skills>Python, Deep learning frameworks, Software engineering, Machine learning, Advanced degree in Computer Science or related field, JAX, Reinforcement learning, High-performance, large-scale ML systems, Accelerators, Kubernetes, OS internals, Language modeling using transformer architectures, Large-scale ETL processes, Distributed training at scale</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5126127008</Applyto>
      <Location>London, UK</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>279d67f2-5b5</externalid>
      <Title>Research Engineer / Research Scientist, Tokens</Title>
      <Description><![CDATA[<p>We&#39;re looking for a Research Engineer / Research Scientist to join our team. As a Research Engineer, you&#39;ll touch all parts of our code and infrastructure, whether that&#39;s making the cluster more reliable for our big jobs, improving throughput and efficiency, running and designing scientific experiments, or improving our dev tooling.</p>
<p>You&#39;ll be working on large-scale ML systems from the ground up, making safe, steerable, trustworthy systems. You&#39;ll be excited to write code when you understand the research context and more broadly why it&#39;s important.</p>
<p>Strong candidates may also have experience with high performance, large-scale ML systems, GPUs, Kubernetes, Pytorch, or OS internals, language modeling with transformers, reinforcement learning, and large-scale ETL.</p>
<p>Representative projects may include optimizing the throughput of a new attention mechanism, comparing the compute efficiency of two Transformer variants, making a Wikipedia dataset in a format models can easily consume, scaling a distributed training job to thousands of GPUs, writing a design doc for fault tolerance strategies, and creating an interactive visualization of attention between tokens in a language model.</p>
<p>The annual compensation range for this role is $350,000-$500,000 USD.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$350,000-$500,000 USD</Salaryrange>
      <Skills>software engineering, machine learning, high performance computing, Kubernetes, Pytorch, OS internals, language modeling, reinforcement learning, large-scale ETL, GPU, transformers, distributed training</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4951814008</Applyto>
      <Location>New York City, NY; Seattle, WA; San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>d5b743bb-d8f</externalid>
      <Title>Product Manager, AI Platforms</Title>
      <Description><![CDATA[<p>The AI Platform Product Manager will drive the strategy and execution of Shield AI&#39;s next-generation autonomy intelligence stack. This PM owns the product vision and roadmap for the Hivemind AI Platform, ensuring we can manufacture, govern, and field advanced world models, robotics foundation models, and vision-language-action systems safely and at scale.</p>
<p>This role sits at the intersection of AI/ML, autonomy, model lifecycle, infrastructure, and product strategy. The PM partners closely with engineering, AI research, Hivemind Solutions, and field teams to deliver the tooling that enables sovereign autonomy, AI Factories at the edge, and continuous learning, capabilities that are central to Shield AI&#39;s strategic direction.</p>
<p>This is a high-impact role for an experienced product leader excited to define how foundation models are trained, validated, governed, and deployed across thousands of autonomous systems in highly contested environments.</p>
<p><strong>Responsibilities:</strong></p>
<ul>
<li>AI Model Development &amp; Training Platform</li>
</ul>
<p>Own the roadmap for foundation model training workflows, including dataset ingestion, curation, labeling, synthetic data generation, domain model training, and distillation pipelines. Define requirements for world models, robotics models, and VLA-based training, evaluation, and specialization. Lead the evolution of MLOps capabilities in Forge, including data lineage, experiment tracking, model versioning, and scalable evaluation suites.</p>
<ul>
<li>Data, Simulation &amp; Synthetic Data Factory</li>
</ul>
<p>Define product requirements for synthetic data generation, simulation-integrated data flywheels, and automated scenario generation. Partner with Digital Twin, Simulation, and autonomy teams to convert natural-language mission inputs into data needs, training procedures, and model variants.</p>
<ul>
<li>Safe Deployment &amp; Model Governance</li>
</ul>
<p>Lead the development of model governance and auditability tooling, including model cards, dataset rights, lineage tracking, safety gates, and compliance evidence. Build guardrails and workflows to safely deploy models onto edge hardware in disconnected, GPS- or comms-denied environments. Partner with Safety, Certification, Cyber, and Engineering teams to ensure traceability and evaluation pipelines meet operational and accreditation requirements.</p>
<ul>
<li>Edge Deployment &amp; AI Factory Integration</li>
</ul>
<p>Partner with Pilot, EdgeOS, and hardware teams to integrate foundation-model-based perception and reasoning into autonomy behaviors. Define requirements for distillation, quantization, and inference tooling as part of the “three-computer” development and deployment model. Ensure closed-loop workflows between cloud model training and edge-native execution.</p>
<ul>
<li>Cross-Functional Leadership</li>
</ul>
<p>Collaborate with Engineering, Research, Product, Customer Engagement, and Solutions teams to ensure model outputs meet mission and platform constraints. Translate advanced AI capabilities into intuitive workflows that platform OEMs and partner nations can use to build sovereign AI factories. Sequence foundational capabilities that unblock autonomy, simulation, and customer-facing product teams.</p>
<ul>
<li>User &amp; Customer Impact</li>
</ul>
<p>Develop deep empathy for ML engineers, autonomy developers, and Solutions engineers who rely on the platform. Capture operational data gaps, mission-driven model needs, and domain-specific specialization requirements. Lead demos and onboarding for model-development capabilities across internal and external teams.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$190,000-$290,000 USD per year</Salaryrange>
      <Skills>AI Model Development &amp; Training Platform, Data, Simulation &amp; Synthetic Data Factory, Safe Deployment &amp; Model Governance, Edge Deployment &amp; AI Factory Integration, Cross-Functional Leadership, User &amp; Customer Impact, Strong engineering background, Deep understanding of foundation models, robotics models, multimodal models, MLOps, and training infrastructure, Experience managing complex products spanning data pipelines, cloud training clusters, model governance, and edge deployments, Proven success partnering with research teams to transition ML innovations into stable, production-grade workflows, Experience working on autonomy, robotics, embedded AI, or mission-critical systems, Hands-on familiarity with GPU infrastructure, distributed training, or data lakehouse architectures, Experience supporting defense, dual-use, or safety-critical AI systems, Background designing or operating AI Factory–style pipelines (data → training → evaluation → distillation → edge deployment), Advanced degree in engineering, ML/AI, robotics, or a related field</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Shield AI</Employername>
      <Employerlogo>https://logos.yubhub.co/shield.ai.png</Employerlogo>
      <Employerdescription>Shield AI is a venture-backed deep-tech company founded in 2015, developing intelligent systems to protect service members and civilians.</Employerdescription>
      <Employerwebsite>https://www.shield.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.lever.co/shieldai/7886f437-2d5e-4616-8dcb-3dc488f1f585</Applyto>
      <Location>San Diego</Location>
      <Country></Country>
      <Postedate>2026-04-17</Postedate>
    </job>
    <job>
      <externalid>d2256e99-10a</externalid>
      <Title>Research Engineer, Machine Learning</Title>
      <Description><![CDATA[<p>About Mistral AI</p>
<p>Mistral AI is a pioneering company shaping the future of AI. They believe in the power of AI to simplify tasks, save time, and enhance learning and creativity.</p>
<p>Role Summary</p>
<p>The Research Engineering team at Mistral AI spans Platform (shared infra &amp; clean code) and Embedded (inside research squads). Engineers can move along the research↔production spectrum as needs or interests evolve. As a Research Engineer – ML track, you’ll build and optimise the large-scale learning systems that power their open-weight models.</p>
<p>Responsibilities</p>
<ul>
<li>Accelerate researchers by taking on the heavy parts of large-scale ML pipelines and building robust tools.</li>
<li>Interface cutting-edge research with production: integrate checkpoints, streamline evaluation, and expose APIs.</li>
<li>Conduct experiments on the latest deep-learning techniques (sparsified 70B+ runs, distributed training on thousands of GPUs).</li>
<li>Design, implement and benchmark ML algorithms; write clear, efficient code in Python.</li>
<li>Deliver prototypes that become production-grade components for Le Chat and their enterprise API.</li>
</ul>
<p>Requirements</p>
<ul>
<li>Master’s or PhD in Computer Science (or equivalent proven track record).</li>
<li>4+ years working on large-scale ML codebases.</li>
<li>Hands-on with PyTorch, JAX or TensorFlow; comfortable with distributed training (DeepSpeed / FSDP / SLURM / K8s).</li>
<li>Experience in deep learning, NLP or LLMs; bonus for CUDA or data-pipeline chops.</li>
<li>Strong software-design instincts: testing, code review, CI/CD.</li>
<li>Self-starter, low-ego, collaborative.</li>
</ul>
<p>What we offer</p>
<ul>
<li>Competitive salary and equity.</li>
<li>Healthcare: Medical/Dental/Vision covered for you and your family.</li>
<li>Pension: 401K (6% matching)</li>
<li>PTO: 18 days</li>
<li>Transportation: Reimburse office parking charges, or $120/month for public transport</li>
<li>Sport: $120/month reimbursement for gym membership</li>
<li>Meal stipend: $400 monthly allowance for meals (this arrangement may evolve as the company grows)</li>
<li>Visa sponsorship</li>
<li>Coaching: they offer BetterUp coaching on a voluntary basis</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>PyTorch, JAX, TensorFlow, Distributed training, Deep learning, NLP, LLMs, CUDA, Data pipeline</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Mistral AI</Employername>
      <Employerlogo>https://logos.yubhub.co/mistral.ai.png</Employerlogo>
      <Employerdescription>Mistral AI develops and provides high-performance, open-source AI models, products, and solutions. Their comprehensive AI platform meets both enterprise and personal needs.</Employerdescription>
      <Employerwebsite>https://mistral.ai/careers</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.lever.co/mistral/bada0014-0f32-4370-b55f-81c5595c7339</Applyto>
      <Location>Palo Alto</Location>
      <Country></Country>
      <Postedate>2026-04-17</Postedate>
    </job>
    <job>
      <externalid>5c28c97d-fc5</externalid>
      <Title>Member of Technical Staff - Image / Video Generation</Title>
      <Description><![CDATA[<p><strong>Job Title</strong></p>
<p>Member of Technical Staff - Image / Video Generation</p>
<p><strong>Job Description</strong></p>
<p>We&#39;re the team behind Latent Diffusion, Stable Diffusion, and FLUX, foundational technologies that changed how the world creates images and video. We&#39;re creating the generative models that power how people make images and video, tools used by millions of creators, developers, and businesses worldwide. Our FLUX models are among the most advanced in the world, and we’re just getting started.</p>
<p><strong>Why This Role</strong></p>
<p>You&#39;ll train large-scale diffusion models for image and video generation, exploring new approaches while maintaining the rigor that helps us distinguish meaningful progress from incremental tweaks. This isn&#39;t about following established recipes; it&#39;s about running the experiments that clarify which architectural choices matter and which are less impactful.</p>
<p><strong>What You’ll Work On</strong></p>
<ul>
<li>Train large-scale diffusion transformer models for image and video data, working at the scale where intuitions break and empirical evidence matters</li>
<li>Rigorously ablate design choices, running experiments that isolate variables, control for confounds, and produce insights you can actually trust, then communicating those results to shape our research direction</li>
<li>Reason about the speed-quality tradeoffs of neural network architectures in production settings where both constraints matter simultaneously</li>
<li>Fine-tune diffusion models for specialized applications like image and video upscalers, inpainting/outpainting models, and other tasks where general-purpose models aren&#39;t enough</li>
</ul>
<p><strong>What We’re Looking For</strong></p>
<ul>
<li>You&#39;ve trained large-scale diffusion models and developed strong intuitions about what matters. You know that at research scale, every design choice has tradeoffs, and the only way to know which ones are worth making is through careful ablation. You&#39;re comfortable debugging distributed training issues and presenting research findings to the team.</li>
</ul>
<p><strong>Required Skills</strong></p>
<ul>
<li>Hands-on experience training large-scale diffusion models for image and video data, with practical knowledge of common failure modes and what matters most in training</li>
<li>Experience fine-tuning diffusion models for specialized applications: upscalers, inpainting, outpainting, or other tasks where understanding the domain matters as much as understanding the architecture</li>
<li>Deep understanding of how to effectively evaluate image and video generative models, knowing which metrics correlate with quality and which are just convenient proxies</li>
<li>Strong proficiency in PyTorch, transformer architectures, and the full ecosystem of modern deep learning</li>
<li>Solid understanding of distributed training techniques (FSDP, low-precision training, model parallelism), because our models don&#39;t fit on one GPU and training decisions impact research outcomes</li>
</ul>
<p><strong>Preferred Skills</strong></p>
<ul>
<li>Experience writing forward and backward Triton kernels and ensuring their correctness while considering floating point errors</li>
<li>Proficiency with profiling, debugging, and optimizing single and multi-GPU operations using tools like Nsight or stack trace viewers</li>
<li>Knowledge of the performance characteristics of different architectural choices at scale</li>
<li>Published research that contributed to how people think about generative models</li>
</ul>
<p><strong>How We Work Together</strong></p>
<p>We’re a distributed team with real offices that people actually use. Depending on your role, you’ll either join us in Freiburg or SF at least 2 days a week (or one full week every other week), or work remotely with a monthly in-person week to stay connected. We’ll cover reasonable travel costs to make this possible. We think in-person time matters, and we’ve structured things to make it accessible to all. We’ll discuss what this will look like for the role during our interview process.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>large-scale diffusion models, image and video data, PyTorch, transformer architectures, distributed training techniques, writing forward and backward Triton kernels, profiling, debugging, and optimizing single and multi-GPU operations, published research on generative models</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Black Forest Labs</Employername>
      <Employerlogo>https://logos.yubhub.co/blackforestlabs.com.png</Employerlogo>
      <Employerdescription>Black Forest Labs is a research lab developing foundational technologies for image and video generation. They have a growing presence in San Francisco and headquarters in Freiburg, Germany.</Employerdescription>
      <Employerwebsite>https://www.blackforestlabs.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/blackforestlabs/jobs/4132217008</Applyto>
      <Location>Freiburg (Germany)</Location>
      <Country></Country>
      <Postedate>2026-04-17</Postedate>
    </job>
    <job>
      <externalid>dd6ebd20-17d</externalid>
      <Title>Research Scientist, Gemini Diffusion</Title>
      <Description><![CDATA[<p>We&#39;re looking for a Research Scientist to join our team in London and help us accelerate our mission. As a Research Scientist, you will apply your deep scientific knowledge and research skills to advance paradigm-shifting research at a large scale. You will be at the heart of our efforts to deliver step-changes in the capabilities of our frontier models, with a significant focus on our Gemini Diffusion project.</p>
<p>Your work may involve brainstorming new disruptive ideas that could become the next generation of frontier AI models, particularly within the text diffusion space. You will prototype and develop these ideas with the rest of the team, contributing directly to Gemini Diffusion research. You will solve key research challenges by designing and executing experimental research on text diffusion models, sharing analyses, and proposing next steps. You will rigorously validate the theoretical and practical impact of our work at a large scale. You will work collaboratively with other Generative AI teams to move the technologies we develop out of the lab and into production. You will advance the fundamental architecture, algorithmic design, and capabilities of large-scale diffusion models. You will bring deep scientific expertise into our projects, sharing your insights and knowledge with other researchers and engineers.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Advanced degree in computer science, electrical engineering, science, mathematics, or equivalent experience, Academic research experience in machine learning, publications, or research experience in related fields, Experience with some or all of: LLMs, Transformers, Diffusion models, Text diffusion, Large-scale distributed training, Strong communication skills (via discussion, presentation, technical and research writing, whiteboarding, etc.), Programming experience, particularly with Python-based scientific libraries such as NumPy, SciPy, JAX, PyTorch, or TensorFlow, A track record of building software, either in open source or as part of a company product or research papers, Large-scale system design, distributed systems, Distributed computation for ML, especially in the context of accelerators (e.g., sharding, multi-host computation), C++ or broader programming experience, Data engineering and visualisation</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Google DeepMind</Employername>
      <Employerlogo>https://logos.yubhub.co/deepmind.com.png</Employerlogo>
      <Employerdescription>Google DeepMind is a technology company that focuses on artificial intelligence research and development.</Employerdescription>
      <Employerwebsite>https://deepmind.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/deepmind/jobs/7700399</Applyto>
      <Location>London, UK</Location>
      <Country></Country>
      <Postedate>2026-03-16</Postedate>
    </job>
    <job>
      <externalid>ea503adf-fac</externalid>
      <Title>Research Engineer, Machine Learning</Title>
      <Description><![CDATA[<p><strong>About the Role</strong></p>
<p>We are seeking a Research Engineer to join our Machine Learning team. As a Research Engineer, you will work on building and optimizing large-scale learning systems that power our open-weight models.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Accelerate researchers by taking on the heavy parts of large-scale ML pipelines and building robust tools.</li>
<li>Interface cutting-edge research with production: integrate checkpoints, streamline evaluation, and expose APIs.</li>
<li>Conduct experiments on the latest deep-learning techniques.</li>
<li>Design, implement and benchmark ML algorithms; write clear, efficient code in Python.</li>
<li>Deliver prototypes that become production-grade components for Le Chat and our enterprise API.</li>
</ul>
<p><strong>Requirements</strong></p>
<ul>
<li>Master&#39;s or PhD in Computer Science (or equivalent proven track record).</li>
<li>4+ years working on large-scale ML codebases.</li>
<li>Hands-on with PyTorch, JAX or TensorFlow; comfortable with distributed training (DeepSpeed / FSDP / SLURM / K8s).</li>
<li>Experience in deep learning, NLP or LLMs; bonus for CUDA or data-pipeline chops.</li>
<li>Strong software-design instincts: testing, code review, CI/CD.</li>
<li>Self-starter, low-ego, collaborative.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Competitive cash salary and equity</li>
<li>Food: Daily lunch vouchers</li>
<li>Sport: Monthly contribution to a Gympass subscription</li>
<li>Transportation: Monthly contribution to a mobility pass</li>
<li>Health: Full health insurance for you and your family</li>
<li>Parental: Generous parental leave policy</li>
<li>Visa sponsorship</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>PyTorch, JAX, TensorFlow, Distributed training, Deep learning, NLP, LLMs, CUDA, Data pipeline</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Mistral AI</Employername>
      <Employerlogo></Employerlogo>
      <Employerdescription>Mistral AI is an AI technology company that develops high-performance, open-source, and cutting-edge models, products, and solutions.</Employerdescription>
      <Employerwebsite>https://mistral.ai/careers</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.lever.co/mistral/07447e1d-7900-46d4-b61b-186f2f76847f</Applyto>
      <Location>Paris</Location>
      <Country></Country>
      <Postedate>2026-03-10</Postedate>
    </job>
    <job>
      <externalid>797494bd-994</externalid>
      <Title>Research Engineer, Machine Learning</Title>
      <Description><![CDATA[<p><strong>About Mistral AI</strong></p>
<p>Mistral AI is a pioneering company that develops and provides high-performance, open-source AI models, products, and solutions.</p>
<p><strong>Role Summary</strong></p>
<p>The Research Engineering team at Mistral AI spans Platform (shared infrastructure and clean code) and Embedded (inside research squads). Engineers can move along the research↔production spectrum as needs or interests evolve.</p>
<p>As a Research Engineer – ML track, you’ll build and optimize the large-scale learning systems that power our open-weight models. Working hand-in-hand with Research Scientists, you’ll either join:</p>
<ul>
<li>Platform RE Team: Enhance the shared training framework, data pipelines, and cluster tooling used by every team;</li>
<li>Embedded RE Team: Sit inside a research squad (Alignment, Pre-training, Multimodal, …) and turn fresh ideas into repeatable, scalable code.</li>
</ul>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Accelerate researchers by taking on the heavy parts of large-scale ML pipelines and building robust tools.</li>
<li>Interface cutting-edge research with production: integrate checkpoints, streamline evaluation, and expose APIs.</li>
<li>Conduct experiments on the latest deep-learning techniques (sparsified 70B+ runs, distributed training on thousands of GPUs).</li>
<li>Design, implement, and benchmark ML algorithms; write clear, efficient code in Python.</li>
<li>Deliver prototypes that become production-grade components for Le Chat and our enterprise API.</li>
</ul>
<p><strong>Requirements</strong></p>
<ul>
<li>Master’s or PhD in Computer Science (or equivalent proven track record).</li>
<li>4+ years working on large-scale ML codebases.</li>
<li>Hands-on with PyTorch, JAX, or TensorFlow; comfortable with distributed training (DeepSpeed / FSDP / SLURM / K8s).</li>
<li>Experience in deep learning, NLP, or LLMs; bonus for CUDA or data-pipeline chops.</li>
<li>Strong software-design instincts: testing, code review, CI/CD.</li>
<li>Self-starter, low-ego, collaborative.</li>
</ul>
<p><strong>What We Offer</strong></p>
<ul>
<li>Competitive salary and equity.</li>
<li>Healthcare: Medical/Dental/Vision covered for you and your family.</li>
<li>Pension: 401K (6% matching).</li>
<li>PTO: 18 days.</li>
<li>Transportation: Reimburse office parking charges, or $120/month for public transport.</li>
<li>Sport: $120/month reimbursement for gym membership.</li>
<li>Meal stipend: $400 monthly allowance for meals (solution might evolve as we grow bigger).</li>
<li>Visa sponsorship.</li>
<li>Coaching: we offer BetterUp coaching on a voluntary basis.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>PyTorch, JAX, TensorFlow, Distributed Training, Deep Learning, NLP, LLMs, CUDA, Data Pipelines</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Mistral AI</Employername>
      <Employerlogo></Employerlogo>
      <Employerdescription>Mistral AI develops and provides high-performance, open-source AI models, products, and solutions. The company has a diverse workforce distributed across multiple countries.</Employerdescription>
      <Employerwebsite>https://mistral.ai/careers</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.lever.co/mistral/bada0014-0f32-4370-b55f-81c5595c7339</Applyto>
      <Location>Palo Alto</Location>
      <Country></Country>
      <Postedate>2026-03-10</Postedate>
    </job>
    <job>
      <externalid>1060dfc7-676</externalid>
      <Title>Solution Architect, Computer Aided Engineering</Title>
      <Description><![CDATA[<p><strong>Solution Architect, Computer Aided Engineering</strong></p>
<p>We are looking for a Solution Architect with deep expertise in AI solutions to drive the efficient use of groundbreaking compute platforms across industries. As a trusted technical advisor to our CAE developers and customers, you will be responsible for embedding NVIDIA software into developers&#39; architectures and workflows.</p>
<p><strong>What you&#39;ll be doing:</strong></p>
<ul>
<li>Support Business Development and Sales teams as part of a team of 4, partnering with Industry Business leads, Account Managers, and Developer Relations managers to drive our developers&#39; ecosystem success.</li>
<li>Work directly with developers and customers in a customer-facing setting.</li>
<li>Support developers in adopting NVIDIA libraries and software frameworks as the foundation for modern AI and data platforms.</li>
<li>Analyze application architectures and find opportunities for acceleration.</li>
<li>Provide feedback and collaborate with engineering, product, and research teams.</li>
<li>Deliver trainings, hackathons, and technical demonstrations on NVIDIA solutions and platforms.</li>
</ul>
<p><strong>What we need to see:</strong></p>
<ul>
<li>A MS/PhD degree in Machine Learning, Computational Science, Physics, or a related technical field.</li>
<li>Minimum of 5 years of technical experience in Physics-Machine Learning.</li>
<li>Experience in engineering simulations (e.g. fluid dynamics, atmospheric science, Computer-Aided Engineering technologies).</li>
<li>Familiarity with accelerated computing platforms and GPU-based distributed systems.</li>
<li>Experience in algorithm programming using languages like Python and C/C++.</li>
<li>Development experience using major AI frameworks (e.g., PyTorch, TensorFlow, and similar tools).</li>
<li>Familiarity with containers, numerical libraries, modular software design, version control, GitHub.</li>
<li>Experience designing, prototyping, and building complex AI/ML-based solutions for customers.</li>
<li>Able to reason across components such as data pipelines, models, compute, networking, and orchestration.</li>
<li>Solid written and oral communication skills and familiarity with collaborative environments.</li>
<li>Great teammate who can learn, react, and adapt quickly, and who thrives in a fast-paced environment.</li>
</ul>
<p><strong>Ways to stand out from the crowd:</strong></p>
<ul>
<li>Development experience with NVIDIA software libraries and GPUs.</li>
<li>Experience with Kubernetes, distributed training, and large-scale inference.</li>
<li>Experience supporting or utilizing PCIe accelerators such as GPUs, FPGAs, DSPs from evaluation to production stages.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Machine Learning, Computational Science, Physics, Python, C/C++, PyTorch, TensorFlow, Containers, Numerical libraries, Modular software design, Version control, GitHub, Kubernetes, Distributed training, Large-scale inference, NVIDIA software libraries, GPU-based distributed systems</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>NVIDIA</Employername>
      <Employerlogo>https://logos.yubhub.co/nvidia.com.png</Employerlogo>
      <Employerdescription>NVIDIA is a technology company that has been transforming computer graphics, PC gaming, and accelerated computing for over 25 years. It has a legacy of innovation and a diverse range of products and services.</Employerdescription>
      <Employerwebsite>https://nvidia.wd5.myworkdayjobs.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/Switzerland-Remote/Solution-Architect--Computer-Aided-Engineering_JR2014310-1</Applyto>
      <Location></Location>
      <Country></Country>
      <Postedate>2026-03-09</Postedate>
    </job>
    <job>
      <externalid>e4704a60-8d4</externalid>
      <Title>Research Engineer / Research Scientist, Pre-training</Title>
      <Description><![CDATA[<p><strong>About Anthropic</strong></p>
<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>
<p><strong>About the team</strong></p>
<p>We are seeking passionate Research Scientists and Engineers to join our growing Pre-training team in Zurich. We are involved in developing the next generation of large language models. The team primarily focuses on multimodal capabilities: giving LLMs the ability to understand and interact with modalities other than text.</p>
<p>In this role, you will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems.</p>
<p><strong>Responsibilities</strong></p>
<p>In this role you will interact with many parts of the engineering and research stacks.</p>
<ul>
<li>Conduct research and implement solutions in areas such as model architecture, algorithms, data processing, and optimizer development</li>
<li>Independently lead small research projects while collaborating with team members on larger initiatives</li>
<li>Design, run, and analyse scientific experiments to advance our understanding of large language models</li>
<li>Optimise and scale our training infrastructure to improve efficiency and reliability</li>
<li>Develop and improve dev tooling to enhance team productivity</li>
<li>Contribute to the entire stack, from low-level optimisations to high-level model design</li>
</ul>
<p><strong>Qualifications &amp; Experience</strong></p>
<p>We encourage you to apply even if you do not believe you meet every single criterion. Because we focus on so many areas, the team is looking for both experienced engineers and strong researchers, and encourages anyone along the researcher/engineer spectrum to apply.</p>
<ul>
<li>Degree (BA required, MS or PhD preferred) in Computer Science, Machine Learning, or a related field</li>
<li>Strong software engineering skills with a proven track record of building complex systems</li>
<li>Expertise in Python and deep learning frameworks</li>
<li>Have worked on high-performance, large-scale ML systems, particularly in the context of language modelling</li>
<li>Familiarity with ML Accelerators, Kubernetes, and large-scale data processing</li>
<li>Strong problem-solving skills and a results-oriented mindset</li>
<li>Excellent communication skills and ability to work in a collaborative environment</li>
</ul>
<p><strong>You&#39;ll thrive in this role if you</strong></p>
<ul>
<li>Have significant software engineering experience</li>
<li>Are able to balance research goals with practical engineering constraints</li>
<li>Are happy to take on tasks outside your job description to support the team</li>
<li>Enjoy pair programming and collaborative work</li>
<li>Are eager to learn more about machine learning research</li>
<li>Are enthusiastic to work at an organisation that functions as a single, cohesive team pursuing large-scale AI research projects</li>
<li>Have ambitious goals for AI safety and general progress in the next few years, and you’re excited to create the best outcomes over the long-term</li>
</ul>
<p><strong>Sample Projects</strong></p>
<ul>
<li>Optimising the throughput of novel attention mechanisms</li>
<li>Proposing Transformer variants, and experimentally comparing their performance</li>
<li>Preparing large-scale datasets for model consumption</li>
<li>Scaling distributed training jobs to thousands of accelerators</li>
<li>Designing fault tolerance strategies for training infrastructure</li>
<li>Creating interactive visualisations of model internals, such as attention patterns</li>
</ul>
<p>If you&#39;re excited about pushing the boundaries of AI while prioritising safety and ethics, we want to hear from you!</p>
<p><strong>Logistics</strong></p>
<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>
<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>
<p><strong>How we&#39;re different</strong></p>
<p>We believe that the highest-impact work in AI safety and general progress in the next few years will be done by a single, cohesive team pursuing large-scale AI research projects. We&#39;re committed to creating a work environment that is inclusive, diverse, and supportive of our team members&#39; well-being and career growth.</p>
<p><strong>Career Growth</strong></p>
<p>We&#39;re committed to helping our team members grow and develop their careers. We offer opportunities for professional development, mentorship, and career advancement. We believe that our team members are the key to our success, and we&#39;re committed to supporting their growth and development.</p>
<p><strong>Benefits</strong></p>
<p>We offer a competitive salary and benefits package, including health insurance, retirement savings, and paid time off. We also offer a range of perks, including a generous parental leave policy, flexible work arrangements, and access to cutting-edge technology and tools.</p>
<p><strong>How to Apply</strong></p>
<p>If you&#39;re excited about joining our team and contributing to the development of safe, steerable, and trustworthy AI systems, please submit your application. We can&#39;t wait to hear from you!</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>CHF280,000 - CHF680,000</Salaryrange>
      <Skills>Python, Deep learning frameworks, Machine learning, Software engineering, Kubernetes, ML Accelerators, Large-scale data processing, Transformer variants, Attention mechanisms, Distributed training jobs, Fault tolerance strategies, Interactive visualisations</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a company that aims to create reliable, interpretable, and steerable AI systems. It has a team of researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</Employerdescription>
      <Employerwebsite>https://job-boards.greenhouse.io</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5135168008</Applyto>
      <Location>Zürich</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>9c72720b-6af</externalid>
      <Title>Research Engineer, Science of Scaling</Title>
      <Description><![CDATA[<p><strong>About Anthropic</strong></p>
<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>
<p><strong>About the role</strong></p>
<p>Anthropic is seeking a Research Engineer/Scientist to join the Science of Scaling team, responsible for developing the next generation of large language models. In this role, you will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems. You&#39;ll contribute across the entire stack, from low-level optimizations to high-level algorithm and experimental design, balancing research goals with practical engineering constraints.</p>
<p><strong>Responsibilities:</strong></p>
<ul>
<li>Conduct research into the science of converting compute into intelligence</li>
<li>Independently lead small research projects while collaborating with team members on larger initiatives</li>
<li>Design, run, and analyse scientific experiments to advance our understanding of large language models</li>
<li>Optimise training infrastructure to improve efficiency and reliability</li>
<li>Develop dev tooling to enhance team productivity</li>
</ul>
<p><strong>You may be a good fit if you:</strong></p>
<ul>
<li>Have significant software engineering experience and a proven track record of building complex systems</li>
<li>Hold an advanced degree (MS or PhD) in Computer Science, Machine Learning, or a related field</li>
<li>Are proficient in Python and experienced with deep learning frameworks</li>
<li>Are results-oriented with a bias towards flexibility and impact</li>
<li>Enjoy pair programming and collaborative work, and are willing to take on tasks outside your job description to support the team</li>
<li>View research and engineering as two sides of the same coin, seeking to understand all aspects of the research program to maximise impact</li>
<li>Care about the societal impacts of your work and have ambitious goals for AI safety and general progress</li>
</ul>
<p><strong>Strong candidates may have:</strong></p>
<ul>
<li>Experience with JAX</li>
<li>Experience with reinforcement learning</li>
<li>Experience working on high-performance, large-scale ML systems</li>
<li>Familiarity with accelerators, Kubernetes, and OS internals</li>
<li>Experience with language modeling using transformer architectures</li>
<li>Background in large-scale ETL processes</li>
<li>Experience with distributed training at scale (thousands of accelerators)</li>
</ul>
<p><strong>Strong candidates need not have:</strong></p>
<ul>
<li>Experience in all of the above areas — we value breadth of interest and willingness to learn over checking every box</li>
<li>Prior work specifically on language models or transformers; strong engineering fundamentals and ML knowledge transfer well</li>
<li>An advanced degree — exceptional engineers with strong research instincts are equally encouraged to apply</li>
</ul>
<p><strong>Logistics</strong></p>
<ul>
<li>Education requirements: We require at least a Bachelor&#39;s degree in a related field or equivalent experience.</li>
<li>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>
<li>Visa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>
</ul>
<p><strong>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</strong></p>
<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</strong></p>
<p><strong>How we&#39;re different</strong></p>
<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>
<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>£260,000 - £630,000</Salaryrange>
      <Skills>software engineering, Python, deep learning frameworks, JAX, reinforcement learning, high-performance, large-scale ML systems, accelerators, Kubernetes, OS internals, language modeling using transformer architectures, large-scale ETL processes, distributed training at scale</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a company that aims to create reliable, interpretable, and steerable AI systems. It has a quickly growing team of researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</Employerdescription>
      <Employerwebsite>https://job-boards.greenhouse.io</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5126127008</Applyto>
      <Location>London, UK</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>58928a28-64d</externalid>
      <Title>Research Engineer/Research Scientist, Audio</Title>
      <Description><![CDATA[<p><strong>About Anthropic</strong></p>
<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</p>
<p><strong>You may be a good fit if you:</strong></p>
<ul>
<li>Have hands-on experience with training audio models, whether that&#39;s conversational speech-to-speech, speech translation, speech recognition, text-to-speech, diarization, codecs, or generative audio models</li>
<li>Genuinely enjoy both research and engineering work, and you&#39;d describe your ideal split as roughly 50/50 rather than heavily weighted toward one or the other</li>
<li>Are comfortable working across abstraction levels, from signal processing fundamentals to large-scale model training and inference optimization</li>
<li>Have deep expertise with JAX, PyTorch, or large-scale distributed training, and can debug performance issues across the full stack</li>
<li>Thrive in fast-moving environments where the most important problem might shift as we learn more about what works</li>
<li>Communicate clearly and collaborate effectively; audio touches many parts of our systems, so you&#39;ll work closely with teams across the company</li>
<li>Are passionate about building conversational AI that feels natural, steerable, and safe</li>
<li>Care about the societal impacts of voice AI and want to help shape how these systems are developed responsibly</li>
</ul>
<p><strong>Strong candidates may also have experience with:</strong></p>
<ul>
<li>Large language model pretraining and finetuning</li>
<li>Training diffusion models for image and audio generation</li>
<li>Reinforcement learning for large language models and diffusion models</li>
<li>End-to-end system optimization, from performance benchmarking to kernel optimization</li>
<li>GPUs, Kubernetes, PyTorch, or distributed training infrastructure</li>
</ul>
<p><strong>Representative projects:</strong></p>
<ul>
<li>Training state-of-the-art neural audio codecs for 48 kHz stereo audio</li>
<li>Developing novel algorithms for diffusion pretraining and reinforcement learning</li>
<li>Scaling audio datasets to millions of hours of high quality audio</li>
<li>Creating robust evaluation methodologies for hard-to-measure qualities such as naturalness or expressiveness</li>
<li>Studying training dynamics of mixed audio-text language models</li>
<li>Optimizing latency and inference throughput for deployed streaming audio systems</li>
</ul>
<p><strong>Logistics</strong></p>
<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>
<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>
<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong> Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>
<p><strong>Your safety matters to us.</strong> To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</p>
<p><strong>How we&#39;re different</strong></p>
<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI systems that benefit society.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$350,000 - $500,000 USD</Salaryrange>
      <Skills>audio models, speech-to-speech, speech translation, speech recognition, text-to-speech, diarization, codecs, generative audio models, JAX, PyTorch, large-scale distributed training, large language model pretraining, training diffusion models, reinforcement learning, end-to-end system optimization, GPUs, Kubernetes, distributed training infrastructure</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a quickly growing organisation that aims to create reliable, interpretable, and steerable AI systems. The company&apos;s mission is to make AI safe and beneficial for users and society as a whole.</Employerdescription>
      <Employerwebsite>https://job-boards.greenhouse.io</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5074815008</Applyto>
      <Location>San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>cfee4a87-9c7</externalid>
      <Title>Member of Technical Staff, Multimodal Infrastructure</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft AI is looking for a talented Member of Technical Staff, Multimodal Infrastructure to help build the next wave of capabilities of our personalized AI assistant, Copilot. We&#39;re looking for someone who will bring an abundance of positive energy, empathy, and kindness to the team every day, in addition to being highly effective.</p>
<p><strong>About the Role</strong></p>
<p>We&#39;re looking for someone who will design, develop, and maintain large-scale multimodal data processing pipelines, model pretraining and post-training frameworks, and model inference and serving frameworks. You will work closely with research scientists and product engineers on multimodal data processing, model training, inference, and serving tasks. As a contributing member of the core group of engineers, you will also bring best practices to the table, drive architectural changes, and influence the roadmap of relevant software and hardware components.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Design, develop and maintain large-scale multimodal data processing pipelines.</li>
<li>Design, develop and maintain large-scale multimodal model pretraining and post-training frameworks.</li>
<li>Design, develop and maintain large-scale multimodal model inference and serving frameworks.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>Bachelor&#39;s Degree in Computer Science, or related technical discipline AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Strong proficiency in distributed data processing infrastructure (resource utilization management, fault tolerance, Ray &amp; Spark) and CPU/GPU batch processing optimizations.</li>
<li>Experience with state-of-the-art model inference and serving frameworks.</li>
<li>Experience with image/video/audio data processing.</li>
<li>Experience with common data formats for efficient I/O.</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Enjoy working in a fast-paced, design-driven, product development cycle.</li>
<li>Embody our Culture and Values.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location.</li>
<li>Comprehensive health and wellbeing benefits.</li>
<li>Professional development opportunities.</li>
<li>Financial benefits (bonus, equity, pension, etc.).</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>C, C++, C#, Java, JavaScript, Python, Distributed data processing infra, CPU/GPU batch processing optimizations, State-of-art model inference and serving frameworks, Image/video/audio data processing, Common data formats for efficient I/O, Deep learning frameworks, Auto-regressive and diffusion transformer models, Distributed training techniques, Image/video generation and editing, Efficient architectures, Efficient model design, Reinforcement learning training methods</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft AI is a leading technology company that specializes in artificial intelligence and machine learning. They are known for their innovative products and services that aim to make a positive impact on people&apos;s lives. Microsoft AI is committed to advancing the field of AI and making it more accessible to everyone.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/member-of-technical-staff-multimodal-infrastructure-mai-superintelligence-team-2/</Applyto>
      <Location>Redmond</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>a82f064b-623</externalid>
      <Title>Member of Technical Staff, Multimodal Infrastructure</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft AI is looking for a talented Member of Technical Staff, Multimodal Infrastructure to help build the next wave of capabilities of our personalized AI assistant, Copilot. We’re looking for someone who will bring an abundance of positive energy, empathy, and kindness to the team every day, in addition to being highly effective.</p>
<p><strong>About the Role</strong></p>
<p>As a Member of Technical Staff, Multimodal Infrastructure, you will be responsible for designing, developing, and maintaining large-scale multimodal data processing pipelines, model pretraining and post-training frameworks, and model inference and serving frameworks. You will work closely with research scientists and product engineers to solve infrastructure problems, finding a path past roadblocks to get your work into the hands of users quickly and iteratively.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Design, develop, and maintain large-scale multimodal data processing pipelines.</li>
<li>Design, develop, and maintain large-scale multimodal model pretraining and post-training frameworks.</li>
<li>Design, develop, and maintain large-scale multimodal model inference and serving frameworks.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>Bachelor’s Degree in Computer Science, or related technical discipline AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Strong proficiency in distributed data processing infrastructure (resource utilization management, fault tolerance, Ray &amp; Spark) and CPU/GPU batch processing optimizations.</li>
<li>Experience with state-of-the-art model inference and serving frameworks.</li>
<li>Experience with image/video/audio data processing.</li>
<li>Experience with common data formats for efficient I/O.</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Enjoy working in a fast-paced, design-driven, product development cycle.</li>
<li>Embody our Culture and Values.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location.</li>
<li>Comprehensive health and wellbeing benefits.</li>
<li>Professional development opportunities.</li>
<li>Financial benefits (bonus, equity, pension, etc.).</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>C, C++, C#, Java, JavaScript, Python, Distributed data processing infra, CPU/GPU batch processing optimizations, State-of-art model inference and serving frameworks, Image/video/audio data processing, Common data formats for efficient I/O, Auto-regressive and diffusion transformer models, Distributed training techniques, Image/video generation and editing, Efficient architectures, Efficient model design, Reinforcement learning training methods</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft AI is a leading technology company that specializes in artificial intelligence and machine learning. They are known for their innovative products and services that aim to make a positive impact on people&apos;s lives. Microsoft AI is a subsidiary of Microsoft Corporation, a multinational technology company that was founded in 1975.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/member-of-technical-staff-multimodal-infrastructure-mai-superintelligence-team/</Applyto>
      <Location>Mountain View</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>05ee2dc4-b1b</externalid>
      <Title>Principal Applied Scientist</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft AI is looking for a talented Principal Applied Scientist for its Hyderabad office. This role sits at the heart of shaping and improving sports experiences on Bing.</p>
<p><strong>About the Role</strong></p>
<p>We are seeking an experienced, self-directed Principal Applied Scientist to shape and improve sports experiences on Bing. In this individual contributor role, you will own end-to-end outcomes: turn ambiguous product questions into measurable hypotheses, define success metrics, design experiments, and ship data-driven ML solutions that move customer and business KPIs. You will apply modern NLP/IR and multimodal methods—including training and adapting Large Language Models (LLMs) and Small Language Models (SLMs)—to deliver accurate, fresh, and helpful sports answers and discovery experiences at global scale.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Define north-star metrics and guardrails; build measurement plans, offline scorecards, and online A/B tests; interpret results and drive clear ship/iterate decisions.</li>
<li>Build, train, and adapt LLM/SLM solutions for sports scenarios (prompting, supervised fine-tuning, distillation, and domain adaptation), using disciplined evaluation and error analysis to improve quality, latency, and cost.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>6+ years related experience (e.g., statistics, predictive analytics, research).</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Proficiency in Python (and one of C++/C#/Java preferred) and deep learning frameworks (e.g., PyTorch, TensorFlow); experience with distributed training/inference is a plus.</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Demonstrated ability to lead through influence as a senior IC: independently define strategy, drive execution across teams, and deliver measurable impact.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Starting January 26, 2026, Microsoft AI employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>statistics, predictive analytics, research, Python, deep learning frameworks, distributed training/inference, publications, patents, open-source contributions</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft&apos;s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/principal-applied-scientist-8/</Applyto>
      <Location>Hyderabad</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
    <job>
      <externalid>7decb2ea-8dc</externalid>
      <Title>Internship - Machine Learning Research Engineer</Title>
      <Description><![CDATA[<p>We are seeking a highly motivated and talented Machine Learning Research Engineer to join our team in Berlin. As a member of our research team, you will be responsible for developing and implementing new machine learning models and algorithms to improve the performance of our search and retrieval systems.</p>
<p><strong>What you&#39;ll do</strong></p>
<ul>
<li>Relentlessly push search quality forward — through models, data, tools, or any other leverage available.</li>
<li>Train and optimize large-scale deep learning models using frameworks like PyTorch, leveraging distributed training (e.g., PyTorch Distributed, DeepSpeed, FSDP) and hardware acceleration, with a focus on retrieval and ranking models.</li>
<li>Conduct research in representation learning, including contrastive learning, multilingual and multimodal modeling, and evaluation for search and retrieval.</li>
<li>Build and optimize RAG pipelines for grounding and answer generation.</li>
</ul>
<p><strong>What you need</strong></p>
<ul>
<li>Understanding of search and retrieval systems, including quality evaluation principles and metrics.</li>
<li>Strong proficiency with PyTorch, including experience in distributed training techniques and performance optimization for large models.</li>
<li>Interest in representation learning, including contrastive learning, dense &amp; sparse vector representations, representation fusion, cross-lingual representation alignment, training data optimization, and robust evaluation.</li>
<li>Publication record in AI/ML conferences or workshops (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, SIGIR).</li>
</ul>
<p><strong>Why this matters</strong></p>
<p>As a Machine Learning Research Engineer at Perplexity, you will have the opportunity to work on cutting-edge projects that have a direct impact on the performance of our search and retrieval systems. Your contributions will help us to improve the accuracy and efficiency of our models, and ultimately, provide better results for our users.</p>
]]></Description>
      <Jobtype>internship</Jobtype>
      <Experiencelevel>entry</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>PyTorch, distributed training, representation learning, contrastive learning, dense &amp; sparse vector representations, representation fusion</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Perplexity</Employername>
      <Employerlogo>https://logos.yubhub.co/perplexity.com.png</Employerlogo>
      <Employerdescription>Perplexity is a leading AI company that provides innovative solutions for search and retrieval systems. With a strong focus on research and development, they aim to push the boundaries of what is possible in the field of artificial intelligence.</Employerdescription>
      <Employerwebsite>https://jobs.ashbyhq.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/perplexity/b9e1ff15-d52a-46d5-abf0-26460f2a116c</Applyto>
      <Location>Berlin</Location>
      <Country></Country>
      <Postedate>2026-03-04</Postedate>
    </job>
  </jobs>
</source>