<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>fe04c8cc-782</externalid>
      <Title>Forward Deployed Engineering Manager</Title>
      <Description><![CDATA[<p>Shape the Future of AI</p>
<p>At Labelbox, we&#39;re building the critical infrastructure that powers breakthrough AI models at leading research labs and enterprises. Since 2018, we&#39;ve been pioneering data-centric approaches that are fundamental to AI development, and our work becomes even more essential as AI capabilities expand exponentially.</p>
<p>We&#39;re the only company offering three integrated solutions for frontier AI development:</p>
<p>Enterprise Platform &amp; Tools: Advanced annotation tools, workflow automation, and quality control systems that enable teams to produce high-quality training data at scale</p>
<p>Frontier Data Labeling Service: Specialized data labeling through Alignerr, leveraging subject matter experts for next-generation AI models</p>
<p>Expert Marketplace: Connecting AI teams with highly skilled annotators and domain experts for flexible scaling</p>
<p>Why Join Us</p>
<p>High-Impact Environment: We operate like an early-stage startup, focusing on impact over process. You&#39;ll take on expanded responsibilities quickly, with career growth directly tied to your contributions.</p>
<p>Technical Excellence: Work at the cutting edge of AI development, collaborating with industry leaders and shaping the future of artificial intelligence.</p>
<p>Innovation at Speed: We celebrate those who take ownership, move fast, and deliver impact. Our environment rewards high agency and rapid execution.</p>
<p>Continuous Growth: Every role requires continuous learning and evolution. You&#39;ll be surrounded by curious minds solving complex problems at the frontier of AI.</p>
<p>Clear Ownership: You&#39;ll know exactly what you&#39;re responsible for and have the autonomy to execute. We empower people to drive results through clear ownership and metrics.</p>
<p>The role</p>
<p>We’re hiring a Forward Deployed Engineering Manager to lead the design, development, and delivery of reinforcement learning environments for agentic AI systems.</p>
<p>You’ll manage a team responsible for building sandboxed, reproducible environments (terminal-based workflows, browser automation, and computer-use simulations) that power both model training and human-in-the-loop evaluation. This is a hands-on leadership role where you’ll set technical direction, guide execution, and stay close to architecture and critical systems.</p>
<p>What You’ll Do</p>
<p>Lead, hire, and develop a high-performing team of Forward Deployed Engineers, setting a high bar for ownership, velocity, and technical quality</p>
<p>Own the RL environment roadmap, aligning team execution with customer needs and evolving model capabilities</p>
<p>Oversee development of sandboxed environments (terminal, browser, tool-augmented workspaces) that support deterministic execution and multi-step agent interaction</p>
<p>Ensure reliability, observability, and data integrity through strong instrumentation (logging, trajectory capture, state snapshotting)</p>
<p>Drive infrastructure excellence across containerization, sandboxing, CI/CD, automated testing, and monitoring</p>
<p>Partner cross-functionally with data operations, product, and leading AI labs to define task design, evaluation protocols, and environment requirements</p>
<p>Enable rapid prototyping and iteration, helping the team move from ambiguous requirements to production-ready systems quickly</p>
<p>Stay close to the technical details: reviewing architecture, unblocking complex issues, and guiding design decisions</p>
<p>What We’re Looking For</p>
<p>5+ years of software engineering experience (Python)</p>
<p>2+ years of experience managing or leading engineers in fast-paced environments</p>
<p>Strong experience with containerization and sandboxing (Docker, Firecracker, or similar)</p>
<p>Solid understanding of reinforcement learning fundamentals (MDPs, reward design, episode structure, observation/action spaces)</p>
<p>Background in infrastructure, developer tooling, or distributed systems</p>
<p>Strong debugging skills and systems thinking across layered, containerized environments</p>
<p>Ability to operate in ambiguity and translate loosely defined problems into clear execution plans</p>
<p>Excellent communication and stakeholder management skills</p>
<p>Preferred</p>
<p>Experience building or working with RL environments (Gym, PettingZoo) or agent benchmarks (SWE-bench, WebArena, OSWorld, TerminalBench)</p>
<p>Familiarity with cloud infrastructure (GCP or AWS)</p>
<p>Prior experience in AI/ML platforms, data companies, or research environments</p>
<p>Contributions to open-source projects in RL, agents, or developer tooling</p>
<p>Why This Role Matters</p>
<p>RL environment quality is a critical bottleneck in advancing agentic AI. Poorly designed or unreliable environments introduce noise into training loops and directly impact model performance.</p>
<p>In this role, you’ll lead the team building the environments that define how models learn, working across a range of cutting-edge projects with leading AI labs. Alignerr offers the speed and ownership of a startup with the scale and resources of Labelbox, giving you the opportunity to have outsized impact on the future of AI.</p>
<p>About Alignerr</p>
<p>Alignerr is Labelbox’s human data organization, powering next-generation AI through high-quality training data, reinforcement learning environments, and evaluation systems. We partner directly with leading AI labs to build the data and infrastructure that push model capabilities forward.</p>
<p>Life at Labelbox</p>
<p>Location: Join our dedicated tech hubs in San Francisco or Wrocław, Poland</p>
<p>Work Style: Hybrid model with 2 days per week in office, combining collaboration and flexibility</p>
<p>Environment: Fast-paced and high-intensity, perfect for ambitious individuals who thrive on ownership and quick decision-making</p>
<p>Growth: Career advancement opportunities directly tied to your impact</p>
<p>Vision: Be part of building the foundation for humanity&#39;s most transformative technology</p>
<p>Our Vision</p>
<p>We believe data will remain crucial in achieving artificial general intelligence. As AI models become more sophisticated, the need for high-quality, specialized training data will only grow. Join us in developing new products and services that enable the next generation of AI breakthroughs.</p>
<p>Labelbox is backed by leading investors including SoftBank, Andreessen Horowitz, B Capital, Gradient Ventures, Databricks Ventures, and Kleiner Perkins. Our customers include Fortune 500 enterprises and leading AI labs.</p>
<p>Any emails from Labelbox team members will originate from a @labelbox.com email address. If you encounter anything that raises suspicions during your interactions, we encourage you to exercise caution and suspend or discontinue communications.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$180,000-$220,000 USD</Salaryrange>
      <Skills>Software engineering experience (Python), Containerization and sandboxing (Docker, Firecracker, or similar), Reinforcement learning fundamentals (MDPs, reward design, episode structure, observation/action spaces), Infrastructure, developer tooling, or distributed systems, Debugging skills and systems thinking, Experience building or working with RL environments (Gym, PettingZoo) or agent benchmarks (SWE-bench, WebArena, OSWorld, TerminalBench), Familiarity with cloud infrastructure (GCP or AWS), Prior experience in AI/ML platforms, data companies, or research environments, Contributions to open-source projects in RL, agents, or developer tooling</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Labelbox</Employername>
      <Employerlogo>https://logos.yubhub.co/labelbox.com.png</Employerlogo>
      <Employerdescription>Labelbox is a data-centric AI development company that provides critical infrastructure for breakthrough AI models.</Employerdescription>
      <Employerwebsite>https://www.labelbox.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/labelbox/jobs/5101195007</Applyto>
      <Location>San Francisco Bay Area</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>c3599ca5-5e7</externalid>
      <Title>Research Engineer, Environment Scaling</Title>
      <Description><![CDATA[<p>About the role</p>
<p>The Environment Scaling team is a team of researchers and engineers whose goal is to improve the intelligence of our public models for novel verticals and use cases. The team builds the training environments that fuel RL at scale. This is a unique role that combines direct execution on ML research, data operations, and project management to improve our models.</p>
<p>Responsibilities:</p>
<ul>
<li>Improve and execute our fine-tuning strategies for adapting Claude to new domains and tasks</li>
<li>Manage technical relationships with external data vendors, including evaluation of data quality and reward design</li>
<li>Collaborate with domain experts to design data pipelines and evaluations</li>
<li>Explore novel ways of creating RL environments for high-value tasks</li>
<li>Develop and improve QA frameworks to catch reward hacking and ensure environment quality</li>
<li>Partner with other RL research teams and product teams to translate capability goals into training environments and evals</li>
</ul>
<p>You may be a good fit if you:</p>
<ul>
<li>Have experience with fine-tuning large language models for specific domains or real-world use cases, and/or domain expertise in an area where we would like to make our models more useful</li>
<li>Have experience with reinforcement learning, reward design, or training data curation for LLMs</li>
<li>Are comfortable managing technical vendor relationships and iterating quickly on feedback</li>
<li>Find value in reading through datasets to understand them and spot issues</li>
<li>Have strong project management and interpersonal skills</li>
<li>Are passionate about making AI more useful and accessible across different industries</li>
<li>Are excited about a role that includes a combination of ML research, data operations, and project management</li>
</ul>
<p>Strong candidates may also:</p>
<ul>
<li>Have experience training production ML systems</li>
<li>Be familiar with distributed systems and cloud infrastructure</li>
<li>Have domain expertise in an area where we would like to make our models more useful</li>
<li>Have experience working with external vendors or technical partners</li>
</ul>
<p>The annual compensation range for this role is $350,000-$850,000 USD.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$350,000-$850,000 USD</Salaryrange>
      <Skills>fine-tuning large language models, reinforcement learning, reward design, training data curation, project management, interpersonal skills, distributed systems, cloud infrastructure, domain expertise, external vendors, technical partners</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4951064008</Applyto>
      <Location>Remote-Friendly (Travel Required) | San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>912450ea-c61</externalid>
      <Title>Research Engineer, Environment Scaling</Title>
      <Description><![CDATA[<p><strong>About the role</strong></p>
<p>The Environment Scaling team is a team of researchers and engineers whose goal is to improve the intelligence of our public models for novel verticals and use cases. The team builds the training environments that fuel RL at scale. This is a unique role that combines direct execution on ML research, data operations, and project management to improve our models. You&#39;ll own the end-to-end process of creating RL environments for new capabilities: identifying high-value tasks, designing reward signals, managing vendor relationships, and measuring impact on model performance.</p>
<p><strong>Responsibilities:</strong></p>
<ul>
<li>Improve and execute our fine-tuning strategies for adapting Claude to new domains and tasks</li>
<li>Manage technical relationships with external data vendors, including evaluation of data quality and reward design</li>
<li>Collaborate with domain experts to design data pipelines and evaluations</li>
<li>Explore novel ways of creating RL environments for high-value tasks</li>
<li>Develop and improve QA frameworks to catch reward hacking and ensure environment quality</li>
<li>Partner with other RL research teams and product teams to translate capability goals into training environments and evals</li>
</ul>
<p><strong>You may be a good fit if you:</strong></p>
<ul>
<li>Have experience with fine-tuning large language models for specific domains or real-world use cases, and/or domain expertise in an area where we would like to make our models more useful</li>
<li>Have experience with reinforcement learning, reward design, or training data curation for LLMs</li>
<li>Are comfortable managing technical vendor relationships and iterating quickly on feedback</li>
<li>Find value in reading through datasets to understand them and spot issues</li>
<li>Have strong project management and interpersonal skills</li>
<li>Are passionate about making AI more useful and accessible across different industries</li>
<li>Are excited about a role that includes a combination of ML research, data operations, and project management</li>
</ul>
<p><strong>Strong candidates may also:</strong></p>
<ul>
<li>Have experience training production ML systems</li>
<li>Be familiar with distributed systems and cloud infrastructure</li>
<li>Have domain expertise in an area where we would like to make our models more useful</li>
<li>Have experience working with external vendors or technical partners</li>
</ul>
<p><strong>Logistics</strong></p>
<ul>
<li>Education requirements: We require at least a Bachelor&#39;s degree in a related field or equivalent experience.</li>
<li>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>
<li>Visa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>
</ul>
<p><strong>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</strong></p>
<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</strong></p>
<p><strong>How we&#39;re different</strong></p>
<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>
<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI &amp; Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.</p>
<p><strong>Come work with us!</strong></p>
<p>Anthropic is a public benefit corporation headquartered in San Francisco, CA.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
<Salaryrange>$350,000-$850,000 USD</Salaryrange>
      <Skills>fine-tuning large language models, reinforcement learning, reward design, training data curation, project management, interpersonal skills, experience training production ML systems, distributed systems and cloud infrastructure, domain expertise in an area where we would like to make our models more useful, experience working with external vendors or technical partners</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that aims to create reliable, interpretable, and steerable AI systems. The company has a quickly growing team of researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.</Employerdescription>
<Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/4951064008</Applyto>
      <Location>San Francisco, CA</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
  </jobs>
</source>