Applied Research Engineer

0e93287d-e38 Applied Research Engineer

Shape the Future of AI

At Labelbox, we're building the critical infrastructure that powers breakthrough AI models at leading research labs and enterprises. Since 2018, we've been pioneering data-centric approaches that are fundamental to AI development, and our work becomes even more essential as AI capabilities expand exponentially.

As an Applied Research Engineer at Labelbox, you will be at the forefront of developing cutting-edge systems and methods to create, analyze, and leverage high-quality human-in-the-loop data for frontier model developers. Your role will involve designing and implementing advanced systems that integrate human feedback into AI training processes, such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO). You will also work on innovative techniques to measure and improve human data quality, and develop AI-assisted tools to enhance the data labeling process.

Your Impact

Advance the field of AI alignment by developing cutting-edge methods, such as RLHF and novel approaches, that ensure AI systems reflect human preferences more accurately.

Improve the quality of human-in-the-loop data by designing and deploying rigorous measurement and enhancement systems, leading to more reliable AI training.

Increase efficiency and effectiveness in AI-assisted data labeling by creating tools that leverage active learning and adaptive sampling, reducing manual effort while improving accuracy.

Shape the next generation of AI models by investigating how different types of human feedback (e.g., demonstrations, preferences, critiques) impact model performance and alignment.

Optimize human feedback collection by developing novel algorithms that enhance how AI learns from human input, improving model adaptability and responsiveness.

Bridge research and real-world application by integrating breakthroughs into Labelbox’s product suite, making human-AI alignment techniques scalable and impactful for users.

Drive industry innovation by engaging with customers and the broader AI community to understand evolving data needs and share best practices for training frontier models.

Contribute to the AI research ecosystem by publishing in top-tier journals, presenting at leading conferences, and influencing the future of human-centric AI.

Stay ahead of AI advancements by continuously exploring new frontiers in human-AI collaboration, human data quality, and AI alignment, keeping Labelbox at the cutting edge.

Establish Labelbox as a thought leader in AI by creating technical documentation, blog posts, and educational content that shape the industry's approach to human-centric AI development.

What You Bring

A strong foundation in AI and machine learning, backed by a Ph.D. or Master’s degree in Computer Science, Machine Learning, AI, or a related field.

Proven experience (3+ years) in solving complex ML challenges and delivering impactful solutions that improve real-world AI applications.

Expertise in designing and implementing data quality measurement and refinement systems that directly enhance model performance and reliability.

A deep understanding of frontier AI models, such as large language models and multimodal models, and the human data strategies needed to optimize them.

Proficiency in Python and experience with deep learning frameworks like PyTorch, JAX, or TensorFlow to prototype and develop cutting-edge solutions.

A track record of publishing in top-tier AI/ML conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL) and contributing to the broader research community.

The ability to bridge research and application by interpreting new findings and rapidly translating them into functional prototypes.

Strong analytical and problem-solving skills that enable you to tackle ambiguous AI challenges with structured, data-driven approaches.

Exceptional communication and collaboration skills, allowing you to work effectively across multidisciplinary teams and with external stakeholders.

Labelbox Applied Research

At Labelbox Applied Research, we're committed to pushing the boundaries of AI and data-centric machine learning, with a particular focus on advanced human-AI interaction techniques. We believe that high-quality human data and sophisticated human feedback integration methods are key to unlocking the next generation of AI capabilities. Our research team works at the intersection of machine learning, human-computer interaction, and AI ethics to develop innovative solutions that can be practically applied in real-world scenarios.

We foster an environment of intellectual curiosity, collaboration, and innovation. We encourage our researchers to explore new ideas, engage in open discussions, and contribute to the wider AI community through publications and conference presentations. Our goal is to be at the forefront of human-centric AI development, setting new standards for how AI systems learn from and interact with humans.

XML job scraping automation by YubHub

full-time senior hybrid $250,000-$300,000 USD AI, Machine Learning, Deep Learning, Python, PyTorch, JAX, TensorFlow, Data Quality Measurement, Refinement Systems, Human-AI Interaction, Frontier AI Models, Large Language Models, Multimodal Models Engineering Technology Labelbox https://logos.yubhub.co/labelbox.com.png Labelbox is a software company that provides a platform for data-centric AI development. https://www.labelbox.com/ https://job-boards.greenhouse.io/labelbox/jobs/4640965007 San Francisco Bay Area 2026-04-18

557894f1-074 Prompt Engineer, Agent Prompts & Evals

About Anthropic

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the Role

We’re looking for prompt and context engineers to join our product engineering team to help build AI-first products, features, and evaluations. Your mission will be to bridge the gap between model capabilities and real product experience, working with product teams to build consistent, safe, and beneficial user experiences across all product surfaces.

You will be deeply involved in new product feature and model releases at Anthropic, combining engineering expertise with an understanding of frontier AI applications and model quality. You’ll become an expert on Claude’s behavioural quirks and capabilities and apply that knowledge to deliver the best possible user experience across models and domains. You’ll be the first resource for product teams working on Claude’s AI infrastructure: system prompts, tool prompts, skills, and evaluations.

This role requires someone who can effectively balance caring deeply about making Claude the best it can be while also supporting a wide variety of concurrent projects and efforts across many product teams.

Key Responsibilities

Prompt Engineering Excellence: Design, test, and optimise system prompts and feature-specific prompts that shape Claude’s behaviour across consumer and API products.

Evaluation Development: Build and maintain comprehensive evaluation suites that ensure model quality and consistency across product launches and updates.

Cross-functional Collaboration: Partner closely with product teams, research teams, and safeguards to ensure new features meet quality and safety standards.

Model Launch Support: Play a critical role in model releases, ensuring smooth rollouts and catching regressions before they impact users.

Infrastructure Contribution: Help build and improve the frameworks and tools that allow teams to develop and test prompts and features with confidence.

Knowledge Transfer: Mentor product engineers on prompt engineering best practices and help teams build their first evaluations.

Rapid Iteration: Work in a fast-paced environment where model capabilities advance daily, requiring quick adaptation and creative problem-solving.

What We’re Looking For

Required Qualifications

5+ years of software engineering experience with Python or similar languages.

Demonstrated experience with LLMs and prompt engineering (through work, research, or significant personal projects).

Strong understanding of evaluation methodologies and metrics for AI systems.

Excellent written and verbal communication skills – you’ll need to explain complex model behaviours to diverse stakeholders.

Ability to manage multiple concurrent projects and prioritise effectively.

Experience with version control, CI/CD, and modern software development practices.

Preferred Qualifications

Experience with Claude or other frontier AI models in production settings.

Background in machine learning, NLP, or related fields.

Experience with A/B testing and experimentation frameworks (e.g., Statsig).

Familiarity with AI safety and alignment considerations.

Experience building tools and infrastructure for ML/AI workflows.

Track record of improving AI system performance through systematic evaluation and iteration.

You Might Thrive in This Role If You…

Get excited about the nuances of how language models behave and love finding creative ways to improve their outputs.

Enjoy being at the intersection of research and product, translating cutting-edge capabilities into user value.

Are comfortable with ambiguity and can define success metrics for novel AI features.

Have a strong sense of ownership and drive projects from conception to production.

Are passionate about building AI systems that are helpful, harmless, and honest.

Thrive in collaborative environments and enjoy teaching others.

Logistics

Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.

Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

Visa sponsorship: We do sponsor visas! However, we aren’t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you’re interested in this work.

Your safety matters to us. To protect yourself from potential scams, we want to remind you that we will never ask you to pay any fees for the hiring process. If someone contacts you claiming to be from Anthropic and asks for money, please report it to us immediately.

full-time senior hybrid $320,000 - $405,000 USD Python, LLMs, Prompt engineering, Evaluation methodologies, Metrics for AI systems, Version control, CI/CD, Modern software development practices, Claude, Frontier AI models, Machine learning, NLP, A/B testing, Experimentation frameworks, AI safety, Alignment considerations, Tools and infrastructure for ML/AI workflows Engineering Technology Anthropic https://logos.yubhub.co/anthropic.com.png Anthropic is a quickly growing organisation with a mission to create reliable, interpretable, and steerable AI systems. Their team is a group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. https://job-boards.greenhouse.io https://job-boards.greenhouse.io/anthropic/jobs/5107121008 San Francisco, CA | New York City, NY 2026-03-08

dbcceacb-d90 Model Behavior Architect

We're looking for a Model Behavior Architect to help build Perplexity's AI products and evaluations. You'll sit within our AI team and collaborate closely with research and product teams, designing prompt and context engineering strategies to deliver high quality user experiences across multiple domains and models.

What you'll do

Context Engineering: Design, test, and optimize context strategies and system prompts that shape answer engine behavior across products, features, and use cases.
Evaluation Systems: Build automated and semi-automated evaluation pipelines that measure model quality, catch regressions, and scale across product surfaces.
Model Launch Support: Partner with research and engineering to validate model behavior before and during rollouts, ensuring smooth transitions with no degradation.
Research & Analysis: Identify inconsistencies and failure modes in model outputs through well-designed research projects — for both internal and production-facing systems.
Cross-functional Collaboration: Work closely with design, product, and research teams to translate product goals into concrete model behavior requirements.
Knowledge Sharing: Help engineers across teams build intuition for prompt design, context engineering, and evaluation best practices.
Staying Current: Track the latest alignment, evaluation, and prompting techniques from industry and academia, and bring the best ideas back to the team.

What you need

Experience designing evaluations, benchmarks, or metrics for AI systems.
Strong written and verbal communication skills, particularly in explaining complex concepts to diverse stakeholders.
Ability to manage multiple concurrent projects in a fast-moving environment.
Strong experience with Perplexity or other frontier AI models in production settings.
Demonstrated experience with Python — you'll prototype, debug, automate, and build systems at scale.
3+ years of experience working with LLMs in a product or research setting.

full-time senior onsite $180K - $270K experience designing evaluations, benchmarks, or metrics for AI systems, strong written and verbal communication skills, ability to manage multiple concurrent projects, strong experience with Perplexity or other frontier AI models, demonstrated experience with Python, 3+ years of experience working with LLMs, experience with A/B testing or experimentation frameworks, track record of improving AI system performance through systematic evaluation and iteration Engineering Technology Perplexity https://logos.yubhub.co/perplexity.com.png Perplexity is a leading AI company that specializes in building and evaluating AI products. They are known for their cutting-edge technology and innovative approach to AI development. https://jobs.ashbyhq.com https://jobs.ashbyhq.com/perplexity/9904db61-b8ca-4207-8f93-88ab6f0cd3fd San Francisco 2026-03-04