Research Engineer — Reinforcement Learning

You'll bring reinforcement learning to Firecrawl's core product, building the training infrastructure, reward pipelines, and fine-tuning systems that make our models meaningfully better at extracting, understanding, and structuring web data.

This isn't theoretical RL research. You'll build your own training infra, run fast experiments, ship models to production, and bridge the gap between classical RL approaches and modern LLM agent systems. If you care as much about training throughput as you do about reward design, this is the role.

Salary Range: $180,000–$290,000/year (Range shown is for U.S.-based employees. Compensation outside the U.S. is adjusted fairly based on your country's cost of living.)

Equity Range: Up to 0.15%

Location: San Francisco, CA or Remote (Americas, UTC-3 to UTC-10)

Job Type: Full-Time

Experience: 3+ years in applied RL, ML engineering, or model training with production systems

Visa: US Citizenship/Visa required for SF; N/A for Remote

Build training infrastructure and reward pipelines from scratch. Design and operate the systems that train and evaluate Firecrawl's models. You'll own the full loop: data collection, reward modeling, training runs, evaluation, and deployment. You build the infra yourself because you're the one who needs it to work.

Fine-tune models to achieve state-of-the-art results. Take foundation models and make them dramatically better at web data extraction, content understanding, and structured output generation. You know how to get from 'decent fine-tune' to 'best-in-class' and you have the patience and rigor to close that gap.

Bridge LLM agents and classical RL. The most interesting problems at Firecrawl sit at the intersection of modern LLM-based agents and classical RL techniques. You'll design reward signals for agent behaviors, apply RL methods to improve multi-step agent workflows, and figure out where traditional RL approaches outperform prompting, and vice versa.

Run fast experiments and iterate. You design experiments that test meaningful hypotheses, run them quickly, and make decisions based on results. You don't spend weeks on experiment infrastructure before getting a single result. Speed of iteration is a core part of how you work.

Communicate clearly to non-RL people. RL can be opaque. You translate your work into language that engineers, product people, and leadership can understand and act on. You know how to explain why a reward function matters without requiring everyone to read the paper.

Collaborate closely with the team. Work directly with the Search/IR-focused Research Engineer and the engineering team to connect RL improvements with search, ranking, and the broader product roadmap.

Builds their own training infra and reward pipelines. You don't wait for an ML platform team to set things up. You build the training loops, reward models, data pipelines, and evaluation frameworks yourself, because you understand that infra choices directly affect the quality of results. You've operated GPU clusters, managed training runs, and debugged convergence issues in production.

Can fine-tune models to SOTA. You've taken models from baseline to best-in-class on tasks that matter. You understand the full fine-tuning lifecycle (data curation, training dynamics, hyperparameter sensitivity, evaluation methodology), and you have the taste to know when a model is actually good versus when the eval is flattering.

Bridges LLM agents and classical RL. You're fluent in both worlds. You understand PPO, RLHF, reward modeling, and policy optimization, and you understand how modern LLM agents work, where they fail, and how RL techniques make them better. You see connections between these domains that most people miss.

Production-minded. You care about whether your models work in production, not just on benchmarks. You've deployed models that serve real traffic and made hard tradeoffs between model quality, latency, and cost. Research that doesn't ship isn't research that matters here.

Runs fast experiments and communicates clearly. You'd rather run three rough experiments this week than one polished one next month. When you have results, anyone on the team can understand what they mean, no decoder ring required.

Backgrounds that tend to do well: RL engineers at AI labs or applied ML teams who've shipped models to production. Researchers who've done RLHF or reward modeling for LLM systems. ML engineers who've built training infrastructure at startups and cared as much about the pipeline as the model. People who've worked at the intersection of RL and language models, whether in academic labs with a production bent or at companies building agent systems.

What We're NOT Looking For:

Pure theorists. If your best RL work lives in a paper and you've never trained a model on real data at real scale, this isn't the role. We need someone who builds and ships.

Researchers who need a platform team. If you expect training infrastructure, data pipelines, and evaluation frameworks to be set up before you can be productive, you'll be frustrated here. You build the tools you need.

People who only know one paradigm. Deep in classical RL but never worked with LLMs? LLM fine-tuner who's never touched RL? You'll be missing half the picture. This role requires fluency in both.

Slow iterators. If your standard experiment cycle is measured in weeks, not days, you'll struggle with the pace. We need someone who can run a meaningful experiment, interpret results, and decide next steps within a day or two.

Black-box communicators. If your typical update is a wall of metrics only another RL researcher can parse, this isn't the right fit. We need someone who can explain what's working, what's not, and why it matters, to people without RL PhDs.

A Note On Pace: We operate at an absurd level of urgency because the window for what we're building won't stay open forever. If that excites you, keep reading. If it doesn't, no hard feelings, but this role probably isn't for you.

Benefits & Perks:

Available to all employees

Salary that makes sense: $180,000–$290,000/year, based on impact, not tenure

Own a piece: Up to 0.15% equity in what you're helping build

Generous PTO: 15 days mandatory; for anything beyond 24 days, just ask (holidays excluded). Take the time you need to recharge

Parental leave: 12 weeks fully paid, for moms and dads

Wellness stipend: $100/month for the gym, therapy, massages, or whatever keeps you human

Learning & Development: Expense up to $1,000/year toward anything that helps you grow professionally

Team offsites: A change of scenery, minus the trust falls

Sabbatical: 3 paid months off after 4 years to do something fun and new

Available to US-based full-time employees

Full coverage, no red tape: Medical, dental, and vision (100% for

XML job scraping automation by YubHub

Job Type: Full time, senior, remote
Salary: $180,000–$290,000/year
Skills: Reinforcement Learning, Machine Learning, Deep Learning, Python, GPU Clusters, Training Runs, Evaluation Frameworks, Data Pipelines, Reward Modeling, Policy Optimization, LLM Agents, Classical RL Techniques
Category: Engineering, Technology
Company: Firecrawl
About the company: Firecrawl is a software company that provides a service for extracting data from the web. They have hit 8 figures in ARR and 100k+ GitHub stars.
Website: https://www.firecrawl.dev
Listing: https://jobs.ashbyhq.com/firecrawl/26abaf11-ff85-4f8d-ba44-2b6d32aae2a1
Location: San Francisco, CA (Hybrid) OR Remote (Americas, UTC-3 to UTC-10)
Date: 2026-04-24

Job Title

Engineering Manager, Developer Productivity AI

About the Role

We are seeking a hands-on Engineering Manager to lead a team of high-agency engineers in accelerating Stripe's engineering productivity by effectively deploying LLM agents and tools to automate large swaths of the engineering workflow.

Responsibilities

Lead a senior, cross-continent team of engineers that moves rapidly to deploy and exploit the latest AI devtools.
Deliver compelling user and product experiences for internal users.
Work across the company to redefine how software engineers at Stripe work.
Keep up with an incredibly fast-paced technology environment.
Dig deep into the data on how engineers use AI tools to seek out every bit of benefit, and help Stripe accelerate.

Requirements

2+ years managing high-performing engineering teams, and at least 5 years of experience as a software engineer.
Proven success in recruiting and building great teams.
A repeated history of delivering great, innovative product experiences for either internal or external users.
Effective cross-functional collaboration, with the ability to think rigorously, communicate clearly, and make or coordinate difficult decisions and trade-offs.
Thrives with high autonomy and responsibility in an ambiguous environment.
Ability to foster a healthy, inclusive, challenging, and supportive work environment.

Preferred Qualifications

Experience tech leading, mentoring, and managing a team that shipped products to engineers.
Experience building tools and products, ideally for technical audiences, whether internal or external customers.
Deep curiosity and strong opinions about the use of AI in software engineering.
Strong written and verbal communication skills for various audiences, including leadership, users, and company-wide.

Job Type: Full-time, senior, remote
Skills: AI development tools, LLM agents, engineering productivity, software engineering, cross-functional collaboration, team management, product development, tech leading, mentoring, team building, product innovation, communication skills
Category: Engineering, Technology
Company: Stripe
About the company: Stripe is a financial infrastructure platform used by millions of companies worldwide. It provides payment processing and revenue growth solutions.
Website: https://stripe.com/
Listing: https://job-boards.greenhouse.io/stripe/jobs/7736943
Location: US-Remote
Date: 2026-03-31

Credit Risk Strategist, Risk Foundations

As a Credit Risk Strategist on the Risk Foundations team at Stripe, you will play a critical role in managing the company's financial and partnership success. You will help manage a rapidly growing merchant portfolio by guiding Stripe's credit strategy and devising growth-friendly solutions that reduce overall credit risk.

Responsibilities:

Develop scalable methods to identify, measure, and control end-to-end risks across the user lifecycle, including credit and fraud risks.
Proactively identify risk management opportunities and outline the strategy, translating complex requirements into technical specifications.
Work closely with Product, Engineering, and Data Science teams to shape new product offerings and develop data-driven, globally scalable systems and processes.
Become an expert on the global ecosystem of Stripe's products, integrations, tools, and partner requirements, and how those elements interact with payment risks.
Develop strong internal relationships across partner teams and challenge the payments industry status quo to enable innovative businesses to flourish online.

Requirements:

5+ years of relevant experience with financial risk modeling.
A builder's mindset with a willingness to question assumptions and conventional wisdom.
Analytical skills, with expertise in data analysis, modeling, and framing business decisions and tradeoffs effectively through quantitative analysis and visualization.
Collaborative skills, with the ability to work effectively across teams and build strong working relationships.
Decisive, yet open to learning, with the ability to make critical decisions and quickly learn and iterate from experiences.
Experience with SQL required.
Bachelor's degree in Economics, Statistics, Computer Science, Engineering, or other quantitative or related field.

Job Type: Full-time, senior, remote
Skills: financial risk modeling, data analysis, modeling, SQL, risk management, Python, LLM agents, risk management practices
Category: Finance, Technology
Company: Stripe
About the company: Stripe is a financial infrastructure platform for businesses, handling billions of dollars every year.
Website: https://stripe.com/
Listing: https://job-boards.greenhouse.io/stripe/jobs/7550951
Location: Chicago, US-Remote, Toronto
Date: 2026-03-31