{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/rl-techniques"},"x-facet":{"type":"skill","slug":"rl-techniques","display":"Rl Techniques","count":6},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ba66dcb1-8d9"},"title":"Research Scientist, AI Controls and Monitoring","description":"<p>We&#39;re seeking a Research Scientist to join our team focused on AI Controls and Monitoring. As a key member of our team, you will design methods, systems, and experiments to ensure that advanced AI models and agents remain aligned with intended goals, even in high-stakes or adversarial environments.</p>\n<p>Your responsibilities will include developing monitoring techniques and observability methods, researching mechanisms for layered control, and designing red-team simulations to probe weaknesses in oversight and control mechanisms.</p>\n<p>To succeed in this role, you&#39;ll need a strong background in machine learning, particularly in generative AI, and at least three years of experience addressing sophisticated ML problems. 
You should be comfortable designing control and monitoring experiments for AI systems, building prototype systems, and quickly turning new ideas from the research literature into working prototypes.</p>\n<p>In addition to your technical expertise, you&#39;ll need strong written and verbal communication skills to operate in a cross-functional team.</p>\n<p>This role offers a competitive salary range of $216,000-$270,000 USD, depending on location and experience, as well as equity-based compensation and benefits, including comprehensive health, dental, and vision coverage, retirement benefits, a learning and development stipend, and generous PTO.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ba66dcb1-8d9","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4675694005","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$216,000-$270,000 USD","x-skills-required":["Machine Learning","Generative AI","AI Control Protocols","AI Risk Evaluations","Runtime Monitoring","Anomaly Detection","Observability"],"x-skills-preferred":["Post-Training and RL Techniques","Scalable Oversight","Interpretability","Debate"],"datePosted":"2026-04-18T15:58:38.219Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA; New York, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Machine Learning, Generative AI, AI Control Protocols, AI Risk Evaluations, Runtime Monitoring, Anomaly Detection, Observability, Post-Training and RL Techniques, Scalable Oversight, Interpretability, 
Debate","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":216000,"maxValue":270000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_769c0070-5b2"},"title":"Research Scientist, Agent Robustness","description":"<p>As a Research Scientist working on Agent Robustness, you will work on the fundamental challenges of building AI agents that are safe and aligned with humans.</p>\n<p>For example, you might:</p>\n<ul>\n<li>Research the science of AI agent capabilities with a focus on how they relate to safety, risk factors, and methodologies for benchmarking them;</li>\n<li>Design and build harnesses to test AI agents&#39; tendency to take harmful actions when pressured to do so by users or tricked into doing so by elements of their environment;</li>\n<li>Design and build exploits and mitigations for new and unique failure modes that arise as AI agents gain affordances like coding, web browsing, and computer use;</li>\n<li>Characterize and design mitigations for potential failure modes or broader risks of systems involving multiple interacting AI agents.</li>\n</ul>\n<p>Ideally you&#39;d have:</p>\n<ul>\n<li>Commitment to our mission of promoting safe, secure, and trustworthy AI deployments in the industry as frontier AI capabilities continue to advance;</li>\n<li>Practical experience conducting technical research collaboratively;</li>\n<li>Experience with post-training and RL techniques such as RLHF, DPO, GRPO, and similar approaches;</li>\n<li>A track record of published research in machine learning, particularly in generative AI;</li>\n<li>At least three years of experience addressing sophisticated ML problems, whether in a research setting or in product development;</li>\n<li>Strong written and verbal communication skills to operate in a cross-functional team.</li>\n</ul>\n<p>Nice to have:</p>\n<ul>\n<li>Hands-on experience 
with agent evaluation frameworks such as SWE-bench, WebArena, OSWorld, Inspect, or similar tools;</li>\n<li>Experience with red-teaming, prompt injection, or adversarial testing of AI systems.</li>\n</ul>\n<p>Our research interviews are crafted to assess candidates&#39; skills in practical ML prototyping and debugging, their grasp of research concepts, and their alignment with our organisational culture. We will not ask any LeetCode-style questions. If you&#39;re excited about advancing AI safety and contributing to our mission, we encourage you to apply, even if your experience doesn&#39;t perfectly align with every requirement.</p>\n<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity-based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You&#39;ll also receive benefits including, but not limited to: Comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. 
Additionally, this role may be eligible for additional benefits such as a commuter stipend.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_769c0070-5b2","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4675684005","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$216,000-$270,000 USD","x-skills-required":["Commitment to our mission of promoting safe, secure, and trustworthy AI deployments in the industry as frontier AI capabilities continue to advance","Practical experience conducting technical research collaboratively","Experience with post-training and RL techniques such as RLHF, DPO, GRPO, and similar approaches","A track record of published research in machine learning, particularly in generative AI","At least three years of experience addressing sophisticated ML problems, whether in a research setting or in product development"],"x-skills-preferred":["Hands-on experience with agent evaluation frameworks such as SWE-bench, WebArena, OSWorld, Inspect, or similar tools","Experience with red-teaming, prompt injection, or adversarial testing of AI systems"],"datePosted":"2026-04-18T15:57:29.447Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA; New York, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Commitment to our mission of promoting safe, secure, and trustworthy AI deployments in the industry as frontier AI capabilities continue to advance, Practical experience conducting technical research collaboratively, Experience with post-training and RL techniques such as RLHF, DPO, GRPO, and similar approaches, A 
track record of published research in machine learning, particularly in generative AI, At least three years of experience addressing sophisticated ML problems, whether in a research setting or in product development, Hands-on experience with agent evaluation frameworks such as SWE-bench, WebArena, OSWorld, Inspect, or similar tools, Experience with red-teaming, prompt injection, or adversarial testing of AI systems","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":216000,"maxValue":270000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_faffcca4-e94"},"title":"Research Engineer, Cybersecurity Reinforcement Learning","description":"<p>About the role</p>\n<p>We&#39;re hiring for the Cybersecurity RL team within Horizons. As a Research Engineer, you&#39;ll help to safely advance the capabilities of our models in secure coding, vulnerability remediation, and other areas of defensive cybersecurity.</p>\n<p>This role blends research and engineering, requiring you to both develop novel approaches and realize them in code. Your work will include designing and implementing RL environments, conducting experiments and evaluations, delivering your work into production training runs, and collaborating with other researchers, engineers, and cybersecurity specialists across and outside Anthropic.</p>\n<p>The role requires domain expertise in cybersecurity paired with interest or experience in training safe AI models. 
For example, you might be a white hat hacker who&#39;s curious about how LLMs could augment or transform your work, a security engineer interested in how AI could help harden systems at scale, or a detection and response professional wondering how models could enhance defensive workflows.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Design and implement RL environments for secure coding and vulnerability remediation</li>\n<li>Conduct experiments and evaluations to assess the effectiveness of our models</li>\n<li>Deliver your work into production training runs to advance the capabilities of our models</li>\n<li>Collaborate with other researchers, engineers, and cybersecurity specialists across and outside Anthropic</li>\n</ul>\n<p>Requirements</p>\n<ul>\n<li>Experience in cybersecurity research</li>\n<li>Experience with machine learning</li>\n<li>Strong software engineering skills</li>\n<li>Ability to balance research exploration with engineering implementation</li>\n<li>Passion for AI&#39;s potential and commitment to developing safe and beneficial systems</li>\n</ul>\n<p>Strong candidates may also have:</p>\n<ul>\n<li>Professional experience in security engineering, fuzzing, detection and response, or other applied defensive work</li>\n<li>Experience participating in or building CTF competitions and cyber ranges</li>\n<li>Academic research experience in cybersecurity</li>\n<li>Familiarity with RL techniques and environments</li>\n<li>Familiarity with LLM training methodologies</li>\n</ul>\n<p>Logistics</p>\n<ul>\n<li>Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience</li>\n<li>Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience</li>\n<li>Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position</li>\n<li>Location-based hybrid policy: Currently, we expect all staff to 
be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>\n<li>Visa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>\n<p>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links; visit anthropic.com/careers directly for confirmed position openings.</p>\n<p>How we&#39;re different</p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact (advancing our long-term goals of steerable, trustworthy AI) rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. 
We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI &amp; Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.</p>\n<p>Come work with us!</p>\n<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_faffcca4-e94","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5025624008","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$300,000-$405,000 USD","x-skills-required":["cybersecurity research","machine learning","software engineering","research exploration","engineering implementation"],"x-skills-preferred":["security engineering","fuzzing","detection and response","RL techniques","LLM training methodologies"],"datePosted":"2026-04-18T15:43:50.288Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cybersecurity research, 
machine learning, software engineering, research exploration, engineering implementation, security engineering, fuzzing, detection and response, RL techniques, LLM training methodologies","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":300000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_540ce49c-271"},"title":"Member of Technical Staff - Multimodal Understanding","description":"<p><strong>About the Role</strong></p>\n<p>You will join the multimodal team to push toward superhuman multimodal intelligence. Advance understanding and generation across modalities (image, video, audio, and text), spanning the full stack: data curation/acquisition, tokenizer training, large-scale pre-training, post-training/alignment, infrastructure/scaling, evaluation, tooling/demos, and end-to-end product experiences.</p>\n<p>Collaborate cross-functionally with pre-training, post-training, reasoning, data, applied, and product teams to deliver frontier capabilities in multimodal reasoning, world modeling, tool use, agentic behaviors, and interactive human-AI collaboration. 
Contribute to building models that can see, hear, reason about, and interact with the world in real time at unprecedented levels.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design, build, and optimize large-scale distributed systems for multimodal pre-training, post-training, inference, data processing, and tokenization at web/petabyte scale.</li>\n<li>Develop high-throughput pipelines for data acquisition, preprocessing, filtering, generation, decoding, loading, crawling, visualization, and management (images, videos, audio + text).</li>\n<li>Advance multimodal capabilities including spatial-temporal compression, cross-modal alignment, world modeling, reasoning, emergent abilities, audio/image/video understanding &amp; generation, real-time video processing, and noisy data handling.</li>\n<li>Drive data quality and studies: curation (human/synthetic), filtering techniques, analysis, and scalable pipelines to support trillion-parameter models.</li>\n<li>Create evaluation frameworks, internal benchmarks, reward models, and metrics that capture real-world usage, failure modes, interactive dynamics, and human-AI synergy.</li>\n<li>Innovate on algorithms, modeling approaches, hardware/software/algorithm co-design, and scaling paradigms for state-of-the-art performance.</li>\n<li>Build research tooling, user-friendly interfaces, prototypes/demos, full-stack applications, and enable rapid iteration based on feedback.</li>\n<li>Work across the stack (pre-training → SFT/RL/post-training) to enable reasoning, tool calling, agentic behaviors, orchestration, and seamless real-time interactions.</li>\n</ul>\n<p><strong>Basic Qualifications</strong></p>\n<ul>\n<li>Hands-on experience with multimodal pre-training, post-training, or fine-tuning (vision, audio, video, or cross-modal).</li>\n<li>Expert-level proficiency in Python (core language), with strong experience in at least one of: JAX / PyTorch / XLA.</li>\n<li>Proven track record building or optimizing 
large-scale distributed ML systems (training/inference optimization, GPU utilization, multi-GPU/TPU setups, hardware co-design).</li>\n<li>Deep experience designing and running data pipelines at scale: curation, filtering, generation, quality studies, especially for noisy/real-world multimodal data.</li>\n<li>Strong fundamentals in evaluation design, benchmarks, reward modeling, or RL techniques (particularly for interactive/agentic behaviors).</li>\n<li>Proactive self-starter who thrives in high-intensity environments and is passionate about pushing multimodal AI frontiers.</li>\n<li>Willingness to own end-to-end initiatives and do whatever it takes to deliver breakthrough user experiences.</li>\n</ul>\n<p><strong>Preferred Skills and Experience</strong></p>\n<ul>\n<li>Experience leading major improvements in model capabilities through better data, modeling, algorithms, or scaling.</li>\n<li>Familiarity with state-of-the-art in multimodal LLMs, scaling laws, tokenizers, compression techniques, reasoning, or agentic systems.</li>\n<li>Proficiency in Rust and/or C++ for performance-critical components.</li>\n<li>Hands-on work with large-scale orchestration tools such as Spark, Ray, or Kubernetes.</li>\n<li>Background building full-stack tooling: performant interfaces, real-time research demos/apps, or end-to-end product ownership.</li>\n<li>Passion for end-to-end user experience in interactive, real-time multimodal AI systems.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_540ce49c-271","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/5111374007","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$180,000 - $440,000 
USD","x-skills-required":["Multimodal pre-training","Post-training","Fine-tuning","Python","JAX","PyTorch","XLA","Large-scale distributed ML systems","Data pipelines","Evaluation design","Benchmarks","Reward modeling","RL techniques"],"x-skills-preferred":["State-of-the-art in multimodal LLMs","Scaling laws","Tokenizers","Compression techniques","Reasoning","Agentic systems","Rust","C++","Spark","Ray","Kubernetes","Full-stack tooling"],"datePosted":"2026-04-18T15:23:05.119Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo Alto, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Multimodal pre-training, Post-training, Fine-tuning, Python, JAX, PyTorch, XLA, Large-scale distributed ML systems, Data pipelines, Evaluation design, Benchmarks, Reward modeling, RL techniques, State-of-the-art in multimodal LLMs, Scaling laws, Tokenizers, Compression techniques, Reasoning, Agentic systems, Rust, C++, Spark, Ray, Kubernetes, Full-stack tooling","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":440000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_6eae2a86-95b"},"title":"Staff Engineer, Autonomy - Tactical Behaviours","description":"<p>This role is perfect for an individual who enjoys solving complex problems across diverse domains and modalities. 
As a Staff Engineer, Autonomy - Tactical Behaviours, you will design tactical autonomy algorithms to enable unmanned aircraft to perform complex missions across air, land, and sea domains with minimal human supervision.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Designing tactical autonomy algorithms to enable unmanned aircraft to perform complex missions</li>\n<li>Developing high-performance software modules that incorporate planning, decision-making, and behaviour execution strategies</li>\n<li>Implementing and testing behaviour architectures that enable multi-agent coordination, target engagement, reconnaissance, and survivability in contested scenarios</li>\n<li>Working at the intersection of classical autonomy and machine learning, blending rule-based systems with learning-based methods</li>\n<li>Collaborating with cross-functional teams to ensure seamless integration of autonomy solutions on real-world platforms</li>\n<li>Deploying autonomy capabilities to real platforms and participating in field tests and flight demos</li>\n<li>Analyzing mission logs and performance data to diagnose failures, optimize behaviour models, and inform iterative development</li>\n<li>Contributing to the autonomy roadmap by researching and prototyping new algorithms, identifying tactical capability gaps, and proposing novel solutions</li>\n<li>Supporting defence-focused programs and customer needs by adapting autonomy solutions to evolving mission sets, compliance requirements, and operational feedback</li>\n</ul>\n<p>Required qualifications include:</p>\n<ul>\n<li>BS/MS in Computer Science, Electrical Engineering, Mechanical Engineering, Aerospace Engineering, and/or similar degree, or equivalent practical experience</li>\n<li>Typically requires a minimum of 7 years of related experience with a Bachelor’s degree; or 5 years and a Master’s degree; or 4 years with a PhD; or equivalent work experience</li>\n<li>Proficiency in programming languages such as C++ and Python, and familiarity with real-time operating systems (RTOS)</li>\n<li>Significant background in robotics technologies related to motion 
planning, behaviour modelling, decision-making, or autonomous system design</li>\n<li>Significant experience with unmanned system technologies and accompanying algorithms (specifically air domain)</li>\n<li>Experience with simulation tools and environments (e.g., AFSIM, NGTS) for testing and validation</li>\n<li>Strong problem-solving skills, with the ability to troubleshoot and optimise system performance</li>\n<li>Excellent communication and teamwork skills, with the ability to work effectively in a collaborative, multidisciplinary environment</li>\n<li>Ability to obtain a SECRET clearance</li>\n</ul>\n<p>Preferred qualifications include:</p>\n<ul>\n<li>Experience applying ML/RL techniques in autonomy pipelines</li>\n<li>Background in collaborative behaviours, swarm robotics, or distributed decision-making</li>\n<li>Familiarity with tactical behaviours for unmanned systems in DoD or government programs</li>\n<li>Work on behaviours applicable across air, ground, and maritime vehicles</li>\n<li>Hands-on experience supporting flight demos or live exercises</li>\n<li>Experience with UCI and OMS Standards</li>\n</ul>\n<p>The salary range for this role is $182,720 - $274,080 per year.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_6eae2a86-95b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Shield AI","sameAs":"https://www.shield.ai","logo":"https://logos.yubhub.co/shield.ai.png"},"x-apply-url":"https://jobs.lever.co/shieldai/9c66691e-4497-4a25-8fe8-c9fdf09046ea","x-work-arrangement":"onsite","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$182,720 - $274,080 per year","x-skills-required":["C++","Python","Real-time operating systems (RTOS)","Motion planning","Behaviour modelling","Decision-making","Autonomous system design","Unmanned system technologies","Simulation tools and environments","Problem-solving","Communication","Teamwork","SECRET clearance"],"x-skills-preferred":["ML/RL techniques","Collaborative 
behaviours","Swarm robotics","Distributed decision-making","Tactical behaviours","UCI and OMS Standards"],"datePosted":"2026-04-17T13:02:40.551Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Washington, DC / Boston, MA / San Diego, California"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"C++, Python, Real-time operating systems (RTOS), Motion planning, Behaviour modelling, Decision-making, Autonomous system design, Unmanned system technologies, Simulation tools and environments, Problem-solving, Communication, Teamwork, SECRET clearance, ML/RL techniques, Collaborative behaviours, Swarm robotics, Distributed decision-making, Tactical behaviours, UCI and OMS Standards","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":182720,"maxValue":274080,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b0188062-45f"},"title":"Research Engineer, Cybersecurity Reinforcement Learning","description":"<p><strong>About the role</strong></p>\n<p>We&#39;re hiring for the Cybersecurity RL team within Horizons. As a Research Engineer, you&#39;ll help to safely advance the capabilities of our models in secure coding, vulnerability remediation, and other areas of defensive cybersecurity.</p>\n<p>This role blends research and engineering, requiring you to both develop novel approaches and realise them in code. 
Your work will include designing and implementing RL environments, conducting experiments and evaluations, delivering your work into production training runs, and collaborating with other researchers, engineers, and cybersecurity specialists across and outside Anthropic.</p>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have experience in cybersecurity research.</li>\n<li>Have experience with machine learning.</li>\n<li>Have strong software engineering skills.</li>\n<li>Can balance research exploration with engineering implementation.</li>\n<li>Are passionate about AI&#39;s potential and committed to developing safe and beneficial systems.</li>\n</ul>\n<p><strong>Strong candidates may also have:</strong></p>\n<ul>\n<li>Professional experience in security engineering, fuzzing, detection and response, or other applied defensive work.</li>\n<li>Experience participating in or building CTF competitions and cyber ranges.</li>\n<li>Academic research experience in cybersecurity.</li>\n<li>Familiarity with RL techniques and environments.</li>\n<li>Familiarity with LLM training methodologies.</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<p><strong>Education requirements:</strong> We require at least a Bachelor&#39;s degree in a related field or equivalent experience. <strong>Location-based hybrid policy:</strong> Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</p>\n<p><strong>Visa sponsorship:</strong> We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. 
At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We&#39;re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time.</p>\n<p><strong>Come work with us!</strong></p>\n<p>Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lot more.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b0188062-45f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5025624008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$300,000 - $405,000 USD","x-skills-required":["cybersecurity research","machine learning","software engineering","RL techniques and environments","LLM training methodologies"],"x-skills-preferred":["security engineering","fuzzing","detection and response","CTF competitions and cyber ranges","academic research in cybersecurity"],"datePosted":"2026-03-08T13:44:27.551Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA, New York City, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cybersecurity research, machine 
learning, software engineering, RL techniques and environments, LLM training methodologies, security engineering, fuzzing, detection and response, CTF competitions and cyber ranges, academic research in cybersecurity","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":300000,"maxValue":405000,"unitText":"YEAR"}}}]}