{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/alerting"},"x-facet":{"type":"skill","slug":"alerting","display":"Alerting","count":53},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_740da2af-174"},"title":"Security Engineer, Detection & Response","description":"<p>We are seeking a Senior Security Engineer with a specialty in Detection and Incident Response to join our Security Engineering team. This role sits at the intersection of security operations and software engineering, requiring you to investigate incidents and build the systems that detect, contain, and prevent them.</p>\n<p>You will design and ship high-precision detections across cloud services and enterprise SaaS, develop automation that shortens response timelines, and mature the telemetry pipelines that make it all possible. 
Your ability to write production-quality code is just as important as your ability to triage an alert.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Engineer, test, and deploy detection logic across cloud and enterprise environments, treating detections as software with version control, peer review, and measurable performance.</li>\n</ul>\n<ul>\n<li>Build and maintain incident response automation, runbooks, and tooling that reduce containment timelines without sacrificing developer velocity.</li>\n</ul>\n<ul>\n<li>Mature telemetry pipelines through improved schema design, normalization, enrichment, and quality checks that reduce false positives and increase signal fidelity.</li>\n</ul>\n<ul>\n<li>Perform digital incident investigations to identify and contain potential security breaches.</li>\n</ul>\n<ul>\n<li>Conduct digital forensics and malware analysis to understand attack vectors and adversary methodologies.</li>\n</ul>\n<ul>\n<li>Integrate alerting with messaging and ticketing systems to enable fast, traceable response workflows.</li>\n</ul>\n<ul>\n<li>Partner cross-functionally with IT, security, and engineering teams to harden identity and access patterns, close logging and forensics gaps, and implement maintainable guardrails that scale with the organisation.</li>\n</ul>\n<ul>\n<li>Utilize threat intelligence platforms to improve hunting, detection, and response workflows.</li>\n</ul>\n<ul>\n<li>Clearly explain the significance and impact of incidents, providing actionable recommendations to both technical and non-technical stakeholders.</li>\n</ul>\n<p>Ideal Candidate:</p>\n<ul>\n<li>5+ years of experience in Detection Engineering, Incident Response, or Security Operations, with a strong emphasis on building and shipping security tooling and automation.</li>\n</ul>\n<ul>\n<li>Proficiency in at least one programming language (e.g., Python, Go) and comfort writing production-grade code, not just scripts.</li>\n</ul>\n<ul>\n<li>Hands-on experience designing or 
improving detection pipelines, SIEM content, and alerting workflows in cloud-native environments.</li>\n</ul>\n<ul>\n<li>Practical experience with SIEM, EDR, and SOAR tools, with a preference for candidates who have built integrations or extended these platforms programmatically.</li>\n</ul>\n<ul>\n<li>Strong understanding of modern cyber threats, common attack techniques, and adversary TTPs.</li>\n</ul>\n<ul>\n<li>Familiarity with digital forensics tools and malware analysis techniques.</li>\n</ul>\n<ul>\n<li>Experience with cloud-native environments (e.g., AWS, GCP, Azure) and the security telemetry those environments generate.</li>\n</ul>\n<ul>\n<li>Exposure to threat intelligence platforms and integrating intel into detection and investigation workflows.</li>\n</ul>\n<ul>\n<li>Strong communication skills, with the ability to translate complex security findings into clear business impact.</li>\n</ul>\n<ul>\n<li>Relevant security certifications (e.g., GCIH, GCFA, GCIA, CISSP, GDSA) are a plus.</li>\n</ul>\n<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You’ll also receive benefits including, but not limited to: Comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. 
Additionally, this role may be eligible for additional benefits such as a commuter stipend.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_740da2af-174","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Scale","sameAs":"https://scale.com/","logo":"https://logos.yubhub.co/scale.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/scaleai/jobs/4684073005","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$237,600-$297,000 USD","x-skills-required":["Detection Engineering","Incident Response","Security Operations","Cloud Services","Enterprise SaaS","Automation","Telemetry Pipelines","Digital Forensics","Malware Analysis","Threat Intelligence Platforms","SIEM","EDR","SOAR","Cloud-Native Environments","Programming Languages","Python","Go"],"x-skills-preferred":["Hands-on experience designing or improving detection pipelines, SIEM content, and alerting workflows in cloud-native environments","Practical experience with SIEM, EDR, and SOAR tools, with a preference for candidates who have built integrations or extended these platforms programmatically","Strong understanding of modern cyber threats, common attack techniques, and adversary TTPs","Familiarity with digital forensics tools and malware analysis techniques","Experience with cloud-native environments (e.g., AWS, GCP, Azure) and the security telemetry those environments generate","Exposure to threat intelligence platforms and integrating intel into detection and investigation workflows","Strong communication skills, with the ability to translate complex security findings into clear business impact","Relevant security certifications (e.g., GCIH, GCFA, GCIA, CISSP, GDSA)"],"datePosted":"2026-04-18T16:00:14.303Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York, NY; San 
Francisco, CA; Seattle, WA; Washington, DC"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Detection Engineering, Incident Response, Security Operations, Cloud Services, Enterprise SaaS, Automation, Telemetry Pipelines, Digital Forensics, Malware Analysis, Threat Intelligence Platforms, SIEM, EDR, SOAR, Cloud-Native Environments, Programming Languages, Python, Go, Hands-on experience designing or improving detection pipelines, SIEM content, and alerting workflows in cloud-native environments, Practical experience with SIEM, EDR, and SOAR tools, with a preference for candidates who have built integrations or extended these platforms programmatically, Strong understanding of modern cyber threats, common attack techniques, and adversary TTPs, Familiarity with digital forensics tools and malware analysis techniques, Experience with cloud-native environments (e.g., AWS, GCP, Azure) and the security telemetry those environments generate, Exposure to threat intelligence platforms and integrating intel into detection and investigation workflows, Strong communication skills, with the ability to translate complex security findings into clear business impact, Relevant security certifications (e.g., GCIH, GCFA, GCIA, CISSP, GDSA)","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":237600,"maxValue":297000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_88ec8f26-4c9"},"title":"Senior IT Systems Engineer","description":"<p>We&#39;re seeking a strategic thinker and proven problem-solver with deep expertise in modern IT ecosystems. As a Sr. IT Systems Engineer, you&#39;ll lead the design, implementation, administration, and optimization of core SaaS platforms, including Okta, Google Workspace, Slack, Atlassian, and other IT tools. 
You&#39;ll own end-to-end support, monitoring, troubleshooting, and performance tuning of applications, systems, and their complex interconnections, ensuring high availability, security, and seamless user experience.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Designing and implementing SaaS platforms and IT tools</li>\n<li>Providing technical guidance to support business expansion, system scalability, and infrastructure maturity</li>\n<li>Identifying gaps, risks, and opportunities in the environment and leading initiatives to enhance security posture, operational efficiency, and resilience</li>\n<li>Evaluating emerging technologies, IAM trends, and automation platforms and developing business cases and adoption recommendations</li>\n<li>Mentoring junior engineers and collaborating with cross-functional teams to align IT capabilities with organizational goals</li>\n</ul>\n<p>Basic qualifications include 8+ years of hands-on experience administering and optimizing a broad portfolio of SaaS applications in a hybrid and high-growth environment, with advanced proficiency in our core stack: Okta (including Advanced Server Access &amp; Workflows), Google Workspace, Slack Enterprise, Atlassian, etc.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_88ec8f26-4c9","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/5071895007","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$184,000 - $276,000 USD","x-skills-required":["Okta","Google Workspace","Slack","Atlassian","IAM principles and protocols","APIs for custom integrations","Scripting and automation for monitoring, alerting, and operational efficiency","Azure","AWS","GCP cloud 
platforms"],"x-skills-preferred":["n8n","Okta Workflows","Workato","Zapier","BetterCloud","custom integrations"],"datePosted":"2026-04-18T15:58:57.233Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo Alto, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"IT","industry":"Technology","skills":"Okta, Google Workspace, Slack, Atlassian, IAM principles and protocols, APIs for custom integrations, Scripting and automation for monitoring, alerting, and operational efficiency, Azure, AWS, GCP cloud platforms, n8n, Okta Workflows, Workato, Zapier, BetterCloud, custom integrations","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":184000,"maxValue":276000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_95c49f85-a98"},"title":"Staff+ Software Engineer, Observability","description":"<p><strong>About the Role</strong></p>\n<p>Anthropic is seeking talented and experienced Software Engineers to join our Observability team within the Infrastructure organization. The Observability team owns the monitoring and telemetry infrastructure that every engineer and researcher at Anthropic depends on, from metrics and logging pipelines to distributed tracing, error analytics, alerting, and the dashboards and query interfaces that make it all actionable.</p>\n<p>As Anthropic scales its infrastructure across massive GPU, TPU, and Trainium clusters, the volume and complexity of operational data is growing by orders of magnitude. 
We’re building next-generation observability systems (high-throughput ingest pipelines, cost-efficient columnar storage, unified query layers across signals, and agentic diagnostic tools) to ensure that engineers can detect, diagnose, and resolve issues in minutes rather than hours, even as the systems they operate become exponentially more complex.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and build scalable telemetry ingest and storage pipelines for metrics, logs, traces, and error data across Anthropic’s multi-cluster infrastructure</li>\n</ul>\n<ul>\n<li>Own and evolve core observability platforms, driving migrations and architectural improvements that improve reliability, reduce cost, and scale with organisational growth</li>\n</ul>\n<ul>\n<li>Build instrumentation libraries, SDKs, and integrations that make it easy for engineering teams to emit high-quality telemetry from their services</li>\n</ul>\n<ul>\n<li>Drive alerting and SLO infrastructure that enables teams to define, monitor, and respond to reliability targets with minimal noise</li>\n</ul>\n<ul>\n<li>Reduce mean time to detection and resolution by building cross-signal correlation, unified query interfaces, and AI-assisted diagnostic tooling</li>\n</ul>\n<ul>\n<li>Partner with Research, Inference, Product, and Infrastructure teams to ensure observability solutions meet the unique needs of each organisation</li>\n</ul>\n<p><strong>You May Be a Good Fit If You</strong></p>\n<ul>\n<li>Have 10+ years of relevant industry experience building and operating large-scale observability or monitoring infrastructure</li>\n</ul>\n<ul>\n<li>Have deep experience with at least one observability signal area (metrics, logging, tracing, or error analytics) and familiarity with the others</li>\n</ul>\n<ul>\n<li>Understand high-throughput data pipelines, columnar storage engines, and the tradeoffs involved in ingesting and querying telemetry data at scale</li>\n</ul>\n<ul>\n<li>Have experience 
operating or building on top of observability platforms such as Prometheus, Grafana, ClickHouse, OpenTelemetry, or similar systems</li>\n</ul>\n<ul>\n<li>Have strong proficiency in at least one of Python, Rust, or Go</li>\n</ul>\n<ul>\n<li>Have excellent communication skills and enjoy partnering with internal teams to improve their operational visibility and incident response capabilities</li>\n</ul>\n<ul>\n<li>Are excited about building foundational infrastructure and are comfortable working independently on ambiguous, high-impact technical challenges</li>\n</ul>\n<p><strong>Strong Candidates May Also Have</strong></p>\n<ul>\n<li>Experience operating metrics systems at very high cardinality (hundreds of millions of active time series or more)</li>\n</ul>\n<ul>\n<li>Experience with log storage migrations or operating columnar databases (ClickHouse, BigQuery, or similar) for analytics workloads</li>\n</ul>\n<ul>\n<li>Experience with OpenTelemetry instrumentation, collector pipelines, and tail-based sampling strategies</li>\n</ul>\n<ul>\n<li>Experience building or operating alerting platforms, on-call tooling, or SLO frameworks at scale</li>\n</ul>\n<ul>\n<li>Experience with Kubernetes-native monitoring, eBPF-based observability, or continuous profiling</li>\n</ul>\n<ul>\n<li>Interest in applying AI/LLMs to operational workflows such as automated root cause analysis, anomaly detection, or intelligent alerting</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<ul>\n<li>Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience</li>\n</ul>\n<ul>\n<li>Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience</li>\n</ul>\n<ul>\n<li>Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position</li>\n</ul>\n<ul>\n<li>Location-based hybrid policy: Currently, we expect all staff to be in one 
of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>\n</ul>\n<ul>\n<li>Visa sponsorship: We do sponsor visas! However, we aren’t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact (advancing our long-term goals of steerable, trustworthy AI) rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We’re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p><strong>Come work with us!</strong></p>\n<p>Anthropic is a public benefit corporation headquartered in San Francisco. 
We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_95c49f85-a98","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5102440008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"£325,000-£390,000 GBP","x-skills-required":["observability","telemetry","metrics","logging","tracing","error analytics","alerting","SLO infrastructure","cross-signal correlation","unified query interfaces","AI-assisted diagnostic tooling","Python","Rust","Go","Prometheus","Grafana","ClickHouse","OpenTelemetry"],"x-skills-preferred":["high-throughput data pipelines","columnar storage engines","Kubernetes-native monitoring","eBPF-based observability","continuous profiling","AI/LLMs","automated root cause analysis","anomaly detection","intelligent alerting"],"datePosted":"2026-04-18T15:57:27.177Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"observability, telemetry, metrics, logging, tracing, error analytics, alerting, SLO infrastructure, cross-signal correlation, unified query interfaces, AI-assisted diagnostic tooling, Python, Rust, Go, Prometheus, Grafana, ClickHouse, OpenTelemetry, high-throughput data pipelines, columnar storage engines, Kubernetes-native monitoring, eBPF-based observability, continuous profiling, AI/LLMs, automated root cause analysis, anomaly detection, 
intelligent alerting","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":325000,"maxValue":390000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_c4e35d55-5d1"},"title":"Technical Program Manager, Safeguards (Infrastructure & Evals)","description":"<p>Job Title: Technical Program Manager, Safeguards (Infrastructure &amp; Evals)</p>\n<p>About Anthropic</p>\n<p>Anthropic&#39;s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole.</p>\n<p>About the Role</p>\n<p>Safeguards Engineering builds and operates the infrastructure that keeps Anthropic&#39;s AI systems safe in production , the classifiers, detection pipelines, evaluation platforms, and monitoring systems that sit between our models and the real world. That infrastructure needs to be not just correct, but reliable: when a safety-critical pipeline goes down or degrades, the consequences can be serious, and they can be invisible until someone looks closely.</p>\n<p>As a Technical Program Manager for Safeguards Infrastructure and Evals, you&#39;ll own the operational health and forward momentum of this stack. Your primary responsibility is driving reliability , owning the incident-response and post-mortem process, ensuring SLOs are defined and met in partnership with various teams, and making sure that when things go wrong, the right people know, the right actions get taken, and those actions actually get closed out.</p>\n<p>Alongside that ongoing operational rhythm, you&#39;ll coordinate the larger platform investments: migrations, eval-platform improvements, and the cross-team dependencies that connect them. This role sits at the intersection of operations and program management. 
It requires genuine technical depth: you need to understand how these systems work well enough to triage effectively, judge what&#39;s actually safety-critical versus what can wait, and have informed conversations with the engineers building and maintaining them. But the core of the job is keeping the machine running well and the work moving.</p>\n<p>What You&#39;ll Do:</p>\n<ul>\n<li>Own the Safeguards Engineering ops review</li>\n<li>Drive the recurring cadence that keeps the team informed and coordinated: surfacing recent incidents and failures, bringing visibility to reliability trends, and making sure the right people are in the room when decisions need to be made.</li>\n<li>Drive incident tracking and post-mortem execution</li>\n<li>Establish and maintain SLOs with partner teams</li>\n<li>Maintain runbook quality and incident-ownership clarity</li>\n<li>Drive platform migrations and infrastructure projects</li>\n<li>Coordinate evals platform improvements</li>\n</ul>\n<p>You might be a good fit if you:</p>\n<ul>\n<li>Have solid technical program management experience, particularly in operational or infrastructure-heavy environments; you&#39;re comfortable owning a mix of ongoing operational cadences and discrete project work simultaneously.</li>\n<li>Understand how production ML systems work well enough to triage incidents intelligently and have substantive conversations with engineers about what&#39;s going wrong and why; you don&#39;t need to write the code, but you need to follow the technical thread.</li>\n<li>Are energized by closing loops. 
Post-mortem action items that never get done, SLOs that no one checks, runbooks that go stale: these things bother you, and you know how to build the processes and follow-ups that fix them.</li>\n<li>Can work effectively across team boundaries: comfortable coordinating with partner teams (like Inference) where you don&#39;t have direct authority, and skilled at keeping shared work moving through influence and clear communication.</li>\n<li>Thrive in environments where the work shifts between &#39;keep the lights on&#39; and &#39;build something new&#39;, and can context-switch between incident follow-ups and longer-horizon platform projects without dropping either.</li>\n<li>Have experience with or strong interest in AI safety: you understand why the reliability of a safety-critical pipeline is a different kind of problem than the reliability of a product feature, and that distinction motivates you.</li>\n</ul>\n<p>Strong candidates may also:</p>\n<ul>\n<li>Have experience with SRE practices, incident management frameworks, or on-call operations at scale.</li>\n<li>Have worked on or with evaluation infrastructure for ML systems, understanding how evals get designed, run, and interpreted.</li>\n<li>Have experience driving infrastructure migrations in complex, multi-team environments, particularly where the migration touches operational systems that can&#39;t go offline.</li>\n<li>Be familiar with monitoring and alerting tooling (PagerDuty, Datadog, or equivalents) and the operational culture around them.</li>\n</ul>\n<p>Deadline to apply: None, applications will be received on a rolling basis.</p>\n<p>The annual compensation range for this role is listed below. 
For sales roles, the range provided is the role&#39;s On Target Earnings (&#39;OTE&#39;) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.</p>\n<p>Annual Salary: $290,000-$365,000 USD</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_c4e35d55-5d1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5108695008","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$290,000-$365,000 USD","x-skills-required":["Technical Program Management","Operational or Infrastructure-heavy environments","Production ML systems","Incident management frameworks","On-call operations","Evaluation infrastructure for ML systems","Infrastructure migrations","Monitoring and alerting tooling"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:56:34.910Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Technical Program Management, Operational or Infrastructure-heavy environments, Production ML systems, Incident management frameworks, On-call operations, Evaluation infrastructure for ML systems, Infrastructure migrations, Monitoring and alerting tooling","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":290000,"maxValue":365000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9d8d91da-52f"},"title":"Enterprise Risk Management 
Lead","description":"<p>About Gusto</p>\n<p>At Gusto, we&#39;re on a mission to grow the small business economy. We handle the hard stuff , payroll, health insurance, 401(k)s, and HR , so owners can focus on their craft and their customers.</p>\n<p>With teams in Denver, San Francisco, and New York, we support more than 400,000 small businesses nationwide and are building a workplace that reflects the people we serve.</p>\n<p>All full-time employees receive competitive base pay, benefits, and equity (RSUs) , because everyone who helps build Gusto should share in its success. Offer amounts are determined by role, level, and location. Learn more about our Total Rewards philosophy.</p>\n<p>AI is a fundamental part of how work gets done at Gusto. We expect all team members to actively engage with AI tools relevant to their role and grow their fluency as the technology evolves. AI experience requirements vary by role and will be assessed during the interview process.</p>\n<p>About the Role:</p>\n<p>Gusto is scaling our AI-powered risk function to support a complex, multi-entity business operating in highly regulated environments. As the Enterprise Risk Management Lead, you will own and operate Gusto&#39;s Enterprise Risk and Third Party Risk Management programs , built AI-first, designed to scale, and built to enable the business to move fast without breaking things.</p>\n<p>This is a People Empowerer (manager) role. You balance hands-on program leadership with managing and developing a team of compliance professionals. 
You navigate the tension between &quot;doing the work&quot; and &quot;leading the work&quot; , contributing directly to complex, high-impact programs while ensuring your team delivers with excellence.</p>\n<p>You are a change agent who influences how automated risk management gets done at Gusto, models AI-enabled ways of working, and helps others grow their own capabilities in the process.</p>\n<p>You will champion the adoption of AI, machine learning, and process automation across risk monitoring, control testing, incident management, and reporting , and you will partner with Product, Data Science, and Engineering to make it explainable, adopted, compliant, and scalable.</p>\n<p>Here’s what you’ll do day-to-day:</p>\n<p>You manage initiatives that are complex in both scope and impact, influencing the strategic direction of Gusto&#39;s compliance risk management framework.</p>\n<p>You apply a deep understanding of the regulatory landscape and how it intersects with Gusto&#39;s business model to proactively design and lead cross-functional risk programs.</p>\n<p>You translate complex risk topics into clear, actionable guidance that senior leaders can immediately understand and operationalize.</p>\n<p>You lead cross-functional working groups, align divergent perspectives, and drive cohesive progress toward shared goals , with minimal oversight.</p>\n<p>As a PE, you balance individual risk and compliance contribution with team leadership.</p>\n<p>You manage operations, professional development, resource allocation, and performance , while staying close enough to the work to be a credible, hands-on partner to your team and stakeholders.</p>\n<p>You model responsible AI use, and act as a source of knowledge and mentorship , supporting your team&#39;s AI journey and helping others apply it responsibly and effectively.</p>\n<p>AI-Enabled Risk Operations, Innovation &amp; Transformation</p>\n<p>This is how you and your team operate , not a side 
project.</p>\n<ul>\n<li>Champion the adoption of AI, machine learning, process automation, and advanced analytics to improve risk monitoring, control testing, and reporting across ERM, TPRM, and broader compliance functions</li>\n</ul>\n<ul>\n<li>Lead the integration of AI and automation into every phase of the risk lifecycle: vendor assessments, document ingestion and analysis, continuous monitoring and alerting, risk scoring, prioritization, and trend analysis</li>\n</ul>\n<ul>\n<li>Build intelligent risk monitoring and evaluation systems , including auto-tagging for risk issues, audit requests, and regulatory changes , that improve real-time visibility and eliminate manual effort across the enterprise risk portfolio</li>\n</ul>\n<ul>\n<li>Drive the digitalization of risk tools including RCSAs, KRIs, incident reporting, and audit tracking , transforming periodic, reactive processes into continuous intelligence systems with live leading and lagging indicators that enable real-time decision-making</li>\n</ul>\n<ul>\n<li>Partner with Product, Data Science, and Engineering to define requirements for AI-driven workflows, decisioning engines, and dashboards , ensuring explainability, auditability, and regulatory defensibility of all AI-enabled risk decisions</li>\n</ul>\n<ul>\n<li>Design and build intelligent dashboards and reporting tools that deliver real-time risk visibility and decision-quality insights to senior leadership and cross-functional stakeholders</li>\n</ul>\n<ul>\n<li>Design AI workflows with appropriate validation loops, human-in-the-loop checkpoints, and guardrails , ensuring outputs are reliable, governable, and meet regulatory standards before being used to frame risks, recommendations, or decisions</li>\n</ul>\n<ul>\n<li>Stay current on AI advancements and emerging technologies and proactively integrate new capabilities into team operations to increase velocity and scale</li>\n</ul>\n<ul>\n<li>Model responsible AI use , supporting ICs in their AI 
journeys and fostering a culture of intentional experimentation, accountability, and continuous improvement</li>\n</ul>\n<p>Enterprise Risk Management</p>\n<ul>\n<li>Design, implement, and continuously improve Gusto&#39;s ERM framework, ensuring alignment with best practices and Gusto&#39;s stage of growth and strategic priorities across all entities</li>\n</ul>\n<ul>\n<li>Define and maintain Gusto&#39;s enterprise risk taxonomy, risk appetite statement, and key risk indicators spanning operational, regulatory, technology, financial, and reputational risk domains</li>\n</ul>\n<ul>\n<li>Lead Gusto&#39;s Enterprise Risk Management process , driving integration of risk practices across business functions, promoting a proactive risk culture, and ensuring incident management, root cause analysis, and lessons learned are systematically captured in an automated, AI forward way.</li>\n</ul>\n<ul>\n<li>Apply AI-assisted insights to enterprise risk datasets to surface systemic patterns, validate assumptions, prioritize risks, and deliver proactive, data-driven advisory to senior leadership</li>\n</ul>\n<ul>\n<li>Monitor the regulatory landscape (OCC, FDIC, CFPB, SEC, FINRA, GDPR, NIST, ISO, SOC) and leverage AI to proactively incorporate changes before they become compliance gaps</li>\n</ul>\n<ul>\n<li>Act as a key advisor to senior compliance leadership , translating complex risk findings into clear, actionable recommendations with minimal oversight</li>\n</ul>\n<p>Third Party Risk Management (TPRM)</p>\n<ul>\n<li>Design, implement, and independently manage a high-impact, AI-first TPRM program with clear milestones, progress tracking, and measurable outcomes across all Gusto entities</li>\n</ul>\n<ul>\n<li>Manage the full third-party risk lifecycle , onboarding and risk profiling, periodic assessments, issue management, corrective action tracking, and offboarding , across suppliers, product partners, contractors, service providers, and cloud service providers , and do so in 
an AI and automated way.</li>\n</ul>\n<ul>\n<li>Maintain a centralized, authoritative vendor risk inventory and risk register, ensuring real-time visibility into Gusto&#39;s third-party risk posture</li>\n</ul>\n<ul>\n<li>Conduct periodic AI-driven audits and reviews of third-party compliance with contractual obligations and regulatory standards, identifying patterns that inform continuous program improvement</li>\n</ul>\n<ul>\n<li>Serve as the central orchestrator across Compliance, Security, Legal, Procurement, IT, and GRC for proactive and reactive third-party incident management</li>\n</ul>\n<ul>\n<li>Own Gusto&#39;s TPRM policy and maintain comprehensive documentation , risk assessments, audit findings, corrective actions , ensuring full accountability and traceability</li>\n</ul>\n<p>People Leadership &amp; Team Development</p>\n<ul>\n<li>Balance individual compliance contribution with team leadership , managing operations, professional development, resource allocation, and performance while staying close to the work</li>\n</ul>\n<ul>\n<li>Coach and develop ICs toward next</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_9d8d91da-52f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Gusto","sameAs":"https://www.gusto.com/","logo":"https://logos.yubhub.co/gusto.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/gusto/jobs/7746997","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Risk Management","Compliance","AI","Machine Learning","Process Automation","Advanced Analytics","Risk Monitoring","Control Testing","Incident Management","Reporting","Vendor Assessments","Document Ingestion","Analysis","Continuous Monitoring","Alerting","Risk Scoring","Prioritization","Trend Analysis","RCSAs","KRIs","Incident Reporting","Audit 
Tracking","AI-Driven Workflows","Decisioning Engines","Dashboards","Explainability","Auditability","Regulatory Defensibility","Intelligent Dashboards","Reporting Tools","Real-Time Risk Visibility","Decision-Quality Insights","Senior Leadership","Cross-Functional Stakeholders","Validation Loops","Human-in-the-Loop Checkpoints","Guardrails","Reliable Outputs","Governable Outputs","Regulatory Standards","AI Advancements","Emerging Technologies","Velocity","Scale","Responsible AI Use","ICs","AI Journeys","Accountability","Continuous Improvement","ERM Framework","Best Practices","Gusto's Stage of Growth","Strategic Priorities","Enterprise Risk Taxonomy","Risk Appetite Statement","Key Risk Indicators","Operational Risk","Regulatory Risk","Technology Risk","Financial Risk","Reputational Risk","Root Cause Analysis","Lessons Learned","Automated AI Forward Way","AI-Assisted Insights","Systemic Patterns","Assumptions","Proactive Advisory","Regulatory Landscape","OCC","FDIC","CFPB","SEC","FINRA","GDPR","NIST","ISO","SOC","Proactive Incorporation","Compliance Gaps","Key Advisor","Senior Compliance Leadership","Complex Risk Findings","Clear Actionable Recommendations","Minimally Supervised","High-Impact AI-First TPRM Program","Clear Milestones","Progress Tracking","Measurable Outcomes","Third-Party Risk Lifecycle","Onboarding","Risk Profiling","Periodic Assessments","Issue Management","Corrective Action Tracking","Offboarding","Suppliers","Product Partners","Contractors","Service Providers","Cloud Service Providers","AI and Automated Way","Centralized Vendor Risk Inventory","Risk Register","Real-Time Visibility","Third-Party Risk Posture","Periodic Audits","Reviews","Contractual Obligations","Patterns","Continuous Program Improvement","Central Orchestrator","Security","Legal","Procurement","IT","GRC","Proactive Incident Management","Reactive Incident Management","TPRM Policy","Comprehensive Documentation","Risk Assessments","Audit Findings","Corrective 
Actions","Traceability","Balance Individual Contribution","Team Leadership","Operations","Professional Development","Resource Allocation","Performance","Close to the Work","Coach and Develop ICs","Next Level"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:56:16.772Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Denver, CO;San Francisco, CA;New York, NY"}},"employmentType":"FULL_TIME","occupationalCategory":"Legal","industry":"Finance","skills":"Risk Management, Compliance, AI, Machine Learning, Process Automation, Advanced Analytics, Risk Monitoring, Control Testing, Incident Management, Reporting, Vendor Assessments, Document Ingestion, Analysis, Continuous Monitoring, Alerting, Risk Scoring, Prioritization, Trend Analysis, RCSAs, KRIs, Incident Reporting, Audit Tracking, AI-Driven Workflows, Decisioning Engines, Dashboards, Explainability, Auditability, Regulatory Defensibility, Intelligent Dashboards, Reporting Tools, Real-Time Risk Visibility, Decision-Quality Insights, Senior Leadership, Cross-Functional Stakeholders, Validation Loops, Human-in-the-Loop Checkpoints, Guardrails, Reliable Outputs, Governable Outputs, Regulatory Standards, AI Advancements, Emerging Technologies, Velocity, Scale, Responsible AI Use, ICs, AI Journeys, Accountability, Continuous Improvement, ERM Framework, Best Practices, Gusto's Stage of Growth, Strategic Priorities, Enterprise Risk Taxonomy, Risk Appetite Statement, Key Risk Indicators, Operational Risk, Regulatory Risk, Technology Risk, Financial Risk, Reputational Risk, Root Cause Analysis, Lessons Learned, Automated AI Forward Way, AI-Assisted Insights, Systemic Patterns, Assumptions, Proactive Advisory, Regulatory Landscape, OCC, FDIC, CFPB, SEC, FINRA, GDPR, NIST, ISO, SOC, Proactive Incorporation, Compliance Gaps, Key Advisor, Senior Compliance Leadership, Complex Risk Findings, Clear Actionable Recommendations, Minimally Supervised, High-Impact AI-First TPRM Program, Clear 
Milestones, Progress Tracking, Measurable Outcomes, Third-Party Risk Lifecycle, Onboarding, Risk Profiling, Periodic Assessments, Issue Management, Corrective Action Tracking, Offboarding, Suppliers, Product Partners, Contractors, Service Providers, Cloud Service Providers, AI and Automated Way, Centralized Vendor Risk Inventory, Risk Register, Real-Time Visibility, Third-Party Risk Posture, Periodic Audits, Reviews, Contractual Obligations, Patterns, Continuous Program Improvement, Central Orchestrator, Security, Legal, Procurement, IT, GRC, Proactive Incident Management, Reactive Incident Management, TPRM Policy, Comprehensive Documentation, Risk Assessments, Audit Findings, Corrective Actions, Traceability, Balance Individual Contribution, Team Leadership, Operations, Professional Development, Resource Allocation, Performance, Close to the Work, Coach and Develop ICs, Next Level"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ca221b6f-dca"},"title":"Technical Program Manager, Safeguards (Infrastructure & Evals)","description":"<p><strong>About the Role</strong></p>\n<p>Safeguards Engineering builds and operates the infrastructure that keeps Anthropic&#39;s AI systems safe in production. 
As a Technical Program Manager for Safeguards Infrastructure and Evals, you&#39;ll own the operational health and forward momentum of this stack.</p>\n<p>Your primary responsibility is driving reliability: owning the incident-response and post-mortem process, ensuring SLOs are defined and met in partnership with various teams, and making sure that when things go wrong, the right people know, the right actions get taken, and those actions actually get closed out.</p>\n<p>Alongside that ongoing operational rhythm, you&#39;ll coordinate the larger platform investments: migrations, eval-platform improvements, and the cross-team dependencies that connect them.</p>\n<p>This role sits at the intersection of operations and program management. It requires genuine technical depth: you need to understand how these systems work well enough to triage effectively, judge what&#39;s actually safety-critical versus what can wait, and have informed conversations with the engineers building and maintaining them.</p>\n<p>But the core of the job is keeping the machine running well and the work moving.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Own the Safeguards Engineering ops review</li>\n<li>Drive the recurring cadence that keeps the team informed and coordinated: surfacing recent incidents and failures, bringing visibility to reliability trends, and making sure the right people are in the room when decisions need to be made.</li>\n<li>Drive incident tracking and post-mortem execution</li>\n<li>Establish and maintain SLOs with partner teams</li>\n<li>Maintain runbook quality and incident-ownership clarity</li>\n<li>Drive platform migrations and infrastructure projects</li>\n<li>Coordinate evals platform improvements</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Solid technical program management experience, particularly in operational or infrastructure-heavy environments</li>\n<li>Understanding of how production ML systems work well enough to triage 
incidents intelligently and have substantive conversations with engineers about what&#39;s going wrong and why</li>\n<li>Ability to work effectively across team boundaries</li>\n<li>Experience with or strong interest in AI safety</li>\n</ul>\n<p><strong>Nice to Have</strong></p>\n<ul>\n<li>Experience with SRE practices, incident management frameworks, or on-call operations at scale</li>\n<li>Familiarity with monitoring and alerting tooling (PagerDuty, Datadog, or equivalents)</li>\n<li>Experience driving infrastructure migrations in complex, multi-team environments</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ca221b6f-dca","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://anthropic.ai/","logo":"https://logos.yubhub.co/anthropic.ai.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5108695008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$290,000-$365,000 USD","x-skills-required":["Technical Program Management","Operational or Infrastructure-heavy Environments","Production ML Systems","Incident Tracking and Post-Mortem Execution","Service-Level Objectives (SLOs)","Runbook Quality and Incident-Ownership Clarity","Platform Migrations and Infrastructure Projects","Evals Platform Improvements"],"x-skills-preferred":["SRE Practices","Incident Management Frameworks","On-Call Operations at Scale","Monitoring and Alerting Tooling","Infrastructure Migrations in Complex, Multi-Team Environments"],"datePosted":"2026-04-18T15:55:20.655Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Technical Program Management, Operational or 
Infrastructure-heavy Environments, Production ML Systems, Incident Tracking and Post-Mortem Execution, Service-Level Objectives (SLOs), Runbook Quality and Incident-Ownership Clarity, Platform Migrations and Infrastructure Projects, Evals Platform Improvements, SRE Practices, Incident Management Frameworks, On-Call Operations at Scale, Monitoring and Alerting Tooling, Infrastructure Migrations in Complex, Multi-Team Environments","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":290000,"maxValue":365000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_067a9092-157"},"title":"Manager, Software Engineering - Observability","description":"<p>We are seeking a Manager, Software Engineering - Observability to lead our team of engineers responsible for the reliability, scalability, and evolution of Figma&#39;s observability and cost engineering platforms.</p>\n<p>As a key member of our engineering team, you will own and operate Figma&#39;s core observability stack, including vendor platforms such as Datadog, ensuring high availability, strong data quality, and effective signal-to-noise across metrics, logs, and traces.</p>\n<p>You will define and drive the technical strategy for instrumentation standards, observability libraries, agents, and operators used to monitor internal and external facing services. 
You will also explore and implement innovative, AI-driven approaches to anomaly detection, root cause analysis, signal correlation, and operational automation.</p>\n<p>In addition, you will establish clear frameworks for cost attribution, budgeting, forecasting, and alerting across infrastructure and observability spend, enabling teams to make informed tradeoffs.</p>\n<p>You will partner with infrastructure, product engineering, finance, and security teams to improve visibility into system health and cost efficiency at scale.</p>\n<p>You will lead initiatives to optimize observability footprint and spend, balancing depth of insight with performance and cost considerations.</p>\n<p>You will coach and mentor engineers through career development, performance feedback, and technical leadership, fostering a culture of ownership, collaboration, and high-quality execution.</p>\n<p>We are looking for someone with 4+ years of experience leading infrastructure, observability, or platform engineering teams, with a track record of delivering highly reliable production systems.</p>\n<p>You should have deep hands-on experience with modern observability platforms (e.g., Datadog, OpenTelemetry) across metrics, logs, and distributed tracing.</p>\n<p>You should have a strong understanding of distributed systems, instrumentation best practices, SLO design, and incident response workflows.</p>\n<p>Experience driving cost transparency and accountability initiatives, including cost attribution, budgeting, forecasting, and alerting in cloud environments is also required.</p>\n<p>Preferred skills include experience designing or evolving company-wide observability standards, shared libraries, and agent/operator-based integrations, background in cost optimization for infrastructure or observability tooling, including vendor negotiations and usage modeling, and experience applying AI or machine learning techniques to anomaly detection, root cause analysis, or operational automation.</p>\n<p 
style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_067a9092-157","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Figma","sameAs":"https://www.figma.com/","logo":"https://logos.yubhub.co/figma.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/figma/jobs/5807963004","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$258,000-$376,000 USD","x-skills-required":["observability","datadog","opentelemetry","distributed systems","instrumentation best practices","slo design","incident response workflows","cost transparency","accountability initiatives","cost attribution","budgeting","forecasting","alerting"],"x-skills-preferred":["designing or evolving company-wide observability standards","shared libraries","agent/operator-based integrations","cost optimization for infrastructure or observability tooling","vendor negotiations","usage modeling","applying ai or machine learning techniques to anomaly detection","root cause analysis","operational automation"],"datePosted":"2026-04-18T15:55:20.408Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA • New York, NY • United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"observability, datadog, opentelemetry, distributed systems, instrumentation best practices, slo design, incident response workflows, cost transparency, accountability initiatives, cost attribution, budgeting, forecasting, alerting, designing or evolving company-wide observability standards, shared libraries, agent/operator-based integrations, cost optimization for infrastructure or observability tooling, vendor negotiations, usage modeling, applying ai or machine learning techniques to anomaly detection, root cause analysis, 
operational automation","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":258000,"maxValue":376000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_190bd9e9-0d1"},"title":"Staff+ Software Engineer, Observability","description":"<p><strong>About the Role</strong></p>\n<p>Anthropic is seeking talented and experienced Software Engineers to join our Observability team within the Infrastructure organization. The Observability team owns the monitoring and telemetry infrastructure that every engineer and researcher at Anthropic depends on, from metrics and logging pipelines to distributed tracing, error analytics, alerting, and the dashboards and query interfaces that make it all actionable.</p>\n<p>By joining this team, you’ll have a direct impact on the reliability and operational excellence of Anthropic’s research and product systems.</p>\n<p>As Anthropic scales its infrastructure across massive GPU, TPU, and Trainium clusters, the volume and complexity of operational data is growing by orders of magnitude. 
We’re building next-generation observability systems (high-throughput ingest pipelines, cost-efficient columnar storage, unified query layers across signals, and agentic diagnostic tools) to ensure that engineers can detect, diagnose, and resolve issues in minutes rather than hours, even as the systems they operate become exponentially more complex.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and build scalable telemetry ingest and storage pipelines for metrics, logs, traces, and error data across Anthropic’s multi-cluster infrastructure</li>\n</ul>\n<ul>\n<li>Own and evolve core observability platforms, driving migrations and architectural improvements that improve reliability, reduce cost, and scale with organisational growth</li>\n</ul>\n<ul>\n<li>Build instrumentation libraries, SDKs, and integrations that make it easy for engineering teams to emit high-quality telemetry from their services</li>\n</ul>\n<ul>\n<li>Drive alerting and SLO infrastructure that enables teams to define, monitor, and respond to reliability targets with minimal noise</li>\n</ul>\n<ul>\n<li>Reduce mean time to detection and resolution by building cross-signal correlation, unified query interfaces, and AI-assisted diagnostic tooling</li>\n</ul>\n<ul>\n<li>Partner with Research, Inference, Product, and Infrastructure teams to ensure observability solutions meet the unique needs of each organisation</li>\n</ul>\n<p><strong>You May Be a Good Fit If You</strong></p>\n<ul>\n<li>Have 10+ years of relevant industry experience building and operating large-scale observability or monitoring infrastructure</li>\n</ul>\n<ul>\n<li>Have deep experience with at least one observability signal area (metrics, logging, tracing, or error analytics) and familiarity with the others</li>\n</ul>\n<ul>\n<li>Understand high-throughput data pipelines, columnar storage engines, and the tradeoffs involved in ingesting and querying telemetry data at scale</li>\n</ul>\n<ul>\n<li>Have experience 
operating or building on top of observability platforms such as Prometheus, Grafana, ClickHouse, OpenTelemetry, or similar systems</li>\n</ul>\n<ul>\n<li>Have strong proficiency in at least one of Python, Rust, or Go</li>\n</ul>\n<ul>\n<li>Have excellent communication skills and enjoy partnering with internal teams to improve their operational visibility and incident response capabilities</li>\n</ul>\n<ul>\n<li>Are excited about building foundational infrastructure and are comfortable working independently on ambiguous, high-impact technical challenges</li>\n</ul>\n<p><strong>Strong Candidates May Also Have</strong></p>\n<ul>\n<li>Experience operating metrics systems at very high cardinality (hundreds of millions of active time series or more)</li>\n</ul>\n<ul>\n<li>Experience with log storage migrations or operating columnar databases (ClickHouse, BigQuery, or similar) for analytics workloads</li>\n</ul>\n<ul>\n<li>Experience with OpenTelemetry instrumentation, collector pipelines, and tail-based sampling strategies</li>\n</ul>\n<ul>\n<li>Experience building or operating alerting platforms, on-call tooling, or SLO frameworks at scale</li>\n</ul>\n<ul>\n<li>Experience with Kubernetes-native monitoring, eBPF-based observability, or continuous profiling</li>\n</ul>\n<ul>\n<li>Interest in applying AI/LLMs to operational workflows such as automated root cause analysis, anomaly detection, or intelligent alerting</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<ul>\n<li>Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience</li>\n</ul>\n<ul>\n<li>Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience</li>\n</ul>\n<ul>\n<li>Minimum years of experience: Years of experience required will correlate with the internal job level requirements for the position</li>\n</ul>\n<ul>\n<li>Location-based hybrid policy: Currently, we expect all staff to be in one 
of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>\n</ul>\n<ul>\n<li>Visa sponsorship: We do sponsor visas! However, we aren’t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p><strong>How we’re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact (advancing our long-term goals of steerable, trustworthy AI) rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We’re an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills.</p>\n<p>The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI &amp; Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.</p>\n<p><strong>Come work with us!</strong></p>\n<p>Anthropic is a public benefit corporation headquartered in San Francisco. 
We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_190bd9e9-0d1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5102440008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"£325,000-£390,000 GBP","x-skills-required":["Python","Rust","Go","Prometheus","Grafana","ClickHouse","OpenTelemetry"],"x-skills-preferred":["Kubernetes-native monitoring","eBPF-based observability","continuous profiling","AI/LLMs","automated root cause analysis","anomaly detection","intelligent alerting"],"datePosted":"2026-04-18T15:54:10.425Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London, UK"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Rust, Go, Prometheus, Grafana, ClickHouse, OpenTelemetry, Kubernetes-native monitoring, eBPF-based observability, continuous profiling, AI/LLMs, automated root cause analysis, anomaly detection, intelligent alerting","baseSalary":{"@type":"MonetaryAmount","currency":"GBP","value":{"@type":"QuantitativeValue","minValue":325000,"maxValue":390000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_08d03f20-666"},"title":"Finance Systems Integration Engineer","description":"<p>We are seeking an experienced Finance Systems Integration Engineer to support our finance systems transformation at one of the 
fastest-growing AI companies. You&#39;ll design and build integrations connecting our ERP platform with critical financial applications and support our ERP implementation initiatives.</p>\n<p>As you master our integration landscape, you&#39;ll have opportunities to expand into Claude-powered AI automation and data pipeline development.</p>\n<p>You&#39;ll build the integration backbone for one of the fastest-growing AI companies, with a front-row seat to how Claude transforms financial operations. This is a foundational role where you&#39;ll shape our integration architecture from the ground up, then expand into cutting-edge AI automation as our needs evolve.</p>\n<p>In this role, you will:</p>\n<ul>\n<li>Design, build, and maintain integrations connecting ERP systems with downstream applications, including ZipHQ, Brex, Navan, Clearwater, Payroll systems, Salesforce, and other critical financial platforms using Workato, MuleSoft, or similar iPaaS solutions.</li>\n</ul>\n<ul>\n<li>Support integration development and testing during the ERP implementation projects.</li>\n</ul>\n<ul>\n<li>Develop and maintain REST APIs, webhooks, and OAuth 2.0 authentication flows for secure system-to-system communication.</li>\n</ul>\n<ul>\n<li>Implement real-time and batch integration patterns supporting high-volume financial transactions.</li>\n</ul>\n<ul>\n<li>Establish monitoring, alerting, and error-handling frameworks to ensure integration reliability and data integrity.</li>\n</ul>\n<ul>\n<li>Document integration architectures, data flows, API specifications, and troubleshooting procedures.</li>\n</ul>\n<ul>\n<li>Collaborate with implementation consulting partners and vendors on technical integration requirements.</li>\n</ul>\n<p>Additional scope includes AI automation and data infrastructure, including AI agent development, data pipeline support, governance, and collaboration.</p>\n<p>You may be a good fit if you have 8+ years of experience in integration development, data 
engineering, or systems engineering roles, possess hands-on experience with iPaaS platforms, and have strong programming skills in Python and/or JavaScript/TypeScript.</p>\n<p>Strong candidates may also have experience with high-growth technology companies, background in AI/ML companies, and hands-on experience with specific platforms, including Workday Financials, Stripe, Salesforce, Zuora RevPro, Zip Procurement, Clearwater treasury systems, Pigment planning tools, Numeric close management, and programming skills in Python/JavaScript.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_08d03f20-666","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5155195008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$205,000-$265,000 USD","x-skills-required":["integration development","data engineering","systems engineering","iPaaS platforms","Python","JavaScript/TypeScript","REST APIs","webhooks","OAuth 2.0","secure system-to-system communication","real-time and batch integration patterns","high-volume financial transactions","monitoring","alerting","error-handling frameworks","integration reliability","data integrity","API specifications","troubleshooting procedures"],"x-skills-preferred":["AI automation","data infrastructure","AI agent development","data pipeline support","governance","collaboration","high-growth technology companies","AI/ML companies","specific platforms","Workday Financials","Stripe","Salesforce","Zuora RevPro","Zip Procurement","Clearwater treasury systems","Pigment planning tools","Numeric close 
management"],"datePosted":"2026-04-18T15:52:53.021Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"integration development, data engineering, systems engineering, iPaaS platforms, Python, JavaScript/TypeScript, REST APIs, webhooks, OAuth 2.0, secure system-to-system communication, real-time and batch integration patterns, high-volume financial transactions, monitoring, alerting, error-handling frameworks, integration reliability, data integrity, API specifications, troubleshooting procedures, AI automation, data infrastructure, AI agent development, data pipeline support, governance, collaboration, high-growth technology companies, AI/ML companies, specific platforms, Workday Financials, Stripe, Salesforce, Zuora RevPro, Zip Procurement, Clearwater treasury systems, Pigment planning tools, Numeric close management","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":205000,"maxValue":265000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_22375926-26e"},"title":"Senior IT Systems Engineer","description":"<p>We&#39;re seeking a strategic thinker and proven problem-solver with deep expertise in modern IT ecosystems. As a Sr. 
IT Systems Engineer, you&#39;ll drive automation, mature enterprise workforce identity and access management (IAM), and architect scalable, secure SaaS integrations.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Lead the design, implementation, administration, and optimization of core SaaS platforms including Okta, Google Workspace, Slack, Atlassian, and other IT tools.</li>\n<li>Own end-to-end support, monitoring, troubleshooting, and performance tuning of applications, systems, and their complex interconnections, ensuring high availability, security, and seamless user experience.</li>\n<li>Help architect and advance our workforce Identity and Access Management program, including configuration of Single Sign-On (SSO), lifecycle management, provisioning/deprovisioning, access governance, and policy enforcement.</li>\n<li>Serve as the subject matter expert (SME) providing strategic technical guidance to support business expansion, system scalability, and infrastructure maturity.</li>\n<li>Drive cross-functional knowledge sharing by authoring, maintaining, and evolving comprehensive IT documentation, runbooks, and architecture diagrams.</li>\n<li>Proactively identify gaps, risks, and opportunities in the environment; lead initiatives to enhance security posture, operational efficiency, and resilience, prioritizing automation of manual/repetitive processes.</li>\n<li>Evaluate emerging technologies, IAM trends, and automation platforms; develop business cases and lead proof-of-concepts or adoption recommendations.</li>\n<li>Mentor junior engineers and collaborate with cross-functional teams to align IT capabilities with organisational goals.</li>\n</ul>\n<p><strong>Basic Qualifications:</strong></p>\n<ul>\n<li>8+ years of hands-on experience administering and optimising a broad portfolio of SaaS applications in a hybrid and high-growth environment, with advanced proficiency in our core stack: Okta (including Advanced Server Access &amp; Workflows), Google 
Workspace, Slack Enterprise, Atlassian, etc.</li>\n<li>4+ years of deep experience with n8n, Okta Workflows and/or other leading iPaaS/automation platforms (e.g., Workato, Zapier, BetterCloud, custom integrations).</li>\n<li>Expert-level knowledge of IAM principles and protocols: SSO, SAML, OIDC, OAuth 2.0, SCIM, JIT provisioning, SWA, RBAC, ABAC, and access governance best practices.</li>\n<li>Strong experience designing and working with APIs for custom integrations, data flows, and automation.</li>\n<li>Proficiency in scripting and automation for monitoring, alerting, and operational efficiency (e.g., Google Apps Manager (GAM), Python, Bash, PowerShell, Terraform, or similar); experience building custom solutions is highly valued.</li>\n<li>Solid working knowledge and administrative experience in Azure, AWS, and/or GCP cloud platforms.</li>\n<li>Exceptional analytical and troubleshooting skills with a proven track record of resolving sophisticated, cross-system incidents under pressure.</li>\n<li>Demonstrated ability to deliver measurable business impact, own key deliverables, and drive projects to completion in fast-paced environments with competing priorities.</li>\n<li>Comfortable adapting to dynamic requirements, handling time-sensitive escalations, and participating in on-call rotation.</li>\n<li>Track record of success as a Senior IT Systems Engineer or equivalent in a fast-moving corporate or tech environment.</li>\n<li>Okta certifications (e.g., Okta Certified Professional / Administrator / Consultant) strongly preferred; other relevant certifications (Google Workspace) are a plus.</li>\n<li>Bachelor’s degree in Information Technology, Computer Science, or a related field (or equivalent demonstrated experience) is a plus.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_22375926-26e","directApply":true,"hiringOrganization":{"@type":"Organization","name":"xAI","sameAs":"https://www.xai.com","logo":"https://logos.yubhub.co/xai.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/xai/jobs/5071895007","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$184,000 - $276,000 USD","x-skills-required":["Okta","Google Workspace","Slack","Atlassian","n8n","Okta Workflows","iPaaS/automation platforms","IAM principles and protocols","APIs for custom integrations","data flows","automation","scripting and automation","monitoring","alerting","operational efficiency","Azure","AWS","GCP cloud platforms","analytical and troubleshooting skills"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:51:33.231Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo Alto, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"IT","industry":"Technology","skills":"Okta, Google Workspace, Slack, Atlassian, n8n, Okta Workflows, iPaaS/automation platforms, IAM principles and protocols, APIs for custom integrations, data flows, automation, scripting and automation, monitoring, alerting, operational efficiency, Azure, AWS, GCP cloud platforms, analytical and troubleshooting skills","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":184000,"maxValue":276000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_72ebb09d-b37"},"title":"Staff+ Software Engineer, Observability","description":"<p>We&#39;re seeking talented and experienced Software Engineers to join our Observability team within the Infrastructure organization. 
The Observability team owns the monitoring and telemetry infrastructure that every engineer and researcher at Anthropic depends on, from metrics and logging pipelines to distributed tracing, error analytics, alerting, and the dashboards and query interfaces that make it all actionable.</p>\n<p>As Anthropic scales its infrastructure across massive GPU, TPU, and Trainium clusters, the volume and complexity of operational data is growing by orders of magnitude. We&#39;re building next-generation observability systems (high-throughput ingest pipelines, cost-efficient columnar storage, unified query layers across signals, and agentic diagnostic tools) to ensure that engineers can detect, diagnose, and resolve issues in minutes rather than hours, even as the systems they operate become exponentially more complex.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and build scalable telemetry ingest and storage pipelines for metrics, logs, traces, and error data across Anthropic&#39;s multi-cluster infrastructure</li>\n<li>Own and evolve core observability platforms, driving migrations and architectural improvements that improve reliability, reduce cost, and scale with organisational growth</li>\n<li>Build instrumentation libraries, SDKs, and integrations that make it easy for engineering teams to emit high-quality telemetry from their services</li>\n<li>Drive alerting and SLO infrastructure that enables teams to define, monitor, and respond to reliability targets with minimal noise</li>\n<li>Reduce mean time to detection and resolution by building cross-signal correlation, unified query interfaces, and AI-assisted diagnostic tooling</li>\n<li>Partner with Research, Inference, Product, and Infrastructure teams to ensure observability solutions meet the unique needs of each organisation</li>\n</ul>\n<p>You May Be a Good Fit If You:</p>\n<ul>\n<li>Have 10+ years of relevant industry experience building and operating large-scale observability or monitoring 
infrastructure</li>\n<li>Have deep experience with at least one observability signal area (metrics, logging, tracing, or error analytics) and familiarity with the others</li>\n<li>Understand high-throughput data pipelines, columnar storage engines, and the tradeoffs involved in ingesting and querying telemetry data at scale</li>\n<li>Have experience operating or building on top of observability platforms such as Prometheus, Grafana, ClickHouse, OpenTelemetry, or similar systems</li>\n<li>Have strong proficiency in at least one of Python, Rust, or Go</li>\n<li>Have excellent communication skills and enjoy partnering with internal teams to improve their operational visibility and incident response capabilities</li>\n<li>Are excited about building foundational infrastructure and are comfortable working independently on ambiguous, high-impact technical challenges</li>\n</ul>\n<p>Strong Candidates May Also Have:</p>\n<ul>\n<li>Experience operating metrics systems at very high cardinality (hundreds of millions of active time series or more)</li>\n<li>Experience with log storage migrations or operating columnar databases (ClickHouse, BigQuery, or similar) for analytics workloads</li>\n<li>Experience with OpenTelemetry instrumentation, collector pipelines, and tail-based sampling strategies</li>\n<li>Experience building or operating alerting platforms, on-call tooling, or SLO frameworks at scale</li>\n<li>Experience with Kubernetes-native monitoring, eBPF-based observability, or continuous profiling</li>\n<li>Interest in applying AI/LLMs to operational workflows such as automated root cause analysis, anomaly detection, or intelligent alerting</li>\n</ul>\n<p>The annual compensation range for this role is $405,000-$485,000 USD.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_72ebb09d-b37","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5139910008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$405,000-$485,000 USD","x-skills-required":["observability","monitoring","telemetry","metrics","logging","tracing","error analytics","alerting","SLO infrastructure","cross-signal correlation","unified query interfaces","AI-assisted diagnostic tooling","Python","Rust","Go","Prometheus","Grafana","ClickHouse","OpenTelemetry"],"x-skills-preferred":["high-throughput data pipelines","columnar storage engines","operating system administration","cloud computing","containerization","DevOps"],"datePosted":"2026-04-18T15:51:29.494Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"observability, monitoring, telemetry, metrics, logging, tracing, error analytics, alerting, SLO infrastructure, cross-signal correlation, unified query interfaces, AI-assisted diagnostic tooling, Python, Rust, Go, Prometheus, Grafana, ClickHouse, OpenTelemetry, high-throughput data pipelines, columnar storage engines, operating system administration, cloud computing, containerization, DevOps","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_372999e8-579"},"title":"Senior Software Engineer II, AI Workload Orchestration","description":"<p>As a Senior Software 
Engineer II on the AI Workload Orchestration team, you will help build and operate CoreWeave&#39;s Kubernetes-native platform for admitting, scheduling, and operating AI workloads at scale.</p>\n<p>This platform integrates multiple orchestration and scheduling frameworks such as Kueue, Volcano, and Ray to support modern AI training and inference workflows. It complements SUNK (Slurm on Kubernetes) by providing a Kubernetes-first, cloud-native orchestration layer with deep platform integration.</p>\n<p>You will own meaningful components of the platform, drive reliability and performance improvements, and help scale the system as customer demand and workload complexity continue to grow.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design, build, and operate Kubernetes-native services for AI workload orchestration and scheduling</li>\n<li>Own one or more platform components end-to-end, including design, implementation, testing, and on-call support</li>\n<li>Improve scheduling latency, cluster utilization, and workload reliability through metrics-driven engineering</li>\n<li>Contribute to architectural discussions across services and influence design decisions within the platform</li>\n<li>Work closely with adjacent teams (CKS, infrastructure, managed inference) to ensure clean interfaces and integrations</li>\n<li>Mentor junior engineers and raise the quality bar for code, design, and operations</li>\n</ul>\n<p>About the role:</p>\n<ul>\n<li>5–8 years of professional software engineering experience in distributed systems, cloud infrastructure, or platform engineering</li>\n<li>Strong experience building production systems in Go (Python or C++ a plus)</li>\n<li>Solid understanding of Kubernetes fundamentals, APIs, controllers, and operating services in production</li>\n<li>Experience working with scheduling, resource management, or quota-based systems</li>\n<li>Proven ability to improve system reliability and performance using data and operational 
metrics</li>\n<li>Comfortable owning services in production and participating in on-call rotations</li>\n</ul>\n<p>Preferred:</p>\n<ul>\n<li>Experience with Kubernetes-native orchestration frameworks such as Kueue, Volcano, Ray, Kubeflow, or Argo Workflows</li>\n<li>Familiarity with GPU-based workloads, ML training, or inference pipelines</li>\n<li>Knowledge of scheduling concepts such as quota enforcement, pre-emption, and backfilling</li>\n<li>Experience with reliability practices including SLOs, alerting, and incident response</li>\n<li>Exposure to AI infrastructure, HPC, or large-scale distributed compute environments</li>\n</ul>\n<p>Why CoreWeave?</p>\n<p>At CoreWeave, we work hard, have fun, and move fast! We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:</p>\n<ul>\n<li>Be Curious at Your Core</li>\n<li>Act Like an Owner</li>\n<li>Empower Employees</li>\n<li>Deliver Best-in-Class Client Experiences</li>\n<li>Achieve More Together</li>\n</ul>\n<p>The base salary range for this role is $165,000 to $242,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>\n<p>What We Offer</p>\n<p>The range we’ve posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. 
These include qualifications, experience, interview performance, and location.</p>\n<p>In addition to a competitive salary, we offer a variety of benefits to support your needs, including:</p>\n<ul>\n<li>Medical, dental, and vision insurance - 100% paid for by CoreWeave</li>\n<li>Company-paid Life Insurance</li>\n<li>Voluntary supplemental life insurance</li>\n<li>Short and long-term disability insurance</li>\n<li>Flexible Spending Account</li>\n<li>Health Savings Account</li>\n<li>Tuition Reimbursement</li>\n<li>Ability to Participate in Employee Stock Purchase Program (ESPP)</li>\n<li>Mental Wellness Benefits through Spring Health</li>\n<li>Family-Forming support provided by Carrot</li>\n<li>Paid Parental Leave</li>\n<li>Flexible, full-service childcare support with Kinside</li>\n<li>401(k) with a generous employer match</li>\n<li>Flexible PTO</li>\n<li>Catered lunch each day in our office and data center locations</li>\n<li>A casual work environment</li>\n<li>A work culture focused on innovative disruption</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_372999e8-579","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4647595006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$165,000 to $242,000","x-skills-required":["Kubernetes","Go","Distributed systems","Cloud infrastructure","Platform engineering","Scheduling","Resource management","Quota-based systems"],"x-skills-preferred":["Kueue","Volcano","Ray","Kubeflow","Argo Workflows","GPU-based workloads","ML training","Inference pipelines","SLOs","Alerting","Incident response","AI infrastructure","HPC","Large-scale distributed compute 
environments"],"datePosted":"2026-04-18T15:50:19.636Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Go, Distributed systems, Cloud infrastructure, Platform engineering, Scheduling, Resource management, Quota-based systems, Kueue, Volcano, Ray, Kubeflow, Argo Workflows, GPU-based workloads, ML training, Inference pipelines, SLOs, Alerting, Incident response, AI infrastructure, HPC, Large-scale distributed compute environments","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":165000,"maxValue":242000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_336080a3-2f9"},"title":"Senior Software Engineer, Network Performance & Reliability","description":"<p>About Us At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world’s largest networks that powers millions of websites and other Internet properties for customers ranging from individual bloggers to SMBs to Fortune 500 companies.</p>\n<p>We protect and accelerate any Internet application online without adding hardware, installing software, or changing a line of code. Internet properties powered by Cloudflare all have web traffic routed through its intelligent global network, which gets smarter with every request. As a result, they see significant improvement in performance and a decrease in spam and other attacks.</p>\n<p>What You’ll Do The Argo team was formed to own a very important aspect of Cloudflare&#39;s systems: enable more reliable network connectivity for Cloudflare’s products than the Internet itself provides. 
Almost all products in Cloudflare’s portfolio are or will be powered by Argo technology, including CDN, Spectrum, Magic Transit, Stream, Workers, Workers AI, R2, WARP, and more.</p>\n<p>As a member of the Argo team, you’ll be a key technical contributor to the cutting edge network software infrastructure used by those products. You will work closely with various Engineering teams to translate their requirements into new capabilities on the platform. Likewise you will partner with Network Engineering and SRE to ensure that the technology makes the best use of Cloudflare&#39;s world-class edge network.</p>\n<p>You will participate in all stages of the software development lifecycle, from designing and documenting systems, to writing code and automated tests, to planning, managing, and monitoring production software deployments. You will work with a wide range of technologies and programming languages, including Rust, Go, Linux networking, ClickHouse, PostgreSQL, Grafana, Kubernetes, and more.</p>\n<p>Must-Have Skills</p>\n<ul>\n<li>Systems-level programming experience in Go, Rust, C, or C++</li>\n<li>A solid grasp of networking protocols in Layers 3 and 4 of the OSI Model</li>\n<li>Knowledge of HTTP, TLS, and CDN networks</li>\n<li>Experience in implementing secure and highly-available distributed systems</li>\n<li>Experience with monitoring, alerting, and debugging large-scale distributed systems</li>\n<li>Experience participating in an on-call rotation</li>\n<li>Strong collaboration and communication skills</li>\n<li>Experience/interest in network performance monitoring and tuning</li>\n<li>Willingness to adopt and integrate AI tools and systems into your engineering workflow</li>\n</ul>\n<p>Bonus Points</p>\n<ul>\n<li>Knowledge of TCP/IP and Internet routing</li>\n<li>Professional systems-level programming experience in Rust</li>\n<li>Working knowledge of statistical-analysis techniques and control theory</li>\n<li>Experience building tools and 
APIs</li>\n<li>Experience using AI-assisted development tools (e.g., code completion, codebase analysis, log/data exploration) in a professional setting</li>\n</ul>\n<p>Equity This role is eligible to participate in Cloudflare’s equity plan.</p>\n<p>Benefits Cloudflare offers a complete package of benefits and programs to support you and your family. Our benefits programs can help you pay health care expenses, support caregiving, build capital for the future and make life a little easier and fun!</p>\n<p>Time Off Flexible paid time off covering vacation and sick leave Leave programs, including parental, pregnancy health, medical, and bereavement leave</p>\n<p>More information on the team: https://blog.cloudflare.com/orpheus-saves-internet-requests-while-maintaining-speed/ https://blog.cloudflare.com/orpheus/</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_336080a3-2f9","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Cloudflare","sameAs":"https://www.cloudflare.com/","logo":"https://logos.yubhub.co/cloudflare.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/cloudflare/jobs/7446340","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Systems-level programming experience in Go, Rust, C, or C++","A solid grasp of networking protocols in Layers 3 and 4 of the OSI Model","Knowledge of HTTP, TLS, and CDN networks","Experience in implementing secure and highly-available distributed systems","Experience with monitoring, alerting, and debugging large-scale distributed systems"],"x-skills-preferred":["Knowledge of TCP/IP and Internet routing","Professional systems-level programming experience in Rust","Working knowledge of statistical-analysis techniques and control theory","Experience building tools and APIs","Experience using AI-assisted 
development tools (e.g., code completion, codebase analysis, log/data exploration) in a professional setting"],"datePosted":"2026-04-18T15:49:58.486Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Hybrid"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Systems-level programming experience in Go, Rust, C, or C++, A solid grasp of networking protocols in Layers 3 and 4 of the OSI Model, Knowledge of HTTP, TLS, and CDN networks, Experience in implementing secure and highly-available distributed systems, Experience with monitoring, alerting, and debugging large-scale distributed systems, Knowledge of TCP/IP and Internet routing, Professional systems-level programming experience in Rust, Working knowledge of statistical-analysis techniques and control theory, Experience building tools and APIs, Experience using AI-assisted development tools (e.g., code completion, codebase analysis, log/data exploration) in a professional setting"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d34bbf18-2b2"},"title":"Senior Site Reliability Engineer (FinOps) - Platform","description":"<p>As a Senior Site Reliability Engineer (FinOps) - Platform, you will be part of the Platform Engineering department, responsible for designing, building, scaling, and maturing the multi-cloud platform for hosting internal and external services. You will lead technical initiatives for automating system engineering efforts to guarantee the reliability of the global Elastic infrastructure. 
You will also grow our global Platform infrastructure to meet the increasing scaling demands by developing and maintaining software, tooling, and automations.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Taking an engineering approach in leading technical initiatives for automating system engineering efforts to guarantee the reliability of the global Elastic infrastructure.</li>\n<li>Growing our global Platform infrastructure to meet the increasing scaling demands by developing and maintaining software, tooling, and automations.</li>\n<li>Using an inclusive approach at championing an environment focused on collaboration, operational excellence, and uplifting others.</li>\n<li>Responding to and preventing repeated customer impact in response to major incidents and prioritized problem management.</li>\n</ul>\n<p>The ideal candidate will have success and lessons of experiences from striving for &#39;progress not perfection&#39; in the name of Platform reliability. They will have a background in software engineering to collaborate with engineers to expertly identify, implement, and deliver solutions. An experience in public cloud and managed Kubernetes services is advantageous.</p>\n<p>The role requires passion for developing solutions that involve inclusive communication methods to grow and strengthen partner and team relationships. 
Examples of working in distributed teams or working remotely is desirable.</p>\n<p>Bonus points for experience in operating a SaaS product in a public cloud, building or operating a Kubernetes-at-scale infrastructure, writing non-trivial programs in Golang or other programming languages, working with containerized services, leading and improving alerting and major incident management standard processes metrics systems, and experience in system administration with professional skills in Linux on distributed systems at scale.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d34bbf18-2b2","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Elastic","sameAs":"https://www.elastic.co/","logo":"https://logos.yubhub.co/elastic.co.png"},"x-apply-url":"https://job-boards.greenhouse.io/elastic/jobs/7565188","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Cloud computing","Kubernetes","Golang","Containerization","Linux","System administration","Alerting and incident management"],"x-skills-preferred":["Infrastructure-as-Code","Terraform","Crossplane","Distributed systems","Self-organizing teams"],"datePosted":"2026-04-18T15:49:53.439Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Spain"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cloud computing, Kubernetes, Golang, Containerization, Linux, System administration, Alerting and incident management, Infrastructure-as-Code, Terraform, Crossplane, Distributed systems, Self-organizing teams"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9e898a04-26d"},"title":"Production Engineer, Support tooling (Tooling and 
Frameworks)","description":"<p>The Senior Production Engineering team sits at the heart of CoreWeave&#39;s reliability efforts. In this role, you&#39;ll partner closely with our Support/CX teams to build, operate, and evolve internal tooling that enables a &quot;Direct-to-Expert&quot; support model at scale.</p>\n<p>You&#39;ll define and ship AI-assisted workflows, self-service diagnostics, and platform integrations that reduce time-to-resolution and improve customer experience across our cloud.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Design, build, and own support-facing tools for case triage, intelligent routing, and expert engagement, integrating with incident and change management workflows.</li>\n</ul>\n<ul>\n<li>Develop AI-powered assistants and automations that accelerate root-cause discovery, knowledge retrieval, and resolution quality.</li>\n</ul>\n<ul>\n<li>Create and maintain dashboards, alerts, and signals that surface tooling issues early; integrate observability into new tooling to reduce MTTR.</li>\n</ul>\n<ul>\n<li>Build self-service and guided diagnostics that empower Support/CX to resolve common issues and collect high-quality context for escalations.</li>\n</ul>\n<ul>\n<li>Codify reliability and support practices into services, APIs, and Kubernetes-native controllers/operators where appropriate.</li>\n</ul>\n<ul>\n<li>Partner with engineering leadership and internal stakeholders to prioritise roadmap initiatives, land adoption, and measure business impact.</li>\n</ul>\n<ul>\n<li>Participate in an on-call rotation for the tooling you own.</li>\n</ul>\n<p>Minimum qualifications include:</p>\n<ul>\n<li>4+ years of software or infrastructure engineering experience building and operating production services.</li>\n</ul>\n<ul>\n<li>Proficiency in Go or Python (or equivalent experience).</li>\n</ul>\n<ul>\n<li>Strong fundamentals in Linux, containers, and Kubernetes; comfortable debugging in distributed 
systems.</li>\n</ul>\n<ul>\n<li>Experience with observability (metrics/logs/traces) and using data to improve reliability and support outcomes.</li>\n</ul>\n<ul>\n<li>Demonstrated experience with incident management and steady-state operational excellence (e.g., progressive delivery, testing strategies, error budgets, fault-tolerant design).</li>\n</ul>\n<ul>\n<li>Comfort collaborating with multiple stakeholders (Support/CX, Product, SRE, and service owners).</li>\n</ul>\n<p>Preferred qualifications include:</p>\n<ul>\n<li>Experience integrating or building support/operations tooling (e.g., ticketing/incident systems, status page, knowledge management, chat/alerting integrations).</li>\n</ul>\n<ul>\n<li>Experience automating manual workflows and stitching together productivity platforms.</li>\n</ul>\n<ul>\n<li>Familiarity with AI/ML tooling for retrieval, summarization, or copilot-style assistance.</li>\n</ul>\n<ul>\n<li>Experience codifying operational practices into Kubernetes controllers, operators, or platform services.</li>\n</ul>\n<p>The base salary range for this role is $139,000 to $204,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. 
In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).</p>\n<p>In addition to a competitive salary, we offer a variety of benefits to support your needs, including:</p>\n<ul>\n<li>Medical, dental, and vision insurance - 100% paid for by CoreWeave</li>\n</ul>\n<ul>\n<li>Company-paid Life Insurance</li>\n</ul>\n<ul>\n<li>Voluntary supplemental life insurance</li>\n</ul>\n<ul>\n<li>Short and long-term disability insurance</li>\n</ul>\n<ul>\n<li>Flexible Spending Account</li>\n</ul>\n<ul>\n<li>Health Savings Account</li>\n</ul>\n<ul>\n<li>Tuition Reimbursement</li>\n</ul>\n<ul>\n<li>Ability to Participate in Employee Stock Purchase Program (ESPP)</li>\n</ul>\n<ul>\n<li>Mental Wellness Benefits through Spring Health</li>\n</ul>\n<ul>\n<li>Family-Forming support provided by Carrot</li>\n</ul>\n<ul>\n<li>Paid Parental Leave</li>\n</ul>\n<ul>\n<li>Flexible, full-service childcare support with Kinside</li>\n</ul>\n<ul>\n<li>401(k) with a generous employer match</li>\n</ul>\n<ul>\n<li>Flexible PTO</li>\n</ul>\n<ul>\n<li>Catered lunch each day in our office and data center locations</li>\n</ul>\n<ul>\n<li>A casual work environment</li>\n</ul>\n<ul>\n<li>A work culture focused on innovative disruption</li>\n</ul>\n<p>Our Workplace</p>\n<p>While we prioritise a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialised skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. 
Teams also gather quarterly to support collaboration.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_9e898a04-26d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4617128006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$139,000 to $204,000","x-skills-required":["Go","Python","Linux","containers","Kubernetes","observability","incident management","operational excellence"],"x-skills-preferred":["AI/ML tooling","ticketing/incident systems","status page","knowledge management","chat/alerting integrations","automating manual workflows","productivity platforms"],"datePosted":"2026-04-18T15:48:08.984Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Go, Python, Linux, containers, Kubernetes, observability, incident management, operational excellence, AI/ML tooling, ticketing/incident systems, status page, knowledge management, chat/alerting integrations, automating manual workflows, productivity platforms","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":139000,"maxValue":204000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_40d32156-365"},"title":"Reliability Lead, Common Services","description":"<p>As Reliability Lead, Common Services, you will establish and lead the Reliability Engineering and production operations practice for the Common Services organization. 
You&#39;ll partner closely with engineering leaders and teams across Common Services to define how we build, release, monitor, and operate critical services, raising the bar on reliability, availability, and operational excellence across the board.</p>\n<p>In this role, you will:</p>\n<ul>\n<li>Establish and lead the SRE / production engineering practice for the Common Services organization, including standards for reliability, incident management, and on-call, in partnership with the central Product Engineering organization.</li>\n<li>Develop an Operational Excellence strategy that focuses not only on improving system performance but also on monitoring and reducing operational toil.</li>\n<li>Partner with engineering and product teams to define SLOs, SLIs, and error budgets for critical Common Services, and ensure these become part of how teams plan and make tradeoffs.</li>\n<li>Own and improve the incident management lifecycle for Common Services, including on-call rotations, escalation paths, incident tooling, post-incident reviews, and follow-through on corrective actions.</li>\n<li>Drive the observability strategy (metrics, logs, traces, dashboards, alerts) for Common Services, ensuring we have actionable visibility into the health, performance, and capacity of key systems.</li>\n<li>Collaborate with engineering leads to design and review architectures for reliability, scalability, resilience, and operability, including failure modes, redundancy, and graceful degradation.</li>\n<li>Lead efforts to automate and harden operational workflows, including deployments, rollbacks, configuration management, change management, and routine maintenance tasks.</li>\n<li>Build strong, trust-based relationships with partner teams and stakeholders, becoming a go-to leader for production readiness and operational risk within Common Services.</li>\n<li>Hire, mentor, and develop SRE and production engineering talent, fostering a culture of continuous improvement, learning from 
incidents, and humane on-call.</li>\n<li>Partner with other SRE and production engineering leaders across CoreWeave to align on global practices, tools, and reliability goals, representing the needs and constraints of Common Services.</li>\n</ul>\n<p>You will be responsible for defining the reliability strategy, processes, and standards for the Common Services portfolio and driving consistent, high-quality operational practices across multiple teams.</p>\n<p>The base salary range for this role is $206,000 to $303,000.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_40d32156-365","directApply":true,"hiringOrganization":{"@type":"Organization","name":"CoreWeave","sameAs":"https://www.coreweave.com","logo":"https://logos.yubhub.co/coreweave.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/coreweave/jobs/4650165006","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$206,000 to $303,000","x-skills-required":["Site Reliability Engineering","Production Engineering","Linux-based production environments","Containers","Orchestration technologies","Observability stacks","Alerting systems","SLIs/SLOs","Error budgets","Incident management","On-call rotations","Escalation paths","Post-incident reviews","Corrective actions","Automation tooling","Infrastructure-as-code","CI/CD pipelines"],"x-skills-preferred":["GPU workloads","High-performance computing","Latency/throughput-sensitive systems","Multi-tenant environments","Multi-region environments","Regulated environments","Service ownership models","Mentoring","Managing senior engineers"],"datePosted":"2026-04-18T15:47:45.370Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York, NY / Sunnyvale, CA / Bellevue, 
WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Site Reliability Engineering, Production Engineering, Linux-based production environments, Containers, Orchestration technologies, Observability stacks, Alerting systems, SLIs/SLOs, Error budgets, Incident management, On-call rotations, Escalation paths, Post-incident reviews, Corrective actions, Automation tooling, Infrastructure-as-code, CI/CD pipelines, GPU workloads, High-performance computing, Latency/throughput-sensitive systems, Multi-tenant environments, Multi-region environments, Regulated environments, Service ownership models, Mentoring, Managing senior engineers","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":206000,"maxValue":303000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_fca5411d-4fb"},"title":"Staff Site Reliability Engineer - Kubernetes","description":"<p>Secure Every Identity, from AI to Human</p>\n<p>Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organisations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.</p>\n<p>This is an opportunity to do career-defining work. We&#39;re all in on this mission. If you are too, let&#39;s talk.</p>\n<p>Workforce Identity Cloud</p>\n<p>Okta Workforce Identity Cloud (WIC) provides easy, secure access for your workforce so you can focus on other strategic priorities, like reducing costs and doing more for your customers.</p>\n<p>If you like to be challenged and have a passion for solving large-scale automation, testing, and tuning problems, we would love to hear from you. 
The ideal candidate is someone who exemplifies the ethics of, “If you have to do something more than once, automate it” and who can rapidly self-educate on new concepts and tools.</p>\n<p><strong>Position Overview:</strong></p>\n<p>The Site Reliability Engineer (SRE) will play a key role in building and managing Kubernetes platforms that support cloud-native applications and services. This position focuses on architecting and managing reliable, scalable, and secure Kubernetes-based platforms on AWS, ensuring high availability and performance while optimising costs and automation. The ideal candidate will have hands-on experience with AWS infrastructure, Kubernetes platform creation, Helm charts, Karpenter scaling, and Istio service mesh.</p>\n<p><strong>Key Responsibilities:</strong></p>\n<ul>\n<li>Kubernetes Platform Creation: Design, implement, and maintain highly available, scalable, and fault-tolerant Kubernetes platforms. Ensure clusters are optimised for production workloads, providing high resilience and operational efficiency.</li>\n</ul>\n<ul>\n<li>AWS Infrastructure Management: Build, manage, and optimise AWS cloud infrastructure, including EKS, ECS, S3, VPCs, RDS, IAM, and more. Implement best practices for cost management, scaling, and security within AWS.</li>\n</ul>\n<ul>\n<li>Helm Management: Utilise Helm to automate and streamline the deployment of applications and services to Kubernetes clusters. Create, maintain, and manage Helm charts for production-ready deployments.</li>\n</ul>\n<ul>\n<li>Karpenter Implementation: Implement and manage Karpenter to dynamically scale Kubernetes clusters in response to workload demands.</li>\n</ul>\n<ul>\n<li>Istio Service Mesh Management: Configure and manage Istio to provide service-to-service communication, security, and observability within the Kubernetes clusters. 
Enable fine-grained traffic management, service discovery, and policy enforcement.</li>\n</ul>\n<ul>\n<li>Platform Automation &amp; Scaling: Automate the deployment, scaling, and management of infrastructure and applications. Work with CI/CD pipelines to ensure a seamless flow from development to production with minimal downtime.</li>\n</ul>\n<ul>\n<li>Incident Management &amp; Troubleshooting: Respond to incidents, troubleshoot, and resolve system issues related to performance, availability, and security in a timely and effective manner.</li>\n</ul>\n<ul>\n<li>Security &amp; Compliance: Design and implement secure cloud infrastructure with appropriate access controls, network security, and compliance frameworks.</li>\n</ul>\n<ul>\n<li>Documentation &amp; Knowledge Sharing: Create and maintain detailed documentation for Kubernetes platform setup, operational procedures, and best practices. Promote knowledge sharing across teams.</li>\n</ul>\n<p><strong>Required Qualifications:</strong></p>\n<ul>\n<li>4+ years of experience with Kubernetes/Helm.</li>\n</ul>\n<ul>\n<li>4+ years of experience with Terraform.</li>\n</ul>\n<ul>\n<li>5+ years of experience with AWS.</li>\n</ul>\n<ul>\n<li>Experience with multi-region cloud environments.</li>\n</ul>\n<ul>\n<li>Proven experience with AWS (EC2, RDS, S3, CloudFormation, IAM, etc.) 
and solid understanding of cloud-native architectures.</li>\n</ul>\n<ul>\n<li>Strong expertise in Kubernetes platform creation, management, and optimisation (e.g., setting up highly available clusters, networking, and storage).</li>\n</ul>\n<ul>\n<li>Hands-on experience with Helm for Kubernetes application deployment and management.</li>\n</ul>\n<ul>\n<li>Practical experience with Karpenter for dynamic scaling of Kubernetes clusters and optimising resource usage.</li>\n</ul>\n<ul>\n<li>Expertise in managing and securing Istio for service mesh, including traffic management, security, and observability features.</li>\n</ul>\n<ul>\n<li>Proficiency in CI/CD pipelines and automation tools (e.g., Jenkins, GitLab, CircleCI, Terraform, Ansible, Spinnaker).</li>\n</ul>\n<ul>\n<li>Strong scripting and automation skills in Python, Bash, or Go for infrastructure management and platform automation.</li>\n</ul>\n<ul>\n<li>Experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, CloudWatch, and ELK Stack.</li>\n</ul>\n<p><strong>Preferred Qualifications:</strong></p>\n<ul>\n<li>Understanding of security best practices for cloud platforms and Kubernetes (e.g., role-based access control (RBAC), encryption, and compliance frameworks).</li>\n</ul>\n<ul>\n<li>Familiarity with Docker and containerization principles.</li>\n</ul>\n<ul>\n<li>Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent professional experience).</li>\n</ul>\n<ul>\n<li>Certifications (Preferred): CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), or AWS Certified DevOps Engineer are highly desirable.</li>\n</ul>\n<p>Additional requirements:</p>\n<ul>\n<li>This position requires the ability to access federal environments and/or have access to protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g. a U.S. 
Citizen, National, Lawful Permanent Resident, Refugee, or Asylee. 22 CFR 120.15) upon hire.</li>\n</ul>\n<ul>\n<li>Requires in-person onboarding and travel to our San Francisco, CA HQ office or our Chicago office during the first week of employment.</li>\n</ul>\n<p>#LI-Hybrid</p>\n<p>#LI-LSS1</p>\n<p>requisition ID- (P16373_3396241)</p>\n<p>The annual base salary range for this position for candidates located in the San Francisco Bay area is between: $194,000-$267,000 USD</p>\n<p>Below is the annual base salary range for candidates located in California (excluding San Francisco Bay Area), Colorado, Illinois, New York and Washington. Your actual base salary will depend on factors such as your skills, qualifications, experience, and work location. In addition, Okta offers equity (where applicable), bonus, and benefits, including health, dental and vision insurance, 401(k), flexible spending account, and paid leave (including PTO and parental leave) in accordance with our applicable plans and policies. To learn more about our Total Rewards program, please visit: https://rewards.okta.com/us.</p>\n<p>The annual base salary range for this position for candidates located in California (excluding San Francisco Bay Area), Colorado, Illinois, New York, and Washington is between: $174,000-$214,000 USD</p>\n<p>The Okta Experience</p>\n<ul>\n<li>Supporting Your Well-Being</li>\n</ul>\n<ul>\n<li>Driving Social Impact</li>\n</ul>\n<ul>\n<li>Developing Talent and Fostering Connection + Community</li>\n</ul>\n<p>We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. 
Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_fca5411d-4fb","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Okta","sameAs":"https://www.okta.com/","logo":"https://logos.yubhub.co/okta.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/okta/jobs/7743339","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$174,000-$214,000 USD","x-skills-required":["Kubernetes","Helm","Terraform","AWS","Cloud-native architectures","Kubernetes platform creation","Kubernetes management","Kubernetes optimisation","Helm for Kubernetes application deployment","Karpenter for dynamic scaling","Istio for service mesh","CI/CD pipelines","Automation tools","Python","Bash","Go","Monitoring","Logging","Alerting"],"x-skills-preferred":["Security best practices for cloud platforms and Kubernetes","Docker and containerization principles","Certified Kubernetes Administrator","Certified Kubernetes Application Developer","AWS Certified DevOps Engineer"],"datePosted":"2026-04-18T15:46:19.185Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Bellevue, Washington; Chicago, Illinois; New York, New York; San Francisco, California; Washington, DC"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Kubernetes, Helm, Terraform, AWS, Cloud-native architectures, Kubernetes platform creation, Kubernetes management, Kubernetes optimisation, Helm for Kubernetes application deployment, Karpenter for dynamic scaling, Istio for service mesh, CI/CD pipelines, Automation tools, Python, Bash, Go, Monitoring, Logging, Alerting, Security best practices for cloud platforms 
and Kubernetes, Docker and containerization principles, Certified Kubernetes Administrator, Certified Kubernetes Application Developer, AWS Certified DevOps Engineer","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":174000,"maxValue":214000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_061e824c-343"},"title":"Software Engineer: Distributed Systems (Infrastructure)","description":"<p>About Us</p>\n<p>At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world&#39;s largest networks that powers millions of websites and other Internet properties for customers ranging from individual bloggers to SMBs to Fortune 500 companies.</p>\n<p>We protect and accelerate any Internet application online without adding hardware, installing software, or changing a line of code. Internet properties powered by Cloudflare all have web traffic routed through its intelligent global network, which gets smarter with every request. As a result, they see significant improvement in performance and a decrease in spam and other attacks.</p>\n<p>Responsibilities</p>\n<p>As a Software Engineer: Distributed Systems, you will be part of a Resiliency Organization responsible for the core services that power Cloudflare&#39;s global operations. We are looking for engineers to join the Infrastructure Intelligence team and shape the transition toward model-driven network orchestration.</p>\n<p>The team is building a cutting-edge &#39;Maintenance Coordination System&#39;, powered by an infrastructure dependency graph of one of the world&#39;s largest physical networks. 
This is a foundational step towards designing intelligent, autonomous systems that will transform the orchestration of Cloudflare&#39;s network.</p>\n<p>It forms the basis of many future projects to build the core data structures and services required to ensure our network optimization, network forecasting, and capacity planning are the state of the art. By creating the robust primitives for global coordination today, you will be enabling the next generation of data-driven infrastructure at Cloudflare.</p>\n<p>This is a unique opportunity to work on complex, globally distributed systems which underpin all Cloudflare products.</p>\n<p>Technologies we use:</p>\n<ul>\n<li>Cloudflare Workers, Workers KV, R2, and Durable Objects</li>\n</ul>\n<ul>\n<li>Kubernetes</li>\n</ul>\n<ul>\n<li>Go, Typescript, Python</li>\n</ul>\n<ul>\n<li>For service monitoring we use Prometheus, Grafana and Sentry</li>\n</ul>\n<p>Requirements</p>\n<ul>\n<li>A degree in Computer Science, Engineering, Mathematics, Statistics or related field; OR have relevant background/experience to the field.</li>\n</ul>\n<ul>\n<li>Programming experience in Go, or similar languages</li>\n</ul>\n<ul>\n<li>Experience in designing and implementing secure and highly-available distributed systems</li>\n</ul>\n<ul>\n<li>Experience (and love) for debugging to ensure the system works in all cases</li>\n</ul>\n<ul>\n<li>Experience with a continuous integration workflow and using source control (we use git)</li>\n</ul>\n<ul>\n<li>Experience with continuous delivery and deployment of a k8s hosted application</li>\n</ul>\n<ul>\n<li>Understanding of security issues and responsibilities</li>\n</ul>\n<ul>\n<li>Experience with monitoring, alerting and debugging high volume production systems</li>\n</ul>\n<ul>\n<li>Fluent in analyses of data sets such as logs</li>\n</ul>\n<ul>\n<li>Strong English language oral and written communications skills</li>\n</ul>\n<ul>\n<li>Designing and building APIs</li>\n</ul>\n<ul>\n<li>Experience 
with the Cloudflare development stack is a plus</li>\n</ul>\n<p>Examples of desirable skills, knowledge and experience</p>\n<ul>\n<li>At least 4 years of hands-on software development experience on meaningfully complex systems.</li>\n</ul>\n<ul>\n<li>Experience with graph theory and building services for graph generation, storage and retrieval.</li>\n</ul>\n<ul>\n<li>An understanding of the systems architecture required to scale machine learning model-driven decision engines in a production environment</li>\n</ul>\n<ul>\n<li>Experience building both backend systems and frontend widgets.</li>\n</ul>\n<ul>\n<li>Ability to contribute to planning, development, and execution to meet commitments and deliver with predictability.</li>\n</ul>\n<ul>\n<li>Experience implementing tools, processes, internal instrumentation, and methodologies.</li>\n</ul>\n<ul>\n<li>Comfortable working on projects with tight deadlines and short release cycles.</li>\n</ul>\n<ul>\n<li>Strong verbal and written English language skills.</li>\n</ul>\n<ul>\n<li>Experience with DCIM, CMDB, IPAM, and other Data Center and Asset Lifecycle Management tools is a plus.</li>\n</ul>\n<ul>\n<li>Experience with data ingestion and analysis - pulling metrics from hundreds of edge data centers.</li>\n</ul>\n<p>Compensation</p>\n<p>For Washington D.C. based hires: Estimated annual salary of $140,000 - 172,000.</p>\n<p>Equity</p>\n<p>This role is eligible to participate in Cloudflare&#39;s equity plan.</p>\n<p>Benefits</p>\n<p>Cloudflare offers a complete package of benefits and programs to support you and your family. 
Our benefits programs can help you pay health care expenses, support caregiving, build capital for the future and make life a little easier and fun!</p>\n<p>The below is a description of our benefits for employees in the United States, and benefits may vary for employees based outside the U.S.</p>\n<p>Health &amp; Welfare Benefits</p>\n<ul>\n<li>Medical/Rx Insurance</li>\n</ul>\n<ul>\n<li>Dental Insurance</li>\n</ul>\n<ul>\n<li>Vision Insurance</li>\n</ul>\n<ul>\n<li>Flexible Spending Accounts</li>\n</ul>\n<ul>\n<li>Commuter Spending Accounts</li>\n</ul>\n<ul>\n<li>Fertility &amp; Family Forming Benefits</li>\n</ul>\n<ul>\n<li>On-demand mental health support and Employee Assistance Program</li>\n</ul>\n<ul>\n<li>Global Travel Medical Insurance</li>\n</ul>\n<p>Financial Benefits</p>\n<ul>\n<li>Short and Long Term Disability Insurance</li>\n</ul>\n<ul>\n<li>Life &amp; Accident Insurance</li>\n</ul>\n<ul>\n<li>401(k) Retirement Savings Plan</li>\n</ul>\n<ul>\n<li>Employee Stock Participation Plan</li>\n</ul>\n<p>Time Off</p>\n<ul>\n<li>Flexible paid time off covering vacation and sick leave</li>\n</ul>\n<ul>\n<li>Leave programs, including parental, pregnancy health, medical, and bereavement leave</li>\n</ul>\n<p>What Makes Cloudflare Special?</p>\n<p>We&#39;re not just a highly ambitious, large-scale technology company. We&#39;re a highly ambitious, large-scale technology company with a soul. 
Fundamental to our mission to help build a better Internet is protecting the free and open Internet.</p>\n<p>Project Galileo: Since 2014, we&#39;ve equipped more than 2,400 journalism and civil society organizations in 111 countries with powerful tools to defend themselves against attacks that would otherwise censor their work, technology already used by Cloudflare&#39;s enterprise customers--at no cost.</p>\n<p>Athenian Project: In 2017, we created the Athenian Project to ensure that state and local governments have the highest level of protection and reliability for free, so that their constituents have access to election information and voter registration. Since the project, we&#39;ve provided services to more than 425 local government election websites in 33 states.</p>\n<p>1.1.1.1: We released 1.1.1.1 to help fix the foundation of the Internet by building a faster, more secure and privacy-centric public DNS resolver. This is available publicly for everyone to use - it is the first consumer-focused service Cloudflare has ever built.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_061e824c-343","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Cloudflare","sameAs":"https://www.cloudflare.com/","logo":"https://logos.yubhub.co/cloudflare.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/cloudflare/jobs/7088208","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Programming experience in Go, or similar languages","Experience in designing and implementing secure and highly-available distributed systems","Experience (and love) for debugging to ensure the system works in all cases","Experience with a continuous integration workflow and using source control (we use git)","Experience with continuous delivery and deployment of a k8s hosted 
application","Understanding of security issues and responsibilities","Experience with monitoring, alerting and debugging high volume production systems","Fluent in analyses of data sets such as logs","Strong English language oral and written communications skills","Designing and building APIs","Experience with the Cloudflare development stack is a plus"],"x-skills-preferred":["At least 4 years of hands-on software development experience on meaningfully complex systems","Experience with graph theory and building services for graph generation, storage and retrieval","An understanding of the systems architecture required to scale machine learning model-driven decision engines in a production environment","Experience building both backend systems and frontend widgets","Ability to contribute to planning, development, and execution to meet commitments and deliver with predictability","Experience implementing tools, processes, internal instrumentation, and methodologies","Comfortable working on projects with tight deadlines and short release cycles","Strong verbal and written English language skills","Experience with DCIM, CMDB, IPAM, and other Data Center and Asset Lifecycle Management tools is a plus","Experience with data ingestion and analysis - pulling metrics from hundreds of edge data centers"],"datePosted":"2026-04-18T15:45:07.400Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Hybrid"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Programming experience in Go, or similar languages, Experience in designing and implementing secure and highly-available distributed systems, Experience (and love) for debugging to ensure the system works in all cases, Experience with a continuous integration workflow and using source control (we use git), Experience with continuous delivery and deployment of a k8s hosted application, Understanding of security issues and responsibilities, 
Experience with monitoring, alerting and debugging high volume production systems, Fluent in analyses of data sets such as logs, Strong English language oral and written communications skills, Designing and building APIs, Experience with the Cloudflare development stack is a plus, At least 4 years of hands-on software development experience on meaningfully complex systems, Experience with graph theory and building services for graph generation, storage and retrieval, An understanding of the systems architecture required to scale machine learning model-driven decision engines in a production environment, Experience building both backend systems and frontend widgets, Ability to contribute to planning, development, and execution to meet commitments and deliver with predictability, Experience implementing tools, processes, internal instrumentation, and methodologies, Comfortable working on projects with tight deadlines and short release cycles, Strong verbal and written English language skills, Experience with DCIM, CMDB, IPAM, and other Data Center and Asset Lifecycle Management tools is a plus, Experience with data ingestion and analysis - pulling metrics from hundreds of edge data centers"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_b8b4f70b-624"},"title":"Software Engineer, Network Performance & Reliability","description":"<p>About Us At Cloudflare, we&#39;re on a mission to help build a better Internet. We protect and accelerate any Internet application online without adding hardware, installing software, or changing a line of code.</p>\n<p>As a member of the Argo team, you&#39;ll be a key technical contributor to the cutting-edge network software infrastructure used by Cloudflare&#39;s products. 
You will work closely with various Engineering teams to translate their requirements into new capabilities on the platform.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Participate in all stages of the software development lifecycle, from designing and documenting systems, to writing code and automated tests, to planning, managing, and monitoring production software deployments.</li>\n<li>Work with a wide range of technologies and programming languages, including Rust, Go, Linux networking, ClickHouse, PostgreSQL, Grafana, Kubernetes, and more.</li>\n<li>Use AI-powered tools and systems as part of your daily workflow to analyze and extend codebases, introspect production systems and datasets, and accelerate problem-solving.</li>\n</ul>\n<p>Must-Have Skills</p>\n<ul>\n<li>Systems-level programming experience in Go, Rust, C, or C++.</li>\n<li>A solid grasp of networking protocols in Layers 3 and 4 of the OSI Model.</li>\n<li>Knowledge of HTTP, TLS, and CDN networks.</li>\n<li>Experience in implementing secure and highly-available distributed systems.</li>\n<li>Strong ability to debug issues in complex systems.</li>\n<li>Strong collaboration and communication skills.</li>\n</ul>\n<p>Bonus Points</p>\n<ul>\n<li>Knowledge of TCP/IP and Internet routing.</li>\n<li>Professional systems-level programming experience in Rust.</li>\n<li>Working knowledge of statistical-analysis techniques and control theory.</li>\n<li>Experience building tools and APIs.</li>\n<li>Experience with monitoring, alerting, and debugging large-scale distributed systems</li>\n<li>Experience participating in an on-call rotation.</li>\n<li>Experience using AI-assisted development tools (e.g., code completion, codebase analysis, log/data exploration) in a professional setting.</li>\n</ul>\n<p>What Makes Cloudflare Special?</p>\n<ul>\n<li>We&#39;re a highly ambitious, large-scale technology company with a soul.</li>\n<li>We&#39;re committed to protecting the free and open Internet.</li>\n<li>We&#39;ve 
equipped more than 2,400 journalism and civil society organizations in 111 countries with powerful tools to defend themselves against attacks that would otherwise censor their work.</li>\n<li>We&#39;ve provided services to more than 425 local government election websites in 33 states.</li>\n<li>We&#39;ve released 1.1.1.1 to help fix the foundation of the Internet by building a faster, more secure and privacy-centric public DNS resolver.</li>\n</ul>\n<p>Sound like something you&#39;d like to be a part of? We&#39;d love to hear from you!</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_b8b4f70b-624","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Cloudflare","sameAs":"https://www.cloudflare.com/","logo":"https://logos.yubhub.co/cloudflare.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/cloudflare/jobs/7446310","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Systems-level programming experience in Go, Rust, C, or C++","A solid grasp of networking protocols in Layers 3 and 4 of the OSI Model","Knowledge of HTTP, TLS, and CDN networks","Experience in implementing secure and highly-available distributed systems","Strong ability to debug issues in complex systems"],"x-skills-preferred":["Knowledge of TCP/IP and Internet routing","Professional systems-level programming experience in Rust","Working knowledge of statistical-analysis techniques and control theory","Experience building tools and APIs","Experience with monitoring, alerting, and debugging large-scale distributed systems"],"datePosted":"2026-04-18T15:44:56.994Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Hybrid"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Systems-level programming 
experience in Go, Rust, C, or C++, A solid grasp of networking protocols in Layers 3 and 4 of the OSI Model, Knowledge of HTTP, TLS, and CDN networks, Experience in implementing secure and highly-available distributed systems, Strong ability to debug issues in complex systems, Knowledge of TCP/IP and Internet routing, Professional systems-level programming experience in Rust, Working knowledge of statistical-analysis techniques and control theory, Experience building tools and APIs, Experience with monitoring, alerting, and debugging large-scale distributed systems"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_bd9625d9-99b"},"title":"ML Infrastructure Engineer, Safeguards","description":"<p>We are seeking a Machine Learning Infrastructure Engineer to join our Safeguards organization, where you&#39;ll build and scale the critical infrastructure that powers our AI safety systems.</p>\n<p>As part of the Safeguards team, you&#39;ll design and implement ML infrastructure that powers Claude safety. 
Your work will directly contribute to making AI systems more trustworthy and aligned with human values, ensuring our models operate safely as they become more capable.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and build scalable ML infrastructure to support real-time and batch classifier and safety evaluations across our model ecosystem</li>\n<li>Build monitoring and observability tools to track model performance, data quality, and system health for safety-critical applications</li>\n<li>Collaborate with research teams to productionize safety research, translating experimental safety techniques into robust, scalable systems</li>\n<li>Optimize inference latency and throughput for real-time safety evaluations while maintaining high reliability standards</li>\n<li>Implement automated testing, deployment, and rollback systems for ML models in production safety applications</li>\n<li>Partner with Safeguards, Security, and Alignment teams to understand requirements and deliver infrastructure that meets safety and production needs</li>\n<li>Contribute to the development of internal tools and frameworks that accelerate safety research and deployment</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 5+ years of experience building production ML infrastructure, ideally in safety-critical domains like fraud detection, content moderation, or risk assessment</li>\n<li>Are proficient in Python and have experience with ML frameworks like PyTorch, TensorFlow, or JAX</li>\n<li>Have hands-on experience with cloud platforms (AWS, GCP) and container orchestration (Kubernetes)</li>\n<li>Understand distributed systems principles and have built systems that handle high-throughput, low-latency workloads</li>\n<li>Have experience with data engineering tools and building robust data pipelines (e.g., Spark, Airflow, streaming systems)</li>\n<li>Are results-oriented, with a bias towards reliability and impact in safety-critical systems</li>\n<li>Enjoy collaborating with 
researchers and translating cutting-edge research into production systems</li>\n<li>Care deeply about AI safety and the societal impacts of your work</li>\n</ul>\n<p>Strong candidates may have experience with:</p>\n<ul>\n<li>Working with large language models and modern transformer architectures</li>\n<li>Implementing A/B testing frameworks and experimentation infrastructure for ML systems</li>\n<li>Developing monitoring and alerting systems for ML model performance and data drift</li>\n<li>Building automated labeling systems and human-in-the-loop workflows</li>\n<li>Experience in trust &amp; safety, fraud prevention, or content moderation domains</li>\n<li>Knowledge of privacy-preserving ML techniques and compliance requirements</li>\n<li>Contributing to open-source ML infrastructure projects</li>\n</ul>\n<p>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_bd9625d9-99b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4778843008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$405,000 USD","x-skills-required":["Python","PyTorch","TensorFlow","JAX","Cloud platforms (AWS, GCP)","Container orchestration (Kubernetes)","Distributed systems principles","Data engineering tools (Spark, Airflow, streaming systems)"],"x-skills-preferred":["Large language models and modern transformer architectures","A/B testing frameworks and experimentation infrastructure for ML systems","Monitoring and alerting systems for ML model performance and data 
drift","Automated labeling systems and human-in-the-loop workflows","Trust & safety, fraud prevention, or content moderation domains","Privacy-preserving ML techniques and compliance requirements","Open-source ML infrastructure projects"],"datePosted":"2026-04-18T15:44:06.907Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, PyTorch, TensorFlow, JAX, Cloud platforms (AWS, GCP), Container orchestration (Kubernetes), Distributed systems principles, Data engineering tools (Spark, Airflow, streaming systems), Large language models and modern transformer architectures, A/B testing frameworks and experimentation infrastructure for ML systems, Monitoring and alerting systems for ML model performance and data drift, Automated labeling systems and human-in-the-loop workflows, Trust & safety, fraud prevention, or content moderation domains, Privacy-preserving ML techniques and compliance requirements, Open-source ML infrastructure projects","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_0a80aec8-c25"},"title":"Senior Software Engineer, Payments","description":"<p>We are looking for a self-motivated Senior Software Engineer to join our Payments team. As a member of this team, you will be responsible for designing, implementing, and maintaining systems and tools that support flow-level observability, payments reliability, and scalability.</p>\n<p>Your primary focus will be on building and managing large-scale platforms to improve the availability of our Payments platform for internal and external stakeholders. 
You will collaborate closely with other Payments engineering teams and Infra teams to ensure services are instrumented, scalable, and resilient to support our growing business.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Designing, implementing, and maintaining systems and tools at a platform level that support flow-level observability, payments reliability, and scalability.</li>\n<li>Identifying and driving improvements to increase the Payments Availability, Observability, and Resiliency of Airbnb Payments.</li>\n<li>Developing observability standards/framework for new product readiness to ensure service reliability in SOA and distributed systems.</li>\n<li>Building domain expertise to achieve scalability by understanding the nuances of Payments across processing, compliance, and infra.</li>\n<li>Driving large-scale migration and adoption projects on Observability &amp; Reliability by cross-collaborating with various Payments teams.</li>\n<li>Leading initiatives that promote a culture of reliability throughout the organization by improving incident management platforms and instrumentation.</li>\n</ul>\n<p>Requirements:</p>\n<ul>\n<li>7+ years of experience in back-end software development focusing on large-scale distributed systems.</li>\n<li>BE/B.Tech in Computer Science or a related technical field.</li>\n<li>Strong software development skills in one or more languages such as Java, Python, Kotlin, Scala, or Ruby on Rails.</li>\n<li>Experience in building intelligent AI agents and systems powered by Large Language Models is a plus.</li>\n<li>Evidence of exposure to architectural patterns of a large, high-scale web application (e.g., well-designed APIs, high-volume data pipelines, efficient algorithms).</li>\n<li>Familiarity with cloud platforms like AWS or Google Cloud Platform.</li>\n<li>Deep understanding of software development best practices, including version control, automated testing, CI/CD, and code reviews.</li>\n<li>Experience in incident 
management, monitoring, alerting, and root cause analysis.</li>\n<li>Effective leadership and communication skills to coordinate cross-functional teams during large-scale projects.</li>\n<li>Experience with initiatives across auto-scaling, self-healing mechanisms, chaos engineering, performance optimization techniques will be a plus.</li>\n<li>Previous experience in AI/ML will also be a plus.</li>\n</ul>\n<p>If you are a strong problem solver and have worked in a team that is on-call for production systems before, we encourage you to apply.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_0a80aec8-c25","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Airbnb","sameAs":"https://www.airbnb.com/","logo":"https://logos.yubhub.co/airbnb.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/airbnb/jobs/7613550","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Java","Python","Kotlin","Scala","Ruby on Rails","Cloud platforms","Software development best practices","Incident management","Monitoring","Alerting","Root cause analysis"],"x-skills-preferred":["AI/ML","Auto-scaling","Self-healing mechanisms","Chaos engineering","Performance optimization techniques"],"datePosted":"2026-04-18T15:43:32.370Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Bangalore, India"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Java, Python, Kotlin, Scala, Ruby on Rails, Cloud platforms, Software development best practices, Incident management, Monitoring, Alerting, Root cause analysis, AI/ML, Auto-scaling, Self-healing mechanisms, Chaos engineering, Performance optimization 
techniques"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_068d5a1f-5ca"},"title":"Software Engineer","description":"<p>Join the team as Twilio&#39;s next Software Engineer.</p>\n<p>This position is needed to add to our Voice Connectivity Trust team to enable Twilio to better support our customers using Voice in their solutions.</p>\n<p>As a Software Engineer on this team, you will participate in all phases of the software development life cycle, including requirements gathering with Product Managers, technical design, estimations, sprint planning, coding, testing, deployments, and on-call support.</p>\n<p>In this role, you&#39;ll:</p>\n<ul>\n<li>Design and implement real-time services with high throughput and low latency requirements, verify, deploy, and operationalize them</li>\n</ul>\n<ul>\n<li>Work closely with stakeholders to understand customer needs and devise and deliver simple, robust, and scalable solutions</li>\n</ul>\n<ul>\n<li>Be comfortable expressing thoughts and ideas as detailed prose and use it as an effective means to collaborate with leads, architects, and cross-functional teams</li>\n</ul>\n<ul>\n<li>Embrace the challenge of scaling a complex distributed platform with points of presence globally, each one concerned with high availability, high reliability, high throughput, low latency, and media fidelity</li>\n</ul>\n<ul>\n<li>Figure out novel ways of solving customer problems for the Voice channel</li>\n</ul>\n<p>Twilio values diverse experiences from all kinds of industries, and we encourage everyone who meets the required qualifications to apply.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_068d5a1f-5ca","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Twilio","sameAs":"https://www.twilio.com/","logo":"https://logos.yubhub.co/twilio.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/twilio/jobs/7747550","x-work-arrangement":"remote","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Java","RESTful services","API design","event-driven architectures","Kafka","SQS","CI/CD pipelines","cloud infrastructures","AWS","GCP","OpenStack","Azure","excellent written communication skills","strong Java fundamentals","architect","review","debug code","proven ability to critically evaluate AI-generated code","demonstrated proficiency working with AI coding assistants"],"x-skills-preferred":["on-call rotations","incident response","monitoring/alerting tools","Prometheus","Datadog","Grafana","experience scaling data tiers","SQL/NoSQL database and caching technologies","horizontally-scalable","resilient","performing-under-load systems","SIP protocol","Stir/Shaken protocol"],"datePosted":"2026-04-18T15:43:25.354Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote - Ireland"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Java, RESTful services, API design, event-driven architectures, Kafka, SQS, CI/CD pipelines, cloud infrastructures, AWS, GCP, OpenStack, Azure, excellent written communication skills, strong Java fundamentals, architect, review, debug code, proven ability to critically evaluate AI-generated code, demonstrated proficiency working with AI coding assistants, on-call rotations, incident response, monitoring/alerting tools, Prometheus, Datadog, Grafana, experience scaling data tiers, SQL/NoSQL database and caching technologies, horizontally-scalable, resilient, performing-under-load 
systems, SIP protocol, Stir/Shaken protocol"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_6ef6b4ad-b8a"},"title":"Technical Support Analyst","description":"<p>We are looking for a Technical Support Analyst to join our Global Customer Success organization. As a Technical Support Analyst, you will be responsible for building and maintaining real-time monitoring systems, responding to critical incidents, and working alongside backend teams to diagnose and resolve issues fast.</p>\n<p>This is a role for someone who genuinely loves solving technical puzzles and takes pride in being the first line of defense for our clients. You will be expected to adapt, grow, and bring fresh ideas to improve how we operate.</p>\n<p>Your contribution will be:</p>\n<ul>\n<li>Implement and maintain a robust real-time monitoring system that ensures full visibility into critical workflows before incidents escalate</li>\n<li>Provide top support to clients, acting as the main line of defense to address issues, answer queries, and escalate critical incidents when necessary</li>\n<li>Assist backend teams with scripting, bug reproduction, log analysis, and basic API testing</li>\n<li>Create and standardize operational processes that enable scalability and consistent service quality</li>\n<li>Analyze recurring issues and propose data-driven improvements to position the NOC as a strategic function</li>\n<li>Ensure continuous operational coverage with well-structured shift handovers</li>\n<li>Participate in some development activities to build hands-on backend knowledge</li>\n<li>Identify gaps in current tools and workflows and bring solutions to the table</li>\n</ul>\n<p>Skills You Need:</p>\n<ul>\n<li>Fluent English, Spanish &amp; Portuguese (written and verbal)</li>\n<li>5+ years of experience in technical support, NOC operations, or a similar role</li>\n<li>Basic knowledge of monitoring tools and alerting 
systems</li>\n<li>Coding experience: scripting, debugging, or log analysis</li>\n<li>Familiarity with APIs and ability to assist users with integration or connectivity issues</li>\n<li>Strong analytical and problem-solving mindset</li>\n<li>Customer empathy and a service-oriented approach</li>\n<li>Comfort working in fast-paced, high-stakes environments</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_6ef6b4ad-b8a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Yuno","sameAs":"https://www.yuno.com/","logo":"https://logos.yubhub.co/yuno.com.png"},"x-apply-url":"https://jobs.lever.co/yuno/8dbc0acd-1b1a-4ff7-b7c6-0423ba024b03","x-work-arrangement":"remote","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Fluent English, Spanish & Portuguese (written and verbal)","5+ years of experience in technical support, NOC operations, or a similar role","Basic knowledge of monitoring tools and alerting systems","Coding experience: scripting, debugging, or log analysis","Familiarity with APIs and ability to assist users with integration or connectivity issues"],"x-skills-preferred":[],"datePosted":"2026-04-17T13:12:31.542Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"São Paulo"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Fluent English, Spanish & Portuguese (written and verbal), 5+ years of experience in technical support, NOC operations, or a similar role, Basic knowledge of monitoring tools and alerting systems, Coding experience: scripting, debugging, or log analysis, Familiarity with APIs and ability to assist users with integration or connectivity 
issues"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_e308ff1b-d8b"},"title":"Software Engineer, DevOps, Research Platform","description":"<p>About Mistral AI</p>\n<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.</p>\n<p>We are a team passionate about AI and its potential to transform society. Our diverse workforce thrives in competitive environments and is committed to driving innovation.</p>\n<p>Role Summary</p>\n<p>We are seeking a talented and experienced software engineer to join our Research Platform team. You&#39;ll work closely with our R&amp;D team to build a cloud-agnostic platform that improves the stability, scalability and velocity across the research department.</p>\n<p>Responsibilities</p>\n<p>As a DevOps/Platform Engineer, your responsibilities will include:</p>\n<ul>\n<li>Designing and implementing complex systems (e.g. 
scaling our research CI with a strong focus on reliability, reproducibility and speed)</li>\n<li>Building a flexible yet solid and accessible development environment for researchers, so they can focus on their core mission.</li>\n<li>Designing, implementing and advocating for solutions addressing large amounts of data and maintainable data pipelines.</li>\n<li>Optimizing a variety of builds: container images, compilation times of large libraries, Python environments...</li>\n<li>Building strong relationships with researchers, understanding their workflow and enabling them to achieve more by leveraging your expertise.</li>\n<li>Communicating and producing documentation or any content that will help them make the most of the tools and systems you&#39;ll build.</li>\n<li>Being part of the team that &quot;platformizes&quot; research and constantly improves the daily experience for researchers while avoiding future roadblocks.</li>\n</ul>\n<p>About You</p>\n<ul>\n<li>5+ years of successful experience in a similar DX / DevOps / SRE role.</li>\n<li>Proficiency in software development (Python, Go...) and programming best practices.</li>\n<li>Exposure to site reliability engineering: root cause analysis, in-production troubleshooting, on-call rotations...</li>\n<li>Exposure to infrastructure management: CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability...</li>\n<li>Technical product mindset (e.g. 
understanding how to debug poor adoption).</li>\n<li>Excellent problem-solving and communication skills (ability to contextualize, gauge risks and get buy-in for high-stakes, impactful solutions).</li>\n<li>Ownership, high agency and a constant drive to learn and improve things for others.</li>\n<li>Autonomous, self-driven and able to work well in a fast-paced startup environment.</li>\n<li>Low ego and team spirit mindset.</li>\n</ul>\n<p>Your Application Will Be All The More Interesting If You Also Have:</p>\n<ul>\n<li>First-hand Bazel (or equivalent) experience.</li>\n<li>Strong knowledge of Python&#39;s ecosystem.</li>\n<li>Familiarity with GPU-based workloads and ecosystems.</li>\n<li>Experience of full remote environments (you&#39;re comfortable with having some of your users on the other side of the globe).</li>\n</ul>\n<p>Hiring Process</p>\n<ul>\n<li>Intro Call - 30 min</li>\n<li>Tech Culture Interview - 30 min</li>\n<li>Technical Rounds - 2 x 45 min</li>\n<li>Culture-fit Discussion - 30 min</li>\n<li>Reference Calls</li>\n</ul>\n<p>By Applying, You Agree To Our Applicant Privacy Policy.</p>\n<p>Additional Information</p>\n<p>Location &amp; Remote</p>\n<p>This role is primarily based at one of our European offices (Paris, France and London, UK). We will prioritize candidates who either reside there or are open to relocating. We strongly believe in the value of in-person collaboration to foster strong relationships and seamless communication within our team. In certain specific situations, we will also consider remote candidates based in one of the countries listed in this job posting, currently France &amp; UK. 
In that case, we ask all new hires to visit our local office:</p>\n<ul>\n<li>for the first week of their onboarding (accommodation and travelling covered)</li>\n<li>then at least 3 days per month</li>\n</ul>\n<p>What We Offer</p>\n<ul>\n<li>Competitive salary and equity</li>\n<li>Health insurance</li>\n<li>Transportation allowance</li>\n<li>Sport allowance</li>\n<li>Meal vouchers</li>\n<li>Private pension plan</li>\n<li>Parental: Generous parental leave policy</li>\n<li>Visa sponsorship</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_e308ff1b-d8b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/18be2b70-c05d-48e4-82ac-e5cb462c96c0","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["software development","python","go","site reliability engineering","infrastructure management","CI/CD","containerization","orchestration","infra-as-code","monitoring","logging","alerting","observability"],"x-skills-preferred":["bazel","python's ecosystem","gpu based workloads","full remote environments"],"datePosted":"2026-04-17T12:48:20.869Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software development, python, go, site reliability engineering, infrastructure management, CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability, bazel, python's ecosystem, gpu based workloads, full remote 
environments"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a2e88648-d1d"},"title":"Mistral Cloud - Site Reliability Engineer","description":"<p>We are seeking highly experienced Site Reliability Engineers (SRE) to shape the reliability, scalability and performance of our Cloud platform and customer facing applications.</p>\n<p>You will work closely with our software engineers and product teams to ensure our systems meet and exceed our internal and external customers&#39; expectations.</p>\n<p>Key responsibilities include:</p>\n<ul>\n<li>Design, build, and maintain scalable, highly available and fault-tolerant infrastructures</li>\n<li>Operate systems and troubleshoot issues in production environments</li>\n<li>Implement and improve monitoring, alerting, and incident response systems</li>\n<li>Implement and maintain workflows and tools for both our customer-facing APIs and large training runs</li>\n</ul>\n<p>Development responsibilities include:</p>\n<ul>\n<li>Drive continuous improvement in infrastructure automation, deployment, and orchestration</li>\n<li>Collaborate with software engineers to develop and implement solutions that enable safe and reproducible model-training experiments</li>\n<li>Help build a cloud platform offering an abstraction layer between science, engineering and infrastructure</li>\n<li>Design and develop new workflows and tooling to improve the reliability, availability and performance of our systems</li>\n</ul>\n<p>Additional responsibilities include:</p>\n<ul>\n<li>Collaborate with the security team to ensure infrastructure adheres to best security practices and compliance requirements</li>\n<li>Document processes and procedures to ensure consistency and knowledge sharing across the team</li>\n<li>Contribute to open-source projects, research publications, blog articles and conferences</li>\n</ul>\n<p>About you:</p>\n<ul>\n<li>Master’s degree in Computer Science, 
Engineering or a related field</li>\n<li>5+ years of experience in a DevOps/SRE role</li>\n<li>Strong experience with bare metal infrastructure and highly available distributed systems</li>\n<li>Exposure to site reliability issues in critical environments</li>\n<li>Experience working against reliability KPIs</li>\n<li>Hands-on experience with CI/CD, containerization and orchestration tools</li>\n<li>Knowledge of monitoring, logging, alerting and observability tools</li>\n<li>Familiarity with infrastructure-as-code tools</li>\n<li>Proficiency in scripting languages and knowledge of software development best practices</li>\n<li>Strong understanding of networking, security, and system administration concepts</li>\n<li>Excellent problem-solving and communication skills</li>\n</ul>\n<p>Your application will be all the more interesting if you also have:</p>\n<ul>\n<li>Experience in an AI/ML environment</li>\n<li>Experience of high-performance computing (HPC) systems and workload managers</li>\n<li>Worked with modern AI-oriented solutions</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a2e88648-d1d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/f76907fd-428a-4824-a1cf-8013974fde29","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["bare metal infrastructure","highly available distributed systems","CI/CD","containerization","orchestration tools","monitoring","logging","alerting","observability tools","infrastructure-as-code tools","scripting languages","software development best practices","networking","security","system administration"],"x-skills-preferred":["AI/ML environment","high-performance computing 
(HPC) systems","workload managers","modern AI-oriented solutions"],"datePosted":"2026-04-17T12:47:48.920Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"bare metal infrastructure, highly available distributed systems, CI/CD, containerization, orchestration tools, monitoring, logging, alerting, observability tools, infrastructure-as-code tools, scripting languages, software development best practices, networking, security, system administration, AI/ML environment, high-performance computing (HPC) systems, workload managers, modern AI-oriented solutions"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_a632e52b-c63"},"title":"Site Reliability Engineer","description":"<p>About Mistral AI</p>\n<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.</p>\n<p>We are a dynamic team passionate about AI and its potential to transform society. Our diverse workforce thrives in competitive environments and is committed to driving innovation.</p>\n<p>Role Summary</p>\n<p>We are seeking highly experienced Site Reliability Engineers (SRE) to shape the reliability, scalability and performance of our platform and customer facing applications. 
You will work closely with our software engineers and research teams to ensure our systems meet and exceed our internal and external customers&#39; expectations.</p>\n<p>Responsibilities</p>\n<p>As a Site Reliability Engineer, you balance the day-to-day operations on production systems with long-term software engineering improvements to reduce operational toil and foster the reliability, availability, and performance of these systems.</p>\n<p>Operations</p>\n<p>• Design, build, and maintain scalable, highly available and fault-tolerant infrastructures to support our web services and ML workloads</p>\n<p>• Make sure our platform, inference and model training environments are always highly available and enable seamless replication of work environments across several HPC clusters</p>\n<p>• Operate systems and troubleshoot issues in production environments (interrupts, on-call responses, users admin, data extraction, infrastructure scaling, etc.)</p>\n<p>• Implement and improve monitoring, alerting, and incident response systems to ensure optimal system performance and minimize downtime</p>\n<p>• Implement and maintain workflows and tools (CI/CD, containerization, orchestration, monitoring, logging and alerting systems) for both our client-facing APIs and large training runs</p>\n<p>• Participate occasionally in on-call rotations to respond to incidents and perform root cause analysis to prevent future occurrences</p>\n<p>Development</p>\n<p>• Drive continuous improvement in infrastructure automation, deployment, and orchestration using tools like Kubernetes, Flux, Terraform</p>\n<p>• Collaborate with AI/ML researchers to develop and implement solutions that enable safe and reproducible model-training experiments</p>\n<p>• Build a cloud-agnostic platform offering an abstraction layer between science and infrastructure</p>\n<p>• Design and develop new workflows and tooling to improve the reliability, availability and performance of our systems (automation scripts, 
refactoring, new API-based features, web apps, dashboards, etc.)</p>\n<p>• Collaborate with the security team to ensure infrastructure adheres to best security practices and compliance requirements</p>\n<p>• Document processes and procedures to ensure consistency and knowledge sharing across the team</p>\n<p>• Contribute to open-source projects, research publications, blog articles and conferences</p>\n<p>About You</p>\n<p>• Master’s degree in Computer Science, Engineering or a related field</p>\n<p>• 7+ years of experience in a DevOps/SRE role</p>\n<p>• Strong experience with cloud computing and highly available distributed systems</p>\n<p>• Exposure to site reliability issues in critical environments (issue root cause analysis, in-production troubleshooting, on-call rotations...)</p>\n<p>• Experience working against reliability KPIs (observability, alerting, SLAs)</p>\n<p>• Hands-on experience with CI/CD, containerization and orchestration tools (Docker, Kubernetes...)</p>\n<p>• Knowledge of monitoring, logging, alerting and observability tools (Prometheus, Grafana, ELK Stack, Datadog...)</p>\n<p>• Familiarity with infrastructure-as-code tools like Terraform or CloudFormation</p>\n<p>• Proficiency in scripting languages (Python, Go, Bash...) 
and knowledge of software development best practices</p>\n<p>• Strong understanding of networking, security, and system administration concepts</p>\n<p>• Excellent problem-solving and communication skills</p>\n<p>• Self-motivated and able to work well in a fast-paced startup environment</p>\n<p>Your Application Will Be All The More Interesting If You Also Have:</p>\n<p>• Experience in an AI/ML environment</p>\n<p>• Experience of high-performance computing (HPC) systems and workload managers (Slurm)</p>\n<p>• Worked with modern AI-oriented solutions (Fluidstack, Coreweave, Vast...)</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_a632e52b-c63","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/6e16e4fa-a60b-4270-a815-06b0450fb597","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["cloud computing","highly available distributed systems","DevOps","SRE","Kubernetes","Flux","Terraform","CI/CD","containerization","orchestration","monitoring","logging","alerting","observability","infrastructure-as-code","scripting languages","software development best practices","networking","security","system administration"],"x-skills-preferred":["AI/ML environment","high-performance computing (HPC) systems","workload managers","modern AI-oriented solutions"],"datePosted":"2026-04-17T12:47:37.519Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud computing, highly available distributed systems, DevOps, SRE, Kubernetes, Flux, Terraform, CI/CD, 
containerization, orchestration, monitoring, logging, alerting, observability, infrastructure-as-code, scripting languages, software development best practices, networking, security, system administration, AI/ML environment, high-performance computing (HPC) systems, workload managers, modern AI-oriented solutions"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_7f9a476c-84f"},"title":"Cybersecurity Engineer, SIEM","description":"<p>About Mistral AI</p>\n<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.</p>\n<p>We are a global company with teams distributed between France, USA, UK, Germany, and Singapore. We are looking for a Security Platform Engineer to architect and maintain the infrastructure ensuring the observability of our production systems.</p>\n<p>Role Summary</p>\n<p>Mistral is looking for a Security Platform Engineer to own the set-up, lifecycle, availability, and performance of the SIEM solution, ensuring 99.9% uptime for log ingestion and query availability. 
The successful candidate will design and maintain high-throughput data pipelines to collect, buffer, and transport logs from distributed systems to the SIEM.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Own the set-up, lifecycle, availability, and performance of the SIEM solution, ensuring 99.9% uptime for log ingestion and query availability.</li>\n<li>Design and maintain high-throughput data pipelines to collect, buffer, and transport logs from distributed systems to the SIEM.</li>\n<li>Implement parsing logic and schema standardization to ensure unstructured logs are searchable and actionable for analysts.</li>\n<li>Manage alert rules, connectors, and dashboard configurations, avoiding manual console configuration (&#39;ClickOps&#39;).</li>\n<li>Analyze ingestion patterns to identify noisy, low-value data. Implement filtering and aggregation at the source to maximize signal-to-noise ratio.</li>\n<li>Architect data tiers to balance query performance with compliance retention requirements and cloud costs.</li>\n</ul>\n<p>About You</p>\n<ul>\n<li>5+ years of experience in Site Reliability Engineering (SRE), Data Engineering, or Security Engineering with a focus on logging infrastructure.</li>\n<li>Deep understanding of log management challenges at scale (indexing strategies, sharding, partitioning, throughput tuning).</li>\n<li>Strong experience deploying and monitoring stateful workloads on Kubernetes and Cloud providers (Azure/GCP) and On-Prem.</li>\n<li>Ability to write production-grade Python or Go for automation and custom log exporters.</li>\n<li>Experience managing monitoring, alerting, and on-call rotations for critical infrastructure.</li>\n</ul>\n<p>Hiring Process</p>\n<ul>\n<li>Introduction call - 30 min</li>\n<li>Hiring Manager interview - 30 min</li>\n<li>Technical Rounds I - 45 min</li>\n<li>Technical Rounds II - 60 min</li>\n<li>Culture-fit discussion - 30 min</li>\n<li>References</li>\n</ul>\n<p>By applying, you agree to our Applicant Privacy 
Policy.</p>\n<p><strong>Additional Information</strong></p>\n<p>Location &amp; Remote</p>\n<p>The position is based in our Paris HQ offices and we encourage going to the office as much as we can (at least 3 days per week) to create bonds and smooth communication. Our remote policy aims to provide flexibility, improve work-life balance and increase productivity. Each manager can decide the number of days worked remotely based on autonomy and a specific context (e.g. more flexibility can occur during summer). In any case, employees are expected to maintain regular communication with their teams and be available during core working hours.</p>\n<p>What we offer</p>\n<p>💰 Competitive salary and equity package 🧑‍⚕️ Health insurance 🚴 Transportation allowance 🥎 Sport allowance 🥕 Meal vouchers 💰 Private pension plan 🍼 Generous parental leave policy</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_7f9a476c-84f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/6f7f6e7a-3dc4-430b-8957-a64450a10066","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Log management","SIEM","Kubernetes","Cloud providers","Python","Go","Monitoring","Alerting","On-call rotations"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:47:08.705Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Log management, SIEM, Kubernetes, Cloud providers, Python, Go, Monitoring, Alerting, On-call 
rotations"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_45cbaf4a-254"},"title":"Software Engineer, Frontend","description":"<p>About Mistral AI</p>\n<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.</p>\n<p>We are a global company with teams distributed between France, USA, UK, Germany, and Singapore. We are a team of developers, designers, and researchers passionate about AI and its potential to transform society.</p>\n<p>Role Summary</p>\n<p>We are seeking a passionate and skilled Senior Frontend Engineer to join our growing team. In this role, you will have the unique opportunity to work on our complete range of products, contributing to its development and enhancement. Your work will directly impact the user experience, making it more engaging, efficient, and intuitive.</p>\n<p>Responsibilities</p>\n<p>• Full Stack Development: Design, develop, and maintain scalable and robust features, ensuring seamless integration between front-end and back-end systems using a modern stack.</p>\n<p>• User-Centric Design: Prioritize user experience and ensure that our products meet the needs and expectations of our user base.</p>\n<p>• Code Quality: Write clean, maintainable, and well-documented code, and participate in code reviews to uphold our high standards of quality.</p>\n<p>• Collaboration: Work closely with cross-functional teams, including product managers, designers, and other engineers, to deliver high-quality software solutions.</p>\n<p>• Problem-Solving: Tackle complex technical challenges and develop elegant, efficient solutions that improve performance and reliability.</p>\n<p>• Innovation: Stay up-to-date with the latest technologies and trends in AI and software development, and apply them to enhance our products.</p>\n<p>About You</p>\n<p>• 
Proficient in TypeScript and NodeJS with a preference for experience in B2C contexts.</p>\n<p>• Experience in a Front-end framework like React, NextJS, Remix, VueJS.</p>\n<p>• Knowledge of web libraries like Tanstack React Query, tRPC, Framer Motion, …</p>\n<p>• Comfortable shipping products end-to-end.</p>\n<p>• Strong problem-solving abilities and attention to detail.</p>\n<p>• Excellent communication.</p>\n<p>• Low Ego and team spirit mindset.</p>\n<p>• Autonomous and self-starter.</p>\n<p>Now it would be ideal if you have experience with:</p>\n<p>• AI Products, particularly with LLMs.</p>\n<p>• Python.</p>\n<p>• Distributed Systems.</p>\n<p>• Monitoring/Alerting.</p>\n<p>• UX development (Figma).</p>\n<p>Hiring Process</p>\n<p>• Introduction call - 45 min.</p>\n<p>• Hiring Manager interview - 30 min.</p>\n<p>• Technical Rounds - Live-coding Interview (TypeScript) - 45 min - System Design Interview - 45 min - Optional: Deep Dive Interview - 60 min.</p>\n<p>• Culture-fit discussion - 30 min.</p>\n<p>• References.</p>\n<p>Additional Information</p>\n<p>Location &amp; Remote</p>\n<p>This role is based in one of our European offices (Paris, France and London, UK). We will only consider candidates who either reside or are open to relocating there. 
We strongly believe in the value of in-person collaboration and we encourage going to the office as much as we can (at least 3 days per week) to create bonds and smooth communication.</p>\n<p>Our remote policy aims to provide flexibility, improve work-life balance and increase productivity.</p>\n<p>What we offer</p>\n<p>• Competitive salary and equity (stock-options).</p>\n<p>• Health insurance.</p>\n<p>• Transportation allowance.</p>\n<p>• Sport allowance.</p>\n<p>• Meal vouchers.</p>\n<p>• Private pension plan.</p>\n<p>• Generous parental leave policy.</p>\n<p>• Visa sponsorship.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_45cbaf4a-254","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai","logo":"https://logos.yubhub.co/mistral.ai.png"},"x-apply-url":"https://jobs.lever.co/mistral/305432ef-27ac-4012-a893-a662813ac6e9","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["TypeScript","NodeJS","React","NextJS","Remix","VueJS","Tanstack React Query","tRPC","Framer Motion"],"x-skills-preferred":["AI Products","Python","Distributed Systems","Monitoring/Alerting","UX development (Figma)"],"datePosted":"2026-04-17T12:46:32.859Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"TypeScript, NodeJS, React, NextJS, Remix, VueJS, Tanstack React Query, tRPC, Framer Motion, AI Products, Python, Distributed Systems, Monitoring/Alerting, UX development (Figma)"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3b11932f-d81"},"title":"Senior Software Engineer - Banking Integration 
Platform","description":"<p>When the Space Shuttle approached the International Space Station, two vehicles built by different teams, in different countries, with fundamentally different engineering philosophies and systems, had to connect perfectly. The Rendezvous, Proximity Operations, and Docking (RPOD) subsystems were engineered to handle complex mismatches such as different power systems, communication protocols, and technical architectures. Get it wrong, and you have an expensive and potentially catastrophic problem in low Earth orbit.</p>\n<p>Mercury is building a bank and will be connecting our modern, product-focused engineering systems to enterprise core banking systems and payment networks built in a different era, with different assumptions and different interfaces. Our Banking Integration Platform as a Service team is like NASA’s RPOD team, building our integration subsystems that are technically correct and operationally trustworthy.</p>\n<p>This is some of the most consequential infrastructure work at Mercury. Every account opening, every monetary transaction, and every balance call will flow through the systems you build. Product teams across the company will depend on clean abstractions that hide the complexity underneath. 
You&#39;ll be one of the few engineers at Mercury who truly understands the full depth of our Bank Core* and all its internal and external integrations.</p>\n<p>In this role, you will:</p>\n<ul>\n<li>Build Mercury’s integration with an FFIEC-approved bank core and the connections to payment networks.</li>\n<li>Design internal APIs that give product teams simple, consistent interfaces to complex external systems.</li>\n<li>Handle the messy realities of enterprise integrations such as retries, failures, format mismatches, and downtime.</li>\n<li>Build data pipelines that keep Mercury&#39;s systems in sync with our bank core.</li>\n<li>Own monitoring, alerting, and recovery for our most critical external connections.</li>\n<li>Partner with many other teams at Mercury to define clean boundaries and reliable contracts.</li>\n<li>Help shape the technical architecture of Mercury Bank*.</li>\n</ul>\n<p>You should:</p>\n<ul>\n<li>Have direct experience with either a bank core that has achieved FFIEC-compliance (such as FIS) or that of a US-based Global Systemically Important Bank (G-SIB).</li>\n<li>Understand how core banking systems work: accounts, transactions, ledgers, and the data models underneath.</li>\n<li>Be a product-minded engineer who thinks about the developers consuming your APIs, not just the systems you’re connecting to.</li>\n<li>Thrive in environments where you&#39;re building something new rather than maintaining something established.</li>\n<li>Be comfortable with our tech stack (Haskell and TypeScript) or ready to learn.</li>\n<li>Have strong opinions about building reliable, maintainable systems.</li>\n</ul>\n<p>The total rewards package at Mercury includes base salary, equity, and benefits.</p>\n<p>Our salary and equity ranges are highly competitive within the SaaS and fintech industry and are updated regularly using the most reliable compensation survey data for our industry. 
New hire offers are made based on a candidate’s experience, expertise, geographic location, and internal pay equity relative to peers.</p>\n<p>Our target new hire base salary ranges for this role are the following:</p>\n<ul>\n<li>US employees (any location): $166,600 - $250,900</li>\n<li>Canadian employees (any location): CAD 157,400 - 237,100</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3b11932f-d81","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mercury","sameAs":"https://www.mercury.com/","logo":"https://logos.yubhub.co/mercury.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/mercury/jobs/5791111004","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$166,600 - $250,900 (US employees), CAD 157,400 - 237,100 (Canadian employees)","x-skills-required":["bank core","FFIEC-compliance","Haskell","TypeScript","API design","data pipelines","monitoring","alerting","recovery"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:46:21.374Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA, New York, NY, Portland, OR, or Remote within Canada or United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Finance","skills":"bank core, FFIEC-compliance, Haskell, TypeScript, API design, data pipelines, monitoring, alerting, recovery","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":166600,"maxValue":250900,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_62efca6f-b6f"},"title":"Senior AI Engineer","description":"<p>We&#39;re looking for a Senior AI Engineer who is obsessed with building AI systems 
that actually work in production: reliable, observable, cost-efficient, and genuinely useful. This is not a research role. You will ship AI-powered features that process real financial data for real businesses.</p>\n<p>LLM &amp; AI Pipeline Engineering - Design, build, and maintain production-grade LLM integration pipelines, including retrieval-augmented generation (RAG), prompt engineering, output parsing, and chain orchestration.</p>\n<p>Develop and operate AI features within Jeeves&#39;s core financial products: spend categorization, document extraction, anomaly detection, financial Q&amp;A, and automated reconciliation.</p>\n<p>Implement structured output validation, fallback handling, and confidence scoring to ensure AI decisions meet reliability standards for financial use cases.</p>\n<p>Evaluate and integrate AI frameworks and tools (LangChain, LlamaIndex, OpenAI API, Anthropic API, HuggingFace, vector databases) and advocate for the right tool for the job.</p>\n<p>Establish prompt versioning and evaluation practices to ensure AI outputs remain accurate and consistent as models and data evolve.</p>\n<p>Retrieval &amp; Vector Search - Design and maintain vector search pipelines using databases such as Pinecone, Weaviate, or pgvector to power semantic search and RAG-based features.</p>\n<p>Build document ingestion and chunking pipelines for Jeeves&#39;s financial data, processing invoices, receipts, policy documents, and transaction records.</p>\n<p>Optimize retrieval quality through embedding model selection, chunk strategy, metadata filtering, and re-ranking techniques.</p>\n<p>ML Model Serving &amp; Operations - Collaborate with data scientists to take trained ML models from experimental notebooks to production serving infrastructure.</p>\n<p>Build and maintain model serving endpoints with appropriate latency SLOs, input validation, and output monitoring.</p>\n<p>Implement model performance monitoring and data drift detection to ensure production models 
remain accurate over time.</p>\n<p>Support model retraining workflows by designing clean data pipelines and feature engineering that can be continuously updated.</p>\n<p>Backend Integration &amp; Reliability - Integrate AI services cleanly with Jeeves&#39;s backend microservices, designing clear API contracts, circuit breakers, and graceful degradation patterns.</p>\n<p>Write high-quality, testable backend code in Python or Go/Node.js to power AI-integrated features.</p>\n<p>Instrument AI components with structured logging, distributed tracing, latency dashboards, and alerting to ensure operational visibility.</p>\n<p>Collaboration &amp; Growth - Partner with Product, Backend Engineering, and Data Science to define the AI roadmap and translate requirements into reliable systems.</p>\n<p>Contribute to a culture of quality by writing design docs, reviewing peers&#39; AI system designs, and sharing learnings openly.</p>\n<p>Help grow the AI engineering practice at Jeeves by establishing patterns, tooling, and best practices that the broader team can build on.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_62efca6f-b6f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Jeeves","sameAs":"https://www.jeeves.com/","logo":"https://logos.yubhub.co/jeeves.com.png"},"x-apply-url":"https://jobs.lever.co/tryjeeves/ded9e04e-f18e-4d4c-ae43-4b7882c6200b","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["LLM","AI","Python","LangChain","LlamaIndex","OpenAI API","Anthropic API","HuggingFace","vector databases","Pinecone","Weaviate","pgvector","semantic search","RAG-based features","document ingestion","chunking pipelines","embedding model selection","chunk strategy","metadata filtering","re-ranking techniques","model serving infrastructure","latency 
SLOs","input validation","output monitoring","model performance monitoring","data drift detection","clean data pipelines","feature engineering","API contracts","circuit breakers","graceful degradation patterns","structured logging","distributed tracing","latency dashboards","alerting"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:39:23.341Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"India"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Finance","skills":"LLM, AI, Python, LangChain, LlamaIndex, OpenAI API, Anthropic API, HuggingFace, vector databases, Pinecone, Weaviate, pgvector, semantic search, RAG-based features, document ingestion, chunking pipelines, embedding model selection, chunk strategy, metadata filtering, re-ranking techniques, model serving infrastructure, latency SLOs, input validation, output monitoring, model performance monitoring, data drift detection, clean data pipelines, feature engineering, API contracts, circuit breakers, graceful degradation patterns, structured logging, distributed tracing, latency dashboards, alerting"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_e2350d04-53f"},"title":"Senior AI Engineer","description":"<p>We&#39;re looking for a Senior AI Engineer who is obsessed with building AI systems that actually work in production: reliable, observable, cost-efficient, and genuinely useful. This is not a research role. 
You will ship AI-powered features that process real financial data for real businesses.</p>\n<p>LLM &amp; AI Pipeline Engineering - Design, build, and maintain production-grade LLM integration pipelines, including retrieval-augmented generation (RAG), prompt engineering, output parsing, and chain orchestration.</p>\n<p>Develop and operate AI features within Jeeves&#39;s core financial products: spend categorization, document extraction, anomaly detection, financial Q&amp;A, and automated reconciliation.</p>\n<p>Implement structured output validation, fallback handling, and confidence scoring to ensure AI decisions meet reliability standards for financial use cases.</p>\n<p>Evaluate and integrate AI frameworks and tools (LangChain, LlamaIndex, OpenAI API, Anthropic API, HuggingFace, vector databases) and advocate for the right tool for the job.</p>\n<p>Establish prompt versioning and evaluation practices to ensure AI outputs remain accurate and consistent as models and data evolve.</p>\n<p>Retrieval &amp; Vector Search - Design and maintain vector search pipelines using databases such as Pinecone, Weaviate, or pgvector to power semantic search and RAG-based features.</p>\n<p>Build document ingestion and chunking pipelines for Jeeves&#39;s financial data, processing invoices, receipts, policy documents, and transaction records.</p>\n<p>Optimize retrieval quality through embedding model selection, chunk strategy, metadata filtering, and re-ranking techniques.</p>\n<p>ML Model Serving &amp; Operations - Collaborate with data scientists to take trained ML models from experimental notebooks to production serving infrastructure.</p>\n<p>Build and maintain model serving endpoints with appropriate latency SLOs, input validation, and output monitoring.</p>\n<p>Implement model performance monitoring and data drift detection to ensure production models remain accurate over time.</p>\n<p>Support model retraining workflows by designing clean data pipelines and feature 
engineering that can be continuously updated.</p>\n<p>Backend Integration &amp; Reliability - Integrate AI services cleanly with Jeeves&#39;s backend microservices, designing clear API contracts, circuit breakers, and graceful degradation patterns.</p>\n<p>Write high-quality, testable backend code in Python or Go/Node.js to power AI-integrated features.</p>\n<p>Instrument AI components with structured logging, distributed tracing, latency dashboards, and alerting to ensure operational visibility.</p>\n<p>Collaboration &amp; Growth - Partner with Product, Backend Engineering, and Data Science to define the AI roadmap and translate requirements into reliable systems.</p>\n<p>Contribute to a culture of quality by writing design docs, reviewing peers&#39; AI system designs, and sharing learnings openly.</p>\n<p>Help grow the AI engineering practice at Jeeves by establishing patterns, tooling, and best practices that the broader team can build on.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_e2350d04-53f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Jeeves","sameAs":"https://www.jeeves.com/","logo":"https://logos.yubhub.co/jeeves.com.png"},"x-apply-url":"https://jobs.lever.co/tryjeeves/66241934-7138-4d7d-8b05-a211ec5d6e24","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["LLM","AI","Python","LangChain","LlamaIndex","OpenAI API","Anthropic API","HuggingFace","vector databases","Pinecone","Weaviate","pgvector","PostgreSQL","async patterns","cloud infrastructure","AWS","GCP","Azure","structured logging","distributed tracing","latency 
dashboards","alerting"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:38:54.694Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Colombia"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Finance","skills":"LLM, AI, Python, LangChain, LlamaIndex, OpenAI API, Anthropic API, HuggingFace, vector databases, Pinecone, Weaviate, pgvector, PostgreSQL, async patterns, cloud infrastructure, AWS, GCP, Azure, structured logging, distributed tracing, latency dashboards, alerting"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_d477874c-cf5"},"title":"Senior AI Engineer","description":"<p>We&#39;re looking for a Senior AI Engineer who is obsessed with building AI systems that actually work in production: reliable, observable, cost-efficient, and genuinely useful. This is not a research role. You will ship AI-powered features that process real financial data for real businesses.</p>\n<p>LLM &amp; AI Pipeline Engineering - Design, build, and maintain production-grade LLM integration pipelines, including retrieval-augmented generation (RAG), prompt engineering, output parsing, and chain orchestration.</p>\n<p>Develop and operate AI features within Jeeves&#39;s core financial products: spend categorization, document extraction, anomaly detection, financial Q&amp;A, and automated reconciliation.</p>\n<p>Implement structured output validation, fallback handling, and confidence scoring to ensure AI decisions meet reliability standards for financial use cases.</p>\n<p>Evaluate and integrate AI frameworks and tools (LangChain, LlamaIndex, OpenAI API, Anthropic API, HuggingFace, vector databases) and advocate for the right tool for the job.</p>\n<p>Establish prompt versioning and evaluation practices to ensure AI outputs remain accurate and consistent as models and data evolve.</p>\n<p>Retrieval &amp; Vector 
Search - Design and maintain vector search pipelines using databases such as Pinecone, Weaviate, or pgvector to power semantic search and RAG-based features.</p>\n<p>Build document ingestion and chunking pipelines for Jeeves&#39;s financial data, processing invoices, receipts, policy documents, and transaction records.</p>\n<p>Optimize retrieval quality through embedding model selection, chunk strategy, metadata filtering, and re-ranking techniques.</p>\n<p>ML Model Serving &amp; Operations - Collaborate with data scientists to take trained ML models from experimental notebooks to production serving infrastructure.</p>\n<p>Build and maintain model serving endpoints with appropriate latency SLOs, input validation, and output monitoring.</p>\n<p>Implement model performance monitoring and data drift detection to ensure production models remain accurate over time.</p>\n<p>Support model retraining workflows by designing clean data pipelines and feature engineering that can be continuously updated.</p>\n<p>Backend Integration &amp; Reliability - Integrate AI services cleanly with Jeeves&#39;s backend microservices, designing clear API contracts, circuit breakers, and graceful degradation patterns.</p>\n<p>Write high-quality, testable backend code in Python or Go/Node.js to power AI-integrated features.</p>\n<p>Instrument AI components with structured logging, distributed tracing, latency dashboards, and alerting to ensure operational visibility.</p>\n<p>Collaboration &amp; Growth - Partner with Product, Backend Engineering, and Data Science to define the AI roadmap and translate requirements into reliable systems.</p>\n<p>Contribute to a culture of quality by writing design docs, reviewing peers&#39; AI system designs, and sharing learnings openly.</p>\n<p>Help grow the AI engineering practice at Jeeves by establishing patterns, tooling, and best practices that the broader team can build on.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping 
automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_d477874c-cf5","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Jeeves","sameAs":"https://www.jeeves.com/","logo":"https://logos.yubhub.co/jeeves.com.png"},"x-apply-url":"https://jobs.lever.co/tryjeeves/639e39d0-b357-4bc2-aff2-968cdedb14b6","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["LLM","AI","Python","Go","Node.js","Pinecone","Weaviate","pgvector","LangChain","LlamaIndex","OpenAI API","Anthropic API","HuggingFace","vector databases","API contracts","circuit breakers","graceful degradation patterns","structured logging","distributed tracing","latency dashboards","alerting"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:38:44.910Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Argentina"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Finance","skills":"LLM, AI, Python, Go, Node.js, Pinecone, Weaviate, pgvector, LangChain, LlamaIndex, OpenAI API, Anthropic API, HuggingFace, vector databases, API contracts, circuit breakers, graceful degradation patterns, structured logging, distributed tracing, latency dashboards, alerting"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_26bff84c-def"},"title":"Senior/Staff Platform Engineer/SRE","description":"<p>About the Role\nWe are seeking a Senior Platform Engineer who will design, develop, and deploy robust platform solutions to ensure the reliability, scalability, and security of our system.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Identify and build AI-powered capabilities into Flow&#39;s platform, from intelligent automation in building operations to personalized resident experiences.</li>\n<li>Use AI-assisted development tools (e.g., Cursor, 
Claude Code) as part of your daily workflow to accelerate development, improve code quality, and push the boundaries of what a small team can ship.</li>\n<li>Collaborate with product and engineering teams to define clear requirements and translate them into software solutions.</li>\n<li>Core contributor to implementing foundational infrastructure, tooling and automation that is scalable, reliable, and secure.</li>\n<li>Elevate site reliability engineering best practices while collaborating with back-end developers.</li>\n<li>Develop service-level tooling to enhance productionization, data migrations, system hardening, and related initiatives.</li>\n<li>Manage and optimize a multi-region environment.</li>\n<li>Be available for on-call activities for infrastructure and services.</li>\n</ul>\n<p>Ideal Background</p>\n<ul>\n<li>A minimum 10 years in software engineering, site reliability engineering, or platform engineering.</li>\n<li>Fluency with AI-assisted development tools and a strong point of view on how AI changes the way software gets built.</li>\n<li>Ability to design, implement and maintain the tools and systems that support service reliability, monitoring, and alerting.</li>\n<li>Deep understanding of the principles of ensuring high availability, fault tolerance, and efficiency in distributed systems.</li>\n<li>Experience with Infrastructure as Code (IaC): Proficiency with Terraform.</li>\n<li>Experience with Kubernetes.</li>\n<li>Experience administering cloud-based infrastructure (GCP preferred).</li>\n<li>Experience troubleshooting production issues related to cloud infrastructure, configuration, monitoring, deployments, continuous integration and delivery.</li>\n<li>A keen ability to balance elegant design with pragmatic tradeoffs, prioritizing continuous delivery of business value.</li>\n<li>Ability to quickly learn and adapt to new skillsets.</li>\n<li>Experience building software in fast-moving startup environments.</li>\n<li>Participate in incident 
response and post-mortems to identify and address systemic issues.</li>\n</ul>\n<p>Additional Information\nBenefits</p>\n<ul>\n<li>Comprehensive Benefits Package (Medical / Dental / Vision / Disability / Life)</li>\n<li>Paid time off and 13 paid holidays</li>\n<li>401(k) retirement plan</li>\n<li>Healthcare and Dependent Care Flexible Spending Accounts (FSAs)</li>\n<li>Access to HSA-compatible plans</li>\n<li>Pre-tax commuter benefits</li>\n<li>Employee Assistance Program (EAP), free therapy through SpringHealth, acupuncture, and other wellness offerings</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_26bff84c-def","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Flow","sameAs":"https://flow.com","logo":"https://logos.yubhub.co/flow.com.png"},"x-apply-url":"https://jobs.lever.co/flowlife/3ae47b09-e4b4-41be-9312-fafb1d85cf4d","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$180,000-275,000 per year","x-skills-required":["AI-assisted development tools","Terraform","Kubernetes","Cloud-based infrastructure administration","Site reliability engineering","Monitoring and alerting","Service-level tooling","Multi-region environment management"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:34:32.862Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Palo Alto"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"AI-assisted development tools, Terraform, Kubernetes, Cloud-based infrastructure administration, Site reliability engineering, Monitoring and alerting, Service-level tooling, Multi-region environment 
management","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":180000,"maxValue":275000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_1dc3f3a4-ced"},"title":"Senior Integrations Engineer","description":"<p>At Eve, we&#39;re redefining what&#39;s possible in legal technology. Our mission is to empower plaintiff law firms with AI-driven solutions that elevate how they operate, serve clients, and grow.</p>\n<p>We believe the future of law will be built by &#39;AI-Native Law Firms&#39;, firms that are managed, scaled, and optimized by intelligent systems rather than manual processes and endless administrative work. Eve&#39;s technology augments the capabilities of attorneys across every stage of a case, from intake and document review to strategy and settlement, so they can focus on what truly matters: achieving the best outcomes for their clients.</p>\n<p><strong>Why Join Eve:</strong></p>\n<p>Product-market fit: Eve is used by 550+ law firms, and we&#39;re growing fast. Backed by top investors: We&#39;ve raised over $160M from world-class partners including Spark Capital, Andreessen Horowitz (A16z), Menlo Ventures, and Lightspeed. Built by a world-class team: Engineers, designers, and operators from places like Scale, Meta, Airbnb, Cruise, Square, Rubrik, and Lyft are building Eve from the ground up. AI-Native from day one: We&#39;re on the bleeding edge of AI, collaborating directly with teams at OpenAI and Anthropic to build best-in-class AI workflows tailored for legal work. Explosive growth: We are growing 2X revenue Quarter over Quarter. Our revenue model is consumption-based (per legal matter), which means our success is directly tied to how deeply customers adopt the product. That starts with integration. 
Customers with connected CMS integrations retain and expand at significantly higher rates, and nearly every account that has expanded started with a completed integration.</p>\n<p><strong>Why This Role:</strong></p>\n<p>You&#39;ll own the end-to-end integration experience for Eve&#39;s customers, from contract signed to integration live and usable. Eve integrates with multiple major case management platforms, each with its own API patterns, authentication models, deployment quirks, and edge cases. Some are cloud-native with clean REST APIs. Others are self-hosted on-prem systems behind firewalls that require weeks of configuration. Your job is to figure out the right integration path for each customer and drive it to completion. This is not a project management role, but it&#39;s not a pure engineering role either. You&#39;ll split your time between hands-on technical work (configuration, validation, debugging, scripting) and customer-facing delivery (implementation planning, expectation setting, vendor coordination, internal handoffs).</p>\n<p><strong>What You Will Accomplish:</strong></p>\n<p>Own integration delivery for customers from post-sale through go-live: scoping the integration path, validating configuration (auth setup, tenant and environment validation, allowlisting, callback/redirect setup, granular permissions, document store configuration), testing in sandbox and live environments, and driving to completion Scope integration approaches for new CMS partners, prototype solutions, and deliver technical requirements to Engineering for productization Manage strategic vendor relationships: pressuring vendors for API access, proving whether issues are on their side or ours, identifying unsupported or undocumented API behavior, and negotiating workarounds when standard paths don’t exist Debug and resolve production integration issues by tracing through logs, isolating the layer (our side, vendor side, or customer config), and resolving or routing to 
Engineering with a clear diagnosis Build the assets required to scale integration delivery: implementation playbooks, vendor-specific setup guides, troubleshooting matrices, customer readiness checklists, known limitation documentation, escalation criteria, and handoff notes for the CS team Identify patterns across customer integrations and feed productization requirements back to Engineering, without duplicating the platform Engineering owns</p>\n<p><strong>What We Are Looking For:</strong></p>\n<p>3-5+ years building, configuring, and debugging production API integrations (REST APIs, pagination, rate limiting, retry logic) Experience owning implementation delivery end-to-end, not just building integrations but driving them to completion with customers Proficiency in Python or Node.js (Python preferred) for scripting, data transformation, and API orchestration Hands-on experience with OAuth 2.0, JWT, token refresh cycles, and permission/scoping troubleshooting Comfortable working with webhooks and debugging common failure modes (expiry, missed events, configuration drift) Experience mapping data across systems that represent the same concepts differently Strong communication skills for customer calls, vendor negotiations, and internal coordination Interest in legal technology and how law firms operate</p>\n<p><strong>You’ll Thrive in This Role If You Have:</strong></p>\n<p>Direct experience with legal tech platforms (Clio, Filevine, SmartAdvocate, Litify) iPaaS or integration platform experience (Workato, Celigo, Mulesoft, Tray.io) A track record of creating implementation documentation, runbooks, or process guides that other teams use SQL familiarity and ability to reason about how integration data flows into a data warehouse On-prem and hybrid deployment experience with self-hosted systems, firewall rules, and VPNs Multi-tenant integration management across hundreds of customer accounts Experience building observability and alerting for integrations at 
scale</p>\n<p><strong>Additional Information:</strong></p>\n<p>Benefits: Competitive Salary &amp; Equity 401(k) Program with Employer Matching Health, Dental, Vision and Life Insurance Short Term and Long Term Disability Autonomous Work Environment Office Setup Reimbursement Flexible Time Off (FTO) + Holidays Quarterly Team Gatherings</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_1dc3f3a4-ced","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Eve","sameAs":"https://eve.com","logo":"https://logos.yubhub.co/eve.com.png"},"x-apply-url":"https://jobs.lever.co/Eve/8254c2d0-b201-4eba-9e87-4b353fce2e8d","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["API integrations","Python","Node.js","OAuth 2.0","JWT","token refresh cycles","permission/scoping troubleshooting","webhooks","debugging common failure modes","data transformation","API orchestration"],"x-skills-preferred":["legal tech platforms","iPaaS or integration platform","implementation documentation","runbooks","process guides","SQL","on-prem and hybrid deployment","multi-tenant integration management","observability and alerting"],"datePosted":"2026-04-17T12:30:05.903Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"US"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"API integrations, Python, Node.js, OAuth 2.0, JWT, token refresh cycles, permission/scoping troubleshooting, webhooks, debugging common failure modes, data transformation, API orchestration, legal tech platforms, iPaaS or integration platform, implementation documentation, runbooks, process guides, SQL, on-prem and hybrid deployment, multi-tenant integration management, observability and 
alerting"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_8af6c2b6-03c"},"title":"Member of Technical Staff, Domain (Backend Engineer)","description":"<p>At Anchorage Digital, we are building the world’s most advanced digital asset platform for institutions to participate in crypto. As a Member of Technical Staff on the Domain Engineering team, you are responsible for ensuring a robust technology stack, enabling our company to build scalable, efficient, and maintainable products. This allows our product teams to focus on developing customer-focused features.</p>\n<p>You are a strong individual contributor and have the ability to significantly contribute to and execute complex engineering projects, enabled with appropriate coding and testing. You can understand the “why” in order to connect dependencies to the “bigger picture” and Anchorage mission and product roadmap.</p>\n<p><strong>Technical Skills</strong></p>\n<ul>\n<li>Collaborate with other engineering teams to identify areas for improvements across our engineering stack.</li>\n<li>Previous experience in establishing shared libraries across teams, with a focus on standardization, code quality, and reduced duplication.</li>\n<li>Proven experience with application observability projects that involved setting up performance metrics, log aggregation, tracing, and alerting systems.</li>\n</ul>\n<p><strong>Complexity and Impact of Work</strong></p>\n<ul>\n<li>Find the right balance between progress (i.e. shipping quickly) and perfection (i.e. 
measuring twice).</li>\n<li>Foster an efficient deterministic testing culture, with an emphasis on minimizing tech debt and bureaucracy.</li>\n<li>Ship code that will impact the whole organization.</li>\n</ul>\n<p><strong>Organizational Knowledge</strong></p>\n<ul>\n<li>Collaborate across multiple teams, especially on integration, standardization, and shared resources.</li>\n<li>Influence others by engaging in in-depth technical design discussions and demonstrating best practices through technical leadership by example.</li>\n<li>Make a meaningful impact across the entire engineering organization, extending influence beyond the immediate team.</li>\n</ul>\n<p><strong>Communication and Influence</strong></p>\n<ul>\n<li>Communicate technical concepts and solutions effectively to non-technical stakeholders.</li>\n<li>Build strong relationships with colleagues to drive collaboration and innovation.</li>\n</ul>\n<p><strong>You may be a fit for this role if you:</strong></p>\n<ul>\n<li>Are passionate about constantly seeking opportunities to refine and enhance existing systems and processes.</li>\n<li>Are driven by a passion for being a force multiplier and influential technical leader in a dynamic, fast-paced startup environment.</li>\n<li>Have expert coding skills in Golang.</li>\n<li>Are experienced in cross-functional projects, collaborating effectively with your team and adjacent teams to tackle complex challenges.</li>\n<li>Have excellent soft skills, including the ability to adapt communication for both internal and external stakeholders in an effective manner, bridging gaps with empathy and proactive communication.</li>\n</ul>\n<p><strong>Although not a requirement, bonus points if:</strong></p>\n<ul>\n<li>You have experience with infrastructure-as-code, Terraform, Gitops, Helm.</li>\n<li>You have experience with Google Cloud Platform &amp; Security.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_8af6c2b6-03c","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anchorage Digital","sameAs":"https://anchorage.com","logo":"https://logos.yubhub.co/anchorage.com.png"},"x-apply-url":"https://jobs.lever.co/anchorage/5898d01d-a4a5-44e5-8d20-2f6710dc2035","x-work-arrangement":"remote","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Golang","Application Observability","Performance Metrics","Log Aggregation","Tracing","Alerting Systems"],"x-skills-preferred":["Infrastructure-as-code","Terraform","Gitops","Helm","Google Cloud Platform & Security"],"datePosted":"2026-04-17T12:24:58.203Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Finance","skills":"Golang, Application Observability, Performance Metrics, Log Aggregation, Tracing, Alerting Systems, Infrastructure-as-code, Terraform, Gitops, Helm, Google Cloud Platform & Security"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_37049070-1d7"},"title":"Software Engineer, Compute Infrastructure","description":"<p>About Mistral AI\nAt Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity.</p>\n<p>Our technology is designed to integrate seamlessly into daily working life. We democratize AI through high-performance, optimized, open-source and cutting-edge models, products and solutions. Our comprehensive AI platform is designed to meet enterprise needs, whether on-premises or in cloud environments.</p>\n<p>We are a team passionate about AI and its potential to transform society. Our diverse workforce thrives in competitive environments and is committed to driving innovation. 
Our teams are distributed between France, USA, UK, Germany and Singapore.</p>\n<p>Role Summary\nWe are building one of Europe&#39;s largest AI infrastructure offerings that will provide our customers a private and integrated stack in every form factor they may need — from bare-metal servers to fully-managed PaaS.</p>\n<p>You will join a fast-growing team to help build, scale and automate our computing management stack. You will be responsible for building fault-tolerant and reliable infrastructure to support both our internal processes and customer platform.</p>\n<p>Location: France and UK as primary locations. Remote in Europe can be considered under conditions.</p>\n<p>Key Responsibilities:\n• Design, build, and operate a scalable Kubernetes-based platform to host large-scale AI and HPC workloads, ensuring high performance, reliability, and security.\n• Own the full lifecycle of cluster management, from bootstrapping and provisioning to global operations, by integrating and developing the necessary software components—including automation, monitoring, and orchestration tools.\n• Drive infrastructure innovation by designing workflows, tooling (scripts, APIs, dashboards), and CI/CD pipelines to optimize system reliability, availability, and observability.\n• Champion a zero-trust security model, strengthening IAM, networking (VPC), and access controls to safeguard the platform.\n• Develop user-centric features that simplify operations for both sysadmins and end customers, reducing friction in daily workflows.\n• Lead incident resolution with rigorous root-cause analysis to prevent recurrence and improve system resilience.</p>\n<p>About you\n• Strong proficiency in software development (preferably Golang) and knowledge of software development best practices\n• Successful experience in an Infrastructure Engineering role (SWE, Platform, DevOps, Cloud...)\n• Deep understanding of Kubernetes internals and hands-on experience with containerization and orchestration tools 
(Docker, Kubernetes, Openstack...)\n• Familiarity with infrastructure-as-code tools like Terraform or CloudFormation\n• Knowledge of monitoring, logging, alerting and observability tools (Prometheus, Grafana, ELK, Datadog...)\n• Exposure to highly available distributed systems and site reliability issues in critical environments (issue root cause analysis, in-production troubleshooting, on-call rotations...)\n• Experience working against reliability KPIs (observability, alerting, SLAs)\n• Excellent problem-solving and communication skills\n• Self-motivation and ability to thrive in a fast-paced startup environment</p>\n<p>Now, it would be ideal if you also had:\n• Experience with HPC workload managers (Slurm) and distributed storage systems (Lustre, Ceph)\n• Demonstrated history of contributing to open-source projects (e.g., code, documentation, bug fixes, feature development, or community support).</p>\n<p>Additional Information\nLocation &amp; Remote\nThis role is primarily based in one of our European offices — Paris, France and London, UK. We will prioritize candidates who either reside there or are open to relocating. 
We strongly believe in the value of in-person collaboration to foster strong relationships and seamless communication within our team.</p>\n<p>In certain specific situations, we will also consider remote candidates based in one of the countries listed in this job posting — currently France, UK, Germany, Belgium, Netherlands, Spain and Italy.</p>\n<p>In any case, we ask all new hires to visit our Paris HQ office:\n• for the first week of their onboarding (accommodation and travelling covered)\n• then at least 2 days per month</p>\n<p>What we offer\nCompetitive salary and equity\nHealth insurance\nTransportation allowance\nSport allowance\nMeal vouchers\nPrivate pension plan\nGenerous parental leave policy\nVisa sponsorship</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_37049070-1d7","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai"},"x-apply-url":"https://jobs.lever.co/mistral/d60f6c60-ad5e-4753-af8a-56365b7db8b8","x-work-arrangement":"remote","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["software development","Golang","Kubernetes","containerization","orchestration","infrastructure-as-code","Terraform","CloudFormation","monitoring","logging","alerting","observability","Prometheus","Grafana","ELK","Datadog"],"x-skills-preferred":["HPC workload managers","distributed storage systems","open-source projects"],"datePosted":"2026-03-10T11:35:56.693Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software development, Golang, Kubernetes, containerization, orchestration, infrastructure-as-code, Terraform, CloudFormation, monitoring, logging, alerting, 
observability, Prometheus, Grafana, ELK, Datadog, HPC workload managers, distributed storage systems, open-source projects"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_ee8452c7-48c"},"title":"Software Engineer, Mistral Vibe","description":"<p>About Mistral Vibe</p>\n<p>We are seeking passionate and talented Fullstack Engineers to join our growing Mistral Vibe team and help shape the future of AI-powered coding. With the recent launch of Devstral 2 (our next-gen coding model) and Mistral Vibe (our open-source CLI for end-to-end code automation), this role offers a unique chance to build and refine our AI-driven developer tools. Your work will directly influence how users interact with our products — making them faster, smarter, and more intuitive.</p>\n<p>Responsibilities</p>\n<p>• Full Stack Development: Design, develop, and maintain scalable and robust features, ensuring seamless integration between front-end and back-end systems\n• User-Centric Design: Prioritize user experience and ensure that our products meet the needs and expectations of our user base\n• Code Quality: Write clean, maintainable, and well-documented code, and participate in code reviews to uphold our high quality standards\n• Collaboration: Work closely with cross-functional teams, including product managers, designers, and other engineers, to deliver high-quality software solutions\n• Problem-Solving: Tackle complex technical challenges and develop elegant, efficient solutions that improve performance and reliability\n• Innovation: Stay up-to-date with the latest technologies and trends in AI and software development, and apply them to enhance our products</p>\n<p>About You</p>\n<p>• Degree in Computer Science, Software Engineering, or equivalent practical experience\n• Proficiency in Python\n• Comfortable shipping products end-to-end\n• Strong problem-solving abilities and attention to detail\n• Excellent 
communication\n• Low Ego and team spirit mindset\n• Autonomous and self-starter</p>\n<p>Now it would be ideal if you have experience with:</p>\n<p>• AI Products, particularly with LLMs\n• Distributed Systems\n• Monitoring/Alerting\n• JavaScript/TypeScript, UX development (Figma)</p>\n<p>Hiring Process</p>\n<p>• Introduction call - 30 min\n• Technical rounds: Live-coding interview - 45 min; System Design interview - 45 min; Project Deep Dive Interview (only for leads &amp; staff engineers) - 60 min\n• Hiring Manager interview - 30 min\n• Culture-fit discussion - 30 min\n• Reference checks</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_ee8452c7-48c","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai"},"x-apply-url":"https://jobs.lever.co/mistral/108dd647-59bd-4077-a840-2a0036404f58","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Python","AI Products","Distributed Systems","Monitoring/Alerting","JavaScript/TypeScript","UX development (Figma)"],"x-skills-preferred":[],"datePosted":"2026-03-10T11:34:02.295Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, AI Products, Distributed Systems, Monitoring/Alerting, JavaScript/TypeScript, UX development (Figma)"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_fe8902ba-3ab"},"title":"Software Engineer, Frontend","description":"<p>We are seeking a passionate and skilled Senior Frontend Engineer to join our growing team. 
In this role, you will have the unique opportunity to work on our complete range of products, contributing to its development and enhancement. Your work will directly impact the user experience, making it more engaging, efficient, and intuitive.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Full Stack Development: Design, develop, and maintain scalable and robust features, ensuring seamless integration between front-end and back-end systems using a modern stack.</li>\n<li>User-Centric Design: Prioritize user experience and ensure that our products meet the needs and expectations of our user base.</li>\n<li>Code Quality: Write clean, maintainable, and well-documented code, and participate in code reviews to uphold our high standards of quality.</li>\n<li>Collaboration: Work closely with cross-functional teams, including product managers, designers, and other engineers, to deliver high-quality software solutions.</li>\n<li>Problem-Solving: Tackle complex technical challenges and develop elegant, efficient solutions that improve performance and reliability.</li>\n<li>Innovation: Stay up-to-date with the latest technologies and trends in AI and software development, and apply them to enhance our products.</li>\n</ul>\n<p>About you:</p>\n<ul>\n<li>Proficient in TypeScript and NodeJS with a preference for experience in B2C contexts.</li>\n<li>Experience in a Front-end framework like React, NextJS, Remix, VueJS</li>\n<li>Knowledge of web libraries like Tanstack React Query, tRPC, Framer Motion, …</li>\n<li>Comfortable shipping products end-to-end</li>\n<li>Strong problem-solving abilities and attention to detail</li>\n<li>Excellent communication</li>\n<li>Low Ego and team spirit mindset</li>\n<li>Autonomous and self-starter</li>\n</ul>\n<p>Now it would be ideal if you have experience with:</p>\n<ul>\n<li>AI Products, particularly with LLMs</li>\n<li>Python</li>\n<li>Distributed Systems</li>\n<li>Monitoring/Alerting</li>\n<li>UX development (Figma)</li>\n</ul>\n<p 
style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_fe8902ba-3ab","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai/careers"},"x-apply-url":"https://jobs.lever.co/mistral/305432ef-27ac-4012-a893-a662813ac6e9","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["TypeScript","NodeJS","React","NextJS","Remix","VueJS","Tanstack React Query","tRPC","Framer Motion"],"x-skills-preferred":["AI Products","Python","Distributed Systems","Monitoring/Alerting","UX development (Figma)"],"datePosted":"2026-03-10T11:33:36.006Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"TypeScript, NodeJS, React, NextJS, Remix, VueJS, Tanstack React Query, tRPC, Framer Motion, AI Products, Python, Distributed Systems, Monitoring/Alerting, UX development (Figma)"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_419c1058-a0b"},"title":"Site Reliability Engineer","description":"<p>About Mistral AI</p>\n<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life. We democratize AI through high-performance, optimized, open-source and cutting-edge models, products and solutions. Our comprehensive AI platform is designed to meet enterprise needs, whether on-premises or in cloud environments.</p>\n<p>Role Summary</p>\n<p>We are seeking highly experienced Site Reliability Engineers (SRE) to shape the reliability, scalability and performance of our platform and customer facing applications. 
You will work closely with our software engineers and research teams to ensure our systems meet and exceed our internal and external customers&#39; expectations.</p>\n<p>Responsibilities</p>\n<p>As a Site Reliability Engineer, you balance the day-to-day operations on production systems with long-term software engineering improvements to reduce operational toil and foster the reliability, availability, and performance of these systems.</p>\n<p>Operations (50%)</p>\n<ul>\n<li>Design, build, and maintain scalable, highly available and fault-tolerant infrastructures to support our web services and ML workloads</li>\n<li>Make sure our platform, inference and model training environments are always highly available and enable seamless replication of work environments across several HPC clusters</li>\n<li>Operate systems and troubleshoot issues in production environments (interrupts, on-call responses, users admin, data extraction, infrastructure scaling, etc.)</li>\n<li>Implement and improve monitoring, alerting, and incident response systems to ensure optimal system performance and minimize downtime</li>\n<li>Implement and maintain workflows and tools (CI/CD, containerization, orchestration, monitoring, logging and alerting systems) for both our client-facing APIs and large training runs</li>\n<li>Participate occasionally in on-call rotations to respond to incidents and perform root cause analysis to prevent future occurrences</li>\n</ul>\n<p>Development (50%)</p>\n<ul>\n<li>Drive continuous improvement in infrastructure automation, deployment, and orchestration using tools like Kubernetes, Flux, Terraform</li>\n<li>Collaborate with AI/ML researchers to develop and implement solutions that enable safe and reproducible model-training experiments</li>\n<li>Build a cloud-agnostic platform offering an abstraction layer between science and infrastructure</li>\n<li>Design and develop new workflows and tooling to improve the reliability, availability and performance of our 
systems (automation scripts, refactoring, new API-based features, web apps, dashboards, etc.)</li>\n<li>Collaborate with the security team to ensure infrastructure adheres to best security practices and compliance requirements</li>\n<li>Document processes and procedures to ensure consistency and knowledge sharing across the team</li>\n<li>Contribute to open-source projects, research publications, blog articles and conferences</li>\n</ul>\n<p>About You</p>\n<ul>\n<li>Master’s degree in Computer Science, Engineering or a related field</li>\n<li>7+ years of experience in a DevOps/SRE role</li>\n<li>Strong experience with cloud computing and highly available distributed systems</li>\n<li>Exposure to site reliability issues in critical environments (issue root cause analysis, in-production troubleshooting, on-call rotations...) </li>\n<li>Experience working against reliability KPIs (observability, alerting, SLAs)</li>\n<li>Hands-on experience with CI/CD, containerization and orchestration tools (Docker, Kubernetes...)</li>\n<li>Knowledge of monitoring, logging, alerting and observability tools (Prometheus, Grafana, ELK Stack, Datadog...)</li>\n<li>Familiarity with infrastructure-as-code tools like Terraform or CloudFormation</li>\n<li>Proficiency in scripting languages (Python, Go, Bash...) 
and knowledge of software development best practices</li>\n<li>Strong understanding of networking, security, and system administration concepts</li>\n<li>Excellent problem-solving and communication skills</li>\n<li>Self-motivated and able to work well in a fast-paced startup environment</li>\n</ul>\n<p>Your Application Will Be All The More Interesting If You Also Have:</p>\n<ul>\n<li>Experience in an AI/ML environment</li>\n<li>Experience of high-performance computing (HPC) systems and workload managers (Slurm)</li>\n<li>Worked with modern AI-oriented solutions (Fluidstack, Coreweave, Vast...)</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_419c1058-a0b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai/careers"},"x-apply-url":"https://jobs.lever.co/mistral/6e16e4fa-a60b-4270-a815-06b0450fb597","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["cloud computing","highly available distributed systems","DevOps","SRE","Kubernetes","Flux","Terraform","CI/CD","containerization","orchestration","monitoring","logging","alerting","observability","infrastructure-as-code","scripting languages","software development best practices","networking","security","system administration"],"x-skills-preferred":["AI/ML environment","high-performance computing","workload managers","modern AI-oriented solutions"],"datePosted":"2026-03-10T11:32:04.928Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"cloud computing, highly available distributed systems, DevOps, SRE, Kubernetes, Flux, Terraform, CI/CD, containerization, orchestration, monitoring, 
logging, alerting, observability, infrastructure-as-code, scripting languages, software development best practices, networking, security, system administration, AI/ML environment, high-performance computing, workload managers, modern AI-oriented solutions"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_871d4845-25a"},"title":"Software Engineer, DevOps, Research Platform","description":"<p>We are seeking a talented and experienced software engineer to join our Research Platform team. You&#39;ll work closely with our R&amp;D team to build a cloud agnostic platform that improves the stability, scalability and velocity across the research department.</p>\n<p>As a DevOps/Platform Engineer, your responsibilities will include designing and implementing complex systems, building flexible yet solid and accessible development environment for researchers, designing, implementing and advocating for solutions addressing large amounts of data and maintainable data pipelines, optimizing a variety of builds, building strong relationships with researchers, communicating and producing documentation or any content that will help them to make the most out of the tools and systems you&#39;ll build.</p>\n<p>About you:</p>\n<ul>\n<li>5+ years of successful experience in a similar DX / DevOps / SRE role.</li>\n<li>Proficiency in software development (Python, Go...) and programming best practices.</li>\n<li>Exposure to site reliability engineering: root cause analysis, in-production troubleshooting, on-call rotations...</li>\n<li>Exposure to infrastructure management: CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability...</li>\n<li>Technical product mindset (e.g. 
understanding how to debug poor adoption).</li>\n<li>Excellent problem-solving and communication skills (ability to contextualize, gauge risks and get buy-in for high-stakes, impactful solutions).</li>\n<li>Ownership, high agency and constantly seeking to learn and improve things for others.</li>\n<li>Autonomous, self-driven and able to work well in a fast-paced startup environment.</li>\n<li>Low ego and team spirit mindset.</li>\n</ul>\n<p>Your application will be all the more interesting if you also have:</p>\n<ul>\n<li>First-hand Bazel (or equivalent) experience.</li>\n<li>Strong knowledge of Python&#39;s ecosystem.</li>\n<li>Familiarity with GPU based workloads and ecosystems.</li>\n<li>Experience of full remote environments (you&#39;re comfortable with having some of your users on the other side of the globe).</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_871d4845-25a","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai/careers"},"x-apply-url":"https://jobs.lever.co/mistral/18be2b70-c05d-48e4-82ac-e5cb462c96c0","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["software development","Python","Go","site reliability engineering","infrastructure management","CI/CD","containerization","orchestration","infra-as-code","monitoring","logging","alerting","observability"],"x-skills-preferred":["Bazel","Python's ecosystem","GPU based workloads and ecosystems","full remote environments"],"datePosted":"2026-03-10T11:31:49.456Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software development, Python, Go, site 
reliability engineering, infrastructure management, CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability, Bazel, Python's ecosystem, GPU based workloads and ecosystems, full remote environments"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_eafe9949-c5e"},"title":"Cybersecurity Engineer, SIEM","description":"<p>About Mistral AI\\n====================\\n\\nAt Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.\\n\\nWe are a global company with teams distributed between France, USA, UK, Germany and Singapore. Our comprehensive AI platform meets enterprise needs, whether on-premises or in cloud environments.\\n\\nRole Summary\\n============\\n\\nMistral is looking for a Security Platform Engineer to architect and maintain the infrastructure ensuring the observability of our production systems. You will treat the SIEM and logging infrastructure as a high-performance data product.\\n\\nResponsibilities\\n---------------\\n\\n* Own the set-up, lifecycle, availability, and performance of the SIEM solution, ensuring 99.9% uptime for log ingestion and query availability.\\n* Design and maintain high-throughput data pipelines to collect, buffer, and transport logs from distributed systems to the SIEM.\\n* Implement parsing logic and schema standardization to ensure unstructured logs are searchable and actionable for analysts.\\n* Manage alert rules, connectors, and dashboard configurations, avoiding manual console configuration (&quot;ClickOps&quot;).\\n* Analyze ingestion patterns to identify noisy, low-value data. 
Implement filtering and aggregation at the source to maximize signal-to-noise ratio.\\n* Architect data tiers to balance query performance with compliance retention requirements and cloud costs.\\n\\nAbout You\\n========\\n\\n* 5+ years of experience in Site Reliability Engineering (SRE), Data Engineering, or Security Engineering with a focus on logging infrastructure.\\n* Deep understanding of log management challenges at scale (indexing strategies, sharding, partitioning, throughput tuning).\\n* Strong experience deploying and monitoring stateful workloads on Kubernetes and Cloud providers (Azure/GCP) and On-Prem.\\n* Ability to write production-grade Python or Go for automation and custom log exporters.\\n* Experience managing monitoring, alerting, and on-call rotations for critical infrastructure.\\n\\nHiring Process\\n============\\n\\n* Introduction call - 30 min\\n* Hiring Manager interview - 30 min\\n* Technical Rounds I - 45 min\\n* Technical Rounds II - 60 min\\n* Culture-fit discussion - 30 min\\n* References\\n\\nAdditional Information\\n====================\\n\\nLocation &amp; Remote\\n-----------------\\nThe position is based in our Paris HQ offices and we encourage going to the office as much as we can (at least 3 days per week) to create bonds and smooth communication. Our remote policy aims to provide flexibility, improve work-life balance and increase productivity. Each manager can decide the amount of days worked remotely based on autonomy and a specific context (e.g. more flexibility can occur during summer). 
In any case, employees are expected to maintain regular communication with their teams and be available during core working hours.\\n\\nWhat We Offer\\n============\\n\\n* Competitive salary and equity package\\n* Health insurance\\n* Transportation allowance\\n* Sport allowance\\n* Meal vouchers\\n* Private pension plan\\n* Generous parental leave policy</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_eafe9949-c5e","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Mistral AI","sameAs":"https://mistral.ai"},"x-apply-url":"https://jobs.lever.co/mistral/6f7f6e7a-3dc4-430b-8957-a64450a10066","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Site Reliability Engineering","Data Engineering","Security Engineering","Logging infrastructure","Kubernetes","Cloud providers","Python","Go","Monitoring","Alerting","On-call rotations"],"x-skills-preferred":[],"datePosted":"2026-03-10T11:24:38.630Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Paris"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Site Reliability Engineering, Data Engineering, Security Engineering, Logging infrastructure, Kubernetes, Cloud providers, Python, Go, Monitoring, Alerting, On-call rotations"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f8883394-0fc"},"title":"Solutions Architect, AI and ML","description":"<p>We are looking for an experienced Cloud Solution Architect to help assist customers with adoption of GPU hardware and Software, as well as building and deploying Machine Learning (ML) , Deep Learning (DL), data analytics solutions on various Cloud Computing Platforms.</p>\n<p>As a Solutions Architect, you will 
engage directly with developers, researchers, and data scientists at some of NVIDIA’s most strategic technology customers as well as work directly with business and engineering teams on product strategy.</p>\n<p><strong>Key Responsibilities:</strong></p>\n<ul>\n<li>Help cloud customers craft, deploy, and maintain scalable, GPU-accelerated inference pipelines on cloud ML services and Kubernetes for large language models (LLMs) and generative AI workloads.</li>\n<li>Enhance performance tuning using TensorRT/TensorRT-LLM, vLLM, Dynamo, and Triton Inference Server to improve GPU utilization and model efficiency.</li>\n<li>Collaborate with multi-functional teams (engineering, product) and offer technical mentorship to cloud customers implementing AI inference at scale.</li>\n<li>Build custom PoCs for solutions that address customers’ critical business needs by applying NVIDIA hardware and software technology</li>\n<li>Partner with Sales Account Managers or Developer Relations Managers to identify and secure new business opportunities for NVIDIA products and solutions for ML/DL and other software solutions</li>\n<li>Prepare and deliver technical content to customers including presentations about purpose-built solutions, workshops about NVIDIA products and solutions, etc.</li>\n<li>Conduct regular technical customer meetings for project/product roadmap, feature discussions, and intro to new technologies. 
Establish close technical ties to the customer to facilitate rapid resolution of customer issues</li>\n</ul>\n<p><strong>Requirements:</strong></p>\n<ul>\n<li>BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Statistics, Physics, or other Engineering fields or equivalent experience.</li>\n<li>3+ Years in Solutions Architecture with a proven track record of moving AI inference from POC to production in cloud computing environments including AWS, GCP, or Azure</li>\n<li>3+ years of hands-on experience with Deep Learning frameworks such as PyTorch and TensorFlow</li>\n<li>Excellent knowledge of the theory and practice of LLM and DL inference</li>\n<li>Strong fundamentals in programming, optimizations, and software design, especially in Python</li>\n<li>Experience with containerization and orchestration technologies like Docker and Kubernetes, monitoring, and observability solutions for AI deployments</li>\n<li>Knowledge of Inference technologies - NVIDIA NIM, TensorRT-LLM, Dynamo, Triton Inference Server, vLLM, etc</li>\n<li>Proficiency in problem-solving and debugging skills in GPU environments</li>\n<li>Excellent presentation, communication and collaboration skills</li>\n</ul>\n<p><strong>Nice to Have:</strong></p>\n<ul>\n<li>AWS, GCP or Azure Professional Solution Architect Certification.</li>\n<li>Experience optimizing and deploying large MoE LLMs at scale</li>\n<li>Active contributions to open-source AI inference projects (e.g., vLLM, TensorRT-LLM Dynamo, SGLang, Triton or similar)</li>\n<li>Experience with Multi-GPU Multi-node Inference technologies like Tensor Parallelism/Expert Parallelism, Disaggregated Serving, LWS, MPI, EFA/Infiniband, NVLink/PCIe, etc</li>\n<li>Experience in developing and integrating monitoring and alerting solutions using Prometheus, Grafana, and NVIDIA DCGM and GPU performance Analysis and tools like NVIDIA Nsight Systems</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f8883394-0fc","directApply":true,"hiringOrganization":{"@type":"Organization","name":"NVIDIA","sameAs":"https://nvidia.wd5.myworkdayjobs.com","logo":"https://logos.yubhub.co/nvidia.com.png"},"x-apply-url":"https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-WA-Redmond/Solutions-Architect--AI-and-ML_JR2005988-1","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Cloud Solution Architecture","GPU hardware and Software","Machine Learning (ML)","Deep Learning (DL)","Data Analytics","Cloud Computing Platforms","Kubernetes","TensorRT","TensorRT-LLM","vLLM","Dynamo","Triton Inference Server","Python","Containerization","Orchestration","Monitoring","Observability","Inference technologies","NVIDIA NIM","Problem-solving","Debugging","GPU environments"],"x-skills-preferred":["AWS","GCP","Azure","Professional Solution Architect Certification","Large MoE LLMs","Open-source AI inference projects","Multi-GPU Multi-node Inference technologies","Monitoring and alerting solutions","Prometheus","Grafana","NVIDIA DCGM","GPU performance Analysis","NVIDIA Nsight Systems"],"datePosted":"2026-03-09T20:45:22.711Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Redmond, CA, Santa Clara, Seattle"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Cloud Solution Architecture, GPU hardware and Software, Machine Learning (ML), Deep Learning (DL), Data Analytics, Cloud Computing Platforms, Kubernetes, TensorRT, TensorRT-LLM, vLLM, Dynamo, Triton Inference Server, Python, Containerization, Orchestration, Monitoring, Observability, Inference technologies, NVIDIA NIM, Problem-solving, Debugging, GPU environments, AWS, GCP, Azure, Professional Solution Architect Certification, Large MoE LLMs, Open-source AI inference projects, Multi-GPU 
Multi-node Inference technologies, Monitoring and alerting solutions, Prometheus, Grafana, NVIDIA DCGM, GPU performance Analysis, NVIDIA Nsight Systems"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_46a8c619-ec1"},"title":"Backend Engineer: AI Shopping Agents","description":"<p><strong>About the Job</strong></p>\n<p>Constructor is seeking a Backend Engineer to join its AI Shopping Agents team. The primary focus of this job is to design, deliver &amp; maintain web and data pipeline services in close collaboration with other engineers.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Build, deploy, and support backend services</li>\n<li>Define cloud infrastructure using AWS CloudFormation and maintain CI/CD pipelines with GitHub Actions</li>\n<li>Improve and operate our observability stack</li>\n<li>Collaborate with technical and non-technical stakeholders to design, develop, and refine features</li>\n<li>Communicate effectively with stakeholders within and outside the team</li>\n<li>Contribute to data processing pipelines and ETL processes</li>\n</ul>\n<p><strong>Requirements</strong></p>\n<ul>\n<li>Strong proficiency in Python and server-side web development (API design, concurrency/asynchronous programming)</li>\n<li>Experience designing, building, and operating production backend services (performance, reliability, on-call/operations mindset)</li>\n<li>Experience with Infrastructure as Code and cloud resource management (AWS preferred; Azure/GCP also fine)</li>\n<li>Hands-on experience building or maintaining CI/CD pipelines</li>\n<li>Experience with observability: metrics/logs/traces, dashboards, and alerting</li>\n<li>Experience working with databases, including at least one relational and one NoSQL system (e.g., PostgreSQL, DynamoDB)</li>\n</ul>\n<p><strong>Nice to Haves</strong></p>\n<ul>\n<li>Experience with high-load and/or real-time systems</li>\n<li>Experience with 
distributed/service-oriented architectures, including interface definition and binary RPC (e.g., Protobuf/gRPC)</li>\n<li>Familiarity with additional vector databases</li>\n<li>Experience contributing to or owning ETL/data pipeline systems at scale</li>\n</ul>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Work with smart and empathetic people who will help you grow and make a meaningful impact.</li>\n<li>Regular team offsite events to connect and collaborate.</li>\n<li>Fully remote team - choose where you live.</li>\n<li>Unlimited vacation time - we strongly encourage all of our employees take at least 3 weeks per year.</li>\n<li>Work from home stipend! We want you to have the resources you need to set up your home office.</li>\n<li>Apple laptops provided for new employees.</li>\n<li>Training and development budget for every employee, refreshed each year.</li>\n<li>Maternity &amp; Paternity leave for qualified employees.</li>\n<li>Base salary: $80k–$120K USD, depending on knowledge, skills, experience, and interview results.</li>\n<li>Stock options - offered in addition to the base salary</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_46a8c619-ec1","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Constructor","sameAs":"https://apply.workable.com","logo":"https://logos.yubhub.co/j.com.png"},"x-apply-url":"https://apply.workable.com/j/9A5E2DE872","x-work-arrangement":"remote","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"$80k–$120K USD","x-skills-required":["Python","server-side web development","API design","concurrency/asynchronous programming","Infrastructure as Code","cloud resource management","CI/CD pipelines","observability","metrics/logs/traces","dashboards","alerting","databases","relational databases","NoSQL databases"],"x-skills-preferred":["high-load and/or real-time 
systems","distributed/service-oriented architectures","interface definition","binary RPC","vector databases","ETL/data pipeline systems"],"datePosted":"2026-03-09T10:58:04.837Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Oregon, United States"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, server-side web development, API design, concurrency/asynchronous programming, Infrastructure as Code, cloud resource management, CI/CD pipelines, observability, metrics/logs/traces, dashboards, alerting, databases, relational databases, NoSQL databases, high-load and/or real-time systems, distributed/service-oriented architectures, interface definition, binary RPC, vector databases, ETL/data pipeline systems","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":80000,"maxValue":120000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f70dd4a2-526"},"title":"Staff+ Software Engineer, Observability","description":"<p><strong>About the Role</strong></p>\n<p>Anthropic is seeking talented and experienced Software Engineers to join our Observability team within the Infrastructure organisation. 
The Observability team owns the monitoring and telemetry infrastructure that every engineer and researcher at Anthropic depends on—from metrics and logging pipelines to distributed tracing, error analytics, alerting, and the dashboards and query interfaces that make it all actionable.</p>\n<p><strong>Responsibilities</strong></p>\n<ul>\n<li>Design and build scalable telemetry ingest and storage pipelines for metrics, logs, traces, and error data across Anthropic&#39;s multi-cluster infrastructure</li>\n<li>Own and evolve core observability platforms, driving migrations and architectural improvements that improve reliability, reduce cost, and scale with organisational growth</li>\n<li>Build instrumentation libraries, SDKs, and integrations that make it easy for engineering teams to emit high-quality telemetry from their services</li>\n<li>Drive alerting and SLO infrastructure that enables teams to define, monitor, and respond to reliability targets with minimal noise</li>\n<li>Reduce mean time to detection and resolution by building cross-signal correlation, unified query interfaces, and AI-assisted diagnostic tooling</li>\n<li>Partner with Research, Inference, Product, and Infrastructure teams to ensure observability solutions meet the unique needs of each organisation</li>\n</ul>\n<p><strong>You May Be a Good Fit If You:</strong></p>\n<ul>\n<li>Have 10+ years of relevant industry experience building and operating large-scale observability or monitoring infrastructure</li>\n<li>Have deep experience with at least one observability signal area (metrics, logging, tracing, or error analytics) and familiarity with the others</li>\n<li>Understand high-throughput data pipelines, columnar storage engines, and the tradeoffs involved in ingesting and querying telemetry data at scale</li>\n<li>Have experience operating or building on top of observability platforms such as Prometheus, Grafana, ClickHouse, OpenTelemetry, or similar systems</li>\n<li>Have strong proficiency in 
at least one of Python, Rust, or Go</li>\n<li>Have excellent communication skills and enjoy partnering with internal teams to improve their operational visibility and incident response capabilities</li>\n<li>Are excited about building foundational infrastructure and are comfortable working independently on ambiguous, high-impact technical challenges</li>\n</ul>\n<p><strong>Strong Candidates May Also Have:</strong></p>\n<ul>\n<li>Experience operating metrics systems at very high cardinality (hundreds of millions of active time series or more)</li>\n<li>Experience with log storage migrations or operating columnar databases (ClickHouse, BigQuery, or similar) for analytics workloads</li>\n<li>Experience with OpenTelemetry instrumentation, collector pipelines, and tail-based sampling strategies</li>\n<li>Experience building or operating alerting platforms, on-call tooling, or SLO frameworks at scale</li>\n<li>Experience with Kubernetes-native monitoring, eBPF-based observability, or continuous profiling</li>\n<li>Interest in applying AI/LLMs to operational workflows such as automated root cause analysis, anomaly detection, or intelligent alerting</li>\n</ul>\n<p><strong>Logistics</strong></p>\n<ul>\n<li>Education requirements: We require at least a Bachelor&#39;s degree in a related field or equivalent experience.</li>\n<li>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>\n<li>Visa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. 
Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</strong></p>\n<p><strong>Your safety matters to us. To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses.</strong></p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f70dd4a2-526","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/5139910008","x-work-arrangement":"hybrid","x-experience-level":"staff","x-job-type":"full-time","x-salary-range":"$405,000 - $485,000 USD","x-skills-required":["observability","metrics","logging","tracing","error analytics","alerting","SLO infrastructure","cross-signal correlation","unified query interfaces","AI-assisted diagnostic tooling","Python","Rust","Go","Prometheus","Grafana","ClickHouse","OpenTelemetry"],"x-skills-preferred":["OpenTelemetry instrumentation","collector pipelines","tail-based sampling strategies","Kubernetes-native monitoring","eBPF-based observability","continuous profiling","AI/LLMs","automated root cause analysis","anomaly detection","intelligent alerting"],"datePosted":"2026-03-08T13:52:33.217Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA | New York City, NY | Seattle, WA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"observability, metrics, logging, tracing, error analytics, alerting, SLO infrastructure, cross-signal correlation, 
unified query interfaces, AI-assisted diagnostic tooling, Python, Rust, Go, Prometheus, Grafana, ClickHouse, OpenTelemetry, OpenTelemetry instrumentation, collector pipelines, tail-based sampling strategies, Kubernetes-native monitoring, eBPF-based observability, continuous profiling, AI/LLMs, automated root cause analysis, anomaly detection, intelligent alerting","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":405000,"maxValue":485000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_6cc383e0-ff6"},"title":"ML Infrastructure Engineer, Safeguards","description":"<p><strong>About the role</strong></p>\n<p>We are seeking a Machine Learning Infrastructure Engineer to join our Safeguards organization, where you&#39;ll build and scale the critical infrastructure that powers our AI safety systems. You&#39;ll work at the intersection of machine learning, large-scale distributed systems, and AI safety, developing the platforms and tools that enable our safeguards to operate reliably at scale.</p>\n<p><strong>Responsibilities:</strong></p>\n<ul>\n<li>Design and build scalable ML infrastructure to support real-time and batch classifier and safety evaluations across our model ecosystem</li>\n<li>Build monitoring and observability tools to track model performance, data quality, and system health for safety-critical applications</li>\n<li>Collaborate with research teams to productionize safety research, translating experimental safety techniques into robust, scalable systems</li>\n<li>Optimize inference latency and throughput for real-time safety evaluations while maintaining high reliability standards</li>\n<li>Implement automated testing, deployment, and rollback systems for ML models in production safety applications</li>\n<li>Partner with Safeguards, Security, and Alignment teams to understand requirements and deliver 
infrastructure that meets safety and production needs</li>\n<li>Contribute to the development of internal tools and frameworks that accelerate safety research and deployment</li>\n</ul>\n<p><strong>You may be a good fit if you:</strong></p>\n<ul>\n<li>Have 5+ years of experience building production ML infrastructure, ideally in safety-critical domains like fraud detection, content moderation, or risk assessment</li>\n<li>Are proficient in Python and have experience with ML frameworks like PyTorch, TensorFlow, or JAX</li>\n<li>Have hands-on experience with cloud platforms (AWS, GCP) and container orchestration (Kubernetes)</li>\n<li>Understand distributed systems principles and have built systems that handle high-throughput, low-latency workloads</li>\n<li>Have experience with data engineering tools and building robust data pipelines (e.g., Spark, Airflow, streaming systems)</li>\n<li>Are results-oriented, with a bias towards reliability and impact in safety-critical systems</li>\n<li>Enjoy collaborating with researchers and translating cutting-edge research into production systems</li>\n<li>Care deeply about AI safety and the societal impacts of your work</li>\n</ul>\n<p><strong>Strong candidates may have experience with:</strong></p>\n<ul>\n<li>Working with large language models and modern transformer architectures</li>\n<li>Implementing A/B testing frameworks and experimentation infrastructure for ML systems</li>\n<li>Developing monitoring and alerting systems for ML model performance and data drift</li>\n<li>Building automated labeling systems and human-in-the-loop workflows</li>\n<li>Experience in trust &amp; safety, fraud prevention, or content moderation domains</li>\n<li>Knowledge of privacy-preserving ML techniques and compliance requirements</li>\n<li>Contributing to open-source ML infrastructure projects</li>\n</ul>\n<p><strong>Deadline to apply:</strong></p>\n<p>None. 
Applications will be reviewed on a rolling basis.</p>\n<p><strong>Logistics</strong></p>\n<ul>\n<li>Education requirements: We require at least a Bachelor&#39;s degree in a related field or equivalent experience.</li>\n<li>Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.</li>\n<li>Visa sponsorship: We do sponsor visas! However, we aren&#39;t able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.</li>\n</ul>\n<p><strong>We encourage you to apply even if you do not believe you meet every single qualification.</strong></p>\n<p>Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you&#39;re interested in this work.</p>\n<p><strong>Your safety matters to us.</strong></p>\n<p>To protect yourself from potential scams, remember that Anthropic recruiters only contact you from @anthropic.com email addresses. In some cases, we may partner with vetted recruiting agencies who will identify themselves as working on behalf of Anthropic. Be cautious of emails from other domains. Legitimate Anthropic recruiters will never ask for money, fees, or banking information before your first day. If you&#39;re ever unsure about a communication, don&#39;t click any links—visit anthropic.com/careers directly for confirmed position openings.</p>\n<p><strong>How we&#39;re different</strong></p>\n<p>We believe that the highest-impact AI research will be big science. 
At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing the state of the art in AI safety and making a meaningful difference in the world.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_6cc383e0-ff6","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://job-boards.greenhouse.io","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4778843008","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000 - $405,000 USD","x-skills-required":["Python","PyTorch","TensorFlow","JAX","AWS","GCP","Kubernetes","Spark","Airflow","streaming systems"],"x-skills-preferred":["large language models","modern transformer architectures","A/B testing frameworks","experimentation infrastructure","monitoring and alerting systems","automated labeling systems","human-in-the-loop workflows","trust & safety","fraud prevention","content moderation domains","privacy-preserving ML techniques","compliance requirements"],"datePosted":"2026-03-08T13:46:05.401Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, PyTorch, TensorFlow, JAX, AWS, GCP, Kubernetes, Spark, Airflow, streaming systems, large language models, modern transformer architectures, A/B testing frameworks, experimentation infrastructure, monitoring and alerting systems, automated labeling systems, human-in-the-loop workflows, trust & safety, fraud prevention, content moderation domains, privacy-preserving ML techniques, compliance 
requirements","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_3514d749-08c"},"title":"Senior Support Engineer","description":"<p><strong>Senior Support Engineer - San Francisco</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$234K – $260K • Offers Equity</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<p><strong>Benefits</strong></p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local 
law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p><strong>About the Team</strong></p>\n<p>The Technical Support team is responsible for ensuring that developers and enterprises can reliably build mission critical solutions using OpenAI models. We provide technical guidance, resolve complex issues and support customers in maximizing value and adoption from deploying our highly-capable models. We work closely with Technical Success, Product, Engineering and others to deliver the best possible experience to our customers at scale. We think from an automation-first mindset and leverage the latest in AI to scale our support operations. Join the Senior Support Engineering (SSE) team at OpenAI and help shape the future of Technical Support in the age of AI.</p>\n<p><strong>About the Role</strong></p>\n<p>We are looking for a Senior Support Engineer to collaborate directly with our strategic enterprise accounts and product teams, helping solve some of the most difficult problems faced by our Customers. You will be part of the best technical troubleshooting team at OpenAI, and our Customers and Engineering teams will look to you for technical guidance in addressing the most technically difficult issues in our environment.</p>\n<p>As a Senior Support Engineer, you will design and run operational processes to monitor our top strategic customers and a 24x7 response team. 
You’ll work closely with our Infrastructure and Engineering teams to deliver the best possible experience to customers at scale. Working directly with our most strategic Customers - You will be crucial to the success of the most innovative, disruptive, and high-scale AI solutions being built with the OpenAI API platform.</p>\n<p>The nature of this role will be low volume, high difficulty.</p>\n<p>This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Be among the foremost technical and troubleshooting experts for our API platform at OpenAI. You are the last line of defense before the core Engineering team.</li>\n</ul>\n<ul>\n<li>Proactively identify and implement opportunities to scale support operations by leveraging automation and advancements in AI technologies. Contribute to shaping the future of technical support in an AI-driven era.</li>\n</ul>\n<ul>\n<li>Configure and use advanced monitoring and alerting workflows to proactively detect customer impacting issues in real time.</li>\n</ul>\n<ul>\n<li>In partnership with engineering, contribute to reliability reviews and preparedness for new features, launches, or strategic customer requirement updates. Ensure that operational readiness (monitoring, alerting, and fallback plans) is in place for any such changes.</li>\n</ul>\n<ul>\n<li>Design and refine incident response processes and documentation across strategic customers, engineering and support teams.</li>\n</ul>\n<ul>\n<li>Analyze operational metrics and incident RCAs to identify areas for improvement. 
Proactively recommend and implement enhancements to monitoring dashboards, alert configurations, and support workflows.</li>\n</ul>\n<ul>\n<li>Provide support coverage during holidays and weekends based on business needs.</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Have a Bachelor’s degree in Computer Science or a related field. A strong software engineering foundation is important for this role’s success.</li>\n</ul>\n<ul>\n<li>Have 8+ years of experience in technical operations roles such as SRE/NOC, designing monitoring systems and resolving production issues in fast-paced and mission-critical environments. A strong track record of troubleshooting complex technical problems at the systems level.</li>\n</ul>\n<ul>\n<li>Have deep familiarity with modern monitoring, alerting, and observability practices. Hands‑on experience setting up or managing metrics, logging, and tracing for distributed systems (e.g., understanding of SLIs/SLOs, alert tuning, dashboard creation).</li>\n</ul>\n<ul>\n<li>Have proven experience leading incident response for high‑severity outages or service disruptions. 
Able to perform real‑time incident coordination, root cause analysis, and communication with stakeholders.</li>\n</ul>\n<ul>\n<li>Are able to work effectively in a fast-paced environment, prioritize tasks, and manage multiple projects simultaneously.</li>\n</ul>\n<ul>\n<li>Are a strong communicator and team player, with excellent written and verbal communication skills.</li>\n</ul>\n<ul>\n<li>Are able to adapt to changing priorities and requirements, and are flexible in your approach to problem-solving.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_3514d749-08c","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/5431666c-530b-49c0-b67e-32477f9eaf5e","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$234K – $260K","x-skills-required":["Bachelor’s degree in Computer Science or a related field","8+ years of experience in technical operations roles such as SRE/NOC","Designing monitoring systems and resolving production issues in fast-paced and mission-critical environments","Troubleshooting complex technical problems at the systems level","Modern monitoring, alerting, and observability practices","Metrics, logging, and tracing for distributed systems","SLIs/SLOs, alert tuning, dashboard creation","Incident response for high‑severity outages or service disruptions","Real-time incident coordination, root cause analysis, and communication with stakeholders"],"x-skills-preferred":["Automation and advancements in AI technologies","Automation-first mindset and leveraging the latest in AI to scale support operations","Technical and troubleshooting expertise for API platform at OpenAI","Proactive identification and implementation of 
opportunities to scale support operations","Advanced monitoring and alerting workflows to proactively detect customer impacting issues in real time","Reliability reviews and preparedness for new features, launches, or strategic customer requirement updates","Operational readiness (monitoring, alerting, and fallback plans)","Incident response processes and documentation across strategic customers, engineering and support teams","Operational metrics and incident RCAs to identify areas for improvement","Enhancements to monitoring dashboards, alert configurations, and support workflows"],"datePosted":"2026-03-06T18:43:55.714Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Bachelor’s degree in Computer Science or a related field, 8+ years of experience in technical operations roles such as SRE/NOC, Designing monitoring systems and resolving production issues in fast-paced and mission-critical environments, Troubleshooting complex technical problems at the systems level, Modern monitoring, alerting, and observability practices, Metrics, logging, and tracing for distributed systems, SLIs/SLOs, alert tuning, dashboard creation, Incident response for high‑severity outages or service disruptions, Real-time incident coordination, root cause analysis, and communication with stakeholders, Automation and advancements in AI technologies, Automation-first mindset and leveraging the latest in AI to scale support operations, Technical and troubleshooting expertise for API platform at OpenAI, Proactive identification and implementation of opportunities to scale support operations, Advanced monitoring and alerting workflows to proactively detect customer impacting issues in real time, Reliability reviews and preparedness for new features, launches, or strategic customer requirement updates, Operational readiness (monitoring, 
alerting, and fallback plans), Incident response processes and documentation across strategic customers, engineering and support teams, Operational metrics and incident RCAs to identify areas for improvement, Enhancements to monitoring dashboards, alert configurations, and support workflows","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":234000,"maxValue":260000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_70806a42-556"},"title":"Senior Support Engineer","description":"<p><strong>Senior Support Engineer - Dublin</strong></p>\n<p><strong>Location</strong></p>\n<p>Dublin, Ireland</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p><strong>About the Team</strong></p>\n<p>The Technical Support team is responsible for ensuring that developers and enterprises can reliably build mission critical solutions using OpenAI models. We provide technical guidance, resolve complex issues and support customers in maximizing value and adoption from deploying our highly-capable models. We work closely with Technical Success, Product, Engineering and others to deliver the best possible experience to our customers at scale. We think from an automation-first mindset and leverage the latest in AI to scale our support operations. Join the Senior Support Engineering (SSE) team at OpenAI and help shape the future of Technical Support in the age of AI.</p>\n<p><strong>About the Role</strong></p>\n<p>We are looking for a Senior Support Engineer to collaborate directly with our strategic enterprise accounts and product teams, helping solve some of the most difficult problems faced by our Customers. 
You will be part of the best technical troubleshooting team at OpenAI, and our Customers and Engineering teams will look to you for technical guidance in addressing the most technically difficult issues in our environment.</p>\n<p>As a Senior Support Engineer, you will design and run operational processes to monitor our top strategic customers and a 24x7 response team. You’ll work closely with our Infrastructure and Engineering teams to deliver the best possible experience to customers at scale. Working directly with our most strategic Customers - You will be crucial to the success of the most innovative, disruptive, and high-scale AI solutions being built with the OpenAI API platform.</p>\n<p>The nature of this role will be low volume, high difficulty.</p>\n<p>This role is based in Dublin, Ireland. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Be among the foremost technical and troubleshooting experts for our API platform at OpenAI. You are the last line of defense before the core Engineering team.</li>\n</ul>\n<ul>\n<li>Proactively identify and implement opportunities to scale support operations by leveraging automation and advancements in AI technologies. Contribute to shaping the future of technical support in an AI-driven era.</li>\n</ul>\n<ul>\n<li>Configure and use advanced monitoring and alerting workflows to proactively detect customer impacting issues in real time.</li>\n</ul>\n<ul>\n<li>In partnership with engineering, contribute to reliability reviews and preparedness for new features, launches, or strategic customer requirement updates. 
Ensure that operational readiness (monitoring, alerting, and fallback plans) is in place for any such changes.</li>\n</ul>\n<ul>\n<li>Design and refine incident response processes and documentation across strategic customers, engineering and support teams.</li>\n</ul>\n<ul>\n<li>Analyze operational metrics and incident RCAs to identify areas for improvement. Proactively recommend and implement enhancements to monitoring dashboards, alert configurations, and support workflows.</li>\n</ul>\n<ul>\n<li>Provide support coverage during holidays and weekends based on business needs.</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Have a Bachelor’s degree in Computer Science or a related field. A strong software engineering foundation is important for this role’s success.</li>\n</ul>\n<ul>\n<li>Have 5+ years of experience in technical operations roles such as SRE/NOC, designing monitoring systems and resolving production issues in fast-paced and mission-critical environments. A strong track record of troubleshooting complex technical problems at the systems level.</li>\n</ul>\n<ul>\n<li>Have deep familiarity with modern monitoring, alerting, and observability practices. Hands‑on experience setting up or managing metrics, logging, and tracing for distributed systems (e.g., understanding of SLIs/SLOs, alert tuning, dashboard creation).</li>\n</ul>\n<ul>\n<li>Have proven experience leading incident response for high‑severity outages or service disruptions. Able to perform real‑time incident coordination, root cause analysis, and drive follow‑ups (post‑mortems, action items) to prevent recurrence. Knowledge of industry best practices for incident management and fault diagnosis.</li>\n</ul>\n<ul>\n<li>Have strong skills in scripting or software engineering (e.g., Python or similar) to automate repetitive tasks and integrate tools.</li>\n</ul>\n<ul>\n<li>Have solid understanding of cloud infrastructure and distributed systems fundamentals. 
Comfortable working with cloud services, load balancers, databases, and containerized applications.</li>\n</ul>\n<ul>\n<li>Are effective at working cross‑functionally in a high‑trust environment. Strong communication skills to explain technical issues and resolutions to both engineering and non‑technical stakeholders. You can coordinate efforts across teams and are comfortable providing updates in the midst of an ongoing incident.</li>\n</ul>\n<p><strong>Compensation, Benefits and Perks</strong></p>\n<p>This is a position with OpenAI Ireland Ltd., which controls the hiring and management of this position.</p>\n<p>Total compensation includes an annual salary, generous equity, and benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>PRSA plan with 8% employer matching</li>\n</ul>\n<ul>\n<li>Unlimited time off</li>\n</ul>\n<ul>\n<li>Annual learning &amp; development stipend ($1,500 USD equivalent per year)</li>\n</ul>\n<p>#LI-NM2</p>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_70806a42-556","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/988016e1-de50-42be-925a-438b97291c5d","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Python","Cloud infrastructure","Distributed systems","Monitoring and alerting","Observability","Scripting","Software engineering","Cloud services","Load balancers","Databases","Containerized applications"],"x-skills-preferred":["SLIs/SLOs","Alert tuning","Dashboard creation","Incident management","Fault diagnosis","Cross-functional collaboration","Communication"],"datePosted":"2026-03-06T18:36:57.231Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Dublin"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Cloud infrastructure, Distributed systems, Monitoring and alerting, Observability, Scripting, Software engineering, Cloud services, Load balancers, Databases, Containerized applications, SLIs/SLOs, Alert tuning, Dashboard creation, Incident management, Fault diagnosis, Cross-functional collaboration, Communication"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_e38e0353-95c"},"title":"Senior Support Engineer","description":"<p><strong>Senior Support Engineer - 
Tokyo</strong></p>\n<p><strong>Location</strong></p>\n<p>Tokyo, Japan</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p><strong>About the Team</strong></p>\n<p>The Technical Support team is responsible for ensuring that developers and enterprises can reliably build mission critical solutions using OpenAI models. We provide technical guidance, resolve complex issues and support customers in maximizing value and adoption from deploying our highly-capable models. We work closely with Technical Success, Product, Engineering and others to deliver the best possible experience to our customers at scale. We think from an automation-first mindset and leverage the latest in AI to scale our support operations. Join the Senior Support Engineering (SSE) team at OpenAI and help shape the future of Technical Support in the age of AI.</p>\n<p><strong>About the Role</strong></p>\n<p>We are looking for a Senior Support Engineer to collaborate directly with our strategic enterprise accounts and product teams, helping solve some of the most difficult problems faced by our Customers. You will be part of the best technical troubleshooting team at OpenAI, and our Customers and Engineering teams will look to you for technical guidance in addressing the most technically difficult issues in our environment.</p>\n<p>As a Senior Support Engineer, you will design and run operational processes to monitor our top strategic customers and a 24x7 response team. You’ll work closely with our Infrastructure and Engineering teams to deliver the best possible experience to customers at scale. Working directly with our most strategic Customers - You will be crucial to the success of the most innovative, disruptive, and high-scale AI solutions being built with the OpenAI API platform.</p>\n<p>The nature of this role will be low volume, high difficulty.</p>\n<p>This role is based in Tokyo, Japan. 
We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.</p>\n<p><strong>In this role, you will:</strong></p>\n<ul>\n<li>Be among the foremost technical and troubleshooting experts for our API platform at OpenAI. You are the last line of defense before the core Engineering team.</li>\n</ul>\n<ul>\n<li>Proactively identify and implement opportunities to scale support operations by leveraging automation and advancements in AI technologies. Contribute to shaping the future of technical support in an AI-driven era.</li>\n</ul>\n<ul>\n<li>Configure and use advanced monitoring and alerting workflows to proactively detect customer impacting issues in real time.</li>\n</ul>\n<ul>\n<li>In partnership with engineering, contribute to reliability reviews and preparedness for new features, launches, or strategic customer requirement updates. Ensure that operational readiness (monitoring, alerting, and fallback plans) is in place for any such changes.</li>\n</ul>\n<ul>\n<li>Design and refine incident response processes and documentation across strategic customers, engineering and support teams.</li>\n</ul>\n<ul>\n<li>Analyze operational metrics and incident RCAs to identify areas for improvement. Proactively recommend and implement enhancements to monitoring dashboards, alert configurations, and support workflows.</li>\n</ul>\n<ul>\n<li>Provide support coverage during holidays and weekends based on business needs.</li>\n</ul>\n<p><strong>You might thrive in this role if you:</strong></p>\n<ul>\n<li>Have a Bachelor’s degree in Computer Science or a related field. A strong software engineering foundation is important for this role’s success.</li>\n</ul>\n<ul>\n<li>Have 8+ years of experience in technical operations roles such as SRE/NOC, designing monitoring systems and resolving production issues in fast-paced and mission-critical environments. 
A strong track record of troubleshooting complex technical problems at the systems level.</li>\n</ul>\n<ul>\n<li>Have deep familiarity with modern monitoring, alerting, and observability practices. Hands‑on experience setting up or managing metrics, logging, and tracing for distributed systems (e.g., understanding of SLIs/SLOs, alert tuning, dashboard creation).</li>\n</ul>\n<ul>\n<li>Have proven experience leading incident response for high‑severity outages or service disruptions. Able to perform real‑time incident coordination, root cause analysis, and drive follow‑ups (post‑mortems, action items) to prevent recurrence. Knowledge of industry best practices for incident management and fault diagnosis.</li>\n</ul>\n<ul>\n<li>Have strong skills in scripting or software engineering (e.g., Python or similar) to automate repetitive tasks and integrate tools.</li>\n</ul>\n<ul>\n<li>Have solid understanding of cloud infrastructure and distributed systems fundamentals. Comfortable working with cloud services, load balancers, databases, and containerized applications.</li>\n</ul>\n<ul>\n<li>Are effective at working cross‑functionally in a high‑trust environment. Strong communication skills to explain technical issues and resolutions to both engineering and non‑technical stakeholders. You can coordinate efforts across teams and are comfortable providing updates in the midst of an ongoing incident.</li>\n</ul>\n<p><strong>About OpenAI</strong></p>\n<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. 
AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_e38e0353-95c","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/b2fd550d-3e04-434e-bb91-c5b7bc8ac8b7","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Python","Cloud infrastructure","Distributed systems","Monitoring and alerting","Observability","Scripting","Software engineering","Cloud services","Load balancers","Databases","Containerized applications"],"x-skills-preferred":["Automation","AI technologies","Incident response","Reliability reviews","Post-mortems","Action items","Cross-functional collaboration","Communication","Technical writing"],"datePosted":"2026-03-06T18:36:56.708Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Tokyo, Japan"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Cloud infrastructure, Distributed systems, Monitoring and alerting, Observability, Scripting, Software engineering, Cloud services, Load balancers, Databases, Containerized applications, Automation, AI technologies, Incident response, Reliability reviews, Post-mortems, Action items, Cross-functional collaboration, Communication, Technical writing"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_fb4acb2b-bab"},"title":"Security Reliability Engineering, 
Lead","description":"<p><strong>Security Reliability Engineering, Lead</strong></p>\n<p><strong>Location</strong></p>\n<p>San Francisco</p>\n<p><strong>Employment Type</strong></p>\n<p>Full time</p>\n<p><strong>Department</strong></p>\n<p>Security</p>\n<p><strong>Compensation</strong></p>\n<ul>\n<li>$293K – $385K</li>\n</ul>\n<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>\n<ul>\n<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>\n</ul>\n<ul>\n<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>\n</ul>\n<ul>\n<li>401(k) retirement plan with employer match</li>\n</ul>\n<ul>\n<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>\n</ul>\n<ul>\n<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>\n</ul>\n<ul>\n<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>\n</ul>\n<ul>\n<li>Mental health and wellness support</li>\n</ul>\n<ul>\n<li>Employer-paid basic life and disability coverage</li>\n</ul>\n<ul>\n<li>Annual learning and development stipend to fuel your professional growth</li>\n</ul>\n<ul>\n<li>Daily meals in our offices, and meal delivery credits as eligible</li>\n</ul>\n<ul>\n<li>Relocation support for eligible 
employees</li>\n</ul>\n<ul>\n<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>\n</ul>\n<p>More details about our benefits are available to candidates during the hiring process.</p>\n<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>\n<p><strong>About the Team</strong></p>\n<p>The Infrastructure Engineering function sits within IT and is responsible for reliably building, deploying, and operating critical on prem and hybrid environments that power internal services and critical R&amp;D environments.</p>\n<p>This is a new, bootstrap team focused on applying strong Site Reliability Engineering discipline to environments where uptime, safety, recoverability, and security are non-negotiable. The team replaces bespoke, one off infrastructure with standardized infrastructure-as-code building blocks that compound reliability and operational leverage as OpenAI scales.</p>\n<p><strong>About the Role</strong></p>\n<p>We are looking for a Security Reliability Engineering Lead to design, build, and operate reliable, secure, and scalable infrastructure that underpins identity, access, endpoint, and shared platform services across the company.</p>\n<p>In this role, you will own infrastructure and identity systems end to end, from foundational design and provisioning through policy enforcement, upgrades, recovery, and day two operations. 
You will establish durable, production grade platforms that remove operational friction, enforce security by default, and enable teams to move faster with confidence.</p>\n<p>This role is well suited for a senior engineer who thrives in ambiguity, enjoys owning complex systems end to end, and raises the reliability and security bar by replacing fragile implementations with standardized, repeatable infrastructure.</p>\n<p>This role is based in our San Francisco HQ and requires in-office presence.</p>\n<p><strong>In this role, you will:</strong></p>\n<p><strong>Set direction and establish strong foundations</strong></p>\n<ul>\n<li>Define and evolve infrastructure patterns for on prem and hybrid environments, including self hosted platforms, vendor supported systems, and lab environments.</li>\n</ul>\n<ul>\n<li>Establish standardized, production grade deployment and operational models that replace bespoke implementations.</li>\n</ul>\n<ul>\n<li>Partner with IT, Security, Identity, and Network teams to ensure infrastructure meets reliability, security, and access requirements by design.</li>\n</ul>\n<ul>\n<li>Design and mature the production architecture for IAM adjacent platforms such as Microsoft Entra using SRE principles.</li>\n</ul>\n<ul>\n<li>Establish common management rules and shared resources within Azure subscriptions to ensure consistent, policy aligned operations.</li>\n</ul>\n<p><strong>Build, operate, and scale reliably</strong></p>\n<ul>\n<li>Own the full lifecycle of infrastructure systems, including deployment, upgrades, patching, recovery, and ongoing operations.</li>\n</ul>\n<ul>\n<li>Operate and harden shared infrastructure provisioned through Infra Terraform, ensuring repeatability, auditability, and safe change management.</li>\n</ul>\n<ul>\n<li>Design and implement infrastructure as code and configuration management to support shared services, identity adjacent systems, and endpoint platforms using tools like Chef, Ansible and 
Terraform.</li>\n</ul>\n<ul>\n<li>Build and operate monitoring, alerting, and incident response mechanisms to meet high availability and recoverability targets.</li>\n</ul>\n<ul>\n<li>Lead incident response and postmortems across infrastructure, identity adjacent platforms, and fleet systems, driving durable fixes and shared learning.</li>\n</ul>\n<ul>\n<li>Build and operate containerized and platform services, including Kubernetes and Docker-based workloads, using DevOps practices that emphasize reliability, repeatability, and safe change management.</li>\n</ul>\n<ul>\n<li>Use Git-based workflows as the source of truth for infrastructure and policy changes, enabling review, auditability, and safe, reversible automation.</li>\n</ul>\n<p><strong>Automate for leverage and safety</strong></p>\n<ul>\n<li>Identify high leverage automation opportunities that eliminate manual toil and reduce operational risk across infrastructure and access related systems.</li>\n</ul>\n<ul>\n<li>Implement guardrails, safety mechanisms, and progressive rollout patterns for infrastructure and policy enforcement changes.</li>\n</ul>\n<ul>\n<li>Ensure automation is safe, observable, and resilient under failure conditions, particularly for shared services and high blast radius systems.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_fb4acb2b-bab","directApply":true,"hiringOrganization":{"@type":"Organization","name":"OpenAI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/openai.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/openai/645ccd65-eb60-4eb7-b094-b01c2269638c","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$293K – $385K","x-skills-required":["Security Reliability Engineering","Infrastructure as Code","Cloud 
Computing","Containerization","DevOps","Git","Terraform","Ansible","Chef","Kubernetes","Docker","Microsoft Entra","Azure","Identity and Access Management","Endpoint Security","Platform Services"],"x-skills-preferred":["Site Reliability Engineering","Cloud Security","Container Orchestration","Infrastructure Automation","Monitoring and Alerting","Incident Response","Postmortem Analysis","DevOps Practices","Cloud-Native Applications","Microservices Architecture"],"datePosted":"2026-03-06T18:29:47.579Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Security Reliability Engineering, Infrastructure as Code, Cloud Computing, Containerization, DevOps, Git, Terraform, Ansible, Chef, Kubernetes, Docker, Microsoft Entra, Azure, Identity and Access Management, Endpoint Security, Platform Services, Site Reliability Engineering, Cloud Security, Container Orchestration, Infrastructure Automation, Monitoring and Alerting, Incident Response, Postmortem Analysis, DevOps Practices, Cloud-Native Applications, Microservices Architecture","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":293000,"maxValue":385000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_cb538332-6a9"},"title":"Senior/Staff Web Platform Engineer","description":"<p>We are looking for a Senior/Staff Web Platform Engineer to join our team. 
The successful candidate will be responsible for building and maintaining the systems that enable our product teams to deliver high-quality, performant single-page web applications across desktop, mobile web, and the Comet browser.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<ul>\n<li>Optimize performance for critical web flows (search, answer rendering, browsing), with a focus on improving application response speed and perceived latency.</li>\n<li>Design and implement solutions for customizable user interfaces and experience components across Perplexity and Comet.</li>\n</ul>\n<p><strong>What you need</strong></p>\n<ul>\n<li>6+ years of practical experience as a software engineer with a strong focus on web technologies.</li>\n<li>Deep expertise in modern JavaScript frameworks, particularly React.</li>\n<li>Experience optimizing web application performance and working with metrics such as time-to-first-byte, time-to-interactive, and application response speed.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_cb538332-6a9","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Perplexity AI","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/perplexity.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/perplexity/cf179df1-3d69-4a9d-bda0-0c423efa9255","x-work-arrangement":"onsite","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$250K - $405K","x-skills-required":["JavaScript","React","Web performance optimization"],"x-skills-preferred":["TypeScript","Frontend monitoring and alerting systems"],"datePosted":"2026-03-04T12:27:51.219Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, New York City, Seattle"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"JavaScript, React, Web 
performance optimization, TypeScript, Frontend monitoring and alerting systems","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":250000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_bed7736d-0a7"},"title":"Browser Infrastructure Engineer","description":"<p>This role exists to build reliable, automated, and scalable infrastructure for Chromium-based browser teams. As a Browser Infrastructure Engineer, you will focus on CI/CD pipelines, monitoring, and development environments to support fast-paced browser innovation.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<p>You will set up and maintain CI/CD pipelines for builds and testing, support and evolve Chromium browser development infrastructure, configure monitoring and alerting systems, manage cloud infrastructure, develop automation scripts, and ensure high availability, resilience, and security of development infrastructure.</p>\n<p><strong>What you need</strong></p>\n<p>You will need 3+ years in software development infrastructure, preferably Chromium browsers, hands-on DevOps and SRE experience, including monitoring and incident management, proficiency in k8s, Terraform, Datadog, Sentry, AWS, Unix, TeamCity, strong CI/CD implementation skills, and ability to thrive in Agile teams with excellent communication.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a 
href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_bed7736d-0a7","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Perplexity","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/perplexity.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/perplexity/7bce0fcf-eef6-41aa-9243-896f07a0316e","x-work-arrangement":"remote","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["software development infrastructure","CI/CD pipelines","monitoring and alerting systems","cloud infrastructure","automation scripts","DevOps and SRE experience"],"x-skills-preferred":["k8s","Terraform","Datadog","Sentry","AWS","Unix","TeamCity"],"datePosted":"2026-03-04T12:26:23.733Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Belgrade"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software development infrastructure, CI/CD pipelines, monitoring and alerting systems, cloud infrastructure, automation scripts, DevOps and SRE experience, k8s, Terraform, Datadog, Sentry, AWS, Unix, TeamCity"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_1536743a-239"},"title":"Software Engineer II","description":"<p>We&#39;re looking for a Software Engineer II to join our team. 
As a Software Engineer II, you will focus on improving the reliability, scalability, and operational excellence of Java-based, microservices-driven systems that power player experiences.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<ul>\n<li>Drive SRE initiatives to improve system availability, performance, and resilience across Java microservices</li>\n<li>Define and track SLOs, SLIs, and error budgets for critical services</li>\n</ul>\n<p><strong>What you need</strong></p>\n<ul>\n<li>Strong experience with Java, Spring Boot, and microservices architectures</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_1536743a-239","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Electronic Arts","sameAs":"https://jobs.ea.com","logo":"https://logos.yubhub.co/jobs.ea.com.png"},"x-apply-url":"https://jobs.ea.com/en_US/careers/JobDetail/Software-Engineer-II/212865","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["Java","Spring Boot","microservices architectures"],"x-skills-preferred":["monitoring","alerting","logging"],"datePosted":"2026-03-01T00:04:50.619Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Austin"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Java, Spring Boot, microservices architectures, monitoring, alerting, logging"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_f93645a0-f0d"},"title":"Senior Backend Engineer - Ventures","description":"<p>We are seeking a Senior Backend Software Engineer to design and deliver the backend systems that power next-generation social and gaming experiences. 
This is a hands-on role for an engineer who thrives on solving complex technical challenges while also guiding and uplifting others.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<ul>\n<li>Design, build, and maintain backend services and systems supporting live and social features.</li>\n<li>Lead technical initiatives by driving system design discussions and influencing architecture.</li>\n</ul>\n<p><strong>What you need</strong></p>\n<ul>\n<li>7+ years of professional backend development experience, with significant contributions to large-scale, cloud-based platforms.</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_f93645a0-f0d","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Electronic Arts","sameAs":"https://jobs.ea.com","logo":"https://logos.yubhub.co/jobs.ea.com.png"},"x-apply-url":"https://jobs.ea.com/en_US/careers/JobDetail/Senior-Backend-Engineer/212512","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"regular employee","x-salary-range":null,"x-skills-required":["C#/.NET","AWS","SQL","PostgreSQL","NoSQL","CI/CD pipelines","automated testing","monitoring","alerting"],"x-skills-preferred":["C++","distributed frameworks like Microsoft Orleans"],"datePosted":"2026-02-11T15:06:13.288Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Guildford, Surrey, United Kingdom"}},"occupationalCategory":"Engineering","industry":"Technology","skills":"C#/.NET, AWS, SQL, PostgreSQL, NoSQL, CI/CD pipelines, automated testing, monitoring, alerting, C++, distributed frameworks like Microsoft Orleans"}]}