{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/distributed-systems-principles"},"x-facet":{"type":"skill","slug":"distributed-systems-principles","display":"Distributed Systems Principles","count":3},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_bd9625d9-99b"},"title":"ML Infrastructure Engineer, Safeguards","description":"<p>We are seeking a Machine Learning Infrastructure Engineer to join our Safeguards organization, where you&#39;ll build and scale the critical infrastructure that powers our AI safety systems.</p>\n<p>As part of the Safeguards team, you&#39;ll design and implement ML infrastructure that powers Claude safety. 
Your work will directly contribute to making AI systems more trustworthy and aligned with human values, ensuring our models operate safely as they become more capable.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Design and build scalable ML infrastructure to support real-time and batch classifier and safety evaluations across our model ecosystem</li>\n<li>Build monitoring and observability tools to track model performance, data quality, and system health for safety-critical applications</li>\n<li>Collaborate with research teams to productionize safety research, translating experimental safety techniques into robust, scalable systems</li>\n<li>Optimize inference latency and throughput for real-time safety evaluations while maintaining high reliability standards</li>\n<li>Implement automated testing, deployment, and rollback systems for ML models in production safety applications</li>\n<li>Partner with Safeguards, Security, and Alignment teams to understand requirements and deliver infrastructure that meets safety and production needs</li>\n<li>Contribute to the development of internal tools and frameworks that accelerate safety research and deployment</li>\n</ul>\n<p>You may be a good fit if you:</p>\n<ul>\n<li>Have 5+ years of experience building production ML infrastructure, ideally in safety-critical domains like fraud detection, content moderation, or risk assessment</li>\n<li>Are proficient in Python and have experience with ML frameworks like PyTorch, TensorFlow, or JAX</li>\n<li>Have hands-on experience with cloud platforms (AWS, GCP) and container orchestration (Kubernetes)</li>\n<li>Understand distributed systems principles and have built systems that handle high-throughput, low-latency workloads</li>\n<li>Have experience with data engineering tools and building robust data pipelines (e.g., Spark, Airflow, streaming systems)</li>\n<li>Are results-oriented, with a bias towards reliability and impact in safety-critical systems</li>\n<li>Enjoy collaborating with 
researchers and translating cutting-edge research into production systems</li>\n<li>Care deeply about AI safety and the societal impacts of your work</li>\n</ul>\n<p>Strong candidates may have experience with:</p>\n<ul>\n<li>Working with large language models and modern transformer architectures</li>\n<li>Implementing A/B testing frameworks and experimentation infrastructure for ML systems</li>\n<li>Developing monitoring and alerting systems for ML model performance and data drift</li>\n<li>Building automated labeling systems and human-in-the-loop workflows</li>\n<li>Experience in trust &amp; safety, fraud prevention, or content moderation domains</li>\n<li>Knowledge of privacy-preserving ML techniques and compliance requirements</li>\n<li>Contributing to open-source ML infrastructure projects</li>\n</ul>\n<p>We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_bd9625d9-99b","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Anthropic","sameAs":"https://www.anthropic.com/","logo":"https://logos.yubhub.co/anthropic.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/anthropic/jobs/4778843008?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$320,000-$405,000 USD","x-skills-required":["Python","PyTorch","TensorFlow","JAX","Cloud platforms (AWS, GCP)","Container orchestration (Kubernetes)","Distributed systems principles","Data engineering tools (Spark, Airflow, streaming systems)"],"x-skills-preferred":["Large language models and modern transformer architectures","A/B testing frameworks and experimentation infrastructure for ML systems","Monitoring 
and alerting systems for ML model performance and data drift","Automated labeling systems and human-in-the-loop workflows","Trust & safety, fraud prevention, or content moderation domains","Privacy-preserving ML techniques and compliance requirements","Open-source ML infrastructure projects"],"datePosted":"2026-04-18T15:44:06.907Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"San Francisco, CA"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, PyTorch, TensorFlow, JAX, Cloud platforms (AWS, GCP), Container orchestration (Kubernetes), Distributed systems principles, Data engineering tools (Spark, Airflow, streaming systems), Large language models and modern transformer architectures, A/B testing frameworks and experimentation infrastructure for ML systems, Monitoring and alerting systems for ML model performance and data drift, Automated labeling systems and human-in-the-loop workflows, Trust & safety, fraud prevention, or content moderation domains, Privacy-preserving ML techniques and compliance requirements, Open-source ML infrastructure projects","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":320000,"maxValue":405000,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_9238107d-204"},"title":"Software Architect, Reliability Engineering","description":"<p>Join the team as Twilio&#39;s next Reliability Architect.</p>\n<p>As an Architect in SRE, you will drive the technical strategy, vision and outcomes for Twilio&#39;s Reliability Engineering organisation. 
You will define and lead solutions and initiatives that ensure Twilio products are reliable worldwide, and you will define standards and guide engineering teams on best practices for designing, building, and operating resilient systems.</p>\n<p>This role is pivotal to Twilio&#39;s commitment to operational excellence, scalability, and pragmatic, large-scale systems design in the cloud.</p>\n<p>Responsibilities:</p>\n<ul>\n<li>Partner with senior technical leaders across Twilio to set and communicate the reliability strategy, translating business goals into measurable outcomes.</li>\n<li>Influence company-wide architectural decisions while balancing long-term vision with near-term and compliance needs.</li>\n<li>Lead the design, implementation, and operation of scalable solutions and paved roads that enable reliable, high-traffic services.</li>\n<li>Influence company-wide architectural decisions to focus on availability, performance, resilience, and cost efficiency using Kubernetes, AWS, Terraform, and modern observability.</li>\n<li>Ensure integrity and quality across the service lifecycle; design fault-tolerant architectures, incident response, disaster recovery, and capacity/cost management.</li>\n<li>Collaborate with product and cross-functional teams to identify reliability risks and convert them into actionable designs, programs, and tooling.</li>\n<li>Establish and champion reliability practices and drive systemic improvements.</li>\n<li>Mentor and grow engineers and technical leaders.</li>\n<li>Track and apply emerging SRE, cloud, and large-scale systems best practices; introduce pragmatic innovations that improve reliability at scale.</li>\n</ul>\n<p>Qualifications:</p>\n<ul>\n<li>15+ years of experience in Reliability Engineering, Software Engineering, or DevOps roles with a focus on infrastructure, backend systems, and reliability, including as a principal/architect.</li>\n<li>Strong experience in driving strategic technical decisions and defining long-term 
technical vision.</li>\n<li>In-depth understanding of the role of Reliability Engineering in a large and diverse SaaS organisation.</li>\n<li>Experience driving cross-org technical architecture outcomes.</li>\n<li>Knowledge of cloud architecture, devops practices, and large-scale systems design with microservices.</li>\n<li>Bachelor&#39;s or Master&#39;s degree in Computer Science, Engineering, or a related field (or equivalent experience).</li>\n<li>Strong production experience, including operational management, scaling, partitioning strategies, and tuning for performance and reliability in high-scale environments.</li>\n<li>Hands-on experience with Kubernetes (e.g., EKS), deploying and managing stateful services, and cloud services like AWS.</li>\n<li>Proficiency in infrastructure-as-code tools such as Terraform or CloudFormation for automating infrastructure.</li>\n<li>Expertise in observability tools (e.g., Prometheus, Grafana, Datadog) for monitoring distributed systems and setting up alerting.</li>\n<li>Proficient in at least one programming language (e.g., Go, Python, Java) for building automation and tooling.</li>\n<li>Experience designing incident response processes, SLOs/SLIs, runbooks, and participating in on-call rotations.</li>\n<li>Experience running cross-functional post-incident reviews and driving improvements.</li>\n<li>Strong understanding of distributed systems principles, including consensus, durability, throughput, and availability tradeoffs.</li>\n<li>Proven track record of leading reliability improvements in data-intensive or mission-critical systems and collaborating with engineering teams.</li>\n<li>Excellent problem-solving, analytical, verbal, and written communication skills, with the ability to work in cross-functional and distributed environments.</li>\n<li>Demonstrated leadership in mentoring teams, influencing decisions, and balancing long-term objectives with short-term needs.</li>\n<li>Ability to influence and build effective 
working relationships with all levels of the organisation.</li>\n</ul>\n<p>Desired:</p>\n<ul>\n<li>Specific experience owning and operating large AWS footprints.</li>\n<li>Knowledge of Kubernetes architecture and concepts.</li>\n<li>Experience with data technologies like Apache Kafka, AWS MSK, or similar for reliable streaming.</li>\n<li>Passion for building reliable products, with prior projects in high-availability systems</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_9238107d-204","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Twilio","sameAs":"https://www.twilio.com/","logo":"https://logos.yubhub.co/twilio.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/twilio/jobs/7658259?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$227,840.00 - $284,800.00 per year","x-skills-required":["Reliability Engineering","Software Engineering","DevOps","Cloud Architecture","Microservices","Kubernetes","AWS","Terraform","Observability Tools","Programming Languages","Incident Response","Distributed Systems Principles"],"x-skills-preferred":["Apache Kafka","AWS MSK","Kubernetes Architecture","Data Technologies"],"datePosted":"2026-04-18T15:42:56.209Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote - US"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Reliability Engineering, Software Engineering, DevOps, Cloud Architecture, Microservices, Kubernetes, AWS, Terraform, Observability Tools, Programming Languages, Incident Response, Distributed Systems Principles, Apache Kafka, AWS MSK, Kubernetes Architecture, Data 
Technologies","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":227840,"maxValue":284800,"unitText":"YEAR"}}},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_de69e6fb-3d3"},"title":"Engineering Leader","description":"<p>About the Role\nWe are seeking an experienced Engineering Manager to lead our engineering teams. This role combines strategic technical leadership with people management, requiring someone who can set technical direction, make critical architectural decisions, and build high-performing teams.</p>\n<p>Responsibilities</p>\n<ul>\n<li>Strategic Technical Leadership: Provide technical direction for full stack systems, making key architectural and systems-level decisions to ensure scalability, reliability, and security. Stay engaged with hands-on technical work on complex systems and critical projects.</li>\n<li>Team Leadership and Development: Lead and mentor a team of engineers, fostering their technical growth and ensuring high-quality output. Create a collaborative, high-performing culture that aligns with Flow&#39;s values of integrity, trust, and excellence.</li>\n<li>Cross-Functional Collaboration: Partner with product, operations, and engineering teams to define clear requirements and translate them into scalable, maintainable software solutions. Effectively communicate technical concepts to non-technical stakeholders.</li>\n<li>Execution and Delivery: Balance technical excellence with pragmatic decision-making to deliver continuous value to the business. Drive projects forward, identifying and resolving blockers to maintain momentum.</li>\n<li>Talent Development: Contribute to recruitment and hiring of engineering talent. 
Foster an environment of continuous learning, accountability, and teamwork.</li>\n</ul>\n<p>Qualifications</p>\n<ul>\n<li>Technical Expertise: 10+ years of software engineering &amp; management experience with deep knowledge of full stack systems, distributed systems principles, high availability, fault tolerance, and system efficiency. Proficiency in multiple modern programming languages with ability to adapt to new technologies quickly.</li>\n<li>Leadership Experience: Proven experience managing and developing engineering teams, with demonstrated ability to mentor engineers, resolve conflicts, and drive performance improvements.</li>\n<li>Architectural Thinking: Strong track record of making high-level architectural decisions that balance technical feasibility with business goals. Experience working in fast-paced, product-driven environments.</li>\n<li>Communication Skills: Excellent ability to explain complex technical concepts clearly across teams and to non-technical audiences. Strong collaboration and stakeholder management skills.</li>\n<li>Cultural Fit: Alignment with Flow&#39;s values of innovation, excellence, and teamwork. 
Demonstrated commitment to building positive, inclusive work environments.</li>\n</ul>\n<p>Additional Information\nBenefits\n• Comprehensive Benefits Package (Medical / Dental / Vision / Disability / Life)\n• Paid time off and 13 paid holidays\n• 401(k) retirement plan\n• Healthcare and Dependent Care Flexible Spending Accounts (FSAs)\n• Access to HSA-compatible plans\n• Pre-tax commuter benefits\n• Employee Assistance Program (EAP), free therapy through SpringHealth, acupuncture, and other wellness offerings</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_de69e6fb-3d3","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Flow","sameAs":"https://flow.com","logo":"https://logos.yubhub.co/flow.com.png"},"x-apply-url":"https://jobs.lever.co/flowlife/b3241f3f-43fd-45e8-8472-746b5a8c49ba?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$270,000-350,000 per year","x-skills-required":["software engineering","management experience","full stack systems","distributed systems principles","high availability","fault tolerance","system efficiency","multiple modern programming languages"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:34:59.027Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"New York"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"software engineering, management experience, full stack systems, distributed systems principles, high availability, fault tolerance, system efficiency, multiple modern programming languages","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":270000,"maxValue":350000,"unitText":"YEAR"}}}]}