Staff Software Engineer, Platform

We live in unprecedented times: AI has the potential to exponentially augment human intelligence. As the world adjusts to this new reality, leading platform companies are scrambling to build LLMs at billion scale, while large enterprises figure out how to add them to their products.

At Scale, our products include the Generative AI Data Engine, SGP, Donovan, and others that power the most advanced LLMs and generative models in the world through world-class RLHF, human data generation, model evaluation, safety, and alignment.

As a Staff Software Engineer, you will define and drive both the architectural roadmap and implementation of core platforms and software systems. You will be responsible for providing high-level vision and driving adoption across the engineering org for orchestration, data abstraction, data pipelines, identity & access management, and underlying cloud infrastructure.

Impact and Responsibilities:

Architectural Vision: You will drive the design and implementation of foundational systems, acting as a bridge between high-level business goals and technical goals.

Cross-Functional Leadership: You will collaborate with cross-functional teams to define and drive adoption of the next generation of features for our AI data infrastructure.

Technical Ownership: You are responsible for proactively identifying and driving opportunities for organizational growth, driving improvements in programming practices, and upgrading the tools that define our development lifecycle.

Technical Mentorship: You will serve as a subject matter expert, presenting technical information to stakeholders and providing guidance that elevates the engineering culture across the company.

Ideally you’d have:

8+ years of full-time engineering experience, post-graduation, with a specialization in back-end systems.

Extensive experience in software development and a deep understanding of distributed systems and public cloud platforms (AWS preferred).

A demonstrated track record of independent ownership and leadership across successful multi-team engineering projects.

Excellent communication and collaboration skills, and the ability to translate complex technical concepts for non-technical stakeholders.

Experience working fluently with standard containerization & deployment technologies like Kubernetes, Terraform, Docker, etc.

Experience with orchestration platforms, such as Temporal and AWS Step Functions.

Experience with NoSQL document databases (MongoDB) and structured databases (Postgres).

Strong knowledge of software engineering best practices and CI/CD tooling (CircleCI, ArgoCD).

Nice to haves:

Experience with data warehouses (Snowflake, Firebolt) and data pipeline/ETL tools (Dagster, dbt).

Experience scaling products at hyper-growth startups.

Excitement to work with AI technologies.

Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.

For pay transparency purposes, the base salary range for this full-time position in the locations of San Francisco, New York, Seattle is: $252,000-$315,000 USD

XML job scraping automation by YubHub

Employment type: Full-time
Level: Staff
Workplace: Onsite
Salary: $252,000-$315,000 USD
Skills: Software development, Distributed systems, Public cloud platforms, Containerization & deployment technologies, Orchestration platforms, NoSQL document databases, Structured databases, Software engineering best practices, CI/CD tooling, Data warehouses, Data pipeline/ETL tools, Scaling products at hyper-growth startups, AI technologies
Categories: Engineering, Technology
Company: Scale
About the company: Scale develops reliable AI systems for the world's most important decisions, providing high-quality data and full-stack technologies that power leading models.
Website: https://scale.com
Job posting: https://job-boards.greenhouse.io/scaleai/jobs/4649893005
Locations: San Francisco, CA; New York, NY
Date: 2026-04-18

Software Engineer, AI Developer Tooling

We're looking for a Software Engineer to join our Platform Engineering team. As a Software Engineer, you will redefine how engineers develop, build, test, and deploy software at Scale using AI development tools in addition to traditional practices. You'll also get widespread exposure to the forefront of the AI race as Scale sees it in enterprises, startups, governments, and large tech companies.

Your responsibilities will include:

Defining next-generation AI development tooling and frameworks using products like Cursor, Claude Code, OpenAI Codex, and MS Copilot, as well as in-house custom-built solutions.
Driving the architecture, design, and implementation of our local development process, build, test, continuous integration, and continuous delivery systems, working closely with stakeholders and internal customers to understand and refine requirements.
Directly mentoring software engineers ranging from new grads to experienced engineers.
Proactively identifying opportunities and driving improvements to software development practices, processes, tools, and languages.
Presenting technical information to teams and stakeholders, providing guidance and insight on development processes and technologies.

Ideally, you'd have:

4+ years of full-time engineering experience, post-graduation, with experience in build, test, or CI/CD systems.
Extensive experience defining and evangelizing best practices for AI development tools, including cost guardrails, security frameworks, and hosting knowledge-sharing sessions, among others.
Extensive experience in software development and a deep understanding of distributed systems and public cloud platforms (AWS preferred).
Experience configuring, testing, and enabling MCP servers, AI agents, and other associated systems.
A track record of independent ownership of successful engineering projects.
Excellent communication and collaboration skills, and the ability to translate complex technical concepts to non-technical stakeholders.
Experience working fluently with standard infrastructure, containerization, and deployment technologies like Terraform, Docker, Kubernetes, etc.
Experience with modern web runtimes and frameworks like Node.js, Next.js, etc.
Strong knowledge of software engineering best practices and CI/CD tooling (CircleCI, Helm, ArgoCD).

This role may be eligible for additional benefits such as a commuter stipend.


Employment type: Full-time
Level: Senior
Workplace: Hybrid
Salary: $180,000-$225,000 USD
Skills: software development, distributed systems, public cloud platforms, MCP servers, AI agents, standard infrastructure, containerization, deployment technologies, modern web frameworks, software engineering best practices, CI/CD tooling, Cursor, Claude Code, OpenAI Codex, MS Copilot, Terraform, Docker, Kubernetes, NodeJS, NextJS
Categories: Engineering, Technology
Company: Scale
About the company: Scale develops reliable AI systems for the world's most important decisions.
Website: https://scale.com/
Job posting: https://job-boards.greenhouse.io/scaleai/jobs/4676936005
Locations: San Francisco, CA; Seattle, WA; New York, NY
Date: 2026-04-18

Software Engineer, Platform

We're looking for a skilled Software Engineer to join our Platform Engineering team. As a key member of our team, you will support the design and development of shared platforms used across Scale. This includes designing our foundational data platforms and lifecycle, architecting Scale's core cloud infrastructure and orchestration stack, and redefining how engineers develop, build, test, and deploy software at Scale.

You will drive the design and implementation of our foundational platforms and systems, working closely with stakeholders and internal customers to understand and refine requirements. You'll collaborate with cross-functional teams to define, design, and deliver new features. You'll also proactively identify opportunities for, and drive improvements to, current programming practices, including process enhancements and tool upgrades.

Ideally, you'd have 3+ years of full-time engineering experience, post-graduation, with a specialization in back-end systems. You should have extensive experience in software development and a deep understanding of distributed systems and public cloud platforms (AWS preferred). You should have a track record of independent ownership of successful engineering projects. You should possess excellent communication and collaboration skills, and the ability to translate complex technical concepts for non-technical stakeholders.

You should have experience working fluently with standard containerization & deployment technologies like Kubernetes, Terraform, Docker, etc. You should have experience with orchestration platforms, such as Temporal and AWS Step Functions. You should have experience with NoSQL document databases (MongoDB) and structured databases (Postgres). You should have strong knowledge of software engineering best practices and CI/CD tooling (CircleCI).

Nice to haves include experience with data warehouses (Snowflake, Firebolt) and data pipeline/ETL tools (Dagster, dbt). Experience with authentication/authorization systems (Zanzibar, Authz, etc.) is also a plus. Experience scaling products at hyper-growth startups is highly valued. Excitement to work with AI technologies is a must.


Employment type: Full-time
Level: Mid
Workplace: Hybrid
Salary: $180,000-$225,000 USD
Skills: software development, distributed systems, public cloud platforms, containerization & deployment technologies, orchestration platforms, NoSQL document databases, structured databases, software engineering best practices, CI/CD tooling, data warehouses, data pipeline/ETL tools, authentication/authorization systems, scaling products at hyper-growth startups, AI technologies
Categories: Engineering, Technology
Company: Scale
About the company: Scale develops reliable AI systems for the world's most important decisions.
Website: https://scale.com/
Job posting: https://job-boards.greenhouse.io/scaleai/jobs/4594879005
Locations: San Francisco, CA; New York, NY
Date: 2026-04-18

Scale Solution Engineer

As a Scale Solution Engineer at Databricks, you will play a critical role in advising customers during their onboarding process. You will work directly with customers to help them onboard and deploy Databricks in their production environment.

Your impact will be significant, ensuring new customers have an excellent experience by providing technical assistance early in their journey. You will become an expert on the Databricks Platform and guide customers in making the best technical decisions. You will also work directly with multiple customers concurrently to provide technical solutions.

To succeed in this role, you will need:

An undergraduate degree or higher in Computer Science or Information Systems, or relevant experience
1+ years of experience in a technical role, preferably in the data or cloud field
Knowledge of at least one of the public cloud platforms AWS, Azure, or GCP
Knowledge of a programming language such as Python, Scala, or SQL
Knowledge of end-to-end data analytics workflow
Hands-on professional or academic experience in one or more of the following: Data Engineering technologies (e.g., ETL, DBT, Spark, Airflow), Data Warehousing technologies (e.g., SQL, Stored Procedures, Redshift, Snowflake)
Excellent time management and prioritization skills
Excellent written and verbal communication

Bonus: Knowledge of Data Science and Machine Learning (e.g., building and deploying ML models)


Employment type: Full-time
Level: Mid
Workplace: Remote
Skills: public cloud platforms, AWS, Azure, GCP, Python, Scala, SQL, Data Engineering technologies, ETL, DBT, Spark, Airflow, Data Warehousing technologies, Stored Procedures, Redshift, Snowflake, Data Science, Machine Learning
Categories: Engineering, Technology
Company: Databricks
About the company: Databricks is a data and AI company that provides a data intelligence platform to unify and democratize data, analytics, and AI. Over 10,000 organisations worldwide rely on its platform.
Website: https://databricks.com
Job posting: https://job-boards.greenhouse.io/databricks/jobs/8408817002
Location: Costa Rica
Date: 2026-04-18

Member of Technical Staff, Site Reliability Engineer (HPC)

As Microsoft continues to push the boundaries of AI, we are on the lookout for experienced individuals to work with us on the most interesting and challenging AI questions of our time. Our vision is to build systems that have true artificial intelligence across agents, applications, services, and infrastructure. We're looking for an experienced HPC Site Reliability Engineer (SRE) to join our High Performance Computing (HPC) infrastructure team. In this role, you'll blend software engineering and systems engineering to keep our large-scale distributed AI infrastructure reliable and efficient. You'll ensure that AI systems stay efficient and reliable with very high uptimes.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

This role is part of Microsoft AI's Superintelligence Team (MAIST), a startup-like team inside Microsoft AI created to push the boundaries of AI toward Humanist Superintelligence: ultra-capable systems that remain controllable, safety-aligned, and anchored to human values. Our mission is to create AI that amplifies human potential while ensuring humanity remains firmly in control. We aim to deliver breakthroughs that benefit society by advancing science, education, and global well-being.

Responsibilities:

Reliability & Availability: Ensure uptime, resiliency, and fault tolerance of HPC clusters powering MAI model training and inference.
Observability: Design and maintain monitoring, alerting, and logging systems that provide real-time visibility into all aspects of HPC systems, including GPUs, clusters, storage, and networking.
Automation & Tooling: Build automation for deployments, incident response, scaling, and failover in CPU+GPU environments.
Incident Management: Lead on-call rotations, troubleshoot production issues, conduct blameless postmortems, and drive continuous improvements.
Security & Compliance: Ensure data privacy, compliance, and secure operations across model training and serving environments.
Collaboration: Partner with ML engineers and platform teams to improve developer experience and accelerate research-to-production workflows.

Qualifications:

Required Qualifications:

Master’s Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
OR Bachelor’s Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
OR equivalent experience

Preferred Qualifications:

Strong proficiency in Kubernetes, Docker, and container orchestration.
Knowledge of CI/CD pipelines for Inference and ML model deployment.
Hands-on experience with public cloud platforms like Azure/AWS/GCP and infrastructure-as-code.
Expertise in monitoring & observability tools (Grafana, Datadog, OpenTelemetry, etc.).
Strong programming/scripting skills in Python, Go, or Bash.
Solid knowledge of distributed systems, networking, and storage.
Experience running large-scale GPU clusters for ML/AI workloads (preferred).
Familiarity with ML training/inference pipelines.
Experience with high-performance computing (HPC) and workload schedulers (Kubernetes operators).
Background in capacity planning & cost optimization for GPU-heavy environments.

Work on cutting-edge infrastructure that powers the future of Generative AI. Collaborate with world-class researchers and engineers. Impact millions of users through reliable and responsible AI deployments. Competitive compensation, equity options, and comprehensive benefits.


Employment type: Full-time
Level: Staff
Workplace: Hybrid
Salary: $139,900 – $274,800 per year
Skills: Kubernetes, Docker, container orchestration, CI/CD pipelines, public cloud platforms, infrastructure-as-code, monitoring & observability tools, programming/scripting skills in Python, Go, or Bash, distributed systems, networking, storage, GPU clusters, ML training/inference pipelines, high-performance computing, workload schedulers
Categories: Engineering, Technology
Company: Microsoft
About the company: Microsoft is a multinational technology company that develops, manufactures, licenses, and supports a wide range of software products, services, and devices.
Website: https://microsoft.ai
Job posting: https://microsoft.ai/job/member-of-technical-staff-site-reliability-engineer-hpc-mai-superintelligence-team/
Location: Mountain View
Date: 2026-03-08

Member of Technical Staff, HPC Operations Engineering Manager

Summary

Microsoft AI are looking for a talented Member of Technical Staff, HPC Operations Engineering Manager to join their MAI Superintelligence Team.

About the Role

In this role, you'll lead a team of Site Reliability Engineers who blend software engineering and systems engineering to keep our large-scale distributed AI infrastructure reliable and efficient. You'll work closely with ML researchers, data engineers, and product developers to design and operate the platforms that power training, fine-tuning, and serving generative AI models.

Accountabilities

Lead a team of experienced SREs to ensure uptime, resiliency and fault tolerance of AI model training and inference systems

The Candidate we're looking for

Experience:

8+ years of technical engineering experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering leadership roles

Technical skills:

Kubernetes, Docker, and container orchestration
Public cloud platforms like Azure/AWS/GCP and infrastructure-as-code

Personal attributes:

Low-ego individual

Benefits

Competitive salary
Benefits and other compensation


Employment type: Full-time
Level: Senior
Workplace: Onsite
Salary: $139,900 – $274,800 USD per year
Skills: Kubernetes, Docker, container orchestration, public cloud platforms, infrastructure-as-code, monitoring & observability tools, Grafana, Datadog, OpenTelemetry
Categories: Engineering, Technology
Company: Microsoft AI
About the company: Microsoft AI is a leading technology company that empowers every person and every organization on the planet to achieve more. Their mission is to create AI that amplifies human potential while ensuring humanity remains firmly in control.
Website: https://microsoft.ai
Job posting: https://microsoft.ai/job/member-of-technical-staff-hpc-operations-engineering-manager-mai-superintelligence-team/
Location: Mountain View
Date: 2026-03-06