<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>859cb1cf-b9c</externalid>
      <Title>Senior AI Infrastructure Engineer, Model Serving Platform</Title>
      <Description><![CDATA[<p>As a Senior AI Infrastructure Engineer on the Model Serving Platform team, you will design and build platforms for scalable, reliable, and efficient serving of Large Language Models (LLMs). Our platform powers cutting-edge research and production systems, supporting both internal and external use cases across various environments.</p>
<p>The ideal candidate combines strong ML fundamentals with deep expertise in backend system design. You’ll work in a highly collaborative environment, bridging research and engineering to deliver seamless experiences to our customers and accelerate innovation across the company.</p>
<p>Responsibilities:</p>
<ul>
<li>Build and maintain fault-tolerant, high-performance systems for serving LLM workloads at scale.</li>
<li>Build an internal platform that enables discovery of LLM capabilities.</li>
<li>Collaborate with researchers and engineers to integrate and optimize models for production and research use cases.</li>
<li>Conduct architecture and design reviews to uphold best practices in system design and scalability.</li>
<li>Develop monitoring and observability solutions to ensure system health and performance.</li>
<li>Lead projects end-to-end, from requirements gathering to implementation, in a cross-functional environment.</li>
</ul>
<p>Ideally you’d have:</p>
<ul>
<li>5+ years of experience building large-scale, high-performance backend systems.</li>
<li>Strong programming skills in one or more languages (e.g., Python, Go, Rust, C++).</li>
<li>Experience with LLM serving and routing fundamentals (e.g., rate limiting, token streaming, load balancing, budgets).</li>
<li>Experience with LLM capabilities and concepts such as reasoning, tool calling, and prompt templates.</li>
<li>Experience with containers and orchestration tools (e.g., Docker, Kubernetes).</li>
<li>Familiarity with cloud infrastructure (AWS, GCP) and infrastructure as code (e.g., Terraform).</li>
<li>Proven ability to solve complex problems and work independently in fast-moving environments.</li>
</ul>
<p>Nice to haves:</p>
<ul>
<li>Experience with modern LLM serving frameworks such as vLLM, SGLang, TensorRT-LLM, or text-generation-inference.</li>
</ul>
<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity-based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for an equity grant. You’ll also receive benefits including, but not limited to: comprehensive health, dental, and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$216,000-$270,000 USD</Salaryrange>
      <Skills>Python, Go, Rust, C++, Docker, Kubernetes, AWS, GCP, Terraform, vLLM, SGLang, TensorRT-LLM, text-generation-inference</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Scale</Employername>
      <Employerlogo>https://logos.yubhub.co/scale.com.png</Employerlogo>
      <Employerdescription>Scale develops reliable AI systems for the world&apos;s most important decisions, providing high-quality data and full-stack technologies to power leading models.</Employerdescription>
      <Employerwebsite>https://scale.com/</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>216000</Compensationmin>
      <Compensationmax>270000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/scaleai/jobs/4520320005</Applyto>
      <Location>San Francisco, CA; New York, NY</Location>
      <Country>United States</Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>a45e2e8c-400</externalid>
      <Title>Staff Software Engineer, Foundational Model Serving</Title>
      <Description><![CDATA[<p>At Databricks, we are enabling data teams to solve the world&#39;s toughest problems by building and running the world&#39;s best data and AI infrastructure platform. Foundation Model Serving is the API product for hosting and serving frontier AI model inference for open-source models like Llama, Qwen, and GPT-OSS, as well as proprietary models like Claude and OpenAI GPT.</p>
<p>We&#39;re looking for engineers who have owned high-scale, operationally sensitive systems such as customer-facing APIs, edge gateways, or ML inference services, and who are interested in going deep on building LLM APIs and runtimes at scale. As a Staff Engineer, you&#39;ll play a critical role in shaping both the product experience and the core infrastructure.</p>
<p>The impact you will have:</p>
<ul>
<li>Design and implement core systems and APIs that power Databricks Foundation Model Serving, ensuring scalability, reliability, and operational excellence.</li>
<li>Partner with product and engineering leadership to define the technical roadmap and long-term architecture for serving workloads.</li>
<li>Drive architectural decisions and trade-offs to optimize performance, throughput, autoscaling, and operational efficiency for GPU serving workloads.</li>
<li>Contribute directly to key components across the serving infrastructure, from working in systems like vLLM and SGLang to building token-based rate limiters and optimizers, ensuring smooth and efficient operations at scale.</li>
<li>Collaborate cross-functionally with product, platform, and research teams to translate customer needs into reliable and performant systems.</li>
<li>Establish best practices for code quality, testing, and operational readiness, and mentor other engineers through design reviews and technical guidance.</li>
<li>Represent the team in cross-organizational technical discussions and influence Databricks’ broader AI platform strategy.</li>
</ul>
<p>What we look for:</p>
<ul>
<li>10+ years of experience building and operating large-scale distributed systems.</li>
<li>Experience leading high-scale operationally sensitive backend systems.</li>
<li>A track record of up-leveling teams’ engineering excellence.</li>
<li>Strong foundation in algorithms, data structures, and system design as applied to large-scale, low-latency serving systems.</li>
<li>Proven ability to deliver technically complex, high-impact initiatives that create measurable customer or business value.</li>
<li>Strong communication skills and ability to collaborate across teams in fast-moving environments.</li>
<li>Strategic and product-oriented mindset with the ability to align technical execution with long-term vision.</li>
<li>Passion for mentoring, growing engineers, and fostering technical excellence.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$192,000-$260,000 USD</Salaryrange>
      <Skills>large-scale distributed systems, high-scale operationally sensitive backend systems, algorithms, data structures, system design, low-latency serving systems, GPU serving workloads, vLLM, SGLang, token based rate limiters, optimizers</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Databricks</Employername>
      <Employerlogo>https://logos.yubhub.co/databricks.com.png</Employerlogo>
      <Employerdescription>Databricks is a data and AI company that provides a unified platform for data, analytics, and AI. It was founded by the original creators of Apache Spark, Delta Lake, and MLflow, and pioneered the lakehouse architecture.</Employerdescription>
      <Employerwebsite>https://databricks.com</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>192000</Compensationmin>
      <Compensationmax>260000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/databricks/jobs/8224683002</Applyto>
      <Location>San Francisco, California</Location>
      <Country>United States</Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>d0214534-b6a</externalid>
      <Title>Senior Applied Scientist</Title>
      <Description><![CDATA[<p>We&#39;re building the next-generation Grounding Service that powers the latest AI applications—chat assistants, copilots, and autonomous agents—with factual, cited, and trustworthy responses. Our platform stitches together retrieval, reasoning, and real-time data so that large language models stay anchored to enterprise knowledge, the public web, and proprietary tools.</p>
<p>We&#39;re looking for a Senior Applied Scientist to lead end-to-end science for grounding: inventing retrieval and attribution methods, defining factuality/faithfulness metrics, and shipping production models and APIs that scale to billions of queries. You&#39;ll partner closely with engineering, product, research, and customers to deliver fast, reliable, and explainable answers with source citations across a diverse set of domains and modalities.</p>
<p>As a team, we value curiosity, pragmatic rigor, and inclusive collaboration. We believe great systems emerge when scientists and engineers co-design metrics, models, and infrastructure—and when we obsess over customer impact, privacy, and safety.</p>
<p>Microsoft&#39;s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or a 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week. This expectation is subject to local law and may vary by jurisdiction.</p>
<p>Responsibilities:</p>
<ul>
<li>Own the science roadmap for grounding—including retrieval, re-ranking, attribution, and reasoning—driving initiatives from problem framing to production impact.</li>
<li>Design and evolve state-of-the-art retrieval and RAG orchestration across documents, tables, code, and images.</li>
<li>Build citation and provenance systems (e.g., passage highlighting, quote-level alignment, confidence scoring) to reduce hallucinations and increase user trust.</li>
<li>Lead experimentation and evaluation using A/B testing, interleaving, NDCG, MRR, precision/recall, and calibration curves to guide measurable trade-offs.</li>
<li>Advance tool-augmented grounding through schema-aware retrieval, function calling, knowledge graph joins, and real-time connectors to databases, cloud object stores, search indexes, and the web.</li>
<li>Partner with platform engineering to productionize models with scalable inference, embedding services, feature stores, caching, and privacy-compliant multi-tenant systems.</li>
<li>Nurture collaborative relationships with product and business leaders across Microsoft, influencing strategic decisions and driving business impact through technology.</li>
<li>Author white papers, contribute to internal tools and services, and, where appropriate, publish research to generate intellectual property.</li>
<li>Bridge the gap between researchers (e.g., Microsoft Research) and development teams, applying long-term research to solve immediate product needs.</li>
<li>Lead high-stakes negotiations to ensure cutting-edge technologies are applied practically and effectively.</li>
<li>Identify and solve significant business problems using novel, scalable, and data-driven solutions.</li>
<li>Shape the direction of Microsoft and the broader industry through pioneering product and tooling work.</li>
<li>Mentor applied scientists and data scientists, establishing best practices in experimentation, error analysis, and incident review.</li>
<li>Collaborate cross-functionally with PMs, research, infrastructure, and security teams to align on milestones, SLAs, and safety protocols.</li>
<li>Communicate clearly through design documentation, progress updates, and presentations to executives and customers.</li>
<li>Contribute to ethics and privacy policies, identify bias in product development, and propose mitigation strategies.</li>
</ul>
<p>Qualifications:</p>
<ul>
<li>Master&#39;s Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or a related field.</li>
<li>6+ years of related experience (e.g., statistics, predictive analytics, research).</li>
<li>Demonstrated expertise in information retrieval, with publications in top-tier conferences or journals such as NeurIPS, ICML, ICLR, SIGIR, or ACL.</li>
<li>Hands-on experience in large language model (LLM) development, including pretraining, supervised fine-tuning (SFT), and reinforcement learning (RL).</li>
<li>Proven track record in optimizing LLM inference, or active contributions to open-source frameworks like vLLM, SGLang, or related projects.</li>
</ul>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Statistics, Econometrics, Machine Learning, Information Retrieval, Large Language Model Development, Pretraining, Supervised Fine-Tuning, Reinforcement Learning, LLM Inference Optimization, vLLM, SGLang</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft is a multinational technology company that develops, manufactures, licenses, and supports a wide range of software products, services, and devices.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/senior-applied-scientist-37/</Applyto>
      <Location>Beijing</Location>
      <Country>China</Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>61433df5-3e7</externalid>
      <Title>Member of Technical Staff, Multimodal Infrastructure</Title>
      <Description><![CDATA[<p><strong>Summary</strong></p>
<p>Microsoft AI is looking for a talented Member of Technical Staff, Multimodal Infrastructure to help build the next wave of capabilities for our personalized AI assistant, Copilot. We&#39;re looking for someone who will bring an abundance of positive energy, empathy, and kindness to the team every day, in addition to being highly effective.</p>
<p><strong>About the Role</strong></p>
<p>We are seeking a highly skilled and experienced engineer to join our team as a Member of Technical Staff, Multimodal Infrastructure. The successful candidate will be responsible for designing, developing, and maintaining large-scale multimodal data processing pipelines, model pretraining and post-training frameworks, and model inference and serving frameworks. They will work closely with research scientists and product engineers to solve infrastructure problems and find a path past roadblocks, getting their work into the hands of users quickly and iteratively.</p>
<p><strong>Accountabilities</strong></p>
<ul>
<li>Design, develop, and maintain large-scale multimodal data processing pipelines.</li>
<li>Design, develop, and maintain large-scale multimodal model pretraining and post-training frameworks.</li>
<li>Design, develop, and maintain large-scale multimodal model inference and serving frameworks.</li>
</ul>
<p><strong>The Candidate we&#39;re looking for</strong></p>
<p><strong>Experience:</strong></p>
<ul>
<li>Bachelor&#39;s Degree in Computer Science, or related technical discipline AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.</li>
</ul>
<p><strong>Technical skills:</strong></p>
<ul>
<li>Strong proficiency in distributed data processing infrastructure (resource utilization management, fault tolerance, Ray &amp; Spark) and CPU/GPU batch processing optimizations.</li>
<li>Experience with state-of-the-art model inference and serving frameworks.</li>
<li>Experience with image/video/audio data processing.</li>
<li>Experience with common data formats for efficient I/O.</li>
</ul>
<p><strong>Personal attributes:</strong></p>
<ul>
<li>Enjoy working in a fast-paced, design-driven, product development cycle.</li>
<li>Embody our Culture and Values.</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Competitive salary and benefits package.</li>
<li>Opportunities for professional growth and development.</li>
<li>Collaborative and dynamic work environment.</li>
<li>Access to cutting-edge technology and tools.</li>
<li>Flexible work arrangements.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>Competitive salary and benefits package</Salaryrange>
      <Skills>C, C++, C#, Java, JavaScript, Python, Distributed data processing infrastructure, CPU/GPU batch processing optimizations, State-of-the-art model inference and serving frameworks, Image/video/audio data processing, Common data formats for efficient I/O, Ray &amp; Spark, TensorRT-LLM, SGLang, xDiT, Cache-DiT</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft AI</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft AI is a leading technology company that specializes in artificial intelligence and machine learning. They are known for their innovative products and services that aim to make a positive impact on people&apos;s lives. Microsoft AI is committed to advancing the field of AI and making it more accessible to everyone.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/member-of-technical-staff-multimodal-infrastructure-mai-superintelligence-team-3/</Applyto>
      <Location>New York</Location>
      <Country>United States</Country>
      <Postedate>2026-03-06</Postedate>
    </job>
  </jobs>
</source>