Principal Software Engineer

Monetization Engineering is responsible for building a unified, intelligent, and resilient monetization platform that drives revenue across Microsoft’s AI-native surfaces, including Copilot, Search, MSN, Shopping, and both first-party and third-party ecosystems.

Our mission is to enhance advertiser value, optimize platform performance, and achieve long-term revenue growth through large-scale systems, machine learning-driven optimization, experimentation, and cross-surface innovation.

We are seeking an experienced professional with expertise in GPU inference optimization and a deep understanding of LLM/SLM architecture to join our team.

This is a unique opportunity to contribute to cutting-edge advancements in AI and deep learning while driving impactful solutions for Microsoft’s advertising and monetization platforms.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more.

As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals.

Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week.

This expectation is subject to local law and may vary by jurisdiction.

Responsibilities:

Serve as the technological core of Microsoft’s rapidly expanding digital advertising business.

Focus on accelerating Microsoft’s large-scale deep learning inference for Ads, Shopping, Copilot, and other surfaces, including both offline and online applications that support OpenAI LLMs and next-generation LLMs/SLMs.

Play a pivotal role in bridging state-of-the-art GPU and deep learning technologies with critical business applications.

Qualifications:

Required Qualifications:

Bachelor’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.

Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.

These requirements include but are not limited to the following specialized security screenings:

Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

Master’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor’s Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.

Solid experience in GPU inference optimization (CUDA, TensorRT, Triton, or custom GPU kernels).

Proficiency in profiling tools (Nsight, TensorBoard, PyTorch profiler) and ability to identify CPU/GPU bottlenecks.

Deep understanding of LLM/SLM architectures (attention, embeddings, MoE, decoders).

Experience optimizing latency-critical online services.

Experience with model compression (quantization, distillation, SVD, low-rank methods).

Experience in building high-throughput inference serving stacks (continuous batching, KV-cache optimizations, routing).

Familiarity with Microsoft’s DLIS, Talon routing, Triton/TensorRT-LLM stack, and Azure/H100/A100 GPU environments.

Publications, competition wins, or real-world deployments related to model efficiency.

#MicrosoftAI


Employment type: Full-time

Level: Senior

Work arrangement: Hybrid

Salary: $163,000 - $296,400 per year

Skills: GPU inference optimization, LLM/SLM architecture, C, C++, C#, Java, JavaScript, Python, CUDA, TensorRT, Triton, custom GPU kernels, profiling tools, CPU/GPU bottlenecks, model compression, high-throughput inference serving stacks, DLIS, Talon routing, Triton/TensorRT-LLM stack, Azure/H100/A100 GPU environments

Category: Engineering, Technology

Company: Microsoft (https://microsoft.ai). Microsoft is a multinational technology company that develops, manufactures, licenses, and supports a wide range of software products, services, and devices.

Job listing: https://microsoft.ai/job/principal-software-engineer-47/

Location: Redmond

Date: 2026-04-24