<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>5e20ca92-993</externalid>
      <Title>Principal Software Engineer</Title>
      <Description><![CDATA[<p>Monetization Engineering is responsible for building a unified, intelligent, and resilient monetization platform that drives revenue across Microsoft’s AI-native surfaces, including Copilot, Search, MSN, Shopping, and both first-party and third-party ecosystems.</p>
<p>Our mission is to enhance advertiser value, optimize platform performance, and achieve long-term revenue growth through large-scale systems, machine learning-driven optimization, experimentation, and cross-surface innovation.</p>
<p>We are seeking an experienced professional with expertise in GPU inference optimization and a deep understanding of LLM/SLM architecture to join our team.</p>
<p>This is a unique opportunity to contribute to cutting-edge advancements in AI and deep learning while driving impactful solutions for Microsoft’s advertising and monetization platforms.</p>
<p>Microsoft’s mission is to empower every person and every organization on the planet to achieve more.</p>
<p>As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals.</p>
<p>Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.</p>
<p>Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week.</p>
<p>This expectation is subject to local law and may vary by jurisdiction.</p>
<p>Responsibilities:</p>
<p>The team serves as the technological core of Microsoft’s rapidly expanding digital advertising business.</p>
<p>Focus on accelerating Microsoft’s large-scale deep learning inference for Ads, Shopping, Copilot, and other surfaces, including both offline and online applications that support OpenAI LLMs and next-generation LLMs/SLMs.</p>
<p>Play a pivotal role in bridging state-of-the-art GPU and deep learning technologies with critical business applications.</p>
<p>Qualifications:</p>
<p>Required Qualifications:</p>
<p>Bachelor’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.</p>
<p>Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.</p>
<p>These requirements include but are not limited to the following specialized security screenings:</p>
<p>Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.</p>
<p>Preferred Qualifications:</p>
<p>Master’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor’s Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.</p>
<p>Solid experience in GPU inference optimization (CUDA, TensorRT, Triton, or custom GPU kernels).</p>
<p>Proficiency in profiling tools (Nsight, TensorBoard, PyTorch profiler) and ability to identify CPU/GPU bottlenecks.</p>
<p>Deep understanding of LLM/SLM architectures (attention, embeddings, MoE, decoders).</p>
<p>Experience optimizing latency-critical online services.</p>
<p>Experience with model compression (quantization, distillation, SVD, low-rank methods).</p>
<p>Experience in building high-throughput inference serving stacks (continuous batching, KV-cache optimizations, routing).</p>
<p>Familiarity with Microsoft’s DLIS, Talon routing, Triton/TensorRT-LLM stack, and Azure/H100/A100 GPU environments.</p>
<p>Publications, competition wins, or real-world deployments related to model efficiency.</p>
<p>#MicrosoftAI</p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$163,000 - $296,400 per year</Salaryrange>
      <Skills>GPU inference optimization, LLM/SLM architecture, C, C++, C#, Java, JavaScript, Python, CUDA, TensorRT, Triton, custom GPU kernels, profiling tools, CPU/GPU bottlenecks, model compression, high-throughput inference serving stacks, DLIS, Talon routing, Triton/TensorRT-LLM stack, Azure/H100/A100 GPU environments</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft is a multinational technology company that develops, manufactures, licenses, and supports a wide range of software products, services, and devices.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>163000</Compensationmin>
      <Compensationmax>296400</Compensationmax>
      <Applyto>https://microsoft.ai/job/principal-software-engineer-47/</Applyto>
      <Location>Redmond</Location>
      <Country></Country>
      <Postedate>2026-04-24</Postedate>
    </job>
  </jobs>
</source>