{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/custom-gpu-kernels"},"x-facet":{"type":"skill","slug":"custom-gpu-kernels","display":"Custom Gpu Kernels","count":1},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5e20ca92-993"},"title":"Principal Software Engineer","description":"<p>Monetization Engineering is responsible for building a unified, intelligent, and resilient monetization platform that drives revenue across Microsoft’s AI-native surfaces, including Copilot, Search, MSN, Shopping, and both first-party and third-party ecosystems.</p>\n<p>Our mission is to enhance advertiser value, optimize platform performance, and achieve long-term revenue growth through large-scale systems, machine learning-driven optimization, experimentation, and cross-surface innovation.</p>\n<p>We are seeking an experienced professional with expertise in GPU inference optimization and a deep understanding of LLM/SLM architecture to join our team.</p>\n<p>This is a unique opportunity to contribute to cutting-edge advancements in AI and deep learning while driving impactful solutions for Microsoft’s advertising and monetization platforms.</p>\n<p>Microsoft’s mission is to empower every person and every organization on the planet to achieve more.</p>\n<p>As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals.</p>\n<p>Each 
day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.</p>\n<p>Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week.</p>\n<p>This expectation is subject to local law and may vary by jurisdiction.</p>\n<p>Responsibilities:</p>\n<p>Serves as the technological core of Microsoft’s rapidly expanding digital advertising business.</p>\n<p>Focus on accelerating Microsoft’s large-scale deep learning inference for Ads, Shopping, Copilot, and other surfaces, including both offline and online applications that support OpenAI LLM models and next-generation LLMs/SLMs.</p>\n<p>Play a pivotal role in bridging state-of-the-art GPU and deep learning technologies with critical business applications.</p>\n<p>Qualifications:</p>\n<p>Required Qualifications:</p>\n<p>Bachelor’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.</p>\n<p>Ability to meet Microsoft, customer and/or government security screening requirements is required for this role.</p>\n<p>These requirements include but are not limited to the following specialized security screenings:</p>\n<p>Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.</p>\n<p>Preferred Qualifications:</p>\n<p>Master’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor’s Degree in Computer Science or related technical field 
AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.</p>\n<p>Solid experience in GPU inference optimization (CUDA, TensorRT, Triton, or custom GPU kernels).</p>\n<p>Proficiency in profiling tools (Nsight, TensorBoard, PyTorch profiler) and ability to identify CPU/GPU bottlenecks.</p>\n<p>Deep understanding of LLM/SLM architectures (attention, embeddings, MoE, decoders).</p>\n<p>Experience optimizing latency-critical online services.</p>\n<p>Experience with model compression (quantization, distillation, SVD, low-rank methods).</p>\n<p>Experience in building high-throughput inference serving stacks (continuous batching, KV-cache optimizations, routing).</p>\n<p>Familiarity with Microsoft’s DLIS, Talon routing, Triton/TensorRT-LLM stack, and Azure/H100/A100 GPU environments.</p>\n<p>Publications, competition wins, or real-world deployments related to model efficiency.</p>\n<p>#MicrosoftAI</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_5e20ca92-993","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Microsoft","sameAs":"https://microsoft.ai","logo":"https://logos.yubhub.co/microsoft.ai.png"},"x-apply-url":"https://microsoft.ai/job/principal-software-engineer-47/","x-work-arrangement":"hybrid","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$163,000 - $296,400 per year","x-skills-required":["GPU inference optimization","LLM/SLM architecture","C","C++","C#","Java","JavaScript","Python","CUDA","TensorRT","Triton","custom GPU kernels","profiling tools","CPU/GPU bottlenecks","model compression","high-throughput inference serving stacks","DLIS","Talon routing","Triton/TensorRT-LLM stack","Azure/H100/A100 GPU 
environments"],"x-skills-preferred":[],"datePosted":"2026-04-24T12:10:41.636Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Redmond"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"GPU inference optimization, LLM/SLM architecture, C, C++, C#, Java, JavaScript, Python, CUDA, TensorRT, Triton, custom GPU kernels, profiling tools, CPU/GPU bottlenecks, model compression, high-throughput inference serving stacks, DLIS, Talon routing, Triton/TensorRT-LLM stack, Azure/H100/A100 GPU environments","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":163000,"maxValue":296400,"unitText":"YEAR"}}}]}