<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>859cb1cf-b9c</externalid>
      <Title>Senior AI Infrastructure Engineer, Model Serving Platform</Title>
      <Description><![CDATA[<p>As a Senior AI Infrastructure Engineer on the Model Serving Platform team, you will design and build platforms for scalable, reliable, and efficient serving of Large Language Models (LLMs). Our platform powers cutting-edge research and production systems, supporting both internal and external use cases across various environments.</p>
<p>The ideal candidate combines strong ML fundamentals with deep expertise in backend system design. You’ll work in a highly collaborative environment, bridging research and engineering to deliver seamless experiences to our customers and accelerate innovation across the company.</p>
<p>Responsibilities:</p>
<ul>
<li>Build and maintain fault-tolerant, high-performance systems for serving LLM workloads at scale.</li>
<li>Build an internal platform to empower LLM capability discovery.</li>
<li>Collaborate with researchers and engineers to integrate and optimize models for production and research use cases.</li>
<li>Conduct architecture and design reviews to uphold best practices in system design and scalability.</li>
<li>Develop monitoring and observability solutions to ensure system health and performance.</li>
<li>Lead projects end-to-end, from requirements gathering to implementation, in a cross-functional environment.</li>
</ul>
<p>Ideally you’d have:</p>
<ul>
<li>5+ years of experience building large-scale, high-performance backend systems.</li>
<li>Strong programming skills in one or more languages (e.g., Python, Go, Rust, C++).</li>
<li>Experience with LLM serving and routing fundamentals (e.g., rate limiting, token streaming, load balancing, budgets).</li>
<li>Experience with LLM capabilities and concepts such as reasoning, tool calling, prompt templates, etc.</li>
<li>Experience with containers and orchestration tools (e.g., Docker, Kubernetes).</li>
<li>Familiarity with cloud infrastructure (AWS, GCP) and infrastructure as code (e.g., Terraform).</li>
<li>Proven ability to solve complex problems and work independently in fast-moving environments.</li>
</ul>
<p>Nice to haves:</p>
<ul>
<li>Experience with modern LLM serving frameworks such as vLLM, SGLang, TensorRT-LLM, or text-generation-inference.</li>
</ul>
<p>Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity based compensation, subject to Board of Director approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process, and confirm whether the hired role will be eligible for equity grant. You’ll also receive benefits including, but not limited to: Comprehensive health, dental and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. Additionally, this role may be eligible for additional benefits such as a commuter stipend.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$216,000-$270,000 USD</Salaryrange>
      <Skills>Python, Go, Rust, C++, Docker, Kubernetes, AWS, GCP, Terraform, vLLM, SGLang, TensorRT-LLM, text-generation-inference</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Scale</Employername>
      <Employerlogo>https://logos.yubhub.co/scale.com.png</Employerlogo>
      <Employerdescription>Scale develops reliable AI systems for the world&apos;s most important decisions, providing high-quality data and full-stack technologies to power leading models.</Employerdescription>
      <Employerwebsite>https://scale.com/</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>216000</Compensationmin>
      <Compensationmax>270000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/scaleai/jobs/4520320005</Applyto>
      <Location>San Francisco, CA; New York, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>a45e2e8c-400</externalid>
      <Title>Staff Software Engineer, Foundational Model Serving</Title>
      <Description><![CDATA[<p>At Databricks, we are enabling data teams to solve the world&#39;s toughest problems by building and running the world&#39;s best data and AI infrastructure platform. Foundation Model Serving is the API Product for hosting and serving frontier AI model inference for open source models like Llama, Qwen, and GPT OSS as well as proprietary models like Claude and OpenAI GPT.</p>
<p>We&#39;re looking for engineers who have owned high-scale, operationally sensitive systems such as customer-facing APIs, edge gateways, ML inference, or similar services, and who are interested in going deep on building LLM APIs and runtimes at scale. As a Staff Engineer, you&#39;ll play a critical role in shaping both the product experience and core infrastructure.</p>
<p>The impact you will have:</p>
<ul>
<li>Design and implement core systems and APIs that power Databricks Foundation Model Serving, ensuring scalability, reliability, and operational excellence.</li>
<li>Partner with product and engineering leadership to define the technical roadmap and long-term architecture for serving workloads.</li>
<li>Drive architectural decisions and trade-offs to optimize performance, throughput, autoscaling, and operational efficiency for GPU serving workloads.</li>
<li>Contribute directly to key components across the serving infrastructure, from working in systems like vLLM and SGLang to creating token-based rate limiters and optimizers, ensuring smooth and efficient operations at scale.</li>
<li>Collaborate cross-functionally with product, platform, and research teams to translate customer needs into reliable and performant systems.</li>
<li>Establish best practices for code quality, testing, and operational readiness, and mentor other engineers through design reviews and technical guidance.</li>
<li>Represent the team in cross-organizational technical discussions and influence Databricks’ broader AI platform strategy.</li>
</ul>
<p>What we look for:</p>
<ul>
<li>10+ years of experience building and operating large-scale distributed systems.</li>
<li>Experience leading high-scale operationally sensitive backend systems.</li>
<li>A track record of up-leveling teams’ engineering excellence.</li>
<li>Strong foundation in algorithms, data structures, and system design as applied to large-scale, low-latency serving systems.</li>
<li>Proven ability to deliver technically complex, high-impact initiatives that create measurable customer or business value.</li>
<li>Strong communication skills and ability to collaborate across teams in fast-moving environments.</li>
<li>Strategic and product-oriented mindset with the ability to align technical execution with long-term vision.</li>
<li>Passion for mentoring, growing engineers, and fostering technical excellence.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$192,000-$260,000 USD</Salaryrange>
      <Skills>large-scale distributed systems, high-scale operationally sensitive backend systems, algorithms, data structures, system design, low-latency serving systems, GPU serving workloads, vLLM, SGLang, token based rate limiters, optimizers</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Databricks</Employername>
      <Employerlogo>https://logos.yubhub.co/databricks.com.png</Employerlogo>
      <Employerdescription>Databricks is a data and AI company that provides a unified platform for data, analytics, and AI. It was founded by the original creators of Apache Spark, Delta Lake, and MLflow, and pioneered the lakehouse architecture.</Employerdescription>
      <Employerwebsite>https://databricks.com</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>192000</Compensationmin>
      <Compensationmax>260000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/databricks/jobs/8224683002</Applyto>
      <Location>San Francisco, California</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>89406e8e-f38</externalid>
      <Title>Machine Learning Engineer, Open-Source Software</Title>
      <Description><![CDATA[<p>You will be in charge of open-sourcing state-of-the-art models, whilst maintaining and improving Mistral’s publicly available libraries. Your work is critical in helping turn research breakthroughs into tangible solutions and improve Mistral&#39;s open-source ecosystem.</p>
<p>About the Open Source Software team</p>
<p>Our OSS team is embedded in our Science team and works very closely with various engineering and marketing teams. All OSS team members can fluidly move along the production / research spectrum depending on where the needs are or where their interests lie.</p>
<p>Responsibilities</p>
<ul>
<li>Release our models to open-source platforms and libraries, e.g., vLLM, GitHub, Hugging Face.</li>
<li>Maintain Mistral’s open-source libraries (mistral-common, mistral-finetune, mistral-inference).</li>
<li>Create and maintain tooling and services, both internal facing (internal research) and external facing (open-source libraries).</li>
<li>Implement and optimize open-source and internal libraries for performance and accuracy, ensuring production readiness and employing cutting-edge technology and innovative approaches.</li>
<li>Collaborate with the open-source community (PyTorch, vLLM, Hugging Face).</li>
</ul>
<p>About you</p>
<ul>
<li>Master’s degree in Computer Science, Machine Learning, Data Science, or a related field.</li>
<li>Experience contributing to popular open-source libraries such as PyTorch, TensorFlow, JAX, vLLM, Transformers, llama.cpp, ...</li>
<li>Passion for contributing to the open-source software ecosystem.</li>
<li>Expert programming skills in Python, PyTorch, and MLOps.</li>
<li>Adaptable, proactive, and autonomous.</li>
<li>Attention to detail and a drive to go the last mile to build almost perfect tools.</li>
<li>Deep understanding of machine learning approaches, especially LLMs and algorithms.</li>
<li>Low-ego, collaborative, and a real team player mindset.</li>
</ul>
<p>Now, it would be ideal if you have:</p>
<ul>
<li>Experience with training and fine-tuning large language models (e.g., distillation, supervised fine-tuning, policy optimization).</li>
<li>Experience working with Slurm.</li>
<li>Experience working with research teams.</li>
<li>Experience as a core maintainer of a popular ML open-source library.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Python, PyTorch, MLOps, Machine Learning, Large Language Models, Slurm, Open-source libraries, vLLM, GitHub, Hugging Face, TensorFlow, JAX, Transformers, llama.cpp</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Mistral AI</Employername>
      <Employerlogo></Employerlogo>
      <Employerdescription>Mistral AI develops high-performance, optimized, open-source and cutting-edge AI models, products and solutions for enterprise use, on-premises or in cloud environments.</Employerdescription>
      <Employerwebsite>https://mistral.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.lever.co/mistral/ef4c26fc-3fdb-4dd2-a64e-95264ee769dd</Applyto>
      <Location>Paris</Location>
      <Country></Country>
      <Postedate>2026-03-10</Postedate>
    </job>
    <job>
      <externalid>290c3d28-4b2</externalid>
      <Title>Partner Solution Architect - ASEAN</Title>
      <Description><![CDATA[<p>About Mistral AI</p>
<p>At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.</p>
<p>We are a global company with teams distributed between France, USA, UK, Germany and Singapore. We are a diverse workforce that thrives in competitive environments and is committed to driving innovation.</p>
<p>Why This Role Matters</p>
<p>You will be the technical linchpin between Mistral and our strategic partners in ASEAN (Nvidia, Dell, Hyperscalers, Global System Integrators), translating our open-weight models and sovereign AI architecture into deployable, scalable solutions.</p>
<p>By designing joint architectures, influencing partner GTM motions, and earning a seat at the CIO/CTO table, you will accelerate Mistral’s technical credibility and deployment velocity across Asia Pacific.</p>
<p>This is a foundational role where you will define how open-weight AI is operationalized at scale in the region.</p>
<p>What You Will Do</p>
<p><strong>Partner Technical Leadership &amp; Architecture Design</strong></p>
<ul>
<li>Lead the technical design, deployment, and enablement of Mistral’s partner solutions, bridging our AI models with partner infrastructure (Nvidia, Dell, Hyperscalers, GSIs) to deliver scalable AI Labs, AI Factories, and sovereign AI architectures.</li>
</ul>
<ul>
<li>Serve as the trusted technical advisor to partner CTOs, CIOs, and engineering leaders—shaping joint architectures, guiding GPU/model deployment strategies, and accelerating GTM execution.</li>
</ul>
<ul>
<li>Design reference architectures and deployment patterns for partner-led implementations (e.g., multi-GPU inference clusters, AI Lab topologies, private AI clouds).</li>
</ul>
<ul>
<li>Innovate the Executive Briefing Center (EBC) function for technical leaders (CIOs, CTOs, CDOs), positioning Mistral as the default choice for enterprise AI.</li>
</ul>
<ul>
<li>Co-design sovereign AI reference architectures with Nvidia and Dell (H100, H200, GB200 platforms).</li>
</ul>
<p><strong>Co-Sell &amp; Revenue Enablement</strong></p>
<ul>
<li>Collaborate with Mistral’s partner and sales teams to progress deals, providing technical expertise to penetrate accounts and influence GTM pipeline.</li>
</ul>
<ul>
<li>Support partners in qualifying/disqualifying opportunities, ensuring Mistral solutions unlock maximum value for customers.</li>
</ul>
<ul>
<li>Deploy Mistral’s enterprise AI suite (models, fine-tuning, use-case building) in partner-led environments, tailoring solutions to customer requirements.</li>
</ul>
<p><strong>Trusted Advisor &amp; Lighthouse Implementations</strong></p>
<ul>
<li>Drive strategic partner-led opportunities through technical discovery, architecture design, and POC execution.</li>
</ul>
<ul>
<li>Lead lighthouse deployments that become referenceable case studies (e.g., Singtel AI Grid, Accenture AI Lab).</li>
</ul>
<ul>
<li>Establish a scalable partner enablement framework, training 100+ partner engineers across ASEAN.</li>
</ul>
<p><strong>Product Feedback &amp; Internal Collaboration</strong></p>
<ul>
<li>Coordinate with Mistral’s product and engineering teams to relay partner-specific requirements and feedback.</li>
</ul>
<ul>
<li>Align joint GTM and technical execution between Mistral Science, Partner Engineering, and partner field teams.</li>
</ul>
<p>About You</p>
<p><strong>Must-Have</strong></p>
<ul>
<li>10–15 years’ experience in partner-facing technical sales or solution architecture (e.g., Partner SA, Alliance Architect, Partner Technology Strategist).</li>
</ul>
<ul>
<li>Proven ability to engage C-suite and senior technical stakeholders (CTO, CIO, Chief Architect) in strategic architecture discussions.</li>
</ul>
<ul>
<li>Deep GenAI/LLM expertise: RAG, fine-tuning, prompt engineering, model evaluation, and deployment patterns.</li>
</ul>
<ul>
<li>Technical mastery of AI/ML infrastructure (GPU clusters, cloud platforms, model deployment frameworks).</li>
</ul>
<ul>
<li>Track record of co-designing/deploying joint solutions with ecosystem partners (Nvidia, Dell, AWS, Accenture, etc.).</li>
</ul>
<ul>
<li>Executive communication: Ability to articulate science-driven value propositions to technical and business audiences.</li>
</ul>
<ul>
<li>Entrepreneurial mindset: Operates autonomously in high-growth environments; creates playbooks, not follows them.</li>
</ul>
<ul>
<li>Fluent in English; confident working across diverse, cross-cultural teams in Asia.</li>
</ul>
<p><strong>Nice-to-Have</strong></p>
<ul>
<li>Experience with open-weight LLMs or open-source AI stacks (Mistral, Hugging Face, LangChain, vLLM, RAG frameworks).</li>
</ul>
<ul>
<li>Prior involvement in AI Lab, AI Factory, or Sovereign Cloud deployments.</li>
</ul>
<ul>
<li>Familiarity with data governance, model evaluation, and GPU sizing for large-scale inference.</li>
</ul>
<ul>
<li>Network across GSIs and infrastructure partners in Asia</li>
</ul>
<ul>
<li>Exposure to multi-region partner programs or joint GTM initiatives in APJ.</li>
</ul>
<ul>
<li>Bonus languages: Korean, Japanese, or Mandarin for regional partner engagement.</li>
</ul>
<p>What we offer</p>
<ul>
<li>💰 Competitive cash salary and equity</li>
</ul>
<ul>
<li>🚑 Health Insurance: Best in Class</li>
</ul>
<ul>
<li>🥎 Sport: $90 for gym membership allowance</li>
</ul>
<ul>
<li>🥕 Food: $200 monthly allowance for meals (solution might evolve as we grow bigger)</li>
</ul>
<ul>
<li>🚴 Transportation: $120/month for public transport or parking charges reimbursed</li>
</ul>
<ul>
<li>🏝️ PTO: 18 days per year</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>GenAI/LLM expertise, RAG, fine-tuning, prompt engineering, model evaluation, deployment patterns, AI/ML infrastructure, GPU clusters, cloud platforms, model deployment frameworks, co-designing/deploying joint solutions, ecosystem partners, Nvidia, Dell, AWS, Accenture, open-weight LLMs, open-source AI stacks, Mistral, Hugging Face, LangChain, vLLM, RAG frameworks, data governance, model evaluation, GPU sizing, large-scale inference, GSIs, infrastructure partners, multi-region partner programs, joint GTM initiatives, APJ, Korean, Japanese, Mandarin</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Mistral AI</Employername>
      <Employerlogo></Employerlogo>
      <Employerdescription>Mistral AI is an AI technology company that provides high-performance, optimized, open-source and cutting-edge models, products and solutions.</Employerdescription>
      <Employerwebsite>https://mistral.ai/careers</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://jobs.lever.co/mistral/fe3542b5-4f99-4d62-af6a-fbdfd13bf0e4</Applyto>
      <Location>Singapore</Location>
      <Country></Country>
      <Postedate>2026-03-10</Postedate>
    </job>
    <job>
      <externalid>f8883394-0fc</externalid>
      <Title>Solutions Architect, AI and ML</Title>
      <Description><![CDATA[<p>We are looking for an experienced Cloud Solution Architect to help customers adopt GPU hardware and software, and to build and deploy Machine Learning (ML), Deep Learning (DL), and data analytics solutions on various cloud computing platforms.</p>
<p>As a Solutions Architect, you will engage directly with developers, researchers, and data scientists at some of NVIDIA’s most strategic technology customers, as well as work directly with business and engineering teams on product strategy.</p>
<p><strong>Key Responsibilities:</strong></p>
<ul>
<li>Help cloud customers craft, deploy, and maintain scalable, GPU-accelerated inference pipelines on cloud ML services and Kubernetes for large language models (LLMs) and generative AI workloads.</li>
<li>Enhance performance tuning using TensorRT/TensorRT-LLM, vLLM, Dynamo, and Triton Inference Server to improve GPU utilization and model efficiency.</li>
<li>Collaborate with multi-functional teams (engineering, product) and offer technical mentorship to cloud customers implementing AI inference at scale.</li>
<li>Build custom PoCs for solutions that address customers’ critical business needs, applying NVIDIA hardware and software technology.</li>
<li>Partner with Sales Account Managers or Developer Relations Managers to identify and secure new business opportunities for NVIDIA products and solutions for ML/DL and other software solutions.</li>
<li>Prepare and deliver technical content to customers including presentations about purpose-built solutions, workshops about NVIDIA products and solutions, etc.</li>
<li>Conduct regular technical customer meetings for project/product roadmap, feature discussions, and intro to new technologies. Establish close technical ties to the customer to facilitate rapid resolution of customer issues</li>
</ul>
<p><strong>Requirements:</strong></p>
<ul>
<li>BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Statistics, Physics, or other Engineering fields or equivalent experience.</li>
<li>3+ Years in Solutions Architecture with a proven track record of moving AI inference from POC to production in cloud computing environments including AWS, GCP, or Azure</li>
<li>3+ years of hands-on experience with Deep Learning frameworks such as PyTorch and TensorFlow</li>
<li>Excellent knowledge of the theory and practice of LLM and DL inference</li>
<li>Strong fundamentals in programming, optimizations, and software design, especially in Python</li>
<li>Experience with containerization and orchestration technologies like Docker and Kubernetes, monitoring, and observability solutions for AI deployments</li>
<li>Knowledge of Inference technologies - NVIDIA NIM, TensorRT-LLM, Dynamo, Triton Inference Server, vLLM, etc</li>
<li>Proficiency in problem-solving and debugging skills in GPU environments</li>
<li>Excellent presentation, communication and collaboration skills</li>
</ul>
<p><strong>Nice to Have:</strong></p>
<ul>
<li>AWS, GCP or Azure Professional Solution Architect Certification.</li>
<li>Experience optimizing and deploying large MoE LLMs at scale</li>
<li>Active contributions to open-source AI inference projects (e.g., vLLM, TensorRT-LLM, Dynamo, SGLang, Triton, or similar)</li>
<li>Experience with Multi-GPU Multi-node Inference technologies like Tensor Parallelism/Expert Parallelism, Disaggregated Serving, LWS, MPI, EFA/Infiniband, NVLink/PCIe, etc</li>
<li>Experience in developing and integrating monitoring and alerting solutions using Prometheus, Grafana, and NVIDIA DCGM and GPU performance Analysis and tools like NVIDIA Nsight Systems</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Cloud Solution Architecture, GPU hardware and Software, Machine Learning (ML), Deep Learning (DL), Data Analytics, Cloud Computing Platforms, Kubernetes, TensorRT, TensorRT-LLM, vLLM, Dynamo, Triton Inference Server, Python, Containerization, Orchestration, Monitoring, Observability, Inference technologies, NVIDIA NIM, Problem-solving, Debugging, GPU environments, AWS, GCP, Azure, Professional Solution Architect Certification, Large MoE LLMs, Open-source AI inference projects, Multi-GPU Multi-node Inference technologies, Monitoring and alerting solutions, Prometheus, Grafana, NVIDIA DCGM, GPU performance Analysis, NVIDIA Nsight Systems</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>NVIDIA</Employername>
      <Employerlogo>https://logos.yubhub.co/nvidia.com.png</Employerlogo>
      <Employerdescription>NVIDIA is a leading technology company that specializes in designing and manufacturing graphics processing units (GPUs) and high-performance computing hardware.</Employerdescription>
      <Employerwebsite>https://www.nvidia.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-WA-Redmond/Solutions-Architect--AI-and-ML_JR2005988-1</Applyto>
      <Location>Redmond, WA; Santa Clara, CA; Seattle, WA</Location>
      <Country></Country>
      <Postedate>2026-03-09</Postedate>
    </job>
    <job>
      <externalid>db67438e-963</externalid>
      <Title>Director, System Software Engineering - Metropolis Accelerated and Inferencing Software</Title>
      <Description><![CDATA[<p><strong>Director, System Software Engineering - Metropolis Accelerated and Inferencing Software</strong></p>
<p>We are looking for an engineering leader who is hands-on with deep learning—comfortable reading/modeling code, not just running it. You will lead, encourage, and develop world-class engineering and data teams distributed across Europe, Asia and the United States.</p>
<p><strong>Key Responsibilities:</strong></p>
<ul>
<li>Architect and operationalize NVIDIA’s end-to-end data Inference Acceleration strategy, powering Inferencing and continuous performance improvements.</li>
<li>Drive strategic implementations of TensorRT, vLLM, and other accelerated frameworks for inference solutions on edge and enterprise devices: lead accelerated computing efforts and solutions for key Metropolis verticals. Set up Proofs of Readiness (PORs) and guide their implementations.</li>
<li>Leading customer solutions: Collaborate with major Metropolis OEMs and partners to architect highly accelerated and optimized custom deep learning models and inference pipelines for their specific requirements. Offer direct customer support, including debugging, technical education, and handling customer inquiries for our Metropolis partners and customers. Responsible for drafting and finalizing SOWs with internal customers and partners.</li>
<li>Performance Benchmarking: Orchestrate efforts to achieve leading performance results on industry benchmarks like MLPerf on various edge and Enterprise devices.</li>
<li>Technical Leadership &amp; Influence: Function as a technical leader for deep learning across multiple teams, giving oversight and build support. Apply customer insights to influence the composition and structure of upcoming SOC / GPU deep learning hardware.</li>
<li>Scaling the team: Strategically hiring to meet new demands while also mentoring and adjusting existing teams to new deep learning challenges.</li>
<li>Representing NVIDIA deep learning solutions in webinars, conferences, and partner events.</li>
</ul>
<p><strong>Requirements:</strong></p>
<ul>
<li>Master’s in Computer Science/Electrical Engineering or equivalent experience.</li>
<li>A minimum of 8 years of meaningful involvement in machine learning/deep learning research or practical experience, coupled with 7+ years of leadership background and overall 15+ years of industry experience.</li>
<li>Over 10 years of validated expertise in the embedded software sector, holding technical leadership positions accountable for delivering outstanding production software within a multifaceted setting.</li>
<li>Deep Knowledge of GPU, CPU and dedicated deep learning architecture fundamentals and low-level performance optimizations using heterogeneous computing.</li>
<li>Hands-on experience with VLMs, LLMs, or multimodal AI systems applied to perception, data triage, or automated labeling.</li>
<li>Strong expertise in large-scale data processing, systems build, or machine learning pipelines.</li>
<li>Strong communication, careful planning, and technical leadership capabilities.</li>
</ul>
<p><strong>Benefits:</strong></p>
<ul>
<li>Competitive salary package and benefits</li>
<li>Eligible for equity</li>
</ul>
<p><strong>How to Apply:</strong></p>
<p>Applications for this job will be accepted at least until March 13, 2026.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>executive</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Machine Learning, Deep Learning, GPU, CPU, Heterogeneous Computing, TensorRT, vLLM, Proof of Readiness, Customer Support, Technical Education, Performance Benchmarking, Technical Leadership, Team Scaling, Webinars, Conferences, Partner Events, VLMs, LLMs, Multimodal AI Systems, Perception, Data Triage, Automated Labeling, Large-Scale Data Processing, Systems Build, Machine Learning Pipelines</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>NVIDIA</Employername>
      <Employerlogo>https://logos.yubhub.co/nvidia.com.png</Employerlogo>
      <Employerdescription>NVIDIA is a world leader in physical AI, powering self-driving cars, humanoid robots, intelligent environments, medical devices, and more.</Employerdescription>
      <Employerwebsite>https://www.nvidia.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCareerSite/job/US-CA-Santa-Clara/Director--Metropolis-Accelerated-and-Inferencing-Software_JR2011299</Applyto>
      <Location>Santa Clara</Location>
      <Country></Country>
      <Postedate>2026-03-09</Postedate>
    </job>
    <job>
      <externalid>6c07cdd9-125</externalid>
      <Title>Staff Full Stack Software Engineer</Title>
      <Description><![CDATA[<p>Perplexity is seeking an experienced Staff Full Stack Engineer to help revolutionize the way people search and interact online. In this role, you&#39;ll translate cutting-edge AI advances into products that are both useful and engaging.</p>
<p>Our tech stack includes Python, Go, Rust, TypeScript, React, FastAPI, PostgreSQL, Redis, Docker, vLLM, and AWS.</p>
<p>Roles and teams at Perplexity are fluid. By applying to this position, you will be eligible to join teams across Perplexity Engineering. During the interview process, we look forward to learning more about your unique talents and figuring out where in our organization you’ll grow and thrive the most.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Building new 0-1 products at high scale</li>
</ul>
<ul>
<li>Working closely with Product, Design, and Data to ship experiments and learn</li>
</ul>
<ul>
<li>Launching new features, experiments, campaigns, and partnerships in a fast-moving environment</li>
</ul>
<ul>
<li>Building core growth infrastructure such as notification platforms, ad attribution, and more</li>
</ul>
<ul>
<li>Analyzing performance metrics and user feedback to identify opportunities for improvement and optimization</li>
</ul>
<ul>
<li>Building delightful and data-proven user journeys</li>
</ul>
<p><strong>Qualifications</strong></p>
<ul>
<li>Strong programming skills with the ability to work across the full stack</li>
</ul>
<ul>
<li>Self-motivated with a willingness to take ownership of tasks</li>
</ul>
<ul>
<li>Good quantitative understanding of data and experimentation</li>
</ul>
<ul>
<li>Experience making data-driven decisions and measuring impact of those decisions (experimentation, feature flags, adhoc analysis)</li>
</ul>
<ul>
<li>A passion for shipping quality products</li>
</ul>
<ul>
<li>8+ years of industry experience</li>
</ul>
<p><strong>How we work with AI</strong></p>
<p>AI is at the heart of what we build, and using it effectively is an expectation for every role here. During interviews, we will be excited to see how you think and to understand how you make decisions, qualities that would directly influence our AI development. There may be an opportunity for you to showcase your AI skills; however, we ask that you kindly avoid using AI tools throughout the process unless we explicitly indicate otherwise.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement></Workarrangement>
      <Salaryrange>$220K – $405K • Offers Equity</Salaryrange>
      <Skills>Python, Go, Rust, TypeScript, React, FastAPI, PostgreSQL, Redis, Docker, vLLM, AWS</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Perplexity</Employername>
      <Employerlogo>https://logos.yubhub.co/perplexity.com.png</Employerlogo>
      <Employerdescription>Perplexity is a technology company that develops products for searching and interacting online.</Employerdescription>
      <Employerwebsite>https://www.perplexity.ai</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>220000</Compensationmin>
      <Compensationmax>405000</Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/perplexity/8df04206-7b62-4163-9d94-a4f28680eeb1</Applyto>
      <Location>San Francisco; New York City</Location>
      <Country></Country>
      <Postedate>2026-03-09</Postedate>
    </job>
    <job>
      <externalid>d0214534-b6a</externalid>
      <Title>Senior Applied Scientist</Title>
      <Description><![CDATA[<p>We&#39;re building the next-generation Grounding Service that powers the latest AI applications—chat assistants, copilots, and autonomous agents—with factual, cited, and trustworthy responses. Our platform stitches together retrieval, reasoning, and real-time data so that large language models stay anchored to enterprise knowledge, the public web, and proprietary tools.</p>
<p>We&#39;re looking for a Senior Applied Scientist to lead end-to-end science for grounding: inventing retrieval and attribution methods, defining factuality/faithfulness metrics, and shipping production models and APIs that scale to billions of queries. You&#39;ll partner closely with engineering, product, research, and customers to deliver fast, reliable, and explainable answers with source citations across a diverse set of domains and modalities.</p>
<p>As a team, we value curiosity, pragmatic rigor, and inclusive collaboration. We believe great systems emerge when scientists and engineers co-design metrics, models, and infrastructure—and when we obsess over customer impact, privacy, and safety.</p>
<p>Microsoft&#39;s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.</p>
<p>Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or a 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week. This expectation is subject to local law and may vary by jurisdiction.</p>
<p>Responsibilities:</p>
<ul>
<li>Owns the science roadmap for grounding—including retrieval, re-ranking, attribution, and reasoning—driving initiatives from problem framing to production impact.</li>
<li>Designs and evolves state-of-the-art retrieval and RAG orchestration across documents, tables, code, and images.</li>
<li>Builds citation and provenance systems (e.g., passage highlighting, quote-level alignment, confidence scoring) to reduce hallucinations and increase user trust.</li>
<li>Leads experimentation and evaluation using A/B testing, interleaving, NDCG, MRR, precision/recall, and calibration curves to guide measurable trade-offs.</li>
<li>Advances tool-augmented grounding through schema-aware retrieval, function calling, knowledge graph joins, and real-time connectors to databases, cloud object stores, search indexes, and the web.</li>
<li>Partners with platform engineering to productionize models with scalable inference, embedding services, feature stores, caching, and privacy-compliant multi-tenant systems.</li>
<li>Nurtures collaborative relationships with product and business leaders across Microsoft, influencing strategic decisions and driving business impact through technology.</li>
<li>Authors white papers, contributes to internal tools and services, and may publish research to generate intellectual property.</li>
<li>Bridges the gap between researchers (e.g., Microsoft Research) and development teams, applying long-term research to solve immediate product needs.</li>
<li>Leads high-stakes negotiations to ensure cutting-edge technologies are applied practically and effectively.</li>
<li>Identifies and solves significant business problems using novel, scalable, and data-driven solutions.</li>
<li>Shapes the direction of Microsoft and the broader industry through pioneering product and tooling work.</li>
<li>Mentors applied scientists and data scientists, establishing best practices in experimentation, error analysis, and incident review.</li>
<li>Collaborates cross-functionally with PMs, research, infrastructure, and security teams to align on milestones, SLAs, and safety protocols.</li>
<li>Communicates clearly through design documentation, progress updates, and presentations to executives and customers.</li>
<li>Contributes to ethics and privacy policies, identifies bias in product development, and proposes mitigation strategies.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, Machine Learning, Information Retrieval, Large Language Model Development, Pretraining, Supervised Fine-Tuning, Reinforcement Learning, Optimizing LLM Inference, Master&apos;s Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field, 6+ years related experience (e.g., statistics, predictive analytics, research), Demonstrated expertise in information retrieval, with publications in top-tier conferences or journals such as NeurIPS, ICML, ICLR, SIGIR, or ACL, Hands-on experience in large language model (LLM) development, including pretraining, supervised fine-tuning (SFT), and reinforcement learning (RL), Proven track record in optimizing LLM inference, or active contributions to open-source frameworks like vLLM, SGLang, or related projects</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Microsoft</Employername>
      <Employerlogo>https://logos.yubhub.co/microsoft.ai.png</Employerlogo>
      <Employerdescription>Microsoft is a multinational technology company that develops, manufactures, licenses, and supports a wide range of software products, services, and devices.</Employerdescription>
      <Employerwebsite>https://microsoft.ai</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://microsoft.ai/job/senior-applied-scientist-37/</Applyto>
      <Location>Beijing</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
    <job>
      <externalid>d3a39f4c-d95</externalid>
      <Title>Software Engineer, Inference - Multi Modal</Title>
      <Description><![CDATA[<p><strong>Software Engineer, Inference - Multi Modal</strong></p>
<p><strong>Location</strong></p>
<p>San Francisco</p>
<p><strong>Employment Type</strong></p>
<p>Full time</p>
<p><strong>Department</strong></p>
<p>Scaling</p>
<p><strong>Compensation</strong></p>
<ul>
<li>$295K – $555K • Offers Equity</li>
</ul>
<p>The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.</p>
<ul>
<li>Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts</li>
</ul>
<ul>
<li>Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)</li>
</ul>
<ul>
<li>401(k) retirement plan with employer match</li>
</ul>
<ul>
<li>Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)</li>
</ul>
<ul>
<li>Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees</li>
</ul>
<ul>
<li>13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)</li>
</ul>
<ul>
<li>Mental health and wellness support</li>
</ul>
<ul>
<li>Employer-paid basic life and disability coverage</li>
</ul>
<ul>
<li>Annual learning and development stipend to fuel your professional growth</li>
</ul>
<ul>
<li>Daily meals in our offices, and meal delivery credits as eligible</li>
</ul>
<ul>
<li>Relocation support for eligible employees</li>
</ul>
<ul>
<li>Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.</li>
</ul>
<p>More details about our benefits are available to candidates during the hiring process.</p>
<p>This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.</p>
<p><strong>About the Team</strong></p>
<p>OpenAI’s Inference team powers the deployment of our most advanced models - including our GPT models, 4o Image Generation, and Whisper - across a variety of platforms. Our work ensures these models are available, performant, and scalable in production, and we partner closely with Research to bring the next generation of models into the world. We&#39;re a small, fast-moving team of engineers focused on delivering a world-class developer experience while pushing the boundaries of what AI can do.</p>
<p>We’re expanding into multimodal inference, building the infrastructure needed to serve models that handle image, audio, and other non-text modalities. These workloads are inherently more heterogeneous and experimental, involving diverse model sizes and interactions, more complex input/output formats, and tighter coordination with product and research.</p>
<p><strong>About the Role</strong></p>
<p>We’re looking for a software engineer to help us serve OpenAI’s multimodal models at scale. You’ll be part of a small team responsible for building reliable, high-performance infrastructure for serving real-time audio, image, and other MM workloads in production.</p>
<p>This work is inherently cross-functional: you’ll collaborate directly with researchers training these models and with product teams defining new modalities of interaction. You&#39;ll build and optimize the systems that let users generate speech, understand images, and interact with models in ways far beyond text.</p>
<p><strong>In this role, you will:</strong></p>
<ul>
<li>Design and implement inference infrastructure for large-scale multimodal models.</li>
</ul>
<ul>
<li>Optimize systems for high-throughput, low-latency delivery of image and audio inputs and outputs.</li>
</ul>
<ul>
<li>Enable experimental research workflows to transition into reliable production services.</li>
</ul>
<ul>
<li>Collaborate closely with researchers, infra teams, and product engineers to deploy state-of-the-art capabilities.</li>
</ul>
<ul>
<li>Contribute to system-level improvements including GPU utilization, tensor parallelism, and hardware abstraction layers.</li>
</ul>
<p><strong>You might thrive in this role if you:</strong></p>
<ul>
<li>Have experience building and scaling inference systems for LLMs or multimodal models.</li>
</ul>
<ul>
<li>Have worked with GPU-based ML workloads and understand the performance dynamics of large models, especially with complex data like images or audio.</li>
</ul>
<ul>
<li>Enjoy experimental, fast-evolving work and collaborating closely with research.</li>
</ul>
<ul>
<li>Are comfortable dealing with systems that span networking, distributed compute, and high-throughput data handling.</li>
</ul>
<ul>
<li>Have familiarity with inference tooling like vLLM, TensorRT-LLM, or custom model parallel systems.</li>
</ul>
<ul>
<li>Own problems end-to-end and are excited to operate in ambiguous, fast-moving spaces.</li>
</ul>
<p><strong>Nice to Have:</strong></p>
<ul>
<li>Experience working with image generation or audio synthesis models in production.</li>
</ul>
<ul>
<li>Exposure to distributed ML training or system-efficient model design.</li>
</ul>
<p><strong>About OpenAI</strong></p>
<p>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$295K – $555K • Offers Equity</Salaryrange>
      <Skills>Software Engineer, Inference Infrastructure, GPU-based ML Workloads, Tensor Parallelism, Hardware Abstraction Layers, vLLM, TensorRT-LLM, Custom Model Parallel Systems, Image Generation, Audio Synthesis, Distributed ML Training, System-Efficient Model Design</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>OpenAI</Employername>
      <Employerlogo>https://logos.yubhub.co/openai.com.png</Employerlogo>
      <Employerdescription>OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products.</Employerdescription>
      <Employerwebsite>https://openai.com</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>295000</Compensationmin>
      <Compensationmax>555000</Compensationmax>
      <Applyto>https://jobs.ashbyhq.com/openai/4d14449e-5e7f-45d4-b103-8776a6c87086</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-03-06</Postedate>
    </job>
  </jobs>
</source>