{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/inference-optimization-techniques"},"x-facet":{"type":"skill","slug":"inference-optimization-techniques","display":"Inference Optimization Techniques","count":1},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_4054dca1-a4f"},"title":"AI Inference Engineer","description":"<p>We are looking for an AI Inference engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.</p>\n<p><strong>What you&#39;ll do</strong></p>\n<p>Develop APIs for AI inference that will be used by both internal and external customers.</p>\n<ul>\n<li>Develop APIs for AI inference that will be used by both internal and external customers</li>\n<li>Benchmark and address bottlenecks throughout our inference stack</li>\n<li>Improve the reliability and observability of our systems and respond to system outages</li>\n<li>Explore novel research and implement LLM inference optimizations</li>\n</ul>\n<p><strong>What you need</strong></p>\n<ul>\n<li>Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)</li>\n<li>Familiarity with common LLM architectures and inference optimization techniques (e.g. 
continuous batching, quantization, etc.)</li>\n<li>Understanding of GPU architectures or experience with GPU kernel programming using CUDA</li>\n</ul>\n<p><strong>Why this matters</strong></p>\n<p>As an AI Inference Engineer, you will play a critical role in the development and deployment of our machine learning models. Your work will have a direct impact on the performance and reliability of our systems and will help us continue to innovate and improve our products.</p>","url":"https://yubhub.co/jobs/job_4054dca1-a4f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Perplexity","sameAs":"https://jobs.ashbyhq.com","logo":"https://logos.yubhub.co/perplexity.com.png"},"x-apply-url":"https://jobs.ashbyhq.com/perplexity/e4777627-ff8f-4257-8612-3a016bb58592","x-work-arrangement":"onsite","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":"Final offer amounts are determined by multiple factors, including experience and expertise.","x-skills-required":["ML systems","deep learning frameworks","GPU architectures"],"x-skills-preferred":["LLM architectures","inference optimization techniques"],"datePosted":"2026-03-04T12:27:20.012Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"London"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"ML systems, deep learning frameworks, GPU architectures, LLM architectures, inference optimization techniques"}]}