{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/model-serving-infrastructure"},"x-facet":{"type":"skill","slug":"model-serving-infrastructure","display":"Model Serving Infrastructure","count":3},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_62efca6f-b6f"},"title":"Senior AI Engineer","description":"<p>We&#39;re looking for a Senior AI Engineer who is obsessed with building AI systems that actually work in production: reliable, observable, cost-efficient, and genuinely useful. This is not a research role. 
You will ship AI-powered features that process real financial data for real businesses.</p>\n<p>LLM &amp; AI Pipeline Engineering - Design, build, and maintain production-grade LLM integration pipelines, including retrieval-augmented generation (RAG), prompt engineering, output parsing, and chain orchestration.</p>\n<p>Develop and operate AI features within Jeeves&#39;s core financial products: spend categorization, document extraction, anomaly detection, financial Q&amp;A, and automated reconciliation.</p>\n<p>Implement structured output validation, fallback handling, and confidence scoring to ensure AI decisions meet reliability standards for financial use cases.</p>\n<p>Evaluate and integrate AI frameworks and tools (LangChain, LlamaIndex, OpenAI API, Anthropic API, HuggingFace, vector databases) and advocate for the right tool for the job.</p>\n<p>Establish prompt versioning and evaluation practices to ensure AI outputs remain accurate and consistent as models and data evolve.</p>\n<p>Retrieval &amp; Vector Search - Design and maintain vector search pipelines using databases such as Pinecone, Weaviate, or pgvector to power semantic search and RAG-based features.</p>\n<p>Build document ingestion and chunking pipelines for Jeeves&#39;s financial data, processing invoices, receipts, policy documents, and transaction records.</p>\n<p>Optimize retrieval quality through embedding model selection, chunk strategy, metadata filtering, and re-ranking techniques.</p>\n<p>ML Model Serving &amp; Operations - Collaborate with data scientists to take trained ML models from experimental notebooks to production serving infrastructure.</p>\n<p>Build and maintain model serving endpoints with appropriate latency SLOs, input validation, and output monitoring.</p>\n<p>Implement model performance monitoring and data drift detection to ensure production models remain accurate over time.</p>\n<p>Support model retraining workflows by designing clean data pipelines and feature 
engineering that can be continuously updated.</p>\n<p>Backend Integration &amp; Reliability - Integrate AI services cleanly with Jeeves&#39;s backend microservices, designing clear API contracts, circuit breakers, and graceful degradation patterns.</p>\n<p>Write high-quality, testable backend code in Python or Go/Node.js to power AI-integrated features.</p>\n<p>Instrument AI components with structured logging, distributed tracing, latency dashboards, and alerting to ensure operational visibility.</p>\n<p>Collaboration &amp; Growth - Partner with Product, Backend Engineering, and Data Science to define the AI roadmap and translate requirements into reliable systems.</p>\n<p>Contribute to a culture of quality by writing design docs, reviewing peers&#39; AI system designs, and sharing learnings openly.</p>\n<p>Help grow the AI engineering practice at Jeeves by establishing patterns, tooling, and best practices that the broader team can build on.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_62efca6f-b6f","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Jeeves","sameAs":"https://www.jeeves.com/","logo":"https://logos.yubhub.co/jeeves.com.png"},"x-apply-url":"https://jobs.lever.co/tryjeeves/ded9e04e-f18e-4d4c-ae43-4b7882c6200b","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["LLM","AI","Python","LangChain","LlamaIndex","OpenAI API","Anthropic API","HuggingFace","vector databases","Pinecone","Weaviate","pgvector","semantic search","RAG-based features","document ingestion","chunking pipelines","embedding model selection","chunk strategy","metadata filtering","re-ranking techniques","model serving infrastructure","latency SLOs","input validation","output monitoring","model performance monitoring","data drift detection","clean data 
pipelines","feature engineering","API contracts","circuit breakers","graceful degradation patterns","structured logging","distributed tracing","latency dashboards","alerting"],"x-skills-preferred":[],"datePosted":"2026-04-17T12:39:23.341Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"India"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Finance","skills":"LLM, AI, Python, LangChain, LlamaIndex, OpenAI API, Anthropic API, HuggingFace, vector databases, Pinecone, Weaviate, pgvector, semantic search, RAG-based features, document ingestion, chunking pipelines, embedding model selection, chunk strategy, metadata filtering, re-ranking techniques, model serving infrastructure, latency SLOs, input validation, output monitoring, model performance monitoring, data drift detection, clean data pipelines, feature engineering, API contracts, circuit breakers, graceful degradation patterns, structured logging, distributed tracing, latency dashboards, alerting"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_5579e8fb-227"},"title":"Senior AI Engineer","description":"<p>We&#39;re looking for a Senior AI Engineer who is obsessed with building AI systems that actually work in production: reliable, observable, cost-efficient, and genuinely useful. This is not a research role. 
You will ship AI-powered features that process real financial data for real businesses.</p>\n<p>LLM &amp; AI Pipeline Engineering - Design, build, and maintain production-grade LLM integration pipelines, including retrieval-augmented generation (RAG), prompt engineering, output parsing, and chain orchestration.</p>\n<p>Develop and operate AI features within Jeeves&#39;s core financial products: spend categorization, document extraction, anomaly detection, financial Q&amp;A, and automated reconciliation.</p>\n<p>Implement structured output validation, fallback handling, and confidence scoring to ensure AI decisions meet reliability standards for financial use cases.</p>\n<p>Evaluate and integrate AI frameworks and tools (LangChain, LlamaIndex, OpenAI API, Anthropic API, HuggingFace, vector databases) and advocate for the right tool for the job.</p>\n<p>Establish prompt versioning and evaluation practices to ensure AI outputs remain accurate and consistent as models and data evolve.</p>\n<p>Retrieval &amp; Vector Search - Design and maintain vector search pipelines using databases such as Pinecone, Weaviate, or pgvector to power semantic search and RAG-based features.</p>\n<p>Build document ingestion and chunking pipelines for Jeeves&#39;s financial data, processing invoices, receipts, policy documents, and transaction records.</p>\n<p>Optimize retrieval quality through embedding model selection, chunk strategy, metadata filtering, and re-ranking techniques.</p>\n<p>ML Model Serving &amp; Operations - Collaborate with data scientists to take trained ML models from experimental notebooks to production serving infrastructure.</p>\n<p>Build and maintain model serving endpoints with appropriate latency SLOs, input validation, and output monitoring.</p>\n<p>Implement model performance monitoring and data drift detection to ensure production models remain accurate over time.</p>\n<p>Support model retraining workflows by designing clean data pipelines and feature 
engineering that can be continuously updated.</p>\n<p>Backend Integration &amp; Reliability - Integrate AI services cleanly with Jeeves&#39;s backend microservices, designing clear API contracts, circuit breakers, and graceful degradation patterns.</p>\n<p>Write high-quality, testable backend code in Python or Go/Node.js to power AI-integrated features.</p>\n<p>Instrument AI components with structured logging, distributed tracing, latency dashboards, and alerting to ensure operational visibility.</p>\n<p>Build human-in-the-loop review workflows for AI decisions that require oversight, particularly for high-value financial actions.</p>\n<p>Collaboration &amp; Growth - Partner with Product, Backend Engineering, and Data Science to define the AI roadmap and translate requirements into reliable systems.</p>\n<p>Contribute to a culture of quality by writing design docs, reviewing peers&#39; AI system designs, and sharing learnings openly.</p>\n<p>Help grow the AI engineering practice at Jeeves by establishing patterns, tooling, and best practices that the broader team can build on.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_5579e8fb-227","directApply":true,"hiringOrganization":{"@type":"Organization","name":"Jeeves","sameAs":"https://www.jeeves.com/","logo":"https://logos.yubhub.co/jeeves.com.png"},"x-apply-url":"https://jobs.lever.co/tryjeeves/2f00206f-6091-4eed-8b5f-1325afdbfe30","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["LLM pipeline engineering","RAG architecture","ML system operation","Python programming","AI orchestration framework","ML model serving infrastructure","Observability tooling"],"x-skills-preferred":["Fintech experience","Prompt evaluation frameworks","ML lifecycle management tools","Real-time data 
streaming"],"datePosted":"2026-04-17T12:38:27.085Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Brazil"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Finance","skills":"LLM pipeline engineering, RAG architecture, ML system operation, Python programming, AI orchestration framework, ML model serving infrastructure, Observability tooling, Fintech experience, Prompt evaluation frameworks, ML lifecycle management tools, Real-time data streaming"},{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_69369815-a11"},"title":"Associate/Vice President, AI Infrastructure Engineer","description":"<p>At BlackRock, technology underpins everything we do. AI is a core strategic priority for the firm, embedded across Aladdin and our investment, client, and operational platforms. We are seeking an AI Infrastructure Engineer to help build and operate the foundational infrastructure that enables AI systems to scale safely, securely, and reliably across the enterprise.</p>\n<p>This role sits within Aladdin Platform Engineering and focuses on the infrastructure and platform services required to support machine learning models, large language models (LLMs), and emerging AI capabilities in production. 
The successful candidate will work closely with AI Engineers, Data Scientists, Platform Engineers, Security, and Product partners to deliver resilient, cloud native AI platforms in a highly regulated environment.</p>\n<p><strong>Key Responsibilities</strong></p>\n<ul>\n<li>Design, build, and operate AI-focused infrastructure platforms supporting model development, training, evaluation, and inference.</li>\n<li>Engineer scalable, reliable, and secure cloud-native services to support AI workloads across AWS, Azure, and hybrid environments.</li>\n<li>Partner with AI Engineering and Data Science teams to improve developer experience, performance, and operational stability of AI systems.</li>\n<li>Enable production deployment of ML models and LLMs within governed enterprise environments, aligned with firmwide risk and compliance standards.</li>\n<li>Implement and maintain infrastructure as code and automation to ensure repeatable, auditable platform provisioning.</li>\n<li>Build and operate observability, monitoring, and alerting solutions for AI platforms, ensuring availability, performance, and cost transparency.</li>\n<li>Collaborate with Security and Risk partners to integrate identity, access controls, data protection, and governance into AI infrastructure.</li>\n<li>Contribute to architectural decisions and technical standards for AI platforms across Aladdin.</li>\n<li>Participate in on-call rotations and operational support as required for critical platforms.</li>\n<li>Continuously evaluate emerging AI infrastructure technologies and apply them pragmatically within BlackRock’s enterprise context.</li>\n</ul>\n<p><strong>Qualifications</strong></p>\n<ul>\n<li>Strong experience in cloud infrastructure, platform engineering, or systems engineering roles.</li>\n<li>4+ hands-on expertise with AWS and/or Azure and/or GCP, including Azure ML, Azure Foundry, AWS Bedrock, Google Vertex, as well as cloud compute, networking, storage, and security 
services.</li>\n<li>Understanding of ML platform operations and governance concepts, including model deployment strategies, lifecycle management, monitoring/observability, and Disaster Recovery</li>\n<li>Experience supporting LLMs, generative AI platforms, or model serving infrastructure.</li>\n<li>Experience supporting AI and machine learning workloads, with exposure to managed compute for model training and fine-tuning, experimentation over large datasets, and end-to-end MLOps pipeline flow including data ingestion, training, validation, and deployment.</li>\n<li>Proficiency with Infrastructure as Code tools (e.g., Terraform, ARM/Bicep, CloudFormation).</li>\n<li>Strong programming or scripting skills (e.g., Python, Bash, or similar).</li>\n<li>Experience building and operating containerized and Kubernetes-based platforms.</li>\n<li>Solid understanding of reliability, scalability, observability, and operational best practices.</li>\n<li>Ability to work effectively in cross-functional teams and communicate complex technical concepts clearly.</li>\n</ul>\n<p><strong>Our Benefits</strong></p>\n<p>To help you stay energized, engaged, and inspired, we offer a wide range of employee benefits including: retirement investment and tools designed to help you in building a sound financial future; access to education reimbursement; comprehensive resources to support your physical health and emotional well-being; family support programs; and Flexible Time Off (FTO) so you can relax, recharge, and be there for the people you care about.</p>\n<p><strong>Our Hybrid Work Model</strong></p>\n<p>BlackRock’s hybrid work model is designed to enable a culture of collaboration and apprenticeship that enriches the experience of our employees, while supporting flexibility for all. Employees are currently required to work at least 4 days in the office per week, with the flexibility to work from home 1 day a week. 
Some business groups may require more time in the office due to their roles and responsibilities. We remain focused on increasing the impactful moments that arise when we work together in person – aligned with our commitment to performance and innovation. As a new joiner, you can count on this hybrid model to accelerate your learning and onboarding experience here at BlackRock.</p>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_69369815-a11","directApply":true,"hiringOrganization":{"@type":"Organization","name":"BlackRock","sameAs":"https://jobs.workable.com","logo":"https://logos.yubhub.co/view.com.png"},"x-apply-url":"https://jobs.workable.com/view/2JsY2bUdeEEzUfhn796RPb/associate%2Fvice-president%2C-ai-infrastructure-engineer-in-edinburgh-at-blackrock","x-work-arrangement":"hybrid","x-experience-level":"mid","x-job-type":"full-time","x-salary-range":null,"x-skills-required":["AWS","Azure","GCP","Cloud compute","Networking","Storage","Security services","ML platform operations","Governance concepts","Model deployment strategies","Lifecycle management","Monitoring/observability","Disaster Recovery","LLMs","Generative AI platforms","Model serving infrastructure","AI and machine learning workloads","Managed compute","Fine-tuning","Experimentation","End-to-end MLOps pipeline flow","Data ingestion","Training","Validation","Deployment","Infrastructure as Code","Terraform","ARM/Bicep","CloudFormation","Programming","Scripting","Containerized and Kubernetes-based platforms","Reliability","Scalability","Observability","Operational best practices"],"x-skills-preferred":["GPU or accelerator-based infrastructure","Financial services or highly regulated industries","Multicloud architectures and enterprise governance 
requirements"],"datePosted":"2026-03-09T16:39:47.983Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Edinburgh, Scotland"}},"employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Finance","skills":"AWS, Azure, GCP, Cloud compute, Networking, Storage, Security services, ML platform operations, Governance concepts, Model deployment strategies, Lifecycle management, Monitoring/observability, Disaster Recovery, LLMs, Generative AI platforms, Model serving infrastructure, AI and machine learning workloads, Managed compute, Fine-tuning, Experimentation, End-to-end MLOps pipeline flow, Data ingestion, Training, Validation, Deployment, Infrastructure as Code, Terraform, ARM/Bicep, CloudFormation, Programming, Scripting, Containerized and Kubernetes-based platforms, Reliability, Scalability, Observability, Operational best practices, GPU or accelerator-based infrastructure, Financial services or highly regulated industries, Multicloud architectures and enterprise governance requirements"}]}