859c75b7-6fc Engineering Manager, Multimodal (API)

We are seeking an Engineering Manager to lead our multimodal API product suite. Your team will be responsible for delivering innovative APIs across real-time processing, speech transcription, speech generation, and image creation.

You will own the product roadmap for how we evolve our multimodal API offerings, and you will build the products that allow developers to reach millions of end users through AI audio, video, and images.

Responsibilities

Build, mentor, and grow a high-performing engineering team focused on multimodal API products – including our real-time API, our transcription models (Whisper), our speech generation models (TTS), and our image generation APIs (DALL·E and native 4o).
Collaborate closely with product managers, designers, and other stakeholders to define the strategic vision and product roadmap.
Work closely with our research teams to improve our core multimodal models for API customer use cases.
Guide technical and architectural decisions, emphasizing scalability, robustness, and user experience.
Foster a culture of innovation, continuous improvement, and accountability within your team.

Qualifications

Proven experience managing engineering teams that deliver complex, high-quality products at scale.
Strong technical background and proficiency in modern software engineering practices and system architecture.
Excellent collaboration and communication skills to effectively coordinate across diverse teams and stakeholders.
Familiarity with or strong interest in multimodal AI, including speech technologies, real-time systems, and image generation.
Ability to operate effectively in a fast-paced, ambiguous startup environment.

Preferred Qualifications

Experience developing multimodal systems or APIs in AI/ML domains, especially around image generation, audio generation, or speech transcription.
Familiarity with real-time streaming technologies, audio processing, and computer vision.
Hands-on experience with cloud platforms and distributed architectures.

XML job scraping automation by YubHub

Full time, mid, onsite, $293K – $385K
multimodal AI, speech technologies, real-time systems, image generation, cloud platforms, distributed architectures, audio generation, speech transcription, real-time streaming technologies, audio processing, computer vision
Engineering Technology
OpenAI
https://logos.yubhub.co/openai.com.png
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.
https://openai.com/
https://jobs.ashbyhq.com/openai/1d7f4747-54a3-4141-a39a-c6e7700e969b
San Francisco
2026-04-24

ab6520e8-f42 Software Engineer, Full Stack

You'll build the AI features at the heart of Gamma's content creation experience. This involves designing and implementing full-stack features powered by LLMs and image generation models, crafting AI workflows that feel intuitive and magical to users, and pushing the boundaries of how we integrate the latest models into our product.

You'll work across the entire stack, from prompt engineering to frontend integration, shaping how millions of people use AI to create presentations, documents, and websites. You'll partner closely with product, design, and engineering to turn AI capabilities into polished, delightful experiences that users reach for every day.

Our team has a strong in-office culture and works in person 4–5 days per week in San Francisco. We love working together to stay creative and connected, with flexibility to work from home when focus matters most.

Responsibilities

Design and implement full-stack AI features using LLMs and image generation models

Build, optimise, and evaluate AI workflows using Gamma's in-house JSX-based prompting framework

Experiment with new models and providers to improve content creation quality, cost, and reliability

Ship features that integrate cutting-edge AI into intuitive user experiences across millions of daily requests

Collaborate with product and design to translate AI possibilities into polished, user-facing features

Optimise AI performance and cost at scale, balancing quality with speed and reliability

Requirements

3+ years building production web applications with strong full-stack fundamentals, including proficiency in TypeScript, React, and Node.js (or similar)

Experience working with LLMs, prompt engineering, or AI/ML systems in production, with genuine curiosity about new models and capabilities

Familiarity with frontend engineering, state management, data models, and API design

Strong product sense, with the ability to translate technical AI capabilities into real user value

A curious, self-driven mindset that matches your technical skill

Experience building AI-powered consumer products or working with image generation models (Nice to have)

Compensation

The base salary for this full-time position, which spans multiple internal levels depending on qualifications, ranges between $180K - $275K plus benefits & equity.

Full time, mid, hybrid, $180K - $275K
TypeScript, React, Node.js, LLMs, prompt engineering, AI/ML systems, frontend engineering, state management, data models, API design, image generation models
Engineering Technology
Gamma
https://logos.yubhub.co/gamma.com.png
Gamma is a technology company that creates content creation experiences.
https://gamma.com
https://jobs.ashbyhq.com/gamma/068f61d9-3dad-4564-9bca-7b4547e69e55
San Francisco
2026-04-24

0ced2527-01e Member of Technical Staff, Principal Tech Lead Manager, Image Generation

We are hiring a Principal Tech Lead Manager to own and grow Copilot's image generation capabilities. This is a high-impact, high-ownership role at the intersection of frontier model integration, evaluation science, and product quality. The successful candidate will set the technical direction for image generation, lead a team of Applied AI engineers and platform engineers, and drive measurable improvements in image quality, user satisfaction, and first-run success.

Responsibilities:

Define and own the end-to-end technical roadmap for Copilot image generation, from prompt understanding and model integration to evaluation, quality, and reliability.
Establish architectural best practices for image generation pipelines, including prompt conditioning, safety filtering, multimodal grounding, and user personalization.
Partner with research, product, and infrastructure teams to translate long-horizon vision into executable milestones.
Serve as the primary technical decision-maker for model selection, integration approaches, and evaluation strategy.
Lead the integration, evaluation, and launch of new image generation models and capabilities into Copilot surfaces.
Drive improvements to first-run success rate, image quality, prompt adherence, and overall user satisfaction with generated images.
Build and own the feedback loop from user signals to model and prompt iteration.
Close the loop from production data to systematic improvements.
Identify and resolve failure modes in image generation including safety gaps, quality regressions, prompt misinterpretation, and accessibility issues.

Qualifications:

Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Preferred: Master's degree, PhD, or equivalent experience in Computer Science, Applied Math, Statistics, or a related field.
5+ years of industry experience in applied ML, AI product engineering, or related disciplines.
1+ years managing a team of engineers or data scientists.
Demonstrated experience shipping production AI or ML systems at scale.
Prior experience leading or mentoring engineers, including in formal tech lead and senior IC roles.
Hands-on experience building evaluation frameworks or quality systems for AI/ML products.
Track record of driving measurable quality or performance improvements through systematic iteration.
Direct experience with image generation models (diffusion models, GANs, multimodal models, etc.).
Experience building and owning hillclimbing infrastructure for generative AI.
Formal people management experience, including hiring and performance management.
Experience working on consumer AI products with high user visibility and quality bar.
Prior work in responsible AI for generative media, including safety filtering and content policy enforcement.

full-time, senior, onsite, $139,900 - $274,800 per year
C, C++, C#, Java, JavaScript, Python, Computer Science, Applied Math, Statistics, AI, Machine Learning, Image Generation, Diffusion Models, GANs, Multimodal Models, Hillclimbing Infrastructure, Generative AI, Responsible AI, Safety Filtering, Content Policy Enforcement
Engineering Technology
Microsoft
https://logos.yubhub.co/microsoft.ai.png
Microsoft is a multinational technology company that develops, manufactures, licenses, and supports a wide range of software products, services, and devices.
https://microsoft.ai
https://microsoft.ai/job/member-of-technical-staff-principal-tech-lead-manager-image-generation/
Redmond
2026-04-24

baf944dc-dee Image Tutor

About the Role

As an Image Specialist, you will contribute to xAI's mission by training and refining Grok's ability to interpret and generate compelling and tasteful visual content.

Responsibilities

Use proprietary software to provide labels, annotations, and inputs on projects involving images and visual media.
Deliver high-quality curated data that captures visual nuances and enhances Grok's understanding of composition, lighting, color, and style.

Basic Qualifications

Portfolio displaying excellence in visual work - photography, digital art, illustration, graphic design, or similar.
Strong skills in composition, lighting, color theory, and visual storytelling.
Hands-on experience with AI image generation tools.
Ability to critically analyze and articulate what makes an image work or fail.
Strong communication and analytical skills.

Preferred Skills and Experience

Experience in photo editing, retouching, or post-production workflows.
Familiarity with prompt engineering techniques for image generation models.
Background in art direction, visual critique, or creative feedback roles.
Familiarity with ComfyUI workflows.

Location and Other Expectations

Tutor roles may be offered as full-time, part-time, or contractor positions, depending on role needs and candidate fit.
For contractor positions, hours will vary widely based on project scope and contractor availability, with no fixed commitments required.
Tutor roles may be performed remotely from any location worldwide, subject to legal eligibility, time-zone compatibility, and role-specific needs.

Compensation and Benefits

US-based candidates: $45/hour - $75/hour depending on factors including relevant experience, skills, education, geographic location, and qualifications. International candidates: Information will be provided to you during the recruitment process.

Benefits vary based on employment type, location, and jurisdiction. Benefits for eligible U.S.-based positions include health insurance, 401(k) plan, and paid sick leave. Specific details and role-specific information will be provided to you during the interview process.

full-time|part-time|contractor, remote, $45/hour - $75/hour
proprietary software, composition, lighting, color theory, visual storytelling, AI image generation tools, critical analysis, communication, analytical skills, photo editing, retouching, post-production workflows, prompt engineering techniques, art direction, visual critique, creative feedback roles, ComfyUI workflows
Engineering Technology
xAI
https://logos.yubhub.co/xai.com.png
xAI creates AI systems to understand the universe and aid humanity in its pursuit of knowledge.
https://www.xai.com/
https://job-boards.greenhouse.io/xai/jobs/5047544007
Remote
2026-04-18

6e0e451e-837 Account Executive

We're looking for an Account Executive to join our team. As an Account Executive, you will own the full sales cycle from qualification through close for SMB, startup, AI-native, and enterprise customers. You will manage a book of 30-50+ accounts, driving retention, adoption, and expansion. You will qualify inbound leads and run outbound prospecting into target segments. You will execute sales campaigns in collaboration with sales leadership. You will handle deals across API access and weight licensing. You will translate customer needs and usage patterns into expansion opportunities. You will collaborate with solutions engineering and product on technical questions.

This is a full-cycle role, and you will be responsible for everything from first touch to close to renewal and upsell. You will work with a variety of accounts, including fast-moving AI startups and household-name enterprises. You will need to be able to land deals, prove value, and grow accounts into strategic relationships.

If you're already closing deals and want to be at the forefront of the generative AI wave, this is the role for you.

We're looking for someone who has closed real deals, moves fast, and is hungry to have outsized impact at a company where your work directly drives revenue. You should have 2-4 years of experience in a closing sales role, ideally with a BDR-to-AE progression. You should be comfortable discussing model capabilities, API integration, and deployment with technical buyers. You should have a track record of managing a high volume of accounts and deals in parallel without dropping balls. You should be organized, disciplined, and self-directed.

full-time, mid, hybrid
Sales, Generative AI, API Integration, Deployment, Model Capabilities, Technical Buyers, Image Generation, Video Generation, Open-Source Business Models
Sales Technology
Black Forest Labs
https://logos.yubhub.co/blackforestlabs.com.png
Black Forest Labs is a research lab that develops foundational technologies for generative models, which power image and video creation. The company is headquartered in Freiburg, Germany, and has a growing presence in San Francisco.
https://www.blackforestlabs.com/
https://job-boards.greenhouse.io/blackforestlabs/jobs/5142062008
Freiburg (Germany)
2026-04-17

930f14dd-03d Member of Technical Staff - Multimodal Safety - MAI Super Intelligence Team

As a Member of Technical Staff, Multimodal Safety, you will work to develop and implement cutting-edge safety methodologies for post-training multimodal large language models to be served to millions of users through Copilot every day.

We work on the bleeding edge and leverage the most powerful pretrained models and algorithms, making it critical that we ensure our AI systems behave safely and align with organisational values.

You will be responsible for designing novel safety evaluation frameworks, curating high-quality data for robust evaluations and training, prototyping new safety capabilities, and developing safety-focused fine-tuning algorithms.

We're looking for outstanding individuals with deep expertise in multimodal AI safety who can translate research insights into practical solutions while being a strong communicator and collaborative teammate.

The ideal candidate takes the initiative in exploring new safety methodologies and enjoys building world-class, trustworthy AI experiences in a fast-paced applied research environment.

Microsoft's mission is to empower every person and every organisation on the planet to achieve more.

As employees we come together with a growth mindset, innovate to empower others, and collaborate to realise our shared goals.

Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities:

Leverage expertise in multimodal safety to uncover potential risks and develop novel mitigation strategies, including alignment techniques and robustness improvements for multimodal large language models.

Create and implement comprehensive evaluation frameworks and red-teaming methodologies to assess model safety across diverse scenarios, edge cases, and potential failure modes.

Build automated safety testing systems, generalise safety solutions into repeatable frameworks, and write efficient code for safety pipelines and intervention systems.

Maintain a user-oriented perspective by understanding safety needs from user perspectives, validating safety approaches through user research, and serving as a trusted advisor on multimodal safety matters.

Track advances in multimodal safety research, identify relevant state-of-the-art techniques, and adapt safety algorithms to drive innovation in production systems serving millions of users.

Embody our culture and values.

full-time, staff, hybrid, $119,800 - $234,700 per year
multimodal safety, diffusion models, image generation, video generation, audio generation, safety evaluation frameworks, red-teaming methodologies, automated safety testing systems, safety pipelines, intervention systems, multimodal LLM safety, evaluation frameworks, automated red-teaming, guardrail systems, safety pipelines, user-validated safety decisions
Engineering Technology
Microsoft
https://logos.yubhub.co/microsoft.ai.png
Microsoft is a multinational technology company that develops, manufactures, licenses, and supports a wide range of software products, services, and devices.
https://microsoft.ai
https://microsoft.ai/job/member-of-technical-staff-multimodal-safety-mai-super-intelligence-team-3/
New York
2026-03-08

d3a39f4c-d95 Software Engineer, Inference - Multi Modal

Location: San Francisco
Employment Type: Full time
Department: Scaling
Compensation: $295K – $555K • Offers Equity

The base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If the role is non-exempt, overtime pay will be provided consistent with applicable laws. In addition to the salary range listed above, total compensation also includes generous equity, performance-related bonus(es) for eligible employees, and the following benefits.

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts

Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)

401(k) retirement plan with employer match

Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)

Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees

13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)

Mental health and wellness support

Employer-paid basic life and disability coverage

Annual learning and development stipend to fuel your professional growth

Daily meals in our offices, and meal delivery credits as eligible

Relocation support for eligible employees

Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

More details about our benefits are available to candidates during the hiring process.

This role is at-will and OpenAI reserves the right to modify base pay and other compensation components at any time based on individual performance, team or company results, or market conditions.

About the Team

OpenAI’s Inference team powers the deployment of our most advanced models - including our GPT models, 4o Image Generation, and Whisper - across a variety of platforms. Our work ensures these models are available, performant, and scalable in production, and we partner closely with Research to bring the next generation of models into the world. We're a small, fast-moving team of engineers focused on delivering a world-class developer experience while pushing the boundaries of what AI can do.

We’re expanding into multimodal inference, building the infrastructure needed to serve models that handle image, audio, and other non-text modalities. These workloads are inherently more heterogeneous and experimental, involving diverse model sizes and interactions, more complex input/output formats, and tighter coordination with product and research.

About the Role

We’re looking for a software engineer to help us serve OpenAI’s multimodal models at scale. You’ll be part of a small team responsible for building reliable, high-performance infrastructure for serving real-time audio, image, and other multimodal workloads in production.

This work is inherently cross-functional: you’ll collaborate directly with researchers training these models and with product teams defining new modalities of interaction. You'll build and optimize the systems that let users generate speech, understand images, and interact with models in ways far beyond text.

In this role, you will:

Design and implement inference infrastructure for large-scale multimodal models.

Optimize systems for high-throughput, low-latency delivery of image and audio inputs and outputs.

Enable experimental research workflows to transition into reliable production services.

Collaborate closely with researchers, infra teams, and product engineers to deploy state-of-the-art capabilities.

Contribute to system-level improvements including GPU utilization, tensor parallelism, and hardware abstraction layers.

You might thrive in this role if you:

Have experience building and scaling inference systems for LLMs or multimodal models.

Have worked with GPU-based ML workloads and understand the performance dynamics of large models, especially with complex data like images or audio.

Enjoy experimental, fast-evolving work and collaborating closely with research.

Are comfortable dealing with systems that span networking, distributed compute, and high-throughput data handling.

Have familiarity with inference tooling like vLLM, TensorRT-LLM, or custom model parallel systems.

Own problems end-to-end and are excited to operate in ambiguous, fast-moving spaces.

Nice to Have:

Experience working with image generation or audio synthesis models in production.

Exposure to distributed ML training or system-efficient model design.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

full-time, mid, onsite, $295K – $555K • Offers Equity
Software Engineer, Inference Infrastructure, GPU-based ML Workloads, Tensor Parallelism, Hardware Abstraction Layers, vLLM, TensorRT-LLM, Custom Model Parallel Systems, Image Generation, Audio Synthesis, Distributed ML Training, System-Efficient Model Design
Engineering Technology
OpenAI
https://logos.yubhub.co/openai.com.png
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products.
https://jobs.ashbyhq.com
https://jobs.ashbyhq.com/openai/4d14449e-5e7f-45d4-b103-8776a6c87086
San Francisco
2026-03-06