<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>0e51b45e-f97</externalid>
      <Title>Executive Business Partner</Title>
      <Description><![CDATA[<p>We&#39;re hiring an Executive Business Partner to support several technical leaders out of our New York office. This is a non-traditional EA role, requiring creativity in adapting to different people&#39;s work styles and the new challenges that emerge at a fast-moving startup.</p>
<p>You will help our team stay focused and organised, managing personal logistics and any tasks that might fall through the cracks. You&#39;ll also be the operational lead for our NYC office, coordinating with other locations, planning team events, and managing travel and schedules.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Managing calendars, scheduling meetings, and coordinating travel for 3-4 technical leaders</li>
<li>Serving as the primary point of contact between your supported leaders and the rest of the company</li>
<li>Owning NYC office operations: coordinating with SF, managing local logistics including swag and supplies, planning team events and offsites</li>
<li>Supporting recruiting coordination efforts</li>
<li>Tracking projects and commitments so nothing falls through the cracks</li>
</ul>
<p>This role entails real autonomy in making decisions without tight supervision. As the senior operations person in NYC, you will have the opportunity to make a significant impact on the team&#39;s success.</p>
<p>We offer a competitive salary range of $200,000 - $250,000 USD per year, depending on background, skills, and experience. We also provide generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>executive</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$200,000 - $250,000 USD per year</Salaryrange>
      <Skills>Executive or administrative support experience, Track record of supporting technical leaders, Adaptability and flexibility, Discretion and professionalism, Coordinating travel and logistics, Experience managing a satellite office, Coordinating across time zones, Adapting to a fast-changing role, Having followed an executive to a new company</Skills>
      <Category>Operations</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachines.ai.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is a technology company that specialises in developing artificial intelligence products.</Employerdescription>
      <Employerwebsite>https://thinkingmachines.ai/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5142474008</Applyto>
      <Location>New York, New York</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>b79d9627-55a</externalid>
      <Title>Research Engineer, Infrastructure, Training Systems</Title>
      <Description><![CDATA[<p>We&#39;re seeking an infrastructure research engineer to design and build scalable, efficient training systems for large models. As a key member of our team, you&#39;ll take ownership of the training stack end-to-end, ensuring every GPU cycle drives scientific progress. Your goal is to make experimentation and training at Thinking Machines fast and reliable, allowing our research teams to focus on science, not system bottlenecks.</p>
<p>Key responsibilities include designing, implementing, and optimizing distributed training systems, developing high-performance optimizations, and establishing standards for reliability, maintainability, and security. You&#39;ll collaborate with researchers and engineers to build scalable infrastructure and publish learnings through internal documentation, open-source libraries, or technical reports.</p>
<p>We&#39;re looking for someone who blends deep systems and performance expertise with a curiosity for machine learning at scale. A strong understanding of deep learning frameworks, such as PyTorch, and experience working on distributed training for large models are preferred. If you have a track record of improving research productivity through infrastructure design or process improvements, that&#39;s a plus.</p>
<p>This role is based in San Francisco, California, and offers a competitive salary range of $350,000 - $475,000 USD per year, depending on background, skills, and experience. We sponsor visas and offer generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$350,000 - $475,000 USD per year</Salaryrange>
      <Skills>deep learning frameworks (e.g., PyTorch), distributed training for large models, high-performance optimizations, reliability, maintainability, and security standards, scalable infrastructure, track record of improving research productivity through infrastructure design or process improvements, contributions to open-source ML infrastructure</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachines.ai.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is an AI research and product company whose team includes builders of widely used AI products such as ChatGPT and Character.ai, and contributors to open-source projects like PyTorch.</Employerdescription>
      <Employerwebsite>https://thinkingmachines.ai/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5013932008</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>7e28478b-c37</externalid>
      <Title>Research, Audio Expertise</Title>
      <Description><![CDATA[<p>We&#39;re seeking a researcher to advance the frontier of audio capabilities. You&#39;ll explore how audio models enable more natural and efficient communication and collaboration, preserving more information and capturing user intent.</p>
<p>This is a highly collaborative role. You&#39;ll work closely across pre-training, post-training, and product with world-class researchers, infrastructure engineers, and designers.</p>
<p>As a researcher in this role, you&#39;ll be expected to:</p>
<ul>
<li>Own research projects on audio training, low-latency inference, and conversational responsiveness.</li>
<li>Design and train large-scale models that natively support audio input and output.</li>
<li>Investigate scaling behaviour such as how data, model size, and compute affect capability and efficiency.</li>
<li>Build and maintain audio data pipelines, including preprocessing, filtering, segmentation, and alignment for training and evaluation.</li>
<li>Collaborate with data and infrastructure teams to scale audio training efficiently across distributed systems.</li>
<li>Publish and present research that moves the entire community forward.</li>
</ul>
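<p>As a small, hypothetical illustration of the pipeline work above (all names and parameters are invented for this sketch), a fixed-window segmenter splits a mono sample stream into overlapping windows; real pipelines would add voice-activity detection, filtering, and alignment:</p>

```python
def segment_audio(samples, sample_rate, window_s=2.0, hop_s=1.0):
    """Split a mono sample stream into fixed-length overlapping windows,
    a basic preprocessing step for audio training pipelines
    (illustrative sketch, not a production implementation)."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    segments = []
    # Slide the window by `hop` samples; a clip shorter than one window
    # still yields a single (short) segment.
    for start in range(0, max(len(samples) - window + 1, 1), hop):
        segments.append(samples[start:start + window])
    return segments
```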
<p>You&#39;ll also share code, datasets, and insights that accelerate progress across industry and academia.</p>
<p>This role blends fundamental research and practical engineering; we do not distinguish between the two internally. You will be expected to write high-performance code and read technical reports.</p>
<p>It&#39;s an excellent fit for someone who enjoys both deep theoretical exploration and hands-on experimentation, and who wants to shape the foundations of how AI learns.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid|senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$350,000 - $475,000 USD</Salaryrange>
      <Skills>Python, PyTorch, TensorFlow, JAX, Machine Learning, Deep Learning, Distributed Compute Environments, Probability, Statistics, Real-time Inference, Streaming Architectures, Optimization for Low Latency, Large-Scale Audio or Multimodal Models, Speech, Audio, Voice, or Similar Areas</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachines.ai.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is a research organisation that focuses on advancing collaborative general intelligence through AI products and open-source projects.</Employerdescription>
      <Employerwebsite>https://thinkingmachines.ai/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5002212008</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>0a2ea62c-943</externalid>
      <Title>Research Engineer, Infrastructure, RL Systems</Title>
      <Description><![CDATA[<p>We&#39;re looking for an infrastructure research engineer to design and build the core systems that enable scalable, efficient training of large models through reinforcement learning.</p>
<p>This role sits at the intersection of research and large-scale systems engineering: a builder who understands both the algorithms behind RL and the realities of distributed training and inference at scale. You&#39;ll wear many hats, from optimising rollout and reward pipelines to enhancing reliability, observability, and orchestration, collaborating closely with researchers and infra teams to make reinforcement learning stable, fast, and production-ready.</p>
<p>Responsibilities:</p>
<ul>
<li>Design, build, and optimise the infrastructure that powers large-scale reinforcement learning and post-training workloads.</li>
<li>Improve the reliability and scalability of RL training pipelines, distributed RL workloads, and training throughput.</li>
<li>Develop shared monitoring and observability tools to ensure high uptime, debuggability, and reproducibility for RL systems.</li>
<li>Collaborate with researchers to translate algorithmic ideas into production-grade training pipelines.</li>
<li>Build evaluation and benchmarking infrastructure that measures model progress on helpfulness, safety, and factuality.</li>
<li>Publish and share learnings through internal documentation, open-source libraries, or technical reports that advance the field of scalable AI infrastructure.</li>
</ul>
<p>We&#39;re looking for someone with strong engineering skills: the ability to contribute performant, maintainable code and to debug complex codebases. You should also have a good understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.</p>
<p>Experience training or supporting large-scale language models with tens of billions of parameters or more is a plus. Familiarity with monitoring and observability tools (Prometheus, Grafana, OpenTelemetry) is also a plus.</p>
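<p>As a minimal, hypothetical illustration of the reward post-processing such an RL pipeline performs (names invented for this sketch), the following computes per-step discounted returns for one rollout before a policy update:</p>

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute per-step discounted returns for one rollout trajectory,
    a standard reward post-processing step in RL training pipelines
    (illustrative sketch, not a production system)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards, accumulating discounted reward.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```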
<p>Logistics:</p>
<ul>
<li>Location: This role is based in San Francisco, California.</li>
<li>Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $350,000 - $475,000 USD.</li>
<li>Visa sponsorship: We sponsor visas. While we can&#39;t guarantee success for every candidate or role, if you&#39;re the right fit, we&#39;re committed to working through the visa process together.</li>
<li>Benefits: Thinking Machines offers generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$350,000 - $475,000 USD</Salaryrange>
      <Skills>deep learning frameworks (PyTorch, JAX), debugging complex codebases, scalable AI infrastructure, experience training or supporting large-scale language models, familiarity with monitoring and observability tools (Prometheus, Grafana, OpenTelemetry)</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachines.ai.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is a research organisation that focuses on developing collaborative general intelligence.</Employerdescription>
      <Employerwebsite>https://thinkingmachines.ai/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5013930008</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>07a3c83e-51e</externalid>
      <Title>Research Engineer, Infrastructure, Numerics</Title>
      <Description><![CDATA[<p>We&#39;re looking for an infrastructure research engineer to design and build the core systems that enable efficient large-scale model training with a focus on numerics. You will focus on improving the numerical foundations of our distributed training stack, from precision formats and kernel optimizations to communication frameworks that make training trillion-parameter models stable, scalable, and fast.</p>
<p>This role is ideal for someone who thrives at the intersection of research and systems engineering: a builder who understands both the math of optimization and the realities of distributed compute.</p>
<p>Responsibilities:</p>
<ul>
<li>Design and optimize distributed training infrastructure for large-scale LLMs, focusing on performance, stability, and reproducibility across multi-GPU and multi-node setups.</li>
<li>Implement and evaluate low-precision numerics (for example, BF16, MXFP8, NVFP4) to improve efficiency without sacrificing model quality.</li>
<li>Develop kernels and communication primitives that use hardware-level support for mixed and low-precision arithmetic.</li>
<li>Collaborate with research teams to co-design model architectures and training recipes that align with emerging numeric formats and stability constraints.</li>
<li>Prototype and benchmark scaling strategies such as data, tensor, and pipeline parallelism that integrate precision-adaptive computation and quantized communication.</li>
<li>Contribute to the design of our internal orchestration and monitoring systems to ensure that thousands of distributed experiments can run efficiently and reproducibly.</li>
<li>Publish and share learnings through internal documentation, open-source libraries, or technical reports that advance the field of scalable AI infrastructure.</li>
</ul>
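<p>To make the low-precision item above concrete, here is an illustrative, hypothetical sketch (not the actual MXFP8/NVFP4 specification) of the core idea behind block floating-point formats: one power-of-two exponent shared across a block, with a few mantissa bits per element:</p>

```python
import math

def quantize_block_fp(values, mantissa_bits=3):
    """Quantize a block of floats to a shared-exponent, block
    floating-point representation (illustrative sketch of the MX idea;
    real formats pack bits and handle rounding modes carefully)."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return [0.0] * len(values)
    # Shared exponent: derive one power-of-two scale from the block
    # maximum so the largest magnitude fits in the mantissa range.
    shared_exp = math.floor(math.log2(amax))
    scale = 2.0 ** (shared_exp - mantissa_bits + 1)
    # Round each element to the nearest representable step.
    return [round(v / scale) * scale for v in values]
```

The quantization error for small elements in a block grows when the block maximum is large, which is one of the numerical trade-offs this role would evaluate.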
<p>Skills and Qualifications:</p>
<p>Minimum qualifications:</p>
<ul>
<li>Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar.</li>
<li>Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.</li>
<li>Thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts.</li>
<li>A bias for action: you take the initiative to work across different stacks and teams wherever you spot an opportunity, making sure things ship.</li>
<li>Strong engineering skills, ability to contribute performant, maintainable code and debug in complex codebases in areas such as floating-point numerics, low-precision arithmetic, and distributed systems.</li>
</ul>
<p>Preferred qualifications (we encourage you to apply if you meet some but not all of these):</p>
<ul>
<li>Familiarity with distributed frameworks such as PyTorch/XLA, DeepSpeed, Megatron-LM.</li>
<li>Experience implementing FP8, INT8, or block-floating point (MX) formats and understanding their numerical trade-offs.</li>
<li>Prior contributions to open-source deep learning infrastructure such as PyTorch, DeepSpeed, or XLA.</li>
<li>Publications, patents, or projects related to numerical optimization, communication-efficient training, or systems for large models.</li>
<li>Experience training and supporting large-scale AI models.</li>
<li>Track record of improving research productivity through infrastructure design or process improvements.</li>
</ul>
<p>Logistics:</p>
<ul>
<li>Location: This role is based in San Francisco, California.</li>
<li>Compensation: Depending on background, skills and experience, the expected annual salary range for this position is $350,000 - $475,000 USD.</li>
<li>Visa sponsorship: We sponsor visas. While we can&#39;t guarantee success for every candidate or role, if you&#39;re the right fit, we&#39;re committed to working through the visa process together.</li>
<li>Benefits: Thinking Machines offers generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$350,000 - $475,000 USD</Salaryrange>
      <Skills>Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar, Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures, Thriving in a highly collaborative environment involving many, different cross-functional partners and subject matter experts, Strong engineering skills, ability to contribute performant, maintainable code and debug in complex codebases in areas such as floating-point numerics, low-precision arithmetic, and distributed systems, Familiarity with distributed frameworks such as PyTorch/XLA, DeepSpeed, Megatron-LM, Experience implementing FP8, INT8, or block-floating point (MX) formats and understanding their numerical trade-offs, Prior contributions to open-source deep learning infrastructure such as PyTorch, DeepSpeed, or XLA, Publications, patents, or projects related to numerical optimization, communication-efficient training, or systems for large models, Experience training and supporting large-scale AI models, Track record of improving research productivity through infrastructure design or process improvements</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachines.ai.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is an AI company whose team includes builders of widely used AI products such as ChatGPT and Character.ai, and contributors to open-source projects like PyTorch.</Employerdescription>
      <Employerwebsite>https://thinkingmachines.ai/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5013937008</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>ea436883-1a8</externalid>
      <Title>Recruiting Coordination and Operations Specialist</Title>
      <Description><![CDATA[<p>We&#39;re hiring a Recruiting Coordinator to help us build a best-in-class recruiting engine as we scale. This is a coordination-heavy role, but the scope goes beyond scheduling - you&#39;ll own the systems and processes that keep the team organized and moving, and you&#39;ll be expected to improve them over time.</p>
<p>This means helping to build the processes, documentation, and systems that grow with us, not just executing the ones that already exist. Beyond scheduling, you&#39;ll plug into broader recruiting operations such as talent mapping, reporting, referral processing, and events - wherever the team needs support.</p>
<p>Your responsibilities will include:</p>
<ul>
<li>Maintaining data integrity across our ATS and recruiting tools - auditing regularly and keeping things accurate.</li>
<li>Supporting talent mapping, referral processing, inbound application triage, and recruiting events.</li>
<li>Collecting, organizing, and auditing interview feedback, tracking completion and following up so nothing stalls or gets lost.</li>
<li>Identifying process gaps and proposing improvements - this role is expected to make the function better, not just execute.</li>
<li>Maintaining up-to-date recruiting information in documentation and in Slack.</li>
</ul>
<p>You&#39;ll also manage high-volume interview coordination across multiple time zones, calendars, and stakeholders while maintaining a world-class candidate experience:</p>
<ul>
<li>Coordinate all interview logistics, from travel and expenses to room setup and technical support.</li>
<li>Serve as the main point of contact for candidates, communicating in a warm, professional, and informative manner from initial screening through offer.</li>
<li>Anticipate scheduling bottlenecks before they happen: monitor interviewer availability, flag conflicts early, and keep hiring velocity high.</li>
<li>Scale interviewer pools by partnering with teams to identify, train, and prepare new interviewers as hiring volume grows.</li>
<li>Use modern scheduling tools (e.g., Greenhouse, ModernLoop) to automate scheduling, reduce manual work, and improve response speed.</li>
</ul>
<p>The ideal candidate will have experience coordinating schedules, managing calendars, or supporting operations in a fast-paced environment. They&#39;ll be excellent at written and verbal communication, able to adapt quickly to different teams and roles, and comfortable with Google Workspace, scheduling tools, and Slack.</p>
<p>Logistics:</p>
<ul>
<li>Location: San Francisco, California.</li>
<li>Compensation: $140,000 - $200,000 USD per year, depending on background, skills, and experience.</li>
<li>Visa sponsorship: Yes.</li>
<li>Benefits: Generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$140,000 - $200,000 USD</Salaryrange>
      <Skills>Experience coordinating schedules, managing calendars, or supporting operations in a fast-paced environment, Excellent written and verbal communication skills, Ability to adapt quickly to different teams and roles, Comfortable with Google Workspace, scheduling tools, and Slack, Do-what-it-takes mindset: proactive, detail-oriented, and able to prioritize tasks effectively, 1+ years in a high-volume scheduling or recruiting coordination role, Experience with ATS, HRIS, and scheduling tools like Greenhouse, GoodTime, Workday, and ModernLoop, Experience partnering closely with recruiters and hiring managers, Proven ability to be proactive in ambiguous situations and propose multiple solutions, Experience supporting technical or leadership-level hiring</Skills>
      <Category>Operations</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachines.ai.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is a technology company whose team includes builders of widely used AI products such as ChatGPT and Character.ai.</Employerdescription>
      <Employerwebsite>https://thinkingmachines.ai/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5156656008</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>cba88898-896</externalid>
      <Title>Research Engineer, Infrastructure, Kernels</Title>
      <Description><![CDATA[<p>We&#39;re looking for an infrastructure research engineer to design, optimize, and maintain the compute foundations that power large-scale language model training. You will develop high-performance ML kernels (e.g., CUDA, CuTe, Triton), enable efficient low-precision arithmetic, and improve the distributed compute stack that makes training large models possible.</p>
<p>This role is perfect for an engineer who enjoys working close to the metal and across the research boundary. You&#39;ll collaborate with researchers and systems architects to bridge algorithmic design with hardware efficiency. You&#39;ll prototype new kernel implementations, profile performance across hardware generations, and help define the numerical and parallelism strategies that determine how we scale next-generation AI systems.</p>
<p><strong>Responsibilities</strong></p>
<ul>
<li>Design and implement custom ML kernels (e.g., CUDA, CuTe, Triton) for core LLM operations such as attention, matrix multiplication, gating, and normalization, optimized for modern GPU and accelerator architectures.</li>
<li>Design compute primitives that reduce memory bandwidth bottlenecks and improve kernel compute efficiency.</li>
<li>Collaborate with research teams to align kernel-level optimizations with model architecture and algorithmic goals.</li>
<li>Develop and maintain a library of reusable kernels and performance benchmarks that serve as the foundation for internal model training.</li>
<li>Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources.</li>
<li>Document and share insights through internal talks, technical papers, or open-source contributions to strengthen the broader ML systems community.</li>
</ul>
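<p>To make the kernel-level work above concrete, here is a pure-Python illustration of the tiling scheme that GPU matmul kernels implement; real CUDA/Triton kernels stage tiles in shared memory and registers, and this hypothetical sketch only shows the blocked loop structure:</p>

```python
def tiled_matmul(a, b, tile=2):
    """Tiled matrix multiply: the blocking scheme GPU matmul kernels use
    to keep operands resident in fast on-chip memory (illustrative,
    pure-Python sketch; not a real kernel)."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    # Iterate over output tiles, accumulating partial products one
    # k-tile at a time, analogous to how a kernel accumulates fragments
    # loaded into shared memory. min() handles ragged edge tiles.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for kk in range(k0, min(k0 + tile, k)):
                            out[i][j] += a[i][kk] * b[kk][j]
    return out
```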
<p><strong>Skills and Qualifications</strong></p>
<p>Minimum qualifications:</p>
<ul>
<li>Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar.</li>
<li>Strong engineering skills: the ability to contribute performant, maintainable code and debug in complex codebases.</li>
<li>Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.</li>
<li>Thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts.</li>
<li>A bias for action: you take the initiative to work across different stacks and teams wherever you spot an opportunity, making sure things ship.</li>
<li>Proficiency in CUDA, CuTe, Triton, or other GPU programming frameworks.</li>
<li>Demonstrated ability to analyze, profile, and optimize compute-intensive workloads.</li>
</ul>
<p>Preferred qualifications:</p>
<ul>
<li>Experience training or supporting large-scale language models with tens of billions of parameters or more.</li>
<li>Track record of improving research productivity through infrastructure design or process improvements.</li>
<li>Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators.</li>
<li>Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks.</li>
<li>Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM).</li>
<li>Contributions to open-source GPU, ML systems, or compiler optimization projects.</li>
<li>Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI infrastructure.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$350,000 - $475,000 USD</Salaryrange>
      <Skills>CUDA, CuTe, Triton, GPU programming frameworks, Deep learning frameworks (e.g., PyTorch, JAX), Computer science, Electrical engineering, Statistics, Machine learning, Physics, Robotics, Experience training or supporting large-scale language models with tens of billions of parameters or more, Track record of improving research productivity through infrastructure design or process improvements, Experience developing or tuning kernels for deep learning frameworks such as PyTorch, JAX, or custom accelerators, Familiarity with tensor parallelism, pipeline parallelism, or distributed data processing frameworks, Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (e.g., XLA, TVM), Contributions to open-source GPU, ML systems, or compiler optimization projects, Prior research or engineering experience in numerical optimization, communication-efficient training, or scalable AI infrastructure</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachines.ai.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is a technology company whose team includes builders of widely used AI products, including ChatGPT and Character.ai, and open-source projects like PyTorch.</Employerdescription>
      <Employerwebsite>https://thinkingmachines.ai/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5013934008</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>9be280f4-cbc</externalid>
      <Title>Software Engineer, Data Infrastructure</Title>
      <Description><![CDATA[<p>We&#39;re looking for an engineer to join our small, high-impact team responsible for architecting and scaling the core infrastructure behind distributed training pipelines, multimodal data catalogs, and intelligent processing systems that operate over petabytes of data.</p>
<p>As a software engineer on our data infrastructure team, you&#39;ll design, build, and operate scalable, fault-tolerant infrastructure for LLM research: distributed compute, data orchestration, and storage across modalities. You&#39;ll develop high-throughput systems for data ingestion, processing, and transformation, including training data catalogs, deduplication, quality checks, and search. You&#39;ll also build systems for traceability, reproducibility, and robust quality control at every stage of the data lifecycle.</p>
<p>You&#39;ll collaborate with research teams to unlock new features, improve data quality, and accelerate training cycles. You&#39;ll implement and maintain monitoring and alerting to support platform reliability and performance.</p>
<p>If you&#39;re excited by distributed systems, large-scale data mining, open-source tools like Spark, Kafka, Beam, Ray, and Delta Lake, and enjoy building from the ground up, we&#39;d love to hear from you.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>entry|mid|senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$350,000 - $475,000 USD</Salaryrange>
      <Skills>backend language (Python or Rust), distributed compute frameworks (Apache Spark or Ray), cloud infrastructure, data lake architectures, batch and streaming pipelines, Kafka, dbt, Terraform, Airflow, web crawler, deduplication, data mining, search, file formats and storage systems</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachines.ai.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is a research organisation that focuses on developing collaborative general intelligence.</Employerdescription>
      <Employerwebsite>https://thinkingmachines.ai/</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>350000</Compensationmin>
      <Compensationmax>475000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5013919008</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>4ced2159-802</externalid>
      <Title>Research, Vision Expertise</Title>
      <Description><![CDATA[<p>Thinking Machines Lab is seeking a researcher to join their team in San Francisco. The successful candidate will work on advancing the science of visual perception and multimodal learning. They will design architectures that fuse pixels and text, build datasets and evaluation methods that test real-world comprehension, and develop representations that let models ground abstract concepts in the physical world.</p>
<p>The ideal candidate will have expertise in multimodality and experience running large-scale experiments. They will be comfortable contributing to complex engineering systems and have a strong grasp of probability, statistics, and machine learning fundamentals.</p>
<p>This is an evergreen role, meaning the position is open on an ongoing basis. The company receives many applications, and there may not always be an immediate opening that aligns perfectly with a candidate&#39;s experience and skills. However, candidates are encouraged to apply; applications are reviewed continuously.</p>
<p>Responsibilities:</p>
<ul>
<li>Own research projects on training and performance analysis of multimodal AI models.</li>
<li>Curate and build large-scale datasets and evaluation benchmarks to advance vision capabilities.</li>
<li>Work with data infrastructure engineers, pretraining researchers and engineers, and product teams to create frontier multimodal models and the products that leverage them.</li>
<li>Publish and present research that moves the entire community forward.</li>
</ul>
<p>Skills and Qualifications:</p>
<ul>
<li>Ability to design, run, and analyze experiments thoughtfully, with demonstrated research judgment and empirical rigor.</li>
<li>Understanding of machine learning fundamentals, large-scale training, and distributed compute environments.</li>
<li>Proficiency in Python and familiarity with at least one deep learning framework (e.g., PyTorch, TensorFlow, or JAX).</li>
<li>Comfortable with debugging distributed training and writing code that scales.</li>
<li>Bachelor&#39;s degree or equivalent experience in Computer Science, Machine Learning, Physics, Mathematics, or a related discipline with strong theoretical and empirical grounding.</li>
</ul>
<p>Preferred qualifications:</p>
<ul>
<li>Research or engineering contributions in visual reasoning, spatial understanding, or multimodal architecture design.</li>
<li>Experience developing evaluation frameworks for multimodal tasks.</li>
<li>Publications or open-source contributions in vision-language modeling, video understanding, or multimodal AI.</li>
<li>Strong grasp of probability, statistics, and ML fundamentals.</li>
</ul>
<p>Logistics:</p>
<ul>
<li>Location: San Francisco, California.</li>
<li>Compensation: $350,000 - $475,000 USD per year, depending on background, skills, and experience.</li>
<li>Visa sponsorship: Yes.</li>
<li>Benefits: Generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$350,000 - $475,000 USD per year</Salaryrange>
      <Skills>Python, Deep learning framework (e.g., PyTorch, TensorFlow, or JAX), Machine learning fundamentals, Large-scale training, Distributed compute environments, Visual reasoning, Spatial understanding, Multimodal architecture design, Evaluation frameworks for multimodal tasks, Vision-language modeling, Video understanding, Multimodal AI</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachines.ai.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is a research organisation focused on advancing collaborative general intelligence. Its team has helped build several widely used AI products, including ChatGPT and Character.ai.</Employerdescription>
      <Employerwebsite>https://thinkingmachines.ai/</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>350000</Compensationmin>
      <Compensationmax>475000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5002288008</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>a0fe4cba-5d3</externalid>
      <Title>Engineering Manager</Title>
      <Description><![CDATA[<p>We&#39;re hiring an Engineering Manager to lead a team of senior and staff-level engineers across ML infrastructure and product. You will help the team build and scale systems that are reliable, performant, and easy to operate.</p>
<p>This role combines collaboration with hands-on work. You&#39;ll partner with tech leads to set the technical direction for your team and own its execution. You should also be ready to go deep on system design and contribute directly when needed.</p>
<p>Responsibilities:</p>
<ul>
<li>Lead and grow a team of senior and staff-level engineers, setting clear expectations and maintaining a high bar for execution.</li>
<li>Own architecture, system design, and long-term technical direction for your team&#39;s systems, with emphasis on reliability and performance.</li>
<li>Contribute directly to design reviews, prototyping, and debugging critical issues.</li>
<li>Partner with researchers and product teams to define roadmaps and prioritize work.</li>
<li>Hire and close senior engineering talent. Mentor engineers into technical leaders.</li>
</ul>
<p>Skills and Qualifications:</p>
<p>Minimum qualifications:</p>
<ul>
<li>Bachelor&#39;s degree or equivalent industry experience in computer science, engineering, or a similar field.</li>
<li>8+ years of experience building and scaling production systems, including system design and distributed systems.</li>
<li>3+ years of engineering management experience in high-growth environments.</li>
</ul>
<p>Preferred qualifications (we encourage you to apply even if you meet only some of these):</p>
<ul>
<li>Experience managing teams of senior or staff-level engineers.</li>
<li>Background in infrastructure, systems engineering, or developer productivity.</li>
<li>Familiarity with AI/ML systems, data infrastructure, or high-performance computing.</li>
<li>Track record of building or contributing to widely used systems, platforms, or tools.</li>
</ul>
<p>Logistics:</p>
<ul>
<li>Compensation: Depending on background, skills and experience, the expected annual salary range for this position is $400,000 - $500,000 USD.</li>
<li>Visa sponsorship: We sponsor visas. While we can&#39;t guarantee success for every candidate or role, if you&#39;re the right fit, we&#39;re committed to working through the visa process together.</li>
<li>Benefits: Thinking Machines offers generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$400,000 - $500,000 USD</Salaryrange>
      <Skills>computer science, engineering, system design, distributed systems, engineering management, infrastructure, systems engineering, developer productivity, AI/ML systems, data infrastructure, high-performance computing</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachineslab.com.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is a company that empowers humanity through advancing collaborative general intelligence. Its team has helped create some of the most widely used AI products.</Employerdescription>
      <Employerwebsite>https://thinkingmachineslab.com/</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>400000</Compensationmin>
      <Compensationmax>500000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5165725008</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>594b20c4-c28</externalid>
      <Title>Infrastructure Engineer, Security</Title>
      <Description><![CDATA[<p>We&#39;re looking for an infrastructure engineer to own and evolve the security infrastructure that underpins our foundation models. In this role, you&#39;ll work across compute, storage, networking, and data platforms, making sure our systems are secure, reliable, and built to scale.</p>
<p>You&#39;ll shape controls, architecture, and tooling so that security is part of how the platform works by default. You&#39;ll partner closely with research and product teams, enabling them to move quickly while keeping our models, data, and environments protected.</p>
<p>Key responsibilities include:</p>
<ul>
<li>Architecting security patterns for platforms and services, including network segmentation, service-to-service authentication, RBAC, and policy enforcement in Kubernetes and cloud environments.</li>
<li>Managing identity, access, and secrets for humans and services: workload and cross-cloud identity, least-privilege IAM, and secrets management.</li>
<li>Building secure platforms for data ingestion, processing, and curation: classification, encryption, access controls, and safe sharing patterns across teams.</li>
<li>Writing threat models and reviewing designs with researchers and engineers to help them ship features and experiments in a safe, scalable way.</li>
<li>Automating security checks and building guardrails: policy-as-code, secure infrastructure baselines, validation in CI/CD, and tools that make the secure path the easiest one.</li>
</ul>
<p>Requirements include:</p>
<ul>
<li>Bachelor&#39;s degree or equivalent experience in engineering or a similar field.</li>
<li>Strong background with containers and orchestration (e.g., Kubernetes) and how to secure them (namespaces, network policies, pod security, admission controls, etc.).</li>
<li>Practical experience with Infrastructure as Code (Terraform or similar), including secure patterns for provisioning networks, IAM, and shared services.</li>
<li>Solid understanding of cloud networking and security: VPCs, load balancers, service discovery, mTLS, firewalls, and zero-trust-style architectures.</li>
<li>Proficiency with a systems language such as Rust and scripting in Python for building platform components and internal tools.</li>
<li>Evidence of owning complex, production-critical systems, including debugging issues that span infra, security, and application layers.</li>
</ul>
<p>Preferred qualifications include experience with ML infrastructure, GPU clusters, or large-scale training environments, as well as background in AI labs, HPC environments, or ML-heavy organizations.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange>$200,000 - $475,000 USD</Salaryrange>
      <Skills>Kubernetes, Infrastructure as Code, Cloud Networking and Security, Systems Language (Rust), Scripting (Python), ML Infrastructure, GPU Clusters, Large-Scale Training Environments, AI Labs, HPC Environments</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Thinking Machines Lab</Employername>
      <Employerlogo>https://logos.yubhub.co/thinkingmachineslab.com.png</Employerlogo>
      <Employerdescription>Thinking Machines Lab is building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals.</Employerdescription>
      <Employerwebsite>https://thinkingmachineslab.com/</Employerwebsite>
      <Compensationcurrency>USD</Compensationcurrency>
      <Compensationmin>200000</Compensationmin>
      <Compensationmax>475000</Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/thinkingmachines/jobs/5015964008</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
  </jobs>
</source>