<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>8f03ad2d-96f</externalid>
      <Title>Software Engineer, Research Data Platform</Title>
      <Description><![CDATA[<p>We&#39;re looking for engineers who love working directly with users and who excel at building data products. The Research Data Platform team builds the tools that Anthropic&#39;s researchers use every day to manage, query, and analyze the data that goes into training and evaluating frontier models.</p>
<p>As a Software Engineer on the Research Data Platform team, you will:</p>
<ul>
<li>Build and operate data pipelines that extract data from research training runs and land it in storage systems that are easy and fast to query</li>
<li>Work closely with researchers to design and build APIs, libraries, and web interfaces that support data management, exploration, and analysis</li>
<li>Develop dataset management, data cataloging, and provenance tooling that researchers use in their day-to-day work</li>
<li>Embed with research teams to understand their workflows, identify high-leverage tooling opportunities, and ship solutions quickly</li>
<li>Collaborate with adjacent teams to build on existing systems rather than reinventing them</li>
</ul>
<p>We do not require prior ML or AI training experience. If you enjoy working closely with technical users, learning new domains quickly, and building tools people actually want to use, you&#39;ll pick up the research context fast.</p>
<p>Strong candidates may also have experience with large-scale ETL, columnar storage formats, and query engines (e.g., Spark, BigQuery, DuckDB, Parquet); high-volume time-series data ingestion, storage, and efficient querying; data cataloging, lineage, or metadata management systems; or ML experiment tracking and metrics platforms.</p>
<p style="margin-top:24px;font-size:13px;color:#666;">XML job scraping automation by <a href="https://yubhub.co">YubHub</a></p>]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$320,000-$405,000 USD</Salaryrange>
      <Skills>large-scale ETL, columnar storage formats, query engines, high-volume time series data, data cataloging, lineage, metadata management systems, ML experiment tracking, Spark, BigQuery, DuckDB, Parquet</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Anthropic</Employername>
      <Employerlogo>https://logos.yubhub.co/anthropic.com.png</Employerlogo>
      <Employerdescription>Anthropic is a public benefit corporation that creates reliable, interpretable, and steerable AI systems.</Employerdescription>
      <Employerwebsite>https://www.anthropic.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/anthropic/jobs/5191226008</Applyto>
      <Location>San Francisco, CA | New York City, NY</Location>
      <Country></Country>
      <Postedate>2026-04-18</Postedate>
    </job>
    <job>
      <externalid>4075c787-328</externalid>
      <Title>Member of Technical Staff - Large Scale Data Infrastructure</Title>
      <Description><![CDATA[<p>We&#39;re looking for infrastructure engineers to work at peta-to-exabyte scale. You&#39;ll build data systems behind the largest training runs on thousands of GPUs, where fixing one bottleneck lets researchers train the next breakthrough model.</p>
<p><strong>What You&#39;ll Work On:</strong></p>
<ul>
<li>Scalable data loaders for training runs across thousands of GPUs</li>
<li>Efficient storage and retrieval systems for petabyte-scale datasets</li>
<li>Multi-cloud object storage abstraction</li>
<li>Large-scale data migrations across storage systems and providers</li>
<li>Debugging and resolving performance bottlenecks in distributed data loading</li>
</ul>
<p><strong>Technical Focus:</strong></p>
<ul>
<li>Python, PyTorch DataLoader internals</li>
<li>Object storage (e.g. S3, Azure Blob, GCS)</li>
<li>Parquet for metadata</li>
<li>Video: ffmpeg, PyAV, codec fundamentals</li>
</ul>
<p><strong>What We&#39;re Looking For:</strong></p>
<ul>
<li>Built and operated data pipelines at petabyte scale</li>
<li>Optimized data loading</li>
<li>Worked with petabyte-scale video and image datasets</li>
<li>Written processing jobs operating on millions of files</li>
<li>Debugged distributed system bottlenecks across large fleets of machines</li>
</ul>
<p><strong>Nice to Have:</strong></p>
<ul>
<li>Experience streaming dataset formats (e.g. WebDataset)</li>
<li>Video codec internals and frame-accurate seeking</li>
<li>Distributed systems experience</li>
<li>Slurm and Kubernetes for job orchestration</li>
<li>Experience with object storage performance tuning across providers</li>
</ul>
<p><strong>How We Work Together:</strong></p>
<ul>
<li>We&#39;re a distributed team with real offices that people actually use. Depending on your role, you&#39;ll either join us in Freiburg or SF at least 2 days a week (or one full week every other week), or work remotely with a monthly in-person week to stay connected. We&#39;ll cover reasonable travel costs to make this possible. We think in-person time matters, and we&#39;ve structured things to make it accessible to all. We&#39;ll discuss what this will look like for the role during our interview process.</li>
</ul>
<p><strong>Everything we do is grounded in four values:</strong></p>
<ul>
<li>Obsessed. We are a frontier research lab. The science has to be right, the understanding deep, the product beautiful.</li>
<li>Low Ego. The work speaks. The best idea wins, no matter who said it. Credit is shared. Nobody is above any task.</li>
<li>Bold. We take the ambitious bet. We ship, we do not wait for conditions to be perfect.</li>
<li>Kind. People over politics. We treat each other with genuine warmth. Agency without empathy creates chaos.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>staff</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange>$180,000–$300,000 USD + Equity</Salaryrange>
      <Skills>Python, PyTorch, Data Loader Internals, Object Storage, Parquet, Video, ffmpeg, PyAV, Codec Fundamentals, WebDataset, Distributed Systems, Slurm, Kubernetes, Object Storage Performance Tuning</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Black Forest Labs</Employername>
      <Employerlogo>https://logos.yubhub.co/blackforestlabs.com.png</Employerlogo>
      <Employerdescription>Black Forest Labs is a research lab developing foundational technologies for generative models that power image and video creation.</Employerdescription>
      <Employerwebsite>https://www.blackforestlabs.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://job-boards.greenhouse.io/blackforestlabs/jobs/5019171008</Applyto>
      <Location>Freiburg (Germany), San Francisco (USA)</Location>
      <Country></Country>
      <Postedate>2026-04-17</Postedate>
    </job>
    <job>
      <externalid>1739131f-5ae</externalid>
      <Title>Software Engineer - Silicon Validation Tools (SerDes)</Title>
      <Description><![CDATA[
<p><strong>Software Engineer - Silicon Validation Tools (SerDes)</strong></p>
<p>Mississauga, Ontario, Canada</p>
<p>Category: Engineering | Hire Type: Employee</p>
<p><strong>Job ID:</strong> 16183 | <strong>Date posted:</strong> 03/22/2026</p>
<p><strong>We Are:</strong></p>
<p>At Synopsys, we drive the innovations that shape the way we live and connect. Our technology is central to the Era of Pervasive Intelligence, from self-driving cars to learning machines. We lead in chip design, verification, and IP integration, empowering the creation of high-performance silicon chips and software content. Join us to transform the future through continuous technological innovation.</p>
<p><strong>You Are:</strong></p>
<p>You are a hands-on software engineer passionate about bridging the gap between hardware and software in high-speed silicon validation environments. You thrive in lab-heavy settings, working directly with advanced instruments and hardware abstraction layers. Your expertise in Python allows you to architect robust automation frameworks and reusable libraries, streamlining complex workflows and ensuring repeatable, high-quality results. You’re comfortable translating intricate MATLAB algorithms into efficient Python code, maintaining numerical equivalence and clear documentation for future maintainability. Your experience integrating C/C++ SDKs, DLLs, and libraries into Python empowers seamless cross-team collaboration with firmware and software teams worldwide. You are meticulous in your approach to code governance, CI/CD pipelines, and release management, ensuring that your code is reliable, maintainable, and well-documented.</p>
<p>You communicate clearly, translating technical requirements into actionable solutions and collaborating across validation, SDK, firmware, and software teams. You are proactive in identifying opportunities for process improvement, championing best practices, and mentoring teammates. Your curiosity drives you to stay current with industry trends, and your adaptability shines in fast-paced environments where innovation and continuous improvement are valued. You are committed to delivering intuitive GUIs and data tools that empower your team to make informed decisions, reduce operator error, and streamline bench operations. With a strong foundation in high-speed SerDes technologies and lab instrumentation, you are ready to make a lasting impact on industry-leading IP and silicon validation processes.</p>
<p><strong>What You’ll Be Doing:</strong></p>
<ul>
<li>Defining and maintaining Python automation architecture, folder/repo structure, and coding standards for lab environments.</li>
<li>Building hardware-abstraction layers for lab instruments (J-BERT, oscilloscopes, pattern generators, power supplies) using VISA/SCPI and vendor APIs.</li>
<li>Creating reusable libraries for test sequencing, calibration/adaptation flows, results logging, fault handling, and multi-bench resource scheduling.</li>
<li>Translating MATLAB algorithms and calibration scripts into robust Python (NumPy/SciPy/Pandas), ensuring numerical equivalence and documenting migration notes.</li>
<li>Developing and hardening drivers/wrappers for Keysight/Teledyne-LeCroy/R&amp;S instruments, including connection management, waveform acquisition, and compliance scripts.</li>
<li>Integrating SDK C/C++ code, DLLs, and shared libraries into Python via ctypes/cffi/SWIG, collaborating with firmware teams to validate features and APIs.</li>
<li>Managing code repositories (Git/Perforce), branching strategies, code reviews, release tagging, and CI/CD pipelines for linting, testing, and packaging.</li>
<li>Building internal GUIs for bench control, run setup, live plots, and progress tracking, as well as data processing pipelines for quick-turn analysis and report generation.</li>
</ul>
<p><strong>The Impact You Will Have:</strong></p>
<ul>
<li>Accelerate SerDes bring-up and characterization through standardized Python automation and instrument libraries, reducing time-to-first-measurement and increasing bench throughput.</li>
<li>Enhance efficiency and reliability of Silicon Validation processes with robust frameworks, CI-verified packages, and repeatable workflows.</li>
<li>Improve test quality, reproducibility, and data analysis by shipping versioned tools, enforcing parity on MATLAB to Python conversions, and delivering clear analysis artifacts.</li>
<li>Contribute directly to the success of high-speed, mixed-signal SerDes IP, enabling faster SDK/FW feature validation and higher confidence in release readiness.</li>
<li>Support cutting-edge IP solutions that drive industry innovation by turning validation requirements into dependable software and GUIs adopted across benches.</li>
<li>Streamline communication and day-to-day operations across Validation, FW, SDK, and Software via documented interfaces, shared roadmaps, and predictable release notes.</li>
<li>Increase code reuse and maintainability with a clear repo structure, a reusable hardware-abstraction layer, and shared GUI components that standardize user experience.</li>
<li>Deliver intuitive, maintainable GUIs for bring-up and mass-char workflows, reducing operator error and enabling telemetry-driven improvements over time.</li>
<li>Foster a culture of innovation and continuous improvement through mentorship, code reviews, documentation, and knowledge sharing within the validation team.</li>
</ul>
<p><strong>What You’ll Need:</strong></p>
<ul>
<li>2–6 years of professional software engineering experience building automation frameworks or tooling in a hardware or lab environment.</li>
<li>Strong proficiency in Python (OOP, packaging, virtual environments, logging, multithreading for instrument I/O).</li>
<li>Experience controlling lab instruments via SCPI/VISA and vendor SDKs/APIs; comfortable with Windows and Linux benches.</li>
<li>Proven skill integrating native libraries (C/C++ DLL/.so) into Python and debugging across the boundary.</li>
<li>Solid practice with Git: code reviews, branching strategies, conflict resolution, and release tagging.</li>
<li>Data skills: NumPy/Pandas, CSV/Parquet, plotting, and producing analysis artifacts usable by the team.</li>
<li>Clear communicator who can translate validation requirements into robust software and collaborate across SDK, FW, and SW teams.</li>
</ul>
<p><strong>Who You Are:</strong></p>
<ul>
<li>Analytical thinker with strong problem-solving skills.</li>
<li>Collaborative team player who thrives in cross-functional environments.</li>
<li>Detail-oriented and organized in both code and documentation.</li>
<li>Curious and eager to learn new technologies and methodologies.</li>
<li>Adaptable, proactive, and comfortable with ambiguity and innovation.</li>
<li>Effective communicator, able to translate complex requirements into actionable solutions.</li>
<li>Mentor and knowledge sharer, fostering a culture of continuous improvement.</li>
</ul>
<p><strong>The Team You’ll Be A Part Of:</strong></p>
<p>You’ll join the SerDes Silicon Validation team, a dynamic group of engineers dedicated to enabling high-speed IP innovation through rigorous hardware validation and software automation. The team works closely with worldwide software, firmware, and SDK teams, collaborating to deliver robust tools, frameworks, and GUIs that accelerate bring-up, characterization, and validation workflows. Together, you’ll drive advancements in silicon validation, empower efficient lab operations, and shape the future of high-performance IP solutions.</p>
<p><strong>Rewards and Benefits:</strong></p>
<p>We offer a comprehensive range of health, wellness, and financial benefits to cater to your needs. Our total rewards include both monetary and non-monetary offerings. Your recruiter will provide more details about the salary range and benefits during the hiring process.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>mid</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Python, MATLAB, C/C++, SCPI/VISA, Git, NumPy, Pandas, CSV/Parquet, plotting</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Synopsys</Employername>
      <Employerlogo>https://logos.yubhub.co/careers.synopsys.com.png</Employerlogo>
      <Employerdescription>Synopsys is a leading provider of electronic design automation (EDA) software and intellectual property (IP) used in the design, verification, and manufacturing of electronic systems.</Employerdescription>
      <Employerwebsite>https://careers.synopsys.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://careers.synopsys.com/job/mississauga/software-engineer-silicon-validation-tools-serdes/44408/93120696128</Applyto>
      <Location>Mississauga</Location>
      <Country></Country>
      <Postedate>2026-04-05</Postedate>
    </job>
    <job>
      <externalid>d48b0655-2fa</externalid>
      <Title>Data/Infrastructure Advocate Engineer</Title>
      <Description><![CDATA[<p>At Hugging Face, we&#39;re on a journey to democratise good AI. As our first Data/Infrastructure Advocate Engineer, you&#39;ll bridge the gap between cutting-edge data infrastructure and the global community of data engineers, researchers, and developers.</p>
<p>You&#39;ll champion Xet storage on the Hugging Face Hub, empowering users to efficiently store, version, and collaborate on large-scale datasets. This role is for someone who thrives at the intersection of technical depth (storage, Parquet, deduplication) and community advocacy—helping define the future of open data workflows.</p>
<p>Your main missions will be:</p>
<ul>
<li>Grow and nurture the open-source data/infra community—launch initiatives, collaborate with data-focused groups, and organise events or challenges.</li>
<li>Promote the Hugging Face Hub as the go-to platform for data storage, versioning, and collaboration—curate and showcase datasets, benchmarks, and tools like Xet.</li>
<li>Highlight use cases like efficient large dataset updates, Parquet editing, and deduplication to demonstrate the Hub&#39;s value for data workflows.</li>
<li>Create demos, benchmarks, and tools (e.g., Colab notebooks) to illustrate best practices for data storage and versioning.</li>
<li>Experiment with Xet, Parquet, and other data formats to showcase their potential for ML and data engineering.</li>
<li>Produce high-quality tutorials, blog posts, and videos that make complex topics accessible.</li>
<li>Share insights on storage optimisation, dataset versioning, and deduplication to empower developers.</li>
<li>Actively participate in online communities (Discord, GitHub, forums) to highlight contributions, answer questions, and foster collaboration.</li>
<li>Ensure datasets and tools released on the Hub are well-documented, with clear examples, benchmarks, and use cases.</li>
</ul>
<p><strong>About you</strong></p>
<p>You&#39;re a great fit if you:</p>
<ul>
<li>Have strong technical skills in Python, data libraries (e.g., pandas, pyarrow, huggingface/datasets), and storage systems (Parquet, Open Table Formats, S3).</li>
<li>Are a hands-on builder who loves experimenting with data tools, storage optimisation, and dataset versioning.</li>
<li>Can clearly explain complex topics (e.g., deduplication, compression, Parquet editing) through writing, demos, or talks.</li>
<li>Are active in developer communities (GitHub, Discord, forums) and passionate about open source and knowledge sharing.</li>
<li>Thrive in fast-moving environments and enjoy building in public to inspire others.</li>
</ul>
<p>If you&#39;re interested in joining us but don&#39;t tick every box above, we still encourage you to apply! We&#39;re building a diverse team whose skills, experiences, and backgrounds complement one another.</p>
<p><strong>More about Hugging Face</strong></p>
<p>We are actively working to build a culture that values diversity, equity, and inclusivity. We are intentionally building a workplace where you feel respected and supported—regardless of who you are or where you come from.</p>
<p>Hugging Face is an equal opportunity employer, and we do not discriminate based on race, ethnicity, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or ability status.</p>
<p>We value development. You will work with some of the smartest people in our industry.</p>
<p>We provide all employees with reimbursement for relevant conferences, training, and education.</p>
<p>We care about your well-being. We offer flexible working hours and remote options.</p>
<p>We offer health, dental, and vision benefits for employees and their dependents.</p>
<p>We also offer parental leave and flexible paid time off.</p>
<p>We support our employees wherever they are. While we have office spaces in NYC and Paris, we&#39;re very distributed, and all remote employees have the opportunity to visit our offices.</p>
<p>If needed, we&#39;ll also outfit your workstation to ensure you succeed.</p>
<p>We want our teammates to be shareholders. All employees have company equity as part of their compensation package.</p>
<p>If we succeed in becoming a category-defining platform in machine learning and artificial intelligence, everyone enjoys the upside.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>entry</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Python, data libraries, pandas, pyarrow, huggingface/datasets, storage systems, Parquet, Open Table Formats, S3</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Hugging Face</Employername>
      <Employerlogo></Employerlogo>
      <Employerdescription>Hugging Face is a platform for AI builders with over 5 million users and 100k organisations.</Employerdescription>
      <Employerwebsite>https://huggingface.co/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://apply.workable.com/j/5CA82A9A98</Applyto>
      <Location>New York</Location>
      <Country></Country>
      <Postedate>2026-03-10</Postedate>
    </job>
    <job>
      <externalid>f81a1dc8-ca4</externalid>
      <Title>Data/Infrastructure Advocate Engineer - EMEA Remote</Title>
      <Description><![CDATA[<p>At Hugging Face, we&#39;re on a journey to democratize good AI. We are building the fastest-growing platform for AI builders, with over 5 million users &amp; 100k organisations who have collectively shared over 1M models, 300k datasets &amp; 300k apps. Our open-source libraries have more than 400k stars on GitHub.</p>
<p>As our first Data/Infrastructure Advocate Engineer, you&#39;ll bridge the gap between cutting-edge data infrastructure and the global community of data engineers, researchers, and developers. You&#39;ll champion Xet storage on the Hugging Face Hub, empowering users to efficiently store, version, and collaborate on large-scale datasets.</p>
<p>This role is for someone who thrives at the intersection of technical depth (storage, Parquet, deduplication) and community advocacy—helping define the future of open data workflows. You&#39;ll collaborate with teams like Datasets, Hub, and Infrastructure to shape how developers interact with data on our platform, and inspire a community to build better, faster, and more scalable data pipelines.</p>
<p>Your Main Missions:</p>
<ul>
<li>Grow and nurture the open-source data/infra community—launch initiatives, collaborate with data-focused groups, and organise events or challenges. Engage with communities like Apache Parquet, Open Table Formats, and data engineering forums to promote best practices and Hugging Face tools.</li>
<li>Promote the Hugging Face Hub as the go-to platform for data storage, versioning, and collaboration—curate and showcase datasets, benchmarks, and tools like Xet.</li>
<li>Highlight use cases like efficient large dataset updates, Parquet editing, and deduplication to demonstrate the Hub’s value for data workflows.</li>
<li>Create demos, benchmarks, and tools (e.g., Colab notebooks) to illustrate best practices for data storage and versioning.</li>
<li>Experiment with Xet, Parquet, and other data formats to showcase their potential for ML and data engineering.</li>
<li>Produce high-quality tutorials, blog posts, and videos that make complex topics accessible.</li>
<li>Share insights on storage optimisation, dataset versioning, and deduplication to empower developers.</li>
<li>Actively participate in online communities (Discord, GitHub, forums) to highlight contributions, answer questions, and foster collaboration.</li>
<li>Ensure datasets and tools released on the Hub are well-documented, with clear examples, benchmarks, and use cases.</li>
</ul>
<p><strong>About you</strong></p>
<p>You’re a great fit if you:</p>
<ul>
<li>Have strong technical skills in Python, data libraries (e.g., pandas, pyarrow, huggingface/datasets), and storage systems (Parquet, Open Table Formats, S3).</li>
<li>Are a hands-on builder who loves experimenting with data tools, storage optimisation, and dataset versioning.</li>
<li>Can clearly explain complex topics (e.g., deduplication, compression, Parquet editing) through writing, demos, or talks.</li>
<li>Are active in developer communities (GitHub, Discord, forums) and passionate about open source and knowledge sharing.</li>
<li>Thrive in fast-moving environments and enjoy building in public to inspire others.</li>
</ul>
<p>If you&#39;re interested in joining us but don&#39;t tick every box above, we still encourage you to apply! We&#39;re building a diverse team whose skills, experiences, and backgrounds complement one another. We&#39;re happy to consider where you might be able to make the biggest impact.</p>
<p><strong>More about Hugging Face</strong></p>
<p>We are actively working to build a culture that values diversity, equity, and inclusivity. We are intentionally building a workplace where you feel respected and supported—regardless of who you are or where you come from. We believe this is foundational to building a great company and community, as well as the future of machine learning more broadly. Hugging Face is an equal opportunity employer, and we do not discriminate based on race, ethnicity, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or ability status.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>entry</Experiencelevel>
      <Workarrangement>remote</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Python, data libraries, pandas, pyarrow, huggingface/datasets, storage systems, Parquet, Open Table Formats, S3</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Hugging Face</Employername>
      <Employerlogo></Employerlogo>
      <Employerdescription>Hugging Face is a platform for AI builders with over 5 million users and 100k organisations.</Employerdescription>
      <Employerwebsite>https://huggingface.co/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://apply.workable.com/j/7C7F63E87A</Applyto>
      <Location>Paris</Location>
      <Country></Country>
      <Postedate>2026-03-10</Postedate>
    </job>
    <job>
      <externalid>0841fcf4-9ab</externalid>
      <Title>Data Engineer SE - II</Title>
      <Description><![CDATA[<p>We are on a mission to rid the world of bad customer service by “mobilizing” the way help is delivered. Today’s consumers want an always-available customer service experience that leaves them feeling valued and respected.</p>
<p>Helpshift helps B2B brands deliver this modern customer service experience through a mobile-first approach. We have changed how conversations take place, moving the conversation away from a slow, outdated email and desktop experience to an in-app chat experience that allows users to interact with brands in their own time.</p>
<p>Through our market-leading AI-powered chatbots and automation, we help brands deliver instant and rapid resolutions. Because agents play a key role in delivering help, our platform gives agents superpowers with automation and AI that simply works.</p>
<p><strong>About the Team</strong></p>
<p>Consumers care first and foremost about having their time valued by brands. Brands need insights into their customer service operation to serve their consumers effectively. Such insights and analytics are delivered through various data products like in-app analytics dashboards and data-sharing integrations.</p>
<p>The data platform team is responsible for designing, building, and maintaining the data infrastructure that enables such data and analytics products at scale. We build and manage data pipelines, databases, and other data structures to ensure that the data is reliable, accurate, and easily accessible.</p>
<p>We also support internal stakeholders with business intelligence and enable machine learning teams with data ops. The team manages a platform that handles 2 million events per minute and processes 1+ terabytes of data daily.</p>
<p><strong>About the Role</strong></p>
<ul>
<li>Building maintainable data pipelines, for both data ingestion and operational analytics, over data collected from 2 billion devices and 900M monthly active users</li>
<li>Building customer-facing analytics products that deliver actionable insights and data and make anomalies easy to detect</li>
<li>Collaborating with data stakeholders to understand their data needs and taking part in the analysis process</li>
<li>Writing design specifications and test, deployment, and scaling plans for the data pipelines</li>
<li>Mentoring people in the team &amp; organization</li>
</ul>
<p><strong>Requirements</strong></p>
<ul>
<li>3+ years of experience in building and running data pipelines that scale for TBs of data</li>
<li>Proficiency in a high-level object-oriented programming language (Python or Java) is a must</li>
<li>Experience with cloud data platforms like Snowflake and AWS EMR/Athena is a must</li>
<li>Experience building modern data lakehouse architectures using Snowflake, open table formats like Apache Iceberg/Hudi, and columnar formats like Parquet</li>
<li>Proficiency in data modeling, SQL query profiling, and data warehousing is a must</li>
<li>Experience with distributed data processing engines like Apache Spark, Apache Flink, Dataflow/Apache Beam, etc.</li>
<li>Knowledge of workflow orchestrators like Airflow, Dagster, etc. is a plus</li>
<li>Data visualization skills are a plus (PowerBI, Metabase, Tableau, Hex, Sigma, etc)</li>
<li>Excellent verbal and written communication skills</li>
<li>Bachelor’s Degree in Computer Science (or equivalent)</li>
</ul>
<p><strong>Benefits</strong></p>
<ul>
<li>Hybrid setup</li>
<li>Worker&#39;s insurance</li>
<li>Paid Time Offs</li>
<li>Other employee benefits to be discussed by our Talent Acquisition team in India.</li>
</ul>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>hybrid</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Python, Java, Snowflake, AWS, EMR/Athena, Apache Iceberg/Hudi, Parquet, Apache Spark, Apache Flink, Dataflow/Apache Beam, Airflow, Data modeling, SQL query profiling, data warehousing, PowerBI, Metabase, Tableau, Hex, Sigma</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Helpshift</Employername>
      <Employerlogo>https://logos.yubhub.co/j.com.png</Employerlogo>
      <Employerdescription>Helpshift is a company that provides a mobile-first customer service experience for B2B brands. It has over 900 million active monthly consumers and is used by hundreds of leading brands.</Employerdescription>
      <Employerwebsite>https://www.helpshift.com/</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://apply.workable.com/j/D451DB2325</Applyto>
      <Location>Pune, Maharashtra, India</Location>
      <Country></Country>
      <Postedate>2026-03-09</Postedate>
    </job>
  </jobs>
</source>