<?xml version="1.0" encoding="UTF-8"?>
<source>
  <jobs>
    <job>
      <externalid>d6450ee6-847</externalid>
      <Title>Data Infrastructure Engineer</Title>
      <Description><![CDATA[<p><strong>About the Role</strong></p>
<p>Cursor ships daily. Every release leaves signals behind: telemetry, prompts, completions, agent runs, sessions. Those signals power model improvement, evals, and experimentation. Data infrastructure is what turns them into something teams can trust.</p>
<p>A lot of systems here started simple so we could move fast. Over time, the constraints change and the “good enough” version becomes the bottleneck. This role owns the full ladder: patch what should be patched, redesign what should be redesigned, ship the replacement, and operate it.</p>
<p>Privacy guarantees are part of correctness. What we can retain and use depends on Privacy Mode and org configuration, and getting that wrong breaks a product promise. We choose work by business impact: what blocks product and model teams today, and what will block them next month.</p>
<p><strong>Sample projects include...</strong></p>
<ul>
<li>A core pipeline started as a pragmatic reuse of infrastructure built for something else. It works, but it cannot guarantee properties downstream consumers now need (for example, point-in-time consistency). You design and ship the replacement while keeping the existing system running.</li>
</ul>
<ul>
<li>A new product surface ships without instrumentation. You talk to the team, define what needs to be captured, and wire it through before the absence becomes anyone else’s problem.</li>
</ul>
<ul>
<li>Eval coverage drops. You trace it to an instrumentation gap introduced weeks ago by a product change nobody flagged. You fix the gap, add a contract so it cannot recur, and ship the dashboard that would have caught it earlier.</li>
</ul>
<ul>
<li>Multiple consumers depend on overlapping data. You design schema evolution and validation so changes in one place do not silently degrade the others.</li>
</ul>
<ul>
<li>Storage costs rise faster than usage. You decide what is worth keeping, implement retention and compression, and delete what is not.</li>
</ul>
<p><strong>What we&#39;re looking for</strong></p>
<p>We’re looking for someone who has built real systems at scale and cares about correctness, cost, and ergonomics.</p>
<p>Strong signals include:</p>
<ul>
<li>Deep experience with Spark (Databricks or open-source Spark both count)</li>
</ul>
<ul>
<li>Production experience with Ray Data</li>
</ul>
<ul>
<li>Hands-on ownership of large data pipelines and storage systems</li>
</ul>
<ul>
<li>Comfort debugging performance issues across client instrumentation, streaming, storage, and model-facing workflows, as well as across compute, storage, and networking layers</li>
</ul>
<ul>
<li>Clear thinking about data modeling and long-term maintainability</li>
</ul>
<ul>
<li>Good judgment about when to patch and when to rebuild</li>
</ul>
<p><strong>Nice to have</strong></p>
<ul>
<li>Experience running or scaling ClickHouse</li>
</ul>
<ul>
<li>Familiarity with dbt, Dagster, or similar orchestration and modeling tools</li>
</ul>
<p>We&#39;re in-person with cozy offices in North Beach, San Francisco and Manhattan, New York, replete with well-stocked libraries.</p>
]]></Description>
      <Jobtype>full-time</Jobtype>
      <Experiencelevel>senior</Experiencelevel>
      <Workarrangement>onsite</Workarrangement>
      <Salaryrange></Salaryrange>
      <Skills>Spark, Ray Data, data pipelines, storage systems, debugging performance issues, data modeling, long-term maintainability, ClickHouse, dbt, Dagster</Skills>
      <Category>Engineering</Category>
      <Industry>Technology</Industry>
      <Employername>Cursor</Employername>
      <Employerlogo>https://logos.yubhub.co/cursor.com.png</Employerlogo>
      <Employerdescription>Cursor is a technology company that ships daily. The signals each release leaves behind power model improvement, evals, and experimentation. The company has offices in North Beach, San Francisco, and Manhattan, New York.</Employerdescription>
      <Employerwebsite>https://cursor.com</Employerwebsite>
      <Compensationcurrency></Compensationcurrency>
      <Compensationmin></Compensationmin>
      <Compensationmax></Compensationmax>
      <Applyto>https://cursor.com/careers/software-engineer-data-infrastructure</Applyto>
      <Location>San Francisco</Location>
      <Country></Country>
      <Postedate>2026-03-08</Postedate>
    </job>
  </jobs>
</source>