{"version":"0.1","company":{"name":"YubHub","url":"https://yubhub.co","jobsUrl":"https://yubhub.co/jobs/skill/scrapy"},"x-facet":{"type":"skill","slug":"scrapy","display":"Scrapy","count":1},"x-feed-size-limit":100,"x-feed-sort":"enriched_at desc","x-feed-notice":"This feed contains at most 100 jobs (the most recently enriched). For the full corpus, use the paginated /stats/by-facet endpoint or /search.","x-generator":"yubhub-xml-generator","x-rights":"Free to redistribute with attribution: \"Data by YubHub (https://yubhub.co)\"","x-schema":"Each entry in `jobs` follows https://schema.org/JobPosting. YubHub-native raw fields carry `x-` prefix.","jobs":[{"@context":"https://schema.org","@type":"JobPosting","identifier":{"@type":"PropertyValue","name":"YubHub","value":"job_93c1356c-a95"},"title":"Principal Software Engineer, Web Data - Tech Lead","description":"<p>We&#39;re looking for an exceptional Principal Software Engineer to serve as the de facto Technical Lead for our Web Data Acquisition (WDA) team. This is a highly visible, hands-on technical leadership role where you&#39;ll own the architectural direction for crawling systems, evolve and unify crawling platforms into a best-in-class stack, and elevate a high-performing engineering team.</p>\n<p>As a Principal Software Engineer, you&#39;ll solve complex distributed systems challenges, build modular tooling that accelerates delivery, and set the standard for observability and operational excellence. You&#39;ll have a dedicated manager handling all HR and administrative responsibilities. A product manager connects business needs with technical work. Your focus is 100% technical leadership, mentorship, and hands-on execution.</p>\n<p>Key Responsibilities:</p>\n<ul>\n<li>Technical Leadership &amp; System Design: Proven experience building web crawling or large-scale data systems from scratch. Strong architectural skills designing scalable, fault-tolerant distributed systems. 
Track record of leading complex technical initiatives and driving architectural direction for teams.</li>\n</ul>\n<ul>\n<li>Data Engineering Expertise: Deep background in large-scale data engineering (processing terabytes daily). Hands-on experience with cloud data warehouses (BigQuery, Snowflake). Experience with Apache Kafka, Kubernetes (GKE/EKS), and orchestration tools (Airflow).</li>\n</ul>\n<ul>\n<li>Web Crawling &amp; Data Extraction: Deep expertise in web crawling technologies and advanced scraping (Scrapy or similar). Experience extracting structured and unstructured web data, including SERP data. Knowledge of proxy infrastructure management, anti-bot detection, and ethical crawling.</li>\n</ul>\n<ul>\n<li>Leadership &amp; Team Development: Experience mentoring engineers at all levels and fostering a collaborative culture. Strong ability to influence technical direction and establish best practices. Track record of hiring, coaching, and developing senior engineers.</li>\n</ul>\n<p>Ideal Candidate Profile:</p>\n<ul>\n<li>10+ years of software engineering experience. 5+ years focused on data engineering. 3+ years in senior/principal-level technical leadership.</li>\n</ul>\n<ul>\n<li>Strong CS fundamentals (algorithms, data structures, distributed systems). 
Self-starter who thrives in fast-paced environments.</li>\n</ul>\n<p>Core Technical Stack:</p>\n<ul>\n<li>Python &amp; Java</li>\n<li>Apache Kafka</li>\n<li>GCP (BigQuery, GKE, Vertex AI)</li>\n<li>Snowflake &amp; Starburst/Trino</li>\n<li>Terraform</li>\n<li>Scrapy / Web Scraping Frameworks</li>\n<li>Proxy Management Systems</li>\n<li>Distributed Systems &amp; Kubernetes</li>\n<li>Apache Airflow</li>\n<li>Large-Scale ETL Pipelines</li>\n</ul>\n<p style=\"margin-top:24px;font-size:13px;color:#666;\">XML job scraping automation by <a href=\"https://yubhub.co\">YubHub</a></p>","url":"https://yubhub.co/jobs/job_93c1356c-a95","directApply":true,"hiringOrganization":{"@type":"Organization","name":"ZoomInfo","sameAs":"https://www.zoominfo.com/","logo":"https://logos.yubhub.co/zoominfo.com.png"},"x-apply-url":"https://job-boards.greenhouse.io/zoominfo/jobs/8378092002","x-work-arrangement":"remote","x-experience-level":"senior","x-job-type":"full-time","x-salary-range":"$163,800-$257,400 USD","x-skills-required":["Python","Java","Apache Kafka","Kubernetes","GCP","Snowflake","Terraform","Scrapy","Proxy Management Systems","Distributed Systems","Apache Airflow","Large-Scale ETL Pipelines"],"x-skills-preferred":[],"datePosted":"2026-04-18T15:43:50.896Z","jobLocation":{"@type":"Place","address":{"@type":"PostalAddress","addressLocality":"Remote"}},"jobLocationType":"TELECOMMUTE","employmentType":"FULL_TIME","occupationalCategory":"Engineering","industry":"Technology","skills":"Python, Java, Apache Kafka, Kubernetes, GCP, Snowflake, Terraform, Scrapy, Proxy Management Systems, Distributed Systems, Apache Airflow, Large-Scale ETL Pipelines","baseSalary":{"@type":"MonetaryAmount","currency":"USD","value":{"@type":"QuantitativeValue","minValue":163800,"maxValue":257400,"unitText":"YEAR"}}}]}