Operational Excellence at Zeotap — Sherlock, The Path Tracking Service

Yash Bansal
5 min read · Feb 7, 2024

This blog walks through our journey with schedule-based job management systems, the gaps we found in them, and why we built an in-house microservice that launches jobs based on data availability.

Zeotap deals with data. As of today, our lean data engineering team manages 450+ jobs/day across our data pipelines. Tracking every job to ensure smooth functioning and SLA compliance is a major operational task.

Challenges faced

At Zeotap, we use Kingpin, our in-house cron-based job management system, to manage our data pipelines. While Kingpin addresses the limitations we faced with Oozie, a cron-based scheduling system poses the following challenges:

  1. Job failures due to missing data: a job is launched at its scheduled frequency even if the input data is missing.
  2. Irregular data arrival leads to failures when using a frequency-based coordinator.
  3. Delayed ingestion SLAs: ingestion jobs are launched about a day after the actual data arrival, to leave enough buffer for delays in data arrival.

These failures accounted for over 40% of our total job failures and increased the ops overhead for our engineering team. To decrease…
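The difference between launching on a schedule and launching on data availability can be illustrated with a minimal Python sketch. It is not Kingpin's or Sherlock's actual code: the names `Job`, `is_data_available`, `launch_on_schedule`, and `launch_on_availability` are hypothetical, and "data is available" is assumed here to mean that the input directory exists and contains at least one file.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Job:
    """Hypothetical job description used only for this illustration."""
    name: str
    input_path: Path  # assumed location where the job expects its input data


def is_data_available(path: Path) -> bool:
    """Assumption: input is 'available' if the directory exists and is non-empty."""
    return path.is_dir() and any(path.iterdir())


def launch_on_schedule(job: Job, launch: Callable[[Job], None]) -> bool:
    """Cron-style trigger: launch at the scheduled time, input present or not."""
    launch(job)
    return True


def launch_on_availability(job: Job, launch: Callable[[Job], None]) -> bool:
    """Availability-style trigger: launch only once the input data has arrived."""
    if not is_data_available(job.input_path):
        return False  # skip for now; a later check can retry
    launch(job)
    return True
```

With the cron-style trigger, a job whose input is late still starts and then fails. With the availability-style trigger, the same job simply waits until a later check finds the data, which also removes the need for a fixed day-long buffer before ingestion.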
