Dynamic Polling Scheduler: Reinventing Shipment Tracking at ClickPost
16 Sep, 2025 | 6 Min Read
Polling millions of shipments every few hours with a single cron job creates multiple problems—burst traffic that overwhelms courier APIs, no per-account control, and minimal visibility into failures. To solve this, we built the Dynamic Polling Scheduler, a continuous, account-aware system that spreads the load evenly, guarantees exactly-once message delivery, and provides granular observability. In this article, we’ll explain the limitations of the previous approach, the design of the new scheduler, and how it ensures reliable, scalable, and crash-safe shipment polling.
Burst traffic: The old 2-hour cron pulls ~10 M open shipments in one shot, pushes them to Kafka, and subjects courier partners to massive QPS spikes.
No per-account control: Every customer is polled on the same 2 h cadence, whether they need it or not.
Coarse observability: We only know something’s wrong every two hours.
Missed SLAs: Rate-limit blocks can delay polling beyond the 2 h window.
Move to a continuous, account-aware scheduler that:
Polls each shipment on its own interval (default = 120 min, configurable per account).
Spreads the load evenly every 5 min to keep partner APIs happy.
Guarantees “exactly-once” enqueue, with zero data loss on crashes.
Gives granular, always-on alerting and a lean storage footprint.
Concurrency-safe under many replicas.
Observability built on metrics and alerts rather than logs.
| No | Must / Should | Requirement |
| --- | --- | --- |
| 1 | MUST | Poll every active shipment until Delivered or 60 days. |
| 2 | MUST | Allow per-account polling_interval_minutes, default 120. |
| 3 | MUST | Slice each interval into 5-min batches (24 slices per 2 h). |
| 4 | MUST | Exactly-once enqueue to Kafka; idempotent consumers. |
| 5 | MUST | Crash-safe; recover by replaying an outbox. |
| 6 | SHOULD | Alert at account level when lag > 110% of interval. |
Selector Worker (Ingress Control)
Runs every minute.
Selects shipments that are due for polling (next_poll_at <= NOW()) from the shipment_polling_tracker table.
Inserts a simple record for each due shipment into the scheduler_outbox table.
Commits and exits quickly to keep database locks short and efficient.
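A minimal sketch of one selector transaction, assuming Postgres; shipment_id and the exact outbox columns are illustrative, since the article names only the two tables and next_poll_at:

BEGIN;

-- Queue every due shipment into the outbox in one short transaction.
INSERT INTO scheduler_outbox (shipment_id)
SELECT shipment_id
FROM shipment_polling_tracker
WHERE next_poll_at <= NOW();

COMMIT;

A production version would also mark the picked rows (the article delegates re-arming next_poll_at to the updater workers) so the next one-minute run does not reselect them.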
Unscheduled Shipments Updater Worker (Egress Control)
Continuously checks the shipment_polling_tracker table for new shipments to be polled (freshly registered AWBs).
Fetches new shipments (with next_poll_at as NULL), assigns a randomized next_poll_at within the account’s polling interval to spread out load.
Updates next_poll_at in shipment_polling_tracker using the polling interval from polling_schedule_config.
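A sketch of that randomized assignment, assuming Postgres; account_id as the join key and the config column layout are assumptions beyond the table and column names the article gives:

-- Give each fresh shipment (next_poll_at IS NULL) a random first poll
-- time inside its account's interval, so new load spreads out evenly.
UPDATE shipment_polling_tracker t
SET next_poll_at = NOW()
    + random() * make_interval(mins => c.polling_interval_minutes)
FROM polling_schedule_config c
WHERE t.account_id = c.account_id
  AND t.next_poll_at IS NULL;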
Scheduled Shipments Updater Worker (Egress Control)
Periodically reviews all shipments in the shipment_polling_tracker table.
For polled shipments, recalculates and updates next_poll_at based on the last poll time and the account’s interval, adding a small random jitter to avoid future spikes.
Ensures that polling times are evenly distributed, preventing traffic bursts and keeping the system smooth.
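A sketch of the re-arming step under the same assumptions; last_polled_at and the five-minute jitter bound are illustrative (the article says only “a small random jitter”):

-- Re-arm already-polled shipments: last poll time plus the account's
-- interval, plus random jitter to avoid synchronized future spikes.
UPDATE shipment_polling_tracker t
SET next_poll_at = t.last_polled_at
    + make_interval(mins => c.polling_interval_minutes)
    + random() * INTERVAL '5 minutes'
FROM polling_schedule_config c
WHERE t.account_id = c.account_id
  AND t.last_polled_at IS NOT NULL
  AND t.next_poll_at <= NOW();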
Kafka Shipment Tracking
Receives shipment polling events from the Flusher Worker, which drains the scheduler_outbox in batches.
Guarantees exactly-once delivery to all downstream consumers, even in the event of retries or failures.
Courier Consumer Pool
Continuously polls Kafka for new shipment events.
Makes HTTP requests to courier partner APIs to fetch the latest shipment status.
Writes status updates and scan events to the next queue in the pipeline for further processing.
Exactly-Once Messaging.
The Flusher uses Kafka’s idempotent-producer protocol keyed on shipment_id, which means a resend after a crash is automatically de-duplicated at the broker level. Downstream consumers therefore see each shipment message once, regardless of partial failures.
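A sketch of the outbox flush that pairs with this, assuming an auto-increment id on scheduler_outbox (the batch size and the :last_acked_id parameter are illustrative):

-- Inside the Flusher's job lease: read the next batch from the outbox.
SELECT id, shipment_id
FROM scheduler_outbox
ORDER BY id
LIMIT 1000;

-- Publish the batch to Kafka keyed on shipment_id, then delete the rows
-- only after the broker has acknowledged every message.
DELETE FROM scheduler_outbox
WHERE id <= :last_acked_id;

If the Flusher crashes between publish and delete, the rows stay in the outbox and are re-published on the next run; the idempotent producer and idempotent consumers absorb the duplicates.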
Load Smoothing.
By slicing every account’s interval into 24 equal five-minute windows, the system spreads API calls evenly instead of issuing two-hour bursts. Courier-facing QPS is flat, staying within partner rate limits.
Horizontal Scalability.
Both Selector and Flusher are stateless. Additional replicas can be deployed to absorb higher throughput simply by increasing Nomad counts; the job-lease mechanism described below prevents duplicate processing.
Latency Budget.
Control-plane transactions target sub-200 ms per five-minute slice.
Data-plane latency is bounded only by courier round-trip times; the polling-freshness SLO is 1.1 × the configured interval for 99% of shipments.
Eliminates courier rate-limit spikes.
Provides per-account configurability without schema changes to downstream consumers.
Reduces mean-time-to-detect (MTTD) for polling issues from 2 hours to < 10 minutes.
Instead of relying on row-level locks like FOR UPDATE SKIP LOCKED, we use a JobConfig table as a distributed lease manager.
CREATE TABLE job_config (
    job_name          TEXT PRIMARY KEY,
    slice_start       TIMESTAMP,              -- start of the slice this job covers
    slice_end         TIMESTAMP,              -- end of the slice this job covers
    last_started_at   TIMESTAMP,
    last_completed_at TIMESTAMP,
    running           BOOLEAN DEFAULT false,  -- true while a replica holds the lease
    version           BIGINT                  -- optimistic-lock counter for the claim
);
Each worker replica tries to claim a job slice:
UPDATE job_config
SET running = true,
last_started_at = NOW(),
version = version + 1
WHERE job_name = :job
AND running = false
AND version = :current_version;
If update count = 1 → lease acquired.
If update count = 0 → another worker already owns it.
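One plausible release step (not shown in the article; it reuses the job_config columns above and assumes slices advance by five minutes):

-- Release the lease and advance the job to its next five-minute slice.
UPDATE job_config
SET running = false,
    last_completed_at = NOW(),
    slice_start = slice_end,
    slice_end = slice_end + INTERVAL '5 minutes'
WHERE job_name = :job;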
Selector Jobs
Only one replica selects shipments per slice.
Prevents double enqueueing of the same shipments.
Flusher Jobs
Only one replica flushes a batch from the outbox.
Guarantees exactly-once enqueue into Kafka.
Updater Jobs
Ensures only one rescheduler run per slice.
Prevents next_poll_at overwrite collisions.
If a job is missed (e.g., worker crash), the next worker checks for slice_end < NOW() AND running=false.
Missed slices are replayed in order until the schedule is caught up.
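A sketch of that catch-up check against job_config (column names as above):

-- Find jobs whose slice window closed without a completed run.
SELECT job_name, slice_end, version
FROM job_config
WHERE slice_end < NOW()
  AND running = false;

Each stale job is then claimed with the same compare-and-set UPDATE shown earlier, and its missed slices are replayed in order.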
In large-scale systems like ours, shipments move through a complex network of jobs, queues, and schedulers. To keep everything running smoothly, we’ve built two streams of alerts—one for the producer side (the scheduler that puts work into queues) and one for the consumer side (the workers that pull messages out of queues).
Think of it like a factory:
Producer alerts make sure the conveyor belt is loaded on time.
Consumer alerts make sure the workers on the line are keeping up.
The scheduler’s job is to decide when each shipment should be checked again and assign it a next_poll_at time. If the scheduler falls behind, shipments don’t get scheduled on time, and downstream processes suffer.
We track two key signals:
What it means: Every 5 minutes, the scheduler should assign a next_poll_at to shipments.
How we check: At any given moment (T0), we look back 10 minutes; if shipments from that window still don’t have a schedule (next_poll_at is missing), the scheduler is lagging.
Why it matters: If the scheduler slows down, the whole system loses its rhythm.
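A sketch of that lag probe, assuming a created_at column on the tracker (the article names only next_poll_at):

-- Shipments registered more than 10 minutes ago that still have no
-- scheduled poll time: a non-zero count means the scheduler is lagging.
SELECT COUNT(*)
FROM shipment_polling_tracker
WHERE next_poll_at IS NULL
  AND created_at < NOW() - INTERVAL '10 minutes';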
What it means: A shipment is polled now, but it wasn’t polled in the last cycle (a 2-hour window).
Why it matters: This highlights fairness issues—some shipments might get skipped because of ceiling limits, slice caps, or race conditions. Spotting this helps us prevent silent skips in polling.
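One way to surface such gaps, assuming a last_polled_at column; this flags any shipment whose time since its last poll exceeds a full account interval:

-- Shipments that have silently missed at least one polling cycle.
SELECT t.shipment_id
FROM shipment_polling_tracker t
JOIN polling_schedule_config c ON c.account_id = t.account_id
WHERE t.last_polled_at < NOW()
      - make_interval(mins => c.polling_interval_minutes);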
Even if the scheduler is doing its job, things can still go wrong on the consumption side.
What we track: Whether messages start piling up in the queue.
Why it matters: If consumers (the workers pulling shipments for processing) slow down or stall, backlogs build up quickly. That’s our early warning signal to scale workers or investigate bottlenecks.
Queues are the heartbeat of our tracking system. A delay on the producer side means shipments don’t even enter the pipeline. A delay on the consumer side means shipments are stuck waiting.
By monitoring both ends, we make sure:
No shipment is left unscheduled.
No shipment gets unfairly skipped.
No queue grows silently in the background.
This two-pronged alerting system helps us catch issues early and keep delivery experiences smooth for our clients.