Sizing an event deduplication window
A short note on why a fixed time window for analytics event deduplication needs to be tuned, not merely present.
Adding deduplication to an analytics pipeline is often treated as a binary: either you dedupe events or you don’t. In practice, a time-windowed dedupe check has a tuning parameter — the window size — that matters as much as the existence of the check itself.
Too short a window, and duplicates spaced further apart than the window slip through. A retry and the original call from overlapping async code paths often complete within milliseconds of each other, but a retry delayed by backoff or a slow network can land after the window has closed. Too long, and legitimately repeated user actions (the same button pressed twice, intentionally, seconds apart) get incorrectly collapsed into one event.
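Both failure modes can be reproduced with a minimal sketch of a time-windowed check. Everything here is illustrative: the `WindowedDeduper` class, its `should_emit` method, the event key, and the timings are assumptions, not taken from any particular analytics library.

```python
from typing import Dict


class WindowedDeduper:
    """Drops an event if the same key was accepted less than `window_ms` ago."""

    def __init__(self, window_ms: int) -> None:
        self.window_ms = window_ms
        # key -> timestamp (ms) of the last event that was let through
        self._last_accepted: Dict[str, int] = {}

    def should_emit(self, key: str, now_ms: int) -> bool:
        last = self._last_accepted.get(key)
        if last is not None and now_ms - last < self.window_ms:
            return False  # inside the window: treated as a duplicate
        self._last_accepted[key] = now_ms
        return True


short = WindowedDeduper(window_ms=50)
long = WindowedDeduper(window_ms=5000)

# A retry landing 200 ms after the original: the 50 ms window lets both through.
results_short = [short.should_emit("signup_click", 0),
                 short.should_emit("signup_click", 200)]

# A deliberate second press 3 s later: the 5 s window drops it.
results_long = [long.should_emit("signup_click", 0),
                long.should_emit("signup_click", 3000)]
```

In this sketch the window is measured from the last accepted event, so a dropped duplicate does not extend it. Measuring from the last seen event instead is an equally plausible design, and it makes a long window even more aggressive.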
[!TIP] When tuning a dedupe window, first look at the actual time gaps between unwanted duplicate events in production logs. Don’t guess a “safe-sounding” number like 500ms or 5s without that data.
This is one of those small parameters that’s easy to set once and forget, but worth revisiting once you have real production event timing to look at.
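One rough way to look at that timing is to compute the gaps between consecutive events that share a dedupe key. The sketch below assumes the logs can be reduced to `(key, timestamp_ms)` pairs. The `same_key_gaps` helper and the sample data are made up for illustration and are not real measurements.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def same_key_gaps(events: Iterable[Tuple[str, int]]) -> List[int]:
    """Return the sorted gaps (ms) between consecutive events sharing a key."""
    by_key: Dict[str, List[int]] = defaultdict(list)
    for key, ts_ms in events:
        by_key[key].append(ts_ms)

    gaps: List[int] = []
    for timestamps in by_key.values():
        timestamps.sort()
        gaps.extend(b - a for a, b in zip(timestamps, timestamps[1:]))
    return sorted(gaps)


# Hypothetical sample: (dedupe key, timestamp in ms).
sample = [
    ("signup_click", 1000), ("signup_click", 1012),  # plausibly a retry
    ("signup_click", 9000),                          # plausibly a deliberate repeat
    ("page_view", 2000), ("page_view", 2030),
]
gaps = same_key_gaps(sample)
```

If the gaps fall into two clusters, one of very short gaps (likely retries) and one of much longer gaps (likely intentional repeats), a window between the two clusters is a reasonable starting point. If the clusters overlap, no single window separates them cleanly, and the dedupe key probably needs attention instead.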
There’s also a question of what the dedupe key should be, separate from the window size. Keying purely on event name collapses too aggressively — two genuinely different calls with the same name but different properties shouldn’t dedupe against each other. Keying on the full event payload (name plus all properties) is safer but can fail to catch duplicates where a timestamp or request-id property differs between the “real” event and its retry, even though the underlying user action was identical. A dedupe key built from event name plus the semantically stable properties — excluding anything generated fresh per call, like a request id or client timestamp — tends to catch the duplicates that matter without over-collapsing legitimately repeated actions.
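A key of that third kind might be built as in the sketch below. The property names in `VOLATILE_PROPERTIES` (`request_id`, `client_timestamp`) and the `dedupe_key` function are hypothetical. Which properties count as generated fresh per call depends on your own event schema.

```python
import json
from typing import Any, Dict, FrozenSet

# Assumed names of properties regenerated on every call; adjust to your schema.
VOLATILE_PROPERTIES: FrozenSet[str] = frozenset({"request_id", "client_timestamp"})


def dedupe_key(name: str, properties: Dict[str, Any]) -> str:
    """Build a key from the event name plus its semantically stable properties."""
    stable = {k: v for k, v in properties.items() if k not in VOLATILE_PROPERTIES}
    # sort_keys makes the key independent of property insertion order;
    # default=str is a fallback for values json cannot serialize natively.
    return name + "|" + json.dumps(stable, sort_keys=True, default=str)


original = dedupe_key("add_to_cart", {"sku": "A1", "qty": 1, "request_id": "r-1"})
retry = dedupe_key("add_to_cart", {"qty": 1, "sku": "A1", "request_id": "r-2"})
other = dedupe_key("add_to_cart", {"sku": "B2", "qty": 1, "request_id": "r-3"})
```

Here `original` and `retry` produce the same key even though their request ids differ, while `other` gets a different key because its `sku` differs. An explicit exclusion list keeps new properties in the key by default, which errs toward under-deduplicating. An allowlist of stable properties would err the other way.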