Explanation

Collection cadence: periodic, not real-time

Signals collects on a schedule, not continuously, and the interval that suits short-lived debugging is the wrong one for steady operation. This essay explains why the cadence is what it is, and how to think about choosing it for your team.

Periodic evidence, not a live feed

It is tempting to read "collector" as "monitor" and reach for the shortest interval the tool allows, on the assumption that fresher is always better. For Signals that instinct is misleading. Signals produces periodic diagnostic evidence: a sequence of point-in-time snapshots you analyse and act on, not a real-time monitoring feed you watch for second-by-second change.

That distinction is not a limitation to work around; it follows from what Signals actually surfaces. Most of the conditions it captures — table and index bloat, vacuum lag, transaction-ID wraparound risk, configuration drift — move over hours and days, not seconds. A snapshot taken five minutes after the previous one describes, for almost all of these, the same database state. The extra snapshot adds volume without adding insight.

Why the code default is not the operating default

The poll_interval baked into the code defaults to 5m. That value exists for a specific, narrow purpose: short-lived debugging, where you have started the daemon to watch a problem unfold and want snapshots arriving quickly while you are paying attention. It is deliberately frequent so that an interactive session feels responsive.

For steady operation that same frequency is actively wrong, for two reasons that compound each other:

  • No one reviews a snapshot every few minutes. Diagnostic evidence is only worth collecting at the rate you can read and act on it. A snapshot that no human looks at before the next one arrives is waste, and at 5m the great majority go unread.
  • The volume overwhelms downstream triage. Five-minute snapshots feeding Elevarq Analyzer can open hundreds of tickets a day, far more than a team can triage. The pipeline does not slow down to match your capacity to respond; it simply produces a backlog no one will ever clear.

The lesson is that the most frequent interval the tool permits is not the most useful one. 5m is the right default for the debugging session it was designed for and the wrong default for a system you leave running.
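
To put rough numbers on that volume, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name snapshots_per_day is made up for this example and is not part of Signals, and it counts snapshots per monitored database, not the tickets an analyzer might open from them.

```python
from datetime import timedelta


def snapshots_per_day(poll_interval: timedelta) -> float:
    """Number of collection cycles that fit into 24 hours (illustrative helper)."""
    return timedelta(days=1) / poll_interval


# Intervals discussed in this essay; the labels mirror the poll_interval values.
intervals = {
    "5m": timedelta(minutes=5),
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "24h": timedelta(hours=24),
}

for label, interval in intervals.items():
    print(f"{label:>3}: {snapshots_per_day(interval):g} snapshots/day")
```

At 5m the daemon produces 288 snapshots a day; at 6h it produces 4. The question to ask of each number is whether anyone will read that many.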

Match the interval to action, not to freshness

The governing principle is simple: choose poll_interval to match how often you will act on the output, not how fresh you would like the data to feel. Freshness you never read is not a benefit. The right cadence is the slowest one that still gives you a new snapshot each time you are ready to look at one.

In practice that resolves to a small number of sensible operating points, each tied to a way of working rather than to a number you tune in isolation:

  • 24h: a daily snapshot, suited to a hands-on DBA who reviews the database once a day.
  • 6h: four snapshots a day, a sound steady-state starting point for most teams.
  • 1h: only worth choosing if your team genuinely has the triage capacity to act on that ticket volume. Without it, the hourly cadence just rebuilds the backlog at a slower rate.

Read those as a spectrum anchored by capacity to respond, not by data velocity. If you cannot say who will act on the extra snapshots and when, the longer interval is the correct choice.
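
The rule "the slowest interval that still gives you a new snapshot each time you are ready to look" can be written down as a small selection function. The sketch below is a thinking aid, not a Signals feature: suggest_poll_interval and OPERATING_POINTS are hypothetical names, and the fallback for teams that review more often than hourly is an assumption of this example.

```python
from datetime import timedelta

# The three operating points named above, as durations.
OPERATING_POINTS = [timedelta(hours=1), timedelta(hours=6), timedelta(hours=24)]


def suggest_poll_interval(review_every: timedelta) -> timedelta:
    """Pick the slowest operating point that is no longer than the review gap.

    review_every is how often someone actually reads and acts on a snapshot.
    """
    candidates = [point for point in OPERATING_POINTS if point <= review_every]
    if not candidates:
        # Assumption: reviews more frequent than hourly still map to 1h,
        # the fastest steady-state point discussed in this essay.
        return min(OPERATING_POINTS)
    return max(candidates)


print(suggest_poll_interval(timedelta(days=1)))   # daily review
print(suggest_poll_interval(timedelta(hours=8)))  # once per shift
```

A team that reviews once per eight-hour shift lands on 6h, not 1h: the extra hourly snapshots would arrive between reviews and go unread.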

Each collector has its own natural cadence

poll_interval sets the rhythm of the collection cycle, but it is not the whole story. Each collector also has its own natural cadence, the rate at which the thing it measures actually changes, expressed as one of 5m, 15m, 1h, 6h, or 24h. Fast-moving activity statistics have a short natural cadence; slow-moving schema and configuration snapshots have a long one.

Because of this, you do not schedule individual collections, and you do not need the cycle to run as fast as your fastest collector. The cadence you choose is the heartbeat; the collectors layer their own intended rhythm on top of it.
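
One way to picture the layering is a small simulation. Everything in it is an assumption made for illustration: the collector names are invented, and the rule "a collector runs in a cycle once at least its natural cadence has elapsed since its last run" is a plausible reading of the paragraph above, not a description of how Signals is implemented.

```python
from datetime import timedelta

# Hypothetical collectors with natural cadences drawn from the allowed set.
NATURAL_CADENCE = {
    "activity_stats": timedelta(minutes=5),
    "bloat": timedelta(hours=6),
    "schema_snapshot": timedelta(hours=24),
}


def simulate_runs(poll_interval: timedelta, horizon: timedelta) -> dict[str, int]:
    """Count how often each collector runs when cycles fire every poll_interval."""
    last_run: dict[str, timedelta] = {}
    runs = {name: 0 for name in NATURAL_CADENCE}
    elapsed = timedelta(0)
    while elapsed < horizon:
        for name, cadence in NATURAL_CADENCE.items():
            previous = last_run.get(name)
            # Assumed rule: run on the first cycle, then whenever the
            # collector's own cadence has elapsed since its last run.
            if previous is None or elapsed - previous >= cadence:
                last_run[name] = elapsed
                runs[name] += 1
        elapsed += poll_interval
    return runs


print(simulate_runs(timedelta(hours=6), timedelta(hours=24)))
```

Under these assumptions, a 6h heartbeat over one day runs the fast collector four times (once per cycle, never more often than the heartbeat) and the 24h collector once. The heartbeat caps how often anything runs; the natural cadence keeps slow collectors from running more often than they need to.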

Incidents: keep the cadence calm, trigger on demand

An incident feels like the moment to crank the interval down, but doing so re-creates exactly the volume problem described above — at the worst possible time, when triage attention is already scarce. The better pattern keeps the two concerns separate: leave poll_interval calm at its steady-state value, and take an immediate snapshot by hand whenever you actually need one.

signalsctl collect now

On-demand collection gives you the fresh evidence an incident calls for without committing the daemon to a frequency you will regret once the incident is over. The steady-state cadence and the incident snapshot are different tools for different needs; conflating them by lowering the interval trades a one-time need for a permanent cost.

Where the settings live

This page is about how to think about cadence; the concrete settings live elsewhere. For the poll_interval field, its precedence rules, and the per-collector cadence values, see Configuration. For running the daemon as a long-lived service so that the cadence you choose is the mechanism that actually drives collection, see Run as a service.
