All systems operational

updated 30 s ago

Every service is healthy.

Overall uptime

90.403%

last 90 days

Active incidents

0

all clear

Services tracked

8

public + private

Mean time to resolve

24 m

rolling 90 days

// services

Per-service health.

Sending API

Operational · 90-day uptime: 90.222%

POST /v1/email and the synchronous send pipeline.

SMTP delivery

Operational · 90-day uptime: 89.778%

Outbound MTA pool — dedicated + shared IPs.

Webhook delivery

Operational · 90-day uptime: 91.667%

Outbound HTTP delivery + replay queue.

Dashboard

Operational · 90-day uptime: 90.889%

app.voltmail.io — analytics + admin UI.

Postal cluster

Operational · 90-day uptime: 88.667%

Four bare-metal nodes across us-east-1 + eu-west-1.

Postgres (primary)

Operational · 90-day uptime: 89.778%

Customer + suppression + log metadata stores.

Redis + BullMQ

Operational · 90-day uptime: 90.556%

Idempotency cache + job queues.

CDN + static assets

Operational · 90-day uptime: 91.667%

Marketing site + dashboard JS/CSS.
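
The overall uptime of 90.403% is consistent with an unweighted mean of the eight per-service figures. The sketch below checks that arithmetic. The averaging rule is an assumption inferred from the numbers, not a documented formula, and the names `service_uptime` and `overall_uptime` are illustrative.

```python
# Per-service 90-day uptime percentages, copied from the service list.
service_uptime = {
    "Sending API": 90.222,
    "SMTP delivery": 89.778,
    "Webhook delivery": 91.667,
    "Dashboard": 90.889,
    "Postal cluster": 88.667,
    "Postgres (primary)": 89.778,
    "Redis + BullMQ": 90.556,
    "CDN + static assets": 91.667,
}


def overall_uptime(uptimes):
    """Unweighted mean of per-service uptime percentages (assumed aggregation)."""
    return sum(uptimes.values()) / len(uptimes)


overall = overall_uptime(service_uptime)
print(f"{overall:.3f}%")  # prints 90.403%
```

A weighted scheme (for example, by request volume) would give a different figure; the plain mean is simply the one that reproduces the number shown.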

// recent incidents

Past 90 days, with timelines.

Every incident gets a public phase-by-phase write-up. Post-mortems are linked from the Resolved phase whenever published.

Elevated 5xx on POST /v1/email in eu-west-1

Resolved
Sending API
  1. Investigating · 07:23 UTC

    Synthetic checks observed elevated 5xx on the EU send pipeline.

  2. Identified · 07:31 UTC

    Root cause: a stuck Postgres replica that had drifted out of sync on idempotency-cache writes. Failing over to the secondary.

  3. Monitoring · 07:42 UTC

    Failover complete. Error rate back to baseline.

  4. Resolved · 07:51 UTC

    Replica recovered, monitored for 9 minutes, declaring resolved. Post-mortem: voltmail.io/post-mortems/2026-04-15.

Webhook delivery latency above SLO

Resolved
Webhook delivery
  1. Investigating · 14:08 UTC

    p95 webhook delivery latency rose above 8s. Investigating queue depth.

  2. Identified · 14:21 UTC

    A single noisy customer with 14k endpoints that had exhausted their retries saturated the BullMQ dead-letter queue (DLQ) workers.

  3. Monitoring · 14:30 UTC

    Per-tenant queue isolation rolled forward. Latency back to <1s.

  4. Resolved · 14:42 UTC

    No customer was impacted for more than 5 minutes; SLA credits were issued automatically.
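
The per-tenant queue isolation mentioned in the Monitoring phase can be illustrated with a toy round-robin scheduler. This is a minimal sketch of the general idea only: the write-up does not describe the actual rollout, this is not the BullMQ API, and `PerTenantQueue` and the tenant names are hypothetical.

```python
from collections import OrderedDict, deque


class PerTenantQueue:
    """Toy scheduler: one FIFO per tenant, served in round-robin order."""

    def __init__(self):
        self._queues = OrderedDict()  # tenant id -> deque of pending jobs

    def enqueue(self, tenant, job):
        self._queues.setdefault(tenant, deque()).append(job)

    def dequeue(self):
        """Return (tenant, job) for the next tenant in rotation, or None if empty."""
        if not self._queues:
            return None
        tenant, jobs = next(iter(self._queues.items()))
        job = jobs.popleft()
        del self._queues[tenant]
        if jobs:
            self._queues[tenant] = jobs  # back of the rotation
        return tenant, job


q = PerTenantQueue()
for i in range(5):
    q.enqueue("noisy", i)      # one tenant with a large backlog
q.enqueue("quiet", "q0")       # another tenant with a single job

order = [q.dequeue() for _ in range(3)]
```

With a single shared FIFO the quiet tenant's job would wait behind all five noisy jobs; here it is served second, so one tenant's backlog cannot starve the others.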

Scheduled maintenance — Postgres major upgrade

Resolved
Postgres (primary)
  1. Maintenance · 02:00 UTC

    Pre-announced read-only window for Postgres 16 → 17 upgrade.

  2. Resolved · 02:23 UTC

    Upgrade complete. Read+write restored. No data loss.

Dashboard JS asset 404 after deploy

Resolved
Dashboard
  1. Investigating · 18:14 UTC

    Some users see "ChunkLoadError" on app.voltmail.io.

  2. Identified · 18:17 UTC

    CDN cache mismatch after deploy — old chunk hashes pointed to purged objects.

  3. Resolved · 18:21 UTC

    Cache fully purged and fresh chunk hashes rolled out. Issue resolved.
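
Time to resolve for each incident can be read off the timelines as the gap between the first phase and the Resolved phase. The sketch below does that arithmetic, assuming every incident starts and ends on the same UTC day (the timelines list times only, not dates). The labels and the helper `minutes_between` are illustrative.

```python
from datetime import datetime

# (affected service, first phase time, Resolved time), all UTC, from the timelines.
incidents = [
    ("Sending API", "07:23", "07:51"),
    ("Webhook delivery", "14:08", "14:42"),
    ("Postgres (primary)", "02:00", "02:23"),
    ("Dashboard", "18:14", "18:21"),
]


def minutes_between(start, end):
    """Whole minutes from start to end, both 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


durations = {service: minutes_between(s, e) for service, s, e in incidents}
mean_minutes = sum(durations.values()) / len(durations)

for service, minutes in durations.items():
    print(f"{service}: {minutes} min")
print(f"mean: {mean_minutes:.0f} min")
```

The four listed incidents come out at 28, 34, 23, and 7 minutes, for a mean of 23 minutes. The headline mean time to resolve may differ slightly if it is computed from finer-grained timestamps or a different set of incidents.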