Post Mortem: Expired SMTP TLS Certificate

Summary

The TLS certificate for our SMTP server (smtp.improvmx.com) expired. For ~6 hours, customers sending email through ImprovMX on port 587 with a mail app that verifies certificates were unable to send. Inbound email forwarding was not affected. Once detected, we renewed the certificate and restored sending.

Timeline (May 14-15, 2026, all times UTC)

  • April 2026 — We incorrectly rotate the Cloudflare credential used for certificate renewal, and renewals begin failing silently.
  • May 14, 23:08 — [Downtime Begins] The TLS certificates for our SMTP servers expire, and customers are unable to send SMTP emails.
  • May 15, 4:33 — [First Customer Detection] Customer churn report mentions expired SMTP certificate.
  • May 15, 4:40 — [First Responder Signs On] Matthew sees report and begins investigating.
  • 4:42 — [Customers Alerted] Matthew opens a status page incident.
  • 4:52 — [Root Cause Identified][Mitigation Begins] Matthew discovers that certificate auto-renewal has been failing silently because of the rotated credential. He begins manually obtaining a new certificate.
  • 5:15 — [Downtime Ends] Fresh certificates are issued and pushed to all SMTP servers.
  • 5:18 — [Recovery Complete] New certificates verified live across the SMTP fleet. Status page resolved.

What Went Wrong

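Our SMTP certificates renew automatically, but the renewal process depends on a Cloudflare credential. We rotated that credential incorrectly in April 2026, and renewals failed silently from then until the certificates expired.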

The bigger failure was detection: we relied on a third-party certificate monitoring service to alert us when certificates were close to expiring, but the service gave us no warning.
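To make the detection gap concrete, here is a minimal sketch of the kind of expiry check that could have warned us, using only the Python standard library. It is an illustration, not our production monitoring: the 21-day threshold and the function names (`days_remaining`, `fetch_not_after`, `classify`, `check_host`) are assumptions made up for this example.

```python
import smtplib
import ssl
from datetime import datetime, timezone

WARN_DAYS = 21  # assumed alert threshold, chosen only for this sketch


def days_remaining(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter string."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def fetch_not_after(host: str, port: int = 587, timeout: float = 10.0) -> str:
    """Return notAfter for the certificate served after STARTTLS."""
    context = ssl.create_default_context()  # verifies chain and hostname
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.starttls(context=context)
        cert = smtp.sock.getpeercert()
    return cert["notAfter"]


def classify(host: str, left: float, warn_days: float = WARN_DAYS) -> str:
    """Turn the remaining lifetime into an OK or ALERT message."""
    status = "ALERT" if left < warn_days else "OK"
    return f"{status} {host}: certificate expires in {left:.1f} days"


def check_host(host: str, warn_days: float = WARN_DAYS) -> str:
    """Check one SMTP host; connection errors are left to propagate."""
    now = datetime.now(timezone.utc)
    try:
        left = days_remaining(fetch_not_after(host), now)
    except ssl.SSLCertVerificationError as exc:
        # An already-expired or otherwise invalid certificate lands here.
        return f"ALERT {host}: certificate failed verification ({exc.verify_message})"
    return classify(host, left, warn_days)
```

Run from a scheduler, `check_host("smtp.improvmx.com")` would return an ALERT line once the certificate has fewer than 21 days left. The check goes through STARTTLS on port 587, so it sees the same certificate a customer's mail app would see.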

Followup Actions

  • Add working first-party monitoring that checks our certificates and alerts us well before they expire. Certificates are too critical to entrust to an unreliable third party.
  • Add alerting on certificate renewal failures so a broken renewal is caught immediately.
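As a rough sketch of the second action, a renewal job can be wrapped so that any failure raises an alert instead of passing unnoticed. The wrapper below is hypothetical: `run_renewal`, `stderr_alert`, and the 600-second timeout are names and values invented for this example, and the actual renewal command and alert channel are left to the caller.

```python
import subprocess
import sys
from typing import Callable, Sequence


def stderr_alert(message: str) -> None:
    """Placeholder alert channel; a real one would page or email someone."""
    print(f"[cert-renewal] {message}", file=sys.stderr)


def run_renewal(
    command: Sequence[str],
    alert: Callable[[str], None] = stderr_alert,
    timeout: float = 600.0,  # assumed limit for this sketch
) -> bool:
    """Run the renewal command and call `alert` if it fails in any way."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The command is missing, not executable, or hung.
        alert(f"renewal could not run: {exc!r}")
        return False
    if result.returncode != 0:
        # Include the tail of the output so the cause is visible in the alert.
        tail = (result.stderr or result.stdout).strip()[-500:]
        alert(f"renewal exited with status {result.returncode}: {tail}")
        return False
    return True
```

With a wrapper like this, a renewal that fails on a rejected credential would exit non-zero and trigger an alert on the first failed attempt, weeks before the certificate actually expires.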

Statement

We sincerely apologize for the disruption to SMTP sending. This was a preventable outage that should have been caught long before it affected customers. We've prioritized the monitoring work above so an expiring certificate can never quietly slip through again.

Matthew Tse

Owner and CEO of ImprovMX