Post Mortem: Slow SQL Queries and Missing SQL Indexes

Post Mortems
October 8, 2025

Summary

We recently pushed a change that introduced an inefficient SQL query that was not backed by a SQL index. Under heavy traffic, this query overloaded the SQL servers, which resulted in delayed email deliveries.

Timeline (10/08/2025 Eastern US Time):

10:02 Incoming email traffic rises
10:22 [Service Degradation Begins][Machine Alert]: We get an email alert that SQL connection errors are rising and emails are getting delayed
10:23 [First Responder Signs On]: Matthew signs on and begins investigating
10:30 [Customers Alerted]: We post an incident to status page: https://status.improvmx.com/incident/739965
10:37 [Mitigation #1 Attempted]: We push code to lower the number of connections from each mail server. This doesn't seem to help.
10:50 [Mitigation #2 Attempted]: We tune the InnoDB cache to use more of the available memory. This doesn't seem to help.
11:02 [Mitigation #3 Attempted][Recovery Begins]: We discover a new slow query due to a misconfigured SQL index. We push code to revert the new query. Query performance recovers, and email delivery times begin normalizing.
11:04 We post a status update that we've identified the issue, and the remediation is in progress.
11:23 [Recovery Complete]: We post a status update that delivery times have fully normalized

What Went Wrong

We mistakenly modified a resource-intensive SQL query in a way that meant it no longer used the SQL index.
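
To illustrate this class of failure, here is a minimal sketch of how a small change to a query can stop it from using an index. This is not our actual query or schema: the aliases table, the idx_aliases_domain index, and the lower() wrapper are all hypothetical, and the example uses Python's built-in SQLite rather than our production database so that it runs anywhere. The idea carries over in spirit: asking the database for its query plan before and after a change shows whether the index is still being used.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan steps SQLite reports for a query (the 'detail' column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE aliases (id INTEGER PRIMARY KEY, domain TEXT, target TEXT)"
)
conn.execute("CREATE INDEX idx_aliases_domain ON aliases (domain)")

# Original form: compares the indexed column directly, so the index applies.
indexed_plan = query_plan(
    conn, "SELECT * FROM aliases WHERE domain = ?", ("example.com",)
)

# Modified form: wrapping the column in a function hides it from the plain
# index on (domain), so SQLite falls back to scanning the whole table.
unindexed_plan = query_plan(
    conn, "SELECT * FROM aliases WHERE lower(domain) = ?", ("example.com",)
)

print("before:", indexed_plan)
print("after: ", unindexed_plan)
```

The first plan reports a search using idx_aliases_domain, while the second reports a full table scan. A check like this in code review or in a test would flag the regression before it reaches production traffic.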

Action Items

  • IMX-1400: Add a proper index to the table, and re-enable the new query
  • IMX-1401: Add alerting on "slow queries" in SQL
  • IMX-1402: Tune the InnoDB cache on all SQL servers to use more of the available memory
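
As a rough sketch of what IMX-1401 could look like at the application level, the snippet below times each query and calls an alert hook when it exceeds a threshold. Everything here is an assumption for illustration: the timed_query helper, the 0.5 second threshold, and the on_slow callback are hypothetical names, and SQLite stands in for the production database. In MySQL, the server-side slow query log (the slow_query_log and long_query_time settings) is another possible source for this kind of alerting.

```python
import sqlite3
import time

# Hypothetical threshold; a real value would be tuned to normal query latency.
SLOW_QUERY_THRESHOLD_SECONDS = 0.5


def timed_query(conn, sql, params=(), threshold=SLOW_QUERY_THRESHOLD_SECONDS,
                on_slow=None, clock=time.perf_counter):
    """Run a query, and call on_slow(sql, elapsed) if it took too long."""
    start = clock()
    rows = conn.execute(sql, params).fetchall()
    elapsed = clock() - start
    if on_slow is not None and elapsed >= threshold:
        on_slow(sql, elapsed)
    return rows


def log_slow_query(sql, elapsed):
    # Stand-in for a real alert (email, pager, metrics counter).
    print(f"SLOW QUERY ({elapsed:.3f}s): {sql}")


conn = sqlite3.connect(":memory:")
rows = timed_query(conn, "SELECT 1", on_slow=log_slow_query)
```

The clock parameter exists only so the timing can be replaced in tests. An alert like this would have pointed at the new query directly, rather than at the connection errors it caused.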

We sincerely apologize for all the recent delivery issues and delays. We're treating them as hard-earned lessons and using them to harden our system's performance.

Matthew Tse
Owner and CEO of ImprovMX