Summary
Mail delivery was delayed for about 5 hours by a coordinated spam campaign that flooded ImprovMX customer aliases with phishing emails from dozens of disposable domains. About 29K messages backed up in the mail queue for two reasons: delivery workers stalled on connection timeouts while sending bounce reports back to the spammers' servers, and Gmail rejected forwarded spam with "low reputation" errors that triggered repeated soft-bounce retries.
Timeline (March 27, 2026, all times US Eastern / EDT)
- 11:11 AM — [Service Degradation Begins][Machine Alert] BetterStack alert fires: "ImprovMX Mail queue is large"
- 11:15 AM — [Oncall Signs On] Yosif acknowledges
- 12:01 PM — [Root Cause Identified] Yosif identifies multiple spammer domains sending the same 2-3 spam templates (fake Marriott gift cards, Omaha Steaks promos, BlueCross insurance, DocuSign phishing) across many ImprovMX customer aliases
- 12:10 PM — [Mitigation #1 Attempted] Rate-limiting of spammer domains begins
- 12:42 PM — Rate limits progressively tightened from 250/hr → 150/hr → 50/hr. Queue dips temporarily but grows again as new domains appear
- 2:03 PM — [First Customer Report] First customer reports delayed deliveries
- 2:08 PM — [Escalation] Matthew acknowledges the incident and signs on as well
- 2:41 PM — [Customers First Notified] Status page updated with incident
- 3:06 PM — [Mitigation #2 Attempted] Yosif deploys alias pattern rate limiting on spammers
- 3:37 PM — Yosif posts list of 41 spam domains for bulk banning
- 3:44 PM — [Mitigation #3 Attempted][Recovery Begins] Vacuum script deployed — begins draining bouncing spam messages from the mail queue into a quarantine queue. Mail queue begins dropping in size.
- 4:14 PM — Yosif identifies that delivery workers are wasting a lot of time waiting on timeouts when bouncing messages back to the spam servers.
- 4:27 PM — [Mitigation #4 Attempted] Based on Yosif's theory above, Matthew deploys code so delivery workers will stop trying to deliver bounce back reports to spam domains.
- 4:36 PM — [Recovery Complete] Mail queue back to zero
- 4:44 PM — [Incident Complete] All-clear posted to status.improvmx.com
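The progressive tightening of per-domain limits in Mitigation #1 (250/hr → 150/hr → 50/hr) can be pictured with a small sliding-window limiter whose limits are changed at runtime. This is a minimal sketch, not the code that was deployed: the class name `SenderRateLimiter`, its methods, and the in-memory storage are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class SenderRateLimiter:
    """Sliding-window limiter keyed by sender domain (hypothetical sketch)."""

    def __init__(self, default_limit, window_seconds=3600.0, clock=time.monotonic):
        self.default_limit = default_limit    # messages allowed per window
        self.window_seconds = window_seconds  # one hour by default
        self.clock = clock                    # injectable so tests can control time
        self.overrides = {}                   # domain -> tighter limit, editable at runtime
        self.events = defaultdict(deque)      # domain -> timestamps of accepted messages

    def set_limit(self, domain, limit):
        """Change one domain's limit without restarting anything."""
        self.overrides[domain.lower()] = limit

    def allow(self, domain):
        """Return True and record the message if the domain is under its limit."""
        domain = domain.lower()
        now = self.clock()
        window = self.events[domain]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        limit = self.overrides.get(domain, self.default_limit)
        if len(window) >= limit:
            return False
        window.append(now)
        return True
```

With a limiter like this, tightening a spammer domain from 250/hr to 50/hr is a single `set_limit("spam.example", 50)` call. A production version would presumably keep its counters in shared storage so that every delivery worker sees the same limits.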
What Went Wrong & Action Items
Our infrastructure was vulnerable to a spam-driven denial of service (DDoS) at several levels. Fixing these weaknesses is our top priority.
Key Vulnerabilities:
- Our front-door spam detection is not reliably integrated with spam blacklists such as Spamhaus and SpamCop, which let more spam into our systems. We will fix this.
- Our mail pipeline retries too aggressively on mail bounced by Gmail for "low domain reputation, likely spam". In future, we will stop retrying these messages sooner.
- Our mail pipeline waits and retries too aggressively on spam domains that have no intention of receiving bounceback reports. We will drop these bounceback reports earlier.
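The two pipeline changes above amount to a stricter retry policy. The sketch below shows one way such a policy could look. It is an illustration under stated assumptions, not our implementation: the marker substrings, the attempt budgets, and all function names are hypothetical, and the real reply text should be taken from production logs.

```python
from dataclasses import dataclass

# Assumed marker substrings for reputation-based rejections (hypothetical).
REPUTATION_MARKERS = ("low reputation", "likely spam")

MAX_ATTEMPTS_DEFAULT = 8     # assumed budget for ordinary soft bounces
MAX_ATTEMPTS_REPUTATION = 1  # assumed: give up after the first reputation rejection


@dataclass(frozen=True)
class SmtpReply:
    code: int  # three-digit SMTP reply code
    text: str  # human-readable reply text


def is_reputation_rejection(reply):
    """True if the reply text looks like a reputation-based rejection."""
    text = reply.text.lower()
    return any(marker in text for marker in REPUTATION_MARKERS)


def should_retry(reply, attempt):
    """Decide whether to retry; `attempt` counts attempts made so far."""
    if 200 <= reply.code < 300:
        return False  # delivered, nothing to retry
    if reply.code >= 500:
        return False  # permanent failure
    if is_reputation_rejection(reply):
        return attempt < MAX_ATTEMPTS_REPUTATION
    return attempt < MAX_ATTEMPTS_DEFAULT


def should_send_bounce_report(sender_domain, banned_domains):
    """Skip bounce reports to domains already identified as spam sources."""
    return sender_domain.lower() not in banned_domains
```

Under these assumptions, a 4xx reply that mentions low reputation is dropped after the first attempt, while an ordinary temporary failure keeps its normal retry budget.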
Other Action Items:
- Add no-deploy-needed capability for Operations to rate-limit senders
- Add more metrics to our mail pipeline to give an immediate view of the queue's composition (how many messages are legit, bouncing to destination, stuck in bounce-report, etc.)
- Add ready-made tools for quarantining messages out of the mail queue
- Update our on-call alerts and playbooks to ensure earlier customer notification of downtime.
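The queue-composition metric and the quarantine tooling listed above could start from something as small as the sketch below. It is hypothetical throughout: the `QueuedMessage` fields, the state names, and the choice to select messages by banned sender domain are assumptions for illustration, and an in-memory `deque` stands in for the real mail queue.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass
class QueuedMessage:
    sender_domain: str
    state: str  # assumed states: "pending", "retrying", "bounce_report"


def composition(mail_queue):
    """Count queued messages by state for an at-a-glance view."""
    return Counter(msg.state for msg in mail_queue)


def vacuum(mail_queue, quarantine, banned_domains):
    """Move messages from banned sender domains into quarantine.

    The remaining messages keep their original order.
    Returns the number of messages moved.
    """
    moved = 0
    # Rotate through the queue exactly once.
    for _ in range(len(mail_queue)):
        msg = mail_queue.popleft()
        if msg.sender_domain.lower() in banned_domains:
            quarantine.append(msg)
            moved += 1
        else:
            mail_queue.append(msg)
    return moved
```

Quarantining rather than deleting keeps the messages available for later review or release if a domain turns out to have been banned by mistake.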
Statement
We sincerely apologize for the delivery delays caused by this incident, and we are especially embarrassed by how long it took us to post the first status page update. We are taking urgent steps to harden our systems and operational policies to prevent similar attacks from impacting our customers in the future.
