Web UI unavailable and ingestion has stopped
Incident Report for Mezmo Status Page
Postmortem

Dates:
Start Time: Thursday, January 14, 2021, at 19:42 UTC
End Time: Thursday, January 14, 2021, at 20:27 UTC
Duration: 0:45:00

What happened:

Our web UI became unavailable and ingestion of new logs stopped for 45 minutes. For customers using our ingestion client agent, logs were automatically resent later and ingested successfully.

Why it happened:

The certificate used by all our services expired. Consequently, all API calls to our service failed, which caused our web UI to fail and ingestion of new logs to stop.

How we fixed it:

We renewed the certificate and applied it to all affected services. Our web UI became responsive again and ingestion resumed. Since no logs had been ingested for about 45 minutes, our service had a moderately large backlog to process. While it caught up, users experienced delays in search, graphing, and timelines for newly submitted logs.

What we are doing to prevent it from happening again:

We’re tightening our internal notifications of upcoming expiration dates for all certificates our service relies upon.
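
As an illustration of what an expiration notification can look like, here is a minimal sketch in Python using only the standard library. It is not a description of our internal tooling: the function names, the 30-day threshold, and the host list are assumptions chosen for the example. It reads the `notAfter` field of a server certificate and reports how many days remain before it expires.

```python
import socket
import ssl
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jun  1 12:00:00 2030 GMT". `now` must be timezone-aware.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch
    return (expires_at - now.timestamp()) / SECONDS_PER_DAY


def needs_renewal(not_after: str, now: datetime, threshold_days: float = 30.0) -> bool:
    """True when the certificate expires within `threshold_days` (assumed value)."""
    return days_until_expiry(not_after, now) <= threshold_days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port over TLS and return the certificate's notAfter field.

    The default context verifies the certificate, so an already expired
    certificate raises ssl.SSLCertVerificationError during the handshake.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


if __name__ == "__main__":
    # Hypothetical host list; replace with the endpoints you actually serve.
    hosts = ["example.com"]
    now = datetime.now(timezone.utc)
    for host in hosts:
        try:
            not_after = fetch_not_after(host)
        except OSError as exc:  # covers ssl.SSLError and connection failures
            print(f"{host}: check failed ({exc})")
            continue
        remaining = days_until_expiry(not_after, now)
        status = "RENEW SOON" if needs_renewal(not_after, now) else "ok"
        print(f"{host}: {remaining:.1f} days remaining [{status}]")
```

A check like this is typically run on a schedule, with the threshold set well ahead of the renewal lead time so that a warning arrives before the certificate lapses. A failed handshake is reported as well, because an expired certificate shows up as a verification error and never as a day count.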

Posted Jan 19, 2021 - 21:59 UTC

Resolved
The web UI is available again and ingestion has resumed. All services are operational.
Posted Jan 14, 2021 - 20:27 UTC
Investigating
The web UI is unavailable and ingestion has stopped. We are investigating.
Posted Jan 14, 2021 - 20:05 UTC
This incident affected: Log Analysis (Log Ingestion (Agent/REST API/Code Libraries), Log Ingestion (Heroku), Log Ingestion (Syslog), Web App).