Ingestion of new logs — for Syslog only - Partial Outage

Incident Report for Mezmo Status Page

Postmortem

Dates:
Start Time: Friday, February 18, 2022, at 00:10 UTC
End Time: Thursday, February 24, 2022, at 23:43 UTC
Duration: 167:33:00

What happened:

The ingestion of new logs to our Syslog endpoint was intermittently failing.

Why it happened:

We recently introduced a new service (Syslog Forwarder) to handle the ingestion of logs sent over Syslog. As the name implies, it forwards logs to downstream services. It was designed to send all logs submitted for each account to a single port opened on the downstream services. Our original design did not include load balancing, and it performed well in our pre-production testing.

Once put into production, however, it became apparent that some customer accounts submit logs at a volume higher than the downstream services could process. When this happened, the Syslog Forwarder buffered the unsent log lines in memory. Memory usage grew until the pods crashed. Any log lines held on those pods were lost and never ingested.
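
The failure mode described above can be illustrated with a minimal sketch. It is not the Syslog Forwarder's code: the function name, the rates, and the memory limit are all hypothetical, chosen only to show how a buffer with no upper bound grows when lines arrive faster than they can be forwarded.

```python
def seconds_until_crash(ingest_rate, drain_rate, memory_limit, max_seconds):
    """Simulate an unbounded in-memory buffer (all values are hypothetical).

    ingest_rate:  log lines received per second
    drain_rate:   log lines the downstream service accepts per second
    memory_limit: number of buffered lines at which the pod would crash
    Returns the second at which the limit is exceeded, or None if it never is.
    """
    pending = 0  # lines currently held in memory
    for second in range(1, max_seconds + 1):
        pending += ingest_rate                 # new lines arrive
        pending -= min(drain_rate, pending)    # forward as many as downstream allows
        if pending > memory_limit:
            return second                      # buffer outgrew available memory
    return None


# An account sending faster than downstream can process eventually crashes the pod.
overloaded = seconds_until_crash(ingest_rate=100, drain_rate=60,
                                 memory_limit=1000, max_seconds=1000)

# An account within downstream capacity never accumulates a backlog.
healthy = seconds_until_crash(ingest_rate=50, drain_rate=60,
                              memory_limit=1000, max_seconds=1000)
```

In this sketch the backlog grows by the difference between the two rates every second, so a sustained overload always reaches the limit; only the time it takes varies.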

How we fixed it:

We improved the design of the Syslog Forwarder by adding a pool of connections to the downstream services. In effect, we added traffic shaping to the Syslog Forwarder.
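
As a rough illustration of the idea, the sketch below spreads one account's log lines across several downstream connections instead of a single one. This report does not say how the pool selects a connection, so the round-robin policy, the class names, and the pool size here are assumptions, and the connections are in-memory stand-ins rather than real sockets.

```python
from itertools import cycle


class FakeConnection:
    """In-memory stand-in for a connection to a downstream service (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.sent = []  # lines "sent" over this connection

    def send(self, line):
        self.sent.append(line)


class ConnectionPool:
    """Distributes lines over a fixed set of connections (round-robin is an assumption)."""

    def __init__(self, size):
        self.connections = [FakeConnection(f"conn-{i}") for i in range(size)]
        self._next = cycle(self.connections)  # endless round-robin iterator

    def send(self, line):
        conn = next(self._next)
        conn.send(line)
        return conn.name


# One high-volume account no longer funnels everything through a single connection.
pool = ConnectionPool(size=4)
for i in range(1000):
    pool.send(f"account-42 log line {i}")

counts = [len(conn.sent) for conn in pool.connections]
```

With four connections, each one carries a quarter of the account's traffic, so no single downstream port has to absorb the full volume.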

What we are doing to prevent it from happening again:

The new architecture has been incorporated and proven resilient in production. No further work is needed to prevent this kind of incident from happening again.

Posted Mar 01, 2022 - 20:29 UTC

Resolved

This incident has been resolved. Please reach out to us at support@logdna.com with any additional questions.

Posted Feb 24, 2022 - 23:43 UTC

Identified

New logs (from Syslog only) are intermittently not being ingested by our service. We are working to restore this functionality as soon as possible.

Posted Feb 24, 2022 - 22:45 UTC

This incident affected: Log Analysis (Log Ingestion (Syslog)).