GitHub's April 2026 Availability: Major Incidents and What We Learned

Posted by u/Tiobasil · 2026-05-18 05:09:03

In April 2026, GitHub experienced 10 incidents that caused degraded performance across its services. To foster transparency, we published a detailed blog post covering the most significant events on April 23 and 27, and enhanced the GitHub status page with more granular information. Here are the key incidents and the steps we're taking to improve reliability.

1. What was the overall impact of April's incidents on GitHub services?

Throughout April, we recorded 10 separate incidents that resulted in degraded performance for various GitHub services. While most were short-lived, two incidents on April 1 stood out: a major code search outage lasting over 8 hours and a brief audit log disruption. These incidents affected code search queries, audit log access, and event streaming for a subset of users. Importantly, no data was lost in either case, and we have since implemented several improvements to prevent similar issues. Our focus remains on reducing incident frequency and recovery time.

Source: github.blog

2. What happened with GitHub's code search on April 1, 2026?

On April 1, between 14:40 and 17:00 UTC, GitHub's code search service was completely unavailable: for 2 hours and 20 minutes, 100% of search queries returned errors. At 17:00 UTC, we restored the service in a degraded state, returning stale results that reflected repository changes only up to approximately 07:00 UTC that day. Full recovery, with current data, was achieved by 23:45 UTC. The total period of degraded search lasted 8 hours and 43 minutes.
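
The windows in this timeline can be checked directly from the quoted timestamps. The following is a minimal Python sketch; the variable and function names are illustrative and not taken from the report. Measured from the start of the outage to full recovery, the timestamps span 9h05m, so the 8-hour-43-minute degraded figure appears to use boundaries that are not spelled out here.

```python
from datetime import datetime, timedelta

# Timestamps quoted in the report (all UTC, April 1, 2026).
outage_start = datetime(2026, 4, 1, 14, 40)
degraded_restore = datetime(2026, 4, 1, 17, 0)
full_recovery = datetime(2026, 4, 1, 23, 45)


def fmt(delta: timedelta) -> str:
    """Render a timedelta as 'XhYYm'."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h{minutes % 60:02d}m"


full_outage = degraded_restore - outage_start      # every query failed
stale_results = full_recovery - degraded_restore   # results served, but stale
start_to_recovery = full_recovery - outage_start   # whole span

print(fmt(full_outage))        # 2h20m
print(fmt(stale_results))      # 6h45m
print(fmt(start_to_recovery))  # 9h05m
```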

3. What caused the code search outage, and how was it resolved?

The root cause was a routine infrastructure upgrade to the messaging system supporting code search. An automated change was applied too aggressively, causing a coordination failure between internal services. This halted search indexing, making results stale. While the team worked to recover the messaging infrastructure, an unintended service deployment cleared internal routing state, escalating the staleness issue into a complete outage. We resolved it by performing a controlled restart of the messaging system, reestablishing coordination, and resetting the search index to a point before the disruption. No repository data was lost because the search index is a secondary, derived index—Git repositories themselves were unaffected. After re-indexing completed, all search results reflected the current state.
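
To illustrate why a derived index can be reset and rebuilt without data loss, here is a toy Python sketch. It assumes a hypothetical change log of `(path, text)` events and a snapshot of the index taken at a known log offset; none of these names or structures come from GitHub's actual system. Recovery copies the snapshot and replays every event after the checkpoint, so the source of truth is never touched.

```python
Index = dict[str, set[str]]  # token -> paths containing that token


def apply_event(index: Index, event: tuple[str, str]) -> None:
    """Re-index one file: drop its old postings, then add the new ones."""
    path, text = event
    for paths in index.values():
        paths.discard(path)
    for token in text.split():
        index.setdefault(token, set()).add(path)


def rebuild(snapshot: Index, log: list[tuple[str, str]], checkpoint: int) -> Index:
    """Reset to a snapshot taken at `checkpoint`, then replay later events."""
    index = {token: set(paths) for token, paths in snapshot.items()}
    for event in log[checkpoint:]:
        apply_event(index, event)
    # Drop tokens that no longer appear in any file.
    return {token: paths for token, paths in index.items() if paths}


# Hypothetical change log; the snapshot reflects only the first event.
log = [("a.py", "import os"), ("b.py", "import re"), ("b.py", "import sys")]
snapshot = {"import": {"a.py"}, "os": {"a.py"}}

index = rebuild(snapshot, log, checkpoint=1)
```

Because the index is computed entirely from the log, discarding it costs only the time needed to replay.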

4. What happened with the audit log service on April 1, 2026?

Also on April 1, between 15:34 and 16:02 UTC, the audit log service lost connectivity to its backing data store due to a failed credential rotation. During this 28-minute window, audit log history was unavailable via both the API and the web UI, and 4,297 API actors and 127 github.com users received 5xx errors. Events created during the window were delayed by up to 29 minutes in the github.com interface and in event streaming. No audit log events were lost: all events were ultimately written and streamed successfully. Customers using GitHub Enterprise Cloud with data residency were not impacted. We were alerted to the infrastructure failure at 15:40 UTC, six minutes after it began, and resolved it by correcting the credential rotation.
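
The report does not say how the rotation failed, so the following is only a sketch of one common fail-safe pattern, not GitHub's process: probe the new credential against the data store before promoting it, and keep the current one active if the probe fails. The function name and the `can_connect` callable are hypothetical.

```python
from typing import Callable


def rotate_credential(
    current: str,
    candidate: str,
    can_connect: Callable[[str], bool],
) -> str:
    """Return the credential that should be active after a rotation attempt.

    The candidate is promoted only if a connectivity probe succeeds;
    otherwise the current credential stays in place.
    """
    if can_connect(candidate):
        return candidate
    return current


# Stand-in for a real connectivity check: the store accepts these secrets.
accepted = {"old-secret", "new-secret"}

good = rotate_credential("old-secret", "new-secret", accepted.__contains__)
bad = rotate_credential("old-secret", "mistyped-secret", accepted.__contains__)
```

Here `good` is the new secret and `bad` falls back to the old one, so a broken candidate never takes the service offline.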

5. What improvements is GitHub implementing to prevent similar incidents?

Based on the code search outage, we are introducing several specific improvements:

  • Gradual upgrades with better health checks to catch problems before they cascade.
  • Deployment safeguards to prevent unintended changes during active incidents.
  • Faster recovery tooling to reduce time to restore service.
  • Better traffic isolation to prevent cascading impact from unexpected traffic spikes during outages.
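
The first two items above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not GitHub's tooling: `upgrade` and `healthy` are hypothetical callables supplied by the caller, and the incident flag stands in for whatever incident tracking is in place.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    batch_size: int,
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> list[str]:
    """Upgrade hosts batch by batch, halting after the first unhealthy batch.

    Returns the hosts that were upgraded before the rollout stopped.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    upgraded: list[str] = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            upgrade(host)
            upgraded.append(host)
        # Health gate: do not touch the next batch if this one is unhealthy.
        if not all(healthy(host) for host in batch):
            break
    return upgraded


def deploy_allowed(active_incident: bool, override: bool = False) -> bool:
    """Block deployments during an active incident unless explicitly overridden."""
    return override or not active_incident


hosts = ["n1", "n2", "n3", "n4"]
touched = staged_rollout(hosts, 1, lambda h: None, lambda h: h != "n2")
```

With a batch size of 1 and `n2` failing its check, only `n1` and `n2` are touched, which limits the blast radius of a bad change.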

For the audit log incident, we are reviewing our credential rotation processes to ensure they are more robust and fail-safe. These near-term and long-term investments aim to increase overall availability and reduce the impact of any future incidents.

6. Was any customer data lost during these April incidents?

No. Customer data was not lost in any of the April incidents. In the code search outage, the search index temporarily became stale or unavailable, but the underlying Git repositories were unaffected, and once indexing was restored and caught up, all results reflected the current state. During the audit log disruption, the interface and API were temporarily unavailable and events were delayed, but all audit log events were ultimately written and streamed successfully. We treat data integrity as our highest priority, and these incidents did not compromise that commitment.