Full Report
Microsoft is investigating an ongoing outage that is blocking admins worldwide from accessing the Exchange Admin Center (EAC). [...]
Analysis Summary
# Incident Report: Global Exchange Admin Center (EAC) Outage Investigation
## Executive Summary
Microsoft is investigating a critical, global service interruption that prevents IT administrators from accessing the Exchange Admin Center (EAC), tracked as incident EX1051697. Affected administrators see widespread "HTTP Error 500" messages, indicating a service failure potentially linked to recent changes made to the service. The response so far has involved reproducing the issue internally and validating a workaround access method.
## Incident Details
- **Discovery Date:** April 9, 2025 (approx. 8:39 AM UTC, estimated as two hours before the 10:39 AM write time)
- **Incident Date:** April 9, 2025 (Ongoing as of reporting)
- **Affected Organization:** Microsoft (Internal service disruption impacting global customers)
- **Sector:** Cloud Services / Software as a Service (SaaS)
- **Geography:** Global
## Timeline of Events
### Initial Access
- **Date/Time:** April 9, 2025 (Approx. 8:39 AM UTC)
- **Vector:** Service Outage / Configuration Error (Not a typical external cyber attack vector)
- **Details:** The outage began at this time and blocked administrative access to the EAC worldwide.
### Lateral Movement
*Not applicable; this was a service availability incident, not an external breach.*
### Data Exfiltration/Impact
* Admins globally were blocked from accessing the EAC.
* Impact centered on administrative control over Exchange environments.
### Detection & Response
- **How it was discovered:** Affected IT administrators reported "HTTP Error 500" errors when logging into the EAC portal. The issue is tracked as EX1051697.
- **Response actions taken:**
  * Microsoft acknowledged the issue as a global outage.
  * Engineers reproduced the issue internally.
  * Diagnostic data was collected for troubleshooting.
  * A potential workaround URL (`https://admin.cloud.microsoft/exchange#/`) was suggested and is being verified.
  * Service engineers are reviewing "recent changes made to the service as a potential root cause."
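
The workaround check above can be sketched as a small probe that tries the primary portal first and falls back to the alternate URL when it receives a server error. This is an illustration under stated assumptions, not Microsoft's tooling: the `PRIMARY_URL` value, the function names, and the injectable `fetch_status` parameter are hypothetical, and only the workaround URL comes from the report. An unauthenticated request also only shows that an endpoint answers without a 5xx status, not that an admin session would work.

```python
from typing import Callable, Iterable, Optional
import urllib.error
import urllib.request

# Workaround URL quoted in the report.
WORKAROUND_URL = "https://admin.cloud.microsoft/exchange#/"
# Assumption: placeholder for the primary EAC entry point; adjust as needed.
PRIMARY_URL = "https://admin.exchange.microsoft.com/"


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request, including error codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return err.code


def first_healthy(
    urls: Iterable[str],
    fetch_status: Callable[[str], int] = http_status,
) -> Optional[str]:
    """Return the first URL that does not answer with a 5xx status."""
    for url in urls:
        try:
            status = fetch_status(url)
        except OSError:
            # DNS failures, timeouts, and refused connections count as unhealthy.
            continue
        if status < 500:
            return url
    return None
```

Calling `first_healthy([PRIMARY_URL, WORKAROUND_URL])` would return the workaround URL while the primary endpoint keeps answering with HTTP 500. Passing `fetch_status` as a parameter keeps the fallback logic testable without network access.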
## Attack Methodology
*Note: Since this is reported as an internal service outage, the MITRE ATT&CK TTPs related to external attackers are not applicable, though change management process failures could be relevant.*
- **Initial Access:** Service failure in the front-end portal handling EAC sessions.
- **Persistence, Privilege Escalation, Defense Evasion, Credential Access, Discovery, Lateral Movement, Collection, Exfiltration, Impact:** Not applicable in the context of a service configuration error.
## Impact Assessment
- **Financial:** Not disclosed. The service degradation affects customers who rely on the EAC for management tasks.
- **Data Breach:** Not reported. The incident was related to service accessibility, not data compromise.
- **Operational:** Significant disruption for IT administrators needing to manage Exchange environments.
- **Reputational:** Impacts trust in Microsoft's service stability, especially following recent unrelated Exchange outages.
## Indicators of Compromise
* **Network indicators (Defanged):** None specified as related to malicious external activity.
* **File indicators:** None specified.
* **Behavioral indicators:** Widespread "HTTP Error 500" response when attempting access via `admin.microsoft.com`.
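
As a rough illustration of how the behavioral indicator above could be turned into a check, the sketch below computes the share of 5xx responses in a batch of observed status codes and flags a widespread failure. The function names, the 50% threshold, and the minimum sample size are assumptions chosen for the example; the report does not specify any detection thresholds.

```python
from typing import Sequence


def server_error_rate(status_codes: Sequence[int]) -> float:
    """Fraction of responses in the batch that are 5xx server errors."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)


def is_widespread_failure(
    status_codes: Sequence[int],
    threshold: float = 0.5,   # assumed: half of all requests failing
    min_samples: int = 20,    # assumed: ignore very small batches
) -> bool:
    """Flag a batch when enough samples exist and the 5xx share is high."""
    if len(status_codes) < min_samples:
        return False
    return server_error_rate(status_codes) >= threshold
```

The minimum sample size guards against raising an alert on a handful of failed requests, which would be a poor signal for a global outage.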
## Response Actions
- **Containment measures:** Microsoft is actively investigating the root cause, which may stem from recent changes made to the service.
- **Eradication steps:** Focused on identifying the service change behind the errors and, if it is confirmed as the cause, reversing it.
- **Recovery actions:** Verifying the stability of the temporary workaround access method.
## Lessons Learned
- **Key takeaways:** Recent changes to the service are under review as a potential root cause, which highlights the risk that updates pose to critical administrative portals.
- **What could have been done better:** Improved pre-deployment validation or more rapid rollback capabilities for configuration changes that impact global administrative functions.
- **Historical Context:** This outage occurred shortly after other significant Exchange-related outages affecting mailbox and web access.
## Recommendations
- **Prevention measures for similar incidents:**
  1. Implement more rigorous canary testing and phased rollouts for changes affecting critical administrative portals (like the EAC).
  2. Ensure robust, immediate rollback procedures are in place for any configuration update that causes sudden spikes in HTTP 500 errors.
  3. Standardize and promote alternative, resilient administrative access paths (like the `admin.cloud.microsoft/exchange#/` URL, once verified) that remain accessible even when the primary portal endpoint fails.
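
The first two recommendations can be sketched together as a phased rollout that checks the error rate after each stage and rolls back as soon as it exceeds a limit. This is a minimal sketch, not a description of Microsoft's deployment pipeline: `phased_rollout`, its callback parameters, the stage fractions, and the 1% error limit are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def phased_rollout(
    stages: Sequence[float],
    apply_stage: Callable[[float], None],
    observe_error_rate: Callable[[float], float],
    rollback: Callable[[], None],
    max_error_rate: float = 0.01,  # assumed limit on the HTTP 5xx share
) -> Tuple[bool, List[float]]:
    """Apply a change stage by stage; roll back on the first unhealthy stage.

    Returns (succeeded, stages that completed with a healthy error rate).
    """
    completed: List[float] = []
    for fraction in stages:
        apply_stage(fraction)                # expose the change to this share
        rate = observe_error_rate(fraction)  # measure 5xx share at this stage
        if rate > max_error_rate:
            rollback()                       # revert before widening exposure
            return False, completed
        completed.append(fraction)
    return True, completed
```

With stages such as `[0.01, 0.1, 1.0]`, a change that starts returning HTTP 500 at the 10% stage would be reverted before it reaches every tenant, which limits the kind of global impact described in this report.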