An outage lasting more than an hour knocked out a host of Microsoft cloud services Thursday afternoon, as network connectivity errors in Microsoft Azure also brought down third-party apps and sites running on Microsoft’s cloud.
The outage began around 1:20pm and appeared to span the breadth and depth of Microsoft’s cloud services, including Office 365, Microsoft Teams, Xbox Live, and several others used by Microsoft’s commercial customers. Services began to recover around 2:30pm, although Microsoft had yet to sound the all-clear as of this writing and warned that it could take some time to get everyone back up and running.
Microsoft updated its status page with an assessment of what went wrong, noting that the worst is over:
Engineers have identified the underlying root cause as an incorrect name server delegation issue affecting DNS resolution, network connectivity, and downstream impact to Compute, Storage, App Service, AAD, and SQL Database resources. Mitigation has been applied, and the majority of Azure and other Microsoft services have recovered. We are in the process of final validation to ensure full recovery.
Microsoft later provided a little more detail on what happened:
During the migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated. No customer DNS records were impacted during this incident, and the availability of Azure DNS remained at 100% throughout the incident. The problem impacted only records for Microsoft services.
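For readers unfamiliar with the term, a name server delegation problem generally means the name servers listed for a domain in its parent zone do not match the ones that actually serve the domain, so lookups are sent to the wrong place. The Python sketch below illustrates that general idea only; it is not Microsoft's tooling, and the function names and the `example.com` name servers are hypothetical. It compares two lists of name server hostnames that are assumed to have been collected already, one from the parent zone and one from the zone itself.

```python
def normalize(name: str) -> str:
    """Lowercase a hostname and drop any trailing root dot so names compare equal."""
    return name.strip().rstrip(".").lower()


def delegation_mismatch(parent_ns, zone_ns):
    """Return name servers that appear on only one side of the delegation.

    parent_ns: NS hostnames the parent zone hands out for the domain.
    zone_ns:   NS hostnames the domain's own zone claims to use.
    """
    parent = {normalize(n) for n in parent_ns}
    zone = {normalize(n) for n in zone_ns}
    return {
        "only_in_parent": sorted(parent - zone),
        "only_in_zone": sorted(zone - parent),
    }


def is_consistent(parent_ns, zone_ns) -> bool:
    """True when both sides list exactly the same set of name servers."""
    diff = delegation_mismatch(parent_ns, zone_ns)
    return not diff["only_in_parent"] and not diff["only_in_zone"]


# Hypothetical data: the parent still points at a legacy server after a migration.
parent_side = ["ns1.legacy.example.com.", "NS2.example.com."]
zone_side = ["ns1.example.com.", "ns2.example.com."]

report = delegation_mismatch(parent_side, zone_side)
print(report)
print("consistent:", is_consistent(parent_side, zone_side))
```

In this made-up case the check would flag `ns1.legacy.example.com` as a stale entry on the parent side, the kind of discrepancy a migration between DNS systems can leave behind.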