Amazon Outage: Amazon’s datacentre in UAE reports fire; company says: At around 4:30 AM PST, one of our Availability Zones was impacted by … | The Times of India



Amazon’s cloud unit shut down temporarily after ‘objects hit’ a UAE data centre facility. Amazon Web Services (AWS) confirmed that at approximately 4:30 AM PST on March 1, “objects struck” the facility in Availability Zone mec1-az2, creating sparks and igniting a fire. The UAE fire department cut power to the building, and the zone went dark. In its statement, AWS said that restoring connectivity in the affected zone will take several hours and that other zones in the UAE are operating normally.

The AWS Health Dashboard currently shows services at the data centre as ‘Disrupted’. It lists the following AWS services as affected by the issue: Amazon Elastic Compute Cloud and Amazon Relational Database Service.

The UAE is reeling from Iran’s missile and drone strikes, which followed US and Israeli strikes on Iran. The Iranian strikes reportedly hit airports, ports, and residential areas across the country and the wider Gulf. When news agency Reuters asked AWS whether the incident at the data centre was connected to the strikes, the company did not confirm or deny it.

AWS Health Dashboard Status update

The AWS Health Dashboard for the UAE shows ‘Increased Error Rates’. It also shows multiple services impacted. Here are the updates on the dashboard, newest first:

Mar 01 6:01 PM PST: We confirm the recovery of the AssociateAddress API requests. We have also applied a change that enables customers to disassociate Elastic IP addresses from resources that are impacted by the underlying power issue. With these mitigations, customers can now successfully create and associate new network addresses in the unaffected AZs as well as re-associate Elastic IPs from resources in the affected zone to resources in the unaffected zones. We still do not have an ETA for power restoration at this time. For customers that can, we recommend using alternate Availability Zones or other AWS Regions where applicable. We will provide another update by 10:00 PM, or sooner if we have additional information to share.

Mar 01 4:26 PM PST: We are seeing significant signs of recovery for AssociateAddress requests, and continue to work toward fully mitigating this issue. This combined with the earlier recovery of the AllocateAddress API means customers can now successfully create and associate new network addresses in the unaffected AZs. Other AWS Services are also now observing sustained improvement as a result of the EC2 Networking APIs recovery. We are now focusing on implementing a change that will allow customers to disassociate Elastic IP addresses from resources that are impacted by the underlying power issue. We expect this specific mitigation to take another hour to complete. We do not have an ETA for power restoration at this time. For customers that can, we recommend using alternate Availability Zones or other AWS Regions where applicable. We will provide another update by 6:30 PM, or sooner if we have additional information to share.

Mar 01 2:28 PM PST: We are seeing positive signs of recovery for many of the EC2 APIs, such as Describes and AllocateAddress. We recognize that customers are still experiencing errors when attempting to call the AssociateAddress API, and are unable to disassociate addresses from resources that are affected by the underlying power issue. We continue to work on multiple parallel paths to mitigate both of these issues. We recommend continuing to retry requests wherever possible. We expect our current mitigation efforts for these specific issues to complete within the next two to three hours. As we progress with these mitigation efforts, customers will observe higher success rates for these operations. Additionally, we are investigating ways to speed up these specific mitigation efforts, but are ensuring we do so safely. As of this time, power restoration is still several hours away. We will provide another update by 5:30 PM PST, or sooner if we have additional information to share.

Mar 01 12:14 PM PST: We are aware that some customers are experiencing errors when calling EC2 APIs, specifically networking-related APIs (AllocateAddress, AssociateAddress, DescribeRouteTable, DescribeNetworkInterfaces). We are actively working on multiple paths to mitigate these issues. For customers experiencing throttling errors on the AllocateAddress APIs, we recommend retrying any failed API requests. We are deploying a configuration change to mitigate the AssociateAddress API errors and expect recovery in the next few hours. DescribeRouteTable and DescribeNetworkInterfaces API calls without specifying zone, Interface or Instance IDs are expected to fail until we restore the impacted zone. We recommend that customers pass these IDs explicitly in these API requests. For customers that can, we recommend considering using alternate AWS Regions. We will provide another update by 3:30 PM PST, or sooner if we have more to share.

Mar 01 9:41 AM PST: We want to provide some additional information on the power issue in a single Availability Zone in the ME-CENTRAL-1 Region. At around 4:30 AM PST, one of our Availability Zones (mec1-az2) was impacted by objects that struck the data center, creating sparks and fire. The fire department shut off power to the facility and generators as they worked to put out the fire. We are still awaiting permission to turn the power back on, and once we have, we will ensure we restore power and connectivity safely. It will take several hours to restore connectivity to the impacted AZ. The other AZs in the region are functioning normally. Customers who were running their applications redundantly across the AZs are not impacted by this event. EC2 Instance launches will continue to be impaired in the impacted AZ. We recommend that customers continue to retry any failed API requests. If immediate recovery of an affected resource (EC2 Instance, EBS Volume, RDS DB Instance, etc.) is required, we recommend restoring from your most recent backup, by launching replacement resources in one of the unaffected zones, or an alternate AWS Region. We will provide an update by 12:30 PM PST, or sooner if we have additional information to share.

Mar 01 8:59 AM PST: We continue to work toward restoring power in the affected Availability Zone in the ME-CENTRAL-1 Region (mec1-az2). In parallel, we are actively working on improving error rates and latencies that some customers are observing for EC2 Networking and EC2 Describe APIs. Due to increased demand in the unaffected Availability Zones, customers may experience longer than usual provisioning times or may need to retry requests for certain instance types, or pick an alternative instance type. We will provide an update by 10:30 AM PST, or sooner if we have additional information to share.

Mar 01 7:09 AM PST: We wanted to provide some additional information on the isolated power issue. At this time, most AWS Services have weighted away from the affected Availability Zone (mec1-az2) and are seeing recovery for their affected operations and workflows. For EC2 Instances, EBS Volumes, and other resources that are impacted in the affected Zone, we will have a longer tail of recovery. At this time, power has not yet been restored to the affected AZ. For now, we recommend continuing to retry any failed API requests. If immediate recovery is required, we recommend customers restore from EBS Snapshots and/or replace affected resources by launching replacement resources in one of the unaffected zones, or an alternate region. As of this time, recovery is still several hours away. We will provide an update by 8:30 AM PST, or sooner if we have additional information to share.

Mar 01 6:09 AM PST: We can confirm that a localized power issue has affected a single Availability Zone in the ME-CENTRAL-1 Region (mec1-az2). EC2 Instances, DB Instances, EBS Volumes, and other resources are currently unavailable and will experience connectivity issues at this time. Other AWS Services are also experiencing error rates and latencies for some workflows. We have weighted away traffic for most services at this time. We recommend customers utilize one of the other Availability Zones in the ME-CENTRAL-1 Region at this time, as existing instances in other AZs remain unaffected by this issue. We are actively working to restore power and connectivity, at which time we will begin to work to recover affected resources. As of this time, we expect recovery is multiple hours away. We will provide an update by 7:15 AM PST, or sooner if we have additional information to share.

Mar 01 5:19 AM PST: We are investigating connectivity and power issues affecting APIs and instances in a single Availability Zone (mec1-az2) in the ME-CENTRAL-1 Region due to a localized power issue. Existing instances in this zone will also be affected. Other AWS Services may also be experiencing increased errors and latencies for their workflows, and we are working to route requests away from this affected Availability Zone. We recommend customers make use of other Availability Zones at this time. Targeting new launches using RunInstances in the remaining AZs should succeed. Existing instances in the other AZs are not affected.
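
Several of the updates above advise customers to retry failed or throttled API requests. As a rough illustration of what that advice can look like in client code, here is a minimal Python sketch of retrying with exponential backoff and jitter. It is not taken from AWS's statement: `ThrottledError` and `retry_with_backoff` are hypothetical names used only for this example, and the delay values are arbitrary assumptions. AWS SDKs ship their own retry handling, so this only shows the general idea.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling or transient API error."""


def retry_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Run call() until it succeeds or max_attempts is reached.

    Between attempts, wait a random time between 0 and an exponentially
    growing cap (base_delay, 2x, 4x, ...), never above max_delay.
    All default values here are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter spreads out retries from many clients.
            sleep(random.uniform(0, cap))
```

A caller would wrap a single request in a zero-argument function and pass it as `call`. The jitter matters during an incident like this one, when many customers retry at once and synchronized retries could add to the load on the recovering APIs.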
