Even during an outage, the CX doesn’t have to suffer. In my current project, several key AWS services, as well as the CXcoach product, must be running for the application to function. I have architected both full systems and individual application features so that they do not fail because of transient outages (or worse) and so that they give users instant feedback. This includes not losing any data that has yet to be persisted.
I architected and implemented a serverless solution that checks the state of key application services and other endpoints (e.g. the app health check) at regular intervals. If an outage is detected, the ALB is switched to a Lambda-backed LB that keeps users informed until the affected service(s) are back online. This is all hooked into Slack and OpsGenie.
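The decision step of such a health checker could look roughly like the sketch below. It is not the production code: the names `probe`, `evaluate`, and `Decision`, the endpoint URLs, and the rule that any single failed endpoint triggers failover are all assumptions made for illustration. The actual ALB switch and the Slack and OpsGenie notifications are left to the caller.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Decision:
    """Outcome of one scheduled health-check run (hypothetical type)."""
    healthy: bool
    failed: Tuple[str, ...]
    action: str  # "failover", "restore", or "none"


def probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection errors, timeouts, non-2xx responses and malformed
        # URLs all count as "unhealthy" in this sketch.
        return False


def evaluate(
    endpoints: Dict[str, str],
    in_failover: bool,
    check: Callable[[str], bool] = probe,
) -> Decision:
    """Check every endpoint and decide what the caller should do.

    Assumption: one failing endpoint is enough to fail over, and the
    caller tracks whether traffic is currently on the fallback.
    """
    failed = tuple(sorted(n for n, url in endpoints.items() if not check(url)))
    healthy = not failed
    if not healthy and not in_failover:
        action = "failover"   # switch traffic to the status page
    elif healthy and in_failover:
        action = "restore"    # switch traffic back to the application
    else:
        action = "none"       # state already matches reality
    return Decision(healthy=healthy, failed=failed, action=action)
```

A scheduled Lambda handler would call `evaluate` with the real endpoint map, act on `action`, and include `failed` in the alert so responders can see which dependency is down. Passing `check` as a parameter keeps the decision logic testable without network access.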