Wednesday, July 8, 2009

How to Survive Authorize.Net Outages

I was inspired to write this by the recent, heavily reported Authorize.net outage. Credit card processing is one aspect of web site development that you have no choice but to outsource (unless you happen to be a bank). There are several processors out there, and Authorize.net is one of the biggest. No matter how robust the system is, there are bound to be periodic problems. Over nine years of business, and multiple processors, we saw many of them. Some outages were over in minutes; some lasted hours. Your customers, ready to make a purchase, don't want to hear that your credit card processor is having problems. Fortunately, it is possible to code your website to stay in business during an outage.

At RegionalHelpWanted.com (RHW), we were doing tens of thousands of dollars a day in credit card sales of help wanted advertising, none of which we wanted to lose during a processor outage. In the US, our credit card processor was Authorize.net. In Canada, it was Verisign. (We used separate services because we ran the businesses as separate companies.)

On Cupid.com, the problem was similar: we were selling online dating subscriptions; I think our processor was iPay.

In both cases, we were selling electronic services, but the method we used to survive a payment gateway outage with minimum business interruption is just as applicable to online businesses that ship tangible goods.

On your website, your customer should not receive any kind of error message saying that there is a problem with the payment system. The customer does not care about your problems, even if you are not to blame.

When Authorize.net is down, typically your web application will time out when trying to connect to them to make a sale. In a minority of cases, the connection will not time out but will give an invalid response, one that does not fit the normal specification. I don't remember any cases where a processor outage occurred and the processor sent back a valid response with a valid error code like "We're down! Try again later!" In all of these cases, your web application should behave the same way:

- Capture the order information, and store it for later processing.
- Give the customer a success message.
- Use an asynchronous process to retry these transactions until they are completed.
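
The three steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the code we ran. The `charge` callable, the `GatewayTimeout` exception, the response codes, and the message wording are all hypothetical stand-ins for whatever your gateway client provides, and the order itself is assumed to be saved in your database already.

```python
from collections import deque


class GatewayTimeout(Exception):
    """Raised by the (hypothetical) gateway client when the connection times out."""


# Illustrative customer-facing messages; the wording is an assumption.
CONFIRMED = "Thanks! Your payment was accepted."
RECEIVED = ("Thanks! We have your order. Processing usually takes only a "
            "moment, and we'll email you when it's done.")
DECLINED = "That card was declined. Please try a different card."


def submit_order(order, charge, pending):
    """Try the gateway once; on an outage, queue the order and report success.

    order   -- dict with at least an "id" key (already stored in the database)
    charge  -- callable standing in for the gateway call; returns a dict
    pending -- queue of order IDs for the asynchronous retry process
    """
    try:
        response = charge(order)
    except GatewayTimeout:
        response = None  # the most common outage symptom

    code = response.get("code") if isinstance(response, dict) else None
    if code == "approved":
        return CONFIRMED
    if code == "declined":
        return DECLINED  # a valid answer from a healthy gateway

    # A timeout, or a response that does not fit the spec: treat both as an
    # outage, queue the order ID for retry, and never show a gateway error.
    pending.append(order["id"])
    return RECEIVED
```

The point of the design is that a timeout and a malformed response take exactly the same path, and the customer sees a success message either way.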

Capture the order information
On RHW, we let users opt in to storing their credit card information to speed future orders, so we had already done the work necessary to do this safely and securely. On Cupid.com, the whole business was built on recurring billing, so same deal there. Your Terms of Use must allow you to always retain order information for enough time to process that order, even if the user does not opt in to longer information retention. While you're at it, log any response that you did get from the payment gateway, scrubbed of any security-sensitive info like the credit card number, as it might help with forensics later.
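
As one way to scrub a gateway response before logging it, the sketch below masks anything that looks like a card number. It is only an illustration: the `scrub` function and its pattern are assumptions, keeping the last four digits is a common convention rather than something we specifically did, and a real system would also need to strip other sensitive fields such as security codes.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
# This pattern is an assumption; tune it to the formats your gateway echoes back.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def scrub(text):
    """Replace anything that looks like a card number, keeping the last 4 digits."""
    def mask(match):
        digits = re.sub(r"\D", "", match.group())
        return "*" * (len(digits) - 4) + digits[-4:]
    return CARD_RE.sub(mask, text)
```

Run every raw response through a function like this before it reaches a log file, so that the logs stay useful for forensics without becoming a liability.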

Give the customer a success message
Error messages are the opposite of user friendly. Tell the customer that you have the order information, that it typically only takes a moment to process, and that they will receive an email when it's done. And then let them be on their way.

On Cupid.com, we'd let the user start using the site with all the privileges of a paying member. If they sent any messages to other members, those messages would be queued until their order was successful.

On RHW, we would post the help wanted advertising. If it later turned out that the credit card was rejected, the ad would be removed. But this also could have been set up so that the ad was not posted until the credit card was accepted.

Process the transactions asynchronously
Queuing is a great way to handle any work that the user should not have to wait around for. You could use it just when there is a connection timeout or for every single transaction.

Cupid.com was a .NET application, so it made sense to use Microsoft Message Queuing (MSMQ) for this functionality. The web application writes a message containing the order ID. A queue runner reads the queue, using the order ID to select the order and payment info from the database. It attempts to process the order. If Auth.net times out or returns an unexpected response, the message is sent to the back of the queue. It is trivial to build a time delay into the queue runner; otherwise you may find yourself making hundreds of attempts per minute for the same transaction. And it's not nice to kick your credit card processor THAT often when they are down.
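
To make the queue runner logic concrete, here is a minimal sketch in Python. It uses an in-memory deque in place of MSMQ, and the `attempt` callable, the `RETRY_DELAY` value, and the message layout (order ID plus an earliest retry time) are all assumptions for illustration, not what our runner actually looked like.

```python
from collections import deque

RETRY_DELAY = 60  # seconds between attempts for the same order (assumed value)


def run_once(queue, attempt, now):
    """Make one pass over the queue, visiting each waiting message once.

    queue   -- deque of (order_id, not_before) tuples
    attempt -- callable(order_id) returning "approved", "declined",
               or None when the gateway timed out or sent garbage
    now     -- current time in seconds
    Returns a dict of order_id -> final outcome for orders that completed.
    """
    finished = {}
    for _ in range(len(queue)):
        order_id, not_before = queue.popleft()
        if now < not_before:
            # Too soon to retry: put it back without touching the gateway.
            queue.append((order_id, not_before))
            continue
        outcome = attempt(order_id)
        if outcome in ("approved", "declined"):
            finished[order_id] = outcome
        else:
            # Outage: send the message to the back of the queue with a delay.
            queue.append((order_id, now + RETRY_DELAY))
    return finished
```

A real runner would call something like this in a loop, passing the current time on each pass and sleeping briefly between passes. The delay is what keeps a long outage from turning into hundreds of attempts per minute.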

RHW was a ColdFusion application, and now it is very easy to do queuing in ColdFusion too. Our need predated this functionality, so we used MSMQ. We wrote the queue runner in Visual Basic, and the connection to Authorize.net was done as a CF web service.

Once the payment is either accepted or rejected, do make sure you follow up with the customer. On RHW, if a card was declined after waiting in the queue because of a gateway outage, we would call the customer and try to arrange alternate payment before we removed their posting.

The queue runner also needs to delete the payment info unless the customer authorized retaining it.
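
That cleanup step might look something like the sketch below. The field names (`card`, `retain_card`, `status`) are hypothetical; the idea is simply that the runner records the outcome and then drops the stored payment details unless the customer opted in to keeping them.

```python
def finalize(order, outcome):
    """Record the final outcome and drop card data the customer did not opt to keep."""
    order["status"] = outcome
    if not order.get("retain_card"):
        # No opt-in: the payment info was only held long enough to process the order.
        order.pop("card", None)
    return order
```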

That's it. Once we had that coding in place, Authorize.net outages (or Verisign or iPay) were non-events to the development team. There was some alerting built in so that we'd know if the queue was building up, otherwise we might not even know that Auth.net was down. That is, until the accounting department came over and complained that they couldn't log into the payment gateway.
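
The alerting can be very simple. As a rough sketch, a periodic check like the one below would be enough to notice a backlog. The thresholds are made-up numbers, and checking the age of the oldest message is an extra assumption on my part; watching the queue depth alone is the basic idea.

```python
def should_alert(depth, oldest_age_seconds, max_depth=25, max_age_seconds=600):
    """Return True when the retry queue looks like it is backing up.

    depth              -- number of orders currently waiting in the queue
    oldest_age_seconds -- how long the oldest order has been waiting
    """
    return depth > max_depth or oldest_age_seconds > max_age_seconds
```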

1 comment:

Vinny said...

Very interesting, Steve. It was smooth sailing while we used the MessageQueue for RHW credit card processing. It was an excellent idea. -Vinny