The VoIP Addict’s Guide – VoIP Redundancy in the Cloud
Make no mistake, almost everything is becoming a cloud-based service. Still running Exchange? You’re living in the past, my friend. Phone systems are, of course, no different. While I’ll maintain there are huge advantages to running an on-prem system (mostly cost and low latency), there are a lot of conveniences to having your system in the cloud. For this specific post, when I say cloud, I am referring to platforms like Microsoft Azure and Amazon Web Services.
Let’s talk a little about the conveniences of a cloud-hosted phone system. First, it makes deploying remote phones a much easier process, mostly because every phone is now remote. It also allows anyone traveling abroad to bring their phone with them, and with Internet access, they can make calls from Singapore as if they were calling from Buffalo, New York (for example) with no international toll charges. Of course, you can always call extension to extension for zero cost. That’s a pretty amazing concept.
You might be thinking this can all be done with an on-prem system as well, and you’d be right, but why poke holes in your corporate firewall and subject yourself to the fun of NAT traversal if you don’t need to? You can also accomplish redundancy with an on-prem system, but because both systems sit in the same building, you will lack the flexibility of multi-region connectivity and redundancy, which is what the above-mentioned cloud services can provide.
Why is multi-region connectivity important? Well, if you’ve been reading the news lately, you’ve probably heard that Amazon dropped an entire region for a couple of hours causing mass panic, and the zombie apocalypse (not really). This is the risk you take in exchange for convenience when you place an application or service in the cloud, but when you distribute that application or service across multiple regions, you mitigate that risk significantly. Some businesses went down entirely because they stuck all of their eggs into one basket (region).
It should be known that regions in these cloud services are treated as completely siloed entities. Instances in one region cannot simply ping an instance in another region via local IP address, even if they are on the same Amazon or Azure account. For that, you need some sort of connector, like a VPN. Be aware, however, that this is accomplished differently depending on which service you are using.
Amazon Web Services, for example, does not have any built-in tools at this time to connect regions together. If you’re planning on deploying FreePBX in both Oregon and Virginia for redundancy, you’ll need to create a VPN between the two regions with your own virtual appliance so that the systems can exchange configurations. This should not be confused with Sangoma’s High Availability module for FreePBX, as that requires the two systems to be on the same subnet with very low latency between them.
Microsoft Azure DOES provide the ability to create a region-to-region VPN without using a 3rd party VPN concentrator, and in my experience, the more natively supported tools and services you use, the better things work overall. Truthfully, a VPN may not always be necessary; that depends on the specific phone system and how it prefers to communicate with its slave or warm spare. It generally isn’t a bad thing to have regardless.
Before I get more into the strategy of multi-region redundancy, I’d be remiss not to mention a second option, which is connecting either Microsoft Azure or Amazon Web Services to your local corporate network. Both services have native tools to create a VPN to your network, provided you have a compatible firewall on your side of the equation. In this scenario, you would have a system on your local network with a warm spare in the cloud, and the two can talk local IP to local IP. This option isn’t as flexible as moving all phone system communications to the cloud, but it would still provide redundancy in the event your on-prem system goes down while you still have a live Internet connection to your building. If your entire network takes a nose dive, you are SOL.
Strategy: I originally had the idea (when writing this post) of testing Wazo’s built-in high availability module, but I found that just installing the platform on Amazon was such a difficult and inconsistent process that I gave up. Back when it was called Xivo, I tested high availability and it worked great. It didn’t work as well as Sangoma’s High Availability module, but it did a decent enough job. The way that it works (or worked) is by moving the configuration from the master system to the slave via a secure tunnel, then synchronizing and shutting down Asterisk on the slave. The slave’s job is then to continuously ping the master and, in the event the master is unresponsive, start Asterisk and bring up the SIP trunks. The only thing you’d have to worry about is registering all of your phones to the slave PBX. That can be automated by using IP phones that support a secondary SIP server.
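To make the "ping the master, then take over" idea concrete, here is a minimal sketch of that kind of watchdog loop. This is not Wazo’s or Xivo’s actual code; the function name `watch_master`, the three-misses threshold, and the five-second interval are all my own assumptions for illustration. The `probe` and `promote` callables are placeholders for whatever health check and takeover steps (starting Asterisk, enabling trunks) your platform really uses.

```python
import time
from typing import Callable


def watch_master(
    probe: Callable[[], bool],
    promote: Callable[[], None],
    max_failures: int = 3,      # assumed threshold, tune to taste
    interval_s: float = 5.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll the master until it misses `max_failures` checks in a row, then promote.

    Returns the total number of probes sent before promotion.
    """
    failures = 0
    probes = 0
    while failures < max_failures:
        probes += 1
        if probe():
            failures = 0  # one good reply resets the losing streak
        else:
            failures += 1
        if failures < max_failures:
            sleep(interval_s)  # wait before the next check
    promote()  # hypothetical hook: start Asterisk, bring up SIP trunks
    return probes
```

Requiring several misses in a row, rather than reacting to a single lost ping, is a deliberate choice here: one dropped packet between regions shouldn’t be enough to trigger a failover.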
So, because Wazo was such a PITA, I decided to go with something more mature in the open source space for this post: FreePBX. FreePBX can be configured as a warm spare similarly to Wazo, but it isn’t as automated of a process. Take a look here to see what’s involved in the basic setup. You will STILL employ IP phones with a secondary SIP server (Sangoma’s phones do this, BTW). Like Wazo, the configuration is transferred to the warm spare in the opposite region via a secure tunnel, but the difference is in the synchronization. Wazo will instantly synchronize, but FreePBX will require a restore to be performed, which can be automated. You will also need to exclude the network settings from that restore so they aren’t changed on the warm spare. We aren’t exactly replacing the production system; we are just providing an alternate for phones to register to. The only intervention that should be required in the event of a failover is activating the SIP trunks (because you would have chosen to turn them off in the warm spare’s restore).
To summarize: When your production phone system has an issue and goes down, your IP phones will attempt to register to the secondary SIP server (via public IP address), which resides in another region (using either Azure or Amazon). To complete the failover, you will need to log into the warm spare, which has now become the production system, and enable the SIP trunks. Within a reasonably short period of time, calls in and out will flow as if nothing happened.
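From the phone’s point of view, the failover above boils down to "try the primary, and if that fails, try the secondary." Here is a rough sketch of that selection logic. Real IP phones do this in firmware, so this is only an illustration; the function name `pick_registrar` and the example hostnames are made up.

```python
from typing import Callable, Optional, Sequence


def pick_registrar(
    servers: Sequence[str],
    can_register: Callable[[str], bool],
) -> Optional[str]:
    """Return the first server, in priority order, that accepts a registration."""
    for server in servers:
        if can_register(server):
            return server
    return None  # nothing reachable: the phone stays unregistered


# Hypothetical hostnames: primary in one region, warm spare in another.
servers = ["pbx-primary.example.com", "pbx-spare.example.com"]
down = {"pbx-primary.example.com"}  # simulate the production system failing
active = pick_registrar(servers, lambda host: host not in down)
```

In this simulated outage, `active` ends up as the warm spare. Keep in mind that registering is only half the story: until you enable the SIP trunks on the spare, the phones can register and call each other, but outside calls won’t go through.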
While this all works, the primary challenge is the timing of the synchronization between systems, since it is not instantaneous. Logically, you’ll want to back up and restore to the warm spare nightly, but if a lot of changes are expected on a system daily, you may want to schedule that more frequently.
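If you want a rough way to reason about "how often is often enough," the back-of-the-napkin math looks something like this. It’s a sketch that assumes configuration changes are spread evenly through the day (they usually aren’t), and both function names are my own invention.

```python
def changes_at_risk(changes_per_day: float, sync_interval_hours: float) -> float:
    """Worst-case number of changes missing from the warm spare at failover."""
    return changes_per_day * sync_interval_hours / 24.0


def max_sync_interval_hours(changes_per_day: float, tolerable_lost_changes: float) -> float:
    """Longest sync interval (capped at nightly) that keeps losses tolerable."""
    if changes_per_day <= 0:
        return 24.0  # nothing changes, so nightly is plenty
    return min(24.0, 24.0 * tolerable_lost_changes / changes_per_day)
```

For example, a system that sees 48 changes a day and syncs nightly could be missing up to 48 of them on the warm spare; if you can only stomach redoing 12, the same arithmetic says to sync every 6 hours.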
If you plan on deploying your phone system to the cloud, and redundancy is going to be an important priority, well then, I hope I gave you something to think about. Stay tuned for my upcoming post on creating a quick and easy VPN between Amazon Web Services regions.