etcd "raft internal error" got you down?

So you're rolling along with CoreOS, and then you go and make your cluster rediscover itself after some configuration changes, and the cluster is broken, and no one's discovering anything, and etcd processes aren't starting, and you're ready to pull your hair out because you get something in your etcd logs that looks like:

fail on join request: Raft Internal Error 

I think I've got the fix -- clear out the caching that etcd has, and go ahead and feed the cluster a new discovery token / discovery URL.

Ugh! It had been killing me. I'd been running CoreOS with libvirt/qemu, and I had everything lined up to work, and it'd usually work well the first time I booted the cluster up, but... I'd go and reboot the cluster, and feed it a new discovery URL, and... bam... I'd start getting these left and right with etcd 0.4.6 & 0.4.9.

I'd keep running into this github issue, which shows that I'm not the first to have the problem, but, no solution? The best I could find in other posts was to upgrade to etcd2 (which you do by just running the service called etcd2 instead of just plain etcd). I gave it a try, but, I wasn't having much luck, and I was already deep down the path of figuring this out.

So, I slowed myself down, and I started building from scratch, slowly watching the machines come up, one by one. Then I tried powering down, reloading cloud configs, and watching them one by one... I noticed that the first machine to come up, on its own... with a brand new discovery token... would remember -- that it had 4 friends. What!? You're not supposed to know yet!

I don't claim to grok the internals, but what I did understand is that it was remembering something that, in my opinion, it shouldn't have. How might I clear its memory so it would work like the first time I started up the cluster?

With that in mind, I made myself a bit of an experiment, and I went and did a:

systemctl stop etcd
rm /var/lib/etcd/log
rm /var/lib/etcd/conf
shutdown now

...I personally put this in an ansible script, and use it in a playbook that I call "cluster rediscovery" -- which is pretty nice when you're, say, on a laptop and bringing the cluster up basically every time you open it up.
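The other half of the fix is feeding the cluster a fresh discovery token, and that part can be scripted too. Here's a hedged sketch of how I'd approach it -- the cloud-config path is an assumption and depends entirely on how you provision your machines:

# ask the public discovery service for a brand new token
NEW_TOKEN=$(curl -s 'https://discovery.etcd.io/new?size=3')

# swap it into the cloud-config's "discovery:" line before the next boot
# (the user_data path here is an assumption -- adjust for your provisioning setup)
sudo sed -i "s|^\( *discovery:\).*|\1 ${NEW_TOKEN}|" /var/lib/coreos-install/user_data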

High Availability Asterisk using Docker & CoreOS with etcd

After I created my Docker project for using Asterisk with Docker -- I couldn't leave well enough alone. I wanted to create an Asterisk eco-system with a number of features that a stand-alone Asterisk just couldn't do for me (even though it definitely provides the life-blood of my VoIP systems as it is!)

The things that I really want in an Asterisk setup are:

  • High-availability
  • Scalability
  • Visibility (of SIP traffic)

I've since created a set of tools, available in the same docker-asterisk git repo, to accomplish this. Take a look for yourself in the high-availability directory in your clone.

The advantage of this project is that it makes it really easy to set up this eco-system, with the potential for tens or hundreds of boxen. Choosing Docker lets us leverage the advantages of containerizing our applications, making them easy to test in development and deploy to production (with assurance that we have developed in environments congruent with production). Choosing CoreOS lets us run those Docker containers easily, and leverages CoreOS's etcd for service discovery -- allowing us to spin up more and more nodes in our cluster, and let them discover one another.

You could use this setup for a very manageable, scalable, high-availability Asterisk setup -- one that's easily deployable in the cloud (or the closet). One could also customize the setup to dynamically allocate or deallocate more Asterisk boxes during the busy hour -- if you're paying per cycle with a cloud provider, it's plausible that dynamically spinning boxes up and down could save you some considerable "skrilla" (that's money).

If you aren't familiar with the current wave/trend of containerization, and its advantages, check out this recent InformationWeek article on Linux containers.

With the setup provided here, any given box can go down, and calls will continue to flow into the platform. It does not have a replicated state machine between the Asterisk boxes themselves (though one may be able to achieve that with 1:1 hardware redundancy using the virtualization platform of their choice). Instead, the philosophy is to have a highly-available load balancer (which we use Kamailio for) that can dynamically balance load between N-number of Asterisk boxes, and to keep a low call volume on each Asterisk box so that in the case of a catastrophic failure of a given box, you lose the fewest calls. The load balancer is a place where we could have a single point of failure, but this is solved by having identical configurations on 2 (or more) load balancers and using a virtual IP to fail over between the two (or more) Kamailio load balancers.
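If you haven't worked with Keepalived before, the VIP piece is plain-vanilla VRRP. Here's a minimal sketch of what a keepalived.conf on the master Kamailio box might look like -- the interface name, router id, priorities, and the VIP itself are made-up values for illustration, not necessarily what the project ships:

vrrp_instance KAM_VIP {
    state MASTER            # the backup Kamailio box would use "state BACKUP"
    interface eth0          # assumed interface name
    virtual_router_id 51    # any id, as long as both boxes agree on it
    priority 150            # the backup box gets a lower priority, e.g. 100
    advert_int 1
    virtual_ipaddress {
        192.168.1.200       # the shared VIP the ITSP points at
    }
}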

It also considers the regular maintenance you might do, and will gracefully drain calls from your Asterisk boxes if you issue, for example, core shutdown gracefully at the Asterisk CLI. There are also other ways to put your Asterisk boxes into a maintenance mode, which we will cover down the road.

This article covers the methodology and philosophy behind the choices used in the docker-asterisk project. I will later follow this article with a deep-dive and tutorial into the components behind it and show you just how easy it is to stand-up a scalable HA cluster of Asterisk boxes.

The core application technologies I've chosen for telephony to accomplish this are:

  • Asterisk (naturally! For a media server, IVR, PBX or feature server)
  • Kamailio (for load balancing Asterisk)
  • Homer & captagent (for visibility of SIP traffic)

The general gist is to deploy containerized editions of each of the core applications under the technology stack that follows; they're configured in a specific way in order to interoperate.

As for the platform technology stack, I wound up choosing CoreOS as the base operating system -- it's great at running Docker containers, and it comes with etcd, a great tool for service discovery (which helps our scalability features quite a bit).

This is a good time to show you an overview of the platform stack:

Platform Stack

We can see that the host runs CoreOS as the base operating system; etcd is included with the CoreOS installation, and will dynamically share service discovery information with the whole CoreOS cluster. CoreOS will be running Docker proper, and on top of that layer we run each of our Docker containers.

Most of the names of the Docker containers should be familiar to you, such as Asterisk, Kamailio & Homer (Homer uses a LAMP stack, hence the multitude of containers [the four on the right]).

One that is likely new to you is the kamailio-etcd-dispatcher container. kamailio-etcd-dispatcher is a custom open-source application that I created, which allows you to spin up more and more Asterisk boxes and have the service discovered by our Kamailio load balancer. In one role it announces its presence, and in the companion role it updates (and applies) a new dispatcher.list for Kamailio. (By default it evenly load balances each box; however, it supports setting metadata that allows you to set a custom distribution if it suits your needs.) This application is written in Node.js -- but don't sweat it if you're not running Node.js on your platform currently -- since it's containerized, you really can just "let go" of that configuration and let it happen in the container.
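Conceptually, the announce/watch pattern is the same thing you could poke at by hand with etcdctl. The key names below are made up purely for illustration -- not necessarily the ones kamailio-etcd-dispatcher actually uses:

# on each Asterisk box: announce yourself, with a TTL so a dead box eventually falls out
etcdctl set /asterisk/boxes/10.0.0.11 10.0.0.11 --ttl 60

# on the load balancer side: watch for membership changes, then regenerate and apply dispatcher.list
etcdctl watch /asterisk/boxes --recursive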

Let's take a look at the networking:

Networking

Let's read it left to right, and you'll see that we have an ITSP who's talking to a virtual IP. This virtual IP (VIP) is handled by Keepalived, and is virtually shared between two Kamailio boxes. A SIP message will be transmitted to this VIP -- and whichever Kamailio box is currently master will accept that SIP message. This machine can go down at any point -- Kamailio is stateless, so any message at any point in the SIP conversation can come through, and since SIP is (typically) UDP, the protocol already accounts for the fact that a message could be re-transmitted; so even if the Kamailio machine goes down mid-message, the call will still operate.

Each Kamailio box uses the Kamailio dispatcher module and is loaded with the list of Asterisk boxes to dispatch the call to -- along with a configuration that knows how to detect congestion (so if you're taking a box out of the load-balanced rotation for maintenance, it won't send new calls to it). It then sends the call to the appropriate box.
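For reference, the dispatcher.list that the dispatcher module reads is just a flat file mapping a set id to SIP destinations -- something along these lines, with placeholder addresses:

# setid destination
1 sip:10.0.0.11:5060
1 sip:10.0.0.12:5060
1 sip:10.0.0.13:5060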

You'll note on the right, four Asterisk boxes are depicted, the final box being termed "Asterisk@n" -- you can add as many of these Asterisk boxes as you like, and when you add more to the cluster -- using the kamailio-etcd-dispatcher -- they're automatically detected and load balanced to, without any further configuration. They're automatically in production and taking calls.

One thing that isn't considered in this project is NAT. All the examples are on the same /24 subnet. I figure that almost everyone's NATting concerns are probably different from my own. You'll just need to set up the routing that's most appropriate to your network environment, and set up your Asterisk boxes accordingly. This takes out a lot of what could be user-specific configuration and makes for easier testing, giving you the liberty to configure your NAT setup specific to your environment (or letting you avoid NAT entirely, if your environment permits).

As for RTP, each box is aware of its own media address, and media is not proxied, but is routed directly to the Asterisk machine handling the call. You may wish to customize this if you have other needs in your environment.

Asterisk fanboy alert: I haven't yet accounted for this setup using IAX2, but I know it's possible, and I always prefer it to SIP. If you have a use-case where you can go purely IAX2, I don't want to postpone your joy!

Finally, but possibly most importantly, let's take a look at how the containers and machines inter-operate:

Deployment scheme

There's a lot of detail here, and a lot of connecting pieces. However -- it doesn't require a lot of further configuration (except for the good stuff! Like your dialplans, custom AGI applications, and ancillary applications).

I'll let the diagram speak for itself, and we'll get into the details and the guts of it in upcoming articles.

In the next article, we'll get into the nuts and bolts and demonstrate just how easy it is to spin up a new cluster, and we'll demonstrate some of the cool features of this cluster configuration. After we show how it's set up, I'll follow up with another article that breaks down the components and shows you how to customize the good stuff to suit your production needs.

Thanks for taking the time to read through!

Introducing Bowline.io

I'm introducing my new tool Bowline.io -- a tool to tie your Docker images to a build process. It has a bunch of features that I'm rather proud of:

  • It automatically builds images based on git hooks (or polling over HTTP)
  • It logs the results of those builds, so you can see the history.
  • It'll allow you to build unit tests into your Dockerfiles, so you know whether a build passes
  • It's a front-end to a Docker registry, to give you a graphical view of it
  • It's an authenticated Docker registry, where users can manage their creds online.
  • It keeps some metadata about your "knot", like a README.md, and it syntax-highlights your Dockerfile
  • It's compatible with docker at the CLI, like docker login and therefore docker push and docker pull

And what it really boils down to is that it's kind of a Dockerhub clone that you can run on your own hardware -- sort of how you might use Dockerhub Enterprise, but it's open source. You can just spin it up and use it to your heart's content. Check out the documentation on how to run it locally for yourself -- you won't be surprised to find out that it runs in Docker itself.
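Since it behaves like a registry at the CLI, the day-to-day workflow looks just like Docker Hub -- something along these lines, where bowline.example.com is a placeholder for wherever you happen to host it:

docker login bowline.example.com
docker tag myapp bowline.example.com/myuser/myapp
docker push bowline.example.com/myuser/myapp
docker pull bowline.example.com/myuser/myapp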

It's what became of my "asterisk autobuilder for docker" that I wrote about in a previous post. That tool was working out rather well for me, and when I went to extend what was previously an IRC bot to handle multiple builds, I realized that I could extend it into a web app with some ease (it's a NodeJS application, and it was already starting to expose an API).

I've got even more plans for it in the future, some of which are:

  • Continuous deployment features (make an action after your build finishes)
  • More explicit unit testing functionality
  • The capability to test an "ecosystem" of Docker containers together (like a webapp, a db & a web server for example)
  • Ability to create geard & fleet unit files for deployment
  • A distributed build server architecture so you can run it on many boxen (it's been planned in the code, but, not yet implemented)

Feel free to give it a try @ Bowline.io -- I'm looking forward to having more guinea pigs check it out, use the hosted version @ Bowline.io, or run it on their own boxen -- or better yet, contribute a PR or two against it.

Asterisk Autobuilder for Docker

I've gone ahead and expanded upon my Asterisk Docker image, to make a system that automatically builds a new image for it when it finds a new tarball available.

Here are the key features I was looking for:

  • Build the Asterisk docker image and make it available shortly after a release
  • Monitor the progress of the build process
  • Update the Asterisk-Docker git repo

To address the second bullet point, I made a REPL interface that's accessible via IRC -- and like any well-behaved IRC netizen, it posts logs to a pastebin.

Speaking of which! In the process, I made an NPM module for pasteall. If you don't know pasteall.org -- it's the best pastebin you'll ever use.

You can visit the bot in ##asterisk-autobuilder on freenode.net.

As for the last bullet point, when it finds a new tarball it dutifully updates the asterisk-docker github repo, and makes a pull request. Check out the first successful one here. You'll note that it keeps a link to the pasteall.org logs, so you can see the results of the build -- in all their gory detail, every step of the docker build.

I have bigger plans for this, but, some of the shorter-term ones are:

  • Allow multiple branches / multiple builds of Asterisk (Hopefully before Asterisk 13!!)

Docker and Asterisk

Let's get straight to the goods, then we'll examine my methodology.

You can clone or fork my docker-asterisk project on GitHub. And/or you can pull the image from dockerhub.

Which is as simple as running:

docker pull dougbtv/asterisk

Let's inspect the important files in the clone:

.
|-- Dockerfile
|-- extensions.conf
|-- iax.conf
|-- modules.conf
`-- tools
    |-- asterisk-cli.sh
    |-- clean.sh
    `-- run.sh

In the root dir:

  • Dockerfile -- what makes the dockerhub image dougbtv/asterisk
  • extensions.conf -- a very simple dialplan
  • iax.conf -- a sample iax.conf which sets up an IAX2 client (for testing, really)
  • modules.conf -- currently unused, but an example for overriding the modules.conf from the sample files.

In the tools/ dir are some utilities I find myself using over and over:

  • asterisk-cli.sh -- runs the nsenter command (note: the image name must contain "asterisk" for it to detect it; easy enough to modify to fit your needs)
  • clean.sh -- kills all containers, and removes them.
  • run.sh -- a suggested way to run the Docker container.

That's about it, for now!


There are a couple of key steps to getting Asterisk and Docker playing together nicely, and I have a few requirements:

  • I need to access the Asterisk CLI
  • I also need to allow wide ranges of UDP ports.

On the first bullet point, we'll get around this by using nsenter, which requires root or sudo privileges, but will let you connect to the CLI, which is what I'm after. I was inspired to use this solution by this article on the Docker blog. And I got my method of running it from coderwall.
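The gist of the nsenter approach (roughly what tools/asterisk-cli.sh automates for you -- check the script itself for the exact invocation) is to look up the container's PID and then enter its namespaces, something like:

# find the PID of the running Asterisk container (the container name here is illustrative)
PID=$(docker inspect --format '{{ .State.Pid }}' asterisk)

# enter its namespaces and attach to the Asterisk CLI
sudo nsenter --target $PID --mount --uts --ipc --net --pid -- asterisk -rvvv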

On the second point... Docker doesn't seem to like UDP (which is what the VoIP world runs on) -- at least in my tests trying to get some IAX2 VoIP over it, on port 4569 (it's an easier test to mock up than SIP!). (Correction: actually, Docker is fine with UDP, you just have to let it know when you run a docker container, e.g. docker run -p 4569:4569/udp -t user/tag)

So, I settled on opening up the network to Docker using the --net host parameter on a docker run.
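So a typical run ends up looking something like this -- a hedged sketch, not necessarily exactly what tools/run.sh in the repo does:

# run the image with host networking, so SIP/IAX2 and wide RTP port ranges just work
docker run -d --net host --name asterisk dougbtv/asterisk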

At first, I tried out bridged networking. And maybe not all is lost. Here are the basics on bridging @ redhat that I followed. Make sure you have the bridge-utils package: yum install -y bridge-utils. But I didn't get any mileage with it. Somehow I set it up, and it borked my Docker containers from even getting on the net. I should maybe read the Docker advanced networking docs in more detail. Aaaargh.

Some things I have yet to do are:

  • Setup a secondary container for running FastAGI with xinetd.

I'm thinking I'll run xinetd in its own container and connect the Asterisk container with the xinetd container for running FastAGI.
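Purely as a sketch of where I'm headed -- the image name below is hypothetical, and 4573 is just the conventional FastAGI port:

# a hypothetical xinetd/FastAGI container, publishing the conventional FastAGI port
docker run -d --name fastagi -p 4573:4573 dougbtv/xinetd-fastagi

# the Asterisk dialplan could then reach it with something like:
# exten => 100,1,AGI(agi://127.0.0.1:4573/myscript.agi)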