Sunday, May 12, 2013

The Home Alone base station is built with Unix principles in mind, using LuaJIT!

Home Alone leverages Unix design principles.  That is, it embraces Unix (in the current case, the flavor is Linux) rather than ignoring it.

Home Alone follows the principle that if your Linux system can run for months without a hiccup, then that is a good model for reliability. Not a great model, but a good one.  For instance, there are a lot of crazy things that go on in an Ubuntu 12.04 server (which is used for both the base station and cloud instances of Home Alone). Take a look at syslog sometime.  But it works, for months.

Home Alone uses Upstart as a process monitor daemon. There are several Home Alone processes running on the base station: X10 (CM19A) USB interface, temperature control USB interface, a process to filter messages, one to log messages to disk (for transactional delivery to the cloud), and one to push messages to the cloud.

The base station processes communicate with each other through POSIX message queues. Look them up: they're in Linux, and they offer a loosely coupled way for multiple processes to talk to each other.

All Home Alone processes generate rolling log messages (courtesy of Upstart).  The ext4 filesystem is my database for message storage (why not? A good journaling filesystem can be an excellent poor man's index store -- directories and files!).

The Home Alone processes don't depend on any "third party" middleware.  Each process is written in LuaJIT (100% Lua, no C) and links directly to system libraries. Any memory leaks or crashes are *mine*. (So far: 18 days of uptime, feeding the cloud at least every 30 seconds, and with no crashes or memory leaks).

This all runs amazingly light and well coordinated. I can kill a process and Upstart restarts it and it picks up where it left off.  If I unplug the CM19A USB transceiver, the X10 process waits until it has been plugged back in.

No Home Alone processes hold state in memory. They are part of a workflow. Any message or event that cannot be dropped is placed into the ext4 filesystem and stays there until the cloud server acknowledges that it has received it.  Each message is a file. This file is written "atomically" from the perspective of any consumer. (The file is created hidden, the message is written, the file is then closed and then "renamed" so that the consumer can see it).
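Here is a minimal sketch of that hidden-file-then-rename trick. It is in Python rather than LuaJIT, purely for illustration, and it is not the actual Home Alone code. The function names, the spool directory layout, and the idea that file names sort in arrival order are all assumptions of mine, as is the fsync call (the steps described above are only create, write, close, rename).

```python
import os

def write_message(spool_dir, name, payload):
    """Write payload (bytes) so that a consumer never sees a partial file."""
    hidden = os.path.join(spool_dir, "." + name)  # dotfile: consumers skip it
    final = os.path.join(spool_dir, name)
    with open(hidden, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # extra step (assumption): flush to disk first
    # The file is closed here. rename() is atomic within one filesystem on
    # POSIX, so the message appears to consumers all at once or not at all.
    os.rename(hidden, final)
    return final

def pending_messages(spool_dir):
    """Return the names of visible (fully written) messages, sorted."""
    return sorted(n for n in os.listdir(spool_dir) if not n.startswith("."))

def acknowledge(spool_dir, name):
    """Remove a message once the cloud server has confirmed it has it."""
    os.remove(os.path.join(spool_dir, name))
```

A producer would call write_message, the process that pushes to the cloud would loop over pending_messages, and acknowledge would run only after the server confirms receipt. If a process dies mid-write, all that is left behind is a dotfile that no consumer ever looks at.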

A heartbeat is generated (at the application level) and sent from the base station to the cloud server. This makes it possible to detect whether the base station is connected and communicating in a timely manner.
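On the cloud side, the check could be as simple as remembering when the last heartbeat arrived and comparing it against a timeout. This is a rough Python sketch, not the real server code: the class name, the method names, and the 90 second timeout (three missed 30 second intervals) are assumptions for illustration.

```python
import time

class HeartbeatMonitor:
    """Tracks the most recent heartbeat from one base station."""

    def __init__(self, timeout_s=90.0, clock=time.monotonic):
        self.timeout_s = timeout_s  # assumed value, not from the real system
        self.clock = clock          # injectable so the check can be tested
        self.last_seen = None       # no heartbeat received yet

    def beat(self):
        """Record that a heartbeat just arrived."""
        self.last_seen = self.clock()

    def is_alive(self):
        """True if a heartbeat arrived within the last timeout_s seconds."""
        if self.last_seen is None:
            return False
        return (self.clock() - self.last_seen) <= self.timeout_s
```

The server would call beat() whenever a heartbeat message comes in, and a periodic job would call is_alive() to decide whether the base station has gone quiet.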

I've still got plenty of fault tolerance testing to do, and some recovery strategies to implement. For instance:

  1. If a USB sensor has been unplugged or fails (such as the X10 CM19A), an event should be sent to the cloud.
  2. The base station should be "rebootable" from the cloud server. There should be some remote control.
  3. If the cloud server isn't getting (at least) "heartbeats" from the base station, there should be an email alert.

The base station, when hardened, will have at least a 1 month "burn in" before I consider it done.

/todd