Saturday, September 3, 2011

Data that travels, again.

It's really quite surprising how many kinds of systems you can build with traveling data and what range of tasks they can accomplish. To me it is particularly surprising because the paradigm rarely comes to mind naturally. After all, the most straightforward way of building a system is to store the data somewhere in your system and then run multiple activities upon it. Unfortunately, this approach usually creates tight coupling between data and activities, and even between the activities themselves, or at least their execution sequence.
When data moves, activities can be totally decoupled from each other and even from the original piece of data that started the whole processing sequence. Imagine event processing where event1 triggers activity1 and activity2. The activities then generate next-generation events with data enriched or transformed, and those events trigger their own activities or simply get ignored. Such systems are very flexible - it's easy to add a new branch or a new processor. If you use locality information to deploy event processors involved in the same chain close to each other, the processing can be really fast. Obviously events should be relatively small, but if you combine event processing with a data storage solution you can have access to much larger data objects.
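The decoupling described above can be sketched in a few lines. This is a minimal illustration, not a production design; the names EventBus, enrich, and audit are invented for the example.

```python
# Minimal sketch of decoupled event processing: handlers know nothing
# about each other or about the publisher, and each may publish
# next-generation events of its own.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Fan out to every handler registered for this event type.
        results = []
        for handler in self._handlers.get(event_type, []):
            results.append(handler(self, payload))
        return results

def enrich(bus, payload):                     # activity1
    enriched = dict(payload, enriched=True)
    bus.publish("order.enriched", enriched)   # next-generation event
    return enriched

def audit(bus, payload):                      # activity2
    return ("audited", payload["id"])

bus = EventBus()
bus.subscribe("order.created", enrich)
bus.subscribe("order.created", audit)
bus.publish("order.created", {"id": 42})
```

Adding a new branch is just another subscribe call; nothing else in the chain changes.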
Events usually include several key pieces of metadata to facilitate routing and monitoring: event type, creation time, TTL, creator, etc.
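One possible envelope carrying that metadata looks like the following; the field names and the TTL check are an assumption for illustration, not any standard format.

```python
# Hypothetical event envelope with the routing/monitoring metadata
# mentioned above: type, creation time, TTL, and creator.
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Event:
    event_type: str
    creator: str
    payload: dict
    created_at: float = field(default_factory=time.time)
    ttl_seconds: float = 30.0
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def expired(self, now=None):
        # Routers can drop events whose TTL has lapsed instead of
        # delivering them to processors.
        now = time.time() if now is None else now
        return now - self.created_at > self.ttl_seconds
```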
On the other side of data traveling is map-reduce, which requires copying rather large volumes of data. I find it highly suggestive of that paradigm's viability that one of the most actively "growing" branches on the map-reduce tree is HBase. Basically, it seems that moving large volumes of data is fine if there is no way to avoid it. Otherwise, pretty much anything else might be a better solution.

Thursday, September 1, 2011

Dead? Very Dead?

This is one of the most complex areas in distributed software. How do you define "dead," and what do you do when a node begins to behave strangely? How do you detect brown-outs and network partitioning? What do you do when these things hit? The list of questions can go on and on.
Traditionally, the simplest solutions work best. Set a timeout on communications between nodes writing the same key segment; when the timeout expires, the node that has access to a quorum device lives on while the other one commits suicide (or goes into read-only mode). Of course, you need a good quorum device - usually a lock service based on Paxos is a good provider of that sort of thing. Moving to read-only mode (and coming back from the dead remembering the previous state of the data on the node) sounds like the better option. Unfortunately, people often forget that the next step is a catch-up update from an active node. The update puts extra load on the active node and can (and in many implementations does) push it over if it was already loaded heavily enough.
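The timeout-plus-quorum rule above can be sketched roughly as follows. The quorum device is modeled as a simple lease holder; QuorumDevice and its try_acquire method are hypothetical stand-ins for a Paxos-backed lock service, not a real API.

```python
# Sketch of timeout-based fencing: on peer timeout, the node that wins
# the quorum lease stays active; the loser demotes itself to read-only
# instead of dying outright.
import time

class QuorumDevice:
    """Hypothetical stand-in for a Paxos-backed lock service.
    Grants the lease to exactly one node id at a time."""
    def __init__(self):
        self._holder = None

    def try_acquire(self, node_id):
        if self._holder in (None, node_id):
            self._holder = node_id
            return True
        return False

class Node:
    def __init__(self, node_id, quorum, timeout=5.0):
        self.node_id = node_id
        self.quorum = quorum
        self.timeout = timeout
        self.mode = "active"
        self.last_peer_contact = time.time()

    def heartbeat_from_peer(self):
        self.last_peer_contact = time.time()

    def check(self, now=None):
        now = time.time() if now is None else now
        if now - self.last_peer_contact <= self.timeout:
            return self.mode  # peer is still talking; nothing to decide
        # Timeout expired: race for the quorum lease.
        if self.quorum.try_acquire(self.node_id):
            self.mode = "active"
        else:
            self.mode = "read-only"
        return self.mode
```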
A node that never comes back from the dead (e.g. gets its drive formatted while returning to service) is simpler, but you might lose data that way AND you still need to solve the problem of adding a node to the affected key segment. I personally favor a resurrection attempt because network partitioning happens much more often than true death.
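On the resurrection path, the catch-up update mentioned earlier can be throttled so it does not push the active node over. Below is one way to sketch that, assuming the active node exposes its updates as a sequence-numbered replication log; the interface and batch size are assumptions for illustration.

```python
# Throttled catch-up sketch: a returning read-only node applies missed
# updates from the active node's replication log in bounded batches,
# so catch-up traffic adds only limited load to the active node.
def catch_up(stale_store, active_log, from_seq, batch_size=100):
    """Apply entries from active_log (a list of (seq, key, value))
    newer than from_seq to stale_store, one bounded batch per call.
    Returns the last sequence number applied."""
    applied = 0
    last_seq = from_seq
    for seq, key, value in active_log:
        if seq <= from_seq:
            continue  # already have this update
        stale_store[key] = value
        last_seq = seq
        applied += 1
        if applied >= batch_size:
            break  # yield here; the caller sleeps before the next batch
    return last_seq
```

The caller loops, pausing between batches, until the returned sequence number stops advancing; only then does the node rejoin as writable.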