With this partitioning schema you must avoid cross-machine writes and reads. Under any sort of load cross-machine writes will fails in a variety of really exciting and difficult to untangle ways. Multi-phase commit is not going to help you there: it doesn't work on any real world load. The transactions just take too long, which gets reflected by humongous resource consumption, wildly different latencies and frequent failures. Network partitioning decreases your availability and adds failures, too. Multi-machine reads are also rather flaky for the same reasons. You also need to think about the combining the results from several nodes, pagination and more. In short - just do not allow cross-machine access if you are doing horizontal data partitioning.
If you do need to have cross-machine updates or queries, you need to implement a workflow. Any financial system is a shining example of such interaction pattern. Clearly, a response in such system is asynchronous.
The importance of Numbers
I had meetings with my mentees and the common issue with them was the lack of any numbers they could put on their requirements. Working on any system, distributed or not, you need to start by having a grasp on some cornerstone numbers:
If you do need to have cross-machine updates or queries, you need to implement a workflow. Any financial system is a shining example of such interaction pattern. Clearly, a response in such system is asynchronous.
The importance of Numbers
I had meetings with my mentees and the common issue with them was the lack of any numbers they could put on their requirements. Working on any system, distributed or not, you need to start by having a grasp on some cornerstone numbers:
- Expected Latency. Clearly, system with synchronous 15 ms response time must have very different architecture from a system with 2 second response time. You need to know both tp50 and tp99.9.
- Expected availability. If it's different for reads and writes or per resource, you need to know all the numbers. Also what are the criteria for availability (e.g. if ping works, but nothing else does - is the service available?)
- Expected redundancy.
- Total amount of money your business owners are ready to spend on the hardware. This is one of the things that greatly inspires your creativity, especially when you realize that the number is several times less than what you think it ought to be.

No comments:
Post a Comment