Software Stuff: Distributed Systems Architecture, part 1.

I recently picked up a couple of new mentees interested in distributed systems, so I decided to write down my thinking in the hope that it will make the coaching more productive.
Before you even begin to think about the technical part of your system, start by analyzing the business requirements. Your goal is to figure out your technical requirements AND verify that the business requirements are correct. It does not hurt to start with the initial business idea for the new service (e.g. let people keep their music on the Internet) and write down your own business requirements. The next step, of course, is to compare your set of requirements with the one provided by the business and learn from the differences. Pay special attention to their growth predictions. I have seen systems made very difficult to build by extremely optimistic growth projections (e.g. "We will have 3,000,000,000,000 requests per day in the first 6 months"). I have also seen underestimated growth turn systems into victims of their own success: they became popular quickly and then crashed under the demand. Let's define our first tenet here: You should know how you are going to scale your system up and down, if needed. You should also know how much work that is going to take.
Obviously, we always build infinitely horizontally scalable systems, so your projections will be simplified by their linearity.
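To make a growth projection concrete, it helps to turn the daily number into requests per second and then into a node count. Here is a minimal sketch of that arithmetic, assuming capacity really does grow linearly with the number of nodes. The per-node capacity (5,000 requests per second) and the peak factor are made-up inputs for illustration; you would have to measure or estimate your own.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day: float) -> float:
    """Average requests per second implied by a daily request count."""
    return requests_per_day / SECONDS_PER_DAY


def nodes_needed(requests_per_day: float,
                 node_capacity_rps: float,
                 peak_factor: float = 1.0) -> int:
    """Nodes required if capacity grows linearly with node count.

    node_capacity_rps and peak_factor are assumptions, not measurements:
    the first is what one node can serve, the second is how much higher
    the peak is than the daily average.
    """
    peak_rps = average_rps(requests_per_day) * peak_factor
    return math.ceil(peak_rps / node_capacity_rps)


if __name__ == "__main__":
    projected = 3_000_000_000_000  # the optimistic projection quoted above
    print(f"average load: {average_rps(projected):,.0f} requests/second")
    print(f"nodes at 5,000 rps each: {nodes_needed(projected, 5_000):,}")
```

Even under the flattering linear assumption, three trillion requests per day is roughly 35 million requests per second on average, which is a good question to bring back to whoever wrote the projection.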
The first thing you can usually analyze is your data. The business provides most of the data definitions in the form of requirements; you just need to define your data in a more formal way. If you are building a distributed system you can't have one centralized database - it won't scale with the rest of your system, and your system won't be infinitely horizontally scalable. You need to decide whether you want your data to stay put or move, and whether you want what basically amounts to a horizontally partitioned database or something else.
A horizontally partitioned database is the most popular choice. The same schema entities (atomic data units, for example Customer or Account) are present on all of your nodes. Then you use some key to build a map: Entity -> Storage Node. For example, take a hash of CustomerID sliced into ranges, where range 1-100 belongs to Storage Node 1, 101-200 to Storage Node 2, and so on. One of the problems with this scheme is the possibility of hot spots. If one of your customers is Sears, chances are you will retrieve that entry far more often than the entry for some local shop.
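The Entity -> Storage Node map can be sketched in a few lines. This is only an illustration under my own assumptions: ten storage nodes, SHA-256 as the stable hash, and string customer IDs. None of those choices is prescribed by anything above; the range arithmetic simply mirrors the 1-100, 101-200 example.

```python
import hashlib
from collections import Counter

NUM_NODES = 10    # hypothetical cluster size
RANGE_SIZE = 100  # hash slots per node, as in the 1-100, 101-200 example
HASH_SPACE = NUM_NODES * RANGE_SIZE


def hash_slot(customer_id: str) -> int:
    """Stable hash of the customer ID, folded into 1..HASH_SPACE."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % HASH_SPACE + 1


def storage_node(customer_id: str) -> int:
    """Map a hash slot to its node: 1-100 -> 1, 101-200 -> 2, and so on."""
    return (hash_slot(customer_id) - 1) // RANGE_SIZE + 1


def reads_per_node(requests):
    """Count how many reads each storage node serves for a request stream."""
    return Counter(storage_node(cid) for cid in requests)


if __name__ == "__main__":
    # One very popular customer plus a hundred small ones.
    requests = ["sears"] * 900 + [f"shop-{i}" for i in range(100)]
    for node, reads in sorted(reads_per_node(requests).items()):
        print(f"Storage Node {node}: {reads} reads")
```

Running it shows the hot spot directly: whichever node owns the "sears" slot serves at least 900 of the 1,000 reads, no matter how evenly the hash spreads the small shops.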


Tuesday, August 16, 2011
