Monday, 6 May 2013

Distributed System - a high level introduction


Post #1 of the distributed computing discussion series

This is the first in a series of articles on distributed computing. I will first put forward some of the basic concepts of distributed computing and then take up some related problems and dig deeper. Along the way I will also dedicate a few posts to existing products and systems that are relevant to the discussion, and try to explain why certain things are done the way they are and how others have solved or worked around a problem.

This post gives a quick, high-level introduction to distributed computing, as later posts will refer back to it in many places. Please note that the subject is too vast to cover in a single post, so I will focus on what matters from a practical design and implementation perspective.

Distributed System

For simplicity, and for the sake of discussion, let's define a distributed system as a collection of multiple processors (programs, computers, etc.) connected by a communication network. These connected processors try to achieve a common goal by communicating with each other using messages sent over the network.

How many ways are there to construct such a system? Many, but let's consider two of them. First, we can divide a task into m subtasks and distribute those subtasks over the network. Second, we can assign each program in the distributed system a different subset of jobs, so that each one handles its own part of the overall work. In both scenarios (or, for that matter, any other scenario you can come up with) the power of the system is clear. With a good design it can scale close to linearly, handle a very large load, and remain available almost all the time in a practical sense. Another very important factor: we don't need million-dollar, tailor-made machines to do a relatively big task. Instead, we can use many cheap commodity machines to build a distributed system and achieve the same task for far fewer dollars, and probably with more redundancy and availability. In typical practical settings, commodity hardware also provides more resiliency than the larger tailor-made machines.
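The first approach, dividing a task into m subtasks, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names split_into_subtasks, worker and run_task are hypothetical, and local threads stand in for the remote nodes that a real system would reach over the network.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_subtasks(items, m):
    """Split items into m contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), m)
    chunks, start = [], 0
    for i in range(m):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def worker(chunk):
    """Stand-in for a remote node: computes a partial result for its chunk."""
    return sum(x * x for x in chunk)


def run_task(items, m):
    """Hand the m subtasks to workers and combine their partial results."""
    chunks = split_into_subtasks(items, m)
    # Threads simulate nodes here (an assumption of this sketch);
    # a real system would send each chunk over the network.
    with ThreadPoolExecutor(max_workers=m) as pool:
        partials = list(pool.map(worker, chunks))
    return sum(partials)


total = run_task(list(range(10)), m=3)
```

The important design point is that each worker only needs its own chunk, so the subtasks can run anywhere, and the coordinator only needs the small partial results to produce the final answer.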

Hence the next question: why are there so few software programs that exploit cheap hardware and do their computing economically using a distributed system?

The answer is simple: distributed systems and computing require a different set of tools and techniques than traditional applications do. For simplicity, we can view non-distributed systems as sequential systems; this makes the concept relatively easier to understand.

So what are the benefits and challenges of distributed systems and computing? We will cover that after clarifying one widely discussed and often confused topic.

Parallel vs Distributed computing

Without going into a detailed discussion, this is how we can distinguish between the two. In parallel programming, multiple processors work and communicate with each other using shared memory. In the distributed programming model, by contrast, there is no true shared memory, so each processor works with its local memory and communicates with the others by sending messages. Note, however, that this distinction holds only at the logical level. In the real world, parallel processors can simulate message passing using shared memory, and distributed processors can simulate shared memory by sending messages over the network.
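To make the logical difference concrete, here is a minimal Python sketch that computes the same sum in both styles. The function names are hypothetical, and threads on one machine play the role of processors in both cases, so the second function only imitates the distributed model: an in-process queue stands in for the network.

```python
import queue
import threading


def shared_memory_sum(values, workers=4):
    """Parallel style: every worker updates one shared variable under a lock."""
    total = {"value": 0}
    lock = threading.Lock()

    def work(chunk):
        local = sum(chunk)
        with lock:  # all workers contend for the same shared resource
            total["value"] += local

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(values, workers=4):
    """Distributed style: workers keep only local state and send messages."""
    inbox = queue.Queue()  # stands in for the network (assumption of this sketch)

    def work(chunk):
        inbox.put(sum(chunk))  # the only communication is this message

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The coordinator combines exactly one message per worker.
    return sum(inbox.get() for _ in range(workers))
```

Both functions return the same result. What differs is where coordination happens: around a lock on shared state in the first, and in the messages themselves in the second.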

A simple diagram will fix the concept (inspired by wiki)



The first figure shows a distributed system, whereas the second shows a parallel system.

Now let's list why one should choose distributed computing over parallel computing.

Benefits of distributed system

1. Scalability: In a parallel system, shared memory becomes the bottleneck as we increase the number of processors, because too many of them contend for a single resource

2. Availability: A distributed system is inherently more available because of its natural redundancy, whereas a parallel system needs special effort to achieve the same

3. Resiliency: A distributed system doesn't demand the same type of processor when we want to add an extra node. The sheer modularity and heterogeneity of a distributed system provide a resiliency that is not possible in a parallel system

4. Sharing: Data and resources can be shared in distributed computing. Multiple organizations, or divisions within the same organization, can share data and resources with each other. For example, a powerful, expensive machine can be connected to the distributed system so that other machines/processors can use it

5. Geographical benefit: A distributed structure makes local computation and local access possible

6. Reliability: The failure of a single processor/machine doesn't bring down the overall system

7. Economy: This, in my view, is very important. Since we can plug less expensive commodity hardware into the distributed system, the overall expense drops considerably compared to hugely costly tailor-made multiprocessor machines

Benefits of parallel system

1. Accessing shared memory: Accessing shared memory is very fast compared to sending and receiving messages over a network

2. Fine-grained parallelism: Since data can be communicated with much more ease and speed, fine-grained parallelism is easily achieved in a parallel system

3. Distributed systems are hard to develop and maintain: The next section covers the challenges posed by distributed systems and computing. All of these complexities can be avoided by using a parallel system

4. Hardware support: The presence of multiple processors on commodity hardware implicitly encourages the use of parallel programming

Note: Throughout the discussion, in this post and in future posts, I will always assume and support parallel processing on a single node/machine. So when I say single-node processor, I mean a single piece of commodity hardware with multiple processors in it. Hence I assume and encourage the use of multiprocessing/multithreading on a node

Challenges of distributed system

We can summarize the challenges in the following three points:

1. Lack of a shared clock: Because communication delays over the network are uncertain, it's not possible to synchronize the clocks of different processors precisely in a distributed system

2. Lack of shared memory: No processor or node can know the global state of the whole system at any given point in time, hence no global property of the system can be observed directly

3. Lack of accurate failure detection: In most scenarios, because of communication issues, it's not possible to tell whether a processor has failed, is unreachable, or is just slow to respond. In a nutshell, it's not possible to detect failures with great accuracy, especially in the asynchronous model
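The third challenge is easy to see with a timeout-based detector, which is one common and simple way to guess at failures. The sketch below is a hypothetical illustration, not a production design: the class name, the node names and the timestamps are all made up, and time is passed in explicitly so the behaviour is deterministic.

```python
class TimeoutFailureDetector:
    """Suspects a node when no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of the last heartbeat received

    def heartbeat(self, node, now):
        """Record that a heartbeat from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def suspected(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout)


detector = TimeoutFailureDetector(timeout=3.0)
detector.heartbeat("a", now=0.0)
detector.heartbeat("b", now=0.0)
detector.heartbeat("a", now=2.0)

# Node b is alive, but its next heartbeat is delayed in the network.
suspects_at_4 = detector.suspected(now=4.0)

# The delayed heartbeat finally arrives, and the suspicion is withdrawn.
detector.heartbeat("b", now=5.0)
suspects_at_5 = detector.suspected(now=5.0)
```

At time 4.0 the detector suspects b even though b never failed. From the detector's point of view, a crashed node and a slow message look exactly the same. A longer timeout reduces false suspicions but delays the detection of real failures.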

These three challenges call for different tools and techniques to address the issues faced in distributed computing.
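As a small illustration of the first challenge, consider estimating the difference between a local clock and a remote clock from a single request/response exchange, in the style of Cristian's algorithm. The function name and the sample numbers below are assumptions made for this sketch. It also assumes that the server reads its clock at some moment between the request being sent and the reply being received, and it ignores clock drift during the exchange.

```python
def estimate_offset(t_send, server_time, t_receive):
    """Estimate (server clock - local clock) from one exchange.

    t_send and t_receive are read from the local clock; server_time is the
    remote clock reading carried in the reply.
    """
    round_trip = t_receive - t_send
    # Guess that the reply took half of the round trip. The true one-way
    # delay is unknown and could be anywhere between 0 and round_trip.
    offset = server_time + round_trip / 2 - t_receive
    uncertainty = round_trip / 2
    return offset, uncertainty


offset, uncertainty = estimate_offset(t_send=100.0, server_time=105.25,
                                      t_receive=100.5)
```

With these numbers the estimate is an offset of 5.0 seconds with an uncertainty of 0.25 seconds either way. The uncertainty shrinks only as the round trip shrinks, so as long as network delay is unpredictable, the clocks cannot be synchronized exactly.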

Next

In the next post, we will discuss the 'model' of distributed computation. In the one after that, we will discuss the logical clock, with which we can order events in a distributed scenario, and so on. Very soon we will have all the concepts in place to discuss the real-world problems of distributed computing and how to address them effectively, and also how others (not many) have applied these techniques to design their products.

Thanks for reading through the post!

Best,
Sachin Sinha