Sunday, 26 May 2013

Model of distributed system

Post #2 of the distributed computing discussion series 


Previous post gave the high level introduction of the distributed system. In this post we will discuss about model of distributed system.

Once defined, model will help us understand many features and flavours of distributed computing challenges and put them in perspective and allow us to formulate workarounds or solutions to solve or overcome those challenges. These models will be used throughout in our next set of blogs related to the topic. Hence it is important to focus on the subject and understand clearly.

How do we visualize a distributed system?

Here is how I would describe a distributed system in simple terms. 
  • message passing system - nodes interact by sending/receiving messages
  • loosely coupled - no upper bound on message arrival time
  • no shared memory - all nodes have their own private memory
  • no global clock - clocks of different nodes can't be synchronized globally
  • a graph topology - consisting of processes as graph nodes and channels for message passing as edges (directional)
  • ordering of messages are not assumed in channels 
A simple figure to describe what I wrote above;




State and Event

A process or node in broader sense, can be seen in the context of following properties;
  • set of states,
  • set of events, and
  • set of initial conditions
Hence it's easy to see that in a graph kind of topology, with node as a process and edge as a channel, the state of both node and edge change when an event arrives. The following diagram would make it bit clearer;


Here we have a node with state S2 and channel (incoming) to the node with state S1. An event e arrives and it changes the states of both the channel and the node.

Thus we can safely assume that an event changes the state of at least one node and a channel connected to it. This concept will come handy in future discussions

Global State

We have just discussed that a distributed system has many nodes connected with each other through channels and interacting with each other by sending messages through channels. We have also discussed that the systems are loosely connected in the sense that we can't put a cap on message arrival time or the ordering of the messages in the channel. Also a state of the node changes after an event arrives to it (by also changing the state of the channel).

With this much information, we can safely define the global state of a distributed system as a cross product of local states of all the nodes and channels in the system. Also when channels are empty, we define the situation as initial state of the global system

Ordering

When we execute a distributed program, it generates many events (such as function output, sending and receiving of messages etc...). Each of these events would not make much sense if we don't apply order on them. There are many ways of thinking how order can be applied on the events and here are some;
  1. All events in the system are ordered. This means, across the system, with all nodes and channels, each and every event is ordered. That is, there exist a global sequence of events which is properly ordered. This assumes following; 
  • we have a shared global clock available
  • all events are instataneous
  • no two events are simultaneous
  1. Let's relax the first assumption, that is now assume that there is no shared global clock available to be used. This means that globally we can no longer compute the sequence of events. But still we do have the full ordering on a particular process or node. Thus we have partial order in the system but full order on a node
  1. Finally, let's also relax the assumption that every event is due to or caused by a other previous event(s). This makes the full order on a single node also difficult and hence we are left with partial order in the system and partial order on a node
Model of distributed system

Now we can define the model of distributed system by naming the above three ways of ordering in a formal manner. 
  1. Interleaving (Physical time) - Total order across the system
  1. Happened Before (Logical order) - Total order on a process, partial on system
  1. Potential Causality (Causality) - Partial order on a process or system 
The important difference between 2 and 3 is that 2 assumes that all events on a process will have cause and effect relationship, which is not always true. An example would be two different messages received on two different ports of a machine. Or two threads accessing or changing mutually disjoint set of variables.

Other important point to note here is that the potential causality is equivalent to the set of happened before that are consistent with it. Of course this is dependent on the potential causality relation on one process.

Now that we have defined the model of a distributed system, the question that we might have is that which model is the appropriate model? The answer depends on the application for which the model is being used. But as seen logically, a distributed program can be viewed as a set of potential causality, in turn, is equivalent to a set of happened before, and finally each happened before is equivalent to a set of global sequences of events (if each process is taken separately)

Here are few graphical description of two models described above;







The first one is happened before whereas the later is for potential causality.

That's all for this post, in the next one we will discuss about logical clock and vector clock algorithm which are very important in synchronizing events in a distributed system. Basically these clocks are used to track the orders of the events in different models of distributed system as defined above

Best,
Sachin Sinha