Master 2nd year : Systèmes et Communication
2006-2007
Algorithms and Techniques for Distributed Systems
Sacha Krakowiak
This 24-hour course is an introduction to the
fundamentals of distributed computing systems. Here is an outline:
- concepts and methods related to event
ordering and global state definition in a distributed system,
- survey of the main classes of distributed
algorithms, illustrated by the description of a few basic algorithms
(termination, election, etc),
- information consistency and reliable
broadcast protocols,
- problems of distributed consensus and
atomic commitment,
- distributed object management
The emphasis is on basic principles; methods and
algorithms are illustrated with examples from actual systems.
Time and State in Distributed Systems
- Introduction to the problems of distributed systems: asynchrony,
unreliability, scale
- Causality and event ordering in an asynchronous distributed system
- Global states and consistent cuts
Applications: checkpointing algorithms, detection of stable properties -
Logical clocks and applications: distributed mutual exclusion,
distributed queues
- Vector clocks and applications: observation, debugging
- Synchronisation of physical clocks
Cooperation of Distributed Processes
- Virtual rings, insertion and withdrawal protocols.
- Algorithms that use tokens
- Election algorithms
Application: process group management - Termination detection
algorithms
Application: distributed garbage collection
Fault Tolerance
- Fault hypotheses
- Specifying consistency: linearisability, sequential consistency,
causal consistency
- Primary copy and active redundancy
- Fault-tolerant broadcasts and process group management
- Applications: reliable servers, multiple copies, distributed
shared memory
Consensus and Atomic Commitment
- Classes of failure detectors
- The consensus problems
Some special cases (synchronous omission, byzantine agreement) - The
atomic commitment problem; 2 and 3-phase commitment
Applications: distributed transaction management
Distributed Information Management
- Principles and implementation of distributed information management
- Addressing, distributed hash tables
- Large scale broadcast, epidemic algorithms
- Cache management, consistency algorithms
- Propagating modifications, epidemic algorithms
Case studies: P2P systems, distributed
files
References
Books
- R. Guerraoui, L. Rodrigues, Reliable Distributed Programming, Springer, 2006
- A. S. Tanenbaum, M. van Steen, Distributed
Systems - Principles & Paradigms, Prentice Hall, 2002
- S. Mullender (editor), Distributed Systems, 2nd ed. ,
Addison-Wesley, 1993
- M. Singhal, N. G. Shivaratri, Advanced Concepts in Operating
Systems, McGraw-Hill, 1994
- V. C. Barbosa, Introduction to Distributed Algorithms,
MIT Press, 1996