際際滷

際際滷Share a Scribd company logo
DISTRIBUTED SYSTEMS
Principles and Paradigms
Second Edition
ANDREW S. TANENBAUM
MAARTEN VAN STEEN
modified by A. Dobra and R. Newman 2012/2013
Chapter 1
Introduction
What is an Operating System
An operating system is:
A collection of software components that
 Provides useful abstractions and
 Manages resources to
 Support application programs, and
 Provide an interface for users and programs
Operating System Functions
An operating systems main functions are to:
 Schedule processes & multiplex CPU
 Provide mechanisms for IPC and
synchronization
 Manage main memory
 Manage other resources
 Provide convenient persistent storage (files)
 Maintain system integrity, handle failures
 Enforce security policies (e.g., access control)
 Give users and processes an interface
Definition of a Distributed System (1)
A distributed system is (Tannenbaum):
A collection of independent computers
that appears to its users as a single
coherent system.
A distributed system is (Lamport):
One in which the failure of a computer
you didn't even know existed can
render your own computer unusable
Properties of Distributed Systems
 Concurrency
 Multicore systems
 Multiple hosts
 No global clock
 Theoretical impossibility
 Expense of accurate clocks
 Independent view
 Message delay, failure
 Impossible to distinguish slow vs. failed node
 Independent failure
 Message delivery (loss, corruption)
 Nodes (fail-stop, Byzantine)
Software Concepts
An overview of
 NOS (Network Operating Systems) (80s)
 DOS (Distributed Operating Systems) (90s)
 Middleware (00s)
System Description Main Goal
DOS
Tightly-coupled operating system for multi-
processors and homogeneous
multicomputers
Hide and manage
hardware
resources
NOS
Loosely-coupled operating system for
heterogeneous multicomputers (LAN and
WAN)
Offer local
services to remote
clients
Middleware
Additional layer atop of NOS implementing
general-purpose services
Provide
distribution
transparency
Definition of a Distributed System (2)
Figure 1-1. A distributed system organized as middleware. The
middleware layer extends over multiple machines, and offers
each application the same interface.
Transparency in a Distributed System
Figure 1-2. Different forms of transparency in a
distributed system (ISO, 1995).
Other forms:
Parallelism  Hide the number of nodes working on a task
Size  Hide the number of components in the system
Revision  Hide changes in software/hardware versions
Challenges
 Performance
 Concurrency
 Failures
 Scalability
 System updates/growth
 Heterogeneity
 Openness
 Multiplicity of ownership, authority
 Security
 Quality of service/user experience
 Transparency
 Debugging
Approaches
 Virtual clocks
 Group communication
 Heartbeats/failure detection, group membership
 Distributed agreement, snapshots
 Leader election
 Transaction protocols
 Redundancy, replication, caching
 Indirection - naming
 Distributed mutual exclusion
 Middleware, modularization, layering
 Decomposition vs. integration
 Cryptographic protocols
Scalability Problems
Figure 1-3. Examples of scalability limitations.
Engineering = art of compromise (making tradeoffs)
Distributed systems  many theoretical results on lower
bounds of tradeoffs that limit practical solutions
Scalability Examples
Distributed systems are ubiquitous and necessary:
 Web search
 Financial transactions
 Multiplayer games
 DNS
 Travel reservation systems
 Utility infrastructure (e.g., power grid)
 Embedded systems (e.g., cars)
 Sensor networks
Failure to scale is fatal
 Instagram  share cellphone pix
 Facebook IPO
Web Search
 Google uses thousands of machines to
 Provide search results
 Run Page-Rank algorithm
 Issues
 Connecting large number of machines
 Distributed file system (GFS)
 Indexing
 Programming model
 Scaling up when current system reaches limits
Financial Transactions
Volume is huge
 4 million messages per second
 50 million things you can trade
Requirements are stringent
 Low latency
 24/7 operation (around the world)
 Failure is not an option
 Facebook NASDAQ Freeze
 Transaction system overwhelmed
 Hours to complete transactions in falling market
Multiplayer Games
Very popular  huge market
Characteristics
 May have millions of players
 Players operate in same world
 Players interact with world, each other
Issues
 Number of users
 Latency, consistency
 Coordination of multiple servers
 Architecture???
Scalability Problems
Characteristics of decentralized algorithms:
 No machine has complete information about the
system state.
 Machines make decisions based only on local
information.
 Failure of one machine does not ruin the
algorithm.
 There is no implicit assumption that a global
clock exists.
Scaling Techniques (1)
Figure 1-4. The difference between letting (a) a server
or (b) a client check forms as they are being filled.
Scaling Techniques (2)
Figure 1-5. An example of dividing the DNS
name space into zones.
Pitfalls when Developing
Distributed Systems
False assumptions made by first time developer:
 The network is reliable.
 The network is secure.
 The network is homogeneous.
 The topology does not change.
 Latency is zero.
 Bandwidth is infinite.
 Transport cost is zero.
 There is one administrator.
Multicore Systems
 Knights corner: 64 cores on a chip
 Intel Cloud in a Chip  48 cores/256GB @$9K
 http://www.intel.com/content/www/us/en/research/intel-labs-single-
chip-cloud-computer.html
 Most hosts are 2, 4, or 8 core now
 Fine-grained parallelism hard
 Detailed knowledge of algo/programmer involved
 Very fancy compiler
 Scheduling a challenge
 Virtualization
 Treat N cores as N hosts (with low latency comm)
 Do sequential programming
 Use DS framework to integrate
Knights Corner (KC) Chip
10 rings (5 in each direction), Tag Dir, Mem Ctl
Cluster Computing Systems
Figure 1-6. An example of a cluster computing system.
Grid/Cloud Computing Systems
Figure 1-7. A layered architecture for grid computing systems.
Common Distributed Systems
 Query Processing
 Transaction Processing
 Enterprise Applications
 Pervasive Systems
 Sensor Networks
Transaction Processing Systems (1)
Figure 1-8. Example primitives for transactions.
Transaction Processing Systems (2)
Characteristic properties of transactions:
 Atomic: To the outside world, the transaction
happens indivisibly.
 Consistent: The transaction does not violate
system invariants.
 Isolated: Concurrent transactions do not
interfere with each other.
 Durable: Once a transaction commits, the
changes are permanent.
Known as ACID properties
Transaction Processing Systems (3)
Figure 1-9. A nested transaction.
Transaction Processing Systems (4)
Figure 1-10. The role of a TP monitor (a.k.a. Coordinator)
in distributed systems.
Transaction Processing Systems (4.5)
Decomposition of the Transaction Monitor in a TPS
TM  2PC; SCH  serializability; OM  Atomic Update
Client
Client
Client
Client
...
...
Coordinator
Participants
Object
Manager
Scheduler
Transaction
Manager
Object
Object
Object
Object
...
...
Object
Manager
Scheduler
Transaction
Manager
Enterprise Application Integration
Figure 1-11. Middleware as a communication facilitator in
enterprise application integration.
Distributed Pervasive Systems
Requirements for pervasive systems
 Embrace contextual changes.
 Encourage ad hoc composition.
 Recognize sharing as the default.
Electronic Health Care Systems (1)
Questions to be addressed for health care systems:
 Where and how should monitored data be
stored?
 How can we prevent loss of crucial data?
 What infrastructure is needed to generate and
propagate alerts?
 How can physicians provide online feedback?
 How can extreme robustness of the monitoring
system be realized?
 What are the security issues and how can the
proper policies be enforced?
Electronic Health Care Systems (2)
Figure 1-12. Monitoring a person in a pervasive electronic health
care system, using (a) a local hub or
(b) a continuous wireless connection.
Sensor Networks (1)
Questions concerning sensor networks:
 How do we (dynamically) set up an
efficient tree in a sensor network?
 How does aggregation of results take
place? Can it be controlled?
 What happens when network links fail?
Sensor Networks (2)
Figure 1-13. Organizing a sensor network database, while storing
and processing data (a) only at the operators site or
Sensor Networks (3)
Figure 1-13. Organizing a sensor network database, while storing
and processing data  or (b) only at the sensors.
May also do data fusion/aggregation/processing at nodes
along the path to the master node/operator
Some Fundamental Issues
 How do we decompose a complex
problem/task into logical/manageable
chunks?
 What is the physical architecture?
 How do we assign roles/responsibilities to
physical components?
 How do we find components (logical and
physical)?
 How do we define and maintain
consistency?

More Related Content

chap-0 .ppt

  • 1. DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN modified by A. Dobra and R. Newman 2012/2013 Chapter 1 Introduction
  • 2. What is an Operating System An operating system is: A collection of software components that Provides useful abstractions and Manages resources to Support application programs, and Provide an interface for users and programs
  • 3. Operating System Functions An operating systems main functions are to: Schedule processes & multiplex CPU Provide mechanisms for IPC and synchronization Manage main memory Manage other resources Provide convenient persistent storage (files) Maintain system integrity, handle failures Enforce security policies (e.g., access control) Give users and processes an interface
  • 4. Definition of a Distributed System (1) A distributed system is (Tannenbaum): A collection of independent computers that appears to its users as a single coherent system. A distributed system is (Lamport): One in which the failure of a computer you didn't even know existed can render your own computer unusable
  • 5. Properties of Distributed Systems Concurrency Multicore systems Multiple hosts No global clock Theoretical impossibility Expense of accurate clocks Independent view Message delay, failure Impossible to distinguish slow vs. failed node Independent failure Message delivery (loss, corruption) Nodes (fail-stop, Byzantine)
  • 6. Software Concepts An overview of NOS (Network Operating Systems) (80s) DOS (Distributed Operating Systems) (90s) Middleware (00s) System Description Main Goal DOS Tightly-coupled operating system for multi- processors and homogeneous multicomputers Hide and manage hardware resources NOS Loosely-coupled operating system for heterogeneous multicomputers (LAN and WAN) Offer local services to remote clients Middleware Additional layer atop of NOS implementing general-purpose services Provide distribution transparency
  • 7. Definition of a Distributed System (2) Figure 1-1. A distributed system organized as middleware. The middleware layer extends over multiple machines, and offers each application the same interface.
  • 8. Transparency in a Distributed System Figure 1-2. Different forms of transparency in a distributed system (ISO, 1995). Other forms: Parallelism Hide the number of nodes working on a task Size Hide the number of components in the system Revision Hide changes in software/hardware versions
  • 9. Challenges Performance Concurrency Failures Scalability System updates/growth Heterogeneity Openness Multiplicity of ownership, authority Security Quality of service/user experience Transparency Debugging
  • 10. Approaches Virtual clocks Group communication Heartbeats/failure detection, group membership Distributed agreement, snapshots Leader election Transaction protocols Redundancy, replication, caching Indirection - naming Distributed mutual exclusion Middleware, modularization, layering Decomposition vs. integration Cryptographic protocols
  • 11. Scalability Problems Figure 1-3. Examples of scalability limitations. Engineering = art of compromise (making tradeoffs) Distributed systems many theoretical results on lower bounds of tradeoffs that limit practical solutions
  • 12. Scalability Examples Distributed systems are ubiquitous and necessary: Web search Financial transactions Multiplayer games DNS Travel reservation systems Utility infrastructure (e.g., power grid) Embedded systems (e.g., cars) Sensor networks Failure to scale is fatal Instagram share cellphone pix Facebook IPO
  • 13. Web Search Google uses thousands of machines to Provide search results Run Page-Rank algorithm Issues Connecting large number of machines Distributed file system (GFS) Indexing Programming model Scaling up when current system reaches limits
  • 14. Financial Transactions Volume is huge 4 million messages per second 50 million things you can trade Requirements are stringent Low latency 24/7 operation (around the world) Failure is not an option Facebook NASDAQ Freeze Transaction system overwhelmed Hours to complete transactions in falling market
  • 15. Multiplayer Games Very popular huge market Characteristics May have millions of players Players operate in same world Players interact with world, each other Issues Number of users Latency, consistency Coordination of multiple servers Architecture???
  • 16. Scalability Problems Characteristics of decentralized algorithms: No machine has complete information about the system state. Machines make decisions based only on local information. Failure of one machine does not ruin the algorithm. There is no implicit assumption that a global clock exists.
  • 17. Scaling Techniques (1) Figure 1-4. The difference between letting (a) a server or (b) a client check forms as they are being filled.
  • 18. Scaling Techniques (2) Figure 1-5. An example of dividing the DNS name space into zones.
  • 19. Pitfalls when Developing Distributed Systems False assumptions made by first time developer: The network is reliable. The network is secure. The network is homogeneous. The topology does not change. Latency is zero. Bandwidth is infinite. Transport cost is zero. There is one administrator.
  • 20. Multicore Systems Knights corner: 64 cores on a chip Intel Cloud in a Chip 48 cores/256GB @$9K http://www.intel.com/content/www/us/en/research/intel-labs-single- chip-cloud-computer.html Most hosts are 2, 4, or 8 core now Fine-grained parallelism hard Detailed knowledge of algo/programmer involved Very fancy compiler Scheduling a challenge Virtualization Treat N cores as N hosts (with low latency comm) Do sequential programming Use DS framework to integrate
  • 21. Knights Corner (KC) Chip 10 rings (5 in each direction), Tag Dir, Mem Ctl
  • 22. Cluster Computing Systems Figure 1-6. An example of a cluster computing system.
  • 23. Grid/Cloud Computing Systems Figure 1-7. A layered architecture for grid computing systems.
  • 24. Common Distributed Systems Query Processing Transaction Processing Enterprise Applications Pervasive Systems Sensor Networks
  • 25. Transaction Processing Systems (1) Figure 1-8. Example primitives for transactions.
  • 26. Transaction Processing Systems (2) Characteristic properties of transactions: Atomic: To the outside world, the transaction happens indivisibly. Consistent: The transaction does not violate system invariants. Isolated: Concurrent transactions do not interfere with each other. Durable: Once a transaction commits, the changes are permanent. Known as ACID properties
  • 27. Transaction Processing Systems (3) Figure 1-9. A nested transaction.
  • 28. Transaction Processing Systems (4) Figure 1-10. The role of a TP monitor (a.k.a. Coordinator) in distributed systems.
  • 29. Transaction Processing Systems (4.5) Decomposition of the Transaction Monitor in a TPS TM 2PC; SCH serializability; OM Atomic Update Client Client Client Client ... ... Coordinator Participants Object Manager Scheduler Transaction Manager Object Object Object Object ... ... Object Manager Scheduler Transaction Manager
  • 30. Enterprise Application Integration Figure 1-11. Middleware as a communication facilitator in enterprise application integration.
  • 31. Distributed Pervasive Systems Requirements for pervasive systems Embrace contextual changes. Encourage ad hoc composition. Recognize sharing as the default.
  • 32. Electronic Health Care Systems (1) Questions to be addressed for health care systems: Where and how should monitored data be stored? How can we prevent loss of crucial data? What infrastructure is needed to generate and propagate alerts? How can physicians provide online feedback? How can extreme robustness of the monitoring system be realized? What are the security issues and how can the proper policies be enforced?
  • 33. Electronic Health Care Systems (2) Figure 1-12. Monitoring a person in a pervasive electronic health care system, using (a) a local hub or (b) a continuous wireless connection.
  • 34. Sensor Networks (1) Questions concerning sensor networks: How do we (dynamically) set up an efficient tree in a sensor network? How does aggregation of results take place? Can it be controlled? What happens when network links fail?
  • 35. Sensor Networks (2) Figure 1-13. Organizing a sensor network database, while storing and processing data (a) only at the operators site or
  • 36. Sensor Networks (3) Figure 1-13. Organizing a sensor network database, while storing and processing data or (b) only at the sensors. May also do data fusion/aggregation/processing at nodes along the path to the master node/operator
  • 37. Some Fundamental Issues How do we decompose a complex problem/task into logical/manageable chunks? What is the physical architecture? How do we assign roles/responsibilities to physical components? How do we find components (logical and physical)? How do we define and maintain consistency?

Editor's Notes

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37