A distributed system is a collection of independent computers that appears as a single coherent system to users. Key properties include concurrency across multiple cores and hosts, lack of a global clock, and independent failures of nodes. There are many challenges in building distributed systems including performance, concurrency, failures, scalability, and transparency. Common approaches to address these include virtual clocks, group communication, failure detection, transaction protocols, redundancy, and middleware. Distributed systems must be carefully engineered to balance competing design tradeoffs.
1 of 37
Download to read offline
More Related Content
chap-0 .ppt
1. DISTRIBUTED SYSTEMS
Principles and Paradigms
Second Edition
ANDREW S. TANENBAUM
MAARTEN VAN STEEN
modified by A. Dobra and R. Newman 2012/2013
Chapter 1
Introduction
2. What is an Operating System
An operating system is:
A collection of software components that
Provides useful abstractions and
Manages resources to
Support application programs, and
Provide an interface for users and programs
3. Operating System Functions
An operating systems main functions are to:
Schedule processes & multiplex CPU
Provide mechanisms for IPC and
synchronization
Manage main memory
Manage other resources
Provide convenient persistent storage (files)
Maintain system integrity, handle failures
Enforce security policies (e.g., access control)
Give users and processes an interface
4. Definition of a Distributed System (1)
A distributed system is (Tannenbaum):
A collection of independent computers
that appears to its users as a single
coherent system.
A distributed system is (Lamport):
One in which the failure of a computer
you didn't even know existed can
render your own computer unusable
5. Properties of Distributed Systems
Concurrency
Multicore systems
Multiple hosts
No global clock
Theoretical impossibility
Expense of accurate clocks
Independent view
Message delay, failure
Impossible to distinguish slow vs. failed node
Independent failure
Message delivery (loss, corruption)
Nodes (fail-stop, Byzantine)
6. Software Concepts
An overview of
NOS (Network Operating Systems) (80s)
DOS (Distributed Operating Systems) (90s)
Middleware (00s)
System Description Main Goal
DOS
Tightly-coupled operating system for multi-
processors and homogeneous
multicomputers
Hide and manage
hardware
resources
NOS
Loosely-coupled operating system for
heterogeneous multicomputers (LAN and
WAN)
Offer local
services to remote
clients
Middleware
Additional layer atop of NOS implementing
general-purpose services
Provide
distribution
transparency
7. Definition of a Distributed System (2)
Figure 1-1. A distributed system organized as middleware. The
middleware layer extends over multiple machines, and offers
each application the same interface.
8. Transparency in a Distributed System
Figure 1-2. Different forms of transparency in a
distributed system (ISO, 1995).
Other forms:
Parallelism Hide the number of nodes working on a task
Size Hide the number of components in the system
Revision Hide changes in software/hardware versions
9. Challenges
Performance
Concurrency
Failures
Scalability
System updates/growth
Heterogeneity
Openness
Multiplicity of ownership, authority
Security
Quality of service/user experience
Transparency
Debugging
10. Approaches
Virtual clocks
Group communication
Heartbeats/failure detection, group membership
Distributed agreement, snapshots
Leader election
Transaction protocols
Redundancy, replication, caching
Indirection - naming
Distributed mutual exclusion
Middleware, modularization, layering
Decomposition vs. integration
Cryptographic protocols
11. Scalability Problems
Figure 1-3. Examples of scalability limitations.
Engineering = art of compromise (making tradeoffs)
Distributed systems many theoretical results on lower
bounds of tradeoffs that limit practical solutions
12. Scalability Examples
Distributed systems are ubiquitous and necessary:
Web search
Financial transactions
Multiplayer games
DNS
Travel reservation systems
Utility infrastructure (e.g., power grid)
Embedded systems (e.g., cars)
Sensor networks
Failure to scale is fatal
Instagram share cellphone pix
Facebook IPO
13. Web Search
Google uses thousands of machines to
Provide search results
Run Page-Rank algorithm
Issues
Connecting large number of machines
Distributed file system (GFS)
Indexing
Programming model
Scaling up when current system reaches limits
14. Financial Transactions
Volume is huge
4 million messages per second
50 million things you can trade
Requirements are stringent
Low latency
24/7 operation (around the world)
Failure is not an option
Facebook NASDAQ Freeze
Transaction system overwhelmed
Hours to complete transactions in falling market
15. Multiplayer Games
Very popular huge market
Characteristics
May have millions of players
Players operate in same world
Players interact with world, each other
Issues
Number of users
Latency, consistency
Coordination of multiple servers
Architecture???
16. Scalability Problems
Characteristics of decentralized algorithms:
No machine has complete information about the
system state.
Machines make decisions based only on local
information.
Failure of one machine does not ruin the
algorithm.
There is no implicit assumption that a global
clock exists.
17. Scaling Techniques (1)
Figure 1-4. The difference between letting (a) a server
or (b) a client check forms as they are being filled.
19. Pitfalls when Developing
Distributed Systems
False assumptions made by first time developer:
The network is reliable.
The network is secure.
The network is homogeneous.
The topology does not change.
Latency is zero.
Bandwidth is infinite.
Transport cost is zero.
There is one administrator.
20. Multicore Systems
Knights corner: 64 cores on a chip
Intel Cloud in a Chip 48 cores/256GB @$9K
http://www.intel.com/content/www/us/en/research/intel-labs-single-
chip-cloud-computer.html
Most hosts are 2, 4, or 8 core now
Fine-grained parallelism hard
Detailed knowledge of algo/programmer involved
Very fancy compiler
Scheduling a challenge
Virtualization
Treat N cores as N hosts (with low latency comm)
Do sequential programming
Use DS framework to integrate
26. Transaction Processing Systems (2)
Characteristic properties of transactions:
Atomic: To the outside world, the transaction
happens indivisibly.
Consistent: The transaction does not violate
system invariants.
Isolated: Concurrent transactions do not
interfere with each other.
Durable: Once a transaction commits, the
changes are permanent.
Known as ACID properties
32. Electronic Health Care Systems (1)
Questions to be addressed for health care systems:
Where and how should monitored data be
stored?
How can we prevent loss of crucial data?
What infrastructure is needed to generate and
propagate alerts?
How can physicians provide online feedback?
How can extreme robustness of the monitoring
system be realized?
What are the security issues and how can the
proper policies be enforced?
33. Electronic Health Care Systems (2)
Figure 1-12. Monitoring a person in a pervasive electronic health
care system, using (a) a local hub or
(b) a continuous wireless connection.
34. Sensor Networks (1)
Questions concerning sensor networks:
How do we (dynamically) set up an
efficient tree in a sensor network?
How does aggregation of results take
place? Can it be controlled?
What happens when network links fail?
35. Sensor Networks (2)
Figure 1-13. Organizing a sensor network database, while storing
and processing data (a) only at the operators site or
36. Sensor Networks (3)
Figure 1-13. Organizing a sensor network database, while storing
and processing data or (b) only at the sensors.
May also do data fusion/aggregation/processing at nodes
along the path to the master node/operator
37. Some Fundamental Issues
How do we decompose a complex
problem/task into logical/manageable
chunks?
What is the physical architecture?
How do we assign roles/responsibilities to
physical components?
How do we find components (logical and
physical)?
How do we define and maintain
consistency?