
Cloud Computing
Considerations

Malisa Ncube
Developer Evangelist, Microsoft
be.com@malisancube
inbox@malisancube.com

What does it take to run an app?
Inspired by Steve Marx
http://blog.smarx.com/posts/what-is-windows-azure-a-hand-drawn-video

Scalability
• Measured by the number of users the application can support effectively at the
  same time.
• Relates to the hardware resources needed (CPU, memory, disk and network bandwidth).
• Application logic runs on compute nodes and data lives on data nodes.
• Vertical scaling is achieved by increasing resources within existing nodes. This is limited
  by hardware.
• Horizontal scaling is achieved by adding more nodes. It is more efficient with
  homogeneous nodes.
• A scale unit is a combination of resources that need to be scaled together when
  scaling horizontally.
• Resource contention limits scalability.
• Scalability is a business concern. Google noticed a 20% reduction in traffic after adding
  500 ms to page response time. For Amazon, an extra 100 ms caused a 1% decrease in revenue.

The cloud
• Gives the illusion of infinite resources, limited only by the capacity of
  individual virtual machines.
• Enabled by a short-term resource rental model.
• Enabled by a metered pay-for-use model. Usage costs are transparent.
• Enabled by self-service, on-demand, programmatic provisioning and
  releasing of resources, so scaling is automatable.
• Gives an ecosystem of managed platform services such as VMs, data
  storage, networking, messaging and caching.
• Gives a simplified application development model.

A cloud native application
• Lets the platform do the hard stuff by leveraging its application services.
• Uses non-blocking asynchronous communication in a loosely coupled
  architecture.
• Scales horizontally and elastically.
• Does not waste resources.
• Handles scaling events, node failures and transient failures without downtime
  or performance degradation.
• Uses geographic distribution to minimize network latency.
• Upgrades without downtime.
• Scales automatically using proactive and reactive actions.
• Monitors and manages application logs as nodes come and go.

Horizontal Scaling Compute Pattern
• Horizontal scaling is reversible.
• Supports scaling out and scaling in.
• Stateful nodes
    • Keep user session information on the node.
    • Are each a single point of failure.
• Stateless nodes
    • Store session information externally from the nodes.
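
The stateless option above can be sketched as follows. This is a minimal illustration, not a production design: `SessionStore` is a hypothetical in-memory stand-in for whatever external cache or database would hold sessions, and `WebNode` is an assumed name for a compute node.

```python
class SessionStore:
    """Hypothetical stand-in for an external session store (cache or database)."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, session):
        self._data[session_id] = dict(session)


class WebNode:
    """A stateless compute node: it keeps no session data between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        session = self.store.load(session_id)
        cart = session.get("cart", []) + [item]
        session["cart"] = cart
        self.store.save(session_id, session)
        return cart


store = SessionStore()
node_a = WebNode("node-a", store)
node_b = WebNode("node-b", store)

# Two different nodes serve the same session; either could be removed
# (scaling in) without losing the cart.
node_a.add_to_cart("session-1", "book")
cart = node_b.add_to_cart("session-1", "pen")
```

Because no node owns the session, a load balancer is free to send each request to any node.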

Queue-Centric Workflow Pattern
• Used in web applications to decouple communication between the web tier and the service
  tier by focusing on the flow of commands.
• A service tier that is unreliable or slow can affect the web tier negatively.
• All communication is asynchronous, as messages over a queue.
• The sender and receiver are loosely coupled. Neither one knows about the
  implementation of the other.
• Edge case: when processing takes longer than the invisibility window allows, the message
  reappears on the queue and may be processed more than once.
• Idempotency concerns. Options include database transactions and compensating transactions.
• Poison messages are placed in a dead letter queue.
• QCW is not full CQRS, as it does not articulate the read model.
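
A rough sketch of the receiver side of this pattern, assuming an in-memory `deque` in place of a real queue service. The names `process_queue`, `MAX_ATTEMPTS` and the message fields are illustrative assumptions, not part of any particular cloud API.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed threshold before a message counts as poison


def process_queue(queue, handler, processed_ids, dead_letters):
    """Drain the queue, skipping duplicates and parking poison messages."""
    while queue:
        message = queue.popleft()
        if message["id"] in processed_ids:
            # Duplicate delivery (e.g. the invisibility window lapsed):
            # skipping it keeps the operation idempotent.
            continue
        try:
            handler(message["body"])
            processed_ids.add(message["id"])
        except Exception:
            message["attempts"] = message.get("attempts", 0) + 1
            if message["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(message)  # poison message
            else:
                queue.append(message)  # make it visible again for a retry


handled = []


def handler(body):
    if body == "bad":
        raise ValueError("cannot process")
    handled.append(body)


queue = deque([
    {"id": 1, "body": "resize image"},
    {"id": 1, "body": "resize image"},  # redelivered copy of message 1
    {"id": 2, "body": "bad"},           # always fails
])
processed_ids, dead_letters = set(), []
process_queue(queue, handler, processed_ids, dead_letters)
```

Tracking processed message IDs is only one way to get idempotency; the database and compensating transactions named above are alternatives.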

Autoscaling Pattern
• Assumes a horizontally scaling architecture.
• Concerns are cost optimization and scalability.
• Auto-scaling solutions enable scheduled (proactive) and reactive rules
  that provision resources as needed.
• Throttling by selectively enabling or disabling features or functionality
  based on environmental signals.
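
One way to picture how a proactive and a reactive rule combine is the sketch below. All thresholds, the business-hours schedule and the function name are made-up assumptions for illustration; real auto-scaling services express such rules in their own configuration.

```python
MIN_NODES = 2            # assumed floor outside business hours
BUSINESS_HOURS_MIN = 4   # assumed scheduled (proactive) floor
MAX_NODES = 10           # assumed cost ceiling
SCALE_OUT_CPU = 75       # assumed reactive thresholds, in percent
SCALE_IN_CPU = 25


def desired_node_count(current_nodes, avg_cpu_percent, hour_of_day):
    """Return the node count the rules ask for."""
    target = current_nodes

    # Reactive rule: respond to the measured load signal.
    if avg_cpu_percent > SCALE_OUT_CPU:
        target += 1
    elif avg_cpu_percent < SCALE_IN_CPU:
        target -= 1

    # Proactive rule: a scheduled floor for known busy hours.
    floor = BUSINESS_HOURS_MIN if 8 <= hour_of_day < 18 else MIN_NODES

    return max(floor, min(MAX_NODES, target))


night_busy = desired_node_count(2, 90, hour_of_day=3)    # scales out by one
night_idle = desired_node_count(2, 10, hour_of_day=3)    # held at the floor
morning = desired_node_count(2, 50, hour_of_day=9)       # raised by schedule
```

The ceiling covers the cost concern and the floor covers the scalability concern.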

Eventual Consistency
• Simultaneous requests for the same data may return different values.
• Leads to better performance and lower cost.
• Based on Brewer's CAP theorem (Consistency, Availability and Partition
  tolerance): three guarantees, of which an application can pick only two.
    • Consistency: everyone gets the same answer.
    • Availability: clients have ongoing access (even if there is a partial system failure).
    • Partition tolerance: correct operation even if some nodes are cut off from the
      network.
• DNS updates and many NoSQL data stores are examples of eventually consistent services.

MapReduce Pattern
• Data processing approach for processing large datasets in a highly parallel way.
• Requires a mapper function and a reducer function. The mapper accepts subsets
  of the data and produces output; the mapper output is aggregated and sent
  to the reducer.
• Used to process documents, server logs and social graphs.
• Hadoop implements MapReduce as a batch processing system, optimized for large
  amounts of data rather than response time.
• Created by Google Inc.
• Most effective when the compute function is brought to the data.
• Commonly associated with Big Data.
• Hadoop has abstractions on top, e.g. Mahout (machine learning),
  Hive (SQL-like queries), Pig (dataflow) and Sqoop (RDBMS connector).
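
The mapper and reducer roles can be illustrated with the classic word count, sketched here in a single process. This is not how Hadoop is programmed or deployed; the function names and the in-memory `shuffle` step are assumptions chosen to show the data flow only.

```python
from collections import defaultdict


def mapper(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group mapper output by key before it reaches the reducer."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Aggregate all values emitted for one key."""
    return key, sum(values)


def map_reduce(documents):
    pairs = (pair for doc in documents for pair in mapper(doc))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


counts = map_reduce(["the cloud scales", "the cloud fails"])
```

In a real cluster each mapper would run on the node that already holds its subset of the data, which is the point of bringing compute to the data.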

Database Sharding/Federation Pattern
• A database is divided into several shards, where each database row
  exists on only one shard.
• Shards do not reference other shards.
• Slave shard nodes are typically eventually consistent and read-only.
• The programming model is simplified by maintaining a single logical
  database with horizontal scaling.
• Fan-out queries are used to make updates to dependent federation
  members. Similar to Windows Azure SQL Data Sync and MapReduce.
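
A small sketch of the idea, assuming hash-based routing on a string shard key. The `ShardedStore` class and its methods are hypothetical, and plain dictionaries stand in for the shard databases; SQL federation products use their own routing mechanisms.

```python
import hashlib


class ShardedStore:
    """One logical store whose rows each live on exactly one shard."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the shard key picks the shard, so the same key
        # always routes to the same place.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key, row):
        self._shard_for(key)[key] = row

    def get(self, key):
        return self._shard_for(key).get(key)

    def fan_out(self, predicate):
        """Run the same query on every shard and merge the results."""
        return [row for shard in self.shards
                for row in shard.values() if predicate(row)]


store = ShardedStore(shard_count=4)
store.put("alice", {"name": "alice", "country": "ZW"})
store.put("bob", {"name": "bob", "country": "KE"})
store.put("carol", {"name": "carol", "country": "ZW"})

in_zw = sorted(row["name"] for row in store.fan_out(lambda r: r["country"] == "ZW"))
```

Callers see a single logical store, while lookups by key touch one shard and only fan-out operations touch them all.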

Multitenancy and Commodity Hardware
• Multitenancy: multiple companies use the same system, usually a software
  system, each with the illusion that it is the only tenant.
• Multitenant services are standard in the cloud: DNS services, hardware for
  VMs, load balancers and identity management, among others.
• Commonly used in SaaS environments where each tenant runs in a
  secure sandbox (Hyper-V, RDBMS).
• Performance is managed by using quotas and by running resource-hungry
  services alongside less intensive ones.
• Commodity hardware fails occasionally. Plan on it happening on your
  compute nodes and plan on handling it.

Busy Signal Pattern
• Applies to services or resources accessed over a network that may
  respond with a busy signal.
• These may include management services, data services and more;
  periodic transient failures should be expected. Compare the busy signal on
  telephones.
• A good application should be able to handle retries and properly
  handle failures.
• On HTTP, the busy signal is response 503 Service Unavailable.
• Clearly distinguish busy signals from errors, and retry on a busy state after an
  interval. Log them for further analysis of patterns.
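
A minimal retry sketch for the points above. The slide only says to retry after an interval; the doubling delay used here is one common choice and is an assumption, as are the names `call_with_retry` and `BusyError`. The `operation` argument is any callable returning a `(status, body)` pair, so no real HTTP client is involved.

```python
import logging
import time

log = logging.getLogger("busy_signal")


class BusyError(Exception):
    """Raised when the service is still busy after all retries."""


def call_with_retry(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry only on a busy signal (503); treat other failures as real errors."""
    for attempt in range(1, max_attempts + 1):
        status, body = operation()
        if status == 503:
            log.warning("busy signal on attempt %d", attempt)
            if attempt == max_attempts:
                raise BusyError("service still busy, giving up")
            sleep(base_delay * 2 ** (attempt - 1))  # wait longer each time
            continue
        if status >= 400:
            # A genuine error: retrying would not help.
            raise RuntimeError(f"request failed with status {status}")
        return body


# Simulated service: busy twice, then fine.
responses = iter([(503, None), (503, None), (200, "ok")])
delays = []  # record the delays instead of actually sleeping
result = call_with_retry(lambda: next(responses), sleep=delays.append)
```

Passing `sleep` as a parameter keeps the example fast to run; in real use the default `time.sleep` applies.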

Node Failure Pattern
• Concerns availability and graceful handling of unexpected
  application/hardware failures, reboots or node shutdowns.
• Application state should be in reliable storage, not on a local disk or an
  individual node.
• Avoid a single point of failure by using the N+1 rule.
• AWS and Azure signal when a node is shutting down, and traffic
  is routed to other nodes.
• One approach is to have the UI code retry on failures,
  throttling some of the features while the recovery is taking place.
• Azure spreads nodes across two fault domains.

Network Latency Problem
• Network latency is a function of distance and bandwidth.
• Consider data compression, background processing and predictive fetching.
• Move applications closer to users.
• Move application data closer to users.
• Ensure nodes within your application are close together (colocation).
    • Windows Azure (WA) uses Affinity Groups.
• Consider the Valet Key pattern for public or temporary access (e.g. Blob storage),
  protected through hashing.
• Consider a Content Delivery Network (CDN): a globally distributed cache,
  effective for frequently accessed content. Can be inconsistent.
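
The Valet Key idea, temporary access protected through hashing, can be sketched with a keyed hash (HMAC) over the resource path and an expiry time. This is an illustration only and not the actual Windows Azure shared access signature format; `SECRET`, the URL layout and both function names are assumptions.

```python
import hashlib
import hmac

SECRET = b"storage-account-key"  # hypothetical key known only to the server


def _sign(resource_path, expires_at, secret):
    message = f"{resource_path}\n{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def issue_valet_key(resource_path, expires_at, secret=SECRET):
    """Build a URL granting access to one resource until expires_at."""
    signature = _sign(resource_path, expires_at, secret)
    return f"{resource_path}?expires={expires_at}&sig={signature}"


def is_valid(resource_path, expires_at, signature, now, secret=SECRET):
    """Check the key on the storage side without consulting the application."""
    if now > expires_at:
        return False
    expected = _sign(resource_path, expires_at, secret)
    return hmac.compare_digest(expected, signature)


url = issue_valet_key("/photos/cat.jpg", expires_at=1000)
sig = url.split("sig=")[1]

ok = is_valid("/photos/cat.jpg", 1000, sig, now=900)
expired = is_valid("/photos/cat.jpg", 1000, sig, now=1001)
tampered = is_valid("/photos/dog.jpg", 1000, sig, now=900)
```

The client then talks to storage directly, so the download does not pass through the application's compute nodes.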

Feedback, materials and contacts
@malisancube


Cloud Computing - Geektalk
