This document provides an overview of observability in distributed computing systems using Kubernetes and Istio. It begins with definitions of key concepts like monitoring, metrics, and observability. It then discusses the differences between traditional "black box" monitoring approaches and modern "white box" approaches enabled by service meshes. The document demonstrates installing Istio on a Kubernetes cluster and deploying the Bookinfo sample application. It concludes with references for further reading.
2. WHO AM I?
Possibly the only Turkish vegan
living in London, ex-.NET
developer, working in DevOps
at Contino, proud mum of a 6-yo,
and proud wife
Loves exploring, learning,
sharing, and inevitably communities!
Ex-co-organiser of the London
PowerShell User Group
@ebrucucen #pwshsummit19 02/05/2019
4. MONITORING
Collecting, processing, aggregating, and
displaying real-time quantitative data about a
system, such as query counts and types, error
counts and types, processing times, and
server lifetimes.[1]
[1] Monitoring Distributed Systems, by Betsy Beyer, Rob Ewaschuk
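The quantities named in this definition (query counts and types, error counts and types, processing times) can be illustrated with a small in-memory collector. This is only a sketch for the slide: the `Monitor` class and its method names are hypothetical, and a real system would ship these numbers to a monitoring backend rather than hold them in memory.

```python
from collections import Counter
from statistics import mean


class Monitor:
    """Toy collector for the quantities in the monitoring definition (hypothetical)."""

    def __init__(self):
        self.query_counts = Counter()   # query type -> number of queries
        self.error_counts = Counter()   # error type -> number of errors
        self.processing_times = []      # seconds spent on each query

    def record_query(self, query_type, seconds, error_type=None):
        """Record one handled query, and its error type if it failed."""
        self.query_counts[query_type] += 1
        self.processing_times.append(seconds)
        if error_type is not None:
            self.error_counts[error_type] += 1

    def summary(self):
        """Aggregate the raw data into the numbers a dashboard would display."""
        total = sum(self.query_counts.values())
        errors = sum(self.error_counts.values())
        return {
            "queries": total,
            "errors": errors,
            "error_rate": errors / total if total else 0.0,
            "mean_seconds": mean(self.processing_times) if self.processing_times else 0.0,
        }
```

Calling `record_query("GET", 0.1)` and then `record_query("GET", 0.3, error_type="timeout")` would give a summary with two queries, one error, and an error rate of 0.5.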
6. TEXT
Audit Logs
Azure Tenant
Metrics
Application
Azure Subscription
Service Health
Activity Logs
Azure Resources
Diagnostic Logs
Monitoring Solutions
Guest OS
Application Insights
Dependency Agent
Log Analytics Agent
Diagnostics Extension
Azure
Custom API
Data Collector API
Non-Azure
Service Configuration
NOT ENOUGH!
8. TRADITIONAL SYSTEMS
Monitoring (Black box)
Structured Log
Well-defined Metrics
Tracing
Not Scalable
Up != Working
May not be
complete
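The "Up != Working" point can be sketched as a black-box probe that reports two separate answers. Everything here is an assumption for illustration: `fetch` is a hypothetical callable that returns a `(status_code, body)` pair or raises `ConnectionError`, and `is_correct` is a hypothetical predicate that decides whether the response body is actually useful.

```python
def check(fetch, is_correct):
    """Probe a service from the outside and report 'up' and 'working' separately.

    fetch:      hypothetical callable returning (status_code, body),
                raising ConnectionError when nothing answers.
    is_correct: hypothetical predicate over the response body.
    """
    try:
        status, body = fetch()
    except ConnectionError:
        # Nothing answered at all: neither up nor working.
        return {"up": False, "working": False}
    # Something answered, so the service is up; whether it is working
    # depends on what it actually returned.
    return {"up": True, "working": status == 200 and is_correct(body)}
```

A service that answers every request with an empty body would come back as up but not working, which is the gap a simple liveness probe cannot see.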
15. OBSERVABILITY
In control theory, observability is a
measure of how well internal states of a
system can be inferred from knowledge
of its external outputs[2]
[2] Wikipedia, 1960, Rudolf Kalman
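For a linear time-invariant system, the control-theory definition has a concrete test. The formulation below is the standard textbook rank condition, not something taken from the slides; the symbols A, B, C, D, x, u, y, and n are the usual state-space names and are introduced here only for illustration.

```latex
\begin{aligned}
\dot{x}(t) &= A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t) + D\,u(t), \qquad x(t) \in \mathbb{R}^{n} \\
\mathcal{O} &= \begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1} \end{bmatrix},
\qquad \text{the system is observable} \iff \operatorname{rank}(\mathcal{O}) = n
\end{aligned}
```

In words: the internal state x can be reconstructed from the outputs y exactly when the observability matrix has full rank. The software analogy is looser, but the question is the same: do the outputs a system emits carry enough information to infer what is happening inside it?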
16. The goal of an observability team is not to
collect logs, metrics or traces. It is to build
a culture of engineering based on facts
and feedback, and then spread that
culture within the broader organization.
Brian Knox (DigitalOcean)
OBSERVABILITY
17. 8 FALLACIES OF DISTRIBUTED COMPUTING[3]
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
[3] L. Peter Deutsch, 1994; James Gosling, 1997
18. 8 FALLACIES OF DISTRIBUTED COMPUTING[3]
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
Manual restart
Dropped packet
Bottlenecks
SSL/TLS?
Cattle
Conflicting rules
I/O CPU
Not anymore
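One common response to the first fallacy (dropped packets on a network that is not reliable) is to retry with exponential backoff. The sketch below is an assumption for illustration, not something the slides prescribe: `call_with_retries`, its parameters, and the choice to retry only on `ConnectionError` are all hypothetical.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with exponential backoff.

    The delay doubles after each failed attempt: base_delay, 2 * base_delay, ...
    The sleep function is injectable so the behaviour can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError as exc:
            last_error = exc
            # Only wait if another attempt is still to come.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

Writing this logic into every service is exactly the kind of duplicated effort that motivates moving it out of application code.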
22. NETFLIX OSS - JAVA MICROSERVICES BEFORE SERVICE MESH
CONTAINER
RUNTIME C
SERVICE C
Load-balancer
Discovery
Resiliency
Metrics
Tracing
CONTAINER
RUNTIME A
SERVICE A
Load-balancer
Discovery
Resiliency
Metrics
Tracing
CONTAINER
RUNTIME B
SERVICE B
Load-balancer
Discovery
Resiliency
Metrics
Tracing
23. FULLY CONNECTED NETWORK
SERVICE C
SERVICE B
SERVICE A
SERVICE D
SERVICE E
SERVICE F
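To get a feel for how quickly a fully connected network grows, the number of service-to-service links can be counted directly: n services need n(n-1)/2 links. The helper below is a hypothetical illustration using the six services named on this slide.

```python
from itertools import combinations


def full_mesh_links(services):
    """Return every unordered pair of services, one pair per link in a full mesh."""
    return list(combinations(services, 2))


def full_mesh_link_count(n):
    """Number of links among n fully connected services: n * (n - 1) / 2."""
    return n * (n - 1) // 2


services = ["A", "B", "C", "D", "E", "F"]
links = full_mesh_links(services)
```

For the six services here, `len(links)` is 15, and each added service has to be connected to every existing one.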
24. SERVICE MESH
Next logical step after a container orchestration deployment
- insight (observability), uniformly and ubiquitously
- connection
- control
- observability
- security
25. The Enterprise Path to Service Mesh Architectures, Lee Calcote
NETWORK PLANES
26. SIDECAR PATTERN
POD A POD B
INGRESS EGRESS
SERVICE A SERVICE B
PROXY
SIDECAR
PROXY
SIDECAR
CONTROL PLANE
CONTAINER ORCHESTRATION
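The idea behind the sidecar pattern can be sketched in a few lines: a proxy sits in front of the service, handles every request, and records telemetry, while the service code itself stays unchanged. This is an in-process analogy only. A real sidecar is a separate proxy container in the same pod, and the `SidecarProxy` class and its attribute names are hypothetical.

```python
import time


class SidecarProxy:
    """In-process stand-in for a sidecar: wraps a service and records telemetry."""

    def __init__(self, service, clock=time.perf_counter):
        self._service = service   # the unmodified service, as a callable
        self._clock = clock       # injectable clock, useful for tests
        self.requests = 0         # total requests seen by the proxy
        self.errors = 0           # requests on which the service raised
        self.latencies = []       # seconds spent on each request

    def handle(self, request):
        """Forward the request to the service, measuring it on the way through."""
        self.requests += 1
        start = self._clock()
        try:
            return self._service(request)
        except Exception:
            self.errors += 1
            raise
        finally:
            # Runs on success and on failure, so every request is timed.
            self.latencies.append(self._clock() - start)
```

Wrapping a service as `SidecarProxy(service)` leaves the service untouched, which is the point: metrics come from the proxy layer rather than from libraries compiled into each service.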
27. ISTIO
ISTIO NAMESPACE
CITADEL PILOT MIXER
SERVICE
FOO
BAR POD
SIDECAR PROXY
FOO CONTAINER
SERVICE
FOO
FOO POD
SIDECAR PROXY
FOO CONTAINER
Discovery & config
TLS certs Telemetry Reports Policy Checks