ShipItCon 2022
#sic2022
Your application
has died
and that's OK!
Hello!
I am Anton Whalley
Partner Technical Specialist
Node.js Diagnostics Member, llnode maintainer
IBM Strategic Open Source Committer
Rust Dublin Co-Organiser
Certified Kubernetes Administrator
3
A (very) short
history of crash
analysis
Magnetic Cores created Mid 1950s
First Deployed by MIT/US Navy
Commercialised IBM 770
4
Modern Crash Analysis
5
Executable
Running Process
Terminated Process
Load
Abort
Snapshot
e.g. gcore
Core Dump
*nix
Mini Dump
Windows
Runtime Specific
Reporting
WinDbg
gdb, lldb
Custom Tool
Files
Joyee Cheung - Bring Javascript Back To Life https://www.youtube.com/watch?v=XQIo9knnb2s
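As a rough illustration of the "Snapshot" and "Core Dump" steps on *nix, the following Python sketch (not from the talk) reports whether the current process would be allowed to write a core dump. It assumes a Linux-style `/proc/sys/kernel/core_pattern` file; the function name `core_dump_settings` is hypothetical.

```python
import resource
from pathlib import Path


def core_dump_settings() -> dict:
    """Report the core dump limits that apply to this process."""
    # RLIMIT_CORE caps the size of a core file; a soft limit of 0 disables dumps.
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

    # On Linux this file tells the kernel where to write (or pipe) the dump.
    # Assumption: other systems may not expose it, so treat it as optional.
    pattern = None
    pattern_file = Path("/proc/sys/kernel/core_pattern")
    try:
        if pattern_file.exists():
            pattern = pattern_file.read_text().strip()
    except OSError:
        pattern = None

    return {
        "soft_limit": soft,
        "hard_limit": hard,
        "enabled": soft != 0,
        "core_pattern": pattern,
    }


if __name__ == "__main__":
    print(core_dump_settings())
```

If `enabled` is false, an aborted process leaves no core file behind, so none of the tools on this slide would have anything to load.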
Why Crash Analysis
- Locked Down Production Environment
- (Should be) Restricted Access
- Debug tools not available
- Heisenbugs
- Unoptimized Builds
- Issues with logs
- Only captures known knowns
- Requires ad hoc updates
- Stacktraces don't capture the full state
6
David Pacheco - https://dl.acm.org/doi/10.1145/2039359.2039361
Failure Types
7
Implicit
Explicit
Type Error
Uncaught Exception
Segmentation Fault
Panic
Hardware Error
Assertion Failure
Process Abort
Error Code Exit
Incorrect Result
Leaks Resources
Stops Doing Work
Pathological
Fatal
Non-fatal
Error Message
Returns Error Code
Bryan Cantrill - Docker in Production https://www.youtube.com/watch?v=AdMqCUhvRz8
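Some of the explicit, fatal failures listed above (segmentation fault, process abort, error code exit) can be told apart from the way a process ends. The Python sketch below is an illustration rather than anything shown in the talk: it assumes a POSIX system, where `subprocess` reports death by signal as a negative return code, and the helper names `describe_exit` and `run_child` are hypothetical.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Map a subprocess return code to a coarse failure description."""
    if returncode < 0:
        # POSIX convention in subprocess: -N means "terminated by signal N".
        return f"killed by {signal.Signals(-returncode).name}"
    if returncode > 0:
        return f"error code exit ({returncode})"
    return "clean exit"


def run_child(code: str) -> str:
    """Run a snippet in a child Python interpreter and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stderr=subprocess.DEVNULL,  # keep the child's traceback out of our output
    )
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    print(run_child("import sys; sys.exit(2)"))
    print(run_child("raise RuntimeError('boom')"))
```

The implicit failures on the slide (incorrect result, leaked resources, stopped work) produce no such signal, which is part of why they are harder to catch.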
CNCF Project Coverage
8
Implicit
Explicit
Fatal
Non-fatal
The birthplace
of site reliability
engineering?
HMS Salisbury 1747
First Controlled Experiment
Identified a cure for scurvy
40 years to be adopted
In 1780 there were 1,457 admissions;
in 1806 there were 2.
9
The return of
scurvy!
HMS Alert 1875
85% of crew succumbed
Lemons changed for limes - 1860s
Misunderstood causes
The rise of steam engines meant
shorter trips
10
How We Forget
- Improvements in Adjacent Technology
- Questionable Refinements in Approach
- Solutions are not Accessible
- Concepts with Unbalanced Weighting
- The Generation Gap
11
Crash Analysis in K8s
- Core Dump Handler
- Open Source
- Cloud Agnostic - xKS
- Contributions from 12 separate organisations
12
https://github.com/IBM/core-dump-handler/
13
On Demand Crash Analysis
Augmented Pods with IDE
Integrated Into Git workflow
DX Infrastructure - gitpod.io
https://venshare.com/blog/gitops-coredump/
Future Work
14
- Automated Analysis
- Remove Sensitive Data
- Cloud Events
15
Thanks!
For Listening
You can find me at @dhight9 or @No9 on GitHub
https://openjsf.org/blog/



Editor's Notes

  • #5: Crash analysis is how we used to debug: extracting information from a malfunctioning application.
  • #12: Improved ship technology == improved development practices and code debuggers. Focus on backtraces == lemons swapped for limes. Lemons no longer accessible == make tooling easy to find and use. Biological causes == the concept of uptime led to starved connections/DNS failures. The generational gap == the muscle memory of the organisation fails.
  • #13: https://github.com/IBM/core-dump-handler/