This document discusses several CPU mechanisms that high-efficiency programs must take into account, including pipelining, superscalar execution, out-of-order execution, branch prediction, caching, and multi-core processing. It then gives examples of optimizing code for Intel processors: reducing branch mispredictions, exploiting out-of-order execution by avoiding long dependency chains, using bit operations instead of comparisons, and running work in parallel across processor cores.
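As a concrete illustration of the bit-operation technique (a sketch of my own, not taken from the document; the function names and data are illustrative), here is how a data-dependent branch can be replaced with a mask in C, assuming the usual arithmetic behavior of signed right shift:

    #include <stdint.h>
    #include <stdio.h>

    /* Branchy version: with random data the condition is unpredictable,
     * so the branch predictor mispredicts often and the pipeline stalls. */
    static int64_t sum_negatives_branchy(const int32_t *a, int n) {
        int64_t sum = 0;
        for (int i = 0; i < n; i++)
            if (a[i] < 0)
                sum += a[i];
        return sum;
    }

    /* Branchless version: an arithmetic right shift spreads the sign bit
     * into a mask of all ones (negative) or all zeros (non-negative),
     * so the loop body contains no conditional branch at all. */
    static int64_t sum_negatives_branchless(const int32_t *a, int n) {
        int64_t sum = 0;
        for (int i = 0; i < n; i++) {
            int32_t mask = a[i] >> 31;   /* -1 if a[i] < 0, else 0 */
            sum += a[i] & mask;          /* adds a[i] only when negative */
        }
        return sum;
    }

    int main(void) {
        int32_t data[] = { 3, -7, 12, -1, 0, -4 };
        int n = sizeof(data) / sizeof(data[0]);
        printf("branchy:    %lld\n", (long long)sum_negatives_branchy(data, n));
        printf("branchless: %lld\n", (long long)sum_negatives_branchless(data, n));
        return 0;
    }

On unpredictable input the branchy loop pays a misprediction penalty on many iterations; the branchless loop trades that for two cheap ALU operations per element.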
This document discusses the IO subsystem architecture in Linux, which consists of three layers: the block layer, the device-mapper (DM) layer, and the request queue/elevator. The block layer handles generic block IO requests and completion events. The DM layer comprises components such as LVM2 and EVMS. The request queue schedules requests using elevator algorithms such as deadline and anticipatory. The stack also exposes probes and tracepoints for monitoring IO events.
SystemTap is a dynamic tracing tool for Linux systems. It allows users to easily gather information about a running Linux system by defining probe points in a script. The script is compiled into a kernel module, which is then loaded to monitor the specified probe points. Examples of useful probe points include functions, system calls, and kernel statements. SystemTap scripts can be used to trace execution, profile performance, monitor kernel functions, and debug problems by printing at probe points. It provides a safe way to observe a live system without recompiling the kernel.
This document discusses various IO devices and measurement tools. It describes the chipset, SATA/SAS disks, SSDs, PCIe flash cards, RAID cards, and NVRAM cards, with example specifications and performance numbers for each. Tools discussed for measuring IO include fio, iostat, iotop, pidstat, hwconfig, and lsblk. The document also considers how IO depth affects device performance, what purpose RAID cards serve, and how to ensure data integrity with PCIe flash cards.
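To make the IO-depth point concrete: a flash device only approaches its rated IOPS when many requests are in flight at once, which is what fio's iodepth parameter controls. Below is a minimal C sketch using Linux's libaio (my own illustration; the depth, block size, and error handling are simplified) that submits 16 reads before waiting for any of them:

    /* Compile: gcc -O2 qd16.c -laio   (the file name is illustrative) */
    #define _GNU_SOURCE
    #include <libaio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define DEPTH      16
    #define BLOCK_SIZE 4096

    int main(int argc, char **argv) {
        if (argc < 2) { fprintf(stderr, "usage: %s <file-or-device>\n", argv[0]); return 1; }
        int fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        io_context_t ctx = 0;
        if (io_setup(DEPTH, &ctx) != 0) { fprintf(stderr, "io_setup failed\n"); return 1; }

        struct iocb cbs[DEPTH], *cbp[DEPTH];
        for (int i = 0; i < DEPTH; i++) {
            void *buf;
            if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE)) return 1;  /* O_DIRECT needs aligned buffers */
            io_prep_pread(&cbs[i], fd, buf, BLOCK_SIZE, (long long)i * BLOCK_SIZE);
            cbp[i] = &cbs[i];
        }

        /* All 16 reads are outstanding before any completes: the device
         * sees queue depth 16 rather than one request at a time. */
        if (io_submit(ctx, DEPTH, cbp) != DEPTH) { fprintf(stderr, "io_submit failed\n"); return 1; }

        struct io_event events[DEPTH];
        int done = io_getevents(ctx, DEPTH, DEPTH, events, NULL);
        printf("%d of %d reads completed\n", done, DEPTH);

        io_destroy(ctx);
        close(fd);
        return 0;
    }

A spinning disk gains only modestly from deeper queues, while SSDs and PCIe flash cards, with their many internal channels, can gain dramatically; that asymmetry is why IO depth is worth treating as a first-class variable.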
This document provides an overview of computer networks. It covers network card models and bandwidth, the latency of gigabit network cards, new trends in network cards, important performance metrics, the Linux network protocol stack and how to tune it, interrupt balancing, network bonding, observing network behavior with tools such as ksysguard, wireshark, iptraf, and socktop, the cost of network system calls, problems caused by insufficient memory in the protocol stack, observing and analyzing packet drops, and the ethtool utility.
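Among those topics, the cost of network system calls has a standard mitigation: batch many packets per user/kernel crossing. The sketch below is my own illustration (the port number and batch size are arbitrary) of Linux's recvmmsg(2), which can return up to 32 UDP datagrams from a single system call:

    #define _GNU_SOURCE
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>

    #define BATCH    32
    #define BUF_SIZE 1500

    int main(void) {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port   = htons(9000),            /* arbitrary port */
                                    .sin_addr.s_addr = htonl(INADDR_ANY) };
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); return 1; }

        struct mmsghdr msgs[BATCH];
        struct iovec   iovecs[BATCH];
        static char    bufs[BATCH][BUF_SIZE];
        memset(msgs, 0, sizeof(msgs));
        for (int i = 0; i < BATCH; i++) {
            iovecs[i].iov_base = bufs[i];
            iovecs[i].iov_len  = BUF_SIZE;
            msgs[i].msg_hdr.msg_iov    = &iovecs[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }

        /* One kernel crossing can service up to BATCH datagrams, amortizing
         * the per-syscall overhead that recv(2) pays for every packet. */
        int n = recvmmsg(fd, msgs, BATCH, 0, NULL);
        if (n < 0) { perror("recvmmsg"); return 1; }
        for (int i = 0; i < n; i++)
            printf("datagram %d: %u bytes\n", i, msgs[i].msg_len);
        return 0;
    }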
This document discusses CPUs and provides information about their architecture and performance. It begins with an overview and outlines topics like measurement, utilization, chipset architecture, cache hierarchy, and components inside CPUs. Examples are given of Intel Xeon and Sandy Bridge CPUs. Performance numbers are listed for operations like L1/L2 cache references and network/disk data transfers. Tools for investigating hardware topology and benchmarking micro-level performance are also introduced.
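Latency numbers of that kind can be approximated with a pointer-chasing microbenchmark: each load depends on the previous one, so out-of-order execution cannot overlap them, and the time per step tracks whichever level of the cache hierarchy holds the working set. A hedged C sketch (the sizes and step count are illustrative, and allocation error checks are omitted):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Build one random cycle through the array so the prefetcher cannot
     * guess the next address and every load depends on the previous one. */
    static size_t *make_chain(size_t n) {
        size_t *next = malloc(n * sizeof(*next));
        size_t *perm = malloc(n * sizeof(*perm));
        for (size_t i = 0; i < n; i++) perm[i] = i;
        for (size_t i = n - 1; i > 0; i--) {            /* Fisher-Yates shuffle */
            size_t j = (size_t)rand() % (i + 1);
            size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        for (size_t i = 0; i < n; i++)
            next[perm[i]] = perm[(i + 1) % n];          /* a single big cycle */
        free(perm);
        return next;
    }

    int main(void) {
        /* Working sets sized to land roughly in L1, L2, and main memory
         * on a typical part; adjust for the machine under test. */
        size_t sizes[] = { 4096, 32768, 8388608 };      /* 8-byte elements */
        for (int s = 0; s < 3; s++) {
            size_t n = sizes[s], idx = 0;
            size_t *next = make_chain(n);
            long steps = 20000000;
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (long i = 0; i < steps; i++)
                idx = next[idx];                        /* serialized dependent loads */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
            printf("%8zu KB working set: %.1f ns/load (idx=%zu)\n",
                   n * sizeof(size_t) / 1024, ns / steps, idx);
            free(next);
        }
        return 0;
    }

Printing idx at the end keeps the compiler from optimizing the chase away.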
Delivered at the FISL13 conference in Brazil: http://www.youtube.com/watch?v=K9w2cipqfvc
This talk introduces the USE Method: a simple strategy for performing a complete check of system performance health, identifying common bottlenecks and errors. This methodology can be used early in a performance investigation to quickly identify the most severe system performance issues, and is a methodology the speaker has used successfully for years in both enterprise and cloud computing environments. Checklists have been developed to show how the USE Method can be applied to Solaris/illumos-based and Linux-based systems.
Many hardware and software resource types are commonly overlooked, including memory and I/O busses, CPU interconnects, and kernel locks. Any of these can become a system bottleneck. The USE Method provides a way to find and identify them.
This approach focuses on the questions to ask of the system, before reaching for the tools. Tools that are ultimately used include all the standard performance tools (vmstat, iostat, top), and more advanced tools, including dynamic tracing (DTrace), and hardware performance counters.
Other performance methodologies are included for comparison: the Problem Statement Method, Workload Characterization Method, and Drill-Down Analysis Method.
Surge 2014: From Clouds to Roots: root cause performance analysis at Netflix. Brendan Gregg.
At Netflix, high scale and fast deployment rule. The possibilities for failure are endless, and the environment excels at handling this, regularly tested and exercised by the Simian Army. But when this environment automatically works around systemic issues whose root causes are never identified, those issues can grow over time. This talk describes not just the challenge of handling failures at scale on the Netflix cloud, but also new approaches and tools for quickly diagnosing their root cause in an ever-changing environment.
Talk for USENIX LISA 2016 by Brendan Gregg.
"Linux 4.x Tracing Tools: Using BPF Superpowers
The Linux 4.x series heralds a new era of Linux performance analysis, with the long-awaited integration of a programmable tracer: Enhanced BPF (eBPF). Formally the Berkeley Packet Filter, BPF has been enhanced in Linux to provide system tracing capabilities, and integrates with dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). This has allowed dozens of new observability tools to be developed so far: for example, measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived.
In this talk I'll show you how to use BPF in the Linux 4.x series, and I'll summarize the different tools and front ends available, with a focus on iovisor bcc. bcc is an open source project to provide a Python front end for BPF, and comes with dozens of new observability tools (many of which I developed). These tools include new BPF versions of old classics, and many new tools, including: execsnoop, opensnoop, funccount, trace, biosnoop, bitesize, ext4slower, ext4dist, tcpconnect, tcpretrans, runqlat, offcputime, offwaketime, and many more. I'll also summarize use cases and some long-standing issues that can now be solved, and how we are using these capabilities at Netflix."