Roman Vynar, Tim Vaillancourt
Open Source Monitoring for MySQL and MongoDB with
Grafana and Prometheus
This is a hands-on tutorial on setting up monitoring and graphing for MySQL and MongoDB
servers using Prometheus, a monitoring system and time-series database, together with Grafana,
a feature-rich metrics dashboard.
• Prometheus overview
• Prometheus metric exporters
• Queries and expressions on Prometheus DB
• Grafana overview
• Creating graphs and dashboards in Grafana
• MySQL graphing capabilities
• MongoDB graphing capabilities
• Creating alerts in Prometheus
• Using Alertmanager for getting notifications
• Working with Prometheus HTTP API
• Using InfluxDB with Prometheus as a long-term storage option
VirtualBox preparation
There is an appliance containing two pre-installed virtual machines:
• db1.vm - monitor and master db server
• db2.vm - slave db server
Copy the files from the provided USB stick to your laptop.
Double-click the .OVA file to import the appliance into VirtualBox.
VirtualBox network
Each instance is configured with 2 network adapters:
• Host-only adapter
Configure the host-only network from the main menu:
VirtualBox > Preferences > Network > Host-only Networks > "vboxnet0" or "VirtualBox Host-Only Ethernet Adapter" > edit and set: /
Windows users only: open Settings > Network and click OK to re-save the host-only network.
Starting VMs
Internal static IP addresses assigned:
• db1.vm -
• db2.vm -
Both instances are running CentOS 7 and have all the necessary packages pre-installed.
Unix and MySQL root password: PerconaLive_123
Start both machines
Verify network connectivity
IMPORTANT! The system time should be in sync:
systemctl restart ntpd.service
Pre-installed packages
Percona YUM repo and database packages:
rpm -Uvh http://www.percona.com/downloads/percona-release/redhat/0.1-3/percona-
yum install Percona-Server-server-57 Percona-Server-client-57 Percona-Server-shared-57
yum install Percona-Server-MongoDB
yum install sysbench
yum install initscripts fontconfig
yum install https://grafanarel.s3.amazonaws.com/builds/grafana-2.6.0-1.x86_64.rpm
yum install https://s3.amazonaws.com/influxdb/influxdb-0.10.0-1.x86_64.rpm
pip install influxdb pyyaml
Prometheus software
Prometheus and Alertmanager tarballs:
• https://github.com/prometheus/prometheus/releases/download/0.17.0/prometheus-
• https://github.com/prometheus/alertmanager/releases/download/0.1.1/alertmanager-
Pre-compiled exporters from the sources:
• https://github.com/prometheus/node_exporter
• https://github.com/prometheus/mysqld_exporter
• https://github.com/Percona-Lab/prometheus_mongodb_exporter
Prometheus overview
Prometheus is an open-source monitoring system and time series database.
Main features:
• a multi-dimensional data model (time series identified by metric name and key/value pairs)
• a flexible query language to leverage this dimensionality
• no reliance on distributed storage; single server nodes are autonomous
• time series collection happens via a pull model over HTTP
• pushing time series is supported via an intermediary gateway
• targets are discovered via service discovery or static configuration
• multiple modes of graphing and dashboarding support
Prometheus architecture
Prometheus metric exporters
• Node/system metrics exporter
• AWS CloudWatch exporter
• Blackbox exporter
• Collectd exporter
• Consul exporter
• Graphite exporter
• HAProxy exporter
• InfluxDB exporter
• JMX exporter
• Mesos task exporter
• MySQL server exporter
• SNMP exporter
• StatsD exporter
• Apache exporter
• BIND exporter
• Django exporter
• Jenkins exporter
• Memcached exporter
• Minecraft exporter module
• MongoDB exporter
• New Relic exporter
• Nginx metric library
• PostgreSQL exporter
• RabbitMQ exporter
• Redis exporter
• ... many more ...
Start Prometheus
Most of the actions will be run on db1, which is the monitor server.
Let's review the Prometheus config prepared for this tutorial:
cat prometheus.yml
Extract binaries:
tar zxf prometheus-0.17.0.linux-amd64.tar.gz
Check out the startup script:
cat start.sh
Start Prometheus:
./start.sh prometheus
tail -f /var/log/prometheus.log
Access web interface
Go to
Querying Prometheus DB
Prometheus provides a functional expression language that lets the user select and aggregate
time series data in real time.
The result of an expression can either be shown as a graph, viewed as tabular data in
Prometheus's expression browser, or consumed by external systems via the HTTP API.
• http_requests_total
• http_requests_total{job="prometheus", handler="static"}
• {__name__=~"process_.+"}
• scrape_duration_seconds
• scrape_duration_seconds + 2
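The selectors above pick time series by metric name and label matchers. As a rough illustration only (not how Prometheus is implemented), the following Python sketch models that selection over a small in-memory list. The series in SERIES and the helper name select are made up for this example. The regex matcher is fully anchored, which is how `=~` behaves in Prometheus.

```python
import re

# Hypothetical toy "database": each series is a dict of labels, with the
# metric name stored under the reserved label __name__.
SERIES = [
    {"__name__": "http_requests_total", "job": "prometheus", "handler": "static"},
    {"__name__": "http_requests_total", "job": "prometheus", "handler": "query"},
    {"__name__": "process_cpu_seconds_total", "job": "prometheus"},
    {"__name__": "scrape_duration_seconds", "job": "mysql"},
]


def select(series, equal=None, regex=None):
    """Return the series whose labels satisfy every matcher.

    equal: {label: value} pairs, like label="value"
    regex: {label: pattern} pairs, like label=~"pattern" (fully anchored)
    """
    equal = equal or {}
    regex = regex or {}
    selected = []
    for labels in series:
        # A missing label is treated as an empty string.
        if any(labels.get(k, "") != v for k, v in equal.items()):
            continue
        if any(re.fullmatch(p, labels.get(k, "")) is None for k, p in regex.items()):
            continue
        selected.append(labels)
    return selected


# http_requests_total
by_name = select(SERIES, equal={"__name__": "http_requests_total"})
# http_requests_total{job="prometheus", handler="static"}
by_labels = select(SERIES, equal={"__name__": "http_requests_total",
                                  "job": "prometheus", "handler": "static"})
# {__name__=~"process_.+"}
by_regex = select(SERIES, regex={"__name__": "process_.+"})
```

A bare metric name is shorthand for an equality matcher on __name__, which is why the first and third queries can be expressed with the same function.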
PromQL functions
Grafana overview
Grafana is an open source, feature rich metrics dashboard and graph editor for Graphite, Elasticsearch,
OpenTSDB, Prometheus and InfluxDB.
Main features:
• User-friendly interface
• Rich graphing, flexible scaling
• Mixed styling
• Themes
• Template variables
• Scripted dashboards
• Repeating graphs and panels
• Authentication, LDAP support
• Annotations
• Snapshot sharing
Start using Grafana
Log in to Grafana using admin/admin credentials.
Add datasource
Patch Grafana 2.6.0
To get good zoomable graphs, apply the following patch to Grafana so that the interval
template variable can be used. The fix simply allows a variable in the Step field on the
Grafana graph editor page. For more information, see Grafana's GitHub PR#3757 and PR#4257.
We hope the fix will be released in the next Grafana version.
sed -i 's/step_input:""/step_input:c.target.step/; s/ HH:MM/ HH:mm/;
sed -i 's/h=a.interval/h=g.replace(a.interval, c.scopedVars)/'
Percona Grafana dashboards
Open-source and available @ https://github.com/percona/grafana-dashboards
This is a set of Grafana dashboards to be used with Prometheus and InfluxDB datasources for
MySQL and system monitoring. The MongoDB dashboards are shared separately.
• MySQL InnoDB Metrics
• MySQL MyISAM Metrics
• MySQL Overview
• MySQL Performance Schema
• MySQL Query Response Time
• MySQL Replication
• MySQL Table Statistics
• MySQL User Statistics
• Galera Graphs
• TokuDB Graphs
• System Overview
• Disk Space
• Disk Performance
• Cross Server Graphs
• Summary Dashboard
• Trends Dashboard
• Prometheus
• [InfluxDB] 5m downsample
• [InfluxDB] 1h downsample
Install dashboards
Copy dashboard files:
cp -r grafana-dashboards/dashboards/ /var/lib/grafana/
Enable JSON dashboards by adding these lines to /etc/grafana/grafana.ini:
[dashboards.json]
enabled = true
path = /var/lib/grafana/dashboards
Restart Grafana:
systemctl restart grafana-server.service
Creating and using dashboards
node_exporter collectors
Enabled in this tutorial:
• diskstats
• filesystem
• loadavg
• meminfo
• netdev
• stat
• time
• uname
• vmstat
Other available collectors:
• conntrack
• cpu
• entropy
• filefd
• mdadm
• netstat
• textfile
• version
• bonding
• devstat
• gmond
• interrupts
• ipvs
• ksmd
• lastlogin
• megacli
• meminfo_numa
• ntp
• runit
• supervisord
• systemd
• tcpstat
mysqld_exporter collectors
Enabled in this tutorial:
Other collectors:
Running exporters
Let's start the exporters on both nodes.
Start node_exporter:
./start.sh node_exporter
tail -20f /var/log/node_exporter.log
Start mysqld_exporter:
./start.sh mysqld_exporter
tail -f /var/log/mysqld_exporter.log
Start mongo instances and mongodb_exporters:
cd ~/grafana_mongodb_dashboards/examples
tail -f example/log/*/mongodb_exporter*
MySQL access for mysqld_exporter
mysqld_exporter requires MySQL credentials to connect to MySQL.
There are a few options:
• command-line argument: -config.my-cnf=<path>/.my.cnf
Note: if you use a tilde to specify the user's home directory, it may not always expand to the actual path.
• using environment variables:
export DATA_SOURCE_NAME='user:pass@(localhost:3306)/'
export DATA_SOURCE_NAME='user:pass@unix(/var/lib/mysql/mysql.sock)/'
export DATA_SOURCE_NAME='user:pass@tcp(localhost:3306)/'
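When exporters are launched from a script rather than a shell, the same DATA_SOURCE_NAME strings can be assembled programmatically. The sketch below only builds the TCP and Unix-socket forms shown above. The helper name mysql_dsn and its parameters are hypothetical and not part of mysqld_exporter.

```python
import os


def mysql_dsn(user, password, host="localhost", port=3306, socket=None):
    """Build a DATA_SOURCE_NAME value in the tcp(...) or unix(...) form."""
    if socket is not None:
        address = "unix({})".format(socket)
    else:
        address = "tcp({}:{})".format(host, port)
    return "{}:{}@{}/".format(user, password, address)


# Environment for a child process; the current environment is left untouched.
exporter_env = dict(os.environ, DATA_SOURCE_NAME=mysql_dsn("user", "pass"))
```

Passing the environment explicitly to the child process keeps the credentials out of the command line, where other users could see them in the process list.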
Check exporters status
db1, in the terminal:
curl http://localhost:9100/metrics
curl http://localhost:9104/metrics
curl http://localhost:9105/metrics
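The curl commands above return plain text, one sample per line, in the Prometheus text exposition format. The following Python sketch parses such output into tuples, which can be handy for a quick sanity check. It is a simplified reader written for this handout, not the official parser: it skips comment lines, ignores optional timestamps and does not unescape label values. The sample text and its values are invented for illustration.

```python
import re

# One sample line: metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simple sketch cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Illustrative input only; real exporter output is much longer.
EXAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.25
node_filesystem_free{device="/dev/sda1",mountpoint="/"} 1.2e+09
"""

parsed = parse_metrics(EXAMPLE)
```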
db2, via web browser:
Prometheus endpoints status:
Prometheus targets
At this point, you should see a picture like this:
Monitoring system metrics
Monitoring disk performance
MySQL graphing capabilities
Let's generate some MySQL activity by running an OLTP test with sysbench:
Observe MySQL dashboards
MongoDB Dashboards
cp -r /root/grafana_mongodb_dashboards/dashboards/* /var/lib/grafana/dashboards/
Restart grafana (systemctl restart grafana-server.service)
MongoDB graphing capabilities - Before
1. Starting point: 'dcu/mongodb_exporter'
2. Server Status output 'db.serverStatus()'
   1. Uptime
   2. Asserts
   3. Durability
   4. BackgroundFlushing
   5. Connections
   6. ExtraInfo
   7. GlobalLock
   8. IndexCounter
   9. Locks
MongoDB graphing capabilities - After
1. Server Status output 'db.serverStatus()'
   1. Uptime
   15. Cursors
2. Replica Set Status Output 'rs.status()'
   1. Replica Set State
   2. Replica Set Optime
   3. Replica Set Node-to-Node Ping
   4. Replica Set Elections
3. Replica Set Oplog Info
   1. Oplog head/tail timestamp
   2. Oplog size bytes
   3. Oplog item count
4. Sharding Info (mongos)
   1. Balancer Locks and Lock Updates
   2. Is Cluster Balanced?
   3. # of Shards, DBs, Collections, Chunks
   4. # of Mongos processes
   5. # of Balancer, Split and Sharding events
5. WiredTiger storage-engine (experimental)
   1. Cache Usage
   2. Block Usage
   3. Transactions
   4. Etc
MongoDB Exporter Metric Summary
Per-collection Summary:
1. 60 x DB-level MongoDB metrics on 'mongos' nodes w/1-shard
   • about 5-8 more metrics per shard added
2. 157 x DB-level MongoDB metrics on 'mongod' replica set nodes w/2 x members
   • about 5-8 more metrics per shard added
3. 676 x OS-level metrics on recent Linux 3.x+
Total metrics: 893+ per Collection (at minimum)!
Total MongoDB MMS metrics: "400 per ping packet". Reference: http://www.slideshare.net/mongodb/using-the-mongodb-monitoring-service-mms
Per-collection size:
• Raw: 35kb Mongod Replset w/1-node, 17kb Mongos w/1-shard, 91kb Linux node_exporter
• Estimated Snappy compression (used in LevelDB) is about 80%
Recommended fetch interval:
• 5 sec if possible and there is enough disk space (possibly less?)
• 10 sec (default) if not
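To get a feel for how the fetch interval affects data volume, the per-collection sizes above can be turned into a back-of-envelope daily figure. The Python sketch below does only that arithmetic. It assumes that a "collection" is one scrape, that "about 80%" means the data shrinks by 80%, and that raw scrape size is a usable proxy for stored size. Actual disk usage depends on how Prometheus encodes samples, so treat the result as a rough guide. The function name daily_kb is made up for this example.

```python
def daily_kb(raw_kb_per_scrape, interval_sec, compression_saving=0.80):
    """Estimate raw and compressed kB collected per day."""
    scrapes_per_day = 24 * 60 * 60 // interval_sec
    raw = raw_kb_per_scrape * scrapes_per_day
    compressed = raw * (1 - compression_saving)
    return raw, compressed


# One mongod replica set node (35 kB) plus its node_exporter (91 kB).
per_scrape = 35 + 91

raw_10s, compressed_10s = daily_kb(per_scrape, 10)  # default interval
raw_5s, compressed_5s = daily_kb(per_scrape, 5)     # recommended interval
```

Under these assumptions, halving the interval from 10 to 5 seconds doubles the daily volume, which is why the 5-second recommendation is conditional on disk space.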
Prometheus Metric Grouping with Labels
• Metric-level labels vs target-level labels
• Target-level labels can combine multiple exporters together
[Diagram: Mongo Node <- Grafana]
MongoDB graphing capabilities
Prometheus Auto-discovery (Future)
[Diagram labels: "<- Consul", "<- Prometheus"]
New WiredTiger Metrics and the Future
WiredTiger Supported:
• Cache
• BlockManager
• Transaction
• ConcurrentTransaction
• Log (coming soon!)
Future Metrics:
• PerconaFT engine metrics
• RocksDB engine metrics
• Profiler metrics
Making a Go-based Prometheus Exporter
Overall Steps:
1. Metric definition:
2. Function to "collect" the data (most of the logic):
3. Function to "export" the data:
4. Function to "describe" the data:
Tips / Advice:
• Always try to use incrementing total values (counters)
• Everything is a float64 - store what provides value
• Do "math" operations on values in Grafana
• Vector labels add cardinality; be conservative with them
• Not everything needs to be a graph; the Prometheus query interface is powerful
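The advice to export incrementing totals and do the math at query time can be illustrated with a small calculation. The Python sketch below derives a per-second rate from raw counter samples and treats any drop in value as a counter reset, for example after a restart. It is a simplified illustration written for this handout. Prometheus's own rate() function differs in details such as extrapolation to the window boundaries, and the function names here are made up.

```python
def counter_increase(samples):
    """Total increase of a counter over a list of (timestamp, value) samples.

    A drop in value is treated as a counter reset, so the new value
    is counted from zero.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


def per_second_rate(samples):
    """Average per-second rate between the first and last sample."""
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return counter_increase(samples) / elapsed


# Invented samples: the counter resets between t=10 and t=20.
observed = [(0, 100), (10, 150), (20, 20), (40, 50)]
rate = per_second_rate(observed)
```

Because the exporter only publishes the running total, the same stored series can later be viewed as a rate over any window without changing the exporter.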
Alerting with Prometheus
Alerting with Prometheus is separated into two parts. Alerting rules in Prometheus servers send
alerts to an Alertmanager.
The Alertmanager then manages those alerts, including silencing, inhibition, aggregation and
sending out notifications via methods such as email, PagerDuty, HipChat, Slack, Pushover.
The main steps to setting up alerting and notifications are:
• Create alerting rules in Prometheus
• Set up and configure the Alertmanager
• Configure Prometheus to talk to the Alertmanager with the -alertmanager.url flag
Prometheus alerts
ALERT ExporterDown
  IF up == 0
  FOR 1m
  LABELS { severity = "page" }
  ANNOTATIONS {
    summary = "{{$labels.alias}}: exporter down",
    description = "Exporter on job '{{$labels.job}}' is not responding"
  }
ALERT SystemMemory
  IF round((node_memory_MemAvailable OR (node_memory_MemFree + node_memory_Buffers + node_memory_Cached)) / node_memory_MemTotal * 100) < 5
  FOR 1m
  LABELS { severity = "page" }
  ANNOTATIONS {
    summary = "{{$labels.alias}}: low memory",
    description = "Free {{$value}}% of memory"
  }
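The SystemMemory expression packs several steps into one line: use MemAvailable where the kernel exports it, otherwise fall back to free + buffers + cached, convert to a percentage of total memory, round, and compare with 5. The Python sketch below mirrors that arithmetic for a single host so each step can be checked by hand. The function names and the input numbers are hypothetical, and the FOR 1m clause (the condition must hold for a minute before the alert fires) is not modeled.

```python
import math


def available_memory_percent(mem_total, mem_free, buffers, cached, mem_available=None):
    """Mirror the SystemMemory expression for one host."""
    # PromQL `OR` keeps the left-hand series where it exists and falls back
    # to the right-hand side otherwise; None models "metric not exported".
    if mem_available is None:
        mem_available = mem_free + buffers + cached
    ratio = mem_available / mem_total * 100
    # PromQL round() resolves ties upward, unlike Python's built-in round().
    return math.floor(ratio + 0.5)


def should_alert(percent, threshold=5):
    """True when the rounded percentage is below the alert threshold."""
    return percent < threshold


# Older kernel without MemAvailable: 300 of 8000 units are usable.
low = available_memory_percent(8000, 100, 50, 150)
# Newer kernel reporting MemAvailable directly.
healthy = available_memory_percent(8000, 100, 50, 150, mem_available=2000)
```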
Configuring alerts in Prometheus
Let's review alert definitions prepared for this tutorial:
cat alerting.rules
Add the alerting rules to the rule_files list in prometheus.yml:
- alerting.rules
Reload prometheus:
kill -HUP `pidof prometheus`
Alerts in Prometheus web UI
Using Alertmanager
Let's review Alertmanager config prepared for this tutorial:
cat alertmanager.yml
Edit it with the appropriate email addresses for testing.
Start Alertmanager
Extract binaries:
tar zxf alertmanager-0.1.1.linux-amd64.tar.gz
Start Alertmanager:
./start.sh alertmanager
Uncomment ALERTMANAGER line in start.sh
Restart Prometheus:
kill `pidof prometheus`
./start.sh prometheus
Alertmanager web UI
Go to Alertmanager web interface
Alert paged by email
Prometheus recording rules
Let's review recording rules prepared for this tutorial:
cat recording.rules
Add the recording rules to the rule_files list in prometheus.yml:
- recording.rules
Reload prometheus:
kill -HUP `pidof prometheus`
Query for newly created metrics
Working with Prometheus HTTP API
Instant and range queries (evaluated at a single point in time or over a time range):
curl -sg 'http://localhost:9090/api/v1/query?query=up{job="mysql"}' | python -m json.tool
curl -sg 'http://localhost:9090/api/v1/query?query=ALERTS{alertstate="firing"}' | python -m
curl -sg "http://localhost:9090/api/v1/query_range?query=node_load1&start=`expr $(date +%s) -
3600`&end=`date +%s`&step=5m" | python -m json.tool
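The same range query can be issued from a script. The Python sketch below builds the query_range URL with proper escaping and flattens a response into plain tuples. It uses only the standard library. The helper names are made up for this handout, and the sample response holds invented values in the documented matrix shape.

```python
import time
from urllib.parse import urlencode

BASE_URL = "http://localhost:9090"  # Prometheus address used in this tutorial


def range_query_url(query, seconds_back=3600, step="5m", now=None):
    """Build a query_range URL covering the last `seconds_back` seconds."""
    now = int(time.time()) if now is None else int(now)
    params = urlencode({"query": query, "start": now - seconds_back,
                        "end": now, "step": step})
    return "{}/api/v1/query_range?{}".format(BASE_URL, params)


def matrix_values(response):
    """Flatten a decoded query_range response into {labels: [(ts, value), ...]}."""
    if response.get("status") != "success":
        raise ValueError("query failed: {}".format(response.get("error")))
    out = {}
    for series in response["data"]["result"]:
        key = tuple(sorted(series["metric"].items()))
        # Sample values arrive as strings and are converted to floats here.
        out[key] = [(ts, float(v)) for ts, v in series["values"]]
    return out


# Invented response for illustration.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"__name__": "node_load1", "alias": "db1"},
             "values": [[3600, "0.25"], [3900, "0.5"]]},
        ],
    },
}

url = range_query_url("node_load1", now=7200)
values = matrix_values(SAMPLE_RESPONSE)
```

To run it against a live server, the URL can be fetched with urllib.request.urlopen and decoded with json.load before being passed to matrix_values.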
Label values across the whole DB:
curl http://localhost:9090/api/v1/label/alias/values
List of series matching the expression:
curl -sg
|rpc_pipefs|tmpfs"}'| python -m json.tool
Delete series:
curl -g -X DELETE 'http://localhost:9090/api/v1/series?match[]={alias="db2"}'
InfluxDB overview
InfluxDB is an open source time series database. It's useful for recording metrics, events, and
performing analytics.
Web interface
Why InfluxDB?
• Currently, one of a few available remote storage options for Prometheus to use as a long-term solution
• Multiple retention policies
• Easy to use
• Grafana support
• Clustering
Configure Prometheus with InfluxDB
Create prometheus db in InfluxDB:
create database prometheus;
Uncomment INFLUXDB line in start.sh
Restart Prometheus:
kill `pidof prometheus`
./start.sh prometheus
Load continuous queries to downsample data:
python grafana-dashboards/influxdb_cq.py
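Downsampling replaces many raw points with one aggregated point per time window, which keeps long-term storage small. The contents of influxdb_cq.py are not shown here, so the Python sketch below only illustrates the idea with a 5-minute window and assumes the mean as the aggregate. The function name and the sample points are made up.

```python
from collections import defaultdict


def downsample_mean(points, window_sec=300):
    """Average (timestamp, value) points into fixed windows.

    Returns a sorted list of (window_start, mean_value).
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        buckets[ts - ts % window_sec].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Invented raw samples, timestamps in seconds.
raw_points = [(0, 1.0), (100, 3.0), (300, 5.0), (599, 7.0), (600, 10.0)]
five_minute = downsample_mean(raw_points)
```

The same function with window_sec=3600 would model a one-hour downsample.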
Using InfluxDB
Browse data:
use prometheus;
show measurements;
show continuous queries;
select * from node_load1;
use trending;
show retention policies on trending;
select * from trending."5m".node_load1;
show shards;
Add InfluxDB datasource to Grafana
What's next?
• Grafana 3.0 release: pie charts, more functionality, improved Prometheus datasource?
• More long-term storage options for Prometheus
• Alertmanager production-ready status?
• InfluxDB or not InfluxDB?
Thank you!

