Your VPS is slow. Your website is loading sluggishly. But why? Is the CPU maxed out? Is memory exhausted? Is disk I/O the bottleneck? Is the network congested? Without proper monitoring, you're flying blind, guessing at solutions instead of diagnosing the real problem.
Performance monitoring is the difference between reactive firefighting ("Why is my server down?!") and proactive management ("CPU is approaching 80%, let's optimize or scale before there's an issue"). It's essential for maintaining uptime, controlling costs, and delivering a good user experience.
This guide covers VPS performance monitoring end to end: basic command-line tools, professional monitoring solutions, alerting, and interpreting metrics to optimize your server.
• Command-line monitoring tools (top, htop, iostat, netstat)
• Professional monitoring solutions (Netdata, Prometheus, Grafana)
• Setting up alerts and notifications
• Performance optimization strategies
Understanding Key Performance Metrics
Before diving into tools, you need to understand what to monitor:
1. CPU Metrics
- CPU Usage (%): Percentage of CPU capacity in use
- Load Average: Average number of processes running or waiting to run over the last 1, 5, and 15 minutes (on Linux this also counts processes blocked on disk I/O)
- Process Count: Number of running processes
- Context Switches: How often the CPU switches between processes (high = inefficiency)
Rule of thumb: on a 2-core CPU, load average should stay below 2.0
Sustained load > CPU cores = system is overloaded
Example: 4-core CPU, load average of 6.0 = problematic
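The rule of thumb above comes down to dividing load average by core count. A minimal sketch, assuming awk is available; load_per_core and load_status are hypothetical helper names, not standard tools:

```shell
# Divide a load average by the core count to get a per-core figure.
# load_per_core and load_status are hypothetical helpers for illustration.
load_per_core() {
    awk -v l="$1" -v c="$2" 'BEGIN { printf "%.2f\n", l / c }'
}

# Classify the per-core load: above 1.0 means more runnable work than cores.
load_status() {
    awk -v p="$(load_per_core "$1" "$2")" \
        'BEGIN { if (p > 1.0) print "overloaded"; else print "ok" }'
}

# Example from the text: 4 cores with a load average of 6.0.
load_per_core 6.0 4    # prints 1.50
load_status 6.0 4      # prints overloaded

# On a live Linux system (assumes /proc/loadavg and nproc exist):
# load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A single reading above 1.0 per core is not alarming by itself; compare the 1, 5, and 15 minute figures to see whether the load is sustained.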
2. Memory (RAM) Metrics
- Used RAM: Memory actively in use by processes
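- Free RAM: Completely unused memory (often low on Linux because spare RAM is used for cache; the "available" figure, which includes reclaimable cache, is the better measure of headroom)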
- Cached/Buffered: Memory used for caching (good, can be freed if needed)
- Swap Usage: Disk space used as virtual memory (slow, indicates RAM shortage)
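Because cached memory can be reclaimed, a usage figure based on available memory is usually more meaningful than one based on free memory. A minimal sketch, assuming a Linux-style meminfo file; mem_used_percent is a hypothetical helper name, and the sample values mirror the 2048 MiB server used in this guide's examples:

```shell
# mem_used_percent is a hypothetical helper. It reads a meminfo-style file
# (default: /proc/meminfo) and treats MemAvailable as usable headroom.
mem_used_percent() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }
    ' "${1:-/proc/meminfo}"
}

# Sample data: 2048 MiB total, 823.8 MiB available (values in kB).
sample=$(mktemp)
printf 'MemTotal:        2097152 kB\nMemFree:          262451 kB\nMemAvailable:     843571 kB\n' > "$sample"
mem_used_percent "$sample"    # prints 59
rm -f "$sample"
```

Calling mem_used_percent with no argument reads the live /proc/meminfo.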
3. Disk Metrics
- Disk Usage (%): Percentage of disk space used
- Disk I/O: Read/write throughput (for example, kB/s)
- IOPS: Input/Output Operations Per Second
- Await: Average time for I/O requests to be served (latency)
4. Network Metrics
- Bandwidth Usage: Data transferred in/out
- Packets: Network packets sent/received
- Connections: Active network connections
- Errors/Drops: Failed or dropped packets
Command-Line Monitoring Tools (Built-in)
1. top - Real-Time System Overview
top
Basic system monitor showing CPU, memory, and processes.
Understanding top output:
top - 10:30:45 up 5 days, 2:15, 2 users, load average: 0.45, 0.52, 0.48
Tasks: 123 total, 1 running, 122 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.2 us, 1.3 sy, 0.0 ni, 93.2 id, 0.2 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 2048.0 total, 256.3 free, 1024.5 used, 767.2 buff/cache
MiB Swap: 1024.0 total, 1024.0 free, 0.0 used. 823.8 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1234 www-data 20 0 512M 128M 16M S 2.3 6.3 1:23.45 php-fpm
5678 mysql 20 0 1.2G 256M 32M S 1.7 12.5 5:45.12 mysqld
Key columns explained:
- us (user): CPU time in user processes
- sy (system): CPU time in kernel
- id (idle): Idle CPU time (higher is better)
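- wa (wait): CPU waiting for I/O (high = disk bottleneck)
- st (steal): CPU time taken by the hypervisor for other guests (consistently high on a VPS = noisy neighbors or an oversold host)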
- RES: Actual RAM used by process
- %MEM: Percentage of RAM used
Useful top commands (while running):
- P: Sort by CPU usage
- M: Sort by memory usage
- k: Kill a process (enter PID)
- 1: Show individual CPU cores
- q: Quit
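The %Cpu(s) summary line can also be read from a script. A minimal sketch that turns it into a single "busy" percentage (100 minus idle), assuming the procps-style layout shown in the sample output; cpu_busy_percent is a hypothetical helper name:

```shell
# cpu_busy_percent is a hypothetical helper. It reads one "%Cpu(s)" summary
# line from top on stdin and prints 100 minus the idle figure.
cpu_busy_percent() {
    awk -F',' '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ / id/) {
                sub(/ id.*/, "", $i)          # keep only the number
                printf "%.1f\n", 100 - $i
            }
        }
    }'
}

# The summary line from the sample output above: 93.2% idle.
echo '%Cpu(s): 5.2 us, 1.3 sy, 0.0 ni, 93.2 id, 0.2 wa, 0.0 hi, 0.1 si, 0.0 st' | cpu_busy_percent
# prints 6.8

# Live usage (batch mode; the second sample is used because the first one
# can reflect averages since boot):
# LC_ALL=C top -bn2 -d1 | grep 'Cpu(s)' | tail -n1 | cpu_busy_percent
```

Counting everything that is not idle means system, I/O wait, and steal time are included, not just user time.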
2. htop - Enhanced Interactive Monitor (Recommended)
# Install htop
sudo apt install htop -y
# Run
htop
Why htop is better than top:
- Color-coded, visual interface
- Mouse support
- Easier process killing
- Tree view of processes
- Built-in search
htop shortcuts:
- F6: Sort by column
- F5: Tree view
- F3 or /: Search process
- F9: Kill process
- u: Filter by user
3. vmstat - Virtual Memory Statistics
# Show stats every 2 seconds
vmstat 2
# Example output:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 256432 51200 767324 0 0 12 45 123 234 5 1 94 0 0
Key columns:
- r: Runnable processes, running or waiting for CPU (consistently > CPU cores = problem)
- b: Processes in uninterruptible sleep (I/O wait)
- swpd: Swap memory used
- si/so: Swap in/out (sustained non-zero = RAM shortage)
- bi/bo: Blocks in/out (disk I/O)
- wa: CPU waiting for I/O
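The two most actionable rules above (run queue larger than the core count, and any swap activity) can be checked automatically. A minimal sketch, assuming the classic column order shown in the example (r is column 1, si and so are columns 7 and 8); vmstat_flags is a hypothetical helper name and the second data row is invented to show a failing case:

```shell
# vmstat_flags is a hypothetical helper. It reads vmstat output on stdin,
# skips header rows, and prints a warning for each sample that breaks one
# of the rules of thumb. The argument is the number of CPU cores.
vmstat_flags() {
    awk -v cores="$1" '
        $1 !~ /^[0-9]+$/ { next }               # header rows are not numeric
        $1 > cores       { print "run queue " $1 " exceeds " cores " cores" }
        $7 + $8 > 0      { print "swapping: si=" $7 " so=" $8 }
    '
}

# First row: the healthy sample from the text. Second row: invented.
vmstat_flags 2 <<'EOF'
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 256432  51200 767324    0    0    12    45  123  234  5  1 94  0  0
 5  1  20480  10240   2048  40960  120  340   900  1500  800 2100 60 20  5 15  0
EOF
# prints:
# run queue 5 exceeds 2 cores
# swapping: si=120 so=340

# Live usage: vmstat 2 5 | vmstat_flags "$(nproc)"
```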
4. iostat - I/O Statistics
# Install sysstat package
sudo apt install sysstat -y
# Show disk I/O stats every 2 seconds
iostat -x 2
# Example output:
Device rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.12 2.34 1.5 3.2 45.3 89.7 56.32 0.05 10.5 2.1 0.98
Critical metrics:
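- %util: Disk utilization (>80% = likely bottleneck on a single spinning disk; SSDs and RAID arrays serve requests in parallel, so a high value there does not always mean saturation)
- await: Average wait time in ms (>10ms can indicate issues; newer sysstat versions report it as r_await and w_await)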
- r/s, w/s: Reads and writes per second
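Column names and order in iostat -x differ between sysstat versions, so a script is safer looking up %util by name than by position. A minimal sketch under that assumption; busy_disks is a hypothetical helper name and the sdb row is invented to show a busy device:

```shell
# busy_disks is a hypothetical helper. It reads `iostat -x` output on stdin,
# finds the %util column from the header row, and prints each device whose
# utilization is above the threshold (default 80).
busy_disks() {
    awk -v limit="${1:-80}" '
        /^Device/ {                               # header row
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && $col + 0 > limit { print $1, $col }
    '
}

# First data row: the sample from the text. Second row: invented.
busy_disks 80 <<'EOF'
Device rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.12 2.34 1.5 3.2 45.3 89.7 56.32 0.05 10.5 2.1 0.98
sdb 0.00 9.10 210.0 480.0 8400.0 19200.0 80.00 7.90 42.3 1.3 91.40
EOF
# prints: sdb 91.40

# Live usage: iostat -x 2 3 | busy_disks 80
```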
5. free - Memory Usage
# Show memory in human-readable format
free -h
# Example output:
total used free shared buff/cache available
Mem: 2.0Gi 1.0Gi 256Mi 64Mi 767Mi 823Mi
Swap: 1.0Gi 0B 1.0Gi
6. df - Disk Space Usage
# Show disk space
df -h
# Show inodes (file count)
df -i
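Both df views can be filtered for filesystems that are close to full. A minimal sketch, assuming POSIX-format output (df -P) and mount points without spaces; full_filesystems is a hypothetical helper name and the sample rows are invented:

```shell
# full_filesystems is a hypothetical helper. It reads `df -P` style output
# on stdin and prints mount points whose usage is at or above a threshold
# (default 90). Column 5 is the percentage, column 6 the mount point.
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR == 1 { next }                  # skip the header row
        {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= limit) print $6, pct "%"
        }
    '
}

# Invented sample output for illustration.
full_filesystems 90 <<'EOF'
Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/vda1         41152736 38271040   2865312      94% /
tmpfs              1024000        0   1024000       0% /dev/shm
EOF
# prints: / 94%

# Live usage: df -P | full_filesystems 90
# The same filter should work for inodes, since the percentage is also in
# column 5 there: df -Pi | full_filesystems 90
```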
7. netstat/ss - Network Statistics
# Show active connections (modern replacement for netstat)
ss -tunap
# Show listening ports
ss -tlnp
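# Count connections by state (NR>1 skips the header row)
ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c
# Show bandwidth usage (requires iftop)
sudo apt install iftop -y
# Replace eth0 with your interface name (list them with: ip -br link)
sudo iftop -i eth0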
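The errors and drops mentioned under Network Metrics are exposed per interface in /proc/net/dev on Linux. A minimal sketch that lists interfaces with non-zero counters, assuming the standard layout (eight receive fields followed by the transmit fields); iface_errors is a hypothetical helper name and the sample counters are invented:

```shell
# iface_errors is a hypothetical helper. It reads /proc/net/dev-style text
# and prints interfaces with non-zero receive/transmit errors or drops.
iface_errors() {
    awk '
        NR <= 2 { next }                  # two header lines
        {
            sub(/:/, " ")                 # "eth0:123" -> "eth0 123"
            # After the name: rx fields are $2..$9, tx fields start at $10
            if ($4 + $5 + $12 + $13 > 0)
                print $1, "rx_errs=" $4, "rx_drop=" $5, "tx_errs=" $12, "tx_drop=" $13
        }
    ' "${1:-/proc/net/dev}"
}

# Invented sample data for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:   52340     410    0    0    0     0          0         0    52340     410    0    0    0     0       0          0
  eth0: 9876543   81234    3   17    0     0          0         0  4567890   60321    0    2    0     0       0          0
EOF
iface_errors "$sample"    # prints: eth0 rx_errs=3 rx_drop=17 tx_errs=0 tx_drop=2
rm -f "$sample"
```

These counters are cumulative since boot, so what matters is whether they keep growing between two readings.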
Professional Monitoring Solutions
1. Netdata - Real-Time Monitoring Dashboard (Easiest)
Netdata provides a beautiful, real-time monitoring dashboard accessible via web browser.
Installation (5 minutes):
# Download the installer, review it if you like, then run it
curl -fsSL https://my-netdata.io/kickstart.sh -o /tmp/netdata-kickstart.sh
bash /tmp/netdata-kickstart.sh
# Access dashboard
http://your-server-ip:19999
What Netdata monitors:
- CPU (per-core and overall)
- RAM and Swap
- Disk I/O and space
- Network traffic
- System processes
- Web server (Nginx/Apache)
- MySQL/PostgreSQL
- Redis, Docker, and 300+ integrations
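Secure Netdata with Nginx Reverse Proxy:
# Install nginx if not already installed
sudo apt install nginx -y
# Create nginx config
sudo nano /etc/nginx/sites-available/netdata
server {
    listen 80;
    server_name monitoring.yourdomain.com;
    location / {
        # Consider adding auth_basic here so the dashboard is not public
        proxy_pass http://127.0.0.1:19999;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
# Enable the site, test the config, and reload nginx
sudo ln -s /etc/nginx/sites-available/netdata /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
# Add SSL with Let's Encrypt (install certbot and its nginx plugin first)
sudo apt install certbot python3-certbot-nginx -y
sudo certbot --nginx -d monitoring.yourdomain.com
# The proxy alone does not close port 19999; block it from outside (example assumes ufw)
sudo ufw deny 19999/tcp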
2. Prometheus + Grafana (Enterprise-Grade)
Industry-standard monitoring stack used by Netflix, Uber, and countless enterprises.
Architecture:
- Prometheus: Metrics collection and storage
- Node Exporter: Collects system metrics
- Grafana: Beautiful visualization dashboards
Installation (Node Exporter + Prometheus + Grafana):
# Install Node Exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false node_exporter
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
# Create systemd service
sudo nano /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
# Verify node_exporter is running
curl http://localhost:9100/metrics
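The /metrics endpoint returns plain text with one metric per line, so a quick sanity check is to pull out a single value. A minimal sketch; metric_value is a hypothetical helper name, and it only handles metrics without labels such as node_load1:

```shell
# metric_value is a hypothetical helper. It reads Prometheus text-format
# metrics on stdin and prints the value of one unlabelled metric by name.
metric_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# A short excerpt in the exposition format (values are illustrative).
metric_value node_load1 <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.45
node_load5 0.52
EOF
# prints 0.45

# Live usage: curl -s http://localhost:9100/metrics | metric_value node_load1
```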
Install Prometheus:
wget https://github.com/prometheus/prometheus/releases/download/v2.48.0/prometheus-2.48.0.linux-amd64.tar.gz
tar xvfz prometheus-2.48.0.linux-amd64.tar.gz
sudo cp prometheus-2.48.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.48.0.linux-amd64/promtool /usr/local/bin/
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo cp -r prometheus-2.48.0.linux-amd64/consoles /etc/prometheus/
sudo cp -r prometheus-2.48.0.linux-amd64/console_libraries /etc/prometheus/
# Create prometheus config
sudo nano /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
sudo useradd --no-create-home --shell /bin/false prometheus
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
# Create systemd service
sudo nano /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
# Access Prometheus
http://your-server-ip:9090
Install Grafana:
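# Add the Grafana signing key and repository (apt-key is deprecated, so use a keyring file)
sudo apt install -y apt-transport-https wget gnupg
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install grafana -y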
# Start Grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
# Access Grafana
http://your-server-ip:3000
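# Default login: admin/admin (you are prompted to set a new password on first login)
Configure Grafana:
- Log in to Grafana (admin/admin) and change the default password
- Add Prometheus data source (Configuration → Data Sources → Add Prometheus; newer Grafana versions list this under Connections → Data sources) and set the URL to http://localhost:9090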
- Import dashboard: Use ID 1860 (Node Exporter Full) from grafana.com/dashboards
Skip the Complex Setup with VPS Commander
Setting up professional monitoring requires significant time and technical expertise. VPS Commander provides built-in performance monitoring dashboards, real-time metrics, and intelligent alerting - all accessible through an intuitive web interface. No terminal commands or complex configuration required.
Try VPS Commander - Starting at $2.99/month
3. Cloud-Based Monitoring Services
Datadog (Enterprise, Paid)
- Comprehensive APM and infrastructure monitoring
- Around $15/host/month (check current pricing)
- Great for teams and multi-server setups
New Relic (Free Tier Available)
- 100GB/month free data ingest
- Application performance monitoring
- Easy integration
UptimeRobot (Free for Basic)
- Website uptime monitoring
- 50 monitors on free plan
- Email/SMS alerts
Setting Up Alerting
Monitoring without alerting means you'll only discover problems when users complain. Set up proactive alerts:
Email Alerts for High Resource Usage
sudo nano /usr/local/bin/resource-alert.sh
#!/bin/bash
# Requires a working `mail` command (for example mailutils plus a configured MTA)
EMAIL="your@email.com"
CPU_THRESHOLD=80
MEM_THRESHOLD=85
DISK_THRESHOLD=90
# Check CPU: use the second top sample (the first can reflect averages since boot)
# and report 100 minus idle so system and I/O wait time count as usage
CPU_USAGE=$(LC_ALL=C top -bn2 -d1 | grep "Cpu(s)" | tail -n1 | awk -F',' '{for (i = 1; i <= NF; i++) if ($i ~ / id/) {sub(/ id.*/, "", $i); print int(100 - $i)}}')
if [ "$CPU_USAGE" -gt "$CPU_THRESHOLD" ]; then
    echo "CPU usage is ${CPU_USAGE}%" | mail -s "HIGH CPU ALERT - Server" "$EMAIL"
fi
# Check Memory (used / total, as reported by free)
MEM_USAGE=$(free | grep Mem | awk '{print int($3/$2 * 100)}')
if [ "$MEM_USAGE" -gt "$MEM_THRESHOLD" ]; then
    echo "Memory usage is ${MEM_USAGE}%" | mail -s "HIGH MEMORY ALERT - Server" "$EMAIL"
fi
# Check Disk (root filesystem)
DISK_USAGE=$(df / | tail -1 | awk '{print $5}' | cut -d'%' -f1)
if [ "$DISK_USAGE" -gt "$DISK_THRESHOLD" ]; then
    echo "Disk usage is ${DISK_USAGE}%" | mail -s "HIGH DISK ALERT - Server" "$EMAIL"
fi
sudo chmod +x /usr/local/bin/resource-alert.sh
# Run every 5 minutes
sudo crontab -e
# Add:
*/5 * * * * /usr/local/bin/resource-alert.sh
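With a five-minute cron schedule, a problem that lasts an hour produces twelve identical emails. One way to limit that is a cooldown per alert type. A minimal sketch, not part of the script above; should_alert and the state-file paths are hypothetical:

```shell
# should_alert is a hypothetical helper. It succeeds when no alert has been
# recorded in the state file within the cooldown window, and records the
# current time so later calls inside the window are suppressed.
# Usage: should_alert STATE_FILE COOLDOWN_SECONDS [NOW_EPOCH]
should_alert() {
    local state_file="$1" cooldown="$2" now="${3:-$(date +%s)}" last
    if [ -f "$state_file" ]; then
        last=$(cat "$state_file")
        if [ $((now - last)) -lt "$cooldown" ]; then
            return 1
        fi
    fi
    echo "$now" > "$state_file"
    return 0
}

# Demonstration with fixed timestamps and a one-hour cooldown.
dir=$(mktemp -d)
should_alert "$dir/cpu.last" 3600 1000 && echo "send mail"     # first alert
should_alert "$dir/cpu.last" 3600 1300 || echo "suppressed"    # 5 minutes later
should_alert "$dir/cpu.last" 3600 4600 && echo "send mail"     # an hour later
rm -rf "$dir"
```

In the alert script, each mail call could be guarded with something like `should_alert /var/tmp/cpu-alert.last 3600 &&` (the path is only an example).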
Prometheus Alertmanager
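Define alert rules in Prometheus for more sophisticated alerting. Prometheus only evaluates the rules once the file is listed under rule_files in /etc/prometheus/prometheus.yml, and notifications (email, Slack, and so on) are sent by Alertmanager, which is installed and configured separately: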
sudo nano /etc/prometheus/alert_rules.yml
groups:
  - name: server_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage detected"
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space critically low"
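The HighCPUUsage expression is easier to read once the arithmetic is spelled out. node_cpu_seconds_total{mode="idle"} is a counter of idle seconds, so its per-second rate is the fraction of time a core sat idle, and 100 minus that percentage is the busy share. A worked sketch for a single core with made-up counter values; cpu_busy_from_idle is a hypothetical helper name:

```shell
# cpu_busy_from_idle is a hypothetical helper that mirrors the arithmetic of
# the HighCPUUsage expression for one core.
# Usage: cpu_busy_from_idle IDLE_START IDLE_END WINDOW_SECONDS
cpu_busy_from_idle() {
    awk -v a="$1" -v b="$2" -v secs="$3" \
        'BEGIN { printf "%.1f\n", 100 - ((b - a) / secs) * 100 }'
}

# Idle counter grew by 240 s over a 300 s (5m) window: 80% idle, 20% busy.
cpu_busy_from_idle 1000 1240 300    # prints 20.0
# Only 30 s of idle time in the same window: 90% busy, above the 80 threshold.
cpu_busy_from_idle 1000 1030 300    # prints 90.0
```

In the real rule, avg by(instance) averages the idle rate across all cores before the subtraction, and for: 5m requires the condition to hold for five minutes before the alert fires.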
Performance Optimization Based on Metrics
High CPU Usage
Diagnosis:
# Find CPU-hungry processes
ps aux --sort=-%cpu | head -10
# Check load average
uptime
Solutions:
- Optimize application code (database queries, loops)
- Enable caching (Redis, Memcached)
- Upgrade to more CPU cores
- Use CDN to offload static content
High Memory Usage
Diagnosis:
# Find memory-hungry processes
ps aux --sort=-%mem | head -10
# Check if swap is being used
free -h
Solutions:
- Restart leaking services as a stopgap, then track down and fix the leak
- Optimize database queries and connections
- Reduce PHP-FPM/Apache worker processes
- Upgrade RAM
Disk I/O Bottleneck
Diagnosis:
iostat -x 2
# Look for high %util and await times
Solutions:
- Move logs to separate disk
- Optimize database indexes
- Enable query caching (in the application or the database; note that MySQL 8.0 removed its built-in query cache)
- Upgrade to SSD (if using HDD)
- Use tmpfs for temporary files
Network Issues
Diagnosis:
# Check network usage
sudo iftop -i eth0
# Check connections
ss -s
Solutions:
- Use CDN for static assets
- Enable gzip compression
- Optimize images
- Limit concurrent connections
Monitoring Best Practices
- Establish baselines: Monitor for 1-2 weeks to understand normal behavior
- Set realistic thresholds: Don't alert at 50% CPU; that's normal. Alert at 80-90%
- Monitor trends, not just snapshots: A spike is normal; sustained high usage is a problem
- Document incidents: When you get an alert and fix it, document what happened and the solution
- Test your alerts: Verify alerts actually trigger and reach you
- Review metrics regularly: Weekly review to spot gradual degradation
- Monitor business metrics too: Response times, transaction success rates, user experience
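The "trends, not snapshots" practice can be made concrete: only treat usage as a problem when several consecutive samples are above the threshold. A minimal sketch, assuming one numeric sample per line with the oldest first; sustained_over is a hypothetical helper name and the sample values are invented:

```shell
# sustained_over is a hypothetical helper. Given a threshold and a window
# size N, it reads samples on stdin (one number per line, oldest first) and
# succeeds only when each of the last N samples is above the threshold.
sustained_over() {
    awk -v limit="$1" -v n="$2" '
        { v[NR] = $1 }
        END {
            if (NR < n) exit 1                    # not enough data yet
            for (i = NR - n + 1; i <= NR; i++)
                if (v[i] + 0 <= limit) exit 1     # one dip breaks the streak
            exit 0
        }
    '
}

# A single spike to 95% followed by normal readings: not sustained.
printf '%s\n' 35 40 95 38 42 | sustained_over 80 3 || echo "spike only"
# The last three readings are all above 80%: sustained.
printf '%s\n' 40 85 88 91 93 | sustained_over 80 3 && echo "sustained high usage"
```

This is the same idea as the for: clause in Prometheus alert rules, applied to samples collected by a script.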
Quick Reference: Monitoring Commands
| Metric | Command |
|---|---|
| CPU usage | top or htop |
| Memory usage | free -h |
| Disk space | df -h |
| Disk I/O | iostat -x 2 |
| Network connections | ss -tunap |
| Load average | uptime |
| Top processes (CPU) | ps aux --sort=-%cpu \| head |
| Top processes (Memory) | ps aux --sort=-%mem \| head |
Monitoring Checklist
- ✅ Install htop for quick system overview
- ✅ Set up Netdata or Prometheus+Grafana for dashboards
- ✅ Configure email/SMS alerts for critical metrics
- ✅ Monitor CPU, RAM, disk, and network
- ✅ Track application-specific metrics (database, web server)
- ✅ Set up external uptime monitoring (UptimeRobot)
- ✅ Establish performance baselines
- ✅ Review metrics weekly
- ✅ Document incidents and solutions
- ✅ Test alert notifications
Conclusion
Effective performance monitoring transforms VPS management from reactive firefighting to proactive optimization. By understanding key metrics, using the right tools, and setting up proper alerting, you can catch problems before they impact users and optimize resources for cost efficiency.
Start simple with command-line tools like htop and free, then graduate to Netdata for beautiful real-time dashboards. For enterprise needs, Prometheus and Grafana provide unlimited scalability and customization.
Remember: the goal isn't to achieve 0% CPU usage or have infinite resources. The goal is to understand your server's behavior, identify bottlenecks early, and make data-driven decisions about optimization and scaling.
Next Steps
1. Install htop and explore your current resource usage
2. Set up Netdata for real-time monitoring (takes 5 minutes)
3. Configure basic email alerts for CPU/RAM/Disk thresholds
4. Monitor for 1-2 weeks to establish baselines
5. Optimize based on the data you collect
Your monitoring setup is complete when you know about problems before your users do.