Essential Linux Commands for DevOps Engineers
Introduction
When production breaks at 3 AM you can’t waste time searching for “which process is using port 8080” or “check disk space linux.” You need the right commands already in muscle memory.
Mastery isn’t memorizing flags for trivia. It’s gathering signal fast, forming a hypothesis, and narrowing root cause with the least motion: is it CPU saturation, runaway I/O, memory pressure, port collision, or just a noisy log file? The commands here help you answer that—quickly and repeatably.
This guide groups essential Linux tooling by real operational tasks: processes, system resources, services, networks, files/logs, remote access, permissions, and power patterns. Each entry focuses on practical usage and interpretation instead of encyclopedic flag dumps.
Who this is for: DevOps engineers, SREs, platform engineers, or anyone who keeps Linux systems healthy.
You’ll get: The core commands for daily ops, incident response, rollout verification, and automation—plus how to read their output with confidence.
Part 1: Process & System Monitoring
First objective in an incident: establish system truth. What is running, what is consuming, what changed.
1. Process Management
ps – Process snapshot
Point‑in‑time process list: owner, PID, CPU %, memory %, command.
# View all running processes with full details
ps aux
# Alternative format (System V style)
ps -ef
# Find specific processes
ps aux | grep nginx
# Show processes in a tree structure (see parent-child relationships)
ps auxf
# Show only processes for current user
ps ux
# Sort by CPU usage (highest first)
ps aux --sort=-%cpu | head -n 10
# Sort by memory usage
ps aux --sort=-%mem | head -n 10
# Show threads for a specific process
ps -eLf | grep nginx
Reading the output:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
nginx 1234 0.5 2.1 12345 6789 ? S 10:30 0:05 nginx: worker process
- USER: Process owner
- PID: Process ID (unique identifier)
- %CPU: CPU usage percentage
- %MEM: Memory usage percentage
- VSZ: Virtual memory size (KB)
- RSS: Resident Set Size - actual physical memory (KB)
- STAT: Process state (R=running, S=sleeping, Z=zombie, D=uninterruptible sleep)
- START: When the process started
- TIME: Total CPU time used
- COMMAND: The command that started the process
Examples:
# After deploying a new application, check if it's running
ps aux | grep myapp
# Find all Java processes and their memory usage
ps aux | grep java | awk '{print $2, $4, $11}'
# Identify processes consuming more than 50% CPU
ps aux | awk '$3 > 50 {print $0}'
Tips:
- ps auxf for parent/child relationships (great for systemd + worker pools)
- Wrap in watch -n 2 for rough pseudo-streaming (example below)
- Use the [n]ginx pattern to avoid matching the grep process
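A quick pseudo-streaming example combining the tips above (the 2-second interval is arbitrary):
# Refresh the top CPU consumers every 2 seconds
watch -n 2 'ps aux --sort=-%cpu | head -n 5'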
top / htop – Live view
Real‑time utilization and ranking of processes. htop adds color, filtering, scrolling.
# Launch top (default 3-second refresh)
top
# Better alternative with colors and mouse support (if installed)
htop
# Top with custom refresh interval (1 second)
top -d 1
# Show only processes for specific user
top -u nginx
# Batch mode (for logging/scripting)
top -b -n 1 > system_snapshot.txt
Interactive top commands (while running):
- M - Sort by memory usage (high to low)
- P - Sort by CPU usage (default)
- T - Sort by running time
- k - Kill a process (prompts for PID)
- r - Renice (change priority) of a process
- f - Add/remove display fields
- 1 - Show individual CPU cores
- q - Quit
Header breakdown:
top - 14:23:45 up 23 days, 4:12, 3 users, load average: 0.52, 0.58, 0.59
Tasks: 312 total, 1 running, 311 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.2 us, 1.1 sy, 0.0 ni, 95.5 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16384.0 total, 2048.5 free, 8192.3 used, 6143.2 buff/cache
MiB Swap: 8192.0 total, 7890.1 free, 301.9 used. 7234.8 avail Mem
Key readings: Load (compare to logical CPU count), idle vs wait, memory pressure, swap usage trend.
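To put the load numbers in context, compare them against the logical CPU count:
# Logical CPU count vs. current load averages
nproc
uptime
# Rule of thumb: sustained load above the nproc value = saturation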
Why htop: Scroll, tree, filter (F4), interactive kill (F9), color for quick scanning.
# Install htop (if not available)
# Rocky Linux / RHEL:
sudo dnf install htop
# Ubuntu / Debian:
sudo apt install htop
Usage patterns:
# Monitor system during deployment
top -d 1 # Fast refresh to catch spikes
# Watch memory usage after releasing a new version
htop # Press 'M' to sort by memory, watch for leaks
# Identify which process caused a CPU spike
top -b -n 1 | head -n 20 # Capture snapshot for later analysis
Tips:
- top -o %MEM for memory-first triage
- Sustained wa > 5–10% → investigate storage (iostat)
- Load >> CPU count alongside a high run queue = saturation
2. Finding Process IDs (PIDs)
Command: pgrep / pidof
# Find PID by process name
pgrep nginx
# Find PID with full command line
pgrep -f "java.*myapp"
# Show PID and process name
pgrep -l nginx
# Alternative: pidof (simpler but less flexible)
pidof nginx
- Use case: Get PIDs for process management, scripting
- DevOps context: Automated restart scripts, health checks
- Pro tip: Use pgrep -f to match full command-line arguments
Command: ps + grep (fallback)
# Find process by name
ps aux | grep nginx
# Get just the PID (using awk)
ps aux | grep nginx | grep -v grep | awk '{print $2}'
# More elegant single command
ps aux | grep [n]ginx | awk '{print $2}'
- Use case: When pgrep isn't available, scripting
- DevOps context: Legacy systems, portable scripts
- Pro tip: The [n]ginx trick excludes grep itself from the results
Command: lsof (list open files)
# Find which process is using a specific port
lsof -i :8080
# Find all network connections for a process
lsof -p 1234
# Find process using a specific file
lsof /var/log/app.log
- Use case: Port conflict resolution, file lock troubleshooting
- DevOps context: "Port already in use" errors, finding what's holding files
- Pro tip: Use the -t flag to get just PIDs: lsof -t -i :8080 (kill-by-port one-liner below)
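Combining the -t tip with kill gives a blunt one-liner for freeing a port; a sketch, assuming you have confirmed the process is safe to stop:
# Terminate whatever is listening on :8080 (verify with lsof -i :8080 first)
kill $(lsof -t -i :8080)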
3. Terminating Processes
Command: kill / killall
# First, find the PID (see section above)
pgrep nginx
# Output: 12345
# Graceful termination (SIGTERM)
kill 12345
# Force kill (SIGKILL) - use as last resort
kill -9 12345
# Kill all processes by name
killall nginx
# Graceful kill by name
pkill nginx
# Kill with full command match
pkill -f "java.*myapp"
- Use case: Stop unresponsive processes, force restart services
- DevOps context: Emergency process termination during incidents
- Warning: -9 skips cleanup; escalate only if a graceful stop fails
- Practice: TERM → short wait → KILL only if still present (see the sketch after the cheat sheet)
Signal cheat sheet:
# Common signals
kill -15 <PID> # SIGTERM (default, graceful shutdown)
kill -9 <PID> # SIGKILL (immediate termination, no cleanup)
kill -1 <PID> # SIGHUP (reload configuration)
kill -2 <PID> # SIGINT (Ctrl+C equivalent)
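A minimal sketch of the TERM → wait → KILL escalation described above ('myapp' and the 10-second grace period are placeholders):
# Ask nicely, give the process time to clean up, then force only if needed
pkill myapp
sleep 10
pgrep myapp > /dev/null && pkill -9 myapp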
4. System Resource Monitoring
Baseline + deltas = early warning. Watch these before things break.
free – Memory usage
Shows physical + swap breakdown plus reclaimable cache. Track pressure, not raw consumption.
# Display memory usage in human-readable format
free -h
# Show memory in megabytes
free -m
# Show memory in gigabytes
free -g
# Continuous monitoring (update every 2 seconds)
free -h -s 2
# Wide format (better column spacing)
free -h -w
Sample:
total used free shared buff/cache available
Mem: 15Gi 8.2Gi 1.1Gi 523Mi 6.1Gi 6.8Gi
Swap: 8.0Gi 301Mi 7.7Gi
Columns:
- total: Total installed RAM
- used: Memory currently in use by applications
- free: Completely unused memory (usually low—that's normal!)
- shared: Memory used by tmpfs (temporary filesystems)
- buff/cache: Memory used for file system caching (Linux uses "free" RAM for caching)
- available: MOST IMPORTANT - Memory available for new applications without swapping
Misread alert: Low “free” is normal. Focus on available.
Patterns:
# Check if system is low on memory
free -h
# If "available" is low (< 10% of total), investigate with top/htop
# Monitor memory during load testing
watch -n 1 'free -h'
# Check if application deployment increased memory usage
free -h # Before deployment
free -h # After deployment, compare "used" and "available"
# Quick one-liner to check available memory percentage
free | grep Mem | awk '{print ($7/$2) * 100 "%"}'
Red flags:
- Available < 10% total
- Swap steadily rising
- OOM killer messages in logs
Tips:
- Correlate with vmstat 1 (si/so columns), as shown below
- Rising swap + stable available = benign aging of idle pages
- Use pmap/smem for attribution if usage keeps creeping
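Correlating with vmstat, as suggested above: watch the si/so (swap-in/swap-out) columns; sustained non-zero values mean real memory pressure:
# 5 samples at 1-second intervals; check the si/so columns
vmstat 1 5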
df – Filesystem usage
Capacity exhaustion silently breaks writes, logging, queueing.
# Display disk space in human-readable format
df -h
# Show all filesystems (including tmpfs, devtmpfs)
df -ah
# Show only specific filesystem type
df -h -t ext4
# Exclude specific filesystem type (useful to hide tmpfs clutter)
df -h -x tmpfs -x devtmpfs
# Show inode usage instead of block usage
df -i
# Show filesystem type
df -T
Sample:
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 35G 13G 74% /
/dev/sdb1 500G 450G 26G 95% /var
tmpfs 7.8G 524M 7.3G 7% /run
Columns:
- Filesystem: Device or partition name
- Size: Total size of the filesystem
- Used: Space currently in use
- Avail: Available space for new data
- Use%: Percentage of space used
- Mounted on: Where this filesystem is accessible
Threshold guide: plan (80%), act (90%), urgent (95%), risk (100%).
Patterns:
# Quick disk space check (most common command)
df -h
# During incident: Find which filesystem is full
df -h | awk 'NR>1 && $5+0 > 80'
# Check if specific mount has enough space for deployment
df -h /opt/applications
# Monitor disk space during log-heavy operations
watch -n 5 'df -h | grep -E "(Filesystem|/var|/)"'
# Check inode usage (sometimes you run out of inodes, not space!)
df -i
# If inode Use% is high but df -h shows space available, you have too many small files
Inode exhaustion:
# Check inode usage
df -i
# Output shows:
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 3276800 3270000 6800 99% /
# Find directories with many files
find /var -xdev -type d -exec sh -c 'echo $(ls -a "$1" | wc -l) "$1"' _ {} \; | sort -n | tail -20
Tips:
- Alert on inode use too
- Prune old images/volumes
- Rotate/compress logs early
du – Directory utilization
Answers “where did the space go?”—pair with sorting.
# Check directory size (most common usage)
du -sh /var/log
# Find largest directories in /var
du -h --max-depth=1 /var | sort -hr | head -n 10
# Same but with better sorting (numeric, not alphabetic)
du -h /var | sort -rh | head -n 10
# Show sizes for all subdirectories
du -h --max-depth=2 /opt
# Include hidden files and directories
du -sh /home/user/.* /home/user/*
# Find total size of specific file types
du -ch /var/log/*.log | grep total
# Real-time monitoring during cleanup
watch -n 5 'du -sh /var/log'
# Find largest files in a directory
du -ah /var/log | sort -rh | head -n 20
# Exclude certain directories
du -h --exclude='*.git' /opt/project
Flags:
- -s: Summarize (show total only, don't list all files)
- -h: Human-readable sizes (K, M, G)
- -a: Include files, not just directories
- -c: Show grand total at the end
- --max-depth=N: Limit directory recursion depth
Patterns:
# Scenario 1: Disk is 95% full, find the culprit
df -h # Shows /var is full
du -h --max-depth=1 /var | sort -hr | head -10
# Output might show: 400G /var/log
du -h --max-depth=1 /var/log | sort -hr | head -10
# Output shows: 350G /var/log/application/old-logs
# Scenario 2: Docker eating disk space
du -sh /var/lib/docker
# Output: 80G /var/lib/docker
# Clean up old Docker images
docker system prune -a --volumes
# Scenario 3: Find all large log files
find /var/log -type f -size +100M -exec du -h {} \; | sort -rh
# Scenario 4: Compare disk usage before/after cleanup
du -sh /var/log > before.txt
# ... perform cleanup ...
du -sh /var/log > after.txt
diff before.txt after.txt
Performance: Limit recursion depth for speed.
# Fast: Just top-level directories
du -h --max-depth=1 /
# Slow: Scans entire filesystem
du -h /
Big consumers:
# Find top 10 largest directories under /
sudo du -h --max-depth=2 / 2>/dev/null | sort -hr | head -n 10
# Find top 20 largest files on system
sudo find / -type f -size +100M -exec du -h {} \; 2>/dev/null | sort -rh | head -n 20
Tips:
- ncdu for interactive cleanup
- Snapshot periodically for growth trends (sketch below)
- Exclude ephemeral mounts
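A minimal sketch of the snapshot tip, appending timestamped sizes to a trend file (the path and target are illustrative):
# Append a dated size sample; compare entries over days to spot growth
echo "$(date +%F) $(du -sh /var/log | cut -f1)" >> /var/tmp/du-varlog-trend.txt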
Command: uptime
# System uptime and load averages
uptime
- Use case: Check system stability, load averages
- DevOps context: Quick health check during incidents
- What the numbers mean: 1min, 5min, 15min load averages
Command: iostat
# I/O statistics
iostat -x 1
# Disk-specific stats
iostat -dx 1
- Use case: Diagnose disk I/O bottlenecks
- DevOps context: Performance troubleshooting, database issues
- What to watch: %util, await times
Part 2: Service Management
Control plane for systemd units. Check state, follow logs, trace failures.
systemctl
# List active services
systemctl list-units --type=service --state=running
# Service status + recent log tail
systemctl status nginx
# Start/stop/restart
sudo systemctl restart nginx
sudo systemctl stop nginx
sudo systemctl start nginx
# Enable on boot (creates symlink)
sudo systemctl enable nginx
# Disable auto-start
sudo systemctl disable nginx
# Check if enabled
systemctl is-enabled nginx
# Reload config without restart (if supported)
sudo systemctl reload nginx
# Show service file location
systemctl cat nginx
# Edit service override
sudo systemctl edit nginx # Creates drop-in at /etc/systemd/system/nginx.service.d/
Common failure modes:
# Service won't start—check why
systemctl status myapp
# Look for "Active: failed" + exit code
# See full error (status truncates)
journalctl -u myapp -n 50 --no-pager
# Check if crash-looping
systemctl list-units --state=failed
Triage checklist:
- systemctl status <unit> → exit code, recent logs
- journalctl -u <unit> -n 100 → full startup sequence
- Check deps: systemctl list-dependencies <unit>
- Verify the unit file: systemctl cat <unit> → paths, env, user
Tips:
- Use --no-pager in scripts to avoid truncation
- is-active / is-enabled return 0/non-zero for scripting (restart-and-verify sketch below)
- After changing unit files: sudo systemctl daemon-reload
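A restart-and-verify sketch using those exit codes ('myapp' is a placeholder unit name):
# Restart, then fail loudly if the unit didn't come back
sudo systemctl restart myapp
systemctl is-active --quiet myapp || echo "myapp failed to start"
journalctl -u myapp -n 20 --no-pager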
journalctl
Systemd's structured logging. Query by unit, time, priority, field.
# All logs for a unit
journalctl -u nginx
# Follow (like tail -f)
journalctl -u nginx -f
# Last N lines
journalctl -u nginx -n 100
# Time-based filtering
journalctl -u nginx --since "2025-10-07 14:00:00"
journalctl -u nginx --since "1 hour ago"
journalctl -u nginx --since today
# Priority filtering (emerg, alert, crit, err, warning, notice, info, debug)
journalctl -u nginx -p err
# Combine: errors in last hour
journalctl -u nginx -p err --since "1 hour ago"
# Reverse order (newest first)
journalctl -u nginx -r
# Show only from current boot
journalctl -u nginx -b
# Kernel messages (dmesg equivalent)
journalctl -k
# Disk usage
journalctl --disk-usage
# Vacuum old logs (keep last 7 days)
sudo journalctl --vacuum-time=7d
Powerful field filtering:
# All logs from specific executable
journalctl _COMM=sshd
# Logs from specific PID
journalctl _PID=1234
# By user
journalctl _UID=1000
# Multiple units
journalctl -u nginx -u mysql
Output formats:
# JSON for parsing
journalctl -u nginx -o json
# Short (syslog-style)
journalctl -u nginx -o short
# Verbose (all fields)
journalctl -u nginx -o verbose
Post-deployment workflow:
# Mark time, deploy, follow logs
date # Note timestamp
# ... deploy ...
journalctl -u myapp --since "30 seconds ago" -f
# Or filter by priority
journalctl -u myapp -p warning -f
Tips:
- Add --no-pager for grep-able output (error-count example below)
- Use -x for explanatory help text on errors
- Combine -f with -n 0 to follow without history
- Set retention in /etc/systemd/journald.conf → MaxRetentionSec
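Combining --no-pager with grep gives a quick error count for a deploy window (the unit name and window are illustrative):
# Count error-looking lines from the last 30 minutes
journalctl -u myapp --since "30 minutes ago" --no-pager | grep -ci error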
Part 3: Network Troubleshooting
Layer-by-layer diagnostics: connectivity, DNS, sockets, routing, application.
ss (socket statistics)
Replaced netstat. Shows listening + established sockets, optionally with owning processes.
# List all listening TCP/UDP ports
ss -tuln
# Show processes owning sockets (requires root)
sudo ss -tulpn
# TCP connections only
ss -t
# Established connections
ss -tn state established
# Count connections by state
ss -tan | awk '{print $1}' | sort | uniq -c
# Show specific port
ss -tuln | grep :8080
# Numeric (don't resolve names—faster)
ss -n
# Summary stats
ss -s
Triage patterns:
# "Port already in use" → find what's listening
sudo ss -tulpn | grep :8080
# Check if service bound correctly
ss -tuln | grep :3306 # MySQL example
# Too many connections?
ss -tn | wc -l
# Which IPs connecting most?
ss -tn | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head
Tips:
- -t TCP, -u UDP, -l listening, -n numeric, -p processes
- Much faster than the old netstat
- Use -4 or -6 to filter by IP version
ip (network configuration)
Modern replacement for ifconfig / route. Configure interfaces, routes, tunnels.
# Show all interfaces
ip addr show
# or shorthand:
ip a
# Specific interface
ip addr show eth0
# Show routes
ip route show
# or:
ip r
# Add/delete IP
sudo ip addr add 192.168.1.100/24 dev eth0
sudo ip addr del 192.168.1.100/24 dev eth0
# Bring interface up/down
sudo ip link set eth0 up
sudo ip link set eth0 down
# Show link statistics
ip -s link
# Neighbor table (ARP cache)
ip neigh show
# Routing table with more detail
ip route show table all
Common scenarios:
# Verify IP after DHCP/static config
ip addr show | grep inet
# Check default gateway
ip route show default
# Add temporary route
sudo ip route add 10.0.0.0/8 via 192.168.1.1
# Flush specific interface IPs
sudo ip addr flush dev eth0
# Check MTU
ip link show eth0 | grep mtu
Tips:
- Persistent changes need /etc/network/interfaces or NetworkManager
- Use -c for color output: ip -c a
- Use -br for brief table format: ip -br a
curl
HTTP client for API testing, health checks, troubleshooting.
# Basic GET
curl https://api.example.com/health
# Include response headers
curl -I https://example.com
# or verbose:
curl -v https://example.com
# Follow redirects
curl -L https://short.link
# POST JSON
curl -X POST https://api.example.com/data \
-H "Content-Type: application/json" \
-d '{"key":"value"}'
# POST form data
curl -X POST https://example.com/form \
-d "username=admin&password=secret"
# Custom headers
curl -H "Authorization: Bearer TOKEN" https://api.example.com
# Save output
curl -o output.html https://example.com
# Silent (no progress)
curl -s https://api.example.com/status
# Fail on HTTP errors
curl -f https://api.example.com
# Test specific IP (bypass DNS)
curl --resolve example.com:443:192.168.1.100 https://example.com
# Check TLS cert expiry
curl -vI https://example.com 2>&1 | grep "expire"
# Download with resume support
curl -C - -O https://example.com/largefile.iso
# Set timeout
curl --connect-timeout 5 --max-time 10 https://slow-server.com
Health check patterns:
# Simple availability
curl -f -s https://app.example.com/health && echo "UP" || echo "DOWN"
# Check response time
time curl -s https://api.example.com > /dev/null
# Validate status code
curl -s -o /dev/null -w "%{http_code}" https://example.com
# JSON response parsing (with jq)
curl -s https://api.example.com/status | jq '.status'
# Check from specific source IP (multi-homed)
curl --interface eth1 https://example.com
Debugging workflow:
# Start verbose
curl -v https://problem-site.com
# Check SSL handshake
curl -v https://problem-site.com 2>&1 | grep -E "SSL|TLS|certificate"
# Test with/without HTTP2
curl --http2 https://example.com
curl --http1.1 https://example.com
# Bypass proxy temporarily
curl --noproxy '*' https://example.com
Tips:
- -s (silent) + -S (show errors) = script-friendly (probe example below)
- Use -w for custom output format (status, time, size)
- -k skips cert validation (insecure, use only for testing)
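A script-friendly probe built from those flags, with retries for transient failures (the URL is a placeholder):
# Fail on HTTP errors, surface transport errors, retry up to 5 times
curl -sS -f --retry 5 --retry-delay 2 --max-time 10 https://app.example.com/health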
wget
File downloader. Better than curl for recursive downloads and resume.
# Download file
wget https://example.com/file.tar.gz
# Save with different name
wget -O custom-name.tar.gz https://example.com/file.tar.gz
# Resume interrupted download
wget -c https://example.com/large-file.iso
# Background download
wget -b https://example.com/huge-file.zip
# Limit rate (bandwidth throttle)
wget --limit-rate=1m https://example.com/file.zip
# Mirror entire site
wget -m -p -k https://example.com
# Download with auth
wget --user=admin --password=secret https://example.com/file
# Retry on failure
wget --tries=5 https://unreliable-server.com/file
Comparison to curl:
- wget: better for downloads, mirroring, recursive retrieval
- curl: better for API testing, custom headers, protocol flexibility
ping
ICMP echo test. Verify basic IP connectivity.
# Continuous ping
ping google.com
# Send N packets
ping -c 4 google.com
# Flood ping (requires root, use carefully)
sudo ping -f 192.168.1.1
# Set interval (seconds)
ping -i 0.5 google.com
# Specify interface
ping -I eth0 192.168.1.1
# IPv6
ping6 google.com
Usage:
- First test in network triage
- No response: check firewalls, routing, host down
- High latency: network congestion or distance
- Packet loss: unstable link
Tips:
- Use -c in scripts to avoid infinite loops
- Firewalls often block ICMP (no reply doesn't prove the host is down)
- Pair with mtr for path analysis
traceroute / mtr
Show network path to destination.
# Basic traceroute
traceroute google.com
# Use ICMP instead of UDP
sudo traceroute -I google.com
# Set max hops
traceroute -m 20 google.com
# Better: mtr (combines traceroute + ping)
mtr google.com
# mtr report mode (10 cycles)
mtr -r -c 10 google.com
Reading output:
- Each line = hop (router)
- * * * = timeout (often a firewall dropping probes)
- High latency at a specific hop = bottleneck
- Loss early in path = problem near you; late = problem near destination
When to use: Multi-region connectivity issues, asymmetric routing, finding slow link.
dig / nslookup
DNS query tools. Verify resolution, check propagation, debug CDN issues.
# Basic lookup
dig example.com
# Query specific DNS server
dig @8.8.8.8 example.com
# Short answer only
dig +short example.com
# Reverse DNS
dig -x 8.8.8.8
# Specific record type
dig example.com MX
dig example.com TXT
dig example.com AAAA # IPv6
# Trace full resolution path
dig +trace example.com
# No recursion (ask server directly)
dig +norecurs example.com
# Query time
dig example.com | grep "Query time"
Common tasks:
# Verify DNS change propagated
dig @8.8.8.8 example.com # Google DNS
dig @1.1.1.1 example.com # Cloudflare DNS
# Check TTL
dig example.com | grep -E "^example.com"
# Find authoritative nameservers
dig example.com NS
# Test internal DNS
dig @10.0.0.53 internal.example.com
nslookup alternative:
# Basic query
nslookup example.com
# Specific server
nslookup example.com 8.8.8.8
# Reverse
nslookup 8.8.8.8
Triage pattern: "site not loading"
# 1. Can resolve?
dig example.com +short
# 2. Correct IP?
dig example.com +short
# Compare to expected
# 3. DNS server issue?
dig @8.8.8.8 example.com +short # Public DNS
dig example.com +short # System resolver
# 4. Stale cache?
# (Clear local cache: sudo resolvectl flush-caches; older systems: systemd-resolve --flush-caches)
# 5. Check from multiple perspectives
dig @1.1.1.1 example.com +short
dig @8.8.8.8 example.com +short
Tips:
- +short for script parsing
- +trace shows the delegation chain (helpful for debugging zone config)
- Use multiple DNS servers to verify propagation (see the loop below)
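A small propagation check that loops over several public resolvers (the resolver list is illustrative):
# Mismatched answers across resolvers = propagation still in progress
for ns in 8.8.8.8 1.1.1.1 9.9.9.9; do
  echo -n "$ns: "; dig @$ns example.com +short
done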
Part 4: File Operations & Text Processing
Search, filter, transform. The foundation of log triage and automation scripting.
find
Locate files by name, size, time, type. Execute batch operations.
# Find by name pattern
find /var/log -name "*.log"
# Case-insensitive
find /var/log -iname "*.LOG"
# Modified in last 7 days
find /var/log -mtime -7
# Modified more than 30 days ago
find /var/log -mtime +30
# Accessed in last 24 hours
find /var/log -atime -1
# Large files (>100MB)
find /var -size +100M
# Files between 10MB and 100MB
find /var -size +10M -size -100M
# Empty files
find /var/log -type f -empty
# Directories only
find /opt -type d -name "cache"
# Execute command on results
find /var/log -name "*.log" -exec gzip {} \;
# Safer: prompt before action
find /var/log -name "*.old" -ok rm {} \;
# Delete (use cautiously!)
find /tmp -name "*.tmp" -mtime +7 -delete
# Combine conditions (AND)
find /var/log -name "*.log" -size +100M -mtime +30
# OR logic
find /var/log \( -name "*.log" -o -name "*.txt" \)
# Exclude pattern
find /var -name "*.log" ! -path "*/archive/*"
# Limit depth
find /var -maxdepth 2 -name "*.conf"
Practical patterns:
# Find largest log files
find /var/log -type f -size +10M -exec ls -lh {} \; | sort -k5 -hr
# Old logs for cleanup
find /var/log -name "*.log.*" -mtime +90 -ls
# Recently changed configs (last 2 days)
find /etc -name "*.conf" -mtime -2
# World-writable files (security audit)
find /var/www -type f -perm -002
# Setuid binaries (security scan)
find / -type f -perm -4000 2>/dev/null
# Files owned by specific user
find /home -user bob -name "*.sh"
# Broken symlinks
find /opt -type l ! -exec test -e {} \; -print
Tips:
- Test with -ls or -print before using -delete
- Use -print0 + xargs -0 for filenames with spaces (example below)
- Redirect stderr (2>/dev/null) to hide permission errors
- -mtime 0 = today, -mtime 1 = yesterday
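The -print0 tip in practice, safe for filenames containing spaces or newlines:
# NUL-delimited pipeline: list the ten largest logs without breaking on odd names
find /var/log -name "*.log" -print0 | xargs -0 du -h | sort -h | tail -n 10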
grep
Pattern matching in files. Core tool for log analysis.
# Basic search
grep "error" /var/log/app.log
# Case-insensitive
grep -i "ERROR" /var/log/app.log
# Show line numbers
grep -n "error" /var/log/app.log
# Count matches
grep -c "error" /var/log/app.log
# Invert match (lines NOT containing pattern)
grep -v "debug" /var/log/app.log
# Show context (3 lines before and after)
grep -C 3 "fatal" /var/log/app.log
# Or separately:
grep -B 3 -A 3 "fatal" /var/log/app.log
# Recursive search in directory
grep -r "TODO" /app/src
# Only show filenames
grep -rl "password" /etc
# Multiple patterns (OR)
grep -E "error|warn|fatal" /var/log/app.log
# Or:
grep -e "error" -e "warn" /var/log/app.log
# Whole word match
grep -w "fail" /var/log/app.log # Won't match "failure"
# Extended regex
grep -E "^(ERROR|WARN)" /var/log/app.log
# Perl regex
grep -P "\d{3}-\d{3}-\d{4}" contacts.txt # Phone numbers
# Fixed strings (no regex—faster)
grep -F "user@example.com" /var/log/mail.log
# Binary files
grep -a "string" binaryfile # Treat as text
# With file name in output
grep -H "pattern" *.log
Log triage patterns:
# Errors in last hour (combine with journalctl/tail)
grep -i error /var/log/app.log | tail -n 100
# Filter noise
grep error /var/log/app.log | grep -v "harmless warning"
# Extract IPs
grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" access.log | sort | uniq -c
# Extract timestamps + errors
grep "ERROR" /var/log/app.log | grep -oP "\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
# Show only unique errors
grep ERROR /var/log/app.log | sort | uniq
# Count error types
grep ERROR /var/log/app.log | awk '{print $5}' | sort | uniq -c | sort -rn
# Errors NOT from specific component
grep ERROR /var/log/app.log | grep -v "HealthCheck"
# Multi-level filtering
grep 500 access.log | grep -v bot | grep POST
Tips:
- Use -E for extended regex (or egrep)
- grep -v is incredibly useful for filtering noise
- Combine with | less for scrolling long output
- Save common patterns as aliases
tail / head
View file start or end. Essential for log monitoring.
# Last 10 lines (default)
tail /var/log/app.log
# Last N lines
tail -n 50 /var/log/app.log
# Shorthand:
tail -50 /var/log/app.log
# Follow (live updates)
tail -f /var/log/app.log
# Follow multiple files
tail -f /var/log/app1.log /var/log/app2.log
# Follow with line numbers
tail -n 100 -f /var/log/app.log | cat -n
# Start from line N
tail -n +100 /var/log/app.log # From line 100 to end
# First 10 lines
head /var/log/app.log
# First N lines
head -n 20 /var/log/app.log
# All but last N lines
head -n -10 /var/log/app.log # Everything except last 10
Live monitoring patterns:
# Follow + filter
tail -f /var/log/app.log | grep ERROR
# Multiple filters
tail -f /var/log/app.log | grep -E "ERROR|WARN" | grep -v "HealthCheck"
# Follow + highlight
tail -f /var/log/app.log | grep --color=always -E "ERROR|$"
# Follow with timestamps
tail -f /var/log/app.log | while read line; do echo "$(date +%T) $line"; done
# Stop after match appears
tail -f /var/log/app.log | grep -m 1 "Startup complete"
Tips:
- tail -f follows by name; use -F to handle log rotation
- less +F filename = tail -f with scroll-back (Ctrl-C to pause)
- multitail for advanced multi-file monitoring with colors
awk
Pattern-directed text processing. Extract columns, aggregate, filter.
# Print specific columns (space-delimited)
ps aux | awk '{print $1, $11}'
# Print with header
ps aux | awk 'NR==1 || $3>50' # Header + high CPU
# Filter by condition
df -h | awk 'NR>1 && $5+0 > 80' # >80% full
# Sum values
cat numbers.txt | awk '{sum += $1} END {print sum}'
# Average
awk '{sum+=$1; count++} END {print sum/count}' numbers.txt
# Field separator (CSV)
awk -F',' '{print $2}' data.csv
# Multiple separators
awk -F'[,:]' '{print $3}' file.txt
# Print last field
awk '{print $NF}' file.txt
# String matching
awk '/error/ {print $0}' /var/log/app.log
# Negation
awk '!/debug/ {print $0}' /var/log/app.log
# Count occurrences
awk '/ERROR/ {count++} END {print count}' /var/log/app.log
# Unique values
awk '{print $5}' file.txt | sort | uniq
Practical examples:
# Extract IPs from access log
awk '{print $1}' access.log | sort | uniq -c | sort -rn
# Process count by command name
ps aux | awk '{count[$11]++} END {for (i in count) print count[i], i}' | sort -rn
# Parse structured logs
awk -F'|' '{print $3}' app.log | sort | uniq -c
# Traffic by hour
awk '{print $4}' access.log | cut -d: -f2 | sort | uniq -c
# Calculate percentiles (simplified)
awk '{print $9}' response_times.txt | sort -n | awk '{a[NR]=$1} END {print a[int(NR*0.95)]}'
# Status code distribution
awk '{print $9}' access.log | sort | uniq -c | sort -rn
Tips:
- Start simple: {print $N} to extract columns
- NR = line number, NF = number of fields
- Use -F to set the delimiter (default is whitespace)
- Great for quick one-liners; for complex logic, use Python/Perl
sed
Stream editor. Substitute text, delete lines, transform input.
# Replace first occurrence
sed 's/old/new/' file.txt
# Replace all occurrences
sed 's/old/new/g' file.txt
# In-place edit (DANGEROUS—test first!)
sed -i 's/old/new/g' file.txt
# Backup before in-place edit
sed -i.bak 's/old/new/g' file.txt
# Delete lines matching pattern
sed '/pattern/d' file.txt
# Delete specific line number
sed '5d' file.txt
# Delete range
sed '10,20d' file.txt
# Print only matching lines (like grep)
sed -n '/pattern/p' file.txt
# Substitute only on matching lines
sed '/ERROR/ s/foo/bar/g' file.txt
# Multiple commands
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt
# Case-insensitive replace
sed 's/error/ERROR/gI' file.txt
# Add line after match
sed '/pattern/a\New line here' file.txt
# Insert line before match
sed '/pattern/i\New line here' file.txt
# Replace in specific lines
sed '10,20s/old/new/g' file.txt
Config file updates:
# Change port number
sed -i 's/^port.*/port 8080/' config.ini
# Uncomment line
sed -i 's/^# *\(option.*\)/\1/' config.file
# Comment out line
sed -i 's/^dangerous_setting/#&/' config.file
# Replace variable
sed -i "s|OLD_PATH|$NEW_PATH|g" script.sh
# Multi-line replace (advanced)
sed -i ':a;N;$!ba;s/foo\nbar/baz/g' file.txt
Tips:
- Always test without -i first (see the diff workflow below)
- Use | as the delimiter if the pattern contains /: s|/path/|/newpath/|
- Backup with -i.bak before modifying production configs
- For complex edits, consider perl -pi -e or Python
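A test-first workflow for the first tip: preview the substitution as a diff before committing with -i (diff reads the edited stream from stdin via -):
# Show exactly what would change, without touching the file
sed 's/old/new/g' config.ini | diff config.ini -
# If the diff looks right, apply with a backup
sed -i.bak 's/old/new/g' config.ini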
cut
Extract columns from delimited text.
# Extract first field (default tab delimiter)
cut -f1 file.txt
# CSV (comma delimiter)
cut -d',' -f2 file.csv
# Multiple fields
cut -d':' -f1,3 /etc/passwd
# Range of fields
cut -d',' -f1-3 file.csv
# All fields except N
cut -d':' -f1,3- /etc/passwd # 1, then 3 onwards
# Character positions
cut -c1-10 file.txt
# Extract username from email
cut -d'@' -f1 emails.txt
# Combine with other commands
ps aux | tr -s ' ' | cut -d' ' -f1,11
Tips:
- Fast and simple for fixed-format data
- Use awk for more complex field extraction
- Pair with tr to normalize delimiters
Bonus: jq
JSON processor (install separately). Essential for API work.
# Pretty-print JSON
curl -s https://api.example.com | jq '.'
# Extract field
curl -s https://api.example.com/status | jq '.status'
# Array element
jq '.[0]' array.json
# Filter array
jq '.[] | select(.active == true)' data.json
# Extract multiple fields
jq '.[] | {id, name}' data.json
# Count items
jq '. | length' array.json
# Map over array
jq '.[] | .price * 1.1' items.json # Add 10%
Tips:
- Invaluable for parsing API responses
- Use -r for raw output (no quotes)
- Combine with curl for API testing pipelines
Part 5: Remote Operations & File Transfer
Connect, execute, sync. Key tools for multi-server management.
ssh
Secure remote shell access and command execution.
# Basic connection
ssh user@hostname
# Specific port
ssh -p 2222 user@hostname
# Use specific key
ssh -i ~/.ssh/id_rsa_custom user@hostname
# Execute single command
ssh user@hostname "systemctl status nginx"
# Execute multiple commands
ssh user@hostname "cd /opt && ./deploy.sh && systemctl restart app"
# Local port forwarding (tunnel)
ssh -L 8080:localhost:80 user@hostname
# Access remote :80 via local :8080
# Remote port forwarding
ssh -R 9000:localhost:3000 user@hostname
# Remote can access your local :3000 via their :9000
# Dynamic SOCKS proxy
ssh -D 1080 user@hostname
# Configure browser to use localhost:1080
# Jump host (bastion)
ssh -J jump-host@bastion.example.com user@internal-server
# X11 forwarding
ssh -X user@hostname
# Run GUI apps remotely
# Keep connection alive
ssh -o ServerAliveInterval=60 user@hostname
# Disable strict host key checking (testing only!)
ssh -o StrictHostKeyChecking=no user@hostname
# Run command with sudo
ssh -t user@hostname "sudo systemctl restart nginx"
# -t allocates pseudo-terminal (required for sudo password)
SSH config (~/.ssh/config):
# Create config for easy access
cat >> ~/.ssh/config << EOF
Host prod-web
HostName 192.168.1.100
User deploy
Port 2222
IdentityFile ~/.ssh/prod_key
ServerAliveInterval 60
Host *.internal
ProxyJump bastion.example.com
User ops
EOF
# Now simply:
ssh prod-web
ssh server1.internal
ControlMaster (connection reuse):
# Add to ~/.ssh/config
Host *
ControlMaster auto
ControlPath ~/.ssh/cm-%r@%h:%p
ControlPersist 10m
# First connection creates socket; subsequent ones reuse it
# Much faster for repeated commands
Security hardening:
# Server-side /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers deploy ops
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
Key generation:
# Generate new key pair
ssh-keygen -t ed25519 -C "user@hostname"
# Copy public key to server
ssh-copy-id user@hostname
# Manual copy (if ssh-copy-id unavailable)
cat ~/.ssh/id_ed25519.pub | ssh user@hostname "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
Tips:
- Use ed25519 keys over RSA (faster, more secure)
- ControlMaster speeds up Ansible, scripts dramatically
- Jump hosts simplify bastion access
- Never disable StrictHostKeyChecking in production
scp
Secure copy between hosts.
# Copy file to remote
scp file.txt user@hostname:/path/to/destination
# Copy from remote
scp user@hostname:/path/to/file.txt ./
# Copy directory recursively
scp -r ./directory user@hostname:/path/
# Preserve permissions and timestamps
scp -p file.txt user@hostname:/path/
# Specific port
scp -P 2222 file.txt user@hostname:/path/
# Use specific key
scp -i ~/.ssh/custom_key file.txt user@hostname:/path/
# Verbose (debugging)
scp -v file.txt user@hostname:/path/
# Limit bandwidth (KB/s)
scp -l 1000 large-file.iso user@hostname:/path/
# Via jump host
scp -J bastion.example.com file.txt user@internal-server:/path/
# Between two remote hosts
scp user1@host1:/path/file.txt user2@host2:/path/
Tips:
- -P for port (capital P, unlike ssh's -p)
- Use rsync for large transfers (better resume, progress)
- scp doesn't handle interruptions well
rsync
Intelligent file synchronization. Handles interruptions, only transfers changes.
# Basic sync
rsync -avz /local/path/ user@hostname:/remote/path/
# Flags explained:
# -a = archive (recursive, preserve permissions, times, symlinks)
# -v = verbose
# -z = compress during transfer
# Dry run (preview changes)
rsync -avzn /local/path/ user@hostname:/remote/path/
# Show progress
rsync -avz --progress /local/path/ user@hostname:/remote/path/
# Delete files on destination not in source (dangerous!)
rsync -avz --delete /local/path/ user@hostname:/remote/path/
# Exclude patterns
rsync -avz --exclude='*.log' --exclude='tmp/' /local/ user@host:/remote/
# Include only specific patterns
rsync -avz --include='*.conf' --include='*/' --exclude='*' /etc/ backup/
# Partial transfer support (resume)
rsync -avz --partial /local/large-file user@hostname:/remote/
# Bandwidth limit (KB/s)
rsync -avz --bwlimit=1000 /local/ user@hostname:/remote/
# Custom SSH port
rsync -avz -e "ssh -p 2222" /local/ user@hostname:/remote/
# Via jump host
rsync -avz -e "ssh -J bastion" /local/ user@internal:/remote/
# Local sync (no SSH)
rsync -avz /source/ /destination/
# Show what changed
rsync -avzi /local/ user@hostname:/remote/
# i = itemize changes
# Backup with hard links (space-efficient)
rsync -avz --link-dest=/backup/previous /data/ /backup/current/
Practical backup script:
#!/bin/bash
DATE=$(date +%Y%m%d)
DEST="/backup/$DATE"
PREV="/backup/latest"
# Create incremental backup
rsync -avz --link-dest="$PREV" /data/ "$DEST/"
# Update latest symlink
ln -snf "$DEST" /backup/latest
Deployment pattern:
# Deploy with dry run first
rsync -avzn --delete /local/app/ prod-server:/opt/app/
# If ok, deploy for real
rsync -avz --delete /local/app/ prod-server:/opt/app/
# Restart service
ssh prod-server "systemctl restart app"
Tips:
- Trailing slash matters: /path/ syncs contents, /path syncs the directory itself
- Always test with -n first when using --delete
- Faster than scp for large or repeated transfers
- Use --checksum for verification (slower but accurate)
Part 6: System Information Commands
Command: hostname
# Show hostname
hostname
# Show IP address
hostname -i
# Show FQDN
hostname -f
- Use case: Identify which server you're on
- DevOps context: Multi-server management, cluster identification
- Pro tip: Essential in tmux/screen sessions
Command: uname
# Show kernel version
uname -r
# Show all system info
uname -a
- Use case: Check OS version, kernel compatibility
- DevOps context: Verifying system requirements, documentation
Command: date
# Current date/time
date
# Format date
date +"%Y-%m-%d %H:%M:%S"
# UTC time
date -u
# Set date (requires sudo)
sudo date -s "2025-10-07 14:30:00"
- Use case: Timestamps, time synchronization checks
- DevOps context: Log analysis, scheduling tasks
- Pro tip: Use NTP for time sync, not manual setting
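On systemd-based systems, timedatectl is a quick way to confirm NTP sync rather than setting the clock by hand:
# Shows local/UTC time, the NTP service state, and whether the clock is synchronized
timedatectl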
Part 7: User & Permission Management
sudo — Controlled Privilege Escalation
# Execute single command as root
sudo systemctl restart nginx
# Open root shell
sudo -i
# Run as specific user
sudo -u postgres psql
# Preserve environment
sudo -E env | grep PATH
# Edit privileged file safely
sudo visudo
sudoedit /etc/nginx/nginx.conf
# Validate sudoers syntax
sudo visudo -c
# Show sudo access
sudo -l
# Log sudo commands
sudo tail -f /var/log/secure | grep sudo
Common sudo workflows:
# Service restart
sudo systemctl restart myapp
# File permissions fix
sudo chown -R appuser:appuser /opt/app
# Package install
sudo dnf install nginx
# Log inspection
sudo journalctl -u nginx -f
# Config edits
sudo vim /etc/systemd/system/myapp.service
Security hardening:
# Limit sudo timeout
echo "Defaults timestamp_timeout=5" | sudo tee -a /etc/sudoers.d/timeout
# Require password always
echo "Defaults !tty_tickets" | sudo tee -a /etc/sudoers.d/notty
# Log all sudo commands
echo "Defaults logfile=/var/log/sudo.log" | sudo tee -a /etc/sudoers.d/logging
# Restrict commands per user
echo "deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart myapp" | sudo tee /etc/sudoers.d/deploy
Troubleshooting sudo:
- Check /var/log/secure for auth failures
- Verify the user is in the wheel/sudo group: groups username
- Validate sudoers: sudo visudo -c
- Test access: sudo -l -U username
Tips:
- Never edit /etc/sudoers directly; use visudo
- Use sudoedit for safe config editing
- Grant minimal permissions (specific commands only)
- Log sudo activity for audit trail
chmod — Permission Control
# Make script executable
chmod +x deploy.sh
# Numeric permissions
chmod 755 script.sh # rwxr-xr-x
chmod 644 config.yml # rw-r--r--
chmod 600 secret.key # rw-------
chmod 700 private/ # rwx------
# Symbolic mode
chmod u+x script.sh # User: add execute
chmod g-w config.yml # Group: remove write
chmod o-rwx secret.key # Others: remove all
chmod a+r readme.txt # All: add read
# Recursive
chmod -R 755 /var/www/html
# Special bits
chmod u+s /usr/bin/sudo # setuid
chmod g+s /opt/shared # setgid
chmod +t /tmp # sticky bit
Permission reference:
| Octal | Binary | Symbolic | Meaning |
| ----- | ------ | -------- | -------------------- |
| 7 | 111 | rwx | read, write, execute |
| 6 | 110 | rw- | read, write |
| 5 | 101 | r-x | read, execute |
| 4 | 100 | r-- | read only |
| 3 | 011 | -wx | write, execute |
| 2 | 010 | -w- | write only |
| 1 | 001 | --x | execute only |
| 0 | 000 | --- | no permissions |
Deployment patterns:
# Web server files
sudo chmod -R 755 /var/www/html
sudo chmod 644 /var/www/html/*.html
# Application directories
sudo chmod 755 /opt/app
sudo chmod 755 /opt/app/bin/*
sudo chmod 644 /opt/app/config/*
sudo chmod 600 /opt/app/config/secrets.yml
# Logs (application needs write)
sudo chmod 755 /var/log/myapp
sudo chmod 644 /var/log/myapp/*.log
# Shared team directory (setgid)
sudo chmod 2775 /opt/shared
# New files inherit group ownership
Security considerations:
# Find world-writable files (danger)
find / -type f -perm -002 2>/dev/null
# Find setuid binaries (audit these)
find / -type f -perm -4000 2>/dev/null
# SSH key permissions (strict)
chmod 600 ~/.ssh/id_rsa
chmod 644 ~/.ssh/id_rsa.pub
chmod 700 ~/.ssh
# Config files with secrets
chmod 600 /opt/app/config/database.yml
Tips:
- 755 for directories, 644 for files (default safe values; see the find pattern below)
- 600 for secrets (owner read/write only)
- Use setgid (2775) for shared team directories
- Never chmod 777 (world-writable = security risk)
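A common way to apply the 755/644 defaults precisely, using find so directories and files get different modes (adjust the root path as needed):
# Directories: 755 (traversable), files: 644 (no stray execute bits)
find /var/www/html -type d -exec chmod 755 {} +
find /var/www/html -type f -exec chmod 644 {} +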
chown / chgrp — Ownership Management
# Change owner and group
sudo chown user:group file.txt
# Change owner only
sudo chown user file.txt
# Change group only
sudo chgrp group file.txt
sudo chown :group file.txt
# Recursive ownership
sudo chown -R www-data:www-data /var/www/html
# Follow symlinks
sudo chown -LR user:group /path/with/symlinks
# Report changes
sudo chown -v user:group file.txt
# Reference another file
sudo chown --reference=/etc/nginx/nginx.conf custom.conf
Application deployment:
# After deployment, fix ownership
sudo chown -R appuser:appgroup /opt/app
# Web server content
sudo chown -R nginx:nginx /var/www/html
# Database files
sudo chown -R postgres:postgres /var/lib/pgsql
# Log files
sudo chown -R appuser:appgroup /var/log/myapp
Shared directory pattern:
# Create shared directory
sudo mkdir /opt/shared
sudo chgrp developers /opt/shared
sudo chmod 2775 /opt/shared # setgid + group write
# Now all files created inherit 'developers' group
# Team members can collaborate without permission issues
Troubleshooting ownership:
# Find files by owner
find /opt/app -user olduser
# Bulk ownership change
find /opt/app -user olduser -exec sudo chown newuser:newgroup {} +
# Check who owns critical files
ls -l /etc/systemd/system/myapp.service
ls -l /opt/app/bin/start.sh
# Verify web server can read
sudo -u nginx cat /var/www/html/index.html
Tips:
- Use -R carefully; avoid / or /home
- Application files should be owned by the app user
- Use setgid for shared directories
- Always check ownership after deployment
getent — User/Group Queries
# Query user database
getent passwd username
getent passwd | grep -i john
# Query group database
getent group developers
getent group | grep -i admin
# Check if user exists (script-friendly)
getent passwd deploy > /dev/null && echo "User exists"
# List all groups a user belongs to
getent group | grep username
# Better: use `groups` or `id`
groups username
id -nG username
# Query shadow (requires root)
sudo getent shadow username
# Check LDAP/AD users
getent passwd | wc -l # Total users including LDAP
# Hosts database
getent hosts example.com
# Services database
getent services http
getent services 80
Troubleshooting access:
# User verification workflow
getent passwd username
groups username
id username
sudo -l -U username
# Check application user
getent passwd appuser
ps aux | grep appuser
sudo ls -la /opt/app
# Verify group membership
getent group developers
# Does output include expected user?
# Find user's primary group
id -gn username
# Audit sudo access
getent group wheel
getent group sudo
Tips:
- getent queries all NSS sources (local + LDAP/AD)
- Use it for portability (works across Linux distros)
- Better than cat /etc/passwd (which misses LDAP/AD users)
- Combine with id and groups for the complete picture
Part 8: DevOps-Specific Power Commands
xargs — Argument Builder for Bulk Operations
# Basic usage
echo "file1 file2 file3" | xargs rm
# From find output
find /tmp -name "*.tmp" | xargs rm
# Parallel execution
cat servers.txt | xargs -P 4 -I {} ssh {} "uptime"
# Handle filenames with spaces
find . -name "*.log" -print0 | xargs -0 rm
# Interactive confirmation
find . -name "*.bak" | xargs -p rm
# One argument per command
cat users.txt | xargs -n 1 sudo useradd
# Custom placeholder
cat servers.txt | xargs -I HOST ssh HOST "df -h"
# Max processes
seq 1 100 | xargs -P 10 -I {} curl "http://api/item/{}"
# Show command before execution
echo "file1 file2" | xargs -t rm
Bulk operations:
# Kill all processes matching a pattern
# (true zombies can't be killed; restart their parent process instead)
ps aux | grep '[m]yapp' | awk '{print $2}' | xargs kill -9
# Restart services across servers
cat servers.txt | xargs -I {} ssh {} "systemctl restart nginx"
# Parallel healthcheck
cat production-servers.txt | xargs -P 10 -I {} sh -c 'curl -sf http://{}/health || echo "{} DOWN"'
# Download multiple files
cat urls.txt | xargs -n 1 -P 4 wget
# Change ownership in bulk
find /opt/app -user olduser -print0 | xargs -0 sudo chown newuser:newgroup
# Parallel log search
cat servers.txt | xargs -P 5 -I {} ssh {} "grep ERROR /var/log/app.log"
# Compress old logs
find /var/log -name "*.log" -mtime +30 -print0 | xargs -0 gzip
# Delete empty directories
find . -type d -empty -print0 | xargs -0 rmdir
Deployment automation:
# Deploy to multiple servers
cat prod-servers.txt | xargs -P 3 -I {} sh -c '
echo "Deploying to {}"
rsync -az /local/app/ {}:/opt/app/
ssh {} "systemctl restart myapp"
ssh {} "curl -sf http://localhost:8080/health"
'
# Parallel config updates
cat servers.txt | xargs -P 5 -I {} scp config.yml {}:/opt/app/config/
# Bulk log rotation
cat servers.txt | xargs -I {} ssh {} "sudo logrotate -f /etc/logrotate.conf"
Tips:
- -P N for parallel execution (speedup)
- -0 with find -print0 for filenames with spaces
- -I {} for a custom placeholder
- -n 1 to run one argument at a time
- Test with echo before destructive operations (dry-run example below)
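The echo dry-run tip in practice: prefixing the real command with echo makes xargs print what it would run instead of running it:
# Prints the rm command and its arguments without deleting anything
find /tmp -name "*.tmp" | xargs echo rm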
history — Command Audit Trail
# Show full history
history
# Last 20 commands
history 20
# Search history
history | grep ssh
history | grep "systemctl restart"
# Execute by number
!123
# Execute last command
!!
# Execute last command with specific text
!ssh
!curl
# Last argument from previous command
ls /var/log/nginx
cd !$ # cd to /var/log/nginx
# All arguments from previous command
grep ERROR /var/log/app.log
vi !* # vi /var/log/app.log
# Previous command, substitution
docker run old-image
^old^new^ # docker run new-image
# Reverse search (interactive)
Ctrl+R
# Type to search, Enter to execute
History configuration:
# Add to ~/.bashrc for better history
# Generous history limits
export HISTSIZE=10000
export HISTFILESIZE=20000
# Timestamp in history
export HISTTIMEFORMAT="%Y-%m-%d %H:%M:%S "
# Ignore duplicates and space-prefixed commands
export HISTCONTROL=ignoreboth:erasedups
# Ignore common commands
export HISTIGNORE="ls:ll:cd:pwd:history"
# Append to history (don't overwrite)
shopt -s histappend
# Save immediately (not on shell exit)
PROMPT_COMMAND="history -a"
# Multi-line commands on single line
shopt -s cmdhist
Practical workflows:
# Document what you did
history | grep -A 5 "systemctl restart"
# Build runbook from history
history | grep "docker" > docker-commands.txt
# Repeat deployment steps
history | grep -E "(rsync|systemctl restart)"
# Audit manual changes
history | grep "vi /etc"
# Share troubleshooting steps
history 50 | grep -E "(curl|grep|tail)"
Incident response:
# What did I just run?
history 5
# When was nginx restarted?
history | grep "systemctl.*nginx"
# Find that long curl command
history | grep curl | grep -i auth
# Repeat complex command
!curl
# or
Ctrl+R curl
Tips:
- Set a large HISTSIZE for better recall
- Add timestamps with HISTTIMEFORMAT
- Use Ctrl+R for interactive search
- Prefix sensitive commands with a space to skip history (requires HISTCONTROL=ignoreboth)
alias — Command Shortcuts
# Create temporary alias
alias ll='ls -lah'
alias ports='ss -tuln'
# View all aliases
alias
# Remove alias
unalias ll
# Make permanent (add to ~/.bashrc)
echo "alias ll='ls -lah'" >> ~/.bashrc
source ~/.bashrc
# Escape alias (run original command)
\ls # bypasses alias
# Check if command is aliased
type ll
DevOps-focused aliases:
# Add to ~/.bashrc
# System info
alias meminfo='free -h'
alias diskinfo='df -h'
alias cpuinfo='lscpu'
alias ports='ss -tuln'
# Service management
alias svc='systemctl'
alias svcstatus='systemctl status'
alias svcrestart='sudo systemctl restart'
alias svclogs='journalctl -u'
# Process monitoring
alias pscpu='ps aux | sort -nrk 3 | head'
alias psmem='ps aux | sort -nrk 4 | head'
alias topme='htop -u $USER'
# Logs
alias tailf='tail -f'
alias taillogs='sudo tail -f /var/log/messages'
alias greplog='grep -Hn --color=auto'
# Docker (if used)
alias dps='docker ps'
alias dlog='docker logs -f'
alias dexec='docker exec -it'
alias dclean='docker system prune -af'
# Git (if used)
alias gs='git status'
alias gl='git log --oneline -10'
alias gp='git pull'
# Safety aliases
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
# Navigation
alias ..='cd ..'
alias ...='cd ../..'
alias ll='ls -lah'
alias la='ls -A'
# Network
alias myip='curl ifconfig.me'
alias pingg='ping -c 5 8.8.8.8'
alias listening='sudo ss -tulpn | grep LISTEN'
# System
alias update='sudo dnf update'
alias reboot='sudo systemctl reboot'
alias suspend='sudo systemctl suspend'
Team-shared aliases:
# Create team dotfiles repo
mkdir ~/dotfiles
cd ~/dotfiles
# Create shared aliases
cat > aliases.sh << 'EOF'
# Production shortcuts
alias prod-ssh='ssh -J bastion prod-server'
alias prod-logs='ssh prod-server "sudo journalctl -u myapp -f"'
alias prod-status='ssh prod-server "systemctl status myapp"'
# Deployment helpers
alias deploy-staging='./scripts/deploy.sh staging'
alias deploy-prod='./scripts/deploy.sh production'
# Monitoring
alias check-health='for s in $(cat servers.txt); do curl -sf http://$s/health || echo "$s DOWN"; done'
EOF
# Source in ~/.bashrc
echo "source ~/dotfiles/aliases.sh" >> ~/.bashrc
Tips:
- Use meaningful names (verb-noun pattern)
- Document complex aliases
- Share team aliases via dotfiles repo
- Test aliases before making permanent
- Use functions for complex logic
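For the last tip, a small shell function handles arguments where an alias can't (the function name is arbitrary):
# Follow logs for any unit: svclog nginx, svclog myapp
svclog() { journalctl -u "$1" -f; }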
time — Performance Measurement
# Basic timing (shell builtin)
time ls -R /
# Detailed system time (/usr/bin/time)
/usr/bin/time -v ./script.sh
# Output format
time -p ./command # Portable format
# Redirect time output
{ time ./script.sh; } 2> timing.txt
# Time multiple commands
time { command1; command2; command3; }
Interpreting output:
$ time ./deploy.sh
real 2m15.432s # Wall clock time (total elapsed)
user 0m5.220s # CPU time in user mode
sys 0m1.880s # CPU time in kernel mode
# If real >> user+sys = I/O or network bound
# If user >> sys = CPU intensive (good)
# If sys >> user = kernel overhead (syscalls, I/O)
Detailed metrics with /usr/bin/time:
$ /usr/bin/time -v ./process-logs.sh
Command being timed: "./process-logs.sh"
User time (seconds): 45.23
System time (seconds): 8.12
Percent of CPU this job got: 87%
Elapsed (wall clock) time: 1:01.23
Maximum resident set size (kbytes): 524288
Page faults: 1243
Voluntary context switches: 8934
Performance analysis:
# Compare script versions
time ./old-script.sh > /dev/null
time ./new-script.sh > /dev/null
# Find slow command in pipeline
time cat large.log | grep ERROR | awk '{print $1}' | sort | uniq -c
# Time individual pipeline stages
time cat large.log > /dev/null
time cat large.log | grep ERROR > /dev/null
time cat large.log | grep ERROR | awk '{print $1}' > /dev/null
# Deployment timing
time {
rsync -az /local/ server:/remote/
ssh server "systemctl restart app"
sleep 10
curl http://server/health
}
Benchmarking:
# Run multiple times
for i in {1..10}; do
time ./script.sh > /dev/null
done
# Average timing
for i in {1..10}; do
{ time -p ./script.sh > /dev/null; } 2>&1 | grep real # -p gives parseable "real N.NN"
done | awk '{sum+=$2; count++} END {print sum/count}'
Tips:
- Shell builtin time vs /usr/bin/time (different features)
- Use -v with /usr/bin/time for memory stats
- High I/O wait = optimize disk/network
- Profile before optimizing
watch — Live Command Monitoring
# Execute every 2 seconds (default)
watch df -h
# Custom interval
watch -n 5 "ps aux | grep nginx"
# Highlight differences
watch -d -n 1 "ss -s"
# Exit on change
watch -g "curl -s http://localhost/health | grep UP"
# Precise timing
watch -p -n 0.1 "cat /proc/loadavg"
# No title
watch -t df -h
# Beep on error
watch -b "curl -sf http://localhost/health"
Deployment monitoring:
# Watch service status during deployment
watch -n 1 "systemctl status myapp"
# Monitor application startup
watch -d -n 2 "ss -tuln | grep :8080"
# Watch log for errors
watch -n 1 "tail -20 /var/log/myapp/error.log"
# Monitor health endpoint
watch -n 5 "curl -sf http://localhost/health || echo 'DOWN'"
# Wait for service to start
watch -g "curl -s http://localhost/health | grep UP"
# Exits when grep succeeds
# Monitor resource usage
watch -d -n 2 "ps aux | grep myapp | grep -v grep"
# Track deployment progress
watch -n 1 "ls -lh /opt/app/ | tail -5"
System monitoring:
# Live disk usage
watch -d df -h
# Memory changes
watch -d -n 1 free -h
# Network connections
watch -d -n 2 "ss -s"
# Load average
watch -n 5 uptime
# Process count
watch -n 5 "ps aux | wc -l"
# Active connections by IP
watch -d -n 2 "ss -tn | tail -n +2 | awk '{print \$5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10"
Automation patterns:
# Wait for port to open (deployment)
watch -g "ss -tuln | grep :8080"
echo "Service is up!"
# Monitor log until error appears
watch -g "grep -q ERROR /var/log/app.log"
echo "Error detected!"
# Wait for file to appear
watch -g "test -f /tmp/deploy-complete"
echo "Deployment finished"
# Monitor certificate expiry
watch -n 3600 "echo | openssl s_client -connect example.com:443 2>/dev/null | openssl x509 -noout -dates"
Tips:
- Use -d to highlight what changed
- Combine with -g to wait for a condition
- Quote complex commands containing pipes
- Use -n for custom intervals (minimum 0.1s)
- Exit watch with Ctrl+C
Part 9: Real-World DevOps Scenarios
These workflows combine multiple commands to solve actual production problems.
Scenario 1: Application Down — High CPU Usage
Situation: Application unresponsive, server CPU at 100%.
Triage workflow:
# 1. Establish current state
uptime
# Load average: 8.5, 7.2, 5.1 (high for 4-core system)
top
# Press '1' to show individual cores
# Press 'P' to sort by CPU
# Identify: java process consuming 380% CPU
# 2. Identify the culprit process
ps aux | sort -nrk 3 | head -5
# Output shows PID 12345, user 'appuser', 'java -jar myapp.jar'
# 3. Get detailed process info
ps -fp 12345
cat /proc/12345/cmdline | tr '\0' ' '
# Full command: java -jar -Xmx2g myapp.jar --spring.profiles.active=prod
# 4. Check what the process is doing
sudo lsof -p 12345 | head -20
# Shows open files, network connections
# 5. Sample thread activity (Java-specific)
sudo -u appuser jstack 12345 > /tmp/thread-dump.txt
grep -A 10 "RUNNABLE" /tmp/thread-dump.txt
# Or use strace for system call analysis
sudo strace -c -p 12345
# Run for 10 seconds, Ctrl+C
# Shows: lots of futex, read, write calls
# 6. Check application logs
journalctl -u myapp -n 200 --no-pager
tail -100 /var/log/myapp/application.log | grep -E "ERROR|WARN"
# 7. Check for resource exhaustion
free -h
# Available: 128M (low memory!)
df -h
# All filesystems have space
# 8. Investigate memory
ps aux | sort -nrk 4 | head -5
# Same java process using 85% memory
# 9. Root cause: memory leak causing GC thrashing
# Evidence: High CPU + high memory + GC logs showing full GC cycles
# 10. Immediate remediation
sudo systemctl restart myapp
# 11. Verify recovery
watch -n 2 "systemctl status myapp"
curl http://localhost:8080/health
# 12. Monitor
watch -d -n 5 "ps aux | grep java | grep -v grep"
# 13. Post-incident
# - Review heap dump
# - Check for memory leaks in code
# - Tune JVM parameters
# - Set up memory alerts
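For that last item, a starting point modeled on the disk-alert script in the Pro Tips section later in this guide; the threshold and notification hook are assumptions:
#!/bin/bash
# mem-alert.sh - warn when available memory drops below a threshold
THRESHOLD_MB=256   # assumed value; tune per host
AVAIL_MB=$(free -m | awk '/^Mem:/ {print $7}')   # column 7 = "available"
if [ "$AVAIL_MB" -lt "$THRESHOLD_MB" ]; then
  echo "ALERT: only ${AVAIL_MB}MB memory available on $(hostname)"
  # hook in mail/Slack/pager here
fi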
Key commands used: uptime, top, ps, lsof, strace, journalctl, free, systemctl
Scenario 2: Disk Full – Application Failing
Situation: Application writes failing, errors mentioning "No space left on device".
Triage workflow:
# 1. Confirm disk space issue
df -h
# Output: /dev/sda1 50G 50G 0 100% /
# 2. Check inode exhaustion (common gotcha)
df -i
# Output: /dev/sda1 3M 3M 0 100% /
# Problem: Out of inodes, not space!
# 3. Find directory with most files
for dir in /*; do
echo -n "$dir: "
find "$dir" -xdev -type f | wc -l
done
# Output shows /var has 2.8M files
# 4. Drill down
for dir in /var/*; do
echo -n "$dir: "
find "$dir" -xdev -type f 2>/dev/null | wc -l
done
# /var/spool/postfix has 2.5M files!
# 5. Investigate further
ls -la /var/spool/postfix/deferred | head
# Thousands of mail queue files
# 6. Find largest space consumers (if space, not inode issue)
du -hx --max-depth=1 / | sort -hr | head -10
# /var is largest
du -hx --max-depth=1 /var | sort -hr | head -10
# /var/log is 30G
du -hx --max-depth=1 /var/log | sort -hr | head -10
# /var/log/myapp is 28G
# 7. Find large log files
find /var/log -type f -size +1G -exec ls -lh {} \;
# /var/log/myapp/application.log is 25G (not rotated!)
# 8. Check for deleted but open files (hidden space usage)
sudo lsof | grep deleted
# Shows process holding deleted 10G file
# 9. Immediate remediation (for log issue)
# Truncate, don't delete (keeps file descriptor valid)
sudo truncate -s 0 /var/log/myapp/application.log
# Or compress (only if the app has already closed or rotated the file;
# gzip replaces the original, so an open writer would keep the space pinned)
sudo gzip /var/log/myapp/application.log
# 10. Clean old logs
find /var/log -name "*.log" -mtime +30 -exec gzip {} \;
find /var/log -name "*.gz" -mtime +90 -delete
# 11. For inode issue, remove old mail queue
sudo postsuper -d ALL deferred
# 12. Verify space recovered
df -h
df -i
# 13. Restart application
sudo systemctl restart myapp
# 14. Verify health
curl http://localhost:8080/health
tail -f /var/log/myapp/application.log
# 15. Post-incident
# - Configure log rotation
# - Set up disk space monitoring
# - Implement log retention policy
Log rotation fix:
# Create logrotate config
sudo tee /etc/logrotate.d/myapp << EOF
/var/log/myapp/*.log {
daily
rotate 7
compress
delaycompress
missingok
notifempty
create 0644 appuser appgroup
postrotate
systemctl reload myapp
endscript
}
EOF
# Dry run: show what would happen without rotating anything
sudo logrotate -d /etc/logrotate.d/myapp
# Force one real rotation to confirm postrotate works end to end
sudo logrotate -f /etc/logrotate.d/myapp
Key commands used: df, du, find, lsof, truncate, logrotate
Scenario 3: Network Connectivity Issues
Situation: Application can't reach database server.
Triage workflow:
# 1. Define the problem
# Application logs show: "Connection refused: db.internal:5432"
# 2. Test basic connectivity
ping -c 4 db.internal
# 64 bytes from 10.0.1.50: success
# Network layer works
# 3. Test DNS resolution
dig db.internal
# Returns: 10.0.1.50
# DNS works
# Verify with getent
getent hosts db.internal
# 10.0.1.50 db.internal
# 4. Check if port is reachable
nc -zv db.internal 5432
# Connection refused
# Alternative: use curl for TCP check
timeout 5 bash -c "</dev/tcp/db.internal/5432" && echo "Port open" || echo "Port closed"
# 5. Check local routing
ip route get 10.0.1.50
# Shows route via 10.0.1.1
# Trace the path
traceroute -n db.internal
# All hops respond
# 6. Check local listening ports
ss -tuln | grep :5432
# Nothing! PostgreSQL not listening
# 7. Log into database server
ssh db.internal
# 8. Check if PostgreSQL is running
sudo systemctl status postgresql
# Active: inactive (dead)
# Service crashed!
# 9. Check why it's down
sudo journalctl -u postgresql -n 100
# Shows: "FATAL: could not create lock file: No space left"
# 10. Check disk space
df -h
# /var is 100% full
# 11. Clean up space (see Scenario 2)
sudo find /var/log -name "*.log" -mtime +7 -exec gzip {} \;
# 12. Start PostgreSQL
sudo systemctl start postgresql
# 13. Verify it's listening
ss -tuln | grep :5432
# tcp LISTEN 0 128 *:5432 *:*
# 14. Test from app server
nc -zv db.internal 5432
# Connection to db.internal 5432 port [tcp/postgresql] succeeded!
# 15. Check PostgreSQL logs
sudo tail -f /var/log/postgresql/postgresql-*.log
# 16. Test application connection
# From app server:
psql -h db.internal -U appuser -d appdb -c "SELECT 1;"
# Success!
# 17. Restart application
ssh app-server "sudo systemctl restart myapp"
# 18. Verify application health
curl http://app-server:8080/health
Network debugging cheat sheet:
# Layer 1-2: Physical/Link
ip link show
ethtool eth0
# Layer 3: Network
ping -c 4 <host>
ip route
ip addr
# Layer 4: Transport
ss -tuln | grep :<port>
nc -zv <host> <port>
# Layer 7: Application
curl -v http://<host>:<port>/
dig <hostname>
# Firewall
sudo firewall-cmd --list-all
sudo iptables -L -n
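The layered checks above compose into a quick triage script worth keeping on a jump host; a minimal sketch (script name and timeout values are assumptions):
#!/bin/bash
# net-triage.sh <host> <port> - walk up the stack until something fails
HOST="$1"; PORT="$2"
getent hosts "$HOST" > /dev/null || { echo "DNS: FAIL"; exit 1; }
echo "DNS: OK"
ping -c 2 -W 2 "$HOST" > /dev/null && echo "ICMP: OK" || echo "ICMP: no reply (may be filtered)"
nc -z -w 3 "$HOST" "$PORT" && echo "TCP :$PORT OK" || { echo "TCP :$PORT FAIL"; exit 1; }
curl -sf -m 5 "http://$HOST:$PORT/" > /dev/null && echo "HTTP: OK" || echo "HTTP: FAIL (may not be an HTTP service)"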
Key commands used: ping, dig, nc, ss, ip, traceroute, ssh, systemctl, journalctl
Scenario 4: Post-Deployment Validation
Situation: Just deployed new application version, need to verify health.
Complete validation workflow:
# 1. Pre-deployment snapshot
ssh prod-server << 'EOF'
systemctl status myapp > /tmp/pre-deploy.txt
ss -tuln | grep :8080 >> /tmp/pre-deploy.txt
ps aux | grep myapp >> /tmp/pre-deploy.txt
EOF
# 2. Deploy application
rsync -avz --delete /local/app/ prod-server:/opt/app/
# 3. Restart service
ssh prod-server "sudo systemctl restart myapp"
# 4. Wait for startup (30 seconds)
sleep 30
# 5. Check service status
ssh prod-server "systemctl status myapp"
# Active: active (running) since...
# 6. Verify process is running
ssh prod-server "ps aux | grep myapp | grep -v grep"
# Shows java process
# 7. Check port is listening
ssh prod-server "ss -tuln | grep :8080"
# tcp LISTEN 0 128 *:8080 *:*
# 8. Test health endpoint
curl -sf http://prod-server:8080/health
# {"status":"UP","version":"1.2.3"}
# If fails:
curl -v http://prod-server:8080/health
# Shows detailed error
# 9. Check application logs for startup
ssh prod-server "sudo journalctl -u myapp -n 50 --no-pager"
# Look for "Started MyApp" message
# 10. Check for errors
ssh prod-server "sudo journalctl -u myapp -p err -n 20"
# No errors = good
# 11. Watch logs for anomalies
ssh prod-server "sudo tail -f /var/log/myapp/application.log" &
TAIL_PID=$!
# 12. Smoke test critical endpoints
curl -sf http://prod-server:8080/api/users | jq '.data | length'
# Returns count
curl -sf http://prod-server:8080/api/config
# Returns config
# 13. Load test (light)
for i in {1..100}; do
curl -sf http://prod-server:8080/health > /dev/null
echo -n "."
done
echo " Done"
# 14. Monitor resource usage
ssh prod-server "ps aux | grep myapp | awk '{print \$3, \$4}'"
# CPU: 2.5%, MEM: 15.3%
# 15. Check for memory leaks (wait 5 minutes, check again)
sleep 300
ssh prod-server "ps aux | grep myapp | awk '{print \$3, \$4}'"
# CPU: 1.2%, MEM: 15.4% (stable)
# 16. Database connectivity
ssh prod-server "curl -sf http://localhost:8080/api/db-health"
# {"database":"connected"}
# 17. External dependencies
ssh prod-server "curl -sf http://localhost:8080/api/dependencies"
# Shows all upstream services: OK
# 18. Monitor for 10 minutes
watch -d -n 10 "ssh prod-server 'systemctl status myapp | head -3'"
# 19. Check error logs continuously (capture the PID so it can be stopped)
ssh prod-server "sudo journalctl -u myapp -f" | grep -i error &
JOURNAL_PID=$!
# 20. Stop monitoring
kill $TAIL_PID $JOURNAL_PID
# 21. Final validation
curl -sf http://prod-server:8080/health && echo "✓ DEPLOYMENT SUCCESS" || echo "✗ DEPLOYMENT FAILED"
# 22. Post-deployment snapshot
ssh prod-server << 'EOF'
systemctl status myapp > /tmp/post-deploy.txt
ss -tuln | grep :8080 >> /tmp/post-deploy.txt
ps aux | grep myapp >> /tmp/post-deploy.txt
EOF
# 23. Compare before/after
ssh prod-server "diff /tmp/pre-deploy.txt /tmp/post-deploy.txt"
Automated health check script:
#!/bin/bash
set -euo pipefail
HOST="$1"
PORT="${2:-8080}"
TIMEOUT=300 # 5 minutes
echo "Validating deployment on $HOST:$PORT"
# Wait for port to open (fail if the deadline passes)
echo -n "Waiting for port to listen..."
PORT_OPEN=0
for i in $(seq 1 "$TIMEOUT"); do
if nc -z "$HOST" "$PORT" 2>/dev/null; then
PORT_OPEN=1
echo " OK"
break
fi
echo -n "."
sleep 1
done
if [ "$PORT_OPEN" -ne 1 ]; then
echo " TIMEOUT"
exit 1
fi
# Test health endpoint
echo -n "Testing health endpoint..."
if curl -sf "http://$HOST:$PORT/health" > /dev/null; then
echo " OK"
else
echo " FAILED"
exit 1
fi
# Check logs for errors (journalctl exits 0 even when nothing matches,
# so test whether it actually printed any error lines)
echo -n "Checking logs for errors..."
ERRORS=$(ssh "$HOST" "sudo journalctl -u myapp --since '5 minutes ago' -p err -q")
if [ -n "$ERRORS" ]; then
echo " ERRORS FOUND"
echo "$ERRORS"
exit 1
else
echo " OK"
fi
# Monitor for 60 seconds
echo "Monitoring for 60 seconds..."
for i in {1..60}; do
if ! curl -sf "http://$HOST:$PORT/health" > /dev/null; then
echo "Health check failed after $i seconds"
exit 1
fi
echo -n "."
sleep 1
done
echo " OK"
echo "✓ Deployment validation complete"
Key commands used: rsync, ssh, systemctl, ps, ss, curl, journalctl, tail, watch
Part 10: Pro Tips & Best Practices
1. Command Combinations (Pipeline Mastery)
Process analysis:
# Top 10 CPU consumers
ps aux | sort -nrk 3 | head -10
# Top 10 memory consumers
ps aux | sort -nrk 4 | head -10
# Count processes by user
ps aux | awk '{print $1}' | sort | uniq -c | sort -nr
# Find zombie processes (STAT may be Z, Z+, etc., so match the prefix)
ps aux | awk '$8 ~ /^Z/ {print}'
# Process subtree for a specific service (forest output lists children below the parent)
ps auxf | grep -A 10 '[n]ginx'
# Total memory by command
ps aux | awk '{arr[$11]+=$6} END {for (i in arr) print i,arr[i]/1024 "MB"}' | sort -nrk 2 | head
Log analysis pipelines:
# Top 10 error types
grep ERROR /var/log/app.log | awk '{print $5}' | sort | uniq -c | sort -nr | head -10
# Requests per minute (timestamp is field 4 in common log format)
awk '{print $4}' access.log | cut -d: -f1-3 | uniq -c
# 95th percentile response time (field number depends on your log format)
awk '{print $10}' access.log | sort -n | awk '{a[NR]=$1} END {print a[int(NR*0.95)]}'
# Top IPs by request count
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10
# Failed requests by hour
grep " 500 " access.log | awk '{print $4}' | cut -d: -f1-2 | uniq -c
# Unique client IPs
awk '{print $1}' access.log | sort -u | wc -l
Network analysis:
# Connections by state (skip the header row)
ss -ant | tail -n +2 | awk '{print $1}' | sort | uniq -c
# Connections per IP
ss -tn | tail -n +2 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr
# Listening services with PIDs
sudo ss -tlnp | column -t
# Bandwidth by process (requires iftop)
sudo iftop -P
# Active connections to specific port
ss -tn sport = :8080 | tail -n +2 | wc -l
System monitoring oneliners:
# CPU usage per core
mpstat -P ALL 1 5
# Disk I/O by device
iostat -xz 1 5
# Memory breakdown
free -h && echo "---" && cat /proc/meminfo | grep -E "Dirty|Writeback|Mapped"
# Top 10 processes by number of open files
sudo lsof | awk '{print $1}' | sort | uniq -c | sort -nr | head -10
# Process with most threads
ps -eo pid,comm,nlwp | sort -nrk 3 | head
File operations:
# Find recently modified config files
find /etc -name "*.conf" -mtime -1 -ls
# Find files larger than 100M modified in last 7 days
find / -type f -size +100M -mtime -7 2>/dev/null
# Duplicate file finder (by checksum; -w 32 compares only the md5 hash)
find . -type f -exec md5sum {} + | sort | uniq -d -w 32
# Find and delete temp files older than 7 days
find /tmp -type f -name "*.tmp" -mtime +7 -delete
# Disk usage by user in /home
sudo du -sch /home/*/ | sort -h
2. Essential Aliases for DevOps
Create ~/.bash_aliases or add to ~/.bashrc:
# System monitoring
alias meminfo='free -h'
alias cpuinfo='lscpu | grep -E "Model name|^CPU\(s\)|Thread|Core"'
alias diskinfo='df -h | grep -v loop'
alias topme='htop -u $USER'
# Process management
alias pscpu='ps aux | sort -nrk 3 | head -10'
alias psmem='ps aux | sort -nrk 4 | head -10'
alias pstree='ps axjf' # note: shadows the real pstree binary
# Service management
alias svc='systemctl'
alias svcs='systemctl status'
alias svcr='sudo systemctl restart'
alias svce='sudo systemctl enable'
alias svcd='sudo systemctl disable'
alias svclogs='journalctl -u'
alias svclist='systemctl list-units --type=service --state=running'
# Network
alias ports='sudo ss -tulpn'
alias listening='sudo ss -tulpn | grep LISTEN'
alias connections='ss -tu'
alias myip='curl -s ifconfig.me'
alias pingtest='ping -c 5 8.8.8.8'
# Logs
alias tailf='tail -f'
alias taillogs='sudo tail -f /var/log/messages'
alias tailerr='sudo tail -f /var/log/messages | grep -i error'
alias syslog='sudo journalctl -f'
# File operations
alias ll='ls -lah'
alias la='ls -A'
alias lt='ls -lhtr' # Sorted by time
alias lsize='ls -lhS' # Sorted by size
alias tree='tree -C'
# Safety
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
alias mkdir='mkdir -p'
# Navigation
alias ..='cd ..'
alias ...='cd ../..'
alias ....='cd ../../..'
# Git (if used)
alias gs='git status'
alias gd='git diff'
alias gl='git log --oneline -10'
alias gb='git branch'
alias gp='git pull'
alias gf='git fetch'
# Docker (if used)
alias dps='docker ps'
alias dpsa='docker ps -a'
alias dimg='docker images'
alias dlog='docker logs -f'
alias dexec='docker exec -it'
alias dstop='docker stop $(docker ps -q)'
alias dprune='docker system prune -af'
# Quick commands
alias h='history'
alias c='clear'
alias x='exit'
alias reload='source ~/.bashrc'
# Custom shortcuts
alias dev='cd ~/projects/dev-server && ll'
alias logs='cd /var/log && ll'
alias nginx-reload='sudo systemctl reload nginx && echo "Nginx reloaded"'
alias app-restart='sudo systemctl restart myapp && curl -sf localhost:8080/health'
Team-shared aliases (put in git repo):
# Production shortcuts
alias prod-ssh='ssh -J bastion prod-server'
alias prod-status='ssh prod-server "systemctl status myapp"'
alias prod-logs='ssh prod-server "sudo journalctl -u myapp -f"'
alias prod-restart='ssh prod-server "sudo systemctl restart myapp"'
# Deployment
alias deploy-staging='./scripts/deploy.sh staging'
alias deploy-prod='./scripts/deploy.sh production'
# Health checks
alias check-all='for s in $(cat servers.txt); do echo -n "$s: "; curl -sf http://$s/health > /dev/null && echo OK || echo FAIL; done'
# Log aggregation
alias tail-all='for s in $(cat servers.txt); do echo "=== $s ==="; ssh $s "tail -20 /var/log/app.log"; done'
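One way to wire the shared file into every engineer's shell (the repo path is an assumption):
# In each ~/.bashrc
[ -f ~/ops-repo/shared-aliases.sh ] && source ~/ops-repo/shared-aliases.sh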
3. Safety Practices
Before destructive operations:
# Test with echo first
find /old/path -name "*.log" -exec echo rm {} \;
# Dry run with rsync
rsync -avzn --delete /source/ /dest/
# Backup before sed -i
cp file.conf file.conf.backup
sed -i 's/old/new/' file.conf
# Or use sed without -i
sed 's/old/new/' file.conf > file.conf.new
diff file.conf file.conf.new
mv file.conf.new file.conf
# Interactive delete
rm -i important-file
# Confirm before bulk operations
find . -name "*.tmp" -ok rm {} \;
# Use trash instead of rm (install trash-cli)
alias rm='trash'
Validate before execution:
# Validate configs before applying them
sudo nginx -t
# Reload unit definitions after editing them
sudo systemctl daemon-reload
# Verify sudo permissions
sudo -l
# Test script syntax
bash -n script.sh
shellcheck script.sh
# Validate JSON
jq . config.json
python3 -m json.tool config.json
# Validate YAML (requires PyYAML)
python3 -c 'import yaml,sys; yaml.safe_load(sys.stdin)' < config.yaml
# Check file ownership before chown
ls -la file.txt
Use safe patterns:
# Exit on error
set -e
# Stricter: also fail on unset variables and mid-pipeline errors
set -euo pipefail
# Use || true for commands that may fail
grep pattern file.txt || true
# Check command exists
command -v jq >/dev/null || { echo "jq not found"; exit 1; }
# Verify file exists before operations
[ -f /path/file ] || { echo "File not found"; exit 1; }
# Lock files for scripts
LOCKFILE=/tmp/myscript.lock
if [ -f "$LOCKFILE" ]; then
echo "Script already running"
exit 1
fi
touch "$LOCKFILE"
trap "rm -f $LOCKFILE" EXIT
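The touch-based lock above has a small race window between the check and the touch; flock(1) from util-linux closes it. A minimal sketch:
#!/bin/bash
# Hold an exclusive, non-blocking lock on fd 200 for the script's lifetime
exec 200>/tmp/myscript.lock
if ! flock -n 200; then
  echo "Script already running"
  exit 1
fi
# ... critical section; the lock is released automatically on exit ...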
4. Efficiency Tips
Command-line shortcuts:
# Reverse search (most powerful)
Ctrl+R # Search history
Ctrl+R # (again) Next older match
Enter # Execute
# Navigation
Ctrl+A # Start of line
Ctrl+E # End of line
Ctrl+K # Kill to end of line
Ctrl+U # Kill to start of line
Ctrl+W # Delete word backward
Alt+B # Move back one word
Alt+F # Move forward one word
# Process control
Ctrl+C # Kill current process
Ctrl+Z # Suspend process
jobs # List background jobs
fg # Bring to foreground
bg # Resume in background
# Clear and exit
Ctrl+L # Clear screen
Ctrl+D # Exit shell
Useful shell options:
# Add to ~/.bashrc
# Autocorrect typos in cd
shopt -s cdspell
# Extended glob patterns
shopt -s extglob
# Recursive globbing (**)
shopt -s globstar
# Append to history, don't overwrite
shopt -s histappend
# Multi-line commands on one line
shopt -s cmdhist
# Update LINES and COLUMNS after resize
shopt -s checkwinsize
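For example, once globstar is on, ** matches any depth of directories:
# List every .log file under the current tree, no find(1) needed
ls **/*.log
# Grep across all of them
grep -l ERROR **/*.log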
Productivity multipliers:
# Use !! for last command
sudo !!
# Use !$ for last argument
cat /var/log/nginx/access.log
vi !$
# Use !* for all arguments
grep error /var/log/app.log
less !*
# Quick substitution
docker run old-image
^old^new
# Brace expansion
mv file.{txt,bak} # mv file.txt file.bak
cp file.conf{,.backup} # cp file.conf file.conf.backup
mkdir -p project/{src,bin,lib,docs}
# Command substitution
cd $(dirname $(which nginx))
kill $(pgrep -f "java.*myapp")
# For loops
for i in {1..5}; do echo "Server $i"; done
for server in web{1..3}; do ssh $server uptime; done
for file in *.log; do gzip "$file"; done
# While loops with read (redirect the file in; use ssh -n so it
# doesn't swallow the loop's stdin)
while read -r server; do
echo "Checking $server"
ssh -n "$server" "df -h"
done < servers.txt
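For a long server list, the same loop parallelizes with GNU xargs (-a is GNU-specific; 5 concurrent connections is an arbitrary choice):
# Run df -h on every host in servers.txt, 5 at a time
xargs -a servers.txt -I{} -P 5 ssh -n {} "df -h /"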
tmux for persistent sessions:
# Start session
tmux
# Detach
Ctrl+B d
# List sessions
tmux ls
# Reattach
tmux attach
# Named session
tmux new -s deploy
# Split windows
Ctrl+B % # Vertical split
Ctrl+B " # Horizontal split
Ctrl+B o # Switch pane
5. Documentation & Learning
Document as you go:
# Save your history with context
history | tail -20 > incident-$(date +%Y%m%d).txt
# Create runbooks from commands
cat << 'EOF' > runbook-deploy.md
# Deployment Runbook
## Pre-deployment
```bash
ssh prod-server "systemctl status myapp"
```
## Deployment
```bash
rsync -avz /local/app/ prod-server:/opt/app/
ssh prod-server "sudo systemctl restart myapp"
```
## Validation
```bash
curl http://prod-server:8080/health
```
EOF
# Record your terminal session (script command)
script -t 2>timing.txt deployment.log
# Do your work
# exit
# Replay: scriptreplay -t timing.txt deployment.log
Command man pages:
# Read manual
man ssh
man 5 ssh_config # Config file format
# Search man pages
man -k network
# Show one-line description
whatis grep
# Quick help
grep --help
Learn by exploring:
# See what command does
type ll
type -a python
# Find command location
which nginx
whereis nginx
# Check command version
nginx -v
python --version
# Inspect binaries
file $(which curl)
ldd $(which curl) # Library dependencies
# Explore /proc filesystem
cat /proc/cpuinfo
cat /proc/meminfo
cat /proc/sys/kernel/hostname
6. Monitoring & Alerting
Quick health checks:
# One-liner system check
echo "Load: $(uptime | awk -F'load average:' '{print $2}')" && \
echo "Disk: $(df -h / | awk 'NR==2{print $5}')" && \
echo "Mem: $(free | awk 'NR==2{printf "%.0f%%", $3/$2*100}')"
# Service availability
systemctl is-active myapp && echo "UP" || echo "DOWN"
# Port check
nc -z localhost 8080 && echo "Port open" || echo "Port closed"
# HTTP health
curl -sf http://localhost/health | jq -r '.status' || echo "FAIL"
Alerting with simple scripts:
#!/bin/bash
# disk-alert.sh
THRESHOLD=90
USAGE=$(df -h / | awk 'NR==2{print $5}' | tr -d '%')
if [ "$USAGE" -gt "$THRESHOLD" ]; then
echo "ALERT: Disk usage at ${USAGE}%"
# Send notification (mail, slack, etc)
fi
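To make it alert unattended, run it from cron (the install path and five-minute interval are assumptions):
# crontab -e
*/5 * * * * /usr/local/bin/disk-alert.sh 2>&1 | logger -t disk-alert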
Conclusion
You now have a comprehensive toolkit of Linux commands for DevOps operations.
Key takeaways:
- Master process monitoring (ps, top) for immediate triage
- Know your logs (journalctl, tail, grep) for root cause analysis
- Understand resource monitoring (free, df, iostat) to prevent issues
- Use network tools (ss, curl, dig) to debug connectivity
- Leverage remote operations (ssh, rsync) for efficient workflows
- Apply permission management (chmod, chown, sudo) securely
- Build pipelines (pipes, xargs, awk) for complex analysis
- Practice on non-production systems first
Next steps:
- Bookmark this guide for reference
- Practice one section daily in your work
- Build your personal alias library
- Create runbooks for common tasks
- Automate repetitive operations with scripts
What's next:
- Part 2: Advanced Linux Commands for SRE (coming soon)
- Security-Focused Linux Commands for DevOps
- Docker & Container Management CLI Reference
- Troubleshooting Production Incidents: A Command-Line Guide
Take action:
- Which commands do you use most? Share in comments
- Have a favorite one-liner? Drop it below
- Subscribe for more DevOps tutorials
- Follow for updates on the advanced series
Related guides on this blog:
- Setting Up GitLab CI for Blog Automation
- Infrastructure as Code with Ansible
- DevOps Workflows: From Code to Deployment
Keywords: devops linux commands, linux automation, system monitoring, devops tools, linux troubleshooting, site reliability engineering, system administration, command line reference