Essential Linux Commands for DevOps Engineers
Introduction
When production breaks at 3 AM you can’t waste time searching for “which process is using port 8080” or “check disk space linux.” You need the right commands already in muscle memory.
Mastery isn’t memorizing flags for trivia. It’s gathering signal fast, forming a hypothesis, and narrowing root cause with the least motion: is it CPU saturation, runaway I/O, memory pressure, port collision, or just a noisy log file? The commands here help you answer that—quickly and repeatably.
This guide groups essential Linux tooling by real operational tasks: processes, system resources, services, networks, files/logs, remote access, permissions, and power patterns. Each entry focuses on practical usage and interpretation instead of encyclopedic flag dumps.
Who this is for: DevOps engineers, SREs, platform engineers, or anyone who keeps Linux systems healthy.
You’ll get: The core commands for daily ops, incident response, rollout verification, and automation—plus how to read their output with confidence.
Part 1: Process & System Monitoring
First objective in an incident: establish system truth. What is running, what is consuming, what changed.
1. Process Management
ps – Process snapshot
Point‑in‑time process list: owner, PID, CPU %, memory %, command.
# View all running processes with full details
ps aux
# Alternative format (System V style)
ps -ef
# Find specific processes
ps aux | grep nginx
# Show processes in a tree structure (see parent-child relationships)
ps auxf
# Show only processes for current user
ps ux
# Sort by CPU usage (highest first)
ps aux --sort=-%cpu | head -n 10
# Sort by memory usage
ps aux --sort=-%mem | head -n 10
# Show threads for a specific process
ps -eLf | grep nginx
Reading the output:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
nginx 1234 0.5 2.1 12345 6789 ? S 10:30 0:05 nginx: worker process
- USER: Process owner
- PID: Process ID (unique identifier)
- %CPU: CPU usage percentage
- %MEM: Memory usage percentage
- VSZ: Virtual memory size (KB)
- RSS: Resident Set Size - actual physical memory (KB)
- STAT: Process state (R=running, S=sleeping, Z=zombie, D=uninterruptible sleep)
- START: When the process started
- TIME: Total CPU time used
- COMMAND: The command that started the process
Examples:
# After deploying a new application, check if it's running
ps aux | grep myapp
# Find all Java processes and their memory usage
ps aux | grep java | awk '{print $2, $4, $11}'
# Identify processes consuming more than 50% CPU
ps aux | awk '$3 > 50 {print $0}'
Tips:
- ps auxf for parent/child relationships (great for systemd + worker pools)
- Wrap in watch -n 2 for rough pseudo-streaming (example below)
- Use the [n]ginx pattern to avoid matching the grep process
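A quick pseudo-streaming example combining the tips above (the 2-second interval is arbitrary):
# Refresh the top CPU consumers every 2 seconds
watch -n 2 'ps aux --sort=-%cpu | head -n 5'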
top / htop – Live view
Real‑time utilization and ranking of processes. htop adds color, filtering, scrolling.
# Launch top (default 3-second refresh)
top
# Better alternative with colors and mouse support (if installed)
htop
# Top with custom refresh interval (1 second)
top -d 1
# Show only processes for specific user
top -u nginx
# Batch mode (for logging/scripting)
top -b -n 1 > system_snapshot.txt
Interactive top commands (while running):
- M - Sort by memory usage (high to low)
- P - Sort by CPU usage (default)
- T - Sort by running time
- k - Kill a process (prompts for PID)
- r - Renice (change priority) of a process
- f - Add/remove display fields
- 1 - Show individual CPU cores
- q - Quit
Header breakdown:
top - 14:23:45 up 23 days, 4:12, 3 users, load average: 0.52, 0.58, 0.59
Tasks: 312 total, 1 running, 311 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.2 us, 1.1 sy, 0.0 ni, 95.5 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16384.0 total, 2048.5 free, 8192.3 used, 6143.2 buff/cache
MiB Swap: 8192.0 total, 7890.1 free, 301.9 used. 7234.8 avail Mem
Key readings: Load (compare to logical CPU count), idle vs wait, memory pressure, swap usage trend.
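To put the load numbers in context, compare them against the logical CPU count:
# Logical CPU count vs. current load averages
nproc
uptime
# Rule of thumb: sustained load above the nproc value = saturation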
Why htop: Scroll, tree, filter (F4), interactive kill (F9), color for quick scanning.
# Install htop (if not available)
# Rocky Linux / RHEL:
sudo dnf install htop
# Ubuntu / Debian:
sudo apt install htop
Usage patterns:
# Monitor system during deployment
top -d 1 # Fast refresh to catch spikes
# Watch memory usage after releasing a new version
htop # Press 'M' to sort by memory, watch for leaks
# Identify which process caused a CPU spike
top -b -n 1 | head -n 20 # Capture snapshot for later analysis
Tips:
- top -o %MEM for memory-first triage
- Sustained wa > 5–10% → investigate storage (iostat)
- Load >> CPU count alongside a high run queue = saturation
2. Finding Process IDs (PIDs)
Command: pgrep / pidof
# Find PID by process name
pgrep nginx
# Find PID with full command line
pgrep -f "java.*myapp"
# Show PID and process name
pgrep -l nginx
# Alternative: pidof (simpler but less flexible)
pidof nginx
- Use case: Get PIDs for process management, scripting
- DevOps context: Automated restart scripts, health checks
- Pro tip: Use pgrep -f to match full command-line arguments
Command: ps + grep (fallback)
# Find process by name
ps aux | grep nginx
# Get just the PID (using awk)
ps aux | grep nginx | grep -v grep | awk '{print $2}'
# More elegant single command
ps aux | grep [n]ginx | awk '{print $2}'
- Use case: When pgrep isn't available, scripting
- DevOps context: Legacy systems, portable scripts
- Pro tip: The [n]ginx trick excludes grep itself from the results
Command: lsof (list open files)
# Find which process is using a specific port
lsof -i :8080
# Find all network connections for a process
lsof -p 1234
# Find process using a specific file
lsof /var/log/app.log
- Use case: Port conflict resolution, file lock troubleshooting
- DevOps context: "Port already in use" errors, finding what's holding files
- Pro tip: Use the -t flag to get just PIDs: lsof -t -i :8080 (kill-by-port one-liner below)
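Combining the -t tip with kill gives a blunt one-liner for freeing a port; a sketch, assuming you have confirmed the process is safe to stop:
# Terminate whatever is listening on :8080 (verify with lsof -i :8080 first)
kill $(lsof -t -i :8080)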
3. Terminating Processes
Command: kill / killall
# First, find the PID (see section above)
pgrep nginx
# Output: 12345
# Graceful termination (SIGTERM)
kill 12345
# Force kill (SIGKILL) - use as last resort
kill -9 12345
# Kill all processes by name
killall nginx
# Graceful kill by name
pkill nginx
# Kill with full command match
pkill -f "java.*myapp"
- Use case: Stop unresponsive processes, force restart services
- DevOps context: Emergency process termination during incidents
- Warning: -9 skips cleanup; escalate only if a graceful stop fails
- Practice: TERM → short wait → KILL only if still present (see the sketch after the cheat sheet)
Signal cheat sheet:
# Common signals
kill -15 <PID> # SIGTERM (default, graceful shutdown)
kill -9 <PID> # SIGKILL (immediate termination, no cleanup)
kill -1 <PID> # SIGHUP (reload configuration)
kill -2 <PID> # SIGINT (Ctrl+C equivalent)
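A minimal sketch of the TERM → wait → KILL escalation described above ('myapp' and the 10-second grace period are placeholders):
# Ask nicely, give the process time to clean up, then force only if needed
pkill myapp
sleep 10
pgrep myapp > /dev/null && pkill -9 myapp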
4. System Resource Monitoring
Baseline + deltas = early warning. Watch these before things break.
free – Memory usage
Shows physical + swap breakdown plus reclaimable cache. Track pressure, not raw consumption.
# Display memory usage in human-readable format
free -h
# Show memory in megabytes
free -m
# Show memory in gigabytes
free -g
# Continuous monitoring (update every 2 seconds)
free -h -s 2
# Wide format (better column spacing)
free -h -w
Sample:
total used free shared buff/cache available
Mem: 15Gi 8.2Gi 1.1Gi 523Mi 6.1Gi 6.8Gi
Swap: 8.0Gi 301Mi 7.7Gi
Columns:
- total: Total installed RAM
- used: Memory currently in use by applications
- free: Completely unused memory (usually low—that's normal!)
- shared: Memory used by tmpfs (temporary filesystems)
- buff/cache: Memory used for file system caching (Linux uses "free" RAM for caching)
- available: MOST IMPORTANT - Memory available for new applications without swapping
Misread alert: Low “free” is normal. Focus on available.
Patterns:
# Check if system is low on memory
free -h
# If "available" is low (< 10% of total), investigate with top/htop
# Monitor memory during load testing
watch -n 1 'free -h'
# Check if application deployment increased memory usage
free -h # Before deployment
free -h # After deployment, compare "used" and "available"
# Quick one-liner to check available memory percentage
free | grep Mem | awk '{print ($7/$2) * 100 "%"}'
Red flags:
- Available < 10% total
- Swap steadily rising
- OOM killer messages in logs
Tips:
- Correlate with vmstat 1 (si/so columns), as shown below
- Rising swap + stable available = benign aging of idle pages
- Use pmap/smem for attribution if usage keeps creeping
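Correlating with vmstat, as suggested above: watch the si/so (swap-in/swap-out) columns; sustained non-zero values mean real memory pressure:
# 5 samples at 1-second intervals; check the si/so columns
vmstat 1 5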
df – Filesystem usage
Capacity exhaustion silently breaks writes, logging, queueing.
# Display disk space in human-readable format
df -h
# Show all filesystems (including tmpfs, devtmpfs)
df -ah
# Show only specific filesystem type
df -h -t ext4
# Exclude specific filesystem type (useful to hide tmpfs clutter)
df -h -x tmpfs -x devtmpfs
# Show inode usage instead of block usage
df -i
# Show filesystem type
df -T
Sample:
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 35G 13G 74% /
/dev/sdb1 500G 450G 26G 95% /var
tmpfs 7.8G 524M 7.3G 7% /run
Columns:
- Filesystem: Device or partition name
- Size: Total size of the filesystem
- Used: Space currently in use
- Avail: Available space for new data
- Use%: Percentage of space used
- Mounted on: Where this filesystem is accessible
Threshold guide: plan (80%), act (90%), urgent (95%), risk (100%).
Patterns:
# Quick disk space check (most common command)
df -h
# During incident: Find which filesystem is full
df -h | awk 'NR>1 && $5+0 > 80'
# Check if specific mount has enough space for deployment
df -h /opt/applications
# Monitor disk space during log-heavy operations
watch -n 5 'df -h | grep -E "(Filesystem|/var|/)"'
# Check inode usage (sometimes you run out of inodes, not space!)
df -i
# If inode Use% is high but df -h shows space available, you have too many small files
Inode exhaustion:
# Check inode usage
df -i
# Output shows:
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 3276800 3270000 6800 99% /
# Find directories with many files
find /var -xdev -type d -exec sh -c 'echo $(ls -a "$1" | wc -l) "$1"' _ {} \; | sort -n | tail -20
Tips:
- Alert on inode use too
- Prune old images/volumes
- Rotate/compress logs early
du – Directory utilization
Answers “where did the space go?”—pair with sorting.
# Check directory size (most common usage)
du -sh /var/log
# Find largest directories in /var
du -h --max-depth=1 /var | sort -hr | head -n 10
# Same but with better sorting (numeric, not alphabetic)
du -h /var | sort -rh | head -n 10
# Show sizes for all subdirectories
du -h --max-depth=2 /opt
# Include hidden files and directories
du -sh /home/user/.* /home/user/*
# Find total size of specific file types
du -ch /var/log/*.log | grep total
# Real-time monitoring during cleanup
watch -n 5 'du -sh /var/log'
# Find largest files in a directory
du -ah /var/log | sort -rh | head -n 20
# Exclude certain directories
du -h --exclude='*.git' /opt/project
Flags:
- -s: Summarize (show total only, don't list all files)
- -h: Human-readable sizes (K, M, G)
- -a: Include files, not just directories
- -c: Show grand total at the end
- --max-depth=N: Limit directory recursion depth
Patterns:
# Scenario 1: Disk is 95% full, find the culprit
df -h # Shows /var is full
du -h --max-depth=1 /var | sort -hr | head -10
# Output might show: 400G /var/log
du -h --max-depth=1 /var/log | sort -hr | head -10
# Output shows: 350G /var/log/application/old-logs
# Scenario 2: Docker eating disk space
du -sh /var/lib/docker
# Output: 80G /var/lib/docker
# Clean up old Docker images
docker system prune -a --volumes
# Scenario 3: Find all large log files
find /var/log -type f -size +100M -exec du -h {} \; | sort -rh
# Scenario 4: Compare disk usage before/after cleanup
du -sh /var/log > before.txt
# ... perform cleanup ...
du -sh /var/log > after.txt
diff before.txt after.txt
Performance: Limit recursion depth for speed.
# Fast: Just top-level directories
du -h --max-depth=1 /
# Slow: Scans entire filesystem
du -h /
Big consumers:
# Find top 10 largest directories under /
sudo du -h --max-depth=2 / 2>/dev/null | sort -hr | head -n 10
# Find top 20 largest files on system
sudo find / -type f -size +100M -exec du -h {} \; 2>/dev/null | sort -rh | head -n 20
Tips:
- ncdu for interactive cleanup
- Snapshot periodically for growth trends (sketch below)
- Exclude ephemeral mounts
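A minimal sketch of the snapshot tip, appending timestamped sizes to a trend file (the path and target are illustrative):
# Append a dated size sample; compare entries over days to spot growth
echo "$(date +%F) $(du -sh /var/log | cut -f1)" >> /var/tmp/du-varlog-trend.txt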
Command: uptime
# System uptime and load averages
uptime
- Use case: Check system stability, load averages
- DevOps context: Quick health check during incidents
- What the numbers mean: 1min, 5min, 15min load averages
Command: iostat
# I/O statistics
iostat -x 1
# Disk-specific stats
iostat -dx 1
- Use case: Diagnose disk I/O bottlenecks
- DevOps context: Performance troubleshooting, database issues
- What to watch: %util, await times
Part 2: Service Management
Control plane for systemd units. Check state, follow logs, trace failures.
systemctl
# List active services
systemctl list-units --type=service --state=running
# Service status + recent log tail
systemctl status nginx
# Start/stop/restart
sudo systemctl restart nginx
sudo systemctl stop nginx
sudo systemctl start nginx
# Enable on boot (creates symlink)
sudo systemctl enable nginx
# Disable auto-start
sudo systemctl disable nginx
# Check if enabled
systemctl is-enabled nginx
# Reload config without restart (if supported)
sudo systemctl reload nginx
# Show service file location
systemctl cat nginx
# Edit service override
sudo systemctl edit nginx # Creates drop-in at /etc/systemd/system/nginx.service.d/
Common failure modes:
# Service won't start—check why
systemctl status myapp
# Look for "Active: failed" + exit code
# See full error (status truncates)
journalctl -u myapp -n 50 --no-pager
# Check if crash-looping
systemctl list-units --state=failed
Triage checklist:
- systemctl status <unit> → exit code, recent logs
- journalctl -u <unit> -n 100 → full startup sequence
- Check deps: systemctl list-dependencies <unit>
- Verify the unit file: systemctl cat <unit> → paths, env, user
Tips:
- Use --no-pager in scripts to avoid truncation
- is-active / is-enabled return 0/non-zero for scripting (restart-and-verify sketch below)
- After changing unit files: sudo systemctl daemon-reload
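A restart-and-verify sketch using those exit codes ('myapp' is a placeholder unit name):
# Restart, then fail loudly if the unit didn't come back
sudo systemctl restart myapp
systemctl is-active --quiet myapp || echo "myapp failed to start"
journalctl -u myapp -n 20 --no-pager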
journalctl
Systemd's structured logging. Query by unit, time, priority, field.
# All logs for a unit
journalctl -u nginx
# Follow (like tail -f)
journalctl -u nginx -f
# Last N lines
journalctl -u nginx -n 100
# Time-based filtering
journalctl -u nginx --since "2025-10-07 14:00:00"
journalctl -u nginx --since "1 hour ago"
journalctl -u nginx --since today
# Priority filtering (emerg, alert, crit, err, warning, notice, info, debug)
journalctl -u nginx -p err
# Combine: errors in last hour
journalctl -u nginx -p err --since "1 hour ago"
# Reverse order (newest first)
journalctl -u nginx -r
# Show only from current boot
journalctl -u nginx -b
# Kernel messages (dmesg equivalent)
journalctl -k
# Disk usage
journalctl --disk-usage
# Vacuum old logs (keep last 7 days)
sudo journalctl --vacuum-time=7d
Powerful field filtering:
# All logs from specific executable
journalctl _COMM=sshd
# Logs from specific PID
journalctl _PID=1234
# By user
journalctl _UID=1000
# Multiple units
journalctl -u nginx -u mysql
Output formats:
# JSON for parsing
journalctl -u nginx -o json
# Short (syslog-style)
journalctl -u nginx -o short
# Verbose (all fields)
journalctl -u nginx -o verbose
Post-deployment workflow:
# Mark time, deploy, follow logs
date # Note timestamp
# ... deploy ...
journalctl -u myapp --since "30 seconds ago" -f
# Or filter by priority
journalctl -u myapp -p warning -f
Tips:
- Add --no-pager for grep-able output (error-count example below)
- Use -x for explanatory help text on errors
- Combine -f with -n 0 to follow without history
- Set retention in /etc/systemd/journald.conf → MaxRetentionSec
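Combining --no-pager with grep gives a quick error count for a deploy window (the unit name and window are illustrative):
# Count error-looking lines from the last 30 minutes
journalctl -u myapp --since "30 minutes ago" --no-pager | grep -ci error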
Part 3: Network Troubleshooting
Layer-by-layer diagnostics: connectivity, DNS, sockets, routing, application.
ss (socket statistics)
Replaced netstat. Shows listening + established sockets, optionally with owning processes.
# List all listening TCP/UDP ports
ss -tuln
# Show processes owning sockets (requires root)
sudo ss -tulpn
# TCP connections only
ss -t
# Established connections
ss -tn state established
# Count connections by state
ss -tan | awk '{print $1}' | sort | uniq -c
# Show specific port
ss -tuln | grep :8080
# Numeric (don't resolve names—faster)
ss -n
# Summary stats
ss -s
Triage patterns:
# "Port already in use" → find what's listening
sudo ss -tulpn | grep :8080
# Check if service bound correctly
ss -tuln | grep :3306 # MySQL example
# Too many connections?
ss -tn | wc -l
# Which IPs connecting most?
ss -tn | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head
Tips:
- -t TCP, -u UDP, -l listening, -n numeric, -p processes
- Much faster than the old netstat
- Use -4 or -6 to filter by IP version
ip (network configuration)
Modern replacement for ifconfig / route. Configure interfaces, routes, tunnels.
# Show all interfaces
ip addr show
# or shorthand:
ip a
# Specific interface
ip addr show eth0
# Show routes
ip route show
# or:
ip r
# Add/delete IP
sudo ip addr add 192.168.1.100/24 dev eth0
sudo ip addr del 192.168.1.100/24 dev eth0
# Bring interface up/down
sudo ip link set eth0 up
sudo ip link set eth0 down
# Show link statistics
ip -s link
# Neighbor table (ARP cache)
ip neigh show
# Routing table with more detail
ip route show table all
Common scenarios:
# Verify IP after DHCP/static config
ip addr show | grep inet
# Check default gateway
ip route show default
# Add temporary route
sudo ip route add 10.0.0.0/8 via 192.168.1.1
# Flush specific interface IPs
sudo ip addr flush dev eth0
# Check MTU
ip link show eth0 | grep mtu
Tips:
- Persistent changes need /etc/network/interfaces or NetworkManager
- Use -c for color output: ip -c a
- Use -br for brief table format: ip -br a
curl
HTTP client for API testing, health checks, troubleshooting.
# Basic GET
curl https://api.example.com/health
# Include response headers
curl -I https://example.com
# or verbose:
curl -v https://example.com
# Follow redirects
curl -L https://short.link
# POST JSON
curl -X POST https://api.example.com/data \
-H "Content-Type: application/json" \
-d '{"key":"value"}'
# POST form data
curl -X POST https://example.com/form \
-d "username=admin&password=secret"
# Custom headers
curl -H "Authorization: Bearer TOKEN" https://api.example.com
# Save output
curl -o output.html https://example.com
# Silent (no progress)
curl -s https://api.example.com/status
# Fail on HTTP errors
curl -f https://api.example.com
# Test specific IP (bypass DNS)
curl --resolve example.com:443:192.168.1.100 https://example.com
# Check TLS cert expiry
curl -vI https://example.com 2>&1 | grep "expire"
# Download with resume support
curl -C - -O https://example.com/largefile.iso
# Set timeout
curl --connect-timeout 5 --max-time 10 https://slow-server.com
Health check patterns:
# Simple availability
curl -f -s https://app.example.com/health && echo "UP" || echo "DOWN"
# Check response time
time curl -s https://api.example.com > /dev/null
# Validate status code
curl -s -o /dev/null -w "%{http_code}" https://example.com
# JSON response parsing (with jq)
curl -s https://api.example.com/status | jq '.status'
# Check from specific source IP (multi-homed)
curl --interface eth1 https://example.com
Debugging workflow:
# Start verbose
curl -v https://problem-site.com
# Check SSL handshake
curl -v https://problem-site.com 2>&1 | grep -E "SSL|TLS|certificate"
# Test with/without HTTP2
curl --http2 https://example.com
curl --http1.1 https://example.com
# Bypass proxy temporarily
curl --noproxy '*' https://example.com
Tips:
- -s (silent) + -S (show errors) = script-friendly (probe example below)
- Use -w for custom output format (status, time, size)
- -k skips cert validation (insecure, use only for testing)
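A script-friendly probe built from those flags, with retries for transient failures (the URL is a placeholder):
# Fail on HTTP errors, surface transport errors, retry up to 5 times
curl -sS -f --retry 5 --retry-delay 2 --max-time 10 https://app.example.com/health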
wget
File downloader. Better than curl for recursive downloads and resume.
# Download file
wget https://example.com/file.tar.gz
# Save with different name
wget -O custom-name.tar.gz https://example.com/file.tar.gz
# Resume interrupted download
wget -c https://example.com/large-file.iso
# Background download
wget -b https://example.com/huge-file.zip
# Limit rate (bandwidth throttle)
wget --limit-rate=1m https://example.com/file.zip
# Mirror entire site
wget -m -p -k https://example.com
# Download with auth
wget --user=admin --password=secret https://example.com/file
# Retry on failure
wget --tries=5 https://unreliable-server.com/file
Comparison to curl:
- wget: better for downloads, mirroring, recursive retrieval
- curl: better for API testing, custom headers, protocol flexibility
ping
ICMP echo test. Verify basic IP connectivity.
# Continuous ping
ping google.com
# Send N packets
ping -c 4 google.com
# Flood ping (requires root, use carefully)
sudo ping -f 192.168.1.1
# Set interval (seconds)
ping -i 0.5 google.com
# Specify interface
ping -I eth0 192.168.1.1
# IPv6
ping6 google.com
Usage:
- First test in network triage
- No response: check firewalls, routing, host down
- High latency: network congestion or distance
- Packet loss: unstable link
Tips:
- Use -c in scripts to avoid infinite loops
- Firewalls often block ICMP (no reply doesn't prove the host is down)
- Pair with mtr for path analysis
traceroute / mtr
Show network path to destination.
# Basic traceroute
traceroute google.com
# Use ICMP instead of UDP
sudo traceroute -I google.com
# Set max hops
traceroute -m 20 google.com
# Better: mtr (combines traceroute + ping)
mtr google.com
# mtr report mode (10 cycles)
mtr -r -c 10 google.com
Reading output:
- Each line = hop (router)
- * * * = timeout (often a firewall dropping probes)
- High latency at a specific hop = bottleneck
- Loss early in path = problem near you; late = problem near destination
When to use: Multi-region connectivity issues, asymmetric routing, finding slow link.
dig / nslookup
DNS query tools. Verify resolution, check propagation, debug CDN issues.
# Basic lookup
dig example.com
# Query specific DNS server
dig @8.8.8.8 example.com
# Short answer only
dig +short example.com
# Reverse DNS
dig -x 8.8.8.8
# Specific record type
dig example.com MX
dig example.com TXT
dig example.com AAAA # IPv6
# Trace full resolution path
dig +trace example.com
# No recursion (ask server directly)
dig +norecurs example.com
# Query time
dig example.com | grep "Query time"
Common tasks:
# Verify DNS change propagated
dig @8.8.8.8 example.com # Google DNS
dig @1.1.1.1 example.com # Cloudflare DNS
# Check TTL
dig example.com | grep -E "^example.com"
# Find authoritative nameservers
dig example.com NS
# Test internal DNS
dig @10.0.0.53 internal.example.com
nslookup alternative:
# Basic query
nslookup example.com
# Specific server
nslookup example.com 8.8.8.8
# Reverse
nslookup 8.8.8.8
Triage pattern: "site not loading"
# 1. Can resolve?
dig example.com +short
# 2. Correct IP?
dig example.com +short
# Compare to expected
# 3. DNS server issue?
dig @8.8.8.8 example.com +short # Public DNS
dig example.com +short # System resolver
# 4. Stale cache?
# (Clear local cache: sudo resolvectl flush-caches; older systems: systemd-resolve --flush-caches)
# 5. Check from multiple perspectives
dig @1.1.1.1 example.com +short
dig @8.8.8.8 example.com +short
Tips:
- +short for script parsing
- +trace shows the delegation chain (helpful for debugging zone config)
- Use multiple DNS servers to verify propagation (see the loop below)
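A small propagation check that loops over several public resolvers (the resolver list is illustrative):
# Mismatched answers across resolvers = propagation still in progress
for ns in 8.8.8.8 1.1.1.1 9.9.9.9; do
  echo -n "$ns: "; dig @$ns example.com +short
done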
Part 4: File Operations & Text Processing
Search, filter, transform. The foundation of log triage and automation scripting.
find
Locate files by name, size, time, type. Execute batch operations.
# Find by name pattern
find /var/log -name "*.log"
# Case-insensitive
find /var/log -iname "*.LOG"
# Modified in last 7 days
find /var/log -mtime -7
# Modified more than 30 days ago
find /var/log -mtime +30
# Accessed in last 24 hours
find /var/log -atime -1
# Large files (>100MB)
find /var -size +100M
# Files between 10MB and 100MB
find /var -size +10M -size -100M
# Empty files
find /var/log -type f -empty
# Directories only
find /opt -type d -name "cache"
# Execute command on results
find /var/log -name "*.log" -exec gzip {} \;
# Safer: prompt before action
find /var/log -name "*.old" -ok rm {} \;
# Delete (use cautiously!)
find /tmp -name "*.tmp" -mtime +7 -delete
# Combine conditions (AND)
find /var/log -name "*.log" -size +100M -mtime +30
# OR logic
find /var/log \( -name "*.log" -o -name "*.txt" \)
# Exclude pattern
find /var -name "*.log" ! -path "*/archive/*"
# Limit depth
find /var -maxdepth 2 -name "*.conf"
Practical patterns:
# Find largest log files
find /var/log -type f -size +10M -exec ls -lh {} \; | sort -k5 -hr
# Old logs for cleanup
find /var/log -name "*.log.*" -mtime +90 -ls
# Recently changed configs (last 2 days)
find /etc -name "*.conf" -mtime -2
# World-writable files (security audit)
find /var/www -type f -perm -002
# Setuid binaries (security scan)
find / -type f -perm -4000 2>/dev/null
# Files owned by specific user
find /home -user bob -name "*.sh"
# Broken symlinks
find /opt -type l ! -exec test -e {} \; -print
Tips:
- Test with -ls or -print before using -delete
- Use -print0 + xargs -0 for filenames with spaces (example below)
- Redirect stderr (2>/dev/null) to hide permission errors
- -mtime 0 = today, -mtime 1 = yesterday
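The -print0 tip in practice, safe for filenames containing spaces or newlines:
# NUL-delimited pipeline: list the ten largest logs without breaking on odd names
find /var/log -name "*.log" -print0 | xargs -0 du -h | sort -h | tail -n 10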
grep
Pattern matching in files. Core tool for log analysis.
# Basic search
grep "error" /var/log/app.log
# Case-insensitive
grep -i "ERROR" /var/log/app.log
# Show line numbers
grep -n "error" /var/log/app.log
# Count matches
grep -c "error" /var/log/app.log
# Invert match (lines NOT containing pattern)
grep -v "debug" /var/log/app.log
# Show context (3 lines before and after)
grep -C 3 "fatal" /var/log/app.log
# Or separately:
grep -B 3 -A 3 "fatal" /var/log/app.log
# Recursive search in directory
grep -r "TODO" /app/src
# Only show filenames
grep -rl "password" /etc
# Multiple patterns (OR)
grep -E "error|warn|fatal" /var/log/app.log
# Or:
grep -e "error" -e "warn" /var/log/app.log
# Whole word match
grep -w "fail" /var/log/app.log # Won't match "failure"
# Extended regex
grep -E "^(ERROR|WARN)" /var/log/app.log
# Perl regex
grep -P "\d{3}-\d{3}-\d{4}" contacts.txt # Phone numbers
# Fixed strings (no regex—faster)
grep -F "user@example.com" /var/log/mail.log
# Binary files
grep -a "string" binaryfile # Treat as text
# With file name in output
grep -H "pattern" *.log
Log triage patterns:
# Errors in last hour (combine with journalctl/tail)
grep -i error /var/log/app.log | tail -n 100
# Filter noise
grep error /var/log/app.log | grep -v "harmless warning"
# Extract IPs
grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" access.log | sort | uniq -c
# Extract timestamps + errors
grep "ERROR" /var/log/app.log | grep -oP "\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
# Show only unique errors
grep ERROR /var/log/app.log | sort | uniq
# Count error types
grep ERROR /var/log/app.log | awk '{print $5}' | sort | uniq -c | sort -rn
# Errors NOT from specific component
grep ERROR /var/log/app.log | grep -v "HealthCheck"
# Multi-level filtering
grep 500 access.log | grep -v bot | grep POST
Tips:
- Use -E for extended regex (or egrep)
- grep -v is incredibly useful for filtering noise
- Combine with | less for scrolling long output
- Save common patterns as aliases
tail / head
View file start or end. Essential for log monitoring.
# Last 10 lines (default)
tail /var/log/app.log
# Last N lines
tail -n 50 /var/log/app.log
# Shorthand:
tail -50 /var/log/app.log
# Follow (live updates)
tail -f /var/log/app.log
# Follow multiple files
tail -f /var/log/app1.log /var/log/app2.log
# Follow with line numbers
tail -n 100 -f /var/log/app.log | cat -n
# Start from line N
tail -n +100 /var/log/app.log # From line 100 to end
# First 10 lines
head /var/log/app.log
# First N lines
head -n 20 /var/log/app.log
# All but last N lines
head -n -10 /var/log/app.log # Everything except last 10
Live monitoring patterns:
# Follow + filter
tail -f /var/log/app.log | grep ERROR
# Multiple filters
tail -f /var/log/app.log | grep -E "ERROR|WARN" | grep -v "HealthCheck"
# Follow + highlight
tail -f /var/log/app.log | grep --color=always -E "ERROR|$"
# Follow with timestamps
tail -f /var/log/app.log | while read line; do echo "$(date +%T) $line"; done
# Stop after match appears
tail -f /var/log/app.log | grep -m 1 "Startup complete"
Tips:
- tail -f follows by name; use -F to handle log rotation
- less +F filename = tail -f with scroll-back (Ctrl-C to pause)
- multitail for advanced multi-file monitoring with colors
awk
Pattern-directed text processing. Extract columns, aggregate, filter.
# Print specific columns (space-delimited)
ps aux | awk '{print $1, $11}'
# Print with header
ps aux | awk 'NR==1 || $3>50' # Header + high CPU
# Filter by condition
df -h | awk 'NR>1 && $5+0 > 80' # >80% full
# Sum values
cat numbers.txt | awk '{sum += $1} END {print sum}'
# Average
awk '{sum+=$1; count++} END {print sum/count}' numbers.txt
# Field separator (CSV)
awk -F',' '{print $2}' data.csv
# Multiple separators
awk -F'[,:]' '{print $3}' file.txt
# Print last field
awk '{print $NF}' file.txt
# String matching
awk '/error/ {print $0}' /var/log/app.log
# Negation
awk '!/debug/ {print $0}' /var/log/app.log
# Count occurrences
awk '/ERROR/ {count++} END {print count}' /var/log/app.log
# Unique values
awk '{print $5}' file.txt | sort | uniq
Practical examples:
# Extract IPs from access log
awk '{print $1}' access.log | sort | uniq -c | sort -rn
# Process count by command name
ps aux | awk '{count[$11]++} END {for (i in count) print count[i], i}' | sort -rn
# Parse structured logs
awk -F'|' '{print $3}' app.log | sort | uniq -c
# Traffic by hour
awk '{print $4}' access.log | cut -d: -f2 | sort | uniq -c
# Calculate percentiles (simplified)
awk '{print $9}' response_times.txt | sort -n | awk '{a[NR]=$1} END {print a[int(NR*0.95)]}'
# Status code distribution
awk '{print $9}' access.log | sort | uniq -c | sort -rn
Tips:
- Start simple: {print $N} to extract columns
- NR = line number, NF = number of fields
- Use -F to set the delimiter (default is whitespace)
- Great for quick one-liners; for complex logic, use Python/Perl
sed
Stream editor. Substitute text, delete lines, transform input.
# Replace first occurrence
sed 's/old/new/' file.txt
# Replace all occurrences
sed 's/old/new/g' file.txt
# In-place edit (DANGEROUS—test first!)
sed -i 's/old/new/g' file.txt
# Backup before in-place edit
sed -i.bak 's/old/new/g' file.txt
# Delete lines matching pattern
sed '/pattern/d' file.txt
# Delete specific line number
sed '5d' file.txt
# Delete range
sed '10,20d' file.txt
# Print only matching lines (like grep)
sed -n '/pattern/p' file.txt
# Substitute only on matching lines
sed '/ERROR/ s/foo/bar/g' file.txt
# Multiple commands
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt
# Case-insensitive replace
sed 's/error/ERROR/gI' file.txt
# Add line after match
sed '/pattern/a\New line here' file.txt
# Insert line before match
sed '/pattern/i\New line here' file.txt
# Replace in specific lines
sed '10,20s/old/new/g' file.txt
Config file updates:
# Change port number
sed -i 's/^port.*/port 8080/' config.ini
# Uncomment line
sed -i 's/^# *\(option.*\)/\1/' config.file
# Comment out line
sed -i 's/^dangerous_setting/#&/' config.file
# Replace variable
sed -i "s|OLD_PATH|$NEW_PATH|g" script.sh
# Multi-line replace (advanced)
sed -i ':a;N;$!ba;s/foo\nbar/baz/g' file.txt
Tips:
- Always test without -i first (see the diff workflow below)
- Use | as the delimiter if the pattern contains /: s|/path/|/newpath/|
- Backup with -i.bak before modifying production configs
- For complex edits, consider perl -pi -e or Python
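A test-first workflow for the first tip: preview the substitution as a diff before committing with -i (diff reads the edited stream from stdin via -):
# Show exactly what would change, without touching the file
sed 's/old/new/g' config.ini | diff config.ini -
# If the diff looks right, apply with a backup
sed -i.bak 's/old/new/g' config.ini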
cut
Extract columns from delimited text.
# Extract first field (default tab delimiter)
cut -f1 file.txt
# CSV (comma delimiter)
cut -d',' -f2 file.csv
# Multiple fields
cut -d':' -f1,3 /etc/passwd
# Range of fields
cut -d',' -f1-3 file.csv
# All fields except N
cut -d':' -f1,3- /etc/passwd # 1, then 3 onwards
# Character positions
cut -c1-10 file.txt
# Extract username from email
cut -d'@' -f1 emails.txt
# Combine with other commands
ps aux | tr -s ' ' | cut -d' ' -f1,11
Tips:
- Fast and simple for fixed-format data
- Use awk for more complex field extraction
- Pair with tr to normalize delimiters
Bonus: jq
JSON processor (install separately). Essential for API work.
# Pretty-print JSON
curl -s https://api.example.com | jq '.'
# Extract field
curl -s https://api.example.com/status | jq '.status'
# Array element
jq '.[0]' array.json
# Filter array
jq '.[] | select(.active == true)' data.json
# Extract multiple fields
jq '.[] | {id, name}' data.json
# Count items
jq '. | length' array.json
# Map over array
jq '.[] | .price * 1.1' items.json # Add 10%
Tips:
- Invaluable for parsing API responses
- Use -r for raw output (no quotes)
- Combine with curl for API testing pipelines
Part 5: Remote Operations & File Transfer
Connect, execute, sync. Key tools for multi-server management.
ssh
Secure remote shell access and command execution.
# Basic connection
ssh user@hostname
# Specific port
ssh -p 2222 user@hostname
# Use specific key
ssh -i ~/.ssh/id_rsa_custom user@hostname
# Execute single command
ssh user@hostname "systemctl status nginx"
# Execute multiple commands
ssh user@hostname "cd /opt && ./deploy.sh && systemctl restart app"
# Local port forwarding (tunnel)
ssh -L 8080:localhost:80 user@hostname
# Access remote :80 via local :8080
# Remote port forwarding
ssh -R 9000:localhost:3000 user@hostname
# Remote can access your local :3000 via their :9000
# Dynamic SOCKS proxy
ssh -D 1080 user@hostname
# Configure browser to use localhost:1080
# Jump host (bastion)
ssh -J jump-host@bastion.example.com user@internal-server
# X11 forwarding
ssh -X user@hostname
# Run GUI apps remotely
# Keep connection alive
ssh -o ServerAliveInterval=60 user@hostname
# Disable strict host key checking (testing only!)
ssh -o StrictHostKeyChecking=no user@hostname
# Run command with sudo
ssh -t user@hostname "sudo systemctl restart nginx"
# -t allocates pseudo-terminal (required for sudo password)
SSH config (~/.ssh/config):
# Create config for easy access
cat >> ~/.ssh/config << EOF
Host prod-web
HostName 192.168.1.100
User deploy
Port 2222
IdentityFile ~/.ssh/prod_key
ServerAliveInterval 60
Host *.internal
ProxyJump bastion.example.com
User ops
EOF
# Now simply:
ssh prod-web
ssh server1.internal
ControlMaster (connection reuse):
# Add to ~/.ssh/config
Host *
ControlMaster auto
ControlPath ~/.ssh/cm-%r@%h:%p
ControlPersist 10m
# First connection creates socket; subsequent ones reuse it
# Much faster for repeated commands
Security hardening:
# Server-side /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers deploy ops
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
Key generation:
# Generate new key pair
ssh-keygen -t ed25519 -C "user@hostname"
# Copy public key to server
ssh-copy-id user@hostname
# Manual copy (if ssh-copy-id unavailable)
cat ~/.ssh/id_ed25519.pub | ssh user@hostname "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
Tips:
- Use ed25519 keys over RSA (faster, more secure)
- ControlMaster speeds up Ansible, scripts dramatically
- Jump hosts simplify bastion access
- Never disable StrictHostKeyChecking in production
scp
Secure copy between hosts.
# Copy file to remote
scp file.txt user@hostname:/path/to/destination
# Copy from remote
scp user@hostname:/path/to/file.txt ./
# Copy directory recursively
scp -r ./directory user@hostname:/path/
# Preserve permissions and timestamps
scp -p file.txt user@hostname:/path/
# Specific port
scp -P 2222 file.txt user@hostname:/path/
# Use specific key
scp -i ~/.ssh/custom_key file.txt user@hostname:/path/
# Verbose (debugging)
scp -v file.txt user@hostname:/path/
# Limit bandwidth (KB/s)
scp -l 1000 large-file.iso user@hostname:/path/
# Via jump host
scp -J bastion.example.com file.txt user@internal-server:/path/
# Between two remote hosts
scp user1@host1:/path/file.txt user2@host2:/path/
Tips:
- -P for port (capital P, unlike ssh's -p)
- Use rsync for large transfers (better resume, progress)
- scp doesn't handle interruptions well
rsync
Intelligent file synchronization. Handles interruptions, only transfers changes.
# Basic sync
rsync -avz /local/path/ user@hostname:/remote/path/
# Flags explained:
# -a = archive (recursive, preserve permissions, times, symlinks)
# -v = verbose
# -z = compress during transfer
# Dry run (preview changes)
rsync -avzn /local/path/ user@hostname:/remote/path/
# Show progress
rsync -avz --progress /local/path/ user@hostname:/remote/path/
# Delete files on destination not in source (dangerous!)
rsync -avz --delete /local/path/ user@hostname:/remote/path/
# Exclude patterns
rsync -avz --exclude='*.log' --exclude='tmp/' /local/ user@host:/remote/
# Include only specific patterns
rsync -avz --include='*.conf' --include='*/' --exclude='*' /etc/ backup/
# Partial transfer support (resume)
rsync -avz --partial /local/large-file user@hostname:/remote/
# Bandwidth limit (KB/s)
rsync -avz --bwlimit=1000 /local/ user@hostname:/remote/
# Custom SSH port
rsync -avz -e "ssh -p 2222" /local/ user@hostname:/remote/
# Via jump host
rsync -avz -e "ssh -J bastion" /local/ user@internal:/remote/
# Local sync (no SSH)
rsync -avz /source/ /destination/
# Show what changed
rsync -avzi /local/ user@hostname:/remote/
# i = itemize changes
# Backup with hard links (space-efficient)
rsync -avz --link-dest=/backup/previous /data/ /backup/current/
Practical backup script:
#!/bin/bash
DATE=$(date +%Y%m%d)
DEST="/backup/$DATE"
PREV="/backup/latest"
# Create incremental backup
rsync -avz --link-dest="$PREV" /data/ "$DEST/"
# Update latest symlink
ln -snf "$DEST" /backup/latest
Deployment pattern:
# Deploy with dry run first
rsync -avzn --delete /local/app/ prod-server:/opt/app/
# If ok, deploy for real
rsync -avz --delete /local/app/ prod-server:/opt/app/
# Restart service
ssh prod-server "systemctl restart app"
Tips:
- Trailing slash matters: /path/ syncs contents, /path syncs the directory itself
- Always test with -n first when using --delete
- Faster than scp for large or repeated transfers
- Use --checksum for verification (slower but accurate)
Part 6: System Information Commands
Command: hostname
# Show hostname
hostname
# Show IP address
hostname -i
# Show FQDN
hostname -f
- Use case: Identify which server you're on
- DevOps context: Multi-server management, cluster identification
- Pro tip: Essential in tmux/screen sessions
Command: uname
# Show kernel version
uname -r
# Show all system info
uname -a
- Use case: Check OS version, kernel compatibility
- DevOps context: Verifying system requirements, documentation
Command: date
# Current date/time
date
# Format date
date +"%Y-%m-%d %H:%M:%S"
# UTC time
date -u
# Set date (requires sudo)
sudo date -s "2025-10-07 14:30:00"
- Use case: Timestamps, time synchronization checks
- DevOps context: Log analysis, scheduling tasks
- Pro tip: Use NTP for time sync, not manual setting
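On systemd-based systems, timedatectl is a quick way to confirm NTP sync rather than setting the clock by hand:
# Shows local/UTC time, the NTP service state, and whether the clock is synchronized
timedatectl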
Part 7: User & Permission Management
sudo — Controlled Privilege Escalation
# Execute single command as root
sudo systemctl restart nginx
# Open root shell
sudo -i
# Run as specific user
sudo -u postgres psql
# Preserve environment
sudo -E env | grep PATH
# Edit privileged file safely
sudo visudo
sudoedit /etc/nginx/nginx.conf
# Validate sudoers syntax
sudo visudo -c
# Show sudo access
sudo -l
# Log sudo commands
sudo tail -f /var/log/secure | grep sudo
Common sudo workflows:
# Service restart
sudo systemctl restart myapp
# File permissions fix
sudo chown -R appuser:appuser /opt/app
# Package install
sudo dnf install nginx
# Log inspection
sudo journalctl -u nginx -f
# Config edits
sudo vim /etc/systemd/system/myapp.service
Security hardening:
# Limit sudo timeout
echo "Defaults timestamp_timeout=5" | sudo tee -a /etc/sudoers.d/timeout
# Require password always
echo "Defaults !tty_tickets" | sudo tee -a /etc/sudoers.d/notty
# Log all sudo commands
echo "Defaults logfile=/var/log/sudo.log" | sudo tee -a /etc/sudoers.d/logging
# Restrict commands per user
echo "deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart myapp" | sudo tee /etc/sudoers.d/deploy
Troubleshooting sudo:
- Check /var/log/secure for auth failures
- Verify the user is in the wheel/sudo group: groups username
- Validate sudoers: sudo visudo -c
- Test access: sudo -l -U username
Tips:
- Never edit /etc/sudoers directly; use visudo
- Use sudoedit for safe config editing
- Grant minimal permissions (specific commands only)
- Log sudo activity for audit trail
chmod — Permission Control
# Make script executable
chmod +x deploy.sh
# Numeric permissions
chmod 755 script.sh # rwxr-xr-x
chmod 644 config.yml # rw-r--r--
chmod 600 secret.key # rw-------
chmod 700 private/ # rwx------
# Symbolic mode
chmod u+x script.sh # User: add execute
chmod g-w config.yml # Group: remove write
chmod o-rwx secret.key # Others: remove all
chmod a+r readme.txt # All: add read
# Recursive
chmod -R 755 /var/www/html
# Special bits
chmod u+s /usr/bin/sudo # setuid
chmod g+s /opt/shared # setgid
chmod +t /tmp # sticky bit
Permission reference:
| Octal | Binary | Symbolic | Meaning |
| ----- | ------ | -------- | -------------------- |
| 7 | 111 | rwx | read, write, execute |
| 6 | 110 | rw- | read, write |
| 5 | 101 | r-x | read, execute |
| 4 | 100 | r-- | read only |
| 3 | 011 | -wx | write, execute |
| 2 | 010 | -w- | write only |
| 1 | 001 | --x | execute only |
| 0 | 000 | --- | no permissions |
Deployment patterns:
# Web server files
sudo chmod -R 755 /var/www/html
sudo chmod 644 /var/www/html/*.html
# Application directories
sudo chmod 755 /opt/app
sudo chmod 755 /opt/app/bin/*
sudo chmod 644 /opt/app/config/*
sudo chmod 600 /opt/app/config/secrets.yml
# Logs (application needs write)
sudo chmod 755 /var/log/myapp
sudo chmod 644 /var/log/myapp/*.log
# Shared team directory (setgid)
sudo chmod 2775 /opt/shared
# New files inherit group ownership
Security considerations:
# Find world-writable files (danger)
find / -type f -perm -002 2>/dev/null
# Find setuid binaries (audit these)
find / -type f -perm -4000 2>/dev/null
# SSH key permissions (strict)
chmod 600 ~/.ssh/id_rsa
chmod 644 ~/.ssh/id_rsa.pub
chmod 700 ~/.ssh
# Config files with secrets
chmod 600 /opt/app/config/database.yml
Tips:
- 755 for directories, 644 for files (default safe values; see the find pattern below)
- 600 for secrets (owner read/write only)
- Use setgid (2775) for shared team directories
- Never chmod 777 (world-writable = security risk)
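A common way to apply the 755/644 defaults precisely, using find so directories and files get different modes (adjust the root path as needed):
# Directories: 755 (traversable), files: 644 (no stray execute bits)
find /var/www/html -type d -exec chmod 755 {} +
find /var/www/html -type f -exec chmod 644 {} +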
chown / chgrp — Ownership Management
# Change owner and group
sudo chown user:group file.txt
# Change owner only
sudo chown user file.txt
# Change group only
sudo chgrp group file.txt
sudo chown :group file.txt
# Recursive ownership
sudo chown -R www-data:www-data /var/www/html
# Follow symlinks
sudo chown -LR user:group /path/with/symlinks
# Report changes
sudo chown -v user:group file.txt
# Reference another file
sudo chown --reference=/etc/nginx/nginx.conf custom.conf
Application deployment:
# After deployment, fix ownership
sudo chown -R appuser:appgroup /opt/app
# Web server content
sudo chown -R nginx:nginx /var/www/html
# Database files
sudo chown -R postgres:postgres /var/lib/pgsql
# Log files
sudo chown -R appuser:appgroup /var/log/myapp
Shared directory pattern:
# Create shared directory
sudo mkdir /opt/shared
sudo chgrp developers /opt/shared
sudo chmod 2775 /opt/shared # setgid + group write
# Now all files created inherit 'developers' group
# Team members can collaborate without permission issues
Troubleshooting ownership:
# Find files by owner
find /opt/app -user olduser
# Bulk ownership change
find /opt/app -user olduser -exec sudo chown newuser:newgroup {} +
# Check who owns critical files
ls -l /etc/systemd/system/myapp.service
ls -l /opt/app/bin/start.sh
# Verify web server can read
sudo -u nginx cat /var/www/html/index.html
Tips:
- Use -R carefully; avoid / or /home
- Application files should be owned by the app user
- Use setgid for shared directories
- Always check ownership after deployment
getent — User/Group Queries
# Query user database
getent passwd username
getent passwd | grep -i john
# Query group database
getent group developers
getent group | grep -i admin
# Check if user exists (script-friendly)
getent passwd deploy > /dev/null && echo "User exists"
# List all groups a user belongs to
getent group | grep username
# Better: use `groups` or `id`
groups username
id -nG username
# Query shadow (requires root)
sudo getent shadow username
# Check LDAP/AD users
getent passwd | wc -l # Total users including LDAP
# Hosts database
getent hosts example.com
# Services database
getent services http
getent services 80
Troubleshooting access:
# User verification workflow
getent passwd username
groups username
id username
sudo -l -U username
# Check application user
getent passwd appuser
ps aux | grep appuser
sudo ls -la /opt/app
# Verify group membership
getent group developers
# Does output include expected user?
# Find user's primary group
id -gn username
# Audit sudo access
getent group wheel
getent group sudo
Tips:
- getent queries all NSS sources (local + LDAP/AD)
- Use it for portability (works across Linux distros)
- Better than cat /etc/passwd (which misses LDAP/AD users)
- Combine with id and groups for the complete picture
Part 8: DevOps-Specific Power Commands
xargs — Argument Builder for Bulk Operations
# Basic usage
echo "file1 file2 file3" | xargs rm
# From find output
find /tmp -name "*.tmp" | xargs rm
# Parallel execution
cat servers.txt | xargs -P 4 -I {} ssh {} "uptime"
# Handle filenames with spaces
find . -name "*.log" -print0 | xargs -0 rm
# Interactive confirmation
find . -name "*.bak" | xargs -p rm
# One argument per command
cat users.txt | xargs -n 1 sudo useradd
# Custom placeholder
cat servers.txt | xargs -I HOST ssh HOST "df -h"
# Max processes
seq 1 100 | xargs -P 10 -I {} curl "http://api/item/{}"
# Show command before execution
echo "file1 file2" | xargs -t rm
Bulk operations:
# Kill all processes matching a pattern
# (true zombies can't be killed; restart their parent process instead)
ps aux | grep '[m]yapp' | awk '{print $2}' | xargs kill -9
# Restart services across servers
cat servers.txt | xargs -I {} ssh {} "systemctl restart nginx"
# Parallel healthcheck
cat production-servers.txt | xargs -P 10 -I {} sh -c 'curl -sf http://{}/health || echo "{} DOWN"'
# Download multiple files
cat urls.txt | xargs -n 1 -P 4 wget
# Change ownership in bulk
find /opt/app -user olduser -print0 | xargs -0 sudo chown newuser:newgroup
# Parallel log search
cat servers.txt | xargs -P 5 -I {} ssh {} "grep ERROR /var/log/app.log"
# Compress old logs
find /var/log -name "*.log" -mtime +30 -print0 | xargs -0 gzip
# Delete empty directories
find . -type d -empty -print0 | xargs -0 rmdir
Deployment automation:
# Deploy to multiple servers
cat prod-servers.txt | xargs -P 3 -I {} sh -c '
echo "Deploying to {}"
rsync -az /local/app/ {}:/opt/app/
ssh {} "systemctl restart myapp"
ssh {} "curl -sf http://localhost:8080/health"
'
# Parallel config updates
cat servers.txt | xargs -P 5 -I {} scp config.yml {}:/opt/app/config/
# Bulk log rotation
cat servers.txt | xargs -I {} ssh {} "sudo logrotate -f /etc/logrotate.conf"
Tips:
- -P N for parallel execution (speedup)
- -0 with find -print0 for filenames with spaces
- -I {} for a custom placeholder
- -n 1 to run one argument at a time
- Test with echo before destructive operations (dry-run example below)
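The echo dry-run tip in practice: prefixing the real command with echo makes xargs print what it would run instead of running it:
# Prints the rm command and its arguments without deleting anything
find /tmp -name "*.tmp" | xargs echo rm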
history — Command Audit Trail
# Show full history
history
# Last 20 commands
history 20
# Search history
history | grep ssh
history | grep "systemctl restart"
# Execute by number
!123
# Execute last command
!!
# Execute last command with specific text
!ssh
!curl
# Last argument from previous command
ls /var/log/nginx
cd !$ # cd to /var/log/nginx
# All arguments from previous command
grep ERROR /var/log/app.log
vi !* # vi /var/log/app.log
# Previous command, substitution
docker run old-image
^old^new^ # docker run new-image
# Reverse search (interactive)
Ctrl+R
# Type to search, Enter to execute
History configuration:
# Add to ~/.bashrc for better history
# Generous history limits
export HISTSIZE=10000
export HISTFILESIZE=20000
# Timestamp in history
export HISTTIMEFORMAT="%Y-%m-%d %H:%M:%S "
# Ignore duplicates and space-prefixed commands
export HISTCONTROL=ignoreboth:erasedups
# Ignore common commands
export HISTIGNORE="ls:ll:cd:pwd:history"
# Append to history (don't overwrite)
shopt -s histappend
# Save immediately (not on shell exit)
PROMPT_COMMAND="history -a"
# Multi-line commands on single line
shopt -s cmdhist
Practical workflows:
# Document what you did
history | grep -A 5 "systemctl restart"
# Build runbook from history
history | grep "docker" > docker-commands.txt
# Repeat deployment steps
history | grep -E "(rsync|systemctl restart)"
# Audit manual changes
history | grep "vi /etc"
# Share troubleshooting steps
history 50 | grep -E "(curl|grep|tail)"
Incident response:
# What did I just run?
history 5
# When was nginx restarted?
history | grep "systemctl.*nginx"
# Find that long curl command
history | grep curl | grep -i auth
# Repeat complex command
!curl
# or
Ctrl+R curl
Tips:
- Set a large HISTSIZE for better recall
- Add timestamps with HISTTIMEFORMAT
- Use Ctrl+R for interactive search
- Prefix sensitive commands with a space to skip history (requires HISTCONTROL=ignoreboth)
alias — Command Shortcuts
# Create temporary alias
alias ll='ls -lah'
alias ports='ss -tuln'
# View all aliases
alias
# Remove alias
unalias ll
# Make permanent (add to ~/.bashrc)
echo "alias ll='ls -lah'" >> ~/.bashrc
source ~/.bashrc
# Escape alias (run original command)
\ls # bypasses alias
# Check if command is aliased
type ll
DevOps-focused aliases:
# Add to ~/.bashrc
# System info
alias meminfo='free -h'
alias diskinfo='df -h'
alias cpuinfo='lscpu'
alias ports='ss -tuln'
# Service management
alias svc='systemctl'
alias svcstatus='systemctl status'
alias svcrestart='sudo systemctl restart'
alias svclogs='journalctl -u'
# Process monitoring
alias pscpu='ps aux | sort -nrk 3 | head'
alias psmem='ps aux | sort -nrk 4 | head'
alias topme='htop -u $USER'
# Logs
alias tailf='tail -f'
alias taillogs='sudo tail -f /var/log/messages'
alias greplog='grep -Hn --color=auto'
# Docker (if used)
alias dps='docker ps'
alias dlog='docker logs -f'
alias dexec='docker exec -it'
alias dclean='docker system prune -af'
# Git (if used)
alias gs='git status'
alias gl='git log --oneline -10'
alias gp='git pull'
# Safety aliases
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
# Navigation
alias ..='cd ..'
alias ...='cd ../..'
alias ll='ls -lah'
alias la='ls -A'
# Network
alias myip='curl ifconfig.me'
alias pingg='ping -c 5 8.8.8.8'
alias listening='sudo ss -tulpn | grep LISTEN'
# System
alias update='sudo dnf update'
alias reboot='sudo systemctl reboot'
alias suspend='sudo systemctl suspend'
Team-shared aliases:
# Create team dotfiles repo
mkdir ~/dotfiles
cd ~/dotfiles
# Create shared aliases
cat > aliases.sh << 'EOF'
# Production shortcuts
alias prod-ssh='ssh -J bastion prod-server'
alias prod-logs='ssh prod-server "sudo journalctl -u myapp -f"'
alias prod-status='ssh prod-server "systemctl status myapp"'
# Deployment helpers
alias deploy-staging='./scripts/deploy.sh staging'
alias deploy-prod='./scripts/deploy.sh production'
# Monitoring
alias check-health='for s in $(cat servers.txt); do curl -sf http://$s/health || echo "$s DOWN"; done'
EOF
# Source in ~/.bashrc
echo "source ~/dotfiles/aliases.sh" >> ~/.bashrc
Tips:
- Use meaningful names (verb-noun pattern)
- Document complex aliases
- Share team aliases via dotfiles repo
- Test aliases before making permanent
- Use functions for complex logic
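For the last tip, a small shell function handles arguments where an alias can't (the function name is arbitrary):
# Follow logs for any unit: svclog nginx, svclog myapp
svclog() { journalctl -u "$1" -f; }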
time — Performance Measurement
# Basic timing (shell builtin)
time ls -R /
# Detailed system time (/usr/bin/time)
/usr/bin/time -v ./script.sh
# Output format
time -p ./command # Portable format
# Redirect time output
{ time ./script.sh; } 2> timing.txt
# Time multiple commands
time { command1; command2; command3; }
Interpreting output:
$ time ./deploy.sh
real 2m15.432s # Wall clock time (total elapsed)
user 0m5.220s # CPU time in user mode
sys 0m1.880s # CPU time in kernel mode
# If real >> user+sys = I/O or network bound
# If user >> sys = CPU intensive (good)
# If sys >> user = kernel overhead (syscalls, I/O)
Detailed metrics with /usr/bin/time:
$ /usr/bin/time -v ./process-logs.sh
Command being timed: "./process-logs.sh"
User time (seconds): 45.23
System time (seconds): 8.12
Percent of CPU this job got: 87%
Elapsed (wall clock) time: 1:01.23
Maximum resident set size (kbytes): 524288
Page faults: 1243
Voluntary context switches: 8934
Performance analysis:
# Compare script versions
time ./old-script.sh > /dev/null
time ./new-script.sh > /dev/null
# Find slow command in pipeline
time cat large.log | grep ERROR | awk '{print $1}' | sort | uniq -c
# Time individual pipeline stages
time cat large.log > /dev/null
time cat large.log | grep ERROR > /dev/null
time cat large.log | grep ERROR | awk '{print $1}' > /dev/null
# Deployment timing
time {
rsync -az /local/ server:/remote/
ssh server "systemctl restart app"
sleep 10
curl http://server/health
}
Benchmarking:
# Run multiple times
for i in {1..10}; do
time ./script.sh > /dev/null
done
# Average timing
for i in {1..10}; do
{ time -p ./script.sh > /dev/null; } 2>&1 | grep real # -p gives parseable "real N.NN"
done | awk '{sum+=$2; count++} END {print sum/count}'
Tips:
- Shell builtin time vs /usr/bin/time (different features)
- Use -v with /usr/bin/time for memory stats
- High I/O wait = optimize disk/network
- Profile before optimizing
watch — Live Command Monitoring
# Execute every 2 seconds (default)
watch df -h
# Custom interval
watch -n 5 "ps aux | grep nginx"
# Highlight differences
watch -d -n 1 "ss -s"
# Exit on change
watch -g "curl -s http://localhost/health | grep UP"
# Precise timing
watch -p -n 0.1 "cat /proc/loadavg"
# No title
watch -t df -h
# Beep on error
watch -b "curl -sf http://localhost/health"
Deployment monitoring:
# Watch service status during deployment
watch -n 1 "systemctl status myapp"
# Monitor application startup
watch -d -n 2 "ss -tuln | grep :8080"
# Watch log for errors
watch -n 1 "tail -20 /var/log/myapp/error.log"
# Monitor health endpoint
watch -n 5 "curl -sf http://localhost/health || echo 'DOWN'"
# Wait for service to start
watch -g "curl -s http://localhost/health | grep UP"
# Exits when grep succeeds
# Monitor resource usage
watch -d -n 2 "ps aux | grep myapp | grep -v grep"
# Track deployment progress
watch -n 1 "ls -lh /opt/app/ | tail -5"
System monitoring:
# Live disk usage
watch -d df -h
# Memory changes
watch -d -n 1 free -h
# Network connections
watch -d -n 2 "ss -s"
# Load average
watch -n 5 uptime
# Process count
watch -n 5 "ps aux | wc -l"
# Active connections by IP
watch -d -n 2 "ss -tn | tail -n +2 | awk '{print \$5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -10"
Automation patterns:
# Wait for port to open (deployment)
watch -g "ss -tuln | grep :8080"
echo "Service is up!"
# Monitor log until error appears
watch -g "grep -q ERROR /var/log/app.log"
echo "Error detected!"
# Wait for file to appear
watch -g "test -f /tmp/deploy-complete"
echo "Deployment finished"
# Monitor certificate expiry
watch -n 3600 "echo | openssl s_client -connect example.com:443 2>/dev/null | openssl x509 -noout -dates"
Tips:
- Use -d to highlight what changed
- Combine with -g to wait for a condition
- Quote complex commands containing pipes
- Use -n for custom intervals (minimum 0.1s)
- Exit watch with Ctrl+C
Part 9: Real-World DevOps Scenarios
These workflows combine multiple commands to solve actual production problems.
Scenario 1: Application Down — High CPU Usage
Situation: Application unresponsive, server CPU at 100%.
Triage workflow:
# 1. Establish current state
uptime
# Load average: 8.5, 7.2, 5.1 (high for 4-core system)
top
# Press '1' to show individual cores
# Press 'P' to sort by CPU
# Identify: java process consuming 380% CPU
# 2. Identify the culprit process
ps aux | sort -nrk 3 | head -5
# Output shows PID 12345, user 'appuser', 'java -jar myapp.jar'
# 3. Get detailed process info
ps -fp 12345
cat /proc/12345/cmdline | tr '\0' ' '
# Full command: java -jar -Xmx2g myapp.jar --spring.profiles.active=prod
# 4. Check what the process is doing
sudo lsof -p 12345 | head -20
# Shows open files, network connections
# 5. Sample thread activity (Java-specific)
sudo -u appuser jstack 12345 > /tmp/thread-dump.txt
grep -A 10 "RUNNABLE" /tmp/thread-dump.txt
# Or use strace for system call analysis
sudo strace -c -p 12345
# Run for 10 seconds, Ctrl+C
# Shows: lots of futex, read, write calls
# 6. Check application logs
journalctl -u myapp -n 200 --no-pager
tail -100 /var/log/myapp/application.log | grep -E "ERROR|WARN"
# 7. Check for resource exhaustion
free -h
# Available: 128M (low memory!)
df -h
# All filesystems have space
# 8. Investigate memory
ps aux | sort -nrk 4 | head -5
# Same java process using 85% memory
# 9. Root cause: memory leak causing GC thrashing
# Evidence: High CPU + high memory + GC logs showing full GC cycles
# 10. Immediate remediation
sudo systemctl restart myapp
# 11. Verify recovery
watch -n 2 "systemctl status myapp"
curl http://localhost:8080/health
# 12. Monitor
watch -d -n 5 "ps aux | grep java | grep -v grep"
# 13. Post-incident
# - Review heap dump
# - Check for memory leaks in code
# - Tune JVM parameters
# - Set up memory alerts
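For that last item, a starting point modeled on the disk-alert script in the Pro Tips section later in this guide; the threshold and notification hook are assumptions:
#!/bin/bash
# mem-alert.sh - warn when available memory drops below a threshold
THRESHOLD_MB=256   # assumed value; tune per host
AVAIL_MB=$(free -m | awk '/^Mem:/ {print $7}')   # column 7 = "available"
if [ "$AVAIL_MB" -lt "$THRESHOLD_MB" ]; then
  echo "ALERT: only ${AVAIL_MB}MB memory available on $(hostname)"
  # hook in mail/Slack/pager here
fi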
Key commands used: uptime, top, ps, lsof, strace, journalctl, free, systemctl
Scenario 2: Disk Full – Application Failing
Situation: Application writes failing, errors mentioning "No space left on device".
Triage workflow:
# 1. Confirm disk space issue
df -h
# Output: /dev/sda1 50G 50G 0 100% /
# 2. Check inode exhaustion (common gotcha)
df -i
# Output: /dev/sda1 3M 3M 0 100% /
# Problem: Out of inodes, not space!
# 3. Find directory with most files
for dir in /*; do
echo -n "$dir: "
find "$dir" -xdev -type f | wc -l
done
# Output shows /var has 2.8M files
# 4. Drill down
for dir in /var/*; do
echo -n "$dir: "
find "$dir" -xdev -type f 2>/dev/null | wc -l
done
# /var/spool/postfix has 2.5M files!
# 5. Investigate further
ls -la /var/spool/postfix/deferred | head
# Thousands of mail queue files
# 6. Find largest space consumers (if space, not inode issue)
du -hx --max-depth=1 / | sort -hr | head -10
# /var is largest
du -hx --max-depth=1 /var | sort -hr | head -10
# /var/log is 30G
du -hx --max-depth=1 /var/log | sort -hr | head -10
# /var/log/myapp is 28G
# 7. Find large log files
find /var/log -type f -size +1G -exec ls -lh {} \;
# /var/log/myapp/application.log is 25G (not rotated!)
# 8. Check for deleted but open files (hidden space usage)
sudo lsof | grep deleted
# Shows process holding deleted 10G file
# 9. Immediate remediation (for log issue)
# Truncate, don't delete (keeps file descriptor valid)
sudo truncate -s 0 /var/log/myapp/application.log
# Or compress (only if the app has already closed or rotated the file;
# gzip replaces the original, so an open writer would keep the space pinned)
sudo gzip /var/log/myapp/application.log
# 10. Clean old logs
find /var/log -name "*.log" -mtime +30 -exec gzip {} \;
find /var/log -name "*.gz" -mtime +90 -delete
# 11. For inode issue, remove old mail queue
sudo postsuper -d ALL deferred
# 12. Verify space recovered
df -h
df -i
# 13. Restart application
sudo systemctl restart myapp
# 14. Verify health
curl http://localhost:8080/health
tail -f /var/log/myapp/application.log
# 15. Post-incident
# - Configure log rotation
# - Set up disk space monitoring
# - Implement log retention policy
Log rotation fix:
# Create logrotate config
sudo tee /etc/logrotate.d/myapp << EOF
/var/log/myapp/*.log {
daily
rotate 7
compress
delaycompress
missingok
notifempty
create 0644 appuser appgroup
postrotate
systemctl reload myapp
endscript
}
EOF
# Dry run: show what would happen without rotating anything
sudo logrotate -d /etc/logrotate.d/myapp
# Force one real rotation to confirm postrotate works end to end
sudo logrotate -f /etc/logrotate.d/myapp
Key commands used: df, du, find, lsof, truncate, logrotate
Scenario 3: Network Connectivity Issues
Situation: Application can't reach database server.
Triage workflow:
# 1. Define the problem
# Application logs show: "Connection refused: db.internal:5432"
# 2. Test basic connectivity
ping -c 4 db.internal
# 64 bytes from 10.0.1.50: success
# Network layer works
# 3. Test DNS resolution
dig db.internal
# Returns: 10.0.1.50
# DNS works
# Verify with getent
getent hosts db.internal
# 10.0.1.50 db.internal
# 4. Check if port is reachable
nc -zv db.internal 5432
# Connection refused
# Alternative: use curl for TCP check
timeout 5 bash -c "</dev/tcp/db.internal/5432" && echo "Port open" || echo "Port closed"
# 5. Check local routing
ip route get 10.0.1.50
# Shows route via 10.0.1.1
# Trace the path
traceroute -n db.internal
# All hops respond
# 6. Check local listening ports
ss -tuln | grep :5432
# Nothing! PostgreSQL not listening
# 7. Log into database server
ssh db.internal
# 8. Check if PostgreSQL is running
sudo systemctl status postgresql
# Active: inactive (dead)
# Service crashed!
# 9. Check why it's down
sudo journalctl -u postgresql -n 100
# Shows: "FATAL: could not create lock file: No space left"
# 10. Check disk space
df -h
# /var is 100% full
# 11. Clean up space (see Scenario 2)
sudo find /var/log -name "*.log" -mtime +7 -exec gzip {} \;
# 12. Start PostgreSQL
sudo systemctl start postgresql
# 13. Verify it's listening
ss -tuln | grep :5432
# tcp LISTEN 0 128 *:5432 *:*
# 14. Test from app server
nc -zv db.internal 5432
# Connection to db.internal 5432 port [tcp/postgresql] succeeded!
# 15. Check PostgreSQL logs
sudo tail -f /var/log/postgresql/postgresql-*.log
# 16. Test application connection
# From app server:
psql -h db.internal -U appuser -d appdb -c "SELECT 1;"
# Success!
# 17. Restart application
ssh app-server "sudo systemctl restart myapp"
# 18. Verify application health
curl http://app-server:8080/health
Network debugging cheat sheet:
# Layer 1-2: Physical/Link
ip link show
ethtool eth0
# Layer 3: Network
ping -c 4 <host>
ip route
ip addr
# Layer 4: Transport
ss -tuln | grep :<port>
nc -zv <host> <port>
# Layer 7: Application
curl -v http://<host>:<port>/
dig <hostname>
# Firewall
sudo firewall-cmd --list-all
sudo iptables -L -n
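The layered checks above compose into a quick triage script worth keeping on a jump host; a minimal sketch (script name and timeout values are assumptions):
#!/bin/bash
# net-triage.sh <host> <port> - walk up the stack until something fails
HOST="$1"; PORT="$2"
getent hosts "$HOST" > /dev/null || { echo "DNS: FAIL"; exit 1; }
echo "DNS: OK"
ping -c 2 -W 2 "$HOST" > /dev/null && echo "ICMP: OK" || echo "ICMP: no reply (may be filtered)"
nc -z -w 3 "$HOST" "$PORT" && echo "TCP :$PORT OK" || { echo "TCP :$PORT FAIL"; exit 1; }
curl -sf -m 5 "http://$HOST:$PORT/" > /dev/null && echo "HTTP: OK" || echo "HTTP: FAIL (may not be an HTTP service)"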
Key commands used: ping, dig, nc, ss, ip, traceroute, ssh, systemctl, journalctl
Scenario 4: Post-Deployment Validation
Situation: Just deployed new application version, need to verify health.
Complete validation workflow:
# 1. Pre-deployment snapshot
ssh prod-server << 'EOF'
systemctl status myapp > /tmp/pre-deploy.txt
ss -tuln | grep :8080 >> /tmp/pre-deploy.txt
ps aux | grep myapp >> /tmp/pre-deploy.txt
EOF
# 2. Deploy application
rsync -avz --delete /local/app/ prod-server:/opt/app/
# 3. Restart service
ssh prod-server "sudo systemctl restart myapp"
# 4. Wait for startup (30 seconds)
sleep 30
# 5. Check service status
ssh prod-server "systemctl status myapp"
# Active: active (running) since...
# 6. Verify process is running
ssh prod-server "ps aux | grep myapp | grep -v grep"
# Shows java process
# 7. Check port is listening
ssh prod-server "ss -tuln | grep :8080"
# tcp LISTEN 0 128 *:8080 *:*
# 8. Test health endpoint
curl -sf http://prod-server:8080/health
# {"status":"UP","version":"1.2.3"}
# If fails:
curl -v http://prod-server:8080/health
# Shows detailed error
# 9. Check application logs for startup
ssh prod-server "sudo journalctl -u myapp -n 50 --no-pager"
# Look for "Started MyApp" message
# 10. Check for errors
ssh prod-server "sudo journalctl -u myapp -p err -n 20"
# No errors = good
# 11. Watch logs for anomalies
ssh prod-server "sudo tail -f /var/log/myapp/application.log" &
TAIL_PID=$!
# 12. Smoke test critical endpoints
curl -sf http://prod-server:8080/api/users | jq '.data | length'
# Returns count
curl -sf http://prod-server:8080/api/config
# Returns config
# 13. Load test (light)
for i in {1..100}; do
curl -sf http://prod-server:8080/health > /dev/null
echo -n "."
done
echo " Done"
# 14. Monitor resource usage
ssh prod-server "ps aux | grep myapp | awk '{print \$3, \$4}'"
# CPU: 2.5%, MEM: 15.3%
# 15. Check for memory leaks (wait 5 minutes, check again)
sleep 300
ssh prod-server "ps aux | grep myapp | awk '{print \$3, \$4}'"
# CPU: 1.2%, MEM: 15.4% (stable)
# 16. Database connectivity
ssh prod-server "curl -sf http://localhost:8080/api/db-health"
# {"database":"connected"}
# 17. External dependencies
ssh prod-server "curl -sf http://localhost:8080/api/dependencies"
# Shows all upstream services: OK
# 18. Monitor for 10 minutes
watch -d -n 10 "ssh prod-server 'systemctl status myapp | head -3'"
# 19. Check error logs continuously (capture the PID so it can be stopped)
ssh prod-server "sudo journalctl -u myapp -f" | grep -i error &
JOURNAL_PID=$!
# 20. Stop monitoring
kill $TAIL_PID $JOURNAL_PID
# 21. Final validation
curl -sf http://prod-server:8080/health && echo "✓ DEPLOYMENT SUCCESS" || echo "✗ DEPLOYMENT FAILED"
# 22. Post-deployment snapshot
ssh prod-server << 'EOF'
systemctl status myapp > /tmp/post-deploy.txt
ss -tuln | grep :8080 >> /tmp/post-deploy.txt
ps aux | grep myapp >> /tmp/post-deploy.txt
EOF
# 23. Compare before/after
ssh prod-server "diff /tmp/pre-deploy.txt /tmp/post-deploy.txt"
Automated health check script:
#!/bin/bash
set -euo pipefail
HOST="$1"
PORT="${2:-8080}"
TIMEOUT=300 # 5 minutes
echo "Validating deployment on $HOST:$PORT"
# Wait for port to open (fail if the deadline passes)
echo -n "Waiting for port to listen..."
PORT_OPEN=0
for i in $(seq 1 "$TIMEOUT"); do
if nc -z "$HOST" "$PORT" 2>/dev/null; then
PORT_OPEN=1
echo " OK"
break
fi
echo -n "."
sleep 1
done
if [ "$PORT_OPEN" -ne 1 ]; then
echo " TIMEOUT"
exit 1
fi
# Test health endpoint
echo -n "Testing health endpoint..."
if curl -sf "http://$HOST:$PORT/health" > /dev/null; then
echo " OK"
else
echo " FAILED"
exit 1
fi
# Check logs for errors (journalctl exits 0 even when nothing matches,
# so test whether it actually printed any error lines)
echo -n "Checking logs for errors..."
ERRORS=$(ssh "$HOST" "sudo journalctl -u myapp --since '5 minutes ago' -p err -q")
if [ -n "$ERRORS" ]; then
echo " ERRORS FOUND"
echo "$ERRORS"
exit 1
else
echo " OK"
fi
# Monitor for 60 seconds
echo "Monitoring for 60 seconds..."
for i in {1..60}; do
if ! curl -sf "http://$HOST:$PORT/health" > /dev/null; then
echo "Health check failed after $i seconds"
exit 1
fi
echo -n "."
sleep 1
done
echo " OK"
echo "✓ Deployment validation complete"
Key commands used: rsync, ssh, systemctl, ps, ss, curl, journalctl, tail, watch
Part 10: Pro Tips & Best Practices
1. Command Combinations (Pipeline Mastery)
Process analysis:
# Top 10 CPU consumers
ps aux | sort -nrk 3 | head -10
# Top 10 memory consumers
ps aux | sort -nrk 4 | head -10
# Count processes by user
ps aux | awk '{print $1}' | sort | uniq -c | sort -nr
# Find zombie processes (STAT may be Z, Z+, etc., so match the prefix)
ps aux | awk '$8 ~ /^Z/ {print}'
# Process subtree for a specific service (forest output lists children below the parent)
ps auxf | grep -A 10 '[n]ginx'
# Total memory by command
ps aux | awk '{arr[$11]+=$6} END {for (i in arr) print i,arr[i]/1024 "MB"}' | sort -nrk 2 | head
Log analysis pipelines:
# Top 10 error types
grep ERROR /var/log/app.log | awk '{print $5}' | sort | uniq -c | sort -nr | head -10
# Requests per minute (timestamp is field 4 in common log format)
awk '{print $4}' access.log | cut -d: -f1-3 | uniq -c
# 95th percentile response time (field number depends on your log format)
awk '{print $10}' access.log | sort -n | awk '{a[NR]=$1} END {print a[int(NR*0.95)]}'
# Top IPs by request count
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -10
# Failed requests by hour
grep " 500 " access.log | awk '{print $4}' | cut -d: -f1-2 | uniq -c
# Unique client IPs
awk '{print $1}' access.log | sort -u | wc -l
Network analysis:
# Connections by state (skip the header row)
ss -ant | tail -n +2 | awk '{print $1}' | sort | uniq -c
# Connections per IP
ss -tn | tail -n +2 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr
# Listening services with PIDs
sudo ss -tlnp | column -t
# Bandwidth by process (requires iftop)
sudo iftop -P
# Active connections to specific port
ss -tn sport = :8080 | tail -n +2 | wc -l
System monitoring oneliners:
# CPU usage per core
mpstat -P ALL 1 5
# Disk I/O by device
iostat -xz 1 5
# Memory breakdown
free -h && echo "---" && cat /proc/meminfo | grep -E "Dirty|Writeback|Mapped"
# Top 10 processes by number of open files
sudo lsof | awk '{print $1}' | sort | uniq -c | sort -nr | head -10
# Process with most threads
ps -eo pid,comm,nlwp | sort -nrk 3 | head
File operations:
# Find recently modified config files
find /etc -name "*.conf" -mtime -1 -ls
# Find files larger than 100M modified in last 7 days
find / -type f -size +100M -mtime -7 2>/dev/null
# Duplicate file finder (by checksum; -w 32 compares only the md5 hash)
find . -type f -exec md5sum {} + | sort | uniq -d -w 32
# Find and delete temp files older than 7 days
find /tmp -type f -name "*.tmp" -mtime +7 -delete
# Disk usage by user in /home
sudo du -sch /home/*/ | sort -h
2. Essential Aliases for DevOps
Create ~/.bash_aliases or add to ~/.bashrc:
# System monitoring
alias meminfo='free -h'
alias cpuinfo='lscpu | grep -E "Model name|^CPU\(s\)|Thread|Core"'
alias diskinfo='df -h | grep -v loop'
alias topme='htop -u $USER'
# Process management
alias pscpu='ps aux | sort -nrk 3 | head -10'
alias psmem='ps aux | sort -nrk 4 | head -10'
alias pstree='ps axjf' # note: shadows the real pstree binary
# Service management
alias svc='systemctl'
alias svcs='systemctl status'
alias svcr='sudo systemctl restart'
alias svce='sudo systemctl enable'
alias svcd='sudo systemctl disable'
alias svclogs='journalctl -u'
alias svclist='systemctl list-units --type=service --state=running'
# Network
alias ports='sudo ss -tulpn'
alias listening='sudo ss -tulpn | grep LISTEN'
alias connections='ss -tu'
alias myip='curl -s ifconfig.me'
alias pingtest='ping -c 5 8.8.8.8'
# Logs
alias tailf='tail -f'
alias taillogs='sudo tail -f /var/log/messages'
alias tailerr='sudo tail -f /var/log/messages | grep -i error'
alias syslog='sudo journalctl -f'
# File operations
alias ll='ls -lah'
alias la='ls -A'
alias lt='ls -lhtr' # Sorted by time
alias lsize='ls -lhS' # Sorted by size
alias tree='tree -C'
# Safety
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
alias mkdir='mkdir -p'
# Navigation
alias ..='cd ..'
alias ...='cd ../..'
alias ....='cd ../../..'
# Git (if used)
alias gs='git status'
alias gd='git diff'
alias gl='git log --oneline -10'
alias gb='git branch'
alias gp='git pull'
alias gf='git fetch'
# Docker (if used)
alias dps='docker ps'
alias dpsa='docker ps -a'
alias dimg='docker images'
alias dlog='docker logs -f'
alias dexec='docker exec -it'
alias dstop='docker stop $(docker ps -q)'
alias dprune='docker system prune -af'
# Quick commands
alias h='history'
alias c='clear'
alias x='exit'
alias reload='source ~/.bashrc'
# Custom shortcuts
alias dev='cd ~/projects/dev-server && ll'
alias logs='cd /var/log && ll'
alias nginx-reload='sudo systemctl reload nginx && echo "Nginx reloaded"'
alias app-restart='sudo systemctl restart myapp && curl -sf localhost:8080/health'
Team-shared aliases (put in git repo):
# Production shortcuts
alias prod-ssh='ssh -J bastion prod-server'
alias prod-status='ssh prod-server "systemctl status myapp"'
alias prod-logs='ssh prod-server "sudo journalctl -u myapp -f"'
alias prod-restart='ssh prod-server "sudo systemctl restart myapp"'
# Deployment
alias deploy-staging='./scripts/deploy.sh staging'
alias deploy-prod='./scripts/deploy.sh production'
# Health checks
alias check-all='for s in $(cat servers.txt); do echo -n "$s: "; curl -sf http://$s/health > /dev/null && echo OK || echo FAIL; done'
# Log aggregation
alias tail-all='for s in $(cat servers.txt); do echo "=== $s ==="; ssh $s "tail -20 /var/log/app.log"; done'
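One way to wire the shared file into every engineer's shell (the repo path is an assumption):
# In each ~/.bashrc
[ -f ~/ops-repo/shared-aliases.sh ] && source ~/ops-repo/shared-aliases.sh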
3. Safety Practices
Before destructive operations:
# Test with echo first
find /old/path -name "*.log" -exec echo rm {} \;
# Dry run with rsync
rsync -avzn --delete /source/ /dest/
# Backup before sed -i
cp file.conf file.conf.backup
sed -i 's/old/new/' file.conf
# Or use sed without -i
sed 's/old/new/' file.conf > file.conf.new
diff file.conf file.conf.new
mv file.conf.new file.conf
# Interactive delete
rm -i important-file
# Confirm before bulk operations
find . -name "*.tmp" -ok rm {} \;
# Use trash instead of rm (install trash-cli)
alias rm='trash'
Validate before execution:
# Validate configs before applying them
sudo nginx -t
# Reload unit definitions after editing them
sudo systemctl daemon-reload
# Verify sudo permissions
sudo -l
# Test script syntax
bash -n script.sh
shellcheck script.sh
# Validate JSON
jq . config.json
python3 -m json.tool config.json
# Validate YAML (requires PyYAML)
python3 -c 'import yaml,sys; yaml.safe_load(sys.stdin)' < config.yaml
# Check file ownership before chown
ls -la file.txt
Use safe patterns:
# Exit on error
set -e
# Stricter: also fail on unset variables and mid-pipeline errors
set -euo pipefail
# Use || true for commands that may fail
grep pattern file.txt || true
# Check command exists
command -v jq >/dev/null || { echo "jq not found"; exit 1; }
# Verify file exists before operations
[ -f /path/file ] || { echo "File not found"; exit 1; }
# Lock files for scripts
LOCKFILE=/tmp/myscript.lock
if [ -f "$LOCKFILE" ]; then
echo "Script already running"
exit 1
fi
touch "$LOCKFILE"
trap "rm -f $LOCKFILE" EXIT
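The touch-based lock above has a small race window between the check and the touch; flock(1) from util-linux closes it. A minimal sketch:
#!/bin/bash
# Hold an exclusive, non-blocking lock on fd 200 for the script's lifetime
exec 200>/tmp/myscript.lock
if ! flock -n 200; then
  echo "Script already running"
  exit 1
fi
# ... critical section; the lock is released automatically on exit ...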
4. Efficiency Tips
Command-line shortcuts:
# Reverse search (most powerful)
Ctrl+R # Search history
Ctrl+R # (again) Next older match
Enter # Execute
# Navigation
Ctrl+A # Start of line
Ctrl+E # End of line
Ctrl+K # Kill to end of line
Ctrl+U # Kill to start of line
Ctrl+W # Delete word backward
Alt+B # Move back one word
Alt+F # Move forward one word
# Process control
Ctrl+C # Kill current process
Ctrl+Z # Suspend process
jobs # List background jobs
fg # Bring to foreground
bg # Resume in background
# Clear and exit
Ctrl+L # Clear screen
Ctrl+D # Exit shell
Useful shell options:
# Add to ~/.bashrc
# Autocorrect typos in cd
shopt -s cdspell
# Extended glob patterns
shopt -s extglob
# Recursive globbing (**)
shopt -s globstar
# Append to history, don't overwrite
shopt -s histappend
# Multi-line commands on one line
shopt -s cmdhist
# Update LINES and COLUMNS after resize
shopt -s checkwinsize
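For example, once globstar is on, ** matches any depth of directories:
# List every .log file under the current tree, no find(1) needed
ls **/*.log
# Grep across all of them
grep -l ERROR **/*.log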
Productivity multipliers:
# Use !! for last command
sudo !!
# Use !$ for last argument
cat /var/log/nginx/access.log
vi !$
# Use !* for all arguments
grep error /var/log/app.log
less !*
# Quick substitution
docker run old-image
^old^new
# Brace expansion
mv file.{txt,bak} # mv file.txt file.bak
cp file.conf{,.backup} # cp file.conf file.conf.backup
mkdir -p project/{src,bin,lib,docs}
# Command substitution
cd $(dirname $(which nginx))
kill $(pgrep -f "java.*myapp")
# For loops
for i in {1..5}; do echo "Server $i"; done
for server in web{1..3}; do ssh $server uptime; done
for file in *.log; do gzip "$file"; done
# While loops with read (redirect the file in; use ssh -n so it
# doesn't swallow the loop's stdin)
while read -r server; do
echo "Checking $server"
ssh -n "$server" "df -h"
done < servers.txt
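For a long server list, the same loop parallelizes with GNU xargs (-a is GNU-specific; 5 concurrent connections is an arbitrary choice):
# Run df -h on every host in servers.txt, 5 at a time
xargs -a servers.txt -I{} -P 5 ssh -n {} "df -h /"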
tmux for persistent sessions:
# Start session
tmux
# Detach
Ctrl+B d
# List sessions
tmux ls
# Reattach
tmux attach
# Named session
tmux new -s deploy
# Split windows
Ctrl+B % # Vertical split
Ctrl+B " # Horizontal split
Ctrl+B o # Switch pane
5. Documentation & Learning
Document as you go:
# Save your history with context
history | tail -20 > incident-$(date +%Y%m%d).txt
# Create runbooks from commands
cat << 'EOF' > runbook-deploy.md
# Deployment Runbook
## Pre-deployment
```bash
ssh prod-server "systemctl status myapp"
```
## Deployment
```bash
rsync -avz /local/app/ prod-server:/opt/app/
ssh prod-server "sudo systemctl restart myapp"
```
## Validation
```bash
curl http://prod-server:8080/health
```
EOF
# Record your terminal session (script command)
script -t 2>timing.txt deployment.log
# Do your work
# exit
# Replay: scriptreplay -t timing.txt deployment.log
Command man pages:
# Read manual
man ssh
man 5 ssh_config # Config file format
# Search man pages
man -k network
# Show one-line description
whatis grep
# Quick help
grep --help
Learn by exploring:
# See what command does
type ll
type -a python
# Find command location
which nginx
whereis nginx
# Check command version
nginx -v
python --version
# Inspect binaries
file $(which curl)
ldd $(which curl) # Library dependencies
# Explore /proc filesystem
cat /proc/cpuinfo
cat /proc/meminfo
cat /proc/sys/kernel/hostname
6. Monitoring & Alerting
Quick health checks:
# One-liner system check
echo "Load: $(uptime | awk -F'load average:' '{print $2}')" && \
echo "Disk: $(df -h / | awk 'NR==2{print $5}')" && \
echo "Mem: $(free | awk 'NR==2{printf "%.0f%%", $3/$2*100}')"
# Service availability
systemctl is-active myapp && echo "UP" || echo "DOWN"
# Port check
nc -z localhost 8080 && echo "Port open" || echo "Port closed"
# HTTP health
curl -sf http://localhost/health | jq -r '.status' || echo "FAIL"
Alerting with simple scripts:
#!/bin/bash
# disk-alert.sh
THRESHOLD=90
USAGE=$(df -h / | awk 'NR==2{print $5}' | tr -d '%')
if [ "$USAGE" -gt "$THRESHOLD" ]; then
echo "ALERT: Disk usage at ${USAGE}%"
# Send notification (mail, slack, etc)
fi
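To make it alert unattended, run it from cron (the install path and five-minute interval are assumptions):
# crontab -e
*/5 * * * * /usr/local/bin/disk-alert.sh 2>&1 | logger -t disk-alert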
Conclusion
You now have a comprehensive toolkit of Linux commands for DevOps operations.
Key takeaways:
- Master process monitoring (ps, top) for immediate triage
- Know your logs (journalctl, tail, grep) for root cause analysis
- Understand resource monitoring (free, df, iostat) to prevent issues
- Use network tools (ss, curl, dig) to debug connectivity
- Leverage remote operations (ssh, rsync) for efficient workflows
- Apply permission management (chmod, chown, sudo) securely
- Build pipelines (pipes, xargs, awk) for complex analysis
- Practice on non-production systems first
Next steps:
- Bookmark this guide for reference
- Practice one section daily in your work
- Build your personal alias library
- Create runbooks for common tasks
- Automate repetitive operations with scripts
What's next:
- Part 2: Advanced Linux Commands for SRE (coming soon)
- Security-Focused Linux Commands for DevOps
- Docker & Container Management CLI Reference
- Troubleshooting Production Incidents: A Command-Line Guide
Take action:
- Which commands do you use most? Share in comments
- Have a favorite one-liner? Drop it below
- Subscribe for more DevOps tutorials
- Follow for updates on the advanced series
Related guides on this blog:
- Setting Up GitLab CI for Blog Automation
- Infrastructure as Code with Ansible
- DevOps Workflows: From Code to Deployment
Keywords: devops linux commands, linux automation, system monitoring, devops tools, linux troubleshooting, site reliability engineering, system administration, command line reference