Skip to main content

Troubleshooting Overview

Use a layer-by-layer process to avoid random fixes.

Standard Workflow

Define symptom precisely (timeout, reset, DNS failure, packet loss).
Reproduce with the smallest possible command.
Locate failing layer (DNS, TCP, TLS, app).
Capture evidence (ss, tcpdump, dig, logs).
Change one variable and re-test.

Fast Triage Matrix

Symptom	Likely Layer	First Commands
Name does not resolve	DNS/Application	`dig`, `nslookup`
Timeout before connect	Network/Transport	`traceroute`, `nc`, `ss`
TLS handshake failed	Application/TLS	`openssl s_client`, `curl -v`
Slow but successful	Transport/App	`ss -ti`, request trace

Core Commands

Connectivity

ping -c 4 <host>
traceroute <host>
mtr -rw <host>

DNS

dig <host>
nslookup <host>

TCP and Port Reachability

ss -tulpen
nc -vz <host> <port>

Packet Capture

tcpdump -i any host <host> -nn
tcpdump -i any tcp port 443 -nn

Decision Trees

Cannot connect to service

DNS resolves?
Port reachable?
TLS handshake succeeds?
App returns expected response?

Intermittent failure

Compare by AZ/region/path.
Check MTU and retransmissions.
Correlate with deployment and traffic spikes.

Topic-Specific Guides

Standard Workflow
Fast Triage Matrix
Core Commands
Decision Trees
- Cannot connect to service
- Intermittent failure
Topic-Specific Guides
Related Reading