Skip to main content

Troubleshooting Overview

Use a layer-by-layer process to avoid random fixes.

Standard Workflow

  1. Define symptom precisely (timeout, reset, DNS failure, packet loss).
  2. Reproduce with the smallest possible command.
  3. Locate failing layer (DNS, TCP, TLS, app).
  4. Capture evidence (ss, tcpdump, dig, logs).
  5. Change one variable and re-test.

Fast Triage Matrix

SymptomLikely LayerFirst Commands
Name does not resolveDNS/Applicationdig, nslookup
Timeout before connectNetwork/Transporttraceroute, nc, ss
TLS handshake failedApplication/TLSopenssl s_client, curl -v
Slow but successfulTransport/Appss -ti, request trace

Core Commands

Connectivity

ping -c 4 <host>
traceroute <host>
mtr -rw <host>

DNS

dig <host>
nslookup <host>

TCP and Port Reachability

ss -tulpen
nc -vz <host> <port>

Packet Capture

tcpdump -i any host <host> -nn
tcpdump -i any tcp port 443 -nn

Decision Trees

Cannot connect to service

  1. DNS resolves?
  2. Port reachable?
  3. TLS handshake succeeds?
  4. App returns expected response?

Intermittent failure

  1. Compare by AZ/region/path.
  2. Check MTU and retransmissions.
  3. Correlate with deployment and traffic spikes.

Topic-Specific Guides