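☁️ DevOps and Cloud Native
“Automate everything. If you have done something twice, script it.”
This chapter is a practical guide to the modern DevOps practices needed to deploy, scale, and maintain production-grade applications.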
🐳 Docker
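Dockerfile best practices
# Multi-stage build: compile in a JDK image, ship only the JRE and the jar
FROM eclipse-temurin:21-jdk AS build
WORKDIR /app
# Copy the Maven wrapper along with the POM so ./mvnw exists in the build stage
COPY mvnw pom.xml ./
COPY .mvn .mvn
COPY src ./src
RUN ./mvnw clean package -DskipTests
FROM eclipse-temurin:21-jre
WORKDIR /app
# Copy the jar from the build stage, not from the build context
COPY --from=build /app/target/*.jar app.jar
# Run as a non-root user for security
RUN addgroup --system app && adduser --system --group app
USER app
EXPOSE 8080
# Health check (assumes curl is available in the runtime image)
HEALTHCHECK CMD curl -f http://localhost:8080/actuator/health || exit 1
ENTRYPOINT ["java", "-jar", "app.jar"]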
Docker Compose
# docker-compose.yml
version: '3.8'
services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      - SPRING_DATASOURCE_URL=jdbc:postgresql://db:5432/myapp
      - SPRING_REDIS_HOST=cache
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_started
    networks:
      - backend
  db:
    image: postgres:16
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - backend
  cache:
    image: redis:7-alpine
    networks:
      - backend
volumes:
  postgres_data:
networks:
  backend:
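Essential Docker commands
# Build and run
docker build -t myapp:latest .
docker run -d -p 8080:8080 --name myapp myapp:latest
# Compose operations
docker compose up -d # start services in the background
docker compose logs -f app # follow the app service logs
docker compose down -v # stop and remove services; -v also deletes named volumes
# Debugging
docker exec -it myapp /bin/sh # open a shell inside the container
docker logs --tail 100 -f myapp # show the last 100 log lines and follow
docker stats # show container resource usage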
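
Scripts that start a container often need to wait until it reports healthy before running the next step. The following is a minimal sketch of that polling pattern, not a definitive implementation. The helper names `wait_until` and `container_healthy` are hypothetical, and the sketch assumes the image defines a `HEALTHCHECK` so that `docker inspect` exposes a health status.

```shell
#!/usr/bin/env bash
# Poll a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command...>
wait_until() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical check: succeeds once Docker reports the container as healthy.
# Assumes the image defines a HEALTHCHECK.
container_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}
```

For example, `wait_until 30 2 container_healthy myapp` checks up to 30 times, two seconds apart, and returns non-zero if the container never becomes healthy.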
☸️ Kubernetes
Core concepts
Deployment example
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 3 # number of Pod replicas
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi" # memory request
              cpu: "250m" # CPU request
            limits:
              memory: "512Mi" # memory limit
              cpu: "500m" # CPU limit
          livenessProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            initialDelaySeconds: 30 # wait 30 seconds after start before probing
            periodSeconds: 10 # probe every 10 seconds
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 5 # wait 5 seconds after start before probing
            periodSeconds: 5 # probe every 5 seconds
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer # expose the Service through a load balancer
kubectl commands
# Cluster information
kubectl cluster-info
kubectl get nodes
# Deployment operations
kubectl apply -f deployment.yaml # apply the configuration
kubectl get pods -w # watch Pod status
kubectl describe pod myapp-xxx # show Pod details
# Scaling
kubectl scale deployment myapp --replicas=5 # scale to 5 replicas
# Debugging
kubectl logs myapp-xxx -f # follow container logs
kubectl exec -it myapp-xxx -- /bin/sh # open a shell inside the container
kubectl port-forward svc/myapp-service 8080:80 # forward local port 8080 to the Service
# Rolling updates
kubectl set image deployment/myapp myapp=myapp:v2 # update the image
kubectl rollout status deployment/myapp # watch rollout progress
kubectl rollout undo deployment/myapp # roll back to the previous revision
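
The three rolling-update commands above can be combined so that a failed rollout is undone automatically. This is a rough sketch under stated assumptions: the function name `deploy_or_rollback` and the 120-second default timeout are illustrative choices, not part of the original guide.

```shell
#!/usr/bin/env bash
# Update a Deployment's image, wait for the rollout, and undo it on failure.
# Usage: deploy_or_rollback <deployment> <container> <image> [timeout]
deploy_or_rollback() {
  local deployment=$1 container=$2 image=$3 timeout=${4:-120s}

  kubectl set image "deployment/${deployment}" "${container}=${image}" || return 1

  # `rollout status` exits non-zero if the rollout fails or the timeout expires.
  if kubectl rollout status "deployment/${deployment}" --timeout="${timeout}"; then
    echo "rollout of ${image} succeeded"
    return 0
  fi

  echo "rollout of ${image} failed, rolling back" >&2
  kubectl rollout undo "deployment/${deployment}"
  return 1
}
```

Calling `deploy_or_rollback myapp myapp myapp:v2` mirrors the manual sequence. The function returns non-zero whenever a rollback was needed, so a CI job can fail on it.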
🔄 CI/CD with GitHub Actions
Complete workflow example
# .github/workflows/deploy.yml
name: Build and deploy
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up JDK 21
        uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'
          cache: maven
      - name: Run tests
        run: ./mvnw verify
  build:
    needs: test
    runs-on: ubuntu-latest
    # Build and push only for pushes to main, not for pull requests
    if: github.ref == 'refs/heads/main'
    # GITHUB_TOKEN needs write access to packages to push to ghcr.io
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to the container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment: production
    steps:
      # Check out the repository so the manifests in k8s/ are available
      - uses: actions/checkout@v4
      # Assumes cluster credentials (kubeconfig) are configured before this step
      - name: Deploy to Kubernetes
        uses: azure/k8s-deploy@v4
        with:
          manifests: k8s/
          images: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
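
One detail the workflow leaves implicit: repository names in image references must be lowercase, while `github.repository` keeps the original casing of the owner and repository. If either contains uppercase letters, the tag is likely to be rejected. A minimal sketch of normalizing the reference in a shell step follows; the helper name `image_ref` is hypothetical.

```shell
#!/usr/bin/env bash
# Build a registry reference such as ghcr.io/owner/repo:tag.
# Usage: image_ref <registry> <repository> <tag>
image_ref() {
  local registry=$1 repository=$2 tag=$3
  # Repository names must be lowercase; the tag keeps its case.
  repository=$(printf '%s' "$repository" | tr '[:upper:]' '[:lower:]')
  printf '%s/%s:%s\n' "$registry" "$repository" "$tag"
}
```

Inside a workflow `run` step this could be called as `image_ref ghcr.io "$GITHUB_REPOSITORY" "$GITHUB_SHA"`, using the default environment variables that GitHub Actions provides.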
☁️ Cloud platforms
AWS services overview
| Service | Purpose | Google Cloud equivalent |
|---|---|---|
| EC2 | Virtual machines | Compute Engine |
| S3 | Object storage | Cloud Storage |
| RDS | Managed databases | Cloud SQL |
| Lambda | Serverless functions | Cloud Functions |
| EKS | Managed Kubernetes | Google Kubernetes Engine (GKE) |
| CloudWatch | Monitoring | Cloud Monitoring |
| SQS/SNS | Messaging (queues and pub/sub) | Pub/Sub |
AWS CLI examples
# S3 operations
aws s3 cp file.txt s3://mybucket/ # upload a file
aws s3 sync ./dist s3://mybucket/static/ # sync a directory
# ECR (container registry)
aws ecr get-login-password | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com
docker tag myapp:latest <account>.dkr.ecr.<region>.amazonaws.com/myapp:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/myapp:latest
# ECS / EKS
aws ecs update-service --cluster mycluster --service myapp --force-new-deployment
aws eks update-kubeconfig --name mycluster --region us-east-1
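
The ECR login, tag, and push commands repeat the same registry host three times, which makes them a natural candidate for a small script. The sketch below wraps them in two hypothetical helpers, `ecr_registry` and `ecr_push`. The account ID and region are placeholders to supply, and passing `--region` explicitly is an assumption that the region should not come from the CLI's default configuration.

```shell
#!/usr/bin/env bash
# Print the ECR registry host for an account and region.
# Usage: ecr_registry <account> <region>
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Log in to ECR, tag a local image, and push it.
# Usage: ecr_push <account> <region> <local_image> <repository:tag>
ecr_push() {
  local registry
  registry=$(ecr_registry "$1" "$2")
  aws ecr get-login-password --region "$2" |
    docker login --username AWS --password-stdin "$registry" || return 1
  docker tag "$3" "${registry}/$4" || return 1
  docker push "${registry}/$4"
}
```

For example, `ecr_push 123456789012 us-east-1 myapp:latest myapp:latest` would push the local `myapp:latest` image to that account's registry, where `123456789012` is a stand-in account ID.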
📊 Monitoring and observability
The three pillars of observability
| Pillar | Purpose | Recommended tools |
|---|---|---|
| Logs | Event records | ELK Stack, Loki, CloudWatch Logs |
| Metrics | Performance measurement | Prometheus, Datadog, CloudWatch |
| Traces | Request flow analysis | Jaeger, Zipkin, X-Ray |
# Example Prometheus scrape configuration
scrape_configs:
  - job_name: 'spring-app'
    metrics_path: '/actuator/prometheus' # Spring Boot metrics endpoint
    static_configs:
      - targets: ['app:8080']
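📝 Detailed topics
DevOps principles
- Infrastructure as Code (IaC) - keep all infrastructure under version control.
- Immutable infrastructure - replace instances instead of patching them.
- Automated deployment - reduce human error.
- Monitor everything - find problems before users complain.
- Fail forward - roll back quickly and run blameless postmortems.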