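☁️ DevOps and Cloud Native

"Automate everything. If you have done something twice, script it."

This chapter is a guide to the modern DevOps practices needed to deploy, scale, and maintain production-grade applications.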


🐳 Docker

Dockerfile Best Practices

# Multi-stage build: compile in a JDK image, ship only the JRE and the jar
FROM eclipse-temurin:21-jdk AS build
WORKDIR /app
# Copy the Maven wrapper together with the POM so ./mvnw exists in this stage
COPY mvnw pom.xml ./
COPY .mvn ./.mvn
COPY src ./src
RUN ./mvnw clean package -DskipTests

FROM eclipse-temurin:21-jre
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar

# Run as a non-root user for security (creates the user and a matching group)
RUN adduser --system --group app
USER app

EXPOSE 8080
# Health check (requires curl to be present in the runtime image)
HEALTHCHECK --interval=30s --timeout=3s \
  CMD curl -f http://localhost:8080/actuator/health || exit 1

ENTRYPOINT ["java", "-jar", "app.jar"]
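
The HEALTHCHECK instruction above sets an interval and a timeout but leaves the retry count at Docker's default of 3. The sketch below models how a sequence of probe results maps to the container's health status; it is a simplified illustration (it ignores the start period), and the function name `health_status` is hypothetical.

```python
def health_status(probe_results, retries=3):
    """Return the health status after a sequence of probe results.

    probe_results: iterable of booleans, True when the probe command exited 0.
    retries: consecutive failures needed before the container is "unhealthy".
    """
    status = "starting"          # state before any probe has succeeded or failed enough
    consecutive_failures = 0
    for ok in probe_results:
        if ok:
            # A single passing probe marks the container healthy again.
            consecutive_failures = 0
            status = "healthy"
        else:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                status = "unhealthy"
    return status


if __name__ == "__main__":
    print(health_status([True, False, False]))   # two failures are not enough
    print(health_status([False, False, False]))  # three in a row flips the status
```

With a 30-second interval, this means a failing application is reported as unhealthy roughly 90 seconds after its first failed probe.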

Docker Compose

# docker-compose.yml
services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      - SPRING_DATASOURCE_URL=jdbc:postgresql://db:5432/myapp
      # Spring Boot 3.x reads SPRING_DATA_REDIS_HOST instead
      - SPRING_REDIS_HOST=cache
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_started
    networks:
      - backend

  db:
    image: postgres:16
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - backend

  cache:
    image: redis:7-alpine
    networks:
      - backend

volumes:
  postgres_data:

networks:
  backend:
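
The `depends_on` entries above mean that `db` and `cache` are started before `app`. As a rough illustration of that ordering only, the sketch below topologically sorts the same dependency graph with the standard library. It does not model the `service_healthy` wait, and the names `DEPENDS_ON` and `start_order` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each service maps to the set of services it depends on,
# mirroring the depends_on entries in the Compose file.
DEPENDS_ON = {
    "app": {"db", "cache"},
    "db": set(),
    "cache": set(),
}


def start_order(depends_on):
    """Return one valid start order: dependencies come before dependents."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(start_order(DEPENDS_ON))  # "app" is always last
```

A circular dependency (for example `db` depending on `app`) makes `static_order()` raise `graphlib.CycleError`, which is comparable to Compose rejecting a dependency cycle.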

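Essential Docker Commands

# Build and run
docker build -t myapp:latest .
docker run -d -p 8080:8080 --name myapp myapp:latest

# Compose operations
docker compose up -d # start services in the background
docker compose logs -f app # follow the app service logs
docker compose down -v # stop and remove services, including named volumes

# Debugging
docker exec -it myapp /bin/sh # open a shell inside the container
docker logs --tail 100 -f myapp # show the last 100 log lines and follow
docker stats # show container resource usage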

☸️ Kubernetes

Core Concepts

Deployment Example

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 3 # number of replicas
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi" # memory request
              cpu: "250m" # CPU request
            limits:
              memory: "512Mi" # memory limit
              cpu: "500m" # CPU limit
          livenessProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            initialDelaySeconds: 30 # wait 30 seconds after start before probing
            periodSeconds: 10 # probe every 10 seconds
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 5 # wait 5 seconds after start before probing
            periodSeconds: 5 # probe every 5 seconds
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer # expose through a load balancer
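
To see what the `resources.requests` values above add up to across all replicas, the following sketch parses Kubernetes-style quantities and multiplies them by the replica count. It is a minimal illustration that handles only millicore CPU values and the `Ki`/`Mi`/`Gi` memory suffixes; the function names are hypothetical.

```python
MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "256Mi" to bytes (binary suffixes only)."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a plain number is already in bytes


def total_requests(replicas: int, cpu: str, memory: str):
    """Return (total millicores, total bytes) requested by all replicas."""
    return replicas * cpu_millicores(cpu), replicas * memory_bytes(memory)


if __name__ == "__main__":
    cpu, mem = total_requests(3, "250m", "256Mi")
    print(cpu, mem // 1024**2)  # 750 millicores, 768 MiB
```

For the Deployment above, the scheduler therefore needs to find 750 millicores and 768 MiB of allocatable capacity across the nodes that host the three Pods.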

kubectl Commands

# Cluster information
kubectl cluster-info
kubectl get nodes

# Deployment operations
kubectl apply -f deployment.yaml # apply the configuration
kubectl get pods -w # watch Pod status changes
kubectl describe pod myapp-xxx # show Pod details

# Scaling
kubectl scale deployment myapp --replicas=5 # scale to 5 replicas

# Debugging
kubectl logs myapp-xxx -f # follow container logs
kubectl exec -it myapp-xxx -- /bin/sh # open a shell inside the container
kubectl port-forward svc/myapp-service 8080:80 # forward local port 8080 to the service

# Rolling updates
kubectl set image deployment/myapp myapp=myapp:v2 # update the image
kubectl rollout status deployment/myapp # check rollout status
kubectl rollout undo deployment/myapp # roll back to the previous revision
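
The Deployment shown earlier does not set an update strategy, so a rolling update uses the defaults of 25% `maxSurge` (rounded up) and 25% `maxUnavailable` (rounded down). The sketch below works out what those percentages mean for a given replica count; the function name `rollout_bounds` is hypothetical and the calculation is an illustration, not the controller's actual code.

```python
def rollout_bounds(replicas, max_surge_pct=25, max_unavailable_pct=25):
    """Return the Pod-count bounds that hold during a rolling update."""
    # maxSurge rounds up: ceil(replicas * pct / 100) using integer arithmetic
    surge = -(-replicas * max_surge_pct // 100)
    # maxUnavailable rounds down: floor(replicas * pct / 100)
    unavailable = replicas * max_unavailable_pct // 100
    return {
        "max_pods": replicas + surge,            # upper bound on total Pods
        "min_available": replicas - unavailable, # lower bound on available Pods
    }


if __name__ == "__main__":
    print(rollout_bounds(3))  # {'max_pods': 4, 'min_available': 3}
    print(rollout_bounds(5))  # {'max_pods': 7, 'min_available': 4}
```

With 3 replicas, at most one extra Pod is created at a time and no existing Pod is taken down until its replacement is ready.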

🔄 CI/CD with GitHub Actions

Complete Workflow Example

# .github/workflows/deploy.yml
name: Build and Deploy

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up JDK 21
        uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'
          cache: maven

      - name: Run tests
        run: ./mvnw verify

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    # GITHUB_TOKEN needs package write access to push to ghcr.io
    permissions:
      contents: read
      packages: write

    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to the container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push the image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment: production

    steps:
      # The manifests live in the repository, so check it out first
      - uses: actions/checkout@v4

      # Requires a kubeconfig context set by an earlier step (for example azure/k8s-set-context)
      - name: Deploy to Kubernetes
        uses: azure/k8s-deploy@v4
        with:
          manifests: k8s/
          images: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
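
The workflow above publishes every image under two tags: `latest` and the commit SHA. The sketch below builds the same two references from their parts. It also lowercases the repository name, because image references must be lowercase while `github.repository` keeps the owner's original casing; that lowercasing step is an addition of this sketch, not something the workflow does, and the function name `image_tags` is hypothetical.

```python
def image_tags(registry: str, repository: str, sha: str) -> list[str]:
    """Return the 'latest' and commit-pinned references for an image."""
    # Image repository names must be lowercase, so normalise the casing here.
    image = f"{registry}/{repository.lower()}"
    return [f"{image}:latest", f"{image}:{sha}"]


if __name__ == "__main__":
    # "Acme/MyApp" and "abc123" are placeholder values for illustration.
    for tag in image_tags("ghcr.io", "Acme/MyApp", "abc123"):
        print(tag)
```

Deploying the SHA-pinned reference, as the `deploy` job does, keeps each release traceable to one commit, whereas `latest` moves with every push to `main`.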

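☁️ Cloud Platforms

AWS Services Overview

| Service | Purpose | Alternative (Google Cloud) |
| --- | --- | --- |
| EC2 | Virtual machine servers | GCP Compute Engine |
| S3 | Object storage | GCP Cloud Storage |
| RDS | Managed databases | GCP Cloud SQL |
| Lambda | Serverless functions | GCP Cloud Functions |
| EKS | Managed Kubernetes | GCP GKE |
| CloudWatch | Monitoring | GCP Cloud Monitoring |
| SQS/SNS | Message queuing | GCP Pub/Sub |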

AWS CLI Examples

# S3 operations
aws s3 cp file.txt s3://mybucket/ # upload a file
aws s3 sync ./dist s3://mybucket/static/ # sync a directory

# ECR (container registry)
aws ecr get-login-password | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com
docker tag myapp:latest <account>.dkr.ecr.<region>.amazonaws.com/myapp:latest
docker push <account>.dkr.ecr.<region>.amazonaws.com/myapp:latest

# ECS / EKS
aws ecs update-service --cluster mycluster --service myapp --force-new-deployment
aws eks update-kubeconfig --name mycluster --region us-east-1
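
The ECR commands above repeat the same `<account>.dkr.ecr.<region>.amazonaws.com` prefix three times, which is easy to mistype. As a small sketch, the helper below assembles the full image reference once so a script can reuse it. The function name `ecr_image_uri` and the example account ID are hypothetical.

```python
import re


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the ECR image reference used by docker tag and docker push."""
    # AWS account IDs are 12 digits; fail early on an obvious typo.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit string")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID.
    print(ecr_image_uri("123456789012", "us-east-1", "myapp"))
```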

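📊 Monitoring and Observability

The Three Pillars of Observability

| Pillar | Purpose | Recommended Tools |
| --- | --- | --- |
| Logs | Event records | ELK Stack, Loki, CloudWatch Logs |
| Metrics | Performance measurement | Prometheus, Datadog, CloudWatch |
| Traces | Request flow analysis | Jaeger, Zipkin, X-Ray |

# Example Prometheus scrape configuration
scrape_configs:
  - job_name: 'spring-app'
    metrics_path: '/actuator/prometheus' # Spring Boot metrics endpoint
    static_configs:
      - targets: ['app:8080']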
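
When Prometheus scrapes the endpoint configured above, it receives plain text in the Prometheus exposition format. The sketch below renders a single metric in that format to show what a scrape response looks like. It is a simplified illustration (no escaping of label values), the function name `render_metric` and the sample metric are hypothetical, and a Spring Boot application would normally produce this output through its metrics library rather than by hand.

```python
def render_metric(name, value, labels=None, metric_type="gauge", help_text=""):
    """Render one metric sample in the Prometheus text exposition format."""
    lines = []
    if help_text:
        lines.append(f"# HELP {name} {help_text}")
    lines.append(f"# TYPE {name} {metric_type}")
    label_str = ""
    if labels:
        # Labels are rendered as key="value" pairs inside braces.
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    lines.append(f"{name}{label_str} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metric("http_requests_total", 42, {"method": "GET"},
                        metric_type="counter", help_text="Total HTTP requests"), end="")
```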

📝 Detailed Topics


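DevOps Principles
  1. Infrastructure as Code (IaC) - keep all infrastructure under version control.
  2. Immutable infrastructure - replace instances instead of patching them.
  3. Automated deployment - reduce human error.
  4. Monitor everything - find problems before users complain.
  5. Fail forward - roll back quickly and hold blameless postmortems.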