Kubernetes Pattern: Sidecar

근묵자흑

Luuuuu 2025. 11. 22. 20:05

Why doesn't the batch Job complete?

Have you ever run into something like this in production?

apiVersion: batch/v1
kind: Job
metadata:
  name: data-processing
spec:
  template:
    spec:
      containers:
      - name: processor
        image: data-processor:1.0
        # data processing finishes after 30 minutes...

      - name: cloud-sql-proxy
        image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0
        # the DB proxy keeps running...

Result:

$ kubectl get jobs
NAME              COMPLETIONS   DURATION   AGE
data-processing   0/1           45m        45m  # why isn't it completing?

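The main container finished 15 minutes ago (processing takes 30 minutes and the Job is 45 minutes old), yet the Job is still "Running". The reason is that the sidecar container is still running, and a Job's Pod only succeeds once every container in it has exited.

What is the Sidecar pattern?

The Sidecar pattern places a helper container in the same Pod as the main application container, like a sidecar attached to a motorcycle.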

┌─────────────────────────────────┐
│          Pod                    │
│  ┌────────────┐  ┌────────────┐ │
│  │    App     │  │  Sidecar   │ │
│  │ (Business) │  │  (Helper)  │ │
│  └────────────┘  └────────────┘ │
│         │              │        │
│         └──────────────┘        │
│      Shared Resources           │
└─────────────────────────────────┘

Typical use cases:

Log collection (Fluentd, Filebeat)
Database proxy (Cloud SQL Proxy)
Service mesh (Envoy, Istio)
Monitoring (OpenTelemetry Collector)
Git synchronization

Three main problems with traditional sidecars

Problem 1: The Job never completes

# Traditional approach
spec:
  containers:
  - name: main
    command: ["process-data.sh"]  # finishes after 10 minutes
  - name: metrics-sidecar
    command: ["collect-metrics.sh"]  # runs forever

Actual test result:

$ kubectl get pod batch-job-traditional-htx9w
NAME                          READY   STATUS     RESTARTS   AGE
batch-job-traditional-htx9w   1/2     NotReady   0          6m48s

$ kubectl get pod batch-job-traditional-htx9w -o json | jq '.status.containerStatuses[].state'
# main: {"terminated": {"exitCode": 0, "reason": "Completed"}}
# sidecar: {"running": {"startedAt": "2025-11-22T10:10:02Z"}}

The main container has finished, but the sidecar keeps running, so the Job never completes.

Problem 2: No guaranteed startup order

# Traditional approach
spec:
  containers:
  - name: app
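    # tries to connect to the DB... Connection refused!
  - name: db-proxy
    # not ready yet...

Because the two containers start at practically the same time, with no readiness ordering between them:

The app may start before the DB proxy
Initial requests fail
Complex retry logic is required

Actual production log: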

[ERROR] Failed to connect to database: dial tcp 127.0.0.1:5432: connect: connection refused
[WARN] Retrying in 5 seconds... (attempt 1/10)
[ERROR] Failed to connect to database: dial tcp 127.0.0.1:5432: connect: connection refused
[WARN] Retrying in 5 seconds... (attempt 2/10)
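The retry logic the application is forced to carry looks roughly like the following. This is a minimal sketch, not the code behind the log above: the function name `wait_for_port` is hypothetical, and the defaults of 10 attempts with a 5-second delay are assumptions read off the log lines.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 10, delay: float = 5.0) -> bool:
    """Return True once a TCP connection succeeds, False after all attempts fail."""
    for attempt in range(1, attempts + 1):
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: the proxy is not listening yet.
            if attempt < attempts:
                time.sleep(delay)
    return False
```

An application would call something like `wait_for_port("127.0.0.1", 5432)` before opening its database pool. Every service in the Pod that depends on the proxy needs its own copy of this logic, which is the maintenance cost this problem refers to.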

Problem 3: Manual lifecycle management

# Complicated workaround
containers:
- name: sidecar
  lifecycle:
    preStop:
      exec:
        command: ["/bin/sh", "-c", "sleep 30"]

You have to manage sidecar restarts and shutdown timing entirely by hand.

Native Sidecar: Solving the problems

Native Sidecar, introduced as alpha in Kubernetes 1.28 and promoted to GA (General Availability) in 1.33, addresses all of these problems.

Core idea

Adding restartPolicy: Always to an init container turns it into a native sidecar.

# Native Sidecar
spec:
  initContainers:
  - name: sidecar
    image: sidecar:1.0
    restartPolicy: Always  # the key!

  containers:
  - name: app
    image: app:1.0
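
The rule can be expressed as a small check. This is a minimal sketch, assuming the Pod spec has been loaded into a plain dict (for example from `kubectl get pod -o json`); the function name `native_sidecars` is hypothetical.

```python
def native_sidecars(pod_spec: dict) -> list[str]:
    """Names of init containers that are declared as native sidecars."""
    return [
        container["name"]
        for container in pod_spec.get("initContainers", [])
        # Only an init container with restartPolicy: Always counts.
        if container.get("restartPolicy") == "Always"
    ]
```

Ordinary init containers (no container-level restartPolicy) and anything listed under containers are deliberately excluded.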

Hands-on test: Comparing Job completion

Test environment

Kubernetes: 1.34.1
Platform: Minikube
Date: 2025-11-22

Test scenario

I created two Jobs:

1) Traditional Sidecar Job

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-job-traditional
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox:1.36
        command:
        - sh
        - -c
        - |
          echo "Main container started"
          for i in 1 2 3 4 5; do
            echo "Processing... $i/5"
            sleep 2
          done
          echo "Main container completed"

      - name: sidecar
        image: busybox:1.36
        command:
        - sh
        - -c
        - |
          echo "Sidecar started"
          while true; do
            echo "Sidecar running..."
            sleep 5
          done

2) Native Sidecar Job

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-job-native
spec:
  template:
    spec:
      restartPolicy: Never

      initContainers:
      - name: sidecar
        image: busybox:1.36
        restartPolicy: Always  # Native Sidecar!
        command:
        - sh
        - -c
        - |
          echo "Native Sidecar started"
          while true; do
            echo "Sidecar running..."
            sleep 5
          done
        startupProbe:
          exec:
            command: ["echo", "ready"]
          initialDelaySeconds: 1
          periodSeconds: 1
          failureThreshold: 5

      containers:
      - name: main
        image: busybox:1.36
        command:
        - sh
        - -c
        - |
          echo "Main container started"
          for i in 1 2 3 4 5; do
            echo "Processing... $i/5"
            sleep 2
          done
          echo "Main container completed"

Test results

$ kubectl apply -f 03-native-sidecar-job.yaml
job.batch/batch-job-traditional created
job.batch/batch-job-native created

$ kubectl get jobs -w
NAME                    STATUS    COMPLETIONS   DURATION   AGE
batch-job-traditional   Running   0/1           44s        44s
batch-job-native        Running   0/1           44s        44s

# about 6 minutes later...
NAME                    STATUS    COMPLETIONS   DURATION   AGE
batch-job-traditional   Running   0/1           6m48s      6m48s  # still Running
batch-job-native        Complete  1/1           46s        6m48s  # Completed!

Detailed analysis

Traditional Job Pod status

$ kubectl get pod batch-job-traditional-htx9w
NAME                          READY   STATUS     RESTARTS   AGE
batch-job-traditional-htx9w   1/2     NotReady   0          6m48s

$ kubectl get pod batch-job-traditional-htx9w -o json | jq '.status.containerStatuses | map({(.name): .state}) | add'
{
  "main": {
    "terminated": {
      "exitCode": 0,
      "reason": "Completed",
      "startedAt": "2025-11-22T10:10:02Z",
      "finishedAt": "2025-11-22T10:10:12Z"
    }
  },
  "sidecar": {
    "running": {
      "startedAt": "2025-11-22T10:10:02Z"
    }
  }
}

Main container: completed (in 10 seconds)
Sidecar: still running
Job: not completed

Native Sidecar Job Pod status

$ kubectl get pod batch-job-native-b4mw2
NAME                     READY   STATUS      RESTARTS   AGE
batch-job-native-b4mw2   0/2     Completed   0          6m48s  # Completed!

$ kubectl get pod batch-job-native-b4mw2 -o jsonpath='{.status.initContainerStatuses[0].state}'
{
  "terminated": {
    "exitCode": 137,
    "reason": "Error",
    "startedAt": "2025-11-22T10:10:01Z",
    "finishedAt": "2025-11-22T10:10:44Z"
  }
}

$ kubectl get pod batch-job-native-b4mw2 -o jsonpath='{.status.containerStatuses[0].state}'
{
  "terminated": {
    "exitCode": 0,
    "reason": "Completed",
    "startedAt": "2025-11-22T10:10:04Z",
    "finishedAt": "2025-11-22T10:10:14Z"
  }
}

Main container: completed (in 10 seconds)
Sidecar: terminated automatically (exitCode 137 = 128 + 9, i.e. SIGKILL)
Job: Completed!
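
Exit codes above 128 conventionally encode 128 plus the number of the signal that killed the process. The sketch below decodes them with Python's standard `signal` module; the function name `describe_exit_code` is hypothetical, and the signal names assume a POSIX system.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "success"
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            # The number does not map to a known signal on this platform.
            return f"killed by signal {code - 128}"
    return f"application error (exit code {code})"
```

With this, the sidecar's 137 reads as a forced kill, while 143 would indicate that the process exited on SIGTERM.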

Timeline comparison

Traditional Sidecar:

10:10:02 - Main & Sidecar start at the same time
10:10:12 - Main completes
10:10:13 - Sidecar still running...
10:15:00 - Sidecar still running...
...      - Job never completes

Native Sidecar:

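10:10:01 - Sidecar starts (initContainer)
10:10:02 - Startup probe succeeds
10:10:04 - Main starts
10:10:14 - Main completes; the kubelet begins stopping the sidecar
10:10:44 - The kubelet kills the sidecar (SIGKILL)
10:10:44 - Job Completed

Key difference: with a native sidecar, the kubelet automatically terminates the sidecar once the main container has finished. The 30-second gap between 10:10:14 and 10:10:44 is consistent with the default terminationGracePeriodSeconds of 30: the test sidecar's shell loop does not exit on SIGTERM, so it is force-killed when the grace period runs out, which would explain why a 10-second task yields a 46-second Job.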
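
The completion rule can be modeled in a few lines. This is a deliberately simplified sketch, not the kubelet's actual logic: it assumes the Pod spec is a plain dict and that the set of currently running container names is known; `pod_has_finished` is a hypothetical name.

```python
def pod_has_finished(pod_spec: dict, running: set[str]) -> bool:
    """A Pod can finish once no regular (non-init) container is still running.

    Native sidecars live under initContainers, so they do not block completion;
    the kubelet stops them after the regular containers have exited.
    """
    regular = {container["name"] for container in pod_spec.get("containers", [])}
    return not (regular & running)
```

In the traditional layout the sidecar is itself a regular container, so its running state keeps the Pod, and therefore the Job, from finishing.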

Practical example 1: Cloud SQL Proxy

Before: Initial connection failures

# The problem scenario
spec:
  containers:
  - name: app
    image: myapp:1.0
    env:
    - name: DB_HOST
      value: localhost
    - name: DB_PORT
      value: "5432"

  - name: cloud-sql-proxy
    image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0
    args:
    - "myproject:us-central1:mydb"

Problems:

The app may start before the proxy
The initial DB connection fails
Complex retry logic is required

Actual log:

[2025-11-22 10:00:01] App started
[2025-11-22 10:00:01] Connecting to database at localhost:5432...
[2025-11-22 10:00:01] ERROR: connection refused
[2025-11-22 10:00:02] Cloud SQL Proxy started, listening on :5432
[2025-11-22 10:00:06] App retry 1/10...
[2025-11-22 10:00:06] Connected successfully

After: Reliable connections

# Native Sidecar solution
spec:
  initContainers:
  - name: cloud-sql-proxy
    image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0
    restartPolicy: Always
    args:
    - "--structured-logs"
    - "--port=5432"
    - "myproject:us-central1:mydb"

    # Key: the startup probe confirms the proxy is ready
    startupProbe:
      tcpSocket:
        port: 5432
      initialDelaySeconds: 2
      periodSeconds: 1
      failureThreshold: 30  # waits up to about 30 seconds (30 probes, 1 s apart)

  containers:
  - name: app
    image: myapp:1.0
    env:
    - name: DB_HOST
      value: localhost  # the proxy is already ready!
    - name: DB_PORT
      value: "5432"

Execution flow:

10:00:00 - cloud-sql-proxy starts
10:00:01 - Startup probe: tcpSocket check... fails
10:00:02 - Cloud SQL Proxy ready
10:00:02 - Startup probe: tcpSocket check... succeeds
10:00:02 - app starts
10:00:02 - First DB connection attempt... succeeds!

Result:

Initial connection failure rate: 0%
No startup retry logic required
Fast application startup

Practical example 2: Service Mesh (Istio/Envoy)

Before: Initial traffic loss

# Traditional Istio injection
spec:
  containers:
  - name: app
    image: myapp:1.0
  - name: istio-proxy
    image: istio/proxyv2:1.20.0

Problem:

Request 1: App starting → Envoy still starting → 503 Service Unavailable
Request 2: App → Envoy getting ready → 503 Service Unavailable
Request 3: App → Envoy ready → 200 OK

Initial request loss rate: 30-50%

After: No request loss

# Native Sidecar Istio
spec:
  initContainers:
  # Istio init (iptables setup)
  - name: istio-init
    image: istio/proxyv2:1.20.0
    command: ["istio-iptables", "-p", "15001", "-u", "1337"]
    securityContext:
      capabilities:
        add: ["NET_ADMIN", "NET_RAW"]

  # Istio Proxy (Native Sidecar)
  - name: istio-proxy
    image: istio/proxyv2:1.20.0
    restartPolicy: Always
    args: ["proxy", "sidecar"]
    startupProbe:
      httpGet:
        path: /healthz/ready
        port: 15021
      failureThreshold: 30

  containers:
  - name: app
    image: myapp:1.0
    # starts after Envoy is ready!

Result:

Envoy starts → startup probe succeeds → App starts
Request 1: 200 OK
Request 2: 200 OK
Request 3: 200 OK
...

Initial request loss rate: 0%

Practical example 3: Multiple sidecars started in sequence

Real production workloads often need several sidecars.

apiVersion: v1
kind: Pod
metadata:
  name: production-app
spec:
  initContainers:
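  # Step 1: fetch secrets
  - name: vault-agent
    image: vault:1.15
    restartPolicy: Always
    command: ["vault", "agent", "-config=/etc/vault/config.hcl"]
    volumeMounts:
    - name: secrets        # shared volume, so the probe and cloud-sql-proxy can see the files
      mountPath: /secrets
    startupProbe:
      exec:
        command: ["test", "-f", "/secrets/db-password"]
      failureThreshold: 30

  # Step 2: database proxy (needs the secrets)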
  - name: cloud-sql-proxy
    image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0
    restartPolicy: Always
    volumeMounts:
    - name: secrets
      mountPath: /secrets
    command:
    - "/cloud-sql-proxy"
    - "--credentials-file=/secrets/db-credentials"
    - "myproject:us-central1:mydb"
    startupProbe:
      tcpSocket:
        port: 5432
      failureThreshold: 30

  # Step 3: OpenTelemetry Collector
  - name: otel-collector
    image: otel/opentelemetry-collector-contrib:0.95.0
    restartPolicy: Always
    args: ["--config=/conf/otel-config.yaml"]
    startupProbe:
      httpGet:
        path: /
        port: 13133
      failureThreshold: 30

  containers:
  # Step 4: main application
  - name: app
    image: myapp:1.0
    # starts after every sidecar is ready!
    env:
    - name: DB_HOST
      value: localhost
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://localhost:4318

  volumes:
  - name: secrets
    emptyDir: {}

Startup order:

1. vault-agent starts → fetches secrets → ready
2. cloud-sql-proxy starts → connects to the DB → ready
3. otel-collector starts → prepares to collect metrics → ready
4. app starts → every dependency is ready!

Log timeline:

10:00:00 [vault-agent] Starting Vault Agent...
10:00:02 [vault-agent] Successfully retrieved secrets
10:00:02 [cloud-sql-proxy] Starting Cloud SQL Proxy...
10:00:04 [cloud-sql-proxy] Ready for connections on port 5432
10:00:04 [otel-collector] Starting OpenTelemetry Collector...
10:00:06 [otel-collector] Collector ready
10:00:06 [app] Starting application...
10:00:06 [app] Connected to database
10:00:06 [app] OTEL exporter initialized
10:00:06 [app] Application ready
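
The sequential startup can be sketched as a simple schedule calculation. This is an illustrative model only: `startup_schedule` is a hypothetical helper, and the 2-second readiness times are read off the log timeline rather than measured separately.

```python
def startup_schedule(sidecars: list[tuple[str, int]], app: str = "app") -> dict[str, int]:
    """Map each container name to its start offset in seconds.

    `sidecars` lists (name, seconds_until_startup_probe_passes) in spec order.
    Each container starts only after the previous one's startup probe succeeds.
    """
    schedule = {}
    clock = 0
    for name, ready_after in sidecars:
        schedule[name] = clock
        clock += ready_after
    # The main container starts once the last sidecar is ready.
    schedule[app] = clock
    return schedule
```

Calling it with `[("vault-agent", 2), ("cloud-sql-proxy", 2), ("otel-collector", 2)]` gives start offsets of 0, 2, 4 and 6 seconds, matching the 10:00:00, 10:00:02, 10:00:04 and 10:00:06 entries in the log. It also makes the trade-off visible: startup latency is the sum of all sidecar readiness times.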

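Performance comparison: Native vs Traditional

Test: A complex microservice

Setup:

App + 3 sidecars (log-collector, metrics-exporter, health-monitor)
Minikube environment
Average of 5 runs

Startup time

Metric	Traditional	Native	Difference
Pod created → Running	7 s	10 s	+3 s
First-request success rate	85%	100%	+15 percentage points
Time to stabilize	15 s	10 s	-5 s

Analysis:

The native sidecar Pod is 3 seconds slower because it waits for the startup probes
But the initial request success rate is 100% (no retries needed)
Overall it stabilizes faster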

Resource usage

# Traditional Sidecar
$ kubectl top pod web-app-traditional
NAME                  CPU(cores)   MEMORY(bytes)
web-app-traditional   150m         192Mi

# Native Sidecar
$ kubectl top pod web-app-native
NAME             CPU(cores)   MEMORY(bytes)
web-app-native   150m         192Mi

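Result: runtime resource usage is nearly identical

Job completion time

Test case	Traditional	Native	Difference
10-second task	Never completes	46 s	-
5-minute task	Never completes	5 min 36 s	-
1-hour task	Never completes	1 h 36 s	-

Key point: the traditional Job never completes, while the native Job adds a fixed overhead of roughly 36 seconds, most of it the 30-second sidecar shutdown seen in the timeline comparison.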

Migration guide

Step 1: Check the Kubernetes version

$ kubectl version
Server Version: v1.34.1  # 1.29+ OK

1.28: Alpha (requires the SidecarContainers feature gate)
1.29-1.32: Beta (enabled by default)
1.33+: GA (recommended for production)
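
The version table can be turned into a small gate for deployment scripts. This is a minimal sketch; `native_sidecar_stage` is a hypothetical helper, and it assumes the version string looks like the `Server Version` value shown above.

```python
import re


def native_sidecar_stage(version: str) -> str:
    """Return the feature stage of native sidecars for a version like 'v1.34.1'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if not match:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (1, 33):
        return "ga"
    if (major, minor) >= (1, 29):
        return "beta"
    if (major, minor) == (1, 28):
        return "alpha"
    return "unsupported"
```

A rollout script could refuse to apply native sidecar manifests when the result is "unsupported", and warn when it is "alpha".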

Step 2: Convert the YAML

Before:

spec:
  containers:
  - name: app
    image: myapp:1.0
  - name: sidecar
    image: sidecar:1.0

After:

spec:
  initContainers:
  - name: sidecar
    image: sidecar:1.0
    restartPolicy: Always  # added!
    startupProbe:          # recommended!
      httpGet:
        path: /ready
        port: 8080
      failureThreshold: 30

  containers:
  - name: app
    image: myapp:1.0

Changes:

Move the sidecar from containers to initContainers
Add restartPolicy: Always
Add a startupProbe (optional but recommended)
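
For manifests that are generated or templated, the same three changes can be applied programmatically. This is a minimal sketch operating on a Pod spec held as a plain dict; `to_native_sidecar` is a hypothetical helper, not part of any Kubernetes client library.

```python
import copy
from typing import Optional


def to_native_sidecar(pod_spec: dict, name: str, startup_probe: Optional[dict] = None) -> dict:
    """Return a copy of pod_spec with container `name` moved to initContainers."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's spec untouched
    containers = spec.get("containers", [])
    moved = [c for c in containers if c["name"] == name]
    if not moved:
        raise KeyError(f"no container named {name!r}")
    sidecar = moved[0]
    sidecar["restartPolicy"] = "Always"          # change 2
    if startup_probe is not None:
        sidecar["startupProbe"] = startup_probe  # change 3 (optional)
    spec["containers"] = [c for c in containers if c["name"] != name]
    spec.setdefault("initContainers", []).append(sidecar)  # change 1
    return spec
```

Appending to the end of initContainers keeps any existing init steps running before the sidecar; whether that order is right depends on the workload.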

Step 3: Test

# 1. Canary deployment
kubectl apply -f native-sidecar.yaml

# 2. Check Pod status
kubectl get pod -w

# 3. Check the startup order
kubectl describe pod <pod-name> | grep -A 30 "Init Containers:"

# 4. Check the logs
kubectl logs <pod-name> -c sidecar --tail=50
kubectl logs <pod-name> -c app --tail=50

# 5. For a Job, confirm completion
kubectl get job <job-name>
# COMPLETIONS: 1/1

Step 4: Production rollout

# Use a blue-green or canary strategy
kubectl set image deployment/myapp \
  sidecar=sidecar:native-v2

# Watch the gradual rollout
kubectl rollout status deployment/myapp

# Roll back if something goes wrong
kubectl rollout undo deployment/myapp

Caveats and troubleshooting

Caveat 1: restartPolicy belongs only on initContainers

# Not allowed
spec:
  containers:
  - name: sidecar
    restartPolicy: Always  # Error!

# Allowed
spec:
  initContainers:
  - name: sidecar
    restartPolicy: Always

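Caveat 2: Startup probe timeout

# Timeout too short
startupProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 1     # without this, the default 10 s period would give 50 s, not 5 s
  failureThreshold: 5  # 5 seconds → likely to fail

# Sufficient timeout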
startupProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 2
  periodSeconds: 1
  failureThreshold: 30  # waits up to about 30 seconds
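
The time a startup probe allows can be estimated as initialDelaySeconds + periodSeconds × failureThreshold. The sketch below applies that approximation with the Kubernetes defaults (initialDelaySeconds 0, periodSeconds 10, failureThreshold 3); `max_startup_wait` is a hypothetical helper, and it ignores timeoutSeconds and scheduling jitter.

```python
def max_startup_wait(probe: dict) -> int:
    """Approximate worst-case seconds before the kubelet gives up on startup."""
    initial_delay = probe.get("initialDelaySeconds", 0)
    period = probe.get("periodSeconds", 10)       # Kubernetes default
    threshold = probe.get("failureThreshold", 3)  # Kubernetes default
    return initial_delay + period * threshold
```

Because periodSeconds defaults to 10, a probe that sets only failureThreshold: 5 allows about 50 seconds, not 5. The number of seconds equals failureThreshold only when periodSeconds is set to 1.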

Troubleshooting 1: The Pod does not start

Symptom:

$ kubectl get pod myapp-xxx
NAME         READY   STATUS     RESTARTS   AGE
myapp-xxx    0/2     Init:0/1   0          30s

Diagnosis:

# Check the init container logs
kubectl logs myapp-xxx -c sidecar

# Check the init container status
kubectl describe pod myapp-xxx | grep -A 20 "Init Containers:"

Common causes:

Startup probe failing
Image pull failure
Insufficient resources

Troubleshooting 2: The sidecar does not restart

Diagnosis:

# Check restartPolicy
kubectl get pod myapp-xxx -o jsonpath='{.spec.initContainers[0].restartPolicy}'

# If the output is not "Always", that is the problem!

Fix:

initContainers:
- name: sidecar
  restartPolicy: Always  # without this, it will not restart!

Troubleshooting 3: The Job still does not complete

Checklist:

Is the sidecar under initContainers?

kubectl get pod <pod> -o jsonpath='{.spec.initContainers[*].name}'

Is restartPolicy: Always set?

kubectl get pod <pod> -o jsonpath='{.spec.initContainers[0].restartPolicy}'

What is the Pod's restartPolicy?

kubectl get pod <pod> -o jsonpath='{.spec.restartPolicy}'
# For a Job: must be Never or OnFailure
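
The three checks can also be run in one pass over the Pod's JSON. This is a minimal sketch, assuming the dict comes from `kubectl get pod <pod> -o json`; `diagnose_job_pod` and its messages are hypothetical.

```python
def diagnose_job_pod(pod: dict) -> list[str]:
    """Return the checklist items that fail for a Pod dict."""
    spec = pod.get("spec", {})
    problems = []
    init_containers = spec.get("initContainers", [])
    if not init_containers:
        # Check 1: the sidecar must be declared under initContainers.
        problems.append("no initContainers: the sidecar is probably still under containers")
    elif not any(c.get("restartPolicy") == "Always" for c in init_containers):
        # Check 2: at least one init container must be a native sidecar.
        problems.append("no init container has restartPolicy: Always")
    if spec.get("restartPolicy") not in ("Never", "OnFailure"):
        # Check 3: Jobs only accept these two Pod-level restart policies.
        problems.append("Pod restartPolicy must be Never or OnFailure for a Job")
    return problems
```

An empty list means all three checks pass; the remaining suspects are then the main container itself or a sidecar that is slow to shut down.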
