[Triton Production Serving Platform (GitOps · Validation · Alerting) - Epilogue]

Epilogue: "GitOps-based Triton serving, locked into a 'deploy → validate → observe → alert' loop"

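📌 Series Roadmap

Order	Topic
1	🔗 Triton (CPU-only) GitOps integration: serving a single ONNX model + Prometheus/Grafana observability
2	🔗 Building an automated MLflow → Triton deployment pipeline (Airflow · validation chain · minimal rollback)
3	🔗 Alerting operations standard manual (Dev/Prod separation + Triton Serving Alerts)
4	🔗 Triton Production Serving Platform (GitOps · Validation · Alerting) - Verification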

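🎯 Retrospective Summary

Stage	Core goal	Key improvements
1	Triton serving foundation	GitOps separation · explicit load · observability
2	Deployment automation	MLflow as the single source · validation-gated commit/rollback
3	Alert operations	null default · namespace routing · latency alerts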

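🔄 Key statement:
"Register → Materialize → Load → Ready → Smoke Infer → Commit → Observe → Alert → (Fail) Rollback"
Model deployment is defined as a series of explicit state transitions (load/unload),
and a deployment is committed only when every transition has passed the validation chain.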
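
The gating rule above can be sketched in a few lines of Python. This is only an illustration of "commit if every step passes, otherwise roll back"; the function name `run_validation_chain`, the step names, and the return strings are assumptions made for this example and are not taken from the series' Airflow DAG.

```python
from typing import Callable, Sequence, Tuple

# A step is a (name, check) pair; check() returns True when the step passed.
Step = Tuple[str, Callable[[], bool]]


def run_validation_chain(
    steps: Sequence[Step],
    commit: Callable[[], None],
    rollback: Callable[[], None],
) -> str:
    """Run steps in order; commit only if all pass, otherwise roll back."""
    for name, check in steps:
        try:
            ok = check()
        except Exception:
            # A step that raises is treated the same as a failed step.
            ok = False
        if not ok:
            rollback()
            return f"rolled_back:{name}"
    commit()
    return "committed"
```

In a real pipeline each check would wrap a call against Triton (load, ready, smoke infer), and `commit` / `rollback` would update or restore the production state.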

🧩 Overall Serving + Observability Architecture

SERVING layer: Triton (inference only)
CONTROL layer: Airflow DAG (orchestrates deploy → validate → commit/rollback)
SOURCE OF TRUTH: MLflow Registry + current.json (production model state)
OBS layer: Prometheus → Grafana + Alertmanager → Slack
DEV/PROD separation principle: fully isolated by namespace/label/selector

🔁 One Commit → Triton Serving Loop

GitOps keeps Triton running at all times
Airflow materializes the MLflow model and then loads it explicitly
The production state is committed only after load → ready → smoke infer all pass
Metric-based observation + Alertmanager policy keep alerts under control
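
For the smoke infer step, a minimal sketch of the request body and the pass/fail decision might look like this. The body follows the KServe v2 inference protocol that Triton's `/v2/models/{model}/infer` endpoint accepts; the input name, shape, and the rule "non-empty `outputs` means pass" are assumptions chosen for illustration, since the series' actual model signature is not shown here.

```python
import json
from typing import Any, Dict, Sequence


def build_infer_payload(
    input_name: str,
    shape: Sequence[int],
    data: Sequence[float],
    datatype: str = "FP32",
) -> Dict[str, Any]:
    """Build a v2 infer request body with one flat (row-major) input tensor."""
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(data):
        raise ValueError(f"shape {list(shape)} needs {expected} values, got {len(data)}")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(data),
            }
        ]
    }


def smoke_passed(status: int, body: str) -> bool:
    """Assumed pass rule: HTTP 200 and a non-empty "outputs" list."""
    if status != 200:
        return False
    try:
        outputs = json.loads(body).get("outputs", [])
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False
    return bool(outputs)
```

A stricter smoke test could also compare output shapes or values against a known-good sample; that is a design choice left open here.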

🧠 Operating Principles

1) Environment separation

Separate triton-dev / triton-prod namespaces
The model-repo keeps separate dev and prod paths on NFS itself
- /model-repo/dev, /model-repo/prod
Prometheus scrapes only targets whose ServiceMonitor release label matches
- dev: release=monitoring-dev
- prod: release=monitoring-prod

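2) Serving control principle (explicit mode)

model-control-mode=explicit
Swapping a model in production is a risky window, so:
- Instead of automatic loading, load/unload only when needed
- A version becomes the production version only after validation passes, not merely because the deployment succeeded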
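
In explicit mode, load and unload are plain HTTP calls to Triton's model repository endpoints. The sketch below builds such a request with the standard library only; the helper names and the base URL passed in are hypothetical, and treating any non-200 response or network error as failure is an assumption for this example.

```python
import urllib.parse
import urllib.request


def build_control_request(base_url: str, model: str, action: str) -> urllib.request.Request:
    """Build POST /v2/repository/models/{model}/{load|unload}."""
    if action not in ("load", "unload"):
        raise ValueError(f"unsupported action: {action!r}")
    name = urllib.parse.quote(model, safe="")
    url = f"{base_url.rstrip('/')}/v2/repository/models/{name}/{action}"
    return urllib.request.Request(
        url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def control_model(base_url: str, model: str, action: str, timeout: float = 30.0) -> bool:
    """Send the request; report success only on HTTP 200."""
    req = build_control_request(base_url, model, action)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers HTTPError, URLError and socket timeouts.
        return False
```

A successful load call only means Triton accepted the model; the ready check and smoke infer still have to pass before the version is treated as production.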

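3) Single reference for production state (current.json)

current.json is the Single Source of Truth for the production model state
Failed versions are quarantined (.failed_*) rather than deleted
→ this preserves the ability to reproduce and root-cause failures at the operations level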
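
A rough sketch of both operations follows. The `current.json` schema (`model`, `version`), the directory layout `<repo>/<model>/<version>`, and the exact `.failed_<version>_<timestamp>` suffix are all assumptions made for this example; the series only states that failed versions are renamed to a `.failed_*` path instead of being removed.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def commit_current(repo: Path, model: str, version: str) -> Path:
    """Atomically replace current.json so readers never see a half-written file."""
    target = repo / "current.json"
    fd, tmp = tempfile.mkstemp(dir=repo, prefix=".current.", suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"model": model, "version": version}, f)
    os.replace(tmp, target)  # atomic rename within the same filesystem
    return target


def quarantine_version(repo: Path, model: str, version: str) -> Path:
    """Rename a failed version directory instead of deleting it."""
    src = repo / model / version
    dst = repo / model / f".failed_{version}_{int(time.time())}"
    src.rename(dst)
    return dst
```

Because current.json is only rewritten on commit, a rollback can simply reload whatever version it still names.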

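4) Observability / Alerting standard

Triton exposes /metrics itself (Prometheus format)
Alertmanager starts from a null default receiver
Slack delivery is allowed only through a namespace regex match
- dev/prod crossover is blocked at the final stage even when someone makes a mistake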
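
The routing policy can be modeled as a small Python function. This is a simulation of the idea, not Alertmanager configuration: the receiver names `slack-dev` and `slack-prod` are hypothetical, and `re.fullmatch` is used on the assumption that the namespace regex is matched against the whole label value.

```python
import re
from typing import Dict, List, Tuple

# (namespace regex, receiver) pairs; receiver names are hypothetical.
ROUTES: List[Tuple[str, str]] = [
    (r"triton-dev", "slack-dev"),
    (r"triton-prod", "slack-prod"),
]
DEFAULT_RECEIVER = "null"  # alerts that match no route are dropped


def route_alert(labels: Dict[str, str]) -> str:
    """Return the receiver for an alert based only on its namespace label."""
    namespace = labels.get("namespace", "")
    for pattern, receiver in ROUTES:
        if re.fullmatch(pattern, namespace):
            return receiver
    return DEFAULT_RECEIVER
```

With a null default, an alert that is missing its namespace label or carries an unexpected one goes nowhere instead of leaking into the wrong Slack channel.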

✅ Final Checklist (E2E)

triton-dev and triton-prod Pods are Running
GET /v2/health/ready → 200 OK
Model load succeeds (POST /v2/repository/models/{model}/load)
Smoke infer succeeds (POST /v2/models/{model}/infer)
nv_inference_* metrics increase in /metrics (after an actual infer)
Prometheus Targets show the triton scrape as up (for dev and prod separately)
Grafana RPS/Latency/Queue/CPU/Count panels work
Triton alerts are routed to the dev and prod channels without mixing
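
The metrics item in the checklist can be automated by scraping /metrics before and after a smoke infer and comparing the counters. The sketch below parses the Prometheus text format by hand, assuming each sample line ends with its value (no trailing timestamp); the helper names are made up for this example.

```python
from typing import Dict, List


def parse_counters(metrics_text: str, prefix: str = "nv_inference_") -> Dict[str, float]:
    """Sum sample values per metric name for metrics starting with prefix."""
    totals: Dict[str, float] = {}
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if not name.startswith(prefix):
            continue
        try:
            totals[name] = totals.get(name, 0.0) + float(value)
        except ValueError:
            continue  # ignore lines whose last field is not a number
    return totals


def increased(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Names of metrics whose summed value grew between two scrapes."""
    return sorted(name for name, value in after.items() if value > before.get(name, 0.0))
```

If `increased(...)` comes back empty after a smoke infer, the request most likely never reached the model, or the wrong Triton instance was scraped.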

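🏁 Retrospective

The conclusion of this series is not "we got Triton running" but that
the serving layer is now reproducible through GitOps, deployment is controlled as state transitions, and observation and alerting are fixed as operational policy.
Triton is the front line where models actually run, and that front line has been stabilized with
a validation chain (load/ready/infer) + minimal rollback (recovery based on current.json) + alert routing (dev/prod separation).
GPU support, large-scale traffic, and a gateway layer can now be added
"on top of a safe foundation."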

🙌 Project GitHub Repositories

GitHub code: [GitOps] mlops-platform
DAG code: [DAG] airflow-dags
