[GitOps-Based E2E ML Platform - Verifying Actual Operation]

What this post covers: verifying the Triton READY state, the Model Repository, the FastAPI health/models/reload API, and the Metrics endpoint against actual execution results (Proof). Prerequisite: GitOps-Based E2E ML Platform - Separating Model Registration from Serving Rollout. Serving Runtime Proof: is it really alive all the way through READY / Reload / Metrics? Introduction: so far this series has covered the ML platform architecture, GitOps environment design, Optional layer separation, the E2E pipeline, the Drift Gate / Promotion strategy, and the MLflow + Triton + FastAPI serving architecture. But one important question remains. ...

March 6, 2026 · 5 min

[Triton Production Serving Platform (GitOps · Verification · Alerting) - Verification]

🧭 Contents (section: key proof). A. GitOps separation: independent Triton dev/prod deployments with pinned state. B. Model control: separate NFS model-repo + explicit load control. C. Serving verification: load → ready → infer succeeds end to end. D. Observability: /metrics → Prometheus → Grafana integration. E. Deployment control: MLflow→Airflow verification chain + commit/rollback. F. Alert separation: dev/prod split based on an Alertmanager null default. G. Alert demonstration: Triton latency alert working end to end. A. Triton GitOps & Dev/Prod Separation 1️⃣ ArgoCD Applications (GitOps baseline) ✔ Proof of dev/prod application state ...

January 7, 2026 · 4 min

[Triton Production Serving Platform (GitOps · Verification · Alerting) - Epilogue]

Epilogue: "GitOps-based Triton serving, locked into a 'deploy → verify → observe → alert' loop" 📌 Full path summary (order: topic). 1 🔗 Triton (CPU-only) GitOps integration: serving one ONNX model + Prometheus/Grafana observability. 2 🔗 Building the MLflow → Triton automated deployment pipeline (Airflow · verification chain · minimal rollback). 3 🔗 Alerting operations standard manual (Dev/Prod separation + Triton Serving Alerts). 4 🔗 Triton Production Serving Platform (GitOps · Verification · Alerting) - Verification. 🎯 Overall retrospective (stage: core goal: key improvements). 1: Triton serving foundation: GitOps separation · explicit load · observability. 2: Deployment automation: MLflow as the single source · verification-based commit/rollback. 3: Alert operations: null default · namespace routing · latency alerts. 🔄 Key sentence: ...

January 5, 2026 · 3 min

[Triton Production Serving Platform (GitOps · Verification · Alerting) - Alerting Operations Standard Manual]

What this post covers: a GitOps-based alerting operations standard for the Triton serving environment that fully separates dev/prod alerts and fixes PrometheusRule/Alertmanager/Grafana into a single decision flow. Prerequisite: Triton Serving Platform - MLflow → Triton Automated Deployment Pipeline. The problem this stage solves: observability is not a dashboard; it is an operating policy that prevents incidents. This document describes GitOps-based alerting operations designed to fully separate dev/prod alerts, structurally block cross-delivery caused by labeling mistakes, and detect Triton serving quality from the perspective of model execution. ...

January 2, 2026 · 7 min

[Triton Production Serving Platform (GitOps · Verification · Alerting) - Building the MLflow → Triton Automated Deployment Pipeline]

What this post covers: building a pipeline that treats the MLflow Registry as the single source, automatically deploys models to Triton from an Airflow DAG, confirms a model for production only after it passes the verification chain (load/ready/infer), and rolls back automatically on failure. Prerequisite: Triton Serving Platform - Building Triton. The problem this stage solves: in production, deploying a model is less "uploading a new model" than a state transition that safely updates the current operating state. In this stage, with the MLflow Registry as the single source, models are deployed automatically to Triton Inference Server; the production model is committed only when loading, the health check, and a real inference check all pass; and if any intermediate step fails, a minimal rollback structure automatically restores the previous operating state. ...

December 29, 2025 · 4 min
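
The state-transition idea in the MLflow → Triton pipeline entry above (commit only after load, ready, and infer all pass, otherwise restore the previous state) can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the pipeline's actual code: `deploy_with_rollback`, `DeployResult`, and the step callables are hypothetical names, and real steps would call Triton and MLflow instead of the stubs used here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A step is a (name, check) pair; check(version) returns True on success.
# Both the names and the step signature are assumptions for illustration.
Step = Tuple[str, Callable[[str], bool]]


@dataclass
class DeployResult:
    committed: bool
    active_version: str
    failed_step: Optional[str] = None


def deploy_with_rollback(candidate: str, current: str, steps: List[Step],
                         rollback: Callable[[str], None]) -> DeployResult:
    """Run each verification step in order; commit only if all of them pass."""
    for name, check in steps:
        try:
            ok = check(candidate)
        except Exception:
            # A step that raises is treated the same as a failed check.
            ok = False
        if not ok:
            # Restore the previously active version and report where it failed.
            rollback(current)
            return DeployResult(False, current, name)
    return DeployResult(True, candidate)


# Stub steps standing in for real load / ready / infer checks.
restored: List[str] = []
steps: List[Step] = [
    ("load", lambda v: True),
    ("ready", lambda v: True),
    ("infer", lambda v: False),  # simulate a failed inference check
]
result = deploy_with_rollback("2", "1", steps, restored.append)
print(result)
```

Passing the steps and the rollback action in as callables keeps the commit/rollback decision separate from how each check talks to the serving layer, so the same control flow can be exercised with stubs.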

[Triton Production Serving Platform (GitOps · Verification · Alerting) - Building Triton]

What this post covers: deploying Triton Inference Server in a CPU-only GitOps setup and building the skeleton of a serving platform, from load/infer verification of a single ONNX model through Prometheus/Grafana observability. Prerequisite: Observability Stage 8: Advancing the Data Pipeline. The problem this stage solves: in production, the serving layer is the front line that directly takes traffic and SLA pressure. However good the model is, unstable serving collapses as soon as it goes into operation. For this first Triton build, GPU and pipeline integration were deliberately left out; the work focused on bringing Triton itself up reliably through GitOps and building the serving platform skeleton from model load → infer → metrics observation. ...

December 26, 2025 · 4 min

[MLOps Platform Observability & Data Pipeline - Verification]

🧭 Contents (section: proof point). A. Observability: metrics, logs, and alerts fully separated between dev and prod. B. FastAPI & Platform Observability: FastAPI + Platform dashboards working correctly. C. Data Pipeline: Raw→Feature ETL runs automatically and succeeds. D. Data Pipeline Advanced: a production-grade structure with versions, schema, metadata, and observability. A. Observability layer: this section proves that the observability layer (monitoring + log collection + alerting) is fully separated between dev and prod and automated on a GitOps basis. 1️⃣ ArgoCD Applications (GitOps-based setup) ✔ 1-1. Check the status of all Applications from the CLI: kubectl -n argocd get applications ...

November 30, 2025 · 11 min

[MLOps Platform Observability & Data Pipeline - Stage 8: Advancing the Data Pipeline]

What this post covers: upgrading the existing v1 data pipeline into a v2 pipeline closer to an ML Feature Store by layering a versioned directory structure, schema.json, metadata.json, and a KST timeline on top of it. Prerequisite: Observability Stage 7: Building the Data Pipeline. The problem this stage solves: the previous stage produced an end-to-end data pipeline (v1) that runs from S3 Raw to a Feature CSV. From the standpoint of a production ML platform, however, each run must leave a single record of which data was stored, at what quality, and under which version, covering time, version, schema, and metadata. This stage keeps the v1 pipeline and layers a versioned directory structure + schema.json + metadata.json + a KST timeline on top of it, moving it closer to a real ML Feature Store as a v2 pipeline. ...

November 15, 2025 · 6 min
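
As a rough illustration of the v2 layout described in the Stage 8 entry above, the sketch below writes a feature CSV into a versioned directory alongside schema.json and metadata.json, with a KST timestamp. Only the two file names and the use of KST come from the excerpt; the function name, the `features.csv` file name, the directory layout, and every JSON field are assumptions made for illustration.

```python
import csv
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import List, Optional, Sequence

# KST is UTC+9 with no daylight saving time.
KST = timezone(timedelta(hours=9))


def write_versioned_features(base_dir: str, version: str,
                             columns: List[str],
                             rows: Sequence[Sequence[object]],
                             now: Optional[datetime] = None) -> Path:
    """Write features.csv, schema.json and metadata.json under base_dir/version."""
    created = (now or datetime.now(KST)).astimezone(KST)
    out_dir = Path(base_dir) / version
    out_dir.mkdir(parents=True, exist_ok=True)

    # Feature data (hypothetical file name).
    with open(out_dir / "features.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)

    # Schema: here only column names, in order (assumed structure).
    schema = {"columns": [{"name": c} for c in columns]}
    (out_dir / "schema.json").write_text(json.dumps(schema, indent=2), encoding="utf-8")

    # Metadata: what was stored, how much, and when (assumed fields).
    metadata = {
        "version": version,
        "row_count": len(rows),
        "created_at_kst": created.isoformat(),
    }
    (out_dir / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return out_dir
```

Calling `write_versioned_features("features", "v2", ["user_id", "score"], [[1, 0.5]])` would create `features/v2/` containing the three files, so each run leaves its data, schema, and metadata together under one version.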