[GitOps-based E2E ML Platform - Load Testing] Triton + FastAPI Serving Performance Verified with k6: 136 RPS, p95 553ms

What this post covers: a measured validation of Triton + FastAPI serving performance using a k6 load test: 136 RPS, p95 553ms, 0% error rate (CPU-only 3-node cluster). Prerequisite: GitOps-based E2E ML Platform - Operations Documentation. Load Test: validating serving performance, or how much the system can actually withstand. Introduction: so far in this series we have confirmed the following: Triton READY, Model Repository Loaded, FastAPI Health OK, Reload API Success, Metrics Exported. One important question remains, however: does this system hold up when real traffic arrives? ...

March 18, 2026 · 5 min
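The three headline figures in the load-test entry above (RPS, p95 latency, error rate) can be derived from raw per-request latencies. The sketch below is illustrative only: `summarize_run` and its parameters are hypothetical names, and it uses the nearest-rank percentile definition, which may differ slightly from how k6 computes p95.

```python
def summarize_run(latencies_ms, duration_s, failed=0):
    """Summarize one load-test run as (rps, p95_ms, error_rate).

    latencies_ms: one latency sample per completed request (hypothetical input).
    duration_s:   wall-clock length of the test run in seconds.
    failed:       number of requests counted as errors.
    """
    n = len(latencies_ms)
    if n == 0 or duration_s <= 0:
        raise ValueError("need at least one request and a positive duration")

    ordered = sorted(latencies_ms)
    # Nearest-rank p95: the smallest sample such that at least 95% of all
    # samples are at or below it. Integer arithmetic avoids float rounding.
    rank = (95 * n + 99) // 100  # equals ceil(0.95 * n)
    p95_ms = ordered[rank - 1]

    rps = n / duration_s          # throughput: completed requests per second
    error_rate = failed / n       # fraction of requests that failed
    return rps, p95_ms, error_rate
```

Reporting p95 alongside throughput matters because an average can hide the slow tail that users actually notice under load.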

[Triton Production-Grade Serving Platform - dynamic_batching + instance_group: CPU Serving Performance Optimization]

What this post covers: a performance-tuning walkthrough that uses Triton's dynamic_batching and instance_group settings to raise throughput and stabilize latency in a CPU-only environment. Prerequisite: Triton Serving Platform - Alerting Operations Standard Manual. The problem this stage solves: is the job done once Triton is up and the model reports READY? With only the default settings, requests are processed one at a time, in order. When concurrent requests pile up, the queue grows and latency climbs sharply. This post explains how dynamic_batching and instance_group were configured to raise throughput and stabilize latency on CPU. 🎯 Key takeaways. Limits of the defaults: single instance + no batching -> requests are processed serially. instance_group: more CPU instances -> parallel inference. dynamic_batching: requests are grouped automatically -> batch-level inference improves throughput. When it takes effect: in explicit mode, config.pbtxt is applied at model load. Measured result: 136 RPS, p95 553ms, 0% error rate (100 VU, CPU-only 3 nodes). 1️⃣ Where the default settings fall short: when Triton is first started, its default behavior is as follows. ...

March 15, 2026 · 4 min
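To make the batching idea in the entry above concrete, here is a toy model of how a dynamic batcher might group queued requests. It is a simplified sketch, not Triton's actual scheduler: `form_batches` is a hypothetical helper, and its two parameters only mirror the intent of a maximum batch size and a maximum queue delay.

```python
def form_batches(arrivals_us, max_batch_size, max_queue_delay_us):
    """Group request arrival times (microseconds) into batches.

    A batch is closed when it is full, or when the next request arrives
    more than max_queue_delay_us after the first request in the batch.
    This is a toy model for intuition only.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")

    batches = []
    current = []
    for t in sorted(arrivals_us):
        batch_full = len(current) >= max_batch_size
        waited_too_long = bool(current) and t - current[0] > max_queue_delay_us
        if current and (batch_full or waited_too_long):
            batches.append(current)  # dispatch the batch collected so far
            current = []
        current.append(t)
    if current:
        batches.append(current)      # dispatch whatever is left in the queue
    return batches
```

With a batch size of 1 the model degenerates to the serial case described above: every request becomes its own inference call. Larger batches trade a bounded queueing delay for fewer, larger calls.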

[Triton Production-Grade Serving Platform (GitOps · Validation · Alerting) - Validation]

🧭 Contents (category: what is proven). A. GitOps separation: independent Triton dev/prod deployments with pinned state. B. Model control: separate NFS model-repo + controlled explicit load. C. Serving validation: load → ready → infer succeeds end to end. D. Observability: /metrics → Prometheus → Grafana integration. E. Deployment control: MLflow → Airflow validation chain + commit/rollback. F. Alert separation: dev/prod separation based on an Alertmanager null default. G. Alert demonstration: the Triton latency alert works end to end. A. Triton GitOps & Dev/Prod separation. 1️⃣ ArgoCD Applications (GitOps baseline) ✔ Proof of dev/prod application state ...

January 7, 2026 · 4 min

[Triton Production-Grade Serving Platform (GitOps · Validation · Alerting) - Epilogue]

Epilogue: "GitOps-based Triton serving, locked into a 'deploy → validate → observe → alert' loop." 📌 Full path summary (order: topic). 1 🔗 Triton (CPU-only) GitOps integration: serving one ONNX model + Prometheus/Grafana observability. 2 🔗 Building the MLflow → Triton automated deployment pipeline (Airflow · validation chain · minimal rollback). 3 🔗 Alerting Operations Standard Manual (Dev/Prod separation + Triton Serving Alerts). 4 🔗 Triton Production-Grade Serving Platform (GitOps · Validation · Alerting) - Validation. 🎯 Overall retrospective (stage: core goal: key improvements). 1 Triton serving foundation: GitOps separation · explicit load · observability. 2 Deployment automation: MLflow as the single source · validation-based commit/rollback. 3 Alert operations: null default · namespace routing · latency alerts. 🔄 Key sentence: ...

January 5, 2026 · 3 min

[Triton Production-Grade Serving Platform (GitOps · Validation · Alerting) - Alerting Operations Standard Manual]

What this post covers: a GitOps-based standard design for alerting operations that fully separates dev and prod alerts in a Triton serving environment and fixes PrometheusRule/Alertmanager/Grafana into a single decision flow. Prerequisite: Triton Serving Platform - MLflow → Triton Automated Deployment Pipeline. The problem this stage solves: observability is not a dashboard; it is an operating policy that prevents incidents. This document describes GitOps-based alerting operations designed to fully separate dev and prod alerts, structurally block cross-delivery caused by labeling mistakes, and detect Triton serving quality from the perspective of model execution. ...

January 2, 2026 · 7 min

[Triton Production-Grade Serving Platform (GitOps · Validation · Alerting) - Building the MLflow → Triton Automated Deployment Pipeline]

What this post covers: building a pipeline that treats the MLflow Registry as the single source of truth, deploys models to Triton automatically from an Airflow DAG, confirms a model for production only after it passes the validation chain (load/ready/infer), and rolls back automatically on failure. Prerequisite: Triton Serving Platform - Building Triton. The problem this stage solves: in practice, model deployment is less "the task of uploading a new model" and more "a state transition that safely updates the current production state". In this stage, models are deployed automatically to Triton Inference Server with the MLflow Registry as the single source; the production model is confirmed (commit) only when loading, the health check, and a real inference check all pass; and if any intermediate step fails, a minimal rollback structure restores the previous production state automatically. ...

December 29, 2025 · 4 min
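The "commit only if every check passes, otherwise restore the previous state" idea in the entry above can be sketched as a small state transition. This is a minimal illustration, not the pipeline from the post: `deploy_with_rollback`, `activate`, and the check callables are hypothetical stand-ins for the real load, ready, and infer steps.

```python
from typing import Callable, List, Tuple

# A named validation step that returns True when the candidate passes.
Check = Tuple[str, Callable[[str], bool]]


def deploy_with_rollback(
    candidate: str,
    previous: str,
    checks: List[Check],
    activate: Callable[[str], None],
) -> Tuple[str, str]:
    """Activate `candidate`, run the checks in order, and commit or roll back.

    Returns (active_version, outcome). `activate` is a hypothetical hook
    that makes a given model version the one being served.
    """
    activate(candidate)
    for name, check in checks:
        try:
            passed = check(candidate)
        except Exception:
            passed = False  # a check that raises is treated as a failure
        if not passed:
            activate(previous)  # restore the last known-good version
            return previous, f"rollback: {name} failed"
    return candidate, "commit"
```

Ordering the checks from cheapest to most realistic (load, then ready, then infer) means a broken model is rejected as early as possible, and the previous version is restored whichever step fails.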

[Triton Production-Grade Serving Platform (GitOps · Validation · Alerting) - Building Triton]

What this post covers: deploying Triton Inference Server in a CPU-only GitOps setup and building the skeleton of the serving platform, from load/infer validation of a single ONNX model through Prometheus/Grafana observability. Prerequisite: Observability Stage 8: Advancing the Data Pipeline. The problem this stage solves: in practice, the serving layer is the front line that takes traffic and SLA pressure directly. However good the model is, an unstable serving layer collapses as soon as it goes into operation. For this first Triton build, GPU and pipeline integration were deliberately left out; the work focused on bringing up Triton itself reliably through GitOps and building the serving platform skeleton from model load → infer → metrics observability. ...

December 26, 2025 · 4 min