[GitOps-Based E2E ML Platform - Operational Control Architecture]

What this post covers: the Prometheus-based observability architecture, the Auto Rollback policy (error rate / latency / service health), the Manual Rollback DAG, and the policy applied when observation itself fails. Prerequisite: GitOps-Based E2E ML Platform - Verifying Actual Behavior. Observability / Auto Rollback / Manual Rollback: operational control, which can matter more than deployment. Introduction: When building an ML system, most of the attention goes to three stages: Train, Register, Deploy. In a real production environment, however, the more important problems arise after deployment. Consider, for example, the following situation. ...

March 6, 2026 · 5 min
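
For the operational-control post listed above, the three Auto Rollback signals (error rate, latency, service health) and the separate policy for failed observation can be sketched as a single decision function. This is a minimal sketch, not the post's implementation: the threshold values, the `Snapshot` fields, and the choice to hold rather than roll back when a metric cannot be observed are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    KEEP = "keep"          # all signals are within limits
    ROLLBACK = "rollback"  # at least one signal breached its limit
    HOLD = "hold"          # a signal could not be observed; take no automatic action


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits, chosen only for illustration.
    max_error_rate: float = 0.05     # fraction of failed requests
    max_p95_latency_s: float = 1.0   # seconds


@dataclass(frozen=True)
class Snapshot:
    # None means the value could not be observed (for example, a failed query).
    error_rate: Optional[float]
    p95_latency_s: Optional[float]
    service_up: Optional[bool]


def decide(snapshot: Snapshot, limits: Thresholds = Thresholds()) -> Action:
    """Map one metrics snapshot to a rollback decision."""
    values = (snapshot.error_rate, snapshot.p95_latency_s, snapshot.service_up)
    # Assumed observation-failure policy: never roll back on missing data.
    if any(value is None for value in values):
        return Action.HOLD
    if not snapshot.service_up:
        return Action.ROLLBACK
    if snapshot.error_rate > limits.max_error_rate:
        return Action.ROLLBACK
    if snapshot.p95_latency_s > limits.max_p95_latency_s:
        return Action.ROLLBACK
    return Action.KEEP
```

Keeping the missing-data case as its own outcome means a monitoring outage is not silently treated as either "healthy" or "broken"; whether the right response is to hold, alert, or roll back is a policy choice the sketch leaves open.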

[MLOps Platform Observability & Data Pipeline - Stage 2: Alertmanager Slack & Troubleshooting]

What this post covers: managing the Alertmanager configuration as a SealedSecret and completing an observability pipeline in which alerts flow to the correct Slack channel for dev and prod respectively. Prerequisite: Observability Stage 1: kube-prometheus-stack + GitOps setup. The problem this stage addresses: detecting a problem the moment it occurs often matters more than training and deploying the model well. The first task is to complete an observability pipeline in which alerts from dev and prod each reach the correct Slack channel without breakage. Only with that foundation in place can operational MLOps events such as FastAPI latency, hot-swap failures, and DAG errors be detected in real time. ...

October 18, 2025 · 5 min
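
The dev/prod routing described in the Alertmanager post above comes down to matching an alert's labels against an ordered list of routes and falling back to a default receiver. The sketch below models that decision in plain Python. It is not Alertmanager's code or configuration, and the `env` label and the receiver names `slack-dev` and `slack-prod` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical receiver names; a real setup would define them in the
# Alertmanager configuration, each pointing at its own Slack channel.
DEFAULT_RECEIVER = "slack-dev"

# Ordered (matchers, receiver) pairs; the first full match wins.
ROUTES: List[Tuple[Dict[str, str], str]] = [
    ({"env": "prod"}, "slack-prod"),
    ({"env": "dev"}, "slack-dev"),
]


def pick_receiver(labels: Dict[str, str]) -> str:
    """Return the receiver for an alert, based on its labels."""
    for matchers, receiver in ROUTES:
        if all(labels.get(key) == value for key, value in matchers.items()):
            return receiver
    # No route matched: fall back to the default receiver.
    return DEFAULT_RECEIVER


if __name__ == "__main__":
    print(pick_receiver({"alertname": "HighLatency", "env": "prod"}))
```

An alert without an `env` label lands on the default receiver, which is why it is worth checking that every alerting rule attaches that label before relying on per-environment channels.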

[Advanced MLOps Operations - Verification (Proof of Automation)]

🧠 Proof of Automation: the operational verification loop. This post demonstrates visually that a single commit drives the whole chain automatically: CI → CD → training/registration → READY → hot swap → experiment → monitoring.

🧭 Contents (number, section):
0. Preparation (basic CI/CD · GitOps · Secrets setup)
1. CI run (GitHub Actions)
2. ArgoCD automatic sync
3. Airflow training → registration → READY (FAIL/SUCCESS + Slack)
4. MLflow model registration · aliases · artifacts
5. FastAPI hot swap (/reload)
6. Manual rollback (guardrail for operational recovery)
7. Log tiers (Airflow=S3 / FastAPI=NFS)
8. ArgoCD↔Slack operations monitoring
9. Security automation (AWS Rotation + SealedSecrets Re-seal)
10. Traffic experiments (A/B · Canary · Blue-Green)
11. One-Commit Flow: end-to-end chain verification

0) 🧰 Preparation
(GitOps) In charts/fastapi/values/{dev,prod}.yaml, edit the ALIAS_SELECTION_MODE/DEFAULT_ALIAS/CANARY_PERCENT values for each scenario → git commit → git push
(Test script) Save /ops/ab_test.sh (provided below)

1) 🧪 CI entry (GitHub Actions) ...

October 15, 2025 · 14 min
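
The Proof of Automation excerpt above names three chart values (ALIAS_SELECTION_MODE, DEFAULT_ALIAS, CANARY_PERCENT) without showing how they interact. One plausible reading is sketched below. The mode string `"canary"`, the separate canary alias argument, and the hash-based bucketing are assumptions for illustration, not the behavior of the post's FastAPI service.

```python
import hashlib


def bucket(key: str) -> int:
    """Map a request key to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_alias(
    key: str,
    mode: str,             # assumed to mirror ALIAS_SELECTION_MODE
    default_alias: str,    # assumed to mirror DEFAULT_ALIAS
    canary_alias: str,     # hypothetical: the alias that receives canary traffic
    canary_percent: int,   # assumed to mirror CANARY_PERCENT (0..100)
) -> str:
    """Pick the model alias that should serve the request identified by key."""
    # In the assumed "canary" mode, keys whose bucket falls below the
    # percentage go to the canary alias; everything else uses the default.
    if mode == "canary" and bucket(key) < canary_percent:
        return canary_alias
    return default_alias


if __name__ == "__main__":
    for user in ("user-1", "user-2", "user-3"):
        print(user, choose_alias(user, "canary", "A", "B", 10))
```

Hashing the key rather than drawing a random number keeps each caller on the same variant across requests, which makes A/B and canary comparisons easier to interpret.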

[Advanced MLOps Operations - Epilogue]

Epilogue: "An autonomous MLOps platform that goes all the way on a single commit"

📌 Full path summary (order, topic):
0. FastAPI A/B · Canary · Blue-Green serving base
1. Hot-swap enhancement (/reload security · DAG automation)
2. Slack Alert integration (shared by FastAPI · Airflow)
3. Model rollback automation (recovery for registration failures)
4. FastAPI log stabilization (NFS + PV/PVC + Loguru)
5. Airflow stabilization & FastAPI HTTPS security
6. GitOps enhancement (Argo CD · MetalLB · ApplicationSet)
7. Argo CD Notifications automation (Slack integration)
8. CI/CD operations automation (GitHub Actions · Helm Lint)
9. Secret management & key rotation automation (AWS · SealedSecret)
10. Verification (Proof of Automation)

🎯 Overall retrospective summary, stages 0-9 (stage, core goal, main improvement):
0. FastAPI renewal: shared A/B · Canary · Blue-Green routing via /predict · /variant · /reload
1. Hot-swap security/automation: token-authenticated /reload + Ingress whitelist/TLS
2. Slack Alert integration: shared FastAPI · Airflow notification function + Slack integration for failure callbacks
3. Rollback automation: roll back to the previous version on registration failure + automatic /reload after READY
4. Log stabilization: log tiers separated and standardized (FastAPI=NFS, Airflow=S3)
5. Airflow stabilization · HTTPS: Sensor/dependency cleanup + cert-manager-based TLS
6. GitOps transition: MetalLB + SealedSecret + ApplicationSet deployment structure
7. GitOps monitoring: ArgoCD → Slack Sync/Health status notifications
8. CI/CD enhancement: Helm Lint + kubeconform + yamllint PR/merge pipeline
9. SecretOps standardization: AWS Key Rotation + SealedSecret Re-seal automation

🔄 Core ...

October 3, 2025 · 3 min

[Advanced MLOps Operations - Stage 9: Secret Management & Key Rotation Automation (AWS · SealedSecret)]

What this post covers: standardizing AWS IAM key rotation and SealedSecret re-encryption with scripts, and building a semi-automated SecretOps process that carries through to automatic GitOps rollout. Prerequisite: Advanced MLOps Operations, Stage 8: CI/CD Operations Automation (GitHub Actions + Helm Lint pipeline). The problem this stage addresses: in MLOps, "secrets" break down before models do. Values that must be replaced periodically, such as AWS keys, need automation, whereas values whose replacement has a large impact, such as JWT secrets and Slack webhooks, must be decided by an operator directly. This stage establishes a SecretOps process that separates what to automate from what a person must confirm. ...

September 25, 2025 · 3 min
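
The split drawn in the Stage 9 excerpt above, between secrets that are rotated automatically and secrets an operator must decide on, can be written down as a small policy table. The sketch below is illustrative only: the secret names, the maximum ages, and the three outcome strings are hypothetical and do not come from the post's scripts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Mode(Enum):
    AUTO = "auto"      # safe to rotate from a script
    MANUAL = "manual"  # high impact; an operator decides when to rotate


@dataclass(frozen=True)
class SecretPolicy:
    name: str
    mode: Mode
    max_age: timedelta


# Hypothetical names and ages, for illustration only.
POLICIES = [
    SecretPolicy("aws-access-key", Mode.AUTO, timedelta(days=90)),
    SecretPolicy("jwt-signing-key", Mode.MANUAL, timedelta(days=180)),
    SecretPolicy("slack-webhook", Mode.MANUAL, timedelta(days=180)),
]


def plan(policy: SecretPolicy, last_rotated: datetime, now: datetime) -> str:
    """Return 'ok', 'rotate', or 'ask-operator' for one secret."""
    if now - last_rotated < policy.max_age:
        return "ok"
    return "rotate" if policy.mode is Mode.AUTO else "ask-operator"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for policy in POLICIES:
        print(policy.name, plan(policy, now - timedelta(days=120), now))
```

Encoding the mode next to each secret keeps the "automate versus confirm" decision reviewable in one place instead of being implied by which scripts happen to exist.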