MLOps / ML Platform Engineer

I am a platform engineer who designs and operates production-grade ML platforms.

Building on 2 years and 3 months of DevOps operations experience at Gabia,

I designed, built, and verified a production-grade, GitOps-based E2E ML Platform with separated dev/prod environments.

It is more than a system that simply deploys models:

Data → Feature → Training → Registry → Deploy → Inference → Monitoring

the entire flow is automated,

and the ML Platform architecture also covers what a production environment needs: rollback strategy, scalability, observability, and incident response.


🔎 Platform Snapshot (Summary)

  • GitOps: fully separated dev/prod environments based on ArgoCD
  • Orchestration: E2E automation based on Airflow DAGs
  • Model Registry: MLflow Registry + alias-based Hot Swap / Rollback
  • Serving: Triton Inference Server + FastAPI reload structure
  • Observability: operational monitoring based on Prometheus / Grafana / Alertmanager
  • Reproducibility: Feature Store-lite (versioning + pinned latest), Feast validation
  • Deployment Strategy: Promotion / Shadow branching + Mirror/Split traffic routing + zero-downtime Triton RollingUpdate
  • Security & Resilience: NetworkPolicy (4 services) + ResourceQuota + Fail-open escalation + Contract Testing
  • Architecture: platform structure separated into Core / Baseline / Optional
  • Proof System: documented verification of GitOps boundaries / runtime / optional attach-detach / observability
  • Load Test: 136 RPS, p95 553 ms, 0% error rate (k6, 100 VUs, 3-node cluster)
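
As an illustration of the Mirror/Split traffic routing named in the list above, the following minimal Python sketch shows one way such a routing decision could be made. It is not taken from the platform's code: the function names `bucket` and `route`, the `split_percent` parameter, the `"production"` / `"candidate"` labels, and the hash-based bucketing are all assumptions made for this example.

```python
import hashlib


def bucket(request_id: str) -> int:
    """Map a request ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(request_id: str, mode: str, split_percent: int = 10) -> dict:
    """Decide which model answers and which one, if any, gets a shadow copy.

    mode="mirror": production answers; the candidate receives a copy of the
                   request and its result is not returned to the caller.
    mode="split":  roughly split_percent of requests are answered by the
                   candidate, the rest by production.
    All names here are hypothetical.
    """
    if mode == "mirror":
        return {"respond": "production", "shadow": "candidate"}
    if mode == "split":
        # The same request ID always lands in the same bucket, so routing is sticky.
        target = "candidate" if bucket(request_id) < split_percent else "production"
        return {"respond": target, "shadow": None}
    raise ValueError(f"unknown routing mode: {mode!r}")
```

Hashing the request ID rather than drawing a random number keeps the decision reproducible, which makes a split easier to debug when comparing candidate and production responses.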

👉 All configurations are published on GitHub and the blog in Proof form.


🚀 Featured Project

Production-grade E2E ML Platform

An integrated ML Platform project that combines GitOps-based dev/prod environment separation

with Airflow, MLflow, Triton, FastAPI, and observability.

This project goes beyond a simple implementation. The following were each verified against actual execution results and captured evidence:

  • GitOps boundary structure
  • Optional attach / detach structure
  • Serving runtime state
  • Observability system
  • Operations documentation and Proof system

🔗 View the project

https://keonhoban.github.io/mlops-journey/projects/mlops_pipeline_e2e/01/


🔬 Technical Design Exploration and Validation Notes

The posts below record the technical design exploration and validation

carried out on the way to the current ML Platform.

Observability Design

Based on Prometheus / Grafana / Alertmanager

Design and operating structure of the ML Platform observability stack

https://keonhoban.github.io/mlops-journey/projects/mlops_pipeline_observability/01/

Triton Serving Architecture

Based on Triton Inference Server

Design of model serving and the alias-based hot swap structure

https://keonhoban.github.io/mlops-journey/projects/triton/01/
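
The alias-based hot swap idea can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the platform's implementation: `AliasRegistry`, `ModelServer`, `reload_if_changed`, and the alias name `"production"` are hypothetical. In an MLflow-backed setup, the alias operations would correspond to `MlflowClient.set_registered_model_alias` and `MlflowClient.get_model_version_by_alias`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AliasRegistry:
    """In-memory stand-in for a model registry that supports aliases."""

    aliases: Dict[str, int] = field(default_factory=dict)
    history: Dict[str, List[int]] = field(default_factory=dict)

    def set_alias(self, alias: str, version: int) -> None:
        # Remember the previous target so a rollback can restore it.
        if alias in self.aliases:
            self.history.setdefault(alias, []).append(self.aliases[alias])
        self.aliases[alias] = version

    def rollback(self, alias: str) -> int:
        # Re-point the alias at the version it referenced before the last swap.
        past = self.history.get(alias)
        if not past:
            raise LookupError(f"no earlier version recorded for alias {alias!r}")
        previous = past.pop()
        self.aliases[alias] = previous
        return previous


class ModelServer:
    """Serving side: reloads only when the alias points at a new version."""

    def __init__(self, registry: AliasRegistry, alias: str = "production") -> None:
        self.registry = registry
        self.alias = alias
        self.loaded_version: Optional[int] = None

    def reload_if_changed(self) -> bool:
        target = self.registry.aliases[self.alias]
        if target == self.loaded_version:
            return False  # nothing to do, keep serving the current model
        # A real server would load the model artifacts for `target` here.
        self.loaded_version = target
        return True
```

Because the server follows the alias rather than a fixed version number, promotion and rollback are the same operation from its point of view: the alias moves, and the next reload picks up whatever it now points at.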

Feature Store (Lite + Feast Validation)

The Feature Store-lite structure and

validation notes for a Feast-based Feature Store

https://keonhoban.github.io/mlops-journey/projects/feature_store/01/


🔗 GitHub

GitOps Repository

GitOps-based ML Platform infrastructure code

https://github.com/keonhoban/mlops-infra-gitops

Airflow DAG

ML pipeline orchestration code

https://github.com/keonhoban/airflow-dags-dev

MLOps Experiments

ML Platform experiment and validation code

https://github.com/keonhoban/mlops-infra-labs


🏗️ Architecture Philosophy

The platform I built is not an experimental setup;

it is an ML Platform designed on the premise of running in production.

  • Kubernetes-based ML Platform
  • GitOps-based deployment and environment separation strategy
  • MLflow Tracking + Registry operating structure
  • MLflow alias-based model swap and runtime reload
  • DAG-based automated training and deployment pipeline
  • Rollback and state-transition design that accounts for incident response
  • Promotion/Shadow branching + NetworkPolicy-based network isolation
  • Operational observability based on metrics, logs, and alerts

The design goal was not "getting a model up,"

but building a platform that can be operated continuously.
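
To make the "rollback and state-transition design" point above concrete, here is a minimal Python sketch of an explicit transition table for a model's deployment stage. The stage names (`REGISTERED`, `SHADOW`, `PROMOTED`, `ROLLED_BACK`) and the allowed transitions are assumptions chosen for illustration; the actual platform may model these states differently.

```python
from enum import Enum


class Stage(Enum):
    REGISTERED = "registered"
    SHADOW = "shadow"
    PROMOTED = "promoted"
    ROLLED_BACK = "rolled_back"


# Hypothetical transition table: each stage lists the stages it may move to.
ALLOWED = {
    Stage.REGISTERED: {Stage.SHADOW, Stage.PROMOTED},
    Stage.SHADOW: {Stage.PROMOTED, Stage.REGISTERED},
    Stage.PROMOTED: {Stage.ROLLED_BACK},
    Stage.ROLLED_BACK: {Stage.REGISTERED},
}


def transition(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table explicit means an unexpected move, such as promoting a model that has just been rolled back, fails loudly instead of silently changing what is served.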


💼 DevOps / SRE Experience

Gabia DevOps Team (System Engineer)

2023.01 ~ 2025.03

Responsible for operating and automating large-scale mail and infrastructure systems.

Key Experience

Zero-downtime mail migration

  • New server provisioning
  • DNS / MX / SPF / DKIM cutover
  • Restore (rollback) scenario design
  • Incremental migration automation

SMTP outbound rerouting automation

  • Loop prevention logic design
  • Duplicate reroute detection
  • Ansible-based automation
  • Automated log management

CI/CD and deployment operations

  • GitLab CI + Docker + Helm
  • Health Check standardization
  • Traffic cutover based on CloudFront / Route53

Kubernetes operations

  • Node scale-out and safe onboarding procedure
  • Cordon / validation-based operations strategy

My experience centers on operational automation and stability.


🛠 Tech Stack

ML Platform

MLflow / Airflow / Triton / FastAPI

Feature Store-lite / Feast

Prometheus / Grafana / Alertmanager

Infrastructure

Kubernetes / ArgoCD / Docker

AWS (Route53, CloudFront, etc.)


📜 Certifications

  • AWS Solutions Architect – Professional
  • AWS Solutions Architect – Associate
  • Engineer Information Processing (정보처리기사)
  • Linux Master Level 2 (리눅스 마스터 2급)
  • Network Manager (네트워크 관리사)

🎓 Education

Bachelor's degree in Computer Engineering (Academic Credit Bank System, GPA 4.2 / 4.5)

Graduated from Dong-Eui Institute of Technology (동의과학대학교), Department of Medical Administration


🎯 Direction

Rather than an engineer who only deploys models,

I aim to be a platform engineer who designs production-grade ML platforms and improves their operational stability.


📬 Contact

Email: keonho0510@naver.com

GitHub: https://github.com/keonhoban