[Advancing MLOps Operations - Stage 0: FastAPI A/B·Canary·Blue-Green Serving Base]

이 κΈ€μ—μ„œ λ‹€λ£¨λŠ” 것 MLflow Alias 기반으둜 A/BΒ·CanaryΒ·Blue-Green μ„Έ κ°€μ§€ μ„œλΉ™ μ „λž΅μ„ ν•˜λ‚˜μ˜ FastAPI μ•±μ—μ„œ μ²˜λ¦¬ν•˜λŠ” ꡬ쑰λ₯Ό μ„€κ³„ν•˜κ³ , 이후 운영 μžλ™ν™”μ˜ 곡톡 베이슀λ₯Ό λ§Œλ“œλŠ” 과정을 λ‹€λ£Ήλ‹ˆλ‹€. μ„ μˆ˜μ§€μ‹ [TS] Airflow 기초 μžλ™ν™” νŠΈλŸ¬λΈ”μŠˆνŒ… β€” Airflow β†’ MLflow β†’ FastAPI 연동 κΈ°λ³Έ 흐름 Level 3은 이 κΈ€λΆ€ν„° μ‹œμž‘ν•©λ‹ˆλ‹€. 이 λ‹¨κ³„μ—μ„œ ν•΄κ²°ν•˜λ €λŠ” 문제 운영 ν™˜κ²½μ—μ„œ λͺ¨λΈμ€ μ–Έμ œλ“  ꡐ체될 수 있고, κ·Έ μˆœκ°„μ΄ μ„œλΉ„μŠ€ ν’ˆμ§ˆμ΄ κ°€μž₯ 크게 ν”λ“€λ¦¬λŠ” μœ„ν—˜ ꡬ간이닀. 이후 λͺ¨λ“  μžλ™ν™”(ν•™μŠ΅Β·λ“±λ‘Β·ν•«μŠ€μ™‘Β·λ‘€λ°±)의 기반이 λ˜λŠ” A/BΒ·CanaryΒ·Blue-Green μ „λž΅μ„ λͺ¨λ‘ μ²˜λ¦¬ν•  수 μžˆλŠ” FastAPI μ„œλΉ™ ꡬ쑰뢀터 μž‘μ•„μ•Ό ν•œλ‹€. 이 λΌˆλŒ€κ°€ 완성돼야 Airflow, MLflow, ArgoCD와 μ—°κ²°λœ μš΄μ˜ν˜• MLOps νŒŒμ΄ν”„λΌμΈμ„ λ§Œλ“€ 수 μžˆλ‹€. ...

July 18, 2025 · 5 min

[MLOps ν”Œλž«νΌ ꡬ좕 : Airflow-MLflow-FastAPI (Helm)]

🧩 Background: a setup built around real-world scenarios
This project goes beyond a simple hands-on exercise: it was designed as MLOps infrastructure that solves the following problems that come up in practice.
- Results from many model experiments get mixed together and are hard to track → MLflow Tracking server + PostgreSQL metadata store
- Model files and logs are stored only locally, which hurts collaboration and reproducibility → S3-based artifact store + pyfunc-based model serving structure
- Inefficient manual operations such as registering DAGs and deploying models by hand → Airflow + GitSync integration for pipeline automation and version control ...

July 15, 2025 · 3 min

[MLOps ν”Œλž«νΌ ꡬ좕 - 6단계: μ‹€μ‹œκ°„ λͺ¨λΈ ν•«μŠ€μ™‘ ꡬ쑰 μ‹€ν—˜]

이 κΈ€μ—μ„œ λ‹€λ£¨λŠ” 것 Airflow DAGμ—μ„œ 쑰건뢀 λͺ¨λΈ 등둝 ν›„ FastAPI ν•«μŠ€μ™‘κΉŒμ§€ E2E μžλ™ν™” 흐름을 μ‹€ν—˜ν•©λ‹ˆλ‹€. μ„ μˆ˜μ§€μ‹ MLOps ν”Œλž«νΌ ꡬ좕 5단계: FastAPI μ„œλΉ™ 및 ν•«μŠ€μ™‘ ꡬ쑰 ꡬ좕 β€” FastAPI λͺ¨λΈ λ‘œλ”©κ³Ό /reload API ꡬ쑰 이 λ‹¨κ³„μ—μ„œ ν•΄κ²°ν•˜λ €λŠ” 문제 κ°œλ³„ μ»΄ν¬λ„ŒνŠΈ(Airflow, MLflow, FastAPI)λ₯Ό 각각 κ΅¬μ„±ν–ˆμ§€λ§Œ, ν•™μŠ΅λΆ€ν„° μ„œλΉ™κΉŒμ§€μ˜ μžλ™ν™” 흐름을 E2E둜 검증해야 μ‹€μ œ 운영 κ°€λŠ₯ μ—¬λΆ€λ₯Ό νŒλ‹¨ν•  수 μžˆμŠ΅λ‹ˆλ‹€. 이 λ‹¨κ³„μ—μ„œλŠ” Airflow DAGμ—μ„œ λͺ¨λΈ ν•™μŠ΅ β†’ μ„±λŠ₯ κΈ°μ€€ λΆ„κΈ° β†’ MLflow 등둝 β†’ FastAPI ν•«μŠ€μ™‘κΉŒμ§€ 전체 νŒŒμ΄ν”„λΌμΈμ„ μ‹€ν—˜ν•©λ‹ˆλ‹€. ...

July 10, 2025 · 2 min

[MLOps ν”Œλž«νΌ ꡬ좕 - 3단계: MLflow : PostgreSQL + S3 연동 기반 Helm ꡬ성]

이 κΈ€μ—μ„œ λ‹€λ£¨λŠ” 것 MLflowλ₯Ό PostgreSQL + S3 λ°±μ—”λ“œλ‘œ κ΅¬μ„±ν•˜μ—¬ Kubernetes에 Helm λ°°ν¬ν•˜κ³ , μ»€μŠ€ν…€ 이미지와 Ingressλ₯Ό μ—°λ™ν•©λ‹ˆλ‹€. μ„ μˆ˜μ§€μ‹ MLOps ν”Œλž«νΌ ꡬ좕 2단계: S3 & PostgreSQL Secret 관리 β€” Secret/ConfigMap μ£Όμž… μ „λž΅ 이 λ‹¨κ³„μ—μ„œ ν•΄κ²°ν•˜λ €λŠ” 문제 ν•œ ν”„λ‘œμ νŠΈμ— μ—¬λŸ¬ λ²„μ „μ˜ λͺ¨λΈμ΄ λ“±μž₯ν•˜λ©΄, 각 μ‹€ν—˜μ˜ νŒŒλΌλ―Έν„°/λ©”νŠΈλ¦­/μ•„ν‹°νŒ©νŠΈλ₯Ό μ²΄κ³„μ μœΌλ‘œ 좔적할 μ„œλ²„κ°€ ν•„μš”ν•©λ‹ˆλ‹€. 둜컬 MLflow UI(SQLite + file μ €μž₯)λ‘œλŠ” νŒ€ κ³΅μœ μ™€ μ•„ν‹°νŒ©νŠΈ μ˜μ†μ„±μ΄ λΆ€μ‘±ν•©λ‹ˆλ‹€. 이 λ‹¨κ³„μ—μ„œλŠ” MLflowλ₯Ό PostgreSQL(메타데이터) + S3(μ•„ν‹°νŒ©νŠΈ) λ°±μ—”λ“œλ‘œ κ΅¬μ„±ν•˜κ³  Helm으둜 Kubernetes에 λ°°ν¬ν•©λ‹ˆλ‹€. ...

June 30, 2025 · 3 min

[Airflow 기초 μžλ™ν™” - Airflow β†’ MLflow β†’ FastAPI]

🧭 Overall flow example

[Run AIRFLOW DAG]
   ↓
[train_mlflow.py]
 - train the iris model
 - log parameters/metrics
 - register the model in the Registry
   ↓
[promote_mlflow.py]
 - transition the latest model to Production
   ↓
[FastAPI]
 - models:/IrisModel/Production → real-time prediction

👉 The practice code is on 🔗 GitHub (Airflow + MLflow + FastAPI)

✅ [Step 1] Designing the basic project folder structure
📁 1. Overall directory layout

mlops_project/
├── airflow/                      🛫 Airflow configuration and DAG scheduler
│   ├── dags/                     ← DAG definition directory
│   │   └── train_with_mlflow.py  ← training DAG (MLflow integration)
│   ├── Dockerfile.airflow        ← Dockerfile for Airflow
│   ├── requirements.txt          ← Airflow dependencies
│   └── .dockerignore
│
├── fastapi/                      ⚡ FastAPI prediction API server
│   ├── app/
│   │   └── main.py               ← model serving endpoint
│   ├── Dockerfile.api            ← Dockerfile for FastAPI
│   ├── requirements.txt          ← FastAPI dependencies
│   └── .dockerignore
│
├── ml_code/                      🧠 ML training and promotion code
│   ├── train_mlflow.py           ← model training and MLflow logging
│   └── promote_mlflow.py         ← model promotion (Staging → Production)
│
├── mlflow_store/                 🗂️ MLflow storage path (volume)
│   ├── Dockerfile.mlflow         ← customized MLflow server image
│   ├── mlflow.db                 ← Model Registry DB (SQLite)
│   ├── mlruns/                   ← experiment log directory
│   ├── artifacts/                ← model artifact store
│   └── .dockerignore
│
├── docker-compose.yaml           🧩 definition of all services
├── .env                          🔐 sensitive settings (kept separate in .env)
├── README.md                     📝 project documentation
├── .gitignore
└── .dockerignore

✅ [Step 2] Unified docker-compose.yaml configuration
🧭 Configuration goals

Service  | Description                                     | Port
airflow  | DAG execution environment (webserver/scheduler) | 8080
postgres | DB for Airflow metadata                         | internal only
mlflow   | MLflow UI + Registry                            | 5000
fastapi  | inference API server (loads the model)          | 8000

Be careful when picking prebuilt images: some images provide only the UI.

📄 Full docker-compose.yaml example

version: '3.8'

services:
  # 📦 PostgreSQL: DB for Airflow metadata
  postgres:
    image: postgres:13
    container_name: postgres
    env_file:
      - .env                 # ← sensitive values (user/password) kept separate
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    volumes:                 # ← data sharing and persistence
      - postgres_data:/var/lib/postgresql/data   # ← keeps DB data across restarts

  # 🛫 Airflow: DAG scheduler and task execution
  airflow:
    build:
      context: ./airflow          # → directory holding the Airflow Dockerfile
      dockerfile: Dockerfile.airflow
    container_name: airflow
    command: standalone           # → simple launch command for local testing
                                  #   (runs scheduler + webserver + DB initialization in one go)
                                  #   (in production, split into separate airflow-webserver and airflow-scheduler services)
    ports:
      - "8080:8080"               # → Airflow web UI (localhost:8080)
    depends_on:
      - postgres                  # → starts postgres before Airflow (start order only; it does not wait for the DB to be ready)
    env_file:
      - .env
    environment:
      # Airflow metadata DB connection string
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: ${AIRFLOW__CORE__SQL_ALCHEMY_CONN}
      # Whether to load Airflow's example DAGs
      AIRFLOW__CORE__LOAD_EXAMPLES: ${AIRFLOW__CORE__LOAD_EXAMPLES}
      MLFLOW_TRACKING_URI: http://mlflow:5000   # → lets DAG code reach MLflow
    volumes:
      - ./airflow/dags:/opt/airflow/dags   # mount DAG files
      - ./ml_code:/opt/airflow/ml_code     # share training code
      - ./mlflow_store:/mlflow             # share the model store

  # 🔬 MLflow: experiment tracking + model registry server
  mlflow:
    build:
      context: ./mlflow_store     # location of the custom Dockerfile
      dockerfile: Dockerfile.mlflow
    ports:
      - "5000:5000"               # → MLflow UI (localhost:5000)
    volumes:
      - ./mlflow_store:/mlflow    # stores experiment logs + DB + artifacts
    environment:
      - MLFLOW_TRACKING_URI=http://0.0.0.0:5000   # URI as seen from inside the container

  # ⚡ FastAPI: model serving API
  fastapi:
    build:
      context: ./fastapi
      dockerfile: Dockerfile.api
    container_name: fastapi
    ports:
      - "8000:8000"               # → prediction API endpoint (localhost:8000)
    volumes:
      - ./fastapi/app:/app/app    # mount the FastAPI app directory
      - ./ml_code:/app/ml_code    # share training/model code
      - ./mlflow_store:/mlflow    # mount for loading saved models

# 🗂️ Volume definitions (keep Postgres DB data persistent)
volumes:
  postgres_data:

🎁 One more thing to do
After Airflow starts for the first time, you usually also need to create an admin account:

# Open a shell in the airflow container
docker exec -it airflow bash

# Create an admin account
airflow users create \
  --username airflow \
  --password airflow \
  --firstname 기호 \
  --lastname 반 \
  --role Admin \
  --email airflow@example.com

🔍 [Setup Tip] Check the shared-volume structure between Airflow, FastAPI, and MLflow

Shared resource                  | Description
./mlflow_store:/mlflow (MLflow)  | log/model store used by the MLflow server
./mlflow_store:/mlflow (Airflow) | shares the location where models are saved after training
./mlflow_store:/mlflow (FastAPI) | shares the path models are loaded from at inference time

➡ Consistent paths are essential! Right now all three services mount ./mlflow_store at /mlflow (with mlruns/ under /mlflow) ...
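The path-consistency tip can be checked mechanically. A minimal sketch, assuming the compose file has already been parsed into a `{service: [volume specs]}` dict (for example with PyYAML, not shown, to stay stdlib-only) and that volumes use the short `host:container[:mode]` syntax; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def mlflow_mounts(services: Dict[str, List[str]],
                  host_dir: str = "./mlflow_store") -> Dict[str, str]:
    """Return {service: container_path} for every service that mounts host_dir."""
    found = {}
    for name, volumes in services.items():
        for spec in volumes:
            if ":" not in spec:
                continue  # anonymous volume such as "/data": nothing to compare
            host, container = spec.split(":")[:2]
            if host == host_dir:
                found[name] = container
    return found


def check_consistent(services: Dict[str, List[str]],
                     required: Sequence[str] = ("mlflow", "airflow", "fastapi"),
                     host_dir: str = "./mlflow_store") -> List[str]:
    """Return a list of problems; an empty list means the mounts agree."""
    mounts = mlflow_mounts(services, host_dir)
    problems = [f"{svc}: does not mount {host_dir}" for svc in required if svc not in mounts]
    paths = set(mounts.values())
    if len(paths) > 1:
        problems.append(f"container paths differ: {sorted(paths)}")
    return problems
```

With the volumes from the compose file above, `check_consistent` returns an empty list; changing one service to `./mlflow_store:/models` would be reported as a mismatch.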

June 13, 2025 · 8 min

[MLflow: Tracking + FastAPI Integration]

Goals
- Set up an MLflow Tracking Server
- Record experiments, parameters, metrics, and artifacts
- Go all the way from model registration → stage transition → API integration

👉 The practice code is on 🔗 GitHub (Mlflow - Tracking + FastAPI)

🧭 Overall flow of the exercise
[Step 1] Set up the MLflow Tracking Server (run locally)
[Step 2] Run the experiment (train.py) → train and log the model
[Step 3] Register the model and set its stage (move to Production)
[Step 4] Integrate with FastAPI → prediction API service

🧩 Example project directory
mlops-mlflow/
├── app/
│   ├── train.py        # model training and experiment logging
│   └── model.pkl       # saved model
├── mlruns/             # experiment data (created automatically)
├── fastapi_app/
│   └── app.py          # FastAPI prediction API
├── Dockerfile (optional)
└── README.md

✅ [Step 1] Install & run MLflow
🛠️ Virtual environment setup
# 1. Install venv
sudo apt install python3-venv -y
# 2. Create the virtual environment
python3 -m venv .venv
# 3. Activate it
source .venv/bin/activate
# 4. Install packages
pip install mlflow scikit-learn pandas fastapi uvicorn
# 5. Deactivate when you are done
deactivate

🔧 Start the MLflow server
mlflow ui --port 5000   # open the UI at http://localhost:5000

🧪 [Step 2] Run the experiment (train.py)
# app/train.py
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# MLflow settings: log to the local tracking server under this experiment name
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("iris-rf-exp")

with mlflow.start_run():
    # Load iris and hold out 20% of the samples for evaluation
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)

    clf = RandomForestClassifier(n_estimators=100, max_depth=3)
    clf.fit(X_train, y_train)
    acc = clf.score(X_test, y_test)  # accuracy on the held-out split

    # Record hyperparameters, the metric, and the model artifact for this run
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 3)
    mlflow.log_metric("accuracy", acc)
    mlflow.sklearn.log_model(clf, "model")

# Run the experiment
python app/train.py

When the experiment finishes, the run records and the model are saved under the mlruns/ folder ...
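To see what train.py leaves behind on disk, here is a small stdlib-only reader for a local file-based store. It assumes MLflow's default file layout, `mlruns/<experiment_id>/<run_id>/params/<name>` holding the value and `metrics/<name>` holding lines of `timestamp value step`; that layout is an internal detail that may change between MLflow versions and does not apply when the tracking server uses a database backend. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterator


def find_runs(mlruns: Path) -> Iterator[Path]:
    """Yield mlruns/<experiment_id>/<run_id> dirs that contain params/ or metrics/."""
    for exp_dir in sorted(p for p in mlruns.iterdir() if p.is_dir() and not p.name.startswith(".")):
        for run_dir in sorted(p for p in exp_dir.iterdir() if p.is_dir()):
            if (run_dir / "params").is_dir() or (run_dir / "metrics").is_dir():
                yield run_dir


def read_run(run_dir: Path) -> Dict[str, Dict[str, object]]:
    """Collect params and the last logged value of each metric from one run."""
    params: Dict[str, object] = {}
    params_dir = run_dir / "params"
    if params_dir.is_dir():
        for f in params_dir.iterdir():
            if f.is_file():
                params[f.name] = f.read_text().strip()

    metrics: Dict[str, object] = {}
    metrics_dir = run_dir / "metrics"
    if metrics_dir.is_dir():
        for f in metrics_dir.iterdir():
            if not f.is_file():
                continue
            lines = [ln for ln in f.read_text().splitlines() if ln.strip()]
            if lines:
                # Each line is "<timestamp> <value> <step>"; keep the last one appended.
                _timestamp, value = lines[-1].split()[:2]
                metrics[f.name] = float(value)
    return {"params": params, "metrics": metrics}
```

Note that params are stored as strings (`"100"`, not `100`), which mirrors how MLflow itself returns logged parameters.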

June 6, 2025 · 3 min