[Airflow - Step 5: PythonOperator + MLflow Tracking Integration]

What this post covers: Automating MLflow experiments with PythonOperator, an Airflow + MLflow integration pattern that automatically records hyperparameters, metrics, and models.
Prerequisites: Airflow Step 4: Running an External Training Script with BashOperator (the external-script separation pattern)
The problem this step solves: If hyperparameters, accuracy, and model artifacts are recorded by hand on every training run, experiments cannot be reproduced. MLflow Tracking records all of this automatically, and calling it from an Airflow DAG automates the training pipeline itself.
Practice code: GitHub (Airflow_and_MLflow)

🧭 Overall flow of the exercise
[Step 1] Write the MLflow experiment script
[Step 2] Configure the Airflow DAG
[Step 3] Run the DAG and check parameters/metrics
[Step 4] Check the saved model and logging status

📁 Directory structure
airflow/
├── dags/
│   ├── train_with_mlflow.py   ← DAG file
├── ml_code/
│   ├── train_mlflow.py        ← training script with MLflow integration
└── mlruns/                    ← folder where MLflow stores logged results (auto-created)

🧪 Step 1: MLflow training script (key part)

    # airflow/ml_code/train_mlflow.py
    # Imports the excerpt relies on
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    def run_experiment():
        mlflow.set_tracking_uri("file:/opt/airflow/mlruns")
        mlflow.set_experiment("airflow_mlflow_example")
        with mlflow.start_run():
            data = load_iris()
            X, y = data.data, data.target
            model = RandomForestClassifier(n_estimators=50, max_depth=3)
            model.fit(X, y)
            preds = model.predict(X)  # evaluated on the training data; in production, train_test_split is required
            acc = accuracy_score(y, preds)
            mlflow.log_param("n_estimators", 50)
            mlflow.log_param("max_depth", 3)
            mlflow.log_metric("accuracy", acc)
            mlflow.sklearn.log_model(model, "model")

Full code: GitHub (train_mlflow.py) ...
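For the DAG side of this pattern, a minimal sketch might look like the following. It is not the post's actual DAG file: the `dag_id`, `task_id`, start date, and the `/opt/airflow/ml_code` mount path are assumptions chosen for illustration, and the import paths assume Airflow 2.4 or later.

```python
import sys
from datetime import datetime

# Assumption: ml_code/ is mounted at this path inside the Airflow container.
ML_CODE_DIR = "/opt/airflow/ml_code"


def run_mlflow_experiment():
    """Import the training module at run time and execute the experiment."""
    if ML_CODE_DIR not in sys.path:
        sys.path.append(ML_CODE_DIR)
    # Imported lazily so DAG parsing does not pull in mlflow or scikit-learn.
    from train_mlflow import run_experiment

    run_experiment()


def build_dag():
    # Airflow imports live here so the module can be imported without Airflow.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="train_with_mlflow",        # hypothetical id
        start_date=datetime(2025, 6, 1),   # hypothetical start date
        schedule=None,                     # trigger manually from the UI or CLI
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="run_mlflow_experiment",
            python_callable=run_mlflow_experiment,
        )
    return dag


try:
    dag = build_dag()
except ImportError:
    dag = None  # Airflow is not installed in this environment
```

Keeping the `train_mlflow` import inside the callable is a design choice, not a requirement: it keeps DAG parsing light, because the scheduler re-parses DAG files frequently.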

June 10, 2025 · 2 min

[Airflow - Step 4: Running an External Python Training Script with BashOperator]

What this post covers: Keeping logic out of the DAG by calling a separate Python training script (train.py) through BashOperator, a pattern that separates code from orchestration.
Prerequisites: Airflow Step 3: Building an ML Pipeline DAG (DAG structure and passing data with XCom)
The problem this step solves: In the previous step, the training logic was written directly inside the DAG file. In practice, training code runs to hundreds of lines or more and is maintained separately by data scientists. Maintenance and collaboration only work when the DAG handles orchestration alone and the training code lives in an independent script.
Practice code: GitHub (BashOperator_and_Python_ML_Script) ...
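As a rough illustration of this separation, the DAG below contains no training logic and only shells out to the script. The script location, `dag_id`, `task_id`, and start date are hypothetical (the summary only names `train.py`), and the import path assumes Airflow 2.4 or later.

```python
from datetime import datetime

# Hypothetical location of the training script inside the Airflow container.
TRAIN_SCRIPT = "/opt/airflow/ml_code/train.py"
TRAIN_COMMAND = f"python {TRAIN_SCRIPT}"


def build_dag():
    # Airflow imports live here so the module can be imported without Airflow.
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="bash_train_script",        # hypothetical id
        start_date=datetime(2025, 6, 1),   # hypothetical start date
        schedule=None,
        catchup=False,
    ) as dag:
        # The DAG only orchestrates; the training code stays in train.py.
        BashOperator(task_id="run_train_script", bash_command=TRAIN_COMMAND)
    return dag


try:
    dag = build_dag()
except ImportError:
    dag = None  # Airflow is not installed in this environment
```

With this layout the script can also be run and debugged on its own with `python train.py`, without going through Airflow.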

June 10, 2025 · 2 min

[Airflow - Step 3: Building an ML Pipeline DAG]

What this post covers: Building the ML workflow of data loading → model training → model saving as a DAG, and passing each step's result to the next with XCom.
Prerequisites: Airflow Step 2: Python & Bash Operator + XCom (the XCom data-passing pattern)
The problem this step solves: In a real ML pipeline, data preparation → model training → model saving must run in sequence. Each step's result (data path, model path) has to be passed to the next step, and when something fails you need to trace which step caused it. This post simulates that flow with dummy data. ...
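The load → train → save chain could be sketched roughly as follows. This is an illustration under assumptions, not the post's code: the file paths, `dag_id`, and start date are made up, nothing is actually read or written, and the import path assumes Airflow 2.4 or later.

```python
from datetime import datetime


def load_data():
    # A task's return value is stored in XCom under the key "return_value".
    return "/tmp/dummy_data.csv"  # hypothetical path; no file is read


def train_model(ti):
    # Airflow passes the task instance as `ti` because the name is in the signature.
    data_path = ti.xcom_pull(task_ids="load_data")
    print(f"training on {data_path}")
    return "/tmp/dummy_model.pkl"  # hypothetical path; no model is written


def save_model(ti):
    model_path = ti.xcom_pull(task_ids="train_model")
    print(f"model stored at {model_path}")
    return model_path


def build_dag():
    # Airflow imports live here so the module can be imported without Airflow.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="ml_pipeline_sketch",       # hypothetical id
        start_date=datetime(2025, 6, 1),   # hypothetical start date
        schedule=None,
        catchup=False,
    ) as dag:
        load = PythonOperator(task_id="load_data", python_callable=load_data)
        train = PythonOperator(task_id="train_model", python_callable=train_model)
        save = PythonOperator(task_id="save_model", python_callable=save_model)
        load >> train >> save  # run strictly in this order
    return dag


try:
    dag = build_dag()
except ImportError:
    dag = None  # Airflow is not installed in this environment
```

Because each step is its own task, a failure shows up against that specific task in the Airflow UI, which is what makes the failing step easy to trace.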

June 7, 2025 · 2 min

[Airflow - Step 2: Python & Bash Operator + Passing Data with XCom]

What this post covers: Combining PythonOperator and BashOperator in a single DAG and passing data between tasks with XCom.
Prerequisites: Airflow Step 1: Running a Basic DAG in a Local Environment (DAG structure and runtime environment)
The problem this step solves: In a real pipeline, several tasks run in order and each hands its result to the next. For example, when a training task produces a model path, the deployment task has to receive that path. Airflow's XCom handles this kind of small-scale data transfer between tasks.
Practice code: GitHub (PythonOperator_and_XCom) ...
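One way to picture the training-to-deployment handoff is a Python task that returns a path and a Bash task that reads it through a Jinja template. The sketch below is illustrative only: the task ids, `dag_id`, model path, and the `echo` stand-in for deployment are all assumptions, and the import paths assume Airflow 2.4 or later.

```python
from datetime import datetime


def make_model_path():
    # The returned string is pushed to XCom automatically.
    return "/tmp/model_v1.pkl"  # hypothetical path; nothing is written


# Jinja template: Airflow renders the xcom_pull call just before the task runs.
DEPLOY_COMMAND = "echo deploying {{ ti.xcom_pull(task_ids='train') }}"


def build_dag():
    # Airflow imports live here so the module can be imported without Airflow.
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="python_bash_xcom_sketch",  # hypothetical id
        start_date=datetime(2025, 6, 1),   # hypothetical start date
        schedule=None,
        catchup=False,
    ) as dag:
        train = PythonOperator(task_id="train", python_callable=make_model_path)
        deploy = BashOperator(task_id="deploy", bash_command=DEPLOY_COMMAND)
        train >> deploy
    return dag


try:
    dag = build_dag()
except ImportError:
    dag = None  # Airflow is not installed in this environment
```

XCom values are stored in Airflow's metadata database, so this works best for small values such as paths or ids rather than the model file itself.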

June 7, 2025 · 2 min

[Airflow - Step 1: Running a Basic DAG in a Local Environment]

What this post covers: The full process of bringing up Airflow locally with Docker Compose, then writing a first DAG, running it, and checking its logs.
Prerequisites: Completing Kubernetes Steps 1 to 5 is recommended (basic Docker knowledge is required). You can also skip Kubernetes and start from this post, as long as Docker is installed.
The problem this step solves: An ML pipeline consists of several stages: data collection → preprocessing → training → evaluation → deployment. Running these stages by hand is not reproducible and makes failures hard to trace. Airflow defines this workflow as a DAG (directed acyclic graph) and manages scheduling, retries, and logs automatically. ...
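To make the scheduling and retry settings concrete, a first DAG might look something like this. It is a sketch under assumptions rather than the post's own file: the `dag_id`, task name, daily schedule, and retry values are illustrative, and the import path assumes Airflow 2.4 or later.

```python
from datetime import datetime, timedelta

# Illustrative retry policy applied to every task in the DAG.
DEFAULT_ARGS = {"retries": 2, "retry_delay": timedelta(minutes=5)}


def say_hello():
    print("hello from Airflow")  # printed output appears in the task log
    return "hello"


def build_dag():
    # Airflow imports live here so the module can be imported without Airflow.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="hello_dag",                # hypothetical id
        start_date=datetime(2025, 6, 1),   # hypothetical start date
        schedule="@daily",                 # run once a day
        catchup=False,                     # do not backfill past dates
        default_args=DEFAULT_ARGS,
    ) as dag:
        PythonOperator(task_id="say_hello", python_callable=say_hello)
    return dag


try:
    dag = build_dag()
except ImportError:
    dag = None  # Airflow is not installed in this environment
```

Placed in the `dags/` folder, a file like this should be picked up by the scheduler, and the task's printed output can then be checked in the task log view.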

June 7, 2025 · 2 min