
# Template ETL with Airflow, Spark and Postgres

A simple template for ETL pipelines using Postgres, Spark and Airflow, deployed in Docker containers.

To deploy, simply run:

```bash
docker-compose up
```

After that, log into the Airflow interface at http://localhost:8080/home with the credentials:

```
user: airflow
password: airflow
```

The Postgres databases and tables are created by the script `/dags/db/init.sql` when the container is created by Docker Compose, and the credentials are stored in the environment file `/dags/.env` (in a real environment, this file should not be committed to git).
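
For orientation, a minimal sketch of what such an init script could contain; the database and table names here are illustrative assumptions, not the ones in this repo:

```sql
-- Hypothetical sketch: Postgres runs files in /docker-entrypoint-initdb.d/
-- through psql at first startup, so meta-commands like \connect work here.
CREATE DATABASE etl;

\connect etl

CREATE TABLE IF NOT EXISTS raw_data (
    id SERIAL PRIMARY KEY,
    payload JSONB,
    loaded_at TIMESTAMP DEFAULT now()
);
```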

The Dockerfile pulls an Airflow Docker image and installs the required Python packages listed in `requirements.txt`.
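
A sketch of what such a Dockerfile might look like; the base image tag is an assumption, not the one pinned in this repo:

```dockerfile
# Hypothetical sketch: extend an official Airflow image with extra packages.
FROM apache/airflow:2.7.3

COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt
```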

The ETL DAG is located at `/dags/etl_DAG.py`. As a demonstration, it reads data from an API, transforms it using PySpark, loads it into the database, and aggregates it using SQL queries.
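
To give a feel for the structure, here is a minimal sketch of what such a DAG could look like. The API endpoint, table names, connection id and credentials are illustrative assumptions, not taken from this repo, and the JDBC write assumes the Postgres driver jar is on Spark's classpath:

```python
# Hypothetical sketch of an extract -> transform/load -> aggregate DAG.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator


def extract():
    # Pull raw records from the source API (placeholder endpoint).
    response = requests.get("https://api.example.com/records")
    response.raise_for_status()
    return response.json()  # pushed to XCom for the next task


def transform_and_load(ti):
    from pyspark.sql import SparkSession

    records = ti.xcom_pull(task_ids="extract")
    spark = SparkSession.builder.appName("template_etl").getOrCreate()
    # Trivial transformation for demonstration purposes.
    df = spark.createDataFrame(records).dropDuplicates()
    # Write to Postgres over JDBC; in the template the credentials
    # would come from /dags/.env rather than being hard-coded.
    (
        df.write.format("jdbc")
        .option("url", "jdbc:postgresql://postgres:5432/etl")
        .option("dbtable", "raw_data")
        .option("user", "airflow")
        .option("password", "airflow")
        .option("driver", "org.postgresql.Driver")
        .mode("append")
        .save()
    )
    spark.stop()


with DAG(
    dag_id="etl_dag",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(
        task_id="transform_and_load", python_callable=transform_and_load
    )
    # Aggregate the loaded rows with plain SQL (hypothetical target table).
    aggregate_task = PostgresOperator(
        task_id="aggregate",
        postgres_conn_id="postgres_default",
        sql="INSERT INTO daily_counts SELECT now(), count(*) FROM raw_data;",
    )
    extract_task >> load_task >> aggregate_task
```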