# Project Apache Spark in Scala

The main purpose of the project is to practice Apache Spark in Scala.

## Architecture

*(Architecture diagram: `scala_project.drawio`)*

## Overview

This project leverages Scala to implement an Extract, Transform, and Load (ETL) pipeline. Data is extracted from various sources (CSV files and PostgreSQL databases), undergoes transformations and analysis, and is then loaded into three distinct sinks (CSV, Parquet, and PostgreSQL).
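
As a rough sketch (not the project's actual code) of how such a pipeline might be wired together, assuming hypothetical `Extract`, `Transform`, and `Load` objects that are fleshed out in the sketches below:

```scala
object Main {
  def main(args: Array[String]): Unit = {
    val transactions = Extract.allTransactions()          // extract: PostgreSQL (CSV sources read similarly)
    val summary      = Transform.summarise(transactions)  // transform: clean, filter, aggregate
    Load.load(summary)                                    // load: CSV, Parquet, and PostgreSQL sinks
  }
}
```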

**Data Sources:**

- Multiple CSV files
- PostgreSQL databases:
  - Transaction Poland
  - Transaction France
  - Transaction China
  - Transaction USA
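
A minimal sketch of the extraction step, assuming local CSV paths, a PostgreSQL instance on `localhost:5432`, placeholder credentials, and snake_case table names derived from the list above (none of these are confirmed by the repository):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object Extract {
  val spark: SparkSession = SparkSession.builder()
    .appName("scala-etl")
    .master("local[*]")
    .getOrCreate()

  // Read a CSV source; the header/inferSchema options are assumptions.
  def readCsv(path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)

  // Read one transaction table from PostgreSQL over JDBC
  // (requires the org.postgresql JDBC driver on the classpath).
  def readPostgresTable(table: String): DataFrame =
    spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/postgres")
      .option("user", "postgres")
      .option("password", "postgres") // placeholder credentials
      .option("dbtable", table)
      .load()

  // Union the four per-country transaction tables into a single DataFrame.
  def allTransactions(): DataFrame =
    Seq("transaction_poland", "transaction_france",
        "transaction_china", "transaction_usa")
      .map(readPostgresTable)
      .reduce(_ unionByName _)
}
```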

**Data Transformations and Analysis:**

The specific transformation and analysis steps are not spelled out in the architecture diagram. The pipeline likely involves data cleaning, filtering, and aggregation, and potentially more complex operations depending on the data's nature and intended use.
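
Since the exact steps are undocumented, here is an illustrative sketch of cleaning, filtering, and aggregation; the column names `amount` and `country` are assumptions, not the project's actual schema:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

object Transform {
  // Clean, filter, then aggregate the transactions per country.
  def summarise(transactions: DataFrame): DataFrame =
    transactions
      .na.drop(Seq("amount"))      // cleaning: drop rows with a missing amount
      .filter(col("amount") > 0)   // filtering: keep only positive transactions
      .groupBy(col("country"))     // aggregation: one summary row per country
      .agg(
        count("*").as("transaction_count"),
        sum(col("amount")).as("total_amount")
      )
}
```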

**Data Sinks:**

- CSV files
- Parquet files
- PostgreSQL databases
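
A sketch of writing one result DataFrame to all three sinks; the output paths, JDBC settings, and target table name are placeholders:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object Load {
  def load(result: DataFrame): Unit = {
    // Sink 1: CSV files
    result.write.mode(SaveMode.Overwrite)
      .option("header", "true")
      .csv("output/csv")

    // Sink 2: Parquet files
    result.write.mode(SaveMode.Overwrite)
      .parquet("output/parquet")

    // Sink 3: PostgreSQL over JDBC (placeholder URL, credentials, and table)
    result.write.mode(SaveMode.Overwrite)
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/postgres")
      .option("user", "postgres")
      .option("password", "postgres")
      .option("dbtable", "transaction_summary")
      .save()
  }
}
```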

## Instructions for building and running the Scala application

1. Clone the project from GitHub
2. Open the project
3. Build the "postgres" Docker image: `cd PostgresSQL && docker build -t postgres .`
4. Start the Docker container (for example `docker run -d --name postgres -p 5432:5432 postgres`; the exact flags depend on the Dockerfile)
5. Check the PostgreSQL connection: `docker exec -it postgres psql -U postgres -d postgres`, then list the tables with `\dt`
6. Run the Scala application (e.g. with `sbt run`)