Surabaya Weather Data Warehouse

This project develops a data warehouse system for monitoring and analyzing daily weather data across 31 districts of Surabaya City from January 2000 to November 2024. An ETL (Extract, Transform, Load) pipeline built with Pentaho Data Integration (PDI) processes and organizes the data, which is stored in MySQL databases using both OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) schemas for robust data management and analysis. A Tableau dashboard visualizes the historical weather data, providing insight into weather patterns, conditions, and trends in Surabaya City.

Technology Used

  • Pentaho Data Integration (PDI): ETL pipeline
  • MySQL: OLTP and OLAP databases
  • Tableau Desktop and Tableau Public: dashboard and visualization

Data Sources

  • Administrative and Geographical Data:
  • Weather Data: Extracted from OpenMeteo (via Elevation API); a request sketch follows this list.
  • Weather Condition Codes: Descriptions of weather codes (WMO Code 4677) were scraped from the WMO meteorological codes pages in the CEDA Artefacts Service.
  • Rainfall Intensity: Intensity classification criteria were sourced from BMKG (Indonesia's Agency for Meteorology, Climatology, and Geophysics).
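
For orientation, the snippet below shows what a daily historical request to Open-Meteo's archive endpoint can look like; in this project the extraction is actually done by the get_weather-historical.ktr transformation in PDI. The endpoint, parameter names, and variable list follow Open-Meteo's public documentation, and the coordinates are placeholders rather than the project's district list.

```python
# Minimal sketch of pulling daily historical weather for one location from the
# Open-Meteo archive API. This is illustrative only; the project performs this
# step with the get_weather-historical.ktr PDI transformation.
import requests

ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"

params = {
    "latitude": -7.2575,      # placeholder coordinates (central Surabaya)
    "longitude": 112.7521,
    "start_date": "2000-01-01",
    "end_date": "2024-11-30",
    "daily": "temperature_2m_max,temperature_2m_min,precipitation_sum,weathercode",
    "timezone": "Asia/Jakarta",
}

response = requests.get(ARCHIVE_URL, params=params, timeout=60)
response.raise_for_status()
daily = response.json()["daily"]

# Each daily variable comes back as a list parallel to the "time" list of dates.
for date, rain, code in zip(daily["time"], daily["precipitation_sum"], daily["weathercode"]):
    print(date, rain, code)
```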

Project Workflow

  1. Prepare OLTP and OLAP Databases:
    • Import the database schemas using the *_db-dump.sql files provided in the data/mysql_db directory.
    • Use the OLTP and OLAP SQL files to set up the databases.
  2. Extract Data:
    • Run the PDI transformations to extract data from various sources.
      • get_districts.ktr: Extracts administrative area data.
      • get_weather-historical.ktr: Extracts historical weather data from OpenMeteo.
      • get_wmo_code.ktr: Extracts weather condition codes (WMO Code 4677) from the CEDA Artefacts Service.
  3. Initialize Dimension Tables:
    • Run the following transformations in PDI:
      • insert_dim_time.ktr: Initializes the time dimension table (dim_time).
      • insert_dim_rainfall.ktr: Initializes the rainfall intensity dimension table (dim_rainfall) based on BMKG criteria (a seeding sketch follows this list).
  4. Transform and Load Data:
    • Run the weather_transform.kjb job in PDI to load data from the OLTP schema into the OLAP schema. This job transforms and loads data for the following tables (a sketch of the equivalent load logic also follows this list):
      • dim_location
      • dim_weather_condition
      • fact_weather
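
As a reference for step 3, the sketch below seeds dim_rainfall in the spirit of insert_dim_rainfall.ktr. The column names (category, min_mm, max_mm), the connection settings, and the mm/day thresholds are assumptions for illustration only; the authoritative values live in the PDI transformation and the *_db-dump.sql files, and BMKG's published criteria should be checked directly.

```python
# Illustrative seeding of the rainfall intensity dimension. Table/column names,
# thresholds, and credentials are assumed, not taken from the repository.
import mysql.connector

# Commonly cited BMKG daily-rainfall categories (assumed here; verify against BMKG).
RAINFALL_CRITERIA = [
    ("Light rain", 0.5, 20.0),
    ("Moderate rain", 20.0, 50.0),
    ("Heavy rain", 50.0, 100.0),
    ("Very heavy rain", 100.0, None),   # no upper bound
]

conn = mysql.connector.connect(
    host="localhost", user="root", password="secret", database="weather_olap"
)
cursor = conn.cursor()
cursor.executemany(
    "INSERT INTO dim_rainfall (category, min_mm, max_mm) VALUES (%s, %s, %s)",
    RAINFALL_CRITERIA,
)
conn.commit()
cursor.close()
conn.close()
```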
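
For step 4, the following sketch mirrors the kind of OLTP-to-OLAP load that weather_transform.kjb performs for fact_weather. Every OLTP-side table and column name (weather_oltp.weather_daily and its columns), the dimension key names, and the join conditions are assumptions made for illustration; the real mapping is defined inside the PDI job.

```python
# Rough sketch of loading fact_weather by joining staged OLTP rows to the
# dimension tables. Schema names here are illustrative assumptions.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="root", password="secret", database="weather_olap"
)
cursor = conn.cursor()

cursor.execute(
    """
    INSERT INTO fact_weather (time_key, location_key, condition_key, rainfall_key,
                              temperature_max, temperature_min, precipitation_sum)
    SELECT t.time_key, l.location_key, c.condition_key, r.rainfall_key,
           w.temperature_max, w.temperature_min, w.precipitation_sum
    FROM weather_oltp.weather_daily AS w
    JOIN dim_time              AS t ON t.full_date = w.observation_date
    JOIN dim_location          AS l ON l.district_name = w.district_name
    JOIN dim_weather_condition AS c ON c.wmo_code = w.weather_code
    JOIN dim_rainfall          AS r ON w.precipitation_sum >= r.min_mm
                                   AND (r.max_mm IS NULL OR w.precipitation_sum < r.max_mm)
    """
)
conn.commit()
cursor.close()
conn.close()
```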

Tableau Dashboard

The interactive Tableau dashboard for this project was built using Tableau Desktop, with data sourced directly from the OLAP schema in MySQL. The final dashboard is available on Tableau Public: Surabaya Historical Weather Analytics Dashboard.

Project Report

For a detailed explanation of the project, including methodologies, results, and diagrams, refer to the Project Deck or Project Report in the others directory.
