This project develops a comprehensive data warehouse for monitoring and analyzing daily weather data across 31 districts of Surabaya City from January 2000 to November 2024. An ETL (Extract, Transform, Load) pipeline built with Pentaho Data Integration (PDI) processes and organizes the data efficiently. The processed data is stored in MySQL using both OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) schemas, enabling robust data management and analysis. A Tableau dashboard was also created to visualize the historical weather data, providing insights into weather patterns, conditions, and trends in Surabaya City.
- ETL Tools: Pentaho Data Integration (PDI) Developer Edition 9.4.0.0-343
- Database: MySQL (Community Server) 8.4.3
- Visualization: Tableau Desktop 2024.3
- Data Sources: OpenMeteo; Wikidata; kodewilayah.id; CEDA Artefacts Service; Badan Meteorologi, Klimatologi, dan Geofisika (BMKG) Indonesia
- Administrative and Geographical Data:
- District and city names were extracted from Wikidata (via web scraping) and validated using data from kodewilayah.id, based on Peraturan Menteri Dalam Negeri (Permendagri) No. 72/2019.
- District and city codes were sourced from kodewilayah.id.
- District and city latitude and longitude data were sourced from Wikidata (via web scraping), based on Geographic Names Server (GNS).
- District and city elevation data were extracted from OpenMeteo (via the Elevation API).
- Weather Data: Extracted from OpenMeteo (via the Historical Weather API); a request sketch follows this list.
- Weather Condition Codes: Descriptions of weather codes (WMO Code 4677) were scraped from WMO Meteorological codes pages in the CEDA Artefacts Service.
- Rainfall Intensity: Rainfall intensity criteria were sourced from BMKG.
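For reference, the request behind the historical-weather extraction looks roughly like the Python sketch below. The endpoint and parameter names follow the public OpenMeteo archive API documentation; the coordinates (central Surabaya) and the particular daily variables are illustrative assumptions, not necessarily the exact set the transformations pull.

```python
# Minimal sketch of an OpenMeteo Historical Weather API request.
# Coordinates (central Surabaya) and daily variables are illustrative.
import requests

resp = requests.get(
    "https://archive-api.open-meteo.com/v1/archive",
    params={
        "latitude": -7.2575,     # illustrative: central Surabaya
        "longitude": 112.7521,
        "start_date": "2000-01-01",
        "end_date": "2024-11-30",
        "daily": "weather_code,temperature_2m_max,"
                 "temperature_2m_min,precipitation_sum",
        "timezone": "Asia/Jakarta",
    },
    timeout=60,
)
resp.raise_for_status()
daily = resp.json()["daily"]
print(daily["time"][:3], daily["precipitation_sum"][:3])
```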
- Prepare OLTP and OLAP Databases:
  - Import the database schemas using the *_db-dump.sql files provided in the data/mysql_db directory (a scripted import sketch follows this step).
  - Use the OLTP and OLAP SQL files to set up their respective databases.
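If you want to script the import instead of running it manually, a minimal sketch is shown below. It assumes the mysql command-line client is on PATH, that each dump creates its own schema, and that you run it from the repository root.

```python
# Minimal sketch of the schema import, assuming the mysql client is on
# PATH and each *_db-dump.sql file creates its own schema.
# -p prompts for the password on the terminal.
import glob
import subprocess

for dump in sorted(glob.glob("data/mysql_db/*_db-dump.sql")):
    with open(dump, "rb") as f:
        subprocess.run(["mysql", "-u", "root", "-p"], stdin=f, check=True)
    print(f"imported {dump}")
```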
- Extract Data:
  - Run the PDI transformations to extract data from the various sources (a headless Pan sketch follows this step):
    - get_districts.ktr: Extracts administrative area data.
    - get_weather-historical.ktr: Extracts historical weather data from OpenMeteo.
    - get_wmo_code.ktr: Extracts weather condition codes (WMO Code 4677) from the CEDA Artefacts Service.
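These transformations can be run from Spoon's GUI or headlessly with PDI's Pan tool. The sketch below assumes pan.sh (bundled with PDI) is on PATH and that the .ktr files sit in the working directory; on Windows, substitute Pan.bat.

```python
# Hedged sketch: run each extraction transformation headlessly via
# PDI's Pan tool. Assumes pan.sh is on PATH and the .ktr files are in
# the current directory.
import subprocess

for ktr in ("get_districts.ktr", "get_weather-historical.ktr", "get_wmo_code.ktr"):
    subprocess.run(["pan.sh", f"-file={ktr}", "-level=Basic"], check=True)
```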
- Initialize Dimension Tables:
  - Run the following transformations in PDI (an illustrative time-dimension sketch follows this step):
    - insert_dim_time.ktr: Initializes the time dimension table (dim_time).
    - insert_dim_rainfall.ktr: Initializes the rainfall intensity dimension table (dim_rainfall) based on BMKG criteria.
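For intuition, insert_dim_time.ktr amounts to generating one row per calendar day across the project's range. The sketch below shows the idea; the column layout (a yyyymmdd surrogate key plus calendar parts) is an assumption, not the project's actual dim_time schema.

```python
# Illustrative sketch of a time-dimension load: one row per day from
# 2000-01-01 to 2024-11-30. The column layout (yyyymmdd surrogate key,
# date, year, month, day, ISO weekday) is an assumption.
from datetime import date, timedelta

rows = []
d, end = date(2000, 1, 1), date(2024, 11, 30)
while d <= end:
    rows.append((int(d.strftime("%Y%m%d")), d.isoformat(),
                 d.year, d.month, d.day, d.isoweekday()))
    d += timedelta(days=1)

print(len(rows), rows[0])  # 9101 rows, starting with 2000-01-01
```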
- Transform and Load Data:
  - Run the weather_transform.kjb job in PDI to move data from the OLTP schema to the OLAP schema (an illustrative fact-load sketch follows this step). The job transforms and loads the following tables:
    - dim_location
    - dim_weather_condition
    - fact_weather
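Conceptually, the fact load reduces to an INSERT ... SELECT from the OLTP schema with surrogate-key lookups against the dimensions; the job itself can also be run headlessly with PDI's Kitchen tool, analogous to Pan above. The sketch below expresses that load pattern; only the OLAP table names come from this README, while the schema names, column names, and join keys are illustrative assumptions.

```python
# Hedged sketch of the OLTP-to-OLAP fact load performed by
# weather_transform.kjb, written as one INSERT ... SELECT. Schema names
# (oltp/olap), column names, and join keys are assumptions; only the
# OLAP table names come from the README.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root", password="secret")
cur = conn.cursor()
cur.execute("""
    INSERT INTO olap.fact_weather
        (time_id, location_id, weather_condition_id, rainfall_id,
         temperature_max, temperature_min, precipitation_sum)
    SELECT t.time_id, l.location_id, c.weather_condition_id, r.rainfall_id,
           w.temperature_max, w.temperature_min, w.precipitation_sum
    FROM oltp.daily_weather w
    JOIN olap.dim_time t              ON t.full_date = w.observation_date
    JOIN olap.dim_location l          ON l.district_code = w.district_code
    JOIN olap.dim_weather_condition c ON c.wmo_code = w.weather_code
    JOIN olap.dim_rainfall r
         ON w.precipitation_sum BETWEEN r.min_mm AND r.max_mm
""")
conn.commit()
conn.close()
```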
The interactive Tableau dashboard for this project was built using Tableau Desktop with data sourced directly from the OLAP schema in MySQL. The final dashboard is available on Tableau Public: Surabaya Historical Weather Analytics Dashboard
For a detailed explanation of the project, including methodologies, results, and diagrams, refer to the Project Deck or Project Report in the others directory.