Data warehouse |
Design DWH and describe the subject area |
Inmon Kimball Data Vault 2.0 Anchor modeling |
Hadoop HDFS map reduce |
MapReduce task for aggregating New York taxi data using the Hadoop HDFS infrastructure |
Yandex.Cloud HadoopStreaming HDFS S3 MapReduce CLI Shell Hadoop Cluster Administration ETL |
Hadoop Hive |
Providing constant access to cold data, creating a star schema and a data mart using the Hadoop Hive infrastructure |
Yandex.Cloud S3 HDFS HIVE MapReduce TEZ YARN HiveSQL CLI Shell Hadoop Cluster Administration |
Apache Spark |
Creating a data mart using the Apache Spark infrastructure |
Yandex.Cloud S3 HDFS PySpark CLI Shell Hadoop Cluster Administration |
Docker Kafka Spark |
Creating a data mart using the Docker-compose, Kafka, GreenPlum and Spark infrastructure |
Yandex.Cloud Kafka SparkStreaming Docker-compose ZooKeeper GreenPlum |
Apache Airflow |
Automatically collecting currency exchange rate data from a website with Apache Kafka and loading it into GreenPlum |
VK.Cloud Airflow GreenPlum Jinja macros ETL parsing bash IDE CI/CD |
Google Kubernetes |
Deploying a Kubernetes cluster, installing the components needed to run a custom script, and tracking the result using Spark History Server |
VK.Cloud terminal Ubuntu Kubectl Kubernetes Helm Docker S3 Spark Spark Operator Spark History Server |
Apache SparkML |
Identifying bots among user sessions using PySpark, in two tasks: training the best model and applying it |
PySpark SparkML SparkSession Pipeline |
Docker PostgreSQL |
Starting a PostgreSQL container with Docker-compose |
Docker Docker-compose PostgreSQL Adminer Python Psycopg2 Docker Hub |
Docker Debezium Kafka PostgreSQL |
Creating a Kafka topic monitoring pipeline with Debezium Connect |
Docker Docker-compose PostgreSQL Debezium Kafka |
PySpark Poetry |
Creating PySpark project with Poetry DMS |
PySpark Poetry PyTest Quinn Wheel |
Docker Spark cluster |
Creating a standalone Spark cluster on a local PC |
Docker Docker-compose PySpark AWS CLI PostgreSQL Terminal Bash |
Docker Airflow Hive HDFS Spark |
Pet project with an Apache Airflow and PySpark pipeline for ETL of Forex data |
Docker Docker-compose Airflow HDFS Hive PySpark Bash |