Project name | Description | Skills and tools |
---|---|---|
Data warehouse | Designing a DWH and describing the subject area | Inmon, Kimball, Data Vault 2.0, Anchor modeling |
Hadoop HDFS MapReduce | MapReduce task for aggregating New York taxi data using Hadoop HDFS infrastructure | Yandex.Cloud, Hadoop Streaming, HDFS, S3, MapReduce, CLI, Shell, Hadoop cluster administration, ETL |
Hadoop Hive | Providing constant access to cold data and creating a star schema and a data mart using the Hadoop Hive infrastructure | Yandex.Cloud, S3, HDFS, Hive, MapReduce, Tez, YARN, HiveSQL, CLI, Shell, Hadoop cluster administration |
Apache Spark | Creating a data mart using the Apache Spark infrastructure | Yandex.Cloud, S3, HDFS, PySpark, CLI, Shell, Hadoop cluster administration |
Docker Kafka Spark | Creating a data mart using Docker Compose, Kafka, Greenplum, and Spark infrastructure | Yandex.Cloud, Kafka, Spark Streaming, Docker Compose, ZooKeeper, Greenplum |
Apache Airflow | Automatically collecting currency exchange rate data from a website with Apache Kafka and loading it into Greenplum | VK.Cloud, Airflow, Greenplum, Jinja macros, ETL, parsing, Bash, IDE, CI/CD |
Google Kubernetes | Deploying a Kubernetes cluster, installing the components needed to run a custom script, and tracking the result using Spark History Server | VK.Cloud, terminal, Ubuntu, kubectl, Kubernetes, Helm, Docker, S3, Spark, Spark Operator, Spark History Server |
Apache SparkML | Creating a PySpark bot identifier that detects bots among user sessions, split into two tasks: training the best model and applying it | PySpark, SparkML, SparkSession, Pipeline |
Docker PostgreSQL | Starting a PostgreSQL container with Docker Compose | Docker, Docker Compose, PostgreSQL, Adminer, Python, Psycopg2, Docker Hub |
Docker Debezium Kafka PostgreSQL | Creating a Kafka topic monitoring pipeline with Debezium Connect | Docker, Docker Compose, PostgreSQL, Debezium, Kafka |
PySpark Poetry | Creating a PySpark project with Poetry for dependency management | PySpark, Poetry, PyTest, Quinn, Wheel |
Docker Spark cluster | Creating a standalone Spark cluster on a local PC | Docker, Docker Compose, PySpark, AWS CLI, PostgreSQL, Terminal, Bash |
Docker Airflow Hive HDFS Spark | Pet project with an Apache Airflow and PySpark pipeline for ETL of Forex data | Docker, Docker Compose, Airflow, HDFS, Hive, PySpark, Bash |
s-evsyukov/portfolio_projects
About
skills & tools portfolio