
s-evsyukov/portfolio_projects


Data engineer portfolio


Pet projects description:

| Project name | Description | Skills and tools |
| --- | --- | --- |
| Data warehouse | Designing a DWH and describing the subject area | Inmon, Kimball, Data Vault 2.0, Anchor modeling |
| Hadoop HDFS MapReduce | MapReduce task for aggregating New York taxi data using the Hadoop HDFS infrastructure | Yandex.Cloud, HadoopStreaming, HDFS, S3, MapReduce, CLI, Shell, Hadoop Cluster Administration, ETL |
| Hadoop Hive | Providing constant access to cold data, creating a star schema and a data mart using the Hadoop Hive infrastructure | Yandex.Cloud, S3, HDFS, Hive, MapReduce, Tez, YARN, HiveSQL, CLI, Shell, Hadoop Cluster Administration |
| Apache Spark | Creating a data mart using the Apache Spark infrastructure | Yandex.Cloud, S3, HDFS, PySpark, CLI, Shell, Hadoop Cluster Administration |
| Docker Kafka Spark | Creating a data mart using the Docker-compose, Kafka, Greenplum and Spark infrastructure | Yandex.Cloud, Kafka, SparkStreaming, Docker-compose, ZooKeeper, Greenplum |
| Apache Airflow | Automatically collecting currency exchange rate data from a website with Apache Kafka and uploading it to Greenplum | VK.Cloud, Airflow, Greenplum, Jinja macros, ETL, parsing, Bash, IDE, CI/CD |
| Google Kubernetes | Deploying a Kubernetes cluster, installing the components needed to run a custom script, and tracking the result with Spark History Server | VK.Cloud, terminal, Ubuntu, Kubectl, Kubernetes, Helm, Docker, S3, Spark, Spark Operator, Spark History Server |
| Apache SparkML | Creating a bot identifier for user sessions with PySpark, in two tasks: training the best model and applying it | PySpark, SparkML, SparkSession, Pipeline |
| Docker PostgreSQL | Starting a PostgreSQL container with Docker-compose | Docker, Docker-compose, PostgreSQL, Adminer, Python, Psycopg2, Docker Hub |
| Docker Debezium Kafka PostgreSQL | Creating a Kafka topics monitoring pipeline with Debezium connect | Docker, Docker-compose, PostgreSQL, Debezium, Kafka |
| PySpark Poetry | Creating a PySpark project with Poetry DMS | PySpark, Poetry, PyTest, Quinn, Wheel |
| Docker Spark cluster | Creating a standalone Spark cluster on a local PC | Docker, Docker-compose, PySpark, AWS CLI, PostgreSQL, Terminal, Bash |
| Docker Airflow Hive HDFS Spark | Pet project with an Apache Airflow PySpark pipeline to ETL Forex data | Docker, Docker-compose, Airflow, HDFS, Hive, PySpark, Bash |
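
To give a feel for the Hadoop HDFS MapReduce project listed above, here is a minimal sketch of a Hadoop Streaming style mapper and reducer in Python. It is not the code from the project itself: the two-column CSV layout (`payment_type,tip_amount`) and the choice of an average tip per payment type are assumptions made purely for illustration, and the `map` / `reduce` command-line arguments are hypothetical.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'payment_type<TAB>tip_amount' for each valid CSV record.

    Assumed (hypothetical) input layout: payment_type,tip_amount
    """
    for line in lines:
        fields = line.rstrip("\n").split(",")
        # Skip the header row and records that are too short.
        if len(fields) < 2 or fields[0] == "payment_type":
            continue
        try:
            tip = float(fields[1])
        except ValueError:
            continue  # skip records with a non-numeric tip
        yield f"{fields[0]}\t{tip}"


def reducer(lines):
    """Average the values per key.

    Expects input sorted by key, which is what the shuffle phase
    of Hadoop Streaming delivers to a reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        tips = [float(value) for _, value in group]
        yield f"{key}\t{sum(tips) / len(tips):.2f}"


# Hypothetical entry point: run as `script.py map` or `script.py reduce`.
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

Because both stages only read lines and yield lines, they can be checked locally by piping a sample file through the mapper, `sort`, and then the reducer before submitting anything to a cluster.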

About

skills & tools portfolio
