Roadmap

Welcome to my Roadmap repository! This repository showcases a comprehensive collection of projects that document my learning journey in data engineering. Each folder represents a specific area of study, featuring a variety of project types, including mini-projects, guided projects, hobby projects, and industry projects. This roadmap serves as both a learning tracker and a portfolio to highlight my growing skills and expertise.

Overview

This repository is structured to reflect my learning path in data engineering. Each project demonstrates practical applications of the concepts I have learned, organized into dedicated files for easy navigation. By showcasing these projects, I aim to provide a clear and structured overview of my technical skills and development.

Roadmap Files

Understanding Data Engineering

The Understanding Data Engineering.md file contains key theoretical concepts and definitions from the DataCamp "Understanding Data Engineering" course. It serves as a reference guide for important topics and terminologies in the field of data engineering.

Key Concepts Covered

Airflow: An open-source workflow management platform for scheduling and monitoring data engineering tasks.
AWS (Amazon Web Services): Amazon's cloud computing services.
Azure: Microsoft's cloud services.
Big Data: Management of large and complex datasets characterized by volume, variety, velocity, veracity, and value.
Cloud Computing: Utilizing remote servers hosted on the internet for data management and processing.
Database Schema: The logical structure of a database, including its data organization and relationships.
Data Engineering: The process of designing, constructing, and managing data systems to facilitate analysis.
Data Ingestion: The process of importing data into a system or database.
Data Lake: A storage repository that holds large amounts of raw data.
Data Pipelines: A set of processes for moving and transforming data.
Data Warehousing: Centralized storage of data from multiple sources for analysis.
ETL (Extract, Transform, Load): A process that extracts data from one source, transforms it, and loads it into a target system.
Google Cloud: Cloud services provided by Google.
NoSQL: Non-relational databases for storing structured, semi-structured, and unstructured data.
Parallel Processing: The simultaneous use of multiple compute resources to process data.
Redshift: Amazon's cloud data warehouse service.
S3: Amazon’s cloud object storage service.
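To make the ETL and data pipeline definitions above concrete, here is a minimal sketch of the three ETL stages in plain Python. It assumes an in-memory CSV string as the source and an in-memory SQLite database (via the standard `sqlite3` module) as the target. The data, table name `scores`, and function names are hypothetical illustrations, not code from this repository.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read a file, an API, or an S3 object.
RAW_CSV = """name,score
alice, 90
bob,
carol,75
"""


def extract(raw: str) -> list[dict]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple[str, int]]:
    """Transform: trim whitespace, drop rows with no score, cast scores to int."""
    cleaned = []
    for row in rows:
        score = row["score"].strip()
        if score:
            cleaned.append((row["name"].strip().title(), int(score)))
    return cleaned


def load(rows: list[tuple[str, int]], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS scores (name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT name, score FROM scores ORDER BY name").fetchall())
```

Keeping each stage a separate function makes it easy to test the transform logic on its own or to hand the stages to a scheduler such as Airflow later.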

Introduction to SQL

The files in this section (Stored Procedure.sql, Student Tables and Views.sql) include projects from my Introduction to SQL coursework, focusing on concepts like:

Stored Procedures: Demonstrated in the Stored Procedure.sql file.
Creating Views: Showcased in the Student Tables and Views.sql file.
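As a rough illustration of how a view behaves, the sketch below creates one in SQLite through Python's `sqlite3` module. The `students` table, its columns, and the pass mark of 75 are hypothetical. SQLite does not support stored procedures, so only the view concept is shown here; the stored procedure file targets a database engine that does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    grade INTEGER
);
INSERT INTO students (name, grade) VALUES ('Ana', 92), ('Ben', 68), ('Cai', 81);

-- A view is a saved query: it stores no rows of its own.
CREATE VIEW passing_students AS
SELECT name, grade FROM students WHERE grade >= 75;
""")

passing = conn.execute("SELECT * FROM passing_students ORDER BY name").fetchall()
print(passing)

# Because the view re-runs its query, new rows in the base table show up automatically.
conn.execute("INSERT INTO students (name, grade) VALUES ('Dee', 77)")
passing_count = conn.execute("SELECT COUNT(*) FROM passing_students").fetchone()[0]
print(passing_count)
```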

Intermediate SQL

This section contains five mini-projects and one guided project that apply various intermediate SQL concepts, including:

GROUP BY, ORDER BY, aggregate functions, joins, and more.

Notable Projects

Analyzing Student's Mental Health: A guided project that groups student survey data with GROUP BY and summarizes it with aggregate functions such as AVG and COUNT.
Analyze International Debt's Statistics: Summarizes and analyzes international debt statistics using GROUP BY, SUM, and other core SQL aggregate functions.
Exploring London’s Travel Network: A guided project that aggregates and ranks travel data using SUM, GROUP BY, and LIMIT.
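The pattern shared by these projects (filter, group, aggregate, sort, limit) can be sketched as a single query. This example assumes SQLite via Python's `sqlite3` module, and the `survey` table and its rows are made-up illustrations, not the datasets used in the projects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (student_type TEXT, stay_years INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?)",
    [("Intl", 1, 10.0), ("Intl", 1, 14.0), ("Intl", 2, 8.0), ("Dom", 1, 6.0)],
)

query = """
SELECT student_type,
       stay_years,
       COUNT(*)             AS n_students,  -- rows per group
       ROUND(AVG(score), 2) AS avg_score    -- mean score per group
FROM survey
WHERE student_type = 'Intl'                 -- filter rows before grouping
GROUP BY student_type, stay_years
ORDER BY stay_years
LIMIT 10;                                   -- cap the number of result rows
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Note that WHERE filters individual rows before grouping; filtering on an aggregate such as `COUNT(*)` would need HAVING instead.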

Joining Data in SQL

Projects in this section demonstrate practical applications of SQL joins, including:

Inner Joins, Left Joins, Right Joins, Full Joins, and Cross Joins.
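A small sketch of how the join types differ, assuming SQLite through Python's `sqlite3` module and two hypothetical tables, `countries` and `cities`. RIGHT and FULL joins are left out because SQLite only supports them from version 3.39 onward; a LEFT join with the table order swapped gives the same result as a RIGHT join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE cities (name TEXT, country_code TEXT);
INSERT INTO countries VALUES ('PH', 'Philippines'), ('JP', 'Japan'), ('NZ', 'New Zealand');
INSERT INTO cities VALUES ('Manila', 'PH'), ('Cebu', 'PH'), ('Osaka', 'JP');
""")

# INNER JOIN: only countries that have at least one matching city.
inner = conn.execute("""
SELECT c.name, ci.name FROM countries AS c
INNER JOIN cities AS ci ON ci.country_code = c.code
ORDER BY c.name, ci.name
""").fetchall()

# LEFT JOIN: every country; the city column is NULL (None) when there is no match.
left = conn.execute("""
SELECT c.name, ci.name FROM countries AS c
LEFT JOIN cities AS ci ON ci.country_code = c.code
ORDER BY c.name, ci.name
""").fetchall()

# CROSS JOIN: every country paired with every city (3 x 3 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM countries CROSS JOIN cities"
).fetchone()[0]

print(inner, left, cross_count, sep="\n")
```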

Additional projects cover set theory operations (UNION, INTERSECT, EXCEPT) and subqueries.
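The set operations and a subquery can be compared side by side in a short sketch. It assumes SQLite through Python's `sqlite3` module; the tables `langs_2023` and `langs_2024` and the helper `q` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE langs_2023 (lang TEXT);
CREATE TABLE langs_2024 (lang TEXT);
INSERT INTO langs_2023 VALUES ('SQL'), ('Python'), ('R');
INSERT INTO langs_2024 VALUES ('SQL'), ('Python'), ('Go');
""")


def q(sql: str) -> list[str]:
    """Run a single-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]


# UNION: values in either table, duplicates removed.
union = q("SELECT lang FROM langs_2023 UNION SELECT lang FROM langs_2024 ORDER BY lang")
# INTERSECT: values present in both tables.
intersect = q("SELECT lang FROM langs_2023 INTERSECT SELECT lang FROM langs_2024 ORDER BY lang")
# EXCEPT: values in the first table but not the second.
except_ = q("SELECT lang FROM langs_2023 EXCEPT SELECT lang FROM langs_2024")
# A subquery expressing "new in 2024" as an anti-join filter.
new_in_2024 = q("SELECT lang FROM langs_2024 WHERE lang NOT IN (SELECT lang FROM langs_2023)")

print(union, intersect, except_, new_in_2024, sep="\n")
```

Be careful with NOT IN when the subquery can return NULL: a single NULL makes the whole predicate return no rows, so NOT EXISTS is often the safer choice.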

Relational Databases in SQL

These projects focus on relational database concepts, including:

Data Migration: A project that demonstrates migrating data using INSERT INTO and CREATE TABLE.
Attribute Constraints: Managing data integrity through constraints like NOT NULL, UNIQUE, and foreign keys.
Many-to-Many Relationships: Demonstrating relational schema designs using surrogate keys and junction tables.
Referential Integrity: Managing referential integrity with ON UPDATE and ON DELETE behaviors.
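The four concepts above fit together in one small schema. The sketch below assumes SQLite through Python's `sqlite3` module (where foreign keys are enforced only after `PRAGMA foreign_keys = ON`). The `students`, `courses`, and `enrollments` tables are hypothetical, and the choice of CASCADE for students and RESTRICT for courses is just one possible policy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE students (
    id    INTEGER PRIMARY KEY,          -- surrogate key
    email TEXT NOT NULL UNIQUE          -- attribute constraints
);
CREATE TABLE courses (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
-- Junction table resolving the many-to-many student/course relationship.
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(id) ON DELETE CASCADE,
    course_id  INTEGER REFERENCES courses(id)  ON DELETE RESTRICT,
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO students VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO courses VALUES (10, 'SQL'), (20, 'Python');
INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

errors = []

# UNIQUE rejects a second student with the same email.
try:
    conn.execute("INSERT INTO students VALUES (3, 'a@example.com')")
except sqlite3.IntegrityError as exc:
    errors.append(str(exc))

# ON DELETE CASCADE removes student 1's enrollments together with the student.
conn.execute("DELETE FROM students WHERE id = 1")
remaining = conn.execute("SELECT * FROM enrollments").fetchall()

# ON DELETE RESTRICT blocks deleting a course that still has enrollments.
try:
    conn.execute("DELETE FROM courses WHERE id = 10")
except sqlite3.IntegrityError as exc:
    errors.append(str(exc))

print(remaining, errors, sep="\n")
```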

Database Design

This section covers advanced database design principles, including normalization, schema design, and best practices for creating scalable data systems.
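As one example of normalization, the sketch below splits a denormalized table (where a publisher's country is repeated on every row) into two related tables, then rebuilds the original rows with a join. It assumes SQLite via Python's `sqlite3` module; the game and publisher data are invented for illustration.

```python
import sqlite3

# Hypothetical denormalized rows: "Studio X"/"Japan" is stored twice.
flat = [
    ("Game A", "Studio X", "Japan"),
    ("Game B", "Studio X", "Japan"),
    ("Game C", "Studio Y", "Canada"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publishers (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE,
    country TEXT NOT NULL
);
CREATE TABLE games (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    publisher_id INTEGER NOT NULL REFERENCES publishers(id)
);
""")

for title, publisher, country in flat:
    # Store each publisher once; INSERT OR IGNORE skips names that already exist.
    conn.execute(
        "INSERT OR IGNORE INTO publishers (name, country) VALUES (?, ?)",
        (publisher, country),
    )
    pub_id = conn.execute(
        "SELECT id FROM publishers WHERE name = ?", (publisher,)
    ).fetchone()[0]
    conn.execute("INSERT INTO games (title, publisher_id) VALUES (?, ?)", (title, pub_id))

# A join reconstructs the original rows without storing the duplicates.
rebuilt = conn.execute("""
SELECT g.title, p.name, p.country
FROM games AS g JOIN publishers AS p ON p.id = g.publisher_id
ORDER BY g.title
""").fetchall()
publisher_count = conn.execute("SELECT COUNT(*) FROM publishers").fetchone()[0]
print(rebuilt, publisher_count, sep="\n")
```

With the country stored once per publisher, changing it is a single-row update instead of an edit to every game, which removes the update anomaly of the flat table.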

Coding Challenges using Python and SQL

This directory includes sub-folders of coding challenges solved in Python and SQL on platforms such as HackerRank and LeetCode. These challenges broaden my knowledge across areas such as algorithmic problem solving, data analysis, and database management.

Python for Data Engineering

This directory contains a collection of data pipeline scripts written in Python as part of the Python for Data Engineering course on Coursera, which I completed through a financial aid opportunity. I plan to keep adding scripts here to build and refine my practical data engineering skills.

Python CLI Projects

This directory contains Python command-line interface (CLI) scripts that I built in my free time to explore my own ideas. They also apply what I have learned while studying Python for Data Engineering.
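For readers new to CLI scripts, here is a minimal sketch of how a small command-line tool can be structured with Python's standard `argparse` module. The `add` and `list` subcommands, the `--grade` option, and the `run` function are hypothetical, and the in-memory dictionary means nothing persists between runs; it is not the code in this directory.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Define the hypothetical subcommands and their arguments."""
    parser = argparse.ArgumentParser(prog="students", description="Toy student manager")
    sub = parser.add_subparsers(dest="command", required=True)

    add = sub.add_parser("add", help="add a student")
    add.add_argument("name")
    add.add_argument("--grade", type=int, default=0)

    sub.add_parser("list", help="list students")
    return parser


def run(argv: list[str], store: dict[str, int]) -> str:
    """Parse argv, update the in-memory store, and return the text to print."""
    args = build_parser().parse_args(argv)
    if args.command == "add":
        store[args.name] = args.grade
        return f"Added {args.name} ({args.grade})"
    return "\n".join(f"{name}: {grade}" for name, grade in sorted(store.items()))
```

Passing `argv` in explicitly (instead of reading `sys.argv` inside `run`) keeps the logic easy to test; a script entry point would simply call `print(run(sys.argv[1:], store))`.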

Contact

Feel free to reach out to me for any questions or opportunities:

Email: christianbacani581@gmail.com
LinkedIn: Click Here
Portfolio: Click Here

Conclusion

This repository serves as a reflection of my learning journey in data engineering. As I continue to learn and grow, I will update this repository with new projects and insights. Thank you for visiting!
