
This repository serves as a temporary portfolio showcasing SQL projects and Python scripts related to data engineering, highlighting key accomplishments and implementations.


christianebacani/Roadmap


Roadmap

Welcome to my Roadmap repository! This repository showcases a comprehensive collection of projects that document my learning journey in data engineering. Each folder represents a specific area of study, featuring a variety of project types, including mini-projects, guided projects, hobby projects, and industry projects. This roadmap serves as both a learning tracker and a portfolio to highlight my growing skills and expertise.

Table of Contents

  1. Overview

  2. Roadmap Files

  3. Contact

  4. Conclusion

Overview

This repository is structured to reflect my learning path in data engineering. Each project demonstrates practical applications of the concepts I have learned, organized into dedicated files for easy navigation. By showcasing these projects, I aim to provide a clear and structured overview of my technical skills and development.

Roadmap Files

Understanding Data Engineering

The Understanding Data Engineering.md file contains key theoretical concepts and definitions from the DataCamp "Understanding Data Engineering" course. It serves as a reference guide for important topics and terminologies in the field of data engineering.

Key Concepts Covered

  • Airflow: Open-source workflow management for scheduling data engineering tasks.
  • AWS (Amazon Web Services): Amazon's cloud computing services.
  • Azure: Microsoft's cloud services.
  • Big Data: Management of large and complex datasets characterized by volume, variety, velocity, veracity, and value.
  • Cloud Computing: Utilizing remote servers hosted on the internet for data management and processing.
  • Database Schema: The logical structure of a database, including its data organization and relationships.
  • Data Engineering: The process of designing, constructing, and managing data systems to facilitate analysis.
  • Data Ingestion: The process of importing data into a system or database.
  • Data Lake: A storage repository that holds large amounts of raw data.
  • Data Pipelines: A set of processes for moving and transforming data.
  • Data Warehousing: Centralized storage of data from multiple sources for analysis.
  • ETL (Extract, Transform, Load): A process that extracts data from one source, transforms it, and loads it into a target system.
  • Google Cloud: Cloud services provided by Google.
  • NoSQL: Non-relational databases for storing structured, semi-structured, and unstructured data.
  • Parallel Processing: The simultaneous use of multiple compute resources to process data.
  • Redshift: Amazon's cloud data warehouse service.
  • S3: Amazon’s cloud object storage service.
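As an illustration of the ETL and data pipeline entries above, here is a minimal sketch in Python. It is not taken from the course or from this repository: the sample data, the `payments` table, and the function names `extract`, `transform`, and `load` are all hypothetical, and an in-memory SQLite database stands in for a real target system.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read from a file or an API.
RAW_CSV = """name,amount
alice,10.5
bob,not_a_number
carol,4
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: title-case names and drop rows whose amount is not numeric."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip malformed records
        clean.append((row["name"].title(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # assumption: in-memory target for the demo
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()
print(loaded)
```

Keeping the three stages as separate functions makes each one easy to test and swap out independently, for example replacing `load` with a writer for a data warehouse.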

Introduction to SQL

The files in this section (Stored Procedure.sql, Student Tables and Views.sql) include projects from my Introduction to SQL coursework, focusing on concepts like:

  • Stored Procedures: Demonstrated in the Stored Procedure.sql file.
  • Creating Views: Showcased in the Student Tables and Views.sql file.
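For readers unfamiliar with views, the sketch below shows the general idea using Python's built-in `sqlite3` module. The `students` table, its columns, and the `honor_students` view are hypothetical and are not the contents of Student Tables and Views.sql. SQLite does not support stored procedures, so only the view is illustrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, grade INTEGER);
INSERT INTO students (name, grade) VALUES ('Ana', 91), ('Ben', 78), ('Cara', 85);

-- A view is a saved query that can be selected from like a table.
CREATE VIEW honor_students AS
    SELECT name, grade FROM students WHERE grade >= 85;
""")

honors = conn.execute("SELECT name FROM honor_students ORDER BY name").fetchall()
print(honors)
```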

Intermediate SQL

This section contains five mini-projects and one guided project that apply various intermediate SQL concepts, including:

  • Group By, Order By, Aggregation Functions, Joins, and more.

Notable Projects

  • Analyzing Student's Mental Health: This guided project uses various SQL functions (GROUP BY, AVG, COUNT) to analyze student data.
  • Analyze International Debt's Statistics: Summarizes and analyzes debt statistics with GROUP BY, SUM, and other essential SQL functions.
  • Exploring London’s Travel Network: A guided project that demonstrates the use of aggregation and filtering functions (SUM, GROUP BY, LIMIT).
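The kind of query these projects rely on can be sketched as follows. The `survey` table and its rows are made up for illustration and do not come from the project datasets; the query combines filtering, GROUP BY, COUNT, AVG, and LIMIT, run through Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE survey (student_type TEXT, stay_years INTEGER, score REAL);
INSERT INTO survey VALUES
    ('international', 1, 8.0),
    ('international', 1, 6.0),
    ('international', 2, 9.0),
    ('domestic', 1, 5.0);
""")

# Count respondents and average their scores per length of stay,
# restricted to one student type.
summary = conn.execute("""
    SELECT stay_years, COUNT(*) AS n, AVG(score) AS avg_score
    FROM survey
    WHERE student_type = 'international'
    GROUP BY stay_years
    ORDER BY stay_years
    LIMIT 5
""").fetchall()
print(summary)
```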

Joining Data in SQL

Projects in this section demonstrate practical applications of SQL joins, including:

  • Inner Joins, Left Joins, Right Joins, Full Joins, and Cross Joins.

Additional projects cover Set Theory operations (UNION, INTERSECT, EXCEPT) and Subqueries.
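A small sketch of how an inner join, a left join, and a set operation differ, using hypothetical `countries` and `capitals` tables rather than the project data. Right and full joins are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE capitals (code TEXT, city TEXT);
INSERT INTO countries VALUES ('PH', 'Philippines'), ('JP', 'Japan'), ('XX', 'Nowhere');
INSERT INTO capitals VALUES ('PH', 'Manila'), ('JP', 'Tokyo');
""")

# INNER JOIN keeps only rows with a match in both tables.
inner = conn.execute("""
    SELECT c.name, k.city
    FROM countries AS c
    INNER JOIN capitals AS k ON c.code = k.code
    ORDER BY c.name
""").fetchall()

# LEFT JOIN keeps every row from the left table; unmatched columns are NULL.
left = conn.execute("""
    SELECT c.name, k.city
    FROM countries AS c
    LEFT JOIN capitals AS k ON c.code = k.code
    ORDER BY c.name
""").fetchall()

# EXCEPT returns codes present in countries but absent from capitals.
missing = conn.execute(
    "SELECT code FROM countries EXCEPT SELECT code FROM capitals"
).fetchall()

print(inner, left, missing, sep="\n")
```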

Relational Databases in SQL

These projects focus on relational database concepts, including:

  • Data Migration: A project that demonstrates migrating data using INSERT INTO and CREATE TABLE.
  • Attribute Constraints: Managing data integrity through constraints like NOT NULL, UNIQUE, and foreign keys.
  • Many-to-Many Relationships: Demonstrating relational schema designs using surrogate keys and junction tables.
  • Referential Integrity: Managing referential integrity with ON UPDATE and ON DELETE behaviors.
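The constraint and relationship concepts above can be sketched together in one small schema. The `professors`, `organizations`, and `affiliations` tables are hypothetical and are not the schemas used in the projects. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, which is specific to SQLite and not needed in most other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement

conn.executescript("""
-- Surrogate integer keys plus NOT NULL and UNIQUE attribute constraints.
CREATE TABLE professors (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- Junction table modelling the many-to-many relationship.
CREATE TABLE affiliations (
    professor_id INTEGER NOT NULL REFERENCES professors (id) ON DELETE CASCADE,
    organization_id INTEGER NOT NULL REFERENCES organizations (id) ON DELETE CASCADE,
    PRIMARY KEY (professor_id, organization_id)
);

INSERT INTO professors VALUES (1, 'Lee'), (2, 'Kim');
INSERT INTO organizations VALUES (10, 'ACM');
INSERT INTO affiliations VALUES (1, 10), (2, 10);
""")

# Deleting a professor cascades to that professor's affiliation rows.
conn.execute("DELETE FROM professors WHERE id = 1")
remaining = conn.execute(
    "SELECT professor_id, organization_id FROM affiliations"
).fetchall()

# Referencing a professor that does not exist violates referential integrity.
try:
    conn.execute("INSERT INTO affiliations VALUES (99, 10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(remaining, rejected)
```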

Database Design

This section covers advanced database design principles, including normalization, schema design, and best practices for creating scalable data systems.

Coding Challenges using Python and SQL

This directory includes sub-folders of coding challenges solved with Python and SQL on platforms such as HackerRank and LeetCode. These challenges broaden my knowledge in areas such as algorithmic problem solving, data analysis, and database management.

Python for Data Engineering

This directory contains a collection of Data Pipeline scripts developed in Python, as part of my learning journey in the Python for Data Engineering course on Coursera, which I completed through a financial aid opportunity. I plan to continue adding scripts here to build and refine my practical data engineering skills.

Python CLI Projects

This directory contains Python CLI scripts that I built in my free time from ideas I came up with. They also apply what I have learned throughout my journey learning Python for data engineering.

Contact

Feel free to reach out to me for any questions or opportunities:

Conclusion

This repository serves as a reflection of my learning journey in data engineering. As I continue to learn and grow, I will update this repository with new projects and insights. Thank you for visiting!
