This project extracts drug adverse event data from the FDA's open data platform (openFDA), transforms it with Python and Pandas, and loads it into Snowflake for further analysis and visualization. The entire ETL process is orchestrated with Apache Airflow and Docker, and the data is visualized in Tableau Desktop.
- Project Overview
- Architecture Diagram
- Tools and Services Used
- API Details
- Data Extraction
- Data Storage
- Data Modeling
- ETL Workflow
- Setup Instructions
- Running the Project
- Data Visualization with Tableau Desktop
- Contribution
This project involves:
- Constructing an API query to retrieve data from the FDA's open data platform.
- Building a Python script to fetch and process the data.
- Storing the data in AWS S3.
- Modeling and storing the transformed data in Snowflake.
- Orchestrating the entire ETL process using Apache Airflow.
- Visualizing the data using Tableau Desktop.
- Python: For scripting and data processing.
- Pandas: For data manipulation and analysis.
- AWS S3: As a data lake for storing raw and transformed data.
- Snowflake: For data modeling and storage.
- Apache Airflow: For orchestrating the ETL workflow.
- Docker: For containerizing the Airflow environment.
- Visual Studio Code: As the development environment.
- Tableau Desktop: For data visualization.
- ODBC Snowflake Driver: For connecting Tableau to Snowflake.
- Base URL: https://api.fda.gov/drug/event.json
- Search Query:
`receivedate:[20200101 TO 20201231] AND occurcountry:"US" AND _missing_:companynumb`
This query retrieves adverse event reports received in 2020 for cases occurring in the United States, limited to reports submitted directly by consumers (those with no company number).
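For reference, below is a minimal sketch of issuing this query with the `requests` library; only the base URL and search string come from the project itself, and the page size shown is an illustrative value.

```python
import requests

BASE_URL = "https://api.fda.gov/drug/event.json"

# openFDA query: US reports received in 2020 with no company number (consumer reports)
params = {
    "search": 'receivedate:[20200101 TO 20201231] AND occurcountry:"US" AND _missing_:companynumb',
    "limit": 100,  # illustrative page size; the API caps this at 1,000
}

response = requests.get(BASE_URL, params=params, timeout=30)
response.raise_for_status()
payload = response.json()

print(payload["meta"]["results"]["total"])  # total number of matching reports
records = payload["results"]                # list of adverse event reports
```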
- Build a Python script to fetch 80,000 rows from the API. Implement paging, since the API can return at most 1,000 rows per request (see the sketch below).
- Use Pandas to extract only the required columns.
- Handle missing values and update fields to match the data shown on the FDA page.
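A minimal sketch of the paging and cleaning logic, assuming the query above; the selected columns and fill values are illustrative rather than the project's exact selection, and a very large extract may need to be split (e.g., by date range), since openFDA limits how far the `skip` offset can go.

```python
import pandas as pd
import requests

BASE_URL = "https://api.fda.gov/drug/event.json"
SEARCH = 'receivedate:[20200101 TO 20201231] AND occurcountry:"US" AND _missing_:companynumb'
PAGE_SIZE = 1000          # API maximum per request
TARGET_ROWS = 80_000      # total rows the project aims to collect


def fetch_pages():
    """Yield batches of reports, paging with the skip/limit parameters."""
    skip = 0
    while skip < TARGET_ROWS:
        resp = requests.get(
            BASE_URL,
            params={"search": SEARCH, "limit": PAGE_SIZE, "skip": skip},
            timeout=30,
        )
        resp.raise_for_status()
        results = resp.json().get("results", [])
        if not results:
            break
        yield results
        skip += PAGE_SIZE


# Flatten the nested JSON, keep only the columns of interest, and handle nulls
# (the column names and fill values here are illustrative).
frames = [pd.json_normalize(batch) for batch in fetch_pages()]
df = pd.concat(frames, ignore_index=True)
df = df[["safetyreportid", "receivedate", "occurcountry", "serious"]]
df = df.dropna(subset=["safetyreportid"]).fillna({"serious": 0})
```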
- Use AWS S3 as a data lake.
- Create an S3 bucket with different folders for raw and cleaned data.
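A minimal sketch of writing the raw and cleaned files to the two folders with `boto3`; the bucket name, file names, and key prefixes are placeholders.

```python
import boto3

s3 = boto3.client("s3")  # credentials come from the environment or an AWS profile

BUCKET = "fda-drug-events-data-lake"  # placeholder bucket name

# Raw API output goes to one prefix, the Pandas-cleaned output to another
s3.upload_file("drug_events_raw.json", BUCKET, "raw/drug_events_raw.json")
s3.upload_file("drug_events_clean.csv", BUCKET, "cleaned/drug_events_clean.csv")
```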
- Create SQL worksheets in Snowflake to design the database schema.
- Create new schemas, databases, and tables to handle the transformed data from S3.
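An illustrative sketch of the kind of DDL that might be run from a Snowflake worksheet (here issued through the Snowflake Python connector); the connection details, database, schema, table, and column names are placeholders rather than the project's actual model.

```python
import snowflake.connector

# Placeholder connection details; replace with your own account settings
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    role="SYSADMIN",
)

ddl_statements = [
    "CREATE DATABASE IF NOT EXISTS FDA_EVENTS",
    "CREATE SCHEMA IF NOT EXISTS FDA_EVENTS.ADVERSE_EVENTS",
    """
    CREATE TABLE IF NOT EXISTS FDA_EVENTS.ADVERSE_EVENTS.DRUG_EVENTS (
        SAFETYREPORTID  STRING,
        RECEIVEDATE     DATE,
        OCCURCOUNTRY    STRING,
        SERIOUS         NUMBER
    )
    """,
]

cur = conn.cursor()
for stmt in ddl_statements:
    cur.execute(stmt)
cur.close()
conn.close()
```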
Create an Airflow DAG script to orchestrate the ETL process with the following tasks (a condensed sketch appears after the list):
- Check API Availability: Ensure the API is available.
- Extract Data: Fetch FDA drug adverse event data from the API, load it into S3, and push the S3 path to XCom for downstream reference.
- Transform Data: Pull the S3 path from XCom, read the raw data from S3, transform and normalize it, and load the transformed data into a different S3 folder.
- Create Snowflake Stage: Set up a Snowflake stage for the transformed data.
- Load Data into Snowflake: Copy the transformed data into Snowflake tables using the stage from Task 4.
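A condensed sketch of what such a DAG might look like; the task IDs, connection IDs, S3 paths, stage definition, and SQL are assumptions for illustration, not the project's exact code.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.http.sensors.http import HttpSensor
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator


def extract_data(**context):
    """Fetch the FDA data, write the raw file to S3, and share the path via XCom."""
    raw_s3_path = "s3://<bucket>/raw/drug_events_raw.json"  # placeholder path
    # ... call the openFDA API and upload the raw JSON to S3 here ...
    context["ti"].xcom_push(key="raw_s3_path", value=raw_s3_path)


def transform_data(**context):
    """Pull the raw path from XCom, clean it with Pandas, and write it back to S3."""
    raw_s3_path = context["ti"].xcom_pull(task_ids="extract_data", key="raw_s3_path")
    # ... download, transform/normalize, and upload to the cleaned/ prefix here ...


with DAG(
    dag_id="fda_drug_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    check_api = HttpSensor(
        task_id="check_api_availability",
        http_conn_id="fda_api",  # HTTP connection created in the Airflow UI
        endpoint="drug/event.json",
    )
    extract = PythonOperator(task_id="extract_data", python_callable=extract_data)
    transform = PythonOperator(task_id="transform_data", python_callable=transform_data)
    create_stage = SnowflakeOperator(
        task_id="create_snowflake_stage",
        snowflake_conn_id="snowflake_default",
        sql="CREATE STAGE IF NOT EXISTS FDA_EVENTS.ADVERSE_EVENTS.S3_STAGE "
            "URL='s3://<bucket>/cleaned/'",
    )
    load_data = SnowflakeOperator(
        task_id="load_into_snowflake",
        snowflake_conn_id="snowflake_default",
        sql="COPY INTO FDA_EVENTS.ADVERSE_EVENTS.DRUG_EVENTS "
            "FROM @FDA_EVENTS.ADVERSE_EVENTS.S3_STAGE "
            "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)",
    )

    check_api >> extract >> transform >> create_stage >> load_data
```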
- Install Astro CLI: A tool provided by Astronomer to help manage and interact with Airflow environments, allowing you to run Airflow locally in Docker containers.
- Install Docker Desktop: Update the `.wslconfig` file to use the maximum CPU and RAM possible (a minimum of 6 processors and 6 GB of RAM).
- Update Airflow Configurations: Modify the `.env` and `requirements.txt` files as shown in the how_to_run.docx file.
- Start Docker Desktop: Initialize the Astro environment using Astro commands in Visual Studio Code, as detailed in the how_to_run.docx file.
- Run Airflow in Docker: Use Astro commands as detailed in the how_to_run.docx file.
- Access Airflow: Navigate to http://localhost:8080.
- Navigate to Airflow Connections: Create HTTP, AWS, and Snowflake connections.
- Trigger the Airflow DAG.
Once the data is loaded into Snowflake:
- Install the ODBC Snowflake Driver: Required for Tableau Desktop to connect to Snowflake.
- Create an Extract Connection: Connect Tableau Desktop to Snowflake.
- Data Modeling in Tableau: Create a star schema and define relationships between the cleaned data tables.
- Create Visualizations: Use Tableau Desktop to process and visualize the data.
Check out the Tableau data visualization at 2020_Drug_adverse_events | Tableau Public.
Check out the how_to_run.docx file for detailed steps to run this project.
Check out S3 URI.txt to get the URIs for the raw and transformed data.
Contributions are welcome! Please open an issue or submit a pull request with any improvements.