This project imports data from several sources into Python and uses the pandas library to extract insights about data quality. Three different data sources are used.

Taimoormukhtar/Camden_tree_analysis

Camden_tree_analysis

The analysis consists of the following steps:

  • Importing Libraries
  • Load Data and Perform Initial Exploration
  • Further Inspect the Datasets
  • Identify Missing Values
  • Identify Outliers in the Tree Dimensions
  • Identify Duplicates in the Tree Dataset
  • Identify Geolocation Issues
  • Identify Unmatched Data

Each step involves analyzing the three datasets.
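
Several of the checks listed above can be sketched with pandas as follows. This is only an illustration on a tiny synthetic table, not the project's own code: the column names, the sample values, the IQR rule (with the conventional multiplier of 1.5) and the rough latitude/longitude bounds are all assumptions, since the README does not state which methods or thresholds the analysis uses.

```python
import pandas as pd

# Tiny synthetic sample. Column names and values are assumptions for
# illustration; they are not taken from the real Camden files.
trees = pd.DataFrame({
    "Identifier": ["T1", "T2", "T2", "T3", "T4", "T5"],
    "Height": [5.0, 6.0, 6.0, 7.0, 120.0, None],
    "Latitude": [51.54, 51.55, 51.55, 0.0, 51.53, 51.56],
    "Longitude": [-0.14, -0.15, -0.15, 0.0, -0.13, -0.16],
})

# Missing values: count of nulls per column.
missing_counts = trees.isna().sum()


def iqr_outliers(series, k=1.5):
    """Return a boolean mask marking values outside the IQR fences."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Outliers in one tree dimension (missing values are never flagged).
height_outliers = trees[iqr_outliers(trees["Height"])]

# Duplicates: every row whose Identifier appears more than once.
duplicate_rows = trees[trees.duplicated(subset="Identifier", keep=False)]

# Geolocation issues: coordinates outside a deliberately generous box
# around Camden. The bounds are a rough assumption, not official limits.
in_box = trees["Latitude"].between(51.4, 51.7) & trees["Longitude"].between(-0.3, 0.0)
bad_locations = trees[~in_box]

print(missing_counts)
print(height_outliers["Identifier"].tolist())
print(duplicate_rows["Identifier"].tolist())
print(bad_locations["Identifier"].tolist())
```

Keeping each check as a separate mask or frame makes it easy to report the affected rows per issue before deciding how to handle them.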

Dataset Information

  • Tree Dataset: This dataset is provided in Excel format. It contains an Identifier (unique for each tree), the tree's location in Camden (Latitude and Longitude), tree characteristics (Spread, Diameter and Height) and other information such as Site Name, Inspection Date and Inspection Due Date.
  • Tree Environmental Dataset: This dataset is provided in CSV format and contains Identifier, Maturity, Physiological Condition, Tree Set To Be Removed, Removal Reason, Capital Asset Value For Amenity Trees, Carbon Storage In Kilograms, Gross Carbon Sequestration Per Year In Kilograms and Pollution Removal Per Year In Grams.
  • Common Names Dataset: This dataset is provided in JSON format and contains the scientific and common names of trees. The data is taken from a horticulture website.
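
Loading three formats like these might look like the sketch below. The function names, file names and sample records are hypothetical, and the JSON file is assumed to hold a list of flat records. The Excel loader is defined but not called here, because `pd.read_excel` needs an Excel engine such as openpyxl and a real workbook.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def load_trees(path):
    """Load the Excel tree dataset (requires an engine such as openpyxl)."""
    return pd.read_excel(path)


def load_environmental(path):
    """Load the CSV environmental dataset."""
    return pd.read_csv(path)


def load_common_names(path):
    """Load the JSON names dataset, assuming a list of flat records."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    return pd.json_normalize(records)


# Demonstration with small synthetic files standing in for the real data.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "environmental.csv"  # hypothetical file name
    csv_path.write_text(
        "Identifier,Maturity\nT1,Mature\nT2,Juvenile\n", encoding="utf-8"
    )

    json_path = Path(tmp) / "common_names.json"  # hypothetical file name
    sample_names = [
        {"Scientific Name": "Platanus x hispanica", "Common Name": "London plane"}
    ]
    json_path.write_text(json.dumps(sample_names), encoding="utf-8")

    environmental = load_environmental(csv_path)
    common_names = load_common_names(json_path)

# Initial exploration: shape and column types of each loaded table.
print(environmental.shape, common_names.shape)
print(environmental.dtypes)
```

If the real JSON file is nested rather than a flat list, `pd.json_normalize` would need a `record_path` argument or some other adjustment.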

Overall, analyzing these datasets reveals data quality issues and points to possible solutions for them.
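
One such issue is unmatched data between datasets that share the Identifier column. The sketch below shows one way it could be detected with an outer merge and pandas' `indicator` option. The two frames are synthetic, and treating Identifier as the join key is an assumption based on the dataset descriptions above.

```python
import pandas as pd

# Synthetic stand-ins for the tree and environmental datasets.
trees = pd.DataFrame({"Identifier": ["T1", "T2", "T3"]})
environmental = pd.DataFrame({
    "Identifier": ["T2", "T3", "T4"],
    "Maturity": ["Mature", "Juvenile", "Mature"],
})

# An outer merge keeps rows from both sides; the _merge column records
# whether each Identifier was found in the left, right, or both frames.
merged = trees.merge(environmental, on="Identifier", how="outer", indicator=True)

only_in_trees = merged.loc[merged["_merge"] == "left_only", "Identifier"].tolist()
only_in_environmental = merged.loc[
    merged["_merge"] == "right_only", "Identifier"
].tolist()

print("Trees without environmental data:", only_in_trees)
print("Environmental rows without a tree:", only_in_environmental)
```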
