There are several steps to this analysis:
- Importing Libraries
- Load Data and Perform Initial Exploration
- Further Inspect the Datasets
- Identify Missing Values
- Identify Outliers in the Trees Dimensions
- Identify Duplicates in the Trees Dataset
- Identify Geolocation Issues
- Identify Unmatched Data

Each step involves analyzing the three datasets:
- Tree Dataset: This dataset is in Excel format. It contains an Identifier (unique for each tree), the tree's location in Camden (Latitude and Longitude), tree characteristics (Spread, Diameter and Height) and other information such as Site Name, Inspection Date and Inspection Due Date.
- Tree Environmental Dataset: This dataset is in CSV format and contains Identifier, Maturity, Physiological Condition, Tree Set To Be Removed, Removal Reason, Capital Asset Value For Amenity Trees, Carbon Storage In Kilograms, Gross Carbon Sequestration Per Year In Kilograms and Pollution Removal Per Year In Grams.
- Common Names Dataset: This dataset is in JSON format and contains the scientific and common names of trees. The data is taken from a horticulture website.
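
Since the three datasets come in different file formats, loading them could look roughly like the sketch below. It assumes pandas is the library in use; `load_table`, `summarise` and the `READERS` mapping are hypothetical helpers introduced only for illustration, and reading `.xlsx` files additionally assumes an Excel engine such as openpyxl is installed.

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping from file extension to a pandas reader function.
READERS = {
    ".xlsx": pd.read_excel,  # assumes an Excel engine such as openpyxl is installed
    ".csv": pd.read_csv,
    ".json": pd.read_json,
}


def load_table(path):
    """Load one dataset into a DataFrame, choosing the reader by file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix}")
    return READERS[suffix](path)


def summarise(df):
    """Return a few basic facts that are useful during initial exploration."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing": int(df.isna().sum().sum()),  # total count of missing cells
    }
```

With placeholder file names, usage might be `trees = load_table("trees.xlsx")` followed by `summarise(trees)`, repeated for the CSV and JSON files.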
Overall, analyzing these datasets reveals data quality issues and points to solutions for them.
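
The checks listed in the steps above can be sketched on a tiny made-up sample, again assuming pandas. The column names follow the dataset descriptions, but every value is invented for illustration. The 1.5 × IQR outlier rule and the rough Camden bounding box are assumptions of this sketch, not methods or boundaries stated elsewhere in this notebook.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the dataset descriptions.
trees = pd.DataFrame({
    "Identifier": ["T1", "T2", "T2", "T3", "T4"],
    "Height": [10.0, 12.0, 12.0, None, 250.0],
    "Latitude": [51.54, 51.55, 51.55, 51.53, 0.0],
    "Longitude": [-0.14, -0.15, -0.15, -0.16, 0.0],
})
environmental = pd.DataFrame({"Identifier": ["T1", "T2", "T5"]})

# Missing values: count of empty cells per column.
missing_per_column = trees.isna().sum()

# Duplicates: every row whose Identifier appears more than once.
duplicate_rows = trees[trees.duplicated(subset="Identifier", keep=False)]

# Outliers: assumed 1.5 * IQR rule applied to Height (missing heights are ignored).
q1, q3 = trees["Height"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = trees[
    (trees["Height"] < q1 - 1.5 * iqr) | (trees["Height"] > q3 + 1.5 * iqr)
]

# Geolocation issues: coordinates outside a rough, assumed box around Camden.
in_box = trees["Latitude"].between(51.50, 51.60) & trees["Longitude"].between(-0.25, -0.10)
geo_issues = trees[~in_box]

# Unmatched data: Identifiers present in only one of the two datasets.
merged = trees.drop_duplicates("Identifier").merge(
    environmental, on="Identifier", how="outer", indicator=True
)
unmatched = merged[merged["_merge"] != "both"]
```

Each result is itself a DataFrame (or Series), so the flagged rows can be inspected directly before deciding how to handle them.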