The dataset contains annual data on the flows of international immigrants as recorded by the countries of destination. The data presents both inflows and outflows according to the place of birth, citizenship, or place of previous/next residence for both foreigners and nationals. The current version presents data pertaining to 45 countries.
- Exploring Dataset
- Indexing and Filtering
- Visualizing Dataset
- Line Plot of Dataset
- License
- Contact
File: Canada.xlsx
Description: Contains immigration data for Canada, including country of origin, continent, and region.
- Type: Type of immigration (e.g., Immigrants)
- Coverage: Coverage type (e.g., Foreigners)
- Country: Origin country of immigrants
- Continent: Continent of the origin country
- Region: Region of the origin country
- Year Columns (1980-2013): Number of immigrants per year
- Total: Total number of immigrants from 1980 to 2013
- Python 3.x
- Required libraries:
  - numpy
  - pandas
  - openpyxl
- Clone the repository:
  git clone <repository_url>
  cd <repository_folder>
- Install dependencies:
  pip install -r requirements.txt
- Ensure the Canada.xlsx dataset is placed in the correct directory.
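Once the workbook is in place, loading it could look like the sketch below. The helper name `load_canada`, the default path, and the `sheet_name`, `skiprows`, and `skipfooter` defaults are illustrative assumptions, not values taken from this repository; pass whatever matches the layout of your copy of Canada.xlsx.

```python
from pathlib import Path

import pandas as pd


def load_canada(path="Canada.xlsx", sheet_name=0, skiprows=0, skipfooter=0):
    """Read the workbook into a DataFrame (hypothetical helper).

    All defaults are placeholders; pass the values that match your file.
    """
    path = Path(path)
    if not path.is_file():
        # Fail early with a clear message instead of a long pandas traceback.
        raise FileNotFoundError(f"Dataset not found: {path.resolve()}")
    # openpyxl is the engine pandas uses to read .xlsx files.
    return pd.read_excel(
        path,
        sheet_name=sheet_name,
        skiprows=skiprows,
        skipfooter=skipfooter,
        engine="openpyxl",
    )
```

Checking the path first means a misplaced file is reported before pandas or openpyxl are involved.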
- Reads and cleans the dataset
- Provides statistical summaries
- Renames columns for better readability
- Adds a Total column to sum up immigration data for each country
- Load necessary libraries (numpy, pandas, openpyxl).
- Read the Excel file into a Pandas DataFrame.
- Display the first and last five rows to inspect data.
- Get a concise summary of the dataset using info().
- Retrieve column headers and index values.
- Convert index and columns to lists for better manipulation.
- Display the shape of the dataset (rows, columns).
- Remove unnecessary/null values to clean the dataset.
- Rename columns (OdName to Country, AreaName to Continent, etc.).
- Add a Total column summing immigration numbers (1980-2013).
- Check for null values to ensure data integrity.
- Generate statistical summaries using describe().
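The steps above can be sketched roughly as follows. To stay self-contained, the example uses a tiny stand-in DataFrame with made-up numbers and placeholder country names instead of reading Canada.xlsx; only the OdName and AreaName renames come from the list above, and everything else (variable names, the three sample years) is an assumption for illustration.

```python
import pandas as pd

# Toy stand-in for the real workbook: made-up numbers, three years only.
raw = pd.DataFrame(
    {
        "OdName": ["Country A", "Country B", "Country C"],
        "AreaName": ["Asia", "Europe", None],
        1980: [10, 20, 30],
        1981: [11, 21, 31],
        1982: [12, 22, 32],
    }
)

# Inspect the data.
print(raw.head())
print(raw.tail())
raw.info()                      # prints a concise summary
print(raw.columns.tolist())     # column headers as a list
print(raw.index.tolist())       # index values as a list
print(raw.shape)                # (rows, columns)

# Clean: drop rows with missing values, then rename the columns named above.
df = raw.dropna().rename(columns={"OdName": "Country", "AreaName": "Continent"})

# Add a Total column that sums the year columns for each country.
year_cols = [c for c in df.columns if isinstance(c, int)]
df["Total"] = df[year_cols].sum(axis=1)

print(df.isnull().sum())        # null count per column
print(df.describe())            # statistical summary of numeric columns
```

With the real data, `year_cols` would cover 1980 through 2013 rather than the three sample years used here.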
- Reads Excel data into a Pandas DataFrame.
- Cleans unnecessary and null values.
- Renames columns for better readability.
- Summarizes immigration data with key statistics.
This script demonstrates various techniques for indexing and selecting data from a dataset using Pandas. The dataset used in this example is an Excel file (Canada.xlsx) containing migration data.
Before running the script, ensure you have the following libraries installed:
- numpy
- pandas
- openpyxl
- keras
You can install the missing packages using:
pip install numpy pandas openpyxl keras
- Reading and Preprocessing the Dataset
  - The dataset is loaded from an Excel file using pd.read_excel().
  - Unnecessary rows and footers are skipped.
  - Column names are renamed for better clarity.
- Filtering Data
  - Extracting the list of countries from the dataset.
  - Selecting specific columns (years 1980-1985) for all countries.
- Indexing Operations
  - Setting the Country column as the index.
  - Retrieving full row data for a specific country (e.g., Japan).
  - Selecting a specific year for a country.
  - Extracting multiple years for a country.
- Converting Column Names to Strings
  - Ensuring column names are of string type to prevent ambiguity in indexing.
- Filtering with Conditions
  - Extracting data for Asian countries (Continent = Asia).
  - Applying multiple conditions to filter data further (e.g., Southern Asia region).
- Final Review of Data
  - Displaying data dimensions and column names.
  - Showing the first few rows of the modified dataset.
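As a rough illustration of the column selection and indexing steps listed above, the sketch below works on a small stand-in DataFrame with made-up numbers rather than the real workbook. The variable names and the three sample years are assumptions chosen for brevity.

```python
import pandas as pd

# Toy frame with made-up numbers; the real data covers 1980-2013.
df = pd.DataFrame(
    {
        "Country": ["Japan", "Country B", "Country C"],
        "Continent": ["Asia", "Europe", "Asia"],
        1980: [1, 2, 3],
        1981: [4, 5, 6],
        1982: [7, 8, 9],
    }
)

countries = df["Country"].tolist()          # list of countries
early_years = df[["Country", 1980, 1981]]   # selected year columns, all rows

# Use Country as the index so rows can be looked up by name.
df = df.set_index("Country")

japan_row = df.loc["Japan"]                  # full row for one country
japan_1981 = df.loc["Japan", 1981]           # one year for one country
japan_years = df.loc["Japan", [1980, 1982]]  # several years for one country

# Convert every column name to a string so years are addressed as "1981".
df.columns = list(map(str, df.columns))
japan_1981_str = df.loc["Japan", "1981"]

# Final review of the modified frame.
print(df.shape)
print(df.columns.tolist())
print(df.head())
```

Before the conversion the year labels are integers, so `df.loc["Japan", "1981"]` would raise a KeyError; afterwards the string form is the one that works. Converting once keeps all later lookups consistent.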
Simply execute the script in a Python environment:
python script_name.py
Make sure to update the file path for Canada.xlsx before running.
The script prints various filtered views of the dataset to the console, demonstrating different indexing and selection techniques in Pandas.
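One kind of filtered view is condition-based selection. A minimal sketch, assuming a frame indexed by country with Continent and Region columns, is shown here; the rows, placeholder country names, and totals are made up for illustration.

```python
import pandas as pd

# Toy frame with made-up rows, indexed by country.
df = pd.DataFrame(
    {
        "Continent": ["Asia", "Asia", "Europe"],
        "Region": ["Southern Asia", "Eastern Asia", "Northern Europe"],
        "Total": [100, 200, 300],
    },
    index=pd.Index(["Country A", "Country B", "Country C"], name="Country"),
)

# One condition: a boolean Series selects the matching rows.
asia = df[df["Continent"] == "Asia"]

# Several conditions: wrap each in parentheses and combine with & (and) or | (or).
southern_asia = df[(df["Continent"] == "Asia") & (df["Region"] == "Southern Asia")]

print(asia)
print(southern_asia)
```

The parentheses matter because `&` binds more tightly than `==` in Python.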
- The dataset path should be adjusted based on your local directory structure.
- If encountering issues with openpyxl, ensure the package is installed properly.
This script serves as a practical guide for working with Pandas DataFrames, focusing on indexing, slicing, and filtering data efficiently.
This repository contains visualizations of immigration trends from different continents over time. The graphs represent the number of immigrants from Africa, Asia, Europe, and the world to Canada.
The following plots are included:
Each graph plots the number of immigrants over a period of time (1980 - 2013), categorized by country. The trends help visualize migration patterns and how they evolved over the years.
- Use these images to analyze migration trends from different continents.
- Compare the immigration rates from different regions.
- Identify peaks and declines in immigration numbers.
This project contains visualizations of immigration data from different countries over time. The included images showcase trends and significant events affecting immigration patterns.
Description: This line plot compares the number of immigrants from China and India to Canada from 1980 to 2013. The trends highlight significant increases over time, with some fluctuations reflecting policy changes and global events.
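A comparison plot of this kind could be produced along the lines of the sketch below. It assumes a DataFrame indexed by country with string year columns; the helper names `years_by_country` and `plot_lines` are hypothetical, and the toy numbers are made up rather than taken from the dataset. Matplotlib is assumed to be installed for the plotting step.

```python
import pandas as pd


def years_by_country(df, countries, years):
    """Return a frame with years as rows and the chosen countries as columns."""
    return df.loc[countries, years].transpose()


def plot_lines(trend, title):
    """Draw one line per column of `trend` (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    x = [int(year) for year in trend.index]  # numeric x-axis from string labels
    for country in trend.columns:
        ax.plot(x, trend[country], label=country)
    ax.set_title(title)
    ax.set_xlabel("Year")
    ax.set_ylabel("Number of immigrants")
    ax.legend()
    return fig, ax


# Toy data with made-up numbers, indexed by country with string year columns.
toy = pd.DataFrame(
    {"1980": [1, 2], "1981": [3, 4], "1982": [5, 6]},
    index=pd.Index(["China", "India"], name="Country"),
)
trend = years_by_country(toy, ["China", "India"], ["1980", "1981", "1982"])
print(trend)
```

Transposing first puts the years on the index, so each country becomes one column and therefore one line. Calling `plot_lines(trend, "Immigration from China and India")` returns the figure and axes, which can then be shown or saved.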
Description: This visualization displays immigration trends from Haiti to Canada over the years. The data indicates fluctuations in migration numbers, with notable variations across different periods.
Description: This chart highlights immigration from Haiti, emphasizing the spike in 2010 due to the devastating Haiti Earthquake. The graph includes an annotation marking this significant event, demonstrating how humanitarian crises influence migration patterns.
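An annotation like the one described can be added with Matplotlib's `Axes.annotate`. The helper below is a hypothetical sketch: the function name and the fixed eight-year text offset are arbitrary choices, not taken from the project's code.

```python
def annotate_event(ax, year, value, label):
    """Mark the point (year, value) on a Matplotlib Axes with an arrow and label."""
    return ax.annotate(
        label,
        xy=(year, value),              # the data point being highlighted
        xytext=(year - 8, value),      # label position, shifted left (arbitrary offset)
        arrowprops=dict(arrowstyle="->"),
    )
```

Given an existing line plot on `ax`, a call such as `annotate_event(ax, 2010, peak_value, "2010 Earthquake")` would place the label to the left of the 2010 point, where `peak_value` stands for the 2010 figure read from the data.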
These images can be used for:
- Understanding historical immigration trends
- Analyzing the impact of global events on migration
- Supporting research or presentations on immigration patterns
This project is licensed under the IBM License - see the LICENSE file for details.
For any inquiries or feedback, please contact Mahmoud Mohamed Abdallah at Mahmoud_abdallah20@outlook.com.