kk2491/Machine-Learning-Algorithms

I would like to thank Sentdex for the excellent tutorials on machine learning algorithms.

I am also currently updating the references and original sources for some of the code in this repository.

Machine Learning - Flowchart

1. Understand the client requirement / Problem statement

2. Data Understanding

a. Data collection (CSV files, logs, sensor data, SQL databases, etc.)

b. Data exploration

c. Data quality: Analyze the data to confirm that it contains enough information to build the model

3. Data Preparation

a. Cleaning the data
Check for NULL and NA values in the dataset and handle them in one of two ways:
Remove the affected samples if the dataset is large and dropping them does not affect the quality of the data
Impute the missing values using the mean, the median, or KNN
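
As a rough illustration of the imputation option, here is a minimal sketch that fills missing entries with the mean or median of the observed values. The helper name `impute` and the sample `ages` column are made up for this example, and missing values are assumed to be represented as `None`. KNN imputation is not shown.

```python
from statistics import mean, median

def impute(values, strategy="median"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries
ages = [22, None, 35, 41, None, 29]
ages_imputed = impute(ages, strategy="median")
```

The median is usually the safer default when the column may contain outliers, since a few extreme values can pull the mean noticeably.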

b. Check for outliers in the dataset
Outliers might be due to human error. They can be detected using a boxplot or the summary statistics of the data. Remove or replace them accordingly.
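
One common way to turn the boxplot idea into code is the 1.5 × IQR rule, which flags values far outside the quartiles. The sketch below assumes that rule; the function name `iqr_outliers` and the `readings` data are hypothetical, and different tools compute quartiles slightly differently, so the exact bounds may vary.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the data
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings; 95 looks like a data-entry error
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
cleaned = [v for v in readings if v not in outliers]
```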

c. Check how the features are distributed using histograms
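
To show what a histogram computes underneath the plot, here is a small sketch that counts values in equal-width bins. The helper `histogram_counts` and the sample values are assumptions for illustration; in practice a plotting library would normally draw this for you.

```python
def histogram_counts(values, bins=4):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; nothing to bin")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the end, so clamp it
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical feature values
feature = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
counts = histogram_counts(feature, bins=4)
for i, c in enumerate(counts):
    print(f"bin {i}: {'#' * c}")
```

A long tail on one side, as in this sample, is a hint that a transformation or an outlier check may be worthwhile.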

d. Divide the data into training and test sets.
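
A minimal sketch of a random split, assuming an 80/20 ratio and a fixed seed for reproducibility. Both choices, and the helper name `split_train_test`, are illustrative rather than taken from this repository.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rows = list(rows)           # copy so the caller's data is untouched
    rng = random.Random(seed)   # fixed seed makes the split reproducible
    rng.shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

# Hypothetical dataset of 10 samples
samples = list(range(10))
train, test = split_train_test(samples, test_fraction=0.2)
```

Splitting before feature selection and scaling helps keep information from the test set out of the training process.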

e. Feature Selection
This can be done using filter, wrapper, or embedded methods
Filter – Statistical methods, such as a correlation check
Wrapper – Subset, forward, backward, or hybrid selection, Boruta feature selection, random forest variable importance
Embedded – Lasso and Ridge regularization (Lasso can shrink coefficients to exactly zero, while Ridge only shrinks them)
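
As an example of the filter approach, the sketch below keeps only features whose absolute Pearson correlation with the target exceeds a threshold. The threshold of 0.5, the feature names, and the data are all assumptions chosen for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical features and target
features = {
    "size_sqm": [50, 60, 80, 100, 120],
    "door_color_code": [3, 1, 2, 3, 1],
}
price = [150, 180, 240, 310, 360]

threshold = 0.5  # assumed cut-off, tune for your data
selected = [name for name, col in features.items()
            if abs(pearson(col, price)) >= threshold]
```

A correlation check only captures linear relationships with the target, which is one reason wrapper and embedded methods are listed as alternatives.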

f. Feature Engineering
New features are created from the existing features
Feature transformations
Perform normalization and standardization
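
The two scaling steps can be sketched as follows, assuming "normalization" means min-max scaling to [0, 1] and "standardization" means z-scoring to zero mean and unit variance. The function names and sample values are hypothetical.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and scale values to zero mean and unit (population) std deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]

# Hypothetical feature column
column = [2, 4, 6, 8, 10]
normalized = min_max_normalize(column)
standardized = standardize(column)
```

When a train/test split is used, the minimum, maximum, mean, and standard deviation are normally computed on the training data only and then reused to scale the test data.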

g. Dimensionality reduction
The dimension of the dataset can be reduced by using techniques such as PCA.
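
To give a feel for what PCA does, here is a dependency-free sketch that finds only the first principal component by power iteration on the covariance matrix and projects the data onto it. This is a simplification for illustration: library implementations typically use an SVD and return several components. The function name and the sample points are assumptions.

```python
from math import sqrt

def first_principal_component(rows, iterations=100):
    """Return (direction, scores) for the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered sample onto the component: 2 features become 1
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical 2-D data lying close to the line y = 2x
points = [[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.8], [5, 10.0]]
direction, scores = first_principal_component(points)
```

Here the two correlated features collapse into a single score per sample, and the direction found roughly follows the line the points lie along.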

4. Model Building

a. Based on the problem statement, decide whether the problem calls for a supervised or an unsupervised model.

b. If it is supervised, check whether the target variable is continuous or discrete.
If continuous – use regression model
If discrete – use classification model
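
The continuous-versus-discrete check can be approximated in code, although the final call usually needs domain knowledge. The sketch below is a rough heuristic only: the function name `suggest_task` and the `max_classes` cut-off of 10 are assumptions, not a standard rule.

```python
def suggest_task(target, max_classes=10):
    """Heuristic guess: 'classification' for labels, 'regression' for numbers."""
    distinct = set(target)
    # Text or boolean labels are treated as discrete classes
    if all(isinstance(v, (str, bool)) for v in distinct):
        return "classification"
    # A small number of distinct integers is assumed to be class codes
    if all(isinstance(v, int) for v in distinct) and len(distinct) <= max_classes:
        return "classification"
    return "regression"

# Hypothetical target columns
print(suggest_task(["spam", "ham", "spam"]))
print(suggest_task([0, 1, 1, 0]))
print(suggest_task([199.5, 250.0, 310.25]))
```

Integer targets such as counts or ratings are a known weak spot for this kind of rule, so treat the result as a starting point.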

c. Build the model using an appropriate machine learning algorithm on the training data.

d. Multiple models can be built and compared to see which gives the best accuracy.
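
A minimal sketch of building several models on the training data and comparing them on held-out data. It assumes a regression problem with one feature, uses mean squared error as the metric, and compares a mean-only baseline against a least-squares line. The data and all names are made up for illustration.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mu = sum(ys) / len(ys)
    return lambda x: mu

def fit_line(xs, ys):
    """Simple linear regression fitted by ordinary least squares."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training and test data
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.2, 3.9, 6.1, 8.0, 9.8, 12.1]
test_x, test_y = [7, 8], [14.1, 15.9]

candidates = {"mean_baseline": fit_mean, "linear_regression": fit_line}
scores = {name: mse(fit(train_x, train_y), test_x, test_y)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # lower error is better
```

Keeping a trivial baseline in the comparison makes it easier to tell whether a more complex model is actually learning something.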

5. Model Evaluation

a. Once the model is built, check the accuracy or error rate on both the training data and the test data using appropriate evaluation metrics.

b. If the accuracy is good on the training data but poor on the test data, the model is overfitting
Build a new model using regularization
Ask the customer to provide more data samples

c. If the accuracy is poor on both the training and test data, the model is underfitting
Build a new model by adding more features
Build a new model that includes feature transformations
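
The evaluation rules in this step can be sketched as a small decision helper. The `good_enough` threshold of 0.85 is an arbitrary assumption, since what counts as good accuracy depends on the problem, and the function name `diagnose_fit` is hypothetical.

```python
def diagnose_fit(train_acc, test_acc, good_enough=0.85):
    """Classify a model's behaviour from its training and test accuracy."""
    train_good = train_acc >= good_enough
    test_good = test_acc >= good_enough
    if train_good and not test_good:
        return "overfitting"    # try regularization or more data
    if not train_good and not test_good:
        return "underfitting"   # try more features or transformations
    if train_good and test_good:
        return "acceptable"
    return "inconclusive"       # test better than train: re-check the split

# Hypothetical accuracy pairs (train, test)
print(diagnose_fit(0.99, 0.70))
print(diagnose_fit(0.60, 0.58))
print(diagnose_fit(0.92, 0.90))
```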
