GitHub - kk2491/Machine-Learning-Algorithms: Simple Implementation of machine learning algorithms

I would like to thank Sentdex for the excellent tutorials on machine learning algorithms.

I am also updating the references and original sources for some of the code in this repository.

Machine Learning - Flowchart

1. Understand the client requirement / Problem statement

2. Data Understanding

a. Data collection (CSV files, logs, sensor data, SQL databases, etc.)

b. Data exploration

c. Data quality: analyze the data to confirm that it contains enough information to build the model

3. Data Preparation

a. Cleaning the data
Check for NULL and NA values in the dataset and handle them in one of two ways:
Remove the affected samples if the dataset is large and removing them does not affect the quality of the data
Impute the missing values using the mean, the median, or KNN
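The two cleaning options above (dropping versus imputing) can be sketched in plain Python. This is a minimal illustration rather than code from the repository: the `ages` column and the helper names `drop_missing` and `impute` are hypothetical, missing values are assumed to be represented as `None`, and KNN imputation is not shown.

```python
import statistics

def drop_missing(values):
    """Remove missing entries (represented here as None)."""
    return [v for v in values if v is not None]

def impute(values, strategy="mean"):
    """Replace missing entries with the mean or median of the observed ones."""
    observed = drop_missing(values)
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing values and one very large value.
ages = [20, None, 30, 40, None, 100]

missing_count = sum(v is None for v in ages)   # how many values are missing
dropped = drop_missing(ages)                   # option 1: remove the samples
mean_filled = impute(ages, "mean")             # option 2a: fill with the mean
median_filled = impute(ages, "median")         # option 2b: fill with the median
```

The median is less sensitive than the mean to extreme values such as the `100` in this made-up column, which is one reason to compare both before choosing.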

b. Check for outliers in the dataset
Outliers may be due to human error. They can be detected with a boxplot or the summary statistics of the data. Remove or replace them accordingly.
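A boxplot flags points that fall outside fences derived from the quartiles. The sketch below applies that rule numerically, assuming the common 1.5 × IQR convention; the `heights_cm` values and the function names are hypothetical, and quartile definitions differ slightly between tools.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]

# Hypothetical heights in cm; 1750 looks like a data-entry error (175.0?).
heights_cm = [160, 162, 165, 167, 168, 170, 171, 173, 175, 1750]

outliers = find_outliers(heights_cm)
cleaned = [v for v in heights_cm if v not in outliers]
```

Whether a flagged value is removed or replaced remains a judgment call that depends on the data.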

c. Check how the features are distributed using histograms
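A histogram is a count of values per bin. The following sketch builds one by hand and renders it as text, assuming integer-valued data and a fixed bin width; the `ages` values and helper names are hypothetical, and a plotting library would normally be used instead.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's start."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))  # empty bins are simply absent

def render(hist, bin_width):
    """Build one text row per non-empty bin, e.g. '20-29 | ###'."""
    return [f"{start}-{start + bin_width - 1} | {'#' * n}"
            for start, n in hist.items()]

# Hypothetical feature values.
ages = [21, 22, 25, 31, 33, 34, 35, 38, 41, 47, 52, 78]

hist = histogram(ages, bin_width=10)
for row in render(hist, 10):
    print(row)
```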

d. Divide the data into training and test sets.
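A train/test split amounts to shuffling the samples and holding a fraction of them back. Below is a minimal sketch under assumed defaults (20% test data, fixed seed); `split_train_test` is a hypothetical helper, not a library function. scikit-learn offers `train_test_split` for the same purpose.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (feature, target) pairs.
samples = [(x, 2 * x + 1) for x in range(10)]

train, test = split_train_test(samples, test_fraction=0.2)
```

The test set should stay untouched until evaluation, so that it gives an honest estimate of performance on unseen data.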

e. Feature Selection
This can be done using filter, wrapper, and embedded methods
Filter – Statistical methods, such as a correlation check
Wrapper – Subset, forward, backward, and hybrid selection; Boruta feature selection; random forest variable importance
Embedded – Lasso and Ridge (Lasso can shrink coefficients to exactly zero, while Ridge only shrinks them)
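Of the three families, the filter approach is the simplest to show. The sketch below keeps the features whose absolute Pearson correlation with the target reaches a threshold. The housing-style data, the 0.5 threshold, and the function names are all assumptions made for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def select_by_correlation(features, target, threshold=0.5):
    """Keep features whose |correlation| with the target meets the threshold."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    return selected, scores

# Hypothetical dataset: two informative features and one irrelevant one.
features = {
    "size_sqm": [50, 60, 70, 80, 90, 100],
    "rooms": [2, 2, 3, 3, 4, 4],
    "door_number": [7, 2, 9, 1, 8, 3],
}
price = [150, 180, 210, 240, 270, 300]

selected, scores = select_by_correlation(features, price)
```

A correlation filter only detects linear, one-feature-at-a-time relationships, which is one reason wrapper and embedded methods exist alongside it.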

f. Feature Engineering
New features are created from the existing features
Feature transformations
Perform normalization and standardization
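Normalization and standardization are both rescalings of a feature. As a minimal sketch, "normalization" is taken here to mean min-max scaling to [0, 1] and "standardization" to mean z-scoring. The `incomes` values and function names are hypothetical, and constant columns (which would divide by zero) are not handled.

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical feature column.
incomes = [20, 30, 40, 50, 60]

normalized = min_max_normalize(incomes)
standardized = standardize(incomes)
```

To avoid leaking information from the test set, the scaling parameters (minimum, maximum, mean, standard deviation) are usually computed on the training data and then reused for the test data.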

g. Dimensionality reduction
The dimensionality of the dataset can be reduced using techniques such as PCA (principal component analysis).
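To make the idea behind PCA concrete, the sketch below finds only the first principal component. It centers the data, builds the covariance matrix, and uses power iteration to find its dominant eigenvector. This is an illustrative simplification with made-up 2-D points. Library implementations usually rely on an SVD and return several components.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, projections) for the top principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered sample onto the component: d features become 1.
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projections

# Hypothetical 2-D points lying close to the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

direction, reduced = first_principal_component(points)
```

For these points the direction comes out roughly proportional to (1, 2), so a single coordinate along it keeps most of the variance of the two original features.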

4. Model Building

a. Based on the problem statement, decide whether the problem calls for a supervised or an unsupervised model.

b. For a supervised model, check whether the target variable is continuous or discrete.
If continuous – use regression model
If discrete – use classification model

c. Build the model on the training data using an appropriate machine learning algorithm.

d. Multiple models can be built and compared to see which gives the best accuracy.
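Comparing candidate models means fitting each one on the same training data and scoring each on the same held-out data. The sketch below does this for a regression problem, so it uses mean squared error rather than accuracy. The two candidates (a mean baseline and a one-feature linear regression) and the data are assumptions chosen to keep the example short.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Least-squares linear regression with a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data, roughly y = 2x + 1.
train_x, train_y = [1, 2, 3, 4, 5, 6], [3.1, 4.9, 7.2, 9.0, 10.8, 13.1]
test_x, test_y = [7, 8], [15.0, 16.9]

candidates = {"mean_baseline": fit_mean, "linear_regression": fit_line}
test_errors = {name: mse(fit(train_x, train_y), test_x, test_y)
               for name, fit in candidates.items()}
best = min(test_errors, key=test_errors.get)  # lowest held-out error wins
```

Including a trivial baseline gives a reference point: a model that cannot beat it is probably not learning anything useful.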

5. Model Evaluation

a. Once the model is built, check the accuracy/error rate on both the training data and the test data using appropriate evaluation metrics.

b. If the accuracy is good on the training data but poor on the test data, the model is overfitting. Possible remedies:
Build a new model using regularization
Ask the customer to provide more data samples
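Regularization adds a penalty on the size of the model's coefficients, which limits how closely the model can follow the training data. The sketch below shows this with ridge regression on a single feature, where the penalized slope has the closed form S_xy / (S_xx + alpha). The data and the `alpha` values are hypothetical.

```python
def fit_ridge_1d(xs, ys, alpha=0.0):
    """One-feature ridge regression; alpha=0 gives ordinary least squares.

    The data is centered first, so the intercept is not penalized.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)   # the penalty shrinks the slope toward zero
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data.
xs = [1, 2, 3, 4]
ys = [2.0, 4.5, 5.5, 8.0]

slopes = {alpha: fit_ridge_1d(xs, ys, alpha)[0] for alpha in (0.0, 1.0, 10.0)}
```

Larger `alpha` values give smaller slopes. In practice the penalty strength is chosen by validation, since too much shrinkage makes the model too simple.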

c. If the accuracy is poor on both the training and the test data, the model is underfitting. Possible remedies:
Build a new model by adding more features
Build a new model that includes feature transformations
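When a model performs poorly on both the training and the test data, a feature transformation can give it the flexibility it lacks. In the sketch below, a straight line is fitted to roughly quadratic data, first on the raw feature and then on its square. The data and the choice of a squared feature are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def mse(slope, intercept, xs, ys):
    """Mean squared error of the fitted line on the given data."""
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def squared(xs):
    """Feature transformation: replace x with x squared."""
    return [x * x for x in xs]

# Hypothetical data, roughly y = x^2.
train_x = [-3, -2, -1, 0, 1, 2, 3]
train_y = [9.2, 3.9, 1.1, 0.0, 0.9, 4.1, 8.8]
test_x = [-2.5, 1.5]
test_y = [6.3, 2.2]

# Model on the raw feature: a line cannot follow the curve.
raw = fit_line(train_x, train_y)
raw_train = mse(*raw, train_x, train_y)
raw_test = mse(*raw, test_x, test_y)

# Same model type on the transformed feature.
transformed = fit_line(squared(train_x), train_y)
transformed_train = mse(*transformed, squared(train_x), train_y)
transformed_test = mse(*transformed, squared(test_x), test_y)
```

With the squared feature the error drops on both sets, which is the sign that the original model was too simple rather than too flexible.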

Folders and files (45 commits)
.idea
Boruta_Feature_Selection
Caret
Decision_Tree
Deep_Learning
How_to_deploy
KNN
Linear_Regression
Logistic_Regression
SVM
Tensorflow
To_Do
study_materials
README.md
