Per capita violent crime prediction on the Communities and Crime dataset from the UCI Machine Learning Repository.
- A genetic algorithm is used for feature selection.
- The data is then discretized into bins using the pandas cut function.
- Decision trees are used for rule generation.
- The generated rules are used in a Fuzzy Inference System developed in MATLAB.
- Accuracy is tested.

Contributors:

- Abhilash Gahankari [2020H1030113H]
- Aashita Dutta [2020H1030130H]
- Satish Phale [2020H1030155H]
- Harsha Varun [2020PHXP0437H]
This is an example implementation of a simple genetic algorithm applied to feature selection. It uses the Communities and Crime dataset from the UCI Machine Learning Repository, which has 128 attributes, and aims to predict the label (per capita violent crimes) from them. The genetic algorithm selects which of those 128 features are most relevant to this prediction.
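As a rough illustration of genetic feature selection, the sketch below evolves 0/1 feature masks with tournament selection, one-point crossover, bit-flip mutation and elitism. It is not the project's actual code: the function name `evolve_feature_mask`, all hyperparameters, and `toy_fitness` (which rewards a made-up set of "relevant" indices) are assumptions for the example. In practice the fitness would more likely be a validation score of a model trained on the selected columns.

```python
import random

def evolve_feature_mask(n_features, fitness, pop_size=30, generations=40,
                        crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Evolve 0/1 feature masks and return (best_mask, best_score)."""
    rng = random.Random(seed)
    # Each individual is a list of 0/1 flags, one flag per feature.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")

    for generation in range(generations + 1):
        scores = [fitness(ind) for ind in population]
        # Remember the best individual seen so far.
        for ind, score in zip(population, scores):
            if score > best_score:
                best_mask, best_score = ind[:], score
        if generation == generations:
            break  # the final population has been scored; stop breeding

        def tournament():
            # Pick two individuals at random and keep the fitter one.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        next_population = [best_mask[:]]  # elitism: carry the best forward
        while len(next_population) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation on each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population

    return best_mask, best_score

# Hypothetical fitness: pretend features 1, 4 and 7 are the useful ones
# and charge a small penalty for every selected feature.
RELEVANT = {1, 4, 7}

def toy_fitness(mask):
    return sum(mask[i] for i in RELEVANT) - 0.1 * sum(mask)

best_mask, best_score = evolve_feature_mask(12, toy_fitness)
selected = [i for i, flag in enumerate(best_mask) if flag]
print("selected feature indices:", selected, "score:", best_score)
```

The per-feature penalty in the toy fitness is one simple way to push the search toward smaller subsets; whether the project penalizes subset size is not stated.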
After feature selection by the genetic algorithm, the continuous values are discretized into ranges (bins) using the pandas cut function and encoded as class variables.
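The discretization step could look roughly like the following. The bin edges, the three labels (low, medium, high) and the sample values are assumptions made for this sketch; the column name is only illustrative, and the project may use different edges or a different number of bins.

```python
import pandas as pd

# Toy values, assumed to be normalized to [0, 1]; not taken from the dataset.
df = pd.DataFrame({"ViolentCrimesPerPop": [0.05, 0.20, 0.45, 0.60, 0.95]})

labels = ["low", "medium", "high"]   # assumed class names
edges = [0.0, 1 / 3, 2 / 3, 1.0]     # assumed equal-width bin edges

# pd.cut maps each continuous value to the bin it falls into.
df["crime_bin"] = pd.cut(df["ViolentCrimesPerPop"], bins=edges,
                         labels=labels, include_lowest=True)

# Encode the ordered categories as integer class codes (low=0, medium=1, high=2).
df["crime_code"] = df["crime_bin"].cat.codes

print(df)
```

Passing explicit edges keeps the bins identical across columns and runs; passing an integer to `bins` instead would derive the edges from each column's observed range.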
Rules are generated from the discretized and encoded data, restricted to the features chosen by genetic feature selection. sklearn.tree.DecisionTreeClassifier is used to generate rules in text format with distinct classes, based on probability, number of samples and the Gini index, which measures how impure the partitions are.
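One possible way to obtain text rules from a fitted tree is scikit-learn's `export_text`. The sketch below uses a tiny made-up table of bin codes; the feature names `feat_a` and `feat_b`, the data and the `max_depth` value are placeholders, not the project's settings.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy encoded data: two hypothetical selected features holding bin codes 0..2.
X = [[0, 0], [0, 1], [1, 1], [1, 2], [2, 2], [2, 1]]
# Toy target classes (0 = low, 1 = medium, 2 = high).
y = [0, 0, 1, 1, 2, 2]

# The Gini index is the split criterion, as described above.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

# Render the fitted tree as nested if/else style text rules.
# show_weights=True adds the per-class sample counts at each leaf.
rules = export_text(clf, feature_names=["feat_a", "feat_b"], show_weights=True)
print(rules)
```

Each root-to-leaf path in the printed text can be read as one IF ... THEN class rule, which is the form a fuzzy rule base needs.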
The rules, along with the data, are fed into a Fuzzy Inference System in MATLAB, where a trapezoidal function is used as the membership function for fuzzification, the rules drive the inference system, and the centroid method is used for defuzzification. The accuracy measures are best for the medium class.
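The project's Fuzzy Inference System is built in MATLAB; the Python sketch below only illustrates the two operations named above, a trapezoidal membership function and centroid defuzzification. The set parameters, the rule firing strengths and the Mamdani-style clipping with max aggregation are all assumptions made for the example.

```python
def trapmf(x, a, b, c, d):
    """Trapezoidal membership with a <= b <= c <= d:
    0 outside (a, d), 1 on [b, c], linear on the two slopes."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0

def centroid(xs, mus):
    """Centroid defuzzification over a sampled universe of discourse."""
    total = sum(mus)
    if total == 0:
        raise ValueError("aggregated membership is zero everywhere")
    return sum(x * m for x, m in zip(xs, mus)) / total

# Hypothetical output sets on a [0, 1] crime-rate universe.
OUTPUT_SETS = {
    "low": (0.0, 0.0, 0.2, 0.4),
    "medium": (0.2, 0.4, 0.6, 0.8),
    "high": (0.6, 0.8, 1.0, 1.0),
}

def defuzzify(strengths, steps=100):
    """Clip each output set at its rule strength, aggregate with max,
    then take the centroid (an assumed Mamdani-style scheme)."""
    xs = [i / steps for i in range(steps + 1)]
    mus = [max(min(strengths.get(name, 0.0), trapmf(x, *params))
               for name, params in OUTPUT_SETS.items())
           for x in xs]
    return centroid(xs, mus)

# Hypothetical firing strengths produced by the rule base for one sample.
crisp = defuzzify({"low": 0.2, "medium": 0.7, "high": 0.0})
print("defuzzified crime rate:", round(crisp, 3))
```

The crisp output can then be mapped back to a class by taking the set with the highest membership at that value, which is one way a per-class accuracy could be computed.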