πŸ›πŸ”§βœ¨πŸ§ͺπŸ“š - Major enhancements and fixes across the application
- πŸ› Fixed bugs in feature extraction service and several test cases.
- πŸ”§ Reduced model complexity and updated learning rates to prevent overshoots.
- ✨ Updated tests and implemented command scripts for improved data scraping.
- πŸ§ͺ Adjusted metric priorities for better alignment with diagnostic importance.
- πŸ“š Enhanced model architecture with complex residual models and feature extraction.
- πŸ† Adjusted model architecture and hyperparameters according to best-performing settings.
- βœ¨πŸ”§ Added Hyperparameter Tuning Module and enhanced the DataScraper for more efficient data handling.
Thomasbehan committed Mar 31, 2024
1 parent 6e70372 commit 4f6f64b
Showing 32 changed files with 978 additions and 497 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test.yaml
@@ -35,7 +35,7 @@ jobs:
curl -L -o models/skinvestigator-sm.tflite https://github.com/Thomasbehan/SkinVestigatorAI/releases/download/0.1.5/skinvestigator-sm.tflite
- name: Download some data for testing
run: |
python skinvestigatorai/core/data_scraper.py -p 2
python commands/run_data_scraper.py -p 2
- name: Lint with ruff
run: |
ruff check
2 changes: 2 additions & 0 deletions .gitignore
@@ -27,3 +27,5 @@ test
# prevent models being committed to github
**/*.h5
*.h5
**/*.tflite
*.tflite
183 changes: 126 additions & 57 deletions README.md
@@ -1,13 +1,11 @@
<img src="/skinvestigatorai/static/logo.png" align="right" width="100" height="100" />

# SkinVestigatorAI ![View SkinVestigatorAI on GitHub](https://img.shields.io/github/stars/Thomasbehan/SkinVestigatorAI?color=232323&label=SkinVestigatorAI&logo=github&labelColor=232323)
![Sensitivity Score](https://img.shields.io/badge/Sensitivity-0.84035-blue)
![Specificity Score](https://img.shields.io/badge/Specificity-0.84019-blue)
![Precision Score](https://img.shields.io/badge/Precision-0.84035-blue)
![F1 Score](https://img.shields.io/badge/F1-0.84467-blue)
![Accuracy Score](https://img.shields.io/badge/Accuracy-0.84035-blue)
![Loss Score](https://img.shields.io/badge/Loss-0.23201-blue)
![AUC Score](https://img.shields.io/badge/AUC-0.91692-blue)
![Precision Score](https://img.shields.io/badge/Precision-0.6753-blue)
![Recall Score](https://img.shields.io/badge/Recall-0.3701-blue)
![Accuracy Score](https://img.shields.io/badge/Accuracy-94.34%25-darkgreen)
![Loss Score](https://img.shields.io/badge/Loss-0.1501-blue)
![AUC Score](https://img.shields.io/badge/AUC-0.9286-darkgreen)
![GitHub license](https://img.shields.io/github/license/Thomasbehan/SkinVestigatorAI) [![Actions Status](https://github.com/Thomasbehan/SkinVestigatorAI/workflows/Automated%20Testing/badge.svg)](https://github.com/Thomasbehan/SkinVestigatorAI/actions)
[![Actions Status](https://github.com/Thomasbehan/SkinVestigatorAI/workflows/CodeQL/badge.svg)](https://github.com/Thomasbehan/SkinVestigatorAI/actions)

@@ -50,7 +48,7 @@ To quickly set up SkinVestigatorAI for development, follow these steps
```bash
python -m pip install -e .
```

3. **Run the Application:**
Start the application with auto-reloading using:
```bash
@@ -70,79 +68,150 @@ python -m pytest
### Running the Linter
To run the linter, run the following command:
```bash
python -m ruff --format=github --target-version=py311 .
python -m ruff check
```

## Model Downloader

To download and prepare a specific model for use, you can use the `download_model.py` script located in the `commands` directory. This script accepts the model identifier as an argument.

### Usage

Run the following command from the root of the project directory:

```bash
python commands/download_model.py -m <model_id>
```
### Available Models
Here is a list of all the available models you can download using the script:

* M-0003: Simple Testing (Legacy).
* M-0015: Best Model.
* M-0015s: Fastest Model.
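
As a concrete example, `python commands/download_model.py -m M-0015` fetches the best-performing model from the list above. The same `downloader` helper that the script wraps (see `commands/download_model.py` later in this diff) can also be called from Python directly; the sketch below assumes only what that script shows, namely that `downloader` returns a truthy value on success:

```python
from skinvestigatorai.models.downloader import downloader

# "M-0015" is the "Best Model" identifier from the list above.
if downloader("M-0015"):
    print("Model downloaded and ready to use.")
else:
    print("Download failed; check the model identifier and your connection.")
```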

## Data
The DataScraper tool within this application is designed to download and preprocess skin lesion images from the ISIC Archive for use in machine learning projects. The images are stored in three separate directories for training, validation, and testing.
The DataScraper tool within this application is designed to download and preprocess skin lesion images from the ISIC Archive for use in machine learning projects. The images are stored in two separate directories for training and testing, featuring a total of 40,194 images. This substantial dataset aims to provide a comprehensive basis for accurate skin lesion analysis and classification.

The data is organised as follows:
- Train: 5625 benign, 5152 malignant
- Train: 32,155 images
- Test: 8,039 images

### Data Source
The data is fetched from the ISIC Archive using their API. The base URL for the API is https://api.isic-archive.com/api/v2. The code makes use of the /images/ endpoint to fetch a list of images in JSON format. Each image's metadata contains information about the image, including its URL, ISIC ID, and clinical information (benign/malignant).
The dataset used for training the model is sourced from the International Skin Imaging Collaboration (ISIC) Archive. The ISIC Archive is a large-scale resource for skin image analysis, providing open access to a wide variety of images for the development and evaluation of automated diagnostic systems.

For more information about the ISIC Archive and to access the data, visit [ISIC Archive](https://www.isic-archive.com).
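
As an illustration of the kind of request involved, the sketch below pages through the public `/images/` endpoint with `requests`. The `results`/`next` response fields and the query parameters are assumptions for illustration, not taken from this repository's scraper:

```python
import requests

BASE_URL = "https://api.isic-archive.com/api/v2"


def list_images(limit=100, max_pages=1):
    """Yield image metadata dicts from the ISIC Archive /images/ endpoint."""
    url = f"{BASE_URL}/images/?limit={limit}"
    pages_fetched = 0
    while url and pages_fetched < max_pages:
        payload = requests.get(url, timeout=30).json()
        # Assumed response shape: {"results": [...], "next": "<url or null>"}
        for item in payload.get("results", []):
            yield item
        url = payload.get("next")
        pages_fetched += 1
```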

### Data Organization
The images are organized into three folders:

1. data/train: Contains 70% of the total images for each batch, which can be used for training a model.
2. data/validation: Contains 20% of the total images for each batch, which can be used for model validation.
3. data/test: Contains the remaining 10% of the total images for each batch, which can be used for model testing.
Each folder is further organized into subfolders, separating the images based on their clinical classification (benign or malignant).
1. `data/train`: Contains 80% of the total images, which are used for training the model.
2. `data/test`: Contains 20% of the total images, used for testing the model's performance during and after training.
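
A minimal sketch of the 80/20 split described above, assuming a flat folder of downloaded images; the paths and helper name are illustrative rather than the repository's actual implementation:

```python
import random
import shutil
from pathlib import Path


def split_images(source_dir="data/raw", train_dir="data/train",
                 test_dir="data/test", train_fraction=0.8, seed=42):
    """Copy images into train/test folders using an 80/20 split."""
    images = sorted(Path(source_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    cutoff = int(len(images) * train_fraction)
    for subset, dest in ((images[:cutoff], train_dir), (images[cutoff:], test_dir)):
        Path(dest).mkdir(parents=True, exist_ok=True)
        for image_path in subset:
            shutil.copy2(image_path, Path(dest) / image_path.name)
```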

## Model
The model is a convolutional neural network (CNN) that uses transfer learning with the Vision Transformer (ViT) model to classify skin lesion images as benign or malignant. The model is trained using the Adam optimizer and the binary cross-entropy loss function.
Here is a summary of the model architecture:

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vit-b32 (Functional) (None, 768) 87429888
flatten (Flatten) (None, 768) 0
batch_normalization (BatchNormalization) (None, 768) 3072
dense (Dense) (None, 1024) 787456
batch_normalization_1 (BatchNormalization) (None, 1024) 4096
dropout (Dropout) (None, 1024) 0
dense_1 (Dense) (None, 2) 2050
=================================================================
Total params: 88,226,562
Trainable params: 88,222,978
Non-trainable params: 3,584
_________________________________________________________________
The `SkinCancerDetector` model employs a sophisticated deep learning architecture tailored for the accurate classification of skin lesions as benign or malignant. Built on TensorFlow, the model features a sequential arrangement of layers, utilising convolutional neural networks (CNNs) for their powerful image processing capabilities.

### Architecture Overview
The architecture is meticulously designed to capture the intricate patterns and features of skin lesions through multiple stages of convolutional layers, each followed by max pooling to reduce spatial dimensions and dropout layers to prevent overfitting. The model's core is structured as follows:

- **Convolutional Layers:** Multiple layers with ReLU activation to extract features from images.
- **Max Pooling Layers:** Applied after convolutional layers to reduce the size of the feature maps, thereby reducing the number of parameters and computation in the network.
- **Dropout Layers:** Used to prevent overfitting by randomly setting a fraction of input units to 0 at each update during training time.
- **Dense Layers:** Fully connected layers that learn non-linear combinations of the high-level features extracted by the convolutional layers.
- **Output Layer:** A dense layer with a sigmoid activation function to classify the input image as benign or malignant.


```text
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 180, 180, 128) 1280
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 90, 90, 128) 0
_________________________________________________________________
dropout (Dropout) (None, 90, 90, 128) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 90, 90, 256) 295168
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 45, 45, 256) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 45, 45, 256) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 45, 45, 192) 442560
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 22, 22, 192) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 22, 22, 192) 0
_________________________________________________________________
flatten (Flatten) (None, 92416) 0
_________________________________________________________________
dense (Dense) (None, 64) 5914688
_________________________________________________________________
dropout_3 (Dropout) (None, 64) 0
_________________________________________________________________
dense_1 (Dense) (None, 96) 6240
_________________________________________________________________
dropout_4 (Dropout) (None, 96) 0
_________________________________________________________________
dense_2 (Dense) (None, 1) 97
=================================================================
Total params: 6,660,033
Trainable params: 6,660,033
Non-trainable params: 0
_________________________________________________________________
```
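
The summary above can be expressed as a Keras `Sequential` stack roughly as follows. This is a hedged reconstruction: the 3Γ—3 kernels, `same` padding, 2Γ—2 pooling, single-channel 180Γ—180 input, and dropout rates are inferred or assumed rather than stated in the text, so the repository's actual layer arguments may differ.

```python
from tensorflow.keras import Sequential, layers


def build_model(input_shape=(180, 180, 1)):
    """Approximate reconstruction of the summarised architecture (assumed details noted above)."""
    return Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(256, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(192, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(96, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # benign vs. malignant
    ])
```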
### Training and Optimization
The model is compiled with the Adam optimizer and binary cross-entropy loss function, which are well-suited for binary classification tasks. It leverages metrics such as accuracy, precision, recall, and AUC to evaluate performance throughout the training process.
Training involves the use of a data generator for efficient handling of large image datasets, augmenting the training data to improve generalization. The model also incorporates callbacks for early stopping, learning rate reduction on plateau, and model checkpointing to save the best-performing model.
This advanced architecture and training regimen enable the `SkinCancerDetector` to achieve high accuracy in distinguishing between benign and malignant skin lesions, making it a valuable tool for aiding in the early detection of skin cancer.
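
A sketch of the compile-and-train setup this section describes, assuming the standard Keras optimizer, metrics, and callbacks it names; the exact arguments (learning rate, patience values, checkpoint path, generators) are placeholders, and `build_model` refers to the reconstruction sketched above:

```python
import tensorflow as tf
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)

model = build_model()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="binary_crossentropy",
    metrics=["accuracy",
             tf.keras.metrics.Precision(name="precision"),
             tf.keras.metrics.Recall(name="recall"),
             tf.keras.metrics.AUC(name="auc")],
)

callbacks = [
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
    ModelCheckpoint("models/best_model.h5", save_best_only=True),
]
# model.fit(train_generator, validation_data=val_generator,
#           epochs=50, callbacks=callbacks)
```
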
## Performance
The model achieved an accuracy of 84% and a loss of 0.23 on the testing dataset.
We also track sensitivity, specificity, precision, and F1 score. The model achieved a sensitivity of 84%, a specificity of 84%, a precision of 84%, and an F1 score of 84.4% on the testing dataset.
The updated model demonstrates significant improvements in its ability to classify skin lesions accurately, achieving an accuracy of 84% and a loss of 0.23 on the testing dataset. The model's sensitivity, specificity, precision, and F1 score have also seen considerable enhancements, with the following scores reported on the testing dataset:
![Sensitivity Score](https://img.shields.io/badge/Sensitivity-0.84035-blue)
![Specificity Score](https://img.shields.io/badge/Specificity-0.84019-blue)
![Precision Score](https://img.shields.io/badge/Precision-0.84035-blue)
![F1 Score](https://img.shields.io/badge/F1-0.84467-blue)
![Accuracy Score](https://img.shields.io/badge/Accuracy-0.84035-blue)
![Loss Score](https://img.shields.io/badge/Loss-0.23201-blue)
![AUC Score](https://img.shields.io/badge/AUC-0.91692-blue)
- Sensitivity: 84.035%
- Specificity: 84.019%
- Precision: 84.035%
- F1 Score: 84.467%
- Accuracy: 84.035%
- Loss: 0.23201
- AUC: 91.692%
### Targets
| Metric | Target Range | Progress |
|-------------------|---------------|-------------------------------------------------|
| **Loss** | Close to 0 | ![Progress](https://progress-bar.dev/10/?scale=0..0.6932&title=progress&suffix=) |
| **Accuracy** | 85% - 95% | ![Progress](https://progress-bar.dev/0/?scale=85..95&title=progress&suffix=) |
| **Precision** | 80% - 90% | ![Progress](https://progress-bar.dev/11/?scale=80..90&title=progress&suffix=) |
| **Recall** | 85% - 95% | ![Progress](https://progress-bar.dev/33/?scale=85..95&title=progress&suffix=) |
| **AUC** | 0.85 - 0.95 | ![Progress](https://progress-bar.dev/0/?scale=0.85..0.95&title=progress&suffix=) |
| **Binary Accuracy**| 85% - 95% | ![Progress](https://progress-bar.dev/0/?scale=85..95&title=progress&suffix=) |
| **F1 Score** | 85% - 95% | ![Progress](https://progress-bar.dev/7/?scale=85..95&title=progress&suffix=) |
## Contributing
Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct, and the process for submitting pull requests to us.
We encourage contributions to SkinVestigatorAI! For guidelines on contributing, please read [CONTRIBUTING.md](CONTRIBUTING.md). By participating in this project, you agree to abide by its terms.
## License
This project is licensed under the GNU General Public License v3.0 - see the [LICENSE.md](LICENSE.md) file for details
SkinVestigatorAI is released under the GNU General Public License v3.0. For more details, see the [LICENSE.md](LICENSE.md) file.
## Acknowledgments
We extend our gratitude to the International Skin Imaging Collaboration (ISIC) for providing access to their extensive archive of skin lesion images, which has been instrumental in the development and refinement of our model.
## References
- International Skin Imaging Collaboration (ISIC). The ISIC Archive. https://www.isic-archive.com
## Citation
For academic and research use, please cite our work as follows:
"SkinVestigator: A Deep Learning-Based Skin Cancer Detection Tool, available at: https://github.com/Thomasbehan/SkinVestigatorAI", 2024.
## Disclaimer
This project is not intended to be used as a medical diagnostic tool. The authors of this project are not medical professionals and are not responsible for any medical decisions made by users of this project.
Always consult a medical professional for any medical concerns.
SkinVestigatorAI is not intended for clinical diagnosis or medical use. It is a research tool aimed at fostering developments in the field of automated skin lesion analysis. Always consult a qualified healthcare provider for medical advice and diagnosis.
33 changes: 33 additions & 0 deletions ROADMAP.md
@@ -0,0 +1,33 @@

## Roadmap

Our development roadmap is designed to iteratively enhance our AI's capabilities, focusing on improving its accuracy, scalability, and robustness. Each model iteration (M1 through M5) represents a step towards achieving our ultimate goal: developing an AI system with the highest possible F1 score, ensuring balanced precision and recall for critical applications in skin cancer detection and beyond. Here's a brief overview of the planned roadmap:

### M1: Proof of Concept
- **Objective:** Establish a foundational AI model using the ViT (Vision Transformer) as a base. The primary aim is to set up the necessary tools and framework for AI development within our project.
- **Dataset:** Train on a dataset of 10,000 images to validate the concept and the underlying infrastructure.
- **Focus:** Laying down the groundwork for future iterations by validating the initial model architecture and data processing pipelines.

### M2: Initial Deployment
- **Objective:** Build upon the proof of concept by increasing the dataset size and refining the model based on initial learnings.
- **Dataset:** This iteration is trained on 40,194 images, significantly expanding its learning capacity and generalization capabilities.
- **Focus:** Enhance model accuracy and establish a benchmark for performance improvements in subsequent versions.

### M3: Expanded Dataset
- **Objective:** Further increase the dataset size to improve the model's ability to generalize and accurately identify skin cancer from a wider variety of images.
- **Dataset:** Utilize a dataset of 73,196 images, aiming for broader coverage and improved detection capabilities.
- **Focus:** Target substantial improvements in model performance, particularly in handling diverse and challenging cases.

### M4: Advanced Features and Security
- **Objective:** Introduce natural language processing capabilities to interpret textual data alongside images and implement features to detect and reject non-skin images used to deceive the AI.
- **Focus:** Enhance the AI's versatility and robustness, making it more adaptable to real-world applications and resistant to manipulation.

### M5: Security Enhancement
- **Objective:** Strengthen the model's security features to prevent tricking the AI into false predictions, ensuring the system's integrity and reliability.
- **Focus:** Concentrate on making the AI system as foolproof as possible against attempts to exploit its weaknesses, further solidifying its application in sensitive fields.

### Importance of Focusing on the F1 Score
The ultimate goal of achieving the highest possible F1 score is crucial because it signifies a balanced approach to precision (the model's ability to identify true positives from all positive predictions) and recall (the model's success in identifying all actual positives). This balance is especially important in medical applications, like skin cancer detection, where the cost of false negatives (failing to identify a condition) can be as critical as the cost of false positives (incorrectly identifying a condition). A high F1 score ensures that our AI system is both accurate and reliable, minimizing the risk of misdiagnosis and making it a valuable tool in clinical support.
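
For reference, the F1 score is the harmonic mean of precision and recall:

```math
F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
```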

### Future Plans
Beyond M5, we aim to explore additional innovations that will push the boundaries of what our AI can achieve, constantly seeking to improve its accuracy, efficiency, and applicability in real-world scenarios.
@@ -1,3 +1,3 @@
train_dir = 'data/train'
val_dir = 'data/validation'
test_dir = 'data/test'
val_dir = test_dir
File renamed without changes.
19 changes: 19 additions & 0 deletions commands/download_model.py
@@ -0,0 +1,19 @@
import argparse
from skinvestigatorai.models.downloader import downloader


def main():
parser = argparse.ArgumentParser(description="Download a specific AI model.")
parser.add_argument("-m", "--modelname", required=True, help="The name of the model to download.")

args = parser.parse_args()

model_name = args.modelname
if downloader(model_name):
print(f"Successfully downloaded the model: {model_name}")
else:
print(f"Failed to download the model: {model_name}")


if __name__ == "__main__":
main()
17 changes: 17 additions & 0 deletions commands/run_data_scraper.py
@@ -0,0 +1,17 @@
import argparse
from skinvestigatorai.services.data_scaper_service import DataScraper


def main():
parser = argparse.ArgumentParser(
description="Download images from ISIC Archive and split into training and testing sets.")
parser.add_argument("-p", "--pages", type=int, default=-1,
help="Number of pages to download. Default is -1, which downloads all pages.")
args = parser.parse_args()

scraper = DataScraper(max_pages=args.pages)
scraper.download_and_split_images()


if __name__ == "__main__":
main()