Analysis of Machine Learning Models for Trash Detection

This project evaluates the performance of four deep learning-based models—GroundingDINO, YOLO, DETR, and ResNet—for object detection, trained on a subset of the TACO (Trash Annotations in Context) dataset. Each model was assessed based on Precision, Recall, and mean Average Precision (mAP) to determine the most effective solution. Additionally, the models were tested in real-time to evaluate their practical applicability. A comprehensive evaluation of this project is available on our webpage and detailed in our paper. This project was the final for our class CSCI 5561 Intro to Computer Vision and we would like to extend a special thank you to our professor Volkan Isler.

Data

The TACO dataset (Trash Annotations in Context) is a comprehensive collection of images depicting trash and recycling objects in real-world settings. This dataset, used for training and evaluating our models, includes images containing between 0 and 40 objects spanning 19 classes, such as bottles, cans, and plastic bags. Additionally, it features a "catch-all" class, labeled as unlabeled litter, for unidentified items. The dataset comprises 4,000 images, which were a.ugmented to expand the total to 6,000 images, all resized to 416x416 pixels. The annotations were created using Roboflow and are available here.

Models

I was personally in charge of the DETR model. A DETR (DEtection TRansformer) is a transformer-based object detection model that eliminates the need for traditional hand-crafted components like anchor generation and NMS, using an end-to-end approach with a bipartite matching loss to predict objects directly from input images.

This model was first introduced in a paper by Facebook AI in 2020. Since then, numerous similarly constructed models have been developed, each offering unique benefits and improvements tailored to specific applications. Below is a list of the models we utilized for this project.

DETR
RT-DETR
Conditional DETR
Deformable DETR

Results

Due to isues with compute power and time, we were unable to get the DETR and RT-DETR to converge in a reasonable amount of time. We trained the Conditional DETR and Deformable DETR at varrying epochs to achieve the results detailed below.

model	Number of Epochs	mAP 50-90	mAP 50	Precision	Recall
Conditional DETR	50	0.083	0.117	0.682	0.109
Conditional DETR	200	0.234	0.301	0.534	0.238
Deformable DETR	50	0.006	0.008	0.642	0.022

Code

To support this project, I implemented a pipeline to convert COCO object annotations into the required Hugging Face Dataset format. Using this pipeline, I fine-tuned pretrained DETR models, initially trained on the COCO dataset, with our TACO dataset. The notebook detailing the fine-tuning process is available at src/finetune_DeTr.ipynb. Additionally, an evaluation notebook can be found at src/eval.ipynb, and a script for loading and running the model in real-time is provided at src/real_time_eval.py.

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
Results		Results
docs		docs
src		src
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Analysis of Machine Learning Models for Trash Detection

Data

Models

Results

Code

About

Languages

isaac-berlin/CV_Garbage_Detection

Folders and files

Latest commit

History

Repository files navigation

Analysis of Machine Learning Models for Trash Detection

Data

Models

Results

Code

About

Topics

Resources

Stars

Watchers

Forks

Languages