heydarshahi/Near-Duplicate-Image-Detection

similar-image-detection

simple near-duplicate detection

Description

This is a simple near-duplicate detection web app based on an image-hashing algorithm and a BK-Tree data structure.

When webapp.py starts, it calls initialize_bktree(), which builds the BK-Tree from files/hashlist.txt.

Run Server

Run webapp.py to start the server. Open a browser and navigate to "0.0.0.0:8000/html" for a simple client demonstration. You can also send POST requests using Postman or Advanced REST Client. The request body should be multipart/form-data, with the following parameters:

  • type="file", name="theimage" (the jpg or png file to add/search)
  • name="request_query", value: "image" or "hash" or "id"
  • name="request_id", value: ID of the added/searched image
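As an alternative to Postman, the request can be sketched in Python with only the standard library. This is a minimal illustration, not part of the repository: build_multipart() and post_image() are hypothetical helper names, the image/jpeg content type is an assumption, and the URL in the usage comment simply combines the port and the /image/add/ route mentioned in this README.

```python
import os
import uuid
import urllib.request


def build_multipart(fields, file_field, filename, file_bytes, content_type):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode())
    # The file part carries a filename and its own Content-Type header.
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_image(url, image_path, request_query, request_id):
    """Send one image with the parameters listed above (hypothetical helper)."""
    with open(image_path, "rb") as f:
        data = f.read()
    body, ctype = build_multipart(
        {"request_query": request_query, "request_id": request_id},
        "theimage", os.path.basename(image_path), data,
        "image/jpeg",  # assumption: adjust for png files
    )
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()


# Usage, with the server running locally (not executed here):
# print(post_image("http://127.0.0.1:8000/image/add/", "cat.jpg", "image", "42"))
```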

Modules & Functions

  • webapp.py: Main module that sets up a Sanic web server. Functions include:

    • post_file_add(): Handles the /image/add/ route. Send a POST request (as described in the Run Server section) to add a new image. An add request must include both the image and its ID. Returns a JSON response with status "file received" or "existing ID".

    • post_file_search(): Handles the /image/search/ route. Searches for an image by ID or by image. Returns a JSON response with a list of duplicate IDs.

    • notify_server_stopping(app, loop): Called before the server stops to persist the added hashes by calling image_helper.persist_hash_tree().

  • image_helper.py: Contains all the background adding and searching functions. Functions include:

    • initialize_bktree(): Reads previously saved hashes from /files/hashlist.txt and builds a BK-Tree of Img(hash, id) objects. Img is also a collection.

    • process_image(file): Returns a Pillow Image object from a file.

    • find_hash(image): Takes an Image and returns its hash using the function in the hash_helper module.

    • find_hash_by_id(id): Takes an ID and looks it up in the id_hash_dict dictionary, which contains all ID-hash pairs.

    • add_image(image_hash, id): Creates an Img, checks whether the ID already exists, then adds it to the hash_tree and id_hash_dict variables.

    • find_duplicates(image_hash, distance): Searches the hash_tree for images whose hashes are within a Hamming distance of at most distance from the query hash.

    • mydistance(img1, img2): Takes two Img objects and returns the Hamming distance between their hashes.

    • persist_hash_tree(): Saves id_hash_dict to the /files/hashlist.txt file.

  • hash_helper.py: Computes the hash of an image using the imagehash library.
