
How to Use generate_text_prompts for Custom Dataset Embeddings and Reparameterization Fine-tuning #567

Open
TAKAGNU opened this issue Jan 25, 2025 · 0 comments


TAKAGNU commented Jan 25, 2025

I am trying to use the generate_text_prompts function to generate text embeddings for a custom dataset in a format similar to Flickr30k. My goal is to use these embeddings to reparameterize and fine-tune YOLO-World. Here is my current workflow:

I take the text spans referenced by tokens_positive_eval and use them as the input categories for generate_text_prompts to generate the text embeddings.

However, this results in a large number of categories (73,350), which requires setting num_classes and num_training_classes to 73,350 during reparameterization fine-tuning.
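I suspect the 73,350 figure is inflated by duplicate phrases repeated across images, so one idea I am considering is deduplicating the phrases before passing them to generate_text_prompts. A rough sketch of that step is below; the file layout and key names (`images`, `phrases`) are my guesses about the Flickr30k-style annotation format, not the actual schema:

```python
# Hypothetical sketch: collect the unique phrases referenced by
# tokens_positive_eval in a Flickr30k-style annotation file, so the
# category list passed to generate_text_prompts has no duplicates.
# The JSON layout and key names below are assumptions.
import json

def unique_phrases(annotation_path):
    with open(annotation_path) as f:
        data = json.load(f)
    phrases = []
    seen = set()
    for image in data.get("images", []):
        for phrase in image.get("phrases", []):  # assumed key name
            text = phrase.strip().lower()
            if text and text not in seen:
                seen.add(text)
                phrases.append(text)  # keep first-seen order
    return phrases
```

After deduplication, num_classes would only need to cover the unique phrase count rather than every occurrence.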

As a beginner, I am unsure if this is the correct approach. Specifically, I have the following questions:

Is it appropriate to use tokens_positive_eval text as the input for generate_text_prompts to generate embeddings for a custom dataset?

Is setting num_classes and num_training_classes to 73,350 the correct way to handle such a large number of categories during fine-tuning?

Are there any best practices or recommended workflows for generating text embeddings and fine-tuning YOLO-World on custom datasets with a large number of categories?
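For reference, the embedding step I have in mind looks roughly like the sketch below: encode the category list in batches and save a single (num_classes, dim) array. `encode_batch` is a placeholder for whatever text encoder generate_text_prompts actually uses (e.g. a CLIP text model); the batch size and normalization are my assumptions:

```python
# Hypothetical sketch: batch-embed a long category list into one
# (num_classes, dim) float32 array. `encode_batch` is a stand-in for
# the real text encoder call; swap it for the actual model.
import numpy as np

def embed_categories(categories, encode_batch, batch_size=512):
    chunks = []
    for i in range(0, len(categories), batch_size):
        chunks.append(encode_batch(categories[i:i + batch_size]))
    embeds = np.concatenate(chunks, axis=0)
    # L2-normalize so dot products behave like cosine similarity
    embeds = embeds / np.linalg.norm(embeds, axis=1, keepdims=True)
    return embeds.astype(np.float32)
```

The resulting array could then be saved with np.save and pointed to from the fine-tuning config, but I am not sure this matches the intended workflow.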

Any guidance or suggestions would be greatly appreciated!

Thank you in advance for your help!
