Welcome to my advanced multimodal chatbot! Here's a breakdown of what it can do and how to get it up and running.
- ChatGPT-like interaction: The chatbot can act as a general-purpose AI assistant.
- RAG (Retrieval Augmented Generation) capabilities: The chatbot can perform RAG in three different ways:
  - With preprocessed documents
  - With documents that the user uploads while using the chatbot
  - With any website that the user requests
- Image generation: The chatbot uses a Stable Diffusion model to generate images.
- Image understanding: The chatbot uses the LLaVA model to interpret the content of images and answer the user's questions about them.
- DuckDuckGo integration: Access the DuckDuckGo search engine to provide answers based on search results when needed.
- Summarization: Summarize website content or documents upon user request.
- Text and voice interaction: Interact with the chatbot through both text and voice inputs.
- Memory: The GPT models in the chatbot also have access to session memory (the user's previous queries during the current session).
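To make the memory feature more concrete, here is a minimal sketch of how previous queries from the current session could be kept and turned into prompt context. The `SessionMemory` class, its method names, and the `max_queries` limit are hypothetical and for illustration only; the project's actual memory implementation may differ.

```python
from collections import deque


class SessionMemory:
    """Keeps the most recent user queries for the current session only."""

    def __init__(self, max_queries: int = 5) -> None:
        # max_queries is an illustrative limit, not a value taken from the project.
        self._queries = deque(maxlen=max_queries)

    def add(self, query: str) -> None:
        """Remember a query; the oldest one is dropped once the limit is reached."""
        self._queries.append(query)

    def as_prompt_context(self) -> str:
        """Format the remembered queries as text that can be prepended to a prompt."""
        if not self._queries:
            return ""
        lines = [f"{i}. {q}" for i, q in enumerate(self._queries, start=1)]
        return "Previous user queries:\n" + "\n".join(lines)


memory = SessionMemory(max_queries=2)
memory.add("What is RAG?")
memory.add("Summarize the uploaded PDF.")
memory.add("Generate an image of a lighthouse.")
print(memory.as_prompt_context())
```

With a limit of two, only the last two queries are printed, which shows how a bounded memory keeps the prompt from growing without limit.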
NOTE: This chatbot includes both the RAG-GPT and WebRAGQuery projects.
YouTube video: Link
- LLM chains and agents
- GPT function calling
- Retrieval Augmented Generation (RAG)
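As an illustration of GPT function calling, the sketch below shows a tool description in the JSON shape used by the OpenAI chat completions `tools` parameter, plus a small dispatcher that runs the function the model requests. The tool name `search_duckduckgo`, its placeholder body, and the `dispatch_tool_call` helper are hypothetical; they are not taken from this project's code.

```python
import json


def search_duckduckgo(query: str) -> str:
    """Hypothetical tool. A real implementation would call a search client here."""
    return f"(search results for: {query})"


# Tool description sent to the model so it knows what it may call.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_duckduckgo",
            "description": "Search the web and return a short text summary.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query."}
                },
                "required": ["query"],
            },
        },
    }
]

# Maps tool names to the Python callables that implement them.
REGISTRY = {"search_duckduckgo": search_duckduckgo}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the requested tool. The model supplies its arguments as a JSON string."""
    if name not in REGISTRY:
        raise ValueError(f"Unknown tool: {name}")
    kwargs = json.loads(arguments_json)
    return REGISTRY[name](**kwargs)


print(dispatch_tool_call("search_duckduckgo", '{"query": "latest Gradio release"}'))
```

In a full loop, the string returned by the dispatcher would be sent back to the model as a tool message so it can compose the final answer.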
- GPT-4: Website
- text-embedding-ada-002: Website
- llava-hf/llava-v1.6-mistral-7b-hf: Code - Demo - Website - Models
- stabilityai/stable-diffusion-xl-base-1.0: Website
- openai/whisper-base.en: Website
- Operating System: Linux OS or Windows Subsystem for Linux (WSL).
- GPU VRAM: Minimum 15 GB for full execution.
- OpenAI or Azure OpenAI Credentials: Required for GPT functionality.
- Ensure you have Python installed along with required dependencies.
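The VRAM requirement above can be checked before starting the services. The following is a rough sketch that assumes an NVIDIA GPU with `nvidia-smi` on the `PATH`, and assumes the 15 GB must be available on a single GPU (the requirement does not say whether memory may be split across devices). The helper names are hypothetical.

```python
import shutil
import subprocess

MIN_VRAM_GB = 15  # minimum stated in the requirements above


def parse_vram_mib(nvidia_smi_output: str) -> list[int]:
    """Parse one total-memory value in MiB per line (one line per GPU)."""
    values = []
    for line in nvidia_smi_output.splitlines():
        text = line.strip()
        if text.isdigit():  # skip blank lines and non-numeric values such as "[N/A]"
            values.append(int(text))
    return values


def has_enough_vram(vram_mib: list[int], min_gb: float = MIN_VRAM_GB) -> bool:
    """True if at least one GPU reports min_gb or more (1 GB treated as 1024 MiB)."""
    return any(mib >= min_gb * 1024 for mib in vram_mib)


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for each GPU's total memory; returns [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_vram_mib(result.stdout) if result.returncode == 0 else []


if not has_enough_vram(query_vram_mib()):
    print(f"Warning: no GPU with at least {MIN_VRAM_GB} GB of VRAM was detected.")
```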
sudo apt update && sudo apt upgrade
python3 -m venv chatbot-env
git clone <the repository>
cd multimodal-chatbot
source ...Path to the environment/chatbot-env/bin/activate
pip install -r requirements.txt
- No need to download model weights separately; all models are accessed directly from the Hugging Face Hub.
To prepare documents for RAG, copy the PDF files to the data/docs directory and execute:
python src/prepare_vectordb_from_docs.py
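Before running the preparation script, it can help to confirm that the PDFs are where it expects them. This is a small sketch that lists the PDF files in `data/docs`; it assumes the files sit directly in that directory rather than in subfolders, which the script itself may or may not require. The `list_pdfs` helper is hypothetical.

```python
from pathlib import Path


def list_pdfs(docs_dir: str = "data/docs") -> list[Path]:
    """Return the PDF files directly inside docs_dir, sorted by name."""
    directory = Path(docs_dir)
    if not directory.is_dir():
        return []
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


pdfs = list_pdfs()
if pdfs:
    print(f"Found {len(pdfs)} PDF file(s) ready for indexing:")
    for pdf in pdfs:
        print(f"  {pdf.name}")
else:
    print("No PDF files found in data/docs; copy your documents there first.")
```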
Run the provided script:
./run_chatbot.sh
Visit http://127.0.0.1:7860 in your web browser after executing the command.
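Loading the models can take a while, so the page may not respond immediately. As a sketch, the helper below polls the address above until something accepts TCP connections on it. The `wait_for_port` function and its timeout values are illustrative, and a successful connection only shows that the port is open, not that every backing service has finished loading.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 7860,
                  timeout_s: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not up yet; retry shortly
    return False
```

For example, `wait_for_port(timeout_s=300)` returns `True` as soon as the interface is reachable and `False` if it is still unreachable after five minutes.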
To stop the chatbot, kill its tmux session:
tmux kill-session -t chatbot
To run the services manually instead of using the script, start each one in its own terminal.
Terminal One: RAG Reference Service
python src/utils/web_servers/rag_reference_service.py
Terminal Two: LLaVA Service
python src/utils/web_servers/llava_service.py
Terminal Three: Stable Diffusion Service
python src/utils/web_servers/sdxl_service.py
Terminal Four: Speech-to-Text Service
python src/utils/web_servers/stt_service.py
Terminal Five: Chatbot Interface
python src/app.py
or
gradio src/app.py
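If opening five terminals is inconvenient, the same commands can be started from one Python process. The following is a sketch only: it assumes the script is run from the repository root inside the activated environment, and that the services need no particular startup order or delay, which is not confirmed here. The helper names are hypothetical.

```python
import subprocess
import sys

# Script paths copied from the manual steps above.
SERVICE_SCRIPTS = [
    "src/utils/web_servers/rag_reference_service.py",
    "src/utils/web_servers/llava_service.py",
    "src/utils/web_servers/sdxl_service.py",
    "src/utils/web_servers/stt_service.py",
]
APP_SCRIPT = "src/app.py"


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Return one command per service, followed by the chatbot interface."""
    return [[python, script] for script in SERVICE_SCRIPTS + [APP_SCRIPT]]


def launch_all() -> list[subprocess.Popen]:
    """Start every command as a child process and return the process handles."""
    return [subprocess.Popen(cmd) for cmd in build_commands()]


def stop_all(processes: list[subprocess.Popen]) -> None:
    """Ask every child process to terminate, then wait for each to exit."""
    for proc in processes:
        proc.terminate()
    for proc in processes:
        proc.wait()
```

A typical use would be `procs = launch_all()`, followed by `stop_all(procs)` when finished. Unlike separate terminals, all services share one output stream here, so their logs are interleaved.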
- LangChain: Introduction
- DuckDuckGo search engine: Documentation
- Gradio: Documentation
- OpenAI: Developer quickstart
- Transformers: Documentation
- chromadb: Documentation
- bs4: Documentation