This project is a personal exploration into building a Retrieval-Augmented Generation (RAG) application designed to assist IT help desk agents. The core idea is that many IT support tickets share similar underlying issues, and past solutions often hold valuable insights. This app aims to leverage a knowledge base of previously resolved tickets to help agents quickly navigate towards potential solutions for new, incoming tickets.
Essentially, this app acts as a "memory" of past tickets, allowing the agent to benefit from the collective experience of the help desk.
This project utilizes the following freely available tools:
- Embedding Model: sentence-transformers/all-MiniLM-L6-v2 (via Hugging Face) - For generating vector embeddings of text data.
- Generation Model: meta-llama/Llama-3.1-8B-Instruct (via Hugging Face) - A large language model for generating text based on retrieved information.
These tools were selected due to their accessibility, performance, and suitability for the task.
Hugging Face Resources:
- [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
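To make the pipeline concrete, here is a minimal sketch of how these two models fit together: resolved tickets are embedded with all-MiniLM-L6-v2, the most similar ones are retrieved for an incoming ticket, and Llama-3.1-8B-Instruct drafts a suggestion from them. This assumes a recent `huggingface_hub` version with `InferenceClient.chat_completion`; the function and variable names are illustrative and not the app's actual module layout.

```python
# Minimal sketch of the retrieve-then-generate flow (illustrative names, not the app's actual code).
from sentence_transformers import SentenceTransformer, util
from huggingface_hub import InferenceClient

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
llm = InferenceClient(model="meta-llama/Llama-3.1-8B-Instruct")  # picks up the HF token from your login/env

resolved_tickets = [
    "Ticket 101: VPN drops every 10 minutes. Fix: updated the VPN client to v5.2.",
    "Ticket 102: Outlook not syncing. Fix: recreated the mail profile.",
]

# Embed the knowledge base, then rank resolved tickets by similarity to the new ticket.
corpus_emb = embedder.encode(resolved_tickets, convert_to_tensor=True)
query = "User reports the VPN disconnects repeatedly during calls."
query_emb = embedder.encode(query, convert_to_tensor=True)
top_hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
context = "\n".join(resolved_tickets[hit["corpus_id"]] for hit in top_hits)

# Ask the LLM for a suggestion grounded in the retrieved tickets.
response = llm.chat_completion(
    messages=[{
        "role": "user",
        "content": f"Similar resolved tickets:\n{context}\n\nNew ticket: {query}\nSuggest next steps.",
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```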
Here's how you can set up and run this Mini RAG App:
Option 1: Without Docker
- Install Dependencies:

  `pip install -r requirements.txt`

  This command installs all the necessary Python packages listed in the `requirements.txt` file.
- Set Environment Variables:

  The app uses the Hugging Face Hub for the embedding and generation models. You will need to set your access token in the `.env` file, as well as the path to your data (an example `.env` is shown after these steps).
- Run the script:

  `./start_app.sh`

  This script starts the app and prints its output to the terminal.
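For reference, a minimal `.env` could look like the following. `HF_TOKEN` matches the variable used in the Docker instructions below; the data-path variable name is a placeholder here, so use whatever key the code actually reads.

```
# Example .env -- values and the data-path key are placeholders
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx
# Hypothetical variable name; match it to what the app expects for the data location
DATA_PATH=./data
```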
Option 2: Using Docker
- Build the Docker Image:

  `docker build -t mini-rag-app .`

  Do not forget to check the `data` folder path in the Dockerfile (a rough sketch follows these steps).
- Start the Docker container:

  `docker run -d --name mini-rag-app -p 8000:8000 -e HF_TOKEN="YOUR_TOKEN_HERE" mini-rag-app`
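To illustrate the `data` path note above, a Dockerfile for this kind of app usually copies the code and data into the image; the base image, paths, and entrypoint below are assumptions rather than the project's actual Dockerfile.

```dockerfile
# Illustrative sketch only -- check the project's real Dockerfile for the actual paths and entrypoint.
FROM python:3.11-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copies the code and the data folder into the image; adjust this if your
# ticket data lives outside the build context.
COPY . .

EXPOSE 8000
CMD ["./start_app.sh"]
```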
Interact with the App
- The app runs on http://127.0.0.1:8000 by default.
- Go to http://127.0.0.1:8000/docs/ to see the API endpoints and try them out.
- `retrieve_docs`: the endpoint for finding and returning the most relevant of the old tickets, given the query ticket ID.
- `get_help`: the endpoint for generating an LLM response to help with the given ticket ID, by looking at similar tickets among the resolved ones.
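The interactive `/docs` page is the easiest way to try the endpoints, but calling them from Python might look roughly like this. The HTTP method and the `ticket_id` parameter name are assumptions; check `/docs` for the actual request shape.

```python
import requests

BASE_URL = "http://127.0.0.1:8000"

# Assumed GET method and "ticket_id" parameter -- verify against the /docs page.
similar = requests.get(f"{BASE_URL}/retrieve_docs", params={"ticket_id": 42})
print(similar.json())  # most relevant resolved tickets for ticket 42

suggestion = requests.get(f"{BASE_URL}/get_help", params={"ticket_id": 42})
print(suggestion.json())  # LLM-generated suggestion based on the similar tickets
```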
This project is a personal exploration and not intended for production use. The focus is on demonstrating the concept rather than achieving a fully robust solution.
This is a prototype, and there are many avenues for future development:
- Large Language Model: The model originally intended for this task was `Llama-3.2-8B-Instruct`, which I could not use via the Inference API due to limitations. It would be logical to expect a jump in answer quality with a bigger model. Moreover, there are many other open-source alternatives to the Llama models.
- Vector Database: In a more realistic scenario, it would be better to set up a persistent vector DB to store the embeddings. That way, we would not need to compute the embeddings of every document on each request; we would only need to compute the embedding of the query document and let the vector DB retrieve the similar documents for us (see the sketch after this list).
- Embedding Model: Although the results from the embedding model are quite decent, with a vector DB in place we could use a bigger and better embedding model for retrieval, since the embeddings would only need to be computed once and then stored. According to the MTEB Leaderboard, `all-MiniLM-L6-v2` currently ranks in 141st place, so there is plenty of room for improvement here.
- Enhanced Search: Implementing more advanced search and retrieval techniques.
- Feedback Loop: Incorporating agent feedback to improve the quality of retrieved suggestions.
- User Interface: Developing a more polished user interface for agents to interact with.
- Scalability: Improving the system's ability to handle a much larger dataset of tickets.
- Data cleaning: Cleaning the provided dataset to ensure higher quality data.
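As a rough sketch of the vector database idea above, a persistent store such as Chroma (one possible choice; the paths, IDs, and collection name are made up) would let the ticket embeddings be computed once and reused across requests:

```python
# Sketch: persist ticket embeddings so only the incoming query is embedded per request.
# Library choice, paths, and names are illustrative, not part of this project.
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./vector_store")
tickets = client.get_or_create_collection("resolved_tickets")

# One-time (or incremental) indexing of resolved tickets.
docs = [
    "Ticket 101: VPN drops every 10 minutes. Fix: updated the VPN client to v5.2.",
    "Ticket 102: Outlook not syncing. Fix: recreated the mail profile.",
]
tickets.add(
    ids=["101", "102"],
    documents=docs,
    embeddings=embedder.encode(docs).tolist(),
)

# Per request: embed only the new ticket and query the stored index.
query = "User reports the VPN disconnects repeatedly during calls."
hits = tickets.query(query_embeddings=[embedder.encode(query).tolist()], n_results=2)
print(hits["documents"][0])  # the most similar resolved tickets
```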