wandbot v1.3 - There be changes ahead #85

Open. Wants to merge 145 commits into base `main`.

Commits:
4d23861
Upgrade to python 3.11, remove poetry for uv, move to lazy loading in…
morganmcg1 Dec 26, 2024
ed0b3d9
ruff fixes
morganmcg1 Dec 26, 2024
284ad1d
run black
morganmcg1 Dec 26, 2024
94adeca
update to weave.op
morganmcg1 Dec 26, 2024
324164e
Update pyproject.toml
morganmcg1 Dec 26, 2024
6cbabe3
remove poetry.lock
morganmcg1 Dec 26, 2024
960c2a4
add disk-usage route
morganmcg1 Dec 26, 2024
f9bceb2
fix disk usage
morganmcg1 Dec 26, 2024
25941a9
add no-user to installs
morganmcg1 Dec 26, 2024
47bc78b
simple, newer .replit file
morganmcg1 Dec 26, 2024
143a06c
Update README
morganmcg1 Dec 26, 2024
182c667
add clear pip cache
morganmcg1 Dec 26, 2024
8eae613
add disk usage increment logging
morganmcg1 Dec 26, 2024
5d40093
Add dotenv for better devX
morganmcg1 Dec 26, 2024
a520738
add bulid.sh debug disk usage logging
morganmcg1 Dec 26, 2024
380527d
add top 20 disk usage logging
morganmcg1 Dec 26, 2024
8d81f01
tidy up
morganmcg1 Dec 26, 2024
555d433
add wandb cache cleanup
morganmcg1 Dec 26, 2024
eaeedc5
readme
morganmcg1 Dec 26, 2024
825741e
Update readme
morganmcg1 Dec 27, 2024
a8cefd7
Add args and configs to eval script, update reqs, update readme
morganmcg1 Dec 27, 2024
eea41f0
update env var, rename main.py to eval
morganmcg1 Dec 27, 2024
e06ab67
Remove langchain-cohere
morganmcg1 Dec 27, 2024
230e0bf
disable feedback logging for now
morganmcg1 Dec 27, 2024
41f2bfb
Readme
morganmcg1 Dec 27, 2024
822ff1a
remove feedback
morganmcg1 Dec 31, 2024
a58d27a
remove poetry lock and eval deps
morganmcg1 Dec 31, 2024
b0893be
add eval_requiremnts.txt
morganmcg1 Dec 31, 2024
bf65a39
OpenHands: Add native chromadb implementation with optimized MMR search
openhands-agent Jan 1, 2025
8f47776
OpenHands: Switch to native chromadb with numpy 2.2.0 support
openhands-agent Jan 1, 2025
03fb31b
OpenHands: Update chromadb to 0.6.0 for numpy 2.2.0 compatibility
openhands-agent Jan 1, 2025
a3a1fdd
python 3.12, add retries for evals
morganmcg1 Jan 1, 2025
8567ba1
Merge branch 'make_wandbot_great_again' of https://github.com/wandb/w…
morganmcg1 Jan 1, 2025
17df3fb
update eval naming and logging
morganmcg1 Jan 2, 2025
dcdb23d
Add validation error retry to query enhancer chain
morganmcg1 Jan 2, 2025
914a1e9
update default index artifact, add config logging to evals
morganmcg1 Jan 2, 2025
6dc42da
remove emojis from disk usage message
morganmcg1 Jan 3, 2025
7c2d779
formatting
morganmcg1 Jan 3, 2025
b43c2d6
update readme
morganmcg1 Jan 3, 2025
cf8a633
modify similarity search in retrieval
morganmcg1 Jan 3, 2025
e0a54f4
Replace langchain-chroma with native ChromaDB implementation
openhands-agent Jan 4, 2025
6ace947
remove mistaken openhands chroma commit
morganmcg1 Jan 5, 2025
8c0ea03
rename chroma wrapper
morganmcg1 Jan 5, 2025
e91d81b
Fix native ChromaDB implementation to match langchain-chroma behavior
openhands-agent Jan 5, 2025
0ce71f5
remove langchain embeddings, add native embeddings models
morganmcg1 Jan 5, 2025
8b5b856
make EmbeddingsModel callable
morganmcg1 Jan 5, 2025
60c66f3
Centralise all configs
morganmcg1 Jan 6, 2025
4ad024c
fix configs
morganmcg1 Jan 6, 2025
66313fc
prompt updates
morganmcg1 Jan 6, 2025
61ddd45
update readme
morganmcg1 Jan 6, 2025
f6e4f7b
increase retries for query handler
morganmcg1 Jan 6, 2025
255cd3f
Add e2b dockerfile
morganmcg1 Jan 8, 2025
80278d4
update docker file
morganmcg1 Jan 8, 2025
0da7bdc
docker fixes
morganmcg1 Jan 8, 2025
b6fd1d6
Update readme and dockerfile
morganmcg1 Jan 9, 2025
d52c23d
add docker temp dir clearnup
morganmcg1 Jan 9, 2025
c002c88
Use code interpreter with python 3.12
jakubno Jan 9, 2025
67a7005
Merge pull request #87 from jakubno/make_wandbot_great_again
morganmcg1 Jan 9, 2025
a3d0714
fix retry in query handler
morganmcg1 Jan 9, 2025
51afeb4
tidy up configs app routes, tidy up web search
morganmcg1 Jan 9, 2025
3aa0103
Add async methods through entire RAG pipeline
morganmcg1 Jan 12, 2025
b56e0eb
fix up evals script
morganmcg1 Jan 12, 2025
8160686
evals import fix
morganmcg1 Jan 12, 2025
700efa0
evals config updates
morganmcg1 Jan 13, 2025
c756f8d
gitignire
morganmcg1 Jan 13, 2025
ffcdc74
modify QueryEnhancer prompts
morganmcg1 Jan 15, 2025
6d93e4f
Eval: pass experiment name to wandbot call; update eval config import
morganmcg1 Jan 16, 2025
d50618e
tidy up response synthesis args and update readme
morganmcg1 Jan 19, 2025
174d9f8
commit for now
morganmcg1 Jan 19, 2025
1ce0f59
undo QueryEnhancer prompt changes for eval
morganmcg1 Jan 19, 2025
12d0b9a
update index to chroma v34 for eval
morganmcg1 Jan 19, 2025
c9a205c
fix app to eval config
morganmcg1 Jan 19, 2025
41dafc5
quieten disk usage logs on startup
morganmcg1 Jan 19, 2025
96a58f2
Add de-dupe of retrieved contexts before re-ranking
morganmcg1 Jan 19, 2025
4dca364
Experiment: try fetch_k = 20
morganmcg1 Jan 20, 2025
6ddc48a
Downgrade some requirements for old compatibility
morganmcg1 Jan 20, 2025
b1e45cf
Adds float or base64 encoding option to EmbeddingModel, set config to…
morganmcg1 Jan 20, 2025
417df7c
actually switch embedding encoding format to base64, update eval.py f…
morganmcg1 Jan 20, 2025
212d32e
change search type from mmr to similarity
morganmcg1 Jan 20, 2025
ef2111e
Tidy up retriever and configs
morganmcg1 Jan 22, 2025
4bbe178
Implement hacky MMR for langchain MMR equivlancy
morganmcg1 Jan 22, 2025
c8b4648
Decompose EmbeddingModel into provider-specific classes
morganmcg1 Jan 22, 2025
cf4636b
fix EmbeddingCall
morganmcg1 Jan 22, 2025
dcc379a
Return to fetch_k 60 for experiment.
morganmcg1 Jan 22, 2025
c64a382
Update config naming
morganmcg1 Jan 22, 2025
8b7d067
Add rich logging
morganmcg1 Jan 24, 2025
8d93cc3
Remove llamaindex, langchain and Raga eval dependencies
morganmcg1 Jan 24, 2025
4f0fab1
remove llama_index, ragas package deps
morganmcg1 Jan 24, 2025
73da804
base64 handling in EmbeddingModel
morganmcg1 Jan 24, 2025
0582a58
silence prints in hacky mmr
morganmcg1 Jan 24, 2025
840427d
Add Evaluation tests
morganmcg1 Jan 24, 2025
a28274f
Re-organise evaluation folder, keep relevancy, faithfulness prompts
morganmcg1 Jan 24, 2025
7777783
Update rich console log style
morganmcg1 Jan 24, 2025
999e6b3
Tidy up evaluation output format
morganmcg1 Jan 24, 2025
f58621e
Add LLMModel class to swap out model providers
morganmcg1 Jan 24, 2025
5655d1c
Fix up LLLModel and correctness evaluator
morganmcg1 Jan 24, 2025
9ba48b8
Better Evaluation and LLM parsing and error handling
morganmcg1 Jan 24, 2025
8e13842
Switch eval prompt back to use "reason"
morganmcg1 Jan 25, 2025
89f7a90
add anthropic to requirements.txt
morganmcg1 Jan 25, 2025
5b470cd
Remove langchain from QueryEnhancer
morganmcg1 Jan 25, 2025
3fceb99
Fix anthropic response_model parsing
morganmcg1 Jan 25, 2025
9781cfa
modify eval message
morganmcg1 Jan 25, 2025
b235d90
Update QeuryEnhancer and tests
morganmcg1 Jan 25, 2025
25e4d35
Update tests
morganmcg1 Jan 25, 2025
7545e37
update .gitignore
morganmcg1 Jan 25, 2025
bc093d7
Add more logging for api call error
morganmcg1 Jan 25, 2025
4157c7a
API monitoring: Add retries to config, add error propagation from api…
morganmcg1 Jan 28, 2025
8c95706
Add addition error propagation to tests
morganmcg1 Jan 28, 2025
34a8cc0
Replace langchain Document with native pydantic model
morganmcg1 Jan 28, 2025
9277ef0
add ApiStatus to unify api error handling
morganmcg1 Jan 28, 2025
809ae9f
update pydantic settings for configs
morganmcg1 Jan 28, 2025
fbe2246
Update pydantic configs
morganmcg1 Jan 28, 2025
964953f
Update tests, add embeddings model tests
morganmcg1 Jan 28, 2025
4b063fd
Update pydantic configs
morganmcg1 Jan 28, 2025
20a3fb0
Add .cursor rules
morganmcg1 Jan 28, 2025
a1c4d0b
Update query enhancer api status reporting, add RetrievalResult, add …
morganmcg1 Jan 28, 2025
d71bbad
Emit api status and response synthesis llm messages to evaluator
morganmcg1 Jan 28, 2025
8b67660
Modify query embedding and retrieval
morganmcg1 Jan 28, 2025
4ce60ae
fetch_k=20 (from 60), deleted old chat config
morganmcg1 Jan 29, 2025
311d9d2
Fix web search api status
morganmcg1 Jan 29, 2025
3b9fd86
Adust passing around of configs and params
morganmcg1 Jan 29, 2025
8112ab5
Update .cursorrules
morganmcg1 Jan 30, 2025
97f6e59
Add logging of query in debug_run_mmr_batch function input
morganmcg1 Jan 30, 2025
7d77cfc
add query to debug_run_mmr_batch 2
morganmcg1 Jan 30, 2025
0dd1428
Update weave requirement and README
morganmcg1 Jan 30, 2025
ef3e1eb
Update README
morganmcg1 Jan 30, 2025
16fa1a1
fetch_k-20
morganmcg1 Jan 30, 2025
f7324e2
Remove re-ranker fallback and instead raise an exception
morganmcg1 Jan 30, 2025
2974bc4
Use cha_config instead of search_paras in retriever. Bring vectorstor…
morganmcg1 Jan 30, 2025
3720348
fix _async_retrieve signature to not expect search_params
morganmcg1 Jan 30, 2025
7769dbf
fix up .retrieve call to only take query_texts
morganmcg1 Jan 30, 2025
28fbbfa
fix rereanker error handling
morganmcg1 Jan 30, 2025
52d24ee
Update logger message
morganmcg1 Jan 30, 2025
ee9cd10
Modify re-ranker logger message
morganmcg1 Jan 30, 2025
4fc1ab9
Add improved reranker retry to use config values
morganmcg1 Jan 30, 2025
2924568
adjust chat_config type
morganmcg1 Jan 30, 2025
86982bf
chat_config types
morganmcg1 Jan 30, 2025
f5ff831
fix up retry configs
morganmcg1 Jan 30, 2025
abba225
Update chat.py logging and docstring
morganmcg1 Jan 30, 2025
923a860
Fix tests
morganmcg1 Jan 30, 2025
921aa0d
small correctness file cleanups
morganmcg1 Jan 30, 2025
9ab8c61
Clean up MMR implementation part 1
morganmcg1 Jan 30, 2025
a360d0e
Tidy up MMR part 2
morganmcg1 Jan 31, 2025
f849b7e
Tody up logging
morganmcg1 Jan 31, 2025
659c13d
Raise error if reranker fails, increase retry wait, add checks for ty…
morganmcg1 Jan 31, 2025
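Several of the commits above ("Implement hacky MMR for langchain MMR equivlancy", "Clean up MMR implementation part 1/2") reimplement maximal marginal relevance natively. For reference, here is a minimal sketch of the MMR selection loop; the function name and signature are illustrative, not wandbot's actual API:

```python
def mmr(query_sim, doc_sims, k, lambda_mult=0.5):
    """Select k documents balancing relevance against redundancy.

    query_sim[i]: similarity of doc i to the query.
    doc_sims[i][j]: similarity between docs i and j.
    Returns the indices of the k selected documents, in selection order.
    """
    selected = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the max similarity to anything already selected.
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lambda_mult * query_sim[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lambda_mult=1.0` this degenerates to plain similarity ranking, which mirrors the experiments in the commits above that switch between `mmr` and `similarity` search types.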
35 changes: 35 additions & 0 deletions .cursorrules
@@ -0,0 +1,35 @@
When running tests in this codebase:

# Testing
1. Use the following pytest flags to prevent early exit issues and ensure complete test output:
```bash
python -m pytest tests/ -v --tb=short --capture=tee-sys
```

2. These flags help in the following ways:
- `-v`: Verbose output
- `--tb=short`: Short traceback format
- `--capture=tee-sys`: Proper output capture that prevents early termination

3. This is particularly important for async tests and tests involving API calls or event loops.

4. If you need to debug a specific test, you can run it in isolation:
```bash
python -m pytest tests/path_to_test.py::test_name -v --tb=short --capture=tee-sys
```

Remember to use these flags when running tests to ensure reliable test execution and complete output.

# Using Weave to analyze logged inputs and outputs

The Weave api can be used to analyze logged inputs and outputs. Here is an example of iterating over the
input documents to this call and extracting the ids.

Search the Weave documentation for more information on how to use the Weave api.

```python
import weave
client = weave.init("wandbot/wandbot-dev")
candidate_call = client.get_call("0194b427-ba78-77f3-9989-222419262817")
final_candidate_ids = [doc.metadata["id"] for doc in candidate_call.inputs["inputs"].documents]
```
10 changes: 10 additions & 0 deletions .gitignore
@@ -1,11 +1,19 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.
*.py[cod]
*$py.class

temp_index/
e2b*

# C extensions
*.so

testing_*.py
testing_*.ipynb
temp_*

# Distribution / packaging
.Python
build/
@@ -105,6 +113,8 @@ celerybeat.pid
# Environments
.env
.venv
wandbot_venv/
3-10_env/
env/
venv/
ENV/
19 changes: 8 additions & 11 deletions .replit
@@ -1,19 +1,16 @@
run = "bash run.sh"
entrypoint = "main.py"
modules = ["python-3.10:v18-20230807-322e88b"]
modules = ["python-3.12"]

disableInstallBeforeRun = true
[nix]
channel = "stable-24_05"

hidden = [".pythonlibs"]
[unitTest]
language = "python3"

[nix]
channel = "stable-23_05"
[gitHubImport]
requiredFiles = [".replit", "replit.nix"]

[deployment]
run = ["sh", "-c", "bash run.sh"]
build = ["sh", "-c", "bash build.sh"]
deploymentTarget = "gce"

[[ports]]
localPort=8000
externalPort=80
build = ["sh", "-c", "bash build.sh"]
202 changes: 138 additions & 64 deletions README.md
@@ -1,9 +1,24 @@
# wandbot

Wandbot is a question-answering bot designed specifically for Weights & Biases [documentation](https://docs.wandb.ai/).
WandBot is a question-answering bot designed specifically for Weights & Biases Models and Weave [documentation](https://docs.wandb.ai/).

## What's New

### wandbot v1.3.0
**New:**

- **Move to uv for package management**: Installs and dependency checks cut down from minutes to seconds
- **Support python 3.12 on replit**
- **Move to lazy loading in app.py to help with startup**: Replit app deployments can't seem to handle the delay from loading the app, despite attempting async or background tasks
- **Add wandb artifacts cache cleanup**: Saved 1.2GB of disk space
- **Turn off web search**: Currently we don't have a web search provider to use.
- **Refactored EvalConfig and evals script**: Switched config to using simple_parsing for free cli arguments. Added n_trials, debug mode. Undid hardcoding of ja weave eval dataset.
- **Removed langchain-cohere**: Started hitting dependency errors, removed it in favor of raw cohere client.
- **wandb Tables Feedback logging disabled in prep for Weave feedback**
- **Small formatting updates for weave.op**
- **Add dotenv in app.py for easy env var loads**
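
The EvalConfig refactor above exposes config fields as CLI flags via simple_parsing. As a rough illustration of that pattern using only the standard library (the field names echo flags used elsewhere in this README, but the real config lives in `src/wandbot/evaluation/config.py` and this is not its actual implementation):

```python
import argparse
from dataclasses import dataclass, fields

@dataclass
class EvalConfig:
    n_trials: int = 3   # times each sample is evaluated
    debug: bool = False # evaluate only a few samples
    lang: str = "en"    # "en" or "ja"

def parse_config(argv=None) -> EvalConfig:
    # Generate one CLI flag per dataclass field, like simple_parsing does.
    parser = argparse.ArgumentParser()
    for f in fields(EvalConfig):
        if f.type is bool:
            parser.add_argument(f"--{f.name}", action="store_true")
        else:
            parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    args = parser.parse_args(argv)
    return EvalConfig(**vars(args))

config = parse_config(["--n_trials", "1", "--debug"])
# config.n_trials == 1, config.debug is True, config.lang == "en"
```

The benefit is that adding a field to the dataclass automatically yields a matching CLI argument, which is what "free cli arguments" refers to.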


### wandbot v1.2.0

This release introduces a number of exciting updates and improvements:
@@ -35,76 +50,69 @@ Japanese

## Features

- Wandbot employs Retrieval Augmented Generation with a ChromaDB backend, ensuring efficient and accurate responses to user queries by retrieving relevant documents.
- WandBot uses:
- a local ChromaDB vector store
- OpenAI's v3 embeddings
- GPT-4 for query enhancement and response synthesis
- Cohere's re-ranking model
- It features periodic data ingestion and report generation, contributing to the bot's continuous improvement. You can view the latest data ingestion report [here](https://wandb.ai/wandbot/wandbot-dev/reportlist).
- The bot is integrated with Discord and Slack, facilitating seamless integration with these popular collaboration platforms.
- Performance monitoring and continuous improvement are made possible through logging and analysis with Weights & Biases Tables. Visit the workspace for more details [here](https://wandb.ai/wandbot/wandbot_public).
- Wandbot has a fallback mechanism for model selection, which is used when GPT-4 fails to generate a response.
- The bot's performance is evaluated using a mix of metrics, including retrieval accuracy, string similarity, and the correctness of model-generated responses.
- Curious about the custom system prompt used by the bot? You can view the full prompt [here](data/prompts/chat_prompt.json).
- Performance monitoring and continuous improvement are made possible through logging and analysis with Weights & Biases Weave
- Has a fallback mechanism for model selection

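The retrieval flow implied by the features above (query enhancement, vector-store retrieval, de-duplication, re-ranking, response synthesis) can be sketched roughly as follows. All function names and the `Document` shape are illustrative stand-ins, not wandbot's actual interfaces; the real system uses ChromaDB, OpenAI, and Cohere clients:

```python
from dataclasses import dataclass

@dataclass
class Document:
    id: str
    text: str
    score: float = 0.0

def deduplicate(docs):
    # Keep the first occurrence of each document id before re-ranking.
    seen, unique = set(), []
    for doc in docs:
        if doc.id not in seen:
            seen.add(doc.id)
            unique.append(doc)
    return unique

def answer(question, retrieve, rerank, synthesize, enhance, top_k=5):
    enhanced = enhance(question)                 # query enhancement (LLM)
    candidates = retrieve(enhanced)              # vector-store search
    candidates = deduplicate(candidates)         # de-dupe retrieved contexts
    ranked = rerank(enhanced, candidates)        # re-ranking model
    return synthesize(question, ranked[:top_k])  # response synthesis (LLM)
```

The de-duplication step corresponds to the "Add de-dupe of retrieved contexts before re-ranking" change in this PR.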
## Installation

The project is built with Python version `>=3.10.0,<3.11` and utilizes [poetry](https://python-poetry.org/) for managing dependencies. Follow the steps below to install the necessary dependencies:
The project is built with Python version `3.12` and utilizes `uv` for dependency management. Follow the steps below to install the necessary dependencies:

```bash
git clone git@github.com:wandb/wandbot.git
cd wandbot
bash build.sh
```

## Usage

### Running WandBot

Before running the Q&A bot, ensure the following environment variables are set:

```bash
OPENAI_API_KEY
COHERE_API_KEY
WANDB_API_KEY
WANDBOT_API_URL="http://localhost:8000"
WANDB_TRACING_ENABLED="true"
LOG_LEVEL=INFO
WANDB_PROJECT="wandbot-dev"
WANDB_ENTITY= <your W&B entity>

```

If you're running the slack or discord apps you'll also need the following keys/tokens set as env vars:

```
SLACK_EN_APP_TOKEN
SLACK_EN_BOT_TOKEN
SLACK_EN_SIGNING_SECRET
SLACK_JA_APP_TOKEN
SLACK_JA_BOT_TOKEN
SLACK_JA_SIGNING_SECRET
DISCORD_BOT_TOKEN
```
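
wandbot v1.3 loads these variables from a `.env` file via dotenv in app.py. A minimal standard-library sketch of what that loading does (the real code uses the python-dotenv package, not this function):

```python
import os

def load_env_file(path=".env"):
    # Stand-in for python-dotenv's load_dotenv: parse KEY=VALUE lines,
    # skip comments and blanks, and export the pairs into os.environ
    # without overriding variables that are already set.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```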

Then build the app to install all dependencies in a virtual env.

```
bash build.sh
```

Start the Q&A bot application using the following commands:

```bash
bash run.sh
```

Then call the endpoint to trigger the final wandbot app initialisation:
```bash
curl http://localhost:8000/startup
```

For more detailed instructions on installing and running the bot, please refer to the [run.sh](./run.sh) file located in the root of the repository.
@@ -113,44 +121,110 @@ Executing these commands will launch the API, Slackbot, and Discord bot applications.

### Running the Evaluation pipeline

Make sure to set the required environment variables in your terminal.
**Eval Config**

Modify the evaluation config file here: `wandbot/src/wandbot/evaluation/config.py`

- `evaluation_strategy_name`: attribute name in the Weave Evaluation dashboard
- `eval_dataset`:
  - [Latest English evaluation dataset](https://wandb.ai/wandbot/wandbot-eval/weave/datasets?peekPath=%2Fwandbot%2Fwandbot-eval%2Fobjects%2Fwandbot_eval_data%2Fversions%2FeCQQ0GjM077wi4ykTWYhLPRpuGIaXbMwUGEB7IyHlFU%3F%26): "weave:///wandbot/wandbot-eval/object/wandbot_eval_data:eCQQ0GjM077wi4ykTWYhLPRpuGIaXbMwUGEB7IyHlFU"
  - [Latest Japanese evaluation dataset](https://wandb.ai/wandbot/wandbot-eval-jp/weave/datasets?peekPath=%2Fwandbot%2Fwandbot-eval-jp%2Fobjects%2Fwandbot_eval_data_jp%2Fversions%2FoCWifIAtEVCkSjushP0bOEc5GnhsMUYXURwQznBeKLA%3F%26): "weave:///wandbot/wandbot-eval-jp/object/wandbot_eval_data_jp:oCWifIAtEVCkSjushP0bOEc5GnhsMUYXURwQznBeKLA"
- `eval_judge_model`: model used for the judge
- `wandb_entity`: wandb entity name for record
- `wandb_project`: wandb project name for record

**Dependencies**

Ensure wandbot is installed by installing the production dependencies, activate the virtual env that was created, and then install the evaluation dependencies:

```
bash build.sh
source wandbot_venv/bin/activate
uv pip install -r eval_requirements.txt
```

**Environment variables**

Make sure to set the environment variables (i.e. LLM provider keys etc.) from the `.env` file, e.g. `set -o allexport; source .env; set +o allexport`.

**Launch the wandbot app**
You can use either `uvicorn` or `gunicorn` to launch N workers so that eval requests can be served in parallel. Note that Weave Evaluations also limit the number of parallel calls made, set via the `WEAVE_PARALLELISM` env variable, which is set further down in the `eval.py` file using the `n_weave_parallelism` flag. Launch wandbot with 8 workers for faster evaluation. The `WANDBOT_FULL_INIT` env var triggers the full wandbot app initialization.

`uvicorn`
```bash
WANDBOT_FULL_INIT=1 uvicorn wandbot.api.app:app \
--host 0.0.0.0 \
--port 8000 \
--workers 8 \
--timeout-keep-alive 75 \
--loop uvloop \
--http httptools
```

Alternatively, you can run wandbot with `gunicorn`:

```bash
WANDBOT_FULL_INIT=1 \
./wandbot_venv/bin/gunicorn wandbot.api.app:app \
--preload \
--bind 0.0.0.0:8000 \
--timeout=200 \
--workers=20 \
--worker-class uvicorn.workers.UvicornWorker
```

Testing: you can check that the app is running correctly by making a request to the `chat/query` endpoint; you should receive a response payload back from wandbot after 30 to 90 seconds:

```bash
curl -X POST \
http://localhost:8000/chat/query \
-H 'Content-Type: application/json' \
-d '{"question": "How do I log a W&B artifact?"}'
```

**Debugging**
For debugging purposes during evaluation you can run a single instance of the app by changing the `uvicorn` command above to use `--workers 1`.

**Run the evaluation**

Launch the W&B Weave evaluation from the root `wandbot` directory. Ensure that your virtual environment is active. By default, each sample is evaluated 3 times in order to account for both the stochasticity of wandbot and our LLM judge. For debugging, pass the `--debug` flag to only evaluate a small number of samples. To adjust the number of parallel evaluation calls Weave makes, use the `--n_weave_parallelism` flag when calling `eval.py`.

Set up for evaluation
```
source wandbot_venv/bin/activate

python src/wandbot/evaluation/weave_eval/eval.py
```

Debugging: run evals on only 1 sample and for 1 trial:

```
python src/wandbot/evaluation/weave_eval/eval.py --debug --n_debug_samples=1 --n_trials=1
```

Evaluate on Japanese dataset:

```
python src/wandbot/evaluation/weave_eval/eval.py --lang ja
```

To only evaluate each sample once:

```
python src/wandbot/evaluation/weave_eval/eval.py --n_trials 1
```

## Overview of the Implementation

1. Creating Document Embeddings with ChromaDB
2. Constructing the Q&A RAG Pipeline
3. Selection of Models and Implementation of Fallback Mechanism
4. Deployment of the Q&A Bot on FastAPI, Discord, and Slack
5. Utilizing Weights & Biases Tables for Logging and Analysis
6. Evaluating the Performance of the Q&A Bot


### Data Ingestion

You can monitor the usage of the bot in the following project:
https://wandb.ai/wandbot/wandbot_public
The data ingestion module pulls code and markdown from the Weights & Biases repositories [docodile](https://github.com/wandb/docodile) and [examples](https://github.com/wandb/examples) and ingests them into vectorstores for the retrieval augmented generation pipeline.
To ingest the data, run the following command from the root of the repository:

```bash
python -m wandbot.ingestion
```

You will notice that the data is ingested into the `data/cache` directory and stored across directories such as `raw_data` and `vectorstore`, with individual files for each step of the ingestion process.

These datasets are also stored as wandb artifacts in the project defined in the environment variable `WANDB_PROJECT` and can be accessed from the [wandb dashboard](https://wandb.ai/wandb/wandbot-dev).