wandbot v1.3 - There be changes ahead #85

Open. Wants to merge 145 commits into base `main`.

Commits:
4d23861
Upgrade to python 3.11, remove poetry for uv, move to lazy loading in…
morganmcg1 Dec 26, 2024
ed0b3d9
ruff fixes
morganmcg1 Dec 26, 2024
284ad1d
run black
morganmcg1 Dec 26, 2024
94adeca
update to weave.op
morganmcg1 Dec 26, 2024
324164e
Update pyproject.toml
morganmcg1 Dec 26, 2024
6cbabe3
remove poetry.lock
morganmcg1 Dec 26, 2024
960c2a4
add disk-usage route
morganmcg1 Dec 26, 2024
f9bceb2
fix disk usage
morganmcg1 Dec 26, 2024
25941a9
add no-user to installs
morganmcg1 Dec 26, 2024
47bc78b
simple, newer .replit file
morganmcg1 Dec 26, 2024
143a06c
Update README
morganmcg1 Dec 26, 2024
182c667
add clear pip cache
morganmcg1 Dec 26, 2024
8eae613
add disk usage increment logging
morganmcg1 Dec 26, 2024
5d40093
Add dotenv for better devX
morganmcg1 Dec 26, 2024
a520738
add bulid.sh debug disk usage logging
morganmcg1 Dec 26, 2024
380527d
add top 20 disk usage logging
morganmcg1 Dec 26, 2024
8d81f01
tidy up
morganmcg1 Dec 26, 2024
555d433
add wandb cache cleanup
morganmcg1 Dec 26, 2024
eaeedc5
readme
morganmcg1 Dec 26, 2024
825741e
Update readme
morganmcg1 Dec 27, 2024
a8cefd7
Add args and configs to eval script, update reqs, update readme
morganmcg1 Dec 27, 2024
eea41f0
update env var, rename main.py to eval
morganmcg1 Dec 27, 2024
e06ab67
Remove langchain-cohere
morganmcg1 Dec 27, 2024
230e0bf
disable feedback logging for now
morganmcg1 Dec 27, 2024
41f2bfb
Readme
morganmcg1 Dec 27, 2024
822ff1a
remove feedback
morganmcg1 Dec 31, 2024
a58d27a
remove poetry lock and eval deps
morganmcg1 Dec 31, 2024
b0893be
add eval_requiremnts.txt
morganmcg1 Dec 31, 2024
bf65a39
OpenHands: Add native chromadb implementation with optimized MMR search
openhands-agent Jan 1, 2025
8f47776
OpenHands: Switch to native chromadb with numpy 2.2.0 support
openhands-agent Jan 1, 2025
03fb31b
OpenHands: Update chromadb to 0.6.0 for numpy 2.2.0 compatibility
openhands-agent Jan 1, 2025
a3a1fdd
python 3.12, add retries for evals
morganmcg1 Jan 1, 2025
8567ba1
Merge branch 'make_wandbot_great_again' of https://github.com/wandb/w…
morganmcg1 Jan 1, 2025
17df3fb
update eval naming and logging
morganmcg1 Jan 2, 2025
dcdb23d
Add validation error retry to query enhancer chain
morganmcg1 Jan 2, 2025
914a1e9
update default index artifact, add config logging to evals
morganmcg1 Jan 2, 2025
6dc42da
remove emojis from disk usage message
morganmcg1 Jan 3, 2025
7c2d779
formatting
morganmcg1 Jan 3, 2025
b43c2d6
update readme
morganmcg1 Jan 3, 2025
cf8a633
modify similarity search in retrieval
morganmcg1 Jan 3, 2025
e0a54f4
Replace langchain-chroma with native ChromaDB implementation
openhands-agent Jan 4, 2025
6ace947
remove mistaken openhands chroma commit
morganmcg1 Jan 5, 2025
8c0ea03
rename chroma wrapper
morganmcg1 Jan 5, 2025
e91d81b
Fix native ChromaDB implementation to match langchain-chroma behavior
openhands-agent Jan 5, 2025
0ce71f5
remove langchain embeddings, add native embeddings models
morganmcg1 Jan 5, 2025
8b5b856
make EmbeddingsModel callable
morganmcg1 Jan 5, 2025
60c66f3
Centralise all configs
morganmcg1 Jan 6, 2025
4ad024c
fix configs
morganmcg1 Jan 6, 2025
66313fc
prompt updates
morganmcg1 Jan 6, 2025
61ddd45
update readme
morganmcg1 Jan 6, 2025
f6e4f7b
increase retries for query handler
morganmcg1 Jan 6, 2025
255cd3f
Add e2b dockerfile
morganmcg1 Jan 8, 2025
80278d4
update docker file
morganmcg1 Jan 8, 2025
0da7bdc
docker fixes
morganmcg1 Jan 8, 2025
b6fd1d6
Update readme and dockerfile
morganmcg1 Jan 9, 2025
d52c23d
add docker temp dir clearnup
morganmcg1 Jan 9, 2025
c002c88
Use code interpreter with python 3.12
jakubno Jan 9, 2025
67a7005
Merge pull request #87 from jakubno/make_wandbot_great_again
morganmcg1 Jan 9, 2025
a3d0714
fix retry in query handler
morganmcg1 Jan 9, 2025
51afeb4
tidy up configs app routes, tidy up web search
morganmcg1 Jan 9, 2025
3aa0103
Add async methods through entire RAG pipeline
morganmcg1 Jan 12, 2025
b56e0eb
fix up evals script
morganmcg1 Jan 12, 2025
8160686
evals import fix
morganmcg1 Jan 12, 2025
700efa0
evals config updates
morganmcg1 Jan 13, 2025
c756f8d
gitignire
morganmcg1 Jan 13, 2025
ffcdc74
modify QueryEnhancer prompts
morganmcg1 Jan 15, 2025
6d93e4f
Eval: pass experiment name to wandbot call; update eval config import
morganmcg1 Jan 16, 2025
d50618e
tidy up response synthesis args and update readme
morganmcg1 Jan 19, 2025
174d9f8
commit for now
morganmcg1 Jan 19, 2025
1ce0f59
undo QueryEnhancer prompt changes for eval
morganmcg1 Jan 19, 2025
12d0b9a
update index to chroma v34 for eval
morganmcg1 Jan 19, 2025
c9a205c
fix app to eval config
morganmcg1 Jan 19, 2025
41dafc5
quieten disk usage logs on startup
morganmcg1 Jan 19, 2025
96a58f2
Add de-dupe of retrieved contexts before re-ranking
morganmcg1 Jan 19, 2025
4dca364
Experiment: try fetch_k = 20
morganmcg1 Jan 20, 2025
6ddc48a
Downgrade some requirements for old compatibility
morganmcg1 Jan 20, 2025
b1e45cf
Adds float or base64 encoding option to EmbeddingModel, set config to…
morganmcg1 Jan 20, 2025
417df7c
actually switch embedding encoding format to base64, update eval.py f…
morganmcg1 Jan 20, 2025
212d32e
change search type from mmr to similarity
morganmcg1 Jan 20, 2025
ef2111e
Tidy up retriever and configs
morganmcg1 Jan 22, 2025
4bbe178
Implement hacky MMR for langchain MMR equivlancy
morganmcg1 Jan 22, 2025
c8b4648
Decompose EmbeddingModel into provider-specific classes
morganmcg1 Jan 22, 2025
cf4636b
fix EmbeddingCall
morganmcg1 Jan 22, 2025
dcc379a
Return to fetch_k 60 for experiment.
morganmcg1 Jan 22, 2025
c64a382
Update config naming
morganmcg1 Jan 22, 2025
8b7d067
Add rich logging
morganmcg1 Jan 24, 2025
8d93cc3
Remove llamaindex, langchain and Raga eval dependencies
morganmcg1 Jan 24, 2025
4f0fab1
remove llama_index, ragas package deps
morganmcg1 Jan 24, 2025
73da804
base64 handling in EmbeddingModel
morganmcg1 Jan 24, 2025
0582a58
silence prints in hacky mmr
morganmcg1 Jan 24, 2025
840427d
Add Evaluation tests
morganmcg1 Jan 24, 2025
a28274f
Re-organise evaluation folder, keep relevancy, faithfulness prompts
morganmcg1 Jan 24, 2025
7777783
Update rich console log style
morganmcg1 Jan 24, 2025
999e6b3
Tidy up evaluation output format
morganmcg1 Jan 24, 2025
f58621e
Add LLMModel class to swap out model providers
morganmcg1 Jan 24, 2025
5655d1c
Fix up LLLModel and correctness evaluator
morganmcg1 Jan 24, 2025
9ba48b8
Better Evaluation and LLM parsing and error handling
morganmcg1 Jan 24, 2025
8e13842
Switch eval prompt back to use "reason"
morganmcg1 Jan 25, 2025
89f7a90
add anthropic to requirements.txt
morganmcg1 Jan 25, 2025
5b470cd
Remove langchain from QueryEnhancer
morganmcg1 Jan 25, 2025
3fceb99
Fix anthropic response_model parsing
morganmcg1 Jan 25, 2025
9781cfa
modify eval message
morganmcg1 Jan 25, 2025
b235d90
Update QeuryEnhancer and tests
morganmcg1 Jan 25, 2025
25e4d35
Update tests
morganmcg1 Jan 25, 2025
7545e37
update .gitignore
morganmcg1 Jan 25, 2025
bc093d7
Add more logging for api call error
morganmcg1 Jan 25, 2025
4157c7a
API monitoring: Add retries to config, add error propagation from api…
morganmcg1 Jan 28, 2025
8c95706
Add addition error propagation to tests
morganmcg1 Jan 28, 2025
34a8cc0
Replace langchain Document with native pydantic model
morganmcg1 Jan 28, 2025
9277ef0
add ApiStatus to unify api error handling
morganmcg1 Jan 28, 2025
809ae9f
update pydantic settings for configs
morganmcg1 Jan 28, 2025
fbe2246
Update pydantic configs
morganmcg1 Jan 28, 2025
964953f
Update tests, add embeddings model tests
morganmcg1 Jan 28, 2025
4b063fd
Update pydantic configs
morganmcg1 Jan 28, 2025
20a3fb0
Add .cursor rules
morganmcg1 Jan 28, 2025
a1c4d0b
Update query enhancer api status reporting, add RetrievalResult, add …
morganmcg1 Jan 28, 2025
d71bbad
Emit api status and response synthesis llm messages to evaluator
morganmcg1 Jan 28, 2025
8b67660
Modify query embedding and retrieval
morganmcg1 Jan 28, 2025
4ce60ae
fetch_k=20 (from 60), deleted old chat config
morganmcg1 Jan 29, 2025
311d9d2
Fix web search api status
morganmcg1 Jan 29, 2025
3b9fd86
Adust passing around of configs and params
morganmcg1 Jan 29, 2025
8112ab5
Update .cursorrules
morganmcg1 Jan 30, 2025
97f6e59
Add logging of query in debug_run_mmr_batch function input
morganmcg1 Jan 30, 2025
7d77cfc
add query to debug_run_mmr_batch 2
morganmcg1 Jan 30, 2025
0dd1428
Update weave requirement and README
morganmcg1 Jan 30, 2025
ef3e1eb
Update README
morganmcg1 Jan 30, 2025
16fa1a1
fetch_k-20
morganmcg1 Jan 30, 2025
f7324e2
Remove re-ranker fallback and instead raise an exception
morganmcg1 Jan 30, 2025
2974bc4
Use cha_config instead of search_paras in retriever. Bring vectorstor…
morganmcg1 Jan 30, 2025
3720348
fix _async_retrieve signature to not expect search_params
morganmcg1 Jan 30, 2025
7769dbf
fix up .retrieve call to only take query_texts
morganmcg1 Jan 30, 2025
28fbbfa
fix rereanker error handling
morganmcg1 Jan 30, 2025
52d24ee
Update logger message
morganmcg1 Jan 30, 2025
ee9cd10
Modify re-ranker logger message
morganmcg1 Jan 30, 2025
4fc1ab9
Add improved reranker retry to use config values
morganmcg1 Jan 30, 2025
2924568
adjust chat_config type
morganmcg1 Jan 30, 2025
86982bf
chat_config types
morganmcg1 Jan 30, 2025
f5ff831
fix up retry configs
morganmcg1 Jan 30, 2025
abba225
Update chat.py logging and docstring
morganmcg1 Jan 30, 2025
923a860
Fix tests
morganmcg1 Jan 30, 2025
921aa0d
small correctness file cleanups
morganmcg1 Jan 30, 2025
9ab8c61
Clean up MMR implementation part 1
morganmcg1 Jan 30, 2025
a360d0e
Tidy up MMR part 2
morganmcg1 Jan 31, 2025
f849b7e
Tody up logging
morganmcg1 Jan 31, 2025
659c13d
Raise error if reranker fails, increase retry wait, add checks for ty…
morganmcg1 Jan 31, 2025
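Several of the commits above ("Implement hacky MMR for langchain MMR equivlancy", "Clean up MMR implementation part 1/2") reimplement maximal marginal relevance natively. For reference, here is a minimal sketch of the MMR selection loop; the function name and signature are illustrative, not wandbot's actual API:

```python
def mmr(query_sim, doc_sims, k, lambda_mult=0.5):
    """Select k documents balancing relevance against redundancy.

    query_sim[i]: similarity of doc i to the query.
    doc_sims[i][j]: similarity between docs i and j.
    Returns the indices of the k selected documents, in selection order.
    """
    selected = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the max similarity to anything already selected.
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lambda_mult * query_sim[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lambda_mult=1.0` this degenerates to plain similarity ranking, which mirrors the experiments in the commits above that switch between `mmr` and `similarity` search types.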
35 changes: 35 additions & 0 deletions .cursorrules
@@ -0,0 +1,35 @@
When running tests in this codebase:

# Testing
1. Use the following pytest flags to prevent early exit issues and ensure complete test output:
```bash
python -m pytest tests/ -v --tb=short --capture=tee-sys
```

2. These flags help in the following ways:
- `-v`: Verbose output
- `--tb=short`: Short traceback format
- `--capture=tee-sys`: Proper output capture that prevents early termination

3. This is particularly important for async tests and tests involving API calls or event loops.

4. If you need to debug a specific test, you can run it in isolation:
```bash
python -m pytest tests/path_to_test.py::test_name -v --tb=short --capture=tee-sys
```

Remember to use these flags when running tests to ensure reliable test execution and complete output.

# Using Weave to analyze logged inputs and outputs

The Weave api can be used to analyze logged inputs and outputs. Here is an example of iterating over the
input documents to this call and extracting the ids.

Search the Weave documentation for more information on how to use the Weave api.

```python
import weave
client = weave.init("wandbot/wandbot-dev")
candidate_call = client.get_call("0194b427-ba78-77f3-9989-222419262817")
final_candidate_ids = [doc.metadata["id"] for doc in candidate_call.inputs["inputs"].documents]
```
10 changes: 10 additions & 0 deletions .gitignore
@@ -1,11 +1,19 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.
*.py[cod]
*$py.class

temp_index/
e2b*

# C extensions
*.so

testing_*.py
testing_*.ipynb
temp_*

# Distribution / packaging
.Python
build/
@@ -105,6 +113,8 @@ celerybeat.pid
# Environments
.env
.venv
wandbot_venv/
3-10_env/
env/
venv/
ENV/
19 changes: 8 additions & 11 deletions .replit
@@ -1,19 +1,16 @@
run = "bash run.sh"
entrypoint = "main.py"
modules = ["python-3.10:v18-20230807-322e88b"]
modules = ["python-3.12"]

disableInstallBeforeRun = true
[nix]
channel = "stable-24_05"

hidden = [".pythonlibs"]
[unitTest]
language = "python3"

[nix]
channel = "stable-23_05"
[gitHubImport]
requiredFiles = [".replit", "replit.nix"]

[deployment]
run = ["sh", "-c", "bash run.sh"]
build = ["sh", "-c", "bash build.sh"]
deploymentTarget = "gce"

[[ports]]
localPort=8000
externalPort=80
build = ["sh", "-c", "bash build.sh"]
202 changes: 138 additions & 64 deletions README.md
@@ -1,9 +1,24 @@
# wandbot

Wandbot is a question-answering bot designed specifically for Weights & Biases [documentation](https://docs.wandb.ai/).
WandBot is a question-answering bot designed specifically for Weights & Biases Models and Weave [documentation](https://docs.wandb.ai/).

## What's New

### wandbot v1.3.0
**New:**

- **Move to uv for package management**: Installs and dependency checks cut down from minutes to seconds
- **Support python 3.12 on replit**
- **Move to lazy loading in app.py to help with startup**: Replit app deployments can't seem to handle the delay from loading the app, despite attempting async or background tasks
- **Add wandb artifacts cache cleanup**: Saved 1.2GB of disk space
- **Turn off web search**: Currently we don't have a web search provider to use.
- **Refactored EvalConfig and evals script**: Switched config to using simple_parsing for free cli arguments. Added n_trials, debug mode. Undid hardcoding of ja weave eval dataset.
- **Removed langchain-cohere**: Started hitting dependency errors, removed it in favor of raw cohere client.
- **wandb Tables Feedback logging disabled in prep for Weave feedback**
- **Small formatting updates for weave.op**
- **Add dotenv in app.py for easy env var loads**
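
The EvalConfig refactor above exposes config fields as CLI flags via simple_parsing. As a rough illustration of that pattern using only the standard library (the field names echo flags used elsewhere in this README, but the real config lives in `src/wandbot/evaluation/config.py` and this is not its actual implementation):

```python
import argparse
from dataclasses import dataclass, fields

@dataclass
class EvalConfig:
    n_trials: int = 3   # times each sample is evaluated
    debug: bool = False # evaluate only a few samples
    lang: str = "en"    # "en" or "ja"

def parse_config(argv=None) -> EvalConfig:
    # Generate one CLI flag per dataclass field, like simple_parsing does.
    parser = argparse.ArgumentParser()
    for f in fields(EvalConfig):
        if f.type is bool:
            parser.add_argument(f"--{f.name}", action="store_true")
        else:
            parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    args = parser.parse_args(argv)
    return EvalConfig(**vars(args))

config = parse_config(["--n_trials", "1", "--debug"])
# config.n_trials == 1, config.debug is True, config.lang == "en"
```

The benefit is that adding a field to the dataclass automatically yields a matching CLI argument, which is what "free cli arguments" refers to.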


### wandbot v1.2.0

This release introduces a number of exciting updates and improvements:
@@ -35,76 +50,69 @@ Japanese

## Features

- Wandbot employs Retrieval Augmented Generation with a ChromaDB backend, ensuring efficient and accurate responses to user queries by retrieving relevant documents.
- WandBot uses:
- a local ChromaDB vector store
- OpenAI's v3 embeddings
- GPT-4 for query enhancement and response synthesis
- Cohere's re-ranking model
- It features periodic data ingestion and report generation, contributing to the bot's continuous improvement. You can view the latest data ingestion report [here](https://wandb.ai/wandbot/wandbot-dev/reportlist).
- The bot is integrated with Discord and Slack, facilitating seamless integration with these popular collaboration platforms.
- Performance monitoring and continuous improvement are made possible through logging and analysis with Weights & Biases Tables. Visit the workspace for more details [here](https://wandb.ai/wandbot/wandbot_public).
- Wandbot has a fallback mechanism for model selection, which is used when GPT-4 fails to generate a response.
- The bot's performance is evaluated using a mix of metrics, including retrieval accuracy, string similarity, and the correctness of model-generated responses.
- Curious about the custom system prompt used by the bot? You can view the full prompt [here](data/prompts/chat_prompt.json).
- Performance monitoring and continuous improvement are made possible through logging and analysis with Weights & Biases Weave
- Has a fallback mechanism for model selection

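The retrieval flow implied by the features above (query enhancement, vector-store retrieval, de-duplication, re-ranking, response synthesis) can be sketched roughly as follows. All function names and the `Document` shape are illustrative stand-ins, not wandbot's actual interfaces; the real system uses ChromaDB, OpenAI, and Cohere clients:

```python
from dataclasses import dataclass

@dataclass
class Document:
    id: str
    text: str
    score: float = 0.0

def deduplicate(docs):
    # Keep the first occurrence of each document id before re-ranking.
    seen, unique = set(), []
    for doc in docs:
        if doc.id not in seen:
            seen.add(doc.id)
            unique.append(doc)
    return unique

def answer(question, retrieve, rerank, synthesize, enhance, top_k=5):
    enhanced = enhance(question)                 # query enhancement (LLM)
    candidates = retrieve(enhanced)              # vector-store search
    candidates = deduplicate(candidates)         # de-dupe retrieved contexts
    ranked = rerank(enhanced, candidates)        # re-ranking model
    return synthesize(question, ranked[:top_k])  # response synthesis (LLM)
```

The de-duplication step corresponds to the "Add de-dupe of retrieved contexts before re-ranking" change in this PR.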
## Installation

The project is built with Python version `>=3.10.0,<3.11` and utilizes [poetry](https://python-poetry.org/) for managing dependencies. Follow the steps below to install the necessary dependencies:
The project is built with Python version `3.12` and utilizes `uv` for dependency management. Follow the steps below to install the necessary dependencies:

```bash
git clone git@github.com:wandb/wandbot.git
cd wandbot
bash build.sh
```

## Usage

### Running WandBot

Before running the Q&A bot, ensure the following environment variables are set:

```bash
OPENAI_API_KEY
COHERE_API_KEY
WANDB_API_KEY
WANDBOT_API_URL="http://localhost:8000"
WANDB_TRACING_ENABLED="true"
LOG_LEVEL=INFO
WANDB_PROJECT="wandbot-dev"
WANDB_ENTITY= <your W&B entity>

```

If you're running the slack or discord apps you'll also need the following keys/tokens set as env vars:

```
SLACK_EN_APP_TOKEN
SLACK_EN_BOT_TOKEN
SLACK_EN_SIGNING_SECRET
SLACK_JA_APP_TOKEN
SLACK_JA_BOT_TOKEN
SLACK_JA_SIGNING_SECRET
DISCORD_BOT_TOKEN
```
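
wandbot v1.3 loads these variables from a `.env` file via dotenv in app.py. A minimal standard-library sketch of what that loading does (the real code uses the python-dotenv package, not this function):

```python
import os

def load_env_file(path=".env"):
    # Stand-in for python-dotenv's load_dotenv: parse KEY=VALUE lines,
    # skip comments and blanks, and export the pairs into os.environ
    # without overriding variables that are already set.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```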

Then build the app to install all dependencies in a virtual env.

```
bash build.sh
```

Start the Q&A bot application using the following commands:

```bash
bash run.sh
```

Then call the endpoint to trigger the final wandbot app initialisation:
```bash
curl http://localhost:8000/startup
```

For more detailed instructions on installing and running the bot, please refer to the [run.sh](./run.sh) file located in the root of the repository.
@@ -113,44 +121,110 @@ Executing these commands will launch the API, Slackbot, and Discord bot applications.

### Running the Evaluation pipeline

Make sure to set the required environment variables in your terminal.
**Eval Config**

Modify the evaluation config file here: `wandbot/src/wandbot/evaluation/config.py`

- `evaluation_strategy_name`: attribute name in the Weave Evaluation dashboard
- `eval_dataset`:
  - [Latest English evaluation dataset](https://wandb.ai/wandbot/wandbot-eval/weave/datasets?peekPath=%2Fwandbot%2Fwandbot-eval%2Fobjects%2Fwandbot_eval_data%2Fversions%2FeCQQ0GjM077wi4ykTWYhLPRpuGIaXbMwUGEB7IyHlFU%3F%26): "weave:///wandbot/wandbot-eval/object/wandbot_eval_data:eCQQ0GjM077wi4ykTWYhLPRpuGIaXbMwUGEB7IyHlFU"
  - [Latest Japanese evaluation dataset](https://wandb.ai/wandbot/wandbot-eval-jp/weave/datasets?peekPath=%2Fwandbot%2Fwandbot-eval-jp%2Fobjects%2Fwandbot_eval_data_jp%2Fversions%2FoCWifIAtEVCkSjushP0bOEc5GnhsMUYXURwQznBeKLA%3F%26): "weave:///wandbot/wandbot-eval-jp/object/wandbot_eval_data_jp:oCWifIAtEVCkSjushP0bOEc5GnhsMUYXURwQznBeKLA"
- `eval_judge_model`: model used for the judge
- `wandb_entity`: wandb entity name for record
- `wandb_project`: wandb project name for record

**Dependencies**

Ensure wandbot is installed by installing the production dependencies, activate the virtual env that was created, and then install the evaluation dependencies:

```
bash build.sh
source wandbot_venv/bin/activate
uv pip install -r eval_requirements.txt
```

**Environment variables**

Make sure to set the environment variables (i.e. LLM provider keys etc.) from the `.env` file, e.g. `set -o allexport; source .env; set +o allexport`.

**Launch the wandbot app**
You can use either `uvicorn` or `gunicorn` to launch N workers so that eval requests can be served in parallel. Note that Weave Evaluations also limit the number of parallel calls made, set via the `WEAVE_PARALLELISM` env variable, which is set further down in the `eval.py` file using the `n_weave_parallelism` flag. Launch wandbot with 8 workers for faster evaluation. The `WANDBOT_FULL_INIT` env var triggers the full wandbot app initialization.

`uvicorn`
```bash
WANDBOT_FULL_INIT=1 uvicorn wandbot.api.app:app \
--host 0.0.0.0 \
--port 8000 \
--workers 8 \
--timeout-keep-alive 75 \
--loop uvloop \
--http httptools
```

Alternatively, you can run wandbot with `gunicorn`:

```bash
WANDBOT_FULL_INIT=1 \
./wandbot_venv/bin/gunicorn wandbot.api.app:app \
--preload \
--bind 0.0.0.0:8000 \
--timeout=200 \
--workers=20 \
--worker-class uvicorn.workers.UvicornWorker
```

Testing: you can check that the app is running correctly by making a request to the `chat/query` endpoint; you should receive a response payload back from wandbot after 30 to 90 seconds:

```bash
curl -X POST \
http://localhost:8000/chat/query \
-H 'Content-Type: application/json' \
-d '{"question": "How do I log a W&B artifact?"}'
```

**Debugging**
For debugging purposes during evaluation you can run a single instance of the app by changing the `uvicorn` command above to use `--workers 1`.

**Run the evaluation**

Launch the W&B Weave evaluation from the root `wandbot` directory. Ensure that your virtual environment is active. By default, each sample is evaluated 3 times in order to account for both the stochasticity of wandbot and our LLM judge. For debugging, pass the `--debug` flag to only evaluate a small number of samples. To adjust the number of parallel evaluation calls Weave makes, use the `--n_weave_parallelism` flag when calling `eval.py`.

Set up for evaluation
```
source wandbot_venv/bin/activate

python src/wandbot/evaluation/weave_eval/eval.py
```

Debugging: run evals on only 1 sample and for 1 trial:

```
python src/wandbot/evaluation/weave_eval/eval.py --debug --n_debug_samples=1 --n_trials=1
```

Evaluate on Japanese dataset:

```
python src/wandbot/evaluation/weave_eval/eval.py --lang ja
```

To only evaluate each sample once:

```
python src/wandbot/evaluation/weave_eval/eval.py --n_trials 1
```

## Overview of the Implementation

1. Creating Document Embeddings with ChromaDB
2. Constructing the Q&A RAG Pipeline
3. Selection of Models and Implementation of Fallback Mechanism
4. Deployment of the Q&A Bot on FastAPI, Discord, and Slack
5. Utilizing Weights & Biases Tables for Logging and Analysis
6. Evaluating the Performance of the Q&A Bot


### Data Ingestion

You can monitor the usage of the bot in the following project:
https://wandb.ai/wandbot/wandbot_public
The data ingestion module pulls code and markdown from the Weights & Biases repositories [docodile](https://github.com/wandb/docodile) and [examples](https://github.com/wandb/examples) and ingests them into vectorstores for the retrieval augmented generation pipeline.
To ingest the data, run the following command from the root of the repository:

```bash
python -m wandbot.ingestion
```

You will notice that the data is ingested into the `data/cache` directory and stored across directories such as `raw_data` and `vectorstore`, with individual files for each step of the ingestion process.

These datasets are also stored as wandb artifacts in the project defined in the environment variable `WANDB_PROJECT` and can be accessed from the [wandb dashboard](https://wandb.ai/wandb/wandbot-dev).