feat(chat): fetch examples from chromadb for few-shot learning #21

supermaxiste · 2024-01-25T10:51:48Z

This PR adds the following:

Major:

New ChromaDB collection that includes example questions and corresponding queries. It works as follows:

It creates a collection called examples/ where all the example questions are indexed and their corresponding query attached as metadata
A new function generate_examples queries the collection and outputs a structured prompt which can be embedded into a template prompt

A practical example. Given a file q1.sparql:

# Who am I?
SELECT ?me

The ChromaDB collection examples/ will store the question Who am I? and attach SELECT ?me as metadata. When calling the generate_example function on a question (Who am I working with?), ChromaDB will look for the closest question(s) and return the following output:

Question:
Who am I?
Answer:
SELECT ?me
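
As a rough illustration of the output format above, here is a minimal sketch of how retrieved question/query pairs could be rendered into that structure. The name format_examples and the shape of the retrieved pairs are assumptions made for illustration; they are not taken from the PR's code, and no ChromaDB call is shown.

```python
from typing import Iterable, Tuple


def format_examples(pairs: Iterable[Tuple[str, str]]) -> str:
    """Render (question, query) pairs as Question/Answer blocks.

    Hypothetical helper: the real generation function in the PR may
    build its output differently.
    """
    blocks = [
        f"Question:\n{question}\nAnswer:\n{query}"
        for question, query in pairs
    ]
    # Separate consecutive examples with a blank line.
    return "\n\n".join(blocks)


# Assumed retrieval result: the indexed question plus the query stored as metadata.
retrieved = [("Who am I?", "SELECT ?me")]
print(format_examples(retrieved))
```

With several retrieved pairs, the blocks are simply concatenated with a blank line between them, so the result can be dropped into a prompt template as one string.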

Minor:

Renamed default collection to schema instead of test
Added function to import examples (get_sparql_examples) to io.py as it seemed more fitting there
Updated dependencies to have prefect working (see Update prefect version #18 for longer-term solution)
Updated ChromaDB API functions

cmdoret

🙌 Nice work! I have a few suggestions and comments, mostly about keeping functions simple and reusable.

fix: default prompt templates in aikg/config/chat.py need to be adapted to support {examples_str}.

aikg/config/chroma.py

aikg/flows/chroma_examples.py

aikg/server.py

aikg/utils/chat.py

aikg/utils/io.py

Co-authored-by: Cyril Matthey-Doret <cyril.matthey-doret@epfl.ch>

supermaxiste · 2024-02-05T13:57:41Z

Thank you for all the comments and suggestions @cmdoret 🙌

Summary of relevant or open changes

Based on your suggestion on making the example parser more flexible:

I created a parse_sparql_example function that works with a text stream in aikg/utils/io.py
I added get_sparql_examples as a task in aikg/flows/chroma_examples.py as you suggested
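
For readers unfamiliar with the idea, a stream-based parser along these lines might look like the sketch below. It assumes the file layout from the PR description (a leading # comment holding the question, followed by the query); the function name parse_sparql_example_sketch and the tuple return type are illustrative assumptions, not the actual code in aikg/utils/io.py.

```python
import io
from typing import TextIO, Tuple


def parse_sparql_example_sketch(stream: TextIO) -> Tuple[str, str]:
    """Split a text stream into (question, query).

    Assumption: the first non-empty line is a '#' comment containing the
    question, and every following line belongs to the SPARQL query.
    """
    lines = [line.rstrip("\n") for line in stream]
    # Ignore blank lines before the question comment.
    while lines and not lines[0].strip():
        lines.pop(0)
    if not lines or not lines[0].lstrip().startswith("#"):
        raise ValueError("expected a leading '#' comment with the question")
    question = lines[0].lstrip().lstrip("#").strip()
    query = "\n".join(lines[1:]).strip()
    return question, query


# Works on any text stream: an open file, or an in-memory buffer as here.
example = io.StringIO("# Who am I?\nSELECT ?me\n")
print(parse_sparql_example_sketch(example))
```

Accepting a stream rather than a path keeps the parser independent of where the examples come from, which is what makes it reusable in tests and in flows.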

Clarification 1

clarification/question: the new get_sparql_examples function works slightly differently than before.

Before: it output a list with a single Document
After: it outputs a list with multiple Documents, one per example

Because of this, in chroma_build_examples_flow I had to loop through the output to be able to batch:

# aikg/flows/chroma_examples.py 
# L110-122

    # Create subject documents
    docs = get_sparql_examples(
        dir=chroma_input_dir,
    )

    # Vectorize and index documents by batches to reduce overhead
    logger.info(f"Indexing by batches of {chroma_cfg.batch_size} items")
    embed_counter = 0
    for doc in docs:
        for batch in chunked(doc, chroma_cfg.batch_size):
            embed_counter += len(batch)
            index_batch(batch)
    logger.info(f"Indexed {embed_counter} items.")

Clarification 2

clarification/question: for #21 (review), should I directly change the sparql_template to match the one we used, or would you like to add another template so that we have one with examples and one without?

cmdoret · 2024-02-05T16:32:56Z

clarification 1: If docs is a list of Documents, we should be able to loop directly over chunked(docs, chroma_cfg.batch_size):

In [2]: from more_itertools import chunked

In [3]: docs = [{'data': i, 'meta': v} for i, v in enumerate(['red', 'blue', 'green'])]

In [4]: docs
Out[4]: 
[{'data': 0, 'meta': 'red'},
 {'data': 1, 'meta': 'blue'},
 {'data': 2, 'meta': 'green'}]

In [5]: for batch in chunked(docs, 2):
   ...:     print(f"---\nBATCH: {batch}")
   ...: 
---
BATCH: [{'data': 0, 'meta': 'red'}, {'data': 1, 'meta': 'blue'}]
---
BATCH: [{'data': 2, 'meta': 'green'}]
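
Applied to the indexing loop, the same pattern might look like the sketch below. It uses a standard-library stand-in for more_itertools.chunked so that it runs without extra dependencies; index_in_batches and the index_batch callback are hypothetical names, and the real flow may differ.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Standard-library stand-in for more_itertools.chunked."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def index_in_batches(
    docs: Iterable[T],
    batch_size: int,
    index_batch: Callable[[List[T]], None],
) -> int:
    """Send docs to index_batch one batch at a time; return the item count."""
    embed_counter = 0
    for batch in chunked(docs, batch_size):
        index_batch(batch)
        embed_counter += len(batch)
    return embed_counter


# Assumed usage: collect batches in a list instead of writing to a vector store.
seen: List[List[str]] = []
total = index_in_batches(["red", "blue", "green"], 2, seen.append)
print(total, seen)
```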

clarification 2: Indeed, examples should be optional. We can change the template directly, but we should set a default value for the examples parameter of generate_sparql, so that the user can ignore it and it will just insert nothing. Does that make sense?
- Also, rather than having the Examples: header in the template, it should probably be inserted dynamically if examples is not empty. Otherwise, not providing examples will result in an Examples: header followed directly by another header.
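
A minimal sketch of this idea, assuming a simplified template: the Examples: header is only emitted when examples are actually provided. The names render_examples, build_prompt, and the template text are hypothetical and do not reflect the actual generate_sparql signature or the templates in aikg/config/chat.py.

```python
def render_examples(examples: str = "") -> str:
    """Return an 'Examples:' section, or an empty string when there are none."""
    if not examples.strip():
        return ""
    return f"Examples:\n{examples}\n\n"


def build_prompt(question: str, examples: str = "") -> str:
    """Fill a toy template; the default leaves the examples slot empty."""
    # Hypothetical template, much shorter than a real SPARQL prompt.
    template = (
        "Schema:\n<schema goes here>\n\n"
        "{examples_section}"
        "Question:\n{question}\n"
    )
    return template.format(
        examples_section=render_examples(examples),
        question=question,
    )


# Without examples, no dangling header appears in the prompt.
print(build_prompt("Who am I working with?"))
# With examples, the header is added together with the content.
print(build_prompt("Who am I working with?", "Question:\nWho am I?\nAnswer:\nSELECT ?me"))
```

Keeping the header next to the content that needs it means the template itself never has to know whether examples were supplied.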

supermaxiste · 2024-02-19T09:24:20Z

Alright, all changes are included as follows:

generate_sparql now has examples set to an empty string by default; the argument had to be moved after the arguments without defaults
When no examples are provided, {str_examples} no longer injects anything into the prompt. The "Examples" header is now part of the example generation function
I restored the loop over Documents to a loop over chunks whose size is defined by the chroma config, and within each chunk I loop through each doc

I tested the code and it runs successfully 👍

cmdoret

I just have one last question regarding the batch-indexing. Other than that, it looks good! :)

aikg/flows/chroma_examples.py

cmdoret

Looks good to me :) Great work 🚀

Danyil and others added 6 commits January 25, 2024 11:31

skip header row for rdf requests

023e54e

fix: update chromadb.API -> chromadb.ClientApi

e3ded23

chore: update dep versions

14185eb

fix(deps): utils fix and dependency temp fix for prefect

5b349e7

feat(examples-draft): first draft to provide examples to prompt

30094c7

feat(few-shot): implemented chromadb setup for examples questions and corresponding queries

8745471

supermaxiste added the enhancement label Jan 25, 2024

supermaxiste self-assigned this Jan 25, 2024

cmdoret self-requested a review January 25, 2024 12:21

cmdoret requested changes Jan 26, 2024

supermaxiste and others added 7 commits February 5, 2024 13:18

fix: example env variable name

ed14f6e

Co-authored-by: Cyril Matthey-Doret <cyril.matthey-doret@epfl.ch>

suggestion(style): emphasis on pairs

4b3cf6a

Co-authored-by: Cyril Matthey-Doret <cyril.matthey-doret@epfl.ch>

fix: add document input dir parameter

6b36123

Co-authored-by: Cyril Matthey-Doret <cyril.matthey-doret@epfl.ch>

feat: add path validation options to chroma input_dir

9cea0b1

Co-authored-by: Cyril Matthey-Doret <cyril.matthey-doret@epfl.ch>

chore: remove commented code

8130850

Co-authored-by: Cyril Matthey-Doret <cyril.matthey-doret@epfl.ch>

fix: drop debugging print statement

074176f

Co-authored-by: Cyril Matthey-Doret <cyril.matthey-doret@epfl.ch>

refactor: make parser for examples more flexible

537a9af

Stefan Milosavljevic added 2 commits February 19, 2024 09:23

fix(examples): examples optional and improved looping

60a9afa

fix(loop): restored chunk loop and added internal loop

c25f6df

cmdoret reviewed Feb 22, 2024

aikg/flows/chroma_examples.py (outdated, resolved)

fix: get_sparql_example returns a doc instead of a list with doc

23d98c8

cmdoret self-requested a review February 29, 2024 13:11

cmdoret approved these changes Feb 29, 2024

cmdoret changed the title ~~feat(few-shot-examples): add new chromadb flow to get similar examples~~ feat(chat): fetch examples from chromadb for few-shot learning Mar 1, 2024

cmdoret merged commit ec3618e into main Mar 1, 2024

cmdoret deleted the feat/examples branch April 29, 2024 11:50
