Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs


ulab-uiuc/GoR


🌐 Project Page | 📜 arXiv

News

[2024.10.16] 🌟 GoR is released.

📌 Preliminaries

Environment Setup

# python==3.10
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install dgl==1.0.0+cu113 -f https://data.dgl.ai/wheels/cu113/repo.html
pip install openai==0.28
pip install pandas
pip install langchain
pip install langchain-core
pip install langchain-community
pip install langchain-experimental
pip install tiktoken
pip install tqdm
pip install bert_score
pip install rouge_score
pip install networkx
pip install faiss-gpu
pip install transformers
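The setup pins CUDA 11.3 builds of PyTorch and DGL, so a quick import check can catch a broken install before the longer jobs start. Below is a minimal POSIX sh sketch. `check_env` and the `PYTHON` override are hypothetical names that are not part of the repository. The package-to-module mapping (for example, faiss-gpu imports as `faiss` and langchain-core as `langchain_core`) assumes each package's usual import name. The check only confirms that modules import. It does not verify versions or GPU availability.

```sh
# Hypothetical helper (not part of the GoR repo): report modules that fail to import.
# With no arguments it checks the packages installed above, using their assumed
# import names; pass module names to check a custom list instead.
check_env() {
  _py="${PYTHON:-python3}"
  if [ "$#" -eq 0 ]; then
    set -- torch torchvision torchaudio dgl openai pandas langchain langchain_core \
      langchain_community langchain_experimental tiktoken tqdm bert_score \
      rouge_score networkx faiss transformers
  fi
  _total=$#
  _missing=""
  for _m in "$@"; do
    # Import each module in a fresh interpreter and discard any traceback.
    if ! "$_py" -c "import $_m" >/dev/null 2>&1; then
      _missing="$_missing $_m"
    fi
  done
  if [ -z "$_missing" ]; then
    echo "OK: all $_total modules importable"
  else
    echo "MISSING:$_missing"
    return 1
  fi
}
```

After sourcing the function inside the activated environment, run `check_env`. It prints either an OK line or the list of modules that failed to import.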

Dataset Preparation

Download the QMSum, WCEP, BookSum, GovReport, and SQuALITY datasets and save each one in the ./data/[DATASET_NAME] folder.
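Each dataset goes in its own folder under ./data. The sketch below creates all five folders. It assumes, without confirmation from the repository, that [DATASET_NAME] matches the lowercase --dataset choices used in the experiment commands. Rename the folders if the data loaders expect different names.

```sh
# Assumption: folder names equal the lowercase --dataset choices
# (qmsum, wcep, booksum, govreport, squality); adjust if the loaders differ.
for ds in qmsum wcep booksum govreport squality; do
  mkdir -p "./data/$ds"   # -p: no error if the folder already exists
done
```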

Important: Before running the experiments, configure your API key in the "get_llm_response_via_api" function in utils.py.

⭐ Experiments

Query Simulation and Graph Construction

Generate simulated queries and construct graphs. The constructed graphs are saved in the ./graph folder.

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
# Training Set
python graph_construction.py --cuda 0 --dataset [DATASET] --train
# Test Set
python graph_construction.py --cuda 0 --dataset [DATASET]
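Graphs are needed for both the training and test splits of every dataset you plan to use. Below is a minimal POSIX sh sketch that loops over the two graph_construction.py commands shown above. `run`, `build_graphs`, `DRY_RUN`, and `GPU` are hypothetical names, not part of the repository. Setting `DRY_RUN=1` prints the commands without executing them.

```sh
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build graphs for the training set, then the test set, of each dataset argument.
# GPU selects the --cuda device id (defaults to 0, as in the commands above).
build_graphs() {
  for ds in "$@"; do
    run python graph_construction.py --cuda "${GPU:-0}" --dataset "$ds" --train || return 1
    run python graph_construction.py --cuda "${GPU:-0}" --dataset "$ds" || return 1
  done
}
```

For example, `DRY_RUN=1 build_graphs qmsum wcep` previews the four commands, and `GPU=1 build_graphs govreport` runs both splits on device 1.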

Training Preparation

Pre-compute BERTScore values and save the training data in the ./training_data folder.

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
python training_preparation.py --cuda 0 --dataset [DATASET]

Training

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
python train.py --cuda 0 --dataset [DATASET]

Evaluation

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
# Generate summary results
python eval.py --cuda 0 --dataset [DATASET]
# Evaluate the generated summaries
python sum_eval.py --cuda 0 --file_name ./result/[DATASET].json
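For each dataset, the stages above run in a fixed order: graph construction, training preparation, training, summary generation, and evaluation. Below is a minimal POSIX sh sketch of a driver that chains the commands from this README and stops at the first failing step. `run`, `run_pipeline`, `DRY_RUN`, and `GPU` are hypothetical names, not part of the repository. The ./result/[DATASET].json path comes from the sum_eval.py command above.

```sh
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical end-to-end driver for one dataset; returns non-zero on the first failure.
run_pipeline() {
  ds="$1"
  gpu="${GPU:-0}"
  case "$ds" in
    qmsum|wcep|booksum|govreport|squality) ;;
    *) echo "unknown dataset: $ds" >&2; return 2 ;;
  esac
  # 1. Query simulation and graph construction (training set, then test set)
  run python graph_construction.py --cuda "$gpu" --dataset "$ds" --train || return 1
  run python graph_construction.py --cuda "$gpu" --dataset "$ds" || return 1
  # 2. Pre-compute BERTScore and write ./training_data
  run python training_preparation.py --cuda "$gpu" --dataset "$ds" || return 1
  # 3. Training
  run python train.py --cuda "$gpu" --dataset "$ds" || return 1
  # 4. Generate summaries, then evaluate the resulting JSON file
  run python eval.py --cuda "$gpu" --dataset "$ds" || return 1
  run python sum_eval.py --cuda "$gpu" --file_name "./result/$ds.json"
}
```

Use `DRY_RUN=1 run_pipeline qmsum` to preview the six commands. Use `GPU=1 run_pipeline govreport` to run them on device 1. The driver validates the dataset name up front, so a typo fails immediately instead of after graph construction.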

Citation

@article{GoR,
  title={Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs},
  author={Haozhen Zhang and Tao Feng and Jiaxuan You},
  journal={arXiv preprint arXiv:2410.11001},
  year={2024}
}
