Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs

🌐 Project Page | 📜 arXiv

News

[2024.10.16] 🌟 GoR is released.

📌Preliminary

Environment Setup

# python==3.10
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install dgl==1.0.0+cu113 -f https://data.dgl.ai/wheels/cu113/repo.html
pip install openai==0.28
pip install pandas
pip install langchain
pip install langchain-core
pip install langchain-community
pip install langchain-experimental
pip install tiktoken
pip install tqdm
pip install bert_score
pip install rouge_score
pip install networkx
pip install faiss-gpu
pip install transformers
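
Because several of the packages above are pinned to exact versions, a quick check after installation can catch a mismatched wheel early. The following is a minimal sketch, not part of the repository: the helper name check_pins is hypothetical, and the distribution names are assumed to match the pip package names used above. Versions are compared by prefix, so a reported 0.28.0 still satisfies the 0.28 pin.

```python
from importlib import metadata

# Pins copied from the install commands above; the other packages are unpinned.
EXPECTED_PINS = {
    "torch": "1.12.1+cu113",
    "torchvision": "0.13.1+cu113",
    "torchaudio": "0.12.1",
    "dgl": "1.0.0+cu113",
    "openai": "0.28",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for packages that are missing or mismatched.

    `found` is None when the package is not installed. Matching is by prefix,
    which is deliberately lenient (e.g. "0.28.0" satisfies "0.28").
    """
    mismatches = {}
    for package, expected in pins.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None
        if found is None or not found.startswith(expected):
            mismatches[package] = (expected, found)
    return mismatches
```

Calling check_pins(EXPECTED_PINS) in the target environment returns an empty dict when every pinned package is present at the expected version.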

Dataset Preparation

Download the datasets: QMSum, WCEP, BookSum, GovReport, SQuALITY.

Save the downloaded files in the ./data/[DATASET_NAME] folder.
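
To confirm the layout before launching anything, a small check along these lines may help. It is only a sketch: the helper name missing_dataset_dirs is hypothetical, and it assumes the folder names are the lowercase identifiers accepted by --dataset, which this README does not state explicitly.

```python
from pathlib import Path

# Assumption: folder names follow the lowercase --dataset choices.
DATASETS = ["qmsum", "wcep", "booksum", "govreport", "squality"]


def missing_dataset_dirs(root="./data", datasets=DATASETS):
    """Return the dataset names whose folder under `root` is absent or empty."""
    missing = []
    for name in datasets:
        folder = Path(root) / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```

An empty return value means every listed dataset has a non-empty folder; the check says nothing about whether the files inside are the ones the scripts expect.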

Important

Before running any experiment, set your API key in the get_llm_response_via_api function in utils.py.
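
One way to avoid hard-coding the key is to read it from an environment variable. This is a sketch under assumptions: the helper load_api_key and the variable name OPENAI_API_KEY are illustrative, and the README does not show how get_llm_response_via_api consumes the key, so the wiring in the trailing comment is a guess based on the pinned openai==0.28 client.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from the environment and fail early if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the experiments."
        )
    return key


# Assumed wiring inside utils.py (not confirmed by this README), using the
# module-level key attribute of the legacy openai==0.28 client:
#   import openai
#   openai.api_key = load_api_key()
```

Failing at startup keeps a missing key from surfacing only after graph construction has already begun issuing requests.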

⭐Experiments

Query Simulation and Graph Construction

Generate simulated queries and construct graphs. The constructed graphs are saved in the ./graph folder.

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
# Training Set
python graph_construction.py --cuda 0 --dataset [DATASET] --train
# Test Set
python graph_construction.py --cuda 0 --dataset [DATASET]

Training Preparation

Pre-compute BERTScore and save training data in the ./training_data folder.

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
python training_preparation.py --cuda 0 --dataset [DATASET]
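
For readers unfamiliar with the metric, the toy sketch below illustrates the greedy matching behind BERTScore: each token vector is paired with its most similar counterpart on the other side, and the averaged similarities give precision, recall, and F1. It is not the repository's code and not the bert_score package; the function names are hypothetical, and hand-written vectors stand in for contextual BERT embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_f1(candidate_vecs, reference_vecs):
    """BERTScore-style F1 over two non-empty lists of token vectors."""
    # Precision: each candidate token is matched to its closest reference token.
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    # Recall: each reference token is matched to its closest candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0
```

With candidate vectors [[1, 0], [0, 1]] and a single reference vector [[1, 0]], precision is 0.5 and recall is 1.0, so the F1 is 2/3.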

Training

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
python train.py --cuda 0 --dataset [DATASET]

Evaluation

# DATASET Choices: qmsum, wcep, booksum, govreport, squality
# Generate summary results
python eval.py --cuda 0 --dataset [DATASET]
# Evaluation
python sum_eval.py --cuda 0 --file_name ./result/[DATASET].json
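
The steps above can be chained for one dataset with a short driver. This is a sketch, assuming the scripts must run in the order documented here and are launched from the repository root; the names pipeline_commands and run_pipeline are hypothetical, while the script names and flags are copied from the commands above.

```python
import subprocess
import sys


def pipeline_commands(dataset, cuda=0):
    """Return the documented commands, in order, as argument lists."""
    gpu = str(cuda)
    return [
        # Query simulation and graph construction (training set, then test set).
        ["graph_construction.py", "--cuda", gpu, "--dataset", dataset, "--train"],
        ["graph_construction.py", "--cuda", gpu, "--dataset", dataset],
        # Pre-compute BERTScore and write the training data.
        ["training_preparation.py", "--cuda", gpu, "--dataset", dataset],
        ["train.py", "--cuda", gpu, "--dataset", dataset],
        # Generate summaries, then score them.
        ["eval.py", "--cuda", gpu, "--dataset", dataset],
        ["sum_eval.py", "--cuda", gpu, "--file_name", f"./result/{dataset}.json"],
    ]


def run_pipeline(dataset, cuda=0):
    """Run every step with the current interpreter, stopping at the first failure."""
    for args in pipeline_commands(dataset, cuda):
        subprocess.run([sys.executable, *args], check=True)
```

For example, run_pipeline("qmsum") would execute all six commands on GPU 0; check=True raises subprocess.CalledProcessError as soon as one step exits with a non-zero status, so later steps never run on incomplete outputs.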

Citation

@article{GoR,
  title={Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs},
  author={Haozhen Zhang and Tao Feng and Jiaxuan You},
  journal={arXiv preprint arXiv:2410.11001},
  year={2024}
}
