ClevrSkills-Bench

This repository contains the StreamRoboLM model code, pre-trained checkpoints, and evaluation code from "ClevrSkills: Compositional Language and Visual Reasoning in Robotics".

Getting Started

First, please follow the instructions at ClevrSkills to install the ClevrSkills task suite, start a Docker container, and download the required resources for ClevrSkills. Make sure to mount the dataset and checkpoint directories into the container.

Check out the v0 branch of the ClevrSkills repo to reproduce the results from the paper.

cd /path/to/clevrskills/repo/
git checkout v0 

Clone this repository inside the ClevrSkills Docker container:

export CLEVRSKILLS_PATH=/path/to/clevrskills/repo
git clone https://github.com/Qualcomm-AI-research/clevrskills-bench.git
cd clevrskills-bench

Dataset

Download the ClevrSkills dataset and extract it.

The open-sourced data consists of 3 zip files, one per task level. Each zip file is split into ~10GB parts, which you may need to merge before unzipping:

zip -s 0 simple_tasks.zip --out simple_tasks_merge.zip
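The merge-then-unzip step can be scripted for all three levels. Note that only simple_tasks.zip is named above; the other archive names below are placeholders, so adjust them to the actual file names of the other two levels:

```shell
# Merge the ~10GB split parts of each level's archive, then extract it.
# NOTE: intermediate_tasks / complex_tasks are assumed names -- only
# simple_tasks.zip is documented; rename to match your downloads.
for level in simple_tasks intermediate_tasks complex_tasks; do
  if [ -f "${level}.zip" ]; then
    zip -s 0 "${level}.zip" --out "${level}_merge.zip"  # join split parts
    unzip "${level}_merge.zip" -d /path/to/data/        # extract merged archive
  fi
done
```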

Once you have downloaded and extracted the data, the directory will look like this:

/path/to/data
|
└───simple_tasks/
|   └───train/
|   | ...
|   └───val/
|   | ...
|   └───test/
|   |   └───match_pose/
|   |   |   └───traj_{seed}/
|   |   |   ...
|   |   └───move_without_hitting/
...
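As a quick sanity check of the extracted layout, you can count the per-seed trajectory directories (the `simple_tasks` path and `traj_{seed}` naming come from the tree above; `DATA_ROOT` is a placeholder to adjust):

```shell
# Count traj_{seed} directories per split as a sanity check.
DATA_ROOT=/path/to/data   # placeholder: point this at your extracted data
for split in train val test; do
  # Each task directory under a split contains traj_{seed} subdirectories.
  n=$(find "$DATA_ROOT/simple_tasks/$split" -maxdepth 2 -type d -name 'traj_*' 2>/dev/null | wc -l)
  echo "simple_tasks/$split: $n trajectories"
done
```

A count of 0 for any split usually means the archive was extracted to the wrong location or the merge step was skipped.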

You can also generate your own trajectories to evaluate your policy on.

Checkpoints

We provide StreamRoboLM (llama3) checkpoints for the L0, L1, and L2 task levels. The architecture uses Llama-3.2-3B-Instruct as the base language model and is trained using LoRA. Use scripts/download_ckpts.sh to download the checkpoints.

bash scripts/download_ckpts.sh ./models/

Once completed, the directory ./models/ should contain three directories, l0_ckpt/, l1_ckpt/, and l2_ckpt/, holding the L0, L1, and L2 checkpoints respectively.
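A minimal check that the download completed, using the three directory names listed above:

```shell
# Verify that all three checkpoint directories exist after download.
MODELS_DIR=./models
missing=0
for ckpt in l0_ckpt l1_ckpt l2_ckpt; do
  if [ ! -d "$MODELS_DIR/$ckpt" ]; then
    echo "missing checkpoint: $MODELS_DIR/$ckpt"
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo "all checkpoints present"
fi
```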

The BSD 3-Clause Clear License of this repository also applies to the model weights.

Evaluation

Once you have downloaded/generated the data and downloaded a checkpoint, update the corresponding paths for the checkpoint (checkpoint_path), test trajectories (data_root), and output directory (output_dir) in eval_config.yaml. To reproduce the numbers in the paper, run the evaluation on the test split of the dataset. Then use the following command to evaluate the respective checkpoint and reproduce the finetuning results:

PYTHONPATH=./ python scripts/main.py --config-path clevrskills_bench/eval_config.yaml --task_level {task_level}

where task_level can be L0, L1, or L2.
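For reference, the three keys named above might look like this in eval_config.yaml. The key names come from the instructions; the values are illustrative placeholders, and any other keys in the file should be left as shipped:

```yaml
# Illustrative values only -- substitute your own paths.
checkpoint_path: ./models/l0_ckpt
data_root: /path/to/data/simple_tasks/test
output_dir: ./eval_outputs/l0
```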

We recommend using a GPU with at least 48GB of VRAM, along with at least 32GB of system RAM.
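To get numbers for all three levels in one go, the single-level command above can be wrapped in a loop; the `|| echo` guard is an addition that keeps the loop going if one run fails:

```shell
# Evaluate the L0, L1, and L2 checkpoints sequentially.
for task_level in L0 L1 L2; do
  PYTHONPATH=./ python scripts/main.py \
    --config-path clevrskills_bench/eval_config.yaml \
    --task_level "$task_level" \
    || echo "evaluation failed for $task_level"  # continue with next level
done
```

Remember that checkpoint_path and output_dir in eval_config.yaml still need to match the checkpoint being evaluated.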

Repository Structure

ClevrSkills-Bench
|
└───clevrskills_bench: core package
|   |
|   └───streamrobolm: code for the streamrobolm architecture
|   |   └───model_wrappers: wrapper that encapsulates the submodules and implements `forward`
|   |   |   |   lm_wrappers.py: abstract language model wrapper
|   |   |   |   vlm_wrappers.py: abstract vlm wrapper
|   |   |   |   robo_wrappers.py: encapsulates streamrobolm and implements forward pass
|   |   └───models:
|   |   |   └───huggingface: code for the llama3 architecture from Hugging Face, modified for streamrobolm
|   |   |   └───vision: code for the vision head (ViT)
|   |   └───model.py: code to initialize the model given config parameters
|   |   └───robo_heads.py: implements the lstm action head
|   |   └───utils.py: miscellaneous utilities for the architecture code
|   |
|   └───evaluator.py: main code for evaluation
|   |
|   └───utils.py: miscellaneous utilities for evaluation
└───scripts:
|   |
|   └───main.py: evaluation entry script

Citation

If you find our code useful, please cite:

@article{haresh2024clevrskills,
  title={ClevrSkills: Compositional Language And Visual Understanding in Robotics},
  author={Haresh, Sanjay and Dijkman, Daniel and Bhattacharyya, Apratim and Memisevic, Roland},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}