Skip to content

Latest commit

 

History

History
80 lines (77 loc) · 2.56 KB

README.md

File metadata and controls

80 lines (77 loc) · 2.56 KB

Data Preparation

Install Download Tools

pip install video2dataset

or

git clone https://github.com/iejMac/video2dataset
cd video2dataset
pip install -e .

Download Metadata

wget -O hdvila100m.zip https://hdvila.blob.core.windows.net/dataset/hdvila100m.zip?sp=r&st=2022-06-28T03:33:11Z&se=2026-01-01T11:33:11Z&spr=https&sv=2021-06-08&sr=b&sig=VaqQkLFDqKinfkaPNs1jJ1EQIYCB%2FUPYiqFqmjWye6Y%3D

Then unzip the metadata zip file.

unzip hdvilla100m.zip

With the metadata, we will deal with these data into parquet files by running this code:

python makeparquet.py

Once you run this, you should have a file hd_vila.parquet with all the relevant metadata. The files are organized as:

data
├── caption_config
├── model
├── scripts
├── utils
├── makeparquet.py
├── config.yaml
├── download_hdvila.sh
├── hdvila
│   ├── hdvila_part0.jsonl 
│   ├── hdvila_part1.jsonl 
│   ├── hdvila_part2.jsonl 
│   ├── hdvila_part3.jsonl 
│   ├── hdvila_part4.jsonl
│   ├── hdvila_part5.jsonl
│   ├── hdvila_part6.jsonl
│   ├── hdvila_part7.jsonl
│   ├── hdvila_part8.jsonl
│   ├── hdvila_part9.jsonl
│   ├── hdvila_part10.jsonl
│   ├── hd_vila.parquet

Download HDVILA-100M Source Data

Please check your path in download_hdvila.sh before running the script for downloading the dataset:

bash download_hdvila.sh

Annotate Your Videos

  1. Download Pretrained Captioners for Videos (Images) and Audio.

    pip install gdown
    gdown https://drive.google.com/file/d/1vYqb0Lb_3sQ5bo6XV-FQ4n7k_0M9UMU3/view?usp=sharing
    tar -xvf audio_captioner.tar.gz
    gdown https://drive.google.com/file/d/1ZFCWZ8csMWLYsg9CWt71PJmKYpSn-FMt/view?usp=sharing
    tar -xvf vision_captioner.tar.gz
  2. Deploy captioners for data annotation Set up the python environment for captioner.

    bash setup_env.sh

    Video Annotation with Captions

    bash scripts/run_vision_captioner.sh

    Audio Annotation with Captions

    bash scripts/run_audio_captioner.sh
  3. (Optional) Deploy Depth Estimator to annotate 3D contents We highly recommend you to use GeoWizard to generate high-quality 3D contents. while the shortage of GeoWizard is the inference speed of generative models. Therefore, in our practice, we use the DPT to annotate major data.