HIVE is a program for creating 3D free-viewpoint video from RGB video.
A live demo can be viewed at anthonydickson.github.io/HIVE_Renderer.
The following instructions assume you are using Docker to run HIVE.
We recommend using the standalone Docker image anthonydickson/hive:standalone
if you are only interested in running HIVE.
It contains everything needed to run HIVE.
If you want to run edited code, refer to Setting Up Your Development Environment.
- Windows or Ubuntu
- NVIDIA GPU with 6 GB+ memory, e.g. RTX 2080 Ti, RTX 3060.
- CUDA 11.6+
- 16 GB of RAM
Here are the steps to quickly run a video through HIVE. You only need to download (pull) the Docker image.
- Make sure you have Docker installed and that GPU support is enabled. You can check this by running a container with the option `--gpus all` and verifying that the `nvidia-smi` command shows the correct information.
- Create one folder containing two folders: a folder for your videos/datasets and a folder for the outputs:
  ```
  mkdir HIVE
  cd HIVE
  mkdir data
  mkdir outputs
  ```
- Copy your videos or RGB-D datasets (TUM, StrayScanner or Unreal) into your data folder, e.g. `HIVE/data`.
- Run the following command in a terminal from inside the root folder, e.g. `HIVE/`:

  ```
  GUI_PORT_NUMBER=8081
  SERVER_PORT_NUMBER=8080
  docker run --name HIVE --rm --gpus all \
      -p ${GUI_PORT_NUMBER}:${GUI_PORT_NUMBER} -p ${SERVER_PORT_NUMBER}:${SERVER_PORT_NUMBER} \
      -v $(pwd)/data:/app/data -v $(pwd)/outputs:/app/outputs \
      -it anthonydickson/hive:standalone python3 -m hive.interface --port ${GUI_PORT_NUMBER}
  ```
- Navigate to localhost:8081 and fill in `dataset_path` and `output_path` with the relative path to the dataset in your data folder and the relative path of the output folder in your outputs folder, then click the 'Start Pipeline' button at the bottom of the page. You can leave the other settings at their default values. For long videos, you can set `num_frames` to something like 150 to make it run faster. For datasets with ground truth data, uncheck `estimate_pose` and/or `estimate_depth` to use the ground truth data.
- After the pipeline has finished running, check the terminal output for a link. Navigate to that link in your web browser to view the 3D video.
- To run another video, exit the Docker container in your terminal (Ctrl+C).
Note: Oculus headset users can use desktop mode in Oculus Link to view the 3D video in VR. Run the steps above as normal, and at the end click the button at the bottom of the screen saying 'Enter VR' from your headset.
Clone the repo:
git clone --recurse-submodules https://github.com/AnthonyDickson/HIVE.git
If you forget to, or cannot, clone with `--recurse-submodules`, then clone the git dependencies with the following:
git submodule update --init --recursive
This command can also be used to pull/update any changes in the submodules.
Choose one of three options for setting up the dev environment (in the recommended order):
- Option 1: Pull (download) the pre-built Docker image (~18 GB):
docker pull anthonydickson/hive:runtime-cu118
- Done! Go to Running the Program for basic usage.
- Option 2: Build the Docker image yourself by running the build command:
docker build -f Dockerfile.runtime -t anthonydickson/hive:runtime-cu118 .
- Done! Go to Running the Program for basic usage.
- Option 3: Install HIVE locally. First, install Python 3.8. A virtual environment (e.g., Conda, virtualenv, pipenv) is recommended.
- Install the Python packages:
pip install -r requirements.txt
- Install the dependencies:
  - CUDA Toolkit 11.6+
  - OpenCV 3.4.16
    - Make sure to enable the CMake flag `-DWITH_CUDA=true`.
  - COLMAP
  - The `gltf_2.0_draco_extension` branch of Draco.
- Download the model weights:
  ```
  WEIGHTS_PATH=/root/.cache/pretrained
  COLMAP_VOCAB_PATH=/root/.cache/colmap
  mkdir ${WEIGHTS_PATH}
  mkdir ${COLMAP_VOCAB_PATH}
  python scripts/download_weights.py
  wget https://github.com/intel-isl/DPT/releases/download/1_0/dpt_hybrid_nyu-2ce69ec7.pt -O ${WEIGHTS_PATH}/dpt_hybrid_nyu.pt
  curl -L $(yadisk-direct https://disk.yandex.ru/d/ouP6l8VJ0HpMZg) -o big-lama.zip && unzip big-lama.zip -d ${WEIGHTS_PATH} && rm big-lama.zip
  wget https://demuc.de/colmap/vocab_tree_flickr100K_words256K.bin -O ${COLMAP_VOCAB_PATH}/vocab.bin
  ```
Refer to the Dockerfile for detailed setup instructions on Ubuntu 20.04.
- Done! Go to Running the Program for basic usage.
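If you set up a local install, you can sanity-check that the model weights from the download step above landed in the expected locations. This is only a convenience sketch (not part of HIVE); it assumes the LaMa archive extracted to a `big-lama` folder under the weights path.

```
# Quick sanity check for the model weights downloaded above.
# NOTE: the big-lama folder name is an assumption about how the archive extracts.
from pathlib import Path

weights_path = Path("/root/.cache/pretrained")
colmap_vocab_path = Path("/root/.cache/colmap")

expected = [
    weights_path / "dpt_hybrid_nyu.pt",   # DPT depth estimation weights
    weights_path / "big-lama",            # LaMa inpainting weights (extracted zip)
    colmap_vocab_path / "vocab.bin",      # COLMAP vocabulary tree
]

for path in expected:
    print(f"{'OK     ' if path.exists() else 'MISSING'} {path}")
```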
You can download a sample dataset from the TUM website.
The sequence `fr3/walking_xyz` is a good one to start with.
Make sure you download and extract the dataset to the `data/` folder.
The following examples assume you are using the standalone Docker image.
Below is an example of how to run the program with ground truth data:
```
docker run --rm --gpus all \
    -v $(pwd)/data:/app/data -v $(pwd)/outputs:/app/outputs -p 8080:8080 \
    -it anthonydickson/hive:standalone \
    python3 -m hive --dataset_path data/rgbd_dataset_freiburg3_walking_xyz \
    --output_path data/rgbd_dataset_freiburg3_walking_xyz_output \
    --num_frames 150 --webxr_run_server
```
Below is an example of how to run the program with estimated data:
```
docker run --rm --gpus all \
    -v $(pwd)/data:/app/data -v $(pwd)/outputs:/app/outputs -p 8080:8080 \
    -it anthonydickson/hive:standalone \
    python3 -m hive --dataset_path data/rgbd_dataset_freiburg3_walking_xyz \
    --output_path data/rgbd_dataset_freiburg3_walking_xyz_output \
    --num_frames 150 --frame_step 15 --estimate_pose --estimate_depth --webxr_run_server
```
If you are using the cloned repo and either the runtime or dev Docker image, you will need to change the commands in this README by:
- mounting the entire repository folder `-v $(pwd):/app` instead of just the data and outputs folders,
- removing the port bindings (e.g., `-p 8080:8080`),
- removing the flag `--webxr_run_server`,
- running the renderer web server manually (see Viewing the 3D Video).
Run configurations for PyCharm are included in the `.idea` folder when you clone the repo from GitHub.
If you want help with the CLI and the options, you can either refer to options.py or view the help via:
docker run --rm -it anthonydickson/hive python3 -m hive --help
Below is a list of the most useful CLI options:
- `--dataset_path <path/to/dataset>` Specify the path to either: a video file, a TUM dataset or an iPhone dataset (StrayScanner).
- `--output_path <path/to/folder>` Specify where the results should be written to.
- `--overwrite_ok` Allow existing video files in `output_path` or the WebXR export path to be overwritten.
- `--no_cache` By default, the pipeline will use any cached converted datasets in `output_path`. Use this flag to automatically delete any cached datasets.
- `--estimate_depth` By default, the pipeline will try to use any depth maps that are provided with the input sequence. Use this flag to estimate depth maps instead.
- `--estimate_pose` By default, the pipeline will try to use the ground truth camera intrinsics matrix and poses in the `camera_matrix.txt` and `camera_trajectory.txt` files. Use this flag to estimate the camera parameters via COLMAP instead.
- `--num_frames <int>` If specified, any frames after this index are truncated.
- `--align_scene` Whether to align the scene with the ground plane. Enable this if the recording device was held at an angle (facing upwards or downwards, not level) and the scene is not level in the renderer. This setting is recommended if you are using estimated pose.
- `--inpainting_mode` Inpaint gaps in the background:
  - 0 - no inpainting.
  - 1 - Depth: cv2, Background: cv2
  - 2 - Depth: cv2, Background: LaMa
  - 3 - Depth: LaMa, Background: cv2
  - 4 - Depth: LaMa, Background: LaMa
- `--billboard` Creates flat billboards for foreground objects. This is intended as a workaround for cases where the estimated depth results in stretched-out meshes with missing body parts.
- `--static_camera` Indicate that the camera was not moving during capture. This will use the Kinect sensor camera matrix and the identity pose for the camera trajectory. Note: You do not need the flag `--estimate_pose` when using this flag.
- `--webxr_run_server` Whether to automatically run the renderer web server.
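If you need to process several videos with the same settings, the CLI flags above can be composed programmatically. The sketch below simply shells out to `python3 -m hive` with a few of the documented flags; the dataset filenames are placeholders, and it assumes you are running inside the container (or a local install) where the `hive` module is available.

```
# Batch-process several videos with the same settings by invoking the hive CLI.
# The dataset filenames below are placeholders.
import subprocess
from pathlib import Path

datasets = ["data/video_01.mp4", "data/video_02.mp4"]

for dataset_path in datasets:
    output_path = Path("outputs") / Path(dataset_path).stem

    subprocess.run(
        [
            "python3", "-m", "hive",
            "--dataset_path", dataset_path,
            "--output_path", str(output_path),
            "--num_frames", "150",
            "--estimate_pose",
            "--estimate_depth",
            "--align_scene",
        ],
        check=True,  # raise if the pipeline exits with a non-zero status
    )
```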
The Docker containers will, by default, bring up the python interpreter. All you need to do to get the main script (or any other script) running is to append the usual command, minus the call to python, to the following:
docker run --rm --gpus all -it anthonydickson/hive:standalone
For example, if you wanted to test whether the container is CUDA enabled:
docker run --rm --gpus all -it anthonydickson/hive:standalone python3 -c "import torch; print(torch.cuda.is_available())"
You can run the pipeline from a web-based interface instead of the CLI. Assuming you are using Docker, you can do this by running the following command:
docker run -v $(pwd)/data:/app/data -v $(pwd)/outputs:/app/outputs -p 8081:8081 -p 8080:8080 --rm --gpus all -it anthonydickson/hive:standalone python3 -m hive.interface
and navigating to localhost:8081. Note that you should use relative paths for the dataset path and output paths.
Thank you to Felix for implementing this web interface and the image inpainting.
- Start the Docker container:
  - If you are using the standalone Docker image, simply add the flag `--webxr_run_server` when you run HIVE. If you want to view a video from an output folder, run the following command and navigate to localhost:8080:

    ```
    docker run --rm -p 8080:8080 -v $(pwd)/outputs/<dataset_name>/mesh:/app/third_party/HIVE_Renderer/docs/demo -it anthonydickson/hive:standalone bash -c "cd /app/third_party/HIVE_Renderer && npm run start"
    ```
  - If you cloned the repo and are using the Docker image `anthonydickson/hive:runtime-cu118`, run:

    ```
    docker run --rm --name Hive-Renderer -p 8080:8080 -v $(pwd)/third_party/webxr3dvideo/src:/app/src:ro -v $(pwd)/third_party/webxr3dvideo/docs:/app/docs anthonydickson/hive-renderer:node-16
    ```

    If you are using PyCharm, there is a run configuration included.
- When you run the pipeline, it will print the link to view the video; navigate to this link in your browser.
Refer to the Hive Renderer repo for the code.
This program accepts datasets in the following formats:
- TUM RGB-D SLAM Dataset
- RGB-D datasets created on an iOS device using StrayScanner
- RGB Video
- The HIVE dataset format (see HIVE Dataset Format)
The above datasets are automatically converted to the HIVE dataset format.
Each 3D video is saved to a folder with the glTF formatted mesh files and JSON metadata:
```
<converted dataset>
│   ...
└── mesh
    │   fg.glb
    │   bg.glb
    └── metadata.json
```
This folder is saved under the dataset folder.
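If you want to inspect the exported files outside the renderer, `fg.glb` and `bg.glb` are standard binary glTF files and can be opened with any glTF-aware tool. The sketch below uses the `trimesh` package, which is not part of HIVE, and a hypothetical output folder name.

```
# Inspect an exported 3D video folder (hypothetical path) with trimesh.
import json
import trimesh

mesh_dir = "outputs/my_video/mesh"

# fg.glb / bg.glb are binary glTF files; force='scene' always returns a Scene.
foreground = trimesh.load(f"{mesh_dir}/fg.glb", force="scene")
background = trimesh.load(f"{mesh_dir}/bg.glb", force="scene")
print("Foreground sub-meshes:", len(foreground.geometry))
print("Background sub-meshes:", len(background.geometry))

# The accompanying metadata is plain JSON.
with open(f"{mesh_dir}/metadata.json") as f:
    metadata = json.load(f)
print("Metadata keys:", list(metadata.keys()))
```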
Overall, the expected folder structure for the HIVE dataset format is as follows:
```
<dataset>
│   metadata.json
│   camera_matrix.txt
│   camera_trajectory.txt
│   rgb
│   │   000000.png
│   │   000001.png
│   │   ...
│   └── 999999.png
│   depth
│   │   000000.png
│   │   000001.png
│   │   ...
│   └── 999999.png
│   mask
│   │   000000.png
│   │   000001.png
│   │   ...
│   └── 999999.png
└── ...
```
Datasets should be placed in a folder inside the `data/` folder.
Generally, the number of colour frames must match the number of depth maps, masks and lines in the camera trajectory file.
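The sketch below is a quick way to check this before running the pipeline; it is not part of HIVE and uses a hypothetical dataset folder.

```
# Check that the rgb, depth and mask frame counts match the trajectory length.
from pathlib import Path

dataset_dir = Path("data/my_dataset")  # hypothetical HIVE-format dataset

counts = {name: len(list((dataset_dir / name).glob("*.png")))
          for name in ("rgb", "depth", "mask")}
counts["trajectory"] = len(
    (dataset_dir / "camera_trajectory.txt").read_text().splitlines()
)

print(counts)
assert len(set(counts.values())) == 1, f"Frame counts do not match: {counts}"
```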
Within each dataset folder, there should be the following 5 items:
- The metadata in a JSON formatted file that contains the following fields:
  - `num_frames`: The number of frames in the video sequence.
  - `frame_step`: The frequency in frames to sample frames for COLMAP and pose optimisation.
  - `fps`: The framerate of the video.
  - `width`: The width of the video frames in pixels.
  - `height`: The height of the video frames in pixels.
  - `depth_scale`: A scalar that, when multiplied with a depth map, transforms the depth values to meters.
  - `max_depth`: Depth values are clipped to this value. Roughly corresponds to meters.
  - `depth_mask_dilation_iterations`: The number of times to apply the dilation filter to the dynamic object masks when creating the masked depth maps.
  - `is_gt`: Whether the dataset was created using ground truth camera and depth data.
  - `estimate_pose`: Whether the camera parameters were estimated with COLMAP.
  - `estimate_depth`: Whether the depth maps were estimated.
  - `colmap_options`: The COLMAP configuration that was used if `estimate_pose` was `True`. This is a nested dictionary that contains the following fields:
    - `binary_path`: The path to the COLMAP binary.
    - `vocab_path`: The path to the COLMAP vocab file.
    - `is_single_camera`: Whether the dataset was captured with a single camera.
    - `dense`: Whether to run the dense reconstruction.
    - `quality`: The preset to use for the automatic reconstruction pipeline.
- The camera intrinsics in a text file in the following format:

  ```
  fx  0 cx
   0 fy cy
   0  0  1
  ```

  where `fx`, `fy` are the focal length (pixels) and `cx`, `cy` the principal point.
- The camera trajectory in a text file in the following format:

  ```
  qx qy qz qw tx ty tz
  ...
  qx qy qz qw tx ty tz
  ```

  where `qx`, `qy`, `qz` and `qw` form a quaternion, and `tx`, `ty` and `tz` form a translation vector. There must be one line per frame in the input video sequence. Absolute pose values are expected (i.e. all relative to the world origin) that transform points in world coordinates to camera space. Pose data is stored in the same coordinate frame as COLMAP: x (right), -y (down), -z (forwards, camera looking down +z). To convert to a right-handed coordinate system, you will need to multiply by the following homogeneous transform:

  ```
  1  0  0  0
  0 -1  0  0
  0  0 -1  0
  0  0  0  1
  ```
- The colour (RGB) frames are PNG files in a folder with names that preserve the frames' natural ordering, e.g.:

  ```
  rgb
  │   000001.png
  │   000002.png
  │   ...
  └── 999999.png
  ```
- The depth maps are 16-bit PNG files in a folder with names that preserve the frames' natural ordering, e.g.:

  ```
  depth
  │   000001.png
  │   000002.png
  │   ...
  └── 999999.png
  ```

  The depth maps are expected to be stored as 16-bit grayscale images. The depth values should be in millimeters and increase with distance from the camera (i.e. a depth of 0 is at the camera, and a depth value of 1000 is 1000 millimeters from the camera).
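To make the conventions above concrete, here is a minimal loading sketch. It is not part of HIVE: the dataset path and frame name are placeholders, it assumes `numpy`, `opencv-python` and `scipy` are installed, and the left-multiplication by the coordinate flip is one reasonable reading of the transform given above.

```
# Minimal sketch: load the first frame of a HIVE-format dataset (hypothetical path).
import json

import cv2
import numpy as np
from scipy.spatial.transform import Rotation

dataset_dir = "data/my_dataset"

with open(f"{dataset_dir}/metadata.json") as f:
    metadata = json.load(f)

# 3x3 intrinsics matrix in the fx/fy/cx/cy layout shown above.
camera_matrix = np.loadtxt(f"{dataset_dir}/camera_matrix.txt")

# One pose per frame: qx qy qz qw tx ty tz (absolute, world-to-camera).
trajectory = np.loadtxt(f"{dataset_dir}/camera_trajectory.txt")
qx, qy, qz, qw, tx, ty, tz = trajectory[0]

pose = np.eye(4)
pose[:3, :3] = Rotation.from_quat([qx, qy, qz, qw]).as_matrix()  # scalar-last order
pose[:3, 3] = [tx, ty, tz]

# Convert from the COLMAP-style frame to a right-handed one using the
# homogeneous transform listed above (applied here as a left-multiplication).
flip = np.diag([1.0, -1.0, -1.0, 1.0])
pose_right_handed = flip @ pose

# Load a 16-bit depth map and scale it to meters using depth_scale.
depth = cv2.imread(f"{dataset_dir}/depth/000000.png", cv2.IMREAD_ANYDEPTH)
depth_meters = depth.astype(np.float32) * metadata["depth_scale"]

print(camera_matrix.shape, pose_right_handed.shape, depth_meters.max())
```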
If you use our code, please cite:
@inproceedings{dickson2022vrvideos,
title={VRVideos: A Flexible Pipeline for VR Video Creation},
author={Dickson, Anthony and Shanks, Jeremy and Ventura, Jonathan and Knott, Alistair and Zollmann, Stefanie},
booktitle={2022 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR)},
pages={1--5},
year={2022},
organization={IEEE}
}