PyTorch implementation of a collection of scalable Video Transformer Benchmarks.
Developed the ViViT model for medical video classification, enhancing 3D organ image analysis using transformer-based architectures.
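The architecture referenced here, ViViT (Arnab et al., 2021), builds its token sequence with tubelet embedding: a 3D convolution over space and time that makes the model natively 3D-aware, which is what suits it to volumetric organ video. A minimal PyTorch sketch of that embedding step; the module name and dimensions are illustrative, not taken from this repository:

```python
# Tubelet embedding: non-overlapping 3D patches projected to tokens.
import torch
import torch.nn as nn

class TubeletEmbedding(nn.Module):
    def __init__(self, embed_dim=768, tubelet=(2, 16, 16), in_ch=3):
        super().__init__()
        # kernel == stride, so patches do not overlap.
        self.proj = nn.Conv3d(in_ch, embed_dim, kernel_size=tubelet, stride=tubelet)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, channels, frames, height, width)
        x = self.proj(video)                 # (B, D, T', H', W')
        return x.flatten(2).transpose(1, 2)  # (B, T'*H'*W', D) token sequence

tokens = TubeletEmbedding()(torch.randn(1, 3, 32, 224, 224))
print(tokens.shape)  # torch.Size([1, 3136, 768]) -- 16*14*14 tubelets
```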
Python script to fine-tune the open-source Video Vision Transformer (ViViT) using the Hugging Face Trainer library.
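A minimal sketch of what such a fine-tuning script looks like, assuming the `google/vivit-b-16x2-kinetics400` checkpoint from the `transformers` library; the dummy dataset is a stand-in for real video data, and the hyperparameters are placeholders:

```python
import torch
from torch.utils.data import Dataset
from transformers import VivitForVideoClassification, TrainingArguments, Trainer

class DummyVideoDataset(Dataset):
    """Stand-in for a real dataset: 32-frame clips at 224x224, 10 classes."""
    def __len__(self):
        return 8
    def __getitem__(self, idx):
        return {"pixel_values": torch.randn(32, 3, 224, 224),
                "labels": torch.tensor(idx % 10)}

model = VivitForVideoClassification.from_pretrained(
    "google/vivit-b-16x2-kinetics400",
    num_labels=10,                 # replace with your label count
    ignore_mismatched_sizes=True,  # re-initialize the classification head
)

args = TrainingArguments(
    output_dir="vivit-finetuned",
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    num_train_epochs=3,
    remove_unused_columns=False,   # keep pixel_values in the batch
)

trainer = Trainer(model=model, args=args, train_dataset=DummyVideoDataset())
trainer.train()
```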
The dataset used for the "A non-contact SpO2 estimation using video magnification and infrared data" publication
Video vision transformers for hierarchical anomaly detection in video scenes.
Some incomplete work on 2D action recognition on the MM-Fit dataset using ViT, ViViT, and MLP-Mixer.
A comparative study of ViViT and CNN-GRU sequence models for video action recognition on the UCF101 dataset.
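For context, the CNN-GRU baseline in such comparisons typically encodes each frame with a 2D CNN and aggregates the frame features over time with a GRU. A hedged PyTorch sketch under those assumptions; the class name and hyperparameters are illustrative, not this study's code:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CNNGRU(nn.Module):
    def __init__(self, num_classes=101, hidden=256):  # UCF101 has 101 classes
        super().__init__()
        backbone = resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.gru = nn.GRU(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)  # (B*T, 512)
        _, h = self.gru(feats.view(b, t, -1))             # h: (1, B, hidden)
        return self.head(h[-1])                           # (B, num_classes)

print(CNNGRU()(torch.randn(2, 8, 3, 224, 224)).shape)  # torch.Size([2, 101])
```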
Unofficial TensorFlow implementation of the ViViT model architecture.
This repository contains code for training and evaluating transformer-based models like TimeSformer and VideoMAE for sign language recognition on the WLASL dataset. The project includes frame sampling techniques, preprocessing pipelines, fine-tuning strategies, and performance evaluation using metrics like top-1, top-5, and top-10 accuracy.
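Two of the building blocks named above fit in a few lines: uniform frame sampling (spread a fixed number of frame indices evenly across a video) and top-k accuracy (the true label appears among the k highest-scoring predictions). The function names below are illustrative, not this repository's API:

```python
import numpy as np
import torch

def sample_frame_indices(num_video_frames: int, clip_len: int) -> np.ndarray:
    """Pick `clip_len` frame indices spread evenly across the video."""
    return np.linspace(0, num_video_frames - 1, num=clip_len).round().astype(int)

def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int) -> float:
    """Fraction of samples whose true label is among the k highest logits."""
    topk = logits.topk(k, dim=-1).indices               # (batch, k)
    return (topk == labels.unsqueeze(-1)).any(-1).float().mean().item()

# Example: sample 16 frames from a 300-frame video, score random predictions.
idx = sample_frame_indices(300, 16)
logits, labels = torch.randn(4, 100), torch.randint(0, 100, (4,))
print(idx[:5], topk_accuracy(logits, labels, k=5))
```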