PaperTok is a web application that converts PDF content into engaging TikTok-style videos. It automates the process of content extraction, script generation, and video creation, making it easy to transform academic papers, stories, or any PDF document into shareable short-form videos.
- PDF content extraction
- AI-powered script generation
- Text-to-speech conversion
- Automatic video creation with captions
- Web interface for easy uploads and downloads
To run PaperTok, you need:
- Python 3.7+
- FFmpeg
- A Gemini API key
FFmpeg is required for video processing. Follow these instructions to install it:
- Download FFmpeg from https://ffmpeg.org/download.html
- Extract the downloaded archive
- Add the FFmpeg `bin` folder to your system PATH
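Alternatively, install it with a package manager:
- Homebrew (macOS): `brew install ffmpeg`
- apt (Debian/Ubuntu): `sudo apt update`, then `sudo apt install ffmpeg`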
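Once FFmpeg is installed, a quick check can confirm that it is reachable from Python before starting the app. The following is a minimal sketch using only the standard library; the helper names `find_ffmpeg` and `ffmpeg_version` are hypothetical and are not part of PaperTok.

```python
import shutil
import subprocess


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path to the FFmpeg binary, or None if it is not on PATH."""
    return shutil.which(executable)


def ffmpeg_version(path):
    """Return the first line printed by `ffmpeg -version` for the given binary."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("FFmpeg was not found on PATH; revisit the installation steps.")
    else:
        print(f"Found FFmpeg at {path}")
        print(ffmpeg_version(path))
```

If the script reports that FFmpeg was not found, the most likely cause is that the `bin` folder has not been added to PATH, or that the terminal was opened before PATH was updated.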
- Clone the repository:
    git clone https://github.com/yourusername/papertok.git
    cd papertok
- Create and activate a virtual environment:
    python -m venv .venv
    source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`
- Install the required packages:
    pip install -r requirements.txt
- Set up environment variables:
  - Copy `.env.example` to `.env`
  - Fill in the necessary Gemini API key and configurations
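To illustrate how the values in `.env` might be read at startup, here is a minimal standard-library sketch. It assumes the key is stored under the name `GEMINI_API_KEY`, which is a guess; check `.env.example` for the actual variable names. The functions `load_env_file` and `get_api_key` are hypothetical, and the project itself may well rely on a dedicated library instead.

```python
import os


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            values[key.strip()] = value.strip().strip("\"'")
    return values


def get_api_key(env, name="GEMINI_API_KEY"):
    """Return the key from the parsed file, falling back to the process environment."""
    key = env.get(name) or os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file.")
    return key


if __name__ == "__main__":
    try:
        get_api_key(load_env_file())
        print("Gemini API key found.")
    except RuntimeError as error:
        print(error)
```

Failing early with a clear message when the key is missing tends to be easier to debug than an authentication error raised later during script generation.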
- Start the FastAPI web application:
    uvicorn main:app --reload
- Open a web browser and navigate to http://localhost:8000
- Upload a PDF file and follow the on-screen instructions to generate your TikTok video.
- `main.py`: Main FastAPI application
- `caption_generator.py`: Generates captions for the video
- `script_generator.py`: AI-powered script generation from PDF content
- `text_to_speech.py`: Converts generated script to speech
- `video_processing.py`: Handles video creation and editing
- `utils.py`: Utility functions
- `templates/`: HTML templates for the web interface
- `static/output/`: Static assets and output for generated videos
- `uploads/`: Temporary storage for uploaded PDFs
- `pdfs/`: Storage for processed PDF files
- `background.mp4`: Background video used in the video creation process
- `voiceOver.wav`: Audio file generated by text-to-speech
- `script.txt`: Contains the script generated from PDF content
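To give a rough idea of how `background.mp4` and `voiceOver.wav` could be combined into a finished clip, the sketch below builds an FFmpeg command that pairs the background video with the narration audio. This is an illustration under stated assumptions, not the code in `video_processing.py`: the function names and the output path `static/output/final.mp4` are hypothetical, and caption rendering is left out entirely.

```python
import subprocess


def build_mux_command(
    background="background.mp4",
    voice_over="voiceOver.wav",
    output="static/output/final.mp4",  # hypothetical output location
):
    """Build an FFmpeg command that pairs the background video with the narration."""
    return [
        "ffmpeg",
        "-y",                # overwrite the output file if it already exists
        "-i", background,    # input 0: background video
        "-i", voice_over,    # input 1: text-to-speech audio
        "-map", "0:v:0",     # take the video stream from the first input
        "-map", "1:a:0",     # take the audio stream from the second input
        "-c:v", "copy",      # copy the video stream without re-encoding
        "-c:a", "aac",       # encode the narration as AAC for the MP4 container
        "-shortest",         # stop when the shorter of the two inputs ends
        output,
    ]


def run_mux(command):
    """Run the FFmpeg command and raise CalledProcessError if it fails."""
    subprocess.run(command, check=True)
```

Calling `run_mux(build_mux_command())` from the project root would attempt the merge, provided FFmpeg is on PATH and both input files exist. Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.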
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.