verbatim

This is a command-line tool for creating transcripts of conversations recorded as audio files. It uses OpenAI's Whisper for transcription and pyannote.audio for speaker diarization.
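To illustrate the general idea of combining the two models (this is a simplified sketch, not verbatim's actual code): diarization yields speaker turns with timestamps, transcription yields text segments with timestamps, and each text segment can be attributed to the speaker whose turn overlaps it most.

```python
# Illustrative sketch: label transcription segments with speakers by
# picking, for each segment, the diarization turn with maximum overlap.
def assign_speakers(transcript_segments, speaker_turns):
    """transcript_segments: [(start, end, text)]
    speaker_turns: [(start, end, speaker)]
    returns: [(speaker, text)]"""
    labeled = []
    for seg_start, seg_end, text in transcript_segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in speaker_turns:
            # Overlap between the segment and this speaker turn (may be negative).
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled

turns = [(0.0, 4.0, "SPEAKER_00"), (4.0, 9.0, "SPEAKER_01")]
segments = [(0.5, 3.5, "Hi there."), (4.2, 8.0, "Hello!")]
print(assign_speakers(segments, turns))
```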

Requirements

ffmpeg installed

Download ffmpeg and install it.

Hugging face token

The script requires a Hugging Face API token to download the pyannote.audio models. Here's how to get one.

Accepting terms and conditions

To download the pyannote.audio models you need to accept their terms and conditions. More on that here.

How to run

Get the code

Clone this repository using git:

git clone https://github.com/jannawro/verbatim.git

Installing dependencies

cd verbatim
python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt

Run

Run:

python verbatim/main.py \
    --audio-file sample.mp3 \
    --audio-format mp3 \
    --hugging-face-token hf_1234567890 \
    --speakers 2 \
    --output transcript.txt

To see all options run:

python verbatim/main.py --help

Recommendations

Use the optimal --whisper-model

Use Whisper model variants according to OpenAI's recommendations. The variant can be set via the --whisper-model flag. From initial testing, models smaller than "large" work fine for English. For satisfying results with other languages, "large" is recommended.
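A minimal sketch of this rule (the helper name is hypothetical; the model names are Whisper's released sizes, including the English-only ".en" variants):

```python
# Hypothetical helper encoding the recommendation above: smaller models
# are usually sufficient for English, while other languages benefit from
# the multilingual "large" model.
def pick_whisper_model(language: str) -> str:
    # "medium.en" is one of Whisper's English-only variants.
    return "medium.en" if language == "en" else "large"

print(pick_whisper_model("en"))  # medium.en
print(pick_whisper_model("pl"))  # large
```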

Use the optimal number of --workers

Verbatim will spin up the number of threads specified by --workers. This number cannot be greater than the number of CPU cores on your machine. Note that each worker also spins up a separate instance of Whisper for parallel processing, so choose --workers together with --whisper-model carefully to make sure you have enough resources for that many Whisper instances of the chosen size.
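The constraint above can be sketched as follows (a hypothetical validation helper, not verbatim's actual code):

```python
import os

# Hypothetical check mirroring the --workers constraint described above:
# the requested worker count must not exceed the number of CPU cores.
def effective_workers(requested: int) -> int:
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    if requested > cores:
        raise ValueError(
            f"--workers ({requested}) cannot exceed CPU cores ({cores})"
        )
    return requested

print(effective_workers(1))  # 1
```

Keep in mind that memory, not CPU, is often the binding constraint here, since each worker loads its own copy of the Whisper model.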

About

Transcribe conversations verbatim.
