
# verbatim

This is a command-line tool for creating transcripts of conversations recorded as audio files. It uses the following AI models to achieve that:

- pyannote.audio for speaker diarization
- OpenAI Whisper for transcription

## Requirements

### ffmpeg installed

Download ffmpeg from https://ffmpeg.org/download.html and install it.
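
You can verify the installation by checking that ffmpeg is on your `PATH`:

```
ffmpeg -version
```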

### Hugging Face token

The script requires a Hugging Face API token to download the pyannote.audio models. You can generate one under Access Tokens in your Hugging Face account settings.
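
If you'd rather not paste the token inline every time, one option (just a convention, not something the tool requires; `HF_TOKEN` is an arbitrary name) is to keep it in an environment variable and reference it when invoking the script as shown in the How to run section below:

```
# store the token once; pass it later via --hugging-face-token
export HF_TOKEN=hf_1234567890
python verbatim/main.py --hugging-face-token "$HF_TOKEN" --audio-file sample.mp3 --audio-format mp3 --speakers 2 --output transcript.txt
```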

### Accepting terms and conditions

In order to download the pyannote.audio models you need to accept their terms and conditions on the respective Hugging Face model pages.

## How to run

### Get the code

Clone this repository using git:

```
git clone https://github.com/jannawro/verbatim.git
```

### Installing dependencies

```
cd verbatim
python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
```
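
As a quick sanity check (assuming the requirements include the `openai-whisper` and `pyannote.audio` packages, which provide the models mentioned above), you can confirm the key imports resolve inside the virtual environment:

```
python -c "import whisper; import pyannote.audio; print('dependencies OK')"
```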

### Run

Run:

```
python verbatim/main.py \
    --audio-file sample.mp3 \
    --audio-format mp3 \
    --hugging-face-token hf_1234567890 \
    --speakers 2 \
    --output transcript.txt
```

To see all options, run:

```
python verbatim/main.py --help
```

## Recommendations

### Use the optimal `--whisper-model`

Use Whisper model variants according to the recommendations from OpenAI; the variant is set via the `--whisper-model` flag. From initial testing, models smaller than "large" work fine for English. For satisfying results in other languages, "large" is recommended.
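
For example, transcribing a non-English recording with the large variant might look like this (same flags as in the Run section; Whisper's published variants are tiny, base, small, medium, and large):

```
python verbatim/main.py \
    --audio-file sample.mp3 \
    --audio-format mp3 \
    --hugging-face-token hf_1234567890 \
    --speakers 2 \
    --whisper-model large \
    --output transcript.txt
```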

### Use the optimal number of `--workers`

Verbatim will spin up the number of threads specified by `--workers`. This number cannot be greater than the number of your CPU cores. Note that each worker also spins up a separate instance of Whisper for parallel processing, so choose this value together with `--whisper-model` to make sure you have enough resources to run that many Whisper instances of the chosen size.
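
As an illustration (a sketch only; `nproc` reports the core count on Linux, and "medium" is one of the smaller Whisper variants), you might pair several workers with a mid-sized model:

```
nproc  # check how many CPU cores are available
python verbatim/main.py \
    --audio-file sample.mp3 \
    --audio-format mp3 \
    --hugging-face-token hf_1234567890 \
    --speakers 2 \
    --whisper-model medium \
    --workers 4 \
    --output transcript.txt
```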