Contains a sample collection of files to get started with a local instance of Whisper that can also be deployed as an AWS Lambda function (the Lambda-hosted version has not been tested yet).
This project is mainly about experimenting with turning Whisper into a local API.
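function.py (the Lambda handler referenced further down) isn't shown here, but for a rough idea of the shape such a handler usually takes, here is a minimal sketch. The event format (base64-encoded audio under an "audio" key), the model size, and the names are assumptions made for illustration, not the repo's actual code.

```python
# Hypothetical sketch of a Whisper Lambda-style handler; not the repo's actual function.py.
# Assumes the event carries base64-encoded audio under an "audio" key (an assumption).
import base64
import tempfile

import whisper

# Load the model once at module level so warm invocations can reuse it.
model = whisper.load_model("base")


def handler(event, context):
    # Decode the incoming audio payload and write it to a temp file for Whisper/ffmpeg.
    audio_bytes = base64.b64decode(event["audio"])
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        f.write(audio_bytes)
        audio_path = f.name

    # Whisper shells out to ffmpeg to decode the audio, which is why the container needs it.
    result = model.transcribe(audio_path)
    return {"text": result["text"]}
```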
- Docker Desktop (if building locally)
- Python 3.10 or 3.11
- AWS Lambda RIE (Runtime Interface Emulator)
  - Download it from https://github.com/aws/aws-lambda-runtime-interface-emulator/releases
  - Simply grab the latest release named "aws-lambda-rie" (at the time of writing, the latest was 1.21)
  - Place the downloaded file in the docker/ directory
  - Alternatively, you can modify the Dockerfile to use an AWS Lambda base image, but ffmpeg is also inhumanly difficult to get working on RHEL-based images
- Git Bash is preferred on Windows; otherwise you may need to modify the commands a bit to work in cmd or PowerShell
- Your favorite IDE that can work with Python notebooks (VSCode is recommended)
- A GPU, preferably with a minimum of 4GB of VRAM (see the sanity check below)
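Before spending time on the rest of the setup, you can sanity-check that PyTorch sees your GPU with something like the snippet below. This assumes PyTorch is (or will be) installed in your environment, which Whisper pulls in as a dependency; adjust if your local requirements differ.

```python
# Quick sanity check that a CUDA-capable GPU is visible to PyTorch.
# Assumes PyTorch is installed; Whisper depends on it, so it usually arrives as a dependency.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA GPU detected; Whisper will fall back to CPU and run much slower.")
```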
- Initialize a virtual environment, activate it, and install dependencies:

  ```bash
  python -m venv .venv
  source .venv/Scripts/activate
  pip install -r requirements-local.txt
  ```
- Start up the container:

  ```bash
  docker compose up
  ```
- Open the whisper.ipynb file and run each cell, start to finish. Voilà, it should convert the speech to text. You can start experimenting from here, for example by calling the endpoint directly, as sketched below.
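If you'd rather call the endpoint outside the notebook, here is a minimal sketch. It assumes the container exposes the Lambda RIE's standard invocation URL on port 9000 and that the handler expects base64-encoded audio under an "audio" key; both the port mapping and the payload shape are assumptions, so check docker-compose.yml and function.py for the real values.

```python
# Hypothetical example of invoking the locally running container through the Lambda RIE.
# The port (9000) and the payload shape are assumptions; adjust them to match
# docker-compose.yml and function.py.
import base64

import requests

# Standard invocation path exposed by the AWS Lambda Runtime Interface Emulator.
URL = "http://localhost:9000/2015-03-31/functions/function/invocations"

with open("sample.wav", "rb") as f:
    payload = {"audio": base64.b64encode(f.read()).decode("ascii")}

response = requests.post(URL, json=payload)
print(response.json())
```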
- I'm having some trouble, any additional docs?
- I modified function.py, but the container isn't picking up the updates!

  Remember to tear down the old container (the one still running the old file) first:

  ```bash
  docker compose down && docker compose up --build
  ```

  This tears the old container down and starts up a new one with your updates.
- Why make a dockerized version of Whisper?

  Because, for the love of god, I almost became a broken man trying to get ffmpeg to work with Whisper and its dependencies on Windows. ffmpeg can work with pydub/pyaudio, but it has some sort of vendetta against Whisper on Windows.
Please note that some parts of this project were developed with the assistance of GitHub Copilot and ChatGPT.