# efronic-voice-assistant

efronic-voice-assistant is a voice-controlled assistant platform that runs on a Raspberry Pi. It combines several AI models and APIs to provide speech-to-text, text-to-speech, and natural-language understanding, integrating with OpenAI's GPT-4, Whisper, and other AI services to deliver a complete voice assistant experience.
## Features

- Speech-to-Text: Converts spoken language into text using the Whisper model.
- Text-to-Speech: Converts text responses back into spoken language.
- Natural Language Understanding: Uses OpenAI's GPT-4 model to understand and respond to user queries.
- Wake Word Detection: Activates the assistant using a predefined wake word.
- Endpoint Detection: Automatically detects the end of user speech to process the input.
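Endpoint detection boils down to watching for a sustained run of low-energy audio frames. Here is a minimal sketch (Python, illustrative only; the project itself uses Picovoice Cheetah from C#, with the silence window configured via `CHEETAH_ENDPOINT_DURATION_SEC`):

```python
def find_endpoint(frames, silence_threshold=500.0, endpoint_sec=3.0, frame_sec=0.02):
    """Return the index of the frame where speech ends, i.e. the start of the
    first run of frames whose energy stays below the threshold for
    endpoint_sec seconds. Returns None if no endpoint is found."""
    needed = int(endpoint_sec / frame_sec)  # consecutive quiet frames required
    quiet = 0
    for i, energy in enumerate(frames):
        quiet = quiet + 1 if energy < silence_threshold else 0
        if quiet >= needed:
            return i - needed + 1  # first frame of the silent run
    return None
```

The threshold and frame duration here are placeholder values; a real implementation would derive them from the microphone's sample rate and noise floor.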
## Configuration

The application uses several configuration files to manage settings for different environments:

- `appsettings.json`: Default configuration.
- `appsettings.Development.json`: Development-specific settings.
- `appsettings.Production.json`: Production-specific settings.
- `appsettings_example_json.json`: Example configuration file with placeholders.
Example configuration (`appsettings_example_json.json`):

```json
{
  "OPENAI_API_KEY": "sk-example-XXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "OPENAI_GPT_MODEL": "gpt-4",
  "OPENAI_API_ENDPOINT": "chat/completions",
  "WHISPER_MODEL": "whisper-1",
  "PV_ACCESS_KEY": "example-access-key",
  "OPENAI_BASE_URL": "https://api.openai.com/v1/",
  "WHISPER_API_URL": "https://api.openai.com/v1/audio/transcriptions",
  "AWS_ACCESS_KEY_ID": "EXAMPLEAWSACCESSKEYID",
  "AWS_SECRET_ACCESS_KEY": "EXAMPLEAWSSECRETACCESSKEY",
  "CHEETAH_ACCESS_KEY": "example-access-key",
  "CHEETAH_ENDPOINT_DURATION_SEC": "3.0f",
  "CHEETAH_ENABLE_AUTOMATIC_PUNCTUATION": "true",
  "CHEETAH_AUDIO_DEVICE_INDEX": "-1",
  "mpg123Path": "./models/mpg123.exe",
  "MS_COPILOT_API_KEY_1": "example-api-key-1",
  "MS_COPILOT_API_KEY_2": "example-api-key-2",
  "MS_COPILOT_API_ENDPOINT": "openai/deployments/gpt-4o/chat/completions?api-version=2023-03-15-preview",
  "MS_COPILOT_BASE_URL": "https://example-voice-assistant.openai.azure.com/",
  "MS_COPILOT_GPT_MODEL": "gpt-4o",
  "Options": {
    "Tokens": "./models/sherpa-onnx-streaming-zipformer-en-2023-06-26/tokens.txt",
    "Provider": "cpu",
    "Encoder": "./models/sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-128.onnx",
    "Decoder": "./models/sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-128.onnx",
    "Joiner": "./models/sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-128.onnx"
  }
}
```
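Before first run, the example placeholders must be replaced with real keys. A small sanity check could look like this (Python, hypothetical; the key list below is an assumption, not the project's actual required set):

```python
import json

# Hypothetical list of keys to verify; adjust to what your build actually reads.
REQUIRED_KEYS = ["OPENAI_API_KEY", "PV_ACCESS_KEY", "WHISPER_API_URL"]

def check_settings(text):
    """Return a list of problem keys: required keys that are missing,
    or whose values still look like example placeholders."""
    settings = json.loads(text)
    problems = [k for k in REQUIRED_KEYS if k not in settings]
    problems += [k for k in REQUIRED_KEYS
                 if "example" in str(settings.get(k, "")).lower()]
    return problems
```

An empty result means the checked keys are present and no longer carry placeholder values.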
## Project Structure

```text
.gitignore
.vscode/
AIClient.cs
appsettings_example_json.json
appsettings.Development.json
appsettings.json
appsettings.Production.json
AudioPlayer.cs
bin/
ConfigurationReloader.cs
efronic-voice-assistant.csproj
efronic-voice-assistant.sln
Hey-Wiz_en_linux_v3_0_0.ppn
Hey-Wiz_en_windows_v3_0_0.ppn
MicToText.cs
models/
obj/
OnnxOptions.cs
PicovoiceHandler.cs
Program.cs
PwmController.cs
run-paraformer.sh
run-transducer.sh
SpeechSynthesizer.cs
SpeechToTextController.cs
WaveHeader.cs
WhisperClient.cs
```
## Key Components

- `Program`: Entry point of the application. Manages initialization and the main event loop.
- `MicToText`: Handles microphone input and converts speech to text.
- `SpeechSynthesizer`: Converts text to speech.
- `AIClient`: Communicates with OpenAI's GPT-4 model.
- `WhisperClient`: Interfaces with the Whisper model for speech-to-text conversion.
- `PicovoiceHandler`: Manages wake word detection.
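`WaveHeader` deals with the RIFF/WAVE container that PCM microphone audio is wrapped in before transcription. The canonical 44-byte header for 16-bit PCM can be sketched as follows (Python, illustrative only; the project's implementation is in C#):

```python
import struct

def wav_header(num_samples, sample_rate=16000, channels=1, bits=16):
    """Build a 44-byte canonical RIFF/WAVE header for PCM audio."""
    byte_rate = sample_rate * channels * bits // 8
    block_align = channels * bits // 8
    data_size = num_samples * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",   # RIFF chunk descriptor
        b"fmt ", 16, 1, channels,           # fmt subchunk: PCM format (1)
        sample_rate, byte_rate, block_align, bits,
        b"data", data_size,                 # data subchunk header
    )
```

The 16 kHz mono default here is an assumption; the actual sample rate depends on the microphone capture settings.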
## Running the Application

To run the application, use the following command:

```shell
dotnet run
```

Ensure that the necessary API keys and models are configured in `appsettings.json`.
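Under the hood, the natural-language step is an OpenAI chat/completions request built from those settings (see `OPENAI_API_ENDPOINT` and `OPENAI_GPT_MODEL` above). The request body can be sketched as follows (Python, illustrative; the system prompt is a hypothetical placeholder, not the project's actual value):

```python
import json

def build_chat_payload(user_text, model="gpt-4"):
    """Build the JSON body for an OpenAI chat/completions request."""
    return json.dumps({
        "model": model,
        "messages": [
            # Hypothetical system prompt for illustration.
            {"role": "system", "content": "You are a helpful voice assistant."},
            {"role": "user", "content": user_text},
        ],
    })
```

The transcribed speech goes in as the user message; the assistant's reply is then passed to the text-to-speech stage.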
## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes. For questions or support, contact the project maintainers.

This README provides an overview of the efronic-voice-assistant project: its features, configuration, and how to run it. For more detail, refer to the source code and configuration files.