# efronic-voice-assistant

efronic-voice-assistant is a voice-controlled assistant platform that runs on a Raspberry Pi. It combines several AI models and APIs to provide speech-to-text, text-to-speech, and natural-language understanding, integrating with OpenAI's GPT-4, Whisper, and other AI services to deliver a complete voice assistant experience.
## Features

- Speech-to-Text: Converts spoken language into text using the Whisper model.
- Text-to-Speech: Converts text responses back into spoken language.
- Natural Language Understanding: Uses OpenAI's GPT-4 model to understand and respond to user queries.
- Wake Word Detection: Activates the assistant using a predefined wake word.
- Endpoint Detection: Automatically detects the end of user speech to process the input.
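Endpoint detection boils down to watching for a sustained run of low-energy audio frames. Here is a minimal sketch (Python, illustrative only; the project itself uses Picovoice Cheetah from C#, with the silence window configured via `CHEETAH_ENDPOINT_DURATION_SEC`):

```python
def find_endpoint(frames, silence_threshold=500.0, endpoint_sec=3.0, frame_sec=0.02):
    """Return the index of the frame where speech ends, i.e. the start of the
    first run of frames whose energy stays below the threshold for
    endpoint_sec seconds. Returns None if no endpoint is found."""
    needed = int(endpoint_sec / frame_sec)  # consecutive quiet frames required
    quiet = 0
    for i, energy in enumerate(frames):
        quiet = quiet + 1 if energy < silence_threshold else 0
        if quiet >= needed:
            return i - needed + 1  # first frame of the silent run
    return None
```

The threshold and frame duration here are placeholder values; a real implementation would derive them from the microphone's sample rate and noise floor.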
## Configuration

The application uses several configuration files to manage settings for different environments:

- `appsettings.json`: Default configuration.
- `appsettings.Development.json`: Development-specific settings.
- `appsettings.Production.json`: Production-specific settings.
- `appsettings_example_json.json`: Example configuration file with placeholders.
Example configuration (`appsettings_example_json.json`):

```json
{
  "OPENAI_API_KEY": "sk-example-XXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "OPENAI_GPT_MODEL": "gpt-4",
  "OPENAI_API_ENDPOINT": "chat/completions",
  "WHISPER_MODEL": "whisper-1",
  "PV_ACCESS_KEY": "example-access-key",
  "OPENAI_BASE_URL": "https://api.openai.com/v1/",
  "WHISPER_API_URL": "https://api.openai.com/v1/audio/transcriptions",
  "AWS_ACCESS_KEY_ID": "EXAMPLEAWSACCESSKEYID",
  "AWS_SECRET_ACCESS_KEY": "EXAMPLEAWSSECRETACCESSKEY",
  "CHEETAH_ACCESS_KEY": "example-access-key",
  "CHEETAH_ENDPOINT_DURATION_SEC": "3.0f",
  "CHEETAH_ENABLE_AUTOMATIC_PUNCTUATION": "true",
  "CHEETAH_AUDIO_DEVICE_INDEX": "-1",
  "mpg123Path": "./models/mpg123.exe",
  "MS_COPILOT_API_KEY_1": "example-api-key-1",
  "MS_COPILOT_API_KEY_2": "example-api-key-2",
  "MS_COPILOT_API_ENDPOINT": "openai/deployments/gpt-4o/chat/completions?api-version=2023-03-15-preview",
  "MS_COPILOT_BASE_URL": "https://example-voice-assistant.openai.azure.com/",
  "MS_COPILOT_GPT_MODEL": "gpt-4o",
  "Options": {
    "Tokens": "./models/sherpa-onnx-streaming-zipformer-en-2023-06-26/tokens.txt",
    "Provider": "cpu",
    "Encoder": "./models/sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-128.onnx",
    "Decoder": "./models/sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-128.onnx",
    "Joiner": "./models/sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-128.onnx"
  }
}
```
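Before first run, the example placeholders must be replaced with real keys. A small sanity check could look like this (Python, hypothetical; the key list below is an assumption, not the project's actual required set):

```python
import json

# Hypothetical list of keys to verify; adjust to what your build actually reads.
REQUIRED_KEYS = ["OPENAI_API_KEY", "PV_ACCESS_KEY", "WHISPER_API_URL"]

def check_settings(text):
    """Return a list of problem keys: required keys that are missing,
    or whose values still look like example placeholders."""
    settings = json.loads(text)
    problems = [k for k in REQUIRED_KEYS if k not in settings]
    problems += [k for k in REQUIRED_KEYS
                 if "example" in str(settings.get(k, "")).lower()]
    return problems
```

An empty result means the checked keys are present and no longer carry placeholder values.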
## Project Structure

```text
.gitignore
.vscode/
AIClient.cs
appsettings_example_json.json
appsettings.Development.json
appsettings.json
appsettings.Production.json
AudioPlayer.cs
bin/
ConfigurationReloader.cs
efronic-voice-assistant.csproj
efronic-voice-assistant.sln
Hey-Wiz_en_linux_v3_0_0.ppn
Hey-Wiz_en_windows_v3_0_0.ppn
MicToText.cs
models/
obj/
OnnxOptions.cs
PicovoiceHandler.cs
Program.cs
PwmController.cs
run-paraformer.sh
run-transducer.sh
SpeechSynthesizer.cs
SpeechToTextController.cs
WaveHeader.cs
WhisperClient.cs
```
## Key Components

- `Program`: Entry point of the application. Manages initialization and the main event loop.
- `MicToText`: Handles microphone input and converts speech to text.
- `SpeechSynthesizer`: Converts text to speech.
- `AIClient`: Communicates with OpenAI's GPT-4 model.
- `WhisperClient`: Interfaces with the Whisper model for speech-to-text conversion.
- `PicovoiceHandler`: Manages wake word detection.
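`WaveHeader` deals with the RIFF/WAVE container that PCM microphone audio is wrapped in before transcription. The canonical 44-byte header for 16-bit PCM can be sketched as follows (Python, illustrative only; the project's implementation is in C#):

```python
import struct

def wav_header(num_samples, sample_rate=16000, channels=1, bits=16):
    """Build a 44-byte canonical RIFF/WAVE header for PCM audio."""
    byte_rate = sample_rate * channels * bits // 8
    block_align = channels * bits // 8
    data_size = num_samples * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",   # RIFF chunk descriptor
        b"fmt ", 16, 1, channels,           # fmt subchunk: PCM format (1)
        sample_rate, byte_rate, block_align, bits,
        b"data", data_size,                 # data subchunk header
    )
```

The 16 kHz mono default here is an assumption; the actual sample rate depends on the microphone capture settings.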
## Running the Application

To run the application, use the following command:

```shell
dotnet run
```

Ensure that the necessary API keys and models are configured in `appsettings.json`.
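Under the hood, the natural-language step is an OpenAI chat/completions request built from those settings (see `OPENAI_API_ENDPOINT` and `OPENAI_GPT_MODEL` above). The request body can be sketched as follows (Python, illustrative; the system prompt is a hypothetical placeholder, not the project's actual value):

```python
import json

def build_chat_payload(user_text, model="gpt-4"):
    """Build the JSON body for an OpenAI chat/completions request."""
    return json.dumps({
        "model": model,
        "messages": [
            # Hypothetical system prompt for illustration.
            {"role": "system", "content": "You are a helpful voice assistant."},
            {"role": "user", "content": user_text},
        ],
    })
```

The transcribed speech goes in as the user message; the assistant's reply is then passed to the text-to-speech stage.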
## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes. For questions or support, contact the project maintainers.

This README provides an overview of the efronic-voice-assistant project: its features, configuration, and how to run it. For more detail, refer to the source code and configuration files.