A simple speech recognition system using PocketSphinx that converts audio input into text and executes basic voice commands.

liubomyr123/speech-recognition-assistant

PocketSphinx Speech Recognition

Overview

This project implements a basic speech recognition system using PocketSphinx. For now it is a template: it captures audio input, converts it into text, and demonstrates how recognized speech can trigger commands.

Installation

Follow these steps to install and build the project:

  1. Clone the PocketSphinx repository:

    cd ~
    git clone https://github.com/cmusphinx/pocketsphinx.git
  2. Install dependencies:

    sudo apt install \
         ffmpeg \
         libasound2-dev \
         libportaudio2 \
         libportaudiocpp0 \
         libpulse-dev \
         libsox-fmt-all \
         portaudio19-dev \
         sox
  3. Build and install PocketSphinx:

    cd ~/pocketsphinx
    mkdir build
    cd build
    cmake ..
    cmake --build .
    sudo cmake --build . --target install

Build and Run

Build the Program

make

Build with Other Languages (RU/DE)

make MODEL_DE=1

# or

make MODEL_RU=1

Run the Program

./live

Clean the Build

make clean

Features

  • Uses PocketSphinx for speech recognition.
  • Captures audio input via SoX.
  • Converts speech to text.
  • Demonstrates command execution using recognized phrases.
  • Example commands:
    • Saying "browser" opens a web browser.
    • Saying "browser exit" closes the browser.
  • The system is designed as a foundation for further development and testing of voice commands.
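
The example commands above could be dispatched along the following lines. This is only a sketch, not the project's actual code: the `voice_command` enum and `match_command()` are hypothetical names, and plain substring matching is an assumption. The longer phrase is tested first so that "browser exit" is not mistaken for "browser".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical command identifiers; the names are illustrative only. */
typedef enum {
    CMD_NONE,
    CMD_BROWSER_OPEN,
    CMD_BROWSER_CLOSE
} voice_command;

/* Map a recognized phrase to a command.
 * A NULL phrase (nothing recognized) maps to CMD_NONE.
 * Longer phrases are checked before shorter ones that they contain. */
voice_command match_command(const char *hyp)
{
    if (hyp == NULL)
        return CMD_NONE;
    if (strstr(hyp, "browser exit") != NULL)
        return CMD_BROWSER_CLOSE;
    if (strstr(hyp, "browser") != NULL)
        return CMD_BROWSER_OPEN;
    return CMD_NONE;
}
```

The caller would pass the recognized text to `match_command()` and switch on the result to start or stop the browser.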

Dependencies

  • PocketSphinx
  • SoX (for audio capture)
  • PortAudio (for real-time audio processing)
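
As an illustration of how SoX can serve as the audio source, the sketch below assembles a command line that records from the default input device and writes raw samples to standard output. The function name `build_sox_command()` is hypothetical, and the format (mono, 16-bit signed PCM) is an assumption about what the decoder is fed; the command this project really uses may differ.

```c
#include <stdio.h>

/* Build a SoX command that records mono 16-bit signed PCM from the
 * default input device (-d) and writes headerless raw audio to stdout.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_sox_command(char *out, size_t out_size, int sample_rate)
{
    int n = snprintf(out, out_size,
                     "sox -q -r %d -c 1 -b 16 -e signed-integer -d -t raw -",
                     sample_rate);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

On a POSIX system the resulting string can be passed to `popen(cmd, "r")`, and the audio read from the returned stream in fixed-size chunks.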

Additional Resources

Language models

Go to the Acoustic and Language Models website and download three items for your language:

  • Language model. Examples: cmusphinx-voxforge-de.lm.bin, ru.lm
  • Dictionary. Examples: cmusphinx-voxforge-de.dic, ru.dic
  • Hidden Markov model (the acoustic model folder). Examples: voxforge.cd_ptm_5000, zero_ru.cd_cont_4000

Save them in the folder that matches the language: MODEL_DE, MODEL_RU, MODEL_UA...

You can also find these models here: https://drive.google.com/drive/folders/1UAlBpDFsMTmH69C1u_6yrGoLEp1GN9ov

Then rename them to dictionary.dic, hmm, and language_model.lm. If the dictionary or the language model is in binary format, name it dictionary.dic.bin or language_model.lm.bin instead.
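
Because a dictionary or language model may be stored under either the plain or the `.bin` name, code that loads it needs to pick whichever file is present. A minimal sketch of that choice, using only the C standard library: `pick_model_file()` and `file_exists()` are hypothetical helpers and are not part of this project or of PocketSphinx. The sketch is intended for the dictionary and language model files, not for the hmm folder.

```c
#include <stdio.h>

/* Return 1 if the file at `path` can be opened for reading, else 0. */
static int file_exists(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}

/* Hypothetical helper: look for "<folder>/<name>.bin" first, then
 * "<folder>/<name>". The path that was found is left in `out`.
 * Returns 1 if a file was found, 0 if neither exists or the
 * buffer is too small. */
int pick_model_file(const char *folder, const char *name,
                    char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s/%s.bin", folder, name);
    if (n > 0 && (size_t)n < out_size && file_exists(out))
        return 1;

    n = snprintf(out, out_size, "%s/%s", folder, name);
    if (n > 0 && (size_t)n < out_size && file_exists(out))
        return 1;

    return 0;
}
```

For example, `pick_model_file("MODEL_DE", "language_model.lm", path, sizeof path)` would leave `MODEL_DE/language_model.lm.bin` in `path` when the binary file is present, and that string could then be handed to the configuration call for `lm`.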

The load_modals() function already contains a block for each language flag:

#ifdef MODEL_UA
    printf("\n");
    printf("✅ MODEL_UA flag is defined.\n");
    if (check_models_files(&is_run_default_setup, "MODEL_UA") == 1)
    {
        printf("Loading...\n");
    }
    printf("\n");
#endif

#ifdef MODEL_DE
    printf("\n");
    printf("✅ MODEL_DE flag is defined.\n");
    if (check_models_files(&is_run_default_setup, "MODEL_DE") == 1)
    {
        printf("Loading...\n");
    }
    printf("\n");
#endif

In the block for your language, replace the placeholder line:

  printf("Loading...\n");

with calls that load the real models, for example for MODEL_DE:

  ps_config_set_str(speech_config, "hmm", "MODEL_DE/hmm");
  ps_config_set_str(speech_config, "lm", "MODEL_DE/language_model.lm.bin");
  ps_config_set_str(speech_config, "dict", "MODEL_DE/dictionary.dic");
