Notq is a Python-based toolkit collected and developed for speech and language processing in Persian. Speech processing plays an increasingly important role in data analysis across health research, for example in diagnosing mental disorders. Early diagnosis is one of the most important concerns of the health system, and most psychiatric disorders cause changes in the semantic network of words. Identifying and extracting the features of this network can help diagnose these disorders. The purpose of this project is to collect and develop tools for Persian speech processing and for analyzing the semantic load of words, and to integrate them into a single library so that users can easily access all available high-quality tools. To this end, the library provides modules for converting speech to text, manipulating audio and voice, and processing and analyzing text.
First, install the fluency detector and syllable counter packages, and then Notq itself:
pip install git+https://github.com/salsina/persian-fluency-detector#egg=persian_fluency_detector
pip install git+https://github.com/salsina/Persian-syllable-counter#egg=persian_syllable_counter
pip install git+https://github.com/shaqayeql/Notq#egg=notq
Before you get started, here's a list of functions you can use:
The speechToText function converts audio files to text files.
- audio_file_path: The path of the .wav audio file
- function_name: The speech-to-text tool to use. It can be "VOSK_wav", "Google_wav" or "Microsoft". If no argument is given, the default is "VOSK_wav".
- output_text_directory: The directory in which the output text file would be saved. Default value is "notq_outputs"+os.sep+"text_files".
- subscription: The subscription for Microsoft Azure.
- region: The region for Microsoft Azure.
speechToText("VOICE_AD\\titleA.mp3", function_name="Google_wav", output_text_directory="myDirectory\\myTextFiles")
The mp3ToWav function converts one or more MP3 files to WAV files.
- audio_file_path: The path of the .mp3 audio file/files
- output_directory_path: The directory in which the output .wav file/files would be saved. Default value is "notq_outputs"+os.sep+"wav_audio_files".
- singleFilePath: A boolean indicating whether the user wants to convert a single MP3 file or several. If set to False, the "audio_file_path" argument must be the path of a directory; otherwise, the "audio_file_path" argument must be the path of a single MP3 file. The default value is True.
mp3ToWav("VOICE_AD\\titleA.mp3", output_directory_path="myDirectory")
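The single-file versus directory behaviour described for singleFilePath can be sketched as follows. This is only an illustration of the idea, not Notq's own code: collect_audio_files and its extension parameter are hypothetical names introduced here.

```python
from pathlib import Path


def collect_audio_files(audio_file_path, singleFilePath=True, extension=".mp3"):
    """Hypothetical helper: list the files a conversion call would process."""
    path = Path(audio_file_path)
    if singleFilePath:
        # The argument is the path of one file, so only that file is processed.
        return [path]
    # The argument is a directory: take every file with the given extension,
    # matched case-insensitively and returned in sorted order.
    return sorted(p for p in path.iterdir() if p.suffix.lower() == extension)
```

With singleFilePath=False, the same call processes every matching file in the directory instead of a single file.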
The resample function changes the sample rate of one or more files to the desired rate.
- audio_file_path: The path of the .wav audio file/files
- sampleRate: The desired sample rate of the output file. Default value is
- output_directory_path: The directory in which the output .wav file/files would be saved. Default value is "notq_outputs"+os.sep+"wav_audio_files".
- singleFilePath: A boolean indicating whether the user wants to resample a single .wav file or several. If set to False, the "audio_file_path" argument must be the path of a directory; otherwise, the "audio_file_path" argument must be the path of a single .wav file. The default value is True.
resample("VOICE_AD\\titleA.mp3", sampleRate)
The loadSimilarityModel function takes the path of a similarity model and returns the loaded model.
- similarityModelPath: The path of the similarity model. The model is about 4 GB in size. Make sure to pass the path of the .bin file.
similarityModel = loadSimilarityModel("cc.fa.300.bin")
The cosineSimilarity function computes the cosine similarity between two sentences.
- sentence1: The first sentence in string format.
- sentence2: The second sentence in string format.
- similarityModel: The model object returned by the loadSimilarityModel function (described above).
similarity = cosineSimilarity("من امروز به باشگاه رفتم", "امروز بود که بنده به باشگاه ورزش رجوع کردم", similarityModel)
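For readers unfamiliar with the measure, here is a minimal sketch of cosine similarity itself, assuming each sentence has already been mapped to a numeric vector by some model. The cosine function below is a hypothetical stand-alone helper, not part of Notq, and it says nothing about how Notq builds its sentence vectors.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (illustrative only)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; return 0.0 by convention here.
        return 0.0
    return dot / (norm_u * norm_v)
```

Values close to 1 mean the two vectors point in nearly the same direction, which is read as the sentences being semantically close.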
The splitAudiofile function splits .wav and .mp3 audio files into smaller parts.
- audio_file_path: The path of the .wav or .mp3 audio file
- output_directory_path: The directory in which the split files would be saved. Default value is "notq_outputs" + os.sep +file_name + "_splitted".
- dividing_len: The length of each split audio file in seconds. The default value is 60.
splitAudiofile("VOICE_AD\\titleA.mp3", output_directory_path="myDirectory", dividing_len = 120)
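To make the effect of dividing_len concrete, the sketch below computes the start and end times of the parts for a file of a given duration. It assumes the last part is simply shorter when the duration is not a multiple of dividing_len. Whether Notq handles the remainder this way is not stated here, and chunk_bounds is a hypothetical name.

```python
def chunk_bounds(duration_s, dividing_len=60):
    """Hypothetical helper: (start, end) times in seconds for each part."""
    if dividing_len <= 0:
        raise ValueError("dividing_len must be positive")
    bounds = []
    start = 0
    while start < duration_s:
        # The final part is cut short at the end of the file (assumption).
        end = min(start + dividing_len, duration_s)
        bounds.append((start, end))
        start = end
    return bounds
```

For a 150-second file and dividing_len=60, this gives three parts: 0 to 60, 60 to 120, and 120 to 150 seconds.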
The silenceTime function returns a list of the beginnings and ends of the silent intervals in a .wav audio file.
- audio_file_path: The path of the .wav audio file
- min_silence_time: The minimum duration, in milliseconds, that counts as silence. The default value is 100.
- silence_threshhold: The threshold, in dBFS, below which audio counts as silence. The default value is inputAudio.dBFS - 16.
silenceTime("VOICE_AD\\titleA.mp3", min_silence_time=200)
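The interaction between the two parameters can be illustrated with a toy detector. The sketch assumes the audio has already been reduced to one loudness value in dBFS per millisecond; silent_ranges, its input format, and the fixed default threshold of -40.0 are all assumptions made for this example and do not reflect Notq's implementation.

```python
def silent_ranges(loudness_db, min_silence_time=100, silence_threshold=-40.0):
    """Hypothetical helper: (start_ms, end_ms) of each sufficiently long quiet run.

    loudness_db holds one loudness value per millisecond (assumed input format).
    """
    ranges = []
    run_start = None
    for i, level in enumerate(loudness_db):
        if level < silence_threshold:
            # Quiet sample: open a run if none is open yet.
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # Loud sample ends the run; keep the run only if it is long enough.
            if i - run_start >= min_silence_time:
                ranges.append((run_start, i))
            run_start = None
    # Handle a quiet run that lasts until the end of the audio.
    if run_start is not None and len(loudness_db) - run_start >= min_silence_time:
        ranges.append((run_start, len(loudness_db)))
    return ranges
```

Raising min_silence_time discards short dips in loudness, while lowering the threshold makes the detector stricter about what counts as quiet.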
The getFluency function calculates fluency factors for a .wav audio file.
- audio_file_path: The path of the .wav audio file
- fluency_type: The type of fluency factor: "SpeechRate", "ArticulationRate", "PhonationTimeRatio" or "MeanLengthOfRuns". The default value is "SpeechRate".
- Speech Rate: The number of syllables uttered, divided by the total speech time in minutes. This is the gross measure of the speed of speech production, since the total speaking time includes hesitations.
- Articulation Rate: This measure is the actual number of syllables uttered, divided by the total amount of time spent speaking. In this case, the hesitation time is eliminated from the calculation; this gives a measure of the speed of actual articulation only.
- Phonation Time Ratio: This is determined by totaling the pause times for each sample and calculating it as a percent of the total speech time. It indicates the amount of hesitation relative to actual speaking time, a combined measure of pause frequency and duration.
- Mean Length of Runs: The mean number of syllables uttered between hesitations. It indicates the length of utterance between pauses.
getFluency("VOICE_AD\\titleA.mp3", fluencyType="ArticulationRate")
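The four measures described above can be written out as simple arithmetic. The sketch below follows the wording of the definitions literally, so the phonation time ratio is computed as pause time as a percentage of total time. It also assumes that both rates are expressed per minute and that the number of runs is one more than the number of pauses. The function fluency_measures and its parameters are hypothetical and are not Notq's API.

```python
def fluency_measures(syllables, total_time_s, pause_time_s, pause_count):
    """Hypothetical helper: the four fluency factors from basic counts."""
    total_min = total_time_s / 60
    # Time actually spent articulating, with hesitation time removed.
    speaking_min = (total_time_s - pause_time_s) / 60
    return {
        # Syllables per minute over the whole sample, hesitations included.
        "SpeechRate": syllables / total_min,
        # Syllables per minute of actual speaking time (per-minute unit assumed).
        "ArticulationRate": syllables / speaking_min,
        # Pause time as a percentage of total time, as worded above.
        "PhonationTimeRatio": 100 * pause_time_s / total_time_s,
        # Mean syllables per run, assuming runs = pauses + 1.
        "MeanLengthOfRuns": syllables / (pause_count + 1),
    }
```

For example, 240 syllables in 120 seconds with 30 seconds of pauses spread over 11 hesitations gives a speech rate of 120 and an articulation rate of 160 syllables per minute.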
You can run test.py to test the functions provided by the Notq library.