Snowman
0.1.0
Voice activity detector class.
#include <snowboy-detect.h>
Public Member Functions
SnowboyVad (const std::string &resource_filename)
Constructor that takes a resource file.
bool Reset ()
Resets the VAD.
int RunVad (const std::string &data, bool is_end=false)
Runs the VAD algorithm.
int RunVad (const float *const data, const int array_length, bool is_end=false)
Runs the VAD on float samples.
int RunVad (const int16_t *const data, const int array_length, bool is_end=false)
Runs the VAD on int16_t samples.
int RunVad (const int32_t *const data, const int array_length, bool is_end=false)
Runs the VAD on int32_t samples.
void SetAudioGain (const float audio_gain)
Applies a fixed gain to the input audio.
void ApplyFrontend (const bool apply_frontend)
Enables or disables the audio frontend (NS & AGC).
int SampleRate () const
Returns the expected sample rate for audio provided to RunVad().
int NumChannels () const
Returns the expected number of channels for audio provided to RunVad().
int BitsPerSample () const
Returns the expected number of bits per sample for audio provided to RunVad().
~SnowboyVad ()
Destructor.
Friends
class testing::Inspector
Voice activity detector class.
Class that only does voice activity detection without trying to match any hotwords. Do not use this in addition to SnowboyDetect, since SnowboyDetect contains a VAD pipeline as well.
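The call pattern implied by the member list can be sketched as follows. This is a minimal illustration, not code from the library: `FeedInChunks` and `VadTally` are hypothetical names, and the function is templated on the VAD type only so that it compiles without the Snowboy headers. With the real library you would presumably pass a `snowboy::SnowboyVad` constructed from a resource file. The tally uses the return codes documented for RunVad().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: counts the return codes documented for RunVad().
struct VadTally {
  int silence = 0;   // return code -2
  int error = 0;     // return code -1
  int no_event = 0;  // return code 0
};

// Hypothetical helper: feeds `samples` to `vad` in chunks of `chunk_len`
// elements and sets is_end on the final chunk.
// `Vad` is any type exposing RunVad(const int16_t*, int, bool).
template <typename Vad>
VadTally FeedInChunks(Vad& vad, const std::vector<int16_t>& samples,
                      std::size_t chunk_len) {
  assert(chunk_len > 0);
  VadTally tally;
  for (std::size_t pos = 0; pos < samples.size(); pos += chunk_len) {
    const std::size_t len = std::min(chunk_len, samples.size() - pos);
    const bool is_end = (pos + len == samples.size());
    const int code =
        vad.RunVad(samples.data() + pos, static_cast<int>(len), is_end);
    if (code == -2) {
      ++tally.silence;
    } else if (code == -1) {
      ++tally.error;
    } else if (code == 0) {
      ++tally.no_event;
    }
  }
  return tally;
}
```

Marking only the last chunk with `is_end` follows the parameter description ("end of an utterance or file"). Whether a live stream should ever set it is left to the caller.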
snowboy::SnowboyVad::SnowboyVad (const std::string &resource_filename)
Constructor that takes a resource file.
[in] | resource_filename | Filename of resource file. |
void snowboy::SnowboyVad::ApplyFrontend (const bool apply_frontend)
Enable or disable audio frontend (NS & AGC).
If apply_frontend is true, frontend audio processing is applied; otherwise it is turned off. Frontend audio processing includes algorithms such as automatic gain control (AGC) and noise suppression (NS). It generally improves performance, but it may degrade performance if the model was not trained with frontend audio processing. The general rule of thumb is:
[in] | apply_frontend | New frontend state |
int snowboy::SnowboyVad::BitsPerSample () const
Returns the expected number of bits per sample for audio provided to RunVad().
int snowboy::SnowboyVad::NumChannels () const
Returns the expected number of channels for audio provided to RunVad().
bool snowboy::SnowboyVad::Reset ()
Resets the VAD.
After a reset, the pipeline behaves identically to a freshly constructed instance.
int snowboy::SnowboyVad::RunVad (const float *const data, const int array_length, bool is_end = false)
Runs the VAD on float samples.
If NumChannels() > 1, the samples are interleaved by channel. For example, with NumChannels() == 2 the array is laid out as follows:
d1c1, d1c2, d2c1, d2c2, d3c1, d3c2, ..., dNc1, dNc2
where d1c1 means data point 1 of channel 1.
[in] | data | Small chunk of data to be detected. |
[in] | array_length | Length of the data array in elements. |
[in] | is_end | Set it to true if it is the end of an utterance or file. |
int snowboy::SnowboyVad::RunVad (const int16_t *const data, const int array_length, bool is_end = false)
Runs the VAD on int16_t samples.
If NumChannels() > 1, the samples are interleaved by channel. For example, with NumChannels() == 2 the array is laid out as follows:
d1c1, d1c2, d2c1, d2c2, d3c1, d3c2, ..., dNc1, dNc2
where d1c1 means data point 1 of channel 1.
[in] | data | Small chunk of data to be detected. |
[in] | array_length | Length of the data array in elements. |
[in] | is_end | Set it to true if it is the end of an utterance or file. |
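The interleaved layout described above can be produced from two separate channel buffers as in the sketch below. `InterleaveStereo` is a hypothetical helper, not part of the library, and it assumes two channels of equal length. Because array_length is documented as the length in elements, the `size()` of the returned vector would be the value to pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interleaves two mono channels into the layout
// d1c1, d1c2, d2c1, d2c2, ..., dNc1, dNc2.
std::vector<int16_t> InterleaveStereo(const std::vector<int16_t>& ch1,
                                      const std::vector<int16_t>& ch2) {
  assert(ch1.size() == ch2.size());
  std::vector<int16_t> out;
  out.reserve(ch1.size() * 2);
  for (std::size_t i = 0; i < ch1.size(); ++i) {
    out.push_back(ch1[i]);  // data point i, channel 1
    out.push_back(ch2[i]);  // data point i, channel 2
  }
  return out;
}
```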
int snowboy::SnowboyVad::RunVad (const int32_t *const data, const int array_length, bool is_end = false)
Runs the VAD on int32_t samples.
If NumChannels() > 1, the samples are interleaved by channel. For example, with NumChannels() == 2 the array is laid out as follows:
d1c1, d1c2, d2c1, d2c2, d3c1, d3c2, ..., dNc1, dNc2
where d1c1 means data point 1 of channel 1.
[in] | data | Small chunk of data to be detected. |
[in] | array_length | Length of the data array in elements. |
[in] | is_end | Set it to true if it is the end of an utterance or file. |
int snowboy::SnowboyVad::RunVad (const std::string &data, bool is_end = false)
Runs the VAD algorithm.
The supported audio format is WAVE with linear PCM: 8-bit unsigned integer, 16-bit signed integer or 32-bit signed integer. See SampleRate(), NumChannels() and BitsPerSample() for the required sampling rate, number of channels and bits per sample. Provide a small chunk of data (e.g., 0.1 second) each time you call RunVad(). Larger chunks usually lead to longer delay but lower CPU usage.
Code | Info |
---|---|
-2 | Silence. |
-1 | Error. |
0 | No event. |
[in] | data | Small chunk of data to be detected. See above for the supported data format. |
[in] | is_end | Set it to true if it is the end of an utterance or file. |
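To size a chunk such as the 0.1 second suggested above, the byte count follows from the sample rate, channel count and bits per sample. The sketch below is only arithmetic. `ChunkBytes` is a hypothetical helper, and in real use its arguments would presumably come from SampleRate(), NumChannels() and BitsPerSample() rather than being hard-coded.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes of linear PCM audio that cover
// `milliseconds` of sound at the given format.
std::size_t ChunkBytes(int sample_rate, int num_channels, int bits_per_sample,
                       int milliseconds) {
  assert(sample_rate > 0 && num_channels > 0 && milliseconds >= 0);
  assert(bits_per_sample > 0 && bits_per_sample % 8 == 0);
  // Frames (one sample per channel) contained in the requested duration.
  const std::size_t frames = static_cast<std::size_t>(sample_rate) *
                             static_cast<std::size_t>(milliseconds) / 1000;
  return frames * static_cast<std::size_t>(num_channels) *
         static_cast<std::size_t>(bits_per_sample / 8);
}
```

For example, assuming 16 kHz mono 16-bit audio, `ChunkBytes(16000, 1, 16, 100)` gives 3200 bytes for a 0.1 second chunk.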
int snowboy::SnowboyVad::SampleRate () const
Returns the expected sample rate for audio provided to RunVad().
void snowboy::SnowboyVad::SetAudioGain (const float audio_gain)
Apply a fixed gain to the input audio.
If you have a very weak microphone, you can use this function to boost the input audio level.
[in] | audio_gain | Gain to apply. A gain of 1 means no volume change. |
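If a desired boost is expressed in decibels, it can be converted to a multiplier before being passed as audio_gain. This sketch assumes audio_gain is a linear amplitude factor, which "a gain of 1 means no volume change" suggests but the documentation does not state outright. `DbToLinearGain` is a hypothetical helper, not part of the library.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: converts a gain in decibels to a linear amplitude
// factor using gain = 10^(dB / 20). 0 dB maps to 1.0 (no volume change).
float DbToLinearGain(float db) {
  assert(std::isfinite(db));
  return std::pow(10.0f, db / 20.0f);
}
```

Under that assumption, a +6 dB boost corresponds to a factor of roughly 2.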