(version 5.0) offers speech quality and intelligibility that were previously achievable only at nearly double the bitrate. Below are the results of comparative tests with the MELPe 1200 bps vocoder, which operates at almost twice the bitrate.
For Digital HF Radio and other markets.
TWELP Technology Features. The vocoder is based on the newest speech coding technology, called "Tri-Wave Excited Linear Prediction" (TWELP), which was developed by DSPINI experts.
TWELP technology is a new class of vocoders that differs from other LPC-based vocoders in the following ways:
- an advanced, reliable method of pitch estimation
- pitch-synchronous analysis
- an advanced tri-wave model of excitation
- the newest quantization schemes
- pitch-synchronous synthesis
Thanks to these unique features, TWELP technology provides significantly better speech quality than other well-known technologies—including AMBE+2, MELPe, ACELP, and others—at equivalent bit rates ranging from 300 bps to 4800 bps and beyond.
Additionally, unlike other low-bitrate vocoders (such as MELPe), TWELP delivers much higher quality for non-speech signals, including sirens, background music, and similar audio.
The TWELP 700 bps and MELPe 1200 bps vocoders were tested using the ITU-T P.50 speech database in 20 different languages.
Note:
We have updated the speech database by minimizing inter-speech pauses to eliminate their impact on the evaluation results. Therefore, the numbers obtained from the quality measurements using this updated speech database differ from those previously obtained with the original speech database, where speech pauses were not removed.
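The exact procedure used to shorten the pauses is not described here. As a rough illustration only, the sketch below caps each silent stretch at a fixed number of frames using a simple frame-energy gate. The function name, the 20 ms frame length, the RMS threshold, and the pause cap are all assumptions chosen for the example.

```python
import math


def trim_pauses(samples, sample_rate=8000, frame_ms=20,
                threshold=0.01, max_pause_frames=5):
    """Keep speech frames and at most `max_pause_frames` of each silent run.

    `samples` is a list of floats in the range [-1.0, 1.0].
    All parameter defaults are hypothetical, not taken from the test setup.
    """
    frame_len = sample_rate * frame_ms // 1000
    kept = []
    silent_run = 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            # Active frame: always kept, and the silence counter restarts.
            silent_run = 0
            kept.extend(frame)
        else:
            # Silent frame: kept only while the pause is still short.
            silent_run += 1
            if silent_run <= max_pause_frames:
                kept.extend(frame)
    return kept
```

With the defaults, a 200 ms pause between two utterances is shortened to 100 ms, while the speech frames pass through unchanged.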
Language | MELPe 1200 bps | TWELP 700 bps |
---|---|---|
American | 2.711 | 2.703 |
Arabic | 2.722 | 2.676 |
British | 2.689 | 2.670 |
Chinese | 2.607 | 2.671 |
Danish | 2.720 | 2.800 |
Dutch | 2.644 | 2.603 |
Finnish | 2.630 | 2.608 |
French | 2.818 | 2.777 |
German | 2.793 | 2.766 |
Greek | 2.698 | 2.710 |
Hindi | 2.847 | 2.773 |
Hungarian | 2.793 | 2.766 |
Italian | 2.983 | 2.953 |
Japanese | 2.898 | 2.860 |
Norwegian | 2.745 | 2.740 |
Polish | 2.758 | 2.745 |
Portuguese | 2.894 | 2.796 |
Russian | 2.689 | 2.659 |
Spanish | 2.816 | 2.787 |
Swedish | 2.907 | 2.868 |
Average | 2.768 | 2.747 |
The difference is, on average, 0.021 PESQ |
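
The averages in the last rows can be reproduced directly from the per-language scores. The short check below recomputes them from the table values; the variable names are illustrative.

```python
# PESQ scores copied from the table above, in table order (American ... Swedish).
melpe_1200 = [2.711, 2.722, 2.689, 2.607, 2.720, 2.644, 2.630, 2.818, 2.793, 2.698,
              2.847, 2.793, 2.983, 2.898, 2.745, 2.758, 2.894, 2.689, 2.816, 2.907]
twelp_700 = [2.703, 2.676, 2.670, 2.671, 2.800, 2.603, 2.608, 2.777, 2.766, 2.710,
             2.773, 2.766, 2.953, 2.860, 2.740, 2.745, 2.796, 2.659, 2.787, 2.868]

mean_melpe = sum(melpe_1200) / len(melpe_1200)
mean_twelp = sum(twelp_700) / len(twelp_700)

# Difference of the averages after rounding them to three decimals,
# which is how the figure under the table is obtained.
diff = round(mean_melpe, 3) - round(mean_twelp, 3)

print(f"MELPe 1200 bps average: {mean_melpe:.3f}")
print(f"TWELP 700 bps average:  {mean_twelp:.3f}")
print(f"Difference:             {diff:.3f}")
```

The printed values agree with the Average row and the reported difference.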
STOI (Short-Time Objective Intelligibility) and ESTOI (Extended Short-Time Objective Intelligibility) metrics were used to assess speech intelligibility:
Language | MELPe 1200 bps | TWELP 700 bps |
---|---|---|
American | 85.20 | 86.02 |
Arabic | 84.76 | 84.79 |
British | 80.97 | 83.85 |
Chinese | 83.74 | 85.00 |
Danish | 84.86 | 86.66 |
Dutch | 82.66 | 84.69 |
Finnish | 80.12 | 81.22 |
French | 84.28 | 85.92 |
German | 85.01 | 85.81 |
Greek | 84.03 | 85.14 |
Hindi | 84.38 | 85.42 |
Hungarian | 84.72 | 85.67 |
Italian | 85.00 | 84.92 |
Japanese | 86.01 | 85.71 |
Norwegian | 85.12 | 86.43 |
Polish | 83.93 | 85.73 |
Portuguese | 85.05 | 85.62 |
Russian | 82.55 | 84.36 |
Spanish | 83.57 | 84.27 |
Swedish | 83.47 | 83.85 |
Average | 83.97 | 85.05 |
The difference is, on average, 1.08% in favor of TWELP 700 |
Language | MELPe 1200 bps | TWELP 700 bps |
---|---|---|
American | 78.01 | 76.99 |
Arabic | 78.27 | 77.03 |
British | 75.02 | 75.36 |
Chinese | 78.08 | 78.15 |
Danish | 77.70 | 78.30 |
Dutch | 76.97 | 76.39 |
Finnish | 73.56 | 72.43 |
French | 77.91 | 77.82 |
German | 76.58 | 76.40 |
Greek | 78.51 | 77.89 |
Hindi | 76.53 | 75.38 |
Hungarian | 76.09 | 76.33 |
Italian | 77.88 | 76.02 |
Japanese | 80.21 | 78.83 |
Norwegian | 78.86 | 78.62 |
Polish | 77.75 | 77.59 |
Portuguese | 78.04 | 77.49 |
Russian | 76.02 | 75.43 |
Spanish | 77.57 | 76.84 |
Swedish | 75.91 | 74.06 |
Average | 77.27 | 76.67 |
The difference is, on average, 0.6% |
Speech Samples (WAV-files).
A few independent experts compared the TWELP 700 bps vocoder with the MELPe 1200 bps vocoder using the preference method.
Although opinions were almost evenly split, the majority of listeners preferred the TWELP vocoder, noting that the speech sound was more natural and less synthetic.
All listeners were surprised to learn that the TWELP vocoder operates at nearly half the bitrate.
You can play and listen to short samples of the source speech, as well as the speech processed by the MELPe 1200 bps vocoder and the TWELP 700 bps vocoder, which operates at nearly half the bit rate, using the links in the table below.
You can also download the complete set of P.50 samples as zip files for all languages simultaneously by using the links in the 'Downloads' section at the bottom of the page.
Superiority In Quality Of The Non-speech Signals. In contrast to other LBR vocoders (MELPe, AMBE+2, etc.), TWELP vocoders reproduce non-speech signals, including police, ambulance, and fire sirens, with high quality. Combined with natural, human-sounding voice quality, this makes TWELP vocoders well suited to replacing analog radio with digital radio, and to other applications where non-speech signals must be transmitted with high quality alongside speech.
Source signal | MELPe 1200 bps | TWELP 700 bps |
---|---|---|
High Robustness To Acoustic Noise. In contrast to other LBR vocoders, TWELP vocoders are highly robust to acoustic noise due to a reliable pitch estimation method and other features of TWELP technology.
Language | MELPe 1200 bps | TWELP 700 bps |
---|---|---|
American | 66.92 | 65.85 |
Arabic | 69.34 | 66.49 |
British | 64.32 | 63.51 |
Chinese | 69.84 | 69.47 |
Danish | 65.41 | 65.93 |
Dutch | 64.64 | 64.00 |
Finnish | 62.45 | 61.84 |
French | 68.53 | 67.75 |
German | 65.00 | 64.78 |
Greek | 68.12 | 67.90 |
Hindi | 64.62 | 63.33 |
Hungarian | 68.20 | 66.55 |
Italian | 66.67 | 63.85 |
Japanese | 71.71 | 69.50 |
Norwegian | 69.64 | 69.53 |
Polish | 67.27 | 66.78 |
Portuguese | 67.93 | 66.72 |
Russian | 65.57 | 64.44 |
Spanish | 68.39 | 67.76 |
Swedish | 62.05 | 59.51 |
Average | 66.83 | 65.77 |
The difference is, on average, 1.06% |
Below, you can listen to a short fragment of heavily noisy English speech after passing through MELPe and TWELP vocoders, with NPP (Noise Pre-Processor) and NCSE disabled and enabled, respectively.
NPP/NCSE | Input speech (SNR=10dB) | MELPe 1200 bps | TWELP 700 bps |
---|---|---|---|
Disabled | |||
Enabled |
These vocoders are based on an effective Joint Source-Channel Coding approach. Each vocoder is equipped with a custom-designed FEC, tailored to its specific characteristics and operational conditions.
TWELP Robust vocoders provide high speech quality in both noisy and noiseless channels. The FEC can operate on either "soft decisions" or "hard decisions" from a modem. The "soft decisions" mode provides much better robustness than the "hard decisions" mode.
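The FEC used in the TWELP Robust vocoders is not described here, so the following is only a generic illustration of why soft decisions help. It uses a hypothetical 3x repetition code over a BPSK-style channel (bit 1 sent as +1.0, bit 0 as -1.0). The function names and the received values are assumptions made for the example.

```python
def decode_hard(received):
    """Hard decisions: slice each value to a bit first, then take a majority vote."""
    ones = sum(1 for r in received if r > 0)
    return 1 if ones * 2 > len(received) else 0


def decode_soft(received):
    """Soft decisions: combine the raw values first, then decide once."""
    return 1 if sum(received) > 0 else 0


# Bit 1 was sent three times as +1.0; noise pushed two copies slightly negative.
received = [0.9, -0.1, -0.2]
print("hard:", decode_hard(received))  # hard: 0 (decoding error)
print("soft:", decode_soft(received))  # soft: 1 (correct)
```

The hard decoder discards how confident each sample was, so two marginal errors outvote one strong correct sample. The soft decoder keeps that information and recovers the bit.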
For all users of our non-robust vocoder versions, we offer the following recommendations.
Essentially, the diagram shows by what percentage speech quality is reduced when a specific bit is distorted. The first bits in order cause catastrophic distortions, while the latter bits have significantly less impact on quality.
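One way to act on this ordering is unequal error protection: protect only the leading, most sensitive bits of each frame with an external FEC and send the rest unprotected. The sketch below shows the split only. The 56-bit frame length follows from 700 bps and the 80 ms frame size; the split point of 24 bits and the function name are hypothetical and would need to be chosen from the actual sensitivity diagram.

```python
FRAME_BITS = 56       # 700 bps * 0.080 s per frame
PROTECTED_BITS = 24   # hypothetical split point, not a DSPINI recommendation


def split_frame(bits, protected=PROTECTED_BITS):
    """Split one coded frame into (sensitive bits, less sensitive bits).

    `bits` is a sequence of 0/1 values in bit-stream order, most sensitive first.
    """
    if len(bits) != FRAME_BITS:
        raise ValueError(f"expected {FRAME_BITS} bits, got {len(bits)}")
    return bits[:protected], bits[protected:]
```

The first part would go through the channel code and the second part would be appended as-is, so the redundancy is spent where a bit error is most damaging.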
Additional Functionalities. The following additional functionalities were developed by DSPINI and are integrated into TWELP vocoders:
- Noise Cancellation Speech Enhancement (NCSE),
- Automatic Gain Control (AGC),
- Voice Activity Detector (VAD),
- Discontinuous Transmission (DTX),
- Tone Detection/Generation (single tones and dual tones). The tones are transmitted using the vocoder's own facilities.
Note:
The Tone Detector/Generator functionalities are not integrated into the code by default but can be added free of charge upon request.
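The detection method used inside the vocoder is not specified here. A common technique for single-tone detection is the Goertzel algorithm, shown below as a generic sketch rather than DSPINI's implementation. The function names, the candidate frequencies, and the 640-sample block (80 ms at 8 kHz) are assumptions for the example.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Return the squared DFT magnitude at `freq` using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def strongest_tone(samples, sample_rate, candidates):
    """Return the candidate frequency that carries the most energy."""
    return max(candidates, key=lambda f: goertzel_power(samples, sample_rate, f))


rate = 8000
# One 80 ms block containing a pure 1000 Hz tone.
block = [math.sin(2.0 * math.pi * 1000 * n / rate) for n in range(640)]
print(strongest_tone(block, rate, [500, 1000, 2000]))  # 1000
```

A dual-tone detector can be built the same way by evaluating two candidate groups and requiring both winners to exceed an energy threshold.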
Each functionality has unique features, performance and characteristics, providing significant superiority over any well-known implementations on the market.
Technical Characteristics And Resource Requirements:
Bit Rate (bps) | Algorithm | Frame size (ms) | Algorithmic delay (including frame size) (ms) | Sampling rate (kHz) | Signal format | Bit stream format |
---|---|---|---|---|---|---|
700 | TWELP | 80 | 100 | 8 | Linear 16-bit PCM | 56 |
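
The bit rate, frame size, and sampling rate in this table fix the size of each frame. A quick check, with variable names chosen only for illustration:

```python
bit_rate_bps = 700
frame_ms = 80
sample_rate_hz = 8000
bytes_per_sample = 2  # linear 16-bit PCM

bits_per_frame = bit_rate_bps * frame_ms // 1000
samples_per_frame = sample_rate_hz * frame_ms // 1000
pcm_bytes_per_frame = samples_per_frame * bytes_per_sample

print(bits_per_frame)       # 56
print(samples_per_frame)    # 640
print(pcm_bytes_per_frame)  # 1280
```

Each 80 ms frame of 640 input samples (1280 bytes of PCM) is therefore coded into 56 bits, which is consistent with the 56 listed in the table.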
Name | Functionality | Characteristic | Value |
---|---|---|---|
AGC | Automatic Gain Control | Control range | 0 ... +42 dB |
NCSE | Noise Canceller - Speech Enhancer | SNR improvement | 20 dB |
NCSE | Noise Canceller - Speech Enhancer | Speech quality improvement | > 0.1 PESQ |
Tone Detector | Single/Dual tone detection | In accordance with international standards | |
Tone Generator | Single/Dual tone generation | Special generator that preserves signal continuity (phase and amplitude carried over from the previous frame) | |
DTX | Discontinuous Transmission | Reduces the bit rate down to 110 bps in pauses between active speech regions | |
VAD | Voice Activity Detection | High reliability even with pink noise at SNR < 0 dB | |
CNG | Comfort Noise Generation | Type of noise | "white" |
CNG | Comfort Noise Generation | Level | -60 dB |
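
As a rough illustration of what DTX means for channel load, the average bit rate can be estimated from the fraction of time that speech is active. The 40% activity factor below is a hypothetical value, not a measured one, and the function name is illustrative.

```python
def average_bit_rate(activity, active_bps=700, pause_bps=110):
    """Average bit rate for a given speech activity factor between 0.0 and 1.0."""
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be between 0 and 1")
    return activity * active_bps + (1.0 - activity) * pause_bps


print(round(average_bit_rate(0.4)))  # 346
```

Under that assumption the channel carries about 346 bps on average instead of a constant 700 bps.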
Module | MIPS* peak | Program memory (KBytes) | Data memory: Constants (KBytes) | Data memory: Channel (KBytes) | Data memory: Heap (KBytes) | Data memory: Stack (KBytes) |
---|---|---|---|---|---|---|
Voice Encoder | 84 | 42 | 267 | 4.7 | 5.7 | 1.0 |
NCSE | 6.0 | | | | | |
AGC | 0.5 | | | | | |
Voice Decoder | 19 | | | | | |
Voice Encoder + Voice Decoder | 103 | | | | | |
Total | 109.5 | | | | | |
Module | MIPS* peak | Program memory (KBytes) | Data memory: Constants (KBytes) | Data memory: Channel (KBytes) | Data memory: Heap (KBytes) | Data memory: Stack (KBytes) |
---|---|---|---|---|---|---|
Voice Encoder | 30 | 81 | 267 | 4.7 | 5.7 | 1.0 |
NCSE | 2.6 | | | | | |
AGC | 0.3 | | | | | |
Voice Decoder | 5.3 | | | | | |
Voice Encoder + Voice Decoder | 35.3 | | | | | |
Total | 38.2 | | | | | |
Module | MIPS* peak | Program memory (KBytes) | Data memory: Constants (KBytes) | Data memory: Channel (KBytes) | Data memory: Heap (KBytes) | Data memory: Stack (KBytes) |
---|---|---|---|---|---|---|
Voice Encoder | 51 | 26 | 267 | 4.7 | 5.7 | 1.0 |
NCSE | 6.2 | | | | | |
AGC | 0.4 | | | | | |
Voice Decoder | 13 | | | | | |
Voice Encoder + Voice Decoder | 64 | | | | | |
Total | 70.6 | | | | | |
* DSPINI continues to optimize the TWELP algorithm and code in order to minimize the computational complexity of the vocoder.
Software Integrity and Security. DSPINI guarantees the ABSOLUTE integrity of its software, free from any undocumented features, undeclared capabilities, or hidden functions. Our customers can be assured that none of our software/code contains any secret features or functionalities concealed from the user. If necessary, we are ready to provide the source code of our software products for appropriate certification.
Moreover, our software is available in source code form—you simply need to purchase the appropriate license to use it.
Guarantee And Support. DSPINI guarantees the quality of the product and the conformance of all its technical characteristics to the requirements of the current specifications. Testing and other quality control methods are used to support this guarantee.
Any Platforms. DSPINI can port this vocoder software to any other DSP, RISC, or general-purpose platform in a short time: 1-2 months.
Licensing Terms. To use the vocoder, customers must obtain a license directly from DSPINI.
Customization. The vocoder can be customized to specific requirements: a different bit rate, frame size, level of robustness to channel errors, etc. Please contact us for details.
Prospects. DSPINI is continuously improving and developing a set of new vocoders, ranging from 300 bps up to 9600 bps, based on TWELP technology.
Related Software. This vocoder may be effectively used in a bundle with other DSPINI products:
- Linear and acoustic echo cancellers,
- Multichannel noise cancellers (including two-microphone adaptive array),
- Wired or radio modems for any types of channels and bitrates,
- Other products.
Downloads.
- Datasheet (pdf)
- ITU-T P.50 source speech samples (zip)
- MELPe 1200 bps speech samples (zip)
- TWELP 700 bps speech samples (zip)
- P.862 and STOI/ESTOI utilities
- PC-evaluation package (zip) — on request
- User's Guide document (pdf) — on request