DSP Innovations - CODEC2 vs TWELP at 2400 bps

Details: Written by Sergey; Published: 2025 February 11; Last Updated: 2025 February 25

In recent years, we have noticed that vocoder selection for critical voice communication systems is sometimes made without full consideration of the technical aspects and potential consequences of the choice.
It is noteworthy that even in high-end communication devices, free codecs are occasionally used, which can impact key performance characteristics and, in some cases, lead to a system that does not meet the expected quality and reliability standards.

We hope the information provided below will be useful not only for those new to the field but also for experienced professionals, helping them better understand the critical characteristics of vocoders and make well-informed decisions when selecting the right vocoder for their voice communication systems.

In this report, we compare the popular CODEC2 (version 1.2) vocoder at a 2400 bps bit rate with the TWELP vocoder at bit rates of 2400 bps and 700 bps. Our goal is to demonstrate that CODEC2 falls significantly short in terms of speech quality and intelligibility compared to the TWELP vocoder, even when TWELP operates at a considerably lower bit rate.

Essential Information. As professionals in the field of voice communication know, any communication system is primarily defined by three key characteristics:

Speech Quality: Refers to how natural and clear the speech sounds. Higher quality means less distortion, fewer artifacts, and a more pleasant listening experience.
It directly impacts the recognition of the speaker and the conveyance of their emotions, which is especially critical in high-stakes situations.
Speech Intelligibility: Measures how easily speech can be understood, even in noisy conditions. While degradation in quality may occur, high intelligibility ensures that key information remains recognizable.
A drop in intelligibility to 90% (one word in ten missed) may not be critical, as the listener can often infer lost information from context.
However, when intelligibility falls to 70% or lower (three or more words in ten missed), communication becomes significantly more challenging and can lead to irreversible consequences in critical situations.
Latency: Refers to the time delay between speech input and output. Lower latency is crucial for real-time communication to prevent unnatural pauses and delays.
Noticeable time delays not only complicate communication but can also result in severe consequences when rapid and accurate voice transmission is vital.

Therefore, vocoders should primarily be compared based on these essential parameters.

All measurement results presented below can be independently verified using the samples and utilities available via the links at the bottom of this webpage.

You will find not only all the necessary utilities in the archive but also a complete test environment in the form of batch (BAT) command files. You only need to run the corresponding command file to test the speech quality and intelligibility of samples processed by a specific vocoder.
You can also explore the information inside the command files to see how the process works.

Technology Features. The TWELP vocoders are based on a new speech coding technology called "Tri-Wave Excited Linear Prediction" (TWELP), developed by DSPINI's experts.

TWELP technology is a new class of vocoders that differs from any other LPC-based vocoders by:

an advanced, reliable method of pitch estimation
pitch-synchronous analysis
an advanced tri-wave model of excitation
the newest quantization schemes
pitch-synchronous synthesis

Thanks to its unique features, TWELP technology offers significantly better speech quality and intelligibility compared to other well-known technologies, including AMBE+2, MELPe, ACELP, and others, at bit rates ranging from 300 bps to 4800 bps and beyond.
Unlike many other LBR vocoders (such as MELPe), TWELP also provides superior quality for non-speech signals like sirens and background music.

In contrast, CODEC2 is based on the older and simpler SHC (Sinusoidal Harmonic Coding) technology, which was widely used over 30 years ago.

Speech Quality. As is well known, obtaining an objective assessment of the speech quality of low-bitrate vocoders by ear is extremely difficult and labor-intensive.

Despite the common belief that the ITU-T P.862 (PESQ) utility, specifically designed for objective speech quality evaluation, is not suitable for assessing low-bitrate vocoders, our many years of experience show that this is not entirely accurate.
Yes, of course, this assessment is not absolutely precise. However, it is very useful as it helps to identify quality differences when comparing vocoders.

Recently, we have started using speech intelligibility measurements based on the STOI (Short-Time Objective Intelligibility) and ESTOI (Extended Short-Time Objective Intelligibility) methods and have found a clear correlation between PESQ quality scores and intelligibility scores in the STOI/ESTOI metrics.
Of course, there are some differences, especially when comparing vocoders with very low bitrates. However, it is safe to say that all these evaluation methods are important and, by complementing each other, provide a sufficiently accurate and objective picture when comparing vocoder quality.

The TWELP 2400 bps, TWELP 700 bps vocoders and CODEC2 2400 bps vocoder were tested using the ITU-T P.50 speech database for 20 different languages.
We have updated the speech database by minimizing inter-speech pauses to eliminate their impact on the evaluation results.
Therefore, the numbers obtained from the quality measurements using this updated speech database differ from those previously obtained with the original speech database, where speech pauses were not removed.
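The exact procedure used to shorten the pauses is not described here. As a rough illustration only, the following Python sketch caps every silent stretch at a fixed number of frames using a simple frame-energy threshold. The function name, the threshold, the 160-sample frame (20 ms at an assumed 8 kHz sampling rate), and the pause cap are all assumptions chosen for the example, not settings taken from the testbench.

```python
import math


def cap_pauses(samples, frame_len=160, energy_threshold=1e-4, max_pause_frames=10):
    """Return a copy of `samples` in which every run of low-energy frames
    is shortened to at most `max_pause_frames` frames.

    All parameters are illustrative assumptions, not testbench settings.
    """
    kept = []
    silent_run = 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(v * v for v in frame) / len(frame)
        if energy < energy_threshold:
            silent_run += 1
            if silent_run > max_pause_frames:
                continue  # drop the excess part of a long pause
        else:
            silent_run = 0
        kept.extend(frame)
    return kept


# Synthetic stand-in for speech: two 100 ms tone bursts separated by 1 s of silence.
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
signal = tone + [0.0] * 8000 + tone
shortened = cap_pauses(signal)
print(len(signal), "->", len(shortened), "samples")
```

A real pause detector would normally work on recorded speech with a noise floor, so the threshold would have to be tuned to the material; the sketch only shows the principle of limiting pause length rather than removing pauses entirely.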
The ITU-T P.862 utility was used to estimate speech quality in PESQ terms:

Language	TWELP 2400 bps	TWELP 700 bps	CODEC2 2400 bps
American	3.213	2.703	2.641
Arabic	3.214	2.676	2.605
British	3.098	2.670	2.420
Chinese	3.249	2.671	2.636
Danish	3.268	2.800	2.655
Dutch	3.070	2.603	2.427
Finnish	2.989	2.608	2.348
French	3.381	2.777	2.714
German	3.189	2.766	2.592
Greek	3.262	2.710	2.594
Hindi	3.247	2.773	2.633
Hungarian	3.245	2.766	2.873
Italian	3.399	2.953	2.884
Japanese	3.353	2.860	2.741
Norwegian	3.264	2.740	2.775
Polish	3.213	2.745	2.541
Portuguese	3.314	2.796	2.794
Russian	3.153	2.659	2.624
Spanish	3.244	2.787	2.917
Swedish	3.256	2.868	2.675
Average	3.226	2.747	2.654
On average, TWELP 2400 and TWELP 700 exceed CODEC2 2400 by 0.572 and 0.093 PESQ, respectively.

The diagram and table above illustrate the significant difference in speech quality between the TWELP and CODEC2 vocoders.
As is well known, for such low bit rates, even a 0.1 PESQ difference is quite substantial. In this case, we observe a notable PESQ difference of 0.572 between TWELP and CODEC2.
The speech quality of the CODEC2 vocoder at 2400 bps is lower than that of the TWELP vocoder at 700 bps by nearly 0.1 PESQ, despite TWELP operating at a significantly lower bit rate.
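To see how the PESQ gap at 2400 bps is distributed across languages, the per-language margins can be computed directly from the table. The following Python sketch uses the scores copied from the TWELP 2400 bps and CODEC2 2400 bps columns; the variable and function names are illustrative and are not part of the testbench.

```python
# PESQ scores copied from the table: language -> (TWELP 2400, CODEC2 2400)
PESQ = {
    "American": (3.213, 2.641),
    "Arabic": (3.214, 2.605),
    "British": (3.098, 2.420),
    "Chinese": (3.249, 2.636),
    "Danish": (3.268, 2.655),
    "Dutch": (3.070, 2.427),
    "Finnish": (2.989, 2.348),
    "French": (3.381, 2.714),
    "German": (3.189, 2.592),
    "Greek": (3.262, 2.594),
    "Hindi": (3.247, 2.633),
    "Hungarian": (3.245, 2.873),
    "Italian": (3.399, 2.884),
    "Japanese": (3.353, 2.741),
    "Norwegian": (3.264, 2.775),
    "Polish": (3.213, 2.541),
    "Portuguese": (3.314, 2.794),
    "Russian": (3.153, 2.624),
    "Spanish": (3.244, 2.917),
    "Swedish": (3.256, 2.675),
}


def margins(table):
    """Per-language difference between the first and the second score."""
    return {lang: round(a - b, 3) for lang, (a, b) in table.items()}


pesq_margins = margins(PESQ)
smallest = min(pesq_margins, key=pesq_margins.get)
largest = max(pesq_margins, key=pesq_margins.get)
print(f"smallest margin: {smallest} {pesq_margins[smallest]:.3f}")
print(f"largest margin:  {largest} {pesq_margins[largest]:.3f}")
```

With the figures from the table, the smallest lead is for Spanish (0.327 PESQ) and the largest for British (0.678 PESQ), so TWELP 2400 scores higher than CODEC2 2400 in every one of the 20 languages.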

Speech Intelligibility. The five-point speech intelligibility scale is typically represented in the following terms:

Excellent – 96-100% intelligibility
Good – 86-95% intelligibility
Fair – 70-85% intelligibility
Poor – 50-69% intelligibility
Bad – <50% intelligibility

This scale is used, for example, in speech intelligibility assessment studies such as the Modified Rhyme Test (MRT) or the Speech Intelligibility Index (SII).
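The scale above can be expressed as a small lookup function, which is convenient when classifying many measured scores. This is a minimal Python sketch; the function name is hypothetical, and the handling of fractional scores (rounding half up to a whole percent before lookup) is an assumption, since the scale itself is defined only on whole-percent ranges.

```python
def intelligibility_category(percent: float) -> str:
    """Map an intelligibility percentage to the five-point scale.

    Assumption: fractional scores are rounded half up to a whole percent
    before lookup, because the scale is defined on whole-percent ranges.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    score = int(percent + 0.5)
    if score >= 96:
        return "Excellent"
    if score >= 86:
        return "Good"
    if score >= 70:
        return "Fair"
    if score >= 50:
        return "Poor"
    return "Bad"


for value in (98.0, 90.0, 85.4, 69.5, 49.4):
    print(f"{value:5.1f}% -> {intelligibility_category(value)}")
```

Under this rounding convention, a score such as 85.4% is still counted as "Fair", while 85.5% would already be counted as "Good"; a different convention would shift scores that lie between two ranges.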

We use the STOI (Short-Time Objective Intelligibility) and ESTOI (Extended Short-Time Objective Intelligibility) metrics to evaluate speech intelligibility.
These metrics have proven their objectivity over the past years and have become so popular that even the latest version of MATLAB includes one of these evaluators.
Although we use both metrics, we believe that the ESTOI metric provides a more objective result when assessing parametric vocoders, which are highly nonlinear devices that significantly distort the spectral composition of the signal.

Here is a comparison of speech intelligibility, using the same updated ITU-T P.50 speech database for 20 different languages and the above-mentioned STOI and ESTOI metrics. The first table shows the STOI scores, in percent:

Language	TWELP 2400 bps	TWELP 700 bps	CODEC2 2400 bps
American	92.69	86.02	85.93
Arabic	92.03	84.79	85.09
British	90.98	83.85	82.38
Chinese	92.00	85.00	86.55
Danish	93.00	86.66	85.70
Dutch	91.58	84.69	83.28
Finnish	89.54	81.22	81.54
French	92.33	85.92	86.98
German	92.28	85.81	84.81
Greek	92.54	85.14	85.27
Hindi	92.37	85.42	84.02
Hungarian	92.05	85.67	87.65
Italian	92.36	84.92	85.37
Japanese	92.53	85.71	85.84
Norwegian	93.13	86.43	87.81
Polish	92.68	85.73	85.60
Portuguese	91.77	85.62	86.86
Russian	91.38	84.36	85.56
Spanish	91.96	84.27	86.03
Swedish	91.02	83.85	85.06
Average	92.01	85.05	85.37
On average, TWELP 2400 exceeds CODEC2 2400 by 6.64%. The average difference between TWELP 700 and CODEC2 2400 is just 0.32%, in favor of CODEC2 2400.

The diagram and table above also illustrate the significant difference in speech intelligibility between the TWELP and CODEC2 vocoders.

Measurements indicate that these vocoders belong to fundamentally different intelligibility categories.

TWELP at 2400 bps is close to the upper boundary of the "Good" category (86-95%).
CODEC2 at 2400 bps falls within the "Fair" category (70-85%), providing intelligibility comparable to the TWELP vocoder at 700 bps, despite TWELP operating at a significantly lower bitrate.

Considering that a low-bitrate vocoder is a nonlinear device that significantly distorts the spectrum of the original speech signal, the ESTOI metric provides more accurate assessments of speech intelligibility after vocoding:

Language	TWELP 2400 bps	TWELP 700 bps	CODEC2 2400 bps
American	87.35	76.99	73.18
Arabic	87.59	77.03	73.12
British	85.55	75.36	68.66
Chinese	88.03	78.15	77.79
Danish	87.91	78.30	73.48
Dutch	86.47	76.39	70.44
Finnish	83.80	72.43	70.46
French	87.69	77.82	75.20
German	86.69	76.40	70.72
Greek	88.52	77.89	75.02
Hindi	86.62	75.38	67.96
Hungarian	86.83	76.33	75.15
Italian	87.52	76.02	74.25
Japanese	88.62	78.83	75.53
Norwegian	88.65	78.62	77.16
Polish	88.20	77.59	72.66
Portuguese	86.87	77.49	75.04
Russian	86.25	75.43	72.62
Spanish	87.22	76.84	76.67
Swedish	84.82	74.06	72.56
Average	87.06	76.67	73.38
On average, TWELP 2400 and TWELP 700 exceed CODEC2 2400 by 13.68% and 3.29%, respectively.

The diagram and the table above illustrate the significant difference in speech intelligibility between the TWELP and CODEC2 vocoders.
Measurements indicate that these vocoders belong to fundamentally different intelligibility categories.

TWELP 2400 bps vocoder is well within the "Good" category (86-95%).
TWELP 700 bps vocoder falls within the "Fair" category (70-85%).
CODEC2 2400 bps vocoder, on average, also falls within the "Fair" category (70-85%), but tends to remain in its lower range. For two languages (British at 68.66% and Hindi at 67.96%), its intelligibility even falls into the "Poor" category (50-69%).

These results show that the CODEC2 2400 bps vocoder performs noticeably below even the TWELP 700 bps vocoder, despite TWELP operating at a significantly lower bitrate.
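The ESTOI averages and margins quoted above can be reproduced from the per-language rows of the table. The following Python sketch is one way to do so; the names are illustrative, and the margins are taken between the averages after rounding them to two decimals, which reproduces the published figures.

```python
# ESTOI scores (%) copied from the table:
# language -> (TWELP 2400, TWELP 700, CODEC2 2400)
ESTOI = {
    "American": (87.35, 76.99, 73.18),
    "Arabic": (87.59, 77.03, 73.12),
    "British": (85.55, 75.36, 68.66),
    "Chinese": (88.03, 78.15, 77.79),
    "Danish": (87.91, 78.30, 73.48),
    "Dutch": (86.47, 76.39, 70.44),
    "Finnish": (83.80, 72.43, 70.46),
    "French": (87.69, 77.82, 75.20),
    "German": (86.69, 76.40, 70.72),
    "Greek": (88.52, 77.89, 75.02),
    "Hindi": (86.62, 75.38, 67.96),
    "Hungarian": (86.83, 76.33, 75.15),
    "Italian": (87.52, 76.02, 74.25),
    "Japanese": (88.62, 78.83, 75.53),
    "Norwegian": (88.65, 78.62, 77.16),
    "Polish": (88.20, 77.59, 72.66),
    "Portuguese": (86.87, 77.49, 75.04),
    "Russian": (86.25, 75.43, 72.62),
    "Spanish": (87.22, 76.84, 76.67),
    "Swedish": (84.82, 74.06, 72.56),
}


def column_average(table, column):
    """Average of one score column, rounded to two decimals."""
    values = [row[column] for row in table.values()]
    return round(sum(values) / len(values), 2)


twelp_2400, twelp_700, codec2_2400 = (column_average(ESTOI, c) for c in range(3))

# Margins between the rounded averages, in percentage points.
margin_2400 = round(twelp_2400 - codec2_2400, 2)
margin_700 = round(twelp_700 - codec2_2400, 2)

# Languages where CODEC2 2400 drops below the 70% lower bound of "Fair".
below_fair = sorted(lang for lang, row in ESTOI.items() if row[2] < 70.0)

print(twelp_2400, twelp_700, codec2_2400)
print(margin_2400, margin_700)
print(below_fair)
```

The same helper can be pointed at any other column of the table, for example to check how many languages keep a given vocoder inside a particular intelligibility category.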

Speech Samples (WAV-files). We used the preference method when comparing speech quality and intelligibility by ear.

Initially, we compared the CODEC2 2400 bps vocoder with the TWELP 2400 bps vocoder.
All independent listeners preferred the TWELP vocoder, noting its superior quality, clarity, naturalness, and speech intelligibility.

None of the listeners or experts agreed with the claim that the speech quality of the CODEC2 vocoder is comparable to that of TWELP or only slightly inferior.
Everyone unanimously agreed that the TWELP vocoder offers a distinctly higher level of speech quality.

We also conducted comparative tests between the CODEC2 2400 bps vocoder and the TWELP 700 bps vocoder, which operates at a bit rate roughly 3.4 times lower.
Although the opinions of the listeners shifted in this case, the majority still favored the TWELP vocoder.

Listeners were surprised to learn that the vocoders operate at bit rates that differ by a factor of roughly 3.4.

You can listen to short samples of the original speech as well as the processed speech from these vocoders in any of the 20 languages through the links in the table below.
Additionally, you can download the complete set of ITU-T P.50 samples (updated version without pauses) as ZIP files for all languages at once using the links in the "Downloads" section at the bottom of the page.

For the best listening experience, we recommend using high-quality headphones or premium audio equipment to hear the nuances and differences in the vocoder sound more clearly.

Language	Source speech	CODEC2 2400 bps	TWELP 700 bps	TWELP 2400 bps
American
Arabic
British
Chinese
Danish
Dutch
Finnish
French
German
Greek
Hindi
Hungarian
Italian
Japanese
Norwegian
Polish
Portuguese
Russian
Spanish
Swedish

Superiority In Quality Of The Non-speech Signals. In contrast to other LBR vocoders (MELPe, AMBE+2, etc.), TWELP vocoders provide high quality for non-speech signals, including police, ambulance, and fire sirens. This feature, together with natural, human-sounding voice quality, makes TWELP vocoders well suited to replacing analog radio with digital radio, and to other applications where high-quality transmission of non-speech signals matters alongside high-quality transmission of speech.

Source signal	CODEC2 2400 bps	TWELP 700 bps	TWELP 2400 bps

High Robustness To Acoustic Noise. In real-world voice communication applications, the speech signal at the vocoder's input is typically distorted to some extent by external noise.
In many cases, such as military operations and similar scenarios, the signal-to-noise ratio (SNR) can be extremely low.
High speech intelligibility under such conditions is a critically important factor, often directly affecting people’s safety and even their lives.

We used the ITU-T P.50 speech database as a basis for generating samples where the original speech was mixed with "pink" noise at various SNR levels, ranging from 40 dB down to 0 dB.
We then processed all these noisy speech samples using the TWELP 2400 bps and CODEC2 2400 bps vocoders.
Next, we measured speech intelligibility for all these samples using the ESTOI metric, as it is the most suitable for objectively evaluating speech signals processed by a parametric vocoder, especially in noisy environments.
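Mixing speech with noise at a prescribed SNR comes down to scaling the noise so that the ratio of speech power to noise power matches the target. The following Python sketch illustrates this under several assumptions that are not taken from the testbench: power is measured over the whole signal (not over active speech only), pink noise is approximated with a simple Voss-style generator, the sampling rate is 8 kHz, and all function names are hypothetical.

```python
import math
import random


def pink_noise(n, rows=12, seed=0):
    """Approximate pink (1/f) noise, Voss style: row k of `rows`
    white-noise sources is refreshed every 2**k samples."""
    if n <= 0:
        return []
    rng = random.Random(seed)
    values = [0.0] * rows
    out = []
    for i in range(n):
        for k in range(rows):
            if i % (1 << k) == 0:
                values[k] = rng.uniform(-1.0, 1.0)
        out.append(sum(values))
    mean = sum(out) / n
    return [v - mean for v in out]  # remove the DC offset


def mean_power(x):
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals `snr_db`,
    then add it to `speech`. Power is measured over the whole signal."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech, p_noise = mean_power(speech), mean_power(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must be non-silent")
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(speech, noise)]


# Stand-in for a speech file: one second of a 300 Hz tone at 8 kHz.
speech = [0.3 * math.sin(2 * math.pi * 300 * n / 8000) for n in range(8000)]
noisy = mix_at_snr(speech, pink_noise(len(speech)), snr_db=10.0)
```

Calling `mix_at_snr` once per target level (40 dB down to 0 dB) would produce a full set of noisy samples from a single clean file. How the SNR is defined matters: measuring speech power only over active speech segments would give a different noise level for the same nominal SNR.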

At high SNR levels, adding noise has little effect on intelligibility. However, when the SNR drops below 30 dB, speech intelligibility after both vocoders begins to decline.
Since the CODEC2 vocoder inherently provides significantly lower speech quality and intelligibility compared to the TWELP vocoder, speech intelligibility at very low SNR levels after CODEC2 becomes unacceptably low, while after TWELP, it remains at a satisfactory level.

Below, we present a diagram and a table showing intelligibility values for SNR = 10 dB, along with previously measured intelligibility values for clean, noise-free speech.
This diagram and table clearly illustrate the advantages of the TWELP vocoder over the CODEC2 vocoder in acoustic noise conditions.

On average, and for 18 of the 20 languages in the table below, highly noisy speech at SNR = 10 dB processed by the TWELP vocoder remains more intelligible than completely clean, noise-free speech processed by the CODEC2 vocoder.

Language	TWELP (noise-free)	TWELP (SNR=10dB)	CODEC2 (noise-free)	CODEC2 (SNR=10dB)
American	87.35	74.70	73.18	63.94
Arabic	87.59	76.56	73.12	66.89
British	85.55	72.18	68.66	60.25
Chinese	88.03	78.73	77.79	69.58
Danish	87.91	74.29	73.48	64.41
Dutch	86.47	72.06	70.44	60.21
Finnish	83.80	71.56	70.46	62.44
French	87.69	77.23	75.20	66.83
German	86.69	73.04	70.72	63.19
Greek	88.52	77.26	75.02	67.00
Hindi	86.62	72.90	67.96	59.29
Hungarian	86.83	76.83	75.15	67.28
Italian	87.52	74.19	74.25	64.87
Japanese	88.62	78.40	75.53	67.93
Norwegian	88.65	78.31	77.16	69.52
Polish	88.20	75.47	72.66	63.92
Portuguese	86.87	75.40	75.04	66.91
Russian	86.25	74.27	72.62	63.57
Spanish	87.22	77.32	76.67	69.15
Swedish	84.82	69.17	72.56	61.50
Average	87.06	74.95	73.38	64.93
The superiority of TWELP 2400 over CODEC2 2400 averages 13.68% in a noise-free environment. The superiority of TWELP 2400 over CODEC2 2400 averages 10.02% in acoustic noise at SNR = 10 dB

You can see that the average intelligibility of very noisy speech after the TWELP 2400 bps vocoder (74.95%) is higher than the intelligibility of clean (noise-free) speech after the CODEC2 2400 bps vocoder (73.38%), and remains in the "Fair" intelligibility category.
The intelligibility of the same noisy speech after the CODEC2 2400 bps vocoder is only 64.93%, which corresponds to the "Poor" intelligibility category.
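The comparison between noisy speech after TWELP and clean speech after CODEC2 can also be checked language by language. The following Python sketch uses the ESTOI scores copied from the second and third columns of the table; the names are illustrative.

```python
# ESTOI scores (%) copied from the table:
# language -> (TWELP 2400 at SNR = 10 dB, CODEC2 2400 noise-free)
SCORES = {
    "American": (74.70, 73.18),
    "Arabic": (76.56, 73.12),
    "British": (72.18, 68.66),
    "Chinese": (78.73, 77.79),
    "Danish": (74.29, 73.48),
    "Dutch": (72.06, 70.44),
    "Finnish": (71.56, 70.46),
    "French": (77.23, 75.20),
    "German": (73.04, 70.72),
    "Greek": (77.26, 75.02),
    "Hindi": (72.90, 67.96),
    "Hungarian": (76.83, 75.15),
    "Italian": (74.19, 74.25),
    "Japanese": (78.40, 75.53),
    "Norwegian": (78.31, 77.16),
    "Polish": (75.47, 72.66),
    "Portuguese": (75.40, 75.04),
    "Russian": (74.27, 72.62),
    "Spanish": (77.32, 76.67),
    "Swedish": (69.17, 72.56),
}

# Languages where noisy TWELP speech does NOT beat clean CODEC2 speech.
exceptions = sorted(
    lang for lang, (twelp_noisy, codec2_clean) in SCORES.items()
    if twelp_noisy <= codec2_clean
)
wins = len(SCORES) - len(exceptions)

print(f"TWELP at 10 dB SNR beats clean CODEC2 in {wins} of {len(SCORES)} languages")
print("exceptions:", exceptions)
```

With the figures from the table, noisy speech after TWELP 2400 scores higher than clean speech after CODEC2 2400 in 18 of the 20 languages; the two exceptions are Italian (74.19% versus 74.25%) and Swedish (69.17% versus 72.56%).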

Below, you can listen to short samples of noisy English speech at SNR = 10 dB processed by both vocoders.

CODEC2 2400 bps	TWELP 2400 bps	Source signal (SNR=10dB)

Note:
All the above evaluations for both vocoders were conducted without activating noise reduction systems in order to assess the vocoders themselves exclusively.

Latency. Here is a comparison of the latency added by these vocoders to the communication system.

The TWELP 2400 bps vocoder operates with a frame size of 20 ms (160 samples) and a look-ahead time of the same length, giving a total algorithmic delay of 40 ms.
The TWELP 700 bps vocoder operates with a frame size of 80 ms (640 samples) and a look-ahead time of 20 ms (160 samples), giving a total algorithmic delay of 100 ms.
The CODEC2 2400 bps vocoder operates with a frame size of 20 ms (160 samples) and a look-ahead time of 20 ms (160 samples), giving a total algorithmic delay of 40 ms.

Note:
We did not analyze the CODEC2 vocoder's code to determine its algorithmic delay.
Instead, we estimated the look-ahead time based on the sample delay in the output files relative to the input samples, which typically corresponds to the actual value of this time.
The frame size value was taken from the vocoder's documentation.
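A common way to measure the sample delay between an input file and an output file is to search for the lag that maximizes their cross-correlation. The following Python sketch shows the idea on a synthetic signal; it is not the procedure or the utility used for the measurements above, and the function name and the 8 kHz sampling rate (implied by 20 ms = 160 samples) are assumptions.

```python
import random


def estimate_delay(reference, processed, max_lag):
    """Return the lag (in samples, 0..max_lag) at which `processed`
    correlates best with `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(processed) - lag)
        if n <= 0:
            break
        score = sum(reference[i] * processed[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


SAMPLE_RATE_HZ = 8000  # assumption implied by "20 ms (160 samples)"

# Synthetic check: a noise burst delayed by exactly 160 samples.
rng = random.Random(1)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
processed = [0.0] * 160 + reference

lag = estimate_delay(reference, processed, max_lag=400)
print(f"{lag} samples = {1000 * lag / SAMPLE_RATE_HZ:.1f} ms")
```

For this synthetic input the estimate is 160 samples, or 20 ms. A parametric vocoder does not preserve the waveform of the input, so on real vocoder output it is usually more robust to correlate short-time energy envelopes than raw samples.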

You can see that both the TWELP 2400 and CODEC2 2400 vocoders have the same low latency of 40 ms, while the TWELP 700 bps vocoder has a latency of 100 ms. All these latencies are acceptable for voice communication systems.
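The delay figures follow from simple arithmetic: the algorithmic delay is the frame size plus the look-ahead, converted from samples to milliseconds. The following Python sketch reproduces the numbers above and also derives the number of bits per frame from the bit rate. The class name is hypothetical, and the 8 kHz sampling rate is an assumption implied by 20 ms corresponding to 160 samples.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 8000  # assumption implied by "20 ms (160 samples)"


@dataclass(frozen=True)
class VocoderTiming:
    name: str
    bit_rate_bps: int
    frame_samples: int
    lookahead_samples: int

    @property
    def frame_ms(self) -> float:
        return 1000.0 * self.frame_samples / SAMPLE_RATE_HZ

    @property
    def algorithmic_delay_ms(self) -> float:
        # Algorithmic delay = one full frame plus the look-ahead.
        total = self.frame_samples + self.lookahead_samples
        return 1000.0 * total / SAMPLE_RATE_HZ

    @property
    def bits_per_frame(self) -> float:
        # Bit rate multiplied by the frame duration in seconds.
        return self.bit_rate_bps * self.frame_samples / SAMPLE_RATE_HZ


VOCODERS = [
    VocoderTiming("TWELP 2400", 2400, 160, 160),
    VocoderTiming("TWELP 700", 700, 640, 160),
    VocoderTiming("CODEC2 2400", 2400, 160, 160),
]

for v in VOCODERS:
    print(f"{v.name}: delay {v.algorithmic_delay_ms:.0f} ms, "
          f"{v.bits_per_frame:.0f} bits per frame")
```

These values cover the algorithmic delay only; processing time, buffering, and channel delay in a complete system come on top of them.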

Guarantee And Support. DSPINI guarantees the quality of the product and the conformance of all its technical characteristics to the requirements of the current specifications. Testing and other quality control methods are used to support this guarantee.

Software Integrity and Security. DSPINI guarantees the ABSOLUTE integrity of its software, free from any undocumented features, undeclared capabilities, or hidden functions. Our customers can be assured that none of our software/code contains any secret features or functionalities concealed from the user. If necessary, we are ready to provide the source code of our software products for appropriate certification.
Moreover, our software is available in source code form; you simply need to purchase the appropriate license to use it.

Any Platforms. DSPINI can perform a highly optimized port of the vocoder to any other DSP, RISC, or general-purpose platform in a short time: 1-2 months.

Licensing Terms. DSPINI is the exclusive owner of the rights to the TWELP vocoder software; customers must obtain a license directly from DSPINI.

Customization. DSPINI can customize any vocoder to specific requirements: a different bit rate or frame size, a different level of robustness to channel errors, etc. Please contact us for details.

Prospects. DSPINI is continuously improving and developing a set of new vocoders with bit rates ranging from 300 bps up to 9600 bps, based on SPR and TWELP technologies.

Related Software. Any vocoder can be used effectively in a bundle with other DSPINI products:

Linear and acoustic echo cancellers,
Speech Enhancers / Noise cancellers,
Wired or radiomodems for any types of channels and bitrates,
Other DSP products.

Downloads:

Full testbench (zip ~417 MB)

Note:
This testbench includes:
- The ITU-T P.50 speech database (updated by removing pauses and adding noisy speech samples)
- The ITU-T P.862 utility
- The STOI/ESTOI utility
- The Audio File Time Aligner utility
- Audio samples processed by CODEC2 and TWELP vocoders at different bitrates
- A testing environment with command (BAT) files

Conclusion. The open-source CODEC2 vocoder is a notable achievement for a free codec operating at low bit rates, and if minimizing costs is a primary concern for your application, it may be worth considering.

However, due to its older technology, CODEC2 tends to offer lower speech quality and intelligibility compared to newer solutions, including the TWELP vocoder, which provides superior performance even at a much lower bit rate. Moreover, CODEC2 struggles with non-speech signals like sirens, etc.
We recommend that you carefully evaluate all factors relevant to your specific application and, if needed, consult with experts in voice communication before making a decision on which codec to use.

We would be happy to provide more detailed information about the TWELP vocoder, assist with your decision-making, and discuss whether it may be a good fit for your needs. Please feel free to contact us via email.

Additionally, if you would like to see a comparison of our vocoders with any other standard or open-source options, we are happy to conduct the necessary tests and share the results on our website.