This document contains the description of an algorithm for the coding
of speech signals sampled at 8 kHz. The algorithm, called iLBC, uses
a block-independent linear-predictive coding (LPC) algorithm and has
support for two basic frame lengths: 20 ms at 15.2 kbit/s and 30 ms
at 13.33 kbit/s. When the codec operates at block lengths of 20 ms,
it produces 304 bits per block, which SHOULD be packetized as in [1].
Similarly, for block lengths of 30 ms it produces 400 bits per block,
which SHOULD be packetized as in [1]. The two modes for the
different frame sizes operate in a very similar way. When they
differ it is explicitly stated in the text, usually with the notation
x/y, where x refers to the 20 ms mode and y refers to the 30 ms mode.
The described algorithm results in a speech coding system with a
controlled response to packet losses similar to what is known from
pulse code modulation (PCM) with packet loss concealment (PLC), such
as the ITU-T G.711 standard [4], which operates at a fixed bit rate
of 64 kbit/s. At the same time, the described algorithm enables
fixed bit rate coding with a quality-versus-bit rate tradeoff close
to state-of-the-art. A suitable RTP payload format for the iLBC
codec is specified in [1].
Some of the applications for which this coder is suitable are
real-time communications such as telephony and videoconferencing,
streaming audio, archival, and messaging.
Andersen, et al. Experimental [Page 4]
RFC 3951 Internet Low Bit Rate Codec December 2004
Cable Television Laboratories (CableLabs(R)) has adopted iLBC as a
mandatory PacketCable(TM) audio codec standard for VoIP over Cable
applications [3].
This document is organized as follows. Section 2 gives a brief
outline of the codec. The specific encoder and decoder algorithms
are explained in sections 3 and 4, respectively. Appendix A provides
a reference implementation in C.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119 [2].
The codec consists of an encoder and a decoder as described in
sections 2.1 and 2.2, respectively.
The essence of the codec is LPC and block-based coding of the LPC
residual signal. For each 160/240 (20 ms/30 ms) sample block, the
following major steps are performed: A set of LPC filters is
computed, and the speech signal is filtered through them to produce
the residual signal. The codec uses scalar quantization of the
dominant part, in terms of energy, of the residual signal for the
block. The dominant state is of length 57/58 (20 ms/30 ms) samples
and forms a start state for dynamic codebooks constructed from the
already coded parts of the residual signal. These dynamic codebooks
are used to code the remaining parts of the residual signal. By this
method, coding independence between blocks is achieved, resulting in
elimination of propagation of perceptual degradations due to packet
loss. The method facilitates high-quality packet loss concealment
(PLC).
The input to the encoder SHOULD be 16 bit uniform PCM sampled at 8
kHz. It SHOULD be partitioned into blocks of BLOCKL=160/240 samples
for the 20/30 ms frame size. Each block is divided into NSUB=4/6
consecutive sub-blocks of SUBL=40 samples each. For 30 ms frame
size, the encoder performs two LPC_FILTERORDER=10 linear-predictive
coding (LPC) analyses. The first analysis applies a smooth window
centered over the second sub-block and extending to the middle of the
fifth sub-block. The second LPC analysis applies a smooth asymmetric
window centered over the fifth sub-block and extending to the end of
the sixth sub-block. For 20 ms frame size, one LPC_FILTERORDER=10
linear-predictive coding (LPC) analysis is performed with a smooth
window centered over the third sub-frame.
For each of the LPC analyses, a set of line-spectral frequencies
(LSFs) is obtained, quantized, and interpolated to obtain LSF
coefficients for each sub-block. Subsequently, the LPC residual is
computed by using the quantized and interpolated LPC analysis
filters.
The two consecutive sub-blocks of the residual exhibiting the maximal
weighted energy are identified. Within these two sub-blocks, the
start state (segment) is selected from two choices: the first 57/58
samples or the last 57/58 samples of the two consecutive sub-blocks.
The selected segment is the one of higher energy. The start state is
encoded with scalar quantization.
A dynamic codebook encoding procedure is used to encode 1) the 23/22
(20 ms/30 ms) remaining samples in the two sub-blocks containing the
start state; 2) the sub-blocks after the start state in time; and 3)
the sub-blocks before the start state in time. Thus, the encoding
target can be either the 23/22 samples remaining of the two sub-
blocks containing the start state or a 40-sample sub-block. This
target can consist of samples indexed forward in time or backward in
time, depending on the location of the start state.
The codebook coding is based on an adaptive codebook built from a
codebook memory that contains decoded LPC excitation samples from the
already encoded part of the block. These samples are indexed in the
same time direction as the target vector, ending at the sample
instant prior to the first sample instant represented in the target
vector. The codebook is used in CB_NSTAGES=3 stages in a successive
refinement approach, and the resulting three code vector gains are
encoded with 5-, 4-, and 3-bit scalar quantization, respectively.
The codebook search method employs noise shaping derived from the LPC
filters, and the main decision criterion is to minimize the squared
error between the target vector and the code vectors. Each code
vector in this codebook comes from one of CB_EXPAND=2 codebook
sections. The first section is filled with delayed, already encoded
residual vectors. The code vectors of the second codebook section
are constructed by predefined linear combinations of vectors in the
first section of the codebook.
As codebook encoding with squared-error matching is known to produce
a coded signal of less power than does the scalar quantized start
state signal, a gain re-scaling method is implemented by a refined
search for a better set of codebook gains in terms of power matching
after encoding. This is done by searching for a higher value of the
gain factor for the first stage codebook, as the subsequent stage
codebook gains are scaled by the first stage gain.
Typically for packet communications, a jitter buffer placed at the
receiving end decides whether the packet containing an encoded signal
block has been received or lost. This logic is not part of the codec
described here. For each encoded signal block received, the decoder
performs a decoding. For each lost signal block, the decoder
performs a PLC operation.
The decoding for each block starts by decoding and interpolating the
LPC coefficients. Subsequently the start state is decoded.
For codebook-encoded segments, each segment is decoded by
constructing the three code vectors given by the received codebook
indices in the same way that the code vectors were constructed in the
encoder. The three gain factors are also decoded and the resulting
decoded signal is given by the sum of the three codebook vectors
scaled with respective gain.
An enhancement algorithm is applied to the reconstructed excitation
signal. This enhancement augments the periodicity of voiced speech
regions. The enhancement is optimized under the constraint that the
modification signal (defined as the difference between the enhanced
excitation and the excitation signal prior to enhancement) has a
short-time energy that does not exceed a preset fraction of the
short-time energy of the excitation signal prior to enhancement.
A packet loss concealment (PLC) operation is easily embedded in the
decoder. The PLC operation can, e.g., be based on repeating LPC
filters and obtaining the LPC residual signal by using a long-term
prediction estimate from previous residual blocks.
The following block diagram is an overview of all the components of
the iLBC encoding procedure. The description of the blocks contains
references to the section where that particular procedure is further
described.
             +-----------+    +---------+    +---------+
   speech -> | 1. Pre P  | -> | 2. LPC  | -> | 3. Ana  | ->
             +-----------+    +---------+    +---------+
             +---------------+   +--------------+
          -> | 4. Start Sel  | ->| 5. Scalar Qu | ->
             +---------------+   +--------------+
             +--------------+    +---------------+
          -> |6. CB Search  | -> | 7. Packetize  | -> payload
          |  +--------------+ |  +---------------+
          ----<---------<------
       sub-frame 0..2/4 (20 ms/30 ms)
Figure 3.1. Flow chart of the iLBC encoder
1. Pre-process speech with a HP filter, if needed (section 3.1).
2. Compute LPC parameters, quantize, and interpolate (section 3.2).
3. Use analysis filters on speech to compute residual (section 3.3).
4. Select position of 57/58-sample start state (section 3.5).
5. Quantize the 57/58-sample start state with scalar quantization
(section 3.5).
6. Search the codebook for each sub-frame. Start with 23/22 sample
block, then encode sub-blocks forward in time, and then encode
sub-blocks backward in time. For each block, the steps in Figure
3.4 are performed (section 3.6).
7. Packetize the bits into the payload specified in Table 3.2.
The input to the encoder SHOULD be 16-bit uniform PCM sampled at 8
kHz. Also it SHOULD be partitioned into blocks of BLOCKL=160/240
samples. Each block input to the encoder is divided into NSUB=4/6
consecutive sub-blocks of SUBL=40 samples each.
   0        39        79       119       159
   +---------------------------------------+
   |    1    |    2    |    3    |    4    |
   +---------------------------------------+
                  20 ms frame
   0        39        79       119       159       199       239
   +-----------------------------------------------------------+
   |    1    |    2    |    3    |    4    |    5    |    6    |
   +-----------------------------------------------------------+
                            30 ms frame
Figure 3.2. One input block to the encoder for 20 ms (with four sub-
frames) and 30 ms (with six sub-frames).
In some applications, the recorded speech signal contains DC level
and/or 50/60 Hz noise. If these components have not been removed
prior to the encoder call, they should be removed by a high-pass
filter. A reference implementation of this, using a filter with a
cutoff frequency of 90 Hz, can be found in Appendix A.28.
The input to the LPC analysis module is a possibly high-pass filtered
speech buffer, speech_hp, that contains 240/300 (LPC_LOOKBACK +
BLOCKL = 80/60 + 160/240 = 240/300) speech samples, where samples 0
through 79/59 are from the previous block and samples 80/60 through
239/299 are from the current block. No look-ahead into the next
block is used. For the very first block processed, the look-back
samples are assumed to be zeros.
For each input block, the LPC analysis calculates one/two set(s) of
LPC_FILTERORDER=10 LPC filter coefficients using the autocorrelation
method and the Levinson-Durbin recursion. These coefficients are
converted to the Line Spectral Frequency (LSF) representation. In the 20
ms case, the single lsf set represents the spectral characteristics
as measured at the center of the third sub-block. For 30 ms frames,
the first set, lsf1, represents the spectral properties of the input
signal at the center of the second sub-block, and the other set,
lsf2, represents the spectral characteristics as measured at the
center of the fifth sub-block. The details of the computation for 30
ms frames are described in sections 3.2.1 through 3.2.6. Section
3.2.7 explains how the LPC analysis and quantization differ for 20
ms frames.
The first step in the LPC analysis procedure is to calculate
autocorrelation coefficients by using windowed speech samples. This
windowing is the only difference in the LPC analysis procedure for
the two sets of coefficients. For the first set, a 240-sample-long
standard symmetric Hanning window is applied to samples 0 through 239
of the input data. The first window, lpc_winTbl, is defined as
lpc_winTbl[i]= 0.5 * (1.0 - cos((2*PI*(i+1))/(BLOCKL+1)));
i=0,...,119
lpc_winTbl[i] = lpc_winTbl[BLOCKL - i - 1]; i=120,...,239
The windowed speech speech_hp_win1 is then obtained by multiplying
the first 240 samples of the input speech buffer with the window
coefficients:
speech_hp_win1[i] = speech_hp[i] * lpc_winTbl[i];
i=0,...,BLOCKL-1
From these 240 windowed speech samples, 11 (LPC_FILTERORDER + 1)
autocorrelation coefficients, acf1, are calculated:
acf1[lag] += speech_hp_win1[n] * speech_hp_win1[n + lag];
lag=0,...,LPC_FILTERORDER; n=0,...,BLOCKL-lag-1
In order to make the analysis more robust against numerical precision
problems, a spectral smoothing procedure is applied by windowing the
autocorrelation coefficients before the LPC coefficients are
computed. Also, a white noise floor is added to the autocorrelation
function by multiplying coefficient zero by 1.0001 (40 dB below the
energy of the windowed speech signal). These two steps are
implemented by multiplying the autocorrelation coefficients with the
following window:
lpc_lagwinTbl[0] = 1.0001;
lpc_lagwinTbl[i] = exp(-0.5 * ((2 * PI * 60.0 * i) /FS)^2);
i=1,...,LPC_FILTERORDER
where FS=8000 is the sampling frequency.
Then, the windowed acf function acf1_win is obtained by
acf1_win[i] = acf1[i] * lpc_lagwinTbl[i];
i=0,...,LPC_FILTERORDER
The second set of autocorrelation coefficients, acf2_win, is
obtained in a similar manner. The window, lpc_asymwinTbl, is applied
to samples 60 through 299, i.e., the entire current block. The
window consists of two segments, the first (samples 0 to 219) being
half a Hanning window with length 440 and the second a quarter of a
cycle of a cosine wave. By using this asymmetric window, an LPC
analysis centered in the fifth sub-block is obtained without the need
for any look-ahead, which would add delay. The asymmetric window is
defined as
lpc_asymwinTbl[i] = (sin(PI * (i + 1) / 441))^2; i=0,...,219
lpc_asymwinTbl[i] = cos((i - 220) * PI / 40); i=220,...,239
and the windowed speech is computed by
speech_hp_win2[i] = speech_hp[i + LPC_LOOKBACK] *
lpc_asymwinTbl[i]; i=0,...,BLOCKL-1
The windowed autocorrelation coefficients are then obtained in
exactly the same way as for the first analysis instance.
The generation of the windows lpc_winTbl, lpc_asymwinTbl, and
lpc_lagwinTbl is typically done in advance, and the arrays are
stored in ROM rather than repeating the calculation for every block.
From the 2 x 11 smoothed autocorrelation coefficients, acf1_win and
acf2_win, the 2 x 11 LPC coefficients, lp1 and lp2, are calculated
in the same way for both analysis locations by using the well-known
Levinson-Durbin recursion. The first LPC coefficient is always 1.0,
resulting in ten unique coefficients.
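For reference, a textbook form of the Levinson-Durbin recursion using
the document's sign convention, A(z) = 1 + sum a(i)*z^(-i), is sketched
below. The function name levinson_durbin and its error-return
convention are assumptions of this sketch; Appendix A contains the
normative version.

```c
#define LPC_FILTERORDER 10

/* Solves for a[0..order] (a[0] = 1) from autocorrelation r[0..order].
   order must not exceed LPC_FILTERORDER (size of the scratch array).
   Returns the final prediction error energy, or -1.0 if r[0] <= 0. */
static double levinson_durbin(const double *r, double *a, int order)
{
    double tmp[LPC_FILTERORDER + 1];
    double err = r[0];
    if (err <= 0.0)
        return -1.0;
    a[0] = 1.0;
    for (int i = 1; i <= order; i++)
        a[i] = 0.0;
    for (int i = 1; i <= order; i++) {
        double acc = r[i];
        for (int j = 1; j < i; j++)
            acc += a[j] * r[i - j];
        double k = -acc / err;               /* reflection coefficient */
        for (int j = 0; j <= i; j++)
            tmp[j] = a[j];
        for (int j = 1; j < i; j++)
            a[j] = tmp[j] + k * tmp[i - j];
        a[i] = k;
        err *= (1.0 - k * k);
    }
    return err;
}
```

With r(i) = 0.5^i (a first-order process), the recursion returns
a(1) = -0.5 and zeros elsewhere, i.e., A(z) = 1 - 0.5*z^(-1).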
After determining the LPC coefficients, a bandwidth expansion
procedure is applied to smooth the spectral peaks in the
short-term spectrum. The bandwidth expansion is obtained by the
following modification of the LPC coefficients:
lp1_bw[i] = lp1[i] * chirp^i; i=0,...,LPC_FILTERORDER
lp2_bw[i] = lp2[i] * chirp^i; i=0,...,LPC_FILTERORDER
where "chirp" is a real number between 0 and 1. It is RECOMMENDED to
use a value of 0.9.
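The bandwidth expansion amounts to scaling the i-th coefficient by
chirp^i. A minimal sketch with the hypothetical name bw_expand, which
computes the powers incrementally rather than calling pow():

```c
#define LPC_FILTERORDER 10

/* lp_bw[i] = lp[i] * chirp^i, i = 0..LPC_FILTERORDER (0 < chirp < 1). */
static void bw_expand(const double *lp, double *lp_bw, double chirp)
{
    double g = 1.0;                  /* chirp^0 */
    for (int i = 0; i <= LPC_FILTERORDER; i++) {
        lp_bw[i] = lp[i] * g;
        g *= chirp;                  /* advance to chirp^(i+1) */
    }
}
```

It is called once per analysis, e.g., bw_expand(lp1, lp1_bw, 0.9).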
Thus far, two sets of LPC coefficients that represent the short-term
spectral characteristics of the speech signal for two different time
locations within the current block have been determined. These
coefficients SHOULD be quantized and interpolated. Before this is
done, it is advantageous to convert the LPC parameters into another
type of representation called Line Spectral Frequencies (LSF). The
LSF parameters are used because they are better suited for
quantization and interpolation than the regular LPC coefficients.
Many computationally efficient methods for calculating the LSFs from
the LPC coefficients have been proposed in the literature. The
detailed implementation of one applicable method can be found in
Appendix A.26. The two arrays of LSF coefficients obtained, lsf1 and
lsf2, are of dimension 10 (LPC_FILTERORDER).
Because the LPC filters defined by the two sets of LSFs are also
needed in the decoder, the LSF parameters need to be quantized and
transmitted as side information. The total number of bits required
to represent the quantization of the two LSF representations for one
block of speech is 40, with 20 bits used for each of lsf1 and lsf2.
For computational and storage reasons, the LSF vectors are quantized
using three-split vector quantization (VQ). That is, the LSF vectors
are split into three sub-vectors that are each quantized with a
regular VQ. The quantized versions of lsf1 and lsf2, qlsf1 and
qlsf2, are obtained by using the same memoryless split VQ. The
length of each of these two LSF vectors is 10, and they are split
into three sub-vectors containing 3, 3, and 4 values, respectively.
For each of the sub-vectors, a separate codebook of quantized values
has been designed with a standard VQ training method for a large
database containing speech from a large number of speakers recorded
under various conditions. The size of each of the three codebooks
associated with the split definitions above is
int size_lsfCbTbl[LSF_NSPLIT] = {64,128,128};
The actual values of the vector quantization codebook that must be
used can be found in the reference code of Appendix A. Both sets of
LSF coefficients, lsf1 and lsf2, are quantized with a standard
memoryless split vector quantization (VQ) structure using the squared
error criterion in the LSF domain. The split VQ quantization
consists of the following steps:
1) Quantize the first three LSF coefficients (1 - 3) with a VQ
codebook of size 64.
2) Quantize the next three LSF coefficients (4 - 6) with a VQ codebook
of size 128.
3) Quantize the last four LSF coefficients (7 - 10) with a VQ
codebook of size 128.
This procedure, repeated for lsf1 and lsf2, gives six quantization
indices and the quantized sets of LSF coefficients qlsf1 and qlsf2.
Each set of three indices is encoded with 6 + 7 + 7 = 20 bits. The
total number of bits used for LSF quantization in a block is thus 40
bits.
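The split VQ search reduces to three independent full-search
nearest-neighbour problems under the squared error criterion. The
sketch below assumes the codebooks are stored as contiguous rows of
length 3, 3, and 4. The actual codebook values (sizes 64, 128, and 128)
are in Appendix A and are not reproduced here, so the test uses tiny
placeholder codebooks. The names vq_search, split_vq, and lsf_dim are
hypothetical.

```c
#define LPC_FILTERORDER 10
#define LSF_NSPLIT 3

/* Split dimensions 3, 3, 4 as described above. */
static const int lsf_dim[LSF_NSPLIT] = {3, 3, 4};

/* Full-search nearest neighbour under squared error. cb holds cb_size
   vectors of length dim stored row after row. Returns the best index. */
static int vq_search(const double *target, const double *cb, int cb_size,
                     int dim)
{
    int best = 0;
    double best_err = 0.0;
    for (int k = 0; k < cb_size; k++) {
        double err = 0.0;
        for (int j = 0; j < dim; j++) {
            double d = target[j] - cb[k * dim + j];
            err += d * d;
        }
        if (k == 0 || err < best_err) {
            best_err = err;
            best = k;
        }
    }
    return best;
}

/* Quantizes one LPC_FILTERORDER-dimensional LSF vector split by split.
   cbs[s] and sizes[s] describe the codebook of split s. Writes the
   three indices and the quantized vector qlsf. */
static void split_vq(const double *lsf, const double *cbs[],
                     const int *sizes, int *index, double *qlsf)
{
    int pos = 0;
    for (int s = 0; s < LSF_NSPLIT; s++) {
        int dim = lsf_dim[s];
        index[s] = vq_search(lsf + pos, cbs[s], sizes[s], dim);
        for (int j = 0; j < dim; j++)
            qlsf[pos + j] = cbs[s][index[s] * dim + j];
        pos += dim;
    }
}
```

With the iLBC codebook sizes, the indices need log2(64) + log2(128) +
log2(128) = 6 + 7 + 7 = 20 bits per LSF vector.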
The LSF representation of the LPC filter has the convenient property
that the coefficients are ordered by increasing value, i.e., lsf(n-1)
< lsf(n), 0 < n < 10, if the corresponding synthesis filter is
stable. As we are employing a split VQ scheme, it is possible that
at the split boundaries the LSF coefficients are not ordered
correctly and hence that the corresponding LP filter is unstable. To
ensure that the filter used is stable, a stability check is performed
for the quantized LSF vectors. If it turns out that the coefficients
are not ordered appropriately (with a safety margin of 50 Hz to
ensure that formant peaks are not too narrow), they will be moved
apart. The detailed method for this can be found in Appendix A.40.
The same procedure is performed in the decoder. This ensures that
exactly the same LSF representations are used in both encoder and
decoder.
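The exact stabilization method is given in Appendix A.40. Purely to
illustrate the idea, the hypothetical routine below pushes apart any
neighbouring pair that is out of order or closer than a minimum
distance. It assumes LSFs expressed in radians, so that a 50 Hz margin
at FS=8000 corresponds to 2*PI*50/8000, about 0.039. It does not
enforce lower or upper bounds on the vector and is not a substitute for
the reference code.

```c
/* Hypothetical sketch (not the Appendix A.40 routine): repeatedly scans
   lsf[0..n-1] and, wherever two neighbours are out of order or closer
   than min_dist, spreads them symmetrically about their midpoint.
   Returns 1 if anything was moved, 0 otherwise. */
static int lsf_stabilize(double *lsf, int n, double min_dist, int max_iter)
{
    const double tol = 1e-12;   /* ignore rounding-level violations */
    int moved = 0;
    for (int it = 0; it < max_iter; it++) {
        int changed = 0;
        for (int i = 0; i < n - 1; i++) {
            if (lsf[i + 1] - lsf[i] < min_dist - tol) {
                double mid = 0.5 * (lsf[i] + lsf[i + 1]);
                lsf[i] = mid - 0.5 * min_dist;
                lsf[i + 1] = mid + 0.5 * min_dist;
                changed = 1;
            }
        }
        if (!changed)
            break;              /* all pairs satisfy the margin */
        moved = 1;
    }
    return moved;
}
```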
From the two sets of LSF coefficients that are computed for each
block of speech, different LSFs are obtained for each sub-block by
means of interpolation. This procedure is performed for the original
LSFs (lsf1 and lsf2), as well as the quantized versions qlsf1 and
qlsf2, as both versions are used in the encoder. Here follows a
brief summary of the interpolation scheme; the details are found in
the C code of Appendix A. In the first sub-block, the average of the
second LSF vector from the previous block and the first LSF vector in
the current block is used. For sub-blocks two through five, the LSFs
used are obtained by linear interpolation from lsf1 (and qlsf1) to
lsf2 (and qlsf2), with lsf1 used in sub-block two and lsf2 in sub-
block five. In the last sub-block, lsf2 is used. For the very first
block it is assumed that the last LSF vector of the previous block is
equal to a predefined vector, lsfmeanTbl, obtained by calculating the
mean LSF vector of the LSF design database.
lsfmeanTbl[LPC_FILTERORDER] = {0.281738, 0.445801, 0.663330,
0.962524, 1.251831, 1.533081, 1.850586, 2.137817,
2.481445, 2.777344}
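A sketch of the 30 ms interpolation summary above, assuming the values
for sub-blocks two through five are spaced evenly (weights 0, 1/3, 2/3,
and 1 on lsf2). The function names lsf_interp and lsf_subblocks_30ms
are hypothetical. The same routine would be applied to qlsf1 and qlsf2,
with lsfmeanTbl passed as the previous vector for the very first block.

```c
#define LPC_FILTERORDER 10
#define NSUB_30MS 6

/* Linear interpolation in the LSF domain: out = (1 - w) * a + w * b. */
static void lsf_interp(double *out, const double *a, const double *b,
                       double w)
{
    for (int i = 0; i < LPC_FILTERORDER; i++)
        out[i] = (1.0 - w) * a[i] + w * b[i];
}

/* Sub-block 1: average of prev_lsf2 and lsf1; sub-blocks 2..5: lsf1 to
   lsf2 in equal steps; sub-block 6: lsf2. */
static void lsf_subblocks_30ms(const double *prev_lsf2, const double *lsf1,
                               const double *lsf2,
                               double out[NSUB_30MS][LPC_FILTERORDER])
{
    lsf_interp(out[0], prev_lsf2, lsf1, 0.5);
    for (int k = 1; k <= 4; k++)
        lsf_interp(out[k], lsf1, lsf2, (k - 1) / 3.0);
    lsf_interp(out[5], lsf1, lsf2, 1.0);
}
```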
The interpolation method is standard linear interpolation in the LSF
domain. The interpolated LSF values are converted to LPC
coefficients for each sub-block. The unquantized and quantized LPC
coefficients form two sets of filters respectively. The unquantized
analysis filter for sub-block k is defined as follows
   $$A_k(z) = 1 + \sum_{i=1}^{\mathrm{LPC\_FILTERORDER}} a_k(i)\, z^{-i}$$
The quantized analysis filter for sub-block k is defined as follows:
   $$\tilde{A}_k(z) = 1 + \sum_{i=1}^{\mathrm{LPC\_FILTERORDER}} \tilde{a}_k(i)\, z^{-i}$$
A reference implementation of the LSF encoding is given in Appendix
A.38. A reference implementation of the corresponding decoding can
be found in Appendix A.36.
As previously stated, the codec only calculates one set of LPC
parameters for the 20 ms frame size as opposed to two sets for 30 ms
frames. A single set of autocorrelation coefficients is calculated
on the LPC_LOOKBACK + BLOCKL = 80 + 160 = 240 samples. These samples
are windowed with the asymmetric window lpc_asymwinTbl, centered over
the third sub-frame, to form speech_hp_win. Autocorrelation
coefficients, acf, are calculated on the 240 samples in speech_hp_win
and then windowed exactly as in section 3.2.1 (resulting in
acf_win).
This single set of windowed autocorrelation coefficients is used to
calculate LPC coefficients, LSF coefficients, and quantized LSF
coefficients in exactly the same manner as in sections 3.2.3 through
3.2.4. As for the 30 ms frame size, the ten LSF coefficients are
divided into three sub-vectors of size 3, 3, and 4 and quantized by
using the same scheme and codebook as in section 3.2.4 to finally get
3 quantization indices. The quantized LSF coefficients are
stabilized with the algorithm described in section 3.2.5.
From the set of LSF coefficients computed for this block and those
from the previous block, different LSFs are obtained for each sub-
block by means of interpolation. The interpolation is done linearly
in the LSF domain over the four sub-blocks, so that the n-th sub-block
(n = 1, ..., 4) uses the weight (4-n)/4 for the LSF from the previous
block and the weight n/4 for the LSF from the current block. For the
very first block the mean LSF, lsfmeanTbl, is used as the LSF from the
previous block. As in section 3.2.6, both unquantized, A(z), and
quantized, A~(z), analysis filters are calculated for each of the
four sub-blocks.
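The 20 ms weights can be sketched in the same way. The name
lsf_subblocks_20ms is hypothetical; for the very first block, prev_lsf
would point at lsfmeanTbl.

```c
#define LPC_FILTERORDER 10
#define NSUB_20MS 4

/* Sub-block n (n = 1..4) gets weight (4-n)/4 on the previous block's LSF
   and n/4 on the current block's LSF; out[n-1] receives the result. */
static void lsf_subblocks_20ms(const double *prev_lsf, const double *lsf,
                               double out[NSUB_20MS][LPC_FILTERORDER])
{
    for (int n = 1; n <= NSUB_20MS; n++) {
        double w = n / 4.0;
        for (int i = 0; i < LPC_FILTERORDER; i++)
            out[n - 1][i] = (1.0 - w) * prev_lsf[i] + w * lsf[i];
    }
}
```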
The block of speech samples is filtered by the quantized and
interpolated LPC analysis filters to yield the residual signal. In
particular, the corresponding LPC analysis filter for each 40 sample
sub-block is used to filter the speech samples for the same sub-
block. The filter memory at the end of each sub-block is carried
over to the LPC filter of the next sub-block. The signal at the
output of each LP analysis filter constitutes the residual signal for
the corresponding sub-block.
A reference implementation of the LPC analysis filters is given in
Appendix A.10.
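A minimal sketch of the sub-block analysis filtering with carried-over
memory follows. It assumes a[] (with a[0] = 1) holds the quantized and
interpolated coefficients for the current sub-block and that mem[] is
zero before the first sub-block ever processed; the name
lpc_analysis_subblock is hypothetical.

```c
#define LPC_FILTERORDER 10
#define SUBL 40

/* FIR analysis filter A(z) = 1 + sum_{i=1..ORDER} a[i] z^-i for one
   sub-block. mem[] holds the last LPC_FILTERORDER input samples seen so
   far (mem[LPC_FILTERORDER - 1] is the most recent) and is updated on
   return, so the next sub-block's filter continues from that history. */
static void lpc_analysis_subblock(const double *a, const double *in,
                                  double *out, int len, double *mem)
{
    double next[LPC_FILTERORDER];
    for (int n = 0; n < len; n++) {
        double acc = in[n];
        for (int i = 1; i <= LPC_FILTERORDER; i++) {
            int k = n - i;                    /* index of past input */
            acc += a[i] * (k >= 0 ? in[k] : mem[LPC_FILTERORDER + k]);
        }
        out[n] = acc;
    }
    /* carry the most recent LPC_FILTERORDER inputs over to the next call */
    for (int j = 0; j < LPC_FILTERORDER; j++) {
        int k = len - LPC_FILTERORDER + j;
        next[j] = (k >= 0) ? in[k] : mem[LPC_FILTERORDER + k];
    }
    for (int j = 0; j < LPC_FILTERORDER; j++)
        mem[j] = next[j];
}
```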
In principle any good design of a perceptual weighting filter can be
applied in the encoder without compromising this codec definition.
However, it is RECOMMENDED to use the perceptual weighting filter Wk
for sub-block k specified below:
Wk(z)=1/Ak(z/LPC_CHIRP_WEIGHTDENUM), where
LPC_CHIRP_WEIGHTDENUM = 0.4222
This is a simple design with low complexity that is applied in the
LPC residual domain. Here Ak(z) is the filter obtained for sub-block
k from unquantized but interpolated LSF coefficients.
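Because Ak(z/LPC_CHIRP_WEIGHTDENUM) = 1 + sum ak(i) *
LPC_CHIRP_WEIGHTDENUM^i * z^(-i), the weighting filter is an all-pole
filter with bandwidth-expanded coefficients. A sketch with the
hypothetical helper names weighting_denominator and weight_filter,
assuming a[] holds the unquantized, interpolated coefficients with
a[0] = 1:

```c
#define LPC_FILTERORDER 10
#define LPC_CHIRP_WEIGHTDENUM 0.4222

/* aw[i] = a[i] * LPC_CHIRP_WEIGHTDENUM^i: denominator of Wk(z). */
static void weighting_denominator(const double *a, double *aw)
{
    double g = 1.0;
    for (int i = 0; i <= LPC_FILTERORDER; i++) {
        aw[i] = a[i] * g;
        g *= LPC_CHIRP_WEIGHTDENUM;
    }
}

/* All-pole filter y[n] = x[n] - sum_{i=1..ORDER} aw[i] y[n-i]. mem[]
   holds the last LPC_FILTERORDER outputs (mem[ORDER - 1] most recent)
   and is updated on return. */
static void weight_filter(const double *aw, const double *x, double *y,
                          int len, double *mem)
{
    double next[LPC_FILTERORDER];
    for (int n = 0; n < len; n++) {
        double acc = x[n];
        for (int i = 1; i <= LPC_FILTERORDER; i++) {
            int k = n - i;                    /* index of past output */
            acc -= aw[i] * (k >= 0 ? y[k] : mem[LPC_FILTERORDER + k]);
        }
        y[n] = acc;
    }
    for (int j = 0; j < LPC_FILTERORDER; j++) {
        int k = len - LPC_FILTERORDER + j;
        next[j] = (k >= 0) ? y[k] : mem[LPC_FILTERORDER + k];
    }
    for (int j = 0; j < LPC_FILTERORDER; j++)
        mem[j] = next[j];
}
```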
The start state is quantized by using a common 6-bit scalar quantizer
for the block and a 3-bit scalar quantizer operating on scaled
samples in the weighted speech domain. In the following we describe
the state encoding in greater detail.
The two sub-blocks containing the start state are determined by
finding the two consecutive sub-blocks in the block having the
highest power. Advantageously, down-weighting is used in the
beginning and end of the sub-frames, i.e., the following measure is
computed (NSUB=4/6 for 20/30 ms frame size):
nsub=1,...,NSUB-1
ssqn[nsub] = 0.0;
for (i=(nsub-1)*SUBL; i<(nsub-1)*SUBL+5; i++)
ssqn[nsub] += sampEn_win[i-(nsub-1)*SUBL]*
residual[i]*residual[i];
for (i=(nsub-1)*SUBL+5; i<(nsub+1)*SUBL-5; i++)
ssqn[nsub] += residual[i]*residual[i];
for (i=(nsub+1)*SUBL-5; i<(nsub+1)*SUBL; i++)
ssqn[nsub] += sampEn_win[(nsub+1)*SUBL-i-1]*
residual[i]*residual[i];
where sampEn_win[5]={1/6, 2/6, 3/6, 4/6, 5/6} MAY be used. The
sub-frame number corresponding to the maximum value of
ssqEn_win[nsub-1]*ssqn[nsub] is selected as the start state
indicator. A weighting of ssqEn_win[]={0.8,0.9,1.0,0.9,0.8} for 30
ms frames and ssqEn_win[]={0.9,1.0,0.9} for 20 ms frames MAY
advantageously be used to bias the start state towards the middle of
the frame.
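The measure and the selection above can be combined in one routine. In
this sketch, select_start is a hypothetical name, sampEn_win and
ssqEn_win are the MAY-level values given above, and the return value
follows the start encoding defined in the text (start state in
sub-frames start-1 and start). The fractions in sampEn_win are written
as floating-point constants, since 1/6 in C is integer division.

```c
#define SUBL 40

/* Returns start (1..nsub_total-1). residual holds nsub_total*SUBL
   samples; ssqEn_win holds nsub_total-1 bias weights. */
static int select_start(const double *residual, int nsub_total,
                        const double *ssqEn_win)
{
    static const double sampEn_win[5] = {1.0 / 6, 2.0 / 6, 3.0 / 6,
                                         4.0 / 6, 5.0 / 6};
    int best = 1;
    double best_val = -1.0;
    for (int nsub = 1; nsub <= nsub_total - 1; nsub++) {
        int lo = (nsub - 1) * SUBL, hi = (nsub + 1) * SUBL;
        double ssqn = 0.0;
        for (int i = lo; i < lo + 5; i++)          /* ramp up */
            ssqn += sampEn_win[i - lo] * residual[i] * residual[i];
        for (int i = lo + 5; i < hi - 5; i++)      /* full weight */
            ssqn += residual[i] * residual[i];
        for (int i = hi - 5; i < hi; i++)          /* ramp down */
            ssqn += sampEn_win[hi - i - 1] * residual[i] * residual[i];
        double val = ssqEn_win[nsub - 1] * ssqn;   /* bias to the middle */
        if (val > best_val) {
            best_val = val;
            best = nsub;
        }
    }
    return best;
}
```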
For 20 ms frames there are three possible positions for the two-sub-
block length maximum power segment; the start state position is
encoded with 2 bits. The start state position, start, MUST be
encoded as
start=1: start state in sub-frame 0 and 1
start=2: start state in sub-frame 1 and 2
start=3: start state in sub-frame 2 and 3
For 30 ms frames there are five possible positions of the two-sub-
block length maximum power segment; the start state position is
encoded with 3 bits. The start state position, start, MUST be
encoded as
start=1: start state in sub-frame 0 and 1
start=2: start state in sub-frame 1 and 2
start=3: start state in sub-frame 2 and 3
start=4: start state in sub-frame 3 and 4
start=5: start state in sub-frame 4 and 5
Hence, in both cases, index 0 is not used. In order to shorten the
start state for bit rate efficiency, the start state is brought down
to STATE_SHORT_LEN=57 samples for 20 ms frames and STATE_SHORT_LEN=58
samples for 30 ms frames. The power of the first 23/22 and last
23/22 samples of the two sub-frame blocks identified above is
computed as the sum of the squared signal sample values, and the
23/22-sample segment with the lower power is excluded from the start
state. One bit is transmitted to indicate which of the two possible
57/58 sample segments is used. The start state position within the
two sub-frames determined above, state_first, MUST be encoded as
state_first=1: start state is first STATE_SHORT_LEN samples
state_first=0: start state is last STATE_SHORT_LEN samples
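The choice of state_first can be sketched as follows. The name
select_state_first is hypothetical, and the tie-breaking rule is an
assumption because the text does not specify one.

```c
#define SUBL 40

/* two_sub[] holds the 2*SUBL residual samples of the two selected
   sub-blocks. The first and last (2*SUBL - state_short_len) samples
   (23 or 22) are compared by energy; the lower-energy end is excluded.
   Returns state_first: 1 if the start state is the first
   state_short_len samples, 0 if it is the last. Ties keep the first
   segment (a choice made only in this sketch). */
static int select_state_first(const double *two_sub, int state_short_len)
{
    int excl = 2 * SUBL - state_short_len;
    double en_first = 0.0, en_last = 0.0;
    for (int i = 0; i < excl; i++) {
        double f = two_sub[i];
        double l = two_sub[2 * SUBL - excl + i];
        en_first += f * f;
        en_last += l * l;
    }
    /* lower-energy first part is dropped, so the state is the last part */
    return (en_first < en_last) ? 0 : 1;
}
```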
The block of residual samples in the start state is first filtered by
an all-pass filter with the quantized LPC coefficients as denominator
and reversed quantized LPC coefficients as numerator. The purpose of
this phase-dispersion filter is to get a more even distribution of
the sample values in the residual signal. The filtering is performed
by circular convolution, where the initial filter memory is set to
zero.
res(0..(STATE_SHORT_LEN-1)) = uncoded start state residual
res((STATE_SHORT_LEN)..(2*STATE_SHORT_LEN-1)) = 0
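Pk(z) = A~rk(z)/A~k(z), where A~rk(z) is the polynomial with the
coefficients of A~k(z) in reversed order:
   $$\tilde{A}^{r}_{k}(z) = z^{-\mathrm{LPC\_FILTERORDER}} + \sum_{i=0}^{\mathrm{LPC\_FILTERORDER}-1} \tilde{a}_k(i+1)\, z^{\,i-(\mathrm{LPC\_FILTERORDER}-1)}$$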
and A~k(z) is taken from the block where the start state begins
res -> Pk(z) -> filtered
ccres(k) = filtered(k) + filtered(k+STATE_SHORT_LEN),
k=0..(STATE_SHORT_LEN-1)
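The circular convolution above can be sketched as filtering the
zero-padded 2*STATE_SHORT_LEN-sample vector through Pk(z) with zero
initial state and folding the second half onto the first. The
numerator coefficients are the quantized LPC coefficients in reverse
order, i.e., the coefficient of z^(-m) is a~k(LPC_FILTERORDER - m),
with a~k(0) = 1. The names allpass_circular and MAX_STATE_LEN are
hypothetical.

```c
#define LPC_FILTERORDER 10
#define MAX_STATE_LEN 58

/* res[0..len-1] is the uncoded start state (len <= MAX_STATE_LEN);
   a[0..ORDER] are the quantized LPC coefficients with a[0] = 1.
   Writes ccres[k] = filtered[k] + filtered[k + len]. */
static void allpass_circular(const double *a, const double *res, int len,
                             double *ccres)
{
    double x[2 * MAX_STATE_LEN] = {0};        /* second half stays zero */
    double y[2 * MAX_STATE_LEN];
    for (int n = 0; n < len; n++)
        x[n] = res[n];
    for (int n = 0; n < 2 * len; n++) {
        double acc = 0.0;
        for (int m = 0; m <= LPC_FILTERORDER && m <= n; m++)
            acc += a[LPC_FILTERORDER - m] * x[n - m];  /* A~rk(z) */
        for (int i = 1; i <= LPC_FILTERORDER && i <= n; i++)
            acc -= a[i] * y[n - i];                    /* 1/A~k(z) */
        y[n] = acc;
    }
    for (int k = 0; k < len; k++)
        ccres[k] = y[k] + y[k + len];         /* fold the tail back */
}
```

With A~k(z) = 1 the filter reduces to a pure delay of LPC_FILTERORDER
samples, and the folding turns it into a circular shift of res.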
The all-pass filtered block is searched for its largest magnitude
sample. The base-10 logarithm of this magnitude is quantized with a 6-bit
quantizer, state_frgqTbl, by finding the nearest representation.
This results in an index, idxForMax, corresponding to a quantized
value, qmax. The all-pass filtered residual samples in the block are
then multiplied with a scaling factor scal=4.5/(10^qmax) to yield
normalized samples.
state_frgqTbl[64] = {1.000085, 1.071695, 1.140395, 1.206868,
1.277188, 1.351503, 1.429380, 1.500727, 1.569049,
1.639599, 1.707071, 1.781531, 1.840799, 1.901550,
1.956695, 2.006750, 2.055474, 2.102787, 2.142819,
2.183592, 2.217962, 2.257177, 2.295739, 2.332967,
2.369248, 2.402792, 2.435080, 2.468598, 2.503394,
2.539284, 2.572944, 2.605036, 2.636331, 2.668939,
2.698780, 2.729101, 2.759786, 2.789834, 2.818679,
2.848074, 2.877470, 2.906899, 2.936655, 2.967804,
3.000115, 3.033367, 3.066355, 3.104231, 3.141499,
3.183012, 3.222952, 3.265433, 3.308441, 3.350823,
3.395275, 3.442793, 3.490801, 3.542514, 3.604064,
3.666050, 3.740994, 3.830749, 3.938770, 4.101764}
The normalized samples are quantized in the perceptually weighted
speech domain by a sample-by-sample scalar DPCM quantization as
depicted in Figure 3.3. Each sample in the block is filtered by a
weighting filter Wk(z), specified in section 3.4, to form a weighted
speech sample x[n]. The target sample d[n] is formed by subtracting
a predicted sample y[n], where the prediction filter is given by
Pk(z) = 1 - 1 / Wk(z).
            +-------+  x[n] +    d[n] +-----------+ u[n]
residual -->| Wk(z) |-------->(+)---->| Quantizer |------> quantized
            +-------+       - /|\     +-----------+   |   residual
                               |                     \|/
                          y[n] +--------------------->(+)
                               |                      |
                               |        +------+      |
                               +--------| Pk(z)|<-----+
                                        +------+
Figure 3.3. Quantization of start state samples by DPCM in weighted
speech domain.
The coded state sample u[n] is obtained by quantizing d[n] with a 3-
bit quantizer with quantization table state_sq3Tbl.
state_sq3Tbl[8] = {-3.719849, -2.177490, -1.130005, -0.309692,
0.444214, 1.329712, 2.436279, 3.983887}
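The DPCM loop of Figure 3.3 can be sketched as follows. Since Wk(z) =
1/Ak(z/LPC_CHIRP_WEIGHTDENUM), the predictor Pk(z) = 1 - 1/Wk(z) equals
-sum aw(i)*z^(-i), where aw(i) = ak(i)*LPC_CHIRP_WEIGHTDENUM^i, so y[n]
uses only past reconstructed weighted samples u + y. The sketch assumes
one set of coefficients aw[] for the whole start state, zero initial
filter states, and at most MAX_DPCM_LEN samples; dpcm_encode, sq3, and
MAX_DPCM_LEN are hypothetical names.

```c
#include <math.h>

#define LPC_FILTERORDER 10
#define MAX_DPCM_LEN 64   /* hypothetical bound; STATE_SHORT_LEN <= 58 */

static const double state_sq3Tbl[8] = {
    -3.719849, -2.177490, -1.130005, -0.309692,
     0.444214,  1.329712,  2.436279,  3.983887};

/* Nearest-neighbour 3-bit scalar quantizer; returns the table index. */
static int sq3(double d)
{
    int best = 0;
    for (int k = 1; k < 8; k++)
        if (fabs(d - state_sq3Tbl[k]) < fabs(d - state_sq3Tbl[best]))
            best = k;
    return best;
}

/* in[] are the normalized start-state samples, aw[1..ORDER] the Wk(z)
   denominator coefficients; idx[] receives the 3-bit indices. Samples
   beyond MAX_DPCM_LEN are not processed. */
static void dpcm_encode(const double *aw, const double *in, int len,
                        int *idx)
{
    double x[MAX_DPCM_LEN] = {0}, xq[MAX_DPCM_LEN] = {0};
    for (int n = 0; n < len && n < MAX_DPCM_LEN; n++) {
        double xs = in[n], ys = 0.0;
        for (int i = 1; i <= LPC_FILTERORDER && i <= n; i++) {
            xs -= aw[i] * x[n - i];      /* weighting filter Wk(z) */
            ys -= aw[i] * xq[n - i];     /* predictor Pk(z) */
        }
        x[n] = xs;
        idx[n] = sq3(xs - ys);           /* quantize d[n] = x[n] - y[n] */
        xq[n] = state_sq3Tbl[idx[n]] + ys;
    }
}
```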
The quantized samples are transformed back to the residual domain by
1) scaling with 1/scal; 2) time-reversing the scaled samples; 3)
filtering the time-reversed samples by the same all-pass filter, as
in section 3.5.2, by using circular convolution; and 4) time-
reversing the filtered samples. (More detail is in section 4.2.)
A reference implementation of the start-state encoding can be found
in Appendix A.46.
A dynamic codebook is used to encode 1) the 23/22 remaining samples
in the two sub-blocks containing the start state; 2) the sub-blocks
after the start state in time; and 3) the sub-blocks before the start
state in time. Thus, the encoding target can be either the 23/22
samples remaining of the 2 sub-blocks containing the start state, or
a 40-sample sub-block. This target can consist of samples that are
indexed forward in time or backward in time, depending on the
location of the start state. The length of the target is denoted by
lTarget.
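The order of the targets can be sketched as follows, assuming that the
23/22 remaining samples are coded forward in time when they follow the
start state (state_first=1) and backward in time when they precede it.
The Target struct and the encoding_order function are hypothetical;
start and state_first use the encodings given earlier.

```c
#define SUBL 40

/* Hypothetical description of one codebook-coding target. */
typedef struct {
    int first;    /* lowest sample index of the target in the block */
    int len;      /* lTarget: 23/22 or SUBL samples */
    int forward;  /* 1: indexed forward in time, 0: backward */
} Target;

/* Fills t[] in encoding order and returns the number of targets. */
static int encoding_order(int nsub, int start, int state_first,
                          int state_short_len, Target *t)
{
    int n = 0;
    int rest = 2 * SUBL - state_short_len;  /* 23/22 samples */
    int s0 = (start - 1) * SUBL;            /* start of the two sub-blocks */
    if (state_first)   /* remainder follows the start state */
        t[n++] = (Target){s0 + state_short_len, rest, 1};
    else               /* remainder precedes the start state */
        t[n++] = (Target){s0, rest, 0};
    for (int k = start + 1; k < nsub; k++)  /* sub-blocks after, forward */
        t[n++] = (Target){k * SUBL, SUBL, 1};
    for (int k = start - 2; k >= 0; k--)    /* sub-blocks before, backward */
        t[n++] = (Target){k * SUBL, SUBL, 0};
    return n;
}
```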
The coding is based on an adaptive codebook that is built from a
codebook memory that contains decoded LPC excitation samples from the
already encoded part of the block. These samples are indexed in the
same time direction as is the target vector and end at the sample
instant prior to the first sample instant represented in the target
vector. The codebook memory has length lMem, which is equal to
CB_MEML=147 for the two/four 40-sample sub-blocks and 85 for the
23/22-sample sub-block.
The following figure shows an overview of the encoding procedure.
   +------------+    +---------------+    +-------------+
-> | 1. Decode  | -> | 2. Mem setup  | -> | 3. Perc. W. | ->
   +------------+    +---------------+    +-------------+
   +------------+    +-----------------+
-> | 4. Search  | -> | 5. Upd. Target  | ------------------>
 | +------------+    +-----------------+ |
 ----<-------------<-----------<----------
                     stage=0..2
   +----------------+
-> | 6. Recalc G[0] | ---------------> gains and CB indices
   +----------------+
Figure 3.4. Flow chart of the codebook search in the iLBC encoder.
1. Decode the part of the residual that has been encoded so far,
using the codebook without perceptual weighting.
2. Set up the memory by taking data from the decoded residual. This
memory is used to construct codebooks. For blocks preceding the
start state, both the decoded residual and the target are time
reversed (section 3.6.1).
3. Filter the memory + target with the perceptual weighting filter
(section 3.6.2).
4. Search for the best match between the target and the codebook
vector. Compute the optimal gain for this match and quantize that
gain (section 3.6.4).
5. Update the perceptually weighted target by subtracting the
contribution of the selected codebook vector, taken from the
perceptually weighted memory (quantized gain times selected
vector). Repeat steps 4 and 5 for the two additional stages.
6. Calculate the energy loss due to encoding of the residual. If
needed, compensate for this loss by an upscaling and
requantization of the gain for the first stage (section 3.7).
The following sections provide an in-depth description of the
different blocks of Figure 3.4.
The codebook memory is based on the already encoded sub-blocks, so
the available data for encoding increases for each new sub-block that
has been encoded. Until enough sub-blocks have been encoded to fill
the codebook memory with data, it is padded with zeros. The
following figure shows an example of the order in which the
sub-blocks are encoded for the 30 ms frame size if the start state is
located in the last 58 samples of sub-blocks 2 and 3.
+-----------------------------------------------------+
| 5 | 1 |///|////////| 2 | 3 | 4 |
+-----------------------------------------------------+
Figure 3.5. The order from 1 to 5 in which the sub-blocks are
encoded. The slashed area is the start state.
The first target sub-block to be encoded is number 1, and the
corresponding codebook memory is shown in the following figure. As
the target vector comes before the start state in time, the codebook
memory and target vector are time reversed; thus, after the block has
been time reversed the search algorithm can be reused. As only the
start state has been encoded so far, the last samples of the codebook
memory are padded with zeros.
+-------------------------
|zeros|\\\\\\\\|\\\\| 1 |
+-------------------------
Figure 3.6. The codebook memory, length lMem=85 samples, and the
target vector 1, length 22 samples.
The next step is to encode sub-block 2 by using the memory, which has
grown now that sub-block 1 has been encoded. The following figure
shows the codebook memory for encoding of sub-block 2.
+-----------------------------------
| zeros | 1 |///|////////| 2 |
+-----------------------------------
Figure 3.7. The codebook memory, length lMem=147 samples, and the
target vector 2, length 40 samples.
The next step is to encode sub-block 3 by using the memory, which has
grown again now that sub-blocks 1 and 2 have been encoded, but the
memory still has to be padded with a few zeros. The following figure
shows the codebook memory for encoding of sub-block 3.
+------------------------------------------
|zeros| 1 |///|////////| 2 | 3 |
+------------------------------------------
Figure 3.8. The codebook memory, length lMem=147 samples, and the
target vector 3, length 40 samples.
The next step is to encode sub-block 4 by using the memory, which has
grown again now that sub-blocks 1, 2, and 3 have been encoded. This
time, the memory does not have to be padded with zeros. The following
figure shows the codebook memory for encoding of sub-block 4.
+------------------------------------------
|1|///|////////| 2 | 3 | 4 |
+------------------------------------------
Figure 3.9. The codebook memory, length lMem=147 samples, and the
target vector 4, length 40 samples.
The final target sub-block to be encoded is number 5, and the
following figure shows the corresponding codebook memory. As the
target vector comes before the start state in time, the codebook
memory and target vector are time reversed.
+-------------------------------------------
| 3 | 2 |\\\\\\\\|\\\\| 1 | 5 |
+-------------------------------------------
Figure 3.10. The codebook memory, length lMem=147 samples, and the
target vector 5, length 40 samples.
For the case of 20 ms frames, the encoding procedure looks almost
exactly the same. The only difference is that the size of the start
state is 57 samples and that there are only three sub-blocks to be
encoded. The encoding order is the same as above, starting with the
23-sample target and then encoding the two remaining 40-sample sub-
blocks, first going forward in time and then going backward in time
relative to the start state.
To provide a perceptual weighting of the coding error, a
concatenation of the codebook memory and the target to be coded is
all-pole filtered with the perceptual weighting filter specified in
section 3.4. The filter state of the weighting filter is set to
zero.
in(0..(lMem-1)) = unweighted codebook memory
in(lMem..(lMem+lTarget-1)) = unweighted target signal
in -> Wk(z) -> filtered,
where Wk(z) is taken from the sub-block of the target
weighted codebook memory = filtered(0..(lMem-1))
weighted target signal = filtered(lMem..(lMem+lTarget-1))
The codebook search is done with the weighted codebook memory and the
weighted target, whereas the decoding and the codebook memory update
use the unweighted codebook memory.
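The weighting of the concatenated memory and target can be sketched in C as follows. This is a minimal sketch assuming that the caller supplies the denominator coefficients a[0..order] of Wk(z) from section 3.4 (with a[0]=1; those coefficients are not reproduced here) and that lMem+lTarget fits in the hypothetical fixed-size buffer MAX_WEIGHT_LEN. The function names are illustrative only.

```c
#include <string.h>

#define MAX_WEIGHT_LEN 256  /* hypothetical bound on lMem + lTarget */

/* All-pole filter 1/A(z), A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order,
   applied with a zero initial filter state. a[0] is assumed to be 1. */
void allpole_filter_zero_state(const float *in, int len,
                               const float *a, int order, float *out)
{
    for (int n = 0; n < len; n++) {
        float acc = in[n];
        for (int i = 1; i <= order && i <= n; i++)
            acc -= a[i] * out[n - i];
        out[n] = acc;
    }
}

/* Filter [codebook memory | target] as one signal, then split the result
   into the weighted memory and the weighted target. Returns -1 if the
   concatenation does not fit in the local buffers. */
int weight_memory_and_target(const float *mem, int lMem,
                             const float *target, int lTarget,
                             const float *a, int order,
                             float *wMem, float *wTarget)
{
    float in[MAX_WEIGHT_LEN], filtered[MAX_WEIGHT_LEN];
    if (lMem + lTarget > MAX_WEIGHT_LEN)
        return -1;
    memcpy(in, mem, (size_t)lMem * sizeof(float));
    memcpy(in + lMem, target, (size_t)lTarget * sizeof(float));
    allpole_filter_zero_state(in, lMem + lTarget, a, order, filtered);
    memcpy(wMem, filtered, (size_t)lMem * sizeof(float));
    memcpy(wTarget, filtered + lMem, (size_t)lTarget * sizeof(float));
    return 0;
}
```

Filtering memory and target as one signal means that the filter state carried into the target comes from the memory, which matches the in/filtered split given above.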
The codebook for the search is created from the perceptually weighted
codebook memory. It consists of two sections, where the first is
referred to as the base codebook and the second as the expanded
codebook, as it is created by linear combinations of the first. Each
of these two sections also has a subsection referred to as the
augmented codebook. The augmented codebook is only created and used
for the coding of the 40-sample sub-blocks and not for the
23/22-sample sub-block case. The codebook sizes used for the different
sub-blocks and stages are summarized in the table below.
                             Stage
                      1                 2 & 3
        ------------------------------------------------
             22      128  (64+0)*2      128  (64+0)*2
Sub-    1:st 40      256  (108+20)*2    128  (44+20)*2
Blocks  2:nd 40      256  (108+20)*2    256  (108+20)*2
        3:rd 40      256  (108+20)*2    256  (108+20)*2
        4:th 40      256  (108+20)*2    256  (108+20)*2
Table 3.1. Codebook sizes for the 30 ms mode.
Table 3.1 shows the codebook size for the different sub-blocks and
stages for 30 ms frames. Inside the parentheses it shows how the
number of codebook vectors is distributed, within the two sections,
between the base/expanded codebook and the augmented base/expanded
codebook. It should be interpreted in the following way:
(base/expanded cb + augmented base/expanded cb). The total number of
codebook vectors for a specific sub-block and stage is given by the
following formula:
Tot. cb vectors = base cb + aug. base cb + exp. cb + aug. exp. cb
The corresponding values of Table 3.1 for 20 ms frames are only
slightly modified: the short sub-block is 23 instead of 22 samples,
and the 3:rd and 4:th sub-blocks are not present.
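The totals in Table 3.1 follow from the section sizes described in the following paragraphs: lMem-lTarget+1 sliding-window vectors per section, plus 20 augmented vectors when the target is a full 40-sample sub-block. A small C sketch can check them. The function names are hypothetical, and the reduced (44+20)*2 entry for stages 2 and 3 of the first 40-sample sub-block is not modeled, because this section does not explain how it arises.

```c
#define SUBL 40  /* length of a full sub-block */

/* Vectors in one codebook section (base or expanded): lMem-lTarget+1
   sliding-window vectors, plus SUBL/2 = 20 augmented vectors when a full
   40-sample sub-block is the target. */
int cb_section_size(int lMem, int lTarget)
{
    int n = lMem - lTarget + 1;
    if (lTarget == SUBL)
        n += SUBL / 2;
    return n;
}

/* Total = base + augmented base + expanded + augmented expanded. */
int cb_total_size(int lMem, int lTarget)
{
    return 2 * cb_section_size(lMem, lTarget);
}
```

With lMem=85 and lTarget=22 this gives (64+0)*2 = 128, and with lMem=147 and lTarget=40 it gives (108+20)*2 = 256, matching the table.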
The base codebook is given by the perceptually weighted codebook
memory described in section 3.6.2. The different codebook vectors are
obtained by sliding a window of length lTarget (23/22 or 40) over the
lMem-long perceptually weighted codebook memory. The indices are
ordered so that the codebook vector containing samples (lMem-lTarget-n)
to (lMem-n-1) of the codebook memory vector has index n, where
n=0..lMem-lTarget. Thus the total number of base codebook vectors is
lMem-lTarget+1, and the indices are ordered from sample delay lTarget
(23/22 or 40) to lMem (85 or 147).
The base codebook is expanded by a factor of 2, creating an
additional section in the codebook. This new section is obtained by
filtering the base codebook, base_cb, with a FIR filter with filter
length CB_FILTERLEN=8. The construction of the expanded codebook
compensates for the delay of four samples introduced by the FIR
filter.
cbfiltersTbl[CB_FILTERLEN]={-0.033691, 0.083740, -0.144043,
0.713379, 0.806152, -0.184326,
0.108887, -0.034180};
$$\mathrm{exp\_cb}(k) = \sum_{i=0}^{\mathrm{CB\_FILTERLEN}-1} \mathrm{cbfiltersTbl}(i)\, x(k-i+4)$$
where x(j) = base_cb(j) for j=0..lMem-1 and 0 otherwise
The individual codebook vectors of the new filtered codebook, exp_cb,
and their indices are obtained in the same fashion as described above
for the base codebook.
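The construction of the expanded memory and the extraction of codebook vector n can be sketched in C. This is a minimal sketch that follows the formula and indexing above literally; it is not the reference implementation. The function names are hypothetical, and memory index lMem-1 is assumed to hold the most recent sample.

```c
#define CB_FILTERLEN 8

/* FIR taps copied from cbfiltersTbl above. */
static const float cbfiltersTbl[CB_FILTERLEN] = {
    -0.033691f, 0.083740f, -0.144043f, 0.713379f,
     0.806152f, -0.184326f, 0.108887f, -0.034180f
};

/* exp_cb(k) = sum_i cbfiltersTbl(i) * x(k-i+4), where x(j) = base(j) for
   0 <= j < lMem and 0 otherwise; the +4 compensates the filter delay. */
void make_expanded_memory(const float *base, int lMem, float *exp_cb)
{
    for (int k = 0; k < lMem; k++) {
        float acc = 0.0f;
        for (int i = 0; i < CB_FILTERLEN; i++) {
            int j = k - i + 4;
            if (j >= 0 && j < lMem)
                acc += cbfiltersTbl[i] * base[j];
        }
        exp_cb[k] = acc;
    }
}

/* Codebook vector n (n = 0..lMem-lTarget) holds samples
   (lMem-lTarget-n)..(lMem-n-1), i.e. sample delay lTarget+n. The same
   indexing applies to the base and the expanded memory. */
void get_cb_vector(const float *mem, int lMem, int lTarget, int n,
                   float *vec)
{
    const float *start = mem + (lMem - lTarget - n);
    for (int i = 0; i < lTarget; i++)
        vec[i] = start[i];
}
```

Because the expanded section is built by filtering the whole memory once, both sections can share the same vector-extraction routine.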
When entire sub-blocks are encoded, i.e., cbveclen=40, the base and
expanded codebooks are augmented to increase codebook richness. The
codebooks are augmented by vectors produced by interpolation of
segments. The base and expanded codebooks constructed above consist
of vectors corresponding to sample delays in the range from cbveclen
to lMem. The codebook augmentation attempts to augment these
codebooks with vectors corresponding to sample delays from 20 to 39.
However, a 40-sample vector at a sample delay below 40 would extend
past the end of the codebook memory, so not all of its samples are
present in the base codebook and expanded codebook, respectively.
Therefore, the augmentation vectors are constructed as linear
combinations between samples corresponding to sample delays in the
range 20 to 39. The general idea of this procedure is presented in
the following figures and text. The procedure is performed for both
the base codebook and the expanded codebook.
   - - ------------------------|
codebook memory                |
   - - ------------------------|
              |-5-|---15---|-5-|
              pi  pp       po
                  |        |                      Codebook vector
                  |---15---|-5-|-----20-----|  <- corresponding to
                      i     ii      iii           sample delay 20
Figure 3.11. Generation of the first augmented codebook.
Figure 3.11 shows the codebook memory with pointers pi, pp, and po,
where pi points to sample 25, pp to sample 20, and po to sample 5.
Below the codebook memory, the augmented codebook vector
corresponding to sample delay 20 is drawn. Segment i consists of
fifteen samples from pointer pp and forward in time. Segment ii
consists of five interpolated samples from pi and forward and from po
and forward. The samples are linearly interpolated with weights
[0.0, 0.2, 0.4, 0.6, 0.8] for pi and weights [1.0, 0.8, 0.6, 0.4,
0.2] for po. Segment iii consists of twenty samples from pp and
forward. The augmented codebook vector corresponding to sample delay
21 is produced by moving pointers pp and pi one sample backward in
time. This gives us the following figure.
   - - ------------------------|
codebook memory                |
   - - ------------------------|
              |-5-|---16---|-5-|
              pi  pp       po
                  |        |                      Codebook vector
                  |---16---|-5-|-----19-----|  <- corresponding to
                      i     ii      iii           sample delay 21
Figure 3.12. Generation of the second augmented codebook.
Figure 3.12 shows the codebook memory with pointers pi, pp, and po,
where pi points to sample 26, pp to sample 21, and po to sample 5.
Below the codebook memory, the augmented codebook vector
corresponding to sample delay 21 is drawn. Segment i now consists of
sixteen samples from pp and forward. Segment ii consists of five
interpolated samples from pi and forward and from po and forward, and
the interpolation weights are the same throughout the procedure.
Segment iii consists of nineteen samples from pp and forward. The
same procedure of moving the two pointers is continued until the last
augmented vector corresponding to sample delay 39 has been created.
This gives a total of twenty new codebook vectors for each of the two
sections. Thus the total number of codebook vectors for each of the
two sections, when including the augmented codebook, becomes
lMem-SUBL+1+SUBL/2, provided that augmentation is invoked, i.e., that
lTarget=SUBL.
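The pointer procedure of Figures 3.11 and 3.12 can be sketched in C for one sample delay. This is a minimal sketch of the description above, not the reference implementation. The function name is hypothetical, and mem_end is assumed to point one past the most recent codebook memory sample, so mem_end[-d] is the sample at delay d.

```c
#define SUBL 40

/* Build the augmented codebook vector for sample delay `index`
   (20 <= index <= 39). The memory must hold at least index+5 samples. */
void create_augmented_vector(int index, const float *mem_end, float *cbVec)
{
    const float *pp = mem_end - index;      /* pointer pp: delay index   */
    const float *pi = mem_end - index - 5;  /* pointer pi: delay index+5 */
    const float *po = mem_end - 5;          /* pointer po: delay 5       */
    int i;

    /* Segment i: index-5 samples from pp and forward. */
    for (i = 0; i < index - 5; i++)
        cbVec[i] = pp[i];

    /* Segment ii: five samples interpolated from po (weights
       1.0, 0.8, ..., 0.2) and pi (weights 0.0, 0.2, ..., 0.8). */
    for (i = 0; i < 5; i++) {
        float alpha = 0.2f * (float)i;
        cbVec[index - 5 + i] = (1.0f - alpha) * po[i] + alpha * pi[i];
    }

    /* Segment iii: the remaining SUBL-index samples from pp and forward. */
    for (i = 0; i < SUBL - index; i++)
        cbVec[index + i] = pp[i];
}
```

For index=20 the segments have lengths 15, 5, and 20; for index=21 they are 16, 5, and 19, as in the figures. Calling it for index=20..39 yields the twenty augmented vectors, so with lMem=147 each section holds 147-40+1+20 = 128 vectors.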
The codebook search uses the codebooks described in the sections
above to find the best match to the perceptually weighted target (see
section 3.6.2). The search method is a multi-stage gain-shape
matching performed as follows. At each stage the best shape vector
is identified, then the gain is calculated and quantized, and finally
the target is updated in preparation for the next codebook search
stage. The number of stages is CB_NSTAGES=3.
If the target is the 23/22-sample vector the codebooks are indexed so
that the base codebook is followed by the expanded codebook. If the
target is 40 samples the order is as follows: base codebook,
augmented base codebook, expanded codebook, and augmented expanded
codebook. The size of each codebook section and its corresponding
augmented section is given by Table 3.1 in section 3.6.3.
For example, when the second 40-sample sub-block is coded, indices 0
- 107 correspond to the base codebook, 108 - 127 correspond to the
augmented base codebook, 128 - 235 correspond to the expanded
codebook, and indices 236 - 255 correspond to the augmented expanded
codebook. The indices are divided in the same fashion for all stages
in the example. Only in the case of coding the first 40-sample sub-
block is there a difference between stages (see Table 3.1).
The codebooks are searched to find the best match to the target at
each stage. When the best match is found, the target is updated and
the next-stage search is started. The three chosen codebook vectors
and their corresponding gains constitute the encoded sub-block. The
best match is decided by the following three criteria:
1. Compute the measure
(target*cbvec)^2 / ||cbvec||^2
for all codebook vectors, cbvec, and choose the codebook vector
maximizing the measure. The expression (target*cbvec) is the dot
product between the target vector to be coded and the codebook vector
for which we compute the measure. The norm, ||x||, is defined as the
square root of (x*x).
2. The absolute value of the gain, corresponding to the chosen
codebook vector, cbvec, must be smaller than a fixed limit,
CB_MAXGAIN=1.3:
|gain| < CB_MAXGAIN
where the gain is computed in the following way:
gain = (target*cbvec) / ||cbvec||^2
3. For the first stage, the dot product of the chosen codebook vector
and target must be positive:
target*cbvec > 0
In practice the above criteria are used in a sequential search
through all codebook vectors. The best match is found by registering
a new max measure and index whenever the previously registered max
measure is surpassed and all other criteria are fulfilled. If none
of the codebook vectors fulfills (2) and (3), the first codebook
vector is selected.
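The sequential search with the three criteria can be sketched in C. This is a minimal sketch, assuming that the candidate vectors are stored row by row in one array. The function names are hypothetical, and skipping all-zero vectors (which zero padding of the memory can produce) is an added assumption to avoid division by zero.

```c
#include <math.h>

#define CB_MAXGAIN 1.3f

/* Dot product of two vectors of length len. */
static float dot(const float *x, const float *y, int len)
{
    float s = 0.0f;
    for (int i = 0; i < len; i++)
        s += x[i] * y[i];
    return s;
}

/* Keep the vector maximizing (target*cbvec)^2/||cbvec||^2 among those
   with |gain| < CB_MAXGAIN and, in the first stage (stage == 0),
   target*cbvec > 0. Returns index 0 if no vector qualifies. The
   unquantized gain of the returned vector is written to *gain_out. */
int search_codebook(const float *target, const float *cb, int ncb,
                    int len, int stage, float *gain_out)
{
    int best_index = 0, found = 0;
    float best_measure = 0.0f;

    for (int k = 0; k < ncb; k++) {
        const float *c = cb + (long)k * len;
        float energy = dot(c, c, len);
        if (energy <= 0.0f)
            continue;                          /* all-zero vector */
        float cross = dot(target, c, len);
        if (fabsf(cross / energy) >= CB_MAXGAIN)
            continue;                          /* criterion 2 */
        if (stage == 0 && cross <= 0.0f)
            continue;                          /* criterion 3 */
        float measure = cross * cross / energy; /* criterion 1 */
        if (!found || measure > best_measure) {
            best_measure = measure;
            best_index = k;
            found = 1;
        }
    }

    const float *c = cb + (long)best_index * len;
    float energy = dot(c, c, len);
    *gain_out = (energy > 0.0f) ? dot(target, c, len) / energy : 0.0f;
    return best_index;
}
```

The measure is maximized instead of the error being minimized because, for the optimal gain, the squared error equals ||target||^2 minus the measure.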
The gain follows as a result of the computation
gain = (target*cbvec) / ||cbvec||^2
for the optimal codebook vector found by the procedure in section
3.6.4.1.
The three stages quantize the gain, using 5, 4, and 3 bits,
respectively. In the first stage, the gain is limited to positive
values. This gain is quantized by finding the nearest value in the
quantization table gain_sq5Tbl.
gain_sq5Tbl[32]={0.037476, 0.075012, 0.112488, 0.150024, 0.187500,
0.224976, 0.262512, 0.299988, 0.337524, 0.375000,
0.412476, 0.450012, 0.487488, 0.525024, 0.562500,
0.599976, 0.637512, 0.674988, 0.712524, 0.750000,
0.787476, 0.825012, 0.862488, 0.900024, 0.937500,
0.974976, 1.012512, 1.049988, 1.087524, 1.125000,
1.162476, 1.200012}
The gains of the subsequent two stages can be either positive or
negative. The gains are quantized by using a quantization table
times a scale factor. The second stage uses the table gain_sq4Tbl,
and the third stage uses gain_sq3Tbl. The scale factor equals 0.1
or the absolute value of the quantized gain representation value
obtained in the previous stage, whichever is larger. Again, the
resulting gain index is the index to the nearest value of the
quantization table times the scale factor.
gainQ = scaleFact * gain_sqXTbl[index]
gain_sq4Tbl[16]={-1.049988, -0.900024, -0.750000, -0.599976,
-0.450012, -0.299988, -0.150024, 0.000000, 0.150024,
0.299988, 0.450012, 0.599976, 0.750000, 0.900024,
1.049988, 1.200012}
gain_sq3Tbl[8]={-1.000000, -0.659973, -0.330017,0.000000,
0.250000, 0.500000, 0.750000, 1.00000}
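The three-stage gain quantization can be sketched in C. This is a minimal sketch that assumes the first stage uses a scale factor of 1.0 (a plain nearest value in gain_sq5Tbl, as the text describes) and that "nearest" means smallest absolute difference. The function names are hypothetical. In the encoder each stage's gain is only known after the previous stage has been searched, but its quantization depends only on the previous quantized gain.

```c
#include <math.h>

/* Quantization tables copied from the text above. */
static const float gain_sq5Tbl[32] = {
    0.037476f, 0.075012f, 0.112488f, 0.150024f, 0.187500f,
    0.224976f, 0.262512f, 0.299988f, 0.337524f, 0.375000f,
    0.412476f, 0.450012f, 0.487488f, 0.525024f, 0.562500f,
    0.599976f, 0.637512f, 0.674988f, 0.712524f, 0.750000f,
    0.787476f, 0.825012f, 0.862488f, 0.900024f, 0.937500f,
    0.974976f, 1.012512f, 1.049988f, 1.087524f, 1.125000f,
    1.162476f, 1.200012f
};
static const float gain_sq4Tbl[16] = {
    -1.049988f, -0.900024f, -0.750000f, -0.599976f,
    -0.450012f, -0.299988f, -0.150024f, 0.000000f, 0.150024f,
    0.299988f, 0.450012f, 0.599976f, 0.750000f, 0.900024f,
    1.049988f, 1.200012f
};
static const float gain_sq3Tbl[8] = {
    -1.000000f, -0.659973f, -0.330017f, 0.000000f,
    0.250000f, 0.500000f, 0.750000f, 1.000000f
};

/* Nearest value of scale*tbl[i], with scale = max(|prev_gainQ|, 0.1).
   Writes the index and returns gainQ = scale * tbl[index]. */
float quantize_gain(float gain, float prev_gainQ, const float *tbl,
                    int size, int *index)
{
    float scale = fabsf(prev_gainQ);
    if (scale < 0.1f)
        scale = 0.1f;
    int best = 0;
    float best_err = fabsf(gain - scale * tbl[0]);
    for (int i = 1; i < size; i++) {
        float err = fabsf(gain - scale * tbl[i]);
        if (err < best_err) {
            best_err = err;
            best = i;
        }
    }
    *index = best;
    return scale * tbl[best];
}

/* Quantize the three stage gains in order (5, 4, and 3 bits). */
void quantize_stage_gains(const float gain[3], float gainQ[3], int index[3])
{
    gainQ[0] = quantize_gain(gain[0], 1.0f, gain_sq5Tbl, 32, &index[0]);
    gainQ[1] = quantize_gain(gain[1], gainQ[0], gain_sq4Tbl, 16, &index[1]);
    gainQ[2] = quantize_gain(gain[2], gainQ[1], gain_sq3Tbl, 8, &index[2]);
}
```

For gains {0.75, 0.45, -0.3} this yields indices {19, 11, 1}: the second stage is scaled by 0.75 and the third by about 0.45.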
Before performing the search for the next stage, the perceptually
weighted target vector is updated by subtracting from it the selected
codebook vector (from the perceptually weighted codebook) times the
corresponding quantized gain.
target[i] = target[i] - gainQ * selected_vec[i];
A reference implementation of the codebook encoding is found in
Appendix A.34.
The start state is quantized in a relatively model-independent manner
using 3 bits per sample. In contrast, the remaining parts of the
block are encoded by using an adaptive codebook. This codebook
produces high matching accuracy whenever there is a high correlation
between the target and the best codebook vector. For unvoiced speech
segments and background noise this is not necessarily the case, and,
because of the squared error criterion, the coded signal then has less
power than the target signal. As the coded start state has good power
matching to the target, the result is a power fluctuation within the
encoded frame. Perceptually, the main problem with this is that the
time envelope of the signal energy becomes unsteady. To overcome this
problem, the gains for the codebooks are rescaled after the codebook
encoding by searching for a new gain factor for the first-stage
codebook that provides better power matching.
First, the energy of the target signal, tene, is computed along with
the energy of the coded signal, cene, given by the sum of the three
gain-scaled codebook vectors. Because the gains of the second and
third stages scale with the gain of the first stage, when the
first-stage gain is changed from gain[0] to gain_sq5Tbl[i], the energy
of the coded signal changes from cene to
cene*(gain_sq5Tbl[i]*gain_sq5Tbl[i])/(gain[0]*gain[0])
where gain[0] is the gain for the first stage found in the original
codebook search. A refined search is performed by testing the gain
indices i=0 to 31; as long as the new codebook energy given above is
less than tene, the gain index for stage 1 is increased. A restriction
is applied so that the new gain value for stage 1 cannot be more than
twice the original value found in the codebook search. Note that this
method does not change the shape of the encoded vector, only its gain
(amplitude).
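The refined first-stage gain search can be sketched in C. This is a minimal sketch of the rule above. It assumes, as the phrase "is increased" suggests, that the search starts at the original index and only moves upward; the function name and argument list are hypothetical.

```c
/* gain_sq5Tbl copied from section 3.6.4.2 above. */
static const float gain_sq5Tbl[32] = {
    0.037476f, 0.075012f, 0.112488f, 0.150024f, 0.187500f,
    0.224976f, 0.262512f, 0.299988f, 0.337524f, 0.375000f,
    0.412476f, 0.450012f, 0.487488f, 0.525024f, 0.562500f,
    0.599976f, 0.637512f, 0.674988f, 0.712524f, 0.750000f,
    0.787476f, 0.825012f, 0.862488f, 0.900024f, 0.937500f,
    0.974976f, 1.012512f, 1.049988f, 1.087524f, 1.125000f,
    1.162476f, 1.200012f
};

/* tene:   energy of the target signal
   cene:   energy of the coded signal obtained with first-stage gain gain0
   gain0:  first-stage gain from the original codebook search (> 0)
   index0: first-stage gain index from the original quantization
   The index is increased while cene*g^2/gain0^2 < tene and g <= 2*gain0. */
int refine_first_stage_gain(float tene, float cene, float gain0, int index0)
{
    int index = index0;
    if (gain0 <= 0.0f)
        return index0;
    for (int i = index0 + 1; i < 32; i++) {
        float g = gain_sq5Tbl[i];
        float new_cene = cene * (g * g) / (gain0 * gain0);
        /* The table is increasing, so once a test fails, later ones fail. */
        if (new_cene >= tene || g > 2.0f * gain0)
            break;
        index = i;
    }
    return index;
}
```

Only the stage 1 index changes; the stage 2 and 3 gains follow automatically because their scale factors derive from the stage 1 gain.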
The total number of bits used to describe one frame of 20 ms speech
is 304, which fits in 38 bytes and results in a bit rate of 15.20
kbit/s. For the case of a frame length of 30 ms speech, the total
number of bits used is 400, which fits in 50 bytes and results in a
bit rate of 13.33 kbit/s. In the bitstream definition, the bits are
distributed into three classes according to their bit error or loss
sensitivity. The most sensitive bits (class 1) are placed first in
the bitstream for each frame. The less sensitive bits (class 2) are
placed after the class 1 bits. The least sensitive bits (class 3)
are placed at the end of the bitstream for each frame.
For the 20/30 ms frame lengths, the class 1 bits occupy a total of
6/8 bytes (48/64 bits), the class 2 bits occupy 8/12 bytes (64/96
bits), and the class 3 bits occupy 24/30 bytes (191/239 bits plus the
1-bit empty frame indicator). This distribution of the bits enables
the use of uneven level protection (ULP) as is exploited in the
payload format definition for iLBC [1]. The detailed bit allocation
is shown in the table below. When a quantization index is distributed
among several classes, its more significant bits belong to the lowest
class.
Bitstream structure:
----------------------------------+-------------------------------+
Parameter                         |      Bits Class <1,2,3>       |
                                  |  20 ms frame  |  30 ms frame  |
----------------------------------+---------------+---------------+
                         Split 1  |   6 <6,0,0>   |   6 <6,0,0>   |
            LSF 1        Split 2  |   7 <7,0,0>   |   7 <7,0,0>   |
LSF                      Split 3  |   7 <7,0,0>   |   7 <7,0,0>   |
            ----------------------+---------------+---------------+
                         Split 1  | NA (Not Appl.)|   6 <6,0,0>   |
            LSF 2        Split 2  |      NA       |   7 <7,0,0>   |
                         Split 3  |      NA       |   7 <7,0,0>   |
            ----------------------+---------------+---------------+
                         Sum      |  20 <20,0,0>  |  40 <40,0,0>  |
----------------------------------+---------------+---------------+
Block Class                       |   2 <2,0,0>   |   3 <3,0,0>   |
----------------------------------+---------------+---------------+
Position 22 sample segment        |   1 <1,0,0>   |   1 <1,0,0>   |
----------------------------------+---------------+---------------+
Scale Factor State Coder          |   6 <6,0,0>   |   6 <6,0,0>   |
----------------------------------+---------------+---------------+
                         Sample 0 |   3 <0,1,2>   |   3 <0,1,2>   |
Quantized                Sample 1 |   3 <0,1,2>   |   3 <0,1,2>   |
Residual                    :     |       :       |       :       |
State                       :     |       :       |       :       |
Samples                     :     |       :       |       :       |
                         Sample 56|   3 <0,1,2>   |   3 <0,1,2>   |
                         Sample 57|      NA       |   3 <0,1,2>   |
            ----------------------+---------------+---------------+
                         Sum      | 171 <0,57,114>| 174 <0,58,116>|
----------------------------------+---------------+---------------+
                         Stage 1  |   7 <6,0,1>   |   7 <4,2,1>   |
CB for 22/23             Stage 2  |   7 <0,0,7>   |   7 <0,0,7>   |
sample block             Stage 3  |   7 <0,0,7>   |   7 <0,0,7>   |
            ----------------------+---------------+---------------+
                         Sum      |  21 <6,0,15>  |  21 <4,2,15>  |
----------------------------------+---------------+---------------+
                         Stage 1  |   5 <2,0,3>   |   5 <1,1,3>   |
Gain for 22/23           Stage 2  |   4 <1,1,2>   |   4 <1,1,2>   |
sample block             Stage 3  |   3 <0,0,3>   |   3 <0,0,3>   |
            ----------------------+---------------+---------------+
                         Sum      |  12 <3,1,8>   |  12 <2,2,8>   |
----------------------------------+---------------+---------------+
                         Stage 1  |   8 <7,0,1>   |   8 <6,1,1>   |
            sub-block 1  Stage 2  |   7 <0,0,7>   |   7 <0,0,7>   |
                         Stage 3  |   7 <0,0,7>   |   7 <0,0,7>   |
            ----------------------+---------------+---------------+
                         Stage 1  |   8 <0,0,8>   |   8 <0,7,1>   |
            sub-block 2  Stage 2  |   8 <0,0,8>   |   8 <0,0,8>   |
Indices                  Stage 3  |   8 <0,0,8>   |   8 <0,0,8>   |
for CB      ----------------------+---------------+---------------+
sub-blocks               Stage 1  |      NA       |   8 <0,7,1>   |
            sub-block 3  Stage 2  |      NA       |   8 <0,0,8>   |
                         Stage 3  |      NA       |   8 <0,0,8>   |
            ----------------------+---------------+---------------+
                         Stage 1  |      NA       |   8 <0,7,1>   |
            sub-block 4  Stage 2  |      NA       |   8 <0,0,8>   |
                         Stage 3  |      NA       |   8 <0,0,8>   |
            ----------------------+---------------+---------------+
                         Sum      |  46 <7,0,39>  |  94 <6,22,66> |
----------------------------------+---------------+---------------+
                         Stage 1  |   5 <1,2,2>   |   5 <1,2,2>   |
            sub-block 1  Stage 2  |   4 <1,1,2>   |   4 <1,2,1>   |
                         Stage 3  |   3 <0,0,3>   |   3 <0,0,3>   |
            ----------------------+---------------+---------------+
                         Stage 1  |   5 <1,1,3>   |   5 <0,2,3>   |
            sub-block 2  Stage 2  |   4 <0,2,2>   |   4 <0,2,2>   |
                         Stage 3  |   3 <0,0,3>   |   3 <0,0,3>   |
Gains for   ----------------------+---------------+---------------+
sub-blocks               Stage 1  |      NA       |   5 <0,1,4>   |
            sub-block 3  Stage 2  |      NA       |   4 <0,1,3>   |
                         Stage 3  |      NA       |   3 <0,0,3>   |
            ----------------------+---------------+---------------+
                         Stage 1  |      NA       |   5 <0,1,4>   |
            sub-block 4  Stage 2  |      NA       |   4 <0,1,3>   |
                         Stage 3  |      NA       |   3 <0,0,3>   |
            ----------------------+---------------+---------------+
                         Sum      |  24 <3,6,15>  |  48 <2,12,34> |
----------------------------------+---------------+---------------+
Empty frame indicator             |   1 <0,0,1>   |   1 <0,0,1>   |
----------------------------------+---------------+---------------+
SUM                               |304 <48,64,192>|400 <64,96,240>|
----------------------------------+---------------+---------------+
Table 3.2. The bitstream definition for iLBC for both the 20 ms
frame size mode and the 30 ms frame size mode.
When packetized into the payload, the bits MUST be sorted as follows:
All the class 1 bits in the order (from top to bottom) as specified
in the table, all the class 2 bits (from top to bottom), and all the
class 3 bits in the same sequential order. The last bit, the empty
frame indicator, SHOULD be set to zero by the encoder. If this bit
is set to 1 the decoder SHOULD treat the data as a lost frame. For
example, this bit can be set to 1 to indicate a lost frame in the file
storage format, as in [1].
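The class-ordered packing rule can be sketched in C. This is a minimal sketch with hypothetical types and function names; it is not the packing code of the reference implementation. It assumes MSB-first bit order within each field and within each byte. Each parameter is listed in table order together with its <class 1, class 2, class 3> bit split, and the most significant bits go to the lowest class, as stated above.

```c
#include <stddef.h>

/* One quantization index and its bit split over classes 1-3. */
typedef struct {
    unsigned value;   /* quantization index */
    int bits[3];      /* number of bits in class 1, 2, 3 */
} param_bits;

/* Append nbits bits of v, MSB first, at bit position *pos of buf. */
static void put_bits(unsigned char *buf, size_t *pos, unsigned v, int nbits)
{
    for (int b = nbits - 1; b >= 0; b--) {
        if ((v >> b) & 1u)
            buf[*pos / 8] |= (unsigned char)(0x80u >> (*pos % 8));
        *pos += 1;
    }
}

/* Write all class 1 bits (parameters in table order), then all class 2
   bits, then all class 3 bits. buf must be zeroed by the caller.
   Returns the number of bits written. */
size_t pack_by_class(const param_bits *p, int nparams, unsigned char *buf)
{
    size_t pos = 0;
    for (int cls = 0; cls < 3; cls++) {
        for (int k = 0; k < nparams; k++) {
            int n = p[k].bits[cls];
            if (n == 0)
                continue;
            int total = p[k].bits[0] + p[k].bits[1] + p[k].bits[2];
            int before = 0;   /* bits already sent in lower classes */
            for (int c = 0; c < cls; c++)
                before += p[k].bits[c];
            unsigned chunk = (p[k].value >> (total - before - n)) &
                             ((1u << n) - 1u);
            put_bits(buf, &pos, chunk, n);
        }
    }
    return pos;
}
```

For a full frame, the parameter list would follow Table 3.2 from top to bottom and end with the empty frame indicator (value 0, split <0,0,1>), giving 304 or 400 bits.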
This section describes the principles of each component of the
decoder algorithm.
           +-------------+    +--------+    +---------------+
payload -> | 1. Get para | -> | 2. LPC | -> | 3. Sc Dequant | ->
           +-------------+    +--------+    +---------------+
   +-------------+    +------------------+
-> | 4. Mem setup| -> | 5. Construct res |------->
 | +-------------+    +------------------+ |
 ---------<-----------<-----------<---------
       Sub-frame 0...2/4 (20 ms/30 ms)
   +----------------+    +----------+
-> | 6. Enhance res | -> | 7. Synth | ------------>
   +----------------+    +----------+
   +-----------------+
-> | 8. Post Process | ----------------> decoded speech
   +-----------------+
Figure 4.1. Flow chart of the iLBC decoder. If a frame was lost,
steps 1 to 5 SHOULD be replaced by a PLC algorithm.
1. Extract the parameters from the bitstream.
2. Decode the LPC and interpolate (section 4.1).
3. Construct the 57/58-sample start state (section 4.2).
4. Set up the memory by using data from the decoded residual. This
memory is used for codebook construction. For blocks preceding
the start state, both the decoded residual and the sub-block being
decoded are time reversed. Sub-frames are decoded in the same order
as they were encoded.
5. Construct the residuals of this sub-frame (gain[0]*cbvec[0] +
gain[1]*cbvec[1] + gain[2]*cbvec[2]). Repeat 4 and 5 until the
residual of all sub-blocks has been constructed.
6. Enhance the residual with the post filter (section 4.6).
7. Synthesis of the residual (section 4.7).
8. Post process with HP filter, if desired (section 4.8).
The decoding of the LP filter parameters is straightforward.
For a set of three/six indices, the corresponding LSF vector(s) are
found by simple table lookup. For each of the LSF vectors, the three
split vectors are concatenated to obtain qlsf1 and qlsf2,
respectively (in the 20 ms mode only one LSF vector, qlsf, is
constructed). The next step is the stability check described in
section 3.2.5 followed by the interpolation scheme described in
section 3.2.6 (3.2.7 for 20 ms frames). The only difference is that
only the quantized LSFs are known at the decoder, and hence the
unquantized LSFs are not processed.
A reference implementation of the LPC filter reconstruction is given
in Appendix A.36.
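The table lookup and concatenation of the split vectors can be sketched in C. This is a minimal sketch: the LSF codebook contents are not given in this section, so the layout (each split codebook stored as consecutive rows of dim[s] values) and the function name are assumptions. The stability check and interpolation are not shown.

```c
/* Decode one LSF vector from split-VQ indices: look up row index[s] of
   split codebook cb[s] (rows of dim[s] values) and concatenate. */
void decode_lsf_splits(const float *const cb[], const int dim[], int nsplit,
                       const int index[], float *lsf)
{
    int pos = 0;
    for (int s = 0; s < nsplit; s++) {
        const float *row = cb[s] + index[s] * dim[s];
        for (int i = 0; i < dim[s]; i++)
            lsf[pos++] = row[i];
    }
}
```

In the 30 ms mode the routine would be called twice, once per set of three indices, to obtain qlsf1 and qlsf2.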
The scalar encoded STATE_SHORT_LEN=58 (STATE_SHORT_LEN=57 in the 20
ms mode) state samples are reconstructed by 1) forming a set of
samples (by table lookup) from the index stream idxVec[n]; 2)
multiplying the set by 1/scal=(10^qmax)/4.5; 3) time reversing the
57/58 samples; 4) filtering the time-reversed block with the
dispersion (all-pass) filter used in the encoder (as described in
section 3.5.2), which compensates for the phase distortion of the
earlier filter operation; and 5) time reversing the 57/58 samples
from the previous step.
in(0..(STATE_SHORT_LEN-1)) = time reversed samples from table
look-up,
idxVecDec((STATE_SHORT_LEN-1)..0)
in(STATE_SHORT_LEN..(2*STATE_SHORT_LEN-1)) = 0
Pk(z) = A~rk(z)/A~k(z), where
$$\tilde{A}^{r}_{k}(z) = z^{-\mathrm{LPC\_FILTERORDER}} + \sum_{i=0}^{\mathrm{LPC\_FILTERORDER}-1} \tilde{a}_{ki}\, z^{\,i-(\mathrm{LPC\_FILTERORDER}-1)}$$
and A~k(z) is taken from the block where the start state begins
in -> Pk(z) -> filtered
out(k) = filtered(STATE_SHORT_LEN-1-k) +
filtered(2*STATE_SHORT_LEN-1-k),
k=0..(STATE_SHORT_LEN-1)
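The five steps and the pseudo-equations above can be sketched in C. This is a minimal sketch, assuming that a~ki denotes coefficient i+1 of A~k(z) (with a[0]=1), so that the numerator of Pk(z) is the coefficient-reversed denominator. The caller is assumed to supply 1/scal = (10^qmax)/4.5, because the qmax table is not part of this section. The function names and buffer bounds are hypothetical.

```c
#include <string.h>

#define MAX_ORDER 16
#define MAX_STATE 64

/* Quantization table state_sq3Tbl from section 3.5. */
static const float state_sq3Tbl[8] = {
    -3.719849f, -2.177490f, -1.130005f, -0.309692f,
     0.444214f,  1.329712f,  2.436279f,  3.983887f
};

/* Zero-state pole-zero filter:
   y[n] = sum_j b[j] x[n-j] - sum_{j>=1} a[j] y[n-j], with a[0] = 1. */
static void pole_zero_filter(const float *x, int len, const float *b,
                             const float *a, int order, float *y)
{
    for (int n = 0; n < len; n++) {
        float acc = 0.0f;
        for (int j = 0; j <= order && j <= n; j++)
            acc += b[j] * x[n - j];
        for (int j = 1; j <= order && j <= n; j++)
            acc -= a[j] * y[n - j];
        y[n] = acc;
    }
}

/* Reconstruct len state samples from 3-bit indices. inv_scal is 1/scal;
   a[0..order] are the coefficients of A~k(z). Returns -1 on bad sizes. */
int construct_state(const int *idxVec, int len, float inv_scal,
                    const float *a, int order, float *out)
{
    float in[2 * MAX_STATE], filtered[2 * MAX_STATE], b[MAX_ORDER + 1];
    if (len > MAX_STATE || order > MAX_ORDER)
        return -1;
    /* All-pass numerator: coefficient-reversed A~k(z). */
    for (int j = 0; j <= order; j++)
        b[j] = a[order - j];
    /* Steps 1-3: table lookup, scaling, and time reversal. */
    for (int k = 0; k < len; k++)
        in[k] = inv_scal * state_sq3Tbl[idxVec[len - 1 - k]];
    memset(in + len, 0, (size_t)len * sizeof(float));  /* zero padding */
    /* Step 4: all-pass filtering of the zero-padded block. */
    pole_zero_filter(in, 2 * len, b, a, order, filtered);
    /* Fold the tail back (circular convolution) and time reverse (step 5). */
    for (int k = 0; k < len; k++)
        out[k] = filtered[len - 1 - k] + filtered[2 * len - 1 - k];
    return 0;
}
```

Adding the second half of the filtered block back onto the first half is what turns the linear filtering into the circular convolution mentioned for the encoder. A trivial check is A~k(z)=1 with LPC_FILTERORDER=1, for which Pk(z) is a one-sample delay and the output is the input circularly shifted by one sample.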
The remaining 23/22 samples in the state are reconstructed by the
same adaptive codebook technique described in section 4.3. The
location bit determines whether these are the first or the last 23/22
samples of the 80-sample state vector. If the remaining 23/22
samples are the first samples, then the scalar encoded
STATE_SHORT_LEN state samples are time-reversed before initialization
of the adaptive codebook memory vector.
A reference implementation of the start state reconstruction is given
in Appendix A.44.
The decoding of the LPC excitation vector proceeds in the same order
in which the residual was encoded at the encoder. That is, after the
decoding of the entire 80-sample state vector, the forward sub-blocks
(corresponding to samples occurring after the state vector samples)
are decoded, and then the backward sub-blocks (corresponding to
samples occurring before the state vector) are decoded, resulting in
a fully decoded block of excitation signal samples.
In particular, each sub-block is decoded by using the multistage
adaptive codebook decoding module described in section 4.4. This
module relies upon an adaptive codebook memory constructed before
each run of the adaptive codebook decoding. The construction of the
adaptive codebook memory in the decoder is identical to the method
outlined in section 3.6.3, except that it is done on the codebook
memory without perceptual weighting.
For the initial forward sub-block, the last STATE_LEN=80 samples of
the length CB_LMEM=147 adaptive codebook memory are filled with the
samples of the state vector. For subsequent forward sub-blocks, the
first SUBL=40 samples of the adaptive codebook memory are discarded,
the remaining samples are shifted by SUBL samples toward the
beginning of the vector, and the newly decoded SUBL=40 samples are
placed at the end of the adaptive codebook memory. For backward
sub-blocks, the construction is similar, except that every vector of
samples involved is first time reversed.
A reference implementation of the excitation decoding loop is found
in Appendix A.5.
The Multistage Adaptive Codebook Decoding module is used at both the
sender (encoder) and the receiver (decoder) ends to produce a
synthetic signal in the residual domain that is eventually used to
produce synthetic speech. The module takes the index values used to
construct vectors that are scaled and summed together to produce a
synthetic signal that is the output of the module.
The unpacked index values provided at the input to the module are
references to extended codebooks, which are constructed as described
in section 3.6.3, except that they are based on the codebook memory
without the perceptual weighting. The three unpacked codebook indices
are used to look up three codebook vectors, and the three unpacked
gain indices are used to decode the corresponding three gains. In this
decoding, the successive rescaling, as described in section 3.6.4.2,
is applied.
A reference implementation of the adaptive codebook decoding is
listed in Appendix A.32.
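The gain decoding with successive rescaling, followed by the summation of the three scaled vectors, can be sketched in C. This is a minimal sketch that assumes the three codebook vectors have already been looked up and that the first-stage gain uses a scale factor of 1.0. The function names are hypothetical.

```c
#include <math.h>

/* Gain tables copied from section 3.6.4.2. */
static const float gain_sq5Tbl[32] = {
    0.037476f, 0.075012f, 0.112488f, 0.150024f, 0.187500f,
    0.224976f, 0.262512f, 0.299988f, 0.337524f, 0.375000f,
    0.412476f, 0.450012f, 0.487488f, 0.525024f, 0.562500f,
    0.599976f, 0.637512f, 0.674988f, 0.712524f, 0.750000f,
    0.787476f, 0.825012f, 0.862488f, 0.900024f, 0.937500f,
    0.974976f, 1.012512f, 1.049988f, 1.087524f, 1.125000f,
    1.162476f, 1.200012f
};
static const float gain_sq4Tbl[16] = {
    -1.049988f, -0.900024f, -0.750000f, -0.599976f,
    -0.450012f, -0.299988f, -0.150024f, 0.000000f, 0.150024f,
    0.299988f, 0.450012f, 0.599976f, 0.750000f, 0.900024f,
    1.049988f, 1.200012f
};
static const float gain_sq3Tbl[8] = {
    -1.000000f, -0.659973f, -0.330017f, 0.000000f,
    0.250000f, 0.500000f, 0.750000f, 1.000000f
};

/* gainQ = max(|previous gainQ|, 0.1) * table[index]. */
static float dequantize_gain(int index, float prev, const float *tbl)
{
    float scale = fabsf(prev);
    if (scale < 0.1f)
        scale = 0.1f;
    return scale * tbl[index];
}

/* out = g0*cbvec[0] + g1*cbvec[1] + g2*cbvec[2] for one sub-block. */
void decode_multistage(const int gain_index[3], const float *cbvec[3],
                       int len, float *out)
{
    float g[3];
    g[0] = dequantize_gain(gain_index[0], 1.0f, gain_sq5Tbl);
    g[1] = dequantize_gain(gain_index[1], g[0], gain_sq4Tbl);
    g[2] = dequantize_gain(gain_index[2], g[1], gain_sq3Tbl);
    for (int i = 0; i < len; i++)
        out[i] = g[0] * cbvec[0][i] + g[1] * cbvec[1][i]
               + g[2] * cbvec[2][i];
}
```

Because each scale factor comes from the previously decoded gain, the decoder reproduces exactly the gains that the encoder quantized.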
If packet loss occurs, the decoder receives a signal saying that
information regarding a block is lost. For such blocks it is
RECOMMENDED to use a Packet Loss Concealment (PLC) unit to create a
decoded signal that masks the effect of that packet loss. In the
following we will describe an example of a PLC unit that can be used
with the iLBC codec. As the PLC unit is used only at the decoder,
the PLC unit does not affect interoperability between
implementations. Other PLC implementations MAY therefore be used.
The PLC described operates on the LP filters and the excitation
signals and is based on the following principles:
If the block is received correctly, the PLC only records state
information of the current block that can be used in case the next
block is lost. The LP filter coefficients for each sub-block and the
entire decoded excitation signal are all saved in the decoder state
structure. All of this information will be needed if the following
block is lost.
If the block is not received, the block substitution is based on a
pitch-synchronous repetition of the excitation signal, which is
filtered by the last LP filter of the previous block. The previous
block's information is stored in the decoder state structure.
A correlation analysis is performed on the previous block's
excitation signal in order to detect the amount of pitch periodicity
and a pitch value. The correlation measure is also used to decide on
the voicing level (the degree to which the previous block's
excitation was a voiced or roughly periodic signal). The excitation
in the previous block is used to create an excitation for the block
to be substituted, such that the pitch of the previous block is
maintained. Therefore, the new excitation is constructed in a
pitch-synchronous manner. In order to avoid a buzzy-sounding
substituted block, a random excitation is mixed with the new pitch
periodic excitation, and the relative use of the two components is
computed from the correlation measure (voicing level).
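A minimal sketch of the substitution excitation follows. It assumes that a pitch lag and a voicing level in [0, 1] have already been derived from the correlation analysis. The function plc_excitation is hypothetical and is not the Appendix A.14 algorithm. The periodic part repeats the last pitch period of the previous excitation. As an assumption of this sketch, the random part picks previous-excitation samples at pseudo-random positions chosen by a simple linear congruential generator. The two parts are mixed linearly according to the voicing level.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PLC sketch (not the Appendix A.14 algorithm).
   prev:    previous block's excitation, prevLen samples
   lag:     pitch estimate in samples, 1..prevLen
   voicing: 0 (noise-like) .. 1 (fully periodic)
   out:     outLen substituted excitation samples
   seed:    state of the pseudo-random generator (assumption) */
void plc_excitation(const float *prev, int prevLen, int lag, float voicing,
                    float *out, int outLen, uint32_t *seed)
{
    assert(lag > 0 && lag <= prevLen);
    assert(voicing >= 0.0f && voicing <= 1.0f);
    for (int n = 0; n < outLen; n++) {
        /* pitch-synchronous repetition of the last pitch period */
        float periodic = prev[prevLen - lag + (n % lag)];
        /* random component: previous excitation at a pseudo-random index */
        *seed = *seed * 1103515245u + 12345u;
        float noise = prev[(*seed >> 16) % (uint32_t)prevLen];
        /* voicing level controls the relative use of the two parts */
        out[n] = voicing * periodic + (1.0f - voicing) * noise;
    }
}
```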
For the block to be substituted, the newly constructed excitation
signal is then passed through the LP filter to produce the speech
that will be substituted for the lost block.
For several consecutive lost blocks, the packet loss concealment
continues in a similar manner. The correlation measure of the last
block received is still used along with the same pitch value. The LP
filters of the last block received are also used again. The energy
of the substituted excitation for consecutive lost blocks is
decreased, leading to a dampened excitation, and therefore to
dampened speech.
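The text does not fix a particular attenuation rule. As an illustration only, the sketch below assumes that the first lost block keeps full gain and that each further consecutive lost block multiplies the gain by a constant decay factor. The gain is ramped linearly across each block so that no step occurs at block boundaries. The names fade_concealment and loss_gain and the decay parameter are hypothetical.

```c
#include <assert.h>

/* Gain reached at the end of the k-th consecutive lost block
   (k <= 1 gives full gain); illustrative schedule only. */
static float loss_gain(int k, float decay)
{
    float g = 1.0f;
    for (int i = 1; i < k; i++)
        g *= decay;
    return g;
}

/* Scales the concealment excitation of the lostCount-th consecutive
   lost block, ramping linearly from the previous block's end gain to
   this block's end gain. */
void fade_concealment(float *exc, int len, int lostCount, float decay)
{
    assert(lostCount >= 1 && len > 0);
    assert(decay >= 0.0f && decay <= 1.0f);
    float g0 = loss_gain(lostCount - 1, decay);
    float g1 = loss_gain(lostCount, decay);
    for (int n = 0; n < len; n++) {
        float g = g0 + (g1 - g0) * (float)(n + 1) / (float)len;
        exc[n] *= g;
    }
}
```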
For the case in which a block is received correctly when the previous
block was not, the correctly received block's directly decoded speech
(based solely on the received block) is not used as the actual
output. The reason for this is that the directly decoded speech does
not necessarily smoothly merge into the synthetic speech generated
for the previous lost block. If the two signals are not smoothly
merged, an audible discontinuity results.
Therefore, a correlation analysis between the two blocks of
excitation signal (the excitation of the previous concealed block and
that of the current received block) is performed to find the best
phase match. Then a simple overlap-add procedure is performed to
merge the previous excitation smoothly into the current block's
excitation.
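The phase search and the overlap-add can be sketched as follows. The sketch assumes the concealment excitation has been continued past the end of the lost block, so that ovLen + maxShift samples are available. It uses a plain, unnormalized cross-correlation and a linear cross-fade. The function names and the choice of window are hypothetical and need not match Appendix A.14.

```c
#include <assert.h>

/* conc: concealment excitation continued past the lost block
         (ovLen + maxShift samples); cur: start of the received
   block's excitation. Returns the shift of conc that best matches cur
   in the sense of maximum cross-correlation. */
int best_phase_shift(const float *conc, const float *cur, int ovLen, int maxShift)
{
    assert(ovLen > 0 && maxShift >= 0);
    int best = 0;
    float bestCorr = 0.0f;
    for (int s = 0; s <= maxShift; s++) {
        float c = 0.0f;
        for (int i = 0; i < ovLen; i++)
            c += conc[s + i] * cur[i];
        if (s == 0 || c > bestCorr) {
            bestCorr = c;
            best = s;
        }
    }
    return best;
}

/* Linear cross-fade from the phase-aligned concealment excitation into
   the received excitation over ovLen samples; cur is updated in place. */
void overlap_add(const float *concAligned, float *cur, int ovLen)
{
    assert(ovLen > 0);
    for (int i = 0; i < ovLen; i++) {
        float w = (float)(i + 1) / (float)(ovLen + 1); /* rises toward 1 */
        cur[i] = (1.0f - w) * concAligned[i] + w * cur[i];
    }
}
```

A caller would pass conc + best_phase_shift(...) as concAligned.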
The exact implementation of the packet loss concealment does not
influence interoperability of the codec.
A reference implementation of the packet loss concealment is
suggested in Appendix A.14. Exact compliance with this suggested
algorithm is not needed for a reference implementation to be fully
compatible with the overall codec specification.
The decoder contains an enhancement unit that operates on the
reconstructed excitation signal. The enhancement unit increases the
perceptual quality of the reconstructed signal by reducing the
speech-correlated noise in the voiced speech segments. Compared to
traditional postfilters, the enhancer has an advantage in that it can
only modify the excitation signal slightly. This means that there is
no risk of over-enhancement. The enhancer works very similarly for
both the 20 ms frame size mode and the 30 ms frame size mode.
For the mode with 20 ms frame size, the enhancer uses a memory of six
80-sample excitation blocks prior in time plus the two new 80-sample
excitation blocks. For each block of 160 new unenhanced excitation
samples, 160 enhanced excitation samples are produced. The enhanced
excitation is delayed by 40 samples relative to the unenhanced
excitation because the enhancer algorithm uses lookahead.
For the mode with 30 ms frame size, the enhancer uses a memory of
five 80-sample excitation blocks prior in time plus the three new
80-sample excitation blocks. For each block of 240 new unenhanced
excitation samples, 240 enhanced excitation samples are produced.
The enhanced excitation is delayed by 80 samples relative to the
unenhanced excitation because the enhancer algorithm uses lookahead.
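The sample counts above can be checked with a little arithmetic. In both modes the enhancer works on 6 + 2 = 5 + 3 = 8 blocks of 80 samples, that is, 640 samples. The 40/80-sample delay corresponds to the one/two 40-sample sub-block shift of the synthesis filters. The struct and function names in this C sketch are hypothetical.

```c
#include <assert.h>

#define ENH_BLOCK 80   /* enhancer block length in samples */
#define SYNT_SUBL 40   /* synthesis sub-block length in samples */

/* Enhancer bookkeeping per frame-size mode, using the text's numbers. */
typedef struct {
    int frameLen;    /* samples per frame */
    int newBlocks;   /* new 80-sample blocks per frame */
    int histBlocks;  /* 80-sample blocks kept from earlier frames */
    int delay;       /* enhancer delay in samples */
} EnhMode;

static const EnhMode MODE20 = {160, 2, 6, 40};
static const EnhMode MODE30 = {240, 3, 5, 80};

/* Total enhancer memory in samples. */
int enh_buffer_len(const EnhMode *m)
{
    /* the new blocks must exactly cover one frame */
    assert(m->newBlocks * ENH_BLOCK == m->frameLen);
    return (m->histBlocks + m->newBlocks) * ENH_BLOCK;
}

/* Shift of the synthesis filters, in 40-sample sub-blocks. */
int synth_shift_subblocks(const EnhMode *m)
{
    return m->delay / SYNT_SUBL;
}
```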
Outline of Enhancer
The speech enhancement unit operates on sub-blocks of 80 samples,
which means that there are two/three 80-sample sub-blocks per frame.
Each of these two/three sub-blocks is enhanced separately, but in an
analogous manner.
unenhanced residual -> 1. Pitch Est -> 2. Find PSSQ -> 3. Smooth ->
4. Crit? -- Already Fulfilled --> enhanced block = smoothed block
         -- Not Fulfilled ------> 5. Use Constr. -> 6. Mix
Steps 2 to 6 are repeated for enh block 0..1/2; the results form the
enhanced residual.
Figure 4.2. Flow chart of the enhancer.
1. Pitch estimation of each of the two/three new 80-sample blocks.
2. Find the pitch-period-synchronous sequence n (for block k) by a
search around the estimated pitch value. Do this for n=1,2,3,
-1,-2,-3.
3. Calculate the smoothed residual generated by the six pitch-
period-synchronous sequences from the prior step.
4. Check if the smoothed residual satisfies the criterion (section
4.6.4).
5. Use constraint to calculate mixing factor (section 4.6.5).
6. Mix smoothed signal with unenhanced residual (pssq(0)).
The main idea of the enhancer is to find three 80-sample blocks
before and three 80-sample blocks after the analyzed unenhanced sub-
block and to use these to improve the quality of the excitation in
that sub-block. The six blocks are chosen so that they have the
highest possible correlation with the unenhanced sub-block that is
being enhanced. In other words, the six blocks are pitch-period-
synchronous sequences to the unenhanced sub-block.
A linear combination of the six pitch-period-synchronous sequences is
calculated that approximates the sub-block. If the squared error
between the approximation and the unenhanced sub-block is small
enough, the enhanced residual is set equal to this approximation.
For the cases when the squared error criterion is not fulfilled, a
linear combination of the approximation and the unenhanced residual
forms the enhanced residual.
Pitch estimates are needed to determine the locations of the pitch-
period-synchronous sequences in a complexity-efficient way. For each
of the new two/three sub-blocks, a pitch estimate is calculated by
finding the maximum correlation in the range from lag 20 to lag 120.
These pitch estimates are used to narrow down the search for the best
possible pitch-period-synchronous sequences.
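A pitch search over lags 20 to 120 might look like the sketch below. The correlation measure used here is an assumption of this sketch: squared cross-correlation normalized by the regressor energy, set to zero for negative correlation. The measure and any decimation used in the reference enhancer (Appendix A.16) may differ. The caller must provide at least 120 samples of history before the block. The name estimate_pitch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ENH_BLOCKL 80   /* samples per analyzed block */
#define MIN_LAG 20
#define MAX_LAG 120

/* Squared cross-correlation normalized by regressor energy;
   zero when the correlation is not positive (assumed measure). */
static float corr_measure(const float *target, const float *regressor, int len)
{
    float cross = 0.0f, energy = 0.0f;
    for (int i = 0; i < len; i++) {
        cross += target[i] * regressor[i];
        energy += regressor[i] * regressor[i];
    }
    return (cross > 0.0f && energy > 0.0f) ? cross * cross / energy : 0.0f;
}

/* block points at the first sample of an 80-sample block with at least
   MAX_LAG samples of history before it. Returns the lag in
   [MIN_LAG, MAX_LAG] with maximum correlation (smallest lag on ties). */
int estimate_pitch(const float *block)
{
    assert(block != NULL);
    int bestLag = MIN_LAG;
    float best = corr_measure(block, block - MIN_LAG, ENH_BLOCKL);
    for (int lag = MIN_LAG + 1; lag <= MAX_LAG; lag++) {
        float m = corr_measure(block, block - lag, ENH_BLOCKL);
        if (m > best) {
            best = m;
            bestLag = lag;
        }
    }
    return bestLag;
}
```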
Upon receiving the pitch estimates from the prior step, the enhancer
analyzes and enhances one 80-sample sub-block at a time. The pitch-
period-synchronous-sequences pssq(n) can be viewed as vectors of
length 80 samples each shifted n*lag samples from the current sub-
block. The six pitch-period-synchronous-sequences, pssq(-3) to
pssq(-1) and pssq(1) to pssq(3), are found one at a time by the steps
below:
1) Calculate the estimate of the position of the pssq(n). For
pssq(n) in front of pssq(0) (n > 0), the location of the pssq(n)
is estimated by moving one pitch estimate forward in time from the
exact location of pssq(n-1). Similarly, pssq(n) behind pssq(0) (n
< 0) is estimated by moving one pitch estimate backward in time
from the exact location of pssq(n+1). If the estimated pssq(n)
vector location is entirely within the enhancer memory (Figure
4.3), steps 2, 3, and 4 are performed, otherwise the pssq(n) is
set to zeros.
2) Compute the correlation between the unenhanced excitation and
vectors around the estimated location interval of pssq(n). The
correlation is calculated in the interval estimated location +/- 2
samples. This results in five correlation values.
3) The five correlation values are upsampled by a factor of 4, by
using four simple upsampling filters (MA filters with coefficients
upsFilter1.. upsFilter4). Within these the maximum value is
found, which specifies the best pitch-period with a resolution of
a quarter of a sample.
upsFilter1[7]={0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 0.000000}
upsFilter2[7]={0.015625 -0.076904 0.288330 0.862061
-0.106445 0.018799 -0.015625}
upsFilter3[7]={0.023682 -0.124268 0.601563 0.601563
-0.124268 0.023682 -0.023682}
upsFilter4[7]={0.018799 -0.106445 0.862061 0.288330
-0.076904 0.015625 -0.018799}
4) Generate the pssq(n) vector by upsampling of the excitation memory
and extracting the sequence that corresponds to the lag delay that
was calculated in prior step.
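Step 3 can be illustrated with the four filters above. In this sketch, phase j of the upsampled sequence is assumed to estimate the correlation at offset n + j/4, with tap 3 of each filter aligned to the current sample. Correlation values outside the five computed ones are treated as zero. The edge handling and alignment in the reference implementation may differ. The name refine_offset is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define UPS 4          /* upsampling factor */
#define UPS_FL 7       /* taps per polyphase filter */
#define UPS_CENTER 3   /* tap assumed aligned with the current sample */
#define CORRDIM 5      /* correlations at offsets -2..+2 */

/* upsFilter1..upsFilter4 from the text */
static const float upsFilter[UPS][UPS_FL] = {
    {0.000000f,  0.000000f, 0.000000f, 1.000000f,  0.000000f, 0.000000f,  0.000000f},
    {0.015625f, -0.076904f, 0.288330f, 0.862061f, -0.106445f, 0.018799f, -0.015625f},
    {0.023682f, -0.124268f, 0.601563f, 0.601563f, -0.124268f, 0.023682f, -0.023682f},
    {0.018799f, -0.106445f, 0.862061f, 0.288330f, -0.076904f, 0.015625f, -0.018799f}
};

/* Value of x[i], or zero outside [0, len) (assumed edge handling). */
static float sample_or_zero(const float *x, int len, int i)
{
    return (i >= 0 && i < len) ? x[i] : 0.0f;
}

/* corr[0..4] are correlations at estimated location -2..+2. Returns the
   offset of the upsampled maximum in quarter samples (-2.0 .. +2.75). */
float refine_offset(const float corr[CORRDIM])
{
    assert(corr != NULL);
    float best = 0.0f;
    int bestIdx = -1;
    for (int n = 0; n < CORRDIM; n++) {
        for (int j = 0; j < UPS; j++) {
            float v = 0.0f;
            for (int k = 0; k < UPS_FL; k++)
                v += upsFilter[j][k] * sample_or_zero(corr, CORRDIM, n + UPS_CENTER - k);
            if (bestIdx < 0 || v > best) {
                best = v;
                bestIdx = n * UPS + j;
            }
        }
    }
    return (float)bestIdx / UPS - 2.0f;
}
```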
With the steps above, all the pssq(n) can be found in an iterative
manner, first moving backward in time from pssq(0) and then forward
in time from pssq(0).
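The iteration over n can be sketched at integer resolution, leaving out the quarter-sample refinement of step 3 for brevity. This sketch makes several assumptions. A sequence is searched only if the whole +/-2 search window lies inside the enhancer memory. A sequence that cannot be placed is marked with -1, to be treated as zeros. Once one sequence is out of range, the ones further out in the same direction are also marked -1. The names find_pssq and search_one are hypothetical.

```c
#include <assert.h>

#define ENH_BLOCKL 80   /* samples per pssq */
#define ENH_SLOP 2      /* search +/- 2 samples around the estimate */
#define NPSSQ 7         /* pssq(-3) .. pssq(3) */

static float dot(const float *a, const float *b, int len)
{
    float s = 0.0f;
    for (int i = 0; i < len; i++)
        s += a[i] * b[i];
    return s;
}

/* Best start index within est +/- ENH_SLOP by correlation with the
   block at pos0, or -1 if the search window leaves the buffer. */
static int search_one(const float *buf, int bufLen, int pos0, int est)
{
    if (est - ENH_SLOP < 0 || est + ENH_SLOP + ENH_BLOCKL > bufLen)
        return -1;
    int best = est - ENH_SLOP;
    float bestC = dot(buf + pos0, buf + best, ENH_BLOCKL);
    for (int c = est - ENH_SLOP + 1; c <= est + ENH_SLOP; c++) {
        float v = dot(buf + pos0, buf + c, ENH_BLOCKL);
        if (v > bestC) {
            bestC = v;
            best = c;
        }
    }
    return best;
}

/* pos[n + 3] receives the start index of pssq(n), or -1 (zeros). */
void find_pssq(const float *buf, int bufLen, int pos0, int lag, int pos[NPSSQ])
{
    assert(pos0 >= 0 && pos0 + ENH_BLOCKL <= bufLen && lag > 0);
    pos[3] = pos0;
    for (int n = -1; n >= -3; n--) {   /* backward in time */
        int prev = pos[n + 4];
        pos[n + 3] = (prev < 0) ? -1 : search_one(buf, bufLen, pos0, prev - lag);
    }
    for (int n = 1; n <= 3; n++) {     /* forward in time */
        int prev = pos[n + 2];
        pos[n + 3] = (prev < 0) ? -1 : search_one(buf, bufLen, pos0, prev + lag);
    }
}
```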
0              159             319             479             639
+---------------------------------------------------------------+
|  -5   |  -4   |  -3   |  -2   |  -1   |   0   |   1   |   2   |
+---------------------------------------------------------------+
(pssq 0 spans the last 40 samples of sub-block 0 and the first 40
samples of sub-block 1; pssq -1, -2, -3 lie successively earlier in
time and pssq 1, 2, 3 successively later.)
Figure 4.3. Enhancement for 20 ms frame size.
Figure 4.3 depicts pitch-period-synchronous sequences in the
enhancement of the first 80-sample block in the 20 ms frame size
mode. The unenhanced signal input is stored in the last two sub-
blocks (1 - 2), and the six other sub-blocks contain unenhanced
residual prior-in-time. We perform the enhancement algorithm on two
blocks of 80 samples, where the first of the two blocks consists of
the last 40 samples of sub-block 0 and the first 40 samples of sub-
block 1. The second 80-sample block consists of the last 40 samples
of sub-block 1 and the first 40 samples of sub-block 2.
0              159             319             479             639
+---------------------------------------------------------------+
|  -4   |  -3   |  -2   |  -1   |   0   |   1   |   2   |   3   |
+---------------------------------------------------------------+
(pssq 0 coincides with sub-block 0; pssq -1, -2, -3 lie successively
earlier in time and pssq 1, 2, 3 successively later.)
Figure 4.4. Enhancement for 30 ms frame size.
Figure 4.4 depicts pitch-period-synchronous sequences in the
enhancement of the first 80-sample block in the 30 ms frame size
mode. The unenhanced signal input is stored in the last three sub-
blocks (1 - 3). The five other sub-blocks contain unenhanced
residual prior-in-time. The enhancement algorithm is performed on
the three 80-sample sub-blocks 0, 1, and 2.
A linear combination of the six pssq(n) (n!=0) forms a smoothed
approximation, z, of pssq(0). Most of the weight is put on the
sequences that are close to pssq(0), as these are likely to be most
similar to pssq(0). The smoothed vector is also rescaled so that the
energy of z is the same as the energy of pssq(0).
y = sum over i in {-3,-2,-1,1,2,3} of pssq(i) * pssq_weight(i)
pssq_weight(i) = 0.5*(1-cos(2*pi*(i+4)/(2*3+2)))
z = C * y, where C = ||pssq(0)||/||y||
The criterion of the enhancer is that the enhanced excitation is not
allowed to differ much from the unenhanced excitation. This
criterion is checked for each 80-sample sub-block.
e < (b * ||pssq(0)||^2), where b=0.05 and (Constraint 1)
e = (pssq(0)-z)*(pssq(0)-z), and "*" means the dot product
From the criterion in the previous section, it is clear that the
excitation is not allowed to change much. The purpose of this
constraint is to prevent the creation of an enhanced signal
significantly different from the original signal. This also means
that the constraint limits the numerical size of the errors that the
enhancement procedure can make. That is especially important in
unvoiced segments and background noise segments for which increased
periodicity could lead to lower perceived quality.
When the constraint in the prior section is not met, the enhanced
residual is instead calculated through a constrained optimization by
using the Lagrange multiplier technique. The new constraint is that
e = (b * ||pssq(0)||^2) (Constraint 2)
We distinguish two solution regions for the optimization: 1) the
region where the first constraint is fulfilled and 2) the region
where the first constraint is not fulfilled and the second constraint
must be used.
In the first case, where the second constraint is not needed, the
optimized re-estimated vector is simply z, the energy-scaled version
of y.
In the second case, where the second constraint is activated and
becomes an equality constraint, we have
z= A*y + B*pssq(0)
where
A = sqrt((b-b^2/4)*(w00*w00)/ (w11*w00 - w10*w10)) and
w00 = pssq(0)*pssq(0)
w11 = y*y
w10 = y*pssq(0) (* symbolizes the dot product)
and
B = 1 - b/2 - A * w10/w00
Appendix A.16 contains a listing of a reference implementation for
the enhancement method.
Upon decoding or PLC of the LP excitation block, the decoded speech
block is obtained by running the decoded LP synthesis filter,
1/A~k(z), over the block. The synthesis filters have to be shifted
to compensate for the delay in the enhancer. For 20 ms frame size
mode, they SHOULD be shifted one 40-sample sub-block, and for 30 ms
frame size mode, they SHOULD be shifted two 40-sample sub-blocks.
The LP coefficients SHOULD be changed at the first sample of every
sub-block while keeping the filter state. For PLC blocks, one
solution is to apply the last LP coefficients of the last decoded
speech block for all sub-blocks.
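Here is a sketch of sub-block-wise synthesis filtering that switches the LP coefficients at the first sample of each sub-block while the filter state carries over. Several details are assumptions of this sketch: the state layout (most recent output first), the coefficient layout (one set of order+1 values per sub-block, with a[0] = 1), and the name synth_filter_subblocks. Shifting the coefficient sets to compensate for the enhancer delay is left to the caller.

```c
#include <assert.h>
#include <string.h>

#define MAX_ORDER 16   /* upper bound on the LP order in this sketch */

/* Runs 1/A(z) over nSub sub-blocks of subl samples each. coefs holds
   nSub coefficient sets of (order+1) values, a[0] = 1. state holds the
   last `order` outputs, most recent first, and is updated on return. */
void synth_filter_subblocks(const float *exc, float *out, int nSub, int subl,
                            const float *coefs, int order, float *state)
{
    float hist[MAX_ORDER];   /* hist[k-1] = y[n-k] */

    assert(order > 0 && order <= MAX_ORDER);
    memcpy(hist, state, (size_t)order * sizeof(float));
    for (int s = 0; s < nSub; s++) {
        /* new coefficients from the first sample of the sub-block */
        const float *a = coefs + s * (order + 1);
        for (int n = 0; n < subl; n++) {
            float y = exc[s * subl + n];
            for (int k = 1; k <= order; k++)
                y -= a[k] * hist[k - 1];
            memmove(hist + 1, hist, (size_t)(order - 1) * sizeof(float));
            hist[0] = y;
            out[s * subl + n] = y;
        }
    }
    memcpy(state, hist, (size_t)order * sizeof(float));
}
```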
The reference implementation for the synthesis filtering can be found
in Appendix A.48.
If desired, the decoded block can be filtered by a high-pass filter.
This removes the low frequencies of the decoded signal. A reference
implementation of this, with cutoff at 65 Hz, is shown in Appendix
A.30.
This algorithm for the coding of speech signals is not subject to any
known security considerations; however, its RTP payload format [1] is
subject to several considerations, which are addressed there.
Confidentiality of the media streams is achieved by encryption;
therefore external mechanisms, such as SRTP [5], MAY be used for that
purpose.
It is possible, and suggested, to evaluate iLBC implementations by
using the methodology and tools available at
http://www.ilbcfreeware.org/evaluation.html
[1] Duric, A. and S. Andersen, "Real-time Transport Protocol (RTP)
Payload Format for internet Low Bit Rate Codec (iLBC) Speech",
RFC 3952, December 2004.
[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[3] PacketCable(TM) Audio/Video Codecs Specification, Cable
Television Laboratories, Inc.
[4] ITU-T Recommendation G.711, available online from the ITU
bookstore at http://www.itu.int.
[5] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norman,
"The Secure Real Time Transport Protocol (SRTP)", RFC 3711, March
2004.
Besides the listed authors, this extensive work has the following
authors, who could not be listed among the "official" authors (due
to IESG restrictions on the number of authors who can be listed):
Manohar N. Murthi (Department of Electrical and Computer
Engineering, University of Miami), Fredrik Galschiodt, Julian
Spittka, and Jan Skoglund (Global IP Sound).
The authors are deeply indebted to the following people and thank
them sincerely:
Henry Sinnreich, Patrik Faltstrom, Alan Johnston, and Jean-
Francois Mule for great support of the iLBC initiative and for
valuable feedback and comments.
Peter Vary, Frank Mertz, and Christoph Erdmann (RWTH Aachen);
Vladimir Cuperman (Niftybox LLC); Thomas Eriksson (Chalmers Univ
of Tech), and Gernot Kubin (TU Graz), for thorough review of the
iLBC document and their valuable feedback and remarks.
APPENDIX A. Reference Implementation
This appendix contains the complete C code for a reference
implementation of the encoder and decoder for the specified codec.
The C code consists of the following files with highest-level
functions:
iLBC_test.c: main function for evaluation purpose
iLBC_encode.h: encoder header
iLBC_encode.c: encoder function
iLBC_decode.h: decoder header
iLBC_decode.c: decoder function
The following files contain global defines and constants:
iLBC_define.h: global defines
constants.h: global constants header
constants.c: global constants memory allocations
The following files contain subroutines:
anaFilter.h: lpc analysis filter header
anaFilter.c: lpc analysis filter function
createCB.h: codebook construction header
createCB.c: codebook construction function
doCPLC.h: packet loss concealment header
doCPLC.c: packet loss concealment function
enhancer.h: signal enhancement header
enhancer.c: signal enhancement function
filter.h: general filter header
filter.c: general filter functions
FrameClassify.h: start state classification header
FrameClassify.c: start state classification function
gainquant.h: gain quantization header
gainquant.c: gain quantization function
getCBvec.h: codebook vector construction header
getCBvec.c: codebook vector construction function
helpfun.h: general purpose header
helpfun.c: general purpose functions
hpInput.h: input high pass filter header
hpInput.c: input high pass filter function
hpOutput.h: output high pass filter header
hpOutput.c: output high pass filter function
iCBConstruct.h: excitation decoding header
iCBConstruct.c: excitation decoding function
iCBSearch.h: excitation encoding header
iCBSearch.c: excitation encoding function
LPCdecode.h: lpc decoding header
LPCdecode.c: lpc decoding function
LPCencode.h: lpc encoding header
LPCencode.c: lpc encoding function
lsf.h: line spectral frequencies header
lsf.c: line spectral frequencies functions
packing.h: bitstream packetization header
packing.c: bitstream packetization functions
StateConstructW.h: state decoding header
StateConstructW.c: state decoding functions
StateSearchW.h: state encoding header
StateSearchW.c: state encoding function
syntFilter.h: lpc synthesis filter header
syntFilter.c: lpc synthesis filter function
The implementation is portable and should work on many different
platforms. However, it is not difficult to optimize the
implementation on particular platforms, an exercise left to the
reader.
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
iLBC_decode.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __iLBC_ILBCDECODE_H
#define __iLBC_ILBCDECODE_H
#include "iLBC_define.h"
short initDecode(                  /* (o) Number of decoded
                                          samples */
    iLBC_Dec_Inst_t *iLBCdec_inst, /* (i/o) Decoder instance */
    int mode,                      /* (i) frame size mode */
    int use_enhancer               /* (i) 1 to use enhancer
                                          0 to run without
                                          enhancer */
);
void iLBC_decode(
    float *decblock,               /* (o) decoded signal block */
    unsigned char *bytes,          /* (i) encoded signal bits */
    iLBC_Dec_Inst_t *iLBCdec_inst, /* (i/o) the decoder state
                                          structure */
    int mode                       /* (i) 0: bad packet, PLC,
                                          1: normal */
);
#endif
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
anaFilter.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __iLBC_ANAFILTER_H
#define __iLBC_ANAFILTER_H
void anaFilter(
    float *In,   /* (i) Signal to be filtered */
    float *a,    /* (i) LP parameters */
    int len,     /* (i) Length of signal */
    float *Out,  /* (o) Filtered signal */
    float *mem   /* (i/o) Filter state */
);
#endif
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
anaFilter.c
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#include <string.h>
#include "iLBC_define.h"
/*----------------------------------------------------------------*
* LP analysis filter.
*---------------------------------------------------------------*/
void anaFilter(
    float *In,   /* (i) Signal to be filtered */
    float *a,    /* (i) LP parameters */
    int len,     /* (i) Length of signal */
    float *Out,  /* (o) Filtered signal */
    float *mem   /* (i/o) Filter state */
){
    int i, j;
    float *po, *pi, *pm, *pa;
    po = Out;
    /* Filter first part using memory from past */
    for (i=0; i<LPC_FILTERORDER; i++) {
        pi = &In[i];
        pm = &mem[LPC_FILTERORDER-1];
        pa = a;
        *po=0.0;
        for (j=0; j<=i; j++) {
            *po+=(*pa++)*(*pi--);
        }
        for (j=i+1; j<LPC_FILTERORDER+1; j++) {
            *po+=(*pa++)*(*pm--);
        }
        po++;
    }
    /* Filter last part where the state is entirely
       in the input vector */
    for (i=LPC_FILTERORDER; i<len; i++) {
        pi = &In[i];
        pa = a;
        *po=0.0;
        for (j=0; j<LPC_FILTERORDER+1; j++) {
            *po+=(*pa++)*(*pi--);
        }
        po++;
    }
    /* Update state vector */
    memcpy(mem, &In[len-LPC_FILTERORDER],
        LPC_FILTERORDER*sizeof(float));
}
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
createCB.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __iLBC_CREATECB_H
#define __iLBC_CREATECB_H
void filteredCBvecs(
    float *cbvectors,   /* (o) Codebook vector for the
                               higher section */
    float *mem,         /* (i) Buffer to create codebook
                               vectors from */
    int lMem            /* (i) Length of buffer */
);
void searchAugmentedCB(
    int low,            /* (i) Start index for the search */
    int high,           /* (i) End index for the search */
    int stage,          /* (i) Current stage */
    int startIndex,     /* (i) CB index for the first
                               augmented vector */
    float *target,      /* (i) Target vector for encoding */
    float *buffer,      /* (i) Pointer to the end of the
                               buffer for augmented codebook
                               construction */
    float *max_measure, /* (i/o) Currently maximum measure */
    int *best_index,    /* (o) Currently the best index */
    float *gain,        /* (o) Currently the best gain */
    float *energy,      /* (o) Energy of augmented
                               codebook vectors */
    float *invenergy    /* (o) Inv energy of aug codebook
                               vectors */
);
void createAugmentedVec(
    int index,          /* (i) Index for the aug vector
                               to be created */
    float *buffer,      /* (i) Pointer to the end of the
                               buffer for augmented codebook
                               construction */
    float *cbVec        /* (o) The constructed codebook vector */
);
#endif
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
createCB.c
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#include "iLBC_define.h"
#include "constants.h"
#include <string.h>
#include <math.h>
/*----------------------------------------------------------------*
* Construct an additional codebook vector by filtering the
* initial codebook buffer. This vector is then used to expand
* the codebook with an additional section.
*---------------------------------------------------------------*/
void filteredCBvecs(
    float *cbvectors,   /* (o) Codebook vectors for the
                               higher section */
    float *mem,         /* (i) Buffer to create codebook
                               vector from */
    int lMem            /* (i) Length of buffer */
){
    int j, k;
    float *pp, *pp1;
    float tempbuff2[CB_MEML+CB_FILTERLEN];
    float *pos;
    memset(tempbuff2, 0, (CB_HALFFILTERLEN-1)*sizeof(float));
    memcpy(&tempbuff2[CB_HALFFILTERLEN-1], mem, lMem*sizeof(float));
    memset(&tempbuff2[lMem+CB_HALFFILTERLEN-1], 0,
        (CB_HALFFILTERLEN+1)*sizeof(float));
    /* Create codebook vector for higher section by filtering */
    /* do filtering */
    pos=cbvectors;
    memset(pos, 0, lMem*sizeof(float));
    for (k=0; k<lMem; k++) {
        pp=&tempbuff2[k];
        pp1=&cbfiltersTbl[CB_FILTERLEN-1];
        for (j=0;j<CB_FILTERLEN;j++) {
            (*pos)+=(*pp++)*(*pp1--);
        }
        pos++;
    }
}
/*----------------------------------------------------------------*
* Search the augmented part of the codebook to find the best
* measure.
*----------------------------------------------------------------*/
void searchAugmentedCB(
    int low,            /* (i) Start index for the search */
    int high,           /* (i) End index for the search */
    int stage,          /* (i) Current stage */
    int startIndex,     /* (i) Codebook index for the first
                               aug vector */
    float *target,      /* (i) Target vector for encoding */
    float *buffer,      /* (i) Pointer to the end of the buffer for
                               augmented codebook construction */
    float *max_measure, /* (i/o) Currently maximum measure */
    int *best_index,    /* (o) Currently the best index */
    float *gain,        /* (o) Currently the best gain */
    float *energy,      /* (o) Energy of augmented codebook
                               vectors */
    float *invenergy    /* (o) Inv energy of augmented codebook
                               vectors */
) {
    int icount, ilow, j, tmpIndex;
    float *pp, *ppo, *ppi, *ppe, crossDot, alfa;
    float weighted, measure, nrjRecursive;
    float ftmp;
    /* Compute the energy for the first (low-5)
       noninterpolated samples */
    nrjRecursive = (float) 0.0;
    pp = buffer - low + 1;
    for (j=0; j<(low-5); j++) {
        nrjRecursive += ( (*pp)*(*pp) );
        pp++;
    }
    ppe = buffer - low;
    for (icount=low; icount<=high; icount++) {
        /* Index of the codebook vector used for retrieving
           energy values */
        tmpIndex = startIndex+icount-20;
        ilow = icount-4;
        /* Update the energy recursively to save complexity */
        nrjRecursive = nrjRecursive + (*ppe)*(*ppe);
        ppe--;
        energy[tmpIndex] = nrjRecursive;
        /* Compute cross dot product for the first (icount-4)
           noninterpolated samples */
        crossDot = (float) 0.0;
        pp = buffer-icount;
        for (j=0; j<ilow; j++) {
            crossDot += target[j]*(*pp++);
        }
        /* interpolation */
        alfa = (float) 0.2;
        ppo = buffer-4;
        ppi = buffer-icount-4;
        for (j=ilow; j<icount; j++) {
            weighted = ((float)1.0-alfa)*(*ppo)+alfa*(*ppi);
            ppo++;
            ppi++;
            energy[tmpIndex] += weighted*weighted;
            crossDot += target[j]*weighted;
            alfa += (float)0.2;
        }
        /* Compute energy and cross dot product for the
           remaining samples */
        pp = buffer - icount;
        for (j=icount; j<SUBL; j++) {
            energy[tmpIndex] += (*pp)*(*pp);
            crossDot += target[j]*(*pp++);
        }
        if (energy[tmpIndex]>0.0) {
            invenergy[tmpIndex]=(float)1.0/(energy[tmpIndex]+EPS);
        } else {
            invenergy[tmpIndex] = (float) 0.0;
        }
        if (stage==0) {
            measure = (float)-10000000.0;
            if (crossDot > 0.0) {
                measure = crossDot*crossDot*invenergy[tmpIndex];
            }
        }
        else {
            measure = crossDot*crossDot*invenergy[tmpIndex];
        }
        /* check if measure is better */
        ftmp = crossDot*invenergy[tmpIndex];
        if ((measure>*max_measure) && (fabs(ftmp)<CB_MAXGAIN)) {
            *best_index = tmpIndex;
            *max_measure = measure;
            *gain = ftmp;
        }
    }
}
/*----------------------------------------------------------------*
* Recreate a specific codebook vector from the augmented part.
*
*----------------------------------------------------------------*/
void createAugmentedVec(
    int index,      /* (i) Index for the augmented vector
                           to be created */
    float *buffer,  /* (i) Pointer to the end of the buffer for
                           augmented codebook construction */
    float *cbVec    /* (o) The constructed codebook vector */
) {
    int ilow, j;
    float *pp, *ppo, *ppi, alfa, alfa1, weighted;
    ilow = index-5;
    /* copy the first noninterpolated part */
    pp = buffer-index;
    memcpy(cbVec,pp,sizeof(float)*index);
    /* interpolation */
    alfa1 = (float)0.2;
    alfa = 0.0;
    ppo = buffer-5;
    ppi = buffer-index-5;
    for (j=ilow; j<index; j++) {
        weighted = ((float)1.0-alfa)*(*ppo)+alfa*(*ppi);
        ppo++;
        ppi++;
        cbVec[j] = weighted;
        alfa += alfa1;
    }
    /* copy the second noninterpolated part */
    pp = buffer - index;
    memcpy(cbVec+index,pp,sizeof(float)*(SUBL-index));
}
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
enhancer.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __ENHANCER_H
#define __ENHANCER_H
#include "iLBC_define.h"
float xCorrCoef(
    float *target,      /* (i) first array */
    float *regressor,   /* (i) second array */
    int subl            /* (i) dimension arrays */
);
int enhancerInterface(
    float *out,         /* (o) the enhanced residual signal */
    float *in,          /* (i) the residual signal to enhance */
    iLBC_Dec_Inst_t *iLBCdec_inst
                        /* (i/o) the decoder state structure */
);
#endif
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
filter.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __iLBC_FILTER_H
#define __iLBC_FILTER_H
void AllPoleFilter(
    float *InOut,       /* (i/o) on entrance InOut[-orderCoef] to
                           InOut[-1] contain the state of the
                           filter (delayed samples). InOut[0] to
                           InOut[lengthInOut-1] contain the filter
                           input; on exit InOut[-orderCoef] to
                           InOut[-1] is unchanged and InOut[0] to
                           InOut[lengthInOut-1] contain filtered
                           samples */
    float *Coef,        /* (i) filter coefficients, Coef[0] is assumed
                           to be 1.0 */
    int lengthInOut,    /* (i) number of input/output samples */
    int orderCoef       /* (i) number of filter coefficients */
);
void AllZeroFilter(
    float *In,          /* (i) In[0] to In[lengthInOut-1] contain
                           filter input samples */
    float *Coef,        /* (i) filter coefficients (Coef[0] is assumed
                           to be 1.0) */
    int lengthInOut,    /* (i) number of input/output samples */
    int orderCoef,      /* (i) number of filter coefficients */
    float *Out          /* (i/o) on entrance Out[-orderCoef] to Out[-1]
                           contain the filter state, on exit Out[0]
                           to Out[lengthInOut-1] contain filtered
                           samples */
);
void ZeroPoleFilter(
    float *In,          /* (i) In[0] to In[lengthInOut-1] contain filter
                           input samples; In[-orderCoef] to In[-1]
                           contain the state of the all-zero section */
    float *ZeroCoef,    /* (i) filter coefficients for all-zero
                           section (ZeroCoef[0] is assumed to
                           be 1.0) */
    float *PoleCoef,    /* (i) filter coefficients for all-pole section
                           (PoleCoef[0] is assumed to be 1.0) */
    int lengthInOut,    /* (i) number of input/output samples */
    int orderCoef,      /* (i) number of filter coefficients */
    float *Out          /* (i/o) on entrance Out[-orderCoef] to Out[-1]
                           contain state of all-pole section. On
                           exit Out[0] to Out[lengthInOut-1]
                           contain filtered samples */
);
void DownSample (
    float *In,          /* (i) input samples */
    float *Coef,        /* (i) filter coefficients */
    int lengthIn,       /* (i) number of input samples */
    float *state,       /* (i) filter state */
    float *Out          /* (o) downsampled output */
);
#endif
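The declarations above share one buffer convention: the orderCoef samples stored just before index 0 carry the filter state. The following is a minimal sketch of that convention, not the reference code; all_zero(), all_pole(), zero_pole_identity(), ZP_ORDER and ZP_LEN are hypothetical names, and the coefficient values are made up. Running a signal through A(z) and then through 1/A(z) with the same coefficient vector and zero state should return the input unchanged, which makes a convenient check of the negative-index arithmetic.

```c
/* Hypothetical helpers mirroring AllZeroFilter()/AllPoleFilter():
   the `order` samples stored just before index 0 of each buffer hold
   the filter state, and coef[0] is assumed to be 1.0. */
static void all_zero(const float *in, const float *coef, int len,
                     int order, float *out)
{
    for (int n = 0; n < len; n++) {
        float acc = coef[0] * in[n];
        for (int k = 1; k <= order; k++)
            acc += coef[k] * in[n - k];          /* in[-order..-1] = state */
        out[n] = acc;
    }
}

static void all_pole(float *inout, const float *coef, int len, int order)
{
    for (int n = 0; n < len; n++)
        for (int k = 1; k <= order; k++)
            inout[n] -= coef[k] * inout[n - k];  /* inout[-order..-1] = state */
}

#define ZP_ORDER 2
#define ZP_LEN 8

/* Filter x through A(z) and then 1/A(z) with zero initial state, as
   ZeroPoleFilter() would with ZeroCoef equal to PoleCoef. The result
   y should equal x up to floating-point rounding. */
void zero_pole_identity(const float *x, float *y)
{
    /* example A(z) with zeros at z = 0.5 and z = 0.4, so 1/A(z) is stable */
    static const float a[ZP_ORDER + 1] = {1.0f, -0.9f, 0.2f};
    float inbuf[ZP_ORDER + ZP_LEN] = {0};   /* state slots + input  */
    float outbuf[ZP_ORDER + ZP_LEN] = {0};  /* state slots + output */

    for (int n = 0; n < ZP_LEN; n++)
        inbuf[ZP_ORDER + n] = x[n];
    all_zero(inbuf + ZP_ORDER, a, ZP_LEN, ZP_ORDER, outbuf + ZP_ORDER);
    all_pole(outbuf + ZP_ORDER, a, ZP_LEN, ZP_ORDER);
    for (int n = 0; n < ZP_LEN; n++)
        y[n] = outbuf[ZP_ORDER + n];
}
```

Note that the all-zero stage reads its state from the input buffer, while the all-pole stage reads its state from the output buffer, which is why ZeroPoleFilter() documents state for both In and Out.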
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
filter.c
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#include "iLBC_define.h"
/*----------------------------------------------------------------*
* all-pole filter
*---------------------------------------------------------------*/
void AllPoleFilter(
float *InOut, /* (i/o) on entrance InOut[-orderCoef] to
InOut[-1] contain the state of the
filter (delayed samples). InOut[0] to
InOut[lengthInOut-1] contain the filter
input; on exit InOut[-orderCoef] to
InOut[-1] are unchanged and InOut[0] to
InOut[lengthInOut-1] contain filtered
samples */
float *Coef,/* (i) filter coefficients, Coef[0] is assumed
to be 1.0 */
int lengthInOut,/* (i) number of input/output samples */
int orderCoef /* (i) filter order (Coef holds orderCoef+1
coefficients) */
){
int n,k;
for(n=0;n<lengthInOut;n++){
for(k=1;k<=orderCoef;k++){
*InOut -= Coef[k]*InOut[-k];
}
InOut++;
}
}
/*----------------------------------------------------------------*
* all-zero filter
*---------------------------------------------------------------*/
void AllZeroFilter(
float *In, /* (i) In[0] to In[lengthInOut-1] contain
filter input samples; In[-orderCoef] to
In[-1] contain the filter state (delayed
input samples) */
float *Coef,/* (i) filter coefficients (Coef[0] is assumed
to be 1.0) */
int lengthInOut,/* (i) number of input/output samples */
int orderCoef, /* (i) filter order (Coef holds orderCoef+1
coefficients) */
float *Out /* (o) on exit Out[0] to Out[lengthInOut-1]
contain filtered samples */
){
int n,k;
for(n=0;n<lengthInOut;n++){
*Out = Coef[0]*In[0];
for(k=1;k<=orderCoef;k++){
*Out += Coef[k]*In[-k];
}
Out++;
In++;
}
}
/*----------------------------------------------------------------*
* pole-zero filter
*---------------------------------------------------------------*/
void ZeroPoleFilter(
float *In, /* (i) In[0] to In[lengthInOut-1] contain
filter input samples; In[-orderCoef] to
In[-1] contain the state of the all-zero
section */
float *ZeroCoef,/* (i) filter coefficients for all-zero
section (ZeroCoef[0] is assumed to
be 1.0) */
float *PoleCoef,/* (i) filter coefficients for all-pole section
(PoleCoef[0] is assumed to be 1.0) */
int lengthInOut,/* (i) number of input/output samples */
int orderCoef, /* (i) filter order of both sections (each
coefficient vector holds orderCoef+1 values) */
float *Out /* (i/o) on entrance Out[-orderCoef] to Out[-1]
contain state of all-pole section. On
exit Out[0] to Out[lengthInOut-1]
contain filtered samples */
){
AllZeroFilter(In,ZeroCoef,lengthInOut,orderCoef,Out);
AllPoleFilter(Out,PoleCoef,lengthInOut,orderCoef);
}
/*----------------------------------------------------------------*
* downsample (LP filter and decimation)
*---------------------------------------------------------------*/
void DownSample (
float *In, /* (i) input samples */
float *Coef, /* (i) filter coefficients */
int lengthIn, /* (i) number of input samples */
float *state, /* (i) filter state */
float *Out /* (o) downsampled output */
){
float o;
float *Out_ptr = Out;
float *Coef_ptr, *In_ptr;
float *state_ptr;
int i, j, stop;
/* LP filter and decimate at the same time */
for (i = DELAY_DS; i < lengthIn; i+=FACTOR_DS)
{
Coef_ptr = &Coef[0];
In_ptr = &In[i];
state_ptr = &state[FILTERORDER_DS-2];
o = (float)0.0;
stop = (i < FILTERORDER_DS) ? i + 1 : FILTERORDER_DS;
for (j = 0; j < stop; j++)
{
o += *Coef_ptr++ * (*In_ptr--);
}
for (j = i + 1; j < FILTERORDER_DS; j++)
{
o += *Coef_ptr++ * (*state_ptr--);
}
*Out_ptr++ = o;
}
/* Get the last part (use zeros as input for the future) */
for (i=(lengthIn+FACTOR_DS); i<(lengthIn+DELAY_DS);
i+=FACTOR_DS) {
o=(float)0.0;
if (i<lengthIn) {
Coef_ptr = &Coef[0];
In_ptr = &In[i];
for (j=0; j<FILTERORDER_DS; j++) {
o += *Coef_ptr++ * (*In_ptr--); /* past inputs, not Out */
}
} else {
Coef_ptr = &Coef[i-lengthIn];
In_ptr = &In[lengthIn-1];
for (j=0; j<FILTERORDER_DS-(i-lengthIn); j++) {
o += *Coef_ptr++ * (*In_ptr--);
}
}
*Out_ptr++ = o;
}
}
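DownSample() low-pass filters and decimates in a single pass: instead of filtering at the full rate and discarding samples, it evaluates the filter only at the output instants DELAY_DS, DELAY_DS+FACTOR_DS, and so on. A minimal sketch of that idea follows, assuming a plain FIR filter. fir_decimate() and input_at() are hypothetical helpers, the zero-input tail handled by the second loop of DownSample() is omitted, and the actual values of FILTERORDER_DS, DELAY_DS and FACTOR_DS come from iLBC_define.h, not from this example.

```c
/* Hypothetical fused FIR filter and decimator in the spirit of
   DownSample(): only y[delay], y[delay+factor], ... of
   y[i] = sum_j c[j] * x[i-j] are computed. Inputs before x[0] come
   from `state`, which holds the ntaps-1 previous input samples with
   the most recent one last (x[-1] == state[ntaps-2]). */
static float input_at(const float *x, const float *state, int ntaps, int n)
{
    return (n >= 0) ? x[n] : state[ntaps - 1 + n];
}

/* Returns the number of output samples written to y (delay >= 0). */
int fir_decimate(const float *x, int len, const float *c, int ntaps,
                 const float *state, int delay, int factor, float *y)
{
    int m = 0;
    for (int i = delay; i < len; i += factor) {
        float acc = 0.0f;
        for (int j = 0; j < ntaps; j++)
            acc += c[j] * input_at(x, state, ntaps, i - j);
        y[m++] = acc;
    }
    return m;
}
```

Computing only the retained outputs reduces the filtering work by roughly the decimation factor compared with filtering first and decimating afterwards.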
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
FrameClassify.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __iLBC_FRAMECLASSIFY_H
#define __iLBC_FRAMECLASSIFY_H
int FrameClassify( /* index to the max-energy sub-frame */
iLBC_Enc_Inst_t *iLBCenc_inst,
/* (i/o) the encoder state structure */
float *residual /* (i) lpc residual signal */
);
#endif
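FrameClassify() returns the index of the max-energy sub-frame of the LPC residual. As an illustration only, the sketch below picks the sub-block with the largest plain sum of squares; max_energy_subblock() is a hypothetical name, and the reference function may weight or group sub-blocks differently (the encoder description in Section 3 is authoritative).

```c
/* Hypothetical helper: index of the sub-block (nsub blocks of sublen
   samples each) with the largest sum of squares; ties keep the first. */
int max_energy_subblock(const float *residual, int nsub, int sublen)
{
    int best = 0;
    float best_energy = -1.0f;
    for (int s = 0; s < nsub; s++) {
        float energy = 0.0f;
        for (int n = 0; n < sublen; n++) {
            float v = residual[s * sublen + n];
            energy += v * v;
        }
        if (energy > best_energy) {
            best_energy = energy;
            best = s;
        }
    }
    return best;
}
```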
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
gainquant.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __iLBC_GAINQUANT_H
#define __iLBC_GAINQUANT_H
float gainquant(/* (o) quantized gain value */
float in, /* (i) gain value */
float maxIn,/* (i) maximum of gain value */
int cblen, /* (i) number of quantization indices */
int *index /* (o) quantization index */
);
float gaindequant( /* (o) quantized gain value */
int index, /* (i) quantization index */
float maxIn,/* (i) maximum of unquantized gain */
int cblen /* (i) number of quantization indices */
);
#endif
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
gainquant.c
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#include <string.h>
#include <math.h>
#include "constants.h"
#include "filter.h"
/*----------------------------------------------------------------*
* quantizer for the gain in the gain-shape coding of residual
*---------------------------------------------------------------*/
float gainquant(/* (o) quantized gain value */
float in, /* (i) gain value */
float maxIn,/* (i) maximum of gain value */
int cblen, /* (i) number of quantization indices */
int *index /* (o) quantization index */
){
int i, tindex;
float minmeasure,measure, *cb, scale;
/* ensure a lower bound on the scaling factor */
scale=maxIn;
if (scale<0.1) {
scale=(float)0.1;
}
/* select the quantization table */
if (cblen == 8) {
cb = gain_sq3Tbl;
} else if (cblen == 16) {
cb = gain_sq4Tbl;
} else {
cb = gain_sq5Tbl;
}
/* select the best index in the quantization table */
minmeasure=10000000.0;
tindex=0;
for (i=0; i<cblen; i++) {
measure=(in-scale*cb[i])*(in-scale*cb[i]);
if (measure<minmeasure) {
tindex=i;
minmeasure=measure;
}
}
*index=tindex;
/* return the quantized value */
return scale*cb[tindex];
}
/*----------------------------------------------------------------*
* decoder for quantized gains in the gain-shape coding of
* residual
*---------------------------------------------------------------*/
float gaindequant( /* (o) quantized gain value */
int index, /* (i) quantization index */
float maxIn,/* (i) maximum of unquantized gain */
int cblen /* (i) number of quantization indices */
){
float scale;
/* obtain correct scale factor */
scale=(float)fabs(maxIn);
if (scale<0.1) {
scale=(float)0.1;
}
/* select the quantization table and return the decoded value */
if (cblen==8) {
return scale*gain_sq3Tbl[index];
} else if (cblen==16) {
return scale*gain_sq4Tbl[index];
}
else if (cblen==32) {
return scale*gain_sq5Tbl[index];
}
return 0.0;
}
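gainquant() and gaindequant() implement a scaled scalar quantizer: a normalized table is multiplied by a caller-supplied bound maxIn (floored at 0.1) and the nearest scaled entry is selected. A minimal sketch with a made-up 4-entry table follows; gain_quant_sketch(), gain_dequant_sketch() and example_gain_tbl are hypothetical, and the real tables (gain_sq3Tbl, gain_sq4Tbl, gain_sq5Tbl) are declared in constants.h.

```c
/* Made-up normalized gain table, for illustration only. */
static const float example_gain_tbl[4] = {0.25f, 0.5f, 0.75f, 1.0f};

/* Scale the table by max(maxIn, 0.1) and pick the nearest entry,
   mirroring the search loop in gainquant(). */
float gain_quant_sketch(float in, float maxIn, const float *cb, int cblen,
                        int *index)
{
    float scale = (maxIn < 0.1f) ? 0.1f : maxIn;
    int best = 0;
    float bestdist = (in - scale * cb[0]) * (in - scale * cb[0]);
    for (int i = 1; i < cblen; i++) {
        float d = in - scale * cb[i];
        if (d * d < bestdist) {      /* strict '<': first minimum wins */
            bestdist = d * d;
            best = i;
        }
    }
    *index = best;
    return scale * cb[best];
}

/* Decoder side, mirroring gaindequant(): same floor applied to
   |maxIn|, then a table lookup (no search needed). */
float gain_dequant_sketch(int index, float maxIn, const float *cb)
{
    float scale = (maxIn < 0.0f) ? -maxIn : maxIn;
    if (scale < 0.1f)
        scale = 0.1f;
    return scale * cb[index];
}
```

The encoder-side function floors maxIn directly while the decoder floors |maxIn|; the two agree whenever maxIn is non-negative, so encoder and decoder then reconstruct the same gain.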
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
helpfun.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __iLBC_HELPFUN_H
#define __iLBC_HELPFUN_H
void autocorr(
float *r, /* (o) autocorrelation vector */
const float *x, /* (i) data vector */
int N, /* (i) length of data vector */
int order /* (i) largest lag for calculated
autocorrelations */
);
void window(
float *z, /* (o) the windowed data */
const float *x, /* (i) the original data vector */
const float *y, /* (i) the window */
int N /* (i) length of all vectors */
);
void levdurb(
float *a, /* (o) lpc coefficient vector starting
with 1.0 */
float *k, /* (o) reflection coefficients */
float *r, /* (i) autocorrelation vector */
int order /* (i) order of lpc filter */
);
void interpolate(
float *out, /* (o) the interpolated vector */
float *in1, /* (i) the first vector for the
interpolation */
float *in2, /* (i) the second vector for the
interpolation */
float coef, /* (i) interpolation weight applied to in1;
in2 is weighted by 1-coef */
int length /* (i) length of all vectors */
);
void bwexpand(
float *out, /* (o) the bandwidth expanded lpc
coefficients */
float *in, /* (i) the lpc coefficients before bandwidth
expansion */
float coef, /* (i) the bandwidth expansion factor */
int length /* (i) the length of lpc coefficient vectors */
);
void vq(
float *Xq, /* (o) the quantized vector */
int *index, /* (o) the quantization index */
const float *CB,/* (i) the vector quantization codebook */
float *X, /* (i) the vector to quantize */
int n_cb, /* (i) the number of vectors in the codebook */
int dim /* (i) the dimension of all vectors */
);
void SplitVQ(
float *qX, /* (o) the quantized vector */
int *index, /* (o) a vector of indexes for all vector
codebooks in the split */
float *X, /* (i) the vector to quantize */
const float *CB,/* (i) the quantizer codebook */
int nsplit, /* (i) the number of vector splits */
const int *dim, /* (i) the dimension of each split of X and qX */
const int *cbsize /* (i) the number of vectors in the codebook
of each split */
);
void sort_sq(
float *xq, /* (o) the quantized value */
int *index, /* (o) the quantization index */
float x, /* (i) the value to quantize */
const float *cb,/* (i) the quantization codebook */
int cb_size /* (i) the size of the quantization codebook */
);
int LSF_check( /* (o) 1 if any lsf value was modified to
enforce stability, 0 otherwise */
float *lsf, /* (i/o) a table of lsf vectors, corrected
in place */
int dim, /* (i) the dimension of each lsf vector */
int NoAn /* (i) the number of lsf vectors in the
table */
);
#endif
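SplitVQ() quantizes a vector in consecutive parts, each with its own small codebook, by calling vq() once per part. Compared with one joint codebook of the same total bit count, whose size would be the product of the split codebook sizes, the search cost drops to the sum of the split codebook sizes. The sketch below restates the pointer bookkeeping under hypothetical names (vq_sketch(), split_vq_sketch()) with a toy codebook; codebooks are stored row-major and back to back, as SplitVQ() assumes.

```c
#include <float.h>

/* Hypothetical full-search VQ: index of the nearest codevector in
   squared Euclidean distance; the winner is copied to xq. */
static int vq_sketch(float *xq, const float *cb, const float *x,
                     int n_cb, int dim)
{
    int best = 0;
    float bestdist = FLT_MAX;
    for (int j = 0; j < n_cb; j++) {
        float d = 0.0f;
        for (int i = 0; i < dim; i++) {
            float t = x[i] - cb[j * dim + i];
            d += t * t;
        }
        if (d < bestdist) {
            bestdist = d;
            best = j;
        }
    }
    for (int i = 0; i < dim; i++)
        xq[i] = cb[best * dim + i];
    return best;
}

/* Split VQ: part s has dim[s] elements and a codebook of cbsize[s]
   codevectors; the codebooks follow each other in cb. */
void split_vq_sketch(float *xq, int *index, const float *x, const float *cb,
                     int nsplit, const int *dim, const int *cbsize)
{
    int xpos = 0, cbpos = 0;
    for (int s = 0; s < nsplit; s++) {
        index[s] = vq_sketch(xq + xpos, cb + cbpos, x + xpos,
                             cbsize[s], dim[s]);
        xpos += dim[s];
        cbpos += dim[s] * cbsize[s];
    }
}
```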
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
helpfun.c
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#include <math.h>
#include "iLBC_define.h"
#include "constants.h"
/*----------------------------------------------------------------*
* calculation of auto correlation
*---------------------------------------------------------------*/
void autocorr(
float *r, /* (o) autocorrelation vector */
const float *x, /* (i) data vector */
int N, /* (i) length of data vector */
int order /* (i) largest lag for calculated
autocorrelations */
){
int lag, n;
float sum;
for (lag = 0; lag <= order; lag++) {
sum = 0;
for (n = 0; n < N - lag; n++) {
sum += x[n] * x[n+lag];
}
r[lag] = sum;
}
}
/*----------------------------------------------------------------*
* window multiplication
*---------------------------------------------------------------*/
void window(
float *z, /* (o) the windowed data */
const float *x, /* (i) the original data vector */
const float *y, /* (i) the window */
int N /* (i) length of all vectors */
){
int i;
for (i = 0; i < N; i++) {
z[i] = x[i] * y[i];
}
}
/*----------------------------------------------------------------*
* levinson-durbin solution for lpc coefficients
*---------------------------------------------------------------*/
void levdurb(
float *a, /* (o) lpc coefficient vector starting
with 1.0 */
float *k, /* (o) reflection coefficients */
float *r, /* (i) autocorrelation vector */
int order /* (i) order of lpc filter */
){
float sum, alpha;
int m, m_h, i;
a[0] = 1.0;
if (r[0] < EPS) { /* if r[0] < EPS, set LPC coeff. to zero */
for (i = 0; i < order; i++) {
k[i] = 0;
a[i+1] = 0;
}
} else {
a[1] = k[0] = -r[1]/r[0];
alpha = r[0] + r[1] * k[0];
for (m = 1; m < order; m++){
sum = r[m + 1];
for (i = 0; i < m; i++){
sum += a[i+1] * r[m - i];
}
k[m] = -sum / alpha;
alpha += k[m] * sum;
m_h = (m + 1) >> 1;
for (i = 0; i < m_h; i++){
sum = a[i+1] + k[m] * a[m - i];
a[m - i] += k[m] * a[i+1];
a[i+1] = sum;
}
a[m+1] = k[m];
}
}
}
/*----------------------------------------------------------------*
* interpolation between vectors
*---------------------------------------------------------------*/
void interpolate(
float *out, /* (o) the interpolated vector */
float *in1, /* (i) the first vector for the
interpolation */
float *in2, /* (i) the second vector for the
interpolation */
float coef, /* (i) interpolation weight applied to in1;
in2 is weighted by 1-coef */
int length /* (i) length of all vectors */
){
int i;
float invcoef;
invcoef = (float)1.0 - coef;
for (i = 0; i < length; i++) {
out[i] = coef * in1[i] + invcoef * in2[i];
}
}
/*----------------------------------------------------------------*
* lpc bandwidth expansion
*---------------------------------------------------------------*/
void bwexpand(
float *out, /* (o) the bandwidth expanded lpc
coefficients */
float *in, /* (i) the lpc coefficients before bandwidth
expansion */
float coef, /* (i) the bandwidth expansion factor */
int length /* (i) the length of lpc coefficient vectors */
){
int i;
float chirp;
chirp = coef;
out[0] = in[0];
for (i = 1; i < length; i++) {
out[i] = chirp * in[i];
chirp *= coef;
}
}
/*----------------------------------------------------------------*
* vector quantization
*---------------------------------------------------------------*/
void vq(
float *Xq, /* (o) the quantized vector */
int *index, /* (o) the quantization index */
const float *CB,/* (i) the vector quantization codebook */
float *X, /* (i) the vector to quantize */
int n_cb, /* (i) the number of vectors in the codebook */
int dim /* (i) the dimension of all vectors */
){
int i, j;
int pos, minindex;
float dist, tmp, mindist;
pos = 0;
mindist = FLOAT_MAX;
minindex = 0;
for (j = 0; j < n_cb; j++) {
dist = X[0] - CB[pos];
dist *= dist;
for (i = 1; i < dim; i++) {
tmp = X[i] - CB[pos + i];
dist += tmp*tmp;
}
if (dist < mindist) {
mindist = dist;
minindex = j;
}
pos += dim;
}
for (i = 0; i < dim; i++) {
Xq[i] = CB[minindex*dim + i];
}
*index = minindex;
}
/*----------------------------------------------------------------*
* split vector quantization
*---------------------------------------------------------------*/
void SplitVQ(
float *qX, /* (o) the quantized vector */
int *index, /* (o) a vector of indexes for all vector
codebooks in the split */
float *X, /* (i) the vector to quantize */
const float *CB,/* (i) the quantizer codebook */
int nsplit, /* (i) the number of vector splits */
const int *dim, /* (i) the dimension of each split of X and qX */
const int *cbsize /* (i) the number of vectors in the codebook
of each split */
){
int cb_pos, X_pos, i;
cb_pos = 0;
X_pos= 0;
for (i = 0; i < nsplit; i++) {
vq(qX + X_pos, index + i, CB + cb_pos, X + X_pos,
cbsize[i], dim[i]);
X_pos += dim[i];
cb_pos += dim[i] * cbsize[i];
}
}
/*----------------------------------------------------------------*
* scalar quantization
*---------------------------------------------------------------*/
void sort_sq(
float *xq, /* (o) the quantized value */
int *index, /* (o) the quantization index */
float x, /* (i) the value to quantize */
const float *cb,/* (i) the quantization codebook */
int cb_size /* (i) the size of the quantization codebook */
){
int i;
if (x <= cb[0]) {
*index = 0;
*xq = cb[0];
} else {
i = 0;
while ((x > cb[i]) && i < cb_size - 1) {
i++;
}
if (x > ((cb[i] + cb[i - 1])/2)) {
*index = i;
*xq = cb[i];
} else {
*index = i - 1;
*xq = cb[i - 1];
}
}
}
/*----------------------------------------------------------------*
* check for stability of lsf coefficients
*---------------------------------------------------------------*/
int LSF_check( /* (o) 1 if any lsf value was modified to
enforce stability, 0 otherwise */
float *lsf, /* (i/o) a table of lsf vectors, corrected
in place */
int dim, /* (i) the dimension of each lsf vector */
int NoAn /* (i) the number of lsf vectors in the
table */
){
int k,n,m, Nit=2, change=0,pos;
float tmp;
static float eps=(float)0.039; /* 50 Hz */
static float eps2=(float)0.0195;
static float maxlsf=(float)3.14; /* 4000 Hz */
static float minlsf=(float)0.01; /* 0 Hz */
/* LSF separation check*/
for (n=0; n<Nit; n++) { /* Run through a couple of times */
for (m=0; m<NoAn; m++) { /* Number of analyses per frame */
for (k=0; k<(dim-1); k++) {
pos=m*dim+k;
if ((lsf[pos+1]-lsf[pos])<eps) {
if (lsf[pos+1]<lsf[pos]) {
tmp=lsf[pos+1];
lsf[pos+1]= lsf[pos]+eps2;
lsf[pos]= lsf[pos+1]-eps2;
} else {
lsf[pos]-=eps2;
lsf[pos+1]+=eps2;
}
change=1;
}
if (lsf[pos]<minlsf) {
lsf[pos]=minlsf;
change=1;
}
if (lsf[pos]>maxlsf) {
lsf[pos]=maxlsf;
change=1;
}
}
}
}
return change;
}
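autocorr() and levdurb() together turn a windowed frame into LPC coefficients: the Levinson-Durbin recursion solves the autocorrelation normal equations order by order in O(order^2) operations, producing the reflection coefficients along the way. The sketch below is a restatement under hypothetical names (autocorr_sketch(), levdurb_sketch()). It uses r[0] <= 0 as the silence test where the reference code compares against EPS from iLBC_define.h, and it assumes r is a valid (positive definite) autocorrelation so that alpha stays positive. For r = {1, 0.5, 0.25}, the autocorrelation of a first-order autoregressive process, it should return A(z) = 1 - 0.5 z^-1 with a zero second coefficient.

```c
/* Autocorrelation r[0..order] of x[0..n-1], as in autocorr(). */
void autocorr_sketch(float *r, const float *x, int n, int order)
{
    for (int lag = 0; lag <= order; lag++) {
        float sum = 0.0f;
        for (int i = 0; i < n - lag; i++)
            sum += x[i] * x[i + lag];
        r[lag] = sum;
    }
}

/* Levinson-Durbin: from r[0..order] compute
   A(z) = 1 + a[1]z^-1 + ... + a[order]z^-order and the reflection
   coefficients k[0..order-1]; alpha is the prediction error energy. */
void levdurb_sketch(float *a, float *k, const float *r, int order)
{
    a[0] = 1.0f;
    if (r[0] <= 0.0f) {                  /* no energy: A(z) = 1 */
        for (int i = 0; i < order; i++) {
            k[i] = 0.0f;
            a[i + 1] = 0.0f;
        }
        return;
    }
    float alpha = r[0];
    for (int m = 0; m < order; m++) {
        float acc = r[m + 1];
        for (int i = 1; i <= m; i++)
            acc += a[i] * r[m + 1 - i];
        k[m] = -acc / alpha;
        alpha *= 1.0f - k[m] * k[m];
        /* a_new[i] = a[i] + k[m] * a[m+1-i], updated pairwise in place */
        for (int i = 1, j = m; i <= j; i++, j--) {
            float ai = a[i], aj = a[j];
            a[i] = ai + k[m] * aj;
            if (i != j)
                a[j] = aj + k[m] * ai;
        }
        a[m + 1] = k[m];
    }
}
```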
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
hpOutput.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __iLBC_HPOUTPUT_H
#define __iLBC_HPOUTPUT_H
void hpOutput(
float *In, /* (i) vector to filter */
int len,/* (i) length of vector to filter */
float *Out, /* (o) the resulting filtered vector */
float *mem /* (i/o) the filter state */
);
#endif
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
iCBConstruct.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __iLBC_ICBCONSTRUCT_H
#define __iLBC_ICBCONSTRUCT_H
void index_conv_enc(
int *index /* (i/o) Codebook indexes */
);
void index_conv_dec(
int *index /* (i/o) Codebook indexes */
);
void iCBConstruct(
float *decvector, /* (o) Decoded vector */
int *index, /* (i) Codebook indices */
int *gain_index,/* (i) Gain quantization indices */
float *mem, /* (i) Buffer for codevector construction */
int lMem, /* (i) Length of buffer */
int veclen, /* (i) Length of vector */
int nStages /* (i) Number of codebook stages */
);
#endif
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
iCBSearch.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __iLBC_ICBSEARCH_H
#define __iLBC_ICBSEARCH_H
void iCBSearch(
iLBC_Enc_Inst_t *iLBCenc_inst,
/* (i) the encoder state structure */
int *index, /* (o) Codebook indices */
int *gain_index,/* (o) Gain quantization indices */
float *intarget,/* (i) Target vector for encoding */
float *mem, /* (i) Buffer for codebook construction */
int lMem, /* (i) Length of buffer */
int lTarget, /* (i) Length of vector */
int nStages, /* (i) Number of codebook stages */
float *weightDenum, /* (i) weighting filter coefficients */
float *weightState, /* (i) weighting filter state */
int block /* (i) the sub-block number */
);
#endif
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
packing.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __PACKING_H
#define __PACKING_H
void packsplit(
int *index, /* (i) the value to split */
int *firstpart, /* (o) the value specified by most
significant bits */
int *rest, /* (o) the value specified by least
significant bits */
int bitno_firstpart, /* (i) number of bits in most
significant part */
int bitno_total /* (i) number of bits in full range
of value */
);
void packcombine(
int *index, /* (i/o) the msb value in the
combined value out */
int rest, /* (i) the lsb value */
int bitno_rest /* (i) the number of bits in the
lsb part */
);
void dopack(
unsigned char **bitstream, /* (i/o) on entrance pointer to
place in bitstream to pack
new data, on exit pointer
to place in bitstream to
pack future data */
int index, /* (i) the value to pack */
int bitno, /* (i) the number of bits that the
value will fit within */
int *pos /* (i/o) write position in the
current byte */
);
void unpack(
unsigned char **bitstream, /* (i/o) on entrance pointer to
place in bitstream to
unpack new data from, on
exit pointer to place in
bitstream to unpack future
data from */
int *index, /* (o) resulting value */
int bitno, /* (i) number of bits used to
represent the value */
int *pos /* (i/o) read position in the
current byte */
);
#endif
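packsplit() and packcombine() divide an index into its most significant bits and its remaining low bits, so that the two parts can be written to different places in the bitstream and joined again after unpacking. A hypothetical restatement (split_bits(), combine_bits()), assuming non-negative indices that fit in bitno_total bits:

```c
/* Hypothetical equivalents of packsplit() and packcombine(). */
void split_bits(int value, int bitno_first, int bitno_total,
                int *first, int *rest)
{
    int bitno_rest = bitno_total - bitno_first;
    *first = value >> bitno_rest;             /* top bitno_first bits */
    *rest = value - (*first << bitno_rest);   /* remaining low bits   */
}

int combine_bits(int first, int rest, int bitno_rest)
{
    return (first << bitno_rest) + rest;
}
```

For example, the 6-bit value 45 (binary 10 1101) split after its first 2 bits gives 2 and 13, and combining 2 with the 4-bit remainder 13 gives 45 again.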
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
packing.c
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#include <math.h>
#include <stdlib.h>
#include "iLBC_define.h"
#include "constants.h"
#include "helpfun.h"
#include <string.h>
/*----------------------------------------------------------------*
* splitting an integer into first most significant bits and
* remaining least significant bits
*---------------------------------------------------------------*/
void packsplit(
int *index, /* (i) the value to split */
int *firstpart, /* (o) the value specified by most
significant bits */
int *rest, /* (o) the value specified by least
significant bits */
int bitno_firstpart, /* (i) number of bits in most
significant part */
int bitno_total /* (i) number of bits in full range
of value */
){
int bitno_rest = bitno_total-bitno_firstpart;
*firstpart = *index>>(bitno_rest);
*rest = *index-(*firstpart<<(bitno_rest));
}
/*----------------------------------------------------------------*
* combining a value corresponding to msb's with a value
* corresponding to lsb's
*---------------------------------------------------------------*/
void packcombine(
int *index, /* (i/o) the msb value in the
combined value out */
int rest, /* (i) the lsb value */
int bitno_rest /* (i) the number of bits in the
lsb part */
){
*index = *index<<bitno_rest;
*index += rest;
}
/*----------------------------------------------------------------*
* packing of bits into bitstream, i.e., vector of bytes
*---------------------------------------------------------------*/
void dopack(
unsigned char **bitstream, /* (i/o) on entrance pointer to
place in bitstream to pack
new data, on exit pointer
to place in bitstream to
pack future data */
int index, /* (i) the value to pack */
int bitno, /* (i) the number of bits that the
value will fit within */
int *pos /* (i/o) write position in the
current byte */
){
int posLeft;
/* Clear the bits before starting in a new byte */
if ((*pos)==0) {
**bitstream=0;
}
while (bitno>0) {
/* Jump to the next byte if end of this byte is reached*/
if (*pos==8) {
*pos=0;
(*bitstream)++;
**bitstream=0;
}
posLeft=8-(*pos);
/* Insert index into the bitstream */
if (bitno <= posLeft) {
**bitstream |= (unsigned char)(index<<(posLeft-bitno));
*pos+=bitno;
bitno=0;
} else {
**bitstream |= (unsigned char)(index>>(bitno-posLeft));
*pos=8;
index-=((index>>(bitno-posLeft))<<(bitno-posLeft));
bitno-=posLeft;
}
}
}
/*----------------------------------------------------------------*
* unpacking of bits from bitstream, i.e., vector of bytes
*---------------------------------------------------------------*/
void unpack(
unsigned char **bitstream, /* (i/o) on entrance pointer to
place in bitstream to
unpack new data from, on
exit pointer to place in
bitstream to unpack future
data from */
int *index, /* (o) resulting value */
int bitno, /* (i) number of bits used to
represent the value */
int *pos /* (i/o) read position in the
current byte */
){
int BitsLeft;
*index=0;
while (bitno>0) {
/* move forward in bitstream when the end of the
byte is reached */
if (*pos==8) {
*pos=0;
(*bitstream)++;
}
BitsLeft=8-(*pos);
/* Extract bits to index */
if (BitsLeft>=bitno) {
*index+=((((**bitstream)<<(*pos)) & 0xFF)>>(8-bitno));
*pos+=bitno;
bitno=0;
} else {
if ((8-bitno)>0) {
*index+=((((**bitstream)<<(*pos)) & 0xFF)>>
(8-bitno));
*pos=8;
} else {
*index+=(((int)(((**bitstream)<<(*pos)) & 0xFF))<<
(bitno-8));
*pos=8;
}
bitno-=BitsLeft;
}
}
}
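dopack() and unpack() write and read values most significant bit first, with *pos counting the bits already used in the current byte. The sketch below re-expresses the same convention compactly and is not the reference code: put_bits() and get_bits() are hypothetical names, values are unsigned so that shifts are well defined, and bitno is assumed to be at most 31.

```c
/* Hypothetical MSB-first bit writer: a byte is cleared before the
   first write to it, as in dopack(). */
void put_bits(unsigned char **bs, unsigned value, int bitno, int *pos)
{
    while (bitno > 0) {
        if (*pos == 8) { *pos = 0; (*bs)++; }   /* move to next byte */
        if (*pos == 0) **bs = 0;
        int room = 8 - *pos;                    /* free bits in byte */
        int n = (bitno < room) ? bitno : room;  /* bits written now  */
        unsigned chunk = (value >> (bitno - n)) & ((1u << n) - 1u);
        **bs |= (unsigned char)(chunk << (room - n));
        *pos += n;
        bitno -= n;
    }
}

/* Hypothetical MSB-first bit reader matching put_bits(). */
unsigned get_bits(unsigned char **bs, int bitno, int *pos)
{
    unsigned value = 0;
    while (bitno > 0) {
        if (*pos == 8) { *pos = 0; (*bs)++; }
        int room = 8 - *pos;
        int n = (bitno < room) ? bitno : room;
        unsigned chunk = ((unsigned)**bs >> (room - n)) & ((1u << n) - 1u);
        value = (value << n) | chunk;
        *pos += n;
        bitno -= n;
    }
    return value;
}
```

Writing the values 5, 102 and 7 with 3, 7 and 6 bits fills exactly two bytes, 0xB9 and 0x87, and reading them back with the same bit counts recovers the values.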
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
StateConstructW.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __iLBC_STATECONSTRUCTW_H
#define __iLBC_STATECONSTRUCTW_H
void StateConstructW(
int idxForMax, /* (i) 6-bit index for the quantization of
max amplitude */
int *idxVec, /* (i) vector of quantization indexes */
float *syntDenum, /* (i) synthesis filter denominator */
float *out, /* (o) the decoded state vector */
int len /* (i) length of a state vector */
);
#endif
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
StateConstructW.c
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#include <math.h>
#include <string.h>
#include "iLBC_define.h"
#include "constants.h"
#include "filter.h"
/*----------------------------------------------------------------*
* decoding of the start state
*---------------------------------------------------------------*/
void StateConstructW(
int idxForMax, /* (i) 6-bit index for the quantization of
max amplitude */
int *idxVec, /* (i) vector of quantization indexes */
float *syntDenum, /* (i) synthesis filter denominator */
float *out, /* (o) the decoded state vector */
int len /* (i) length of a state vector */
){
float maxVal, tmpbuf[LPC_FILTERORDER+2*STATE_LEN], *tmp,
numerator[LPC_FILTERORDER+1];
float foutbuf[LPC_FILTERORDER+2*STATE_LEN], *fout;
int k,tmpi;
/* decoding of the maximum value */
maxVal = state_frgqTbl[idxForMax];
maxVal = (float)pow(10,maxVal)/(float)4.5;
/* initialization of buffers and coefficients */
memset(tmpbuf, 0, LPC_FILTERORDER*sizeof(float));
memset(foutbuf, 0, LPC_FILTERORDER*sizeof(float));
for (k=0; k<LPC_FILTERORDER; k++) {
numerator[k]=syntDenum[LPC_FILTERORDER-k];
}
numerator[LPC_FILTERORDER]=syntDenum[0];
tmp = &tmpbuf[LPC_FILTERORDER];
fout = &foutbuf[LPC_FILTERORDER];
/* decoding of the sample values */
for (k=0; k<len; k++) {
tmpi = len-1-k;
/* maxVal = 1/scal */
tmp[k] = maxVal*state_sq3Tbl[idxVec[tmpi]];
}
/* circular convolution with all-pass filter */
memset(tmp+len, 0, len*sizeof(float));
ZeroPoleFilter(tmp, numerator, syntDenum, 2*len,
LPC_FILTERORDER, fout);
for (k=0;k<len;k++) {
out[k] = fout[len-1-k]+fout[2*len-1-k];
}
}
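StateConstructW() time-reverses the decoded samples, appends len zeros, runs the result through a pole-zero all-pass filter, and forms out[k] = fout[len-1-k] + fout[2*len-1-k]; that is, it adds the two halves of the filter output and undoes the time reversal. Adding the second half onto the first is what turns linear filtering into circular filtering. The sketch below illustrates the trick with an FIR impulse response h standing in for the all-pass filter. This is a simplifying assumption: with an IIR filter the output beyond 2*len is discarded, so the reference result only approximates the exact circular operation. fold_filter(), circular_direct() and MAXLEN are hypothetical; with nh <= len+1 nothing is truncated and both functions should agree.

```c
#define MAXLEN 16   /* hypothetical upper bound on len */

/* Reverse, zero-pad to 2*len, filter with h, then fold (len <= MAXLEN). */
void fold_filter(const float *x, int len, const float *h, int nh, float *out)
{
    float xr[2 * MAXLEN] = {0};   /* time-reversed x, zero-padded */
    float z[2 * MAXLEN];          /* linear convolution, first 2*len */
    for (int n = 0; n < len; n++)
        xr[n] = x[len - 1 - n];
    for (int n = 0; n < 2 * len; n++) {
        float acc = 0.0f;
        for (int j = 0; j < nh && j <= n; j++)
            acc += h[j] * xr[n - j];
        z[n] = acc;
    }
    for (int k = 0; k < len; k++)
        out[k] = z[len - 1 - k] + z[2 * len - 1 - k];  /* fold + unreverse */
}

/* Direct evaluation of the same result: out[k] = sum_j h[j] x[(k+j) mod len]. */
void circular_direct(const float *x, int len, const float *h, int nh, float *out)
{
    for (int k = 0; k < len; k++) {
        float acc = 0.0f;
        for (int j = 0; j < nh; j++)
            acc += h[j] * x[(k + j) % len];
        out[k] = acc;
    }
}
```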
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
StateSearchW.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __iLBC_STATESEARCHW_H
#define __iLBC_STATESEARCHW_H
void AbsQuantW(
iLBC_Enc_Inst_t *iLBCenc_inst,
/* (i) Encoder instance */
float *in, /* (i) vector to encode */
float *syntDenum, /* (i) denominator of synthesis filter */
float *weightDenum, /* (i) denominator of weighting filter */
int *out, /* (o) vector of quantizer indexes */
int len, /* (i) length of vector to encode and
vector of quantizer indexes */
int state_first /* (i) position of the start state within
the 80-sample vector */
);
void StateSearchW(
iLBC_Enc_Inst_t *iLBCenc_inst,
/* (i) Encoder instance */
float *residual,/* (i) target residual vector */
float *syntDenum, /* (i) lpc synthesis filter */
float *weightDenum, /* (i) weighting filter denominator */
int *idxForMax, /* (o) quantizer index for maximum
amplitude */
int *idxVec, /* (o) vector of quantization indexes */
int len, /* (i) length of all vectors */
int state_first /* (i) position of the start state within
the 80-sample vector */
);
#endif
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
syntFilter.h
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#ifndef __iLBC_SYNTFILTER_H
#define __iLBC_SYNTFILTER_H
void syntFilter(
float *Out, /* (i/o) Signal to be filtered */
float *a, /* (i) LP parameters */
int len, /* (i) Length of signal */
float *mem /* (i/o) Filter state */
);
#endif
/******************************************************************
iLBC Speech Coder ANSI-C Source Code
syntFilter.c
Copyright (C) The Internet Society (2004).
All Rights Reserved.
******************************************************************/
#include <string.h>
#include "iLBC_define.h"
/*----------------------------------------------------------------*
* LP synthesis filter.
*---------------------------------------------------------------*/
void syntFilter(
float *Out, /* (i/o) Signal to be filtered */
float *a, /* (i) LP parameters */
int len, /* (i) Length of signal */
float *mem /* (i/o) Filter state */
){
int i, j;
float *po, *pi, *pa, *pm;
po=Out;
/* Filter first part using memory from past */
for (i=0; i<LPC_FILTERORDER; i++) {
pi=&Out[i-1];
pa=&a[1];
pm=&mem[LPC_FILTERORDER-1];
for (j=1; j<=i; j++) {
*po-=(*pa++)*(*pi--);
}
for (j=i+1; j<LPC_FILTERORDER+1; j++) {
*po-=(*pa++)*(*pm--);
}
po++;
}
/* Filter last part where the state is entirely in
the output vector */
for (i=LPC_FILTERORDER; i<len; i++) {
pi=&Out[i-1];
pa=&a[1];
for (j=1; j<LPC_FILTERORDER+1; j++) {
*po-=(*pa++)*(*pi--);
}
po++;
}
/* Update state vector */
memcpy(mem, &Out[len-LPC_FILTERORDER],
LPC_FILTERORDER*sizeof(float));
}
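syntFilter() keeps the last LPC_FILTERORDER output samples in mem so that consecutive blocks are filtered as one continuous signal. A minimal sketch of that property follows, with a hypothetical order-2 filter synth_sketch(), a hypothetical ORD constant, and made-up coefficients: filtering six samples in one call or in two calls of three samples each should give the same output.

```c
#include <string.h>

#define ORD 2   /* hypothetical order; the reference uses LPC_FILTERORDER */

/* Hypothetical LP synthesis filter 1/A(z): out[] holds the excitation
   on entry and the synthesized signal on exit; mem[] holds the last
   ORD outputs of the previous call, most recent last.
   Assumes len >= ORD. */
void synth_sketch(float *out, const float *a, int len, float *mem)
{
    for (int i = 0; i < len; i++) {
        for (int j = 1; j <= ORD; j++) {
            /* y[i-j] comes from this block if available, else from mem */
            float past = (i - j >= 0) ? out[i - j] : mem[ORD + i - j];
            out[i] -= a[j] * past;
        }
    }
    memcpy(mem, out + len - ORD, ORD * sizeof(float)); /* update state */
}
```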
Authors' Addresses
Soren Vang Andersen
Department of Communication Technology
Aalborg University
Fredrik Bajers Vej 7A
9200 Aalborg
Denmark
Phone: ++45 9 6358627
EMail: sva@kom.auc.dk
Alan Duric
Telio AS
Stoperigt. 2
Oslo, N-0250
Norway
Phone: +47 21673555
EMail: alan.duric@telio.no
Henrik Astrom
Global IP Sound AB
Olandsgatan 42
Stockholm, S-11663
Sweden
Phone: +46 8 54553040
EMail: henrik.astrom@globalipsound.com
Roar Hagen
Global IP Sound AB
Olandsgatan 42
Stockholm, S-11663
Sweden
Phone: +46 8 54553040
EMail: roar.hagen@globalipsound.com
W. Bastiaan Kleijn
Global IP Sound AB
Olandsgatan 42
Stockholm, S-11663
Sweden
Phone: +46 8 54553040
EMail: bastiaan.kleijn@globalipsound.com
Jan Linden
Global IP Sound Inc.
900 Kearny Street, suite 500
San Francisco, CA-94133
USA
Phone: +1 415 397 2555
EMail: jan.linden@globalipsound.com
Full Copyright Statement
Copyright (C) The Internet Society (2004).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the IETF's procedures with respect to rights in IETF Documents can
be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.