The document summarizes linear predictive coding (LPC), a speech compression technique. LPC models the human vocal tract and represents each speech sample as a linear combination of past speech samples. It analyzes the speech signal segment by segment, determining whether each segment is voiced or unvoiced, estimating the pitch period, and computing filter coefficients. The coefficients and other parameters are transmitted so that the speech can be reconstructed. LPC can achieve a bit rate of 2400 bps, making it suitable for secure communications. Simulation results show that LPC can compress male and female speech but introduces noise, and that it performs better on male voices, which contain less high-frequency content.
Speech Compression using LPC
1. Adaptive Signal Processing Term Paper 2015 DISHA MODI (Roll No:15MECC12) 1
Abstract: The past decade has seen progress in the adoption of
low-rate speech coders in public and military communications.
Essential to this progress is that the new speech coders achieve
high-quality speech at low data rates. These coders include
mechanisms that represent the spectral properties of speech, such
as speech waveform matching, and that optimize coder performance
for the human ear. Several of them have been adopted in cellular
telephony standards.
Service providers continually face the challenge of
accommodating more users within a limited allocated bandwidth
in mobile communication services. For this reason, they are
constantly in search of low bit-rate speech coders that deliver
high-quality speech.
In this paper, low bit-rate speech coding using Linear Predictive
Coding (LPC) is implemented and simulated in MATLAB.
Index Terms: Autocorrelation, formants, LPC, Levinson-Durbin
recursion.
I. INTRODUCTION
LPC was standardized as a method for encoding human speech by
the United States Department of Defense in Federal Standard
1015, published in 1984 [1]. Human speech is produced in the
vocal tract, which can be approximated as a tube of varying
diameter. The linear predictive coding (LPC) model is a
mathematical approximation of this tube. In the model, the
speech sample at a particular time is approximated by a linear
sum of the p previous samples. The central element of LPC is
therefore the linear predictive filter, which determines the
value of the next sample from a linear combination of previous
samples. Normally, speech is sampled at 8000 samples/second
with 8-bit quantization, which gives a data rate of 64000
bits/second. Linear predictive coding reduces this to 2400
bits/second [1]. At this rate the speech has a distinctly
synthetic sound and there is an obvious loss of quality.
However, the speech remains audible and easily understandable.
LPC is therefore a lossy form of compression.
Lossy algorithms are often considered acceptable because the
loss of quality is frequently undetectable to the human ear.
Two properties of speech make it well suited to compression.
First, silence takes up more than 50% of the time in
conversations, so an easy way to save bandwidth is not to
transmit the silence. Second, because of the mechanics of
speech production, there is a high correlation between adjacent
samples of speech.
II. LPC SYSTEM IMPLEMENTATION
The filter model used in LPC is known as the linear predictive
filter. An LPC system has two key components: analysis/encoding
and synthesis/decoding.
III. LPC ANALYSIS/ENCODING
The encoding part of LPC involves examining the speech signal
and breaking it down into segments.
Fig. 1 LPC encoder block-diagram
Linear prediction (LP) methods have long been used in control
and information theory, where they are known as methods of
system estimation and system identification. In speech
processing they are used extensively under the group of names
listed below, taken from [7].
1. covariance method
2. autocorrelation method
3. lattice method
4. inverse filter formulation
5. spectral estimation formulation
6. maximum likelihood method
7. inner product method
A. Input Speech
Under normal conditions, the input signal is sampled at a rate
of 8000 samples per second. The encoder breaks this signal into
segments, and each segment is analyzed and encoded for
transmission to the receiver. The 8000 samples in each second
of speech are broken into segments of 180 samples, so each
segment represents 22.5 milliseconds of the input speech
signal.
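The segmentation arithmetic can be checked with a minimal Python sketch. The function name `split_into_frames` is hypothetical, and dropping the trailing partial segment is an assumption of this sketch rather than something the paper specifies.

```python
SAMPLE_RATE = 8000   # samples per second, as stated in the text
FRAME_LENGTH = 180   # samples per segment, as stated in the text


def split_into_frames(samples, frame_length=FRAME_LENGTH):
    """Split a sequence into consecutive, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption of this sketch; zero-padding is another option).
    """
    n_frames = len(samples) // frame_length
    return [samples[i * frame_length:(i + 1) * frame_length]
            for i in range(n_frames)]


frame_duration_ms = 1000.0 * FRAME_LENGTH / SAMPLE_RATE
one_second = [0] * SAMPLE_RATE          # placeholder for one second of speech
frames = split_into_frames(one_second)

print(frame_duration_ms)  # 22.5
print(len(frames))        # 44 whole frames fit in one second
```

One second of speech therefore holds 44 whole segments plus a remainder, which is why the segment rate works out to 44.44 segments per second rather than a whole number.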
B. Voiced/Unvoiced Determination
In the LPC algorithm, before a speech segment is classified as
voiced or unvoiced, it is first passed through a low-pass
filter with a bandwidth of 1 kHz. The classification matters
because voiced sounds have a different waveform than unvoiced
sounds. The LPC encoder tells the decoder whether a segment is
voiced or unvoiced by sending a single bit. Voiced sounds are
generally vowels and can be modeled as a pulse train that
resembles a periodic waveform. They have large amplitudes and
high energy levels, and they show distinct formant (resonant)
frequencies. Unvoiced sounds are usually consonants. Their
waveforms are random and noise-like, with smaller amplitudes
than voiced sounds and therefore less energy.
The voiced/unvoiced decision is therefore made by counting the
number of times the waveform crosses the x-axis within a
segment and comparing that count with the normal range of
values (threshold values) for voiced and unvoiced sounds.
Noise-like unvoiced segments cross the axis far more often than
voiced segments.
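The zero-crossing test can be sketched in Python as follows. This is an illustration only: the threshold of 50 crossings per 180-sample segment and the synthetic test signals are assumptions, the 1 kHz low-pass pre-filter is omitted, and a practical coder would usually combine the count with an energy measure.

```python
import math
import random


def zero_crossings(frame):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def is_voiced(frame, zc_threshold=50):
    """Classify a frame as voiced when it has few zero crossings.

    zc_threshold is a hypothetical value chosen for this sketch,
    not a figure from the paper or from any standard.
    """
    return zero_crossings(frame) < zc_threshold


# Synthetic stand-ins for real speech (assumptions of this sketch):
# a 100 Hz sinusoid for a voiced frame, uniform noise for an unvoiced one.
voiced_like = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(180)]
rng = random.Random(0)
unvoiced_like = [rng.uniform(-0.1, 0.1) for _ in range(180)]

print(is_voiced(voiced_like))    # True
print(is_voiced(unvoiced_like))  # False
```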
Speech Compression using LPC
Disha Modi, M.Tech (Communication),
Electronics and Communication Department
Institute of Technology - Nirma University
C. Pitch Period Estimation
The pitch period can be thought of as the period of the vocal
cord vibration that occurs during the production of voiced
speech. The pitch period is therefore required only for
decoding voiced segments. It is not needed for unvoiced
segments, since they are produced by turbulent air flow rather
than by vocal cord vibration. One type of algorithm takes
advantage of the fact that the autocorrelation of a periodic
function, Rxx(k), has a maximum when k equals the pitch period.
These algorithms usually detect the maximum by checking the
autocorrelation value against a threshold. One problem with
autocorrelation-based algorithms is that their results are
susceptible to interference from other resonances in the vocal
tract. When such interference occurs, the algorithm cannot
guarantee accurate results. A second problem arises because
voiced speech is not exactly periodic, so the maximum is lower
than it would be for a truly periodic signal.
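A minimal Python sketch of autocorrelation-based pitch estimation is given here for illustration. The lag search range (20 to 156 samples), the normalized threshold of 0.3, and the impulse-train test signal are assumptions of this sketch, not values taken from the paper.

```python
def autocorrelation(frame, lag):
    """Rxx(lag), treating the frame as zero outside its own samples."""
    return sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))


def estimate_pitch_period(frame, min_lag=20, max_lag=156, threshold=0.3):
    """Return the lag (in samples) of the largest autocorrelation peak.

    The peak is normalized by Rxx(0) and compared with a threshold;
    None is returned when no lag passes, i.e. the frame does not look
    periodic. All three parameters are hypothetical defaults.
    """
    r0 = autocorrelation(frame, 0)
    if r0 == 0:
        return None
    best_lag, best_val = None, 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        val = autocorrelation(frame, lag) / r0
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag if best_val >= threshold else None


# Idealized voiced frame: one pulse every 80 samples (100 Hz at 8000 Hz).
pulse_train = [1.0 if n % 80 == 0 else 0.0 for n in range(180)]
period = estimate_pitch_period(pulse_train)
print(period)           # 80
print(8000 / period)    # 100.0 (pitch in Hz)
```

Because the frame is treated as zero outside its boundaries, fewer sample pairs contribute at larger lags. On real speech this biases the peak towards shorter lags and lowers its height, which is one more reason a threshold is needed.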
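D. Vocal Tract Filter
The filter that the decoder uses to re-create the original
input signal is defined by a set of coefficients. To find the
filter coefficients that best match the current segment, the
encoder minimizes the mean squared error between each sample
$y_n$ and its prediction from the previous M samples:
$$e_n^2 = \Big(y_n - \sum_{i=1}^{M} a_i y_{n-i}\Big)^2$$
Setting the derivative of the expected error with respect to
each coefficient $a_j$ to zero gives
$$\frac{\partial}{\partial a_j} E[e_n^2] = 0$$
$$-2E\Big[\Big(y_n - \sum_{i=1}^{M} a_i y_{n-i}\Big) y_{n-j}\Big] = 0$$
$$\sum_{i=1}^{M} a_i E[y_{n-i} y_{n-j}] = E[y_n y_{n-j}], \quad j = 1, \dots, M$$
(The last step uses the fact that expectation is linear.)
Taking the derivative thus yields a set of M equations. To
solve for the filter coefficients, $E[y_{n-i} y_{n-j}]$ has to
be estimated.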
Autocorrelation is the approach explained here for linear
predictive coding. It requires two initial assumptions about
the sequence of speech samples, $\{y_n\}$, in the current
segment: first, that $\{y_n\}$ is stationary, and second, that
$\{y_n\}$ is zero outside the current segment. Under these
assumptions each $E[y_{n-i} y_{n-j}]$ becomes an
autocorrelation function of the form Ryy(|i-j|). The
autocorrelation function Ryy(k) can be estimated as
$$R_{yy}(k) = \sum_{n=n_0+1+k}^{n_0+N} y_n y_{n-k}$$
where the current segment consists of the N samples
$y_{n_0+1}, \dots, y_{n_0+N}$.
Using Ryy(k), the M equations obtained from the derivative of
the mean squared error can be written in matrix form RA = P.
Here R is the M x M matrix with entries Ryy(|i-j|), A contains
the filter coefficients, and P contains Ryy(1), ..., Ryy(M).
To determine the filter coefficients, the equation
$A = R^{-1}P$ must be solved, which requires computing
$R^{-1}$. This computation is easy once one observes that R is
symmetric and that all elements along each diagonal are equal.
A matrix of this type is called a Toeplitz matrix and can be
inverted efficiently [1].
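Written out in full, the system RA = P has the following Toeplitz structure. The closed-form solutions for M = 1 and M = 2 are included as a small worked example. They follow directly from the matrix equation and are not taken from the paper.

```latex
\begin{bmatrix}
R_{yy}(0)   & R_{yy}(1)   & \cdots & R_{yy}(M-1) \\
R_{yy}(1)   & R_{yy}(0)   & \cdots & R_{yy}(M-2) \\
\vdots      & \vdots      & \ddots & \vdots      \\
R_{yy}(M-1) & R_{yy}(M-2) & \cdots & R_{yy}(0)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{bmatrix}
=
\begin{bmatrix} R_{yy}(1) \\ R_{yy}(2) \\ \vdots \\ R_{yy}(M) \end{bmatrix}

% M = 1:
a_1 = \frac{R_{yy}(1)}{R_{yy}(0)}

% M = 2:
a_1 = \frac{R_{yy}(1)\,\bigl(R_{yy}(0) - R_{yy}(2)\bigr)}{R_{yy}(0)^2 - R_{yy}(1)^2},
\qquad
a_2 = \frac{R_{yy}(0)\,R_{yy}(2) - R_{yy}(1)^2}{R_{yy}(0)^2 - R_{yy}(1)^2}
```

For example, with Ryy(0) = 1, Ryy(1) = 0.5 and Ryy(2) = 0.1, the M = 2 formulas give a_1 = 0.6 and a_2 = -0.2.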
The Levinson-Durbin (L-D) Algorithm is a recursive
algorithm that is considered very computationally efficient
since it takes advantage of the properties of R when
determining the filter coefficients.
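L-D Algorithm [2]
The basic ideas behind the recursion are, first, that it is
easy to solve the system for k = 1, and second, that it is also
very simple to solve a problem with k+1 coefficients once the
problem with k coefficients has been solved. In general none of
the coefficients of the different-sized problems match, so the
recursion is not a way to calculate the single coefficient
$a_{k+1}$ but a way to calculate the whole vector $A_{k+1}$ as
a function of $A_k$, $R_{k+1}$ and $E_k$. In that sense,
Levinson-Durbin induction would be a better name.
In this subsection $r_j$ denotes Ryy(j), $R_k$ is the
$(k+1) \times (k+1)$ Toeplitz matrix with entries $r_{|i-j|}$,
$A_k = [1, a_{k1}, \dots, a_{kk}]^T$ is the prediction-error
filter of order k, and $E_k$ is the corresponding prediction
error. With this convention the predictor coefficients of
Section D are $-a_{kj}$.
For the size 1 problem, we are looking for
$A_1 = [1, a_{11}]^T$ so that $R_1 A_1 = [E_1, 0]^T$ with
$$R_1 = \begin{bmatrix} r_0 & r_1 \\ r_1 & r_0 \end{bmatrix}$$
and $E_1$ is not necessary at this stage. The dot product of
the second row of $R_1$ with $A_1$ gives
$$r_1 + r_0 a_{11} = 0$$
Therefore,
$$a_{11} = -\frac{r_1}{r_0} \quad \text{and} \quad E_1 = r_0 + r_1 a_{11}$$
Solving the Size k+1 Problem
Suppose that we have solved the size k problem and have found
$A_k$ and $E_k$. Then we have
$$R_k A_k = [E_k, 0, \dots, 0]^T$$
$R_{k+1}$ has one more row and column than $R_k$, so we cannot
apply it directly to $A_k$. However, if we extend $A_k$ with a
zero and call this vector $U_{k+1}$, we can apply $R_{k+1}$ to
it and we get the following interesting result:
$$R_{k+1} U_{k+1} = [E_k, 0, \dots, 0, S_k]^T, \quad S_k = \sum_{j=0}^{k} a_{kj} r_{k+1-j}, \quad a_{k0} = 1$$
Since the matrix is symmetric, we also have something
remarkable when reversing the order of the coefficients of
$U_{k+1}$ and calling this vector $V_{k+1}$:
$$R_{k+1} V_{k+1} = [S_k, 0, \dots, 0, E_k]^T$$
We can notice that a linear combination
$U_{k+1} + \lambda V_{k+1}$ is of the form wanted for $A_{k+1}$,
since the first element is 1 for all values of $\lambda$. If
there is a value of $\lambda$ for which
$R_{k+1}(U_{k+1} + \lambda V_{k+1}) = [E_{k+1}, 0, \dots, 0]^T$,
then that combination is $A_{k+1}$.
Calculating $R_{k+1}(U_{k+1} + \lambda V_{k+1})$ gives
$$[E_k + \lambda S_k, 0, \dots, 0, S_k + \lambda E_k]^T$$
so the last element vanishes for $\lambda = -S_k / E_k$, which
gives $A_{k+1} = U_{k+1} + \lambda V_{k+1}$ and
$E_{k+1} = E_k + \lambda S_k = (1 - \lambda^2) E_k$.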
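The recursion can be sketched in a few lines of Python. The sketch follows the extend-with-zero, reverse, and combine construction described in this subsection. The function name `levinson_durbin`, the variable names, and the test values are assumptions of this illustration, and it presumes r[0] > 0. It returns the prediction-error filter [1, a_1, ..., a_M], so the predictor coefficients in the sense of RA = P are the negatives of a_1, ..., a_M.

```python
def levinson_durbin(r, order):
    """Solve the symmetric Toeplitz normal equations recursively.

    r     -- autocorrelation values r[0], ..., r[order], with r[0] > 0
    order -- filter order M
    Returns (a, error, lambdas): a = [1, a_1, ..., a_M] is the
    prediction-error filter, error is the final prediction error, and
    lambdas holds the combination factor found at each step.
    """
    a = [1.0]
    error = r[0]
    lambdas = []
    for k in range(order):
        # Last element of R_{k+1} U_{k+1}: sum_j a_j * r[k+1-j]
        s = sum(a[j] * r[k + 1 - j] for j in range(k + 1))
        lam = -s / error
        u = a + [0.0]      # current solution extended with a zero
        v = u[::-1]        # same coefficients in reversed order
        a = [uj + lam * vj for uj, vj in zip(u, v)]
        error *= 1.0 - lam * lam
        lambdas.append(lam)
    return a, error, lambdas


a, err, lambdas = levinson_durbin([1.0, 0.5, 0.1], order=2)
predictor = [-c for c in a[1:]]
print(predictor, err)  # approximately [0.6, -0.2] and 0.72
```

Each step costs O(k) operations, so the full solution costs O(M^2), compared with O(M^3) for general-purpose elimination on the same system.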
IV. TRANSMITTING THE PARAMETERS [1]
In its original form, speech is usually transmitted at 64,000
bits/second, using 8 bits/sample and a sampling rate of 8000
Hz. LPC reduces this rate to 2,400 bits/second by breaking the
speech into segments and then sending, for each segment, the
voiced/unvoiced information, the pitch period, and the
coefficients of the filter that represents the vocal tract. The
excitation signal used by the filter at the receiver is
determined by the classification of the speech segment as
voiced or unvoiced and by the pitch period of the segment. The
encoder transmits a single bit to indicate whether the current
segment is voiced or unvoiced. The pitch period is quantized to
one of 60 possible values, which requires 6 bits.
If the segment contains voiced speech, then a 10th-order filter
is used. This means that 11 values are needed: 10 reflection
coefficients and the gain. If the segment contains unvoiced
speech, then a 4th-order filter is used. This means that 5
values are needed: 4 reflection coefficients and the gain.
Quantization is done as follows:
1 bit voiced/unvoiced
6 bits pitch period (60 values)
10 bits k1 and k2 (5 each)
10 bits k3 and k4 (5 each)
16 bits k5, k6, k7, k8 (4 each)
3 bits k9
2 bits k10
5 bits gain G
1 bit synchronization
54 bits TOTAL BITS PER FRAME
Verification of the Bit Rate of LPC Speech Segments
Sample rate = 8000 samples/second
Samples per segment = 180 samples/segment
Segment rate = Sample rate / Samples per segment
= (8000 samples/second) / (180 samples/segment)
= 44.444... segments/second
Segment size = 54 bits/segment
Bit rate = Segment size * Segment rate
= (54 bits/segment) * (44.444... segments/second)
= 2400 bits/second
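The same verification can be reproduced with exact rational arithmetic in Python. The dictionary keys are labels chosen for this sketch. The bit counts are the ones listed in the quantization table.

```python
from fractions import Fraction

# Bits per frame, as listed in the quantization table.
FRAME_BITS = {
    "voiced/unvoiced": 1,
    "pitch period": 6,
    "k1, k2": 10,
    "k3, k4": 10,
    "k5..k8": 16,
    "k9": 3,
    "k10": 2,
    "gain": 5,
    "synchronization": 1,
}

bits_per_frame = sum(FRAME_BITS.values())
segment_rate = Fraction(8000, 180)           # segments per second
bit_rate = bits_per_frame * segment_rate     # bits per second
uncompressed_rate = 8000 * 8                 # 8 bits per sample

print(bits_per_frame)                        # 54
print(bit_rate)                              # 2400
print(float(uncompressed_rate / bit_rate))   # about 26.7 times fewer bits
```

Using `Fraction` avoids the rounding in 44.44 segments/second and shows that the product is exactly 2400.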
V. LPC SYNTHESIS/DECODING
Fig. 2 LPC synthesizer/decoder block-diagram [4]
The process of decoding a sequence of speech segments is the
reverse of the encoding process. Each segment is decoded
individually, and the reproduced sound segments are joined
together to represent the entire input speech signal. The
decoding, or synthesis, of a speech segment is based on the 54
bits of information transmitted by the encoder.
Each segment of speech has a different LPC filter, which is
built from the reflection coefficients and the gain received
from the encoder. Ten reflection coefficients are used for
voiced segments and four for unvoiced segments. These
reflection coefficients are used to generate the vocal tract
coefficients (parameters) that define the filter. The
excitation signal is generated from the voiced/unvoiced bit
and the pitch period. The final step of decoding a segment is
to pass the excitation signal through the filter to produce
the synthesized speech signal.
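The decoding steps can be illustrated with a simplified Python sketch. It assumes the convention s[n] = G*e[n] + sum_i a_i*s[n-i], a unit impulse train as voiced excitation, and uniform noise as unvoiced excitation. The function names are hypothetical, filter state is not carried across segments, and sign conventions for reflection coefficients differ between references, so this is not a description of the decoder in the standard.

```python
import random


def reflection_to_predictor(k):
    """Step-up recursion: reflection coefficients -> a_1..a_M.

    Assumed convention: a_m^(m) = k_m and
    a_i^(m) = a_i^(m-1) - k_m * a_(m-i)^(m-1).
    """
    a = []
    for m, km in enumerate(k, start=1):
        a = [a[i] - km * a[m - 2 - i] for i in range(m - 1)] + [km]
    return a


def make_excitation(voiced, pitch_period, length, rng):
    """Impulse train for voiced frames, uniform noise for unvoiced ones."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


def synthesize_frame(k, gain, excitation):
    """Pass the excitation through the all-pole filter defined by k."""
    a = reflection_to_predictor(k)
    out = []
    for n, e in enumerate(excitation):
        s = gain * e
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                s += ai * out[n - i]
        out.append(s)
    return out


rng = random.Random(0)
excitation = make_excitation(voiced=True, pitch_period=80, length=180, rng=rng)
frame = synthesize_frame([0.5, -0.2], gain=1.0, excitation=excitation)
print(len(frame))  # 180
```

A fuller decoder would keep the filter memory from one segment to the next so that consecutive segments join without discontinuities.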
VI. APPLICATION
In general, the most common use of speech compression is in
standard telephone systems. In fact, much of the technology
used in speech compression was developed by the phone
companies. Further applications of LPC and other speech
compression schemes are voice mail systems, telephone
answering machines, and multimedia applications. Most
multimedia applications, unlike telephone applications,
involve one-way communication and storage of the data.
SIMULATION RESULTS
LPC coding of several speech signals at a low bit rate was
implemented and simulated in MATLAB.
Fig. 3 Female Original Voice
Fig. 4 Female LPC coded Voice
Fig. 5 Male Original Voice
Fig. 6 Male LPC coded Voice
Performance measurements of the LPC-compressed signals (both
male and female) are shown in Table I. The SNR values in Table
I are low, which shows that both the male and the female
reconstructed signals are noisy. For all levels of compression,
the quality is better for the male signal than for the female
signal. On the other hand, the compression factor is larger for
the female signal than for the male signal. This result is
expected because the female voice contains more high-frequency
content than the male voice. It was also observed that no
further improvement can be achieved beyond a certain level of
decomposition for either signal.
PARAMETER                       MALE      FEMALE
Sampling Rate                   8000      8000
File length (in seconds)        2.07      2.77
Length of Original Signal       99328     133120
Length of Constructed Signal    97920     132480
SNR (in dB)                     17.077    14.77
Compression Ratio               0.9858    0.9952
Table I Comparison of male and female LPC synthesized voice
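The paper does not state how the SNR and the compression ratio in the table were computed. The Python sketch below shows one common SNR definition (signal power over error power, in dB), which is an assumption. It also checks that the ratio of constructed length to original length reproduces the tabulated compression ratios to four decimal places. Whether the values were actually obtained this way is not confirmed by the source.

```python
import math


def snr_db(original, reconstructed):
    """SNR in dB: 10*log10(signal power / error power).

    The signals are compared over their common length (an assumption,
    since the table reports different lengths for the two signals).
    """
    n = min(len(original), len(reconstructed))
    signal_power = sum(x * x for x in original[:n])
    noise_power = sum((x - y) ** 2
                      for x, y in zip(original[:n], reconstructed[:n]))
    if noise_power == 0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


# Length ratios taken from the table above.
male_ratio = 97920 / 99328
female_ratio = 132480 / 133120
print(round(male_ratio, 4), round(female_ratio, 4))  # 0.9858 0.9952
```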
CONCLUSION
Linear Predictive Coding is an analysis/synthesis technique for
lossy speech compression that attempts to model the human
production of sound instead of transmitting an estimate of the
sound wave. Linear predictive coding achieves a bit rate of
2400 bits/second, which makes it well suited to secure
telephone systems. Such systems are more concerned that the
content and meaning of speech be preserved than that its
quality be preserved. The tradeoff for LPC's low bit rate is
that it has some difficulty with certain sounds and produces
speech that sounds synthetic. Linear predictive coding encoders
break a sound signal into segments and then send information
on each segment to the decoder. The encoder sends information
on whether the segment is voiced or unvoiced, together with the
pitch period for voiced segments, which the decoder uses to
create an excitation signal. The encoder also sends information
about the vocal tract, which the decoder uses to build a
filter. Given the excitation signal as input, this filter
reproduces an approximation of the original speech.
REFERENCES
[1] J. Bradbury, Linear Predictive Coding, 2000.
[2] C. Collomb, 1. Description of Linear Prediction 2. Minimizing the
error, pp. 1–7, 2009.
[3] D. R. Sandeep, Compression and Enhancement of Speech Signals, no.
Seiscon, pp. 774–779, 2011.
[4] M. A. Osman, N. Al, H. M. Magboub, and S. A. Alfandi, Speech
compression uses LPC and wavelet, pp. 92–99, 2010.
[5] V. Hardman and O. Hodson, Internet/Mbone Audio, pp. 5–7, 2000.
[6] S. C. Douglas, Introduction to Adaptive Filters, Digital Signal
Processing Handbook, pp. 7–12, 1999.
[7] D. S. Processing, Digital Speech Processing Lecture 13 Linear
Predictive Coding (LPC) - Introduction LPC Methods.
[8] H. V. Poor, C. G. Looney, R. J. Marks II, S. Verdú, J. A. Thomas,
and T. M. Cover, Information Theory, The Electrical Engineering
Handbook, pp. 56–57, 2000.