
S3/Diamond Rio 600 Review
Audio Compression Explained - Continued
November 12, 2001
This basic idea is called perceptual coding, and it results in the audio being stored as it sounds rather than as it is. To decide which audio data to keep and which to discard, the compressor refers to a psychoacoustic model, which tells the encoder which data are redundant or irrelevant.
Redundant information is the easier of the two to discard, because removing it simply means dropping duplicated data. Also, anything above 22.05 kHz (half of CD audio's 44.1 kHz sampling rate) is ignored, because it lies beyond the range of human hearing, which tops out at roughly 20 kHz. Some people argue that these high frequencies are important to accurate sound reproduction, but for the general listening public they are unimportant.
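MP3 removes this kind of statistical redundancy losslessly with Huffman coding: frequent values get short codes and rare values get long ones. The minimal sketch below builds a Huffman table from scratch for a hypothetical run of quantized values. A real Layer III encoder instead picks from tables predefined in the standard, so treat `huffman_code`, `encoded_bits` and the sample data as illustrative assumptions.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a Huffman code table (symbol -> bit string) for a sequence."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, {symbol: code built so far})
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def encoded_bits(symbols, table):
    """Total number of bits needed to encode the sequence with the table."""
    return sum(len(table[s]) for s in symbols)


# Hypothetical quantized values: mostly zeros, as is common after quantization.
values = [0] * 12 + [1] * 4 + [-1] * 3 + [2]
table = huffman_code(values)
print(table)
print(encoded_bits(values, table), "bits vs", 2 * len(values), "with a fixed 2-bit code")
```

Because the common value 0 gets a one-bit code, the 20 values fit in 32 bits instead of 40, and nothing is lost.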
Irrelevant data is more complex to judge, and eliminating it relies much more heavily on the psychoacoustic model. The idea behind psychoacoustic coding is that certain properties of a waveform are, in effect, meaningless to a listener.
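One simple tool a psychoacoustic model uses to judge irrelevance is the threshold in quiet, which is the quietest level at which a lone tone is audible at each frequency. The sketch below uses Terhardt's widely cited approximation of that curve. The function names and test tones are hypothetical, and a real model combines this curve with the masking effects described next.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold in quiet (dB SPL) using Terhardt's formula."""
    f = freq_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_inaudible(freq_hz, level_db_spl):
    """True if a lone tone at this level sits below the threshold in quiet."""
    return level_db_spl < threshold_in_quiet_db(freq_hz)


# Hypothetical tones: (frequency in Hz, level in dB SPL)
for freq, level in [(100, 20.0), (1000, 20.0), (16000, 20.0)]:
    verdict = "irrelevant" if is_inaudible(freq, level) else "keep"
    print(f"{freq} Hz at {level} dB SPL: {verdict}")
```

The same 20 dB tone is kept at 1 kHz, where the threshold is only about 3 dB. It is flagged as irrelevant at 100 Hz and at 16 kHz, where the ear needs a much louder signal.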
Masking is one aspect of the psychoacoustic model. It refers to the tendency of a listener to prioritise certain sounds according to the situations in which they arise. For example, a cough in a quiet room would seem loud, but if the same cough were made just after a door slam, it would be perceived as much quieter.
This quirk of human audio perception is what allows MP3 encoding to remove as much data as it does. However, masked sounds are not totally removed from the audio waveform. Instead, they are given a lower priority in the encoded stream by being assigned fewer bits of data than unmasked sounds. Throughout a piece of music there are thousands of places where masking is used, producing a compressed version that sounds almost identical to the unmasked original.
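A common rule of thumb is that each extra quantizer bit buys about 6 dB of signal-to-noise ratio. Under that rule, the bits a band needs follow from how far the signal sits above its masking threshold, a quantity called the signal-to-mask ratio. The sketch below applies the rule to the cough example. `bits_for_band` and the decibel figures are hypothetical, and real encoders quantize far more elaborately.

```python
import math

DB_PER_BIT = 6.02  # each extra quantizer bit buys roughly 6 dB of signal-to-noise ratio


def bits_for_band(signal_db, mask_db):
    """Rough bits needed to keep quantization noise under the masking threshold.

    A band whose level is at or below its masking threshold (SMR <= 0) gets
    no bits; otherwise it needs about SMR / 6.02 bits.
    """
    smr = signal_db - mask_db  # signal-to-mask ratio in dB
    if smr <= 0:
        return 0
    return math.ceil(smr / DB_PER_BIT)


# Hypothetical: the same 40 dB cough in a quiet room vs. just after a door slam.
quiet_room = bits_for_band(signal_db=40.0, mask_db=5.0)   # weak masking
after_slam = bits_for_band(signal_db=40.0, mask_db=30.0)  # strong masking
print(quiet_room, after_slam)
```

The identical sound needs 6 bits in the quiet room but only 2 once the door slam raises the masking threshold. This mirrors the idea of giving masked sounds a lower priority.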
The first step in converting a PCM audio stream into a compressed MP3 file is slicing the stream into frequency bands. MP3 encoders use mathematical transforms for this job. A polyphase filter bank splits the PCM stream into 32 bands, each representing part of the frequency spectrum, and a modified DCT (Discrete Cosine Transform) then subdivides each band further. Separately, an FFT (Fast Fourier Transform) is used to feed the psychoacoustic model. Some of these bands contain low-frequency data, others high-frequency data. The psychoacoustic model is consulted to decide whether any of these frequencies are irrelevant. This stage of the calculation works out which sounds can be given fewer bits of data without the change being audible.
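The band split can be approximated with a plain discrete Fourier transform whose bins are grouped into 32 equal-width bands. This is a stand-in for the polyphase filter bank a real encoder uses, chosen because it fits in a few lines. `band_energies`, the 256-sample block and the test tones are all assumptions.

```python
import math

SAMPLE_RATE = 44100
NUM_BANDS = 32


def band_energies(samples, num_bands=NUM_BANDS):
    """Split one block of PCM samples into equal-width frequency bands.

    Uses a naive DFT for clarity and returns the summed energy in each band.
    """
    n = len(samples)
    half = n // 2  # bins 0..half-1 span 0 Hz up to just below the Nyquist frequency
    energies = [0.0] * num_bands
    for k in range(half):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(samples))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(samples))
        band = k * num_bands // half  # map DFT bin k to one of the 32 bands
        energies[band] += re * re + im * im
    return energies


band_width_hz = (SAMPLE_RATE / 2) / NUM_BANDS  # about 689 Hz per band
# Hypothetical test signal: a 1 kHz tone plus a quieter 10 kHz tone.
block_len = 256
signal = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
          + 0.1 * math.sin(2 * math.pi * 10000 * t / SAMPLE_RATE)
          for t in range(block_len)]
energies = band_energies(signal)
loudest = max(range(NUM_BANDS), key=lambda b: energies[b])
print(f"{band_width_hz:.1f} Hz per band; loudest band is {loudest}")
```

At 44.1 kHz each of the 32 bands spans about 689 Hz. The 1 kHz tone therefore lands in band 1 and the 10 kHz tone in band 14.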
Next, these bands are grouped into frames, and the encoder uses them to decide where masking will be applied. This calculation is known as the mask-to-noise ratio, and from it the bit allocation for each frame can be calculated. In this final stage, the encoder decides how many of each frame's bits go to each band. The total number of bits available is fixed before encoding begins, typically at 128 kbps (kilobits per second) for most MP3 tracks.
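The arithmetic behind the bit budget can be sketched together with a simplified greedy allocator driven by the mask-to-noise ratio (MNR). The sketch assumes MPEG-1 Layer III's 1152 samples per frame and the 6 dB-per-bit rule of thumb. The greedy loop is closer in spirit to the allocation used by MPEG Layers I and II than to Layer III's nested quantization loops. The four-band SMR values and the per-band cost are hypothetical.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III frame length in samples
DB_PER_BIT = 6.02         # rule of thumb: ~6 dB of SNR per quantizer bit


def frame_bit_budget(bitrate_bps, sample_rate_hz):
    """Bits available to one frame at a constant bit rate."""
    return SAMPLES_PER_FRAME * bitrate_bps // sample_rate_hz


def allocate_bits(smr_db, budget_bits, samples_per_band):
    """Greedy allocation: keep giving one more bit to the band with the worst MNR.

    MNR = SNR - SMR, with SNR ~= 6.02 dB per bit. Each extra bit in a band
    costs `samples_per_band` bits of the frame budget.
    """
    bits = [0] * len(smr_db)
    remaining = budget_bits
    while remaining >= samples_per_band:
        mnr = [DB_PER_BIT * b - s for b, s in zip(bits, smr_db)]
        worst = min(range(len(mnr)), key=lambda i: mnr[i])
        if mnr[worst] >= 0:
            break  # noise is already under the mask in every band
        bits[worst] += 1
        remaining -= samples_per_band
    return bits, remaining


print(frame_bit_budget(128_000, 44_100))  # 3343 bits per frame
# Hypothetical SMRs (dB) for four bands and a deliberately small budget.
bits, left = allocate_bits([20.0, 5.0, -3.0, 12.0], budget_bits=60, samples_per_band=12)
print(bits, left)  # [3, 0, 0, 2] 0
```

At 128 kbps and 44.1 kHz, each frame gets about 3,343 bits (roughly 418 bytes). In the example, the masked band with a negative SMR gets nothing. The band with a modest 5 dB SMR also goes without, because the small budget runs out first.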

