Earlier I promised to do some intensive investigation into the question of what settings in Audacity and Lame result in the best compromise of quality and filesize for podfic recordings. Here are the results.
First, I want to make it clear that I am only talking about MP3 files as recorded with Audacity and exported with the Lame encoder. Other formats may be better (AAC, the encoding used in the m4b podbook format, provides better sound quality for a given bitrate; Speex is specifically designed for speech recording and playback) but are not as universally playable. Other codecs may provide superior encoding, but most people recording podfic do it with the free Audacity/Lame combination. I may experiment with other codecs and formats in the future.
There are four variables in play: the sample rate, the sample size, the bit rate, and the number of channels (stereo/mono).
The sample rate is how many samples of the signal are stored per second. The standard sample rate of commercial CD audio is 44,100 samples per second = 44.1 kHz. The highest frequency that can be represented is half the sample rate; because voice has a narrow range from high to low frequency, sample rates as low as 8 kHz are sometimes used. The sample rate affects the size of the Audacity project files, but not the size of the exported MP3 file (that is set by the bit rate), so there is no reason not to use the 44.1 kHz default other than conservation of space on your own computer.
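To put numbers on the "half the sample rate" rule, here is a minimal sketch. The function name is my own invention for illustration; it is just the division described above, not anything Audacity exposes.

```python
def max_representable_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


# Sample rates mentioned in this post, from CD quality down to telephone-like.
for rate in (44_100, 22_050, 11_025, 8_000):
    print(f"{rate:>6} Hz sampling -> up to {max_representable_hz(rate):,.1f} Hz")
```

So an 8 kHz recording cannot hold anything above 4 kHz, while the 44.1 kHz default reaches about 22 kHz.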
The sample size is the amount of computer storage used to save the digital representation of each sample. An audio CD has a precision of 16 bits; Audacity also allows a 24-bit sample size, which is commonly used in digital recording, and 32-bit float, which has an almost unlimited dynamic range. These greater ranges between loud and soft are almost certainly not needed in voice recording. But again, the sample size affects the size of the Audacity project files and not the size of the exported MP3 file, so there is no reason not to use the 32-bit float default other than conservation of space on your own computer.
(Okay, I'm handwaving a bit here, because there is another reason to reduce the precision of frequency and decibel range: to reduce unwanted noise. But the Audacity Wiki suggests that using the noise removal filter is more effective and less likely to compromise the signal.)
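If you are curious how much space the sample rate and sample size cost you on your own computer, here is a rough sketch. It assumes the project stores raw, uncompressed samples and ignores undo history and any file overhead, so treat the numbers as a lower bound rather than what Audacity will actually report. The function name is hypothetical.

```python
def uncompressed_bytes(sample_rate_hz: int, bits_per_sample: int,
                       channels: int, seconds: int) -> int:
    """Raw audio size: samples/second x bits/sample x channels x seconds, in bytes."""
    total_bits = sample_rate_hz * bits_per_sample * channels * seconds
    return total_bits // 8  # 8 bits per byte


one_hour = 60 * 60
# Compare the defaults against smaller settings for one hour of mono audio.
for rate, bits in ((44_100, 32), (44_100, 16), (22_050, 16)):
    size_mb = uncompressed_bytes(rate, bits, 1, one_hour) / 1_000_000
    print(f"{rate} Hz, {bits}-bit, mono, 1 hour: about {size_mb:.0f} MB")
```

Halving either the sample rate or the sample size halves this figure, which is the only saving on offer, since the exported MP3 is sized by its bit rate instead.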
The bit rate is the number of bits per second devoted to data storage. MP3 is a lossy format - some of the data which is deemed "unimportant" by the algorithm is discarded - and so the bit rate determines the resolution of the resulting file. 128 kbps is often described as "near-CD quality" for music (uncompressed CD audio itself runs at about 1,411 kbps), and is the default produced by the Lame encoder, but a wide selection of bit rates is available both above and below this value. A larger bit rate will result in a higher-quality and larger file. Filesize is directly proportional to bit rate, i.e., twice the bit rate = twice the size.
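The "twice the bit rate = twice the size" rule is easy to check with a little arithmetic. This is a minimal sketch assuming a constant bit rate, taking 1 kbps as 1000 bits per second, and ignoring ID3 tags and other overhead; the function name is my own.

```python
def mp3_size_bytes(bit_rate_kbps: int, duration_seconds: int) -> int:
    """Approximate size of a constant-bit-rate MP3, ignoring tags and headers."""
    # kbps -> bits per second, times duration, divided by 8 bits per byte.
    return bit_rate_kbps * 1000 * duration_seconds // 8


thirty_minutes = 30 * 60
for kbps in (32, 48, 64, 128):
    size_mb = mp3_size_bytes(kbps, thirty_minutes) / 1_000_000
    print(f"{kbps:>3} kbps, 30 minutes: about {size_mb:.1f} MB")
```

For a half-hour recording, that works out to roughly 10.8 MB at 48 kbps versus 28.8 MB at 128 kbps.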
The number of channels is either one or two: mono or stereo. Mono recordings play the same channel in both ears of the listener, but stereo recordings have two different channels, one in each ear. When you export to MP3, the bit rate you encode to is shared between both channels. In practical terms, what this means is that a stereo MP3 encoded at 128 kbps will be twice the filesize but have roughly the same quality as a mono MP3 encoded at 64 kbps. Therefore, unless you are using stereo effects in your podfic, you should use a single channel (mono).
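Put as arithmetic, the comparison looks like the sketch below. It is a deliberate simplification: it assumes the bit rate is split evenly between the channels and ignores encoder tricks such as joint stereo, so take it as a rule of thumb. The function name is hypothetical.

```python
def per_channel_kbps(bit_rate_kbps: float, channels: int) -> float:
    """Bit rate available to each channel, assuming an even split."""
    if channels not in (1, 2):
        raise ValueError("channels must be 1 (mono) or 2 (stereo)")
    return bit_rate_kbps / channels


print(per_channel_kbps(128, 2))  # stereo at 128 kbps
print(per_channel_kbps(64, 1))   # mono at 64 kbps
```

Both lines give the same per-channel figure, which is why the stereo file costs twice the space without sounding any better for a single voice.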
Okay, those are the facts. Now comes the subjectivity. Various sources (see bibliography) generally agree that 32-48 kbps is sufficient for speech. I recorded a test message at various sample rates and sample sizes (filtering it with the noise removal effect), exported it at various bit rates and channel settings, uploaded the files to my mp3 player, listened to them using earbuds, and drew the following conclusions:
- Sample rate variation: 11.025 kHz recording gives terrible sound quality; the difference between 44.1 and 22.05 kHz is slight but noticeable.
- Sample size variation: I noticed no difference among the 16-bit, 24-bit, and 32-bit sample sizes.
- Mono/Stereo variation: As expected, a mono recording at a bit rate of X kbps and a stereo recording at a bit rate of 2X kbps sounded identical.
- Bit rate variation: At 16 kbps, a mono recording is noticeably degraded; it's possible that there's a difference between 32 and 48 kbps, but I am really not sure if I'm just imagining it. I can't tell the difference among 48, 64, 96, and 128 at all. Incidentally, stereo at 16 kbps sounded truly awful.
Want to try on your own?
Here is a zip file containing my test audio files. The file naming convention I used is: test2.[sample rate].[sample size].[channels].[bit rate].mp3 - for example, test2.44.32.2.128.mp3 = 44.1 kHz, 32-bit float, stereo, exported at 128 kbps.
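If you want to sort or compare the test files programmatically, the naming convention can be unpacked with something like the sketch below. The class and function names are my own, and the sample rate is kept as the label that appears in the filename (44 stands for 44.1 kHz, per the example above) rather than converted.

```python
from typing import NamedTuple


class ExportSettings(NamedTuple):
    sample_rate_label: int   # as written in the filename, e.g. 44 for 44.1 kHz
    sample_size_bits: int    # 16, 24, or 32
    channels: int            # 1 = mono, 2 = stereo
    bit_rate_kbps: int       # MP3 export bit rate


def parse_test_filename(name: str) -> ExportSettings:
    """Parse test2.[sample rate].[sample size].[channels].[bit rate].mp3"""
    parts = name.split(".")
    if len(parts) != 6 or parts[0] != "test2" or parts[5] != "mp3":
        raise ValueError(f"unexpected filename: {name!r}")
    rate, size, channels, bit_rate = (int(p) for p in parts[1:5])
    return ExportSettings(rate, size, channels, bit_rate)


settings = parse_test_filename("test2.44.32.2.128.mp3")
kind = "stereo" if settings.channels == 2 else "mono"
print(f"{settings.sample_size_bits}-bit, {kind}, {settings.bit_rate_kbps} kbps")
```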
The moral of the story is: for small audio files with minimal quality degradation, record your podfic in mono format using Audacity's defaults, and export it at either 32 or 48 kbps.
Bibliography and links
- Audacity Wiki
- What are Bitrates?
- The Secret Lives of MP3 Files
- WWW FAQs: What bitrate should I use for my audio files?
- Recommended Bitrates for CD, FM Radio, and Speech Sources