Java Sound Resources: FAQ: Audio Programming

This page presents Questions and Answers related to the Java Sound API.

Audio Programming

1. DataLines
1.1. General
1.1.1. How can I be notified when data is available for write/read in a SourceDataLine or TargetDataLine?
1.1.2. Why does it fail to open any line with 16 kHz sample rate?
1.1.3. How can I get a SourceDataLine or TargetDataLine in μ-law format?
1.1.4. Why does simultaneous recording and playback only work when first opening the playback line (SourceDataLine)?
1.1.5. Why doesn't simultaneous recording and playback work at all with the Sun JDK 1.3/1.4 on GNU/Linux?
1.1.6. How can I get a Line from a specific Mixer?
1.1.7. Why are there no mono lines with the "Direct Audio Devices" mixers on Linux?
1.1.8. Why is a SourceDataLine called "source" and a TargetDataLine called "target" though it's actually the other way round?
1.1.9. Why are DataLine.getFramePosition() and DataLine.getMicrosecondPosition() so inaccurate?
1.1.10. Why does DataLine.getLevel() always return -1?
1.1.11. What is the difference between DataLine.isActive() and DataLine.isRunning()?
1.1.12. How can I detect a buffer underrun or overrun?
1.1.13. Why is there no event for notifying applications of an underrun/overrun condition?
1.1.14. How can I find out the current playback or recording position?
1.1.15. How can I do looping in playback?
1.2. SourceDataLine
1.2.1. How can I prevent the last bit of sound played on a SourceDataLine from being repeated?
1.2.2. Why is playback distorted, too fast or too slow with the JDK 1.5.0 beta, but not with earlier versions of the JDK?
1.3. TargetDataLine
1.3.1. How can I capture from a specific source (microphone or line-in)?
1.3.2. How can I get more than one TargetDataLine?
1.3.3. Why is it not possible to open more than one TargetDataLine at the same time?
1.3.4. Why do I get a LineUnavailableException: "Requested format incompatible with already established device format"?
1.3.5. How can I control the volume when recording with a TargetDataLine?
1.3.6. How should I use stop() and drain() on a TargetDataLine?
1.3.7. Why is TargetDataLine.read() blocking for a long time?
1.3.8. Why is the end of recordings cut off prematurely?
1.4. Clip
1.4.1. Why do I get an out of memory exception when trying to use a Clip with a 5 MB audio file?
1.4.2. Why do I get "LineUnavailableException: No Free Voices" when opening a Clip?
1.4.3. How can I rewind a Clip?
1.4.4. Why does the frame/microsecond position not jump back to zero when a Clip is looped?
1.4.5. Why are there failures, clicks and other random effects if a clip is played multiple times with 1.5?
2. Controls
2.1. Why do the SourceDataLine instances I get when using the "Direct Audio Device" (ALSA on Linux) have no controls?
2.2. What is the difference between a BALANCE and a PAN control? Which one should I use?
2.3. Why do mono lines from a "Direct Audio Device" have no PAN control?
2.4. Why does obtaining a gain control work with 1.4.2, but not with 1.5.0?
2.5. Why do Clip and SourceDataLine instances have no VOLUME control?
2.6. Why is there no sample rate control in 1.5.0?
3. DataLine buffers
3.1. What is the minimum buffer size I can use?
3.2. Why does a line have the default buffer size though a buffer size was specified in a DataLine.Info object when obtaining the line?
3.3. Why is it not possible to use large buffers for a DataLine with 1.5.0?
4. Mixers
4.1. What are all these mixers?
4.2. Why are there mixers from which I can't get a SourceDataLine?
4.3. How can I redirect sound output to a phone / modem device?
4.4. Can I use multiple soundcards at the same time?
4.5. Why can I record from different soundcards, but not play back to them?
4.6. How can I obtain the formats supported by a mixer (or at all)?
4.7. What formats are supported by "Direct Audio Device" mixers?
4.8. Why are there AudioFormat objects with frame rate/sample rate reported as -1 when I query a Mixer for its supported formats?
4.9. How can I detect which Port Mixer belongs to which soundcard?
4.10. How can I find out which Mixer implementation is used?
4.11. Why do I get lines from the "Java Sound Audio Engine" in the JDK 1.5.0 though the "Direct Audio Device" mixers are available, too?
5. Soundcard Drivers
5.1. Which soundcard drivers can be used by Java Sound?
5.2. How can I find out which soundcard driver is used?
5.3. I've installed ALSA and the JDK 1.4.2 to take advantage of the ALSA support. Now, how do I use it?
5.4. Can I make ALSA the default in version 1.4.2?
5.5. How can I enable mixing with the "Direct Audio Device" mixers on Linux?
5.6. What are the requirements for using the direct audio devices?
5.7. How can I find out which soundcard driver is installed on my Linux system?
5.8. How does Java Sound deal with hardware buffers of the soundcard?
6. Synchronization
6.1. How can I synchronize two or more playback lines?
6.2. How can I synchronize playback (SourceDataLines) with recording (TargetDataLines)?
6.3. How can I synchronize playback to an external clock?
6.4. Do multiple Clip instances that are looped stay in sync?
6.5. Why does recording or playing for a certain period of time result in audio data that is shorter or longer than the period I recorded / played?
6.6. How can I use Mixer.synchronize()?
7. Audio Files
7.1. How can I save audio data to a file, like .wav or .aiff?
7.2. How can I add special chunks to .wav or .aiff files (like for a descriptive text or copyright)?
7.3. Is it possible to get information about loop points (e.g. from the 'smpl' chunk in .wav files) using the AudioFileFormat properties?
7.4. Why does AudioFileFormat.getFrameLength() always return -1 for .wav files?
7.5. Why does a .wav file contain PCM_UNSIGNED data if I try to save 8 bit PCM_SIGNED data?
7.6. How can I read in a .vox file and save it as .wav file?
7.7. How can I read from a headerless audio file?
7.8. How can I determine the length or the duration of an audio file?
7.9. How can I write an audio file in smaller parts?
7.10. Why are some .wav files not recognized by Java Sound?
7.11. Why is it not possible to write big-endian data using a WaveAudioOutputStream?
7.12. How can I edit or modify audio files?
7.13. How can I play audio files where the data is cached in the RAM?
7.14. Why is there a difference between using AudioSystem.write(..., File) and using AudioSystem.write(..., OutputStream) with a FileOutputStream?
7.15. Where can I find documentation on the AudioOutputStream programming?
7.16. How can I start playback of a file at a certain position?
7.17. Is it possible to read and write multichannel audio files?
7.18. How can I compare two audio files?
7.19. Is it possible to insert recorded audio data into an existing file?
7.20. How can I store an audio file in a byte array?
7.21. Which value should I use for the length of the file in AudioOutputStreams if the length is not known in advance?
8. Sample Representation and AudioFormat
8.1. How is audio represented digitally?
8.2. In which cases should I use a floating point representation for audio data?
8.3. What is the meaning of frame rate in AudioFormat?
8.4. What is the meaning of frame size in AudioFormat?
8.5. What is signed / unsigned?
8.6. How can I use Java's signed byte type to store an 8 bit unsigned sample?
8.7. How can I find out if an AudioFormat is signed or unsigned?
8.8. What is endianness / big endian / little endian?
8.9. How are samples organized in a byte array/stream?
8.10. What does "unknown sample rate" in an AudioFormat object mean?
9. Conversion between sample representations
9.1. How can I convert 8 bit signed samples to 8 bit unsigned or vice versa?
9.2. How do I convert short (16 bit) samples to bytes to store them in a byte array?
9.3. How do I convert float or double samples to bytes to store them in a byte array?
9.4. How can I reconstruct sample values from a byte array?
9.5. How can I convert between mono and stereo?
9.6. How can I make a mono stream appear on one channel of a stereo stream?
10. AudioInputStreams and Byte Arrays
10.1. How can I read an audio file and store the audio data in a byte array?
10.2. How can I write audio data from a byte array to an audio file?
10.3. How can I calculate the number of bytes to skip from the length in seconds?
10.4. How do I rewind an AudioInputStream?
10.5. How do I skip backwards on an AudioInputStream?
10.6. How can I implement a real-time AudioInputStream, though I cannot give a length for it, as it is not known in advance?
10.7. How can I mix two (or more) AudioInputStream instances to a resulting AudioInputStream?
10.8. How can I create an AudioInputStream that represents a portion of another AudioInputStream?
10.9. Why does AudioInputStream.getFrameLength() return -1?
10.10. What is the difference between AudioSystem.getAudioInputStream(InputStream) and new AudioInputStream(InputStream, AudioFormat, long)?
11. Data Processing (Amplifying, Mixing, Signal Processing)
11.1. How can I do some processing on an A-law stream (like amplifying it)?
11.2. How can I detect the level of sound while I am recording it?
11.3. How can I do sample rate conversion?
11.4. How can I detect the frequency (or pitch) of sound data?
11.5. How can I do equalizing / noise reduction / fft / echo cancellation / ...?
11.6. How can I do silence suppression or silence detection?
11.7. How can I do mixing of audio streams?
11.8. Should I use float or double for signal processing?
11.9. How can I do computations with complex numbers in Java?
11.10. How can I change the pitch (frequency) of audio data without changing the duration?
11.11. How can I change the duration of audio data without changing the pitch (frequency)?
11.12. How can I use reverberation?
11.13. How can I find out the maximum volume of a sound file?
11.14. How can I normalize the volume of sound?
11.15. How can I calculate the power of a signal?
12. Compression and Encodings
12.1. Ogg Vorbis
12.1.1. What is Ogg Vorbis?
12.1.2. How can I play back Ogg Vorbis files?
12.1.3. How can I encode Ogg Vorbis files?
12.1.4. Who should we lobby to get Ogg Vorbis support in the Sun JRE?
12.1.5. How can I get the duration of an Ogg Vorbis file?
12.2. mp3
12.2.1. How can I play back mp3 files?
12.2.2. Why is there no mp3 decoder in the Sun JRE/JDK?
12.2.3. What is the legal state of the JLayer mp3 decoder?
12.2.4. What are the differences between the JLayer mp3 decoder plug-in and the Sun mp3 decoder plug-in?
12.2.5. How can I encode mp3 files?
12.2.6. Is there a mp3 encoder implemented in pure java?
12.2.7. Which input formats can I use for the mp3 encoder?
12.2.8. Is mp3 encoding possible on Mac OS?
12.2.9. Why do I get an UnsupportedAudioFileException when trying to play a mp3 file?
12.2.10. How can I get the length of an mp3 stream?
12.3. GSM 06.10
12.3.1. Is there support for GSM?
12.3.2. Why does the GSM codec refuse to encode from/decode to the format I want?
12.3.3. How can I read a .wav file with GSM data or store GSM-encoded data in a .wav file?
12.3.4. I want to convert to/from GSM using the Tritonus plug-in. However, I do not work with files or streams. Rather, I want to convert byte[] arrays.
12.3.5. How can I decode GSM from frames of 260 bit?
12.3.6. How can I calculate the duration of a GSM file?
12.3.7. Are there native implementations of codecs that are compatible with the framing format used by the Java Sound GSM codec?
12.4. A-law and μ-law
12.4.1. What are A-law and μ-law?
12.4.2. How can I convert a PCM encoded byte[] to a μ-law byte[]?
12.5. Speex
12.5.1. What is Speex?
12.5.2. Is there support for Speex?
12.5.3. How do I use JSpeex?
12.5.4. How can I get the duration of a Speex file?
12.6. Miscellaneous
12.6.1. Is there support for ADPCM (a.k.a. G723) in Java Sound?
12.6.2. Is there support for WMA and ASF in Java Sound?
12.6.3. How can I convert between two encoded formats directly (e.g. from mp3 to A-law)?
12.6.4. What compression schemas can I use?
12.6.5. How can I get Encoding instances for GSM and mp3 with JDKs older than 1.5.0?
12.6.6. Is there support for RealAudio / RealMedia (.ra / .rm files)?
12.6.7. How can I get support for a new encoding?
13. Audio data transfer over networks
13.1. How can I do streaming of audio data?
13.2. Why do I get distorted sound in my streaming application if it is used on the internet, but works on a LAN?
13.3. How can I upload recorded audio data to a server?
13.4. What compression schema should I use to transfer audio data over a network?
14. Ports
14.1. How do I use the interface Port?
14.2. Why is it not possible to retrieve Port instances?
14.3. Why is it not possible to retrieve Control instances from Port lines?
14.4. What does opening and closing mean for Port lines?
14.5. Why is it not possible to read data from a microphone Port line?
14.6. Can I use Java Sound's Port interface to control volume and tone of sound played with an application using JMF?
14.7. Why are there no Port instances of certain predefined types (like Port.Info.MICROPHONE or Port.Info.COMPACT_DISC) on Linux?
15. Miscellaneous
15.1. Why is playback of audio data with Java Sound significantly quieter than with a similar player on the native OS?
15.2. Can I use multi-channel sound?
15.3. Which multi-channel soundcards can I use with Java Sound?
15.4. Can I use the rear channels of a four-channel soundcard (like Soundblaster Life! and Soundblaster Audigy)?
15.5. How can I read audio data from a CD?
15.6. Why is there no sound at all when running my program on Linux, while on Windows it works as expected?
15.7. How can I display audio data as a waveform?
15.8. What is the difference between AudioInputStream and TargetDataLine?
15.9. Does Java Sound support 24 bit/96 kHz audio?

1. DataLines

1.1. General
1.1.1. How can I be notified when data is available for write/read in a SourceDataLine or TargetDataLine?
1.1.2. Why does it fail to open any line with 16 kHz sample rate?
1.1.3. How can I get a SourceDataLine or TargetDataLine in μ-law format?
1.1.4. Why does simultaneous recording and playback only work when first opening the playback line (SourceDataLine)?
1.1.5. Why doesn't simultaneous recording and playback work at all with the Sun JDK 1.3/1.4 on GNU/Linux?
1.1.6. How can I get a Line from a specific Mixer?
1.1.7. Why are there no mono lines with the "Direct Audio Devices" mixers on Linux?
1.1.8. Why is a SourceDataLine called "source" and a TargetDataLine called "target" though it's actually the other way round?
1.1.9. Why are DataLine.getFramePosition() and DataLine.getMicrosecondPosition() so inaccurate?
1.1.10. Why does DataLine.getLevel() always return -1?
1.1.11. What is the difference between DataLine.isActive() and DataLine.isRunning()?
1.1.12. How can I detect a buffer underrun or overrun?
1.1.13. Why is there no event for notifying applications of an underrun/overrun condition?
1.1.14. How can I find out the current playback or recording position?
1.1.15. How can I do looping in playback?
1.2. SourceDataLine
1.2.1. How can I prevent the last bit of sound played on a SourceDataLine from being repeated?
1.2.2. Why is playback distorted, too fast or too slow with the JDK 1.5.0 beta, but not with earlier versions of the JDK?
1.3. TargetDataLine
1.3.1. How can I capture from a specific source (microphone or line-in)?
1.3.2. How can I get more than one TargetDataLine?
1.3.3. Why is it not possible to open more than one TargetDataLine at the same time?
1.3.4. Why do I get a LineUnavailableException: "Requested format incompatible with already established device format"?
1.3.5. How can I control the volume when recording with a TargetDataLine?
1.3.6. How should I use stop() and drain() on a TargetDataLine?
1.3.7. Why is TargetDataLine.read() blocking for a long time?
1.3.8. Why is the end of recordings cut off prematurely?
1.4. Clip
1.4.1. Why do I get an out of memory exception when trying to use a Clip with a 5 MB audio file?
1.4.2. Why do I get "LineUnavailableException: No Free Voices" when opening a Clip?
1.4.3. How can I rewind a Clip?
1.4.4. Why does the frame/microsecond position not jump back to zero when a Clip is looped?
1.4.5. Why are there failures, clicks and other random effects if a clip is played multiple times with 1.5?

1.1. General

1.1.1. How can I be notified when data is available for write/read in a SourceDataLine or TargetDataLine?
1.1.2. Why does it fail to open any line with 16 kHz sample rate?
1.1.3. How can I get a SourceDataLine or TargetDataLine in μ-law format?
1.1.4. Why does simultaneous recording and playback only work when first opening the playback line (SourceDataLine)?
1.1.5. Why doesn't simultaneous recording and playback work at all with the Sun JDK 1.3/1.4 on GNU/Linux?
1.1.6. How can I get a Line from a specific Mixer?
1.1.7. Why are there no mono lines with the "Direct Audio Devices" mixers on Linux?
1.1.8. Why is a SourceDataLine called "source" and a TargetDataLine called "target" though it's actually the other way round?
1.1.9. Why are DataLine.getFramePosition() and DataLine.getMicrosecondPosition() so inaccurate?
1.1.10. Why does DataLine.getLevel() always return -1?
1.1.11. What is the difference between DataLine.isActive() and DataLine.isRunning()?
1.1.12. How can I detect a buffer underrun or overrun?
1.1.13. Why is there no event for notifying applications of an underrun/overrun condition?
1.1.14. How can I find out the current playback or recording position?
1.1.15. How can I do looping in playback?
1.1.1.

How can I be notified when data is available for write/read in a SourceDataLine or TargetDataLine?

You have to use SourceDataLine/TargetDataLine.available(). The usual implementation for streaming audio (in Java Sound) is a dedicated thread for that - look at the Java Sound Demo which you can download from Sun or at the Java Sound Resources: Examples. (Florian)
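
As an illustration of this polling approach, here is a minimal sketch of such a dedicated thread. It assumes a SourceDataLine that is already open and started and an AudioInputStream that delivers data in the line's format; both names are placeholders.

import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.SourceDataLine;

public class PollingPlayer implements Runnable
{
    private SourceDataLine line;      // assumed to be open and started
    private AudioInputStream stream;  // assumed to deliver data in the line's format

    public PollingPlayer(SourceDataLine line, AudioInputStream stream)
    {
        this.line = line;
        this.stream = stream;
    }

    public void run()
    {
        byte[] buffer = new byte[line.getBufferSize()];
        try
        {
            while (true)
            {
                int writable = line.available();
                if (writable == 0)
                {
                    Thread.sleep(10);   // buffer is full, wait a little
                    continue;
                }
                int read = stream.read(buffer, 0, Math.min(writable, buffer.length));
                if (read == -1)
                {
                    break;              // end of stream
                }
                line.write(buffer, 0, read);
            }
            line.drain();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}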

1.1.2.

Why does it fail to open any line with 16 kHz sample rate?

Apparently, most Java Sound implementations do not provide that, even if the soundcard supports it. Future implementations will support that. (Florian)

1.1.3.

How can I get a SourceDataLine or TargetDataLine in μ-law format?

TargetDataLines are supposed to act as a "direct" way to communicate with the audio hardware device, i.e. your soundcard. When your soundcard does not support μ-law directly, the TargetDataLine won't either.

The way to go is to open a TargetDataLine in PCM format and route it through a format converter. See the documentation of AudioSystem on how to obtain converted streams. The converted stream then provides μ-law samples.

There is no drawback in this approach: all PC soundcards that I know of deliver only PCM, so the data has to be rendered to μ-law in software anyway. Whether this happens in the soundcard's driver, the operating system layer or in the application (your Java program) doesn't matter. You get maximum portability when using only PCM for TargetDataLines. (Florian)
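
As a rough sketch of this approach (the PCM format chosen here is an assumption; check the formats your mixer actually supports):

import javax.sound.sampled.*;

public class UlawCapture
{
    /** Returns a stream that delivers 8 bit mu-law data captured from the default device. */
    public static AudioInputStream openUlawCaptureStream()
        throws LineUnavailableException
    {
        // 8 kHz, 16 bit, mono, signed, little-endian (an assumption; adapt as needed)
        AudioFormat pcmFormat = new AudioFormat(8000.0F, 16, 1, true, false);
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, pcmFormat);
        TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
        line.open(pcmFormat);
        line.start();
        AudioInputStream pcmStream = new AudioInputStream(line);
        return AudioSystem.getAudioInputStream(AudioFormat.Encoding.ULAW, pcmStream);
    }
}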

1.1.4.

Why does simultaneous recording and playback only work when first opening the playback line (SourceDataLine)?

This depends on the soundcard and its driver for the native operating system. E.g. the Soundblaster 16 or 64 do not provide real full duplex, only a kind of pseudo full duplex. Under Windows, I experienced that you can only use this pseudo full duplex when the recording and playback lines are opened in a certain order. (Florian)

1.1.5.

Why doesn't simultaneous recording and playback work at all with the Sun JDK 1.3/1.4 on GNU/Linux?

Due to problems with some OSS drivers, full-duplex is disabled by default in versions up to 1.4.1. There are several ways to get full-duplex:

  • Use the ALSA support in JDK 1.4.2 or later. Note that in 1.4.2, the ALSA support is not used by default for playback. If you call AudioSystem.getLine(), the default is used (the "Java Sound Audio Engine"). To use the "Direct Audio Device" (which uses ALSA), obtain the respective mixer with AudioSystem.getMixer() and call getLine() on the mixer. To detect the "Direct Audio Device", look for the string "ALSA" in the vendor or description string of the Mixer.Info object. Although string comparison is not a nice way, it is highly likely that "ALSA" will appear in at least one of these strings in future releases. For recording, the "Direct Audio Device" is the default. A way to make it the default for playback, too, is to rename /dev/audio and /dev/dsp. However, this will disable sound support for all non-ALSA programs. In version 1.5, the "Direct Audio Device" mixers are the default for playback, too, if the soundcard supports mixing in hardware.

  • Use Tritonus. The Tritonus plug-ins work with Java versions that are older than 1.4.2, too. However, it is recommended to use 1.4.2 if possible. The ALSA support in 1.4.2 is more stable than the one in Tritonus.

See also Q: 3.3 (Matthias)

1.1.6.

How can I get a Line from a specific Mixer?

Obtain the list of available Mixer implementations with AudioSystem.getMixerInfo(). Select one of the returned Mixer.Info objects and call AudioSystem.getMixer(Mixer.Info) to obtain the Mixer. On this object, you can call Mixer.getLine(Line.Info) instead of AudioSystem.getLine(Line.Info). In the JDK 1.5.0, you can also use the convenience methods AudioSystem.getSourceDataLine(AudioFormat, Mixer.Info) and AudioSystem.getTargetDataLine(AudioFormat, Mixer.Info). A sketch of selecting a mixer by name follows below.
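
A sketch of selecting a mixer by name (the name "Primary Sound Driver" is only an example; inspect the names reported by AudioSystem.getMixerInfo() on your system; handling of LineUnavailableException is omitted):

AudioFormat format = ...;  // the desired format
DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
Mixer.Info[] mixerInfos = AudioSystem.getMixerInfo();
SourceDataLine line = null;
for (int i = 0; i < mixerInfos.length; i++)
{
    // pick the mixer by name; "Primary Sound Driver" is only an example
    if (mixerInfos[i].getName().equals("Primary Sound Driver"))
    {
        Mixer mixer = AudioSystem.getMixer(mixerInfos[i]);
        line = (SourceDataLine) mixer.getLine(info);
        break;
    }
}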

With the JDK 1.5.0, there is an additional possibility: The default provider properties can be used to select the default Mixer for each type of line (SourceDataLine, TargetDataLine, Clip, Port). The default Mixer, if available, is used in AudioSystem.getLine(). For details, see the specification. (Matthias)

1.1.7.

Why are there no mono lines with the "Direct Audio Devices" mixers on Linux?

The implementation of the "Direct Audio Device" queries the soundcard driver for the supported formats. Some ALSA drivers do not support mono lines, so they are not available in the "Direct Audio Device". The workaround is to open a stereo line and expand the mono data to stereo. See also How can I convert between mono and stereo? and How can I make a mono stream appear on one channel of a stereo stream? (Matthias)

1.1.8.

Why is a SourceDataLine called "source" and a TargetDataLine called "target" though it's actually the other way round?

Well, nobody really knows why this fancy naming was chosen. From the perspective of an application, it's counter-intuitive. To understand it, take the perspective of a Mixer object: It receives data from the application via a SourceDataLine object, this is its source of data. And it delivers data to the application via a TargetDataLine. So from the perspective of the Mixer, this is the target of its data. (Matthias)

1.1.9.

Why are DataLine.getFramePosition() and DataLine.getMicrosecondPosition() so inaccurate?

The implementation of these methods in the "Java Sound Audio Engine" is bad and will not be fixed. The "Direct Audio Device" has a much better implementation. See also What are all these mixers?

But keep in mind that it is not possible to get a frame precise playback position with these methods. There is too much buffering in the data path (also in the audio hardware), so calculating the position is always only an estimation.

If you try to measure the precision of DataLine.getMicrosecondPosition() with a real-time clock, you are also likely to see the effect of clock drift. For details on this phenomenon see Why does recording or playing for a certain period of time result in audio data that is shorter or longer than the period I recorded / played? (Matthias)

1.1.10.

Why does DataLine.getLevel() always return -1?

DataLine.getLevel() is not implemented in current versions of the Sun JDK (1.4.1), nor in any other known Java Sound implementation. Here is a suggestion from Florian Bomers on how to implement this functionality yourself:

  • Read the data from the TargetDataLine in blocks.

  • Convert each block to a common format, e.g. normalized floats [-1, +1], or 8 bit signed bytes. If your project can make use of LGPL'd code, have a look at class FloatSampleBuffer (for floats) or TConversionTool (for integer-based values) of the Tritonus project.

  • Calculate the level of the block. This could be the average, RMS power, peak amplitude, or similar. Be sure to use absolute values (or to square the amplitudes for the power). See also How can I calculate the power of a signal?

(Matthias)
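
As a sketch of the last step, the following method computes the RMS level of one block, assuming 16 bit signed little-endian samples; adapt the sample reconstruction to your AudioFormat.

public static double rmsLevel(byte[] buffer, int byteCount)
{
    int sampleCount = byteCount / 2;
    if (sampleCount == 0)
    {
        return 0.0;
    }
    double sumOfSquares = 0.0;
    for (int i = 0; i < sampleCount; i++)
    {
        // reconstruct a 16 bit little-endian sample and normalize it to [-1.0, +1.0]
        int low = buffer[2 * i] & 0xFF;
        int high = buffer[2 * i + 1];   // the high byte keeps the sign
        double sample = ((high << 8) | low) / 32768.0;
        sumOfSquares += sample * sample;
    }
    return Math.sqrt(sumOfSquares / sampleCount);
}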

1.1.11.

What is the difference between DataLine.isActive() and DataLine.isRunning()?

This is an issue where even the Java Sound gurus do not know a satisfying answer. A useful definition would be the following:

  • isActive() returns true if the line is in started state, i.e. between calls to start() and stop().

  • isRunning() returns true if data is actually read from or written to the device. This would mean that isRunning() returns false in case of buffer underruns or overruns.

However, this is not the way it is implemented. For the "Direct Audio Device" mixers, isActive() and isRunning() always return the same value. In general, it is recommended to use isActive(), since it is specified less ambiguously and is implemented consistently. See also bug #4791152. (Matthias)

1.1.12.

How can I detect a buffer underrun or overrun?

The following is working reliably at least with the "Direct Audio Device" mixers:

  • SourceDataLine: underrun if (line.available() == line.getBufferSize())

    SourceDataLine.available(): how much data can be written to the buffer. If the whole buffer can be written to, there is no data in the buffer to be rendered.

  • TargetDataLine: overrun if (line.available() == line.getBufferSize())

    TargetDataLine.available(): how much data can be read from the buffer. If the whole buffer can be read, there is no space in the buffer for new data captured from the line.

(Matthias)
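
Expressed as code (a sketch; sourceDataLine and targetDataLine are assumed to be open lines):

// SourceDataLine: the whole buffer is writable, so there is nothing left to play
boolean underrun = (sourceDataLine.available() == sourceDataLine.getBufferSize());

// TargetDataLine: the whole buffer is readable, so there is no room for newly captured data
boolean overrun = (targetDataLine.available() == targetDataLine.getBufferSize());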

1.1.13.

Why is there no event for notifying applications of an underrun/overrun condition?

This is Florian's (and my) opinion:

Java Sound is a low level audio API. We decided to give highest priority to performance and "bare" functionality, rather than adding many high-level features. And although this is not a reason to not add it, all low level audio APIs that I have worked closely with do not provide underrun notification.

(Matthias)

1.1.14.

How can I find out the current playback or recording position?

There are two possibilities:

  • Use DataLine.getFramePosition() or DataLine.getMicrosecondPosition(). These methods are supposed to return the current "hearing" position. However, they weren't implemented well prior to the JDK 1.5.0.

  • Count the frames that you read from or write to the DataLine and add one full buffer size and 15 milliseconds (a ballpark figure for the hardware delay) to it. As a reference point, use the time when the write()/read() method returns. This allows a reasonably accurate extrapolation. This method works best if you call read()/write() with buffers that fit exactly into the line's buffer size.

    This approach also works reasonably well with 1.4.2 and before. It is implemented in the JAM program at J1 2003.

(Matthias)

1.1.15.

How can I do looping in playback?

There are two possibilities:

  • Load the sound into a Clip and use Clip.loop(int count), with Clip.LOOP_CONTINUOUSLY for endless looping. Loop points inside the clip can be set with Clip.setLoopPoints(int start, int end). See also How can I rewind a Clip? A minimal sketch follows below.

  • Stream the data to a SourceDataLine yourself and, when the end of the data is reached, start writing it again from the beginning (for example by re-reading the file or by keeping the data in a byte array).

(Matthias)
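
A minimal sketch of the Clip variant; the file name "sound.wav" is a placeholder, imports from javax.sound.sampled and java.io as well as exception handling are omitted:

AudioInputStream stream = AudioSystem.getAudioInputStream(new File("sound.wav"));
DataLine.Info info = new DataLine.Info(Clip.class, stream.getFormat());
Clip clip = (Clip) AudioSystem.getLine(info);
clip.open(stream);
// play the whole clip in an endless loop; use clip.loop(count) for a fixed number of repetitions
clip.loop(Clip.LOOP_CONTINUOUSLY);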

1.2. SourceDataLine

1.2.1. How can I prevent the last bit of sound played on a SourceDataLine from being repeated?
1.2.2. Why is playback distorted, too fast or too slow with the JDK 1.5.0 beta, but not with earlier versions of the JDK?
1.2.1.

How can I prevent the last bit of sound played on a SourceDataLine from being repeated?

This can be avoided easily: after writing all data to the SourceDataLine call drain() and stop(). If you want to reuse the line after this, call start() again before writing more data to the line. (Matthias)
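
In code, the end-of-playback sequence looks like this (a sketch; line is an open, started SourceDataLine):

line.write(buffer, 0, length);   // the last block of data
line.drain();                    // wait until everything in the buffer has been played
line.stop();                     // stopping now cannot repeat leftover data
// to reuse the line, call line.start() again before writing more data;
// otherwise call line.close() to release it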

1.2.2.

Why is playback distorted, too fast or too slow with the JDK 1.5.0 beta, but not with earlier versions of the JDK?

The reason is a common misconception about how Line.open() works. According to the specification, open() without parameters opens a line in a "default format". The default format of a line is an implementation specific property. It is not the AudioFormat used in the DataLine.Info object. Rather, the format in DataLine.Info is used to request a DataLine instance that is capable of handling this format. This does not necessarily mean that the line has to be opened in that format. Note that it is possible to construct DataLine.Info with an array of AudioFormat objects. This means that the requested line has to be able to handle any of the given formats.

The Java Sound implementation prior to JDK 1.5.0 had the following property: if only one AudioFormat is given in a DataLine.Info, this AudioFormat becomes the default format of the line. This led to the behaviour that it was possible to specify the format for open() via the DataLine.Info object. However, this behaviour was never specified; it is just an implementation specific property you can't rely on in general. The "Direct Audio Device" mixers in JDK 1.5.0 beta (see also What are all these mixers?) behave differently: they just pick one of the supported hardware formats as the default format. This is correct behaviour according to the specification, since the specification doesn't specify how the default format is chosen.

Therefore, it is recommended to always specify the format when opening a DataLine: use open(AudioFormat format) or open(AudioFormat format, int buffersize) rather than Line.open() without parameters. See also Line.open(), SourceDataLine.open(AudioFormat), SourceDataLine.open(AudioFormat, int), TargetDataLine.open(AudioFormat) and TargetDataLine.open(AudioFormat, int).

Wrong code
AudioFormat format = ...;
DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
// line is *capable* of being opened in format
SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
// open in default format, not necessarily the same as format
line.open();
Correct code
AudioFormat format = ...;
DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
// line is *capable* of being opened in format
SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
// open in desired format
line.open(format);

It was decided to change the behaviour for the final version of the JDK 1.5.0 to provide backward compatibility with the JDK 1.4. The former unportable technique will be specified behaviour. See also bugs #5053380 and #5067526 (Matthias)

1.3. TargetDataLine

1.3.1. How can I capture from a specific source (microphone or line-in)?
1.3.2. How can I get more than one TargetDataLine?
1.3.3. Why is it not possible to open more than one TargetDataLine at the same time?
1.3.4. Why do I get a LineUnavailableException: "Requested format incompatible with already established device format"?
1.3.5. How can I control the volume when recording with a TargetDataLine?
1.3.6. How should I use stop() and drain() on a TargetDataLine?
1.3.7. Why is TargetDataLine.read() blocking for a long time?
1.3.8. Why is the end of recordings cut off prematurely?
1.3.1.

How can I capture from a specific source (microphone or line-in)?

You can use the system mixer of your operating system to select the recording source in the same way you would do it for a native program. With newer versions of the Sun JDK, you can achieve the same by using the interface javax.sound.sampled.Port. See the section Ports for details. (Matthias)

1.3.2.

How can I get more than one TargetDataLine?

Current implementations of the Java Sound API do not support multiple TargetDataLines for the same recording source. There are no plans to change this behaviour. If, in the future, multi-channel soundcards are supported, it may be possible to get different TargetDataLine instances for the different inputs. If you just want to "split" lines, do it in your application. See also Can I use multi-channel sound? (Matthias)

1.3.3.

Why is it not possible to open more than one TargetDataLine at the same time?

Well, because it's a bug. This is true for the Sun JDK up to version 1.4.2 on Solaris and Windows, and up to 1.4.1 on Linux. Beginning with version 1.5.0 for Solaris and Windows and version 1.4.2 for Linux, there are the new "Direct Audio Device" mixers that don't have this limitation.

Tritonus is unaffected by this limitation. (Matthias)

1.3.4.

Why do I get a LineUnavailableException: "Requested format incompatible with already established device format"?

This is a bug that was fixed for 1.4.2. If you have to use an older version, there are two possible workarounds:

  • Do not play back anything using the "Java Sound Audio Engine" before recording. In versions prior to 1.4.2, there is no way of doing playback at all without using the "Java Sound Audio Engine". If the "Java Sound Audio Engine" is used, it opens the sound device at 44100 Hz, 16 bit stereo, thereby setting the "previously established format".

  • Always capture at 16 bit, stereo, 44100Hz. If you need your sound data in a different format, you can convert it afterwards. See also Conversion between sample representations and How can I do sample rate conversion?

(Matthias)

1.3.5.

How can I control the volume when recording with a TargetDataLine?

The obvious solution would be to get a Control object of type VOLUME or MASTER_GAIN for the TargetDataLine and manipulate the volume via this object. However, this is not possible, since no known Java Sound implementation supports any controls for TargetDataLine instances.

What you can do is to use the system mixer to control the recording volume --- it affects hardware settings in the soundcard. One possibility is to use the mixer application of the operating system. The other possibility is using Port lines from inside a Java Sound application. See the section Ports for details.

The remaining possibility is to implement a volume control digitally: multiply each sample of the sound data by a factor that lowers or raises the level proportionally. See also Change the amplitude (volume) of an audio file (Matthias)
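
A sketch of such a digital volume control for 16 bit signed little-endian samples (the sample layout is an assumption; adapt it to your AudioFormat):

public static void changeVolume(byte[] buffer, int byteCount, float factor)
{
    for (int i = 0; i < byteCount; i += 2)
    {
        // reconstruct the 16 bit little-endian sample
        int low = buffer[i] & 0xFF;
        int high = buffer[i + 1];          // the high byte keeps the sign
        int sample = (high << 8) | low;
        sample = (int) (sample * factor);
        // clip to the 16 bit range to avoid wrap-around distortion
        if (sample > 32767) sample = 32767;
        if (sample < -32768) sample = -32768;
        buffer[i] = (byte) (sample & 0xFF);
        buffer[i + 1] = (byte) ((sample >> 8) & 0xFF);
    }
}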

1.3.6.

How should I use stop() and drain() on a TargetDataLine?

It is specified that TargetDataLine.drain() has to wait until all data has been delivered to the TargetDataLine. If the line is not yet stopped, there is always data being delivered to the line. So you should call drain() only after stop(). In fact, drain() isn't needed with TargetDataLine at all.

A common technique to terminate reading from a TargetDataLine is the following:

byte[] buffer = new byte[TDL.getBufferSize()];
int count;
TDL.stop();
do
{
    // read() needs a buffer; once the line is stopped and its buffer is empty, it returns 0
    count = TDL.read(buffer, 0, buffer.length);
}
while (count > 0);
TDL.close();

For an implementation of TargetDataLine.drain() to be 100% compliant you need to block when the line is started and there is still data available. One way to do this is the following:

public void drain()
{
    while (isActive() && (available() > 0))
    {
        try { Thread.sleep(100); }
        catch (InterruptedException e) { /* ignore and check the condition again */ }
    }
}

(Matthias)

1.3.7.

Why is TargetDataLine.read() blocking for a long time?

By specification, TargetDataLine.read() is a blocking call: it waits until the requested amount of data is available. To use read() in a non-blocking manner, you can check how much data is available with available() and request only that amount. If you want to use read() in a standard blocking manner, but need quick response for a real-time application, use smaller buffers for reading. See also What is the minimum buffer size I can use? (Matthias)
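
A sketch of the non-blocking variant (line is assumed to be an open, started TargetDataLine):

byte[] buffer = new byte[line.getBufferSize()];
int frameSize = line.getFormat().getFrameSize();
// request only what is already available, rounded down to whole frames
int requested = (Math.min(line.available(), buffer.length) / frameSize) * frameSize;
if (requested > 0)
{
    int read = line.read(buffer, 0, requested);
    // process 'read' bytes ...
}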

1.3.8.

Why is the end of recordings cut off prematurely?

Even after calling stop() on a TargetDataLine, there may be data remaining in its internal buffer. Make sure you read data until there is no more available. Then you can call close() on the line. See also How should I use stop() and drain() on a TargetDataLine? (Matthias)

1.4. Clip

1.4.1. Why do I get an out of memory exception when trying to use a Clip with a 5 MB audio file?
1.4.2. Why do I get "LineUnavailableException: No Free Voices" when opening a Clip?
1.4.3. How can I rewind a Clip?
1.4.4. Why does the frame/microsecond position not jump back to zero when a Clip is looped?
1.4.5. Why are there failures, clicks and other random effects if a clip is played multiple times with 1.5?
1.4.1.

Why do I get an out of memory exception when trying to use a Clip with a 5 MB audio file?

For files of this size, you should stream the audio. That way, you handle only small buffers at a time and feed them successively into the audio device. Look at the Java Sound Resources: Examples; there are some streaming audio players to take as a start. (Florian)

1.4.2.

Why do I get "LineUnavailableException: No Free Voices" when opening a Clip?

This happens with the "Java Sound Audio Engine" when too many clips are open. While you can obtain any number of Clip instances, only 32 can be open at the same time. This is a hard limitation of the engine; it can only mix 32 channels. As a workaround, you can close unused clips and open them once they are needed again. If you really need more than 32 channels, you can do the mixing in your application and output the result to a SourceDataLine. (Matthias)

1.4.3.

How can I rewind a Clip?

Stop the clip by calling stop(), then use clip.setFramePosition(0) or clip.setMicrosecondPosition(0). Alternatively, you can set looping points so that rewinding occurs automatically: clip.setLoopPoints(0, -1) (in this case you have to call clip.loop(...) instead of clip.start()). (Matthias)

1.4.4.

Why does the frame/microsecond position not jump back to zero when a Clip is looped?

getFramePosition() and getMicrosecondPosition() are specified to return the position corresponding to the time since the line (or clip) was opened. If you want to get the position inside the loop of a looping clip, you can use something similar to this (assuming you are looping over the whole length of the clip):

currentFrame = clip.getFramePosition() % clip.getFrameLength();

(Matthias)

1.4.5.

Why are there failures, clicks and other random effects if a clip is played multiple times with 1.5?

This is a bug, and apparently one not easy to fix. See bug #6251460. Note that you can work around this issue by using the old "Java Sound Audio Engine" instead of the "Direct Audio Device" mixers. This way, you get the same behaviour as in 1.4. See also What are all these mixers? (Matthias)

2. Controls

2.1. Why do the SourceDataLine instances I get when using the "Direct Audio Device" (ALSA on Linux) have no controls?
2.2. What is the difference between a BALANCE and a PAN control? Which one should I use?
2.3. Why do mono lines from a "Direct Audio Device" have no PAN control?
2.4. Why does obtaining a gain control work with 1.4.2, but not with 1.5.0?
2.5. Why do Clip and SourceDataLine instances have no VOLUME control?
2.6. Why is there no sample rate control in 1.5.0?
2.1.

Why do the SourceDataLine instances I get when using the "Direct Audio Device" (ALSA on Linux) have no controls?

Lines from these mixers do not provide controls in 1.4.2. In Florian's original opinion, "any control would obscure the initial idea, to provide high-performance direct audio access". However, he changed his mind and implemented volume and balance controls in 1.5.0. (Matthias)

2.2.

What is the difference between a BALANCE and a PAN control? Which one should I use?

In music, pan knobs are used for mono input lines to control how they are mapped to stereo output lines. On the other hand, for stereo input lines, the knob is labelled "balance". So you should get a PAN control for mono lines and a BALANCE control for stereo lines (and none for lines with more than 2 channels).

In the Sun J2SDK, PAN controls behave like BALANCE controls for stereo lines and BALANCE like PAN for mono lines. However, this is only a convenience for compatibility. To write portable programs, you should not rely on this behaviour. (Matthias)
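
A sketch of choosing the control type portably, based on the number of channels (line is assumed to be an open SourceDataLine):

FloatControl.Type type = (line.getFormat().getChannels() == 1)
    ? FloatControl.Type.PAN
    : FloatControl.Type.BALANCE;
if (line.isControlSupported(type))
{
    FloatControl position = (FloatControl) line.getControl(type);
    position.setValue(-1.0F);   // -1.0 is full left, 0.0 center, +1.0 full right
}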

2.3.

Why do mono lines from a "Direct Audio Device" have no PAN control?

To implement a PAN control for a mono line, it has to be "distributed" between the left and right channel of a stereo line. This was no problem with the "Java Sound Audio Engine". The "Java Sound Audio Engine" always opens the soundcard in stereo, so it is always possible to do this "distribution". The "Direct Audio Device" implementation, however, opens the soundcard in mono if a mono line is requested. So it's not possible to implement a PAN control for such lines.

The workaround is to work with stereo: convert your stream to stereo and open the SourceDataLine in that stereo format. Then this line will have a BALANCE control, which works like a PAN control. See also How can I convert between mono and stereo? and What is the difference between a BALANCE and a PAN control? Which one should I use? (Matthias)

2.4.

Why does obtaining a gain control work with 1.4.2, but not with 1.5.0?

Gain (FloatControl.Type.MASTER_GAIN / FloatControl.Type.VOLUME) controls are still available with the "Direct Audio Device" mixers in 1.5.0 (see also What are all these mixers?). However, the behaviour has been changed so that controls are only available after the line has been opened. This was necessary because, in general, some controls are only available if the device driver supports certain features, which can be queried only after the respective device has been opened. (Matthias)
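
So with 1.5.0, query the control only after opening the line. A sketch (line and format are assumed to exist; the control may still be absent, hence the isControlSupported() check):

line.open(format);
if (line.isControlSupported(FloatControl.Type.MASTER_GAIN))
{
    FloatControl gain = (FloatControl) line.getControl(FloatControl.Type.MASTER_GAIN);
    // the valid range is gain.getMinimum() .. gain.getMaximum(), in dB
    gain.setValue(-6.0F);   // attenuate by 6 dB
}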

2.5.

Why do Clip and SourceDataLine instances have no VOLUME control?

Clip and SourceDataLine instances provide a FloatControl.Type.MASTER_GAIN control rather than a FloatControl.Type.VOLUME control to control the playback volume. See also Why does obtaining a gain control work with 1.4.2, but not with 1.5.0? (Matthias)

2.6.

Why is there no sample rate control in 1.5.0?

The "Direct Audio Device" mixers in 1.5 (see What are all these mixers?) do not provide a sample rate control. To cite Florian:

This is mostly because we wanted to give direct access to the sound hardware, without the problems of high-level features — namely latency and processor usage. We may add sample rate in future if we find a good way to add it without affecting performance.

As an alternative, you can resample your data with a sample rate converter to achieve the same effect. See also How can I do sample rate conversion?

Or, you can still use the sample rate control of the "Java Sound Audio Engine" with 1.5 by requesting lines directly from it. See How can I get a Line from a specific Mixer? (Matthias)

3. DataLine buffers

3.1. What is the minimum buffer size I can use?
3.2. Why does a line have the default buffer size though a buffer size was specified in a DataLine.Info object when obtaining the line?
3.3. Why is it not possible to use large buffers for a DataLine with 1.5.0?
3.1.

What is the minimum buffer size I can use?

Obviously, this depends on the operating system, the hardware, the Java VM, which Mixer implementation you use and several other factors. The following measurements have been found experimentally on a very old PC (350 MHz) under Linux with the Sun JDK 1.4.2_02:

format         sample rate   Playback:                       Recording:
                             Java Sound     Direct Audio     Simple Input   Direct Audio
                             Audio Engine   Device           Device         Device
8 bit mono     11025 Hz      no results     no results       no results     no results
               22050 Hz      no results     no results       no results     no results
               44100 Hz      no results     no results       no results     no results
8 bit stereo   11025 Hz      no results     no results       no results     no results
               22050 Hz      no results     no results       no results     no results
               44100 Hz      no results     no results       no results     no results
16 bit mono    11025 Hz      1024 bytes     no results       no results     no results
               22050 Hz      2048 bytes     no results       no results     no results
               44100 Hz      4096 bytes     no results       no results     no results
16 bit stereo  11025 Hz      no results     no results       no results     no results
               22050 Hz      no results     no results       no results     no results
               44100 Hz      no results     no results       no results     no results

These measurements suggest that the latency introduced by buffers in the "Java Sound Audio Engine" is about 50 ms, independent of the sample rate. (Matthias)

3.2.

Why does a line have the default buffer size though a buffer size was specified in a DataLine.Info object when obtaining the line?

This happens with the "Direct Audio Device" mixers of the JDK 1.5.0 if the line is opened with open(AudioFormat) instead of open(AudioFormat, int). The reason for this behaviour is that by requiring a certain buffer size or range of buffer sizes in DataLine.Info, you obtain a line that is capable of setting its buffer size to the respective value. You still have to choose the actual value. This is done when opening the line: with open(AudioFormat, int), a certain buffer size for the line can be specified. If open(AudioFormat) is used, the line is opened with the default buffer size. Until 1.4.2, a buffer size given in DataLine.Info was used for opening if the open() call did not specify a buffer size. However, it was decided that automatically taking over this value is a questionable convenience. (Matthias)
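
In code (a sketch; the format and buffer size values are only examples, and exception handling is omitted):

AudioFormat format = new AudioFormat(44100.0F, 16, 2, true, false);
int bufferSizeInBytes = 8192;
DataLine.Info info = new DataLine.Info(SourceDataLine.class, format, bufferSizeInBytes);
SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
// the buffer size in DataLine.Info only expresses a capability;
// to actually get that size, pass it to open() as well
line.open(format, bufferSizeInBytes);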

3.3.

Why is it not possible to use large buffers for a DataLine with 1.5.0?

The DataLine implementation of the "Java Sound Audio Engine" has a circular buffer per line instance. For SourceDataLine instances, write() writes data to this buffer. A separate thread reads from the circular buffer and transfers the data to the native layer of the engine. This allows for arbitrary sized buffers, but results in the overhead of an additional buffer and one thread per DataLine.

The DataLine implementation of the "Direct Audio Device" of 1.5.0 does not have a circular buffer. Instead, it writes/reads data directly to/from the soundcard driver. This gives higher performance and lower latency. On the other hand, it restricts buffer sizes to what the soundcard driver supports.

Adding a layer of buffering to the "Direct Audio Device" mixers would result in the same performance penalty as the DataLine implementation of the "Java Sound Audio Engine". It would introduce a general overhead though the additional functionality is only needed in special cases. Therefore, it is unlikely that the implementation of the "Direct Audio Device" mixers will be changed to allow larger buffers.

If you need larger buffers, you can implement an additional layer with a circular buffer in your application. Then you can choose any size you want for this buffer. And note that you need an additional thread — like the "Java Sound Audio Engine". The Answering Machine has classes that do a similar job. There is also the class org.tritonus.share.TCircularBuffer in Tritonus that you can use for this purpose. (Matthias)

4. Mixers

4.1. What are all these mixers?
4.2. Why are there mixers from which I can't get a SourceDataLine?
4.3. How can I redirect sound output to a phone / modem device?
4.4. Can I use multiple soundcards at the same time?
4.5. Why can I record from different soundcards, but not play back to them?
4.6. How can I obtain the formats supported by a mixer (or at all)?
4.7. What formats are supported by "Direct Audio Device" mixers?
4.8. Why are there AudioFormat objects with frame rate/sample rate reported as -1 when I query a Mixer for its supported formats?
4.9. How can I detect which Port Mixer belongs to which soundcard?
4.10. How can I find out which Mixer implementation is used?
4.11. Why do I get lines from the "Java Sound Audio Engine" in the JDK 1.5.0 though the "Direct Audio Device" mixers are available, too?
4.1.

What are all these mixers?

There are several implementations of Mixer in Java Sound:

"Java Sound Audio Engine", beatnik engine

This is a software mixing engine. It provides SourceDataLine and Clip instances. It does not provide TargetDataLine instances. Output of this mixer goes to the audio device. In versions up to 1.4.2, this mixer is the default for playback. In 1.5, it is only used if there is no other way to mix audio streams (because neither the soundcard hardware nor the device driver support mixing).

Simple Input Devices, "Microsoft Sound Mapper" (Windows), "Linux,dev/dsp,multi threaded" (Linux), "Linux,dev/audio,multi threaded" (Linux, Solaris)

In versions 1.4.2 and earlier, this mixer is used for recording. It provides TargetDataLine instances, but nothing else. In 1.5, it is no longer available, because the direct audio devices can be used for recording on all platforms.

Direct Audio Devices, "Primary Sound Driver" (Windows), "Primary Sound Capture Driver" (Windows), "Soundcard [plughw:0,0]" (Linux)

These are mixers that can be used for playback as well as for recording. They provide SourceDataLine, TargetDataLine and Clip instances. In 1.4.2, they became available on Linux; in 1.5, Solaris and Windows followed. These mixers allow simultaneous playback and recording (full-duplex) if the soundcard supports it. These mixers do not do software mixing. So mixing of multiple playback lines is only available if either the soundcard hardware or the device driver is capable of mixing. In other words: you may get only one SourceDataLine, and you will always get only one TargetDataLine.

Port Mixers, "Port Soundcard" (Windows), "Port Soundcard [hw:0,0]" (Linux)

These mixers provide Port instances, but no other type of Line. So you can't play back or record with these mixers. They became available with 1.4.2 for Windows, and will be available for Solaris and Linux, too, in 1.5. See also Ports

Note that what Java Sound calls "Mixer" is different from what Windows calls "Mixer":

Java Sound   Windows
Mixer        audio device
Port         mixer

See also How can I find out which Mixer implementation is used? (Matthias)

4.2.

Why are there mixers from which I can't get a SourceDataLine?

There are mixers that only provide TargetDataLine instances. In the Sun JDK up to 1.4.2, SourceDataLine instances are only provided by the "Java Sound Audio Engine", while TargetDataLine instances are only provided by the "Simple Input Device" mixers. This is subject to change for JDK 1.5.

Starting with version 1.4.2, there are additional mixers that provide only Port instances. See also What are all these mixers? (Matthias)

4.3.

How can I redirect sound output to a phone / modem device?

With the Sun JDK 1.4.2 or earlier on Windows, you can set the default audio device to the telephone device: Control panel -> Multimedia (or Sounds...) -> Preferred Device. With the "Direct Audio Device" mixers of the JDK 1.5 it is also possible to use the default provider properties to set the default Mixer / MixerProvider inside Java Sound.

See also Why are there mixers from which I can't get a SourceDataLine?, How can I capture from a specific source (microphone or line-in)?, How can I get a Line from a specific Mixer? and Why can I record from different soundcards, but not play back to them? (Matthias)

4.4.

Can I use multiple soundcards at the same time?

For the Sun JDK, this is possible with version 1.4.2 and later for Linux and with version 1.5.0 and later for Solaris and Windows. For Tritonus, this is possible with the ALSA Mixer implementation. (Matthias)

4.5.

Why can I record from different soundcards, but not play back to them?

This is true for Solaris and Windows for Java versions up to 1.4.2. There, playback is only possible via the "Java Sound Audio Engine", which always uses the first soundcard. On the other hand, recording in these versions is done with the "Simple Input Device", which provides one Mixer instance per soundcard.

With the "Direct Audio Device" mixers, it is possible to choose different soundcards for output, too. See also What are all these mixers? (Matthias)

4.6.

How can I obtain the formats supported by a mixer (or at all)?

First, obtain a list of supported lines either from a Mixer object or from AudioSystem. For this, use the methods getSourceLineInfo() and getTargetLineInfo(). Then, check whether each of the returned Line.Info objects is an instance of DataLine.Info. If it is, cast the object to DataLine.Info. Now you can call getFormats() to obtain the AudioFormat types supported by this line type.

A code example:

Line.Info[] infos = AudioSystem.getSourceLineInfo();
// or:
// Line.Info[] infos = AudioSystem.getTargetLineInfo();
for (int i = 0; i < infos.length; i++)
{
  if (infos[i] instanceof DataLine.Info)
  {
    DataLine.Info dataLineInfo = (DataLine.Info) infos[i];
    AudioFormat[] supportedFormats = dataLineInfo.getFormats();
  }
}

To see what is supported on your system, you can use the application jsinfo. See also Why are there AudioFormat objects with frame rate/sample rate reported as -1 when I query a Mixer for its supported formats? (Matthias)

4.7.

What formats are supported by "Direct Audio Device" mixers?

It depends on the hardware. The mixers just report formats that are supported by the device driver. Typically, there are between 8 and 20 supported formats. To write a portable application, you should not assume that a certain format is always supported (though in fact, 44.1 kHz 16 bit stereo is almost always supported). Rather, you should check the supported formats at run-time and try to convert your audio data to one of the available formats. See also How can I obtain the formats supported by a mixer (or at all)? (Matthias)

4.8.

Why are there AudioFormat objects with frame rate/sample rate reported as -1 when I query a Mixer for its supported formats?

The -1 (AudioSystem.NOT_SPECIFIED) means that any reasonable sample rate is supported. Common soundcards typically support sample rates between 4 kHz and 48 kHz. See also How can I obtain the formats supported by a mixer (or at all)? (Matthias)

4.9.

How can I detect which Port Mixer belongs to which soundcard?

There is no really satisfying solution. You can try to match the name in the Mixer.Info object of a Port Mixer against the one of a DataLine Mixer. On Linux, this works reliably by looking at the device id that is part of the mixer name: "(hw:0)", "(hw:1)", "(plughw:0,1)". The first (or only) number refers to the number of the soundcard.

Windows does not allow querying which port belongs to which soundcard (there are ways on Windows, but it was not possible to use them for Java Sound because they require actually opening the devices). So the only thing you can do is match the name of the soundcard. However, this will not always work reliably. In particular, if there are two soundcards of the same model, their names will look the same.

See also What are all these mixers? (Matthias)

4.10.

How can I find out which Mixer implementation is used?

You can detect the mixer implementation from the class types of the lines you get:

Mixer implementation      interface type    class name
Java Sound Audio Engine   Mixer             HeadspaceMixer
                          SourceDataLine    MixerSourceLine
                          Clip              MixerClip
Direct Audio Device       Mixer             DirectAudioDevice
                          SourceDataLine    DirectAudioDevice$DirectSDL
                          TargetDataLine    DirectAudioDevice$DirectTDL
                          Clip              DirectAudioDevice$DirectClip
Simple Input Device       Mixer             SimpleInputDevice
                          TargetDataLine    SimpleInputDevice$InputDeviceDataLine
Tritonus ESD mixer        Mixer             EsdMixer
                          SourceDataLine    EsdSourceDataLine
                          TargetDataLine    EsdTargetDataLine
                          Clip              EsdClip
Tritonus ALSA mixer       Mixer             AlsaDataLineMixer
                          SourceDataLine    AlsaSourceDataLine
                          TargetDataLine    AlsaTargetDataLine

See also What are all these mixers? and How can I find out which soundcard driver is used? (Matthias)
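
A sketch that prints the implementation class name of every installed Mixer (for a line, call getClass().getName() on the line object itself):

Mixer.Info[] mixerInfos = AudioSystem.getMixerInfo();
for (int i = 0; i < mixerInfos.length; i++)
{
    Mixer mixer = AudioSystem.getMixer(mixerInfos[i]);
    System.out.println(mixerInfos[i].getName() + ": " + mixer.getClass().getName());
}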

4.11.

Why do I get lines from the "Java Sound Audio Engine" in the JDK 1.5.0 though the "Direct Audio Device" mixers are available, too?

In the JDK 1.5.0, the "Direct Audio Device" mixers are used by default if they support more than one concurrently active SourceDataLine. This is the case if either the soundcard hardware supports mixing of multiple channels (and the driver supports it) or the driver does software mixing of multiple channels.

If this is not the case, the "Java Sound Audio Engine" is used by default. If you don't mind the limitation that there will be only one SourceDataLine or Clip instance, you can still use the "Direct Audio Device" mixers by addressing them directly (see How can I get a Line from a specific Mixer?).

See also Can I make ALSA the default in version 1.4.2? and How can I enable mixing with the "Direct Audio Device" mixers on Linux? (Matthias)

5. Soundcard Drivers

5.1. Which soundcard drivers can be used by Java Sound?
5.2. How can I find out which soundcard driver is used?
5.3. I've installed ALSA and the JDK 1.4.2 to take advantage of the ALSA support. Now, how do I use it?
5.4. Can I make ALSA the default in version 1.4.2?
5.5. How can I enable mixing with the "Direct Audio Device" mixers on Linux?
5.6. What are the requirements for using the direct audio devices?
5.7. How can I find out which soundcard driver is installed on my Linux system?
5.8. How does Java Sound deal with hardware buffers of the soundcard?
5.1.

Which soundcard drivers can be used by Java Sound?

Mixer implementation      Windows                   Linux
Java Sound Audio Engine   Windows Multimedia API    OSS or ALSA OSS emulation
Direct Audio Device       DirectSound               ALSA
Simple Input Device       Windows Multimedia API    OSS or ALSA OSS emulation
Tritonus ALSA Mixer       ---                       ALSA
Tritonus ESD Mixer        ---                       depends on the version of Esound
                                                    (there are versions for OSS and ALSA)
jsasio                    ASIO Driver API           ---

See also What are all these mixers? and Q: 3.5 (Matthias)

5.2.

How can I find out which soundcard driver is used?

First, check which mixer is used (see How can I find out which Mixer implementation is used?). Then consult the table in Which soundcard drivers can be used by Java Sound? to find out the driver.

For Linux, there is no way to tell from Java Sound if a real OSS driver or ALSA's OSS emulation is used. See also How can I find out which soundcard driver is installed on my Linux system? (Matthias)

5.3.

I've installed ALSA and the JDK 1.4.2 to take advantage of the ALSA support. Now, how do I use it?

In 1.4.2, the "Java Sound Audio Engine" is still the default. To use the ALSA support, you have to obtain the Mixer object representing the direct audio access. Then, obtain lines from this object instead of via AudioSystem. See also How can I get a Line from a specific Mixer? (Matthias)

5.4.

Can I make ALSA the default in version 1.4.2?

You can, but only with an ugly trick: rename, remove or disable the device files /dev/dsp*. This disables the Java Sound Audio Engine, so the JDK falls back to using the ALSA mixers. But be aware that this disables the software synthesizer ("Java Sound Synthesizer"), too. So you won't be able to play MIDI files. And of course native applications using /dev/dsp won't be happy either. (Matthias)

5.5.

How can I enable mixing with the "Direct Audio Device" mixers on Linux?

The "Direct Audio Device" implementation on Linux is based on ALSA. Mixing is available in the Mixer instance if ALSA provides mixing. This is the case if the soundcard can do mixing in hardware and its ALSA driver supports this feature. This is true for some common soundcards like Soundblaster LIFE! and Soundblaster Audigy and cards based on the Trident 4D Wave NX chipset. If this feature is available at all, it needs no special configuration. It is enabled by default.

Using ALSA's dmix plug-in does not work together with Java Sound. The reason is that the "Direct Audio Device" mixer implementation based on ALSA queries the available hardware devices. However, a dmix device in ALSA is not a hardware device, so it is not recognized. Discussions about this issue led to the conclusion that there is no easy way to integrate a query for additional devices.

See also Q: 3.4 (Matthias)

5.6.

What are the requirements for using the direct audio devices?

According to Florian:

Operating System    JDK version    Audio device driver
Linux               1.4.2          ALSA 0.9.2 or later
Windows             1.5.0          DirectSound 5.0 or later (included with Windows ME/2000/XP)
Solaris             1.5.0          Mixer enabled (available in Solaris 8 and later)

(Matthias)

5.7.

How can I find out which soundcard driver is installed on my Linux system?

Run /sbin/lsmod to show the currently loaded kernel modules. If there are entries "snd" and "snd-*", you are running ALSA. A typical picture of ALSA is like this:

snd-mixer-oss          12672   1 (autoclean) [snd-pcm-oss]
snd-seq                38348   0 (autoclean) (unused)
snd-emu10k1            65956   1 (autoclean)
snd-hwdep               5024   0 (autoclean) [snd-emu10k1]
snd-rawmidi            13792   0 (autoclean) [snd-emu10k1]
snd-pcm                64416   0 (autoclean) [snd-pcm-oss snd-emu10k1]
snd-page-alloc          6148   0 (autoclean) [snd-emu10k1 snd-pcm]
snd-timer              15040   0 (autoclean) [snd-seq snd-pcm]
snd-ac97-codec         42200   0 (autoclean) [snd-emu10k1]
snd-seq-device          4116   0 (autoclean) [snd-seq snd-emu10k1 snd-rawmidi]
snd-util-mem            1504   0 (autoclean) [snd-emu10k1]
snd                    36832   0 (autoclean) [snd-pcm-oss snd-mixer-oss snd-seq
snd-emu10k1 snd-hwdep snd-rawmidi snd-pcm snd-timer snd-ac97-codec
snd-seq-device snd-util-mem]
soundcore               3556   6 (autoclean) [snd]
		  

An alternative way is to look for the directory /proc/asound/. It is only present if ALSA is active. (Matthias)

5.8.

How does Java Sound deal with hardware buffers of the soundcard?

Internally, Java Sound implementations usually do not work with hardware buffers. Instead, they use the platform's audio API for accessing the soundcard. See also DataLine buffers (Matthias)

6. Synchronization

6.1. How can I synchronize two or more playback lines?
6.2. How can I synchronize playback (SourceDataLines) with recording (TargetDataLines)?
6.3. How can I synchronize playback to an external clock?
6.4. Do multiple Clip instances that are looped stay in sync?
6.5. Why does recording or playing for a certain period of time result in audio data that is shorter or longer than the period I recorded / played?
6.6. How can I use Mixer.synchronize()?
6.1.

How can I synchronize two or more playback lines?

The synchronization functions in Mixer are not implemented. Nevertheless, playback typically stays in sync. (Matthias)

6.2.

How can I synchronize playback (SourceDataLines) with recording (TargetDataLines)?

As with multiple playback lines from the same Mixer object, playback and recording lines from the same Mixer object stay in sync once they are started. In practice, this means that you can only achieve synchronization this easy way by using the "Direct Audio Device" mixers: the "Java Sound Audio Engine" provides playback lines, but no recording lines, so playback/recording sync is not as easy with it. See also How can I synchronize two or more playback lines?

If playback and recording lines originate from different Mixer objects, you need to synchronize the soundcards that are represented by the Mixer objects. So the situation is similar to external synchronization. See also How can I synchronize playback to an external clock?

(Matthias)

6.3.

How can I synchronize playback to an external clock?

This is possible in one of two ways:

See also Q: 3.12 (Matthias)

6.4.

Do multiple Clip instances that are looped stay in sync?

Yes. There is no mechanism in Java Sound to start Clip instances synchronously. However, calling start() for all Clip instances in a loop, with the Clip instances otherwise prepared, should be precise enough. Once started, Clip instances played on the same Mixer instance should stay in sync. If they don't, make sure they have exactly the same length. Clip instances played on different Mixer instances are likely to drift away from each other, unless the soundcard clocks are synchronized (which is only possible on "pro" soundcards). (Matthias)
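For illustration, a minimal sketch of starting a set of already opened and prepared Clip instances as closely together as possible:

import javax.sound.sampled.Clip;

// clips: Clip instances that are already open()ed and positioned at frame 0
static void startLooping(Clip[] clips)
{
    for (int i = 0; i < clips.length; i++)
    {
        clips[i].loop(Clip.LOOP_CONTINUOUSLY);
    }
}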

6.5.

Why does recording or playing for a certain period of time result in audio data that is shorter or longer than the period I recorded / played?

The reason for this problem is clock drift. There are two clocks involved in this scenario: the real-time clock is used to measure the period of time you are recording or playing, while the soundcard clock determines how many samples are recorded or played during this period. Since these are two different hardware devices, they inherently drift away from each other over time.

There are several ways to deal with this problem:

  • You can try to minimize the drift by making both clocks high-precision. The real-time clock of the computer can be synchronized to atomic clocks by using some means of synchronization. The Network Time Protocol (NTP) is commonly used for this on the internet. On Windows, the utility AboutTime can be used for synchronization. The precision of the soundcard clock can be enhanced by using a professional soundcard with a "word clock" input. This input has to be connected to an external high-precision time base. In this case, the soundcard clock is synchronized to the external clock source. Professional studios often spend tens of thousands of dollars to purchase a high-precision time base. Note that this solution minimizes the drift, but cannot remove it completely.

  • You can use the soundcard clock as your time base to measure wall-clock time. This way, you have removed the second clock, so there is no drift. While this may sound inconvenient, it may be a good solution if the audio data has to be synchronized to, for instance, video playback or the playback of slides, mouse events or MIDI. If your soundcard's clock is synchronized to an external time base as described in the previous point, using it to measure wall-clock time is likely to give much better results than using the computer's (unsynchronized) "real time" clock.

  • If both of the above solutions are not appropriate, you can adapt the length of the audio data by doing time stretching/shrinking. This usually requires fairly advanced and computationally expensive DSP algorithms. In this case, you do not remove the clock drift, but remove its effect on your audio data.

(Matthias)

6.6.

How can I use Mixer.synchronize()?

Synchronization isn't implemented in any known Java Sound implementation. It may be implemented in future versions. Note that you can check the availability of synchronization with the method Mixer.isSynchronizationSupported(). See also Do multiple Clip instances that are looped stay in sync?, How can I synchronize two or more playback lines? and Why does recording or playing for a certain period of time result in audio data that is shorter or longer than the period I recorded / played? (Matthias)

7. Audio Files

7.1. How can I save audio data to a file, like .wav or .aiff?
7.2. How can I add special chunks to .wav or .aiff files (like for a descriptive text or copyright)?
7.3. Is it possible to get information about loop points (e.g. from the 'smpl' chunk in .wav files) using the AudioFileFormat properties?
7.4. Why does AudioFileFormat.getFrameLength() always return -1 for .wav files?
7.5. Why does a .wav file contain PCM_UNSIGNED data if I try to save 8 bit PCM_SIGNED data?
7.6. How can I read in a .vox file and save it as .wav file?
7.7. How can I read from a headerless audio file?
7.8. How can I determine the length or the duration of an audio file?
7.9. How can I write an audio file in smaller parts?
7.10. Why are some .wav files not recognized by Java Sound?
7.11. Why is it not possible to write big-endian data using a WaveAudioOutputStream?
7.12. How can I edit or modify audio files?
7.13. How can I play audio files where the data is cached in the RAM?
7.14. Why is there a difference between using AudioSystem.write(..., File) and using AudioSystem.write(..., OutputStream) with a FileOutputStream?
7.15. Where can I find documentation on the AudioOutputStream programming?
7.16. How can I start playback of a file at a certain position?
7.17. Is it possible to read and write multichannel audio files?
7.18. How can I compare two audio files?
7.19. Is it possible to insert recorded audio data into an existing file?
7.20. How can I store an audio file in a byte array?
7.21. Which value should I use for the length of the file in AudioOutputStreams if the length is not known in advance?
7.1.

How can I save audio data to a file, like .wav or .aiff?

Have a look at the Java Sound Resources: Examples. (Florian)

7.2.

How can I add special chunks to .wav or .aiff files (like for a descriptive text or copyright)?

The Java Sound API does not support this currently. Future versions are likely to, because this is indeed quite important. For the moment, you will need to implement your own class for writing .wav or .aiff files. Or make meaningful filenames... (Florian)

7.3.

Is it possible to get information about loop points (e.g. from the 'smpl' chunk in .wav files) using the AudioFileFormat properties?

While with the JDK 1.5's properties there is a way to represent such information, Sun's AudioFileReader implementation just ignores such chunks. However, it is possible to write your own implementation that handles the chunks and places the information in AudioFileFormat properties. See also Q & A 2, “Service Provider Interface (SPI)” (Matthias)

7.4.

Why does AudioFileFormat.getFrameLength() always return -1 for .wav files?

This information is never given in the AudioFileFormat for .wav files. It is a more or less reasonable choice from an implementation point of view. The reason is the chunk-oriented structure of the .wav file format. The information about the audio data length is in the format chunk of the .wav file. According to the specification, this chunk may be the last one. In other words: It may be the case that for getting the format information, you have to read to the end of a 20 MB file. That's why the implementors decided to not give this information.

The workaround: fetch an AudioInputStream with AudioSystem.getAudioInputStream(File). Then query the AudioInputStream object for its length. You can see an example of this technique in Getting information about an audio file. (Matthias)
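A sketch of this workaround (the file name is hypothetical; note that for some formats both values may still be AudioSystem.NOT_SPECIFIED):

import java.io.File;
import javax.sound.sampled.*;

File file = new File("sound.wav");
AudioInputStream ais = AudioSystem.getAudioInputStream(file);
long lengthInFrames = ais.getFrameLength();
double durationInSeconds =
    lengthInFrames / (double) ais.getFormat().getFrameRate();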

7.5.

Why does a .wav file contain PCM_UNSIGNED data if I try to save 8 bit PCM_SIGNED data?

By the specification, 8 bit data in .wav files has to be unsigned. Therefore, the signedness is converted automatically by Java Sound's file writer. (Matthias)

7.6.

How can I read in a .vox file and save it as .wav file?

Probably it's simplest to do it all yourself: use a RandomAccessFile or similar to open the .vox file, parse the headers, etc. You need to know the .vox file format; many documents specifying it can be found on the Internet. To create a .wav file from that, create an AudioFormat instance with the format read from the vox header, wrap an InputStream with the audio data of the vox file into an AudioInputStream, and then use AudioSystem.write() to write a .wav file. (Florian)

7.7.

How can I read from a headerless audio file?

If you know the format of your data, you can use the following approach:

File file = new File("headerless_audio_data.dat");
InputStream is = new FileInputStream(file);
is = new BufferedInputStream(is);
AudioFormat format = new AudioFormat(...);
long lLengthInFrames = file.length() / format.getFrameSize();
AudioInputStream ais = new AudioInputStream(is, format,
               lLengthInFrames);

See also the example Converting raw data (headerless) files. (Matthias)

7.8.

How can I determine the length or the duration of an audio file?

A common technique that works for PCM data is shown in the example Getting information about an audio file. For files with encoded data, the general technique is the following:

File file = new File("my_file.ext");
AudioFileFormat audioFileFormat = AudioSystem.getAudioFileFormat(file);
// get all properties
Map<String, Object> properties = audioFileFormat.properties();
// duration is in microseconds
Long duration = (Long) properties.get("duration");

Note that this technique requires the JDK 1.5.0. Even with this version, it currently does not work for ordinary .aiff, .au and .wav files (this is an implementation issue that can be fixed easily).

With recent javazoom versions of the mp3 and Ogg Vorbis plug-ins (not with the Tritonus versions), you can use a hack that tries to simulate the above programming technique. It can be used with older JDK versions:

import org.tritonus.share.sampled.file.TAudioFileFormat;

File file = new File("my_file.ext");
AudioFileFormat audioFileFormat = AudioSystem.getAudioFileFormat(file);
if (audioFileFormat instanceof TAudioFileFormat)
{
    // Tritonus SPI compliant audio file format.
    Map properties = ((TAudioFileFormat) audioFileFormat).properties();
    // duration is in microseconds
    Long duration = (Long) properties.get("duration");
}

See also Why does AudioInputStream.getFrameLength() return -1?, How can I get the duration of an Ogg Vorbis file?, How can I get the length of an mp3 stream? and How can I calculate the duration of a GSM file? (Matthias)

7.9.

How can I write an audio file in smaller parts?

AudioSystem.write() assumes that the AudioInputStream you pass to it contains everything that should go into the file. If you don't want to write the file as a whole, but in blocks, you can't use AudioSystem.write(). The alternative is to use Tritonus' AudioOutputStream architecture. See Tritonus plug-ins. (Matthias)

7.10.

Why are some .wav files not recognized by Java Sound?

Most audio file formats, including .wav, can contain audio data in various compressed formats. Only some of these formats are handled by the standard audio file readers. The formats handled are A-law and μ-law. Not handled are IMA ADPCM, MS ADPCM, and others. (Matthias)

7.11.

Why is it not possible to write big-endian data using a WaveAudioOutputStream?

.wav files always store data in little-endian order. And by design, AudioOutputStreams do not do any magic. Especially, they do not automatically convert endianess or signedness. (Matthias)

7.12.

How can I edit or modify audio files?

There are no special methods for this in Java Sound. Nevertheless, it is obviously possible: read data from a file into a byte array, modify the audio data there and save the modified array to a file. See also How can I read an audio file and store the audio data in a byte array? and How can I write audio data from a byte array to an audio file?.

An alternative approach is to write a subclass of AudioInputStream that modifies the data "flowing" through it. You can see an example of this technique in Change the amplitude (volume) of an audio file. (Matthias)

7.13.

How can I play audio files where the data is cached in the RAM?

There are two possibilities:

  • Use Clip lines. They load the data into the RAM before playback. However, there is a limit to the size of the data somewhere between 2 and 5 MB. See also Clip

  • Read the whole file (including its headers) into a byte array. Then construct a ByteArrayInputStream from this array and pass it to AudioSystem.getAudioInputStream(InputStream) to obtain an AudioInputStream.

(Matthias)

7.14.

Why is there a difference between using AudioSystem.write(..., File) and using AudioSystem.write(..., OutputStream) with a FileOutputStream?

The basic problem is that the length of the audio data has to be given in the header of an audio file, and the header is written at the beginning of the file. The length may not be known at the time the header is written. If the AudioInputStream passed to write() has a known length, this length is used for filling in the header. If, however, the AudioInputStream has an unknown length (AudioSystem.NOT_SPECIFIED), there is no valid information to fill in the header at the beginning. OutputStream allows only sequential writing: once the header is written, it cannot be changed any more. So if the length of the audio data is unknown, the header will contain invalid length information. If the destination is given as a File, the audio file writer can open the file in random access mode. After writing all audio data, it goes back to the beginning of the file and fixes the header with the then-known length information. This method is called "backpatching".

Due to this behaviour, AudioSystem.write(..., File) is recommended over AudioSystem.write(..., OutputStream) whenever possible. See also Why does AudioInputStream.getFrameLength() return -1?

The AudioOutputStream architecture of Tritonus has to deal with the same problem. There, the difference exists between using a TSeekableDataOutputStream (representing a File, allows backpatching) and using a TNonSeekableDataOutputStream (representing an OutputStream, does not allow backpatching). See also How can I write an audio file in smaller parts? (Matthias)

7.15.

Where can I find documentation on the AudioOutputStream programming?

The API documentation is part of the Tritonus docs; see Q: 9. The recommended way to learn about programming with the AudioOutputStream architecture is to look at the examples that use it, like Saving waveform data to a file (AudioOutputStream version). (Matthias)

7.16.

How can I start playback of a file at a certain position?

You can call skip() on the AudioInputStream you obtain for the file. Note that skip() can only advance the position, it cannot go back. To rewind see How do I rewind an AudioInputStream? (Matthias)
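A sketch of skipping to a given start frame before playback (skip() may skip less than requested, so it is called in a loop; converting a position in seconds to frames or bytes is described in How can I calculate the number of bytes to skip from the length in seconds?):

// ais: the AudioInputStream obtained for the file
// startFrame: the desired start position in frames
long bytesToSkip = startFrame * ais.getFormat().getFrameSize();
long skipped = 0;
while (skipped < bytesToSkip)
{
    long n = ais.skip(bytesToSkip - skipped);
    if (n <= 0)
    {
        break; // end of stream or no progress
    }
    skipped += n;
}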

7.17.

Is it possible to read and write multichannel audio files?

The file readers and writers of both the Sun JDK and Tritonus should support interleaved multi-channel WAVE files. This feature hasn't been tested extensively, so there may be minor bugs, but it should basically work.

Interleaved multichannel PCM formats are represented by an AudioFormat instance with the respective number of channels.

See also Can I use multi-channel sound? (Matthias)

7.18.

How can I compare two audio files?

If you want to check whether two files are exactly the same, this is easy: just compare them byte by byte. However, typically you want to compare two different recordings of the same piece of music. Because of noise, quantisation errors, different volume levels and other effects, two recordings never match exactly. So a simple comparison can't be used.

A useful comparison is a non-trivial task that requires knowledge about digital signal processing. One approach to do such a comparison is the following:

  1. normalize the file based on signal power

  2. transform to frequency domain with an FFT

  3. scale down the FFT components

  4. compare the series of frequency components with a statistical analysis for correlation

You may get better results by exchanging step 1 with step 2 and/or using a wavelet transformation instead of FFT.

See also How can I do equalizing / noise reduction / fft / echo cancellation / ...? and Q: 2 (Matthias)

7.19.

Is it possible to insert recorded audio data into an existing file?

With standard Java Sound functionality, it is not possible to insert recorded sound into an existing file. The obvious workaround is to record to a new, temporary file and then put the pieces together to the file you want.

If direct writing to an existing file is important to you, you could try to hack the AudioOutputStream classes of Tritonus. I think it is possible to introduce a constructor flag for "open existing" instead of "overwrite file completely" and to introduce a skip() method to move to a cue point. If you're interested in this, Florian and I will help you to find your way through the implementation of AudioOutputStreams. (Matthias)

7.20.

How can I store an audio file in a byte array?

You can pass an instance of ByteArrayOutputStream to AudioSystem.write(..., OutputStream). The byte array you extract from the ByteArrayOutputStream will contain the complete file including the headers. This technique is especially useful if you want to store audio files in a database. (Matthias)
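A sketch of this technique (the file type is an example; ais is the AudioInputStream you want to store):

import java.io.ByteArrayOutputStream;
import javax.sound.sampled.*;

ByteArrayOutputStream baos = new ByteArrayOutputStream();
AudioSystem.write(ais, AudioFileFormat.Type.WAVE, baos);
// the array contains the complete file, including the header
byte[] fileData = baos.toByteArray();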

7.21.

Which value should I use for the length of the file in AudioOutputStreams if the length is not known in advance?

If the length of the file is not known in advance, you should use the value AudioSystem.NOT_SPECIFIED. Pass this value to the constructors of AudioOutputStream subclasses directly or use it in requesting an AudioOutputStream instance via AudioSystemShadow.getAudioOutputStream().

Note that not knowing the length makes it impossible to use OutputStreams as target for some audio file types (File targets should work always). (Matthias)

8. Sample Representation and AudioFormat

8.1. How is audio represented digitally?
8.2. In which cases should I use a floating point representation for audio data?
8.3. What is the meaning of frame rate in AudioFormat?
8.4. What is the meaning of frame size in AudioFormat?
8.5. What is signed / unsigned?
8.6. How can I use Java's signed byte type to store an 8 bit unsigned sample?
8.7. How can I find out if an AudioFormat is signed or unsigned?
8.8. What is endianess / big endian / little endian?
8.9. How are samples organized in a byte array/stream?
8.10. What does "unknown sample rate" in an AudioFormat object mean?
8.1.

How is audio represented digitally?

Each second of sound is represented by a certain number of digital samples of the sound pressure (44,100 per second on a CD). The number of samples per second is called sample rate or sample frequency. In PCM (pulse code modulation) coding, each sample is usually a linear representation of amplitude as a signed integer (sometimes unsigned for 8 bit). There is one such sample for each channel: one channel for mono, two channels for stereo, four channels for quad, more for surround sound. One sample frame consists of one sample for each of the channels in turn, by convention running from left to right.

Each sample can be one byte (8 bits), two bytes (16 bits), three bytes (24 bits), or maybe even 20 bits or a floating-point number. Sometimes, for more than 16 bits per sample, the sample is padded to 32 bits (4 bytes). The order of the bytes in a sample is different on different platforms. In a Windows WAV soundfile, the less significant bytes come first from left to right ("little endian" byte order). In an AIFF soundfile, it is the other way round, as is standard in Java ("big endian" byte order). Floating-point numbers (4 byte float or 8 byte double) are the same on all platforms.

See also How are samples organized in a byte array/stream? and What is endianess / big endian / little endian? (Matthias)

8.2.

In which cases should I use a floating point representation for audio data?

Converting sample data to a floating point representation (float or double data type) is handy if you are doing DSP stuff. In this case, it gives greater precision and greater dynamic range. In all other cases, there is no advantage. Note also that conversion to or from floats is expensive, while dealing only with integer formats is typically much faster. (Matthias)

8.3.

What is the meaning of frame rate in AudioFormat?

For PCM, A-law and μ-law data, a frame is all data that belongs to one sampling interval. This means that the frame rate is the same as the sample rate.

For compressed formats like Ogg Vorbis, mp3 and GSM 06.10, the situation is different. A frame is a block of data as it is output by the encoder. Often, these blocks contain the information for several sampling intervals. For instance, an mp3 frame represents about 24 ms, so the frame rate is about 40 Hz. However, the sample rate of the original is preserved even inside the frames and is correctly restored after decoding. (Matthias)

8.4.

What is the meaning of frame size in AudioFormat?

As outlined in the previous question, it depends on what a frame is. For PCM, the frame size is just the number of bytes for one sample, multiplied by the number of channels. Note that usually each individual sample is represented in an integer number of bytes. For instance, a 12 bit stereo frame uses 4 bytes, not 3. For compressed formats, the frame size is a more or less arbitrarily chosen number that is a property of the compression scheme. Some compression methods have a variable rather than a constant frame size. In this case, the value returned by AudioFormat.getFrameSize() is -1. Some common frame sizes:

format                  frame size
PCM, 8 bit mono         1 byte
PCM, 8 bit stereo       2 bytes
PCM, 16 bit mono        2 bytes
PCM, 16 bit stereo      4 bytes
GSM 06.10               33 bytes
mp3                     -1 (variable)
Ogg Vorbis              -1 (variable)

(Matthias)

8.5.

What is signed / unsigned?

For PCM, sample values are represented by integers. These integers can be signed or unsigned, similar to signed or unsigned data types in programming languages like C. The following table shows the value ranges for signed and unsigned integers of common sizes and of the general case:

sample size    signedness    minimum value    center value    maximum value
8 bit          unsigned      0                128              255
               signed        -128             0                127
16 bit         unsigned      0                32768            65535
               signed        -32768           0                32767
24 bit         unsigned      0                8388608          16777215
               signed        -8388608         0                8388607
32 bit         unsigned      0                2147483648       4294967295
               signed        -2147483648      0                2147483647
n bit          unsigned      0                2^(n-1)          2^n - 1
               signed        -2^(n-1)         0                2^(n-1) - 1

(Matthias)

8.6.

How can I use Java's signed byte type to store an 8 bit unsigned sample?

Basically, a byte is a storage container for 8 bits. Whether these 8 bits are used to store a signed or an unsigned number is a matter of interpretation. Yes, Java always interprets bytes as signed. But they can be interpreted just the other way, too.

The 8 bits can always represent 256 different bit patterns. In unsigned interpretation, these 256 bit patterns are interpreted as the decimal values 0 to 255. In signed interpretation, patterns are interpreted as the decimal values -128 to 127.

The following table may help to understand this.

bit pattern    unsigned (straight binary)    signed (two's complement)
0000 0000      0                             0
0000 0001      1                             1
...            ...                           ...
0111 1110      126                           126
0111 1111      127                           127
1000 0000      128                           -128
1000 0001      129                           -127
...            ...                           ...
1111 1110      254                           -2
1111 1111      255                           -1

In representing wave forms, the range of the respective interpretation is used to express minimum and maximum of the wave.

waveform point    unsigned coding    signed coding
minimum value     0                  -128
center value      128                0
maximum value     255                127

As you can see, the difference between signed and unsigned notation, expressed in decimal, is 128. (Matthias)

8.7.

How can I find out if an AudioFormat is signed or unsigned?

For PCM, check if the encoding equals either AudioFormat.Encoding.PCM_SIGNED or AudioFormat.Encoding.PCM_UNSIGNED. (Matthias)
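For example (a fragment, assuming format is an AudioFormat instance):

AudioFormat.Encoding encoding = format.getEncoding();
boolean signed = encoding.equals(AudioFormat.Encoding.PCM_SIGNED);
boolean unsigned = encoding.equals(AudioFormat.Encoding.PCM_UNSIGNED);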

8.8.

What is endianess / big endian / little endian?

Most common computers have their memory organized in units of 8 bits, called a byte. The bytes can be addressed by ordinal numbers, starting with zero. (The hardware organization of the memory is often in rows of 16, 32, 64, 128 or even more bits. But the instruction set of the processor still gives you the view of a byte-organized memory.) If you want to store a value that needs more than 8 bits, the question arises how the bits of the value are divided into bytes and stored in memory. If you have a value with 16 bits, there is not much discussion that it has to be divided into two groups: bits 0 to 7 and bits 8 to 15. But then, the fight starts. Some CPUs store the first group (bits 0 to 7) in the byte with the lower address and the second group (bits 8 to 15) in the byte with the higher address. This schema is called little endian. As an example, all Intel architecture and Alpha CPUs are little endian. Other types of CPUs do it the other way round, which is called big endian. Sparc (Sun), PowerPC (Motorola, IBM) and Mips (PMC-Sierra) CPUs are big endian.

For Java Sound, endianess matters if the size of samples (as given by AudioFormat.getSampleSizeInBits()) is greater than 8 bit. For 8 bit data, while the endianess still has to be specified in an AudioFormat object, it has no significance. It is a convention in Java Sound that Mixer, AudioFileWriter and FormatConversionProvider implementations handle both endianesses, but you can't really rely on this. (Matthias)

8.9.

How are samples organized in a byte array/stream?

It depends on the format of the data, which is given as an AudioFormat instance. Below are some common cases.

PCM (signed or unsigned), 8 bit, mono (1 channel):
  byte 0: 1st sample
  byte 1: 2nd sample
  byte 2: 3rd sample
  byte 3: 4th sample

PCM (signed or unsigned), 8 bit, stereo (2 channels):
  byte 0: 1st frame, left sample
  byte 1: 1st frame, right sample
  byte 2: 2nd frame, left sample
  byte 3: 2nd frame, right sample

PCM (signed or unsigned), 16 bit, mono (1 channel), little endian:
  byte 0: 1st sample, low byte
  byte 1: 1st sample, high byte
  byte 2: 2nd sample, low byte
  byte 3: 2nd sample, high byte

PCM (signed or unsigned), 16 bit, mono (1 channel), big endian:
  byte 0: 1st sample, high byte
  byte 1: 1st sample, low byte
  byte 2: 2nd sample, high byte
  byte 3: 2nd sample, low byte

PCM (signed or unsigned), 16 bit, stereo (2 channels), little endian:
  byte 0: 1st frame, left sample, low byte
  byte 1: 1st frame, left sample, high byte
  byte 2: 1st frame, right sample, low byte
  byte 3: 1st frame, right sample, high byte

PCM (signed or unsigned), 16 bit, stereo (2 channels), big endian:
  byte 0: 1st frame, left sample, high byte
  byte 1: 1st frame, left sample, low byte
  byte 2: 1st frame, right sample, high byte
  byte 3: 1st frame, right sample, low byte

PCM (signed or unsigned), 32 bit, mono (1 channel), big endian:
  byte 0: 1st frame, 1st sample, 4th byte (bits 24-31)
  byte 1: 1st frame, 1st sample, 3rd byte (bits 16-23)
  byte 2: 1st frame, 1st sample, 2nd byte (bits 8-15)
  byte 3: 1st frame, 1st sample, 1st byte (bits 0-7)

PCM (signed or unsigned), 32 bit, mono (1 channel), little endian:
  byte 0: 1st frame, 1st sample, 1st byte (bits 0-7)
  byte 1: 1st frame, 1st sample, 2nd byte (bits 8-15)
  byte 2: 1st frame, 1st sample, 3rd byte (bits 16-23)
  byte 3: 1st frame, 1st sample, 4th byte (bits 24-31)

To understand the terms little endian, big endian, high byte and low byte, see What is endianess / big endian / little endian? See also How do I convert short (16 bit) samples to bytes to store them in a byte array? and How can I reconstruct sample values from a byte array? (Matthias)

8.10.

What does "unknown sample rate" in an AudioFormat object mean?

Since 1.5.0, "unknown sample rate" is output by AudioFormat.toString() if the sample rate is -1 (AudioSystem.NOT_SPECIFIED). See also Why are there AudioFormat objects with frame rate/sample rate reported as -1 when I query a Mixer for its supported formats? (Matthias)

9. Conversion between sample representations

9.1. How can I convert 8 bit signed samples to 8 bit unsigned or vice versa?
9.2. How do I convert short (16 bit) samples to bytes to store them in a byte array?
9.3. How do I convert float or double samples to bytes to store them in a byte array?
9.4. How can I reconstruct sample values from a byte array?
9.5. How can I convert between mono and stereo?
9.6. How can I make a mono stream appear on one channel of a stereo stream?
9.1.

How can I convert 8 bit signed samples to 8 bit unsigned or vice versa?

Signed to unsigned:

byte unsigned = (byte) (signed + 128);

Unsigned to signed:

byte signed = (byte) (unsigned - 128);

Alternatively, you can use the following for both conversions:

byte changed = (byte) (original ^ 0x80);

(Matthias)

9.2.

How do I convert short (16 bit) samples to bytes to store them in a byte array?

Generally:

short sample = ...;
byte high = (byte) ((sample >> 8) & 0xFF);
byte low = (byte) (sample & 0xFF);

If you want to store them in an array in big endian byte order:

short sample = ...;
byte[] buffer = ...;
int offset = ...;
// high byte
buffer[offset + 0] = (byte) ((sample >> 8) & 0xFF);
// low byte
buffer[offset + 1] = (byte) (sample & 0xFF);

If you want to store them in an array in little endian byte order:

short sample = ...;
byte[] buffer = ...;
int offset = ...;
// low byte
buffer[offset + 0] = (byte) (sample & 0xFF);
// high byte
buffer[offset + 1] = (byte) ((sample >> 8) & 0xFF);

Note that in Java, arithmetic operations on integers are always done with ints (32 bit) or longs (64 bit). Using arithmetic operations on byte or short leads to extending them to int. Therefore, storing 16 bit values in int (32 bit) variables uses less processing time if you want to do calculations like the above. On the other hand, it doubles memory usage.

Optimized code to do these conversions can be found in the class TConversionTool of Tritonus. See also How are samples organized in a byte array/stream? (Matthias)

9.3.

How do I convert float or double samples to bytes to store them in a byte array?

You can do this with the following steps:

  1. Make sure the values of the samples are normalized to a range -1.0 to +1.0

  2. Multiply the values with 32767.0

  3. Convert the values to int, preferably by using Math.round()

  4. Proceed as described in How do I convert short (16 bit) samples to bytes to store them in a byte array?

Code example for float samples:

// the sample to process
float fSample = ...;
// saturation
fSample = Math.min(1.0F, Math.max(-1.0F, fSample));
// scaling and conversion to integer
int nSample = Math.round(fSample * 32767.0F);
byte high = (byte) ((nSample >> 8) & 0xFF);
byte low = (byte) (nSample & 0xFF);

Code example for double samples:

// the sample to process
double dSample = ...;
// saturation
dSample = Math.min(1.0, Math.max(-1.0, dSample));
// scaling and conversion to integer
int nSample = (int) Math.round(dSample * 32767.0);
byte high = (byte) ((nSample >> 8) & 0xFF);
byte low = (byte) (nSample & 0xFF);

(Matthias)

9.4.

How can I reconstruct sample values from a byte array?

The code below assumes that buffer is an array of bytes and offset an int used as an index into the buffer. It further assumes that the sample values are signed for sample sizes greater than 8 bit.

8 bit, short (upper 8 bits contain the value, lower 8 bits are filled with zero):
  signed:
    short sample = (short) (buffer[offset] << 8);
  unsigned:
    short sample = (short) ((buffer[offset] ^ 0x80) << 8);

8 bit, float (normalized to the range [-1.0 .. +1.0]):
  signed:
    float sample = buffer[offset] / 128.0F;
  unsigned:
    float sample = ((buffer[offset] & 0xFF) - 128) / 128.0F;

8 bit, double (normalized to the range [-1.0 .. +1.0]):
  signed:
    double sample = buffer[offset] / 128.0;
  unsigned:
    double sample = ((buffer[offset] & 0xFF) - 128) / 128.0;

16 bit, short (all bits used):
  little endian:
    short sample = (short) ((buffer[offset + 0] & 0xFF)
                          | (buffer[offset + 1] << 8));
  big endian:
    short sample = (short) ((buffer[offset + 0] << 8)
                          | (buffer[offset + 1] & 0xFF));

16 bit, int (lower 16 bits contain the value, upper 16 bits are sign extended):
  little endian:
    int sample = (buffer[offset + 0] & 0xFF)
               | (buffer[offset + 1] << 8);
  big endian:
    int sample = (buffer[offset + 0] << 8)
               | (buffer[offset + 1] & 0xFF);

16 bit, float (normalized to the range [-1.0 .. +1.0]):
  little endian:
    float sample = ((buffer[offset + 0] & 0xFF)
                  | (buffer[offset + 1] << 8)) / 32768.0F;
  big endian:
    float sample = ((buffer[offset + 0] << 8)
                  | (buffer[offset + 1] & 0xFF)) / 32768.0F;

16 bit, double (normalized to the range [-1.0 .. +1.0]):
  little endian:
    double sample = ((buffer[offset + 0] & 0xFF)
                   | (buffer[offset + 1] << 8)) / 32768.0;
  big endian:
    double sample = ((buffer[offset + 0] << 8)
                   | (buffer[offset + 1] & 0xFF)) / 32768.0;

24 bit, int (lower 24 bits contain the value, upper 8 bits are sign extended):
  little endian:
    int sample = (buffer[offset + 0] & 0xFF)
               | ((buffer[offset + 1] & 0xFF) << 8)
               | (buffer[offset + 2] << 16);
  big endian:
    int sample = (buffer[offset + 0] << 16)
               | ((buffer[offset + 1] & 0xFF) << 8)
               | (buffer[offset + 2] & 0xFF);

24 bit, float (normalized to the range [-1.0 .. +1.0]):
  little endian:
    float sample = ((buffer[offset + 0] & 0xFF)
                  | ((buffer[offset + 1] & 0xFF) << 8)
                  | (buffer[offset + 2] << 16)) / 8388608.0F;
  big endian:
    float sample = ((buffer[offset + 0] << 16)
                  | ((buffer[offset + 1] & 0xFF) << 8)
                  | (buffer[offset + 2] & 0xFF)) / 8388608.0F;

24 bit, double (normalized to the range [-1.0 .. +1.0]):
  little endian:
    double sample = ((buffer[offset + 0] & 0xFF)
                   | ((buffer[offset + 1] & 0xFF) << 8)
                   | (buffer[offset + 2] << 16)) / 8388608.0;
  big endian:
    double sample = ((buffer[offset + 0] << 16)
                   | ((buffer[offset + 1] & 0xFF) << 8)
                   | (buffer[offset + 2] & 0xFF)) / 8388608.0;

32 bit, int (all bits used):
  little endian:
    int sample = (buffer[offset + 0] & 0xFF)
               | ((buffer[offset + 1] & 0xFF) << 8)
               | ((buffer[offset + 2] & 0xFF) << 16)
               | (buffer[offset + 3] << 24);
  big endian:
    int sample = (buffer[offset + 0] << 24)
               | ((buffer[offset + 1] & 0xFF) << 16)
               | ((buffer[offset + 2] & 0xFF) << 8)
               | (buffer[offset + 3] & 0xFF);

32 bit, float (normalized to the range [-1.0 .. +1.0]):
  little endian:
    float sample = ((buffer[offset + 0] & 0xFF)
                  | ((buffer[offset + 1] & 0xFF) << 8)
                  | ((buffer[offset + 2] & 0xFF) << 16)
                  | (buffer[offset + 3] << 24)) / 2147483648.0F;
  big endian:
    float sample = ((buffer[offset + 0] << 24)
                  | ((buffer[offset + 1] & 0xFF) << 16)
                  | ((buffer[offset + 2] & 0xFF) << 8)
                  | (buffer[offset + 3] & 0xFF)) / 2147483648.0F;

32 bit, double (normalized to the range [-1.0 .. +1.0]):
  little endian:
    double sample = ((buffer[offset + 0] & 0xFF)
                   | ((buffer[offset + 1] & 0xFF) << 8)
                   | ((buffer[offset + 2] & 0xFF) << 16)
                   | (buffer[offset + 3] << 24)) / 2147483648.0;
  big endian:
    double sample = ((buffer[offset + 0] << 24)
                   | ((buffer[offset + 1] & 0xFF) << 16)
                   | ((buffer[offset + 2] & 0xFF) << 8)
                   | (buffer[offset + 3] & 0xFF)) / 2147483648.0;

Optimized code to do these conversions can be found in the class TConversionTool of Tritonus. See also How are samples organized in a byte array/stream? (Matthias)

9.5.

How can I convert between mono and stereo?

This is possible with the PCM2PCM converter of Tritonus. It is available as part of the "Tritonus Miscellaneous" package. See Tritonus Plug-ins. (Matthias)
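With a suitable converter installed, the conversion can be requested via AudioSystem. A sketch for mono to stereo (whether the conversion is actually supported depends on the installed plug-ins; if not, an IllegalArgumentException is thrown):

// monoStream: an AudioInputStream with one channel of PCM data
AudioFormat monoFormat = monoStream.getFormat();
AudioFormat stereoFormat = new AudioFormat(
    monoFormat.getEncoding(),
    monoFormat.getSampleRate(),
    monoFormat.getSampleSizeInBits(),
    2,                             // channels
    monoFormat.getFrameSize() * 2, // frame size
    monoFormat.getFrameRate(),
    monoFormat.isBigEndian());
AudioInputStream stereoStream =
    AudioSystem.getAudioInputStream(stereoFormat, monoStream);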

9.6.

How can I make a mono stream appear on one channel of a stereo stream?

You can use a technique like the one shown below. The example assumes that the data is 8 bit unsigned.

// incoming: mono input stream
// outgoing: stereo output stream (twice as long as incoming)
// copyToLeftChannel: true to put the signal on the left channel,
//                    false to put it on the right channel
void monoToSingleSideStereo(byte[] incoming, byte[] outgoing,
                            boolean copyToLeftChannel)
{
    int nSignalOffset;
    int nSilenceOffset;
    // this is for unsigned data. For signed data, use the value 0.
    byte nSilenceValue = -128;

    if (copyToLeftChannel)
    {
        nSignalOffset = 0;
        nSilenceOffset = 1;
    }
    else // signal to the right channel
    {
        nSignalOffset = 1;
        nSilenceOffset = 0;
    }
    for (int i = 0; i < incoming.length; i++)
    {
        outgoing[(i * 2) + nSignalOffset] = incoming[i];
        outgoing[(i * 2) + nSilenceOffset] = nSilenceValue;
    }
}

Alternatively, you can use a PAN control on a SourceDataLine while doing playback. Note that this only works with the "Java Sound Audio Engine". With the "Direct Audio Device" mixers, you have to use a workaround: convert the mono stream to a stereo stream (see How can I convert between mono and stereo?), open the line in stereo and use a BALANCE control. See also Why are there no mono lines with the "Direct Audio Devices" mixers on Linux? (Matthias)

10. AudioInputStreams and Byte Arrays

10.1. How can I read an audio file and store the audio data in a byte array?
10.2. How can I write audio data from a byte array to an audio file?
10.3. How can I calculate the number of bytes to skip from the length in seconds?
10.4. How do I rewind an AudioInputStream?
10.5. How do I skip backwards on an AudioInputStream?
10.6. How can I implement a real-time AudioInputStream, though I cannot give a length for it, as it is not known in advance?
10.7. How can I mix two (or more) AudioInputStream instances to a resulting AudioInputStream?
10.8. How can I create an AudioInputStream that represents a portion of another AudioInputStream?
10.9. Why does AudioInputStream.getFrameLength() return -1?
10.10. What is the difference between AudioSystem.getAudioInputStream(InputStream) and new AudioInputStream(InputStream, AudioFormat, long)?
10.1.

How can I read an audio file and store the audio data in a byte array?

Create a ByteArrayOutputStream object. Then, in a loop, read from the AudioInputStream and write the data read from it to the ByteArrayOutputStream. Once all data is processed, call ByteArrayOutputStream.toByteArray() to get a byte array with all the data. See Buffering of audio data in memory for a code example.

As an alternative, you can do the following:

  • Calculate the required size of the byte array from the number of frames and the frame size (see How can I determine the length or the duration of an audio file?).

  • Create a byte array of the calculated size.

  • Call AudioInputStream.read() with this array. Note that while this typically reads the whole file in one call, this is not guaranteed. If, for some reason, reading the whole content of the AudioInputStream does not succeed, only part of the data may be written to the byte array. Therefore, you have to compare the return value of read() against the length of the byte array. If some part is missing, you have to call read() again with an appropriate offset.

(Matthias)

10.2.

How can I write audio data from a byte array to an audio file?

Create a ByteArrayInputStream object from the byte array, create an AudioInputStream from it, then call AudioSystem.write(). See Buffering of audio data in memory for a code example. (Matthias)
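A sketch of these steps (format, frame length, file type and file name have to be supplied by you; the values here are examples):

import java.io.*;
import javax.sound.sampled.*;

// data: the raw audio data; format: its AudioFormat
long lengthInFrames = data.length / format.getFrameSize();
ByteArrayInputStream bais = new ByteArrayInputStream(data);
AudioInputStream ais = new AudioInputStream(bais, format, lengthInFrames);
AudioSystem.write(ais, AudioFileFormat.Type.WAVE, new File("out.wav"));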

10.3.

How can I calculate the number of bytes to skip from the length in seconds?

Use one of the following formulas:

bytes = seconds * sample rate * channels * (bits per sample / 8)

or

bytes = seconds * sample rate * frame size

You can get the sample rate, number of channels, bits per sample and frame size from an AudioFormat object. (Matthias)
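In code, rounding down to a whole number of frames so that the position stays on a frame boundary (a fragment; format is the AudioFormat of the data, and for PCM the frame rate equals the sample rate):

float seconds = 2.5F; // hypothetical value
long framesToSkip = (long) (seconds * format.getFrameRate());
long bytesToSkip = framesToSkip * format.getFrameSize();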

10.4.

How do I rewind an AudioInputStream?

See the example Playing an audio file multiple times. Note that the way the JavaSoundDemo does it is not recommended, because it relies on implementation specific behaviour. (Matthias)

10.5.

How do I skip backwards on an AudioInputStream?

In general, there is no clean way besides buffering the whole content of the AudioInputStream as done in the example Playing an audio file multiple times. There is one possibility: if the AudioInputStream is created from a FileInputStream, you can use AudioInputStream.skip() with a negative skip amount. This works because the AudioInputStream implementation just passes the skip() call to its underlying stream and the FileInputStream implementation is able to handle random access. Note, however, that this relies on unspecified, implementation-specific behaviour of the Sun JDK. Therefore, this approach should be used with care. (Matthias)

10.6.

How can I implement a real-time AudioInputStream, though I cannot give a length for it, as it is not known in advance?

You should use AudioSystem.NOT_SPECIFIED as length. This approach seems logical to me and it works fine in my program. (Florian)

10.7.

How can I mix two (or more) AudioInputStream instances to a resulting AudioInputStream?

There are no special methods in the Java Sound API to do this. However, mixing is a simple signal processing task; it can be accomplished with plain Java code. Have a look at Concatenating or mixing audio files. See also How can I do mixing of audio streams? (Matthias)

10.8.

How can I create an AudioInputStream that represents a portion of another AudioInputStream?

To create a derived AudioInputStream that starts at frame start of the original AudioInputStream and has a length of length frames, you can use the following code:

AudioInputStream originalAIS = ...
int start = ...; // in frames
int length = ...; // in frames

int frameSize = originalAIS.getFormat().getFrameSize();
originalAIS.skip(start * frameSize);
AudioInputStream derivedAIS = new AudioInputStream(originalAIS,
                                     originalAIS.getFormat(), length);

(Matthias)

10.9.

Why does AudioInputStream.getFrameLength() return -1?

A length of -1 (AudioSystem.NOT_SPECIFIED) means that the length of the stream is unknown. This typically happens in two situations:

  • If an AudioInputStream obtains its data from a TargetDataLine, the amount of data (and therefore, the length of the stream) is determined by the length of the recording. Obviously, this cannot be known at the time the AudioInputStream instance is created.

  • If audio data is encoded to or decoded from a compression format like Ogg Vorbis or mp3, where the length of the encoded data is not a simple fraction of the length of the unencoded data. In this case, it is not possible for the codec to calculate the length of the converted stream. So it has to state that the length is unknown.

To write portable programs, you should always expect that the length of an AudioInputStream may be -1. For instance, if you are calculating a buffer size from the stream length, you should handle this case separately. (Matthias)

10.10.

What is the difference between AudioSystem.getAudioInputStream(InputStream) and new AudioInputStream(InputStream, AudioFormat, long)?

AudioSystem.getAudioInputStream(InputStream) "intelligently" parses the header of the file in InputStream and tries to retrieve the format of it. This fails for "raw" audio files or files that aren't recognized by Java Sound. The AudioInputStream returned by this method is at the position where the actual audio data starts, the file header is skipped.

AudioInputStream(InputStream, AudioFormat, long) is a "stupid" constructor that just returns an AudioInputStream with the InputStream used "as is". No attempt is made to verify the format with the given AudioFormat instance - if you pass a wrong AudioFormat, the data in InputStream is interpreted in a wrong way.

Using the second way on an InputStream that is obtained from an audio file would give an AudioInputStream where the "audio data" starts with the file header. Often, the difference won't be noticeable, because headers are typically short (typically 44 bytes for .wav files, 24 bytes for .au files). However, there is no guarantee that the header is not much longer in some audio files, in which case it may be audible as clicks or noise.

Audio files without a header are an exception. These are typically "streamable" formats, e.g. mp3 and GSM 06.10. There, the data is organized in frames, and each frame has a very basic description of the audio data. So for these headerless formats, the two ways to get an AudioInputStream are equivalent. (Matthias)

11. Data Processing (Amplifying, Mixing, Signal Processing)

11.1. How can I do some processing on an A-law stream (like amplifying it)?
11.2. How can I detect the level of sound while I am recording it?
11.3. How can I do sample rate conversion?
11.4. How can I detect the frequency (or pitch) of sound data?
11.5. How can I do equalizing / noise reduction / fft / echo cancellation / ...?
11.6. How can I do silence suppression or silence detection?
11.7. How can I do mixing of audio streams?
11.8. Should I use float or double for signal processing?
11.9. How can I do computations with complex numbers in Java?
11.10. How can I change the pitch (frequency) of audio data without changing the duration?
11.11. How can I change the duration of audio data without changing the pitch (frequency)?
11.12. How can I use reverbation?
11.13. How can I find out the maximum volume of a sound file?
11.14. How can I normalize the volume of sound?
11.15. How can I calculate the power of a signal?
11.1.

How can I do some processing on an A-law stream (like amplifying it)?

It is much easier to change gain with linear encoding (PCM). I would strongly suggest that - especially when you have the data in linear format at first. You'd have to convert it back to A-law after processing. (Florian)

11.2.

How can I detect the level of sound while I am recording it?

First of all, you should have the data in PCM format (preferably signed PCM). Then you can look at the samples to detect the amplitude (level). Some statistics are suitable, too, like taking the average of the absolute values or the RMS. (Florian)
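A sketch for 16 bit signed little-endian PCM data (the byte decoding follows How can I reconstruct sample values from a byte array?):

// Returns the RMS level of a buffer of 16 bit signed little-endian samples,
// normalized to the range 0.0 ... 1.0.
static double rmsLevel(byte[] buffer, int byteCount)
{
    int sampleCount = byteCount / 2;
    if (sampleCount == 0)
    {
        return 0.0;
    }
    double sumOfSquares = 0.0;
    for (int i = 0; i < sampleCount; i++)
    {
        short sample = (short) ((buffer[2 * i] & 0xFF)
                              | (buffer[2 * i + 1] << 8));
        double value = sample / 32768.0;
        sumOfSquares += value * value;
    }
    return Math.sqrt(sumOfSquares / sampleCount);
}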

11.3.

How can I do sample rate conversion?

Currently, this is not supported by the Sun JDK (see bug #4916960). Tritonus has a sample rate converter that is available as a plug-in for other Java Sound implementations, too. See the 'Tritonus Miscellaneous' package at Tritonus Plug-ins. See Converting the sample rate of audio files for a code example. Also, JMF supports sample rate conversion. See also How can I convert between two encoded formats directly (e.g. from mp3 to A-law)? and Q: 16 (Matthias)

11.4.

How can I detect the frequency (or pitch) of sound data?

What you need is an algorithm called 'fast fourier transform', abbreviated 'FFT'. See also How can I do equalizing / noise reduction / fft / echo cancellation / ...? and Q: 2. (Matthias)

11.5.

How can I do equalizing / noise reduction / fft / echo cancellation / ...?

Java Sound is an API concerned with basic sound input and output. It does not contain digital signal processing algorithms. Nevertheless, you can do this with Java; you just have to code it on your own. Craig Lindley's book (see Q: 1) contains some DSP algorithms. Also, it is often easy to transform C or C++ code found on the net to Java.

You may want to have a look at the comp.dsp FAQ.

For code that does fft, have a look at the Peruna Project (original website is offline, view it at the Internet Archive). (Matthias)

11.6.

How can I do silence suppression or silence detection?

This can be achieved with a variant of a common DSP algorithm called a "noise gate". A noise gate is a special form of a compressor, which belongs to the area of dynamic processing. (Matthias)

11.7.

How can I do mixing of audio streams?

If you want to do playback of multiple streams, just obtain multiple instances of SourceDataLine, one for each stream to play. The data fed to the SourceDataLine instances is mixed inside the Mixer instance, either in software or in hardware. There is no way to monitor the result of the mixing, other than looping the soundcard's output line to some input line.

For mixing without playback see How can I mix two (or more) AudioInputStream instances to a resulting AudioInputStream?

If the sources of the audio data are not AudioInputStream instances, but byte buffers, you can use the class FloatSampleBuffer of Tritonus. (Matthias)

11.8.

Should I use float or double for signal processing?

This is a question discussed over and over again. It seems that there is no definitive answer. Which way to go depends on the circumstances and the requirements. Here are some arguments in favour of each alternative.

Advantages of using float:

  • It uses half of the memory size used by double: 4 bytes instead of 8 bytes per sample. This may be an issue if large amounts of data are stored in a floating point representation.

  • Calculations may be faster. This depends on the processor. For Pentium-class processors, there is no performance gain by using float with the standard FPU (Floating Point Unit): both float and double are handled using an 80 bit representation internally anyway. However, "multimedia" instructions that execute more than one operation simultaneously are only available for float.

  • The memory bandwidth needed to transfer data from the RAM to the processor and vice versa is half of that needed for double. This is an issue in real-time systems with high throughput, where the memory bandwidth is the limiting factor.

Advantages of using double:

  • There are smaller rounding errors for filter constants, so for algorithms with feedback (IIR filters), the probability of numerical instability is lower.

  • Some algorithms with a lot of feedback like reverb may require double.

  • Several mathematical functions (for instance sin(), log(), pow()) are only available with double parameters and return values. Using double throughout instead of float saves the conversions between float and double.

(Matthias)

11.9.

How can I do computations with complex numbers in Java?

Here are two implementations of classes for complex numbers: Cmplx.java, The Colt Distribution (Open Source Libraries for High Performance Scientific and Technical Computing in Java). (Matthias)

11.10.

How can I change the pitch (frequency) of audio data without changing the duration?

This is a quite complex problem called "pitch shifting". It requires advanced DSP algorithms. This is not available as part of the Java Sound API and is unlikely to ever become so. However, it is possible to do this in Java. One example is in Craig Lindley's book (see Q: 1). (Matthias)

11.11.

How can I change the duration of audio data without changing the pitch (frequency)?

This is a problem similar to pitch shifting: It requires non-trivial DSP algorithms. See Marvin's mail and Simon's mail for some links. (Matthias)

11.12.

How can I use reverbation?

The "Java Sound Audio Engine" (see What are all these mixers?) has an implementation of a Reverb control (it is implemented as Control of the Mixer. Note that Mixer extends Line, so you can get controls from a Mixer, too). However, it seems that it is not working.

In general, it is recommended to implement reverb yourself. The reason is that the availability of reverb as a control of a Mixer as an implementation-specific property of certain Mixer implementations. The "Java Sound Audio Engine" supports reverb, all other mixers don't. So relying on reverb in the mixer makes your program not portable. In the upcoming JDK 1.5.0, the "Java Sound Audio Engine" is no longer the default mixer, the default are now the "Direct Audio Device" mixers. There are many good reasons to use the "Direct Audio Device" mixers instead of the "Java Sound Audio Engine", including low latency and support for multiple soundcards. But if you need the reverberation, you are hooked to the "Java Sound Audio Engine". And one day, the "Java Sound Audio Engine" may disappear completely. (Matthias)

11.13.

How can I find out the maximum volume of a sound file?

In a loop, go through the whole file and examine each sample. The maximum volume is the maximum of the absolute values of all samples. For getting the sample values, see How are samples organized in a byte array/stream? (Matthias)

11.14.

How can I normalize the volume of sound?

One way to do it is to scan the whole wave to find its maximum sample value (taking absolute values, or the maximum and minimum), get the ratio of this value to the available maximum and scale up the whole wave.

An alternative that might work in practice is to use a compressor - i.e. apply a scaling algorithm that boosts the lower-level parts of the signal. This has the perceived effect of making everything sound louder - it's often done to TV ads.

The latter (compression) is the preferable approach. Just looking for the maximum and minimum sample may fail to make a quiet tune louder, because it may contain a single peak that already reaches the maximum. It's better to calculate the average level of the whole piece and use this value in relation to the possible maximum level to derive a compression ratio. Note that this usage of the term "compression" refers to reducing the dynamic range of music. It has nothing to do with the compression of MP3, which means reducing the storage size or bitrate. (Matthias)
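
As a minimal sketch of the first (peak-based) approach, assuming 16 bit signed, little-endian PCM in a byte array and a peak value obtained as in the previous question:

public static void normalize(byte[] audioData, int peak)
{
    if (peak == 0)
    {
        return; // silence: nothing to scale
    }
    float gain = 32767.0F / peak;
    for (int i = 0; i + 1 < audioData.length; i += 2)
    {
        int sample = (audioData[i] & 0xFF) | (audioData[i + 1] << 8);
        int scaled = Math.round(sample * gain);
        // Clamp to the 16 bit range to guard against rounding overshoot.
        scaled = Math.max(-32768, Math.min(32767, scaled));
        audioData[i] = (byte) (scaled & 0xFF);
        audioData[i + 1] = (byte) ((scaled >> 8) & 0xFF);
    }
}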

11.15.

How can I calculate the power of a signal?

You have to calculate the root-mean-square average of consecutive samples. For four samples, the formula looks like this:

rms = sqrt( (x0^2 + x1^2 + x2^2 + x3^2) / 4)
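
In Java, a minimal sketch (assuming the sample values have already been reconstructed as floats, see How can I reconstruct sample values from a byte array?):

public static double rms(float[] samples)
{
    double sumOfSquares = 0.0;
    for (int i = 0; i < samples.length; i++)
    {
        sumOfSquares += samples[i] * samples[i];
    }
    return Math.sqrt(sumOfSquares / samples.length);
}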

(Matthias)

12. Compression and Encodings

12.1. Ogg Vorbis
12.1.1. What is Ogg Vorbis?
12.1.2. How can I play back Ogg Vorbis files?
12.1.3. How can I encode Ogg Vorbis files?
12.1.4. Who should we lobby to get Ogg Vorbis support in the Sun JRE?
12.1.5. How can I get the duration of an Ogg Vorbis file?
12.2. mp3
12.2.1. How can I play back mp3 files?
12.2.2. Why is there no mp3 decoder in the Sun JRE/JDK?
12.2.3. What is the legal state of the JLayer mp3 decoder?
12.2.4. What are the differences between the JLayer mp3 decoder plug-in and the Sun mp3 decoder plug-in?
12.2.5. How can I encode mp3 files?
12.2.6. Is there a mp3 encoder implemented in pure java?
12.2.7. Which input formats can I use for the mp3 encoder?
12.2.8. Is mp3 encoding possible on Mac OS?
12.2.9. Why do I get an UnsupportedAudioFileException when trying to play a mp3 file?
12.2.10. How can I get the length of an mp3 stream?
12.3. GSM 06.10
12.3.1. Is there support for GSM?
12.3.2. Why does the GSM codec refuse to encode from/decode to the format I want?
12.3.3. How can I read a .wav file with GSM data or store GSM-encoded data in a .wav file?
12.3.4. I want to convert to/from GSM using the Tritonus plug-in. However, I do not work with files or streams. Rather, I want to convert byte[] arrays.
12.3.5. How can I decode GSM from frames of 260 bit?
12.3.6. How can I calculate the duration of a GSM file?
12.3.7. Are there native implementations of codecs that are compatible with the framing format used by the Java Sound GSM codec?
12.4. A-law and μ-law
12.4.1. What are A-law and μ-law?
12.4.2. How can I convert a PCM encoded byte[] to a μ-law byte[]?
12.5. Speex
12.5.1. What is Speex?
12.5.2. Is there support for Speex?
12.5.3. How do I use JSpeex?
12.5.4. How can I get the duration of a Speex file?
12.6. Miscellaneous
12.6.1. Is there support for ADPCM (a.k.a. G723) in Java Sound?
12.6.2. Is there support for WMA and ASF in Java Sound?
12.6.3. How can I convert between two encoded formats directly (e.g. from mp3 to A-law)?
12.6.4. What compression schemas can I use?
12.6.5. How can I get Encoding instances for GSM and mp3 with JDKs older than 1.5.0?
12.6.6. Is there support for RealAudio / RealMedia (.ra / .rm files)?
12.6.7. How can I get support for a new encoding?

12.1. Ogg Vorbis

12.1.1. What is Ogg Vorbis?
12.1.2. How can I play back Ogg Vorbis files?
12.1.3. How can I encode Ogg Vorbis files?
12.1.4. Who should we lobby to get Ogg Vorbis support in the Sun JRE?
12.1.5. How can I get the duration of an Ogg Vorbis file?
12.1.1.

What is Ogg Vorbis?

From the website:

Ogg Vorbis is a fully open, non-proprietary, patent-and-royalty-free, general-purpose compressed audio format for mid to high quality (8kHz-48.0kHz, 16+ bit, polyphonic) audio and music at fixed and variable bitrates from 16 to 128 kbps/channel. This places Vorbis in the same competitive class as audio representations such as MPEG-4 (AAC), and similar to, but higher performance than MPEG-1/2 audio layer 3, MPEG-4 audio (TwinVQ), WMA and PAC.

Vorbis is the first of a planned family of Ogg multimedia coding formats being developed as part of Xiph.org's Ogg multimedia project.

For more information see The Ogg Vorbis CODEC project. (Matthias)

12.1.2.

How can I play back Ogg Vorbis files?

A Plug-in for Java Sound is available from the Tritonus project. It uses JOrbis, a pure Java decoder from the JCraft project. Under development is also a decoder based on native libraries. (Matthias)

12.1.3.

How can I encode Ogg Vorbis files?

A beta version of an encoder based on native libraries is available as part of the Tritonus project. See plug-ins. (Matthias)

12.1.4.

Who should we lobby to get Ogg Vorbis support in the Sun JRE?

You can vote for the RFE #4671067 to include Ogg Vorbis in the JRE. A remark from Florian:

As far as I know, there is no development yet in Java Sound for Mustang. Also, there are legal problems (licenses...) for including ogg support in Java. Sun cannot just include 3rd party code, no matter what license it is published. I've tried to push inclusion of native bindings for ogg in Java so that you just need to install the ogg library locally to get ogg support in Java.

See also RFE #4499904. (Matthias)

12.1.5.

How can I get the duration of an Ogg Vorbis file?

Currently, the JavaZOOM version of the Vorbis decoder plug-in (VorbisSPI) sets the duration property in TAudioFileFormat if the data source is a File. Ways to provide length and duration information for URL and InputStream sources are under discussion; see the mailing list archives. See also How can I determine the length or the duration of an audio file? (Matthias)

12.2. mp3

12.2.1. How can I play back mp3 files?
12.2.2. Why is there no mp3 decoder in the Sun JRE/JDK?
12.2.3. What is the legal state of the JLayer mp3 decoder?
12.2.4. What are the differences between the JLayer mp3 decoder plug-in and the Sun mp3 decoder plug-in?
12.2.5. How can I encode mp3 files?
12.2.6. Is there a mp3 encoder implemented in pure java?
12.2.7. Which input formats can I use for the mp3 encoder?
12.2.8. Is mp3 encoding possible on Mac OS?
12.2.9. Why do I get an UnsupportedAudioFileException when trying to play a mp3 file?
12.2.10. How can I get the length of an mp3 stream?
12.2.1.

How can I play back mp3 files?

There is a pure Java decoder from the JavaZOOM project. Tritonus, the open source implementation of Java Sound, incorporates it. There is a plug-in available which runs on any JVM.

Sun has also released a pure java mp3 decoder plug-in: Java MP3 PlugIn

There is also a native mp3 decoder implementation. It is part of the mp3 encoder plug-in. See Tritonus Plug-ins. (Matthias)

12.2.2.

Why is there no mp3 decoder in the Sun JRE/JDK?

A quote from Florian:

As far as I know, Sun will not include MP3 support into the JRE, mostly because it would require a separate license to click through during installation. That's also the reason why it could not be enabled that your software downloads the plug-in on your own since the license must be acknowledged by every end-user. It's the crazy lawyers.

(Matthias)

12.2.3.

What is the legal state of the JLayer mp3 decoder?

There was much discussion on the mailing list; see the archive for details. As a short summary, see Eric's Mail and Florian's Mail. If you want to avoid legal issues completly, it is recommended to use Ogg Vorbis instead of mp3. (Matthias)

12.2.4.

What are the differences between the JLayer mp3 decoder plug-in and the Sun mp3 decoder plug-in?

  • The Sun decoder is twice as fast as the JLayer decoder, even though it, too, is written in pure Java.

  • The Sun decoder only supports MPEG 1 audio layer III files, while the JLayer decoder supports MPEG 1 and MPEG 2, audio layers I - III.

See also What is the legal state of the JLayer mp3 decoder? (Matthias)

12.2.5.

How can I encode mp3 files?

Java is free; this collides with the (enforced) licences for mp3 encoders. I have studied the mp3 licencing model very carefully and also asked at Fraunhofer (the inventors of mp3) for additional information: it won't be possible to deliver a free mp3 encoder legally. (If anyone knows a "hole", please let me know. The available source code is not an option: the encoders available as source code - most of them based on the ISO reference implementation - create bad quality mp3s, and the licence doesn't allow the use of such encoders!)

The Tritonus team is working on an interface to the open source encoder LAME. That way, people who do not fear licence problems can download LAME as a separate package and link it to Java Sound. See also Q: 7 (Florian)

12.2.6.

Is there a mp3 encoder implemented in pure java?

No. At least none that is available to the public. (Matthias)

12.2.7.

Which input formats can I use for the mp3 encoder?

The Tritonus mp3 encoder supports the following input formats: 16 bit signed PCM; mono or stereo; big or little endian; 8, 11.025, 12, 16, 22.05, 24, 32, 44.1 or 48 kHz sample rate. (Matthias)
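
If your audio data is in a different format, you can usually let Java Sound convert it to one of the accepted formats first. A minimal sketch using only the standard API (the file name and the 44.1 kHz stereo target format are just examples; a sample rate conversion may additionally require the Tritonus sample rate converter, see How can I do sample rate conversion?):

AudioInputStream sourceStream = AudioSystem.getAudioInputStream(new File("input.wav"));

// One of the formats accepted by the encoder: 16 bit signed PCM,
// stereo, little-endian, 44.1 kHz.
AudioFormat encoderInputFormat = new AudioFormat(
    AudioFormat.Encoding.PCM_SIGNED, 44100.0F, 16, 2, 4, 44100.0F, false);
AudioInputStream convertedStream =
    AudioSystem.getAudioInputStream(encoderInputFormat, sourceStream);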

12.2.8.

Is mp3 encoding possible on Mac OS?

LAME and its Tritonus plug-in are reported to work on Mac OS X, but not on Mac OS 9. For details, see Steven's mail. (Matthias)

12.2.9.

Why do I get an UnsupportedAudioFileException when trying to play a mp3 file?

First, check your installation as described at the bottom of the Java Sound Plugins page. If your installation is correct, but the file still doesn't play, there are two common reasons: id3v2 tags or a variable bit rate (VBR) header. Both are prepended to an ordinary mp3 file, and the AudioFileReader for mp3 can't detect this situation. The Tritonus team does not plan to fix this behaviour. However, JavaZOOM provides a modified version of the AudioFileReader. (Matthias)

12.2.10.

How can I get the length of an mp3 stream?

Currently, you can use the following hack with the JLayer decoder:

import java.io.*;
import javazoom.jl.decoder.*;
import javax.sound.sampled.*;

public class TestMP3Duration
{
    public static void main(String args[])
    {
        try
        {
            File f = new File(args[0]);

            // Open a JLayer Bitstream on the file and read the header
            // of the first frame.
            Bitstream m_bitstream = new Bitstream(new FileInputStream(f));
            Header m_header = m_bitstream.readFrame();

            // Length of the file in bytes.
            int mediaLength = (int) f.length();

            int nTotalMS = 0;
            if (mediaLength != AudioSystem.NOT_SPECIFIED)
            {
                // total_ms() estimates the duration from the file length
                // and the bitrate of the first frame.
                nTotalMS = Math.round(m_header.total_ms(mediaLength));
            }

            System.out.println("Length in ms: " + nTotalMS);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

It seems that the decoder released by Sun (see How can I play back mp3 files?) does not support any means to obtain the length.

In the future (once the JDK 1.5.0 is released) it will be possible to get the length in a portable way using AudioFileFormat properties. See also How can I determine the length or the duration of an audio file? (Matthias)

12.3. GSM 06.10

12.3.1. Is there support for GSM?
12.3.2. Why does the GSM codec refuse to encode from/decode to the format I want?
12.3.3. How can I read a .wav file with GSM data or store GSM-encoded data in a .wav file?
12.3.4. I want to convert to/from GSM using the Tritonus plug-in. However, I do not work with files or streams. Rather, I want to convert byte[] arrays.
12.3.5. How can I decode GSM from frames of 260 bit?
12.3.6. How can I calculate the duration of a GSM file?
12.3.7. Are there native implementations of codecs that are compatible with the framing format used by the Java Sound GSM codec?
12.3.1.

Is there support for GSM?

Yes, you can download a service provider plug-in for GSM 06.10 from Java Sound Plugins. Since this implementation is pure-java, it can be used with any Java Sound implementation on any platform. For examples of using the GSM plug-in, see Encoding an audio file to GSM 06.10, Decoding an encoded audio file and Playing an encoded audio file. (Matthias)

12.3.2.

Why does the GSM codec refuse to encode from/decode to the format I want?

GSM 06.10 only works with 8 kHz sample rate. This is a property of the format and cannot be changed. The whole algorithm depends on this.

Therefore, Tritonus' GSM codec supports only two formats on the PCM side: AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 8000.0F, 16, 1, 2, 8000.0F, false) and AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 8000.0F, 16, 1, 2, 8000.0F, true). If you want to use other source or target formats, you have to do the conversion in two steps:

For encoding, first convert the audio data from your source format to one the encoder accepts. Then you can do the encoding.

For decoding, decode to one of the formats the decoder supports. In a second step, convert to your desired target format.

If your data has a sample rate different from 8 kHz, you have to do a sample rate conversion. See How can I do sample rate conversion?
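
For encoding, the two steps could look like this (a sketch using classes from javax.sound.sampled; it assumes the Tritonus GSM plug-in is installed and registered under the encoding name "GSM0610", and it uses the public Encoding constructor of JDK 1.5.0 - for older JDKs, see How can I get Encoding instances for GSM and mp3 with JDKs older than 1.5.0?):

AudioInputStream sourceStream = AudioSystem.getAudioInputStream(new File("input.wav"));

// Step 1: convert to a PCM format the GSM encoder accepts
// (8 kHz, 16 bit signed, mono, little-endian).
AudioFormat pcmFormat = new AudioFormat(
    AudioFormat.Encoding.PCM_SIGNED, 8000.0F, 16, 1, 2, 8000.0F, false);
AudioInputStream pcmStream =
    AudioSystem.getAudioInputStream(pcmFormat, sourceStream);

// Step 2: encode the PCM stream to GSM 06.10.
AudioFormat.Encoding gsmEncoding = new AudioFormat.Encoding("GSM0610");
AudioInputStream gsmStream =
    AudioSystem.getAudioInputStream(gsmEncoding, pcmStream);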

Note that GSM actually uses only the 13 most significant bits of the 16 bit PCM samples. The 3 least significant bits are ignored while encoding, and set to zero while decoding. (Matthias)

12.3.3.

How can I read a .wav file with GSM data or store GSM-encoded data in a .wav file?

This is not supported by Tritonus. The reason is that Microsoft specified a fancy scrambling of bits for GSM in .wav. We considered it too much work to comply with such "standards". For details, see the GSM page of Jutta. See also Are there native implementations of codecs that are compatible with the framing format used by the Java Sound GSM codec? (Matthias)

12.3.4.

I want to convert to/from GSM using the Tritonus plug-in. However, I do not work with files or streams. Rather, I want to convert byte[] arrays.

You have two choices:

  • Convert the byte array to and from AudioInputStreams using ByteArrayInputStreams and ByteArrayOutputStreams. For details, see the questions How can I read an audio file and store the audio data in a byte array? and How can I write audio data from a byte array to an audio file?.

    If you want to encode data captured with a TargetDataLine, use the AudioInputStream constructor with a TargetDataLine parameter. This is the recommended way, because it is clean and does not directly access low-level APIs. It is highly likely to be portable between different Java Sound implementations (assuming that one day there will be an alternate GSM codec implementation).

  • Use the low-level API of the GSM decoder and encoder. This is tricky and is not officially supported by the Tritonus team. The source code is your friend; apart from that, you may get support from the original authors. In short: it is not recommended. If you really want to do it, you can take the implementation of GSMFormatConversionProvider.java in Tritonus as an example.

(Matthias)

12.3.5.

How can I decode GSM from frames of 260 bit?

GSM frames indeed have a length of 260 bits, which is equal to 32.5 bytes. To store such frames in files, the common technique is to pad each frame with 4 zero bits at the end, so that a frame fits into 33 bytes. This is the format used by the GSM codec of Tritonus. So if you do the same padding, the data can be decoded by this codec. (Matthias)

12.3.6.

How can I calculate the duration of a GSM file?

If the length of the encoded data is known, calculating the total duration is quite easy: A GSM frame with 33 bytes contains information about 160 samples at a sample rate of 8 kHz; each frame represents 20 milliseconds. So the formula is:

long length_of_data = ...; // in bytes
long number_of_frames = length_of_data / 33;
long duration = number_of_frames * 20; // in milliseconds

See also How can I determine the length or the duration of an audio file? (Matthias)

12.3.7.

Are there native implementations of codecs that are compatible with the framing format used by the Java Sound GSM codec?

Yes, there are quite a number of programs listed on GSM Applications and GSM for X.

Note that Microsoft uses a different framing for GSM. Therefore, Microsoft GSM codecs are incompatible. See How can I read a .wav file with GSM data or store GSM-encoded data in a .wav file? (Matthias)

12.4. A-law and μ-law

12.4.1. What are A-law and μ-law?
12.4.2. How can I convert a PCM encoded byte[] to a μ-law byte[]?
12.4.1.

What are A-law and μ-law?

These are logarithmic codings of a sample value. The values are stored in 8 bit, but the range of values is roughly equal to 14 bit linear. So coding 16 bit data to A-law and μ-law means half the storage size with about 15 per cent quality loss. See a mathematical definition. (Matthias)

12.4.2.

How can I convert a PCM encoded byte[] to a μ-law byte[]?

When you are processing streams, read the documentation of AudioSystem. There are functions like getAudioInputStream(AudioFormat, AudioInputStream) that do the conversion for you.

In case you absolutely want to do the conversion "by hand", look at how Tritonus is doing it: have a look at the class TConversionTool. (Florian)
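
A minimal sketch of the stream-based approach, converting a byte[] of 16 bit signed, little-endian, 8 kHz mono PCM (a hypothetical array pcmData) into a μ-law byte[], using classes from java.io and javax.sound.sampled; exception handling is omitted:

AudioFormat pcmFormat = new AudioFormat(
    AudioFormat.Encoding.PCM_SIGNED, 8000.0F, 16, 1, 2, 8000.0F, false);
AudioInputStream pcmStream = new AudioInputStream(
    new ByteArrayInputStream(pcmData), pcmFormat,
    pcmData.length / pcmFormat.getFrameSize());

// Let Java Sound convert to 8 bit mu-law.
AudioInputStream ulawStream = AudioSystem.getAudioInputStream(
    AudioFormat.Encoding.ULAW, pcmStream);

// Read the converted data into a byte array.
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int bytesRead;
while ((bytesRead = ulawStream.read(buffer)) != -1)
{
    out.write(buffer, 0, bytesRead);
}
byte[] ulawData = out.toByteArray();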

12.5. Speex

12.5.1. What is Speex?
12.5.2. Is there support for Speex?
12.5.3. How do I use JSpeex?
12.5.4. How can I get the duration of a Speex file?
12.5.1.

What is Speex?

Speex is an audio compression format designed for speech. It is open source and patent free. For more information see the Speex Homepage. (Matthias)

12.5.2.

Is there support for Speex?

Yes, have a look at the JSpeex project. (Matthias)

12.5.3.

How do I use JSpeex?

See Mark's mail. (Matthias)

12.5.4.

How can I get the duration of a Speex file?

Currently, there seems to be no way to find out the duration. The typical problem with getting the duration of compressed audio data is that there is no linear relation between the length of the encoded data and the length of the unencoded data. So typically, length information is available only if either:

  • There is a header that contains this information

  • The implementor of the decoder decided to read or skip through the whole stream to gather this information. Whether this is possible depends on the encoded format and on the stream: it requires resetting, so it is only possible if the stream is seekable, can be reopened from the beginning, or is cached completely in memory. Implementors typically decide against caching, since it may consume several megabytes of memory.

I don't know details about the Speex format, so I don't know if there is a possibility to make length information available. See also How can I determine the length or the duration of an audio file? (Matthias)

12.6. Miscellaneous

12.6.1. Is there support for ADPCM (a.k.a. G723) in Java Sound?
12.6.2. Is there support for WMA and ASF in Java Sound?
12.6.3. How can I convert between two encoded formats directly (e.g. from mp3 to A-law)?
12.6.4. What compression schemas can I use?
12.6.5. How can I get Encoding instances for GSM and mp3 with JDKs older than 1.5.0?
12.6.6. Is there support for RealAudio / RealMedia (.ra / .rm files)?
12.6.7. How can I get support for a new encoding?
12.6.1.

Is there support for ADPCM (a.k.a. G723) in Java Sound?

Currently not. There is an alpha version of a codec for IMA ADPCM in Tritonus. However, the file readers and writers haven't been adapted to handle this format, so the codec is of little use. Doing this is not really difficult; volunteers are appreciated. Developing support for MS ADPCM shouldn't be too difficult, either.

Also note that JMF can handle IMA ADPCM. (Matthias)

12.6.2.

Is there support for WMA and ASF in Java Sound?

WMA and ASF are not supported by Java Sound or any known plug-in to it. Of course there are native programs that can do the conversion. (Matthias)

12.6.3.

How can I convert between two encoded formats directly (e.g. from mp3 to A-law)?

You have to do this in 2 to 4 steps (a sketch follows the list):

  • Convert the input to PCM, 16 bit, any endianness, with the same sample rate and number of channels as the original input file.

  • If necessary, convert the sample rate and the number of channels (as separate steps) to the values you want in the target file.

  • Convert that PCM stream to the target format, with the same sample rate and number of channels.
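
A sketch of these steps for the mp3 to A-law example (it assumes an mp3 decoder plug-in is installed and uses a hypothetical file name; the sample rate and channel conversion of the second step is left out):

// Step 1: decode the mp3 data to PCM.
AudioInputStream mp3Stream = AudioSystem.getAudioInputStream(new File("input.mp3"));
AudioInputStream pcmStream = AudioSystem.getAudioInputStream(
    AudioFormat.Encoding.PCM_SIGNED, mp3Stream);

// Step 2 (if necessary): convert sample rate and number of channels here.

// Step 3: encode the PCM stream to A-law, keeping sample rate and channels.
AudioInputStream alawStream = AudioSystem.getAudioInputStream(
    AudioFormat.Encoding.ALAW, pcmStream);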

See also the example Converting audio files to different encodings, sample size, channels, sample rate (Matthias)

12.6.4.

What compression schemas can I use?

The table below gives you an overview:

schema      | availability                                                                 | location          | usable in applets
A-law       | standard part of Java Sound implementations                                  | ---               | yes
μ-law       | standard part of Java Sound implementations                                  | ---               | yes
GSM 06.10   | part of Tritonus, available as plug-in for other Java Sound implementations  | Tritonus plug-ins | yes
mp3         | part of Tritonus, available as plug-in for other Java Sound implementations  | Tritonus plug-ins | decoding: yes, encoding: no
Ogg Vorbis  | part of Tritonus, available as plug-in for other Java Sound implementations  | Tritonus plug-ins | decoding: yes, encoding: no (pure java encoder is under development)
Speex       | available as plug-in for other Java Sound implementations                    | JSpeex project    | no
IMA ADPCM   | under development in Tritonus                                                | ---               | yes
MS ADPCM    | not available                                                                | ---               | ---

See also What compression schema should I use to transfer audio data over a network?

Also note that JMF has quite a few more codecs than Java Sound. (Matthias)

12.6.5.

How can I get Encoding instances for GSM and mp3 with JDKs older than 1.5.0?

Since 1.5.0, the way to obtain Encoding instances for non-standard encodings like GSM 06.10, Ogg Vorbis and mp3 is to use the constructor Encoding(String name) (see, for instance, Encoding an audio file to GSM 06.10). In JDKs older than 1.5.0, this constructor is protected, so calling it directly is not possible. The old workaround for this problem was a special class org.tritonus.share.sampled.Encodings introduced by Tritonus. It can be used to retrieve Encoding instances. See this older version of GSMEncoder for an example of how to do this. (Matthias)
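
For example, to retrieve the GSM 06.10 encoding with an older JDK (assuming the Tritonus share classes are on the class path and the encoding name "GSM0610" used by the Tritonus GSM plug-in):

AudioFormat.Encoding gsmEncoding =
    org.tritonus.share.sampled.Encodings.getEncoding("GSM0610");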

12.6.6.

Is there support for RealAudio / RealMedia (.ra / .rm files)?

There isn't, and it doesn't look like there will be in the near future. RealAudio is a proprietary format; there is no specification available to the public. Because of that, it's hard to implement support for it. Currently, the only way to do this seems to be to use native libraries provided by RealNetworks. If you want to change this situation, bug RealNetworks to publish specs (politely, please). (Matthias)

12.6.7.

How can I get support for a new encoding?

If you need support for an encoding that is currently not supported, you can code it yourself (or pay somebody to do so). Java Sound has an extension mechanism called "service provider interface" (SPI). For supporting a new format, you need to write a plug-in that implements the interface FormatConversionProvider. Typically, it is also necessary to write new audio file readers (interface AudioFileReader) and audio file writers (interface AudioFileWriter) or extend existing ones.

See also Q & A 2, “Service Provider Interface (SPI)” (Matthias)

13. Audio data transfer over networks

13.1. How can I do streaming of audio data?
13.2. Why do I get distorted sound in my streaming application if it is used on the internet, but works on a LAN?
13.3. How can I upload recorded audio data to a server?
13.4. What compression schema should I use to transfer audio data over a network?
13.1.

How can I do streaming of audio data?

There is no special support for streaming protocols in the Java Sound API. Options include:

  • Implement your own streaming protocol based on the java.net.* classes.

  • Use the Real-time Transport Protocol (RTP) implementation included in the Java Media Framework (JMF).

(Matthias)

13.2.

Why do I get distorted sound in my streaming application if it is used on the internet, but works on a LAN?

With a naive streaming approach (simply writing to and reading from sockets), you need a guaranteed network bandwidth and a minimum network latency. Though this is not really guaranteed on an ethernet, the bandwidth there is typically sufficient for smooth operation. On the internet, however, bandwidth is much more limited and latency much higher than on an ethernet. So packets arrive late, which leads to clicks in the sound. To compensate for these effects, special streaming protocols are needed. The most common of these is the Real-time Transport Protocol (RTP). (Matthias)

13.3.

How can I upload recorded audio data to a server?

There are several ways to do this:

  • One possibility is to use sockets (classes java.net.Socket and java.net.ServerSocket). See this mail for more details, including an example server program.

  • Another possibility is to use HTTP requests. Both POST and PUT requests can be used for uploading (see the sketch below).

  • A more sophisticated approach is to use the Real-time Transport Protocol (RTP) implementation included in the Java Media Framework (JMF).

You may also want to have a look at Java Sound Resources: Applications: Answering Machine (Matthias)
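
As an illustration of the HTTP approach, here is a minimal sketch that uploads recorded data with a POST request using java.net.HttpURLConnection. The server URL and the idea that the data is a complete .wav file in a byte array are assumptions; exception handling is omitted:

byte[] audioData = ...; // e.g. a complete .wav file in memory
URL url = new URL("http://www.example.com/upload"); // hypothetical server
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setDoOutput(true);
connection.setRequestProperty("Content-Type", "audio/x-wav");
OutputStream out = connection.getOutputStream();
out.write(audioData);
out.close();
int responseCode = connection.getResponseCode(); // sends the request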

13.4.

What compression schema should I use to transfer audio data over a network?

It depends on your requirements. There is a general trade-off between bandwidth, processing power and quality. Better quality needs either more bandwidth or more processing power. Here is a short overview of some common compression schemas:

schema                        | bandwidth                             | processing req. | quality
PCM uncompressed (CD quality) | 1.4 Mbit/s (176.4 kByte/s)            | none            | very good
A-law, μ-law                  | 64 kbit/s (8 kByte/s)                 | low             | bad
GSM 06.10                     | 13 kbit/s (1.65 kByte/s)              | medium          | medium (speech), bad (music)
mp3                           | typically 64 - 320 kbit/s (8 - 40 kByte/s) | high       | medium to good

For speech, GSM 06.10 is a common choice. It is widely used in internet phone and voice chat applications. For high-quality music, use mp3 or (better) Ogg Vorbis. See also What compression schemas can I use? (Matthias)

14. Ports

14.1. How do I use the interface Port?
14.2. Why is it not possible to retrieve Port instances?
14.3. Why is it not possible to retrieve Control instances from Port lines?
14.4. What does opening and closing mean for Port lines?
14.5. Why is it not possible to read data from a microphone Port line?
14.6. Can I use Java Sound's Port interface to control volume and tone of sound played with an application using JMF?
14.7. Why are there no Port instances of certain predefined types (like Port.Info.MICROPHONE or Port.Info.COMPACT_DISC) on Linux?
14.1.

How do I use the interface Port?

Have a look at the chapter "Processing Audio with Controls" in the Java Sound Programmer's Guide. You can also have a look at how the applications jsinfo and systemmixer deal with ports. (Matthias)
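
For illustration, a minimal sketch that opens the line-out port (if the implementation provides one) and lists its controls; exception handling is omitted:

if (AudioSystem.isLineSupported(Port.Info.LINE_OUT))
{
    Port lineOut = (Port) AudioSystem.getLine(Port.Info.LINE_OUT);
    lineOut.open();
    Control[] controls = lineOut.getControls();
    for (int i = 0; i < controls.length; i++)
    {
        System.out.println(controls[i]);
    }
    lineOut.close();
}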

14.2.

Why is it not possible to retrieve Port instances?

Up to version 1.4.1, there was no Port implementation in the Sun JDK. In 1.4.2, an implementation was added for Windows. In 1.5.0, an implementation was added for Solaris and Linux. (Matthias)

14.3.

Why is it not possible to retrieve Control instances from Port lines?

Make sure you are opening the Port line before retrieving controls. For instance:

Port port = ...;
port.open();
Control[] controls = port.getControls();

(Matthias)

14.4.

What does opening and closing mean for Port lines?

Typically, the implementation of ports needs to query the soundcard's mixer for its properties and build internal data structures for Control instances. Since this is often an expensive operation, it is only done if the port is really used, i.e. when it is opened. So you need to open the Port to retrieve and use Control instances. After closing the port, the association between the Control instances and the native resources of the soundcard is invalidated, so changes to the controls have no effect.

See also Why is it not possible to retrieve Control instances from Port lines? (Matthias)

14.5.

Why is it not possible to read data from a microphone Port line?

This is due to the design of the hardware. Soundcards usually have only one Analog-Digital-Converter (ADC), but multiple input lines. You can obtain a TargetDataLine to get the digital data provided by the ADC. On the other hand, Port lines represent the analog inputs to the ADC and the analog outputs from the Digital-Analog-Converter (DAC). By using the controls of a Port line, you can influence the signal level on that line that reaches the ADC, or influence the volume on the output line that leads to your speakers. In other words, the Port lines are the abstraction of the hardware mixer on the soundcard. While one could question why Port and DataLine have a common base interface, it should be clear that you can't read digital data from an object representing an analog line.

See also How can I detect which Port Mixer belongs to which soundcard? (Matthias)

14.6.

Can I use Java Sound's Port interface to control volume and tone of sound played with an application using JMF?

Yes, this is possible. Port lines control the hardware mixer of the soundcard, so using them affects everything played, even sound from native applications. (Matthias)

14.7.

Why are there no Port instances of certain predefined types (like Port.Info.MICROPHONE or Port.Info.COMPACT_DISC) on Linux?

Some operating systems or soundcard driver APIs do not provide information on the type of the available mixer channels. In these cases, a Java Sound implementation cannot match mixer channels with pre-defined Port types. In particular, this is the case with ALSA, which is used as the basis for the Port implementation of the Sun JDK on Linux.

To write portable programs, you should not rely on the availability of pre-defined Port types. If in doubt, obtain the list of available Ports and let the user decide which one to use (see the sketch below). This is a good idea anyway, since some users don't have a microphone connected to the "mic in" channel of the soundcard, but rather one connected via a preamp to the "line in" channel. (Matthias)
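
A sketch that lists all available Port lines so that the user can choose one (classes from javax.sound.sampled):

Mixer.Info[] mixerInfos = AudioSystem.getMixerInfo();
for (int i = 0; i < mixerInfos.length; i++)
{
    Mixer mixer = AudioSystem.getMixer(mixerInfos[i]);
    // Port lines can appear as source lines as well as target lines.
    Line.Info[][] infoGroups = {
        mixer.getSourceLineInfo(), mixer.getTargetLineInfo() };
    for (int g = 0; g < infoGroups.length; g++)
    {
        for (int j = 0; j < infoGroups[g].length; j++)
        {
            if (infoGroups[g][j] instanceof Port.Info)
            {
                System.out.println(mixerInfos[i].getName()
                    + ": " + infoGroups[g][j]);
            }
        }
    }
}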

15. Miscellaneous

15.1. Why is playback of audio data with Java Sound significantly quieter than with a similar player on the native OS?
15.2. Can I use multi-channel sound?
15.3. Which multi-channel soundcards can I use with Java Sound?
15.4. Can I use the rear channels of a four-channel soundcard (like Soundblaster Life! and Soundblaster Audigy)?
15.5. How can I read audio data from a CD?
15.6. Why is there no sound at all when running my program on Linux, while on Windows it works as expected?
15.7. How can I display audio data as a waveform?
15.8. What is the difference between AudioInputStream and TargetDataLine?
15.9. Does Java Sound support 24 bit/96 kHz audio?
15.1.

Why is playback of audio data with Java Sound significantly quieter than with a similar player on the native OS?

There was the issue that Sun's implementation of Java Sound (at least up to version 0.99) lowers the level of output in order to avoid clipping when several lines are mixed. Probably this "feature" is the problem.

I find this "feature" quite doubtful. A Java Sound programmer should use GainControls attached to single lines to lower the volume, if wanted. Many applications won't profit of this "feature": e.g. they only play one line at a time. Or the mixed sounds don't create clippings. This is not unusual, as even "normalized" sounds leave most of the time enough room - there must coincide peaks to create a clipping. The case that the soft synth AND audio are playing simultaneously can be expected in "quality" programs which provide a way to lower the gain of the lines - or do the gain decrease automatically.

As Java Sound is supposed to be a low-level engine, such an approach would not be suitable. The problem with the feature is a general decrease of the signal-to-noise ratio of all Java Sound programs. Automatic lowering of the volume prevents the use in "serious" or professional environments... (Florian)

Note that the above is true for the "Java Sound Audio Engine". It does not apply to the "Direct Audio Device" mixers. See also What are all these mixers? (Matthias)

15.2.

Can I use multi-channel sound?

With the "Direct Audio Device" mixers (see What are all these mixers?) it is possible to use multi-channel cards.

On Windows, the device drivers of multi-channel cards usually split the hardware facilities into stereo channels, each provided by a separate logical device. On Linux with ALSA, device drivers of multi-channel cards typically represent the hardware by one device with all channels together (interleaved). However, it is possible to split the channels using the ALSA configuration files.

Without the "Direct Audio Device" mixers, it is possible to record from, but not play back to logically splitted devices. For playback, the first one is used. See Why can I record from different soundcards, but not play back to them?

See also Is it possible to read and write multichannel audio files? (Matthias)

15.3.

Which multi-channel soundcards can I use with Java Sound?

Soundcards known to work well with Java Sound (JDK 1.5.0) on Windows as well as on Linux are the M-Audio Delta 44 and Delta 66. Another card working on Windows is the ESI Waveterminal 192X. However, it is reported to have stability problems with Java Sound. (Matthias)

15.4.

Can I use the rear channels of a four-channel soundcard (like Soundblaster Life! and Soundblaster Audigy)?

Yes, if access to these channels is provided by the soundcard driver in a useful way. For Windows, there is no obvious solution; details are under investigation. For Linux, it should be possible. See also Can I use multi-channel sound? (Matthias)

15.5.

How can I read audio data from a CD?

On Linux, you can do this with Tritonus' CDDA extension. See Tritonus Plug-ins, Java Sound Resources: Examples: CD Digital Audio Extraction and Java Sound Resources: Applications: Ripper.

Currently, there is no implementation doing the same for Windows or other operating systems. Other possible solutions include:

  • On some Windows systems as well as on some Linux systems, reading audio CDs is integrated into the operating system. Typically, the CD is mapped into the file system as another disk with one .wav file per track. In this case, you can just open and read one of these files with Java Sound as you would do with any other audio file.

  • Use an external tool to extract the digital data from the CD to a .wav file. Then process this file with Java Sound. It's possible to keep this mechanism "under the hood": invoke the capturing utility from inside your Java application (Runtime.exec() or similar) and pass it the name of a temporary file it has to write to. After the utility has completed, read this file.

  • On most systems, you can select the CD as a recording source in the system mixer (this requires your CD drive to be connected to your soundcard with an analog cable). Then do an audio recording with Java Sound. Of course, this does not result in a digital copy of the data on the CD.

(Matthias)

15.6.

Why is there no sound at all when running my program on Linux, while on Windows it works as expected?

A common pitfall on Linux is mixing daemons like esd and artsd. They open the audio device exclusively. So if they are running when the Java VM is started, the VM is denied access to the audio device. There are three possible solutions:

  • Use a soundcard that does mixing in hardware. In this case, the Java VM and the mixing daemon can coexist, because opening the audio device is no longer exclusive; the audio streams of the VM and the daemon are mixed in hardware.

    Using ALSA's dmix plug-in is currently no solution, since the "Direct Audio Device" mixer implementation opens ALSA PCM devices in "hw" mode and therefore misses devices emulated by dmix.

  • Kill or disable the mixing daemon while you are using Java Sound programs.

  • As a "light-weight" solution, you can install ALSA including its OSS emulation and configure your system so that the sound daemon uses ALSA directly while the Java VM uses the OSS emulation or the other way round. This way the JVM and the sound daemon wont't interfere. Note that the "Java Sound Audio Engine" uses the OSS API while the "Direct Audio Device" mixer uses the ALSA API.

See also Q: 3.4 and How can I enable mixing with the "Direct Audio Device" mixers on Linux?. (Matthias)

15.7.

How can I display audio data as a waveform?

Well, you have to extract the sample values from the byte stream (see How are samples organized in a byte array/stream? and How can I reconstruct sample values from a byte array?) and then draw some lines...

There are some examples of classes that implement such a thing.

See also How can I calculate the power of a signal? (Matthias)

15.8.

What is the difference between AudioInputStream and TargetDataLine?

InputStream represents a stream of bytes that may be read from a file, URL or other data source. AudioInputStream extends InputStream with properties that are needed for interpreting audio data: the data format (an AudioFormat object) and the length of the stream in frames. An AudioInputStream instance can be wrapped around any InputStream object to provide this information.

TargetDataLine is much more specific: it represents an audio line to which data is output from an audio device (represented by a Mixer object). Data recorded from an audio capture device is delivered to a TargetDataLine, from which it can be read by the application. So lines of various types (TargetDataLine, SourceDataLine, Clip, Port) are not arbitrary software objects that can be created and connected to a mixer or audio device. Rather, they are part of the mixer or device itself.

The difference between AudioInputStream and TargetDataLine is mirrored by the difference between AudioOutputStream and SourceDataLine. While AudioOutputStream (a concept introduced by Tritonus) is a general concept of something you can write audio data to, SourceDataLine is specific to a Mixer instance.

For programming, there are two subtle differences:

  • The handling of read lengths that are not an integral multiple of the frame size: AudioInputStream silently rounds down the length (in bytes) to the nearest integral number of frames. TargetDataLine throws an IllegalArgumentException (see the sketch after this list).

  • The behaviour when the end of data is reached: AudioInputStream.read() returns -1 if there is no more data. TargetDataLine.read() returns 0 if the line is closed (otherwise, read() is guaranteed to block until the requested amount of data is available).
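
Regarding the first point, a minimal sketch for reading from a TargetDataLine (the variable line is a hypothetical, already opened and started TargetDataLine):

int frameSize = line.getFormat().getFrameSize();
byte[] buffer = new byte[line.getBufferSize() / 5];
// Round the requested length down to an integral number of frames;
// otherwise TargetDataLine.read() throws an IllegalArgumentException.
int bytesToRead = (buffer.length / frameSize) * frameSize;
int bytesRead = line.read(buffer, 0, bytesToRead);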

(Matthias)

15.9.

Does Java Sound support 24 bit/96 kHz audio?

There is nothing in the API that prevents dealing with 24 bit/96 kHz. The implementations of the API are a different story. The "Java Sound Audio Engine" does not support it. Therefore, there is no support for it in the JDK up to 1.4.2. With the "Direct Audio Device" mixers in the JDK 1.5.0 (Linux: 1.4.2), it should be possible. See also What are all these mixers? (Matthias)