Making Sound Just Work

One of the “second tier” requirements mentioned several times at the OSDL Portland Linux Desktop Architects workshop was “making audio on Linux just work”. It is easy to leave this requirement sitting in various lists of goals, but before we can make any progress on a plan to implement it, we first need to define it much more precisely.

This list is intended to avoid any implementation details, and is focused entirely on a task-oriented analysis of the issues. Your input is sought to complete, improve and clarify this analysis.


The list below is a set of tasks that a user could reasonably expect to perform on a computer running Linux that has access to zero, one or more audio interfaces, as well as zero, one or more network interfaces.

The desired task should either work or produce a sensible and comprehensible error message explaining why it failed. For example, an attempt to control input gain on a device that has no hardware mixer should produce a message explaining that the device has no controls for input gain.
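To make the “work or explain why not” rule concrete, here is a minimal sketch in Python. The `AudioDevice` model, the `"input_gain"` control name and `UnsupportedControlError` are all hypothetical, chosen only for illustration; they are not part of any existing Linux audio API.

```python
from dataclasses import dataclass, field


class UnsupportedControlError(Exception):
    """Raised when a device lacks the hardware control the user asked for."""


@dataclass
class AudioDevice:
    # Hypothetical device model: a name plus the set of mixer controls it exposes.
    name: str
    controls: set[str] = field(default_factory=set)


def set_input_gain(device: AudioDevice, level: float) -> None:
    """Set input gain, or explain clearly why that is impossible."""
    if "input_gain" not in device.controls:
        # The message names the device and the missing control, instead of
        # failing silently or reporting an opaque error code.
        raise UnsupportedControlError(
            f"'{device.name}' has no hardware mixer control for input gain."
        )
    if not 0.0 <= level <= 1.0:
        raise ValueError("input gain must be between 0.0 and 1.0")
    # A real implementation would pass the level on to the driver here.
```

A pro-audio interface without a h/w mixer would be modelled as `AudioDevice("Pro interface")`; calling `set_input_gain` on it raises an error whose text can be shown to the user as-is.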


  • identify what audio h/w exists on the system
  • identify what network audio destinations are available
  • assess the capabilities of each h/w device (multi-channel, audio quality, hardware decoders, etc.)
  • choose some given audio h/w or network endpoint as the default for input
  • choose some given audio h/w or network endpoint as the default for output
  • enable/disable given audio h/w
  • easily (auto)load any kernel modules required for given functionality
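As one illustration of the first task (identifying what audio h/w exists), the sketch below assumes an ALSA system, where the kernel lists sound cards in the text file `/proc/asound/cards`. It is only one possible data source and says nothing about network destinations or capabilities. The `SoundCard` type and the function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Header line of each card entry, e.g. " 0 [PCH            ]: HDA-Intel - HDA Intel PCH".
# Each header is followed by an indented description line, which this regex skips.
_CARD_RE = re.compile(r"^\s*(\d+)\s+\[([^\]]*)\]:\s*(\S+)\s+-\s+(.+)$")


@dataclass
class SoundCard:
    index: int
    card_id: str
    driver: str
    name: str


def parse_asound_cards(text: str) -> list[SoundCard]:
    """Parse the contents of /proc/asound/cards into SoundCard records."""
    cards = []
    for line in text.splitlines():
        m = _CARD_RE.match(line)
        if m:
            cards.append(SoundCard(int(m.group(1)), m.group(2).strip(),
                                   m.group(3), m.group(4).strip()))
    return cards


def list_sound_cards(path: str = "/proc/asound/cards") -> list[SoundCard]:
    """Return the cards ALSA knows about, or an empty list if ALSA is absent."""
    try:
        with open(path) as f:
            return parse_asound_cards(f.read())
    except FileNotFoundError:
        return []
```

When no cards are present, the file contains a line such as `--- no soundcards ---`. That line does not match the header pattern, so the result is an empty list rather than an error.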


  • play a compressed audio file
    • user driven (e.g. play(1))
    • app driven (e.g. {kde,gnome_play}_audiofile())
    • Note: uniform handling of “classes” of data types between user driven and app driven environments is a requirement
  • play a PCM encoded audio file (specifics as above)
  • hear system sounds
  • game audio
  • music composition
  • music editing
  • video post production


  • low-latency, low-sample-rate recording & playback
  • support multiple sound devices (USB headset + sound card)
  • allow configuration of output device per event type (music player→speakers, incoming call ringtone→speakers, VoIP calls→headset)
  • give apps hooks for hot-plug sound devices
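The per-event-type routing and hot-plug requirements above can be pictured as a small lookup table. The sketch below is a hypothetical model, not an existing API: event types map to preferred outputs, and when a preferred device is unplugged, playback falls back to the default output. For brevity it assumes the default output itself is always present.

```python
from dataclasses import dataclass, field


@dataclass
class OutputRouter:
    """Hypothetical per-event-type output routing table with a default device."""
    default: str
    routes: dict[str, str] = field(default_factory=dict)  # event type -> device name
    present: set[str] = field(default_factory=set)        # devices currently plugged in

    def device_for(self, event_type: str) -> str:
        preferred = self.routes.get(event_type, self.default)
        # If the preferred device is not plugged in, fall back to the default output.
        return preferred if preferred in self.present else self.default

    def on_hotplug(self, device: str, added: bool) -> None:
        # Hook that a sound server could call when devices appear or disappear.
        if added:
            self.present.add(device)
        else:
            self.present.discard(device)
```

With routes `{"music": "speakers", "ringtone": "speakers", "voip": "headset"}`, a VoIP call goes to the USB headset while it is connected and moves to the speakers once `on_hotplug("headset", False)` is called.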


  • record from hardware inputs
    • use default audio interface
    • use other audio interface
    • specify which h/w input to use
    • control input gain
  • record from other application(s)
  • record from live (network-delivered) audio streams
    • PCM/lossless compression (WAV, FLAC etc)
    • lossy compression (mp3, ogg etc)


  • control h/w mixer device (if any)
    • allow use of a generic app for this
    • NOTE to non-audio-focused readers: the h/w mixer is the part of the audio interface used to control signal levels, select recording inputs, and adjust other h/w-specific features. Some pro-audio interfaces do not have a h/w mixer; most consumer ones do. It has almost nothing to do with “hardware mixing”, which describes the ability of the h/w to mix together multiple software-delivered audio data streams.
  • multiple applications using soundcard simultaneously
  • control application volumes independently
  • provide necessary apps for controlling specialized hardware (e.g. RME HDSP, ice1712, ice1724, liveFX)
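When several applications share one soundcard that cannot mix streams in hardware, something has to mix them in software. That process is distinct from the h/w mixer described in the note above. A minimal sketch, assuming signed 16-bit PCM samples held in Python lists, shows how independent per-application volumes fit into that mixing step. The function name and data layout are illustrative only.

```python
def mix_streams(streams: dict[str, list[int]], volumes: dict[str, float]) -> list[int]:
    """Mix several apps' signed 16-bit PCM streams, each at its own volume."""
    length = max((len(s) for s in streams.values()), default=0)
    out = []
    for i in range(length):
        acc = 0.0
        for app, samples in streams.items():
            if i < len(samples):
                # Apps without an explicit volume setting play at full level (1.0).
                acc += samples[i] * volumes.get(app, 1.0)
        # Clamp to the 16-bit range instead of letting the sum wrap around.
        out.append(max(-32768, min(32767, round(acc))))
    return out
```

For example, `mix_streams({"player": [1000, 2000], "game": [500]}, {"player": 0.5})` halves the music player and leaves the game at full volume. Real mixers typically work on fixed-size buffers and may use floating-point samples, but the per-stream gain idea is the same.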


  • route audio to specific h/w among several installed devices
  • route audio between applications
  • route audio across network
  • route audio without using h/w (regardless of whether or not h/w is available; e.g. streaming media)


  • which of the above should work in a multi-user scenario?


  • basically, the task list is covered by the above list, but there are some added criteria:
    • audio data formats divide into:
      • direct sample data (e.g. RIFF/WAV, AIFF)
      • losslessly compressed (e.g. FLAC)
      • lossy compression (e.g. Vorbis, MP3)
    • apps that can handle a given division should all handle the same set of formats, with equal prowess. i.e. apps don't have to handle lossy compression formats, but if they do, they should all handle the same set of lossy compression formats. Principle: minimize user surprise.
    • users should face few or no obstacles to handling proprietary formats
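The “same set of formats per division” criterion can be checked mechanically. In the sketch below, the format table holds only the examples named in the list above, and it treats “every known format in the division” as the reference set. That is a simplification of the stated principle, which only requires apps to agree with each other. All names are hypothetical.

```python
from enum import Enum


class Division(Enum):
    DIRECT = "direct sample data"
    LOSSLESS = "losslessly compressed"
    LOSSY = "lossy compression"


# Only the examples mentioned in the list above; a real table would be longer.
FORMAT_DIVISION = {
    "wav": Division.DIRECT,
    "aiff": Division.DIRECT,
    "flac": Division.LOSSLESS,
    "ogg": Division.LOSSY,  # Ogg Vorbis
    "mp3": Division.LOSSY,
}


def inconsistent_apps(app_formats: dict[str, set[str]]) -> dict[Division, set[str]]:
    """For each division, list the apps that handle it only partially."""
    problems: dict[Division, set[str]] = {}
    for division in Division:
        full = {fmt for fmt, d in FORMAT_DIVISION.items() if d is division}
        for app, formats in app_formats.items():
            handled = formats & full
            # An app that touches a division at all should handle every format in it.
            if handled and handled != full:
                problems.setdefault(division, set()).add(app)
    return problems
```

An editor that reads Ogg Vorbis but not MP3 would be flagged under `Division.LOSSY`, while an app that handles no lossy formats at all is not flagged, matching “apps don't have to handle lossy compression formats”.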


  • use multiple soundcards as a single logical device
  • use multiple sub-devices as a single logical device (sub-devices are independent chipsets on a single audio interface; many soundcards have analog i/o and digital i/o available as two different sub-devices)
desktop/making_sound_that_just_works.txt · Last modified: 2016/07/19 01:22 (external edit)