Making Sound Just Work

One of the “second tier” of requirements mentioned several times at the OSDL Portland Linux Desktop Architects workshop was “making audio on Linux just work”. Many people find it easy to leave this requirement lying around in various lists of goals and requirements, but before we can make any progress on defining a plan to implement the goal, we first need to define it rather more precisely.

This list is intended to avoid any implementation details, and focuses entirely on a task-oriented analysis of the issues. Your input is sought to complete, improve and clarify this analysis.

DEFINING THE GOAL

The list below is a set of tasks that a user could reasonably expect to perform on a computer running Linux that has access to zero, one or more audio interfaces, as well as zero, one or more network interfaces.

The desired task should either work, or produce a sensible and comprehensible error message explaining why it failed. For example, attempting to control input gain on a device that has no hardware mixer should explain that the device has no controls for input gain.
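The "work, or explain why not" rule can be illustrated with a minimal sketch. Everything named here (`Device`, `AudioTaskError`, `set_input_gain`, the `has_input_gain` flag) is hypothetical and invented for illustration; it is not any existing audio API, and it says nothing about how the goal should be implemented.

```python
from dataclasses import dataclass


class AudioTaskError(Exception):
    """Raised when a task cannot be performed; the message is meant for the user."""


@dataclass
class Device:
    # Hypothetical description of an audio interface.
    name: str
    has_input_gain: bool
    gain_db: float = 0.0


def set_input_gain(device: Device, gain_db: float) -> None:
    """Set the input gain, or fail with an error the user can understand."""
    if not device.has_input_gain:
        # Say what is missing and on which device, rather than failing silently.
        raise AudioTaskError(
            f"Cannot set input gain: device '{device.name}' "
            "has no controls for input gain."
        )
    device.gain_db = gain_db
```

The point of the sketch is only that the failure names the device and the missing capability, so a generic front end can pass the message on unchanged.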

CONFIGURATION (see also MIXING below)

identify what audio h/w exists on the system
identify what network audio destinations are available
assess the capabilities of each h/w device (multi-channel, audio quality, hardware decoders, etc.)
choose some given audio h/w or network endpoint as the default for input
ditto for output
enable/disable given audio h/w
easily (auto)load any kernel modules required for given functionality

PLAYBACK

play a compressed audio file
- user driven (e.g. play(1))
- app driven (e.g. {kde,gnome_play}_audiofile())
- Note: uniform handling of “classes” of data types between user driven and app driven environments is a requirement
play a PCM encoded audio file (specifics as above)
hear system sounds
game audio
music composition
music editing
video post production

VOIP

low-latency low sample rate recording & playback
support multiple sound devices (e.g. USB headset + sound card)
allow configuration of output device per event type (music player→speakers, incoming call ringtone→speakers, VoIP calls→headset)
give apps hooks for hot-plugged sound devices
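As a rough illustration of per-event-type output configuration, the sketch below maps event types to devices and falls back to a default when the preferred device is not plugged in. The event names, device names, `ROUTES` table and `pick_output` function are all assumptions made up for this example.

```python
from typing import Dict, Set

# Hypothetical routing table, mirroring the example in the list above.
ROUTES: Dict[str, str] = {
    "music": "speakers",
    "ringtone": "speakers",
    "voip_call": "headset",
}

# Hypothetical system-wide default output.
DEFAULT_DEVICE = "speakers"


def pick_output(event_type: str, available: Set[str]) -> str:
    """Return the output device for an event type.

    Falls back to the default device when the preferred one is absent,
    for example after a USB headset has been unplugged.
    """
    preferred = ROUTES.get(event_type, DEFAULT_DEVICE)
    if preferred in available:
        return preferred
    if DEFAULT_DEVICE in available:
        return DEFAULT_DEVICE
    raise LookupError(f"no output device available for {event_type!r}")
```

With both devices present, `pick_output("voip_call", {"speakers", "headset"})` selects the headset; once the headset is removed, the same call returns the speakers.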

RECORDING

record from hardware inputs
- use default audio interface
- use other audio interface
- specify which h/w input to use
- control input gain
record from other application(s)
record from live (network-delivered) audio streams
- PCM/lossless compression (WAV, FLAC etc)
- lossy compression (mp3, ogg etc)

MIXING

control h/w mixer device (if any)
- allow use of a generic app for this
- NOTE to non-audio-focused readers: the h/w mixer is part of the audio interface that is used to control signal levels, input selection for recording, and other h/w specific features. Some pro-audio interfaces do not have a h/w mixer, most consumer ones do. It has almost nothing to do with “hardware mixing” which describes the ability of the h/w to mix together multiple software-delivered audio data streams.
multiple applications using soundcard simultaneously
control application volumes independently
provide necessary apps for controlling specialized hardware (e.g. RME HDSP, ice1712, ice1724, liveFX)

ROUTING

route audio to specific h/w among several installed devices
route audio between applications
route audio across network
route audio without using h/w (regardless of whether or not h/w is available; e.g. streaming media)

MULTIUSER

which of the above should work in a multi-user scenario?

FORMATS

basically, the task list is covered by the above list, but there are some added criteria:
- audio data formats divide into:
  - direct sample data (e.g. RIFF/WAV, AIFF)
  - losslessly compressed (e.g. FLAC)
  - lossy compression (e.g. Vorbis, MP3)
- apps that can handle a given division should all handle the same set of formats, with equal prowess, i.e. apps don't have to handle lossy compression formats, but if they do, they should all handle the same set of lossy compression formats. Principle: minimize user surprise.
- user should see no or limited obstacles to handling proprietary formats
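The "same set of formats per division" criterion can be stated as a small check. This is only a sketch: the `FORMAT_CLASSES` table uses the examples listed above, while the app names and the `inconsistent_classes` function are hypothetical.

```python
from typing import Dict, Set

# The three divisions from the list above, with its example formats.
FORMAT_CLASSES: Dict[str, Set[str]] = {
    "pcm": {"wav", "aiff"},
    "lossless": {"flac"},
    "lossy": {"vorbis", "mp3"},
}


def inconsistent_classes(apps: Dict[str, Set[str]]) -> Dict[str, Dict[str, Set[str]]]:
    """Report, per division, the formats each participating app is missing.

    An app participates in a division if it handles at least one format in
    it. Apps that ignore a division entirely are not reported.
    """
    report: Dict[str, Dict[str, Set[str]]] = {}
    for cls, formats in FORMAT_CLASSES.items():
        # Formats of this division handled by each app that participates in it.
        participating = {
            app: handled & formats
            for app, handled in apps.items()
            if handled & formats
        }
        if not participating:
            continue
        # Everything any participating app handles in this division.
        union = set().union(*participating.values())
        missing = {app: union - f for app, f in participating.items() if f != union}
        if missing:
            report[cls] = missing
    return report
```

For example, if a hypothetical player handles WAV, MP3 and Vorbis while an editor handles only WAV and MP3, the check reports that the editor is missing Vorbis in the lossy division. An app that handles only WAV is not reported, since it does not participate in the lossy division at all.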

MISC

use multiple soundcards as a single logical device
use multiple sub-devices as a single logical device (sub-devices are independent chipsets on a single audio interface; many soundcards have analog i/o and digital i/o available as two different sub-devices)
