Thursday, June 18, 2009


State of sound in Linux not so sorry after all



About two years ago, I wrote an article titled "The Sorry State of Sound in Linux", hoping to get some sound issues in Linux fixed. Now, two years later, a lot has changed, and it's time to take another look at the state of sound in Linux today.


A quick summary of the last article for those who didn't read it:
  • Sound in Linux has an interesting history, and historically lacked sound mixing on cards that did their mixing in software rather than hardware.
  • Many sound servers were created to solve the mixing issue.
  • Many libraries were created to solve multiple back-end issues.
  • ALSA replaced OSS version 3 in the Kernel source, attempting to fix existing issues.
  • There was a closed source OSS update which was superb.
  • Linux distributions have been removing OSS support from applications in favor of ALSA.
  • The average sound developer prefers a simple API.
  • Portability is a good thing.
  • Users are having issues in certain scenarios.


Now much has changed, namely:
  • OSS is now free and open source once again.
  • PulseAudio has become widespread.
  • Existing libraries have been improved.
  • New Linux Distributions have been released, and some existing ones have attempted an overhaul of their entire sound stack to improve users' experience.
  • People have read the last article, know more than before, and in some cases have become more opinionated than before.
  • I personally have looked much closer at the issue to provide even more relevant information.


Let's take a closer look at the pros and cons of OSS and ALSA as they are, not five years ago, not last year, not last month, but as they are today.

First off, ALSA.
ALSA consists of three components. The first is the set of drivers in the Kernel, with an API exposed for the other two components to communicate with. The second is a sound developer API that lets developers create programs which communicate with ALSA. The third is a sound mixing component which can be placed between the other two, allowing multiple programs using the ALSA API to output sound simultaneously.

To help make sense of the above, here is a diagram:


Note: the diagrams presented in this article were made by me, a very bad artist, and I don't plan to win any awards for them. They may also not be 100% accurate down to the last detail, but they're accurate enough to give the average user an idea of what is going on behind the scenes.

A sound developer who wishes to output sound in their application can take any of the following routes with ALSA:
  • Output using ALSA API directly to ALSA's Kernel API (when sound mixing is disabled)
  • Output using ALSA API to sound mixer, which outputs to ALSA's Kernel API (when sound mixing is enabled)
  • Output using OSS version 3 API directly to ALSA's Kernel API
  • Output using a wrapper API which outputs using any of the above 3 methods
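In ALSA configuration terms, the with-mixing and without-mixing routes above correspond roughly to opening different pcm definitions. A sketch of the two in ~/.asoundrc form (the names "mixed" and "direct" are mine; "hw:0,0" assumes your card is the first one):

```
# route through the dmix plugin: multiple programs can play at once
pcm.mixed {
    type dmix
    ipc_key 1024
    slave {
        pcm "hw:0,0"
    }
}

# open the hardware directly: no mixing, one program at a time
pcm.direct {
    type hw
    card 0
    device 0
}
```

A program (or a player like aplay with its -D option) opening "mixed" goes through the mixer component; one opening "direct" talks straight to the Kernel API.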


As can be seen, ALSA is quite flexible, has sound mixing which OSSv3 lacked, but still provides legacy OSSv3 support for older programs. It also offers the option of disabling sound mixing in cases where the sound mixing reduced quality in any way, or introduced latency which the end user may not want at a particular time.

Two points should be clear: ALSA has optional sound mixing outside the Kernel, and the path ALSA's legacy OSS API takes lacks sound mixing.

An obvious con should be seen here: ALSA, which was initially designed to fix the sound mixing issue at a lower and more direct level than a sound server, doesn't provide mixing for "older" programs.

Obvious pros are that ALSA is free and open source, has sound mixing, can work with multiple sound cards (all of which OSS lacked during much of version 3's lifespan), is included as part of the Kernel source, and tries to cater to old and new programs alike.

The less obvious cons are that ALSA is Linux-only: it doesn't exist on FreeBSD, Solaris, Mac OS X, or Windows. Also, the average developer finds ALSA's native API too hard to work with, but that is debatable.


Now let's take a look at OSS today. OSS is currently at version 4, and is a completely different beast than OSSv3 was.
Where OSSv3 went closed source, OSSv4 is open source today, under the GPL, 3-clause BSD, and CDDL licenses.
While a decade-old OSS remains in the Linux Kernel source, the new, greatly improved OSSv4 is not, and thus may be a bit harder for the average user to try out. Older OSSv3 lacked sound mixing and support for multiple sound cards; OSSv4 lacks neither. Unfortunately, most people who discuss OSS, or test it to see how it stacks up against ALSA, are referring to or testing the decade-old version, distorting the facts as they stand today.

Here's a diagram of OSSv4:
A sound developer wishing to output sound has the following routes on OSSv4:
  • Output using OSS API right into the Kernel with sound mixing
  • Output using ALSA API to the OSS API with sound mixing
  • Output using a wrapper API to any of the above methods


Unlike with ALSA, when using OSSv4 the end user always has sound mixing. And because the sound mixing runs in the Kernel itself, it doesn't suffer from the latency ALSA generally has.

Although OSSv4 does offer its own ALSA emulation layer, it's pretty bad, and I haven't found a single ALSA program able to output via it properly. However, this isn't an issue, since, as mentioned above, ALSA's own sound developer API can output to OSS, providing perfect compatibility with ALSA applications today. You can read more about how to set that up in one of my recent articles.
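For reference, routing ALSA-API programs to the OSS back-end is typically done by pointing libasound's default pcm at the OSS plugin shipped in the alsa-plugins package; whether the plugin is packaged varies by distribution. A minimal ~/.asoundrc along these lines is a sketch, not a complete setup:

```
pcm.!default {
    type oss
    device /dev/dsp
}

ctl.!default {
    type oss
    device /dev/mixer
}
```

With this in place, a program calling the ALSA API on "default" ends up writing to OSSv4's in-Kernel mixer.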

ALSA's own library is able to do this, because it's actually structured as follows:

As you can see, it can output to either OSS or ALSA Kernel back-ends (other back-ends too which will be discussed lower down).

Since both OSS and ALSA based programs can use an OSS or ALSA Kernel back-end, the differences between the two are quite subtle (note, we're not discussing OSSv3 here), boil down to what I know from research and testing, and are not immediately obvious.

  • OSS always has sound mixing; ALSA does not.
  • OSS's sound mixing is of higher quality than ALSA's, due to OSS using more precise math in its mixing.
  • OSS has less latency than ALSA when mixing sound, because everything runs within the Linux Kernel.
  • OSS offers per-application volume control; ALSA does not.
  • With ALSA, the Operating System can go into suspend mode while sound is playing and come out of it with sound still playing; OSS, on the other hand, needs the application to restart sound.
  • OSS is the only option for certain sound cards whose ALSA drivers are either really bad or nonexistent.
  • ALSA is the only option for certain sound cards whose OSS drivers are either really bad or nonexistent.
  • ALSA is included in Linux itself and easy to get hold of; OSS (v4) is not.
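The mixing-precision point can be illustrated with a toy example of my own (this is not actual OSS vmix or ALSA dmix code): mixing two loud 16-bit samples by naive 16-bit addition wraps around into garbage, while mixing in a wider type and clamping at the end merely clips.

```python
# Toy illustration of why precision matters when mixing PCM streams
# (my own sketch, not code from OSS or ALSA).

def mix_naive16(a, b):
    """Add two signed 16-bit samples and wrap like a 16-bit register would."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s >= 0x8000 else s

def mix_clamped(a, b):
    """Mix in wider precision, then clamp to the signed 16-bit range."""
    return max(-32768, min(32767, a + b))

loud_a, loud_b = 30000, 10000
assert mix_naive16(loud_a, loud_b) == -25536   # wrapped: audible distortion
assert mix_clamped(loud_a, loud_b) == 32767    # clipped, but sane
```

Real mixers are far more sophisticated (dithering, re-sampling, floating point), but the failure mode sketched here is the kind of thing imprecise mixing math produces at high volume levels.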

Now the question is, where does the average user fall in the above categories? If the user has a sound card which only works (well) with one or the other, then obviously they should use the one that works properly. Of course, a user may want to try both to see if one performs better than the other.

If the user really needs to have a program output sound right until Linux goes into suspend mode, and then continue where it left off when resuming, then ALSA is (currently) the only option. I personally don't find this to be a problem, and furthermore I doubt a large percentage of users even use suspend in Linux. Suspend in general isn't great in Linux, often thanks to some rogue piece of hardware, like a network or video card, which screws it up.

If the user doesn't want a hassle, ALSA also seems the obvious choice, as it's shipped directly with the Linux Kernel, so it's much easier for the user to run a modern ALSA than a modern OSS. However, it should be up to the Linux distribution to handle these situations; to the end user, switching from one to the other should be seamless and transparent. More on this later.

Yet we also see that, thanks to better sound mixing and lower latency when mixing is involved, OSS is the better choice, as long as none of the above issues are present. The better mixing is generally only noticed at higher volume levels or in rare cases, and the latency I'm referring to is generally only a problem if you play heavy-duty games, not if you just want to listen to some music or watch a video.


But wait, this is all about the back-end. What about the whole developer API issue?

Many people like to point fingers at the various APIs (I myself did too, to some extent, in my previous article). But they really don't get it. First off, this is how your average sound wrapper API works:

The program outputs sound using a wrapper, such as OpenAL, SDL, or libao, and then sound goes to the appropriate high level or low level back-end, and the user doesn't have to worry about it.

Since the back-ends can be various Operating Systems sound APIs, they allow a developer to write a program which has sound on Windows, Mac OS X, Linux, and more pretty easily.

Some, like Adobe, like to say how this is some kind of problem and makes it impossible to output sound in Linux. Nothing could be further from the truth. Graphs like these are very misleading. OpenAL, SDL, libao, GStreamer, NAS, Allegro, and more all exist on Windows too. I don't see anyone complaining there.

I can make a similar diagram for Windows:

The above diagram is by no means complete, as there's XAudio, other wrapper libs, and even some Windows-only sound libraries whose names I've forgotten.

This by no means bothers anybody, and should not be made an issue.

In terms of usage, the libraries stack up as follows:
OpenAL - Powerful, tricky to use, great for "3D audio". I personally was able to get a lot done by following a couple of examples, and only spent an hour or two adding sound to an application.
SDL - Simplistic, uses a callback API, decent if it fits your program design. I personally was able to add sound to an application in half an hour with SDL, although I don't think it fits every case load.
libao - Very simplistic, incredibly easy to use, although problematic if you need your application's sound output not to block. I added sound to a multitude of applications using libao in a matter of minutes. I just find it a bit more annoying if you need to give your program its own sound thread, so again, it depends on the case load.
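The practical difference between the libao and SDL styles is push versus pull, and that's what determines which one fits a given program design. Here is a toy contrast using hypothetical mini-APIs of my own, not the real libraries:

```python
# Toy contrast of the two wrapper styles discussed above
# (hypothetical mini-APIs, not the real libao or SDL).

class BlockingDevice:
    """libao-style "push": the application calls play(), which blocks
    the caller until the buffer has been consumed."""
    def __init__(self):
        self.played = b''
    def play(self, buf):
        self.played += buf

class CallbackDevice:
    """SDL-style "pull": the device's audio thread asks a callback for
    more data whenever it needs it."""
    def __init__(self, callback):
        self.callback = callback
        self.played = b''
    def tick(self, nbytes):          # normally invoked by the audio thread
        self.played += self.callback(nbytes)

def silence(nbytes):
    return b'\x00' * nbytes

push = BlockingDevice()
push.play(silence(4))                # the app decides when sound is written

pull = CallbackDevice(silence)
pull.tick(4)                         # the device decides when sound is needed

assert push.played == pull.played    # same bytes, opposite control flow
```

With the push style your main loop stalls while audio is written (hence wanting a dedicated sound thread); with the pull style the library owns the thread and your code just has to keep the callback fed.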

I haven't played with the other sound wrappers, so I can't comment on them, but the same ideas play out in each and every one.

Then of course there are the actual OSS and ALSA APIs on Linux. Now why would anyone use them when there are lovely wrappers that are more portable and can be matched to any particular case load? In the average case, there is indeed no reason to use OSS's or ALSA's API to output sound. But in some cases a wrapper API can add latency which you may not want, and you may not need any of the advantages a wrapper API brings.

Here's a breakdown of how OSS and ALSA's APIs stack up.
OSSv3 - Easy to use, most developers I spoke to like it, exists on every UNIX but Mac OS X. I added sound to applications using OSSv3 in 10 minutes.
OSSv4 - Mostly backwards compatible with v3, even easier to use, exists on every UNIX except Mac OS X (and Linux when the ALSA back-end is used), and has sound re-sampling and AC3 decoding out of the box. I added sound to several applications using OSSv4 in 10 minutes each.
ALSA - Hard to use, most developers I spoke to dislike it, poorly documented, not available anywhere but Linux. Some developers, however, prefer it, as they feel it gives them more flexibility than the OSS API. I personally spent 3 hours trying to make heads or tails of the documentation and add sound to an application. Then I found sound only worked on the machine I was developing on, and had to spend another hour going over the docs and tweaking my code to get it working on both machines I was testing on at the time. Finally, I released my application with the ALSA back-end, only to find several people complaining about no sound, and started receiving patches from several developers. Many of those patches fixed sound on their machines, but broke sound on one of mine. Here we are a year later, and after many hours wasted by several developers, my application's ALSA output now seems to work decently on all machines tested, but I sure don't trust it. We as developers don't need these kinds of issues. Of course, you're free to disagree, and even cite examples of how you figured out the documentation, added sound quickly, and had it work flawlessly for everyone who tested your application. I must just be stupid.
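To make the "easy to use" claim about OSS concrete: because OSS is file-based, playback is basically "generate PCM, write it to /dev/dsp". Here is a sketch of that in Python; the tone generation is my own, the device parameters are illustrative, and playback is skipped when no OSS device is present (Python's stdlib ossaudiodev module handles the parameter ioctls).

```python
# Sketch of OSS's file-style API: build one second of a 440 Hz sine wave
# as 16-bit little-endian PCM, then write it to the OSS device if present.
import math
import os
import struct

RATE = 8000          # samples per second (illustrative choice)
TONE = 440           # Hz

def sine_pcm(seconds=1.0, rate=RATE, freq=TONE):
    """Return signed 16-bit LE PCM for a sine tone at half amplitude."""
    n = int(seconds * rate)
    samples = [int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate))
               for i in range(n)]
    return struct.pack('<%dh' % n, *samples)

pcm = sine_pcm()

# OSS is "everything is a file": /dev/dsp accepts raw PCM writes.
if os.path.exists('/dev/dsp'):
    try:
        import ossaudiodev                  # stdlib OSS wrapper on Linux
        dsp = ossaudiodev.open('w')
        dsp.setparameters(ossaudiodev.AFMT_S16_LE, 1, RATE)
        dsp.write(pcm)
        dsp.close()
    except (ImportError, OSError):
        pass                                # no usable OSS device; skip playback
```

The equivalent C is the classic open()/ioctl()/write() sequence; either way, there is very little API surface to learn, which is the point the timing comparisons above are making.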

Now, I previously thought the OSS vs. ALSA API issue was significant to end users, insofar as what they're locked into, but really it only matters to developers. The main issue, though, is that if I want to take advantage of all the extra features OSSv4's API has to offer (and I do), I have to use the OSS back-end. Users, however, don't have to care about this one, unless they use programs which take advantage of these features, and there are few of those.

However, regarding wrapper APIs, I did find a few interesting results when testing them in a variety of programs.
App -> libao -> OSS API -> OSS Back-end - Good sound, low latency.
App -> libao -> OSS API -> ALSA Back-end - Good sound, minor latency.
App -> libao -> ALSA API -> OSS Back-end - Good sound, low latency.
App -> libao -> ALSA API -> ALSA Back-end - Bad sound, horrible latency.
App -> SDL -> OSS API -> OSS Back-end - Good sound, really low latency.
App -> SDL -> OSS API -> ALSA Back-end - Good sound, minor latency.
App -> SDL -> ALSA API -> OSS Back-end - Good sound, low latency.
App -> SDL -> ALSA API -> ALSA Back-end - Good sound, minor latency.
App -> OpenAL -> OSS API -> OSS Back-end - Great sound, really low latency.
App -> OpenAL -> OSS API -> ALSA Back-end - Adequate sound, bad latency.
App -> OpenAL -> ALSA API -> OSS Back-end - Bad sound, bad latency.
App -> OpenAL -> ALSA API -> ALSA Back-end - Adequate sound, bad latency.
App -> OSS API -> OSS Back-end - Great sound, really low latency.
App -> OSS API -> ALSA Back-end - Good sound, minor latency.
App -> ALSA API -> OSS Back-end - Great sound, low latency.
App -> ALSA API -> ALSA Back-end - Good sound, bad latency.

If you're having a hard time trying to wrap your head around the above chart, here's a summary:
  • OSS back-end always has good sound, except when using OpenAL->ALSA to output to it.
  • ALSA generally sounds better when using the OSS API, and has lower latency (generally because that avoids any sound mixing as per an earlier diagram).
  • OSS related technology is generally the way to go for best sound.


But wait, where do sound servers fit in?

Sound servers were initially created to deal with a problem caused by OSSv3 which no longer exists, namely the lack of sound mixing. The sound server stack today looks something like this:

As should be obvious, these sound servers today do nothing except add latency, and should be done away with. KDE 4 has moved away from the aRts sound server, and instead uses a wrapper API known as Phonon, which can deal with a variety of back-ends (some of which can themselves go through a particular sound server if need be).

However as mentioned above, ALSA's mixing is not of the same high quality as OSS's is, and ALSA also lacks some nice features such as per application volume control.

Now one could turn off ALSA's low quality mixer, or have an application do its own volume control internally by modifying the sound wave it's outputting, but these choices aren't friendly towards users or developers.
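Doing volume control inside the application, as just described, amounts to scaling every sample before output. A minimal sketch of the idea (my own toy, not code from any real player) shows why it's extra work developers would rather not do per application:

```python
# Toy in-application volume control for signed 16-bit little-endian PCM
# (my own sketch): scale every sample, clamping to the valid range.
import struct

def scale_pcm16(data, volume):
    """Return `data` with each 16-bit sample multiplied by `volume`."""
    n = len(data) // 2
    samples = struct.unpack('<%dh' % n, data)
    scaled = [max(-32768, min(32767, int(s * volume))) for s in samples]
    return struct.pack('<%dh' % n, *scaled)

half = scale_pcm16(struct.pack('<3h', 10000, -10000, 32767), 0.5)
assert struct.unpack('<3h', half) == (5000, -5000, 16383)
```

Every application re-implementing this (and getting the clamping and performance right) is exactly the duplication that per-application volume control in the sound system itself avoids.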

Seeing this, Fedora and Ubuntu have both stepped in with a so-called state-of-the-art sound server known as PulseAudio.

If you remember this:

As you can see, ALSA's API can also output to PulseAudio, meaning programs written using ALSA's API can output to PulseAudio and use PulseAudio's higher quality sound mixer seamlessly without requiring the modification of old programs. PulseAudio is also able to send sound to another PulseAudio server on the network to output sound remotely. PulseAudio's stack is something like this:
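This ALSA-API-to-PulseAudio route is wired up the same way as the ALSA-API-to-OSS route: libasound's default pcm is pointed at the pulse plugin from the alsa-plugins package. A sketch of what Fedora and Ubuntu effectively configure for you (details vary by distribution):

```
pcm.!default {
    type pulse
}

ctl.!default {
    type pulse
}
```

Any program opening the ALSA "default" device then ends up in PulseAudio's mixer without being modified.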

As you can see it looks very complex, and a 100% accurate breakdown of PulseAudio is even more complex.

Thanks to PulseAudio being so advanced, most of the wrapper APIs can output to it, and Fedora and Ubuntu ship with all of that set up for the end user. In some cases it can also receive sound written for another sound server such as ESD, without requiring ESD to run on top of it. It also means that many programs now go through many layers before they reach the sound card.

Some have seen PulseAudio as the new Voodoo, our new savior: sound written to any particular API can be output via it, and it has great mixing to boot.

Except many users, gamers for example, are crying that this adds a TREMENDOUS amount of latency, very noticeable even in not-so-high-end games. Users don't like hearing enemies explode a full 3 seconds after they saw the explosion on screen. Don't let anyone kid you: there's no way a sound server, especially one with this level of bloat and complexity, can ever achieve anything approaching the low latency acceptable for games.

Compare the insanity that is PulseAudio with this:

Which do you think looks like a better sound stack, considering that their sound mixing, per application volume control, compatibility with applications, and other features are on par?

And yes, let's not forget the applications. I'm frequently told how some application is written to use a particular API, and that therefore either OSS or ALSA needs to be the back-end. However, as explained above, either API can be used with either back-end. If set up right, you don't have to lack sound in newer versions of Flash when using the OSS back-end.

So where are we today exactly?
The biggest issue I find is that the distributions simply aren't set up to make the choice easy on the users. Debian and derivatives provide a linux-sound-base package to select whether you want OSS or ALSA as your back-end, except it really doesn't do anything. Here's what we need from such a package:
  • On selecting OSS, it should install the latest OSS package, as well as ALSA's ALSA API->OSS back-end interface, and set it up.
  • Minimally configure an installed OpenAL to use OSS back-end, and preferably SDL, libao, and other wrapper libraries as well.
  • Recognize the setting when installing a new application or wrapper library and configure that to use OSS as well.
  • Do all the above in reverse when selecting ALSA instead.

Such a setup would allow users to easily switch between them if their sound card only worked with the one which wasn't the distribution's default. It would also easily allow users to objectively test which one works better for them if they care to, and desire to use the best possible setup they can. Users should be given this capability. I personally believe OSS is superior, but we should leave the choice up to the user if they don't like whichever is the system default.

Now I repeatedly hear the claim: "But, but, OSS was taken out of the Linux Kernel source, it's never going to be merged back in!"

Let's analyze that objectively. Does it matter what is included in the default Linux Kernel? Can we not use VirtualBox instead of KVM when KVM is part of the Linux Kernel and VirtualBox isn't? Can we not use KDE or GNOME when neither of them are part of the Linux Kernel?

What matters in the end is what the distributions support, not what's built in. Who cares what's built in? The only difference is that the Kernel developers themselves won't maintain anything that isn't officially part of the Kernel, but that's precisely the job the various distributions fill: ensuring their Kernel modules and related packages work shortly after each new Kernel comes out.

Anyways, a few closing points.

I believe OSS is the superior solution to ALSA, although your mileage may vary. It'd be nice if OSS and ALSA just shared all their drivers, avoiding the situation where one supports a particular sound card but the other doesn't.

OSS should get suspend support and anything else it lacks in comparison to ALSA, even if insignificant. Here's a hint: why doesn't Ubuntu hire the OSS author and have him make OSS friendlier for the end user in these last few cases? He is currently looking for a job. Also, throw some people at improving the existing volume-control widgets to work better with the new OSSv4, and maybe get software like HAL to recognize OSSv4 out of the box.

Problems should be fixed directly, not in a roundabout manner as is done with PulseAudio; that garbage needs to go. If users need remote sound (and few do), one should simply be able to map /dev/dsp over NFS and output everything to OSS that way, achieving network transparency at the file level as UNIX was designed for (everything is a file), instead of all the non-UNIX hacks in place today in regards to sound.

The distributions really need to get their act together. Recently Draco Linux has come out, which is OSS-only, and Arch Linux seems to treat OSSv4 as a full-fledged citizen, giving the end user a choice. However, I'm told both fall short in the ALSA compatibility department, not setting it up properly for the end user, and in the case of Arch Linux, requiring the user to modify the config files of each application or library that uses sound.

OSS is portable thanks to its OS abstraction API, making it relevant to the UNIX world as a whole, unlike ALSA. FreeBSD, however, uses its own take on OSS that avoids the abstraction API, but it's still mostly compatible, and one can install the official OSSv4 on FreeBSD if desired.

Sound in Linux really doesn't have to be that sorry after all; the distributions just have to get their act together, and stop with all the finger pointing, propaganda, and FUD going around, which is only relevant to ancient versions of OSS, if not downright irrelevant or untrue. Let's stop the madness perpetrated by the likes of Adobe, the PulseAudio propaganda machine, and whoever else is out there. Let's be objective and use the best solutions instead of settling for mediocrity or hack upon hack.

236 comments:

Andrew Z said...

You want to look at the state "today," but developers** need to support older technology. For example, "today" Debian and Red Hat support Linux distributions with kernels a year old.

** I am a developer (not a sound developer) and a Linux fan, and I know first hand developing for Linux would be easier if distro's united in terms of standards such as packaging, libraries, and layout of system files.

Theodore Tso said...

I've commented on this on your earlier May 25th post, but let me make the point (succinctly) here as well. One of the problems with your analysis is that you are only considering the raw technology, and not the development community and how supportable a technology will be in the long run, based on whether the lead developer is capable of attracting other OSS developers to contribute towards his vision.

This is a critical point as far as a distribution is concerned, since they can't afford to subsidize all of the development for a technology or subsystem themselves. So it's very easy to say, "<Distribution X> should just support OSS", but it has to take into account whether or not it has an active viable community, or whether it needs to employ all of the developers in order to make it be successful --- and if OSS has only two developers, even if they are extremely talented, is that enough to support the entire sound subsystem and all of its device drivers? And if not, why haven't other sound developers flocked to OSS in the past? And if most of the existing sound developers have already committed to developing sound device drivers for ALSA, are the OSS developers capable of attracting either the current ALSA developers, or new sound developers, to flock under the banner? What has changed so that they will be successful in the future, as compared to their past record?

If you were the hiring manager at a distribution, these are all questions that you would be wanting answers towards before deciding that hiring Hannu would make good business sense, don't you think?

Jorge said...

You say that OSS is open source now, but here I read that: "Open Sound System is now free for personal and non-commercial use and comes with a license key that will allow you to run OSS. The license key is valid for up to 6 months at a time after which you will need to download and install OSS again"
So, is it really open source?

Theodore Tso said...

Jorge, the GPL'ed version can be found here:

http://www.4front-tech.com/developer/sources/stable/gpl/

For a good time, though, try running the OSSv4 kernel files through the standard Linux kernel "checkpatch" script:

total: 588 errors, 1644 warnings, 6631 lines checked

oss_audio_core.c has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

This is just from one file. It would take a very large amount of effort to make OSSv4 acceptable for merging into the Linux kernel.

Reece Dunn said...

When writing an application, the OSS API is easy to work with - it uses the existing file API pointing to /dev/dsp or whatever. The problem from the users point of view is that only one application can access this at one time.

ALSA does not, IIUC, have the one-application-only limitation (although you may need driver/sound card support for this). However, I have found the API complex to work with - especially when trying to avoid underruns.

PulseAudio is a maturing system, and after an initially bumpy ride (on Ubuntu at least) is looking good. As it matures, the user experience can only improve. It also allows multiple applications to play sound at the same time. From a development point of view, it provides a complicated API (if you want fine-grained control), or a simple API (that uses an open/read/write/close style API). Using the simple API, I was able to get an application to support pulseaudio very quickly.

GStreamer is different to the audio APIs above in that it is about rendering and processing media streams. You can use it to play any supported audio and video file, or encode them into a different format. It supports alsa, oss, pulseaudio and other audio back ends, making it easy to support playback to those devices. GStreamer looks very good, but I am not finding any decent tutorials or documentation on how to do anything beyond connecting an audio pipeline (i.e. how do you write an audio source? can you hook into a sink easily?)

As for the others (jack, esd, arts, phonon), I don't have any experience of those to comment.

matt.helsley said...

Which OpenAL library did you use? The reference implementation has long been utter junk. Yet it was the only OpenAL implementation on Linux for quite some time and hence shipped with many distros. In contrast, OpenAL-soft has an ALSA implementation with superior sound quality (though I haven't measured the latency).

Perhaps the cross-platform sound API you couldn't quite recall is the "Miles sound system"?

I don't care that ALSA doesn't "always" support mixing if OSSv3 is at fault. That doesn't reflect poorly on ALSA so much as OSSv3.

You seem to be ignoring the fact that the OSSv3 -> OSSv4 cycle was unbearably long. It was so unbearable that hordes of sound mixers sprang up and confused folks like the Flash developers. So unbearable that OSSv3 was dropped and ALSA adopted in the meantime.

In contrast, I've noticed problems with pulseaudio and ALSA are fixed without multi-year lag. I much prefer that to 4Front's approach.

nils said...

It's sad jack was left out. It works the best for realtime applications (<10ms latency), is supported on all major OSes, and seems to be the favorite of most audio applications.

I can't speak from a developer's perspective, but from a user's, it allows a lot of flexibility in routing audio from one app to another.

Aigarius said...

PulseAudio had latency problems because it exposed some weird ALSA bugs in odd places that the ALSA libraries did not touch that way. With the latest kernel, latest ALSA libraries and latest PulseAudio, the latencies (added by PA) should be in the single millisecond range.

Even on released Ubuntu stack latencies are too low for me to measure without specialised equipment.

henke said...

About Flash, look for libflashsupport. It's a c file letting you use any sound api that you can code into it.

Ilmari Heikkinen said...

You don't fix performance bottlenecks (ALSA) by executing more code (Pulseaudio.) You fix performance bottlenecks (ALSA) by fixing performance bottlenecks (ALSA.)

Except that ALSA has only maintenance, no development. It's had the same problems (dmix sucks) and same bugs (dmix doesn't work on OSS emulation) for the past 10 years. It's dead technology.

Pharaoh Atem said...

I see a problem with supporting OSSv4 on Linux directly. Drivers.

OSSv4 has support for quite a few common sound devices. However, MIDI devices are still not supported.

Wouldn't it make sense for OSSv4 to support what ALSA does, but better while providing additional functionality? Well, unfortunately, OSSv4 does not yet do that.

Also, there are a lot of drivers already written for ALSA. What would we do about them? If those drivers could be ported to OSSv4 easily, then maybe we could get somewhere.

I want to see my preferred distribution, Fedora, add OSSv4 support, but I do not think it will happen. Why? Fedora prides itself on being very close to upstream in its packages, including the kernel, even though there are quite a few patches in there. They don't even allow kmod packages anymore because of this.

Also, Theodore Tso said that the OSSv4 code is not yet acceptable for merging into the kernel. If an effort could be made to make OSSv4 acceptable for merging into the kernel, then maybe more people would work on it. As it is, somebody bringing up adding support for OSSv4 is more likely to be flamed into submission rather than getting something constructive out of it.

I do agree that all this nonsense about sound servers is rather pointless though. PulseAudio is rather good, but it is rather annoying when there is a noticeable delay in the sound effects in a game. Also, sometimes I experience the weird effect of every sound effect being replayed when no event is activating them.

If you can drum up the support needed to get OSSv4 considered for distro/kernel inclusion, then great!

Otherwise, we are all screwed with the mess that is ALSA.

meteficha said...

I may look like a troll but this is my PulseAudio use case (that I'm really using right now). What I have:

- I have an internal sound card.
- I have an USB headphone (which acts as a USB sound card).
- I want to play sounds.

Well, simple enough. Now:

- If I'm playing sound through my internal card and plug the headphone, I want the sound to continue playing.
- Whenever I feel like, I want to transfer the sound to the headphone.
- Sometimes someone wants to hear what I'm listening to, so I want to send the audio to both outputs.

There are more use cases, but those seem sufficient. Now, how does your OSSv4 magic work with this one? PulseAudio makes it dead easy; transferring audio streams between sound cards is as easy as it gets, with a GUI and everything.

sega01 said...

Hey,

Sometime ago, I read your original Sorry State of sound in Linux post and really liked it. I develop an Arch Linux fork called Icadyptes, and have opted to exclusively use OSS4, for multiple reasons. After not being able to get the microphone input on an HDA Intel chipset working under ALSA, I gave OSS4 a try and it worked right off the bat :-). Since then, I've done more reading and comparing and found that OSS4 is probably better for my needs.

Not sure how ALSA/OSS-specific this is, but I was very impressed with my past OSS4 music setup. I had an Icadyptes box with an encrypted rootfs through dm-crypt storing my music (usually CD-quality FLACs), sshfs-mounted by another box (my main Icadyptes package-building box, so lots of heavy CPU and I/O usage) which played the music through cmus (in a dtach session, but that's mostly irrelevant). It *never* skipped even once, even with full load on both boxes. I don't know the details of ALSA and OSS well enough to say whether ALSA would have had problems, but my past experience suggests it might not have fared as well. Of course, I/O scheduling is a huge factor, among other things. I'm currently using MPD outputting to OSS4 on a loop-AES encrypted home folder and have had great success with both quality and lack of skipping.

Icadyptes does not have alsa-lib packaged, nor ALSA support in the kernel. I have to modify a few things to explicitly use OSS, but support for it is quite reasonable, as your blog post says. From a packager/distribution developer's standpoint, I have to give a huge recommendation for OSS. However, it is not without drawbacks in its current state.

OSS4 is wonderful to use, but has a horrific build system. It makes me cringe. If OSS4's goodness is the inverse of its difficulty to build, it's amazing (it really isn't that bad once you get it working, but it's certainly close to painful). I need to update to a 4.2 snapshot, but my build script (not totally mine, modified from others; a PKGBUILD, for Pacman/Arch Linux) can be found here.

Anyway, excellent job on the article :-). When I learn C well enough to be more useful, I would love to help personally with OSS development. I think the most important things OSS4 needs are some sort of automatic module loading through udev and a revamp of the build system. It would be nice to have kernel patches too, but just building it separately really isn't that bad. My eyes are blurring; better post this comment before I pass out.

Cheers, and thanks!

David Fox said...

An audio system that doesn't support suspend/resume may be acceptable to a majority of Linux users, but it is simply not an option for a general purpose Linux distribution to ship as its standard configuration.

AdamW said...

There's another rather large problem with the 'F/OSS' OSSv4, which is that it is not in point of fact F/OSS, and expressly refuses to commit to ever reliably being F/OSS:

http://developer.opensound.com/opensource_oss/licensing.html

(the last bit, titled "Is everything in OSS open sourced?")

Frankly, we now have a reasonable story for sound on Linux. The story is everyone should run PulseAudio, and apps should use either the PulseAudio API, the simple ALSA API, or a sound library which does one of those two for them. What would help more than anything else is if someone would employ, oh, say, about five or ten more full-time ALSA hackers, because PA is just a sound server, and what we really need are improved and fixed ALSA drivers. And for that, ALSA needs more developers.

WWWWolf said...

This is all very interesting...

...but what about MIDI?

I tend to remember old OSS's MIDI support as something that sometimes sort of worked, while ALSA's MIDI has worked quite well for me. And the vast majority of apps nowadays seem to depend on ALSA's MIDI features. I can't remember whether OSS did the funky routing things that ALSA supports.

Does the new OSS do this too?
How much work will be needed to convert the existing ALSA MIDI applications to OSS? (I don't think there are any interface libraries in use here...)

(And why the heck this blog won't let me log in using OpenID, dammit? Some hard-hitting open-source commenting this is, if we need to use an Ewil Googley Login =)

slashdotaccount said...

Linux needs a Gallium3D for audio, a bunch of different APIs (OSS, ALSA, OpenAL, SDL, etc) thinly layered on top of a very flexible backend.

Christine! said...

Yeah, it's nice, but until OSSv4 supports USB microphones it's utterly useless to anyone who uses VoIP.

I finally gave up and ran pulseaudio, which works fine AND I can send music from my server straight to the pulseaudio server on every other computer in the house.

Rudd-O said...

I'm sorry, dude, but you don't know what you're talking about for shit.

I play UNREAL TOURNAMENT over pulseaudio, and I get ZERO LATENCY. I can even dynamically switch sound cards while playing, moving the game audio to my headphones and continuing to play music over the stereo which is plugged into a different sound card, and I see ABSOLUTELY NO PERFORMANCE OR LATENCY PROBLEM.

If you actually knew what you were writing about, you'd know WHY PulseAudio actually REDUCES latency.

Next time, do some more research instead of talking trash.

Philip Hands said...

On reading your comment about the inadequacy of the linux-sound-base package, I thought I might suggest that you submit a wishlist bug.

That was on the assumption that oss4 was available in Debian, but I find that the ongoing effort to package oss4 for Debian has so far failed to survive scrutiny on the grounds that the package's licensing is in need of attention.

Of course, I'd expect that the effort to package it will continue, so you could always help with that if you want to make that happen more quickly. Until that's done, you cannot reasonably expect other packages to support it very well.

k said...

Can someone explain why ALSA has never been able to reproduce sound at Windows-equivalent volume levels on my hardware, why we don't have extra features in sound drivers like "reduce mic noise" or "reduce echo", or why such a resource-wasting per-application volume control implementation (compared to Windows) is suddenly so acceptable?

Why can't we fix software mixing and per-application volume control in ALSA?

ken said...

Wow, what a mess. You can't see the forest for the trees. Take a look at CoreAudio for a real API that actually works, without caveats.

QuoiD said...

WOW, their site (OSS) is a little confusing... according to the developer page link, the license is GPLv2 (as of 2008, which is when the post was made) and various others too; however, when you try to download from the main site it says:

"Welcome to the Open Sound System Driver Download page

Open Sound System is now free for personal and non-commercial use and comes with a license key that will allow you to run OSS. The license key is valid for up to 6 months at a time after which you will need to download and install OSS again. There are no time limitations or restricted functionality during the licensing period. A permanant license key that will entitle you to free support and upgrades can be ordered here"


Now as far as I know, if it's licensed under the GPL they can't restrict commercial usage at all??? Presumably there are some internal parts unusable commercially while the rest of it is GPL... It's a very silly licensing system IMO. It just doesn't make enough sense.

Oh, they also don't seem to make enough of a distinction between which license applies to which version; it's all just referred to as OSS. Confusing!

ion said...

PulseAudio actually reduces hardware latency.

“these sound servers today do nothing except add latency” is a false statement.

“stop with all the finger pointing, propaganda, and FUD that is going around” – yeah, you do that. :-)

gurra said...

The article and the ensuing comments, as well as my own struggles with making sound work on my machines, clearly show that the state of sound in Linux is still a sorry mess. This is probably the largest factor slowing down the deployment of Linux on privately owned machines.

I think that in order to really fix the problem, a number of sound system developers and a few very skilled systems architects need to get together and make a new end-to-end design. The understanding of the problem is much better today than it was when ALSA was conceived as a replacement for OSS.

You must be able to plug in and adjust the sound level of each input and output device individually. You must be able to adjust the gain of every application consuming or producing sound individually.
You must be able to apply transformations to the sound streams at the proper points in the sound chain. It must be possible to test the system at all key points in the chain. A requirements specification would be a first step toward building a specification for a simple and robust sound system with minimum latency. All that currently exists is a patchwork of ideas not fully thought through.

thegnu said...

Arch Linux is not nearly as bad as you describe it. After configuring ALSA--which, last time I used it, could be done via an autoconfig script if you're lazy--it just works.

Arch, in my opinion, is one of the easiest distros to get up and running, and by far the easiest to troubleshoot. But then, I'm an ArchFag.

jhansonxi said...

It would be interesting to see comparisons with Jack (as mentioned above) and commercial solutions like FMOD.

Killa B said...

@thegnu:
I think you may have misunderstood the part about Arch Linux.

It is easy to set up ALSA. It's not so easy to set up ALSA compatibility in OSS.

When I first tried OSS, I had a few problems, the worst being a complete lack of sound in SDLMAME. It took me forever to fix it, but I finally managed with the aid of the previous entry.

Arch is a great distro, but this is one thing that could be improved.

Will said...

Excellent and very informative! A compelling case to bring OSS back.

insane coder said...

Wow, I went away for the weekend, and I come back to find my mailbox filled with more than 200 comments on this article, not to mention all the ones I'm finding all over the net. That's a first.

Hi Andrew Z, yes, things would be easier regarding deployment if the communities agreed more. You may be interested in reading this other article for my thoughts on that.

Hi Theodore.

>I've commented on this on your earliest May 25th post, but let me make the point (succinctly) here as well.
Okay, I responded in my other article as well.

>One of the problems with your analysis is that you are only considering the raw technology
Objective analysis is one of the most important tools there are for grading a technology. Other methods will usually include bias.

>not the development community
It really depends on the development community.
Most development communities I talk to are the kind filled with people that put their hands over their ears and chant: "nyah nyah nyah, I can't hear you", and ignore any fundamental issues that you may point out.
You also have to judge a community by what they put out. We've seen what Microsoft puts out, and they have the largest community of them all. By your size method, we should just all switch to Windows and ditch Linux. I think those of us using Linux are here because we find it superior in ways that matter to us. We need that same objectivity when reviewing our own homegrown projects, and not get sidetracked by how many followers any particular technology has.

>and how supportable a technology will be in the long run based on whether the lead developer is capable of attracting other OSS developers to contribute towards his vision.
Well, that's just what we call bad advertising. Linux itself has bad advertising, we really should be attracting a lot more developers. I think I'll write an article about that sometime this week.
However, once we found out about a good technology, do the advertising yourself like I am, tell your friends, objectively discuss it, and see if it's something we as a community should work on. We don't have to lock ourselves into how well a single developer advertises his own product.

>This is a critical point as far as a distribution is concerned, since they can't afford to subsidize all of the development for a technology or subsystem themselves.
Very true. Except today you have Red Hat subsidizing PulseAudio, which came out of nowhere and is solving a problem that doesn't exist, instead of spending money fixing the actual problems we do have.

>So it's very easy to say, "<Distribution X> should just support OSS"
Indeed it is. But it was not easy to write the above article to raise awareness of why <Distribution X> should support OSS (I never had the "just" in there).

>but it has to take into account whether or not it has an active viable community, or whether it needs to employ all of the developers in order to make it be successful
At this point, I only think one developer needs to be employed to make OSS itself successful, and maybe some other guys to get patches and features into GNOME and KDE mixers to make it better for the end user.

>and if OSS has only two developers, even if they are extremely talented, is that enough to support the entire sound subsystem and all of its device drivers?
They managed to do it till now. Their track record speaks for itself.

insane coder said...

>And if not, why haven't other sound developers flocked to OSS in the past?
Maybe because it was closed source? And even now that it's open source, people are still living in the past? Look at what Mr. Reece Dunn states in the comments here:

"When writing an application, the OSS API is easy to work with - it uses the existing file API pointing to /dev/dsp or whatever. The problem from the users point of view is that only one application can access this at one time."
That hasn't been true with OSS for at least two years, or in FreeBSD since forever, and certainly was covered here, showing he hasn't even read this article and people are still very much living in the past.
It's sad how out of date most people are with their information, and refuse to read anything new.

Recall this:
"The master then asked his students: What should a good programmer avoid?
The first answered: Ignoring what is going on in the code.
The second answered: Idiot friends.
The third answered: A weak community.
The fourth answered: Allocating resources without freeing them.
The fifth answered: Becoming complacent in his understanding of what is best.
The master stated, the fifth answered best, as one who becomes complacent will end up with what everyone else answered."
Seems to mirror our current discussion pretty well.

>And if most of the existing sound developers have already committed to developing sound device drivers for ALSA
Maybe in the Linux world, but not in the open source world at large. Let's not forget FreeBSD, OpenBSD, and whoever else is out there.

>are the OSS developers capable of attracting either the current ALSA developers, or new sound developers, to flock under the banner?
By themselves? Perhaps not, but that's where we should come in.

>What has changed so that they will be successful in the future, as compared to their past record?
They're open source now? And despite not having anything but two developers in the past they have a very comprehensive set of drivers?

>If you were the hiring manager at a distribution, these are all questions that you would be wanting answers towards before deciding that hiring Hannu would make good business sense, don't you think?
I actually do play a role in hiring decisions at work. When we need someone to fill a position, we research what type of person we need, and then find the person who fits that position best, based on their knowledge, abilities, and how well they work with the people we already employ.
I don't know how well Hannu and Dev play with others, but I do very much want that talent, and would be willing to spend money on them, at least for a 3-6 month trial term, to see what they can pull off given a set of requirements. If they did well, I'd probably employ them permanently, as long as there was work to be done and new sound cards were still coming out. If only I worked at Red Hat or Novell, eh?

Hi Jorge. That info by the download link on their site is really old, seems they forgot to update it. OSS v4.1+ doesn't have any license keys in it. And there is indeed an open source repository.

Theodore.
> This is just from one file. It would take a very large amount of effort to make OSSv4 to acceptable for merging into the Linux kernel.
I don't believe it needs to go into the Kernel. Why keep perpetuating this myth?

insane coder said...

Hi matt.helsley.
I installed the libopenal-dev and libalut-dev packages in Debian and used stuff from them. I don't believe it was "soft". That may explain why I get such horrible performance with it when it goes through the ALSA library.

The sound APIs I forgot were Audiere and FMOD.

>I don't care that ALSA doesn't "always" support mixing if OSSv3 is at fault. That doesn't reflect poorly on ALSA so much as OSSv3.
No, it actually does reflect poorly on ALSA. There's no reason why it shouldn't support mixing there. OSSv3 itself gained mixing after it went closed source, as did the FreeBSD versions of it. There's no reason why any particular sound API shouldn't have mixing. If ALSA can't provide mixing everywhere, then it fails to do what any sensible sound system should be capable of.
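For the record, where dmix isn't already enabled by default, ALSA can be told to mix in software through ~/.asoundrc; a sketch, with the card address, IPC key, and rate as illustrative values:

```
# ~/.asoundrc sketch: route the default PCM through dmix so multiple
# programs can play at once. "hw:0,0" and the rate are illustrative.
pcm.!default {
    type plug
    slave.pcm "dmixed"
}

pcm.dmixed {
    type dmix
    ipc_key 1024          # any unique integer
    slave {
        pcm "hw:0,0"
        rate 48000
    }
}
```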

>You seem to be ignoring the fact that the OSSv3 -> OSSv4 cycle was unbearably long.
You seem to be ignoring that this didn't hurt FreeBSD in the slightest. It was the Linux people who kept trying to fix the wrong problem. And now that someone has finally fixed the right problem, no one even wants to look at it.

>confused folks like the Flash developers
If they got confused by the "wrappers" aspect of it, then they have serious issues. All they had to do was pick SDL, for example, and they'd have sound EVERYWHERE, including Windows and OS X, under a single roof.

Hi Aigarius.
>Even on released Ubuntu stack latencies are too low for me to measure without specialized equipment.
Try a bunch of games. Also try something like this program. Everyone I've spoken to gets 2-3 seconds of latency with it.

Hi henke.
>About Flash, look for libflashsupport. It's a c file letting you use any sound api that you can code into it.
I have never got that to work. Ever. See my other article about that.

Hi Ilmari Heikkinen.
Well said.

Hi Pharaoh Atem.
>However, MIDI devices are still not supported.
Yeah, I don't get that. You'd think tying in a public-domain soundfont, parsing the instruments, and outputting the PCM would be pretty easy.
I personally use TiMidity when I want to play MIDI files.

>Wouldn't it make sense for OSSv4 to support what ALSA does, but better while providing additional functionality? Well, unfortunately, OSSv4 does not yet do that.
Yes, and for the main features, OSSv4 is way ahead of ALSA. It's the lesser features that OSSv4 now needs to worry about. I mentioned this in the article.

>Also, there is a lot of drivers already written for ALSA. What would we do about them? If those drivers could be ported to OSSv4 easily, then maybe we could get somewhere.
From what I'm told, they can be. Just need more people to pitch in doing so.
Also in any event, it seems OSS has more drivers that only it supports than ALSA does, but I could be wrong about that.

>If you can drum up the support needed to get OSSv4 considered for distro/kernel inclusion, then great!
I'm trying to, but I shouldn't be the only one, we all need to be proactive here.

Hi meteficha.
I can't comment on each and every use case, but if you find something is working perfectly for you then by all means stick with it.
OSSv4 has a mixing system which seems really, really powerful, but I don't think the GUI they provide is enough to unlock all of its potential.

insane coder said...

Hi David Fox.
>An audio system that doesn't support suspend/resume may be acceptable to a majority of Linux users, but it is simply not an option for a general purpose Linux distribution to ship as its standard configuration.
Perhaps not, I think that's debatable, but either way we should work on that.
I personally would much rather have a sound system which works well when it's on, instead of a sound system that works better when/while it's turned off.

Hi Christine!.
>Yeah, it's nice, but until OSSv4 supports usb microphones it's utterly useless to anyone that uses VOIP.
Weird, I thought I saw an OSS USB driver in there.
As for VoIP, I have no issues with OSSv4. I have a microphone recommended by Cisco for use with their VoIP system which plugs into normal audio jacks, and it works great. It has a logo of "PARROTT" on it, not sure beyond that though as my employer got a whole batch of them from Cisco.

Hi Rudd-O.
Hahaha, you had me rolling, my sarcasm detector's needle exploded.
Yeah, it's hilarious when people say things like "zero latency", as if such a thing exists. Maybe PulseAudio comes with a time machine too.
Yeah, and the PulseAudio dev and his cronies going around saying how it reduces latency is funny too. Next they're going to say that PulseAudio's 5-10% CPU usage actually gives your CPU more MHz, so running PulseAudio reduces load and makes the rest of your machine faster.
Thanks, I needed the laugh, keep it up.

Hi Philip Hands.
I'm not sure what the deal is with the licensing issues; last I heard, it had to do with a bunch of header files still having a copy of the old license in them, which is fixed in their repository.
In the meantime, however, could it be placed in non-free?

I'd pitch in with helping to package it if I had a clue how to do this kind of stuff. I once tried reading some documentation on building a .deb, and it seemed like a nightmare. Maybe things have improved since then?

Hi k.
As was explained, ALSA's software mixer isn't of great quality. If you use OSSv4, odds are you'll be able to get higher volume, and even at OSSv4's maximum it will sound better than ALSA.
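If you want to poke at OSSv4's mixer yourself, it ships an ossmix command-line tool; a sketch (the vmix control name is illustrative; running plain ossmix lists the controls your system actually has):

```shell
# Sketch: inspecting and raising OSSv4's mixer volume with ossmix.
command -v ossmix >/dev/null 2>&1 || { echo "ossmix not installed"; exit 0; }

ossmix || true                  # list every mixer control on this system
ossmix vmix0-outvol 25 || true  # illustrative virtual-mixer control name
```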

Hi QuoiD.
Yes, they need to update their pages and info, which are quite old. I also think the developer doesn't quite get the idea of tri-licensing, instead saying that each license is for a different OS (the license matching that OS's license).

sipiatti said...

Rudd-O: If you play Unreal Tournament without latency, maybe it's your luck, or its developers did something completely unknown to others. I usually play id games, and e.g. ETQW has 3-5 seconds of latency with Pulse. Absolutely unplayable :(

Reece Dunn said...

@insane coder

"When writing an application, the OSS API is easy to work with - it uses the existing file API pointing to /dev/dsp or whatever. The problem from the users point of view is that only one application can access this at one time."

That hasn't been true with OSS for at least two years, or in FreeBSD since forever, and certainly was covered here, showing he hasn't even read this article and people are still very much living in the past.

It's sad how out of date most people are with their information, and refuse to read anything new.


Sigh. This is not from reading old information. Yes, I did a bit of reading to figure out how to write to the /dev/dsp device -- getting the correct sample rate and such.

NOTE: This is OSS v2 (or v3), not OSS v4, as I don't have access to a system that supports v4. From what I understand, OSS v4 does provide what you are alluding to, but which systems support it (since, from Ted Tso's comments, it isn't in the kernel)?

The observation there is from practical experience on Ubuntu and Kubuntu 8.04, 8.10 and more recently on 9.04.

If I run Kubuntu 9.04 and install the Adobe flash plugin:
1. Start a program that uses OSS (I have used both Cepstral Swift and my own Orator Text-To-Speech program with eSpeak via OSS) and test it out -- everything is ok;
2. Fire up Firefox and watch a YouTube video or something that requires flash -- everything is ok;
3. Go back to program 1 and try it again -- the open() call fails with "Device or resource busy" when passed with "/dev/dsp";
4. Close Firefox -- make sure the process is no longer running;
5. Go back to program 1 and try it again -- everything is ok;
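A quick way to confirm what is holding the device at step 3 (this assumes fuser, from the psmisc package, is available):

```shell
# Who has /dev/dsp open? The "Device or resource busy" in step 3 usually
# means another process (PulseAudio, a browser plugin, ...) holds the device.
if [ -e /dev/dsp ]; then
    fuser -v /dev/dsp 2>&1 || echo "no process currently holds /dev/dsp"
else
    echo "/dev/dsp does not exist on this system"
fi
```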

You'll need to use Kubuntu to run the test, as I am always getting the failure in (3) -- I think this is to do with PulseAudio running (even on the Ubuntu 9.04 system I am using now) -- as Kubuntu does not install pulseaudio by default.

So no, this is not information that I have read, this is practical experience on a current system.

I haven't tried other distributions to see what they are like.

Joe P. said...

It seems that all that's needed for improving the state of sound is to refactor the way certain sound packages are managed on the various distributions, and bring certain features of either OSS or ALSA up to par. If your assessment is accurate, I'd go with OSS, given that it seems to be closer to par than ALSA, despite the fact that ALSA historically has more mouths touting it. I like what you said about the track record of OSS speaking for itself, and it really seems like the OSS developers have their heads on straighter than all of the ALSA maintainers put together.

In terms of marketing, OSS needs our help in that area. I had no idea OSS4 had done away with the admittedly useless and unhelpful 6 month licensing period. (I haven't kept up with OSS since they started playing games with their licenses, seems like it's time I start to do so again.) I have developed using both OSS3 and ALSA, and like you say, coding for ALSA was a nightmare. Up until reading this and the other insane coding articles you've linked to from here, I thought that coding for both OSS3 and ALSA was the only way to provide support which included mixing in all cases (without going through garbage like PulseAudio, of course).

Without revealing too much about what I do, let me just say that latency is one of my department's biggest concerns, and the claim of people like Aigarius, Rudd-O, ion, and others that PulseAudio has "zero latency" or "reduces latency" is not only false, but downright ludicrous, not to mention impossible. Even if people won't listen to the fact that it's a logical and mathematical impossibility for adding a layer to reduce anything but patience, I can tell you that every stress test I had my people do involving PulseAudio added at least some latency. If I recall correctly, our worst case was something involving complicated CPU-intensive interpolation, and running it through PulseAudio added almost 2 full seconds of latency. I agree with Ilmari Heikkinen, if you have a pile of garbage, you don't fix it by attempting to hide it under yet more garbage, you clean it up and get rid of it.

Joe P. said...

In response to the comments that are still going on about merging OSS with the kernel, all I can say is, like it says in the article, we all use apps that aren't part of the kernel all the time. It doesn't make one bit of difference if OSS is in the kernel or not. Please stop mentioning it as a concern.

For those who laud ALSA's ability to suspend/resume as a reason to use it over OSS, let me just say that a system whose best quality is that it knows how to shut off properly is scarcely a good system at all, but again, I agree that OSS should add save/load support. And Reece Dunn, maybe you should fully read articles that you intend to comment on, rather than just skimming them and then saying things that are so blatantly foolish when compared to the very thing you are responding to. If it wasn't clear enough by your first post that you hadn't actually read the article, your more recent post makes it quite clear you missed the line that says "Most people who discuss OSS or try OSS to see how it stacks up against ALSA unfortunately are referring to, or are testing out the one that is a decade old, providing a distortion of the facts as they are today."

Finally, I find it interesting that all of the comments here agreeing with the article are well thought out, intelligent, and present cases backed up by factual examples. Conversely, the comments absurdly extolling the merits of garbage like ALSA and PulseAudio seem to be coming from people stuck on status quo, not willing to let go of what they're used to. Most of these comments hold as much sway in my mind as would a comment saying "ALSA IZ TEH AWESOMNEZZ!!!!!!!!! U R STOOPID." I laughed especially hard at AdamW's post, which I hope for his sake was intended to be facetious and sarcastic, because absolutely everything he said is contrary to logic and intelligence.

Given that the positive feedback is coming from clearly intelligent people, and the negative feedback is coming from people who would be better off finding a profession where the precious organ they're wasting won't be missed, I'd say that you clearly know what you're talking about. Kudos to you, and if you have any more ideas on how we can make an impact, please let me know.

TGM said...

It sounds to me like this: "If everyone is willing to settle on high-latency, mid-range-quality sound, why embrace higher-quality, lower-latency technology?"

Well you guys tell me, you jumped ship from OSS3 to ALSA in the first place...

Developers should only be worrying about the higher-level libraries and leaving the hard work to the toolkits; hard work that is going on today, because other OSes support OSS4 first.

If OSS4 does what it says on the tin then it will end up becoming the staple for recording studio *nixes and the rest will be left behind.

As an extension to this, Linux also has a real-time kernel. Why are we not making Linux the *best* it can be, instead of sticking to a mediocre in-between?

We seriously need to raise our game, core devs and software devs.

Reece Dunn said...

@Joe J.

Ok, so OSS v4 does not have the multiple-application access issue with /dev/dsp. I misread insane coder's comment as "modern systems" instead of "OSS in the past 2 years"; sorry, my mistake.

From a developer's perspective, you need at least 3 machines to do testing of OSS/ALSA/PulseAudio (not sure about testing Jack and others):
1. a machine with ALSA and OSSv3 set up (i.e., no PulseAudio or OSSv4)
2. a machine with PulseAudio configured and running
3. a machine with OSSv4 configured and running

Now, from a user's point of view, trying to get OSSv4 up and running feels like the fiasco with PulseAudio when Ubuntu initially made the switch (things are better now with the latest version). And that's after you convince them to run some "arcane" commands that they may or may not understand.

Setting up OSSv4 on my system, the Adobe Flash plugin isn't working -- it (along with anything else that uses PulseAudio) is silent. Googling this, it is not possible to get PulseAudio to use OSSv4 as a back end.

Your ordinary user wants things to "just work". The distributions are going to need convincing to make the switch to yet another sound system after investing resources to migrate to PulseAudio, fixing issues there.

And please, be more respectful.

Dan said...

Wow... usually I'm the first one to post, but this article seems to be all over the internet...

So, I liked everything you said. Pretty much everything I would have said has already been covered by the people who have posted here, so no need to repeat it, but I did want to comment on Adobe's FUD, which no one seems to have addressed yet.

Basically, it's as follows: just because Adobe can't make a diagram to save their lives doesn't mean that the system they're diagramming is flawed. Sure, there are many different paths and some are better than others, but that doesn't mean it has to be confusing. Why, take a look at this diagram I made of how to get to the supermarket. See? Not at all confusing.

I hope that gave everyone a good laugh, but now, kidding aside, it proves that anyone can make diagrams of even the simplest things in such a way that they look overwhelming and intimidating. I liked the diagrams in this article; they were clear, concise, and to the point. (Though I dunno why you spent the time matching the backgrounds to your blog's color rather than just making them transparent... and what's with the black line on the windoze diagram? But I guess it could be worse... at least they don't look like one of the coderlings drew them.)

P.S. Reece, I'm sure that Joe P. finds it VERY respectful that you didn't even bother to read so much as his NAME correctly... And I don't think he was disrespectful in any way.

Reece Dunn said...

@Dan

Thanks for pointing out the typo/thinko in my previous reply. I didn't mean Joe P any disrespect by it.

Update: the pulseaudio/flash issue can be resolved by purging and removing pulseaudio.

Matteo said...

Hi, I'm just an end user (a sophisticated one, if you wish), not a dev. I don't personally have any qualms about using OSS vs ALSA; to me it's just a question of functionality.
When I switched from OSS to ALSA years ago, it magically solved the mixing problem and all was good (after cmmi-ing the bloody thing from source, because the versions included by distros at the time were death traps).
My use case today is a GNOME desktop, a browser with Flash, and some guitar recording through a Behringer UCG102 and JACK (sometimes I even play with MIDI). Now all is not good: Flash hogs the sound (I'm on Ubuntu LTS), or cannot access it if anything else is using it. JACK works, but the implementation in Ubuntu is horrid.
So most of this is distro-specific and outdated (LTS is not supposed to be outdated, but Linux users live on the bleeding edge), but it shows that there is still substantial work to be done.
To be honest, I'd rather see resources added to the ALSA/PulseAudio/JACK landscape than spent on OSS, which is now left in the margins, having achieved pariah status through its licensing ordeals (the fact that you sponsor TiMidity rather than hardware MIDI, given OSS's lack of support, tells me you don't do much audio production and should test the difference).
My personal opinion is that time and resources would be best spent fixing ALSA (possibly even refactoring it if necessary) and PulseAudio (mainly teaching it when to shut down for games and low-latency work), rather than rerouting them to a technology which is dubious and lacks support for important functionality.

Just my 0.02 $

Matt

dawhead said...

> OSS always has sound mixing, ALSA does not.

Except that it puts it into the kernel, for no technical reason other than it's easier, and thereby conflicts with every principle of Linux kernel/user-space development that has existed for at least 10 years.


> OSS sound mixing is of higher quality than ALSA's, due to OSS using more precise math in its sound mixing.

This is totally bogus. It's never been double-blind tested, and moreover it definitely doesn't use and cannot use floating point math for mixing, which just happens to be the format used by every other audio application and API other than those that are forced to use fixed point because of DSP chip architecture (e.g. ProTools). Nobody in the pro-audio world seems to like the sound of fixed point. Why does OSS not use floating point? Because the Linux kernel does not allow floating point use -- or more precisely, it does not save and restore FP registers during kernel context switches, thus making it practically impossible.


> OSS has less latency compared to ALSA when mixing sound due to everything running within the Linux Kernel.

False, and utterly without any technical foundation at all. ALSA, and even a "user space sound server" like JACK, can match the hardware latency of the audio interface, just as OSS can.


> OSS offers per application volume control, ALSA does not.

Aha! Truth!


Look, I have nothing in particular against OSS, although Hannu and Dev did make some architectural mistakes early on. These were mostly inherited from the Unix tradition, and led to them treating audio interfaces like almost any other kind of device you could name except the video interface. For video, it's widely accepted that no Unix-style application accesses the device directly. Unfortunately, this lesson wasn't spotted early in ALSA's life either, hence some of the reason for the mess that we still have in front of us today. The biggest problem with OSS is its two-sided lack of respect for (a) kernel design principles that dictate keeping policy (e.g. how, when, and why audio gets mixed) out of the kernel, and (b) the fact that almost no app should ever be accessing the audio interface directly, just as they don't for video. Other than this, it's mostly a wash one way or the other.

There do continue to be deep issues with audio on Linux but neither OSS nor ALSA represent obvious solutions to them.

Dev Mazumdar said...
This comment has been removed by the author.
Theodore Tso said...

>> This is just from one file. It would take a very large amount of effort to make OSSv4 acceptable for merging into the Linux kernel.

> I don't believe it needs to go into the Kernel. Why keep perpetuating this myth?

Linux distributions, as well as the general Linux community, have learned over the years that out-of-tree device drivers are a maintenance nightmare. Keeping them working, and debugging them when there are problems, is an order of magnitude harder when the device drivers are maintained out of the kernel.

So the OSSv4 sound drivers don't have to be merged into the Linux kernel, but good luck trying to convince distributions that they should try to support said sound drivers in that way. Red Hat and SuSE both learned the hard way how painful it was to support Xen when they shipped something which needed to be in the kernel before it was integrated into the mainline kernel. I very much doubt most distributions are going to be willing to make that mistake again, after going through the pain and suffering of having to forward-port Xen to newer kernels 18-24 months later, or having seen vicariously the pain and suffering of those who did make the mistake of trying to support something that should have been merged into mainline first.

Dev Mazumdar said...

I am interested in seeing how we can bridge the Open Sound vs ALSA divide. Both groups have been at this for over 10 years and neither side has been able to claim victory.

Please visit the Open Sound Forums at this link and let's start an honest discussion with ALSA developers.

Can we bridge the ideology gap between ALSA and OSS interfaces? Can we bridge the licensing ideology?

Maybe this is what the KDE and GNOME folks need to do as well. We can put away our respective jihads and unite, or we can continue the fight; maybe that is what makes each of us stronger.

Hannu Savolainen said...

OSS/Free and ALSA were forked from OSS v3.8.1 about 10+ years ago. At that time the OSS API lacked some features that were required by some "all-inclusive" applications. The developers of ALSA started development of a sound subsystem that provided control of every imaginable feature that any (past, current, or future) sound card could have.

OSS in turn continued in the same direction for a while. I added some API features that were missing. However, after a few years I realized that this approach was seriously wrong.

The result was a turn of 100 degrees to the right. After examining the applications that were available at that time, I found that it's better to isolate applications from the hardware rather than to give them more control over it.

Actually, the OSSv4 API is 100% compatible with the older v3 one. There are enough new ioctl calls to make some "advanced" applications happy. However, most applications just want to play or record audio streams with a given sample rate, format, and number of channels. In addition, many applications may want to control the recording/playback volume or the latencies/timing of the stream.

The current situation is that there are two overlapping audio APIs in Linux. ALSA is designed for power users and requires a power user to keep it running. OSS is oriented more toward ordinary users and application developers (with the required hooks for power users/developers).

I don't see it as likely that OSSv4 will ever replace ALSA in the Linux kernel. OSS is a cross-platform package that works under several other operating systems too. This doesn't fit the development model of Linux, where everything should be included in the same source tree and follow the same coding standards. However, it might happen that some Linux distributions offer OSSv4 instead of (or in addition to) ALSA.

The OSS project needs more driver developers, as Theodore Tso suggested. Two developers can maintain a whole bunch of drivers if the work can be done on a full-time basis (without the need to fund the development by doing some other full-time work). However, it looks like development of OSS cannot be continued as a full-time job, so there is a need for more driver developers.

zettberlin said...

I do not see that you know today's Linux audio development well enough to propose a decision like changing the very core of it.

You never discussed JACK here: be aware that JACK is the standard for production audio in Linux (the so-called "music production" stuff, you know...)

Your answer regarding the MIDI capabilities of OSS makes me wonder if you have ever seen a MIDI keyboard connected to a computer, or if you know what a sequencer is. How about this:

I have a sampler that plays drum sounds in response to MIDI channel 6, while it has a distorted guitar riff in another patch that can be triggered via channel 1. On another MIDI port I have a softsynth running that I like to play with a USB MIDI keyboard, whereas the drums and the riff should be played by a software sequencer. I would also like some FX in a host application that I'd like to control with the knobs on the keyboard. The synth should be routed to the FX and at the same time be recorded on separate tracks in an HD recorder.
Ah yes: and if by any chance possible -- with less than 10 ms latency, and while I am watching an inspiring movie (Chronicles of Riddick or Galactica would be nice) in a video player.

OK with OSS4?
It is no problem whatsoever with ALSA MIDI plus JACK running the ALSA back-end...

insane coder said...

Hi Joe.
If you'd like to keep up with future ideas I have, please subscribe to this blog.

Hi Reece.
You may also be interested in following the directions here for setting up OSSv4 to be more compatible.

Hi Dan.
That supermarket chart had me rolling. I don't think I've laughed that hard at anything in more than a year. Thanks.

Hi Matteo.
>the fact that you sponsor TiMidity rather than hardware MIDI, given OSS's lack of support, tells me you don't do much audio production and should test the difference

Several years back, people I worked with were composing MIDIs for the software we were putting out. Since then, however, we switched to various PCM formats, and in the case of our old MIDIs, we converted them to Ogg Vorbis with WinAMP. I couldn't really comment on anything beyond that; I'm not a musician. However, if you look in OSSv4's tree, there is some MIDI code there that can be enabled at compile time. I'm not sure how complete it is, though. Maybe Hannu and Dev can answer that.


Hi Dawhead.

>>OSS always has sound mixing, ALSA does not.
>Except that it puts it into the kernel, for no technical reason other than it's easier, and thereby conflicts with every principle of Linux kernel/user-space development that has existed for at least 10 years.

I'll need some more information on that. Why can't it be in the kernel? Are there any good reasons? Or is it only because someone said so, or meant it in a particular case and it somehow spread to all cases?
However, I find it hard to disagree that sound mixing should be in the lowest layers of the audio stack, instead of having some audio go above it and some below.
It's also ridiculous that the mixing is now done by a user-space application which is then scheduled incorrectly. For these reasons alone, I believe it makes sense for it to go into the kernel.



>>OSS sound mixing is of higher quality than ALSA's, due to OSS using more precise math in its sound mixing.

>This is totally bogus. It's never been double-blind tested

Good luck finding perfectly matched computers with perfectly matched drivers in ALSA and OSS, and perfectly matched code paths between them to test in your perfect environment.
However, there have been MANY people who initially used ALSA, didn't like how it sounded a bit garbled at higher volumes, and then on trying OSSv4 were able to set their volumes even higher and still get crystal-clear sound. Maybe all of us who have noticed this are zombies; otherwise, there's nothing bogus about our claims.


>and moreover it definitely doesn't use and cannot use floating point math for mixing, which just happens to be the format used by every other audio application and API other than those that are forced to use fixed-point because of DSP chip architecture (e.g. ProTools). Nobody in the pro-audio world seems to like the sound of fixed point. Why does OSS not use floating point? Because the linux kernel does not allow floating point use - or more precisely, it does not save and restore FP registers during kernel context switches, thus making it practically impossible.

Wow. You do know that anything that can be done iteratively can be done recursively, right? The same holds for floating point vs. fixed point math.
It may not be easy to write fixed point code with the same level of precision as floating point, but it is quite doable. Various system emulators emulate floating point math with perfect accuracy all the time.

insane coder said...

>>OSS has less latency compared to ALSA when mixing sound due to everything running within the Linux Kernel.
>False, and utterly without any technical foundation at all

Go sit down with several games and run some tests. It shouldn't be hard to notice that there is a larger disconnect using ALSA between what is happening on the screen and what you hear from your speakers. Of course, the difference you notice can vary based on a bunch of other factors.
It generally gets worse and worse with ALSA, and especially with PulseAudio, the more CPU-hungry the game you're using is.

>the fact that almost no app should ever be accessing the audio interface directly
In fact, they don't. /dev/dsp is a virtual device handled by the kernel; it's not direct by a long shot these days.

Hi Theodore.
>Red Hat and SuSE both learned the hard way how painful it was to support Xen when they shipped something which needed to be in the kernel before it was integrated into the mainline kernel.

I recall mentioning in my article that VirtualBox isn't part of the kernel yet is still marvelously supported. What do you say happened here?
Distros also seem to manage quite well handling the NVIDIA drivers and the like, which aren't in the mainline kernel.
I don't think pointing a finger at Xen as a sole example of a problem means that kernel modules as a whole fail; there are many counterexamples.

Hi zettberlin.
> You never discussed JACK here: be aware that JACK is the standard for production audio in Linux (the so-called "music production" stuff, you know...)

You're right, I avoided discussing something I don't know much about. I'm a game developer; I play games. I also work on some VoIP-related technology. There's a definite issue going on from my perspective, and I see it a lot with other gamers out there. I don't know how the state of things is for musicians like yourself.
However, I do recall seeing that JACK can use OSS as a back-end. You'll have to test that yourself, though, to see how well it works; I have no idea. If there are issues, they should be fixed. Also, all the people I know who do music production do it on Windows. I wonder why that is? In any event, see also what I responded to Matteo.

dawhead said...

Re: mixing in the kernel. You don't do this sort of thing in the kernel because Linus Says So. If Linus wasn't enough, so do all the kernel lieutenants. If that wasn't enough, so does everyone except Hannu and Dev from 4Front. This is a bit like argument by authority, except that there's no effective counter-argument :)

Moreover, CoreAudio, which for all its faults is still the cleanest, most unified audio API that comes as part of any OS, does not do mixing in the kernel, but does so in user space. This is partly for the same reasons that no Linux system should ever do so, and partly for implementation reasons.

Emulating floating point in software inside the kernel? Hah. Do you realize how utterly insane this is? You don't write device drivers to control real-time-sensitive hardware and couple them to an FPU emulation. Why do you think CPUs have had FPUs (and now SSE/SSE2)?

Re: "direct access to a device". My point is more subtle than I think you are grasping. When you do video on Unix-like systems, you don't use open/read/write/close/ioctl. You use a highly abstracted API (it could be X11, DirectFB, Quartz or even, arguably, OpenGL) that doesn't attempt to model what the app is doing in terms of device control. OSS is still quite strongly attached to the "unix" model (open/read/write/ioctl/close) even though it has started to provide higher level libraries. This was one of the most important differences about ALSA in the early days - even though its technically possible to use open/read/write/ioctl/close on ALSA devices, to the best of my knowledge, nobody has ever written an application that does so - it has been understood from the start that access to the device is mediated by the ALSA library.

Re: ALSA and latency, and other stuff too. Maybe I should introduce myself. I'm the guy who originally wrote JACK. I wrote the original ALSA drivers for a couple of high end (pro-audio, not gaming) audio interfaces. I contributed quite a lot to some early design decisions in ALSA to make sure it would work efficiently with pro-audio hardware (since this stuff tends to be designed differently from consumer audio interfaces). I wrote Ardour (http://ardour.org/). I can tell you with absolute confidence that what you're observing about ALSA and latency is an app programming issue, not an ALSA issue.

As I wrote at the end of my original post, there are many real problems with Linux audio. Neither OSS nor ALSA represent coherent solutions to them at this time.

insane coder said...

Hi dawhead.

>Re: mixing in the kernel. You don't do this sort of thing in the kernel because Linus Says So. If Linus wasn't enough, so do all the kernel lieutenants. If that wasn't enough, so does everyone except Hannu and Dev from 4Front. This is a bit like argument by authority, except that there's no effective counter-argument :)

I'm still waiting to hear the argument. "Because" isn't a reason, it's what a 5-year old says. I see your lips moving, but no sound is coming out.


>Emulating floating point in software inside the kernel? Hah. Do you realize how utterly insane this is?

You realize I'm "insane coder"?


>You don't write device drivers to control real-time sensitive hardware and couple them to an FPU emulation. Why do you think CPU's have had FPU's (and now SSE/SSE2)?

Excellent point, so why isn't the Kernel modified to allow floating point in it?


>Re: "direct access to a device". My point is more subtle than I think you are grasping. When you do video on Unix-like systems, you don't use open/read/write/close/ioctl. You use a highly abstracted API (it could be X11, DirectFB, Quartz or even, arguably, OpenGL) that doesn't attempt to model what the app is doing in terms of device control. OSS is still quite strongly attached to the "unix" model (open/read/write/ioctl/close) even though it has started to provide higher level libraries. This was one of the most important differences about ALSA in the early days - even though its technically possible to use open/read/write/ioctl/close on ALSA devices, to the best of my knowledge, nobody has ever written an application that does so - it has been understood from the start that access to the device is mediated by the ALSA library.

So what's the problem with using it like a file?


>Re: ALSA and latency, and other stuff too. Maybe I should introduce myself. I'm the guy who originally wrote JACK. I wrote the original ALSA drivers for a couple of high end (pro-audio, not gaming) audio interfaces. I contributed quite a lot to some early design decisions in ALSA to make sure it would work efficiently with pro-audio hardware (since this stuff tends to be designed differently from consumer audio interfaces). I wrote Ardour (http://ardour.org/).

If you say so.


>I can tell you with absolute confidence that what you're observing about ALSA and latency is an app programming issue, not an ALSA issue.

Right...
So say I made an app which uses SDL or OpenAL to output sound. And I can change the back-end they use to be OSS or ALSA. When I do so I notice a difference? Who is being blamed here? The people using SDL/OpenAL? If it works with one back-end, did they screw it up?
Maybe it's the people who write the back-ends in SDL or OpenAL? If so, why aren't they being fixed?

Or maybe ALSA is just so hard to program with that I can't figure out how to write correctly to it from my app, and neither can the SDL or OpenAL people.
Maybe you're right that the back-end is great, and everything else isn't, but if no one can get to the back-end properly, then we have a problem, no?

If you really know so much about ALSA, then please fix SDL and OpenAL, and I'll be happy to test again to see who was at fault, and update on how things stand. But as it stands now, ALSA hardly seems like an option for CPU heavy games, or any tasks that are similar.

dawhead said...

The reason, which has been consistent in the Linux kernel community for more than 10 years, is that you do not put POLICY into the kernel, only mechanism. Anything that can be technically accomplished in user space belongs there, not in the kernel, where the implications of poor design, errors, and inefficiency are much more significant for everyone. As we have grown toward a better understanding of how to do things efficiently in user space, this has shifted what is considered acceptable inside the kernel. When OSS was young and ALSA was just beginning, we frankly didn't really know how to do no-added-latency mixing in user space. Now we do.

The kernel doesn't permit floating point because (a) nobody has ever managed to convince Linus that any kernel mechanisms need floating point math and (b) saving & restoring FP registers inside the kernel adds measurable costs to a kernel context switch.

Re: accessing the device like a file. I have two responses, which are essentially the same response. The first is that it's not a file; it's a streaming data processor with real-time deadlines if you want to avoid perceptual glitches. Encouraging a programming API based around the concepts that the Unix file API uses simply encourages programmers to ignore critical considerations about how the device actually works. This connects to the second response: imagine what a mess video (that is to say, the display/monitor) would be if every app tried to access the abstraction as if it were a file. No: X11, GDI (Win32) and Quartz (OS X), not to mention OpenGL and other higher-level APIs, all remove 99% of the "fileness" of the video frame buffer in favor of an abstraction that (1) better lends itself to shared access and (2) focuses the developer (or at least the GUI/widget toolkit developer) on the salient properties of the underlying hardware.

Re: ALSA, OpenAL, SDL, etc. Well, yes, most of the people who developed layers on top of ALSA didn't really understand it very well. Most people who write "audio APIs", let alone "audio apps", don't actually understand most of the issues, and they are not at fault for this. I'm not going to fix other libraries; I have my hands full with JACK and Ardour, which I've worked on for nearly 10 years now. But I can assure you, not via hand-waving but via actual working code and the dozens if not hundreds of apps that use the JACK API, that it can be done correctly.

This is where I return to CoreAudio again. CoreAudio imposes a certain programming design on any code that interacts with it. It's a very different programming model than either OSS or ALSA require (though both can be used in similar ways if the programmer wants to; this is what JACK does). But this critical difference -- forcing you to think about things in terms of a "pull" model, where the hardware will demand that you process data, rather than a "push" model, where your app processes data at whatever rate it wants -- leads to a design in which everything works together. You have to write really twisted CoreAudio code, for example, if you want to make your application fail when the "device" in use is actually another application (e.g. JACK). Contrast this with Linux, where first OSS and then ALSA continued to encourage application/library authors to take a read/write approach to device interaction, with inadequate documentation and insufficient abstraction.

It's a mess. It might get better.

insane coder said...

>The reason, which has been consistent in the Linux kernel community for more than 10 years, is that you do not put POLICY into the kernel, only mechanism. Anything that can be technically accomplished in user-space belongs there, not in the kernel

Okay, that I can understand. But these user space servers must be scheduled properly. All too often I see them fail when the program using them is using way too much CPU.

>The kernel doesn't permit floating point because (a) nobody has ever managed to convince Linus that any kernel mechanisms need floating point math and (b) saving & restoring FP registers inside the kernel adds measurable costs to a kernel context switch.

That I realize; however, if something belongs in the kernel and can't use floating point, then the obvious conclusion is to write it in a fixed point manner.

>Re: accessing the device like a file. its not a file - its a streaming data processor with real time deadlines if you want to avoid perceptual glitches.

I don't see how that prevents us from pretending it is a file.
We treat network connections as files, and they have real-time deadlines too. We even do VoIP!

>Encouraging a programming API based around the concepts that the Unix file API uses simply encourages programmers to ignore critical considerations about how the device actually works. /video discussion/

I find it hard to imagine that we can't have a file-based setup for video.
2D video is merely writing an exact number of pixels, or updating some pixels. This can easily be done with a file mechanism.
3D video which is done in software is essentially 2D video.
3D video done in hardware is a matter of creating 3D shapes, uploading textures to them, and rotating and transforming them, or whatever.
Think of your 3D environment as a directory structure, using a particular file naming scheme to access each object as needed. Uploading a texture could be writing a particular buffer to a file. Rotating a scene could be done with ioctls and more.

I don't recommend this because of the complexity, but there is no technical reason that video couldn't be accessed via a file interface.
In the case of audio, the handling is nowhere near as complex as video, certainly not for the average program's use case.

>Re ALSA, OpenAL, SDL etc. Well, yes, most of the people who developed layers on top of ALSA didn't really understand it very well. Most people who write "audio APIs" let alone "audio apps"

Yes, so you're not going to work on it, and no one else is going to work on it. We're left at square one, where developers can't make products which sound good using ALSA. So games are going to either be Windows-only, or come to Linux using OSS. This is why we have a "Sorry State of Sound in Linux". And if no one is going to fix it, it's going to remain that way. With this status quo, I wonder: will Linux ever be mainstream?

>This is where I return to CoreAudio again.
Which doesn't exist on Linux. You may advocate it, but we can't even test it on Linux.

>CoreAudio imposes a certain programming design on any code that interacts with it forcing you to think about things in terms of a "pull" model - where the hardware will demand that you process data - rather than a "push" model

This is only true if you can design your program to work that way. SDL also uses a pull model. For some programs such a forced design makes things much harder to work with. Maybe it's a necessary evil.

>Its a mess. It might get better.
I'm still waiting to see people move forward and make it better. I've been waiting for a few years now, and I still see no real progress.

Reece Dunn said...

@insane coder

Thanks for the tips. I am running on OSS4 now, and an annoying crackle and sound drop-out are gone. I suspect that this was a bug in either the ALSA Intel sound driver or the PulseAudio <-> ALSA interaction.


re: file-based vs ALSA API

To me, the file-based OSS (and file-like simple PulseAudio API) are easier to develop to. For a text-to-speech (TTS) program like eSpeak, where the TTS engine is generating audio and sending that to a callback, it is very easy to use the file-like model.

With ALSA, using a simple write-based approach, I keep getting underruns. So, to fix that, I'd have to write an insane polling loop with a buffer-request callback and underrun detection/recovery, while writing the data to an internal buffer.

This is likely to be on a thread, and since the TTS callback is on another thread, I'd need to lock access to the data. Now, I could use a producer/consumer-based model, but the potential for bugs is huge, and I want to keep the API consistent across all back-ends.

From the GStreamer docs, there does appear to be a file-like API built on top of a more complex API, but there doesn't appear to be any documentation or examples on how to access it, nor on how to write an audio source.

insane coder said...

Hi Reece.

>Thanks for the tips. I am running on OSS4 now, and an annoying crackling and sound drop out has gone.

Glad you managed to figure it out and are enjoying it.

I hope you see that OSSv4 provides better quality for the average user.

Everyone should be able to access this quality without the hassle.

Yet when you try to point this out to people, you get too many responses claiming that OSS doesn't support mixing, or that its superior quality or ease of development is a myth.

I guess you now see what I'm saying. Tell your friends.

Hannu Savolainen said...

"The reason, which has been consistent in the Linux kernel community for more than 10 years, is that you do not put POLICY into the kernel, only mechanism. Anything that can be technically accomplished in user-space belongs there, not in the kernel where the implications of poor design, error and inefficient design are much more significant for everyone."

This is correct up to a degree. It is clear that you should not do things like computer vision or speech recognition in kernel space. However, this is very clearly not the case with so-called "virtual mixing".

I really can't understand why everybody thinks virtual mixing is rocket science. It is definitely not. The virtual mixing algorithm is just a loop that takes samples from all active input buffers, sums them together, and writes the result to the output buffer. This gets repeated only (say) 48000 times per second. In reality there are some additional arithmetic operations to handle input/output volumes and peak meters. However, there is _NO_ FFT and there are no computationally expensive filters. There are no fancy data structures. The CPU time consumed by the mixing is far, far below measurable. The code is just a for loop that cannot crash the kernel.

What happens if these computations are moved from kernel to user space? What happens is that the below-measurable CPU usage gets accounted to the user-space load instead of the system load. However, the system doesn't become any faster. The computations still take the same amount of time.

Note that sample rate conversions are separated from "virtual mixing" in OSSv4. SRC is more CPU intensive than mixing. Applications can do it in user space or leave it to be done by OSS.

Hannu Savolainen said...

For QuoiD:

"Now as far as I know, if it's licensed under GPL they can't restrict commercial usage at all??? Presumably there are some internal parts unusable commercially while the rest of it is GPL... It's a very silly licensing system IMO. It just doesn't make enough sense."

The answer is that OSS is available under 4 different licenses. What you were looking at was the license for the precompiled "commercial" binaries. The source code packages for the same stuff (excluding a couple of binary-only drivers) are available for download under the GPLv2, CDDL and BSD licenses.

Hannu Savolainen said...

About floating point in the kernel: this feature can be disabled when compiling OSS. It's enabled by default because there have not been any problems caused by it during the past few years (which IMHO proves that there are no problems). Also, the CPU load is exactly the same as without FP.

Believe it or not, the reason to use floating point in audio computations is precision. This may sound strange, since all elementary programming books say that integer computations are precise and floating point is not. You may think that I have misunderstood something important, but this is not the case.

The alternative to floating point is fixed point. A common practice is to store a 24-bit sample in a 32-bit integer so that the 8 most significant bits are unused (or used as headroom during computations). The problem with fixed point is that there is no footroom at all, and the 8-bit headroom (24 dB) is far from enough.

For example, suppose you attenuate a fixed-point signal by 30 dB, which is not that much. A 30 dB attenuation means that the (originally) 24-bit sample is divided by ~1000. Let's pick 1024, which is the same as shifting the sample 10 bit positions to the right. Now you have only 14 bits of precision left, and the least significant bits are lost forever. Even if you amplify the sample by 30 dB, the 10 least significant bits will not come back. If the sample value was less than 1024, the sample drops to 0 and all information is lost.

Floating point doesn't have this problem. The single-precision FP format has 24 bits for the mantissa (which is perfect for audio), and the remaining bits are used as the exponent. If you now divide the sample by 1024, the actual sample stored in the mantissa will not change at all. Just the exponent changes, because the bits were shifted "virtually". During computations the mantissa will always store the 24 most significant bits of the sample. Some information may leak onto the floor, but the lost information is about 70 dB below the current signal level.
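The footroom argument above can be demonstrated in a few lines. This is an illustrative C sketch (the sample value and shift amounts are arbitrary), not code from OSS: the fixed-point round trip shifts bits out and cannot get them back, while the float round trip only touches the exponent.

```c
#include <stdint.h>

/* -30 dB then +30 dB in 24-bit fixed point: dividing by 1024 is a
 * 10-bit right shift, and the shifted-out bits are gone for good. */
static int32_t fixed_attenuate_restore(int32_t sample24)
{
    int32_t attenuated = sample24 >> 10;  /* ~ divide by 1024 */
    return attenuated << 10;              /* re-amplify: lost bits don't return */
}

/* The same round trip in single-precision float is exact, because
 * dividing by a power of two only adjusts the exponent; the 24-bit
 * mantissa is never touched. */
static float float_attenuate_restore(float sample)
{
    float attenuated = sample / 1024.0f;
    return attenuated * 1024.0f;
}
```

Note also that a fixed-point sample smaller than 1024 drops straight to zero after the attenuation, exactly as described above.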

Ex-Cyber said...

I still have some doubts regarding the licensing; hopefully they will be easy to address with a little clarification. In June 2007 the user "admin0" (apparently Hannu or Dev) made the following statement in a comment on Hannu's blog:

"GPL is the license of the source code of OSS. You can compile, link, modify and redistribute the source code or binaries derived from it. All this is covered by the GPL license which defines your rights as a software developer.

However when you invoke OSS you need to follow the licensing policy defined by 4Front Technologies. We as the initial developer and the copyright owner of Open Sound System we have legal rights to define the terms for use. What our licensing policy says has higher priority than whatever the GPL says."

This is confusing, to say the least. It makes it sound like the intended license is really not GPL but rather some sort of "GPL plus arbitrary restrictions applicable at runtime".

I realize that this statement is somewhat old. However, I have not seen anything more recent that clearly contradicts it, and it is so at odds with my understanding of copyright law and the GPL that I can't ignore it.

QuoiD said...

Hey Hannu,
Thank you so much for clearing that licensing misunderstanding up. Since that is the case, there really are no real licensing issues! I reread the website and realised that I had just misinterpreted it altogether, habitually looking for prebuilt packages instead of source. Apologies for my mistake, although I should point out that it's possible others may have interpreted it the same way. :)

A question though: is there any real advantage to having floating point code "inside" the kernel, as opposed to using libraries in userspace? (Other than better control of scheduling priority, which I can understand may be an extremely valid concern for scheduling latency, and thus real latency.)

I'd also like to bring up a point: future apps/games may well want to mix thousands of streams of audio (possibly already done? just add a zero or two). Should this be the ultimate task/goal of a sound server, sort of like OpenGL drivers for graphics cards, or rather of a separate library? Opinions?
If not, the concern here would seem to be how many "separate" apps running interleaved can "share" the sound system, and how/where the mixing takes place (userspace / kernelspace / hardware). It seems like a cut-and-dried problem if we can eliminate userspace scheduling latency, but what about hardware mixing? How would apps access that without running into problems?

Hardware audio mixing with a GPGPU-based system would mean that any sound/driver system would (in a sense) become a complex funnel. Software fallback is obviously necessary when hardware mixing is unavailable, though, and it seems the correct place to provide this backup system is integrated into the sound server! (Otherwise it means two different application interfaces for hardware vs. software.) Kernel vs. userspace is really only an issue if scheduling problems add latency, and until that is resolved the best bet is probably the one OSS is taking (kernel-space floating point mixing).

I'm hoping the final solution will arrive in the form of masses of cheap GPGPU-style chips, running sound servers capable of massive amounts of high-quality audio mixing. Larrabee anyone? :)

-Quoid

dawhead said...

Re: video "file" APIs... sure, I am not saying that one cannot imagine a file-API-based model for video. But guess what... no existing widely used platform offers this, and none have done so for years. It might be worth considering why this is before just assuming that everyone who has designed X Window, GDI, Quartz and many more are all just missing the boat.

Re: scheduling. Deadlines get missed because of 3 reasons (a slight simplification, but it will do). (1) they are not running in right scheduling class (SCHED_OTHER, the default POSIX/Linux scheduling class will never work for media without a lot of buffering to hide scheduling jitter) (2) excessively long code paths in the kernel (3) hardware stealing system resources (eg. PCI bus access) for too long. You don't fix any of these problems with a new audio API.

Re: network connections and "real time". Sorry, but this isn't even in the same class as audio interface deadlines. The only time network connections have deadlines close to the stuff we are talking about for audio are when they are used for ... streaming realtime audio. The rest of the time, precisely because of the potential for jitter, both endpoints buffer, which is equivalent in audio terms to adding latency. Not to say that you can't do quite well - there are quite a few systems that can do WAN/DSL realtime streaming audio now with only 30ms latency over long haul arbitrary connections. But even they tend to come with dropped packet handling heuristics, which nobody in their right mind would ever add to an audio interface driver.

Regarding me working or not working on APIs targeting games and desktop/consumer apps: I've spent 10+ years of my time, the majority of it unpaid, working more or less full time on pro-audio and music creation software for Linux and POSIX-ish platforms, all of which is GPL'ed and freely available. Please don't get on my own personal case about the fact that I don't work on APIs that target the area that interests you the most. Yes, problems won't get solved without developers. But developers are not actually the problem here. The central problem is that nobody actually knows what the solution looks like.

Several weeks ago, I wrote on a linux audio mailing list about how I see the problem - someone was complaining about it all being politics:

It's not politics. It's the lack of politics. There are no leaders with any power to enforce any decisions. There is no police authority to identify people who fail to comply with "joint decisions". There is no justice system to punish or expel those who do. This is an anarchistic meritocracy, and yes, it's harder to get system infrastructure developed in this environment than in a system like Windows or OS X where a single person can say "it shall be thus". That's good, and it's bad.

I continue to stand by that assessment, and I hope to present some ideas about this at the Linux Plumbers Conference this fall.

Theodore Tso said...

>About floating point in the kernel: this feature can be disabled when compiling OSS. It's enabled by default because there have not been any problems caused by it during the past few years (which IMHO proves that there are no problems).

Nope, that just means that people have gotten lucky. It's true that many programs don't use floating point, but it's also true that the kernel does not save floating point registers when transitioning from user context to kernel context, so if you try to use floating point in the kernel, you will corrupt the floating point registers seen by the userspace program after a system call or an interrupt handler returns.

A statement such as "just because there haven't been any observable problems means the code is bug-free", especially when it is conceptually wrong, would certainly give a competent distribution kernel engineer some serious amounts of pause about trying to support OSSv4 in a Linux distribution.

Theodore Tso said...

>But if I sell workstations with support to companies that need audio, I think it would make a lot of sense to me to provide a better audio solution than what my competitors are shipping.

How many distributions do you think are making money selling support for workstations that need audio? Go ahead.... name one.

>Also look at Ubuntu, they're not even profitable. Mark Shuttleworth seems to have made it his goal in life to take down Windows from the #1 OS position, and doesn't care how much money he flushes on the problem.

I'd suggest you talk to Mark before you assume that's either his goal or his business plan. And he is planning on moving Canonical to be at least breakeven in the next year or two.

zettberlin said...

First of all: thanks for your sane and reasonable reaction to my rather coldish post. :-)

Still I must say: OSS4 cannot solve anything in Linux audio. ALSA may have its flaws; documentation and API design are the most prominent, I suppose.
Rant about that! Criticize ALSA from your point of view as a game programmer! Hit them hard, make your case heard among the ALSA people!

But trying to replace ALSA won't solve anything; it will only increase the chaos and make the situation much worse.

Why?

You have posted this:

> Also, all the people I know who do music production, do it on Windows. Wonder why that is?

The answer is:
Linux lacks a set of stable applications and plugins that provide all the functionality a musician wishes to use on a computer. I use it either way, because I can live with some shortcomings, and in my studio work I am very happy with Ardour (which uses JACK exclusively, and no, the OSS backend of JACK does not provide what you get with ALSA or FFADO plus JACK).

And indeed, the very things that musicians find most intriguing in Linux are the things you get from the combination of jackd+ALSA with a realtime kernel.

I am also a bit puzzled by your examples of the bad performance of ALSA.
If I play Sauerbraten on my Linux boxes (and all of them run ALSA, and for games they run ALSA only), I get a BANG! that feels like 30-40 ms latency as I hit the left mouse button. I remember playing a demo of UT some 3 years ago on an ALSA-only Linux, yet I cannot remember any sound issues. And it is the same with Skype and some other VOIP apps I tested and installed for clients. So given that:

If applications A, B and D run OK with a given sound system but application C does not, is there not the slightest chance that application C is not implementing its sound support as well as A, B and D? Simply put: it may not be nice and easy to make an app heard on Linux via ALSA, but it looks like it is possible...

And it is the same with the quality issues you have mentioned. I mean, I make music with such a system, and that means, nowadays, combining 2-3 or 12 streams, some of them synthesized out of nowhere, and I never noticed the problems you have mentioned; not on my Tannoy nearfield monitors and not in any of my Sennheiser or AKG phones.

So maybe, there is another flaw in an application, that spoils the party for you?

Greg said...

I find it odd that in this article and the 2-year-old one, neither FreeBoB/FFADO nor JACK were mentioned at all. If I wanted low latency for use on a DAW (Digital Audio Workstation), I would set it up with a FireWire soundcard using FFADO, and then let JACK run on top of that.

Given that JACK can use any underlying sound architecture (ALSA, OSS, FireWire) and that it is trivially easy for a new user to learn with its GUI applications, I think Linux sound systems should go the way of providing nothing more than driver-level support of the soundcard to JACK. That way JACK can work with the smallest latency possible and provide a common interface for all applications to route their audio through.

Hannu Savolainen said...

"However when you invoke OSS you need to follow the licensing policy defined by 4Front Technologies. We as the initial developer and the copyright owner of Open Sound System we have legal rights to define the terms for use. What our licensing policy says has higher priority than whatever the GPL says."

"This is confusing, to say the least. It makes it sound like the intended license is really not GPL but rather some sort of 'GPL plus arbitrary restrictions applicable at runtime'."

This was the original "business model" I tried to promote when we released OSS as open source. However, that model was too confusing and we have since moved to the usual interpretation of GPLv2. The community web site has just not been updated, due to lack of time.

The common interpretation of the GPL is that availability of source code means that the software is "gratis" or "royalty free". Individuals/organizations who use GPL'ed software do so because they don't want to pay any royalties. Unfortunately this means that non-business-critical software products like OSS cannot be developed full time, because there is no revenue.

Hannu Savolainen said...

"A question though: is there any real advantage to having floating point code "inside" the kernel, as opposed to using libraries in userspace? (Other than better control of scheduling priority, which I can understand may be an extremely valid concern for scheduling latency, and thus real latency.)"
I agree that using floating point in the current vmix doesn't give that much benefit. The code is just slightly more readable because no additional scaling operations are required, unlike with fixed point.

OSS can be configured to use fixed point instead of floating point. However this doesn't seem to make any difference. The CPU load caused by OSS/vmix is still the same and there is no measurable difference in performance.

"I'd also like to bring up a point: future apps/games may well want to mix thousands of streams of audio (possibly already done? just add a zero or two). Should this be the ultimate task/goal of a sound server, sort of like OpenGL drivers for graphics cards, or rather of a separate library? Opinions?
If not, the concern here would seem to be how many "separate" apps running interleaved can "share" the sound system, and how/where the mixing takes place (userspace / kernelspace / hardware). It seems like a cut-and-dried problem if we can eliminate userspace scheduling latency, but what about hardware mixing? How would apps access that without running into problems?"

Virtual mixing can be bypassed by opening the device with O_EXCL flag. Vmix can also be disabled when OSS is compiled. In this way vmix doesn't conflict with applications/libraries that do all the mixing themselves.

OSS does virtual mixing in kernel (by default) because this is the best way to do it completely transparently. Applications written 15 years ago will still work with vmix just like they worked when they were developed. There is no need to develop new plugins for any special mixing libraries (there are dozens of them).

"Hardware audio mixing with a GPGPU-based system would mean that any sound/driver system would (in a sense) become a complex funnel. Software fallback is obviously necessary when hardware mixing is unavailable, though, and it seems the correct place to provide this backup system is integrated into the sound server! (Otherwise it means two different application interfaces for hardware vs. software.) Kernel vs. userspace is really only an issue if scheduling problems add latency, and until that is resolved the best bet is probably the one OSS is taking (kernel-space floating point mixing)."

The key is transparency. If mixing is done at the kernel level then it can be re-implemented to use a GPU or some DSP card. Applications will not see any difference. They can use /dev/dsp in the same way they did 15 years ago. In fact, virtual mixing looks like hardware mixing to the applications.
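The O_EXCL bypass mentioned above can be sketched in a few lines. This is an illustrative fragment, not 4Front code; it assumes an OSS-style /dev/dsp device, which may be absent or busy on any given system, so failure is handled rather than assumed away.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open the OSS device requesting exclusive access: with O_EXCL, vmix
 * is skipped and write()s go straight to the hardware, for apps or
 * libraries that do all their own mixing. Returns -1 on failure. */
int open_dsp_exclusive(void)
{
    int fd = open("/dev/dsp", O_WRONLY | O_EXCL);
    if (fd < 0)
        perror("open /dev/dsp");  /* no OSS device, or already in use */
    return fd;  /* caller write()s raw PCM frames, then close()s */
}
```

An application that is happy to share the device simply omits O_EXCL and gets transparent vmix behavior, which is the point being made here.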

Hannu Savolainen said...

Theodore Tso has left a new comment on the post "State of sound in Linux not so sorry after all":

>>About floating point in the kernel: this feature can be disabled when compiling OSS. It's enabled by default because there have not been any problems caused by it during the past few years (which IMHO proves that there are no problems).

>Nope, that just means that people have gotten lucky. It's true that many programs don't use floating point, but it's also true that the kernel does not save floating point registers when transitioning from user context to kernel context, so if you try to use floating point in the kernel, you will corrupt the floating point registers seen by the userspace program after a system call or an interrupt handler returns.

This is the valid point against using floating point in kernel.

OSS handles this by saving the floating point registers before entering vmix and restoring them afterwards. The actual mixing loop is run with all interrupts disabled. There is no risk that this fails.

The potential problem is that the computations done by vmix might trigger a floating point exception (say, division by zero). This has been taken care of by designing the algorithm so that there will be no exceptions. It is true that this could cause problems with arbitrary algorithms. However, vmix has been used by thousands of users for two years, which should be proof that the algorithm is foolproof.

"A statement such as "just because there haven't been any observable problems means the code is bug-free", especially when it is conceptually wrong, would certainly give a competent distribution kernel engineer some serious amounts of pause about trying to support OSSv4 in a Linux distribution."

For this reason floating point can be disabled when OSS is compiled for Linux distributions.

sapphirecat said...

What is most important to a user is that sound works in the first place. I'm pondering switching to Windows (and running Linux in a VM for local/testing web server) because, for the 9 years I've been a Linux user, sound has sucked. Badly.

If you install Rosegarden on Ubuntu and have it start JACK for you (which, by its whining, makes you *think* it's necessary, but in reality all it uses JACK for is voice tracks), then it'll block all your PulseAudio stuff, because /usr/bin/qjackctl is a shell script that runs pasuspend qjackctl.real "$@" or so. Also, you basically can't get MIDI without timidity or fluidsynth, even back in the day when I had emu10k1 hardware in here somewhere.

So the default state for Rosegarden+JACK is to make your system completely silent. Brilliant!

It was never any easier on Red Hat or Gentoo.

I would say then that users don't even really want a *choice* between ALSA and OSS. If you install a sound card in Windows, it just works with everything, except those old DOS games that included their own sound card drivers, and which are basically extinct. Presumably the Mac situation is similar, but without the DOS bit.

The disturbing part of OSSv4 from a developer perspective is the mention of AC3 decoding. For one, AC3 appears to be patented, which would immediately put it in the -multiverse ghetto on Ubuntu. Forgive my ignorance, as your post is not clear about its status, but at the very least AC3 would need to be completely optional.

Second, if the decoding is happening in-kernel (again, clarity is lacking), that means that any OSS AC3 bug can potentially root your box. Obviously that doesn't really apply if the sound card has accelerated AC3 decoding. I've never heard of such a thing, but I wouldn't be surprised if someone created it, given its obvious parallel to MPEG-2 and H.264 acceleration on video cards.

Anyway, if it included userspace Vorbis decoding as part of the package, that could be a selling point for OSSv4 to the average Free developer. But AC3 is a "run away screaming" sort of "feature".

dawhead said...

JACK is a system targeting pro-audio and music creation workflows. It is not aimed at desktop and/or consumer situations (although it can work surprisingly well there too). As such, it is not concerned with whether or not your desktop sound effects, skype or flash video playback works while it uses the audio interface (though the "jumping jack flash" library actually takes care of that last point quite nicely, from a technical perspective). If you're concerned with the kinds of stuff JACK was designed for, you almost certainly do not want these "random" sounds being involved with your current work. If you don't care about that kind of stuff, then you probably don't care about JACK either.

JACK is much more similar to something like ASIO, which historically did NOT cooperate well with traditional windows applications that used Microsoft's "MME" API. This has changed now that the WDM model has taken over on that platform, and both "pro-audio" apps and consumer/desktop stuff talks to the hardware via the same (highly abstracted) API. The same situation exists on OS X too, where a single API (CoreAudio) is used no matter what the target audience or "niche" the application is aiming for (games, desktop stuff, consumer media, pro-audio, music creation etc.)

My own perspective (admittedly biased towards the pro-audio/music end of things) is that both WDM and CoreAudio work because they satisfy the most demanding app requirements (pro-audio) while simultaneously enforcing the pull model on all applications (though some may use a 3rd-party library that does lots of buffering to make a "push" model work OK). On Linux, we don't have the option of "enforcing" anything, and so we have a mess where games-centric developers get their own API, pro-audio developers get their own API, desktop/consumer media apps get their own API, and so on and so forth.

insane coder said...

Hi dawhead.
>Re: video "file" APIs... sure, I am not saying that one cannot imagine a file-API-based model for video. But guess what... no existing widely used platform offers this, and none have done so for years. It might be worth considering why this is before just assuming that everyone who has designed X Window, GDI, Quartz and many more are all just missing the boat.

You might want to read what I said again, I didn't advocate a file API for video.


>Re: scheduling. Deadlines get missed because of 3 reasons (a slight simplification, but it will do). (1) they are not running in right scheduling class (SCHED_OTHER, the default POSIX/Linux scheduling class will never work for media without a lot of buffering to hide scheduling jitter) (2) excessively long code paths in the kernel (3) hardware stealing system resources (eg. PCI bus access) for too long. You don't fix any of these problems with a new audio API.

No, but you can fix that when the mixing is all in-kernel.


>But even they tend to come with dropped packet handling heuristics, which nobody in their right mind would ever add to an audio interface driver.

Sounds like a good idea to me then, it should be added!


>The central problem is that nobody actually knows what the solution looks like.

I imagine the solution would look like an API that developers can use easily without making mistakes, and not have programs which sort of work. And on top of that, the stack beneath the API would work well, have mixing, and not suffer from starvation.

>It's not politics. It's the lack of politics. There are no leaders with any power to enforce any decisions.

Yeah, maybe. Except we can't just have any biased leader step up. Whoever enforces the decision should sit down with the current solutions in a variety of situations chosen as a test bench by various communities, see how each fits each community best, then determine which is the technical way to go: whether to keep one or two of what exists, merge certain things, or come up with something new. All too often it seems the people making the decisions don't actually know enough about what they're deciding on, or only have third-hand knowledge which is biased.

insane coder said...

Hi Theodore.

>A statement such as "just because there haven't been any observable problems means the code is bug-free", especially when it is conceptually wrong, would certainly give a competent distribution kernel engineer some serious amounts of pause about trying to support OSSv4 in a Linux distribution.

In this exact same post you quoted "This feature can be disabled when compiling". Now it looks like you're looking for problems which don't exist and just want to bash OSS. If that's the case, then I have nothing further to say on the matter.


>How many distributions do you think are making money selling support for workstations that need audio? Go ahead.... name one.

Did you miss the whole "If I"?


>I'd suggest you talk to Mark before you assume that's either his goal or his business plan.

Have a look at this. In any event, if I sat down with him, rather than discuss anything so inane, I would give him a couple of demonstrations of problems certain users are seeing, propose some solutions, ask him his thoughts, and ask him to enforce them.

Hi zettberlin.
>Rant about that! Criticize ALSA from your point of view as a game programmer! Hit them hard, make your case heard among the ALSA people!

I did, in this article. I also provided solutions which don't involve ALSA, nothing hits harder than that.


>If applications A, B and D run OK with a given sound system but application C does not, is there not the slightest chance that application C is not implementing its sound support as well as A, B and D?

The issue here is one I brought up in other responses here. It depends on how demanding the game is on CPU usage. A, B, and D in your case probably use less than 50% of the CPU. C, on the other hand, uses most of the CPU. With games which demand more from the system, the sound-server back-ends tend to start providing a very bad experience.

I could also be wrong about this, but doesn't UT use the OSS API for output?

>And it is the same with the quality issues you have mentioned. I mean, I make music with such a system, and that means, nowadays, combining 2-3 or 12 streams, some of them synthesized out of nowhere, and I never noticed the problems you have mentioned; not on my Tannoy nearfield monitors and not in any of my Sennheiser or AKG phones.

And how much CPU does that process use? Look at bsnes, which I linked to above: it constantly mixes 8 audio streams internally, all of them synthesized out of nowhere, while having a monstrous effect on the CPU as it generates game data. It also requires that input+audio+video all be synced together. It's a lot more demanding on your CPU than, say, Doom 3, and gives an idea of when demanding audio and CPU constraints just fail on ALSA, and especially PulseAudio. You may call what you're doing pro audio, but from a game developer's perspective, we don't really view it as *intensive*.

Hi Greg. You may want to read the rest of the responses here.

Hi sapphirecat.
I don't believe the conversion is done in the Kernel, see what else Hannu said here.
Also see this page. Vorbis is there too, but I'm not sure what the state of it in the source code is.

sinamas said...

RE: dawhead, WDM/CoreAudio perspective

The pull model is a bad fit for some cases, like dynamically generated combined audio and video with low latency requirements (so you can't do much audio or video buffering). The pull model applies to video as well, as long as you want a constant frame rate (we don't particularly want to sacrifice video deadlines for audio deadlines). Dealing with two pull models at once doesn't seem too pleasant, does it? The pull model typically ends up having to be wrapped as a push model, which usually means more buffering.

WDM doesn't really force the pull model on you, as you can poll how much space is available in a buffer (IAudioClient::GetCurrentPadding) and write. CoreAudio seems to pull often enough that it doesn't become much of an issue. CoreAudio also provides nice things like estimation of the actual sample rate in terms of OS timers, which is helpful for adjusting the frame rate to the sample rate (or vice versa, one of them usually has to give a little for practical purposes).
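The pull-to-push wrapping described above usually boils down to a ring buffer between the application and the audio callback; the extra buffer is exactly the added latency mentioned. A minimal illustrative C sketch (all names and sizes here are made up):

```c
/* App-side push / callback-side pull, decoupled by a ring buffer.
 * wr and rd are monotonically increasing positions; wr - rd is the
 * number of queued samples. Single producer, single consumer. */
#define RING 1024
static float ring[RING];
static unsigned wr, rd;

/* App side: push up to n samples, return how many fit. */
static unsigned push(const float *src, unsigned n)
{
    unsigned free_space = RING - (wr - rd);
    if (n > free_space) n = free_space;
    for (unsigned i = 0; i < n; i++)
        ring[(wr + i) % RING] = src[i];
    wr += n;
    return n;
}

/* Callback side: pull up to n samples; a short read is an underrun
 * the callback must paper over (typically by zero-filling). */
static unsigned pull(float *dst, unsigned n)
{
    unsigned avail = wr - rd;
    if (n > avail) n = avail;
    for (unsigned i = 0; i < n; i++)
        dst[i] = ring[(rd + i) % RING];
    rd += n;
    return n;
}
```

A production version would need atomic or lock-protected positions for cross-thread use; the point here is only that the ring buffer is where the push model's extra latency lives.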

zettberlin said...

> And how much CPU does that process use?

70-80% is common; 90% and more usually produces xruns, but in no situation do I get any distortion. It is either working fine or breaking down; no in-betweens.

I do not talk about simply playing some pre-produced samples. I talk about recording 8 tracks simultaneously while 40+ other tracks are being played, with some of the recorded tracks being synthesized as they are recorded. All at a 96 kHz sample rate and 32-bit float resolution. And with 15-20 independent processor plugins running.

> Look at bsnes, which I linked to above: it constantly mixes 8 audio streams internally, all of them synthesized out of nowhere, while having a monstrous effect on the CPU as it generates game data. It also requires that input+audio+video all be synced together. It's a lot more demanding on your CPU than, say, Doom 3, and gives an idea of when demanding audio and CPU constraints just fail on ALSA, and especially PulseAudio.

So games produce sounds all new - and not out of samples? And I can change several parameters of these producing synthesizer-maths by turning knobs on MIDI-controllers? All with less than 10ms latency?

> You may call what you're doing pro audio, but from a game developer's perspective, we don't really view it as *intensive*.

A single really powerful synthesizer with less than 1 MB download size can be more demanding than any game you ever had running.
Give me one of your 4K-dollars gamer-boxes with a pro soundcard built in; I can run over its limits easily with a single synth like ZynAddSubFX on any operating system.

The difference is the flexibility. Software for music production and sound processing allows the user to build up structures of unlimited complexity. And it has to render all this madness in a way that gives the player the feeling that the software responds immediately to any demand he or she throws at the maths, using a controller interface that has more knobs and switches than any full-fledged gamer interface.
A steering wheel, some 5 on/off switches and 3 pedals? Ha! I've got a keyboard with 25 velocity-sensitive keys and 12 knobs, sliders and wheels, and this little creature is nothing compared to somewhat more sophisticated MIDI controllers, such as drums with layered zones sensitive to the user's touch, sending MIDI commands and parameters in wild combinations no programmer can foresee, to mathematical constructs no programmer can foresee either.

Guess why there are specialized DSP cards for audio workstations built exclusively to render 2-3 applications for processing something as "simple" as room simulation...

insane coder said...

Hi sinamas.

Yes, you described what I was trying to get at perfectly. Using SDL and its pull model, despite SDL's simplicity, it took me a while to figure out how to get my program to work with all these pull models. Even then I wasn't happy about it, as I had to write too much on top to deal with buffering everything when I don't even necessarily have samples to buffer yet. Push models worked much better, were less of a pain, and I'm getting fewer gaps in the audio when pushing.

Hi zettberlin.

What you're describing seems to be a very specialized case with a lot of specialized hardware, not at all something geared towards the home user in any way. It's something very few potential Linux users would use.

>So games produce sounds all new - and not out of samples?

That is correct.

>And I can change several parameters of these producing synthesizer-maths

Yes.

>by turning knobs on MIDI-controllers?

No, by using your keyboard/gamepad to interact with the game.

>All with less than 10ms latency?

If it were <10ms latency all the time, do you think I would be saying we have a sorry state of sound in Linux? I mention in my article that in some cases we see latency measured in seconds, and that's unacceptable.

>Give me one of your 4K-dollars gamer-boxes

I don't have any $4000 boxes. The most I ever spent on a computer was $2500, and that was years ago when computers were way more expensive than today; even then, that machine wasn't close to top of the line. I spent ~$700 on my last computer, which I built to be top of the line for my needs.

zettberlin said...

Hi Insane Coder

> What you're describing seems to be a very specialized case with a lot of specialized hardware, not at all something geared towards the home user in any way.

80% of those who make music with their computers use such scenarios frequently. To impose such a load on a music system, you don't need a MIDI keyboard, by the way. You can program even crazier realtime load by using the automation curves in Ardour. In the mixing phase you often want more than 30 parameters of 20 FX plugins to be changed at the same time.
Ardour does this via JACK. LMMS can do even more, since it supports synth plugins as well... and it works best with its native ALSA interface.
You can do that with a MIDI keyboard (available for about EUR 80) or you can use the automation, so in the end anybody can do this the first time they test a system, to find out whether it is usable for music production.

>>So games produce sounds all new - and not out of samples?
>That is correct.

You would synthesize a thunderstorm or a realistic impact noise from scratch instead of using a sample player?

If your game can do such tricks then it is a hell of a softsynth ;-)

Whatever the case: sound software can take any system to its limits easily, both in terms of bandwidth and in terms of CPU load. A professionally oriented softsynth can be programmed to produce enough CPU load to freeze a quad-core with 8 GB of RAM even if started without a GUI. And this is not a bug: you simply have the opportunity to combine so many generators, filters and FX that the maths become quite demanding and, ahem, intense...

If OSS4 cannot do all this, it would impose a regression.
To change an established system, that works for many, many usage-scenarios (including games like UT, that works perfectly well with Alsa's OSS-emulation) is quite a hard decision - if the change brings a regression in a field that is commonly supported on both Windows and Mac OS, I would avoid even discussing a switch...

dawhead said...

@sinamas: the issue with push/pull models is very simple. you can't get minimal latency reliably with a push model, because you want the apps to generate audio very close to the time it will be played, which is more or less the opposite of how the push model is defined. what you can easily do, however, is to layer a push model on top of a pull model by adding buffering (which also means adding latency).

ergo: we have the pull model - which is best suited for realtime, low latency requirements - and then layered above it we can have a push model for apps that need it. the other way around doesn't work anywhere nearly as well, if at all.

this is why ASIO, WDM, CoreAudio, JACK and even PortAudio are all based on a pull model, and why game-centric libraries like SDL and OpenAL etc. are all based on the push model. there is no need for any conflict here, as long as the push model crew don't try to insist that their design propagates all the way into the lowest layers of an audio stack.
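The layering dawhead describes, a push API sitting on top of a pull model, can be sketched with a toy simulation (illustrative Python with invented names, not a real audio API). The pull device invokes a callback once per block; the push adapter absorbs application writes into a FIFO, and whatever sits in that FIFO when a block is pulled is exactly the latency the extra layer adds:

```python
from collections import deque

BLOCK = 64  # frames per hardware block (example figure)

class PullDevice:
    """Toy low-level pull API: the 'hardware' invokes the callback
    once per block, just before that block would be played."""
    def __init__(self, callback):
        self.callback = callback
        self.played = []

    def run(self, blocks):
        for _ in range(blocks):
            self.played.append(self.callback(BLOCK))

class PushAdapter:
    """Push API layered on top of pull: the app writes whenever it
    likes and a FIFO absorbs the timing mismatch."""
    def __init__(self):
        self.fifo = deque()

    def push(self, samples):
        """App-facing side: write at any time."""
        self.fifo.extend(samples)

    def pull(self, nframes):
        """Device-facing callback: drain one block, or underrun."""
        if len(self.fifo) < nframes:
            return [0] * nframes        # underrun: emit silence
        return [self.fifo.popleft() for _ in range(nframes)]

adapter = PushAdapter()
adapter.push(list(range(3 * BLOCK)))    # app queued 3 blocks ahead
device = PullDevice(adapter.pull)
device.run(4)                           # 4th block finds the FIFO empty
added_latency_frames = 3 * BLOCK        # the queued data is added latency
```

The three queued blocks play back in order and the fourth pull underruns to silence; the reverse layering (pull on top of push) has no equivalent place to hide the buffering, which is the asymmetry dawhead points out.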

dawhead said...

@insane coder ... you quoted me saying:

Re: scheduling. Deadlines get missed because of 3 reasons (a slight simplification, but it will do). (1) they are not running in right scheduling class (SCHED_OTHER, the default POSIX/Linux scheduling class will never work for media without a lot of buffering to hide scheduling jitter) (2) excessively long code paths in the kernel (3) hardware stealing system resources (eg. PCI bus access) for too long. You don't fix any of these problems with a new audio API.

and then added:


No, but you can fix that when the mixing is all in-kernel.


I'm sorry, but this is just wrong. If your application misses deadlines due to scheduling policies, excessively long code paths or mis-designed hardware, how will "mixing in the kernel" enable it to catch up? It failed to read/write data in the correct timeframe - this has nothing to do with mixing. The only case it applies to is if the application in question is the "mixer" (e.g. a user-space sound server). However, we already know that with SCHED_FIFO or SCHED_RR scheduling and a correct kernel, this doesn't happen. Ergo - there is no reason to mix in kernel space. It doesn't reduce latency, and it doesn't solve any outstanding problems with scheduling deadlines.

sinamas said...

@dawhead:
You won't get any lower latency using the pull model than you do polling read/write cursors in a loop that wakes up every time the cursors change. The simple pull model means that if you don't have the data ready at pull time you underrun. This means that you underrun at an earlier point than you would if writing directly to whatever buffer the pull implementation ends up writing to. Now we need to buffer on top of this new underrun point instead of the underlying one. As sample generation is affected by other timing constraints, we can't just generate more samples at pull time without consequences.

We also typically want enough buffering to determine if we're close to underrun because the OS didn't schedule us for some reason, so that we can do an unfortunate rescue operation like skipping a video frame and not waiting for user input. All this isn't too much of an issue as long as the pull implementation only does very little buffering and pulls often (like CoreAudio has done in my experience), but it's clearly suboptimal if the pull implementation simply does the buffer writing for us, and there's nothing about this case that makes it any less real-time than other soft real-time audio applications.

Like I said, WDM supports a push model through the WASAPI IAudioClient and IAudioRenderClient interfaces. SDL actually uses a pull model, and it is troublesome for this case because it easily underruns unless it uses a big buffer, and asks for the whole buffer at a time. OpenAL isn't too well suited either since it doesn't give accurate buffer status information.

In case you're curious, this is an emulator, not a game.

dawhead said...

@sinamas: You won't get any lower latency using the pull model than you do polling read/write cursors in a loop that wakes up every time the cursors change

That is the definition of a pull model. In a push model, the application simply pushes data to the API as it feels like it, and assumes that buffering and whatever else will keep things running.

Thanks for the clarification on SDL.

insane coder said...

Hi zettberlin.

>You would synthesize a thunderstorm or a realistic impact noise from scratch instead of using a sample player?

Yes!

In the program I pointed out, the engine doesn't know what kind of sound to synthesize in advance and has no archive of samples; the sounds must be generated on the fly.

>If OSS4 cannot do all this, it would impose a regression.

It handles the scenarios I've tested it in better than ALSA does.
Perhaps in certain caseloads which you describe which I hope you can agree are the minority of users (most users do not do their own sound mixing, although most home users do play games), then maybe ALSA currently has better hardware support, I don't know, OSS4 defaults to not compiling in MIDI support. But that's why my article advocated giving the user the choice of whether to use OSS4 or ALSA. I personally believe that the average home user's distro should default to OSS4, while a distro geared towards the music mixing your describing can very well default to ALSA.

>To change an established system, that works for many, many usage-scenarios (including games like UT, that works perfectly well with Alsa's OSS-emulation)

So if UT does indeed use OSS, and you're playing it with ALSA, then you don't get ANY software mixing. In that case it doesn't work perfectly well with ALSA, because if you have a server running in the background, or some program is left open holding a sound handle, UT won't output any sound; or, while you're in UT, you won't get alarm notifications from your day planner that you have to go somewhere.

Hi dawhead.

>I'm sorry, but this is just wrong. If your application misses deadlines due to scheduling policies

I didn't say application, I'm referring to the sound server, which all too often starves. They don't have an issue of starving when inside the kernel.

It also reduces latency, because the server starving causes output to take longer.

dawhead said...

@insane coder: "I didn't say application, I'm referring to the sound server, which all too often starves. They don't have an issue of starving when inside the kernel."

actually, you didn't make it clear which app was starving. but anyway, this is still a non-issue. JACK is a sound server that (a) adds absolutely no latency to the audio signal path (it does burn a few CPU cycles that might otherwise be available) and (b) does not starve on any properly configured system. i can run a 4-way parallel compile on my system, totally maxing out the CPU and the disk I/O subsystems for 30 minutes at a time, and JACK will never miss a deadline. on my particular hardware, that works down to about 128 frames @ 48kHz. other people have systems where this works down to 32 frames. and this is not because of ALSA or OSS: its just because JACK was written to get this kind of thing right. unfortunately, on a misconfigured system there is nothing you can do, and if the system is bad enough (i.e. has hardware with PCI bus hogging and/or interrupt handlers that disable interrupts for too long) then even an in-kernel solution will still fail some of the time.

ildella said...

Today I tried (again...) Ekiga 3.2 on Ubuntu 9.04. It worked fine with SIP, but I have latency on the mic... I received the other guy perfectly, but my side is lagged by about one second. The Ekiga wiki says that PulseAudio gives latency problems due to some ALSA bug... I hope it's true that the latest kernel/ALSA/PulseAudio combination fixes this problem.

Ericounet said...

hi,

You wrote that the lack of kernel software mixing is a problem with ALSA. I have two audio cards (M-Audio and SB Live) and both do hardware mixing, so I don't need any software mixing.
So the ALSA team did it right: you can add software mixing if needed, and avoid it when hardware mixing is available.

Or maybe, there is something I didn't understand right?

Valent said...

Here is what Lennart Poettering, Pulse Audio Guru had to say about this article:

Nah, this is a complete and utter bullshit story. Slashdot just proved
again that it is full of nonsense. Gah. Disgusting.

I don't think that this deserves a real response. I mean really, this smells more like astroturfing from 4front, with all that OSS4 fanboyism.

This guy is just some lame FUD blogger, not a technical guy who does any real work, knows the technical details, works with the community and gets his stuff into the kernel or the distributions.

Would be good if Slashdot would verify that the folks whose story they
post actually know what they are talking about. Because this dude
obviously hasn't. But I guess Slashdot is not the New York Times and
asking some actual respected Linux developers or even just
linux-audio-devel before publishing such FUD stories would be asking for
too much.

That famous Adobe jungle picture that was posted in 2007 was grossly misleading already, and it still is. At least arts, nas, esd, oss were obsolete back then already, and mentioning almost unknown niche systems such as Allegro or ClanLib doesn't make it any better.

What I have to say about the situation of Linux audio APIs I posted here:

http://0pointer.de/blog/projects/guide-to-sound-apis.html

If you care enough about Slashdot, then try to get them to bring a story about that blog story, even if it is already from last year. As a change from their usual stories this one, as I dare to say, would be written by someone who has at least a bit of insight into what's really going on. ;-)

Lennart

segedunum said...

You know, I really, really worry about Lennart more than any developer I have yet seen. Once again he goes on the offensive but addresses nothing of what is written in the article. He keeps posting back to that rather inadequate article he did about the state of audio on Linux where he bludgeons the need for PulseAudio in. I cannot believe the mental gymnastics people go to to justify it, and no, per application volumes and mixing is not enough.

It is totally obvious if you look at the architecture that if you put in layers between the application and the sound card you will add latency. Jack at least seems to have been made with a particular use case and set of benchmarks in mind. When it does processing as Pulse does, as well as various other things Lennart deemed necessary, then it will add *a lot* of latency and even worse the latency will not be predictable at all. You have to be incredibly stupid or totally blind not to understand that. Having been through the experience I will *never* make a Skype call with Pulse in the middle ever again.

In addition, we've got all sorts of silly side-issues such as how one volume slider should be enough for anyone in Fedora. Want to adjust any other volume levels, and heaven forbid, line-in? Tough. It's not a use case for Lennart. Keep in mind that there's nothing unusual in this. That's all we're talking about here. Using the volume controls you have always been able to use on any system:

If you want to do weird stuff, use weird tools. Don't expect us to support all the exotic use cases minds could come up with to support in a single simple UI.

http://lwn.net/Articles/330684/

At the moment I'm involved in a bit of a company project to create a Linux desktop environment for our Rails developers to work with. I've even had a situation with OpenSuse 11.1 in testing where no sound comes out. I uninstall all the PulseAudio packages and it starts working. It's going to be cheaper just to get Macs with this brain damage going on.

dawhead said...

It is not true that adding layers between an application and the hardware automatically adds latency. The only thing you can say for certain is that adding layers takes away some processor cycles and may also change memory/cache behaviour in ways that affect performance. The number of cycles is likely to be small for any sanely designed system, especially compared to the number available. The memory/cache behaviour can even be improved a little by additional layers, though this could only be determined on a case-by-case basis.

Latency is only added when delays to the signal pathway are introduced that exceed the "block size" used by the hardware to process audio.

Steve Fink said...

Err... did Lennart say you could post his response here? That smells like a private email. (And I see nothing wrong with its tone if it *is* private; if it was intended to be public, then he's being an ass.)

Reece Dunn said...

Before Ubuntu switched to PulseAudio, audio was good (aside from the oss3 single-device issue).

Since Ubuntu switched, I have had nothing but problems with audio. Some of this may be due to the kernel configuration that Ubuntu is using (see the pulseaudio mailing list). Some of it may be due to PulseAudio pushing the alsa drivers and exposing bugs in the drivers, alsa-libs and pulseaudio.

Having moved over to oss4 I have only had two issues:
1. some audio scratchiness when using totem -- not investigated as I don't really use it;
2. switching between normal and full screen for some applications (noticed it with the Einstein game -- SDL?) causes sound to stop and skip, as if it is muted. This only happens about 75% of the time.

This is far, far better than what I had previously.

Ideally, I'd like to see Linux distributions offer the choice of audio setup:
1. alsa and oss3
2. oss4 -- recommended for ordinary users
3. PulseAudio
4. Jack -- recommended for audio professionals
5. Phonon? -- not sure if this fits with the APIs like ALSA and Jack, or is more like SDL, and other higher-level APIs

Then, the other APIs like SDL and GStreamer will be available on top of the chosen setup.

The hardest part of this would be in the different audio setups for different applications/APIs.

Hannu Savolainen said...

>dawhead said:

"It is not true that adding layers between an application and the hardware automatically adds latency. The only thing you can say for certain is that adding layers takes away some processor cycles and may also change memory/cache behaviour in ways that affect performance. The number of cycles is likely to be small for any sanely designed system, especially compared to the number available. The memory/cache behaviour can even be improved a little by additional layers, though this could only be determined on a case-by-case basis."

"Latency is only added when delays to the signal pathway are introduced that exceed the "block size" used by the hardware to process audio."

A latency of (say) 1 ms means that all the processing required to produce a new 1 ms block of audio (be it called a fragment, period or something else) must be complete before the previous block has been consumed by the device. This requirement must be met every single time, or an underrun will occur. This is a very simple requirement and it should be obvious to everybody.

Having a processing path or pipeline makes things slightly more complicated. No latency will be added if the whole pipeline gets executed within the 1 ms window. In addition, the pipeline must be executed in the right order. If any of the layers/stages gets executed before the earlier ones, then it will use old data. This will cause clicks in the output, and the output will probably stay delayed by 1 ms after that.

For this reason I'm a bit suspicious about sound architectures where different layers/stages are handled by independent processes/threads. Is there really a way to guarantee that all the stages will get executed in the right order? What guarantees that all the actual client tasks producing audio will get executed before it's too late?

OSSv4 does mixing in the kernel. The mixing loop is part of the audio interrupt handler, which guarantees that it will get executed immediately when the new (1 ms) processing period starts. The mixing loop will also wake up all the client tasks immediately (at the beginning of the processing window). They have the maximum amount of time left to produce the next block of audio in time.
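The deadline arithmetic behind this can be sketched in a few lines (illustrative Python, not OSS code; the rate and timings are example figures):

```python
RATE = 48000      # frames per second (example figure)
BLOCK_MS = 1.0    # the 1 ms fragment/period discussed above

frames_per_block = int(RATE * BLOCK_MS / 1000)   # frames to render per window

def misses_deadline(wakeup_delay_ms, processing_ms, window_ms=BLOCK_MS):
    """The client must be woken *and* finish rendering inside one
    block period, every period, or the device underruns."""
    return wakeup_delay_ms + processing_ms > window_ms

# Woken right at the start of the window (the vmix claim): headroom left.
on_time = misses_deadline(0.0, 0.6)    # False: 0.6 ms fits in 1 ms
# Same workload, woken 0.5 ms late by an asynchronous timer: underrun.
too_late = misses_deadline(0.5, 0.6)   # True: 1.1 ms > 1 ms
```

The point of waking clients at the start of the window is visible in the two calls: the workload is identical, and only the wake-up delay decides whether the deadline is met.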

dawhead said...

Hannu, lets go over this again (again).

The hardware configuration of the audio interface creates latency because all PCI/USB/Firewire devices do block-structured processing. There is no sane way to get around this basic fact, and it means that however you configure the device, there will be a fixed minimal delay between data being written to the device and it emerging from some physical connector (and vice versa).

This fixed h/w-induced latency leads to two further questions we can ask:

1) do any of the software layers between an application and the hardware add anything to this latency?

2) can the CPU/OS/applications meet the deadlines that the h/w configuration imply?

With regards (1), the answer is that there is no inherent reason for additional layers to add latency, but they may choose to for various reasons. Many audio APIs (e.g. PortAudio, JACK, CoreAudio and others) add nothing. Others (e.g. gstreamer, phonon, various win32 APIs and other) deliberately add buffering, typically for one of two reasons: (a) to allow the application to use block sizes that are not related to the hardware configuration (b) to deal with some of the issues relating to question 2. The key point is this: there is no inherent reason why audio apps that have N layers of software between them and the hardware should have latencies larger than the hardware itself. The only limitation is that if they actually need close to the full block size to process audio I/O, then the CPU cycles used by the various intermediate layers may be an issue.

With regards to question 2, I still fail to understand why after all these years you cannot see the success of RT scheduling on Linux. Your kernel-side mixer is still subject to all the same glitches caused by badly behaved interrupt handlers and devices that hog the bus as any user-space code. Mixing in the kernel doesn't save you from having to context switch back and forth between the kernel and potentially non-RT userspace code. Arguably the kernel side mixer makes a few things a bit easier, but RT scheduling even in the mainstream kernel is now very good, and in the RT patched versions, it is good enough that there can really be no argument that kernel-side mixing is necessary, let alone that user space code cannot meet RT deadlines. And audio is far from the only application domain to realize this - the embedded systems niche has many examples of people with more demanding scheduling than even pro-audio, running in user-space with RT scheduling. Why do you continue to believe that you have to put code into the kernel to meet deadlines?

Hannu Savolainen said...

>dawhead wrote:

"1) do any of the software layers between an application and the hardware add anything to this latency? "

Not necessarily, if the processing done by all the layers doesn't take too long. The question is how do you make sure that everything gets computed in the right order and within the available time window?

"2) can the CPU/OS/applications meet the deadlines that the h/w configuration imply?"

This depends both on the CPU and the applications. The time the applications need to produce a new block of samples must not be more than the available CPU can give. In addition the computations must start in the beginning of the time slot. If they get fired too late, the result will be a timeout even if the actual computations themselves don't take too long. This is a very common problem when applications/layers use a timer that is asynchronous with the audio stream. Such a timer may fire when there is just (say) 1% of the timing window left. If the processing takes any longer than that, there will be a dropout.

It doesn't matter if the intermediate layers (PortAudio, JACK or whatever) take just microseconds or nanoseconds to execute. If the result is not ready slightly before it's required, then an underrun will occur. The problem is not the CPU time required by the software layers or the sound subsystem (which is probably measured in microseconds). The problem is that the processing must start early enough to get completed before the time runs out. If the applications/layers use the wrong timing approach, the error may be milliseconds, depending on the precision of the timer.

RT scheduling makes things slightly easier. However, the actual problem does not get solved. All the first-level applications that produce the actual audio must get fired before the next stage does the mixing. And this stage must finish before the subsequent stages run. Finally, the last stage must finish its processing before the time window enforced by the device ends. This could be possible if all the layers negotiated their firing times. However, what happens if some high-priority system task decides to execute in the middle of the chain?

In the OSSv4 vmix model there is just one final mixing stage, which gets executed at interrupt priority, above any application processes (be they high-priority RT ones or ordinary processes). The mixing will always get executed well before the device runs out of data. The applications will then have the maximum amount of time to produce more output. If there is no intermediate mixing involved, then all the processing done by the applications can run without depending on any timers.
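The ordering hazard described above, a mixing stage that fires before its producers have rendered the current period, can be shown with a tiny simulation (illustrative Python, not vmix code; blocks are tagged with the period number so a stale mix is visible):

```python
# Two-stage pipeline: clients render a block, then a mixer sums them.

client_buffers = {"a": [0], "b": [0]}   # last block each client rendered

def render(period):
    """Each client overwrites its buffer with data for `period`."""
    for name in client_buffers:
        client_buffers[name] = [period]

def mix():
    """Sum the client buffers sample by sample."""
    return [sum(samples) for samples in zip(*client_buffers.values())]

render(1)
in_order = mix()        # clients ran first: mixes period-1 data -> [2]

# Period 2: the mixer fires before the clients are rescheduled...
out_of_order = mix()    # ...so it re-mixes period-1 data, not period-2
render(2)               # the fresh data arrives too late for this window
```

With correct ordering each mix sees the current period; with the wrong ordering the mixer silently re-uses the previous period's data, which is the 1 ms delay and clicking described above.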

dawhead said...

The question is how do you make sure that everything gets computed in the right order and within the available time window?

By using the scheduling facilities that have been developed for this exact purpose.

In addition the computations must start in the beginning of the time slot.

so .. an interrupt handler has to wake up a user-space application, and quickly. wow

hannu, have you actually tried to understand JACK's design? do you understand that at least with an RT kernel, there are no high priority system tasks that can interfere. the only things that can run are (a) the kernel threads responsible for interrupts that have higher priority than the current task (ie. system timer and audio interrupt) or an even higher priority RT task. non-RT tasks don't get a chance.

every single issue you describe is something that was more or less solved by JACK years ago. your description of the problems reads like something from the late 90's, not a decade later. there is nothing in your kernel mixing model that even attempts to make sure that applications supplying data to the device will run on time. and you write about all the layers negotiating their firing times, when this is the very opposite of what is required: deterministic, primarily serialized execution of a sequence of tasks. and your continued advocacy of the unix file-based API continues to encourage developers to write applications that completely ignore timing deadlines - which is OK as long as you add lots of buffering to make up for it. In short, I simply do not understand how you can possibly describe the "problem" in the way you do, or why you think that kernel mixing is a solution.

Kernel mixing is a solution to one detail of the linux audio mess in my opinion. It means that there is no way "around" it - no secret backdoor to access the device. However, I continue to believe that the design is unnecessary and makes it all but impossible to ever be accepted into the kernel.

sinamas said...

@dawhead: "That is the definition of a pull model."

Only if it pulls _every_ time the cursors change.

AFAIK hardware often allows reading the play cursor (and writing more data to the ring buffer) at a higher granularity than the period interrupt fires. Pull implementations usually only pull at period interrupts, and if the period interrupt could be configured to fire at each audio frame most probably wouldn't because of the overhead. Thus the pull model often ends up with less flexibility when constrained by other deadlines, or in general to use for instance hrtimers or busy waiting rather than the period interrupts, compared to a push model that provides information on how far we are from underrun/overflow.

Hannu Savolainen said...

More about asynchronous timing. It looks like many audio applications check if they can write a given amount of audio data to the device without blocking. If this is not possible, they compute how long to wait and call nanosleep() (or whatever) to wait until they can perform the write. This is potentially dangerous for several reasons:

1) The definition of nanosleep() says that the function will sleep _AT LEAST_ the requested amount of time. If the application has requested a sleep of 0.9 ms (900 us), then nanosleep() will round this amount up to the next system tick. If the system HZ is 1000, then nanosleep() will wait until at least 1 ms has elapsed. This will mean a delay between just over 0.9 ms and 2 ms (depending on the moment when the call was made), which in turn guarantees that the wait will take too long.

2) The second problem is that the system timer (be it as real-time as possible) is asynchronous with the audio device. The system timer tick may fire at a moment that is near the very end of the time window defined by the device. Even if the system timer fires at exactly the moment requested, the application may have no time left to complete all of its processing.

3) There may be other tasks running at higher priority than any RT task. Even if the problems caused by 1) and 2) don't cause dropouts, these high-priority tasks may.

4) If all the underlying layers of the audio pipeline use RT timers, then what guarantees that they will get executed in the right order?
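Point 1) is easy to demonstrate numerically. The sketch below (illustrative Python; a real kernel rounds in jiffies, this just models the ceiling) shows how a relative 0.9 ms sleep on a HZ=1000 system lands on a tick boundary, and how badly it can overshoot depending on when it is issued:

```python
import math

TICK_MS = 1.0  # timer tick on a HZ=1000 kernel

def relative_sleep_wakeup(now_ms, requested_ms, tick=TICK_MS):
    """Sleep at least `requested_ms` starting from `now_ms`, but wake
    only on a tick boundary (the "at least" rounding described above)."""
    return math.ceil((now_ms + requested_ms) / tick) * tick

# Issued just after the period starts: wakes at 1.0 ms, barely in time.
early = relative_sleep_wakeup(0.05, 0.9)
# Issued late in the period: wakes at 2.0 ms, a whole window too late.
late = relative_sleep_wakeup(0.95, 0.9)
```

The same 0.9 ms request wakes anywhere from one to two ticks later depending purely on its phase relative to the timer, which is why relative sleeps against an asynchronous clock cannot hit audio deadlines reliably.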

There has been discussion about pull vs. push models, with pull considered the best choice. However, a true pull is very complicated in practice. The pull event must be communicated down to the very lowest level of the applications that provide the actual audio data. Upper levels have to wait until all the lower levels have run. Then a "done" signal must be delivered upwards, and the upper level needs to wait for all of them. All applications must be able to do their processing within the time window; otherwise the output of all the applications will suffer from dropouts. Isn't this a bit inefficient?

The OSSv4 approach is "parallel push". At the moment when all the applications can produce new data, a wake-up (push) signal gets broadcast to all of them. Since the applications don't do any mixing, they don't need to synchronize with any upper mixing layers; they just push/write more data to the device. The applications can use any number of libraries in their signal path. These libraries don't do any mixing between applications, so they can be executed in the context of the application itself. There is no need to use additional (asynchronous) timers between the library layers. They just feed the data upwards to the layer that writes it to the vmix device.

The problems start only when all the tasks to be run (be they audio or not) require more CPU cycles than are available in the system.

Note that a CPU load of exactly 100.000...% means that the processor is probably overloaded, since 100% is the upper bound: even if the system required 1000 times more CPU cycles than available, the load figure would not go above 100%. In single-task systems the load can go up to 99.999...% without problems. However, in multitasking systems, loads much higher than 80% are likely to indicate situations where instantaneous load peaks exceed the CPU's capacity.

Hannu Savolainen said...

>dawhead said:

""The question is how do you make sure that everything gets computed in the right order and within the available time window?""

"By using the scheduling facilities that have been developed for this exact purpose."

What is the cost? You will have to do communication between the layers to ensure that they fire in the right order. If you take any independently developed audio software layers then how many of them do this kind of mutual co-operation?

""In addition the computations must start in the beginning of the time slot.""

"so .. an interrupt handler has to wake up a user-space application, and quickly. wow"

Exactly. OSSv4 is able to let all the applications wake as early as possible. It's up to the system how fast they actually wake up. The point is that no other approach will be able to wake them up earlier than OSSv4.

"hannu, have you actually tried to understand JACK's design? do you understand that at least with an RT kernel, there are no high priority system tasks that can interfere. the only things that can run are (a) the kernel threads responsible for interrupts that have higher priority than the current task (ie. system timer and audio interrupt) or an even higher priority RT task. non-RT tasks don't get a chance."

There are no higher priority tasks as long as you run only JACK. However what happens if there are?

Actually it doesn't matter if the audio tasks have higher priority than the others. Even if JACK is the only task and it consumes only 1% of the CPU time, it may fail. What matters is just whether JACK can get its processing done before it's too late. It must be guaranteed that JACK gets a chance to run when it has the maximum amount of time left to do its computations. Only the way OSSv4 wakes up the client applications guarantees that.

Hannu Savolainen said...

sinamas said:

"AFAIK hardware often allows reading the play cursor (and writing more data to the ring buffer) at a higher granularity than the period interrupt fires. Pull implementations usually only pull at period interrupts, and if the period interrupt could be configured to fire at each audio frame most probably wouldn't because of the overhead. Thus the pull model often ends up with less flexibility when constrained by other deadlines, or in general to use for instance hrtimers or busy waiting rather than the period interrupts, compared to a push model that provides information on how far we are from underrun/overflow. "

This is partially true. It's technically possible to read the HW cursor (DMA pointer) at any time. This is not the problem. The problem is how to get the application to read the cursor at the right moment. It must get done far (enough) before the buffer gets emptied. There are two approaches to do that:

1) Use synchronous audio interrupt timing (blocking writes on OSSv4 devices). The application will resume after Tright+Tsystem_latency.

2) Use asynchronous RT timers, where the application will resume after Tright+Ttimer_error+Tsystem_latency.

The difference is Ttimer_error. It can be zero, but usually it depends on the time difference between the occurrences of audio interrupts and the system timer interrupts. The error can be larger than the length of the available audio processing time window.

dawhead said...

@hannu, in the post that begins "More about asynchronous timing..."

... there are so many mistakes here i just don't know where to begin. no application using a pull model API ever calls any of the sleep(2) related system calls in its audio i/o thread. that is just completely incorrect design, and incidentally, a design that has been inadvertently promoted by the use of file-based unix i/o APIs for audio.

your comments on the system timer reveal a complete ignorance of how CoreAudio works. apple deliberately uses the system timer in tandem with a DLL driven by the audio interface. they have come up with an elegant, sophisticated design that completely eliminates the issues caused by the presence of two clocks (the sample and system clocks). there is no reason that linux/*nix audio APIs could not do the same, they just don't. well actually, pulseaudio does, but it's at too high a level in the stack to really get the job done correctly.

There may be other applications running at higher priority than any RT tasks ... do you not understand POSIX scheduling classes? Anything in SCHED_OTHER (that is, all kernel threads except for interrupt handlers, and all user-space threads except those granted !SCHED_OTHER) has lower absolute priority than any RT task, and will NEVER execute if an RT-scheduled thread is ready to run.

If all the underlying layers of the audio pipeline use RT timers then what guarantees that they will get executed in the right order? .... more complete lack of understanding of how this all works. You don't use RT timers. In fact, there are no RT timers other than some accidentally named stuff in the POSIX API that actually has nothing to do with "real time" in any conventional sense.

Exactly. OSSv4 is able to let all the applications wake as early as possible. It's up to the system how fast they actually wake up. The point is that no other approach will be able to wake them up earlier than OSSv4. ... this is nonsense. where do you think ALSA wakes up an application waiting on a file descriptor corresponding to an audio device? the only possible difference is whether you do it in the bottom or upper half of the interrupt handler, and if you do it in the bottom half, you've contravened kernel design policy and again contributed to the reasons why OSS is not acceptable to the kernel community. ALSA wakes up any application from the upper half of the interrupt handler, and if the application thread has RT scheduling, it will resume execution directly on return from the kernel. you cannot do this any faster without using the bottom half.

Even if JACK is the only task and it consumes only 1% of the CPU time it may fail. ... if it fails on an RT kernel, then OSS stands just as much chance of failing as well, since the scheduling error will be caused by interrupt handlers that block other interrupt handlers (including the audio one), or the data delivery error will be caused by a PCI bus hog that similarly blocks DMA from within the OSS driver stack.

hannu - do you actually understand that RT scheduling classes on POSIX are completely independent of SCHED_OTHER? do you understand that scheduling any SCHED_OTHER task if a SCHED_RR or SCHED_FIFO task is ready to run is a major error in the OS scheduler, on the same level as blocking all interrupts somewhere other than a bottom half interrupt driver?

again, you write as if the linux audio community has not had JACK for the past 6-7 years. JACK does everything you claim is either hard, impossible or "not sufficiently guaranteed", and it does it (with help from the kernel) in ways that you simply don't seem to understand. moreover, it also allows inter-application audio routing, something that the OSS kernel mixer doesn't address, thus creating a double layer of "mixing" where only one is really necessary.

segedunum said...

It is not true that adding layers between an application and the hardware automatically adds latency.

Well no. KDE's Phonon is basically a set of APIs that just hands off to lower level systems and doesn't do any processing at all as far as I know. It's designed to be a sound API that looks the same as anything else you would program with on KDE and Qt, and to insulate their apps from this silliness. The opportunity for latency is quite small. I would imagine they did that because they got spooked by the whole sound server thing, where we could have standardised on something like aRts... but everyone went off and did their own thing. To create another one seems like complete folly.

The JACK guys seem to have done a good job of identifying what they wanted to do, what the problems would be, and how to solve them. However, would people be able to standardise on it? The history of sound servers on Linux says NO. We keep coming back to the same position because no distributor is trying to identify what they need to do. They just throw in whatever packages come their way.

As a developer I always try and solve problems as low down the stack as possible, especially if you want to assume prerequisites like mixing. To try and paint over those problems and wrap them with yet another sound server and then create ad-hoc emulation APIs so that apps that use ALSA will think they're using ALSA on top of all of that is insane. The bug list and the strange corner problems people seem to be having tells us that this simply isn't working with Pulse.

Valent said...

I'm forwarding Lennart Poettering's response because he has no intention of discussing things he considers pointless, since he has discussed all of this before on gnome-devel and fedora-devel with people who have enough technical knowledge, so please read those lists.

So here is Lennart's reply:

> segedunum said...
> "You know, I really, really worry about Lennart more than any
> developer I have yet seen. Once again he goes on the offensive but

So, you are asking me to respond to this? This is just fud, or uninformed at best.

Also, I am not sure which article he actually speaks of. It can only be this one:

http://0pointer.de/blog/projects/guide-to-sound-apis.html

But that has nothing to do with the "state" of audio. It's just an overview of the current APIs.

The guy who wrote this says I am "bludgeoning" the need for PA into that. Which is just not true. It is pretty clear from my article that only in the fewest cases is it a really good idea right now to link directly to PA; everything else should be done with other APIs. (Only mixers should be linked directly against PA, nothing else.)

I tried to be a nice guy, and made the dependencies on PA only very very soft in the audio stack. It is very easy to rip it out, if you don't like it. You can switch to another backend in Gst, in libcanberra, all over the place, and PA is gone. I designed it that way in an attempt to satisfy the folks who think PA is an abomination. But of course, this was a pointless exercise; people who want to complain just complain anyway, regardless of how all of this is handled.

Also words like "mental gymnastics" certainly don't help to make this sound like a particular fair comment.

If you think that PA is a good idea, then use it. If you don't, then don't, but please don't constantly complain if others do. I am not forcing anything on anyone. I made it particularly easy to rip it out if you think I am evil and my code is too. And this is Free Software. So you get my stuff for free, and you have the complete freedom what to do with it.

Also, mixing/per-stream volumes is certainly not what PA is about. Try googling for why it actually is good. The biggest benefit is still the glitch-free playback mode, even though that might not be directly visible or even understandable by the users.


*** END OF PART 1 ***

Valent said...

*** BEGINNING OF PART 2 ***

> It is totally obvious if you look at the architecture that if you put
> in layers between the application and the sound card you will add
...
> *never* make a Skype call with Pulse in the middle ever again.

This is just so wrong. Of course every layer you put into your stack makes things a bit slower. But "*a lot*" is certainly wrong. A context switch or two is slower than no context switch, but it is still in the ns range, and completely irrelevant on the desktop. In fact PA is much more flexible with latency and fulfilling what applications ask for than any other system, because we schedule audio by system timers instead of sound card timers.

Much like PA, JACK also requires a couple of context switches before the data from the apps are written to the hw playback buffer. So in this respect there isn't much of a difference.

In fact, since PA actually does zero-copy data transfer via SHM, and by never having to convert/copy/touch the PCM data the cache pressure and CPU load of PA can be considerably lower than JACK's.

And the usual Skype "argument" doesn't cut it. Skype is closed source, it hasn't done a release for Linux in years. I cannot fix it, and it is seriously broken in many ways, and the vendor doesn't care. The right place to complain is skype, not PA.

> In addition, we've got all sorts of silly side-issues such as how
...
> the volume controls you have always been able to use on any system:

PA is happy to record from the Line-In. However, driving one speaker set with two PCs is exotic. And I am not planning to support that. If you want to, then bypass PA and use a low-level alsamixer. There's nothing wrong with that.

The guy who wrote this seems to suggest I was taking away his freedom by not designing my stuff so that driving one speaker set with two PCs is just one click away. But that's just nonsense. The full ALSA mixer is here to stay. You will always be able to install some ALSA mixer and control input feedback if you feel the need for it. However, I really see no reason to expose this in the UI by default, because stuff like *driving one speaker set from two PCs* is NOT A STANDARD USE CASE. Make the common things easy, and the uncommon things possible. Which is what I am doing.

And the other "arguments" usually raised when folks talk about the mixer are even more exotic. e.g. supporting the "analog path" for CD audio playback. Which is completely obsolete, and we don't even ship any application on Fedora by default that could still do that.

Joe P. said...

After seeing the responses from Valent in the name of Lennart, I must take back what I've been saying about PulseAudio. It's clear to me now, that far from lying about PulseAudio reducing latency, that Lennart's intelligence level is such that he is able to mistakenly believe wholeheartedly that this impossibility and others like it are the case. The capacity for such self-delusion is a rare gift, and truly Lennart is among a select few to possess it in such quantities. This was made especially clear to me by the following sequence of events:

1. Valent posts a link to an article in the name of Lennart.
2. Segedunum replies to said link in a well-thought out and logical way.
3. Valent posts in the name of Lennart that he is somehow unsure of which article segedunum is referring to, supposes that it must be that article, and then insults segedunum for bringing it up, claiming that it "has nothing to do with the 'state' of audio".

Clearly this is indicative of the astonishing level of thought that goes into PulseAudio. Thank you Valent, for posting these responses from Lennart, and finally giving me a clearer insight into exactly what PulseAudio is made of. Once again, I apologize for calling the propaganda from PulseAudio "lies", and will endeavor to call it "sheer stupidity" in the future.

I also don't like the cry of "lack of technical knowledge", the claims that everyone with an opinion contrary to PulseAudio's is "uninformed", and the accusations of "fanboyism" coming from Valent in the name of Lennart. You keep attacking people for "spreading FUD", when in fact, it is you sir, who is spreading FUD with these outrageous accusations. Saying things like "the guy who wrote this seems to suggest I was taking away his freedom by not designing my stuff so that driving one speaker set with two PCs is just one click away", when the author clearly never proposed anything of the sort, or indeed anything outside of standard use cases, is not only spreading FUD, but libelous to boot.

Making these baseless, hypocritical, and even absurd accusations is not going to make you any friends, and will only serve to discredit yourself further. I can't speak for anyone else, but I have done extensive work with sound APIs, consider myself to be very informed, am nobody's fanboy, and don't appreciate having my intelligence insulted. I use the best thing available, and frankly, that is most definitely not PulseAudio.

Joe P. said...

But this is all tangential. I am sure that insane coder didn't write this article to start flame wars, name calling, competition, and pointless posturing. He wrote it to outline the state of sound in linux, and clearly says that certain sound systems are better for certain systems than others, and that it should be made easy for the end user to choose which sound system they want their programs to use. People seem to be getting confused over the difference between a sound system and an API, so let me break it down for you.

Sound systems are the execution layer of a computer which control sound output. These days, all the major sound systems are pretty much compatible with any API, as insane coder has said many times throughout this thread. Given that they are compatible with any API, what sound system is used ultimately matters only to the end user.

The API, however, is the set of libraries and codes a developer uses to interface with the sound system. The confusion comes because many of these APIs are made in conjunction with the sound systems, and provide direct access to their associated sound system, for example, the OSS and ALSA APIs. This does not make them inseparable from their sound system counterparts, nor does it prevent cross-usage. Other APIs are built on top of the existing sound systems, and many of them can even be configured to tell them which one to use, such as Phonon. API only matters from a development standpoint, one should use whichever API best fits their use case.

Now that we have a clear demarcation between what applies to development and what applies to the end user, we can say that all this article is really doing is providing an outline of what sound systems exist, which ones are better in what cases, and saying that the choice of sound system should be easily available for the end user.

insane coder said...

From reading the multitude of comments I'm getting, I'm starting to wonder if I'm the only sane guy in a nut-house, so I appear insane to everyone else.

I said OSS is better for some cases, while ALSA is better for others. We who prefer OSS would like it to be easy to select OSS, and easily switch back and forth if need be. And if you prefer ALSA, then by all means, keep ALSA. Yet this message somehow seems to have been completely lost, and I wonder if people lack reading comprehension.

What comments aren't being posted here?
"Hey OSS can't be part of the official Linux kernel tree."
"Hey how can you suggest OSS when ALSA works better for me?"
"Does OSS support my piano which currently works with ALSA?"
"Prove it that OSS works better for games."
"But OSS lacks suspend which I need, you're insane!"

I'm not the only one who is finding OSS to fit their needs. There's posts about it everywhere on forums and blogs and heaven knows where else. Why is it we have choices between many setups for Linux in other areas where some choices are obviously inferior to others across the board, but in a basic case of audio where there's obvious pros on both sides, and each fits a particular area really well, there's a lack of choice? Why are people against having this option?

Kudos Joe for noticing what this article is about (or at least, I think you got the point).

Theodore Tso said...

Insane coder,

The reason why people are pointing out problems such as "why OSS has a long way to go before it would be accepted into the kernel", and "OSS doesn't support suspend/resume, which is fatal", etc., is because you were suggesting that distributions support OSSv4, either instead of or in addition to ALSA.

Most mainstream distributions don't want to support kernel patches or kernel drivers which aren't in the mainline because it is a support nightmare for them. Ubuntu will do it for some proprietary drivers if there is a hardware manufacturer willing to pay all of the costs associated with supporting that out-of-tree driver, and there are no other alternatives. But that's an exception, not the rule, and in the past many distributions have paid dearly for supporting something not in the mainstream kernel --- since, funny thing, once a distribution starts supporting some feature, their paying customers expect it to be there in subsequent releases, even if they have to delay releases and move heaven and earth to make it work (for example: Xen).

Suspend/resume is critical because distributions targeting the desktop have to have smooth suspend/resume. If a distribution switches to OSSv4, and as result, suspend/resume breaks, it's a great way of driving away its user base. So you are suggesting things that most sane distribution release managers wouldn't even begin to contemplate --- certainly not for distributions which are successful and intend to remain so.

There are very few users of Arch Linux and even fewer of Draco Linux. Ever wonder why? Saying that suspend/resume doesn't work isn't FUD, it's simply stating a fact that for many people is a show-stopper. You can't blame a distribution for shying away given that fact.

insane coder said...

Hi Theodore.

>once a distribution starts supporting some feature, their paying customers expect it to be there in subsequent releases

That's the great thing about something like OSS, if it isn't there, you don't realize it isn't there except for the following:
No sound mixing with OSS API.
Possibly worse sound mixing with ALSA.
Possibly worse latency with ALSA.
Lack of drivers for certain cards.

Those reasons in themselves are reasons OSS should be there in the first place.
However, applications won't stop working if OSS isn't in a future release.
If there's something in OSS that a company seriously depends on, why isn't it in ALSA already?

>Suspend/resume is critical because distributions targeting the desktop have to have smooth suspend/resume.

So make the default ALSA, what's the problem? If you currently use OSS, you know the risks, and don't care. Also this is a reason to help with suspend/resume, instead of ignoring OSSv4.

>There are very few users of Arch Linux and even fewer of Draco Linux. Ever wonder why?

Not in the slightest, and the reasons have little to do with audio.
For the average user, the less complex the better; neither Arch nor Draco makes things easy on the user compared to, say, Ubuntu or SuSE.

>Saying that suspend/resume doesn't work isn't FUD

No one said that it not working was FUD, I myself in the article said it doesn't work.

>it's simply stating a fact that for many people is a show-stopper. You can't blame a distribution for shying away given that fact.

I can blame a distro for not offering it as an option to those who want it despite the issues.

Dan said...

Insane coder, why are you bothering to respond to him? It's like talking to a wall. He's just repeating himself, then you repeat yourself... even though it's stupid to say "we're not gonna support it because we may need to stop supporting it" for something that can be switched with no hassle, you're not gonna get through to him, and playing a broken record in response to a broken record is not gonna be productive. Just focus on getting your main message across and ignore anyone who makes irrelevant and repetitive arguments. Seriously.

insane coder said...

Hi Dan.

Very good points, now that I think about it, and review some of what Theodore has to say, it seems clear.

First he posted in my other article about "how to configure OSSv4 to be more compatible for people that use OSSv4" with an off topic post that OSSv4 is evil or some such.

Further, he posts here in response to OSSv4 offering both fixed and floating point math (whichever you prefer to use) that floating point is evil and therefore OSS can't be used, even though fixed point can be set as the only option provided by a distro, and even where I already stated "Now it looks like you're looking for problems which don't exist, and just want to bash OSS, if that's the case, then I have nothing further to say on the matter.".

Not to mention the constant repeating of details already known, and complete ignorance of anything counter to his ideas, or answering questions posed in response.

Seems he has a vendetta against OSSv4 and can't be objective about anything discussed here. Doing a little search on Google shows someone by the same name who works on Linux, if he is indeed the same person, it'd explain why there's so much bias in Linux regarding OSS.

I'll take your advice, and ignore him in the future, even though I'd rather get people who can make important decisions understand that there are issues with ALSA and the status quo just isn't good enough for certain case loads. I really don't want this sorry state to continue. But no one seems to want to fix cases where ALSA is lacking, instead pretending that such issues don't exist. No one wants to even think about offering OSSv4 either, due to bias, or this golden pedestal ALSA has been placed upon, even though at the very least it'd offer ALSA good competition that may improve the situation. I like Linux for home use, and really would like to get all of my company's products ported to Linux, but no one wants to make Linux ready to do so. This sorry state will probably continue for many years to come, and it seems no matter how many users or 3rd party developers complain legitimately about the situation, we'll just be ignored. How many years and how many users or developers need to make a fuss before some action will be taken?

Theodore Tso said...

>Lack of drivers for certain cards.

I'm not aware of cases where OSSv4 has a driver for a card where ALSA does not --- can you give some examples?

>I can blame a distro for not offering it as an option to those who want it despite the issues.

So most distributions have ways that people can contribute OSSv4 packages as optional "community-supported" packages. Whether that is "first class" or "second class" depends on competence level of the contributor. If for example Hannu did that with Ubuntu packaging, that might be a way to convince Canonical to hire him. (Note: I am not a representative of Canonical, so obviously this isn't a job offer.) But that's how this works; you offer the packaging as proof of your ability to add value, and then ask the company to hire you; you don't ask the company to hire you on spec.

Or maybe you could do it, if you think you are competent enough to create the packaging for Ubuntu or Fedora. Or maybe you can recruit some people to help you with the packaging. Debian uses teams to package complex software such as GNOME or KDE. But the power of open source is that if you think something should be done, you have the power to do it. When I wasn't satisfied with the properties of the ext3 filesystem, I didn't kvetch about it; I worked to create ext4, and recruited other people to help with the project. If you feel so strongly about OSSv4, then go and try to do something about it.

Kvetching about it, however, isn't so likely to achieve much in the way of results. Talk is cheap; action gets you results. So if you feel strongly about it, go forth and package OSSv4 for some distribution and invite people to use it!

Joe P. said...

In response to Dan and insane coder, I don't think that that is the correct response. Though repetitive, and slightly biased, Theodore, unlike many of the other people who have commented here, seems to at least have a decent amount of intelligence, and is somewhat reasonable. It's people like this who need to be convinced to take an objective look at things in order to bring about the freedom you aim for in terms of linux choices.

In response to Theodore himself, you still seem to be missing the point. It doesn't matter whether you think OSS is a good system or not. Whether you approve of kernel mixing, or think suspend/resume is a key feature, or are worried about future support of the system is not the issue here. As I said in my previous comment, the only feature that is 100% necessary for a sound system is cross-API compatibility. Given that OSS is cross-compatible, as is ALSA, (and PulseAudio for that matter,) there is no reason not to support OSS for as long as it's feasible. If you're a distribution manager, and you decide that suspend/resume support is a key feature, then feel free to make ALSA the default, but at least provide OSS as an option.

Also, your concern about future maintenance isn't gonna fly, because as I said, as long as the sound system is cross-compatible, which it is, packages can be set up to swap out one system for another with minimal effort. Thus, the highly unlikely apocalyptic change in OSS compatibility that you keep prophesying is a non-issue. Even if this doomsday of yours does come to pass, the distribution can simply replace it with another system, no problem. An argument that says "this may not last forever, so there's no point in doing it now" is absurd. Otherwise, I wouldn't get up and go to work every day, because I'm going to die eventually anyway, so what's the point? Further, most packages that distributions have to maintain are not part of the mainstream kernel. Why is this one any different?

All that I'm saying, and I believe insane coder to be saying as well, is that if the distributions provided support for whatever sound systems anyone might want, then the end user would have the freedom to make their own choice. That way, if I'm an end user with a chipset on which OSS provides superior sound quality, then I can choose OSS. If I decide instead that I need seamless suspend/resume support, then I can choose ALSA. And if I want to be a moron for whatever reason, then I can choose to use PulseAudio, but it should be my choice to make, not the distributions'.

Joe P. said...

Just a quick response to Theodore's most recent comment. I hadn't noticed that you had replied again when I wrote my last post. You make an excellent point, perhaps insane coder should look into packaging OSS himself, maybe get Hannu in on it as well. I'd certainly be willing to provide whatever support I could to help get the options out there. So, guys, what do you think?

dawhead said...

@joe: That way, if I'm an end user with a chipset on which OSS provides superior sound quality, then I can choose OSS. If I decide instead that I need seamless suspend/resume support, then I can choose ALSA. And if I want to be a moron for whatever reason, then I can choose to use PulseAudio, but it should be my choice to make, not the distributions'.

I'd be curious just precisely what you imagine the role of the distributions is, because I'm pretty certain that most of them don't see it the way you do.

The problem with your claims about cross-API compatibility is that it only goes so far, and the "so far" that it goes is almost never enough to stop clamors for something more to be done. The platforms on which sound works well (such as OS X) are marked by a complete absence of attempts at back-compatibility (Apple simply told all developers of audio apps for OS 9 that their apps would have to be re-written), and by an almost complete absence of add-on layers designed to make things "easier" for app developers. Ironically, one of the better of those layers that does exist on OS X (JACK) originates on Linux.

Theodore Tso said...

@joe,

>Further, most packages that distributions have to maintain are not part of the mainstream kernel. Why is this one any different?

Most packages that distributions maintain are pure userspace-only software. OSS has a significant component consisting of kernel modules (or kernel source files which must be merged into the kernel sources used by the distribution).

Binary kernel modules must be recompiled each time the Linux kernel is modified, and there is no guarantee that the kernel modules will work with newer versions of the mainstream kernel. In practice most manufacturers that try to maintain external device drivers have to continuously modify them to work with the latest mainstream kernel.

This is what makes OSS non-trivial for distributions to support; the fact that sound drivers are kernel code, and kernel code not merged into the mainstream kernel is a nightmare to support.

Most packages that distributions have to support (aside from the kernel, obviously) are userspace only software, and the ABI exported for userspace programs (i.e., system calls) is stable. This is not true for kernel modules; the ABI, and indeed the API, is not stable. This is why kernel modules must be recompiled (and then retested) for each kernel release; even a minor change to the configuration options used to compile the kernel may result in changes in the kernel module ABI. (i.e., due to changes in data structures used by the kernel modules to communicate with the core kernel code, etc.)

It is constantly amazing to me how many people (including Insane Coder) are saying that all distributions have to do is "just" support OSS. If you think it's so easy, you go ahead and try to package OSS for a distribution! After all, according to you it's just a triviality that won't require massive amounts of ongoing support both now and in the future, and it's baffling that distributions aren't pursuing such an "obvious" course of action. Well, if it's so easy, why don't you do it?!? Open Source means that there's nothing stopping you other than your competence level and the laws of physics. :-)

Hannu Savolainen said...

Valent wrote:

"In fact, since PA actually does zero-copy data transfer via SHM, and by never having to convert/copy/touch the PCM data the cache pressure and CPU load of PA can be considerably lower than JACK's."

Really?

"Zero copy" looks certainly cool when mentioned in the propaganda material. However talking about "considerable" savings is more or less bull's fart.

A typical audio stream is 48 kHz / 16 bits / stereo, which gives 192000 bytes/second. I did a simple test on my computer (an entry-level Athlon 64) and running memcpy(a, b, 192000) one million times took 44 seconds, so copying one second of audio data took just 44 usec and the total load caused by copying was 0.0044%. Looks really significant.

The history of audio in Linux is full of "emperor's new clothes" style solutions. If the previous sound system lacks some feature(s), then somebody reimplements a completely new solution from scratch to add the feature(s). The new solution always has a different API than the older ones. There is usually a mixing server that cannot be used at the same time as the other servers. As a workaround, the new solution emulates some of the other servers. Finally, after all the applications have been converted to use the new system, somebody comes up with yet another sound system that makes all the others obsolete.

First we had OSS. Then came ALSA, ESD and aRts. Then JACK was considered to be the API to use instead of ALSA. Now we have PulseAudio. Also there are PortAudio and GStreamer. What's next?

New systems are marketed using buzzwords like "more advanced", "low latencies", "zero copy" and "glitch free" that don't mean anything in reality. Of course all distributions must move to this brand new system because it's "zero copy". And all distributions should also start shipping preemptive RT-kernels instead of the vanilla one because the new sound system depends on it.

dawhead said...

@hannu: First we had OSS. Then came ALSA, ESD and aRts. Then JACK was considered to be the API to use instead of ALSA. Now we have PulseAudio. Also there are PortAudio and GStreamer. What's next?

This is wrong and you know it.

ESD was written for and existed alongside the OSS drivers and API, before ALSA was developed. ESD was not an API, but a server that allowed applications using the unix/file-based API provided by OSS to share the audio interface.

This is also true for arts to some extent, although it is true that it also contained an API for "more advanced" apps to use for audio synthesis and so forth. I believe that it also predated ALSA, or at the very least continued to use the unix/OSS audio API for a significant time. arts was discontinued by its developer when he became aware of its limitations.

JACK has never been promoted as the next API to use - JACK has always been targeted at a specific niche market (pro-audio and music creation software) precisely because neither the OSS nor the ALSA API provides the necessary functionality by itself. PulseAudio might never have been started if, in fact, we JACK developers had been more willing to promote JACK as the "next API", but we were not and are not. This is not because of the technology, but because of the political/social issues with trying to get unified use of an API on the Linux platform.

PortAudio has been around since before ALSA as well, and unlike OSS or ALSA it explicitly aims at providing a cross-platform API to allow developers to simultaneously target Windows (ASIO/MME/WDM), OS X (CoreAudio), OSS, ALSA, JACK and other audio APIs. It is *still* the case that a developer gets the most portability out of using PortAudio compared to any other API. It is even more notable that the basic design of PortAudio's API mirrors that of ASIO, WDM, CoreAudio and JACK in ways that OSS and ALSA still do not really provide.

GStreamer is not an API to deliver audio to an audio interface, though it happens to come with plugins/nodes that do I/O. The choice to use GStreamer is not really based on getting audio in or out of the program, but on the need to set up complex processing streams involving multiple data types. None of the APIs mentioned here provide any facilities for doing this, nor have any plans to do so.

And all distributions should also start shipping preemptive RT-kernels instead of the vanilla one because the new sound system depends on it.

This is also untrue. The preemptive RT-patched kernel certainly gives even more reliable low-latency behavior than the mainstream kernel. But so much of the RT patch has been merged into the mainstream kernel already that you can actually get very good results without that particular kernel. It would certainly be nice for people who wish to get down to the hardware limits of their audio interfaces if an RT kernel was always available, but, for example, the majority of users of JACK are not running on RT kernels and still get quite reasonable results.

and finally: History of audio in Linux is full of "emperor's new clothes" style solutions. If the previous sound system lacks some feature(s) then somebody reimplements a completely new solution from scratch to add the feature(s).

yes, this is a problem. but it's a problem that actually originates with the problems many developers had specifically with you, Hannu, and your refusal/reluctance/unwillingness to do anything substantive to the OSS system when it was part of the kernel, thus causing the development of ALSA. had you been more willing to take OSS in radically new directions then, ALSA might never have been started, and perhaps OSS would contain the kind of design represented by CoreAudio/WDM/JACK, thus removing another major post-ALSA development.

Herby said...

Interesting discussion. Now get a recording application functional. Even better one that uses more than two channels recording. After that, try multiple sound cards recording, each using (how about 8) multiple channels. Synchronized! Yes, there are recording applications that desire to record as many as 32 channels at once. Let the flames begin as to which is "better".

insane coder said...

Hi Theodore.
>I'm not aware of cases where OSSv4 has a driver for a card where ALSA does not --- can you give some examples?

Currently? I'm not sure. OSSv4 was the only thing that supported Creative's X-Fi card till recently.
I have an SB600 that didn't have any sound with ALSA when I first got it, and I haven't tested ALSA on it since I installed OSSv4 on the machine.
I have a friend with an on-board chip (something from Intel, I believe) that he says doesn't give him sound in ALSA.

However, as for a difference in sound, there are many cards (generally on-board) which when tested seem to provide clearer sound with OSSv4 than ALSA.
For the SB750, when I tested it, the clarity of sound seemed like night and day. You can also check the comments in my first sound article, where many users commented that they got better sound with OSSv4. I see it discussed on the Ubuntu forums from time to time too.

>If you feel so strongly about OSSv4, then go and try to do something about it.

I'm currently spending some of my free time on working on a volume control program for OSSv4, since I don't like the one provided, and the alternate ones I've seen don't appeal to me.

>Kvetching about it, however, isn't so likely to achieve much in the way of results. Talk is cheap; action gets you results.

Spreading knowledge often leads to results. How many people knew they could get mixing with /dev/dsp when they lack a hardware mixer before reading this article? For those that want to play some commercial Linux games while also listening to something else, that can be very useful.

>So if you feel strongly about it, go forth and package OSSv4 for some distribution and invite people to use it!

4Front already supplies packages for Debian/Ubuntu, Fedora, and others. This article is inviting people to use it, and improve it where it is lacking.
What I want from distros is just to include those packages in their repositories and, on install, also set up the other sound systems to use it where applicable.

Hi Joe.
Yes, I will continue to respond to Theodore, if it's not just a rehash of what was already said.

Hi dawhead.
>I'd be curious just precisely what you imagine the role of the distributions is, because I'm pretty certain that most of them don't see it the way you do.

Let's think about this for a moment. All the major distros support GNOME, KDE, XFCE, and whatever else. They offer nano, vi, and emacs. They offer Firefox, Konqueror, SeaMonkey, Arora, and others. Given that supporting GNOME or KDE is itself a huge undertaking, they should be able to offer another sound system too. Why would I think this isn't up to the distros?

insane coder said...

Hi Theodore.
>Binary kernel modules must be recompiled each time the Linux kernel is modified, and there is no guarantee that the kernel modules will work with newer versions of the mainstream kernel.

Yes, true.
And 4Front has been keeping up with their packages, I've never seen them fall more than one version behind the kernel. And distros rarely ship the newest kernel version with the latest version they put out anyway. Debian 5.0 is on 2.6.26? Ubuntu 9.04 is on 2.6.27?

>It is constantly amazing to me how many people (including Insane Coder) are saying, all distributions have to do is "just" support OSS. If you think it's so easy, you go ahead and try to package OSS for use in some distribution!

This is already done. Have you even bothered to look at what 4Front offers in their download section?

>Well, if it's so easy, why don't you do it?!?

As for what I perceive to be a lack, I'll deal with what I know I can handle. As I said, I'm currently working on a volume control app. I'd also be happy to make a program to set up all the sound wrappers to use OSS, instead of requiring the user to do it. I'm not familiar with what each distro views as the correct way to manage such a program, but I'd be happy to make a C program which can check for the existence of said files and update them. (I don't know sed, awk, or whatever most distros normally use to do that kind of stuff.)

Hi dawhead.
>but its a problem that actually originates with the problems many developers had specifically with you, Hannu, and your refusal/reluctance/unwillingness to do anything substantive to the OSS system when it was part of the kernel, thus causing the development of ALSA.

This has been brought up already, and no, Hannu was not responsible for ALSA; it was those who felt they had to keep ditching the current API and layers to get anywhere, instead of just improving the existing ones. Even if it was/is Hannu's fault, it doesn't mean we should continue to live with the issues we have now. I think everyone here can agree there is some sound problem on Linux, even if we don't agree on what the exact solution is. I also think most of us agree that we need to fix the underlying system as opposed to keep adding multiple layers on top of it to hide issues.

dawhead said...

@insane coder ... you talk as if "just fixing the underlying system" is somehow a simple task. Ignoring all the technical issues, which are actually relatively tractable, there is the much vaster problem of doing this within the context of the overall Linux design/architecture process.

I have said many times (even within this thread) that when Apple switched to CoreAudio (an audio subsystem and API that seems to work very well for just about any purpose) that they had the ability to simply bludgeon all their developers over the head with the new API.

We simply don't have that choice in Linux. Even though there were many perfectly good technical reasons to replace OSS with something else back in the late 1990s, everyone still required the replacement to be compatible with the OSS API. There was never any question of simply coming in and saying "the old API is gone, use the new one".

Simply getting developers of all kinds of audio software to agree on what the audio subsystem should look like is almost impossibly hard. Game developers want X, pro-audio developers want Y, desktop app developers want Z. If someone comes along and says "You should all be using A", the response will be the same as it has been all along: each group will continue to use the APIs and subsystems they have already developed.

It is hard to imagine a system that is so much better than the specific APIs generally used by each development group that everyone will be convinced to switch to it. When Apple and MS do this kind of switch, they do it by forcing the switch, not hoping people will agree that the new system is better than the old. Who do you think in the Linux world is capable of imposing that kind of switch?

Theodore Tso said...

>And 4Front has been keeping up with their packages, I've never seen them fall more than one version behind the kernel. And distros rarely ship the newest kernel version with the latest version they put out anyway. Debian 5.0 is on 2.6.26? Ubuntu 9.04 is on 2.6.27?

Well, I tend to do actual kernel development, so if it doesn't work with the bleeding edge kernel, it's not very useful for me. But then again, the in-kernel ALSA drivers work just fine for my Lenovo X61s laptop (which uses the snd_intel_hda drivers). But hey, great, people can use whatever they want.

If they have bug reports, they can report them to 4Front, which I assume will only give help if they are paying customers. That's a fine model if people are willing to live with the tradeoff that it brings.

If enough people in the community distributions use it, maybe the commercial distros will decide it's worth it to support it. Or maybe they won't, figuring that their primary customer base (mostly enterprise servers for many of the commercial distributions) doesn't care about sound.

As dawhead has already pointed out, this isn't likely going to change the API's used by most of the applications. But maybe that won't matter; you've made the claim that for at least some cards, using the ALSA libraries with the OSS back-end is still preferable.

Hannu Savolainen said...

dawhead said:

"Even though there were many perfectly good technical reasons to replace OSS with something else back the late 1990's, even the replacement continued to be required by everyone to be compatible with the OSS API. "

And what were these "perfectly good technical reasons" to *REPLACE* OSS?

There were some areas where OSS was incomplete, such as the mixer (control panel) functionality and a missing way to find out what kinds of devices are available. These problems and some minor ones got fixed long before ALSA became functional. The fixes made to subsequent versions of OSS never got back-ported to the OSS/Free drivers included in the kernel because the maintainers of OSS/Free didn't see them as necessary.

The real reasons why OSS was "replaced" were not technical but something else. Some developer(s) wanted to create a new sound system instead of improving OSS. They invented technical-looking sales arguments like "better latencies", "more advanced" or "supports all aspects of hardware" that look like killer arguments even though nobody understands what they really mean. OSS didn't support these featurettes, so the only choice was to replace it with something entirely different instead of improving OSS.

dawhead said...

@hannu: And what were these "perfectly good technical reasons" to *REPLACE* OSS?

1) OSS had no shared infrastructure - every driver was entirely self-contained. Many drivers started life by copying an existing driver, and then a bug would be fixed in an ancestor and not a descendant. It became clear that the drivers should be sharing some kind of mid-level code.

2) OSS was not thread safe - if multiple threads were using the device it was relatively easy to crash the kernel.

3) OSS had no user-space API other than the Unix file API. Attempts to suggest that it should were met with fierce resistance.

4) The OSS MIDI sequencer had many issues with timing, and could not be used to move MIDI between applications.

5) OSS had no way to represent hardware data formats in a device-neutral way. Applications that wanted to use devices with non-interleaved data formats would have to know that the specific device behind /dev/dsp required this. It was impossible to write h/w-independent applications as soon as we began to create drivers for non-consumer audio interfaces.

I could go on, but my memory is starting to get a little rusty.

The real reasons why OSS was "replaced" were not technical but something else. Some developer(s) wanted to create a new sound system instead of improving OSS. They invented technical-looking sales arguments like "better latencies", "more advanced" or "supports all aspects of hardware" that look like killer arguments even though nobody understands what they really mean.

No Hannu, you've demonstrated consistently that you don't understand what these phrases mean. There are plenty of people in the Linux audio community who understand precisely what they mean. And please remember - I was there at the beginning of ALSA. It wasn't "some developer(s)" - it was very specific people (Jaroslav, and even me) who got irritated that you refused then (as you refuse now) to acknowledge the limitations of the OSS driver implementation, and refused to cooperate with people who wanted to fix things in a deep way. It wasn't that we wanted to see an end to OSS, but we felt very clearly that OSS needed to move in a direction that you (and Dev) were not prepared to move in.

Don't feel too bad - we have similar issues in the JACK community at present, and who can tell where that will take us?

And I want to stress that I am not defending ALSA - ALSA too suffers from some major design decisions that in retrospect turn out to be flaws. But because neither Jaroslav nor Takashi (the only guys who are really paid to work on ALSA) have the same level of personal attachment to ALSA as you did to OSS, and because the ultimate decision making process for ALSA now lies with the relevant parts of the linux kernel "group", there seems more of a chance that it may shift in the right direction (whatever the hell that is) over time.

Theodore Tso said...

@dawhead,

> OSS had no shared infrastructure - every driver was entirely self-contained. Many drivers started life by copying an existing driver, and then a bug would be fixed in an ancestor and not a descendant....

To be fair, this was not entirely Hannu's fault. The problem was that when Hannu disappeared trying to make the commercial version of OSS, the Linux distributions needed sound drivers that could be loaded as modules. Since Hannu didn't do the work, Alan Cox did, and arguably he did it the quickest way possible. (Source: http://lkml.indiana.edu/hypermail/linux/kernel/0706.3/0122.html)

As far as OSS not being flexible enough from an API point of view --- perhaps that would have eventually been solved, and maybe Hannu has solved it in OSSv4. But the problem was that in the meantime people needed a solution right away, and they needed an open source solution; not a closed-source proprietary solution.

I see this as a Greek tragedy more than anything else. Hannu needed a job, since he liked food for his meals, and at the time when he was casting about looking for a business model, it was too early for him to find a job with the distros. I can't blame him for doing what he did, but when he stepped away from maintaining the free, in-kernel drivers, he lost the ability to dictate what the in-kernel driver interfaces would look like.

The interesting thing is that if you take a look at what I think is a fairly dispassionate view of the history, say here: http://www.linuxhardware.org/article.php?story=01/03/06/179255 and Hannu's version of the events, say here: http://4front-tech.com/hannublog/?p=5 they don't actually differ that much. He ascribes to malice and conspiracy theories what I believe was the desire to make things work as best as could be --- using Open Source Software (sorry, no Linux distribution was going to tell their users that if they wanted sound they had to pay $$$ to 4Front Technologies, Inc.)

At the end of the day we are where we are. There are fundamentally sound reasons why distributions aren't going to support out-of-tree device drivers; and even if 4Front is (for now) providing ports of OSSv4 to recent Linux kernels, can distributions count on this happening indefinitely? Especially given that Hannu doesn't have a job, and if he does, who knows if it will continue?

In any case, you don't have to convince me; I don't work at a Linux distribution, so it's not my call to make. I know lots of people who do, and I'm pretty sure I know what their priorities and concerns would be, but if you think I'm wrong, go ahead and try to convince a distribution to depend on a 3rd party out-of-tree kernel driver supplier.

Personally, I think we're better off trying to make the existing libraries and kernel space tools better, than continuing this OSS vs. ALSA battle.

dawhead said...

@theodore: Personally, I think we're better off trying to make the existing libraries and kernel space tools better, than continuing this OSS vs. ALSA battle.

Well, yes, of course. But that requires a definition of "better", and so far the incremental approach ("let's fix this, then let's fix that, then let's fix whatever rises to the top of the pile") has really caused a pretty nasty situation. When all the involved parties can't even agree on what the situation should be evolving towards, it's pretty hard to work to make things "better".

For what it's worth, I hope to be presenting a talk on this at the Linux Plumbers Conference this fall. Not sure that it can do a lot, but I'll try not to pull punches.

Hannu Savolainen said...

dawhead said:

>@hannu: And what were these "perfectly good technical reasons" to *REPLACE* OSS?

"1) OSS had no shared infrastructure - every driver was entirely self-contained. Many drivers started life by copying an existing driver, and then a bug would be fixed in an ancestor and not a descendant. It became clear that the drivers should be sharing some kind of mid-level code."

This is not a fault of OSS but OSS/Free. OSS has always had shared infrastructure where everything that can be done in common code is done in common code. The guys who continued maintaining OSS/Free after me wanted to optimize drivers for performance by inlining all the common code in the individual drivers.

"2) OSS was not thread safe - if multiple threads were using the device it was relatively easy to crash the kernel."

This was indeed a good reason to replace OSS with something entirely different instead of adding proper mutexes to the code. And once again this was a problem of OSS/Free instead of OSS.

"3) OSS had no user-space API other than the Unix file API. Attempts to suggest that it should were met with fierce resistance."

OSS is a file-level API and always will be. Any library-level APIs can be implemented on top of it. If ALSA had been implemented on top of OSS, then this API war would never have started.

"4) The OSS MIDI sequencer had many issues with timing, and could not be used to move MIDI between applications."

The whole concept of sequencer was wrong and ALSA copied it on steroids.

"5) OSS had no way to represent hardware data formats in a device-neutral way."

OSS handles this by doing the required conversions automagically. The application can use whatever format it wants without checking if the device supports it. However this feature was not implemented in OSS/Free.

"Applications that wanted to use devices with non-interleaved data formats would have to know that the specific device behind /dev/dsp required this. It was impossible to write h/w independent applications as soon as we begin to create drivers for non-consumer audio interfaces."

Why should this be an issue? The OSS API doesn't have a concept of non-interleaved channels. In this way application developers don't need to know anything about non-interleaved vs interleaved devices. If necessary, OSS will do de/re-interleaving automatically in the background.

"T>he real reasons why OSS was "replaced" were not technical but something else. Some developer(s) wanted to create a new sound system instead of improving OSS. They invented technical looking sales arguments like "better latencies", "more advanced" or "supports all aspects of hardware" that look like killer arguments even nobody understands what they really mean."

"No Hannu, you've demonstrated consistently that you don't understand what these phrases mean. There are plenty of people in the Linux audio community who understand precisely what these mean."

And what might be the precise understanding of, for example, "low latencies"? How low is low? Is that figure based on some blind listening test? How do you get that kind of latency? When do latencies matter? Are there any tradeoffs?

Alex Kovar said...

Very interesting article! OSS sounds like it has some clear advantages, but I agree with david that it NEEDS to support suspend/resume. I personally use it all the time, and a distribution's ability to support it effectively is one of my main judging criteria for how good that distribution is. Especially on laptops.

dawhead said...

@hannu: so, if i may paraphrase - the problems with "OSS" were actually with OSS/Free, and it's never been fair that the kernel guys said no to relying on OSS (the non-free version) since that fixed all the problems with OSS? i think ted tso summarized this well a few comments back - it has elements of a tragedy, but it's also the past.

re: interleaved etc. ... just as was true in 1999, you ignore the fact that interleaving/deinterleaving large numbers of channels when you don't need to is phenomenally stupid because of the cost in cache misses. The fact that consumer and desktop apps don't deal with this kind of data doesn't mean that pro-audio and music creation apps should not (and in fact, it's the natural format for such applications). It's also the natural format for hardware targeting such niches. There was no way with OSS (Free, at least) to deliver non-interleaved data, or even refer to the concept.

Finally, regarding your comments on the meaning of "low latency", I just can't believe that you've even been living on the same planet as, say, the rest of the people who are on the linux audio developers mailing list. Sure, there is no fixed answer to what kind of latency is enough for a given task. But we have a perfectly good understanding of what kinds of latencies are required for different tasks, and this is not something that has anything to do with Linux, OSS, ALSA - it's been discussed within the audio technology & engineering world ever since the arrival of digital audio. I am just flabbergasted and dismayed that you can treat this stuff in such an ignorant and dismissive way.

Mackenzie said...

Welcome to 2009. Suspend and resume do tend to work very reliably in the Linux world these days. I've been using Linux for 3 years. Of these machines:
- My 3 laptops
- 3 roommates' 4 laptops
- Boyfriend's 4 laptops
All suspend and resume reliably.

So, um, yes, I do count it as important that ALSA knows what to do when suspend/resume occurs.

Hannu Savolainen said...

Dawhead said:

"re: interleaved etc. ... just as was true in 1999, you ignore the fact that interleaving/deinterleaving large numbers of channels when you don't need to is phenomenally stupid because of the cost in cache misses."

There are no big differences between interleaved and non-interleaved writes to a non-interleaved device. In both cases exactly the same amount of data needs to be copied. Exactly the same number of pages (or cache lines) needs to be touched. The access ordering is slightly different and may cause some minor effects. Even then it's not guaranteed that non-interleaved access is the fastest method.

"The fact that consumer and desktop apps don't deal with this kind of data doesn't mean that pro-audio and music creation apps should not (and in fact, its the natural format for such applications). Its also the natural format for hardware targetting such niches."

Most professional devices use interleaved channels. I think RME is the only manufacturer that uses non-interleaved.

"Finally, regarding your comments on the meaning of "low latency", I just can't believe that you've even been living on the same planet as, say, the rest of the people who are on the linux audio developers mailing list. Sure, there is no fixed answer to what kind of latency is enough for a given task. But we have a perfectly good understanding of what kinds of latencies are required for different tasks, and this is not something that has anything to do with Linux, OSS, ALSA - its been discussed within the audio technology & engineering world ever since the arrival of digital audio. I am just flabbergasted and dismayed that you can treat this stuff in such an ignorant and dismissive way."

I have been monitoring the latency discussion on the Linux audio and ALSA mailing lists for years. There has not been much discussion about the real latency requirements of a given type of application. The usual question is how to get the lowest possible latencies: 10 ms at most is required, and a few milliseconds is even better. This same question has been asked by hundreds of OSS developers who contact me.

Don't forget that the speed of sound is just 343 m/s (1125 ft/s). This means that 1 ms equals about 34 cm, 10 ms is 3.4 m, and 100 ms equals 34 m. If you sit in the back row of a big movie theatre, do you notice any latencies? Probably not.

There are pathological cases where even minor latencies matter. A good example is a karaoke effect processor: the singer will get confused if the amplified sound is delayed too much. There are also applications (DAWs, for example) where latencies seem intolerable but where even significantly large latencies can be fully compensated for.

And how to get the lowest possible latencies? The sound system you use doesn't matter as long as there are no serious design mistakes. The bottleneck is that the application must get executed during every single time window. If latencies get lower, the window gets shorter. At some point things will start to fail. Using higher priorities may help as long as there are no other high-priority processes. Use of RT kernels should also improve the situation. However, running a vanilla kernel in single-user mode has given latencies as good as I have ever required.

I have done a significant amount of research on latencies using a mixed-signal oscilloscope. I connected the analog channels to line in and out. The digital inputs were connected to the parallel port of the PC. I then modified various applications to write spikes to the audio output and to change the parallel port bits, for example when the application calls write or does something else. Based on that, I think I have a better understanding of latencies and audio timing than the people on the linux audio mailing lists.

dawhead said...

@hannu: let's take a reasonably common case of 26 channels with 4 byte samples (24-in-32). To deinterleave this data into a h/w buffer from an interleaved setup will completely trash the L1 cache on any modern computer, since consecutive samples from each channel are 26*4=104 bytes apart. Memory access is almost random from the perspective of the cache hardware (though of course, it is not). The same applies to re-interleaving from non-interleaved format. This only gets worse as the block size increases.

Your only argument about this seems to be that "it doesn't cost much to do this". I think this is a bad sign - an elegant API design will not create work for the hardware when none is necessary. In the case of any multitrack playback system delivering to a non-interleaved output system, no such work is necessary.
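The stride argument can be made concrete. With interleaved storage, consecutive samples of one channel sit channels × bytes-per-sample apart (26 × 4 = 104 bytes for the interface discussed above). A minimal sketch, with hypothetical sizes, of what a deinterleave pass actually touches:

```python
CHANNELS = 26         # hypothetical multitrack interface
BYTES_PER_SAMPLE = 4  # 24-bit samples stored in 32 bits
FRAMES = 8

# Byte distance between consecutive samples of a single channel when the
# buffer is interleaved: every read for one channel skips this far.
stride = CHANNELS * BYTES_PER_SAMPLE
print(f"stride = {stride} bytes")

# Interleaved buffer is frame-major: [f0c0, f0c1, ..., f0c25, f1c0, ...]
# (sample values here are just f*CHANNELS + c so the layout is visible).
interleaved = [f * CHANNELS + c for f in range(FRAMES) for c in range(CHANNELS)]

# Deinterleave into per-channel (planar) buffers: one strided pass per
# channel, i.e. CHANNELS passes over the whole buffer.
planar = [interleaved[c::CHANNELS] for c in range(CHANNELS)]
print(planar[0][:4])
```

Each of those strided passes jumps 104 bytes between reads, which is the cache behaviour being argued about; a planar-native device needs none of this copying.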

Sure, RME are the main players here. But guess what - go to any studio that does this stuff, and RME gear dominates completely. They are effectively the only company that make off-the-shelf, general purpose multichannel (>10) interfaces for PCI at this point, and certainly the only place you are going to get MADI support for Linux right now. Their hardware is everywhere that needs it - in academia, in recording studios, in portable recording facilities, in broadcasting.

Your discussion of latency is silly. Yes, there are plenty of people who don't know very much about this and ignore the time it takes sound to travel through air. But I was not referring to those people, nor those artificial and generally incorrect examples.

Latency matters when someone carries out an action that changes what they can hear. That's all you need to remember. Certainly, if "hearing" is mediated by a large air gap, then the latency requirements to ensure that the change in sound does not seem unduly delayed relative to their action are pretty loose.

But for many, many situations where a musician or audio engineer is playing or mixing sound, the timing requirements are much tighter. They are using near field monitors or headphones, where the delay caused by sound travelling through air is either zero (headphones) or on the order of 1-2 msec (near field monitors).

There are also applications (DAW workstations) where latencies are not tolerated at all, but where even significantly large latencies can be fully compensated.

As the author of a DAW, I have no idea what you are trying to say here. Latency compensation *inside* the signal pathway of DAW is not directly related to the I/O pathways. This kind of compensation exists to take care of situations where some signal pathways flow through latency-adding computation (eg. FFT) and others do not. DAWs are actually examples of a case where both very large latencies can be acceptable (e.g. when used purely as a hard disk recorder, and monitoring is accomplished via other means), and where only the smallest latencies are acceptable (e.g. when running MIDI-driven instrument plugins and being "played" by a performer with near-field or headphone monitoring).

Since most of the people who actually post on LAD are concerned with recording/music creation, naturally their interest take them in search of the lowest latencies. I have never seen anyone who knows even a little bit about this stuff say that desktop media players need this kind of "performance".

The sound system you use doesn't matter as long as there are no serious design mistakes.

Absolutely. But this begs the question quite a bit ... the API of a particular sound system can encourage, or discourage the right designs in application software. This is not something where ALSA really does any better (or worse) than OSS, but where both CoreAudio, JACK and even ASIO do rather well - you have to work hard to mis-design your application when using those APIs.

segedunum said...

Also, mixing/per-stream volumes is certainly not what PA is about. Try googling for why it actually is good. The biggest benefit is still the glitch-free playback mode.


That's possibly the most stupid thing I've ever heard. Pulse is anything but glitch-free, and even if it was it's still an exceptionally poor reason for starting the project.

dawhead said...

@segedunum: is it necessary to be so negative? do you understand what "glitch free mode" is, from a design perspective? It was never noted as a reason to "start the project" - Lennart came across the idea some years after starting the project and thought it would be a good one. What he is calling "glitch free" is basically the same fundamental design that is found in CoreAudio and the new windows audio driver model. It just happens to exist at a point in the audio stack that I would consider sub-optimal - it should be part of the kernel driver design, not part of a user-space library (at least, not only part of a user-space library).

Hannu Savolainen said...

I had to split this reply to two parts because it became too long:

Dawhead said:
".... Memory access is almost random from the perspective of the cache hardware (though of course, it is not). The same applies to re-interleaving from non-interleaved format. This only gets worse as the block size increases."

For sure there are serious cache effects. However, in the real world there is no noticeable difference between interleaved and non-interleaved writes (I have actually measured this). The speed of the copy doesn't seem to be bound by the L1 cache but by something else.

Also, OSS is more flexible in channel allocation than ALSA. You are not limited to one big stream of 26/whatever channels. You can split the device into multiple streams of 1, 2, 4, 8, ... channels each.

"Your only argument about this seems to be that "it doesn't cost much to do this". I think this is a bad sign - an elegant API design will not create work for the hardware when none is necessary."

OSS doesn't support non-interleaved streams for policy reasons. The primary design goal of OSS has always been strong device abstraction. Adding a new concept such as channel ordering violates this goal. It adds one more feature that applications would have to check and support.

There is no technical reason why it's not supported. Implementing this feature should not take longer than a few hours. It can be done if it appears to be necessary. At this moment it doesn't seem to be necessary or even useful.

"In the case of any multitrack playback system delivering to a non-interleaved output system, no such work is necessary."

How about the situation where only the last channel(s) have a signal to play? You may win some cycles by using non-interleaved writes, but you will lose a lot of cycles because you have to copy samples for the unused channels. OSS lets you copy just the channels that you actually need.


Hannu Savolainen said...

dawhead wrote:

"Your discussion of latency is silly. Yes, there are plenty of people who don't know very much about this and ignore the time it takes sound to travel through air. But I was not referring to those people, nor those artificial and generally incorrect examples."

It's silly to limit the discussion only to applications that really need low latencies. The point is that such applications are a special case. The real problem is that applications that don't have low latency requirements also try to push the latencies down to a level where the application starts to break. Then nothing works without the use of high priorities or some RT extensions.

Hannu> "There applications where latencies are not tolerated at all (DAW workstations) but where even significantly large latencies can be fully compensated."

"As the author of a DAW, I have no idea what you are trying to say here."

What I mean by compensating latencies is based on the fact that both the recording and playback sides of a sound card are locked to the same sample rate clock. If you start playback and recording at the same moment, they will keep running in 1:1 sync forever. Even if the buffering latencies are relatively high, your recorded samples will be perfectly in sync with the playback (practically no latency between them). The benefit is that random disk read/write delays and things like that don't cause so many problems, because longer audio buffers can be used.

MIDI output plugins require minimal latencies, so the DAW application needs to use smaller buffers in that case. OTOH if the output of the plugin needs to be recorded on disk, then you probably don't loop the output through the sound card. Instead you record the direct output of the plugin, which is always latency free. It is correct that the monitoring output of the plugin will suffer from the latencies.

Another example of latency compensation is a case where some scientists wanted to measure how fast different persons react to audio stimulus. They played a beep and measured the time until the person hit a button connected to the joystick port. It may seem that this requires very very very low latencies. However this is not the case. The application was redesigned to use line-in to record the stimulus on the left channel and a beep produced by the push button on the right channel. The time difference between these signals gives precise time regardless of the latencies.
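The loopback trick described above can be sketched in a few lines (synthetic data; the threshold and sample positions are hypothetical). Because both capture channels share the sound card's sample clock, the offset between the two recorded onsets gives the reaction time precisely, regardless of any buffering latency:

```python
RATE = 48000  # capture sample rate, samples per second

def onset_index(channel, threshold=0.5):
    """Index of the first sample whose magnitude exceeds the threshold."""
    for i, sample in enumerate(channel):
        if abs(sample) > threshold:
            return i
    raise ValueError("no onset found")

def reaction_time_s(stimulus_ch, button_ch, rate=RATE):
    """Seconds between the stimulus beep and the button beep."""
    return (onset_index(button_ch) - onset_index(stimulus_ch)) / rate

# Synthetic two-channel capture: stimulus spike on the left channel at
# sample 1000, button-generated beep on the right channel at sample 13000.
left = [0.0] * RATE
right = [0.0] * RATE
left[1000] = 1.0
right[13000] = 1.0
print(f"{reaction_time_s(left, right) * 1000:.1f} ms")  # 250.0 ms
```

A real implementation would detect onsets more robustly (e.g. by cross-correlation), but the clock-sharing principle is the same.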

Actually, at least half of all audio related applications written for Linux are custom/in-house ones that will never be released as open source. Their developers probably don't know anything about the LAD mailing list. Instead they copy the audio code from some other application, or use OSS, which has a documented API.

dawhead said...

@hannu (part one):

The primary design goal of OSS has always been strong device abstraction.

yes, the OSS API (also known as the Unix file API) abstracts away so much that you almost cannot tell it's an audio interface :)

hannu - i just don't understand your design philosophy. you seem to operate in a vacuum of linux/unix history. everything you argue for goes against the design decisions of every other major audio API developer, whether it's Apple, Steinberg, PropellerHeads, Emagic (back in the day), Digidesign and even Microsoft. You argue for design choices ("all audio is interleaved") that are directly contrary to the choices that the whole audio software industry has made.

Do you think that all these other people are fools? Do you think that they don't know what they are doing?

You write, regarding interleaving: Adding new concept such as channel ordering violates this goal. It adds one more feature the applications should check and support. ... fine, but your existing design decision means that any application that doesn't naturally generate interleaved samples has to do work that is unnecessary. It cannot operate in its "natural" way. So I don't understand your philosophy in any way other than this: like almost every attempt at audio APIs on linux to date, OSS focuses on designs for the common case that make the more demanding cases inefficient. This is in very, very marked contrast to CoreAudio, ASIO, the new Windows driver model, and JACK, where the demanding cases can get what they need, and the common case can be accommodated.

To me, this is totally backwards, and is a major contributor to the fracturing of audio APIs on Linux.

dawhead said...

@hannu: It's silly to limit the discussion only to applications that really need low latencies. The point is that such applications are a special case.

I have never denied this or claimed otherwise. My point (and it's been my point for at least 8 years) is that if you don't provide an API that encourages the right kind of software design for genuinely demanding yet still reasonably common cases, you end up with a really nasty situation, just like the one we have on Linux today. It's certainly true that you can do low latency audio i/o with OSS - that's why there is an OSS backend for JACK. But the design of OSS by itself encouraged (and continues to encourage) many developers to write applications in a style that cannot be integrated into low latency situations. The result was clear on linux in the early 2000's when people started converting synthesis apps from doing disk i/o to realtime audio i/o: they simply switched over to write(2)-ing to the audio interface in the same way that they used to write to disk, which appeared to be how OSS was intended to be used (to any naive programmer).

The result: dozens of apps that simply couldn't function if the user requested very low latency (e.g. when the synth was driven by realtime MIDI input). This wasn't the fault of the OSS implementation (or the ALSA support of the OSS API that followed). It was because these apps were simply not written correctly to handle that kind of situation and in addition, because they directly accessed the driver via read/write, there was no way to interpose any buffering between them and the device parameters.

Contrast this with CoreAudio - every application is required to process audio via a callback that is timer-driven. Each application can have its own timer interval, if it chooses. Each application can layer additional layers of buffering above the callback if it chooses. But no application can avoid the fundamental reality that they need to continually and periodically wake up and process audio. The entire API is designed around this concept, going all the way up to AudioUnit plugins and all the way down to the C++ shared objects that form the parent class for OS X kernel audio drivers.

OSS (and ALSA) continue to provide a reasonably functional implementation of a HAL for audio, albeit with slightly different levels of detail. But neither of them do anything to address the issue of application design, which is partly why we have so many very different APIs on linux.

I haven't touched OSS or ALSA code in several years now. I shudder to think of anyone writing an audio application using either of the APIs that these systems offer. The evidence is that almost all developers will get either the basic design, or specific details, or both, wrong whichever one they choose.

Dev Mazumdar said...

Dawhead,

You write: yes, the OSS API (also known as the Unix file API) abstracts away so much that you almost cannot tell its an audio interface :)


Everybody understands /dev/dsp is an audio device. ALSA makes it a huge mess - do I use default:plug or default:hw or hw:pcm0 or /dev/asound/foo/bar/c0d0p0?

Please list the applications on Linux, besides JACK, that actually know what to do with non-interleaved audio?

In fact, why don't you poll your users on whether they even use JACK in non-interleaved audio format?


You people talk about non-interleaved audio like it was some cool concept like Google. News flash: if it were useful, everybody would be using non-interleaved audio. Even Intel, when designing its Azalia HD spec, found it wasn't useful.


All I can do is LMFAO @ CoreAudio - they have the balls to go and patent per-process volume?
http://news.cnet.com/8301-17939_109-10226032-2.html?tag=newsEditorsPicksArea.0

OSS had per-process based volume back in 2001 (perhaps earlier)

Hannu Savolainen said...

I have been criticized for sticking with the stupid Unix file I/O and for ignoring the ideas presented by the "Linux audio community". Let me tell why.

The reason is very simple: OSS is designed for Linux and Unix.

Why is Linux Linux instead of LIN-DOS, Lindows, LinOS, LinVMS or something else? Why did so many developers start working on Linux the day after Linus released the first snapshot? The reason was that they all (including me) had found out that Unix was the best operating system ever made. It was just far too expensive for hackers and not open source.

Then what makes Unix/Linux/POSIX the best operating system? There are just few main reasons:

1) The main reason is the very simple and elegant API that includes the uniform file/device model.

2) A very modular, pipe-oriented application design philosophy. Every command performs just one task. There are ls, cat, sed, grep, sort, awk and so on for text. The pbmplus package does the same for images. There are also similar tools for video. Audio tools like ossplay, ossrecord, sox, ecasound, and lame work on audio.

3) A powerful scripting (shell) language.

4) Multiprogramming.

5) TCP/IP networking (that is compatible with the device/file/pipe model) and related tools.

OSS is an audio/sound API designed for Linux, Unix and other POSIX compatible operating systems. For this reason its design follows the above model. There are the few device specific system calls (read/write/open/close), plus three ioctl calls to set the sampling rate, sample format and number of channels.

A few more ioctl calls are available for interactive (GUI) applications, and a lot more for all kinds of system level tools. Latencies can be pushed down by using higher priorities and RT kernels. However, the preferred way is to rethink the problem so that extreme latencies are not required.

The internals of OSS have been optimized to ensure break-free sound when somebody wants to play MP3 files or DVD movies while the kernel is compiling in the background. Audio devices can be shared by multiple applications. The channels of multi-channel devices can be allocated freely between multiple applications.

It is correct that OSS is not optimized for applications that don't follow the Unix philosophy. This includes almighty monolithic DAW applications. They can be implemented on top of OSS, but there may be some limitations.

This model has worked perfectly for all developers and applications except a few "LAD" rebels who would like to do "serious audio and music production" using Linux, which is a general purpose operating system.

May I ask why you have chosen Linux, which is being developed for entirely different uses? Why didn't you select a true real-time operating system? Or why didn't you develop a dedicated operating system for music production (Musix) so that you don't need to use the stupid Unix file I/O model? By working under the right operating system you might have been able to design something that beats Windows and Mac in the professional audio and music production market.

Instead you stick around here shaking the boat of the Linux community. You have replaced the audio API with something that has nothing to do with POSIX/Unix/Linux. Your solution is an API that has 1500+ functions, and just a few of them are documented. You push linux kernel developers and distribution vendors to add real time features to the kernel images so that your solution can work.

And of course it's everybody else's fault if they don't understand your ideas. You are the Linux audio developer community. You are the only guys who do something that matters. All the others are just lamers who work on their silly simple media players that don't do anything impressive.

Reece Dunn said...

Ok. Calm down people.

I don't see why it is not possible to have:

1. a JACK-style API for applications that need/require low-latency and a fine degree of control at the application level (e.g. for software like Rosegarden);

2. a Unix-style file-based API (be it the OSS/OSS4 or the simple PulseAudio API) for applications that don't care about latency/buffering;

3. a middle-ground API (ALSA? SDL?) for applications that have slightly different needs.

For audio conversion and media playback applications, a higher-level API like GStreamer is better but goes beyond the API that interfaces with the device.

This is ok, as different applications have different requirements and needs.

From a user perspective, they should be able to switch between ALSA+OSS, OSS4, PulseAudio and JACK configurations easily.

dawhead said...

@hannu: ok, so after that rant, in which you seek to defend the use of the unix file API for audio, perhaps you can explain which Unix system uses it for video.

dawhead said...

@hannu: additionally, i must say that

1) i object to being treated as the voice of "the linux audio developers".

2) i object being treated as the voice and/or designer of ALSA.

It is true that I am active and vocal within the community of the LAD mailing list, and some of my design decisions have played a large role in how many different people on that list develop their software. But I am not its voice, and I do not represent them in any meaningful way.

I was also involved in some details of the design of ALSA in the early days, but I certainly had nothing to do with the vast majority of the API or its overall design. I have not been involved in any aspect of ALSA for many years, and like many others I do not find the API to be well designed or very easy to use. ALSA is certainly not my idea of a good audio API, it just happens to have some things that I regard as internal design improvements over OSS.

dawhead said...

@reece: i don't think anyone imagines that a single API will ever be used by every Linux audio application - things are way too far fragmented already, and there are too many people who believe that their own requirements are special enough to justify a particular API (they might even be right!).

The issue in my eyes has more to do with the layer of the API stack(s) where user-space meets the kernel. If you get this right, then you can build all kinds of things in user-space to meet the demands of different kinds of applications. If you get it wrong, you end up forcing either unnecessary compromises and poor design in some kinds of applications, or you trigger the development of alternative versions of the userspace/kernel junction. In my opinion, the current junction is misdesigned (and in ways that OSS would not improve). Fixing it is more of a political/social issue than a technical one.

dawhead said...

@Dev wrote:

Everybody understands /dev/dsp is an audio device. ALSA makes it a huge mess - do I use default:plug or default:hw or hw:pcm0 or /dev/asound/foo/bar/c0d0p0?


I am not going to defend ALSA for reasons I've described previously. On the other hand, I've also already outlined the many reasons why encouraging application authors (not library authors) to open a device node for audio i/o causes many problems for more advanced users of audio.


Please list the applications in Linux besides Jack that actually know what to do with non-interleaved audio?


Any application that uses libsndfile or gstreamer can handle non-interleaved audio. That covers quite a few. Perhaps you mean "know what to do with hardware that (only) supports non-interleaved audio"? That is certainly a very small subset, but that is for two reasons. First of all, most of the applications that are used in conjunction with such hardware use JACK, which has completely removed format negotiation from the API (so that in fact, yes, JACK is the only one that has to deal with it). Secondly, ALSA offers a trivial, run-time choice to handle the conversion so that applications don't actually need to know anything about it if they don't want to. The difference compared to OSS is that with ALSA, they can care about interleaving if they want to.


In fact, why don't you poll your users on whether they even use JACK in non-interleaved audio format?


All JACK clients use non-interleaved audio formats.


You people talk about non-interleaved audio like it was some cool concept like Google.


I haven't sought to portray it as "some cool concept", merely as a format choice that is almost as fundamental as sample depth and sample rate, and one that deserves to be representable in any audio API.


News flash: if it were useful, everybody would be using non-interleaved audio.


News flash: almost every major music creation application on every platform, an even higher percentage of multichannel recording applications on every platform, and absolutely every audio plugin (FX or instrument) on every platform and every plugin API is using non-interleaved audio.


All I can do is LMFAO @ CoreAudio - they have the balls to go and patent per-process volume?


So your only response to the radically different design that CoreAudio represents is to critique Apple's IP policies? I'm no fan of software patents (I'm actually appearing in court next week over one such patent), but this doesn't seem to me to be a very useful response to a technical issue.

dawhead said...

@hannu: this model has worked perfectly for all developers and applications


i would wager that the vast majority of developers of audio software do not use ALSA or OSS directly at all, making this discussion rather irrelevant from a development standpoint. in fact, the desktop environments seem very keen for all their developers to use a library for all audio i/o (e.g. phonon in KDE), and not use ALSA or OSS at all. This is good news, since that library can be made to do all kinds of cool things, very few of which involve ALSA or OSS.


except a few "LAD" rebels who would like to do "serious audio and music production" using Linux which is a general purpose operating system.


Maybe because it's so general purpose that, like OS X (and these days even Windows), it can actually be used for this purpose?


May I ask why do you have chosen Linux which is being developed for entirely different use? Why didn't you select a true real-time operating system?


Why don't you ask the embedded Linux market (supposedly the 2nd biggest marketplace for Linux) the same question? Our reasons would be quite similar, but theirs are better documented.

You push linux kernel developers and distribution vendors to add real time features to the kernel images so that your solution could work.


Oh yeah, we have so much control over what they do. In fact, we even threatened to execute a linux kernel developer every day until all of Ingo Molnar's RT patch was merged. Unfortunately, it seems that the embedded Linux market has even more gangsters than the vicious LAD community, and so we've run out of guys to shoot. Not too shabby though, since most of his work is now merged. And it seems that users, distributions and most linux kernel developers are very happy with that situation. You're not?


And of course it's everybody else's fault if they don't understand your ideas. You are the Linux audio developer community. You are the only guys who do something that matters. All the others are just lamers who work on their silly simple media players that don't do anything impressive.


Do you have to be so asinine? I have nothing but respect for the brilliant guys who have worked on Amarok, Rhythmbox and several other similar applications. I use Rhythmbox every day and I find it a very impressive program. I've even worked on the gstreamer JACK plugin so that applications like this can be integrated into JACK-based workflows, because many users have asked for this and it seems like a good idea. BUT neither those guys nor myself are under any illusion that our needs as audio software developers are really that similar. They have to care about tricky stuff like file formats, id3 tagging, playlist management and so on; I have to care about latency, disk i/o bandwidth, and squeezing every last drop out of the CPU while handling audio. When I talk with people who write this kind of app (which I do quite often), we don't pretend that our applications are the same, yet I detect some sense that you do. Why?

segedunum said...

is it necessary to be so negative? do you understand what "glitch free mode" is, from a design perspective?

Well yes it is, because the state of audio as shipped in distributions today is just so exceptionally poorly put together. I'm not so interested in 'glitch free mode' as a design perspective as I am in how it practically works in Pulse and whether anyone feels the benefits. They don't.

The bottom line is that for many use cases glitch free audio just doesn't seem to work and latencies are just too high.

That's the problem that many people seem to have here, including Theodore. They start moving off on tangents about design perspectives and community support when it's a question of looking long and hard at what actually works better from an end user perspective.

Hannu Savolainen said...

dawhead wrote:

>@hannu: ok, so after that rant, in which you seek to defend the use of the unix file API for audio, perhaps you can explain which Unix system uses it for video.

First of all my rant was not for you but for the LAD/ALSA centric community in general.

I recall that the DVB (digital video broadcasting) API for Linux uses the file API. I have not played with it for years so I cannot remember it precisely. However, I don't see why read/write could not be used for MPEG 1-4 or MJPEG streams.

Using read/write for uncompressed realtime video is not a good idea because there is too much data to copy around. However, mmap with poll/select and a few ioctl calls should be perfectly suitable for video. My understanding is that V4L2 works in this way.

dawhead said...

@hannu: no, i do not mean "playing video files". I mean interacting with the video interface. Yes, the fbdev interface exists - I don't see you arguing that apps should be using this instead of X Window. Or are you?

dawhead said...

@hannu: to be more specific - the main reason i have issues with OSS is the same reason that fbdev is not a good example of an API for a program that wants to put something on the display.

Hannu Savolainen said...

dawhead said:

"@hannu: no, i do not mean "playing video files". I mean interacting with the video interface. Yes, the fbdev interface exists - I don't see you arguing that apps should be using this instead of X Window. Or are you?"

I was not arguing about that.

Both the interfaces I mentioned (DVB and V4L2) are device interfaces for TV tuner and frame grabber hardware.

dawhead said...

@hannu: right, well if you're not arguing that fbdev is an appropriate general API for applications that want a GUI, perhaps you could explain what is so different about that compared to audio. From my perspective, they are more or less identical - a shared hardware resource, whose operation is based on regular processing of a data buffer, and a desire by different applications to display various things on it. Why does the OSS API make more sense for audio apps than fbdev (or something like it) does for GUI apps?

Joe P. said...

It seems like no one is getting it but segedunum and me, and maybe Theodore a bit in that one post where he managed to stay on-topic. I can't take much more of this incessant bickering, so please allow me to bludgeon you all over the head with the point of this article:

Sound Systems and End Users
Sound Systems and End Users
Sound Systems and End Users

I feel like my previous comments have been mostly ignored, because they're not about the "hot topic" that you all seem so keen to discuss, but perhaps this will finally get through to you all that this article is clearly not meant as a forum to debate the tired old argument of which API developers should use, but about providing a choice to the end users of which sound system they want to use. (See my previous comment for more details and less bludgeoning.)

dawhead said...

@joep: so, let's take a concrete example. suppose end users want to route audio between applications. or suppose users want to route audio over the network. or suppose users want to switch an audio stream from one soundcard to another (e.g. USB headset to the built-in one or vice versa).

whether or not the applications can do this/be used like this depends in very large part on which audio API they are written to use. if they are written to use the OSS API, then this kind of functionality will be a little tricky to implement, though it can be done (PulseAudio attempts to do so, as do a few other system infrastructure attempts). If they use the ALSA API, then it's a little easier, though still far from trivial, especially in terms of user/system configuration.

offering users a choice of "which sound system they want to use" has essentially nothing to do with this kind of issue, especially if the choice is OSS or ALSA.

3 years ago I was at a Linux Architects meeting where I sketched out about a dozen basic tasks that users needed to be able to accomplish with the audio system. We are not substantively closer to that goal today. PulseAudio has aimed at trying to make things better - many people seem to feel that it has made them worse. And PulseAudio is currently explicitly not targeting the needs of pro-audio/music creation users, which just continues to keep things confused and maddening.

Joe P. said...

dawhead, your statement that "whether or not the applications can do this/be used like this depends in very large part on which audio API they are written to use" can only mean that you once again missed the crucial part of this article. It's the part about how applications written with any API can be run using any sound system, and it's been said multiple times throughout this trainwreck of a thread. Yes, even "evil" OSS applications can be run in your "almighty" ALSA and you can get all of the "superior" ALSA benefits if you prefer. As long as a program is outputting to the sound system that has a given feature (in your case, ALSA), the API used to create that program is ultimately irrelevant to whether that feature can be used. Likewise, the sound system being used for output is irrelevant to the ability to use API-side features.

The one intelligent thing you said is that "offering users a choice of which sound system they want to use has essentially nothing to do with this kind of issue", though it's better put as "this kind of issue has essentially nothing to do with offering users a choice of which sound system they want to use". You are correct that they are completely separate. This discussion is about the latter, not the former.

That said, offering users a choice of sound system will not in the least bit affect what API is the best for writing different types of applications, nor will it limit the ability of developers in any way. If a certain API has a feature that a developer wants to make use of, then (s)he should by all means use that API. If a certain sound system has a feature that the end user wants, then (s)he should of course use that sound system. Even in a case where a developer needed features from the OSS API, and an end user would prefer to use the ALSA sound system, there is no conflict. There is no practical overlap between what API is used and what sound system is used. Historically there was such a connection, but by this point, both systems are cross-compatible. Anyone who says otherwise is living in the past.

It might make it easier to understand if you visualize the APIs as completely different entities than the sound systems of the same name. Forget for a moment everything you know about sound systems, sound APIs, companies, and histories. Now, imagine I'm giving you the ALSA sound system and telling you it's called "AudioCenter". I'm giving you the OSS sound system and telling you it's called "OuterSound". Finally I'm giving you the ALSA API, and telling you it's called "AllSound". Now I tell you that AllSound can be set up to output to either AudioCenter or OuterSound. So, with such an abstraction, is there really a connection between AllSound and AudioCenter anymore?

In closing, giving developers the choice of which API to use and giving end users the choice of which sound system to use can only serve to provide more flexibility for both developers and end users, not less. The choice of API has been available to developers for a long time. That flexibility is already in place. It's time to make the choice of sound system readily available to the end user.

Hannu Savolainen said...

dawhead said:

"@hannu: right, well if you're not arguing that fbdev is an appropriate general API for applications that want a GUI, perhaps you could explain what is so different about that compared to audio."

"arguing" is not the proper word. I didn't comment on that subject at all. I simply have never used fbdev for any purpose and I don't know how it's implemented.

I don't know what you are aiming at. Comparing audio and frame buffer is like comparing apples and asteroids. It doesn't make any sense.

dawhead said...

@joep: I sadly think you're mistaken.

It is true that there are hacks to re-route the audio out of an application written using the OSS API and do something else with it than the designers of OSS intended. but these are distinctly hacks (necessitated largely by the fact that the OSS API mandates direct use of system calls, not a user space library/API). it's partly because these are hacks that the attempts to make things like PulseAudio work are so difficult.

Your assumption that developers can use "whatever API they choose" and that this can mesh nicely with users picking "whatever sound system they choose" is, I think, flawed at the outset.

It is only if applications are developed with an API that specifically allows interception of the audio streams, mixer control requests and so forth, that a "sound system" as you put it can effectively be used by the user to get what they want done.

The only way that I can agree with what you seem to be claiming is if by "sound system" you simply mean "kernel drivers". If this is all you mean, then your vision of things is certainly technically possible, but hardly does anything to solve the vast majority of the problems that Linux users have with audio.

dawhead said...

@hannu: oh, they are very related. I outlined the relationship in my last comment about this. Let me repeat:

* a shared hardware resource (video or audio interface)
* device operates by periodically taking the contents of a data buffer and pushing it to a device humans can perceive (screen or output connectors/speakers)
* multiple applications that want to put their "output" into the user's sensory experience

You have advocated that if an application wants to do this with audio, it should use system calls to access the device. You've provided several reasons why you think this is a good idea. You've even insisted that this is part of the "unix way" of doing things.

Yet for accessing the video interface, nothing but extremely special purpose applications ever access the device in this way. They use a highly abstracted, rather complex server/client API called X Window (or, on OS X, a similar system called Quartz). This allows many different things, among them device sharing, remote network access (in the case of X), and a conceptually abstract API for developers. These days, it even allows, literally, for the output of different applications to be "mixed" together (though it always did that to some extent).

If it's really so great that programs should access the audio device directly via a simple set of system calls, then why shouldn't they do the same thing for video? Why has every unix operating system adopted some kind of server/client API for video that abstracts the device and interposes between applications and the device driver? And why wouldn't any sane unix system do the same thing for audio, where the user's goals and the system design goals are more or less identical?

Hannu Savolainen said...

dawhead wrote:

"If its really so great that programs should access the audio device directly via a simple set of system calls, then why shouldn't they do the same thing for video? Why has every unix operating system adopted some kind of server/client API for video that abstracts the device and interposes between applications and the device driver? And why wouldn't any sane unix system do the same thing for audio, where the user's goals and the system design goals are more or less identical?"

X was created 25 years ago. At that time graphics hardware was very limited. Video frame buffers were just flat pixel arrays. The only way to manage multiple windows by multiple applications was to use a central display server that performs all the drawing operations. However, if X were designed today, the result would probably be different. Current accelerated graphics adapters have the capability to manage multiple windows at the hardware level. There is no longer a need for a central server that does all the drawing in software.

A modern version of X would handle local windows locally without using a monolithic central server. Support for remote displays will still require a server to handle them. However, this can be handled easily at the library level.

The same is true for audio. There is no need to use any sound server for mixing audio sent to local sound cards. Connections to remote audio devices can be handled transparently. OSS uses a loopback device for this. The application doesn't need to be aware of that. Only the remote connections are handled by a server listening to the server side of the loopback device.

dawhead said...

@hannu: congratulations on completely evading the point. I was not discussing the internal implementation of X11 or X12.

I was pointing out the APIs that are used to access the video interface even on Unix systems look absolutely nothing like the file API that you champion as "the right way to do things". Moreover, just about nobody has suggested moving X functionality into the kernel "because there is no need for a server" (anymore).

It doesn't matter (much) how any particular functionality is implemented inside the audio system, just like it doesn't matter (much) whether it's software or hardware doing compositing. The point is that you claim the right way to access a hardware device for audio in a unix system is via the file API. Yet for years now, and as far ahead as anyone can see, accessing video in a unix system is (almost always) done via high level APIs that have no relationship to the Unix file API. Since you are the one who designed this state of affairs in the first place (when you designed OSS), I think it is perhaps incumbent on you to justify the difference. In your defense, I must note that Sun did essentially the same thing, but this is not much of a defense when you consider all the other things that Sun tried and eventually dropped or buried :)

You say "there is no need for .." and "we do this in OSS", which are just tautologies. It's precisely because you insist on doing everything behind the file API, instead of in front of it, that you can make these statements. The fact that the OSS architecture doesn't include a sound server doesn't mean that a sound server is automatically useless. The fact that OSS just does everything inside the kernel doesn't mean that there are no arguments for doing things elsewhere. The design of X recognizes this; despite massive changes in video hardware design over the years, nobody is seriously suggesting changing this most fundamental aspect of its design.

Hannu Savolainen said...

dawhead> You say "there is no need for .." and "we do this in OSS", which are just tautologies. It's precisely because you insist on doing everything behind the file API, instead of in front of it, that you can make these statements. The fact that the OSS architecture doesn't include a sound server doesn't mean that a sound server is automatically useless. The fact that OSS just does everything inside the kernel doesn't mean that there are no arguments for doing things elsewhere. The design of X recognizes this; despite massive changes in video hardware design over the years, nobody is seriously suggesting changing this most fundamental aspect of its design.

Could you be a bit more specific? I still have no idea what kind of video operations you mean. However, let's try yet another time.

I have already said that comparing audio and video is like comparing apples and asteroids. They really are from two different planets.

The lowest level (kernel) API for a graphics card is the frame buffer, which is pretty much useless for 99% of applications. For sure the higher level primitive operations for graphics don't look anything like the Unix file API. However, at the lowest level you still have open, close, mmap and ioctl.

You cannot use this as any proof of concept for audio. The lowest level Unix device/file API is perfectly suitable for 99% of audio applications. All they need is the capability to play and record audio streams.

There are higher level operations like playing/recording audio files, sound generation/synthesis, filtering/effects (plugins), MP3 encode/decode and so on. The APIs for them can be implemented using libraries, servers or whatever. However this doesn't mean that the lowest level audio operations should be moved to a library too.

All good software design is based on layers. Lowest level operations are performed on the lowest layer. Higher level operations are performed on a higher software layer that uses the services of the lower one. In this way applications can select the API that works on the right abstraction level for them. Higher layer implementations can be replaced or there can be multiple implementations installed at the same time.

X is not an example of good design. It has not survived 25 years because the concept is superior. It has survived simply because it cannot be replaced.

Is this what you wanted me to say?

insane coder said...

Hi dawhead.
>you talk as if "just fixing the underlying system" is somehow a simple task

I never said it was simple. However, I believe the problem needs to be fixed lower down as opposed to hacks on top - just fix the underlying system, and only that. Do you disagree with this?

>I have said many times (even within this thread) that when Apple switched to CoreAudio (an audio subsystem and API that seems to work very well for just about any purpose) that they had the ability to simply bludgeon all their developers over the head with the new API.

And this means what to you exactly?

First off, Apple's sound system only works on a handful of sound cards; if you hack OS X to run on non-EFI hardware, you'll find very few sound cards provide correct sound. It is much easier to design a sound system that works if you limit your hardware/software package to only cards that you view as behaving correctly. However, in the larger market, we know hardware lies about its capabilities all the time, and does things differently than advertised. For us to maintain compatibility with the vast number of cards we do, we need to maintain algorithms which are fault tolerant, and once we design things to be fault tolerant, we also give developers leeway to make more mistakes than they should, which isn't necessarily a bad thing. DWIM-style programming is great for those that don't want or particularly need to fully understand the system they're dealing with.

Second of all, you act as if CoreAudio is the only thing OS X has, which is blatantly false. OS X supports CoreAudio and OpenAL side by side. OpenAL is a full-fledged citizen under OS X. I will also remind you OpenAL uses a push architecture. And regardless of this, wrapper libraries such as SDL and libao and others all exist on OS X. There doesn't need to be any bludgeoning to only use a particular set of APIs. Where did you get this idea from? Maybe internally that's true, but it's hardly true for those writing user-space software.

>We simply don't have that choice in Linux. Even though there were many perfectly good technical reasons to replace OSS with something else back the late 1990's, even the replacement continued to be required by everyone to be compatible with the OSS API. There was never any question of simply coming in and saying "the old API is gone, use the new one".

Explain the advantages of getting rid of an old API in general. Then see if you can make that case for audio, and have all the points hold true.

Things should be compatible with the OSS API for two reasons: first, it's less platform dependent; second, it's easy. Unless you can provide me a solution which is both easy and works on all the UNIXes I develop for, I won't use your solution. I don't use the ALSA API because it's not easy (as in: being able to properly design sound output in a short time after reading documentation, and feel reasonably sure sound will work properly on other machines too), and because I'm already using OSS for other systems, which don't provide ALSA.

insane coder said...

Now of course perhaps I should use OpenAL or something related instead, for portability and the better documentation and examples out there, except if I use OpenAL on Linux, ALSA works horribly, and users complain sound is bad. If the ALSA team can't ensure the wrapper libraries support it well, then using the wrapper libraries isn't an option either. Which leaves me with the OSS API, which works well on top of ALSA too.

And there's a reason why we need compatibility with the OSS API: there are many programs, closed source ones at that, that only work with OSS. We really don't need to go around breaking the few commercial applications that exist on Linux, driving away the larger commercial markets (and if you believe we do, please let me know, so I can put you in my RMS mentality book).

Coming in and saying the old API is gone, use the new one, is bad in principle. It's also bad in practice with ALSA, because the ALSA API is junk for the average developer, whether directly or indirectly.

>Simply getting developers of all kinds of audio software to agree on what the audio subsystem should look like is almost impossibly hard. Game developers want X, pro-audio developers want Y, desktop app developers want Z. If someone comes along and says "You should all be using A", the response will be the same as it has been all along: each group will continue to use the APIs and subsystems they have already developed.

Correct, which is why the UNDERLYING SYSTEM should be fixed, and then everything else can run on top of that.

>It is hard to imagine a system that is so much better than the specific APIs generally used by each development group that everyone will be convinced to switch to it.

The APIs themselves are fine as they are, a developer can find one to match his needs today quite well.
It is easy to imagine a system that is better than what we have today.

An ALSA which always has sound mixing, is of higher quality, and doesn't start to go haywire under high CPU load would be much better than what we have today, and if that were the state of things I wouldn't complain.

If OSS had suspend/resume, good support for the MIDI adapter on sound cards, and integrated with distros better, I wouldn't complain either.

It's quite easy to envision that.

What you would like on the other hand seems to be a clean room where nothing existing works, and everyone will be annoyed that they have to do things your way, which not even Apple took that far.

>When Apple and MS do this kind of switch, they do it by forcing the switch

Oh? When MS decided DirectSound is no longer "THE API" to use, they didn't force everyone to stop using it, they just moved it to a higher level in the stack.

>Who do you think in the Linux world is capable of imposing that kind of switch?

Better yet, who do I think besides you believes we need some massive destroying backwards compatibility switch?

insane coder said...

Hi Theodore.

>Well, I tend to do actual kernel development, so if it doesn't work with the bleeding edge kernel, it's not very useful for me.

I understand that. However if you use bleeding edge, then also use bleeding edge OSS direct from their repository, which matches Linux changes pretty closely. Furthermore, if you're a Kernel developer and don't like that something broke, and it's within your aptitude, fix it!

In any event, we agree OSS isn't for everyone, and that's fine.

>But hey, great, people can use whatever they want.

If only it was easier to use whatever they want.

>As dawhead has already pointed out, this isn't likely going to change the API's used by most of the applications.

Which I already pointed out is irrelevant.

>But maybe that won't matter; you've made the claim that for at least some cards, using the ALSA libraries with the OSS back-end is still preferable

Others have agreed too.

I guess I see what you're saying, though: we see people everywhere, even Reece in this thread, who seem to find OSS has better sound for them, yet we have no idea which sound cards any of these people use.

Hi Mackenzie.
>Welcome to 2009. Suspend and resume do tend to work very reliably in the Linux world these days.

Lucky you. Even on my machines that use ALSA, I only have one which I've noticed reliably uses suspend.

Even if you need to use suspend, you can just tell OSS to go to sleep prior to suspending and avoid any issues.
In any event, we agree OSS should have suspend support added. I also see no problem with anyone feeling they prefer ALSA, before or after OSS gets suspend; if you enjoy it and have no issues, then by all means keep ALSA.

It just shouldn't mean that those of us who prefer OSS, and don't mind the suspend issue, can't easily use OSS.

insane coder said...

Hi dawhead.
>This is not something where ALSA really does any better (or worse) than OSS, but where both CoreAudio, JACK and even ASIO do rather well - you have to work hard to mis-design your application when using those APIs.

Are you certain that you can even design certain applications to use those APIs?
Apple acknowledges that designing certain applications around CoreAudio would be very hard if not near impossible to do, and therefore offer OpenAL for applications whose entire design would run counter to how CoreAudio works. See what sinamas said above about applications that have to respond to several pieces of hardware, not just audio.

>yes, the OSS API (also known as the Unix file API) abstracts away so much that you almost cannot tell it's an audio interface

Which is a problem how? Very few applications need to know otherwise.

>if that you don't provide an API that encourages the right kind of software design for genuinely demanding yet still reasonably common cases, you end up with a really nasty situation, just like the one we have on Linux today.

I disagree strongly with this.
I believe the nasty situation in Linux today is subpar sound quality with certain setups, and APIs that are too hard to use correctly, while the solutions to those problems lack niceties in other areas. Maybe in your field you can attribute everything to the API design, but can you apply that to my field?

>Contrast this with CoreAudio - every application is required to process audio via a callback that is timer-driven.

Or use OpenAL, CoreAudio's sister which means every application does NOT have to process audio via a callback that is timer driven.

Enforcing timer driven callbacks is a VERY bad idea. Once you have two of these in an application, your application is dead. Such a system only works right if there's only ONE device doing this.

>Each application can layer additional layers of buffering above the callback if it chooses.

Hey, here's an idea, why not offer this additional layer right within the sound system?

>The evidence is that almost all developers will get either the basic design, or specific details, or both, wrong whichever one they choose.

How much evidence is there that most of these developers will screw it up to the extent that sound doesn't work on a large percentage of systems with OSS? Indeed way too many OSS applications worry about blocking/nonblocking audio, but do they screw it up to the extent that their implementation can be viewed as barely working?

insane coder said...

>ok, so after that rant, in which you seek to defend the use of the unix file API for audio, perhaps you can explain which Unix system uses it for video.

Maybe you should look into Plan9.
In any event, one doesn't have anything to do with the other. Video in 3D, or widget-wise, is complex, and a file API doesn't make much sense there.
Unless you mean pure video where you're constantly posting a series of frames, and only dealing with frames, in which case a file API makes a ton of sense, and there's no reason why pure video shouldn't use a file-like API. A file API would be a very welcome solution for video streams as opposed to the disaster that is the X API for dealing with it.

On the other hand, if I follow your logic that I shouldn't use a file API to work with something which can be abstracted to passing a series of data, and I must use something higher level, maybe we shouldn't use file APIs for files either? After all, file systems and hard drives have all kinds of timing and buffering issues as well. Recent sync issues on particular file systems stirred up a lot of controversy.

Hi Joe.
Please don't spam the thread like that. While we agree in principle, there are other issues unrelated to the end users being discussed here which are important (yet I agree it is off topic).


Hi dawhead.

> so, lets take a concrete example. suppose end users want to route audio between applications. or suppose users want to route audio over the network. or suppose users want to switch an audio stream from one sound card to another (e.g. USB headset to the built in or vice versa). whether or not the applications can do this/be used like this depends in very large part on which audio API they are written to use. if they are written to use the OSS API, then this kind of functionality will be a little tricky to implement, though it can be done (PulseAudio attempts to do so, as do a few other system infrastructure attempts). If they use the ALSA API, then its a little easier, though still far from trivial, especially in terms of user/system configuration.

Please look at how Plan9 does audio. Everything is via an easy UNIX interface, and they support every trick in the book in regards to routing, mixing, on the fly stream moving, and everything else.

insane coder said...

>It is true that there are hacks to re-route the audio out of an application written using the OSS API and do something else with it than the designers of OSS intended. but these are distinctly hacks (necessitated largely by the fact that the OSS API mandates direct use of system calls, not a user space library/API). its partly because these are hacks that the attempts to make things like PulseAudio work are so difficult.

Plan9 provides a lot of tricks which makes it easy on the developers, and also gives a lot of power to the end user, without needing any massive bloat like PulseAudio, or trying to hijack open()/ioctl() calls. In fact Plan9 is better in these regards, because they provide more power at the virtual file level. Linux needs to copy more from Plan9, stopping with /proc or whatever they copied last isn't enough.

>I was pointing out the APIs that are used to access the video interface even on Unix systems look absolutely nothing like the file API that you champion as "the right way to do things"

Which says nothing towards their correctness. Why do you equate what is done with what is right? X11 for example is very very wrong.

>Yet for years already passed, and as far out as anyone can see, accessing video in a unix system is (almost always) done via high level APIs that have no relationship to the Unix file API

Which for pure video is a mistake.

>The design of X recognizes this; despite massive changes in video hardware design over the years, nobody, is seriously suggesting changing this most fundamental aspect of its design.

I don't know which planet you've been on for the past 25 years, but since X came out, to even today, there has been a non-stop stream of criticism of the way X does things. Developers especially are annoyed at developing with X for widgets (they only use wxWidgets, GTK, or Qt), and for video, XVideo is poorly documented and also annoying.

insane coder said...

Hi Hannu.
>I have already said that comparing audio and video is like comparing apples and asteroids. They really are from two different planets.
The lowest level (kernel) API for a graphics card is the frame buffer which is pretty much useless for 99% of applications. You cannot use this as any proof of concept for audio. The lowest level Unix device/file API is perfectly suitable for 99% of audio applications.

That is very much what I was trying to say. Thanks for simplifying it well.

We need to look at what works best for the vast majority when comparing, as opposed to looking at special cases, using them as the norm, and then going so far as applying those warped ideas elsewhere.

>There are higher level operations like playing/recording audio files, sound generation/synthesis, filtering/effects (plugins), MP3 encode/decode and so on. The APIs for them can be implemented using libraries, servers or whatever. However this doesn't mean that the lowest level audio operations should be moved to a library too.

I strongly agree with this. May I ask you then why there is AC3 support in OSS? Is it only for cards that support it in hardware?

>X is not an example of good design. It has not survived 25 years because the concept is superior. It has survived simply because it cannot be replaced.

Hopefully that will change - eventually. X has become less relevant over time thanks to GNOME and KDE, and to embedded systems that still have video. I don't think this will be solved for many years to come though.

dawhead said...

@insane coder (and hannu): let me try to clear up the comparison with X, because I don't want to get totally derailed by this.

Find me a single critique of X by anyone who actually knows what they are talking about in which they propose to replace the basic model of (a) a client/server design (b) a very abstract API that looks nothing like the file API. Yes, there are many deep and very fair criticisms of the design of X (some of the best come from people who were involved in that process), but I have never seen any that suggest that these two basic features were a mistake.

Now, Hannu and you are both suggesting that OSS should be available as the kernel-side part of the audio stack, and you've both suggested that it's OK or even desirable to layer additional APIs above it that take care of various "higher level needs".

Yet at the same time, you both keep talking about applications that directly use the OSS API. Hannu has even suggested that it's probably the best API for "simple audio playback".

This is precisely the disjunction I am talking about. On some level, certainly as an application developer, I don't care what the implementation of the device drivers looks like. But the advocates for OSS seem to keep mixing up its status as a device driver implementation with an audio stack that uses the Unix file API as *its* API.

Ted Tso has explained rather well the technical and socio-cultural reasons why having OSS inside the kernel is problematic. I've tried to explain why having applications use the Unix file API for audio I/O is problematic. You guys have even gone as far as acknowledging that many/most/all applications are really better off using some higher level API. If this was the case - at least if it was universally the case - then this higher level API could have low level I/O sinks/sources that talk to OSS, to ALSA, even to audio stacks on other platforms. Wow, that's a revolutionary concept :) The key point is that it would no longer matter to the applications which kernel driver implementation was present or what the kernel/user-space API looked like.

But instead, Hannu continues to advocate that the OSS API is a good API for programs to use for "simple playback", which means that any attempt to add some higher level API (Phonon, GStreamer, OpenAL, JACK, SDL ....) has to deal with the fact that some applications will just short-circuit it and go straight to the file API represented by OSS.

result? a mess unless everyone has agreed that OSS is the right choice, which it is abundantly clear that they will not. there are lots of messes like this on unix systems, but this one is particularly bad because the problem occurs at the kernel/userspace boundary, where "fixing" the issue is particularly hard (it requires the kind of functionality represented by FUSE, for example).

dawhead said...

@insane coder: i am in a relatively unusual position of being the author of components of all 3 levels of an audio stack used by many people. these include:

* kernel device drivers (for OSS and ALSA)
* a mid-level audio API (JACK, with backends for OSS, ALSA, CoreAudio, ASIO etc)
* audio applications, including Ardour, a digital audio workstation

From this perspective, I am required to really take a strong position when people make claims about OSS "sounding better", and I discussed these way back in the ~170-comment stream attached to this blog. Even if these differences have been properly established, all this says is that the ALSA driver is misconfiguring the hardware mixer and OSS is getting it right. This is really nothing more than a bug that needs fixing in ALSA - it's not an argument in favor of replacing the kernel drivers. I understand that for regular users who are simply frustrated by whatever it is that ALSA may (or may not) have misconfigured, this isn't much of an answer.

>Are you certain that you can even design certain applications to use those APIs?


Absolutely certain.


>Apple acknowledges that designing certain applications around CoreAudio would be very hard if not near impossible to do, and therefore offer OpenAL for applications whose entire design would run counter to how CoreAudio works.


Yes, and I welcome such designs openly. Layering a push-model API on top of a pull-model API is pretty easy to do (not always quite so easy to get it perfectly right, but certainly much easier than the other way around). I have tried to make it clear that although I think that the lower levels of the audio stack should use a pull-model, having libraries that add a push model above it is a good idea. That said, many applications would get better performance if they used the pull model as well.
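To make that concrete, here is a minimal, single-threaded sketch of a push-on-pull adapter (all names are invented for illustration, not taken from any real library): the application pushes samples at its own pace into a ring buffer, while the engine's pull-style callback drains it, padding with silence on underrun.

```c
#include <stddef.h>

/* Illustrative sketch, not from any real library: a ring buffer that
 * lets a push-style writer (the application) feed a pull-style
 * consumer (the audio engine's callback). A real implementation
 * would use atomic indices so the two sides can run in different
 * threads; this version only shows the control flow. */

#define RB_SIZE 1024 /* must be a power of two */

typedef struct {
    float data[RB_SIZE];
    size_t write_pos;
    size_t read_pos;
} ring_buffer;

/* Application side: push as many samples as fit; returns count stored. */
size_t rb_push(ring_buffer *rb, const float *src, size_t n)
{
    size_t stored = 0;
    while (stored < n &&
           ((rb->write_pos + 1) & (RB_SIZE - 1)) != rb->read_pos) {
        rb->data[rb->write_pos] = src[stored++];
        rb->write_pos = (rb->write_pos + 1) & (RB_SIZE - 1);
    }
    return stored;
}

/* Engine side: the callback pulls exactly n samples; on underrun it
 * pads with silence instead of blocking. */
void rb_pull(ring_buffer *rb, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (rb->read_pos == rb->write_pos) {
            dst[i] = 0.0f; /* underrun: silence */
        } else {
            dst[i] = rb->data[rb->read_pos];
            rb->read_pos = (rb->read_pos + 1) & (RB_SIZE - 1);
        }
    }
}
```

Going the other way, emulating a pull model on top of a blocking push API, needs a dedicated thread plus timing logic, which is why it is much harder to get right.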


>See what sinamas said above about applications that have to respond to several pieces of hardware, not just audio.


I'm quite familiar with the problems involved here, and there are very long-standing engineering solutions for dealing with them (primarily using "delay locked loops"). Unfortunately, many programmers (including myself) don't have the background in the right parts of engineering to know about these solutions, and so we fumble around in the dark for quite some time. I personally didn't understand how to sync two different data streams (video/audio) correctly from a design perspective until I had been writing audio software for about 5 or 6 years. The design of CoreAudio (or any other pull-model API) is absolutely not an impediment to this issue.
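A toy version of that technique, written as a generic second-order delay-locked loop rather than any particular project's code: each audio callback reports its measured arrival time, and the loop filter tracks the device's true period despite scheduling jitter. The gain formulas are a common textbook form and the bandwidth value is an illustrative guess, not tuned for any real driver.

```c
/* Generic second-order delay-locked loop (DLL) sketch. */

typedef struct {
    double next;   /* predicted arrival time of the next callback */
    double period; /* smoothed period estimate */
    double b, c;   /* loop filter gains */
} dll;

void dll_init(dll *d, double now, double nominal_period)
{
    double w = 6.283185307179586 * 0.1; /* 2*pi * bandwidth (in callbacks) */
    d->b = 1.4142135623730951 * w;      /* sqrt(2)*w: critically damped */
    d->c = w * w;
    d->period = nominal_period;
    d->next = now + nominal_period;
}

/* Feed one measured callback arrival time; returns the filtered
 * period estimate, which settles near the hardware's true period. */
double dll_update(dll *d, double now)
{
    double err = now - d->next; /* phase error vs. prediction */
    d->next += d->period + d->b * err;
    d->period += d->c * err;
    return d->period;
}
```

To sync two devices (or audio against video), you run one loop per clock and resample or schedule by the ratio of the period estimates; nothing about a pull-model API prevents this.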

>Enforcing timer driven callbacks is a VERY bad idea. Once you have two of these in an application, your application is dead. Such a system only works right if there's only ONE device doing this.


I'm sorry, but this is simply and demonstrably false.

>Hey, here's an idea, why not offer this additional layer right within the sound system?

Whether or not you like the design of the library they provided (which most people do not), this is precisely what ALSA was trying to do. Unfortunately the project has never had enough manpower, and instead of working on refining and extending the capabilities (and even design) of libasound, the developers that do exist have to continue dealing with driver-level issues (largely caused by the nightmare of the Intel HD spec and USB "quirks").

From the evidence before us, I am not sure I would have been so thrilled if libasound had expanded in this direction, but certainly the basic idea was there.

Theodore Tso said...

Insane Coder,

>We need to look at what works best for the vast majority when comparing, as opposed to looking at special cases, using them as the norm, and then going so far as applying those warped ideas elsewhere.

That's oversimplifying the problem, unfortunately. It is true that we need to provide some solution that works well for the vast majority of the common cases ---- but we have to accommodate the special cases as well. That's something which is desirable for the entire, overall system.

At the kernel level, the key assumption has always been that making sure the API is flexible enough to handle the special cases is _the_ highest priority. We want to minimize the number of interfaces we export from the kernel, and we want to keep them stable.

Whether or not this is via a straightforward "Unix" file-oriented API is somewhat beside the point. Applications, for example, don't use a simple Unix API for printing any more. Why is that? Because printing has gotten too complicated for a simple Unix file-oriented API. Some printers have multiple output trays; some printers use Postscript; some use PCL; some use other formatting languages altogether. Some printers are accessed via USB; some via a raw TCP/IP socket; some via the Internet Printing Protocol. And so applications use a much more complicated, much richer library interface, and most people would say that this is appropriately so.

So yes, we need a simple interface for programs whose needs are simple. True. But whether that interface must be the kernel-exported interface is a different question. The advantage of doing it in a userspace library is that library can be more easily changed than kernel code.

As far as your claim that some libraries don't have very good sound when ALSA is used as a backend, that doesn't prove that ALSA is at fault; it may just be that the particular library's backend driver for ALSA is crappy. The fact that JACK is able to produce professional-quality sound via ALSA's kernel interface tends to indicate to me that it is the sound libraries which are at fault. Maybe ALSA could make things easier for the sound libraries' programmers if things were better documented, but at the end of the day, this is something which is fixable; and, I would argue, the easier problem to solve.

>Now of course perhaps I should use OpenAL or something related instead for portability, and better documentation and examples out there, except if I use OpenAL on Linux, ALSA works horribly, and users complain sound is bad.

So maybe the answer is that we should fix OpenAL. As Hannu has pointed out before: in 1ms sound travels 34 cm; in 10ms sound travels 3.4 meters --- and people don't complain about the latency induced by the placement of their computer speakers, or the fact that instruments in the back row of an orchestra are delayed by 10ms from the instruments in the front row. And 10ms is plenty of time to do software mixing and one or more layers of software libraries. If there's a problem, in all likelihood it's in the design and/or implementation of the libraries.
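The arithmetic behind those figures is easy to verify; a small sketch (helper names are mine, with the speed of sound taken as roughly 343 m/s in air):

```c
/* Convert audio latency to the equivalent acoustic distance, and
 * compute the latency implied by a driver fragment size. Both are
 * simple arithmetic; 343.0 m/s is an approximate speed of sound. */

double latency_ms_to_meters(double ms)
{
    return 343.0 * ms / 1000.0;
}

/* Latency in milliseconds of one fragment of `frames` frames at the
 * given sample rate. */
double fragment_latency_ms(unsigned frames, unsigned rate)
{
    return 1000.0 * (double)frames / (double)rate;
}
```

For instance, a 512-frame fragment at 44100 Hz comes to about 11.6 ms, which is the scale of budget being argued about here; latency measured in hundreds of milliseconds or seconds is a different class of problem entirely.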

dawhead said...

@insanecoder: i don't think that citing what Plan 9 does makes a huge amount of sense. i've used plan 9 - i even did some hacking on it back in the mid-90's, and although its a very elegant piece of design, it has its own set of problems too. i thought we were attempting to talk about real-world solutions and designs. Although the world might be a better place if everything was done the way Plan 9 does things, no existing platform is close enough to Plan 9's internal design to make using some of its higher level ideas particularly easy to do.

Finally, in all my comments about the "video API", I guess I was not clear enough. At no time was I discussing the API for handling video data. I was referring to APIs for interacting with the video interface, which for 99% of all apps means putting up a GUI.

Reece Dunn said...

@insane coder: "I guess I see what you're saying though that we see people everywhere, even Reece in this thread seems to find OSS has better sound for him, yet we have no idea which sound cards any of these people use."

I am using the intel-hda driver for a HP pavilion dv9000 laptop running on Ubuntu 9.04 with Linux kernel 2.6.28.x and the proprietary NVidia drivers. If there is anything else that you need (lspci, etc.) let me know - I can get this later on today.

The PulseAudio issues I was having may be ALSA/kernel related. One thing that I have noticed (and you have mentioned in the article) is that it is less processor intensive, which means that it is running quieter (fans are less active now ^_^).

Reece Dunn said...

@dawhead

A file-based (or file-like) API is very simple to grasp and code to. Of the APIs I have used (OSS, ALSA, PulseAudio using its file-like API), ALSA is the hardest to get working.

The advantage of a complex API is that it can properly model the capabilities and functionality exposed by a sound card (i.e. by making no assumptions or design choices between interleaved vs non-interleaved). Apart from that there is no difference between the two (and as the different bindings prove, you can interoperate both ways).

The question really is: what is the best kernel driver API for sound cards such that application-level APIs like OSS, ALSA, JACK, OpenAL and SDL can be built? This really should be an issue that is internal to the Linux kernel -- it can create whatever API/abstraction it wants to the sound card, and provides OSS, ALSA, JACK and other API bindings.

Here, OSS is still cross-platform (as it is essentially a specification for a file-based interface to sound cards), the main APIs are supported natively, all of them share the driver code (i.e. there isn't the code and bug duplication), and you don't get the current "an application is using /dev/dsp" issue from OSSv3.

From a developer's point of view, nothing has changed -- they still have access to the API they want to use. Other APIs that are not currently in the main kernel tree can use one of the back-ends to map to an interface that is supported. Higher-level APIs like GStreamer, Phonon and arts can be built/configured in the same way.

The question then is where do the (cross-application) audio mixers sit. I don't know. I would suggest that this is a different API that the kernel exposes. The question is whether this mixing should be done in the kernel or in a user-space driver. I don't know enough here to say one way or another, but all the bindings should go through the same mixer logic -- there is no reason for it to be duplicated.

From a user's perspective, they should see better performance and sound quality. I don't know about the other features of PulseAudio and how they should be architected (this is something that the kernel and sound devs should be discussing). I also don't know about other systems (BSDs, OpenSolaris, Windows, etc.).

Reece Dunn said...

@dawhead: "Even if these differences have been properly established, all this says is that the ALSA driver is misconfiguring the hardware mixer and OSS is getting it right. This is really nothing more than a bug that needs fixing in ALSA - its not an argument in favor of replacing the kernel drivers."

Agreed... for the most part.

I don't see why the kernel cannot expose both ALSA and OSS APIs, provided it routes both to the same driver code -- for the sake of simplified maintenance and shared bug fixing/developer pool (both OSS and ALSA benefits from having both parties fixing driver bugs).

PulseAudio is slightly different in that
(a) it puts demands on the ALSA drivers that they haven't expected before, and as a result hitting more ALSA driver bugs,
(b) it has a higher latency demand of the kernel (which hit Ubuntu worse than other distributions)
(c) it requires more processor power to do the things it does (audio mixing?), which means that under load the audio will drop out.

dawhead said...

@reece dunn: I don't see why the kernel cannot expose both ALSA and OSS APIs

Hah! It does!

No, what you are actually asking for, I suspect, is for it to contain two driver implementations, both the ALSA drivers and the OSS drivers, and somehow choose between them. This is not going to happen. Neither ALSA nor OSS is written to be a parallel system.

Also, your observations about PA are mostly just ... wrong. If anything what you describe would be true of JACK, but are not true of PA.

dawhead said...

@reecedunn: what i've been critical of is not just a "file-based API", which in and of itself would be OK. the problem that i have with OSS as an API for applications to use is that it uses the Unix file API as system calls to provide access to the audio interface.

This means that an application calls open, read, write, close and ioctl. There is no non-hackish way to put anything between the application and the kernel drivers. Any "extra" functionality you want to provide (e.g. resampling) has to be inside the kernel driver - it cannot be in a library.

Contrast this with what would be possible if it called, say, snd_open(), snd_read(), snd_write(), snd_close() and snd_ioctl(). These would all be library functions that might directly call into the kernel, or might do something much more complex.

This is precisely how the ALSA API handles things, except that it adds "_pcm_" to the function names to distinguish from the similar functions that access the hardware mixer and MIDI interfaces.
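A hypothetical sketch of why that library indirection matters (the snd_* names and the backend struct here are invented for illustration, and are not the real ALSA API): because applications call library functions instead of raw system calls, the library can route the stream anywhere, for instance into a user-space mixer, without hijacking open()/write().

```c
#include <stddef.h>
#include <string.h>

/* Invented names for illustration: a thin audio library whose entry
 * points look like the file API but are ordinary functions, so a
 * different backend can be slotted in behind them. */

typedef struct snd_backend {
    int  (*open_fn)(const char *dev);
    long (*write_fn)(int handle, const void *buf, size_t n);
} snd_backend;

static const snd_backend *current; /* chosen at library init time */

void snd_use_backend(const snd_backend *b) { current = b; }
int  snd_open(const char *dev) { return current->open_fn(dev); }
long snd_write(int handle, const void *buf, size_t n)
{
    return current->write_fn(handle, buf, n);
}

/* A stand-in "server" backend that captures the stream into memory,
 * the way a sound server could intercept and mix it. A real kernel
 * backend would call open(2) and write(2) on the device here. */
static unsigned char captured[64];
static size_t captured_len;

static int mem_open(const char *dev)
{
    (void)dev;
    captured_len = 0;
    return 1; /* dummy handle */
}

static long mem_write(int handle, const void *buf, size_t n)
{
    (void)handle;
    if (n > sizeof(captured) - captured_len)
        n = sizeof(captured) - captured_len;
    memcpy(captured + captured_len, buf, n);
    captured_len += n;
    return (long)n;
}

const snd_backend memory_backend = { mem_open, mem_write };
```

With the raw OSS system-call API, the same redirection needs LD_PRELOAD wrappers or FUSE-like machinery, which is exactly the "hack" territory discussed earlier in the thread.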

dawhead said...

@theodore tso: actually, the latency requirements are quite a bit tighter than this. i don't think this is the place to explain the precise numbers, but instead to just note that we can provably match the hardware's lowest possible latency capabilities in user space (proof by existence: JACK). The fact that Hannu is correct that most applications don't need to get close to this lower bound just makes the job easier in the general case. Hence ... PulseAudio :)

insane coder said...

Hi dawhead.

>Now, Hannu and you are both suggesting that OSS should be available as the kernel-side part of the audio stack, and you've both suggested that its OK or even desirable to layer additional APIs above it that take care of various "higher level needs".

I'm not referring to a kernel-side part of the stack. I'm saying the average developer wants an OSS-style file-like API that works well.

>Yet at the same time, you both keep talking about applications that directly use the OSS API. Hannu has even suggested that its probably the best API for "simple audio playback".

Indeed, this style of API is one of the easiest.
If it'd make you feel better, you can change the API to oss_open(), oss_close(), oss_write(), oss_setopts() and whatever, and provide the same support in a sane manner to the user. I really don't care what goes on behind the scenes, as long as it's as good as what OSS currently provides. However, the benefit of it using native system calls like open()/close()/write(), and ioctl() means I can have an application which runs (albeit without sound) when that particular sound library isn't available.

>From this perspective, I am required to really take a strong position when people make claims about OSS "sounding better", and I discussed these way back in the ~170 comment stream attached to this blob.

I don't know what you're specifically referring to here.
However instead of viewing it as baseless claims, why not find out why people are finding OSS works better on their sound card? Do you really think people all over are making up lies saying they get better sound on OSS for no particular reason other than to stir up trouble?

>all this says is that the ALSA driver is misconfiguring the hardware mixer and OSS is getting it right

What hardware mixer?
Every sound card I noticed a difference where OSS sounds better has no hardware mixer.

insane coder said...

>This is really nothing more than a bug that needs fixing in ALSA - its not an argument in favor of replacing the kernel drivers.

I've mentioned this many times already. I'll say it again. Let ALSA have mixing everywhere (when there's no hardware mixer), improve mixing quality, and not go haywire when the CPU is under heavy load, and I'd gladly use ALSA. I don't care what the solution in the end is - ALSA, OSS, or something else entirely - as long as it works well for the situations I need it to. ALSA currently is not the solution for me, or others in the same boat I'm in.

>>Are you certain that you can even design certain applications to use those APIs?
>Absolutely certain.

The rest of us are not so certain.

>Layering a push-model API on top of a pull-model API is pretty easy to do (not always quite so easy to get it perfectly right, but certainly much easier than the other way around). I have tried to make it clear that although I think that the lower levels of the audio stack should use a pull-model, having libraries that add a push model above it is a good idea. That said, many applications would get better performance if they used the pull model as well.

Now since sound cards are pull-model, and OSS/ALSA are push-model, we already have this done. What you really want is to also expose that pull model directly.

>I personally didn't understand how to sync two different data streams (video/audio) correctly from a design perspective until I had been writing audio software for about 5 or 6 years. The design of CoreAudio (or any other pull-model API) is absolutely not an impediment to this issue.

5 or 6 years, I see. So for the rest of us mere mortals, it seems near impossible. Programming shouldn't need 5 or 6 years experience with some component to do a decent job of writing software for it.

>I'm sorry, but this is simply and demonstrably false.

You yourself just said it took you 5 or 6 years of experience, so you have in fact just demonstrated that it is true.

insane coder said...

Hi Theodore.

>>We need to look at what works best for the vast majority when comparing, as opposed to looking at special cases, using them as the norm, and then going so far as applying those warped ideas elsewhere.
>That's oversimplifying the problem, unfortunately. It is true that we need to provide some solution that works well for the vast majority of the common cases ---- but we have to accommodate the special cases as well. That's something which is desirable for the entire, overall system.

It seems you have misunderstood me and then further quoted me out of context.

I'm not talking about designing a subsystem based on special cases, but rather designing a subsystem (audio) based on the special cases of another subsystem (video user interface), as dawhead keeps trying to do by comparing outputting sound to designing a GUI interface.

>As far as your claim that some libraries don't have very good sound when ALSA is used as a backend, that doesn't prove that ALSA is at fault; it may just be that the particular library's backend driver for ALSA is crappy.

This has already been discussed. And yes we know the back-end for ALSA is bad in those libraries.

>Maybe ALSA could make things easier for the sound libraries' programmers if things were better documented, but at the end of the day, this is something which is fixable; and, I would argue, the easier problem to solve.

Yes, but it seems the ALSA guys will need to fix these libraries, because the library programmers don't know how to use ALSA. And until this is fixed, I can't use a library which outputs to ALSA. Effectively, I can't use anything which goes through ALSA API.

>So maybe the answer is that we should fix OpenAL.

That would be a good thing; please get the guys who know ALSA really well to do so.

>As Hannu has pointed out before; in 1ms sound travels 34 cm; in 10ms sound travels 3.4 meters --- and people don't complain about the latency induced by placement of their computer speakers, or the fact that instruments in the back row of an orchestra are delayed by 10ms from the instruments in the front row of the orchestra. And 10ms is plenty of time to do software mixing and one or more layers of software libraries.

If that was what I was talking about, this would indeed be a sorry conversation. I can't notice a 10ms difference, nor would I care. The issue is when I'm measuring a latency in seconds. I mentioned this several times already.

>If there's a problem, in all likelihood it's in the design and/or implementation of the libraries.

That's likely. However there is also noticeable latency (like 300ms) on some setups in using ALSA API over ALSA vs. using OSS API over OSSv4. That could be a driver issue, but it is just enough to make audio noticeably not sync with lip movement and other things, causing endless annoyance to end users.

Hi dawhead.
>Finally, in all my comments about the "video API", I guess I was not clear enough. At no time was I discussing the API for handling video data. I was referring to APIs for interacting with the video interface, which for 99% of all apps means putting up a GUI.

If you mean putting up a GUI as in widgets and menus and pointers, why the heck are you comparing this to outputting sound? Do you really view the former as the same type of system as the latter? Hannu was right, you really are comparing apples and asteroids.

dawhead said...

@insanecoder: do you realize what a massive conceptual leap it is to move from open/read/write/close to oss_open/oss_read/oss_write/oss_close? do you realize that Hannu's reluctance to do this is one of the main reasons that ALSA even exists?

dawhead said...

@insanecoder: However there is also noticeable latency (like 300ms) on some setups in using ALSA API over ALSA vs. using OSS API over OSSv4. That could be a driver issue, but it is just enough to make audio noticeably not sync with lip movement and other things, causing endless annoyance to end users.

I'd be willing to bet a crate of oranges that this is a problem in whatever library is being used. We have thousands of users of JACK with the ALSA backend, on just about every audio interface you could name including lots of crappy consumer ones (the builtin motherboard type), and I have never seen any reports of this type. ALSA just doesn't have any way to make this happen by itself.

insane coder said...

Hi dawhead.
>do you realize what a massive conceptual leap it is to move from open/read/write/close to oss_open/oss_read/oss_write/oss_close?

Conceptual leap? Not at all, it's the exact same concept.

>do you realize that Hannu's reluctance to do this is one of the main reasons that ALSA even exists?

No, and I find that hard to believe to be the truth. You can't just prefix those API functions with oss_ or alsa_ or pcm_ in the case of ALSA and get the same results. If it was that easy, no one (especially myself) would find the ALSA API difficult to work with.

Furthermore, FreeBSD and others have no issues using open()/write()/ioctl() directly. Why is this an issue with Linux?

>>However there is also noticeable latency (like 300ms) on some setups in using ALSA API over ALSA vs. using OSS API over OSSv4. That could be a driver issue, but it is just enough to make audio noticeably not sync with lip movement and other things, causing endless annoyance to end users.
>I'd be willing to bet a crate of oranges that this is a problem in whatever library is being used.

I'd bet a crate of two dozen oranges that you misread that. I said ALSA API over ALSA. I didn't say library via ALSA API over ALSA.

>ALSA just doesn't have any way to make this happen by itself.

Really? What about a sound mixer (dmix) running in user space that is starving?

dawhead said...

@insanecoder: If you mean putting up a GUI as in widgets and menus and pointers, why the heck are you comparing this to outputting sound? Do you really view the former as the same type of system as the latter? Hannu was right, you really are comparing apples and asteroids.

Actually, it really isn't so different. Think about the range of audio applications. You have simple audio file players, which are more or less equivalent to wanting to dump raw video into the framebuffer. You have sample-based synthesis applications, which are conceptually equivalent to putting several different video "objects" on screen, with timing controlled by something else (e.g. MIDI or game events). You have full blown synthesis environments in which the application is doing huge amounts of computation and user interaction in order to generate entire orchestral scores, which is not so different conceptually from what is going on with a GUI. Such applications likely use synthesis libraries, equivalent to GUI toolkits, and manipulate things via the library API to get audio to show up, just like a GUI app doesn't really deal much with any of the lower layers of onscreen drawing, but focuses on widgets and so forth.

BUT ... this really wasn't my point. My point is that in both cases, there is a hardware device that needs to be shared between applications, whose basic operation is fundamentally similar (periodic processing of some kind of buffer), and whose basic goal is fundamentally similar (to take data and render it so that the user will see or hear it). Whether what needs to be drawn is a GUI widget or raw video, or what wants to be played is a simple audio file or 120 physically modelled MIDI-driven synthetic instruments, the basic control and data flows are very isomorphic.

In the case of the video interface, everyone accepts that it's better to have (a) a client/server model to handle sharing and (b) an abstract API. I'm just pushing on the point of why it's so hard to get people to agree on these points for the case of the audio interface.

dawhead said...

@insanecoder: No, and I find that hard to believe to be the truth. You can't just prefix those API functions with oss_ or alsa_ or pcm_ in the case of ALSA and get the same results. If it was that easy, no one (especially myself) would find the ALSA API difficult to work with.

This (see the simple playback example) is rather old but tries to demonstrate the point. It looks more complex than OSS mostly because ALSA has separate calls for setting each aspect of the data format, rather than using a single ioctl with a bit field. So you see explicit calls to set sample rate, bit depth, and so on, instead of just one. ALSA also allows control over interleaved/non-interleaved as discussed here, so that is an extra parameter compared to the OSS case. The only extra step that really has no counterpart in OSS is the call to snd_pcm_prepare().

>What about a sound mixer (dmix) running in user space that is starving?

Dmix is not a server. It's some functions in the ALSA library that do a lock-free mixdown into a shared buffer using some code that is a bit too clever for its own good. There is/was a user-space ALSA mixing server, but I don't believe that anyone uses it and I don't believe that Takashi advocates its use.

The problems you describe with lip sync were very evident, for example, in the Flash implementation for linux. Did I mention that I meant the Flash implementation that used OSS? :)

When Adobe switched over to use ALSA (and these days, they make it possible to switch - we even have a JACK backend for Flash now), things got much better. But not because of ALSA - they just didn't make the mistakes they made the first time that led to audio/video sync being so abysmal. It had nothing to do with the API at all (I know this from long discussions with the guy who did the work).

insane coder said...

Hi dawhead.

>You have simple audio file players, which are more or less equivalent to wanting to dump raw video into the framebuffer. You have sample-based synthesis applications, which are conceptually equivalent to putting several different video "objects" on screen, with timing controlled by something else (e.g. MIDI or game events). You have full blown synthesis environments in which the application is doing huge amounts of computation and user interaction in order to generate entire orchestral scores, which is not so different conceptually from what is going on with a GUI. such applications likely use synthesis libraries, equivalent to GUI toolkits, and manipulate things via the library API to get audio to show up, just like a GUI app doesn't really deal much with any of the lower layers of onscreen drawing, but focuses on widgets and so forth.

With audio, you're always generating a sound stream. Even if you have a series of synthesizers generating that stream using all kinds of libraries and user interaction, at the end of the day it's a stream. You're going to take that stream and either save it to a file, or output it to the sound card. Notice the save-it-to-a-file part. Your main goal in sound, regardless of whether you have an existing sound file, or several, or are creating a sound file on the fly, is taking that stream and writing it out. It's nice to be able to make a few init routine changes and have one interface write out your sound.

GUIs are vastly different. Never in a normal interactive GUI application are you creating interactive widgets and sending them to a file. Sound isn't "interactive", and a GUI in a static view is worthless aside from screenshot demonstrations.

I find it astonishing that you think most sound applications and most GUI applications are dealing with a similar setup. They are not conceptually equivalent. We may experience both via our human senses, but does the fact that I can touch a wall and feel it mean that touch has anything to do with sound?

>In the case of the video interface, everyone accepts that its better to have (a) a client/server model to handle sharing

Since when was this? For non-networked video interfaces very few recommend a client/server model.

I'd also find it interesting to see where Google's new windowing system goes, but from what I'm currently told, it will run directly on top of Linux, provide windowing and apps without client/server, and any networking you want to do will run on top of that, using a browser as the client (yes, I know X has client and server ideas reversed) to another server when you need to view something remotely. The idea is to pass back and forth encapsulated data using existing data protocols and rendering locally, as opposed to offloading all rendering elsewhere.

>This (see the simple playback example) is rather old but tries to demonstrate the point.

I followed that among other documentation when I wrote sound with ALSA. Again, if it was that simple to integrate into my app and work right, I would have no issues, but sadly, a lot of tweaking had to go into it to make it work right.

>Dmix is not a server. Its some functions in the ALSA library that do lock-free mixdown into a shared buffer using some code that is a bit too clever for its own good.

I don't care who or where or why the problem happens. As long as it does happen, ALSA can't be used in those circumstances.

dawhead said...

@insanecoder: You're going to take that stream and either save it to a file, or output it to the sound card. Notice the save it to a file part. Your main goal in sound, regardless of whether you have an existing sound file, several of them, or are creating one on the fly, is taking that stream and writing it out. It's nice to be able to make a few init routine changes and have one interface write out your sound.

admirable goals. but you left off some other possible goals: suppose you want to route the audio to another application? send it across the network? to do this you need an API that is at least one layer above the file API, or you need crazy hacks to reroute the audio back out of the kernel.

In the case of the video interface, everyone accepts that its better to have (a) a client/server model to handle sharing

Since when was this? For non-networked video interfaces very few recommend a client/server model.


Funny. Apple doesn't seem to agree with you. Quartz is a client/server system (most people don't realize this, partly because it has no networking capabilities). There are suggestions that Microsoft has gone with this model too. Most microkernels use it. From what I've seen of Google Chrome OS, it too will use a client/server model. I cannot think of any system, except direct use of the framebuffer, that definitely avoids a client/server design. This is for a very simple reason: all these people accept the wisdom of not trying to "mix" output from different applications inside the kernel.

I don't care who or where or why the problem happens.

So if you don't care whether a problem is in application code, or a 3rd party library, or in the ALSA library, or in the kernel drivers, why even discuss it? I've tried to acknowledge that I think that the APIs are too complex for most users, and I've provided one concrete example (Flash) of where reasonably skilled developers got things wrong with a simpler API.

I followed that among other documentation when I wrote sound with ALSA. Again, if it was that simple to integrate into my app and work right, I would have no issues, but sadly, a lot of tweaking had to go into it to make it work right.

Glad to be of service :)

For years, I have never advocated that anyone use the ALSA API for writing applications unless they had very specific reasons to do so. My dislike of it is for different reasons than my issues with the OSS API, but those reasons are an important part of why JACK exists (though realistically, inter-application audio routing was more important).

There are a few reasonable APIs for applications to use. None of them are really perfect, and none of them work well for everyone. The only thing that is clear to me is that applications should not be using either of the ALSA or OSS APIs directly, ever.

sinamas said...

@dawhead: The only thing that is clear to me is that applications should not be using either of the ALSA or OSS APIs directly, ever.

AFAIK there are few good alternatives for my case. I don't have much of a problem with either the ALSA or OSS API though, so I don't see why I would use anything else. Most wrapper APIs either implement the pull model badly (pulling too infrequently or being prone to underruns perhaps due to lack of real-time priority), or provide a push API with too little information on how far we are from underrun/overrun/blocking. For the record I never tried jack, as I had no compelling reason to.

Reece Dunn said...

@dawhead:
[@reece dunn: I don't see why the kernel cannot expose both ALSA and OSS APIs]

>Hah! It does!

I know it does. I was trying to address the comments here that were saying "this/that API is the one true way".

> No, what you are actually asking for, I suspect, is for it contain two driver implementations, both the ALSA drivers and the OSS drivers, and somehow choose between them. This is not going to happen. Neither ALSA nor OSS are written to be parallel systems.

No! Having two internal drivers for the same sound card is crazy.

What I am saying is that there should be a "close to the metal" interface on which the drivers are written. This would be fluid and internal to the kernel, so it can adapt to meet the needs of new sound capabilities (such as 3D audio). The user-level APIs (OSS, ALSA, JACK and others) are then written once, targeting this interface.

There may be technical reasons why this can't be the case (e.g. latency requirements for JACK), I don't know.

> Also, your observations about PA are mostly just ... wrong.

Why:

>> (a) it puts demands on the ALSA drivers that they haven't expected before, and as a result hitting more ALSA driver bugs,

Check the PA mailing list.

>> (b) it has a higher latency demand of the kernel (which hit Ubuntu worse than other distributions)

https://tango.0pointer.de/pipermail/pulseaudio-discuss/2009-February/003150.html

-- "Latencies of 210ms is *REALLY NOT NECESSARY*."

>> (c) it requires more processor power to do the things it does (audio mixing?), which means that under load the audio will drop out.

So how come, running an audio application under load, do I get periods of silence when using PulseAudio and not with OSS? I may have been hitting ALSA bugs here, though.

dawhead said...

@reece: What I am saying is that there should be a "close to the metal" interface on which the drivers are written. This would be fluid and internal to the kernel, so it can adapt to meet the needs of new sound capabilities (such as 3D audio). The user-level APIs (OSS, ALSA, JACK and others) are then written once, targeting this interface.

I think there is some confusion here.

OSS and ALSA are, in large part, kernel-side device driver implementations. It just so happens that the API that OSS uses looks no different to the usual Unix file API, whereas the one that ALSA offers is substantially more complex. Either way, all user-space interaction with the device drivers happens in the same way: system calls are made, the device driver code executes and stuff happens. This is the "thin layer" that exists on any Unix-like operating system.

.... stuff about Pulse ...

I did say mostly not true ;)

"Latencies of 210ms is *REALLY NOT NECESSARY*."

The main reason that I commented here is that JACK is generally much more demanding. 210ms of latency for most JACK users is really substantively above what they would normally use. It is also sensitive to poorly designed hardware, such as cards that don't run the playback & capture pointers in sync (thus generating multiple closely spaced interrupts for what should really be a single continuous process), or cards that don't handle duplex configuration very well.

I am not here to defend PulseAudio. Lennart has done some interesting things with it, and is subject to some unjustified criticism. On the other hand, Pulse demonstrably causes issues for some people in ways that other audio systems do not. Overall, I do not think it is close to a complete solution for audio on Linux, though some of the technology in it might be part of such a solution.

Hannu Savolainen said...

dawhead> Yet at the same time, you both keep talking about applications that directly use the OSS API. Hannu has even suggested that its probably the best API for "simple audio playback".

dawhead> This is precisely the disjunction I am talking about. On some level, certainly as an application developer, I don't care what the implementation of the device drivers looks like. But the advocates for OSS seem to keep mixing up its status as a device driver implementation with an audio stack that uses the Unix file API as *its* API.

It is perfectly OK that an application developer uses the API that is best suited for his use. It can be the file/device level API, a library or something based on a server. Equally well the developer can use any GUI toolkit or interface that is available. This doesn't cause any problems if the APIs are implemented in a proper way.

The real problem is that many of the current sound APIs or systems are not properly designed. They cannot co-exist as they should, and this limits the number of APIs that can be available at the same time. This leads to a situation where all applications should have input/output plugins for all possible APIs, plus some logic to autodetect which one to use.

For example OSS and ALSA are mutually exclusive because ALSA has a conflicting kernel level interface bundled with its library level API. If ALSA is installed then OSS applications will not work properly (because ALSA's OSS emulation is not 100% compatible).

If OSSv4 is installed then ALSA becomes unavailable. OSS has only a limited implementation of alsa-lib (just a subset). The result is that ALSA-only applications shipped with most Linux distributions stop working. Finally, current OSS applications don't work reliably with OSSv4 because they have been modified to work with ALSA's somewhat incompatible OSS emulation. Some server-based APIs may become unusable if they don't have OSS support compiled in.

Some APIs use a central server that lets multiple applications share the same audio device. This is redundant with OSS because OSS does hw/sw mixing itself. It doesn't necessarily cause problems, but there have been situations where such servers block each other or block normal OSS applications. This happens with professional audio devices that don't have vmix enabled by default.

If these issues get solved then there are no strong reasons to avoid library-based sound systems. I just don't see it as necessary to duplicate the functionality provided by a kernel-level API using some library-based solution. However, adding functionality that cannot be provided at kernel level makes a lot of sense.

One subject that needs to be addressed is sharing of audio devices (mixing). The ideal solution is for the mixing to be done by hardware. In this way the result is fully transparent: applications will not see any difference between hw mixing and no mixing. Software-based mixing at kernel level (vmix) is a poor man's HW mixing. It's always there, and APIs at all levels can use it. Server-based mixing, in turn, is likely to result in a situation where only one API at a time can use the device.

One issue with library-level sound systems is library/version dependencies. If the system doesn't have the right libraries/versions installed then the application will not load. The device API doesn't have this kind of problem since it's always there. If an application cannot open the audio device it can still run without audio functionality.

Can we agree about this?

insane coder said...

>admirable goals. but you left off some other possible goals: suppose you want to route the audio to another application? send it across the network?

You don't know what pipes and sockets are? All these things are done via file APIs. It's also why I told you to look into how sound is handled in Plan9; they take it to an extreme.

>to do this you need an API that is at least one layer above the file API
>or you need crazy hacks to reroute the audio back out of the kernel.

No and no. Your vision of what the file API is capable of seems to be really limited, and you fail to show how sound is in the same ballpark as GUI creation.

>Funny. Apple doesn't seem to agree with you.

And I should care why?
Would you like me to point out some absurdities in their architecture to show you they're not right about everything?
Or perhaps you should read other articles I wrote to point out why standard bodies or organizations don't get it right.

You can even see I wrote some articles about how file APIs are far from perfect too.

>>I don't care who or where or why the problem happens.
>So if you don't care whether a problem is in application code, or a 3rd party library, or in the ALSA library, or in the kernel drivers, why even discuss it?

Thanks for twisting it. I said I don't care where in ALSA it happens, I just want it fixed. Now you're just throwing out 3rd party library in there (yet again).

>>I followed that among other documentation when I wrote sound with ALSA. Again, if it was that simple to integrate into my app and work right, I would have no issues, but sadly, a lot of tweaking had to go into it to make it work right.
>Glad to be of service :)

How were you of service? This was over a year ago. I also double-checked my code: I had to add more than 100 lines onto that example, and modify it heavily to make it work. The OSS code itself isn't even 20 lines. From my notes on the matter, I couldn't get sound to output consistently without a heavy buffer on top of it with ALSA; OSS had no such issues.
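To make the comparison concrete, a minimal OSSv4 playback init along these lines might look like the following. This is a sketch, not the game's actual code; the device path and parameter values are illustrative:

```c
/* Sketch of a minimal OSSv4 playback init: a handful of ioctl()s,
 * after which audio output is just write() calls on the descriptor.
 * Device path and parameters are illustrative. */
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>

int open_dsp(int rate, int channels)
{
    int fd = open("/dev/dsp", O_WRONLY);
    if (fd < 0)
        return -1;                     /* no device: run without audio */

    int fmt = AFMT_S16_LE;             /* 16-bit little-endian PCM */
    if (ioctl(fd, SNDCTL_DSP_SETFMT, &fmt) < 0 ||
        ioctl(fd, SNDCTL_DSP_CHANNELS, &channels) < 0 ||
        ioctl(fd, SNDCTL_DSP_SPEED, &rate) < 0) {
        close(fd);
        return -1;
    }
    return fd;                         /* write() PCM frames to fd */
}
```

After this init, each audio buffer is a plain write() of PCM data, which is the whole point of the size comparison with ALSA's setup code.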

Hannu Savolainen said...

insane coder> Hey, here's an idea, why not offer this additional layer right within the sound system?

dawhead> Whether you like the design of the library they provided (which most people do not), this is precisely what ALSA was trying to do. Unfortunately the project has never had enough manpower and instead of working on refining and extending the capabilities (and even design) of libasound, the developers that do exist have to continue dealing with driver-level issues (largely caused by the nightmare of the Intel HD spec and USB "quirks").

I would like to add SB X-Fi to this list. All the current audio HW architectures are seriously flawed. They are unbearably complex. Somehow it looks like they have all been created by HW engineers who have no idea how software works. These designs were supposed to let a single driver support all possible devices. However, the result was that every device (motherboard) requires hand-written quirks to work. ALSA currently has more resources (a bigger developer community) than OSS. However, the task is mission impossible even for them. All this hassle takes time away from the development of the APIs themselves. Personally I have (temporarily) lost all motivation to continue development of OSS because 99.999...999% of the work is just finding workarounds for bugs in HW design.

The claim that some API sounds better than another is all bogus. A properly working API feeds the samples to the speakers/headphones without causing any artefacts. This is a piece of cake to handle.

However, the problem is that in many cases (Linux) audio applications are not capable of feeding/consuming audio data to/from the device fast enough. This causes breaks in the signal. There are just a few reasons for that:

1) This one is OSS specific, but it is the main reason why current OSS applications don't work at all. They enable non-blocking I/O when opening the device. However, they don't handle non-blocking reads/writes in the way defined by POSIX. The result is that playback/recording will progress at FFFFFFFWD speed and the signal will be completely garbled. This happens because OSS/Free and ALSA's OSS emulation used the O_NDELAY/O_NONBLOCK flags of open() for the wrong purpose.

2) Many applications try to use lower latencies than necessary. There are types of applications that don't tolerate any latency at all, but typical applications will work just fine as long as latencies are at an acceptable level. Using too-low latencies causes serious tradeoffs: the system overhead (interrupt rate, system call overhead and number of context switches) rises proportionally as latencies decrease. Past a certain level the application will not work under a vanilla Linux kernel.

3) A large number of applications use asynchronous timers that are not reliable enough. This approach is potentially dangerous in applications that require lower latencies. dawhead claimed that this theory is wrong because CoreAudio applications use asynchronous timers too. However, this argument is not correct. CoreAudio is designed to be used with synchronous timing. Asynchronous timers can be used reliably only when the timer precision is an order of magnitude finer than the latency. The situation under Linux is exactly the same.

Problem 1) is OSSv4 specific. As far as I know no other API suffers from it. Fixing it is easy in theory but it should be done in all applications.
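A sketch of the POSIX-correct handling: on EAGAIN the application waits for the device to become writable (poll() here; select() would also do) instead of discarding data, which is what produces the fast-forward effect described above. The helper name is mine:

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Write an entire buffer to a non-blocking fd, handling EAGAIN the
 * way POSIX intends: wait until the device can accept more data
 * rather than dropping it (dropping is what garbles playback). */
ssize_t write_all_nonblock(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd pfd = { fd, POLLOUT, 0 };
            poll(&pfd, 1, -1);         /* block until fd is writable */
        } else if (n < 0 && errno != EINTR) {
            return -1;                 /* real error */
        }
        /* n == 0 or EINTR: just retry the write */
    }
    return (ssize_t)done;
}
```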

The other issues are caused by design bugs in the applications. If they are fixed then every audio system (API) should work equally well. However if the API enforces use of such bad practices then the API itself is wrong.

This was my 2 cents about insane audio coding.

dawhead said...

@insanecoder:

you don't open a socket with open(). you don't connect a socket with a call that is part of the file API. you generally don't get reliable connection shutdown by calling close(). most applications that use sockets don't even use read/write with them because the semantics are not helpful (hence send(2) and recv(2)) ... so i am not clear what you are imagining? you cannot take code written to open a file/OSS device node and make it work on a socket without substantively, perhaps even totally, rewriting it.

contrast with ALSA, where an application written using snd_pcm_FOOBAR will work similarly well (or badly) no matter what the actual audio delivery mechanism is.

pipes don't have enough buffering to be useful for audio (limited to about 5kB on most kernels).

>to do this you need an API that is at least one layer above the file API
>or you need crazy hacks to reroute the audio back out of the kernel.

No and no. Your vision of what the file API is capable of seems to be really limited,


When an application calls read(2) or write(2), and without using hacks like LD_PRELOAD or a redirection via FUSE (or whatever the user-space filesystem flavor of the week is right now), the data is destined to come from/be delivered to the kernel, and nowhere else. I believe that you understand this, so I really don't understand what you are alluding to. By using such an API, you require that stuff is either redirected back into userspace, or handled in the kernel. If it's handled in user space, why go via the kernel? If it's handled in the kernel ... well, we've been there already.

>Funny. Apple doesn't seem to agree with you.

And I should care why?


I don't think you should care. You should just be more careful about claiming that the client/server model for handling interactions with a hardware device is "dead". Apple's Quartz is one of the most recent (still rather old, I admit) attempts to completely redesign a video interaction system. As such, and not because it's Apple, it's worth taking a look at what their engineers (who are in general about as clever and about as stupid as anybody else) decided to do. The fact that Microsoft appears (I'm not 100% certain) to have recently adopted a similar model is also relevant to claims that client/server for device interaction is dead.

>Glad to be of service :)

How were you of service? This was over a year ago. I also double-checked my code: I had to add more than 100 lines onto that example, and modify it heavily to make it work.


(a) I did try to add the smiley.
(b) the text right above the example does note that the code is not a real program. I also wrote it over 5 years ago and have not revisited it until today.
(c) what was the "nature" of the code you had to add and modifications you had to make?

dawhead said...

@hannu: A large number of applications use asynchronous timers that are not reliable enough. This approach is potentially dangerous in applications that require lower latencies. dawhead claimed that this theory is wrong because CoreAudio applications use asynchronous timers too. However, this argument is not correct. CoreAudio is designed to be used with synchronous timing. Asynchronous timers can be used reliably only when the timer precision is an order of magnitude finer than the latency. The situation under Linux is exactly the same.

Couple of things. I never (intentionally) commented on the use of synchronous or async timers. I did make a comment about how CoreAudio allows different client applications of the audio interface to request different wakeup intervals ("timers"). I think that insanecoder or someone else interpreted this to be related to the use of some kind of POSIX-ish timer, which it was not.

I don't believe that anyone should attempt to explicitly use system timers on linux for audio. I can't believe that ALSA still has this API around, let alone that people use this idea (with or without ALSA). it is just so broken!

But notice that CoreAudio doesn't use synchronous timers for this purpose either. It uses the system timer and a kernel-side DLL (delay-locked loop) that allows it to predict very accurately "where" the audio timer/hardware pointer is right now. This is a really nice design for everything except the lowest possible latencies (and it's one reason why Linux can still go lower than OS X in this area).
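A rough sketch of the delay-locked-loop idea: filter the noisy per-period timestamps so the hardware pointer's position can be predicted between interrupts. The coefficients follow the usual critically damped second-order form; the structure is illustrative, not CoreAudio's actual implementation:

```c
/* Delay-locked loop over period-boundary timestamps. Each update
 * nudges the period estimate and the next predicted boundary toward
 * the observed system-clock time. */
typedef struct {
    double t1;       /* predicted time of the next period boundary */
    double e2;       /* filtered period duration */
    double b, c;     /* loop coefficients */
} dll_t;

void dll_init(dll_t *d, double now, double period, double bandwidth_hz)
{
    double w = 2.0 * 3.14159265358979 * bandwidth_hz * period;
    d->b = 1.41421356237 * w;    /* sqrt(2) * w: critical damping */
    d->c = w * w;
    d->e2 = period;
    d->t1 = now + period;
}

/* Called once per period boundary with the raw system-clock time. */
void dll_update(dll_t *d, double now)
{
    double e = now - d->t1;      /* prediction error at this boundary */
    d->e2 += d->c * e;           /* correct the period estimate */
    d->t1 += d->e2 + d->b * e;   /* predict the next boundary */
}
```

Between updates, the current pointer position can be read off by interpolating linearly between t1 - e2 and t1.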

insane coder said...

Hi Hannu.
>The claim that some API sounds better than some other is all bogus. Properly working API feeds the samples to the speakers/headphones without causing any artefacts. This is piece of cake to handle.

That'd be nice if it was true. Too many tests I conducted suggest otherwise.

>This one is OSS specific, but it is the main reason why current OSS applications don't work at all. They enable non-blocking I/O when opening the device. However, they don't handle non-blocking reads/writes in the way defined by POSIX. The result is that playback/recording will progress at FFFFFFFWD speed and the signal will be completely garbled. This happens because OSS/Free and ALSA's OSS emulation used the O_NDELAY/O_NONBLOCK flags of open() for the wrong purpose.

I wrote a game with OSSv4; I didn't bother designing for or testing it on OSS/Free or ALSA's OSS emulation. My design was to run at 60 FPS and to sync the video to that, effectively pausing the program after every frame output for ~16ms (actually less, since some milliseconds are used to calculate the next video/audio frame). I purposely used O_NONBLOCK. It didn't go into fast-forward mode, because the video part was preventing it. Except when I enable fast-forward mode within the game, in which case both the video and the audio run fast, which is fine. It ended up working great.

I can't say in a bubble that using O_NONBLOCK is wrong.

dawhead said...

@insanecoder: My design was to run at 60 FPS and to sync the video to that, effectively pausing the program after every frame output for ~16ms (actually less, since some milliseconds are used to calculate the next video/audio frame).

i'm not really blaming you in any way, since the APIs to do this correctly are either absent or sufficiently non-standard, but ... even so ... do you realize how wrong this design is? just theoretically speaking? you should be running from either the video clock (i.e. vertical retrace signal, unavailable under X, only available with OpenGL and not even then) or the audio clock, and then using a DLL to sync with the other one. Using a system timer for this is just ... wrong. I am sorry that the correct APIs are not available to enable you to do this correctly.

insane coder said...

Hi dawhead.

>you don't open a socket with open(). you don't connect a socket with a call that is part of the file API.

No, one uses socket() and connect(), but after that, it's just like any other file descriptor.

>you generally don't get reliable connection shutdown by calling close().

You don't? Since when? No modern OS, to my knowledge, differentiates between shutdown(SHUT_RDWR) and close().

>most applications that use sockets don't even use read/write with them because the semantics are not helpful

Uh right, I use read and write all the time. In fact other than being able to specify certain flags, read and write are identical to send and recv.
In fact, sometimes I even pass that file descriptor to fdopen() or __gnu_cxx::stdio_filebuf<char> and use C or C++ file API with them.

>you cannot take code written to open a file/OSS device node and make it work on a socket without substantively, perhaps even totally, rewriting it.

I do it all the time. I'm not sure why you think otherwise. I have an init function which either opens a file and writes a PCM header, or opens the socket, or opens /dev/dsp and calls some ioctl()s, or popen()s a pipe to LAME (I have my own modified popen, based on pipes and dup2(), which doesn't need a special pclose() and also avoids some caveats popen() has). All of these return a file descriptor; I then proceed to use write() as needed, and finish up with a close().
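The shape of such an init function, sketched below. The backend selection, names, and the elided PCM/WAV header are illustrative, not the commenter's actual code; the point is that every branch hands back a plain file descriptor that the rest of the program write()s to uniformly:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

enum sink { SINK_FILE, SINK_TCP };

/* Return a descriptor the caller treats uniformly: all subsequent
 * audio output is plain write() calls, whatever the backend is. */
int audio_open(enum sink kind, const char *path_or_host, int port)
{
    if (kind == SINK_FILE)
        return open(path_or_host, O_CREAT | O_WRONLY | O_TRUNC, 0644);

    /* SINK_TCP: stream the audio to a network peer instead */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons((unsigned short)port);
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    (void)path_or_host;               /* name resolution elided */
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```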

>pipes don't have enough buffering to be useful for audio (limited to about 5kB on most kernels).

I'm not sure what you mean by that.

>You should just be more careful about claiming the client/server model for handling interactions with the a hardware device is "dead".

Why do you keep reading such things into what I say? I didn't call it dead. I said there's other options.

>what was the "nature" of the code you had to add and modifications you had to make?

I wrote a game; sound wasn't working properly without buffering, nor would it work for everyone who tested it until the initialization routine had been tweaked dozens of times. Right now my init routine is huge, as opposed to OSS where I only had 5 ioctl() calls.

insane coder said...

>i'm not really blaming you in any way, since the APIs to do this correctly are either absent or sufficiently non-standard, but ... even so ... do you realize how wrong this design is? just theoretically speaking? you should be running from either the video clock (i.e. vertical retrace signal, unavailable under X, only available with OpenGL and not even then) or the audio clock, and then using a DLL to sync with the other one. Using a system timer for this is just ... wrong. I am sorry that the correct APIs are not available to enable you to do this correctly.

RDTSC/gettimeofday() is no good? Why?

In certain cases, such as when emulating a system designed to run at exactly 58.9276 FPS or something equally retarded, I also can't just sync to a video or sound interface, and have the input occur at the speed it's supposed to occur at.
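For what it's worth, a timer-paced loop at a fractional rate like 58.9276 FPS can at least be kept drift-free by carrying the next deadline as a running sum, instead of rounding each frame to a whole number of milliseconds. A sketch (the names are illustrative, and the actual sleep call is elided; only the arithmetic is shown):

```c
#include <stdint.h>

/* Drift-free frame pacing: the deadline advances by exactly one
 * exact period per frame, so fractional rates never accumulate
 * rounding error the way a fixed ~16ms sleep does. */
typedef struct {
    double period_ns;   /* exact frame period, e.g. 1e9 / 58.9276 */
    double next_ns;     /* absolute deadline of the next frame */
} pacer_t;

void pacer_init(pacer_t *p, double fps, uint64_t now_ns)
{
    p->period_ns = 1e9 / fps;
    p->next_ns = (double)now_ns + p->period_ns;
}

/* Returns how long to sleep before the next frame (0 if already
 * late), and advances the deadline by exactly one period. */
uint64_t pacer_wait_ns(pacer_t *p, uint64_t now_ns)
{
    double delay = p->next_ns - (double)now_ns;
    p->next_ns += p->period_ns;
    return delay > 0 ? (uint64_t)delay : 0;
}
```

The returned delay would then feed clock_nanosleep() or similar; input polling can still run at whatever rate the emulated system expects.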
