caudec: a multiprocess audio converter for Linux and OS X

caudec is a command-line utility for GNU/Linux and OS X that transcodes (converts) audio files from one format (codec) to another. It takes advantage of multi-core CPUs with plenty of RAM by using a ramdisk and running multiple processes concurrently (one per file and per codec). It is Free Software, licensed under the GNU General Public License (version 3). The APEv2 tagger bundled with versions 1.7.1 and later is licensed under the Mozilla Public License, version 2.

  • Supported input codecs: WAV, AIFF, CAF, FLAC, WavPack, Monkey's Audio, TAK, Apple Lossless.
  • Supported output codecs: WAV, AIFF, CAF, FLAC, Flake, WavPack, Monkey's Audio, TAK, Apple Lossless, lossyWAV, LAME, Ogg Vorbis, Nero AAC, qaac, Musepack, Opus.
  • Support for high quality resampling and downmixing / upmixing to stereo, with SoX.
  • Optimized I/O: input files are copied onto a tmpfs mount sequentially, so as to get the best performance out of the underlying medium (e.g. a hard drive). Transcoding however is done concurrently. Example: file 1 gets copied. When that's done, transcoding of file 1 starts. Meanwhile, file 2 gets copied, etc… Very little time is lost reading the files.
  • Transcoding to several different codecs at once is possible. In that case, decoding of input files is done only once.
  • Multiple instances of caudec can be run concurrently while sharing resources.
  • Metadata is preserved (as much as possible) from one codec to another.
  • Multiprocess ReplayGain scanner (except for Opus and Musepack).
  • Uses existing, popular command line encoders/decoders.


Tested under Arch Linux and OS X. Download here. Please use the bug tracker to report any bugs. Feedback is most welcome!

Reply #1
I just released version 1.1.0, which adds support for Musepack.

Reply #2
Excuse my ignorance, but does TAK actually work under Linux?

Reply #3
The encoder/decoder (Takc.exe) works with Wine. Linux users can use it for archiving, while transcoding to some other codec (e.g. lossy) for listening purposes. Caudec supports TAK encoding and decoding if the user has installed both Wine and TAK.

Reply #4
It just occurred to me that I left out one of caudec's main selling points: it's fast. It sounds obvious to me, but maybe it isn't so much. I was never a sales person. It might also not be obvious that it works best on somewhat large sets of files (e.g. a whole album with one or two CDs, one file per track).

Encoding ABBA's 2CD The Definitive Collection (148 minutes, 37 tracks) from WAV to FLAC --best, with one process, on a Core i7 @ 2.2 GHz: 46x real time.
Same as above, with 8 processes: 186x

Just for kicks, FLAC -5 (default setting) with 8 processes encodes at 569x, TAK -p2 at 743x.
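
For a sense of scale, the figures above can be turned into wall-clock times and a speedup factor. This is only a back-of-the-envelope sketch: it assumes "Nx real time" means audio duration divided by encoding time, and the helper name `encode_seconds` is made up for illustration.

```python
def encode_seconds(audio_minutes, realtime_factor):
    """Wall-clock seconds needed to encode audio at a given multiple of real time."""
    return audio_minutes * 60 / realtime_factor


album_minutes = 148  # the 2CD album from the post above (37 tracks)

one_process = encode_seconds(album_minutes, 46)       # FLAC --best, 1 process
eight_processes = encode_seconds(album_minutes, 186)  # FLAC --best, 8 processes

print(f"1 process:   {one_process:.0f} s")
print(f"8 processes: {eight_processes:.0f} s")
print(f"speedup:     {one_process / eight_processes:.2f}x")
```

Under that assumption the album goes from a bit over three minutes to well under one, a speedup of roughly 4x.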

Reply #5
I just released version 1.3.0 of caudec, which:
  • adds support for WavPack lossy
  • adds support for resampling of stereo files
  • corrects a bug that increased disk space usage on tmpfs
  • improves prediction of required disk space on tmpfs
  • adds support for a CAUDECDIR environment variable for setting the temporary dir to your liking

Upgrading is highly recommended, if only for the bug fix. Please report any issues using the issues tracker.

Reply #6
Hi skamp.  I tried your caudec script and it is definitely very fast.  I tested it by transcoding from flac to ogg -q 7 an album of flacs and it shaved maybe 40% off the time taken by oggenc or by ffmpeg>wav>oggenc or straight ffmpeg -i $file -acodec libvorbis etc.  As far as I can tell all the speed benefit comes from parallel processing (I checked this by processing a single file and finding that in this case caudec is in fact slower than oggenc or a more typical bash script).  So I'm wondering what is the point of creating the tmpfs and doing so much copying?  Is it just to facilitate dropping files in and out of a queue?  I can't see any need to create a memory consuming structure for machines with large amounts of RAM, because transcoding is almost all CPU.  So I like your script's speed but I wonder if the same thing couldn't be achieved more simply by using job control to get bash running parallel encoder processes, or maybe I missed something important?

Reply #7
While encoding is a parallel task, reading from a drive is intrinsically sequential. You can't double read speed by reading 2 files at once. In fact, you're likely to harm read speed. By queuing disk operations and running encoding purely in RAM, caudec cuts out the parallel read bottlenecks and runs the process as fast as possible.
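
A minimal Python sketch of that idea, not caudec's actual code: files are copied into a scratch directory one at a time, and each copy is handed to a pool of workers as soon as it lands. The function name, the `transcode` callback and the `scratch_dir` parameter are all assumptions for illustration. In practice `scratch_dir` would point at a tmpfs mount and `transcode` would launch an external encoder.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def preload_and_transcode(inputs, transcode, workers=8, scratch_dir=None):
    """Copy inputs sequentially into a scratch dir; transcode the copies concurrently.

    `transcode` is a hypothetical callback that receives the staged path.
    Threads are enough here if the callback spends its time waiting on an
    external encoder process rather than running Python code.
    """
    with tempfile.TemporaryDirectory(dir=scratch_dir) as scratch:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = []
            for src in inputs:
                # Assumes input file names are unique within one batch.
                staged = Path(scratch) / Path(src).name
                # Only one read hits the source drive at any given time.
                shutil.copyfile(src, staged)
                # Encoding of this file starts while the next one is copied.
                futures.append(pool.submit(transcode, staged))
            # Collect results (in input order) before the scratch dir is removed.
            return [future.result() for future in futures]
```

The loop never has more than one read in flight against the source drive, while up to `workers` encodes run against the in-memory copies.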

Reply #8
I can see the logic, but disk read speeds are very high these days.  How can there be a bottleneck when reading 6 or 8 or 10 lossless files of maybe between 20MB and 50MB each, which are going to take a few seconds to decode and encode anyway? Surely that doesn't present any kind of challenge with modern hardware?

I'd noticed that oggenc on XP was significantly faster than a gcc compiled oggenc binary in Linux so I was keen to try to make up the difference and Skamp's script prompted me to go back to my bash scripts and add some parallel processing.  My scripts are simpler stuff: essentially decode+dump metadata function, encoder function, metadata writer function.  By letting the core functions of the script run in parallel/background processes (number of cores +1) I can achieve about the same improvements, for example the directory I transcoded earlier, flacs to oggs:

my original bash script:
real   3m3.301s
user   3m8.952s
sys   0m3.496s

caudec:
real   1m47.993s
user   3m11.467s
sys   0m4.126s

my bash script with some parallel processes/backgrounding:
real   1m52.904s
user   3m10.826s
sys   0m3.877s
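
To put those three `real` times side by side, here is a small Python sketch. The helper `to_seconds` is a made-up name, and it assumes the values follow bash's usual `XmY.ZZZs` format.

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` value such as '3m3.301s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") times from the three runs above
real = {
    "original script": to_seconds("3m3.301s"),
    "caudec": to_seconds("1m47.993s"),
    "parallel script": to_seconds("1m52.904s"),
}

baseline = real["original script"]
for name, secs in real.items():
    print(f"{name}: {secs:.1f} s, {baseline / secs:.2f}x vs original")
```

Both parallel runs come out at roughly 1.6x to 1.7x the original script, about five seconds apart, while the `user` CPU time stays close to three minutes in all three cases.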

But I only have a 4-year-old dual-core AMD Athlon64 desktop, a 5-year-old Core Duo (32-bit only) and a Core 2 Duo of similar vintage... no experience of i7 here, so I can't personally scale my tests up to 4 cores and 8 threads. Has anyone with modern hardware (quad core, multi GB RAM, SATA III etc) actually measured the difference, and if so, found it to be substantial?  At the moment I can see Skamp's caudec page, which compares single-thread processing (and I assume conventional reads from HDD) with parallel processing from tmpfs.  Obviously the parallelism makes a huge difference, and perhaps that accounts for all or almost all of it, so what is missing is some data showing that the tmpfs is solving a problem or adding a benefit.

edited for typos.

Reply #9
What Canar said. Hard drives don't like concurrent access, and you actually lose read speed (more than proportionally) as you increase the number of concurrent accesses. My laptop hard drive tops out at maybe 70 MB/s on a single access, but it's not like it gives me 17.5 MB/s per file when I'm accessing 4 files at once, it gives me less than that. Same thing with my USB3 HDD where my backup resides. I tested it a while ago so I don't have the exact figures anymore, but my observation was that single-access, sequential reading was needed.

I have a quad-core i7 with 8 threads and 8 GiB of RAM, so my objective was to get the highest transcoding speeds possible while leveraging the gear at my disposal. Copying input files to a tmpfs sequentially while transcoding them concurrently proved to be the most efficient way. The speed gains range from slight to significant, depending on the gear, the configuration (number of processes, etc…) and the set of files you're transcoding. E.g. reading 8 files at once can slow my hard drive to a crawl.


Reply #11
I dug up an old version (before 1.0) that didn't copy input files to a tmpfs. Here are the results when transcoding FLACs from my hard drive to Ogg Vorbis, with 8 processes, on a 2 CD album with 37 files (same external encoders):
  • old version: 71.41 seconds (15.0 MB/s) (124.3x)
  • latest caudec: 58.71 seconds (18.2 MB/s) (151.1x)

That's roughly a 21% speed increase. Maybe not quite as dramatic as one could hope, but substantial nonetheless.

Obviously I dropped filesystem caches before each run.

Reply #12
Thanks for the info.  If I ever get an i7 I'll be keen to transcode this way.  I've been trying out different numbers of parallel processes and I've found that on my Athlon 64 I get maximum transcode speed by allowing 5 parallel processes instead of 3, and this now performs at least as quickly as the tmpfs method (time difference is <1%), though it's all snail paced compared to your i7 figures; where you get 124x I get 26x (all on the same disk) 

Reply #13
I'm guessing your hard drive is less of a bottleneck with your configuration (CPU speed, number of concurrent reads on the HDD) than with mine.

Incidentally, the tmpfs method provides no speed gain when I'm transcoding FLACs located on my SSD. In that case, the storage medium is no longer the bottleneck. Unfortunately, my SSD is nowhere near large enough to hold my entire FLAC library, so I still have to deal with my slowish HDD.

Reply #14
I got my Core 2 Duo 1.6 GHz running 64-bit Debian Stable headless with 512MB RAM to hit the heady heights of 33x.  It's a champagne moment.  Tomorrow I buy the (parallel) stripes, body kit and chrome exhaust.

Reply #15
I'd noticed that oggenc on XP was significantly faster than a gcc compiled oggenc binary in Linux so I was keen to try to make up the difference


That's the reason I added support for Windows binaries with Wine. There are instructions on how to install and use those with caudec.
lvqcl's Ogg Vorbis AoTuV ICC build might be of interest to you.

Reply #16
I saw the info on wine and win binaries in your docs/examples and it struck a chord because I'd previously noticed a big discrepancy between the speed of oggenc in XP (with foobar as frontend) and oggenc in Debian 32-bit.  But as I don't make a habit of watching the text scroll by I can live with my newly parallelised scripts doing 26x or 33x (finally quicker than AoTuV in XP on my hardware). I'll stick with native binaries so I can run the same scripts across different free OS and architectures and not have to care if wine is installed/working/worth the effort.

Reply #17
btw I booted my XP install to see what foobar2000 and oggenc were doing and discovered that the apparent gulf in encoder performance between oggenc in Debian and oggenc in XP was simply due to foobar2000 running two oggenc processes in parallel (XP version of oggenc being aoTuVb6.03 from rarewares).  Once both cores are maxed out oggenc performs a little faster (very little: <1%, probably has more to do with OS services than the binary) in Debian 32-bit than in XP SP3 32-bit though the difference is very slight (if you measured it using a button-press stopwatch you'd never know there was any difference).  Anyway if I happen again on an application which apparently performs hugely better or worse on a different OS I'll take a closer look before assuming something is either very wrong or inexplicably excellent....

Reply #18
That's roughly a 21% speed increase. Maybe not quite as dramatic as one could hope, but substantial nonetheless.


The benefit gets more obvious as CPU time decreases (the HDD becomes more of a bottleneck). Here's a case where the difference becomes "dramatic": encoding WAVs to FLAC (-q 5, FLAC's default compression level).
  • old version: 70.63 seconds (22.2 MB/s) (125.7x)
  • latest caudec: 38.33 seconds (40.9 MB/s) (231.4x)

That's an 84% speed increase. YMMV of course.
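
For anyone who wants to check the percentages quoted in this thread, they follow directly from the wall-clock times. A small sketch, with `speed_gain_percent` being a name made up for illustration:

```python
def speed_gain_percent(old_seconds, new_seconds):
    """Percentage increase in throughput when the same job takes less time."""
    return (old_seconds / new_seconds - 1) * 100


# (old version, latest caudec) wall-clock seconds, as posted in this thread
runs = {
    "FLAC -> Ogg Vorbis": (71.41, 58.71),
    "WAV -> FLAC": (70.63, 38.33),
}

for job, (old, new) in runs.items():
    print(f"{job}: {speed_gain_percent(old, new):.1f}% faster")
```

The two cases land at roughly 21% and 84%. The gain is larger for the lighter encoding job, where the encoder spends less CPU time per file and reading from the drive accounts for more of the total.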

Reply #19
I am glad more Linux stuff is being done since I use Linux on my laptop and I learn new things all the time. Regards.

Reply #20
I was curious, so I implemented a switch for disabling the preloading of input files to RAM, for cases where the underlying medium is a fast SSD, ramdisk or whatever. I ran a few tests with light to intensive CPU tasks, and the speed gains were negligible. Since inappropriate / uneducated use of that switch could easily cause terrible performance, I've decided to revert the change and not include it in a future release (not until everyone has terabyte SSDs, at least).

Reply #21
I released version 1.4.0, with many changes (pretty much as many commits as all of the other versions combined):
  • now runs on Mac OS X (tested on Lion)
  • smart handling of concurrent instances
  • better detection of ramdisks
  • don't abort if no ramdisk is available
  • support for e/m TAK compression parameters
  • removed reckless option to disable checking of available space
  • fixed long standing bugs in the installation script
  • fixed regression with empty APEv2 tags
  • better handling of ALAC metadata
  • changed handling of user interruption (Ctrl+C), removed pgrep dependency
  • lots of minor fixes

Upgrading is strongly recommended. Please use the tracker to report any bugs.

Reply #22
Latest version (1.4.3) brings support for Opus and ALAC encoding, among other improvements and fixes. See changes.

Reply #23
Excellent, thank you skamp. Going to test it on Debian 6.

Reply #24
I released caudec 1.5.0. Changes:

  • ReplayGain scanner (except for Opus and Musepack)
  • preservation of embedded artwork from FLAC and ALAC, to FLAC, ALAC, AAC and MP3
  • new -C switch disables metadata preservation
  • report both read and write speeds
  • better estimation of ramdisk space requirements with APE input files
  • various fixes


Thanks to Garf for his help on the RG scanner.