Help ripping ~30,000 CDs

Hey! I'm a newbie around these forums, but hopefully I'll be around quite a bit. I really like the community here, and hope I can contribute in the future. But enough introduction, here's the interesting stuff:

I work for a college radio station, and we've decided to undertake the rather ambitious project of digitizing the CDs we've acquired over the years. This is a pretty monumental undertaking, so I'm looking to make this as painless and quick as possible. We have a very rough approximation of about 30,000 CDs that we're looking to convert to digital files, and it's my job to work out many of the more technical aspects of the project.

The problem with being a college radio station is that we're on a pretty limited budget. We can't afford any sort of robot or anything like that to help the process along, nor can we afford any sort of service, so we're stuck doing it ourselves. Thankfully, we have a bunch of people willing to put the time and effort in. We also aren't terribly picky about getting every rip totally 100% perfect. But I've done a fair bit of research, and here's the kind of plan I had in mind:

Ideally, we have one pretty decent quad-core desktop that we're planning to outfit with four CD drives. We have software that allows us to rip multiple discs at once to V0 MP3s, which are stored on a small RAID 1 array inside the computer. I've done some informal ripping tests and have narrowed the software that seems to work best down to fre:ac and dBpoweramp. I also tried EAC and simply ripping with MediaMonkey, but fre:ac and dBpoweramp seemed the most efficient and easiest to use.

Now, if I decide to use one of these programs (if anyone has other suggestions, I'm 100% open to them!), how can I configure it to make the process as painless as possible? Would using multiple drives be an option? I found very little information about software that rips from multiple drives simultaneously, so I'm assuming this is not a common feature. If not, would using different computers be our best bet? Any other suggestions about ripping multiple discs at the same time, or other ways to improve efficiency, would probably make my life much easier.

thanks for your time!

Reply #1
I would use cueripper & foobar2000 but that's just me...

Reply #2
Quote
I would use cueripper & foobar2000 but that's just me...


Would you use CUERipper for the actual digitization and foobar2000 for library management? For actually playing back the library, I kind of had my heart set on MediaMonkey; I feel like it's perfect for this sort of thing. But I'll definitely give both foobar2000 and CUERipper a try, thanks dood.

Reply #3
What about the dBpoweramp batch ripper? Is there anything wrong with it?

Reply #4
Quote
What about the dBpoweramp batch ripper? Is there anything wrong with it?


I'm about halfway through the process of ripping about 10,000 CDs to FLAC files with dBpoweramp. And Batch Ripper is just fine... Highly recommended.

Reply #5
You might try running multiple instances of dBpoweramp, each connected to a different drive.
Consider ripping to a lossless format.

Reply #6
Quote
You might try running multiple instances of dBpoweramp, each connected to a different drive.
Consider ripping to a lossless format.


It was something I gave serious consideration to, but the only issue is hard drive space. After some math, I figured out that I should be able to fit all of the CDs on two 2TB hard drives. I feel like using FLAC or another lossless format would increase the amount of storage space required by quite a bit. There is also the issue of compatibility with other computer systems in the station. Not all of the computers use software that plays nice with FLAC, but everything will play MP3s.

Reply #7
Quote
What about the dBpoweramp batch ripper? Is there anything wrong with it?


I'm about halfway through the process of ripping about 10,000 CDs to FLAC files with dBpoweramp. And Batch Ripper is just fine... Highly recommended.


That is 100% exactly what I was looking for. I can't believe I didn't come across it even with all the searching I did. I'll try this out right now, thank you so much!

Reply #8
Quote
Ideally, we have one pretty decent quad-core desktop that we're planning to outfit with four CD drives.

Not a bad idea. But I would also consider allowing your volunteers to take a stack of CDs home. As long as you standardize on software & settings, and as long as everybody saves their logs, that should work. Maybe you can appeal to your listeners for more volunteers? The task is manageable (perhaps in a semester or two) if you get enough people working on it in parallel. You don't want too many people working on it, though, because you need to maintain standards and keep track of the CDs.

You'll need a plan/procedure for dealing with ripping errors.  Maybe try a different drive/computer, maybe look for another copy of the CD, maybe just have someone "authorized" to listen and approve the file if they can't hear anything wrong...

And, you'll need tagging standards because the online databases are not all standardized or correct.    With 4 drives ripping at once, I'd guess that checking/correcting tags and filenames will take just as much time as the ripping.
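A tagging standard is easier to enforce if it can be checked mechanically. The sketch below is a hypothetical checker in Python: it assumes each track's tags have already been read into a plain dictionary (by whatever tagger the station uses), and the required field names, including `station_id`, are made up for illustration rather than taken from any particular program.

```python
# Hypothetical station tagging standard. These field names are assumptions
# for illustration, not the schema of dBpoweramp, MediaMonkey or any tagger.
REQUIRED_TAGS = ("artist", "album", "title", "tracknumber", "station_id")


def missing_tags(tags: dict) -> list:
    """Return the required tag names that are absent or blank for one track."""
    return [name for name in REQUIRED_TAGS if not str(tags.get(name, "")).strip()]


def tracks_needing_review(album: list) -> dict:
    """Map track index -> list of missing tags, only for tracks that fail."""
    problems = {}
    for index, tags in enumerate(album):
        missing = missing_tags(tags)
        if missing:
            problems[index] = missing
    return problems


if __name__ == "__main__":
    album = [
        {"artist": "Modeselektor", "album": "Monkeytown", "title": "Evil Twin",
         "tracknumber": "4", "station_id": "30458"},
        # Hypothetical second track with a blank title and no station ID.
        {"artist": "Modeselektor", "album": "Monkeytown", "title": "",
         "tracknumber": "5"},
    ]
    print(tracks_needing_review(album))  # {1: ['title', 'station_id']}
```

Anything this flags goes to the "fix tags by hand" pile instead of silently entering the library.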

Quote
We have software that allows us to rip multiple discs at once to V0 MP3s...

After some math I figured out that I should be able to fit all of the CDs on two 2TB hard drives. I feel like using FLAC or other lossless formats would increase the amount of storage space required by quite a bit...

Not all of the computers use software that plays nice with FLAC, but everything will play MP3s.

Here are my thoughts... My biggest concern is that a couple of years from now, or when you get halfway through the project, somebody is going to wish you'd used a lossless format.

FLAC is going to take...  maybe 3 times as much space as V0...  Maybe 4 times as much...    That might be manageable, and I would think about it.

You don't need a format that plays on all computers...  Just the radio station's computers, and you should be able to install a FLAC CODEC on all of the station's machines that are used for audio editing/playback.    If you've mostly got Macs, ALAC may be a better choice than FLAC.    And, any lossless format can be converted to any other lossless or lossy format if necessary or desired.

Reply #9
OP: I would concur with others to consider lossless as well, until I remembered you saying you weren't picky about the ripping accuracy.*

In that case, I might suggest a lower setting like -V2 or even -V3 to save more space (even -V5 is supposed to be very good under normal circumstances). If I recall correctly, there is one online streaming radio station that uses 64kbps AAC, which saves even more space than any of the MP3 settings I suggested.
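To put rough numbers on these settings, here is a back-of-the-envelope size estimate in Python. All inputs are approximations: the LAME figures are the commonly quoted average targets for each -V preset, the FLAC figure is an assumption (real FLAC bitrates vary a lot with the material), and the 50-minute average disc length is also assumed.

```python
# Approximate average bitrates in kbit/s. The FLAC value is an assumption;
# actual FLAC bitrates depend heavily on the music (very roughly 700-1000).
AVG_KBPS = {
    "FLAC": 900,
    "MP3 -V0": 245,
    "MP3 -V2": 190,
    "MP3 -V3": 175,
    "MP3 -V5": 130,
    "AAC 64k": 64,
}


def library_size_tb(kbps: float, cds: int = 30_000, minutes_per_cd: float = 50) -> float:
    """Estimated library size in decimal terabytes (1 TB = 10**12 bytes)."""
    bytes_per_cd = kbps * 1000 / 8 * minutes_per_cd * 60
    return bytes_per_cd * cds / 1e12


if __name__ == "__main__":
    for name, kbps in AVG_KBPS.items():
        print(f"{name:8s} {library_size_tb(kbps):5.1f} TB")
```

With these assumptions FLAC lands at roughly 10 TB and -V0 at under 3 TB, in line with the estimates elsewhere in this thread; plugging in the station's real average disc length would tighten the numbers.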

At worst, the songs/albums that get played or requested the most can be re-ripped to lossless, which in all likelihood will be a tiny fraction of the CDs in the library.

Good luck!

edit: * It seems to me that lossless archival of CD rips containing errors does not make much sense.

Reply #10
Quote
Ideally, we have one pretty decent quad-core desktop that we're planning to outfit with four CD drives.
Not a bad idea. But I would also consider allowing your volunteers to take a stack of CDs home. As long as you standardize on software & settings, and as long as everybody saves their logs, that should work. Maybe you can appeal to your listeners for more volunteers? The task is manageable (perhaps in a semester or two) if you get enough people working on it in parallel. You don't want too many people working on it, though, because you need to maintain standards and keep track of the CDs.


Unfortunately that's not an option for us.  We've had problems with theft in the past and as a result many of us are reluctant to let large chunks of our library out of our sight. Having things go in parallel is a good idea though. We have multiple computers in the station that don't get much use, maybe we can commandeer them.


You'll need a plan/procedure for dealing with ripping errors.  Maybe try a different drive/computer, maybe look for another copy of the CD, maybe just have someone "authorized" to listen and approve the file if they can't hear anything wrong...

And, you'll need tagging standards because the online databases are not all standardized or correct.    With 4 drives ripping at once, I'd guess that checking/correcting tags and filenames will take just as much time as the ripping.


I had the idea of dealing with any CD that didn't have metadata this way: say we're working on a stack of 50 CDs and the 30th one doesn't have metadata. The person doing the bulk ripping would simply put a Post-it or some other marker on the jewel case, and the CD would be revisited later on another computer, where we could find or type in the metadata manually. MediaMonkey has an awesome Discogs tagging add-on which I'm sure will be invaluable.

Unreadable CDs and other serious ripping errors would get another (angrier-colored) Post-it for further review in the future. Honestly, our current goal is to digitize as much of our library as possible while still working quickly, so glossing over a CD here or there won't be too much of an issue.
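The two-Post-it workflow amounts to a small triage rule, sketched here in Python. The `RipResult` fields are hypothetical (in practice they would come from the ripper's log and the metadata lookup), and errors are checked first so that a disc that is both unreadable and untagged goes to the error pile.

```python
from dataclasses import dataclass


@dataclass
class RipResult:
    """Hypothetical summary of one disc after the bulk-ripping pass."""
    station_id: str        # the station's own ID number for the disc
    read_ok: bool          # False if the ripper reported errors or gave up
    metadata_found: bool   # False if no online database match was found


def triage(result: RipResult) -> str:
    """Decide which pile a disc goes to after the first pass."""
    if not result.read_ok:
        return "review_errors"    # the "angrier" Post-it: re-rip or inspect
    if not result.metadata_found:
        return "needs_metadata"   # plain Post-it: tag manually later
    return "done"


def sort_stack(results: list) -> dict:
    """Group a stack of discs into piles keyed by triage outcome."""
    piles = {"done": [], "needs_metadata": [], "review_errors": []}
    for result in results:
        piles[triage(result)].append(result.station_id)
    return piles
```

Keeping the rule this explicit means every volunteer sorts the same disc into the same pile.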


Quote
We have software that allows us to rip multiple discs at once to V0 MP3s...

After some math I figured out that I should be able to fit all of the CDs on two 2TB hard drives. I feel like using FLAC or other lossless formats would increase the amount of storage space required by quite a bit...

Not all of the computers use software that plays nice with FLAC, but everything will play MP3s.

Here are my thoughts... My biggest concern is that a couple of years from now, or when you get halfway through the project, somebody is going to wish you'd used a lossless format.

FLAC is going to take...  maybe 3 times as much space as V0...  Maybe 4 times as much...    That might be manageable, and I would think about it.


Hm. I think you're very much right. With 3 and 4TB drives dropping so much in price, I feel like lossless won't be that much more expensive in the near future. Another thread I found, plus some pretty basic math, makes me think we're going to need roughly 10TB to store all of our stuff without ANY data redundancy. The problem is that 10TB is five 2TB drives, or roughly $600. This all depends on budget, I guess, and budget info isn't something I have at the moment. Lossless files are definitely the best idea for a serious archival project, though.
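The drive count is a ceiling division, and it is worth doing per copy, because a backup copy should not share a drive with the original. A minimal Python sketch; the $120 per 2TB drive is simply the "five drives for roughly $600" figure divided out, not a current price.

```python
import math


def drives_needed(library_tb: float, drive_tb: float, copies: int = 1) -> int:
    """Drives needed to hold `copies` independent copies of the library.

    Each copy is rounded up to whole drives separately, so no drive ever
    holds parts of two copies (which would defeat the point of a backup).
    """
    return math.ceil(library_tb / drive_tb) * copies


def drive_cost(library_tb: float, drive_tb: float, price_per_drive: float,
               copies: int = 1) -> float:
    """Total drive cost for the given number of copies."""
    return drives_needed(library_tb, drive_tb, copies) * price_per_drive


if __name__ == "__main__":
    # Figures from the post above: ~10 TB, 2 TB drives, ~$600 for five.
    print(drives_needed(10, 2), drive_cost(10, 2, 120))                      # 5 600
    print(drives_needed(10, 2, copies=2), drive_cost(10, 2, 120, copies=2))  # 10 1200
```

Having one full backup copy doubles the drive budget, which is why the redundancy question matters as much as the codec choice.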
But I don't know how serious this archival project needs to be, because pristine sound quality isn't something most people here at the station care much about. Some people actually play songs posted on YouTube on air (which I'm fairly sure is a crime against nature).

I just can't figure out whether the increase in sound quality from ripping lossless is worth the huge increase in space and expense. But then there's the issue of future-proofing.

Reply #11
i second the suggestion of ripping to a lossless format. first, storage is cheap. second, generation loss could be an issue. after all, many if not most radio stations broadcast their show in a lossy format (internet radio, digital audio broadcasting, etc.).

Reply #12
I ripped about 7000 CDs to FLAC using dBpoweramp, a Sony XL1B2 200-disc mediachanger (well actually two, luckily since one wore out ... 2nd hand ones available for cheap at Amazon: http://www.amazon.com/gp/offer-listing/B00...;condition=used ).

You can probably use dBpoweramp's Batch Ripper. What I did -- this was at a time Batch Ripper was fresh and a bit immature -- was to hack together an AutoIt3 script that automated dBpoweramp. I did once post it at http://www.avsforum.com/avs-vb/showthread....86#post13939586 , but don't hold it against me, it is fairly lame coding. (And forget whatever I wrote there about HDCD. I regret using the HDCD DSP.)

I know that people have modified REACT to work with the mediachanger too.

Reply #13
Sorry, I didn't read all the posts carefully; nevertheless, let me add/emphasize some points.
  • You should insist on accurate, secure and lossless rips! You'll only do it once, do it right!
  • I've no experience with real batch rippers, but I've ripped a lot of CDs on ordinary PCs equipped with two CD-ROM drives. You will just open two instances of dBpoweramp or EAC and it's just fine. However, this is limited in terms of keeping track of the open CD cases on the desk and the open program instances on the screen. With four drives you won't gain much in overall speed, IMHO. The computer will wait for you (rather than you waiting for the computer).
  • Metadata is a crucial issue. The databases are not always correct, and every n-th CD will not be found. Then you'll have to enter the data yourself, which is the most time-consuming step.
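One way to see why extra drives stop paying off ("the computer will wait for you") is a simple bottleneck model, sketched in Python below. It assumes the drives rip in parallel without slowing each other down and that the operator's time per disc (swapping, checking metadata, labeling) is fixed; the 6-minute rip and 2-minute handling figures are made-up placeholders, not measurements.

```python
def discs_per_hour(drives: int, rip_minutes: float, handling_minutes: float) -> float:
    """Steady-state discs per hour for one operator at one PC.

    The machine can finish `drives / rip_minutes` discs per minute (all
    drives ripping in parallel, assumed not to slow each other down), but
    the operator can only load, check and label `1 / handling_minutes`
    discs per minute. Whichever is slower sets the pace.
    """
    machine_rate = drives / rip_minutes
    operator_rate = 1 / handling_minutes
    return 60 * min(machine_rate, operator_rate)


if __name__ == "__main__":
    # Placeholder timings, not measurements: 6 min per secure rip,
    # 2 min of human handling per disc.
    for n in range(1, 5):
        print(n, "drive(s):", round(discs_per_hour(n, 6, 2), 1), "discs/hour")
```

With those placeholder numbers the rate climbs to 30 discs/hour at three drives and then flattens, because the person, not the hardware, is the limit. Real timings from a test batch will move the crossover point.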
If you can't afford a batch ripper, you can probably distribute the job to many volunteers. In advance, you have to agree on a standard:
  • codec
  • folder hierarchy
  • tagging scheme incl. cover images
  • what to do with unknown or erroneous CDs
When ripping manually, I pre-sort a bunch of CDs: regular albums, sampler, soundtracks etc. This reflects my folder hierarchy and speeds up ripping in my case. You could consider something like this when distributing work to volunteers.

I see you're not likely to give CDs away (I fully understand!). In that case, a large room with many computers equipped with at most 2 drives each will help more than a few computers with a lot of drives each. Just my humble opinion ;-). A real batch ripper will likely outperform such a setup. In addition, network storage might be interesting for you, and a scanner for missing cover art.

Just my thoughts, hopefully of some help for you :-)

Reply #14
Quote
You will just open two instances of dBpoweramp or EAC and it's just fine.


Be careful. In my experience, dBpoweramp might from time to time switch to the most-recently-used drive. Probably not without telling me, but I overlooked it (and got a few rips with completely wrong content). I don't think it is intended to have concurrent instances open.


Quote
pre-sort a bunch of CDs: regular albums, sampler, soundtracks etc.


Also:
- remasters, if you want to have them distinguished.  The metadata sources do not.
- promos. Some of them have beep sounds and talking interfering with the music.
- I keep classical music away from the rest -- or rather: music sorted by composer, apart from music sorted by performer.

Reply #15
On the cost of storing lossless files - consider the tens if not hundreds of thousands of dollars that those 30,000 CDs cost originally. Storage space for FLAC runs about 5 cents per CD.
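The per-disc storage cost is a one-line calculation. In this Python sketch the ~0.34 GB per FLAC disc and the $60/TB price (the $600-for-10TB figure mentioned earlier in the thread) are assumptions; the point is that even several copies stay in the cents.

```python
def storage_cost_per_cd(gb_per_cd: float, dollars_per_tb: float,
                        copies: int = 1) -> float:
    """Dollars of raw disk space per CD, for `copies` full copies."""
    return gb_per_cd / 1000 * dollars_per_tb * copies


if __name__ == "__main__":
    one_copy = storage_cost_per_cd(0.34, 60)
    three_copies = storage_cost_per_cd(0.34, 60, copies=3)
    print(f"${one_copy:.3f} per CD, ${three_copies:.3f} with two backups")
```

Under these assumptions that is about 2 cents for a single copy and about 6 cents with a main copy plus two backups, the same ballpark as the 5 cents quoted above.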

On rippers, absolutely use dBpoweramp. I find that it saves a lot of time on metadata. It is also one of the, if not the, fastest ripper around.

On workflow, I would recommend that if a CD does not have metadata, you set it aside in a pile to be ripped later. This saves having to match the metadata to the rip at a later time.

Reply #16
>My experience with dBpoweramp is that it might from time to time switch to the most-recently-used drive

The latest R14.2 release should have eliminated this possibility. However, for true multi-drive ripping, Batch Ripper was designed for that operation (and has been in use 24/7 for the last 4 years by the largest commercial ripping companies out there).

Reply #17
First off, I want to say thanks so much to everyone replying to this thread so far. It's been incredibly helpful.



Quote
i second the suggestion of ripping to a lossless format. first, storage is cheap. second, generation loss could be an issue. after all, many if not most radio stations broadcast their show in a lossy format (internet radio, digital audio broadcasting, etc.).

This is a good point; we do broadcast online. This is already a huge undertaking and going lossless will make it even bigger, but I'm being convinced more and more that going lossless is worth the effort. Then it's just the issue of "how do we back up 10TB of data and make it network-accessible on the budget of a college radio station?"


Quote
I ripped about 7000 CDs to FLAC using dBpoweramp, a Sony XL1B2 200-disc mediachanger (well actually two, luckily since one wore out ... 2nd hand ones available for cheap at Amazon: http://www.amazon.com/gp/offer-listing/B00...;condition=used ).

You can probably use dBpoweramp's Batch Ripper. What I did -- this was at a time Batch Ripper was fresh and a bit immature -- was to hack together an AutoIt3 script that automated dBpoweramp. I did once post it at http://www.avsforum.com/avs-vb/showthread....86#post13939586 , but don't hold it against me, it is fairly lame coding. (And forget whatever I wrote there about HDCD. I regret using the HDCD DSP.)

I know that people have modified REACT to work with the mediachanger too.

I appreciate the links, and dBoink is a pretty awesome name. If at one point we do decide to get a dedicated ripper, the XL1B will be at the top of the list, thank you! Does Sony have any current version of this that they're selling? And how did you go about storing 7000 CDs of FLAC files?

Quote
Sorry, I didn't read all the posts carefully; nevertheless, let me add/emphasize some points.
  • You should insist on accurate, secure and lossless rips! You'll only do it once, do it right!
  • I've no experience with real batch rippers, but I've ripped a lot of CDs on ordinary PCs equipped with two CD-ROM drives. You will just open two instances of dBpoweramp or EAC and it's just fine. However, this is limited in terms of keeping track of the open CD cases on the desk and the open program instances on the screen. With four drives you won't gain much in overall speed, IMHO. The computer will wait for you (rather than you waiting for the computer).

This is something I didn't consider. I did some more tests last night, and it looks like three drives is the sweet spot. I also think I'll stick with dBpoweramp's batch ripper though.

Quote
  • Metadata is a crucial issue. The databases are not always correct, and every n-th CD will not be found. Then you'll have to enter the data yourself, which is the most time-consuming step.
If you can't afford a batch ripper, you can probably distribute the job to many volunteers. In advance, you have to agree on a standard:
  • codec
  • folder hierarchy
  • tagging scheme incl. cover images
  • what to do with unknown or erroneous CDs
When ripping manually, I pre-sort a bunch of CDs: regular albums, sampler, soundtracks etc. This reflects my folder hierarchy and speeds up ripping in my case. You could consider something like this when distributing work to volunteers.

I think I'll deal with bad or missing metadata by marking CDs that dBpoweramp couldn't rip automatically and revisiting them later to type the data in manually, but separate folders for different types of music is an excellent idea. We get lots of promo material and compilation albums, so it's probably a good idea to separate the music into broad categories.

Here's the example folder hierarchy I was thinking of:
\Library\Category\Artist\Album (ID number we add when we get it)\Track number. Artist - Title

or for a real-life example:
\Library\Full Albums\Modeselektor\Monkeytown (30458)\04. Modeselektor - Evil Twin.flac
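Generating these paths in code keeps every station computer consistent. A minimal Python sketch of the hierarchy above; the function and parameter names are hypothetical, and the character cleanup assumes the files live on a Windows share, where characters such as `/`, `:` and `?` are not allowed in names (think "AC/DC").

```python
import re
from pathlib import PureWindowsPath

# Characters that Windows does not allow in file or folder names.
_ILLEGAL = re.compile(r'[<>:"/\\|?*]')


def clean(part: str) -> str:
    """Replace illegal characters; Windows also rejects trailing dots/spaces."""
    return _ILLEGAL.sub("_", part).rstrip(". ")


def track_path(category: str, artist: str, album: str, station_id: int,
               track_no: int, title: str, ext: str = "flac") -> PureWindowsPath:
    r"""Build \Library\Category\Artist\Album (ID)\NN. Artist - Title.ext"""
    return PureWindowsPath(
        "\\Library",
        clean(category),
        clean(artist),
        clean(f"{album} ({station_id})"),
        clean(f"{track_no:02d}. {artist} - {title}.{ext}"),
    )


if __name__ == "__main__":
    print(track_path("Full Albums", "Modeselektor", "Monkeytown", 30458, 4, "Evil Twin"))
```

Zero-padding the track number (`04`) keeps files in order in plain directory listings, and the station ID in the album folder keeps two releases with the same title apart.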

Quote
I see you're not likely to give CDs away (I fully understand!). In that case, a large room with many computers equipped with at most 2 drives each will help more than a few computers with a lot of drives each. Just my humble opinion ;-). A real batch ripper will likely outperform such a setup. In addition, network storage might be interesting for you, and a scanner for missing cover art.
Just my thoughts, hopefully of some help for you :-)

That's probably a great point. We have some older computers that may allow us to rip while keeping everything standardized; that's something I'll look into. Coming up with a way to store the files efficiently and as cheaply as possible is another concern, too. But thankfully, cover art isn't terribly important, so we probably won't spend too much time on it.

It would be nice to digitize at a rate of something like 100 CDs/hour, but that's likely unattainable without multiple people working at once.
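The arithmetic behind that target, as a Python sketch: the 30,000 discs and the 100 discs/hour goal come from the thread, while the 30 discs/hour per station is a placeholder assumption (timing a test batch would give a real figure).

```python
import math


def stations_needed(target_per_hour: float, per_station_per_hour: float) -> int:
    """Ripping stations (PC + operator) needed to reach a target rate."""
    return math.ceil(target_per_hour / per_station_per_hour)


def project_hours(total_cds: int, cds_per_hour: float) -> float:
    """Hours of ripping time for the whole collection at a steady rate."""
    return total_cds / cds_per_hour


if __name__ == "__main__":
    per_station = 30  # placeholder: discs/hour one operator manages
    print(stations_needed(100, per_station))   # 4
    print(project_hours(30_000, 100))          # 300.0
    print(project_hours(30_000, per_station))  # 1000.0
```

So even at 100 discs/hour the collection is about 300 hours of staffed ripping, and a single station at the placeholder rate would need about 1,000 hours.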

Quote
You will just open two instances of dBpoweramp or EAC and it's just fine.


Be careful. In my experience, dBpoweramp might from time to time switch to the most-recently-used drive. Probably not without telling me, but I overlooked it (and got a few rips with completely wrong content). I don't think it is intended to have concurrent instances open.

Quote
pre-sort a bunch of CDs: regular albums, sampler, soundtracks etc.


Also:
- remasters, if you want to have them distinguished.  The metadata sources do not.
- promos. Some of them have beep sounds and talking interfering with the music.
- I keep classical music away from the rest -- or rather: music sorted by composer, apart from music sorted by performer.

It would be nice to be incredibly specific about these things (classical music and promo material), but I'm only here for three more years! I think we might skip some of the more specific subfolder ordering in the interest of time. We'll be putting the entire library into MediaMonkey, too, so it'll be organized that way as well.

Quote
On the cost of storing lossless files - consider the tens if not hundreds of thousands of dollars that those 30,000 CDs cost originally. Storage space for FLAC runs about 5 cents per CD.


That certainly puts it into perspective, yeah. Hard drives are pretty cheap in the long run.

It just seems like the biggest hurdle now is storing and backing up all the terabytes of data we're going to create by going lossless, while still making them accessible to the other computers on our local network.

Reply #18
Quote
If at one point we do decide to get a dedicated ripper, the XL1B will be at the top of the list, thank you! Does Sony have any current version of this that they're selling? And how did you go about storing 7000 CDs of FLAC files?


They discontinued the XL1B (and dumped the prices gradually down to $84 for the last ones -- compare that to an original price of $799, which was itself half the price of the Powerfile it was based on!), and I think they replaced it with a Blu-ray changer. That does not interest me, so I haven't paid attention since.

7000 CDs in FLAC fits on a single 3TB hard drive (plus backup and offsite backup). 30,000 CDs should then fit on five. There are even cheap consumer-grade motherboards with 6 SATA connectors.

Reply #19
Bad news, our budget means we probably won't be able to afford the equipment to go full FLAC.  We'll probably stick with V0.

But for our RAID array I picked out this enclosure with four of these in RAID 5

Reply #20
Little side note - CD ripping is not digitizing.

Reply #21
Word of advice, if you want a stress free life, stay away from port multipliers...

Reply #22
Quote
Bad news, our budget means we probably won't be able to afford the equipment to go full FLAC.  We'll probably stick with V0.

But for our RAID array I picked out this enclosure with four of these in RAID 5


Just my 2 cents, coming from low-cost world.

HP ProLiant MicroServer N40L (280 USD on Amazon) + Linux from a USB flash drive + 5 x 3TB SATA drives in Linux software RAID5 (the 5th drive sits in the optical drive bay in an internal SATA drive enclosure, using the SATA connector on the board) = 12TB of reliable, redundant file space. The drives are hot-swappable in Linux (tested, according to internet forums). An extra eSATA connector is available, so you could boot from another external SATA drive or add a 6th drive to the array.
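The 12TB figure follows from RAID5 giving up one drive's worth of space to parity: five 3TB drives leave four drives' worth. A small Python helper for comparing levels (raw capacity of identical drives only, before filesystem overhead). Operating systems often report sizes in binary TiB, so 12 decimal TB shows up as about 10.9 TiB.

```python
def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity of an array of identical drives, in the drives' units."""
    minimum = {"RAID1": 2, "RAID5": 3, "RAID6": 4}
    if level not in minimum:
        raise ValueError(f"unsupported level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"{level} needs at least {minimum[level]} drives")
    if level == "RAID1":
        return drive_tb                   # every drive is a full mirror
    if level == "RAID5":
        return (drives - 1) * drive_tb    # one drive's worth of parity
    return (drives - 2) * drive_tb        # RAID6: two drives' worth of parity


if __name__ == "__main__":
    tb = usable_tb("RAID5", 5, 3)
    print(tb, "TB =", round(tb * 1e12 / 2**40, 1), "TiB")
```

RAID6 on the same five drives would leave 9TB but survive two simultaneous drive failures.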

The only issue is the price of hard drives, which is going down only slowly now.

I would not rip to MP3s either, considering the amount of work the process will take.

And BTW, if you need MP3s from FLACs, perhaps the Linux FUSE filesystem mp3fs would come in handy; it works very well: http://khenriks.github.com/mp3fs/

Reply #23
My experience with those Silicon Image SATA controllers: if more than one SATA drive is attached, performance goes down (the raw read/write stream dropped from 130MB/s to 90MB/s in our case, with 7,200 RPM SATA II drives). Hooking 4 SATA drives to a single SATA line via the built-in replicator on the Silicon Image card and running RAID5 on top: I would not do that.
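A rough upper bound shows why a port multiplier hurts RAID5 in particular: every drive in the array is read at once, and they all share one host link. This Python sketch assumes a SATA II link (3 Gbit/s line rate with 8b/10b encoding, so about 300 MB/s of payload) split evenly across the busy drives. Real multipliers add switching overhead on top, so measured figures, like the ones above, can be worse than this ceiling.

```python
def per_drive_mb_s(link_gbit_s: float, drive_mb_s: float, busy_drives: int) -> float:
    """Best-case throughput per drive when `busy_drives` share one SATA link."""
    # 8b/10b encoding: 10 bits on the wire per byte of data.
    link_mb_s = link_gbit_s * 1000 / 10
    return min(drive_mb_s, link_mb_s / busy_drives)


if __name__ == "__main__":
    for busy in (1, 2, 3, 4):
        print(busy, "busy drive(s):", per_drive_mb_s(3, 130, busy), "MB/s each")
```

With four 130 MB/s drives, the link alone caps each one at 75 MB/s during a full-array read such as a RAID5 rebuild.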

Reply #24
Since this is turning into a discussion of storage (on a budget), here's my uneducated two cents, subject to change upon anyone's better arguments:

- RAID is not backup. RAID is a way to reduce the number of times you need to resort to your backup. RAID does not protect against a thief, a lightning strike, or a 'holy s**t, what did I just do?'. RAID5/6/Z/2Z gives you a limited time to replace a broken drive, that's all. That's a big deal if you care about uptime, but on a budget, you don't. You would rather take the array offline until you are sure it is OK again.

- Striping -- i.e., spreading one file over multiple drives -- (basically all RAIDs except RAID1 ... and some nonstandard solutions) is a bit dangerous: even if you have a fault-tolerance of 1 faulty drive of, say, 4 then you need all the other 3 in order to read a single file. You also need the RAID setup. That is, you cannot take a single drive out of the array and get anything out of it -- and if you will take the 3 working drives out, then you need to mount them in a RAID array that can read it.

- There is a proprietary solution called UnRAID which eliminates the issues of striping: it simply dedicates a drive to parity, monitors the other drives, and whenever you write to a drive, it also updates the parity drive. That means you can take drive #2 out of the array, mount it on a different computer, and every file on drive #2 is readable. If drive #2 AND the parity drive are ruined, then retrieve just drive #2 from your backup and clone it. There is a performance loss (writing takes twice the time), but if media files are basically write-once-read-many, that is no issue.
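The dedicated-parity idea can be shown with XOR, the operation single-parity schemes generally rely on. This Python sketch illustrates the general technique, not UnRAID's actual code: each data "drive" (a tiny byte string here) keeps its own whole contents, the parity holds the byte-wise XOR of all data drives, any one lost drive can be rebuilt from the rest, and every write also has to update the parity, which is where the write slowdown comes from.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all be the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """New parity after one drive changes: parity ^ old ^ new."""
    return xor_blocks([parity, old_data, new_data])


if __name__ == "__main__":
    drives = [b"drive-1 data", b"drive-2 data", b"drive-3 data"]
    parity = xor_blocks(drives)

    # Drive 2 dies: XOR of the surviving drives and the parity rebuilds it.
    rebuilt = xor_blocks([drives[0], drives[2], parity])
    print(rebuilt == drives[1])  # True

    # Writing to drive 2 means reading old data and parity, then writing both.
    new = b"DRIVE-2 data"
    parity = update_parity(parity, drives[1], new)
    drives[1] = new
    print(parity == xor_blocks(drives))  # True
```

Because no file is split across data drives, a single surviving data drive is still readable on its own, which is the property the post above contrasts with striping.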


If you still want to do striping (like, RAID5):

- Enclosure RAID with a port multiplier? The web is full of complaints about data loss, so I dare not even try. Yes, port multipliers slow things down (everything has to go through the same channel), and that might be one reason for the issues: the OS might give up because it sees a drive as unresponsive.
(I'm using a port multiplier myself, but with 5 individual drives and no striping, and it is still a bit stressful. I thought it would be no issue since I only read the file I'm playing and never write, or so I thought: Windows writes to the NTFS journal all the time, or something like that.)

- Stay away from 'hardware RAID'. Mainly because you won't actually get hardware RAID on a budget, even though some weasels market it as such: it is done in the drivers, and kind of gives you all the issues of hardware RAID and all the issues of software RAID. And if you actually go for a hardware RAID card, then you need two identical cards, so you have a spare if one breaks, further violating the 'on a budget' purpose.

- Linux software RAID? Fewer issues. FreeNAS with ZFS's RAID-Z? I tried it once on a box that was too old; ZFS does require a fair amount of resources.