Topic: A secure ripper for linux (Read 162343 times)

A secure ripper for linux

UPDATE: New version 0.2 released on 4 August 2006. A lot of improvements.
UPDATE: New version 0.1.1 released on 12 February 2006. Hopefully no more gettext trouble at startup.
UPDATE: Rewritten in Ruby: version 0.1, renamed to Rubyripper, released on 29 January 2006.
UPDATE: New version 0.1.2 released on 10 November.
UPDATE: New version 0.1.1 released on 6 November.
UPDATE: New version 0.1 released on 5 November.

See http://rubyforge.org/projects/rubyripper/ for the changelog.

The info in this first post is quite dated. I suggest you go to the Hydrogenaudio wiki page for more up-to-date information. I'll try to keep the info there current.

Hi all,

I was getting tired of emulating EAC on Linux, so I decided to program a relatively simple but secure procedure to rip my CDs. I've totally rewritten the original program in Ruby, and it now includes a Gtk2-based GUI.

So what is Rubyripper?

- an easy-to-use GUI that uses cdparanoia and cdda2wav
- a smart way to make sure rips are done perfectly. For details, see the documentation section on my site, where the main idea is worked out.
- support for lame, vorbis and flac
- playlist support
- fetches cddb info
- save settings which autoload on startup

Known problems:
Special characters in tags are not supported at the moment. I guess it has something to do with Unicode. The files are named correctly though.

How to install it?
Make sure to have ruby-freedb, ruby-libglade2, cdparanoia and cdda2wav installed as a minimum. You can optionally install lame, vorbis or flac, depending on the format you want to encode to.

Then download Rubyripper from my site.

Unpack it (tar xfj <filename>), make rubyripper.rb executable (chmod +x <filename>) and run ./rubyripper.rb from inside the directory. If it doesn't work, please make sure the dependencies are installed.


The source (same as the executable) is published under the GPL. The rewrite is Linux/BSD-only for now. For any Mac OS users who are interested: please respond to my question on page 4.

Let me know if you find it usable or if any errors occur. I have already tested it on a dozen of my CDs, but new problems can always occur. I'm also open to feature requests.
A secure audio ripper for linux: code.google.com/p/rubyripper

Reply #1
Can it flawlessly rip scratched discs with a caching drive? cdparanoia is such a pain with mine that I use EAC with Wine.
Nevertheless, this is good news.
[edit:]I did not mention the scratched discs.
Stupidity is root of all evil.

Reply #2
Please test it out! My idea is that a bad rip will always produce a different SHA-1 checksum on each try. If this is true, it won't go unnoticed, and pyripper tries again.
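
A minimal sketch of the idea in this post, in Python. It is an illustration under stated assumptions, not the ripper's actual code: `rip_track` is a hypothetical callback that rips the track once and returns the path of the resulting WAV file, and `max_tries` is an invented parameter. The loop accepts a rip only when two consecutive attempts give the same SHA-1 digest.

```python
import hashlib
from typing import Callable, Optional, Tuple


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large WAV never has to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rip_until_match(
    rip_track: Callable[[int], str], max_tries: int = 5
) -> Tuple[Optional[str], int]:
    """Rip repeatedly until two consecutive attempts hash identically.

    rip_track(attempt) is a hypothetical callback: it rips the track once
    and returns the path of the WAV file it wrote.
    Returns (digest, attempts used), or (None, max_tries) if no match.
    """
    previous = None
    for attempt in range(1, max_tries + 1):
        current = sha1_of_file(rip_track(attempt))
        if current == previous:
            return current, attempt
        previous = current
    return None, max_tries
```

Two matching digests show that the reads were consistent. Whether that also means they were correct is exactly the caching question raised in the replies.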
A secure audio ripper for linux: code.google.com/p/rubyripper

Reply #3
You should mention the python-cddb module dependency
[edit]Forget this, I missed it on my first read...
Stupidity is root of all evil.

Reply #4
If the drive caches I'm pretty sure this will be useless...


edit: See sTisTi's reply for more info.

Reply #5
Though I did mention the module, I made it more clear now.

About drives that cache being useless: Is it really absolutely impossible to have a correct rip from them? According to my knowledge cdparanoia can't handle any error correction. But does it always need error correction to provide a correct rip?
A secure audio ripper for linux: code.google.com/p/rubyripper

Reply #6
Quote
Though I did mention the module, I made it more clear now.

About drives that cache being useless: Is it really absolutely impossible to have a correct rip from them? According to my knowledge cdparanoia can't handle any error correction. But does it always need error correction to provide a correct rip?

I think the problem is that drives that cache do not report an error, as they re-read the data from their cache instead of from the CD. However, if you rip the complete track twice and compare the CRCs, caching is no problem (as in EAC's test & copy feature). But then you also don't need any "secure" ripping method; you can just as well rip in burst mode, which is way faster. Therefore, combining a "secure" ripping method with test & copy is pointless, as the matching CRCs are safe enough.

For error correction through re-reading (as in EAC's secure mode), drives that cache audio are worthless because, as said above, they re-read from their cache instead of trying to read the disc again. The only way to avoid this is to DISABLE the cache, as e.g. EAC and Plextools Pro can do. So the real challenge is to find a way to disable the cache in conjunction with cdparanoia. Then it would probably be as useful for secure rips as EAC.

As for error correction, it is not necessary for perfect rips as long as the discs have no errors.
Proverb for Paranoids: "If they can get you asking the wrong questions, they don't have to worry about answers."
-T. Pynchon (Gravity's Rainbow)

Reply #7
I don't know what EAC does to bypass cache but foobar reads blocks bigger than the cache thus bypassing it.
Cdparanoia would probably have to be modified to do this but I don't know...

Reply #8
Well, seems like the comparison between the two rips of this track will never match.
Each time it gets a different hash. As far as I know, that is just a plain logical behavior.
Stupidity is root of all evil.

Reply #9
Since there are no Python bindings to cdparanoia, this is difficult for me to achieve. It would require changes to the source code of cdparanoia, which is (I think) written in C. I simply do not have the skills for that.

But what we have now is already a big improvement, in my humble opinion. As far as I know, test and copy was not yet available anywhere on Linux. At least you can be certain whether a rip succeeded or failed.

@damaki: Good to hear, this works for others too. I only had two discs so far with a mismatch.
A secure audio ripper for linux: code.google.com/p/rubyripper

Reply #10
Quote
I don't know what EAC does to bypass cache but foobar reads blocks bigger than the cache thus bypassing it.
Cdparanoia would probably have to be modified to do this but I don't know...


My CD-Drive caches audio data.
Since CDex uses cdparanoia, do I understand you right - I CAN'T use CDex?
Ogg Vorbis for music and speech [q-2.0 - q6.0]
FLAC for recordings to be edited
Speex for speech

Reply #11
I'll just correct myself... since this checks hash of complete file it is unaffected by cache.

Cache is only a problem if you rip a small piece at a time and compare several rips of that small block. Since a complete rip obviously is not cached this is not a problem.

Reply #12
Quote
My CD-Drive caches audio data.
Since CDex uses cdparanoia, do I understand you right - I CAN'T use CDex?

At least you can't be sure of getting secure rips if you use CDex in paranoia mode due to the caching problem. You could as well use burst mode for that, which is faster.
CDex is really worse than useless for drives that cache audio. I once tried to extract a heavily scratched CD in paranoia mode with my laptop drive, which caches audio. It breezed through the CD without complaining, reporting any error, or re-reading, but the result was abysmal: full of pops and clicks.
For secure rips with a drive that caches audio, you have to either:
- use some kind of "test & copy" feature (in burst mode) and compare CRCs
- use a secure mode that can disable cache (i.e. EAC's secure mode)
- use the accuraterip database (with EAC or dbPoweramp)
- or maybe use foobar's new ripper, but I don't know it
Proverb for Paranoids: "If they can get you asking the wrong questions, they don't have to worry about answers."
-T. Pynchon (Gravity's Rainbow)

Reply #13
Quote
I don't know what EAC does to bypass cache but foobar reads blocks bigger than the cache thus bypassing it.
Cdparanoia would probably have to be modified to do this but I don't know...


EAC sends special commands (FUA bit to flush cache) to the drive. CDparanoia has that feature in the CVS, but not in the release versions. And the CVS FUA code is broken anyway. I think FUA is the least "hackish" way of working around cache.

Reply #14
(off-topic:)
another great feature of EAC missing on gnu/linux (that i know of) is CUE reading/writing/support.

soooo... a great feature to implement would be correct creation & implementation of the CUE.

closest to support on gnu/linux is K3B (which has only partial/basic support).


(back to on-topic)
regarding audio cache - wouldn't a drive that caches audio data have identical rips in multiple passes, yet these identical checksums would fail to provide a secure rip due to the audio cache?

in that case, consistency != secure... just a thought...


later

Reply #15
Quote
Since there are no Python bindings to cdparanoia, this is difficult for me to achieve. It would require changes to the source code of cdparanoia, which is (I think) written in C. I simply do not have the skills for that.

But what we have now is already a big improvement, in my humble opinion. As far as I know, test and copy was not yet available anywhere on Linux. At least you can be certain whether a rip succeeded or failed.

@damaki: Good to hear, this works for others too. I only had two discs so far with a mismatch.


A mighty suggestion for your ripper:
1. If the compare between 2 files fails, keep the two copies.
2. Compare each sample of both files, samples that are identical should be written to a 3rd file, fill with zeros the rest, and keep a table of 'samples to go'.
3. Re-rip, now you have a 4th file, compare the 'samples to go' from this 4th file with 1st and 2nd file, if one matches write to 3rd file, update 'samples to go'
4. Re-rip, now you have a 5th file...and so on and on until samples to go is void.

An unorthodox way of doing things, but I believe this should mean WAY FEWER rips to get a proper file.
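
The four steps above could be sketched roughly as follows in Python. This is only an illustration under assumptions of mine: `next_rip` is a hypothetical callback returning one full rip as a sequence of integer samples, all rips are assumed to have the same length and alignment, and `max_rips` is an invented safety limit. A sample is accepted as soon as any two rips agree on it.

```python
from typing import Callable, List, Optional, Sequence


def merge_rips(
    next_rip: Callable[[], Sequence[int]], max_rips: int = 10
) -> Optional[List[int]]:
    """Build a merged track from samples that at least two rips agree on.

    next_rip() is a hypothetical callback that returns one complete rip
    as a sequence of samples. Returns None if samples remain unresolved
    after max_rips attempts.
    """
    rips = [list(next_rip()), list(next_rip())]
    length = len(rips[0])
    merged = [0] * length          # unresolved positions stay zero for now
    pending = set()                # the 'samples to go' table
    for i in range(length):
        if rips[0][i] == rips[1][i]:
            merged[i] = rips[0][i]
        else:
            pending.add(i)

    while pending and len(rips) < max_rips:
        fresh = list(next_rip())
        for i in sorted(pending):
            # Accept the new sample if any earlier rip had the same value.
            if any(fresh[i] == old[i] for old in rips):
                merged[i] = fresh[i]
                pending.discard(i)
        rips.append(fresh)

    return merged if not pending else None
```

Keeping every rip in memory is the simplest way to show the idea. Note also that two reads agreeing on a sample makes it likely, not certain, that the sample is right.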

Reply #16
A few suggestions:
  • cdparanoia supports offset correction, like EAC. It would be a shame not to include it (-O switch of cdparanoia)
  • you might want to make the temporary filenames a bit more secure, by adding $$ (PID of the program)
  • I don't see the point of keeping the first rip once you've computed its hash. If /tmp is all in RAM (which it should be), you're only monopolizing RAM for nothing (and .wav files are relatively big). I don't know python very well, but if I'm not mistaken, you're only deleting the whole album at once, when all the tracks have been ripped (500-600 MiB x2 !).
  • instead of supporting only a few codecs, you could simply allow the user to add a custom command line, like in EAC. I rip my CDs to WavPack!
  • you might also want to make cdparanoia's command line tweakable.
Good initiative anyway
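
A rough Python sketch of two of these suggestions. For the temporary files it uses `tempfile.mkstemp` from the standard library rather than appending the PID, which is a swapped-in technique with the same goal of unique, non-guessable names. For the tweakable command line, `build_rip_command` is a hypothetical helper of mine. The `-O` switch is the one named above, while the `cdparanoia [options] <track> <outfile>` argument order is my assumption and should be checked against the man page.

```python
import os
import tempfile
from typing import List, Optional, Sequence


def make_temp_wav(track: int, directory: Optional[str] = None) -> str:
    """Create a uniquely named, empty temp file for one track and return its path."""
    fd, path = tempfile.mkstemp(
        prefix="track%02d_" % track, suffix=".wav", dir=directory
    )
    os.close(fd)  # the ripper will reopen the file by name
    return path


def build_rip_command(
    track: int, out_path: str, offset: int = 0, extra_args: Sequence[str] = ()
) -> List[str]:
    """Assemble a cdparanoia command line (hypothetical helper, nothing is run).

    offset is passed through the -O switch; extra_args lets the user
    tweak the command line with further options.
    """
    command = ["cdparanoia"]
    if offset:
        command += ["-O", str(offset)]
    command += list(extra_args)
    command += [str(track), out_path]
    return command
```

The returned list could be handed to `subprocess.run`, and each temp file can be removed with `os.remove` as soon as its track has been verified and encoded, instead of deleting the whole album at the end.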

Reply #17
Quote
(back to on-topic)
regarding audio cache - wouldn't a drive that caches audio data have identical rips in multiple passes, yet these identical checksums would fail to provide a secure rip due to the audio cache?

in that case, consistency != secure... just a thought...

No, as the drive's cache is way too small to hold a whole song. So matching CRCs with test & copy are secure for any drive.
Proverb for Paranoids: "If they can get you asking the wrong questions, they don't have to worry about answers."
-T. Pynchon (Gravity's Rainbow)

Reply #18
Quote
A mighty suggestion for your ripper:
1. If the compare between 2 files fails, keep the two copies.
2. Compare each sample of both files, samples that are identical should be written to a 3rd file, fill with zeros the rest, and keep a table of 'samples to go'.
3. Re-rip, now you have a 4th file, compare the 'samples to go' from this 4th file with 1st and 2nd file, if one matches write to 3rd file, update 'samples to go'
4. Re-rip, now you have a 5th file...and so on and on until samples to go is void.

This is a good idea, of course, it's not necessary to store the entire file - just the hash value for each of the attempts.

Alongside the SHA-1 comparison, maybe you should actively compare the two files, so you can get an idea of how different they are. This could be kept as a number of different samples, or peak RMS difference over the track. Average RMS difference would also be instructive. Because of the way SHA-1 works, a flip of a single bit will have a major effect on the hash output. While this is good for cryptographic applications, there are instances when a bit flip (one or two LSB flips in a whole file, for example) is simply not a problem. Maybe you should give the user the option of running until the peak signal difference is -80dB or something.
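
To make the comparison idea concrete, here is a small Python sketch. It assumes both rips are raw 16-bit little-endian PCM of equal length (the WAV header already stripped); the function names are mine. It reports how many samples differ and the peak difference in dB relative to 16-bit full scale.

```python
import math
import struct
from typing import Tuple


def decode_pcm16(raw: bytes) -> Tuple[int, ...]:
    """Decode raw 16-bit little-endian PCM bytes into integer samples."""
    count = len(raw) // 2
    return struct.unpack("<%dh" % count, raw[: count * 2])


def compare_rips(raw_a: bytes, raw_b: bytes) -> Tuple[int, float]:
    """Return (number of differing samples, peak difference in dB full scale).

    Identical rips give (0, -inf). If the lengths differ, only the
    common prefix is compared.
    """
    diffs = [abs(x - y) for x, y in zip(decode_pcm16(raw_a), decode_pcm16(raw_b))]
    differing = sum(1 for d in diffs if d)
    peak = max(diffs, default=0)
    peak_db = 20 * math.log10(peak / 32768) if peak else float("-inf")
    return differing, peak_db
```

For scale: a single flipped least significant bit is a difference of 1 out of 32768, which is about -90 dB, so it would fall under a -80 dB threshold like the one suggested above.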

Anyways, this looks really promising. Good luck for future development.

Reply #19
Well, good initiative for Linux users like me. I am working on a small lame/ogg/flac frontend, so I might just hook this up somehow. Keep it coming.
An eye for eye will make the whole world blind

Reply #20
Some good points I have seen here. I'll try to implement them, provided that I can, as soon as possible. This will probably be on the weekend.

I will make the cdparanoia options tweakable in next version.

Quote
  • you might want to make the temporary filenames a bit more secure, by adding $$ (PID of the program)
  • I don't see the point of keeping the first rip once you've computed its hash. If /tmp is all in RAM (which it should be), you're only monopolizing RAM for nothing (and .wav files are relatively big). I don't know python very well, but if I'm not mistaken, you're only deleting the whole album at once, when all the tracks have been ripped (500-600 MiB x2 !).


Sorry, I fail to see how security has anything to do with this. By the way, /tmp isn't in RAM; it is just a local directory named tmp, which is deleted at the end of the ripping process. I could also use the official TEMP directory provided by the system.
And why should the temp files be in RAM? No other ripper I know does this, including EAC. This would require a lot of RAM! Please share your opinions about this.

Although disk space isn't a concern of mine, I'll change the process anyway. The next version will rip a track and save the digest in a list. It will rip the track again and check whether the digests match (as it does now); otherwise it appends the new digest to the list (which is new). If a third rip is necessary, it will check whether its digest matches any of the digests in the list. And so on. When the encoding is finished, the source files will be deleted to save space and the digest list will be emptied. There will be an option to keep the source files. This will save unnecessary rip attempts and hard disk space.
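
The planned digest list could look roughly like this in Python. It is a sketch under assumptions, not the code of the next version: `rip_track` is a hypothetical callback that returns the ripped audio as bytes, and `max_tries` is an invented limit. Unlike a plain comparison of two consecutive rips, a new rip is accepted when its digest matches any earlier attempt.

```python
import hashlib
from typing import Callable, List, Optional, Tuple


def rip_with_digest_list(
    rip_track: Callable[[], bytes], max_tries: int = 6
) -> Tuple[Optional[str], int]:
    """Rip until a digest repeats anywhere in the list of earlier digests.

    rip_track() is a hypothetical callback returning one rip as bytes.
    Returns (digest, attempts used), or (None, max_tries) on failure.
    """
    seen: List[str] = []
    for attempt in range(1, max_tries + 1):
        digest = hashlib.sha1(rip_track()).hexdigest()
        if digest in seen:
            return digest, attempt
        seen.append(digest)
    return None, max_tries
```

With rips that come out good, bad, good, this stops after the third attempt, because the third digest matches the first even though the second differed. Only the digests need to be kept between attempts, not the earlier WAV files.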

Quote
  • instead of supporting only a few codecs, you could simply allow the user to add a custom command line, like in EAC. I rip my CDs to WavPack!


I didn't know people were using other formats on Linux. Please feel welcome to tell me which format you use and what a sensible default would be.

And now I have some coffee
A secure audio ripper for linux: code.google.com/p/rubyripper

Reply #21
I'm back from my coffee

Quote
Alongside the SHA-1 comparison, maybe you should actively compare the two files, so you can get an idea of how different they are. This could be kept as a number of different samples, or peak RMS difference over the track. Average RMS difference would also be instructive. Because of the way SHA-1 works, a flip of a single bit will have a major effect on the hash output. While this is good for cryptographic applications, there are instances when a bit flip (one or two LSB flips in a whole file, for example) is simply not a problem. Maybe you should give the user the option of running until the peak signal difference is -80dB or something.


Though I like the idea of seeing how much the files would differ, this would not be very simple to implement. RMS for one thing only tells about the average volume of a track. Ripping errors are not likely to change the result of this significantly in my opinion. I'll see if something else could be done, but it won't have high priority. Any ideas how to solve this will be welcome though.

@cartman & maybe others: In the end I want to have a gui on top of this. But since I'm not a really good designer this might still take a while. You might just help with designing a glade file for a gtk gui (with glade or gazpacho) or a ui file (with qt designer) for a qt gui. I'm sure that actually linking the gui to the program won't be that hard. I already experimented some
A secure audio ripper for linux: code.google.com/p/rubyripper

Reply #22
Quote
@cartman & maybe others: In the end I want to have a gui on top of this. But since I'm not a really good designer this might still take a while. You might just help with designing a glade file for a gtk gui (with glade or gazpacho) or a ui file (with qt designer) for a qt gui. I'm sure that actually linking the gui to the program won't be that hard. I already experimented some


Mine is just a Qt4 experiment ;-) Just make sure the backend is separate from the frontend.
An eye for eye will make the whole world blind

Reply #23
If disk space is not an issue, using 'cmp' instead of 'sha1sum' will be faster, especially when there is a difference, since cmp stops at the first differing byte.

Josh
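
For a ripper that does the comparison in Python rather than calling cmp, the same early-exit behaviour can be sketched like this. The function name and chunk size are my own choices; it reads both files in chunks and stops at the first chunk that differs.

```python
from typing import Optional


def first_difference(
    path_a: str, path_b: str, chunk_size: int = 65536
) -> Optional[int]:
    """Return the byte offset of the first difference, or None if identical.

    If one file is a prefix of the other, the offset is the length of
    the shorter file.
    """
    offset = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one file ended early.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files ended with no difference
            offset += len(chunk_a)
```

The standard library's `filecmp.cmp(path_a, path_b, shallow=False)` gives a plain yes/no answer if the offset is not needed. Either way, both rips have to be on disk at the same time, which is the disk space trade-off mentioned above.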

Reply #24
Well, the SHA1-sum of two files takes about 2 seconds to execute on my sempron 2600. Notice that I use the internal sha1-module of python for this. Compared to the 3-7 minute time to actually rip each track twice, this is not really a problem.
A secure audio ripper for linux: code.google.com/p/rubyripper