A secure ripper for linux, My latest Python programming results
frodoontop
post Oct 31 2005, 18:42
Post #1





Group: Members
Posts: 131
Joined: 6-March 03
Member No.: 5359



UPDATE: New version 0.2 released on 4 August 2006. A lot of improvements.
UPDATE: New version 0.1.1 released on 12 February 2006. Hopefully no more gettext trouble at startup.
UPDATE: Now rewritten in Ruby: version 0.1, renamed to Rubyripper, was released on 29 January 2006.
UPDATE: New version 0.1.2 was released on 10 November.
UPDATE: New version 0.1.1 was released on 6 November.
UPDATE: New version 0.1 was released on 5 November.

See http://rubyforge.org/projects/rubyripper/ for the changelog.

The info in this first post is quite dated. I suggest you go to the Hydrogenaudio wiki page for more up-to-date information. I'll try to keep the info there current.

Hi all,

I was getting tired of emulating EAC on Linux, so I decided to program a relatively simple but secure procedure to rip my CDs. I've totally rewritten the original program in Ruby, and it now includes a Gtk2-based GUI.

So what is Rubyripper?

- an easy-to-use GUI which makes use of cdparanoia and cdda2wav
- a smart way to make sure rips are done perfectly. For details, see the documentation section of my site, where the main idea is worked out.
- support for lame, vorbis and flac
- playlist support
- fetches cddb info
- saves settings, which autoload on startup

Known problems:
Special characters in tags are not supported at the moment. I guess it has something to do with Unicode. The files are named correctly, though.

How to install it?
Make sure you have at least ruby-freedb, ruby-libglade2, cdparanoia and cdda2wav installed. Optionally install lame, vorbis or flac, depending on the format you want to encode to.

Then download Rubyripper from my site.

Unpack it (tar xfj <filename>), make rubyripper.rb executable (chmod +x <filename>) and run ./rubyripper.rb from inside the directory. If it doesn't work, please make sure the dependencies are installed.


The source (same as the executable) is published under the GPL. The rewrite is Linux/BSD-only for now. Any Mac OS users who are interested: please respond to my question on page 4.

Let me know if you find it usable or if any errors occur. I have already tested it on a dozen of my CDs, but new problems can always occur. I'm also open to feature requests :)

This post has been edited by frodoontop: Aug 4 2006, 20:20


--------------------
A secure audio ripper for linux: code.google.com/p/rubyripper
damaki
post Oct 31 2005, 20:35
Post #2





Group: Members
Posts: 143
Joined: 13-July 03
From: Paris, France
Member No.: 7740



Can it flawlessly rip scratched discs with a caching drive? cdparanoia is such a pain with mine that I use EAC with Wine.
Nevertheless, this is good news. :)
[edit:] I did not mention the scratched discs.

This post has been edited by damaki: Oct 31 2005, 20:42


--------------------
Stupidity is root of all evil.
frodoontop
post Oct 31 2005, 20:58
Post #3





Group: Members
Posts: 131
Joined: 6-March 03
Member No.: 5359



Please test it out! My idea is that a wrong rip will always have a different SHA-1 checksum on each try. If this is true, an error won't go unnoticed, and pyripper tries the rip again.
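
The check described here can be sketched roughly as follows. This is only an illustration and not pyripper's actual code; the function names `sha1_of_file` and `rips_match` are made up for the example, and it assumes both rips of a track are already on disk as separate files.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rips_match(first_path, second_path):
    """True when two rips of the same track have the same SHA-1 digest."""
    return sha1_of_file(first_path) == sha1_of_file(second_path)
```

A caller would rip the track twice, call `rips_match` on the two files, and rip again whenever it returns False.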


--------------------
A secure audio ripper for linux: code.google.com/p/rubyripper
damaki
post Oct 31 2005, 21:07
Post #4





Group: Members
Posts: 143
Joined: 13-July 03
From: Paris, France
Member No.: 7740



You should mention the python-cddb module dependency ;)
[edit] Forget this, I missed it on my first read...

This post has been edited by damaki: Oct 31 2005, 21:15


--------------------
Stupidity is root of all evil.
Jan S.
post Oct 31 2005, 21:11
Post #5





Group: Admin
Posts: 2550
Joined: 26-September 01
From: Denmark
Member No.: 21



If the drive caches I'm pretty sure this will be useless...


edit: See sTisTi's reply for more info.

This post has been edited by Jan S.: Oct 31 2005, 21:32
frodoontop
post Oct 31 2005, 21:15
Post #6





Group: Members
Posts: 131
Joined: 6-March 03
Member No.: 5359



Though I did mention the module, I made it more clear now.

About drives that cache being useless: Is it really absolutely impossible to have a correct rip from them? According to my knowledge cdparanoia can't handle any error correction. But does it always need error correction to provide a correct rip?


--------------------
A secure audio ripper for linux: code.google.com/p/rubyripper
sTisTi
post Oct 31 2005, 21:27
Post #7





Group: Members
Posts: 385
Joined: 25-June 04
Member No.: 14895



QUOTE (frodoontop @ Oct 31 2005, 12:15 PM)
Though I did mention the module, I made it more clear now.

About drives that cache being useless: Is it really absolutely impossible to have a correct rip from them? According to my knowledge cdparanoia can't handle any error correction. But does it always need error correction to provide a correct rip?
*

I think the problem is that drives that cache do not report an error, as they re-read the data from their cache instead of from the CD. However, if you rip the complete track twice and compare the CRCs, caching is no problem (as in EAC's test & copy feature). But then you also don't need any "secure" ripping method; you can just as well rip in burst mode, which is way faster. Using "secure" ripping methods together with test & copy is therefore pointless, as the matching CRCs are safety enough.

For error correction through re-reading (as in EAC's secure mode), drives that cache audio are worthless because, as said above, they re-read from their cache instead of trying to read the disc again. The only way to avoid this is to DISABLE the cache, as e.g. EAC and Plextools Pro can do. So the real challenge is to find a way to disable the cache in conjunction with cdparanoia. Then it would probably be as useful for secure rips as EAC.

As for error correction, it is not necessary for perfect rips as long as the discs have no errors ;)

This post has been edited by sTisTi: Oct 31 2005, 21:32


--------------------
Proverb for Paranoids: "If they can get you asking the wrong questions, they don't have to worry about answers."
-T. Pynchon (Gravity's Rainbow)
Jan S.
post Oct 31 2005, 21:30
Post #8





Group: Admin
Posts: 2550
Joined: 26-September 01
From: Denmark
Member No.: 21



I don't know what EAC does to bypass cache but foobar reads blocks bigger than the cache thus bypassing it.
Cdparanoia would probably have to be modified to do this but I don't know...
damaki
post Oct 31 2005, 21:40
Post #9





Group: Members
Posts: 143
Joined: 13-July 03
From: Paris, France
Member No.: 7740



Well, it seems the comparison between the two rips of this track will never match.
Each time it gets a different hash. As far as I know, that is just plain logical behavior. :)


--------------------
Stupidity is root of all evil.
frodoontop
post Oct 31 2005, 21:41
Post #10





Group: Members
Posts: 131
Joined: 6-March 03
Member No.: 5359



Since there are no Python bindings for cdparanoia, this is difficult for me to achieve. It would require changes to the source code of cdparanoia, which is (I think) written in C. I simply do not have the skills for that.

But what we have now is already a big improvement, in my humble opinion. As far as I know, test and copy was not yet available anywhere on Linux. At least you can be certain whether a rip succeeded or failed.

@damaki: Good to hear that this works for others too. I have only had two discs so far with a mismatch.

This post has been edited by frodoontop: Oct 31 2005, 21:54


--------------------
A secure audio ripper for linux: code.google.com/p/rubyripper
de Mon
post Oct 31 2005, 21:43
Post #11





Group: Members
Posts: 474
Joined: 1-December 02
Member No.: 3940



QUOTE (Jan S. @ Oct 31 2005, 12:30 PM)
I don't know what EAC does to bypass cache but foobar reads blocks bigger than the cache thus bypassing it.
Cdparanoia would probably have to be modified to do this but I don't know...
*


My CD-Drive caches audio data.
Since CDex uses cdparanoia, do I understand you right - I CAN'T use CDex?


--------------------
Ogg Vorbis for music and speech [q-2.0 - q6.0]
FLAC for recordings to be edited
Speex for speech
Jan S.
post Oct 31 2005, 22:24
Post #12





Group: Admin
Posts: 2550
Joined: 26-September 01
From: Denmark
Member No.: 21



I'll just correct myself... since this checks the hash of the complete file, it is unaffected by the cache.

The cache is only a problem if you rip a small piece at a time and compare several rips of that small block. Since a complete rip obviously is not cached, this is not a problem.
sTisTi
post Oct 31 2005, 22:33
Post #13





Group: Members
Posts: 385
Joined: 25-June 04
Member No.: 14895



QUOTE (de Mon @ Oct 31 2005, 12:43 PM)
My CD-Drive caches audio data.
Since CDex uses cdparanoia, do I understand you right - I CAN'T use CDex?
*

At least you can't be sure of getting secure rips if you use CDex in paranoia mode, due to the caching problem. You could just as well use burst mode for that, which is faster.
CDex is really worse than useless for drives that cache audio. I once tried to extract a heavily scratched CD in paranoia mode with my laptop drive, which caches audio. It breezed through the CD without complaining, reporting any error or re-reading, but the result was abysmal: full of pops and clicks.
For secure rips with a drive that caches audio, you have to either:
- use some kind of "test & copy" feature (in burst mode) and compare CRCs
- use a secure mode that can disable the cache (e.g. EAC's secure mode)
- use the AccurateRip database (with EAC or dBpoweramp)
- or maybe use foobar's new ripper, but I don't know it

This post has been edited by sTisTi: Oct 31 2005, 22:34


--------------------
Proverb for Paranoids: "If they can get you asking the wrong questions, they don't have to worry about answers."
-T. Pynchon (Gravity's Rainbow)
rjamorim
post Oct 31 2005, 23:02
Post #14


Rarewares admin


Group: Members
Posts: 7515
Joined: 30-September 01
From: Brazil
Member No.: 81



QUOTE (Jan S. @ Oct 31 2005, 06:30 PM)
I don't know what EAC does to bypass cache but foobar reads blocks bigger than the cache thus bypassing it.
Cdparanoia would probably have to be modified to do this but I don't know...
*


EAC sends special commands (FUA bit to flush cache) to the drive. CDparanoia has that feature in the CVS, but not in the release versions. And the CVS FUA code is broken anyway. I think FUA is the least "hackish" way of working around cache.

This post has been edited by rjamorim: Oct 31 2005, 23:03


--------------------
Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org
xmixahlx
post Nov 1 2005, 01:43
Post #15





Group: Members
Posts: 1394
Joined: 20-December 01
From: seattle
Member No.: 693



(off-topic:)
another great feature of EAC missing on gnu/linux (that i know of) is CUE reading/writing/support.

soooo... a great feature to implement would be correct creation & implementation of the CUE.

closest to support on gnu/linux is K3B (which is only partial/basic support).


(back to on-topic)
regarding audio cache - wouldn't a drive that caches audio data have identical rips in multiple passes, yet these identical checksums would fail to provide a secure rip due to the audio cache?

in that case, consistency != secure... just a thought...


later

This post has been edited by xmixahlx: Nov 1 2005, 01:48


--------------------
RareWares/Debian :: http://www.rarewares.org/debian.html
legg
post Nov 1 2005, 03:44
Post #16





Group: Members
Posts: 175
Joined: 5-March 05
From: Morelia, Mexico
Member No.: 20386



QUOTE (frodoontop @ Oct 31 2005, 03:41 PM)
Since there are no Python bindings for cdparanoia, this is difficult for me to achieve. It would require changes to the source code of cdparanoia, which is (I think) written in C. I simply do not have the skills for that.

But what we have now is already a big improvement, in my humble opinion. As far as I know, test and copy was not yet available anywhere on Linux. At least you can be certain whether a rip succeeded or failed.

@damaki: Good to hear that this works for others too. I have only had two discs so far with a mismatch.
*


A mighty suggestion for your ripper:
1. If the compare between 2 files fails, keep the two copies.
2. Compare each sample of both files, samples that are identical should be written to a 3rd file, fill with zeros the rest, and keep a table of 'samples to go'.
3. Re-rip, now you have a 4th file, compare the 'samples to go' from this 4th file with 1st and 2nd file, if one matches write to 3rd file, update 'samples to go'
4. Re-rip, now you have a 5th file...and so on and on until samples to go is void.

An orthodox way of doing things but I believe this should mean WAY LESS rips to get a proper file.
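
A rough sketch of how the steps above might look, under some simplifying assumptions: each rip is already decoded into a list of integer samples of equal length, and a sample counts as resolved as soon as any two rips agree on it (a slight generalization of comparing each new rip against the first two). The name `merge_rips` is made up for the example.

```python
def merge_rips(rips):
    """Merge several rips of one track, sample by sample.

    A position is accepted as soon as two rips agree on its value.
    Returns (merged, pending): merged holds 0 at unresolved positions,
    and pending lists the positions still waiting for a second opinion.
    """
    if not rips:
        raise ValueError("need at least one rip")
    length = len(rips[0])
    if any(len(rip) != length for rip in rips):
        raise ValueError("all rips must have the same number of samples")

    merged = [0] * length
    pending = []
    for position in range(length):
        seen = set()
        accepted = None
        for rip in rips:
            value = rip[position]
            if value in seen:  # a second rip confirms this value
                accepted = value
                break
            seen.add(value)
        if accepted is None:
            pending.append(position)  # the 'samples to go' table
        else:
            merged[position] = accepted
    return merged, pending
```

The caller would keep re-ripping and calling `merge_rips` with the growing list of rips until `pending` comes back empty.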

This post has been edited by legg: Nov 1 2005, 03:45


--------------------
Home page: http://lc.fie.umich.mx/~legg/indexen.php
skamp
post Nov 1 2005, 09:11
Post #17





Group: Developer
Posts: 1444
Joined: 4-May 04
From: France
Member No.: 13875



A few suggestions:
  • cdparanoia supports offset correction, like EAC. It would be a shame not to include it (-O switch of cdparanoia)
  • you might want to make the temporary filenames a bit more secure, by adding $$ (PID of the program)
  • I don't see the point of keeping the first rip once you've computed its hash. If /tmp is all in RAM (which it should be), you're only monopolizing RAM for nothing (and .wav files are relatively big). I don't know Python very well, but if I'm not mistaken, you're only deleting the whole album at once, when all the tracks have been ripped (500-600 MiB x2!).
  • instead of supporting only a few codecs, you could simply allow the user to add a custom command line, like in EAC. I rip my CDs to WavPack!
  • you might also want to make cdparanoia's command line tweakable.
Good initiative anyway :)
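
For the temporary-filename point, one possible approach in Python is to let the standard `tempfile` module create a private scratch directory and to include the PID in its name. This is only a sketch; the helper names `make_rip_dir` and `track_path` and the file naming scheme are made up for the example.

```python
import os
import tempfile


def make_rip_dir(prefix="ripper-"):
    """Create a private scratch directory whose name includes the PID.

    tempfile.mkdtemp adds a random suffix and creates the directory so that
    only the current user can read or write it.
    """
    return tempfile.mkdtemp(prefix=f"{prefix}{os.getpid()}-")


def track_path(rip_dir, track_number, attempt):
    """Build the path of one rip attempt inside the scratch directory."""
    return os.path.join(rip_dir, f"track{track_number:02d}_try{attempt}.wav")
```

Each rip attempt can then be deleted with `os.remove` as soon as its hash has been computed, instead of waiting until the whole album is done.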


--------------------
See my profile for measurements, tools and recommendations.
sTisTi
post Nov 1 2005, 15:52
Post #18





Group: Members
Posts: 385
Joined: 25-June 04
Member No.: 14895



QUOTE (xmixahlx @ Oct 31 2005, 04:43 PM)
(back to on-topic)
regarding audio cache - wouldn't a drive that caches audio data have identical rips in multiple passes, yet these identical checksums would fail to provide a secure rip due to the audio cache?

in that case, consistency != secure... just a thought...
*

No, as the drive's cache is way too small to hold a whole song. So matching CRCs with test & copy are secure for any drive.
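
To put rough numbers on this: CD audio is 44,100 samples per second, 2 channels, 2 bytes per sample. The small sketch below works out how large a track is and how many seconds of audio a cache could hold. The 2 MiB cache size used in the note afterwards is purely an assumed figure for illustration, not a value from any particular drive.

```python
# CD audio: 44,100 samples/s x 2 channels x 2 bytes per sample = 176,400 bytes/s.
BYTES_PER_SECOND = 44_100 * 2 * 2


def track_bytes(seconds):
    """Size in bytes of the raw audio data for a track of the given length."""
    return seconds * BYTES_PER_SECOND


def seconds_held_by_cache(cache_bytes):
    """How many seconds of CD audio fit in a cache of the given size."""
    return cache_bytes / BYTES_PER_SECOND
```

With these definitions, a 3-minute track is `track_bytes(180)` = 31,752,000 bytes, while a hypothetical 2 MiB cache holds a little under 12 seconds of audio, so a second full pass over the track cannot be served from the cache alone.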


--------------------
Proverb for Paranoids: "If they can get you asking the wrong questions, they don't have to worry about answers."
-T. Pynchon (Gravity's Rainbow)
cabbagerat
post Nov 1 2005, 17:21
Post #19





Group: Members
Posts: 1018
Joined: 27-September 03
From: Cape Town
Member No.: 9042



QUOTE (legg @ Oct 31 2005, 06:44 PM)
A mighty suggestion for your ripper:
1. If the compare between 2 files fails, keep the two copies.
2. Compare each sample of both files, samples that are identical should be written to a 3rd file, fill with zeros the rest, and keep a table of 'samples to go'.
3. Re-rip, now you have a 4th file, compare the 'samples to go' from this 4th file with 1st and 2nd file, if one matches write to 3rd file, update 'samples to go'
4. Re-rip, now you have a 5th file...and so on and on until samples to go is void.
*

This is a good idea, of course, it's not necessary to store the entire file - just the hash value for each of the attempts.

Alongside the SHA-1 comparison, maybe you should actively compare the two files, so you can get an idea of how different they are. This could be kept as a number of different samples, or peak RMS difference over the track. Average RMS difference would also be instructive. Because of the way SHA-1 works, a flip of a single bit will have a major effect on the hash output. While this is good for cryptographic applications, there are instances when a bit flip (one or two LSB flips in a whole file, for example) is simply not a problem. Maybe you should give the user the option of running until the peak signal difference is -80dB or something.
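
A minimal sketch of such a comparison, assuming both rips have already been decoded into equal-length lists of signed 16-bit samples (reading them out of the .wav files is left out). The name `compare_samples` and the choice to report the peak difference relative to 16-bit full scale are assumptions made for the example.

```python
import math

FULL_SCALE = 32768  # magnitude of the largest 16-bit sample


def compare_samples(first, second):
    """Return (differing, peak, peak_db) for two equal-length sample lists.

    differing: number of positions where the samples differ
    peak:      largest absolute difference between corresponding samples
    peak_db:   that peak relative to 16-bit full scale, in dB (-inf if identical)
    """
    if len(first) != len(second):
        raise ValueError("both rips must have the same number of samples")
    differing = 0
    peak = 0
    for a, b in zip(first, second):
        delta = abs(a - b)
        if delta:
            differing += 1
            peak = max(peak, delta)
    peak_db = 20 * math.log10(peak / FULL_SCALE) if peak else float("-inf")
    return differing, peak, peak_db
```

For reference, a single least-significant-bit difference comes out at about -90 dB on this scale, so a threshold of -80 dB would tolerate differences of a few LSBs.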

Anyways, this looks really promising. Good luck for future development.


--------------------
Simulate your radar: http://www.brooker.co.za/fers/
cartman
post Nov 1 2005, 17:28
Post #20





Group: Members
Posts: 76
Joined: 11-January 04
From: Turkiye
Member No.: 11118



Well, a good initiative for Linux users like me. I am working on a small lame/ogg/flac frontend, so I might just hook this up somehow. Keep it coming :)


--------------------
An eye for eye will make the whole world blind
frodoontop
post Nov 1 2005, 19:24
Post #21





Group: Members
Posts: 131
Joined: 6-March 03
Member No.: 5359



Some good points I have seen here :). I'll try to implement them as soon as possible, provided that I can. This will probably be at the weekend.

I will make the cdparanoia options tweakable in the next version.

QUOTE (skamp @ Nov 1 2005, 12:11 AM)
  • you might want to make the temporary filenames a bit more secure, by adding $$ (PID of the program)
  • I don't see the point of keeping the first rip once you've computed its hash. If /tmp is all in RAM (which it should be), you're only monopolizing RAM for nothing (and .wav files are relatively big). I don't know Python very well, but if I'm not mistaken, you're only deleting the whole album at once, when all the tracks have been ripped (500-600 MiB x2!).


Sorry, I fail to see how security has anything to do with this. By the way, /tmp isn't in RAM here; it is just a local directory named tmp, which is destroyed at the end of the ripping process. I could also use the official TEMP directory provided by the system.
And why should the temp files be in RAM? No other ripper I know does this, including EAC. It would require a lot of RAM! Please share your opinions about this.

Although disk space isn't a concern of mine, I'll change the process anyway. The next version will rip a track and save its digest in a list. It will then rip the track again and check whether the digests match (as it does now); if they don't, it appends the new digest to the list (which is new). If a third rip is necessary, it will check whether the latest digest matches any of the digests in the list, and so on. Once the encoding is finished, the source files will be deleted to save space and the digest list will be emptied. There will be an option to keep the source files. This will save unnecessary rip attempts and hard disk space.
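
The planned digest-list procedure could be sketched like this. It is an illustration only, not the code of the next version: `rip_once` stands in for whatever function rips the track and returns its digest, and `max_attempts` is an assumed safety limit.

```python
def rip_until_confirmed(rip_once, max_attempts=5):
    """Rip a track repeatedly until some digest is seen a second time.

    rip_once is a callable that performs one rip and returns its digest.
    Returns (digest, attempts) once a rip matches an earlier one, or
    (None, max_attempts) if no two rips agreed within the limit.
    """
    seen = []  # digests of all earlier attempts for this track
    for attempt in range(1, max_attempts + 1):
        digest = rip_once()
        if digest in seen:
            return digest, attempt
        seen.append(digest)
    return None, max_attempts
```

Because only the digests are kept, each .wav file can be removed right after hashing unless it turns out to be the confirmed rip.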

QUOTE (skamp @ Nov 1 2005, 12:11 AM)
  • instead of supporting only a few codecs, you could simply allow the user to add a custom command line, like in EAC. I rip my CDs to WavPack!


I didn't know people were using other formats on Linux. Please feel welcome to tell me which format, and what a sensible default would be :)

And now I'll have some coffee :P


--------------------
A secure audio ripper for linux: code.google.com/p/rubyripper
frodoontop
post Nov 1 2005, 19:45
Post #22





Group: Members
Posts: 131
Joined: 6-March 03
Member No.: 5359



I'm back from my coffee :P

QUOTE (cabbagerat @ Nov 1 2005, 08:21 AM)
Alongside the SHA-1 comparison, maybe you should actively compare the two files, so you can get an idea of how different they are. This could be kept as a number of different samples, or peak RMS difference over the track. Average RMS difference would also be instructive. Because of the way SHA-1 works, a flip of a single bit will have a major effect on the hash output. While this is good for cryptographic applications, there are instances when a bit flip (one or two LSB flips in a whole file, for example) is simply not a problem. Maybe you should give the user the option of running until the peak signal difference is -80dB or something.


Though I like the idea of seeing how much the files would differ, this would not be very simple to implement. RMS for one thing only tells about the average volume of a track. Ripping errors are not likely to change the result of this significantly in my opinion. I'll see if something else could be done, but it won't have high priority. Any ideas how to solve this will be welcome though.

@cartman & maybe others: In the end I want to have a GUI on top of this. But since I'm not a really good designer, this might still take a while. You could help by designing a glade file for a GTK GUI (with glade or gazpacho) or a ui file (with Qt Designer) for a Qt GUI. I'm sure that actually linking the GUI to the program won't be that hard. I have already experimented a bit :)


--------------------
A secure audio ripper for linux: code.google.com/p/rubyripper
cartman
post Nov 1 2005, 20:18
Post #23





Group: Members
Posts: 76
Joined: 11-January 04
From: Turkiye
Member No.: 11118



QUOTE (frodoontop @ Nov 1 2005, 10:45 PM)
@cartman & maybe others: In the end I want to have a GUI on top of this. But since I'm not a really good designer, this might still take a while. You could help by designing a glade file for a GTK GUI (with glade or gazpacho) or a ui file (with Qt Designer) for a Qt GUI. I'm sure that actually linking the GUI to the program won't be that hard. I have already experimented a bit :)
*


Mine is just a Qt4 experiment ;-) Just make sure the backend is separate from the frontend :)


--------------------
An eye for eye will make the whole world blind
jcoalson
post Nov 1 2005, 20:56
Post #24


FLAC Developer


Group: Developer
Posts: 1526
Joined: 27-February 02
Member No.: 1408



If disk space is not an issue, using 'cmp' instead of 'sha1sum' will be faster, especially when there is a difference, since cmp stops at the first differing byte.
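
In Python, a cmp-style comparison with the same early exit might look like the sketch below. The function name `files_identical` is made up for the example; the standard library's `filecmp.cmp(a, b, shallow=False)` does a similar content comparison.

```python
def files_identical(first_path, second_path, chunk_size=1 << 16):
    """Compare two files chunk by chunk, stopping at the first difference."""
    with open(first_path, "rb") as first, open(second_path, "rb") as second:
        while True:
            chunk_a = first.read(chunk_size)
            chunk_b = second.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # differing content, or one file ended early
            if not chunk_a:
                return True  # both files ended at the same point
```

Unlike the hash approach, this needs both rips on disk at the same time, which is the disk-space trade-off mentioned above.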

Josh
frodoontop
post Nov 1 2005, 23:25
Post #25





Group: Members
Posts: 131
Joined: 6-March 03
Member No.: 5359



Well, computing the SHA-1 sums of two files takes about 2 seconds on my Sempron 2600. Note that I use Python's internal sha1 module for this. Compared to the 3-7 minutes it takes to actually rip each track twice, this is not really a problem.


--------------------
A secure audio ripper for linux: code.google.com/p/rubyripper
Go to the top of the page
+Quote Post

9 Pages V   1 2 3 > » 
Reply to this topicStart new topic
1 User(s) are reading this topic (1 Guests and 0 Anonymous Users)
0 Members:

 



RSS Lo-Fi Version Time is now: 22nd September 2014 - 01:01