Questions about AccurateRip

I've seen a few threads talking about the problem with the old AccurateRip CRC, and I'm still not clear on how significant the problem is.  If I were to rip a CD and AR tells me that it matched against an old AR CRC, then how confident should I be that it is truly accurate, and that an error in my rip didn't slip through undetected?

Is there anyone who plans on re-ripping all of their CDs to try to get a match against the new AR CRC, because they aren't too confident in the old CRC?  I have not yet started ripping my few hundred CDs.  Should I be comfortable ripping them now and have AR compare them to mostly old CRCs?  Or is it worth waiting 5-10 years until the AR database has had most of the old CRCs replaced with the new CRCs, before I start ripping them?

Another thing I have read is that an AR match is a match for the whole track, unless the track is the first track of the CD (in which case the first 5 frames are omitted) or the last track of the CD (in which case the last 5 frames are omitted).  Do I have that correct?  If that is true, then I don't understand how AR can declare whether the first/last tracks on any CD are accurate, if it is not taking into account the entire track?

Reply #1

Anyone that worried will never be satisfied.

If AccurateRip says it's accurate, all it means is that someone else got the same result. So the bigger the confidence, the more people got the same result.

Waiting 5-10 years to rip just because of any potential error in AR would be ludicrous in any case.

Reply #2
That AR now uses an actual CRC does nothing to fix the possibility of consistent errors in the database, no matter how under-blown or over-blown the paranoia.  It merely reduces the possibility of collisions which was already insanely low to begin with since flaws with the old hash calculation weren't exploitable because of the way ripping errors occur.

Another thing I have read is that an AR match is a match for the whole track, unless the track is the first track of the CD (in which case the first 5 frames are omitted) or the last track of the CD (in which case the last 5 frames are omitted).  Do I have that correct?
Yes and the same holds for the new CRC.

If that is true, then I don't understand how AR can declare whether the first/last tracks on any CD are accurate, if it is not taking into account the entire track?
Do you have a better idea about how to account for drives with different offsets which cannot overread?

Reply #3
If that is true, then I don't understand how AR can declare whether the first/last tracks on any CD are accurate, if it is not taking into account the entire track?


The width of the data "groove" of an audio CD is measured in microns.
Read errors are caused by disc damage.
Foil rot, digs and scratches are enormous by comparison to the width of the data.
Damage that would create an error in the first 5 sectors of a CD that could not be corrected by the drive would also create read errors in the adjacent rings of the data track. These errors would extend far beyond the first 5 sectors and would alter the AR hash. Errors, let alone audible ones, completely contained within the 5 sectors would be rare, to say the least.

Reply #4
AccurateRip is one of the coolest ideas ever!

But, before I knew about AR, or whenever AR says "cannot verify", or "not found in database", I've NEVER had an audible defect when EAC itself reports "no errors".  So, I'm not paranoid about it.

Reply #5
That AR now uses an actual CRC does nothing to fix the possibility of consistent errors in the database, no matter how under-blown or over-blown the paranoia.

What exactly do you mean by "consistent errors"?  Do you mean when 2 different people get the exact same errors for a CD and submit the results into the database?  I'd say that is an extremely rare scenario that one shouldn't worry about.

It merely reduces the possibility of collisions which was already insanely low to begin with since flaws with the old hash calculation weren't exploitable because of the way ripping errors occur.

OK yes, I was wondering about the possibility of a bad rip's CRC colliding with an accurate rip's CRC due to the bug, and I was also wondering (but didn't mention in the original post) about the possibility of a collision between rips of 2 different CDs.  If you say the chances of any collisions is insanely low, then that does soothe my concerns.

If that is true, then I don't understand how AR can declare whether the first/last tracks on any CD are accurate, if it is not taking into account the entire track?
Do you have a better idea about how to account for drives with different offsets which cannot overread?

Well, if AR has a database of drives and knows whether they can or can't read the first/last 5 frames, then when a user submits a CRC, the user's drive model could also be submitted to let AR know if the CRC includes the first and/or last 5 frames.  So I guess you could have 4 different CRCs for each disc - CRC of full track, CRC of track minus first 5 frames, CRC of track minus last 5 frames, and CRC of track minus both first and last 5 frames.  Then when another user rips and compares, their drive model is sent with the request so that AR knows which CRC to compare it against.

The width of the data "groove" of an audio CD is measured in microns.
Read errors are caused by disc damage.
Foil rot, digs and scratches are enormous by comparison to the width of the data.
Damage that would create an error in the first 5 sectors of a CD that could not be corrected by the drive would also create read errors in the adjacent rings of the data track. These errors would extend far beyond the first 5 sectors and would alter the AR hash. Errors, let alone audible ones, completely contained within the 5 sectors would be rare, to say the least.

Thanks, that is reassuring.

Reply #6
Well, if AR has a database of drives and knows whether they can or can't read the first/last 5 frames, then when a user submits a CRC, the user's drive model could also be submitted to let AR know if the CRC includes the first and/or last 5 frames.  So I guess you could have 4 different CRCs for each disc - CRC of full track, CRC of track minus first 5 frames, CRC of track minus last 5 frames, and CRC of track minus both first and last 5 frames.  Then when another user rips and compares, their drive model is sent with the request so that AR knows which CRC to compare it against.

I think one of the dangers in this approach is that different firmware versions in the same drive could behave differently.

Besides, if I recall, one frame is 1/75 of a second, so 5 frames is 1/15 of a second. How often is the first or last 1/15 of a second of a CD not silence, or at least the tail end of a fade?
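The arithmetic behind that estimate can be checked with a few lines. This is only a sketch; the constant and function names are made up for illustration, and it assumes the standard CD audio rate of 44,100 stereo samples per second.

```python
SAMPLE_RATE_HZ = 44_100      # CD audio sample rate (stereo samples per second)
FRAMES_PER_SECOND = 75       # one frame (sector) lasts 1/75 of a second
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // FRAMES_PER_SECOND  # 588


def skipped_span(frames=5):
    """Return (stereo samples, milliseconds) spanned by `frames` frames."""
    samples = frames * SAMPLES_PER_FRAME
    return samples, 1000.0 * samples / SAMPLE_RATE_HZ


samples, ms = skipped_span()
print(f"{samples} samples, {ms:.1f} ms")
```

So the 5 frames left out at the start of the first track (or the end of the last) amount to 2,940 samples, or roughly 67 milliseconds.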

Reply #7
Do you mean when 2 different people get the exact same errors for a CD and submit the results into the database?
Yes.

I'd say that is an extremely rare scenario that one shouldn't worry about.
Perhaps, but despite what you may have read, you should realize that errors don't necessarily have to be caused by damage.  They can also be caused by any combination of the following: pressing-wide manufacturing defect, defective hardware and buggy ripping software.  All three possibilities have been documented on this forum and on others.

If you say the chances of any collisions is insanely low, then that does soothe my concerns.
So that we're clear a collision is when two different rips give the exact same hash.  The old hash (which should not be called a CRC because it was never a CRC; if it were a CRC then there probably would never have been a new hash which is finally now a CRC) ignored some bits in the data at only a select few locations and one entire channel of one sample once out of every 65,536 samples.  Ripping errors just don't happen such that they will only occur at the exact same spot that the AR hash loses coverage.  If you're able to verify your disc against an alternate offset that is not a multiple of 65,536 samples, then the odds of errant data not being covered drops to zero.
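To make the coverage gap concrete, here is a minimal sketch. It assumes the old hash has the commonly described form of a running sum of position × sample, with every product truncated to 32 bits; that form, and the names `old_style_hash` and `flip_bit`, are assumptions for illustration and not a verified copy of the AccurateRip code.

```python
MASK32 = 0xFFFFFFFF


def old_style_hash(samples):
    """Sum of (position * sample), each product truncated to 32 bits.

    `samples` holds 32-bit stereo samples (left channel in the low 16 bits,
    right channel in the high 16 bits); positions are 1-based.
    ASSUMPTION: this mirrors the general shape of the old hash only.
    """
    total = 0
    for position, value in enumerate(samples, start=1):
        total = (total + ((position * value) & MASK32)) & MASK32
    return total


def flip_bit(samples, position, bit):
    """Return a copy of `samples` with one bit flipped at a 1-based position."""
    out = list(samples)
    out[position - 1] ^= 1 << bit
    return out


# Deterministic filler data, long enough to reach position 65,536.
track = [(i * 2654435761) & MASK32 for i in range(70_000)]
reference = old_style_hash(track)

# A change confined to the upper 16 bits at position 65,536 is invisible...
blind = old_style_hash(flip_bit(track, 65_536, 20)) == reference
# ...while the same change one sample later is detected.
seen = old_style_hash(flip_bit(track, 65_537, 20)) != reference
print(blind, seen)
```

Under that assumption, the multiplier 65,536 shifts the upper half of the sample out of the 32-bit result, which matches the "one entire channel of one sample once out of every 65,536 samples" description.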

So I guess you could have 4 different CRCs for each disc - CRC of full track, CRC of track minus first 5 frames, CRC of track minus last 5 frames, and CRC of track minus both first and last 5 frames.
So that we're clear, the missing first five frames are only on the first track and the missing last five frames are on the last track.  Having a CRC just for drives that can overread does nothing to satisfy the paranoia of those who don't have drives that can overread (which is most drives), though this is only for one track.  The other track can be fully covered since an offset only affects one of the tracks, not both.
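As a rough illustration of which samples end up covered, the rule can be sketched as below. The function name is hypothetical, and the exact boundary (including any off-by-one in the real calculation) is an assumption; only the "5 frames of 588 samples" figure comes from this thread.

```python
SAMPLES_PER_FRAME = 588
SKIP_FRAMES = 5


def covered_range(track_samples, is_first, is_last):
    """Return the half-open (start, end) sample range included in the hash.

    ASSUMPTION: exactly 5 whole frames are skipped at the very start of the
    first track and at the very end of the last track, nothing else.
    """
    skip = SKIP_FRAMES * SAMPLES_PER_FRAME
    start = skip if is_first else 0
    end = track_samples - skip if is_last else track_samples
    return start, end


three_minutes = 180 * 44_100  # samples in a three-minute track
print(covered_range(three_minutes, is_first=True, is_last=False))
print(covered_range(three_minutes, is_first=False, is_last=False))
print(covered_range(three_minutes, is_first=False, is_last=True))
```

Every track other than the first and last is covered in full, and even the two edge tracks lose only 2,940 samples each.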

Then when another user rips and compares, their drive model is sent with the request so that AR knows which CRC to compare it against.
Seems like an unnecessary complication and a lot of extra record keeping to me.

Reply #8
Besides, if I recall, one frame is 1/75 of a second, so 5 frames is 1/15 of a second. How often is the first or last 1/15 of a second of a CD not silence, or at least the tail end of a fade?

Good point.

I'd say that is an extremely rare scenario that one shouldn't worry about.
Perhaps, but despite what you may have read, you should realize that errors don't necessarily have to be caused by damage.  They can also be caused by any combination of the following: pressing-wide manufacturing defect, defective hardware and buggy ripping software.  All three possibilities have been documented on this forum and on others.

That's good to know.

If you're able to verify your disc against an alternate offset that is not a multiple of 65,536 samples, then the odds of errant data not being covered drops to zero.

Can you please explain what you mean by "verify your disc against an alternate offset"?

Having a CRC just for drives that can overread does nothing to satisfy the paranoia of those who don't have drives that can overread (which is most drives), though this is only for one track.

But it would satisfy the paranoia of those who do have drives that can overread. 

The other track can be fully covered since an offset only affects one of the tracks, not both.

So then there only needs to be 3 CRCs. 

Reply #9
Can you please explain what you mean by "verify your disc against an alternate offset"?


The AccurateRip database may contain hash data that comes from different manufacturing lots, ripped CD-Rs or mounted images. These can contain exactly the same audio data; however, it has been shifted by the addition or subtraction of some number of samples (basically the same byte stream but with different starting points).

The AccurateRip hash masks certain bits in certain samples in a pattern.
For example, I created a sample wave file and then calculated the AR hash for it.
The hash was 17ff5f16.
Then I simulated a read error by changing the value of a bit in the right channel sample at position 2.
When a hash is calculated for this different wave you would expect a different value; however, I get 17ff5f16.
The bit I changed in the right channel sample at position 2 is one of those masked bits. Since it is not used in the AR hash calculation, it does not matter what its value is, as demonstrated by my example.

Now if alternate offset hash data existed and I shifted the masked error at sample #2 to another position that is used in the hash then a comparison of the hash values would show the error. The offset does not need to be a multiple of 65,536. In my example shifting the audio by say 3 samples would be all that is needed.
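The experiment described above can be imitated with a small sketch. It assumes the old hash is a running sum of position × sample with each product truncated to 32 bits; that form and all names here are assumptions for illustration, and the data is synthetic, so the hash values will not match the 17ff5f16 quoted above.

```python
MASK32 = 0xFFFFFFFF


def old_style_hash(samples):
    """ASSUMED shape of the old hash: sum of truncated (position * sample)."""
    total = 0
    for position, value in enumerate(samples, start=1):
        total = (total + ((position * value) & MASK32)) & MASK32
    return total


# Synthetic 32-bit stereo samples (right channel in the high 16 bits).
good = [(i * 40503 + 12345) & MASK32 for i in range(1, 101)]

# Simulated read error: flip the top bit of the right channel at position 2.
bad = list(good)
bad[1] ^= 1 << 31

# At this offset the flipped bit is multiplied by 2 and falls off the top.
same_at_offset_0 = old_style_hash(good) == old_style_hash(bad)

# A pressing whose audio starts 3 samples later puts the error at position 5,
# where the multiplier is odd and no bits are lost.
pad = [0, 0, 0]
differs_at_offset_3 = old_style_hash(pad + good) != old_style_hash(pad + bad)

print(same_at_offset_0, differs_at_offset_3)
```

In this sketch the shifted comparison exposes the error that the unshifted one hides, which is the point of verifying against an alternate offset.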

Reply #10
Quote
If you're able to verify your disc against an alternate offset that is not a multiple of 65,536 samples, then the odds of errant data not being covered drops to zero.


An update to R14 will have this feature, I think I might be able to take credit for thinking it up

Reply #11
Now if alternate offset hash data existed and I shifted the masked error at sample #2 to another position that is used in the hash then a comparison of the hash values would show the error. The offset does not need to be a multiple of 65,536. In my example shifting the audio by say 3 samples would be all that is needed.

Shifting the offset and the error will never happen in the real world.

So that we're clear 65,536 is the number of samples between the data in one channel being dropped.  Let's assume I have a track where only one channel of one sample is in error and the rest of the data was correct (which might not even be possible).  If there is data for this track with an offset of 65,533, then the errant sample will be included in the new hash and won't match the one on record.

In the case of a single bit missing (again, is this type of ripping error even possible ???), I don't know its periodicity and was not assuming that it is every 65,536 samples.  Whatever it is, though, there will be offsets that cause it to be re-masked and offsets that do not.  Are you telling us that there is one bit that is masked every 3 samples or 33% of all possible offsets?  If so then I was misreading you.  Seeing that the coverage of the old hash is reported as being 97%, just a lone single bit being masked 33% of the time would account for more than a third of what is not covered.  This seems high to me.
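The periodicity question can be explored with a short sketch. It assumes, without confirmation from this thread, that the old hash truncates position × sample to 32 bits, so that a position with t trailing zero bits loses the top t bits of its sample; the function names are hypothetical.

```python
def masked_bits(position, width=32):
    """Bits of a `width`-bit sample lost when position * sample is truncated.

    ASSUMPTION: the loss equals the number of trailing zero bits in the
    1-based position, capped at the sample width.
    """
    trailing_zeros = (position & -position).bit_length() - 1
    return min(trailing_zeros, width)


def coverage(num_positions, width=32):
    """Fraction of all sample bits that still influence the hash."""
    lost = sum(masked_bits(p, width) for p in range(1, num_positions + 1))
    return 1 - lost / (num_positions * width)


print(masked_bits(2), masked_bits(3), masked_bits(65_536))
print(f"{coverage(2 ** 20):.4f}")
```

Under that assumption, one bit is masked at every second position (not every third), two bits at every fourth, and a full 16-bit channel at every 65,536th, which averages out to about one bit in 32 and is in line with the roughly 97% coverage figure.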

Reply #12
So then there only needs to be 3 CRCs. 

No, there would still need to be four since not all drives have negative read offsets.

Anyhow, this request for separate checksums for overreading drives has been asked before and shot down each and every time.  I'm not sure if Spoon is ever going to change his mind on this, and quite frankly, I support his current position.

Reply #13
In the case of a single bit missing (again, is this type of ripping error even possible ???), I don't know its periodicity and was not assuming that it is every 65,536 samples.


Well, it is always "possible" that an error which passes the C1&C2 parity protection, is interpolated with nothing more and nothing less than a single bit wrong. But as far as I  understand,  C1 and C2 error corrections work at byte (= 8 bits) level and not at bit level, right? Doesn't that mean that when errors exceed the level of parity protection, then at least a byte is interpolated?

Right? I'm no expert here.

(I have no idea how bad these pesky Defective by Design discs can be on this matter -- some do yield so many bit errors that AccurateRip is totally useless, and should be left out of the discussion; some seem to verify fairly well on burst ripping though. But DbD discs alone do not justify a change in AccurateRip anyway, IMHO.)


Seeing that the coverage of the old hash is reported as being 97%, just a lone single bit being masked 33% of the time would account for more than a third of what is not covered.  This seems high to me.


By strange coincidence, the "97%" equals the Red Book minimum number of correct frames. Yes, up to 3% of the frames can have (one or more) errors. (And 3% is also -- approximately, of course -- the chance of having a single bit in a byte wrong by uniform draw, cf. the above. Don't know if that is a coincidence, or if that is the way the AR suboptimality did emerge in the first place?)

Reply #14
Quote
If you're able to verify your disc against an alternate offset that is not a multiple of 65,536 samples, then the odds of errant data not being covered drops to zero.

An update to R14 will have this feature, I think I might be able to take credit for thinking it up

I thought that R14 is already able to verify against a different offset?

Reply #15
Are you telling us that ...


No

I did misread what you said about every 65,536 samples. My apologies.

Some years before AR I did see a track rip with a single sample error.
More recently, one with 8 sample errors, all in the right channel.
I could not tell you on either of them how they would fall into the masking pattern of AR.
All others I recall had many more errors evenly distributed between right and left.
I'll assume at this point that I have never seen an AR false positive, at least not one due to data masking.

Reply #16
But as far as I  understand,  C1 and C2 error corrections work at byte (= 8 bits) level and not at bit level, right?

I believe it works at frame level where a frame is comprised of 6 samples, not to be confused with sectors of 588 samples or whatever nouns are most proper and/or you prefer to use.  My memory of CIRC is extremely rusty.  Maybe Pio or someone with similar knowledge might come along and clarify.

at least a byte is interpolated?

How do you interpolate just a byte?  All interpolation I've ever viewed was nothing more than a line connecting adjacent samples.

By strange coincidence, the "97%" equals the Red Book minimum number of correct frames.

Yes, it is a coincidence.  You can easily find the old AR hash calculation on this forum and compare it to the way in which CIRC works.  You'll find that they are quite a bit different.

@alondon, I've seen a difference between two accurate pressings that differed by just one sample (but both channels) or it was sample N for one channel and sample N+1 for the other, IIRC.  Considering we're talking about actual pressings, I can't say exactly how the data corruption occurred.

EDIT: I recently stumbled across a rip that had just one sample in just one channel in error.  The CRC for the audio data of the track matched that in the log, indicating that there was no post-ripping corruption.

Reply #17
Quote
If you're able to verify your disc against an alternate offset that is not a multiple of 65,536 samples, then the odds of errant data not being covered drops to zero.

An update to R14 will have this feature, I think I might be able to take credit for thinking it up

I thought that R14 is already able to verify against a different offset?


It does; this is to strengthen the non-AR2 CRC by matching against 2/3/4 pressings and then presenting that. If you have the additional matches, that boosts confidence in the older CRC.

Reply #18
But as far as I  understand,  C1 and C2 error corrections work at byte (= 8 bits) level and not at bit level, right?

I believe it works at frame level where a frame is comprised of 6 samples


Well yeah, isn't it 28 bytes = 24 bytes data + 4 bytes parity, right? (6 * 16 * 2 / 8 = 24). But if a frame turns out C1-uncorrectable I have no idea how the interleaving works and how C1 and C2 work in combination.
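The byte counts above can be tied back to the 588-sample sector with a quick check. This is only an arithmetic sketch; the figure of 98 small frames per sector is brought in from the CD format as an assumption and is not stated in this thread.

```python
BITS_PER_SAMPLE = 16
CHANNELS = 2
SAMPLES_PER_FRAME = 6     # small frame of 6 stereo samples, as described above
FRAMES_PER_SECTOR = 98    # ASSUMPTION: 98 small frames make up one sector

# 6 samples * 16 bits * 2 channels / 8 bits per byte
data_bytes_per_frame = SAMPLES_PER_FRAME * BITS_PER_SAMPLE * CHANNELS // 8
samples_per_sector = SAMPLES_PER_FRAME * FRAMES_PER_SECTOR
audio_bytes_per_sector = data_bytes_per_frame * FRAMES_PER_SECTOR

print(data_bytes_per_frame, samples_per_sector, audio_bytes_per_sector)
```

With those numbers, 24 data bytes per small frame add up to the 588 samples per sector mentioned earlier.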

Anyway, a sample is two bytes (per channel), so if the smallest-interpolable-unit is between two samples, then interpolation is even less likely to produce something which is one-and-only-one bit wrong. But OTOH, this probability is likely to be much higher than under the "uniformity" assumption, since a simple average is likely a much better guess.

(The "97%" thing was rather about what is reasonable to consider a "high number".)



That said: if AccurateRip were to be upgraded anyway, of course you fix errors even if they did not turn out material. It's a g33k thing

Reply #19
That AR now uses an actual CRC does nothing to fix the possibility of consistent errors in the database, no matter how under-blown or over-blown the paranoia.  It merely reduces the possibility of collisions which was already insanely low to begin with since flaws with the old hash calculation weren't exploitable because of the way ripping errors occur.


I am not aware of anyone demonstrating that this is actually a problem. The probability of an identical error occurring on 2 separate physical discs and being read the same way on two different drives has to be basically zero. If the problem was with the glass master then it's not really an error.

There are algorithms that scan for pops and clicks, and I have suggested that Spoon add a DSP to scan rips and allow users to actually listen to any potential spots that are picked up. This would have a greater role for rips that can't be verified with AR, though.

Reply #20
I am not aware of anyone demonstrating that this is actually a problem.
As Woodiville said the other day, please don't commit the fallacy of incredulity.

If the problem was with the glass master then it's not really an error.
I mentioned other possible causes than a problem with the glass master.

The odds of two separate physical discs being read the same wrong way on two different physical drives is not "basically zero."


Reply #22
They can also be caused by any combination of the following: pressing-wide manufacturing defect, defective hardware and buggy ripping software.  All three possibilities have been documented on this forum and on others.

I've provided links not too long ago and don't feel like trying to dig them up again.  Google is your friend.

Reply #23
NP, thanks for the reply.
1. pressing-wide manufacturing defect
2. defective hardware 
3. buggy ripping software

I agree 2 and 3 could be real. I don't think 1 is really a ripping error. The rip would still be accurate, you are just accurately reading and reporting "bad" data.

#3 may be overcome by using CTDB, as it's separate from AR and can verify a rip as well as repair it. #2 is much harder, especially since the internal components of most drives are only made by a few companies and could reproduce an error across multiple manufacturers and drives...

I guess if you really wanted to be paranoid, AR could report the number of different drives and rippers in an AR report (i.e. AR 26 [14 drives, 2 rippers]).

Reply #24
Not "could be", 2 and 3 are real.  There are drives that sometimes shift data by two bytes, reversing the channels and offsetting them by one sample, for example.  (EDIT: Please note that when I say hardware, I am including the possibility that there is a problem with the firmware which can make two drives based on the same internal components behave differently.)  There are several examples of EAC causing errors as well.  One such error was fixed and some recent evidence has got me thinking that this fix is now responsible for other problems.

It is possible for #1 to produce different but consistent results depending on the drive and on the software and how it is configured.

X number of drives and Y number of rippers would help to reduce at least some of the paranoia, though there will probably always be nut-jobs out there.  Though I don't really think the exceptions I'm discussing warrant changes, I do think this is a very good idea.