
CUETools DB
Eli
post Mar 31 2010, 02:27
Post #1





Group: Members
Posts: 1056
Joined: 16-October 03
Member No.: 9337



I have only now become aware of Gregory S. Chudov's effort to develop CTDB (CUETools DB). I am very excited about this, as I have actually been suggesting this to spoon (dBpoweramp's developer) for over 5 years.

http://db.cuetools.net/about.php

for others that have missed it:
QUOTE
What's it for?
You have probably heard about AccurateRip, a wonderful database of CD rip checksums, which helps you make sure your CD rip is an exact copy of the original CD. What it can tell you is how many other people got the same data when copying this CD. CUETools Database is an extension of this idea.
What are the advantages?

* The most important feature is the ability not only to detect, but also to correct small amounts of errors that occurred in the ripping process.
* It's free of the offset problems. You don't even need to set up offset correction for your CD drive to be able to verify and, more importantly, submit rips to the database. Different pressings of the same CD are treated as the same disc by the database; it doesn't care.
* Verification results are easier to deal with. There are exactly three possible outcomes: rip is correct, rip contains correctable errors, rip is unknown (or contains errors beyond repair).
* If there's a match, you can be certain it's really a match, because in addition to the recovery record the database uses a well-known CRC32 checksum of the whole CD image (except for 10*588 offset samples in the first and last seconds of the disc). This checksum is used as a rip ID in CTDB.

What are the downsides and limitations?

* CUETools DB doesn't bother with tracks. Your rip as a whole is either good/correctable, or it isn't. If one of the tracks is damaged beyond repair, CTDB cannot tell which one.
* If your rip contains errors, the verification/correction process will involve downloading about 200 KB of data, which is much more than it takes for AccurateRip.
* The verification process is slower than with AR.
* The database was just born and at the moment contains far fewer CDs than AR.

How many errors can a rip contain and still be repairable?

* That depends. The best case scenario is when there's one continuous damaged area up to 30-40 sectors (about half a second) long.
* The worst case scenario is 4 non-continuous damaged sectors in (very) unlucky positions.

What information does the database contain per each submission?

* CD TOC (Table Of Contents), i.e. length of every track.
* Offset-finding checksum, i.e. a small (16 byte) recovery record for a set of samples throughout the CD, which makes it possible to detect the offset difference between the rip in the database and your rip, even if your rip contains some errors.
* CRC32 of the whole disc (except for some leadin/leadout samples).
* Submission date, artist, title.
* 180kb recovery record, which is stored separately and accessed only when verifying a broken rip or repairing it.
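
To make the "CRC32 of the whole disc except for some lead-in/lead-out samples" idea concrete, here is a minimal Python sketch. It is not taken from the CUETools source. It assumes raw 16-bit stereo PCM (4 bytes per sample) and that exactly 10*588 samples are skipped at each end, as the quoted text suggests; the function name `trimmed_crc32` is made up for illustration.

```python
import zlib

SAMPLES_PER_SECTOR = 588                 # CD audio: 588 stereo samples per sector
BYTES_PER_SAMPLE = 4                     # assumption: 16-bit stereo, 2 channels x 2 bytes
SKIP_SAMPLES = 10 * SAMPLES_PER_SECTOR   # samples ignored at each end of the image


def trimmed_crc32(pcm: bytes) -> int:
    """CRC32 of a disc image with the first and last 10*588 samples left out.

    Hypothetical helper: the real CTDB checksum may differ in detail.
    """
    skip = SKIP_SAMPLES * BYTES_PER_SAMPLE
    if len(pcm) <= 2 * skip:
        raise ValueError("image too short to trim")
    # Only the middle part of the image contributes to the checksum.
    return zlib.crc32(pcm[skip:len(pcm) - skip]) & 0xFFFFFFFF
```

Because the edges are ignored, two rips whose data differs only within those first and last samples get the same ID under this sketch, which is presumably why the margin exists.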


--------------------
http://forum.dbpoweramp.com/showthread.php?t=21072
Gregory S. Chudo...
post Mar 31 2010, 18:46
Post #2





Group: Developer
Posts: 695
Joined: 2-October 08
From: Ottawa
Member No.: 59035



Of course CTDB is open, and the code required to use it is LGPLed, as are all CUETools libraries. The only problem is that it's in C#; I wonder if I will have to provide a .dll with a C interface at some point. The algorithm is not very simple, and there's quite a lot of code.

The basic algorithm is a Reed-Solomon code on 16-bit words. Unfortunately, 32-bit Reed-Solomon is extremely slow, and 16-bit Reed-Solomon can be used directly only on chunks of up to 64k words == 128 kbytes. So I have to process the file as a matrix with rows of 10 sectors (5880 samples == 11760 words/columns). Such a matrix has up to ~30000 rows for a 70 minute CD, so I can use 16-bit Reed-Solomon for each column independently. Using the same notation as in Wikipedia it's a (65536,65528) code, which produces 8 words for each column. So the total size is 8*11760*16bit = 188160 bytes.
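
The numbers in this paragraph can be checked with a few lines of Python. This is only arithmetic on the figures given in the post (588 samples per sector, 2 words per stereo sample, 10 sectors per row, 8 parity words per column, 75 sectors per second); the names are illustrative.

```python
SAMPLES_PER_SECTOR = 588
WORDS_PER_SAMPLE = 2        # left + right channel, 16 bits each
SECTORS_PER_ROW = 10
SECTORS_PER_SECOND = 75
PARITY_WORDS = 8            # parity words stored per column

# One row of the matrix is 10 sectors, so this is the number of columns.
columns = SECTORS_PER_ROW * SAMPLES_PER_SECTOR * WORDS_PER_SAMPLE

# 8 parity words of 2 bytes each for every column.
record_bytes = PARITY_WORDS * columns * 2


def rows_for(minutes: float) -> int:
    """Number of matrix rows needed for a disc of the given length."""
    sectors = int(minutes * 60 * SECTORS_PER_SECOND)
    return -(-sectors // SECTORS_PER_ROW)   # ceiling division


print(columns, record_bytes, rows_for(70))
```

A 70 minute disc comes out at 31500 rows, well under the 64k-word limit per column, which is why each column can be coded on its own.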

An N-word recovery record can detect and correct up to N/2 erroneous words, so this 8-word recovery record can correct up to 4 errors in each column. N cannot be much smaller, but it also cannot be much larger, because processing time grows proportionally to N, so N=8 was chosen as the highest value which is still "fast enough", close to FLAC decoding speed.
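
The best-case and worst-case figures from the FAQ follow from this per-column limit. Below is a rough Python sketch under simplifying assumptions: the rip is aligned with the database copy (zero offset), every word of a damaged sector is treated as wrong, and rows are exactly 10 sectors. `is_repairable` is a hypothetical helper, not part of CUETools.

```python
from collections import Counter

SECTORS_PER_ROW = 10
MAX_ERRORS_PER_COLUMN = 4   # N/2 with N = 8


def is_repairable(damaged_sectors) -> bool:
    """True if no column would contain more than 4 erroneous words.

    A sector with index s lies in row s // 10 and always covers the same
    group of columns, identified here by the slot s % 10. Distinct sectors
    in the same slot are in different rows, so counting sectors per slot
    counts the damaged words per column in that slot.
    """
    per_slot = Counter(s % SECTORS_PER_ROW for s in set(damaged_sectors))
    return all(n <= MAX_ERRORS_PER_COLUMN for n in per_slot.values())


print(is_repairable(range(40)))             # 40 contiguous sectors
print(is_repairable(range(41)))             # one sector more
print(is_repairable([0, 10, 20, 30, 40]))   # 5 sectors stacked in one slot
```

Under these assumptions any run of 40 contiguous sectors puts exactly 4 errors in every column and is still repairable, while just 5 isolated sectors that happen to share a slot are not.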

Row size doesn't have such impact on performance, so it can be easily extended in the future, so that popular CDs can have larger recovery records. Current size was chosen so that if database contained as many entries as AccurateRip, it would fit on a 1TB drive. I also took into account that making records larger only helps in best-case scenario when the damage is sequential (scratches etc). When damage occurs at random points, fixing it requires larger N, not larger row size, but this has a performance impact. So the current record size was chosen to be more or less balanced.
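
As a back-of-the-envelope illustration of this trade-off, the Python sketch below scales the record size with the row size and counts how many records fit on a drive. It assumes a 1 TB drive means 10**12 bytes and ignores all other per-entry data; the function names are made up.

```python
SAMPLES_PER_SECTOR = 588
WORDS_PER_SAMPLE = 2
PARITY_WORDS = 8            # N stays fixed; only the row size changes


def record_bytes(sectors_per_row: int) -> int:
    """Size of the recovery record for a given row size."""
    columns = sectors_per_row * SAMPLES_PER_SECTOR * WORDS_PER_SAMPLE
    return PARITY_WORDS * columns * 2


def best_case_burst_sectors(sectors_per_row: int) -> int:
    """Longest contiguous damage that stays within N/2 errors per column."""
    return (PARITY_WORDS // 2) * sectors_per_row


def records_that_fit(disk_bytes: int, sectors_per_row: int) -> int:
    return disk_bytes // record_bytes(sectors_per_row)


for row in (10, 20):
    print(row, record_bytes(row), best_case_burst_sectors(row),
          records_that_fit(10**12, row))
```

With the current 10-sector rows this gives roughly 5.3 million records per terabyte. Doubling the row size doubles both the record size and the best-case burst length, but does nothing for scattered damage, which matches the reasoning above.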

Is there a point in better identification of where the damage is, when the database is unable to fix it?

Discs don't have to pass AR before being added to the CTDB, AR is used only as a kind of proof that there is a physical CD with such content when adding with CUETools.
CD Rippers can add CDs to CTDB even if AR doesn't know them. There is already a number of CDs in database submitted by CUERipper, some of them have confidence 1 - that means they didn't pass AR check or weren't found in AR.

This post has been edited by Gregory S. Chudov: Mar 31 2010, 19:03


--------------------
CUETools 2.1.4
Eli
post Mar 31 2010, 19:33
Post #3








QUOTE (Gregory S. Chudov @ Mar 31 2010, 13:46) *
Is there a point in better identification of where the damage is, when the database is unable to fix it?


Not for RS repair. For the ripper, however, it would allow re-ripping only the parts of the disc where the CRCs do not match, i.e. the problem areas.

QUOTE
Discs don't have to pass AR before being added to the CTDB, AR is used only as a kind of proof that there is a physical CD with such content when adding with CUETools.
CD Rippers can add CDs to CTDB even if AR doesn't know them. There is already a number of CDs in database submitted by CUERipper, some of them have confidence 1 - that means they didn't pass AR check or weren't found in AR.


My reason for suggesting that the DB should only include AR confirmed discs is to verify that the correction data will fix a disc to the correct state. Also, it may help limit the size of the database by only adding correct discs.

QUOTE
Row size doesn't have such impact on performance, so it can be easily extended in the future, so that popular CDs can have larger recovery records.


I would argue that less popular discs may warrant more data as they may be less replaceable...

How is meta-data handled in your database since this info is also saved?

This post has been edited by Eli: Mar 31 2010, 19:34


Gregory S. Chudo...
post Apr 5 2010, 12:15
Post #4








QUOTE (Eli @ Mar 31 2010, 22:33) *
Not for RS repair. For the ripper, however, it would allow re-ripping only the parts of the disc where the CRCs do not match, i.e. the problem areas.

This information will only be useful for rippers specially designed for it. I'm hoping very much that rippers such as EAC will support CTDB at some point, but I doubt that this support will go beyond simple verification/submission. Besides, a ripper usually knows where the problem areas are (using C2 error pointers and comparing results from several passes).
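
For readers unfamiliar with the "comparing results from several passes" part, here is a minimal Python sketch of the idea. It assumes two passes over the same range are available as raw byte buffers and that a sector is 588 samples of 4 bytes; it does not model C2 pointers, and `differing_sectors` is a hypothetical name, not how EAC or CUERipper do it.

```python
from typing import List

SECTOR_BYTES = 588 * 4   # 2352 bytes of audio per CD sector


def differing_sectors(pass_a: bytes, pass_b: bytes) -> List[int]:
    """Indices of sectors whose contents differ between two read passes."""
    if len(pass_a) != len(pass_b):
        raise ValueError("passes must cover the same range")
    count = -(-len(pass_a) // SECTOR_BYTES)   # ceiling division
    return [
        i for i in range(count)
        if pass_a[i * SECTOR_BYTES:(i + 1) * SECTOR_BYTES]
        != pass_b[i * SECTOR_BYTES:(i + 1) * SECTOR_BYTES]
    ]
```

Sectors that read differently on two passes are the obvious candidates for re-reading, with no database lookup involved.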

QUOTE (Eli @ Mar 31 2010, 22:33) *
My reason for suggesting that the DB should only include AR confirmed discs is to verify that the correction data will fix a disc to the correct state. Also, it may help limit the size of the database by only adding correct discs.

You can never be sure that correction data will fix a disc to the correct state. As with AccurateRip, all you can be sure of is that a certain number of submissions have the same data. If you rip a CD with EAC and there were errors, your incorrect rip will appear in the AccurateRip database at some point; after that you can submit your incorrect rip to CTDB (if it accepts rips with confidence 1). Besides, there are CDs that are absent from the AR database, and I want CTDB to be able to handle them.
As for the impact on database size, we will have to see how it goes. Maybe at some point I will have to do periodic purges of old unconfirmed submissions (with confidence 1).

QUOTE (Eli @ Mar 31 2010, 22:33) *
I would argue that less popular discs may warrant more data as they may be less replaceable...

There are a lot more unpopular CDs than popular ones, so we can double the amount of data stored for CDs with > 100 submissions and the database will only grow by 10%.
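
The arithmetic behind this estimate is simple enough to spell out in Python. The 10% figure implies that roughly one entry in ten has more than 100 submissions; that share is read off the post, not measured, and the function name is illustrative.

```python
def relative_growth(popular_share: float, factor: float = 2.0) -> float:
    """Relative database growth if a share of records is enlarged by `factor`.

    Assumes all records have the same size before the change.
    """
    return popular_share * (factor - 1.0)


# Doubling the records of the most popular tenth of all discs:
print(relative_growth(0.10))
```

Doing the same for the unpopular nine tenths would grow the database by about 90%, which is the cost side of storing more data for rarer discs.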

QUOTE (Eli @ Mar 31 2010, 22:33) *
How is meta-data handled in your database since this info is also saved?

For now it only keeps artist/title information. The CTDB server also has a replicated MusicBrainz database clone. For the moment all this is used only in the web interface, which helps manage the database. I plan to improve integration with MusicBrainz when the next version of the MusicBrainz database schema comes out. It would be nice to fetch all the necessary data in one request to a single server. CUERipper currently has to contact 4 different databases (AR, CTDB, MusicBrainz, FreeDB), which can sometimes take a lot of time.


