(possible feature request) field for audio hash
Nisto
post Oct 28 2012, 04:58
Post #1





Group: Members
Posts: 55
Joined: 27-September 09
Member No.: 73522



Hi. Is there a way to access a hash (of any kind), or actually just any way to identify the actual audio of a file? I'm using the customdb component, and it needs key(s) in order to know which tags go to which tracks. I have only used tags as keys so far, but I realized I come across tag-less files (and even files incapable of being tagged) every so often, which of course means any customdb data for these types of files will be merged with that of other tag-less files. But that's really just one of the ways tags or general file information can clash (for me anyway).

Anyway... as far as I know, foobar2000 doesn't really provide a simple "%hash%" field right? (Sure, there's %__md5% but not all containers define MD5 hashes...) If that's really the case, then I'm guessing the reason it hasn't been implemented yet, is either because you didn't see any use for it, or because it would slow down the software a bit? But how would it be if you at least allowed users to "enable" a hash field through a standard component (e.g. optional when installing fb2k)?

I did read this topic by the way, but I don't like the idea of storing these IDs in the tags.

Thanks!
mudlord
post Oct 28 2012, 05:25
Post #2





Group: Developer (Donating)
Posts: 818
Joined: 1-December 07
Member No.: 49165



So you basically want a component that adds an audio hash for the raw PCM/data from the decoders, right?
Nisto
post Oct 28 2012, 18:45
Post #3





Group: Members
Posts: 55
Joined: 27-September 09
Member No.: 73522



QUOTE (mudlord @ Oct 28 2012, 06:25) *
So you basically want a component that adds an audio hash for the raw PCM/data from the decoders, right?


Yeah! Is it doable?
kode54
post Oct 28 2012, 19:05
Post #4





Group: Admin
Posts: 4691
Joined: 15-December 02
Member No.: 4082



It would need to scan every file at least once, and then it would need to store the hash somewhere. And then I'm not really sure how it would display it, if it were not stored as a normal metadata field.
Nisto
post Oct 28 2012, 19:29
Post #5





Group: Members
Posts: 55
Joined: 27-September 09
Member No.: 73522



QUOTE (kode54 @ Oct 28 2012, 20:05) *
It would need to scan every file at least once, and then it would need to store the hash somewhere. And then I'm not really sure how it would display it, if it were not stored as a normal metadata field.


Well, as I said, I don't want hashes stored in the tags anyway, so that wouldn't help me very much... If I were to do it that way, then I actually wouldn't even be using customdb at the moment: the reason I AM using the component is that FLAC, for example, doesn't (yet) officially specify some of the fields I make use of in customdb.

Isn't there a way you can fingerprint the audio "on-the-fly" somehow? Perhaps if you just hashed the first/last x bytes of the audio, would that make it more lightweight?
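
As a rough illustration of the "hash only the first/last bytes" idea raised above, here is a minimal Python sketch (not foobar2000 component code). It assumes the decoded audio is already available as a byte string; `partial_fingerprint` and `window` are hypothetical names chosen for this example.

```python
import hashlib


def partial_fingerprint(audio: bytes, window: int = 4096) -> str:
    """Hash the length plus the first and last `window` bytes of an audio payload."""
    head = audio[:window]
    # Only take a tail when the payload is longer than one window,
    # so short inputs are not hashed twice.
    tail = audio[-window:] if len(audio) > window else b""
    h = hashlib.sha1()
    h.update(len(audio).to_bytes(8, "little"))  # total length as part of the key
    h.update(head)
    h.update(tail)
    return h.hexdigest()
```

The trade-off shows up immediately: two tracks of equal length that differ only in the middle get the same fingerprint, so this is cheaper than a full hash but weaker as an identifier.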
Kohlrabi
post Oct 29 2012, 00:17
Post #6





Group: Super Moderator
Posts: 1150
Joined: 12-March 05
From: Kiel, Germany
Member No.: 20561



QUOTE (Nisto @ Oct 28 2012, 20:29) *
Well, as I said, I don't want hashes stored in the tags anyway, so that wouldn't help me very much... If I were to do it that way, then I actually wouldn't even be using customdb at the moment: the reason I AM using the component is that FLAC, for example, doesn't (yet) officially specify some of the fields I make use of in customdb.
Is there a (compelling) reason why you don't want to store the hash metainfo in a metadata field, but rather insist on some other "official" solution? Sounds like you make life harder for yourself than it needs to be.

That said, this problem might be interesting and easy enough to hone my f2k-component coding skills. Or Peter could expand the verifier component.

This post has been edited by Kohlrabi: Oct 29 2012, 00:23


--------------------
It's only audiophile if it's inconvenient.
romor
post Oct 29 2012, 03:19
Post #7





Group: Members
Posts: 682
Joined: 16-January 09
Member No.: 65630



QUOTE (Nisto @ Oct 28 2012, 05:58) *
Hi. Is there a way to access a hash (of any kind), or actually just any way to identify the actual audio of a file?

foo_biometric can write an audio fingerprint to a tag

QUOTE (Nisto @ Oct 28 2012, 05:58) *
I'm using the customdb component, and it needs key(s) in order to know which tags go to which tracks.

can you use path?
$crc32(%path%)
maybe append subsong index in case you use cue sheets or chapters and similar?
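
The path-based key suggested above can be sketched in Python as follows. This only mirrors the idea of `$crc32(%path%)` plus a subsong index; `path_key` is a hypothetical helper, and the exact encoding and output format foobar2000 uses for `$crc32` are not assumed to match.

```python
import zlib


def path_key(path: str, subsong: int = 0) -> str:
    """Build a lookup key from the CRC-32 of a file path plus a subsong index."""
    crc = zlib.crc32(path.encode("utf-8"))
    # The subsong index keeps tracks inside one cue sheet or chaptered file apart.
    return f"{crc:08x}-{subsong}"
```

A key like this identifies a location, not the audio: it changes as soon as the file is moved or renamed.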


--------------------
scripts: http://goo.gl/M1qVLQ
Nisto
post Oct 29 2012, 10:26
Post #8





Group: Members
Posts: 55
Joined: 27-September 09
Member No.: 73522



QUOTE (Kohlrabi @ Oct 29 2012, 01:17) *
Is there a (compelling) reason why you don't want to store the hash metainfo in a metadata field, but rather insist on some other "official" solution?

One other reason why I don't want to store things in the tags is because that means I'll have to do that for anything new I ever rip / download. Imagine if I forget to tag the track(s) before playing or rating anything? If I rate a track when the hash tag has not yet been applied, then there's still an easy possibility of a clash with other tag/key-less tracks. I just don't see it becoming a habit on my end... It would be much easier if some sort of identification was available directly. Actually, I think even the value of the first or last non-null byte of the audio would be enough, because I can couple it with the sample count (and the sample count (%length_samples%) of a file does clash with a few other files in my collection when only using that as the key with customdb, but not many at all). Is it still too much to ask?

QUOTE (Kohlrabi @ Oct 29 2012, 01:17) *
That said, this problem might be interesting and easy enough to hone my f2k-component coding skills.

I would really appreciate the help!

QUOTE (romor @ Oct 29 2012, 04:19) *
foo_biometric can write an audio fingerprint to a tag

Please read my previous posts fully (I know of foo_biometric already).

QUOTE (romor @ Oct 29 2012, 04:19) *
can you use path?
$crc32(%path%)
maybe append subsong index in case you use cue sheets or chapters and similar?

I'm afraid not :/ Usually when I download or rip stuff, I put the files in a temporary folder, then I re-tag the files (which I rarely do right away) and put it in my proper music folder. By that time I'm sure to have played the tracks at least a few times, and maybe even rated them, so...

This post has been edited by Nisto: Oct 29 2012, 10:31
Kohlrabi
post Oct 29 2012, 12:46
Post #9





Group: Super Moderator
Posts: 1150
Joined: 12-March 05
From: Kiel, Germany
Member No.: 20561



QUOTE (Nisto @ Oct 29 2012, 11:26) *
One other reason why I don't want to store things in the tags is because that means I'll have to do that for anything new I ever rip / download.
How is that affected by where this information is stored?

QUOTE (Nisto @ Oct 29 2012, 11:26) *
It would be much easier if some sort of identification was available directly.
What does "directly" mean? Hash it on-the-fly? That seems excessive and highly impractical.

QUOTE (Nisto @ Oct 29 2012, 11:26) *
Actually, I think even the value of the first or last non-null byte of the audio would be enough, because I can couple it with the sample count (and the sample count (%length_samples%) of a file does clash with a few other files in my collection when only using that as the key with customdb, but not many at all).
The method of hashing is completely unrelated to the means of storing the hash.

QUOTE (Nisto @ Oct 29 2012, 11:26) *
QUOTE (Kohlrabi @ Oct 29 2012, 01:17) *
That said, this problem might be interesting and easy enough to hone my f2k-component coding skills.

I would really appreciate the help!
I can't promise anything, since I don't have much free time this week, and my skills are rather undeveloped and rusty.


--------------------
It's only audiophile if it's inconvenient.
maruseru
post Oct 29 2012, 19:22
Post #10





Group: Members
Posts: 10
Joined: 22-June 09
Member No.: 70872



I'd prefer CRC
Nisto
post Oct 29 2012, 21:59
Post #11





Group: Members
Posts: 55
Joined: 27-September 09
Member No.: 73522



QUOTE (Kohlrabi @ Oct 29 2012, 01:17) *
How is that affected by where this information is stored?

Because actually storing a hash in the file means it'll have to be done manually for everything I open? Even if that could be automatically done, I just don't like it...

QUOTE (Kohlrabi @ Oct 29 2012, 01:17) *
What does "directly" mean? Hash it on-the-fly? That seems excessive and highly impractical.

Yes. If it's impractical, can you tell me something that IS practical? As I've said like three times already, it doesn't actually need to be a hash, and not even of the whole audio chunk. Anything to further identify something of the actual audio. Like the peak dB (though I don't use ReplayGain or anything, so not sure that's possible...) or something.
naturfreak
post Oct 29 2012, 22:37
Post #12





Group: Members
Posts: 176
Joined: 16-October 03
Member No.: 9338



QUOTE (maruseru @ Oct 29 2012, 19:22) *
I'd prefer CRC

Hmm. Problem: Many possible collisions -> Different audio files could have the same hash values.
A hash should be at least 64 bits long to keep the probability of such collisions low.
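
To put rough numbers on the collision concern, the birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) gives the chance that at least two of n tracks share a b-bit hash. The sketch below assumes hash values are uniformly distributed, which is an idealisation; `collision_probability` is a hypothetical helper name.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday-bound estimate that at least two of n_items share a hash value."""
    pairs = n_items * (n_items - 1) / 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / 2.0 ** bits)


for bits in (32, 64):
    for n in (10_000, 100_000):
        p = collision_probability(n, bits)
        print(f"{bits}-bit hash, {n} tracks: {p:.3g}")
```

Under these assumptions a 32-bit value gives on the order of a 1% chance of at least one collision in a 10,000-track library, and the chance grows quickly with library size, while at 64 bits it stays negligible for any realistic collection.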
Kohlrabi
post Oct 29 2012, 23:28
Post #13





Group: Super Moderator
Posts: 1150
Joined: 12-March 05
From: Kiel, Germany
Member No.: 20561



QUOTE (Nisto @ Oct 29 2012, 22:59) *
QUOTE (Kohlrabi @ Oct 29 2012, 01:17) *
How is that affected by where this information is stored?

Because actually storing a hash in the file means it'll have to be done manually for everything I open? Even if that could be automatically done, I just don't like it...
I see only two methods of doing it: Analysing the file upon playback, or analysing a selected group of songs by manually invoking a scan of the data. This is for example how Zao's seekbar does it, as far as I know. But then this information will only be useful if it can be stored somewhere, and thus be accessible not only during playback. I think Zao stores the waveform information in his own serialization container/database, since attaching that info into the metadata would be quite excessive. Also the audio stream itself is essentially the same information, making it quite redundant, too. So, one could come up with a component which stores all the hashed information in its own database, so files don't get altered. I just don't know how the hash can then be made accessible to title formatting functions, so I'd still prefer a tag, since it is essentially only some bytes of data, and can be transparently accessed by any function or component which can use tags/title formatting. But I guess there is a way to "create" title formatting field references, since playback statistics do that.
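
One way to picture the "own database" option is a small side-car store that maps each track location to its hash, leaving the audio files untouched. The sketch below uses Python's `sqlite3`; `HashStore`, the table layout, and the example path are all hypothetical, and it does not reflect how the seekbar component or any other foobar2000 component actually stores its data.

```python
import sqlite3
from typing import Optional


class HashStore:
    """Side-car database of audio hashes, keyed by (path, subsong)."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS audio_hash ("
            "path TEXT NOT NULL, subsong INTEGER NOT NULL, digest TEXT NOT NULL, "
            "PRIMARY KEY (path, subsong))"
        )

    def put(self, path: str, subsong: int, digest: str) -> None:
        # Re-scanning a track simply overwrites the previous digest.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO audio_hash VALUES (?, ?, ?)",
                (path, subsong, digest),
            )

    def get(self, path: str, subsong: int = 0) -> Optional[str]:
        row = self.conn.execute(
            "SELECT digest FROM audio_hash WHERE path = ? AND subsong = ?",
            (path, subsong),
        ).fetchone()
        return row[0] if row else None  # None means "not scanned yet"


store = HashStore()
store.put("C:/music/track.flac", 0, "0123abcd")
print(store.get("C:/music/track.flac"))
```

Because the store is keyed by location, it shares the weakness of any path-based key. Exposing the stored value as a title-formatting field is a separate problem that this sketch does not address.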

I should just start doing it I guess. :)

This post has been edited by Kohlrabi: Oct 29 2012, 23:36


--------------------
It's only audiophile if it's inconvenient.
mudlord
post Oct 30 2012, 10:27
Post #14





Group: Developer (Donating)
Posts: 818
Joined: 1-December 07
Member No.: 49165



QUOTE (naturfreak @ Oct 29 2012, 16:37) *
QUOTE (maruseru @ Oct 29 2012, 19:22) *
I'd prefer CRC

Hmm. Problem: Many possible collisions -> Different audio files could have the same hash values.
A hash should be at least 64 bits long to keep the probability of such collisions low.


Same logic applies to tons of things like MD5. So something like the SHA-1 standard would be needed. Or whatever people would want in such a component if one makes it.

This post has been edited by mudlord: Oct 31 2012, 00:18
mudlord
post Oct 31 2012, 01:53
Post #15





Group: Developer (Donating)
Posts: 818
Joined: 1-December 07
Member No.: 49165



you mean something like:
http://mudlord.info/temp/foo_audiohasher.fb2k-component
Nisto
post Oct 31 2012, 02:55
Post #16





Group: Members
Posts: 55
Joined: 27-September 09
Member No.: 73522



What type of hash is this? Also, can I access the hash somehow, without scanning a file first?
mudlord
post Oct 31 2012, 03:46
Post #17





Group: Developer (Donating)
Posts: 818
Joined: 1-December 07
Member No.: 49165



Hash is SHA-1, and no, you must scan a file first.
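
For illustration, hashing the decoded audio rather than the file can be sketched in Python with the standard `wave` and `hashlib` modules. This is not the code of foo_audiohasher, whose internals are not described in this thread; `pcm_sha1` is a hypothetical helper and only uncompressed WAV input is handled.

```python
import hashlib
import wave


def pcm_sha1(source, chunk_frames: int = 65536) -> str:
    """Return the SHA-1 of the PCM sample data in a WAV file (path or file object)."""
    h = hashlib.sha1()
    with wave.open(source, "rb") as wav:
        while True:
            # Read in chunks so large files are never held in memory at once.
            frames = wav.readframes(chunk_frames)
            if not frames:
                break
            h.update(frames)
    return h.hexdigest()
```

Since only the sample data is fed to SHA-1, header or tag changes do not alter the digest, but every file still has to be read in full once, which matches the "you must scan a file first" answer.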