Collecting ideas for a free, perfect hashing tool
sn0wman
post Apr 9 2006, 10:58
Post #1





Group: Members
Posts: 82
Joined: 3-February 05
Member No.: 19557



Please submit ideas for features that you can't find implemented in any hashing tool around, or features so important that you would not use a program without them.

many thanks, sn0wman.
RedFox
post Apr 10 2006, 21:06
Post #2





Group: Members
Posts: 70
Joined: 7-September 04
From: Paris, France
Member No.: 16842



There are already nice hashing & checksum tools, eg: fsum or par2 (includes recovery), but what I miss most is the ability to calculate a hash of audio files that applies only to the audio part.
Ie: I store & update tag values in the files, so any hash calculated for the file would be incorrect after I change the value of a tag in that file.
Some lossless formats include verification (eg: flac), but iirc, mp3 doesn't.
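
As a rough illustration of a hash that covers only the audio part of an mp3, here is a minimal Python sketch. It assumes the only tags present are an ID3v2 tag at the start of the file and an ID3v1 tag at the end; APEv2, Lyrics3 and other tag types are not handled. The names strip_id3 and audio_md5 are made up for this example, and this is not how any existing tool is known to work.

```python
import hashlib


def strip_id3(data: bytes) -> bytes:
    """Return data without a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)

    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4-byte synchsafe size
    # (7 usable bits per byte). The size excludes the 10-byte header itself.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # footer-present flag adds 10 more bytes
            start += 10

    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]


def audio_md5(data: bytes) -> str:
    """MD5 of the file contents with the assumed tags removed."""
    return hashlib.md5(strip_id3(data)).hexdigest()
```

With this approach, editing a tag value changes the hash of the whole file but leaves audio_md5 unchanged, as long as the tags are of the two types handled here.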


--------------------
Best audio player for the power user: foobar2000
sn0wman
post Apr 10 2006, 23:04
Post #3





Group: Members
Posts: 82
Joined: 3-February 05
Member No.: 19557



That's the main feature of my OSS application, be patient :).
I want to do as much as possible before releasing it, which is why I posted this topic. Now I am asking myself: is it only me who wants so much from a hashing utility? Many algorithms implemented, Unicode, regular expressions for searching files, shell integration including its own context-menu hashing 'profiles', cumulative folder content logging, etc.? I hope other people will find it useful, to say nothing of the extraordinary audio features mentioned above.
zima
post Apr 11 2006, 00:15
Post #4





Group: Members
Posts: 136
Joined: 3-July 03
From: Pomerania
Member No.: 7541



I wonder... how far will you take shell integration? A few steps I can imagine:
1) "check integrity of file" option in the right-click menu
2) a field in the right-click menu that automatically shows whether the file is correct when right-clicking (possible?)
3) something in the tray that automatically checks files (can be limited/predetermined to, for example, only files from removable media) and marks their icons "yeah, this one's ok" (possible?)


--------------------
http://last.fm/user/zima
legg
post Apr 11 2006, 02:56
Post #5





Group: Members
Posts: 175
Joined: 5-March 05
From: Morelia, Mexico
Member No.: 20386



An online hash database. Even though this is risky, it might be of help for those who rip extremely damaged CDs.


--------------------
Home page: http://lc.fie.umich.mx/~legg/indexen.php
kwanbis
post Apr 11 2006, 03:51
Post #6





Group: Developer (Donating)
Posts: 2390
Joined: 28-June 02
From: Argentina
Member No.: 2425



isn't that accuraterip?


--------------------
MAREO: http://www.webearce.com.ar
legg
post Apr 11 2006, 05:18
Post #7





Group: Members
Posts: 175
Joined: 5-March 05
From: Morelia, Mexico
Member No.: 20386



QUOTE (kwanbis @ Apr 10 2006, 08:51 PM) *
isn't that accuraterip?


Dunno, the idea just crossed my mind. But nevertheless it might be a good feature for his tool.


--------------------
Home page: http://lc.fie.umich.mx/~legg/indexen.php
sn0wman
post Apr 11 2006, 10:49
Post #8





Group: Members
Posts: 82
Joined: 3-February 05
Member No.: 19557



QUOTE (kwanbis @ Apr 11 2006, 04:51 AM) *
isn't that accuraterip?


Isn't AccurateRip based on a WAV's checksum?
If so, using it (?) for ordinary hashing is useless.
However, the application is also able to calculate fingerprints of lossless files, so maybe this would be the place for it, nice :).

Thanks for all suggestions, others are welcome.
emtee
post Apr 14 2006, 20:42
Post #9





Group: Members
Posts: 198
Joined: 18-October 02
Member No.: 3569



1) Integrity check of md5, sfv, par, par2 files.
2) Directory recursive.
3) Multiplatform.
4) GUI-based.

These would be awesome :)
krmathis
post Apr 14 2006, 22:08
Post #10





Group: Members
Posts: 742
Joined: 27-May 02
From: Oslo, Norway
Member No.: 2133



What emtee mentioned, in addition to the 'hash of audio files that applies only to the audio part' which RedFox suggested.
A command line application is fine for me, since I always have Terminal open anyway.
PiezoTransducer
post Apr 14 2006, 22:35
Post #11





Group: Members
Posts: 67
Joined: 28-September 05
Member No.: 24754



How about having the resulting hash be simple enough that it can be included as a field of the metadata of the file itself?

I'm not sure what I mean by "simple"... I guess I'm leaving it up to the reader to decide.

It'd be nice if this is something that would eventually find native support in all major music players.

The hash should also ideally be immune to changes to the audio stream that don't affect the decoded output, like audio that's been padded with digital silence should have the same hash as audio without silence.
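
One way to picture the silence-immune hash described above is to trim all-zero frames from both ends of the decoded PCM before hashing. The Python sketch below assumes 16-bit stereo PCM (4 bytes per frame) already decoded into a bytes object; trim_silence and pcm_md5 are hypothetical names for illustration only.

```python
import hashlib


def trim_silence(pcm: bytes, frame_size: int = 4) -> bytes:
    """Drop leading and trailing frames that consist only of zero bytes."""
    usable = len(pcm) - len(pcm) % frame_size  # ignore a partial last frame
    zero_frame = bytes(frame_size)

    start = 0
    while start < usable and pcm[start:start + frame_size] == zero_frame:
        start += frame_size

    end = usable
    while end > start and pcm[end - frame_size:end] == zero_frame:
        end -= frame_size

    return pcm[start:end]


def pcm_md5(pcm: bytes, frame_size: int = 4) -> str:
    """MD5 of the PCM data with digital silence removed from both ends."""
    return hashlib.md5(trim_silence(pcm, frame_size)).hexdigest()
```

Only exact digital silence (all-zero samples) is removed, so dither or low-level noise at the edges would still change the hash.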
Triza
post Apr 14 2006, 22:48
Post #12





Group: Members
Posts: 367
Joined: 16-November 03
Member No.: 9867



QUOTE (emtee @ Apr 14 2006, 11:42 AM) *
1) Integrity check of md5, sfv, par, par2 files.
2) Directory recursive.
3) Multiplatform.
4) GUI-based.

These would be awesome :)


Actually No.

New 4) First we need a COMMANDLINE-based tool. Then someone can create a wrapper on top of that.

5) Open source
6) cross-platform

Otherwise I won't be able to use it.

Triza
p0l1m0rph1c
post Apr 15 2006, 20:34
Post #13





Group: Members
Posts: 50
Joined: 9-December 03
From: China
Member No.: 10315



QUOTE (PiezoTransducer @ Apr 15 2006, 06:35 AM) *
How about having the resulting hash be simple enough that it can be included as a field of the metadata of the file itself?

I'm not sure what I mean by "simple"... I guess I'm leaving it up to the reader to decide.

It'd be nice if this is something that would eventually find native support in all major music players.

The hash should also ideally be immune to changes to the audio stream that don't affect the decoded output, like audio that's been padded with digital silence should have the same hash as audio without silence.


Heh, what can be simpler than 52E2B834 (CRC32) or D75909AF25EF3788957459263AD0D74D (MD5)?
Easily fits into any type of tag.
norz
post Apr 18 2006, 16:34
Post #14





Group: Members
Posts: 34
Joined: 12-April 06
From: Paris, France
Member No.: 29463



QUOTE (sn0wman @ Apr 11 2006, 12:04 AM) *
That's the main feature of my OSS application, be patient :).

Maybe you could base the audio decoding part on existing plugins (eg: 1by1 player uses winamp plugins)
Or maybe -given that it's oss- it's better to include existing libraries?
Just a thought, I'm not a developer ;)
pepoluan
post Apr 18 2006, 19:19
Post #15





Group: Members
Posts: 1455
Joined: 22-November 05
From: Jakarta
Member No.: 25929



While we're on the topic of hashes...

Why must audio files be hashed using MD5? It's too complicated I think. I mean, MD5's main function is not for error-checking, rather to prevent willful tampering.

For the normal damages that happen to audio files, CRC32 is enough. Perhaps 2 CRC32 values with different polynomials. Should be quite robust. And it's easier to implement. Not to mention a wholelottafaster.
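
To make the "two CRC32 values with different polynomials" idea concrete, here is a small bitwise sketch in Python. It assumes the two polynomials are the standard CRC-32 one (reflected form 0xEDB88320, the same one zlib uses) and the CRC-32C (Castagnoli) one (reflected form 0x82F63B78); the choice of the second polynomial and the function names are assumptions for illustration, not something proposed in this thread.

```python
import zlib

CRC32_POLY = 0xEDB88320   # standard CRC-32, reflected form
CRC32C_POLY = 0x82F63B78  # CRC-32C (Castagnoli), reflected form


def crc32_reflected(data: bytes, poly: int) -> int:
    """Bit-by-bit reflected CRC-32 with initial value and final XOR of 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def double_crc(data: bytes) -> str:
    """Concatenate both CRCs into one 16-hex-digit check value."""
    return "%08X%08X" % (crc32_reflected(data, CRC32_POLY),
                         crc32_reflected(data, CRC32C_POLY))


# Standard check input for CRC catalogues.
print(double_crc(b"123456789"))  # CBF43926E3069283
# The first half agrees with zlib's implementation.
assert crc32_reflected(b"123456789", CRC32_POLY) == zlib.crc32(b"123456789")
```

The bitwise loop is slow in pure Python; a real tool would use a table-driven or hardware-assisted implementation.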


--------------------
Nobody is Perfect.
I am Nobody.

http://pandu.poluan.info
sn0wman
post Apr 18 2006, 22:18
Post #16





Group: Members
Posts: 82
Joined: 3-February 05
Member No.: 19557



First of all, I am not collecting ideas for something I am about to start, but for something I started about a year ago, so the work is at a very advanced stage. That implies:
    - the application won't be (very) cross-platform, however I will try to make its engine (there is one) portable;
    - the application is GUI based, but passing command-line parameters is on the TODO list, as is a standalone
    command-line version, which may (?) be cross-platform;
    - the application already features MD5, CRC16 & CRC32 and many others;
and now:
    - I like legg's idea of making use of the AccurateRip database, online and offline; par/par2 file checking/creating is also a new idea for me.
    - I like zima's idea about the tray icon. I just like it, I'm not saying I will do it :) !
    - I don't like zima's idea of showing the result in the context menu. It sounds very interesting to me too, but we can't forget that showing the menu is supposed to be an instant action; we can't make the system context menu wait (for hashing!);
    - the application will store the audio hash in a tag (already on TODO);
QUOTE
Maybe you could base the audio decoding part on existing plugins (eg: 1by1 player uses winamp plugins)

What do you mean by that? An audio hash doesn't need encoding, a fingerprint does.

This post has been edited by sn0wman: Apr 23 2006, 15:59
PiezoTransducer
post Apr 19 2006, 01:31
Post #17





Group: Members
Posts: 67
Joined: 28-September 05
Member No.: 24754



QUOTE (sn0wman @ Apr 18 2006, 03:18 PM) *
- application will store audio hash in a tag (already on TODO);

Just an elaboration of my vague comment a couple of posts above. It'd be nice if there were a standard hash (fingerprint?) tag for the audio, just like there is a replaygain value for loudness. I may be thinking along different lines from your original intention. I'm thinking on the level of making something like a new RFC, while I think you're talking about just an application.
p0l1m0rph1c
post Apr 22 2006, 01:29
Post #18





Group: Members
Posts: 50
Joined: 9-December 03
From: China
Member No.: 10315



QUOTE (pepoluan @ Apr 19 2006, 03:19 AM) *
While we're on the topic of hashes...

Why must audio files be hashed using MD5? It's too complicated I think. I mean, MD5's main function is not for error-checking, rather to prevent willful tampering.

For the normal damages that happen to audio files, CRC32 is enough. Perhaps 2 CRC32 values with different polynomials. Should be quite robust. And it's easier to implement. Not to mention a wholelottafaster.


Well, no one forced whoever to use MD5 for hashing. And well, MD5 is still the hash algorithm which attains the best speed/security ratio. You could use MD4, which is faster but is known to be flawed. Yeah, you could use CRC32, but you have the probability of 1 in 4 billion that the error will not be detected.

Long shot, but why risk it when you can use MD5 (or whatever, like SHA-1 or <insert hash algo here>)? The speed is not that much better. You can probably go 2x faster with CRC32 than with MD5, maybe a little more. Either way, your speed is bounded by hard drive speed, not by the algorithm.
SebastianG
post Apr 24 2006, 14:26
Post #19





Group: Developer
Posts: 1318
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



QUOTE (p0l1m0rph1c @ Apr 22 2006, 02:29 AM) *
Well, no one forced whoever to use MD5 for hashing. And well, MD5 is still the hash algorithm which attains the best speed/security ratio. You could use MD4, which is faster but is known to be flawed.

So is MD5, IIRC (flawed in terms of security against an intelligent attacker who intentionally wants to create collisions). But if you just want to protect files against "random corruption", CRC32 is fine, too.

However, if you also plan to use the "hash" as some kind of key in a database it better be large (160 bits or more). Note that the probability of a collision with 2^X randomly generated codes of 2X bits length is around 50%.

Sebi
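
The collision estimate above can be checked with the usual birthday approximation, p ≈ 1 - exp(-n(n-1)/(2N)), where n is the number of random codes and N = 2^bits is the number of possible values. The Python sketch below is only that approximation (the function name is made up for this example); for n = 2^X codes of 2X bits it gives about 39%, which is in the same ballpark as the "around 50%" rule of thumb.

```python
import math


def collision_probability(num_items: int, bits: int) -> float:
    """Birthday approximation for at least one collision among random codes."""
    space = 2 ** bits
    # expm1 keeps precision when the exponent is tiny (very small probabilities).
    return -math.expm1(-num_items * (num_items - 1) / (2 * space))


# 2^16 random 32-bit codes (X = 16): roughly a 39% chance of a collision.
print(round(collision_probability(2 ** 16, 32), 3))

# One million files keyed by a 32-bit CRC vs. a 160-bit hash.
print(collision_probability(10 ** 6, 32))
print(collision_probability(10 ** 6, 160))
```

For database keys this shows why width matters far more than the choice of algorithm: a million entries are almost certain to collide somewhere in 32 bits and essentially never in 160.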
p0l1m0rph1c
post Apr 24 2006, 17:21
Post #20





Group: Members
Posts: 50
Joined: 9-December 03
From: China
Member No.: 10315



Well, yeah. So is SHA-1 (conceptually; not everyone will bother to do 2^63 iterations, heh). My point there was speed. The advantages of MD5 for uses other than checksumming (you mentioned databases as an example) outweigh the not-too-large speed penalty over, say, CRC32.
rjamorim
post Apr 25 2006, 02:04
Post #21


Rarewares admin


Group: Members
Posts: 7515
Joined: 30-September 01
From: Brazil
Member No.: 81



QUOTE (p0l1m0rph1c @ Apr 21 2006, 09:29 PM) *
Yeah, you could use CRC32, but you have the probability of 1 in 4 billion that the error will not be detected.


Since nobody is trying to detect intentional tampering here (why would someone bother to tamper with the signatures of your music collection? Insert subliminal messages?), I don't see the point of going with full-blown MD5, SHA or WhirlPool. If the case against CRC32 is avoiding a collision once every 4 billion times, let's go with CRC64, or CRC256, or CRC65536 if you're really insane :P

Also, CRC gives you an opportunity to implement some error correction if your stream has few errors ("Correction can also be done if information lost is lower than information held by the checksum"). Cryptographic hashes throw that opportunity out of the window.
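
A very naive illustration of CRC-based correction: if exactly one bit of a short block was flipped and the stored CRC32 is trusted, the bit can be found by flipping each bit in turn until the checksum matches again. This Python sketch uses zlib.crc32 and brute force, so it is only practical for small blocks; repair_single_bit is a hypothetical name, and real schemes locate the error from the CRC syndrome instead of searching.

```python
import zlib


def repair_single_bit(data: bytes, expected_crc: int):
    """Return data with one flipped bit repaired, or None if no single flip fits."""
    if zlib.crc32(data) == expected_crc:
        return data  # nothing to repair

    buf = bytearray(data)
    for i in range(len(buf)):
        for bit in range(8):
            buf[i] ^= 1 << bit          # try flipping this bit
            if zlib.crc32(buf) == expected_crc:
                return bytes(buf)
            buf[i] ^= 1 << bit          # undo and keep searching
    return None


good = b"hydrogenaudio"
crc = zlib.crc32(good)
damaged = bytearray(good)
damaged[3] ^= 0x10                      # simulate a single-bit error
assert repair_single_bit(bytes(damaged), crc) == good
```

A cryptographic hash could be brute-forced the same way, but it offers no algebraic structure to locate the error directly, which is the advantage being pointed out for CRCs.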

This post has been edited by rjamorim: Apr 25 2006, 02:12


--------------------
Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org
norz
post May 6 2006, 15:05
Post #22





Group: Members
Posts: 34
Joined: 12-April 06
From: Paris, France
Member No.: 29463



QUOTE (sn0wman @ Apr 18 2006, 23:18) *
QUOTE

Maybe you could base the audio decoding part on existing plugins (eg: 1by1 player uses winamp plugins)

what you mean by that ? audio hash doesnt need encoding, fingerprint does.

My mistake: I thought you'd have to decode the audio to hash it (hence the idea of using existing plugins), but I guess you'll just take the audio bits and hash them without any prior processing.

edit: spelling

This post has been edited by norz: Jul 25 2006, 20:23
norz
post Jul 25 2006, 20:27
Post #23





Group: Members
Posts: 34
Joined: 12-April 06
From: Paris, France
Member No.: 29463



@sn0wman: Any news on your project?
norz
post Jul 25 2006, 21:18
Post #24





Group: Members
Posts: 34
Joined: 12-April 06
From: Paris, France
Member No.: 29463



QUOTE (norz @ Jul 25 2006, 21:27) *
@sn0wman: Any news on your project?

A workaround solution until sn0wman's program is released:
Use a decoder and a hashing program that supports pipes.

Example (on windows):
madplay.exe --output=wave:- "mysong.mp3" | md5sum
This will send a 16bit pcm wave stream to md5sum.
md5sum is a port of gnu utils, from here I think.

I have tested this by replacing some characters in the tags with foobar.
Original and modified files:
- have same size
- have different md5 checksums
- produce decoded wave streams that have the same checksum

---edit begin:
I'm using madplay 0.15.2 (beta).

Regarding tags: my foobar2000 writes id3v1 and ape2 tags to the mp3, and madplay doesn't like this: on those files it will display an error message saying: "error: frame 999: lost synchronization", where 999 is the last decoded frame. However, the md5 checksum is the same for an mp3 file without ape2 tags and for the same file after foobar2000 has applied an ape2 tag to it.

I've changed my command line a bit:
madplay.exe --output=wave:- --verbose --display-time=remaining %1 | md5sum > %1.md5
This will display remaining time (on terminal) as it processes the file,
and write a .md5 file automatically, which makes it better suited to be called by a batch script to produce .md5 checksums (eg: with sweep)
---edit end
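
The same decode-and-pipe workaround can be sketched as a small Python script, for cases where a batch file is awkward. It assumes the decoder writes the decoded stream to standard output; hash_decoder_output is a hypothetical helper name, and the madplay arguments in the usage comment are the ones from the command line above.

```python
import hashlib
import subprocess


def hash_decoder_output(command, chunk_size=65536):
    """Run a decoder command and return the MD5 of everything it writes to stdout."""
    md5 = hashlib.md5()
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        # Read in chunks so large decoded streams are never held in memory.
        for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
            md5.update(chunk)
    # The decoder's exit code is ignored here on purpose: a decoder may report
    # an error on trailing tags and still have produced the full audio stream.
    return md5.hexdigest()


# Usage (requires madplay on the PATH):
# print(hash_decoder_output(["madplay", "--output=wave:-", "mysong.mp3"]))
```

Because the hash is taken over the decoded stream, it only stays comparable as long as the same decoder version and output settings are used.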

This post has been edited by norz: Jul 25 2006, 21:44
sn0wman
post Aug 4 2006, 16:17
Post #25





Group: Members
Posts: 82
Joined: 3-February 05
Member No.: 19557



i am not dead, just on holidays now :)
see you soon with some news, ok. (little basic cmd alpha testing version ? [testing the tag-independent engine])

best regards, sn0wman
