Comparing AAC files, How to extract and compare the raw AAC audio data?
gskluzacek
post Jan 7 2014, 00:12
Post #1





Group: Members
Posts: 7
Joined: 6-January 14
Member No.: 113838



I have many AAC music files spread across different computers and hard drives, mostly from iTunes with m4a and m4p extensions. Aside from comparing file names and metadata, I'm looking for a way to determine whether two AAC files contain the same song. BTW, I've accumulated all the files on a Mac and would prefer to do the processing there, but I also have access to a PC.

My first thought was to take an MD5 hash of the files and compare them, but it turns out that differences in the metadata can cause the MD5 hash to differ between 2 copies of the same song. I also tried to delete all metadata (used AtomicParsley --metaEnema) and then compare the MD5 hashes, but other info stuffed into the AAC media container also causes different MD5 hashes.

So if I could somehow extract the raw AAC audio data from the file, write it to a temp file, and take the MD5 hash of that, it should allow me to detect identical files.
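
As a rough illustration of the hash-and-compare step, here is a minimal Python sketch that groups files by MD5 digest. The function names `md5_of_file` and `group_by_digest` are made up for this example. Run on whole files it has exactly the problem described above (any metadata difference changes the digest), so it only becomes useful once it is fed the extracted audio data.

```python
import hashlib
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_digest(paths):
    """Return lists of paths that share a digest (groups of two or more only)."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5_of_file(path)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Each returned group is a set of byte-identical files; anything that differs by even one byte lands in a different group.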

I've used a similar approach to compare JPEG files, but I can't seem to find a way to get at the raw AAC audio data.

I'm also looking to do the same to compare MP3 files...

Thanks in advance
-Greg
includemeout
post Jan 7 2014, 00:25
Post #2





Group: Members
Posts: 282
Joined: 16-December 09
From: Maringá, Brazil
Member No.: 76067



For iTunes maybe this may help.

AFAIK it tracks music files down by their "digital signature" (or something like that) checking them up against a database (Gracenote?). There are also some other programs for PCs which work in a similar fashion, but I don't recall any names ATM.

Edit: further clarification

This post has been edited by includemeout: Jan 7 2014, 00:27


--------------------
Listen to the music, not the media.
gskluzacek
post Jan 7 2014, 00:28
Post #3





Group: Members
Posts: 7
Joined: 6-January 14
Member No.: 113838



QUOTE (includemeout @ Jan 6 2014, 17:25) *
For iTunes maybe this may help.

AFAIK it tracks music files down by their "digital signature" if memory doesn't fail me, that's how they used to call it a few years back when I used iTunes. There also are some other programs for PC which work in a similar fashion but I don't recall any names ATM.


Thanks, I will check it out... I wonder if it will work on files that are not in an iTunes library... meaning over the years, the files have been scattered here and there and lots of them are not even in iTunes any more.
gskluzacek
post Jan 7 2014, 00:38
Post #4





Group: Members
Posts: 7
Joined: 6-January 14
Member No.: 113838



QUOTE (includemeout @ Jan 6 2014, 17:25) *
For iTunes maybe this may help.

AFAIK it tracks music files down by their "digital signature" (or something like that) checking them up against a database (Gracenote?). There are also some other programs for PCs which work in a similar fashion, but I don't recall any names ATM.

Edit: further clarification


Hmmm... $50 is a little pricey for my blood :) I was hoping to do the job using some shell scripting and some open source / freeware libraries.
nu774
post Jan 7 2014, 02:20
Post #5





Group: Developer
Posts: 538
Joined: 22-November 10
From: Japan
Member No.: 85902



Since m4p files are DRM protected, there may not be many options for them, and I cannot tell you how to deal with those.
For non-DRM-protected files, you can use something like the following to extract the raw AAC bitstream:
CODE
ffmpeg -i input.m4a -c:a copy -f s8 output.raw.aac

This is a bit tricky, since ffmpeg adds ADTS headers to AAC output by default. ADTS headers are usually fine, but since you want raw AAC, "-f s8" is set to fake the output format as signed 8-bit raw PCM. In combination with -c:a copy, ffmpeg seems to write the raw AAC bitstream as intended.
If you want to listen to the raw AAC file, you will probably have to add ADTS headers with the following:
CODE
faad -a output.adts.aac input.raw.aac
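
For batch use, the extract-then-hash idea could be sketched in Python roughly as follows. This assumes ffmpeg is on the PATH and that writing to standard output with `-` works together with the `-c:a copy -f s8` trick above; the function names and the `run` parameter are hypothetical and only there for illustration. It hashes the extracted bytes directly instead of writing a temp file.

```python
import hashlib
import subprocess


def build_extract_command(path, ffmpeg="ffmpeg"):
    """Build the ffmpeg call from the post, with '-' sending the output to stdout."""
    return [ffmpeg, "-nostdin", "-v", "error", "-i", path,
            "-c:a", "copy", "-f", "s8", "-"]


def audio_payload_md5(path, run=subprocess.run):
    """Return the MD5 of the copied audio stream; tags and container data are left out.

    `run` defaults to subprocess.run and can be swapped for a stub when testing.
    """
    result = run(build_extract_command(path), stdout=subprocess.PIPE, check=True)
    return hashlib.md5(result.stdout).hexdigest()
```

Two files whose `audio_payload_md5` values match should carry the same AAC bitstream, whatever their tags say. The whole stream is held in memory, which is fine for single songs but is a design shortcut rather than a requirement.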

gskluzacek
post Jan 7 2014, 02:22
Post #6





Group: Members
Posts: 7
Joined: 6-January 14
Member No.: 113838



I will give that a try... thanks!
mudlord
post Jan 7 2014, 03:48
Post #7





Group: Developer (Donating)
Posts: 813
Joined: 1-December 07
Member No.: 49165



http://mudlord.info/temp/foo_audiohasher.fb2k-component
nu774
post Jan 7 2014, 06:03
Post #8





Group: Developer
Posts: 538
Joined: 22-November 10
From: Japan
Member No.: 85902



QUOTE (mudlord @ Jan 7 2014, 11:48) *

That's interesting, but I wonder whether it is appropriate to hash the decoded PCM of lossy codecs, whose decoders are floating-point based and therefore not guaranteed to be bit-exact.
Of course, comparing decoded PCM should be enough if the OP just wants to compare file A against file B right now, with the same decoder.
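
To make the bit-exactness concern concrete, here is a small Python sketch with made-up sample values (not real decoder output). Two decodes that differ by a single rounding step in one sample produce different hashes, while a tolerance-based comparison still treats them as the same audio. The helper names are hypothetical.

```python
import hashlib
import struct


def pack_s16le(samples):
    """Pack integer samples as signed 16-bit little-endian PCM bytes."""
    return struct.pack("<%dh" % len(samples), *samples)


def pcm_md5(samples):
    """Return the MD5 of the packed PCM; any one-bit change alters the digest."""
    return hashlib.md5(pack_s16le(samples)).hexdigest()


def within_tolerance(a, b, max_diff=1):
    """Return True if both sample lists have equal length and differ by at most max_diff."""
    return len(a) == len(b) and all(abs(x - y) <= max_diff for x, y in zip(a, b))


# Hypothetical output of two decoders; the second rounds one sample differently.
decoder_one = [0, 1200, -3400, 32767]
decoder_two = [0, 1201, -3400, 32767]
```

Here `pcm_md5(decoder_one)` and `pcm_md5(decoder_two)` differ even though `within_tolerance(decoder_one, decoder_two)` is true, which is why a PCM hash is only comparable across runs of the same decoder.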
saratoga
post Jan 7 2014, 06:30
Post #9





Group: Members
Posts: 5045
Joined: 2-September 02
Member No.: 3264



As long as the same decoder is used it should work. Changing decoders will possibly change the hashes for some formats, particularly for 24 bit audio or fixed point arithmetic.
Porcus
post Jan 7 2014, 08:53
Post #10





Group: Members
Posts: 1913
Joined: 30-November 06
Member No.: 38207



QUOTE (gskluzacek @ Jan 7 2014, 00:12) *
many AAC music files

How many? More than can be imported to a media player and sorted by length?

That said, maybe someone can give input on the following: suppose I do ffmpeg -i infile.m4a -acodec copy outfile.m4a, is there any possibility that the infile might contain headers (say, for gapless playback) which will be lost and a player see them as e.g. different lengths? mp3 files might have misleading length information ...

This post has been edited by Porcus: Jan 7 2014, 08:56


--------------------
One day in the Year of the Fox came a time remembered well
nu774
post Jan 7 2014, 13:29
Post #11





Group: Developer
Posts: 538
Joined: 22-November 10
From: Japan
Member No.: 85902



QUOTE (Porcus @ Jan 7 2014, 16:53) *
That said, maybe someone can give input on the following: suppose I do ffmpeg -i infile.m4a -acodec copy outfile.m4a, is there any possibility that the infile might contain headers (say, for gapless playback) which will be lost and a player see them as e.g. different lengths? mp3 files might have misleading length information ...

Although I don't understand why you would remux to m4a here, your guess is correct.
The amounts of delay and padding for gapless playback are usually stored in a special tag named "iTunSMPB", and that tag is lost in the remux (ffmpeg copies most of the major tags in infile.m4a, but not iTunSMPB).
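
For reference, a minimal Python sketch of reading those values out of an iTunSMPB string. It assumes the commonly seen layout of space-separated hex fields (a zero field, then encoder delay, then padding, then the original sample count); that layout is an assumption here and is not spelled out in this thread. The example value and the names `GaplessInfo` and `parse_itunsmpb` are made up.

```python
from typing import NamedTuple


class GaplessInfo(NamedTuple):
    delay: int             # priming samples at the start
    padding: int           # filler samples at the end
    original_samples: int  # length of the original audio in samples


def parse_itunsmpb(value):
    """Parse the delay, padding, and length fields from an iTunSMPB tag value."""
    fields = value.split()
    if len(fields) < 4:
        raise ValueError("expected at least four hex fields in iTunSMPB")
    return GaplessInfo(int(fields[1], 16), int(fields[2], 16), int(fields[3], 16))


# Hypothetical tag value in the assumed layout.
example = " 00000000 00000840 000001CA 00000000003F9B76 00000000 00000000"
info = parse_itunsmpb(example)
```

If two files share the same audio payload but one has lost this tag, a player may report slightly different lengths for them, which matters when sorting candidates by duration.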
mudlord
post Jan 7 2014, 22:39
Post #12





Group: Developer (Donating)
Posts: 813
Joined: 1-December 07
Member No.: 49165



QUOTE (saratoga @ Jan 6 2014, 23:30) *
As long as the same decoder is used it should work. Changing decoders will possibly change the hashes for some formats, particularly for 24 bit audio or fixed point arithmetic.


Pretty much. And I doubt the AAC decoder would change, unless it's based on FFmpeg and something that needs fixing is discovered upstream.

The component uses FB2K's input services to decode the files to raw PCM data, which is then sent directly to the hashing functions. There is still room for improvement, like multithreading, a proper output hash dialog, and different selectable hash routines (right now it uses SHA-1).
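
The decode-then-hash flow can be sketched in Python as follows. This is not the component's code, only an illustration of feeding decoded PCM to SHA-1 chunk by chunk; the `sha1_of_chunks` name is hypothetical and the chunks stand in for whatever the decoder hands over.

```python
import hashlib


def sha1_of_chunks(chunks):
    """Feed an iterable of PCM byte chunks to SHA-1 and return the hex digest."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()
```

Because the hash is updated incrementally, the digest depends only on the concatenated bytes and not on how the decoder happens to split them into chunks, so a whole track never has to be held in memory.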
marc2003
post Jan 8 2014, 00:14
Post #13





Group: Members
Posts: 4593
Joined: 27-January 05
From: England
Member No.: 19379



Perhaps PerfectTunes could do the job? I've not tried it myself, but apparently it finds duplicates across lossless and lossy files, so obviously it's not checking whether they are bit-perfect.

http://www.dbpoweramp.com/perfecttunes.htm


 


