foo_chacon, A thingie to convert metadata charset.
Yirkha
post Oct 5 2008, 16:08
Post #1





Group: FB2K Moderator
Posts: 2359
Joined: 30-November 07
Member No.: 49158



I needed to fix broken tags of a bunch of files yesterday, so I've made myself this component to do that efficiently and I thought perhaps someone else might find it useful as well, so here it is.

The offered functionality is essentially similar to what the "Override charset" option in foo_infobox did, though it's accessed directly from the context menu and for any number of tracks at once.

It can be generally used to fix ID3v1 tags or cue sheets saved in a codepage different from that of your system. (And no, nobody wants to hear about those infelicitous files from shabby sources ;)

documentation & tutorial
foo_chacon-v3.zip (62 kB, v3, 2010/04/07)
mirror


This post has been edited by Yirkha: Aug 23 2010, 05:23


--------------------
Full-quoting makes you scroll past the same junk over and over.
Borisz
post Dec 10 2008, 20:59
Post #2





Group: Members
Posts: 381
Joined: 27-September 03
Member No.: 9041



I am surprised that there are no replies to this, maybe because up till now there was no reason to switch from foo_infobox.

Thanks for this plugin. It performs a seldom-needed function, but when it's needed, it is immeasurably helpful. It does exactly what I kept infobox in foobar for, but it does it so much better.

This post has been edited by Borisz: Dec 10 2008, 20:59


--------------------
http://evilboris.sonic-cult.net/346/
Sega Saturn, Shiro!
nevets1219
post Dec 11 2008, 11:01
Post #3





Group: Members
Posts: 46
Joined: 10-October 05
Member No.: 25022



Just wanted to say thanks for such a great utility. This alone definitely made the switch from v0.8.3 that much easier :)

Might I ask you to explain the "convert to local codepage first" feature in more detail?

Also, might I suggest adding a filter so that it's easier to select all Chinese codepages or all Japanese codepages?
Yirkha
post Dec 11 2008, 11:57
Post #4





Group: FB2K Moderator
Posts: 2359
Joined: 30-November 07
Member No.: 49158



Oh noes, people have found this and started asking questions...


The "Convert to local codepage first" feature is necessary because foobar2000 reads tags with an unspecified character set as if they were in the local system codepage. The result is then converted to UTF-8, like everything else in fb2k. Because we want to re-read the tags in another charset, it's usually necessary to first convert them back from UTF-8 to that local system codepage, and then reparse them from whatever charset you have selected to UTF-8 again. The checkbox enables the first part of this process.

Note that this is also why this component is inherently unsafe - there is no guarantee that the conversion "CP_target read as CP_system => CP_UTF8 => CP_system" is fully equivalent. The proper way would be to read the tags from various file formats directly, not using the standard input modules. But it seems to work quite well so far, so let's hope this won't be needed.
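
A minimal Python sketch of the round trip described above. It is not the component's actual code: Python's codecs stand in for whatever conversion routines the component really uses, and the function names and the choice of cp1252 as the "system" codepage are assumptions made for illustration. The last part shows one way the round trip can fail: a byte that has no meaning in the system codepage cannot survive the first, wrong decode.

```python
def misread(raw: bytes, system_cp: str) -> str:
    """Decode tag bytes with the system codepage, as a player would do for
    tags that carry no charset information."""
    return raw.decode(system_cp)


def fix_charset(garbled: str, system_cp: str, target_cp: str) -> str:
    """Convert back to the system codepage, then reparse as target_cp."""
    raw = garbled.encode(system_cp)  # the "convert to local codepage first" step
    return raw.decode(target_cp)     # reparse in the charset the user selected


def survives_round_trip(raw: bytes, system_cp: str) -> bool:
    """True if system_cp -> Unicode -> system_cp gives back the same bytes."""
    try:
        return raw.decode(system_cp).encode(system_cp) == raw
    except UnicodeError:
        return False


# Hypothetical example: a cp1251 (Cyrillic) tag read on a cp1252 system.
raw_tag = "Русский".encode("cp1251")
garbled = misread(raw_tag, "cp1252")              # what the player shows
fixed = fix_charset(garbled, "cp1252", "cp1251")  # what the fix recovers

# Shift_JIS text often contains byte 0x81, which Python's cp1252 codec
# leaves undefined, so the bytes cannot be recovered through cp1252.
sjis_ok = survives_round_trip("、".encode("shift_jis"), "cp1252")
```

Windows' own conversion tables may treat such undefined bytes differently from Python's codecs, so the exact failure cases in practice can differ from this sketch.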


Regarding the filter -
Yes, something like that could be added and it would need some additional configuration and/or custom-drawn groups. Though when I used Chacon, I usually chose one particular charset and processed many subsequent files with it easily, because the setting was remembered. When something different came, even mindlessly skimming through the whole list was not so much hassle. I tend to leave it as simple and stupid as it is, thank you.


nevets1219
post Dec 11 2008, 22:22
Post #5





Group: Members
Posts: 46
Joined: 10-October 05
Member No.: 25022



Thanks for the info on "convert to local codepage first".

Regarding the filter feature, yeah, it's a bit more situational and probably wouldn't save all that much effort.
2E7AH
post Jan 23 2009, 22:34
Post #6





Group: Validating
Posts: 2424
Joined: 21-May 08
Member No.: 53675



would it be possible to extend the component with custom character remapping?

then we could easily transform Latin to Cyrillic or something similar
Yirkha
post Feb 2 2009, 10:22
Post #7





Group: FB2K Moderator
Posts: 2359
Joined: 30-November 07
Member No.: 49158



That would be possible. This component currently just uses Windows routines to convert between different encodings, but adding a custom converter wouldn't be hard and it would provide additional flexibility.

However I'm thinking about the way to store such remapping tables. To stay within the scope of "character set remapping", it would need to allow mapping arbitrary binary sequences to Unicode codepoints. Because it's not possible to use two different charsets in one user-editable text file, the data must be formatted for instance in hex - and I'd use the same format as iconv for great compatibility. For example, mapping A/B/C to a/b/c:
CODE
0x41 0x61
0x42 0x62
0x43 0x63

But when you speak about transliteration, I'm not sure if that format would be as suitable for it. Some kind of list of replacements, both already in Unicode, seems much better for such usage to me. And then I'm not sure if it has much to do with character set remapping...
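
As a rough illustration of the hex table format from the A/B/C example above, here is a small Python sketch that parses such a table and applies it. It is only a guess at how such a table could be consumed: `parse_table` and `remap` are hypothetical names, only single-byte source values are handled (the post asks for arbitrary binary sequences), and passing unmapped bytes through as Latin-1 is an assumption.

```python
from typing import Dict


def parse_table(text: str) -> Dict[int, str]:
    """Parse lines of '<source byte in hex> <Unicode codepoint in hex>'."""
    table: Dict[int, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        src, dst = line.split()[:2]
        # int(..., 16) accepts the "0x" prefix used in the example table
        table[int(src, 16)] = chr(int(dst, 16))
    return table


def remap(raw: bytes, table: Dict[int, str]) -> str:
    """Map each byte through the table; unmapped bytes pass through
    unchanged (interpreted as Latin-1), which is an assumption."""
    return "".join(table.get(b, chr(b)) for b in raw)


TABLE_TEXT = """
0x41 0x61
0x42 0x62
0x43 0x63
"""

table = parse_table(TABLE_TEXT)
result = remap(b"ABCD", table)  # A/B/C are remapped, D is left alone
```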


2E7AH
post Feb 2 2009, 18:41
Post #8





Group: Validating
Posts: 2424
Joined: 21-May 08
Member No.: 53675



ok, probably using $replace() is the easy way

i was thinking about simple remappings within the same code page, and you are thinking more globally
i wouldn't have anything to suggest because the subject is beyond me
Yirkha
post Feb 14 2009, 03:59
Post #9





Group: FB2K Moderator
Posts: 2359
Joined: 30-November 07
Member No.: 49158



v0.0.2 is up, features one simple addition: it is possible to copy text from selected fields in the preview pane using context menu or keyboard shortcut Ctrl+C. Helps when you don't have a clue how the tags should really look - you can for instance paste them to Google and see if it yields plausible results.


deviantus
post Feb 17 2009, 13:01
Post #10





Group: Members
Posts: 1
Joined: 24-November 08
Member No.: 63084



What about UTF-16 support?
2E7AH
post Feb 17 2009, 13:40
Post #11





Group: Validating
Posts: 2424
Joined: 21-May 08
Member No.: 53675



QUOTE (deviantus @ Feb 17 2009, 13:01) *
What about UTF-16 support?

Preferences → Advanced → Tagging → MP3 → ID3v2 writer compatibility mode

[edit] that is for writing; foobar has no problem with reading UTF-16

This post has been edited by 2E7AH: Feb 17 2009, 13:57
Yirkha
post Feb 17 2009, 15:55
Post #12





Group: FB2K Moderator
Posts: 2359
Joined: 30-November 07
Member No.: 49158



If your tags are stored in UTF-16, but read as UTF-8 or other charset, this component can't help you. It doesn't access the tags directly and such texts would get truncated or otherwise mangled before they even get there. (And that's an inherent limitation of how it works, not limited to UTF-16.)
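
To illustrate the kind of truncation meant here, a small Python sketch. It assumes a hypothetical 8-bit reader that treats the tag as a NUL-terminated string; the real input modules may mangle the text in other ways. `read_as_8bit` is an illustrative name, not a foobar2000 function.

```python
def read_as_8bit(raw: bytes, codepage: str) -> str:
    """Hypothetical 8-bit reader: stop at the first NUL byte, then decode."""
    return raw.split(b"\x00", 1)[0].decode(codepage, errors="replace")


# UTF-16 stores ASCII letters as the letter followed by a zero byte,
# so an 8-bit reader sees the end of the string after one character.
artist = read_as_8bit("Artist".encode("utf-16-le"), "cp1252")
album = read_as_8bit("Album".encode("utf-16-le"), "cp1252")

# With a byte order mark in front, the BOM bytes show up as junk too.
with_bom = read_as_8bit(b"\xff\xfe" + "Artist".encode("utf-16-le"), "cp1252")
```

Both `artist` and `album` end up as the same single letter, so nothing applied afterwards to the already-decoded text can tell them apart or bring the rest back.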


neothe0ne
post Apr 26 2009, 22:40
Post #13





Group: Members
Posts: 295
Joined: 25-September 05
Member No.: 24684



I was going to say once that Acropolis's masstagger addons component added this functionality to the masstagger in foobar (so there were components doing this before), but with v1.8, around foobar version 0.9.6.x, Peter blocked third parties from attaching to it (doh). It's not exactly the same, but if it's possible (I don't know much about codepages), it would be nice to have a function specific to converting Traditional Chinese to Simplified and vice versa, since Acropolis's now-deprecated plugin could do that. I'd understand if you aren't able or willing to, though.
Yirkha
post Apr 26 2009, 23:46
Post #14





Group: FB2K Moderator
Posts: 2359
Joined: 30-November 07
Member No.: 49158



That's a bit more related to what 2E7AH suggested, and again not so much about character conversion. I might add another interface for this kind of conversion or for custom transliterations; basically, it's not a bad idea.


2E7AH
post Jul 6 2009, 07:46
Post #15





Group: Validating
Posts: 2424
Joined: 21-May 08
Member No.: 53675



Yirkha, can you look here:

I tested one track by converting the tags to Latin-1 (ISO 8859-1) with Mp3tag, then using foo_chacon to convert them correctly in foobar, but without success. I tried with "Convert to local codepage first" checked and unchecked, with the same result. My system codepage is CP1251.
It worked OK in the past, but I don't know if I was converting from this code page.

Yirkha
post Apr 20 2010, 22:17
Post #16





Group: FB2K Moderator
Posts: 2359
Joined: 30-November 07
Member No.: 49158



Just FYI, version 3 of the component was released a few days ago, after I was confronted with a case of files with charset problems that were fixable using silly $replace(), but not automatically using Chacon. It might actually have been something similar to the post above from last year, which I neither read nor responded to, for reasons I don't really know.

The UI has changed a bit and the plugin is generally more capable, but everything might now be even more confusing if you don't know what you are doing. You'll note the "[x] Convert to local code page first" checkbox is gone. If you had a known configuration with it enabled or disabled, that is equivalent to selecting the preconversion charset "<system code page>" or "<disabled>" respectively.

Have fun.


sailorh
post Oct 6 2010, 16:28
Post #17





Group: Members
Posts: 13
Joined: 16-December 04
Member No.: 18704



I absolutely love this plugin. I needed something just like it. I haven't figured out what app is corrupting my tags, but this fixes them.

One little request, could you add the ability to fix only certain fields? Sometimes I have a garbled Artist tag but the Album tag is correctly encoded. So applying a different charset to the whole file ends up messing up the Album field.

But I love the plugin. Keep up the good work. :)
Yirkha
post Oct 6 2010, 21:55
Post #18





Group: FB2K Moderator
Posts: 2359
Joined: 30-November 07
Member No.: 49158



QUOTE (sailorh @ Oct 6 2010, 17:28) *
I haven't figured out what app is corrupting my tags, but this fixes them.
Huh, you have some app on your computer which randomly mangles your tags and you just live with it?

QUOTE (sailorh @ Oct 6 2010, 17:28) *
One little request, could you add the ability to fix only certain fields? Sometimes I have a garbled Artist tag but the Album tag is correctly encoded. So applying a different charset to the whole file ends up messing up the Album field.
OK, even though all the tags are written in the same codepage most of the time, I see it might be useful for the kind of problems you have.
For the time being, you can leave a Properties window open on the affected tracks and combine the values afterwards, or use the clipboard, etc.


sailorh
post Oct 7 2010, 15:18
Post #19





Group: Members
Posts: 13
Joined: 16-December 04
Member No.: 18704



Yeah, I'm not sure how these files got mangled. But it is only certain fields in seemingly random files.

Thanks for the suggestion about leaving a properties page open. I'll try that out.
neothe0ne
post Nov 29 2010, 06:56
Post #20





Group: Members
Posts: 295
Joined: 25-September 05
Member No.: 24684



I'd bet the offending app is WMP11/12.

Yirkha, I think there's a long-standing bug with this component. If you import a CUE sheet encoded in a local (ANSI) codepage and try to fix it, the component doesn't rewrite the CUE sheet in Unicode/UTF-8. As a result, foobar2000 shows the correct characters but the actual text file contains ? marks (which means permanent loss of text if you delete the tracks from your foobar playlist). I've worked around this by pre-saving offending CUE sheets in UTF-8 and then running "fix metadata charset", but it'd be nice if the component could do this automatically.
rend3r
post Dec 15 2010, 18:24
Post #21





Group: Members
Posts: 3
Joined: 20-May 10
Member No.: 80773



Could you add the ability to disable conversion of certain fields (Artist name, Track Title, Album title) in mp3 tags? For example, by adding checkboxes to the "Fix Metadata Charset" window.
amrok
post Jan 7 2011, 21:03
Post #22





Group: Members
Posts: 1
Joined: 7-January 11
Member No.: 87154



I'm sure it is cp866, because there is Русский → .Р.......к.и.й. How can I restore the original tag? What am I doing wrong?
lvqcl
post Jan 7 2011, 21:47
Post #23





Group: Developer
Posts: 3327
Joined: 2-December 07
Member No.: 49183



Copy tags and paste them here.
tksh
post Mar 9 2011, 00:32
Post #24





Group: Members
Posts: 15
Joined: 26-March 05
Member No.: 20947



Feature request: allow removal of certain code page entries -- my non-Unicode tags cover four languages and even then I only use a small handful of the total combinations I can choose from.

Or alternatively (and as a much more difficult request), do something similar to the language auto-detection logic in web browsers that guesses the correct encoding.
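
For a feel of what such guessing involves, here is a deliberately naive Python sketch. It is far simpler than the detection logic in browsers and is not something the component does. The candidate list, the function names and the "count Russian letters" scoring are all assumptions chosen for illustration.

```python
from typing import List


def russian_score(text: str) -> int:
    """Count characters that are basic Russian letters (А-я plus Ё/ё)."""
    return sum(1 for ch in text if "А" <= ch <= "я" or ch in "Ёё")


def guess_codepage(raw: bytes, candidates: List[str]) -> str:
    """Return the candidate whose decoding contains the most Russian letters.
    On a tie, the earlier candidate in the list wins."""
    best, best_score = candidates[0], -1
    for cp in candidates:
        try:
            text = raw.decode(cp)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this codepage
        score = russian_score(text)
        if score > best_score:
            best, best_score = cp, score
    return best


# Hypothetical tag bytes: "Русский" stored in cp866.
raw_tag = "Русский".encode("cp866")
guess = guess_codepage(raw_tag, ["cp1251", "koi8_r", "cp866"])
```

Short strings can score equally well under several codepages, which is one reason real detectors rely on much more than a letter count and why picking the charset by hand from a preview is often more reliable.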
Yoshi8765
post Mar 18 2011, 11:16
Post #25





Group: Members
Posts: 3
Joined: 18-March 11
Member No.: 89097



Just going to say, this is such an awesome component! Thank you thank you thank you! I'm an avid fan of Jpop and Jrock and was bummed when my songs appeared in gibberish in foobar2k, but with this component, they are fixed in literally 2 seconds! It's so easy and straightforward to use. :D