Text to speech component

Reply #76
Quote
it's only 16-bit, first cd is crystal, second mike ...

I believe he asked which sample rates the voices are available in, not bit depths.

Reply #77
I have no idea if this is possible or even feasible... but it's just a suggestion.

Could there be a preference where you could add your own 'phonetic' spelling to some words?

For example, if the speech engine gets the word 'Pantera' wrong, we could type in something like 'Pantera = Pant air a', so every time the program came across the word Pantera it would pronounce it 'pant air a'?

Like I said, I have no idea if that's possible with this since it's using the Microsoft Speech Engine but it would be good.

Spadge
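
A rough sketch of what such a preference could do, written in Python purely for illustration. It is not how the component or the Microsoft Speech Engine works; the `OVERRIDES` table and the `apply_overrides` function are made-up names. The idea is to swap each listed word for its "sounds like" spelling before the text is handed to the speech engine.

```python
import re

# Hypothetical override table: the word on the left is replaced by the
# "sounds like" spelling on the right before the text is spoken.
OVERRIDES = {
    "Pantera": "Pant air a",
}


def apply_overrides(text, overrides=OVERRIDES):
    """Replace whole words, ignoring case, with their phonetic respelling."""
    if not overrides:
        return text
    lookup = {word.lower(): spoken for word, spoken in overrides.items()}
    # Try longer entries first so a long entry wins over a shorter one.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


print(apply_overrides("Now playing: pantera - Walk"))
```

Matching on word boundaries keeps the rule from firing inside longer words, and the lower-cased lookup means 'pantera' and 'Pantera' are treated the same.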


Reply #79
Yup, I also think it's possible. Though I think most of the work would be in maintaining a database of the phonetics for the different artists and tracks. How can you do this with the AT&T engine, mazy? (I would love to get my hands on the higher quality version).

Reply #80
in the ATTNaturalVoices\TTS1.4\Desktop\bin dir there's WinDictEdit.exe ...

you can open ATTNaturalVoices\TTS1.4\Desktop\data\en_us\en_us.dict with it and add your own words and their phonetic transcription.

example (included in the file):

bin Laden /b ih n 1 # l aa 1 d ih n 0/

you can select language (en_us, en_uk, etc.) and part of speech (none, noun, verb, modifier, function, interjection)

one word can have several phonetic transcriptions, differentiated by language and/or part of speech
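
To make the shape of the example entry concrete, here is a small Python sketch that splits it into the word and its transcription tokens. It assumes only the layout shown above (word, then the transcription between slashes); the on-disk format of en_us.dict may well differ, and `parse_entry` is a made-up helper. The tokens are left uninterpreted.

```python
def parse_entry(line):
    """Split 'word /t o k e n s/' into (word, [tokens]).

    Assumes the layout of the example entry in the post, nothing more.
    """
    line = line.strip()
    start = line.find("/")
    end = line.rfind("/")
    if start == -1 or end <= start:
        raise ValueError("no /.../ transcription found in: " + repr(line))
    word = line[:start].strip()
    tokens = line[start + 1:end].split()
    return word, tokens


word, tokens = parse_entry("bin Laden /b ih n 1 # l aa 1 d ih n 0/")
print(word)
print(tokens)
```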

Reply #81
from help for WinDictEdit

The Dictionary Editor can be used to change the pronunciation of words or allow you to easily add a new abbreviation.  We’ll describe two different examples to explain how to use the Dictionary Editor to do both.

Defining Replacements
In this example, we’ll demonstrate how to add a new transcription for the word “WRT” which we want to synthesize as “with respect to”.  This same technique works for other words as well such as “NE” for “north east”, “MPH” for “miles per hour”, etc.

First, start the Dictionary Editor and either open an existing dictionary file or create a new file. Next, create a new word from the Edit menu.  This will pop up a dialog as shown below.


The top portion of the dialog box describes the word that is being added or changed in the dictionary.  The bottom portion of the dialog can be used to experiment with different pronunciations.  The red “mouth/lips” button allows you to hear how the word or list of phonemes will be pronounced.  The ‘T’ button displays the phonemes for the word in the “word” or “sounds like” text box.

Adding replacements is very simple:

1. Type in the word to be expanded in the “word” text box in the top section of the dialog, “WRT” in our example.

2. Type in the replacement text in the “Sounds Like” text box in the bottom section of the dialog, e.g. “with respect to”.

3. Click the “T” button next to the “Sounds Like” text box to retrieve the list of phonemes for the text in the “Sounds Like” text box.

4. Copy the phonemes from the “Phonemes” text box in the “Sounds Like” section at the bottom of the dialog and paste the phonemes into the phonemes text box in the “Edit Entry” section at the top of the dialog.

5.  Click the red “mouth/lips” button next to the phonemes in the top part of the dialog to hear how the word “WRT” will be pronounced.

6.  Click the “save” button and you’ve defined a new pronunciation.

Next step is to tell the TTS engine about your dictionary file, but first we’ll explore changing the pronunciation of a word.

Changing the default pronunciation of a word
In this example, we’ll change the pronunciation of the name “Proulx” from the default, which sounds like “prowlx”, to sound like “Pru”, which rhymes with “Sue”.  This time, we’re going to use the WinDictEdit tool to replace a few of the phonemes in the default pronunciation of the word.

Here’s the procedure:


1.  Type in the word to be expanded in the “word” text box in the top section of the dialog, “Proulx” in our example

2.  Click the red “mouth/lips” button next to the word in the top part of the dialog to hear how the word “Proulx” will be pronounced.

3.  Click the “T” button next to the word “Proulx” in the top part of the dialog to see the phonemes for the word “Proulx” which are “p r aw l k s 1”.  Note that the first part of the pronunciation is correct so we can use the “p r” phonemes but need to replace the “aw l k s 1” with something that rhymes with “Sue”.

4.  Next, we need to identify the phonemes that make up the “ue” in “Sue” so type “Sue” in the “Sounds like” text box at the bottom of the dialog.

5.  Click the “T” button next to the “Sounds Like” text box to retrieve the list of phonemes “s uw 1” for the text “Sue” in the “Sounds Like” text box.  The “uw” phoneme provides the “ue” sound in “Sue”. You can verify this by deleting the ‘s’ phoneme from the phonemes text box in the “Sounds Like” section and then click the “lips/mouth” button to hear the “uw” phoneme.

6.  We need to replace the “aw l k s” phonemes in the “Phonemes” text box in the “Word” pane of the dialog with the “uw” from the “Sounds Like” pane, so copy that phoneme up to the top, making the list of phonemes in the word pane “p r uw 1”.

7.  Click the red “mouth/lips” button next to the phonemes in the top part of the dialog to hear how the word “Proulx” will be pronounced. Sure enough, “Proulx” now rhymes with “Sue”.

8.  Click the “save” button and you’ve defined a new pronunciation for the name “Proulx”.
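
The phoneme swap in the procedure above can be written down as a list operation. This Python sketch only mirrors the manual copy and paste in the dialog; `replace_phonemes` is a made-up helper and has nothing to do with WinDictEdit itself.

```python
def replace_phonemes(phonemes, old, new):
    """Return a copy of `phonemes` with the first run equal to `old` replaced by `new`."""
    n = len(old)
    for i in range(len(phonemes) - n + 1):
        if phonemes[i:i + n] == old:
            return phonemes[:i] + new + phonemes[i + n:]
    raise ValueError("phoneme run not found: " + " ".join(old))


# Default transcription of "Proulx" as reported by the "T" button.
default = "p r aw l k s 1".split()

# Swap the "aw l k s" run for the "uw" taken from "Sue".
fixed = replace_phonemes(default, "aw l k s".split(), ["uw"])
print(" ".join(fixed))
```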

Adding Custom Pronunciations to Your Application
Once you have defined your custom pronunciations, you’ll need to tell your application about them.  Microsoft SAPI 5.1 allows for the creation of user and application dictionaries.  The intent of this feature is that the application dictionary includes pronunciations that apply to all users while the user dictionary is unique to an individual user but you may use them any way you wish. The AT&T Labs Natural Voices TTS SDK allows an application to define any number of custom dictionaries, and to control the order in which they are searched.

When searching for a transcription, the TTS Server first searches custom dictionaries, in the order specified by the client application, and then its own internal dictionaries to find some pronunciation for the word.  The search stops as soon as a pronunciation is found.
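
The search order described above amounts to a first-match lookup over an ordered list of dictionaries. The Python sketch below is only a model of that rule, with plain dicts standing in for dictionary files; `find_transcription` is a made-up name and is not part of the SDK.

```python
def find_transcription(word, custom_dicts, internal_dicts):
    """Return the first transcription found, custom dictionaries first.

    `custom_dicts` is searched in the order given, then `internal_dicts`.
    The search stops at the first hit; None means no dictionary has the word.
    """
    for dictionary in list(custom_dicts) + list(internal_dicts):
        if word in dictionary:
            return dictionary[word]
    return None


# Stand-ins for a custom dictionary and the engine's own dictionary.
mydict = {"Proulx": "p r uw 1"}
internal = {"Proulx": "p r aw l k s 1", "Sue": "s uw 1"}

print(find_transcription("Proulx", [mydict], [internal]))
print(find_transcription("Sue", [mydict], [internal]))
```

Because the custom dictionary is consulted first, its entry for “Proulx” shadows the built-in one, while “Sue” falls through to the internal dictionary.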

You’ll find code samples for adding custom dictionaries to your application in Chapter 5 of the System Developer's Guide which describes the SDK.

Alternatively, you can update the tts.cfg file, which the engine reads during initialization.  You’ll find tts.cfg in the data subdirectory, e.g.

C:\program files\attnaturalvoices\tts1.3\desktop\data\tts.cfg

The tts.cfg file describes all of the voices and languages that are available to the TTS engine.  Browse through the file to find the “language” section for the language for which you’re defining the dictionary, e.g.

Code:
Language                en_us 
LanguageLocale          en_us
LanguageDictionary      en_us\en_us.dict att_darpabet_english
LanguageTextAnalysis    en_us\fe_en_us.dll

 
is the section for the US English language.  To use a dictionary file en_us\mydict.dict, add a new line to tts.cfg as follows:

Code:
Language                en_us 
LanguageLocale          en_us
LanguageDictionary      en_us\en_us.dict att_darpabet_english
LanguageTextAnalysis    en_us\fe_en_us.dll
UserDictionary          en_us\mydict.dict


The TTS engine will use the transcriptions in mydict.dict.
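
If you would rather script the tts.cfg change than edit the file by hand, something along these lines might do. It is a Python sketch under two assumptions taken only from the snippet above: a section starts at a `Language` line, and the new line belongs right after that section's `LanguageTextAnalysis` line. `add_user_dictionary` is a made-up helper, and it works on text in memory, so reading and writing the real file (and keeping a backup) is left to you.

```python
def add_user_dictionary(cfg_text, language, dict_path):
    """Add a UserDictionary line to the 'Language <language>' section.

    Assumes a section begins with a 'Language' line and that the new line
    goes directly after that section's 'LanguageTextAnalysis' line.
    """
    out = []
    in_section = False
    inserted = False
    for line in cfg_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Language":
            in_section = len(fields) > 1 and fields[1] == language
        out.append(line)
        if (in_section and not inserted and fields
                and fields[0] == "LanguageTextAnalysis"):
            out.append("UserDictionary          " + dict_path)
            inserted = True
    if not inserted:
        raise ValueError("no section found for language " + language)
    return "\n".join(out) + "\n"


sample = (
    "Language                en_us\n"
    "LanguageLocale          en_us\n"
    "LanguageDictionary      en_us\\en_us.dict att_darpabet_english\n"
    "LanguageTextAnalysis    en_us\\fe_en_us.dll\n"
)
updated = add_user_dictionary(sample, "en_us", "en_us\\mydict.dict")
print(updated)
```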

Reply #82
Thanks for the info mazy. Great work!

Reply #83
When closing 0.7RC10 after using 'text to speech', I get this:

All is OK if 'text to speech' has not been used even though it is loaded in the component library.

Reply #84
RC10, WinME, the whatever 5.1 thing installed, using the dll for .7

This ERROR

ERROR (CORE) : Failed to load DLL: foo_tts.dll, reason: Unable to load dll.

Reply #85
Quote
When closing 0.7RC10 after using 'text to speech', I get this:

All is OK if 'text to speech' has not been used even though it is loaded in the component library.

Are you sure you don't use foo_history? I just noticed that when using it, I got the same error all the time, but now, after removing the plugin the errors are gone.

Reply #86
I don't own WinME so I cannot reproduce your error, ditto_n. In this case, I hope anza's theory is correct.

Reply #87
Wow, I just experienced the same error myself when closing down foobar! This was just after updating to RC10. However, when I removed all plugins, the error was still there, so I think it's related to the foobar database functionality and not to any 3rd party plugins.
The metadb_handle is what foobar provides in order for a plugin to interface with the database (from a playlist). I found that simply by pressing "next track" a number of times within the playlist, foobar will leak the corresponding number of objects (without any plugins) when you close down.

[UPDATE] I still had foo_remote in my components directory and this was what was giving the error. When this was removed, the error was gone. I also found the same problem with foo_tts, so it may be that some other plugins will give this error now (at least those plugins that used an older version of the SDK). I will look into my source code and see if I can remove the error. Will keep you posted.

Reply #88
Quote
I removed all plugins

Looks like you didn't. Apparently you are the only person having this kind of problem (leak after pressing next).
Microsoft Windows: We can't script here, this is bat country.

Reply #89
You were right. foo_remote was giving me the error (refer to updated post above).

Reply #90
Quote
I still had foo_remote in my components directory and this was what was giving the error. When this was removed, the error was gone. I also found the same problem with foo_tts, so it may be that some other plugins will give this error now (at least those plugins that used an older version of the SDK). I will look into my source code and see if I can remove the error. Will keep you posted.

It's a matter of having resource leaks in the code, not of compiling with latest SDK.
For your information, that message has been there for really long time, but was dumped to debugger console instead; I thought it would work better to replace it with a popup so component devs actually notice the problem.

Reply #91
True. It's just that I saw that meta_db.cpp in the latest SDK has a date stamp of Aug 29th, which made me think that a recompile with the latest SDK might alleviate the problem. Is it maybe the case that the exception handling in the SDK has improved and the leak was always there? (No such problems were evident in pre-RC10 versions of foobar.)

Reply #92
Erm, read my post above again.
If you run any pre-RC9 (probably also most of 0.6x) under a debugger and leak metadb_handles, you will get the same message logged to the debugger console. Making a MessageBox() out of it is the only thing I changed recently about it (and it's done in the exe, not by the SDK).
I can't make my SDK prevent your code from leaking things (unless I implement component signing, heh heh), tough luck.

Reply #93
Re-read & duly noted. I'll look into it and keep you posted.

Reply #94
Ok, the problem is fixed. You were absolutely right: I was not releasing a metadb_handle that automatically add_refs when I receive it.

You can download the update from the link in the first post (note that this is only for 0.7, as I'm no longer bothering with 0.6x).
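
For anyone wondering what this kind of leak looks like, here is a toy model in Python. It is not the foobar2000 SDK: the `Handle` class, the `deliver` function and both callbacks are invented for illustration. The host takes a reference on the receiver's behalf before handing the handle over, and the receiver is expected to release it; the count left over at shutdown is what the leak message reports.

```python
class Handle:
    """Toy reference-counted handle; `live` counts handles still referenced."""

    live = 0

    def __init__(self, name):
        self.name = name
        self.refs = 0

    def add_ref(self):
        if self.refs == 0:
            Handle.live += 1
        self.refs += 1
        return self

    def release(self):
        if self.refs <= 0:
            raise RuntimeError("release() without a matching add_ref()")
        self.refs -= 1
        if self.refs == 0:
            Handle.live -= 1


def deliver(callback, name):
    """Hand a handle to a callback, add_ref'd on the receiver's behalf."""
    callback(Handle(name).add_ref())


def leaky_callback(handle):
    return handle.name  # bug: the reference received is never released


def fixed_callback(handle):
    try:
        return handle.name
    finally:
        handle.release()  # give back the reference we were handed


def leaked_after(callback, presses):
    """Simulate pressing 'next track' `presses` times; return leaked count."""
    Handle.live = 0
    for i in range(presses):
        deliver(callback, "track %d" % i)
    return Handle.live


print(leaked_after(leaky_callback, 3))
print(leaked_after(fixed_callback, 3))
```

In the leaky version every "next track" leaves one handle behind, which matches the one-leak-per-press behaviour reported earlier in the thread.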

Reply #95
Quote
Ok, the problem is fixed. You were absolutely right: I was not releasing a metadb_handle that automatically add_refs when I receive it.

You can download the update from the link in the first post (note that this is only for 0.7, as I'm no longer bothering with 0.6x).

Are you sure this is the updated file? I'm still getting the same metadb message box when 'text to speech' is used (no, it's not foo_history, foo_remote or foo_dbsearch causing the problem). When I extract foo_tts0.7, which I downloaded from the link in your first post, I get foo_tts.dll which was created 18 August 2003.

Reply #96
I just checked via the same link and it has a modified date of Sept 1 and it is 36 KB in size. You may have been too quick for me, rectangle, when I was updating my site to the new update. Anyway, give the download another try (and clear your internet cache just in case).
Could somebody else please verify this for me? Thanks.

Reply #97
Quote
You may have been too quick for me rectangle when I was updating my site to the new update.

Thanks paulski. All is OK now. I've got the new file and it works fine. I was too quick for you. I really must get a life...

Reply #98
Just wanted to say that I finally got a copy of AT&T Natural Voices, and love the TTS foobar component.  I hope it will be maintained for new foobar versions.

Also, for those of you wondering where to get the naturalvoices software, you can buy it for $35 here: http://www.nextup.com/nvsamples.html

Reply #99
Recompiled for foobar2000 v0.8.2. Bumped component version to 0.2. Fixed metadb_handle leak. Fixed possible problem with non-ASCII characters in read text. "Auto announce at end of song" no longer works, it already was disabled in the source code I found. Renamed menu command to "Components/Announce playing track" (it didn't read out a playlist item anyway).

Note: I do not intend to take over development of this component, so posting feature requests is futile, unless someone else does it.

Links:
plugin
updated source

Edit: I missed one conversion from internal UTF-8 representation to UTF-16, fixed now.
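
To illustrate why a missed UTF-8 to UTF-16 conversion matters for non-ASCII text, here is a small Python sketch. It is not the component's code (which is C++); the track title and the helper name `utf8_to_utf16le` are made up for the example.

```python
def utf8_to_utf16le(data):
    """Re-encode UTF-8 bytes as UTF-16 (little-endian, no byte order mark)."""
    return data.decode("utf-8").encode("utf-16-le")


# A made-up title containing one non-ASCII character.
title_utf8 = "Mot\u00f6rhead".encode("utf-8")
wide = utf8_to_utf16le(title_utf8)

# Sizes differ: the umlaut takes 2 bytes in UTF-8, every character here
# takes 2 bytes in UTF-16.
print(len(title_utf8), len(wide))

# What a skipped conversion looks like if the UTF-8 bytes are read as a
# single-byte encoding instead.
print(title_utf8.decode("latin-1"))
```

Plain ASCII text survives a skipped conversion of this kind unchanged, which is why such a bug tends to show up only with accented or non-Latin titles.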