Adding voices to children’s stories.

There are several stages to this.  

The first is to generate an HTML file that can display the story text and the pictures. Here I am using LibreOffice Writer. An add-on lets it export the material as XHTML, in a form that is easier to work with than Word’s ‘save as HTML filtered’. I am also using LibreOffice for these explanatory pages.

Because XHTML follows the stricter rules of XML, every image has an ‘alt’ attribute to describe it. Although the attribute is meant to describe images that cannot be loaded, it can be repurposed to hold the text to be spoken, plus the choice of voice to speak it. The two parts are separated by a ‘/’.

The first attempt used the default voice, but this was not sympathetic to the story. So to find alternative voices to use, an app was constructed with AI help.

chooseSpeaker.htm

View the page source to see its simple structure. The speaker could be chosen by adding a number after the text, with ‘/’ to separate the two.

But when putting the story on the web was considered, it was realised that the list could be in a different order for every machine that used it!

Try hearSpeaker.htm to hear your own list.
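The listing idea can be sketched roughly as follows. This is not the code in hearSpeaker.htm, only a minimal illustration; the function name describeVoices is my own invention, and it assumes each voice has the `name` and `lang` properties that browser voices carry. The numbers it prints are the positions that turned out to differ from machine to machine.

```javascript
// Turn a list of voice objects into numbered, readable lines.
// Each voice is assumed to have `name` and `lang`, as browser voices do.
function describeVoices(voices) {
  return voices.map((voice, index) => `${index}: ${voice.name} (${voice.lang})`);
}

// Browser-only part: skipped where speechSynthesis does not exist.
if (typeof speechSynthesis !== "undefined") {
  const show = () => {
    describeVoices(speechSynthesis.getVoices()).forEach((line) => console.log(line));
  };
  show(); // may print nothing if the voice list has not loaded yet
  // Some browsers fill the list later and signal it with this event.
  speechSynthesis.onvoiceschanged = show;
}
```

Running this in the browser console on two different machines is a quick way to see that the same number can point at different voices.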

So that meant that speakers must be selected by name.  More searching produced cumbersome code that required knowledge of the whole name, including its language.  But by using ‘includes’ instead of equality, an easier solution was found.

Try findSpeaker.htm with George, Susan and Catherine. (Case sensitive)
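A minimal sketch of the ‘includes’ idea is shown here. It is not the code in findSpeaker.htm; the function name findVoice and the sample list are made up for illustration, and real voice names will vary by machine.

```javascript
// Return the first voice whose name contains `wanted`, or null if none does.
// The match is case sensitive, because String.prototype.includes is.
function findVoice(voices, wanted) {
  return voices.find((voice) => voice.name.includes(wanted)) || null;
}

// Illustrative sample data only; a real list comes from speechSynthesis.getVoices().
const sampleVoices = [
  { name: "Example Susan - English (United Kingdom)", lang: "en-GB" },
  { name: "Example George - English (United Kingdom)", lang: "en-GB" },
];

const george = findVoice(sampleVoices, "George"); // finds the second entry
const missing = findVoice(sampleVoices, "george"); // null: wrong case
```

Because only part of the name has to match, the story does not need to know the full name or the language that each machine attaches to it.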

When it came to using the code for storytelling, a way had to be found to attach it to each image reference.  An AI-generated solution added a listener to each image, selected by class, when the page was completely loaded.  This was saved as a .js file for inclusion, together with the code to select and apply the voice.
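The listener step might look something like the sketch below. It is not the actual .js file; the function name attachSpeakers, the class name "spoken" and the choice of a click event are all assumptions made for the example.

```javascript
// Add a listener to every image with the given class.
// `doc` is passed in so the function can also be tried outside a browser.
// `speak` is called with the image's alt text when the image is clicked.
function attachSpeakers(doc, className, speak) {
  const images = doc.querySelectorAll("img." + className);
  images.forEach((img) => {
    img.addEventListener("click", () => speak(img.alt));
  });
  return images.length; // how many images were wired up
}

// Browser-only part: wait until the page is completely loaded.
if (typeof window !== "undefined" && typeof speechSynthesis !== "undefined") {
  window.addEventListener("load", () => {
    attachSpeakers(document, "spoken", (text) => {
      speechSynthesis.speak(new SpeechSynthesisUtterance(text));
    });
  });
}
```

Here the whole alt text is spoken in the default voice; choosing the voice would be done inside the `speak` callback.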

The next wish is to add expression to the voice. References have started to appear to SSML; see the Google Cloud Text-to-Speech API page ‘Speech Synthesis Markup Language (SSML)’. Does this work with common browser speech engines?

It seems not. Instead, SSML appears to be handled by a separate engine that is part of a paid service.

So perhaps punctuation can add some emotion.  Stops and commas can introduce delays, and perhaps a question mark can cause a pitch change.

With quote.htm you can test the effect of adding a question mark.  With George I cannot hear any change, but Zira gives a slight upturn in tone.

So the only methods that seem available at present are to change the speed and pitch of the whole utterance. The text in the alt attribute can be split as text/speaker/pitch/speed.

Here is a deep utterance from George/George/0.5/0.5
Here is a normal utterance from George/George/1/1
And here is George in a rush/George/1.5/1.5
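Splitting the text/speaker/pitch/speed format could be sketched as below. The names parseAlt and speakAlt are hypothetical, and falling back to 1 for a missing pitch or speed is my assumption, not something the story pages necessarily do. The browser calls its speed setting `rate`, so that name is used in the code.

```javascript
// Split "text/speaker/pitch/speed" into its four parts.
// Missing or unreadable numbers fall back to 1 (an assumption of this sketch).
function parseAlt(alt) {
  const [text = "", speaker = "", pitch = "1", rate = "1"] = alt.split("/");
  const toNumber = (value, fallback) => {
    const n = parseFloat(value);
    return Number.isFinite(n) ? n : fallback;
  };
  return {
    text: text.trim(),
    speaker: speaker.trim(),
    pitch: toNumber(pitch, 1),
    rate: toNumber(rate, 1),
  };
}

// Browser-only: speak one alt string. Defined here, but only call it in a browser.
function speakAlt(alt) {
  const parts = parseAlt(alt);
  const utterance = new SpeechSynthesisUtterance(parts.text);
  const voice = speechSynthesis.getVoices().find((v) => v.name.includes(parts.speaker));
  if (parts.speaker && voice) utterance.voice = voice; // otherwise keep the default voice
  utterance.pitch = parts.pitch; // the browser accepts 0 to 2
  utterance.rate = parts.rate; // the browser accepts 0.1 to 10
  speechSynthesis.speak(utterance);
}
```

One limitation of this format is that the spoken text itself cannot contain a ‘/’, since everything after the first one is treated as the speaker.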

Catherine:   With a question mark?   Without a question mark/Catherine/1/1
Susan:   With a question mark?   Without a question mark/Susan/1/1
James:   With a question mark?   Without a question mark/James/1/1

You can see the final result on Showscript.com