Speech synthesis and reading e-books aloud
Automatic speech synthesis is the process of generating a speech signal: a technology that makes it possible to read a text (a document, a letter, an SMS) aloud in a voice close to natural. For synthesized speech to sound natural, a whole range of problems must be solved. Some concern the voice itself: timbre, smoothness of sound, and intonation. Others concern the text: correct placement of stresses and the expansion of abbreviations, acronyms, numbers, and special characters.
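The text-side problems mentioned above (abbreviations and numbers) can be illustrated with a minimal sketch. The abbreviation table and the digit-by-digit reading are assumptions made for illustration only; a real engine uses far larger dictionaries and reads numbers as whole quantities.

```python
import re

# Hypothetical abbreviation table; a real engine would use a much larger one.
ABBREVIATIONS = {"SMS": "S M S", "km": "kilometers", "etc.": "et cetera"}
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(match):
    """Read a run of digits one digit at a time (a deliberate simplification)."""
    return " ".join(DIGIT_NAMES[int(digit)] for digit in match.group())


def normalize(text):
    """Expand known abbreviations, then replace digit runs with words."""
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return re.sub(r"[0-9]+", spell_digits, " ".join(words))


print(normalize("Send an SMS after 25 km"))
```

The normalized string is what would then be passed on to the stress placement and sound generation stages.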
Synthesis technology can be applied either in a narrow subject area or in a wide, even unlimited, one. For a narrow area, the sound quality can be brought very close to natural speech by assembling long pre-recorded speech fragments related to that area. An example of such synthesis (called macrosynthesis) is the train announcement system used at stations in large Russian cities. It is much more difficult to build a speech synthesizer for unlimited text in any subject area. In this case, the user can ask the synthesis system to pronounce any phrase or sentence.
Speech Synthesis Methods
Today there are three main approaches to synthesis: the diphone approach (a diphone is the stretch of sound from the middle of one phoneme to the middle of the adjacent phoneme), the allophone approach (an allophone is the realization of a phoneme in a given left and right context), and Unit Selection technology (selection of sound elements from a speech database). Each of them, taken individually, has its drawbacks:
Diphone approach - produces an intelligible speech signal, but with an unnatural timbre. The timbre of the donor speaker cannot be recognized in the synthesized speech.
Allophone approach - the voice is slightly more natural than with the diphone approach thanks to a larger set of sound elements. However, as in diphone synthesis, the voice turns out rather robotic, and it is difficult to recognize the donor speaker's voice in it.
Unit Selection - the speech timbre is highly natural, and the synthesized voice retains the timbre of the donor speaker. However, because the size of the voice database is limited, some texts (words and their combinations) are pronounced with noticeable distortions, up to the complete loss of individual sounds.
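To make the difference between the first two kinds of unit concrete, here is a small sketch that lists the units a synthesizer would need for a given phoneme sequence. The phoneme symbols and the "_" silence marker are illustrative assumptions, not the inventory of any particular engine.

```python
SILENCE = "_"  # hypothetical marker for the pause before and after a word


def to_diphones(phonemes):
    """Pair each phoneme with its right neighbour (middle-to-middle units)."""
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def to_allophones(phonemes):
    """Describe each phoneme together with its left and right context."""
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


print(to_diphones(["k", "a", "t"]))    # ['_-k', 'k-a', 'a-t', 't-_']
print(to_allophones(["k", "a", "t"]))  # [('_', 'k', 'a'), ('k', 'a', 't'), ('a', 't', '_')]
```

A context-dependent unit is identified by three symbols rather than two, which is one way to see why the allophone inventory is the larger of the two.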
Additional information: A text-to-speech (TTS) engine, or speech synthesis engine, is a program similar to a driver that converts text into a sound wave. Speech synthesis engines have no built-in controls, and a TTS engine alone is not enough for your computer to speak. To use a TTS engine you need a synthesis program (TextAloud, Cool Reader, Balabolka, etc.) that serves as the interface: it lets you work with the engine, change its settings, adjust the sound and timbre of the speech, and manage other features.
Acapela, the developer of the popular Russian-language voice module Nikolai, has released a new Russian female voice engine called Alyona. It runs on SAPI 5 at a sampling frequency of 22 kHz. Alyona is far ahead of Nikolai in the quality of synthesized speech, and its voice timbre and intonation are more pleasant than those of the Katerina engine from ScanSoft RealSpeak.
The engine comes with Lexicon Manager, a dictionary editor that lets you change the pronunciation of words both by spelling and phonetically.
For example, using KooBAudio, mp3book2005 and this voice engine, a 4-hour novel can be voiced and converted to MP3 in 10 minutes.
Acapela Alyona works well with programs such as KooBAudio 0.7.0.7, mp3book2005, Balabolka, Cool Reader ...
A good voice engine! Overall, it really does read more clearly than all the other Russian-language engines, including Nikolai, but the latter has a more pleasant voice thanks to its low timbre and makes fewer mistakes in stresses.
Download Balabolka 22.214.171.1245 (7 Mb) Virus Free by KAV
Free and the most successful option, with all the necessary functions and the ability to record to MP3.
ToM Reader 2.73
Download ToM Reader 2.73 (1 Mb) Virus Free by KAV
Free, with a familiar book-like view, visual bookmarks, and the ability to record to MP3.
ToM Reader is a Russian-language reader. Its main advantage is that it does not interfere with the pronunciation of the voice engine and reads sentence by sentence rather than paragraph by paragraph, as many programs do, so it is easy to follow the reading. It looks like a bound book, which is also convenient. Supported formats: txt, doc, rtf, htm.
Download MP3book2005 (7 Mb) Virus Free by KAV
Has all the necessary functions for editing the dictionary, writing to MP3, and reading.
MP3book2005 is a program for editing the dictionary, reading, and writing to MP3. It edits the dictionary perfectly, but it lacks a book-style view. Supported formats: txt, rtf, htm, fb2.
2. Download Infovox Desktop 2.220 Engine SP3 (Acapela_Infovox_Desktop_2.220_EngineSP3.rar, 24.08 MB) - a control and activation program with its own simple reader and, most importantly, Lexicon Manager, the program for editing Alyona's pronunciation dictionary.
4. Download a reader, a program for reading books (links above).
5. Download AlyonaSlovari-Alyona22k (AlyonaSlovari.rar, 1.2 MB) - dictionaries for Alyona containing 24345 words.
7. Download MSagent.exe and ms_speech_api.exe (sintez_bib.rar, 1.09 MB archive). MSagent.exe (400 KB) is an agent for working with speech recognition and synthesis engines; ms_speech_api.exe (830 KB) contains the libraries needed by speech recognition and speech synthesis programs (not needed on Windows 7).
Supported operating systems: Windows XP, Windows Vista, Windows 7, Windows 8.
1) Install the main control program, Infovox Desktop 2.220 Engine.
2) On top of it, install the voice engine Alyona Russian 2.220.
3) Launch the License Manager and copy the license code with the "Copy to Clipboard" button.
4) Run key.exe, paste the code there and click "Make Key" to generate a license file.
5) Import the resulting file into the License Manager by clicking "Import License File"
6) Add dictionaries using the Lexicon Manager (Lexicon-Voice Associations - Add Lexicon ... or File - Import Lexicon)
Additional libraries are required for operation (install in this order!):
1. MSagent.exe and ms_speech_api.exe - libraries necessary for the operation of speech recognition and synthesis programs. (not needed for Win 7)
2. Acapela ELAN Tempo Multimedia V126.96.36.199 Nicolai - Russian language synthesis engine for the agent.
3. ToM Reader 2.73 or MP3book2005 - reading programs; choose whichever you prefer.
4. Download the stress dictionary for ELAN Tempo Multimedia Nikolai.
Copy the main dictionary exc_rus.txt and the abbreviation dictionary abb_rus.txt into the C:\Program Files\Elan folder, and also copy exc_rus.txt into Program Files\MP3book2005\DIC, overwriting the existing files.
MP3book2005 edits only exc_rus.txt; abb_rus.txt has to be edited in Notepad or Word. These are the engine's native dictionaries, used by ELAN Tempo Multimedia itself.
There is also a dictionary built into the ToM Reader reading program (Digalo Russian Nicolai.dic). Do not use such dictionaries under any circumstances; they only worsen the pronunciation.
In ToM Reader, the engine settings are approximately the following:
To edit the dictionary in MP3book2005, click the Dictionary button and, if necessary, load the dictionary exc_rus.txt. To add a new word, press the Pronunciation button at the top and type the word in the bottom line (if the word was selected in the text, it will already be there). Click the Check button, place the cursor at the correct stress position, press <, and if the pronunciation sounds right, click Add. Then press the Save button at the top to overwrite the dictionary. Shorthand such as asterisks is not used in the native dictionaries; each word is written out separately. This is less convenient, but the pronunciation is better.
You can keep ToM Reader and MP3book2005 open at the same time: read in ToM Reader and edit in MP3book2005. After the dictionary changes, ToM Reader needs to be restarted. You can also use MP3book2005 alone. Keep in mind that ToM Reader is free, while an unregistered MP3book2005 has minor restrictions.
Acapela ELAN Tempo Multimedia sometimes reads words written in CAPITAL letters by spelling them out.
Digalo Nicolai is an old version.
You can take Digalo TTS 2000 (DigaloCoreRus.exe-7.44 MB, SAPI 4) and ToM Reader Russian. Digalo TTS 2000 is a voice engine that supports several languages, including Russian. It is paid, but you can find crack. ToM Reader Russian is a program that uses Digalo TTS 2000 to read books.
Digalo TTS 2000 includes the Russian voice Nicolai. It is better than the voices from other companies, but not perfect, so it needs a dictionary. There are two options: use the dictionary built into ToM Reader, or use the dictionary in Digalo itself. The first is simpler, because it uses asterisks (*) that stand for part of a word, but the result is of lower quality; the second is more complex, but of higher quality.
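The trade-off of asterisk rules can be shown with a short sketch. The rule list and the English example words are hypothetical, and the matching uses Python's standard fnmatch module rather than ToM Reader's actual dictionary format, which is not described here.

```python
from fnmatch import fnmatchcase

# Hypothetical wildcard rules: (pattern, note about the pronunciation it sets).
RULES = [
    ("read*", "pronounce the stem like 'reed'"),
    ("record", "stress the first syllable"),
]


def find_rule(word, rules):
    """Return the first (pattern, note) whose pattern matches the word, else None."""
    for pattern, note in rules:
        if fnmatchcase(word.lower(), pattern):
            return pattern, note
    return None


for word in ["reader", "reading", "ready", "book"]:
    print(word, "->", find_rule(word, RULES))
```

A single rule covers "reader" and "reading", which is what makes such dictionaries quick to write, but it also catches "ready", where the same pronunciation would be wrong. Writing out each word separately avoids this kind of false match.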
In the first case, take the dictionary Digalo Russian Nicolai.dic and copy it into the dict folder in ToM Reader (the folder appears once you have opened ToM Reader), then enable the "use the dictionary" option in the settings. In the second case, the process is slightly longer. Its advantage is that you get better pronunciation, and other programs that cannot connect a dictionary of their own, such as PROMT, will also pronounce words correctly, because Digalo will use its own dictionary.
So, take DigaloEditor 1.0 and unpack it into c:\Program Files\Digalo\Digalo 2000 Russian\russian\data. The following files appear there: DigaloEditor.exe, a program for editing the dictionary, and the dictionaries abb_rus.txt (for abbreviations) and exc_rus.txt (for other words). DigaloEditor.exe edits only exc_rus.txt; abb_rus.txt has to be edited in Notepad or Word.
Now about the features of editing in DigaloEditor.
To add or find a word, click the Add button and start typing. The search runs automatically, and if a matching entry exists, it is highlighted in red. Most importantly: as soon as you type a word, it is already written into the dictionary, and if you exit while saving the result, it will remain there, regardless of whether the dictionary already contained that word. So if you typed the word correctly, click the Save button. If it is wrong, or the word already exists, delete the entered line with the Delete button. Repeat this save-or-delete step after each entry. The stress is marked with "<" (without quotation marks), and the two sides of an entry must contain an equal number of spaces. For example: "fish trout fish foret <l". The line "fish trout fish-foret <l" will cause an error in Digalo. Words whose stress varies with meaning must be written as part of a phrase. Padlock = padlock <k.
The result is heard after restarting the reading program.
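The editing rules above can be checked before an entry is saved. The sketch below assumes entries of the form "word=pronunciation", an assumption based only on the "Padlock = padlock <k" example; the exact file format of exc_rus.txt is not documented here, and the English sample entries are hypothetical.

```python
def check_entry(line):
    """Return a list of problems found in one dictionary entry (empty if none)."""
    if "=" not in line:
        return ["missing '=' between the word and its pronunciation"]
    # Assumed layout: the written form on the left, the pronunciation on the right.
    written, spoken = (part.strip() for part in line.split("=", 1))
    problems = []
    if "<" not in spoken:
        problems.append("no '<' stress mark in the pronunciation")
    if written.count(" ") != spoken.count(" "):
        problems.append("different number of spaces on the two sides")
    return problems


print(check_entry("lake trout=lake tro<ut"))  # []
print(check_entry("lake trout=lake-tro<ut"))  # ['different number of spaces on the two sides']
```

Running such a check on each new line would catch the hyphen-for-space mistake described above before Digalo reports an error.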
Russian speech synthesis program Govorilka 2.0.6 and the Digalo and SpeechCube voice engines
Govorilka is a small program for reading text aloud. It can read any text you give it, in any language, using any installed voice, and it can record the text to an MP3 file.
Key features of the Govorilka program.
Reading text aloud.
Writing the text being read to an audio file (*.WAV, *.MP3) at increased speed and split into parts of a given size.
Adjusting reading speed and pitch.
Automatic scrolling of the text on the screen so that the fragment being read is always visible (speech tracking). The text being read can also be highlighted in color.
User-extensible pronunciation dictionaries, which make it easy to adjust the pronunciation of individual words and phrases.
Opening large files in DOS and Windows encodings.
Opening texts from Microsoft® Word and HTML files.
Reading texts up to 2 gigabytes in size.
Remembering the text and cursor position when the program exits.
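Splitting a recording into parts, as mentioned in the feature list above, can be sketched as follows. This is not Govorilka's own algorithm: the limit is measured here in characters of text and parts break only at sentence ends, both of which are assumptions made for illustration.

```python
import re


def split_into_parts(text, max_chars):
    """Group whole sentences into parts of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    parts, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            parts.append(current)  # the current part is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts


print(split_into_parts("One. Two three. Four five six.", 15))
# ['One. Two three.', 'Four five six.']
```

Each returned part could then be synthesized into its own audio file, so that no sentence is cut in the middle.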
Please note that the current version of the program is a test (beta) - there may be minor errors.
Why Govorilka is useful: it protects your eyesight. With it, you can listen to the text of e-books instead of reading from the monitor screen. You can find out how words and phrases sound in a foreign language. You can quickly convert books to MP3 files and listen to them on your MP3 player. With Govorilka you can evaluate the capabilities of computer speech synthesis and teach your computer to talk.
Description: Govorilka is for anyone who prefers listening to texts over reading them from the monitor screen, who takes care of their eyesight and wants to follow e-books while sitting away from the monitor, or who wants to know how words and phrases sound in a foreign language. It is also for anyone who wants to teach their computer to speak or is simply curious about how it all works.
Additional features: changing the reading speed and pitch; opening large files in DOS and Windows encodings, as well as reading text from MS Word files; recording speech to an audio file (WAV or MP3); automatic scrolling of the text on the screen so that the fragment being read is always visible; reading text from the clipboard; the ability to change the pronunciation (dictionary).
Additional information: Govorilka's interface is multilingual. On Windows 2000/XP the program works immediately, but Windows 95/98/NT users may have to download some missing files: a text-to-speech engine and SAPI (for details, see the home page).
Govorilka 2.0.6 and the Digalo and SpeechCube voice engines Virus Free by KAV
Govorilka 2.2.2 (the official final version of December 9, 2009) Virus Free by KAV
Speech synthesis systems
A reader is a program designed for comfortable reading of texts and e-books from a computer screen. In addition, many readers can read texts aloud, using dedicated speech synthesis programs for this.
A good reader has many features that make reading from the screen less tiring. The layout in the form of a book, smooth scrolling of text, text smoothing are just some of the tools used in readers.
Govorilka is a small free program for reading texts aloud using voice synthesis engines.
In order for programs included in the “Readers” category to be able to read texts with a “human voice”, the SAPI library (Speech Application Programming Interface, or Speech API) and voice engines must be installed in the system.
Two versions of the Speech API are currently in common use: SAPI4 and SAPI5. The two libraries are incompatible with each other, but they do not interfere with one another and can work on the same computer. For programs that support both libraries, it is therefore recommended to install both (this gives you access to more voice engines).
Windows XP, Vista, and 7 usually come with the SAPI5 libraries already installed, so you only need to install SAPI4, and even that is optional. However, there may be cases when SAPI5 has to be installed as well. Each library can be downloaded, together with its installation notes, from its own page: Download SAPI.
Speech synthesis also requires voice engines for the desired language to be installed on the computer. As noted above, the SAPI4 and SAPI5 libraries are incompatible, so each voice engine works with only one of them. If both Speech API libraries are installed on your computer, you can install all the voice engines: Download voice engines for SAPI.
Screen Access Programs
The VIRGO 4 screen access program is the result of BAUM's many years of work on the VIRGO program, whose main goal is to let blind and visually impaired users work comfortably with Windows. VIRGO 4 lets the user choose which information is shown on the Braille display and which is spoken. Visually impaired users can also use the Galileo screen magnification system integrated into VIRGO 4. The integrated approach of VIRGO 4, using Braille and speech together, flexibly combines the strengths of both output methods for the user's convenience.
MyStick is the first mobile screen access program that works without installation on all modern Windows computers. When inserted into a free USB port, MyStick starts automatically, and the user can immediately work with the computer. After MyStick is removed, no files and no configuration changes remain on the computer. MyStick is a U3 flash drive. With MyStick, blind and partially sighted PC users are not tied to a specific, specially equipped computer and can access any computer running Windows. MyStick comes in two variants: with speech output and screen magnification, or with speech output only. MyStick versions are available for Russian, English, German, French, Swedish, Norwegian and Danish.
The Cobra 9.1 screen access program simplifies working with Windows 7, Vista or Windows XP for blind and visually impaired computer users. COBRA combines all the standard features of a modern, user-friendly screen access program. It captures what the user needs and presents important information from the computer monitor using speech, Braille, or screen magnification.
Speech synthesis has a long history, overgrown with legends. Back in the 10th century, Gerbert of Aurillac was credited with mastering the art of making a teraphim, a talking severed head. Made of bronze, this head answered “yes” or “no” to the questions of anyone who addressed it. In the middle of the 13th century, the Dominican friar Albertus Magnus (Albert von Bollstädt) and the English philosopher and naturalist Roger Bacon also tried to create the first examples of “talking heads”.
At the end of the 18th century, the Danish scientist Christian Kratzenstein, a full member of the Russian Academy of Sciences, created a model of the human vocal tract that could pronounce five long vowel sounds (a, e, i, o, u). The model was a system of acoustic resonators of various shapes that produced vowels using vibrating reeds excited by an air stream. In 1778, the Austrian scientist Wolfgang von Kempelen supplemented the Kratzenstein model with models of the tongue and lips and presented an acoustic-mechanical speaking machine capable of reproducing certain sounds and their combinations. Hissing and whistling sounds were produced with a special hand-operated bellows. In 1837, the scientist Charles Wheatstone presented an improved version of the machine that could reproduce vowels and most consonants. And in 1846, Joseph Faber demonstrated his talking organ, Euphonia, which attempted to synthesize not only speech but also singing.
At the end of the 19th century, the famous scientist Alexander Bell created his own "talking" mechanical model, very similar in design to the Wheatstone machine. With the onset of the 20th century, the era of electric machines began, and scientists were able to use sound wave generators and build algorithmic models based on them.
In the 1930s, Bell Labs employee Homer Dudley, looking for ways to reduce the bandwidth required in telephony and thus increase its transmission capacity, developed the VOCODER (short for "voice coder"), a keyboard-controlled electronic speech analyzer and synthesizer. Dudley's idea was to analyze the voice signal, break it into parts, and resynthesize it in a form that places lower demands on line bandwidth. An enhanced version of Dudley's vocoder, the VODER, was unveiled at the 1939 New York World's Fair.
The first speech synthesizers sounded rather unnatural, and the phrases they produced were often barely intelligible. However, the quality of synthesized speech has steadily improved, and the speech generated by modern synthesis systems is sometimes indistinguishable from real human speech. Despite the success of electronic speech synthesizers, research into mechanical speech synthesizers continues, for example for use in humanoid robots.
The first computer-based speech synthesis systems began to appear in the late 1950s, and the first text-to-speech synthesizer was created in 1968.