Speech synthesis and reading e-books aloud
Automatic speech synthesis, the process of generating a speech signal, is a technology that makes it possible to read text (a document, letter or SMS) aloud in a voice close to a natural one. For synthesized speech to sound natural, a whole range of problems has to be solved: making the voice natural in timbre, smoothness and intonation, placing stress correctly, and expanding abbreviations, numbers and special characters.
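The expansion of abbreviations, numbers and special characters mentioned above can be illustrated with a minimal sketch. The lookup tables and the function name normalize are hypothetical and exist only for this example; a real synthesizer relies on much larger, language-specific rules and reads numbers as whole words rather than digit by digit.

```python
import re

# Hypothetical lookup tables; a real synthesizer uses far larger, language-specific ones.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}
SYMBOLS = {"%": " percent ", "&": " and "}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def normalize(text):
    """Expand abbreviations, symbols and digits into plain words."""
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    for symbol, word in SYMBOLS.items():
        text = text.replace(symbol, word)
    # Simplification: digits are read one by one ("42" becomes "four two").
    text = re.sub(r"\d", lambda m: f" {DIGITS[int(m.group())]} ", text)
    # Collapse the extra spaces introduced by the replacements.
    return " ".join(text.split())

print(normalize("Dr. Lee pays 5% at 42 Elm St."))
```

Only after a step like this does the text reach the stage where stress and intonation are assigned.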
Synthesis technology can be applied either in a narrow subject area or in a broad, unrestricted one. In a narrow domain the sound can be brought very close to natural speech by assembling long pre-recorded speech fragments that belong to that domain. An example of such synthesis (called macrosynthesis) is the train announcement system used at stations in large Russian cities. Building a speech synthesizer for unrestricted text in any subject area is much harder. In that case the user can ask the synthesis system to pronounce any phrase or sentence.
Speech synthesis methods
Today there are three main approaches to synthesis: the diphone approach (a diphone is the sound from the middle of one phoneme to the middle of the next), the allophone approach (an allophone is the realization of a phoneme in its left and right context) and Unit Selection technology (selection of sound units from a speech database). Each of them has its own drawbacks:
The diphone approach produces a speech signal that is intelligible but unnatural in timbre. The timbre of the donor speaker cannot be recognized in the synthesized speech.
The allophone approach gives a slightly more natural voice than the diphone approach thanks to a larger set of sound units. However, as with diphone synthesis, the voice is quite robotic, and the donor speaker's voice is hard to recognize in it.
Unit Selection gives a highly natural timbre, and the synthesized voice retains the timbre of the donor speaker. However, because the size of the voice database is limited, some texts (words and their combinations) are pronounced with noticeable distortion, up to the complete loss of individual sounds.
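To make the difference between diphone and allophone units concrete, here is a small sketch that cuts the same phoneme sequence both ways. The function names, the "_" silence symbol and the toy phoneme list are assumptions made for illustration; no real synthesizer's inventory or labelling scheme is implied.

```python
SILENCE = "_"  # assumed marker for the pause before and after an utterance

def to_diphones(phonemes):
    """Return one label per transition between neighbouring phonemes."""
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]

def to_allophones(phonemes):
    """Return each phoneme together with its left and right context."""
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]

word = ["k", "a", "t"]
print(to_diphones(word))
print(to_allophones(word))
```

A diphone set needs one recording per phoneme pair, while context-dependent allophones multiply the inventory by the possible left and right neighbours, which matches the larger set of sound units mentioned above.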
Additional information: A text-to-speech (TTS) engine, or speech synthesis engine, is a program similar to a driver that converts text into a sound wave. Speech synthesis engines have no built-in controls, so a TTS engine alone is not enough to make your computer speak. To use one, you need a synthesis program (TextAloud, Cool Reader, Balabolka, etc.) that provides the interface for working with the engine: changing its settings, sound and voice, and controlling other features.
Acapela, the developer of the popular Russian voice module Nikolay, has released a new Russian female voice engine named Alyona. It works through SAPI 5 at 22 kHz. In the quality of synthesized speech Alyona is far ahead of Nikolay, and users find its timbre and intonation more pleasant than those of the Katerina engine from ScanSoft RealSpeak.
The engine comes with Lexicon Manager, a dictionary editor that lets you change the pronunciation of words both by spelling and phonetically.
For example, using KooBAudio, mp3book2005 and this voice engine, a 4-hour novel can be voiced and converted to mp3 in 10 minutes.
Acapela Alyona works well with programs such as KooBAudio 0.7.0.7, mp3book2005, Balabolka and Cool Reader.
Good voice engine! On the whole, it really does read more clearly than all the other Russian voices, including Nikolai, but the latter has a more pleasant voice thanks to its low timbre and makes fewer stress mistakes.
Balabolka 220.127.116.115 (7 Mb)
Free and the most capable, with all the necessary functions, including recording to mp3.
ToM Reader 2.73 (1 Mb)
Free, looks like a familiar book, has visual bookmarks and can record to mp3.
ToM Reader is a Russian reading program. Its main advantage is that it does not interfere with the voice engine's pronunciation and reads sentence by sentence rather than paragraph by paragraph as many programs do, so the reading is easy to follow. It looks like a bound book, which is also convenient. Supported formats: txt, doc, rtf, htm.
MP3book2005 (7 Mb)
Has all the necessary functions for editing the dictionary, recording to MP3 and reading.
MP3book2005 is a program for dictionary editing, reading and recording to mp3. It edits the dictionary perfectly, but it would be nicer if it looked like a book. Supported formats: txt, rtf, htm, fb2.
2. Download Infovox Desktop 2.220 Engine SP3 (Acapela_Infovox_Desktop_2.220_EngineSP3.rar) (24.08 Mb): the management and activation program, with its own simple reader and, most importantly, Lexicon Manager, the program for editing Alyona's dictionaries.
4. Download a reader, a program for reading books (links above).
5. Download AlyonaSlovari-Alyona22k (AlyonaSlovari.rar) (1.2 Mb): dictionaries for Alyona with 24,345 words.
7. Download MSagent.exe and ms_speech_api.exe (sintez_bib.rar, a 1.09 MB archive). MSagent.exe (400 KB) is the agent for working with speech recognition and synthesis engines; ms_speech_api.exe (830 KB) contains the libraries needed by speech recognition and synthesis programs (not needed on Windows 7).
Supported OS: Windows XP, Windows Vista, Windows 7, 8.
1) Install the main control program, Infovox Desktop 2.220 Engine
2) On top of it, install the voice engine Alyona English 2.220
3) Launch the License Manager and copy the License Code with the "Copy to Clipboard" button
4) Launch key.exe, paste the code there and click "Make Key" to generate the license file.
5) Import the resulting file in the License Manager by clicking "Import License File"
6) Add dictionaries using Lexicon Manager (Lexicon-Voice Associations - Add Lexicon ... or File - Import Lexicon)
Additional components are required (install them in this order!):
1. MSagent.exe and ms_speech_api.exe: libraries needed by speech recognition and synthesis programs (not needed on Windows 7).
2. Acapela ELAN Tempo Multimedia V18.104.22.168 Nicolai: a Russian speech synthesis engine for the agent.
3. ToM Reader 2.73 or MP3book2005: reading software, whichever you prefer.
4. Download the stress dictionary for ELAN Tempo Multimedia Nikolai.
Copy the main dictionary exc_rus.txt and the abbreviation dictionary abb_rus.txt into the C:\Program Files\Elan folder, and also copy exc_rus.txt into Program Files\MP3book2005\DIC, replacing the existing file.
MP3book2005 edits only exc_rus.txt; abb_rus.txt has to be edited in Notepad or Word. These are the native dictionaries used by ELAN Tempo Multimedia.
There is also a dictionary built into the ToM Reader reading program (Digalo Russian Nicolai.dic). Never use such dictionaries; they only make the pronunciation worse.
In ToM Reader, the engine settings are as follows:
To edit the dictionary in MP3book2005, click the Dictionary button and, if required, load the exc_rus.txt dictionary. To add a new word, press the Pronunciation button at the top and type the word in the bottom line (if the word is selected in the text, it will already be there). Click Check, put the cursor at the stressed position, press (<), and if the pronunciation sounds right, click Add. Then press Save at the top to overwrite the dictionary. The native dictionaries do not use shorthand such as asterisks; every word is written out separately. This is less convenient, but the pronunciation is better.
You can keep ToM Reader and MP3book2005 open at the same time: read in ToM Reader and edit the dictionary in MP3book2005. After the dictionary is changed, ToM Reader must be restarted. You can also use MP3book2005 alone. Keep in mind that ToM Reader is free, while an unregistered MP3book2005 has minor limitations.
Acapela ELAN Tempo Multimedia sometimes spells out, letter by letter, words written in CAPITAL letters.
Digalo Nikolay - the old version.
You can take Digalo TTS 2000 (DigaloCoreRus.exe, 7.44 MB, SAPI 4) and ToM Reader Russian. Digalo TTS 2000 is a voice engine that supports several languages, including Russian. It is a paid product, though a crack can be found. ToM Reader Russian is a program that uses Digalo TTS 2000 to read books.
Digalo TTS 2000 includes the Russian voice Nicolai. It is better than voices from other companies but not perfect, so it needs a dictionary. There are two options: use the dictionary built into ToM Reader, or use the dictionary in Digalo itself. The first is simpler, because it uses asterisks (*) to stand for part of a word, but gives lower quality; the second is more complicated but gives better quality.
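The idea of an asterisk standing for part of a word can be sketched as follows. The actual format of the ToM Reader dictionary is not described here, so the rule syntax below (a pattern and a replacement, each of which may contain "*") and the function names are hypothetical, and an English word is used purely for illustration.

```python
import re

def make_rule(pattern, replacement):
    """Build a rule in which '*' matches any run of word characters
    and the matched part is re-inserted at '*' in the replacement."""
    parts = [re.escape(part) for part in pattern.split("*")]
    regex = re.compile(r"\b" + r"(\w*)".join(parts) + r"\b")
    pieces = replacement.split("*")

    def substitute(match):
        result = pieces[0]
        for captured, piece in zip(match.groups(), pieces[1:]):
            result += captured + piece
        return result

    return regex, substitute

def apply_rules(text, rules):
    """Apply every rule to the text in order."""
    for regex, substitute in rules:
        text = regex.sub(substitute, text)
    return text

rules = [make_rule("read*", "reed*")]  # hypothetical pronunciation fix
print(apply_rules("She reads while reading bread labels", rules))
```

One wildcard rule covers every form of the word at once, which is why this option is simpler. The same breadth also explains the lower quality: a rule can match forms whose pronunciation differs, whereas a dictionary that lists each word separately cannot.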
In the first case, take the Digalo Russian Nicolai.dic dictionary, copy it to the dict folder in ToM Reader (the folder appears once ToM Reader has been opened), and enable dictionary use in the settings. In the second case the process is somewhat longer, but you get better pronunciation, and other programs that cannot connect a dictionary themselves, such as PROMT, will also pronounce words correctly, since Digalo uses its own dictionary.
So, take DigaloEditor 1.0 and unpack it into C:\Program Files\Digalo\Digalo 2000 Russian\russian\data. The folder will then contain DigaloEditor.exe (the dictionary editing program) and the dictionaries abb_rus.txt (abbreviations) and exc_rus.txt (other words). DigaloEditor.exe edits only exc_rus.txt; abb_rus.txt has to be edited in Notepad or Word.
Now about the specifics of editing in DigaloEditor.
To add a word or search for one, click the Add button and start typing. The search runs automatically, and if a matching entry exists it is highlighted in red. Most important: as soon as you type a word it is already recorded in the dictionary, and if you exit and save, it stays there whether or not the dictionary already contained that word. So if you have typed a correct entry, press the Save button; if it is wrong or the word already exists, remove the typed line with the Delete button. Repeat this save-or-delete step after every entry. Stress is marked with the “<” sign (without quotes), and the number of spaces must be the same on both sides. For example: “trout fish fish fore <e”. The line “fish trout fish <e” will cause an error in Digalo. Words whose stress depends on their meaning must be written as part of a phrase: Suspension lock = suspended lock <k.
The result can be heard after the reader is restarted.
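Because a malformed line causes an error in Digalo, it can help to check entries before saving them. The sketch below assumes that an entry has the form "written text=pronunciation", as the last example above suggests, and that every pronunciation carries a "<" stress mark. Both points are assumptions, and check_entry is a hypothetical helper, not part of DigaloEditor.

```python
def check_entry(line):
    """Return a list of problems found in one dictionary entry.

    Assumed format: 'written text=pronunciation', with '<' marking
    the stress inside the pronunciation.
    """
    if line.count("=") != 1:
        return ["expected exactly one '=' separator"]
    written, spoken = (part.strip() for part in line.split("="))
    problems = []
    # The two sides must contain the same number of spaces.
    if written.count(" ") != spoken.count(" "):
        problems.append("different number of spaces on the two sides")
    # Assumption: every entry is expected to mark the stress.
    if "<" not in spoken:
        problems.append("no '<' stress mark in the pronunciation")
    return problems

print(check_entry("suspended lock=suspended lo<ck"))
print(check_entry("suspended lock=suspendedlo<ck"))
```

An empty list means the entry passed both checks. The script could be run over a copy of exc_rus.txt before the file is placed in the Digalo data folder.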
Russian speech synthesis program Govorilka 2.0.6 and the Digalo and SpeechCube voice engines
Govorilka is a small program for reading text aloud. It can read any text you give it, in any language and in any installed voice, and record the text to an MP3 file.
The main features of Govorilka:
Reading text aloud.
Recording the text being read to an audio file (*.WAV, *.MP3) at increased speed, split into parts of a given size.
Adjusting reading speed and voice pitch.
Automatic scrolling of the text on screen so that the fragment being read is always visible (speech tracking). The text being read can also be highlighted in color.
An extendable pronunciation dictionary that makes it easy to adjust the pronunciation of individual words and phrases.
Opening large files in DOS and Windows encodings.
Opening text from Microsoft® Word and HTML files.
Text size of up to 2 gigabytes.
Remembering the text and cursor position on exit.
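Govorilka splits its audio output into parts of a given size, and how it does so is not described here. A rough text-side analogue, shown purely as an illustration, is to group whole sentences into chunks before each chunk is sent to a voice engine and saved as its own file. The function name and the character limit are assumptions.

```python
import re

def split_into_parts(text, max_chars):
    """Group whole sentences into parts no longer than max_chars.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    parts, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # The current part is full: close it and start a new one.
            parts.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts

print(split_into_parts("One. Two. Three.", 9))
```

Splitting on sentence boundaries keeps each audio part from starting or ending in the middle of a phrase.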
Please note that the current version of the program is a test (beta) version, so there may be minor errors.
Why Govorilka is useful: it saves your eyesight. With it you can listen to e-books instead of reading them from the monitor. You can learn how words and phrases sound in a foreign language. You can quickly convert books to MP3 files and listen to them on your MP3 player. With Govorilka you can evaluate what computer speech synthesis can do and teach your computer to talk.
Description: Govorilka is for anyone who prefers listening to texts over reading them from the screen, who wants to spare their eyesight and follow e-books while sitting away from the monitor, or who wants to know how words and phrases sound in a foreign language. It is also for anyone who wants to teach their computer to speak or is simply curious how it all works.
Additional features: changing reading speed and voice pitch; opening large files in DOS and Windows encodings and reading text from MS Word files; recording speech to an audio file (wav or mp3); automatic scrolling so that the fragment being read is always visible; reading text from the clipboard; changing pronunciation through the dictionary.
Additional information: Govorilka's interface is multilingual. On Windows 2000/XP the program works immediately, but users of Windows 95/98/NT may have to download some missing files: a text-to-speech engine and SAPI (details are on the home page).
Govorilka 2.0.6 with the Digalo and SpeechCube voice engines
Govorilka 2.2.2 (official final version dated 12/09/2009)
Speech synthesis systems
A reader is a program designed for comfortable reading of texts and e-books from a computer screen. In addition, many readers can voice texts using special speech synthesis programs.
A good reader has many features that make reading from the screen less tiring. Book-style layout, smooth text scrolling and font smoothing are just some of the tools used in readers.
Govorilka is a small free program for reading texts aloud using voice synthesis engines.
For the programs in the Readers category to voice texts with a “human voice”, the SAPI library (Speech Application Programming Interface, or Speech API) and voice engines must be installed on the system.
Two versions of the Speech API are in circulation today: SAPI4 and SAPI5. The two libraries are incompatible with each other, but they do not interfere and can run on the same computer, so for programs that support both it is recommended to install both (this gives access to more voice engines).
Windows XP, Vista and 7 usually already have the SAPI5 libraries installed, so you only need to install SAPI4, and even that is optional. However, there may be cases where SAPI5 has to be installed. Download the libraries and read about the installation specifics of each on their pages: Download SAPI.
Speech synthesis also requires voice engines for the desired language to be installed on the computer. As stated above, SAPI4 and SAPI5 are incompatible, so each voice engine works with only one of them. If both Speech API libraries are installed on your computer, you can install all voice engines: Download voice engines for SAPI.
Screen Access Programs
The VIRGO 4 screen access program is the result of BAUM's many years of work on the VIRGO program, whose main goal is to let blind and visually impaired users work comfortably with Windows. VIRGO 4 allows the user to choose which information is shown on the Braille display and which is spoken. Visually impaired users can also use the GALILEO screen magnification system integrated into VIRGO 4. VIRGO 4's integrated approach, using braille and speech, flexibly combines the strengths of both ways of presenting information for the user's convenience.
MyStick is the first mobile screen access program that works without installation on all modern Windows computers. Once inserted into a free USB port, MyStick starts automatically and the user can begin working with the computer right away. After MyStick is removed, no files and no configuration changes are left on the computer. MyStick is a U3 format flash drive. With MyStick, blind and visually impaired PC users are not tied to a specific, specially equipped computer and can use any computer running Windows. There are two MyStick options: with speech output and screen magnification, and with speech output only. MyStick versions are available for Russian, English, German, French, Swedish, Norwegian and Danish.
The screen access program Cobra 9.1 makes it easier for blind and visually impaired users to work with Windows 7, Vista or Windows XP. COBRA combines all the standard features of a modern, user-friendly screen access program. It takes the user's needs into account and conveys important information from the screen using speech, braille or screen magnification.
Speech synthesis has a long history, overgrown with legends. As early as the tenth century, Gerbert of Aurillac was credited with mastering the art of making a teraphim, a talking head. Made of bronze, this head answered “yes” or “no” to the questions of anyone who consulted it. In the middle of the 13th century, the Dominican friar Albert von Bollstädt (Albertus Magnus) and the English philosopher and naturalist Roger Bacon also tried to create the first examples of “talking heads”.
At the end of the 18th century, the Danish scientist Christian Kratzenstein, a full member of the Russian Academy of Sciences, created a model of the human vocal tract capable of producing five long vowel sounds (a, e, i, o, u). The model was a system of acoustic resonators of various shapes that produced vowels with the help of vibrating reeds excited by an air flow. In 1778, the Austrian scientist Wolfgang von Kempelen supplemented the Kratzenstein model with models of the tongue and lips and introduced an acoustic-mechanical speaking machine capable of reproducing certain sounds and their combinations. Hissing and whistling sounds were produced by blowing air with special hand-operated bellows. In 1837, the scientist Charles Wheatstone presented an improved version of the machine, capable of reproducing vowels and most consonants. In 1846, Joseph Faber demonstrated his speaking organ Euphonia, which attempted to synthesize not only speech but also singing.
At the end of the 19th century, the famous scientist Alexander Graham Bell created his own “talking” mechanical model, very similar in design to the Wheatstone machine. With the 20th century came the era of electrical machines, and scientists were able to use sound wave generators and build algorithmic models on their basis.
In the 1930s, Bell Labs employee Homer Dudley, working on ways to reduce the bandwidth needed in telephony in order to increase its transmission capacity, developed the VOCODER (short for voice coder), a keyboard-controlled electronic speech analyzer and synthesizer. Dudley's idea was to analyze the voice signal, break it into components and re-synthesize it in a form that requires less bandwidth. An improved version of Dudley's vocoder, the VODER, was presented at the 1939 New York World's Fair.
The first speech synthesizers sounded rather unnatural, and the phrases they produced were often hard to make out. However, the quality of synthesized speech has steadily improved, and the speech generated by modern synthesis systems is sometimes indistinguishable from real human speech. Despite the success of electronic speech synthesizers, research on mechanical speech synthesizers is still being carried out, for example for use in humanoid robots.
The first speech synthesis systems based on computer technology began to appear in the late 1950s, and the first text-to-speech synthesizer was created in 1968.