Speech synthesis and reading e-books aloud
Automatic speech synthesis, the process of generating a speech signal, is a technology that makes it possible to read text (a document, letter, or SMS) aloud in a voice close to a natural one. For synthesized speech to sound natural, a whole set of problems has to be solved: the voice must sound natural in timbre, smoothness, and intonation, stresses must be placed correctly, and abbreviations, acronyms, numbers, and special characters must be expanded into words.
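The expansion of abbreviations and numbers mentioned above can be illustrated with a minimal sketch. It is not taken from any engine described here: the abbreviation table is a tiny invented sample, English is used only for readability, and the function names `number_to_words` and `normalize` are hypothetical.

```python
import re

# Hypothetical abbreviation table; a real front end would use a much larger one.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in English."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Expand known abbreviations and small integers into words."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    # Only standalone integers of up to three digits are handled here.
    return re.sub(r"\b\d{1,3}\b",
                  lambda m: number_to_words(int(m.group())), text)


print(normalize("Dr. Smith lives at 42 Main St."))
```

A real synthesizer also has to handle dates, decimals, case endings, and ambiguous abbreviations, which this sketch deliberately leaves out.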
Synthesis technology can be in demand both in a narrow subject area and in a wide or unlimited one. For a narrow area, the sound quality can be brought close to fully natural by assembling long pre-recorded speech fragments related to that area. An example of such synthesis (called macrosynthesis) is the train announcement system used at stations in large Russian cities. It is much harder to build a speech synthesizer for unrestricted text on any subject. In that case the user can give the system any phrase or sentence to pronounce.
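As a rough sketch of the macrosynthesis idea, an announcement can be modelled as an ordered list of pre-recorded fragments. The file names, the `FRAGMENTS` table, and the `announcement_playlist` function below are invented for illustration and do not describe any real station system.

```python
# Hypothetical inventory of pre-recorded fragments (file names are invented).
FRAGMENTS = {
    "train": "train.wav",
    "arrives_at_platform": "arrives_at_platform.wav",
    "departs_from_platform": "departs_from_platform.wav",
}


def announcement_playlist(train_no: str, event: str, platform: int) -> list:
    """Return the ordered list of recordings that make up one announcement."""
    if event not in ("arrives_at_platform", "departs_from_platform"):
        raise ValueError(f"no recording for event {event!r}")
    return [
        FRAGMENTS["train"],
        f"train_number_{train_no}.wav",  # assumed: one recording per train number
        FRAGMENTS[event],
        f"platform_{platform}.wav",      # assumed: one recording per platform
    ]


print(announcement_playlist("102", "arrives_at_platform", 3))
```

Because every fragment is a natural recording, the result sounds natural, but only phrases covered by the inventory can be produced.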
Methods of speech synthesis
Today there are three main approaches to synthesis: the diphone approach (a diphone is the sound segment from the middle of one phoneme to the middle of the next), the allophone approach (an allophone is the realization of a phoneme in its left and right context), and unit selection (choosing sound elements from a speech database). Each of them has drawbacks when used on its own:
Diphone approach - produces intelligible speech, but with an unnatural timbre. The voice of the donor speaker cannot be recognized in the synthesized speech.
Allophone approach - the voice is somewhat more natural than with the diphone approach, thanks to a larger set of sound elements. However, as with diphone synthesis, the voice sounds rather robotic, and the donor speaker is hard to recognize.
Unit selection - the timbre is highly natural, and the synthesized voice preserves the timbre of the donor speaker. However, because the voice database is limited in size, some texts (words and word combinations) are pronounced with noticeable distortion, up to the complete loss of individual sounds.
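To make the difference between diphone and allophone units concrete, here is a minimal sketch that cuts a phoneme sequence both ways. The underscore used for silence, the unit naming, and both function names are assumptions made for this example only.

```python
def to_diphones(phonemes: list) -> list:
    """Turn a phoneme sequence into diphone unit names.

    Silence ("_") is added at both ends so that the first and last
    phonemes are also covered by a transition.
    """
    padded = ["_"] + phonemes + ["_"]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def to_allophones(phonemes: list) -> list:
    """Describe each phoneme by its (left neighbor, phoneme, right neighbor)."""
    padded = ["_"] + phonemes + ["_"]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


print(to_diphones(["k", "a", "t"]))
print(to_allophones(["k", "a", "t"]))
```

A diphone synthesizer would look up one recording per transition, while an allophone synthesizer would look up one recording per phoneme in context, which is why the allophone inventory is larger.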
Additional information: a text-to-speech (TTS) engine, or speech synthesis engine, is a program similar to a driver that converts text into a sound wave. A speech engine has no built-in controls, so a TTS engine alone is not enough to make your computer speak. To use it you need a front-end program (TextAloud, Cool Reader, Balabolka, etc.) that provides the interface: it lets you work with the engine, change its settings and the sound and timbre of the speech, and control its other features.
Acapela, the developer of the popular Russian voice module Nicolai, has released a new Russian female voice engine called Alyona. It runs on SAPI5 at a sampling rate of 22 kHz. The quality of Alyona's synthesized speech is far ahead of Nicolai's, and users find her voice and intonation more pleasant than those of the Katerina engine from ScanSoft RealSpeak.
The engine comes with Lexicon Manager, a dictionary editor that lets you change the pronunciation of words both by spelling and phonetically.
For example, with KooBAudio or mp3book2005 and this voice engine, a novel that takes four hours to listen to can be read aloud and converted to MP3 in 10 minutes.
Acapela Alyona works well with programs such as KooBAudio 0.7.0.7, mp3book2005, Balabolka, and Cool Reader.
A good voice engine! Overall it reads more clearly than all the other Russian voices, including Nicolai, but the latter has a more pleasant voice thanks to its low pitch and makes fewer mistakes in stress placement.
Download Balabolka (7 MB)
Free and the most capable of the programs, with all the necessary functions and the ability to write to MP3.
ToM Reader 2.73
Download ToM Reader 2.73 (1 MB)
Free, with a familiar book-like look, visual bookmarks, and the ability to record to MP3.
ToM Reader is a Russian reading program. Its main advantage is that it does not interfere with the voice engine's pronunciation and reads sentence by sentence rather than paragraph by paragraph, as many programs do, so it is easy to follow the reading. It looks like a bound book, which is also convenient. Supported formats: txt, doc, rtf, htm.
Download MP3book2005 (7 MB)
Has all the necessary functions for editing the dictionary, writing to MP3, and reading.
MP3book2005 is a program for editing a dictionary, reading, and writing to MP3. It edits the dictionary very well, though a book-style view would be welcome. Supported formats: txt, rtf, htm, fb2.
2. Download Infovox Desktop 2.220 Engine SP3 (Acapela_Infovox_Desktop_2.220_EngineSP3.rar, 24.08 MB): a control and activation program with its own simple reader and, most importantly, Lexicon Manager, the program for editing Alyona's pronunciation dictionary.
4. Download a reader, a program for reading books (see the links above).
5. Download AlyonaSlovari-Alyona22k (AlyonaSlovari.rar, 1.2 MB): dictionaries for Alyona covering 24,345 words.
7. Download MSagent.exe and ms_speech_api.exe (sintez_bib.rar, a 1.09 MB archive). MSagent.exe (400 KB) is an agent for working with speech recognition and synthesis engines; ms_speech_api.exe (830 KB) contains the libraries needed for speech recognition and synthesis (not needed on Windows 7).
Supported operating systems: Windows XP, Windows Vista, Windows 7, and Windows 8.
2) Install the Alyona Russian 2.220 voice engine on top of it.
3) Run the License Manager and copy the license code with the "Copy to Clipboard" button.
4) Run key.exe, paste the code there, and press "Make Key" to generate the license file.
5) Import the resulting file into the License Manager by clicking "Import License File".
6) Add the dictionaries using Lexicon Manager (Lexicon-Voice Associations - Add Lexicon... or File - Import Lexicon).
The following additional components are required (install them in this order!):
1. MSagent.exe and ms_speech_api.exe: the libraries required by speech recognition and synthesis programs (not needed on Windows 7).
2. Acapela ELAN Tempo Multimedia V126.96.36.199 Nicolai is the engine for synthesizing Russian speech for an agent.
3. ToM Reader 2.73 or MP3book2005: reading programs, whichever you prefer.
4. Download the stress dictionary for ELAN Tempo Multimedia Nicolai.
Copy the main dictionary exc_rus.txt and the abbreviation dictionary abb_rus.txt into the C:\Program Files\Elan folder, and also copy exc_rus.txt into Program Files\MP3book2005\DIC, replacing the existing files.
MP3book2005 edits only exc_rus.txt; abb_rus.txt has to be edited in Notepad or Word. These are the engine's own dictionaries, used by ELAN Tempo Multimedia itself.
There is also a dictionary built into the ToM Reader reading program (Digalo Russian Nicolai.dic). Do not use such dictionaries at all; they only make the pronunciation worse.
In ToM Reader, the engine settings are approximately the following:
To edit the dictionary in MP3book2005, press the Dictionary button and, if necessary, load the dictionary exc_rus.txt. To add a new word, press the Pronunciation button at the top and type the word in the bottom line (if the word was selected in the text, it will already be there). Press the Check button, place the cursor at the stressed position, and press (<); if the pronunciation sounds right, click Add. Then press the Save button at the top to overwrite the dictionary. Shorthand such as asterisks is not used in the engine's own dictionaries, so each word is recorded separately. This is less convenient, but the pronunciation quality is higher.
You can keep ToM Reader and MP3book2005 open at the same time: read in ToM Reader and edit in MP3book2005. After the dictionary is changed, ToM Reader has to be restarted. You can also use MP3book2005 alone. Keep in mind that ToM Reader is free, while an unregistered copy of MP3book2005 has minor restrictions.
Acapela ELAN Tempo Multimedia sometimes spells out words written in CAPITAL letters letter by letter.
Digalo Nicolai is the older version.
You can use Digalo TTS 2000 (DigaloCoreRus.exe, 7.44 MB, SAPI4) together with ToM Reader Russian. Digalo TTS 2000 is a voice engine that supports several languages, including Russian. It is a paid product, though a crack can be found. ToM Reader Russian is a program that uses Digalo TTS 2000 to read books.
Digalo TTS 2000 includes the Russian voice Nicolai. It is better than the voices from other companies, but not perfect, so it needs a dictionary. There are two options: use the dictionary built into ToM Reader, or use the dictionary in Digalo itself. The first is simpler, because it uses asterisks (*) to stand for part of a word, but gives lower quality; the second is more work but gives better results.
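The asterisk idea can be sketched as follows. The actual format of the ToM Reader dictionary is not documented in this article, so the rule format (a pattern with at most one trailing asterisk and a replacement that reuses the matched part), the `RULES` sample, and the function names are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical rules: (pattern, replacement). "*" stands for any remainder
# of the word and is copied into the replacement unchanged.
RULES = [
    ("colonel", "kernel"),          # exact word
    ("photograph*", "fotograf*"),   # one rule covers all word forms
]


def apply_rule(word: str, pattern: str, replacement: str) -> Optional[str]:
    """Return the rewritten word if it matches the pattern, else None."""
    regex = re.escape(pattern).replace(r"\*", "(.*)")
    match = re.fullmatch(regex, word)
    if match is None:
        return None
    tail = match.group(1) if match.groups() else ""
    return replacement.replace("*", tail)


def pronounce(word: str, rules=RULES) -> str:
    """Apply the first matching rule; leave the word unchanged otherwise."""
    for pattern, replacement in rules:
        result = apply_rule(word, pattern, replacement)
        if result is not None:
            return result
    return word


print(pronounce("photographer"))
```

One wildcard rule replaces many separate entries, which explains the convenience; the cost is that the rule may also fire on word forms where the rewritten pronunciation is wrong, consistent with the lower quality noted above.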
In the first case, take the Digalo Russian Nicolai.dic dictionary, copy it into the dict folder of ToM Reader (the folder appears once ToM Reader has been opened), and enable the "use the dictionary" option in the settings. In the second case the process is somewhat longer, but you get better pronunciation, and other programs that cannot load a dictionary themselves, such as PROMT, will also pronounce words correctly, because Digalo will use its own dictionary.
So, take DigaloEditor 1.0 and unpack it into C:\Program Files\Digalo\Digalo 2000 Russian\russian\data. The folder will then contain DigaloEditor.exe, the dictionary editor, and two dictionaries: abb_rus.txt for abbreviations and exc_rus.txt for other words. DigaloEditor.exe edits only exc_rus.txt; abb_rus.txt has to be edited in Notepad or Word.
Now for the specifics of editing in DigaloEditor.
If you want to add a word or look one up, click the Add button and start typing. The editor searches automatically, and if a matching entry exists it is highlighted in red. Most importantly: as you type, the word is already being written into the dictionary, and if you leave it there and save, it will stay in the dictionary regardless of whether such a word was already present. So if you have entered the word correctly, press the Save button. If it is wrong, or the word already exists, delete the entered line with the Delete button. Repeat this save-or-delete step after every entry. Stress is marked with the "<" sign (without quotes), and there must be an equal number of spaces on one side and on the other. For example: "fish trout fish fore". The line "fish trout fish-foresh <l" will cause an error in Digalo. Words whose stress depends on their meaning have to be entered as part of a phrase. Suspension lock = hanging lock <k.
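The rules above suggest a simple sanity check that could be run over a dictionary before saving it. This is only a sketch under stated assumptions: it reads "one side and the other" as the two sides of an "=" separator, as in the last example, and the entry strings below are invented ASCII stand-ins, not real exc_rus.txt content.

```python
def check_entry(line: str) -> list:
    """Return a list of problems found in one 'word=pronunciation' line."""
    if line.count("=") != 1:
        return ["expected exactly one '=' separator"]
    left, right = (part.strip() for part in line.split("="))
    problems = []
    # Assumed rule: both sides must contain the same number of spaces.
    if left.count(" ") != right.count(" "):
        problems.append("different number of spaces on the two sides")
    # Assumed rule: the pronunciation side carries at least one stress mark.
    if "<" not in right:
        problems.append("no stress mark '<' in the pronunciation")
    return problems


for entry in ("hanging lock=hanging lo<ck", "fish trout=fish-tro<ut"):
    print(entry, "->", check_entry(entry) or "ok")
```

Running such a check before restarting the reading program would catch the kind of mismatch that the article says causes an error in Digalo.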
The result can be heard after the reading program is restarted.
Govorilka 2.0.6, a program for Russian speech synthesis, and the Digalo and SpeechCube voice engines
Govorilka is a small program for reading texts aloud. It can read any text you give it, in any language, using any installed voice, and can write the speech to an MP3 file.
Main features of Govorilka:
Reads text aloud.
Writes the text being read to a sound file (*.WAV, *.MP3) at increased speed, split into parts of a specified size.
Adjusts the reading speed and voice pitch.
Automatically scrolls the text on screen so that the fragment being read is always visible (speech tracking). The text being read can also be highlighted in color.
Loadable pronunciation dictionaries that let you easily adjust the pronunciation of individual words and phrases.
Opens large files in DOS and Windows encodings.
Opens text from Microsoft® Word and HTML files.
Handles text up to 2 gigabytes in size.
Remembers the text and the cursor position when the program exits.
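Splitting a long text into parts, as in the file-writing feature above, can be sketched like this. Govorilka's own splitting logic is not described in this article; the sketch assumes the limit is given in characters and that parts should end at sentence boundaries, and `split_into_parts` is a hypothetical name.

```python
import re


def split_into_parts(text: str, max_chars: int) -> list:
    """Group whole sentences into parts of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own part.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    parts, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            parts.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts


for number, part in enumerate(split_into_parts("One. Two three. Four five six.", 16), 1):
    print(f"part {number:02d}: {part}")
```

Each part could then be passed to the voice engine and saved as a separate sound file, so that no file breaks off in the middle of a sentence.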
Please note that the current version of the program is a test (beta) version and may contain minor errors.
What Govorilka is useful for: it spares your eyesight. With it, e-books can be listened to instead of read from the monitor. You can learn how words and phrases sound in a foreign language. You can quickly convert books into MP3 files and listen to them on an MP3 player. With Govorilka you can evaluate what computer speech synthesis can do and teach your computer to talk.
Description: Govorilka is for anyone who prefers listening to texts over reading them from the screen, who wants to spare their eyesight and read e-books while sitting away from the monitor, or who wants to learn how words and phrases sound in a foreign language. It is also for anyone who wants to teach their computer to talk or is simply curious about how it all works.
Additional features: changing the reading speed and voice pitch; opening large files in DOS and Windows encodings and reading text from MS Word files; recording speech to a sound file (WAV or MP3); automatic scrolling so that the fragment being read is always visible; reading text from the clipboard; and adjusting pronunciation through a dictionary.
Additional information: Govorilka's interface is multilingual. The program works right away on Windows 2000/XP, but Windows 95/98/NT users may have to download some missing components, namely a text-to-speech engine and SAPI (details on the home page).
Download Govorilka 2.0.6 with the Digalo and SpeechCube voice engines
Download Govorilka 2.2.2 (official final version of 09.12.2009)
Speech synthesis systems
A reader is a program designed for comfortable reading of texts and e-books from the computer screen. Many readers can also read texts aloud, using dedicated speech synthesis programs for this.
A good reader has many features that make reading from the screen less tiring. A book-style layout, smooth scrolling, and text anti-aliasing are just some of the tools readers use.
Balabolka is a free program for reading text files aloud in a human voice.
Govorilka is a small free program for reading texts aloud with the help of speech synthesis engines.
For the programs in the "Readers" category to be able to read texts in a "human voice", the SAPI library (Speech Application Programming Interface, or Speech API) and voice engines must be installed on the system.
Two versions of the Speech API are currently available: SAPI4 and SAPI5. The two libraries are incompatible with each other, but they do not interfere and can coexist on the same computer, so it is recommended that both programs be used with both libraries (this gives access to more voice engines).
In Windows XP, Vista, and 7 the SAPI5 libraries are usually already installed, so you only need to install SAPI4 (and even that is optional). However, there may be cases where SAPI5 has to be installed as well. You can download each library and read about its installation details on its page: Download SAPI.
Speech synthesis also requires voice engines for the desired language to be installed on the computer. As noted above, SAPI4 and SAPI5 are incompatible, so each voice engine works with only one of them. If both Speech API libraries are installed on your computer, you can install any voice engine: Download voice engines for SAPI.
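The compatibility rule can be written down as a small model. It does not call any real Speech API; the `VoiceEngine` class and `usable_engines` function are hypothetical, and the two sample engines simply restate what this article says about Alyona (SAPI5) and Digalo TTS 2000 (SAPI4).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceEngine:
    name: str
    sapi_version: int  # each engine works with exactly one SAPI version


ENGINES = [
    VoiceEngine("Acapela Alyona", 5),
    VoiceEngine("Digalo TTS 2000 Nicolai", 4),
]


def usable_engines(engines, installed_sapi: set) -> list:
    """Return the names of engines whose SAPI version is installed."""
    return [e.name for e in engines if e.sapi_version in installed_sapi]


print(usable_engines(ENGINES, {5}))     # only SAPI5 present
print(usable_engines(ENGINES, {4, 5}))  # both libraries present
```

With only SAPI5 installed, the SAPI4 voice is unavailable, which is why installing both libraries gives the widest choice of voices.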
Screen Access Software
The VIRGO 4 screen access program is the result of many years of work by BAUM on the VIRGO program, whose main purpose is to let blind and visually impaired users work comfortably with Windows. VIRGO 4 lets the user choose which information is shown on the braille display and which is spoken aloud. Users with low vision can also use the GALILEO screen magnification system integrated into VIRGO 4. By using both braille and speech, VIRGO 4 flexibly combines the strengths of the two output methods for the user's convenience.
MyStick is the first mobile screen access tool that works without installation on all modern Windows computers. Once inserted into a free USB port, MyStick starts automatically and the user can begin working right away. After MyStick is removed, no files remain on the computer and no configuration changes persist. MyStick is a USB flash drive in the U3 format. With MyStick, blind and visually impaired PC users are not tied to a specific, specially equipped computer and can use any computer running Windows. There are two versions of MyStick: one with speech output and screen magnification, and one with speech output only. MyStick is available in Russian, English, German, French, Swedish, Norwegian, and Danish.
The COBRA 9.1 screen access program makes it easier for blind and visually impaired users to work with Windows 7, Vista, or Windows XP. COBRA combines all the standard functions of a modern, user-oriented screen access program. It adapts to the user's requirements and conveys important information from the screen through speech, braille, or screen magnification.
Speech synthesis has a long history, overgrown with legends. As early as the tenth century, Gerbert of Aurillac was credited with mastering the art of making a teraphim, a talking severed head. Made of bronze, this head answered "yes" or "no" to the questions put to it. In the middle of the 13th century, the Dominican monk Albert von Bollstädt (Albertus Magnus) and the English philosopher and naturalist Roger Bacon also tried to create the first "talking heads".
At the end of the 18th century, the Danish scientist Christian Kratzenstein, a full member of the Russian Academy of Sciences, created a model of the human vocal tract capable of producing five long vowel sounds (a, e, i, o, u). The model was a system of acoustic resonators of various shapes that produced vowel sounds by means of vibrating reeds excited by an air flow. In 1778 the Austrian scientist Wolfgang von Kempelen extended Kratzenstein's model with models of the tongue and lips and presented an acoustic-mechanical speaking machine capable of reproducing certain sounds and their combinations. Hissing and whistling sounds were produced by special hand-operated bellows. In 1837 Charles Wheatstone presented an improved version of the machine, capable of reproducing vowels and most consonants. And in 1846 Joseph Faber demonstrated his talking organ, Euphonia, which attempted to synthesize not only speech but also singing.
At the end of the 19th century, the famous scientist Alexander Bell built his own "talking" mechanical model, very similar in design to Wheatstone's machine. With the 20th century came the era of electrical machines, and scientists were able to use sound wave generators and build algorithmic models on their basis.
In the 1930s Homer Dudley of Bell Labs, working on ways to reduce the bandwidth required in telephony in order to increase its transmission capacity, developed the VOCODER (short for "voice coder"), an electronic speech analyzer and synthesizer. Dudley's idea was to analyze the voice signal, break it into components, and resynthesize it in a form that demands less line bandwidth. An improved, keyboard-controlled version of Dudley's vocoder, the VODER, was presented at the 1939 New York World's Fair.
The first speech synthesizers sounded rather unnatural, and their phrases were often barely intelligible. However, the quality of synthesized speech has improved steadily, and speech generated by modern systems is sometimes indistinguishable from real human speech. Despite the success of electronic synthesizers, research on mechanical speech synthesizers is still being carried out, for example for use in humanoid robots.
The first speech synthesis systems based on computer technology began to appear in the late 1950s, and the first text-to-speech synthesizer was created in 1968.