Speech synthesis and reading e-books aloud
Automatic speech synthesis, the process of generating a speech signal, is a technology that makes it possible to read text (a document, a letter, an SMS) in a voice close to a natural one. For synthesized speech to sound natural, a whole set of problems has to be solved: the naturalness of the voice in terms of timbre, smoothness and intonation, the correct placement of stress, and the expansion of abbreviations, numbers and special characters.
Speech synthesis can be used in a narrow subject area or in a broad, unrestricted one. In a narrow area, sound quality can be brought close to fully natural by assembling long pre-recorded speech fragments that belong to that area. An example of such synthesis (called macrosynthesis) is the train announcement system used at stations in large Russian cities. It is much harder to build a speech synthesizer for unrestricted text on any subject, where the user can give the system any phrase or sentence.
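The macrosynthesis idea described above can be sketched in a few lines. This is only an illustration, not the actual station system: the fragment keys, the file names and the `plan_announcement` helper are all hypothetical. It shows that a closed-domain synthesizer only chooses and orders recordings that already exist.

```python
# Hypothetical inventory: phrase key -> pre-recorded audio file.
FRAGMENTS = {
    "attention": "attention.wav",
    "train_to": "train_to.wav",
    "moscow": "city_moscow.wav",
    "arrives_at_platform": "arrives_at_platform.wav",
    "3": "number_3.wav",
}


def plan_announcement(keys):
    """Return the list of recordings to play, in order, for the given keys."""
    missing = [key for key in keys if key not in FRAGMENTS]
    if missing:
        # A closed domain cannot say anything that was never recorded.
        raise KeyError(f"no recording for: {missing}")
    return [FRAGMENTS[key] for key in keys]


playlist = plan_announcement(
    ["attention", "train_to", "moscow", "arrives_at_platform", "3"]
)
print(playlist)
```

The `KeyError` branch reflects the limitation mentioned in the text: anything outside the recorded inventory cannot be spoken, which is why unrestricted text needs a different approach.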
Methods of speech synthesis
Today there are three main approaches to synthesis: the diphone approach (a diphone is the sound from the middle of one phoneme to the middle of the neighboring phoneme), the allophone approach (an allophone is the realization of a phoneme in its left and right context), and unit selection (choosing sound elements from a speech database). Each of them has its own drawbacks:
Diphone approach - produces intelligible speech, but with an unnatural timbre. The voice of the donor speaker cannot be recognized in the synthesized speech.
Allophone approach - the voice is somewhat more natural than with the diphone approach because the set of sound elements is larger. However, as with diphone synthesis, the voice is rather robotic, and it is difficult to recognize the voice of the donor speaker.
Unit selection - the timbre is highly natural, and the synthesized voice preserves the timbre of the donor speaker. However, because the size of the voice database is limited, some texts (words and their combinations) are pronounced with noticeable distortions, up to the complete loss of individual sounds.
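To make the diphone definition above concrete, here is a minimal sketch that turns a phoneme sequence into the diphone units a synthesizer would look up. The function name, the `_` silence marker and the `a-b` naming are assumptions made for illustration; real engines use their own inventories and labels.

```python
def to_diphones(phonemes, silence="_"):
    """Split a phoneme sequence into diphone labels.

    Each diphone spans from the middle of one phoneme to the middle of the
    next, so n phonemes padded with silence on both sides give n + 1 units.
    """
    padded = [silence] + list(phonemes) + [silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


units = to_diphones(["k", "a", "t"])
print(units)  # ['_-k', 'k-a', 'a-t', 't-_']
```

Because every unit is a pair of neighboring phonemes, the inventory grows roughly with the square of the number of phonemes, which keeps a diphone database small compared with a unit selection database.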
Additional information: a text-to-speech (TTS) engine, or speech synthesis engine, is a program similar to a driver that converts text into a sound wave. Speech synthesis engines have no built-in controls, so a TTS engine alone is not enough to make your computer speak. To use one, you need a reading program (TextAloud, Cool Reader, Balabolka, etc.) that provides the interface: it lets you work with the engine, change its settings, sound and voice, and control its other features.
Acapela, the developer of the popular Russian voice Nicolai, has released a new Russian female voice engine called Alyona. It runs on SAPI 5 at a sampling rate of 22 kHz. The quality of Alyona's synthesized speech is far ahead of Nicolai's, and in users' opinion her voice and intonation are more pleasant than those of the Katerina engine from ScanSoft RealSpeak.
The engine comes with Lexicon Manager, a dictionary editor that lets you change the pronunciation of words either by respelling them with letters or phonetically.
For example, with KooBAudio, mp3book2005 and this voice engine, a 4-hour novel can be voiced and converted to mp3 in 10 minutes.
Acapela Alyona works well with programs such as KooBAudio 0.7.0.7, mp3book2005, Balabolka, Cool Reader ...
A good voice engine! In general, it really does read more clearly than all other Russian voices, including Nicolai, but the latter has a more pleasant voice thanks to its low pitch and makes fewer stress mistakes.
Download Balabolka 188.8.131.525 (7 MB)
Free and the most capable of the readers, with all the necessary functions, including recording to mp3.
ToM Reader 2.73
Download ToM Reader 2.73 (1 MB)
Free, with a familiar book layout, visual bookmarks, and recording to mp3.
ToM Reader is a Russian reading program. Its main advantage is that it does not interfere with the pronunciation of the voice engine and reads sentence by sentence rather than paragraph by paragraph, as many programs do, so it is easy to follow the reading. It looks like a bound book, which is also convenient. Supported formats: txt, doc, rtf, htm.
Download MP3book2005 (7 MB)
Has all the necessary functions for editing the dictionary, recording to MP3, and reading.
MP3book2005 is a program for editing a dictionary, reading, and recording to mp3. It edits the dictionary very well, although a book-style view would be welcome. Supported formats: txt, rtf, htm, fb2.
2. Download Infovox Desktop 2.220 Engine SP3 (Acapela_Infovox_Desktop_2.220_EngineSP3.rar) (24.08 MB) - a program for control and activation, with its own simple reader and, most importantly, Lexicon Manager, the program for editing Alyona's pronunciation dictionary.
3. Download US English 2.220 (English_ID2220.rar) (233.13 MB) - the engine for English speech synthesis.
4. Download a reader, a program for reading books (links above).
5. Download AlyonaSlovari-Alyona22k (AlyonaSlovari.rar) (1.2 MB) - dictionaries for Alyona containing 24345 words.
7. Download MSagent.exe and ms_speech_api.exe (sintez_bib.rar, 1.09 MB archive) - MSagent.exe (400 KB) is an agent for working with speech recognition and synthesis engines; ms_speech_api.exe (830 KB) contains the libraries needed for speech recognition and synthesis (not needed on Windows 7).
Supported OS: Windows XP, Windows Vista, Windows 7, 8.
2) Install the Alyona Russian 2.220 voice engine on top of it
3) Run the License Manager and copy the License Code with the "Copy to Clipboard" button
4) Run key.exe, insert the code there and press "Make Key" to generate the license file.
5) Import the resulting file into the License Manager by clicking "Import License File"
6) Add dictionaries using Lexicon Manager (Lexicon-Voice Associations - Add Lexicon ... or File - Import Lexicon)
To work, you need additional libraries (install in this order!):
1. MSagent.exe and ms_speech_api.exe - the libraries needed for speech recognition and synthesis (not needed on Windows 7).
2. Acapela ELAN Tempo Multimedia V184.108.40.206 Nicolai - the engine for Russian speech synthesis for the agent.
3. ToM Reader 2.73 or MP3book2005 - reading programs, whichever you prefer.
4. Download the stress dictionary for ELAN Tempo Multimedia Nicolai.
Copy the main dictionary exc_rus.txt and the abbreviation dictionary abb_rus.txt into the C:\Program Files\Elan folder, and also copy exc_rus.txt into Program Files\MP3book2005\DIC, replacing the existing file.
MP3book2005 edits only exc_rus.txt; abb_rus.txt has to be edited in Notepad or Word. These are the engine's native dictionaries, used by ELAN Tempo Multimedia itself.
There is also a dictionary built into the ToM Reader reading program (Digalo Russian Nicolai.dic). Do not use such dictionaries at all; they only worsen the pronunciation.
In ToM Reader, the engine settings are approximately the following:
To edit the dictionary in MP3book2005, press the Dictionary button and, if necessary, load the exc_rus.txt dictionary. To add a new word, press the Pronunciation button at the top and type the word in the bottom line (if the word was selected in the text, it will already be there). Press the Check button, move the cursor to the stressed position, press (<), and if the pronunciation is acceptable, click Add. Then click the Save button at the top to overwrite the dictionary. The native dictionaries do not support shortened entries with asterisks; each word is recorded separately. This is less convenient, but the pronunciation is better.
You can keep ToM Reader and MP3book2005 open at the same time: read in ToM Reader and edit in MP3book2005. After the dictionary is changed, ToM Reader has to be restarted. You can also use MP3book2005 alone. Note that ToM Reader is free, while an unregistered copy of MP3book2005 has minor limitations.
Acapela ELAN Tempo Multimedia sometimes reads words written in CAPITAL letters letter by letter.
Digalo Nicolai - the old version.
You can take Digalo TTS 2000 (DigaloCoreRus.exe-7.44 MB, SAPI 4) and ToM Reader Russian. Digalo TTS 2000 is a voice engine that supports several languages, including Russian. It is paid, but you can find crack. ToM Reader Russian is a program that uses Digalo TTS 2000 to read books.
Digalo TTS 2000 includes the Russian voice Nicolai. It is better than voices from other companies, but not perfect, so it needs a dictionary. There are two options: use the dictionary built into ToM Reader, or use the dictionary in Digalo itself. The first is simpler, because it uses asterisks (*) to stand for part of a word, but it gives lower quality; the second is more laborious, but gives better quality.
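The asterisk mechanism mentioned above can be illustrated with a small sketch. The real file format of Digalo Russian Nicolai.dic is not described here, so everything below is an assumption: entries are taken to be (pattern, replacement) pairs in which `*` stands for any run of word characters, and the matched part is copied into the replacement.

```python
import re


def fill(replacement, groups):
    """Substitute each '*' in the replacement with the text it matched."""
    for group in groups:
        replacement = replacement.replace("*", group, 1)
    return replacement


def apply_dictionary(text, entries):
    """Rewrite words in `text` using hypothetical wildcard entries."""
    for pattern, replacement in entries:
        # Escape the pattern, then turn each '*' into a capturing group.
        body = re.escape(pattern).replace(r"\*", r"(\w*)")
        regex = re.compile(r"\b" + body + r"\b")
        text = regex.sub(lambda m: fill(replacement, m.groups()), text)
    return text


entries = [("synth*", "sinth*"), ("kHz", "kilohertz")]
result = apply_dictionary("a synthesizer at 22 kHz", entries)
print(result)  # a sinthesizer at 22 kilohertz
```

One wildcard entry covers every form of a word, which is why this kind of dictionary is easier to maintain. The price is precision: a pattern can also match words it was never meant for, which is consistent with the lower quality noted in the text.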
In the first case, take the Digalo Russian Nicolai.dic dictionary, copy it into the dict folder inside the ToM Reader folder (it appears after ToM Reader has been opened), and enable the "use the dictionary" option in the settings. In the second case, the process takes somewhat longer. Its advantage is that you get better pronunciation, and other programs that cannot connect a dictionary of their own, such as PROMT, will also pronounce words correctly, because Digalo will use its own dictionary.
So, take DigaloEditor 1.0 and unpack it into c:\Program Files\Digalo\Digalo 2000 Russian\russian\data. The following files appear there: DigaloEditor.exe, the program for editing the dictionary, and the dictionaries abb_rus.txt and exc_rus.txt. abb_rus.txt is for abbreviations, exc_rus.txt for all other words. DigaloEditor.exe edits only exc_rus.txt; abb_rus.txt has to be edited in Notepad or Word.
Now about the features of editing in DigaloEditor.
To add or find a word, click the Add button and start typing. The program searches automatically, and if a matching entry exists, it is highlighted in red. Most importantly: a word is written to the dictionary as you type it, and if you leave it there and save, it stays in the dictionary regardless of whether the word was already present. So if you have entered a correct new word, press the Save button. If it is wrong, or the word already exists, delete the entered line with the Delete button. Repeat this save-or-delete step after every entry. Stress is marked with the sign "<" (without quotes), and there must be an equal number of spaces on one side and on the other. For example: "fish trout fish fore". The line "fish trout fish-foresh <l" will cause an error in Digalo. Words whose stress depends on their meaning have to be written as part of a phrase: Suspension lock = hanging lock <k.
The result can be heard after the reading program is restarted.
Govorilka 2.0.6, a program for Russian speech synthesis, and the Digalo and SpeechCube voice engines
Govorilka is a small program for reading texts aloud. It can read any text you give it, in any language, with any installed voice, and it can write the speech to an MP3 file.
The main features of Govorilka:
Reading text aloud.
Writing the spoken text to an audio file (*.WAV, *.MP3) at increased speed, split into parts of a specified size.
Adjusting the reading speed and the voice pitch.
Automatic scrolling of the text so that the fragment being read is always visible (speech tracking). The text being read can be highlighted in color.
Swappable pronunciation dictionaries that let you easily adjust the pronunciation of individual words and phrases.
Opening large files in DOS and Windows encodings.
Opening texts from Microsoft® Word and HTML files.
Text size of up to 2 gigabytes.
The text and the cursor position are remembered when the program exits.
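One feature in the list above, writing audio "split into parts of the specified size", can be sketched on the text side. This is not Govorilka's algorithm, which is not documented here; it is a minimal illustration that cuts the text at sentence boundaries so that each part could be synthesized into its own file. The function name and the character-based limit are assumptions.

```python
import re


def split_into_parts(text, max_chars):
    """Group whole sentences into parts of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own part.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    parts, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # The part is full: close it and start a new one.
            parts.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts


parts = split_into_parts("One. Two three. Four five six.", max_chars=15)
print(parts)  # ['One. Two three.', 'Four five six.']
```

Cutting only between sentences keeps each audio file from starting or ending in the middle of a phrase, at the cost of parts that vary in length.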
Please note that the current version of the program is a test version (beta) - there may be minor errors.
How Govorilka is useful: Govorilka takes care of your eyesight. With it, e-books can be listened to rather than read from the monitor. You can learn how words and phrases sound in a foreign language. You can quickly convert books to MP3 files and listen to them on an MP3 player. With Govorilka you can evaluate what computer speech synthesis can do and teach your computer to talk.
Description: Govorilka is for those who prefer listening to texts over reading them from the screen, who want to spare their eyesight and read e-books while sitting away from the monitor, or who want to learn how words and phrases sound in a foreign language. It is also for anyone who wants to teach their computer to talk or is simply curious about how it all works.
Additional features: changing the reading speed and voice pitch; opening large files in DOS and Windows encodings, as well as reading text from MS Word files; recording speech to a sound file (wav or mp3); automatic scrolling of the text so that the fragment being read is always visible; reading text from the clipboard; changing the pronunciation (dictionary).
Additional notes: Govorilka's interface is multilingual. The program works out of the box on Windows 2000/XP, but Windows 95/98/NT users may have to download some missing files: a text-to-speech engine and SAPI (details on the home page).
Govorilka 2.0.6 and the Digalo and SpeechCube voice engines
Govorilka 2.2.2 (official final version of 09.12.2009)
Speech synthesis systems
A reader is a program designed for comfortable reading of texts and e-books from a computer screen. In addition, many readers can voice texts, using dedicated speech synthesis programs for this purpose.
A good reader has many functions that make reading from the screen less tiring. A book-style layout, smooth scrolling and text anti-aliasing are just some of the tools readers use.
Balabolka is a free program for reading text files aloud in a human-like voice.
Govorilka is a small free program for reading texts aloud with the help of voice synthesis engines.
For programs in the "Readers" category to be able to voice texts in a "human voice", the SAPI library (Speech Application Programming Interface, or Speech API) and voice engines must be installed on the system.
To date, two versions of the Speech API are in use: SAPI4 and SAPI5. The two libraries are incompatible with each other, but they do not interfere with each other and can work on the same computer, so it is recommended that reading programs support both libraries (this gives access to more voice engines).
Windows XP, Vista and 7 usually come with the SAPI5 libraries already installed, so you only need to install SAPI4, and even that is optional. However, there may be cases where SAPI5 has to be installed as well. You can download each of these libraries and read about its installation on its page: Download SAPI.
Speech synthesis also requires voice engines for the desired language to be installed on the computer. As noted above, the SAPI4 and SAPI5 libraries are incompatible, so each voice engine works with only one of them. If both Speech API libraries are installed on your computer, you can install any of the voice engines: Download voice engines for SAPI.
Screen Access Software
The VIRGO 4 screen access program is the result of BAUM's many years of work on VIRGO, whose main purpose is to let blind and visually impaired users work comfortably with Windows. VIRGO 4 lets the user choose which information is shown on the braille display and which is spoken aloud. Visually impaired users can also take advantage of the GALILEO screen magnification system integrated into VIRGO 4. By using braille and speech together, VIRGO 4 flexibly combines the strengths of both output methods for the user's convenience.
MyStick is the first mobile screen access program that works without installation on all modern Windows computers. Once plugged into a free USB port, MyStick starts automatically and the user can begin working with the computer right away. After MyStick is removed, no files remain on the computer and no settings are changed. MyStick is a USB flash drive in U3 format. With MyStick, blind and visually impaired PC users are not tied to a specific, specially equipped computer and can access any computer running Windows. There are two variants of MyStick: one with speech output and screen magnification, and one with speech output only. MyStick is available in Russian, English, German, French, Swedish, Norwegian and Danish.
The Cobra 9.1 screen access program makes it easy for blind and visually impaired users to work with Windows 7, Vista or Windows XP. COBRA combines all the standard functions of a modern, user-oriented screen access program. COBRA takes the user's requirements into account and presents important information from the monitor through speech, braille or screen magnification.
Speech synthesis has a long history, overgrown with legends. As early as the tenth century, Gerbert of Aurillac was credited with mastering the art of making a teraphim, a talking dead head. Made of bronze, this head answered "yes" or "no" to the questions of anyone who addressed it. In the middle of the 13th century, the Dominican monk Albert von Bollstädt (Albertus Magnus) and the English philosopher and naturalist Roger Bacon were also said to have tried to create "talking heads".
At the end of the 18th century, the Danish scientist Christian Kratzenstein, a full member of the Russian Academy of Sciences, created a model of the human vocal tract capable of pronouncing five long vowel sounds (a, e, i, o, u). The model was a system of acoustic resonators of various shapes that produced vowel sounds by means of vibrating reeds excited by an airflow. In 1778, the Austrian scientist Wolfgang von Kempelen supplemented Kratzenstein's model with models of the tongue and lips and presented an acoustic-mechanical talking machine capable of reproducing certain sounds and their combinations. Hissing and whistling sounds were produced with special hand-operated bellows. In 1837, the scientist Charles Wheatstone presented an improved version of the machine, capable of reproducing vowels and most consonants. And in 1846, Joseph Faber demonstrated his talking organ Euphonia, which attempted to synthesize not only speech but also singing.
At the end of the 19th century, the famous scientist Alexander Graham Bell built his own "talking" mechanical model, very similar in design to Wheatstone's machine. With the arrival of the 20th century, the era of electric machines began, and scientists were able to use sound wave generators and build algorithmic models on top of them.
In the 1930s, Bell Labs employee Homer Dudley, while looking for ways to reduce the bandwidth telephony needed in order to increase its transmission capacity, developed the VOCODER (short for "voice coder"), a keyboard-controlled electronic speech analyzer and synthesizer. Dudley's idea was to analyze the voice signal, break it into parts and resynthesize it in a form that demanded less line bandwidth. An improved version of Dudley's vocoder, the VODER, was presented at the New York World's Fair in 1939.
The first speech synthesizers sounded rather unnatural, and it was often hard to make out the phrases they produced. However, the quality of synthesized speech has steadily improved, and the speech generated by modern synthesis systems can sometimes not be distinguished from real human speech. Despite the success of electronic speech synthesizers, research on mechanical speech synthesizers is still being conducted, for example for use in humanoid robots.
The first computer-based speech synthesis systems began to appear in the late 1950s, and the first text-to-speech synthesizer was created in 1968.