eSpeak-ng: program to speak text from a file or from stdin

Piyush Panchariya
4 min readSep 19, 2021

eSpeak-ng is a command line tool for Linux that converts text to speech. This is a compact speech synthesizer that provides support to English and many other languages. It is written in C. eSpeak reads the text from the standard input or the input file.

eSpeak-ng can be used as a command-line program, or as a shared library.

It supports Speech Synthesis Markup Language (SSML).

Language voices are identified by the language’s ISO 639–1 code. They can be modified by “voice variants”. These are text files which can change characteristics such as pitch range, add effects such as echo, whisper and croaky voice, or make systematic adjustments to formant frequencies to change the sound of the voice. For example, “af” is the Afrikaans voice. “af+f2” is the Afrikaans voice modified with the “f2” voice variant which changes the formants and the pitch range to give a female sound.

eSpeakNG uses an ASCII representation of phoneme names which is loosely based on the Usenet system.

Phonetic representations can be included within text input by including them within double square-brackets. For example: espeak-ng -v en “Hello [[w3:ld]]” will say

Hello world in English.

Syntax of espeak-ng command

espeak-ng [option] ["<words>"]example: espeak-ng "Hello World"

options of espeak-ng command.

Synopsis:

espeak-ng [options] [words]

Description: espeak-ng is a software speech synthesizer for English, and some other languages.

Options:

-h, — help

Show summary of options.

— version

Prints the espeak library version and the location of the espeak voice data.

-f <text file>

Text file to speak.

— stdin

Read text input from stdin instead of a file.

If neither -f nor — stdin are provided, <words> are spoken, or if no words are provided then text is spoken from stdin a line at a time.

-d <device>

Use the specified device to speak the audio on. If not specified, the default audio device is used.

-q

Quiet, don´t produce any speech (may be useful with -x).

-a <integer>

Amplitude, 0 to 200, default is 100.

-g <integer>

Word gap. Pause between words, units of 10ms at the default speed.

-k <integer>

Indicate capital letters with: 1=sound, 2=the word “capitals”, higher values = a pitch increase (try -k20).

-l <integer>

Line length. If not zero (which is the default), consider lines less than this length as end-of-clause.

-p <integer>

Pitch adjustment, 0 to 99, default is 50.

-s <integer>

Speed in words per minute, default is 175.

-v <voice name>

Use voice file of this name from espeak-ng-data/voices. A variant can be specified using voice+variant, such as af+m3.

-w <wave file name>

Write output to this WAV file, rather than speaking it directly.

— split=<minutes>

Used with -w to split the audio output into <minutes> recorded chunks.

-b

Input text encoding, 1=UTF8, 2=8 bit, 4=16 bit.

-m

Indicates that the text contains SSML (Speech Synthesis Markup Language) tags or other XML tags. Those SSML tags which are supported are interpreted. Other tags, including HTML, are ignored, except that some HTML tags such as

-x

Write phoneme mnemonics to stdout.

-X

Write phonemes mnemonics and translation trace to stdout. If rules files have been built with — compile=debug, line numbers will also be displayed.

-z

No final sentence pause at the end of the text.

— stdout

Write speech output to stdout.

— compile=voicename

Compile the pronunciation rules and dictionary in the current directory. =<voicename< is optional and specifies which language is compiled.

— compile-debug=voicename

Compile the pronunciation rules and dictionary in the current directory as above, but include line numbers, that get shown when -X is used.

— ipa

Write phonemes to stdout using International Phonetic Alphabet. — ipa=1 Use ties, — ipa=2 Use ZWJ, — ipa=3 Separate with _.

— tie=<character>

The character to use to join multi-letter phonemes in -x and — ipa output.

— path=<path>

Specifies the directory containing the espeak-ng-data directory.

— pho

Write mbrola phoneme data (.pho) to stdout or to the file in — phonout.

— phonout=<filename>

Write output from -x -X commands and mbrola phoneme data to this file.

— punct=”<characters>”

Speak the names of punctuation characters during speaking. If =<characters> is omitted, all punctuation is spoken.

— sep=<character>

The character to separate phonemes from the -x and — ipa output.

— voices[=<language code>]

Lists the available voices. If =<language code> is present then only those voices which are suitable for that language are listed.

— voices=<directory>

Lists the voices in the specified subdirectory.

Examples

espeak-ng “This is a test”

Speak the sentence “This is a test” using the default English voice.

espeak-ng -f hello.txt

Speak the contents of hello.txt using the default English voice.

cat hello.txt | espeak-ng

Speak the contents of hello.txt using the default English voice.

espeak-ng -x hello

Speak the word “hello” using the default English voice, and print the phonemes that were spoken.

espeak-ng -ven-us “[[h@´loU]]”

Speak the phonemes “h@´loU” using the American English voice.

espeak-ng — voices

List all voices supported by eSpeak.

espeak-ng — voices=en

List all voices that speak English (en).

espeak-ng — voices=mb

List all voices using the MBROLA voice synthesizer.

Author

eSpeak NG is maintained by Reece H. Dunn msclrhd@gmail.com. It is based on eSpeak by Jonathan Duddington jonsd@jsd.clara.co.uk.

This manual page is based on the eSpeak page written by Luke Yelavich themuso@ubuntu.com for the Ubuntu project.

--

--