Whatever second language you’re trying to master, pronunciation is arguably the hardest aspect to conquer, and Japanese and English are no exception. Japanese speakers, used to a highly syllabic writing system, often have a hard time accommodating the chaotic nature of natural English pronunciation.

While a native English speaker’s tongue might stumble when trying to spit out makudonarudo (McDonald’s) smoothly the first few times, our language allows us to pick it up with a little practice. Japanese speakers of English have a far harder time understanding all the diminished sounds when a native English speaker casually utters the name of the famous hamburger chain.

With that, NTT has revealed technology it’s working on that may one day automatically correct a Japanese person’s English pronunciation by editing the speed and rhythm while keeping the original speaker’s voice intact.

This technology, along with several other projects, was put on display at NTT Communication Science Laboratories’ Open House 2013 on 6 and 7 June. There, the center’s director, Eisaku Maeda, reminded everyone that these products are still very far from hitting the market, but the institute wanted to give people a hands-on experience with future technology.

To give an example of how it works, let’s say a Japanese person spoke the English sentence:

“I will choose the pink one.”

Depending on the person’s abilities, it may come out as:

“I uiru choosu za pinku one.”

When this voice hits the machine, it will first decode what was being said and then edit the sound data to fit what a native speaker would sound like. Presumably, this is done by editing out the extraneous vowel sounds such as the “u” at the end of pinku. They might also fix up the “l” and “wi” sounds by simply shortening them.

Finally, they would edit out any gaps between words and connect similar sounds, like the “s” at the end of “choose” and the “th” at the beginning of “the.” With all the edits made, the end result should sound, in the words of NTT, “native-like.”
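To make the editing steps described above a little more concrete, here is a toy sketch in Python. It is not NTT’s method, which has not been published in this detail: the `Segment` class, the `smooth` function, and every duration are made up for illustration, and it assumes the speech has already been split into labeled sounds with any extraneous vowels flagged.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str                 # sound label, e.g. "p", "i", "u"
    duration_ms: float         # how long the speaker held the sound
    extraneous: bool = False   # True for inserted vowels like the final "u" in "pinku"


def smooth(segments, target_ms):
    """Drop extraneous sounds and compute a stretch factor for the rest.

    target_ms maps a sound label to a hypothetical "native-like" duration.
    Sounds without a target keep their original length (factor 1.0).
    """
    plan = []
    for seg in segments:
        if seg.extraneous:
            continue  # edit the extra vowel out entirely
        target = target_ms.get(seg.label, seg.duration_ms)
        plan.append((seg.label, target / seg.duration_ms))
    return plan


# "pinku" as a learner might say it; all durations are invented.
pinku = [
    Segment("p", 60),
    Segment("i", 100),
    Segment("n", 80),
    Segment("k", 50),
    Segment("u", 120, extraneous=True),
]
native_like = {"i": 80, "n": 60}  # assumed target durations in milliseconds

plan = smooth(pinku, native_like)
for label, factor in plan:
    print(f"{label}: play at {factor:.2f}x original length")
```

A real system would apply such factors to the audio itself while preserving the speaker’s voice; the sketch only shows the bookkeeping of what gets removed and what gets shortened.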

The biggest challenge of this project is for the software to accurately make out what second-language speakers are saying. For this, they are also developing highly accurate speech recognition technology.

NTT says that the speech recognition they are currently developing has a 17.9 percent error rate – considerably lower than the 30.1 percent rate of currently existing technology. This is accomplished by an algorithm which loops through the audio data and can “learn” the speaker’s particular habits or accents. This also improves the program’s ability to filter out noise from the data with a high degree of accuracy.
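For a sense of scale, the two figures quoted above can be compared directly. The short Python sketch below uses only those two numbers; the function name is our own, and the article does not say exactly how the error rate is measured.

```python
def relative_error_reduction(baseline, improved):
    """Fraction of the baseline error rate that has been eliminated."""
    return (baseline - improved) / baseline


baseline_error = 30.1  # percent, existing technology (figure from the article)
ntt_error = 17.9       # percent, NTT's system under development (figure from the article)

absolute_drop = baseline_error - ntt_error
relative_drop = relative_error_reduction(baseline_error, ntt_error)

print(f"Absolute drop: {absolute_drop:.1f} percentage points")
print(f"Relative drop: {relative_drop:.1%} of the baseline error")
```

In other words, the drop is about 12.2 percentage points, which means roughly 40 percent of the errors made by existing technology would be eliminated.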

It’s unclear what purpose this pronunciation smoother would serve aside from clearing up confusion in foreign restaurants over whether Japanese people want a beer or the bill. NTT is considering whether it would be useful for international business presentations and teleconferences.

It could be useful simply as a training device. Hearing your own voice speaking with “perfect” pronunciation could be a good source of motivation to keep up the rigorous vocal training needed for Japanese people interested in eliminating as much accent from their speaking as possible.

Source: IT Media News (Japanese)
NTT Communication Science Laboratories (English/Japanese)