TTS (text-to-speech) is widely used in modern appliances as a hands-free user interface, built on various deployments of artificial intelligence and machine learning technologies. At the outset of creating a speech corpus, recording engineers must thoroughly understand how the corpus will be used in downstream TTS processing, and then carry out the recording to meet the specifications of the final TTS application.
A popular everyday application of TTS technology is the built-in voice assistant on commercial smartphones, such as Siri and Google Assistant. The MuScene Studio, consisting of an audio recording division and a forensic acoustics laboratory, has played important roles in many similar projects: verbal navigation guidance platforms, voice assistant systems for several leading international smartphone brands, unmanned interactive customer-service platforms, linguistic and phonetic speech corpus recordings with synchronized electrographic signal recording, and many research projects on vocal spectroscopic sampling and analysis.
In principle, TTS and its machine learning platform can be deployed on other commercial products that involve an interactive interface, bringing conventional, static appliances to life. Such transformations not only make the upgraded products more user-friendly but also open up uses that were unthought of before.
If your company is considering incorporating TTS technology into its product line but lacks the technical knowledge needed to deploy it, you are welcome to contact the MuScene Studio for consultation on how to plan your new investment.
➤ By a rough estimate, there are more than 3,000 recording studios across Taiwan, most of them one-person workshops. How should you pick the right partner for your TTS project?
Experts from our forensic acoustics laboratory have analyzed the key factors for deploying TTS technology successfully at several scales. These include an acoustic space that eliminates low-frequency standing waves; a low background noise level; finely tuned reverberation and signal decay time; qualified acoustic and recording engineers; a professional announcer; a program manager who is also a linguistics specialist; and, finally, the performance of all recording hardware used to create the speech corpus.
Frankly speaking, the criteria for a qualified TTS speech studio are very strict and hard to meet; by our judgment, well over 95% of domestic recording studios do not meet the minimum standards for executing a TTS project.
➤ Announcer, Voice Actor, or Character Voice: the key player in a speech corpus
The human voice, or a resynthesized version that uses that voice as a model template, will represent the soul of the end product.
This character voice must therefore represent the nature, status, or image of the physical product that conveys messages through it. Being pleasant to the ear is an important factor but not the only one; other phonological characteristics may sometimes take precedence in the choice. Our linguistic/acoustic program manager will provide related expert advice at the outset of your TTS project.
➤ The linguistic expert plays a pivotal role in a TTS project by finalizing the script and its coverage for the speech corpus, which will be the ultimate source of all voice output in the end application. The linguistic expert ensures that the speech corpus contains every verbal element needed for later resynthesis of speech dialogues. He or she is also responsible for ensuring phonetic and phonological accuracy, as well as all fine aspects of the announcer's delivery, such as prosody, tempo, and intonation.
➤ A TTS recording project starts with an evaluation of text quantity, on which the subsequent project timeline, budget allocation, accuracy assurance, phonological segmentation, annotation, and machine learning training and testing steps are all based. The Project Manager needs multidisciplinary knowledge as well as a good command of a variety of resources to keep the project running smoothly.
A typical TTS project text may contain 2,000 sentences and roughly 30,000 words. With an experienced team, the actual in-studio recording time is about 25 working hours, and the finished speech corpus amounts to about 3 to 4 hours of continuous speech.
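The example figures above lend themselves to a quick back-of-the-envelope estimate for other script sizes. The Python sketch below scales them linearly by sentence count. The linear scaling, the function name `estimate_project`, and the `Estimate` type are illustrative assumptions, not a MuScene Studio planning tool; real projects vary with script content and team experience.

```python
from dataclasses import dataclass

# Reference figures from the example project described in the text.
REF_SENTENCES = 2_000
REF_WORDS = 30_000           # total words in the script
REF_STUDIO_HOURS = 25.0      # in-studio recording time
REF_CORPUS_HOURS_LOW = 3.0   # finished corpus duration, low estimate
REF_CORPUS_HOURS_HIGH = 4.0  # finished corpus duration, high estimate


@dataclass(frozen=True)
class Estimate:
    """Rough project-scale figures (hypothetical helper type)."""
    sentences: int
    words: float
    studio_hours: float
    corpus_hours_low: float
    corpus_hours_high: float


def estimate_project(sentences: int) -> Estimate:
    """Scale the reference figures linearly by sentence count.

    Linear scaling is an assumption made for illustration only.
    """
    if sentences <= 0:
        raise ValueError("sentences must be a positive integer")
    scale = sentences / REF_SENTENCES
    return Estimate(
        sentences=sentences,
        words=REF_WORDS * scale,
        studio_hours=REF_STUDIO_HOURS * scale,
        corpus_hours_low=REF_CORPUS_HOURS_LOW * scale,
        corpus_hours_high=REF_CORPUS_HOURS_HIGH * scale,
    )


if __name__ == "__main__":
    est = estimate_project(5_000)
    print(f"Script: ~{est.words:,.0f} words")
    print(f"Studio time: ~{est.studio_hours:.1f} h")
    print(f"Finished corpus: ~{est.corpus_hours_low:.1f} to {est.corpus_hours_high:.1f} h")
```

For a 5,000-sentence script, this gives about 75,000 words, roughly 62.5 studio hours, and 7.5 to 10 hours of finished speech. With the reference figures, each sentence averages about 15 words and 5.4 to 7.2 seconds of speech, and each hour of finished corpus takes roughly 6 to 8 hours of studio time.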
Besides being multidisciplinary, a TTS project involves multiple intertwined serial and parallel production processes. In this regard, the Project Manager is to a TTS project what a conductor is to an orchestra: the smooth progress of the project relies on the PM's timely and versatile fine-tuning of available resources.
Careful post-recording review of the speech data is imperative before any further processing: every deviation from phonetic or phonological accuracy must be corrected to ensure error-free downstream applications. It is not unusual to record additional speech material after the initial recording is complete, in order to expand the coverage of the speech corpus under construction.
Studio recording sits in the middle of an integrated TTS project. It is preceded by preparatory research into the market application, the cultural and linguistic morphology of the target market, and tailoring of the linguistic text; it is followed by recording accuracy assurance, phoneme segmentation, and annotation. In the late stage of a TTS project, the main workforce shifts to information technology experts and software programmers, who incorporate the speech corpus into the commercial product as a voice-based information flow.
Based on our track record in many large-scale international voice projects, ranging from medical to TTS applications, the MuScene Studio can proudly promise its customers the highest-quality speech corpora for TTS projects. We can serve your needs at any stage of your TTS project with full confidence. Please visit our websites at en.muscene-studio.com and www.voice-forensics.com for a quick overview of our technical specialties before contacting us at en.muscene-studio.com/contact.html.