VoxWave answers #1: Details on the development of ALYS’ vocal libraries



Today we reveal what happens behind the scenes of the development of ALYS’ French and Japanese vocal libraries. You will find all the details below.

To answer clearly, we need to recall the context in which ALYS was developed: we had very little money (the company was founded with our personal funds), a small team of highly versatile people (we were only three students back then), and, above all, huge expectations from the public.

Which interface do you use for ALYS?

First of all, to properly understand our approach, it is necessary to recall that there are two main types of vocal synthesis engines. The first type resynthesizes samples drawn from a library of recordings (a vocal library) and is compatible with interfaces such as VOCALOID, UTAU, Cadencii, NiaoNiao, NameWave, etc. The second type recreates voices from scratch, sometimes based on the knowledge gained by training an artificial neural network on the voice (Praat, CeVIO, Sinsy, etc.).

Each technology has its advantages and disadvantages: vocal libraries compatible with the first type of engine can be developed quickly on a good desktop computer, whereas the second type of engine requires very powerful computers (supercomputers), which means a much higher investment (hundreds of thousands of euros) than the funds we have.

Therefore, our choice naturally fell on the first type of technology, especially since such engines are freely available or can be licensed. This made it possible to quickly improve ALYS by modifying the vocal synthesis engine without radically changing the way it is used, while improving our internal processes. It was also a way to make our project more concrete, since we could quickly provide a voice to the public.

As I had been using various free solutions for several years, I chose UTAU after comparing it to other software such as Cadencii, NiaoNiao and NameWave. To me, UTAU is currently the most stable free interface, although it is optimized for Japanese and not for French. That is why ALYS’ vocal library is still very hard to use in this context, and it is also why we chose not to publish its prototypes.

This interface also has the advantage of letting users choose their own synthesis engine (or “resampler”, as UTAU users call it). This way, early in the project, we could use the widest possible set of freely available technologies. This is also why ALYS’ timbre changed between “Dans mon Monde” and “Avenir”: the engine was updated after the publication of the first song.

How were ALYS’ vocal libraries recorded? What did you use?

We recorded ALYS’ vocal libraries in July 2014. For the occasion, I used my own equipment: an AKG Perception 220 microphone and a Focusrite Saffire USB 6.0 audio interface.
The recording was done in two sessions, at the beginning and at the end of the month, which explains the slight difference in timbre between ALYS’ Japanese vocal library and her French vocal library. A person’s voice fluctuates a lot, which, even over a single month, can lead to noticeable differences in the final result (with very few means of controlling them).
As for the process itself: over the previous two months, I wrote the recording scripts (one per language) to get all the samples required for French and Japanese synthesis. Once at the recording location, we asked Poucet to pronounce the sounds in the order specified by the script, at different pitches.
As an anecdote, ALYS has another French vocal library that has never been used in public. Why? Simply because we had underestimated the setup time at the recording location, which was also not thoroughly soundproofed, and the absence of outside noise is essential for good recording quality.
For this reason, we decided to focus on recording the Japanese library first, and we scheduled new recording sessions for late July in order to achieve the best voice quality possible with the means we had at the time.

However, we have recently begun working on drastically improving ALYS’ voice quality by redoing this recording work from scratch, this time in much better conditions (a fully finalized script, excellent audio conditions, and thorough work with Poucet). We will not tell you more for the moment, but you should be satisfied with the result! Things are changing for the better! 😉

You said the first recording environment was noisy. Have the vocal libraries been post-processed?

Yes, and a lot! We first had to split all the recordings into separate files, since we had recorded long audio files directly in IL Edison. These files were then segmented in SoundForge 10, and the resulting samples were renamed according to a specific chart that I wrote myself. After that, I had to enhance the audio and repair what needed to be repaired. For both vocal libraries, some of Poucet’s consonants were a bit too long, so I shortened them when possible. This mostly concerned voiceless consonants such as /t/ and /s/. Obviously, this was not as easy for voiced consonants such as /z/, which therefore remained long. It also happened (very rarely, but still) that some consonants (especially /t/ and /p/) were mispronounced (or, in very rare cases, not pronounced at all). The last such case was a /t/ pronounced as a /d/ in the Japanese vocal library, and I had to fix that.

I also had to remove all lip and saliva sounds from the recordings.

The biggest part of this work, however, consisted in eliminating the static background noise caused by the place where we recorded ALYS, file by file: 790 audio files for the Japanese vocal library and 802 audio files for the French vocal library had to be edited in iZotope RX.
There were also various sounds that went unnoticed during recording, such as subtle chair noises during vowels, which I had to remove as well.
People who followed me on Twitter during this part of the development of the Japanese vocal library could see many screenshots showing the audio spectra of the recordings before and after cleaning.

How were ALYS’ libraries built? Which type of vocal libraries are they?

ALYS’ Japanese library was recorded following my own Japanese recording script (also called a reclist by UTAU users), written in VCV (vowel-consonant-vowel), with additional vowels such as /I/ and a distinction between /z/ and /dz/. This library also includes VC (vowel-consonant) entries in order to perform English words more easily (what some call Engrish: English sung with a Japanese library). For instance, in “Hajime Ni”, an [ak] note was used for the word “dark” in the sentence “a shot in the dark”. This vocal library was recorded on three main pitches. For vowels and vowel-vowel transitions, we also recorded additional pitches to widen ALYS’ range and make the transitions between the main pitches smoother.
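
To illustrate what VCV means in practice, here is a minimal sketch of how a sequence of sung syllables could be turned into VCV-style note names, where each note carries the vowel that precedes it. The function name, the use of romaji, and the "-" marker for the start of a phrase are assumptions made for this illustration; they are not taken from ALYS’ actual library.

```python
def to_vcv_aliases(syllables):
    """Prefix each syllable with the final sound of the previous one.

    Assumes romaji syllables that end in a vowel or in "n" (hypothetical
    input format chosen for this example).
    """
    aliases = []
    previous = "-"  # assumed marker for "nothing sung before this note"
    for syllable in syllables:
        aliases.append(f"{previous} {syllable}")
        previous = syllable[-1]  # vowel (or "n") leading into the next note
    return aliases


print(to_vcv_aliases(["ha", "ji", "me", "ni"]))
```

Because every note already contains the transition from the previous vowel, one note per syllable is usually enough with this kind of library.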

We also noticed afterward that the vocal library was recorded in a “kire” style: the voice is softer, almost whispered, on lower notes, whereas it is more powerful on higher pitches.

As for the French vocal library, it was recorded in CVVC (consonant-vowel and vowel-consonant). There are two main pitches and three additional pitches. Due to lack of time, I needed help from someone outside VoxWave to write the recording script (which I completed afterwards). Although the Japanese script could be generated by an algorithm, the French script had to be written by hand.
When using the prototype, I realized that additional recordings and transitions would have been needed, so I had to adapt my way of using it.

I also chose not to use UTAU’s built-in development tools, which I had already used but which seem too limiting. Instead, I used the software setParam to configure the oto.ini (written in Notepad) for both vocal libraries. setParam not only allows quick vocal library development, but also offers tools to check the quality of the oto.ini. Remember that at the time we absolutely wanted to release a song in September. setParam enabled me to develop enough of the French vocal library for “Dans mon Monde”, even though the library was still in alpha when the song was published!
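
For readers who have never opened an oto.ini, here is a small sketch of how one of its lines can be read. It assumes the commonly documented UTAU layout, `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`, with all values in milliseconds. The sample line, its values, and the names `OtoEntry` and `parse_oto_line` are invented for this example and do not come from ALYS’ libraries or from setParam.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    filename: str
    alias: str
    offset: float        # start of the usable part of the recording
    consonant: float     # length of the fixed (non-stretched) area
    cutoff: float        # end of the usable part
    preutterance: float  # how far the sample starts before the note
    overlap: float       # crossfade region with the previous note


def parse_oto_line(line):
    """Parse one oto.ini line, assuming the layout described above."""
    filename, _, params = line.strip().partition("=")
    alias, *numbers = params.split(",")
    values = [float(number) for number in numbers]
    if len(values) != 5:
        raise ValueError(f"expected 5 numeric fields, got {len(values)}")
    return OtoEntry(filename, alias, *values)


# Hypothetical line with made-up values.
entry = parse_oto_line("_akakikaku.wav=a ka,250,120,-400,80,30")
print(entry.alias, entry.preutterance)
```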

I also used Masao’s FRQ Editor to generate the frequency tables for ALYS’ recordings with world4utau, since this engine has the most powerful and accurate pitch estimation tool. After fixing the various issues I could detect with the FRQ Editor, I converted these frequency tables into the format used by ALYS’ engine.

How do ALYS’ libraries work? How do you use them?

ALYS’ libraries — especially the French one — are very complex to use.
Since the Japanese vocal library is recorded in VCV, I only need one or two notes per syllable. I use two notes when I need to use a vowel recorded on an extra pitch: I create a first note which is 100 ms long, then the rest of the syllable is done with another note using only this extra pitch. This is done when a note is between two main pitches, or above the highest main pitch, or below the lowest main pitch.
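
The two-note trick described above can be sketched as follows. The tuple layout (start, length, alias), the function name, and the alias used for the extra-pitch vowel are assumptions made for this example; only the 100 ms length of the first note comes from the explanation above.

```python
def split_for_extra_pitch(start_ms, length_ms, vcv_alias, vowel_alias, lead_ms=100):
    """Split one syllable into a short VCV note plus a sustained vowel note.

    Notes are represented as (start_ms, length_ms, alias) tuples, a
    hypothetical format used only for this illustration.
    """
    if length_ms <= lead_ms:
        # Too short to split: keep a single VCV note.
        return [(start_ms, length_ms, vcv_alias)]
    return [
        (start_ms, lead_ms, vcv_alias),                            # transition
        (start_ms + lead_ms, length_ms - lead_ms, vowel_alias),    # extra pitch
    ]


print(split_for_extra_pitch(0, 480, "a ka", "a"))
```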

But French is the most fun.

First of all, I must make ALYS pronounce all the CVVC transitions correctly, which cannot be done easily without solid CVVC know-how.
At the time of “Dans mon Monde”, I had already had the opportunity to work with some English libraries (including the one I made for Tashi, my own UTAU), but I was not experienced enough to make smooth transitions, and you can hear it in “Dans mon Monde”.
However, by using ALYS for several hours a day for over a year, I finally managed to master her vocal library. This progress becomes completely obvious when listening to “Sous cette pluie”.

Thus, each syllable requires at least two notes: a CV (consonant-vowel) note and a VC (vowel-consonant) note. However, I only use two notes for short syllables.
For long syllables, I add a third note for the vowel between the CV and VC notes. This vowel is a stationary recording taken from either one of the main pitches or an additional pitch.
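
As a rough illustration of this two-or-three-note rule, the sketch below builds the note list for one syllable. The threshold between short and long syllables, the note lengths, and the way the note names are written are all hypothetical values chosen for the example, not the settings actually used for ALYS.

```python
def cvvc_notes(consonant, vowel, next_consonant, length_ms,
               long_threshold_ms=400, cv_ms=120, vc_ms=60):
    """Return (alias, length_ms) notes for one syllable.

    Assumes length_ms is larger than vc_ms. All default durations and the
    alias spelling are made up for this illustration.
    """
    cv = f"{consonant}{vowel}"        # consonant-vowel note
    vc = f"{vowel} {next_consonant}"  # vowel-consonant transition note
    if length_ms < long_threshold_ms:
        # Short syllable: CV followed by VC.
        return [(cv, length_ms - vc_ms), (vc, vc_ms)]
    # Long syllable: a stationary vowel fills the space in between.
    sustain_ms = length_ms - cv_ms - vc_ms
    return [(cv, cv_ms), (vowel, sustain_ms), (vc, vc_ms)]


print(cvvc_notes("m", "o", "d", 200))
print(cvvc_notes("m", "o", "d", 600))
```

In both cases the note lengths add up to the length of the original syllable, so the rhythm of the melody is unchanged.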

How do you tune ALYS’ voice?

I simply sing the song myself and try to recreate the pitch and volume changes that I make, or that could sound good (in my opinion).
I also sometimes apply filters (called “flags”) to ALYS’ voice. In the case of “Avenir”, I felt that ALYS’ original voice did not suit the song, so I filtered it to make her sound more mature or adult. In one of ALYS’ upcoming songs, you will discover that by pushing this very same filter far enough, we can modify ALYS’ voice to make her sound more “androgynous”. But I will say no more!

ALYS’ pronunciation is based on my own pronunciation of French.
Some people have noticed that ALYS has an accent… It’s true! In fact, it is mine. I sometimes tend to swap open /O/ and closed /o/. The same goes for open /E/ and closed /e/. Well, “swap” only if one assumes that a “standard” French exists. People who share my accent will hear no accent in ALYS’ voice. Speaking French without an accent is a chimera.

Ultimately, that’s why there are as many ALYSes as there are artists: each artist can redesign ALYS visually, but musically, each composer also brings a number of specificities to ALYS, including their own accent!
We hope this article has provided the answers or confirmations you were expecting. ALYS’ vocal libraries are still at a very early stage, and their capabilities will increase dramatically in the coming months. Furthermore, ALYS remains much more than a vocal library: she has never been limited to it and never will be. We wish to create a much wider and more virtuous universe around her for all her users and fans.

Thank you Drak-pa for those answers. 🙂


You can already send us your creations and applications to create original content for ALYS. We confirm this, and we will give you more details about the steps to follow to submit proposals to the team, whether you are a composer, songwriter, illustrator or animator.

