VocaloidOtaku.net Forums - Providing Everything Vocaloid: Yamaha's new patent that could allow Vocaloid to respond to your questions

Yamaha's new patent that could allow Vocaloid to respond to your questions
Yup, possible chatbot integration for Vocaloid

#1 exemplar

  • Group: Members
  • Posts: 3,228
  • Joined: 10-March 12
  • Gender:Male

Posted 20 April 2017 - 01:25 PM

This was published today. It was filed back in December last year under the title "Technology for responding to remarks using speech synthesis". It is another patent that ties in with the ones I have dug up over the past few months regarding what Yamaha is planning to do with voice recognition in the editor (link & link). Instead of posting more jibber-jabber about it, I'll get into the meat of the patent.

Quote

Abstract: The present invention is provided with: a voice input section that receives a remark (a question) via a voice signal; a reply creation section that creates a voice sequence of a reply (response) to the remark; a pitch analysis section that analyzes the pitch of a first segment (e.g., word ending) of the remark; and a voice generation section (a voice synthesis section, etc.) that generates a reply, in the form of voice, represented by the voice sequence. The voice generation section controls the pitch of the entire reply in such a manner that the pitch of a second segment (e.g., word ending) of the reply assumes a predetermined pitch (e.g., five degrees down) with respect to the pitch of the first segment of the remark. Such arrangements can realize synthesis of replying voice capable of giving a natural feel to the user.
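
To make the pitch rule in the abstract concrete, here is a minimal sketch of my own (not code from the patent). It assumes "five degrees down" means a perfect fifth, taken here as 7 equal-tempered semitones, and that the reply's pitch contour is already available as a list of Hz values. The function names are made up for illustration.

```python
# Assumption: "five degrees down" is read as a perfect fifth (7 semitones).
SEMITONES_IN_FIFTH = 7


def target_pitch_hz(question_ending_hz, semitones_down=SEMITONES_IN_FIFTH):
    """Pitch the reply's ending should land on, in Hz."""
    return question_ending_hz * 2 ** (-semitones_down / 12)


def shift_reply(reply_contour_hz, question_ending_hz):
    """Scale the whole reply so its last pitch hits the target.

    Every frame is multiplied by the same ratio, so the reply keeps its
    own melody and only moves up or down as a block.
    """
    target = target_pitch_hz(question_ending_hz)
    ratio = target / reply_contour_hz[-1]
    return [f * ratio for f in reply_contour_hz]


if __name__ == "__main__":
    # Question ends at 300 Hz; the default reply would end at 160 Hz.
    print(shift_reply([200.0, 180.0, 160.0], 300.0))
```

Shifting the whole contour rather than just the last syllable matches the abstract's wording that the voice generation section "controls the pitch of the entire reply".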

Quote

TECHNICAL FIELD
The present invention relates to a speech or voice synthesis apparatus and system which, in response to a remark, question or utterance made by voice input, provide replying output, as well as a coding/decoding device related to the voice synthesis.

BACKGROUND


In recent years, the following voice synthesis techniques have been proposed. Examples of such proposed voice synthesis techniques include a technique that synthesizes and outputs voice corresponding to a speaking tone and voice quality of a user and thereby generates voice in a more human-like manner (see, for example, Patent Literature 1), and a technique that analyzes voice of a user to diagnose psychological and health states etc. of the user (see, for example, Patent Literature 2).

Also proposed in recent years is a voice interaction or dialogue system which implements voice interaction with a user by outputting, in synthesized voice, content designated by a scenario while recognizing voice input by the user (see, for example, Patent Literature 3).

Quote

Let's assume a dialogue system which combines the aforementioned voice synthesis technique and the voice interaction or dialogue system, and which searches for data in response to a question given by voice of a user (spoken question by the user) and outputs an answer or reply in synthesized voice. In such a case, however, there would occur a problem that the voice output by the voice synthesis gives the user an unnatural feeling, more specifically a feeling as if a machine were speaking.

Quote

SUMMARY OF INVENTION

In view of the foregoing, it is an object of the present invention to realize, in a technique for responding to a question or remark by use of voice synthesis, synthesis of responsive or replying voice capable of giving a natural feeling to a user. More specifically, the present invention seeks to provide a technique which can easily and controllably realize replying voice that gives a good impression to the user, replying voice that gives a bad impression, etc.

In studying a man-machine system which synthesizes voice of a reply to a question (or remark) given by a user, the inventors of the present invention etc. first considered what kinds of dialogues are actually conducted between persons, focusing on non-linguistic information (i.e., non-verbal information other than verbal or linguistic information) and particularly pitches (frequencies) characterizing dialogues.

Here, consider a dialogue between persons where one of the persons (hereinafter “person b”) returns a reply to a question given by the other person (hereinafter “person a”). Often, in such a case, when person a has uttered the question, not only person a but also person b, who is going to reply to the question, keeps in mind a pitch of a given segment of the question with a strong impression. In returning a reply to the question with a meaning of agreement, approval, affirmation or the like, person b utters replying voice in such a manner that a pitch of a portion characterizing the reply, such as the word ending or word beginning, of the reply assumes a predetermined relationship, more specifically a consonant interval relationship, with (with respect to) the pitch of the question having impressed the person. The inventors etc. thought that, because the pitch which left an impression in the mind of person a about his or her question and the pitch of the portion characterizing the reply of person b are in the above-mentioned relationship, person a would have a comfortable and easing good impression about the reply of person b.
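
For anyone wondering what a "consonant interval relationship" between two pitches looks like in numbers, here is a rough sketch. The set of consonant intervals below is my assumption, taken from conventional Western music theory (unison, thirds, fourth, fifth, sixths, octave); the quoted text does not list them, and the helper names are hypothetical.

```python
import math

# Assumption: consonant interval sizes in semitones, folded into one octave.
CONSONANT_SEMITONES = {0, 3, 4, 5, 7, 8, 9}


def interval_semitones(question_hz, reply_hz):
    """Signed distance from the question pitch to the reply pitch."""
    return 12 * math.log2(reply_hz / question_hz)


def is_consonant(question_hz, reply_hz, tolerance=0.3):
    """True if the two pitches sit close to a consonant interval."""
    distance = interval_semitones(question_hz, reply_hz)
    nearest = round(distance)
    if abs(distance - nearest) > tolerance:
        return False  # too far from any whole-semitone interval
    return abs(nearest) % 12 in CONSONANT_SEMITONES


if __name__ == "__main__":
    print(is_consonant(300.0, 200.0))                # a fifth below
    print(is_consonant(300.0, 300.0 * 2 ** (-0.5)))  # a tritone below
```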

Further, people have communicated with one another for a long time from the ancient times when there was no language. It is presumed that pitch and volume of human voice have played a very important role in human communications under such environment. It is also presumed that, although voice-pitch-based communications are forgotten in these modern times when languages have developed, “predetermined pitch relationship” used from the ancient times can give a “somehow comfortable” feel because such a predetermined pitch relationship has been inscribed in the human DNA and handed down to the present times.

Quote

Namely, in order to achieve the aforementioned objects, one aspect of the present invention provides a voice synthesis apparatus comprising: a voice input section configured to receive a voice signal of a remark; a pitch analysis section configured to analyze a pitch of a first segment of the remark; an acquisition section configured to acquire a reply to the remark; and a voice generation section configured to generate voice of the reply acquired by the acquisition section, the voice generation section controlling a pitch of the voice of the reply in such a manner that a second segment of the reply has a pitch associated with the pitch of the first segment analyzed by the pitch analysis section.
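
The four "sections" in that claim can be read as a small pipeline. The sketch below is only my interpretation: it skips real audio and speech recognition, works on pitch contours (Hz per frame, 0 for unvoiced frames), treats the last few frames as the "segment", and again assumes a fifth down (7 semitones). `Reply`, `ending_pitch` and `respond` are names I invented.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, List


@dataclass
class Reply:
    text: str
    contour_hz: List[float]  # per-frame pitch of the unadjusted reply


def ending_pitch(contour_hz, tail_frames=5):
    """Pitch analysis: median of the voiced frames at the end."""
    voiced = [f for f in contour_hz[-tail_frames:] if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in the final segment")
    return median(voiced)


def respond(remark_text: str,
            remark_contour_hz: List[float],
            acquire: Callable[[str], Reply],
            semitones_down: float = 7.0) -> Reply:
    """Voice input -> pitch analysis -> acquisition -> voice generation."""
    first_segment_hz = ending_pitch(remark_contour_hz)   # pitch analysis
    reply = acquire(remark_text)                         # acquisition
    second_segment_hz = ending_pitch(reply.contour_hz)
    target_hz = first_segment_hz * 2 ** (-semitones_down / 12)
    ratio = target_hz / second_segment_hz
    # Voice generation: shift the whole reply; unvoiced frames stay at 0.
    return Reply(reply.text, [f * ratio for f in reply.contour_hz])


if __name__ == "__main__":
    canned = lambda text: Reply("naruhodo", [200.0, 200.0, 200.0])
    out = respond("ashita wa hare?", [0, 210, 230, 250, 300, 300, 300], canned)
    print(out.text, out.contour_hz)
```

Passing `acquire` in as a function keeps the reply source open, which fits the claim: it only says the reply is "acquired", not how.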

According to such an embodiment of the invention, it is possible to prevent the voice of the reply, synthesized in response to the input voice signal of a question (remark), from being accompanied by an unnatural feel. Note that the reply to the question (remark) is not limited to a specific or concrete reply and may sometimes be in the form of back-channel feedback (interjection), such as “eec” (romanized Japanese meaning “Yah.”), “naruhodo” (“I see.”) or “sou desune” (“I agree.”). Further, the reply is not limited to one in human voice and may sometimes be in the form of voice of an animal, such as “wan” (“bowwow”) or “Nyâ” (“meow”). Namely, the terms “reply” and “voice” are used herein to refer to concepts embracing not only voice uttered by a person but also voice of an animal.

You can read the full patent over at http://www.freshpate...20170110111.php

#2 sleepysheep7

  • GET OFF MY LAWN YOU WHIPPERSNAPPERS!
  • Group: VO+ Members
  • Posts: 9,776
  • Joined: 02-May 10
  • Gender:Male
  • Producers:OSTER-P

Posted 21 April 2017 - 01:27 AM

So if I am understanding this correctly, it is basically like Siri, Cortana, or whatever Google calls their voice question thing?
♈♉♊♌♍♎♏♐♑♒♓



.....IDK

#3 AyuRox

  • Group: Moderators
  • Posts: 3,400
  • Joined: 11-July 10
  • Gender:Male
  • Location:NC, USA

Posted 21 April 2017 - 01:35 AM

sleepysheep7, on 20 April 2017 - 09:27 PM, said:

So if I am understanding this correctly, it is basically like Siri, Cortana, or whatever Google calls their voice question thing?

Basically what I thought, too.

Not that it isn't cool, but it doesn't seem to be, like, groundbreaking.
My YouTube
My SoundCloud
My WIP Thread

Set by PinkShinigami @ Ayumi Hamasaki Sekai

#4 Somebodyrandom

  • Group: Members
  • Posts: 5,641
  • Joined: 04-May 10
  • Gender:Female
  • Producers:Shu-tP

Posted 21 April 2017 - 10:03 AM

I think I said this in the last patent topic... but Yamaha wants to add AI to Vocaloid somehow.

No details tho. The new guy who took over after V4 was released is really keen, so who knows where Vocaloid is headed?
Youtube poster 1; what does the [/] do?

Youtube poster 2; I'm guessing that it breaks the word in half.
