23. Use Cases
- Voice Web Search
- Speech Command Interface
- Domain Specific Grammars Contingent on Earlier Inputs
- Continuous Recognition of Open Dialog
- Domain Specific Grammars Filling Multiple Input Fields
- Speech UI present when no visible UI need be present
- Voice Activity Detection
- Temporal Structure of Synthesis to Provide Visual Feedback
- Hello World
- Speech Translation
- Speech Enabled Email Client
- Dialog Systems
- Multimodal Interaction
- Speech Driving Directions
- Multimodal Video Game
- Multimodal Search
24. Use Cases
25. Use Cases
In other words, it's an API for building communication robots, right!?
26. Security and privacy considerations
1. User agents must only start speech input sessions with explicit, informed user consent. User consent can include, for example:
- User click on a visible speech input element which has an obvious graphical representation showing that it will start speech input.
- Accepting a permission prompt shown as the result of a call to SpeechRecognition.start.
- Consent previously granted to always allow speech input for this web page.
2. User agents must give the user an obvious indication when audio is being recorded.
- In a graphical user agent, this could be a mandatory notification displayed by the user agent as part of its chrome and not accessible by the web page. This could, for example, be a pulsating/blinking record icon as part of the browser chrome/address bar, an indication in the status bar, an audible notification, or anything else relevant and accessible to the user. This UI element must also allow the user to stop recording.
- In a speech-only user agent, the indication may, for example, take the form of the system speaking the label of the speech input element, followed by a short beep.
3. The user agent may also give the user a longer explanation the first time speech input is used, to let the user know what it is and how they can tune their privacy settings to disable speech recording if required.
4. To minimize the chance of users unwittingly allowing web pages to record speech without their knowledge, implementations must abort an active speech input session if the web page loses input focus to another window or to another tab within the same user agent.
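To make the consent and focus rules concrete, here is a rough JavaScript sketch, not part of the specification: it assumes the prefixed webkitSpeechRecognition constructor exposed by Blink-based browsers, and the 'mic-button' element id and handler wiring are purely illustrative.

  // Rule 1: only start recognition in response to an explicit user click.
  const SpeechRecognitionCtor = window.SpeechRecognition || window.webkitSpeechRecognition;
  const recognition = new SpeechRecognitionCtor();
  recognition.lang = 'en-US';

  // 'mic-button' is a hypothetical element with an obvious microphone icon.
  document.getElementById('mic-button').addEventListener('click', () => {
    recognition.start(); // the user agent may still show its own permission prompt
  });

  recognition.onresult = (event) => {
    console.log(event.results[0][0].transcript);
  };

  // Rule 4 is enforced by the user agent, but a page can also stop recording
  // proactively when it loses focus.
  window.addEventListener('blur', () => recognition.abort());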
27. Web Speech API
Speech Synthesis API: a web API for controlling text-to-speech output.
Speech Recognition API: a method to provide speech input in a web browser.
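A minimal hello-world sketch of the two halves; browser support and prefixes vary, so the webkitSpeechRecognition fallback below is an assumption rather than a guaranteed API.

  // Speech Synthesis API: text-to-speech output from the page.
  speechSynthesis.speak(new SpeechSynthesisUtterance('Hello, world'));

  // Speech Recognition API: speech input to the page.
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const recognizer = new Recognition();
  recognizer.onresult = (event) => console.log(event.results[0][0].transcript);
  recognizer.start(); // should be triggered by a user gesture, per the previous slide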
28. Speech Synthesis API
SpeechSynthesis attributes:
- pending
- speaking
- paused
SpeechSynthesis methods:
- speak
- cancel
- pause
- resume
- getVoices
SpeechSynthesisUtterance attributes:
- text
- lang
- voiceURI
- volume
- rate
- pitch
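A small sketch putting these attributes and methods together. The parameter values and voice selection are illustrative; getVoices() may return an empty list until the voiceschanged event fires, and note that voiceURI follows the draft listed on this slide (later revisions of the spec replaced it with a voice attribute).

  const utterance = new SpeechSynthesisUtterance();
  utterance.text = 'Good morning';
  utterance.lang = 'en-US';
  utterance.volume = 1.0; // 0.0 .. 1.0
  utterance.rate = 1.0;   // speaking-rate multiplier
  utterance.pitch = 1.2;  // 0 .. 2

  // Pick a voice by language (illustrative; the voice list may still be loading).
  const voice = speechSynthesis.getVoices().find((v) => v.lang === 'en-US');
  if (voice) {
    utterance.voiceURI = voice.voiceURI;
  }

  speechSynthesis.speak(utterance);
  console.log(speechSynthesis.pending, speechSynthesis.speaking, speechSynthesis.paused);

  // Queue control: pause, resume, or drop everything that is queued.
  speechSynthesis.pause();
  speechSynthesis.resume();
  // speechSynthesis.cancel();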