The document discusses the AT&T Speech API, which provides speech recognition services. It describes how the API works, including submitting audio files for transcription and receiving a JSON response. It also provides instructions for developers to get started, including signing up for an API key and examples of making authentication requests and posting audio files. Code samples are shown for capturing audio, making POST requests to submit it for transcription, and receiving the transcription response.
2. September 25, 2012
AT&T SPEECH API DEEP DIVE
Michael Owens (@mko on Twitter, mowens on GitHub)
Jay Lieske (jay.lieske@att.com, jayatyp on GitHub)
AT&T Developer Program
© 2012 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.
3. WHAT IS THE AT&T SPEECH API?
4. How the AT&T Speech API Works
5. Powered by AT&T WATSON
In development for 20+ years
Optimized for different usage scenarios:
Web Search
Business Search
Question & Answer
Voicemail-to-Text
Short Message (SMS)
TV Search/Remote (U-Verse)
Generic Speech-to-Text
6. Simple Speech-to-Text
One REST endpoint
Accepts audio in WAV or AMR
Structured JSON response
Text spoken by user
Metrics to evaluate recognition quality
AT&T Native SDKs for Android and iOS
handle audio capture and streaming
7. Apps in the Wild
AT&T Translator, Speak4it, U-Verse Easy Remote
8. GETTING STARTED WITH THE AT&T SPEECH API
9. Sign Up for API Access
j.mp/ATTDevSignUp
Free API access for DevLab attendees
Detailed instructions in your attendee packet
Sign up with code APILAB12
AT&T staff is on hand to answer questions and help get you set up
10. Before You Code
Get your API keys from the AT&T Developer Portal:
Client ID (API Key on the AT&T Developer Portal)
Client Secret (Secret Key on the AT&T Developer Portal)
OAuth 2.0 client_credentials grant type
OAuth 2.0 access_token
Audio File Types:
AMR: narrowband, 12.2 kbits/s, 8 kHz sampling
WAV: 16-bit PCM WAV, single channel, 8 kHz sampling
Audio File Length:
Voicemail: 4 minutes or less
Other: 1 minute or less
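The WAV requirements above can be checked before a file is uploaded. The following is a minimal Node.js sketch, assuming a canonical 44-byte header (a `fmt ` chunk immediately followed by the `data` chunk); files with extra chunks would need a proper chunk walker. The function `checkWav`, its `usage` argument, and the result shape are hypothetical. Duration is derived from the data size: at 8 kHz, 16-bit, mono, one second of audio is 16,000 bytes.

```javascript
// Sketch: validate a WAV buffer against the stated format (16-bit PCM, mono,
// 8 kHz) and length limits. Assumes the canonical 44-byte header layout:
// "RIFF" / "WAVE" / "fmt " chunk / "data" chunk.
const MAX_SECONDS = { voicemail: 4 * 60, other: 60 };

function checkWav(buf, usage) {
  if (buf.length < 44 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE' ||
      buf.toString('ascii', 12, 16) !== 'fmt ' ||
      buf.toString('ascii', 36, 40) !== 'data') {
    return { ok: false, seconds: 0, errors: ['not a WAV file with a canonical 44-byte header'] };
  }
  const errors = [];
  const audioFormat = buf.readUInt16LE(20);   // 1 = uncompressed PCM
  const channels = buf.readUInt16LE(22);
  const sampleRate = buf.readUInt32LE(24);
  const bitsPerSample = buf.readUInt16LE(34);
  const dataBytes = buf.readUInt32LE(40);     // size of the sample data

  if (audioFormat !== 1) errors.push('not PCM (format code ' + audioFormat + ')');
  if (channels !== 1) errors.push('expected 1 channel, got ' + channels);
  if (sampleRate !== 8000) errors.push('expected 8000 Hz, got ' + sampleRate);
  if (bitsPerSample !== 16) errors.push('expected 16 bits/sample, got ' + bitsPerSample);

  // Bytes per second = rate * channels * bytes per sample (16000 for 8 kHz 16-bit mono)
  const bytesPerSecond = sampleRate * channels * (bitsPerSample / 8);
  const seconds = bytesPerSecond > 0 ? dataBytes / bytesPerSecond : 0;
  const limit = usage === 'voicemail' ? MAX_SECONDS.voicemail : MAX_SECONDS.other;
  if (seconds > limit) errors.push('audio is ' + seconds.toFixed(1) + ' s, limit is ' + limit + ' s');

  return { ok: errors.length === 0, seconds: seconds, errors: errors };
}
```

Rejecting a file locally like this avoids spending an API call on audio the service will not accept.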
12. Step 2: POST Audio to AT&T (Non-Streaming HTTP Request)
Request Method: POST
Request URL: https://api.att.com/rest/1/SpeechToText
Request Headers:
  Accept: application/json
  Authorization: Bearer xxyz123
  Content-Type: audio/wav
  Content-Length: 1534
  X-SpeechContext: BusinessSearch
Request Body: AUDIO_BINARY_DATA
Note: The audio binary data goes directly in the POST body, not in a MIME attachment.
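The non-streaming request maps directly onto Node's built-in `https` module. This is a hedged sketch: `buildSpeechRequest` and `postAudio` are illustrative names, the host, path, and headers come from the slide, and treating any status other than 200 as an error is an assumption.

```javascript
const https = require('https');

// Request options for the non-streaming POST: the whole file is the raw body,
// so its byte length is known up front and sent as Content-Length.
function buildSpeechRequest(audio, accessToken, contentType, speechContext) {
  return {
    method: 'POST',
    hostname: 'api.att.com',
    path: '/rest/1/SpeechToText',
    headers: {
      'Accept': 'application/json',
      'Authorization': 'Bearer ' + accessToken,
      'Content-Type': contentType,        // 'audio/wav' or 'audio/amr'
      'Content-Length': audio.length,     // byte length of the audio Buffer
      'X-SpeechContext': speechContext    // e.g. 'BusinessSearch'
    }
  };
}

// Send the audio and pass the parsed JSON reply to callback(err, json).
// Treating any status other than 200 as an error is an assumption.
function postAudio(audio, accessToken, contentType, speechContext, callback) {
  const options = buildSpeechRequest(audio, accessToken, contentType, speechContext);
  const req = https.request(options, (res) => {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => {
      if (res.statusCode !== 200) {
        return callback(new Error('HTTP ' + res.statusCode + ': ' + body));
      }
      let json;
      try { json = JSON.parse(body); } catch (e) { return callback(e); }
      callback(null, json);
    });
  });
  req.on('error', callback);
  req.end(audio);  // raw bytes in the POST body, not a MIME attachment
}
```

A call would look like `postAudio(fs.readFileSync('audio.wav'), token, 'audio/wav', 'BusinessSearch', cb)`. Keeping the option building separate from the network call makes the header logic easy to inspect on its own.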
13. Step 2: POST Audio to AT&T (Streaming HTTP Request)
Request Method: POST
Request URL: https://api.att.com/rest/1/SpeechToText
Request Headers:
  Accept: application/json
  Authorization: Bearer xxyz123
  Content-Type: audio/amr
  Transfer-Encoding: chunked
  X-SpeechContext: QuestionAndAnswer
Request Body:
  200
  AUDIO_BINARY_DATA_CHUNK
  200
  AUDIO_BINARY_DATA_CHUNK
  0
Note: The numbers are chunk sizes in hexadecimal format. The recommended chunk size is 200 (hex), i.e. 512 bytes; a chunk size of 0 marks the end of the body.
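To make the chunk framing concrete, here is a sketch that lays out a buffer the way `Transfer-Encoding: chunked` puts it on the wire. In practice, Node's `http`/`https` client produces this framing itself when a POST is sent without a Content-Length and the audio is written in pieces; `frameChunked` is a hypothetical helper that only illustrates the format, using the recommended 0x200-byte chunks.

```javascript
// Frame a buffer as an HTTP/1.1 chunked body: each chunk is its size in
// hexadecimal, CRLF, the bytes, CRLF; a zero-size chunk ends the body.
const CHUNK_SIZE = 0x200;  // 512 bytes, the recommended chunk size

function frameChunked(audio, chunkSize = CHUNK_SIZE) {
  const parts = [];
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    const chunk = audio.subarray(offset, offset + chunkSize);
    parts.push(Buffer.from(chunk.length.toString(16) + '\r\n', 'ascii'));
    parts.push(chunk);
    parts.push(Buffer.from('\r\n', 'ascii'));
  }
  parts.push(Buffer.from('0\r\n\r\n', 'ascii'));  // last chunk plus empty trailer
  return Buffer.concat(parts);
}
```

For 1,000 bytes of audio this yields a `200` chunk (512 bytes) followed by a `1e8` chunk (the remaining 488 bytes) and the terminating `0`.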
14. AT&T SPEECH API EXAMPLE APPLICATION
Download the Source:
https://github.com/attdevsupport/2012DevLabExamples
15. Transcription in Three Steps
1. Capture Audio Input
Capturing audio input differs from platform to platform. In our Basic Example, we use a small Adobe Flex app to access the mic via Flash, capture the audio in one of the two accepted formats, then save the newly created audio file to disk on the server. In our Speech Labs, we will look at the methods by which you can capture and stream audio directly to the Speech API.
2. POST Audio to AT&T
Once the audio input has been captured, we send the compatible audio file from our server to the Speech API using a simple POST. In our Basic Example, we use a small Node.js module called Watson.js (NPM: watson-js) to authenticate with the Speech API via OAuth and then POST the audio file. In our Speech Labs, we will do this on iOS, Android, and Web.
3. Use AT&T API Response
The AT&T API sends back an easy-to-parse JSON object with the interpreted text. In our Basic Example, we output this to the user's screen, pretty-printed and syntax-highlighted, but you could do much more. In our Speech Labs, we will look at other ways to use this data, like searching for businesses on Foursquare.
16. Watson.js
Node.js API Wrapper for the AT&T
Speech API
GitHub: http://github.com/mowens/watson-js/
NPM: https://npmjs.org/package/watson-js
17. Using Watson.js
1. Require API Wrapper
var WatsonClient = require('watson-js');
2. Set API Client Options
var options = {
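client_id: process.env.ATT_API_CLIENT_ID,         // "API Key" on the Developer Portal
client_secret: process.env.ATT_API_CLIENT_SECRET, // "Secret Key" on the Developer Portal
access_token: process.env.ATT_ACCESS_TOKEN,       // OAuth access token (env var names are illustrative)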
scope: "SPEECH",
context: "Generic",
access_token_url: "https://api.att.com/oauth/token",
api_domain: "api.att.com"
};
3. Instantiate New API Client
var Watson = new WatsonClient.Watson(options);
18. The Methods of Watson.js
Watson.getAccessToken(callback)
Requests a new OAuth access token using the client credentials grant type and passes the returned access token to the callback function.
Watson.speechToText(speechFile, accessToken, callback)
Pipes a speech file (passed as an absolute file path) to the AT&T Speech API using the given access token. The API's JSON response is parsed and passed to the callback function.
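The client credentials grant behind `getAccessToken` is a single form-encoded POST. Below is a sketch of such a request using only Node built-ins; it is not watson-js's own code. The token URL and `SPEECH` scope come from the client options above; sending `client_id` and `client_secret` as form fields (which OAuth 2.0 permits) and reading `access_token` from the JSON reply are the assumed details, and `buildTokenRequest`/`requestAccessToken` are hypothetical names.

```javascript
const https = require('https');
const { URL, URLSearchParams } = require('url');

// Form-encoded body for an OAuth 2.0 client_credentials token request.
function buildTokenRequest(clientId, clientSecret, scope) {
  const body = new URLSearchParams({
    grant_type: 'client_credentials',
    client_id: clientId,
    client_secret: clientSecret,
    scope: scope
  }).toString();
  return {
    body: body,
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'Accept': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    }
  };
}

// POST the credentials and pass access_token from the JSON reply to callback(err, token).
function requestAccessToken(tokenUrl, clientId, clientSecret, scope, callback) {
  const request = buildTokenRequest(clientId, clientSecret, scope);
  const options = { method: 'POST', headers: request.headers };
  const req = https.request(new URL(tokenUrl), options, (res) => {
    let text = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { text += chunk; });
    res.on('end', () => {
      let json;
      try { json = JSON.parse(text); } catch (e) { return callback(e); }
      if (res.statusCode !== 200 || !json.access_token) {
        return callback(new Error('token request failed: HTTP ' + res.statusCode));
      }
      callback(null, json.access_token);
    });
  });
  req.on('error', callback);
  req.end(request.body);
}
```

With the options above, a call would be `requestAccessToken('https://api.att.com/oauth/token', clientId, clientSecret, 'SPEECH', cb)`. `URLSearchParams` handles the form encoding, so secrets containing `&` or spaces are escaped correctly.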
19. AT&T SPEECH API EXAMPLE APP CODE WALKTHROUGH
Using the AT&T Speech API to convert
generic audio to text in a web browser.
example-basic in the examples repo
20. Frameworks & Requirements:
Server-side:
Node.js: JavaScript platform for building fast, scalable network apps
fs: Node.js file system module
Express: Minimal web application framework for Node.js
Optimist: Lightweight option parsing module for Node.js
HBS: Express View Engine wrapper for Handlebars
Watson.js: Simple API Wrapper for AT&T Speech API
Client-side:
jQuery: The gold standard of client-side JavaScript libraries
swfobject: JavaScript to make embedding Flash objects easier
Bootstrap: Twitter's CSS framework for quickly developing web apps
21. Capture Audio Input
recorder.swf:
Adobe Flex app that accesses the user's microphone and emits events to JS
recorder.js:
JavaScript interface to receive events, update the UI, and POST the file to Node.js
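Node.js upload script (assumes an Express app with multipart body parsing, such as Express 3's bodyParser, populating req.files):
var fs = require('fs');
var path = require('path');

// Copy a file by reading it into memory and writing it back out
function cp(source, destination, callback) {
  fs.readFile(source, function(err, buf) {
    if (err) { return callback(err); }
    fs.writeFile(destination, buf, callback);
  });
}

// Save the upload under public/audio/, the directory the /speechToText route reads from
app.post('/upload', function(req, res) {
  var upload = req.files.upload_file.filename;
  cp(upload.path, path.join(__dirname, 'public', 'audio', upload.name), function(err) {
    if (err) { res.send({ error: err.message }); return; }
    res.send({ saved: 'saved' });
  });
});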
22. POST Audio to AT&T
AJAX Request via POST from client side to Node.js
// Receive an AJAX POST from client-side JavaScript
app.post('/speechToText', function(req, res) {
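// Pass the audio file and access token to AT&T Speech API.
// Use the token from the client options: `this` inside an Express route
// handler does not refer to the Watson client.
Watson.speechToText(__dirname + '/public/audio/audio.wav',
options.access_token, function(err, reply) {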
// Pass any errors associated with API call to client-side JS
if(err) { res.send({ error: err }); return; }
// Return the parsed JSON to client-side JavaScript
res.send(reply);
return;
});
});
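The route above needs a valid access token before its first request. One way to obtain it lazily and share it across requests is a small cache that fetches once and queues callers while that fetch is in flight. In this sketch `makeTokenCache` is hypothetical, and wiring it to `Watson.getAccessToken` assumes that method calls back with `(err, token)`, which the slides do not confirm.

```javascript
// Lazily fetch an access token once and share it across requests.
// fetchToken(cb) is any function that calls cb(err, token).
function makeTokenCache(fetchToken) {
  let token = null;
  let waiting = null;  // callbacks queued while a fetch is in flight

  function get(callback) {
    if (token) return process.nextTick(callback, null, token);
    if (waiting) return waiting.push(callback);
    waiting = [callback];
    fetchToken((err, newToken) => {
      const callbacks = waiting;
      waiting = null;
      if (!err) token = newToken;
      callbacks.forEach((cb) => cb(err, err ? undefined : newToken));
    });
  }

  // Call this after the API rejects a token (e.g. it expired) to force a refetch.
  function invalidate() { token = null; }

  return { get: get, invalidate: invalidate };
}

// Example wiring (assumes getAccessToken uses an error-first callback):
//   var tokens = makeTokenCache(function (cb) { Watson.getAccessToken(cb); });
//   tokens.get(function (err, token) {
//     if (err) { return res.send({ error: err }); }
//     Watson.speechToText(__dirname + '/public/audio/audio.wav', token, ...);
//   });
```

Queuing concurrent callers means two uploads arriving at once trigger a single token request rather than two.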
23. Use Speech API Response
Example API response, returned from a call with a Content-Type of application/json:

{
  "Recognition": {
    "ResponseId": "74a964bf2fe",
    "NBest": [ {
      "WordScores": [1, 0.75, 1, 0.75],
      "Confidence": 0.75,
      "Grade": "accept",
      "ResultText": "This is a test.",
      "Words": ["This", "is", "a", "test."],
      "LanguageId": "en-us",
      "Hypothesis": "This is a test."
    } ]
  }
}

What the response parameters mean:
Recognition: Body object for the AT&T Speech API response
ResponseId: Unique identifier for a specific API call
NBest: Array of hypothesis objects (possible transcriptions of the audio data)
ResultText: Plain-text, cleaned-up representation of the hypothesis. This should be used when displaying the text to users.
Confidence: Confidence score for the overall hypothesis, on a scale from 0 (not confident) to 1.0 (very confident)
Grade: Recommended action to take with the current hypothesis: accept, reject, or confirm
Words: Array of the individual words. Confidence scores for each word are available in the WordScores array.
WordScores: Array of individual confidence scores for each word in the ResultText parameter. Corresponds to the Words array.
LanguageId: Representation of the response language. Supports English & Spanish in Generic; English-only in other contexts.
Hypothesis: The raw transcription of the audio that was interpreted.
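The `Grade` field suggests how an app can branch on a result. This sketch takes the first `NBest` entry (assuming the list is ordered best first), pairs `Words` with `WordScores`, and maps the grade to an app-level action; `interpretRecognition` and the action names `use`, `ask-user`, and `retry` are hypothetical.

```javascript
// Turn a Speech API response (shaped like the example above) into an app decision.
// Assumes NBest[0] is the top-ranked hypothesis.
function interpretRecognition(response) {
  const recognition = response && response.Recognition;
  const nbest = recognition && Array.isArray(recognition.NBest) ? recognition.NBest : [];
  const best = nbest[0];
  if (!best) {
    return { action: 'retry', text: '', confidence: 0, words: [] };
  }

  // Words and WordScores correspond index by index.
  const scores = best.WordScores || [];
  const words = (best.Words || []).map((word, i) => ({ word: word, score: scores[i] }));

  // Grade is the API's recommendation: accept, confirm, or reject.
  const actions = { accept: 'use', confirm: 'ask-user', reject: 'retry' };
  return {
    action: actions[best.Grade] || 'retry',
    text: best.ResultText || '',  // ResultText is meant for display to users
    confidence: best.Confidence,
    words: words
  };
}
```

For the example response, this returns action `use` with text `This is a test.`; a `confirm` grade could prompt "Did you mean ...?" before acting.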
24. Up Next:
Michael Fitzpatrick
25. Up Next:
Jason Goecke
Adam Kalsey
26. ADVANCED EXAMPLES
What can you do with speech-to-text?
You could:
Make your mobile or web application accessible with voice commands
Post tweets using voice commands in a simple Twitter app
Add on-the-fly transcripts while recording in a podcasting app
Automatically add captions to videos hosted on your website
Create real-time closed captions of a conference speaker's presentation
Search for nearby places to check in at on Foursquare
27. Speech Labs
We're now going to break out into three clusters, each focusing on a different technology stack. Work independently or with a partner!
Web (Flex + Node.js)
In the Web Speech Lab, Michael will be on hand to help get your Node.js app working with the AT&T Speech API. Code up your own Speech API app from scratch, or start from a boilerplate app that uses Foursquare to search for locations and lets you check in from your web browser!
iOS (Objective-C)
In the iOS Speech Lab, Brant will help you try out the AT&T Speech API on iOS and go into more depth about the AT&T Speech SDK for iOS. The mobile SDK allows you to quickly capture and stream audio from your iPhone or iPad app to the AT&T Speech API.
Android (Java)
In the Android Speech Lab, Jay will help you try out the AT&T Speech API on Android and go into more depth about the AT&T Speech SDK for Android. The mobile SDK allows you to quickly capture and stream audio from your Android phone or tablet app to the AT&T Speech API.
28. September 25, 2012
THANKS! ANY QUESTIONS?
Michael Owens (@mko on Twitter, mowens on GitHub)
Jay Lieske (jay.lieske@att.com, jayatyp on GitHub)
AT&T Developer Program