September 25, 2012




AT&T SPEECH API DEEP DIVE
                      Michael Owens (@mko on Twitter, mowens on GitHub)
                      Jay Lieske (jay.lieske@att.com, jayatyp on GitHub)




AT&T Developer Program
© 2012 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.
WHAT IS THE
    AT&T SPEECH API?




How the AT&T Speech API Works




Powered by AT&T WATSON
     In development for 20+ years
     Optimized for different usage scenarios:
       Web Search
       Business Search
       Question & Answer
       Voicemail-to-Text
       Short Message (SMS)
       TV Search/Remote (U-Verse)
       Generic Speech-to-Text
Simple Speech-to-Text
     One REST endpoint
     Accepts audio in WAV or AMR
     Structured JSON response
        Text spoken by user
        Metrics to evaluate recognition quality
     AT&T Native SDKs for Android and iOS
     handle audio capture and streaming




Apps in the Wild




    AT&T Translator        Speak4it        U-Verse Easy Remote



GETTING STARTED
    WITH THE AT&T
    SPEECH API




Sign Up for API Access
     j.mp/ATTDevSignUp
     Free API Access for
     DevLab Attendees
     Detailed Instructions in
     your Attendee Packet
     Sign up with code
     APILAB12
     AT&T Staff is on hand to
     answer questions and
     help get you set up

Before You Code
     Get your API Keys from Developer portal:
       Client ID (API Key on the AT&T Developer Portal)
       Client Secret (Secret Key on the AT&T Developer Portal)
     OAuth 2.0 client_credentials grant type
     OAuth 2.0 access_token
     Audio File Types:
       AMR: narrowband, 12.2 kbits/s, 8 kHz sampling
       WAV: 16 bit PCM WAV, single channel, 8 kHz sampling
     Audio File Length:
       Voicemail: 4 minutes or less
       Other: 1 minute or less
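
A quick way to catch a wrong audio format before uploading is to read the WAV header. The sketch below checks for the WAV format listed above (16 bit PCM, single channel, 8 kHz). The function names are hypothetical, and it assumes the canonical 44-byte header in which the "fmt " chunk comes right after "RIFF....WAVE"; files with extra chunks in front would need a real parser.

```javascript
// Read the format fields from a canonical 44-byte WAV header.
// Returns null when the buffer does not look like such a header.
function describeWav(buf) {
  if (buf.length < 44 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE' ||
      buf.toString('ascii', 12, 16) !== 'fmt ') {
    return null;
  }
  return {
    audioFormat: buf.readUInt16LE(20),   // 1 means uncompressed PCM
    channels: buf.readUInt16LE(22),
    sampleRate: buf.readUInt32LE(24),
    bitsPerSample: buf.readUInt16LE(34)
  };
}

// True only for 16 bit PCM, single channel, 8 kHz audio.
function isSpeechApiWav(buf) {
  var info = describeWav(buf);
  return info !== null &&
    info.audioFormat === 1 &&
    info.channels === 1 &&
    info.sampleRate === 8000 &&
    info.bitsPerSample === 16;
}
```

Typical use would be `isSpeechApiWav(fs.readFileSync(path))` on the server before the file is sent on.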


Step 1: Connect via OAuth
    Request Method:  POST
    Request URL:     https://api.att.com/oauth/token
    Request Headers: Content-Type: application/x-www-form-urlencoded
    Request Body:    client_id=ATT_API_CLIENT_ID
                     &client_secret=ATT_API_CLIENT_SECRET
                     &grant_type=client_credentials
                     &scope=SPEECH

    Response Body:   {
                       "access_token": "xxyz123"
                     }
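
The token request above can be sketched as plain data in Node.js. The helper names are hypothetical, and the sketch assumes a Node.js version that provides the global `URLSearchParams`; only the URL, header, and form fields come from the slide.

```javascript
// Build the Step 1 request: a form-encoded POST with the four fields
// shown on the slide.
function buildTokenRequest(clientId, clientSecret) {
  var body = new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret,
    grant_type: 'client_credentials',
    scope: 'SPEECH'
  }).toString();

  return {
    method: 'POST',
    url: 'https://api.att.com/oauth/token',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: body
  };
}

// Pull the token out of the JSON response body.
function parseTokenResponse(responseText) {
  var parsed = JSON.parse(responseText);
  if (!parsed.access_token) {
    throw new Error('No access_token in OAuth response');
  }
  return parsed.access_token;
}
```

Keeping the request as data makes it easy to hand to whichever HTTP client the app already uses.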




Step 2: POST Audio to AT&T (Non-Streaming HTTP Request)
    Request Method:  POST
    Request URL:     https://api.att.com/rest/1/SpeechToText
    Request Headers: Accept: application/json
                     Authorization: Bearer xxyz123
                     Content-Type: audio/wav
                     Content-Length: 1534
                     X-SpeechContext: BusinessSearch
    Request Body:    AUDIO_BINARY_DATA
    Note: The Audio Binary Data goes directly in the POST Body,
    not as a MIME Attachment.
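
As a rough sketch, the non-streaming request can be expressed as an options object in the shape Node's `https.request()` accepts. The function name and the `speechContext` parameter are illustrative; the host, path, and headers are the ones listed on the slide.

```javascript
// Build request options for a non-streaming SpeechToText call.
// `audio` is a Buffer holding the complete WAV file.
function buildSpeechRequest(accessToken, audio, speechContext) {
  return {
    method: 'POST',
    hostname: 'api.att.com',
    path: '/rest/1/SpeechToText',
    headers: {
      'Accept': 'application/json',
      'Authorization': 'Bearer ' + accessToken,
      'Content-Type': 'audio/wav',
      'Content-Length': audio.length,      // byte length of the Buffer
      'X-SpeechContext': speechContext
    }
  };
}

// Usage (not executed here): the raw bytes are the entire body.
//   var https = require('https');
//   var req = https.request(buildSpeechRequest(token, audio, 'BusinessSearch'), onResponse);
//   req.end(audio);
```

Because the body is the raw audio, there is no multipart boundary or form field name to set.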


Step 2: POST Audio to AT&T (Streaming HTTP Request)
    Request Method:  POST
    Request URL:     https://api.att.com/rest/1/SpeechToText
    Request Headers: Accept: application/json
                     Authorization: Bearer xxyz123
                     Content-Type: audio/amr
                     Transfer-Encoding: chunked
                     X-SpeechContext: QuestionAndAnswer
    Request Body:    200
                     AUDIO_BINARY_DATA_CHUNK
                     200
                     AUDIO_BINARY_DATA_CHUNK
                     0
    Note: Numbers are the recommended chunk size in hexadecimal format.
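
To make the chunk sizes concrete: 200 in hexadecimal is 512 bytes. The sketch below splits an audio Buffer into 512-byte pieces and frames them by hand in HTTP chunked encoding (hex length, CRLF, data, CRLF, then a zero-length chunk). The helper names are hypothetical. In practice Node's `http`/`https` clients add this framing themselves when you call `req.write()` on a request without a Content-Length, so the manual framing here is only for illustration.

```javascript
var CHUNK_SIZE = 0x200; // 512 bytes, the size shown on the slide

// Split a Buffer into pieces of at most `size` bytes.
function splitIntoChunks(audio, size) {
  var chunks = [];
  for (var offset = 0; offset < audio.length; offset += size) {
    chunks.push(audio.subarray(offset, offset + size));
  }
  return chunks;
}

// Frame the pieces as a chunked-encoding body.
function frameChunked(chunks) {
  var parts = [];
  chunks.forEach(function (chunk) {
    parts.push(Buffer.from(chunk.length.toString(16) + '\r\n', 'ascii'));
    parts.push(chunk);
    parts.push(Buffer.from('\r\n', 'ascii'));
  });
  parts.push(Buffer.from('0\r\n\r\n', 'ascii')); // terminating chunk
  return Buffer.concat(parts);
}
```

Only the last data chunk is normally shorter than 512 bytes, so its hex length differs from the `200` shown above.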
AT&T SPEECH API
    EXAMPLE
    APPLICATION
    Download the Source:
    https://github.com/attdevsupport/2012DevLabExamples




Transcription in Three Steps
    1. Capture Audio Input
    Capturing audio input differs from platform to platform.
    In our Basic Example, we use a small Adobe Flex app to access the mic
    via Flash, capture the audio in one of the two accepted formats, then
    save that newly created audio file to disk on the server.
    In our Speech Labs, we will look at the methods by which you can
    capture and stream audio directly to the Speech API.

    2. POST Audio to AT&T
    Once the audio input has been captured, we send the compatible audio
    file from our server to the Speech API using a simple POST.
    In our Basic Example, we use a small Node.js module called Watson.js
    (NPM: watson-js) to authenticate with the Speech API via OAuth and
    then POST the audio file.
    In our Speech Labs, we will do this on iOS, Android, and Web.

    3. Use AT&T API Response
    The AT&T API sends back an easy-to-parse JSON object with the
    interpreted text.
    In our Basic Example, we output this to the user's screen pretty
    printed and syntax highlighted, but you could do much more.
    In our Speech Labs, we will look at other ways to use this data, like
    searching for businesses on Foursquare.




Watson.js
    Node.js API Wrapper for the AT&T
    Speech API

     GitHub: http://github.com/mowens/watson-js/
     NPM: https://npmjs.org/package/watson-js




Using Watson.js
    1. Require API Wrapper
          var WatsonClient = require('watson-js');

    2. Set API Client Options
          var options = {
              client_id: ATT_API_CLIENT_ID,
              client_secret: ATT_API_CLIENT_SECRET,
              access_token: ACCESS_TOKEN,
              scope: "SPEECH",
              context: "Generic",
              access_token_url: "https://api.att.com/oauth/token",
              api_domain: "api.att.com"
           };

    3. Instantiate New API Client
          var Watson = new WatsonClient.Watson(options);

The Methods of Watson.js
    Watson.getAccessToken(callback)
    Requests a new OAuth Access Token using the Client Credentials
    grant type and passes the returned Access Token to the callback
    function.


    Watson.speechToText(speechFile, accessToken, callback)
    Pipes a speech file (passed as an absolute file location) to the
    AT&T Speech API using the given access token. The API response is
    passed to the callback function as parsed JSON.
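
The two methods are meant to be chained: get a token, then transcribe. Here is a minimal sketch of that flow. The `transcribe` helper is hypothetical, and it assumes the `getAccessToken` callback follows the Node convention of `(err, accessToken)`, which the slide does not spell out; the `(err, reply)` shape for `speechToText` matches the example app code.

```javascript
// Fetch a token, then transcribe one file.
// `client` is any object exposing getAccessToken and speechToText,
// such as an instance created with new WatsonClient.Watson(options).
function transcribe(client, speechFile, callback) {
  client.getAccessToken(function (err, accessToken) {
    if (err) { return callback(err); }
    client.speechToText(speechFile, accessToken, function (err, reply) {
      if (err) { return callback(err); }
      callback(null, reply); // reply is the parsed JSON response
    });
  });
}
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before wiring up real credentials.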



AT&T SPEECH API
    EXAMPLE APP CODE
    WALKTHROUGH
    Using the AT&T Speech API to convert
    generic audio to text in a web browser.
    example-basic in the examples repo




Frameworks &
    Requirements:
    Server-side:
     Node.js:                                  JavaScript platform for building fast, scalable network apps
     FS:                                       Node.js File System module
     Express:                                  Minimal web application framework for Node.js
     Optimist:                                 Lightweight option parsing module for Node.js
     HBS:                                      Express View Engine wrapper for Handlebars
     Watson.js:                                Simple API Wrapper for AT&T Speech API

    Client-side:
     jQuery:                                   The gold standard of client-side JavaScript libraries
     swfobject:                                JavaScript to make embedding Flash objects easier
     Bootstrap:                                Twitter's CSS framework for quickly developing web apps


Capture Audio Input
    recorder.swf:
            Adobe Flex app that accesses the user's microphone and emits events to JS
    recorder.js:
            JavaScript interface to receive events, update UI, and POST file to Node.js
    Node.js upload script:
            var fs = require('fs');

            // Copy a file by reading it fully, then writing it out
            function cp(source, destination, callback) {
              fs.readFile(source, function(err, buf) {
                if (err) { return callback(err); }
                fs.writeFile(destination, buf, callback);
              });
            }
            // Save the uploaded recording and report the result
            app.post('/upload', function(req, res) {
              var file = req.files.upload_file.filename;
              cp(file.path, __dirname + file.name, function(err) {
                if (err) { res.send({ error: err }); return; }
                res.send({ saved: 'saved' });
              });
            });

POST Audio to AT&T
    AJAX Request via POST from client side to Node.js
    // Receive an AJAX POST from client-side JavaScript
    app.post('/speechToText', function(req, res) {

      // Pass the audio file and access token to AT&T Speech API
      Watson.speechToText(__dirname + '/public/audio/audio.wav',
      this.access_token, function(err, reply) {

           // Pass any errors associated with API call to client-side JS
           if(err) { res.send({ error: err }); return; }

           // Return the parsed JSON to client-side JavaScript
           res.send(reply);
           return;

      });

    });


Use Speech API Response
    Example API Response, returned from call using Content-Type of
    application/json:

    {
      "Recognition": {
        "ResponseId": "74a964bf2fe",
        "NBest": [ {
          "WordScores": [1, 0.75, 1, 0.75],
          "Confidence": 0.75,
          "Grade": "accept",
          "ResultText": "This is a test.",
          "Words": ["This", "is", "a", "test."],
          "LanguageId": "en-us",
          "Hypothesis": "This is a test."
        } ]
      }
    }

    Response Parameter: What the Response Parameter Means
     Recognition:   Body object for the AT&T Speech API Response
     ResponseId:    Unique Identifier for a specific API call
     NBest:         Array of hypothesis objects (possible transcriptions of audio data).
     ResultText:    Plain-text, cleaned up representation of the Hypothesis. This should be used when displaying the text to users.
     Confidence:    Confidence score for the overall Hypothesis. Scored on a scale from 0 (not confident) to 1.0 (very confident)
     Grade:         Recommended action to take with the current Hypothesis: accept, reject, or confirm
     Words:         Array of the individual words. Confidence scores for each word are available in the WordScores array.
     WordScores:    Array of individual confidence scores for each word in the ResultText parameter. Corresponds to Words array.
     LanguageId:    Representation of the response language. Supports English & Spanish in Generic; English-only in other contexts.
     Hypothesis:    The raw transcription of the audio that was interpreted.
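
One possible way to consume this response is sketched below: pick the hypothesis with the highest Confidence, and pair each word with its score. Both function names are hypothetical, and choosing by Confidence (rather than relying on the order of NBest) is an assumption made for this sketch, not something the slide specifies.

```javascript
// Return the NBest entry with the highest Confidence, or null when
// the response carries no hypotheses.
function pickBestHypothesis(response) {
  var nbest = response && response.Recognition && response.Recognition.NBest;
  if (!Array.isArray(nbest) || nbest.length === 0) {
    return null;
  }
  return nbest.reduce(function (best, candidate) {
    return candidate.Confidence > best.Confidence ? candidate : best;
  });
}

// Pair each entry of Words with the matching entry of WordScores.
function wordsWithScores(hypothesis) {
  return hypothesis.Words.map(function (word, i) {
    return { word: word, score: hypothesis.WordScores[i] };
  });
}
```

An app would typically show `ResultText` when `Grade` is "accept" and ask the user to repeat or confirm otherwise.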


Up Next:




                                     Michael Fitzpatrick

Up Next:




                                                          Jason Goecke
                                                           Adam Kalsey
ADVANCED
    EXAMPLES
    What can you do with Speech-to-text?
     You could:
      Make your mobile or web application accessible with voice commands
      Post tweets using voice commands in a simple Twitter app
      Add on-the-fly transcripts while recording in a podcasting app
      Add captioning to videos hosted on your website automatically
      Create real-time closed captions of a conference speaker's presentation
      Search for nearby places to check in at on Foursquare




Speech Labs
    We're now going to break out into three clusters, each focusing on a
    different technology stack. Work independently or with a partner!

    Web (Flex + Node.js)
    In the Web Speech Lab, Michael will be on hand to help get your
    Node.js app working with the AT&T Speech API. Code up your own
    Speech API app from scratch, or start from a boilerplate app that
    uses Foursquare to search for locations and lets you check in from
    your web browser!

    iOS (Objective-C)
    In the iOS Speech Lab, Brant will help you try out the AT&T Speech
    API on iOS and go into more depth about the AT&T Speech SDK for iOS.
    The mobile SDK allows you to quickly capture and stream audio from
    your iPhone or iPad app to the AT&T Speech API.

    Android (Java)
    In the Android Speech Lab, Jay will help you try out the AT&T Speech
    API on Android and go into more depth about the AT&T Speech SDK for
    Android. The mobile SDK allows you to quickly capture and stream
    audio from your Android phone or tablet app to the AT&T Speech API.




September 25, 2012




THANKS! ANY QUESTIONS?
                      Michael Owens (@mko on Twitter, mowens on GitHub)
                      Jay Lieske (jay.lieske@att.com, jayatyp on GitHub)




                                                                                                                                  AT&T Developer Program
   息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property.

More Related Content

AT&T 2012 DevLab Speech API Deep Dive

  • 2. September 25, 2012 AT&T SPEECH API DEEP DIVE Michael Owens (@mko on Twitter, mowens on Github) Jay Lieske ( jay.lieske@att.com, jayatyp on Github) AT&T Developer Program 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property.
  • 3. WHAT IS THE AT&T SPEECH API? 2 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 4. How the AT&T Speech API Works 2 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 5. Powered by AT&T WATSON Developed 20+ years Optimized for different usage scenarios: Web Search Business Search Question & Answer Voicemail-to-Text Short Message (SMS) TV Search/Remote (U-Verse) Generic Speech-to-Text 2 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 6. Simple Speech-to-Text One REST endpoint Accepts audio in WAV or AMR Structured JSON response Text spoken by user Metrics to evaluate recognition quality AT&T Native SDKs for Android and iOS handle audio capture and streaming 2 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 7. Apps in the Wild AT&T-Translator Speak4it U4Verse-Easy-Remote 2 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 8. GETTING STARTED WITH THE AT&T SPEECH API 3 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 9. Sign Up for API Access j.mp/ATTDevSignUp Free API Access for DevLab Attendees Detailed Instructions in your Attendee Packet Sign up with code APILAB12 AT&T Staff is on hand to answer questions and help get you set up 2 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 10. Before You Code Get your API Keys from Developer portal: Client ID (API Key on the AT&T Developer Portal) Client Secret (Secret Key on the AT&T Developer Portal) OAuth 2.0 client_credentials grant type OAuth 2.0 access_token Audio File Types: AMR: narrowband, 12.2 kbits/s, 8 kHz sampling WAV: 16 bit PCM WAV, single channel, 8 kHz sampling Audio File Length: Voicemail: 4 minutes or less Other: 1 minute or less 2 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 11. Step 1: Connect via OAuth Request Method: POST Request URL: https://api.att.com/oauth/token Request Headers: Content-Type: application/x-www-form- urlencoded Request Body: client_id=ATT_API_CLIENT_ID &client_secret=ATT_API_CLIENT_SECRET &grant_type=client_credentials &scope=SPEECH Response Body: { "access_token": "xxyz123" } 2 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 12. Step 2: POST Audio to AT&T (Non-Streaming HTTP Request) Request Method: POST Request URL: https://api.att.com/rest/1/SpeechToText Request Headers: Accept: application/json Authorization: Bearer xxyz123 Content-Type: audio/wav Content-Length: 1534 X-SpeechContext: BusinessSearch Request Body: AUDIO_BINARY_DATA Note: The Audio Binary Data goes directly in POST Body, not a MIME Attachment. 2 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 13. Step 2: POST Audio to AT&T (Streaming HTTP Request) Request Method: POST Request URL: https://api.att.com/rest/1/SpeechToText Request Headers: Accept: application/json Authorization: Bearer xxyz123 Content-Type: audio/amr Transfer-Encoding: chunked X-SpeechContext: QuestionAndAnswer Request Body: 200 Note: Numbers are the AUDIO_BINARY_DATA_CHUNK recommended chunk size 200 in hexadecimal format. AUDIO_BINARY_DATA_CHUNK 0 2 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 14. AT&T SPEECH API EXAMPLE APPLICATION Download the Source: https://github.com/attdevsupport/2012DevLabExamples 4 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 15. Transcription in Three Steps 1. Capture Audio Input 2. POST Audio to AT&T 3. Use AT&T API Response Capturing audio input differs Once the audio input has been The AT&T API sends back a very from platform to platform. captured, we send the easy to parse JSON object with compatible audio 鍖le from our the interpreted text. In our Basic Example, we use a server to the Speech API using small Adobe Flex app to access In our Basic example, we a simple POST. the mic via Flash, capture the output this to the users screen audio in one of the two In our Basic Example, we use a pretty printed and syntax accepted formats, then save small Node.js module called highlighted, but you could do that newly created audio 鍖le to Watson.js (NPM: watson-js) much more. disk on the server. to OAuth to the Speech API In our Speech Labs, we will look and then POST the audio 鍖le. In our Speech Labs, we will look at other ways to use this data, at the methods by which you In our Speech Labs, we will do like searching for businesses can capture and stream audio this on iOS, Android, and Web. on Foursquare. directly to the Speech API. 2 息"2012"AT&T"Intellectual"Property."All"rights"reserved."AT&T"and"the"AT&T"logo"are"trademarks"of"AT&T"Intellectual"Property. AT&T Developer Program
  • 16. Watson.js: Node.js API Wrapper for the AT&T Speech API GitHub: http://github.com/mowens/watson-js/ NPM: https://npmjs.org/package/watson-js
  • 17. Using Watson.js
      1. Require API Wrapper
           var WatsonClient = require('watson-js');
      2. Set API Client Options
           var options = {
             client_id: ATT_API_CLIENT_ID,
             client_secret: ATT_API_CLIENT_SECRET,
             access_token: ACCESS_TOKEN,
             scope: "SPEECH",
             context: "Generic",
             access_token_url: "https://api.att.com/oauth/token",
             api_domain: "api.att.com"
           };
      3. Instantiate New API Client
           var Watson = new WatsonClient.Watson(options);
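One way to fill in the options object without hard-coding credentials is to read them from the environment. The sketch below is an assumption-laden illustration: the helper `buildWatsonOptions` and the environment variable names are hypothetical, and whether the wrapper accepts an options object with no access token set has not been checked against Watson.js.

```javascript
// Build the options object from the slide out of an environment-like map.
// Variable names here are hypothetical; adjust them to your deployment.
function buildWatsonOptions(env) {
  const required = ['ATT_API_CLIENT_ID', 'ATT_API_CLIENT_SECRET'];
  const missing = required.filter(function (name) { return !env[name]; });
  if (missing.length > 0) {
    throw new Error('Missing environment variables: ' + missing.join(', '));
  }
  return {
    client_id: env.ATT_API_CLIENT_ID,
    client_secret: env.ATT_API_CLIENT_SECRET,
    access_token: env.ATT_API_ACCESS_TOKEN, // optional; undefined when unset
    scope: 'SPEECH',
    context: 'Generic',
    access_token_url: 'https://api.att.com/oauth/token',
    api_domain: 'api.att.com',
  };
}

// Placeholder values, not real credentials.
const options = buildWatsonOptions({
  ATT_API_CLIENT_ID: 'example-id',
  ATT_API_CLIENT_SECRET: 'example-secret',
});
console.log(options.scope, options.context);
```

In a real app the argument would be `process.env`, and the result would be passed to `new WatsonClient.Watson(options)` as on the slide.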
  • 18. The Methods of Watson.js
      Watson.getAccessToken(callback)
        Requests a new OAuth Access Token using the Client Credentials grant type and passes the returned Access Token to the callback function.
      Watson.speechToText(speechFile, accessToken, callback)
        Pipes a speech file (passed as an absolute file location) to the AT&T Speech API using the given access token. The API response is passed to the callback function as parsed JSON.
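The two methods above are typically chained: fetch a token, then transcribe. A minimal sketch follows, assuming the token callback is Node-style `(err, accessToken)`; the slide does not state its exact arguments. The helper `transcribe` and the stub client are hypothetical and stand in for a real Watson instance so the example runs offline.

```javascript
// Obtain a token, then transcribe a file, using any client object that
// exposes the two methods listed on the slide.
function transcribe(client, speechFile, callback) {
  // Assumption: the token callback receives (err, accessToken).
  client.getAccessToken(function (err, accessToken) {
    if (err) { callback(err); return; }
    client.speechToText(speechFile, accessToken, callback);
  });
}

// Stub client in place of a real Watson instance (no network access).
const stubClient = {
  getAccessToken: function (cb) { cb(null, 'stub-token'); },
  speechToText: function (file, token, cb) {
    cb(null, { Recognition: { NBest: [{ ResultText: 'This is a test.' }] } });
  },
};

let lastResult = null;
transcribe(stubClient, '/tmp/audio.wav', function (err, reply) {
  if (err) { throw err; }
  lastResult = reply.Recognition.NBest[0].ResultText;
  console.log(lastResult);
});
```

Passing the client in as a parameter keeps the flow testable without credentials; with the real wrapper you would pass the `Watson` instance instead of the stub.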
  • 19. AT&T SPEECH API EXAMPLE APP CODE WALKTHROUGH: Using the AT&T Speech API to convert generic audio to text in a web browser (example-basic in the examples repo).
  • 20. Frameworks & Requirements
      Server-side:
        Node.js: JavaScript platform for building fast, scalable network apps
        FS: Node.js File System module
        Express: Minimal web application framework for Node.js
        Optimist: Lightweight option parsing module for Node.js
        HBS: Express View Engine wrapper for Handlebars
        Watson.js: Simple API Wrapper for AT&T Speech API
      Client-side:
        jQuery: The gold standard of client-side JavaScript libraries
        swfobject: JavaScript to make embedding Flash objects easier
        Bootstrap: Twitter's CSS framework for quickly developing web apps
  • 21. Capture Audio Input
      recorder.swf: Adobe Flex app that accesses the user's microphone and emits events to JS
      recorder.js: JavaScript interface to receive events, update UI, and POST file to Node.js
      Node.js upload script:
        var fs = require('fs');

        // Copy a file by reading it fully into memory, then writing it out.
        function cp(source, destination, callback) {
          fs.readFile(source, function(err, buf) {
            if (err) { callback(err); return; }
            fs.writeFile(destination, buf, callback);
          });
        }

        // Save the uploaded recording next to the app and report the outcome.
        app.post('/upload', function(req, res) {
          var upload = req.files.upload_file.filename;
          cp(upload.path, __dirname + '/' + upload.name, function(err) {
            if (err) { res.send({ error: err }); return; }
            res.send({ saved: 'saved' });
          });
        });
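For larger recordings, the copy step can stream the file rather than buffer it. The sketch below is an alternative to the slide's `cp`, not the code used in the talk; the name `cpStream` is hypothetical, and it relies on `stream.pipeline`, which is available in Node.js 10 and later.

```javascript
const fs = require('fs');
const { pipeline } = require('stream');

// Copy source to destination by streaming, so the whole file is never
// held in memory at once. The callback receives an error if the read or
// the write fails, and no error argument on success.
function cpStream(source, destination, callback) {
  pipeline(
    fs.createReadStream(source),
    fs.createWriteStream(destination),
    callback
  );
}
```

It takes the same three arguments as `cp`, so it could be swapped into the upload handler without changing the call site.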
  • 22. POST Audio to AT&T
      AJAX Request via POST from client side to Node.js
        // Receive an AJAX POST from client-side JavaScript
        app.post('/speechToText', function(req, res) {
          // Pass the audio file and access token to AT&T Speech API
          Watson.speechToText(__dirname + '/public/audio/audio.wav', this.access_token, function(err, reply) {
            // Pass any errors associated with API call to client-side JS
            if(err) { res.send({ error: err }); return; }
            // Return the parsed JSON to client-side JavaScript
            res.send(reply);
            return;
          });
        });
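The route above can also be written as a handler built from its dependencies, which makes the token source explicit and lets the logic run without a server. This is a sketch under assumptions, not the talk's code: `makeSpeechToTextHandler` and `getToken` are hypothetical names, the 502 status is a design choice, and `res.status(...).json(...)` assumes Express 4 or later.

```javascript
// Build a route handler from its dependencies.
// `client` is any object with speechToText(file, token, callback);
// `getToken` is a hypothetical function returning the current access token.
function makeSpeechToTextHandler(client, audioPath, getToken) {
  return function (req, res) {
    client.speechToText(audioPath, getToken(), function (err, reply) {
      if (err) {
        // Assumption: 502 signals that the upstream API call failed.
        res.status(502).json({ error: String(err.message || err) });
        return;
      }
      res.json(reply);
    });
  };
}

// Offline demonstration with stubs in place of Express and Watson.js.
const sent = {};
const stubRes = {
  status: function (code) { sent.code = code; return this; },
  json: function (body) { sent.body = body; return this; },
};
const failingClient = {
  speechToText: function (file, token, cb) { cb(new Error('upstream failure')); },
};
makeSpeechToTextHandler(failingClient, '/tmp/audio.wav', function () {
  return 'stub-token';
})({}, stubRes);
console.log(sent.code, sent.body);
```

With a real app the handler would be registered as `app.post('/speechToText', makeSpeechToTextHandler(Watson, audioPath, getToken))`.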
  • 23. Use Speech API Response
      Example API Response, returned from call using Content-Type of application/json:
        {
          "Recognition": {
            "ResponseId": "74a964bf2fe",
            "NBest": [ {
              "WordScores": [1, 0.75, 1, 0.75],
              "Confidence": 0.75,
              "Grade": "accept",
              "ResultText": "This is a test.",
              "Words": ["This", "is", "a", "test."],
              "LanguageId": "en-us",
              "Hypothesis": "This is a test."
            } ]
          }
        }
      Response Parameter: What the Response Parameter Means
        Recognition: Body object for the AT&T Speech API Response
        ResponseId: Unique Identifier for a specific API call
        NBest: Array of hypothesis objects (possible transcriptions of audio data).
        ResultText: Plain-text, cleaned up representation of the Hypothesis. This should be used when displaying the text to users.
        Confidence: Confidence score for the overall Hypothesis. Scored on a scale from 0 (not confident) to 1.0 (very confident)
        Grade: Recommended action to take with the current Hypothesis: accept, reject, or confirm
        Words: Array of the individual words. Confidence scores for each word are available in the WordScores array.
        WordScores: Array of individual confidence scores for each word in the ResultText parameter. Corresponds to Words array.
        LanguageId: Representation of the response language. Supports English & Spanish in Generic; English-only in other contexts.
        Hypothesis: The raw transcription of the audio that was interpreted.
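A small sketch of consuming this response: take the top hypothesis and map its Grade to an application action. The function name `interpretRecognition` and the action labels are hypothetical, and treating the first NBest entry as the preferred hypothesis is an assumption rather than something the slide states.

```javascript
// Map the API's Grade values to hypothetical application actions.
const GRADE_ACTIONS = new Map([
  ['accept', 'use'],
  ['confirm', 'ask-user'],
  ['reject', 'retry'],
]);

// Pick the top hypothesis and decide what the app should do with it.
function interpretRecognition(reply) {
  const nbest = reply && reply.Recognition && reply.Recognition.NBest;
  if (!Array.isArray(nbest) || nbest.length === 0) {
    return { action: 'retry', text: '', confidence: 0 };
  }
  // Assumption: the first NBest entry is the preferred hypothesis.
  const best = nbest[0];
  return {
    action: GRADE_ACTIONS.get(best.Grade) || 'retry',
    text: best.ResultText, // the field recommended for display to users
    confidence: best.Confidence,
  };
}

// The example response from the slide.
const sample = {
  Recognition: {
    ResponseId: '74a964bf2fe',
    NBest: [{
      WordScores: [1, 0.75, 1, 0.75],
      Confidence: 0.75,
      Grade: 'accept',
      ResultText: 'This is a test.',
      Words: ['This', 'is', 'a', 'test.'],
      LanguageId: 'en-us',
      Hypothesis: 'This is a test.',
    }],
  },
};
console.log(interpretRecognition(sample));
```

Falling back to 'retry' for a missing or unknown Grade keeps the caller from acting on text the API did not endorse.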
  • 24. Up Next: Michael Fitzpatrick
  • 25. Up Next: Jason Goecke, Adam Kalsey
  • 26. ADVANCED EXAMPLES
      What can you do with Speech-to-text? You could:
        Make your mobile or web application accessible with voice commands
        Post tweets using voice commands in a simple Twitter app
        Add on-the-fly transcripts while recording in a podcasting app
        Add captioning to videos hosted on your website automatically
        Create real-time closed captions of a conference speaker's presentation
        Search for nearby places to check in at on Foursquare
  • 27. Speech Labs
      We're now going to break out into three clusters, each focusing on a different technology stack. Work independently or with a partner!
      Web (Flex + Node.js)
        In the Web Speech Lab, Michael will be on hand to help get your Node.js app working with the AT&T Speech API. Code up your own Speech API app from scratch, or start from a boilerplate app that uses Foursquare to search for locations and lets you check in from your web browser!
      iOS (Objective-C)
        In the iOS Speech Lab, Brant will help you try out the AT&T Speech API on iOS and go into more depth about the AT&T Speech SDK for iOS. The mobile SDK allows you to quickly capture and stream audio from your iPhone or iPad app to the AT&T Speech API.
      Android (Java)
        In the Android Speech Lab, Jay will help you try out the AT&T Speech API on Android and go into more depth about the AT&T Speech SDK for Android. The mobile SDK allows you to quickly capture and stream audio from your Android phone or tablet app to the AT&T Speech API.
  • 28. September 25, 2012 THANKS! ANY QUESTIONS? Michael Owens (@mko on Twitter, mowens on Github) Jay Lieske (jay.lieske@att.com, jayatyp on Github)