Spock
An advanced Human Computer Interaction Engine and API
A BTP synopsis by Sonal Raj & Shashi Kant
Objective
What goals will Spock achieve?
Objectives
• Obtain a working knowledge of existing algorithms and methods of interaction, and build up a concept system to improve these techniques.
• Code a Human Computer Interaction Engine capable of techniques like motion and gesture recognition and speech synthesis (biometric authentication, iris control and many other features could be included in future versions and are beyond the scope of this project).
• Implement an Application Programming Interface (API) that extends the functionality of this engine to the multitude of applications running on the system, so each can use it in its own way.
Spock Sonal Raj & Shashi Kant 3
Motivation
Why on earth did we want to do this?
Motivations
• Definitely not marks or compulsion!
• Inspired by brilliant pieces of work like The Sixth Sense by Pranav Mistry
• A vision to overcome the limitations of existing systems like Microsoft Kinect and Nintendo's motion controllers
• Sci-fi inspirations like Star Trek and the J.A.R.V.I.S. interactive computer system in the Iron Man movie series
Spock Sonal Raj (487/10) & Shashi Kant (462/10) 5
Motivations
Pranav Mistry's work at MIT on Sixth Sense can be found at:
http://www.pranavmistry.com/projects/sixthsense/
Iron Man's J.A.R.V.I.S. computer system is described at:
http://marvel-movies.wikia.com/wiki/J.A.R.V.I.S.
Mission Statement
This project's mission is to develop a technological product that will change the way you interact with your computers, using life-like sensing methods.
Evolution
Natural User Interface (NUI)
Existing Work Survey
What led to this?
Projects
Sixth Sense
- Pranav Mistry
- MIT Labs
Sphinx
Carnegie Mellon
University
Microsoft Kinect
- MS R&D
Siri for iOS
Apple Inc.
CMUSphinx toolkit is a leading speech recognition toolkit with
various tools used to build speech applications.
Kinect is a line of motion sensing input devices by Microsoft
for Xbox 360 and Xbox One video game consoles and
Windows PCs.
Siri is a voice assistant for iOS that uses natural speech
recognition and synthesis tools.
Papers and
Publications
1. Real-time hand gesture recognition using range cameras, Hervé Lahamy and Derek Litchi, Department of Geomatics Engineering, University of Calgary, NW, Calgary, Alberta
2. Real-Time Human Pose Recognition in Parts from Single Depth Images, Jamie Shotton and Andrew Fitzgibbon, Microsoft Research Cambridge & Xbox Incubation
3. Minimum variance modulation filter for robust speech recognition, Yu-Hsiang Bosco Chiu and Richard M. Stern, Carnegie Mellon University, Pittsburgh, USA
4. Some recent research work at LIUM based on the use of CMU Sphinx, Yannick Estève et al., LIUM, University of Le Mans, France
Proposed Work
What are we going to do?
Tech to be Used
• Languages
1. C/C++
2. Python
3. Bash
• Frameworks
- OpenCV
- CMUSphinx
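As a sketch of how applications might consume the engine through the planned API, the following hypothetical Python shape shows gesture and phrase callbacks being registered and dispatched. All names here (SpockEngine, on_gesture, on_phrase, emit_gesture) are illustrative assumptions, not the project's actual interface.

```python
# Hypothetical API sketch: apps register callbacks; the recognition
# backends call the emit_* methods when an event is detected.

class SpockEngine:
    """Dispatches recognised gestures and phrases to application callbacks."""

    def __init__(self):
        self._gesture_handlers = {}
        self._phrase_handlers = {}

    def on_gesture(self, name, callback):
        # Register a callback for a named gesture, e.g. "swipe_left".
        self._gesture_handlers.setdefault(name, []).append(callback)

    def on_phrase(self, phrase, callback):
        # Register a callback for a spoken phrase (case-insensitive).
        self._phrase_handlers.setdefault(phrase.lower(), []).append(callback)

    def emit_gesture(self, name):
        # Called by the vision backend when a gesture is recognised.
        for cb in self._gesture_handlers.get(name, []):
            cb(name)

    def emit_phrase(self, phrase):
        # Called by the speech backend when a phrase is recognised.
        for cb in self._phrase_handlers.get(phrase.lower(), []):
            cb(phrase)


if __name__ == "__main__":
    engine = SpockEngine()
    events = []
    engine.on_gesture("swipe_left", lambda g: events.append(g))
    engine.on_phrase("open browser", lambda p: events.append(p))
    engine.emit_gesture("swipe_left")
    engine.emit_phrase("Open Browser")
    print(events)  # ['swipe_left', 'Open Browser']
```

A callback registry like this keeps the OpenCV and CMUSphinx backends decoupled from the applications that consume their events.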
Impact Areas
• A lively computing experience for "muggle" computer users
• Robust and capable devices
• Advanced search
• Security-feed analysis
• And many more . . .
Future Work and Applications
• A boon to physically challenged users. The blind can command a computer with voice and gesture, using the accurate dictation tools within Spock to control a system, rather than typing on Braille keyboards.
• Use in embedded devices for robot and appliance control.
• Taking search to the next level: you can not only search for your keywords within a video or audio file but, with a powerful system architecture, do it in near real time.
• In combination with suitable machine learning and AI techniques, it can be used in defence and military operations for monitoring cross-border conversations or surveillance data in an automated and much more efficient way than manned analysis.
Progress so far.
Where have we reached?
Progress
• Technologies learnt and configured:
• Studied the documentation of the OpenCV framework and the Sphinx speech toolkit from CMU.
• Experimented extensively with configurations; both frameworks were set up and learnt successfully on Linux as well as Windows.
• Interface implementations of the above frameworks.
Progress
• Coding work: noise reduction and image synthesis. [Algorithm] This was done by:
• Subtracting the RGB values of the pixels of the previous frame from the RGB values of the pixels of the current frame.
• Converting this difference image to octachrome (8 colours only: red, blue, green, cyan, magenta, yellow, white, black), which makes most pixels neutral or grey.
• Greying out those pixels not surrounded by 20 non-grey pixels, in the function crosshair(IplImage* img1, IplImage* img2). The non-grey pixels that remain represent genuine motion, and noise is eliminated. A database provided with the code contains a set of points for each gesture.
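The frame-differencing and octachrome steps above can be sketched in plain Python as follows. This is an illustrative toy version that operates on nested lists of (R, G, B) tuples rather than OpenCV IplImage frames; the per-channel threshold of 128 is an assumption, and the 20-neighbour crosshair filter is omitted for brevity.

```python
# Toy sketch of the noise-reduction pipeline: diff two frames,
# quantise the difference to 8 colours, and treat non-grey pixels
# as candidate motion. Frames are lists of rows of (R, G, B) tuples.

def octachrome(pixel):
    """Snap each channel to 0 or 255, giving one of 8 colours
    (black, white, red, green, blue, cyan, magenta, yellow)."""
    return tuple(255 if c >= 128 else 0 for c in pixel)

def is_grey(pixel):
    """Neutral pixels have equal channels (black or white after quantising)."""
    return pixel[0] == pixel[1] == pixel[2]

def frame_diff(prev, curr):
    """Per-pixel absolute RGB difference between two same-sized frames."""
    return [[tuple(abs(c - p) for c, p in zip(cp, pp))
             for cp, pp in zip(crow, prow)]
            for crow, prow in zip(curr, prev)]

def motion_mask(prev, curr):
    """Mark non-grey pixels of the quantised difference as candidate motion.
    (The real code then greys out isolated pixels in crosshair(); that
    neighbourhood filter is left out of this sketch.)"""
    return [[not is_grey(octachrome(px)) for px in row]
            for row in frame_diff(prev, curr)]


if __name__ == "__main__":
    prev = [[(10, 10, 10)] * 3 for _ in range(3)]
    curr = [row[:] for row in prev]
    curr[1][1] = (200, 10, 10)  # one pixel changed strongly in red only
    print(motion_mask(prev, curr)[1][1])  # True: the changed pixel is non-grey
```

Unchanged pixels diff to (0, 0, 0), quantise to black and are discarded as grey, which is why most of each difference frame drops out before the neighbourhood filter runs.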
Progress
• Coding work: speech recognition:
• Split the waveform into utterances at silences.
• Match all possible combinations of words against the models used in speech recognition: the acoustic model, the phonetic dictionary and the language model.
• Other concepts used in speech recognition (lattices, N-best lists, word confusion networks, speech databases, text databases) are currently being implemented.
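The first step, splitting the waveform into utterances at silences, can be sketched with a simple per-frame energy threshold. The frame length and threshold values here are illustrative; real front-ends such as CMUSphinx use proper voice-activity detection rather than a fixed cut-off.

```python
# Toy utterance splitter: a frame whose mean energy exceeds the
# threshold is speech; a silent frame ends the current utterance.

def split_utterances(samples, frame=4, threshold=0.1):
    """Return (start, end) sample ranges of contiguous above-threshold frames."""
    utterances, start = [], None
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        energy = sum(s * s for s in chunk) / len(chunk)  # mean squared amplitude
        if energy > threshold:
            if start is None:
                start = i                      # utterance begins
        elif start is not None:
            utterances.append((start, i))      # silence ends the utterance
            start = None
    if start is not None:
        utterances.append((start, len(samples)))
    return utterances


if __name__ == "__main__":
    # 8 silent samples, 8 loud samples, 8 silent samples
    samples = [0.0] * 8 + [0.5] * 8 + [0.0] * 8
    print(split_utterances(samples))  # [(8, 16)]
```

Each range returned this way would then be handed to the recogniser, which scores word combinations against the acoustic model, phonetic dictionary and language model.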
Project Timeline
What and by when?
References
Who helped?
References
Projects
1. Sixth Sense, Pranav Mistry, MIT Labs
2. Microsoft Kinect, Microsoft Research and Xbox
3. CMUSphinx, Carnegie Mellon University
4. Siri and Google Voice Search, Apple and Google, Inc.
Papers and Publications
1. Real-time hand gesture recognition using range cameras, Hervé Lahamy and Derek Litchi, Department of Geomatics Engineering, University of Calgary, NW, Calgary, Alberta
2. Real-Time Human Pose Recognition in Parts from Single Depth Images, Jamie Shotton and Andrew Fitzgibbon, Microsoft Research Cambridge & Xbox Incubation
3. Minimum variance modulation filter for robust speech recognition, Yu-Hsiang Bosco Chiu and Richard M. Stern, Carnegie Mellon University, Pittsburgh, USA
4. Some recent research work at LIUM based on the use of CMU Sphinx, Yannick Estève et al., LIUM, University of Le Mans, France
Thank You
For your patience 
Monitor the Progress of Spock and view the code at:
https://github.com/sonal-raj/Spock
