The document describes Spock, a proposed human computer interaction engine and API. The goals of Spock are to improve existing interaction techniques like gesture and speech recognition, and provide an API for other applications. It is motivated by projects like Pranav Mistry's Sixth Sense and interfaces in sci-fi movies. The proposed work will use OpenCV, CMUSphinx, and Python/C++, and aims to provide a more natural user experience. Initial progress includes configuring frameworks and algorithms for noise reduction, image synthesis, and speech synthesis. Future applications could help disabled users or enhance search and security.
Spock the human computer interaction system - synopsis
1. Spock
An advanced Human Computer Interaction Engine and API
A BTP synopsis by Sonal Raj & Shashi Kant
3. Objectives
Gain a working knowledge of existing interaction algorithms and methods, and
develop a concept system that improves on these techniques.
Code a Human Computer Interaction Engine capable of techniques such as
motion and gesture recognition and speech synthesis (biometric authentication,
iris control and many other features could be included in future versions but
are beyond the scope of this project).
Implement an Application Programming Interface (API) that extends the
functionality of this engine to the multitude of applications running on the
system, so that each can use it in its own way.
5. Motivations
Definitely not marks or compulsion!
Inspired by brilliant pieces of work like The Sixth Sense by Pranav Mistry.
A vision to overcome the limitations of existing systems like the Microsoft
Kinect and Nintendo's motion-sensing consoles.
Sci-fi movies like Star Trek, or the J.A.R.V.I.S. interactive computer system in
the Iron Man movie series.
Spock Sonal Raj (487/10) & Shashi Kant (462/10) 5
6. Motivations
Pranav Mistry's work at MIT on Sixth Sense can be checked out at:
http://www.pranavmistry.com/projects/sixthsense/
Iron Man's J.A.R.V.I.S. computer system can be referenced at:
http://marvel-movies.wikia.com/wiki/J.A.R.V.I.S.
Mission Statement
This project's mission is to develop a technological product that will change
the way you interact with your computers, using life-like sensing methods.
12. Sphinx - Carnegie Mellon University
The CMUSphinx toolkit is a leading speech recognition toolkit with various
tools used to build speech applications.
Microsoft Kinect - MS R&D
Kinect is a line of motion-sensing input devices by Microsoft for the Xbox 360
and Xbox One video game consoles and Windows PCs.
SIRI for iOS - Apple Inc.
SIRI is a voice assistant for iOS that uses natural speech synthesis tools.
13. Papers and Publications
1. Real-time hand gesture recognition using range cameras, Hervé Lahamy and
Derek Lichti, Department of Geomatics Engineering, University of Calgary,
Calgary, Alberta
2. Real-Time Human Pose Recognition in Parts from Single Depth Images,
Jamie Shotton, Andrew Fitzgibbon, Microsoft Research Cambridge & Xbox Incubation
3. Minimum variance modulation filter for robust speech recognition,
Yu-Hsiang Bosco Chiu and Richard M. Stern, Carnegie Mellon University,
Pittsburgh, USA
4. Some recent research work at LIUM based on the use of CMU Sphinx,
Yannick Estève et al., LIUM, University of Le Mans, France
15. Tech to be Used
Languages
1. C/C++
2. Python
3. Bash
Frameworks
OpenCV
CMUSphinx
16. Impact Areas
A lively computing experience for everyday (muggle) computer users.
More robust and capable devices.
Advanced Search
Security feed analysis
And many more . . .
18. Future Work and Applications
A boon to physically challenged users: the blind could command a computer
with voice and gestures, using the accurate dictation tools within Spock to
control a system, rather than typing on Braille keyboards.
Use in embedded devices for robot and appliance control.
Takes search to the next level: you can not only search for keywords within a
video or audio file, but, with a powerful system architecture, do it in near
real-time.
In combination with suitable machine learning and AI techniques, it can be
used in defense and military operations for monitoring cross-border
conversations or surveillance data in an automated way that is far more
efficient than manned analysis.
20. Progress
Technologies learnt and configured:
Studied the documentation of the OpenCV framework and the Sphinx speech
toolkit from CMU.
Experimented extensively with their configuration; both frameworks were
successfully set up and learnt on Linux as well as Windows.
Implemented interfaces to the above frameworks.
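As a quick sanity check of the configurations on either OS, a snippet like the
following can confirm that both frameworks are importable. This is an
illustrative addition, assuming the opencv-python and pocketsphinx Python
packages, not part of the project's code.

```python
# Verify that the OpenCV and CMUSphinx Python bindings are installed and
# importable (package names are assumptions: opencv-python, pocketsphinx).
import cv2
import pocketsphinx

print("OpenCV version:", cv2.__version__)
print("PocketSphinx module loaded from:", pocketsphinx.__file__)
```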
21. Progress
Coding Work: Noise reduction and image synthesis [algorithm]:
This was done by subtracting the RGB values of the pixels of the previous
frame from the RGB values of the pixels of the current frame.
This difference image is then converted to octachrome (8 colors only: red,
blue, green, cyan, magenta, yellow, white, black), which makes most pixels
neutral or grey.
This is followed by greying out those pixels not surrounded by at least 20
non-grey pixels, in the function crosshair(IplImage* img1, IplImage* img2).
The non-grey pixels that remain represent proper motion, and noise is
eliminated.
A database is provided with the code which contains a set of points for each
gesture.
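Below is a minimal Python/OpenCV sketch of this frame-differencing and
octachrome filtering step. It is an illustrative reconstruction, not the
project's actual code: the 128 quantization threshold, the 5x5 neighborhood,
and the helper names octachrome() and motion_mask() are assumptions, and
NumPy arrays replace the old IplImage C API.

```python
import cv2
import numpy as np

def octachrome(diff, threshold=128):
    """Quantize each RGB channel to 0 or 255, giving the 8 octachrome colors."""
    return np.where(diff >= threshold, 255, 0).astype(np.uint8)

def motion_mask(prev_frame, curr_frame, neighbor_min=20):
    # 1. Frame differencing: subtract the previous frame's RGB values
    #    from the current frame's.
    diff = cv2.absdiff(curr_frame, prev_frame)

    # 2. Reduce to 8 colors; pixels that changed little collapse to black,
    #    i.e. become neutral/grey.
    quant = octachrome(diff)

    # 3. A pixel counts as non-grey if any of its channels survived.
    non_grey = (quant.max(axis=2) > 0).astype(np.uint8)

    # 4. Grey out isolated pixels: keep only those with at least
    #    `neighbor_min` non-grey pixels in their 5x5 neighborhood
    #    (a rough stand-in for the crosshair() step above).
    kernel = np.ones((5, 5), dtype=np.uint8)
    counts = cv2.filter2D(non_grey, -1, kernel)
    keep = (counts >= neighbor_min) & (non_grey == 1)
    return keep.astype(np.uint8) * 255
```

Feeding successive frames from cv2.VideoCapture through motion_mask() yields a
binary image whose white pixels mark genuine motion, while isolated flickering
pixels caused by sensor noise are suppressed by the neighbor-count test.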
22. Progress
Coding Work: Speech synthesis:
Split the waveform into utterances at the silences.
Match all possible combinations of words against the models used in speech
recognition: the acoustic model, the phonetic dictionary and the language
model.
Other concepts used in speech recognition, such as lattices, N-best lists,
word confusion networks, speech databases and text databases, are currently
being implemented.
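As an illustration of this flow, here is a minimal sketch assuming the
pocketsphinx Python bindings for CMUSphinx: the decoder's built-in voice
activity detection marks the silences between utterances, and each utterance
is matched against the acoustic model, phonetic dictionary and language model.
The model paths and the recording.raw input file are placeholders, not the
project's actual configuration.

```python
import os
from pocketsphinx import Decoder

MODELDIR = "/usr/local/share/pocketsphinx/model"  # assumed install location

# Point the decoder at the three models named above.
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(MODELDIR, 'en-us/en-us'))                # acoustic model
config.set_string('-dict', os.path.join(MODELDIR, 'en-us/cmudict-en-us.dict'))  # phonetic dictionary
config.set_string('-lm', os.path.join(MODELDIR, 'en-us/en-us.lm.bin'))          # language model
decoder = Decoder(config)

# Stream raw 16 kHz, 16-bit mono audio and decode one utterance at a time,
# using the decoder's speech/silence flag to find utterance boundaries.
with open('recording.raw', 'rb') as audio:
    in_speech = False
    decoder.start_utt()
    while True:
        buf = audio.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
        if decoder.get_in_speech() != in_speech:
            in_speech = decoder.get_in_speech()
            if not in_speech:                      # silence after speech: utterance ended
                decoder.end_utt()
                if decoder.hyp() is not None:
                    print(decoder.hyp().hypstr)    # best word hypothesis
                decoder.start_utt()
    decoder.end_utt()
```

The same decoder also exposes the lattice and N-best results mentioned above,
which can feed word confusion networks in later stages.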
26. References
Projects
1. Sixth Sense, Pranav Mistry, MIT Labs
2. Microsoft Kinect, Microsoft Research and XBox
3. CMUSphinx, Carnegie Mellon University
4. SIRI and Google Voice Search, Apple and Google, Inc.
Papers and Publications
1. Real-time hand gesture recognition using range cameras, Hervé Lahamy and Derek Lichti, Department of
Geomatics Engineering, University of Calgary, Calgary, Alberta
2. Real-Time Human Pose Recognition in Parts from Single Depth Images, Jamie Shotton, Andrew
Fitzgibbon, Microsoft Research Cambridge & Xbox Incubation
3. Minimum variance modulation filter for robust speech recognition, Yu-Hsiang Bosco Chiu and Richard M.
Stern, Carnegie Mellon University, Pittsburgh, USA
4. Some recent research work at LIUM based on the use of CMU Sphinx, Yannick Estève et al., LIUM,
University of Le Mans, France
27. Thank You
For your patience
Monitor the Progress of Spock and view the code at:
https://github.com/sonal-raj/Spock