ParSyll Algorithm
While the code and test environment still refer to SYLLABIX (the name assigned to the prototype
algorithm before the year 2000), the algorithm has been renamed because a game named Syllabix
now exists.
In due course the program names, files and environment will be updated to reflect the new name. In the
interim, rights are claimed by way of use, reference, communication and publication including this very
document now emailed, distributed and reflected in electronic media.
Copyright and right are claimed in terms of the Berne Copyright Convention and in terms of the
Copyright Act 98 of 1978 of South Africa. No part of this publication or of the program(s) or any
associated code may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording or by any information storage and retrieval system,
without permission in writing from the author Trevor Nigel Gadd. All rights reserved.
The following is a brief description of the algorithm.
The purpose of the algorithm is to segment written words and names into auto-determined 'syllables'
which are then interpreted phonetically to a degree, and used to construct a retrieval 'code' that
inherently 'groups' like-sounding words or names together to 'broaden' search results during a textual
enquiry.
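The grouping idea can be illustrated with a minimal sketch. The keying function below (toy_code) is a hypothetical stand-in invented for this illustration; it is not the ParSyll code and performs no syllable segmentation. It only shows how an index keyed on a sound-oriented code returns every like-coded name for a single query.

```python
from collections import defaultdict

VOWELS = "AEIOUY"  # assumption for this toy only: Y is treated as a vowel


def toy_code(word):
    """Hypothetical stand-in for a retrieval code (NOT ParSyll).

    Keeps the first letter, drops later vowels, and skips a letter
    that repeats the last kept letter.
    """
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    kept = [letters[0]]
    for c in letters[1:]:
        if c in VOWELS:
            continue
        if c != kept[-1]:
            kept.append(c)
    return "".join(kept)


def build_index(words):
    """Map each code to the list of words that share it."""
    index = defaultdict(list)
    for word in words:
        index[toy_code(word)].append(word)
    return index


def search(index, query):
    """Return every indexed word whose code matches the query's code."""
    return index.get(toy_code(query), [])


if __name__ == "__main__":
    names = ["Smith", "Smyth", "Smythe", "Jones"]
    print(search(build_index(names), "Smith"))
```

A query for one spelling retrieves all spellings stored under the same code, which is the 'broadening' effect described above. How coarse or fine the grouping is depends entirely on the keying function.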
It is important to note that the ParSyll Algorithm does not attempt to emulate dictionary syllable
definition. Instead, it uses raw logic to attempt syllable segmentation in isolation from referential data,
and NO WHOLE WORDS are stored or referenced in the execution of its task.
The algorithm is divided into eight major segments, executed sequentially:
1. An initial segmentation
1.1 Incorporates some temporary special character-sequence augmentation
which is deleted again at the end of initial segmentation
2. Diphthongs and Triphthongs
2.1 Segmentation is based on 'majority-fit' solutions, resulting in some
incorrect sound-splits and conjoins (is 'ruin' one syllable or two?
'IENCE' in SCIENCE? 'IENCE' in CONSCIENCE? etc.)
3. Complex segmentation
4. Ending sound segmentation
4.1 Some sequences, e.g. 'NG' in the middle of a word, might be split as a
result of syllable segmentation, e.g. '..N~G..'. The same sequence in
an ending sound or final syllable might not, e.g. '~ING'.
5. Phonetic substitutions
The phonetic substitutions of PHONIX are established and documented.
It is anticipated that syllable segmentation will enable different,
if similar, substitutions to be defined. Significantly, simpler
substitutions may suffice by virtue of the 'added definition' of
syllable boundaries.
5.1 First Syllable
5.1.1 Leading character substitutions
5.1.2 Embedded & trailing character substitutions and negations
5.2 Middle Syllables Substitutions
5.2.1 General character substitutions and negations
5.3 Last Syllable Substitutions
5.3.1 Ending-sound substitutions and negations
6. Elimination of carrier vocalization (vowels) and 'silent' consonants
7. Character mapping to broaden search results
8. Indexing of results for retrieval purposes
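Segments 5 and 6 in the outline above can be sketched as follows, assuming the word has already been segmented and uses '~' as the syllable boundary marker. Every rule table here (FIRST_RULES, MIDDLE_RULES, LAST_RULES) is a hypothetical placeholder chosen for illustration; the actual ParSyll substitutions are not published in this document. The sketch only shows how knowing a syllable's position lets a different, smaller rule set apply to the first, middle and last syllables.

```python
import re

# Hypothetical placeholder rules, NOT the ParSyll substitution tables.
FIRST_RULES = [(r"^PH", "F"), (r"^KN", "N")]   # leading-character rules
MIDDLE_RULES = [(r"PH", "F")]                  # general rules
LAST_RULES = [(r"GH$", ""), (r"PH$", "F")]     # ending-sound rules


def apply_rules(syllable, rules):
    """Apply each (pattern, replacement) pair in order."""
    for pattern, replacement in rules:
        syllable = re.sub(pattern, replacement, syllable)
    return syllable


def substitute(segmented):
    """Apply position-dependent rules to a '~'-segmented word."""
    syllables = segmented.upper().split("~")
    last = len(syllables) - 1
    result = []
    for i, syllable in enumerate(syllables):
        if i == 0:
            syllable = apply_rules(syllable, FIRST_RULES)
        if i == last:
            # A one-syllable word is both first and last.
            syllable = apply_rules(syllable, LAST_RULES)
        if 0 < i < last:
            syllable = apply_rules(syllable, MIDDLE_RULES)
        result.append(syllable)
    return result


def strip_vowels(syllables):
    """Drop vowels from every syllable and join into one code."""
    return "".join(re.sub(r"[AEIOU]", "", s) for s in syllables)


if __name__ == "__main__":
    parts = substitute("PHO~TO~GRAPH")
    print(parts, strip_vowels(parts))
```

Because each rule table only sees syllables in one position, the patterns can stay short (for example, an ending rule needs no look-ahead to confirm it is at the end of the word).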
Data evaluation of results for the development of algorithm segment 5 is underway.
T.N. Gadd
22 December 2015
