Tamil Internet Conference 2020
TamilInayaVaani - Integrating TVA Open-Source Spellchecker with Python
T. Shrinivasan, Nithya Duraisamy, Ashok Ramachandran, Manickkavasakam,
Arunmozhi, and A. Muthiah
Who are we?
A few open source contributors from
Ezhil Foundation
Kaniyam Foundation
Thamizha
Mozilla Tamilnadu
Indian Linux Users Group, Chennai
IndicNLP
Having similar dreams in many heads
Open source Tamil Spellchecker
A dream of many years becoming real
Existing Efforts

Hunspell

GNU Aspell

LanguageTool.org

Open-Tamil Solthiruthi

Bloom Filter based spellchecker
Still a long way to go
How long?
Problems with Tamil Spellchecker

Infinite Vocabulary

Rich in Morphology

Agglutinative

Free Word Order

Sandhi

...
Few Algorithms
Levenshtein distance search
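As a rough illustration of a Levenshtein distance search, the sketch below computes the edit distance between two sequences and returns the dictionary words closest to a given word. The function names `levenshtein` and `closest_words` are hypothetical and are not taken from TamilinayaVaani. The code compares items of any sequence, so a Tamil word can be passed as a plain string (compared code point by code point) or, probably better, as a list of Tamil letters split beforehand.

```python
def levenshtein(a, b):
    """Edit distance between two sequences.

    Insertions, deletions and substitutions each cost 1.
    """
    # Keep the shorter sequence in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def closest_words(word, dictionary, max_distance=1):
    """Return dictionary words within max_distance of word, nearest first."""
    scored = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for distance, w in scored if distance <= max_distance]
```

This scans the whole dictionary for every query, which is simple but slow for a vocabulary with millions of word forms.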
Norvig Algorithm
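The idea behind Peter Norvig's spelling corrector can be sketched as follows: generate every string within one or two simple edits of the input, keep the ones found in a word-frequency table, and pick the most frequent. This is a minimal sketch, not the TamilinayaVaani implementation. The names `edits1`, `candidates` and `correct`, and the `alphabet` parameter, are assumptions for illustration. For Tamil, `alphabet` would have to hold the Tamil characters to try, and working on raw code points rather than whole letters is a simplification.

```python
from collections import Counter


def edits1(word, alphabet):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def candidates(word, counts, alphabet):
    """Known words at edit distance 0, then 1, then 2; else the word itself."""
    def known(words):
        return {w for w in words if w in counts}

    one_edit = edits1(word, alphabet)
    return (known([word])
            or known(one_edit)
            or known(e2 for e1 in one_edit for e2 in edits1(e1, alphabet))
            or {word})


def correct(word, counts, alphabet):
    """Most frequent candidate correction for word."""
    return max(candidates(word, counts, alphabet),
               key=lambda w: counts.get(w, 0))


# Toy frequency table; a real one would be built from a large corpus.
counts = Counter({"spelling": 5, "spewing": 1})
alphabet = "abcdefghijklmnopqrstuvwxyz"
suggestion = correct("speling", counts, alphabet)
```

The number of generated candidates grows quickly with word length and alphabet size, which is one reason this approach alone struggles with an agglutinative language.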
Still not perfect
Research continues...
TamilinayaVaani
An Open Source Spellchecker from
Tamil Virtual Academy
TN Govt announcement
All software released under GNU GPL v2
All digital content released under CC-BY-SA
TamilinayaVaani

Developed as Desktop Version

C# based

Limited version of Vaani.neechalkaran.com

Cannot be used on Linux

Cannot be used from the command line

Cannot be integrated with other applications
Porting to Python
Why?
Porting to Python
Python: easy to develop further
Easy integration
Web applications
API
Scalable
Python Port Code

https://github.com/tshrinivasan/Tamilinaiya-Spellchecker
The beauty of Open Source
More Contributions
Open-Tamil Python Library

The de facto Python library for Tamil computing

Process Tamil text

Build Games, Tamil Utilities

http://Tamilpesu.us
Integrating with SandhiChecker

Open-Tamil has a SandhiChecker

40+ rules

Added this SandhiChecker to TamilinayaVaani
Python Packaging

Easy install in any OS

pip install tamilinayavaani
Sample Usage
Web Interface with TinyMCE

Added a good web interface
Web Interface
JavaScript
A JavaScript port is on the way
TODO

Provide API

Host as a Public website

Test and add more rules

Set edit distance=2

Find a method to yield better alternatives
Word Corpus

Collected 1,53,548 unique Tamil nouns

Collected 25,83,000 unique Tamil words

https://github.com/KaniyamFoundation/all_tamil_words

https://github.com/KaniyamFoundation/all_tamil_nouns
TODO

Clean them manually

Build a golden corpus for quick lookup

BloomFilter/SymSpell/LSTM and more
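To illustrate the quick-lookup idea, here is a minimal Bloom filter sketch. It is not code from TamilinayaVaani or Open-Tamil; the class name `BloomFilter` and its default sizes are assumptions chosen for the example. A Bloom filter answers "is this word possibly in the corpus?" using a fixed-size bit array, with no false negatives and a small, tunable rate of false positives.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, rare false positives."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, word):
        # Derive num_hashes bit positions by salting SHA-256 with a seed byte.
        data = word.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))


# Load a (tiny) word list, then test membership.
corpus = BloomFilter()
for entry in ["தமிழ்", "கணினி", "மொழி"]:
    corpus.add(entry)
is_known = "தமிழ்" in corpus
```

For a corpus of a few million words, the bit-array size and number of hash functions would need to be chosen to match the acceptable false-positive rate.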
Please Contribute

Give Tamil Rules

Give Tamil Corpus

Write Code

Test

Document

Provide Hosting

Donate
Thanks

Muthu Annamalai

Tamilnadu Government

Neechalkaran

Nithya Duraisamy

Ashok Ramachandran

Manickkavasakam

Arunmozhi

And all contributors from Ezhil Foundation, Kaniyam Foundation, Thamizha, IndicNLP and all other open source teams
Contact

T Shrinivasan

tshrinivasan@gmail.com

Kaniyam.com
