Terminology work and term databases in Estonia
With emphasis on termbase data structures
Arvi Tavast, PhD
qlaara
Riga, 4 November 2015
Lexicography Terminology What's wrong Quantitative
Introduction
From Estonian terminology to termbase data structures
We used to have specialised lexicography that people affectionately called terminology
Then we had a bit of terminology
(even applied to general language)
There were calls for a unified termbase of all terms,
which is unfortunately not feasible because of:
coverage
reliability
lack of convention
theoretical issues
The following presentation gives a bit more detail
Outline
1 Lexicography: semasiological data structures
2 Terminology: onomasiological data structures
3 What's wrong
Data structures
Metaphors of communication
4 Quantitative dictionary data structures
Data structures
Division of labour
Semasiological data structures
Words and what they mean
en: table
1. a piece of furniture with four legs and a flat top
de: Tisch
2. layout of data in rows and columns
de: Tabelle
en: desk
- an office table
de: Tisch
de: Schreibtisch
en: spreadsheet
- a data layout consisting of rows and columns
de: Tabelle
de: Arbeitsblatt
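The word-oriented (semasiological) entries above can be modelled as headwords that own numbered senses, each with its own equivalents. The following is a minimal Python sketch under stated assumptions: the class names `Entry` and `Sense` and the helper `equivalents` are hypothetical, and this is not the structure of any actual Estonian dictionary system. It also makes the consistency problem concrete: the second sense of "table" and the only sense of "spreadsheet" are defined by two unrelated strings, and nothing in the structure links them.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One numbered sense of a headword, with its target-language equivalents."""
    definition: str
    equivalents: list[str] = field(default_factory=list)


@dataclass
class Entry:
    """A semasiological entry: a word and the list of things it can mean."""
    headword: str
    lang: str
    senses: list[Sense]


# The English-German example above, keyed by headword (hypothetical layout).
DICTIONARY: dict[str, Entry] = {
    "table": Entry("table", "en", [
        Sense("a piece of furniture with four legs and a flat top", ["Tisch"]),
        Sense("layout of data in rows and columns", ["Tabelle"]),
    ]),
    "desk": Entry("desk", "en", [
        Sense("an office table", ["Tisch", "Schreibtisch"]),
    ]),
    "spreadsheet": Entry("spreadsheet", "en", [
        Sense("a data layout consisting of rows and columns", ["Tabelle", "Arbeitsblatt"]),
    ]),
}


def equivalents(headword: str) -> list[list[str]]:
    """Return the equivalents of each sense of a headword, in sense order."""
    return [sense.equivalents for sense in DICTIONARY[headword].senses]


if __name__ == "__main__":
    print(equivalents("table"))
    # Two definitions of what may well be the same concept are just different strings:
    print(DICTIONARY["table"].senses[1].definition
          == DICTIONARY["spreadsheet"].senses[0].definition)
```

The structure is easy for an editor to fill in entry by entry, but any link between senses of different headwords has to be maintained by hand.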
Onomasiological data structures
Concepts and how they are called
1 A piece of furniture with four legs and a flat top, for eating
en: table
de: Tisch
2 A piece of furniture with four legs and a flat top, for writing
en: desk
de: Tisch
de: Schreibtisch
3 Layout of data in rows and columns
en: table
en: spreadsheet
de: Tabelle
de: Arbeitsblatt
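The concept-oriented (onomasiological) records above can be sketched in the same way, with each concept stored once and its designations grouped by language. This is a minimal Python illustration, assuming hypothetical names `Concept`, `concepts_for` and `translations`; real termbase formats carry far more metadata per concept and per term.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    """An onomasiological record: one concept and its designations per language."""
    concept_id: int
    definition: str
    terms: dict[str, list[str]]  # language code -> terms designating this concept


TERMBASE: list[Concept] = [
    Concept(1, "A piece of furniture with four legs and a flat top, for eating",
            {"en": ["table"], "de": ["Tisch"]}),
    Concept(2, "A piece of furniture with four legs and a flat top, for writing",
            {"en": ["desk"], "de": ["Tisch", "Schreibtisch"]}),
    Concept(3, "Layout of data in rows and columns",
            {"en": ["table", "spreadsheet"], "de": ["Tabelle", "Arbeitsblatt"]}),
]


def concepts_for(term: str, lang: str) -> list[int]:
    """IDs of all concepts that the term designates in the given language."""
    return [c.concept_id for c in TERMBASE if term in c.terms.get(lang, [])]


def translations(term: str, src: str, tgt: str) -> list[str]:
    """Target-language terms that share at least one concept with the source term."""
    result: list[str] = []
    for c in TERMBASE:
        if term in c.terms.get(src, []):
            for candidate in c.terms.get(tgt, []):
                if candidate not in result:  # keep first occurrence only
                    result.append(candidate)
    return result


if __name__ == "__main__":
    print(concepts_for("Tisch", "de"))
    print(translations("table", "en", "de"))
    print(translations("Tisch", "de", "en"))
```

Because each concept is stored once, English-to-German and German-to-English lookups are derived from the same record and cannot drift apart. The price is that every term must be explicitly assigned to a concept or not, which is the binary decision criticised below.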
Example
Latvian-Estonian dictionary
What's wrong
Data structures
Semasiology
Pro: easy for the editor, understandable for the reader
Con: no support for consistency
A narrative about the editor, not a data source about language
Onomasiology
Pro: consistency, scalability, standardisation
Con: need for explicit binary decisions
An oversimplified data source about language; works if the concepts are known
Both
Binary: a word either has a meaning or does not; there is no scale
Introspective: claims are not falsifiable
Simplistic: assume that the concepts are (or can be) known
The channel metaphor of communication
What's wrong
The channel metaphor vs uncertainty reduction
Encoding of a message must contain a set of discriminable
states that is greater than or equal to the number of
discriminable states in the to-be-encoded message
or:
Encoding thoughts with words can only work if the number of
possible thoughts is smaller than or equal to the number of
possible words
This is the case only in very restricted domains (e.g. weather
forecasts)
Ramscar, M. et al. 2010. The Effects of Feature-Label-Order and Their Implications
for Symbolic Learning. Cognitive Science 34(6): 909–957.
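The counting condition above is an instance of the pigeonhole principle. The Python below is an illustrative sketch, not code from the cited paper: the sizes (5 possible thoughts, 4 words) are hypothetical toy numbers, and the helper names are invented for this example. It checks the condition directly, enumerates every possible thought-to-word assignment to confirm that each one gives the same word to at least two thoughts, and states the information-theoretic form, where distinguishing n equally likely states takes log2(n) bits.

```python
import math
from itertools import product


def bits_needed(n_states: int) -> float:
    """Bits needed to tell apart n equally likely states: log2(n)."""
    return math.log2(n_states)


def channel_can_work(n_thoughts: int, n_words: int) -> bool:
    """Channel-metaphor condition: at least as many code states as message states."""
    return n_words >= n_thoughts


def is_ambiguous(mapping: dict[int, int]) -> bool:
    """True if two different thoughts are encoded with the same word."""
    words = list(mapping.values())
    return len(words) != len(set(words))


if __name__ == "__main__":
    n_thoughts, n_words = 5, 4  # hypothetical toy sizes
    print(channel_can_work(n_thoughts, n_words))
    # Every way of assigning one of 4 words to each of 5 thoughts (4**5 mappings).
    all_mappings = (dict(enumerate(choice))
                    for choice in product(range(n_words), repeat=n_thoughts))
    print(all(is_ambiguous(m) for m in all_mappings))
    print(bits_needed(n_thoughts) > bits_needed(n_words))
```

With natural language the number of possible thoughts is open-ended while the vocabulary is finite, so the condition fails everywhere except in small closed domains.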
Quantitative data structures
Words (lexomes), their relatedness and other numerical parameters
Empirical data sources, rather than introspective
Corpus research, frequencies, collocations, distributional
semantics
Human experimental judgements
NB: Meaning as such is inherently introspective and not measurable;
relative meaning (how words relate to each other) is measurable
Quantified data, rather than binary
Types of relatedness: synonyms, equivalents, cohyponyms, etc.
Other numerical parameters: frequency, valence, emotion,
reaction times, naming latencies, neighbourhood density,
relative entropy, median absolute deviation, morphological
distribution, search statistics etc.
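One common way of turning corpus data into a relatedness score is distributional: represent each word by counts of the context words it co-occurs with, then compare those vectors with cosine similarity. The Python below is a minimal sketch with co-occurrence counts invented purely for illustration (not real corpus data); actual distributional models use large corpora and usually reweight raw counts, for example with pointwise mutual information, before comparing.

```python
import math

# Hypothetical co-occurrence counts (context word -> count); NOT real corpus data.
COOCCURRENCE: dict[str, dict[str, int]] = {
    "desk": {"office": 6, "chair": 4, "drawer": 3},
    "table": {"office": 2, "chair": 5, "row": 3, "column": 3},
    "spreadsheet": {"row": 5, "column": 6, "cell": 4},
}


def cosine(u: dict[str, int], v: dict[str, int]) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is all zeros."""
    dot = sum(count * v.get(ctx, 0) for ctx, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness(a: str, b: str) -> float:
    """Graded relatedness of two words from their context counts."""
    return cosine(COOCCURRENCE[a], COOCCURRENCE[b])


if __name__ == "__main__":
    for a, b in [("desk", "table"), ("table", "spreadsheet"), ("desk", "spreadsheet")]:
        print(a, b, round(relatedness(a, b), 2))
```

In this toy data "table" shares contexts with both "desk" and "spreadsheet", so it receives a graded score against each rather than a yes/no sense assignment, while "desk" and "spreadsheet" share no contexts and score 0.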
Quantitative data structures
Relatedness can be quantified and presented as a graph or a table:
             table1  table2  desk  spreadsheet  Tisch  Schreibtisch  Tabelle  Arbeitsblatt
table1       1       0       0.1   0            0.6    0.4           0        0
table2       0       1       0     0.5          0      0             0.8      0.8
desk         0.1     0       1     0            0.6    0.8           0        0
spreadsheet  0       0.5     0     1            0      0             0.7      0.8
Tisch        0.6     0       0.6   0            1      0.8           0        0
Schreibtisch 0.4     0       0.8   0            0.8    1             0        0
Tabelle      0       0.8     0     0.7          0      0             1        0.8
Arbeitsblatt 0       0.8     0     0.8          0      0             0.8      1
Fictional data for demonstration purposes only. table1 and table2 are the furniture sense and the data-layout sense of en: table.
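The same fictional matrix can be stored as a weighted graph, in which each non-zero off-diagonal cell becomes an edge, so that a lookup returns graded neighbours instead of a fixed list of equivalents. This is a minimal Python sketch; the function names `to_graph` and `related` and the threshold parameter are hypothetical choices for illustration.

```python
# The fictional relatedness matrix from the slide, rows and columns in the order shown.
LABELS = ["table1", "table2", "desk", "spreadsheet",
          "Tisch", "Schreibtisch", "Tabelle", "Arbeitsblatt"]
MATRIX = [
    [1,   0,   0.1, 0,   0.6, 0.4, 0,   0],
    [0,   1,   0,   0.5, 0,   0,   0.8, 0.8],
    [0.1, 0,   1,   0,   0.6, 0.8, 0,   0],
    [0,   0.5, 0,   1,   0,   0,   0.7, 0.8],
    [0.6, 0,   0.6, 0,   1,   0.8, 0,   0],
    [0.4, 0,   0.8, 0,   0.8, 1,   0,   0],
    [0,   0.8, 0,   0.7, 0,   0,   1,   0.8],
    [0,   0.8, 0,   0.8, 0,   0,   0.8, 1],
]


def to_graph(labels: list[str], matrix: list[list[float]]) -> dict[str, dict[str, float]]:
    """Weighted adjacency dict: keep non-zero off-diagonal cells as edges."""
    graph: dict[str, dict[str, float]] = {label: {} for label in labels}
    for i, a in enumerate(labels):
        for j, b in enumerate(labels):
            if i != j and matrix[i][j] > 0:
                graph[a][b] = matrix[i][j]
    return graph


def related(graph: dict[str, dict[str, float]], word: str,
            threshold: float = 0.0) -> list[tuple[str, float]]:
    """Neighbours of word at or above threshold, strongest first, ties alphabetical."""
    pairs = [(w, s) for w, s in graph[word].items() if s >= threshold]
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    graph = to_graph(LABELS, MATRIX)
    print(related(graph, "desk", threshold=0.5))
```

For "desk" with a threshold of 0.5 this returns Schreibtisch (0.8) before Tisch (0.6). Lowering the threshold admits weaker relations such as table1 (0.1), so the cut-off becomes a user's choice rather than a binary decision baked into the data.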
Division of labour
Dumb user, smart dictionary vs smart user, dumb dictionary
A smart dictionary provides the correct answers
A dumb dictionary provides hints, like a thesaurus or synonym
dictionary
A dumb user looks for definite answers
A smart user can figure out the answer based on even subtle hints
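The contrast can be sketched as two lookup functions over the English-German cells of the fictional relatedness table. This is an illustrative interpretation, not a design taken from the talk: here a "smart dictionary" is assumed to collapse the scores into one answer, while a "dumb dictionary" returns all candidates with their scores. The names `smart_lookup` and `hint_lookup` are hypothetical.

```python
# English -> German cells of the fictional relatedness table (illustration only).
EQUIVALENTS: dict[str, dict[str, float]] = {
    "table1": {"Tisch": 0.6, "Schreibtisch": 0.4},
    "table2": {"Tabelle": 0.8, "Arbeitsblatt": 0.8},
    "desk": {"Tisch": 0.6, "Schreibtisch": 0.8},
    "spreadsheet": {"Tabelle": 0.7, "Arbeitsblatt": 0.8},
}


def smart_lookup(word: str) -> str:
    """'Smart dictionary': a single answer, the highest-scoring equivalent.
    max() keeps the first of several equal scores, so ties are hidden from the user."""
    candidates = EQUIVALENTS[word]
    return max(candidates, key=candidates.get)


def hint_lookup(word: str) -> list[tuple[str, float]]:
    """'Dumb dictionary': every equivalent with its score, strongest first
    (ties alphabetical), leaving the final choice to the user."""
    return sorted(EQUIVALENTS[word].items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    print(smart_lookup("table2"))
    print(hint_lookup("table2"))
```

For table2 the single-answer lookup returns Tabelle, while the hint lookup shows that Arbeitsblatt scores just as high, which is the kind of information a smart user can act on.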
Thanks for listening
Contacts and recommended reading
Slides:
www.slideshare.net/arvitavast
Contact:
arvi@qlaara.com
Easy reading:
blog.qlaara.com
Pointer to the real stuff:
Ramscar, M. et al. 2010. The Effects of
Feature-Label-Order and Their Implications for Symbolic
Learning. Cognitive Science 34(6): 909–957.