
Terminology work and term databases in Estonia
With emphasis on termbase data structures
Arvi Tavast, PhD
qlaara
Riga, 4 November 2015

Introduction
From Estonian terminology to termbase data structures
We used to have specialised lexicography that people
affectionately called terminology
Then we had a bit of terminology
(even applied to general language)
There were calls for a unified termbase of all terms
Which is unfortunately not doable:
coverage
reliability
lack of convention
theoretical issues
The following presentation gives a bit more detail

Outline
1 Lexicography: semasiological data structures
2 Terminology: onomasiological data structures
3 What’s wrong
Data structures
Metaphors of communication
4 Quantitative dictionary data structures
Data structures
Division of labour

Semasiological data structures
Words and what they mean
en: table
  1. a piece of furniture with four legs and a flat top
     de: Tisch
  2. layout of data in rows and columns
     de: Tabelle
en: desk
  - an office table
    de: Tisch
    de: Schreibtisch
en: spreadsheet
  - a data layout consisting of rows and columns
    de: Tabelle
    de: Arbeitsblatt
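
As a rough illustration, the word-first layout above could be held in a nested mapping like the one below. This is only a sketch: the names `SEMASIOLOGICAL` and `translations` are invented for this example and do not come from any actual termbase software.

```python
# Semasiological layout: the headword is the key, each headword lists its
# senses, and each sense carries its own definition and German equivalents.
# (Hypothetical structure for illustration; data copied from the slide.)
SEMASIOLOGICAL = {
    "table": [
        {"definition": "a piece of furniture with four legs and a flat top",
         "de": ["Tisch"]},
        {"definition": "layout of data in rows and columns",
         "de": ["Tabelle"]},
    ],
    "desk": [
        {"definition": "an office table",
         "de": ["Tisch", "Schreibtisch"]},
    ],
    "spreadsheet": [
        {"definition": "a data layout consisting of rows and columns",
         "de": ["Tabelle", "Arbeitsblatt"]},
    ],
}


def translations(headword):
    """Collect the German equivalents listed under any sense of a headword."""
    result = []
    for sense in SEMASIOLOGICAL.get(headword, []):
        for term in sense["de"]:
            if term not in result:
                result.append(term)
    return result


print(translations("table"))
print(translations("desk"))
```

Note that the data-layout sense is described twice, once under "table" and once under "spreadsheet", with different wording, and nothing in the structure links the two descriptions.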

Onomasiological data structures
Concepts and what they are called
1 A piece of furniture with four legs and a flat top, for eating
    en: table
    de: Tisch
2 A piece of furniture with four legs and a flat top, for writing
    en: desk
    de: Tisch
    de: Schreibtisch
3 Layout of data in rows and columns
    en: table
    en: spreadsheet
    de: Tabelle
    de: Arbeitsblatt
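
The concept-first layout above might be sketched as a list of concept records, each holding its terms per language. The names `CONCEPTS`, `terms_for` and `concepts_for` are assumptions made for this example, not part of any real termbase format.

```python
# Onomasiological layout: the concept is the entry, and terms in each
# language hang off it. (Hypothetical structure; data copied from the slide.)
CONCEPTS = [
    {"id": 1,
     "definition": "A piece of furniture with four legs and a flat top, for eating",
     "terms": {"en": ["table"], "de": ["Tisch"]}},
    {"id": 2,
     "definition": "A piece of furniture with four legs and a flat top, for writing",
     "terms": {"en": ["desk"], "de": ["Tisch", "Schreibtisch"]}},
    {"id": 3,
     "definition": "Layout of data in rows and columns",
     "terms": {"en": ["table", "spreadsheet"],
               "de": ["Tabelle", "Arbeitsblatt"]}},
]


def terms_for(concept_id, lang):
    """Return the terms that name a given concept in a given language."""
    for concept in CONCEPTS:
        if concept["id"] == concept_id:
            return list(concept["terms"].get(lang, []))
    return []


def concepts_for(term, lang):
    """Return the ids of all concepts that a term names in a given language."""
    return [c["id"] for c in CONCEPTS if term in c["terms"].get(lang, [])]


print(terms_for(3, "de"))
print(concepts_for("table", "en"))
```

Each definition is stored once, and a term that names several concepts (here "table" or "Tisch") simply appears under each of them.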

Example
Latvian-Estonian dictionary

What’s wrong
Data structures
Semasiology
Pro: easy for the editor, understandable for the reader
Con: no support for consistency
A narrative about the editor, not a data source about language
Onomasiology
Pro: consistency, scalability, standardisation
Con: need for explicit binary decisions
An oversimplified data source about language; works if
concepts are known
Both
Binary: either means or does not mean, there is no scale
Introspective: claims are not falsifiable
Simplistic: assume the concepts are (or can be) known
The channel metaphor of communication

What’s wrong
The channel metaphor vs uncertainty reduction
Encoding of a message must contain a set of discriminable
states that is greater than or equal to the number of
discriminable states in the to-be-encoded message
or:
Encoding thoughts with words can only work if the number of
possible thoughts is smaller than or equal to the number of
possible words
This is the case only in very restricted domains (e.g. weather
forecasts)
Ramscar, M. et al. 2010. The Effects of Feature-Label-Order and Their Implications
for Symbolic Learning. Cognitive Science 34(6): 909–957.
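
The counting argument on this slide can be illustrated with a toy sketch. The three "thoughts" and two words below are a made-up example built from the furniture and data-layout concepts used earlier in the talk, and the function names are hypothetical.

```python
def can_encode_without_loss(n_thoughts, n_words):
    """A one-word-per-thought code can be unambiguous only if there are
    at least as many words as thoughts (pigeonhole principle)."""
    return n_words >= n_thoughts


def ambiguous_words(encoding):
    """Return the words that stand for more than one thought."""
    thoughts_by_word = {}
    for thought, word in encoding.items():
        thoughts_by_word.setdefault(word, []).append(thought)
    return {w: ts for w, ts in thoughts_by_word.items() if len(ts) > 1}


# Hypothetical toy encoding: three thoughts, but only two words available.
encoding = {
    "furniture for eating": "Tisch",
    "furniture for writing": "Tisch",
    "data layout": "Tabelle",
}

n_thoughts = len(encoding)
n_words = len(set(encoding.values()))
print(can_encode_without_loss(n_thoughts, n_words))
print(ambiguous_words(encoding))
```

With fewer words than thoughts, at least one word has to cover several thoughts, so the receiver cannot decode it without additional information.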

Quantitative data structures
Words (lexomes), their relatedness and other numerical parameters
Empirical data sources, rather than introspective
Corpus research, frequencies, collocations, distributional
semantics
Human experimental judgements
NB Meaning is inherently introspective, not measurable.
Relative meaning is measurable
Quantified data, rather than binary
Types of relatedness: synonyms, equivalents, cohyponyms, etc.
Other numerical parameters: frequency, valence, emotion,
reaction times, naming latencies, neighbourhood density,
relative entropy, median absolute deviation, morphological
distribution, search statistics etc.

Quantitative data structures
Relatedness can be quantified and presented as a graph or a table

              table1  table2  desk  spreadsheet  Tisch  Schreibtisch  Tabelle  Arbeitsblatt
table1        1       0       0.1   0            0.6    0.4           0        0
table2        0       1       0     0.5          0      0             0.8      0.8
desk          0.1     0       1     0            0.6    0.8           0        0
spreadsheet   0       0.5     0     1            0      0             0.7      0.8
Tisch         0.6     0       0.6   0            1      0.8           0        0
Schreibtisch  0.4     0       0.8   0            0.8    1             0        0
Tabelle       0       0.8     0     0.7          0      0             1        0.8
Arbeitsblatt  0       0.8     0     0.8          0      0             0.8      1

Fictional data for demonstration purposes only
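
One possible way to store and query such a relatedness table is sketched below. The numbers are the fictional values from the slide; the names `WORDS`, `RELATEDNESS`, `is_symmetric` and `related_words`, and the idea of a cut-off threshold, are assumptions made for this example.

```python
# Fictional relatedness scores copied from the slide; row and column order
# follows WORDS. (Hypothetical representation, for illustration only.)
WORDS = ["table1", "table2", "desk", "spreadsheet",
         "Tisch", "Schreibtisch", "Tabelle", "Arbeitsblatt"]

RELATEDNESS = [
    [1,   0,   0.1, 0,   0.6, 0.4, 0,   0],
    [0,   1,   0,   0.5, 0,   0,   0.8, 0.8],
    [0.1, 0,   1,   0,   0.6, 0.8, 0,   0],
    [0,   0.5, 0,   1,   0,   0,   0.7, 0.8],
    [0.6, 0,   0.6, 0,   1,   0.8, 0,   0],
    [0.4, 0,   0.8, 0,   0.8, 1,   0,   0],
    [0,   0.8, 0,   0.7, 0,   0,   1,   0.8],
    [0,   0.8, 0,   0.8, 0,   0,   0.8, 1],
]


def is_symmetric(matrix):
    """Check that relatedness(a, b) equals relatedness(b, a) everywhere."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i]
               for i in range(n) for j in range(n))


def related_words(word, threshold=0.0):
    """Return (other word, score) pairs above the threshold, strongest first."""
    row = RELATEDNESS[WORDS.index(word)]
    scored = [(other, score) for other, score in zip(WORDS, row)
              if other != word and score > threshold]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(is_symmetric(RELATEDNESS))
print(related_words("desk"))
print(related_words("desk", threshold=0.5))
```

Instead of a yes/no answer to "does desk mean Tisch?", the query returns a ranked list, and the reader or the application chooses where to draw the line.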

Division of labour
Dumb user, smart dictionary vs smart user, dumb dictionary
A smart dictionary provides the correct answers
A dumb dictionary provides hints, like a thesaurus or synonym
dictionary
A dumb user looks for definite answers
A smart user can figure out the answer based on even subtle
hints

Thanks for listening
Contacts and recommended reading
Slides:
www.slideshare.net/arvitavast
Contact:
arvi@qlaara.com
Easy reading:
blog.qlaara.com
Pointer to the real stuff:
Ramscar, M. et al. 2010. The Effects of
Feature-Label-Order and Their Implications for Symbolic
Learning. Cognitive Science 34(6): 909–957.
