Lancaster-Oslo-Bergen Corpus
BUSINESS AND LEGAL
TRANSLATION
COMPARABLE BILINGUAL CORPORA:
THE LOB CORPUS
WHAT IS A CORPUS?
IT IS A COLLECTION OF ELECTRONICALLY STORED
SEMIOTIC DATA THAT HAS BEEN DESIGNED ACCORDING
TO SPECIFIC CORPUS DESIGN CRITERIA TO BE
MAXIMALLY REPRESENTATIVE OF (A PARTICULAR VARIETY
OF) LANGUAGE OR OTHER SEMIOTIC SYSTEMS (BUTLER,
2004).
FROM THE DEFINITION
• Electronically stored: the data can be processed by software.
• Semiotic: it covers meaning making in general, including gestures, not only words.
• Designed: the researchers carefully decide what to include and exclude, and in what proportion.
• Representative: it is a valid sample of a language variety or of another semiotic system, made up of naturally occurring examples of language (spoken or written).
• By studying the corpus, we can draw conclusions about the language or semiotic system it represents.
WHAT IS A CORPUS?
• It is a principled, large collection (body) of authentic texts that is stored on a computer and analyzed using software designed for corpus analysis.
• Principled: data collection is not random; it follows a planned design.
• Authentic: the texts are genuine communication by people going about their normal business (Sinclair, 1996).
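A basic tool of corpus-analysis software is the concordance, which lists every occurrence of a word together with its surrounding context (keyword in context, or KWIC). The following is a minimal Python sketch, assuming plain-text input and a simple regular-expression tokenizer; the function name `kwic` and the sample sentence are hypothetical.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on each side of every hit."""
    # Simple tokenizer (an assumption): words and punctuation marks become
    # separate tokens. Real corpus tools tokenize more carefully.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

if __name__ == "__main__":
    # Hypothetical sample of legal English.
    sample = ("The contract was signed in London. Either party may terminate "
              "the contract with thirty days' notice.")
    for line in kwic(sample, "contract"):
        print(line)
```

Aligning the keyword in a central column is what lets a translator scan many uses of a term at once and spot its typical collocates.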
What is a corpus?
• Computer-readable semiotic data (this makes the analysis easier, faster and more accurate).
• Authentic material (produced by people on particular social occasions, or otherwise accepted as authentic).
• Designed to be representative.
Comparable corpus
• A comparable corpus is one corpus in a set of two or more monolingual corpora, typically each in a different language, built according to the same principles. The content is therefore similar, and results can be compared between the corpora even though the corpora are not translations of each other (and are therefore not aligned).
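Because the corpora in a comparable set usually differ in size, raw counts are normally converted to a relative frequency, such as occurrences per million words, before results are compared. A minimal Python sketch follows, assuming two hypothetical mini-corpora (one English and one Spanish legal sentence) and a regular-expression tokenizer; the names `tokenize` and `per_million` are illustrative.

```python
import re
from collections import Counter

def tokenize(text):
    # Lower-cased word tokens. In Python 3, \w is Unicode-aware, so accented
    # Spanish letters stay inside words.
    return re.findall(r"\w+", text.lower())

def per_million(tokens, word):
    """Occurrences of `word` per million tokens (tokens assumed lower-cased)."""
    if not tokens:
        raise ValueError("empty corpus")
    counts = Counter(tokens)
    return counts[word.lower()] / len(tokens) * 1_000_000

if __name__ == "__main__":
    # Hypothetical mini-corpora; real comparable corpora contain many texts.
    english = tokenize("The contract is void if the contract was signed under duress.")
    spanish = tokenize("El contrato es nulo si se firmó bajo coacción.")
    print(f"contract: {per_million(english, 'contract'):.0f} per million")
    print(f"contrato: {per_million(spanish, 'contrato'):.0f} per million")
```

Normalizing to a common base is what makes a figure from a small corpus comparable with one from a much larger corpus in the other language.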
COMPARABLE BILINGUAL CORPUS
• NORMALLY SPECIALIZED COLLECTIONS OF SIMILAR SOURCE TEXTS IN THE TWO LANGUAGES.
• SUCH CORPORA CAN BE "MINED" FOR TERMINOLOGY AND OTHER EQUIVALENCES.
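One simple way to "mine" a specialized corpus for candidate terms is to count recurring two-word sequences that contain no function words, and then have a translator match the candidates across the two languages. The sketch below works under that assumption only; the stopword lists, the function name `candidate_terms` and the sample sentences are hypothetical, and this is just one of many term-extraction techniques.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword lists; a real study would use fuller lists.
STOPWORDS = {
    "en": {"the", "a", "an", "of", "to", "in", "and", "is", "by", "with"},
    "es": {"el", "la", "los", "las", "un", "una", "de", "del", "en", "y", "es", "por", "con"},
}

def candidate_terms(text, lang, top=5):
    """Return the most frequent adjacent word pairs that contain no stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    stop = STOPWORDS[lang]
    bigrams = Counter(
        (w1, w2)
        for w1, w2 in zip(tokens, tokens[1:])
        if w1 not in stop and w2 not in stop
    )
    return [" ".join(pair) for pair, _ in bigrams.most_common(top)]

if __name__ == "__main__":
    english = "The termination clause applies. A termination clause is standard."
    spanish = "Una cláusula penal aplica. La cláusula penal es estándar."
    print(candidate_terms(english, "en", top=1))
    print(candidate_terms(spanish, "es", top=1))
```

Because only adjacent pairs are counted, a Spanish pattern such as "cláusula de rescisión" is missed; fuller term extractors usually add part-of-speech patterns, which is one reason a tagged corpus is useful.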
• THE LOB CORPUS EXISTS IN TWO MAIN VERSIONS: THE ORIGINAL VERSION AND A POS-TAGGED (PART-OF-SPEECH-TAGGED) VERSION.
• IN THE TAGGED CORPUS EACH WORD IS ACCOMPANIED BY A WORD-CLASS TAG, ASSIGNED THROUGH A COMBINATION OF AUTOMATIC TAGGING PROGRAMS AND MANUAL PRE- AND POST-EDITING.
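The combination of automatic tagging and manual editing can be pictured as a two-step pipeline: a program proposes a tag for every word, and a human corrects the errors. The sketch below is a toy illustration of the post-editing part of that idea only, assuming a hypothetical lookup-table tagger (`LEXICON`, `auto_tag`) and a corrections dictionary keyed by token position (`post_edit`); the tag labels are illustrative, and the actual LOB tagging programs were far more sophisticated.

```python
# Hypothetical lookup table standing in for an automatic tagging program.
# The tag labels are illustrative, not a complete or official tagset.
LEXICON = {"the": "ATI", "appeal": "NN", "was": "BEDZ", "court": "NN"}

def auto_tag(tokens, default="NN"):
    """Automatic step: look each word up, falling back to a default tag."""
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]

def post_edit(tagged, corrections):
    """Manual step: replace tags at the positions a human editor corrected."""
    return [(tok, corrections.get(i, tag)) for i, (tok, tag) in enumerate(tagged)]

if __name__ == "__main__":
    tokens = ["The", "appeal", "was", "dismissed"]
    proposed = auto_tag(tokens)            # "dismissed" gets the default tag
    final = post_edit(proposed, {3: "VBN"})  # editor marks it as a past participle
    for word, tag in final:
        print(f"{word}_{tag}")
```

Keeping the corrections separate from the automatic output makes it easy to see how much of the final tagging depended on human intervention.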
LOB CORPORA._Important aspects a translator needs to know
Tagged versions
Each word is accompanied by a word-class tag.
There is no syntactic bracketing.
The tagged corpus is available in two formats:
I: a horizontal format, with running text in which each word is immediately followed by its associated tag;
II: a vertical format, in which each word is on a separate line together with its associated tag, some 'special information' and a reference number.
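The two tagged formats carry the same word-tag pairs, so one can be converted into the other. A minimal Python sketch follows, assuming that the horizontal format joins word and tag with an underscore (for example `contract_NN`) and numbering the vertical rows with a simple counter; the real LOB reference numbers and 'special information' fields are not reproduced, and the function names are hypothetical.

```python
def horizontal_to_vertical(line, sep="_"):
    """Split 'word_TAG word_TAG ...' into (reference number, word, tag) rows."""
    rows = []
    for ref, item in enumerate(line.split(), start=1):
        # rpartition splits on the last separator, so a word that itself
        # contains the separator keeps it.
        word, found, tag = item.rpartition(sep)
        if not found:
            raise ValueError(f"no tag found in {item!r}")
        rows.append((ref, word, tag))
    return rows

def vertical_to_horizontal(rows, sep="_"):
    """Join (reference number, word, tag) rows back into running text."""
    return " ".join(f"{word}{sep}{tag}" for _, word, tag in rows)

if __name__ == "__main__":
    # Illustrative tags on a hypothetical sentence.
    horizontal = "the_ATI contract_NN was_BEDZ signed_VBN"
    for ref, word, tag in horizontal_to_vertical(horizontal):
        print(f"{ref:04d}\t{word}\t{tag}")  # one word per line: number, word, tag
```

The horizontal form is easier to read as text, while the vertical form suits programs that process one word and its annotations per line.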
BIBLIOGRAPHY
• https://books.google.com.pa/books?id=AyRwW9YtuRsC&pg=PA1&lpg=PA1&dq=examples+of+horizontal+and+vertical+tag+for+lob+corpus&source=bl&ots=RdSY-3LlVh&sig=ACfU3U3X3JgRERmO6-8ZceXojEjvkkTrhw&hl=es-419&sa=X&ved=2ahUKEwjhlpP22Lr3AhXXkmoFHUcHBkgQ6AF6BAgiEAM#v=onepage&q=examples%20of%20horizontal%20and%20vertical%20tag%20for%20lob%20corpus&f=false
• https://wmtang.org/corpus-linguistics/a-glossary-of-corpus-types/
• https://www.youtube.com/watch?v=GWVFWgRgeOA
• https://varieng.helsinki.fi/CoRD/corpora/LOB/bibliography.html
• https://search.r-project.org/CRAN/refmans/corpora/html/LOBStats.html
• https://www1.essex.ac.uk/linguistics/external/clmt/w3c/corpus_ling/content/history.html
• https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199922765.001.0001/oxfordhb-9780199922765-miscMatter-10