This document discusses corpus linguistics and its applications in semantic and pragmatic studies. It provides definitions and examples of corpus linguistics, prominent corpora that are used in research, and how corpus linguistics can be applied to study semantic prosody. The document also discusses how corpus linguistics can inform the study of semantics versus pragmatics and provides examples of studies analyzing nominal compounds, genre analysis, and other linguistic features using corpus-driven approaches.
Corpus and semantics final
1. Maria Carolina
Filipe Rodrigues
Fabio Rondinelli
Erico Caetano
Corpus linguistics and
semantics studies
2. What is corpus linguistics?
Collection and analysis of a specific set of data
Corpus characteristics
Technology and corpora
Access to language in proper use
Quantification
It facilitates access to the material
3. Best-known corpora
Brown Corpus
British National Corpus
Oxford English Corpus
International Corpus of English
Example: http://corpus.byu.edu/coca/
4. Corpus linguistics in semantic prosody
The term prosody in semantic prosody is borrowed
from Firth (1957), who used it to refer to phonological
colouring which spreads beyond segmental boundaries. For
example, the word animal has so strong a nasal
prosody that the vowel sound of the letter a takes on
a nasal quality through assimilation, simply because a
is adjacent to the nasal sound of n. In the same way,
lexical items show a comparable kind of
prosody in lexical patterning. Inspired by the Firthian
sense of prosody, Bill Louw coined the term semantic
prosody and gave it its first definition: "a
consistent aura of meaning with which a form is imbued
by its collocates" (Louw, 1993: 157).
5. Louw illustrates SP with several examples, such as the
adverb utterly, the phrase bent on and the expression
symptomatic of, all of which carry a negative
SP. These three items are followed by expressions
which refer to undesirable things, such as destroying,
ruining, clinical depression, multitude of sins, etc.
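One way to inspect a semantic prosody is to list the words that follow a node word in a corpus and read off whether they are mostly negative. The sketch below is only an illustration of that idea, not Louw's procedure: the function names, the span parameter and the three toy sentences are assumptions made for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def right_collocates(tokens, node, span=3):
    """Count the words found up to `span` positions after each hit of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[i + 1 : i + 1 + span])
    return counts

# Toy sentences invented for illustration; a real study would use a large corpus.
sample = (
    "The plan was utterly ruined by the storm. "
    "She felt utterly destroyed after the verdict. "
    "The idea is utterly ridiculous."
)
collocates = right_collocates(tokenize(sample), "utterly", span=1)
print(collocates.most_common())
```

With a real corpus, the analyst would still have to judge by hand whether the listed collocates are desirable or undesirable; the count only surfaces the candidates.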
6. Semantics vs. Pragmatics
Semantic meaning and pragmatic meaning are the two
extremes of the meaning system: semantic meaning
can be seen as the meaning which arises only from
linguistic factors in a piece of communication, while
pragmatic meaning is the meaning imposed by the
non-linguistic elements which have an impact on
communication.
7. Studies on Corpus Linguistics and Semantics
-Chishman and Teixeira (2009) provide us with an
interesting study on nominal compounds based on Corpus
Linguistics.
-Data from 10 digital issues of National Geographic were
analyzed with software.
-It recalls a common problem for Brazilian students of English:
when trying to say bolo de maçã, for instance, they
may try cake of apple or even apples cake before getting
to apple cake, the correct nominal compound.
8. - Identification and categorization of recurrent
semantic relations within nominal compounds.
Examples: in memory drugs we find a relation of
telicity, for those drugs aim at serving memory
purposes. In school play, there is a relation of
location, while in rice bag the relation is one of
meronymy, for one element contains the other.
9. Such analysis could inspire us to observe the relation of
compound nouns and even suggest that Brazilian students of
English take a deeper look at them. For instance, what kind of
relation would students find in the following compounds?
How could they explain it in their own words?
- car accident - fruit bat - skin cancer
- island culture - lemon tree - cameraman
- metal armor - ethanol production
10. A Corpus-Driven Approach to Genre Analysis
- The paper shows that an exhaustive corpus-driven
approach, mixed with statistics, is the most effective
analytical method for comparing texts across genres.
- By using the resources above, the author examines
the characteristics of each genre, looking at the
behavior of words and phrases.
- According to the author, such an approach can
contribute much to the pragmatic
analysis of written texts.
11. Genres
Prior conceptions of genre relied on external
criteria (Biber 1988, 1993).
With the new approach, genre can be based on internal
criteria.
Instead of using a priori listings, genre can emerge
through quantitative research in linguistics.
Biber (1988) and the multidimensional approach: if
some linguistic features are frequent in a text,
other features will appear less frequently.
12. Corpus compilation
The general reference corpus includes academic,
newspaper and literary texts from 6 pre-existing corpora.
The sizes of the resulting genre corpora are as follows:
academic corpus (MicroConcord B + text category J of
the 4 corpora), 1,662,106 running words; newspaper
corpus (MicroConcord A + text category A, B, C texts of
4 corpora), 1,760,664 running words; literature corpus
(text category K-R texts from 4 corpora), 1,019,254
running words. The size of the general reference corpus
derived from mixing the 4 corpora (hereafter referred to
as the GR corpus) was 4,071,830 running words.
13. Vocabulary variety and difficulty
The ranked order is: 1. newspaper, 2. literature, 3.
academic. Both S-TTR and Guiraud values therefore
suggest that newspaper English uses the most varied
vocabulary, literary English an intermediate one, and
academic English the least varied, if estimators of lexical
density are used.
From a solely empirical perspective, the inclusion of longer
words is taken to mean that a text has many difficult
words. The ranked order here is: 1. academic, 2. newspaper, 3. literature.
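The two vocabulary measures named above can be sketched in a few lines. This is a minimal illustration, not the study's own script: it assumes that S-TTR is the mean type/token ratio over consecutive fixed-size chunks (the chunk size of 1,000 is an assumption), that Guiraud's index is types divided by the square root of tokens, and that mean word length is an acceptable stand-in for "longer words". All function names are made up for this example.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def guiraud(tokens):
    """Guiraud's index: number of types divided by the square root of tokens."""
    return len(set(tokens)) / math.sqrt(len(tokens)) if tokens else 0.0

def standardized_ttr(tokens, chunk_size=1000):
    """Mean type/token ratio over consecutive full chunks of `chunk_size` tokens.

    A trailing chunk shorter than `chunk_size` is ignored.
    """
    ratios = [
        len(set(tokens[i : i + chunk_size])) / chunk_size
        for i in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0

def mean_word_length(tokens):
    """Average number of characters per token (a rough proxy for difficulty)."""
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0
```

Averaging over equal-sized chunks is what makes the ratio comparable across corpora of different sizes, since a plain type/token ratio falls as a text gets longer.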
14. N-gram analysis
This analysis was done by comparing multi-word units
between genre corpora, in particular 4-word units
occurring in each genre corpus. Coniam (2004) used
KfNgram (Fletcher 2002) to compute 4-word units
occurring in specific genre texts taken from applied
linguistics articles.
N-grams are able to identify the commonest
collocations in a discourse far more effectively than a
single word analysis. There is an overall tendency
toward using multi-word fixed units in academic texts
as opposed to other genres.
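Counting 4-word units can be illustrated with the standard library alone. The sketch below is not KfNgram and does not reproduce its options; the function names and the toy token list are assumptions for this example.

```python
from collections import Counter

def ngrams(tokens, n=4):
    """Return every run of `n` consecutive tokens as a tuple."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(tokens, n=4, k=10):
    """Return the `k` most frequent n-grams with their counts."""
    return Counter(ngrams(tokens, n)).most_common(k)

# Toy input invented for illustration.
tokens = "on the other hand it is on the other hand".split()
print(top_ngrams(tokens, n=4, k=1))
```

Running the same count on each genre corpus and comparing the top-ranked units is one way to see which genre relies more on fixed multi-word expressions.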
15. Personality in texts: I, we and passives
Kuo (1999) studied the use of personal pronouns in
academic texts from an empirical viewpoint. The use of
personal pronouns creates an environment for
interpersonal interaction between the writer and the
readers (Kuo 1999:123).
Literature overuses I, while academic and newspaper texts
underuse it. Academic and literary texts use we more often
than newspapers.
The passive voice is used much more in academic texts.
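Over- and underuse of a pronoun only make sense once counts are normalized for corpus size. The sketch below shows one common normalization, occurrences per 1,000 tokens; the function name and the sample sentence are assumptions, and contracted forms such as I'm are deliberately not counted as I.

```python
import re

def per_thousand(text, word):
    """Frequency of `word` per 1,000 tokens, ignoring case."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok == word.lower())
    return 1000 * hits / len(tokens)

# Toy sentence invented for illustration (10 tokens).
sample = "I think we agree and I know that we do"
print(per_thousand(sample, "I"), per_thousand(sample, "we"))
```

Comparing these normalized rates for a genre corpus against the rates in a reference corpus is what allows a pronoun to be described as overused or underused.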
16. Nominalization
Biber et al. (1998:58) suggest that studying a
morphological characteristic in a corpus can teach us
both about the frequency and distribution of the
characteristic and about the differing functions of
particular variants.
Nominalization creates forms ending in -tion,
-sion, -ness, -ment and -ity, including plural forms.
17. Academic texts show nominalization at a higher rate
than other genres; they tend to use
nominalizations ending in -ity and -ment and, at a
much lower frequency, -ness.
Newspapers show a similar use of nominalization to
academic texts, but the -ment form is predominant.
Literary works use these three nominalizations
almost equally, and the -ness form is the most salient.
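A first-pass count of these endings can be done by suffix matching. The sketch below is a rough heuristic rather than the method used in the study: the patterns, the minimum stem length of two characters and the function name are assumptions, and plain suffix matching will still over-count words such as moment or nation that are not nominalizations.

```python
import re
from collections import Counter

# One pattern per ending, each allowing the plural form.
PATTERNS = {
    "-tion": re.compile(r"\w{2,}tions?"),
    "-sion": re.compile(r"\w{2,}sions?"),
    "-ness": re.compile(r"\w{2,}ness(es)?"),
    "-ment": re.compile(r"\w{2,}ments?"),
    "-ity": re.compile(r"\w{2,}(ity|ities)"),
}

def nominalization_counts(tokens):
    """Count tokens by nominalizing suffix; each token is counted at most once."""
    counts = Counter()
    for tok in tokens:
        for label, pattern in PATTERNS.items():
            if pattern.fullmatch(tok.lower()):
                counts[label] += 1
                break
    return counts
```

Dividing each count by the corpus size gives the per-genre rates that the comparison above depends on.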
18. References
ZHANG, Changu. An Overview of Corpus-based Studies of
Semantic Prosody. Asian Social Science, vol. 6, June 2010.
CHISHMAN, Rove; TEIXEIRA, Lilian F. A semântica dos
compostos nominais em língua inglesa: um estudo de
corpus. Veredas on-line Linguística de Corpus e
Computacional, 2/2009, p. 84-99.
NISHINA, Yasunori. A Corpus-Driven Approach to
Genre Analysis: The Reinvestigation of Academic,
Newspaper and Literary Texts. ELR Journal, 1 (2), 2007.