The Dictionary of Italian Collocations: Design and Integration in an Online Learning Environment
Stefania Spina
University for Foreigners Perugia, Italy

LREC 2010 - Stefania Spina - The Dictionary of Italian Collocations

The Dictionary of Italian Collocations
• Part of the APRIL project (Personalised web environment for language learning)
• NLP resources as a support for the lexical competence of students of Italian within a Virtual Learning Environment (VLE)

Presentation outline
• Background and motivation
• Reference corpus
• Methodology
• Dictionary compilation
• Integration within the VLE

Background
Collocations have different syntactic and semantic profiles, but share some prototypical features:
• semantic non-compositionality
• non-substitutability of components by semantically similar words
• non-insertion of external items
These features form a continuum rather than definite categories.

Continuum
• Semantic non-compositionality: tagliare la corda ("run away", lit. "cut the rope") vs. aprire la porta ("open the door")
• Non-substitutability: camera oscura ("dark room"), but not *stanza oscura; {fare|porre|rivolgere|formulare} una domanda ("ask a question")
• Insertion of external items: fare una lunga calda riposante doccia ("take a long, hot, restful shower"), but not sistema *molto operativo ("operating system")

Motivation: collocations in SLA
• Improving learners' fluency
• Non-native speakers and L2 vocabulary: first single words, then more extended chunks
• Tendency to overuse the creative combination of isolated words (Sinclair's open-choice principle)
• Examples from Italian learner corpora:
  - preoccupata per il corso che mi mette nelle difficoltà (Russia) ("worried about the course, which puts me in difficulty"); target collocation: mettere in difficoltà ("cause problems")
  - e poi alla fine ho fatto questa decisione (Vietnam) ("and then in the end I made this decision"); target collocation: prendere una decisione ("make a decision")

DICI
• Collocations require specific pedagogical attention
• Dictionary of Italian Collocations (DICI):
  - it is corpus-based;
  - it is a learner-oriented tool: a list of the most common Italian collocations, classified on a frequency basis;
  - it is also based on statistical methodologies (dispersion across the different textual genres represented in the corpus).

Reference corpus
• Perugia corpus: POS-tagged, lemmatized

POS filtering
• Analysis of an existing list of collocations: 150 different POS sequences
• Selection of the 10 most productive POS sequences
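
The POS-filtering step can be pictured as ranking tag sequences by how many listed collocations instantiate them. The sketch below is only an illustration under that assumption ("productive" read as "number of collocations per POS sequence"); the tag names and the toy list are hypothetical and are not the Perugia corpus tagset or the original list.

```python
from collections import Counter

def most_productive_patterns(collocations, k=10):
    """Rank POS sequences by how many collocations instantiate them.

    `collocations` is a list of (words, pos_tags) pairs; only the tag
    sequence is used for ranking.
    """
    counts = Counter(tuple(tags) for _, tags in collocations)
    return [pattern for pattern, _ in counts.most_common(k)]

# Toy input with hypothetical tag names (not the Perugia corpus tagset).
sample = [
    (("prendere", "una", "decisione"), ("VER", "DET", "NOUN")),
    (("fare", "una", "domanda"), ("VER", "DET", "NOUN")),
    (("camera", "oscura"), ("NOUN", "ADJ")),
    (("vincere", "le", "elezioni"), ("VER", "DET", "NOUN")),
    (("sistema", "operativo"), ("NOUN", "ADJ")),
    (("mettere", "in", "difficoltà"), ("VER", "PRE", "NOUN")),
]

print(most_productive_patterns(sample, k=2))
# → [('VER', 'DET', 'NOUN'), ('NOUN', 'ADJ')]
```

With the real list, `k=10` would correspond to the 10 sequences kept on this slide.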

Experimental methodology: 4 steps
1. extraction of candidate collocations from the corpus;
2. filtering of the candidate collocations: frequency and dispersion;
3. compilation of the dictionary;
4. integration of the dictionary with the online learning environment.

Collocations extraction
• 6 POS sequences; 12-million-word sample, 4 sections
• Extraction via the IMS Corpus Workbench
• Removing all the candidates with frequency = 1 → 41,643 candidate collocations
• Two more filters: dispersion; manual selection (removal of non-collocations)
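
The extraction and frequency filter can be sketched as follows. This is not the IMS Corpus Workbench procedure used in the project; it is a minimal Python stand-in, assuming the corpus is available as (word, POS, lemma) triples, that candidates are contiguous lemma sequences matching one of the selected POS sequences, and that the tag names are hypothetical.

```python
from collections import Counter

# Each token is a (word, pos, lemma) triple, as in a POS-tagged, lemmatized corpus.
Token = tuple[str, str, str]

def extract_candidates(tokens: list[Token], patterns: list[tuple[str, ...]]) -> Counter:
    """Count lemma sequences whose POS tags match one of the patterns
    (contiguous matches only)."""
    counts = Counter()
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if tuple(pos for _, pos, _ in window) == pattern:
                counts[tuple(lemma for _, _, lemma in window)] += 1
    return counts

def drop_hapaxes(counts: Counter) -> dict:
    """Remove candidates with frequency = 1."""
    return {cand: f for cand, f in counts.items() if f > 1}

# Toy corpus fragment with hypothetical tags.
tokens = [
    ("prese", "VER", "prendere"), ("una", "DET", "un"), ("decisione", "NOUN", "decisione"),
    ("e", "CON", "e"),
    ("prendo", "VER", "prendere"), ("una", "DET", "un"), ("decisione", "NOUN", "decisione"),
    (".", "PUN", "."),
    ("fece", "VER", "fare"), ("una", "DET", "un"), ("domanda", "NOUN", "domanda"),
]

counts = extract_candidates(tokens, [("VER", "DET", "NOUN")])
print(drop_hapaxes(counts))
# → {('prendere', 'un', 'decisione'): 2}
```

Counting lemma sequences rather than word forms groups inflected variants (prese, prendo) under one candidate, which is what makes a lemmatized corpus useful here.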

Dispersion
• Examples of genre-specific collocations:
  - aggrottare la fronte ("to frown"): fiction
  - vincere le elezioni ("to win the elections"): press
  - dare una definizione ("to give a definition"): academic prose
• Juilland's D value (Juilland and Chang-Rodriguez, 1964)
• D value combined with frequency = usage
• Usage value threshold: 2 → 2,047 candidate collocations
• Manual selection. Final result: a list of 1,553 word combinations = dictionary entries
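
Juilland's D and usage can be computed from a candidate's frequencies in the corpus sections. The sketch below uses the standard formulation D = 1 - V / sqrt(n - 1), with V the coefficient of variation (population standard deviation over mean) and usage U = D × F. The per-section counts are invented, and the comparison operator for the threshold (>=) is an assumption, since the slide only gives the value 2.

```python
import math

def juilland_d(section_freqs):
    """Juilland's D = 1 - V / sqrt(n - 1), where V is the coefficient of
    variation (population std / mean) of a candidate's frequencies across
    n corpus sections. 1 = perfectly even, 0 = all in one section."""
    n = len(section_freqs)
    if n < 2:
        raise ValueError("need at least two corpus sections")
    mean = sum(section_freqs) / n
    if mean == 0:
        return 0.0
    sd = math.sqrt(sum((f - mean) ** 2 for f in section_freqs) / n)
    return 1 - (sd / mean) / math.sqrt(n - 1)

def juilland_usage(section_freqs):
    """Usage U = D * F, with F the candidate's total frequency."""
    return juilland_d(section_freqs) * sum(section_freqs)

def filter_by_usage(candidates, threshold=2.0):
    """Keep candidates whose usage reaches the threshold (>= is assumed)."""
    return {cand: freqs for cand, freqs in candidates.items()
            if juilland_usage(freqs) >= threshold}

# Hypothetical frequencies in four corpus sections.
candidates = {
    "prendere una decisione": [3, 3, 3, 3],  # evenly spread: D = 1, U = 12
    "aggrottare la fronte": [12, 0, 0, 0],   # one section only: D ≈ 0, U ≈ 0
    "dare una definizione": [2, 1, 0, 1],    # D ≈ 0.59, U ≈ 2.37
}
print(sorted(filter_by_usage(candidates)))
# → ['dare una definizione', 'prendere una decisione']
```

Note that a collocation concentrated in a single genre (such as the fiction example) is penalized even when its raw frequency is high. This is the intended effect of combining dispersion with frequency.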

Collocations list
[Figure: extract from the collocations list]

Compilation of the Dictionary
Lexical database enriched with two kinds of data:
• visible to the learner (client output): definition, examples, part of speech, syntactic context of occurrence of the collocations;
• to be processed by other applications (server): internal syntactic configuration for automatic recognition.
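
One way to picture a dictionary entry with these two kinds of data is a record split into learner-facing and server-side fields. The sketch below is a hypothetical data model, not the actual DICI schema. The field names are invented, and the "internal syntactic configuration" is assumed here to consist of a lemma sequence plus a maximum number of inserted tokens.

```python
from dataclasses import asdict, dataclass, field

@dataclass
class CollocationEntry:
    # --- visible to the learner (client output) ---
    headword: str
    definition: str
    pos: str
    syntactic_context: str
    examples: list[str] = field(default_factory=list)
    # --- processed by other applications (server side); hypothetical fields ---
    lemmas: tuple[str, ...] = ()  # lemma sequence used for automatic recognition
    max_gap: int = 0              # external tokens allowed between components

    def client_view(self) -> dict:
        """Return only the fields meant to be shown to the learner."""
        data = asdict(self)
        for key in ("lemmas", "max_gap"):
            del data[key]
        return data

entry = CollocationEntry(
    headword="prendere una decisione",
    definition="to make a decision",
    pos="V + N",
    syntactic_context="prendere una decisione (su qualcosa)",
    examples=["Alla fine ho preso una decisione."],
    lemmas=("prendere", "un", "decisione"),
    max_gap=1,
)
print(sorted(entry.client_view()))
```

Keeping both kinds of data in one record lets the learner view and the recognition component share a single source of truth, while the client only receives the fields it displays.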

DB integration in the VLE
• Virtual Learning Environment: a web application specifically devoted to language learning
• LELE (Linguistically-Enhanced Learning Environment): provides language learners with additional NLP resources in order to improve their linguistic competence
• Receptive and productive learning activities concerning the recognition and the active use of collocations

LELE features
• to automatically recognize and highlight multi-word units in written Italian texts;
• to show additional linguistic information about the selected collocations;
• to generate collocation tests for assessing the collocational competence of second or foreign language learners.
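
The recognition, highlighting and test-generation features can be illustrated with a small sketch. It is not LELE's implementation. It assumes the input text is already lemmatized, that a collocation is recognized when its lemmas occur in order with at most `max_gap` external tokens between consecutive components (a greedy, first-match search), and that a test item is a simple cloze with one component blanked. All function names are hypothetical.

```python
def find_collocation(lemmas, pattern, max_gap=0):
    """Return the indices of the first in-order match of `pattern` in
    `lemmas`, allowing up to `max_gap` extra tokens between consecutive
    components (greedy search), or None if there is no match."""
    for start, lemma in enumerate(lemmas):
        if lemma != pattern[0]:
            continue
        positions = [start]
        for component in pattern[1:]:
            last = positions[-1]
            window = range(last + 1, min(last + 2 + max_gap, len(lemmas)))
            nxt = next((j for j in window if lemmas[j] == component), None)
            if nxt is None:
                break
            positions.append(nxt)
        else:
            return positions
    return None

def highlight(words, positions):
    """Wrap the words belonging to a recognized collocation in brackets."""
    marked = set(positions)
    return " ".join(f"[{w}]" if i in marked else w for i, w in enumerate(words))

def cloze(words, positions, component=0, blank="____"):
    """Blank out one component of a recognized collocation and return
    the test sentence together with the expected answer."""
    target = positions[component]
    sentence = " ".join(blank if i == target else w for i, w in enumerate(words))
    return sentence, words[target]

words = ["Ieri", "ho", "preso", "una", "difficile", "decisione"]
lemmas = ["ieri", "avere", "prendere", "un", "difficile", "decisione"]
pattern = ("prendere", "un", "decisione")

positions = find_collocation(lemmas, pattern, max_gap=1)
print(highlight(words, positions))  # → Ieri ho [preso] [una] difficile [decisione]
print(cloze(words, positions))      # → ('Ieri ho ____ una difficile decisione', 'preso')
```

With `max_gap=0` the same sentence yields no match, because the inserted adjective breaks contiguity. Allowing a small gap is one simple way to model the partial insertability shown in the Continuum slide.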

LELE scheme
[Figure: LELE architecture scheme; the only recoverable label is "server"]

Conclusions
• Next step: apply the same methodology to the whole corpus, for all 10 selected POS sequences
• Further research:
  - refine the statistical measures;
  - assign collocations to different levels of competence;
  - develop other tools (for productive tasks).

Stefania Spina
stefania.spina@unistrapg.it
http://april.unistrapg.it
