
Tags as Tools for Social Classification
Dr. Isabella Peters
Department of Information Science, Institute for Language and Information
Heinrich-Heine-University Düsseldorf, Germany
34th Annual Conference of the German Classification Society, July 2010

Outline
Theoretical assumptions:
I. Social classification can be based on folksonomies.
II. Power Tags are the most relevant tags.
III. Tag distributions on the resource level become stable.
Three main research questions:
I. How to build social classifications (automatically)?
II. Are Power Tags the most relevant tags for a resource?
III. (When do tag distributions become stable?)
Results are based on a study with students of the University of Düsseldorf.

Assumption I: Social classification can be based on folksonomies
Folksonomy = the sum of all tags of all users of a collaborative information service (e.g. delicious).
Platform folksonomy (all tags of the service) vs. resource folksonomy (all tags assigned to one resource).
Broad folksonomy (delicious: many users may tag the same resource) vs. narrow folksonomy (YouTube: typically only the uploader tags).
Social classification = collaborative knowledge representation with natural-language terms = “social categorization”.
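To make the difference between platform and resource folksonomy concrete, the following Python sketch represents a folksonomy as (user, resource, tag) triples and aggregates tag frequencies on both levels. The triple model is an assumed simplification (not delicious's actual data model), and the sample data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical sample folksonomy: one (user, resource, tag) triple per tag assignment.
TAGGINGS = [
    ("u1", "r1", "folksonomy"), ("u1", "r1", "tagging"),
    ("u2", "r1", "folksonomy"), ("u2", "r2", "web2.0"),
    ("u3", "r2", "folksonomy"), ("u3", "r2", "social"),
]


def platform_folksonomy(taggings):
    """Tag frequencies summed over all users and all resources of the service."""
    return Counter(tag for _user, _resource, tag in taggings)


def resource_folksonomy(taggings, resource):
    """Tag frequencies for one resource, summed over all users who tagged it."""
    return Counter(tag for _user, res, tag in taggings if res == resource)


print(platform_folksonomy(TAGGINGS).most_common(1))     # [('folksonomy', 3)]
print(resource_folksonomy(TAGGINGS, "r1").most_common())  # [('folksonomy', 2), ('tagging', 1)]
```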

Assumption I: Social classification can be based on folksonomies
Through its tags, the resource folksonomy reflects the collective intelligence of users in giving meaning to the resource.
The most popular tags are the most important tags for the resource = Power Tags.
Power Tags are only observable in broad folksonomies, because only there can many users assign the same tag to a resource (multiple tagging)!
Folksonomies deliver concept candidates for social classification.
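Why Power Tags depend on multiple tagging can be shown with a minimal Python sketch, again assuming a (user, resource, tag) triple model with hypothetical data: in a broad folksonomy the tag counts of one resource produce a ranking, while in a narrow folksonomy every tag occurs exactly once, so no tag stands out.

```python
from collections import Counter


def ranked_tags(taggings, resource):
    """Rank a resource's tags by the number of users who assigned them (descending)."""
    counts = Counter(tag for _user, res, tag in taggings if res == resource)
    return counts.most_common()


def has_popularity_signal(ranked):
    """True if some tag was assigned more often than another, i.e. a ranking exists."""
    return len({freq for _tag, freq in ranked}) > 1


# Hypothetical broad folksonomy: several users tag the same resource.
BROAD = [
    ("u1", "r1", "android"), ("u2", "r1", "android"), ("u3", "r1", "android"),
    ("u1", "r1", "mobile"), ("u2", "r1", "google"),
]
# Hypothetical narrow folksonomy: only the owner tags the resource.
NARROW = [("owner", "v1", "music"), ("owner", "v1", "live"), ("owner", "v1", "concert")]

print(ranked_tags(BROAD, "r1"))                          # [('android', 3), ('mobile', 1), ('google', 1)]
print(has_popularity_signal(ranked_tags(NARROW, "v1")))  # False
```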

Method I: Social classification can be based on folksonomies
Aim: finding tag pairs for the construction of a social classification.
Step 1: Calculating the Power Tags of a resource.
The number n of Power Tags depends on the type of tag distribution:
Power law → n = exponent
Inverse-logistic distribution → n = number of tags left of the turning point
[Figure: power law; inverse-logistic distribution]
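The two rules for n can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the study: the frequencies are assumed to be sorted by rank, the power-law exponent is estimated by a least-squares fit in log-log space and rounded to a whole number of tags, and the turning point of an inverse-logistic curve is approximated by the steepest drop between adjacent ranks. The distribution type is passed in by the caller; all function names are hypothetical.

```python
import math


def power_law_exponent(freqs):
    """Estimate a in f(r) = C / r**a by least squares on log f versus log r.

    freqs: positive tag frequencies sorted by rank (rank 1 first), at least two values.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the fitted log-log slope is -a


def turning_point_rank(freqs):
    """Approximate the turning point as the rank just before the steepest drop."""
    drops = [freqs[i] - freqs[i + 1] for i in range(len(freqs) - 1)]
    return drops.index(max(drops)) + 1


def number_of_power_tags(freqs, distribution):
    """n = exponent (rounded) for a power law; n = tags left of the turning point otherwise."""
    if distribution == "power_law":
        return max(1, round(power_law_exponent(freqs)))
    if distribution == "inverse_logistic":
        return turning_point_rank(freqs)
    raise ValueError(f"unknown distribution type: {distribution!r}")


power_law = [1000 / rank ** 2 for rank in range(1, 11)]  # exact power law, a = 2
inverse_logistic = [100, 95, 90, 40, 10, 8, 7]           # flat head, steep drop, long tail
print(number_of_power_tags(power_law, "power_law"))                # 2
print(number_of_power_tags(inverse_logistic, "inverse_logistic"))  # 3
```

Rounding the exponent is a design choice: the slide states n = exponent, but an exponent is a real number, so some rule for turning it into a tag count is needed.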

Method I: Social classification can be based on folksonomies
Step 2: Calculating co-occurrences between the Power Tags and the tags of the platform folksonomy.
Basis = Power Tags I from the resource level.
Power Tags II = co-occurring tags from the platform level.
Tag pairs are most valuable for social categorization because they reflect collective user intelligence.
[Figure: Power Tags I; Power Tags II]
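A minimal Python sketch of Step 2, assuming that two tags co-occur when the same user assigned both to the same resource (one "post"). The slides do not specify the co-occurrence unit; counting per resource would be an equally plausible choice. Data and function names are hypothetical.

```python
from collections import Counter, defaultdict


def posts(taggings):
    """Group (user, resource, tag) triples into posts: one tag set per (user, resource)."""
    grouped = defaultdict(set)
    for user, resource, tag in taggings:
        grouped[(user, resource)].add(tag)
    return list(grouped.values())


def co_occurring_tags(taggings, power_tag):
    """Platform-wide count of how often each other tag shares a post with power_tag."""
    counts = Counter()
    for tag_set in posts(taggings):
        if power_tag in tag_set:
            counts.update(tag_set - {power_tag})
    return counts


# Hypothetical platform folksonomy covering several resources.
TAGGINGS = [
    ("u1", "r1", "android"), ("u1", "r1", "mobile"), ("u1", "r1", "google"),
    ("u2", "r2", "android"), ("u2", "r2", "mobile"),
    ("u3", "r3", "android"), ("u3", "r3", "java"),
    ("u4", "r4", "mobile"), ("u4", "r4", "phone"),
]
print(co_occurring_tags(TAGGINGS, "android").most_common(1))  # [('mobile', 2)]
```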

Research Question I: How to build social classifications (automatically)?
Step 3: The determination of Power Tags I and II can be carried out automatically:
1) Identifying the distribution type of the resource's tags
2) Labeling the first n tags as Power Tags I
3) Identifying co-occurring tags
4) Identifying the distribution type of the co-occurring tags
5) Extracting the first n tags as Power Tags II
6) Combining Power Tags I and Power Tags II as tag pairs
Step 4: Intellectual determination of the relationship between Power Tags I and Power Tags II → carried out collaboratively or individually
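Steps 1 to 6 can be chained into a single automatic procedure. The Python sketch below is one hypothetical way to do it, not the implementation behind the slides. It assumes (user, resource, tag) triples, per-post co-occurrence, and a made-up heuristic for identifying the distribution type: a power law falls most steeply between ranks 1 and 2, whereas an inverse-logistic curve only falls steeply after its flat head. Step 4 (intellectual determination of the relationship) is deliberately left out.

```python
import math
from collections import Counter, defaultdict


def _drops(freqs):
    """Differences between adjacent ranks of a descending frequency list."""
    return [freqs[i] - freqs[i + 1] for i in range(len(freqs) - 1)]


def identify_distribution(freqs):
    """Hypothetical heuristic: a power law falls most steeply between ranks 1 and 2,
    an inverse-logistic curve only after its flat head."""
    drops = _drops(freqs)
    return "power_law" if drops.index(max(drops)) == 0 else "inverse_logistic"


def count_power_tags(freqs):
    """n = rounded power-law exponent, or number of tags left of the steepest drop."""
    if len(freqs) < 2:
        return len(freqs)
    if identify_distribution(freqs) == "power_law":
        xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
        ys = [math.log(f) for f in freqs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        return min(len(freqs), max(1, round(-slope)))
    drops = _drops(freqs)
    return drops.index(max(drops)) + 1


def power_tags(counts):
    """Identify the distribution type, then keep the first n tags of the ranking."""
    ranked = counts.most_common()
    n = count_power_tags([freq for _tag, freq in ranked])
    return [tag for tag, _freq in ranked[:n]]


def tag_pairs(taggings, resource):
    """Steps 1-6 for one resource: (Power Tag I, Power Tag II) pairs."""
    posts = defaultdict(set)  # assumed co-occurrence unit: one user's tags for one resource
    for user, res, tag in taggings:
        posts[(user, res)].add(tag)
    resource_counts = Counter(tag for _user, res, tag in taggings if res == resource)
    pairs = []
    for tag_one in power_tags(resource_counts):           # steps 1-2
        co_counts = Counter()                              # step 3
        for tag_set in posts.values():
            if tag_one in tag_set:
                co_counts.update(tag_set - {tag_one})
        if co_counts:                                      # steps 4-6
            pairs.extend((tag_one, tag_two) for tag_two in power_tags(co_counts))
    return pairs


# Hypothetical data: resource r1 plus two other resources on the same platform.
TAGGINGS = [
    ("u1", "r1", "android"), ("u1", "r1", "mobile"), ("u1", "r1", "google"),
    ("u2", "r1", "android"), ("u2", "r1", "mobile"),
    ("u3", "r1", "android"), ("u4", "r1", "android"),
    ("u5", "r2", "android"), ("u5", "r2", "mobile"),
    ("u6", "r3", "android"), ("u6", "r3", "java"),
]
print(tag_pairs(TAGGINGS, "r1"))  # [('android', 'mobile')]
```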

Research Question I: How to build social classifications (automatically)?
Examples:
1. a) Power Tags I: Android
1. b) Power Tags II: Mobile, Google
2. a) Power Tags I: Web 2.0
2. b) Power Tags II: Tools, Social, Blog, Socialsoftware, Bookmarks
Resulting descriptor sets (term, relation abbreviation, relation type: term type):
Android
  Google          RT   association: related term
  mobile          RT   association: related term
Web 2.0
  web             BT   hierarchy: broader term
  blog            NTP  meronymy: narrower term, partitive
  bookmarks       NTP  meronymy: narrower term, partitive
  tagging         NTP  meronymy: narrower term, partitive
  community       NTP  meronymy: narrower term, partitive
  ajax            NTP  meronymy: narrower term, partitive
  online          RT   association: related term
  Socialsoftware  UF   synonymy: used for
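The descriptor sets in these examples form a small thesaurus. As a hedged illustration, they could be stored in a Python structure like the one below; the class and function names are hypothetical, and the relations themselves remain the result of intellectual processing (Step 4), not of the code.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Relation types from the examples: value = (relation family, term type)."""
    RT = ("association", "related term")
    BT = ("hierarchy", "broader term")
    NTP = ("meronymy", "narrower term, partitive")
    UF = ("synonymy", "used for")


@dataclass(frozen=True)
class Entry:
    """One intellectually assigned relation between a descriptor and another tag."""
    term: str
    relation: Relation


# Descriptor sets as listed on the example slide.
DESCRIPTOR_SETS = {
    "Android": [Entry("Google", Relation.RT), Entry("mobile", Relation.RT)],
    "Web 2.0": [
        Entry("web", Relation.BT),
        Entry("blog", Relation.NTP),
        Entry("bookmarks", Relation.NTP),
        Entry("tagging", Relation.NTP),
        Entry("community", Relation.NTP),
        Entry("ajax", Relation.NTP),
        Entry("online", Relation.RT),
        Entry("Socialsoftware", Relation.UF),
    ],
}


def terms_with(descriptor, relation):
    """All terms linked to a descriptor by the given relation type."""
    return [entry.term for entry in DESCRIPTOR_SETS[descriptor] if entry.relation is relation]


print(terms_with("Web 2.0", Relation.NTP))  # ['blog', 'bookmarks', 'tagging', 'community', 'ajax']
```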

Assumption II: Power Tags are the most relevant tags
Building social classifications on Power Tags requires that an important precondition be fulfilled: Power Tags ARE the most relevant tags for a resource.
Problem: relevance judgments as well as tagging behaviour are highly subjective and error-prone (e.g. regarding spelling).
Is the collective intelligence of users capable of “ironing out” overly personal and erroneous tags, so that all users are satisfied with the high-frequency tags?

Method II: Power Tags are the most relevant tags
Investigation of 30 resources downloaded from delicious in February 2010.
Participants: 20 students of Information Science at HHU Düsseldorf.
All resources were tagged with “folksonomy”, to guarantee that the students are technically able to judge the relevance of the tags.
All resources were tagged by at least 100 users, to guarantee that broad tag distributions can be used as the test sample.
User evaluation:
Tag is relevant for the resource = indicated with 1
Tag is not relevant for the resource = indicated with 0
Students had access to the resource.
Students did not know the delicious rank of the tags.
Outcome: a relevance distribution of the tags for every resource, based on the students' judgments.
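Turning the binary judgments into a relevance distribution per resource amounts to computing, for each tag, the share of students who marked it with 1. A minimal Python sketch with hypothetical tags and only four students (the study used 20); the function name is hypothetical.

```python
def relevance_distribution(judgments):
    """Share of students who judged each tag relevant, highest share first.

    judgments: dict mapping a tag to one 0/1 vote per student (1 = relevant).
    """
    shares = {}
    for tag, votes in judgments.items():
        if not votes or any(vote not in (0, 1) for vote in votes):
            raise ValueError(f"votes for {tag!r} must be a non-empty list of 0/1 values")
        shares[tag] = sum(votes) / len(votes)
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Hypothetical judgments of four students for three tags of one resource.
JUDGMENTS = {
    "folksonomy": [1, 1, 1, 0],
    "toread": [0, 0, 0, 1],
    "tagging": [1, 1, 0, 0],
}
print(relevance_distribution(JUDGMENTS))
# [('folksonomy', 0.75), ('tagging', 0.5), ('toread', 0.25)]
```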

Research Question II: Are Power Tags the most relevant tags for a resource?
Determination of relevance: a tag counts as relevant if 50% or more of the students judged it relevant.
Extraction of the Top 10 delicious tags of each resource: how many students called these Top 10 tags relevant?
Calculation of the relative frequency of the students' relevance judgments → Pearson correlation: 0.49 (N = 30)
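The 50% criterion and a Pearson correlation can be sketched in Python as below. Which two variables were correlated is not stated on the slide; the sketch assumes the delicious frequency of each Top 10 tag versus the share of students who judged it relevant. The sample numbers are invented and are not the study's data.

```python
import math


def is_relevant(share, threshold=0.5):
    """Relevance criterion from the slide: at least 50% of students judged the tag relevant."""
    return share >= threshold


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical Top 10 tags of one resource: delicious frequencies and the share
# of students who judged each tag relevant (NOT the study's data).
frequencies = [120, 95, 60, 41, 33, 30, 22, 18, 15, 12]
shares = [0.95, 0.85, 0.40, 0.55, 0.30, 0.45, 0.20, 0.35, 0.10, 0.25]
print([is_relevant(s) for s in shares])
print(round(pearson(frequencies, shares), 2))
```

On Python 3.10 and later, `statistics.correlation` from the standard library computes the same coefficient; the explicit version above just keeps the formula visible.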

Research Question II: Are Power Tags the most relevant tags for a resource?
Result: only the first two tags are relevant → a strong indication for Power Tags.
Problems in the relevance judgments:
Bias towards German tags
No unification of spelling variants → solution: tag gardening (NLP)
No combination of phrase tags
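The spelling-variant problem suggests a simple tag-gardening step. A minimal Python sketch, assuming that variants differ only in letter case, hyphens, underscores or spaces; the normalization rule and function names are hypothetical, and real tag gardening (cf. Peters & Weller, 2008) covers much more. Synonyms and phrase tags are deliberately not handled.

```python
import re
from collections import Counter


def normalize(tag):
    """Hypothetical normalization: lowercase and drop hyphens, underscores and spaces."""
    return re.sub(r"[-_\s]+", "", tag.lower())


def unify_spelling_variants(tag_counts):
    """Merge the counts of tags that share the same normalized form."""
    merged = Counter()
    for tag, count in tag_counts.items():
        merged[normalize(tag)] += count
    return merged


counts = {"Web2.0": 5, "web_2.0": 3, "web 2.0": 2,
          "socialsoftware": 4, "Social-Software": 2, "social_software": 1}
print(unify_spelling_variants(counts))
# Counter({'web2.0': 10, 'socialsoftware': 7})
```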

Assumption III: Tag distributions on the resource level become stable
Studies have shown that the shape of a tag distribution remains stable once a particular number of tags and users has been reached:
Kipp & Campbell (2006); Maarek et al. (2006); Halpin, Robu, & Shepherd (2007); Maass, Kowatsch, & Münster (2007); Maier & Thalmann (2007)

Assumption III: Tag distributions on the resource level become stable
If this assumption is true, and “stable” means that
no rank permutations of tags occur anymore and
the relative number of tags does not change anymore,
then:
Power Tags I and II are like a controlled vocabulary for the resource.
Users have reached a consensus in describing and tagging the resource (visualized in the Power Tags).
Tags in the long tail of the distribution may be synonyms, tags with typing errors, narrower concepts, etc.
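The first stability criterion, no more rank permutations, can be checked by comparing the tag rankings of two snapshots of the same resource. A minimal Python sketch; the snapshot data, the top-10 cut-off and the alphabetical tie-breaking are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def ranking(counts, top=None):
    """Tags ordered by frequency (descending); ties broken alphabetically (an assumption)."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _count in ordered[:top]]


def no_rank_permutation(earlier, later, top=10):
    """True if the top-ranked tags appear in the same order in both snapshots."""
    return ranking(earlier, top) == ranking(later, top)


snapshot_1 = Counter({"folksonomy": 5, "tagging": 3, "web2.0": 1})
snapshot_2 = Counter({"folksonomy": 9, "tagging": 6, "web2.0": 2})
snapshot_3 = Counter({"folksonomy": 9, "tagging": 6, "web2.0": 7})
print(no_rank_permutation(snapshot_1, snapshot_2))  # True
print(no_rank_permutation(snapshot_2, snapshot_3))  # False
```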

Open Research Question III: When do tag distributions become stable?
To automate classification processes, we need to know after how many tagging users a tag distribution remains stable, i.e. when no more changes in the ranking of tags occur.
After that point, we can extract the Power Tags of the particular resource for social classification.

Open Research Question & Method III: When do tag distributions become stable?
Comparison of the tag distribution with n users and the final tag distribution (downloaded at a particular point in time).
Calculation of the relative frequency of every tag, rel. freq(t_1, …, t_n), for particular numbers of users.
Calculation of the average distance between the final tag distribution (fd) and the tag distribution with n users (td): subtraction of the average rel. freq(t_n, fd) of the final distribution and the average rel. freq(t_n, td) of the tag distribution with n users.
Stability is achieved when $\overline{\mathrm{rel.freq}}(t_n, fd) - \overline{\mathrm{rel.freq}}(t_n, td) < \text{threshold}$ (the bar denotes the average).
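A minimal Python sketch of this stability test under one possible reading: the average distance is taken as the mean absolute difference between each tag's relative frequency in the final distribution (fd) and after n users (td), and a distribution counts as stable from the first n after which this distance stays below the threshold for every larger n. The threshold value, the data and the function names are assumptions.

```python
from collections import Counter


def relative_frequencies(counts):
    """Relative frequency of every tag in a distribution."""
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}


def average_distance(final_counts, counts_n):
    """Mean absolute difference in relative frequency over the tags of the final distribution."""
    fd = relative_frequencies(final_counts)
    td = relative_frequencies(counts_n)
    return sum(abs(fd[tag] - td.get(tag, 0.0)) for tag in fd) / len(fd)


def users_until_stable(posts, threshold):
    """Smallest n from which every later distribution stays below the threshold.

    posts: one non-empty tag set per user, in the order in which the users tagged
    the resource. Returns None if no such n exists.
    """
    final = Counter(tag for tag_set in posts for tag in tag_set)
    running, distances = Counter(), []
    for tag_set in posts:
        running.update(tag_set)
        distances.append(average_distance(final, running))
    stable_from = None
    for n in range(len(distances), 0, -1):  # walk backwards from the final distribution
        if distances[n - 1] >= threshold:
            break
        stable_from = n
    return stable_from


# Hypothetical resource tagged by four users.
POSTS = [{"a"}, {"a", "b"}, {"a", "b"}, {"a", "b"}]
print(users_until_stable(POSTS, threshold=0.05))  # 3
```

Walking backwards from the final distribution enforces "remains stable": a single early dip below the threshold does not count if the distance rises again later.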

Conclusion
Social classification can be based on folksonomies: Power Tags are concept candidates.
The extraction of Power Tag I and II pairs can be carried out automatically.
Determining the relationship inherent in a tag pair requires intellectual processing.
Power Tags are the most relevant tags.
The relevance of tags can be enhanced by unifying and combining similar tags (here: not synonyms but spelling variants) → tag gardening.
Ongoing research: when do tag distributions become stable?

Conclusion
[Figure: workflow from automatic to intellectual processing. Automatic processing: What type of tag distribution?; Tag distribution stable?; Extraction of Power Tags I & II; Pairs of relevant Power Tags; Candidate vocabulary. Intellectual processing: Definition of concepts and of semantic relations; Intellectual structuring; Social knowledge organization system.]

Comments? Questions?
Isabella Peters: isabella.peters@uni-duesseldorf.de
Greetings from Düsseldorf!
This presentation is available on SlideShare: http://www.slideshare.net/isabellapeters.

References
Halpin, H., Robu, V., & Shepherd, H. (2007). The Complex Dynamics of Collaborative Tagging. In C. L. Williamson, M. E. Zurko, P. F. Patel-Schneider, & P. J. Shenoy (Eds.), Proceedings of the 16th International WWW Conference, Banff, Alberta, Canada (pp. 211-220). New York: ACM.
Kipp, M., & Campbell, D. (2006). Patterns and Inconsistencies in Collaborative Tagging Systems: An Examination of Tagging Practices. In Proceedings of the 17th Annual Meeting of the American Society for Information Science and Technology, Austin, Texas, USA.
Maarek, Y., Marmasse, N., Navon, Y., & Soroka, V. (2006). Tagging the Physical World. In Proceedings of the Collaborative Web Tagging Workshop at WWW 2006, Edinburgh, Scotland.
Maass, W., Kowatsch, T., & Münster, T. (2007). Vocabulary Patterns in Free-for-all Collaborative Indexing Systems. In Proceedings of the International Workshop on Emergent Semantics and Ontology Evolution, Busan, Korea (pp. 45-57).
Maier, R., & Thalmann, S. (2007). Kollaboratives Tagging zur inhaltlichen Beschreibung von Lern- und Wissensressourcen [Collaborative tagging for the content description of learning and knowledge resources]. In R. Tolksdorf & J. Freytag (Eds.), Proceedings of XML Tage, Berlin, Germany (pp. 75-86). Berlin: Freie Universität Berlin.
Peters, I. (2009). Folksonomies: Indexing and Retrieval in Web 2.0. Berlin: De Gruyter Saur.
Peters, I., & Stock, W. G. (2010). "Power Tags" in Information Retrieval. Library Hi Tech, 28(1), 81-93.
Peters, I., & Weller, K. (2008). Tag Gardening for Folksonomy Enrichment and Maintenance. Webology, 5(3), Article 58. Available at http://www.webology.ir/2008/v5n3/a58.html.
Stock, W. G. (2006). On Relevance Distributions. Journal of the American Society for Information Science and Technology, 57(8), 1126-1129.
