SlideShare feed for slideshows by user eickhoff / http://www.slideshare.net/images/logo.gif / Wed, 24 Jul 2019 15:48:17 GMT

Unsupervised Learning of General-Purpose Embeddings for User and Location Modeling /slideshow/unsupervised-learning-of-generalpurpose-embeddings-for-user-and-location-modeling/157555587
Many social network applications depend on robust representations of spatio-temporal data. In this work, we present an embedding model based on feed-forward neural networks that transforms social media check-ins into dense feature vectors encoding geographic, temporal, and functional aspects for modeling places, neighborhoods, and users. We employ the embedding model in a variety of applications, including location recommendation, urban functional zone study, and crime prediction. For location recommendation, we propose a Spatio-Temporal Embedding Similarity algorithm (STES) based on the embedding model. In a range of experiments on real-life data collected from Foursquare, we demonstrate our model's effectiveness at characterizing places and people and its applicability in the aforementioned problem domains. Finally, we select eight major cities around the globe and verify the robustness and generality of our model by porting pre-trained models from one city to another, thereby alleviating the need for costly local training.
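As a rough illustration of the embedding-similarity idea behind STES, the sketch below ranks candidate places by interpolating each place embedding's cosine similarity to a user embedding and to a time-slot embedding. The hand-made vectors, the `rank_places` interface, and the weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def rank_places(user_vec, time_vec, place_vecs, alpha=0.5):
    """Score each candidate place by a convex combination of its
    embedding's similarity to the user embedding and to the current
    time-slot embedding; return place ids by descending score."""
    scores = {
        pid: alpha * cosine(user_vec, v) + (1 - alpha) * cosine(time_vec, v)
        for pid, v in place_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Tiny hand-made embeddings (a real model would learn these from check-ins).
user_vec = [1.0, 0.2, 0.0]
time_vec = [0.0, 1.0, 0.3]
place_vecs = {
    "cafe":   [0.9, 0.3, 0.1],    # close to the user's taste
    "club":   [0.1, 0.9, 0.4],    # close to the time-slot embedding
    "office": [-0.5, -0.1, 0.2],  # dissimilar to both
}
ranking = rank_places(user_vec, time_vec, place_vecs)
```

Setting `alpha=1.0` ignores the temporal signal and ranks purely by user affinity, which is one way such a model supports both personalized and time-aware recommendation from the same vectors.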

Unsupervised Learning of General-Purpose Embeddings for User and Location Modeling from Carsten Eickhoff
Web2Text: Deep Structured Boilerplate Removal /eickhoff/web2text-deep-structured-boilerplate-removal
Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is essential for the performance of derived applications. To address this issue, we introduce a novel model that performs sequence labeling to collectively classify all text elements in an HTML page as either boilerplate or main content. Our method uses convolutional networks on top of DOM tree features to learn unary classification potentials for each block of text on the page and pairwise potentials for each pair of neighboring text blocks. We find the most likely labeling according to these potentials using the Viterbi algorithm. The proposed method improves page cleaning performance on the CleanEval benchmark compared to the state-of-the-art. As a component of information retrieval pipelines, it improves retrieval performance on the ClueWeb12 collection.
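The decoding step described above is standard Viterbi over binary labels. A minimal pure-Python sketch follows, with hand-made log-potentials standing in for the CNN outputs (the actual model learns a unary potential per text block and a pairwise potential per neighboring pair).

```python
import math

def viterbi_binary(unary, pairwise):
    """Most likely boilerplate/content labeling of a block sequence.
    unary[i][l]: log-potential of block i taking label l
    (0 = boilerplate, 1 = content); pairwise[l][m]: log-potential of
    moving from label l on one block to label m on the next."""
    n = len(unary)
    score = list(unary[0])
    back = [[0, 0] for _ in range(n)]
    for i in range(1, n):
        new_score = [0.0, 0.0]
        for m in (0, 1):
            cands = [score[l] + pairwise[l][m] for l in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            back[i][m] = best
            new_score[m] = unary[i][m] + cands[best]
        score = new_score
    labels = [0 if score[0] >= score[1] else 1]
    for i in range(n - 1, 0, -1):
        labels.append(back[i][labels[-1]])
    return labels[::-1]

# Toy potentials: the middle blocks look like main content, and the
# pairwise term rewards neighboring blocks sharing a label.
unary = [[math.log(p) for p in row]
         for row in [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.8, 0.2]]]
pairwise = [[math.log(0.7), math.log(0.3)],
            [math.log(0.3), math.log(0.7)]]
labels = viterbi_binary(unary, pairwise)
```

The pairwise smoothing term is what makes the labeling "collective": a block's label depends on its neighbors, not just its own features.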

Thu, 05 Apr 2018 15:07:26 GMT
Web2Text: Deep Structured Boilerplate Removal from Carsten Eickhoff
Cognitive Biases in Crowdsourcing /slideshow/cognitive-biases-in-crowdsourcing/92978861
Crowdsourcing has become a popular paradigm in data curation, annotation and evaluation for many artificial intelligence and information retrieval applications. Considerable efforts have gone into devising effective quality control mechanisms that identify or discourage cheat submissions in an attempt to improve the quality of noisy crowd judgments. Besides purposeful cheating, there is another source of noise that is often alluded to but insufficiently studied: cognitive biases. This paper investigates the prevalence and effect size of a range of common cognitive biases on a standard relevance judgment task. Our experiments, based on three sizable publicly available document collections, show significant detrimental effects on annotation quality, system ranking and the performance of derived rankers when task design does not account for such biases.

Thu, 05 Apr 2018 15:05:43 GMT
Cognitive Biases in Crowdsourcing from Carsten Eickhoff
Evaluating Music Recommender Systems for Groups /slideshow/evaluating-music-recommender-systems-for-groups/92978616
Recommendation to groups of users is a challenging and, to date, only sparsely studied task. The evaluation aspect in particular often appears ad hoc: instead of truly evaluating on groups of users, prior work synthesises groups by merging individual preferences. In this paper, we present a user study recording the individual and shared preferences of actual groups of participants, resulting in a robust, standardized evaluation benchmark. Using this benchmarking dataset, which we share with the research community, we compare the respective performance of a wide range of music group recommendation techniques proposed in the literature.
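Two of the classic preference-aggregation strategies such a benchmark can compare are easy to sketch. The strategy names, the toy rating scale, and the track names below are illustrative, not taken from the paper.

```python
def average_strategy(ratings):
    """Aggregate individual ratings (item -> list of per-member
    scores) by the group mean."""
    return {item: sum(r) / len(r) for item, r in ratings.items()}

def least_misery_strategy(ratings):
    """Score each item by its least satisfied group member."""
    return {item: min(r) for item, r in ratings.items()}

def recommend(ratings, strategy):
    scores = strategy(ratings)
    return max(scores, key=scores.get)

# Three listeners rating three tracks on a 1-5 scale.
ratings = {
    "track_a": [5, 5, 2],   # divisive: loved by two, disliked by one
    "track_b": [4, 3, 4],   # broadly acceptable to everyone
    "track_c": [2, 2, 2],
}
```

The two strategies disagree on this toy group (`average_strategy` favors the divisive track, `least_misery_strategy` the safe one), which is exactly the kind of behavior a benchmark with real group ground truth can adjudicate.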

Thu, 05 Apr 2018 15:03:23 GMT
Evaluating Music Recommender Systems for Groups from Carsten Eickhoff
Active Content-Based Crowdsourcing Task Selection /slideshow/active-contentbased-crowdsourcing-task-selection/67756581
Crowdsourcing has long established itself as a viable alternative to corpus annotation by domain experts for tasks such as document relevance assessment. The crowdsourcing process traditionally relies on high degrees of label redundancy in order to mitigate the detrimental effects of individually noisy worker submissions. Such redundancy comes at the cost of increased label volume and, subsequently, monetary requirements. In practice, especially as the size of datasets increases, this is undesirable. In this paper, we focus on an alternative method that instead exploits document information to infer relevance labels for unjudged documents. We present an active learning scheme for document selection that aims at maximising overall relevance label prediction accuracy for a given budget of relevance judgements, by exploiting system-wide estimates of label variance and mutual information. Our experiments are based on TREC 2011 Crowdsourcing Track data and show that our method is able to achieve state-of-the-art performance while requiring 17–25% less budget. This paper has been accepted for presentation at the 25th ACM International Conference on Information and Knowledge Management (CIKM).
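A much-simplified sketch of budget-constrained selection: send to human judges the documents whose inferred labels are least certain, measured here by plain Bernoulli variance. The paper's scheme additionally uses mutual-information estimates; the function name and toy probabilities are assumptions for illustration.

```python
def select_for_judgment(predictions, budget):
    """Given per-document relevance probabilities from a current
    model, pick the `budget` documents whose predicted labels are
    most uncertain (Bernoulli variance p * (1 - p)); the remaining
    documents keep their inferred labels."""
    variance = {doc: p * (1 - p) for doc, p in predictions.items()}
    return sorted(variance, key=variance.get, reverse=True)[:budget]

# Toy relevance probabilities: d2 and d4 sit near the 0.5 boundary,
# so human judgments are most valuable there.
predictions = {"d1": 0.95, "d2": 0.50, "d3": 0.10, "d4": 0.55, "d5": 0.99}
to_judge = select_for_judgment(predictions, budget=2)
```

Documents with confident predictions (d1, d3, d5) consume no budget, which is where the reported 17–25% savings in judgment cost would come from in this simplified view.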

Fri, 28 Oct 2016 02:21:44 GMT
Active Content-Based Crowdsourcing Task Selection from Carsten Eickhoff
Efficient Parallel Learning of Word2Vec /slideshow/efficient-parallel-learning-of-word2vec/63702654
Since its introduction, Word2Vec and its variants have been widely used to learn semantics-preserving representations of words or entities in an embedding space, which can be used to produce state-of-the-art results for various Natural Language Processing tasks. Existing implementations aim to learn efficiently by running multiple threads in parallel while operating on a single model in shared memory, ignoring incidental memory update collisions. We show that these collisions can degrade the efficiency of parallel learning, and propose a straightforward caching strategy that improves the efficiency by a factor of 4. This paper has been accepted for presentation at the ICML Machine Learning Systems Workshop in New York City, USA.
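The caching idea can be illustrated with a single-threaded sketch: rather than writing every gradient straight into the shared embedding matrix (where simultaneous writes from other workers can collide), each worker accumulates updates per touched row and flushes them in one batch. The class name and flush threshold are invented for illustration; a real implementation operates on shared float arrays across threads.

```python
class CachedUpdater:
    """Per-worker write cache for a shared embedding matrix."""

    def __init__(self, shared, flush_every=64):
        self.shared = shared          # shared list of embedding rows
        self.flush_every = flush_every
        self.cache = {}               # row index -> accumulated gradient
        self.pending = 0

    def update(self, row, grad):
        # Accumulate locally instead of touching shared memory.
        acc = self.cache.setdefault(row, [0.0] * len(grad))
        for j, g in enumerate(grad):
            acc[j] += g
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        # One write pass per touched row, instead of one per update.
        for row, grad in self.cache.items():
            target = self.shared[row]
            for j, g in enumerate(grad):
                target[j] += g
        self.cache.clear()
        self.pending = 0

shared = [[0.0] * 4 for _ in range(3)]
updater = CachedUpdater(shared, flush_every=4)
for _ in range(8):                     # eight updates, two flushes
    updater.update(0, [0.5, 0.5, 0.5, 0.5])
```

Batching writes this way shrinks the window in which two threads can clobber the same row, which is the collision effect the paper measures.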

Mon, 04 Jul 2016 09:29:49 GMT
Efficient Parallel Learning of Word2Vec from Carsten Eickhoff
An Eye-Tracking Study of Query Reformulation /slideshow/an-eyetracking-study-of-query-reformulation/51855596
Information about a user's domain knowledge and interest can be important signals for many information retrieval tasks such as query suggestion or result ranking. State-of-the-art user models rely on coarse-grained representations of the user's previous knowledge about a topic or domain. In this paper, we study query refinement using eye-tracking in order to gain precise and detailed insight into which terms the user was exposed to in a search session and which ones they showed a particular interest in. We measure fixations on the term level, allowing for a detailed model of user attention. To allow for a widespread exploitation of our findings, we generalize from the restrictive eye-gaze tracking to using more accessible signals: mouse cursor traces. Based on the public API of a popular search engine, we demonstrate how query suggestion candidates can be ranked according to traces of user attention and interest, resulting in significantly better performance than achieved by an attention-oblivious industry solution. Our experiments suggest that modelling term-level user attention can be achieved with great reliability and holds significant potential for supporting a range of traditional IR tasks. This paper has been accepted for presentation at ACM SIGIR 2015.
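One way such term-level attention traces can drive suggestion ranking is to score each candidate by the summed attention weight of its terms. The weights, terms, and scoring function below are illustrative assumptions, not the paper's exact model.

```python
def score_suggestions(attention, candidates):
    """Rank candidate query suggestions by the total attention weight
    of the terms they contain. `attention` maps a term to an estimate
    of user interest (e.g. normalised fixation time or cursor
    hovering on that term)."""
    scores = {
        cand: sum(attention.get(term, 0.0) for term in cand.split())
        for cand in candidates
    }
    return sorted(candidates, key=scores.get, reverse=True)

# The user dwelled on "car" and "price" while scanning results for
# the ambiguous query "jaguar".
attention = {"jaguar": 0.9, "car": 0.7, "animal": 0.1, "price": 0.4}
candidates = ["jaguar animal habitat", "jaguar car price", "jaguar history"]
ranked = score_suggestions(attention, candidates)
```

An attention-oblivious ranker would treat all three candidates as equally plausible refinements of "jaguar"; the attention signal disambiguates toward the automotive intent.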

Thu, 20 Aug 2015 11:30:43 GMT
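The core idea of the abstract above, ranking query suggestion candidates by how much term-level attention the user has exhibited, can be illustrated with a toy sketch. This is not the paper's actual model: the function names and the hand-set attention weights below are hypothetical, standing in for weights that would be derived from real fixation or cursor-trace data.

```python
# Toy sketch: score each suggestion candidate by summing per-term
# attention weights; terms the user never attended to get a small
# default weight. Candidates are then sorted by descending score.

def attention_score(candidate, term_attention, default=0.1):
    """Sum attention weights over the candidate's whitespace-split terms."""
    return sum(term_attention.get(t, default) for t in candidate.split())

def rank_suggestions(candidates, term_attention):
    """Return candidates ordered from most to least attended-to."""
    return sorted(candidates,
                  key=lambda c: attention_score(c, term_attention),
                  reverse=True)

# Hypothetical fixation-derived weights for an ambiguous query session.
term_attention = {"jaguar": 0.9, "car": 0.7, "animal": 0.1}
candidates = ["jaguar animal habitat", "jaguar car price", "jaguar os"]
print(rank_suggestions(candidates, term_attention))
# → ['jaguar car price', 'jaguar animal habitat', 'jaguar os']
```

In this sketch the user's gaze dwelt on car-related terms, so car-related reformulations rank first; the real system would learn such weights from eye-gaze or mouse-cursor traces rather than set them by hand.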
Introduction to Information Retrieval
Carsten Eickhoff · Fri, 18 Oct 2013 09:48:54 GMT · /eickhoff/introduction-to-information-retrieval

This 2-hour lecture was held at Amsterdam University of Applied Sciences (HvA) on October 16th, 2013. It gives a basic overview of core technologies used by ICT companies such as Google, Twitter, or Facebook. The lecture does not require a strong technical background and stays at a conceptual level.
Exploiting User Comments for Audio-visual Content Indexing and Retrieval (ECIR'13)
Carsten Eickhoff · Wed, 03 Apr 2013 06:01:20 GMT · /eickhoff/ecir2013-eickhoff

State-of-the-art content sharing platforms often require users to assign tags to pieces of media in order to make them easily retrievable. Since this task is sometimes perceived as tedious or boring, annotations can be sparse. Commenting, on the other hand, is a frequently used means of expressing user opinion towards shared media items. We propose the use of time series analyses to infer potential tags and indexing terms for audio-visual content from user comments. In this way, we mitigate the vocabulary gap between queries and document descriptors. Additionally, we show how large-scale encyclopedias such as Wikipedia can aid the task of tag prediction by serving as surrogates for high-coverage natural-language vocabulary lists. Our evaluation is conducted on a corpus of several million real-world user comments from the popular video sharing platform YouTube and demonstrates significant improvements in retrieval performance. This work, together with Wen Li and Arjen P. de Vries, has been accepted for full oral presentation at the 35th European Conference on Information Retrieval (ECIR) in Moscow, Russia. The full version of the article is available at: http://link.springer.com/chapter/10.1007/978-3-642-36973-5_4
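One simple way to picture the time-series intuition from the abstract above: a term that suddenly bursts in a video's comment stream, relative to its long-run frequency, is a plausible candidate tag. The sketch below is an assumption-laden illustration, not the ECIR'13 method; the function name, thresholds, and toy data are all hypothetical.

```python
# Toy sketch: flag terms whose count in the most recent comment window
# exceeds a multiple of their average count over earlier windows.
from collections import Counter

def bursty_terms(comment_windows, recent, ratio=2.0, min_count=2):
    """comment_windows: list of token lists, one per past time window.
    recent: tokens from the most recent window.
    Returns terms whose recent count is at least `min_count` and more
    than `ratio` times their mean historical count, sorted alphabetically."""
    history = Counter(t for window in comment_windows for t in window)
    n = max(len(comment_windows), 1)
    recent_counts = Counter(recent)
    return sorted(t for t, c in recent_counts.items()
                  if c >= min_count and c > ratio * (history[t] / n))

# Hypothetical comment stream: "goal" bursts in the latest window.
windows = [["nice", "video"], ["nice", "song"], ["cool"]]
recent = ["goal", "goal", "amazing", "goal"]
print(bursty_terms(windows, recent))
# → ['goal']
```

A real system would additionally filter candidates against a high-coverage vocabulary (the abstract mentions Wikipedia as one such surrogate) to keep noise words out of the index.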