Slideshows by User: NattiyaKanhabua / http://www.slideshare.net/images/logo.gif / Thu, 23 Jul 2015 15:22:43 GMT SlideShare feed for Slideshows by User: NattiyaKanhabua
Search, Exploration and Analytics of Evolving Data /slideshow/search-exploration-and-analytics-of-evolving-data/50851225 costkeystonenattiyakanhabuasearchexplorationanalyticsevolvingdata-150723152243-lva1-app6891
The 1st Keystone Summer School: Keyword Search over Big Data
Thu, 23 Jul 2015 15:22:43 GMT /slideshow/search-exploration-and-analytics-of-evolving-data/50851225 NattiyaKanhabua@slideshare.net(NattiyaKanhabua)
Search, Exploration and Analytics of Evolving Data from Nattiya Kanhabua
1599 7 https://cdn.slidesharecdn.com/ss_thumbnails/costkeystonenattiyakanhabuasearchexplorationanalyticsevolvingdata-150723152243-lva1-app6891-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Towards Concise Preservation by Managed Forgetting: Research Issues and Case Study /slideshow/towards-concise-preservation-by-managed-forgetting-research-issues-and-case-study/43149361 cidnxcztxowbc35l5n0r-signature-b979870abcb2dd7faca5989b6f989ba37ec245cb670b808a5ccdd574c9abcad5-poli-150102084158-conversion-gate02
In human memory, forgetting plays a crucial role in focusing on important things and neglecting irrelevant details. In digital memories, the idea of systematic forgetting has received little attention so far. At first glance, forgetting seems to contradict the purpose of archiving and preservation. However, we are currently facing tremendous growth in the volume of digital content. Thus, it becomes ever more important to focus while forgetting irrelevant details, redundancies, and noise. This holds true for better organizing the information space as well as for preservation management, where decisions on what to keep must be made and revisited. Therefore, we propose introducing the concept of managed forgetting as part of a joint information management and preservation management process in digital memories. Managed forgetting models resource selection as a function of attention and significance dynamics. Based on a dynamic, multidimensional assessment of information value, it identifies information objects (e.g., documents or images) of decreasing importance and/or topicality and triggers forgetting actions. Those actions include a variety of options, namely aggregation and summarization, revised search and ranking behavior, elimination of redundancy, and, finally, deletion. In this paper, we present our vision for managed forgetting, discuss the challenges as well as our first ideas for its introduction, and present a case study for its motivation.
Fri, 02 Jan 2015 08:41:58 GMT /slideshow/towards-concise-preservation-by-managed-forgetting-research-issues-and-case-study/43149361 NattiyaKanhabua@slideshare.net(NattiyaKanhabua)
Towards Concise Preservation by Managed Forgetting: Research Issues and Case Study from Nattiya Kanhabua
852 2 https://cdn.slidesharecdn.com/ss_thumbnails/cidnxcztxowbc35l5n0r-signature-b979870abcb2dd7faca5989b6f989ba37ec245cb670b808a5ccdd574c9abcad5-poli-150102084158-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Understanding the Diversity of Tweets in the Time of Outbreaks /slideshow/understanding-the-diversity-of-tweets-in-the-time-of-outbreaks/43149123 pkfg69yrrfegbokior2w-signature-4c5464e0687501dd96cb9b339374d3f81818b71df95d2630199d16c74c242ec1-poli-150102082822-conversion-gate02
Microblogging services like Twitter continue to surge in importance as a means of sharing information in social networks. In the medical domain, several works have shown the potential of detecting public health events (i.e., infectious disease outbreaks) using Twitter messages, or tweets. Given its real-time nature, Twitter can enhance early outbreak warning for public health authorities so that a rapid response can take place. Most previous work on detecting outbreaks on Twitter simply analyzes tweets that match disease names and/or locations of interest. However, the effectiveness of such a method is limited for two main reasons. First, disease names are highly ambiguous, e.g., they may be used as slang or in non-health-related contexts. Second, the characteristics of infectious diseases are highly dynamic in time and place: they are strongly time-dependent and vary greatly among regions. In this paper, we propose to analyze the temporal diversity of tweets during the known periods of real-world outbreaks in order to gain insight into a temporary focus on specific events. More precisely, our objective is to understand whether, and to what extent, the temporal diversity of tweets can be used as an indicator of outbreak events. We employ an efficient sampling-based algorithm to compute the diversity statistics of tweets at a particular time. We conduct experiments by correlating temporal diversity with the estimated event magnitude of 14 real-world outbreak events, manually created as ground truth. Our analysis shows that correlation results vary among outbreaks, which can reflect the characteristics (severity and duration) of the outbreaks.
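The abstract above mentions a sampling-based algorithm for computing the diversity of tweets but does not name the diversity measure. As a rough illustration only, the sketch below assumes diversity is the mean pairwise Jaccard distance between the token sets of tweets in one time window, and estimates it from randomly sampled pairs instead of all pairs. The function names, the whitespace tokenization, and the `num_pairs` parameter are hypothetical and not taken from the slides.

```python
import random


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two token sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def sampled_diversity(tweets, num_pairs=1000, seed=0):
    """Estimate the mean pairwise Jaccard distance from randomly sampled pairs.

    Assumption: a tweet is represented by its set of lowercased,
    whitespace-separated tokens.
    """
    token_sets = [set(t.lower().split()) for t in tweets]
    if len(token_sets) < 2:
        return 0.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_pairs):
        # Draw two distinct tweets and accumulate their distance
        i, j = rng.sample(range(len(token_sets)), 2)
        total += jaccard_distance(token_sets[i], token_sets[j])
    return total / num_pairs
```

Under this assumption, a window in which most tweets repeat the same outbreak-related wording yields a low value, while a window of unrelated tweets yields a value close to 1. Sampling keeps the cost at `num_pairs` comparisons per window rather than growing quadratically with the number of tweets.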
Fri, 02 Jan 2015 08:28:22 GMT /slideshow/understanding-the-diversity-of-tweets-in-the-time-of-outbreaks/43149123 NattiyaKanhabua@slideshare.net(NattiyaKanhabua)
Understanding the Diversity of Tweets in the Time of Outbreaks from Nattiya Kanhabua
733 4 https://cdn.slidesharecdn.com/ss_thumbnails/pkfg69yrrfegbokior2w-signature-4c5464e0687501dd96cb9b339374d3f81818b71df95d2630199d16c74c242ec1-poli-150102082822-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Why Is It Difficult to Detect Outbreaks in Twitter? /slideshow/why-is-it-difficult-to-detect-outbreaks-in-twitter/43148925 y3rfgzwqtzkvnzk7ljdo-signature-9c9ab5f481265babba1748657605522deb88b222e3da0b3baccbb365c1cffb26-poli-150102081844-conversion-gate01
In this paper, we present an event-based Epidemic Intelligence (EI) system framework that leverages social media data, e.g., Twitter messages (or tweets), to provide public health officials with the necessary tools to survey and sift through relevant information, namely disease outbreak events. There exist three main research challenges in gathering epidemic intelligence from social media streams: 1) dynamic classification to enable message filtering, 2) signal generation that produces reliable warnings based on observed term frequency changes in the filtered messages, and 3) search and recommendation functionalities for domain experts, for better assessment of the potential outbreak threats associated with the generated signals. We outline possible approaches to solving these challenges and discuss areas where further research is required. The aim of this paper is to provide guidance for similar endeavors, and to give prospective builders of event-based Epidemic Intelligence systems a more realistic view of the benefits and issues of social media stream analysis.
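The second challenge above, signal generation from term frequency changes, can be pictured with a very simple detector. The abstract does not say which detection method the framework uses, so the following is only a sketch under stated assumptions: daily counts of a term in the filtered messages are compared with a trailing baseline, and a signal is raised when the count exceeds the baseline mean by more than `threshold` standard deviations. The function name, the window length, and the threshold are hypothetical.

```python
from statistics import mean, pstdev


def frequency_signals(counts, window=7, threshold=3.0):
    """Return the indices whose count is far above the trailing baseline.

    counts: term frequency per time step (e.g., per day).
    window: number of preceding steps used as the baseline (assumed value).
    threshold: number of standard deviations that triggers a signal (assumed value).
    """
    signals = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        mu = mean(baseline)
        sigma = pstdev(baseline)
        # Floor sigma at 1.0 so a flat baseline does not flag tiny fluctuations
        z = (counts[t] - mu) / max(sigma, 1.0)
        if z > threshold:
            signals.append(t)
    return signals
```

For example, `frequency_signals([5, 6, 5, 4, 6, 5, 5, 30, 6])` flags only index 7, the sudden jump to 30. A production system would likely need to handle seasonality and reporting delays, which this sketch ignores.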
Fri, 02 Jan 2015 08:18:44 GMT /slideshow/why-is-it-difficult-to-detect-outbreaks-in-twitter/43148925 NattiyaKanhabua@slideshare.net(NattiyaKanhabua)
Why Is It Difficult to Detect Outbreaks in Twitter? from Nattiya Kanhabua
569 3 https://cdn.slidesharecdn.com/ss_thumbnails/y3rfgzwqtzkvnzk7ljdo-signature-9c9ab5f481265babba1748657605522deb88b222e3da0b3baccbb365c1cffb26-poli-150102081844-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Leveraging Dynamic Query Subtopics for Time-aware Search Result Diversification /slideshow/leveraging-dynamic-query-subtopics-for-timeaware-search-result-diversification/43147824 tempdiv-150102071836-conversion-gate02
Search result diversification is a common technique for tackling the problem of ambiguous and multi-faceted queries by maximizing the coverage of query aspects or subtopics in a result list. In some special cases, the subtopics associated with such queries can be temporally ambiguous; for instance, the query US Open is more likely to target the tennis tournament in September and the golf tournament in June. More precisely, users' search intent can be identified by the popularity of a subtopic at the time when the query is issued. In this paper, we study search result diversification for time-sensitive queries, where the temporal dynamics of query subtopics are explicitly determined and modeled into result diversification. Unlike previous work, which in general considered only static subtopics, we leverage dynamic subtopics by analyzing two data sources (i.e., query logs and a document collection). These data sources provide insights, from different perspectives, into how query subtopics change over time. Moreover, we propose novel time-aware diversification methods that leverage the identified dynamic subtopics. A key idea is to re-rank search results based on the freshness and popularity of subtopics. Our experimental results show that the proposed methods can significantly improve diversity and relevance effectiveness for time-sensitive queries in comparison with state-of-the-art methods.
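The key idea in the abstract above, re-ranking results by the popularity of subtopics at query time, might be sketched as a simple greedy re-ranker. This is not the method proposed in the paper; it is a minimal illustration loosely modeled on greedy explicit diversification. It assumes each document carries a relevance score and a single subtopic label, and that subtopic popularities at the time of the query are already known. The function name, the trade-off parameter `lam`, the halving discount, and the popularity numbers in the usage note are all hypothetical.

```python
def time_aware_rerank(docs, subtopic_weight, k=10, lam=0.5):
    """Greedily re-rank documents by relevance and time-dependent subtopic popularity.

    docs: list of (doc_id, relevance, subtopic) tuples.
    subtopic_weight: dict mapping subtopic -> popularity at query time (assumed given).
    lam: trade-off between relevance (0.0) and subtopic popularity (1.0).
    """
    total = sum(subtopic_weight.values()) or 1.0
    weight = {s: w / total for s, w in subtopic_weight.items()}
    covered = {}
    remaining = list(docs)
    ranked = []
    while remaining and len(ranked) < k:
        def score(doc):
            _, rel, sub = doc
            # Halve a subtopic's contribution each time it is already represented
            novelty = weight.get(sub, 0.0) * (0.5 ** covered.get(sub, 0))
            return (1 - lam) * rel + lam * novelty
        best = max(remaining, key=score)
        ranked.append(best[0])
        covered[best[2]] = covered.get(best[2], 0) + 1
        remaining.remove(best)
    return ranked
```

With the US Open example and made-up popularities, the same three documents `[("t1", 0.9, "tennis"), ("t2", 0.8, "tennis"), ("g1", 0.7, "golf")]` are ordered `t1, t2, g1` when tennis dominates (`{"tennis": 0.8, "golf": 0.2}`) and `g1, t1, t2` when golf dominates (`{"tennis": 0.2, "golf": 0.8}`).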
Fri, 02 Jan 2015 07:18:36 GMT /slideshow/leveraging-dynamic-query-subtopics-for-timeaware-search-result-diversification/43147824 NattiyaKanhabua@slideshare.net(NattiyaKanhabua)
Leveraging Dynamic Query Subtopics for Time-aware Search Result Diversification from Nattiya Kanhabua
743 2 https://cdn.slidesharecdn.com/ss_thumbnails/tempdiv-150102071836-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
On the Value of Temporal Anchor Texts in Wikipedia /NattiyaKanhabua/on-the-value-of-temporal-anchor-texts-in-wikipedia taia2014v2-150102071315-conversion-gate02
Wikipedia has become a widely accepted reference point for information of all kinds: real-world events (e.g., natural disasters, man-made incidents, and political events) as well as specific entities such as politicians, celebrities, and entities involved in an event. Due to its open construction and negotiation, Wikipedia is an important new cultural and societal phenomenon, and the content of Wikipedia articles is a valuable source for different applications. For instance, the edit history and view logs of Wikipedia can be leveraged for detecting an event and its associated entities. In this study, we analyze temporal anchor texts extracted from the edit history. We propose a model that views Wikipedia and its anchor texts as a temporal resource, and a probabilistic method for ranking temporal anchor texts. Our preliminary results show that relevant anchor texts are composed of evolving information (e.g., changes of names and semantic roles, as well as evolving context) that reflects societal trends and perceptions, making them candidates for capturing entity evolution.
Fri, 02 Jan 2015 07:13:15 GMT /NattiyaKanhabua/on-the-value-of-temporal-anchor-texts-in-wikipedia NattiyaKanhabua@slideshare.net(NattiyaKanhabua)
On the Value of Temporal Anchor Texts in Wikipedia from Nattiya Kanhabua
]]>
805 3 https://cdn.slidesharecdn.com/ss_thumbnails/taia2014v2-150102071315-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Ranking Related News Predictions /slideshow/ranking-related-news-predictions/43147704 abhkmwyytaefkmbskgta-signature-92eb0b6e6900a5160f41724283e912b5d44ab828208c9b7fb35a5dd6fe4f2989-poli-150102070954-conversion-gate01
We estimate that nearly one third of news articles contain references to future events. While this information can prove crucial to understanding news stories and how events will develop for a given topic, there is currently no easy way to access this information. We propose a new task to address the problem of retrieving and ranking sentences that contain mentions of future events, which we call ranking related news predictions. In this paper, we formally define this task and propose a learning-to-rank approach based on four classes of features: term similarity, entity-based similarity, topic similarity, and temporal similarity. Through extensive evaluations using a corpus consisting of 1.8 million news articles and 6,000 manually judged relevance pairs, we show that our approach is able to retrieve a significant number of relevant predictions related to a given topic.]]>

Fri, 02 Jan 2015 07:09:54 GMT /slideshow/ranking-related-news-predictions/43147704 NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Ranking Related News Predictions NattiyaKanhabua
Ranking Related News Predictions from Nattiya Kanhabua
]]>
912 6 https://cdn.slidesharecdn.com/ss_thumbnails/abhkmwyytaefkmbskgta-signature-92eb0b6e6900a5160f41724283e912b5d44ab828208c9b7fb35a5dd6fe4f2989-poli-150102070954-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Temporal summarization of event related updates /slideshow/temporal-summarization-of-event-related-updates/43146952 7qocg70fr5ymhpm7zluv-signature-bdc37f4db7526ccbc4d42e38bca98db20a89cad2d48fa25b0d1076bd04804780-poli-150102062626-conversion-gate01
Wikipedia is a free multilingual online encyclopedia covering a wide range of general and specific knowledge. Its content is continuously kept up to date and extended by a supporting community. In many cases, real-world events influence the collaborative editing of Wikipedia articles of the involved or affected entities. In this paper, we present Wikipedia Event Reporter, a web-based system that supports entity-centric, temporal analytics of event-related information in Wikipedia by analyzing the whole history of article updates. For a given entity, the system first identifies peaks of update activity using burst detection and automatically extracts event-related updates using a machine-learning approach. Further, the system determines distinct events by clustering updates, exploiting different types of information such as update time, textual similarity, and the position of the updates within an article. Finally, the system generates a meaningful temporal summarization of event-related updates and automatically annotates the identified events in a timeline.]]>

Fri, 02 Jan 2015 06:26:26 GMT /slideshow/temporal-summarization-of-event-related-updates/43146952 NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Temporal summarization of event related updates NattiyaKanhabua
Temporal summarization of event related updates from Nattiya Kanhabua
]]>
653 3 https://cdn.slidesharecdn.com/ss_thumbnails/7qocg70fr5ymhpm7zluv-signature-bdc37f4db7526ccbc4d42e38bca98db20a89cad2d48fa25b0d1076bd04804780-poli-150102062626-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Temporal Web Dynamics: Implications from Search Perspective /slideshow/l9-searching-temporalweb/43146115 8hv5oys8r2kvltofuqh2-signature-3d3ed0f0cd6d481d9daa1986c97ec172ecf141d97e6ab70dbba104ff21208f08-poli-150102053912-conversion-gate02
In this talk, we will give a survey of current approaches to searching the temporal web. In such a web collection, the contents are created and/or edited over time; examples are web archives, news archives, blogs, micro-blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based on term matching alone can give unsatisfactory results when searching the temporal web. The reasons for this are manifold: 1) the collection is strongly time-dependent, i.e., with multiple versions of documents, 2) the contents of documents are about events that happened at particular time periods, 3) the meanings of semantic annotations can change over time, and 4) a query representing an information need can be time-sensitive, a so-called temporal query. Several major challenges in searching the temporal web will be discussed, namely, 1) How to understand temporal search intent represented by time-sensitive queries? 2) How to handle the temporal dynamics of queries and documents? and 3) How to explicitly model temporal information in retrieval and ranking models? To this end, we will present current approaches to the addressed problems as well as outline directions for future research.]]>

Fri, 02 Jan 2015 05:39:12 GMT /slideshow/l9-searching-temporalweb/43146115 NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Temporal Web Dynamics: Implications from Search Perspective NattiyaKanhabua
Temporal Web Dynamics: Implications from Search Perspective from Nattiya Kanhabua
]]>
600 6 https://cdn.slidesharecdn.com/ss_thumbnails/8hv5oys8r2kvltofuqh2-signature-3d3ed0f0cd6d481d9daa1986c97ec172ecf141d97e6ab70dbba104ff21208f08-poli-150102053912-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Temporal Web Dynamics and Implications for Information Retrieval /slideshow/alexandria-searching-temporalweb/43146035 anvb1jwericnirkgnkdw-signature-eae5eef55e63a8b8fa9b10af78032dc34c1180da06b51090a4619f6464395da0-poli-150102053351-conversion-gate01
In this talk, we will give a survey of current approaches to searching the temporal web. In such a web collection, the contents are created and/or edited over time; examples are web archives, news archives, blogs, micro-blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based on term matching alone can give unsatisfactory results when searching the temporal web. The reasons for this are manifold: 1) the collection is strongly time-dependent, i.e., with multiple versions of documents, 2) the contents of documents are about events that happened at particular time periods, 3) the meanings of semantic annotations can change over time, and 4) a query representing an information need can be time-sensitive, a so-called temporal query. Several major challenges in searching the temporal web will be discussed, namely, 1) How to understand temporal search intent represented by time-sensitive queries? 2) How to handle the temporal dynamics of queries and documents? and 3) How to explicitly model temporal information in retrieval and ranking models? To this end, we will present current approaches to the addressed problems as well as outline directions for future research.]]>

Fri, 02 Jan 2015 05:33:51 GMT /slideshow/alexandria-searching-temporalweb/43146035 NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Temporal Web Dynamics and Implications for Information Retrieval NattiyaKanhabua
Temporal Web Dynamics and Implications for Information Retrieval from Nattiya Kanhabua
]]>
1049 2 https://cdn.slidesharecdn.com/ss_thumbnails/anvb1jwericnirkgnkdw-signature-eae5eef55e63a8b8fa9b10af78032dc34c1180da06b51090a4619f6464395da0-poli-150102053351-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Preservation and Forgetting: Friends or Foes? /NattiyaKanhabua/preservation-and-forgetting-friends-or-foes ap9q8betrsglal3qmqqf-signature-a63cd9b7da7492f80d47485b3700053dc5c4ddaffd1daf169e1149a9c1a8c73b-poli-150102053103-conversion-gate01
Humans are very effective at remembering by abstraction, pattern exploitation, or contextualization. On the other hand, humans are also capable of forgetting irrelevant details, an ability that helps us focus on relevant things instead of drowning in details by remembering everything. The research question that we address in this paper is: Can we learn from human remembering and forgetting in order to develop more advanced preservation technology? In particular, we aim at studying how a managed or controlled form of forgetting can play a role in digital preservation, including personal and organizational archives as well as collective memories. Our research goal is twofold: 1) to establish effective preservation for more concise and accessible digital memories, and 2) to enable the easier and wider adoption of preservation technology. The concept of managed forgetting is discussed in more detail in the research work of the European project ForgetIT, which investigates the proposed concept by means of an integrated information and preservation management approach.]]>

Fri, 02 Jan 2015 05:31:03 GMT /NattiyaKanhabua/preservation-and-forgetting-friends-or-foes NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Preservation and Forgetting: Friends or Foes? NattiyaKanhabua
Preservation and Forgetting: Friends or Foes? from Nattiya Kanhabua
]]>
533 5 https://cdn.slidesharecdn.com/ss_thumbnails/ap9q8betrsglal3qmqqf-signature-a63cd9b7da7492f80d47485b3700053dc5c4ddaffd1daf169e1149a9c1a8c73b-poli-150102053103-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Concise Preservation by Combining Managed Forgetting and Contextualized Remembering /NattiyaKanhabua/concise-preservation-by-combining-managed-forgetting-and-contextualized-remembering allatbbnsak3vwaquemi-signature-a697add395ff73d238b4063dec2d62e3aaf209a7af53ca1a047cfbe194576b4d-poli-150102052836-conversion-gate01
With the growing volumes of and reliance on digital content, there is a clear need for better information access solutions that keep relevant information accessible and usable in the long term. Inspired by the role of forgetting in the human brain, we envision a concept of managed forgetting for systematically dealing with information that progressively decreases in importance as well as with redundant information. Although inspired by human memory, managed forgetting is meant to complement rather than copy human remembering and forgetting. It can be regarded as a function of attention and significance dynamics relying on multi-faceted information assessment. This talk introduces our vision for managed forgetting on the conceptual level as part of an Integrated Cognitive Framework for Time-aware Information Access. We discuss relevant research and application aspects for managed forgetting. To this end, we present our first results and point out issues where further research is required.]]>

Fri, 02 Jan 2015 05:28:36 GMT /NattiyaKanhabua/concise-preservation-by-combining-managed-forgetting-and-contextualized-remembering NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Concise Preservation by Combining Managed Forgetting and Contextualized Remembering NattiyaKanhabua
Concise Preservation by Combining Managed Forgetting and Contextualized Remembering from Nattiya Kanhabua
]]>
555 3 https://cdn.slidesharecdn.com/ss_thumbnails/allatbbnsak3vwaquemi-signature-a697add395ff73d238b4063dec2d62e3aaf209a7af53ca1a047cfbe194576b4d-poli-150102052836-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Can Twitter & Co. Save Lives? /slideshow/can-twitter-co-save-lives/43145860 mpii-seminar2013outbreakdetectionintwitterslides-150102052403-conversion-gate02
In this talk, we present an event-based Epidemic Intelligence (EI) system framework that leverages social media data, e.g., Twitter messages (or tweets), to provide public health officials with the tools necessary to survey and sift through relevant information, namely, disease outbreak events. There are three main research challenges in gathering epidemic intelligence from social media streams: 1) dynamic classification to enable message filtering, 2) signal generation that produces reliable warnings based on observed term frequency changes in the filtered messages, and 3) search and recommendation functionalities that help domain experts better assess the potential outbreak threats associated with the generated signals. We outline possible approaches to these challenges and discuss areas where further research is required. The objective is to provide guidance for similar endeavors, and to give prospective builders of event-based Epidemic Intelligence systems a more realistic view of the benefits and issues of social media stream analysis.]]>

Fri, 02 Jan 2015 05:24:03 GMT /slideshow/can-twitter-co-save-lives/43145860 NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Can Twitter & Co. Save Lives? NattiyaKanhabua
Can Twitter & Co. Save Lives? from Nattiya Kanhabua
]]>
541 4 https://cdn.slidesharecdn.com/ss_thumbnails/mpii-seminar2013outbreakdetectionintwitterslides-150102052403-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Searching the Temporal Web: Challenges and Current Approaches /slideshow/searching-the-temporal-web-challenges-and-current-approaches/43145812 vkagxmemr0e4ku5uhbaz-signature-7897371b91f9cddfe76c8cbefc575e76b2b4f8d495e84b8bfc38e28904ae299d-poli-150102052024-conversion-gate02
This talk gives a survey of current approaches to searching the temporal web. In such a web collection, the contents are created and/or edited over time; examples are web archives, news archives, blogs, micro-blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based only on term matching can give unsatisfactory results when searching the temporal web. The reasons for this are manifold: 1) the collection is strongly time-dependent, i.e., it contains multiple versions of documents, 2) the contents of documents are about events that happened at particular time periods, 3) the meanings of semantic annotations can change over time, and 4) a query representing an information need can be time-sensitive, a so-called temporal query. Several major challenges in searching the temporal web will be discussed, namely, 1) How to understand the temporal search intent represented by time-sensitive queries? 2) How to handle the temporal dynamics of queries and documents? and 3) How to explicitly model temporal information in retrieval and ranking models? To this end, we will present current approaches to these problems and outline directions for future research.]]>

Fri, 02 Jan 2015 05:20:24 GMT /slideshow/searching-the-temporal-web-challenges-and-current-approaches/43145812 NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Searching the Temporal Web: Challenges and Current Approaches NattiyaKanhabua
Searching the Temporal Web: Challenges and Current Approaches from Nattiya Kanhabua
]]>
714 5 https://cdn.slidesharecdn.com/ss_thumbnails/vkagxmemr0e4ku5uhbaz-signature-7897371b91f9cddfe76c8cbefc575e76b2b4f8d495e84b8bfc38e28904ae299d-poli-150102052024-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Improving Temporal Language Models For Determining Time of Non-Timestamped Documents /slideshow/improving-temporal-language-models-for-determining-time-of-nontimestamped-documents/43145585 y79lki7svqqut9pgloya-signature-1dc5bb68a9f5f0911c139c97f0a167515d373e783078433350c3d7fc9bbce4bc-poli-150102050938-conversion-gate01
Taking the temporal dimension into account in searching, i.e., using time of content creation as part of the search condition, is now gaining increasing interest. However, in the case of web search and web warehousing, the timestamps (time of creation or creation of contents) of web pages and documents found on the web are in general not known or cannot be trusted, and must be determined otherwise. In this paper, we describe approaches that improve the quality of existing techniques for determining timestamps based on a temporal language model. Through a number of experiments on temporal document collections, we show how our new methods improve the accuracy of timestamping compared to the previous models.]]>

Fri, 02 Jan 2015 05:09:38 GMT /slideshow/improving-temporal-language-models-for-determining-time-of-nontimestamped-documents/43145585 NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Improving Temporal Language Models For Determining Time of Non-Timestamped Documents NattiyaKanhabua
Improving Temporal Language Models For Determining Time of Non-Timestamped Documents from Nattiya Kanhabua
]]>
594 4 https://cdn.slidesharecdn.com/ss_thumbnails/y79lki7svqqut9pgloya-signature-1dc5bb68a9f5f0911c139c97f0a167515d373e783078433350c3d7fc9bbce4bc-poli-150102050938-conversion-gate01-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Exploiting temporal information in retrieval of archived documents (doctoral consortium) /slideshow/exploiting-temporal-information-in-retrieval-of-archived-documents/43145527 dgk0utyztbqfq7yhpre3-signature-ef6d9c480a834522e5c9d08e00f5da6fd412efd740c65d0216e02cee2f05fd22-poli-150102050616-conversion-gate02
In the text retrieval community, many researchers have demonstrated good search quality on a current snapshot of the Web. However, only a few have demonstrated good search quality in a long-term archival domain, where documents are preserved for a long time, i.e., ten years or more. In such a domain, a search application is useful not only for archivists and historians, but also in the context of national libraries and enterprise search (searching document repositories, emails, etc.). In the rest of this paper, we will explain three problems of searching document archives and propose possible approaches to solve these problems. Our main research question is: How to improve the quality of search in a document archive using temporal information?]]>

Fri, 02 Jan 2015 05:06:15 GMT /slideshow/exploiting-temporal-information-in-retrieval-of-archived-documents/43145527 NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Exploiting temporal information in retrieval of archived documents (doctoral consortium) NattiyaKanhabua
Exploiting temporal information in retrieval of archived documents (doctoral consortium) from Nattiya Kanhabua
]]>
943 4 https://cdn.slidesharecdn.com/ss_thumbnails/dgk0utyztbqfq7yhpre3-signature-ef6d9c480a834522e5c9d08e00f5da6fd412efd740c65d0216e02cee2f05fd22-poli-150102050616-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Determining Time of Queries for Re-ranking Search Results /slideshow/determining-time-of-queries-for-reranking-search-results/43145287 aeone1m6sbcgplbrbibo-signature-23b1a42679bd91e7057f1a9c83f28a92aeeb4481ae63937521f19e094dd0e66a-poli-150102045416-conversion-gate02
Recent work on analyzing query logs shows that a significant fraction of queries are temporal, i.e., their relevance depends on time, and that temporal queries play an important role in many domains, e.g., digital libraries and document archives. Temporal queries can be divided into two types: 1) those with temporal criteria explicitly provided by users, and 2) those with no temporal criteria provided. In this paper, we deal with the latter type, i.e., queries that comprise only keywords and whose relevant documents are associated with particular time periods not given by the queries. We propose a number of methods to determine the time of queries using temporal language models. We then show how to increase retrieval effectiveness by using the determined time of queries to re-rank the search results. Through extensive experiments, we show that our proposed approaches improve retrieval effectiveness.]]>

Fri, 02 Jan 2015 04:54:16 GMT /slideshow/determining-time-of-queries-for-reranking-search-results/43145287 NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Determining Time of Queries for Re-ranking Search Results NattiyaKanhabua
Determining Time of Queries for Re-ranking Search Results from Nattiya Kanhabua
]]>
528 6 https://cdn.slidesharecdn.com/ss_thumbnails/aeone1m6sbcgplbrbibo-signature-23b1a42679bd91e7057f1a9c83f28a92aeeb4481ae63937521f19e094dd0e66a-poli-150102045416-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Supporting Exploration and Serendipity in Information Retrieval /slideshow/supporting-exploration-and-serendipity-in-information-retrieval/43144697 xl8haxkprpeffa6eiy5d-signature-bdc6dbc5f398bd5d0dc4fd8837a57e4ffb5056aca10199022472dd02b5cde633-poli-150102042720-conversion-gate02
Exploratory Search and Recommender Systems]]>

Fri, 02 Jan 2015 04:27:20 GMT /slideshow/supporting-exploration-and-serendipity-in-information-retrieval/43144697 NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Supporting Exploration and Serendipity in Information Retrieval NattiyaKanhabua
Supporting Exploration and Serendipity in Information Retrieval from Nattiya Kanhabua
]]>
912 5 https://cdn.slidesharecdn.com/ss_thumbnails/xl8haxkprpeffa6eiy5d-signature-bdc6dbc5f398bd5d0dc4fd8837a57e4ffb5056aca10199022472dd02b5cde633-poli-150102042720-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Time-aware Approaches to Information Retrieval /slideshow/timeaware-approaches-to-information-retrieval/43144643 7hskt6jwtggzhomqlviy-signature-dcfec3011fe9c6117ea060de2ea97b64fe8186b8c37abaf0cb1bdd5e4cc18a7d-poli-150102042423-conversion-gate02
We address major challenges in searching temporal document collections. In such collections, documents are created and/or edited over time. Examples of temporal document collections are web archives, news archives, blogs, personal emails and enterprise documents. Unfortunately, traditional IR approaches based only on term matching can give unsatisfactory results when searching temporal document collections. The reason for this is twofold: the contents of documents are strongly time-dependent, i.e., documents are about events that happened at particular time periods, and a query representing an information need can be time-dependent as well, i.e., a temporal query. Our contributions are different time-aware approaches within three topics in IR: content analysis, query analysis, and retrieval and ranking models. In particular, we aim at improving retrieval effectiveness by 1) analyzing the contents of temporal document collections, 2) performing an analysis of temporal queries, and 3) explicitly incorporating the time dimension into retrieval and ranking. Leveraging the time dimension in ranking can improve retrieval effectiveness if information about the creation or publication time of documents is available. We analyze the contents of documents in order to determine the time of non-timestamped documents using temporal language models. We subsequently employ the temporal language models to determine the time of implicit temporal queries, and the determined time is used for re-ranking search results in order to improve retrieval effectiveness. We study the effect of terminology changes over time and propose an approach to handling terminology changes using time-based synonyms. In addition, we propose different methods for predicting the effectiveness of temporal queries, so that a particular query enhancement technique can be applied to improve the overall performance. 
When the time dimension is incorporated into ranking, documents are ranked according to both textual and temporal similarity. In this case, time uncertainty should also be taken into account. Thus, we propose a ranking model that considers time uncertainty, and improve ranking by combining multiple features using learning-to-rank techniques. Through extensive evaluation, we show that our proposed time-aware approaches outperform traditional retrieval methods and improve retrieval effectiveness in searching temporal document collections.]]>

Fri, 02 Jan 2015 04:24:23 GMT /slideshow/timeaware-approaches-to-information-retrieval/43144643 NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Time-aware Approaches to Information Retrieval NattiyaKanhabua
Time-aware Approaches to Information Retrieval from Nattiya Kanhabua
]]>
825 4 https://cdn.slidesharecdn.com/ss_thumbnails/7hskt6jwtggzhomqlviy-signature-dcfec3011fe9c6117ea060de2ea97b64fe8186b8c37abaf0cb1bdd5e4cc18a7d-poli-150102042423-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Learning to Rank Search Results for Time-Sensitive Queries (poster presentation) /slideshow/cikm2012-l2-rposter/43144241 c2hgpo3spe7lvhwtprl4-signature-50256047cd8e6ac08936133021c8c8f3c2c43e0fd093b285ecf7943a52c98051-poli-150102040001-conversion-gate02
Retrieval effectiveness of temporal queries can be improved by taking into account the time dimension. Existing temporal ranking models follow one of two main approaches: 1) a mixture model linearly combining textual similarity and temporal similarity, and 2) a probabilistic model generating a query from the textual and temporal parts of a document independently. In this paper, we propose a novel time-aware ranking model based on learning-to-rank techniques. We employ two classes of features for learning a ranking model, entity-based and temporal features, which are derived from annotation data. Entity-based features are aimed at capturing the semantic similarity between a query and a document, whereas temporal features measure the temporal similarity. Through extensive experiments we show that our ranking model significantly improves the retrieval effectiveness over existing time-aware ranking models.]]>
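The first baseline named above, a mixture model that linearly combines textual and temporal similarity, can be sketched as follows. The exponential decay used for temporal similarity, the parameter names `alpha` and `decay`, and the candidate tuples are assumptions made for illustration; the abstract does not say how either similarity is computed.

```python
import math


def temporal_similarity(query_year, doc_year, decay=0.5):
    """Assumed temporal similarity: exponential decay in the year distance."""
    return math.exp(-decay * abs(query_year - doc_year))


def mixture_score(text_sim, time_sim, alpha=0.5):
    """Linear mixture: (1 - alpha) * textual + alpha * temporal similarity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * text_sim + alpha * time_sim


def rank(query_year, candidates, alpha=0.5):
    """Rank (doc_id, text_sim, doc_year) tuples by descending mixture score."""
    scored = [
        (doc_id, mixture_score(text_sim, temporal_similarity(query_year, year), alpha))
        for doc_id, text_sim, year in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented candidates: "a" matches the text better, "b" matches the time better.
candidates = [("a", 0.9, 1990), ("b", 0.7, 2004)]
text_only = rank(2004, candidates, alpha=0.0)
mixed = rank(2004, candidates, alpha=0.5)
```

With `alpha=0.0` the ranking depends on text alone, while a larger `alpha` lets a temporally closer document overtake a textually stronger one. A learning-to-rank model replaces this single hand-set weight with weights learned over many features.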
Fri, 02 Jan 2015 04:00:01 GMT /slideshow/cikm2012-l2-rposter/43144241 NattiyaKanhabua@slideshare.net(NattiyaKanhabua) Learning to Rank Search Results for Time-Sensitive Queries (poster presentation) NattiyaKanhabua
Learning to Rank Search Results for Time-Sensitive Queries (poster presentation) from Nattiya Kanhabua
]]>
524 2 https://cdn.slidesharecdn.com/ss_thumbnails/c2hgpo3spe7lvhwtprl4-signature-50256047cd8e6ac08936133021c8c8f3c2c43e0fd093b285ecf7943a52c98051-poli-150102040001-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds document White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
https://cdn.slidesharecdn.com/profile-photo-NattiyaKanhabua-48x48.jpg?cb=1535291651 I am an assistant professor at the Department of Computer Science, Aalborg University, Denmark. Previously, I was a postdoctoral researcher at the L3S Research Center. My research interests are information retrieval, Web and social media mining, human memory-inspired information management, user behaviour studies, and health-related search and analytics. people.cs.aau.dk/~nattiya/ https://cdn.slidesharecdn.com/ss_thumbnails/costkeystonenattiyakanhabuasearchexplorationanalyticsevolvingdata-150723152243-lva1-app6891-thumbnail.jpg?width=320&height=320&fit=bounds slideshow/search-exploration-and-analytics-of-evolving-data/50851225 Search, Exploration an... https://cdn.slidesharecdn.com/ss_thumbnails/cidnxcztxowbc35l5n0r-signature-b979870abcb2dd7faca5989b6f989ba37ec245cb670b808a5ccdd574c9abcad5-poli-150102084158-conversion-gate02-thumbnail.jpg?width=320&height=320&fit=bounds slideshow/towards-concise-preservation-by-managed-forgetting-research-issues-and-case-study/43149361 Towards Concise Preser... https://cdn.slidesharecdn.com/ss_thumbnails/pkfg69yrrfegbokior2w-signature-4c5464e0687501dd96cb9b339374d3f81818b71df95d2630199d16c74c242ec1-poli-150102082822-conversion-gate02-thumbnail.jpg?width=320&height=320&fit=bounds slideshow/understanding-the-diversity-of-tweets-in-the-time-of-outbreaks/43149123 Understanding the Dive...