Crowdsourcing was used to build a dataset classifying news and non-news queries by having workers label search queries as news-related or not. Providing additional context, such as news headlines and search results, improved labeling quality compared to a basic interface. The final dataset of over 1,000 queries was high quality, with strong agreement among workers and accurate identification of news queries. Validation questions helped ensure reliable labels by catching workers gaming the system.
BW article on professional respondents 2-23 (1), by Brett Watkins
This document discusses issues with professional respondents in qualitative research and proposes solutions. It analyzes the costs of different recruitment methods, finding that database recruitment is much more cost-effective than list recruitment due to higher response rates. It argues that becoming more adversarial towards database members would reduce cooperation rates and drive up costs. Instead, it suggests that advanced database technologies can improve quality by validating member data, identifying duplicative or suspicious entries, and flagging professional respondents without their knowledge. This allows for easier database registration to attract more members while still screening out cheaters.
Two-step Classification Method for Spatial Decision Tree, by Abhishek Agrawal
This document summarizes an article on classifying spatial data using a two-step decision tree method. It introduces spatial classification and describes the authors' approach of using spatial relationships and attributes in decision trees. The method collects classified spatial objects, builds predicate descriptions, performs relevance analysis to identify important attributes, determines optimal buffer sizes, constructs the decision tree using fine-grained predicates, and evaluates performance on real and synthetic datasets.
This document discusses marketing and services. It begins by introducing the presenters and defining a service as a set of consumable benefits. It then defines marketing as communicating the value of a product or service to customers for the purpose of selling it. The document goes on to describe key characteristics of services, the marketing mix/7Ps, internal and external marketing, service classification frameworks, and a chart on tangibility and intangibility of different industries.
The document provides an introduction to classification techniques in machine learning. It defines classification as assigning objects to predefined categories based on their attributes. The goal is to build a model from a training set that can accurately classify previously unseen records. Decision trees are discussed as a popular classification technique that recursively splits data into more homogeneous subgroups based on attribute tests. The document outlines the process of building decision trees, including selecting splitting attributes, stopping criteria, and evaluating performance on a test set. Examples are provided to illustrate classification tasks and building a decision tree model.
International Business, Chapter 4: International Trade Theory (Charles W. Hill), by Md. Bellal Hossain Raju
This document provides an overview of international trade theories, including:
- Mercantilism from the 16th century which argued countries should maximize exports and minimize imports.
- Adam Smith's theory of absolute advantage from 1776 which argued countries should specialize in what they produce most efficiently.
- David Ricardo's theory of comparative advantage from 1817 which extended the argument for free trade.
- Later theories such as Heckscher-Ohlin, product life cycle theory, and theories addressing firm strategy and national competitive advantage.
This document discusses using a decision tree algorithm to classify customers of a Portuguese bank as likely or not to subscribe to a term deposit product based on their attributes in a marketing dataset. It presents the dataset, attribute selection process, approach using decision tree induction, and evaluation metrics including accuracy, error rate, and lift. The conclusions identify target customer groups and recommendations for improving the classifier include using ensemble methods and addressing class imbalance.
Kaplan and Haenlein 2010, Users of the world, unite! The challenges and opport..., by Twittercrisis
This document provides a definition and classification of social media. It begins by discussing the history and evolution of social media, from early bulletin board systems and personal homepages to modern platforms like Facebook, YouTube, and Wikipedia. The document distinguishes social media from related concepts of Web 2.0 and user-generated content. It then proposes a classification of social media applications according to their degree of social presence/media richness and level of self-disclosure/self-presentation. Major categories include collaborative projects, blogs, content communities, social networking sites, and virtual worlds. The classification is intended to help structure the rapidly evolving field of social media.
An organization is a group of individuals working together toward common goals. As an organization increases in size, the need for a well-defined structure also increases. Organizing involves deciding how to best group activities and resources, while organizational structure refers to the patterns and groups of jobs. There are various elements that make up organizational structure, including division of labor, chain of command, and span of control. Managers make design decisions around how to divide tasks, group jobs, set reporting relationships, and distribute authority.
The document discusses the need for collaboration between research and PR professionals to meet client needs. It outlines what an ideal global monitoring solution would provide but acknowledges limitations. Options presented focus on priority markets and issues, aggregated online content, on-demand broadcasts, and iterative improvements. Scoring would involve qualitative attributes and reputational drivers with transparent scoring. Reporting would combine human-generated and automated reports as well as empowering users. Implementation requires coordination, buy-in, and making the solution part of operations through ongoing management and assessment.
2023 Challenges and Opportunities Impacting Technical Documentation Team Capa..., by Scott Abel
The document summarizes the results of a survey on technical documentation teams. It found that most teams are small (84% have 24 or fewer staff) and face significant challenges like resource constraints. While many teams are moving to structured content, resistance to change prevents some from doing so. Top opportunities for teams include helping marketing, sales, and customer satisfaction by optimizing content. Overall, the survey shows documentation is essential but teams still need more support and capabilities to meet business needs.
London COI, September 26th, 2011, by Marshall Sponder
This document discusses challenges and solutions for social media analytics. It identifies four types of social media analytics platforms: self-serve, hybrid, internal, and full service. Choosing the right platform depends on factors like an organization's social media maturity, technical requirements, and analytics needs. While platforms provide brand analysis and sentiment data, they have limitations like query length, geo-location support, and data consistency. Proper planning around questions, metrics, triggers, and reporting is needed to successfully use social media analytics data.
The future for performance management, quality and true continuous improvement for local council planning services. Uses much of the data that councils already send to government, supplements it with some new approaches to customer and quality feedback, and brings it all together in one tidy, holistic report.
Triple Your Experiment Velocity by Integrating Optimizely with Your Data Ware..., by Optimizely
Harnessing the power of data and experimentation is central to Updater’s goal of revolutionizing the moving industry. However, while scaling their experimentation program, the team at Updater had to overcome the challenge that many of their conversions happen offline or on third-party sites and therefore couldn't be used in testing. During this session, you’ll learn how Updater fixed this blind spot in their funnels, tripled experiment velocity, and discovered deep user insights by integrating their experimentation platform and data warehouse.
This document discusses breaking through the "analysis barrier" in web analytics. It describes the difference between reporting and analysis, with analysis involving deeper study of problems and recommendations for change. The document outlines a 5-stage model of web analytics maturity and provides examples of real customer analyses, showing how they identified issues and made recommendations. It introduces Semphonic as a consultancy that helps clients overcome the analysis barrier through an analytic roadmap and ongoing deep-dive analysis projects.
This document discusses how business intelligence and data analytics can help organizations match partners and customers more quickly and effectively. It provides examples of how various platforms and tools can be used to gain insights into customer needs, supply and demand trends, matching rates and times. This includes analyzing customer satisfaction surveys, application and matching data, and using search tools and reports to develop strategies to optimize processes and improve matching outcomes. The overall goal is to increase matching rates and reduce times in order to deliver better customer service and experiences.
This document discusses how business intelligence and data analytics can help match participants faster and more effectively for an exchange program. It provides an overview of common challenges in matching, such as long matching times and unsatisfied participants. It then describes how platforms like CustomerGauge, manage.aiesec.org, and myaiesec.net can be used to obtain data on customer preferences, supply and demand trends, and matching metrics to develop strategies to optimize the matching process. Specific analytics tools and functions are outlined for gathering insights to increase matching rates and satisfaction while decreasing matching times.
This is a presentation I gave at Hadoop Summit San Jose 2014, on doing fuzzy matching at large scale using combinations of Hadoop & Solr-based techniques.
Yelp Data Set Challenge (What drives restaurant ratings?), by Prashanth Raj
The document discusses analyzing the Yelp dataset to understand what drives customer ratings on social media platforms. It aims to identify the most influential variables that affect three target variables: ratings of individual reviews (review_stars), average ratings of businesses (business_stars), and number of reviews businesses receive (business_review_count). Various data preparation steps are described, including data access, cleaning, transformation, and reduction. Descriptive statistics of the relevant variables are also provided. Regression and neural network models will be used to analyze the relationships between the target and predictor variables.
The talk has three parts : the first part gives an overview of data science work, including roadmap of data science team, responsibility and value of data scientists; the second part talks about pitfalls in analysis and teaches some common analysis methods; the third part takes decision support, metrics and AB testing as examples to explain the data science work and how they are translated to business value.
Asset finance systems projects guide 101, by David Pedreno
You are starting, or have already started, an asset finance and leasing system implementation: what are the typical pain points ahead? In this "101" guide, Richmond Consulting Group offers tips and looks at the key areas that will need attention if the journey is to be a smooth one.
Search analytics: what, why, how - by Otis Gospodnetic (lucenerevolution)
- Search analytics involves analyzing query and click data from search to generate reports that provide insights on search performance, user behavior, and areas for improvement.
- Key things search analytics can measure include query failures, high exit rates, irrelevant results, clicks, sessions, and more to help optimize search relevance, interface, and content.
- Reports need to surface both issues to address and successes to learn from.
This document provides an overview and agenda for a webinar on using MaxDiff analysis to design products people want to buy. It introduces the speakers, Chris Robson from Parametric Marketing and Esther LaVielle from SurveyAnalytics. The agenda includes an introduction to MaxDiff analysis, a demonstration of building a MaxDiff survey in SurveyAnalytics, reviewing MaxDiff reporting tools, and a Q&A session.
Search analytics: what, why, how - by Otis Gospodnetic (lucenerevolution)
- Search analytics involves analyzing query and click data from search to produce reports that provide insights on search performance, user behavior, and areas for improvement.
- Key things search analytics can measure include query failures, high exit rates, irrelevant results, clicks, sessions, and more to help optimize search relevance, interface, and content.
- Producing and analyzing the right reports over time helps support design, content, and experience decisions to improve search and better meet user needs.
This document discusses A/B testing at Microsoft's Bing search engine. It provides three examples of A/B tests run at Bing. The first test compared two variants of the Windows search box location and found that moving it to the left significantly increased user engagement. The second test compared showing 8 vs 10 search results and found that truncating to 8 results performed better. The third test compared adding "site links" to ads, allowing multiple destinations, and found that the variant with site links performed better despite being slightly slower. The document emphasizes that A/B testing is the best way to determine causal effects of changes and detect unexpected consequences.
2. Introduction
What is news query classification, and why would we build a dataset to examine it?
- A binary classification task performed by Web search engines.
- Up to 10% of queries may be news-related [Bar-Ilan et al., 2009].
- Approach: have workers judge Web search queries as news-related or not.
[Figure: a user issues a query to a Web search engine; a news-related query such as "gunman" returns news results, while a non-news-related query returns ordinary Web search results.]
5. News Queries Change Over Time
A careless worker can simply answer at random:
  for query in task: return Random(Yes, No)
Without the temporal context, queries are ambiguous. Query: "Octopus". News-related? A sea creature, or World Cup predictions?
How can we overcome these difficulties to create a high-quality dataset for news query classification?
11. Dataset Construction Methodology
How can we go about building a news query classification dataset?
- Sample queries from the MSN May 2006 query log.
- Create gold judgments to validate the workers.
- Propose additional content to tackle the temporal nature of news queries, and prototype interfaces to evaluate this content on a small test set.
- Create the final labels using the best setting and interface.
- Evaluate in terms of agreement.
- Evaluate against `experts`.
12. Dataset Construction Methodology
Sampling queries: create 2 query sets, sampled from the MSN May 2006 query log via Poisson sampling.
- One for testing (testset): fast crowdsourcing turn-around time, very low cost.
- One for the final dataset (fullset): 10x the queries, only labelled once.

Date        Time      Query
2006-05-01  00:00:08  What is May Day?
2006-05-08  14:43:42  protest in Puerto rico

Testset queries: 91
Fullset queries: 1206
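The sampling step can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the log contents, rates, and seeds below are made up, and Poisson sampling is taken in its usual survey-sampling sense of including each record independently with a fixed probability.

```python
import random

def poisson_sample(records, rate, seed=42):
    """Keep each record independently with probability `rate`.

    Because every record has the same inclusion probability,
    the sample stays representative of the full query log.
    """
    rng = random.Random(seed)
    return [r for r in records if rng.random() < rate]

# Hypothetical query log stand-in; the real one is the MSN May 2006 log.
log = [f"query-{i}" for i in range(10000)]

testset = poisson_sample(log, rate=0.01, seed=1)  # small, cheap, fast turn-around
fullset = poisson_sample(log, rate=0.12, seed=2)  # roughly 10x larger
```

Distinct seeds keep the two draws independent; the sizes are random but concentrate near `rate * len(log)`, mirroring the deck's small testset (91 queries) versus the ~10x fullset (1206 queries).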
13. Dataset Construction Methodology
How to check our workers are not `gaming' the system?
- Gold judgments (honey-pot): a small set (5%) of queries, used to catch out bad workers early in the task.
- `Cherry-picked' unambiguous queries, with a focus on news-related queries.
- Multiple workers per query: 3 workers, majority result.

Date        Time      Query                   Validation
2006-05-01  00:00:08  What is May Day?        No
2006-05-08  14:43:42  protest in Puerto rico  Yes
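The honey-pot check and the majority vote can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the gold set holds the two example queries from the slide, and the 70% pass threshold is an assumed cut-off (the deck does not state one).

```python
from collections import Counter

# Honey-pot: unambiguous queries with known answers (examples from the slide).
GOLD = {
    "What is May Day?": "No",
    "protest in Puerto rico": "Yes",
}

def worker_passes_gold(worker_labels, gold, min_accuracy=0.7):
    """Reject a worker who misses too many gold queries.

    `worker_labels` maps query -> 'Yes'/'No'; only queries that are
    in the gold set count toward the check.
    """
    judged = [(q, a) for q, a in worker_labels.items() if q in gold]
    if not judged:
        return True  # worker has not seen any gold queries yet
    correct = sum(1 for q, a in judged if gold[q] == a)
    return correct / len(judged) >= min_accuracy

def majority_label(labels):
    """Resolve the 3 worker judgments for one query by majority vote."""
    return Counter(labels).most_common(1)[0][0]
```

Filtering workers who fail the honey-pot before taking the majority vote is what lets bad judgments be rejected early, instead of polluting the final labels.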
14. Dataset Construction Methodology
How to counter the temporal nature of news queries? Workers need to know what the news stories of the time were, but likely will not remember the main stories of May 2006.
Idea: add extra information to the interface:
- News headlines
- News summaries
- Web search results
Prototype interfaces: use the small testset to keep costs and turn-around time low, and see which works best.
15. Interfaces: Basic
- Shows the query and its date, with binary labelling.
- Explains what the workers need to do and clarifies news-relatedness.
16. Interfaces: Headline
- Adds 12 news headlines from the New York Times.
- ...but will the workers bother to read these?
19. Interfaces: LinkSupported
- Links to three major search engines.
- Triggers a search containing the query and its date.
- Also gathers some additional feedback from workers.
20. Dataset Construction Methodology
How do we evaluate the quality of our labels?
- Agreement between the three workers per query: the more the workers agree, the more confident we can be that the resulting majority label is correct.
- Compare with `expert' (me) judgments: see how many of the queries that the workers judged news-related match the ground truth.

Date        Time      Query                   Worker  Expert
2006-05-05  07:31:23  abcnews                 Yes     No
2006-05-08  14:43:42  protest in Puerto rico  Yes     Yes
26. Experimental Setup
Research questions:
- How do our interface and setting affect the quality of our labels?
- Baseline quality: how bad is it?
- How much can the honey-pot bring?
- What about our extra information {headlines, summaries, result rankings}?
- Can we create a good-quality dataset? Agreement? Vs. ground truth?
Datasets: testset, fullset.
37. How Is Our Baseline?
Metric definitions:
- Accuracy: combined measure (assumes that the workers labelled non-news-related queries correctly).
- Precision: the % of queries labelled as news-related that agree with our ground truth.
- Recall: the % of all news-related queries that the workers labelled correctly.
- Kfree: kappa agreement assuming that workers would label randomly.
- Kfleiss: kappa agreement assuming that workers will label according to the class distribution.
Findings with the Basic interface:
- Validation is very important: 32% of judgments were rejected.
- 20% of those were completed VERY quickly: bots? Watch out for bursty judging... and new users.
- As expected, the baseline is fairly poor, i.e. agreement between workers per query is low (25-50%).
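The two agreement measures named on this slide correspond to Randolph's free-marginal kappa (Kfree) and Fleiss' kappa (Kfleiss). A small sketch of both (my own illustration, not the authors' code), where each query is represented by its per-category label counts across the 3 workers:

```python
def observed_agreement(items, n_raters):
    """Mean pairwise agreement.

    `items` is a list of per-query category counts, e.g. [2, 1]
    means 2 workers said Yes and 1 said No for that query.
    """
    total = 0.0
    for counts in items:
        total += sum(c * (c - 1) for c in counts) / (n_raters * (n_raters - 1))
    return total / len(items)

def kappa_free(items, n_raters, n_categories):
    """Kfree: chance agreement assumes workers label uniformly at random."""
    p_o = observed_agreement(items, n_raters)
    p_e = 1.0 / n_categories
    return (p_o - p_e) / (1 - p_e)

def kappa_fleiss(items, n_raters):
    """Kfleiss: chance agreement follows the observed class distribution."""
    p_o = observed_agreement(items, n_raters)
    n_items = len(items)
    p_j = [sum(it[j] for it in items) / (n_items * n_raters)
           for j in range(len(items[0]))]
    p_e = sum(p * p for p in p_j)
    return (p_o - p_e) / (1 - p_e)
```

Both reduce chance-corrected agreement to (P_o - P_e) / (1 - P_e); they differ only in the chance model P_e, which is exactly the distinction the slide draws between Kfree and Kfleiss.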
38. Adding Additional Information
By providing additional news-related information, does label quality increase, and which is the best interface?
Answer: yes, as shown by the performance increase.
- More information increases performance: agreement rises, so we can help users by providing more information.
- Web results provide just as much information as headlines.
- The LinkSupported interface provides the highest performance...
- ...but putting the information inline with each query causes workers to just match the text.
Interfaces compared: Basic, Headline, HeadlineInline, HeadlineSummary, LinkSupported.
39. Labelling the FullSet
We now label the fullset:
- 1204 queries
- Gold judgments
- LinkSupported interface
Are the resulting labels of sufficient quality?
40. High recall and agreement indicate that the labels are of high quality.
- Recall: workers got all of the news-related queries right!
- Precision: workers found other queries to be news-related.
- Agreement: workers may be learning the task?
- The majority of the work was done by 3 users.
[Chart: metric values on the testset vs. the fullset.]
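Precision and recall against the expert ground truth, as defined on the baseline slide, can be computed as below. This is an illustrative sketch assuming each query carries a 'Yes'/'No' majority label; the function name and data layout are my own, not the authors'.

```python
def precision_recall(worker_labels, expert_labels):
    """Compare majority crowd labels against expert ground truth.

    Both arguments map query -> 'Yes'/'No' for news-relatedness.
    Precision: fraction of worker 'Yes' labels the expert agrees with.
    Recall: fraction of expert 'Yes' queries the workers also caught.
    """
    tp = sum(1 for q, a in worker_labels.items()
             if a == "Yes" and expert_labels.get(q) == "Yes")
    fp = sum(1 for q, a in worker_labels.items()
             if a == "Yes" and expert_labels.get(q) == "No")
    fn = sum(1 for q, a in worker_labels.items()
             if a == "No" and expert_labels.get(q) == "Yes")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

On the two example queries from slide 20 ("abcnews" labelled Yes by workers but No by the expert, "protest in Puerto rico" Yes by both), this yields precision 0.5 and recall 1.0: the pattern the results slide describes, where workers catch every news query but also flag extra ones.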
41. Conclusions & Best Practices
Crowdsourcing is useful for building a news-query classification dataset.

42. We are confident that our dataset is reliable, since agreement is high.
Best practices:
- Online worker validation is paramount: catch out bots and lazy workers to improve agreement.
- Provide workers with additional information to help improve labelling quality.
- Workers can learn: running large single jobs may allow workers to become better at the task.
Questions?
Editor's Notes
#6: Poisson sampling - the literature says it is representative