SlideShare feed for Slideshows by User: s_shah

Organizational Overlap on Social Networks and its Applications
[This work was presented at WWW 2013.] Online social networks have become important for networking, communication, sharing, and discovery. A considerable challenge these networks face is that an online social network is only partially observed: two individuals might know each other but may not have established a connection on the site. Therefore, link prediction and recommendations are important tasks for any online social network. In this paper, we address the problem of computing edge affinity between two users on a social network based on the users' membership in organizations such as companies, schools, and online groups. We present experimental insights from social network data on organizational overlap, a novel mathematical model to compute the probability of connection between two people based on organizational overlap, and experimental validation of this model on real social network data. We also present novel ways in which the organizational overlap model can be applied to link prediction and community detection, which could in turn be useful for recommending entities to follow and for generating personalized news feeds.
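The abstract does not state the model itself. As a minimal, hypothetical sketch of how organizational overlap could be turned into a connection probability, assume each shared organization independently links two members with probability `alpha / size`, so a ten-person club counts for more than a large university, and combine the shared organizations with a noisy-OR. The function names, `alpha`, and the noisy-OR form are our assumptions, not the paper's model.

```python
from typing import Dict, Iterable, List, Set


def overlap_probability(
    orgs_u: Set[str],
    orgs_v: Set[str],
    org_size: Dict[str, int],
    alpha: float = 1.0,  # hypothetical scaling constant (assumption, not from the paper)
) -> float:
    """Noisy-OR estimate of P(u and v are connected) from shared organizations.

    Assumption: each shared organization independently links the pair with
    probability min(1, alpha / size), so smaller organizations weigh more.
    """
    p_no_link = 1.0
    for org in orgs_u & orgs_v:
        p_org = min(1.0, alpha / org_size[org])
        p_no_link *= 1.0 - p_org  # probability that this organization did not link them
    return 1.0 - p_no_link


def rank_candidates(
    user: str,
    candidates: Iterable[str],
    memberships: Dict[str, Set[str]],
    org_size: Dict[str, int],
    alpha: float = 1.0,
) -> List[str]:
    """Link prediction: order candidate connections by overlap probability."""
    scores = {
        c: overlap_probability(memberships[user], memberships[c], org_size, alpha)
        for c in candidates
        if c != user
    }
    return sorted(scores, key=scores.get, reverse=True)
```

The noisy-OR keeps the estimate in [0, 1] and raises it with each additional shared organization; a real model would be fitted to observed connections rather than relying on a fixed `alpha`.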

Posted Wed, 17 Jul 2013 12:12:28 GMT by Sam Shah (s_shah).

The "Big Data" Ecosystem at LinkedIn
[This work was presented at SIGMOD'13.] The use of large-scale data mining and machine learning has proliferated through the adoption of technologies such as Hadoop, with its simple programming semantics and rich, active ecosystem. This paper presents LinkedIn's Hadoop-based analytics stack, which allows data scientists and machine learning researchers to extract insights and build product features from massive amounts of data. In particular, we present our solutions to the "last mile" issues in providing a rich developer ecosystem. These include easy ingress from and egress to online systems, and managing workflows as production processes. A key characteristic of our solution is that these distributed-system concerns are completely abstracted away from researchers. For example, deploying data back into the online system is simply a one-line Pig command that a data scientist can add to the end of their script. We also present case studies on how this ecosystem is used to solve problems ranging from recommendations to news feed updates to email digests to descriptive analytical dashboards for our members.

Posted Wed, 17 Jul 2013 00:47:06 GMT by Sam Shah (s_shah).

Root Cause Detection in a Service-Oriented Architecture
[This paper was presented at SIGMETRICS 2013.] Large-scale websites are predominantly built as service-oriented architectures. Here, services are specialized for a certain task, run on multiple machines, and communicate with each other to serve a user's request. An anomalous change in a metric of one service can propagate to other services during this communication, resulting in overall degradation of the request. Because any such degradation affects revenue, maintaining correct functionality is of paramount concern: it is important to find the root cause of any anomaly as quickly as possible. This is challenging because there are numerous metrics, or sensors, for a given service, and a modern website is usually composed of hundreds of services running on thousands of machines in multiple data centers. This paper introduces MonitorRank, an algorithm that can reduce the time, domain knowledge, and human effort required to find the root causes of anomalies in such service-oriented architectures. In the event of an anomaly, MonitorRank provides a ranked list of possible root causes for monitoring teams to investigate. MonitorRank takes as input the historical and current time-series metrics of each sensor, along with the call graph between sensors, and builds an unsupervised model for ranking. Experiments on real production outage data from LinkedIn, one of the largest online social networks, show a 26% to 51% improvement in mean average precision in finding root causes compared to baseline and current state-of-the-art methods.
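The abstract says only that MonitorRank combines per-sensor time series with the call graph in an unsupervised ranking model. One plausible way to do that, sketched here under our own assumptions, is a random walk with restart (personalized PageRank) over the call graph: the walk moves from callers to callees in proportion to how strongly each callee's metric correlates with the anomalous metric, and a sensor with no correlated callees keeps its mass, so correlated downstream dependencies accumulate the highest scores. The function names, absolute Pearson correlation as the similarity, `restart=0.15`, and the dead-end handling are hypothetical choices, not the paper's definitions.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_root_causes(
    call_graph: Dict[str, List[str]],  # caller sensor -> callee sensors
    metrics: Dict[str, List[float]],   # every sensor -> its metric time series
    anomalous: str,                    # sensor where the anomaly was observed
    restart: float = 0.15,             # hypothetical restart probability
    iterations: int = 100,
) -> List[str]:
    """Rank sensors as candidate root causes via a similarity-biased random walk."""
    nodes = list(metrics)
    # Similarity of each sensor's metric to the anomalous one, kept non-negative.
    sim = {n: abs(pearson(metrics[n], metrics[anomalous])) for n in nodes}
    total = sum(sim.values()) or 1.0
    personalization = {n: sim[n] / total for n in nodes}  # restart distribution

    score = dict(personalization)
    for _ in range(iterations):  # power iteration
        nxt = {n: restart * personalization[n] for n in nodes}
        for n in nodes:
            callees = call_graph.get(n, [])
            wsum = sum(sim[c] for c in callees)
            if wsum == 0:
                # Dead end (no correlated callees): the walk stays on this sensor.
                nxt[n] += (1 - restart) * score[n]
                continue
            for c in callees:
                # Move to callees in proportion to their metric similarity.
                nxt[c] += (1 - restart) * score[n] * sim[c] / wsum
        score = nxt
    return sorted(nodes, key=lambda n: score[n], reverse=True)
```

On a toy graph where `fe` calls `svc_a` and `svc_b`, `svc_a` calls `db`, and only `svc_a` and `db` move with the anomalous `fe` metric, this ranks `db` first and `svc_a` second. Every sensor named in `call_graph` is assumed to have an entry in `metrics`.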

Posted Wed, 17 Jul 2013 00:39:13 GMT by Sam Shah (s_shah).

The "Big Data" Ecosystem at LinkedIn
[This work was presented at SIGMOD'13.] The use of large-scale data mining and machine learning has proliferated through the adoption of technologies such as Hadoop, with its simple programming semantics and rich, active ecosystem. This paper presents LinkedIn's Hadoop-based analytics stack, which allows data scientists and machine learning researchers to extract insights and build product features from massive amounts of data. In particular, we present our solutions to the "last mile" issues in providing a rich developer ecosystem. These include easy ingress from and egress to online systems, and managing workflows as production processes. A key characteristic of our solution is that these distributed-system concerns are completely abstracted away from researchers. For example, deploying data back into the online system is simply a one-line Pig command that a data scientist can add to the end of their script. We also present case studies on how this ecosystem is used to solve problems ranging from recommendations to news feed updates to email digests to descriptive analytical dashboards for our members.

Posted Wed, 26 Jun 2013 08:19:59 GMT by Sam Shah (s_shah).

How to Win Friends and Influence People (with Hadoop)
Strata NY 2012: http://strataconf.com/stratany2012/public/schedule/detail/25568
Hadoop is gaining momentum at most companies as a means of doing log analysis and business reporting. Hadoop is a great tool for solving these problems, but it can also be used to build much more interesting data applications. Hadoop is a general-purpose, high-performance data processing pipeline. At LinkedIn, the largest professional social network, we use Hadoop for several uncommon and interesting use cases. For instance, we look at marketing as a recommendation problem, not a sales problem. To do this, we use Hadoop for our recommendation, data processing, and content delivery pipelines, approaching marketing as a scientific process that helps us learn how to advertise better. To this end, we've developed a Hadoop-based system that generates and prioritizes marketing email messages. As another example, we use Hadoop to generate updates in a member's news feed. This system can be used to deliver rich analytical insights to members or to quickly prototype an idea for a new update, all with a one-line command that's easy enough for even product managers to use. As a final example, we use Hadoop to power several recommendation systems, including People You May Know. In this talk, we'll describe how LinkedIn leverages Hadoop for these use cases. We'll give detailed descriptions of the systems and tools we have built to use Hadoop for production pipelines (such as Azkaban and Kafka), and the interesting things we've learned along the way. We'll talk about how Hadoop allows us to come up with ideas and rapidly test them, and how we can quickly turn these ideas into scalable production processes.

Posted Mon, 20 May 2013 15:07:55 GMT by Sam Shah (s_shah).

Strata 2013 - LinkedIn Endorsements: Reputation, Virality, and Social Tagging
(To see the animations, please download the presentation.) Endorsements are a one-click system to recognize someone for their skills and expertise on LinkedIn, the largest professional online social network. This is one of the latest data features in LinkedIn's portfolio, and the endorsement ecosystem generates a large graph of reputation signals and viral user activity. Underneath this feature, there are several interesting and difficult data questions:
1. How do you automatically create a taxonomy of skills in the professional context?
2. How do you disambiguate between different contexts of skills? For instance, "search" could mean information retrieval, search & seizure, or search & rescue, among others.
3. How can you leverage data to determine someone's authoritativeness in a skill?
4. How do you use that authoritativeness to recommend people to endorse?
5. How do you optimize a complex, large-scale machine learning system for viral growth and engagement?
In this talk, we'll examine the practical aspects of building a data feature like Endorsements. We'll talk about marrying product design and data, diving deep into several of the lessons we've learned along the way, all using skills and endorsements as an empirical case study. We'll include technical detail on our approaches and on how we combine crowdsourcing, machine learning, and large-scale distributed systems to recommend topics to users. We'll also show interesting results on how members are using the endorsements feature and how it has spread across the network.

Mon, 04 Mar 2013 11:18:39 GMT /slideshow/strata-endorsements/16924502 s_shah@slideshare.net(s_shah)
Strata 2013 - LinkedIn Endorsements: Reputation, Virality, and Social Tagging from Sam Shah
]]>