Slideshows by User: brunoabrahao (http://www.slideshare.net/)
Feed updated: Wed, 14 Aug 2013 14:39:24 GMT

Trace Complexity of Network Inference
Posted: Wed, 14 Aug 2013 14:39:24 GMT | Link: /slideshow/abrahao-kdd2013/25251474

The network inference problem consists of reconstructing the edge set of a network given traces representing the chronology of infection times as epidemics spread through the network. This problem is a paradigmatic representative of prediction tasks in machine learning that require deducing a latent structure from observed patterns of activity in a network, which often require an unrealistically large number of resources (e.g., amount of available data, or computational time). A fundamental question is to understand which properties we can predict with a reasonable degree of accuracy with the available resources, and which we cannot. We define the trace complexity as the number of distinct traces required to achieve high fidelity in reconstructing the topology of the unobserved network or, more generally, some of its properties. We give algorithms that are competitive with, while being simpler and more efficient than, existing network inference approaches. Moreover, we prove that our algorithms are nearly optimal, by proving an information-theoretic lower bound on the number of traces that an optimal inference algorithm requires for performing this task in the general case. Given these strong lower bounds, we turn our attention to special cases, such as trees and bounded-degree graphs, and to property recovery tasks, such as reconstructing the degree distribution without inferring the network. We show that these problems require a much smaller (and more realistic) number of traces, making them potentially solvable in practice. By Bruno Abrahao, Flavio Chierichetti, Robert Kleinberg and Alessandro Panconesi.
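
To make the setting concrete, here is a minimal Python sketch under assumed conventions (a trace is a map from node to infection time, each epidemic starts at a single seed, and edge delays are exponential). It simulates traces and applies a simple first-edge heuristic in the spirit of the abstract's simpler-and-more-efficient algorithms; this is an illustration, not the paper's code.

```python
import heapq
import random

def simulate_trace(adj, source, rate=1.0):
    """SI epidemic from a single source: crossing an edge takes an
    independent Exp(rate) delay. Returns one trace {node: infection_time}."""
    times = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > times[u]:
            continue  # stale heap entry
        for v in adj[u]:
            tv = t + random.expovariate(rate)
            if tv < times.get(v, float("inf")):
                times[v] = tv
                heapq.heappush(heap, (tv, v))
    return times

def first_edge(traces):
    """With a single seed, the two earliest infections in a trace are
    always adjacent, so each trace certifies exactly one edge."""
    return {frozenset(sorted(tr, key=tr.get)[:2]) for tr in traces}

# Toy run: recover a 5-cycle from 50 traces.
adj = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
traces = [simulate_trace(adj, random.randrange(5)) for _ in range(50)]
truth = {frozenset((i, (i + 1) % 5)) for i in range(5)}
found = first_edge(traces)
print(f"edges recovered: {len(found & truth)}/{len(truth)}")
```

Each trace pins down only a single edge near its seed, which conveys why reconstructing a full topology can demand many traces, i.e., high trace complexity.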

On the Separability of Structural Classes of Communities
Posted: Thu, 23 Aug 2012 05:13:42 GMT | Link: /slideshow/kdd2012-talk-copy/14049168

Three major factors govern the intricacies of community extraction in networks: (1) the application domain includes a wide variety of networks of fundamentally different natures, (2) the literature offers a multitude of disparate community detection algorithms, and (3) there is no consensus characterizing how to discriminate communities from non-communities. In this paper, we present a comprehensive analysis of community properties through a class separability framework. Our approach enables the assessment of the structural dissimilarity among the output of multiple community detection algorithms and between the output of algorithms and communities that arise in practice. To demonstrate this concept, we furnish our method with a large set of structural properties and multiple community detection algorithms. Applied to a diverse collection of large-scale network datasets, the analysis reveals that (1) the different detection algorithms extract fundamentally different structures; (2) the structure of communities that arise in practice is closest to that of communities that random-walk-based algorithms extract, although still significantly different from that of the output of all the algorithms; and (3) a small subset of the properties are nearly as discriminative as the full set, while making explicit the ways in which the algorithms produce biases. Our framework enables an informed choice of the most suitable community detection method for a given purpose and network and allows for a comparison of existing community detection algorithms while guiding the design of new ones.
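
As a sketch of the class-separability idea (the feature set, classifier, and scoring here are illustrative assumptions; the paper's exact properties and learner are not reproduced), one can represent each community as a vector of structural properties and score how separable two algorithms' outputs are by the held-out accuracy of a linear classifier distinguishing them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def separability_score(feats_a, feats_b, folds=5):
    """Cross-validated accuracy of a linear classifier telling two
    classes of communities apart: ~0.5 means structurally
    indistinguishable, ~1.0 means fully separable."""
    X = np.vstack([feats_a, feats_b])
    y = np.r_[np.zeros(len(feats_a)), np.ones(len(feats_b))]
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=folds).mean()

# Hypothetical example: rows = communities, columns = structural
# properties (e.g., conductance, density, clustering coefficient).
rng = np.random.default_rng(0)
algo_a = rng.normal(0.0, 1.0, size=(100, 3))  # one detection algorithm
algo_b = rng.normal(0.8, 1.0, size=(100, 3))  # another, with shifted features
print(f"separability: {separability_score(algo_a, algo_b):.2f}")
```

Restricting the columns to a candidate subset of properties and re-scoring gives a direct test of finding (3): whether a few properties are nearly as discriminative as the full set.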

On the Intrinsic Locality Properties of Web Reference Streams
Posted: Mon, 07 Nov 2011 19:53:52 GMT | Link: /brunoabrahao/on-the-intrinsic-locality-properties-of-web-reference-streams

Self-Adaptive SLA-Driven Capacity Management for Internet Services
Posted: Mon, 07 Nov 2011 16:29:37 GMT | Link: /slideshow/selfadaptive-sladriven-capacity-management-for-internet-services/10063132

This work considers the problem of hosting multiple third-party Internet services in a cost-effective manner so as to maximize a provider’s business objective. For this purpose, we present a dynamic capacity management framework based on an optimization model, which links a cost model based on SLA contracts with an analytical queuing-based performance model, in an attempt to adapt the platform to changing capacity needs in real time. In addition, we propose a two-level SLA specification for different operation modes, namely, normal and surge, which allows for per-use service accounting with respect to requirements on throughput and the tail of the response-time distribution. The cost model proposed is based on penalties, incurred by the provider due to SLA violation, and rewards, received when the service level expectations are exceeded. Finally, we evaluate approximations for predicting the performance of the hosted services under two different scheduling disciplines, namely FCFS and processor sharing. Through simulation, we assess the effectiveness of the proposed approach as well as the level of accuracy resulting from the performance model approximations.
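
A toy instance of the penalty/reward optimization may help (hedged: the function names, the single-service setting, and the use of the basic M/M/1 FCFS tail P(T > x) = exp(-(mu - lambda) x) are assumptions; the paper links richer queuing approximations with two operation modes). The sketch picks the capacity that maximizes reward minus penalty and capacity cost, subject to a tail-response-time SLA.

```python
import math

def fcfs_tail(lam, mu, x):
    """P(response time > x) in an M/M/1 FCFS queue (requires lam < mu):
    the response time is exponential with rate mu - lam."""
    return math.exp(-(mu - lam) * x)

def choose_capacity(lam, slo_x, slo_prob, reward, penalty, unit_cost, options):
    """Illustrative SLA-driven capacity choice: maximize
    (reward if SLA met, else -penalty) - unit_cost * mu."""
    def profit(mu):
        if mu <= lam:  # unstable queue: the SLA is surely violated
            return -penalty - unit_cost * mu
        met = fcfs_tail(lam, mu, slo_x) <= slo_prob
        return (reward if met else -penalty) - unit_cost * mu
    return max(options, key=profit)

# SLA: at most 5% of requests slower than 0.5 s, at 8 req/s demand.
mu = choose_capacity(lam=8.0, slo_x=0.5, slo_prob=0.05,
                     reward=100.0, penalty=60.0, unit_cost=3.0,
                     options=[9.0, 12.0, 16.0, 20.0])
print(f"chosen service rate: {mu} req/s")
```

In this toy run, rates of 16 and 20 req/s both meet the tail SLA, and the optimizer picks 16 because the extra capacity at 20 only adds cost; the same trade-off drives the framework's real-time adaptation.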

On the Internet Delay Space Dimensionality
Posted: Mon, 07 Nov 2011 15:41:42 GMT | Link: /brunoabrahao/on-the-internet-delay-space-dimensionality

We investigate the dimensionality properties of the Internet delay space, i.e., the matrix of measured round-trip latencies between Internet hosts. Previous work on network coordinates has indicated that this matrix can be embedded, with reasonably low distortion, into a 4- to 9-dimensional Euclidean space. The application of Principal Component Analysis (PCA) reveals the same dimensionality values. Our work addresses the question: to what extent is the dimensionality an intrinsic property of the delay space, defined without reference to a host metric such as Euclidean space? Is the intrinsic dimensionality of the Internet delay space approximately equal to the dimension determined using embedding techniques or PCA? If not, what explains the discrepancy? What properties of the network contribute to its overall dimensionality? Using datasets obtained via the King [14] method, we study different measures of dimensionality to establish the following conclusions. First, based on its power-law behavior, the structure of the delay space can be better characterized by fractal measures. Second, the intrinsic dimension is significantly smaller than the value predicted by the previous studies; in fact by our measures it is less than 2. Third, we demonstrate a particular way in which the AS topology is reflected in the delay space; subnetworks composed of hosts which share an upstream Tier-1 autonomous system possess lower dimensionality than the combined delay space. Finally, we observe that fractal measures, due to their sensitivity to non-linear structures, display higher precision for measuring the influence of subtle features of the delay space geometry.
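
For intuition about the fractal measures, here is a generic correlation-dimension estimator (a sketch; the paper applies such measures to measured RTT matrices, and its exact estimator and radius range may differ). C(r) is the fraction of host pairs within delay r, and the dimension is the slope of log C(r) against log r over the scaling region.

```python
import numpy as np

def correlation_dimension(D, radii):
    """Correlation (fractal) dimension from a symmetric distance matrix D:
    slope of log C(r) vs log r, where C(r) is the fraction of point
    pairs at distance <= r."""
    d = D[np.triu_indices(D.shape[0], k=1)]      # upper-triangle distances
    C = np.array([(d <= r).mean() for r in radii])
    keep = C > 0                                  # log needs positive counts
    slope, _ = np.polyfit(np.log(radii[keep]), np.log(C[keep]), 1)
    return slope

# Sanity check: points in the unit square should give a dimension close to 2.
rng = np.random.default_rng(1)
P = rng.random((500, 2))
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
radii = np.geomspace(0.02, 0.3, 10)
print(f"estimated dimension: {correlation_dimension(D, radii):.2f}")
```

Applied to an RTT matrix, a slope below 2 over the scaling region would match the abstract's finding that the intrinsic dimension is less than 2.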

About the author: Bruno Abrahao. Data Science, Machine Learning, Computational Social Science. http://www.cs.stanford.edu/~abrahao