ºÝºÝߣ

ºÝºÝߣShare a Scribd company logo
Modeling Decisions for Artificial Intelligence (MDAI)
               2012, Girona, Spain
Introduction

? We present a benchmarking of two distinct algorithms for
extracting community structure from On-line Social Networks
(OSNs), considering how we can representatively sample an OSN
graph while maintaining its community structure.



? We do this by extracting the community structure from the
original and sampled versions of five well-known benchmarking
datasets and comparing the results.
   ? We assume there is NO a priori knowledge about the
   expected result.
   ? A supervised sampling is performed.
Extraction of the community structure


Algorithm 1: Newman¡¯s algorithm


   ? Extracts the communities by successively dividing the graph
   into components, using Freeman¡¯s betweenness centrality
   measure until modularity Q is maximized.


   ? Modularity (Q): Is the measure used to quantify the quality of
   the community partitions ¡®on the fly¡¯. Usual range: [0.3 - 0.7].


   ? We have implemented it in Python (using NetworkX library).
Extraction of the community structure


Algorithm 2: Blondel¡¯s method


   1. The method looks for smaller communities by optimizing
   modularity locally.
   2. Then it aggregates nodes of the same community and builds
   a new network whose nodes are communities.
    Steps 1 and 2 are repeated until modularity Q is maximized.


   ? The default version was used from the Gephi graph
   processing software.
Filtering / Sampling process

2-step process: In order to obtain a subset of a complete graph we
apply a process consisting of filtering and sampling.

    ? First step: Filtering (seed node selection).
         Consists of filtering the graph nodes based on their degree
         or their clustering coefficient. Filtering thresholds are user
         defined.

        ? Goal: Identify hub nodes and dense regions of the
        graph.

    ? Second step: Sampling.
        We apply a sampling at 1 hop to obtain all the neighbours
        connected to each seed node.

        ? Goal: Maintain core community structure.
Datasets used

                         Graph Statistics

               Karate   Dolphins    GrQc    Enron    Facebook


#Nodes          34        62        5242    10630     31720


#Edges          78        159       14496   164837    80592


Avg. degree     4.59      5.13      5.530   31.013    5.081


Clust. coef.    0.57      0.26      0.529   0.383     0.079



Avg. path
               2.408     3.356      6.049   3.160     6.432
length



Diameter         5         8         17       20        9
Sample criteria

? Sampling requires user supervision

? Sampling % desired: [ 10%-20% ]

? Only large graphs datasets (3) are sampled


                                           Resulting sample
                    Filter         Value                        Sample
                                                size

   ArXiv-GrQc       Degree          ¡Ý30        17.91%         All neighbors


   Enron        Clustering coef.    =1         20.83%         All neighbors


   Facebook     Clustering coef.   ¡Ý0.5        10.75%         All neighbors
Sampling statistics

? Graph statistics for sampled Vs original datasets (sss / ooo)


                           GrQc             Enron          Facebook
                       Degree >= 30     Clust.Coef = 1   Clust.Coef >= 0.5


   #Nodes               939 / 5242      2218 / 10630     3410 / 31720



   #Edges              5715 / 14446    14912 / 164387    6561 / 80592



   Avg. degree         12.17 / 5,53    12.315 / 31,013   3.848 / 5,081



   Clust. coef.        0.698 / 0,529    0.761 / 0,383    0.632 / 0,079



   Avg. path length    4.51 / 6,049     3.143 / 3,160    8.388 / 6,432



   Diameter               10 / 17           7 / 20            27 / 9
Sampling statistics

? Indicator: Clustering coefficient shows a common pattern, increasing
   in sampled datasets. This serves as an indicator that the core is
   preferentially included in the samples.
                           GrQc             Enron          Facebook
                       Degree >= 30     Clust.Coef = 1   Clust.Coef >= 0.5


    #Nodes              939 / 5242      2218 / 10630     3410 / 31720



    #Edges             5715 / 14446    14912 / 164387    6561 / 80592



    Avg. degree        12.17 / 5,53    12.315 / 31,013   3.848 / 5,081



    Clust. coef.       0.698 / 0,529   0.761 / 0,383     0.632 / 0,079



    Avg. path length   4.51 / 6,049     3.143 / 3,160    8.388 / 6,432



    Diameter              10 / 17           7 / 20            27 / 9
Empirical Tests and Results

1. First, we evaluate Newman¡¯s algorithm with the sampled datasets.

                        Stop                           Original or
                     Iteration    Q      Communities   Sampled

     Karate              4       0.494        5            O
     Dolphins            5       0.591        6            O
     GrQc                56      0.777       57            S
     Enron              865      0.421       869           S
     Enron Early*        51      0.325       56            S
     Facebook            40      0.870       190           S
Empirical Tests and Results

2. Blondel¡¯s method allows us to extract the communities from the
original dataset, given it¡¯s greater execution speed in comparison with
Newman¡¯s method.

                              Original              Sampled

                          Q              C      Q             C
           GrQc          0.856           390   0.789      11
           Enron         0.491           43    0.560      68
           Facebook      0.681       1105      0.519      33




? How to compare nodes community matching?
    ? NMI : Normalized Mutual Information
Normalized Mutual Information

? After labeling the communities, we match the nodes inside every
corresponding community in the sampled and original datasets.

? Purity: 100% means that all nodes in same communities are matched in
both datasets.

? We compare the Top N communities ( N =10 )

o Handicap
    Newman¡¯s and Blondel¡¯s methods are stochastic and non-deterministic
    ? Give slightly different results in each execution.
Normalized Mutual Information

? After labeling the communities, we match the nodes inside every
corresponding community in the sampled and original datasets.

? Purity: 100% means that all nodes in same communities are matched in
both datasets.

? We compare the Top N communities ( N =10 )

o Handicap
    Newman¡¯s and Blondel¡¯s methods are stochastic and non-deterministic
    ? Give slightly different results in each execution.
                                 NMI sampled
               NMI orig. Vs.                     NMI orig Vs. orig.   Net loss
                               Vs. sampled (B)
               sampled (A)                            (C)              (C- A)

   GrQc           0.66559          0.82544            0.77301         0.10742


   Enron          0.69069          0.86903            0.82012         0.12943


   Facebook       0.58996          0.73249            0.69215         0.10219
Newman¡¯s Vs. Blondel¡¯s

? In terms of modularity (Q) and number of communities (C)



                    Blondel¡¯s          Blondel¡¯s        Newman¡¯s
                    Original           Sampled          Sampled

                   Q            C    Q         C       Q       C
     GrQc        0.856      390     0.789      11     0.777    57
     Enron       0.491      43      0.560      68     0.325    56
     Facebook    0.681     1105     0.519      33     0.870   190




? The best modularity values are dataset dependent.
Newman¡¯s Vs. Blondel¡¯s

? In terms of modularity (Q) and number of communities (C)



                     Blondel¡¯s          Blondel¡¯s      Newman¡¯s
                     Original           Sampled        Sampled

                   Q             C    Q         C     Q       C
     GrQc         0.856      390     0.789      11   0.777    57
     Enron        0.491      43      0.560      68   0.325    56
     Facebook     0.681     1105     0.519      33   0.870   190




? The methods may give distinct results in terms of the number of
communities and modularity values.
Newman¡¯s (NG) Vs. Blondel¡¯s (BN)

? In terms of NMI: Normalized Mutual Information.
         Comparing Top N communities

              NMI BN Vs. NG   NMI NG Vs. BN   NMI orig Vs. orig.      Net loss
                  (A)             (B)              (C)             (C- Avg (A,B))

GrQc             0.69116         0.87243            0.77301          -0.00878


Enron            0.31313         0.68796            0.82012            0.31958


Enron Early      0.83437         0.44320            0.82012            0.18133


Facebook         0.62056         0.54551            0.69215            0.10911




? Results show significant differences between the assignment of the
nodes between methods.
Conclusions

? We¡¯ve benchmarked 5 statistically and topologically distinct datasets

    ? Applying 2 community structure algorithms
    ? Sampling original datasets


? Results indicate that is possible to identify the principal communities
for large complex datasets using sampling.

    ? It maintains the key facets of the community structure of a
    dataset (NMI statistic shows high correspondence is maintained)
    ? Significantly reduces of the dataset size (80-90%)


? However, a difference is found in the assignment of nodes to
communities between different executions and methods, due to their
stochastic nature.
Thank you for
your attention :-)

More Related Content

Similar to Analysis of on line social networks (OSNs) represented as graphs- extraction of an approximation of community structure using sampling (20)

Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
Carlos Castillo (ChaTo)
?
Microarray Analysis
Microarray AnalysisMicroarray Analysis
Microarray Analysis
James McInerney
?
Original SOINN
Original SOINNOriginal SOINN
Original SOINN
SOINN Inc.
?
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
ABINASHPADHY6
?
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN Applications
Chester Chen
?
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MLAI2
?
Fuzzy c means clustering protocol for wireless sensor networks
Fuzzy c means clustering protocol for wireless sensor networksFuzzy c means clustering protocol for wireless sensor networks
Fuzzy c means clustering protocol for wireless sensor networks
mourya chandra
?
Fa19_P1.pptx
Fa19_P1.pptxFa19_P1.pptx
Fa19_P1.pptx
Md Abul Hayat
?
Network sampling, community detection
Network sampling, community detectionNetwork sampling, community detection
Network sampling, community detection
roberval mariano
?
Do Fractional Norms and Quasinorms Help to Overcome the Curse of Dimensiona...
Do Fractional Norms and Quasinorms Help to Overcome the Curse of Dimensiona...Do Fractional Norms and Quasinorms Help to Overcome the Curse of Dimensiona...
Do Fractional Norms and Quasinorms Help to Overcome the Curse of Dimensiona...
Alexander Gorban
?
Prediction of beta decay half-lives with .pptx
Prediction of beta decay half-lives with .pptxPrediction of beta decay half-lives with .pptx
Prediction of beta decay half-lives with .pptx
u2000055
?
CDAC 2018 Pellegrini clustering ppi networks
CDAC 2018 Pellegrini clustering ppi networksCDAC 2018 Pellegrini clustering ppi networks
CDAC 2018 Pellegrini clustering ppi networks
Marco Antoniotti
?
20231 MCHA022 (Analytical Chemistry 2).pdf
20231 MCHA022 (Analytical Chemistry 2).pdf20231 MCHA022 (Analytical Chemistry 2).pdf
20231 MCHA022 (Analytical Chemistry 2).pdf
TshiamoMotaung
?
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
odsc
?
Phylogenetics1
Phylogenetics1Phylogenetics1
Phylogenetics1
S¨¦bastien De Landtsheer
?
Ieek fall Conference 2013
Ieek fall Conference 2013Ieek fall Conference 2013
Ieek fall Conference 2013
toha ardi nugraha
?
Design of an Intelligent System for Improving Classification of Cancer Diseases
Design of an Intelligent System for Improving Classification of Cancer DiseasesDesign of an Intelligent System for Improving Classification of Cancer Diseases
Design of an Intelligent System for Improving Classification of Cancer Diseases
Mohamed Loey
?
4.1 network analysis basic
4.1 network analysis basic4.1 network analysis basic
4.1 network analysis basic
jilung hsieh
?
L6 Speed Studies
L6 Speed StudiesL6 Speed Studies
L6 Speed Studies
Hossam Shafiq I
?
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
Jason Riedy
?
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
ABINASHPADHY6
?
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN Applications
Chester Chen
?
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MLAI2
?
Fuzzy c means clustering protocol for wireless sensor networks
Fuzzy c means clustering protocol for wireless sensor networksFuzzy c means clustering protocol for wireless sensor networks
Fuzzy c means clustering protocol for wireless sensor networks
mourya chandra
?
Network sampling, community detection
Network sampling, community detectionNetwork sampling, community detection
Network sampling, community detection
roberval mariano
?
Do Fractional Norms and Quasinorms Help to Overcome the Curse of Dimensiona...
Do Fractional Norms and Quasinorms Help to Overcome the Curse of Dimensiona...Do Fractional Norms and Quasinorms Help to Overcome the Curse of Dimensiona...
Do Fractional Norms and Quasinorms Help to Overcome the Curse of Dimensiona...
Alexander Gorban
?
Prediction of beta decay half-lives with .pptx
Prediction of beta decay half-lives with .pptxPrediction of beta decay half-lives with .pptx
Prediction of beta decay half-lives with .pptx
u2000055
?
CDAC 2018 Pellegrini clustering ppi networks
CDAC 2018 Pellegrini clustering ppi networksCDAC 2018 Pellegrini clustering ppi networks
CDAC 2018 Pellegrini clustering ppi networks
Marco Antoniotti
?
20231 MCHA022 (Analytical Chemistry 2).pdf
20231 MCHA022 (Analytical Chemistry 2).pdf20231 MCHA022 (Analytical Chemistry 2).pdf
20231 MCHA022 (Analytical Chemistry 2).pdf
TshiamoMotaung
?
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
odsc
?
Design of an Intelligent System for Improving Classification of Cancer Diseases
Design of an Intelligent System for Improving Classification of Cancer DiseasesDesign of an Intelligent System for Improving Classification of Cancer Diseases
Design of an Intelligent System for Improving Classification of Cancer Diseases
Mohamed Loey
?
4.1 network analysis basic
4.1 network analysis basic4.1 network analysis basic
4.1 network analysis basic
jilung hsieh
?
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
Jason Riedy
?

Recently uploaded (20)

AI in Talent Acquisition: Boosting Hiring
AI in Talent Acquisition: Boosting HiringAI in Talent Acquisition: Boosting Hiring
AI in Talent Acquisition: Boosting Hiring
Beyond Chiefs
?
Mastering Azure Durable Functions - Building Resilient and Scalable Workflows
Mastering Azure Durable Functions - Building Resilient and Scalable WorkflowsMastering Azure Durable Functions - Building Resilient and Scalable Workflows
Mastering Azure Durable Functions - Building Resilient and Scalable Workflows
Callon Campbell
?
Recruiting Tech: A Look at Why AI is Actually OG
Recruiting Tech: A Look at Why AI is Actually OGRecruiting Tech: A Look at Why AI is Actually OG
Recruiting Tech: A Look at Why AI is Actually OG
Matt Charney
?
STRING FUNCTIONS IN JAVA BY N SARATH KUMAR
STRING FUNCTIONS IN JAVA BY N SARATH KUMARSTRING FUNCTIONS IN JAVA BY N SARATH KUMAR
STRING FUNCTIONS IN JAVA BY N SARATH KUMAR
Sarathkumar Narsupalli
?
SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...
SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...
SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...
DianaGray10
?
Building High-Impact Teams Beyond the Product Triad.pdf
Building High-Impact Teams Beyond the Product Triad.pdfBuilding High-Impact Teams Beyond the Product Triad.pdf
Building High-Impact Teams Beyond the Product Triad.pdf
Rafael Burity
?
EXCEPTION HANDLING IN JAVA BY N SARATH KUMAR
EXCEPTION HANDLING IN JAVA BY N SARATH KUMAREXCEPTION HANDLING IN JAVA BY N SARATH KUMAR
EXCEPTION HANDLING IN JAVA BY N SARATH KUMAR
Sarathkumar Narsupalli
?
TrustArc Webinar - Data Privacy and Cyber Security: A Symbiotic Relationship
TrustArc Webinar - Data Privacy and Cyber Security: A Symbiotic RelationshipTrustArc Webinar - Data Privacy and Cyber Security: A Symbiotic Relationship
TrustArc Webinar - Data Privacy and Cyber Security: A Symbiotic Relationship
TrustArc
?
Convert EML files to PST on Mac operating system
Convert EML files to PST on Mac operating systemConvert EML files to PST on Mac operating system
Convert EML files to PST on Mac operating system
Rachel Walker
?
Next.js Development: The Ultimate Solution for High-Performance Web Apps
Next.js Development: The Ultimate Solution for High-Performance Web AppsNext.js Development: The Ultimate Solution for High-Performance Web Apps
Next.js Development: The Ultimate Solution for High-Performance Web Apps
rwinfotech31
?
Fast Screen Recorder v2.1.0.11 Crack Updated [April-2025]
Fast Screen Recorder v2.1.0.11 Crack Updated [April-2025]Fast Screen Recorder v2.1.0.11 Crack Updated [April-2025]
Fast Screen Recorder v2.1.0.11 Crack Updated [April-2025]
jackalen173
?
Leadership Spectrum by Sonam Sherpa at GDG Kathmandu March Monthly Meetup
Leadership Spectrum by Sonam Sherpa at GDG Kathmandu March Monthly MeetupLeadership Spectrum by Sonam Sherpa at GDG Kathmandu March Monthly Meetup
Leadership Spectrum by Sonam Sherpa at GDG Kathmandu March Monthly Meetup
GDG Kathmandu
?
Ricardo Jebb Bruno - A Structural CAD Technician
Ricardo Jebb Bruno - A Structural CAD TechnicianRicardo Jebb Bruno - A Structural CAD Technician
Ricardo Jebb Bruno - A Structural CAD Technician
Ricardo Jebb Bruno
?
Microsoft Digital Defense Report 2024 .pdf
Microsoft Digital Defense Report 2024 .pdfMicrosoft Digital Defense Report 2024 .pdf
Microsoft Digital Defense Report 2024 .pdf
Abhishek Agarwal
?
Artificial Neural Networks, basics, its variations and examples
Artificial Neural Networks, basics, its variations and examplesArtificial Neural Networks, basics, its variations and examples
Artificial Neural Networks, basics, its variations and examples
anandsimple
?
Transactional Outbox & Inbox Patterns.pptx
Transactional Outbox & Inbox Patterns.pptxTransactional Outbox & Inbox Patterns.pptx
Transactional Outbox & Inbox Patterns.pptx
Maysam Mousa
?
Automated Engineering of Domain-Specific Metamorphic Testing Environments
Automated Engineering of Domain-Specific Metamorphic Testing EnvironmentsAutomated Engineering of Domain-Specific Metamorphic Testing Environments
Automated Engineering of Domain-Specific Metamorphic Testing Environments
Pablo G¨®mez Abajo
?
202408_JAWSPANKRATION_Introduction_of_Minaden.pdf
202408_JAWSPANKRATION_Introduction_of_Minaden.pdf202408_JAWSPANKRATION_Introduction_of_Minaden.pdf
202408_JAWSPANKRATION_Introduction_of_Minaden.pdf
NTTDOCOMO-ServiceInnovation
?
Benefits of Moving Ellucian Banner to Oracle Cloud
Benefits of Moving Ellucian Banner to Oracle CloudBenefits of Moving Ellucian Banner to Oracle Cloud
Benefits of Moving Ellucian Banner to Oracle Cloud
AstuteBusiness
?
Least Privilege AWS IAM Role Permissions
Least Privilege AWS IAM Role PermissionsLeast Privilege AWS IAM Role Permissions
Least Privilege AWS IAM Role Permissions
Chris Wahl
?
AI in Talent Acquisition: Boosting Hiring
AI in Talent Acquisition: Boosting HiringAI in Talent Acquisition: Boosting Hiring
AI in Talent Acquisition: Boosting Hiring
Beyond Chiefs
?
Mastering Azure Durable Functions - Building Resilient and Scalable Workflows
Mastering Azure Durable Functions - Building Resilient and Scalable WorkflowsMastering Azure Durable Functions - Building Resilient and Scalable Workflows
Mastering Azure Durable Functions - Building Resilient and Scalable Workflows
Callon Campbell
?
Recruiting Tech: A Look at Why AI is Actually OG
Recruiting Tech: A Look at Why AI is Actually OGRecruiting Tech: A Look at Why AI is Actually OG
Recruiting Tech: A Look at Why AI is Actually OG
Matt Charney
?
STRING FUNCTIONS IN JAVA BY N SARATH KUMAR
STRING FUNCTIONS IN JAVA BY N SARATH KUMARSTRING FUNCTIONS IN JAVA BY N SARATH KUMAR
STRING FUNCTIONS IN JAVA BY N SARATH KUMAR
Sarathkumar Narsupalli
?
SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...
SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...
SAP Automation with UiPath: Solution Accelerators and Best Practices - Part 6...
DianaGray10
?
Building High-Impact Teams Beyond the Product Triad.pdf
Building High-Impact Teams Beyond the Product Triad.pdfBuilding High-Impact Teams Beyond the Product Triad.pdf
Building High-Impact Teams Beyond the Product Triad.pdf
Rafael Burity
?
EXCEPTION HANDLING IN JAVA BY N SARATH KUMAR
EXCEPTION HANDLING IN JAVA BY N SARATH KUMAREXCEPTION HANDLING IN JAVA BY N SARATH KUMAR
EXCEPTION HANDLING IN JAVA BY N SARATH KUMAR
Sarathkumar Narsupalli
?
TrustArc Webinar - Data Privacy and Cyber Security: A Symbiotic Relationship
TrustArc Webinar - Data Privacy and Cyber Security: A Symbiotic RelationshipTrustArc Webinar - Data Privacy and Cyber Security: A Symbiotic Relationship
TrustArc Webinar - Data Privacy and Cyber Security: A Symbiotic Relationship
TrustArc
?
Convert EML files to PST on Mac operating system
Convert EML files to PST on Mac operating systemConvert EML files to PST on Mac operating system
Convert EML files to PST on Mac operating system
Rachel Walker
?
Next.js Development: The Ultimate Solution for High-Performance Web Apps
Next.js Development: The Ultimate Solution for High-Performance Web AppsNext.js Development: The Ultimate Solution for High-Performance Web Apps
Next.js Development: The Ultimate Solution for High-Performance Web Apps
rwinfotech31
?
Fast Screen Recorder v2.1.0.11 Crack Updated [April-2025]
Fast Screen Recorder v2.1.0.11 Crack Updated [April-2025]Fast Screen Recorder v2.1.0.11 Crack Updated [April-2025]
Fast Screen Recorder v2.1.0.11 Crack Updated [April-2025]
jackalen173
?
Leadership Spectrum by Sonam Sherpa at GDG Kathmandu March Monthly Meetup
Leadership Spectrum by Sonam Sherpa at GDG Kathmandu March Monthly MeetupLeadership Spectrum by Sonam Sherpa at GDG Kathmandu March Monthly Meetup
Leadership Spectrum by Sonam Sherpa at GDG Kathmandu March Monthly Meetup
GDG Kathmandu
?
Ricardo Jebb Bruno - A Structural CAD Technician
Ricardo Jebb Bruno - A Structural CAD TechnicianRicardo Jebb Bruno - A Structural CAD Technician
Ricardo Jebb Bruno - A Structural CAD Technician
Ricardo Jebb Bruno
?
Microsoft Digital Defense Report 2024 .pdf
Microsoft Digital Defense Report 2024 .pdfMicrosoft Digital Defense Report 2024 .pdf
Microsoft Digital Defense Report 2024 .pdf
Abhishek Agarwal
?
Artificial Neural Networks, basics, its variations and examples
Artificial Neural Networks, basics, its variations and examplesArtificial Neural Networks, basics, its variations and examples
Artificial Neural Networks, basics, its variations and examples
anandsimple
?
Transactional Outbox & Inbox Patterns.pptx
Transactional Outbox & Inbox Patterns.pptxTransactional Outbox & Inbox Patterns.pptx
Transactional Outbox & Inbox Patterns.pptx
Maysam Mousa
?
Automated Engineering of Domain-Specific Metamorphic Testing Environments
Automated Engineering of Domain-Specific Metamorphic Testing EnvironmentsAutomated Engineering of Domain-Specific Metamorphic Testing Environments
Automated Engineering of Domain-Specific Metamorphic Testing Environments
Pablo G¨®mez Abajo
?
Benefits of Moving Ellucian Banner to Oracle Cloud
Benefits of Moving Ellucian Banner to Oracle CloudBenefits of Moving Ellucian Banner to Oracle Cloud
Benefits of Moving Ellucian Banner to Oracle Cloud
AstuteBusiness
?
Least Privilege AWS IAM Role Permissions
Least Privilege AWS IAM Role PermissionsLeast Privilege AWS IAM Role Permissions
Least Privilege AWS IAM Role Permissions
Chris Wahl
?

Analysis of on line social networks (OSNs) represented as graphs- extraction of an approximation of community structure using sampling

  • 1. Modeling Decisions for Artificial Intelligence (MDAI) 2012, Girona, Spain
  • 2. Introduction ? We present a benchmarking of two distinct algorithms for extracting community structure from On-line Social Networks (OSNs), considering how we can representatively sample an OSN graph while maintaining its community structure. ? We do this by extracting the community structure from the original and sampled versions of five well-known benchmarking datasets and comparing the results. ? We assume there is NO a priori knowledge about the expected result. ? A supervised sampling is performed.
  • 3. Extraction of the community structure Algorithm 1: Newman¡¯s algorithm ? Extracts the communities by successively dividing the graph into components, using Freeman¡¯s betweenness centrality measure until modularity Q is maximized. ? Modularity (Q): Is the measure used to quantify the quality of the community partitions ¡®on the fly¡¯. Usual range: [0.3 - 0.7]. ? We have implemented it in Python (using NetworkX library).
  • 4. Extraction of the community structure Algorithm 2: Blondel¡¯s method 1. The method looks for smaller communities by optimizing modularity locally. 2. Then it aggregates nodes of the same community and builds a new network whose nodes are communities. Steps 1 and 2 are repeated until modularity Q is maximized. ? The default version was used from the Gephi graph processing software.
  • 5. Filtering / Sampling process 2-step process: In order to obtain a subset of a complete graph we apply a process consisting of filtering and sampling. ? First step: Filtering (seed node selection). Consists of filtering the graph nodes based on their degree or their clustering coefficient. Filtering thresholds are user defined. ? Goal: Identify hub nodes and dense regions of the graph. ? Second step: Sampling. We apply a sampling at 1 hop to obtain all the neighbours connected to each seed node. ? Goal: Maintain core community structure.
  • 6. Datasets used Graph Statistics Karate Dolphins GrQc Enron Facebook #Nodes 34 62 5242 10630 31720 #Edges 78 159 14496 164837 80592 Avg. degree 4.59 5.13 5.530 31.013 5.081 Clust. coef. 0.57 0.26 0.529 0.383 0.079 Avg. path 2.408 3.356 6.049 3.160 6.432 length Diameter 5 8 17 20 9
  • 7. Sample criteria ? Sampling requires user supervision ? Sampling % desired: [ 10%-20% ] ? Only large graphs datasets (3) are sampled Resulting sample Filter Value Sample size ArXiv-GrQc Degree ¡Ý30 17.91% All neighbors Enron Clustering coef. =1 20.83% All neighbors Facebook Clustering coef. ¡Ý0.5 10.75% All neighbors
  • 8. Sampling statistics ? Graph statistics for sampled Vs original datasets (sss / ooo) GrQc Enron Facebook Degree >= 30 Clust.Coef = 1 Clust.Coef >= 0.5 #Nodes 939 / 5242 2218 / 10630 3410 / 31720 #Edges 5715 / 14446 14912 / 164387 6561 / 80592 Avg. degree 12.17 / 5,53 12.315 / 31,013 3.848 / 5,081 Clust. coef. 0.698 / 0,529 0.761 / 0,383 0.632 / 0,079 Avg. path length 4.51 / 6,049 3.143 / 3,160 8.388 / 6,432 Diameter 10 / 17 7 / 20 27 / 9
  • 9. Sampling statistics ? Indicator: Clustering coefficient shows a common pattern, increasing in sampled datasets. This serves as an indicator that the core is preferentially included in the samples. GrQc Enron Facebook Degree >= 30 Clust.Coef = 1 Clust.Coef >= 0.5 #Nodes 939 / 5242 2218 / 10630 3410 / 31720 #Edges 5715 / 14446 14912 / 164387 6561 / 80592 Avg. degree 12.17 / 5,53 12.315 / 31,013 3.848 / 5,081 Clust. coef. 0.698 / 0,529 0.761 / 0,383 0.632 / 0,079 Avg. path length 4.51 / 6,049 3.143 / 3,160 8.388 / 6,432 Diameter 10 / 17 7 / 20 27 / 9
  • 10. Empirical Tests and Results 1. First, we evaluate Newman¡¯s algorithm with the sampled datasets. Stop Original or Iteration Q Communities Sampled Karate 4 0.494 5 O Dolphins 5 0.591 6 O GrQc 56 0.777 57 S Enron 865 0.421 869 S Enron Early* 51 0.325 56 S Facebook 40 0.870 190 S
  • 11. Empirical Tests and Results 2. Blondel¡¯s method allows us to extract the communities from the original dataset, given it¡¯s greater execution speed in comparison with Newman¡¯s method. Original Sampled Q C Q C GrQc 0.856 390 0.789 11 Enron 0.491 43 0.560 68 Facebook 0.681 1105 0.519 33 ? How to compare nodes community matching? ? NMI : Normalized Mutual Information
  • 12. Normalized Mutual Information ? After labeling the communities, we match the nodes inside every corresponding community in the sampled and original datasets. ? Purity: 100% means that all nodes in same communities are matched in both datasets. ? We compare the Top N communities ( N =10 ) o Handicap Newman¡¯s and Blondel¡¯s methods are stochastic and non-deterministic ? Give slightly different results in each execution.
  • 13. Normalized Mutual Information ? After labeling the communities, we match the nodes inside every corresponding community in the sampled and original datasets. ? Purity: 100% means that all nodes in same communities are matched in both datasets. ? We compare the Top N communities ( N =10 ) o Handicap Newman¡¯s and Blondel¡¯s methods are stochastic and non-deterministic ? Give slightly different results in each execution. NMI sampled NMI orig. Vs. NMI orig Vs. orig. Net loss Vs. sampled (B) sampled (A) (C) (C- A) GrQc 0.66559 0.82544 0.77301 0.10742 Enron 0.69069 0.86903 0.82012 0.12943 Facebook 0.58996 0.73249 0.69215 0.10219
  • 14. Newman¡¯s Vs. Blondel¡¯s ? In terms of modularity (Q) and number of communities (C) Blondel¡¯s Blondel¡¯s Newman¡¯s Original Sampled Sampled Q C Q C Q C GrQc 0.856 390 0.789 11 0.777 57 Enron 0.491 43 0.560 68 0.325 56 Facebook 0.681 1105 0.519 33 0.870 190 ? The best modularity values are dataset dependent.
  • 15. Newman¡¯s Vs. Blondel¡¯s ? In terms of modularity (Q) and number of communities (C) Blondel¡¯s Blondel¡¯s Newman¡¯s Original Sampled Sampled Q C Q C Q C GrQc 0.856 390 0.789 11 0.777 57 Enron 0.491 43 0.560 68 0.325 56 Facebook 0.681 1105 0.519 33 0.870 190 ? The methods may give distinct results in terms of the number of communities and modularity values.
  • 16. Newman¡¯s (NG) Vs. Blondel¡¯s (BN) ? In terms of NMI: Normalized Mutual Information. Comparing Top N communities NMI BN Vs. NG NMI NG Vs. BN NMI orig Vs. orig. Net loss (A) (B) (C) (C- Avg (A,B)) GrQc 0.69116 0.87243 0.77301 -0.00878 Enron 0.31313 0.68796 0.82012 0.31958 Enron Early 0.83437 0.44320 0.82012 0.18133 Facebook 0.62056 0.54551 0.69215 0.10911 ? Results show significant differences between the assignment of the nodes between methods.
  • 17. Conclusions ? We¡¯ve benchmarked 5 statistically and topologically distinct datasets ? Applying 2 community structure algorithms ? Sampling original datasets ? Results indicate that is possible to identify the principal communities for large complex datasets using sampling. ? It maintains the key facets of the community structure of a dataset (NMI statistic shows high correspondence is maintained) ? Significantly reduces of the dataset size (80-90%) ? However, a difference is found in the assignment of nodes to communities between different executions and methods, due to their stochastic nature.
  • 18. Thank you for your attention :-)