Applying social network analysis
to Parliamentary Proceedings
Automatic discovery of meaningful cliques



Author:
Justin van Wees

Supervisors:
Dr. Maarten Marx
Dr. Johan van Doornik

June 23, 2011
Why?
Motivation and research question
Research question
    Can we discover communities of politicians
    that debate on a specific policy area?

Motivation
•   It's unknown which member is responsible for a certain
    policy area

•   Discover what issues are discussed within a policy area

•   Serve as example application of social network analysis
    techniques
How?
Background and methodology
<root>
  <docinfo>...</docinfo>
  <meta>...</meta>
  <proceedings>
    <topic>
       <scene type="speaker" speaker="Hamer" party="PvdA" function="Mevrouw"
              role="mp" title="Mevrouw Hamer (PvdA)" MPid="02221">
         <speech party="PvdA" speaker="Hamer" function="Mevrouw"
                 role="mp" MPid="02221">
           <p>Dat is helemaal niet waar. U bewijst nu voor de derde keer
                 dat u niet ...</p>
         </speech>
         <speech type="interruption" party="Verdonk" speaker="Verdonk"
                 function="Mevrouw" role="mp" MPid="02995">
           <p>Mag ik even uitpraten? Dank u. Zo werkt dat, gewoon fatsoen.
                 Dank u wel. [...]</p>
         </speech>
       </scene>
    </topic>
  </proceedings>
</root>
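As a minimal sketch, assuming each interruption is counted as one edge from the interrupting speaker to the speaker of the enclosing scene (the edge direction is an assumption, not stated on the slides), markup of this shape could be turned into weighted directed edges as follows. The sample string is a shortened, well-formed version of the fragment above, and `interruption_edges` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened, well-formed sample modelled on the fragment above.
SAMPLE = """<root>
  <proceedings>
    <topic>
      <scene type="speaker" speaker="Hamer" party="PvdA" role="mp" MPid="02221">
        <speech party="PvdA" speaker="Hamer" role="mp" MPid="02221">
          <p>...</p>
        </speech>
        <speech type="interruption" party="Verdonk" speaker="Verdonk"
                role="mp" MPid="02995">
          <p>...</p>
        </speech>
      </scene>
    </topic>
  </proceedings>
</root>"""


def interruption_edges(xml_text):
    """Count interruptions as (interrupter, scene speaker) -> weight."""
    root = ET.fromstring(xml_text)
    edges = Counter()
    for scene in root.iter("scene"):
        target = scene.get("speaker")
        for speech in scene.iter("speech"):
            source = speech.get("speaker")
            # Assumption: only speeches marked as interruptions create edges,
            # and a speaker cannot interrupt themselves.
            if speech.get("type") == "interruption" and source != target:
                edges[(source, target)] += 1
    return edges


edges = interruption_edges(SAMPLE)
print(dict(edges))
```

Summing these counters over all debates of a cabinet period would give the edge weights of a weighted directed graph.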
A simple graph
A directed graph
A weighted directed graph
[Figure: example directed graph with edge weights 42, 32, 21, 12, 84, 100, 10, 8 and 15]
A single debate represented in a graph
[Figure: nodes Harrewijn - GroenLinks - mp; Schimmel - D66 - mp; Wilders - VVD - mp; Smits - PvdA - mp; Bijleveld-Schouten - CDA - mp; Hoogervorst - government. Edge weights shown: 8, 8, 2, 4, 2]
Debates during Cabinet Kok II
A community
   A group of nodes that are relatively densely
connected to each other but sparsely connected to
       other dense groups in the network
A k-clique (k = 4)   K-clique communities (k = 4)
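A k-clique community is the union of all k-cliques that can be reached from one another through adjacent k-cliques, where two k-cliques are adjacent if they share k - 1 nodes. The sketch below illustrates that definition in plain Python on an undirected toy graph; it is not the tooling used for the results, and the node names and function names are hypothetical.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def k_clique_communities(edges, k):
    """Clique percolation: merge maximal cliques (size >= k) sharing k-1 nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    big = [c for c in maximal_cliques(adj) if len(c) >= k]
    linked = {c: set() for c in big}
    for a, b in combinations(big, 2):
        if len(a & b) >= k - 1:
            linked[a].add(b)
            linked[b].add(a)
    seen, communities = set(), []
    for start in big:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:  # depth-first walk over overlapping cliques
            current = stack.pop()
            members |= current
            for other in linked[current] - seen:
                seen.add(other)
                stack.append(other)
        communities.append(frozenset(members))
    return communities


# Two 4-cliques sharing three nodes, plus a separate 4-clique joined by one edge.
toy_edges = list(combinations("abcd", 2)) + [("e", "b"), ("e", "c"), ("e", "d")]
toy_edges += list(combinations("wxyz", 2)) + [("e", "w")]
communities = k_clique_communities(toy_edges, 4)
print(sorted(sorted(c) for c in communities))
```

In the toy graph the first two cliques merge into one community of five nodes, while the clique attached by a single edge stays a community of its own.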
Finding issues that a community is discussing

•   Retrieve all 'community text'

•   Tokenize at word level

•   Lemmatize

•   Use parsimonious language models to find the most
    'descriptive' terms
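The last step can be sketched as follows: a parsimonious language model starts from the maximum-likelihood term distribution of the community text and uses expectation-maximisation to shift probability mass away from terms that a background model already explains. This is a minimal sketch; the mixing weight `lam`, the iteration count and the toy tokens are assumptions, not the settings used for the results.

```python
from collections import Counter


def parsimonious_lm(doc_tokens, background_tokens, lam=0.1, iterations=20):
    """Estimate P(term | document) with a parsimonious language model.

    lam is the weight of the document model in the mixture with the
    background model (an assumed value, chosen for illustration).
    """
    tf = Counter(doc_tokens)
    bg = Counter(background_tokens)
    bg_total = sum(bg.values())
    doc_total = sum(tf.values())
    # Start from the maximum-likelihood estimate.
    p_doc = {t: c / doc_total for t, c in tf.items()}
    for _ in range(iterations):
        # E-step: expected count of each term generated by the document model.
        expected = {}
        for t, c in tf.items():
            doc_part = lam * p_doc[t]
            mix = (1 - lam) * (bg[t] / bg_total) + doc_part
            expected[t] = c * doc_part / mix if mix > 0 else 0.0
        # M-step: renormalise the expected counts.
        total = sum(expected.values())
        p_doc = {t: e / total for t, e in expected.items()}
    return p_doc


# Hypothetical toy data: "the" is frequent everywhere, "care" is specific.
community_text = "the the the care care waitlist".split()
background = ["the"] * 50 + ["a"] * 30 + ["school"] * 18 + ["care", "waitlist"]
model = parsimonious_lm(community_text, background)
top_terms = sorted(model, key=model.get, reverse=True)
print(top_terms)
```

Ranking terms by the resulting probabilities gives the most 'descriptive' terms first, since common words lose most of their mass to the background model.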
What?
Results and conclusion
General network statistics of Kok II

             No distinction     With distinction
             between MP/MG      between MP/MG
             roles              roles
Nodes        211                218
Edges        3594               3615
Density      0.081              0.076
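The density values are consistent with the usual definition for a directed graph without self-loops, the number of edges divided by n(n - 1). A minimal check, assuming that definition was the one used (`directed_density` is a hypothetical helper name):

```python
def directed_density(nodes, edges):
    """Density of a directed graph without self-loops: m / (n * (n - 1))."""
    return edges / (nodes * (nodes - 1))


# Node and edge counts taken from the table above.
print(round(directed_density(211, 3594), 3))
print(round(directed_density(218, 3615), 3))
```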
Finding k-clique communities

•   By default, the groups found are not 'cohesive'

•   Filter out 'noise' by setting a threshold on edge weights

•   At a threshold of 15 interruptions: 197 nodes, 741 edges,
    31 k-clique communities
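Thresholding of this kind can be sketched in a few lines. The example assumes that an edge is kept when its weight is at least the threshold (whether the cut-off was inclusive is not stated) and that nodes left without any edge are dropped; the weights are hypothetical toy data.

```python
def filter_edges(weights, threshold):
    """Keep edges whose weight reaches the threshold, plus their endpoints."""
    kept = {edge: w for edge, w in weights.items() if w >= threshold}
    nodes = {node for edge in kept for node in edge}
    return kept, nodes


# Hypothetical interruption counts: (interrupter, speaker) -> weight.
toy_weights = {("A", "B"): 20, ("B", "A"): 3, ("B", "C"): 15, ("C", "D"): 14}
kept, nodes = filter_edges(toy_weights, 15)
print(kept, sorted(nodes))
```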
Finding k-clique communities

•   All k-clique communities could be traced back to a single
    policy area

•   Except for more 'general' policy areas

•   92% of the community members were directly related to the
    policy area covered by the community

•   85% of the top 20 'issue terms' were relevant to the policy area

•   K-clique community detection and parsimonious language
    models are successful methods for automatic discovery of
    communities within debate networks
Discussion
... and future research
•   Method for setting edge weight threshold

•   Reviewing of k-cliques done by a single person

•   Used four years of data, shorter time-window possible?

•   Focused on Cabinet Kok II, what about other (earlier)
    cabinets?

•   Completely different data?
Questions?
For detailed results, datasets and programs see:
 http://justinvanwees.nl/goto/bachelorscriptie
