ºÝºÝߣ

ºÝºÝߣShare a Scribd company logo
New Directions for
Power Law Research

 Michael Mitzenmacher
  Harvard University


                        1
Internet Mathematics
       Articles Related to This Talk

        The Future of Power Law Research


         Dynamic Models for File Sizes
         and Double Pareto Distributions

          A Brief History of Generative
          Models for Power Law and
          Lognormal Distributions


                           2
Motivation: General
? Power laws (and/or scale-free networks) are now
  everywhere.
   ¨C See the popular texts Linked by Barabasi or Six
     Degrees by Watts.
   ¨C In computer science: file sizes, download times,
     Internet topology, Web graph, etc.
   ¨C Other sciences: Economics, physics, ecology,
     linguistics, etc.
? What has been and what should be the research
  agenda?

                                              3
My (Biased) View
?    There are 5 stages of power law network research.
    1) Observe: Gather data to demonstrate power law behavior
       in a system.
    2) Interpret: Explain the importance of this observation in
       the system context.
    3) Model: Propose an underlying model for the observed
       behavior of the system.
    4) Validate: Find data to validate (and if necessary
       specialize or modify) the model.
    5) Control: Design ways to control and modify the
       underlying behavior of the system based on the model.
                                               4
My (Biased) View
? In networks, we have spent a lot of time observing
  and interpreting power laws.
? We are currently in the modeling stage.
   ¨C Many, many possible models.
   ¨C I¡¯ll talk about some of my favorites later on.
? We need to now put much more focus on
  validation and control.
   ¨C And these are specific areas where computer science
     has much to contribute!

                                                5
Models
? After observation, the natural step is to
  explain/model the behavior.
? Outcome: lots of modeling papers.
  ¨C And many models rediscovered.
? Lots of history¡­



                                              6
History
? In 1990¡¯s, the abundance of observed power laws in networks
  surprised the community.
   ¨C Perhaps they shouldn¡¯t have¡­ power laws appear frequently
     throughout the sciences.
       ?   Pareto : income distribution, 1897
       ?   Zipf-Auerbach: city sizes, 1913/1940¡¯s
       ?   Zipf-Estouf: word frequency, 1916/1940¡¯s
       ?   Lotka: bibliometrics, 1926
       ?   Yule: species and genera, 1924.
       ?   Mandelbrot: economics/information theory, 1950¡¯s+
? Observation/interpretation were/are key to initial understanding.
? My claim: but now the mere existence of power laws should not
  be surprising, or necessarily even noteworthy.
? My (biased) opinion: The bar should now be very high for
  observation/interpretation.
                                                   7
Power Law Distribution
? A power law distribution satisfies
                   Pr[ X ¡Ý x] ~ cx ?¦Á
? Pareto distribution
                   Pr[ X ¡Ý x] = k  ( )
                                 x ?¦Á
   ¨C Log-complementary cumulative distribution function
     (ccdf) is exactly linear.
               ln Pr[ X ¡Ý x] = ?¦Á ln x + ¦Á ln k
? Properties
   ¨C Infinite mean/variance possible



                                            8
Lognormal Distribution
? X is lognormally distributed if Y = ln X is
  normally distributed.
? Density function: f ( x) = 1 e?(ln x ? ? ) / 2¦Ò
                                              2     2



? Properties:                2¦Ð ¦Òx
   ¨C Finite mean/variance.
   ¨C Skewed: mean > median > mode
   ¨C Multiplicative: X1 lognormal, X2 lognormal
     implies X1X2 lognormal.

                                        9
Similarity
? Easily seen by looking at log-densities.
? Pareto has linear log-density.
          ln f ( x) = ?(¦Á ? 1) ln x + ¦Á ln k + ln ¦Á
? For large ¦Ò, lognormal has nearly linear log-
  density.                                ( ln x ? ? ) 2
        ln f ( x) = ? ln x ? ln 2¦Ð ¦Ò ?
                                               2¦Ò 2
? Similarly, both have near linear log-ccdfs.
   ¨C Log-ccdfs usually used for empirical, visual tests of
     power law behavior.
? Question: how to differentiate them empirically?


                                                10
Lognormal vs. Power Law
? Question: Is this distribution lognormal or
  a power law?
  ¨C Reasonable follow-up: Does it matter?
? Primarily in economics
  ¨C Income distribution.
  ¨C Stock prices. (Black-Scholes model.)
? But also papers in ecology, biology,
  astronomy, etc.

                                      11
Preferential Attachment
? Consider dynamic Web graph.
  ¨C Pages join one at a time.
  ¨C Each page has one outlink.
? Let Xj(t) be the number of pages of degree j
  at time t.
? New page links:
  ¨C With probability ¦Á, link to a random page.
  ¨C With probability (1- ¦Á), a link to a page chosen
    proportionally to indegree. (Copy a link.)
                                        12
Preferential Attachment History
? This model (without the graphs) was
  derived in the 1950¡¯s by Herbert Simon.
  ¨C ¡­ who won a Nobel Prize in economics for
    entirely different work.
  ¨C His analysis was not for Web graphs, but for
    other preferential attachment problems.




                                       13
Optimization Model: Power Law
? Mandelbrot experiment: design a language over a d-
  ary alphabet to optimize information per character.
   ¨C Probability of jth most frequently used word is pj.
   ¨C Length of jth most frequently used word is cj.
? Average information per word:
              H = ?¡Æ j p j log 2 p j
? Average characters per word:
                     C = ¡Æ j p jc j

? Optimization leads to power law.

                                                14
Monkeys Typing Randomly
? Miller (psychologist, 1957) suggests following:
  monkeys type randomly at a keyboard.
   ¨C Hit each of n characters with probability p.
   ¨C Hit space bar with probability 1 - np > 0.
   ¨C A word is sequence of characters separated by a space.
? Resulting distribution of word frequencies follows
  a power law.
? Conclusion: Mandelbrot¡¯s ¡°optimization¡± not
  required for languages to have power law

                                              15
Generative Models: Lognormal
? Start with an organism of size X0.
? At each time step, size changes by a random
  multiplicative factor.
                      X t = Ft ?1 X t ?1
? If Ft is taken from a lognormal distribution, each Xt is
  lognormal.
? If Ft are independent, identically distributed then (by
  CLT) Xt converges to lognormal distribution.


                                            16
BUT!
? If there exists a lower bound:
             X t = max(¦Å , Ft ?1 X t ?1 )
   then Xt converges to a power law
  distribution. (Champernowne, 1953)
? Lognormal model easily pushed to a power
  law model.


                                            17
Double Pareto Distributions

? Consider continuous version of lognormal
  generative model.
   ¨C At time t, log Xt is normal with mean ?t and variance
     ¦Ò2 t
? Suppose observation time is distributed
  exponentially.
   ¨C E.g., When Web size doubles every year.
? Resulting distribution is Double Pareto.
   ¨C Between lognormal and Pareto.
   ¨C Linear tail on a log-log chart, but a lognormal body.

                                               18
Lognormal vs. Double Pareto




                     19
And So Many More¡­
? New variations coming up all of the time.
? Question : What makes a new power law model
  sufficiently interesting to merit attention and/or
  publication?
   ¨C Strong connection to an observed process.
      ? Many models claim this, but few demonstrate it convincingly.
   ¨C Theory perspective: new mathematical insight or
     sophistication.
? My (biased) opinion: the bar should start being
  raised on model papers.
                                                    20
Validation: The Current Stage
? We now have so many models.
? It may be important to know the right model, to
  extrapolate and control future behavior.
? Given a proposed underlying model, we need tools
  to help us validate it.
? We appear to be entering the validation stage of
  research¡­. BUT the first steps have focused on
  invalidation rather than validation.

                                      21
Examples : Invalidation
? Lakhina, Byers, Crovella, Xie
   ¨C Show that observed power-law of Internet topology
     might be because of biases in traceroute sampling.
? Chen, Chang, Govindan, Jamin, Shenker,
  Willinger
   ¨C Show that Internet topology has characteristics that do
     not match preferential-attachment graphs.
   ¨C Suggest an alternative mechanism.
      ? But does this alternative match all characteristics, or are we
        still missing some?


                                                        22
My (Biased) View
? Invalidation is an important part of the process!
  BUT it is inherently different than validating a
  model.
? Validating seems much harder.
? Indeed, it is arguable what constitutes a validation.
? Question: what should it mean to say
  ¡°This model is consistent with observed data.¡±


                                          23
Time-Series/Trace Analysis
? Many models posit some sort of actions.
   ¨C New pages linking to pages in the Web.
   ¨C New routers joining the network.
   ¨C New files appearing in a file system.
? A validation approach: gather traces and see if the
  traces suitably match the model.
   ¨C Trace gathering can be a challenging systems problem.
   ¨C Check model match requires using appropriate
     statistical techniques and tests.
   ¨C May lead to new, improved, better justified models.

                                              24
Sampling and Trace Analysis
? Often, cannot record all actions.
   ¨C Internet is too big!
? Sampling
   ¨C Global: snapshots of entire system at various times.
   ¨C Local: record actions of sample agents in a system.
? Examples:
   ¨C Snapshots of file systems: full systems vs. actions of
     individual users.
   ¨C Router topology: Internet maps vs. changes at subset
     of routers.
? Question: how much/what kind of sampling is
  sufficient to validate a model appropriately?
   ¨C Does this differ among models?            25
To Control
? In many systems, intervention can impact the
  outcome.
   ¨C Maybe not for earthquakes, but for computer networks!
   ¨C Typical setting: individual agents acting in their own
     best interest, giving a global power law. Agents can be
     given incentives to change behavior.
? General problem: given a good model, determine
  how to change system behavior to optimize a
  global performance function.
   ¨C Distributed algorithmic mechanism design.
   ¨C Mix of economics/game theory and computer science.
                                              26
Possible Control Approaches
? Adding constraints: local or global
   ¨C Example: total space in a file system.
   ¨C Example: preferential attachment but links limited by
     an underlying metric.
? Add incentives or costs
   ¨C Example: charges for exceeding soft disk quotas.
   ¨C Example: payments for certain AS level connections.
? Limiting information
   ¨C Impact decisions by not letting everyone have true view
     of the system.

                                              27
Conclusion : My (Biased) View
?    There are 5 stages of power law research.
    1) Observe: Gather data to demonstrate power law
       behavior in a system.
    2) Interpret: Explain the import of this observation in the
       system context.
    3) Model: Propose an underlying model for the observed
       behavior of the system.
    4) Validate: Find data to validate (and if necessary
       specialize or modify) the model.
    5) Control: Design ways to control and modify the
       underlying behavior of the system based on the model.
?    We need to focus on validation and control.
    ¨C   Lots of open research problems.
                                                28
A Chance for Collaboration
? The observe/interpret stages of research are dominated by
  systems; modeling dominated by theory.
   ¨C And need new insights, from statistics, control theory,
     economics!!!
? Validation and control require a strong theoretical
  foundation.
   ¨C Need universal ideas and methods that span different types of
     systems.
   ¨C Need understanding of underlying mathematical models.
? But also a large systems buy-in.
   ¨C Getting/analyzing/understanding data.
   ¨C Find avenues for real impact.
? Good area for future systems/theory/others collaboration
  and interaction.
                                               29

More Related Content

Similar to Radcliffe (20)

Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
Jason Riedy
?
Knowledge_Representbhhggghhhhhhhtrrghjuuuuation.ppt
Knowledge_Representbhhggghhhhhhhtrrghjuuuuation.pptKnowledge_Representbhhggghhhhhhhtrrghjuuuuation.ppt
Knowledge_Representbhhggghhhhhhhtrrghjuuuuation.ppt
SupriyoRoy41
?
2013 py con awesome big data algorithms
2013 py con awesome big data algorithms2013 py con awesome big data algorithms
2013 py con awesome big data algorithms
c.titus.brown
?
Why do we need machine learning? Neural Networks for Machine Learning Lectur...
Why do we need machine learning? Neural Networks for Machine Learning  Lectur...Why do we need machine learning? Neural Networks for Machine Learning  Lectur...
Why do we need machine learning? Neural Networks for Machine Learning Lectur...
felipe61638
?
PMSCS 657_Parallel and Distributed processing
PMSCS 657_Parallel and Distributed processingPMSCS 657_Parallel and Distributed processing
PMSCS 657_Parallel and Distributed processing
Md. Mashiur Rahman
?
2013 siam-cse-big-data
2013 siam-cse-big-data2013 siam-cse-big-data
2013 siam-cse-big-data
c.titus.brown
?
Tutorial 6 (web graph attributes)
Tutorial 6 (web graph attributes)Tutorial 6 (web graph attributes)
Tutorial 6 (web graph attributes)
Kira
?
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
Sujit Pal
?
social.pptx
social.pptxsocial.pptx
social.pptx
NISHASOMSCS113
?
On the value of stochastic analysis for software engineering
On the value of stochastic analysis for software engineeringOn the value of stochastic analysis for software engineering
On the value of stochastic analysis for software engineering
CS, NcState
?
Random graph models
Random graph modelsRandom graph models
Random graph models
networksuw
?
Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0
Joe Xing
?
Chapter 12 knowledge representation nd description
Chapter 12 knowledge representation nd descriptionChapter 12 knowledge representation nd description
Chapter 12 knowledge representation nd description
AfraseyabKhan1
?
[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo
JaeJun Yoo
?
HyperMembrane Structures for Open Source Cognitive Computing
HyperMembrane Structures for Open Source Cognitive ComputingHyperMembrane Structures for Open Source Cognitive Computing
HyperMembrane Structures for Open Source Cognitive Computing
Jack Park
?
5954987.ppt
5954987.ppt5954987.ppt
5954987.ppt
MukhtiarKhan5
?
Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of...
Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of...Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of...
Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of...
e2wi67sy4816pahn
?
Topology ppt
Topology pptTopology ppt
Topology ppt
karan saini
?
Topology ppt
Topology pptTopology ppt
Topology ppt
boocse11
?
Lec01-Algorithems - Introduction and Overview.pdf
Lec01-Algorithems - Introduction and Overview.pdfLec01-Algorithems - Introduction and Overview.pdf
Lec01-Algorithems - Introduction and Overview.pdf
MAJDABDALLAH3
?
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
Jason Riedy
?
Knowledge_Representbhhggghhhhhhhtrrghjuuuuation.ppt
Knowledge_Representbhhggghhhhhhhtrrghjuuuuation.pptKnowledge_Representbhhggghhhhhhhtrrghjuuuuation.ppt
Knowledge_Representbhhggghhhhhhhtrrghjuuuuation.ppt
SupriyoRoy41
?
2013 py con awesome big data algorithms
2013 py con awesome big data algorithms2013 py con awesome big data algorithms
2013 py con awesome big data algorithms
c.titus.brown
?
Why do we need machine learning? Neural Networks for Machine Learning Lectur...
Why do we need machine learning? Neural Networks for Machine Learning  Lectur...Why do we need machine learning? Neural Networks for Machine Learning  Lectur...
Why do we need machine learning? Neural Networks for Machine Learning Lectur...
felipe61638
?
PMSCS 657_Parallel and Distributed processing
PMSCS 657_Parallel and Distributed processingPMSCS 657_Parallel and Distributed processing
PMSCS 657_Parallel and Distributed processing
Md. Mashiur Rahman
?
Tutorial 6 (web graph attributes)
Tutorial 6 (web graph attributes)Tutorial 6 (web graph attributes)
Tutorial 6 (web graph attributes)
Kira
?
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
Sujit Pal
?
On the value of stochastic analysis for software engineering
On the value of stochastic analysis for software engineeringOn the value of stochastic analysis for software engineering
On the value of stochastic analysis for software engineering
CS, NcState
?
Random graph models
Random graph modelsRandom graph models
Random graph models
networksuw
?
Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0
Joe Xing
?
Chapter 12 knowledge representation nd description
Chapter 12 knowledge representation nd descriptionChapter 12 knowledge representation nd description
Chapter 12 knowledge representation nd description
AfraseyabKhan1
?
[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo
JaeJun Yoo
?
HyperMembrane Structures for Open Source Cognitive Computing
HyperMembrane Structures for Open Source Cognitive ComputingHyperMembrane Structures for Open Source Cognitive Computing
HyperMembrane Structures for Open Source Cognitive Computing
Jack Park
?
Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of...
Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of...Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of...
Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of...
e2wi67sy4816pahn
?
Lec01-Algorithems - Introduction and Overview.pdf
Lec01-Algorithems - Introduction and Overview.pdfLec01-Algorithems - Introduction and Overview.pdf
Lec01-Algorithems - Introduction and Overview.pdf
MAJDABDALLAH3
?

Recently uploaded (20)

[Webinar] Scaling Made Simple: Getting Started with No-Code Web Apps
[Webinar] Scaling Made Simple: Getting Started with No-Code Web Apps[Webinar] Scaling Made Simple: Getting Started with No-Code Web Apps
[Webinar] Scaling Made Simple: Getting Started with No-Code Web Apps
Safe Software
?
Unlock AI Creativity: Image Generation with DALL¡¤E
Unlock AI Creativity: Image Generation with DALL¡¤EUnlock AI Creativity: Image Generation with DALL¡¤E
Unlock AI Creativity: Image Generation with DALL¡¤E
Expeed Software
?
Build with AI on Google Cloud Session #4
Build with AI on Google Cloud Session #4Build with AI on Google Cloud Session #4
Build with AI on Google Cloud Session #4
Margaret Maynard-Reid
?
Replacing RocksDB with ScyllaDB in Kafka Streams by Almog Gavra
Replacing RocksDB with ScyllaDB in Kafka Streams by Almog GavraReplacing RocksDB with ScyllaDB in Kafka Streams by Almog Gavra
Replacing RocksDB with ScyllaDB in Kafka Streams by Almog Gavra
ScyllaDB
?
Gojek Clone Multi-Service Super App.pptx
Gojek Clone Multi-Service Super App.pptxGojek Clone Multi-Service Super App.pptx
Gojek Clone Multi-Service Super App.pptx
V3cube
?
Revolutionizing-Government-Communication-The-OSWAN-Success-Story
Revolutionizing-Government-Communication-The-OSWAN-Success-StoryRevolutionizing-Government-Communication-The-OSWAN-Success-Story
Revolutionizing-Government-Communication-The-OSWAN-Success-Story
ssuser52ad5e
?
UiPath Agentic Automation Capabilities and Opportunities
UiPath Agentic Automation Capabilities and OpportunitiesUiPath Agentic Automation Capabilities and Opportunities
UiPath Agentic Automation Capabilities and Opportunities
DianaGray10
?
UiPath Automation Developer Associate Training Series 2025 - Session 2
UiPath Automation Developer Associate Training Series 2025 - Session 2UiPath Automation Developer Associate Training Series 2025 - Session 2
UiPath Automation Developer Associate Training Series 2025 - Session 2
DianaGray10
?
DAO UTokyo 2025 DLT mass adoption case studies IBM Tsuyoshi Hirayama (ƽɽÒã)
DAO UTokyo 2025 DLT mass adoption case studies IBM Tsuyoshi Hirayama (ƽɽÒã)DAO UTokyo 2025 DLT mass adoption case studies IBM Tsuyoshi Hirayama (ƽɽÒã)
DAO UTokyo 2025 DLT mass adoption case studies IBM Tsuyoshi Hirayama (ƽɽÒã)
Tsuyoshi Hirayama
?
Transform Your Future with Front-End Development Training
Transform Your Future with Front-End Development TrainingTransform Your Future with Front-End Development Training
Transform Your Future with Front-End Development Training
Vtechlabs
?
TrustArc Webinar - Building your DPIA/PIA Program: Best Practices & Tips
TrustArc Webinar - Building your DPIA/PIA Program: Best Practices & TipsTrustArc Webinar - Building your DPIA/PIA Program: Best Practices & Tips
TrustArc Webinar - Building your DPIA/PIA Program: Best Practices & Tips
TrustArc
?
Technology use over time and its impact on consumers and businesses.pptx
Technology use over time and its impact on consumers and businesses.pptxTechnology use over time and its impact on consumers and businesses.pptx
Technology use over time and its impact on consumers and businesses.pptx
kaylagaze
?
What Makes "Deep Research"? A Dive into AI Agents
What Makes "Deep Research"? A Dive into AI AgentsWhat Makes "Deep Research"? A Dive into AI Agents
What Makes "Deep Research"? A Dive into AI Agents
Zilliz
?
DevNexus - Building 10x Development Organizations.pdf
DevNexus - Building 10x Development Organizations.pdfDevNexus - Building 10x Development Organizations.pdf
DevNexus - Building 10x Development Organizations.pdf
Justin Reock
?
FinTech - US Annual Funding Report - 2024.pptx
FinTech - US Annual Funding Report - 2024.pptxFinTech - US Annual Funding Report - 2024.pptx
FinTech - US Annual Funding Report - 2024.pptx
Tracxn
?
UiPath Automation Developer Associate Training Series 2025 - Session 1
UiPath Automation Developer Associate Training Series 2025 - Session 1UiPath Automation Developer Associate Training Series 2025 - Session 1
UiPath Automation Developer Associate Training Series 2025 - Session 1
DianaGray10
?
Field Device Management Market Report 2030 - TechSci Research
Field Device Management Market Report 2030 - TechSci ResearchField Device Management Market Report 2030 - TechSci Research
Field Device Management Market Report 2030 - TechSci Research
Vipin Mishra
?
L01 Introduction to Nanoindentation - What is hardness
L01 Introduction to Nanoindentation - What is hardnessL01 Introduction to Nanoindentation - What is hardness
L01 Introduction to Nanoindentation - What is hardness
RostislavDaniel
?
Computational Photography: How Technology is Changing Way We Capture the World
Computational Photography: How Technology is Changing Way We Capture the WorldComputational Photography: How Technology is Changing Way We Capture the World
Computational Photography: How Technology is Changing Way We Capture the World
HusseinMalikMammadli
?
Cloud of everything Tech of the 21 century in Aviation
Cloud of everything Tech of the 21 century in AviationCloud of everything Tech of the 21 century in Aviation
Cloud of everything Tech of the 21 century in Aviation
Assem mousa
?
[Webinar] Scaling Made Simple: Getting Started with No-Code Web Apps
[Webinar] Scaling Made Simple: Getting Started with No-Code Web Apps[Webinar] Scaling Made Simple: Getting Started with No-Code Web Apps
[Webinar] Scaling Made Simple: Getting Started with No-Code Web Apps
Safe Software
?
Unlock AI Creativity: Image Generation with DALL¡¤E
Unlock AI Creativity: Image Generation with DALL¡¤EUnlock AI Creativity: Image Generation with DALL¡¤E
Unlock AI Creativity: Image Generation with DALL¡¤E
Expeed Software
?
Build with AI on Google Cloud Session #4
Build with AI on Google Cloud Session #4Build with AI on Google Cloud Session #4
Build with AI on Google Cloud Session #4
Margaret Maynard-Reid
?
Replacing RocksDB with ScyllaDB in Kafka Streams by Almog Gavra
Replacing RocksDB with ScyllaDB in Kafka Streams by Almog GavraReplacing RocksDB with ScyllaDB in Kafka Streams by Almog Gavra
Replacing RocksDB with ScyllaDB in Kafka Streams by Almog Gavra
ScyllaDB
?
Gojek Clone Multi-Service Super App.pptx
Gojek Clone Multi-Service Super App.pptxGojek Clone Multi-Service Super App.pptx
Gojek Clone Multi-Service Super App.pptx
V3cube
?
Revolutionizing-Government-Communication-The-OSWAN-Success-Story
Revolutionizing-Government-Communication-The-OSWAN-Success-StoryRevolutionizing-Government-Communication-The-OSWAN-Success-Story
Revolutionizing-Government-Communication-The-OSWAN-Success-Story
ssuser52ad5e
?
UiPath Agentic Automation Capabilities and Opportunities
UiPath Agentic Automation Capabilities and OpportunitiesUiPath Agentic Automation Capabilities and Opportunities
UiPath Agentic Automation Capabilities and Opportunities
DianaGray10
?
UiPath Automation Developer Associate Training Series 2025 - Session 2
UiPath Automation Developer Associate Training Series 2025 - Session 2UiPath Automation Developer Associate Training Series 2025 - Session 2
UiPath Automation Developer Associate Training Series 2025 - Session 2
DianaGray10
?
DAO UTokyo 2025 DLT mass adoption case studies IBM Tsuyoshi Hirayama (ƽɽÒã)
DAO UTokyo 2025 DLT mass adoption case studies IBM Tsuyoshi Hirayama (ƽɽÒã)DAO UTokyo 2025 DLT mass adoption case studies IBM Tsuyoshi Hirayama (ƽɽÒã)
DAO UTokyo 2025 DLT mass adoption case studies IBM Tsuyoshi Hirayama (ƽɽÒã)
Tsuyoshi Hirayama
?
Transform Your Future with Front-End Development Training
Transform Your Future with Front-End Development TrainingTransform Your Future with Front-End Development Training
Transform Your Future with Front-End Development Training
Vtechlabs
?
TrustArc Webinar - Building your DPIA/PIA Program: Best Practices & Tips
TrustArc Webinar - Building your DPIA/PIA Program: Best Practices & TipsTrustArc Webinar - Building your DPIA/PIA Program: Best Practices & Tips
TrustArc Webinar - Building your DPIA/PIA Program: Best Practices & Tips
TrustArc
?
Technology use over time and its impact on consumers and businesses.pptx
Technology use over time and its impact on consumers and businesses.pptxTechnology use over time and its impact on consumers and businesses.pptx
Technology use over time and its impact on consumers and businesses.pptx
kaylagaze
?
What Makes "Deep Research"? A Dive into AI Agents
What Makes "Deep Research"? A Dive into AI AgentsWhat Makes "Deep Research"? A Dive into AI Agents
What Makes "Deep Research"? A Dive into AI Agents
Zilliz
?
DevNexus - Building 10x Development Organizations.pdf
DevNexus - Building 10x Development Organizations.pdfDevNexus - Building 10x Development Organizations.pdf
DevNexus - Building 10x Development Organizations.pdf
Justin Reock
?
FinTech - US Annual Funding Report - 2024.pptx
FinTech - US Annual Funding Report - 2024.pptxFinTech - US Annual Funding Report - 2024.pptx
FinTech - US Annual Funding Report - 2024.pptx
Tracxn
?
UiPath Automation Developer Associate Training Series 2025 - Session 1
UiPath Automation Developer Associate Training Series 2025 - Session 1UiPath Automation Developer Associate Training Series 2025 - Session 1
UiPath Automation Developer Associate Training Series 2025 - Session 1
DianaGray10
?
Field Device Management Market Report 2030 - TechSci Research
Field Device Management Market Report 2030 - TechSci ResearchField Device Management Market Report 2030 - TechSci Research
Field Device Management Market Report 2030 - TechSci Research
Vipin Mishra
?
L01 Introduction to Nanoindentation - What is hardness
L01 Introduction to Nanoindentation - What is hardnessL01 Introduction to Nanoindentation - What is hardness
L01 Introduction to Nanoindentation - What is hardness
RostislavDaniel
?
Computational Photography: How Technology is Changing Way We Capture the World
Computational Photography: How Technology is Changing Way We Capture the WorldComputational Photography: How Technology is Changing Way We Capture the World
Computational Photography: How Technology is Changing Way We Capture the World
HusseinMalikMammadli
?
Cloud of everything Tech of the 21 century in Aviation
Cloud of everything Tech of the 21 century in AviationCloud of everything Tech of the 21 century in Aviation
Cloud of everything Tech of the 21 century in Aviation
Assem mousa
?

Radcliffe

  • 1. New Directions for Power Law Research Michael Mitzenmacher Harvard University 1
  • 2. Internet Mathematics Articles Related to This Talk The Future of Power Law Research Dynamic Models for File Sizes and Double Pareto Distributions A Brief History of Generative Models for Power Law and Lognormal Distributions 2
  • 3. Motivation: General ? Power laws (and/or scale-free networks) are now everywhere. ¨C See the popular texts Linked by Barabasi or Six Degrees by Watts. ¨C In computer science: file sizes, download times, Internet topology, Web graph, etc. ¨C Other sciences: Economics, physics, ecology, linguistics, etc. ? What has been and what should be the research agenda? 3
  • 4. My (Biased) View ? There are 5 stages of power law network research. 1) Observe: Gather data to demonstrate power law behavior in a system. 2) Interpret: Explain the importance of this observation in the system context. 3) Model: Propose an underlying model for the observed behavior of the system. 4) Validate: Find data to validate (and if necessary specialize or modify) the model. 5) Control: Design ways to control and modify the underlying behavior of the system based on the model. 4
  • 5. My (Biased) View ? In networks, we have spent a lot of time observing and interpreting power laws. ? We are currently in the modeling stage. ¨C Many, many possible models. ¨C I¡¯ll talk about some of my favorites later on. ? We need to now put much more focus on validation and control. ¨C And these are specific areas where computer science has much to contribute! 5
  • 6. Models ? After observation, the natural step is to explain/model the behavior. ? Outcome: lots of modeling papers. ¨C And many models rediscovered. ? Lots of history¡­ 6
  • 7. History ? In 1990¡¯s, the abundance of observed power laws in networks surprised the community. ¨C Perhaps they shouldn¡¯t have¡­ power laws appear frequently throughout the sciences. ? Pareto : income distribution, 1897 ? Zipf-Auerbach: city sizes, 1913/1940¡¯s ? Zipf-Estouf: word frequency, 1916/1940¡¯s ? Lotka: bibliometrics, 1926 ? Yule: species and genera, 1924. ? Mandelbrot: economics/information theory, 1950¡¯s+ ? Observation/interpretation were/are key to initial understanding. ? My claim: but now the mere existence of power laws should not be surprising, or necessarily even noteworthy. ? My (biased) opinion: The bar should now be very high for observation/interpretation. 7
  • 8. Power Law Distribution ? A power law distribution satisfies Pr[ X ¡Ý x] ~ cx ?¦Á ? Pareto distribution Pr[ X ¡Ý x] = k ( ) x ?¦Á ¨C Log-complementary cumulative distribution function (ccdf) is exactly linear. ln Pr[ X ¡Ý x] = ?¦Á ln x + ¦Á ln k ? Properties ¨C Infinite mean/variance possible 8
  • 9. Lognormal Distribution ? X is lognormally distributed if Y = ln X is normally distributed. ? Density function: f ( x) = 1 e?(ln x ? ? ) / 2¦Ò 2 2 ? Properties: 2¦Ð ¦Òx ¨C Finite mean/variance. ¨C Skewed: mean > median > mode ¨C Multiplicative: X1 lognormal, X2 lognormal implies X1X2 lognormal. 9
  • 10. Similarity ? Easily seen by looking at log-densities. ? Pareto has linear log-density. ln f ( x) = ?(¦Á ? 1) ln x + ¦Á ln k + ln ¦Á ? For large ¦Ò, lognormal has nearly linear log- density. ( ln x ? ? ) 2 ln f ( x) = ? ln x ? ln 2¦Ð ¦Ò ? 2¦Ò 2 ? Similarly, both have near linear log-ccdfs. ¨C Log-ccdfs usually used for empirical, visual tests of power law behavior. ? Question: how to differentiate them empirically? 10
  • 11. Lognormal vs. Power Law ? Question: Is this distribution lognormal or a power law? ¨C Reasonable follow-up: Does it matter? ? Primarily in economics ¨C Income distribution. ¨C Stock prices. (Black-Scholes model.) ? But also papers in ecology, biology, astronomy, etc. 11
  • 12. Preferential Attachment ? Consider dynamic Web graph. ¨C Pages join one at a time. ¨C Each page has one outlink. ? Let Xj(t) be the number of pages of degree j at time t. ? New page links: ¨C With probability ¦Á, link to a random page. ¨C With probability (1- ¦Á), a link to a page chosen proportionally to indegree. (Copy a link.) 12
  • 13. Preferential Attachment History ? This model (without the graphs) was derived in the 1950¡¯s by Herbert Simon. ¨C ¡­ who won a Nobel Prize in economics for entirely different work. ¨C His analysis was not for Web graphs, but for other preferential attachment problems. 13
  • 14. Optimization Model: Power Law ? Mandelbrot experiment: design a language over a d- ary alphabet to optimize information per character. ¨C Probability of jth most frequently used word is pj. ¨C Length of jth most frequently used word is cj. ? Average information per word: H = ?¡Æ j p j log 2 p j ? Average characters per word: C = ¡Æ j p jc j ? Optimization leads to power law. 14
  • 15. Monkeys Typing Randomly ? Miller (psychologist, 1957) suggests following: monkeys type randomly at a keyboard. ¨C Hit each of n characters with probability p. ¨C Hit space bar with probability 1 - np > 0. ¨C A word is sequence of characters separated by a space. ? Resulting distribution of word frequencies follows a power law. ? Conclusion: Mandelbrot¡¯s ¡°optimization¡± not required for languages to have power law 15
  • 16. Generative Models: Lognormal ? Start with an organism of size X0. ? At each time step, size changes by a random multiplicative factor. X t = Ft ?1 X t ?1 ? If Ft is taken from a lognormal distribution, each Xt is lognormal. ? If Ft are independent, identically distributed then (by CLT) Xt converges to lognormal distribution. 16
  • 17. BUT! ? If there exists a lower bound: X t = max(¦Å , Ft ?1 X t ?1 ) then Xt converges to a power law distribution. (Champernowne, 1953) ? Lognormal model easily pushed to a power law model. 17
  • 18. Double Pareto Distributions ? Consider continuous version of lognormal generative model. ¨C At time t, log Xt is normal with mean ?t and variance ¦Ò2 t ? Suppose observation time is distributed exponentially. ¨C E.g., When Web size doubles every year. ? Resulting distribution is Double Pareto. ¨C Between lognormal and Pareto. ¨C Linear tail on a log-log chart, but a lognormal body. 18
  • 19. Lognormal vs. Double Pareto 19
  • 20. And So Many More¡­ ? New variations coming up all of the time. ? Question : What makes a new power law model sufficiently interesting to merit attention and/or publication? ¨C Strong connection to an observed process. ? Many models claim this, but few demonstrate it convincingly. ¨C Theory perspective: new mathematical insight or sophistication. ? My (biased) opinion: the bar should start being raised on model papers. 20
  • 21. Validation: The Current Stage ? We now have so many models. ? It may be important to know the right model, to extrapolate and control future behavior. ? Given a proposed underlying model, we need tools to help us validate it. ? We appear to be entering the validation stage of research¡­. BUT the first steps have focused on invalidation rather than validation. 21
  • 22. Examples : Invalidation ? Lakhina, Byers, Crovella, Xie ¨C Show that observed power-law of Internet topology might be because of biases in traceroute sampling. ? Chen, Chang, Govindan, Jamin, Shenker, Willinger ¨C Show that Internet topology has characteristics that do not match preferential-attachment graphs. ¨C Suggest an alternative mechanism. ? But does this alternative match all characteristics, or are we still missing some? 22
  • 23. My (Biased) View ? Invalidation is an important part of the process! BUT it is inherently different than validating a model. ? Validating seems much harder. ? Indeed, it is arguable what constitutes a validation. ? Question: what should it mean to say ¡°This model is consistent with observed data.¡± 23
  • 24. Time-Series/Trace Analysis ? Many models posit some sort of actions. ¨C New pages linking to pages in the Web. ¨C New routers joining the network. ¨C New files appearing in a file system. ? A validation approach: gather traces and see if the traces suitably match the model. ¨C Trace gathering can be a challenging systems problem. ¨C Check model match requires using appropriate statistical techniques and tests. ¨C May lead to new, improved, better justified models. 24
  • 25. Sampling and Trace Analysis ? Often, cannot record all actions. ¨C Internet is too big! ? Sampling ¨C Global: snapshots of entire system at various times. ¨C Local: record actions of sample agents in a system. ? Examples: ¨C Snapshots of file systems: full systems vs. actions of individual users. ¨C Router topology: Internet maps vs. changes at subset of routers. ? Question: how much/what kind of sampling is sufficient to validate a model appropriately? ¨C Does this differ among models? 25
  • 26. To Control ? In many systems, intervention can impact the outcome. ¨C Maybe not for earthquakes, but for computer networks! ¨C Typical setting: individual agents acting in their own best interest, giving a global power law. Agents can be given incentives to change behavior. ? General problem: given a good model, determine how to change system behavior to optimize a global performance function. ¨C Distributed algorithmic mechanism design. ¨C Mix of economics/game theory and computer science. 26
  • 27. Possible Control Approaches ? Adding constraints: local or global ¨C Example: total space in a file system. ¨C Example: preferential attachment but links limited by an underlying metric. ? Add incentives or costs ¨C Example: charges for exceeding soft disk quotas. ¨C Example: payments for certain AS level connections. ? Limiting information ¨C Impact decisions by not letting everyone have true view of the system. 27
  • 28. Conclusion : My (Biased) View ? There are 5 stages of power law research. 1) Observe: Gather data to demonstrate power law behavior in a system. 2) Interpret: Explain the import of this observation in the system context. 3) Model: Propose an underlying model for the observed behavior of the system. 4) Validate: Find data to validate (and if necessary specialize or modify) the model. 5) Control: Design ways to control and modify the underlying behavior of the system based on the model. ? We need to focus on validation and control. ¨C Lots of open research problems. 28
  • 29. A Chance for Collaboration ? The observe/interpret stages of research are dominated by systems; modeling dominated by theory. ¨C And need new insights, from statistics, control theory, economics!!! ? Validation and control require a strong theoretical foundation. ¨C Need universal ideas and methods that span different types of systems. ¨C Need understanding of underlying mathematical models. ? But also a large systems buy-in. ¨C Getting/analyzing/understanding data. ¨C Find avenues for real impact. ? Good area for future systems/theory/others collaboration and interaction. 29