際際滷

際際滷Share a Scribd company logo
STATISTICAL ANALYSIS OF TAXI
MOBILITY IN SAN FRANCISCO




                               Oleguer Sagarra Pascual
          Master in Computational Physics. UB-UPC 2011
                  PhD Supervisor: Dr. Albert Diaz Guilera
Motivation

 Human Mobility Research
   GPS data applications
 Complex Network Science
   Directed Weighted network metric
   Big dataset visualisation
Structure


 The data
 Statistical review
 Building the net
 Net structure
 Open questions and some answers
The Data
  GPS high frequency* mobility traces from CRAWDAD
  UNIX time, GPS latitude & longitude, occupancy
  537 Taxis**
  106 Trips




* <ti+1-ti> 90 s

** Considered independent
Statistical Analysis: Overview

Constant regime
ending with fat tails
Initial Noise
Similar shapes for r
Bump on the tails



   Typical t, r
Ef鍖ciency & Location
Do taxis perform different?
  Ef鍖ciency parameters
Do taxis move different?
  Centre of Masses
  Radius of Gyration
Spread

 Radius of Gyration: Mean
 spread over taxi reference
 frame.
 Taxis wander when empty,
 not when full.
 Taxis mainly operate
 inside the city: Short
 Trips*.

* But the tails are fat... what about long trips?
Long Trips: Ranges & Correlation

  Overlap in r not present in t

  Length correlation for consecutive (full-empty) trip pairs in two groups:

     Short (r<5 km)

     Long (r>20 km)




The similude between tails is explained by legal issues

Movement habits of customers determine the tails of both statistics
Building the net


  Discretized grid of San Fracisco area (100 x 100 m)*
  Nodes: Locations (17000)
  Edges: Trips between locations (weighted and directed)
  Excluded self-loops (5000) and isolated nodes (100)


* Using the cartesian coordinates UTM system
Net structure
  Truncated power law
  k,s, distribution*


  Full net less dense, faster
  decay in trips**
*Locations in San Francisco are
limited, amount of trips is not.
**Behavioural differences
Weighted net: Correlations
 Strength and degree provide similar information
 The more connected a node is, the more usual it is
 visited, it becomes more important and better
 connected with important nodes.
Assortativity: r values
 We want to study behavioral patterns: Linking tendency

 Pearson r values on strengths (quantitative measurement)

   Directed net : 4 possible pairs

   Full: All disassortative, Empty: Mixed behavior
Top connected nodes greatly in鍖uence the statistics

Net is robust

  Assortative tendency for middle ranged nodes

  Dissassortative tendency for small/top nodes
Assortativity:neighbor degree
    We compute the mean (weighted) neighbor degree: Qualitative
    measure for linking tendency of equivalent* nodes P(k|k).
        If the network is uncorrelated, <knn>=ctt.




*Taking a mean 鍖eld approach, i.e. nodes are equivalent if they are equally connected/preferred (degree/strength).
Net representations



 Empty Net (video)
Some answers
     So, how do customers move?

          Limited minimum range of usage of taxi (but non-negligible long trips)

          Heterogeneous destinations (mostly inside the city) and heterogeneous lengths

          Move between hot spots, move from/to hubs and scattered locations

     So, how do drivers move?

          They are strongly in鍖uenced by customers (long trips)

          They do not seem to perform very differently*

          Searching strategy after each run: Nearest local hot spot (clustering) or global
          hub (assortativity)**

     Obvious? Maybe, but allows for comparison with other data sets and is a proof that
     the methodology is correct.

* Although no scaling relation was found...

** Reminiscent of L辿vy Flight...
Open questions and further research

     Mobility:

          GPS mobility traces can be studied through a consistent* complex metric

          Set the basis for a simple mechanistic (agent based) model to be simulated and
          optimitzed

          Compare with other datasets (other transportation means, other cities...)

     Complex Networks:

          Re鍖ne assortativity measures (r/neighbor degree)

          Further statistical characterization of the network (correlations, clustering, disparity...)

          Introduction of time measures: Net growth, dynamical weighted net, coupling
          transport/social networks...



* Fat tailed degree distributions, similar features shared with other studied systems.
Thanks for your attention...
STATISTICAL ANALYSIS OF TAXI
MOBILITY IN SAN FRANCISCO:
Complementary Material




                               Oleguer Sagarra Pascual
          Master in Computational Physics. UB-UPC 2011
Trimming and 鍖tting the data
Distributions for r,t
Building the net: Isolated and Sel鍖oops
Complex Networks: De鍖nitions

 E : Number of edges, N: Number of nodes

 Density

 Degree k (in-out): Number of edges entering (leaving) a node.

 Degree (strength) assortativity: r, <kwnn>,<swnn>

 Betweenness centrality: Rel. number of (weighted) shortest path
 passing through a node.

 Strongly (Weakly) connected component: Complete subgraph
 containing the nodes connected among themselves via pairs of
 directed edges (undirected).
Betweenness/Strength and Assortativity

More Related Content

Taxis.key

  • 1. STATISTICAL ANALYSIS OF TAXI MOBILITY IN SAN FRANCISCO Oleguer Sagarra Pascual Master in Computational Physics. UB-UPC 2011 PhD Supervisor: Dr. Albert Diaz Guilera
  • 2. Motivation Human Mobility Research GPS data applications Complex Network Science Directed Weighted network metric Big dataset visualisation
  • 3. Structure The data Statistical review Building the net Net structure Open questions and some answers
  • 4. The Data GPS high frequency* mobility traces from CRAWDAD UNIX time, GPS latitude & longitude, occupancy 537 Taxis** 106 Trips * <ti+1-ti> 90 s ** Considered independent
  • 5. Statistical Analysis: Overview Constant regime ending with fat tails Initial Noise Similar shapes for r Bump on the tails Typical t, r
  • 6. Ef鍖ciency & Location Do taxis perform different? Ef鍖ciency parameters Do taxis move different? Centre of Masses Radius of Gyration
  • 7. Spread Radius of Gyration: Mean spread over taxi reference frame. Taxis wander when empty, not when full. Taxis mainly operate inside the city: Short Trips*. * But the tails are fat... what about long trips?
  • 8. Long Trips: Ranges & Correlation Overlap in r not present in t Length correlation for consecutive (full-empty) trip pairs in two groups: Short (r<5 km) Long (r>20 km) The similude between tails is explained by legal issues Movement habits of customers determine the tails of both statistics
  • 9. Building the net Discretized grid of San Fracisco area (100 x 100 m)* Nodes: Locations (17000) Edges: Trips between locations (weighted and directed) Excluded self-loops (5000) and isolated nodes (100) * Using the cartesian coordinates UTM system
  • 10. Net structure Truncated power law k,s, distribution* Full net less dense, faster decay in trips** *Locations in San Francisco are limited, amount of trips is not. **Behavioural differences
  • 11. Weighted net: Correlations Strength and degree provide similar information The more connected a node is, the more usual it is visited, it becomes more important and better connected with important nodes.
  • 12. Assortativity: r values We want to study behavioral patterns: Linking tendency Pearson r values on strengths (quantitative measurement) Directed net : 4 possible pairs Full: All disassortative, Empty: Mixed behavior
  • 13. Top connected nodes greatly in鍖uence the statistics Net is robust Assortative tendency for middle ranged nodes Dissassortative tendency for small/top nodes
  • 14. Assortativity:neighbor degree We compute the mean (weighted) neighbor degree: Qualitative measure for linking tendency of equivalent* nodes P(k|k). If the network is uncorrelated, <knn>=ctt. *Taking a mean 鍖eld approach, i.e. nodes are equivalent if they are equally connected/preferred (degree/strength).
  • 16. Some answers So, how do customers move? Limited minimum range of usage of taxi (but non-negligible long trips) Heterogeneous destinations (mostly inside the city) and heterogeneous lengths Move between hot spots, move from/to hubs and scattered locations So, how do drivers move? They are strongly in鍖uenced by customers (long trips) They do not seem to perform very differently* Searching strategy after each run: Nearest local hot spot (clustering) or global hub (assortativity)** Obvious? Maybe, but allows for comparison with other data sets and is a proof that the methodology is correct. * Although no scaling relation was found... ** Reminiscent of L辿vy Flight...
  • 17. Open questions and further research Mobility: GPS mobility traces can be studied through a consistent* complex metric Set the basis for a simple mechanistic (agent based) model to be simulated and optimitzed Compare with other datasets (other transportation means, other cities...) Complex Networks: Re鍖ne assortativity measures (r/neighbor degree) Further statistical characterization of the network (correlations, clustering, disparity...) Introduction of time measures: Net growth, dynamical weighted net, coupling transport/social networks... * Fat tailed degree distributions, similar features shared with other studied systems.
  • 18. Thanks for your attention...
  • 19. STATISTICAL ANALYSIS OF TAXI MOBILITY IN SAN FRANCISCO: Complementary Material Oleguer Sagarra Pascual Master in Computational Physics. UB-UPC 2011
  • 22. Building the net: Isolated and Sel鍖oops
  • 23. Complex Networks: De鍖nitions E : Number of edges, N: Number of nodes Density Degree k (in-out): Number of edges entering (leaving) a node. Degree (strength) assortativity: r, <kwnn>,<swnn> Betweenness centrality: Rel. number of (weighted) shortest path passing through a node. Strongly (Weakly) connected component: Complete subgraph containing the nodes connected among themselves via pairs of directed edges (undirected).