1
Charlottesville Open Data
Challenge
Team DSB
Matt Miller, Nikhil Shetty
2
OBSERVATIONS IN NUMBER OF CLIENTS DATA
• High variance from April to June indicates special events (holiday, festival, event in downtown mall),
beautiful weather drawing visitors to the downtown mall, and/or surprise inclement weather forcing visitors
indoors and onto Wi-Fi
• No observable increasing or decreasing trend in the overall time series; the slope of the plotted trendline is not
statistically significant
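The trendline check above can be sketched roughly as follows. This is a minimal illustration, not the team's actual method: the `daily_clients` values are made up, the helper `slope_t_statistic` is a hypothetical name, and comparing |t| against 2 is only a rule of thumb for significance at about the 5% level.

```python
import math

def slope_t_statistic(y):
    """Fit y against its index 0..n-1 by least squares.

    Returns the slope and the slope's t-statistic (slope / standard error).
    """
    n = len(y)
    x = list(range(n))
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err

# Hypothetical daily client counts: a flat level with a repeating wiggle.
daily_clients = [100 + 15 * math.sin(i) for i in range(60)]
slope, t_stat = slope_t_statistic(daily_clients)
print(f"slope={slope:.4f}, t={t_stat:.2f}")
```

A |t| well below 2, as in this made-up series, is what "slope not statistically significant" looks like in practice.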
Monticello Wine Trail Festival
Tom Tom Founders Festival
Pride Festival
3
NUMBER OF CLIENTS & WEATHER DATA
• Monthly trend in number of clients reveals a
correlation with weather data. Number of
clients rises and falls with temperature
  • April-August: High
  • Sept-Oct: Medium
  • Nov-Mar: Low
• Precipitation, observed at a daily level, does
not seem to have a consistent effect on the
number of clients. More granular, hourly data
may be more predictive
• The number of clients is highest in the
months of April to August, a time when most
UVa students are out of town. This suggests that UVa
students are not a significant percentage of
Wi-Fi clients at the downtown mall
4
STRONG WEEKLY SEASONALITY OBSERVED IN
THE CLIENTS DATA
• Number of clients exhibits strong weekly seasonality: it increases steadily through the week starting on
Sunday, peaks on Friday, and settles down at the end of the week
• Fridays are the most popular days on the downtown mall, particularly from April to September, during
Fridays After Five
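One simple way to surface a weekly pattern like this is to average the daily counts by day of week. The sketch below assumes a mapping from dates to client counts; the numbers and the name `mean_by_weekday` are hypothetical and chosen only to illustrate the grouping.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def mean_by_weekday(daily_counts):
    """Average a {date: count} mapping by weekday name."""
    buckets = defaultdict(list)
    for day, count in daily_counts.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        buckets[WEEKDAY_NAMES[day.weekday()]].append(count)
    return {name: sum(values) / len(values) for name, values in buckets.items()}

# Hypothetical counts for four weeks starting on Sunday, April 2, 2017.
start = date(2017, 4, 2)
pattern = [50, 60, 70, 80, 90, 120, 100]  # Sun, Mon, ..., Sat
daily_counts = {start + timedelta(days=i): pattern[i % 7] for i in range(28)}

weekday_means = mean_by_weekday(daily_counts)
peak_day = max(weekday_means, key=weekday_means.get)
print(peak_day, weekday_means[peak_day])
```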
5
SESSIONS DATA CLOSELY FOLLOWS CLIENTS
DATA
• Number of sessions is highly correlated with number of clients
• The histogram of sessions per client follows a near-normal distribution, suggesting that no additional factors affect the
number of sessions beyond those captured in the number of clients
Note: Session data is missing for January and half of February.
Therefore, the session counts for those two months are low.
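As a rough illustration of the two quantities discussed on this slide, the sketch below computes a Pearson correlation between clients and sessions and the per-day sessions-per-client ratio. The daily values are invented for the example, and `pearson_r` is a hypothetical helper, not code from the original analysis.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical daily totals for one week.
clients = [120, 150, 180, 210, 260, 240, 130]
sessions = [300, 390, 450, 540, 640, 610, 330]

r = pearson_r(clients, sessions)
# Ratio whose histogram the slide describes as near normal.
sessions_per_client = [s / c for s, c in zip(sessions, clients)]
print(f"r={r:.3f}")
print([round(ratio, 2) for ratio in sessions_per_client])
```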
6
OBSERVATIONS IN USAGE DATA
• Usage does not track the number of clients. Usage is highest in Oct-Nov while clients are highest in Apr-Aug,
indicating that the drivers of usage differ from the drivers of clients
• No global trend observed in usage data
• Downloads are roughly 85% of total data usage, with uploads making up the remaining 15%. This ratio shifts slightly
towards uploads on Friday, Saturday, and Sunday
7
NO WEEKLY SEASONALITY IN USAGE
• The number of clients is highest on Fridays and Saturdays, but data usage does not peak on those days. Thus,
weekend visitors drive up the number of clients but are light consumers of Wi-Fi data
• Therefore, clients can be broken down into two segments:
  • Segment 1: Weekend visitors, large in number but light users of data
  • Segment 2: Likely local residents/businesses, small in number but heavy users of data
8
DAILY SEASONALITY IN USAGE DATA
Total usage follows a daily seasonality, peaking between 10am and 6pm EST (9am and 5pm with daylight saving time) each
day. Because these are off-peak hours for visitors, this reinforces the hypothesis that local residents and/or
businesses (Segment 2) are the biggest consumers of Wi-Fi data
Note: Times on the x-axis are in UTC
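Because the plotted timestamps are in UTC, reading off local peak hours requires an offset conversion. The sketch below shows one way to do it with fixed offsets; the UTC hours 15 and 23 are hypothetical examples rather than values read from the chart, and `utc_hour_to_local` is an illustrative helper name.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), "EST")  # Eastern Standard Time, UTC-5
EDT = timezone(timedelta(hours=-4), "EDT")  # Eastern Daylight Time, UTC-4

def utc_hour_to_local(hour_utc, tz):
    """Convert an hour of day in UTC to the hour of day in a fixed-offset zone."""
    # The calendar date is arbitrary; only the hour matters here.
    stamp = datetime(2017, 1, 1, hour_utc, tzinfo=timezone.utc)
    return stamp.astimezone(tz).hour

for hour_utc in (15, 23):  # hypothetical start and end of a usage peak, in UTC
    est_hour = utc_hour_to_local(hour_utc, EST)
    edt_hour = utc_hour_to_local(hour_utc, EDT)
    print(f"{hour_utc}:00 UTC = {est_hour}:00 EST = {edt_hour}:00 EDT")
```

A fixed UTC window appears one hour later on the local clock under daylight saving time than under standard time.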
9
PARKING TICKET DATA ACTS AS A PROXY
FOR DOWNTOWN MALL ACTIVITY
Heatmap of Parking Tickets Issued 2017
 Parking tickets are issued Mon-Fri
 Data set is publicly available through
City of Charlottesville Open Data Portal
[Figure: Parking Tickets by Hour of Day and Day of Week; series: Mon, Tue, Wed, Thu, Fri]
10
COMPARISON OF WEEKLY SEASONALITY IS
INCONCLUSIVE
On a daily level, parking tickets
track more closely with data
usage than with sessions or
clients, but the relationship
is still weak
Note: Weekends excluded because very few parking tickets are issued on weekends
[Figure: Average Parking Tickets Versus Wi-Fi Clients, Sessions, and Usage, by weekday (Mon-Fri); axes: Data Usage (MB) and Tickets, Clients, Sessions; series: Tickets, Clients, Sessions (x10^-1), Data Usage]
11
PARKING TICKETS SHOW A MEANINGFUL
CORRELATION TO DATA USAGE AT 4-HOUR
GRANULARITY
Note: Weekends excluded because very few parking tickets are issued on weekends
[Figure: 4-Hour Data Usage vs Parking Tickets; x-axis: Parking Tickets; y-axis: Data Usage (B); fitted line: y = 9982.5x + 533600, R² = 0.0297]
[Figure: Log-Log Transform, 4-Hour Data Usage vs Parking Tickets; x-axis: LN(Parking Tickets + 1); y-axis: LN(Data Usage); fitted line: y = 0.3945x + 11.469, R² = 0.0362]
• Parking tickets partially explain visitors to the downtown mall, and therefore data usage; with R² between 0.03 and 0.04, they account for only about 3-4% of the variance in 4-hour data usage
• If client and session data were available with 4-hour granularity, we could more rigorously test this claim
and tease out the relationship between tickets and data usage versus tickets and clients
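The log-log fit reported on this slide can be reproduced in outline as follows. This is a sketch under stated assumptions: the `tickets` and `usage_bytes` values are invented 4-hour windows, `ols_r_squared` is a hypothetical helper, and the `+ 1` inside the logarithm mirrors the LN(Parking Tickets + 1) axis so that windows with zero tickets stay defined.

```python
import math

def ols_r_squared(x, y):
    """Least-squares line y = slope * x + intercept, plus R squared."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # For a one-variable fit, R squared equals the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical 4-hour windows: tickets issued and bytes transferred.
tickets = [0, 3, 12, 40, 7, 0, 25, 60]
usage_bytes = [400_000, 900_000, 650_000, 2_100_000,
               300_000, 1_200_000, 800_000, 1_500_000]

log_tickets = [math.log(t + 1) for t in tickets]  # +1 keeps ln defined at zero
log_usage = [math.log(u) for u in usage_bytes]

slope, intercept, r_squared = ols_r_squared(log_tickets, log_usage)
print(f"ln(usage) = {slope:.3f} * ln(tickets + 1) + {intercept:.3f}, R^2 = {r_squared:.3f}")
```

R² is the share of variance in the response that the fitted line accounts for, so a value near 0.03 means the line leaves most of the variation unexplained.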
12
13
NO OBSERVED SEASONALITY IN CLIENTS
ACROSS DAYS OF MONTH
14
BREAKDOWN OF USAGE DATA


2018 Charlottesville Open Data Challenge - Team DSB
