EXPLORATORY
Instructor
Kan Nishida
Co-founder/CEO, Exploratory
@KanAugust
Summary
At the beginning of 2016, Kan launched Exploratory, Inc. to make Data Science available for everyone.
Prior to Exploratory, Kan was a director of development at Oracle, leading development teams building various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, and Big Data.
While at Oracle, Kan also provided training and consulting services to help organizations transform with data.
Mission
Make Data Science available for everyone
Data Science is not just for Engineers and Statisticians.
Exploratory makes it possible for Everyone to do Data Science.
The Third Wave
Smart Waves - Machine Learning / AI

             First Wave (1976)   Second Wave (2000)   Third Wave (2016)
Algorithms   Proprietary         Open Source          Open Source
Experience   UI & Programming    Programming          UI & Automation
Tools        [not captured]      [not captured]       Exploratory
Theme        Monetization        Commoditization      Democratization
Users        Statisticians       Data Scientists      Business Users
Data Science Workflow
Questions -> Data Access -> Exploration (Data Wrangling, Data Visualization, Machine Learning / Statistics) -> Communication
What you can do with Exploratory
Questions -> Data Access -> Exploratory Data Analysis (Data Wrangling, Visualization, Machine Learning / Statistics) -> Communication
Working with Date & Time
Data
• User Activity Data
• Each row represents a user access for a fictional online service.
• There are 6 columns: timestamp, user id, event type, IP address, OS, and OS version.
• Download EDF
Data Wrangling: Working with Date / Time Data and Visualizing It
Questions
1. What is the duration (date range) of this data?
2. What is the DAU (Daily Active Users), and how has it changed over time?
3. Which days of the week (e.g. Monday) and hours are more active?
Character vs. Date/Time
Data Type
Data: Date-unicorn.csv
When date data is recognized as character:
• The interval between dates is ignored.
• Values are sorted as character, e.g. 10 (Oct.) comes right after 1 (Jan.).
Character vs. Date/Time
When the data is recognized as Date/Time:
• Various transformations on date data are available.
• Data is sorted as dates.
• Duration honors the date interval.
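The difference between the two data types can be sketched in R with lubridate. The sample values below are hypothetical and are not taken from Date-unicorn.csv; `method = "radix"` is used only to make the character sort independent of the machine's locale.

```r
library(lubridate)

# Hypothetical month/day/year strings (not from the seminar data)
raw <- c("10/1/2017", "1/15/2017", "2/1/2017")

# Sorted as character: compared one character at a time,
# so the October value lands before the February value
sorted_as_text <- sort(raw, method = "radix")

# Converted to Date first, the values sort chronologically
dates <- mdy(raw)
sorted_as_date <- sort(dates)

# Once the values are dates, the gaps between them are real intervals
gap_days <- as.numeric(diff(sorted_as_date), units = "days")

print(sorted_as_text)
print(sorted_as_date)
print(gap_days)
```

The character sort puts October between January and February, while the Date sort keeps calendar order and lets the gaps be measured in days.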
Date vs. POSIXct
Data Types for Date & Time
• Date: date only
• POSIXct: both date and time
Once a column is converted to a Date / Time data type, you can do a lot of cool things with it.
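As a small illustration of the two data types, the sketch below creates one value of each with lubridate and inspects how R stores them. The variable names are made up for this example.

```r
library(lubridate)

date_only <- ymd("2017-01-01")               # Date: calendar day only
date_time <- ymd_hms("2017-01-01 08:10:10")  # POSIXct: date and time (UTC by default)

class(date_only)   # "Date"
class(date_time)   # "POSIXct" "POSIXt"

# Internally, Date counts days and POSIXct counts seconds since 1970-01-01 (UTC)
days_since_epoch <- as.numeric(date_only)
secs_since_epoch <- as.numeric(date_time)

print(days_since_epoch)
print(secs_since_epoch)
```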
Common Tasks
1. Convert Character to Date / Time
2. Extract Date / Time Attributes
3. Filter with Date / Time
4. Duration
5. Round Date / Time
6. Timezone
Only codes you need to know
• Year
• Month
• Day
• Hour
• Minute
• Second
Date Format
2017-01-01 (y m d)
2017-01-01 08:10:10 (y m d h m s)
ymd
ymd("2015-10-01")
ymd("2015/10/01")
ymd("Created on 2015 October 1st")
ymd_hms
ymd_hms("2015-01-10T06:10:15")
ymd_hms("2015/01/10 06:10:15 UTC")
ymd_hms("Created on 15-01-10 at 06:10:15 AM")
mdy
mdy("01-10-2015")
mdy("01/10/2015")
mdy("Created on 1 10 2015")
mdy_hms
mdy_hms("10-01-2015T06:10:15")
mdy_hms("10/01/2015T06:10:15")
mdy_hms("Created on 10 1 2015 at 06:10:15 AM")
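A short sketch of how these parsers behave, assuming lubridate is installed. It checks that different separators and component orders produce the same Date, and shows what happens when no date can be found. The unparseable string is a made-up example.

```r
library(lubridate)

# The same day written three ways parses to the same Date value
from_dashes  <- ymd("2015-10-01")
from_slashes <- ymd("2015/10/01")
from_mdy     <- mdy("10/01/2015")
all_same <- from_dashes == from_slashes && from_slashes == from_mdy

# Date-time parsers return POSIXct in UTC unless told otherwise
stamp <- ymd_hms("2015-01-10T06:10:15")
stamp_tz <- tz(stamp)

# Text without a recognizable date becomes NA (with a warning, silenced here)
not_a_date <- suppressWarnings(ymd("no date here"))

print(all_same)
print(stamp_tz)
print(not_a_date)
```

Picking the function whose name matches the order of the components in the text (ymd, mdy, and so on) is the only decision needed; the separators are detected automatically.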
Common Tasks: 2. Extract Date / Time Attributes
Date/Time
Date Attributes (2017-06-24)
• June
• 24th
• 2017
• 175th day in 2017
• Saturday
Time Attributes (2017-01-01 08:10:10)
• AM
• 8 hours
• 10 minutes
• 10 seconds
Extract Month
month(start_time, label = TRUE)
→ Jan, Feb, Mar, ...
From Column Header Menu
1. Select Extract
2. Select Month - Short Name (Jan)
Extract Day of Week
wday(start_time, label = TRUE)
→ Sun, Mon, Tue, ...
From Column Header Menu
1. Select Extract
2. Select Day of Week - Short Name (Mon)
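The date and time attributes listed earlier (June, the 24th, 2017, the 175th day, Saturday, and 8:10:10 AM) can each be pulled out with a lubridate accessor. This is a minimal sketch using the same two example values; numeric results are used so the output does not depend on the machine's language settings.

```r
library(lubridate)

d <- ymd("2017-06-24")
stamp <- ymd_hms("2017-01-01 08:10:10")

# Date attributes
date_parts <- c(
  year  = year(d),
  month = month(d),   # 6 = June
  day   = mday(d),    # day of month
  yday  = yday(d),    # day of year
  wday  = wday(d)     # 1 = Sunday ... 7 = Saturday with the default week start
)

# Time attributes
time_parts <- c(
  hour   = hour(stamp),
  minute = minute(stamp),
  second = second(stamp)
)
is_morning <- am(stamp)

print(date_parts)
print(time_parts)
print(is_morning)
```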
Ordinal - Ordered Factor
• Month and Day of Week should be sorted in their natural order.
• R's factor data type supports order information.
• Functions like wday and month take care of it.
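To see the ordered factor in action, the sketch below extracts month and day-of-week labels from a few hypothetical dates and sorts them. The label text depends on the machine's locale, so the checks rely on the underlying level positions instead.

```r
library(lubridate)

# Hypothetical dates, deliberately out of calendar order
dates <- ymd(c("2017-10-01", "2017-01-15", "2017-02-01"))

month_label <- month(dates, label = TRUE)  # ordered factor with 12 levels
day_label   <- wday(dates, label = TRUE)   # ordered factor with 7 levels

# Sorting follows calendar order (Jan, Feb, Oct), not alphabetical order
sorted_months <- sort(month_label)
month_positions <- as.integer(sorted_months)

print(is.ordered(month_label))
print(sorted_months)
print(month_positions)
```

Because the result is an ordered factor, charts and tables that group by it show months from January to December rather than alphabetically.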
Common Tasks: 3. Filter with Date / Time
Relative vs. Absolute
Relative Date
• Previous Year
• This Year
• Last <N> Years
• Year to Date
[Timeline figure: years 2016 to 2019 with Today in 2018, marking the ranges for Previous Year, This Year, Last 2 Years, and Year to Date]
Absolute Date
• equal to / not equal to
• is in / is not in
• earlier than
• later than
• between
[Timeline figure: years 2016 to 2019 with Today in 2018, marking the ranges for Year == 2017, Date > 1 year ago, Year > 2016, and Between 2017-06-01 and 2018-01-30]
Filter: Year is 2017
Filter: Later than 1 year ago
Filter: Later than 2016-12-01
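The three filters above could be written in R roughly as follows. This is a sketch on a made-up data frame (`activity`, `user_id`, and `date` are hypothetical names), and `reference_day` is a fixed stand-in for `today()` so the result is reproducible.

```r
library(lubridate)

# Hypothetical activity data
activity <- data.frame(
  user_id = c("a", "b", "c", "d"),
  date = ymd(c("2016-11-20", "2017-03-05", "2017-12-31", "2018-06-10")),
  stringsAsFactors = FALSE
)

# Fixed stand-in for today(); replace with today() for a live relative filter
reference_day <- ymd("2018-12-01")

# Filter: Year is 2017
in_2017 <- activity[year(activity$date) == 2017, ]

# Filter: Later than 1 year ago (relative to reference_day)
last_year <- activity[activity$date > reference_day - years(1), ]

# Filter: Later than 2016-12-01 (absolute)
after_dec_2016 <- activity[activity$date > ymd("2016-12-01"), ]

print(in_2017$user_id)
print(last_year$user_id)
print(after_dec_2016$user_id)
```

The relative filter moves with the reference day, while the absolute filters always return the same rows for the same data.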
Common Tasks: 4. Duration
Duration
[Figure: three example spans from First Date to Last Date, lasting 3 weeks, 4 weeks, and 2 weeks]
Calculate Duration
1. Calculate lifetime for each user
last_activity_date - first_activity_date
The duration is calculated and stored as a difftime data type (in seconds in this example).
2. Convert the lifetime to numeric data type (in days)
From Column Header Menu
1. Select Change Data Type
2. Select Convert to Number
3. Select Days
as.numeric(duration, units = "days")
You can set the units inside the as.numeric function.
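A minimal sketch of the two steps, using hypothetical first and last activity times for a single user. In plain R the subtraction picks a display unit automatically, so passing `units` to `as.numeric` is the reliable way to get a specific unit.

```r
library(lubridate)

# Hypothetical first and last activity times for one user
first_activity_date <- ymd_hms("2017-01-01 08:00:00")
last_activity_date  <- ymd_hms("2017-01-22 20:00:00")

# Step 1: subtracting two date-times gives a difftime
lifetime <- last_activity_date - first_activity_date

# Step 2: convert to a plain number in the unit you want
lifetime_days <- as.numeric(lifetime, units = "days")
lifetime_secs <- as.numeric(lifetime, units = "secs")

print(class(lifetime))
print(lifetime_days)
print(lifetime_secs)
```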
Common Tasks: 5. Round Date / Time
Round Date/Time
• Round
• Ceiling
• Floor
Round to Day
Round to Week
Now the timestamp column shows each week's start date.
Round to Month
Now the timestamp column shows each month's start date (i.e. the 1st).
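Rounding timestamps down to the day is one way to approach the Daily Active Users question. The sketch below uses a small made-up access log (`access`, `timestamp`, and `user_id` are hypothetical names) and `floor_date`, which keeps a late-evening access on its own calendar day, whereas `round_date` would move it to the next day.

```r
library(lubridate)

# Hypothetical access log: u1 appears twice on the same day
access <- data.frame(
  timestamp = ymd_hms(c("2017-06-24 08:10:10", "2017-06-24 21:45:00",
                        "2017-06-24 23:59:59", "2017-06-25 00:05:00")),
  user_id = c("u1", "u2", "u1", "u1"),
  stringsAsFactors = FALSE
)

# Floor each timestamp to the start of its day
access$day <- format(floor_date(access$timestamp, unit = "day"), "%Y-%m-%d")

# Count distinct users per day
dau <- sapply(split(access$user_id, access$day), function(ids) length(unique(ids)))

# The same idea at month level: every date maps to the 1st of its month
month_start <- floor_date(ymd("2017-06-24"), unit = "month")

print(dau)
print(month_start)
```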
Round vs. Ceiling vs. Floor
# round
round_date(start_time, unit = "week")
# ceiling
ceiling_date(start_time, unit = "week")
# floor
floor_date(start_time, unit = "week")
Round
round_date(ymd("2017-06-24"), unit = "week")
→ "2017-06-25"
The border is on Wednesday noon.
Ceiling
ceiling_date(ymd("2017-06-24"), unit = "week")
→ "2017-06-25"
The border is on Sunday midnight (0:00am).
Floor
floor_date(ymd("2017-06-24"), unit = "week")
→ "2017-06-18"
The border is on Sunday midnight (0:00am).
To round date to week:
From Column Header Menu
1. Select Round
2. Select Round Date
3. Select Week
round_date(created_at, "week")
Common Tasks: 6. Timezone
Timezone - Data
Extension Data - Weather
• We have temperature data for London and Tokyo.
• Each row represents a temperature for a certain date/time in 2016. There are 17,498 temperature records for London and 19,489 for Tokyo.
• Each temperature record has date/time, longitude, latitude, temperature, etc.
• Filenames: Date-London-temp.csv and Date-Tokyo-temp.csv
When you compare hourly temperature data between London and Tokyo
Data: Date-London-temp.csv, Date-Tokyo-temp.csv
For London, 2:00pm is the peak of the average temperature.
→ It sounds reasonable.
For Tokyo, 5:00am is the peak of the average temperature.
→ ???
Problem
• From the hourly temperature data for Tokyo, I want to know the hottest time of the day, but the time indicated by the date/time data differs from the actual local time in Tokyo.
• We would like to compare the average hourly temperatures of two cities in different time zones.
Timezone
• London: 2PM GMT (Greenwich Mean Time)
• Tokyo: 2PM JST (Japan Standard Time), time difference from London: 9 hours
Timezone
UTC (Coordinated Universal Time)
• It is the base point for all other time zones in the world.
• POSIXct is basically based on UTC.
• UTC and GMT (Greenwich Mean Time) are almost identical. (That is why the hourly temperature data for London is displayed correctly on the previous chart.)
• 2017-01-01 08:10:10 UTC
• 2017-01-01 08:10:10 -900
Various Time zones
• America/New_York
• America/Los_Angeles
• Asia/Tokyo
with_tz
# Append Timezone information
with_tz(ymd_hms("2015-10-01 02:20:34"))
→ "2015-09-30 19:20:34 PDT"
The default timezone for with_tz is the local machine's timezone. In this example, it is PDT (Pacific Daylight Time).
with_tz(ymd_hms("2015-10-01 02:20:34"), tzone = "Asia/Tokyo")
→ "2015-10-01 11:20:34 JST"
By specifying timezone information, you can convert date/time to any timezone.
From Column Header Menu
1. Select Create Calculation
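Applied to the Tokyo temperature puzzle, the conversion can be sketched like this. The `peak_utc` value is a hypothetical timestamp standing in for a reading at the 5:00am peak; the date itself is arbitrary. The sketch assumes the system's time zone database includes Asia/Tokyo, which is normally the case.

```r
library(lubridate)

# ymd_hms() reads the text as UTC by default
utc_time <- ymd_hms("2015-10-01 02:20:34")

# with_tz() keeps the same instant and changes only how it is displayed
tokyo_time <- with_tz(utc_time, tzone = "Asia/Tokyo")
tokyo_text <- format(tokyo_time, "%Y-%m-%d %H:%M:%S")
same_instant <- utc_time == tokyo_time

# Hypothetical reading at the 5:00am (UTC) peak from the Tokyo chart
peak_utc <- ymd_hms("2016-08-01 05:00:00")
peak_hour_utc   <- hour(peak_utc)
peak_hour_tokyo <- hour(with_tz(peak_utc, tzone = "Asia/Tokyo"))

print(tokyo_text)
print(same_instant)
print(c(utc = peak_hour_utc, tokyo = peak_hour_tokyo))
```

With the 9 hour offset applied, the 5:00am peak becomes 2:00pm local time in Tokyo, which lines up with the London result.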
Credits
lubridate
Do more with dates and times in R
https://lubridate.tidyverse.org
Garrett Grolemund, Hadley Wickham, Vitalie Spinu
Future Seminars
January 15th (Tuesday), 2019
• Data Wrangling: Working with Text Data
Planned
• Analytics 101 - When to use which algorithms?
• Data Wrangling: Introduction to Regular Expression
https://exploratory.io/online-seminar
Contact
Email
kan@exploratory.io
Data Science Training
https://exploratory.io/training
Twitter
@KanAugust
Online Seminar
https://exploratory.io/online-seminar
