2. Kan Nishida
co-founder/CEO
Exploratory
Summary
At the beginning of 2016, Kan launched Exploratory, Inc. to make
Data Science available to everyone.
Prior to Exploratory, Kan was a director of development at
Oracle, leading teams that built various
Data Science products in areas including Machine
Learning, BI, Data Visualization, Mobile Analytics, and Big Data.
While at Oracle, Kan also provided training and consulting
services to help organizations transform with data.
@KanAugust
Instructor
7. Questions
What you can do with Exploratory
Communication
Data Access
Data Wrangling
Visualization
Machine Learning / Statistics
Exploratory Data Analysis
10. User Activity Data
Each row represents a user access event for a fictional online service.
There are 6 columns: timestamp, user ID, event type, IP address,
OS, and OS version.
Download EDF
Data
12. Questions
1. What is the duration (date range) of this data?
2. What is DAU (Daily Active Users), and how has it
changed over time?
3. Which days of the week (e.g. Monday) and which hours
are most active?
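The three questions above map onto a few date/time operations. Below is a minimal R sketch using the lubridate package on a tiny hypothetical data frame; the column names `timestamp` and `user_id` and all values are assumptions for illustration, not the actual EDF data.

```r
library(lubridate)

# Hypothetical sample shaped like the user activity data
# (only the timestamp and user_id columns are used here)
events <- data.frame(
  timestamp = c("2016-03-01 09:15:00", "2016-03-01 17:40:00",
                "2016-03-02 09:05:00", "2016-03-02 21:30:00",
                "2016-03-05 14:00:00"),
  user_id   = c("u1", "u2", "u1", "u1", "u3"),
  stringsAsFactors = FALSE
)

# Parse the character timestamps into POSIXct (UTC by default)
events$timestamp <- ymd_hms(events$timestamp)
events$date <- as_date(events$timestamp)

# 1. Duration (date range) of the data
date_range <- range(events$date)

# 2. DAU: number of distinct users per day
dau <- tapply(events$user_id, events$date, function(ids) length(unique(ids)))

# 3. Activity by day of week and by hour of day
by_wday <- table(wday(events$timestamp, label = TRUE))
by_hour <- table(hour(events$timestamp))

print(date_range)
print(dau)
print(by_wday)
print(by_hour)
```

Because `wday(label = TRUE)` returns an ordered factor with all seven levels, `by_wday` lists every day of the week, including days with zero activity.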
18. Character vs. Date/Time
Date data is recognized as character.
The duration between dates is
ignored.
Values are sorted as
characters,
e.g. 10 (Oct.) comes after 1 (Jan.)
but before 2 (Feb.).
Data: Date-unicorn.csv
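The sorting problem can be reproduced in a few lines of base R. This sketch uses hypothetical month/day/year strings (not values from Date-unicorn.csv) and compares sorting them as text with sorting them after parsing with `as.Date()`.

```r
# Hypothetical month/day/year strings (not taken from Date-unicorn.csv)
chr <- c("3/1/2016", "10/1/2016", "2/1/2016")

# As character, values compare text-wise, so "10/..." sorts before "2/..." and "3/..."
sorted_chr <- sort(chr)
sorted_chr
# "10/1/2016" "2/1/2016" "3/1/2016"

# As Date, values sort chronologically: February, March, October
dates <- as.Date(chr, format = "%m/%d/%Y")
sorted_dates <- sort(dates)
sorted_dates
# "2016-02-01" "2016-03-01" "2016-10-01"
```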
19. Character vs. Date/Time
Various transformations on date data are available.
Data is sorted as dates.
Duration
honors the
actual
interval between dates.
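To illustrate "duration honors the date interval", here is a minimal base R sketch with hypothetical dates that span the end of February in a leap year. Subtracting two character strings would simply fail with an error in R, while subtracting `Date` values yields a `difftime` that counts real calendar days.

```r
# Hypothetical first and last dates that span the end of February 2016
first_date <- as.Date("2016-02-20")
last_date  <- as.Date("2016-03-05")

# Subtracting Date values counts actual calendar days,
# including February 29 in the leap year 2016
gap <- last_date - first_date
gap  # Time difference of 14 days

# The same interval expressed in weeks
gap_weeks <- difftime(last_date, first_date, units = "weeks")
gap_weeks  # Time difference of 2 weeks
```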
22. By converting it to the Date & Time data type, you
can do a lot of cool things.
23. 1. Convert Character to Date / Time
2. Extract Date / Time Attributes
3. Filter with Date / Time
4. Duration
5. Round Date / Time
6. Timezone
Common Tasks
24. Common Tasks: 1. Convert Character to Date / Time
27. The only codes you need to know
Year (y)
Month (m)
Day (d)
Hour (h)
Minute (m)
Second (s)
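In R's lubridate package, parsing functions are named after the order in which these components appear in the input text. A short sketch with hypothetical date strings shows that the same date written three different ways parses to the same value once the matching function is picked.

```r
library(lubridate)

# Each parser name lists the component order of the input:
# y = year, m = month, d = day, h = hour, m = minute, s = second
d1 <- ymd("2016-01-15")               # year-month-day
d2 <- mdy("01/15/2016")               # month/day/year
d3 <- dmy("15/01/2016")               # day/month/year
t1 <- ymd_hms("2016-01-15 14:30:00")  # date plus hour:minute:second, parsed as UTC by default

d1 == d2  # TRUE
d2 == d3  # TRUE
```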
33. Common Tasks: 2. Extract Date / Time Attributes
46. Ordinal - Ordered Factor
Month and Day of Week should
be sorted in their natural
order.
R's factor data type
supports order information.
Functions like wday() and month()
(with label = TRUE) return ordered factors, so they take care of it.
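A quick way to see the ordered factors in action is to compare sorting the labels with sorting plain text. The dates below are hypothetical; the label text depends on the R session's locale, so the alphabetical example assumes an English locale.

```r
library(lubridate)

# Hypothetical dates in August, February, and January 2016
x <- ymd(c("2016-08-01", "2016-02-01", "2016-01-01"))

# With label = TRUE, month() and wday() return ordered factors whose levels
# follow calendar order (Jan < Feb < ... < Dec; Sun < Mon < ... < Sat by default)
m <- month(x, label = TRUE)
w <- wday(x, label = TRUE)

is.ordered(m)          # TRUE
sort(m)                # Jan, Feb, Aug: calendar order
sort(as.character(m))  # "Aug" "Feb" "Jan": alphabetical order of plain text (English locale)
```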
47. Common Tasks: 3. Filter with Date / Time
56. Common Tasks: 4. Duration
57. Duration
[Figure: three timelines from First Date to Last Date, spanning 3 weeks, 4 weeks, and 2 weeks]
64. 2. Convert the lifetime to numeric data type (in days)
From Column Header Menu:
1. Select Change Data Type
2. Select Convert to Number
3. Select Days
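For reference, a similar conversion can be written directly in base R. This is a sketch under assumptions: the `users` table, its column names, and its dates are hypothetical (chosen to give 3-, 4-, and 2-week lifetimes), and it is not necessarily the code Exploratory generates for the menu steps above.

```r
# Hypothetical first and last access dates for three users
users <- data.frame(
  user_id    = c("u1", "u2", "u3"),
  first_date = as.Date(c("2016-01-04", "2016-01-04", "2016-01-11")),
  last_date  = as.Date(c("2016-01-25", "2016-02-01", "2016-01-25"))
)

# Lifetime as a difftime (a duration with units attached)
users$lifetime <- difftime(users$last_date, users$first_date, units = "days")

# Convert the duration to a plain number of days
users$lifetime_days <- as.numeric(users$lifetime, units = "days")
users$lifetime_days  # 21 28 14
```

Keeping the value as a plain number makes it easy to aggregate, for example with `mean(users$lifetime_days)`.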
67. Common Tasks: 5. Round Date / Time
81. Common Tasks: 6. Timezone
82. We have temperature data for London and Tokyo
Each row represents a temperature reading at a certain date/time in 2016.
There are 17,498 temperature records for London and 19,489 temperature
records for Tokyo.
Each temperature record has date/time, longitude, latitude, temperature,
etc.
Filenames: Date-London-temp.csv and Date-Tokyo-temp.csv
Timezone - Data
88. For London, the average temperature
peaks at 2:00pm.
That sounds reasonable.
For Tokyo, the average temperature
peaks at 5:00am.
???
When you compare hourly temperature data
between London and Tokyo
Data: Date-London-temp.csv, Date-Tokyo-temp.csv
89. From the hourly temperature data of Tokyo, I want to know the hottest
time of the day, but the time shown in the date/time data
differs from the actual local time in Tokyo.
We would like to compare average hourly temperatures of two cities
in different time zones.
Problem
90. 2PM JST (Japan Standard Time)
2PM GMT (Greenwich Mean Time)
Timezone
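"2PM" alone does not identify a moment in time; the time zone does. This base R sketch, using an arbitrary hypothetical date, builds 2PM in Tokyo and 2PM GMT and measures the gap between them. Because JST is UTC+9 and GMT is UTC+0, the two instants are 9 hours apart.

```r
# The same wall-clock time in two time zones is two different instants
tokyo_2pm  <- as.POSIXct("2016-07-01 14:00:00", tz = "Asia/Tokyo")  # 2PM JST (UTC+9)
london_2pm <- as.POSIXct("2016-07-01 14:00:00", tz = "GMT")         # 2PM GMT (UTC+0)

diff_hours <- difftime(london_2pm, tokyo_2pm, units = "hours")
diff_hours  # Time difference of 9 hours
```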
92. UTC (Coordinated Universal Time)
It is the reference point from which all other time zones in the world are defined.
POSIXct stores a date/time as the number of seconds since 1970-01-01 00:00:00 UTC.
UTC and GMT (Greenwich Mean Time) are almost identical. (That is
why the hourly temperature data for London is displayed correctly on
the previous chart.)
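Because POSIXct is a count of seconds since the UTC epoch, the time zone attribute only affects how an instant is displayed. This base R sketch, with a hypothetical timestamp, shows that a 5:00 UTC instant reads as 14:00 in Tokyo, which is one way a Tokyo peak can appear at 5:00am when the times are shown in UTC.

```r
# POSIXct stores seconds since 1970-01-01 00:00:00 UTC;
# the tzone attribute only changes how that instant is displayed
x <- as.POSIXct("2016-07-01 05:00:00", tz = "UTC")
y <- x
attr(y, "tzone") <- "Asia/Tokyo"

as.numeric(x) == as.numeric(y)  # TRUE: same underlying instant
format(x, "%H:%M %Z")           # "05:00 UTC"
format(y, "%H:%M %Z")           # "14:00 JST"
```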
96. with_tz
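# Convert to another timezone (the local machine's timezone by default)
with_tz(ymd_hms("2015-10-01 02:20:34"))
"2015-09-30 19:20:34 PDT"
ymd_hms() parses the string as UTC by default.
The default timezone for with_tz is the local
machine's timezone.
In this example, PDT (Pacific
Daylight Time).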
with_tz(ymd_hms("2015-10-01 02:20:34"))
"2015-09-30 19:20:34 PDT"
with_tz(ymd_hms("2015-10-01 02:20:34"), tzone = "Asia/Tokyo")
"2015-10-01 11:20:34 JST"
with_tz
By specifying the timezone,
you can convert a date/time to any
timezone.
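Applied to the Tokyo problem, with_tz lets you extract the hour in Tokyo local time before averaging by hour. This is a minimal sketch assuming the timestamps were parsed as UTC; the data frame, its column names, and the temperature values are hypothetical.

```r
library(lubridate)

# Hypothetical Tokyo readings whose timestamps were parsed as UTC
# (ymd_hms() uses tz = "UTC" by default)
tokyo <- data.frame(
  datetime    = ymd_hms(c("2016-07-01 05:00:00", "2016-07-01 20:00:00")),
  temperature = c(31.5, 24.0)
)

# Hour as displayed in UTC vs. hour in Tokyo local time
tokyo$hour_utc   <- hour(tokyo$datetime)                                 # 5, 20
tokyo$hour_tokyo <- hour(with_tz(tokyo$datetime, tzone = "Asia/Tokyo"))  # 14, 5 (next day)
```

Grouping by `hour_tokyo` instead of `hour_utc` puts both cities' hourly averages on their own local clocks.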
103. January 15th (Tuesday), 2019
Data Wrangling: Working with Text Data
Planned
Analytics 101 - When to use which algorithms?
Data Wrangling: Introduction to Regular Expression
https://exploratory.io/online-seminar