This talk focuses on the differences between doing research and applying time series data mining to real business problems on real, rich data.
I will discuss why research and business need each other, and where they differ. Typical time series data mining tasks in the energy sector will be shown, with use cases in R.
Time Series Data Mining - from PhD to Startup
1. Time Series Data Mining -
from PhD to Startup
Peter Laurinec
October 27, 2018
9. Highlights
Time series data mining - from PhD to start-up:
Problems and solutions for working with a large number of long
time series (TS),
TS data mining methods,
PhD thesis - combining and developing TS data
mining methods,
TSrepr R package - TS representations,
Work after PhD - energy start-up,
Differences and my thoughts,
What we do there...
11. Time Series Data in Energetics
Smart metering
Measuring electricity consumption or production
(photovoltaic panels) for every consumer or
producer (together: prosumer) every 5, 15, or 30
minutes,
This creates a large amount of time series data,
3 years of 15-minute data from one consumer: 96*365*3 =
105120 values... from 10 thousand consumers... > 1 billion rows
of multiple columns,
Smart grid - a set of consumers and producers,
Characteristics:
High dimensionality,
Multiple seasonalities (daily, weekly, yearly),
Large number of stochastic factors such as weather,
holidays, blackouts, market changes, etc.
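The row count on this slide can be checked with a few lines of R. This is only a back-of-the-envelope sketch: the variable names are illustrative, and it assumes 15-minute readings (96 per day) and ignores leap days.

```r
# Readings per day at a 15-minute sampling interval
readings_per_day <- 24 * 60 / 15                 # 96

# Three years of data for a single consumer (leap days ignored)
rows_per_consumer <- readings_per_day * 365 * 3  # 105120

# Ten thousand consumers stored in one long table
total_rows <- rows_per_consumer * 10000          # 1051200000, just over 1 billion

print(rows_per_consumer)
print(format(total_rows, big.mark = ",", scientific = FALSE))
```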
18. Typical Use Cases
Forecasting electricity consumption or production - market
planning, blackout prevention etc.,
Extracting typical profiles of consumption - changing tariffs,
creating new ones etc.,
Optimizing electricity consumption of a consumer,
Optimizing the whole smart grid,
Monitoring the smart grid,
Anomaly detection.
23. TS Data Mining Methods
Methods for working with TS:
TS representations,
TS distance measures,
Tasks:
TS classification,
TS clustering,
TS forecasting,
TS anomaly detection,
TS indexing.
25. PhD Thesis Goals
The goal of the thesis was to investigate, in a broader
context, the use of time series data mining (analysis)
methods to improve the predictive performance
of machine learning methods and their combinations.
In more detail, the goal was to investigate the use of
various time series representations for seasonal time
series, together with clustering and forecasting methods, to improve
the accuracy of electricity consumption forecasting.
30. I. Time Series Representations
What can we do to solve problems with high-dimensional
TS?
Use time series representations!
They are an excellent way to:
Reduce memory load.
Accelerate subsequent machine learning algorithms.
Implicitly remove noise from the data.
Emphasize the essential characteristics of the data.
Help find patterns (motifs) in the data.
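To make the dimensionality-reduction point concrete, here is a minimal base-R sketch of one simple representation, Piecewise Aggregate Approximation (PAA): the series is cut into consecutive segments of length q and each segment is replaced by one aggregate value. The function name paa_sketch and the synthetic series are assumptions made for illustration; this is not the TSrepr implementation.

```r
# Hypothetical helper: replace every q consecutive values by one aggregate
paa_sketch <- function(x, q, func = mean) {
  segment_id <- ceiling(seq_along(x) / q)   # 1,1,...,1,2,2,...
  as.numeric(tapply(x, segment_id, func))
}

set.seed(1)
# Synthetic noisy series (assumption, stands in for a consumption TS)
x <- sin(seq(0, 4 * pi, length.out = 1000)) + rnorm(1000, sd = 0.2)

x_repr <- paa_sketch(x, q = 10)

length(x)       # 1000 values in the original series
length(x_repr)  # 100 values in the representation
```

The representation is ten times shorter, and averaging within each segment also smooths out part of the noise, which is the "implicit noise removal" mentioned on the slide.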
36. I. Time Series Representations [1]
I used TS representations for:
Dimensionality reduction (curse of dimensionality),
Emphasising the main characteristics of data,
More accurate clustering of consumers' TS to create more
predictable (forecastable) groups of aggregated TS of
electricity consumption.
[1] Laurinec P., Lucká M., Lecture Notes in Engineering and Computer Science:
Proceedings of The World Congress on Engineering and Computer Science 2016.
39. TSrepr
TSrepr - CRAN [2], GitHub [3]
R package for computing time series representations
A large number of methods are implemented
Several useful support functions are also included
Easy to extend and to use
library(TSrepr)
data <- rnorm(1000)
# PAA: median of every 10 consecutive values
repr_paa(data, func = median, q = 10)
[2] https://CRAN.R-project.org/package=TSrepr
[3] https://github.com/PetoLau/TSrepr/
40. Many types of time series representation methods are implemented; so far these:
PAA - Piecewise Aggregate Approximation ( repr_paa )
DWT - Discrete Wavelet Transform ( repr_dwt )
DFT - Discrete Fourier Transform ( repr_dft )
DCT - Discrete Cosine Transform ( repr_dct )
PIP - Perceptually Important Points ( repr_pip )
SAX - Symbolic Aggregate Approximation ( repr_sax )
PLA - Piecewise Linear Approximation ( repr_pla )
Mean seasonal profile ( repr_seas_profile )
Model-based seasonal representations based on linear model ( repr_lm )
FeaClip - Feature extraction from clipping representation ( repr_feaclip )
Additional useful functions are also implemented:
Windowing ( repr_windowing )
Matrix of representations ( repr_matrix )
Normalisation functions - z-score ( norm_z ), min-max ( norm_min_max )
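41. Usage of TSrepr
library(TSrepr)
# mat: numeric matrix with one time series per row (placeholder here)
mat <- "some matrix with lot of time series"
# Model-based seasonal representation (robust linear model,
# seasonal periods of 48 and 48*7 observations), z-score normalised
mat_reprs <- repr_matrix(mat, func = repr_lm,
                         args = list(method = "rlm", freq = c(48, 48*7)),
                         normalise = TRUE, func_norm = norm_z)
# Alternative: FeaClip representation computed over windows of 48 observations
mat_reprs <- repr_matrix(mat, func = repr_feaclip,
                         windowing = TRUE, win_size = 48)
# Cluster the representations into 20 groups
clustering <- kmeans(mat_reprs, 20)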
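Since mat on this slide is only a placeholder, the following self-contained sketch runs the same kind of pipeline (normalise each series, compute a representation per row, cluster the representations) on synthetic data using base R only. Everything in it is an assumption made for illustration: the synthetic consumers, the helper names z_norm and seas_profile_sketch, and the choice of a mean daily profile as the representation. It is a stand-in for the idea, not TSrepr's own code.

```r
set.seed(42)
freq <- 48        # half-hourly readings per day (assumption)
n_days <- 14
time_idx <- seq_len(freq * n_days)

# Two synthetic consumer types with different daily shapes, 15 series each
shape_a <- sin(2 * pi * time_idx / freq)
shape_b <- cos(2 * pi * time_idx / freq)
mat <- rbind(
  t(replicate(15, shape_a + rnorm(length(time_idx), sd = 0.3))),
  t(replicate(15, shape_b + rnorm(length(time_idx), sd = 0.3)))
)

# Hypothetical helpers: z-score normalisation and mean daily profile
z_norm <- function(x) (x - mean(x)) / sd(x)
seas_profile_sketch <- function(x, freq) {
  colMeans(matrix(x, ncol = freq, byrow = TRUE))   # one row per day
}

# One representation per row: 672 values shrink to 48
mat_reprs <- t(apply(mat, 1, function(x) seas_profile_sketch(z_norm(x), freq)))
dim(mat_reprs)

# Cluster the representations instead of the raw series
clustering <- kmeans(mat_reprs, centers = 2, nstart = 10)
table(clustering$cluster)
```

Clustering runs on a 30 x 48 matrix instead of 30 x 672, and the averaged profiles are much less noisy than the raw series, which is why the two consumer types separate cleanly here.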
49. II. Clustering Multiple Data Streams [4]
Motivation:
Deal with the velocity of incoming data,
Dynamically changing number of clusters,
Automatic anomaly detection (anomalous consumers),
Automatic change detection.
Approach:
Take advantage of the incrementality of the clipped representation
(windowing),
Fast detection of anomalous consumers from features extracted from
clipping,
Change detection by the Anderson-Darling test.
[4] https://github.com/PetoLau/ClipStream/
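The incrementality of a clipped, windowed representation can be sketched in base R. Clipping turns each window into a bit string (1 where the value is above the window mean, 0 otherwise), and a few run-length features summarise it. The function names and the four features below are a simplified illustration chosen for this sketch; they are not claimed to match the exact feature set of FeaClip or the ClipStream code.

```r
# Clip a window: 1 if above the window mean, 0 otherwise
clip <- function(x) as.integer(x > mean(x))

# Hypothetical, simplified run-length features of the clipped window
clip_features <- function(x) {
  bits <- clip(x)
  runs <- rle(bits)
  ones  <- runs$lengths[runs$values == 1]
  zeros <- runs$lengths[runs$values == 0]
  c(max_ones  = if (length(ones) > 0) max(ones) else 0L,
    sum_ones  = sum(bits),
    max_zeros = if (length(zeros) > 0) max(zeros) else 0L,
    crossings = length(runs$lengths) - 1L)
}

# Features for every complete window of a stream (one row per window)
windowed_features <- function(x, win_size) {
  n_win <- length(x) %/% win_size
  t(sapply(seq_len(n_win), function(i) {
    clip_features(x[((i - 1) * win_size + 1):(i * win_size)])
  }))
}

set.seed(7)
daily <- 2 + sin(2 * pi * seq_len(48) / 48)          # synthetic daily shape
stream <- rep(daily, 3) + rnorm(48 * 3, sd = 0.1)    # three days seen so far
feats <- windowed_features(stream, win_size = 48)

# A new day arrives: only the new window is processed and appended
new_day <- daily + rnorm(48, sd = 0.1)
feats <- rbind(feats, clip_features(new_day))
feats
```

Because each window is summarised independently, older windows never need to be recomputed when new data arrive, which is what makes this kind of representation suitable for streams.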
53. III. Time Series Forecasting
A large number of methods are suitable for forecasting:
Time series analysis methods:
ARIMA,
Exponential smoothing,
Theta,
Regression methods:
Linear regression, GAM,
SVR, Gaussian process,
Regression trees, Bagging, Random Forest, Boosting,
Artificial Neural Networks.
55. III. Time Series Forecasting [5]
Finding the most suitable forecasting methods with
clustering...
STL+ARIMA, Exponential smoothing, Tree-based methods,
Advanced ANNs (S2S + LSTM nets).
The problem of choosing the most suitable method among the
set of methods...
Solution:
Ensemble learning - combining forecasts.
[5] https://github.com/PetoLau/TSMedianBasedEnsembleLearning/,
https://github.com/PetoLau/UnsupervisedEnsembles/,
https://github.com/PetoLau/DensityEnsembles/
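Combining forecasts can be illustrated with a small base-R sketch. Three deliberately simple base forecasters (hypothetical stand-ins for methods such as STL+ARIMA or exponential smoothing) each predict the next day, and the ensemble takes the pointwise median of their forecasts. The synthetic data, the choice of base forecasters, and the helper mape are assumptions for illustration, not the method from the linked repositories.

```r
set.seed(1)
freq <- 48                      # half-hourly readings per day (assumption)
n_days <- 8
daily_shape <- 2 + sin(2 * pi * seq_len(freq) / freq)
consumption <- rep(daily_shape, n_days) + rnorm(freq * n_days, sd = 0.1)

# Hold out the last day as the forecasting target
history <- consumption[seq_len(freq * (n_days - 1))]
actual  <- tail(consumption, freq)
days    <- matrix(history, ncol = freq, byrow = TRUE)   # one row per day

# One forecast per base method (rows) for each step of the next day (columns)
forecasts <- rbind(
  seasonal_naive  = days[nrow(days), ],      # repeat the last observed day
  seasonal_mean   = colMeans(days),          # mean daily profile
  seasonal_median = apply(days, 2, median)   # median daily profile
)

# Ensemble: pointwise median across the base forecasts
ensemble <- apply(forecasts, 2, median)

# Mean Absolute Percentage Error, in percent
mape <- function(actual, forecast) 100 * mean(abs((actual - forecast) / actual))

apply(rbind(forecasts, ensemble = ensemble), 1, function(f) mape(actual, f))
```

The median is a convenient combiner because a single badly wrong base forecast cannot drag the ensemble arbitrarily far, so no single "best" method has to be picked in advance.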
57. Life after PhD
I was happy to be hired by the start-up PowereX.
We solve problems closely related to my thesis.
PowereX
P2P energy sharing - commodity and also capacity,
Analysis of consumers' smart meter data,
Forecasting and modelling maximum load (hourly, daily,
etc.).
61. Differences between PhD and Business
PhD:
Strong focus on accuracy measures - percentage points of Mean
Absolute Percentage Error (MAPE), or internal validation indexes
for clustering...
Often working with poor academic datasets.
Business:
Finding real value for customers,
Accuracy is not that important,
Working on real rich data.
But... they are also related and need each other...
66. Conclusions
TS data mining:
TS representations are our friends in clustering,
forecasting, classification etc.,
Implemented in the TSrepr package,
PhD study is great practice before work.
Questions: Peter Laurinec laurinec.peter@gmail.com
Code: https://github.com/PetoLau/
More research: https://petolau.github.io/research
Blog: https://petolau.github.io