Predicting energy consumption in commercial buildings. Natural language processing of technician repair comments. Estimating tumbling spacecraft state from aliased sensor data.
2. Choose Your Story
7707-2-TOTAL
(770) 728-6825
1. Only Nyquist Knows
2. The Meaning of Mean
3. Data Dearth
4. Question the Question
5. Deep Net Runs Aground
6. Escape the Maze
bit.ly/pawsvote
3. 1. Only Nyquist Knows
When your vehicle is out of control...
SMS: 7707-2-TOTAL or (770) 728-6825 MSGS: "1", "2", "3", "4", "5", or "6"
Photo by US Secret Service
Public Domain photo by Eric Cutright
Public Domain photo by NASA
4. 1. Only Nyquist Knows
Nav sensors (gyro, accelerometer) are "pegged" (saturated)
All you know is solar power:
How fast is the tumble?
12 sec? 4 sec!
8. Workarounds
If Nyquist sampling (more than 2x the highest signal frequency) isn't possible...
Use a different sensor
Postprocess existing signal (radio doppler)
Sample irregularly!
Captures higher frequencies
Lomb-Scargle to post-process
Probabilistic modeling
Great for overwhelming data volume (IoT)
spectrum = scipy.signal.lombscargle(sample_times, samples, frequencies)
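A minimal sketch of the irregular-sampling workaround, assuming a made-up 4-second tumble observed at 200 random times over 10 minutes (average spacing of 3 seconds). Sampled uniformly every 3 seconds, a 4-second period would alias to 12 seconds (|1/4 - 1/3| = 1/12 Hz); with irregular times the Lomb-Scargle peak should land near the true period. The signal, time span, and frequency grid are illustrative assumptions, not the spacecraft data. Note that `scipy.signal.lombscargle` expects angular frequencies (rad/s).

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(0)

true_period = 4.0                       # seconds (hypothetical tumble period)
true_omega = 2 * np.pi / true_period    # rad/s

# 200 irregular sample times over 600 s: mean spacing of 3 s
sample_times = np.sort(rng.uniform(0.0, 600.0, 200))
samples = np.sin(true_omega * sample_times)
samples = samples - samples.mean()      # remove the mean before the periodogram

# Angular frequencies covering periods from 30 s down to 2 s
frequencies = np.linspace(2 * np.pi / 30.0, 2 * np.pi / 2.0, 5000)

spectrum = lombscargle(sample_times, samples, frequencies)
estimated_period = 2 * np.pi / frequencies[np.argmax(spectrum)]
print(f"estimated tumble period: {estimated_period:.2f} s")
```

The frequency grid is the main design choice here: it has to be fine enough to resolve the peak, whose width is set by the total observation time rather than by the sample spacing.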
9. 2. The Meaning of Mean
Means don't tell the whole story
Consider both μ and σ
Meaning may be found in the means for each group, cluster, or class
We started with grouping by time of day, but that wasn't enough...
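A small sketch of why per-group statistics can say more than one overall mean. The readings and the weekday/weekend grouping are invented for illustration; they are not the building data from the talk.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (day_type, hour, kWh) readings
readings = [
    ("weekday", 9, 120.0), ("weekday", 14, 135.0), ("weekday", 21, 60.0),
    ("weekend", 9, 45.0), ("weekend", 14, 50.0), ("weekend", 21, 40.0),
]

def summarize(values):
    """Return (mean, population standard deviation) for a list of numbers."""
    return mean(values), pstdev(values)

# One mean and spread for everything
overall = summarize([kwh for _, _, kwh in readings])

# Mean and spread per group
by_group = defaultdict(list)
for day_type, _hour, kwh in readings:
    by_group[day_type].append(kwh)
group_stats = {name: summarize(values) for name, values in by_group.items()}

print("overall:", overall)
for name, stats in group_stats.items():
    print(name, stats)
```

In this toy data the overall mean sits between the two group means and describes neither group well, and the overall spread is much larger than the weekend spread.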
10. 2. The Meaning of Mean
Regression and classification required
Many "fundamental frequencies"
13. 3. Data Dearth
Tuning a 2-DOF predictive filter for performance
More data gives algorithm more to work with
Less overfitting
More performance
Anticlined cliffs or "terraces"
[Figure: axes labeled More Data, Performance ($), and Conservatism]
14. 3. Data Dearth
Sometimes more of the same doesn't help
Exogenous factors confound the smartest algorithm
Make the exogenous endogenous (new data source)
15. 4. Question the Question
More sales => More returns
Normalize return rate for sales (lag-compensated)
Multiple interacting causes
Correlation != Causation (à la Tyler Vigen)
Reduce these return surges!
16. 4. Question the Question
Simple equation everyone can agree on
$\text{Reject rate} = \dfrac{\text{Rejects (last quarter)}}{\text{Sales (last quarter)}}$
But it's Wrong!
And it's Late!
"Cost of quality"
"Customer reject rate"
"Defect rate"
6σ
17. 4. Better "Question"
$\text{Reject rate} = \dfrac{\text{Rejects (last quarter)}}{\text{Sales (qtr before last)}}$
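A sketch contrasting the same-quarter reject rate with the lag-compensated one (rejects divided by the previous quarter's sales). The quarterly counts are invented, and the one-quarter lag is an assumption chosen to match the slide.

```python
# Hypothetical quarterly unit counts, oldest quarter first
sales = [1000, 1200, 3000, 1100]
rejects = [40, 50, 60, 150]

def naive_reject_rate(quarter):
    """Rejects divided by sales in the same quarter."""
    return rejects[quarter] / sales[quarter]

def lagged_reject_rate(quarter, lag=1):
    """Rejects divided by sales from `lag` quarters earlier."""
    if quarter - lag < 0:
        raise ValueError("no sales data that far back")
    return rejects[quarter] / sales[quarter - lag]

for quarter in range(1, len(sales)):
    print(quarter,
          round(naive_reject_rate(quarter), 3),
          round(lagged_reject_rate(quarter), 3))
```

In this toy data the sales spike in the third quarter makes the naive rate dip and then surge, while the lagged rate stays at 5% throughout.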
20. "Birth-Death Process"
$r_n = \sum_k \alpha_k s_{n-k}$
$S(t) \rightarrow H(t, \tau) \rightarrow R(t)$
Sale → Lag → Reject
Product enters "pipeline" arbitrarily
All products "die"; the question is when
Flow rate (reject rate)
And the portion that happens too soon
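One way to read the sale, lag, reject pipeline is as a discrete convolution: rejects in period n are the sum over lags k of a lag weight times the sales k periods earlier, $r_n = \sum_k \alpha_k s_{n-k}$. The sketch below assumes that reading; the sales figures and lag weights are invented.

```python
import numpy as np

# Hypothetical units sold per period
sales = np.array([100, 200, 400, 100, 0, 0], dtype=float)

# Hypothetical lag distribution: fraction of a period's sales
# that is rejected k periods later (k = 0, 1, 2)
alpha = np.array([0.00, 0.03, 0.02])

# r[n] = sum_k alpha[k] * sales[n - k], truncated to the sales window
rejects = np.convolve(sales, alpha)[: len(sales)]

lifetime_reject_rate = alpha.sum()
print(rejects, lifetime_reject_rate)
```

The lag weights sum to the lifetime reject rate, so total rejects equal that rate times total sales once the pipeline has emptied.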
21. 4. Question the Question
Histogram reveals trend and seasonality
24. 4. Question the Question
Fiscal Quarter
Geography
Diagnosis
Retailer
Salesperson
Model
Lot
Reason
28. 4. Analyze the Question
Cumulative histograms focus attention on final total
Product returns stop when...
You stop counting
You stop accepting returns
You stop selling
29. 4. Normalize & Compare
Fiscal Quarter
Geography
Diagnosis
Retailer
Salesperson
Model
Lot
Reason
30. 4. Analyze the Question
Normalize histograms to compare categories
Normalize by what?
Sales (which ones)?
Total returns?
How are we doing this week?
Not just this quarter
31. 4. Question the Question
Unsupervised natural language processing?
Presidential inaugural speeches
Target category = political party
32. 4. Question the Question
What are the US Presidents' political parties based on speeches?
33. 4. Question the Question
What are the US Presidents' political parties based on speeches?
34. 4. Question the Question
The category you're interested in will not likely be the most important "factor" in the NLP statistics
Dimension reduction (SVD, PCA) can identify factors
Word-sets that are most significant
These represent the "themes"
Interpretation of these "themes" is up to you
Statistics ≠ Meaning
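A toy sketch of how SVD surfaces word-set "themes" from a term-count matrix. The vocabulary and counts are invented stand-ins for technician comments; a real pipeline would start from tokenized text and usually apply TF-IDF weighting first.

```python
import numpy as np

vocabulary = ["battery", "charge", "dead", "screen", "crack", "broken"]

# Hypothetical term counts: one row per comment, one column per word
counts = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 2, 3, 1],
], dtype=float)

# Rows of Vt are the themes, ordered by decreasing singular value
U, singular_values, Vt = np.linalg.svd(counts, full_matrices=False)

def top_words(theme, n=3):
    """Words with the largest absolute weight in one theme."""
    order = np.argsort(-np.abs(Vt[theme]))[:n]
    return sorted(vocabulary[i] for i in order)

for theme in range(2):
    print(theme, round(float(singular_values[theme]), 2), top_words(theme))
```

The SVD only ranks word-sets by how much variance they explain. Deciding that one of them means "screen damage" is the analyst's interpretation.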
35. 5. Deep Nets Run Aground
Deep net performs well!
36. 5. Deep Nets Run Aground
Not so fast... it's overfitting
37. 5. Deep Nets Run Aground
$a = W_{S_k, S_{k+1}} \, p$
[Diagram: $p \rightarrow W_{S_k, S_{k+1}} \rightarrow a$ for layer $k$]
Conventional Hebb rule
$W^{new} = W^{old} + t_q p_q^T$
Hebb "delta" rule
$W^{new} = W^{old} + \alpha (t_q - a_q) p_q^T$
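A minimal sketch of the two update rules, reading them as $W^{new} = W^{old} + t_q p_q^T$ (conventional Hebb) and $W^{new} = W^{old} + \alpha (t_q - a_q) p_q^T$ (delta rule) for a single linear layer $a = W p$. The training pairs, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

def hebb_update(W, p, t):
    """Conventional Hebb rule: W_new = W_old + t p^T."""
    return W + np.outer(t, p)

def delta_update(W, p, t, alpha=0.1):
    """Delta rule: W_new = W_old + alpha (t - a) p^T, with a = W p."""
    a = W @ p
    return W + alpha * np.outer(t - a, p)

# Hypothetical training pairs: 3 inputs -> 2 outputs
inputs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
targets = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

W = np.zeros((2, 3))
for _ in range(200):                      # epochs
    for p, t in zip(inputs, targets):
        W = delta_update(W, p, t, alpha=0.5)

max_error = np.max(np.abs(inputs @ W.T - targets))
print(W, max_error)
```

The delta rule scales each update by the current error, so updates shrink as the outputs approach the targets; the plain Hebb rule keeps adding $t p^T$ regardless of how well the layer already performs.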
38. 5. Shallow Data
$a = W_{S_k, S_{k+1}} \, p$
[Diagram: $p \rightarrow W_{S_k, S_{k+1}} \rightarrow a$ for layer $k$]
Model degree: $\sum_k S_k S_{k+1}$
Training data DOF: $S_1 S_3 N_{samples}$ (independent samples)
39. 5. Shallow Data
$a = W_{S_k, S_{k+1}} \, p$
[Diagram: $p \rightarrow W_{S_k, S_{k+1}} \rightarrow a$ for layer $k$]
Model degree: $S_1 S_2 + S_2 S_3$ (1 hidden layer)
Training data DOF: $(S_1 + S_3) N_{samples}$ (independent samples)
40. 5. Bottom Line
$N_{hidden} \ll N_{training}$
bit.ly/nntune
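A back-of-envelope sketch of the comparison behind this bottom line: count the weights in the network and compare against the degrees of freedom in the training data, using the slide's $(S_1 + S_3) N$ count for a one-hidden-layer net. The layer sizes and sample count are hypothetical, and biases are ignored.

```python
def model_degree(layer_sizes):
    """Number of weights: sum over layers of S_k * S_(k+1), biases ignored."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

def training_data_dof(n_inputs, n_outputs, n_samples):
    """(inputs + outputs) values per sample, for independent samples."""
    return (n_inputs + n_outputs) * n_samples

layer_sizes = [10, 50, 1]   # hypothetical: 10 inputs, 50 hidden nodes, 1 output
n_samples = 40              # hypothetical number of independent samples

weights = model_degree(layer_sizes)                                      # 10*50 + 50*1
dof = training_data_dof(layer_sizes[0], layer_sizes[-1], n_samples)      # (10 + 1) * 40
overfit_risk = weights > dof
print(weights, dof, overfit_risk)
```

When the weight count exceeds the data's degrees of freedom, as it does here, the network has enough freedom to memorize the training set, so shrinking the hidden layer or gathering more independent samples is the safer move.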
42. 6. Escape from the Maze
Tight heuristics vital for efficient graph search
"Always turn right" is not good enough
43. 6. Escape from the Maze
Don't bother with "exhaustive" correlation search
complexity $\approx O(M^2 N^2) \approx 10^{24}$
Find db relationships using meta-data
min, max, median
#records
#distinct
for reals: mean, std
complexity $\approx O(M N \log N) \approx 10^{13}$
($M \approx 10^5$, $N \approx 10^7$)
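A sketch of the meta-data idea: profile each column once with cheap summary statistics, then compare profiles instead of correlating every pair of columns record by record. The column names, values, and the foreign-key test are illustrative assumptions, not the author's tool.

```python
from statistics import median

def column_profile(values):
    """Cheap summary statistics for one column."""
    return {
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "records": len(values),
        "distinct": len(set(values)),
    }

def could_be_foreign_key(child, parent):
    """Heuristic: parent values are unique and child values fall within their range."""
    return (
        parent["distinct"] == parent["records"]
        and parent["min"] <= child["min"]
        and child["max"] <= parent["max"]
    )

# Hypothetical columns from two tables
columns = {
    "product.id": [101, 102, 103, 104, 105],
    "repair.product_id": [102, 102, 105, 101, 105, 105],
    "repair.cost": [19.99, 250.0, 75.5, 12.0, 999.0, 40.0],
}
profiles = {name: column_profile(values) for name, values in columns.items()}

candidates = [
    child for child in ("repair.product_id", "repair.cost")
    if could_be_foreign_key(profiles[child], profiles["product.id"])
]
print(candidates)
```

Profiles only nominate candidates. A pair that passes still needs a direct check, but that check runs on a handful of column pairs rather than all of them.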
44. Human Heuristics
Business knowledge narrows search:
Repair technicians
Product designers
Factory managers
Suppliers
Sales channels
Call center
45. Accidental "Experiments"
Look for differences in
Model
Lot
Product
Sales Channel
Customer Demographic
Region/Culture
Look for ...
New/deleted features
Documentation updates
Cost-saving parts changes
Production facilities (outsourced vs insourced)
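46. Kruskal's Algorithm
Minimum Spanning Tree
1. Add the lowest-cost remaining edge that does not form a cycle (it reaches a new node or joins two trees)
2. Repeat until all nodes are accounted for
Built into Python graph library (`networkx`):
import networkx as nx

def minimum_spanning_zipcodes():
    zipcode_query_sequence = []
    # build_graph and api.db are not defined on this slide
    G = build_graph(api.db, limit=1000000)
    # connected_component_subgraphs() was removed in networkx 2.4
    for nodes in nx.connected_components(G):
        CG = G.subgraph(nodes)
        # each edge is (u, v, attribute_dict) when data=True
        for edge in nx.minimum_spanning_edges(CG, data=True):
            zipcode_query_sequence.append(edge[2]['zipcode'])
    return zipcode_query_sequence
Produces one spanning tree for each connected subgraph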
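For readers who want to see the steps without the library, here is a minimal standard-library sketch of Kruskal's algorithm using a union-find structure. The zipcodes and edge costs are made up; `networkx` remains the practical choice.

```python
def kruskal(nodes, edges):
    """Return minimum spanning forest edges; edges are (cost, u, v) tuples."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the root, shortening the path as we go
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    forest = []
    for cost, u, v in sorted(edges):       # lowest cost first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:               # joins two separate trees, so no cycle
            parent[root_u] = root_v
            forest.append((cost, u, v))
    return forest

# Hypothetical zipcode graph; "97301" has no edges and stays its own component
nodes = ["97201", "97202", "97203", "97204", "97301"]
edges = [
    (4, "97201", "97202"),
    (1, "97202", "97203"),
    (3, "97201", "97203"),
    (2, "97203", "97204"),
    (5, "97202", "97204"),
]
mst = kruskal(nodes, edges)
print(mst, sum(cost for cost, _, _ in mst))
```

Because edges are only added when they join two different trees, a disconnected graph naturally yields one tree per connected component.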
47. A* Algorithm
Minimum Path to Goal
Built into Python graph library (`networkx`)
from networkx.algorithms.shortest_paths import astar_path
astar_path(G, source, target, heuristic=None)
Provably optimal and optimally efficient (given an admissible, consistent heuristic)
But typical data relationship graph has large branching factor
48. A* Algorithm
Minimum Path to Goal
Built into Python graph library (`networkx`)
from networkx.algorithms.shortest_paths import astar_path
astar_path(G, source, target, heuristic=None)
Provably optimal and optimally efficient (given an admissible, consistent heuristic)
You better have a good heuristic!
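To illustrate how much the heuristic matters, here is a standard-library A* sketch that counts expanded nodes on a hypothetical open 20x20 grid, once with a Manhattan-distance heuristic and once with no heuristic (which reduces A* to uniform-cost search). The grid, start, and goal are invented; this is not the `networkx` implementation.

```python
import heapq

def astar(start, goal, neighbors, heuristic):
    """Return (path, number of nodes expanded) for unit-cost moves."""
    frontier = [(heuristic(start, goal), 0, start)]
    came_from = {start: None}
    cost_so_far = {start: 0}
    expanded = 0
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if cost > cost_so_far[node]:
            continue                       # stale queue entry
        expanded += 1
        if node == goal:
            path = []
            while node is not None:        # walk back to the start
                path.append(node)
                node = came_from[node]
            return path[::-1], expanded
        for nxt in neighbors(node):
            new_cost = cost + 1
            if new_cost < cost_so_far.get(nxt, float("inf")):
                cost_so_far[nxt] = new_cost
                came_from[nxt] = node
                priority = new_cost + heuristic(nxt, goal)
                heapq.heappush(frontier, (priority, new_cost, nxt))
    return None, expanded

SIZE = 20  # hypothetical open grid with no walls

def grid_neighbors(node):
    x, y = node
    steps = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(i, j) for i, j in steps if 0 <= i < SIZE and 0 <= j < SIZE]

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def no_heuristic(a, b):
    return 0

start, goal = (10, 10), (15, 10)
path_h, expanded_h = astar(start, goal, grid_neighbors, manhattan)
path_0, expanded_0 = astar(start, goal, grid_neighbors, no_heuristic)
print(len(path_h), expanded_h, expanded_0)
```

Both runs return a shortest path of the same length, but the run without a heuristic expands a whole diamond of nodes around the start before reaching the goal. With a larger branching factor the gap widens quickly.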
50. Choose Your Story
7707-2-TOTAL
(770) 728-6825
1. Only Nyquist Knows: Consider sample rate
2. The Meaning of Mean: Classify before mean
3. Data Dearth: Explore data sources
4. Question the Question: Reject rate metric
5. Deep Net Runs Aground: data > nodes x inputs
6. Escape the Maze: Lazy correlation
bit.ly/pawsvote
51. References
"Data Driven Documents", 2011, Mike Bostock
"Data Science with `pug`", 2014, Lane, Zen, Kowalski, PDX Python U.G.
"Neural Network Design", 2014, Hagan, Demuth, et al., OKSU
"Forecasting Product Returns", 2001, Toktay, INSEAD
`scipy.ransac`, 2014, Andrew D. Straw
"Choose Your Own Adventure Presentation", 2014, Matt Makai