9. Sir Francis Galton (1822-1911)
Galton was a polymath who made important contributions in many fields of science, including meteorology (the anti-cyclone and the first popular weather maps), statistics (regression and correlation), psychology (synesthesia), biology (the nature and mechanism of heredity), and criminology (fingerprints).
He first introduced the use of questionnaires and surveys for collecting data on human communities.
10 / 50
10. Karl Pearson (1857-1936)
A student of Francis Galton.
He has been credited with establishing the discipline of mathematical statistics, and contributed significantly to the field of biometrics, meteorology, theories of social Darwinism, and eugenics.
Founding chair of the Department of Applied Statistics at University College London (1911), the first statistics department in the world!
Founding editor of Biometrika.
19. Setup
Target population: $U = \{1, \ldots, N\}$.
Parameter of interest: the population mean $\bar Y_N = N^{-1} \sum_{i=1}^N y_i$.
Big data sample: $B \subset U$, with inclusion indicator
$$I_i = \begin{cases} 1 & \text{if } i \in B \\ 0 & \text{otherwise.} \end{cases}$$
Estimator: the big data sample mean $\bar y_B = N_B^{-1} \sum_{i=1}^N I_i y_i$, where $N_B = \sum_{i=1}^N I_i$ is the big data sample size ($N_B < N$).
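The notation above can be sketched numerically; a minimal numpy illustration (the normal population and the 80% inclusion rate are arbitrary choices for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000                                  # population size
y = rng.normal(50.0, 10.0, N)               # study variable y_1, ..., y_N
Ybar_N = y.mean()                           # target parameter: population mean

# Big data sample B, encoded by the inclusion indicator I_i
I = (rng.random(N) < 0.8).astype(float)     # here ~80% of units fall in B
N_B = I.sum()                               # big data sample size, N_B < N
ybar_B = (I * y).sum() / N_B                # big data sample mean
```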
20. Fundamental theorem of estimation error
Identity (Meng, 2016):
$$E(\bar y_B - \bar Y_N)^2 = E(\rho^2_{I,Y}) \cdot \sigma^2_Y \cdot \frac{1 - f_B}{f_B},$$
where $\rho_{I,Y}$ is the correlation between $I$ and $Y$ (it reflects the big data sampling mechanism and is generally unknown) and $f_B = N_B/N$.
Three components: data quality ($\rho_{I,Y}$), problem difficulty ($\sigma_Y$), and data quantity ($f_B$).
Effective sample size: the size of the simple random sample whose mean attains the same mean squared error (MSE) as the big data estimator.
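Meng's decomposition holds exactly for each realized inclusion pattern (with finite-population moments), so it can be checked numerically; a sketch, where the gamma population and the logistic self-selection rule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
y = rng.gamma(2.0, 3.0, N)                  # an arbitrary finite population

# Self-selected big data sample: larger y-values are more likely included
p = 1.0 / (1.0 + np.exp(-(y - y.mean())))
I = (rng.random(N) < p).astype(float)

f_B = I.mean()                              # N_B / N
ybar_B = (I * y).sum() / I.sum()            # big data sample mean
error = ybar_B - y.mean()                   # estimation error

# Identity: error = rho_{I,Y} * sigma_Y * sqrt((1 - f_B) / f_B)
rho = np.corrcoef(I, y)[0, 1]               # data quality
sigma_Y = y.std()                           # problem difficulty (ddof=0)
rhs = rho * sigma_Y * np.sqrt((1.0 - f_B) / f_B)
```

Squaring both sides and averaging over realizations of $I$ recovers the MSE formula above.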
22. Paradox of Big data
If the selection bias is ignored and the standard confidence interval is computed,
$$CI = \left(\bar y_B - 1.96\sqrt{(1-f_B)S^2/N_B},\ \ \bar y_B + 1.96\sqrt{(1-f_B)S^2/N_B}\right),$$
then, as $N_B \to \infty$, we have
$$\Pr(\bar Y_N \in CI) \to 0.$$
Paradox: if one ignores the bias and applies the standard method of estimation, the bigger the dataset, the more misleading it is for valid statistical inference.
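A simulation sketch of the paradox under an assumed self-selection mechanism (the exponential tilting, the 10% sampling fraction, and the population sizes are all invented for illustration): the bias stays roughly constant while the interval shrinks, so coverage of the naive 95% CI collapses as $N_B$ grows.

```python
import numpy as np

def naive_coverage(N, reps=300, seed=2):
    """Coverage of the naive 95% CI under a biased selection mechanism."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        y = rng.normal(0.0, 1.0, N)
        w = np.exp(0.2 * y)                 # mild self-selection on y
        p = 0.1 * w / w.mean()              # inclusion prob., f_B ~ 0.1
        I = rng.random(N) < p
        n_B = I.sum()
        ybar_B = y[I].mean()
        S2 = y[I].var(ddof=1)
        half = 1.96 * np.sqrt((1 - n_B / N) * S2 / n_B)
        hits += abs(y.mean() - ybar_B) <= half
    return hits / reps

cov_small = naive_coverage(N=500)      # bias comparable to the CI width
cov_big = naive_coverage(N=50_000)     # same bias, much narrower CI
```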
24. Salvation
1. First approach: Data integration
Idea: supplement the (selection-biased) big data sample with an independent probability sample.
Suppose a binary variable $Y$ is observed in the big data sample:

            I = 1     I = 0
  Y = 1     N_B1        ?
  Y = 0     N_B0        ?
  Total      N_B     N - N_B

where $I_i = 1$ if unit $i$ belongs to the big data sample and $I_i = 0$ otherwise.
Parameter of interest: $P = P(Y = 1)$.
25. Salvation
??? ?????? ??? ?? ??? ??? ??. (?? ????
????)
I = 1 I = 0
Y = 1 nB1 nC1 n1
Y = 0 nB0 nC0 n0
n
? ???? ??? ???? P? ??? ????
26. Salvation
Law of total probability: note that
$$P(Y = 1) = P(Y = 1 \mid I = 1)P(I = 1) + P(Y = 1 \mid I = 0)P(I = 0).$$
Three components:
1. $P(I = 1)$: big data proportion (known).
2. $P(Y = 1 \mid I = 1) = N_{B1}/N_B$: obtained from the big data.
3. $P(Y = 1 \mid I = 0)$: estimated by $n_{C1}/(n_{C0} + n_{C1})$ from the survey data.
Final estimator:
$$\hat P = P_B W_B + \hat P_C (1 - W_B) \quad (1)$$
where $W_B = N_B/N$, $P_B = N_{B1}/N_B$, and $\hat P_C = n_{C1}/(n_{C0} + n_{C1})$.
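Estimator (1) is simple arithmetic on the observed counts; a sketch in which every count is a hypothetical number invented for illustration:

```python
# Hypothetical counts: big data cells and survey units with I = 0
N = 1_000_000
N_B1, N_B0 = 540_000, 260_000          # big data units with Y = 1 / Y = 0
n_C1, n_C0 = 120, 180                  # survey units outside the big data

N_B = N_B1 + N_B0
W_B = N_B / N                          # P(I = 1), known
P_B = N_B1 / N_B                       # P(Y = 1 | I = 1), from the big data
P_C_hat = n_C1 / (n_C1 + n_C0)         # estimate of P(Y = 1 | I = 0)

P_hat = P_B * W_B + P_C_hat * (1 - W_B)   # estimator (1)
```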
27. Salvation
Remark
Variance:
$$V(\hat P) = (1 - W_B)^2 V(\hat P_C) \doteq (1 - W_B)\, \frac{1}{n}\, P_C(1 - P_C).$$
If $W_B$ is close to one, then the above variance is very small.
Instead of using $\hat P_C = n_{C1}/(n_{C0} + n_{C1})$, we can construct a ratio estimator of $P_C$ to improve the efficiency. That is, use
$$\hat P_{C,r} = \frac{1}{1 + \hat r_C}, \quad \text{where} \quad \hat r_C = \frac{N_{B0}/N_{B1}}{n_{B0}/n_{B1}} \cdot \frac{n_{C0}}{n_{C1}}.$$
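A sketch of the ratio adjustment with invented cell counts: the survey odds $n_{C0}/n_{C1}$ are rescaled by how the big-data odds $N_{B0}/N_{B1}$ compare with their survey counterpart $n_{B0}/n_{B1}$.

```python
# Hypothetical cell counts (invented for illustration)
N_B1, N_B0 = 540_000, 260_000      # big data margins
n_B1, n_B0 = 160, 90               # survey units also in the big data
n_C1, n_C0 = 120, 180              # survey units outside the big data

# Direct estimator of P_C = P(Y = 1 | I = 0)
P_C_hat = n_C1 / (n_C1 + n_C0)

# Ratio-adjusted estimator
r_C = (N_B0 / N_B1) / (n_B0 / n_B1) * (n_C0 / n_C1)
P_C_ratio = 1.0 / (1.0 + r_C)
```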
35. Small area estimation
Area level model (Contd)
The goal is to predict $Y_i$ (the true area quantity) using the direct estimate $\hat Y_i$ and the auxiliary variable $X_i$ (from the KT data).
The area level model is a useful tool for combining information from different sources by making an area level matching.
The area level model consists of two parts:
1. Sampling error model: relationship between $\hat Y_i$ and $Y_i$.
2. Structural error model: relationship between $Y_i$ and $X_i$.
36. Small area estimation
Area level model: Fay-Herriot model approach
Figure: A Directed Acyclic Graph (DAG) for classical area level models, $X \to Y \to \hat Y$, where (1) is the sampling error model (known) linking $Y$ and $\hat Y$, and (2) is the structural error model (known up to $\theta$) linking $X$ and $Y$.
37. Small area estimation
Combining two models
Prediction model = sampling error model + structural error model.
Bayes formula for the prediction model:
$$p(Y_i \mid \hat Y_i, X_i) \propto g(\hat Y_i \mid Y_i)\, f(Y_i \mid X_i),$$
where $g(\cdot)$ is the sampling error model and $f(\cdot)$ is the structural error model.
$g(\cdot)$: assumed to be known.
$f(\cdot)$: known up to the parameter $\theta$. For example, one can assume
$$Y_i = X_i \beta + e_i, \quad e_i \sim (0, \sigma^2 X_i^2).$$
38. Small area estimation
Parameter estimation
Obtain the prediction model using the Bayes formula.
EM algorithm: update the parameters by
$$\hat\theta^{(t+1)} = \arg\max_\theta \sum_i E\{\log f(Y_i \mid X_i; \theta) \mid \hat Y_i, X_i; \hat\theta^{(t)}\},$$
where the conditional expectation is with respect to the prediction model evaluated at the current parameter $\hat\theta^{(t)}$.
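The EM update can be made concrete for a normal area level model with $\hat Y_i \mid Y_i \sim N(Y_i, D_i)$, $D_i$ known, and a homoscedastic structural model $Y_i \mid X_i \sim N(X_i\beta, \sigma^2)$; this specific model, the simulated data, and the starting values are assumptions for the sketch, not necessarily the lecture's exact setup:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 400                                   # number of areas
x = rng.uniform(0.0, 2.0, m)              # area level covariate X_i
beta_true, sig2_true = 1.5, 0.5
Y = beta_true * x + rng.normal(0.0, np.sqrt(sig2_true), m)  # latent Y_i
D = rng.uniform(0.2, 0.6, m)              # known sampling variances
Y_hat = Y + rng.normal(0.0, np.sqrt(D))   # direct estimates

beta, sig2 = 0.0, 1.0                     # crude starting values
for _ in range(200):
    # E-step: moments of Y_i | Y_hat_i, X_i at the current (beta, sig2)
    gamma = sig2 / (sig2 + D)
    mu = gamma * Y_hat + (1.0 - gamma) * beta * x   # conditional mean
    v = gamma * D                                   # conditional variance
    # M-step: maximize sum_i E{log f(Y_i | X_i; beta, sig2)}
    beta = (x * mu).sum() / (x * x).sum()
    sig2 = (v + (mu - beta * x) ** 2).mean()
```

The M-step here is least squares on the predicted means plus the leftover conditional variance, which is exactly the maximizer of the expected complete-data log likelihood for this normal model.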
39. Small area estimation
Prediction vs Parameter estimation
Figure: EM algorithm as a DAG over $(X, Y, \hat Y)$ and $\theta$: the E-step forms the prediction of $Y$ given the data and the current $\theta$, and the M-step updates $\theta$ from that prediction.
40. Small area estimation
Prediction (frequentist approach)
Prediction: the expectation from the prediction model at $\theta = \hat\theta$,
$$\hat Y^*_i = E\{Y_i \mid \hat Y_i, X_i; \hat\theta\}.$$
If $f(Y_i \mid X_i)$ is a normal distribution, then
$$\hat Y^*_i = \hat\gamma_i \hat Y_i + (1 - \hat\gamma_i) E(Y_i \mid X_i; \hat\theta)$$
for some $\hat\gamma_i$, where
$$\hat\gamma_i = \frac{V(Y_i \mid X_i; \hat\theta)}{V(\hat Y_i) + V(Y_i \mid X_i; \hat\theta)}.$$
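A sketch of the shrinkage form with invented numbers (the three areas and all variances are hypothetical):

```python
import numpy as np

# Invented inputs: direct estimates, their variances, and fitted model values
Y_hat = np.array([10.2, 7.9, 12.5])          # direct estimates
V_Yhat = np.array([1.0, 2.0, 0.5])           # V(Y_hat_i)
EY_X = np.array([9.0, 8.5, 12.0])            # E(Y_i | X_i; theta_hat)
sig2 = 1.0                                   # V(Y_i | X_i; theta_hat)

gamma = sig2 / (V_Yhat + sig2)               # shrinkage weight gamma_i
Y_pred = gamma * Y_hat + (1.0 - gamma) * EY_X
```

The noisier the direct estimate (large $V(\hat Y_i)$), the smaller $\hat\gamma_i$ and the more the prediction leans on the structural model.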