�ݺ�ߣ

Novelties in social science stats
Jiří Haviger
Faculty of Informatics nad Management
University of Hradec Kralove
Czech republic

Jiří Haviger
psycholinguistic stat Text-Based Detection of the Risk of Depression
quantitative linguistic Mutual Intelligibility between West-Slavic Languages
stat for medicine CVID-Associated Tumors
stat for psychology Using mood induction procedures in psychological research
stat for math didactics The van Hiele geometry thinking levels
Orcid 0000-0001-8353-497X
Loop profile 550323
ResearcherID K-2198-2017
Scopus Author ID 24829164400
2

Points of presentation
recap of IMRaD
recap of inference stat
Sample size determination
Machine learning in social research
Structural equation modeling
3

IMRaD – structure of scientific storytelling
Introduction – WHY we create our reseach ?
Opening
Literature review
Gap-of-knowledge
Research question / hypothesis
Method – HOW we design our research ?
Design
Data collection
Sample / Participants
Procedure
Results – WHAT are our findings ?
Remind the reader of the research question.
Answer to the research question in words.
Present the relevant statistics (table, graph).
Discussion – WHERE it is usefull ?
Summary of the results
Theoretical implications
Practical implications
Limits of our study
Suggestions for future research
4

Introduction
Why we create our reseach
Reviewer – topic
Method
How we design our research, obtain sample and handle with data
Reviewer – methodology and statistics
Results
What are our findings?
Reviewer – statistics
Discussion
Where it is useful?
Reviewer – topic
5
IMRaD - reviewers

Sample description
old fashion article
6

Participants - article 1
Informed consent
Ethical committee approval
Short description
Age mean, range
Gender
Recruitment method
Exclusion / inclusion criteria
important for
Selection bias
7

Recruitment method
Psychology students as a part of course
Non-psychology students with a monetary
compensation
Inclusion criteria
Informed consent
important for
Selection bias
different motivation
8

Randomly assignation
Usually in Design
Recruitment method
Voluntaries
Credit in courses
Power analysis
Given strange of effect
Given alpha, power
important for
Selection bias
Voluntaries – non random
9

Recruitment method
Online survey panel
Power analysis
Monte Carlo simulation
Given alpha, power
Inclusion criteria
Important for
Selection bias
Online = without control
10

How to determine sample size?
reminder of classical hypothesis testing
basic idea
no information about population, but only about sample
generalize (inference) from sample to population
We hope that there is some difference, relation, …
We have sample descriptive
null hypothesis about population
H0 usually diff = 0, can be different
We want Come our sample from population where is H0 true?
decision based on probabilities => p-value, sig
11

example
Example: We would like to proof, that there is some effect of intervention
between experimental and control group
We have mean + sd from experimental,
mean + sd from control
null hypothesis: effect(experimental) – effect(control) = 0
α (alpha), usually α = 0.05 (see next)
We want Come our sample from population where is H0 true?
prob (sample is from popul with H0) = sig (p-value)
Decision sig < α => H0 in popul is improbable (H0 is not true) => discovery
sig > α => H0 in popul is probable (H0 is true) => non-discovery
12

Experimental – control = Δ
Problem:
Sig (p-value) depends on Δ and sample size
Δ increasing => sig decreasing (it‘s OK)
n increasing => sig decreasing
bigger samples => more significant results
https://imgs.xkcd.com/comics/slope_hypothesis_testing.png
Solution - set minimal acceptable values
based on expert minimal clinical important difference (MID,MCID)
based on distribution effect size measures (Cohen‘s D, Pearson‘s r, …)
Problem with sig
13

we would like to
generalize (inference) from sample to population
decision based on probabilities => sig
Medicine and data mining terminology: test gives
negative result (T-) H0 true no effect (difference) detected non-discoveries (no illness)
positive result (T+) H0 false effect (differnce) detected discoveries (illness)
Decision about effect of intervention
in population
two possible error
Test result
sig > α
test negative(T-)
sig < α
test positive(T+)
In reality
In population
health True Negative
False Positive
Falsely detected illness
illness
False Negative
Missed illness
True Positive
test vs reality in population – two kind of errors
14

We would like to control prob of erros in acceptable values
falsely detect effect FP type I error p (reject H0 | H0 true) control via α
missed effect FN type II error p (accept H0 | H0 false) control via β
Statistical power TP p (reject H0 | H0 false) control via 1-β
Error I prob, that study detects difference if a difference not exists
Error II prob, that study misses difference if a difference really exists
Power prob, that study detects a difference if a difference really exists
Two possible error in decision
Test-based decission
T- T+
Population
P- True nondiscoveries
False discoveries
Acceptable p(FP) = α
P+
False nondicoveries
Acceptable p(FN) = β
True discoveries
Acceptable p(TP) = 1-β
two kind of errors
15

There are four values in relation
n required sample size
α required maximal prob of error one
1-β required minimal power (or max prob of error two, β)
D required effect size, eg. Cohens D
From three of them you can determine last one
https://rpsychologist.com/d3/nhst/
Four values in relation
16

Effect size interpretation
Cohen’s d Pearson’s r
Pointless d < 0.2 |r| < 0.1
Small d >= 0.2 |r| >= 0.1
Medium d >= 0.5 |r| >= 0.3
Large d >= 0.8 |r| >= 0.5
https://digitalcommons.wayne.edu/jmasm/vol8/iss2/26/
Effect size interpretation
17

A priori given (α , β, D) compute required sample size
Sensitivity given (α , β, n) compute required D
Post hoc given (α , n, D) computed achievenment β
…
Type of power analysis
18

App G* power
https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-
psychologie-und-arbeitspsychologie/gpower
R pwr package
https://www.statmethods.net/stats/power.html
Python power part of stat package
https://machinelearningmastery.com/statistical-power-and-
power-analysis-in-python/
G* power application
19

Experimental vs control group
prior power analysis
not normal distribution => Wilc-Mann-Whintest
α = 0.05
power = 1-β = 0.8
effect size d = 0.5
Allocation ratio =1
same sample size for control and
experimental is the same
20

not normal distribution => Wilc-Mann-Whitney
α = 0.05
power = 1-β = 0.8
effect size d = 0.5
Allocation ratio =1
same sample size for control and
experimental is the same
required sample size
N1 = N2 = 53
21

One group, pretest and posttest
normal distribution
=> mean difference matched pairs
α = 0.05
power = 1-β = 0.95
effect size d = 0.5
pretest vs posttest
22

normal distribution
=> mean difference matched pairs
α = 0.05
power = 1-β = 0.95
effect size d = 0.5
N = 45
pretest vs posttest
23

normal distribution => Pearson correlation
α = 0.05
power = 1-β = 0.95
effect size correlation = 0.3
relation between variables
24

normal distribution => Pearson correlation
α = 0.05
power = 1-β = 0.95
effect size correlation = 0.3
N = 115
relation between variables
25

+ verification research
+ hypothesis of difference, relation, univariate regression
+ software, packages
+ conected with effect size
- exploratory research
- factor analysis EFA
- multiple testing problem
- CFA, SEM
- most of machine learning methods
https://xkcd.com/2303/
26

Machine learning in social sciencies
In social sciencies is mostly used supervised machine learning
Predictor(s) one target variable
0/1 clasification
scale regression
The goal useful prediction
no create new theory
no verification of hypothesis
automatically method selection (XGBoost, NN, …)
27

In social sciencies is mostly used supervised machine learning
What we mean as (supervised) machine learning proces?
General model with unknown parametrs
Estimation parametrs from training part of dataset
=> trained model
Used trained model to testing part of data
=> quality of model
Visualisation of NN
http://playground.tensorflow.org/
28

General model => learning => trained model
What we mean as general model?
Classification
logistic regression (!),
decission tree,
XGBoost, …
Regression
(multi) linear regression (!),
ordinal,
curve estimation,
neural networks, …
29

For commonly used regressions (linear, logistic) shoul be used ML proces too.
train at trainning part of dataset only
quality measures at testing part of dataset only
regression R, R^2 Correlation target vs predicted
MSE, RMSE (root) mean square error
MAPE (%) mean absolute percentage error
classification Nagelkerke R^2
ACC accuracy
Pre, Rec Precission, recall
Sen, Spec Sensitivity, specificity 30

+ universal method
not necessary to know, what is meaning of variables
+ more possibility to set general models (eg NN: #layers, #neurons in layer, Transform function, ……)
=> no-one know, which model is „correct“, only wich is better and which worst
=> more possibility in parametrs => more probability of sucess
+ small possiblility to „hack“ results, cause of quality measure in testing part only
- (!) black–box, problematic explanation of model
- (!) difficulty to compare with theory
- (!) difficuilty establish new thoery
GOOD for decision making, POOR for explanation and theory
31

We have observed variables
model(s)
We want verificate our theoretical model(s)
describing weight of relations
Goal of SEM evaluate given models(s)
Based on https://semopy.com/syntax.html
Structural equation modelling (SEM)
32

Latent
can‘t be measure directely
affect observed variables
Manifested (observed)
can be measure directly
influenced by latent variables
variables
33

Observed D3, D4, …
Latent Design, Owner, Construction
Measurement operator
=~ design =~ D3 + D4 + ..
Regression operator
~ owner ~ design + comstruction
Covariance operator
~~ not in this model, can be C3 ~~ C4 + C5
Example from
https://www.mdpi.com/2071-1050/9/5/822/htm
relations in model
34

App JASP – SEM
https://jasp-stats.org/
R lavaan package,
https://lavaan.ugent.be/tutorial/syntax1.html
Python SEMOPY package
https://semopy.com/
35

# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
code of model
36

JASP
37
As deafault disable
Enable in +

JASP
38
More than one model
Model description
Options, output, estimation

JASP
39
Output
additional fit measure
Path diagram

JASP - output
40
Compare more models
Chisq p-value should be > .05
Same df not possible to compute difference test
https://easystats.github.io/effectsize/reference/interpret_gfi.html

JASP – output and criterion
41
Fit indices
CFI should be > .90
Information criteria for comparing models
AIC lower => better
BIC lower => better
Other fit measures
RMSEA should be < .08 or < .05.
GFI 1 perfect fit.
≥ 0.95 excellent fit (Kline, 2005)
≥ 0.9 reasonable fit (Hu & Bentler, 1998)
https://easystats.github.io/effectsize/reference/interpret_gfi.html
https://uedufy.com/how-to-interpret-model-fit-results-in-amos/

Structural equation modeling
+ more relations (better than hypothesis testing)
+ more than one target (better than ML)
+ theory-based model (better than ML)
+ recomandation for sample size (https://www.danielsoper.com/statcalc/calculator.aspx?id=89 )
+ general model confirmatory factor analysis,
mediation analysis, …
- no recomandation how to create good model,
- no automatic way to generate model
- two math ways – SEM / Partial Least Square Path Modeling (PLS PM),
- (!) poor implication to real application
Sometimes not clear how to intervate to expect good effect
GOOD for explanation and theory, POOR for application
42

Summary
Machine learning in social research
SEM
Jiří Haviger
Faculty of Informatics nad Management
University of Hradec Kralove
Czech republic
43

�ݺ�ߣ

Novelties in social science statistics

More Related Content

Novelties in social science statistics