Regression Discontinuity Design (RDD), Jaehyun Song
Kobe University Graduate School of Law, Special Lecture on Political Methodology III (instructor: Naofumi Fujimura), presentation material
Presentation date: July 8, 2016
(A PDF version is also available at http://www.jaysong.net)
4. Why the instrumental variable method?
Instrumental variable (IV) method (操作変数法)
• Causal effects can be estimated even when there are unmeasured confounders = an epidemiologist's dream?
Hernan MA, Robins JM. Instruments for causal inference: an
epidemiologist's dream? Epidemiology. 2006;17(4):360-72.
5. We want to analyze A→Y
Brookhart, M. A., et al.(2010). Confounding control in healthcare database research:
challenges and potential approaches. Med Care, 48(6 Suppl), S114-120.
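The motivation above can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not an analysis from the cited papers: the variable names (`u`, `z`, `a`, `y`), the coefficients, and the sample size are all hypothetical. It compares a naive regression slope of Y on A with the simple IV ratio estimator cov(Z, Y) / cov(Z, A) when an unmeasured confounder U affects both A and Y.

```python
import numpy as np

def simulate(n=200_000, true_effect=2.0, seed=0):
    """Generate data with an unmeasured confounder u (all values are assumptions)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                       # unmeasured confounder
    z = rng.normal(size=n)                       # instrument, independent of u
    a = 0.8 * z + u + rng.normal(size=n)         # exposure A
    y = true_effect * a + 1.5 * u + rng.normal(size=n)  # outcome Y
    return z, a, y

def ols_slope(x, y):
    """Slope of a simple regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def iv_ratio(z, a, y):
    """Ratio (Wald-type) IV estimator with a single instrument."""
    return np.cov(z, y)[0, 1] / np.cov(z, a)[0, 1]

z, a, y = simulate()
naive_estimate = ols_slope(a, y)   # confounded by u
iv_estimate = iv_ratio(z, a, y)    # uses only variation in a driven by z
print(f"naive OLS: {naive_estimate:.3f}, IV: {iv_estimate:.3f}, truth: 2.000")
```

In this setup the naive slope is pulled away from the true effect by U, while the IV ratio stays close to it, provided the instrument really is independent of U and affects Y only through A.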
13. IV Reporting Checklist
2. Describe the association between the instrument and the explanatory variable using the partial F-statistic.
28% of the papers reported F-statistics or partial r2.
■ Example ● Methods section (partial F-statistic)
• We estimated the association of previous prescriptions
with actual prescriptions using linear regressions of
exposures on previous prescriptions and report the partial
r2 and partial F tests. Larger partial F statistics imply
stronger associations of previous prescriptions and
exposure.[46]
■ Example ● Results section
• previous prescription was strongly associated with actual
prescription
Example: Davies NM, et al. Epidemiology 2013; 24(3): 352-62.
Checklist: Davies NM, et al. Epidemiology 2013; 24(3): 363-9.
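The partial F-statistic and partial r2 in this checklist item can be computed by comparing a first-stage regression with and without the instrument. The sketch below is one possible way to do this with NumPy; the simulated data, the variable names (`age`, `previous_rx`, `actual_rx`), and the coefficients are hypothetical and are not taken from Davies et al.

```python
import numpy as np

def residual_sum_of_squares(design, target):
    """RSS from a least-squares fit of target on the design matrix."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def partial_f_and_r2(exposure, instruments, covariates):
    """Partial F and partial r2 of the instruments in the first stage."""
    n = exposure.shape[0]
    restricted = np.column_stack([np.ones(n), covariates])   # covariates only
    full = np.column_stack([restricted, instruments])        # plus instruments
    rss_r = residual_sum_of_squares(restricted, exposure)
    rss_f = residual_sum_of_squares(full, exposure)
    q = full.shape[1] - restricted.shape[1]                   # number of instruments
    df_resid = n - full.shape[1]
    f_stat = ((rss_r - rss_f) / q) / (rss_f / df_resid)
    partial_r2 = (rss_r - rss_f) / rss_r
    return f_stat, partial_r2

# Hypothetical data: previous prescription as instrument for actual prescription.
rng = np.random.default_rng(1)
n = 5000
age = rng.normal(size=n)
previous_rx = rng.integers(0, 2, size=n)
p_rx = np.clip(0.3 + 0.3 * previous_rx + 0.05 * age, 0.05, 0.95)
actual_rx = (rng.random(n) < p_rx).astype(float)

f_stat, partial_r2 = partial_f_and_r2(actual_rx, previous_rx, age)
print(f"partial F = {f_stat:.1f}, partial r2 = {partial_r2:.3f}")
```

A larger partial F indicates a stronger association between the instrument and the exposure after accounting for the covariates.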
14. IV Reporting Checklist
3. Report or test the association of the measured confounders with the instrument and with the explanatory variable, respectively.
■ Example ● Results (Table)
Risk differences were estimated using ordinary least squares regression of confounder
on exposure or instrument (COX-2s vs. nonselective NSAIDs).
Example: Davies NM, et al. Epidemiology 2013; 24(3): 352-62.
Checklist: Davies NM, et al. Epidemiology 2013; 24(3): 363-9.
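The comparison in this checklist item can be sketched as follows. For a binary confounder and a binary grouping variable, the OLS slope equals the difference in proportions, so the sketch uses that difference directly. The data are simulated and the names (`diabetes`, `physician_pref`, `prescribed`) are hypothetical stand-ins, not variables from the cited study.

```python
import numpy as np

def risk_difference(confounder, group):
    """Difference in the confounder's prevalence between group == 1 and group == 0.

    For binary inputs this equals the OLS slope of confounder on group.
    """
    return confounder[group == 1].mean() - confounder[group == 0].mean()

rng = np.random.default_rng(2)
n = 50_000
diabetes = (rng.random(n) < 0.3).astype(int)          # measured confounder
physician_pref = (rng.random(n) < 0.5).astype(int)    # instrument, independent of diabetes
p_rx = 0.2 + 0.2 * physician_pref + 0.3 * diabetes    # exposure depends on both
prescribed = (rng.random(n) < p_rx).astype(int)

rd_by_exposure = risk_difference(diabetes, prescribed)
rd_by_instrument = risk_difference(diabetes, physician_pref)
print(f"by exposure: {rd_by_exposure:.3f}, by instrument: {rd_by_instrument:.3f}")
```

If the instrument is plausible, measured confounders should be much better balanced across instrument levels than across exposure levels, which is what such a table is meant to show.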
15. IV Reporting Checklist
4. When multiple instruments are used, test the over-identifying restrictions (over-identification).
e.g., Sargan test or Hansen test[116,117]
Example: Palmer TM, et al. Stat Methods Med Res 2012; 21(3): 223-42.
Checklist: Davies NM, et al. Epidemiology 2013; 24(3): 363-9.
■ Example
● Methods section (Sargan test)
• In models including multiple instruments
the Sargan test of over-identification,
available in the ivreg2 command, was used to
test the joint validity of the instruments.[39]
● Results section (Sargan test + Table)
• For each multiple instrument model, the
Sargan over-identification test provides little
evidence against the joint validity of the
instruments.
(excerpt)
The example is a study that used genes as IVs.
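The quoted example used the Stata command ivreg2. As a rough illustration of what a Sargan test computes, the sketch below implements two-stage least squares and the Sargan statistic (n times the uncentered R2 from regressing the 2SLS residuals on the instruments) in NumPy. It is a simplified sketch with simulated data; the function names, coefficients, and the two-instrument setup are assumptions, and the p-value formula used here is valid only for one degree of freedom.

```python
import math
import numpy as np

def two_sls(y, x, instruments):
    """2SLS with one endogenous regressor x and an intercept."""
    n = y.shape[0]
    Z = np.column_stack([np.ones(n), instruments])
    X = np.column_stack([np.ones(n), x])
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]     # first-stage fitted values
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]      # second stage
    resid = y - X @ beta                                  # residuals use observed x
    return beta, resid, Z

def sargan_statistic(resid, Z, n_endogenous=1):
    """Sargan statistic and its degrees of freedom."""
    n = resid.shape[0]
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    r2 = (fitted @ fitted) / (resid @ resid)              # uncentered R^2
    df = (Z.shape[1] - 1) - n_endogenous                  # instruments minus endogenous
    return n * r2, df

def simulate(direct_effect, n=20_000, seed=3):
    """Two instruments; the second has a direct effect on y if direct_effect != 0."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)
    z = rng.normal(size=(n, 2))
    x = 0.5 * z[:, 0] + 0.5 * z[:, 1] + u + rng.normal(size=n)
    y = 1.0 * x + direct_effect * z[:, 1] + u + rng.normal(size=n)
    return y, x, z

results = {}
for label, direct_effect in [("valid", 0.0), ("invalid", 0.5)]:
    y, x, z = simulate(direct_effect)
    _, resid, Z = two_sls(y, x, z)
    stat, df = sargan_statistic(resid, Z)
    p_value = math.erfc(math.sqrt(stat / 2.0))            # chi-square tail, df = 1 only
    results[label] = (stat, df, p_value)
    print(f"{label}: Sargan = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

As the speaker notes caution, a non-significant result does not prove validity: the test still assumes that at least one instrument is exogenous.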
16. IV Reporting Checklist
5. When the outcome, explanatory variable, and instrument are binary, present a cross-tabulation of their frequencies.
Example: Davies NM, et al. Epidemiology 2013; 24(3): 352-62.
Checklist: Davies NM, et al. Epidemiology 2013; 24(3): 363-9.
■ Example ● Results section
• All combinations of surrogate instruments, prescriptions, and
outcomes had one or more events (Table 2)
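A cross-tabulation of this kind can be produced in a few lines. The sketch below counts every combination of a binary instrument, exposure, and outcome and reports how many cells are empty; the eight observations are made-up toy data used only to show the mechanics.

```python
import numpy as np

def cross_tabulate(instrument, exposure, outcome):
    """Return a 2x2x2 table of counts indexed as [instrument, exposure, outcome]."""
    table = np.zeros((2, 2, 2), dtype=int)
    np.add.at(table, (instrument, exposure, outcome), 1)
    return table

# Toy data (hypothetical): one entry per patient.
z = np.array([0, 0, 0, 1, 1, 1, 1, 0])
x = np.array([0, 1, 0, 1, 1, 0, 1, 1])
y = np.array([0, 0, 1, 1, 0, 1, 1, 1])

table = cross_tabulate(z, x, y)
empty_cells = int((table == 0).sum())

for zi in (0, 1):
    for xi in (0, 1):
        print(f"Z={zi}, X={xi}: Y=0 -> {table[zi, xi, 0]}, Y=1 -> {table[zi, xi, 1]}")
print("empty cells:", empty_cells)
```

Reporting the number of empty cells corresponds to the check in the quoted results sentence, which states that all combinations had one or more events.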
18. IV Reporting Checklist
6. When a linear model is used for a binary outcome, use robust (sandwich estimator) or bootstrap standard errors and, where necessary, account for clustering of study subjects in the analysis.
Checklist: Davies NM, et al. Epidemiology 2013; 24(3): 363-9.
■ Example
● Methods section (robust standard errors/clustering)
• We calculated robust standard errors accounting for
clustering of patients by physician and the binary
(heteroskedastic) outcome[36]
Example: Davies NM, et al. Epidemiology 2013; 24(3): 352-62.
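One way to obtain standard errors of this kind is a cluster-robust sandwich estimator applied to two-stage least squares. The sketch below is a simplified illustration without small-sample corrections, not the procedure used by Davies et al.; the data are simulated and the names (`physician`, `preference`, `frailty`, `prescribed`, `event`) and all probabilities are hypothetical.

```python
import numpy as np

def two_sls_cluster_robust(y, x, z, cluster):
    """2SLS point estimates with cluster-robust (sandwich) standard errors."""
    n = y.shape[0]
    X = np.column_stack([np.ones(n), x])
    Z = np.column_stack([np.ones(n), z])
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]      # first-stage fitted values
    bread = np.linalg.inv(X_hat.T @ X_hat)
    beta = bread @ (X_hat.T @ y)
    resid = y - X @ beta                                   # residuals use observed x
    scores = X_hat * resid[:, None]
    # Sum the score contributions within each cluster before forming the "meat".
    _, cluster_index = np.unique(cluster, return_inverse=True)
    summed = np.zeros((cluster_index.max() + 1, X.shape[1]))
    np.add.at(summed, cluster_index, scores)
    cov = bread @ (summed.T @ summed) @ bread
    return beta, np.sqrt(np.diag(cov))

# Hypothetical data: patients clustered by physician, binary outcome.
rng = np.random.default_rng(5)
n_physicians, per_physician = 400, 50
physician = np.repeat(np.arange(n_physicians), per_physician)
preference = rng.integers(0, 2, size=n_physicians)[physician]   # instrument
shock = rng.integers(0, 2, size=n_physicians)[physician]        # shared within physician
frailty = rng.integers(0, 2, size=physician.size)               # unmeasured confounder
prescribed = (rng.random(physician.size)
              < 0.2 + 0.4 * preference + 0.2 * frailty).astype(float)
event = (rng.random(physician.size)
         < 0.2 + 0.1 * prescribed + 0.2 * frailty + 0.1 * shock).astype(float)

beta, se = two_sls_cluster_robust(event, prescribed, preference, physician)
print(f"risk difference = {beta[1]:.3f} (cluster-robust SE {se[1]:.3f})")
```

Summing scores within physicians before forming the middle matrix is what lets the standard error reflect both the heteroskedasticity of a binary outcome and the correlation among patients of the same physician.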
19. IV Reporting Checklist
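1. State who the estimated result applies to, together with the relevant assumptions.
This was stated explicitly in 61% of studies. (Swanson 2013)
Swanson SA, et al.: Epidemiology, 24: 370-4, 2013
Checklist: Davies NM, et al. Epidemiology 2013; 24(3): 363-9.
For example,
local average treatment effect (LATE) under the monotonicity assumption = the effect among compliers
average treatment effect (ATE) under the no-effect-modification assumption = the effect in the entire study population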
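The distinction between the LATE and the ATE can be made concrete with a simulation in which treatment effects differ by compliance type. This is a sketch under assumed conditions: the proportions of never-takers, compliers, and always-takers and their effect sizes are invented for illustration, and there are no defiers, so monotonicity holds by construction.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Compliance types: 0 = never-taker, 1 = complier, 2 = always-taker (no defiers).
kind = rng.choice(3, size=n, p=[0.25, 0.5, 0.25])
effect = np.array([0.0, 2.0, 0.5])[kind]          # individual treatment effects (assumed)

z = rng.integers(0, 2, size=n)                    # binary instrument
x = np.where(kind == 1, z, (kind == 2).astype(int))  # exposure actually received
y = 1.0 + 0.5 * kind + effect * x + rng.normal(size=n)

# Wald estimator: effect of z on y divided by effect of z on x.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
ate = effect.mean()                               # average effect in everyone
late = effect[kind == 1].mean()                   # average effect in compliers

print(f"Wald IV estimate: {wald:.3f}")
print(f"LATE (compliers): {late:.3f}, ATE (everyone): {ate:.3f}")
```

Under monotonicity the IV estimate tracks the complier effect rather than the population average, which is why the checklist asks authors to say which of the two they are targeting.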
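23. IV reporting flowchart, first half
Swanson SA, et al.: Epidemiology, 24: 370-4, 2013
Assumptions
(1) The instrument is associated with the explanatory variable
(2) The instrument affects the outcome only through the explanatory variable (exclusion restriction)
(3) There is no other route from the instrument to the outcome
Verifiable
Not verifiable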
24. IV reporting flowchart, second half
Swanson SA, et al.: Epidemiology, 24: 370-4, 2013
Assumptions
(4h) for ATE: the effect is homogeneous
(4m) for LATE: monotonicity
No “defiers”
Assumes no interaction
28. IV Estimators
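• LATE; local average treatment effect (average causal effect among compliers): the difference in the expected value of the outcome between exposed and unexposed, among the compliers in the population
• ATE; average treatment effect (average causal effect): the difference in the expected value of the outcome if every member of the population changed from unexposed to exposed
• TET; treatment effect among the treated (average causal effect in the exposed group)
• TEU; treatment effect among the untreated (average causal effect in the unexposed group)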
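The four estimands above can be written compactly in potential-outcome notation. The speaker notes write the effect in the treated as E[Y(1) − Y(0) | X = 1]; the block below extends that notation, with X(z) denoting the exposure a person would receive under instrument value z. X(z) is an assumed notation for a binary instrument, not one used in the slides.

```latex
\begin{align*}
\mathrm{LATE} &= E[\,Y(1) - Y(0) \mid X(1) > X(0)\,] && \text{(compliers)} \\
\mathrm{ATE}  &= E[\,Y(1) - Y(0)\,]                  && \text{(whole population)} \\
\mathrm{TET}  &= E[\,Y(1) - Y(0) \mid X = 1\,]       && \text{(exposed)} \\
\mathrm{TEU}  &= E[\,Y(1) - Y(0) \mid X = 0\,]       && \text{(unexposed)}
\end{align*}
```

The four quantities coincide only when the treatment effect does not vary across these subgroups.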
29. IV Reporting Checklist
1. State who the estimated result applies to, together with the relevant assumptions.
■ Example
● Methods section (local average treatment effect)
• This is the local average causal effect for patients who
would be prescribed COX-2s by physicians with
prescribing preference u*, but not by physicians with a
lower preference v.
The specific group of patients for whom this parameter
applies is unknown.[45]
Checklist: Davies NM, et al. Epidemiology 2013; 24(3): 363-9.
Example: Davies NM, et al. Epidemiology 2013; 24(3): 352-62.
31. Classification of IVs and attention to bias
Garabedian LF, et al. Ann Intern Med. 2014;161(2):131-8.
systematic review in PubMed, EconLit, PsycINFO, Social Services
Abstracts, Social Sciences Citation Index, and Web of Science to
identify instrumental variable comparative effectiveness
research (CER) studies that were published in an English-
language, peer-reviewed journal through 31 December 2011
and conducted in the United States and other industrialized
countries.
187 IV studies in CER
Classifies the variables used as instruments to date and explains the confounders to consider for each (Genetics, however, was excluded from the IV classification).
32. Classification of IVs in Davies 2013
Davies NM, et al. Epidemiology 2013; 24(3): 363-9.
Regional characteristics
Treatment patterns of hospitals and physicians
Other
Date/time
Comorbidities
etc.
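33. Classification of IVs
4 Most Commonly Used Instrument Categories in
Comparative Effectiveness Research
• Distance to facility: distance from the patient's residence to the hospital
• Regional variation: treatment patterns in the region
• Facility variation: treatment patterns at the hospital
• Physician variation: treatment patterns of the physician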
Garabedian LF, et al. Ann Intern Med. 2014;161(2):131-8.
34. Examples of instrumental variables (IVs) by category
4 Most Commonly Used Instrument Categories in
Comparative Effectiveness Research
Garabedian LF, et al. Ann Intern Med. 2014;161(2):131-8.
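35. Assumptions of the instrumental variable method and their violation
Garabedian LF, et al. Ann Intern Med. 2014;161(2):131-8.
IV-outcome confounders
These confounders violate assumption (2) of the instrumental variable method, "the instrument affects the outcome only through the explanatory variable (exclusion restriction)"!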
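36. Classification of IV-outcome confounders
for the Most Commonly Used Instruments and a Mortality Outcome
Garabedian LF, et al. Ann Intern Med. 2014;161(2):131-8.
For all types of instrument
1. Geographic location: urban/rural
2. Patient characteristics: race, education, income, age, health insurance, health status and comorbidities, health behaviors
3. Treatment characteristics: whether other treatments were received, time to treatment
4. Facility characteristics: surgical volume, clinical department, whether a teaching facility, whether a public institution
For Regional variation
1. Provider supply: number of hospital beds, number of nursing care facilities
2. Technology adoption and utilization: provision of advanced medical care, prescribing tendencies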
37. Adjustment for IV-outcome confounders
for the Most Commonly Used Instruments and a Mortality Outcome
Garabedian LF, et al. Ann Intern Med. 2014;161(2):131-8.
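38. Conclusion
• In many studies that used the instrumental variable (IV) method (48 of 65 [74%]), analytic methods other than IV were used alongside it.
• Every analysis rests on assumptions and has limitations, so results should be compared only after these are well understood.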
Garabedian LF, et al. Ann Intern Med. 2014;161(2):131-8.
#12: Davies 2013, p366 Table 5.
A minority (n = 25, 28%) reported partial F-statistics or partial r2 of the association of the instruments with the exposures.
Under monotonicity, studies identify a weighted average of local average treatment effects. This causal parameter is the effect of treatment on those persons whose treatment decision was affected by the instrumental variable. The assumption of no effect modification by the instrument among the treated identifies the average effect of treatment on the treated subpopulation. (Davies 2013 p365)
This differs from the intention-to-treat result. (Davies 2013 p365)
Standard output from techniques such as two-stage least squares report standard errors that are valid under conditional homoscedasticity. If the variance of the error terms is heteroskedastic (eg, if the outcome is binary in a linear effects model), then the estimates of the confidence intervals may be biased. Therefore, studies should use heteroscedasticity robust standard errors (sandwich estimators). (Davies 2013 p365)
#14: It is generally accepted that F-test values <10 indicate weak instruments.[10] 10. Staiger D, Stock JH. Instrumental variables regression with weak instruments. Econometrica 1997; 65:557–586.
From the supplementary data of Mojtabai R, Crum RM. Cigarette smoking and onset of mood and anxiety disorders. Am J Public Health 2013; 103(9): 1656-65.
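The criterion is based on Staiger and Stock (1997), who wrote that instruments are empirically weak if the F-statistic is 10 or less when testing, in the first-stage estimation, the null hypothesis that the coefficients on the instruments are zero; Stock and Yogo (2005) later supplemented this.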
Example: Davies 2013, p360
The F-statistic indicated that the instrumental variable results did not suffer from weak instruments.[46]
Test of weak identification
Kleibergen-Paap Wald statistic, robust to heteroskedasticity.
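The F < 10 rule of thumb discussed in this note can be explored with a small Monte Carlo experiment. The sketch below is illustrative only: the sample size, number of replications, and coefficients are assumptions, and it uses the plain first-stage F-statistic for a single instrument rather than the Kleibergen-Paap statistic. It reports the median IV estimate and the average first-stage F for a weak and a strong instrument.

```python
import numpy as np

def one_replication(rng, n, first_stage_coef, true_effect=1.0):
    """Simulate one data set; return the IV estimate and the first-stage F."""
    u = rng.normal(size=n)                                   # unmeasured confounder
    z = rng.normal(size=n)                                   # single instrument
    x = first_stage_coef * z + u + rng.normal(size=n)
    y = true_effect * x + 1.5 * u + rng.normal(size=n)
    zc, xc, yc = z - z.mean(), x - x.mean(), y - y.mean()
    iv_estimate = (zc @ yc) / (zc @ xc)
    # With one instrument, the first-stage F equals the squared t-statistic.
    pi_hat = (zc @ xc) / (zc @ zc)
    resid = xc - pi_hat * zc
    sigma2 = (resid @ resid) / (n - 2)
    f_stat = pi_hat ** 2 * (zc @ zc) / sigma2
    return iv_estimate, f_stat

def summarize(first_stage_coef, n=500, reps=1000, seed=0):
    """Median IV estimate and mean first-stage F over repeated samples."""
    rng = np.random.default_rng(seed)
    results = np.array([one_replication(rng, n, first_stage_coef) for _ in range(reps)])
    return float(np.median(results[:, 0])), float(results[:, 1].mean())

weak_median, weak_f = summarize(first_stage_coef=0.02)
strong_median, strong_f = summarize(first_stage_coef=0.5)
print(f"weak:   median IV = {weak_median:.2f}, mean F = {weak_f:.1f}")
print(f"strong: median IV = {strong_median:.2f}, mean F = {strong_f:.1f}")
```

With a weak instrument the IV estimate is typically unstable and tends to drift toward the confounded regression estimate, whereas with a strong instrument it centers near the true effect of 1.0 assumed here.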
#15: The observed baseline covariables were strongly associated with prescriptions (Table 1). (Example: Davies 2013 p356)
#16: Palmer p231
There are two commonly used tests of over-identification; the Hansen test and the Sargan test.[39,48]Rejection of an over-identification test is taken to indicate that at least one of the instruments is not valid (i.e., it does not give the same estimate as the other instruments).[49]
p235
The Hausman tests suggest that the IV estimates using multiple instruments differ from the OLS estimate.
Dr. Takebayashi's second example
116. Hansen LP. Large sample properties of generalized method of moments estimators. Econometrica. 1982;50:1029–1054.
117. Sargan J. The estimation of economic relationships using instrumental variables. Econometrica. 1958;26:393–415.
It should be recalled that it is still necessary to assume that at least one instrument is exogenous. This means that if none of the instruments is exogenous, the Sargan, Hansen and Basmann statistics will be biased and inconsistent and can erroneously fail to reject the null hypothesis (Murray, 2006a).
#17: All combinations of surrogate instruments, prescriptions, and outcomes had one or more events (Table 2). (Example: Davies 2013 p357)
#20: This differs from the intention-to-treat result. (Davies 2013 p365)
Two options are the average treatment effect in the population and the local average treatment effect (LATE) in the subpopulation of “compliers.” For nondichotomous instruments, LATE-like effects are weighted averages of the effect in multiple subgroups. Because “compliers” cannot be identified, the LATE is a causal effect in an unknown subset of the population; for example, if one found a beneficial effect of treatment, it is unclear who the “compliers” would be who would have this benefit. Because the average treatment effect and the LATE will generally differ, investigators should be explicit about their reasons for choosing one over the other; only 61% of studies in our review were explicit. (Swanson 2013 p372)
#23, #24, #25: Monotonicity is generally implausible for the commonly proposed preference-based instruments. (Swanson 2013, p373.)
Whether estimating bounds or effects (see below), one needs to decide the causal effect of interest. Two options are the average treatment effect in the population and the local average treatment effect (LATE) in the subpopulation of “compliers.” For Davies et al, “compliers” are patients who would be prescribed a selective NSAID had they seen a physician who preferred selective NSAIDs, but would be prescribed a nonselective NSAID had they seen a physician who preferred nonselective NSAIDs. For nondichotomous instruments, LATE-like effects are weighted averages of the effect in multiple subgroups. Because “compliers” cannot be identified, the LATE is a causal effect in an unknown subset of the population; for example, if one found a beneficial effect of treatment, it is unclear who the “compliers” would be who would have this benefit. Because the average treatment effect and the LATE will generally differ, investigators should be explicit about their reasons for choosing one over the other; only 61% of studies in our review were explicit. (Swanson 2013, p372.)
Homogeneity
If interest is in the average treatment effect, the untestable fourth condition for point estimation may be either condition (4c) of identical (ie, constant) treatment effect for all individuals in the population—generally impossible for dichotomous outcomes and highly unlikely for others—or the weaker homogeneity condition (4h) of no additive effect modification across levels of the instrument within the treated and the untreated (no effect modification only in the treated would allow valid estimation of the effect in the treated). The next step in the flowchart is to assess this homogeneity condition (4h) using subject-matter knowledge.
Example: Davies p361
Limitations of instrumental variable analysis are its imprecision and the large sample sizes required, as well as the unverifiable assumptions required for point identification.
#30: This differs from the intention-to-treat result. (Davies 2013 p365)
■ Example
● Methods section (no effect modification)
• Under the no-effect-modification assumption, ψ0 is equal to the average effect of prescription on those prescribed, E[Y(1) − Y(0)|X = 1]. If no-effect-modification assumption also holds for the untreated, then the average causal effect is identified. However, as the prescriptions are binary, it is impossible for no-effect-modification assumption to hold simultaneously for those prescribed and not prescribed.[33] Assuming no effect modification, we further specified multiple previous prescriptions as instruments and used the generalized method of moments estimator to estimate weighted averages of the effect of prescription on those prescribed.[38,44]
The size of bias is a decreasing function of the strength of association of Z and X, and the proportion of the patients with Z = 0 who were not prescribed COX-2s.
#35: *Studies that used instruments from multiple categories were counted more than once. Percentages are based on 187 total instrumental variable comparative effectiveness research studies.
† “Treatment” refers to instrument assignment to treatment group rather than actual receipt of treatment.
#38: 65 instrumental variable CER studies that used 1 of the 4 most common instruments and a mortality outcome to determine whether the authors discussed or controlled for the potential instrument–outcome confounders.
None of the studies in our review controlled for all potential instrument–outcome confounders we identified in the literature.
Most (39 of 65 [60%]) used data from only administrative databases, such as electronic medical records and insurer claims.
* Analysis was limited to studies that used 1 of the 4 most commonly used instruments and a mortality outcome (see Supplement 3). Studies that used 1 of the 4 instruments appear in multiple columns.
† Studies that used procedure or facility volume as an instrument or independent variable were removed from the denominator.