Bayesian Non-parametric Causal Inference#
import arviz as az
import matplotlib.lines as mlines
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
import pymc as pm
import pymc_bart as pmb
import pytensor.tensor as pt
import statsmodels.api as sm
import xarray as xr
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
%config InlineBackend.figure_format = 'retina' # high resolution figures
az.style.use("arviz-darkgrid")
rng = np.random.default_rng(42)
Causal Inference and Propensity Scores#
Few claims are stronger than the assertion of a causal relationship, and few claims are more contentious. A naive model of the world, riddled with flimsy connections and non-sequiturs, is the hallmark of conspiracy theories and crankery. A refined and detailed knowledge of cause and effect, on the other hand, marked by clear expectations, plausible connections and compelling counterfactuals, will steer you smoothly through the buzzing, booming confusion of the world.
Causal inference in experimental settings is quite different from causal inference on observational data. In an experiment, treatment is randomly assigned. With observational data we never know the mechanism that assigned each subject of the analysis to their treatment status, which risks bias from selection effects that confound the analysis of the treatment. A propensity score is the estimated probability that each individual in the study population takes on the treatment status of interest (e.g. smoker/non-smoker, divorced/married). Below we review how understanding individuals' propensity for treatment can help mitigate selection bias.
Non-parametric methods for estimating propensity scores and treatment effects try to remain agnostic about the precise functional relationships and avoid model misspecification. This push stems from the view that excessive modelling assumptions breed needless confounding error and should be avoided. See Foundations of Agnostic Statistics, Aronow and Miller [2019]. We will ultimately argue that this minimalist "desert landscape" aesthetic shatters against causal questions of real consequence: substantive assumptions are needed even where non-parametric methods are available.
In this notebook we explain and motivate the use of propensity scores in the analysis of causal inference problems. Our focus is on how we (a) estimate propensity scores and (b) use them in the analysis of causal questions. We will see how they help avoid the risk of selection bias in causal inference, and where they can go wrong. For the Bayesian analyst, accustomed to using information in the form of priors to weight and re-weight their claims, the approach should feel comfortable: propensity-score weighting is just another opportunity to enrich a model with knowledge of the world. We show how to apply the scores directly, and then indirectly within debiased machine-learning approaches to causal inference.
We will use two datasets to illustrate these patterns: the NHEFS data used in Causal Inference: What If, Hernán and Robins [2020], and a second patient-centred dataset analysed in Bayesian Nonparametrics for Causal Inference and Missing Data, Daniels et al. [2024]. The contrast between non-parametric BART models and simpler regression models for estimating propensity scores and causal effects will be emphasised.
A note on propensity-score matching
Propensity scores are often treated as synonymous with the technique of propensity-score matching. We do not cover that topic here. It is a natural extension of propensity-score modelling, but in our view it introduces complexity through the requirements around (a) the choice of matching algorithm and (b) the information loss caused by the reduction in sample size.
Structure of the Presentation#
Below we will see a number of examples of non-parametric methods for estimating causal effects. Some of them work well, and some will mislead us. We will show how these methods serve as tools for imposing increasingly strict assumptions on our inferential framework. Throughout we use the potential-outcomes notation to discuss causal inference — we draw extensively on Aronow and Miller [2019], which you can consult for more detail. In brief, when we want to discuss the causal impact of a treatment status on an outcome \(Y\), we write \(Y(t)\) for the outcome measure \(Y\) under the treatment regime \(T = t\).
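In this notation the average treatment effect (ATE), the estimand we repeatedly target below, is simply the expected difference between the two potential outcomes (a standard definition, stated here for reference):
\[ \tau = ATE = E[Y(1) - Y(0)] = E[Y(1)] - E[Y(0)] \]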
Propensity scores applied to selection-effect bias
The NHEFS data on weight loss and its relationship to smoking
Propensity scores applied to selection-effect bias
Health expenditure data and its relationship to smoking
Double/Debiased ML applied to the estimation of the ATE
Mediation analysis applied to the estimation of direct and indirect effects
This escalating set of assumptions required to warrant causal inference under different threats of confounding is revealing of the inferential enterprise as a whole.
Why do we care about Propensity Scores?#
With observational data we cannot re-run the assignment mechanism, but we can estimate it and transform our data to weight the summaries within each group proportionally, so that the analysis is less influenced by the over-representation of different strata in each group. This is what we hope to achieve with propensity scores.
First, and somewhat superficially, the propensity score is a dimension-reduction technique. We take a complex covariate profile \(X_{i}\) representing an individual's measured attributes and reduce it to a scalar \(p^{i}_{T}(X)\). It is also a tool for thinking about an individual's potential outcomes under different treatment regimes. In the context of policy evaluation it can help partially analyse how strongly different strata of the population are incentivised towards adopting a policy. What drives adoption or assignment in each corner of the population? How are different strata of society steered towards or away from adoption of the policy? Understanding these dynamics is crucial for gauging why selection bias may arise in any sample data. Paul Goldsmith-Pinkham's lectures make this last point, and why the perspective appeals to structural econometricians, especially clear.
The key idea when thinking about propensity scores is that we cannot license causal claims unless (i) treatment assignment is independent of the covariate profiles, i.e. \(T \perp\!\!\!\perp X\), and (ii) the outcomes \(Y(0)\) and \(Y(1)\) are likewise conditionally independent of the treatment \(T | X\). If these conditions hold, we say \(T\) is strongly ignorable given \(X\). This is also sometimes called the unconfoundedness or exchangeability assumption. For each stratum of the population defined by a covariate profile \(X\), we require that which treatment status an individual adopts is as-if random once we control for \(X\). This means that, controlling for \(X\), any difference in outcomes between the treated and untreated groups can be attributed to the treatment itself rather than to confounding variables.
The theorem states that if \(T\) is strongly ignorable given \(X\), then (i) and (ii) also hold given \(p_{T}(X)\). Valid statistical inference can therefore proceed in a lower-dimensional space, using the propensity score as a proxy for the higher-dimensional data. This is useful because some strata of a complex covariate profile may be sparsely populated, so substituting the propensity score lets us avoid the risks of missing data in high dimensions. Causal inference is unconfounded when we have controlled for enough of the drivers of adoption that the selection effects within each covariate profile \(X\) look essentially random. The insight this suggests is that when you want to estimate a causal effect, you only need to control for the covariates that influence the probability of treatment assignment. More concretely, the assignment mechanism can be used for causal inference on observational data whenever it is easier to model than the outcome mechanism.
Given the assumption that we are measuring the right covariate profile to induce strong ignorability, propensity scores can then be used judiciously to support causal claims.
Propensity Scores in Pictures#
Some methodologists advocate analysing causal claims by positing and evaluating directed acyclic graphs that purport to represent the relationships of causal influence. In the case of propensity-score analysis we have the following picture. Note how the influence of X on the outcome is unchanged, but its influence on the treatment is bundled into the propensity-score metric. Our assumption of strong ignorability is doing all the work of licensing the causal claims; propensity-score methods effect these moves by making corrective use of the structure bundled into the propensity score.
fig, axs = plt.subplots(1, 2, figsize=(20, 15))
axs = axs.flatten()
graph = nx.DiGraph()
graph.add_node("X")
graph.add_node("p(X)")
graph.add_node("T")
graph.add_node("Y")
graph.add_edges_from([("X", "p(X)"), ("p(X)", "T"), ("T", "Y"), ("X", "Y")])
graph1 = nx.DiGraph()
graph1.add_node("X")
graph1.add_node("T")
graph1.add_node("Y")
graph1.add_edges_from([("X", "T"), ("T", "Y"), ("X", "Y")])
nx.draw(
graph,
arrows=True,
with_labels=True,
pos={"X": (1, 2), "p(X)": (1, 3), "T": (1, 4), "Y": (2, 1)},
ax=axs[1],
node_size=6000,
font_color="whitesmoke",
font_size=20,
)
nx.draw(
graph1,
arrows=True,
with_labels=True,
pos={"X": (1, 2), "T": (1, 4), "Y": (2, 1)},
ax=axs[0],
node_size=6000,
font_color="whitesmoke",
font_size=20,
)

In the pictures above we see how the inclusion of the propensity-score variable blocks the path between the covariate profile X and the treatment variable T. This is a useful perspective on the strong ignorability assumption, since it implies \(T \perp\!\!\!\perp X |p(X)\), which means that the covariate profiles are balanced across the treatment arms according to the propensity score. This is a testable implication of the posited design!
If our treatment status is such that individuals more or less actively select into it, then a naive comparison of the differences between treated and control groups will be misleading, because types of individuals (covariate profiles) are over-represented in the population. Randomisation solves this by balancing the covariate profiles across the treatment and control groups and ensuring that the outcome is independent of treatment assignment. But we cannot always randomise. Propensity scores are useful because they can help mimic as-if random assignment of treatment status in the sample data through specific transformations of the observed data.
This kind of assumption, together with the propensity-score-based balance tests that come with it, is often used in place of elaborating the structural DAG that systematically determines treatment assignment. The idea is that if we can achieve balance across the covariates conditional on the propensity score, then we have mimicked as-if random assignment, sparing ourselves the trouble of specifying too much structure and remaining agnostic about the mechanism. This can often be a useful strategy, but as we shall see, it sidesteps the specificity of the causal question and of the data-generating process.
Non-Confounded Inference: The NHEFS Data#
This dataset from the National Health and Nutrition Examination Survey records details of weight, activity and smoking habits for roughly 1500 individuals over two periods. The first period established a baseline, and the follow-up period came 10 years later. We will analyse whether an individual quit smoking before the follow-up visit (trt == 1). The outcome for each individual represents the relative weight gain/loss between the two periods.
try:
nhefs_df = pd.read_csv("../data/nhefs.csv")
except:
nhefs_df = pd.read_csv(pm.get_data("nhefs.csv"))
nhefs_df.head()
age | race | sex | smokeintensity | smokeyrs | wt71 | active_1 | active_2 | education_2 | education_3 | education_4 | education_5 | exercise_1 | exercise_2 | age^2 | wt71^2 | smokeintensity^2 | smokeyrs^2 | trt | outcome | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 42 | 1 | 0 | 30 | 29 | 79.04 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1764 | 6247.3216 | 900 | 841 | 0 | -10.093960 |
1 | 36 | 0 | 0 | 20 | 24 | 58.63 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1296 | 3437.4769 | 400 | 576 | 0 | 2.604970 |
2 | 56 | 1 | 1 | 20 | 26 | 56.81 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 3136 | 3227.3761 | 400 | 676 | 0 | 9.414486 |
3 | 68 | 1 | 0 | 3 | 53 | 59.42 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 4624 | 3530.7364 | 9 | 2809 | 0 | 4.990117 |
4 | 40 | 0 | 0 | 20 | 19 | 87.09 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1600 | 7584.6681 | 400 | 361 | 0 | 4.989251 |
We might wonder who is represented in the survey responses, and to what extent the differences measured in this survey correspond to patterns in the broader population. If we look at the crude difference in outcomes
raw_diff = nhefs_df.groupby("trt")[["outcome"]].mean()
print("Treatment Diff:", raw_diff["outcome"].iloc[1] - raw_diff["outcome"].iloc[0])
raw_diff
Treatment Diff: 2.540581454955888
outcome | |
---|---|
trt | |
0 | 1.984498 |
1 | 4.525079 |
We see some crude difference between the two groups, but drilling down further we might worry that the difference is driven by an imbalance of covariate profiles across the treatment and control groups. That is, we can inspect the differences implied within different strata of the data to see whether we have imbalance in different corners of the population. Such imbalance suggests some selection effect into the treatment status, which is exactly what we hope to address with propensity-score modelling.
strata_df = (
nhefs_df.groupby(
[
"trt",
"sex",
"race",
"active_1",
"active_2",
"education_2",
]
)[["outcome"]]
.agg(["count", "mean"])
.rename({"age": "count"}, axis=1)
)
global_avg = nhefs_df["outcome"].mean()
strata_df["global_avg"] = global_avg
strata_df["diff"] = strata_df[("outcome", "mean")] - strata_df["global_avg"]
strata_df.reset_index(inplace=True)
strata_df.columns = [" ".join(col).strip() for col in strata_df.columns.values]
strata_df.style.background_gradient(axis=0)
trt | sex | race | active_1 | active_2 | education_2 | outcome count | outcome mean | global_avg | diff | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 193 | 2.858158 | 2.638300 | 0.219859 |
1 | 0 | 0 | 0 | 0 | 0 | 1 | 46 | 3.870131 | 2.638300 | 1.231831 |
2 | 0 | 0 | 0 | 0 | 1 | 0 | 29 | 4.095394 | 2.638300 | 1.457095 |
3 | 0 | 0 | 0 | 0 | 1 | 1 | 5 | 0.568137 | 2.638300 | -2.070163 |
4 | 0 | 0 | 0 | 1 | 0 | 0 | 160 | 0.709439 | 2.638300 | -1.928861 |
5 | 0 | 0 | 0 | 1 | 0 | 1 | 36 | 0.994271 | 2.638300 | -1.644029 |
6 | 0 | 0 | 1 | 0 | 0 | 0 | 36 | 2.888559 | 2.638300 | 0.250259 |
7 | 0 | 0 | 1 | 0 | 0 | 1 | 4 | 6.322334 | 2.638300 | 3.684034 |
8 | 0 | 0 | 1 | 0 | 1 | 0 | 4 | -5.501240 | 2.638300 | -8.139540 |
9 | 0 | 0 | 1 | 1 | 0 | 0 | 20 | -1.354505 | 2.638300 | -3.992804 |
10 | 0 | 0 | 1 | 1 | 0 | 1 | 9 | 0.442138 | 2.638300 | -2.196162 |
11 | 0 | 1 | 0 | 0 | 0 | 0 | 157 | 2.732690 | 2.638300 | 0.094390 |
12 | 0 | 1 | 0 | 0 | 0 | 1 | 59 | 2.222754 | 2.638300 | -0.415546 |
13 | 0 | 1 | 0 | 0 | 1 | 0 | 36 | 2.977257 | 2.638300 | 0.338957 |
14 | 0 | 1 | 0 | 0 | 1 | 1 | 17 | 2.087297 | 2.638300 | -0.551003 |
15 | 0 | 1 | 0 | 1 | 0 | 0 | 200 | 1.700405 | 2.638300 | -0.937895 |
16 | 0 | 1 | 0 | 1 | 0 | 1 | 55 | -0.492455 | 2.638300 | -3.130754 |
17 | 0 | 1 | 1 | 0 | 0 | 0 | 19 | 2.644629 | 2.638300 | 0.006329 |
18 | 0 | 1 | 1 | 0 | 0 | 1 | 18 | 3.047791 | 2.638300 | 0.409491 |
19 | 0 | 1 | 1 | 0 | 1 | 0 | 9 | 1.637378 | 2.638300 | -1.000922 |
20 | 0 | 1 | 1 | 0 | 1 | 1 | 4 | 0.735846 | 2.638300 | -1.902454 |
21 | 0 | 1 | 1 | 1 | 0 | 0 | 34 | 0.647564 | 2.638300 | -1.990736 |
22 | 0 | 1 | 1 | 1 | 0 | 1 | 13 | 4.815856 | 2.638300 | 2.177556 |
23 | 1 | 0 | 0 | 0 | 0 | 0 | 76 | 4.737206 | 2.638300 | 2.098906 |
24 | 1 | 0 | 0 | 0 | 0 | 1 | 18 | 5.242349 | 2.638300 | 2.604049 |
25 | 1 | 0 | 0 | 0 | 1 | 0 | 23 | 3.205170 | 2.638300 | 0.566870 |
26 | 1 | 0 | 0 | 0 | 1 | 1 | 4 | 6.067620 | 2.638300 | 3.429320 |
27 | 1 | 0 | 0 | 1 | 0 | 0 | 70 | 4.630845 | 2.638300 | 1.992545 |
28 | 1 | 0 | 0 | 1 | 0 | 1 | 12 | 7.570608 | 2.638300 | 4.932308 |
29 | 1 | 0 | 1 | 0 | 0 | 0 | 4 | 7.201967 | 2.638300 | 4.563668 |
30 | 1 | 0 | 1 | 0 | 0 | 1 | 3 | 10.698826 | 2.638300 | 8.060526 |
31 | 1 | 0 | 1 | 1 | 0 | 0 | 7 | 0.778359 | 2.638300 | -1.859941 |
32 | 1 | 0 | 1 | 1 | 0 | 1 | 3 | 9.790449 | 2.638300 | 7.152149 |
33 | 1 | 1 | 0 | 0 | 0 | 0 | 55 | 5.095007 | 2.638300 | 2.456708 |
34 | 1 | 1 | 0 | 0 | 0 | 1 | 7 | 9.832617 | 2.638300 | 7.194318 |
35 | 1 | 1 | 0 | 0 | 1 | 0 | 14 | -1.587808 | 2.638300 | -4.226108 |
36 | 1 | 1 | 0 | 0 | 1 | 1 | 4 | 8.761674 | 2.638300 | 6.123375 |
37 | 1 | 1 | 0 | 1 | 0 | 0 | 67 | 3.862593 | 2.638300 | 1.224293 |
38 | 1 | 1 | 0 | 1 | 0 | 1 | 17 | 3.162162 | 2.638300 | 0.523862 |
39 | 1 | 1 | 1 | 0 | 0 | 0 | 5 | 0.522196 | 2.638300 | -2.116104 |
40 | 1 | 1 | 1 | 0 | 0 | 1 | 2 | 7.826238 | 2.638300 | 5.187938 |
41 | 1 | 1 | 1 | 1 | 0 | 0 | 8 | 5.756044 | 2.638300 | 3.117744 |
42 | 1 | 1 | 1 | 1 | 0 | 1 | 4 | 5.440875 | 2.638300 | 2.802575 |
Next we plot the deviations from the global mean in both groups. Here each row is a stratum, and the colour denotes the treatment group the stratum belongs to. The size of the stratum is indicated by the size of the dot.
def make_strata_plot(strata_df):
joined_df = strata_df[strata_df["trt"] == 0].merge(
strata_df[strata_df["trt"] == 1], on=["sex", "race", "active_1", "active_2", "education_2"]
)
joined_df.sort_values("diff_y", inplace=True)
# Func to draw line segment
def newline(p1, p2, color="black"):
ax = plt.gca()
l = mlines.Line2D([p1[0], p2[0]], [p1[1], p2[1]], color="black", linestyle="--")
ax.add_line(l)
return l
fig, ax = plt.subplots(figsize=(20, 15))
ax.scatter(
joined_df["diff_x"],
joined_df.index,
color="red",
alpha=0.7,
label="Control Sample Size",
s=joined_df["outcome count_x"] * 3,
)
ax.scatter(
joined_df["diff_y"],
joined_df.index,
color="blue",
alpha=0.7,
label="Treatment Sample Size",
s=joined_df["outcome count_y"] * 3,
)
for i, p1, p2 in zip(joined_df.index, joined_df["diff_x"], joined_df["diff_y"]):
newline([p1, i], [p2, i])
ax.set_xlabel("Difference from the Global Mean")
ax.set_title(
"Differences from Global Mean \n by Treatment Status and Strata",
fontsize=20,
fontweight="bold",
)
ax.axvline(0, color="k")
ax.set_ylabel("Strata Index")
ax.legend()
make_strata_plot(strata_df)

A fairly consistent pattern is displayed across the strata — the treated group achieves outcomes above the global mean in almost all strata. The sample sizes within each treatment group in each stratum are small. We then take the average of the strata-specific averages to see a more pronounced distinction.
strata_expected_df = strata_df.groupby("trt")[["outcome count", "outcome mean", "diff"]].agg(
{"outcome count": ["sum"], "outcome mean": "mean", "diff": "mean"}
)
print(
"Treatment Diff:",
strata_expected_df[("outcome mean", "mean")].iloc[1]
- strata_expected_df[("outcome mean", "mean")].iloc[0],
)
strata_expected_df
Treatment Diff: 3.662365976037309
outcome count | outcome mean | diff | |
---|---|---|---|
sum | mean | mean | |
trt | |||
0 | 1163 | 1.767384 | -0.870916 |
1 | 403 | 5.429750 | 2.791450 |
This kind of exercise suggests that the way our sample was constructed — some aspect of the data-generating process — pulls certain strata of the population away from adopting the treatment, and that appearing in the treated group pulls the outcome variable to the right. The propensity to be treated is skewed and thereby contaminates any naive causal claim. We should reasonably worry that failing to account for this bias could lead to mistaken conclusions about (a) the direction and (b) the magnitude of the effect of quitting smoking on weight. This is natural in observational studies, where we never randomise into the treatment condition, but modelling the propensity for treatment can help us mimic the conditions of randomised assignment.
Preparing the Data for Modelling#
We now simply prepare the data in a particular format for modelling, dropping outcome and trt from the covariate data X.
X = nhefs_df.copy()
y = nhefs_df["outcome"]
t = nhefs_df["trt"]
X = X.drop(["trt", "outcome"], axis=1)
X.head()
age | race | sex | smokeintensity | smokeyrs | wt71 | active_1 | active_2 | education_2 | education_3 | education_4 | education_5 | exercise_1 | exercise_2 | age^2 | wt71^2 | smokeintensity^2 | smokeyrs^2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 42 | 1 | 0 | 30 | 29 | 79.04 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1764 | 6247.3216 | 900 | 841 |
1 | 36 | 0 | 0 | 20 | 24 | 58.63 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1296 | 3437.4769 | 400 | 576 |
2 | 56 | 1 | 1 | 20 | 26 | 56.81 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 3136 | 3227.3761 | 400 | 676 |
3 | 68 | 1 | 0 | 3 | 53 | 59.42 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 4624 | 3530.7364 | 9 | 2809 |
4 | 40 | 0 | 0 | 20 | 19 | 87.09 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1600 | 7584.6681 | 400 | 361 |
Modelling the Propensity Score#
As a first step we define a model-building function to capture the probability of treatment — the propensity score for each individual.
We specify two types of model to evaluate: one that relies entirely on a logistic regression, and another that uses BART (Bayesian Additive Regression Trees) to model the relationship between the covariates and the treatment assignment. The benefit of the BART model is the use of a tree-based algorithm to explore interaction effects among the various strata of the sample data. For more details on BART and the PyMC implementation see here.
Having a flexible model like BART is key to understanding what we are doing when we make inverse-propensity-weighting adjustments. The idea is that any given stratum of our dataset is described by a set of covariates. Types of individuals are represented by these covariate profiles — the vector of attributes \(X\). The share of observations selected for any given covariate profile in our data represents a bias towards individuals of that type. Modelling the assignment mechanism with a flexible model like BART allows us to estimate outcomes across a range of strata in the population.
First, we model the individual propensity scores as a function of the individuals' covariate profiles
def make_propensity_model(X, t, bart=True, probit=True, samples=1000, m=50):
coords = {"coeffs": list(X.columns), "obs": range(len(X))}
with pm.Model(coords=coords) as model_ps:
X_data = pm.MutableData("X", X)
t_data = pm.MutableData("t", t)
if bart:
mu = pmb.BART("mu", X, t, m=m)
if probit:
p = pm.Deterministic("p", pm.math.invprobit(mu))
else:
p = pm.Deterministic("p", pm.math.invlogit(mu))
else:
b = pm.Normal("b", mu=0, sigma=1, dims="coeffs")
mu = pm.math.dot(X_data, b)
p = pm.Deterministic("p", pm.math.invlogit(mu))
t_pred = pm.Bernoulli("t_pred", p=p, observed=t_data, dims="obs")
idata = pm.sample_prior_predictive()
idata.extend(
pm.sample(
samples, init="adapt_diag", random_seed=105, idata_kwargs={"log_likelihood": True}
)
)
idata.extend(pm.sample_posterior_predictive(idata))
return model_ps, idata
m_ps_logit, idata_logit = make_propensity_model(X, t, bart=False, samples=1000)
Sampling: [b, t_pred]
Auto-assigning NUTS sampler...
Initializing NUTS using adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [b]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 98 seconds.
Sampling: [t_pred]
m_ps_probit, idata_probit = make_propensity_model(X, t, bart=True, probit=True, samples=4000)
Sampling: [mu, t_pred]
Multiprocess sampling (4 chains in 4 jobs)
PGBART: [mu]
Sampling 4 chains for 1_000 tune and 4_000 draw iterations (4_000 + 16_000 draws total) took 85 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [t_pred]
Using the Propensity Scores: Weights and Pseudo-Populations#
Once we have fitted these models we can compare how each attributes a propensity for treatment (in our case, a propensity to quit smoking) to each of the measured individuals. Let's plot the distributions of the propensity scores for the first 20 or so individuals in the study to see the differences between the two models.
az.plot_forest(
[idata_logit, idata_probit],
var_names=["p"],
coords={"p_dim_0": range(20)},
figsize=(10, 13),
combined=True,
kind="ridgeplot",
model_names=["Logistic Regression", "BART"],
r_hat=True,
ridgeplot_alpha=0.4,
);

These propensity scores can be extracted and examined alongside the other covariates.
ps_logit = idata_logit["posterior"]["p"].mean(dim=("chain", "draw")).round(2)
ps_logit
<xarray.DataArray 'p' (p_dim_0: 1566)> array([0.1 , 0.15, 0.13, ..., 0.13, 0.47, 0.18]) Coordinates: * p_dim_0 (p_dim_0) int64 0 1 2 3 4 5 6 ... 1560 1561 1562 1563 1564 1565
ps_probit = idata_probit["posterior"]["p"].mean(dim=("chain", "draw")).round(2)
ps_probit
<xarray.DataArray 'p' (p_dim_0: 1566)> array([0.19, 0.18, 0.23, ..., 0.18, 0.31, 0.21]) Coordinates: * p_dim_0 (p_dim_0) int64 0 1 2 3 4 5 6 ... 1560 1561 1562 1563 1564 1565
Here we plot the distributions of the propensity scores under each model, and show how the inverse of the propensity-score weights would apply to the observed data points.
fig, axs = plt.subplots(3, 2, figsize=(20, 15))
axs = axs.flatten()
colors = {1: "blue", 0: "red"}
axs[0].hist(ps_logit.values[t == 0], ec="black", color="red", bins=30, label="Control", alpha=0.6)
axs[0].hist(
ps_logit.values[t == 1], ec="black", color="blue", bins=30, label="Treatment", alpha=0.6
)
axs[2].hist(ps_logit.values[t == 0], ec="black", color="red", bins=30, label="Control", alpha=0.6)
axs[2].hist(
1 - ps_logit.values[t == 1], ec="black", color="blue", bins=30, label="Treatment", alpha=0.6
)
axs[0].set_xlabel("Propensity Scores")
axs[1].set_xlabel("Propensity Scores")
axs[1].set_ylabel("Count of Observations")
axs[0].set_ylabel("Count of Observations")
axs[0].axvline(0.9, color="black", linestyle="--", label="Extreme Propensity Score")
axs[0].axvline(0.1, color="black", linestyle="--")
axs[1].hist(ps_probit.values[t == 0], ec="black", color="red", bins=30, label="Control", alpha=0.6)
axs[1].hist(
ps_probit.values[t == 1], ec="black", color="blue", bins=30, label="Treatment", alpha=0.6
)
axs[3].hist(ps_probit.values[t == 0], ec="black", color="red", bins=30, label="Control", alpha=0.6)
axs[3].hist(
1 - ps_probit.values[t == 1], ec="black", color="blue", bins=30, label="Treatment", alpha=0.6
)
axs[3].set_title("Overlap of inverted Propensity Scores")
axs[2].set_title("Overlap of inverted Propensity Scores")
axs[1].axvline(0.9, color="black", linestyle="--", label="Extreme Propensity Score")
axs[1].axvline(0.1, color="black", linestyle="--")
axs[2].axvline(0.9, color="black", linestyle="--", label="Extreme Propensity Score")
axs[2].axvline(0.1, color="black", linestyle="--")
axs[3].axvline(0.9, color="black", linestyle="--", label="Extreme Propensity Score")
axs[3].axvline(0.1, color="black", linestyle="--")
axs[0].set_xlim(0, 1)
axs[1].set_xlim(0, 1)
axs[0].set_title("Propensity Scores under Logistic Regression", fontsize=20)
axs[1].set_title(
"Propensity Scores under Non-Parametric BART model \n with probit transform", fontsize=20
)
axs[4].scatter(
X["age"], y, color=t.map(colors), s=(1 / ps_logit.values) * 20, ec="black", alpha=0.4
)
axs[4].set_xlabel("Age")
axs[5].set_xlabel("Age")
axs[5].set_ylabel("y")
axs[4].set_ylabel("y")
axs[4].set_title("Sized by IP Weights")
axs[5].set_title("Sized by IP Weights")
axs[5].scatter(
X["age"], y, color=t.map(colors), s=(1 / ps_probit.values) * 20, ec="black", alpha=0.4
)
red_patch = mpatches.Patch(color="red", label="Control")
blue_patch = mpatches.Patch(color="blue", label="Treated")
axs[2].legend(handles=[red_patch, blue_patch])
axs[0].legend()
axs[1].legend()
axs[5].legend(handles=[red_patch, blue_patch]);

When we consider the distribution of the propensity scores we are looking for positivity — the requirement that, within any patient subgroup defined by a combination of covariate values, the conditional probability of receiving a given treatment cannot be 0 or 1. We do not want to overfit our propensity model and achieve perfect treatment/control assignment, since that degrades their usefulness in the weighting schemes. The important feature of propensity scores is that they are a measure of similarity across groups — extreme predictions of 0 or 1 threaten the identifiability of the causal treatment effect. Extreme propensity scores (within a stratum defined by some covariate profile) suggest that the model does not believe a causal contrast between treatments is well defined for those individuals. Or, at the very least, that the sample size of one of the contrast groups will be vanishingly small. Some authors recommend drawing boundaries at (.1, .9) to filter out or clip the extreme cases to address such overfitting, but this is essentially an admission of model misspecification.
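As a minimal sketch of that clipping idea (our own illustration, not a step performed in this notebook), using the logistic-regression scores extracted above:
ps_vals = ps_logit.values
# share of scores outside the commonly suggested (0.1, 0.9) window
print("Share of extreme scores:", np.mean((ps_vals < 0.1) | (ps_vals > 0.9)))
# clip rather than drop the extreme scores before weighting
ps_clipped = np.clip(ps_vals, 0.1, 0.9)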
We can also look at the balance of the covariates within partitions of the propensity score. These kinds of plots help us assess whether, conditional on the propensity score, our sample shows balanced profiles across the treated and control groups. We are seeking to show that the covariate profiles are balanced conditional on the propensity score.
temp = X.copy()
temp["ps"] = ps_logit.values
temp["ps_cut"] = pd.qcut(temp["ps"], 5)
def plot_balance(temp, col, t):
fig, axs = plt.subplots(1, 5, figsize=(20, 9))
axs = axs.flatten()
for c, ax in zip(np.sort(temp["ps_cut"].unique()), axs):
std0 = temp[(t == 0) & (temp["ps_cut"] == c)][col].std()
std1 = temp[(t == 1) & (temp["ps_cut"] == c)][col].std()
pooled_std = (std0 + std1) / 2
mean_diff = (
temp[(t == 0) & (temp["ps_cut"] == c)][col].mean()
- temp[(t == 1) & (temp["ps_cut"] == c)][col].mean()
) / pooled_std
ax.hist(
temp[(t == 0) & (temp["ps_cut"] == c)][col],
alpha=0.6,
color="red",
density=True,
ec="black",
bins=10,
cumulative=False,
)
ax.hist(
temp[(t == 1) & (temp["ps_cut"] == c)][col],
alpha=0.4,
color="blue",
density=True,
ec="black",
bins=10,
cumulative=False,
)
ax.set_title(f"Propensity Score: {c} \n Standardised Mean Diff {np.round(mean_diff, 4)} ")
ax.set_xlabel(col)
red_patch = mpatches.Patch(color="red", label="Control")
blue_patch = mpatches.Patch(color="blue", label="Treated")
axs[0].legend(handles=[red_patch, blue_patch])
plt.suptitle(
f"Density Functions of {col} \n by Partitions of Propensity Score",
fontsize=20,
fontweight="bold",
)
plot_balance(temp, "age", t)

plot_balance(temp, "wt71", t)

plot_balance(temp, "smokeyrs", t)

plot_balance(temp, "smokeintensity", t)

In an ideal world we would achieve perfect balance across the treatment groups for each covariate, but even the approximate balance we see here is useful. When we have good covariate balance conditional on the propensity score, we can then use weighting schemes based on the propensity score in models of the statistical summaries, so as to "correct" the representation of covariate profiles in the two groups. If an individual's propensity score makes them extremely likely to take the treatment status, say 0.95, then we want to downweight their importance when they appear in the treated group and upweight their importance when they appear in the control group. This makes sense because their high propensity score implies that similar individuals are already plentiful in the treated group but unlikely to appear in the control group. Our corrective strategy is therefore to re-weight their contribution to the summary statistics of each group.
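As a tiny numeric illustration of this re-weighting logic (a sketch of our own, not part of the original notebook), consider the inverse-propensity weights implied by a few scores:
import numpy as np

example_ps = np.array([0.95, 0.50, 0.05])  # estimated P(T=1 | X) for three hypothetical individuals
print(1 / example_ps)        # weight if observed treated:  ~[ 1.05,  2.0, 20.0 ]
print(1 / (1 - example_ps))  # weight if observed control:  ~[20.0 ,  2.0,  1.05]
The individual who was almost certain to be treated barely counts extra in the treated group but would be heavily up-weighted were they found in the control group, and vice versa.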
Robust and Doubly Robust Weighting Schemes#
We have been keen to stress that propensity-based weights are a corrective. The causal analyst has an opportunity to tip the scales and adjust the representative shares assigned to the treated and control individuals. However, there is no one-size-fits-all correction, and a variety of alternatives have naturally emerged to fill the gaps where simple propensity-score weighting fails. We will see a number of alternative weighting schemes below.
The main distinction to flag is between the raw propensity-score weights and the doubly robust theory of propensity-score weighting.
Doubly robust methods are so named because they represent a compromise estimator of the causal effect, combining (i) a model of the treatment assignment (such as the propensity score) and (ii) a more direct model of the response outcome. The approach combines the two estimators in such a way that only one of the models needs to be correctly specified to yield a statistically unbiased estimate of the treatment effect. They work well because the analyst chooses the components of their weighting scheme in the hope that they have correctly modelled (i) or (ii). There is an interesting tension here with the crisp distinction between "design-based" and "model-based" inference — the demand for doubly robust estimators is something of a concession to the purist methodologist; we require both when it is not obvious that either alone would suffice.
Estimating Treatment Effects#
In this section we build a set of functions to extract samples from the posterior distribution of our propensity scores, and to use those propensity scores to re-weight the observed outcome variable in our treated and control groups and recompute the average treatment effect (ATE). The data are re-weighted with one of three inverse-probability weighting schemes, and three views are then plotted: (1) the raw propensity scores per group, (2) the raw outcome distributions and (3) the re-weighted outcome distributions.
First we define a handful of helper functions for each of the weighting adjustments we will explore
def make_robust_adjustments(X, t):
X["trt"] = t
p_of_t = X["trt"].mean()
X["i_ps"] = np.where(t, (p_of_t / X["ps"]), (1 - p_of_t) / (1 - X["ps"]))
n_ntrt = X[X["trt"] == 0].shape[0]
n_trt = X[X["trt"] == 1].shape[0]
outcome_trt = X[X["trt"] == 1]["outcome"]
outcome_ntrt = X[X["trt"] == 0]["outcome"]
i_propensity0 = X[X["trt"] == 0]["i_ps"]
i_propensity1 = X[X["trt"] == 1]["i_ps"]
weighted_outcome1 = outcome_trt * i_propensity1
weighted_outcome0 = outcome_ntrt * i_propensity0
return weighted_outcome0, weighted_outcome1, n_ntrt, n_trt
def make_raw_adjustments(X, t):
X["trt"] = t
X["ps"] = np.where(X["trt"], X["ps"], 1 - X["ps"])
X["i_ps"] = 1 / X["ps"]
n_ntrt = n_trt = len(X)
outcome_trt = X[X["trt"] == 1]["outcome"]
outcome_ntrt = X[X["trt"] == 0]["outcome"]
i_propensity0 = X[X["trt"] == 0]["i_ps"]
i_propensity1 = X[X["trt"] == 1]["i_ps"]
weighted_outcome1 = outcome_trt * i_propensity1
weighted_outcome0 = outcome_ntrt * i_propensity0
return weighted_outcome0, weighted_outcome1, n_ntrt, n_trt
def make_doubly_robust_adjustment(X, t, y):
m0 = sm.OLS(y[t == 0], X[t == 0].astype(float)).fit()
m1 = sm.OLS(y[t == 1], X[t == 1].astype(float)).fit()
m0_pred = m0.predict(X)
m1_pred = m1.predict(X)
X["trt"] = t
X["y"] = y
## Compromise between outcome and treatment assignment model
weighted_outcome0 = (1 - X["trt"]) * (X["y"] - m0_pred) / (1 - X["ps"]) + m0_pred
weighted_outcome1 = X["trt"] * (X["y"] - m1_pred) / X["ps"] + m1_pred
return weighted_outcome0, weighted_outcome1, None, None
It is worth expanding on the theory of doubly robust estimation here. Above we showed the code for striking the compromise between the treatment-assignment estimator and the response or outcome estimator. But why is this useful? Consider the functional form of the doubly robust estimator.
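The formulas themselves do not survive in this rendering; reconstructed from the make_doubly_robust_adjustment code above, the estimators of the expected potential outcomes take the form
\[ \hat{E}[Y(1)] = \frac{1}{N}\sum_{i=1}^{N}\left[ \frac{T_{i}\,(Y_{i} - \hat{m}_{1}(X_{i}))}{\hat{p}(X_{i})} + \hat{m}_{1}(X_{i}) \right] \qquad \hat{E}[Y(0)] = \frac{1}{N}\sum_{i=1}^{N}\left[ \frac{(1 - T_{i})\,(Y_{i} - \hat{m}_{0}(X_{i}))}{1 - \hat{p}(X_{i})} + \hat{m}_{0}(X_{i}) \right] \]
with the ATE estimated as the difference of the two, where \(\hat{m}_{1}, \hat{m}_{0}\) are the outcome models fitted on the treated and control units and \(\hat{p}(X)\) is the estimated propensity score.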
It is not immediately intuitive how these formulas effect a compromise between the outcome model and the treatment-assignment model. But consider the extreme cases. First, imagine our model \(m_{1}\) fits the outcome \(Y\) perfectly: the numerator of the fraction is 0 and we are left with the average of the model predictions. Conversely, imagine the model \(m_{1}\) is misspecified and there is some error \(\epsilon > 0\) in the numerator. If the propensity-score model is accurate, then within the treated class our denominator should be high... something like \(\sim N(.9, .05)\), so the estimator adds a number close to \(\epsilon\) back onto the \(m_{1}\) predictions. Similar reasoning applies in the \(Y(0)\) case.
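A toy check of that reasoning (not from the original notebook): we grant ourselves the true potential outcome Y(1), so we can isolate how the estimator behaves when either component model is right.
rng_toy = np.random.default_rng(0)
n = 10_000
p_true = rng_toy.uniform(0.2, 0.8, n)   # true propensity scores
t_toy = rng_toy.binomial(1, p_true)     # treatment assignment
y1 = 2.0 + rng_toy.normal(0, 1, n)      # potential outcome under treatment, E[Y(1)] = 2

# Case 1: the outcome model m1 is perfect -> the correction term vanishes
m1_perfect = y1
print((t_toy * (y1 - m1_perfect) / p_true + m1_perfect).mean())  # ~2.0

# Case 2: m1 is biased downward by 0.5 but the propensity model is correct ->
# the inverse-propensity correction adds the missing 0.5 back on average
m1_biased = y1 - 0.5
print((t_toy * (y1 - m1_biased) / p_true + m1_biased).mean())    # ~2.0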
The other estimators are simpler — they represent scalings of the outcome variable by transformations of the estimated propensity score. Each estimator attempts to re-balance the data points by what we have learned about the propensity for treatment. But the differences between the estimators matter in their application, as we shall see.
Now we define the plotting functions for the raw and re-weighted outcomes.
def plot_weights(bins, top0, top1, ylim, ax):
ax.axhline(0, c="gray", linewidth=1)
ax.set_ylim(ylim)
bars0 = ax.bar(bins[:-1] + 0.025, top0, width=0.04, facecolor="red", alpha=0.6)
bars1 = ax.bar(bins[:-1] + 0.025, -top1, width=0.04, facecolor="blue", alpha=0.6)
for bars in (bars0, bars1):
for bar in bars:
bar.set_edgecolor("black")
for x, y in zip(bins, top0):
ax.text(x + 0.025, y + 10, str(y), ha="center", va="bottom")
for x, y in zip(bins, top1):
ax.text(x + 0.025, -y - 10, str(y), ha="center", va="top")
def make_plot(
X,
idata,
lower_bins=[np.arange(1, 30, 1), np.arange(1, 30, 1)],
ylims=[
(-100, 370),
(
-40,
100,
),
(-50, 110),
],
text_pos=(20, 80),
ps=None,
method="robust",
):
X = X.copy()
if ps is None:
n_list = list(range(1000))
## Choose random ps score from posterior
choice = np.random.choice(n_list, 1)[0]
X["ps"] = idata["posterior"]["p"].stack(z=("chain", "draw"))[:, choice].values
else:
X["ps"] = ps
X["trt"] = t
propensity0 = X[X["trt"] == 0]["ps"]
propensity1 = X[X["trt"] == 1]["ps"]
## Get Weighted Outcomes
if method == "robust":
X["outcome"] = y
weighted_outcome0, weighted_outcome1, n_ntrt, n_trt = make_robust_adjustments(X, t)
elif method == "raw":
X["outcome"] = y
weighted_outcome0, weighted_outcome1, n_ntrt, n_trt = make_raw_adjustments(X, t)
else:
weighted_outcome0, weighted_outcome1, _, _ = make_doubly_robust_adjustment(X, t, y)
### Top Plot of Propensity Scores
bins = np.arange(0.025, 0.85, 0.05)
top0, _ = np.histogram(propensity0, bins=bins)
top1, _ = np.histogram(propensity1, bins=bins)
fig, axs = plt.subplots(3, 1, figsize=(20, 20))
axs = axs.flatten()
plot_weights(bins, top0, top1, ylims[0], axs[0])
axs[0].text(0.05, 230, "Control = 0")
axs[0].text(0.05, -90, "Treatment = 1")
axs[0].set_ylabel("No. Patients", fontsize=14)
axs[0].set_xlabel("Estimated Propensity Score", fontsize=14)
axs[0].set_title(
"Inferred Propensity Scores and IP Weighted Outcome \n by Treatment and Control",
fontsize=20,
)
### Middle Plot of Outcome
outcome_trt = y[t == 1]
outcome_ntrt = y[t == 0]
top0, _ = np.histogram(outcome_ntrt, bins=lower_bins[0])
top1, _ = np.histogram(outcome_trt, bins=lower_bins[0])
plot_weights(lower_bins[0], top0, top1, ylims[2], axs[1])
axs[1].set_ylabel("No. Patients", fontsize=14)
axs[1].set_xlabel("Raw Outcome Measure", fontsize=14)
axs[1].text(text_pos[0], text_pos[1], f"Control: E(Y) = {outcome_ntrt.mean()}")
axs[1].text(text_pos[0], text_pos[1] - 20, f"Treatment: E(Y) = {outcome_trt.mean()}")
axs[1].text(
text_pos[0],
text_pos[1] - 40,
f"tau: E(Y(1) - Y(0)) = {outcome_trt.mean()- outcome_ntrt.mean()}",
fontweight="bold",
)
## Bottom Plot of Adjusted Outcome using Inverse Propensity Score weights
axs[2].set_ylabel("No. Patients", fontsize=14)
if method in ["raw", "robust"]:
top0, _ = np.histogram(weighted_outcome0, bins=lower_bins[1])
top1, _ = np.histogram(weighted_outcome1, bins=lower_bins[1])
plot_weights(lower_bins[1], top0, top1, ylims[1], axs[2])
axs[2].set_xlabel("Estimated IP Weighted Outcome \n Shifted", fontsize=14)
axs[2].text(text_pos[0], text_pos[1], f"Control: E(Y) = {weighted_outcome0.sum() / n_ntrt}")
axs[2].text(
text_pos[0], text_pos[1] - 20, f"Treatment: E(Y) = {weighted_outcome1.sum() / n_trt}"
)
axs[2].text(
text_pos[0],
text_pos[1] - 40,
f"tau: E(Y(1) - Y(0)) = {weighted_outcome1.sum() / n_trt - weighted_outcome0.sum() / n_ntrt}",
fontweight="bold",
)
else:
top0, _ = np.histogram(weighted_outcome0, bins=lower_bins[1])
top1, _ = np.histogram(weighted_outcome1, bins=lower_bins[1])
plot_weights(lower_bins[1], top0, top1, ylims[1], axs[2])
trt = np.round(np.mean(weighted_outcome1), 5)
ntrt = np.round(np.mean(weighted_outcome0), 5)
axs[2].set_xlabel("Estimated IP Weighted Outcome \n Shifted", fontsize=14)
axs[2].text(text_pos[0], text_pos[1], f"Control: E(Y) = {ntrt}")
axs[2].text(text_pos[0], text_pos[1] - 20, f"Treatment: E(Y) = {trt}")
axs[2].text(
text_pos[0], text_pos[1] - 40, f"tau: E(Y(1) - Y(0)) = {trt - ntrt}", fontweight="bold"
)
The Logit Propensity Model#
We plot the outcome and re-weighted outcome distributions using the robust method of propensity-score estimation.
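The generating call is collapsed in this rendering; a call along the following lines, using the make_plot helper defined above with its default bins, would produce a figure like the one below (the exact arguments are our assumption).
make_plot(X, idata_logit, method="robust")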

There is a lot going on in this plot, so it is worth walking through it carefully. In the first panel we have the distributions of the propensity scores for the treatment and control groups. In the second panel we plot the raw outcome data again, as distributions split between the groups. We additionally report the expected values of the outcome and the ATE as it would be naively estimated on the raw outcome data. Finally, in the third panel, we have the re-weighted outcome variable — re-weighted using the inverse propensity scores — and we derive the ATE based on the adjusted outcome variable. What we are highlighting in this plot is the distinction between the ATE under the raw and the adjusted outcomes.
Next, because we are Bayesians, we extract and evaluate the posterior distribution of the ATE based on the sampled propensity scores. We saw point estimates of the ATE above, but in the context of causal inference it is often more important to understand the uncertainty in the estimate.
def get_ate(X, t, y, i, idata, method="doubly_robust"):
X = X.copy()
X["outcome"] = y
### Post processing the sample posterior distribution for propensity scores
### One sample at a time.
X["ps"] = idata["posterior"]["p"].stack(z=("chain", "draw"))[:, i].values
if method == "robust":
weighted_outcome_ntrt, weighted_outcome_trt, n_ntrt, n_trt = make_robust_adjustments(X, t)
ntrt = weighted_outcome_ntrt.sum() / n_ntrt
trt = weighted_outcome_trt.sum() / n_trt
elif method == "raw":
weighted_outcome_ntrt, weighted_outcome_trt, n_ntrt, n_trt = make_raw_adjustments(X, t)
ntrt = weighted_outcome_ntrt.sum() / n_ntrt
trt = weighted_outcome_trt.sum() / n_trt
else:
X.drop("outcome", axis=1, inplace=True)
weighted_outcome_ntrt, weighted_outcome_trt, n_ntrt, n_trt = make_doubly_robust_adjustment(
X, t, y
)
trt = np.mean(weighted_outcome_trt)
ntrt = np.mean(weighted_outcome_ntrt)
ate = trt - ntrt
return [ate, trt, ntrt]
qs = range(4000)
ate_dist = [get_ate(X, t, y, q, idata_logit, method="robust") for q in qs]
ate_dist_df_logit = pd.DataFrame(ate_dist, columns=["ATE", "E(Y(1))", "E(Y(0))"])
ate_dist_df_logit.head()
ATE | E(Y(1)) | E(Y(0)) | |
---|---|---|---|
0 | 3.904674 | 5.621594 | 1.716920 |
1 | 2.735570 | 4.564235 | 1.828665 |
2 | 3.332896 | 5.128174 | 1.795278 |
3 | 3.958929 | 5.662041 | 1.703112 |
4 | 2.575863 | 4.515414 | 1.939551 |
Next we plot the posterior distribution of the ATE.
def plot_ate(ate_dist_df, xy=(4.0, 250)):
fig, axs = plt.subplots(1, 2, figsize=(20, 7))
axs = axs.flatten()
axs[0].hist(
ate_dist_df["E(Y(1))"], bins=30, ec="black", color="blue", label="E(Y(1))", alpha=0.5
)
axs[0].hist(
ate_dist_df["E(Y(0))"], bins=30, ec="black", color="red", label="E(Y(0))", alpha=0.7
)
axs[1].hist(ate_dist_df["ATE"], bins=30, ec="black", color="slateblue", label="ATE", alpha=0.6)
ate = np.round(ate_dist_df["ATE"].mean(), 2)
axs[1].axvline(ate, label="E(ATE)", linestyle="--", color="black")
axs[1].annotate(f"E(ATE): {ate}", xy, fontsize=20, fontweight="bold")
axs[1].set_title(f"Average Treatment Effect \n E(ATE): {ate}", fontsize=20)
axs[0].set_title("E(Y) Distributions for Treated and Control", fontsize=20)
axs[1].set_xlabel("Average Treatment Effect")
axs[0].set_xlabel("Expected Potential Outcomes")
axs[1].legend()
axs[0].legend()
plot_ate(ate_dist_df_logit)

Note that this estimate of the treatment effect is quite different from what we would get by taking the simple difference of the group means.
The BART Propensity Model#
Next we apply the raw and doubly robust estimators to the propensity distributions achieved with the non-parametric BART model.
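The generating call is again not visible in this rendering; something along these lines, using the raw inverse-propensity estimator with the make_plot helper defined above, is presumably what produced the figure below (the bin and axis arguments are our guess).
make_plot(X, idata_probit, method="raw")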

Here we see how the re-weighting scheme based on the BART propensity scores induces a different shape on the outcome distributions than above, yet still arrives at a similar estimate of the ATE.
ate_dist_probit = [get_ate(X, t, y, q, idata_probit, method="doubly_robust") for q in qs]
ate_dist_df_probit = pd.DataFrame(ate_dist_probit, columns=["ATE", "E(Y(1))", "E(Y(0))"])
ate_dist_df_probit.head()
ATE | E(Y(1)) | E(Y(0)) | |
---|---|---|---|
0 | 3.360633 | 5.134718 | 1.774085 |
1 | 3.327173 | 5.092400 | 1.765227 |
2 | 3.327616 | 5.088878 | 1.761262 |
3 | 3.242207 | 4.999089 | 1.756882 |
4 | 3.373167 | 5.126389 | 1.753222 |
plot_ate(ate_dist_df_probit, xy=(3.6, 250))

Note how the variance measured with the doubly robust method is smaller. This is no surprise: the doubly robust method is designed to have exactly this effect.
Considerations when Choosing a Model#
Evaluating shifts in population averages is one thing, but we may also want to account for effect heterogeneity across the population, and for that BART models are generally better at ensuring accurate predictions deeper into the strata of our data. However, the flexibility of machine-learning models on predictive tasks is no guarantee that the propensity scores they attribute to the sample will be well calibrated to recover the true treatment effect when used for causal-effect estimation. We have to be careful about how we deploy the flexibility of non-parametric models in causal contexts.
First observe the heterogeneous accuracy induced by the BART model across increasingly narrow strata of the sample.
fig, axs = plt.subplots(4, 2, figsize=(20, 25))
axs = axs.flatten()
az.plot_ppc(idata_logit, ax=axs[0])
az.plot_ppc(idata_probit, ax=axs[1])
idx1 = list((X[X["race"] == 1].index).values)
idx0 = list((X[X["race"] == 0].index).values)
az.plot_ppc(idata_logit, ax=axs[2], coords={"obs": idx1})
az.plot_ppc(idata_probit, ax=axs[3], coords={"obs": idx0})
idx1 = list((X[(X["race"] == 1) & (X["sex"] == 1)].index).values)
idx0 = list((X[(X["race"] == 0) & (X["sex"] == 1)].index).values)
az.plot_ppc(idata_logit, ax=axs[4], coords={"obs": idx1})
az.plot_ppc(idata_probit, ax=axs[5], coords={"obs": idx0})
idx1 = list((X[(X["race"] == 1) & (X["sex"] == 1) & (X["active_1"] == 1)].index).values)
idx0 = list((X[(X["race"] == 0) & (X["sex"] == 1) & (X["active_1"] == 1)].index).values)
az.plot_ppc(idata_logit, ax=axs[6], coords={"obs": idx1})
az.plot_ppc(idata_probit, ax=axs[7], coords={"obs": idx0})
axs[0].set_title("Overall PPC - Logit")
axs[1].set_title("Overall PPC - BART")
axs[2].set_title("Race Specific PPC - Logit")
axs[3].set_title("Race Specific PPC - BART")
axs[4].set_title("Race/Gender Specific PPC - Logit")
axs[5].set_title("Race/Gender Specific PPC - BART")
axs[6].set_title("Race/Gender/Active Specific PPC - Logit")
axs[7].set_title("Race/Gender/Active Specific PPC - BART")
plt.suptitle("Posterior Predictive Checks - Heterogenous Effects", fontsize=20);
/Users/nathanielforde/mambaforge/envs/pymc_causal/lib/python3.11/site-packages/arviz/plots/ppcplot.py:267: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
flatten_pp = list(predictive_dataset.dims.keys())
/Users/nathanielforde/mambaforge/envs/pymc_causal/lib/python3.11/site-packages/arviz/plots/ppcplot.py:271: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
flatten = list(observed_data.dims.keys())

Observations like these go a long way towards motivating the use of flexible machine-learning methods in causal inference. The models used to capture the outcome distribution, or the propensity-score distribution, should be sensitive to variation across the extremes of the data. We can see above how the predictive power of the simpler logistic regression model degrades as we go deeper into partitions of the data. We will see in the example below how the flexibility of machine-learning models such as BART can become a problem. We will also see how to fix it. Paradoxical as it sounds, a more perfect propensity-score model would cleanly separate the treatment classes and thereby make re-balancing harder to achieve. In this way, flexible models like BART, prone as they are to overfitting, need to be used with care in the context of inverse-propensity-weighting schemes.
Regression with Propensity Scores#
Another, perhaps more direct, approach to causal inference is to use regression analysis directly. Angrist and Pischke [2008] argue that the familiar properties of regression make it preferable, but concede that propensity scores have a role and that the careful analyst can combine both approaches. Here we show how propensity scores can be incorporated in a regression context to derive estimates of the treatment effect.
def make_prop_reg_model(X, t, y, idata_ps, covariates=None, samples=1000):
### Note the simplification for specifying the mean estimate in the regression
### rather than post-processing the whole posterior
ps = idata_ps["posterior"]["p"].mean(dim=("chain", "draw")).values
X_temp = pd.DataFrame({"ps": ps, "trt": t, "trt*ps": t * ps})
if covariates is None:
X = X_temp
else:
X = pd.concat([X_temp, X[covariates]], axis=1)
coords = {"coeffs": list(X.columns), "obs": range(len(X))}
with pm.Model(coords=coords) as model_ps_reg:
sigma = pm.HalfNormal("sigma", 1)
b = pm.Normal("b", mu=0, sigma=1, dims="coeffs")
X = pm.MutableData("X", X)
mu = pm.math.dot(X, b)
y_pred = pm.Normal("pred", mu, sigma, observed=y, dims="obs")
idata = pm.sample_prior_predictive()
idata.extend(pm.sample(samples, idata_kwargs={"log_likelihood": True}))
idata.extend(pm.sample_posterior_predictive(idata))
return model_ps_reg, idata
model_ps_reg, idata_ps_reg = make_prop_reg_model(X, t, y, idata_logit)
Sampling: [b, pred, sigma]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, b]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
Sampling: [pred]
Fitting a regression model that uses the propensity score as a dimension-reduction technique seems to work well here. The treatment-effect estimate we recover is broadly the same as the ones above.
az.summary(idata_ps_reg)
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
b[ps] | 2.458 | 0.622 | 1.360 | 3.682 | 0.010 | 0.007 | 3550.0 | 2618.0 | 1.0 |
b[trt] | 3.464 | 0.468 | 2.624 | 4.366 | 0.008 | 0.006 | 3192.0 | 2684.0 | 1.0 |
b[trt*ps] | -0.729 | 0.917 | -2.387 | 1.079 | 0.015 | 0.011 | 3584.0 | 3266.0 | 1.0 |
sigma | 7.803 | 0.131 | 7.565 | 8.050 | 0.002 | 0.001 | 4694.0 | 3447.0 | 1.0 |
model_ps_reg_bart, idata_ps_reg_bart = make_prop_reg_model(X, t, y, idata_probit)
Sampling: [b, pred, sigma]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, b]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
Sampling: [pred]
az.summary(idata_ps_reg_bart)
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
b[ps] | 3.057 | 0.660 | 1.836 | 4.304 | 0.012 | 0.008 | 3164.0 | 2698.0 | 1.0 |
b[trt] | 3.228 | 0.470 | 2.363 | 4.114 | 0.009 | 0.006 | 2883.0 | 3054.0 | 1.0 |
b[trt*ps] | -0.339 | 0.950 | -2.073 | 1.495 | 0.016 | 0.012 | 3527.0 | 3487.0 | 1.0 |
sigma | 7.789 | 0.134 | 7.545 | 8.055 | 0.002 | 0.001 | 4555.0 | 3187.0 | 1.0 |
Causal Inference as Regression Imputation#
Above we read off the causal-effect estimate as the coefficient on the treatment variable in a regression model. An arguably cleaner approach is to use the fitted regression model to impute the distributions of the potential outcomes Y(1), Y(0) under the different treatment regimes. In this way we gain another perspective on causal inference. Crucially, this perspective is non-parametric in that it makes no commitment to the functional form required of the imputation model.
X_mod = X.copy()
X_mod["ps"] = ps = idata_probit["posterior"]["p"].mean(dim=("chain", "draw")).values
X_mod["trt"] = 1
X_mod["trt*ps"] = X_mod["ps"] * X_mod["trt"]
with model_ps_reg_bart:
# update values of predictors:
pm.set_data({"X": X_mod[["ps", "trt", "trt*ps"]]})
idata_trt = pm.sample_posterior_predictive(idata_ps_reg_bart)
idata_trt
Sampling: [pred]
-
<xarray.Dataset> Dimensions: (chain: 4, draw: 1000, obs: 1566) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 992 993 994 995 996 997 998 999 * obs (obs) int64 0 1 2 3 4 5 6 7 ... 1559 1560 1561 1562 1563 1564 1565 Data variables: pred (chain, draw, obs) float64 -11.53 7.462 19.63 ... 15.41 -1.16 11.88 Attributes: created_at: 2024-02-28T20:19:53.151423 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3
-
<xarray.Dataset> Dimensions: (obs: 1566) Coordinates: * obs (obs) int64 0 1 2 3 4 5 6 7 ... 1559 1560 1561 1562 1563 1564 1565 Data variables: pred (obs) float64 -10.09 2.605 9.414 4.99 ... 1.36 3.515 4.763 15.76 Attributes: created_at: 2024-02-28T20:19:53.152822 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3
-
<xarray.Dataset> Dimensions: (X_dim_0: 1566, X_dim_1: 3) Coordinates: * X_dim_0 (X_dim_0) int64 0 1 2 3 4 5 6 ... 1560 1561 1562 1563 1564 1565 * X_dim_1 (X_dim_1) int64 0 1 2 Data variables: X (X_dim_0, X_dim_1) float64 0.1939 1.0 0.1939 ... 0.2145 1.0 0.2145 Attributes: created_at: 2024-02-28T20:19:53.153537 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3
X_mod = X.copy()
X_mod["ps"] = ps = idata_probit["posterior"]["p"].mean(dim=("chain", "draw")).values
X_mod["trt"] = 0
X_mod["trt*ps"] = X_mod["ps"] * X_mod["trt"]
with model_ps_reg_bart:
# update values of predictors:
pm.set_data({"X": X_mod[["ps", "trt", "trt*ps"]]})
idata_ntrt = pm.sample_posterior_predictive(idata_ps_reg_bart)
idata_ntrt
Sampling: [pred]
-
<xarray.Dataset> Dimensions: (chain: 4, draw: 1000, obs: 1566) Coordinates: * chain (chain) int64 0 1 2 3 * draw (draw) int64 0 1 2 3 4 5 6 7 8 ... 992 993 994 995 996 997 998 999 * obs (obs) int64 0 1 2 3 4 5 6 7 ... 1559 1560 1561 1562 1563 1564 1565 Data variables: pred (chain, draw, obs) float64 12.87 0.7811 1.743 ... -2.888 5.276 Attributes: created_at: 2024-02-28T20:19:53.403518 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3
-
<xarray.Dataset> Dimensions: (obs: 1566) Coordinates: * obs (obs) int64 0 1 2 3 4 5 6 7 ... 1559 1560 1561 1562 1563 1564 1565 Data variables: pred (obs) float64 -10.09 2.605 9.414 4.99 ... 1.36 3.515 4.763 15.76 Attributes: created_at: 2024-02-28T20:19:53.404685 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3
-
<xarray.Dataset> Dimensions: (X_dim_0: 1566, X_dim_1: 3) Coordinates: * X_dim_0 (X_dim_0) int64 0 1 2 3 4 5 6 ... 1560 1561 1562 1563 1564 1565 * X_dim_1 (X_dim_1) int64 0 1 2 Data variables: X (X_dim_0, X_dim_1) float64 0.1939 0.0 0.0 0.1792 ... 0.2145 0.0 0.0 Attributes: created_at: 2024-02-28T20:19:53.405450 arviz_version: 0.17.0 inference_library: pymc inference_library_version: 5.10.3
idata_trt["posterior_predictive"]["pred"].mean()
<xarray.DataArray 'pred' ()> array(3.91927737)
idata_ntrt["posterior_predictive"]["pred"].mean()
<xarray.DataArray 'pred' ()> array(0.78187425)
idata_trt["posterior_predictive"]["pred"].mean() - idata_ntrt["posterior_predictive"]["pred"].mean()
<xarray.DataArray 'pred' ()> array(3.13740312)
All of these perspectives on the causal-inference problem seem to converge roughly here. Next we will look at an example where the analyst's choices can go badly wrong, and where extra care is needed in applying these inferential tools.
Confounded Inference: Health Expenditure Data#
We now turn to the health expenditure dataset analysed in Bayesian Nonparametrics for Causal Inference and Missing Data. The notable feature of this dataset is that there is no obvious causal impact of smoking on expenditure. We follow the authors in attempting to model the effect of smoke on the logged outcome log_y. Our focus initially will be on estimating the ATE. There is barely any discernible signal of an impact of smoking. We want to demonstrate that even when we pick the right methods and try to control for bias with the right tools, if we focus too much on the mechanics rather than on the data-generating process, we can still miss the story right under our noses.
try:
df = pd.read_csv("../data/meps_bayes_np_health.csv", index_col=["Unnamed: 0"])
except:
df = pd.read_csv(pm.get_data("meps_bayes_np_health.csv"), index_col=["Unnamed: 0"])
df = df[df["totexp"] > 0].reset_index(drop=True)
df["log_y"] = np.log(df["totexp"] + 1000)
df["loginc"] = np.log(df["income"])
df["smoke"] = np.where(df["smoke"] == "No", 0, 1)
df
/Users/nathanielforde/mambaforge/envs/pymc_causal/lib/python3.11/site-packages/pandas/core/arraylike.py:399: RuntimeWarning: divide by zero encountered in log
result = getattr(ufunc, method)(*inputs, **kwargs)
age | bmi | edu | income | povlev | region | sex | marital | race | seatbelt | smoke | phealth | totexp | log_y | loginc | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 30 | 39.1 | 14 | 78400 | 343.69 | Northeast | Male | Married | White | Always | 0 | Fair | 40 | 6.946976 | 11.269579 |
1 | 53 | 20.2 | 17 | 180932 | 999.30 | West | Male | Married | Multi | Always | 0 | Very Good | 429 | 7.264730 | 12.105877 |
2 | 81 | 21.0 | 14 | 27999 | 205.94 | West | Male | Married | White | Always | 0 | Very Good | 14285 | 9.634627 | 10.239924 |
3 | 77 | 25.7 | 12 | 27999 | 205.94 | West | Female | Married | White | Always | 0 | Fair | 7959 | 9.100414 | 10.239924 |
4 | 31 | 23.0 | 12 | 14800 | 95.46 | South | Female | Divorced | White | Always | 0 | Excellent | 5017 | 8.702344 | 9.602382 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
16425 | 23 | 26.6 | 16 | 23000 | 130.72 | South | Male | Separated | Asian | Always | 0 | Very Good | 130 | 7.029973 | 10.043249 |
16426 | 55 | 21.9 | 12 | 23000 | 130.72 | South | Female | Married | Asian | Always | 0 | Very Good | 468 | 7.291656 | 10.043249 |
16427 | 22 | -9.0 | 9 | 7000 | 38.66 | Midwest | Male | Married | White | Always | 0 | Excellent | 711 | 7.444833 | 8.853665 |
16428 | 22 | 24.2 | 10 | 7000 | 38.66 | Midwest | Female | Married | White | Always | 0 | Excellent | 587 | 7.369601 | 8.853665 |
16429 | 20 | 26.9 | 10 | 9858 | 84.24 | Midwest | Female | Separated | White | Always | 0 | Fair | 1228 | 7.708860 | 9.196039 |
16430 rows × 15 columns
Summary Statistics#
Let's review some basic summary statistics and see how they vary across different strata of the population
raw_diff = df.groupby("smoke")[["log_y"]].mean()
print("Treatment Diff:", raw_diff["log_y"].iloc[0] - raw_diff["log_y"].iloc[1])
raw_diff
Treatment Diff: 0.05280094075302166
log_y | |
---|---|
smoke | |
0 | 8.098114 |
1 | 8.045313 |
pd.set_option("display.max_rows", 500)
strata_df = df.groupby(["smoke", "sex", "race", "phealth"])[["log_y"]].agg(["count", "mean", "std"])
global_avg = df["log_y"].mean()
strata_df["global_avg"] = global_avg
strata_df.reset_index(inplace=True)
strata_df.columns = [" ".join(col).strip() for col in strata_df.columns.values]
strata_df["diff"] = strata_df["log_y mean"] - strata_df["global_avg"]
strata_df.sort_values("log_y count", ascending=False).head(30).style.background_gradient(axis=0)
smoke | sex | race | phealth | log_y count | log_y mean | log_y std | global_avg | diff | |
---|---|---|---|---|---|---|---|---|---|
29 | 0 | Female | White | Very Good | 1858 | 8.101406 | 0.896128 | 8.089203 | 0.012204 |
27 | 0 | Female | White | Good | 1572 | 8.231117 | 1.010783 | 8.089203 | 0.141914 |
25 | 0 | Female | White | Excellent | 1385 | 7.919802 | 0.846725 | 8.089203 | -0.169400 |
59 | 0 | Male | White | Very Good | 1321 | 7.987652 | 0.922520 | 8.089203 | -0.101551 |
57 | 0 | Male | White | Good | 1129 | 8.178290 | 1.003363 | 8.089203 | 0.089088 |
55 | 0 | Male | White | Excellent | 1122 | 7.728966 | 0.779346 | 8.089203 | -0.360236 |
26 | 0 | Female | White | Fair | 659 | 8.487774 | 1.113656 | 8.089203 | 0.398572 |
7 | 0 | Female | Black | Good | 515 | 8.125243 | 0.944796 | 8.089203 | 0.036040 |
9 | 0 | Female | Black | Very Good | 488 | 7.870293 | 0.884956 | 8.089203 | -0.218909 |
56 | 0 | Male | White | Fair | 434 | 8.601018 | 1.112748 | 8.089203 | 0.511816 |
110 | 1 | Male | White | Good | 335 | 7.939632 | 0.887826 | 8.089203 | -0.149571 |
84 | 1 | Female | White | Good | 324 | 8.077777 | 0.968686 | 8.089203 | -0.011426 |
5 | 0 | Female | Black | Excellent | 307 | 7.748597 | 0.812461 | 8.089203 | -0.340606 |
6 | 0 | Female | Black | Fair | 266 | 8.534893 | 1.057159 | 8.089203 | 0.445690 |
86 | 1 | Female | White | Very Good | 266 | 7.913179 | 0.902211 | 8.089203 | -0.176024 |
39 | 0 | Male | Black | Very Good | 246 | 7.765843 | 0.831623 | 8.089203 | -0.323360 |
37 | 0 | Male | Black | Good | 235 | 8.002760 | 1.051284 | 8.089203 | -0.086443 |
112 | 1 | Male | White | Very Good | 235 | 7.848349 | 0.900002 | 8.089203 | -0.240854 |
4 | 0 | Female | Asian | Very Good | 193 | 7.864920 | 0.859187 | 8.089203 | -0.224283 |
83 | 1 | Female | White | Fair | 191 | 8.403307 | 0.989581 | 8.089203 | 0.314105 |
28 | 0 | Female | White | Poor | 186 | 9.160054 | 1.138894 | 8.089203 | 1.070852 |
35 | 0 | Male | Black | Excellent | 184 | 7.620076 | 0.771911 | 8.089203 | -0.469127 |
0 | 0 | Female | Asian | Excellent | 164 | 7.786508 | 0.899504 | 8.089203 | -0.302694 |
2 | 0 | Female | Asian | Good | 162 | 7.873122 | 0.768487 | 8.089203 | -0.216080 |
82 | 1 | Female | White | Excellent | 149 | 7.860320 | 0.837483 | 8.089203 | -0.228882 |
108 | 1 | Male | White | Excellent | 148 | 7.652529 | 0.717871 | 8.089203 | -0.436674 |
109 | 1 | Male | White | Fair | 148 | 8.282303 | 1.111403 | 8.089203 | 0.193101 |
58 | 0 | Male | White | Poor | 140 | 9.308711 | 1.255442 | 8.089203 | 1.219509 |
34 | 0 | Male | Asian | Very Good | 140 | 7.792831 | 0.772666 | 8.089203 | -0.296371 |
32 | 0 | Male | Asian | Good | 134 | 7.993583 | 1.123291 | 8.089203 | -0.095620 |
def make_strata_plot(strata_df):
joined_df = strata_df[strata_df["smoke"] == 0].merge(
strata_df[strata_df["smoke"] == 1], on=["sex", "race", "phealth"]
)
joined_df.sort_values("diff_y", inplace=True)
# Func to draw line segment
def newline(p1, p2, color="black"):
ax = plt.gca()
l = mlines.Line2D([p1[0], p2[0]], [p1[1], p2[1]], color="black", linestyle="--")
ax.add_line(l)
return l
fig, ax = plt.subplots(figsize=(20, 15))
ax.scatter(
joined_df["diff_x"],
joined_df.index,
color="red",
alpha=0.7,
label="Control Sample Size",
s=joined_df["log_y count_x"] / 2,
)
ax.scatter(
joined_df["diff_y"],
joined_df.index,
color="blue",
alpha=0.7,
label="Treatment Sample Size",
s=joined_df["log_y count_y"] / 2,
)
for i, p1, p2 in zip(joined_df.index, joined_df["diff_x"], joined_df["diff_y"]):
newline([p1, i], [p2, i])
ax.set_xlabel("Difference from the Global Mean")
ax.set_title(
"Differences from Global Mean \n by Treatment Status and Strata",
fontsize=20,
fontweight="bold",
)
ax.axvline(0, color="k")
ax.set_ylabel("Strata Index")
ax.legend()
make_strata_plot(strata_df)

It is hard to see a clean pattern in this visualisation. Across both treatment groups, wherever there is an appreciable sample size, the mean differences for both groups sit close to zero.
strata_expected_df = strata_df.groupby("smoke")[["log_y count", "log_y mean", "diff"]].agg(
{"log_y count": ["sum"], "log_y mean": "mean", "diff": "mean"}
)
print(
"Treatment Diff:",
strata_expected_df[("log_y mean", "mean")].iloc[0]
- strata_expected_df[("log_y mean", "mean")].iloc[1],
)
strata_expected_df
Treatment Diff: 0.28947855780477827
log_y count | log_y mean | diff | |
---|---|---|---|
sum | mean | mean | |
smoke | |||
0 | 13657 | 8.237595 | 0.148392 |
1 | 2773 | 7.948116 | -0.141087 |
It really does seem as though our treatment has no, or minimal, impact in the data. Can we recover this insight using the inverse-propensity-weighting approach?
fig, axs = plt.subplots(2, 2, figsize=(20, 8))
axs = axs.flatten()
axs[0].hist(
df[df["smoke"] == 1]["log_y"],
alpha=0.3,
density=True,
bins=30,
label="Smoker",
ec="black",
color="blue",
)
axs[0].hist(
df[df["smoke"] == 0]["log_y"],
alpha=0.5,
density=True,
bins=30,
label="Non-Smoker",
ec="black",
color="red",
)
axs[1].hist(
df[df["smoke"] == 1]["log_y"],
density=True,
bins=30,
cumulative=True,
histtype="step",
label="Smoker",
color="blue",
)
axs[1].hist(
df[df["smoke"] == 0]["log_y"],
density=True,
bins=30,
cumulative=True,
histtype="step",
label="Non-Smoker",
color="red",
)
lkup = {1: "blue", 0: "red"}
axs[2].scatter(df["loginc"], df["log_y"], c=df["smoke"].map(lkup), alpha=0.4)
axs[2].set_xlabel("Log Income")
axs[3].scatter(df["age"], df["log_y"], c=df["smoke"].map(lkup), alpha=0.4)
axs[3].set_title("Log Outcome versus Age")
axs[2].set_title("Log Outcome versus Log Income")
axs[3].set_xlabel("Age")
axs[0].set_title("Empirical Densities")
axs[0].legend()
axs[1].legend()
axs[1].set_title("Empirical Cumulative \n Densities");

These plots seem to confirm the undifferentiated nature of the outcomes between the two groups. There are hints of some differences in the outer quantiles of the distributions. This matters, because it suggests the average treatment effect is minimal. The outcome is recorded on the scale of logged dollars, so any kind of unit shift would be substantial.
qs = np.linspace(0.05, 0.99, 100)
quantile_diff = (
df.groupby("smoke")[["totexp"]]
.quantile(qs)
.reset_index()
.pivot(index="level_1", columns="smoke", values="totexp")
.rename({0: "Non-Smoker", 1: "Smoker"}, axis=1)
.assign(diff=lambda x: x["Non-Smoker"] - x["Smoker"])
.reset_index()
.rename({"level_1": "quantile"}, axis=1)
)
fig, axs = plt.subplots(1, 2, figsize=(20, 6))
axs[0].plot(quantile_diff["quantile"], quantile_diff["Smoker"])
axs[0].plot(quantile_diff["quantile"], quantile_diff["Non-Smoker"])
axs[0].set_title("Q-Q plot comparing \n Smoker and Non-Smokers")
axs[1].plot(quantile_diff["quantile"], quantile_diff["diff"])
axs[1].set_title("Differences across the Quantiles");

Where can things go wrong?#
Now we prepare the data for the categorical variables using simple dummy encoding and take a sample of one thousand observations from the dataset for our initial modelling.
dummies = pd.concat(
[
pd.get_dummies(df["seatbelt"], drop_first=True, prefix="seatbelt"),
pd.get_dummies(df["marital"], drop_first=True, prefix="marital"),
pd.get_dummies(df["race"], drop_first=True, prefix="race"),
pd.get_dummies(df["sex"], drop_first=True, prefix="sex"),
pd.get_dummies(df["phealth"], drop_first=True, prefix="phealth"),
],
axis=1,
)
idx = df.sample(1000, random_state=100).index
X = pd.concat(
[
df[["age", "bmi"]],
dummies,
],
axis=1,
)
X = X.iloc[idx]
t = df.iloc[idx]["smoke"]
y = df.iloc[idx]["log_y"]
X
age | bmi | seatbelt_Always | seatbelt_Never | seatbelt_NoCar | seatbelt_Seldom | seatbelt_Sometimes | marital_Married | marital_Separated | marital_Widowed | race_Black | race_Indig | race_Multi | race_PacificIslander | race_White | sex_Male | phealth_Fair | phealth_Good | phealth_Poor | phealth_Very Good | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2852 | 27 | 23.7 | False | False | False | False | False | True | False | False | False | False | False | False | True | True | False | False | False | False |
13271 | 71 | 29.1 | True | False | False | False | False | False | False | True | True | False | False | False | False | False | True | False | False | False |
6786 | 19 | 21.3 | True | False | False | False | False | False | True | False | False | False | False | False | True | False | False | False | False | True |
15172 | 20 | 38.0 | True | False | False | False | False | False | True | False | True | False | False | False | False | False | False | False | False | True |
10967 | 22 | 28.7 | True | False | False | False | False | True | False | False | False | False | False | False | True | False | False | False | False | True |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5404 | 30 | 35.6 | True | False | False | False | False | True | False | False | False | False | False | False | True | True | False | True | False | False |
8665 | 80 | 22.0 | True | False | False | False | False | False | False | True | False | False | False | False | False | True | False | False | False | False |
3726 | 49 | 32.9 | True | False | False | False | False | True | False | False | True | False | False | False | False | True | False | True | False | False |
6075 | 49 | 34.2 | True | False | False | False | False | True | False | False | False | False | False | True | False | False | False | False | False | True |
795 | 53 | 28.2 | True | False | False | False | False | True | False | False | False | False | False | False | True | True | False | False | False | False |
1000 rows × 20 columns
m_ps_expend_bart, idata_expend_bart = make_propensity_model(
X, t, bart=True, probit=False, samples=1000, m=80
)
m_ps_expend_logit, idata_expend_logit = make_propensity_model(X, t, bart=False, samples=1000)
Sampling: [mu, t_pred]
Multiprocess sampling (4 chains in 4 jobs)
PGBART: [mu]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 46 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [t_pred]
Sampling: [b, t_pred]
Auto-assigning NUTS sampler...
Initializing NUTS using adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [b]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 19 seconds.
Sampling: [t_pred]
az.plot_trace(idata_expend_bart, var_names=["mu", "p"]);

ps = idata_expend_bart["posterior"]["p"].mean(dim=("chain", "draw")).values
fig, axs = plt.subplots(2, 1, figsize=(20, 10))
axs = axs.flatten()
ax = axs[0]
ax1 = axs[1]
ax1.set_title("Overlap of Inverted Propensity Scores")
ax.hist(
ps[t == 0],
bins=30,
ec="black",
alpha=0.4,
color="blue",
label="Propensity Scores in Treated",
)
ax1.hist(
ps[t == 0],
bins=30,
ec="black",
alpha=0.4,
color="blue",
label="Propensity Scores in Treated",
)
ax.set_xlabel("Propensity Scores")
ax.set_ylabel("Count of Observed")
ax1.set_ylabel("Count of Observed")
ax.hist(
ps[t == 1], bins=30, ec="black", alpha=0.6, color="red", label="Propensity Scores in Control"
)
ax1.hist(
1 - ps[t == 1],
bins=30,
ec="black",
alpha=0.6,
color="red",
label="Propensity Scores in Control",
)
ax.set_title("BART Model - Health Expenditure Data \n Propensity Scores per Group", fontsize=20)
ax.axvline(0.9, color="black", linestyle="--", label="Extreme Propensity Scores")
ax.axvline(0.1, color="black", linestyle="--")
ax1.axvline(0.9, color="black", linestyle="--", label="Extreme Propensity Scores")
ax1.axvline(0.1, color="black", linestyle="--")
ax.legend()
fig, ax2 = plt.subplots(figsize=(20, 6))
ax2.scatter(X["age"], y, color=t.map(colors), s=(1 / ps) * 20, ec="black", alpha=0.4)
ax2.set_xlabel("Age")
ax2.set_xlabel("Age")
ax2.set_ylabel("y")
ax2.set_ylabel("y")
ax2.set_title("Sized by IP Weights", fontsize=20)
red_patch = mpatches.Patch(color="red", label="Control")
blue_patch = mpatches.Patch(color="blue", label="Treated")
ax2.legend(handles=[red_patch, blue_patch]);


Note how the tails of the per-group histograms do not overlap well, and that we have some extreme propensity scores. This suggests the propensity scores may not be fitted well enough to achieve good balancing properties.
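To quantify the lack of overlap (a diagnostic sketch of our own, re-using the posterior-mean scores computed above):
ps_bart = idata_expend_bart["posterior"]["p"].mean(dim=("chain", "draw")).values
# share of scores outside the (0.1, 0.9) window, split by treatment status
print("Extreme scores among treated:", np.mean((ps_bart[t == 1] < 0.1) | (ps_bart[t == 1] > 0.9)))
print("Extreme scores among control:", np.mean((ps_bart[t == 0] < 0.1) | (ps_bart[t == 0] > 0.9)))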
The Non-Parametric BART Propensity Model is Misspecified#
The flexibility of the BART model fit is not well calibrated to recover the average treatment effect. Let's evaluate the outcome distributions weighted under the robust inverse-propensity-weight estimator.
## Evaluate at the expected realisation of the propensity scores for each individual
ps = idata_expend_bart["posterior"]["p"].mean(dim=("chain", "draw")).values
make_plot(
X,
idata_expend_bart,
ylims=[(-100, 340), (-60, 260), (-60, 260)],
lower_bins=[np.arange(6, 15, 0.5), np.arange(2, 14, 0.5)],
text_pos=(11, 80),
method="robust",
ps=ps,
)

This is a suspicious result. Evaluated at the expected realisation of the posterior propensity-score distribution, the robust IPW estimate of the ATE suggests a substantial difference between the treated and control groups. The re-weighted outcomes in the third panel diverge between control and treated, indicating a thorough failure of balance. The simple difference in means of the raw outcomes shows a difference close to 0, yet the re-weighted difference shows a shift of more than 1 on the log scale. What happened? In what follows we will look at the problem from a few different angles, but you should regard the present result with considerable suspicion.
What happens if we look at the posterior ATE distribution under the different estimators?
qs = range(4000)
ate_dist = [get_ate(X, t, y, q, idata_expend_bart, method="doubly_robust") for q in qs]
ate_dist_df_dr = pd.DataFrame(ate_dist, columns=["ATE", "E(Y(1))", "E(Y(0))"])
ate_dist = [get_ate(X, t, y, q, idata_expend_bart, method="robust") for q in qs]
ate_dist_df_r = pd.DataFrame(ate_dist, columns=["ATE", "E(Y(1))", "E(Y(0))"])
ate_dist_df_dr.head()
ATE | E(Y(1)) | E(Y(0)) | |
---|---|---|---|
0 | -0.115872 | 7.962193 | 8.078065 |
1 | -0.119544 | 7.961215 | 8.080759 |
2 | -0.117121 | 7.966924 | 8.084046 |
3 | -0.060144 | 8.003459 | 8.063603 |
4 | -0.106624 | 7.956411 | 8.063035 |
Each row in this table shows an estimate of the average treatment effect, along with the re-weighted means of the outcome variable, derived using the doubly robust estimator and a draw from the posterior of the propensity-score distribution implied by our BART model fit.
plot_ate(ate_dist_df_r, xy=(0.5, 300))

Deriving ATE estimates across draws from the posterior distribution and averaging them seems to give a more plausible number, but one still inflated beyond the minimal difference suggested by our EDA. If we instead use the doubly robust estimator, we again recover a more plausible figure.
plot_ate(ate_dist_df_dr, xy=(-0.35, 200))

Recall that the doubly robust estimator is set up to strike a compromise between the propensity-score weighting and (in our case) a simple OLS prediction. So long as one of these two models is correctly specified, the estimator can recover an accurate, unbiased treatment effect. In this case we see the doubly robust estimator discounting the implications of our propensity scores and prioritising the regression model.
We can also check how, and whether, the BART-based propensity scores break the balance, threatening the validity of inverse-propensity-score weighting.
temp = X.copy()
temp["ps"] = ps = idata_expend_bart["posterior"]["p"].mean(dim=("chain", "draw")).values
temp["ps_cut"] = pd.qcut(temp["ps"], 5)
plot_balance(temp, "bmi", t)

How does Regression help?#
We have just seen an example of how a misspecified machine-learning model can badly bias the causal estimates in a study. We have seen one fix, but what if we just tried simpler exploratory regression modelling? Regression automatically weights the data points according to how extreme their covariate profiles are and how prevalent they are in the sample. In this sense it adjusts for anomalous propensity scores in a way that the inverse-weighting approach cannot.
model_ps_reg_expend, idata_ps_reg_expend = make_prop_reg_model(X, t, y, idata_expend_bart)
Sampling: [b, pred, sigma]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, b]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
Sampling: [pred]
az.summary(idata_ps_reg_expend, var_names=["b"])
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
b[ps] | 27.677 | 0.728 | 26.301 | 29.039 | 0.016 | 0.011 | 2091.0 | 2334.0 | 1.0 |
b[trt] | 1.318 | 0.369 | 0.631 | 2.023 | 0.008 | 0.006 | 2231.0 | 2512.0 | 1.0 |
b[trt*ps] | -2.493 | 0.958 | -4.232 | -0.621 | 0.020 | 0.015 | 2313.0 | 2121.0 | 1.0 |
This model is clearly too simple. It merely recovers the biased estimate driven by the misspecified BART propensity model. But what if we use the propensity score as just another feature among the covariates? Let it add precision, but fit the model we believe actually reflects the causal relationships.
model_ps_reg_expend_h, idata_ps_reg_expend_h = make_prop_reg_model(
X,
t,
y,
idata_expend_bart,
covariates=["age", "bmi", "phealth_Fair", "phealth_Good", "phealth_Poor", "phealth_Very Good"],
)
Sampling: [b, pred, sigma]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, b]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 2 seconds.
Sampling: [pred]
az.summary(idata_ps_reg_expend_h, var_names=["b"])
| | 均值 | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| b[ps] | 9.227 | 0.595 | 8.113 | 10.354 | 0.009 | 0.007 | 4017.0 | 2444.0 | 1.01 |
| b[trt] | 0.137 | 0.241 | -0.279 | 0.629 | 0.004 | 0.003 | 3739.0 | 2796.0 | 1.00 |
| b[trt*ps] | -2.277 | 0.821 | -3.737 | -0.636 | 0.014 | 0.010 | 3481.0 | 2686.0 | 1.00 |
| b[age] | 0.064 | 0.002 | 0.060 | 0.068 | 0.000 | 0.000 | 3909.0 | 2756.0 | 1.00 |
| b[bmi] | 0.089 | 0.004 | 0.081 | 0.097 | 0.000 | 0.000 | 3448.0 | 2860.0 | 1.00 |
| b[phealth_Fair] | 0.488 | 0.175 | 0.150 | 0.811 | 0.003 | 0.002 | 3288.0 | 2991.0 | 1.00 |
| b[phealth_Good] | 0.342 | 0.149 | 0.051 | 0.609 | 0.003 | 0.002 | 2908.0 | 2943.0 | 1.00 |
| b[phealth_Poor] | 0.614 | 0.256 | 0.126 | 1.083 | 0.004 | 0.003 | 3954.0 | 2788.0 | 1.00 |
| b[phealth_Very Good] | 1.252 | 0.135 | 1.011 | 1.514 | 0.002 | 0.002 | 2944.0 | 3176.0 | 1.00 |
这要好得多,我们可以看到,结合健康因素对倾向评分特征进行建模可以得出更合理的治疗效果估计值。这种发现与 Angrist 和 Pischke 报告的教训相呼应,即
“控制正确的协变量的回归在消除选择效应方面做得相当不错……”第 91 页 Mostly Harmless Econometrics Angrist 和 Pischke [2008]
因此,我们又回到了正确控制的问题。实际上没有办法避免这种负担。在缺乏领域知识和对当前问题的仔细关注的情况下,机器学习不能充当万能药。这永远不是人们想听到的鼓舞人心的信息,但不幸的是,这是事实。回归对此有所帮助,因为系数表的毫不留情的清晰度提醒我们,没有办法替代对正确的事物进行良好的衡量。
双重/去偏机器学习和 Frisch-Waugh-Lovell 定理#
回顾一下 - 我们已经看到了两个使用逆概率加权调整进行因果推断的示例。我们已经看到了当倾向评分模型得到良好校准时,它是如何工作的。我们已经看到了它何时失败以及如何修复失败。这些是我们工具带中的工具 - 适用于不同的问题,但需要我们仔细考虑数据生成过程和适当协变量的类型。
在简单的倾向建模方法失败的案例中,我们看到了一个治疗分配并未在平均治疗效果上体现出差异的数据集。我们也看到了,通过增强基于倾向的估计量,可以改善该技术的识别特性。在这里,我们将展示另一个示例,说明如何将倾向模型与基于回归建模的见解相结合,以利用机器学习方法为因果推断提供的灵活建模可能性。在本节中,我们大量借鉴了 Matheus Facure 的工作,特别是他的著作 Python 中的因果推断 Facure [2023];最初的想法则可以在 Chernozhukov et al. [2018] 中找到。
Frisch-Waugh-Lovell 定理#
该定理的思想是:对于任何以 OLS 拟合的线性模型 \(Y = \beta_{0} + \beta_{1}D_{1} + \gamma_{1}Z_{1} + ... + \gamma_{n}Z_{n}\),其中 \(\beta_{1}\) 是(对应处理变量 \(D_{1}\) 的)焦点参数,\(\gamma_{i}\) 为辅助参数,
我们可以通过如下分步程序检索出相同的 \(\beta_{1}\)(其等价性见下面的数值草图)
将 \(Y\) 对辅助协变量进行回归,即 \(\hat{Y} = \hat{\gamma}_{1}Z_{1} + ... + \hat{\gamma}_{n}Z_{n}\)
将 \(D_{1}\) 对相同的辅助协变量进行回归,即 \(\hat{D}_{1} = \hat{\delta}_{1}Z_{1} + ... + \hat{\delta}_{n}Z_{n}\)
取残差 \(r(D) = D_{1} - \hat{D}_{1}\) 和 \(r(Y) = Y - \hat{Y}\)
拟合回归 \(r(Y) = \beta_{0} + \beta_{1}r(D)\),由此得到的 \(\beta_{1}\) 与完整回归中的系数相同
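下面是一个验证上述等价性的最小数值草图,使用模拟数据和上文已导入的 statsmodels;其中的变量与系数均为示例性假设,与本笔记本的数据无关。
# FWL 定理的数值验证草图(假设性示例,不使用本笔记本的数据)
import numpy as np
import statsmodels.api as sm

rng_fwl = np.random.default_rng(0)
n = 500
Z = rng_fwl.normal(size=(n, 2))  # 辅助协变量
D = 0.5 * Z[:, 0] - 0.3 * Z[:, 1] + rng_fwl.normal(size=n)  # 焦点(处理)变量
Y = 2.0 * D + Z @ np.array([1.0, -1.0]) + rng_fwl.normal(size=n)

# 一步法:完整回归中 D 的系数
full = sm.OLS(Y, sm.add_constant(np.column_stack([D, Z]))).fit()

# 两步法:先分别对 Z 做残差化,再将残差相互回归
r_Y = Y - sm.OLS(Y, sm.add_constant(Z)).fit().fittedvalues
r_D = D - sm.OLS(D, sm.add_constant(Z)).fit().fittedvalues
partial = sm.OLS(r_Y, sm.add_constant(r_D)).fit()

print(full.params[1], partial.params[1])  # 两个系数在数值上相同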
去偏机器学习的思想是复制此练习,但在焦点变量是我们的治疗变量的情况下,利用机器学习模型的灵活性来估计两个残差项。
这与强可忽略性的要求有关:通过把治疗变量 \(T\) 作为焦点变量进行残差化,我们构造出的 \(r(T)\) 根据构造无法由协变量特征 \(X\) 预测,从而(近似地)满足 \(r(T) \perp\!\!\!\perp X\)。这是思想的完美结合!
使用 K 折交叉验证避免过度拟合#
正如我们在上面看到的,对基于树的模型(如 BART)的担忧之一是,尽管它们带有自动正则化,却仍然倾向于过度拟合所见到的数据。在这里,我们将使用 K 折交叉验证,在数据的样本外分割上生成预测。此步骤对于避免模型过度拟合样本数据、从而避免我们上面看到的错误校准效应至关重要。同样(这一点很容易被遗忘),这是一个纠正步骤,用来避免因过度迁就观察到的样本数据而产生的已知偏差:当倾向评分是在特定样本上过度拟合而得时,用它来实现平衡就会失效,因为此时倾向模型过于贴合样本的特殊性,无法再充当衡量总体相似性的角色。
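作为参考,下面给出一个使用 scikit-learn 的 KFold 进行 4 折划分的假设性草图;本笔记本随后采用的是手写的不放回抽样来构造 4 个各含 1000 行的折叠,思路是一致的。
# K 折划分的假设性草图:每个折叠轮流作为保留集,用于样本外残差预测
import numpy as np
from sklearn.model_selection import KFold

n_obs = 4000  # 假设的样本量
kf = KFold(n_splits=4, shuffle=True, random_state=100)
for fold, (train_idx, test_idx) in enumerate(kf.split(np.arange(n_obs))):
    # train_idx 用于拟合倾向/结果模型,test_idx 用于预测残差
    print(fold, len(train_idx), len(test_idx))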
dummies = pd.concat(
[
pd.get_dummies(df["seatbelt"], drop_first=True, prefix="seatbelt"),
pd.get_dummies(df["marital"], drop_first=True, prefix="marital"),
pd.get_dummies(df["race"], drop_first=True, prefix="race"),
pd.get_dummies(df["sex"], drop_first=True, prefix="sex"),
pd.get_dummies(df["phealth"], drop_first=True, prefix="phealth"),
],
axis=1,
)
train_dfs = []
temp = pd.concat([df[["age", "bmi"]], dummies], axis=1)
for i in range(4):
idx = temp.sample(1000, random_state=100).index
X = temp.iloc[idx].copy()
t = df.iloc[idx]["smoke"]
y = df.iloc[idx]["log_y"]
train_dfs.append([X, t, y])
remaining = [True if not i in idx else False for i in temp.index]
temp = temp[remaining]
temp.reset_index(inplace=True, drop=True)
应用去偏 ML 方法#
接下来,我们定义用于拟合倾向评分模型和结果模型并进行预测的函数。由于我们采用贝叶斯方法,我们将记录结果模型和倾向模型残差的后验分布。在这两种情况下,我们都将使用基线 BART 规范,以借助机器学习的灵活性来提高准确性。然后,我们使用 K 折过程拟合模型并预测样本外折叠的残差。这使我们能够通过后处理残差的后验预测分布来提取 ATE 的后验分布。
def train_outcome_model(X, y, m=50):
coords = {"coeffs": list(X.columns), "obs": range(len(X))}
with pm.Model(coords=coords) as model:
X_data = pm.MutableData("X", X)
y_data = pm.MutableData("y_data", y)
mu = pmb.BART("mu", X_data, y, m=m)
sigma = pm.HalfNormal("sigma", 1)
obs = pm.Normal("obs", mu, sigma, observed=y_data)
idata = pm.sample_prior_predictive()
idata.extend(pm.sample(1000, progressbar=False))
return model, idata
def train_propensity_model(X, t, m=50):
coords = {"coeffs": list(X.columns), "obs_id": range(len(X))}
with pm.Model(coords=coords) as model_ps:
X_data = pm.MutableData("X", X)
t_data = pm.MutableData("t_data", t)
mu = pmb.BART("mu", X_data, t, m=m)
p = pm.Deterministic("p", pm.math.invlogit(mu))
t_pred = pm.Bernoulli("obs", p=p, observed=t_data)
idata = pm.sample_prior_predictive()
idata.extend(pm.sample(1000, progressbar=False))
return model_ps, idata
def cross_validate(train_dfs, test_idx):
test = train_dfs[test_idx]
test_X = test[0]
test_t = test[1]
test_y = test[2]
train_X = pd.concat([train_dfs[i][0] for i in range(4) if i != test_idx])
train_t = pd.concat([train_dfs[i][1] for i in range(4) if i != test_idx])
train_y = pd.concat([train_dfs[i][2] for i in range(4) if i != test_idx])
model, idata = train_outcome_model(train_X, train_y)
with model:
pm.set_data({"X": test_X, "y_data": test_y})
idata_pred = pm.sample_posterior_predictive(idata)
y_resid = idata_pred["posterior_predictive"]["obs"].stack(z=("chain", "draw")).T - test_y.values
model_t, idata_t = train_propensity_model(train_X, train_t)
with model_t:
pm.set_data({"X": test_X, "t_data": test_t})
idata_pred_t = pm.sample_posterior_predictive(idata_t)
t_resid = (
idata_pred_t["posterior_predictive"]["obs"].stack(z=("chain", "draw")).T - test_t.values
)
return y_resid, t_resid, idata_pred, idata_pred_t
y_resids = []
t_resids = []
model_fits = {}
for i in range(4):
y_resid, t_resid, idata_pred, idata_pred_t = cross_validate(train_dfs, i)
y_resids.append(y_resid)
t_resids.append(t_resid)
t_effects = []
for j in range(1000):
intercept = np.ones_like(1000)
covariates = pd.DataFrame({"intercept": intercept, "t_resid": t_resid[j, :].values})
m0 = sm.OLS(y_resid[j, :].values, covariates).fit()
t_effects.append(m0.params["t_resid"])
model_fits[i] = [m0, t_effects]
print(f"Estimated Treatment Effect in K-fold {i}: {np.mean(t_effects)}")
显示代码单元格输出
Sampling: [mu, obs, sigma]
Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>PGBART: [mu]
>NUTS: [sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 30 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [mu, obs]
Sampling: [mu, obs]
Multiprocess sampling (4 chains in 4 jobs)
PGBART: [mu]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 49 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [mu, obs]
Estimated Treatment Effect in K-fold 0: -0.0016336648336909886
Sampling: [mu, obs, sigma]
Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>PGBART: [mu]
>NUTS: [sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 29 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [mu, obs]
Sampling: [mu, obs]
Multiprocess sampling (4 chains in 4 jobs)
PGBART: [mu]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 48 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [mu, obs]
Estimated Treatment Effect in K-fold 1: -0.03578615474390446
Sampling: [mu, obs, sigma]
Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>PGBART: [mu]
>NUTS: [sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 29 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [mu, obs]
Sampling: [mu, obs]
Multiprocess sampling (4 chains in 4 jobs)
PGBART: [mu]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 49 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [mu, obs]
Estimated Treatment Effect in K-fold 2: -0.02477987896421092
Sampling: [mu, obs, sigma]
Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>PGBART: [mu]
>NUTS: [sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 30 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [mu, obs]
Sampling: [mu, obs]
Multiprocess sampling (4 chains in 4 jobs)
PGBART: [mu]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 49 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Sampling: [mu, obs]
Estimated Treatment Effect in K-fold 3: -4.86938979011845e-05
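下游代码会用到 y_resids_stacked 和 t_resids_stacked,它们应当是把 4 个折叠的样本外残差沿观测维度拼接而成(对应下方形状为 (z: 4000, obs: 4000) 的输出)。拼接该数组的单元格没有在此处显示;下面是一个假设性的最小草图,维度名仅作示例:
# 假设性草图:把每个折叠的残差 DataArray(形如 (z=4000, obs=1000))
# 沿观测维度拼接为 (z=4000, obs=4000) 的数组
import numpy as np
import xarray as xr

y_resids_stacked = xr.DataArray(
    np.concatenate([r.values for r in y_resids], axis=1), dims=("z", "obs")
)
t_resids_stacked = xr.DataArray(
    np.concatenate([r.values for r in t_resids], axis=1), dims=("z", "obs")
)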
<xarray.DataArray 'obs' (z: 4000, obs_dim_2: 4000)>
array([[ 0.22653491, -0.03317959,  3.07522255, ...,  0.79365904,  0.63064306,  0.39937819],
       [-0.24739791,  0.4835072 ,  0.42044312, ...,  0.87374101, -1.12103236,  1.69532195],
       [ 0.62194872,  0.02413724,  2.11188934, ...,  1.75446071,  0.92303434,  0.46220062],
       ...,
       [ 0.69504896, -1.03537392,  0.77942322, ...,  2.24453555, -0.80185656,  2.48775509],
       [ 1.27379947, -1.33591298,  1.57352526, ...,  0.24282071, -1.53594352,  2.42091436],
       [ 0.71378209,  0.4391884 ,  0.31025925, ...,  1.31416537,  1.59547794,  1.26825234]])
Coordinates:
  * obs_dim_2  (obs_dim_2) int64 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999
  * z          (z) object MultiIndex
  * chain      (z) int64 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 3 3 3 3 3 3 3 3 3 3 3 3 3
  * draw       (z) int64 0 1 2 3 4 5 6 7 8 ... 992 993 994 995 996 997 998 999
后处理残差的后验预测分布
t_effects = []
intercepts = []
for i in range(4000):
intercept = np.ones_like(4000)
covariates = pd.DataFrame({"intercept": intercept, "t_resid": t_resids_stacked[i, :].values})
m0 = sm.OLS(y_resids_stacked[i, :].values, covariates).fit()
t_effects.append(m0.params["t_resid"])
intercepts.append(m0.params["intercept"])
fig, axs = plt.subplots(1, 2, figsize=(15, 6))
axs = axs.flatten()
axs[0].hist(t_effects, bins=30, ec="black", color="slateblue", label="ATE", density=True, alpha=0.6)
x = np.linspace(-1, 1, 10)
for i in range(1000):
axs[1].plot(x, intercepts[i] + t_effects[i] * x, color="darkred", alpha=0.3)
axs[0].set_title("Double ML - ATE estimate \n Distribution")
axs[1].set_title(r" Posterior Regression Line of Residuals: r(Y) ~ $\beta$r(T)")
ate = np.mean(t_effects)
axs[0].axvline(ate, color="black", linestyle="--", label=f"E(ATE) = {np.round(ate, 2)}")
axs[1].plot(x, np.mean(intercepts) + np.mean(t_effects) * x, color="black", label="Expected Fit")
axs[1].legend()
axs[0].legend();

我们可以看到,去偏机器学习技术有助于减轻把 BART 模型天真地用于倾向评分估计时出现的一些错误校准效应。此外,我们现在还量化了 ATE 的不确定性,这体现为残差回归线相对平坦、斜率接近于零。
条件平均治疗效果#
我们在这里注意到,双重 ML 方法也为这个问题提供了另一种视角:它让您可以不再局限于平均治疗效果,而是检索更贴合个体协变量特征的估计值。大部分理论细节在 Facure [2023] 中有详细阐述,我们在此不再展开,只是指出:在估计个体层面的治疗效应时,使用残差化的预测非常有意义。下面我们简要展示如何利用导出的残差,并借助机器学习的灵活性来检索 CATE 估计值,以更好地捕获异质治疗效应的范围。
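下面的 make_cate 函数使用了一个标准的目标变换(对应代码中的 target 与 weight 两列):以 \(\tilde{Y}_{i} = r(Y)_{i}/r(T)_{i}\) 为目标、以 \(w_{i} = r(T)_{i}^{2}\) 为权重做加权回归,等价于最小化
\[ \sum_{i} w_{i}\big(\tilde{Y}_{i} - \tau(X_{i})\big)^{2} = \sum_{i}\big(r(Y)_{i} - \tau(X_{i})\, r(T)_{i}\big)^{2} \]
因此拟合得到的 \(\tau(X)\) 可以被解读为条件平均治疗效应 (CATE) 的估计;代码中加到 \(r(T)\) 上的极小常数只是为了避免除以零。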
def make_cate(y_resids_stacked, t_resids_stacked, train_dfs, i, method="forest"):
train_X = pd.concat([train_dfs[i][0] for i in range(4)])
train_t = pd.concat([train_dfs[i][1] for i in range(4)])
df_cate = pd.DataFrame(
{"y_r": y_resids_stacked[i, :].values, "t_r": t_resids_stacked[i, :].values}
)
df_cate["target"] = df_cate["y_r"] / np.where(
df_cate["t_r"] == 0, df_cate["t_r"] + 1e-25, df_cate["t_r"]
)
df_cate["weight"] = df_cate["t_r"] ** 2
train_X.reset_index(drop=True, inplace=True)
train_t.reset_index(drop=True, inplace=True)
if method == "forest":
CATE_model = RandomForestRegressor()
CATE_model.fit(train_X, df_cate["target"], sample_weight=df_cate["weight"])
elif method == "gradient":
CATE_model = GradientBoostingRegressor()
CATE_model.fit(train_X, df_cate["target"], sample_weight=df_cate["weight"])
else:
CATE_model = sm.WLS(df_cate["target"], train_X, weights=df_cate["weight"])
CATE_model = CATE_model.fit()
df_cate["CATE"] = CATE_model.predict(train_X)
df_cate["t"] = train_t
return df_cate
显示代码单元格源
fig, axs = plt.subplots(1, 3, figsize=(20, 7))
axs = axs.flatten()
q_95 = []
for i in range(100):
cate_df = make_cate(y_resids_stacked, t_resids_stacked, train_dfs, i)
axs[1].hist(
cate_df[cate_df["t"] == 0]["CATE"],
bins=30,
alpha=0.1,
color="red",
density=True,
)
q_95.append(
[
cate_df[cate_df["t"] == 0]["CATE"].quantile(0.99),
cate_df[cate_df["t"] == 1]["CATE"].quantile(0.99),
cate_df[cate_df["t"] == 0]["CATE"].quantile(0.01),
cate_df[cate_df["t"] == 1]["CATE"].quantile(0.01),
]
)
axs[1].hist(cate_df[cate_df["t"] == 1]["CATE"], bins=30, alpha=0.1, color="blue", density=True)
axs[1].set_title(
"CATE Predictions \n Estimated across Posterior of Residuals", fontsize=20, fontweight="bold"
)
q_df = pd.DataFrame(q_95, columns=["Control_p99", "Treated_p99", "Control_p01", "Treated_p01"])
axs[2].hist(q_df["Treated_p99"], ec="black", color="blue", alpha=0.4, label="Treated p99")
axs[2].hist(q_df["Control_p99"], ec="black", color="red", alpha=0.4, label="Control p99")
axs[2].legend()
axs[2].set_title("Distribution of p99 CATE predictions")
axs[0].hist(q_df["Treated_p01"], ec="black", color="blue", alpha=0.4, label="Treated p01")
axs[0].hist(q_df["Control_p01"], ec="black", color="red", alpha=0.4, label="Control p01")
axs[0].legend()
axs[0].set_title("Distribution of p01 CATE predictions");

这种视角开始显示出因果影响中异质性的重要性,并提供了一种评估治疗效应差异的方法。不过,虽然结果暗示治疗人群(即吸烟者)具有更长的尾部和更极端的结果,您可能会问:为什么这种效应在某种程度上是对称的?为什么他们同时也会出现支出更低的极端情形呢?
中介效应和因果结构#
上面我们已经看到,许多旨在避免偏差的不同方法都引导我们得出同一个结论:吸烟对预期医疗保健成本的影响很小或没有影响。值得强调的是,这应该让您感到奇怪!
难题的答案不在于更精确的估计方法,而在于问题的结构。并不是吸烟与医疗保健成本无关,而是这种影响经由吸烟对健康状况的作用被中介了。
引入中介结构意味着,只有当序贯可忽略性成立时,有效的因果推断才能得到保证。也就是说,在以协变量特征为条件的情况下,潜在结果独立于治疗分配的历史。更多详细信息可以在 Daniels et al. [2024] 中找到。但是,如果我们按照 贝叶斯中介分析 中的描述将中介结构写入我们的模型,我们就可以看到它是如何工作的。根本的一点是,我们需要扩充我们的建模以反映因果流的结构。这是一种关于数据生成过程的实质性结构信念,我们将其施加在中介和结果的联合分布模型中。
fig, ax = plt.subplots(figsize=(20, 6))
graph = nx.DiGraph()
graph.add_node("T")
graph.add_node("M")
graph.add_node("Y")
graph.add_edges_from([("T", "M"), ("M", "Y"), ("T", "Y")])
nx.draw(
graph,
arrows=True,
with_labels=True,
pos={"T": (1, 2), "M": (1.8, 3), "Y": (3, 1)},
ax=ax,
node_size=6000,
font_color="whitesmoke",
font_size=20,
)

在这里,治疗 T 通过中介 M 流动以影响结果 Y,同时直接影响结果。
dummies = pd.concat(
[
pd.get_dummies(df["seatbelt"], drop_first=True, prefix="seatbelt"),
pd.get_dummies(df["marital"], drop_first=True, prefix="marital"),
pd.get_dummies(df["race"], drop_first=True, prefix="race"),
pd.get_dummies(df["sex"], drop_first=True, prefix="sex"),
pd.get_dummies(df["phealth"], drop_first=True, prefix="phealth"),
],
axis=1,
)
idx = df.sample(5000, random_state=100).index
X = pd.concat(
[
df[["age", "bmi"]],
dummies,
],
axis=1,
)
X = X.iloc[idx]
t = df.iloc[idx]["smoke"]
y = df.iloc[idx]["log_y"]
X
lkup = {
"phealth_Poor": 1,
"phealth_Fair": 2,
"phealth_Good": 3,
"phealth_Very Good": 4,
"phealth_Excellent": 5,
}
### Construct the health status variables as an ordinal rank
### to use the health rank as a mediator for smoking.
m = pd.DataFrame(
(
pd.from_dummies(
X[["phealth_Poor", "phealth_Fair", "phealth_Good", "phealth_Very Good"]],
default_category="phealth_Excellent",
).values.flatten()
),
columns=["health"],
)["health"].map(lkup)
X_m = X.copy().drop(["phealth_Poor", "phealth_Fair", "phealth_Good", "phealth_Very Good"], axis=1)
X_m
| | age | bmi | seatbelt_Always | seatbelt_Never | seatbelt_NoCar | seatbelt_Seldom | seatbelt_Sometimes | marital_Married | marital_Separated | marital_Widowed | race_Black | race_Indig | race_Multi | race_PacificIslander | race_White | sex_Male |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2852 | 27 | 23.7 | False | False | False | False | False | True | False | False | False | False | False | False | True | True |
| 13271 | 71 | 29.1 | True | False | False | False | False | False | False | True | True | False | False | False | False | False |
| 6786 | 19 | 21.3 | True | False | False | False | False | False | True | False | False | False | False | False | True | False |
| 15172 | 20 | 38.0 | True | False | False | False | False | False | True | False | True | False | False | False | False | False |
| 10967 | 22 | 28.7 | True | False | False | False | False | True | False | False | False | False | False | False | True | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1135 | 51 | 29.8 | False | False | False | False | False | False | True | False | False | False | False | False | True | False |
| 1706 | 50 | 26.6 | True | False | False | False | False | True | False | False | False | False | False | False | True | False |
| 11205 | 24 | 23.7 | True | False | False | False | False | False | True | False | False | False | False | False | True | False |
| 12323 | 36 | 35.9 | True | False | False | False | False | False | False | False | True | False | False | False | False | True |
| 15604 | 29 | -9.0 | True | False | False | False | False | False | True | False | False | False | False | False | True | False |

5000 行 × 16 列
我们定义了一个灵活的模型:结果用 BART 模型来估计,中介变量用更简单的线性回归来估计。同时,我们把两者都嵌入线性方程中,以捕获“解读”吸烟对健康支出的总效应、间接效应和直接效应(经由健康状况中介)所需的系数。
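与下方 PyMC 模型大致对应的参数化(为清晰起见,略去协变量项 \(X\beta\) 与 BART 部分)可以写成:
\[ M = i_{m} + aT + \varepsilon_{1}, \qquad Y = i_{y} + bM + c'T + \varepsilon_{2} \]
在这种线性参数化下,间接效应为 \(a \cdot b\),直接效应为 \(c'\),总效应为 \(a \cdot b + c'\);这正是代码中确定性量 indirect effect 与 total effect 的由来。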
def mediation_model(X_m, t, m, y):
with pm.Model(coords={"coeffs": list(X_m.columns), "obs_id": range(len(X_m))}) as model:
t_data = pm.MutableData("t", t, dims="obs_id")
m = pm.MutableData("m", m, dims="obs_id")
X_data = pm.MutableData("X", X_m)
# intercept priors
im = pm.Normal("im", mu=0, sigma=10)
iy = pm.Normal("iy", mu=0, sigma=10)
# slope priors
a = pm.Normal("a", mu=0, sigma=1)
b = pm.Normal("b", mu=0, sigma=1)
cprime = pm.Normal("cprime", mu=0, sigma=1)
# noise priors
sigma1 = pm.Exponential("sigma1", 1)
sigma2 = pm.Exponential("sigma2", 1)
bart_mu = pmb.BART("mu", X_data, y)
beta = pm.Normal("beta", mu=0, sigma=1, dims="coeffs")
mu_m = pm.Deterministic("mu_m", pm.math.dot(X_data, beta), dims="obs_id")
# likelihood
pm.Normal(
"mlikelihood", mu=(im + a * t_data) + mu_m, sigma=sigma1, observed=m, dims="obs_id"
)
pm.Normal(
"ylikelihood",
mu=iy + b * m + cprime * t_data + bart_mu,
sigma=sigma2,
observed=y,
dims="obs_id",
)
# calculate quantities of interest
indirect_effect = pm.Deterministic("indirect effect", a * b)
total_effect = pm.Deterministic("total effect", a * b + cprime)
return model
model = mediation_model(X_m, t, m, y)
pm.model_to_graphviz(model)
with model:
result = pm.sample(tune=1000, random_seed=42, target_accept=0.95)
Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>NUTS: [im, iy, a, b, cprime, sigma1, sigma2, beta]
>PGBART: [mu]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 156 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
ax = az.plot_posterior(
result,
var_names=["cprime", "indirect effect", "total effect"],
ref_val=0,
hdi_prob=0.95,
figsize=(20, 6),
)
ax[0].set(title="direct effect");

在这里,我们开始了解发生了什么。吸烟的直接效应是负的,而经由健康状况中介的间接效应是正的,两者相互抵消,削弱了治疗对结果的总体效应。在这种新结构下,我们当然可以使用回归插补方法进行因果推断,推导出某种“其他条件不变”意义下、吸烟经由所观察到的健康特征中介的效应。更简洁地,我们可以使用潜在结果框架推导出因果中介估计量,而这种方式不依赖于结果模型和中介模型的函数形式。
中介估计量#
在 中介 分析中,我们接受这样一种观点:问题的正确因果结构由治疗变量与感兴趣结果之间经由中介的影响路径来表示。这意味着治疗存在预期的效应,但我们需要做更多工作来区分治疗对结果的自然直接效应 (NDE) 和自然间接效应 (NIE)。
这些量使用潜在结果符号表示如下
令 \(M(t)\) 为治疗规范 \(t\) 下的中介值。
那么 \(Y(t, M(t))\) 是结果变量 Y 的“嵌套”潜在结果,它依赖于基于 \(t\) 的 \(M\) 的实现。
区分 \(t\) 作为治疗变量在 \(\{ 0, 1\}\) 中的特定设置,以及 \(t^{*}\) 作为替代设置。
现在我们定义
NDE: \(E[Y(t, M(t^{*})) - Y(t^{*}, M(t^{*}))]\)
也就是说,我们对不同治疗下结果的差异感兴趣,这些差异由特定治疗方案下 M 的值调节。
NIE: \(E[(Y(t, M(t))) - Y(t, M(t^{*}))]\)
这相当于由于生成中介值 M 的治疗方案的差异而导致的结果 Y 的差异。
TE: NDE + NIE
这些表达式很微妙,潜在结果符号也可能有点晦涩,但这是我们继承下来的记号。关键点在于:在 NDE 中,中介变量所处的治疗方案是固定的,而结果所处的治疗状态发生变化;在 NIE 中,中介变量所处的治疗方案发生变化,而结果所处的治疗状态保持固定。
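把这些定义与上面的线性中介参数化联系起来的一个简短推导(假设不存在治疗与中介的交互作用):在 \(M = i_{m} + aT + \varepsilon_{1}\)、\(Y = i_{y} + bM + c'T + \varepsilon_{2}\) 之下,
\[ NDE = E[Y(t, M(t^{*})) - Y(t^{*}, M(t^{*}))] = c'\,(t - t^{*}), \qquad NIE = E[Y(t, M(t)) - Y(t, M(t^{*}))] = a\, b\,(t - t^{*}) \]
取 \(t = 1, t^{*} = 0\) 时,它们分别对应上面模型中的 \(c'\) 与 \(a \cdot b\)。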
我们将演示如何使用反事实插补方法来恢复因果中介估计量:先插补不同治疗方案下的中介值,再将这些插补值适当地代入 NDE 和 NIE 的公式中。我们将这些计算建立在拟合初始中介模型时得到的推断数据对象 result
。
def counterfactual_mediation(model, result, X, treatment_status=1):
if treatment_status == 1:
t_mod = np.ones(len(X), dtype="int32")
else:
t_mod = np.zeros(len(X), dtype="int32")
with model:
# update values of predictors:
pm.set_data({"t": t_mod})
idata = pm.sample_posterior_predictive(result, var_names=["mlikelihood"], progressbar=False)
return idata
### Impute Mediation values under different treatment regimes
### To be used to vary the imputation efforts of the outcome variable in the
### NDE and NIE calculations below.
idata_1m = counterfactual_mediation(model, result, X_m, treatment_status=1)
idata_0m = counterfactual_mediation(model, result, X_m, treatment_status=0)
def counterfactual_outcome(
model, result, m_idata, sample_index=0, treatment_status=1, modified_m=True
):
"""Ensure we can change sample_index so we can post-process the mediator posterior predictive
distributions and derive posterior predictive views of the conditional variation in the outcome.
"""
if treatment_status == 1:
t_mod = np.ones(len(X), dtype="int32")
m_mod = az.extract(m_idata["posterior_predictive"]["mlikelihood"])["mlikelihood"][
:, sample_index
].values.astype(int)
else:
t_mod = np.zeros(len(X), dtype="int32")
m_mod = az.extract(m_idata["posterior_predictive"]["mlikelihood"])["mlikelihood"][
:, sample_index
].values.astype(int)
if not modified_m:
m_mod = result["constant_data"]["m"].values
with model:
# update values of predictors:
pm.set_data({"t": t_mod, "m": m_mod})
idata = pm.sample_posterior_predictive(result, var_names=["ylikelihood"], progressbar=False)
return idata
### Using one draw from the posterior of the mediation inference objects.
### We vary the treatment of the outcome but keep the Mediator values static under
### the counterfactual regime of no treatment
idata_nde1 = counterfactual_outcome(model, result, m_idata=idata_0m, treatment_status=1)
idata_nde0 = counterfactual_outcome(model, result, m_idata=idata_0m, treatment_status=0)
### We fix the treatment regime for the outcome but vary the mediator status
### between those counterfactual predictions and the observed mediator values
idata_nie0 = counterfactual_outcome(model, result, m_idata=idata_0m, treatment_status=0)
idata_nie1 = counterfactual_outcome(
model, result, m_idata=idata_0m, treatment_status=0, modified_m=False
)
Sampling: [mlikelihood]
Sampling: [mlikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
### Natural Direct Effect
nde = (
idata_nde1["posterior_predictive"]["ylikelihood"].mean()
- idata_nde0["posterior_predictive"]["ylikelihood"].mean()
)
nde
<xarray.DataArray 'ylikelihood' ()> array(-0.0828947)
### Natural InDirect Effect
nie = (
idata_nie0["posterior_predictive"]["ylikelihood"].mean()
- idata_nie1["posterior_predictive"]["ylikelihood"].mean()
)
nie
<xarray.DataArray 'ylikelihood' ()> array(0.08293172)
### Total Effect
nde + nie
<xarray.DataArray 'ylikelihood' ()> array(3.70153227e-05)
请注意,我们以这种方式恢复的估计值如何反映了从上面中介模型导出的规范公式。然而,重要的是,无论我们的中介和结果模型的参数结构如何,这种插补方法都是可行的。
接下来,我们循环遍历中介值的反事实后验中的抽取样本,以推导出因果估计量的后验预测分布。
estimands = []
for i in range(400):
idata_nde1 = counterfactual_outcome(
model, result, m_idata=idata_0m, treatment_status=1, sample_index=i
)
idata_nde0 = counterfactual_outcome(
model, result, m_idata=idata_0m, treatment_status=0, sample_index=i
)
idata_nie0 = counterfactual_outcome(
model, result, m_idata=idata_0m, treatment_status=0, sample_index=i
)
idata_nie1 = counterfactual_outcome(
model, result, m_idata=idata_0m, treatment_status=0, modified_m=False, sample_index=i
)
nde = (
idata_nde1["posterior_predictive"]["ylikelihood"].mean()
- idata_nde0["posterior_predictive"]["ylikelihood"].mean()
)
nie = (
idata_nie0["posterior_predictive"]["ylikelihood"].mean()
- idata_nie1["posterior_predictive"]["ylikelihood"].mean()
)
te = nde + nie
estimands.append([nde.item(), nie.item(), te.item()])
estimands_df = pd.DataFrame(
estimands, columns=["Natural Direct Effect", "Natural Indirect Effect", "Total Effect"]
)
显示代码单元格输出
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
Sampling: [ylikelihood]
estimands_df.head()
|   | Natural Direct Effect | Natural Indirect Effect | Total Effect |
|---|---|---|---|
| 0 | -0.083009 | 0.083016 | 0.000007 |
| 1 | -0.083036 | 0.086371 | 0.003335 |
| 2 | -0.082918 | 0.082654 | -0.000264 |
| 3 | -0.082951 | 0.084162 | 0.001211 |
| 4 | -0.082967 | 0.092567 | 0.009600 |
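The three columns follow the potential-outcomes decomposition of the total effect into natural direct and indirect pieces: NDE = E[Y(1, M(0)) − Y(0, M(0))], NIE = E[Y(1, M(1)) − Y(1, M(0))], and TE = NDE + NIE. As a minimal sketch (not the notebook's exact code: the arrays y_t0_m0, y_t1_m0 and y_t1_m1 are hypothetical posterior predictive draws of the potential outcomes, simulated here purely for illustration), the data frame could be assembled as follows:
import numpy as np
import pandas as pd

# Hypothetical posterior predictive draws of the potential outcomes,
# shaped (posterior samples, observations); values are arbitrary illustrations.
rng_sketch = np.random.default_rng(0)
samples, n_obs = 1000, 500
y_t0_m0 = rng_sketch.normal(8.0, 1.0, size=(samples, n_obs))  # Y(0, M(0))
y_t1_m0 = y_t0_m0 - 0.08 + rng_sketch.normal(0, 0.01, size=(samples, n_obs))  # Y(1, M(0))
y_t1_m1 = y_t1_m0 + 0.08 + rng_sketch.normal(0, 0.01, size=(samples, n_obs))  # Y(1, M(1))

estimands_sketch = pd.DataFrame(
    {
        # NDE: switch the treatment while holding the mediator at its untreated value
        "Natural Direct Effect": (y_t1_m0 - y_t0_m0).mean(axis=1),
        # NIE: hold the treatment at 1 and let the mediator respond to the treatment
        "Natural Indirect Effect": (y_t1_m1 - y_t1_m0).mean(axis=1),
        # Total effect decomposes as NDE + NIE
        "Total Effect": (y_t1_m1 - y_t0_m0).mean(axis=1),
    }
)
estimands_sketch.head()
Each row corresponds to one posterior draw, so the histograms plotted below summarise full posterior distributions of the mediation estimands rather than single point estimates.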
Show code cell source
fig, axs = plt.subplots(1, 3, figsize=(25, 8))
axs = axs.flatten()
axs[0].hist(estimands_df["Natural Direct Effect"], bins=20, ec="black", color="red", alpha=0.3)
axs[1].hist(estimands_df["Natural Indirect Effect"], bins=20, ec="black", alpha=0.3)
axs[2].hist(estimands_df["Total Effect"], bins=20, ec="black", color="slateblue")
axs[2].axvline(
estimands_df["Total Effect"].mean(),
color="black",
linestyle="--",
label="Expected Total Effect",
)
axs[1].axvline(
estimands_df["Natural Indirect Effect"].mean(),
color="black",
linestyle="--",
label="Expected NIE",
)
axs[0].axvline(
estimands_df["Natural Direct Effect"].mean(),
color="black",
linestyle="--",
label="Expected NDE",
)
axs[0].set_title("Posterior Predictive Distribution \n Natural Direct Effect")
axs[0].set_xlabel("Change in Log(Expenditure)")
axs[1].set_xlabel("Change in Log(Expenditure)")
axs[2].set_xlabel("Change in Log(Expenditure)")
axs[1].set_title("Posterior Predictive Distribution \n Natural Indirect Effect")
axs[2].set_title("Posterior Predictive Distribution \n Total Effect")
axs[2].legend()
axs[1].legend()
axs[0].legend()
plt.suptitle(
"Causal Mediation Estimands \n Using Potential Outcomes", fontsize=20, fontweight="bold"
);

Conclusion#
Propensity scores are a useful tool for thinking about the structure of a causal inference problem. They speak directly to concerns about selection effects and can be used proactively to re-weight the evidence gathered from an observational data set. They can be applied alongside flexible machine learning models and cross-validation techniques to correct for the over-fitting of machine learning models such as BART. They are not a direct tool for regularising model fit in the way a Bayesian prior is, but they are a tool in the same vein. They require the analyst to articulate a theory of the treatment-assignment mechanism, and under strong ignorability that theory is enough to evaluate policy changes in the outcome of interest.
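To make the re-weighting point concrete, here is a minimal sketch of the inverse-propensity-weighted ATE estimator; the arrays t, y and p_hat (treatment indicator, outcome, and estimated propensity score) are hypothetical stand-ins simulated on the spot, and the scores are clipped away from 0 and 1 to guard against extreme weights:
import numpy as np

rng_ipw = np.random.default_rng(1)
n = 1_000
t = rng_ipw.binomial(1, 0.4, size=n)  # hypothetical treatment indicator
y = 1.5 * t + rng_ipw.normal(0, 1, size=n)  # hypothetical outcome with a true ATE of 1.5
p_hat = np.clip(0.4 + rng_ipw.normal(0, 0.02, size=n), 0.01, 0.99)  # estimated propensity scores

# Horvitz-Thompson style inverse-propensity-weighted estimate of the ATE
ate_ipw = np.mean(t * y / p_hat - (1 - t) * y / (1 - p_hat))
print(f"IPW estimate of the ATE: {ate_ipw:.2f}")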
The role of propensity scores in doubly robust estimation methods and in the debiased machine learning approach to causal inference highlights the balance between a theory of the outcome variable and a theory of the treatment-assignment mechanism. We have seen that naively throwing machine learning models at a causal question can result in a mis-specified treatment-assignment model and badly skewed estimates based on naive point predictions. Angrist and Pischke argue that this should push us back towards the safety of thoughtful and careful regression modelling. Even in the case of the debiased machine learning model there is an implicit appeal to a regression estimator underwritten by the Frisch-Waugh-Lovell result. But even here a hasty approach can lead to counter-intuitive claims. Each of these tools for thought has a scope of application, and knowing their limits is crucial for supporting credible causal claims.
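The Frisch-Waugh-Lovell result referenced above says that the coefficient on the treatment in a full regression of the outcome on treatment and covariates can be recovered by "partialling out": residualise the outcome and the treatment on the covariates separately, then regress residuals on residuals. A minimal simulated sketch (all names and data here are hypothetical, using statsmodels as elsewhere in this notebook):
import numpy as np
import statsmodels.api as sm

rng_fwl = np.random.default_rng(2)
n = 2_000
X = rng_fwl.normal(size=(n, 3))
t = X @ np.array([0.5, -0.3, 0.2]) + rng_fwl.normal(size=n)
y = 2.0 * t + X @ np.array([1.0, 1.0, -1.0]) + rng_fwl.normal(size=n)

# Full regression of Y on [T, X]
full_fit = sm.OLS(y, sm.add_constant(np.column_stack([t, X]))).fit()

# Partialling out: residualise Y and T on X, then regress residual on residual
y_resid = sm.OLS(y, sm.add_constant(X)).fit().resid
t_resid = sm.OLS(t, sm.add_constant(X)).fit().resid
partial_fit = sm.OLS(y_resid, t_resid).fit()

# Both recover the same coefficient on the treatment (close to the true 2.0)
print(full_fit.params[1], partial_fit.params[0])
Debiased machine learning replaces the two OLS residualising steps with flexible machine-learning fits plus cross-fitting, which is exactly where a mis-specified nuisance model can leak bias back into the estimate.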
The range of results we have seen underscores the need for careful attention to the structure of the data-generating model. The final example shows that causal inference is intimately tied to inference over graphs of causal structure. The propagation of uncertainty through that causal structure really matters! Hoping that such structure will be identified automatically by the magic of machine learning is a fantasy. We have seen propensity-score methods that attempt to correct inference by re-weighting, doubly robust methods that attempt to correct inference through the predictive power of machine learning strategies, and finally structural modelling that corrects estimates by imposing constraints on the paths of influence between covariates.
This is how inference gets done. You encode your knowledge of the world and update your view as the evidence accumulates. Causal inference requires a commitment to a structural view of the world, and we cannot feign ignorance when that ignorance would render the inference untenable.
References#
M. A. Hernán and J. M. Robins. Causal Inference: What If. Chapman & Hall/CRC, 2020.
Joshua Angrist and Jörn-Steffen Pischke. Mostly Harmless Econometrics. Princeton University Press, 2008.
Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):1–68, 2018. doi:10.1111/ectj.12097.
Watermark#
%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor
Last updated: Wed Feb 28 2024
Python implementation: CPython
Python version : 3.11.7
IPython version : 8.20.0
pytensor: 2.18.6
networkx : 3.2.1
matplotlib : 3.8.2
statsmodels: 0.14.1
pymc : 5.10.3
pytensor : 2.18.6
arviz : 0.17.0
pandas : 2.2.0
pymc_bart : 0.5.7
xarray : 2024.1.1
numpy : 1.24.4
Watermark: 2.4.3
License notice#
All the notebooks in this example gallery are provided under the MIT License, which allows modification and redistribution for any use provided the copyright and license notices are preserved.
Citing PyMC examples#
To cite this notebook, use the DOI provided by Zenodo for the pymc-examples repository.
Important
Many notebooks are adapted from other sources: blogs, books... In such cases you should also cite the original source.
Also remember to cite the relevant libraries used by your code.
Here is a citation template in bibtex:
@incollection{citekey,
author = "<notebook authors, see above>",
title = "<notebook title>",
editor = "PyMC Team",
booktitle = "PyMC examples",
doi = "10.5281/zenodo.5654871"
}
which once rendered could look like: