Endovascular vs. Open Surgery:
Analysis of Survival Outcomes Using
Instrumental Variables

Jared Huling

Joint work with Dr. Menggang Yu and Dr. James O'Malley, Dartmouth

Biostatistics & Medical Informatics
University of Wisconsin – Madison

www.stat.wisc.edu/~huling

Causal Inference In Observational Studies

Abdominal Aortic Aneurysm

Previous Analysis


Instrumental Variables

Can we estimate the causal effect of $X$ on $T$ without knowing $U$?

Consider the model $T = \beta X + U, \mbox{ where } U = (V, \epsilon)$.

Then $\hat{\beta}_{OLS} = (X'X)^{-1}X'T = \beta + (X'X)^{-1}X'U$ is biased because $E(X'U) \neq 0$.
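The bias can be seen in a small simulation. The sketch below is only illustrative: the sample size, the coefficient 0.8, and the choice $U = V + \epsilon$ with standard normal $V$ and $\epsilon$ are assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 20_000, 1.0                 # assumed sample size and true effect

v = rng.normal(size=n)                # unmeasured confounder V
eps = rng.normal(size=n)              # independent noise
u = v + eps                           # assumed form of U built from (V, eps)
x = 0.8 * v + rng.normal(size=n)      # exposure depends on V, so E(X'U) != 0
t = beta * x + u                      # outcome model T = beta * X + U

# OLS without intercept, matching (X'X)^{-1} X'T for a scalar X
beta_ols = (x @ t) / (x @ x)
bias_term = (x @ u) / (x @ x)         # the term (X'X)^{-1} X'U

print(f"true beta = {beta}, OLS = {beta_ols:.3f}, bias term = {bias_term:.3f}")
```

The OLS estimate equals the true $\beta$ plus the bias term exactly, and the bias does not shrink as $n$ grows.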



Instrumental Variables in Linear Regression



  • Assumptions
    • Model: $T = \beta X + U$
    • $Z\perp\!\!\!\perp U$ and $X \not\!\perp\!\!\!\perp Z$
    • $T\perp\!\!\!\perp Z \mid X, U$

Instrumental Variables in Linear Regression



  • Estimator
    • $\hat{\beta}_{IV} = (Z'X)^{-1}Z'T$ is consistent, whereas $\hat{\beta}_{OLS}$ is not
    • Interpretation as two-stage least squares: $\hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'T$, where $\hat{X} = Z(Z'Z)^{-1}Z'X$, is another consistent IV estimator
    • Equivalent to regressing $X$ on $Z$ and then regressing $T$ on the fitted values $\hat{X}$ from the first regression
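A minimal numerical sketch of the two estimators, assuming one instrument and one exposure (so the two formulas coincide). The data-generating coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 20_000, 1.0                          # assumed values

z = rng.normal(size=n)                         # instrument, independent of U
v = rng.normal(size=n)                         # unmeasured confounder
x = 0.7 * z + 0.8 * v + rng.normal(size=n)     # X depends on Z and on V
t = beta * x + v + rng.normal(size=n)          # T = beta * X + U

Z, X = z[:, None], x[:, None]                  # n x 1 design matrices

beta_ols = np.linalg.solve(X.T @ X, X.T @ t)[0]        # (X'X)^{-1} X'T
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ t)[0]         # (Z'X)^{-1} Z'T

X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)          # first stage: fitted X
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ t)[0]  # second stage

print(f"OLS = {beta_ols:.3f}, IV = {beta_iv:.3f}, 2SLS = {beta_2sls:.3f}")
```

With a single instrument the IV and 2SLS estimates agree up to floating-point error. They differ only when there are more instruments than exposures.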


Instrumental Variables in Survival Analysis


Review: Accelerated Failure Time Model




Simulation from log-normal AFT Model with Crossing

Review: AFT Model Estimation

Rank-based Estimation of $\beta$ \begin{align*} \Psi_n(\beta) &= \sum_{i = 1}^n\int \rho(t, \beta)\{ X_i - \overline{X}(t, \beta) \} \mbox{ } \mathrm{d}N_i(t; \beta) \\ & \\ \mbox{where } \overline{X}(t, \beta) &\equiv \frac{1}{n}\sum_{j=1}^n X_j I(\epsilon_j^\beta \ge t) \mbox{ } / \mbox{ } \frac{1}{n}\sum_{j=1}^n I(\epsilon_j^\beta \ge t) \mbox{ and} \\ & \\ \epsilon_i^\beta &= \log{T_i} - \beta X_i \mbox{ is the residual for subject }i \\ \mbox{ and } N_i(t; \beta) &= I(\epsilon_i^\beta \leq t, \Delta_i = 1) \end{align*}

Here $\rho(t, \beta)$ is a weight function, and $\hat{\beta}$ is the zero crossing of $\Psi_n(\beta)$. Its asymptotic normality was proved by Tsiatis (1990) and Ying (1993).
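The estimating function can be evaluated directly by sorting the residuals. The sketch below assumes a scalar covariate, the log-rank weight $\rho \equiv 1$, and a simple grid search for the zero crossing. The function name `rank_score` and all simulation settings are hypothetical.

```python
import numpy as np

def rank_score(b, log_t, x, delta):
    """Psi_n(b) with rho = 1 (log-rank weight) for a scalar covariate."""
    resid = log_t - b * x                      # residuals epsilon_i^b
    order = np.argsort(resid)
    x_s, d_s = x[order], delta[order]
    # Reverse cumulative sums run over the risk set {j : resid_j >= resid_i}.
    x_bar = np.cumsum(x_s[::-1])[::-1] / np.arange(len(x), 0, -1)
    return np.sum(d_s * (x_s - x_bar))         # only events (delta = 1) contribute

rng = np.random.default_rng(2)
n, beta = 1000, 0.5                            # assumed values
x = rng.normal(size=n)
log_t_true = beta * x + rng.normal(size=n)     # log-normal AFT model
log_c = rng.normal(loc=1.0, size=n)            # independent censoring
log_t = np.minimum(log_t_true, log_c)          # observed log time
delta = (log_t_true <= log_c).astype(float)    # event indicator

# Psi_n is a step function of b, so take the grid point closest to zero.
grid = np.linspace(beta - 1.0, beta + 1.0, 401)
scores = np.array([rank_score(b, log_t, x, delta) for b in grid])
beta_hat = grid[np.argmin(np.abs(scores))]

print(f"beta_hat = {beta_hat:.3f} (true beta = {beta})")
```

The grid is centered on the true value only for convenience. In practice a root-finder or a monotone weight such as Gehan's would be a more robust choice.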

AFT Estimating Equation

Assumptions for IVs in the Accelerated Failure Time Model


    • The underlying model is $\log \widetilde{T}_i = \beta X_i + U_i$, $i=1,\dots,n$, where
      • $Z_i\perp\!\!\!\perp U_i$
      • $\widetilde{T}_i\perp\!\!\!\perp Z_i \mid X_i, U_i$
      • $X_i \not\!\perp\!\!\!\perp Z_i$
    • A key difference from standard AFT assumptions: $C_i\perp\!\!\!\perp (X_i, Z_i, U_i, \widetilde{T}_i)$

Methods for Unmeasured Confounders - possible estimator of $\beta$


In the spirit of 2SLS, replace $X$ with ${\color{Yellow}\hat{X}} = Z(Z'Z)^{-1}Z'X$ \begin{align*} \Psi_n^{2SLS}(\beta) &= \sum_{i = 1}^n\int \rho(t, \beta)\{ {\color{Yellow}\hat{X}_i} - {\color{Yellow}{\tilde{X}}(t, \beta)} \} \mbox{ } \mathrm{d}N_i(t; \beta) \\ & \\ \mbox{where } {\color{Yellow}{\tilde{X}}(t, \beta)} &\equiv \frac{1}{n}\sum_{j=1}^n {\color{Yellow}\hat{X}_j} I({\color{Yellow}\hat{\epsilon}_j^\beta} \ge t) \mbox{ } / \mbox{ } \frac{1}{n}\sum_{j=1}^n I( {\color{Yellow}\hat{\epsilon}_j^\beta} \ge t) \mbox{ and} \\ & \\ {\color{Yellow}\hat{\epsilon}_i^\beta} &= \log{T_i} - \beta {\color{Yellow}\hat{X}_i} \mbox{ is the residual for subject }i \mbox{ and } N_i(t; \beta) = I({\color{Yellow}\hat{\epsilon}_i^\beta} \leq t, \Delta_i = 1) \end{align*}

However, this still imposes a linear assumption on the effect of the IV on $X$.

Methods for Unmeasured Confounders - possible estimator of $\beta$


In the spirit of IV, replace $X$ with ${\color{Yellow}Z}$ \begin{align*} \Psi_n^{IV}(\beta) &= \sum_{i = 1}^n\int \rho(t, \beta)\{ {\color{Yellow}Z_i} - {\color{Yellow}\overline{Z}(t, \beta)} \} \mbox{ } \mathrm{d}N_i(t; \beta) \\ & \\ \mbox{where } {\color{Yellow}\overline{Z}(t, \beta)} &\equiv \frac{1}{n}\sum_{j=1}^n {\color{Yellow}Z_j} I(\epsilon_j^\beta \ge t) \mbox{ } / \mbox{ } \frac{1}{n}\sum_{j=1}^n I(\epsilon_j^\beta \ge t) \mbox{ and} \\ & \\ \epsilon_i^\beta &= \log{T_i} - \beta {\color{Yellow}X_i} \mbox{ is the residual for subject }i \mbox{ and } N_i(t; \beta) = I(\epsilon_i^\beta \leq t, \Delta_i = 1) \end{align*}

We compare ${\color{Yellow}Z_i} \mbox{ with } {\color{Yellow}\overline{Z}(t, \beta)}$, the mean IV value for those in the risk set for $i$
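A sketch of this estimating function on uncensored data, assuming the log-rank weight $\rho \equiv 1$ and a scalar instrument. The function name `iv_rank_score` and the simulation settings are hypothetical. Passing `x` in place of `z` gives the ordinary rank-based score, which serves as a confounded comparison.

```python
import numpy as np

def iv_rank_score(b, log_t, x, z, delta):
    """Psi_n^{IV}(b) with rho = 1: residuals use X, risk-set means use Z."""
    resid = log_t - b * x
    order = np.argsort(resid)
    z_s, d_s = z[order], delta[order]
    # Mean of Z over the risk set {j : resid_j >= resid_i}
    z_bar = np.cumsum(z_s[::-1])[::-1] / np.arange(len(z), 0, -1)
    return np.sum(d_s * (z_s - z_bar))

rng = np.random.default_rng(3)
n, beta = 4000, 0.5                              # assumed values
z = rng.normal(size=n)                           # instrument
u = rng.normal(size=n)                           # unmeasured confounder
x = 0.8 * z + 0.8 * u + rng.normal(size=n)
log_t = beta * x + u + rng.normal(size=n)
delta = np.ones(n)                               # no censoring in this sketch

grid = np.linspace(beta - 1.0, beta + 1.0, 401)
naive = grid[np.argmin([abs(iv_rank_score(b, log_t, x, x, delta)) for b in grid])]
iv = grid[np.argmin([abs(iv_rank_score(b, log_t, x, z, delta)) for b in grid])]

print(f"true beta = {beta}, naive rank estimate = {naive:.3f}, IV estimate = {iv:.3f}")
```

At the true $\beta$ the residuals reduce to the error term, which is independent of $Z$, so the IV score is centered at zero there. The naive score is not.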

Methods for Unmeasured Confounders - possible estimator of $\beta$



However, in \begin{align*} \Psi_n^{IV}(\beta) &= \sum_{i = 1}^n\int \rho(t, \beta)\{ {\color{Yellow}Z_i} - {\color{Yellow}\overline{Z}(t, \beta)} \} \mbox{ } \mathrm{d}N_i(t; \beta) \\ & \mbox{ } \\ \epsilon_i^\beta &= \log{T_i} - \beta {\color{Yellow}X_i} \mbox{ depends on } {\color{Yellow} C_i} \end{align*}

and thus this estimator is not consistent for $\beta$

IV Estimation in the Accelerated Failure Time Model


A natural method for handling non-ignorability of censoring is inverse probability-of-censoring weighting \begin{align*} \Psi_n^{IV-IPCW}(\beta) &= \sum_{i = 1}^n\int \rho(t, \beta)\{ {\color{Yellow}Z_i} - {\color{Yellow}\overline{Z}_{\hat{G}_C}(t, \beta)} \} \frac{\mathrm{d}N_i(t; \beta)}{\hat{G}_C(t + \beta X_i)} \\ & \\ \mbox{where } {\color{Yellow}\overline{Z}_{\hat{G}_C}(t, \beta)} &\equiv \frac{1}{n}\sum_{j=1}^n \frac{{\color{Yellow}Z_j} I({\color{Yellow}\epsilon_j^\beta} \ge t)}{\hat{G}_C(t + \beta X_j)} \mbox{ } / \mbox{ } \frac{1}{n}\sum_{j=1}^n \frac{I( {\color{Yellow}\epsilon_j^\beta} \ge t)}{\hat{G}_C(t + \beta X_j)} \mbox{ and} \\ & \\ {\color{Yellow}\epsilon_i^\beta} &= \log{T_i} - \beta {\color{Yellow}X_i} \mbox{ is the residual for subject }i \\ \mbox{ and } N_i(t; \beta) &= I({\color{Yellow}\epsilon_i^\beta} \leq t, \Delta_i = 1)\\ \mbox{and } \hat{G}_C & \mbox{ is the Kaplan-Meier estimator of } G_C\mbox{, the survival function of } C \end{align*}
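A rough sketch of the weighted estimating function, assuming $\rho \equiv 1$, a scalar instrument, and times kept on the log scale throughout. The helper names, the weight floor of 0.05 (added here only to avoid dividing by a near-zero Kaplan-Meier value), and the simulation settings are all hypothetical, not part of the method as presented.

```python
import numpy as np

def km_censoring_survival(time, delta):
    """Kaplan-Meier estimate of G_C(s) = P(C > s); censorings (delta == 0) count as events."""
    order = np.argsort(time)
    t_s, cens = time[order], 1.0 - delta[order]
    surv = np.cumprod(1.0 - cens / np.arange(len(time), 0, -1))
    def G(s):
        idx = np.searchsorted(t_s, s, side="right") - 1    # last observed time <= s
        return np.where(idx >= 0, surv[np.maximum(idx, 0)], 1.0)
    return G

def iv_ipcw_score(b, log_t, x, z, delta, G, floor=0.05):
    """Psi_n^{IV-IPCW}(b) with rho = 1; `floor` is a hypothetical guard on the weights."""
    resid = log_t - b * x
    total = 0.0
    for i in np.flatnonzero(delta):                # loop over observed events
        t = resid[i]
        g = np.maximum(G(t + b * x), floor)        # G_C(t + b X_j) for every subject j
        w = (resid >= t) / g                       # weighted risk-set indicators
        z_bar = np.sum(w * z) / np.sum(w)          # weighted risk-set mean of Z
        total += (z[i] - z_bar) / g[i]
    return total

rng = np.random.default_rng(4)
n, beta = 1000, 0.5                                # assumed values
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + 0.6 * u + rng.normal(size=n)
log_t_true = beta * x + u + rng.normal(size=n)
log_c = rng.normal(loc=2.0, size=n)                # C independent of (X, Z, U, T)
log_t = np.minimum(log_t_true, log_c)
delta = (log_t_true <= log_c).astype(float)

G = km_censoring_survival(log_t, delta)
grid = np.linspace(beta - 0.6, beta + 0.6, 41)
scores = np.array([iv_ipcw_score(b, log_t, x, z, delta, G) for b in grid])
beta_ipcw = grid[np.argmin(np.abs(scores))]

print(f"beta_ipcw = {beta_ipcw:.3f} (true beta = {beta})")
```

The double loop over events and subjects costs $O(n^2)$ per evaluation, which is acceptable for a sketch but would need vectorizing or a smarter search for larger samples.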

Asymptotic Theory for IVs in the Accelerated Failure Time Model

Computation for IVs in the Accelerated Failure Time Model


Inference for IV Estimation in the AFT Model


Standard AFT - Inference

Simulation: $\widetilde{T} = \exp{\{ \beta X + \beta_U U + \epsilon \}}$, where $X = \alpha_Z \exp{\{Z\}} + \alpha_U U + \epsilon^* $ and $\epsilon$, $\epsilon^* \sim N(0,1)$, $\epsilon \perp\!\!\!\perp \epsilon^*$

Displayed settings: $\alpha_U$ and $Cor(U,X)$; $\beta_U$ and $Cor(U,T)$; $\alpha_Z$ and $Cor(Z,X)$
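A sketch of a generator for this simulation model. The distributions of $Z$ and $U$ are not given here, so standard normals are assumed. The function name `simulate` and the parameter values in the usage line are hypothetical. The correlation with survival time is computed on $\log \widetilde{T}$ because the raw times are heavily skewed.

```python
import numpy as np

def simulate(n, beta, beta_u, alpha_z, alpha_u, seed=0):
    """Draw (Z, U, X, T) from the stated model; Z, U ~ N(0, 1) is an assumption."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                       # instrument (assumed N(0, 1))
    u = rng.normal(size=n)                       # unmeasured confounder (assumed N(0, 1))
    eps = rng.normal(size=n)                     # epsilon ~ N(0, 1)
    eps_star = rng.normal(size=n)                # epsilon* ~ N(0, 1), independent of epsilon
    x = alpha_z * np.exp(z) + alpha_u * u + eps_star
    t = np.exp(beta * x + beta_u * u + eps)      # uncensored survival time
    return z, u, x, t

def cor(a, b):
    """Sample Pearson correlation."""
    return np.corrcoef(a, b)[0, 1]

z, u, x, t = simulate(50_000, beta=0.5, beta_u=1.0, alpha_z=1.0, alpha_u=1.0)
print(f"Cor(U, X) = {cor(u, x):.2f}")
print(f"Cor(U, log T) = {cor(u, np.log(t)):.2f}")
print(f"Cor(Z, X) = {cor(z, x):.2f}")
```

Because $X$ depends on $Z$ through $\exp\{Z\}$, the first stage is nonlinear in the instrument, which is the setting where a linear 2SLS-style first stage is misspecified.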

Coverage: $\widetilde{T} = \exp{\{ \beta X + \beta_U U + \epsilon \}}$, where $X = \alpha_Z \exp{\{Z\}} + \alpha_U U + \epsilon^* $

Coverage (Zooming In): $\widetilde{T} = \exp{\{ \beta X + \beta_U U + \epsilon \}}$, where $X = \alpha_Z \exp{\{Z\}} + \alpha_U U + \epsilon^* $

Preliminary Analysis

Kaplan-Meier Curves for All Patients

Kaplan-Meier Curves for Rupture Cases

Preliminary Analysis


Estimated effect of endovascular repair (EVAR) on log survival time, adjusting for prior conditions, demographic variables, and other covariates


$ \begin{array}{c|rrl} {\text{Estimator}} & {\hat{\boldsymbol\beta}_{EVAR}} & {(95\% \text{ Conf.}} & {\text{Interval})} \\ \hline \text{AFT} & 0.047 & (-0.063, & 0.144) \\ \text{AFT-IV} & -0.169 & (-0.420, & 0.080) \\ \text{AFT-2SLS} & -0.175 & (-0.432, & 0.074) \\ \text{AFT-IV-IPCW} & -0.156 & (-0.364, & 0.052) \\ \end{array} $

Conclusions

