Proof of Concept Example for Use of Simulation to Allow Data Pooling Despite Privacy Restrictions.

Title: Proof of Concept Example for Use of Simulation to Allow Data Pooling Despite Privacy Restrictions.
Publication Type: Journal Article
Year of Publication: 2021
Authors: Filshtein TJ, Li X, Zimmerman SC, Ackley SF, M Glymour M, Power MC
Journal: Epidemiology
Volume: 32
Issue: 5
Pagination: 638-647
Date Published: 2021 Sep 01
ISSN: 1531-5487
Abstract

BACKGROUND: Integrating results from multiple samples is often desirable, but privacy restrictions may preclude full data pooling, and most datasets do not include fully harmonized variable sets. We propose a simulation-based method that leverages partial information across datasets to guide the creation of synthetic data, based on explicit assumptions about the underlying causal structure. The resulting synthetic data permit pooled analyses that adjust for all desired confounders despite privacy restrictions.
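For simulation, the assumed causal structure has to be explicit enough to drive data generation: each variable is drawn from its direct causes, so the variables need an order in which every parent precedes its children. The following is a minimal sketch, assuming a hypothetical four-variable DAG (`age`, `education`, `hba1c`, and `cog_change` are placeholder names, not the study's published DAG). It uses Python's standard-library `graphlib.TopologicalSorter` to derive such an order and to reject a structure that contains a cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical DAG: each variable maps to the set of its direct causes (parents).
dag = {
    "age": set(),
    "education": set(),
    "hba1c": {"age", "education"},
    "cog_change": {"age", "education", "hba1c"},
}


def generation_order(graph):
    """Order variables so every parent is simulated before its children.
    Raises graphlib.CycleError if the assumed structure has a cycle."""
    return list(TopologicalSorter(graph).static_order())


order = generation_order(dag)
print(order)

# A cyclic specification cannot define sequential data-generating rules.
try:
    generation_order({"a": {"b"}, "b": {"a"}})
    cycle_rejected = False
except CycleError:
    cycle_rejected = True
print("cycle rejected:", cycle_rejected)
```

Any order returned satisfies the parent-before-child constraint; variables with no dependence on each other (here `age` and `education`) may appear in either relative order.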

METHODS: This proof-of-concept project uses data from the Health and Retirement Study (HRS) and Atherosclerosis Risk in Communities (ARIC) study. We specified an estimand of interest and a directed acyclic graph (DAG) summarizing the presumed causal structure for the effect of glycated hemoglobin (HbA1c) on cognitive change. We derived publicly reportable statistics to describe the joint distribution of the variables in our DAG. These summary estimates were used as data-generating rules to create synthetic datasets. After pooling, we imputed missing covariates in the synthetic datasets and used the synthetic data to estimate the pooled effect of HbA1c on cognitive change, adjusting for all desired covariates.
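One way to read this pipeline is with linear-Gaussian data-generating rules: summarize each DAG node by the regression of that node on its parents (intercept, slopes, residual SD), release only those aggregates, and simulate synthetic records node by node. The Python sketch below assumes exactly that. The variable names, parameter values, plain OLS fit, and normal residuals are illustrative assumptions, not the authors' implementation or HRS/ARIC values. It then re-fits the outcome model on the synthetic data to check that the HbA1c coefficient is reproduced.

```python
import random
import statistics

# Hypothetical DAG as parent lists, keyed in a valid generation order
# (dicts keep insertion order, so parents are always simulated first).
DAG = {"age": [], "hba1c": ["age"], "cog_change": ["age", "hba1c"]}


def ols(X, y):
    """Least squares via the normal equations; returns [intercept, slopes...]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def summarize(data):
    """Reduce individual records to aggregate rules: for each node, the OLS
    coefficients on its parents and the residual standard deviation."""
    rules = {}
    for node, parents in DAG.items():
        X = [[row[p] for p in parents] for row in data]
        y = [row[node] for row in data]
        beta = ols(X, y)
        resid = [t - beta[0] - sum(b * x for b, x in zip(beta[1:], xs))
                 for xs, t in zip(X, y)]
        rules[node] = (beta, statistics.stdev(resid))
    return rules


def simulate(rules, n, rng):
    """Draw n synthetic records, each node from a normal around its rule."""
    out = []
    for _ in range(n):
        row = {}
        for node, parents in DAG.items():
            beta, sd = rules[node]
            mean = beta[0] + sum(b * row[p] for b, p in zip(beta[1:], parents))
            row[node] = rng.gauss(mean, sd)
        out.append(row)
    return out


rng = random.Random(0)
# Stand-in for a restricted cohort (invented parameters, not study values).
true_rules = {"age": ([60.0], 5.0),
              "hba1c": ([3.0, 0.04], 0.8),
              "cog_change": ([0.5, -0.02, -0.10], 0.3)}
private = simulate(true_rules, 5000, rng)
public_rules = summarize(private)  # only these aggregates are released
synthetic = simulate(public_rules, 5000, rng)

b_original = public_rules["cog_change"][0][2]  # HbA1c coefficient
b_synthetic = summarize(synthetic)["cog_change"][0][2]
print(f"original {b_original:.3f}, synthetic {b_synthetic:.3f}")
```

The two HbA1c coefficients should agree closely, with the remaining gap due to sampling noise. Real cohorts would need richer rules (nonlinear terms, categorical or bounded variables) than this linear-Gaussian toy.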

RESULTS: Distributions of covariates, as well as model coefficients and associated standard errors for our model estimating the effect of HbA1c on cognitive change, were similar between the cohort-specific original data and the preimputation synthetic data. The estimate from the pooled synthetic data incorporates control for confounders measured in either original dataset.
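The claim that the pooled estimate can control for a confounder measured in only one dataset can be illustrated with a toy case in which cohort A records education and cohort B does not. This sketch is hypothetical throughout: the variable names, coefficients, and the linear-Gaussian structure shared by both cohorts are invented. It uses single stochastic regression imputation as a simple stand-in for whatever imputation procedure the study used. Education is regressed on age, HbA1c, and cognitive change where it is observed, then drawn for cohort B as a prediction plus a residual draw. Finally, the pooled outcome model is fitted with and without education.

```python
import random
import statistics


def ols(X, y):
    """Least squares via the normal equations; returns [intercept, slopes...]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def cohort(n, rng, measures_education):
    """Hypothetical synthetic cohort; education confounds HbA1c -> cognition."""
    rows = []
    for _ in range(n):
        age = rng.gauss(60, 5)
        edu = rng.gauss(12, 3)
        hba1c = 7.0 + 0.02 * age - 0.1 * edu + rng.gauss(0, 0.6)
        cog = 0.5 - 0.02 * age + 0.05 * edu - 0.10 * hba1c + rng.gauss(0, 0.3)
        rows.append({"age": age, "edu": edu if measures_education else None,
                     "hba1c": hba1c, "cog": cog})
    return rows


rng = random.Random(1)
pooled = cohort(5000, rng, True) + cohort(5000, rng, False)

# Imputation model fitted where education is observed: edu ~ age + hba1c + cog.
obs = [r for r in pooled if r["edu"] is not None]
preds = ("age", "hba1c", "cog")
g = ols([[r[v] for v in preds] for r in obs], [r["edu"] for r in obs])


def fitted(r):
    return g[0] + sum(b * r[v] for b, v in zip(g[1:], preds))


sd = statistics.stdev([r["edu"] - fitted(r) for r in obs])

# Stochastic regression imputation: prediction plus a random residual draw.
for r in pooled:
    if r["edu"] is None:
        r["edu"] = fitted(r) + rng.gauss(0, sd)

y = [r["cog"] for r in pooled]
b_adjusted = ols([[r["hba1c"], r["age"], r["edu"]] for r in pooled], y)[1]
b_naive = ols([[r["hba1c"], r["age"]] for r in pooled], y)[1]
print(f"adjusted {b_adjusted:.3f}, omitting education {b_naive:.3f}")
```

Because the invented rules make education a common cause of HbA1c and cognitive change, the model that omits it should land near -0.20, while the model with imputed education should recover roughly the simulated effect of -0.10. Multiple imputation would also carry the uncertainty of the imputed values into the standard errors, which this single-imputation sketch ignores.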

DISCUSSION: Our approach has advantages over meta-analysis or individual-level pooling/data harmonization when privacy concerns preclude data sharing and key confounders are not uniformly measured across datasets.

DOI: 10.1097/EDE.0000000000001373
Alternate Journal: Epidemiology
PubMed ID: 34183527
PubMed Central ID: PMC8338788
Grant List:
U01 AG009740 / AG / NIA NIH HHS / United States
U01 HL096917 / HL / NHLBI NIH HHS / United States
U01 HL096902 / HL / NHLBI NIH HHS / United States
HHSN268201700002C / HL / NHLBI NIH HHS / United States
HHSN268201700001I / HL / NHLBI NIH HHS / United States
HHSN268201700004I / HL / NHLBI NIH HHS / United States
U01 HL096814 / HL / NHLBI NIH HHS / United States
R01 HL070825 / HL / NHLBI NIH HHS / United States
HHSN268201700003I / HL / NHLBI NIH HHS / United States
R01 AG057869 / AG / NIA NIH HHS / United States
U01 HL096812 / HL / NHLBI NIH HHS / United States
HHSN268201700005C / HL / NHLBI NIH HHS / United States
HHSN268201700001C / HL / NHLBI NIH HHS / United States
HHSN268201700003C / HL / NHLBI NIH HHS / United States
U01 HL096899 / HL / NHLBI NIH HHS / United States
HHSN268201700004C / HL / NHLBI NIH HHS / United States
HHSN268201700002I / HL / NHLBI NIH HHS / United States
HHSN268201700005I / HL / NHLBI NIH HHS / United States