Evaluation Considerations For EHR-Based Phenotyping

Evaluation considerations for EHR-based phenotyping algorithms: A case study for Drug Induced Liver Injury
Casey Overby, Chunhua Weng, Krystl Haerian, Adler Perotte, Carol Friedman, George Hripcsak
Department of Biomedical Informatics, Columbia University
AMIA TBI paper presentation March 20th, 2013
Success in part due to GWAS consortia to obtain needed sample sizes

Published Genome-Wide Associations through 09/2011: 1,596 published GWA at p ≤ 5×10⁻⁸ for 249 traits
Source: NHGRI GWA Catalog, www.genome.gov/GWAStudies
Background and Motivation
There are added challenges to studying pharmacological traits

Drug response is complex
Risk factors in pathogenesis of drug induced liver injury (DILI)
Source: Tujios & Fontana. Nat. Rev. Gastroenterol. Hepatol. 2011
Sample sizes are small compared to typical association studies of common disease
  Adverse drug events
  Responder types
Consortium recruitment approaches

Recruit and phenotype participants prospectively
Protocol driven recruitment
Electronic health records (EHR) linked with DNA biorepositories

EHR phenotyping
Successes developing EHR algorithms within eMERGE

Type II diabetes
Peripheral arterial disease
Atrial fibrillation
Crohn disease
Multiple sclerosis
Rheumatoid arthritis
Source: www.phekb.org
High PPV
(Kho et al. JAMIA 2012; Kullo et al. JAMIA 2010; Ritchie et al. AJHG 2010; Denny et al. Circulation 2010; Peissig et al. JAMIA 2012)
Unique characteristics of DILI

Rare condition of low prevalence
Complex condition
  Drug is causal agent of liver injury
  Different drugs can cause DILI
  Pattern of liver injury varies between drugs
  Pattern of liver injury based on liver enzyme elevations
  No tests to confirm drug causality (some assessment tools exist)
High PPV may be challenging
Why study?
DILI accounts for up to 15% of acute liver failure cases in the U.S., of which 75% require liver transplant or lead to death
Most frequent reason for withdrawal of approved drugs from the market
Lack of understanding of underlying mechanisms of DILI
Computerized approaches can reduce the burden of traditional approaches to screening for rare conditions
(Jha AK et al. JAMIA 1998; Thadani SR et al. JAMIA 2009)
Overview of EHR phenotyping process
1. Case definition (e.g., liver injury)
2. (Re-)Design EHR Phenotyping algorithm (e.g., ICD-9 codes for acute liver injury, decreased liver function lab)
3. Implement EHR Phenotyping algorithm
4. Evaluate EHR Phenotyping algorithm
   If algorithm needs improvement: return to (Re-)Design EHR Phenotyping algorithm
   If algorithm is sufficient to be useful: Disseminate EHR Phenotyping algorithm
Overview of methods to develop & evaluate initial algorithm

DILI Case definition (iSAEC)
Design EHR Phenotyping algorithm
Implement EHR Phenotyping algorithm
Evaluate EHR Phenotyping algorithm
Disseminate EHR Phenotyping algorithm
Develop an evaluation framework
Report lessons learned
Methods and Results

Lessons inform evaluator approach and algorithm design changes
Lessons learned
Initial DILI EHR phenotyping algorithm

Flowchart, starting from the clinical data warehouse (patient counts after each step):
A1. Diagnosed with liver injury? 18,423
A2. Exposure to drug? 13,972
B. New liver injury? 2,375
C. Acute liver injury? 1,264
  C1. ALP >= 2x ULN
  C2. ALT >= 5x ULN
  C3. ALT >= 3x ULN
  C4. Bilirubin >= 2x ULN
D. Other diagnoses? 560
Patients leaving the flow are marked "Exclude" or "Consider chronicity".
Patients meeting drug induced liver injury criteria: 560

DILI case definition
1. Liver injury diagnosis (A1)
  a. Acute liver injury (C1-C4)
  b. New liver injury (B)
2. Caused by a drug
  a. New drug (A2)
  b. Not by another disease (D)
Ref: Aithal, G.P., et al. Case Definition and Phenotype Standardization in Drug-induced Liver Injury. Clin Pharmacol Ther. 2011 Jun; 89(6): 806-15
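The flowchart criteria (A1, A2, B, C1-C4, D) can be read as a sequential filter. The sketch below is only an illustration, not the implementation used in the study. The `Patient` fields and function names are hypothetical. The way C1-C4 are combined (C1, or C2, or C3 together with C4) is an assumption based on the cited consensus case definition (Aithal et al. 2011); the slide does not spell it out.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical per-patient summary; field names are illustrative only."""
    has_liver_injury_dx: bool   # A1
    has_drug_exposure: bool     # A2
    is_new_liver_injury: bool   # B
    alp_x_uln: float            # ALP as a multiple of the upper limit of normal
    alt_x_uln: float            # ALT as a multiple of the upper limit of normal
    bilirubin_x_uln: float      # bilirubin as a multiple of the upper limit of normal
    has_other_diagnosis: bool   # D


def meets_lab_criteria(p: Patient) -> bool:
    """C1-C4. The combination rule is an assumption, not taken from the slide."""
    c1 = p.alp_x_uln >= 2
    c2 = p.alt_x_uln >= 5
    c3 = p.alt_x_uln >= 3
    c4 = p.bilirubin_x_uln >= 2
    return c1 or c2 or (c3 and c4)


def meets_dili_criteria(p: Patient) -> bool:
    """Apply A1, A2, B, C, D in flowchart order."""
    return (
        p.has_liver_injury_dx
        and p.has_drug_exposure
        and p.is_new_liver_injury
        and meets_lab_criteria(p)
        and not p.has_other_diagnosis
    )


# Made-up patients for illustration only.
case = Patient(True, True, True, alp_x_uln=1.2, alt_x_uln=6.0,
               bilirubin_x_uln=1.0, has_other_diagnosis=False)
non_case = Patient(True, True, True, alp_x_uln=1.2, alt_x_uln=3.5,
                   bilirubin_x_uln=1.0, has_other_diagnosis=False)
print(meets_dili_criteria(case), meets_dili_criteria(non_case))
```

Keeping the lab thresholds in one function makes it easy to swap in a different combination rule if the case definition is read differently.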
Methods and Results
Estimated positive predictive value

Initial algorithm results: 100 randomly selected for manual review from 560 patients
Figure: charts assigned to Reviewers 1-4 in sets of 20
TP: 27 FP: 42 NA: 30
PPV: TP/(TP+FP) = 27/(27+42) = 39%
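The PPV arithmetic can be checked with a few lines of Python. This is a minimal sketch using the counts reported on the slide; the function name is illustrative. Charts marked NA do not enter the denominator.

```python
def positive_predictive_value(tp: int, fp: int) -> float:
    """PPV = TP / (TP + FP); reviews marked NA are not counted."""
    if tp + fp == 0:
        raise ValueError("PPV is undefined when TP + FP is zero")
    return tp / (tp + fp)


# Counts from the manual review reported above.
ppv = positive_predictive_value(tp=27, fp=42)
print(f"PPV = {ppv:.0%}")
```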
Preliminary kappa coefficient: 0.50 (moderate agreement)
Interpretation of PPV is unclear given moderate agreement among reviewers
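As an illustration of how a kappa coefficient is obtained, the sketch below computes Cohen's kappa for two raters. The slides do not say which kappa variant was used for the four reviewers, so the two-rater form is an assumption. The ratings are made up and chosen to give 0.50. The agreement bands follow the commonly used Landis and Koch (1977) scale, under which 0.41-0.60 is "moderate".

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def agreement_band(kappa: float) -> str:
    """Landis and Koch (1977) descriptive bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings for eight charts (not study data).
rater_a = ["TP", "TP", "TP", "TP", "FP", "FP", "FP", "FP"]
rater_b = ["TP", "TP", "TP", "FP", "TP", "FP", "FP", "FP"]
kappa = cohens_kappa(rater_a, rater_b)
print(kappa, agreement_band(kappa))
```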
Included measurement and demonstration studies

Measurement study (evaluator effectiveness): the goal is to determine the extent and nature of the errors with which a measurement is made using a specific instrument.
Demonstration study (algorithm performance): the goal is to establish a relation, which may be associational or causal, between a set of measured variables.
Definitions from: Evaluation methods in medical informatics, Friedman & Wyatt, 2006
Included quantitative and qualitative assessment

Quantitative data
  Inter-rater reliability assessment
  PPV
Qualitative data
  Perceptions of evaluation approach effectiveness (e.g., review tool, artifacts reviewed)
  Perceptions of benefit of results (e.g., correct for the case definition?)
An evaluation framework and results

Measurement study (evaluator effectiveness)
  Quantitative results: Kappa coefficient: 0.50; TP: 27, FP: 42, NA: 30
  Qualitative results: Perceptions of evaluation approach effectiveness: differences between evaluation platforms
    Visualizing lab values
    Availability of notes
    Discharge summary vs. other notes
Demonstration study (algorithm performance)
  Quantitative results: PPV: TP/(TP+FP) = 39%
  Qualitative results: Perceptions of benefit of results (themes in FPs)
    Babies
    Patients who died
    Overdose patients
    Patients who had a liver transplant
Lesson learned: What's correct for the algorithm may not be correct for the case definition
Are we measuring what we mean to measure?
  Case definition: liver injury due to medication, not by another disease
  Many FPs were transplant patients
  Patients correct for the algorithm, but liver enzymes elevated due to procedure
Revision: exclude transplant patients
Improved algorithm design given themes in FPs

Added exclusions
  Babies
  Overdose patients
  Patients who died
  Transplant patients
A collaborative approach to develop an EHR phenotyping algorithm for DILI (in preparation)
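The added exclusions can be pictured as a post-filter on the algorithm's output. The sketch below is a hedged illustration, not the revised algorithm itself. The `Candidate` fields are hypothetical, and the age cutoff used to identify babies is an assumed parameter, since the slides do not give one.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for a patient flagged by the initial algorithm."""
    patient_id: str
    age_years: float
    died: bool
    overdose: bool
    liver_transplant: bool


def exclusion_reasons(c: Candidate, infant_age_cutoff_years: float = 1.0):
    """Return the added exclusions that apply; the cutoff is an assumption."""
    reasons = []
    if c.age_years < infant_age_cutoff_years:
        reasons.append("baby")
    if c.overdose:
        reasons.append("overdose")
    if c.died:
        reasons.append("died")
    if c.liver_transplant:
        reasons.append("transplant")
    return reasons


# Made-up candidates for illustration only.
candidates = [
    Candidate("p1", 54.0, died=False, overdose=False, liver_transplant=False),
    Candidate("p2", 0.3, died=False, overdose=False, liver_transplant=False),
    Candidate("p3", 61.0, died=False, overdose=False, liver_transplant=True),
]
kept = [c.patient_id for c in candidates if not exclusion_reasons(c)]
print(kept)
```

Returning the reasons rather than a bare boolean keeps a record of why each patient was dropped, which helps when reviewing themes in false positives.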
Lesson learned: Evaluator effectiveness influences ability to draw appropriate inferences about algorithm performance
How does our evaluation approach influence performance estimations?
  Moderate agreement among algorithm reviewers, so interpretation of PPV unclear
Revision: Improve evaluator approach
Improved evaluator approach

Consensus among 4 reviewers
Assign TP/FP status by:
1. First-pass review of temporal relationship
  Assign preliminary TP, FP, unknown status
2. Perform chart review
  Confirm suspected TPs
  Assign TP/FP if unknown status in first pass review
Figure: ALP, ALT, and bilirubin values plotted by date (20120315 to 20120719)
A collaborative approach to develop an EHR phenotyping algorithm for DILI (in preparation)
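The two-step review can be sketched as a small state transition. This is only one reading of the slide: the names are hypothetical, and treating a first-pass FP as final without chart review is an assumption, since the slide only says that suspected TPs are confirmed and unknowns are resolved at the chart review step.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    TP = "TP"
    FP = "FP"
    UNKNOWN = "unknown"


def needs_chart_review(first_pass: Status) -> bool:
    """Suspected TPs are confirmed and unknowns are resolved by chart review."""
    return first_pass in (Status.TP, Status.UNKNOWN)


def final_status(first_pass: Status, chart_review: Optional[Status] = None) -> Status:
    """Combine the first-pass label with the chart review label, if one is needed."""
    if not needs_chart_review(first_pass):
        return first_pass  # assumption: a first-pass FP is kept as FP
    if chart_review not in (Status.TP, Status.FP):
        raise ValueError("chart review must assign TP or FP")
    return chart_review


print(final_status(Status.UNKNOWN, Status.TP))
print(final_status(Status.TP, Status.FP))
print(final_status(Status.FP))
```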
Summary of findings
Lessons learned from applying our evaluation framework
  What's correct for the algorithm may not be correct for the case definition (Are we measuring what we mean to measure?)
  Evaluator effectiveness influences ability to draw appropriate inferences about algorithm performance
Demonstrated that our evaluation framework is useful
  Informed improvements in algorithm design
  Informed improvements in evaluator approach
  Likely more useful for rare conditions
Demonstrated EHR phenotyping algorithm development is an iterative process
  Complexity of the algorithm may influence
Acknowledgments
Dr. Yufeng Shen - Serious Adverse Event Consortium collaborator
eMERGE collaborators
  Mount Sinai (Drs. Omri Gottesman, Erwin Bottinger, and Steve Ellis)
  Mayo Clinic (Drs. Jyotishman Pathak, Sean Murphy, Kevin Bruce, Stephanie Johnson, Jay Talwalker, Christopher G. Chute, Iftikhar J. Kullo)
  Northwestern (Dr. Abel Kho)
  Vanderbilt (Dr. Josh Denny)
DILIN collaborator
  UNC-CH (Dr. Ashraf Farrag)
Columbia Training in Biomedical Informatics (NIH NLM #T15 LM007079)
The eMERGE network U01 HG006380-01 (Mount Sinai)
Questions?
Casey L. Overby casey@dbmi.columbia.edu