Dark Side and Bright Side of AI in Medicine
(and how to deal with it)
Parisa Rashidi, PhD
University of Florida
AI: Good Force?
Source: Google Image Search, AI Quotes
AI: Dark Side?
Adversarial Attacks
• A small, carefully designed change to an input or to a model, made to intentionally force the model into a mistake (a minimal sketch of such a perturbation follows this slide).
• Why they are a real risk in clinical applications:
  • Claim reimbursement is handled by algorithms.
  • Digital surrogates of patient response are used in drug trials and approval decisions.
  • Health care is full of competing interests, e.g. insurance claims: providers seek to maximize reimbursement while payers seek to minimize it.
  • Billions of dollars are at stake in these systems' outputs.
Finlayson, Samuel G., John D. Bowers, Joichi Ito, Jonathan L. Zittrain, Andrew L. Beam, and Isaac S. Kohane. "Adversarial attacks on medical machine learning." Science 363, no. 6433 (2019): 1287-1289.
[Figure: a benign input plus a small adversarial perturbation is classified as malignant]
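To make the "small, carefully designed change" concrete, here is a minimal sketch of the fast gradient sign method (FGSM), one standard way such perturbations are crafted. The model, loss function, and epsilon budget are illustrative assumptions, not details from the cited paper.

```python
import torch

def fgsm_perturbation(model, x, y, loss_fn, epsilon=0.01):
    """Craft an adversarial example with the fast gradient sign method.

    x: input batch (e.g., a dermoscopy image), y: correct labels,
    epsilon: per-pixel perturbation budget. The change is visually
    negligible but is chosen to push the model toward a wrong answer.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)          # loss with respect to the true label
    loss.backward()                      # gradient of the loss w.r.t. the input
    x_adv = x + epsilon * x.grad.sign()  # step in the direction that increases the loss
    return x_adv.clamp(0, 1).detach()    # keep pixel values in a valid range
```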
Adversarial Attack Examples
Ethical gray zone (a dermatologist could hold the camera at any angle)
Suggested by the Endocrine Society
Finlayson, Samuel G., John D. Bowers, Joichi Ito, Jonathan L. Zittrain, Andrew L. Beam, and Isaac S. Kohane. "Adversarial attacks on medical machine learning." Science 363, no. 6433 (2019): 1287-1289.
Clinical Systems' Vulnerability to Adversarial Attacks
• Ground truth is often ambiguous.
• Medical imaging is highly standardized, so attacks do not need to meet the same standards of invariance (to angle, lighting, etc.) required in natural-image settings.
• Commodity network architectures are often used (think ImageNet).
• Medical data interchange is limited, with no universal mechanism for authentication.
• Hospital infrastructure is hard to update.
• There are many potential adversaries.
Solutions
• Active engagement and dialogue among medical, technical, legal, and ethical experts.
• Leave defenses to end users rather than imposing preemptive solutions? (the procrastination principle)
  • Preemptive regulation risks a rigid regulatory structure that stalls development.
• Resilience is difficult: breaking systems is easier than protecting them.
• Incremental, defensive, short-term steps, e.g. a fingerprint hash of the data (see the sketch after this slide).
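As one way to read the "fingerprint hash of data" suggestion, here is a minimal sketch that hashes an image at acquisition time so later tampering, including adversarial perturbation, can be detected; the file names and workflow are hypothetical.

```python
import hashlib

def fingerprint(path: str) -> str:
    """Return a SHA-256 fingerprint of a file's raw bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical workflow: record the hash at acquisition time, then verify it
# before the image is fed to a model or attached to a claim.
# original = fingerprint("scan_at_acquisition.dcm")
# assert fingerprint("scan_submitted_for_claim.dcm") == original, "data was modified"
```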
Reidentification of Study Participants
Schwarz, Christopher G., Walter K. Kremers, Terry M. Therneau, Richard R. Sharp, Jeffrey L. Gunter,
Prashanthi Vemuri, Arvin Arani et al. "Identification of Anonymous MRI Research Participants with Face-
Recognition Software." New England Journal of Medicine 381, no. 17 (2019): 1684-1686.
[Figure: face-recognition software correctly matched 83% of 84 research volunteers to their MRI scans]
Bias in AI
• Bias was found in an algorithm sold by a leading health services company, Optum, and used to guide care decisions for millions of people.
• Correcting the bias would more than double the number of Black patients flagged as at risk of complicated medical needs, collectively accounting for 48,772 additional chronic conditions (a minimal audit sketch follows this slide).
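To illustrate the kind of audit behind this finding, here is a minimal sketch that compares flag rates and the actual burden of chronic conditions across groups at a fixed risk-score threshold. The function, inputs, and threshold are made up for illustration; this is not the Optum algorithm or the study's data.

```python
import numpy as np

def audit_flag_rates(scores, group, n_chronic, threshold):
    """At a fixed risk-score threshold, compare groups on two quantities:
    how often they are flagged, and how sick the flagged patients actually are
    (number of chronic conditions as a rough proxy for true need)."""
    flagged = scores >= threshold
    report = {}
    for g in np.unique(group):
        in_g = group == g
        sel = in_g & flagged
        report[g] = {
            "flag_rate": flagged[in_g].mean(),
            "mean_chronic_conditions_when_flagged": n_chronic[sel].mean() if sel.any() else float("nan"),
        }
    return report

# If one group carries substantially more chronic conditions at the same score,
# the score is understating that group's need, which is the bias described above.
```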
Safety: Design & Evaluation Decisions
Spiegelhalter, David. "Should We Trust Algorithms?." Harvard Data Science Review 2, no. 1 (2020).
Safety: Hacking the Performance Metric
• In practice, human doctors remain hyper-vigilant about a high-risk subtype even when it is rare.
• Existing AI models do show concerning error rates on clinically important subsets despite encouraging aggregate performance metrics (a minimal subgroup-evaluation sketch follows this slide).
[Figure: human vs. AI system compared on mistaken cases that are not critical and mistaken cases that are critical]
Oakden-Rayner, Luke, Jared Dunnmon, Gustavo Carneiro, and Christopher R辿. "Hidden stratification causes
clinically meaningful failures in machine learning for medical imaging." arXiv preprint arXiv:1909.12475 (2019).
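A minimal sketch (not from the cited paper) of reporting error rates on a clinically critical subgroup alongside the aggregate metric, which is how this kind of hidden stratification is usually surfaced; the arrays below are made-up data.

```python
import numpy as np

def subgroup_error_rates(y_true, y_pred, critical_mask):
    """Compare the aggregate error rate with the error rate on a critical subgroup."""
    errors = y_true != y_pred
    return {
        "aggregate_error": errors.mean(),
        "critical_subgroup_error": errors[critical_mask].mean(),
        "subgroup_prevalence": critical_mask.mean(),
    }

# Made-up example: a 2% overall error rate hides a 40% error rate on a rare,
# clinically critical subtype (5% of cases).
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.array([0] * 95 + [1, 1, 1, 0, 0])
print(subgroup_error_rates(y_true, y_pred, y_true == 1))
```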
Deep Learning's Black Box
• Deep learning benefits:
  • No need for manual feature engineering,
  • Improved performance,
  • Ability to capitalize on large amounts of data.
• Significant drawback for clinical tasks: models are not interpretable.
• A model's decisions and the features it relies on cannot be inherently determined, resulting in:
  • Lack of trust,
  • Inability to diagnose problems,
  • Undetected issues.
Interpretability
Shickel, Benjamin, Tyler J. Loftus, Lasith Adhikari, Tezcan Ozrazgat-Baslanti, Azra Bihorac, and Parisa Rashidi.
"DeepSOFA: A Continuous Acuity Score for Critically Ill Patients using Clinically Interpretable Deep Learning."
Scientific reports 9, no. 1 (2019): 1879.
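As a generic example of post-hoc interpretability, here is a minimal gradient saliency sketch (this is not the specific method used in DeepSOFA); the model and inputs are assumed placeholders.

```python
import torch

def saliency_map(model, x):
    """Gradient saliency: how much does each input feature influence the
    model's top prediction? Large absolute gradients mark influential
    pixels or clinical variables."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    top_scores = model(x).max(dim=1).values  # score of the predicted class per example
    top_scores.sum().backward()
    return x.grad.abs()
```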
Sustainability: Red AI vs. Green AI
• Expensive processing of a single example: the best version of AlphaGo required 1,920 CPUs and 280 GPUs to play a single game of Go, at a cost of over $1,000 per hour.
• Massive numbers of experiments: Google researchers trained over 12,800 neural networks during a neural architecture search to improve performance on object detection and language modeling.
• The amount of compute used to train deep learning models increased 300,000x in 6 years (a rough cost sketch follows this slide).
Schwartz, R., J. Dodge, and N. A. Smith. "Green AI." arXiv preprint arXiv:1907.10597 (2019).
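A back-of-the-envelope sketch of how such training costs add up; the price per GPU-hour and power draw are assumptions for illustration, not figures from the Green AI paper.

```python
def training_cost(num_gpus, hours, usd_per_gpu_hour=3.0, kw_per_gpu=0.3):
    """Rough cost of a training run or experiment sweep in GPU-hours, dollars, and kWh."""
    gpu_hours = num_gpus * hours
    return {
        "gpu_hours": gpu_hours,
        "dollars": gpu_hours * usd_per_gpu_hour,
        "kWh": gpu_hours * kw_per_gpu,
    }

# Hypothetical sweep in the spirit of the example above: 12,800 models,
# each trained for 2 hours on a single GPU.
print(training_cost(num_gpus=1, hours=2 * 12_800))
```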
A Framework for AI Quality and Safety in Medicine
Challen, Robert, Joshua Denny, Martin Pitt, Luke Gompels, Tom Edwards, and Krasimira Tsaneva-
Atanasova. "Artificial intelligence, bias and clinical safety." BMJ Qual Saf 28, no. 3 (2019): 231-237.
A General Framework for AI Quality and Safety in Medicine
Challen, Robert, Joshua Denny, Martin Pitt, Luke Gompels, Tom Edwards, and Krasimira Tsaneva-Atanasova.
"Artificial intelligence, bias and clinical safety." BMJ Qual Saf 28, no. 3 (2019): 231-237.
Dos and Don'ts
Beil, Michael, Ingo Proft, Daniel van Heerden, Sigal Sviri, and Peter Vernon van Heerden. "Ethical considerations about artificial intelligence for prognostication in intensive
care." Intensive Care Medicine Experimental 7, no. 1 (2019): 70.
Fair, Accountable, and Transparent (FAT) algorithms

Editor's Notes

  1. As recently as 2013, most hospitals were operating using the ninth edition of the ICD coding scheme (ICD-9), published in 1978, despite the fact that a revised version (ICD-10) was published in 1990.
  2. Ethics in Human-AI Interactions: (1) It shouldn't violate others' freedom. (2) The benefit created should outweigh the risk. (3) Benefit and risk should be distributed fairly to everyone.