This document discusses using genetic programming (GP) with secure multiparty computation (SMC) to analyze sensitive data from multiple parties without revealing the data. It presents an implementation of SMCGP using Obliv-C, an extension of C that compiles circuits for two-party computation (2PC). Experimental results show SMCGP has similar predictive performance as normal GP but with a runtime overhead of approximately 1,290 times slower. Scenarios include both 2PC with data split between two parties and 1PC with a single data holder. Applications include symbolic regression on benchmark datasets and fault localization using real-world defect data.
1 of 20
Download to read offline
More Related Content
Learning Without Peeking: Secure Multi-party Computation Genetic Programming
1. Learning Without Peeking:
Secure Multi-party Computation Genetic
Programming
Jinhan Kim, KAIST
Michael G. Epitropakis, Lancaster University
Shin Yoo, KAIST
SSBSE 2018
Montpellier, France
2. Genetic Programming
• GP has been successfully applied to various domains.
Data Security
Test coverage data
Defect proneness data
Internal resources
4. Example
Users don’t trust each other,
but they want to achieve a common goal.
Alice Bob
I don’t
trust Bob
I don’t
trust Alice
5. Example
Alice Bob
A = 1 if Alice loves Bob
0 otherwise
B = 1 if Bob loves Alice
0 otherwise
Common goal: A & B
Without trusted third party, we can do this by using
Secure Multiparty Computation (SMC)
7. 1-2 Oblivious Transfer
Alice Bob
Message m0, m1
Random bit b
C = m0 if b = 0
m1 if b = 1
Alice (sender) has no idea
what message has been sent.
Bob (receiver) don’t know
the other message.
The 1–2 oblivious transfer protocol can be implemented over
asymmetric cryptography, such as the RSA.
enctypted
9. Obliv-C
• Obliv-C is both a domain specific extension of C and a gcc
wrapper that compiles the extension.
High-level interface for 2PC
Extension of C
12. Scenario 1: Multiparty Dataholder (2PC)
Obliv-CData Party 1 Data Party 2
GP Party
Fault info
(Jan 2018 - Mar 2018)
Fault info
(April 2018 - June 2018)
14. Study Subjects
• Symbolic Regression using GP
• Keijzer-6
• Nguyen-7
• Vladislavleva-4
• Dow Chemical, Chemical process dataset
• Learning to Ranking using GP
• FLUCCS, Real-world fault dataset from Defects4j
15. RQ1: How well does the SMCGP perform
compared to the Normal-GP?
SMCGP and Normal-GP have
same logic, do same job
16. RQ1: Symbolic Regression
There is no statistical difference
between SMCGP and Normal-GP
For 2PC, we split dataset in half
17. RQ1: FLUCCS
Localize faults in Mockito-1 is
particularly challenging
Learning from 32 versions, we
tested on four versions.
18. RQ2: What is the runtime overhead of
SMCGP when compared to Normal-GP?