Advanced Data Security

Securing Genetics with the Enya SDK

Millions of people avoid medical tests and sequencing due to privacy concerns. You might be curious about your health, but you are not sure if you want your genetic data to be seen by many people and used by advertisers to sell you things. Blockdoc allows disease risks to be securely calculated and delivered, without the data ever leaving your phone. How does this work?

The Blockdoc iOS and Android apps use the Enya SDK and API, a leading data privacy framework. The Blockdoc App allows patients to estimate their risk of cardiovascular events using secure multiparty computation. In this approach, the data entered in response to standard risk assessment questions (e.g. smoking status, diabetes, BMI) are cryptographically protected before being analyzed by Rainbow Genomics. This protects the patient's privacy and also protects Rainbow, since Rainbow never sees or handles sensitive patient data that could expose it to liability if those data were lost or leaked.

At the core of Enya’s framework for Blockdoc is secure multi-party computation (SMC), a subfield of cryptography that allows parties to jointly compute while keeping their inputs and algorithms private. Traditional cryptography secures communication or storage against an adversary outside the system of participants; in SMC, the goal is to protect the privacy of all participants from one another, while still allowing them to compute on the data to provide scores and risk estimates.

Imagine you are a doctor who studies obesity and heart attack risk. One of the first things you might like to know is the age demographics across particular regions and countries. You could simply ask several million people for their date of birth, but most people would be reluctant to tell you. Among other uses, banks rely on this information to verify a person's identity. Would you give your date of birth to someone who just contacted you by email? Even if you trusted that person, you might still be inclined to fudge and round down a few years, since most people prefer to be younger.

Here is another way – the doctor could ask everyone to add a random number between -100 and +100 to their age and submit the result. So if your age is 27 and your random number is -12, you would subtract 12 from 27 and send the result, 15, to the doctor. The doctor would then not know your age, but she (or he) could still determine something very useful about the population, such as its average age. To do this, the doctor would average over many responses. Just as the mean displacement of a diffusing piece of dust is zero, the random terms would tend to cancel, gradually revealing a good estimate of the population’s true average age.
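To make this concrete, here is a small simulation of the masking idea. The ages, the population size, and the variable names are made up for illustration; nothing here is Blockdoc or Enya SDK code.

```python
import random
import statistics

# A toy simulation of the masking idea: each person adds a private random
# number between -100 and +100 to their age and submits only the masked value.
true_ages = [random.randint(18, 90) for _ in range(1_000_000)]
masked = [age + random.randint(-100, 100) for age in true_ages]

# One masked value (e.g. 15) reveals almost nothing about one person, but the
# masks average out to roughly zero across the population, so the mean of the
# masked submissions closely tracks the true mean age.
print(statistics.mean(true_ages))   # the true average age
print(statistics.mean(masked))      # very close to the true average
```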

This approach works well for research but there is an obvious problem – in a typical medical context, a patient expects their doctor to tell them something about their health and not just make abstract statements about global demographics.  

Let’s add a few twists to the above approach. Here is a simplified example of secure multiparty computation. Imagine a doctor has developed a new algorithm for estimating cardiac risk. Let’s say it’s a simple addition, such as risk = age + 3. The patient, Alice, may wish to keep her age (= 27) secret and the doctor may wish to protect her algorithm (risk = age + 3). As shown in the figure, these two parties could agree to do the following. First, they could both subtract a random number from their secret (Step 1) and then share the differences (Step 2). Next, they could each add their random number to the other party’s share (Step 3) and then exchange the results of those calculations (Step 4). Finally, one (or both) parties could add the shares, revealing the patient’s risk (= 30) without, at any point, transmitting either the patient’s age or the doctor’s secret.
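The exchange above is small enough to write out directly. Below is a minimal sketch of those four steps, with the example values (age = 27, doctor's secret = 3) hard-coded for illustration; the variable names and the single-round structure are assumptions made for this sketch, not the Blockdoc or Enya SDK API.

```python
import random

age = 27        # Alice's secret
wd = 3          # the doctor's secret weight in risk = age + wd

# Step 1: each party picks a random mask and subtracts it from their secret.
rp = random.randint(-100, 100)   # Alice's mask (kept private)
rd = random.randint(-100, 100)   # the doctor's mask (kept private)
alice_share = age - rp
doctor_share = wd - rd

# Step 2: the masked differences are exchanged
# (alice_share goes to the doctor, doctor_share goes to Alice).

# Step 3: each party adds its own mask to the share it received.
alice_result = doctor_share + rp     # computed by Alice
doctor_result = alice_share + rd     # computed by the doctor

# Step 4: the results are exchanged and summed, revealing only the risk.
risk = alice_result + doctor_result
print(risk)   # 30 == age + wd, yet neither 27 nor 3 was ever sent in the clear
```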

Mathematically, how does this work? If you keep track of all the terms, you see that risk = (age - r_p) + r_d + (w_d - r_d) + r_p = age + (r_p - r_p) + (r_d - r_d) + w_d. The middle two terms evaluate to zero, simplifying the expression to the desired result, risk = age + w_d. Essentially, both parties inject noise into a communications protocol, the noisy signal undergoes previously agreed-upon linear mathematical operations, and then the noise is removed at the very end. The fact that both parties remember, and never disclose, their noise terms (r_p and r_d, respectively) is what protects the secrets.
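As a quick sanity check of that cancellation, a few lines of symbolic algebra (here using the sympy library, purely for illustration) confirm that the masks drop out no matter what values are chosen:

```python
import sympy as sp

# Symbols for the patient's age, the doctor's weight, and the two private masks.
age, wd, rp, rd = sp.symbols('age w_d r_p r_d')

# The sum of the two exchanged results from Steps 3-4 above.
risk = ((age - rp) + rd) + ((wd - rd) + rp)

print(sp.simplify(risk))   # age + w_d -- the masks cancel exactly
```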

This simplified example omits details required to make such systems work in practice – most obviously, in this toy scheme, the patient can immediately obtain the doctor’s secret, w_d, by inverting the algorithm (risk – 27 = 3). As soon as the secrets on both sides become more complex, it becomes exceedingly difficult for either party to learn (or even estimate) the other party’s secrets. Multiplication requires additional steps, built on mathematically paired random numbers called Beaver triples. If you have read this far, you might be interested to learn more, or to come join us in Palo Alto to build a secure foundation for digital health. In the former case, here is the original 1991 publication on Beaver triples (Efficient Multiparty Protocols Using Circuit Randomization); in the latter case, additionally, reach out.
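To give a flavor of how Beaver triples enable secure multiplication, here is a toy two-party sketch over additive shares modulo a prime. The modulus, the "dealer" that hands out the triple, and all variable names are illustrative assumptions for this post, not the Enya SDK's actual protocol.

```python
import random

P = 2**31 - 1   # a prime modulus; shares are uniform mod P

def share(secret):
    """Split a secret into two additive shares mod P."""
    s1 = random.randrange(P)
    s2 = (secret - s1) % P
    return s1, s2

# Secrets held by the two parties (e.g. a patient value and a model weight).
x, y = 27, 3
x1, x2 = share(x)
y1, y2 = share(y)

# Offline phase: a dealer generates a random triple with a*b = c and shares it.
a, b = random.randrange(P), random.randrange(P)
c = (a * b) % P
a1, a2 = share(a)
b1, b2 = share(b)
c1, c2 = share(c)

# Online phase: the parties open d = x - a and e = y - b. These reveal nothing
# about x or y, because a and b are uniformly random masks.
d = (x1 - a1 + x2 - a2) % P
e = (y1 - b1 + y2 - b2) % P

# Each party computes its share of the product locally ...
z1 = (c1 + d * b1 + e * a1 + d * e) % P   # party 1 also adds the d*e term
z2 = (c2 + d * b2 + e * a2) % P

# ... and the shares recombine to the true product.
print((z1 + z2) % P)   # 81 == x * y
```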
