When it comes to data privacy and protection, no data is more important than personal data, whether medical, financial, or even social. Discussions about access to our data, or even our metadata, ultimately come down to who knows what, and whether that personal data is safe. Today's announcement between Intel, Microsoft, and DARPA describes a program designed to keep information encrypted while still using that data to build better models or provide better statistical analysis, all without disclosing the underlying data. The technique is called Fully Homomorphic Encryption, but it is so computationally intense that it is almost useless in practice today. This program between the three organizations is a driver to provide IP and silicon to accelerate that compute, enabling a more secure environment for collaborative data analysis.

Mind Your Data

Data protection is one of the most important aspects of the future of computing. The volume of personal data is continually growing, as is its value, along with the number of legal protections required. This makes any processing of personal, private, and confidential data difficult, often resulting in dedicated data silos: any processing requires data transfer coupled with encryption/decryption, and a level of trust that isn't always possible. All it takes is for one key in the chain to be lost or leaked, and the dataset is compromised.

There is a way around this, known as Fully Homomorphic Encryption (FHE). FHE makes it possible to take encrypted data, transfer it to where it needs to go, perform calculations on it, and obtain results, all without ever knowing the underlying dataset.
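To make the idea concrete, below is a minimal sketch in Python of the classic Paillier scheme. Paillier is only *partially* homomorphic (it supports addition on hidden values, not the arbitrary computation true FHE allows), and the tiny primes here are hopelessly insecure; the point is only to show that arithmetic on ciphertexts maps to arithmetic on the plaintexts underneath.

```python
# Toy Paillier cryptosystem: additively homomorphic, NOT full FHE, and
# the small demo primes below are in no way secure.
import math
import random

def keygen(p=2039, q=2063):
    n = p * q
    lam = math.lcm(p - 1, q - 1)                      # Carmichael function of n
    mu = pow((pow(n + 1, lam, n * n) - 1) // n, -1, n)
    return (n,), (n, lam, mu)                         # public key, private key

def encrypt(pub, m):
    (n,) = pub
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:                        # r must be a unit mod n
        r = random.randrange(2, n)
    return pow(n + 1, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(priv, c):
    n, lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

pub, priv = keygen()
a, b = encrypt(pub, 123), encrypt(pub, 456)
# Whoever multiplies the two ciphertexts adds the hidden plaintexts,
# without ever learning 123 or 456.
assert decrypt(priv, a * b % (pub[0] ** 2)) == 123 + 456
```

A fully homomorphic scheme additionally supports multiplication of the hidden values (plus the noise management needed to keep ciphertexts decryptable), and that is where the computational cost explodes.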

Take, for example, analyzing medical records: if a researcher needs to process a specific dataset for some analysis, the traditional method would be to encrypt the data, send it, decrypt it, and process it – but giving the researcher access to the specifics of the records might not be legal, or might face regulatory challenges. With FHE, that researcher can take the encrypted data, perform the analysis, and get a result without ever knowing any specifics of the dataset. This might involve combined statistical analysis of a population over multiple encrypted datasets, or using those encrypted datasets as additional inputs to train machine learning algorithms, improving accuracy through more data. Of course, the researcher has to trust that the data supplied is complete and genuine, but that is arguably a separate issue from enabling compute on encrypted data.
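As a hedged sketch of how that workflow might look in code, here is the medical example using the third-party python-paillier (`phe`) package. Again, Paillier only supports addition, so this covers aggregate statistics rather than arbitrary analysis, and the readings are made-up values for illustration.

```python
# pip install phe -- the python-paillier library
from functools import reduce
from operator import add
from phe import paillier

# Hospital (data owner): generates the keys, encrypts patient readings.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)
readings = [97, 104, 88, 120, 95]                 # hypothetical values
ciphertexts = [public_key.encrypt(v) for v in readings]

# Researcher: receives only the ciphertexts and the public key, and
# computes an aggregate without ever seeing an individual reading.
# The sum is itself a ciphertext.
encrypted_total = reduce(add, ciphertexts)

# Hospital: decrypts only the aggregate and shares the statistic.
print(private_key.decrypt(encrypted_total) / len(readings))   # 100.8
```

Note that the researcher's result comes back encrypted: only the key holder can turn the aggregate back into a number.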

Part of why this matters is that the best insights come from the largest datasets. That includes training neural networks, and the best neural networks are now running up against a lack of data, or facing regulatory hurdles around the sensitive nature of the data that does exist. This is why Fully Homomorphic Encryption, the ability to analyze data without knowing its contents, matters.

Fully Homomorphic Encryption, as a concept, has been around for several decades, but it has only been realized in practice within the last 20 years or so. A number of partially homomorphic encryption (PHE) schemes were presented in that initial timeframe, and since 2010 several PHE/FHE designs able to perform basic operations on encrypted data, or ciphertexts, have been developed, along with a number of libraries built to industry standards, some of them open source. These methods are computationally complex for obvious reasons, although efforts are being made with SIMD-like packing and other features to accelerate processing. Even so, accelerating an FHE scheme is not the same as decryption: the math never decrypts the data. Because the data is always in an encrypted state, it can (arguably) be used by untrusted third parties, as the underlying information is never exposed. (One might argue that a sufficient dataset could reveal more than intended despite being encrypted.)
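The 'SIMD-like packing' mentioned above rests, in schemes such as BGV/BFV, on Chinese-Remainder-Theorem batching. The toy sketch below skips encryption entirely and just illustrates the packing idea: many small values ride inside one big integer, and a single arithmetic operation acts on every slot at once.

```python
# CRT "batching" toy: no encryption here, only the packing trick that
# lets one operation act on many slots. Slot values must stay below
# their moduli, much like plaintext-modulus limits in real schemes.
import math

MODULI = [101, 103, 107, 109]          # pairwise-coprime slot moduli
M = math.prod(MODULI)

def pack(vec):
    """CRT: the unique x mod M with x = vec[i] (mod MODULI[i])."""
    x = 0
    for v, m in zip(vec, MODULI):
        rest = M // m
        x += v * rest * pow(rest, -1, m)
    return x % M

def unpack(x):
    return [x % m for m in MODULI]

a = pack([1, 2, 3, 4])
b = pack([10, 20, 30, 25])
assert unpack((a + b) % M) == [11, 22, 33, 29]    # one add, four sums
assert unpack((a * b) % M) == [10, 40, 90, 100]   # one mul, four products
```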

Today’s Announcement: Custom Silicon for FHE

When measuring the performance of FHE compute, the result is compared against the same analysis run on the plaintext version of the data. Because of the computational complexity, current FHE methods are substantially slower: the encryption schemes that enable FHE can inflate the size of the data by 100-1000x, and compute on that data is then 10,000x to one million times slower than conventional compute. That means one second of compute on the raw data can take anywhere from roughly three hours to twelve days.
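Those figures are easy to sanity-check with some quick back-of-the-envelope arithmetic (the five-orders-of-magnitude target from the announcement below is included for comparison):

```python
# Sanity-checking the quoted slowdowns for 1 second of plaintext compute.
for slowdown in (10_000, 1_000_000):
    secs = 1 * slowdown
    print(f"{slowdown:>9,}x slower -> {secs / 3600:6.1f} h = {secs / 86400:4.1f} days")
# 10,000x -> 2.8 h (~3 hours); 1,000,000x -> 277.8 h (~12 days)

# DPRIVE's target of a 100,000x speedup would cut the worst case to:
print(1_000_000 / 100_000, "seconds")   # ~10 s per plaintext-second
```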

So whether that means combining hospital medical records across a state, or customizing a personal service using metadata gathered on a user's smartphone, FHE at that speed is not a viable solution. Enter the DARPA DPRIVE program.

  • DARPA: Defense Advanced Research Projects Agency
  • DPRIVE: Data Protection in Virtual Environments

Intel has announced that, as part of the DPRIVE program, it has signed an agreement with DARPA to develop custom IP leading to silicon that enables faster FHE in the cloud, specifically with Microsoft on both the Azure and JEDI clouds, initially for the US government. As part of this multi-year project, expertise from Intel Labs, Intel's Design Engineering, and Intel's Data Platforms Group will come together to create a dedicated ASIC to reduce the computational overhead of FHE compared to existing CPU-based methods. The press release states that the target is to reduce processing time by five orders of magnitude over current methods, cutting compute times from days to minutes.

Intel already has a foot in the door when it comes to FHE, with a research team inside Intel Labs dedicated to the topic. That work has primarily been on the software, standards, and regulatory side, but it will now extend into hardware design, cloud software stacks, and collaborative deployment within the Azure and JEDI clouds for the US government. Other highlighted target markets include healthcare, insurance, and finance.

During its Labs Day in December 2020, Intel detailed some of the directions it was already pursuing with this work, including standards development intended to parallel traditional encryption, but on an international scale given the additional regulatory hurdles. With the DPRIVE program, Microsoft now becomes part of that discussion, alongside Intel's continued investments at the academic level.

Aside from the 'five orders of magnitude' target, today's announcement doesn't set out definitive goals, nor does it present a timeframe beyond calling this a 'multi-year' agreement. It will be interesting to see how much Intel or its academic affiliates discuss the topic beyond today, beyond the standardization work.


Comments

  • edzieba - Tuesday, March 9, 2021 - link

    "R&D" is not a misspelled musical genre. You actually need to Research and Develop technologies before you can even think of turning them into commercial products. This is the announcement of an R&D programme.
  • stephenbrooks - Monday, March 8, 2021 - link

    If this allows training a neural network, I don't see how it isn't leaking the data, unless the results from the neural network also come out encrypted (in which case how would the researcher know *anything* from this process?)
  • brucethemoose - Monday, March 8, 2021 - link

    Yeah. One still has to trust the researchers/users not to build a model that spits out results too close to the original data, right? They would have no way to verify its accuracy though.

    Intentionally trying to leak stuff from processing homomorphically encrypted data would be an interesting avenue for research.
  • mwalker0 - Monday, March 8, 2021 - link

    It is possible to train a neural network on encrypted data, there is a lot of research in this area right now:
    https://arxiv.org/pdf/1904.07303.pdf
    https://arxiv.org/pdf/1911.07101.pdf
    https://papers.nips.cc/paper/2019/file/56a3107cad6...
  • Wereweeb - Tuesday, March 9, 2021 - link

    No one said it isn't possible, what they said is that a malicious actor could try milking the encrypted data to rebuild a generally accurate model of the original data.
  • JoeTheDestroyr - Wednesday, March 10, 2021 - link

    Yes, the results of the calculations are encrypted too. Only the provider of the (encrypted) data can decrypt results from said data.

    The point is to separate security of data from security of the computation (software and hardware).
  • suppurating-mind-zit - Tuesday, March 9, 2021 - link

    Not even the Borg would assimilate our species given our self-destructive, self-ruining, ignorant and rampant creation of technology that will serve to only weaken and eventually destroy the idea of the individual... and the ironic part is, the Borg'd version of that involves a little dignity still... unlike us and our version of the dystopian hell... nevermind no one here understands the comparison... ugh... enjoy the ... uhhhh... fun.. tech??? yay?

    k thx bye!!!1!!!!!
  • Dr. Swag - Wednesday, March 10, 2021 - link

Correct me if wrong, but I think the article is incorrect in saying that the researcher can get the result on the dataset. I'm pretty sure the point is that you can delegate computation (which is sort of implied by the 2nd and 3rd images) without worrying about the third party computers being able to access the data itself. If they were able to obtain the results of the computations without ever seeing the data, that would be immensely complicated and would also probably leak information about the data. The point is you can delegate computations to third parties, they perform those computations on their computers, and then send the results to you, and you can then decrypt the data to get the results. They never really have the results: just an encrypted form of the results.
  • bottlething - Tuesday, March 16, 2021 - link

    Quote: “One might argue that a sufficient dataset could reveal more than intended despite being encrypted”.

The implications of this statement are almost incomprehensible, and could provide an interesting perspective on the meaning of privacy in current discussions of digital information.

As always, it is a privilege to have access to the thorough and expert treatment of various topics here at Anandtech—thank you very much.
  • Galgomite - Wednesday, March 17, 2021 - link

    Sounds good to me. No homomorphic.
