With the rapid rise in both interest in biotechnology (measured by market capitalization) and the quantity of data it generates, deriving meaningful and marketable insights becomes increasingly difficult. While traditional techniques such as statistical modeling have been employed to interpret such data, recent advances in computational power have made resource-intensive machine learning and artificial intelligence techniques more tractable.
In this section we discuss some sample problems, a privacy-preserving data aggregation scheme, and a general framework for formulating experiments and training models.
The problems solved on the NUCLE.AI network will be oriented toward our partnerships. Examples include:

- Clustering and identifying cells through time to determine effective drug and medicine treatments.
- Using computer vision and object recognition to identify malignant tumors.
- Providing tailored information based on analysis of personalized data.
- Using medical information to identify at-risk individuals for patients, doctors, analysts, and scientists.

While the examples above are clear, they represent only a small fraction of the potential problems that may be solved using the vast blockchain of medical and biological data.
To ensure that data aggregation preserves the privacy of individual users, the Nucle.ai scheme will use differential privacy. When tokens are exchanged for the statistical analysis produced by the models, some amount of inference can be made about specific individuals within the sampled group. Differential privacy, first proposed by Cynthia Dwork, obfuscates personally identifying information by adding mathematical noise to collected samples while still retaining a high degree of statistical accuracy. The scheme ensures that the output provides minimal evidence as to whether any individual person submitted their data, mitigating linkage attacks.
While it is not possible for an individual contribution to be completely decoupled from the result, we can formally bound the privacy loss for a particular user by the relative likelihood that a given result R occurred with or without that user's data.
ε-differential privacy is defined by the following guarantee: for a randomized mechanism M producing the result, and for every possible result R,

\[
\Pr[M(D) = R] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') = R],
\]
where D and D′ differ in at most one element. For example, with ε = ln 2, no result can become more than twice as likely when any single person's data is added or removed. Next, for the various statistics we will aggregate, we want to ensure low privacy loss while maintaining the statistical validity of our results. Let f be a function mapping our data to a desired statistical output.
We denote by Δf the ℓ1 sensitivity of this function, defined by:

\[
\Delta f \;=\; \max_{D,\,D'} \lVert f(D) - f(D') \rVert_1,
\]

where the maximum is taken over all pairs of datasets D, D′ differing in one element.
A low sensitivity means less noise is required to achieve a given privacy guarantee, and so the goal of the scheme is to implement a differentially private algorithm built from queries of low sensitivity.
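As a concrete illustration, the following is a toy brute-force check of the ℓ1 sensitivity of a counting query over a small binary domain. This is a sketch for intuition only: the function names are hypothetical, and the replace-one-element notion of neighboring datasets is our assumption, not part of the Nucle.ai specification.

```python
from itertools import product

def count_ones(dataset):
    """Statistical query f: number of records equal to 1."""
    return sum(dataset)

def l1_sensitivity(f, domain, n):
    """Max |f(D) - f(D')| over all datasets of size n and all
    neighbors D' obtained by replacing a single element."""
    worst = 0
    for d in product(domain, repeat=n):
        for i in range(n):
            for v in domain:
                d_prime = d[:i] + (v,) + d[i + 1:]
                worst = max(worst, abs(f(d) - f(d_prime)))
    return worst

print(l1_sensitivity(count_ones, domain=(0, 1), n=4))  # -> 1
```

A counting query thus has sensitivity 1 regardless of dataset size, while a mean over n records clipped to a bounded range has sensitivity on the order of 1/n, so large aggregates can be released with comparatively little noise.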
Returning to our differentially private mechanism M, which takes as input the dataset D and generates the aggregated statistics after adding mathematical noise: one method for generating such noise is to sample from the Laplace distribution, X ~ Lap(λ), with density

\[
p(x) \;=\; \frac{1}{2\lambda}\exp\!\left(-\frac{|x|}{\lambda}\right).
\]

Setting λ = Δf/ε and releasing M(D) = f(D) + X satisfies ε-differential privacy.
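Below is a minimal sketch of such a Laplace mechanism, assuming a simple counting query with sensitivity 1; the dataset, the ε value, and the function names are illustrative assumptions rather than the production Nucle.ai implementation.

```python
import numpy as np

def laplace_mechanism(dataset, f, sensitivity, epsilon, rng=None):
    """Release f(dataset) perturbed with Laplace noise of scale Δf/ε."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon  # λ = Δf / ε
    return f(dataset) + rng.laplace(loc=0.0, scale=scale)

# Hypothetical example: a counting query (sensitivity 1) released at ε = 0.5.
data = [0, 1, 1, 0, 1, 1, 1, 0]
noisy_count = laplace_mechanism(data, sum, sensitivity=1.0, epsilon=0.5)
print(noisy_count)  # true count is 5; the release adds Lap(λ = 2) noise
```

Smaller values of ε give a stronger privacy guarantee at the cost of noisier statistics, which is the trade-off the aggregation scheme must tune for each query.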