Data is at the core of our network. Nodes require precise, authentic data to train models in a competitive environment, so the data collection process is a fundamental aspect of the platform. A data collection participant will be able to send specific data, including DNA data, medical history, and demographic information, to the NUCLE.AI servers in exchange for tokens upon usage (as explained in the following section). This data will be pre-processed and filtered for any personally identifiable information.
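As an illustration of this pre-processing step, direct identifiers can be dropped and replaced with a salted one-way hash. The field names and the `preprocess_submission` helper below are hypothetical placeholders, not the platform's actual schema:

```python
import hashlib

# Hypothetical direct-identifier fields; the real schema is defined by the platform.
PII_FIELDS = {"name", "email", "address", "phone"}

def preprocess_submission(record: dict, salt: str) -> dict:
    """Drop direct identifiers and attach a salted, non-reversible participant token."""
    cleaned = {k: v for k, v in record.items() if k not in PII_FIELDS}
    # A salted SHA-256 digest serves as a stable pseudonymous ID.
    token = hashlib.sha256((salt + record["email"]).encode()).hexdigest()
    cleaned["participant_id"] = token
    return cleaned

record = {"name": "Jane Doe", "email": "jane@example.com",
          "age": 44, "blood_pressure": "120/80"}
cleaned = preprocess_submission(record, salt="network-secret")
# 'name' and 'email' are removed; clinical fields remain under a pseudonymous ID.
```

Note that salted hashing alone is pseudonymization, not anonymization; it is shown here only as the first filtering stage before the encryption and differential privacy measures described below.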
Once a particular data point is selected for a training or validation set, it is filtered down to its relevant sections and disseminated through the network in encrypted form. This encryption scheme is designed so that the underlying data cannot be recovered, while the statistical patterns in the data remain, ensuring that privacy and anonymity are maintained without compromising the accuracy of the models.
The NUCLE.AI team understands the importance of keeping this data confidential. Our methods are fully HIPAA compliant. At no point does the platform expose any of the data it utilizes: the overarching NUCLE.AI protocol performs multiple steps to ensure the security and anonymity of its medical data. Moreover, our platform will implement state-of-the-art differential privacy techniques to prevent personally identifiable information from leaking into the resulting analyses, as explained in §3.4.5.
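The core idea of differential privacy can be sketched with the classic Laplace mechanism: add noise calibrated to a query's sensitivity so that no single participant's record measurably changes the released result. This is a generic, minimal illustration, not the platform's implementation; the dataset and query below are invented:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) by inverse-CDF sampling, stdlib only."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(values, predicate, epsilon: float) -> float:
    """Release a count under epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace(1/epsilon) noise suffices.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(42)  # deterministic demo; real deployments must not seed
ages = [23, 45, 67, 34, 52, 71, 29, 48, 60, 38]  # invented sample
noisy = private_count(ages, lambda a: a >= 50, epsilon=1.0)
```

Smaller `epsilon` values give stronger privacy at the cost of noisier answers, which is the trade-off any deployment must tune.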
While it is impossible to guarantee the authenticity of submitted data without physically processing it, there are many ways in which falsified data can be detected and filtered out during preprocessing. We can perform multivariate outlier checks on biometric data submissions to determine with high confidence whether the data was falsified. Naturally, to detect these outliers, we must first be able to accurately classify the data and perform feature scoring.
This will be accomplished using the machine learning models produced by the NUCLE.AI blockchain network, as discussed in §§3.3.2 and 3.3.3.
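One standard form of multivariate outlier check is the Mahalanobis distance, which measures how far a submission lies from the bulk of the data while accounting for correlations between features. The sketch below uses NumPy on synthetic data and is illustrative only; the thresholds and features are not the platform's:

```python
import numpy as np

def mahalanobis_outliers(X: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Flag rows of X whose Mahalanobis distance from the sample mean
    exceeds `threshold`. X has shape (n_samples, n_features)."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    inv = np.linalg.inv(cov)
    diff = X - mu
    # Squared Mahalanobis distance for each row: diff @ inv @ diff, per sample.
    d2 = np.einsum("ij,jk,ik->i", diff, inv, diff)
    return np.sqrt(d2) > threshold

rng = np.random.default_rng(0)
# Synthetic biometric submissions, e.g. (systolic BP, heart rate).
X = rng.normal(loc=[120, 70], scale=[10, 5], size=(200, 2))
X[0] = [400, 5]  # an implausible, likely falsified submission
flags = mahalanobis_outliers(X)
```

A production pipeline would replace the plain sample mean and covariance with robust estimates (or the learned models referenced above), since extreme fakes distort the statistics they are tested against.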
While this will initially require our own models and pre-verified data, these models will continue to improve as the network matures.
Other filtering methods will be applied selectively on a case-by-case basis, depending on the type of data. For DNA data, for example, there are many ways to check whether a particular sample is internally consistent, both through our models and with traditional algorithms, such as validating stretches of candidate coding sequences and confirming the presence of microsatellite sequences.
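As a rough sketch of such traditional checks, the helpers below scan a sequence for a candidate open reading frame (an ATG start codon followed by an in-frame stop codon after a minimum length) and for short tandem repeats characteristic of microsatellites. The thresholds `min_codons` and `min_repeats` are illustrative placeholders, not the platform's actual parameters:

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_valid_orf(seq: str, min_codons: int = 30) -> bool:
    """Look for an open reading frame: ATG start, first in-frame stop
    at least `min_codons` codons downstream. A crude consistency check."""
    seq = seq.upper()
    for m in re.finditer("ATG", seq):
        start = m.start()
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    return True
                break  # premature stop in this frame; try the next ATG
    return False

def microsatellites(seq: str, unit_len=(1, 6), min_repeats: int = 5):
    """Find short tandem repeats, e.g. 'CACACACACA' (a CA repeat)."""
    hits = []
    for k in range(unit_len[0], unit_len[1] + 1):
        pattern = rf"([ACGT]{{{k}}})\1{{{min_repeats - 1},}}"
        for m in re.finditer(pattern, seq.upper()):
            hits.append((m.start(), m.group(0)))
    return hits

orf_ok = has_valid_orf("ATG" + "GCA" * 40 + "TAA")  # synthetic sequence
repeats = microsatellites("CA" * 10)
```

Real genomic validation is far more involved (quality scores, reference alignment, population frequencies); these two checks only illustrate the kind of cheap internal-consistency filter that can run before any model-based scoring.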