Data Collection

Data Encryption

Data is at the core of our network. Nodes require precise and authentic data to train models in a competitive environment, so the data collection process is a fundamental aspect of the platform. A data collection participant will be able to send specific data, including DNA data, medical history, and demographic information, to the NUCLE.AI servers in exchange for tokens upon usage (as explained in the following section). This data will be pre-processed and filtered to remove any personally identifiable information.
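As a rough illustration of the pre-processing step, the sketch below drops directly identifying fields from a submission and coarsens the exact age into a range. The field names, the `PII_FIELDS` set, and the ten-year age buckets are assumptions made for this example; the source does not specify a record schema or the filtering rules that NUCLE.AI applies.

```python
# Hypothetical field names: the source does not define a submission schema.
PII_FIELDS = {"name", "email", "phone", "address", "date_of_birth", "ssn"}


def strip_pii(record: dict) -> dict:
    """Return a copy of the record without directly identifying fields."""
    return {
        key: value
        for key, value in record.items()
        if key.lower() not in PII_FIELDS
    }


def generalize_age(age: int, bucket: int = 10) -> str:
    """Replace an exact age with a coarse range such as '30-39'."""
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket - 1}"


def preprocess(submission: dict) -> dict:
    """Remove identifying fields, then coarsen quasi-identifiers (assumed step)."""
    cleaned = strip_pii(submission)
    if "age" in cleaned:
        cleaned["age_range"] = generalize_age(cleaned.pop("age"))
    return cleaned
```

Removing named fields alone rarely guarantees anonymity, since combinations of remaining attributes can still identify a person; this is why the example also generalizes the age.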

Once a particular data point is selected for a training or validation set, it is filtered for its relevant sections and disseminated through the network in an encrypted form. This encryption scheme is robust, ensuring that the underlying data can never be decoded.

However, the underlying patterns in the data will remain, ensuring that privacy and anonymity are maintained without compromising the accuracy of the models.

The NUCLE.AI team understands the importance of ensuring that this data remains confidential. Our methods are fully HIPAA compliant. There is no scenario in which the platform exposes any of the data that it utilizes, as the overarching NUCLE.AI protocol performs a multitude of steps to ensure the security and anonymity of its medical data. Moreover, our platform will implement state-of-the-art differential privacy techniques to protect against the leakage of personally identifying information through the resulting analyses, as explained in §3.4.5.
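To give a sense of what a differential privacy technique looks like, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is a textbook example, not the scheme described in §3.4.5; the function names and the choice of a counting query are assumptions made for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng=None) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing one
    participant changes the true count by at most 1, so the noise
    scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

A smaller `epsilon` adds more noise and gives stronger privacy; a larger one gives more accurate answers. Each released answer consumes part of the privacy budget, so repeated queries over the same data need to be accounted for.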


Each submission of data will be accompanied by a corresponding wallet address. For each such data point, any usage within a training or validation set for a particular model will be recorded. Whenever an entity purchases the analysis of a model, a fraction of the coins expended in the purchase will be sent to the listed address.
Since many distinct entities may request similar analyses, and the same entity may even request the same analysis at different times (e.g. as better models develop from the ecosystem), a participating user can receive a continuous stream of revenue in the form of our token.
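The revenue-sharing rule can be sketched as follows. The source only states that a fraction of each purchase is sent to the listed addresses; splitting that fraction in proportion to how often each address's data was used, as well as the names `split_contributor_share` and `contributor_fraction`, are assumptions made for this example.

```python
from collections import Counter


def split_contributor_share(purchase_amount, contributor_fraction, usage_log):
    """Divide the contributors' share of one purchase among wallet addresses.

    usage_log holds one wallet address per recorded use of a data point
    in the purchased model's training or validation sets.
    """
    if not 0 <= contributor_fraction <= 1:
        raise ValueError("contributor_fraction must be between 0 and 1")
    uses = Counter(usage_log)
    total_uses = sum(uses.values())
    if total_uses == 0:
        return {}
    pool = purchase_amount * contributor_fraction
    # Assumed rule: payout proportional to the number of recorded uses.
    return {address: pool * count / total_uses for address, count in uses.items()}
```

For instance, with a purchase of 100 tokens, a contributor fraction of 0.25, and a usage log of `["a", "a", "b", "c"]`, address `a` would receive half of the 25-token pool and `b` and `c` a quarter each.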
Even after submitting the data, the participant retains ownership of it. Any data the NUCLE.AI platform receives will be kept confidential from other entities. Moreover, when NUCLE.AI receives the data, it will be pre-processed to remove any sensitive information, so that neither NUCLE.AI nor any of the data analysts can associate data with a particular participant.
Proper analysis of data provides insight into the latent factors associated with particular medical conditions, allowing for better predictive models and health profiles of patients. However, while a great deal of data is widely available, developing optimal models is a nearly impossible task, and training any particular model requires a massive amount of computation.
The NUCLE.AI platform solves both problems by crowdsourcing the development of models and leveraging the full computational power of the network to train the models. As a result, the models developed through the network will have a distinct competitive advantage over the models developed by small teams with limited amounts of computational power.
Thus, the analyses produced by the network will be highly desirable, providing a strong incentive for institutions both to hold the platform’s tokens and to contribute data to further develop the models of interest.
The NUCLE.AI team is currently taking steps to register the network tokens as a security with the Securities and Exchange Commission (SEC). If the registration is approved, token holders will receive dividends in the form of NUCLE.AI (NCI) tokens. The dividend payout will be proportional to the net income of NUCLE.AI and to the total number of NCI tokens held by an address.
Node operators provide the core operation of the network: training and/or developing competitive models, confirming transactions, verifying the integrity of other nodes, and collectively asserting the different stages of the block cycles.
Model training requires tremendous amounts of computational power, ensuring its validity as a Proof of Intelligent Work and allowing the block rewards to be comparable to those of other blockchain technologies.
These rewards will be carefully modulated to ensure that the additional difficulty of developing competitive models is incentivized accordingly.

Data Filtering

While it is impossible to guarantee the authenticity of submitted data without physically processing the data, there are many ways in which falsified data can be detected during preprocessing and filtered out. We can perform multivariate outlier checks on biometric data submissions to determine with high confidence whether the data was falsified. Naturally, to detect these outliers, we must first be able to accurately classify the data and perform feature scoring.

This will be accomplished by using the machine learning models produced by the NUCLE.AI blockchain network, as discussed in §3.3.2 and §3.3.3.
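One common form of multivariate outlier check is the Mahalanobis distance, which measures how far a submission lies from a reference distribution while accounting for correlations between features. The sketch below handles two features in pure Python. The choice of this particular method, the two-feature restriction, and the threshold of 3.0 are assumptions for illustration; the source does not say which outlier test the platform uses.

```python
import math


def mean_and_covariance(points):
    """Sample mean and 2x2 covariance of (x, y) reference measurements."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, mean, cov):
    """Distance of a point from the reference data in covariance-scaled units."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form d^T S^-1 d, using the closed-form inverse of a 2x2 matrix.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


def is_outlier(point, mean, cov, threshold=3.0):
    """Flag a submission whose distance exceeds the (assumed) threshold."""
    return mahalanobis(point, mean, cov) > threshold
```

With pre-verified height and weight pairs as the reference set, a submission that is individually plausible in each feature but implausible in combination (for example, very short and very heavy) receives a large distance, which a per-feature range check would miss.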

While this will initially require our own models and pre-verified data, these models will continuously mature as the network matures.

Other filtering methods will be selectively applied on a case-by-case basis, depending on the type of data. For example, there are many ways to check whether a particular DNA sample is internally consistent, both through our models and using traditional algorithms, such as validating stretches of candidate coding sequences and observing the presence of microsatellite sequences.
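The two traditional checks mentioned above can be sketched in a few lines. This is a simplified illustration, not the platform's pipeline: the function names, the use of the standard stop codons only, and the default minimum of five tandem repeats are assumptions.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_valid_coding_sequence(seq: str) -> bool:
    """Check that a candidate coding sequence is a well-formed open reading frame."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0 or re.fullmatch(r"[ACGT]+", seq) is None:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (
        codons[0] == "ATG"                      # start codon
        and codons[-1] in STOP_CODONS           # terminal stop codon
        and not any(c in STOP_CODONS for c in codons[1:-1])  # no premature stop
    )


def find_microsatellites(seq: str, min_repeats: int = 5):
    """Return (start, unit, repeat_count) for tandem repeats of 2-6 bp units."""
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits
```

A fabricated sequence, such as one generated uniformly at random, would tend to contain premature stop codons in its claimed coding regions and few or no microsatellite runs, so checks like these can serve as inexpensive first-pass filters before any model-based scoring.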