Technology / Blockchain

The NUCLE.AI blockchain’s network structure closely resembles that of other blockchain platforms in that the verifiers (commonly called miners) of the system are individuals or groups operating nodes with large computational power.

Given this similarity, the NUCLE.AI network aims to pioneer the blockchain space by having verifiers direct their computing power toward training node-unique machine learning models on human biological data.

The nodes that produce the best models for extrapolating from the provided data are rewarded for their computational work as outlined in the economics section (§4.1). We call this process Proof of Intelligent Work (PIW).

Verification Protocol for Proof of Intelligent Work

At its core, the NUCLE.AI block verification process takes the form of solving a classic machine learning optimization problem.

At the start of each verification (mining) cycle, the entire NUCLE.AI network is supplied with a training set, consisting of a batch of patient data encrypted using a structure-preserving map, along with a specification of the training labels.

The models that crowdsourced data analysts train may be recurrent neural networks (RNNs), topic models (such as LDA), or graphical models, depending on the variables of interest within the dataset provided.

Each node of the NUCLE.AI network will then train its models on the provided training set during a predetermined training period. The exact duration of the training period will be determined by the load on the network, the number of nodes available, and the properties of the variables being optimized over.

After the training period, each participating node will publish its model securely (buffer times are enforced to deter copy-optimizers, that is, nodes that copy and fine-tune another node's published model) along with an explicit declaration of the block that the node is extending.

Immediately following this declaration period, the NUCLE.AI network will be supplied with a validation set. Each active node will then run its trained models with the newly supplied validation set as the input, and publish its results.

The node whose model produces the best output values on the validation set will be awarded network tokens (see §4.1, token economics) and will have temporary agency over which token transactions and network information to append to the growing chain of verified blocks.
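
The selection step at the end of a cycle can be sketched as follows. This is only an illustration under stated assumptions: the whitepaper does not fix a scoring metric, so mean squared error is used as a stand-in, and the names Submission, mean_squared_error, and select_winner are hypothetical rather than part of any NUCLE.AI interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Assumption: a "model" is any callable mapping one input to one prediction.
Model = Callable[[float], float]


@dataclass
class Submission:
    node_id: str
    model: Model
    parent_block: str  # the block this node declares it is extending


def mean_squared_error(model: Model, inputs: Sequence[float],
                       targets: Sequence[float]) -> float:
    """Stand-in metric (assumption): lower is better."""
    return sum((model(x) - y) ** 2 for x, y in zip(inputs, targets)) / len(inputs)


def select_winner(submissions: Sequence[Submission], parent_block: str,
                  val_inputs: Sequence[float],
                  val_targets: Sequence[float]) -> Optional[Tuple[str, float]]:
    """Return (node_id, loss) of the best submission that extends parent_block."""
    eligible = [s for s in submissions if s.parent_block == parent_block]
    if not eligible:
        return None
    scored = [(mean_squared_error(s.model, val_inputs, val_targets), s.node_id)
              for s in eligible]
    loss, node_id = min(scored)
    return node_id, loss


# Toy validation set drawn from y = 2x.
val_inputs = [0.0, 1.0, 2.0]
val_targets = [0.0, 2.0, 4.0]

submissions = [
    Submission("node-a", lambda x: 2 * x, parent_block="block-41"),
    Submission("node-b", lambda x: x + 1, parent_block="block-41"),
    # Extends a different block, so it is not eligible for this round.
    Submission("node-c", lambda x: 2 * x, parent_block="block-40"),
]

winner = select_winner(submissions, "block-41", val_inputs, val_targets)
print(winner)
```

Filtering on the declared parent block mirrors the requirement that every node state which block it is extending before the validation set is released.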

The step-by-step block generation protocol is outlined in §3.4.4.

Possible Attacks and Defenses

After the validation period, an attacker may try to flood the network with bogus models that claim high accuracy in order to waste the NUCLE.AI servers’ time checking them.
To prevent this, we will first ask all nodes for their scores, which is fast, and then ask only the best-scoring node for its model along with a small deposit of tokens.
If we run the validation and find a worse score than the node claimed, we will keep the deposit and then ask the runner-up node for its model instead. An honest node will always get its deposit back.
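
The claim-then-verify procedure above can be sketched as below. It is a minimal illustration, assuming that a higher score is better and that the expensive validation run is available as a callable; pick_verified_winner, evaluate, and tolerance are hypothetical names, not part of the protocol specification.

```python
from typing import Callable, Dict, Optional, Tuple


def pick_verified_winner(
    claimed_scores: Dict[str, float],
    evaluate: Callable[[str], float],
    deposit: int,
    tolerance: float = 1e-9,
) -> Tuple[Optional[str], Dict[str, int]]:
    """Check nodes from the best claimed score downward.

    Returns the first node whose verified score matches its claim, plus a
    mapping of nodes whose deposits were kept because they overstated.
    """
    forfeited: Dict[str, int] = {}
    # Assumption: higher score is better, so sort claims in descending order.
    for node_id in sorted(claimed_scores, key=claimed_scores.get, reverse=True):
        actual = evaluate(node_id)  # the expensive step: run the node's model
        if actual + tolerance >= claimed_scores[node_id]:
            return node_id, forfeited  # honest claim: deposit is returned
        forfeited[node_id] = deposit  # overstated claim: deposit is kept
    return None, forfeited


claimed = {"honest": 0.90, "liar": 0.99, "third": 0.80}
actual = {"honest": 0.90, "liar": 0.50, "third": 0.80}

winner, forfeited = pick_verified_winner(claimed, actual.__getitem__, deposit=10)
print(winner, forfeited)
```

Only nodes that reach the front of the queue are ever evaluated, so each bogus claim costs its sender a deposit while costing the servers one validation run.
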
If the particular properties of the data points in the training data set are valuable, attackers will attempt to analyze them to recover sensitive medical data such as DNA sequences. As explained further in §3.4.3, we use locally homeomorphic and globally non-homeomorphic encryption algorithms that render such attacks extremely difficult.
An attacker controlling the majority of the nodes but not the majority of the training power may attempt to rewrite the transaction history of the blockchain by rebroadcasting old blocks except with different recipients or amounts.
Our block generation leader will re-encrypt the training and validation data for each new block using the previous block’s hash as the key, which forces the attacker to retrain all subsequent blocks’ models in order to fool the network, making this infeasible.
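
To illustrate why keying each block's data on the previous block's hash forces retraining, the sketch below derives one key per block along a toy chain. It covers key derivation only and does not implement the structure-preserving encryption itself. Using HMAC-SHA256 as the derivation step, and the function names derive_block_key, block_hash, and chain_keys, are assumptions made for this example.

```python
import hashlib
import hmac
from typing import List, Sequence


def derive_block_key(prev_block_hash: bytes, purpose: bytes = b"dataset-key") -> bytes:
    """Derive the dataset key for a block from the previous block's hash."""
    return hmac.new(prev_block_hash, purpose, hashlib.sha256).digest()


def block_hash(prev_block_hash: bytes, payload: bytes) -> bytes:
    """Toy block hash (assumption): SHA-256 over the parent hash and payload."""
    return hashlib.sha256(prev_block_hash + payload).digest()


def chain_keys(genesis_hash: bytes, payloads: Sequence[bytes]) -> List[bytes]:
    """Return the dataset key used for each block in the chain, in order."""
    keys = []
    prev = genesis_hash
    for payload in payloads:
        keys.append(derive_block_key(prev))
        prev = block_hash(prev, payload)
    return keys


genesis = hashlib.sha256(b"genesis").digest()
original = chain_keys(genesis, [b"tx-1", b"tx-2", b"tx-3"])
tampered = chain_keys(genesis, [b"tx-1", b"tx-2-forged", b"tx-3"])

# Keys up to and including the forged block are unchanged, because they
# depend only on earlier hashes; every later key differs.
print(original[:2] == tampered[:2], original[2] != tampered[2])
```

Because the key for every block after the forged one changes, the data those blocks were trained on changes as well, and the models for all of them would have to be retrained.
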
As a safety net, during our early release, our block generation leader will also store the block IDs and hashes publicly and immutably in an Ethereum smart contract so all other nodes can detect tampering in blocks published by malicious nodes.
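
The tamper check that such a public record enables can be sketched with an in-memory stand-in. This does not interact with Ethereum; AnchorRegistry is a hypothetical class that only mimics a write-once mapping from block ID to hash.

```python
import hashlib
from typing import Dict


class AnchorRegistry:
    """In-memory stand-in (assumption) for a public, write-once hash record."""

    def __init__(self) -> None:
        self._hashes: Dict[int, str] = {}

    def anchor(self, block_id: int, block_bytes: bytes) -> None:
        """Record a block's hash once; later overwrites are rejected."""
        if block_id in self._hashes:
            raise ValueError(f"block {block_id} is already anchored")
        self._hashes[block_id] = hashlib.sha256(block_bytes).hexdigest()

    def is_untampered(self, block_id: int, block_bytes: bytes) -> bool:
        """Return True only if the block matches the hash anchored for its ID."""
        expected = self._hashes.get(block_id)
        return expected is not None and expected == hashlib.sha256(block_bytes).hexdigest()


registry = AnchorRegistry()
registry.anchor(7, b"alice pays bob 5")

print(registry.is_untampered(7, b"alice pays bob 5"))    # block as published
print(registry.is_untampered(7, b"alice pays eve 500"))  # rebroadcast with changes
```
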
If an attacker gains full control over our servers, no sensitive client data is at risk. Patients encrypt the data themselves and share the private keys only with parties willing to pay through the data transfer smart contract. As discussed above, even our servers cannot alter previous transaction history.
However, a compromised block generation leader could give out the validation data to certain nodes in advance, allowing them to win every block. Indeed, our block generation leader is a point of centralization, being the only entity that can commission the creation of new blocks.
There are ways to mitigate the damage of this attack, such as having network rules limiting block epoch frequency. Our block generation leader will be relatively simple and separated from other servers, minimizing the surface area so we can focus on the security of the few functions it performs.
If an attacker manages to win a block while submitting an undertrained model, they have a greatly improved chance of winning the next block as well if many nodes opt to switch to this undertrained model. This becomes problematic if the attacker can continuously control the network, as they can then selectively prevent any transaction from being listed in a block.
However, depending on how the (likely) non-convex optimization space is organized, the attacker will eventually choose a direction at a saddle point that converges to a worse local minimum than that of another model that chose a different direction, effectively breaking the chain.
Moreover, this chain can at most continue until the next starter block protocol. As such, this attack scenario is both unlikely and self-correcting.