Blockchains offer fertile ground for storing and managing big data in a decentralized manner. Examples include Numer.ai, which shares homomorphically encrypted machine learning data that preserves the data's shape; IPFS and FileCoin for distributed storage; and BigchainDB, a distributed database with blockchain characteristics. Blockchains let users store and control their own data while granting view rights to interested and trusted parties, which can support GDPR compliance by keeping users as the data controllers.
8. Consistency, Availability, Partition Tolerance
Any read must reflect any (related) prior
write
Any live node must respond (correctly!) in
a finite (predefined?) amount of time
The system continues to work even if
network is partitioned
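The trade-off between these three properties can be illustrated with a toy model. This is only a sketch: the `Replica` class, the `Unavailable` exception, and the "CP"/"AP" mode names are hypothetical and chosen for illustration, not taken from any real system.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer (hypothetical name)."""


class Replica:
    """Toy replica holding a single value; mode is "CP" or "AP"."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.value = 0
        self.partitioned = False

    def write(self, value: int, peer: "Replica") -> None:
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse a write that cannot be replicated.
                raise Unavailable("cannot replicate the write")
            # Availability first: accept locally and let the peer diverge.
            self.value = value
            return
        # No partition: the write reaches both replicas.
        self.value = value
        peer.value = value

    def read(self) -> int:
        if self.partitioned and self.mode == "CP":
            # Cannot confirm that this is the latest write.
            raise Unavailable("cannot confirm the latest write")
        return self.value


def partition(*replicas: Replica) -> None:
    """Cut the (imaginary) link between the given replicas."""
    for replica in replicas:
        replica.partitioned = True
```

In "AP" mode a write during a partition succeeds but the peer keeps serving the old value; in "CP" mode both replicas refuse to answer until the partition heals.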
15. Fallacies of distributed computing
(by Peter Deutsch)
The network is
reliable.
There is one
administrator.
Infinite bandwidth.
The network is
secure.
Topology doesn't
change.
Latency is zero.
Transport cost is
zero.
The network is
homogeneous.
16. Latency - network & computation
Asynchrony: no upper bound
Synchrony: there's an upper bound
Partial-synchrony: there's an upper bound
but it is not known beforehand
18. Entity (e.g. node) actions
Respond correctly (0=>0, 1=>1)
Does not respond (0=>∅, 1=>∅)
Respond falsely (0=>1, 1=>0)
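The three behaviors can be written down as a small sketch. The `Behavior` enum and the `respond` function are hypothetical names; a real Byzantine node may answer arbitrarily, and flipping the bit is just the simplest case shown on the slide.

```python
from enum import Enum
from typing import Optional


class Behavior(Enum):
    CORRECT = "correct"      # responds with the true value
    CRASHED = "crashed"      # never responds
    BYZANTINE = "byzantine"  # responds with a false value


def respond(behavior: Behavior, bit: int) -> Optional[int]:
    """Return the node's reply to a one-bit query, or None for no reply."""
    if behavior is Behavior.CORRECT:
        return bit
    if behavior is Behavior.CRASHED:
        return None
    # Assumption: the faulty node simply flips the bit.
    return 1 - bit


for behavior in Behavior:
    print(behavior.value, [respond(behavior, bit) for bit in (0, 1)])
```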
19. Distributed systems
Adversarial (Byzantine) vs non-adversarial
value replication vs state machine
replication (e.g. RBR vs SBR in MySQL)
Don't trust, verify (in the old days they
used to say "Nullius in verba")
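The difference between value replication and state machine replication can be sketched as follows. This is a toy model with hypothetical names (`apply_command`, `replicate_by_state_machine`, `replicate_by_value`); it is loosely analogous to shipping statements versus shipping resulting rows, and it assumes commands are deterministic.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, str, int]  # (operation, key, amount)


def apply_command(state: Dict[str, int], command: Command) -> None:
    """Deterministically apply one command to a state."""
    op, key, amount = command
    if op == "add":
        state[key] = state.get(key, 0) + amount
    else:
        raise ValueError(f"unknown operation: {op}")


def replicate_by_state_machine(log: List[Command]) -> Dict[str, int]:
    """The replica replays the same ordered commands as the primary."""
    replica: Dict[str, int] = {}
    for command in log:
        apply_command(replica, command)
    return replica


def replicate_by_value(log: List[Command]) -> Dict[str, int]:
    """The primary executes; the replica copies only the resulting values."""
    primary: Dict[str, int] = {}
    replica: Dict[str, int] = {}
    for command in log:
        apply_command(primary, command)
        key = command[1]
        replica[key] = primary[key]  # ship the new value, not the command
    return replica


log = [("add", "x", 5), ("add", "x", 3), ("add", "y", 1)]
print(replicate_by_state_machine(log), replicate_by_value(log))
```

Both replicas end in the same state here because the commands are deterministic; with non-deterministic commands, replaying them could diverge while copying values would not.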
20. The holy trinity
Safety: only good things happen (i.e. bad
things don't happen)
Liveness: a good thing will eventually
happen
Fault-tolerance: can function with
components down
21. Consensus components (1/2)
Integrity: all correct processes decide at
most one value v, and v is the right
value (safety property).
Agreement: all correct processes must
agree on the same value (safety property).
22. Consensus components (2/2)
Termination: all correct processes decide
some value (liveness property).
Validity: if all correct processes decide
v, then v must have been proposed by some
correct process (non-triviality property).
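Three of these properties can be checked mechanically on a snapshot of a run. The sketch below assumes that `decisions` and `proposals` contain only correct processes and that an undecided process is recorded as `None`; all names are hypothetical. Termination is a liveness property, so a snapshot can only show whether it holds yet, not whether it will eventually hold.

```python
from typing import Dict, Optional

Decisions = Dict[str, Optional[int]]  # process id -> decided value or None


def agreement(decisions: Decisions) -> bool:
    """No two correct processes decided different values."""
    decided = {v for v in decisions.values() if v is not None}
    return len(decided) <= 1


def termination(decisions: Decisions) -> bool:
    """Every correct process has decided some value (in this snapshot)."""
    return all(v is not None for v in decisions.values())


def validity(decisions: Decisions, proposals: Dict[str, int]) -> bool:
    """Every decided value was proposed by some correct process."""
    proposed = set(proposals.values())
    return all(v in proposed for v in decisions.values() if v is not None)


proposals = {"p1": 0, "p2": 1, "p3": 1}
good_run: Decisions = {"p1": 1, "p2": 1, "p3": 1}
bad_run: Decisions = {"p1": 1, "p2": 0, "p3": None}

for name, run in (("good", good_run), ("bad", bad_run)):
    print(name, agreement(run), termination(run), validity(run, proposals))
```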
27. Fun Facts
FLP just stands for Fischer, Lynch (the
same from Gilbert/Lynch!), Paterson
The term Byzantine (with the meaning
adversarial) was popularized by "The
Byzantine Generals Problem" (1982) by Leslie
Lamport, Robert Shostak and Marshall Pease
Lamport was roommates with Whitfield Diffie
(Diffie-Hellman anyone?) in the '70s
28. Conflict resolution
In distributed databases: through a nonce
or the timestamp of the operation
In blockchains - consensus protocol:
PoW: through probabilistic consensus (Nakamoto
consensus) - most work wins
PoS: crypto-economic incentives and disincentives
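Two of these rules are simple enough to sketch: timestamp-based resolution (often called last-write-wins) and the "most work wins" fork choice. The `Write` and `Block` types and the function names are hypothetical; real systems also have to deal with clock skew and block validation, which are left out here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Write:
    key: str
    value: str
    timestamp: float  # assumption: clocks are comparable across nodes


def last_write_wins(a: Write, b: Write) -> Write:
    """Distributed-database style: keep the write with the later timestamp."""
    return a if a.timestamp >= b.timestamp else b


@dataclass(frozen=True)
class Block:
    block_id: str
    work: int  # work embodied in this block (hypothetical unit)


def total_work(chain: List[Block]) -> int:
    return sum(block.work for block in chain)


def most_work_chain(chains: List[List[Block]]) -> List[Block]:
    """PoW style: among competing forks, follow the one with the most work."""
    return max(chains, key=total_work)


older = Write("balance", "10", timestamp=100.0)
newer = Write("balance", "12", timestamp=105.0)
fork_a = [Block("a1", 4), Block("a2", 4)]
fork_b = [Block("b1", 4), Block("b2", 4), Block("b3", 4)]
print(last_write_wins(older, newer).value, total_work(most_work_chain([fork_a, fork_b])))
```

The fork choice is only probabilistic: a fork that is behind now may still overtake later, which is why confirmations are counted before a transaction is considered settled.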
29. CAP vs FLP?
CAP theorem represents a trade-off between
liveness and safety
Availability ~ Liveness
Consistency ~ Safety
CAP is a particular case of FLP
30. Let's complicate things even more
Public vs Private blockchain (e.g. Bitcoin
vs Hyperledger)
Permissionless vs Permissioned (e.g.
Ethereum vs Corda)
34. That is, if you don't need a database that
Is decentralized
Is immutable
Is tamper proof
Works with untrusted participants (i.e.
suitable for value transfer networks)
45. For instance
Numer.ai (Fan and Vercauteren homomorphic
encryption scheme, because ML only cares
about data shape)
IPFS: distributed storage
FileCoin: distributed storage with Proof
of Replication and Proof of Storage
46. For instance
BigchainDB (big data distributed database
with blockchain characteristics)
OrbitDB (serverless, distributed,
peer-to-peer database)
47. For instance
The Graph (GraphQL for Web3)
Ocean Protocol (data economy, securely
exposing data)
SingularityNET (Decentralized Artificial
Intelligence Marketplace - AI services
and algorithms)