(Big) Data in (Block) Chains
Felix Crisan
About Me
NETOPIA / mobilPay
Big Data
Blockchain
Blockchain Romania Community
@fixone on almost every platform
Trivia
Problem statement
              Storage                               Computing
Monolithic    Regular disk (diskette, USB drive)    Regular processor
Distributed   Disks in RAID, NAS, SAN               MapReduce (?!)
Distributed database
= storage + more than
one processor (or,
these days, computer)
Distributed database
= homogeneous* nodes
CAP Theorem
Postulated by Eric Brewer in 1999
Mathematical proof by Gilbert and Lynch in
2002
Consistency, Availability, Partition Tolerance
Consistency: any read must reflect any (related) prior write
Availability: any live node must respond (correctly!) in a finite (predefined?) amount of time
Partition tolerance: the system continues to work even if the network is partitioned
Consistency,
Availability, Partition
Tolerance -> pick any
two
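A minimal sketch of the trade-off, assuming a hypothetical two-replica key-value store (the `Replica` class, the `mode` flag and `demo` are illustrative names, not taken from any real database). Once the network is partitioned, a CP replica refuses requests it cannot keep consistent, while an AP replica keeps answering and may return stale data.

```python
class Replica:
    """Toy replica; `mode` is "CP" or "AP" (hypothetical names)."""

    def __init__(self, mode):
        self.mode = mode
        self.value = None
        self.partitioned = False  # True when the peer is unreachable

    def write(self, value, peer):
        if self.partitioned:
            if self.mode == "CP":
                raise RuntimeError("unavailable: cannot replicate write")
            self.value = value  # AP: accept locally, the peer diverges
            return
        self.value = value
        peer.value = value  # replicate synchronously

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: value may be stale")
        return self.value


def demo(mode):
    a, b = Replica(mode), Replica(mode)
    a.write("v1", b)
    a.partitioned = b.partitioned = True  # the network splits
    try:
        a.write("v2", b)
    except RuntimeError:
        pass  # CP replica rejected the write
    try:
        return b.read()
    except RuntimeError as err:
        return str(err)
```

In AP mode `demo` returns the stale "v1" (available but inconsistent); in CP mode it returns the error text (consistent but unavailable).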
Distributed Database classification
Distributed ledger =
heterogeneous
distributed database
(usually mutable!)
Blockchain =
guaranteed
(cryptography FTW!)
immutable
DL with no trust
assumptions
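The "cryptography FTW" part can be sketched as a hash chain: each block stores the hash of its predecessor, so changing any old block invalidates every later link. This is only a toy illustration of tamper evidence (the function names and the block layout are assumptions); a real blockchain also needs a consensus protocol to decide which chain is the valid one.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    # Hash the block contents together with the previous block's hash.
    payload = json.dumps([index, data, prev_hash]).encode()
    return hashlib.sha256(payload).hexdigest()


def build_chain(records):
    chain, prev = [], "0" * 64  # all-zero hash stands in for "no predecessor"
    for i, data in enumerate(records):
        h = block_hash(i, data, prev)
        chain.append({"index": i, "data": data, "prev": prev, "hash": h})
        prev = h
    return chain


def is_valid(chain):
    prev = "0" * 64
    for block in chain:
        expected = block_hash(block["index"], block["data"], block["prev"])
        if block["prev"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True
```

Editing the data of any block after the fact makes `is_valid` return False unless every subsequent hash is recomputed as well.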
Evolution of data structures
 File -> Database
 Database -> Distributed Database
 Distributed Database -> Distributed
Ledger
 Distributed Ledger -> Blockchain
Did anyone say blockchain?
Fallacies of distributed computing
(by Peter Deutsch)
The network is
reliable.
There is one
administrator.
Infinite bandwidth.
The network is
secure.
Topology doesn't
change.
Latency is zero.
Transport cost is
zero.
The network is
homogeneous.
Latency - network & computation
Asynchrony: no upper bound
Synchrony: there's an upper bound
Partial-synchrony: there's an upper bound
but it is not known beforehand
Byzantine failure
Failure models
Omission failure
(drop messages)
Crash failure
(stop functioning)
Entity (e.g. node) actions
Respond correctly (0=>0, 1=>1)
Does not respond (0=>∅, 1=>∅)
Respond falsely (0=>1, 1=>0)
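The three behaviours above can be sketched as functions on a single bit, with `None` standing in for "no response". The names and the simple majority vote are illustrative assumptions, not a real Byzantine fault-tolerant protocol; the point is only that a client can mask a minority of silent or lying nodes.

```python
def correct(bit):
    return bit  # 0=>0, 1=>1


def omission(bit):
    return None  # no answer at all (dropped message or crashed node)


def byzantine(bit):
    return 1 - bit  # 0=>1, 1=>0


def majority(responses):
    """Most common answer among those that arrived, or None if none did."""
    votes = [r for r in responses if r is not None]
    if not votes:
        return None
    return max(set(votes), key=votes.count)
```

For example, asking `[correct, correct, byzantine, omission]` about the bit 1 still yields 1, because the two correct answers outvote the single false one.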
Distributed systems
Adversarial (Byzantine) vs non-adversarial
value replication vs state machine
replication (e.g. RBR vs SBR in MySQL)
Don't trust, verify (back in the day they
used to say "Nullius in verba")
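To make the value replication vs state machine replication distinction concrete, here is a small sketch assuming a toy key-value state and a single "add" operation (all names are hypothetical). Value replication ships the resulting values to the replica, much like row-based replication; state machine replication ships the operations, and every replica replays them deterministically, much like statement-based replication.

```python
def apply(state, op):
    """Deterministic transition; op is ("add", key, amount)."""
    _, key, amount = op
    new_state = dict(state)
    new_state[key] = new_state.get(key, 0) + amount
    return new_state


def replicate_values(ops):
    """Value replication: the primary executes, the replica copies results."""
    primary, replica = {}, {}
    for op in ops:
        primary = apply(primary, op)
        key = op[1]
        replica[key] = primary[key]  # ship the new value (the "row image")
    return primary, replica


def replicate_operations(ops):
    """State machine replication: every replica replays the same log."""
    primary, replica = {}, {}
    for op in ops:
        primary = apply(primary, op)
        replica = apply(replica, op)  # ship the operation itself
    return primary, replica
```

Both approaches converge to the same state here; state machine replication relies on `apply` being deterministic, which is why non-deterministic statements are a known hazard for statement-based replication.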
The holy trinity
Safety: only good things happen (i.e. bad
things don't happen)
Liveness: a good thing will eventually
happen
Fault-tolerance: can function with
components down
Consensus components (1/2)
Integrity: all correct processes decide at
most one value v, and v is the right
value (safety property).
Agreement: all correct processes must
agree on the same value (safety property).
Consensus components (2/2)
Termination: all correct processes decide
some value (liveness property).
Validity: if all correct processes decide
v, then v must have been proposed by some
correct process (non-triviality property).
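As a rough illustration, the properties can be checked on the outcome of a single consensus round. The sketch assumes two dictionaries mapping each correct process to the value it proposed and to the value it decided (hypothetical names and layout). Storing one decision per process already enforces the "at most one value" part of integrity, so only agreement, validity and termination are tested explicitly.

```python
def check_consensus(proposals, decisions):
    """proposals / decisions: dicts of correct process id -> value.

    A process that has not decided is simply absent from `decisions`.
    """
    decided = set(decisions.values())
    return {
        # all correct processes that decided chose the same value
        "agreement": len(decided) <= 1,
        # every decided value was proposed by some correct process
        "validity": decided <= set(proposals.values()),
        # every correct process has decided
        "termination": set(decisions) == set(proposals),
    }
```

A round in which p1 decides 1, p2 decides 0 and p3 never decides violates agreement and termination while still satisfying validity.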
FLP impossibility:
fault-tolerant
agreement is
impossible in an
asynchronous system
Otherwise put:
deterministic distributed
consensus cannot be guaranteed
if even one process may fail
This means that all
distributed systems
have trade-offs
(Performance
notwithstanding)
Fun Facts
FLP just stands for Fischer, Lynch (the
same Lynch as in Gilbert/Lynch!), Paterson
The term Byzantine (with the meaning
adversarial) was introduced in the 1982 paper
"The Byzantine Generals Problem" by Leslie
Lamport, Robert Shostak and Marshall Pease
(the underlying work dates back to the late '70s)
Lamport was a roommate of Whitfield Diffie
(Diffie-Hellman anyone?) in the '70s
Conflict resolution
In distributed databases: through a nonce
or the timestamp of the operation
In blockchains - consensus protocol:
 PoW: through probabilistic consensus (Nakamoto
consensus) - most work wins
 PoS: crypto-economic incentives and disincentives
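Two of the rules above fit in a few lines each. This is a sketch under simplifying assumptions (hypothetical function names, versions represented as (timestamp, value) pairs, forks represented as lists of per-block work), not how any particular database or client implements them.

```python
def last_write_wins(versions):
    """Distributed-database style: the version with the highest timestamp wins.

    versions: list of (timestamp, value) pairs.
    """
    return max(versions, key=lambda v: v[0])[1]


def most_work_wins(forks):
    """Nakamoto-style fork choice: pick the fork with most cumulative work.

    forks: list of chains, each a list of per-block work amounts.
    """
    return max(forks, key=sum)
```

Note that `most_work_wins` compares total work rather than block count: a fork of two blocks with work 2 each beats a fork of three blocks with work 1 each.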
CAP vs FLP?
CAP theorem represents a trade-off between
liveness and safety
Availability ~ Liveness
Consistency ~ Safety
CAP is a particular case of FLP
Let's complicate things even more
Public vs Private blockchain (e.g. Bitcoin
vs Hyperledger)
Permissionless vs Permissioned (e.g.
Ethereum vs Corda)
Blockchain Trilemma
Scalability
Decentralization
Security
Okay, so when do I
need a blockchain?
Almost never.
That is, if you don't need a database that
Is decentralized
Is immutable
Is tamper proof
Works with untrusted participants (i.e.
suitable for value transfer networks)
BigData in BlockChains
Data in Chains
Evolution of value
Up to the 19th century -> land
Afterwards, until now -> capital
Future -> data (some say)
Not only data, also
metadata ( =
governance +
consensus )
Cambridge Analytica?
BigData in BlockChains?
Blockchains provide a
fertile ground for
data applications and
data storage
(however, beware of
the pseudonymity in
public blockchains)
For instance
Numer.ai (Fan and Vercauteren homomorphic
encryption scheme, because ML only cares
about data shape)
IPFS: distributed storage
FileCoin: distributed storage with Proof
of Replication and Proof of Storage
For instance
BigchainDB (big data distributed database
with blockchain characteristics)
OrbitDB (serverless, distributed, peer-
to-peer database)
For instance
The Graph (GraphQL for Web3)
Ocean Protocol (data economy, securely
exposing data)
SingularityNET (Decentralized Artificial
Intelligence Marketplace - AI services
and algorithms)
Yes, blockchain
allows users to store
and control their own
data
And they can give
interested (trusted)
parties view rights
(using viewing keys)
... and it's even GDPR
compliant (users
themselves are the
data controllers)
It's actually already working
Congrats, you made it so far!

Editor's Notes

  • #4: 5MB drive in 1956
  • #5: IBM System/360 Shipped Jun 1965 Weight 1700 lb (770 kg) Memory 8-64 kB CPU Bus 1.3 MB/s Mem Bus 0.7 MB/s 10-30 000 Instructions per second
  • #7: More than one processor -> multi thread/mutex/race conditions