20. The Most Important Technical Foundation: Asymmetric Encryption
Whitfield Diffie (born 1944)
Martin Hellman (born 1945)
2015 Turing Award
The citation for the award was: "For fundamental contributions to modern cryptography. Diffie and Hellman's groundbreaking 1976 paper, 'New Directions in Cryptography,' introduced the ideas of public-key cryptography and digital signatures, which are the foundation for most regularly-used security protocols on the internet today."
https://en.wikipedia.org/wiki/Whitfield_Diffie
https://en.wikipedia.org/wiki/Martin_Hellman
22. "Encryption" and "Digital Signatures": Two Principles
1. Any message encrypted with Bob’s public key
can only be decrypted with Bob’s private key.
2. Anyone with access to Alice’s public key can verify
that a message (signature) could only have been
created by Alice (with access to her private key).
[Figure: Alice sends a message to Bob]
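The two principles can be checked with a toy example. The sketch below assumes textbook RSA with tiny primes, chosen only because it is short; it is not secure (no padding, no hashing, tiny keys), and Bitcoin itself uses ECDSA over secp256k1 rather than RSA. The names `make_keypair`, `encrypt`, `decrypt`, `sign`, and `verify` are illustrative.

```python
# Toy "textbook" RSA with tiny primes: illustration only, NOT secure
# (no padding, no hashing, keys far too small). Bitcoin itself uses ECDSA.
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPair:
    n: int  # modulus, part of both keys
    e: int  # public exponent: (n, e) is the public key
    d: int  # private exponent: kept secret by the owner


def make_keypair(p: int, q: int, e: int = 17) -> KeyPair:
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return KeyPair(p * q, e, d)


# Principle 1: encrypt with Bob's public key, decrypt with Bob's private key.
def encrypt(m: int, n: int, e: int) -> int:
    return pow(m, e, n)


def decrypt(c: int, key: KeyPair) -> int:
    return pow(c, key.d, key.n)


# Principle 2: sign with Alice's private key, verify with her public key.
def sign(m: int, key: KeyPair) -> int:
    return pow(m, key.d, key.n)


def verify(m: int, signature: int, n: int, e: int) -> bool:
    return pow(signature, e, n) == m


bob = make_keypair(61, 53)    # Bob publishes (bob.n, bob.e) = (3233, 17)
alice = make_keypair(89, 97)  # Alice publishes (alice.n, alice.e)

ciphertext = encrypt(65, bob.n, bob.e)    # anyone can do this
assert decrypt(ciphertext, bob) == 65     # only Bob can undo it

sig = sign(42, alice)                     # only Alice can do this
assert verify(42, sig, alice.n, alice.e)  # anyone can check it
```

Messages here must be smaller than the modulus. Real signature schemes sign a hash of the message rather than the message itself, and real keys are hundreds of digits long.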
23. How Bitcoin Transactions Work, and "Value Transfer"
The transaction as described in the paper, and a more precise explanation
Notice that none of this requires an official third party to authorize or authenticate the transactions. Alice, Bob, and Carol can generate their own public-private key pairs without help from third parties. Anyone who knows Alice's and Bob's public keys can independently verify that the chain of signatures is cryptographically valid. Digital signatures, combined with a few innovations we'll discuss later, let people engage in banking without needing a bank.
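[Figure: Owner 2's public key is hashed into a digest, which is signed ("encrypted") with Owner 1's private key to produce Owner 1's signature]
Note: in Bitcoin and blockchains, a transfer is actually called a "Send".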
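The chain of signatures in the quote above can be sketched as follows. This is a minimal illustration assuming toy RSA keys (tiny primes, insecure) in place of Bitcoin's ECDSA; `Transfer`, `transfer`, and `verify_chain` are hypothetical names. Each owner signs a digest of the previous transfer plus the next owner's public key, and anyone holding only public keys can verify the whole chain without a third party.

```python
import hashlib
from dataclasses import dataclass


# Toy textbook RSA keys (tiny primes, insecure), standing in for ECDSA.
@dataclass(frozen=True)
class Key:
    n: int
    e: int
    d: int

    @property
    def pub(self) -> tuple:
        return (self.n, self.e)


def make_key(p: int, q: int, e: int = 17) -> Key:
    return Key(p * q, e, pow(e, -1, (p - 1) * (q - 1)))


def digest(prev_hash: bytes, next_pub: tuple) -> bytes:
    # Hash of the previous transaction plus the next owner's public key.
    return hashlib.sha256(prev_hash + repr(next_pub).encode()).digest()


@dataclass(frozen=True)
class Transfer:
    prev_hash: bytes  # hash of the previous transfer of this coin
    next_pub: tuple   # public key of the new owner
    signature: int    # previous owner's signature over digest(...)

    def tx_hash(self) -> bytes:
        return hashlib.sha256(
            digest(self.prev_hash, self.next_pub) + str(self.signature).encode()
        ).digest()


def transfer(prev_hash: bytes, owner: Key, next_pub: tuple) -> Transfer:
    m = int.from_bytes(digest(prev_hash, next_pub), "big") % owner.n
    return Transfer(prev_hash, next_pub, pow(m, owner.d, owner.n))


def verify_chain(genesis_hash: bytes, first_owner: tuple, chain: list) -> bool:
    """Anyone holding only public keys can check the whole chain."""
    prev_hash, signer = genesis_hash, first_owner
    for t in chain:
        if t.prev_hash != prev_hash:  # must extend the previous transfer
            return False
        n, e = signer
        m = int.from_bytes(digest(t.prev_hash, t.next_pub), "big") % n
        if pow(t.signature, e, n) != m:  # must be signed by the current owner
            return False
        prev_hash, signer = t.tx_hash(), t.next_pub
    return True


alice, bob, carol = make_key(61, 53), make_key(89, 97), make_key(107, 109)
genesis = hashlib.sha256(b"coin issued to Alice").digest()
t1 = transfer(genesis, alice, bob.pub)       # Alice sends the coin to Bob
t2 = transfer(t1.tx_hash(), bob, carol.pub)  # Bob sends it on to Carol
assert verify_chain(genesis, alice.pub, [t1, t2])
```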
24. Bitcoin Transaction: Multiple Inputs -> Multiple Outputs (specific to Bitcoin)
A real transaction may have multiple inputs and multiple outputs.
A bitcoin transaction contains a list of inputs and outputs. Each output has a public key associated with it. For a later transaction to spend those coins, it needs an input with a matching digital signature.
https://bitcoin.org/en/how-it-works
https://arstechnica.com/tech-policy/2017/12/how-bitcoin-works/
https://www.blockchain.com
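A transaction with several inputs and outputs can be sketched like this, assuming toy RSA keys as stand-ins for Bitcoin's ECDSA and a simplified "sighash" that commits only to the new outputs (real Bitcoin signs a more elaborate serialization of the transaction). Each input points at an earlier output and must carry a signature that verifies against that output's public key; any excess of inputs over outputs is the fee. The txids and helper names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


# Toy textbook RSA (tiny primes, insecure) stands in for Bitcoin's ECDSA.
@dataclass(frozen=True)
class Key:
    n: int
    e: int
    d: int

    @property
    def pub(self) -> tuple:
        return (self.n, self.e)


def make_key(p: int, q: int, e: int = 17) -> Key:
    return Key(p * q, e, pow(e, -1, (p - 1) * (q - 1)))


@dataclass(frozen=True)
class Output:
    pub: tuple   # public key these coins are locked to
    amount: int  # in satoshis


@dataclass(frozen=True)
class Input:
    prev_txid: str  # hypothetical id of the earlier transaction
    index: int      # which of that transaction's outputs is spent
    signature: int  # must verify against that output's public key


def sighash(outputs: list) -> int:
    # Simplification: the signature commits only to the new outputs.
    data = repr([(o.pub, o.amount) for o in outputs]).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sign_input(prev_txid: str, index: int, key: Key, outputs: list) -> Input:
    return Input(prev_txid, index, pow(sighash(outputs) % key.n, key.d, key.n))


def validate(inputs: list, outputs: list, prev_outputs: dict) -> int:
    """Check each input's signature and the totals; return the implied fee."""
    total_in = 0
    for i in inputs:
        spent = prev_outputs[(i.prev_txid, i.index)]
        n, e = spent.pub
        if pow(i.signature, e, n) != sighash(outputs) % n:
            raise ValueError(f"bad signature on {i.prev_txid}:{i.index}")
        total_in += spent.amount
    total_out = sum(o.amount for o in outputs)
    if total_out > total_in:
        raise ValueError("outputs exceed inputs")
    return total_in - total_out


alice, bob, carol = make_key(61, 53), make_key(89, 97), make_key(107, 109)
# Two earlier outputs locked to Alice's key become the 2 inputs ...
prev = {("tx1", 0): Output(alice.pub, 50), ("tx2", 1): Output(alice.pub, 30)}
# ... paid to Bob and Carol as 2 outputs; 80 - 75 = 5 is left for the miner.
outs = [Output(bob.pub, 60), Output(carol.pub, 15)]
ins = [sign_input("tx1", 0, alice, outs), sign_input("tx2", 1, alice, outs)]
print(validate(ins, outs, prev))  # 5
```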
25. Address & Wallet
Address: derived from the public key, as a hash fingerprint of the public key.
https://arstechnica.com/tech-policy/2017/12/how-bitcoin-works/
https://blog.csdn.net/waixingrenabc/article/details/82190566 (How is a bitcoin address computed?)
[Figure: Alice's wallet holds addresses A, B, and C; Bob's wallet holds addresses D and F. Each address has its own private/public key pair, and each can receive several transaction outputs (three to A, one to B, two to C, two to D, two to F).]
https://www.blockchain.com/btc/address/3HqH1qGAqNWPpbrvyGjnRxNEjcUKD4e6ea
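The address derivation can be sketched with Base58Check, the encoding Bitcoin uses for legacy addresses (a 4-byte double-SHA-256 checksum plus the Base58 alphabet). One deliberate simplification: a real P2PKH address encodes RIPEMD-160(SHA-256(public key)), while this sketch truncates a SHA-256 digest to 20 bytes, so its addresses will not match real ones. The public keys are placeholder byte strings, and `toy_address` and `wallet_balance` are hypothetical helpers.

```python
import hashlib

# Bitcoin's Base58 alphabet: no 0, O, I or l, to avoid look-alike characters.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check(payload: bytes) -> str:
    """Append a 4-byte double-SHA-256 checksum, then base-58 encode."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    num = int.from_bytes(data, "big")
    chars = []
    while num > 0:
        num, rem = divmod(num, 58)
        chars.append(B58_ALPHABET[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))  # each 0x00 byte -> "1"


def toy_address(pubkey: bytes) -> str:
    # SIMPLIFICATION: a real P2PKH address uses RIPEMD-160(SHA-256(pubkey)).
    # A SHA-256 digest truncated to 20 bytes plays the fingerprint role here,
    # because RIPEMD-160 is not available in every hashlib build.
    fingerprint = hashlib.sha256(pubkey).digest()[:20]
    return base58check(b"\x00" + fingerprint)  # version byte 0x00 (P2PKH)


def wallet_balance(wallet: set, outputs: list) -> int:
    """A wallet is just a set of addresses; sum the outputs paid to them."""
    return sum(amount for address, amount in outputs if address in wallet)


# Hypothetical public keys (placeholder bytes, not real EC points).
alice_wallet = {toy_address(b"public key A"), toy_address(b"public key B")}
outputs = [(toy_address(b"public key A"), 5),
           (toy_address(b"public key B"), 3),
           (toy_address(b"public key D"), 7)]
print(wallet_balance(alice_wallet, outputs))  # 8
```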
26. Transactions, UTXOs, Balances
Summary:
- The bitcoin blockchain does not hold account balances.
- Bitcoin wallets hold keys (addresses) to UTXOs.
- If included in a transaction, an entire UTXO is spent (in some cases partially received back as "change" in the form of a brand-new UTXO).
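Spending rules:
1. Every transaction input must be an output of some earlier transaction.
2. An output of an earlier transaction can be spent only once (the double-spending problem).
3. Each transaction must spend the full bitcoin amount of the earlier outputs it consumes.
4. You can set aside part of the bitcoins left over from the transfer to pay the miner as a transaction fee.
5. Whatever is still left over is paid back to some address in your own wallet (change).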
https://blockchain.info/tx/ae51116179e79bd6ecaf72fcdc743375a49467bfc219b114fb81d630ce31a00b
UTXO = Unspent Transaction Output
UTXOs, together with the checks performed on UTXOs whenever a transaction is made, also solve the double-spending problem.
https://www.jianshu.com/p/aebf0e4174ad
https://medium.com/cybermiles/diving-into-ethereums-world-state-c893102030ed
https://hackernoon.com/getting-deep-into-ethereum-how-data-is-stored-in-ethereum-e3f669d96033
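The spending rules above can be modeled with a small UTXO set. This is a sketch under simplifying assumptions: addresses and txids are plain strings, signatures are not checked, and coinbase (mining reward) transactions are not modeled. It shows how deleting each consumed output enforces single spending, how change returns as a new UTXO, and how the fee is whatever the outputs leave unclaimed.

```python
class UTXOSet:
    """Minimal UTXO ledger. Keys are (txid, output index); values are
    (address, amount). Txids and addresses are hypothetical strings, and
    signature checks are omitted to isolate the spending rules."""

    def __init__(self, utxos=None):
        self.utxos = dict(utxos or {})

    def apply(self, txid, inputs, outputs):
        """inputs: [(txid, index)]; outputs: [(address, amount)]. Returns the fee."""
        # Rules 1 and 2: every input must be an existing, still-unspent
        # output, and no output may be spent twice (not even in one tx).
        if len(set(inputs)) != len(inputs):
            raise ValueError("output spent twice within one transaction")
        for ref in inputs:
            if ref not in self.utxos:
                raise ValueError(f"{ref} is unknown or already spent")
        total_in = sum(self.utxos[ref][1] for ref in inputs)
        total_out = sum(amount for _, amount in outputs)
        if total_out > total_in:
            raise ValueError("outputs exceed inputs")
        # Rule 3: each consumed UTXO is removed in full ...
        for ref in inputs:
            del self.utxos[ref]
        # ... and the outputs (including change to yourself) become new UTXOs.
        for index, out in enumerate(outputs):
            self.utxos[(txid, index)] = out
        # Rule 4: whatever no output claims is the miner's fee.
        return total_in - total_out

    def balance(self, wallet):
        return sum(amt for addr, amt in self.utxos.values() if addr in wallet)


ledger = UTXOSet({("tx0", 0): ("A", 100)})  # Alice's address A holds 100
fee = ledger.apply("tx1", [("tx0", 0)], [("D", 70), ("B", 29)])  # 29 is change
print(fee, ledger.balance({"A", "B", "C"}))  # 1 29
```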
27. Bitcoin’s Blockchain
Shared public ledger: every node stores the same, complete set of transaction records.
[Figure: each block includes a list of transactions]
https://www.blockchain.com/explorer
The chained structure makes attacks harder.
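Why the chained structure makes attacks harder can be sketched with hash-linked blocks. This assumes a drastic simplification: each block hashes a JSON encoding of its parent's hash and its transactions, whereas real Bitcoin blocks hash an 80-byte header (containing a Merkle root of the transactions) with double SHA-256 and also require proof of work. Altering an earlier block changes its hash and breaks the link stored in the next block.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    prev_hash: str       # hash of the previous block: the "chain" link
    transactions: tuple  # simplified: plain strings instead of real transactions

    def digest(self) -> str:
        body = json.dumps({"prev": self.prev_hash, "txs": list(self.transactions)})
        return hashlib.sha256(body.encode()).hexdigest()


def append_block(chain: list, transactions: list) -> None:
    prev = chain[-1].digest() if chain else "0" * 64  # genesis has no parent
    chain.append(Block(prev, tuple(transactions)))


def is_valid(chain: list) -> bool:
    """Every block must point at the current hash of its predecessor."""
    return all(chain[i].prev_hash == chain[i - 1].digest()
               for i in range(1, len(chain)))


chain: list = []
append_block(chain, ["coinbase -> Alice 50"])
append_block(chain, ["Alice -> Bob 20"])
append_block(chain, ["Bob -> Carol 5"])
assert is_valid(chain)

# Rewriting history in block 1 breaks the link stored in block 2.
chain[1] = Block(chain[1].prev_hash, ("Alice -> Mallory 20",))
assert not is_valid(chain)
```

To hide the change, an attacker would have to rebuild every following block (and, in Bitcoin, redo the proof of work for each of them).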
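40. Ethereum's Three Levels of Capability
Level 1: issuing and transferring ether, just like bitcoin (Blockchain Transaction)
You can mine (earning ether as a reward) and make transfers. Every transfer goes from address A to address B, which is logically simpler than bitcoin's multi-input, multi-output model. The underlying blockchain builds a shared ledger (a shared world ledger, world wide ledger). This underlying blockchain data layer also establishes the single data (message) channel through which the outside world interacts with the smart-contract programs of the next level.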
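Level 1's account model can be contrasted with Bitcoin's UTXOs in a few lines. A minimal sketch, assuming an in-memory dict from (hypothetical) address to account and ignoring gas: balances are stored directly on accounts, and the per-account nonce keeps a signed transfer from being replayed.

```python
from dataclasses import dataclass


@dataclass
class Account:
    balance: int = 0  # in wei
    nonce: int = 0    # count of transactions this account has sent


def transfer(state: dict, sender: str, to: str, value: int, nonce: int) -> None:
    """Move value from one address to another (gas fees ignored)."""
    acct = state[sender]
    if nonce != acct.nonce:
        raise ValueError("wrong nonce: replayed or out-of-order transaction")
    if value < 0 or acct.balance < value:
        raise ValueError("invalid amount or insufficient balance")
    acct.balance -= value
    acct.nonce += 1
    state.setdefault(to, Account()).balance += value


state = {"0xA": Account(balance=100)}  # hypothetical short addresses
transfer(state, "0xA", "0xB", 30, nonce=0)
print(state["0xA"], state["0xB"])
# Account(balance=70, nonce=1) Account(balance=30, nonce=0)
```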
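Level 2: Smart Contracts
This is Ethereum's core capability. The goal is to build a decentralized computer shared by the whole world (a shared world computing platform, world computer). Anyone who needs a program run can write the code themselves and submit it to the chain; when the program is called, the node producing the block executes it and every other full node re-executes it to verify the result. A smart contract can be understood as the kind of program bot that engineers already know (for example, a DingTalk bot), so it also has an account address of its own, just as a real person does (here, a bot). You send it a message, which triggers it to run a piece of code. The program also maintains some variable data, which is stored in sync on every node across the network.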
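The "bot" analogy can be sketched as a plain Python class. This is a hypothetical model, not Solidity or EVM code: `SimpleStorage`, `on_message`, and the short addresses are invented for illustration. It captures the three properties described above: the contract has its own address, it runs only when a message arrives, and it keeps state variables.

```python
class SimpleStorage:
    """Hypothetical contract modeled in Python: it has its own address and
    persistent storage, and it only acts when a message arrives."""

    def __init__(self, address: str, deployer: str):
        self.address = address  # contract accounts get an address at deployment
        self.owner = deployer
        self.storage = {}       # on Ethereum, replicated on every full node

    def on_message(self, sender: str, method: str, *args):
        # Like a chat bot: the message names a command and its arguments.
        if method == "set":
            if sender != self.owner:
                raise PermissionError("only the deployer may call set")
            key, value = args
            self.storage[key] = value
            return None
        if method == "get":
            (key,) = args
            return self.storage.get(key, 0)
        raise ValueError(f"unknown method: {method!r}")


bot = SimpleStorage(address="0xC0", deployer="0xA")
bot.on_message("0xA", "set", "counter", 7)
print(bot.on_message("0xB", "get", "counter"))  # 7: anyone may read
```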
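Level 3: TOKEN (a transferable digital proof of rights)
A token is really just a standardized application of smart contracts: it defines standard data variables and an interface for the kind of bot described above, much as one defines a standard interface for a class. The interface specifies operations such as transfers, spending allowances, and issuance. Anyone can inherit and implement such a standard interface (for example, the ERC20 or ERC777 standards) to issue their own "TOKEN". In practice, you deploy smart-contract code that implements the standard interface to the chain, initialize some TOKEN balance for yourself or for some address, and from then on operate on it by calling the contract's methods, for example to transfer some TOKEN to other people.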
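A standard token interface can be sketched in Python. The method names mirror the ERC-20 functions (`totalSupply`, `balanceOf`, `transfer`, `approve`, `allowance`, `transferFrom`) in snake_case, but this is a hypothetical model: real ERC-20 contracts are EVM code that take the caller from `msg.sender` and emit events, both omitted here, so the caller is passed explicitly.

```python
class ToyToken:
    """Hypothetical Python model of the ERC-20 interface (no events)."""

    def __init__(self, issuer: str, supply: int):
        self.total_supply = supply        # ERC-20: totalSupply()
        self.balances = {issuer: supply}  # initial allocation to the issuer
        self.allowances = {}              # (owner, spender) -> remaining amount

    def balance_of(self, owner: str) -> int:  # ERC-20: balanceOf(owner)
        return self.balances.get(owner, 0)

    def transfer(self, sender: str, to: str, amount: int) -> bool:
        if amount < 0 or self.balance_of(sender) < amount:
            raise ValueError("invalid amount or insufficient balance")
        self.balances[sender] = self.balance_of(sender) - amount
        self.balances[to] = self.balance_of(to) + amount
        return True

    def approve(self, owner: str, spender: str, amount: int) -> bool:
        self.allowances[(owner, spender)] = amount  # overwrites the old value
        return True

    def allowance(self, owner: str, spender: str) -> int:
        return self.allowances.get((owner, spender), 0)

    def transfer_from(self, spender: str, owner: str, to: str, amount: int) -> bool:
        if self.allowance(owner, spender) < amount:
            raise ValueError("allowance exceeded")
        self.transfer(owner, to, amount)  # raises if the owner lacks funds
        self.allowances[(owner, spender)] -= amount
        return True


token = ToyToken(issuer="0xA", supply=1_000)
token.transfer("0xA", "0xB", 100)             # 0xA sends 100 tokens to 0xB
token.approve("0xB", "0xC", 40)               # 0xB lets 0xC spend up to 40
token.transfer_from("0xC", "0xB", "0xD", 25)  # 0xC moves 25 of 0xB's tokens
print(token.balance_of("0xA"), token.balance_of("0xB"), token.balance_of("0xD"))
# 900 75 25
```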
41. Data Structures
https://pegasys.tech/ethereum-explained-merkle-trees-world-state-transactions-and-more/
https://kauri.io/ethereum-explained-merkle-trees-world-state-transa/1f4196c3db7f41e5845f063dc1581a4e/a
https://hackernoon.com/getting-deep-into-ethereum-how-data-is-stored-in-ethereum-e3f669d96033
https://medium.com/cybermiles/diving-into-ethereums-world-state-c893102030ed
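World State (ephemeral & constantly updating; its root hash is the block's stateRoot)
address1 -> balance1, storageRoot1, codeHash1, nonce
address2 -> balance2, storageRoot2, codeHash2, nonce
address3 -> balance3, storageRoot3, codeHash3, nonce
…
addressN -> balanceN, storageRootN, codeHashN, nonce
Storage (ephemeral & constantly updating)
storageRoot1 -> address1 smart contract data
storageRoot2 -> address2 smart contract data
storageRoot3 -> address3 smart contract data
…
storageRootN -> addressN smart contract data
Smart contract data: ephemeral; keeps changing as contract programs execute; each account address has its own storage structure.
Global state data: ephemeral; keeps changing as transactions are added to the chain; internally a key-value structure with one state entry per account address.
Transaction receipt data: permanent; one record per transaction; one structure per block.
Transaction data: permanent; one record per transaction; one structure per block.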
1. The world state trie contains the mapping between addresses and account states. The hash of the root node of the world state trie is included in a block (in the stateRoot field) to represent the current state when that block was created. We have only one world state trie.
2. The account storage trie contains the data associated with a smart contract. The hash of the root node of the account storage trie is included in the account state (in the storageRoot field). We have one account storage trie for each account.
3. The transaction trie contains all the transactions included in a block. The hash of the root node of the transaction trie is included in the block header (in the transactionsRoot field). We have one transaction trie per block.
4. The transaction receipt trie contains all the transaction receipts for the transactions included in a block. The hash of the root node of the transaction receipt trie is also included in the block header (in the receiptsRoot field). We have one transaction receipt trie per block.
Blockchain: permanent.
The stateRoot in a block is only a snapshot of the state at that moment.
The transactionsRoot and receiptsRoot in a block represent immutable transactions and receipts.
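How a single root hash can commit to all accounts and their storage can be sketched with a plain binary Merkle tree. This is a stand-in, not Ethereum's actual structure: Ethereum uses a Merkle Patricia Trie with Keccak-256 and RLP encoding, while this sketch hashes sorted key-value entries with SHA-256. The addresses and account values are placeholders. Changing any balance or storage slot changes the root, which is why each block's stateRoot acts as a snapshot.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> bytes:
    """Binary Merkle root; an odd node out is paired with itself."""
    level = [sha256(leaf) for leaf in leaves] or [sha256(b"")]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def root_of(mapping: dict) -> str:
    """Commit to a key-value mapping by hashing its sorted entries."""
    leaves = [f"{k}={v}".encode() for k, v in sorted(mapping.items())]
    return merkle_root(leaves).hex()


# Account storage: one mapping per contract account (hypothetical contents).
storage_c0 = {"counter": 7}
# World state: address -> (nonce, balance, storageRoot, codeHash); placeholders.
world = {
    "0xA": (1, 70, root_of({}), "empty"),
    "0xB": (0, 30, root_of({}), "empty"),
    "0xC0": (1, 0, root_of(storage_c0), "code-hash-placeholder"),
}
state_root_1 = root_of(world)  # snapshot committed in block 1

world["0xB"] = (0, 35, root_of({}), "empty")  # a later transaction changes 0xB
state_root_2 = root_of(world)  # block 2 commits to the new state
print(state_root_1 != state_root_2)  # True
```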
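43. Ether (ETH)
Ethereum's ether is not fundamentally different from bitcoin,
but Ethereum implements a distributed, shared computer system,
and ether is used to pay the machines that persist your transactions into blocks and that execute your code (smart contracts).
This is why it is described as a "fuel" coin.
When you initiate a transaction, the wallet client prompts you to pay the fee.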
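The fee the wallet prompts for can be worked out in a few lines. A plain ETH transfer consumes 21,000 gas; the gas prices below are assumptions chosen for illustration. The sketch shows both legacy pricing (gas used times gas price) and EIP-1559 pricing (gas used times base fee plus priority tip), ignoring the maxFeePerGas cap and fractional gwei prices.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9
WEI_PER_ETH = 10**18


def legacy_fee_wei(gas_used: int, gas_price_gwei: int) -> int:
    """Legacy pricing: fee = gas used x gas price."""
    return gas_used * gas_price_gwei * WEI_PER_GWEI


def eip1559_fee_wei(gas_used: int, base_fee_gwei: int, tip_gwei: int) -> int:
    """EIP-1559 pricing: each unit of gas costs base fee + priority tip."""
    return gas_used * (base_fee_gwei + tip_gwei) * WEI_PER_GWEI


# A plain ETH transfer uses 21,000 gas; the gas prices are assumptions.
fee = legacy_fee_wei(21_000, 20)
print(Decimal(fee) / WEI_PER_ETH)  # 0.00042
print(Decimal(eip1559_fee_wei(21_000, 15, 2)) / WEI_PER_ETH)  # 0.000357
```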
44. Account / Address / Wallet
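Two kinds of accounts/addresses:
1. Accounts/addresses that only hold ether: Externally Owned Accounts (EOA), much like bitcoin addresses, also called "external accounts". An EOA is the account that you and I would have, that we can use to send Ether to one another and deploy smart contracts.
2. Contract accounts/addresses: Contract Accounts (CA), that is, the addresses of the "bots". A contract account is the account that is created when a smart contract is deployed. Every smart contract has its own Ethereum account address. It is an account that stores code (and may hold ETH or other tokens). Once a smart contract is uploaded, its code stays on the Ethereum system, waiting to be invoked.
Only EOAs have a corresponding private key; whoever holds that private key can use it to sign transactions for the account. Contract accounts have no private key and act only when a transaction or message call triggers their code.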
[Figure: An Externally Owned Account (managed in a wallet such as MetaMask) vs. a Contract Account]
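46. Ethereum "Transactions"
Transactions:
On the bitcoin blockchain, a transaction mainly means a bitcoin transfer, whereas on Ethereum transactions are more varied and mainly include the following kinds:
1. ETH transfers (similar to bitcoin): Transactions that transfer value between two EOAs (e.g., change the sender and receiver account balances).
2. Smart contract deployment: Transactions that deploy a contract (and therefore create an account, the contract account).
3. User interaction with a smart contract: Transactions that send a message call to a contract (e.g., set a value in the smart contract by sending a message call that executes a setter method).
4. Transfers of tokens issued by a smart contract: Transactions that transfer tokens between two EOAs (technically a special case of kind 3, since the transfer is a message call to the token contract).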
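The abstraction behind Ethereum transactions and smart contracts: every kind of business activity on the Ethereum blockchain is abstracted as completing a transaction on the blockchain.
1. The blockchain database stores records in one uniform data structure, and each record is a transaction.
2. Every action is abstracted as a transaction: a genuine transfer, creating a contract, calling (triggering) a method in a contract, and so on all become transactions in the underlying database.
3. This is like WeChat abstracting everything in its chat channel as messages; even a WeChat money transfer is a message.
4. From another angle, the blockchain formed by transactions can be seen as a continuously appended log.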
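The idea that every action is "just a transaction" can be sketched by classifying transactions from the fields they all share. A hypothetical model: a missing `to` means contract creation, a `to` that is a known contract means a message call, and call data starting with the ERC-20 `transfer(address,uint256)` selector `0xa9059cbb` marks a token transfer. The contract set and addresses are invented, and the chain itself stores no such labels; they are inferred here.

```python
from dataclasses import dataclass
from typing import Optional

# First 4 bytes of keccak256("transfer(address,uint256)"): the ERC-20
# transfer function selector.
ERC20_TRANSFER = bytes.fromhex("a9059cbb")


@dataclass(frozen=True)
class Tx:
    sender: str
    to: Optional[str]  # None means "create a contract"
    value: int = 0     # wei
    data: bytes = b""  # contract init code, or call data for a contract


def classify(tx: Tx, contracts: set) -> str:
    """Infer the kind of action from the fields every transaction shares."""
    if tx.to is None:
        return "deploy contract"
    if tx.to in contracts:
        if tx.data[:4] == ERC20_TRANSFER:
            return "token transfer"
        return "contract call"
    return "ETH transfer"


log = []  # the chain viewed as an append-only log of transactions
contracts = {"0xC0", "0xT0"}  # hypothetical contract addresses
for tx in [
    Tx("0xA", "0xB", value=10**18),
    Tx("0xA", None, data=b"<init code>"),
    Tx("0xA", "0xC0", data=b"set counter 7"),
    Tx("0xA", "0xT0", data=ERC20_TRANSFER + b"<args>"),
]:
    log.append(tx)
    print(classify(tx, contracts))
```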