4. WHAT
• “Kafka is a messaging system that was
originally developed at LinkedIn to
serve as the foundation for LinkedIn's
activity stream and operational data
processing pipeline.”
Friday, July 5, 2013
5. Use cases
• Operational monitoring: real-time, heads-up
monitoring
• Reporting and batch processing: load data into
a data warehouse or Hadoop system
42. Drawbacks
• All disk reads and writes will go through this
unified cache. This feature cannot easily be
turned off without using direct I/O, so even if
a process maintains an in-process cache of the
data, this data will likely be duplicated in OS
pagecache, effectively storing everything
twice.
45. If we use memory (JVM)
• The memory overhead of objects is very
high, often doubling the size of the data
stored (or worse).
• Java garbage collection becomes
increasingly sketchy and expensive as
the in-heap data increases.
46. cache size
• Relying on pagecache at least doubles the
available cache by giving automatic access to
all free memory, and likely doubles it again by
storing a compact byte structure rather
than individual objects. Doing so will
result in a cache of up to 28-30GB on a
32GB machine.
48. Conclusion
• Using the filesystem and relying on
pagecache is superior to maintaining an
in-memory cache or other structure
49. Go Extreme!
• Write to the filesystem DIRECTLY!
• (In effect this just means that it is transferred
into the kernel's pagecache where the OS
can flush it later.)
50. Furthermore
• You can configure the flush policy: every N
messages or every M seconds. This puts a bound
on the amount of data "at risk" in the event
of a hard crash.
• Varnish uses a pagecache-centric design as
well.
55. BTree for Disk
• Disk seeks come at 10 ms a pop
• Each disk can do only one seek at a time
• Parallelism is limited
• The observed performance of tree
structures is often super-linear as data
grows with a fixed cache
56. Lock
• Page or row locking is needed to avoid
locking the entire tree
57. Two Facts
• No way to take advantage of improvements in
drive density, because of the heavy reliance
on disk seeks
• Need small (< 100GB) high RPM SAS
drives to maintain a sane ratio of data
to seek capacity
59. Feature
• One queue is one log file
• Operations are O(1)
• Reads do not block writes or each other
• Performance is decoupled from data size
• Messages are retained after consumption
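The properties above can be illustrated with a toy append-only log. This is a minimal in-memory sketch, not Kafka's implementation: `QueueLog`, `append` and `read` are hypothetical names, and a real log would live in a file. The point it shows is that consuming does not remove anything, so each consumer only needs to remember its own offset.

```python
class QueueLog:
    """Toy append-only log (hypothetical): one queue maps to one log."""

    def __init__(self):
        self._messages = []

    def append(self, payload: bytes) -> int:
        """Append at the tail (amortized O(1)); return the message's position."""
        self._messages.append(payload)
        return len(self._messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> list:
        """Return messages starting at `offset` without removing them."""
        return self._messages[offset:offset + max_messages]


log = QueueLog()
for text in (b"m0", b"m1", b"m2"):
    log.append(text)

# Two consumers track their own offsets; neither affects the other or the log.
consumer_a_offset = 0
consumer_b_offset = 2
batch_a = log.read(consumer_a_offset)
batch_b = log.read(consumer_b_offset)
```

Because a read never mutates the log, a slow consumer or a replay from an old offset costs the same as any other read.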
61.
1. The operating system reads data from the disk
into pagecache in kernel space
2. The application reads the data from kernel
space into a user-space buffer
3. The application writes the data back into
kernel space into a socket buffer
4. The operating system copies the data from the
socket buffer to the NIC buffer where it is sent
over the network
62. zerocopy
• Data is copied into pagecache exactly
once and reused on each consumption
instead of being stored in memory and
copied out to kernel space every time it
is read
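To contrast the two data paths, here is a minimal Python sketch. Kafka itself runs on the JVM, so this only illustrates the idea: `send_with_copies` and `send_zero_copy` are hypothetical helper names, while `socket.sendfile()` is the standard-library call that uses `os.sendfile()` where the platform supports it and falls back to plain `send()` otherwise.

```python
import socket


def send_with_copies(path: str, sock: socket.socket, chunk_size: int = 64 * 1024) -> int:
    """Traditional path: pagecache -> user-space buffer -> socket buffer."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)   # copy from kernel space into user space
            if not chunk:
                break
            sock.sendall(chunk)          # copy back into kernel space
            sent += len(chunk)
    return sent


def send_zero_copy(path: str, sock: socket.socket) -> int:
    """Zero-copy path: the kernel moves bytes from pagecache to the socket."""
    with open(path, "rb") as f:
        return sock.sendfile(f)          # returns the number of bytes sent
```

Both functions return the number of bytes sent, so they can be swapped for a quick comparison on a blocking stream socket.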
67. Key point
• End-to-end: compressed by producers and
decompressed by consumers
• Batch: compression is applied to a whole
‘message set’, not to individual messages
• Kafka supports the GZIP and Snappy
compression codecs
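A small sketch of batch compression, using only GZIP from the Python standard library (Snappy would need a third-party package). The 4-byte length prefix and the function names are assumptions made for illustration and are not Kafka's wire format.

```python
import gzip
import struct


def pack_message_set(messages):
    """Concatenate messages, each prefixed by a 4-byte big-endian length (assumed framing)."""
    return b"".join(struct.pack(">I", len(m)) + m for m in messages)


def unpack_message_set(blob):
    """Inverse of pack_message_set."""
    messages, pos = [], 0
    while pos < len(blob):
        (length,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        messages.append(blob[pos:pos + length])
        pos += length
    return messages


def producer_compress(messages):
    """Producer side: compress the whole message set in one go."""
    return gzip.compress(pack_message_set(messages))


def consumer_decompress(blob):
    """Consumer side: decompress and split back into messages."""
    return unpack_message_set(gzip.decompress(blob))


messages = [b'{"user": %d, "action": "page_view"}' % i for i in range(100)]
batch = producer_compress(messages)
individually = sum(len(gzip.compress(m)) for m in messages)
```

Compressing the set as a unit lets the codec exploit repetition across messages, which is why `len(batch)` comes out smaller than `individually` for similar messages like these.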
77. Msg Format
• N byte message:
  • If magic byte is 0
    1. 1 byte "magic" identifier to allow format changes
    2. 4 byte CRC32 of the payload
    3. N - 5 byte payload
  • If magic byte is 1
    1. 1 byte "magic" identifier to allow format changes
    2. 1 byte "attributes" identifier to allow annotations on the message independent of the
       version (e.g. compression enabled, type of codec used)
    3. 4 byte CRC32 of the payload
    4. N - 6 byte payload
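The two layouts can be sketched with Python's `struct` and `zlib` modules. This is an illustration under assumptions: big-endian byte order is assumed, the meaning of the individual attribute bits is not modeled, and `encode_message` / `decode_message` are hypothetical names.

```python
import struct
import zlib


def encode_message(payload: bytes, magic: int = 1, attributes: int = 0) -> bytes:
    """Build an N-byte message in the magic-0 or magic-1 layout."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    if magic == 0:
        return struct.pack(">BI", 0, crc) + payload               # 1 + 4 + payload
    if magic == 1:
        return struct.pack(">BBI", 1, attributes, crc) + payload  # 1 + 1 + 4 + payload
    raise ValueError(f"unknown magic byte: {magic}")


def decode_message(message: bytes):
    """Return (magic, attributes, payload); raise ValueError on a CRC mismatch."""
    magic = message[0]
    if magic == 0:
        (crc,) = struct.unpack_from(">I", message, 1)
        attributes, payload = 0, message[5:]
    elif magic == 1:
        attributes, crc = struct.unpack_from(">BI", message, 1)
        payload = message[6:]
    else:
        raise ValueError(f"unknown magic byte: {magic}")
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        raise ValueError("CRC32 mismatch: payload is corrupt")
    return magic, attributes, payload
```

Reading the magic byte first is what lets a decoder handle both layouts, which is the "allow format changes" purpose named on the slide.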
78. Log format on-disk
• On-disk format of a message
  • message length : 4 bytes (value: 1+4+n)
  • ‘magic’ value : 1 byte
  • crc : 4 bytes
  • payload : n bytes
• A per-partition message offset, coupled with the
partition id and node id, uniquely identifies a
message
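As a sketch of how such length-prefixed entries could be written and scanned, the following assumes big-endian integers and the magic-0 layout; `encode_log_entry` and `iter_log_entries` are hypothetical names. Each entry's byte position in the segment is yielded alongside its payload.

```python
import struct
import zlib


def encode_log_entry(payload: bytes) -> bytes:
    """4-byte length (1 + 4 + n), 1-byte magic, 4-byte CRC32, n-byte payload."""
    body = struct.pack(">BI", 0, zlib.crc32(payload) & 0xFFFFFFFF) + payload
    return struct.pack(">I", len(body)) + body


def iter_log_entries(segment: bytes):
    """Yield (byte_position, payload) for each complete entry in a segment."""
    pos = 0
    while pos + 4 <= len(segment):
        (length,) = struct.unpack_from(">I", segment, pos)
        start, end = pos + 4, pos + 4 + length
        if end > len(segment):
            break                        # incomplete trailing entry, stop here
        _magic, crc = struct.unpack_from(">BI", segment, start)
        payload = segment[start + 5:end]
        if zlib.crc32(payload) & 0xFFFFFFFF != crc:
            raise ValueError(f"corrupt entry at byte {pos}")
        yield pos, payload
        pos = end


segment = b"".join(encode_log_entry(p) for p in (b"a", b"bc", b"def"))
entries = list(iter_log_entries(segment))
```

The length prefix is what allows a reader to jump from one entry to the next without parsing the payload.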
83. Writes
• Append-only writes
• When to flush:
  • M : after M messages have been written to the log file
  • S : S seconds after the last flush
• Durability guarantee: losing at most M
messages or S seconds of data in the
event of a system crash
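The M-messages-or-S-seconds rule can be sketched as below. This is a simplified illustration and not Kafka's code: `AppendLog`, its parameters and `flush_count` are hypothetical, and the time condition is only checked when a message is appended, whereas a real broker would also need a background timer.

```python
import os
import time


class AppendLog:
    """Append-only file that forces a flush every M messages or S seconds (sketch)."""

    def __init__(self, path, flush_messages, flush_seconds, clock=time.monotonic):
        self._file = open(path, "ab")
        self._flush_messages = flush_messages   # M
        self._flush_seconds = flush_seconds     # S
        self._clock = clock
        self._unflushed = 0
        self._last_flush = clock()
        self.flush_count = 0                    # exposed for inspection only

    def append(self, data: bytes) -> None:
        self._file.write(data)                  # lands in a buffer / pagecache
        self._unflushed += 1
        too_many = self._unflushed >= self._flush_messages
        too_old = self._clock() - self._last_flush >= self._flush_seconds
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        self._file.flush()
        os.fsync(self._file.fileno())           # ask the OS to write to disk
        self._unflushed = 0
        self._last_flush = self._clock()
        self.flush_count += 1

    def close(self) -> None:
        self.flush()
        self._file.close()
```

Between two flushes at most M messages are unflushed, which is where the bound on data "at risk" comes from.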
85. Buffer Reads
• The read buffer size is doubled automatically
until the message fits
• You can specify the max buffer size
86. Offset Search
• Search steps:
1. locating the log segment file in which
the data is stored
2. calculating the file-specific offset from
the global offset value
3. reading from that file offset
• A simple binary search over an in-memory
list of segments
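Steps 1 and 2 can be sketched with the standard `bisect` module. The sketch assumes each segment is identified by the global offset of its first message (its base offset) and that the base offsets are kept in a sorted in-memory list; `locate` is a hypothetical helper name.

```python
import bisect


def locate(segment_base_offsets, global_offset):
    """Return (segment_base_offset, offset_within_segment) for a global offset.

    `segment_base_offsets` must be sorted in ascending order.
    """
    # Index of the last segment whose base offset is <= global_offset.
    index = bisect.bisect_right(segment_base_offsets, global_offset) - 1
    if index < 0:
        raise ValueError(f"offset {global_offset} is before the first segment")
    base = segment_base_offsets[index]
    return base, global_offset - base


bases = [0, 1000, 2500]
segment, file_offset = locate(bases, 1200)
```

Step 3 would then seek to `file_offset` inside the file for `segment`. The sketch does not check whether the offset lies beyond the end of the last segment.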
89. Deletes
• Policy: delete segments older than N days,
or keep only the most recent N GB
• Deleting while reading?
  • a copy-on-write style segment list
  implementation that provides
  consistent views to allow a binary
  search to proceed on an immutable
  static snapshot view of the log
  segments
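A copy-on-write segment list can be sketched in a few lines. This is an illustration of the idea rather than Kafka's implementation: `SegmentList` and its methods are hypothetical, and an immutable tuple stands in for the snapshot.

```python
import threading


class SegmentList:
    """Copy-on-write list: writers swap in a new tuple, readers keep their snapshot."""

    def __init__(self, segments=()):
        self._segments = tuple(segments)
        self._write_lock = threading.Lock()     # serializes writers only

    def view(self):
        """Return an immutable snapshot; later deletes do not change it."""
        return self._segments

    def append(self, segment):
        with self._write_lock:
            self._segments = self._segments + (segment,)

    def delete_oldest(self, count):
        """Drop the `count` oldest segments and return them for cleanup."""
        with self._write_lock:
            removed = self._segments[:count]
            self._segments = self._segments[count:]
        return removed


segments = SegmentList([0, 1000, 2500])
snapshot = segments.view()                      # a reader starts a search here
removed = segments.delete_oldest(1)             # a delete happens concurrently
```

The reader can finish its binary search over `snapshot` undisturbed, while new readers calling `view()` see the list without the deleted segment.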