The document provides an overview of NewSQL databases. It discusses why NewSQL databases were created, including the need to handle extreme amounts of data and traffic. It describes some key characteristics of NewSQL databases, such as providing scalability like NoSQL databases while also supporting SQL and ACID transactions. Finally, it reviews some examples of NewSQL database products, like VoltDB and Google Spanner, and their architectures.
This document introduces Apache Spark. It discusses MapReduce and its limitations in processing large datasets. Spark was developed to address these limitations by enabling fast sharing of data across clusters using resilient distributed datasets (RDDs). RDDs allow transformations like map and filter to be applied lazily and support operations like join and groupByKey. This provides benefits for iterative and interactive queries compared to MapReduce.
This document discusses NewSQL databases and provides examples of NewSQL products. It begins by explaining the limitations of traditional SQL databases and NoSQL databases in handling big data. It then introduces NewSQL as an approach that provides scalability like NoSQL with ACID transactions and SQL support like traditional databases. Example NewSQL databases discussed in detail include VoltDB, which uses an in-memory architecture, and Google Spanner, which provides a globally distributed SQL database. The document also briefly mentions MySQL Cluster as another NewSQL approach.
This document discusses big data analytics and SQL on Hadoop. It begins with an introduction to big data analysis and different approaches for analyzing big data, including traditional analytic databases, MapReduce, and SQL on Hadoop. The document then focuses on SQL on Hadoop, explaining that it refers to a new generation of analytic databases built on Hadoop. It provides examples of SQL on Hadoop systems like Hive, Impala, and Presto. The rest of the document discusses Google's Dremel and Tenzing systems for interactively analyzing large datasets using SQL.
Spark Streaming allows for scalable, high-throughput, fault-tolerant stream processing of live data streams. It works by chopping data streams into batches, treating each batch as an RDD, and processing them using RDD transformations and operations. This provides a simple abstraction called a DStream that represents a continuous stream of data as a series of RDDs. Transformations applied to DStreams are similarly applied to the underlying RDDs. Spark Streaming also supports window operations, output operations, and integrating streaming with Spark's ML and graph processing capabilities.
The document provides an overview of the Scala programming language. It discusses what Scala is, its definition, and opinions on Scala from notable figures like the creator of Java. It also covers Scala's scalability, data types, functional aspects, object-oriented aspects, and comparison to Java. Key points are that Scala is a multi-paradigm language that integrates functional and object-oriented programming, runs on the JVM, and is more concise and expressive than Java.
This document provides guidance on designing RESTful APIs. It recommends using nouns instead of verbs, keeping URLs simple with only two endpoints per resource, and following conventions from leading APIs. Complex variations and optional parameters should be "swept behind the '?'." The document emphasizes designing for application developers by making APIs intuitive, consistent and complete while also accommodating exceptional clients. It suggests adding an API virtualization layer to handle complexity.
Big Data Platform Field Case in MelOn (in Korean)
- Presented by Byeong-hwa Yoon, engineer manager at Loen Entertainment
- at Gruter TECHDAY 2014 Oct. 29 Seoul, Korea
Spark Streaming allows for scalable, high-throughput, fault-tolerant stream processing of live data streams. It works by chopping data streams into batches, treating each batch as an RDD, and processing them using RDD transformations and operations. This provides a simple abstraction called a DStream that represents a continuous stream of data as a series of RDDs. Transformations applied to DStreams are similarly applied to the underlying RDDs. Spark Streaming also supports window operations, output operations, and integrating streaming with Spark's ML and graph processing capabilities.
The document provides an overview of the Scala programming language. It discusses what Scala is, its definition, and opinions on Scala from notable figures like the creator of Java. It also covers Scala's scalability, data types, functional aspects, object-oriented aspects, and comparison to Java. Key points are that Scala is a multi-paradigm language that integrates functional and object-oriented programming, runs on the JVM, and is more concise and expressive than Java.
This document provides guidance on designing RESTful APIs. It recommends using nouns instead of verbs, keeping URLs simple with only two endpoints per resource, and following conventions from leading APIs. Complex variations and optional parameters should be "swept behind the '?'." The document emphasizes designing for application developers by making APIs intuitive, consistent and complete while also accommodating exceptional clients. It suggests adding an API virtualization layer to handle complexity.
Big Data Platform Field Case in MelOn (in Korean)
- Presented by Byeong-hwa Yoon, engineer manager at Loen Entertainment
- at Gruter TECHDAY 2014 Oct. 29 Seoul, Korea
7. 蟆暑 Smart Device 讀螳豢豌 : Morgan Stanley Internet Trends 2010(http://www.morganstanley.com/institutional/techresearch/pdfs/Internet_Trends_041210.pdf)
8. 蟆暑 Smart Device 讀螳(PC vs. SmartPhone)豢豌 : Morgan Stanley Internet Trends 2010(http://www.morganstanley.com/institutional/techresearch/pdfs/Internet_Trends_041210.pdf)
33. れ 企殊磯 Tenth File SystemCheaper than Google, Better than NAS.The cheapest file system in the world Easy-to-use Automatic error detection, notification, recovery 24/7 zero down-time豢豌 : Daum