SDEC2011 Big engineer vs small entrepreneur - Korea Sdec
This document discusses the differences between a big engineer at an established company and a small entrepreneur starting their own company. It describes the challenges a small startup faces in mobile service development including unpredictable traffic, lack of resources, and difficulty building their own cultures. The entrepreneur considers using clouds but has concerns about latency, reliability and costs. They decide to build their own hybrid system to meet goals of low latency, high performance, reliability and scalability at a low price.
In the SNS domain, the response time of friend-suggestion and several SNA (social network analysis) algorithms grows in direct proportion to the square of the number of relationships, and that number is itself increasing at an accelerating rate. The conventional relational-DB usage pattern performs poorly under this load. To guarantee performance and scalability, the following methods were developed for friend suggestion and SNA:
- Relation pruning using intimacy values
- A no-join, all-data-in-memory strategy
- A distributed graph structure
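The pruning idea can be sketched roughly as follows. This is an illustrative Python sketch, not the presenters' actual code: the dict-based graph, the `keep_top` cutoff, and the friends-of-friends scoring are all assumptions made for the example.

```python
# Hypothetical sketch of relation pruning by intimacy value: keep only a
# user's strongest relationships so that quadratic friend-suggestion
# passes run over a much smaller neighbor set.

def prune_relations(neighbors, keep_top=50):
    """neighbors: dict mapping friend_id -> intimacy score.
    Returns the friend_ids with the highest intimacy, capped at keep_top."""
    ranked = sorted(neighbors.items(), key=lambda kv: kv[1], reverse=True)
    return [friend for friend, _score in ranked[:keep_top]]

def suggest_friends(graph, user, keep_top=50):
    """Friends-of-friends over the pruned graph, excluding existing friends."""
    mine = set(prune_relations(graph[user], keep_top))
    candidates = {}
    for friend in mine:
        for fof in prune_relations(graph.get(friend, {}), keep_top):
            if fof != user and fof not in mine:
                candidates[fof] = candidates.get(fof, 0) + 1
    # Rank candidates by how many pruned paths reach them.
    return sorted(candidates, key=candidates.get, reverse=True)
```

Because only the top-intimacy edges survive pruning, the effective relationship count per user is bounded, which is the lever against the quadratic growth described above.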
The document provides an overview of installing and configuring Hadoop for single node, pseudo-distributed, and clustered environments. It describes downloading and extracting Hadoop, configuring Hadoop to run in different modes, starting the Hadoop daemons, and using example applications to test Hadoop functionality. Configuration files for the HDFS, MapReduce, and daemon processes are edited to set up the Hadoop infrastructure.
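For the pseudo-distributed mode described above, the configuration edits look roughly like this. The file names and property keys shown are the classic Hadoop 0.20-era ones (current releases use different files and key names), and the host/port values are illustrative:

```xml
<!-- conf/core-site.xml: point clients at the local NameNode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml: single node, so one replica -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml: local JobTracker -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>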
The document provides an overview of NoSQL data modeling concepts and different NoSQL database types including document databases, column-oriented databases, key-value stores, and graph databases. It discusses data modeling approaches for each type and compares databases like MongoDB and CouchDB. The document also covers topics like CAP theorem, eventual consistency, and distributed system techniques from Dynamo.
This document provides an overview and introduction to Pig, an infrastructure for analyzing large datasets using Hadoop MapReduce. It discusses what Pig is, why it should be used, how to install and set up Pig, the components of Pig including Pig Latin and the Pig engine, and provides examples of how to perform common data analysis tasks like filtering, grouping, joining and ordering data using Pig Latin scripts.
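Since this page carries none of the deck's actual scripts, here is a rough Python analogue of the filter → join → group → order pipeline those Pig Latin operations express, with toy data and illustrative field names; it shows the shape of the dataflow, not Pig syntax:

```python
from collections import Counter

# Toy relations standing in for LOADed Pig tuples (fields are illustrative).
users = [("alice", 25), ("bob", 16), ("carol", 31)]
pages = [("alice", "/home"), ("carol", "/home"), ("carol", "/about")]

# FILTER-style step: keep users with age >= 18
adults = [name for name, age in users if age >= 18]

# JOIN-style step: match page views to the filtered users by name
joined = [(name, url) for name, url in pages if name in adults]

# GROUP + COUNT-style step: hits per URL
counts = Counter(url for _name, url in joined)

# ORDER-style step: sort by hit count, descending
top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```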
This document provides an overview of Apache Mahout, an open source machine learning library built on Apache Hadoop. It describes Mahout as a scalable machine learning framework that supports algorithms like clustering, classification, and collaborative filtering. It also summarizes Mahout's history, goals, examples of common use cases, supported algorithms, and the Taste collaborative filtering framework.
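To make the collaborative-filtering idea concrete, here is a minimal user-based recommender in plain Python; this is a conceptual sketch using Jaccard similarity over liked-item sets, not the Taste API, and all names in it are made up for the example:

```python
def jaccard(a, b):
    """Similarity between two sets of liked items."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(prefs, user, top_n=3):
    """Score items liked by similar users but not yet liked by `user`."""
    mine = prefs[user]
    scores = {}
    for other, items in prefs.items():
        if other == user:
            continue
        sim = jaccard(mine, items)
        # Each unseen item inherits the similarity of the user who liked it.
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Mahout's contribution, per the summary above, is running this class of computation as MapReduce jobs so it scales past what a single-machine loop like this can handle.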
This document provides an overview of Hive, including what it is, its supported platforms, required software, how to download, install, configure, and provide examples of using Hive with SQL-like queries on datasets. Hive allows SQL-like querying of data stored in Hadoop via HiveQL queries that are converted into MapReduce jobs under the hood.
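The "under the hood" conversion can be illustrated with a toy map/shuffle/reduce in Python for a query shaped like `SELECT word, COUNT(*) ... GROUP BY word`; this is a conceptual sketch of the execution model, not Hive's actual query planner:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(rows):
    # Each mapper emits a (group_key, 1) pair per input row.
    return [(word, 1) for word in rows]

def shuffle(pairs):
    # The framework sorts by key so each reducer sees one key's values together.
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))

def reduce_phase(grouped):
    # Each reducer aggregates its key's values, mirroring COUNT(*) per group.
    return {key: sum(v for _k, v in vals) for key, vals in grouped}

rows = ["hive", "hadoop", "hive"]
result = reduce_phase(shuffle(map_phase(rows)))
```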
The document provides an overview of NoSQL databases and discusses various types including document databases, column-family stores, and key-value pairs. It provides examples of MongoDB, CouchDB, Redis, HBase and their data models, query operations, and architectures.
The document provides an overview of installing and configuring Hadoop for single node, pseudo-distributed, and clustered environments. It describes downloading and extracting Hadoop, configuring Hadoop to run in different modes, starting the Hadoop daemons, and using example applications to test functionality. Configuration files and processes are outlined for the NameNode, DataNode, JobTracker, and TaskTracker, along with using the HDFS and MapReduce components.
SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive - Korea Sdec
The document presents an overview of the Seoul Data Engineering Camp focused on migrating legacy telecommunications databases to Hadoop and Hive. It discusses the motivations for using Hive, its internal architecture, use cases, including log analysis and machine learning, and optimization strategies. Additionally, it outlines the development history and community growth surrounding Hive and includes example code snippets for data processing.
SDEC2011 Mahout - the what, the how and the why - Korea Sdec
1) Mahout is an Apache project that builds a scalable machine learning library.
2) It aims to support a variety of machine learning tasks such as clustering, classification, and recommendation.
3) Mahout algorithms are implemented using MapReduce to scale linearly with large datasets.
The document discusses TACC, a high-level programming language designed to enhance the development of scalable and fault-tolerant key-value stores. It emphasizes how TACC simplifies application logic, improves performance, and minimizes synchronization issues, leading to faster development times and increased reliability. An example of a location service demonstrates TACC's ability to efficiently handle real-time user location tracking with high throughput and low latency.
SDEC2011 Using Couchbase for social game scaling and speed - Korea Sdec
The document discusses Couchbase, a distributed NoSQL database that offers simplicity, speed, and scalability for social gaming applications. Couchbase's architecture allows seamless data handling, horizontal scaling, and integration with platforms like EC2, addressing challenges faced by companies like Tribal Crossing in managing large datasets. The presentation also highlights Couchbase's roadmap for future enhancements, including mobile-cloud synchronization and improved querying capabilities.
The document presents information about Arcus, a cloud caching solution based on Memcached, detailing its architecture, performance metrics, and integration with Zookeeper for node management. It emphasizes the importance of consistent hashing for effective data distribution across cache nodes and showcases command examples for Memcached operations. The content appears to be based on a presentation from a 2011 conference.
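Consistent hashing, which the summary credits for Arcus's data distribution across cache nodes, can be sketched as a hash ring with virtual nodes. This example uses MD5 and `bisect` purely for illustration; real Memcached/Arcus clients have their own hash functions and virtual-node counts:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            # Many virtual points per node smooth out the key distribution.
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def get_node(self, key):
        """First node clockwise from the key's position on the ring."""
        i = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[i][1]
```

The payoff over modulo hashing is locality of change: when a cache node joins or leaves, only the keys on its arcs of the ring move, so roughly 1/N of the cached data is invalidated instead of nearly all of it.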
Digital Media
- Broadcast & Entertainment Media
- Multimedia Publishing
Game & Software Development
- Global Software Development
- Rapid Game Development
Cloud Computing
- Content Aggregation, Distribution
- Database Hosting
Global Enterprise
- Data Processing Outsourcing
- High Speed Remote Backups
- Digital Asset Distribution
Life Sciences
Intelligence & Defense