How to plan a Hadoop cluster for testing and production environment (Anna Yen)
Athemaster wants to share its experience with new Hadoop users: planning hardware specs, initial server setup, and role deployment. The case study covers 2 testing environments and 3 production environments.
Big Data Taiwan 2014 Track1-3: Big Data, Big Challenge: Splunk Helps You Solve Big Data... (Etu Solution)
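Speaker: 陶靖霖, Product Manager, Data Value-Added Application Development Department, SYSTEX
Abstract: Face reality! Big Data is a hot buzzword and a hot topic, but the core of the problem still revolves around data-processing workflows, architecture, and technology. What challenges will users face when entering the Big Data field? Splunk has been hailed as "the world's best Big Data company". What unique technical advantages does it bring to the data-processing workflow that help users overcome these challenges? And which use cases show it helping users extract value from their data? Come learn about Splunk and Big Data success stories from around the world.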
Amazon S3 provides inexpensive cloud storage, while EC2 offers virtual computing resources. S3 allows storage of unlimited data for $0.15 per GB per month, with data retrieval priced at $0.10-$0.13 per GB depending on volume. EC2's virtual machines range in power and price from $0.10 per hour for a small instance to $0.80 per hour for an extra-large one. Both services offer the flexibility to scale up or down on demand with no long-term commitments.
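The quoted prices lend themselves to a quick back-of-the-envelope estimate. The Python sketch below assumes a simple monthly bill of storage + retrieval + instance hours. Because the exact tier boundaries are not given, the caller picks a retrieval rate within the quoted $0.10-$0.13 range. The function name `monthly_cost` is hypothetical, and the constants are the historical figures from the text, not current AWS pricing.

```python
# Prices as quoted in the text (historical figures, not current AWS pricing).
S3_STORAGE_PER_GB_MONTH = 0.15
EC2_SMALL_PER_HOUR = 0.10
EC2_XLARGE_PER_HOUR = 0.80


def monthly_cost(stored_gb: float, retrieved_gb: float, retrieval_per_gb: float,
                 instance_hours: float, instance_per_hour: float) -> float:
    """Estimate one month's bill as storage + retrieval + compute, in dollars."""
    # Retrieval is tiered by volume; the caller picks a rate in the quoted range.
    if not 0.10 <= retrieval_per_gb <= 0.13:
        raise ValueError("retrieval rate outside the quoted $0.10-$0.13 per GB range")
    storage = stored_gb * S3_STORAGE_PER_GB_MONTH
    retrieval = retrieved_gb * retrieval_per_gb
    compute = instance_hours * instance_per_hour
    return round(storage + retrieval + compute, 2)


if __name__ == "__main__":
    # 100 GB stored, 50 GB retrieved at $0.13/GB, one small instance running 720 hours.
    print(monthly_cost(100, 50, 0.13, 720, EC2_SMALL_PER_HOUR))  # 93.5
```

Running the same 720 hours on an extra-large instance would raise the compute term from $72 to $576. Scaling up or down only changes the `instance_hours` and `instance_per_hour` inputs.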
source: http://www.sfbayacm.org/?p=1394
The specifics of a cloud’s computing architecture may have an impact on application design. This is particularly important in Infrastructure as a Service (IaaS) cloud environments.
This presentation analyzes aspects of the Amazon EC2 IaaS cloud environment that differ from a traditional datacenter and introduces general best practices for ensuring data privacy, storage persistence, and reliable DBMS backups. It also reviews best practices for application robustness and on-demand scalability, which are especially important for realizing the full potential of an IaaS cloud. Finally, it briefly discusses the need for a cloud application management and configuration system and describes two alternative approaches to cloud application management (RightScale and Kaavo).
S3 Strategic Solutions Services is a certified veteran-owned consulting firm that provides process improvement solutions to prime contractors, small businesses, and manufacturers. They offer direct support, facility layout planning, process improvement studies, program and project management, training programs, and disaster recovery services. The company was founded in 2009 and is led by Christopher Dakin with over 30 years of experience in engineering and quality. They take an action-oriented approach on all projects to address clients' needs.
S3 is a data storage service that provides scalable, reliable storage through a RESTful API. Files are stored as objects within buckets and can be accessed via HTTP, BitTorrent, or SOAP protocols. Access is controlled on a per file or per user basis. Users are billed based on the amount of data stored and transferred. The service provides effectively unlimited storage at low costs but has some limitations around global availability and eventual consistency.
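The following toy, in-memory Python sketch makes the object model concrete. It is not the AWS SDK. A bucket maps keys to byte strings, each object carries its own set of permitted readers, and the sketch tracks stored and transferred bytes because billing is based on both. The class and method names are hypothetical. Only the URL shape follows S3's virtual-hosted style on the legacy global endpoint.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Toy in-memory bucket: objects are addressed by key, with per-object readers."""
    name: str
    owner: str
    objects: dict[str, bytes] = field(default_factory=dict)
    readers: dict[str, set[str]] = field(default_factory=dict)
    bytes_transferred: int = 0  # feeds the transfer part of the bill

    def url(self, key: str) -> str:
        # Virtual-hosted-style address on the legacy global endpoint.
        return f"https://{self.name}.s3.amazonaws.com/{key}"

    def put(self, user: str, key: str, data: bytes,
            grant_to: frozenset[str] = frozenset()) -> None:
        if user != self.owner:
            raise PermissionError(f"{user} may not write to {self.name}")
        self.objects[key] = data
        # Per-object access control: the owner plus any explicitly granted users.
        self.readers[key] = {self.owner} | set(grant_to)

    def get(self, user: str, key: str) -> bytes:
        if user not in self.readers.get(key, set()):
            raise PermissionError(f"{user} may not read {self.name}/{key}")
        data = self.objects[key]
        self.bytes_transferred += len(data)
        return data

    def bytes_stored(self) -> int:
        return sum(len(data) for data in self.objects.values())
```

A bill for this toy model would then be `bytes_stored()` times a storage rate plus `bytes_transferred` times a transfer rate.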
Brief research on Amazon S3 for my company.
Feel free to comment/feedback. Thanks!
Connect with me on LinkedIn : sg.linkedin.com/in/yulunteo/
Seems like there are still plenty of people viewing this presentation after so long.
Maybe I should consider doing an update covering CloudFront/Glacier as well.
View these slides if you're new to cloud computing and would like to learn more about Amazon Web Services (AWS), if you intend to implement a project and would like to discover the basics of the AWS cloud, or if you are a business looking to evaluate cloud computing.
In the webinar based on these slides, we answered the following questions:
- What is Cloud Computing with AWS and what benefits can it deliver?
- Who is using AWS and what are they using it for?
- How can I use AWS Services to run my workloads?
View the webinar recording on YouTube here: http://youtu.be/QROD20r6-sQ
Pegasus: Designing a Distributed Key-Value System, ArchSummit Beijing 2016 (涛 吴)
These slides were presented by Zuoyan Qin, chief engineer of the XiaoMi Cloud Storage team, at ArchSummit Beijing 2016 in a talk on how Pegasus was designed.
2. Definition: A file system (often also written as filesystem) is a method of storing and organizing computer files and their data. Essentially, it organizes these files into a database for storage, organization, manipulation, and retrieval by the computer's operating system.
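8. How to scale. Strictly speaking, this should be "Scale Your Storage", but that topic is too big: it involves applications, hardware, and networking. So first, a brief look at distributed file systems and a few "poor man's" solutions.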
9. A look at the file systems we have. There are too many of them, so let's look at the categories:
- Disk file systems: ext3/NTFS/ZFS/WAFL (most of the ones we are familiar with)
- File systems with built-in fault tolerance: ZFS/Btrfs
- File systems optimized for flash memory and solid-state media
- Record-oriented file systems
- Shared-disk file systems
- Distributed file systems
- Distributed fault-tolerant file systems
- Distributed parallel file systems
- Distributed parallel fault-tolerant file systems: Google File System/CloudStore/Lustre/HDFS
- Peer-to-peer file systems
- Special-purpose file systems
- Pseudo- and virtual file systems
- Encrypted file systems
10. Google File System and GFS-like systems: KFS, HDFS. Why "GFS-like"? They share a master/chunk architecture, a POSIX-like interface, and the same design goals.
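The master/chunk split can be shown with a small Python sketch. It assumes a single master that stores only metadata, and it models chunkservers as dictionaries. `Master`, `read`, and the chunk handles are hypothetical names, and replication, leases, and caching are left out. The sketch shows the GFS-style read path in three steps:

1. The client converts a byte offset into a chunk index.
2. The client asks the master where that chunk lives.
3. The client reads the bytes directly from a chunkserver.

```python
from dataclasses import dataclass

CHUNK_SIZE = 4  # bytes; tiny for illustration (GFS used 64 MB chunks)


@dataclass
class Master:
    """Holds metadata only: which chunks make up a file and where replicas live."""
    file_chunks: dict[str, list[str]]  # path -> ordered chunk handles
    locations: dict[str, list[str]]    # chunk handle -> chunkserver ids

    def lookup(self, path: str, chunk_index: int) -> tuple[str, list[str]]:
        handle = self.file_chunks[path][chunk_index]
        return handle, self.locations[handle]


def read(master: Master, chunkservers: dict[str, dict[str, bytes]],
         path: str, offset: int, length: int) -> bytes:
    """Client-side read: metadata from the master, data straight from a chunkserver."""
    out = bytearray()
    end = offset + length
    while offset < end:
        # Translate the byte offset into (chunk index, offset inside that chunk).
        index, within = divmod(offset, CHUNK_SIZE)
        handle, replicas = master.lookup(path, index)
        chunk = chunkservers[replicas[0]][handle]  # file data never passes through the master
        n = min(CHUNK_SIZE - within, end - offset)
        out += chunk[within:within + n]
        offset += n
    return bytes(out)
```

Keeping file data off the master is the design choice that lets a single metadata server coordinate many chunkservers without becoming a bandwidth bottleneck.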
11. GFS goals. GFS was originally designed for applications such as web crawlers.
- The system is built from many inexpensive commodity components that often fail.
- The system stores a modest number of large files.
- The workloads consist mainly of large streaming reads and small random reads.
- The workloads also have many large, sequential writes that append data to files.
- High sustained bandwidth is more important than low latency.
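The append-heavy workload in these goals is why GFS offers a record-append operation. The Python sketch below is a simplified, single-replica illustration in that spirit, with hypothetical class and method names. The system, not the client, chooses the offset, and a record never straddles a chunk boundary. Records are also capped at a quarter of the chunk size, the limit the GFS paper describes, so padding wastes little space.

```python
CHUNK_SIZE = 16               # bytes; tiny for illustration
MAX_RECORD = CHUNK_SIZE // 4  # keep records small so padding wastes little space


class AppendOnlyFile:
    """Single-replica model of a file that only grows by appending records."""

    def __init__(self) -> None:
        self.chunks: list[bytearray] = [bytearray()]

    def record_append(self, record: bytes) -> tuple[int, int]:
        """Append a whole record and return the (chunk index, offset) where it landed."""
        if len(record) > MAX_RECORD:
            raise ValueError("record larger than a quarter of a chunk")
        last = self.chunks[-1]
        if len(last) + len(record) > CHUNK_SIZE:
            # The record would straddle a chunk boundary: pad this chunk and start a new one.
            last.extend(b"\0" * (CHUNK_SIZE - len(last)))
            last = bytearray()
            self.chunks.append(last)
        offset = len(last)
        last.extend(record)
        return len(self.chunks) - 1, offset
```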
18. MogileFS
- Application level
- No single point of failure
- Automatic file replication
- "Better than RAID"
- Flat namespace
- Shared-nothing / no RAID required
- Local filesystem agnostic
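Several of these properties fit in a short Python sketch:

- a flat namespace
- automatic replication across hosts
- no RAID
- indifference to the local filesystem

The sketch is a hypothetical model, not MogileFS's actual tracker protocol. `min_copies` plays roughly the role of a per-class minimum copy count. Devices are dictionaries standing in for directories on ordinary local filesystems.

```python
from collections import defaultdict


class Tracker:
    """Flat key -> replica-location index with host-aware placement."""

    def __init__(self, devices: dict[str, str], min_copies: int = 2) -> None:
        self.devices = devices  # device id -> host name
        self.min_copies = min_copies
        self.index: dict[str, list[str]] = {}  # flat namespace: key -> devices holding it
        self.store: dict[str, dict[str, bytes]] = defaultdict(dict)  # device -> key -> data

    def put(self, key: str, data: bytes) -> list[str]:
        chosen: list[str] = []
        used_hosts: set[str] = set()
        for dev, host in sorted(self.devices.items()):
            if host not in used_hosts:  # never place two copies on the same host
                chosen.append(dev)
                used_hosts.add(host)
            if len(chosen) == self.min_copies:
                break
        if len(chosen) < self.min_copies:
            raise RuntimeError("not enough independent hosts for the requested copies")
        for dev in chosen:
            self.store[dev][key] = data  # stand-in for a plain file on any local filesystem
        self.index[key] = chosen
        return chosen

    def get(self, key: str, down: frozenset[str] = frozenset()) -> bytes:
        """Return data from the first replica whose device is not down."""
        for dev in self.index[key]:
            if dev not in down:
                return self.store[dev][key]
        raise OSError(f"all replicas of {key!r} are unavailable")
```

Because redundancy comes from whole-file copies on different hosts, losing one device or host does not need RAID to recover. The tracker simply serves another replica.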
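24. FastDFS components
- Tracker Server: mainly handles scheduling and balances the load of incoming requests. It records the state of the storage servers and is the hub connecting clients and storage servers.
- Storage Server: stores both the physical file content and the metadata. Storage servers are organized into groups (volumes), and servers in the same group hold identical files.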
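The tracker/storage split can be sketched in Python as follows. This is an illustrative model, not the FastDFS client API. The group-selection policy (most free space), the file-ID format, and all class and function names are assumptions. The tracker only schedules, and file content lives on storage servers. Every server in a group ends up with a copy, which lets the tracker spread downloads across them.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    addr: str
    files: dict[str, bytes] = field(default_factory=dict)  # file content (and metadata) live here


@dataclass
class Group:
    """A volume: every storage server in it holds the same set of files."""
    name: str
    capacity: int
    servers: list[StorageServer]
    used: int = 0

    def free(self) -> int:
        return self.capacity - self.used


class Tracker:
    """Scheduling only: tracks groups and servers, never stores file content."""

    def __init__(self, groups: list[Group]) -> None:
        self.groups = {g.name: g for g in groups}
        self.issued = 0  # counter behind the hypothetical file-ID format
        self.reads = 0

    def group_for_upload(self) -> Group:
        return max(self.groups.values(), key=Group.free)  # assumed policy: most free space

    def new_file_id(self, group: Group) -> str:
        self.issued += 1
        return f"{group.name}/{self.issued:06d}"  # hypothetical ID: group name + sequence number

    def server_for_download(self, file_id: str) -> StorageServer:
        servers = self.groups[file_id.split("/", 1)[0]].servers
        self.reads += 1
        return servers[self.reads % len(servers)]  # round-robin over identical replicas


def client_upload(tracker: Tracker, data: bytes) -> str:
    group = tracker.group_for_upload()
    file_id = tracker.new_file_id(group)
    group.servers[0].files[file_id] = data  # the client talks to a storage server directly
    for peer in group.servers[1:]:          # peers in the group sync the file (synchronously here)
        peer.files[file_id] = data
    group.used += len(data)
    return file_id


def client_download(tracker: Tracker, file_id: str) -> bytes:
    return tracker.server_for_download(file_id).files[file_id]
```

Encoding the group name in the file ID lets the tracker route a download without any per-file lookup table.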
#10: SSD: embedded systems, write-optimized. Record: record-oriented, unlike most file systems. Shared-disk: SAN, e.g. Red Hat GFS. DFS: SMB, also known as Common Internet File System (CIFS). DFS-FT: MS DFS. DFS-Parallel: not much to say; move on to the next one. DFS-Parallel-FT: Google File System / CloudStore / Lustre [Laster].
#31: Namespace lookup is an expensive operation. NFS file handle caching. Facebook extended the Linux kernel to allow NFS file opens via inode number rather than filename, to avoid the NetApp scaling issue. The namespace can be flattened.
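A toy Python model shows why opening by inode number helps. The names are hypothetical, and this is not NFS or Facebook's kernel change. Resolving a path costs one directory lookup per component. A flattened namespace that hands out an inode number or ID costs a single lookup, however deep the file sits.

```python
class Inode:
    def __init__(self, ino: int) -> None:
        self.ino = ino
        self.entries: dict[str, int] = {}  # directory entries: name -> inode number


class FileSystem:
    """Toy metadata store that counts lookups, the expensive operation."""

    ROOT = 1

    def __init__(self) -> None:
        self.inodes = {self.ROOT: Inode(self.ROOT)}
        self.next_ino = self.ROOT + 1
        self.lookups = 0

    def create(self, parent: int, name: str) -> int:
        ino = self.next_ino
        self.next_ino += 1
        self.inodes[ino] = Inode(ino)
        self.inodes[parent].entries[name] = ino
        return ino

    def open_by_path(self, path: str) -> Inode:
        ino = self.ROOT
        for name in path.strip("/").split("/"):
            self.lookups += 1  # one directory lookup per path component
            ino = self.inodes[ino].entries[name]
        return self.inodes[ino]

    def open_by_inode(self, ino: int) -> Inode:
        self.lookups += 1  # flat namespace: a single lookup, no directory walk
        return self.inodes[ino]
```

In this model, opening `/photos/2010/cat.jpg` by path costs three lookups, while opening it by inode number costs one.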