The document discusses refactoring techniques to replace type codes with other object-oriented constructs. It describes replacing magic numbers with symbolic constants to make code more readable. It also covers replacing type codes with classes, subclasses, or state/strategy objects to encapsulate behavior associated with different types and allow polymorphism. This improves extensibility and maintains type safety compared to using raw integer type codes.
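To make the subclass variant concrete, here is a minimal Java sketch in the spirit of Fowler's examples; the Employee hierarchy, type-code names, and pay logic are hypothetical:

```java
// Before: a raw integer type code invites invalid values and scattered switches.
// After: a factory method plus subclasses keeps each type's behavior in its
// own class and lets the compiler enforce type safety.
abstract class Employee {
    static final int ENGINEER = 0;
    static final int SALESMAN = 1;

    // The factory is now the only place that touches the raw type code.
    static Employee create(int typeCode) {
        switch (typeCode) {
            case ENGINEER: return new Engineer();
            case SALESMAN: return new Salesman();
            default: throw new IllegalArgumentException("Unknown type code: " + typeCode);
        }
    }

    // Behavior that used to switch on the type code becomes polymorphic.
    abstract int payAmount(int monthlySalary);
}

class Engineer extends Employee {
    @Override int payAmount(int monthlySalary) { return monthlySalary; }
}

class Salesman extends Employee {
    @Override int payAmount(int monthlySalary) { return monthlySalary + 500; } // hypothetical commission
}
```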
Data compression, data security, and machine learning - Chris Huang
This document discusses using data compression techniques to improve machine learning models. It proposes using model compression or reduction methods to simplify deep neural network (DNN) models so they can run on mobile devices with comparable accuracy. One approach described removes small weight connections, retrains the network, then uses codebooks and Huffman coding to compress models by 20-49x. The document also discusses applying lossless compression before machine learning to reduce data volume and speed up execution. Overall, it explores how compression techniques can make machine learning models more efficient.
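A toy Java sketch of the "remove small weight connections" step described above: magnitude-based pruning zeroes out weights below a threshold. The weight matrix and threshold are made up for illustration; a real pipeline would retrain and then quantize and Huffman-code the survivors.

```java
public class MagnitudePruning {
    // Zero out connections whose absolute weight falls below the threshold.
    // The surviving sparse weights are what the later stages (codebook
    // quantization, Huffman coding) would compress further.
    static int prune(float[][] weights, float threshold) {
        int removed = 0;
        for (float[] row : weights) {
            for (int j = 0; j < row.length; j++) {
                if (Math.abs(row[j]) < threshold) {
                    row[j] = 0f;
                    removed++;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        float[][] layer = { { 0.01f, -0.8f, 0.003f }, { 0.5f, -0.02f, 0.9f } };
        int removed = prune(layer, 0.05f);  // hypothetical threshold
        System.out.println("Pruned " + removed + " of 6 connections");
    }
}
```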
This document discusses service reliability monitoring strategies. It describes a service reliability hierarchy that focuses on preventing incidents rather than just responding to them. It also discusses using metrics and alerts to monitor services at different levels of granularity. Specifically, it recommends alerting on high-level service objectives while still allowing inspection of individual components. The document then provides examples of how AWS CloudWatch can be used to collect metrics, define alerts and monitor services. It also discusses the tradeoffs of white-box vs black-box monitoring approaches.
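As a sketch of the CloudWatch usage the summary describes, the snippet below defines an alert on a high-level service objective using the AWS SDK for Java v2; the alarm name, namespace, metric name, and threshold are all assumptions, not values from the document.

```java
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.ComparisonOperator;
import software.amazon.awssdk.services.cloudwatch.model.PutMetricAlarmRequest;
import software.amazon.awssdk.services.cloudwatch.model.Statistic;

public class ServiceObjectiveAlarm {
    public static void main(String[] args) {
        try (CloudWatchClient cw = CloudWatchClient.create()) {
            // Alert on the service-level objective, not on individual
            // components; component metrics stay available for drill-down.
            cw.putMetricAlarm(PutMetricAlarmRequest.builder()
                    .alarmName("checkout-latency-slo")   // hypothetical
                    .namespace("MyService")              // hypothetical
                    .metricName("RequestLatencyMs")      // hypothetical
                    .statistic(Statistic.AVERAGE)
                    .period(300)               // evaluate over 5-minute windows
                    .evaluationPeriods(3)      // require 3 consecutive breaches
                    .threshold(500.0)          // hypothetical SLO bound in ms
                    .comparisonOperator(ComparisonOperator.GREATER_THAN_THRESHOLD)
                    .build());
        }
    }
}
```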
This document provides a summary of chapters 1 and 2 from the SRE book. Chapter 1 discusses the sysadmin approach versus Google's SRE approach. The key aspects of SRE include focusing on software engineering to automate tasks, maintaining a 50% cap on operational work, and using an error budget to balance change velocity and reliability. Chapter 2 describes Google's production environment, including the use of Borg for resource management, Colossus for storage, Chubby for locking services, and gRPC for RPC communication. It also discusses development practices like code reviews and shared code repositories.
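A small worked example of the error-budget idea from chapter 1: an availability SLO leaves a complementary budget of allowed failure, which the sketch below converts into downtime minutes per 30-day month. The 99.9% target is illustrative, not a figure from the document.

```java
public class ErrorBudget {
    public static void main(String[] args) {
        double slo = 0.999;                     // illustrative availability target
        double budget = 1.0 - slo;              // fraction of time allowed to fail
        double minutesPerMonth = 30 * 24 * 60;  // 43,200 minutes in a 30-day month
        // 0.1% of 43,200 minutes = 43.2 minutes of allowed downtime per month,
        // which the team can "spend" on risky releases to balance velocity
        // against reliability.
        System.out.printf("Error budget: %.1f minutes/month%n", budget * minutesPerMonth);
    }
}
```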
Real time big data applications with Hadoop ecosystem - Chris Huang
The document discusses real-time big data applications using the Hadoop ecosystem. It provides examples of operational and analytical use cases for online music and banking. It also discusses technologies like Impala, Stinger, Kafka and Storm that can enable near real-time and interactive analytics. The key takeaways are that real-time does not always mean faster than batch, and that a combination of batch and real-time processing is often needed to build big data applications.
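Kafka is named as one enabler of the near real-time path; below is a minimal, assumption-laden Java producer sketch (the broker address and topic name are hypothetical) showing how events would be fed into such a pipeline for Storm or another consumer, while batch jobs read the same data later.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventFeed {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each event is published as it happens; a downstream consumer
            // (e.g. Storm) handles the near real-time analytics path.
            producer.send(new ProducerRecord<>("play-events", "user42", "song:123"));  // hypothetical topic
        }
    }
}
```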
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B... - Chris Huang
Trend Micro collects large amounts of threat knowledge data for clients, covering many different threat (web) entities. Most threat entities are observed along with relations, such as malicious behaviors or interaction chains among them. So we built a graph model on HBase to store all known threat entities and their relationships, allowing clients to query threat relationships via any given threat entity. This presentation covers the problems we tried to solve, the design decisions we made and why, how we designed the graph model, and the graph computation tasks involved.
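A hedged sketch of how such a graph could be laid out on HBase as an adjacency list: one row per entity, one column per outgoing relation, so fetching all neighbors of an entity is a single row read. The table, column family, and key formats below are assumptions, not the schema from the talk.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class ThreatGraph {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("threat_graph"))) {  // hypothetical table

            // Store the edge "evil.example.com --hosts--> malware:abc123" as a
            // column on the source entity's row; the qualifier encodes the
            // relation type and target.
            Put put = new Put(Bytes.toBytes("domain:evil.example.com"));
            put.addColumn(Bytes.toBytes("edge"),                       // hypothetical family
                          Bytes.toBytes("hosts|malware:abc123"),       // relation|target
                          Bytes.toBytes(System.currentTimeMillis()));  // e.g. last-seen time
            table.put(put);

            // Querying relationships for a given entity is one Get.
            Result neighbors = table.get(new Get(Bytes.toBytes("domain:evil.example.com")));
            System.out.println("Edges found: " + neighbors.size());
        }
    }
}
```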
The document discusses approaches for building real-time applications on Hadoop systems before reaching for Impala. It recommends using HBase to store and query data in real time, SolrCloud for secondary indexing, and streaming tools like Storm on YARN for continuously processing data, and provides examples of querying log data and malware information in real time. It emphasizes clarifying use cases, computing data batches efficiently, and minimizing the gap between batches to approach real-time capabilities. It advises that Impala is not always needed, and that the same problems can recur with it; the "three-arrow" combination of HBase, SolrCloud, and streaming often provides good real-time functionality without overengineering the solution.
This document provides an overview and introduction to Apache Solr, including:
- What Solr is and its main features, such as being built on Lucene, using inverted indexes, and exposing REST APIs.
- The basics of indexing and searching in Solr (see the sketch after this list).
- An overview of SolrCloud which allows distributing a Solr index across multiple servers for scalability.
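As a small illustration of the indexing and searching basics listed above, here is a hedged SolrJ sketch; the core URL and field names are assumptions, and a SolrCloud deployment would typically use CloudSolrClient instead.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SolrBasics {
    public static void main(String[] args) throws Exception {
        // Hypothetical standalone core URL.
        try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build()) {
            // Indexing: add a document and commit so it becomes searchable.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            doc.addField("title", "Introduction to Apache Solr");
            solr.add(doc);
            solr.commit();

            // Searching: Lucene query syntax against the inverted index.
            long hits = solr.query(new SolrQuery("title:solr")).getResults().getNumFound();
            System.out.println("Hits: " + hits);
        }
    }
}
```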
This document discusses Trend Micro's experience scaling their big data infrastructure for threat detection. It describes how their infrastructure and data needs have grown substantially over time. Trend Micro now processes over 8 billion URLs and collects over 7 TB of data daily from a global network of over 3 billion sensors using Hadoop clusters. They have also developed machine learning and data mining techniques to analyze this data and identify threats, allowing them to block malicious URLs and threats within 15 minutes of appearing online. The document outlines lessons learned around scaling infrastructure to handle unstructured and high-volume data streams for timely cyber threat analysis.
Wissbi is an open source toolset for building distributed event processing pipelines easily. It provides basic commands like wissbi-sub and wissbi-pub that allow receiving and sending messages. Filters can be written in any language and run in parallel as daemon processes configured through files. This allows constructing complex multi-stage data workflows. The ecosystem also includes tools like a log collector and metric collector that use Wissbi for transport. It aims to minimize operating effort through a simple design that relies mainly on filesystem operations and standard Unix tools and commands.
HBase status quo (ApacheCon Europe, Nov 2012) - Chris Huang
The document summarizes the status of HBase and its relationship with HDFS. In the past, HDFS did not prioritize HBase's needs, but reliability, availability, and performance have improved with Hadoop 1.0 and 2.0. Hadoop 2.0 features like HDFS high availability and wire compatibility directly benefit HBase. Further improvements planned for Hadoop 2.x like direct reads and zero-copy support could significantly boost HBase performance. The HBase project is also advancing with new versions focused on features like coprocessors and performance optimizations.
HBase schema design and sizing (ApacheCon Europe, Nov 2012) - Chris Huang
The document provides an overview of HBase schema design and cluster sizing notes. It discusses HBase architecture including tables, regions, distribution, and compactions. It emphasizes the importance of schema design, including using intelligent keys, denormalization, and duplication to overcome limitations. The document also covers techniques like salting keys, hashing vs sequential keys, and examples of schema design for applications like mail inbox and Facebook insights. It stresses designing for the use case and avoiding hotspotting when sizing clusters.
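A brief Java sketch of the key-salting technique mentioned above: prefixing a sequential key with a hash-derived bucket spreads writes across regions and avoids hotspotting. The bucket count and key format are illustrative.

```java
public class SaltedKeys {
    private static final int BUCKETS = 16;  // illustrative; usually tied to region count

    // A purely sequential key (e.g. a timestamp) would send all writes to a
    // single region server. Prefixing with a deterministic salt spreads them
    // over BUCKETS regions; reads for a known key recompute the same salt.
    static String saltedKey(String key) {
        int bucket = (key.hashCode() & 0x7fffffff) % BUCKETS;
        return String.format("%02d-%s", bucket, key);
    }

    public static void main(String[] args) {
        System.out.println(saltedKey("20121106-093000-user42"));  // e.g. "07-20121106-..."
    }
}
```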
This document discusses various techniques for refactoring methods to simplify method calls. It covers refactoring techniques related to method names, parameters, private functions, constructors, and exceptions. Some of the techniques discussed include renaming methods, adding or removing parameters, separating queries from modifiers, parameterizing methods, replacing parameters with explicit methods, introducing parameter objects, and replacing exceptions with tests. Examples are provided for many of the techniques. The overall goal is to improve method interfaces and simplify method calls through refactoring.
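One of the listed techniques, Separate Query from Modifier, in a minimal Java sketch (the account example is hypothetical): a method that both returned a value and changed state is split into a side-effect-free query and a void modifier.

```java
class Account {
    private double balance = 100.0;

    // Before: a single withdrawAndGetBalance(amount) both mutated state and
    // answered a question, making call sites hard to reason about.

    // Query: no side effects, safe to call any number of times.
    double getBalance() {
        return balance;
    }

    // Modifier: changes state, returns nothing.
    void withdraw(double amount) {
        if (amount > balance) throw new IllegalArgumentException("Insufficient funds");
        balance -= amount;
    }
}
```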
This document discusses various refactoring techniques for simplifying conditional expressions and logic in code. It provides examples of refactoring techniques like decomposing conditional expressions, consolidating duplicate conditional fragments, removing control flags, replacing nested conditionals with guard clauses, replacing conditionals with polymorphism, introducing null objects, and introducing assertions. The goal of these refactoring techniques is to simplify complex conditional logic and make the code easier to read, understand and maintain.
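A short before/after Java sketch of replacing nested conditionals with guard clauses, using a hypothetical payout example in the style of the book.

```java
class Payout {
    // Before: nested conditionals obscure the normal path.
    double amountNested(boolean isDead, boolean isSeparated, boolean isRetired) {
        double result;
        if (isDead) {
            result = deadAmount();
        } else {
            if (isSeparated) {
                result = separatedAmount();
            } else {
                result = isRetired ? retiredAmount() : normalAmount();
            }
        }
        return result;
    }

    // After: each special case returns immediately as a guard clause,
    // leaving the normal case unindented at the bottom.
    double amount(boolean isDead, boolean isSeparated, boolean isRetired) {
        if (isDead) return deadAmount();
        if (isSeparated) return separatedAmount();
        if (isRetired) return retiredAmount();
        return normalAmount();
    }

    double deadAmount() { return 0; }
    double separatedAmount() { return 50; }
    double retiredAmount() { return 80; }
    double normalAmount() { return 100; }
}
```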
This document outlines various refactoring techniques including self-encapsulating fields, replacing data values with objects, changing values to references, changing references to values, replacing arrays with objects, duplicating observed data, changing unidirectional associations to bidirectional, and changing bidirectional associations to unidirectional. Each technique is explained with examples to illustrate how it can be implemented in code.
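A compact sketch of Replace Data Value with Object from the list above: a bare string field that needs behavior becomes its own class. The Order/Customer example mirrors the book's, simplified here.

```java
// Before: Order stored the customer as a plain String, so any behavior
// attached to customers had nowhere natural to live.
class Customer {
    private final String name;

    Customer(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

class Order {
    private Customer customer;

    Order(String customerName) {
        // The constructor keeps its old String signature, so callers
        // are unaffected while Customer can now grow behavior.
        this.customer = new Customer(customerName);
    }

    String getCustomerName() {
        return customer.getName();
    }
}
```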
This document discusses various refactoring techniques for improving the design of existing code. It describes 9 techniques: Extract Method, Inline Method, Inline Temp, Replace Temp with Query, Introduce Explaining Variable, Split Temporary Variable, Remove Assignments to Parameters, Replace Method with Method Object, and Substitute Algorithm. For each technique, it provides the motivation and mechanics for implementing the refactoring.
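As an example of two of the nine techniques, the sketch below applies Extract Method and Replace Temp with Query to a hypothetical billing routine.

```java
class Billing {
    private double quantity = 3;
    private double itemPrice = 20;

    // Before: one long method computed basePrice into a temp variable and
    // mixed the discount logic inline.
    double total() {
        return basePrice() - discount();  // each step now reads as a named query
    }

    // Replace Temp with Query: the former basePrice temp becomes a method.
    private double basePrice() {
        return quantity * itemPrice;
    }

    // Extract Method: the discount computation gets its own named home.
    private double discount() {
        return basePrice() > 50 ? basePrice() * 0.05 : 0;
    }
}
```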
This document discusses principles of refactoring and code smells. It covers reasons for refactoring like improving design and making code more understandable. It provides guidelines on when to refactor, such as after adding features or fixing bugs. Common code smells are also explained like duplicated code, long methods, large classes, and primitive obsession. The document gives refactoring techniques to address each smell, such as extracting methods and classes. It emphasizes that refactoring improves code quality without changing external behavior.