ProxySQL on Kubernetes (René Cannaò): Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. It allows you to group hosts, schedule containers, enable communication between containers, associate containers with storage, and ensure high availability and scalability. The demo uses Minikube to run a single-node Kubernetes cluster locally, installs the Helm package manager, and deploys a MySQL database cluster on Kubernetes with replication and load balancing using Helm charts. It also shows how to connect to and upgrade the MySQL deployment.
ProxySQL High Availability and Configuration Management Overview (René Cannaò): The document provides an overview of high-availability and configuration-management options for ProxySQL. It discusses deploying ProxySQL locally on application servers, in a dedicated layer, or using both approaches. When deploying in a dedicated layer, options for high availability include keepalived, load balancers, Consul, and Kubernetes. Configuration can be managed through tools like Ansible or Puppet, or by loading SQL files. ProxySQL Cluster enables syncing configuration across nodes.
MySQL Shell for DBAs (Frederic Descamps): The document discusses MySQL Shell and how it can help database administrators (DBAs) with common tasks like deploying architectures, preparing upgrades, dumping and loading data, and managing users. MySQL Shell provides tools like the Admin API for configuring MySQL clusters and replicasets, an upgrade checker utility to validate upgrades to MySQL 8.0, and parallel dump and load functionality to back up, migrate, and reset data.
Containerd Internals: Building a Core Container Runtime (Phil Estes): This talk discusses the architecture and internals of containerd. It provides a brief history of containerd and explains its goals of providing a clean API, full OCI support, and decoupled components. It describes containerd's components, such as runtimes, storage, and snapshots, and then explains the processes of pulling an image, starting a container, and getting Prometheus metrics.
Running MariaDB in multiple data centers (MariaDB plc): The document discusses running MariaDB across multiple data centers. It begins by outlining the need for multi-datacenter database architectures to provide high availability, disaster recovery, and continuous operation. It then describes topology choices for different use cases, including traditional disaster recovery and geo-synchronous distributed architectures, and how technologies like MariaDB master/slave replication and Galera Cluster work. The rest of the document covers the key questions to answer when designing a multi-datacenter topology, the trade-offs to consider, the architecture technologies, and the pros and cons of different approaches.
Proxysql use case scenarios fosdem17 (Alkin Tezuysal): This document provides an overview of use cases for the ProxySQL database proxy. It discusses how ProxySQL can be used to:
1. Improve scalability through features like connection pooling, read/write splitting, and sharding.
2. Enhance high availability with seamless failover, load balancing, and cluster awareness.
3. Enable advanced query capabilities such as caching, rewriting, blocking, and routing.
4. Provide manageability tools for authentication, runtime configuration, and monitoring.
The document describes several specific scenarios where ProxySQL can optimize operations, help solve performance issues, and empower database administrators. It also outlines how ProxySQL has been tested at large scale, supporting millions of …
Airflow Best Practises & Roadmap to Airflow 2.0 (Kaxil Naik): This document provides an overview of new features in Airflow 1.10.8/1.10.9 and best practices for writing DAGs and configuring Airflow for production. It also outlines the roadmap for Airflow 2.0, including DAG serialization, a revamped real-time UI, a production-grade modern API, official Docker/Helm support, and scheduler improvements. The document aims to help users understand recent Airflow updates and plan their migration to version 2.0.
What to Expect From Oracle database 19c (Maria Colgan): The Oracle Database has recently switched to an annual release model, and Oracle Database 19c is only the second release in this new model. So what can you expect from the latest version of the Oracle Database? This presentation explains how Oracle Database 19c is really 12.2.0.3, the terminal release of the 12.2 family, and the new features you can find in this release.
Oracle RAC 19c and Later - Best Practices #OOWLON (Markus Michalewicz): This version of "Oracle Real Application Clusters (RAC) 19c & Later – Best Practices" was first presented at Oracle Open World (OOW) London 2020 and includes content from the OOW 2019 version of the deck. The deck has been updated with the latest information regarding ORAchk as well as upgrade tips & tricks.
MapReduce (Tilani Gunawardena, PhD (UNIBAS), BSc (Pera), FHEA (UK), CEng, MIESL): This document describes the MapReduce programming model for processing large datasets in a distributed manner. MapReduce allows users to write map and reduce functions that are automatically parallelized and run across large clusters. The input data is split and the map tasks run in parallel, producing intermediate key-value pairs. These are shuffled and fed to the reduce tasks, which produce the final output. The system handles failures, scheduling, and parallelization transparently, making it easy for programmers to write distributed applications.
How to set up orchestrator to manage thousands of MySQL servers (Simon J Mudd): This document discusses how to scale Orchestrator to manage thousands of MySQL servers, using Booking.com's deployment as the example. As the number of monitored servers grows, integration with internal infrastructure becomes necessary, Orchestrator performance must be optimized, and high-availability and wider user-access features are added. The document provides examples of configuration settings and the special considerations needed to use Orchestrator effectively at large scale.
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK) (Jean-François Gagné): To get better replication speed and less lag, MySQL implements parallel replication in the same schema, also known as LOGICAL_CLOCK. But fully benefiting from this feature is not as simple as just enabling it.
In this talk, I explain in detail how this feature works. I also cover how to optimize parallel replication and the improvements made in MySQL 8.0 and back-ported in 5.7 (Write Sets), greatly improving the potential for parallel execution on replicas (but needing RBR).
Come to this talk to get all the details about MySQL 5.7 and 8.0 Parallel Replication.
Change Data Streaming Patterns for Microservices With Debezium (confluent): (Gunnar Morling, Red Hat) Kafka Summit SF 2018
Debezium (noun | de·be·zi·um | /dɪ:ˈbɪ:ziːəm/): secret sauce for change data capture (CDC), streaming changes from your datastore, which enables you to solve multiple challenges: synchronizing data between microservices, gradually extracting microservices from existing monoliths, maintaining different read models in CQRS-style architectures, updating caches and full-text indexes, and feeding operational data to your analytics tools.
Join this session to learn what CDC is about, how it can be implemented using Debezium, an open-source CDC solution based on Apache Kafka, and how it can be utilized for your microservices. Find out how Debezium captures all the changes from datastores such as MySQL, PostgreSQL and MongoDB, how to react to the change events in near real time, and how Debezium is designed to not compromise on data correctness and completeness even when things go wrong. In a live demo we’ll show how to set up a change data stream out of your application’s database without any code changes needed. You’ll see how to sink the change events into other databases and how to push data changes to your clients using WebSockets.
Distributed fun with etcd (Abdulaziz AlMalki): etcd is a distributed key-value store that provides a reliable way to store data across a cluster of machines.
How I learned to time travel, or, data pipelining and scheduling with Airflow (PyData): This document describes how the author learned to use Airflow for data pipelining and scheduling. It covers early tools like Cron and Luigi, then evaluates options like Drake, Pydoit, Pinball, Luigi, and AWS Data Pipeline before settling on Airflow for its sophisticated handling of complex dependencies, built-in scheduling and monitoring, and flexibility. The author also develops a plugin called smart-airflow that adds file-based checkpointing to Airflow to track intermediate data transformations.
From my sql to postgresql using kafka+debezium (Clement Demonchy): An experience report (REX) on how Jobteaser got rid of an old data-dump job using change data capture (CDC) with Debezium and Kafka.
Deploying Kafka Streams Applications with Docker and Kubernetes (confluent): (Gwen Shapira + Matthias J. Sax, Confluent) Kafka Summit SF 2018
Kafka Streams, Apache Kafka’s stream processing library, allows developers to build sophisticated stateful stream processing applications which you can deploy in an environment of your choice. Kafka Streams is not only scalable but fully elastic, allowing for dynamic scale-in and scale-out as the library handles state migration transparently in the background. By running Kafka Streams applications on Kubernetes, you will be able to use Kubernetes’ powerful control plane to standardize and simplify application management, from deployment to dynamic scaling.
In this technical deep dive, we’ll explain the internals of dynamic scaling and state migration in Kafka Streams. We’ll then show, with a live demo, how a Kafka Streams application can run in a Docker container on Kubernetes and the dynamic scaling of an application running in Kubernetes.
The automation challenge: Kubernetes Operators vs Helm Charts (Ana-Maria Mihalceanu): The document compares Kubernetes operators with Helm charts for automating application deployment and management. It provides an overview of Helm charts and their advantages, such as parameterization and template-based configuration. Operators are presented as packaging human operational knowledge, with benefits like automatic tooling and security. The document also demonstrates publishing Helm charts to a repository and generating an operator from a Helm chart, and concludes that operators are better for stateful applications requiring high availability, while Helm is suitable for generic application deployments.
Optimizing MariaDB for maximum performance (MariaDB plc): When it comes to optimizing the performance of a database, DBAs have to look at everything from the OS to the network. In this session, MariaDB Enterprise Architect Manjot Singh shares best practices for getting the most out of MariaDB. He highlights recommended OS settings, important configuration and tuning parameters, options for improving replication and clustering performance, and features such as query result caching.
Getting up to Speed with MirrorMaker 2 (Mickael Maison, IBM & Ryanne Dolan) K... (HostedbyConfluent): More and more enterprises are relying on Apache Kafka to run their businesses. Cluster administrators need the ability to mirror data between clusters to provide high availability and disaster recovery.
MirrorMaker 2, released recently as part of Kafka 2.4.0, allows you to mirror multiple clusters and create many replication topologies. Learn all about this awesome new tool and how to reliably and easily mirror clusters.
We will first describe how MirrorMaker 2 works, including how it addresses all the shortcomings of MirrorMaker 1. We will also cover how to decide between its many deployment modes. Finally, we will share our experience running it in production as well as our tips and tricks to get a smooth ride.
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang (Databricks): As a general computing engine, Spark can process data from various data management/storage systems, including HDFS, Hive, Cassandra and Kafka. For flexibility and high throughput, Spark defines the Data Source API, which is an abstraction of the storage layer. The Data Source API has two requirements:
1) Generality: support reading/writing most data management/storage systems.
2) Flexibility: customize and optimize the read and write paths for different systems based on their capabilities.
Data Source API V2 is one of the most important features coming with Spark 2.3. This talk dives into the design and implementation of Data Source API V2, with comparison to the Data Source API V1. We also demonstrate how to implement a file-based data source using the Data Source API V2 to show its generality and flexibility.
Dask: Scaling Python (Matthew Rocklin): Dask is a Python library for parallel computing that allows users to scale existing Python code to larger datasets and clusters. It provides parallelized versions of NumPy, Pandas, and Scikit-Learn with the same interfaces as the originals. Dask can parallelize existing Python code with minimal changes, and it supports scaling computations from a single multicore machine to large clusters with thousands of nodes. Dask's task-scheduling approach makes it more flexible than other parallel frameworks and lets it support complex computations and real-time workloads.
MariaDB Administrator Training (Sangmo Kim):
- Introduction to MariaDB
- Understanding MariaDB server configuration and architecture
- MariaDB storage engines
- MariaDB database administration
- Understanding transactions / locking
- MariaDB security
- Database administration through backup and recovery
- MariaDB upgrades
- MariaDB monitoring
- Migrating from MySQL to MariaDB
Faster, better, stronger: The new InnoDB (MariaDB plc): For MariaDB Enterprise Server 10.5, the default transactional storage engine, InnoDB, has been significantly rewritten to improve the performance of writes and backups. We also removed a number of parameters to reduce unnecessary complexity, not only in terms of configuration but of the code itself. And finally, we improved crash recovery thanks to better consistency checks, and we reduced memory consumption and file I/O thanks to an all-new log record format.
In this session, we’ll walk through all of the improvements to InnoDB, and dive deep into the implementation to explain how these improvements help everything from configuration and performance to reliability and recovery.
Apache Flink internals (Kostas Tzoumas): This document provides an overview of Apache Flink internals. It begins with an introduction and a recap of Flink programming concepts, then discusses how Flink programs are compiled into execution plans and executed in a pipelined fashion, as opposed to being executed eagerly like regular code. The document outlines Flink's architecture, including the optimizer, runtime environment, and data storage integrations. It also covers iterative processing and how Flink handles iterations both by unrolling loops and with native iterative datasets.
Tips & Tricks for Apache Kafka® (confluent): Kat Grigg, Confluent, Senior Customer Success Architect + Jen Snipes, Confluent, Senior Customer Success Architect
This presentation will cover tips and best practices for Apache Kafka. In this talk, we will cover the basic internals of Kafka and how its components integrate, including brokers, topics, partitions, consumers and producers, replication, and ZooKeeper. We will also talk about the major categories of operations you need to set up and monitor, including configuration, deployment, maintenance, monitoring, and debugging.
https://www.meetup.com/KafkaBayArea/events/270915296/
Group Replication in MySQL 8.0 ( A Walk Through ) (Mydbops): This presentation provides an overview of Group Replication in MySQL 8.0, describing the primary election algorithm and the replication modes.
www.mydbops.com
Gpdb best practices v a01 20150313 (Sanghee Lee): A Korean translation of the Pivotal Greenplum Database best-practices documentation. It organizes the points that should be considered at least once when building a GPDB system. http://gpdb.docs.pivotal.io/gpdb-434.html
steeleye Replication (싵Ә시큐리티): Provides not only H/A clusters using shared storage but also shared-nothing H/A clusters using replication. Offers built-in application-aware high-availability functions, in-depth monitoring that double-checks the database, and support for 30 major applications.
Configuring a high-performance SQL database on AWS for game services (Jeonghun Lee, Solutions Architect, AWS) :: Gaming on AWS 2018 (Amazon Web Services Korea): In a game service architecture, the relational database is a core component and often becomes the performance bottleneck of the entire service. This session explains configurations that achieve DB performance equal to or better than an existing on-premises environment when building a game service on AWS, and demonstrates the performance of an MS SQL configuration.
System Capa Planning_DBA oracle edu (EXEM): This document provides recommendations for system capacity planning for an Oracle database:
- Plan for 1 CPU per 200 concurrent users, and prefer more medium-speed CPUs over fewer fast ones.
- Reserve 10% of memory for the operating system; allocate 220 MB for the Oracle SGA and 3 MB per user process.
- Use striped and mirrored or striped with parity RAID for disks. Consider raw devices or SANs if possible.
- Ensure the network capacity is adequate based on site size.
Making Use of the KEEP Buffer

Jeongmin Jang, DB Consulting Team, Consulting Division, EXEM Co., Ltd.
Overview
To process user requests quickly, Oracle uses a structure called the buffer cache. The buffer cache resides in the SGA and is shared by every process that connects to the Oracle instance. It is the core of Oracle I/O management: by keeping the blocks of frequently used data files resident in memory, it reduces physical I/O operations. Used effectively, the buffer cache cuts physical I/O and thereby naturally resolves I/O performance problems.

As Oracle versions advanced, the algorithms that manage the buffer cache were continuously improved and new management methods were provided. Up to Oracle 7 the buffer cache operated as a single pool, which made it difficult to use it differently according to the characteristics of each object. To address this, Oracle 8 introduced the multiple buffer pool feature, which made it possible to manage the buffer cache at a finer granularity, taking into account each object's characteristics, access frequency, and other distinguishing traits.

The KEEP buffer is one of the areas that make up this multiple buffer pool.
Purpose and Characteristics of the KEEP Buffer
The purpose of the KEEP buffer, like that of the buffer cache in general, is to avoid physical I/O by keeping objects resident in memory.

The KEEP buffer pool and the default buffer pool do not use different algorithms for caching or flushing data blocks. The reason the once-single buffer cache area was nonetheless divided into multiple buffer pools is to allow the buffer cache to be used according to each object's characteristics.
The KEEP buffer's memory space is managed sequentially. A kept segment stays in memory until the KEEP buffer's space has been fully allocated; once it has, the oldest blocks are pushed out to the default pool. You must therefore calculate the sizes of the segments to be kept accurately and allocate an appropriately sized KEEP buffer.
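Before sizing the KEEP buffer, it helps to add up the candidate segments. A minimal sketch, assuming the candidates are the T1 table and T1_PK index owned by SCOTT (placeholder names):

SELECT ROUND( SUM( bytes ) /1024 /1024 ) AS total_mb -- memory the candidates need, in MB
FROM dba_segments
WHERE owner = 'SCOTT'
AND segment_name IN ( 'T1' , 'T1_PK' );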
Steps for Using the KEEP Buffer

Using the KEEP buffer proceeds in the following order:

1. Select the KEEP candidates
2. Check OS memory & SGA space
3. Configure the KEEP buffer
4. Change the table/index attributes
5. Keep the tables/indexes
6. Check KEEP efficiency
7. KEEP candidate selection criteria
Selecting KEEP Candidates

There is no clear-cut rule for selecting KEEP candidates; you must select candidates suited to each database's operating environment in light of the actual business workload. If the blocks you keep are used very frequently, they would likely be cached even in the default buffer, and in that case using the KEEP buffer brings no performance benefit. Conversely, if you place rarely used blocks in the KEEP buffer, you are holding memory that goes unused, which can itself degrade overall performance.

Considering the actual business workload is therefore essential when selecting KEEP candidates. There may be, for example, a program that runs only once a day but whose elapsed time must be shortened by any means, or a program that must run during the busiest hours but whose heavy I/O creates a bottleneck that hurts the whole system. The objects such programs use can be KEEP candidates. However, not every segment used by such programs can be kept in the KEEP buffer, so the segments to keep must be chosen with each database's operating environment in mind. Beyond this, segments can also be suitable for keeping depending on their size, DML frequency, and data access frequency.
Selection criterion [1]: Program importance

The most important factor in selecting KEEP candidates is the importance of the business program that reads the segment. If that program is not important, there is no particular need to use the KEEP buffer. Conversely, if the program reading the segment is highly important and its elapsed time must be shortened by any means, the segment can be considered for keeping regardless of how often the program runs.
Selection criterion [2]: Segment size

A segment whose size is not stable and grows excessively can reduce the KEEP buffer's efficiency. When the KEEP buffer runs out of space, the oldest blocks are pushed out to the default buffer, so a continuously growing segment in the KEEP buffer can degrade the performance of programs that read the other kept segments. It is therefore advisable to select segments whose size is constant, or fluctuates little, and whose maximum size stays below a certain level. For example, you could set a rule such as 'segments whose maximum size is under 100,000 blocks'.
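A candidate's current size in blocks can be checked directly against such a threshold; a quick sketch, again with SCOTT and T1 as placeholder names:

SELECT owner , segment_name , blocks
FROM dba_segments
WHERE owner = 'SCOTT'
AND segment_name = 'T1';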
Selection criterion [3]: Full Table Scan & Index Full Scan & Index Fast Full Scan

The benefit of reading a kept segment from the KEEP buffer is greatest when a fairly large amount of data must be processed. Segments processed by wide-ranging, inefficient index scans, by Full Table Scan, or by Index Fast Full Scan can be candidates.
SQL script for selecting KEEP candidates
SELECT owner ,
table_name ,
index_name ,
partition_name ,
SUM( blocks ) AS t_blocks
FROM (
SELECT sg.owner ,
decode( SUBSTR( s.ob_type , 1 , 5 ) , 'TABLE' , s.ob_name , 'INDEX' , (
SELECT table_name
FROM dba_indexes
WHERE index_name = s.ob_name
) ) AS table_name ,
decode( SUBSTR( s.ob_type , 1 , 5 ) , 'INDEX' , s.ob_name ) AS index_name ,
sg.partition_name ,
sg.blocks
FROM (
SELECT DISTINCT object_name AS ob_name ,
object_type AS ob_type
FROM v$sql_plan
WHERE ( operation = 'TABLE ACCESS'
AND options = 'FULL' )
OR ( operation = 'INDEX'
AND options = 'FULL SCAN' )
OR ( operation = 'INDEX'
AND options = 'FAST FULL SCAN' ) --> selection criterion [3]
) s ,
dba_segments sg
WHERE s.ob_name = sg.segment_name
)
GROUP BY owner ,
table_name ,
index_name ,
partition_name
HAVING SUM( blocks ) > 100000 --> selection criterion [2]
Checking OS Memory & SGA Space

OS memory check
$ cat /proc/meminfo
MemTotal: 4055152 kB
MemFree: 1390308 kB
Buffers: 166768 kB
Cached: 2019992 kB
SwapCached: 0 kB
Active: 1118484 kB
Inactive: 1277864 kB
………
SGA space check SQL scripts

● Check the total SGA size
SELECT name ,
ROUND( bytes/1024/1024 ) "size(MB)"
FROM V$SGAINFO;
NAME size(MB)
------------------------------- ----------
Fixed SGA Size 2
Redo Buffers 5
Buffer Cache Size 48
Shared Pool Size 128
Large Pool Size 0
Java Pool Size 24
Streams Pool Size 0
Shared IO Pool Size 0
Granule Size 4
Maximum SGA Size 207
Startup overhead in Shared Pool 72
Free SGA Memory Available 0
● Check the data buffer size
SELECT name ,
current_size
FROM v$buffer_pool;
NAME CURRENT_SIZE
-------------------- ------------
DEFAULT 48
Configuring the KEEP Buffer

Depending on the KEEP buffer's size and the amount of free SGA space, the KEEP buffer is configured either online or offline. The scripts in this document were written assuming that SGA memory is managed manually.

KEEP buffer configuration script
@sga
NAME size(MB)
-------------------------------- ----------
Buffer Cache Size 500
Maximum SGA Size 1019
Free SGA Memory Available 228
@bc
NAME CURRENT_SIZE
-------------------- ------------
DEFAULT 500
● When the KEEP buffer is smaller than the free SGA space: online (KEEP buffer 100M)
SQL> alter system set db_keep_cache_size = 100M scope = both;
System altered.
SQL> @sga
NAME size(MB)
-------------------------------- ----------
Buffer Cache Size 600
Maximum SGA Size 1019
Free SGA Memory Available 128
SQL> @bc
NAME CURRENT_SIZE
-------------------- ------------
KEEP 100
DEFAULT 500
● When the KEEP buffer is larger than the free SGA space (KEEP buffer 300M)

1. Increase the total SGA size, then allocate the KEEP buffer (offline work required)
SQL> alter system set sga_max_size = 1100M scope = spfile;
System altered.
SQL> alter system set db_keep_cache_size = 300M scope = spfile;
System altered.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 1169227776 bytes
Fixed Size 2212696 bytes
Variable Size 301993128 bytes
Database Buffers 855638016 bytes
Redo Buffers 9383936 bytes
Database mounted.
Database opened.
SQL> @sga
NAME size(MB)
-------------------------------- ----------
Buffer Cache Size 816
Maximum SGA Size 1115
Free SGA Memory Available 0
SQL> @bc
NAME CURRENT_SIZE
-------------------- ------------
KEEP 304
DEFAULT 512
=> Because the total SGA size now exceeds 1 GB, the granule size has grown to 16 MB.
In that case the smallest multiple of 16 MB greater than the requested value is allocated: the 300M request rounds up to 19 granules x 16 MB = 304M, the KEEP size shown above.

2. If you shrink other SGA areas first and then allocate the KEEP buffer, the work can be done online

Script to list the parameters that can be changed online
SELECT name ,
issys_modifiable
FROM v$parameter
WHERE name LIKE '%size%'
AND issys_modifiable = 'IMMEDIATE'
NAME ISSYS_MOD
--------------------------------------- ---------
shared_pool_size IMMEDIATE
large_pool_size IMMEDIATE
java_pool_size IMMEDIATE
streams_pool_size IMMEDIATE
db_cache_size IMMEDIATE
db_2k_cache_size IMMEDIATE
db_4k_cache_size IMMEDIATE
db_8k_cache_size IMMEDIATE
db_16k_cache_size IMMEDIATE
db_32k_cache_size IMMEDIATE
db_keep_cache_size IMMEDIATE
db_recycle_cache_size IMMEDIATE
_shared_io_pool_size IMMEDIATE
db_flash_cache_size IMMEDIATE
db_recovery_file_dest_size IMMEDIATE
result_cache_max_size IMMEDIATE
workarea_size_policy IMMEDIATE
max_dump_file_size IMMEDIATE
=> Adjust these parameter values appropriately to secure free memory, then configure KEEP.
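A sketch of that online route (the figures are illustrative; they assume the shared pool, shown at 128M earlier in this document, really has 100M to spare):

SQL> alter system set shared_pool_size = 28M scope = both; -- free 100M from the shared pool
SQL> alter system set db_keep_cache_size = 100M scope = both; -- allocate the freed memory to the KEEP buffer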
Changing Table/Index Attributes

Table/index attribute change script
ALTER TABLE T1 STORAGE (BUFFER_POOL KEEP); -- change a table attribute
ALTER INDEX T1_PK STORAGE (BUFFER_POOL KEEP); -- change an index attribute
ALTER TABLE P1 MODIFY PARTITION P1_1 STORAGE (BUFFER_POOL KEEP); -- change a partitioned table attribute
ALTER INDEX P1_ID1 MODIFY PARTITION P1_ID1_1 STORAGE (BUFFER_POOL KEEP); -- change a partitioned index attribute
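After changing the attributes, the assignments can be verified in dba_segments:

SELECT owner , segment_name , segment_type , buffer_pool
FROM dba_segments
WHERE buffer_pool = 'KEEP';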
Keeping Tables/Indexes

For tables and indexes whose segment buffer pool is set to KEEP, queries load the segment's blocks into the KEEP buffer. Disk I/O therefore still occurs the first time a segment is loaded. If you want to eliminate disk I/O from every application query, including the very first execution, load the segments into the KEEP buffer with a Full Table Scan or an Index Fast Full Scan before business processing begins.
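For example, the segments kept above could be pre-loaded as follows (a sketch using the placeholder names T1 and T1_PK; COUNT(*) can be served by the fast full scan because the primary key column is NOT NULL):

SQL> select /*+ full(t) */ count(*) from t1 t; -- Full Table Scan loads T1's blocks
SQL> select /*+ index_ffs(t t1_pk) */ count(*) from t1 t; -- Index Fast Full Scan loads T1_PK's blocks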
Evaluating KEEP Buffer Efficiency

There is no fixed standard for using the KEEP buffer; it differs with the operating environment, so it is unreasonable to judge efficiency the same way everywhere. The following data, however, can serve as a basis for judging the KEEP buffer's efficiency.

KEEP buffer size & hit ratio SQL scripts
SELECT current_size keep_size ,
seg_size ,
ROUND( seg_size/current_size*100 , 1 ) "Ratio(%)"
FROM v$buffer_pool ,
(
SELECT SUM( bytes ) /1024 /1024 seg_size
FROM dba_segments
WHERE buffer_pool = 'KEEP'
)
WHERE name = 'KEEP'
KEEP_SIZE SEG_SIZE Ratio(%)
---------- ---------- ----------
304 118 38.8
SELECT db_block_gets ,
consistent_gets ,
physical_reads ,
CASE
WHEN db_block_gets+consistent_gets <> 0
THEN ROUND(( 1-( physical_reads/( db_block_gets+consistent_gets ) ) ) *100 , 2 )
END "Keep_Hit(%)"
FROM v$buffer_pool_statistics
WHERE name = 'KEEP'
DB_BLOCK_GETS CONSISTENT_GETS PHYSICAL_READS Keep_Hit(%)
------------- --------------- -------------- -----------
0 44474 4435 90.03
If the total size of the segments using the KEEP buffer is smaller than the KEEP buffer itself, and those segments grow no further, then once a segment has been loaded into the KEEP area its cache hit ratio will approach 100%.

If, on the other hand, the KEEP buffer is smaller than the segments that use it, contention arises in the KEEP buffer area, physical I/O follows, and the cache hit ratio can drop. In that case, consider enlarging the KEEP buffer or excluding lower-priority segments from it. Conversely, if the KEEP buffer is much larger than the segments, it is occupying memory that goes unused, so consider shrinking it. The KEEP buffer's size therefore needs to be tuned to system performance and the importance of the segments.
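Both adjustments can be made online; a sketch (the 400M figure and the table name T2 are illustrative):

SQL> alter system set db_keep_cache_size = 400M scope = both; -- grow the KEEP buffer (requires free SGA)
SQL> alter table t2 storage (buffer_pool default); -- stop keeping a lower-priority segment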
By querying the dba_hist_seg_stat view with the following script, you can check the AWR data on how much I/O is generated when each segment is read, and judge the efficiency of each kept segment.

Segment I/O SQL script
accept i_begin_time prompt 'Enter begin time[YYYYMMDDHH24]: '
accept i_end_time prompt 'Enter end time[YYYYMMDDHH24]: '
variable v_begin_time char(10)
variable v_end_time char(10)
exec :v_begin_time:=&i_begin_time
exec :v_end_time :=&i_end_time
SELECT /*+ leading(k) */
s.dbid ,
decode( SUBSTR( o.object_type , 1 , 5 ) , 'TABLE' , o.object_name , 'INDEX' , (
SELECT table_name
FROM dba_indexes
WHERE index_name = o.object_name
AND owner = k.owner
) ) AS table_name ,
decode( SUBSTR( o.object_type , 1 , 5 ) , 'INDEX' , o.object_name ) AS index_name ,
s.snap_id ,
TO_CHAR( w.begin_interval_time , 'yyyymmdd.hh24' ) AS begin_time ,
s.physical_reads_delta ,
s.physical_reads_direct_delta ,
s.physical_reads_delta + s.physical_reads_direct_delta AS total_diskio
FROM sys.wrm$_snapshot w ,
dba_hist_seg_stat s ,
dba_objects o ,
(
SELECT owner ,
segment_name
FROM dba_segments
WHERE buffer_pool = 'KEEP'
) k
WHERE w.begin_interval_time >= to_timestamp( :v_begin_time , 'yyyymmddhh24' )
AND w.end_interval_time <= to_timestamp( :v_end_time , 'yyyymmddhh24' )
AND w.snap_id = s.snap_id
AND w.dbid = s.dbid
AND w.instance_number = s.instance_number
AND s.obj# = o.object_id
AND k.segment_name = o.object_name
AND k.owner = o.owner
ORDER BY 2 , 3 , 5
Conclusion

The most important thing in using the KEEP buffer is reflecting the business workload. From the standpoint of the system as a whole, it may be more efficient to use a single buffer cache in which only the frequently used objects stay resident. But considering the importance and characteristics of the business can lead to a different conclusion: there may be work that runs rarely yet is highly important, and work for which shortening the elapsed time matters greatly. If the KEEP buffer can be used efficiently in an operating plan that reflects these business characteristics, it will be a great help in improving system performance.