The document discusses the process of query compilation in a database management system. It involves 6 main steps: 1) Parsing the SQL query into a parse tree, 2) Converting the parse tree into a logical query plan, 3) Optimizing the logical query plan by applying transformation rules, 4) Estimating the sizes of results from operations in the logical query plan, 5) Generating multiple physical query plans from the logical plan, and 6) Estimating the costs of physical plans and selecting the most efficient plan to execute. The document focuses on relational algebra rules for optimization and techniques for estimating result sizes of operations like selections, joins, and projections.
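The size-estimation step can be illustrated with the textbook estimates commonly used for selections and joins. This is a minimal sketch, assuming uniformly distributed values and hypothetical table statistics; the function names are illustrative and the summary itself does not state which formulas the document uses.

```python
def estimate_equality_selection(tuples: int, distinct_values: int) -> float:
    """Estimate the size of sigma_{a = c}(R) as T(R) / V(R, a).

    Assumes every distinct value of attribute a is equally likely.
    """
    return tuples / distinct_values


def estimate_natural_join(t_r: int, v_r: int, t_s: int, v_s: int) -> float:
    """Estimate the size of R join S on y as T(R) * T(S) / max(V(R, y), V(S, y)).

    Assumes the smaller set of join values is contained in the larger one.
    """
    return t_r * t_s / max(v_r, v_s)


if __name__ == "__main__":
    # Hypothetical statistics, chosen only for illustration.
    print(estimate_equality_selection(10_000, 50))       # 200.0
    print(estimate_natural_join(1_000, 20, 2_000, 50))   # 40000.0
```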
An array is a list of homogeneous (similar data type) elements stored in contiguous memory locations. Elements are accessed via an index. One dimensional arrays use a single subscript, while two dimensional arrays use two subscripts to reference an element. Arrays can be initialized during declaration. The size of an array is calculated as upper bound - lower bound + 1. Elements of arrays can be traversed, inserted, or deleted. Two dimensional arrays can be stored in row major or column major order, affecting the calculation of element addresses. Multidimensional arrays generalize this to any number of dimensions.
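To make the effect of storage order concrete, the sketch below computes the address of element (i, j) of a two-dimensional array under both layouts. The base address, element size, and bounds are hypothetical parameters chosen for illustration.

```python
def row_major_address(base, i, j, n_cols, elem_size, lower_i=0, lower_j=0):
    """Address of A[i][j] when rows are stored one after another."""
    return base + ((i - lower_i) * n_cols + (j - lower_j)) * elem_size


def column_major_address(base, i, j, n_rows, elem_size, lower_i=0, lower_j=0):
    """Address of A[i][j] when columns are stored one after another."""
    return base + ((j - lower_j) * n_rows + (i - lower_i)) * elem_size


if __name__ == "__main__":
    # A 3 x 4 array of 4-byte elements starting at address 1000.
    print(row_major_address(1000, 1, 2, n_cols=4, elem_size=4))     # 1024
    print(column_major_address(1000, 1, 2, n_rows=3, elem_size=4))  # 1028
```

The same element lands at different addresses because row-major order skips whole rows before it, while column-major order skips whole columns.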
The document discusses key concepts in calculus including:
- Differential calculus examines how quantities change by looking at their rates of change, represented by derivatives.
- Integration is used to determine quantities like material needs or structure weights by calculating the area under a curve.
- Calculus has various applications in fields like engineering, physics, and robotics where quantities change continuously over time.
- The document provides examples of how differential and integral calculus are used in applications such as space travel planning, architecture, and robotics.
t-tests in R - Lab slides for UGA course FANR 6750 (richardchandler)
This document outlines a lab on summary statistics, graphics, and t-tests in R. It introduces topics like importing data, creating graphics like boxplots and histograms, and performing different types of t-tests (two-sample t-test assuming equal variances, paired t-test, equality of variances test) to compare two samples and determine if they came from the same population. Exercises are provided to have students practice these skills, and an assignment asks them to write an R script to conduct and comment on the results of various t-tests.
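The lab itself uses R, where the built-in `t.test` function covers these cases. As a language-neutral illustration of what the equal-variance two-sample test computes, here is a minimal Python sketch of the pooled t statistic. The function name and sample data are hypothetical, and the p-value lookup is left out.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a two-sample t-test assuming equal variances."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


if __name__ == "__main__":
    t, df = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
    print(f"t = {t:.3f}, df = {df}")
```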
The aim of this presentation is to review functional regression models with scalar response (linear, nonlinear and semilinear) and their extension to the more general case where the response belongs to the exponential family (binomial, Poisson, gamma, ...). This extension makes it possible to develop new functional classification methods based on these regression models. Some examples, along with code implementations in R, are provided during the talk. Lecturer: Manuel Febrero Bande, Univ. de Santiago de Compostela, Spain.
This document discusses OLAP functions in Informix 12.1. It provides an overview of OLAP and what it is used for in business intelligence. It then describes the OVER clause and how it defines the domain of OLAP function calculation using optional PARTITION BY, ORDER BY, and WINDOW FRAME clauses. Several examples of ranking, aggregation, and analytic OLAP functions like RANK, SUM, and LAG are shown. The document concludes by noting how OLAP functions can be accelerated by the Informix Warehouse Accelerator.
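To show what a ranking function computes over a partition, the sketch below emulates `RANK() OVER (PARTITION BY ... ORDER BY ... DESC)` in plain Python. It is not Informix code; the helper name and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def rank_over(rows, partition_key, order_key):
    """Emulate RANK() OVER (PARTITION BY partition_key ORDER BY order_key DESC)."""
    out = []
    ordered = sorted(rows, key=itemgetter(partition_key))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        part = sorted(group, key=itemgetter(order_key), reverse=True)
        rank, prev = 0, object()  # sentinel that equals no real value
        for pos, row in enumerate(part, start=1):
            if row[order_key] != prev:
                # Ties share a rank; the next distinct value skips ahead.
                rank, prev = pos, row[order_key]
            out.append({**row, "rank": rank})
    return out


if __name__ == "__main__":
    sales = [
        {"dept": "a", "sales": 10},
        {"dept": "a", "sales": 30},
        {"dept": "a", "sales": 30},
        {"dept": "b", "sales": 5},
    ]
    for row in rank_over(sales, "dept", "sales"):
        print(row)
```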
This document discusses algorithms and data structures. It begins by defining an algorithm as a set of instructions to accomplish a task and lists criteria such as being unambiguous and terminating. Data types and abstract data types are introduced. Methods for analyzing programs are covered, including time and space complexity using asymptotic notation. Examples are provided to illustrate iterative and recursive algorithms for summing lists as well as matrix operations.
2014-06-20 Multinomial Logistic Regression with Apache Spark (DB Tsai)
Logistic regression can be used to model not only binary outcomes but also, with some extension, multinomial outcomes. In this talk, DB will walk through the basic idea of binary logistic regression step by step and then extend it to the multinomial case. He will show how easy it is with Spark to parallelize this iterative algorithm by using the in-memory RDD cache to scale horizontally (in the number of training examples). However, there is a mathematical limitation on scaling vertically (in the number of training features), and many recent applications in document classification and computational linguistics are of this type. He will talk about how to address this problem by using the L-BFGS optimizer instead of the Newton optimizer.
Bio:
DB Tsai is a machine learning engineer working at Alpine Data Labs. He has recently been working with the Spark MLlib team to add support for the L-BFGS optimizer and multinomial logistic regression upstream. He also led Apache Spark development at Alpine Data Labs. Before joining Alpine Data Labs, he worked on large-scale optimization of optical quantum circuits at Stanford as a PhD student.
This document discusses balanced binary search trees (BSTs), specifically AVL trees. It explains that AVL trees ensure insertions and deletions are O(log N) by keeping the height difference between left and right subtrees no more than 1. It covers the four types of rotations (LL, RR, LR, RL) used to rebalance the tree after insertions or deletions. Deletions can also cause imbalance and require L, R, L0, L1, R0, R1, L-1, R-1 rotations depending on the balancing factors. Examples are provided for each type of rotation.
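The four insertion rotations can be sketched as follows. This is a minimal illustration covering insertion only (deletion rebalancing is omitted); the class and function names are hypothetical rather than taken from the document.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1


def height(n):
    return n.height if n else 0


def update(n):
    n.height = 1 + max(height(n.left), height(n.right))


def balance(n):
    return height(n.left) - height(n.right)


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def insert(node, key):
    """Insert key and rebalance on the way back up; returns the new subtree root."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node  # duplicate keys are ignored
    update(node)
    b = balance(node)
    if b > 1:
        if key > node.left.key:            # LR case: rotate child left first
            node.left = rotate_left(node.left)
        return rotate_right(node)          # LL case
    if b < -1:
        if key < node.right.key:           # RL case: rotate child right first
            node.right = rotate_right(node.right)
        return rotate_left(node)           # RR case
    return node


if __name__ == "__main__":
    root = None
    for k in range(1, 8):   # sorted input would degenerate a plain BST
        root = insert(root, k)
    print(root.key, root.height)  # 4 3
```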
Efficient Hill Climber for Multi-Objective Pseudo-Boolean Optimization (jfrchicanog)
1) The document proposes an efficient hill climber algorithm for multi-objective pseudo-boolean optimization problems.
2) It computes scores that represent the change in fitness from moving to neighboring solutions, and updates these scores incrementally as the solution moves rather than recomputing from scratch.
3) The scores can be decomposed and updated in constant time by analyzing the variable interaction graph to identify variables that do not interact.
Spark summit talk, July 2014 powered by reveal (Debasish Das)
This document discusses using quadratic programming solvers for non-negative matrix factorization with Spark. It provides an overview of matrix factorization and how NMF can be formulated as a quadratic program. It then describes using ADMM and ECOS to solve the resulting QP, including implementations in Spark. Experimental results on movie recommendation datasets show the performance of different approaches for constraints like positivity, sparsity, and equality constraints. Future work areas include optimization and additional constrained convex minimization problems.
1. A state variable representation uses state variables, inputs, and outputs to model dynamic systems. The state variables provide information about the internal state of the system.
2. The behavior of a system can be described by state equations that relate the time derivatives of the state variables to the inputs, state variables, and outputs.
3. Eigenvalues and eigenvectors, which are derived from state variable models, have many applications including vibration analysis, image recognition, and determining communication channel capacities.
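As a small worked example of how eigenvalues arise from a state variable model, the sketch below computes the eigenvalues of a 2 x 2 state matrix A in the state equation x' = Ax + Bu. The mass-spring-damper values are hypothetical and the helper name is illustrative.

```python
import cmath


def eigenvalues_2x2(a):
    """Eigenvalues of a 2 x 2 matrix from its characteristic polynomial
    lambda^2 - trace * lambda + det = 0."""
    (a11, a12), (a21, a22) = a
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


if __name__ == "__main__":
    # State matrix for x'' + 3x' + 2x = u with states (x, x'); values are illustrative.
    A = [[0, 1], [-2, -3]]
    print(eigenvalues_2x2(A))  # eigenvalues -1 and -2, returned as complex numbers
```

Both eigenvalues have negative real parts, so this example system is stable.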
This document discusses query optimization in database systems. It begins by describing the components of a database management system and how queries are processed. It then explains that the goal of query optimization is to reduce the execution cost of a query by choosing efficient access methods and ordering operations. The document outlines different query plans involving table scans, index scans, and joins. It also introduces concepts like filter factors, statistics about tables and indexes, and how these are used to estimate the cost of alternative query execution plans.
This document discusses randomized data structures and algorithms. It begins by motivating randomized data structures: some structures, such as binary search trees, perform well on average but degrade badly on worst-case inputs. Randomizing the data structure removes the dependency on the input order and gives good expected-case performance. The document then discusses treaps and randomized skip lists as examples of randomized data structures that provide efficient expected-case performance for operations like insertion, deletion, and search. It also covers topics like random number generation, primality testing, and how randomization can transform average-case runtimes into expected-case runtimes.
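Randomized primality testing can be sketched with the Miller-Rabin test. The summary does not say which test the document presents, so treat this as one common choice; the function name and parameters are hypothetical.

```python
import random


def is_probable_prime(n, rounds=20, rng=None):
    """Miller-Rabin test: False means n is composite, True means n is probably prime."""
    rng = rng or random.Random()
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)   # random witness candidate
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False              # a proves that n is composite
    return True


if __name__ == "__main__":
    print(is_probable_prime(97))          # True
    print(is_probable_prime(2**61 - 1))   # True (a Mersenne prime)
    print(is_probable_prime(561))         # False (divisible by 3)
```

Each round wrongly accepts a composite with probability at most 1/4, so the error bound shrinks geometrically with the number of rounds regardless of the input.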
- A parallel search algorithm finds two elements in a sorted array that bracket a query element in logarithmic time using p processors (paragraph 1).
- A parallel merging algorithm uses the parallel search to rank and merge two sorted arrays in optimal O(log log n) time and O(n log log n) work (paragraph 2).
- An efficient sorting algorithm uses the optimal parallel merging algorithm in a merge sort approach to sort n elements in O(n log n) work and O(log n log log n) time (paragraph 3).
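The idea of merging by ranking can be simulated sequentially: each element's final position is its own index plus its rank in the other array, and every position can be computed independently, which is what makes the approach parallel. This sketch uses a plain binary search per element and does not reproduce the O(log log n) search; the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def merge_by_ranking(a, b):
    """Merge two sorted lists by computing each element's output position directly."""
    out = [None] * (len(a) + len(b))
    # Each loop iteration is independent, so a parallel machine could run them all at once.
    for i, x in enumerate(a):
        out[i + bisect_left(b, x)] = x     # ties: elements of a go first
    for j, y in enumerate(b):
        out[j + bisect_right(a, y)] = y    # ties: elements of b go after equal elements of a
    return out


if __name__ == "__main__":
    print(merge_by_ranking([1, 3, 5], [2, 3, 6]))  # [1, 2, 3, 3, 5, 6]
```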
The document discusses query execution in database management systems. It begins with an example query on a City, Country database and represents it in relational algebra. It then discusses different query execution strategies like table scan, nested loop join, sort merge join, and hash join. The strategies are compared based on their memory and disk I/O requirements. The document emphasizes that query execution plans can be optimized for parallelism and pipelining to improve performance.
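Two of the join strategies can be contrasted in a few lines. This is an in-memory sketch on a hypothetical City/Country schema (the column names `country_code`, `code`, and `name` are assumptions); it ignores disk I/O, which is what the document's comparison is actually based on.

```python
from collections import defaultdict


def nested_loop_join(cities, countries):
    """Compare every city with every country: O(|cities| * |countries|) comparisons."""
    return [
        (c["name"], k["name"])
        for c in cities
        for k in countries
        if c["country_code"] == k["code"]
    ]


def hash_join(cities, countries):
    """Build a hash table on the smaller input, then probe it once per city."""
    index = defaultdict(list)
    for k in countries:                      # build phase
        index[k["code"]].append(k)
    return [
        (c["name"], k["name"])
        for c in cities                      # probe phase
        for k in index.get(c["country_code"], [])
    ]


if __name__ == "__main__":
    countries = [{"code": "FR", "name": "France"}, {"code": "JP", "name": "Japan"}]
    cities = [
        {"name": "Paris", "country_code": "FR"},
        {"name": "Osaka", "country_code": "JP"},
        {"name": "Lyon", "country_code": "FR"},
    ]
    print(hash_join(cities, countries))
```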
This document outlines a tutorial agenda that is split into two parts. Part 1 covers getting started with RESTful applications, standard responses, response generators, and exception handlers. Part 2 covers request handlers, jQuery and jQueryUI, and a pair-programming challenge, as well as discussing more advanced topics and answering questions.
Papers presented at the conference GIORNATE DI NEUROPSICOLOGIA DELL'ETÀ EVOLUTIVA (Developmental Neuropsychology Days), held from 20 to 23 January 2016 in Bressanone (BZ) at the Libera Università di Bolzano, Via Ratisbona, 16.
BlackBerry smartphones can send faxes. Online stores offer free shipping within the US and Canada when purchasing mobile phones and accessories. High resolution cameras, web browsing, sharing files and playing music with low battery usage are possible due to mobile phone chips.
Iain Lewis has over 10 years of experience in customer service roles. He currently works as a stage technician and front of house assistant for a theatre group, building and maintaining sets, loading equipment, and attending to patrons. Previously he has worked in animal handling, retail, and as a snowboard rental technician. Lewis has qualifications in musical instrument making, carpentry, and is a certified snowboard instructor. He is looking for a new opportunity that allows him to utilize his problem-solving and customer service skills.
This document describes the development of an autonomous electric car controlled by a microcontroller and Android app. The car can operate in either manual mode, controlled by a user, or autonomous mode, controlled by the microcontroller receiving instructions from the app via Bluetooth. The app allows the user to select the mode and send commands to control the car's movement and sensors. The microcontroller code executes these commands, controlling the car's motors and reading sensor data to enable autonomous operation. The project combined skills in motor control, microcontroller programming, communication protocols, and app development to create a functioning prototype autonomous vehicle.
The document describes the key roles in website development, such as the creative director, the owner, the programmers, and the copywriters. It then classifies websites by audience, dynamism, openness, and depth. Finally, it analyzes the anatomy of the CNN and Coca Cola Colombia pages, describing elements such as the logo, menu, slider, and content.
Online chat is a form of communication that utilizes computer programs that allow for two-way conversations between users in real time (events that occur in cyberspace at the same speed that they would occur in real li
1. The document provides technical information on various types of materials, finishes, fastenings, functions of screw assemblies, tightening types, heads, measurements in inches and metric units, thread tables, and tips on applying thread inserts.
2. It includes details on 10 common types of materials, 6 types of finish, 4 types of fastening, 4 functions of screw assemblies, 6 tightening types, and 7 types of heads.
3
The document discusses the 7 layers of security in a computer security ecosystem according to the OSI model: physical, data link, network, transport, session, presentation, and application. It describes attacks that can occur at each layer and how lower layer security measures like firewalls and intrusion detection systems are not sufficient to prevent application layer attacks. The growth of applications and their vulnerabilities has increased risks to the entire security ecosystem. Implementing application security is necessary to proactively reduce vulnerabilities and better protect the ecosystem.
Seedless vascular plants dominated the landscape about 350 million years ago. They evolved from green algae of the class Charophyceae about 430 million years ago. They included bryophytes, ferns, and tree-like plants that grew in swampy tropical forests during the Carboniferous period.
Printable Templates - Actual Size of Screws, Nuts and Washers (RH Indufix Fixadores)
How many times have you needed to buy a screw, nut, or washer and were not sure of the size, head type, thread pitch, or even the correct application?
To answer these questions, we have prepared a guide with images of fasteners at ACTUAL SIZE which, once printed, can help you work out the size of the item!
Print the page with the template, place your sample screw, nut, or washer over the corresponding image, and identify its dimensions correctly. See how easy it is?
FOR BETTER QUALITY, DOWNLOAD AT THE LINK http://www.catalogo.indufix.com.br/tamanho-real-parafuso-porca-arruela-gabarito-para-imprimir
Market research and proposed strategy for the Brother P-Touch to become an essential tool for kids and parents during the Back-to-school retail season. This work was completed during my time with Concept Farm as an account supervisor and strategist on the brand.
The document discusses query optimization in database management systems. It covers converting SQL queries to logical and physical query plans, improving logical plans through algebraic transformations, and choosing the optimal physical query plan by considering the order of operations and join trees. The goal is to select the most efficient physical plan by estimating the size of relations and intermediate results.
This document provides an overview of digital systems and number representation in digital logic design. It discusses:
- Digital systems take discrete inputs and have discrete internal states to generate discrete outputs.
- Digital systems can be combinational (output depends only on input) or sequential (output depends on input and state). Sequential systems can be synchronous (state updates at clock) or asynchronous.
- Number systems like binary, octal, hexadecimal represent numbers using different radixes or bases. Binary uses two digits (0-1) while octal uses eight and hexadecimal uses sixteen.
- Operations like addition and subtraction can be performed in any number base through appropriate algorithms. Numbers can be converted between bases through division and
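Conversion by repeated division can be sketched as follows: dividing by the target base yields the digits from least to most significant. The function name is hypothetical, and the sketch handles non-negative integers and bases 2 to 16 only.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n, base):
    """Convert a non-negative integer to its representation in the given base (2 to 16)."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)   # the remainder is the next lowest digit
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


if __name__ == "__main__":
    print(to_base(255, 2))    # 11111111
    print(to_base(255, 8))    # 377
    print(to_base(255, 16))   # FF
```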
The document outlines various statistical and data analysis techniques that can be performed in R including importing data, data visualization, correlation and regression, and provides code examples for functions to conduct t-tests, ANOVA, PCA, clustering, time series analysis, and producing publication-quality output. It also reviews basic R syntax and functions for computing summary statistics, transforming data, and performing vector and matrix operations.
The document discusses query optimization in databases. It explains that the goal of query optimization is to determine the most efficient execution plan for a query to minimize the time needed. It outlines the typical steps in query optimization, including parsing/translation, applying relational algebra, and optimizing the query plan. It also discusses techniques like generating alternative execution plans using equivalence rules, estimating plan costs based on statistical data, and using heuristics or dynamic programming to choose the optimal plan.
S1 - Process product optimization using design experiments and response surfa... (CAChemE)
An intensive practical course, mainly for PhD students, on the use of design of experiments (DOE) and response surface methodology (RSM) for optimization problems. The course covers relevant background, nomenclature and general theory of DOE and RSM modelling for factorial and optimisation designs, in addition to practical exercises in Matlab. Due to time limitations, the course concentrates on linear and quadratic models on the k3 design dimension. This course is an ideal starting point for every experimental engineer who wants to work effectively, extract maximal information and predict the future behaviour of their system.
Mikko Mäkelä (DSc, Tech) is a postdoctoral fellow at the Swedish University of Agricultural Sciences in Umeå, Sweden and is currently visiting the Department of Chemical Engineering at the University of Alicante. He is working in close cooperation with Paul Geladi, Professor of Chemometrics, and using DOE and RSM for process optimization mainly for the valorization of industrial wastes in laboratory and pilot scales.
This document discusses the architecture and optimization of database management systems (DBMS). It covers:
1) The main components of a DBMS architecture including the query executor, buffer manager, storage manager, transaction manager, and more.
2) Query optimization techniques including rule-based optimization, cost-based optimization using a dynamic programming algorithm to search the plan space, and reducing the plan space.
3) Cost estimation including estimating selectivity factors, output sizes, and costs of different query execution plans without executing them.
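The dynamic programming search over join orders can be sketched as below. It assumes a deliberately simple cost model (the sum of estimated intermediate result sizes) and restricts the plan space to left-deep orders; the function name, cardinalities, and selectivities are all hypothetical.

```python
from itertools import combinations


def best_left_deep_order(cards, selectivity):
    """Return (cost, size, order) of the cheapest left-deep join order.

    cards: {relation: row count}
    selectivity: {frozenset({a, b}): join selectivity}; missing pairs count as 1.0
    """
    names = sorted(cards)
    # best[subset] = (cost so far, estimated result size, join order)
    best = {frozenset([n]): (0.0, float(cards[n]), (n,)) for n in names}
    for k in range(2, len(names) + 1):
        for subset in map(frozenset, combinations(names, k)):
            for last in sorted(subset):          # relation joined last
                cost, size, order = best[subset - {last}]
                sel = 1.0
                for r in subset - {last}:
                    sel *= selectivity.get(frozenset((r, last)), 1.0)
                new_size = size * cards[last] * sel
                candidate = (cost + new_size, new_size, order + (last,))
                if subset not in best or candidate[0] < best[subset][0]:
                    best[subset] = candidate
    return best[frozenset(names)]


if __name__ == "__main__":
    cards = {"R": 1000, "S": 100, "T": 10}
    sel = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
    print(best_left_deep_order(cards, sel))
```

Because the best plan for each subset is stored once and reused, the search avoids re-costing every permutation of the relations.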
Relational algebra allows querying relational databases using a set of operators. Key operators include selection (σ) to filter tuples, projection (π) to select attributes, and join (⋈) to combine relations. More complex queries can be built by combining multiple operators. The division operator (/) is used to find tuples that have a relationship to all tuples in another relation. While not directly supported, division queries can be computed from other relational algebra operations and their complements.
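Division can be expressed through projection, cross product, and set difference: take all candidate values, then subtract those that are missing at least one required pairing. The sketch below models relations as Python sets; the function name and sample data are hypothetical.

```python
def divide(r, s):
    """Relational division R / S.

    r: set of (x, y) pairs; s: set of y values.
    Returns every x that is paired in r with all y in s.
    """
    xs = {x for x, _ in r}                       # projection of R onto x
    # Candidates that lack at least one required pairing,
    # i.e. the x-projection of ((xs cross S) minus R).
    disqualified = {x for x in xs for y in s if (x, y) not in r}
    return xs - disqualified


if __name__ == "__main__":
    enrolled = {("ann", "db"), ("ann", "os"), ("bob", "db")}
    required = {"db", "os"}
    print(divide(enrolled, required))  # {'ann'}
```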
Unit-1 Basic Concept of Algorithm.pptx (ssuser01e301)
The document discusses various topics related to algorithms including algorithm design, real-life applications, analysis, and implementation. It specifically covers four algorithms - the taxi algorithm, rent-a-car algorithm, call-me algorithm, and bus algorithm - for getting from an airport to a house. It also provides examples of simple multiplication methods like the American, English, and Russian approaches as well as the divide and conquer method.
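Assuming the "Russian" method mentioned is the halving-and-doubling technique usually called Russian peasant multiplication, it can be sketched as follows. The function name is hypothetical.

```python
def russian_peasant_multiply(a, b):
    """Multiply two non-negative integers by repeated halving and doubling."""
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    total = 0
    while a > 0:
        if a % 2 == 1:      # odd row: this doubled value of b contributes to the product
            total += b
        a //= 2             # halve, discarding the remainder
        b *= 2              # double
    return total


if __name__ == "__main__":
    print(russian_peasant_multiply(13, 12))  # 156
```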
The document discusses using machine learning techniques like reinforcement learning and generative adversarial networks to improve query optimization in databases. Specifically, it summarizes work using deep Q-learning (DQ) and a neural optimizer (Neo) to learn join ordering, as well as using intra-query learning with SkinnerDB. It proposes using generative adversarial networks and Monte Carlo tree search to address shortcomings in existing approaches like lack of training data and balancing exploration vs exploitation. Generative adversarial networks could generate additional training data while Monte Carlo tree search would help optimize join ordering on a per-query basis.
The document discusses query processing in distributed databases. It describes the key steps of query processing as decomposition, localization, and optimization. Decomposition breaks queries into algebraic query trees. Localization rewrites the trees to replace relations with fragments based on fragmentation schemes. Optimization chooses the lowest cost query execution plan based on a cost model and localized query trees. The document provides examples of localization rules and parallel/distributed query operations like sorting, joining, and aggregation.
Process for heuristic optimization:
1. The parser of a high-level query generates an initial internal representation.
2. Heuristic rules are applied to optimize the internal representation.
3. A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.
The document discusses data manipulation languages (DML) for databases. There are two main types of DML: navigational/procedural and non-navigational/non-procedural. Relational algebra is a non-navigational DML defined by Codd that uses algebraic operations like selection, projection, join, etc. on tables. Relational calculus is also a non-navigational DML that defines new relations in terms of predicates on tuple variables ranging over named relations.
Relational algebra is the basis for formal relational query languages. It uses operations like selection, projection, join, union and aggregation to query relations and return results as relations. Some key points covered are:
- Relational algebra operations include selection, projection, join, union, set differences and aggregations to query and manipulate relations.
- Relations are the operands and operations return relations, so operations can be composed to form complex queries.
- Joins combine tuples from two relations based on a join condition. Outer joins retain all tuples from one or both relations whether or not they meet the join condition.
- Aggregate operations perform functions like sum, count and average across groups of tuples in a
Relational algebra is a procedural query language used to manipulate relations in a relational database. It consists of operators like select, project, join, union, and set difference. SQL is based on the concepts of relational algebra. Relational algebra expressions specify a sequence of operators to apply to relations in order to retrieve the desired data from the database. Some key operators include selection to filter tuples, projection to select attributes, and join to combine tuples from two relations based on a join condition.
The document discusses relational algebra operations. It defines the select, project, cartesian product, and join operations. It provides examples of relational algebra expressions using these operations on sample relations to retrieve specific tuples based on conditions.
1. DBMS 2001 Notes 6: Query Compilation 1
Principles of Database
Management Systems
6: Query Compilation and
Optimization
Pekka Kilpeläinen
(partially based on Stanford CS245 slide
originals by Hector Garcia-Molina, Jeff Ullman
and Jennifer Widom)
2. DBMS 2001 Notes 6: Query Compilation 2
Overview
We have studied recently:
Algorithms for selections and joins, and their costs
(in terms of disk I/O)
Next: A closer look at query compilation
Parsing
Algebraic optimization of logical query plans
Estimating sizes of intermediate results
Focus still on selections and joins
Remember the overall process of query
execution:
3. DBMS 2001 Notes 6: Query Compilation 3
[Figure: the query compilation pipeline]
SQL query
-> parse -> parse tree
-> convert -> logical query plan
-> apply laws -> improved l.q.p.
-> estimate result sizes (using statistics) -> l.q.p. + sizes
-> consider physical plans -> {P1, P2, ...}
-> estimate costs -> {(P1,C1), ...}
-> pick best -> Pi
-> execute -> answer
4. DBMS 2001 Notes 6: Query Compilation 4
Step 1: Parsing
Check syntactic correctness of the query, and
transform it into a parse tree
Based on the formal grammar for SQL
Inner nodes: nonterminal symbols (syntactic
categories for things like <Query>, <RelName>, or
<Condition>)
Leaves: terminal symbols, i.e. names of relations or
attributes, keywords (SELECT, FROM, ...),
operators (+, AND, OR, LIKE, ...), operands (10,
'%1960', ...)
Also check semantic correctness: relations and
attributes exist, operands are compatible with
operators, ...
5. DBMS 2001 Notes 6: Query Compilation 5
Example: SQL query
Consider querying a movie database with relations
StarsIn(title, year, starName)
MovieStar(name, address, gender, birthdate)
SELECT title
FROM StarsIn, MovieStar
WHERE starName = name AND birthdate LIKE '%1960' ;
(Find the titles for movies with stars born in 1960)
6. DBMS 2001 Notes 6: Query Compilation 6
Example: Parse Tree
<Query>
  <SFW>
    SELECT <SelList> FROM <FromList> WHERE <Condition>
      <SelList>
        <Attribute>: title
      <FromList>
        <RelName>: StarsIn
        ,
        <FromList>
          <RelName>: MovieStar
      <Condition>
        <Condition>
          <Attribute>: starName
          =
          <Attribute>: name
        AND
        <Condition>
          <Attribute>: birthdate
          LIKE
          <Pattern>: '%1960'
7. DBMS 2001 Notes 6: Query Compilation 7
Step 2:
Parse Tree -> Logical Query Plan
Basic strategy:
SELECT A, B, C
FROM R1, R2
WHERE Cond ;
becomes
π_{A,B,C}[σ_{Cond}(R1 × R2)]
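The basic strategy above can be sketched in a few lines of Python. This is only an illustration: the `SFW` class and the `to_logical_plan` function are hypothetical names introduced here, and the query is assumed to be already parsed into its three parts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SFW:
    """A parsed SELECT-FROM-WHERE query (hypothetical helper type)."""
    select: tuple   # attribute names of the SELECT list
    from_: tuple    # relation names of the FROM list
    where: str      # the WHERE condition, kept as text

def to_logical_plan(q: SFW) -> str:
    """Render the initial logical plan: project(select(product of relations))."""
    product = " × ".join(q.from_)
    return "π_{" + ",".join(q.select) + "}[σ_{" + q.where + "}(" + product + ")]"

plan = to_logical_plan(SFW(("A", "B", "C"), ("R1", "R2"), "Cond"))
print(plan)
```

The plan is deliberately naive (a full cross product followed by one selection); improving it is the job of Step 3.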
8. DBMS 2001 Notes 6: Query Compilation 8
Example: Logical Query Plan
π_{title}
  σ_{starName=name AND birthdate LIKE '%1960'}
    ×
      StarsIn
      MovieStar
9. DBMS 2001 Notes 6: Query Compilation 9
Step 3: Improving the L.Q.P
Transform the logical query plan into an
equivalent form expected to lead to
better performance
Based on laws of relational algebra
Normal to produce a single optimized form,
which acts as input for the generation of
physical query plans
10. DBMS 2001 Notes 6: Query Compilation 10
Example: Improved Logical Query Plan
π_{title}
  ⋈_{starName=name}
    StarsIn
    σ_{birthdate LIKE '%1960'}
      MovieStar
11. DBMS 2001 Notes 6: Query Compilation 11
Step 4: Estimate Result Sizes
Cost estimates of database algorithms
depend on sizes of input relations
=> Need to estimate sizes of
intermediate results
Estimates based on statistics about
relations, gathered periodically or
incrementally
12. DBMS 2001 Notes 6: Query Compilation 12
Example: Estimate Result Sizes
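[Figure: the improved logical query plan over StarsIn and MovieStar]
Need the expected size of each intermediate result, e.g. of
σ_{birthdate LIKE '%1960'}(MovieStar) before it is joined with StarsIn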
13. DBMS 2001 Notes 6: Query Compilation 13
Steps 5, 6, ...
Generate and compare query plans
Generate alternate query plans P1, ..., Pk by
selecting algorithms and execution orders
for relational operations
Estimate the cost of each plan
Choose the plan Pi estimated to be "best"
Execute plan Pi and return its result
14. DBMS 2001 Notes 6: Query Compilation 14
Example: One Physical Plan
Hash join (parameters: join order, memory size, project attributes, ...)
  SEQ scan
    StarsIn
  Index scan (parameters: select condition, ...)
    MovieStar
15. DBMS 2001 Notes 6: Query Compilation 15
Next: Closer look at ...
Transformation rules
Estimating result sizes
16. DBMS 2001 Notes 6: Query Compilation 16
Relational algebra optimization
Transformation rules
(preserve equivalence)
What are good transformations?
17. DBMS 2001 Notes 6: Query Compilation 17
Rules: Natural joins
R ⋈ S = S ⋈ R (commutative)
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T) (associative)
Carry attribute names in results, so order
is not important
=> Can evaluate in any order
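A small Python check of the two natural-join laws on toy data. This is a sketch under assumptions: relations are modelled as lists of dicts (attribute name to value), results are compared as sets of tuples, and the relations `R`, `S`, `T` are made-up examples, not taken from the lecture.

```python
def natural_join(r, s):
    """Natural join: combine tuples that agree on all common attributes."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out

def as_set(rel):
    """Order-independent view of a relation (attribute names travel with values)."""
    return {frozenset(t.items()) for t in rel}

# Hypothetical toy relations R(A,B), S(A,C), T(C,D)
R = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
S = [{"A": 1, "C": 10}, {"A": 1, "C": 20}, {"A": 3, "C": 30}]
T = [{"C": 10, "D": "p"}, {"C": 30, "D": "q"}]

commutative = as_set(natural_join(R, S)) == as_set(natural_join(S, R))
associative = (as_set(natural_join(natural_join(R, S), T))
               == as_set(natural_join(R, natural_join(S, T))))
print(commutative, associative)
```

Because each result tuple carries its attribute names, the column order produced by `R ⋈ S` versus `S ⋈ R` does not matter, which is exactly the point made on the slide.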
18. DBMS 2001 Notes 6: Query Compilation 18
Rules:
Cross products & union similarly
(both associative & commutative):
R × S = S × R
(R × S) × T = R × (S × T)
R ∪ S = S ∪ R
R ∪ (S ∪ T) = (R ∪ S) ∪ T
19. DBMS 2001 Notes 6: Query Compilation 19
Rules: Selects
1. σ_{p1 ∧ p2}(R) = σ_{p1}[σ_{p2}(R)]
2. σ_{p1 ∨ p2}(R) = [σ_{p1}(R)] ∪ [σ_{p2}(R)]
Rule 1 is especially useful (applied left-to-right):
it allows compound select conditions to be
split and moved to suitable positions
(see next)
NB: ∧ = AND; ∨ = OR
20. DBMS 2001 Notes 6: Query Compilation 20
Rules: Products to Joins
Definition of Natural Join:
R ⋈ S = π_L[σ_C(R × S)] ;
Condition C equates attributes common to R
and S, and L projects one copy of them out
Applied right-to-left
- definition of general join applied similarly
21. DBMS 2001 Notes 6: Query Compilation 21
Rules: σ + ⋈ combined
Let p = predicate with only R attribs
q = predicate with only S attribs
m = predicate with attribs common to R,S
σ_p(R ⋈ S) = [σ_p(R)] ⋈ S
σ_q(R ⋈ S) = R ⋈ [σ_q(S)]
More rules can be derived...
23. DBMS 2001 Notes 6: Query Compilation 23
Derivation for first one;
Others for homework:
σ_{p ∧ q}(R ⋈ S) =
σ_p[σ_q(R ⋈ S)] =
σ_p[R ⋈ σ_q(S)] =
[σ_p(R)] ⋈ [σ_q(S)]
24. DBMS 2001 Notes 6: Query Compilation 24
Rules for σ, π combined with × are
similar...
e.g., σ_p(R × S) = ?
25. DBMS 2001 Notes 6: Query Compilation 25
Some good transformations:
σ_{p1 ∧ p2}(R) -> σ_{p1}[σ_{p2}(R)]
σ_p(R ⋈ S) -> [σ_p(R)] ⋈ S
R ⋈ S -> S ⋈ R
No transformation is always good
Usually good: early selections
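The "early selections" heuristic can be illustrated with a tiny rewriter. This is a sketch, not the lecture's algorithm: the expression classes `Rel`, `Join`, `Select` and the functions `attrs`, `push_selections`, `show` are hypothetical, and a selection condition is assumed to be a conjunction of predicates, each tagged with the attributes it uses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:
    name: str
    attrs: frozenset

@dataclass(frozen=True)
class Join:
    left: object
    right: object

@dataclass(frozen=True)
class Select:
    preds: tuple   # conjunction of (label, frozenset of attributes used)
    child: object

def attrs(e):
    """Attributes available in the result of expression e."""
    if isinstance(e, Rel):
        return e.attrs
    if isinstance(e, Join):
        return attrs(e.left) | attrs(e.right)
    return attrs(e.child)

def push_selections(e):
    """Split conjunctions and push each predicate below a join when possible."""
    if isinstance(e, Rel):
        return e
    if isinstance(e, Join):
        return Join(push_selections(e.left), push_selections(e.right))
    child = e.child
    if not isinstance(child, Join):
        return Select(e.preds, push_selections(child))
    # sigma_p(R join S) -> [sigma_p(R)] join S when p only uses attributes of R
    to_left = tuple(p for p in e.preds if p[1] <= attrs(child.left))
    to_right = tuple(p for p in e.preds
                     if p not in to_left and p[1] <= attrs(child.right))
    rest = tuple(p for p in e.preds if p not in to_left and p not in to_right)
    left = Select(to_left, child.left) if to_left else child.left
    right = Select(to_right, child.right) if to_right else child.right
    joined = Join(push_selections(left), push_selections(right))
    return Select(rest, joined) if rest else joined

def show(e):
    if isinstance(e, Rel):
        return e.name
    if isinstance(e, Join):
        return f"({show(e.left)} ⋈ {show(e.right)})"
    return f"σ[{' ∧ '.join(p[0] for p in e.preds)}]({show(e.child)})"

# Hypothetical schema R(A,B), S(B,C); p uses only A, q uses only C
R = Rel("R", frozenset({"A", "B"}))
S = Rel("S", frozenset({"B", "C"}))
p = ("p", frozenset({"A"}))
q = ("q", frozenset({"C"}))
rewritten = show(push_selections(Select((p, q), Join(R, S))))
print(rewritten)
```

The result mirrors the derivation σ_{p ∧ q}(R ⋈ S) = [σ_p(R)] ⋈ [σ_q(S)]; predicates that need attributes from both sides stay above the join.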
26. DBMS 2001 Notes 6: Query Compilation 26
Outline - Query Processing
Relational algebra level
transformations
good transformations
Detailed query plan level
estimate costs
generate and compare plans
next
27. DBMS 2001 Notes 6: Query Compilation 27
Estimating cost of query plan
(1) Estimating size of intermediate results
(2) Estimating # of I/Os
(considered last week)
28. DBMS 2001 Notes 6: Query Compilation 28
Estimating result size
Maintain statistics for relation R
T(R) : # tuples in R
L(R) : Length of rows,
# of bytes in each R tuple
B(R): # of blocks to hold all R tuples
V(R, A) :
# distinct values in R for attribute A
29. DBMS 2001 Notes 6: Query Compilation 29
Example
Relation R with attributes
A: 20 byte string
B: 4 byte integer
C: 8 byte date
D: 5 byte string

A    B  C   D
cat  1  10  a
cat  1  20  b
dog  1  30  a
dog  1  40  c
bat  1  50  d

T(R) = 5      L(R) = 37
V(R,A) = 3    V(R,C) = 5
V(R,B) = 1    V(R,D) = 4
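The statistics of this example can be recomputed directly from the table. A minimal sketch: the helper names `T`, `L`, `V` follow the slide's notation, while the tuple layout and the `FIELD_BYTES` and `ATTRS` names are assumptions made for the illustration.

```python
# Field widths in bytes, as declared for relation R
FIELD_BYTES = {"A": 20, "B": 4, "C": 8, "D": 5}
ATTRS = ("A", "B", "C", "D")

R = [
    ("cat", 1, 10, "a"),
    ("cat", 1, 20, "b"),
    ("dog", 1, 30, "a"),
    ("dog", 1, 40, "c"),
    ("bat", 1, 50, "d"),
]

def T(rel):
    """Number of tuples in the relation."""
    return len(rel)

def L(widths):
    """Length of one tuple in bytes (sum of the field widths)."""
    return sum(widths.values())

def V(rel, attr):
    """Number of distinct values of attribute attr in the relation."""
    i = ATTRS.index(attr)
    return len({t[i] for t in rel})

stats = {"T": T(R), "L": L(FIELD_BYTES), **{f"V({a})": V(R, a) for a in ATTRS}}
print(stats)
```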
31. DBMS 2001 Notes 6: Query Compilation 31
Estimates for selection W = σ_{A=a}(R)
L(W) = L(R)
T(W) = ?
     = AVG number of tuples that satisfy
       an equality condition on R.A
     = T(R)/V(R,A)
32. DBMS 2001 Notes 6: Query Compilation 32
Example
Relation R:

A    B  C   D
cat  1  10  a
cat  1  20  b
dog  1  30  a
dog  1  40  c
bat  1  50  d

V(R,A)=3  V(R,B)=1  V(R,C)=5  V(R,D)=4

AVG size of, say, σ_{A=val}(R)?
T(σ_{A=cat}(R))=2, T(σ_{A=dog}(R))=2, T(σ_{A=bat}(R))=1
=> (2+2+1)/3 = 5/3
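The average above can be checked mechanically. A minimal sketch that uses only the A column of the example relation; the variable names are chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction

# Column A of the example relation R
a_values = ["cat", "cat", "dog", "dog", "bat"]

# T(sigma_{A=val}(R)) for each existing value val
sizes = Counter(a_values)

# Average selection size over the V(R,A) distinct values
avg = Fraction(sum(sizes.values()), len(sizes))

# The same number expressed as T(R)/V(R,A)
t_over_v = Fraction(len(a_values), len(set(a_values)))
print(avg, t_over_v)
```

Both expressions give 5/3, since the per-value result sizes always add up to T(R).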
33. DBMS 2001 Notes 6: Query Compilation 33
Size of W = σ_{Z=a}(R) in general:
Assume: Only existing Z values a1, a2, ...
are used in a select expression Z = ai,
each with equal probability 1/V(R,Z)
=> the expected size of σ_{Z=val}(R) is
E(T(W)) = Σ_i 1/V(R,Z) * T(σ_{Z=ai}(R))
        = T(R)/V(R,Z)
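The last step can be spelled out as follows. The only extra assumption is that every tuple of R has exactly one (non-null) Z value, so the selections σ_{Z=a_i}(R) over all existing values a_i partition R.

```latex
\begin{align*}
E(T(W)) &= \sum_{i=1}^{V(R,Z)} \frac{1}{V(R,Z)} \, T\bigl(\sigma_{Z=a_i}(R)\bigr) \\
        &= \frac{1}{V(R,Z)} \sum_{i=1}^{V(R,Z)} T\bigl(\sigma_{Z=a_i}(R)\bigr) \\
        &= \frac{T(R)}{V(R,Z)}
\end{align*}
```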
34. DBMS 2001 Notes 6: Query Compilation 34
What about selection W = σ_{z ≥ val}(R) ?
T(W) = ?
Estimate 1: T(W) = T(R)/2
Rationale: On avg, 1/2 of tuples satisfy the condition
Estimate 2: T(W) = T(R)/3
Rationale: Acknowledges the tendency of
selecting "interesting" (e.g., rare) tuples more
frequently
Other inequality comparisons (<, ≤, >) similarly
35. DBMS 2001 Notes 6: Query Compilation 35
Size estimates for W = R1 ⋈ R2
Let X = attributes of R1
Y = attributes of R2
Case 1: X ∩ Y = ∅
Same as R1 × R2
36. DBMS 2001 Notes 6: Query Compilation 36
Case 2: W = R1 ⋈ R2, X ∩ Y = A
R1(A, B, C)   R2(A, D)
Assumption:
V(R1,A) ≤ V(R2,A) ⇒ π_A(R1) ⊆ π_A(R2)
V(R2,A) ≤ V(R1,A) ⇒ π_A(R2) ⊆ π_A(R1)
Containment of value sets [Sec. 7.4.4]
37. DBMS 2001 Notes 6: Query Compilation 37
Why should containment of value sets hold?
Consider joining relations
Faculties(FName, ...) and
Depts(DName, FName, ...)
where Depts.FName is a foreign key, and
faculties can have 0, ..., n departments.
Now V(Depts, FName) ≤ V(Faculties,
FName), and referential integrity requires that
π_FName(Depts) ⊆ π_FName(Faculties)
38. DBMS 2001 Notes 6: Query Compilation 38
Estimating T(W) when V(R1,A) ≤ V(R2,A)
R1(A, B, C)   R2(A, D)
Take 1 tuple of R1 (with A value a) and match it against R2
Each R1 tuple is estimated to match with T(R2)/V(R2,A)
tuples ...
so T(W) = T(R1)T(R2)/V(R2,A)
40. DBMS 2001 Notes 6: Query Compilation 40
With similar ideas, can estimate sizes of:
W = π_{AB}(R) ... [Sec. 7.4.2]
W = σ_{A=a ∧ B=b}(R) = σ_{A=a}(σ_{B=b}(R));
Assuming A and B are independent =>
T(W) = T(R)/(V(R,A) × V(R,B))
W = R(A,X,Y) ⋈ S(X,Y,B); [Sec. 7.4.5]
Assuming X and Y are independent =>
T(W) = T(R)T(S)/(max{V(R,X), V(S,X)} ×
max{V(R,Y), V(S,Y)})
Union, intersection, diff, ... [Sec. 7.4.7]
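The size formulas on this slide translate directly into small estimator functions. This is a sketch: the function names are made up here, and the numbers in the usage lines are hypothetical statistics chosen only to exercise the formulas.

```python
def est_select_conj(t_r, distinct_counts):
    """T(W) for a conjunction of equality conditions on independent attributes:
    T(R) divided by V(R, A) for every attribute A that is tested."""
    size = t_r
    for v in distinct_counts:
        size /= v
    return size

def est_join(t_r, t_s, shared):
    """T(W) for a natural join R ⋈ S.

    shared lists (V(R,X), V(S,X)) for every join attribute X; the attributes
    are assumed independent. An empty list gives the cross-product size.
    """
    size = t_r * t_s
    for v_r, v_s in shared:
        size /= max(v_r, v_s)
    return size

# Hypothetical statistics, for illustration only
sel = est_select_conj(1000, [10, 20])                  # 1000 / (10 * 20)
one_attr = est_join(1000, 2000, [(20, 50)])            # 1000 * 2000 / 50
two_attrs = est_join(1000, 2000, [(20, 50), (100, 10)])
print(sel, one_attr, two_attrs)
```

With a single join attribute and V(R1,A) ≤ V(R2,A), `est_join` reduces to T(R1)T(R2)/V(R2,A), the estimate derived on the earlier slide.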
41. DBMS 2001 Notes 6: Query Compilation 41
Note: for complex expressions, need
intermediate estimates.
E.g. W = [σ_{A=a}(R1)] ⋈ R2
Treat σ_{A=a}(R1) as relation U
T(U) = T(R1)/V(R1,A)    L(U) = L(R1)
Also need an estimate for V(U, Ai) !
42. DBMS 2001 Notes 6: Query Compilation 42
To estimate V (U, Ai)
E.g., U = σ_{A=a}(R)
Say R has attributes A,B,C,D
V(U, A) = ?
V(U, B) = ?
V(U, C) = ?
V(U, D) = ?
43. DBMS 2001 Notes 6: Query Compilation 43
Example
Relation R:

A    B  C   D
cat  1  10  10
cat  1  20  20
dog  1  30  10
dog  1  40  30
bat  1  50  10

V(R,A)=3  V(R,B)=1  V(R,C)=T(R)=5  V(R,D)=3

U = σ_{A=x}(R)
V(U,A) = 1    V(U,B) = 1    V(U,C) = T(R)/V(R,A)
V(U,D) = somewhere in the range V(R,D)/V(R,A) ... V(R,D)
44. DBMS 2001 Notes 6: Query Compilation 44
V(U,Ai) for Joins U = R1(A,B) ⋈ R2(A,C)
V(U,A) = min { V(R1, A), V(R2, A) }
V(U,B) = V(R1, B)
V(U,C) = V(R2, C)
Values of non-join attributes are preserved
[Assumption called
preservation of value sets, Sec. 7.4.4]
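For a complex expression such as [σ_{A=a}(R1)] ⋈ R2, the estimates have to be propagated step by step. The sketch below is one way to do that; the `Stats` class and both functions are hypothetical. Capping V(U,X) at T(U) for the non-selected attributes is an assumption of this sketch (it reproduces V(U,B) = 1 and V(U,C) = T(R)/V(R,A) from the example and lands inside the stated range for V(U,D)). The statistics for R2 are invented.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class Stats:
    T: Fraction   # estimated number of tuples
    V: dict       # attribute -> estimated number of distinct values

def select_eq(r: Stats, attr: str) -> Stats:
    """Estimates for U = sigma_{attr = const}(R)."""
    t = r.T / r.V[attr]
    # Assumption of this sketch: U cannot hold more distinct values than tuples
    v = {a: min(n, t) for a, n in r.V.items()}
    v[attr] = 1
    return Stats(t, v)

def natural_join(r1: Stats, r2: Stats) -> Stats:
    """Estimates for R1 ⋈ R2 with independent join attributes."""
    shared = r1.V.keys() & r2.V.keys()
    t = r1.T * r2.T
    for a in shared:
        t /= max(r1.V[a], r2.V[a])
    v = {**r1.V, **r2.V}          # non-join attributes: value sets preserved
    for a in shared:
        v[a] = min(r1.V[a], r2.V[a])
    return Stats(t, v)

# R1 uses the statistics of the example relation R; R2(A,E) is hypothetical
R1 = Stats(Fraction(5), {"A": 3, "B": 1, "C": 5, "D": 3})
R2 = Stats(Fraction(10), {"A": 5, "E": 10})

U = select_eq(R1, "A")
W = natural_join(U, R2)
print(U, W)
```

Keeping T and V together in one record is what makes the estimates composable: each operator consumes the statistics of its inputs and produces statistics for its output.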
48. DBMS 2001 Notes 6: Query Compilation 48
Outline/Summary
Estimating cost of query plan
Estimating size of results: done!
Estimating # of IOs: last week
Generate and compare plans
(skip this)
Execute physical operations of query plan
(sketched last week)