This document explains various SQL concepts, including CROSS JOIN, ORDER BY, DISTINCT, UNION and UNION ALL, TRUNCATE and DELETE, the COMPUTE clause, data warehousing, data marts, fact and dimension tables, the snowflake schema, ETL processing, BCP, DTS, multidimensional analysis, and BULK INSERT. It also discusses the three primary ways of storing information in OLAP: MOLAP, ROLAP, and HOLAP.
SQL Server Interview Questions
CROSS JOIN:
A CROSS JOIN is a Cartesian join that requires no join condition. The result set contains every combination of rows from the two tables, so its row count is the product of the row counts of both tables.
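For illustration, a minimal sketch (the Colors and Sizes tables here are hypothetical):

SELECT c.ColorName, s.SizeName
FROM Colors AS c
CROSS JOIN Sizes AS s;
-- If Colors has 3 rows and Sizes has 4 rows, the result set has 3 x 4 = 12 rows.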
How do you sort in SQL?:
The ORDER BY clause can be used to sort the columns returned in a SELECT statement. The ORDER BY clause is not valid in views, inline functions, derived tables, and subqueries unless TOP is also specified.
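For example, assuming a hypothetical Employees table with a HireDate column:

SELECT FirstName, LastName, HireDate
FROM Employees
ORDER BY HireDate DESC, LastName ASC;
-- Newest hires first; ties are broken alphabetically by last name.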
How do you select unique records in SQL?:
Using the DISTINCT clause
SELECT DISTINCT FirstName FROM Employees;
Union & Union All:
The difference between UNION and UNION ALL is that UNION ALL does not eliminate duplicate rows; it simply returns all rows from all queries fitting your criteria and combines them into one result set.
If you know that all the records returned by your union are unique, use UNION ALL instead; it gives faster results because the duplicate-elimination step is skipped.
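A small illustration, assuming hypothetical CurrentCustomers and ArchivedCustomers tables with matching column lists:

-- UNION removes duplicate rows from the combined result
SELECT CustomerName FROM CurrentCustomers
UNION
SELECT CustomerName FROM ArchivedCustomers;

-- UNION ALL keeps every row, including duplicates, and therefore
-- skips the costly duplicate-elimination step
SELECT CustomerName FROM CurrentCustomers
UNION ALL
SELECT CustomerName FROM ArchivedCustomers;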
Truncate & Delete:
TRUNCATE and DELETE are both used to remove data from a table. Both statements delete the data from the table, not the structure of the table.
TRUNCATE is a DDL (data definition language) command, whereas DELETE is a DML (data manipulation language) command.
You can use a WHERE clause (conditions) with DELETE, but you can't use a WHERE clause with TRUNCATE.
DELETE is fully logged row by row and can be rolled back; TRUNCATE only logs page deallocations and removes the records in one operation (although in SQL Server even TRUNCATE can be rolled back if it is issued inside an explicit transaction).
A trigger doesn't get fired in the case of TRUNCATE, whereas triggers do get fired by a DELETE command.
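A short sketch of the syntactic difference, using a hypothetical Orders table:

DELETE FROM Orders WHERE OrderDate < '2010-01-01'; -- conditional, row by row, fires triggers
TRUNCATE TABLE Orders; -- removes all rows at once; no WHERE clause allowed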
Compute Clause in SQL?:
Generates totals that appear as additional summary columns at the end of the result set. When used with BY, the COMPUTE clause generates control breaks and subtotals in the result set. (COMPUTE was removed in SQL Server 2012; GROUP BY with ROLLUP is the modern replacement.)
USE AdventureWorks; -- this example assumes the AdventureWorks sample database
GO
SELECT CustomerID, OrderDate, SubTotal, TotalDue
FROM Sales.SalesOrderHeader
WHERE CustomerID = 1
ORDER BY OrderDate
COMPUTE SUM(SubTotal), SUM(TotalDue);
What is Data Warehousing?:
In computing, a data warehouse (DW) is a database used for reporting and analysis. The
data stored in the warehouse is uploaded from the operational systems. The data may pass
through an operational data store for additional operations before it is used in the DW for
reporting.
A data warehouse maintains its functions in three layers: staging, integration, and access.
Staging is used to store raw data for use by developers. The integration layer is used to
integrate data and to have a level of abstraction from users. The access layer is for getting
data out for users.
The term Data Warehouse was coined by Bill Inmon in 1990, which he defined in the
following way: "A warehouse is a subject-oriented, integrated, time-variant and non-
volatile collection of data in support of management's decision making process". He
defined the terms in the sentence as follows:
Subject Oriented:
Data that gives information about a particular subject instead of about a company's
ongoing operations.
Integrated:
Data that is gathered into the data warehouse from a variety of sources and merged into a
coherent whole.
Time-variant:
All data in the data warehouse is identified with a particular time period.
Non-volatile:
Data is stable in a data warehouse. More data is added but data is never removed. This
enables management to gain a consistent picture of the business.
What is Data Mart?: A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team.
A data mart is a simple form of a data warehouse that is focused on a single subject (or
functional area), such as Sales, Finance, or Marketing. Data marts are often built and
controlled by a single department within an organization. Given their single-subject
focus, data marts usually draw data from only a few sources. The sources could be
internal operational systems, a central data warehouse, or external data.
A data mart is a repository of data gathered from operational data and other sources that
is designed to serve a particular community of knowledge workers. In scope, the data
may derive from an enterprise-wide database or data warehouse or be more specialized.
The emphasis of a data mart is on meeting the specific demands of a particular group of
knowledge users in terms of analysis, content, presentation, and ease of use. Users of a
data mart can expect to have data presented in terms that are familiar.
What are Fact Table & Dimension Tables:
In data warehousing, a fact table consists of the measurements, metrics or facts of a
business process. It is often located at the centre of a star schema or a snowflake schema,
surrounded by dimension tables.
Fact tables provide the (usually) additive values that act as independent variables by
which dimensional attributes are analyzed. Fact tables are often defined by their grain.
The grain of a fact table represents the most atomic level by which the facts may be
defined. The grain of a SALES fact table might be stated as "Sales volume by Day by
Product by Store". Each record in this fact table is therefore uniquely defined by a day,
product and store. Other dimensions might be members of this fact table (such as
location/region) but these add nothing to the uniqueness of the fact records. These
"affiliate dimensions" allow for additional slices of the independent facts but generally
provide insights at a higher level of aggregation (a region contains many stores).
In data warehousing, a dimension table is one of the set of companion tables to a fact
table.
The fact table contains business facts or measures and foreign keys which refer to
candidate keys (normally primary keys) in the dimension tables.
Contrary to fact tables, the dimension tables contain descriptive attributes (or fields)
which are typically textual fields or discrete numbers that behave like text. These
attributes are designed to serve two critical purposes: query constraining/filtering and
query result set labeling.
Dimension attributes are supposed to be:
Verbose - labels consisting of full words,
Descriptive,
Complete - no missing values,
Discretely valued - only one value per row in the dimension table,
Quality assured - no misspelling, no impossible values.
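To tie these ideas together, a minimal star-schema sketch of the day/product/store grain described above (all table and column names are hypothetical):

CREATE TABLE DimDate (DateKey INT PRIMARY KEY, CalendarDate DATE NOT NULL);
CREATE TABLE DimProduct (ProductKey INT PRIMARY KEY, ProductName VARCHAR(100) NOT NULL);
CREATE TABLE DimStore (StoreKey INT PRIMARY KEY, StoreName VARCHAR(100) NOT NULL);

CREATE TABLE FactSales (
    DateKey INT NOT NULL REFERENCES DimDate (DateKey),
    ProductKey INT NOT NULL REFERENCES DimProduct (ProductKey),
    StoreKey INT NOT NULL REFERENCES DimStore (StoreKey),
    SalesVolume INT NOT NULL, -- the additive measure
    PRIMARY KEY (DateKey, ProductKey, StoreKey) -- the grain: one row per day, product and store
);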
Snowflake Schema:
In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity-relationship diagram resembles a snowflake in shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions.
The snowflake schema is similar to the star schema. However, in the snowflake schema, dimensions are normalized into multiple related tables, whereas the star schema's dimensions are denormalized, with each dimension represented by a single table. A complex snowflake shape emerges when the dimensions of a snowflake schema are elaborate, having multiple levels of relationships, and the child tables have multiple parent tables ("forks in the road"). The "snowflaking" effect affects only the dimension tables and NOT the fact tables.
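A minimal sketch of the normalization, with hypothetical names: in a star schema the product dimension would hold the category attributes directly, while in a snowflake schema they move to their own table:

CREATE TABLE DimProductCategory (
    CategoryKey INT PRIMARY KEY,
    CategoryName VARCHAR(100) NOT NULL
);

CREATE TABLE DimProduct (
    ProductKey INT PRIMARY KEY,
    ProductName VARCHAR(100) NOT NULL,
    CategoryKey INT NOT NULL REFERENCES DimProductCategory (CategoryKey) -- the "snowflaked" branch
);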
ETL Processing in Data Warehousing:
Extract, transform and load (ETL) is a process in database usage and especially in data
warehousing that involves:
Extracting data from outside sources
Transforming it to fit operational needs (which can include quality levels)
Loading it into the end target (database or data warehouse)
Extract - The first part of an ETL process involves extracting the data from the source
systems. In many cases this is the most challenging aspect of ETL, as extracting data
correctly will set the stage for how subsequent processes will go.
Transform - The transform stage applies a series of rules or functions to the extracted data
from the source to derive the data for loading into the end target.
Load - The load phase loads the data into the end target, usually the data warehouse
(DW). Depending on the requirements of the organization, this process varies widely.
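A minimal T-SQL sketch of the transform and load steps, assuming hypothetical StagingSales, DimProduct, DimStore, and FactSales tables like those sketched earlier:

INSERT INTO FactSales (DateKey, ProductKey, StoreKey, SalesVolume)
SELECT CONVERT(INT, CONVERT(CHAR(8), s.SaleDate, 112)), -- transform: derive a YYYYMMDD date key
       p.ProductKey, -- transform: look up surrogate keys
       st.StoreKey,
       s.Quantity
FROM StagingSales AS s
JOIN DimProduct AS p ON p.ProductCode = s.ProductCode
JOIN DimStore AS st ON st.StoreCode = s.StoreCode
WHERE s.Quantity IS NOT NULL; -- a simple data-quality rule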
What is BCP?:
The bcp utility copies data between an instance of SQL Server and a data file in a user-specified format. The Bulk Copy Program (BCP) is a command-line utility that ships with Microsoft SQL Server. With BCP, you can import and export large amounts of data into and out of SQL Server databases quickly and easily.
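For example, to export a table to a file and load it back, using a trusted connection (-T) and character format (-c); the server name here is a placeholder:

bcp AdventureWorks.Sales.Currency out Currency.dat -S myserver -T -c
bcp AdventureWorks.Sales.Currency in Currency.dat -S myserver -T -c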
DTS in SQL Server:
Data Transformation Services, or DTS, is a set of objects and utilities to allow the
automation of extract, transform and load operations to or from a database. The objects
are DTS packages and their components, and the utilities are called DTS tools. DTS was included with earlier versions of Microsoft SQL Server, and was almost always used with SQL Server databases, although it could be used independently with other databases.
DTS allows data to be transformed and loaded from heterogeneous sources using OLE
DB, ODBC, or text-only files, into any supported database. DTS can also allow
automation of data import or transformation on a scheduled basis, and can perform
additional functions such as FTPing files and executing external programs. In addition,
DTS provides an alternative method of version control and backup for packages when
used in conjunction with a version control system, such as Microsoft Visual SourceSafe. (DTS was superseded by SQL Server Integration Services, introduced in SQL Server 2005.)
Multidimensional Analysis:
Multidimensional analysis is a data analysis process that groups data into two or more
categories: data dimensions and measurements. For example, a data set consisting of the
number of wins for a single football team at each of several years is a single-dimensional
(in this case, longitudinal) data set. A data set consisting of the number of wins for several
football teams in a single year is also a single-dimensional (in this case, cross-sectional)
data set. A data set consisting of the number of wins for several football teams over
several years is a two-dimensional data set.
In many disciplines, two-dimensional data sets are also called panel data. While, strictly speaking, two- and higher-dimensional data sets are "multi-dimensional," the term "multidimensional" tends to be applied only to data sets with three or more dimensions.
For example, some forecast data sets provide forecasts for multiple target periods,
conducted by multiple forecasters, and made at multiple horizons. The three dimensions
provide more information than can be gleaned from two-dimensional panel data sets.
Bulk Insert:
The Bulk Insert task provides an efficient way to copy large amounts of data into a SQL
Server table or view. For example, suppose your company stores its million-row product
list on a mainframe system, but the company's e-commerce system uses SQL Server to
populate Web pages. You must update the SQL Server product table nightly with the
master product list from the mainframe. To update the table, you save the product list in a
tab-delimited format and use the Bulk Insert task to copy the data directly into the SQL
Server table.
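A sketch of that nightly load (the table name and file path are hypothetical):

BULK INSERT dbo.ProductList
FROM 'D:\feeds\products.txt'
WITH (
    FIELDTERMINATOR = '\t', -- the tab-delimited format mentioned above
    ROWTERMINATOR = '\n',
    FIRSTROW = 2 -- skip the header row, if the file has one
);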
There are three primary ways in which we can store information in OLAP:
MOLAP
Multidimensional OLAP (MOLAP) stores dimension and fact data in a persistent data
store using compressed indexes. Aggregates are stored to facilitate fast data access.
MOLAP query engines are usually proprietary and optimized for the storage format used
by the MOLAP data store. MOLAP offers faster query processing than ROLAP and usually requires less storage. However, it doesn't scale as well and requires a separate database for storage.
ROLAP
Relational OLAP (ROLAP) stores aggregates in relational database tables. ROLAP's use of relational databases allows it to take advantage of existing database resources, and it allows ROLAP applications to scale well. However, ROLAP's use of tables to store aggregates usually requires more disk storage than MOLAP, and it is generally not as fast.
HOLAP
As its name suggests, hybrid OLAP (HOLAP) is a cross between MOLAP and ROLAP. Like ROLAP, HOLAP leaves the primary data stored in the source database. Like MOLAP, HOLAP stores aggregates in a persistent data store that's separate from the primary relational database. This mix allows HOLAP to offer the advantages of both MOLAP and ROLAP. However, unlike MOLAP and ROLAP, which follow well-defined standards, HOLAP has no uniform implementation.