
SQL Server 2012


What is Normalization?
Database normalization is a data design and organization process applied to data structures
based on rules that help in building relational databases. In relational database design, the
process of organizing data to minimize redundancy is called normalization. Normalization
usually involves dividing a database into two or more tables and defining relationships
between the tables. The objective is to isolate data so that additions, deletions, and
modifications of a field can be made in just one table and then propagated through the rest
of the database via the defined relationships.

What is De-normalization?
De-normalization is the process of attempting to optimize the performance of a database by
adding redundant data. It is sometimes necessary because current DBMSs implement the
relational model poorly. A true relational DBMS would allow for a fully normalized database
at the logical level, while providing physical storage of data that is tuned for high
performance. De-normalization is a technique to move from higher to lower normal forms of
database modeling in order to speed up database access.

What are the basic functions for master, msdb, model, tempdb databases?

The Master database contains the system catalog for all databases of the SQL
Server instance and it holds the engine together; SQL Server cannot start if
the master database is not available.
The msdb database contains data about database backups, SQL Server Agent, DTS
packages, SQL Server jobs, and log shipping.

The tempdb contains temporary objects like global and local temporary tables and
stored procedures.

The model is a template database which is used for creating a new user
database.

What are the differences between a clustered and a non-clustered index?

1. A clustered index is a special type of index that reorders the way records in the table are
physically stored. Therefore a table can have only one clustered index. The leaf nodes of a
clustered index contain the data pages.
2. A non-clustered index is a special type of index in which the logical order of the index does
not match the physical stored order of the rows on disk. The leaf nodes of a non-clustered
index do not consist of the data pages. Instead, the leaf nodes contain index rows.
/* Create a (optionally UNIQUE) nonclustered index over a table */
CREATE NONCLUSTERED INDEX [IX_MyTable_NonClustered]
ON [dbo].[MyTable]
(
    [First] ASC,
    [Second] ASC
) ON [PRIMARY]
GO

After the index is created the query will use an Index Seek; before that, the query used a Table Scan.
/* Create Clustered Index over Table */
CREATE CLUSTERED INDEX [IX_MyTable_Clustered]
ON [dbo].[MyTable]
(
    [ID] ASC
) ON [PRIMARY]
GO

Note: Table fragmentation can occur when data modification operations (DML: INSERT, UPDATE, or DELETE
statements) run against a table. The DBCC DBREINDEX statement can be used to
rebuild all the indexes on all the tables in a database. DBCC DBREINDEX is more efficient than
dropping and recreating indexes. E.g. DBCC DBREINDEX (TableName, '', 80)

What are the different index configurations a table can have?


A table can have one of the following index configurations:

No indexes
A clustered index
A clustered index and many nonclustered indexes
A nonclustered index
Many nonclustered indexes

What is difference between Index Seek vs. Index Scan?

Index Seek and Index Scan are operations that appear in query execution plans and matter for query tuning.

A Table Scan reads every record of the table, so its cost is proportional to the total number of rows
in that table.

An Index Scan is preferred only when the table is small.

An Index Seek only touches the rows that qualify and the pages that contain those qualifying
rows, so its cost is proportional to the number of qualifying rows and pages rather than the total
number of rows in the table.

Index Seek is preferred for highly selective queries.

What is blocking and how would you troubleshoot it?


Blocking happens when one connection from an application holds a lock and a second connection
requires a conflicting lock type. This forces the second connection to wait, blocked on the first.
To troubleshoot it, identify the blocking session (for example with sp_who2 or the blocking-related DMVs),
look at what that session is running and holding, and then either tune the offending queries and
transactions or, as a last resort, kill the blocking session.
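A minimal sketch of finding blocked requests and their blockers via the standard DMVs (the filter on blocking_session_id is the only assumption about which requests you care about):

SELECT r.session_id,
       r.blocking_session_id,                       -- the session holding the conflicting lock
       r.wait_type,
       r.wait_time,
       t.text AS running_sql
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.blocking_session_id <> 0                    -- only requests that are currently blocked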

What is Fill factor?

The 'fill factor' option indicates how full SQL Server will make each index page when it creates the index.
When an index page does not have free space for inserting a new row, SQL Server
creates a new index page and transfers some rows from the previous index page to
the new index page. This process is called a page split.


If we want to reduce the number of page splits then we can use the fill factor option.
Using fill factor, SQL Server will reserve some space on each index page.

The fill factor is a value from 1 through 100 that indicates the percentage of the
index page to be filled; the rest of the page is left empty for future rows. The default value of 0 is
equivalent to 100 (pages are filled completely).

If the table contains data that is not changed frequently then we can set the fill
factor option to 100. When the table's data is modified frequently, we can set the fill
factor option to 80 or another value that suits the workload.
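As a sketch (table and column names are reused from the earlier index examples; the second index name is made up), the fill factor can be applied when rebuilding or creating an index:

-- rebuild an existing index, leaving 20% free space on each leaf page
ALTER INDEX [IX_MyTable_NonClustered] ON [dbo].[MyTable]
REBUILD WITH (FILLFACTOR = 80)

-- or specify the fill factor at creation time
CREATE NONCLUSTERED INDEX [IX_MyTable_ByFirst]
ON [dbo].[MyTable] ([First] ASC)
WITH (FILLFACTOR = 80, PAD_INDEX = ON)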

What are different types of Collation Sensitivity?


Case sensitivity: A and a, B and b, etc.
Accent sensitivity: a and á, o and ó, etc.
Kana sensitivity: when the Japanese kana character types Hiragana and Katakana are treated
differently, the collation is called kana sensitive.
Width sensitivity: when a single-byte character (half-width) and the same character represented
as a double-byte character (full-width) are treated differently, the collation is width sensitive.
Collation refers to a set of rules that determine how data is sorted and compared.
Character data is sorted using rules that define the correct character sequence, with options
for specifying case sensitivity, accent marks, kana character types and character width.

What is difference between DELETE & TRUNCATE commands?


The DELETE command removes rows from a table based on the condition that we provide
with a WHERE clause. TRUNCATE removes all the rows from a table, and there will
be no data in the table after we run the TRUNCATE command.
TRUNCATE

TRUNCATE is faster and uses fewer system and transaction log resources than DELETE.
TRUNCATE removes the data by de-allocating the data pages used to store the table's data,
and only the page de-allocations are recorded in the transaction log.

TRUNCATE removes all rows from a table, but the table structure, its columns, constraints,
indexes and so on, remains. The counter used by an identity for new rows is reset to the seed
for the column.

You cannot use TRUNCATE TABLE on a table referenced by a FOREIGN KEY constraint. Because
TRUNCATE TABLE does not log individual row deletions, it cannot activate a DELETE trigger.

TRUNCATE cannot be rolled back after it is committed; inside an explicit transaction it can still be rolled back.

TRUNCATE is DDL Command.

TRUNCATE Resets identity of the table

DELETE


DELETE removes rows one at a time and records an entry in the transaction log for each
deleted row.
If you want to retain the identity counter, use DELETE instead. If you want to remove table
definition and its data, use the DROP TABLE statement.

DELETE Can be used with or without a WHERE clause

DELETE Activates Triggers.

DELETE can be rolled back.

DELETE is DML Command.

DELETE does not reset identity of the table.
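A short illustration (the Orders table and the date filter are hypothetical):

-- removes only matching rows, fires DELETE triggers, keeps the identity counter
DELETE FROM dbo.Orders WHERE OrderDate < '2010-01-01'

-- removes every row, resets the identity counter to its seed
TRUNCATE TABLE dbo.Orders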

If we drop a table, does it also drop related objects like constraints, indexes,
columns, defaults, Views and Stored Procedures?
YES, SQL Server drops all related objects which exist inside a table, like constraints, indexes,
columns and defaults. BUT dropping a table will not drop Views and Stored Procedures, as they exist
outside the table.

What is a table called, if it has neither Cluster nor Non-cluster Index? What is
it used for?
An unindexed table, or heap. A heap is a table that does not have a clustered index and, therefore, the
pages are not linked by pointers. It is often better to drop all indexes from a table, do the
bulk of the inserts, and then rebuild those indexes afterwards.

Can we add Identity Column to Decimal data type column?


Yes, SQL Server supports it: an identity column can use the decimal (or numeric) data type as long as the scale is 0.

How to insert values to identity column in SQL Server?


CREATE TABLE Customer (ID Int IDENTITY (1,1) , Name varchar(100), Address varchar(200))
The statement below allows inserting explicit values into the identity column by setting IDENTITY_INSERT ON for a
particular table,
SET IDENTITY_INSERT Customer ON
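A minimal sketch continuing the Customer example above (an explicit column list is required while IDENTITY_INSERT is ON; the inserted values are made up):

SET IDENTITY_INSERT Customer ON
INSERT INTO Customer (ID, Name, Address)
VALUES (100, 'Rahul', 'Mumbai')        -- explicit value for the identity column
SET IDENTITY_INSERT Customer OFF       -- switch it back off when done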

What is the difference between a HAVING CLAUSE and a WHERE CLAUSE?


Both specify a search condition. The difference is that HAVING can be
used only with the SELECT statement and is typically used with a GROUP BY clause. When GROUP
BY is not used, HAVING behaves like a WHERE clause. The HAVING clause is applied to groups after
aggregation, whereas the WHERE clause is applied to each row before the rows become part of
the GROUP BY aggregation in a query.
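A small sketch, reusing the Employee table from the transaction example later in this document (column names assumed):

SELECT DeptID, COUNT(*) AS EmpCount
FROM Employee
WHERE Salary > 10000          -- filters individual rows before grouping
GROUP BY DeptID
HAVING COUNT(*) > 5           -- filters whole groups after aggregation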

When is the UPDATE STATISTICS command used?


This command is used after a large amount of data processing has occurred. If a large number of
deletions, modifications or bulk copies into the tables has occurred, the statistics have to be updated to
take these changes into account. UPDATE STATISTICS refreshes the statistics used by the indexes on these tables
accordingly.

If duplicate records exist in a table by mistake, how do you delete the copies of a
record?
WITH [CTE DUPLICATE] AS
(
SELECT
RN = ROW_NUMBER () OVER (PARTITION BY CompanyTitle ORDER BY Id DESC),
Id, CompanyTitle, ContactName, LastContactDate
FROM Suppliers
)
DELETE FROM [CTE DUPLICATE] WHERE RN > 1

What is the difference between ISNULL () and COALESCE ()?


ISNULL accepts only 2 parameters. The first parameter is checked for NULL value, if it is NULL then
the second parameter is returned, otherwise it returns first parameter.
COALESCE accepts two or more parameters. You can pass two or as many parameters as needed, and it returns
the first non-NULL parameter.
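A quick illustration of both functions:

SELECT ISNULL(NULL, 'second')              -- returns 'second'
SELECT ISNULL('first', 'second')           -- returns 'first'
SELECT COALESCE(NULL, NULL, 'third', 'x')  -- returns 'third', the first non-NULL argument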

What is SQL Profiler?


SQL Profiler is a graphical tool that allows developers or system administrators to monitor events in an
instance of Microsoft SQL Server. You can capture and save data about each event to a file or SQL
Server table to analyze later.

For example, you can monitor a production environment to see which stored procedures are
hampering performance by executing too slowly.
Use SQL Profiler to monitor only the events in which you are interested. If traces are becoming
too large, you can filter them based on the information you want, so that only a subset of the event
data is collected. Monitoring too many events adds overhead to the server and the monitoring process
and can cause the trace file or trace table to grow very large, especially when the monitoring process
takes place over a long period of time.

Steps to create and run a new trace based on this definition file.
1. Open Profiler.
2. From the File menu, select New > Trace...
3. In the 'Connect to SQL Server' dialog box, Connect to the SQL Server that you are
going to be tracing, by providing the server name and login information (Make sure
you connect as a sysadmin).
4. In the General tab of the 'Trace Properties' dialog box click on the folder icon
against the 'Template file name:' text box. Select the trace template file that
you've just downloaded.
5. Now we need to save the Profiler output to a table for later analysis. So, check the
'Save to table:' check box. In the 'Connect to SQL Server' dialog
box, specify an SQL Server name (and login, password), on which you'd like to store
the Profiler output. In the 'Destination Table' dialog box, select a database and table
name. Click OK. It's a good idea to save the Profiler output to a different SQL Server,
than the one on which we are conducting load test.
6. Check 'Enable trace stop time:' and select a date and time at which you want the
trace to stop itself. (I typically conduct the load test for about 2 hours).
7. Click "Run" to start the trace.
It's also a good idea to run Profiler on a client machine instead of on the SQL Server itself.
On the client machine, make sure you have enough space on the system drive.

What are User-Defined Functions? What kinds of User-Defined Functions can be
created?

User-Defined Functions allow you to define your own T-SQL functions that can accept 0 or more parameters
and return a single scalar data value or a table data type.
Different Kinds of User-Defined Functions created are:
Scalar User-Defined Function: A Scalar user-defined function returns one of the scalar data types.
Text, ntext, image and timestamp data types are not supported. These are the type of user-defined
functions that most developers are used to in other programming languages. You pass in 0 to many
parameters and you get a return value.

Inline Table-Value User-Defined Function: An Inline Table-Value user-defined function returns a
table data type and is an exceptional alternative to a view, as the user-defined function can pass
parameters into a T-SQL SELECT command and, in essence, provide us with a parameterized, non-updateable view of the underlying tables.
Multi-statement Table-Value User-Defined Function: A Multi-Statement Table-Value user-defined
function returns a table and is also an exceptional alternative to a view, as the function can support
multiple T-SQL statements to build the final result, where a view is limited to a single SELECT
statement. Also, the ability to pass parameters into a T-SQL SELECT command, or a group of them, gives
us the capability to in essence create a parameterized, non-updateable view of the data in the
underlying tables. Within the CREATE FUNCTION command you must define the table structure that is
being returned. After creating this type of user-defined function, it can be used in the FROM clause of
a T-SQL command, unlike a stored procedure, which can also return record sets but cannot appear in a
FROM clause.
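A minimal sketch of a scalar and an inline table-valued function (the Employee table and its column names are borrowed from examples elsewhere in this document and are assumptions):

-- scalar user-defined function
CREATE FUNCTION dbo.fn_FullName (@fname varchar(50), @lname varchar(50))
RETURNS varchar(101)
AS
BEGIN
    RETURN @fname + ' ' + @lname
END
GO

-- inline table-valued user-defined function (a parameterized view)
CREATE FUNCTION dbo.fn_EmployeesByDept (@DeptID int)
RETURNS TABLE
AS
RETURN (SELECT EmpID, Name, Salary FROM dbo.Employee WHERE DeptID = @DeptID)
GO

-- usage
SELECT dbo.fn_FullName('Mohan', 'Kumar')
SELECT * FROM dbo.fn_EmployeesByDept(1)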

What does the NOLOCK query hint do?


Table hints allow you to override the default behavior of the query optimizer for statements. They are
specified in the FROM clause of the statement. While overriding the query optimizer is not always
suggested, it can be useful when many users or processes are touching data. The NOLOCK
query hint is a good example because it allows you to read data regardless of who else is working with
the data; that is, it allows a dirty read: you read the data no matter whether other users are
manipulating it. A hint like NOLOCK increases concurrency with large data stores.
SELECT * FROM table_name WITH (NOLOCK)
NOLOCK is equivalent to the READUNCOMMITTED table hint; Microsoft advises using it with care
because it can return uncommitted (dirty) data.

What is a WITH (NOLOCK)?

WITH (NOLOCK) allows reading data that is currently locked by a transaction that has
not yet committed (a dirty read). The hint is used with the table in a SELECT statement.
Once the transaction is committed or rolled back there is no need for the
NOLOCK hint, because the locks have been released.

Syntax: WITH (NOLOCK)

Example:
SELECT * FROM EmpDetails WITH (NOLOCK)

WITH (NOLOCK) is equivalent to the READ UNCOMMITTED isolation level.

What are ACID properties of Transaction?


Following are the ACID properties for a database transaction.
Atomicity: A transaction may be a set of SQL statements. If any statement fails then the
entire transaction fails; the transaction follows an all-or-nothing rule.
Consistency: The transaction should always leave the database in a consistent state.
If a transaction would violate the database's consistent state, the transaction
is rolled back.
Isolation: One transaction cannot retrieve data that has been
modified by another transaction until that transaction is completed.
Durability: Once a transaction is committed it must be persisted. In the case of
failure, only committed transactions are recovered and uncommitted transactions are
rolled back.

What is a CTE?
A common table expression (CTE) is a temporary named result set that can be used within other
statements like SELECT, INSERT, UPDATE, and DELETE. It is not stored as an object and its
lifetime/scope is limited to the query. It is defined using the WITH keyword as the following
example shows:

WITH ExampleCTE (id, fname, lname)


AS
(
SELECT id, firstname, lastname FROM table
)
SELECT * FROM ExampleCTE
A CTE can be used in place of a view in some instances.

What are DBCC commands?


Basically, the Database Consistency Checker (DBCC) provides a set of commands (many of which are
undocumented) to perform database maintenance, validation, and status checks. The syntax is
DBCC followed by the command name. Here are three examples:
DBCC CHECKALLOC - checks disk allocation consistency.
DBCC OPENTRAN - displays information about the oldest active transaction.
DBCC HELP - displays help for DBCC commands.

What are temp tables? What is the difference between global and local temp
tables?
Temporary tables are temporary storage structures. You may use temporary tables as buckets to store
data that you will manipulate before arriving at a final format. The hash (#) character is used to
declare a temporary table as it is prepended to the table name. A single hash (#) specifies a local
temporary table.
CREATE TABLE #tempLocal (nameid int, fname varchar (50), lname varchar (50))
Local temporary tables are available to the current connection for the user, so they disappear when
the user disconnects.

Global temporary tables may be created with double hashes (##). These are available to all users via
all connections, and they are deleted only when all connections are closed.
CREATE TABLE ##tempGlobal (nameid int, fname varchar (50), lname varchar (50))
Once created, these tables are used just like permanent tables; they should be deleted when you are
finished with them. Within SQL Server, temporary tables are stored in the Temporary Tables folder of
the tempdb database.

How are transactions used?


Transactions allow you to group SQL commands into a single unit. The transaction begins with a
certain task and ends when all tasks within it are complete. The transaction completes successfully
only if all commands within it complete successfully. The whole thing fails if one command fails. The
BEGIN TRANSACTION, ROLLBACK TRANSACTION, and COMMIT TRANSACTION statements are used to
work with transactions. A group of tasks starts with the begin statement. If any problems occur, the
rollback command is executed to abort. If everything goes well, all commands are permanently
executed via the commit statement.

BEGIN TRANSACTION
Statement1
Statement2
..................
...............
IF(@@ERROR>0)
ROLLBACK TRANSACTION
ELSE
COMMIT TRANSACTION

SQL Server Exception Handling with TRY...CATCH:


To handle exceptions in SQL Server we have TRY...CATCH blocks. We put T-SQL statements
in the TRY block and write the exception-handling code in the CATCH block. If there is an error in the
code within the TRY block then control automatically jumps to the corresponding CATCH
block. In SQL Server, a TRY block can have only one CATCH block.
Syntax: BEGIN TRY
--T-SQL statements
--or T-SQL statement blocks
END TRY
BEGIN CATCH
--T-SQL statements
--or T-SQL statement blocks
END CATCH

System Defined Error Functions used within CATCH block:

1. ERROR_NUMBER() - returns the error number; its value is the same as that of the @@ERROR function.
2. ERROR_LINE() - returns the line number of the T-SQL statement that caused the error.
3. ERROR_SEVERITY() - returns the severity level of the error. A TRY...CATCH block
   catches all the errors that have a severity between 11 and 19.
4. ERROR_STATE() - returns the state number of the error.
5. ERROR_PROCEDURE() - returns the name of the stored procedure or trigger where the
   error occurred.
6. ERROR_MESSAGE() - returns the full text of the error message. The text includes the
   values supplied for any substitutable parameters, such as lengths, object names, or times.
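A minimal sketch showing these functions in a CATCH block (the divide-by-zero is just a way to force an error):

BEGIN TRY
    SELECT 1/0          -- forces a divide-by-zero error
END TRY
BEGIN CATCH
    SELECT  ERROR_NUMBER()    AS ErrorNumber,
            ERROR_SEVERITY()  AS ErrorSeverity,
            ERROR_STATE()     AS ErrorState,
            ERROR_LINE()      AS ErrorLine,
            ERROR_PROCEDURE() AS ErrorProcedure,
            ERROR_MESSAGE()   AS ErrorMessage
END CATCH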

BEGIN TRANSACTION trans

BEGIN TRY
    INSERT INTO Department (DeptID, DeptName, Location) VALUES (2, 'HR', 'Delhi')
    INSERT INTO Employee (EmpID, Name, Salary, Address, DeptID) VALUES (1, 'Mohan', 18000, 'Delhi', 1)
    COMMIT TRANSACTION trans
END TRY
BEGIN CATCH
    PRINT 'Error Occurred'
    ROLLBACK TRANSACTION trans
END CATCH

What is Log Shipping?


Log shipping is the process of automating the backup of the database and transaction log files on a
production SQL Server and then restoring them onto a standby server. It is not available in the
Express edition. In log shipping, the transaction log file from one server is automatically
restored into the standby database on the other server. If one server fails, the other server has
the same database and can be used as the disaster recovery plan. The key feature of log shipping is
that it automatically backs up the transaction logs throughout the day and automatically restores them
on the standby server at a defined interval.

Name 3 ways to get an accurate count of the number of records in a table?


1. SELECT * FROM table1
2. SELECT COUNT(*) FROM table1
3. SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table1') AND indid < 2

What is CHECK Constraint?


A CHECK constraint is used to limit the values that can be placed in a column. The check constraints
are used to enforce domain integrity.
IF OBJECT_ID ('dbo.Vendors', 'U') IS NOT NULL
DROP TABLE dbo.Vendors;

GO
CREATE TABLE dbo.Vendors
(VendorID int PRIMARY KEY, VendorName nvarchar (50),
CreditRating tinyint)
GO
ALTER TABLE dbo.Vendors ADD CONSTRAINT CK_Vendor_CreditRating
CHECK (CreditRating >= 1 AND CreditRating <= 5)
Note: To modify a CHECK constraint, you must delete the existing CHECK constraint and then recreate it with the new definition.

What is Identity?
Identity (or AutoNumber) is a column that automatically generates numeric values. A start
and increment value can be set, but most DBAs leave these at 1. A GUID column also
generates unique values, but those values cannot be controlled. Identity/GUID columns do
not need to be indexed.

What is a view? What is the WITH CHECK OPTION clause for a view?
Views are designed to control access to data . A view is a virtual table that consists of fields
from one or more real tables. Views are often used to join multiple tables or to control
access to the underlying tables.
The WITH CHECK OPTION for a view prevents data modifications that do not
conform to the WHERE clause of the view definition. This allows data to be updated via the
view, but only if it belongs in the view.
WITH CHECK OPTION makes the WHERE clause a two-way restriction. This option is useful
when the view should limit inserts and updates with the same restrictions applied to the WHERE
clause.

This clause is very important because it prevents changes that do not meet the
view's criteria.
Example: Create a view on database pubs for table authors that show the name, phone number and
state from all authors from California. This is very simple:
CREATE VIEW dbo.AuthorsCA
AS
SELECT au_id, au_fname, au_lname, phone, state, contract
FROM dbo.authors
WHERE state = 'ca'
This is an updatable view and a user can change any column, even the state column:
UPDATE AuthorsCA SET state='NY'
After this update, there will be no authors from California state. This might not be the expected
behavior.

Example: Same as above but the state column cannot be changed.
CREATE VIEW dbo.AuthorsCA2
AS
SELECT au_id, au_fname, au_lname, phone, state, contract
FROM dbo.authors
WHERE state = 'ca'
WITH CHECK OPTION
The view is still updatable, except for the state column:
UPDATE AuthorsCA2 SET state='NY'
This will cause an error and the state will not be changed.

How to get @@ERROR and @@ROWCOUNT at the same time?


If @@ROWCOUNT is checked after the error-checking statement then it will have 0 as its value,
because it has already been reset. If @@ROWCOUNT is checked before
the error-checking statement, then @@ERROR gets reset. To get @@ERROR and
@@ROWCOUNT at the same time, read both in the same statement and store them in local variables.
SELECT @RC = @@ROWCOUNT, @ER = @@ERROR
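A small sketch of capturing both values (the UPDATE is an arbitrary example against the Employee table used elsewhere in this document):

DECLARE @RC int, @ER int
UPDATE dbo.Employee SET Salary = Salary * 1.1 WHERE DeptID = 1
SELECT @RC = @@ROWCOUNT, @ER = @@ERROR   -- capture both in one statement
SELECT @RC AS RowsAffected, @ER AS ErrorNumber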

What are the advantages of using Stored Procedures?


1. Stored procedures can reduce network traffic and latency, boosting application
performance.
2. Stored procedure execution plans can be reused, staying cached in SQL Server's
memory, reducing server overhead.
3. Stored procedures help promote code reuse.
4. Stored procedures can encapsulate logic. You can change stored procedure code
without affecting clients.
5. Stored procedures provide better security to your data.

Can SQL Server be linked to other servers like Oracle?


SQL Server can be linked to any server provided there is an OLE DB provider available to
allow the link.
E.g. Microsoft supplies an OLE DB provider for Oracle, which can be used to add an Oracle
instance as a linked server to the SQL Server group.
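A minimal sketch of adding such a linked server (the linked-server name, TNS alias and remote credentials are all hypothetical):

EXEC sp_addlinkedserver
     @server = N'ORACLE_LINK',
     @srvproduct = N'Oracle',
     @provider = N'OraOLEDB.Oracle',
     @datasrc = N'OracleTnsAlias'        -- TNS alias configured on the SQL Server machine

EXEC sp_addlinkedsrvlogin
     @rmtsrvname = N'ORACLE_LINK',
     @useself = 'FALSE',
     @rmtuser = N'scott',
     @rmtpassword = N'password'

-- four-part name query through the linked server
SELECT * FROM ORACLE_LINK..SCOTT.EMP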

What is BCP? When is it used?


BCP (Bulk Copy Program) is a tool used to copy large amounts of data from tables and views. BCP does
not copy the structures from source to destination. The BULK INSERT command helps to
import a data file into a database table or view in a user-specified format.
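A minimal sketch of importing a character-format data file with BULK INSERT (the file path, table name and terminators are assumptions; the bcp command line shown in the comment uses the standard -c, -T and -S switches):

-- command line export: bcp MyDatabase.dbo.Product_Staging out C:\temp\Product.dat -c -T -S MYSERVER
BULK INSERT dbo.Product_Staging
FROM 'C:\temp\Product.dat'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')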

What is an execution plan? When would you use it? How would you view the
execution plan?
An execution plan is basically a road map that graphically or textually shows the data
retrieval methods chosen by the SQL Server query optimizer for a stored procedure or ad-hoc query,
and it is a very useful tool for a developer to understand the performance
characteristics of a query or stored procedure, since the plan is what SQL Server
places in its cache and uses to execute the stored procedure or query. Within Query
Analyzer (and Management Studio) there is an option called "Show Execution Plan" (located on the Query menu).
If this option is turned on, the query execution plan is displayed in a separate window when the
query is run again.
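The plan can also be requested with T-SQL; a short sketch (the SELECT is an arbitrary example query):

-- estimated plan as XML, without executing the query
SET SHOWPLAN_XML ON
GO
SELECT * FROM dbo.Employee WHERE DeptID = 1
GO
SET SHOWPLAN_XML OFF
GO

-- actual plan details while executing
SET STATISTICS PROFILE ON
SELECT * FROM dbo.Employee WHERE DeptID = 1
SET STATISTICS PROFILE OFF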

Materialized Views or Indexed View


Materialized views in Microsoft SQL Server: indexed views are the SQL Server realization
of materialized views. The steps to create one are shown below.
Our SQL Server solution is a DSS for tracking information about the transactions that
occurred in our enterprise and the associated products.
We're going to use the sample AdventureWorks database shipped with SQL Server 2005 and
the tables Production.TransactionHistory and Production.Product, which store
information about the transaction history and the underlying products. We're going
to write a SQL SELECT to report the total cost and quantity by product (shown below). When
the query was executed, in my case, it took 42334 microseconds.
SET STATISTICS TIME ON
select p.ProductID, sum(t.ActualCost), sum(t.Quantity)
from Production.TransactionHistory t inner join Production.Product p on
t.ProductID=p.ProductID group by p.ProductID;
To improve the response time, our strategy is to implement a materialized view in SQL
Server. First of all we have to create a new view. It is remarkable to say that we need to
include the WITH SCHEMABINDING option to bind the view to the schema of the base
tables.
create view v_TotalCostQuantityByProduct with schemabinding
as
select p.ProductID, sum(t.ActualCost) as TotalActualCost, sum(t.Quantity) as TotalQuantity,
count_big(*) as CountRows
from Production.TransactionHistory t inner join Production.Product p on
t.ProductID=p.ProductID group by p.ProductID;
Note that the aggregate columns need aliases and a COUNT_BIG(*) column is required before a
unique clustered index can be created on a view that uses GROUP BY.
And then a clustered index will be created on this regular view. This turns the regular view
into an indexed view, that operation took 20998 microseconds to execute.
create unique clustered index TotalCostQuantityByProduct on v_TotalCostQuantityByProduct
(ProductID)
Now let's run a SQL SELECT statement against the created materialized view as shown
below. It took just 32 microseconds to execute. Good improvement.


select * from v_TotalCostQuantityByProduct

What is use of EXCEPT clause? How it differs from NOT IN clause.


- When we combine two queries using the EXCEPT clause, it returns the distinct rows from the first SELECT
statement that are not returned by the second one.
- The EXCEPT clause works the same way as the MINUS operator in Oracle.
-The syntax of EXCEPT clause is as follow,
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
EXCEPT
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
The difference between EXCEPT and the NOT IN clause is that EXCEPT returns only the distinct rows from
the first select statement which do not exist in the rows returned by the second select
statement, whereas NOT IN returns all rows (including duplicates) from the first select statement
which do not exist in the rows returned by the subquery. Also, NOT IN returns no rows at all if the
subquery returns a NULL, whereas EXCEPT treats NULL values as equal when comparing rows.
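A small sketch of both forms (TableA, TableB and Col1 are hypothetical):

-- rows in TableA that are not in TableB (distinct rows, NULLs treated as equal)
SELECT Col1 FROM TableA
EXCEPT
SELECT Col1 FROM TableB

-- keeps duplicates from TableA, and returns no rows at all if TableB.Col1 contains a NULL
SELECT Col1 FROM TableA
WHERE Col1 NOT IN (SELECT Col1 FROM TableB)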

What is ROW_NUMBER function?

ROW_NUMBER is one of the ranking functions, which are used to give a number or rank to each row in the result set
of a SELECT statement.
For using this function first specify the function name, followed by the empty parentheses.

Then specify the OVER function. For this function, you have to pass an ORDER BY clause as an
argument. The clause specifies the column(s) that you are going to rank.

For Example
SELECT ROW_NUMBER() OVER(ORDER BY Salary DESC) AS [RowNumber], EmpName, Salary,
[Month], [Year] FROM EmpSalary

In the result you will see that the highest salary gets number 1 and the lowest salary gets
the last number. With ROW_NUMBER, rows with equal salaries will not get the same number (unlike RANK).

What are different types of replication in SQL Server?


There are three types of replication in SQL SERVER,
1. Snapshot Replication.

In Snapshot Replication a snapshot of one database is transferred to another database.


In this replication the data can be refreshed periodically, and all the data is copied to the
other database every time the table is refreshed.

2. Transactional Replication

In transactional replication the initial load is the same as in snapshot replication, but afterwards
only the transactions are synchronized instead of replicating the whole database.
We can specify that the database is refreshed either continuously or on a periodic basis.

3. Merge Replication

Merge replication replicates data from multiple sources into a single central database.
The initial load is the same as in snapshot replication, but later it allows changes to the
data both on the subscriber and the publisher; when they come on-line it detects and
combines the changes and updates the data accordingly.

What are Magic tables in SQL Server?


- In SQL Server there are two system tables, Inserted and Deleted, called Magic tables.
- These are not physical tables but virtual tables, generally used with triggers to
retrieve the inserted, deleted or updated rows.
- When a record is inserted in the table, that record will be in the INSERTED magic table.

- When a record is updated in the table, the existing record will be in the DELETED magic
table and the modified data will be in the INSERTED magic table.
- When a record is deleted from the table, that record will be in the DELETED magic table.
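A minimal sketch of a trigger that reads both magic tables (the EmpDetails column names are assumptions):

CREATE TRIGGER trg_EmpDetails_Update
ON EmpDetails
AFTER UPDATE
AS
BEGIN
    -- old values come from DELETED, new values from INSERTED
    SELECT d.EmpID, d.Salary AS OldSalary, i.Salary AS NewSalary
    FROM deleted d
    INNER JOIN inserted i ON i.EmpID = d.EmpID
END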

What is Trigger?
- In SQL, a trigger is procedural code that executes when you INSERT, DELETE or
UPDATE data in a table.
-Triggers are useful when you want to perform any automatic actions such as cascading
changes through related tables, enforcing column restrictions, comparing the results of data
modifications and maintaining the referential integrity of data across a database.
- For example, to prevent users from deleting any employee from the EmpDetails table, the
following trigger can be created.
Create Trigger del_emp
on EmpDetails
For delete
as
Begin
rollback transaction
print "You cannot delete any Employee!"
End
- When someone deletes a row from the EmpDetails table, the del_emp trigger cancels
the deletion, rolls back the transaction, and prints the message "You cannot delete any
Employee!"

What is Normalization of database? What are its benefits?
Normalization is a set of rules that are to be applied while designing the database tables
which are to be connected with each other by relationships. This set of rules is called
Normalization.
Benefits of normalizing the database are
1. No need to restructure existing tables for new data.
2. Reducing repetitive entries.
3. Reducing required storage space
4. Increased speed and flexibility of queries.

Merge Statement (in SQL Server 2008 and later versions)


One of the new features of SQL Server 2008 is Merge Statement. Using a single statement,
we can Add/Update records in our database table, without explicitly checking for the
existence of records to perform operations like Insert or Update,
Syntax:
Merge <Target Table>
Using <Source Table or Table Expression>
On <Join/Merge condition> (similar to outer join)
WHEN MATCHED <statement to run when a match is found in the target>
WHEN NOT MATCHED [BY TARGET] <statement to run when no match is found in the target>
For Example:
Merge INTO dbo.tbl_Customers as C
Using dbo.tbl_CustomersTemp as CT
On c.CustId = CT.CustId
WHEN MATCHED THEN
--Row exists and data is different
UPDATE SET C.CompanyName = CT.CompanyName, C.Phone = CT.Phone
WHEN NOT MATCHED THEN
--Row exists in Source but not in target/destination
INSERT (CustId, CompanyName, Phone)
VALUES (CT.CustId, CT.CompanyName, CT.Phone)
WHEN NOT MATCHED BY SOURCE THEN
--Row exists in target but not in Source
DELETE
OUTPUT $action, inserted.CustId, deleted.CustId;

SQL Server Service Broker


SQL Server Service Broker (SSB) lets you send and receive messages between databases (either
in local or remote instances) by using Transact-SQL.

Note: Messages can be sent within the same database, between different databases, or even
between remotely located SQL Server instances.


The Service Broker communicates with a protocol called the Dialog, which allows for bidirectional
communication between two endpoints (databases). The Dialog protocol specifies
the logical steps required for a reliable conversation, and makes sure that messages are
received in the order they were sent.
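A minimal one-database sketch of the objects involved (all object names are hypothetical, and the database is assumed to have Service Broker enabled):

CREATE MESSAGE TYPE MyMessage VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT MyContract (MyMessage SENT BY INITIATOR);
CREATE QUEUE dbo.InitiatorQueue;
CREATE QUEUE dbo.TargetQueue;
CREATE SERVICE InitiatorService ON QUEUE dbo.InitiatorQueue;
CREATE SERVICE TargetService ON QUEUE dbo.TargetQueue (MyContract);
GO
DECLARE @handle uniqueidentifier;
BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE InitiatorService
    TO SERVICE 'TargetService'
    ON CONTRACT MyContract
    WITH ENCRYPTION = OFF;
SEND ON CONVERSATION @handle
    MESSAGE TYPE MyMessage (N'<msg>Hello from Service Broker</msg>');
GO
-- on the receiving side
RECEIVE TOP (1) message_type_name, CAST(message_body AS xml) AS body
FROM dbo.TargetQueue;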

Cursor:
Cursors are required when we need to process records in a database table in a singleton
fashion, that is, row by row. A cursor also impacts the performance of SQL Server
since it uses the SQL Server instance's memory, reduces concurrency, consumes
network bandwidth and locks resources. So it is better to avoid the use of cursors.
Alternative solutions include a WHILE loop, temporary tables and table variables.
Note: We should use a cursor only when there is no option other than a cursor.
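For completeness, a minimal cursor sketch (again reusing the Employee table assumed elsewhere in this document):

DECLARE @EmpID int, @Name varchar(100)
DECLARE emp_cursor CURSOR FAST_FORWARD FOR
    SELECT EmpID, Name FROM dbo.Employee
OPEN emp_cursor
FETCH NEXT FROM emp_cursor INTO @EmpID, @Name
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT CAST(@EmpID AS varchar(10)) + ' - ' + @Name
    FETCH NEXT FROM emp_cursor INTO @EmpID, @Name
END
CLOSE emp_cursor
DEALLOCATE emp_cursor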

What is the difference between #temp table and @table variable?


#Temp Table
A temp table is a temporary table that is generally created to store session-specific data. It is a
kind of normal table, but it is created and populated on disk, in the system database
tempdb, with a session-specific identifier packed onto the name to differentiate between
similarly-named #temp tables created from other sessions.
The data in this #temp table (in fact, the table itself) is visible only to the current scope.
Generally, the table gets cleaned up automatically when the current procedure goes out of
scope; however, we should manually clean up the data when we are done with it.
Syntax:
-- create temporary table
CREATE TABLE #myTempTable (AutoID int, MyName char(50))
-- populate temporary table
INSERT INTO #myTempTable (AutoID, MyName)
SELECT AutoID, MyName FROM myOriginalTable WHERE AutoID <= 50000

-- Drop temporary table
DROP TABLE #myTempTable

@Table Variable
A table variable is similar to a temporary table but with more flexibility. It is often described as being
stored in memory rather than on disk, although it too is backed by tempdb; it is scoped to the batch and
is a good choice when we need to store a small number of records, roughly fewer than 100.
Syntax:

DECLARE @myTable TABLE (AutoID int, myName char(50))

INSERT INTO @myTable (AutoID, myName)
SELECT AutoID, MyName FROM myOriginalTable WHERE AutoID <= 50

How to Return XML in SQL Server?


We can use the FOR XML clause at the end of a query to return XML data from SQL
Server. There are four modes of returning XML: AUTO, RAW, EXPLICIT and PATH.
For example: SELECT * FROM test FOR XML AUTO (returns <test sno="1" name="a" />)
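A small sketch of the PATH mode against the same hypothetical test table, which returns element-centric XML:

SELECT sno, name
FROM test
FOR XML PATH('row'), ROOT('rows')
-- e.g. <rows><row><sno>1</sno><name>a</name></row></rows>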

Which function is used to count more than two billion rows in a table?
COUNT_BIG(), which returns a bigint. COUNT() returns an int and cannot report more than 2,147,483,647 rows.

Types of SQL Keys


1. Super Key
A super key is a set of one or more keys that can be used to identify a record
uniquely in a table. Example: the Primary key, Unique keys and Alternate keys are subsets of super
keys.
2. Candidate Key
A Candidate Key is a set of one or more fields/columns that can identify a record uniquely in
a table. There can be multiple Candidate Keys in one table. Each Candidate Key can work as
the Primary Key.
Example: In the Student table defined below, ID, RollNo and EnrollNo are Candidate Keys since all
three fields can work as the Primary Key.
3. Primary Key
The Primary key is a set of one or more fields/columns of a table that uniquely identify a record
in the database table. It cannot accept null or duplicate values. Only one Candidate Key can be
the Primary Key.
4. Alternate Key
An Alternate key is a key that can work as a primary key; basically it is a candidate key
that currently is not the primary key.
Example: In the Student table defined below, RollNo and EnrollNo become Alternate Keys when we define
ID as the Primary Key.
5. Composite/Compound Key
A Composite Key is a combination of more than one field/column of a table. It can be a
Candidate Key or the Primary Key.
6. Foreign Key
A Foreign Key is a field in a database table that is the Primary Key in another table. It can accept
multiple null and duplicate values.


Example: We can have a DeptID column in the Employee table which points to the ID
column of the Department table, where it is the primary key.
Defined keys: CREATE TABLE Department ( ID int PRIMARY KEY, Name varchar (50) NOT NULL, Address
varchar (200) NOT NULL )
CREATE TABLE Student ( ID int PRIMARY KEY, RollNo varchar(10) NOT NULL,
Name varchar (50) NOT NULL, EnrollNo varchar(50) UNIQUE, Address varchar(200)
NOT NULL, DeptID int FOREIGN KEY REFERENCES Department(ID))

Note: Practically, in a database we have only three types of keys: Primary Key, Unique
Key and Foreign Key. The other types of keys are just RDBMS concepts that we
need to know.


SQL Server Integration Services (SSIS) Best Practices


The best part of SSIS is that it is a component of SQL Server. It comes free with the SQL
Server installation and you don't need a separate license for it. Because of this, along with
hardcore BI developers, database developers and database administrators are also using it
to transfer and transform data.
SSIS 2008 has enhanced the internal dataflow pipeline engine to provide even better
performance.
Best Practice #1 - Pulling High Volumes of Data

While inserting a high volume of data into the target table, the indexes got
fragmented heavily, up to 85%-90% (the destination table had a clustered primary
key and two non-clustered keys). We can use the online index rebuilding feature
to rebuild/defrag the indexes, but the fragmentation level was back to 90%
after every 15-20 minutes during the load. This whole process of data transfer and
parallel online index rebuilds took almost 12-13 hours, which was much more than
our expected time for the data transfer.
Then we came up with an approach: make the target table a heap by dropping all the
indexes on the target table at the beginning, transfer the data to the heap and, on
completion of the data transfer, recreate the indexes on the target table. With this approach,
the whole process (dropping indexes, transferring data and recreating indexes)
took just 3-4 hours, which was what we were expecting.
So the recommendation is to consider dropping your target table indexes if
possible before inserting data into it, especially if the volume of inserts is very high.


DISABLE INDEX: ALTER INDEX IndexName ON dbo.TableName DISABLE;


ENABLE INDEX: ALTER INDEX IndexName ON dbo.TableName REBUILD;

Best Practice #2 - Avoid SELECT *


If you pull columns which are not required at the destination (or for which no mapping exists),
SSIS will emit warnings like this:
"Removing this unused output column can increase Data Flow task performance."

Beware when you are using "Table or view" or "Table name or view name from
variable" data access mode in OLEDB source. It behaves like SELECT * and pulls all the
columns.
Tip: Try to fit as many rows into the buffer as possible; this will eventually reduce the number of
buffers passing through the dataflow pipeline engine and improve performance.
Best Practice #3 - Effect of OLEDB Destination Settings
There are a couple of settings on the OLEDB destination which can impact the performance of
the data transfer, as listed below.
Data Access Mode - This setting provides the 'fast load' option, which internally uses a
BULK INSERT statement for uploading data into the destination table instead of a
simple INSERT statement (for each single row) as is the case for the other options. So unless
you have a reason for changing it, don't change this default value of fast load.
Keep Identity - By default this setting is unchecked, which means the destination table (if it
has an identity column) will create identity values on its own. If you check this setting, the
dataflow engine will ensure that the source identity values are preserved and the same value is
inserted into the destination table.
Keep Nulls - Again, by default this setting is unchecked, which means the default value will be
inserted (if a default constraint is defined on the target column) during the insert into the
destination table when a NULL value is coming from the source for that particular column. If you
check this option then the default constraint on the destination table's column will be ignored
and the NULL from the source column will be inserted into the destination.
Table Lock - By default this setting is checked and the recommendation is to leave it
checked unless the same table is being used by some other process at the same time. It
specifies that a table lock will be acquired on the destination table instead of acquiring multiple
row-level locks, which could turn into lock escalation problems.
Check Constraints - Again, by default this setting is checked and the recommendation is to
uncheck it if you are sure that the incoming data is not going to violate constraints of the
destination table. This setting specifies that the dataflow pipeline engine will
validate the incoming data against the constraints of the target table. If you uncheck
this option it will improve the performance of the data load.

Best Practice #4 - Effect of Rows Per Batch and Maximum Insert Commit Size
Settings.
Rows per batch - The default value for this setting is -1, which specifies that all incoming rows
will be treated as a single batch. You can change this default behavior and break all
incoming rows into multiple batches. The allowed value is only a positive integer which
specifies the maximum number of rows in a batch.
Maximum insert commit size - The default value for this setting is '2147483647' (the largest
value for a 4-byte integer type), which specifies that all incoming rows will be committed once on
successful completion.
You can specify a positive value for this setting to indicate that a commit will be done for
that number of records. You might be wondering whether changing the default value for this
setting will put overhead on the dataflow engine to commit several times. Yes, that is true,
but at the same time it will release the pressure on the transaction log and tempdb to grow
tremendously, specifically during high-volume data transfers.
The above two settings are very important to understand in order to improve the performance of
tempdb and the transaction log. For example, if you leave 'Max insert commit size' at its
default, the transaction log and tempdb will keep on growing during the extraction process,
and if you are transferring a high volume of data, tempdb will soon run out of space, and as
a result your extraction will fail. So it is recommended to set these values to an
optimum value based on your environment.
Best Practice #5 SQL Server Destination Adapter
It is recommended to use the SQL Server Destination adapter if your target is a local SQL
Server database. It provides a similar level of data-insertion performance as the
Bulk Insert task and provides some additional benefits: with the SQL Server
Destination adapter we can transform the data before uploading it to the
destination, which is not possible with the Bulk Insert task.

Note: Remember if your SQL Server database is on a remote server, you cannot use
SQL Server Destination adapter. It is better to use the OLEDB destination adapter to
minimize future changes.
Best Practice #6 - Avoid asynchronous transformation (such as Sort, Aggregate
Transformations) wherever possible
Internally, the SSIS runtime engine executes the package. It executes every task other than the
data flow task in the defined order. Whenever the SSIS runtime engine encounters a data
flow task, it hands over the execution of the data flow task to the data flow pipeline engine.
The data flow pipeline engine breaks the execution of a data flow task into one or more
execution trees to achieve high performance.
At run time Data Flow Engine divides the Data Flow Task operations into Execution Trees. These
execution trees specify how buffers and threads are allocated in the package.
Each tree creates a new buffer and may execute on a different thread. When a new buffer is created
such as when a partially blocking or blocking transformation is added to the pipeline, additional
memory is required to handle the data transformation.

A new buffer requires extra memory to deal with the transformation that it is associated
with.

Buffer usage
Use of buffers by SSIS transformation type,
- Row-by-row transformations: Rows are processed as they enter the component; thus,
there is no need to accumulate data. Because such a component is able to use buffers previously created (by
preceding components), it is not necessary to create new ones and copy data into
them. Examples: Data Conversion, Lookup, Derived Column, etc.
- Partially blocking transformations: These are usually used to combine data sets. Since
there is more than one data input, it is possible to have huge amounts of rows waiting,
stored in memory, for the other data set to reach the component. In these cases, the
component's data output is copied to new buffers and new execution threads may be
created. Examples: Union All, Merge Join, etc.
- Fully blocking transformations: Some transformations need the complete data set before
they start running. Therefore, these are the ones that impact on performance the most. In
these cases, as well, new buffers and new execution threads are created.
Examples: Aggregate, Sort.
SSIS reuses previously used buffers as much as possible, in order to increase performance.
Row-by-row transformations are known as synchronous: each input row produces one output
row. On the other hand, partially-blocking and fully-blocking transformations are known as
asynchronous: there is no need to have the same number of input rows as output rows (they may
produce no output rows at all).

Begin Path 0 [Tree 1]


output "OLE DB Source Output" (11); component "SrcEmployee" (1)
input "OLE DB Destination Input" (29); component "DestEmployee" (16)
End Path 0

Begin Path 1 [Tree 2]


output "OLE DB Source Output" (124); component "SrcDepartment" (114)
input "Sort Input" (147); component "SortDepartment" (146)
End Path 1
Begin Path 2 [Tree 3]
output "Sort Output" (148); component "SortDepartment" (146)
input "OLE DB Destination Input" (142); component "DestDepartment" (129)
End Path 2

Best Practice #7 - DefaultBufferMaxSize and DefaultBufferMaxRows

As per the above Best Practice, the execution tree creates buffers for storing incoming rows and
performing transformations. So,
How many buffers does it create?
How many rows fit into a single buffer?
How does it impact performance?
The number of buffers created depends on how many rows fit into a buffer, and how many rows
fit into a buffer depends on a few other factors.
-- The first consideration is the estimated row size, which is the sum of the maximum sizes of
all the columns from the incoming records.

--The second consideration is, the DefaultBufferMaxSize property of the data flow task. This
property specifies the default maximum size of a buffer. The default value is 10 MB and its
upper and lower boundaries are constrained by two internal properties of SSIS which are
MaxBufferSize (100MB) and MinBufferSize (64 KB). It means the size of a buffer can be as small as
64 KB and as large as 100 MB.
--The third factor is, DefaultBufferMaxRows which is again a property of data flow task which
specifies the default number of rows in a buffer. Its default value is 10000.
Best Practice #8 - How DelayValidation property can help you

SSIS uses validation to determine if the package could fail at runtime. SSIS uses two types of
validation. First is package validation (early validation) which validates the package and all its
components before starting the execution of the package. Second, SSIS uses component validation
(late validation), which validates the components of the package once execution has started.
Let's consider a scenario where the first component of the package creates an object i.e. a
temporary table, which is being referenced by the second component of the package. During
package validation, the first component has not yet executed, so no object has been created
causing a package validation failure when validating the second component.
SSIS will throw a validation exception and will not start the package execution. So how will you
get this package running in this common scenario?
To help you in this scenario, every component has a DelayValidation (default=FALSE) property. If
you set it to TRUE, early validation will be skipped and the component will be validated during
package execution.
Best Practice #9 - Better performance with parallel execution
SSIS has been designed to achieve high performance by running the executables of the
package and data flow tasks in parallel. This parallel execution of the SSIS package
executables and data flow tasks can be controlled by two properties provided by SSIS as
discussed below.

-- MaxConcurrentExecutables - It's the property of the SSIS package and


specifies the number of executables (different tasks inside the package) that can
run in parallel within a package or in other words, the number of threads SSIS
runtime engine can create to execute the executables of the package in parallel.
If we have a package workflow with parallel tasks, this property will make a
difference. Its default value is -1, which means total number of available processors
+ 2, also if you have hyper-threading enabled then it is total number of logical
processors + 2.
--EngineThreads - As I said, MaxConcurrentExecutables is a property of the
package and used by SSIS runtime engine for parallel execution of package
executables, likewise data flow tasks have the EngineThreads property which is
used by the data flow pipeline engine and has a default value of 5 in SSIS 2005
and 10 in SSIS 2008.

This property specifies the number of source threads (which pull data from the source)
and worker threads (which do the transformation and upload into the destination) that can be
created by the data flow pipeline engine to manage the flow of data and the data
transformation inside a data flow task. It means that if EngineThreads has the value 5,
then up to 5 source threads and also up to 5 worker threads can be created. Please
note, this property is just a suggestion to the data flow pipeline engine; the pipeline
engine may create fewer or more threads if required.
Best Practice #10 - Lookup transformation consideration

In the data warehousing world, it's a frequent requirement to enrich records from a
source by matching them against a lookup table.
The Lookup transformation has been designed to perform optimally; for example, by
default it uses Full Caching mode, in which all reference dataset records are brought into
memory in the beginning (the pre-execute phase of the package) and kept for
reference. This ensures the lookup operation performs faster and at the same
time reduces the load on the reference data table, as it does not have to fetch
each individual record one by one when required.
If you do not have enough memory or the data does change frequently you can either use
Partial caching mode or No Caching mode.
In Partial Caching mode, whenever a record is required it is pulled from the reference
table and kept in memory, with it you can also specify the maximum amount of memory to
be used for caching and if it crosses that limit it removes the least used records from
memory to make room for new records.
This mode is recommended when you have memory constraints and your reference data
does not change frequently.
No Caching mode performs slower as every time it needs a record it pulls from the
reference table and no caching is done except the last row.
It is recommended if you have a large reference dataset and you don't have enough
memory to hold it and also if your reference data is changing frequently and you want the
latest data.
Best Practice #11 - Finally few more general SSIS tips

1. Merge Statement: Use the MERGE statement to combine INSERT and UPDATE
operations in a single statement while incrementally uploading data (no need for a lookup
transformation), and Change Data Capture for incremental data pulls.
2. Change Data Capture: CDC is a new feature in SQL Server 2008 that records
insert, update and delete activity in SQL Server tables. A good example of how
this feature can be used is in performing periodic updates to a data warehouse.

The requirement for the extract, transform, and load (ETL) process is to update the
data warehouse with any data that has changed in the source systems since the
last time the ETL process was run.
Note: CDC is a feature that must be enabled at the database level; it is disabled
by default. To enable CDC you must be a member of the sysadmin fixed server
role. You can enable CDC on any user database; you cannot enable it on system
databases. A minimal enabling script is sketched after this list.
3. The RunInOptimizedMode property (default FALSE) of the data flow task can be set to
TRUE to stop columns from flowing down the line if they are not being
used by downstream components of the data flow task. Hence it improves the
performance of the data flow task.
The SSIS project also has the RunInOptimizedMode property, which is applicable
at design time only, which if you set to TRUE ensures all the data flow tasks are
run in optimized mode irrespective of individual settings at the data flow task level.
4. Make use of sequence containers to group logically related tasks into a single
group for better visibility and understanding.
5. By default a task, like the Execute SQL task or Data Flow task, opens a
connection when starting and closes it once its execution completes. If you want to
reuse the same connection in multiple tasks, you can set the RetainSameConnection
property of the connection manager to TRUE; in that case, once the connection is
opened it will stay open so that other tasks can reuse it within that single
connection.
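Referring back to tip #2, a minimal sketch of enabling CDC (the database, schema and table names are hypothetical):

USE MyDatabase
GO
EXEC sys.sp_cdc_enable_db
GO
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Customers',
     @role_name     = NULL      -- NULL means access to the change data is not gated by a role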


Data warehousing concepts


Dimensional data model is most often used in data warehousing systems. This is
different from the 3rd normal form, commonly used for transactional (OLTP) type
systems.

What is a Data Warehouse? A data warehouse is an electronic store of an
organization's historical data for the purpose of analysis and reporting.
According to Bill Inmon, a data warehouse should be subject-oriented, non-volatile,
integrated and time-variant.
Uses of a DWH: A data warehouse helps to integrate data (see Data integration) and
store it historically so that we can analyze different aspects of the business,
including performance analysis, trends, prediction etc. over a given time frame, and
use the results of our analysis to improve the efficiency of business processes.

What is a data mart? Data marts are generally designed for a single subject area.
An organization may have data pertaining to different departments like
Finance, HR, Marketing etc. stored in the data warehouse, and each department
may have a separate data mart. These data marts can be built on top of the
data warehouse.

What is the ER model? The ER model, or entity-relationship model, is a particular
methodology of data modeling wherein the goal of modeling is to normalize
the data by reducing redundancy. This is different from dimensional modeling,
where the main goal is to improve the data retrieval mechanism.

What is dimensional modelling? A dimensional model consists of dimension
and fact tables. Fact tables store different transactional measurements and
the foreign keys from the dimension tables that qualify the data. The goal of
the dimensional model is not to achieve a high degree of normalization but to
facilitate easy and faster data retrieval.
Ralph Kimball is one of the strongest proponents of this very popular data modeling
technique, which is often used in many enterprise-level data warehouses.


What is a dimension? A dimension is something that qualifies a quantity
(measure).
For example, consider this: if I just say 20 kg, it does not mean anything. But
if I say, "20 kg of Rice (product) is sold to Ramesh (customer) on 5th April (date)",
then that makes meaningful sense. The product, customer and date are
dimensions that qualify the measure, 20 kg.
Dimensions are mutually independent. Technically speaking, a dimension is a data
element that categorizes each item in a data set into non-overlapping regions.

What is Fact? A fact is something that is quantifiable (Or measurable). Facts are
typically (but not always) numerical values that can be aggregated.

What are additive, semi-additive and non-additive measures?


Non-additive Measures
Non-additive measures are those which cannot be used inside any numeric
aggregation function (e.g. SUM (), AVG () etc.). One example of non-additive fact
is any kind of ratio or percentage. Example, 5% profit margin, revenue to asset ratio
etc. A non-numerical data can also be a non-additive measure when that data is
stored in fact tables, e.g. some kind of varchar flags in the fact table.

Semi Additive Measures


Semi-additive measures are those where only a subset of aggregation functions
can be applied. Take account balance: a SUM() over balances does not
give a useful result, but MAX() or MIN() balance might be useful. Consider a price rate
or currency rate: SUM is meaningless on a rate; however, an average might be
useful.

Additive Measures
Additive measures can be used with any aggregation function like Sum(), Avg() etc.
Example is Sales Quantity etc.
