Tuesday, February 10, 2009

SQL Optimization Tips

Tips

1. Usage of Stored Procedures


Probably the most common mistake which is found in application code are query requests that doesn't make use of either prepared queries or stored procedures and instead use non-parameterized ad-hoc queries to request data from the database.
Not preparing your queries or using stored procedures can unnecessarily bloat SQL Server's plan cache. What is the plan cache? Simply speaking, it's a part of the SQL Server's pool of shared memory where query execution plans are kept after queries have been parsed, compiled, and optimized for execution. This area of memory is searched whenever a query is executed in order to determine if an existing plan can be re-used to satisfy a query request. Re-using plans saves the database engine the potentially CPU intensive work of having to re-parse, re-compile, and re-optimize the query over and over again even if the only difference is the values being used in the WHERE clause. This leads to quicker query response times and lessens the chance of CPU pressure on the server.
The cached execution plan used to give stored procedures a performance advantage over queries. However, for the last couple of versions of SQL Server, execution plans are cached for all T-SQL batches, regardless of whether or not they are in a stored procedure. Therefore, performance based on this feature is no longer a selling point for stored procedures. Any T-SQL batch with static syntax that is submitted frequently enough to prevent its execution plan from aging out of memory will receive identical performance benefits. The "static" part is key; any change, even something as insignificant as the addition of a comment, will prevent matching with a cached plan and thereby prevent plan re-use.
However, stored procedures can still provide performance benefits where they can be used to reduce network traffic. You only have to send the EXECUTE stored_proc_name statement over the wire instead of a whole T-SQL routine, which can be pretty extensive for complex operations. A well-designed stored procedure can reduce many round trips between the client and the server to a single call.
Additionally, using stored procedures allows you to enhance execution plan re-use, and thereby improve performance, by using remote procedure calls (RPCs) to process the stored procedure on the server. When you use a SqlCommand.CommandType of StoredProcedure, the stored procedure is executed via RPC. The way RPC marshals parameters and calls the procedure on the server side makes it easier for the engine to find the matching execution plan and simply plug in the updated parameter values.

One last thing to think about when considering using stored procedures to enhance performance is whether you are leveraging T-SQL strengths. Think about what you want to do with the data.


• Are you using set-based operations, or doing other operations that are strongly supported in T-SQL? Then stored procedures are an option, although in-line queries would also work.


• Are you trying to do row-based operations, or complex string manipulation? Then you probably want to re-think doing this processing in T-SQL, which excludes using stored procedures, at least until SQL Server 2005 is released and Common Language Runtime (CLR) integration is available.
1.1 Use SET NOCOUNT ONStored procedures by default will return the count of rows affected by each statement in the procedure.

If you don't need to use this information in your application (most applications don't), use the SET NOCOUNT ON statement in your stored procedure to stop this behavior. This removes one or more trips between the client and server, depending on how many statements that affect rows are included in the stored procedure. This doesn't sound like a huge issue, but it can adversely affect performance in high-traffic applications.


1.2 Don't Use the sp_ prefix


The sp_ prefix is reserved for system stored procedures. The database engine will always look for stored procedures with this prefix first in the master database. This means that it will take just a bit longer to finish processing while the engine looks through the master database first, and then the database where the stored procedure actually resides. Also, if there happens to be a system stored procedure with an identical name, your procedure won't get processed at all.


1.3 Use Optional Parameters Sparingly


Think carefully before using optional parameters extensively. You can easily impact performance by doing extra work that isn't required based on the set of parameters fed in for any given execution. You can get around this by using conditional coding for every possible combination of parameters, but this is time-consuming and increases your opportunities for error.

1.4 Use OUTPUT Parameters

Where PossibleYou can add a little speed and save a little processing power by using an OUTPUT parameter to return scalar data. Where your application needs a single value returned, try this approach instead of materializing a result set. You can also use an OUTPUT parameter to return a cursor when that is appropriate, but we'll leave the cursors vs. set-based processing ideological dispute for a future article.


1.5 DDL First, DML LaterSQL Server will recompile a stored procedure when a DML statement is executed after a data definition language (DDL) statement.

where the DML references any object modified by the DDL. This happens because SQL Server needs to take into account the changes made to the object by the DDL in order to create a plan for the DML. If you take care of all your DDL at the start of the stored procedure, then it only has to recompile once. If you mix DDL and DML statements, you will force the stored procedure to recompile multiple times, and that will adversely affect your performance.


2. Turn on the execution plan, and statistics


The first thing you need to do is to use the tools that help you determine whether a query done one way is better than another. That’s what we were trying to do. By comparing the original query to a new query that we come up with is the best way to evaluate the benefits of any changes.
To do this, go into SQL Server Management Studio and select the Query menu. Select the “Include Actual Query Plan.” This turns on the graphical Execution Plan when you execute a query, and that can be found in the bottom pane after the execution.
In the Execution Plan, you can mouse over the components of the plan and it provides tool tip information boxes. The box contains Estimated Sub tree Cost, which can be used to help determine whether one query is better than another. Of course, it’s not always right, as some query parts are not included in the execution plan, but it helps. It is also helpful to know the estimated number of rows, which is also found in this tool tip box.
Next, turn on statistics. Type the following statement:


SET STATISTICS IO ON;


This causes statistics to be output to the Messages tab in the bottom pane. The information you want here is mainly logical reads and physical reads. Logical reads are page reads from memory. Physical reads are page reads from disk. This stat can be a little deceptive as it doesn’t include CPU in the metric, but in general, the less page reads, the less work done, and so the more performance the query will be.
To counteract the above two you should also compare the actual execution times. To do this, execute the following statement:


SET STATISTICS TIME ON;


This also has issues, as blocking and contention issues affect the output time. You should execute the query a few times to determine how accurate the time shown is. en that part is loaded, otherwise it is not loaded. I have kept all 8 pages in Iframe and it reduced page load time to 50%.


3. TempDB Optimization


When SQL Server is installed the setup program creates tempdb database. Tempdb is a system database used by SQL Server to store temporary tables and temporary stored procedures, for sorting, subqueries, and aggregates with GROUP BY, ORDER BY, for cursors and so on. Tempdb database contains only temporary objects, so if you want to create a permanent object, do not create it in the tempdb database.


3.1 Physical Properties of tempdb:


The following table lists the initial configuration values of the tempdb data and log files. The sizes of these files may vary slightly for different editions of SQL Server.
File Logical name Physical name File growth Primary data tempdev tempdb.mdf Auto grow by 10 percent until the disk is full Log templog templog.ldf Auto grow by 10 percent to a maximum of 2 terabytes
The size of tempdb can affect the performance of a system. For example, if the tempdb size is too small, the system processing could be too occupied with auto growing the database to support your workload requirement every time that you start SQL Server. You can avoid this overhead by increasing the size of tempdb
Table of Contents


3.2 Performance Improvements in tempdb


In SQL Server, tempdb performance is improved in the following ways:


· Temporary tables and table variables may be cached. Caching allows operations that drop and create the temporary objects to execute very quickly and reduces page allocation contention.
· Allocation page latching protocol is improved. This reduces the number of UP (update) latches that are used.
· Logging overhead for tempdb is reduced. This reduces disk I/O bandwidth consumption on the tempdb log file.
· The algorithm for allocating mixed pages in tempdb is improved.
Moving System Databases:
This topic describes how to move system databases in SQL Server. Moving system databases may be useful in the following situations:
· Failure recovery. For example, the database is in suspect mode or has shut down because of a hardware failure.
· Planned relocation.
· Relocation for scheduled disk maintenance.
Connect to an instance either using management studio or query analyzer. Let us first check physical location and logical name of the existing tempdb files.
– FOR SQLServer 2005 USE MASTER GO SELECT name,physical_name,size,max_size FROM sys.master_files WHERE database_id = DB_ID(N’tempdb’); GO
OR
- For SQLServer 2000. This query can be used in SQLServer 2005 – as well. USE TEMPDB GO SELECT name,filename,size,max size
Table of Contents
- FROM sysfiles GO
Following will be the output from the first query. We have modified the result set to fit into the page.
name physical_name size max_size ——- ———————— —– ——- tempdev C:\MSSQL\DATA\tempdb.mdf 1024 -1 templog C:\MSSQL\DATA\templog.ldf 64 -1
In the above result set,:
• name is the logical name of the file. • Physical_name is current location where data and log file resides. It has been modified to fit into page for better display. • Size is the file size shown in terms of 8KB pages. • Max_size indicates maximum size of the file. -1 indicates, that we allow file to grow until disk is full.
Now issue the following command to change the size of the data and log file and also specify new path to put files in new location. If one does not want to change the file size, SIZE parameter can be omitted. Make sure that you change the FILENAME parameter value appropriately before executing the command.
USE MASTER GO ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE=200 MB, FILENAME = ‘D:\SQLServer\Data\tempdb.mdf’) GO
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 500 MB, FILENAME = ‘D:\SQLServer\Data\templog.ldf’) GO
Even though command is executed successfully, change will not be immediate. We need to stop and restart SQL Server service. Next time upon instance startup, tempdb will be created using files in new location.
Table of Contents
Once instance is up and running, we need to delete old tempdb files from its original location. Let us verify that change has taken place by issuing following command. New values for location and size will be displayed along with other related values.
USE tempdb GO sp_helpfile GO
Appropriate sizing and placement of tempdb will definitely help in resolving tempdb performance bottlenecks. In our future posts, we will discuss how to avoid performance issues related to tempdb – we will look at hardware optimization and placement of tempdb, number of files created for tempdb, proper file sizes and code optimizations.


3.3 tempdb Space Requirements:


The size and physical placement of the tempdb database can affect the performance of a system. For example, if the size that is defined for tempdb is too small, part of the system-processing load may be taken up with auto growing tempdb to the size required to support the workload every time you restart the instance of SQL Server. You can avoid this overhead by increasing the sizes of the tempdb data and log file. For information about determining the appropriate amount of disk space required for tempdb.
It is difficult to estimate the tempdb space requirement for an application. This section describes the general methodology for estimating the space requirement. These methods are not precise. It requires experience and experiment to gain a satisfactory result. We recommend that you always have a safety factor of about 20% more space. Space estimates must also allow for the future growth of data in the application.
To understand the space requirement of tempdb data files, first look at which features of SQL Server 2005 use tempdb. The tempdb space requirement is different for each feature. The following features use tempdb:
Query
Triggers
Snapshot isolation and read committed snapshot (RCSI)
MARS
Online index creation
Table of Contents
Temporary tables, table variables, and table-valued functions
DBCC CHECK
LOB parameters
Cursors
Service Broker and event notification
XML and LOB variable
Query notifications
Database mail
Index creation
User-defined functions
A server instance may use some or all of these features. For some servers, one or two features might dominate tempdb space usage. In this case, we concentrate most of the capacity planning effort on these few features. For example, you may find out that for one server, a query requires a peak of 50 GB of tempdb space, and RCSI requires a peak of 20 GB of tempdb space, and the remaining features require only 1 GB of peak space. It would be sufficient to allocate 85 GB (71GB + 20% overhead) of tempdb space for this server.

4. SQL Server Profiler:


SQL Profiler is a tool that captures SQL Server events from the server and saves those events in what's known as a trace file. You can then analyze or use the trace file to troubleshoot logic or performance problems. You can use this utility to monitor several areas of server activity, such as:
· Analyzing and debugging SQL statements and stored procedures.
· Monitoring slow performance.
· Stress analysis.
· General debugging and troubleshooting.
· Fine-tuning indexes.
· Auditing and reviewing security activity.


For the most part, Profiler is an administrative tool that requires a bit of experience to master. (Use Profiler to monitor only the events you're troubleshooting or analyzing.) Fortunately, Profiler provides a graphical interface, which makes both learning and monitoring much simpler. However, Profiler requires 10 MB of free space; if free space falls below 10 MB, Profiler stops.
To access Profiler, you must be the administrator or have permission to connect to a specific instance of SQL Server and have granted permissions to execute Profiler stored procedures.
Although on the surface SQL Server 2005 Profiler has not changed dramatically, you may still need help setting up a Profiler session. Here I'll outline several steps to get you up and running. The final step includes sample T-SQL queries to analyze the SQL Server Profiler results.


1.Start SQL Server 2005 Profiler via the GUI interface by navigating to Start All Programs Microsoft SQL Server 2005 Performance Tools SQL Server Profiler.

Start a new Trace session by selecting the 'File' menu and the 'New Trace' option. Once the 'Connect to Server' interface loads, select the 'Type' as either 'Database Engine' or 'Analysis Services'. For this tip we will use the 'Database Engine' option. Then select the 'Server Name' followed by the 'Authentication' type. Finally, press the 'Connect' button to start the configuration.

Configure SQL Server Profiler Trace Properties – General Tab:
· Trace Name: Specify a meaningful name for the session
· Use the template: A total of eight templates are available with predefined events selected
· For this tip we will use the 'Standard (default)' template
· Save to table: To retain a copy of the data save the results to either a database table or Windows file.
· For this tip we will save the results to the dbo.TraceResults table in my user defined database

Configure SQL Server Profiler Trace Properties – Events Selection Tab:
· Review the specific events and select the needed check boxes to capture the desired data.
· Show All Events: Select this check box to see all events that Profiler will be able to capture.
· Show All Columns: Select this check box to see all columns that Profiler will be able to capture.

Additional Configuration – Column Filters
· Specify filters based on the columns that are selected for the session to limit the data.

Additional Configuration – Organize Columns
· Specify the column order as well grouping settings for the final data.

To start the session, press the 'Run' button on the Trace Properties interface.
Review the results that are captured in the Profiler interface.

5. SQL Server Performance Monitor Counters :


Performance Monitor collects data about different counters, such as memory use. Performance Monitor can show you data in graphical format in real time, or you can save the data to log files. you will also use log files for long-term performance monitoring. Working with a log file can be difficult to learn on your own because the options are not intuitive and are hidden on different screens.
You can choose between two Performance Monitors: one in the Administrative Tools group and one in the SQL Server group. They are the same basic program, but you need to run the one in the SQL Server group because it automatically loads the SQL Server-related counters. You run this version of the program with the following command: Perfmon.exe C:\Mssql\Binn\ Sqlctrs.pmc, where the .pmc file is the Performance Monitor counter file that contains the SQL counters. You can write applications that provide your own counters, and you can modify the new system stored procedures called sp_user_counter1 through sp_user_counter10 and track them, too.
When you run the program from the SQL Server group.The program that includes the set of SQL counters, five counters appear at the bottom of the window when Performance Monitor starts. The five counters are
· Cache Hit Ratio
· I/O — Transactions per second
· I/O — Page Reads per second
· I/O Single Page Writes per second
· User Connections


Understanding Counters


Windows NT lets you watch the performance of the system by “counting” the activity associated with any of its objects. Examples of objects in Windows NT are processors, disk drives, and processes. Each object has specific counters associated with it; for example, the % User Time counter is associated with a CPU or processor to designate what percent of the CPU is taken up by user programs (as opposed to system processes).

SQL Server includes many predefined counters, most of which you aren’t likely to use except in special cases. It can be difficult to know which counters are the basic
ones to watch. If you have chosen the SQL Server Performance Monitor, several counters have been set up as default counters, such as Cache Hit Ratio and User Connections. You can create your own defaults by creating a .pmc file.


Cache Hit Ratio :
To monitor your cache, watch SQL Server — Cache Hit Ratio. It monitors the rate at which the system finds pages in memory without having to go to disk. The cache hit ratio is the number of logical reads divided by the total of logical plus physical reads. If the value for this counter is consistently less than 80 percent, you should allocate more memory to SQL Server, buy more system memory, or both. However, before you buy more memory, you can try changing the read-ahead configuration options. Also look at the discussion of free buffers in the next chapter to determine whether the number of free buffers is approaching zero. Changing the free buffers configuration parameter may increase the cache hit ratio.


I/O — Transactions per second
SQL Server: I/O Transactions Per Second counter is a bit misleading, it is still good, especially for capacity planning. This counter measures the number of Transact-SQL batches processed since the last refresh period. You should not use these results against any standard TPC benchmark tests that give results in transactions per second — it is not referring to a Begin/Commit transaction, just to batches of commands. Watch this number over a span of several months, because an increase in this counter can indicate that the use of SQL Server is growing.


I/O — Page Reads per second:
The SQL Server: I/O Page Reads per Second counter is the number of pages not found in SQL Server data cache, which indicates physical reads of data pages from disk. This value does not count pages that are read from the Windows NT virtual memory disk file. There is no way to watch only the logical page reads per second. According to sources in the SQL development team, counters for logical pages reads are hidden in a structure that is not available in this version of SQL Server. However, you can figure out the logical page reads per second by taking the total page reads per second and subtracting the physical page reads per second.


I/O Single Page Writes per second:
You should occasionally turn on the I/O Single Page Writes counter. A lot of single page writes means you need to tune SQL Server, because it is writing single pages to disk instead of its normal block of pages. Most writes consist of an entire extent

(eight pages) and are performed at a checkpoint. The lazywriter handles all the writing of an extent at a time. When SQL is forced to hunt for free pages, it starts finding and writing the LRU pages to disk — one page at a time. A high number of single page writes means that SQL Server does not have enough memory to keep a normal amount of pages in data cache.


User Connections:
Last but not least, you can define counters. The user-defined counters are in the SQL Server User-Defined Counters object in the Master database. The 10 counters correspond to 10 new stored procedures called sp_User_Counter1 through sp_User_Counter10. These stored procedures are the only system stored procedures you should change. If you look at the code of the procedure, they all perform a Select 0, which, when tracked on Performance Monitor, draws a flat line at the bottom of the screen. Replace the Select 0 with a Select statement that returns one number; an integer is preferable, but float, real, and decimal numbers also work. These queries should be quick, not ones that take minutes to run.


1. 6.Tuning SQL Server Hardware:
Tuning SQL Server for performance is in many ways like constructing a house. You must find a suitable location with rich soil, build a solid foundation to support two or three levels, install the electrical and plumbing, finish the walls for painting and decorating, and finally conduct ongoing maintenance.
When specifying your SQL Server hardware requirements always consider the following:
The size of the database;
Number of concurrent users;
Transaction throughput;
The operations that will be performed on the databases.
Hardware planning:
Proper hardware planning is the first step in achieving high performance. It is imperative to determine what hardware to use based on requirements that can be calculated by capacity planning for CPUs, memory, drives, network interface cards (NICs), etc. At times this decision is rather easy; you can use corporate standards to simplify the hardware support. If standards do not exist, seize the opportunity and begin building standards to simplify the overall management.

While standardization is critical to supporting a large number of servers, it should be complimented by the latest technologies so you may continuously achieve higher performance at lower costs. Two current hardware technologies that should be considered in the short and long-term are 64-bit technologies and database accelerators.


Hardware-based accelerators:
Database accelerators have been getting a fair amount of press lately for SQL Server. At a conceptual level, database accelerators are hardware devices with CPUs and memory. They store the entire database or a subset of tables in memory to process user transactions that write back to the SQL Server, which ultimately stores the data. The advantage is that these devices can sometimes support the entire database or core tables in dedicated memory, which outperforms accessing the data from disk. This can help from a scale up or scale out perspective because a single database accelerator can alleviate hardware bottlenecks without a change to the existing SQL Server.
I would recommend that for a production server that hosts your important most frequently accessed databases and for database that must have high availability that you upgrade the SQL Server hardware beyond the minimum specification in some cases to the highest specification that can produce.
This will help to prevent bottlenecks and poor system performance.
Processor: I would recommend that you "scale up" your server's processor and in some cases also "Scale out."
By "Scaling Up" to faster processors SQL Server will process queries faster. This will result in your end users seeing an improvement in performance.
Also it may be necessary to "scale out" your processors. According to Microsoft © SQL Server can support up to 32 processors on Symmetrical Multiprocessing (SMP) computers running on Windows 2000 Data Centre Server.
Scaling out multiple processors may improve fault tolerance and maybe necessary for some very large databases.


Disk Drives and Disk Space:
When planning your SQL Server Hardware Requirements it is important to remember that database applications can extremely Input/Output intensive.
Most senior SQL Server DBA’s would recommend using fast SCSI drives and quality disk controllers (or if you have no budgetary restrictions a SAN environment).

As an important note when using write caching disk controllers, I recommend that you confirm with the manufacturer before implementation that they are suitable for use with Database Servers.
Using inappropriate write caching disk controllers can result in data loss if a "catastrophic failure" occurs, as an example: loss of power.
If you want redundancy and fault tolerance you should consider using RAID. To quote from Microsoft "The different Levels of RAID provide varying levels of performance and fault tolerance. Hardware based RAIDS provides more performance that Operating System RAID


Memory:
An insufficient amount of RAM can cause your SQL server to continually read data from disk instead of cached memory.
This will impact on query performance in most cases this impact will be significant. Having an appropriate amount of memory will allow SQL Server to process your queries more efficiently: it can hold more data in its cache.

6. SQL Server Join Hints:
JOIN hints can be used in a query to specify the type of JOIN the Query Optimizer is to use for the execution plan. The JOIN options are:
Hash
Merge
Loop
Hash Joins:
Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.
This method is best used when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.
The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions are true:
A large amount of data needs to be joined.
A large fraction of a small table needs to be joined.
Merge Joins:
Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. On the other hand, sort merge joins can perform better than hash joins if both of the following conditions exist:
The row sources are sorted already.
A sort operation does not have to be done.
However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.
Sort merge joins are useful when the join condition between two tables is an inequality condition (but not a non equality) like <, <=, >, or >=. Sort merge joins

perform better than nested loop joins for large data sets. You cannot use hash joins unless there is an equality condition.
In a merge join, there is no concept of a driving table. The join consists of two steps:
Sort join operation: Both the inputs are sorted on the join key.
Merge join operation: The sorted lists are merged together.
If the input is already sorted by the join column, then a sort join operation is not performed for that row source.
The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:
The join condition between two tables is not an equi-join.
Because of sorts already required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.
loop joins
Nested loop joins are useful when small subsets of data are being joined and if the join condition is an efficient way of accessing the second table.
It is very important to ensure that the inner table is driven from (dependent on) the outer table. If the inner table's access path is independent of the outer table, then the same rows are retrieved for every iteration of the outer loop, degrading performance considerably. In such cases, hash joins joining the two independent row sources perform better.
A nested loop join involves the following steps:
The optimizer determines the driving table and designates it as the outer table.
The other table is designated as the inner table.
For every row in the outer table, Oracle accesses all the rows in the inner table. The outer loop is for every row in outer table and the inner loop is for every row in the inner table. The outer loop appears before the inner loop in the execution plan, as follows:
NESTED LOOPS
outer_loop
inner_loop

7. Performance Tuning Tips for SQL Server Backup and Restore:


In SQL Server, you can back up a database or the transaction log. A backup creates an image of your data that you can restore to the source database or another database. There are three types of database backups: full, differential and copy-only.
A full backup is a copy of your backup that contains the state of the database at the point the database backup started. It includes the changes that occurred since that point, until the point where the data portion of the backup completed.
A differential backup contains all changes that occurred since the last complete backup. So, a differential backup done on Monday will contain all the data that has changed in the database since the last full backup was done on Sunday. A differential backup done on Tuesday will contain all data that has changed in the database since Sunday.
A copy-only backup is one that will not interrupt differential backups or log shipping chains.
A database backup includes the state of the database as of the time you started the backup and any changes that occurred while you were backing up the data portion of the backup.
You can back up a complete database or one or more filegroups or files of the database. You can do a complete backup (called a full backup), a differential backup (all changes that occurred in the database, filegroup or file since the last full backup) or a copy-only backup (SQL 2005 only).
Here are some examples:


BACKUP DATABASE AdventureWorks TO DISK='test.bak'
RESTORE FILELISTONLY FROM DISK='test.bak' WITH FILE=1
Backup/Restore Optimization Tips
l Try to perform backup to the local hard disk first, and copy backup file(s) to the tape later
l Perform backup on multiple backup devices.
l Perform backup on a physical disk array, so the more disks in array the more quickly the backup will be made.
Table of Contents
l Perform backups during periods of low database access.
l Use full backup to minimize the time to restore databases.
l Use incremental backup to minimize the time to backup databases.
l Use differential backup instead of incremental backup when the users update the same data many times.
l Try to separate your database to different files and filegroups to backing up only appropriate file/filegroup.
l Use Windows NT Performance Monitor or Windows 2000 System Monitor to check a backup impact on the total system performance.
l To decrease the backup operation's time consider backing up more often.
l Place a tape drive on another SCSI bus as disks or a CD-ROM drive.
There is a trend today to store more and more data, especially blobs of it, in databases. Terabyte databases are no longer a rarity. SQL Server 2005 addresses problems with backing up and restoring these very large databases and provides greater recovery options for the VLDBs, as well as the smaller databases.
Copy-only backups – allows you to copy databases for cloning purposes
Mirrored tape backups – allows simultaneous tape backups up to four devices
Torn page detection's replacement – checksum
Transaction marks, which allow you to keep multiple databases in sync
Restoring from a database snapshot
SQL Server 2000 and SQL Server 2005 offer three database recovery models: Simple Recovery, Full Recovery, and Bulk-Logged Recovery. The database recovery model you choose can affect its performance during these operations: BULK INSERT, SELECT INTO, and CREATE INDEX (including indexed views), WRITETEXT, and UPDATETEXT.
For example, if you choose the Full Recovery model, BULK INSERT, SELECT INTO, and CREATE INDEX (including indexed views), WRITETEXT, and UPDATETEXT operations are all fully logged. While this ensures the best recoverability, it is not the fastest or most efficient way to perform these operations.

If you want to speed these operations, you can choose the Bulk-Logged Recovery model, where these operations are minimally logged. This option still provides a good level of recoverability, but is faster than using the Full Recovery model. This option is similar to the SELECT INTO/BULK COPY database option available with older versions of SQL Server. The Simple Recovery model is most similar to the TRUNC. LOG ON CHECKPT. option available with older versions of SQL Server.


8. SQL Server Replication:
Replication is the process of sharing data between databases in different locations. Using replication, we can create copies of the database and share the copy with different users so that they can make changes to their local copy of database and later synchronize the changes to the source database.


The following functions and terms are used in replication:

Publisher
Publication
Article
Distributor
Subscribers
Subscription
Publisher is a server that makes the data available for subscription to other servers. In addition to that, publisher also identifies what data has changed at the subscriber during the synchronizing process. Publisher contains publication(s).
Subscriber is a server that receives and maintains the published data. Modifications to the data at subscriber can be propagated back to the publisher.
Distributor is the server that manages the flow of data through the replication system. Two types of distributors are present, one is remote distributor and the other one local distributor. Remote distributor is separate from publisher and is configured as distributor for replication. Local distributor is a server that is configured as publisher and distributor.
Agents are the processes that are responsible for copying and distributing data between publisher and subscriber. There are different types of agents supporting different types of replication.

Snapshot Agent is an executable file that prepares snapshot files containing schema and data of published tables and database objects, stores the files in the snapshot folder, and records synchronization jobs in the distribution database.
An article can be any database object, like Tables (Column filtered or Row filtered), Views, Indexed views, Stored Procedures, and User defined functions.
Publication is a collection of articles.
Replication Benefits
Users working in different geographic locations can work with their local copy of data thus allowing greater autonomy.
Database replication can also supplement your disaster-recovery plans by duplicating the data from a local database server to a remote database server. If the primary server fails, your applications can switch to the replicated copy of the data and continue operations.
You can automatically back up a database by keeping a replica on a different computer. Unlike traditional backup methods that prevent users from getting access to a database during backup, replication allows you to continue making changes online.
You can replicate a database on additional network servers and reassign users to balance the loads across those servers. You can also give users who need constant access to a database their own replica, thereby reducing the total network traffic.
Database-replication logs the selected database transactions to a set of internal replication-management tables, which can then be synchronized to the source database. Database replication is different from file replication, which essentially copies files.
Microsoft SQL Server 2005 supports the following types of replication:
Snapshot Replication
Transactional Replication
Merge Replication
Snapshot Replication

Snapshot replication is also known as static replication. Snapshot replication copies and distributes data and database objects exactly as they appear at the current moment in time.
Subscribers are updated with complete modified data and not by individual transactions, and are not continuous in nature.
This type is mostly used when the amount of data to be replicated is small and data/DB objects are static or does not change frequently.
Transactional Replication
Transactional replication is also known as dynamic replication. In transactional replication, modifications to the publication at the publisher are propagated to the subscriber incrementally.
Publisher and the subscriber are always in synchronization and should always be connected.
This type is mostly used when subscribers always need the latest data for processing.
Merge replication
It allows making autonomous changes to replicated data on the Publisher and on the Subscriber. With merge replication, SQL Server captures all incremental data changes in the source and in the target databases, and reconciles conflicts according to rules you configure or using a custom resolver you create. Merge replication is best used when you want to support autonomous changes on the replicated data on the Publisher and on the Subscriber.
Replication agents involved in merge replication are snapshot agent and merge agent.
Implement merge replication if, changes are made constantly at the publisher and subscribing servers, and must be merged in the end.
By default, the publisher wins all conflicts that it has with subscribers because it has the highest priority. Conflict resolver can be customized.


9. Optimizing SQL Server Database Design:
There are two components to designing a database: logical and physical. Logical database design involves modeling your business requirements and data using database components such as tables and constraints without regard for how or where the data will be physically stored. Physical database design involves mapping the logical design onto physical media, taking advantage of the hardware and software features available that allow the data to be physically accessed and maintained as quickly as possible, and indexing.
It is important to correctly design the database to model your business requirements and to take advantage of hardware and software features early in the development cycle of a database application because it is difficult to make changes to these components later.
Techniques for Optimizing Databases
Most of the major DBMSs support the following techniques although perhaps by different names. Each of the following techniques can be used to tune database performance .
Partitioning — breaking a single database table into sections stored in multiple files.
Raw partitions versus file systems — choosing whether to store database data in an OS-controlled file or not.
Indexing — choosing the proper indexes and options to enable efficient queries.
Denormalization — varying from the logical design to achieve better query performance.
Clustering — enforcing the physical sequence of data on disk.
Interleaving data — combining data from multiple tables into a single, sequenced file.
Free space — leaving room for data growth.
Compression — algorithmically reducing storage requirements.
File placement and allocation — putting the right files in the right place.

Page size — using the proper page size for efficient data storage and I/O.
Reorganization — removing inefficiencies from the database .
Let us see what are the points you need to keep in mind while designing an SQL Server database, which needs to deliver optimal performance.
Let us keep a bulleted list of points that I want to discuss.
Normalize your tables to an acceptable level;
Identify the ideal data types for your columns;
Intelligently use constraints;
Use Stored Procedures;
Normalize your tables to an acceptable level:
Normalization is a crucial process in database design, which eliminates redundant data from the database. This makes the database sleek and easily searchable which results in fast retrieval of data. There are five levels to which the database can be normalized. It is best to keep your tables to the third normal form.
One of the main advantages of normalizing a database is that it can eliminate storage of NULLS. They make the database bulky, and will also increase response times.
Identify the ideal data types.
Mistakes are made commonly while defining the columns of a table. I would always suggest giving a second thought on the appropriateness of each column data types. This will have a two-fold effect on your performance. The stored data will be easily accessible, and the tables of your database will not be bulky. Let us take an example of the same.
Consider the table where you need to store an Employee ID, Employee Joining Date and Employee Salary per month for a small firm, with 100 employees. You may create the table like this.
EmployeeID Integer
EmployeeJoiningDate DateTime
EmployeeSalary Money.

The integer datatype takes four bytes to store data; the data time data type takes eight bytes of storage space and the Money datatype consumes eight bytes of memory. That means a typical data row in the table is 20 bytes long. When you have 100 rows, the total memory consumed by the table is 2000 bytes.
Now, when you can use appropriate datatypes, you can bring this number down. That means you can achieve optimization of storage space by opting for the right datatype. Let us see how we can optimize the table design provided above.
The EmployeeID field we know can take only an integer number, with a max of 100, because that is the number of employees in the company. In this case, you can use the tinyint datatype, which can hold values between 1-255. The advantage is that, it takes up only one byte of storage space. See, you have brought down the memory consumption to one fourth of the original, just by selecting the appropriate datatype.
Consider that after all this analytical deduction, I have arrived in a decision to use the smalldatetime and smallmoney datatypes for the EmployeeJoiningDate and EmployeeSalary respectively. The savings I receive are 4 bytes for the EmployeeJoiningDate field and four bytes for the EmployeeSalary field. The total size of one row is just 9 bytes now, and for a hundred records, you receive a saving of a whooping 1100 bytes.
Even though I have taken a simple table structure with just three columns and a max of 100 records, we have saved 1100 bytes. Think of the savings if the table contains a couple of million records and more columns than just three!
Intelligent use of constraints.
In my view, a near perfect (there is again, nothing called perfect!) database system will use constraints to enforce database integrity. Even though there are other methods that serve the same purpose, constraints provide the best option.
Use of Stored Procedures
This is a big area to discuss. Just giving a foreword on it, SQL Server stored procedures can provide you with a faster way to access or manipulate data. The stored procedures can also help us reduce network traffic by minimizing the size of requests from the client and reducing processing capacity required at the client.

10. Optimizing XML Performance:


Design Considerations:
Choose the appropriate XML class for the job.
Consider validating large documents.
Process large documents in chunks, if possible.
Use streaming interfaces.
Consider hard-coded transformations.
Consider element and attribute name lengths.
Consider sharing the XmlNameTable.
Parsing XML:
Use XmlTextReader to parse large XML documents.
Use XmlValidatingReader for validation.
Consider combining XmlReader and XmlDocument.
On the XmlReader, use the MoveToContent and Skip methods to skip unwanted items.
Validating XML:
Use XmlValidatingReader.
Do not validate the same document more than once.
Consider caching the schema.
Writing XML:
Use XmlTextWriter.
XPath Queries
Use XPathDocument to process XPath statements.
Avoid the // operator by reducing the search scope.
Compile both dynamic and static XPath expressions.
XSLT Processing
Use XPathDocument for faster XSLT transformations.
Consider caching compiled style sheets.
Consider splitting complex transformations into several stages.
Minimize the size of the output document.
Write efficient XSLT.


11. General Tips
Use Clustered IndexesHaving the clustered index on the primary key is sometimes not the most efficient place for the clustered index to be. A clustered index is the most performance type of index. The whole table is sorted according to the clustered index. If the table is involved in lots of joins based on the primary key, it is probably the right place for it to be, but if you are continually filtering or grouping on other columns in a table, then you should possibly consider changing the primary key index to Non-Clustered, and putting the clustered index on those filtered or grouped columns.
The following statement removes and existing clustered index on the primary key and replaces it with a non-clustered index:
ALTER TABLE MySchema.SalesOrderHeader
DROP CONSTRAINT PK_SalesOrderHeader
GO
ALTER TABLE MySchema.SalesOrderHeader
ADD CONSTRAINT PK_SalesOrderHeader
PRIMARY KEY NONCLUSTERED(SalesOrderID);
GO
Then the following statement adds a new clustered index to a table.
CREATE CLUSTERED INDEX MyClusteredIndex
ON MySchema.SalesOrderHeader (OrderID)
GO

Table of Contents
Use Indexed Views
Indexed Views have been around for a while. A view is like a named query, and these days you can add indexes to them. If used correctly, they can cause a massive improvement in execution times, often better than a clustered index with covering columns on the original table. Also, in SQL Server Developer Edition and Enterprise Edition, a view index will also be automatically used if it is the best index even if you don’t actually specify the view in your query!
CREATE VIEW MySchema.SalesByCustomer
WITH SCHEMABINDING
AS
SELECT soh.SalesTerritoryID, soh.CustomerID,
SUM(sod.Quantity * sod.UnitPrice)
FROM MySchema.SalesOrderHeader soh
INNER JOIN MySchema.SalesOrderDetail sod
ON (soh.SalesOrderID = sod.SalesOrderID)
GROUP BY soh.SalesOrderTerritory, soh.CustomerID
GO
Note the use of the schema binding attribute. This prevents you from changing underlying tables while this view exists, and is necessary if you want to add an index. Some people avoid indexed views for this reason, as the maintenance becomes more complicated as further dependencies to the view are created. The following statement adds an index:
CREATE UNIQUE CLUSTERED INDEX IdxSalesOrderView
ON MySchema.SalesByCustomer(
SalesTerritoryID, CustomerID
)
GO


User Covering Indexes
Covering indexes are a feature that was newly added to SQL 2005. Basically, you can create an index optimized for the query itself based on joins, filters and grouping, and then add additional columns that can be retrieved directly from the index for use in select statements, as follows:
CREATE NONCLUSTERED INDEX TestIndex
ON MySchema.SalesOrderDetail(OrderId)
INCLUDE (Quantity, UnitPrice)
The above statement causes a non-clustered index to be created on the SalesOrderDetail table. If queries are executed on the OrderId column, the index will be used, and if the only other columns being retrieved are Quantity and UnitPrice, and then the query optimizer doesn’t need to retrieve any extra columns from the underlying table. It can just use the index. Because the query optimizer doesn’t need to query the original table, performance is improved.
Keep your Clustered Index small
One thing you need to consider when determining where to put your clustered index is how big the key for that index will be. The problem here is that the key to the clustered index is also used as the key for every non-clustered index in the table. So if you have a large clustered index on a table with a decent number of rows, the size could blow out significantly. In the case where there is no clustered index on a table, this could be just as bad, because it will use the row pointer, which is 8 bytes per row.
Avoid cursors
Cursors are less perform ant because every FETCH statement executed is equivalent to another SELECT statement execution that returns a single row. The optimizer can’t optimize a CURSOR statement, instead optimizing the queries within each execution of the cursor loop, which is undesirable. Given that most CURSOR statements can be re-written using set logic, they should generally be avoided.


Archive old data
If you want to improve query performance, give the optimizer less work to do. If you can cut down the number of rows the query has deal with, then performance will improve. I have no problem with people creating audit triggers to move historical data into other tables for this reason. Alternatively, if you don’t need your data after a certain period of time, back up your database and remove the data.
Partition your data correctly
These days, you don’t actually have to move old data out of a table to improve query performance. You can partition your table into a number of data segments based on a partition function. The query optimizer can use the partition function to look at rows only on the most appropriate file group. To create partitions, you need a partition function and a partition scheme.
CREATE PARTITION FUNCTION myRangePartitionFunction(int)
AS RANGE RIGHT FOR VALUES (1,100,1000)
Once the partition function is created, you can then apply the function to a partition scheme for a table.
CREATE PARTITION SCHEME myRangePartitionScheme
AS PARTITION myRangePartitionFunction
TO (filegrp1, filegrp2, filegrp3, filegrp4)
Then it’s just a matter of creating the table to use the partition scheme on the column you decided to partition on:
CREATE TABLE mySchema.myPartitionTable(
col1 int,
col2 nvarchar(100)
)
ON myRangePartitionScheme (col1)


Remove user defined inline scalar functions
Inline scalar functions are convenient if you want to return a single value, but at the expense of performance. They look somewhat like stored procedures, and they can be used in SQL statements. The problem is that they are not expanded and therefore not optimized into the query plan by the query optimizer. Bad news, because it turns a Seek into a Scan. Queries like this may appear to be per formant in the Execution plans and also in the IO statistics, but when you run the query, it can perform really badly. No seriously, really bad.
Here’s an example of what I’m talking about:
CREATE FUNCTION dbo.fnGetPostalCode (
@Suburb nvarchar (100),
@State nvarchar (10))
RETURNS int
AS
BEGIN
RETURN ISNULL (
(
SELECT Postal Code
FROM dbo.PostalCode
WHERE Suburb = @Suburb
AND State = @State
), -1);
END
GO
The following statement will only perform a clustered index scan, not a seek, and on a big table this could seriously affect performance.

SELECT s.SalesPersonID,
s.SuburbName,
s.State,
dbo.fnGetPostalCode(s.SuburbName,s.State)
AS PostalCode
FROM dbo.SalesPerson
You can have a look at the details by clicking on SQL Server Management Studio’s Query menu, and selecting “Include Actual Execution Plan”
One way to get around this is to simply inline the underlying query from the function, as follows:
SELECT s.SalesPersonID, s.SuburbName, s.State,
ISNULL( (SELECT PostalCode
FROM dbo.PostalCode
WHERE Suburb = s.SuburbName
AND State = s.State), -1)
AS PostalCode
FROM dbo.SalesPerson
Inline the SQL statement will perform significantly better.

Use APPLY
The apply statement was created for the situation where you put multiple inline nested queries in the one statement. For example, take the following statement:
SELECT soh.SalesOrderID,
Quantity=(SELECT TOP 1 (Quantity)
FROM Sales.SalesOrderDetails
WHERE SalesOrderID = soh.SalesOrderID),
UnitPrice=(SELECT TOP 1 (UnitPrice)
FROM Sales.SalesOrderDetails
WHERE SalesOrderID = soh.SalesOrderID)
FROM Sales.SalesOrderHeader soh
This performs an extra query, retrieving data from another table using the same criterion. This can now be replaced with the following:
SELECT soh.SalesOrderID, soh.OrderDate, a.*
FROM Sales.SalesOrderHeader soh
CROSS APPLY (
SELECT TOP (1) sod.UnitPrice, sod.Quantity
FROM Sales.SalesOrderDetail sod
WHERE sod.SalesOrderId = soh.SalesOrderId
ORDER BY sod.Quantity DESC
) as a


Use the correct transaction isolation level
If there are a lot of rows in your table, multiple concurrent requests to that table could cause contention if the correct transaction isolation level is not set. If requests are repeatedly blocked, it could be time to consider whether to change.
For example, READ UNCOMMITED is equivalent to dirty reads, or NOLOCK. That is, if a transaction is in the middle of processing and you read a row, the data may not be valid, especially if multiple inserts/updates are occuring that require atomicity. This is the most per formant and it ignores locking altogether, but is generally not allowed by good design and is a special case.


With READ_COMMITTED_SNAPSHOT, it specifies that any data read by the transaction will be the transitionally consistent version of the data that existed at the start of the transaction. Internally, it makes a versioned copy of the data and this is placed in tempdb until the transaction has competed. Except when the database is being recovered, snapshot transactions do not request locks when reading data, and therefore do not block other transactions from writing data. Transactions writing data also do not block other transactions reading data.
There are various other types of transaction options, including REPEATABLE_READ and SERIALIZABLE amongst others that you can look at to determine whether they are appropriate for your needs.


Use Computed Columns
Computed columns are derived from other columns in a table. By creating and indexing a computed column, you can turn what would otherwise be a scan into a seek. For example, if you needed to calculate Sales Price and you had a Quantity and UnitPrice column, multiplying them in the SQL inline would cause a table scan as it multiplied the two columns together for every single row. Create a computed column called Sales Price, then index it and the query optimizer will no longer need to retrieve the UnitPrice and Quantity data and do a calculation - it’s already done.


Change SQL Server Settings
· You can increase the 'min memory per query' option to improve the performance of queries that use hashing or sorting operations, if your SQL Server has a lot of memory available and there are many queries running concurrently on the server.
· You can increase the 'max async IO' option if your SQL Server works on a high performance server with high-speed intelligent disk subsystem (such as hardware-based RAID with more than 10 disks).
· You can change the 'network packet size' option to the appropriate value.
· You can change the 'fill factor' option to the appropriate value.
· You can increase the 'recovery interval' value.
· You can set the 'priority boost' SQL Server options to 1.
· Set the 'max worker threads' options to the maximum number of the user connections to your SQL Server box.
· You can specify the 'min server memory' and 'max server memory' options.
· You can specify the 'set working set size' SQL Server option to reserve the amount of physical memory space for SQL Server.
For more help, please visit:
http://www.mssqlcity.com/Tips/tipSrvSet.htm
Manage Indexes on tables
· If you have a big sql or stored procedure, which is taking too much time to execute, then you can use SQL Server’s Query Optimizer Tool and it will suggest you to create indexes.
· But wait note one thing that you make only necessary indexes, extra indexes can become load on your database and when you are adding, editing or deleting record, it will become headache for you.
· Avoid creating indexes on text columns and datetime columns.


Useful Links

Ø http://www.codeproject.com/KB/viewstate/ServerViewState.aspx
Ø http://searchsqlserver.techtarget.com/tip/0,289483,sid87_gci1159433,00.html#setup
Ø http://www.microsoft.com/technet/prodtechnol/sql/2005/workingwithtempdb.mspx
Ø http://www.databasedesign-resource.com/sql-server-hardware-requirements.html
Ø http://www.sql-server-performance.com/tips/database_design_p1.aspx
Ø http://www.codeproject.com/KB/database/sql2005-replication.aspx
Ø http://searchsqlserver.techtarget.com/tip/1,289483,sid87_gci1058017,00.html?FromTaxonomy=/pr/301355
Ø http://www.sql-server-performance.com/tips/t_sql_where_p1.aspx
Ø http://www.sql-server-performance.com/tips/indexed_views_p1.aspx
Ø http://www.sql-server-performance.com/tips/tempdb_p1.aspx
Ø http://www.sql-server-performance.com/tips/performance_monitor_general_p1.aspx
http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/

Book: Professional SQL Server 2005 Performance Tuning: Performance Tuning -- Steven Worth, Christian Bolton, Justin Langford, Michael Cape, Joshua J. Jin, Douglas Hinson, Haidong Ji, Paul A. Mestemaker, Arindam Sen

No comments: