3.3 tempdb Space Requirements:
The size and physical placement of the tempdb database can affect the performance of a system. For example, if the size defined for tempdb is too small, part of the system-processing load may be taken up by auto-growing tempdb back to the size required to support the workload every time you restart the instance of SQL Server. You can avoid this overhead by increasing the sizes of the tempdb data and log files; the rest of this section describes how to determine the appropriate amount of disk space for tempdb.
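For example, here is a minimal sketch of pre-sizing tempdb; the sizes shown are illustrative only, and tempdev and templog are the default logical file names:

USE master
GO
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 10GB, FILEGROWTH = 512MB)
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 2GB, FILEGROWTH = 256MB)
GO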
It is difficult to estimate the tempdb space requirement for an application. This section describes a general methodology for estimating the space requirement. These methods are not precise; it takes experience and experimentation to get a satisfactory result. We recommend that you always allow a safety factor of about 20% more space. Space estimates must also allow for the future growth of data in the application.
To understand the space requirement of tempdb data files, first look at which features of SQL Server 2005 use tempdb. The tempdb space requirement is different for each feature. The following features use tempdb:
Query
Triggers
Snapshot isolation and read committed snapshot (RCSI)
MARS
Online index creation
Temporary tables, table variables, and table-valued functions
DBCC CHECK
LOB parameters
Cursors
Service Broker and event notification
XML and LOB variable
Query notifications
Database mail
Index creation
User-defined functions
A server instance may use some or all of these features. For some servers, one or two features might dominate tempdb space usage; in that case, concentrate most of the capacity planning effort on those few features. For example, you may find that for one server a query requires a peak of 50 GB of tempdb space, RCSI requires a peak of 20 GB, and the remaining features require only 1 GB at peak. It would then be sufficient to allocate about 85 GB (71 GB plus a 20% safety margin) of tempdb space for this server.
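To check how such an estimate holds up on a running system, you can watch actual tempdb usage. Here is a minimal sketch, assuming SQL Server 2005 or later, using the sys.dm_db_file_space_usage view (each page is 8 KB):

USE tempdb
GO
SELECT SUM(user_object_reserved_page_count) * 8 / 1024.0 AS user_objects_MB,
SUM(internal_object_reserved_page_count) * 8 / 1024.0 AS internal_objects_MB,
SUM(version_store_reserved_page_count) * 8 / 1024.0 AS version_store_MB,
SUM(unallocated_extent_page_count) * 8 / 1024.0 AS free_space_MB
FROM sys.dm_db_file_space_usage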
4. SQL Server Profiler:
SQL Profiler is a tool that captures SQL Server events from the server and saves those events in what's known as a trace file. You can then analyze or use the trace file to troubleshoot logic or performance problems. You can use this utility to monitor several areas of server activity, such as:
· Analyzing and debugging SQL statements and stored procedures.
· Monitoring slow performance.
· Stress analysis.
· General debugging and troubleshooting.
· Fine-tuning indexes.
· Auditing and reviewing security activity.
For the most part, Profiler is an administrative tool that requires a bit of experience to master. (Use Profiler to monitor only the events you're troubleshooting or analyzing.) Fortunately, Profiler provides a graphical interface, which makes both learning and monitoring much simpler. However, Profiler requires 10 MB of free space; if free space falls below 10 MB, Profiler stops.
To access Profiler, you must be an administrator, or have permission to connect to the specific instance of SQL Server and have been granted permission to execute the Profiler stored procedures.
Although on the surface SQL Server 2005 Profiler has not changed dramatically, you may still need help setting up a Profiler session. Here I'll outline several steps to get you up and running. The final step includes sample T-SQL queries to analyze the SQL Server Profiler results.
1. Start SQL Server 2005 Profiler via the GUI by navigating to Start > All Programs > Microsoft SQL Server 2005 > Performance Tools > SQL Server Profiler.
2. Start a new Trace session by selecting the 'File' menu and the 'New Trace' option. Once the 'Connect to Server' interface loads, select the 'Type' as either 'Database Engine' or 'Analysis Services'. For this tip we will use the 'Database Engine' option. Then select the 'Server Name' followed by the 'Authentication' type. Finally, press the 'Connect' button to start the configuration.
3. Configure SQL Server Profiler Trace Properties – General Tab:
· Trace Name: Specify a meaningful name for the session
· Use the template: A total of eight templates are available with predefined events selected
· For this tip we will use the 'Standard (default)' template
· Save to table: To retain a copy of the data, save the results to either a database table or a Windows file.
· For this tip we will save the results to the dbo.TraceResults table in my user-defined database
4. Configure SQL Server Profiler Trace Properties – Events Selection Tab:
· Review the specific events and select the needed check boxes to capture the desired data.
· Show All Events: Select this check box to see all events that Profiler will be able to capture.
· Show All Columns: Select this check box to see all columns that Profiler will be able to capture.
5. Additional Configuration – Column Filters
· Specify filters based on the columns that are selected for the session to limit the data.
6. Additional Configuration – Organize Columns
· Specify the column order as well grouping settings for the final data.
7. To start the session, press the 'Run' button on the Trace Properties interface.
8. Review the results that are captured in the Profiler interface, or analyze them with T-SQL once the trace has been saved to a table.
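As a final step, you can analyze the captured trace with T-SQL. Here is a minimal sketch, assuming the results were saved to the dbo.TraceResults table chosen on the General tab and that the Standard template columns are present (when a trace is saved to a table, Duration is typically stored in microseconds):

SELECT TOP 10 TextData, Duration, CPU, Reads, Writes, StartTime
FROM dbo.TraceResults
WHERE TextData IS NOT NULL
ORDER BY Duration DESC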
5. SQL Server Performance Monitor Counters:
Performance Monitor collects data about different counters, such as memory use. Performance Monitor can show you data in graphical format in real time, or you can save the data to log files. You will also use log files for long-term performance monitoring. Working with a log file can be difficult to learn on your own because the options are not intuitive and are hidden on different screens.
You can choose between two Performance Monitors: one in the Administrative Tools group and one in the SQL Server group. They are the same basic program, but you need to run the one in the SQL Server group because it automatically loads the SQL Server-related counters. You run this version of the program with the following command: Perfmon.exe C:\Mssql\Binn\Sqlctrs.pmc, where the .pmc file is the Performance Monitor counter file that contains the SQL counters. You can write applications that provide your own counters, and you can modify the new system stored procedures called sp_user_counter1 through sp_user_counter10 and track them, too.
When you run the program from the SQL Server group (the version that includes the set of SQL counters), five counters appear at the bottom of the window when Performance Monitor starts. The five counters are:
· Cache Hit Ratio
· I/O — Transactions per second
· I/O — Page Reads per second
· I/O Single Page Writes per second
· User Connections
Understanding Counters
Windows NT lets you watch the performance of the system by “counting” the activity associated with any of its objects. Examples of objects in Windows NT are processors, disk drives, and processes. Each object has specific counters associated with it; for example, the % User Time counter is associated with a CPU or processor to designate what percent of the CPU is taken up by user programs (as opposed to system processes).
SQL Server includes many predefined counters, most of which you aren’t likely to use except in special cases. It can be difficult to know which counters are the basic
ones to watch. If you have chosen the SQL Server Performance Monitor, several counters have been set up as default counters, such as Cache Hit Ratio and User Connections. You can create your own defaults by creating a .pmc file.
Cache Hit Ratio:
To monitor your cache, watch SQL Server — Cache Hit Ratio. It monitors the rate at which the system finds pages in memory without having to go to disk. The cache hit ratio is the number of logical reads divided by the total of logical plus physical reads. If the value for this counter is consistently less than 80 percent, you should allocate more memory to SQL Server, buy more system memory, or both. However, before you buy more memory, you can try changing the read-ahead configuration options. Also look at the discussion of free buffers in the next chapter to determine whether the number of free buffers is approaching zero. Changing the free buffers configuration parameter may increase the cache hit ratio.
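If you are on SQL Server 2005, you can also check this ratio from inside the server rather than from Performance Monitor. Here is a minimal sketch using the sys.dm_os_performance_counters view; the ratio counter must be divided by its base counter:

SELECT 100.0 * r.cntr_value / b.cntr_value AS buffer_cache_hit_ratio
FROM sys.dm_os_performance_counters AS r
JOIN sys.dm_os_performance_counters AS b
ON b.object_name = r.object_name
AND b.counter_name = 'Buffer cache hit ratio base'
WHERE r.counter_name = 'Buffer cache hit ratio'
AND r.object_name LIKE '%Buffer Manager%'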
I/O — Transactions per second:
The SQL Server: I/O Transactions Per Second counter is a bit misleading, but it is still useful, especially for capacity planning. This counter measures the number of Transact-SQL batches processed since the last refresh period. You should not compare these results against any standard TPC benchmark tests that give results in transactions per second — the counter is not referring to a Begin/Commit transaction, just to batches of commands. Watch this number over a span of several months, because an increase in this counter can indicate that the use of SQL Server is growing.
I/O — Page Reads per second:
The SQL Server: I/O Page Reads per Second counter is the number of pages not found in the SQL Server data cache, which indicates physical reads of data pages from disk. This value does not count pages that are read from the Windows NT virtual memory disk file. There is no way to watch only the logical page reads per second. According to sources in the SQL development team, counters for logical page reads are hidden in a structure that is not available in this version of SQL Server. However, you can figure out the logical page reads per second by taking the total page reads per second and subtracting the physical page reads per second.
I/O Single Page Writes per second:
You should occasionally turn on the I/O Single Page Writes counter. A lot of single page writes means you need to tune SQL Server, because it is writing single pages to disk instead of its normal block of pages. Most writes consist of an entire extent
(eight pages) and are performed at a checkpoint. The lazywriter handles all the writing of an extent at a time. When SQL is forced to hunt for free pages, it starts finding and writing the LRU pages to disk — one page at a time. A high number of single page writes means that SQL Server does not have enough memory to keep a normal amount of pages in data cache.
User-Defined Counters:
Last but not least, you can define counters. The user-defined counters are in the SQL Server User-Defined Counters object in the Master database. The 10 counters correspond to 10 new stored procedures called sp_User_Counter1 through sp_User_Counter10. These stored procedures are the only system stored procedures you should change. If you look at the code of the procedure, they all perform a Select 0, which, when tracked on Performance Monitor, draws a flat line at the bottom of the screen. Replace the Select 0 with a Select statement that returns one number; an integer is preferable, but float, real, and decimal numbers also work. These queries should be quick, not ones that take minutes to run.
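For example, here is a minimal sketch that charts the row count of a hypothetical MyDatabase.dbo.Orders table; the same pattern applies to any of the ten procedures:

USE master
GO
ALTER PROCEDURE sp_user_counter1
AS
-- Replace the original SELECT 0 with one quick query that returns a single number
SELECT COUNT(*) FROM MyDatabase.dbo.Orders WITH (NOLOCK)
GO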
6. Tuning SQL Server Hardware:
Tuning SQL Server for performance is in many ways like constructing a house. You must find a suitable location with rich soil, build a solid foundation to support two or three levels, install the electrical and plumbing, finish the walls for painting and decorating, and finally conduct ongoing maintenance.
When specifying your SQL Server hardware requirements always consider the following:
The size of the database;
Number of concurrent users;
Transaction throughput;
The operations that will be performed on the databases.
Hardware planning:
Proper hardware planning is the first step in achieving high performance. It is imperative to determine what hardware to use based on requirements that can be calculated by capacity planning for CPUs, memory, drives, network interface cards (NICs), etc. At times this decision is rather easy; you can use corporate standards to simplify the hardware support. If standards do not exist, seize the opportunity and begin building standards to simplify the overall management.
While standardization is critical to supporting a large number of servers, it should be complemented by the latest technologies so you can continuously achieve higher performance at lower cost. Two current hardware technologies that should be considered in the short and long term are 64-bit technologies and database accelerators.
Hardware-based accelerators:
Database accelerators have been getting a fair amount of press lately for SQL Server. At a conceptual level, database accelerators are hardware devices with CPUs and memory. They store the entire database or a subset of tables in memory to process user transactions that write back to the SQL Server, which ultimately stores the data. The advantage is that these devices can sometimes support the entire database or core tables in dedicated memory, which outperforms accessing the data from disk. This can help from a scale up or scale out perspective because a single database accelerator can alleviate hardware bottlenecks without a change to the existing SQL Server.
For a production server that hosts your most important, most frequently accessed databases, and for databases that must have high availability, I would recommend that you upgrade the SQL Server hardware beyond the minimum specification, in some cases to the highest specification that your budget allows.
This will help to prevent bottlenecks and poor system performance.
Processor: I would recommend that you "scale up" your server's processors and, in some cases, also "scale out."
By "scaling up" to faster processors, SQL Server will process queries faster, and your end users will see an improvement in performance.
It may also be necessary to "scale out" your processors. According to Microsoft, SQL Server can support up to 32 processors on Symmetrical Multiprocessing (SMP) computers running Windows 2000 Datacenter Server.
Scaling out to multiple processors may improve fault tolerance and may be necessary for some very large databases.
Disk Drives and Disk Space:
When planning your SQL Server hardware requirements, it is important to remember that database applications can be extremely Input/Output (I/O) intensive.
Most senior SQL Server DBAs would recommend using fast SCSI drives and quality disk controllers (or, if you have no budgetary restrictions, a SAN environment).
As an important note, when using write-caching disk controllers I recommend that you confirm with the manufacturer before implementation that they are suitable for use with database servers.
Using inappropriate write-caching disk controllers can result in data loss if a "catastrophic failure" occurs, for example a loss of power.
If you want redundancy and fault tolerance you should consider using RAID. To quote from Microsoft: "The different levels of RAID provide varying levels of performance and fault tolerance. Hardware-based RAID provides more performance than operating system RAID."
Memory:
An insufficient amount of RAM can cause your SQL Server to continually read data from disk instead of from cached memory.
This will impact query performance, and in most cases the impact will be significant. Having an appropriate amount of memory will allow SQL Server to process your queries more efficiently, because it can hold more data in its cache.
6. SQL Server Join Hints:
JOIN hints can be used in a query to specify the type of JOIN the Query Optimizer is to use for the execution plan. The JOIN options are:
Hash
Merge
Loop
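A hint can be written either on the join itself or as a query-level OPTION clause. Here is a minimal sketch against the AdventureWorks sample tables, forcing a hash join (MERGE or LOOP can be substituted the same way):

SELECT soh.SalesOrderID, sod.ProductID, sod.OrderQty
FROM Sales.SalesOrderHeader AS soh
INNER HASH JOIN Sales.SalesOrderDetail AS sod
ON sod.SalesOrderID = soh.SalesOrderID

-- Equivalent query-level hint applied to every join in the statement:
-- ... OPTION (HASH JOIN)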
Hash Joins:
Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.
This method is best used when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.
The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions is true:
A large amount of data needs to be joined.
A large fraction of a small table needs to be joined.
Merge Joins:
Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. On the other hand, sort merge joins can perform better than hash joins if both of the following conditions exist:
The row sources are sorted already.
A sort operation does not have to be done.
However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.
Sort merge joins are useful when the join condition between two tables is an inequality condition (but not a nonequality) such as <, <=, >, or >=. Sort merge joins
perform better than nested loop joins for large data sets. You cannot use hash joins unless there is an equality condition.
In a merge join, there is no concept of a driving table. The join consists of two steps:
Sort join operation: Both the inputs are sorted on the join key.
Merge join operation: The sorted lists are merged together.
If the input is already sorted by the join column, then a sort join operation is not performed for that row source.
The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:
The join condition between two tables is not an equi-join.
Because of sorts already required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.
Nested Loop Joins:
Nested loop joins are useful when small subsets of data are being joined and if the join condition is an efficient way of accessing the second table.
It is very important to ensure that the inner table is driven from (dependent on) the outer table. If the inner table's access path is independent of the outer table, then the same rows are retrieved for every iteration of the outer loop, degrading performance considerably. In such cases, hash joins joining the two independent row sources perform better.
A nested loop join involves the following steps:
The optimizer determines the driving table and designates it as the outer table.
The other table is designated as the inner table.
For every row in the outer table, SQL Server accesses all the rows in the inner table. The outer loop is for every row in the outer table and the inner loop is for every row in the inner table. The outer loop appears before the inner loop in the execution plan, as follows:
NESTED LOOPS
outer_loop
inner_loop
7. Performance Tuning Tips for SQL Server Backup and Restore:
In SQL Server, you can back up a database or the transaction log. A backup creates an image of your data that you can restore to the source database or another database. There are three types of database backups: full, differential and copy-only.
A full backup is a copy of your database that contains the state of the database at the point the database backup started. It also includes the changes that occurred from that point until the point where the data portion of the backup completed.
A differential backup contains all changes that occurred since the last complete backup. So, a differential backup done on Monday will contain all the data that has changed in the database since the last full backup was done on Sunday. A differential backup done on Tuesday will contain all data that has changed in the database since Sunday.
A copy-only backup is one that will not interrupt differential backups or log shipping chains.
A database backup includes the state of the database as of the time you started the backup and any changes that occurred while you were backing up the data portion of the backup.
You can back up a complete database or one or more filegroups or files of the database. You can do a complete backup (called a full backup), a differential backup (all changes that occurred in the database, filegroup or file since the last full backup) or a copy-only backup (SQL 2005 only).
Here are some examples:
BACKUP DATABASE AdventureWorks TO DISK='test.bak'
RESTORE FILELISTONLY FROM DISK='test.bak' WITH FILE=1
Backup/Restore Optimization Tips
· Try to perform the backup to a local hard disk first, and copy the backup file(s) to tape later.
· Perform the backup on multiple backup devices.
· Perform the backup on a physical disk array; the more disks in the array, the more quickly the backup will be made.
· Perform backups during periods of low database access.
· Use full backups to minimize the time to restore databases.
· Use incremental backups to minimize the time to back up databases.
· Use differential backups instead of incremental backups when users update the same data many times.
· Try to separate your database into different files and filegroups so that you can back up only the appropriate file/filegroup.
· Use Windows NT Performance Monitor or Windows 2000 System Monitor to check the backup's impact on total system performance.
· To decrease the backup operation's time, consider backing up more often.
· Place the tape drive on a different SCSI bus than the disks or the CD-ROM drive.
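As a minimal sketch of two of these tips, the following stripes a full backup across multiple backup devices and then takes a differential backup; the paths and database name are illustrative:

BACKUP DATABASE AdventureWorks
TO DISK = 'D:\Backup\AdventureWorks_1.bak',
DISK = 'E:\Backup\AdventureWorks_2.bak'
WITH INIT

BACKUP DATABASE AdventureWorks
TO DISK = 'D:\Backup\AdventureWorks_diff.bak'
WITH DIFFERENTIAL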
There is a trend today to store more and more data, especially blobs of it, in databases. Terabyte databases are no longer a rarity. SQL Server 2005 addresses problems with backing up and restoring these very large databases and provides greater recovery options for VLDBs as well as for smaller databases. The new options include:
Copy-only backups – allow you to copy databases for cloning purposes
Mirrored tape backups – allow simultaneous tape backups to up to four devices
Torn page detection's replacement – checksum
Transaction marks, which allow you to keep multiple databases in sync
Restoring from a database snapshot
SQL Server 2000 and SQL Server 2005 offer three database recovery models: Simple Recovery, Full Recovery, and Bulk-Logged Recovery. The database recovery model you choose can affect its performance during these operations: BULK INSERT, SELECT INTO, and CREATE INDEX (including indexed views), WRITETEXT, and UPDATETEXT.
For example, if you choose the Full Recovery model, BULK INSERT, SELECT INTO, and CREATE INDEX (including indexed views), WRITETEXT, and UPDATETEXT operations are all fully logged. While this ensures the best recoverability, it is not the fastest or most efficient way to perform these operations.
If you want to speed these operations, you can choose the Bulk-Logged Recovery model, where these operations are minimally logged. This option still provides a good level of recoverability, but is faster than using the Full Recovery model. This option is similar to the SELECT INTO/BULK COPY database option available with older versions of SQL Server. The Simple Recovery model is most similar to the TRUNC. LOG ON CHECKPT. option available with older versions of SQL Server.
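For example, here is a minimal sketch of switching to the Bulk-Logged Recovery model around a large bulk operation and then returning to Full Recovery; the database name and path are illustrative:

ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED
GO
-- ... run the bulk operation here, e.g. BULK INSERT or CREATE INDEX ...
ALTER DATABASE AdventureWorks SET RECOVERY FULL
GO
BACKUP LOG AdventureWorks TO DISK = 'D:\Backup\AdventureWorks_log.trn'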
8. SQL Server Replication:
Replication is the process of sharing data between databases in different locations. Using replication, we can create copies of the database and share a copy with different users so that they can make changes to their local copy of the database and later synchronize the changes back to the source database.
The following functions and terms are used in replication:
Publisher
Publication
Article
Distributor
Subscribers
Subscription
The Publisher is a server that makes data available for subscription to other servers. In addition, the publisher identifies what data has changed at the subscriber during the synchronization process. A publisher contains one or more publications.
The Subscriber is a server that receives and maintains the published data. Modifications to the data at the subscriber can be propagated back to the publisher.
The Distributor is the server that manages the flow of data through the replication system. There are two types of distributor: a remote distributor, which is separate from the publisher and is configured as the distributor for replication, and a local distributor, which is a server configured as both publisher and distributor.
Agents are the processes that are responsible for copying and distributing data between publisher and subscriber. There are different types of agents supporting different types of replication.
Snapshot Agent is an executable file that prepares snapshot files containing schema and data of published tables and database objects, stores the files in the snapshot folder, and records synchronization jobs in the distribution database.
An article can be any database object, like Tables (Column filtered or Row filtered), Views, Indexed views, Stored Procedures, and User defined functions.
Publication is a collection of articles.
Replication Benefits
Users working in different geographic locations can work with their local copy of data thus allowing greater autonomy.
Database replication can also supplement your disaster-recovery plans by duplicating the data from a local database server to a remote database server. If the primary server fails, your applications can switch to the replicated copy of the data and continue operations.
You can automatically back up a database by keeping a replica on a different computer. Unlike traditional backup methods that prevent users from getting access to a database during backup, replication allows you to continue making changes online.
You can replicate a database on additional network servers and reassign users to balance the loads across those servers. You can also give users who need constant access to a database their own replica, thereby reducing the total network traffic.
Database-replication logs the selected database transactions to a set of internal replication-management tables, which can then be synchronized to the source database. Database replication is different from file replication, which essentially copies files.
Microsoft SQL Server 2005 supports the following types of replication:
Snapshot Replication
Transactional Replication
Merge Replication
Snapshot Replication
Snapshot replication is also known as static replication. Snapshot replication copies and distributes data and database objects exactly as they appear at the current moment in time.
Subscribers are updated with the complete modified data set rather than by individual transactions, and the updates are not continuous in nature.
This type is mostly used when the amount of data to be replicated is small and the data/DB objects are static or do not change frequently.
Transactional Replication
Transactional replication is also known as dynamic replication. In transactional replication, modifications to the publication at the publisher are propagated to the subscriber incrementally.
The publisher and the subscriber are always in synchronization and should always be connected.
This type is mostly used when subscribers always need the latest data for processing.
Merge replication
It allows making autonomous changes to replicated data on the Publisher and on the Subscriber. With merge replication, SQL Server captures all incremental data changes in the source and in the target databases, and reconciles conflicts according to rules you configure or using a custom resolver you create. Merge replication is best used when you want to support autonomous changes on the replicated data on the Publisher and on the Subscriber.
Replication agents involved in merge replication are snapshot agent and merge agent.
Implement merge replication if changes are made constantly at the publisher and subscribing servers and must be merged in the end.
By default, the publisher wins all conflicts that it has with subscribers because it has the highest priority. Conflict resolver can be customized.
9. Optimizing SQL Server Database Design:
There are two components to designing a database: logical and physical. Logical database design involves modeling your business requirements and data using database components such as tables and constraints, without regard for how or where the data will be physically stored. Physical database design involves mapping the logical design onto physical media, taking advantage of the hardware and software features available that allow the data to be physically accessed and maintained as quickly as possible, including indexing.
It is important to correctly design the database to model your business requirements and to take advantage of hardware and software features early in the development cycle of a database application because it is difficult to make changes to these components later.
Techniques for Optimizing Databases
Most of the major DBMSs support the following techniques, although perhaps by different names. Each of the following techniques can be used to tune database performance.
Partitioning — breaking a single database table into sections stored in multiple files.
Raw partitions versus file systems — choosing whether to store database data in an OS-controlled file or not.
Indexing — choosing the proper indexes and options to enable efficient queries.
Denormalization — varying from the logical design to achieve better query performance.
Clustering — enforcing the physical sequence of data on disk.
Interleaving data — combining data from multiple tables into a single, sequenced file.
Free space — leaving room for data growth.
Compression — algorithmically reducing storage requirements.
File placement and allocation — putting the right files in the right place.
Page size — using the proper page size for efficient data storage and I/O.
Reorganization — removing inefficiencies from the database.
Let us see what points you need to keep in mind while designing a SQL Server database that needs to deliver optimal performance. Here is a bulleted list of the points I want to discuss:
Normalize your tables to an acceptable level;
Identify the ideal data types for your columns;
Intelligently use constraints;
Use Stored Procedures;
Normalize your tables to an acceptable level:
Normalization is a crucial process in database design that eliminates redundant data from the database. This makes the database sleek and easily searchable, which results in fast retrieval of data. There are five levels to which a database can be normalized. It is usually best to keep your tables in third normal form.
One of the main advantages of normalizing a database is that it can eliminate the storage of NULLs, which make the database bulky and also increase response times.
Identify the ideal data types.
Mistakes are commonly made while defining the columns of a table. I would always suggest giving a second thought to the appropriateness of each column's data type. This will have a two-fold effect on your performance: the stored data will be easily accessible, and the tables of your database will not be bulky. Let us take an example.
Consider the table where you need to store an Employee ID, Employee Joining Date and Employee Salary per month for a small firm, with 100 employees. You may create the table like this.
EmployeeID Integer
EmployeeJoiningDate DateTime
EmployeeSalary Money.
The integer datatype takes four bytes to store data, the datetime datatype takes eight bytes of storage space, and the money datatype consumes eight bytes. That means a typical data row in the table is 20 bytes long. When you have 100 rows, the total storage consumed by the table is 2,000 bytes.
Now, when you can use appropriate datatypes, you can bring this number down. That means you can achieve optimization of storage space by opting for the right datatype. Let us see how we can optimize the table design provided above.
The EmployeeID field, we know, can take only an integer number, with a maximum of 100, because that is the number of employees in the company. In this case, you can use the tinyint datatype, which can hold values between 0 and 255. The advantage is that it takes up only one byte of storage space. You have brought the storage consumption for that column down to one fourth of the original, just by selecting the appropriate datatype.
Consider that after all this analytical deduction, I have arrived at a decision to use the smalldatetime and smallmoney datatypes for the EmployeeJoiningDate and EmployeeSalary columns respectively. The savings are four bytes for the EmployeeJoiningDate field and four bytes for the EmployeeSalary field. The total size of one row is just 9 bytes now, and for a hundred records, you receive a saving of a whopping 1,100 bytes.
Even though I have taken a simple table structure with just three columns and a max of 100 records, we have saved 1100 bytes. Think of the savings if the table contains a couple of million records and more columns than just three!
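Here is a minimal sketch of the optimized table definition; the table name dbo.Employee is assumed purely for illustration:

CREATE TABLE dbo.Employee (
EmployeeID tinyint NOT NULL, -- 1 byte, values 0 to 255
EmployeeJoiningDate smalldatetime NOT NULL, -- 4 bytes
EmployeeSalary smallmoney NOT NULL -- 4 bytes
)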
Intelligent use of constraints.
In my view, a near perfect (there is again, nothing called perfect!) database system will use constraints to enforce database integrity. Even though there are other methods that serve the same purpose, constraints provide the best option.
Use of Stored Procedures
This is a big area to discuss. Just giving a foreword on it, SQL Server stored procedures can provide you with a faster way to access or manipulate data. The stored procedures can also help us reduce network traffic by minimizing the size of requests from the client and reducing processing capacity required at the client.
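Just to illustrate, here is a minimal sketch of a stored procedure against the hypothetical Employee table from the earlier example; only the employee ID travels from the client, and the procedure's plan can be reused:

CREATE PROCEDURE dbo.GetEmployee
@EmployeeID tinyint
AS
SELECT EmployeeID, EmployeeJoiningDate, EmployeeSalary
FROM dbo.Employee
WHERE EmployeeID = @EmployeeID
GO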
10. Optimizing XML Performance:
Design Considerations:
Choose the appropriate XML class for the job.
Consider validating large documents.
Process large documents in chunks, if possible.
Use streaming interfaces.
Consider hard-coded transformations.
Consider element and attribute name lengths.
Consider sharing the XmlNameTable.
Parsing XML:
Use XmlTextReader to parse large XML documents.
Use XmlValidatingReader for validation.
Consider combining XmlReader and XmlDocument.
On the XmlReader, use the MoveToContent and Skip methods to skip unwanted items.
Validating XML:
Use XmlValidatingReader.
Do not validate the same document more than once.
Consider caching the schema.
Writing XML:
Use XmlTextWriter.
XPath Queries:
Use XPathDocument to process XPath statements.
Avoid the // operator by reducing the search scope.
Compile both dynamic and static XPath expressions.
XSLT Processing:
Use XPathDocument for faster XSLT transformations.
Consider caching compiled style sheets.
Consider splitting complex transformations into several stages.
Minimize the size of the output document.
Write efficient XSLT.
11. General Tips
Use Clustered Indexes
Having the clustered index on the primary key is sometimes not the most efficient place for the clustered index to be. A clustered index is the most performant type of index. The whole table is sorted according to the clustered index. If the table is involved in lots of joins based on the primary key, it is probably the right place for the clustered index to be, but if you are continually filtering or grouping on other columns in a table, then you should possibly consider changing the primary key index to non-clustered and putting the clustered index on those filtered or grouped columns.
The following statement removes an existing clustered index on the primary key and replaces it with a non-clustered index:
ALTER TABLE MySchema.SalesOrderHeader
DROP CONSTRAINT PK_SalesOrderHeader
GO
ALTER TABLE MySchema.SalesOrderHeader
ADD CONSTRAINT PK_SalesOrderHeader
PRIMARY KEY NONCLUSTERED(SalesOrderID);
GO
Then the following statement adds a new clustered index to a table.
CREATE CLUSTERED INDEX MyClusteredIndex
ON MySchema.SalesOrderHeader (OrderID)
GO
Use Indexed Views
Indexed Views have been around for a while. A view is like a named query, and these days you can add indexes to them. If used correctly, they can cause a massive improvement in execution times, often better than a clustered index with covering columns on the original table. Also, in SQL Server Developer Edition and Enterprise Edition, a view index will also be automatically used if it is the best index even if you don’t actually specify the view in your query!
CREATE VIEW MySchema.SalesByCustomer
WITH SCHEMABINDING
AS
SELECT soh.SalesTerritoryID, soh.CustomerID,
SUM(sod.Quantity * sod.UnitPrice) AS TotalSales,
COUNT_BIG(*) AS RecordCount -- required when an indexed view uses GROUP BY
FROM MySchema.SalesOrderHeader soh
INNER JOIN MySchema.SalesOrderDetail sod
ON (soh.SalesOrderID = sod.SalesOrderID)
GROUP BY soh.SalesTerritoryID, soh.CustomerID
GO
Note the use of the schema binding attribute. This prevents you from changing underlying tables while this view exists, and is necessary if you want to add an index. Some people avoid indexed views for this reason, as the maintenance becomes more complicated as further dependencies to the view are created. The following statement adds an index:
CREATE UNIQUE CLUSTERED INDEX IdxSalesOrderView
ON MySchema.SalesByCustomer(
SalesTerritoryID, CustomerID
)
GO
Use Covering Indexes
Covering indexes with included columns are a feature newly added to SQL Server 2005. Basically, you can create an index optimized for the query itself based on joins, filters and grouping, and then add additional columns that can be retrieved directly from the index for use in select statements, as follows:
CREATE NONCLUSTERED INDEX TestIndex
ON MySchema.SalesOrderDetail(OrderId)
INCLUDE (Quantity, UnitPrice)
The above statement causes a non-clustered index to be created on the SalesOrderDetail table. If queries are executed on the OrderId column, the index will be used, and if the only other columns being retrieved are Quantity and UnitPrice, then the query optimizer doesn’t need to retrieve any extra columns from the underlying table. It can just use the index. Because the query optimizer doesn’t need to query the original table, performance is improved.
Keep your Clustered Index small
One thing you need to consider when determining where to put your clustered index is how big the key for that index will be. The problem here is that the key to the clustered index is also used as the key for every non-clustered index in the table. So if you have a large clustered index on a table with a decent number of rows, the size could blow out significantly. In the case where there is no clustered index on a table, this could be just as bad, because it will use the row pointer, which is 8 bytes per row.
Avoid cursors
Cursors are less performant because every FETCH statement executed is equivalent to another SELECT statement execution that returns a single row. The optimizer can’t optimize a CURSOR statement as a whole; instead it optimizes the queries within each execution of the cursor loop, which is undesirable. Given that most CURSOR statements can be re-written using set logic, they should generally be avoided.
Archive old data
If you want to improve query performance, give the optimizer less work to do. If you can cut down the number of rows the query has to deal with, then performance will improve. I have no problem with people creating audit triggers to move historical data into other tables for this reason. Alternatively, if you don’t need your data after a certain period of time, back up your database and remove the old data.
Partition your data correctly
These days, you don’t actually have to move old data out of a table to improve query performance. You can partition your table into a number of data segments based on a partition function. The query optimizer can use the partition function to look at rows only on the most appropriate file group. To create partitions, you need a partition function and a partition scheme.
CREATE PARTITION FUNCTION myRangePartitionFunction(int)
AS RANGE RIGHT FOR VALUES (1,100,1000)
Once the partition function is created, you can then apply the function to a partition scheme for a table.
CREATE PARTITION SCHEME myRangePartitionScheme
AS PARTITION myRangePartitionFunction
TO (filegrp1, filegrp2, filegrp3, filegrp4)
Then it’s just a matter of creating the table to use the partition scheme on the column you decided to partition on:
CREATE TABLE mySchema.myPartitionTable(
col1 int,
col2 nvarchar(100)
)
ON myRangePartitionScheme (col1)
Remove user-defined inline scalar functions
Inline scalar functions are convenient if you want to return a single value, but at the expense of performance. They look somewhat like stored procedures, and they can be used in SQL statements. The problem is that they are not expanded and therefore not optimized into the query plan by the query optimizer. This is bad news, because it turns a Seek into a Scan. Queries like this may appear to be performant in the execution plans and also in the IO statistics, but when you run the query, it can perform really badly. No, seriously: really badly.
Here’s an example of what I’m talking about:
CREATE FUNCTION dbo.fnGetPostalCode (
@Suburb nvarchar (100),
@State nvarchar (10))
RETURNS int
AS
BEGIN
RETURN ISNULL (
(
SELECT PostalCode
FROM dbo.PostalCode
WHERE Suburb = @Suburb
AND State = @State
), -1);
END
GO
The following statement will only perform a clustered index scan, not a seek, and on a big table this could seriously affect performance.
SELECT s.SalesPersonID,
s.SuburbName,
s.State,
dbo.fnGetPostalCode(s.SuburbName,s.State)
AS PostalCode
FROM dbo.SalesPerson
You can have a look at the details by clicking on SQL Server Management Studio’s Query menu and selecting “Include Actual Execution Plan”.
One way to get around this is to simply inline the underlying query from the function, as follows:
SELECT s.SalesPersonID, s.SuburbName, s.State,
ISNULL( (SELECT PostalCode
FROM dbo.PostalCode
WHERE Suburb = s.SuburbName
AND State = s.State), -1)
AS PostalCode
FROM dbo.SalesPerson
The inlined SQL statement will perform significantly better.
Use APPLY
The APPLY operator was created for situations where you put multiple inline nested queries in one statement. For example, take the following statement:
SELECT soh.SalesOrderID,
Quantity=(SELECT TOP 1 (Quantity)
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = soh.SalesOrderID),
UnitPrice=(SELECT TOP 1 (UnitPrice)
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = soh.SalesOrderID)
FROM Sales.SalesOrderHeader soh
This performs an extra query, retrieving data from another table using the same criterion. This can now be replaced with the following:
SELECT soh.SalesOrderID, soh.OrderDate, a.*
FROM Sales.SalesOrderHeader soh
CROSS APPLY (
SELECT TOP (1) sod.UnitPrice, sod.Quantity
FROM Sales.SalesOrderDetail sod
WHERE sod.SalesOrderId = soh.SalesOrderId
ORDER BY sod.Quantity DESC
) as a
Use the correct transaction isolation level
If there are a lot of rows in your table, multiple concurrent requests to that table could cause contention if the correct transaction isolation level is not set. If requests are repeatedly blocked, it could be time to consider changing it.
For example, READ UNCOMMITTED is equivalent to dirty reads, or NOLOCK. That is, if a transaction is in the middle of processing and you read a row, the data may not be valid, especially if multiple inserts/updates are occurring that require atomicity. This is the most performant level because it ignores locking altogether, but it is generally not allowed by good design and is a special case.
With READ_COMMITTED_SNAPSHOT, any data read by the transaction is the transactionally consistent version of the data that existed at the start of the transaction. Internally, SQL Server makes a versioned copy of the data, which is placed in tempdb until the transaction has completed. Except when the database is being recovered, snapshot transactions do not request locks when reading data, and therefore do not block other transactions from writing data. Transactions writing data also do not block other transactions reading data.
There are various other types of transaction options, including REPEATABLE_READ and SERIALIZABLE amongst others that you can look at to determine whether they are appropriate for your needs.
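Here is a minimal sketch of setting the isolation behaviour at the database level and at the session level; the database name is illustrative, and note that enabling READ_COMMITTED_SNAPSHOT requires that no other connections are active in the database:

-- Database level: readers get the row-versioned copy instead of being blocked
ALTER DATABASE AdventureWorks SET READ_COMMITTED_SNAPSHOT ON

-- Session level: choose an isolation level for the transactions that follow
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ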
Use Computed Columns
Computed columns are derived from other columns in a table. By creating and indexing a computed column, you can turn what would otherwise be a scan into a seek. For example, if you needed to calculate Sales Price and you had a Quantity and UnitPrice column, multiplying them in the SQL inline would cause a table scan as it multiplied the two columns together for every single row. Create a computed column called Sales Price, then index it and the query optimizer will no longer need to retrieve the UnitPrice and Quantity data and do a calculation - it’s already done.
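Here is a minimal sketch, assuming a SalesOrderDetail table with Quantity and UnitPrice columns as in the earlier examples; PERSISTED stores the computed value so it can be indexed:

ALTER TABLE MySchema.SalesOrderDetail
ADD SalesPrice AS (Quantity * UnitPrice) PERSISTED

CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_SalesPrice
ON MySchema.SalesOrderDetail (SalesPrice)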
Change SQL Server Settings
· You can increase the 'min memory per query' option to improve the performance of queries that use hashing or sorting operations, if your SQL Server has a lot of memory available and there are many queries running concurrently on the server.
· You can increase the 'max async IO' option if your SQL Server works on a high performance server with high-speed intelligent disk subsystem (such as hardware-based RAID with more than 10 disks).
· You can change the 'network packet size' option to the appropriate value.
· You can change the 'fill factor' option to the appropriate value.
· You can increase the 'recovery interval' value.
· You can set the 'priority boost' SQL Server options to 1.
· Set the 'max worker threads' options to the maximum number of the user connections to your SQL Server box.
· You can specify the 'min server memory' and 'max server memory' options.
· You can specify the 'set working set size' SQL Server option to reserve the amount of physical memory space for SQL Server.
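Most of these settings are changed with sp_configure. Here is a minimal sketch; the values are examples only, 'min memory per query' is specified in KB and 'max server memory' in MB:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
EXEC sp_configure 'min memory per query', 2048
EXEC sp_configure 'max server memory', 4096
RECONFIGURE
GO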
For more help, please visit:
http://www.mssqlcity.com/Tips/tipSrvSet.htm
Manage Indexes on tables
· If you have a big SQL statement or stored procedure that is taking too much time to execute, you can use SQL Server’s Database Engine Tuning Advisor, which will suggest indexes to create.
· Note, however, that you should create only the necessary indexes; extra indexes become a load on your database and a headache when you are adding, editing or deleting records.
· Avoid creating indexes on text columns and datetime columns.
Useful Links
· http://www.codeproject.com/KB/viewstate/ServerViewState.aspx
· http://searchsqlserver.techtarget.com/tip/0,289483,sid87_gci1159433,00.html#setup
· http://www.microsoft.com/technet/prodtechnol/sql/2005/workingwithtempdb.mspx
· http://www.databasedesign-resource.com/sql-server-hardware-requirements.html
· http://www.sql-server-performance.com/tips/database_design_p1.aspx
· http://www.codeproject.com/KB/database/sql2005-replication.aspx
· http://searchsqlserver.techtarget.com/tip/1,289483,sid87_gci1058017,00.html?FromTaxonomy=/pr/301355
· http://www.sql-server-performance.com/tips/t_sql_where_p1.aspx
· http://www.sql-server-performance.com/tips/indexed_views_p1.aspx
· http://www.sql-server-performance.com/tips/tempdb_p1.aspx
· http://www.sql-server-performance.com/tips/performance_monitor_general_p1.aspx
· http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/
Book: Professional SQL Server 2005 Performance Tuning -- Steven Worth, Christian Bolton, Justin Langford, Michael Cape, Joshua J. Jin, Douglas Hinson, Haidong Ji, Paul A. Mestemaker, Arindam Sen