As developers, we often use clustered and non-clustered indexes, yet if I ask the following two questions, most people struggle to give a satisfactory answer.
- What is a clustered index (clustering index), and what is a non-clustered index (non-clustering index)?
- What is the difference between a clustered index and a non-clustered index?
In fact, an index can be understood as a special kind of directory. Microsoft SQL Server provides two kinds of index: the clustered index (also known as the clustering index) and the non-clustered index (also known as the non-clustering index). An example illustrates the difference between them:
The body of a dictionary is itself a clustered index. For example, to look up “Ann”, we naturally open the first few pages of the dictionary, because “Ann” is spelled “a”, “n”, “n”, and an English dictionary is ordered by spelling, starting with “a” and ending with “z”, so “Ann” falls near the front. If you go through the whole “a” section and still cannot find the word, the dictionary does not contain it. Similarly, to look up “Zoo” you turn to the end of the dictionary, because “Zoo” is spelled “z”, “o”, “o”. In other words, the body of the dictionary is itself a directory; you do not need to consult any other directory to find what you are looking for.
When the body content is itself arranged according to a certain rule, so that it serves as its own directory, we call it a “clustered index”.
If you know a word, you can find it quickly in the dictionary this way. However, you may come across a word you do not know and cannot pronounce. The earlier method no longer works; instead, you look the word up in a separate directory, which acts as a “bookmark” giving the page number, and then turn directly to that page. The words in this directory are not arranged in the same order as the words in the body; the directory is only a mapping onto the words in the body. That is a non-clustered index. We can find the word this way, but it takes two steps: first find the entry in the directory, then turn to the page it points to.
When the directory is purely a directory and the body is purely the body, each with its own order, we call it a “non-clustered index”.
From the example above, we can understand what a “clustered index” and a “non-clustered index” are.
Taking this one step further, it is easy to see why each table can have only one clustered index: the body can be physically sorted in only one way.
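In T-SQL, the two kinds of index are created with separate statements. The following is a minimal sketch only: the table dbo.Word, its columns, and the index names are hypothetical and chosen to mirror the dictionary example, not taken from any real schema.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE dbo.Word (
    Spelling   nvarchar(100) NOT NULL,
    Category   nvarchar(50)  NOT NULL,
    Definition nvarchar(400) NULL
);

-- Clustered index: the rows themselves are kept in Spelling order,
-- like the body of the dictionary.
CREATE CLUSTERED INDEX CIX_Word_Spelling ON dbo.Word (Spelling);

-- Non-clustered index: a separate structure ordered by Category
-- whose entries point back to the rows.
CREATE NONCLUSTERED INDEX IX_Word_Category ON dbo.Word (Category);

-- A table can hold only one clustered index, so SQL Server rejects
-- a second one such as the statement below.
-- CREATE CLUSTERED INDEX CIX_Word_Category ON dbo.Word (Category);
```

A lookup by Spelling goes straight to the rows, while a lookup by Category first finds the entry in the separate index and then follows it to the row, which is the two-step process described above.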
|Action|Using clustered indexes|Using non-clustered indexes|
|---|---|---|
|Columns often used for grouping and sorting|yes|yes|
|Return data within a certain range|yes|no|
|One or very few distinct values|no|no|
|A small number of distinct values|yes|no|
|A large number of distinct values|no|yes|
|Frequently updated columns|no|yes|
|Foreign key columns|yes|yes|
|Primary key columns|yes|yes|
|Frequently modified index columns|no|yes|
The table can be understood from the definitions of clustered and non-clustered indexes given in the earlier example. Take “return data within a certain range”: suppose a table has a time column and you build the clustered index on it. When you query all the data between January 1, 2004 and October 1, 2004, the result comes back very quickly, because the body of your “dictionary” is sorted by date; the clustered index only needs to locate the first and last rows of the range. With a non-clustered index, you must first look up the page number for each matching entry in the directory and then fetch the actual content from those pages.
Misconceptions about using indexes
The purpose of theory is application. Although we have just listed when to use a clustered or a non-clustered index, in practice it is easy to overlook these rules or to apply them without a full analysis of the actual situation. Below we discuss a misconception encountered in practice, to help you master the method of building indexes.
- The primary key is the clustered index
In my view this idea is simply wrong and wastes the clustered index, even though SQL Server creates the clustered index on the primary key by default.
Normally we add an ID column to each table to tell the rows apart, and this ID column increases automatically, usually in steps of 1. For example, suppose we have a column named “Gid”. If we make this column the primary key, SQL Server will by default make it the clustered index as well. The advantage is that your data is physically sorted by Gid in the database, but I think this is of little significance.
The advantage of the clustered index is obvious, and the rule that each table can have only one makes the clustered index all the more valuable.
As discussed earlier, the biggest benefit of a clustered index is that it quickly narrows the search to match the query and avoids a full table scan. In practice, because the Gid values are generated automatically, we do not know the Gid of each record, so queries rarely filter on Gid. Making the primary key the clustered index therefore wastes a valuable resource.
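To avoid the default behaviour, the primary key can be declared as non-clustered, which leaves the single clustered index free for another column. The following is a sketch under assumptions: the Employee table and its column names follow the queries used later in this article, while the data types, the dbo schema, and the constraint and index names are hypothetical.

```sql
-- Column types and object names are assumptions for illustration.
CREATE TABLE dbo.Employee (
    ID           int IDENTITY(1,1) NOT NULL,
    Date_Created datetime          NOT NULL,
    Name         nvarchar(50)      NOT NULL,
    Title        nvarchar(200)     NOT NULL,
    -- Keep the primary key, but do not spend the clustered index on it.
    CONSTRAINT PK_Employee PRIMARY KEY NONCLUSTERED (ID)
);

-- Use the one available clustered index on the column that
-- range queries actually filter on.
CREATE CLUSTERED INDEX CIX_Employee_Date_Created
    ON dbo.Employee (Date_Created);
```

The primary key still enforces uniqueness on ID; only the physical ordering of the rows changes.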
If your system has been in use for a long time and holds a large amount of data, running a full table scan every time a user opens the home page makes little sense: the vast majority of users have already read the documents from more than a month ago, so the scan only adds overhead to the database. Instead, when a user opens the home page, the database can query only the documents from the past three months that the user has not yet read, using the “date” field to limit the scan and speed up the query. If your system has been running for 2 years, the home page should in theory display eight times faster than before (3 months is one eighth of 24), or even faster.
I say “in theory” because if you blindly build the clustered index on the primary key Gid, your queries will not be that fast, even if you create an index (a non-clustered one) on the date field. Let's look at how various queries perform on 10 million rows (of which 250,000 fall within the last three months):
- Clustered index on the primary key, no date restriction:
Select ID,Date_Created,Name,Title from Employee;
Time: 128,470 ms (128 seconds)
- Clustered index on the primary key, non-clustered index on Date_Created:
Select ID,Date_Created,Name,Title from Employee where Date_Created> dateadd(day,-90,getdate());
Time: 53,763 ms (54 seconds)
- Clustered index on the date column (Date_Created):
Select ID,Date_Created,Name,Title from Employee where Date_Created> dateadd(day,-90,getdate());
Time: 2,423 ms (2 seconds)
Although the data of interest is the same 250,000 rows in each case, the differences between the cases are huge, especially once the clustered index is built on the date column. In fact, if your database really holds 10 million rows and the clustered index is on the primary key ID column, as in the first and second cases above, the Web page will simply time out and display nothing. This is the single most important reason I abandoned the ID column as the clustered index.
These timings were obtained as follows. Before each select statement, add:
declare @d datetime set @d=getdate()
And after the select statement, add:
select [Statement execution takes time (ms)]=datediff(ms,@d,getdate())
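Put together, the timing wrapper around one of the test queries might look like the sketch below. The second half shows SET STATISTICS TIME, a built-in SQL Server option that reports parse, compile, and execution times in the Messages output; it is offered as an alternative and is not the method used for the figures in this article.

```sql
-- Manual timing, as described above.
DECLARE @d datetime;
SET @d = GETDATE();

SELECT ID, Date_Created, Name, Title
FROM Employee
WHERE Date_Created > DATEADD(day, -90, GETDATE());

SELECT DATEDIFF(ms, @d, GETDATE()) AS [Statement execution takes time (ms)];

-- Alternative (not used for the figures in this article):
-- let SQL Server report CPU and elapsed time itself.
SET STATISTICS TIME ON;

SELECT ID, Date_Created, Name, Title
FROM Employee
WHERE Date_Created > DATEADD(day, -90, GETDATE());

SET STATISTICS TIME OFF;
```

Note that the manual method measures wall-clock time including the transfer of the result set to the client, so repeated runs can vary.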
In fact, in the example above, statements 2 and 3 are identical and the indexed field is the same. The only difference is that in the former case the date field fariqi (apparently the same column that appears as Date_Created in the queries) has a non-clustered index, while in the latter case it has the clustered index, yet the query speeds differ starkly. So simply putting an index on any field does not necessarily improve query speed.
From the way the table was built, we know that the fariqi field in this 10-million-row table has 5,003 distinct values. Building the clustered index on this field is very appropriate. In reality we issue several documents every day, and the documents issued on the same day share the same date, which fully meets the rule for building a clustered index: the values must not be mostly identical, nor may only a handful of them be identical. Seen this way, building an “appropriate” clustered index is very important for improving query speed.
As already mentioned, the fields that data queries cannot do without are the “date” and the user's own “user name”. Since these two fields are so important, we can combine them into a composite index (compound index).
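A composite clustered index over these two fields could be declared roughly as follows. This is a sketch under assumptions: the column names follow the queries used in this article, the index name is hypothetical, and the table is assumed to have no clustered index yet.

```sql
-- Assumes Employee has no clustered index yet (for example, its
-- primary key was declared NONCLUSTERED). The index name is hypothetical.
-- The column listed first is the leading column: the rows are sorted
-- by Date_Created, and by neibuyonghu only within each date.
CREATE CLUSTERED INDEX CIX_Employee_Date_User
    ON Employee (Date_Created, neibuyonghu);
```

Because the rows are ordered by the leading column first, the order of the columns in the declaration matters as much as the choice of columns.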
Many people think that adding any field to the clustered index will speed up queries, and some are puzzled: if the fields of a composite clustered index are queried separately, will the query slow down? With this question in mind, let's look at the following query speeds (each result set is 250,000 rows; the date column fariqi is the leading column of the composite clustered index, followed by the user name column neibuyonghu):
select ID,Date_Created,Name,Title from Employee where Date_Created>'2004-5-5';
Query Speed: 2513 ms
select ID,Date_Created,Name,Title from Employee where Date_Created>'2004-5-5' and neibuyonghu='office';
Query Speed: 2516 ms
select ID,Date_Created,Name,Title from Employee where neibuyonghu='office';
Query Speed: 60280 ms
From the tests above, we can see that a query on only the leading column of the composite clustered index is almost as fast as a query on all of its columns, and is even slightly faster (when the result sets are the same size). A query on only a non-leading column of the composite clustered index gets no benefit from the index at all. Of course, statements 1 and 2 run at the same speed because they return the same number of rows. If a query uses all the columns of the composite index and returns few rows, it forms an “index covering” query and achieves optimal performance. Also remember: whether or not you often use the other columns of the clustered index, its leading column must be the most frequently used column.
Index experience you will not find in other books
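To make “index covering” concrete: a query is covered when every column it references is stored in the index it uses, so SQL Server can answer it from the index alone. The sketch below is an illustration under assumptions, not the setup used in the tests above: it assumes the table is clustered on some other column, and the non-clustered index name is hypothetical.

```sql
-- Hypothetical non-clustered index over the two filter columns.
CREATE NONCLUSTERED INDEX IX_Employee_Date_User
    ON Employee (Date_Created, neibuyonghu);

-- Covered: every referenced column is part of the index,
-- so the base rows do not need to be read.
SELECT Date_Created, neibuyonghu
FROM Employee
WHERE Date_Created > '2004-5-5' AND neibuyonghu = 'office';

-- Not covered: Title is not in the index, so each matching entry
-- requires an extra lookup into the base rows (or a scan instead).
SELECT Date_Created, neibuyonghu, Title
FROM Employee
WHERE Date_Created > '2004-5-5' AND neibuyonghu = 'office';
```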
- Querying through the clustered index is faster than querying through a primary key that is not the clustered index
- ORDER BY on the clustered index is faster than ORDER BY on an ordinary primary key, especially with small amounts of data
- When you filter on a time range within the clustered index, search time falls in proportion to the share of the table that the range covers, no matter how many values of the clustered index are involved
- A date column does not slow queries down just because its values include minutes and seconds
The following example statements each extract 250,000 rows:
select ID,Date_Created,Name,Title from employee where Date_Created = '2004-9-16';
time: 3326 ms
select ID,Date_Created,Name,Title from employee where ID <= 250000;
time: 4470 ms
Here, querying through the clustered index is nearly a quarter faster than querying through a primary key that is not the clustered index.
select ID,Date_Created,Name,Title from Employee order by fariqi;
time: 12936 ms
select ID,Date_Created,Name,Title from employee order by gid;
time: 18843 ms
Here, ORDER BY on the clustered index is about 3/10 faster than ORDER BY on an ordinary primary key. In fact, with a small amount of data, sorting by the clustered index column is clearly much faster than sorting by a non-clustered index column; with a large amount of data, such as 100,000 rows or more, the speed difference between the two is not obvious.
select ID,Date_Created,Name,Title from employee where Date_Created> '2004-1-1';
time: 6343 ms (1 million rows extracted)
select ID,Date_Created,Name,Title from employee where Date_Created> '2004-6-6';
time: 3170 ms (500,000 rows extracted)
select ID,Date_Created,Name,Title from employee where Date_Created = '2004-9-16';
time: 3326 ms (essentially the same as the previous statement; when the number of rows retrieved is the same, the greater-than and equals operators perform alike)
select ID,Date_Created,Name,Title from employee where Date_Created > '2004-1-1' and Date_Created<'2004-6-6';
time: 3280 ms
In the following example there are 1,000,000 rows in total. The 500,000 rows after January 1, 2004 have only two distinct dates, precise to the day; the 500,000 rows before it have 5,000 distinct dates, precise to the second.
select ID,Date_Created,Name,Title from employee where Date_Created> '2004-1-1' order by Date_Created
time: 6390 ms
select ID,Date_Created,Name,Title from employee where Date_Created<'2004-1-1' order by Date_Created;
time: 6453 ms
“Water can carry a boat, but it can also capsize it”, and the same is true of indexes. An index helps improve retrieval performance, but too many indexes or poorly chosen ones make the system inefficient, because every index a user adds to a table gives the database more work to do. Too many indexes may even cause index fragmentation.
So we want to build a “proper” set of indexes, and take particular care when creating the clustered index, so that the database can perform at its best.
Of course, in practice, as a diligent database administrator you should still test several alternatives to find out which one is the most efficient and effective.
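One check worth including in such testing is index fragmentation. The following is a sketch under assumptions: it requires SQL Server 2005 or later, must be run in the database that holds the table, and assumes the Employee table from the queries above lives in the dbo schema.

```sql
-- Assumes SQL Server 2005 or later and a table named dbo.Employee.
-- Lists each index on the table with its fragmentation level.
SELECT i.name AS index_name,
       s.index_type_desc,
       s.avg_fragmentation_in_percent,
       s.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Employee'), NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
ORDER BY s.avg_fragmentation_in_percent DESC;

-- Rebuild all indexes on the table when fragmentation is high.
ALTER INDEX ALL ON dbo.Employee REBUILD;
```

Rebuilding takes time and resources on a large table, so it is usually scheduled for quiet periods.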