RAID stands for Redundant Array of Inexpensive Disks. Simply put, RAID is a technology that combines multiple independent hard disks (physical disks) in different ways to form a disk group (a logical drive), providing higher storage performance than a single hard disk as well as data backup.
The different ways of composing a disk array are called RAID levels.
The data backup function protects user data: once user data is damaged, the backup information can be used to restore it. From the user’s perspective, the disk group looks like a single hard drive that can be partitioned, formatted, and so on. In short, a disk array is operated exactly like a single hard drive. The difference is that a disk array is much faster than a single hard disk and can provide automatic data backup.
RAID technology currently comes in two types: hardware-based RAID and software-based RAID.
Linux can provide RAID functionality through built-in software, so you can avoid purchasing expensive hardware RAID controllers and accessories while still greatly improving disk I/O performance and reliability.
Because the RAID functions are implemented in software, configuration is flexible and management is easy. With software RAID you can also combine several physical disks into one larger virtual device to improve performance and provide data redundancy. Of course, hardware-based RAID solutions are slightly better than software-based RAID in performance and serviceability, in particular in multi-bit error detection and repair, automatic error detection, and array reconstruction.
The benefits of using RAID in a storage system fall mainly into three areas:
- Disk spanning: multiple disks are grouped together into one logical disk
- Faster disk access: data is divided into blocks that are written to and read from multiple disks in parallel
- Fault tolerance: provided by mirroring or parity
Introduction to RAID levels
The commonly used RAID levels are RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, and RAID 5, plus the combined type ‘RAID 0+1’, also called ‘RAID 10’.
We first compare the advantages and disadvantages of these RAID levels:
|RAID level||Relative merits||Relative disadvantages|
|RAID 0||Fastest access||No fault tolerance|
|RAID 1||Fully fault-tolerant||High cost|
|RAID 2||Hamming-code checking, more data redundancy||Slow|
|RAID 3||Good write performance||No multitasking capability|
|RAID 4||Multitasking and fault tolerance||Parity disk is a performance bottleneck|
|RAID 5||Multitasking and fault tolerance||Extra overhead on writes|
|RAID 0+1/RAID 10||Fast, fully fault-tolerant||High cost|
RAID 0: characteristics, principles, and applications
RAID 0 is also known as striping: consecutive data is spread across multiple disks, as shown in the figure. A data request can be executed in parallel across the disks, with each disk handling its own part of the request. This parallelism takes full advantage of bus bandwidth and significantly improves overall disk access performance. Because reads and writes are done in parallel on the devices, read and write performance increases, which is usually the main reason for running RAID 0. However, RAID 0 has no data redundancy; if a drive fails, no data can be recovered.
RAID 0 requires two or more hard drives. Data is not stored on one drive but divided into blocks that are stored on different drives. Because the data is distributed across drives, throughput increases greatly and the load on the drives is relatively balanced. It is most efficient when the requested data happens to lie on different drives. No checksum needs to be calculated, so RAID 0 is easy to implement. Its disadvantage is that it has no error control: if the data on one drive is damaged, correct data on the other drives does not help. It should not be used where data stability is required. RAID 0 is more appropriate for work such as image (including animation) editing, which needs high transfer rates. For example, if a file is distributed across two hard drives, both can be read at the same time, so the time to read the file is cut to about half. Of all the levels, RAID 0 is the fastest. But because RAID 0 has no redundancy, if one physical disk is damaged, all data becomes unavailable.
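The block distribution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `stripe`, `locate`, and `unstripe` are hypothetical, disks are modeled as in-memory lists, and chunks are dealt out round-robin, which is one simple way to realize striping.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Split data into chunks and deal them round-robin onto the disks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def locate(chunk_index: int, num_disks: int) -> tuple[int, int]:
    """Return (disk number, position on that disk) for a logical chunk."""
    return chunk_index % num_disks, chunk_index // num_disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading the disks round-robin, row by row."""
    out = []
    depth = max(len(disk) for disk in disks)
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


disks = stripe(b"ABCDEFGH", num_disks=2, chunk_size=2)
print(disks)            # [[b'AB', b'EF'], [b'CD', b'GH']]
print(locate(3, 2))     # (1, 1): chunk 3 is the second chunk on disk 1
print(unstripe(disks))  # b'ABCDEFGH'
```

Because consecutive chunks land on different disks, a large sequential read can be served by both disks at once; losing either list, however, leaves gaps that nothing can fill.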
RAID 1: principles and applications
RAID 1, also known as mirroring, is a fully redundant mode, as shown in the figure. RAID 1 can be used on two or 2xN disks, with zero or more spare disks; every write is also written to the mirror disk. Such an array is highly reliable, but its effective capacity is reduced to half of the total capacity. The disks should be of equal size; otherwise the capacity is limited to the size of the smallest disk.
In a RAID 1 configuration, the RAID controller must be able to read and write both mirrored disks at the same time. As the figure shows, at least two drives are required. Because the disks mirror each other, when one of them has a problem the mirror can be used, which improves the fault tolerance of the system. RAID 1 is relatively easy to design and implement. Each read fetches only one block of data from one disk, so the block read transfer rate is the same as that of a single disk. Because RAID 1 duplicates every write, it places a considerable load on the system; when it is implemented in software RAID, a heavily loaded server can lose a good deal of efficiency. RAID 1 is appropriate when your system requires extremely high reliability, for example for statistical data. RAID 1 also supports hot swapping: a failed disk can be replaced without powering down, and after replacement the data only needs to be restored from the mirror. When the primary hard disk is damaged, the mirror disk takes over its work. The mirror disk acts as a backup disk, so the safety of this mode is very high; RAID 1 offers the best data security of all RAID levels. But its disk utilization is only 50%, the lowest of all RAID levels.
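As a rough model of the mirroring and hot-replacement behavior described above, the following Python sketch keeps each disk as a dictionary of blocks. The class `MirrorSet` and its methods are hypothetical names chosen for illustration; a real controller works on physical devices, not dictionaries.

```python
class MirrorSet:
    """Toy RAID 1 model: every write goes to all healthy disks."""

    def __init__(self, num_disks: int = 2):
        self.disks = [dict() for _ in range(num_disks)]
        self.failed = set()

    def write(self, block_no: int, data: bytes) -> None:
        # Duplicate the block onto every healthy mirror.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block_no] = data

    def read(self, block_no: int) -> bytes:
        # Any healthy mirror can serve the read.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block_no]
        raise IOError("all mirrors have failed")

    def fail(self, i: int) -> None:
        # Simulate a disk failure: its contents are gone.
        self.failed.add(i)
        self.disks[i] = {}

    def replace(self, i: int) -> None:
        # Rebuild the replacement disk by copying a surviving mirror.
        source = next(d for j, d in enumerate(self.disks) if j not in self.failed)
        self.disks[i] = dict(source)
        self.failed.discard(i)


m = MirrorSet()
m.write(0, b"payroll")
m.fail(0)            # primary disk dies
print(m.read(0))     # b'payroll', served by the mirror
m.replace(0)         # new disk is resynchronized from the mirror
```

The sketch also makes the cost visible: two disks hold one disk's worth of data, which is the 50% utilization mentioned above.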
RAID 2: features, principles, and applications
RAID 2 is conceptually similar to RAID 3: both stripe data across different hard disks in units of bits or bytes. However, RAID 2 uses an encoding technique (Hamming code) to provide error checking and recovery. This encoding requires several disks to store the checking and recovery information, which makes RAID 2 more complex to implement. It is therefore rarely used in commercial environments.
Its main drawback is that too many disks are used for check data.
The individual bits of a data word are spread across the data disks, and the Hamming check code calculated from those bits is saved on another set of disks, as shown in the figure. Because of the properties of the Hamming code, an error in the data can be corrected on the fly so that the output is correct. The data transfer rate is very high; to reach the ideal speed, it is best to speed up the disks that hold the check code. The controller design is simpler than for RAID 3, 4, or 5. There is no free lunch here either: to use Hamming codes, you must pay in data redundancy. The output data rate equals that of the slowest drive in the group.
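To make the Hamming-code idea concrete, here is a minimal Python sketch using the classic Hamming(7,4) code. The code size is an assumption made for illustration (the text above does not say which Hamming code a given RAID 2 array uses); in this model, four bits would go to four data disks and three check bits to three check disks. The function names are hypothetical.

```python
def hamming74_encode(data_bits: list[int]) -> list[int]:
    """Encode 4 data bits as the 7-bit codeword p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword: list[int]) -> tuple[list[int], int]:
    """Fix a single flipped bit; return (codeword, 1-based error position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3   # the syndrome spells out the bad position
    if position:
        c[position - 1] ^= 1
    return c, position


word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1                        # one "disk" returns a wrong bit
fixed, position = hamming74_correct(damaged)
print(word)                # [0, 1, 1, 0, 0, 1, 1]
print(position)            # 5
print(fixed == word)       # True
```

Three check bits for every four data bits illustrates the price in redundancy that the text refers to.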
RAID 3: features, principles, and applications
RAID 3 first performs an XOR operation on the data to generate parity data, then writes the data and the parity data to the member disk drives in parallel access mode; it therefore has the advantages and disadvantages of parallel access. Furthermore, every RAID 3 data transfer updates the entire stripe (the corresponding position on every member disk drive is updated together), so there is no need to read existing data back from some of the disk drives and XOR it with the new data before writing. That situation does occur in RAID 4 and RAID 5 and is commonly known as the read-modify-write process. Thus, of all RAID levels, RAID 3 has the best write performance.
RAID 3 parity data is usually stored on a dedicated parity disk, but because every update covers the entire stripe, the RAID 3 parity disk does not become an access bottleneck the way the RAID 4 parity disk does.
The parallel access mode of RAID 3 requires special RAID controller support to keep the disk drives synchronized, and its write-performance advantage can now be matched by caching technology, so it is generally believed that RAID 3 will be phased out.
With its superior write performance, RAID 3 is especially suited to applications that write large, sequential files, such as graphics, imaging, video editing, multimedia, data warehousing, and high-speed data acquisition.
Compared with RAID 2, RAID 3 uses fewer disks for check data, which reduces cost and makes checking more efficient; the disadvantage is that error correction is not supported.
Unlike the RAID 2 check code, RAID 3 parity can only detect errors, not correct them. RAID 3 accesses one whole stripe at a time, which improves read and write speed; it stores data in parallel like RAID 0, but it is not as fast as RAID 0. The check code is generated when data is written and is stored on a separate disk. An implementation needs three or more drives. Read and write rates are high, and because there are relatively few parity bits, the computation time is relatively short. RAID 3 is very difficult to implement with software RAID, and the controller is not easy to implement either. It is mainly used for graphics (including animation) and other applications that need high throughput. Unlike RAID 2, RAID 3 stores parity information on a single disk. If a data disk fails, its data can be regenerated from the parity disk and the other data disks. If the parity disk fails, the data can still be used. RAID 3 provides a very good transfer rate for large sequential transfers, but for random access the parity disk becomes a write bottleneck. Protecting data with a single parity disk is not as safe as mirroring, but disk utilization is much better: n-1 of the n disks hold data.
Every RAID 3 I/O involves all the disks, so other queued I/Os must wait; RAID 3 cannot perform concurrent I/O.
RAID 4: features, principles, and applications
RAID 4 requires three or more disks. It keeps the parity information on one drive and writes data to the other disks in RAID 0 fashion, as shown in the figure. Because one disk is reserved for parity information, the size of the array is (N-1) * S, where S is the size of the smallest drive in the array. As with RAID 1, the disks should be of equal size.
If one drive fails, the parity information can be used to reconstruct all data. If two drives fail, all data is lost. This level is not used often because the parity information is stored on a single drive and must be updated every time any other disk is written. When a large amount of data is written, the parity disk therefore easily becomes a bottleneck, so this RAID level is now rarely used.
RAID 4 uses independent access mode and a single dedicated parity disk to store parity data. Each RAID 4 transfer unit (strip) is relatively long, and overlapped I/O can be performed, so read performance is good.
However, because a single dedicated parity disk stores all the parity data, writes run into a serious bottleneck. Therefore, RAID 4 has not been widely used.
RAID 4 is much like RAID 3; the difference is that it accesses data by block, that is, one disk at a time. Seen on a diagram, RAID 3 accesses one horizontal bar (a stripe across all disks) at a time, while RAID 4 accesses one vertical bar (a single disk) at a time. Its characteristics are also quite similar to those of RAID 3, but recovery after a failure is much harder than in RAID 3, the controller is also much harder to design, and data access efficiency is not very good.
A RAID 4 I/O occupies only one disk, so other I/Os can be performed at the same time, provided they do not target the same disk.
RAID 5: features, principles, and applications
RAID 5 is probably the most useful RAID mode when you want to combine a large number of physical disks and still keep some redundancy. RAID 5 can be used on three or more disks, with zero or more spare disks. As with RAID 4, the size of the resulting RAID 5 device is (N-1) * S.
The biggest difference between RAID 5 and RAID 4 is that the parity information is distributed evenly across all drives, as shown in Figure 4, which avoids the RAID 4 bottleneck. If one of the disks fails, all data is still preserved thanks to the parity information. If a spare disk is available, data resynchronization onto it begins immediately after the failure. If two disks fail simultaneously, all data is lost. RAID 5 can withstand one disk failure, but not two or more.
RAID 5 also uses independent access mode, but the parity data is spread across all member disk drives. It therefore offers overlapped I/O with multitasking (parallel I/O) and also avoids the write bottleneck of RAID 4's single dedicated parity disk. However, RAID 5 writes are still slowed somewhat by the “read-modify-write process”.
Because RAID 5 can perform overlapped I/O, its performance rises with the number of member disk drives: each disk drive can serve one thread at a time, so the more disk drives there are, the more threads can overlap and the higher the performance. Conversely, the more disk drives in the array, the higher the chance that one of them fails, so the reliability of the entire array, or MTDL (Mean Time to Data Loss), is reduced.
Basically, RAID 5 suits multitasking environments with frequent access and small amounts of data per access, such as enterprise file servers, web servers, online trading systems, and e-commerce applications.
The diagram shows that parity exists on all disks: P0 represents the parity value of stripe 0, and the other labels are read the same way. RAID 5 reads are very efficient, write efficiency is average, and block-wise collective access is efficient. Because the parity is spread over different disks, reliability improves and a single disk error can be tolerated.
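Since the diagram itself is not reproduced here, the following Python sketch prints one possible distribution of parity blocks. It is an illustration only: the rotation rule (parity starts on the last disk and moves one disk to the left per stripe) is an assumption, real implementations offer several layouts, and the function name `raid5_layout` is hypothetical.

```python
def raid5_layout(num_disks: int, num_stripes: int) -> list[list[str]]:
    """Return one row per stripe; Pn marks the parity block of stripe n."""
    rows = []
    for stripe in range(num_stripes):
        # Assumed rotation: parity moves one disk to the left each stripe.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        data_no = stripe * (num_disks - 1)
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(f"P{stripe}")
            else:
                row.append(f"D{data_no}")
                data_no += 1
        rows.append(row)
    return rows


for row in raid5_layout(num_disks=3, num_stripes=3):
    print(" ".join(row))
# D0 D1 P0
# D2 P1 D3
# P2 D4 D5
```

Every disk holds exactly one parity block per three stripes, so parity updates are shared among the disks instead of queuing on a single one as in RAID 4.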
RAID 5 also relies on parity to ensure data security, but instead of using a separate hard drive for the parity, it interleaves the parity with the data segments on every drive. Thus, if any one hard disk is damaged, the lost data can be reconstructed from the data and parity on the other hard disks. Disk utilization is n-1 of n disks. Concurrent data access is handled as well, but the controller is very difficult to design.
Comparing RAID 3 with RAID 5, an important difference is that in RAID 3 every data transfer involves all the disks in the array. In RAID 5, most data transfers touch only one disk, so operations can be performed in parallel. RAID 5 has a “write penalty”: every write operation causes four actual read/write operations, two to read the old data and the old parity, and two to write the new data and the new parity. The advantages of RAID 5 are that it provides redundancy (the array keeps running after one disk drops out), higher disk space utilization ((N-1)/N), and fast reads and writes (up to N-1 times the speed of a single disk).
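The four operations of the write penalty can be traced in a small Python sketch. It assumes the usual XOR identity new parity = old parity xor old data xor new data; the dictionary-based disks and the names `xor_bytes` and `small_write` are hypothetical, chosen only for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(disks: dict, data_disk: str, parity_disk: str, new_data: bytes) -> int:
    """Update one data block and its parity; return the number of disk I/Os."""
    ios = 0
    old_data = disks[data_disk]        # read 1: old data
    ios += 1
    old_parity = disks[parity_disk]    # read 2: old parity
    ios += 1
    # Remove the old data from the parity, then fold in the new data.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    disks[data_disk] = new_data        # write 1: new data
    ios += 1
    disks[parity_disk] = new_parity    # write 2: new parity
    ios += 1
    return ios


disks = {"d0": b"\x0f\xf0", "d1": b"\x33\x55"}
disks["p"] = xor_bytes(disks["d0"], disks["d1"])

ios = small_write(disks, "d0", "p", b"\xaa\xbb")
print(ios)                                                # 4
print(disks["p"] == xor_bytes(disks["d0"], disks["d1"]))  # True
```

Note that disk `d1` is never touched: the parity stays consistent without reading the rest of the stripe, which is why other I/Os can proceed on the remaining disks in parallel.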
The biggest advantage of RAID 5 is that the array keeps working normally when one disk drops out, whereas RAID 0 works only when every disk is healthy; RAID 5 therefore has better fault tolerance. That is why RAID 5 is one of the most commonly used RAID levels.
The RAID 5 parity block P is obtained by XORing the data blocks of the same stripe.
P = D1 xor D2 xor D3 … xor Dn (D1, D2, D3 … Dn are data blocks, P is the parity, xor is exclusive OR)
The XOR (exclusive OR) parity principle is shown in the following table:
|Value A||Value B||XOR result|
|0||0||0|
|0||1||1|
|1||0||1|
|1||1||0|
Here A and B are two one-bit values. When A and B are the same, the XOR result is 0; when A and B differ, the XOR result is 1. Moreover, if you know the XOR result and either one of A and B, you can derive the other value. For example, if A is 1 and the XOR result is 1, then B must be 0; if the XOR result is 0, then B must be 1. This is the basic principle of XOR encoding and checking.
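The parity formula and the "derive the missing value" property can be tried out directly in Python. This is a minimal sketch in which blocks are modeled as small integers; the function names `parity` and `recover` are hypothetical.

```python
from functools import reduce
from operator import xor


def parity(blocks: list[int]) -> int:
    """P = D1 xor D2 xor ... xor Dn."""
    return reduce(xor, blocks, 0)


def recover(surviving_blocks: list[int], p: int) -> int:
    """XOR the parity with all surviving blocks to get the missing block."""
    return reduce(xor, surviving_blocks, p)


blocks = [0b1011, 0b0110, 0b1100]
p = parity(blocks)
print(bin(p))                              # 0b1

# Pretend the second block was lost and rebuild it from the rest.
rebuilt = recover([blocks[0], blocks[2]], p)
print(rebuilt == blocks[1])                # True
```

Recovery works because XORing a value twice cancels it out, so everything except the missing block disappears from the result.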
RAID 5 is a very practical RAID mode that is widely used in all kinds of environments. RAID 5 works as follows:
RAID 5 needs at least three disks to implement an array. It offers both the speed-up of RAID 0 and the data protection of RAID 1. In an array of three hard drives, the data to be stored is split according to the user-defined stripe size into two fragments, which are stored on two of the hard disks. The third disk does not receive a file fragment; instead it stores check data computed from the data on the other two drives. This check data is generated by an algorithm, and it can be used to recover the data stored on either of the other two disks. In addition, the roles of the three drives are not fixed: for one write, drives 1 and 2 may store the file fragments, and for the next write drives 2 and 3 may do so. The roles change from one store operation to the next, but in every case two drives store the file fragments and the remaining drive stores the parity information.
The parity information is usually calculated by the RAID controller; a dedicated chip on the controller typically computes it and decides which hard disk it is sent to for storage.
RAID 5 thus achieves both the fast reads and writes of RAID 0 and the data recovery of RAID 1. In the case described above, RAID 5 uses three hard drives to get the doubled speed of RAID 0 together with the data protection of RAID 1, and when one of the hard disks in the array is damaged, the data can be restored by adding a new hard drive.
The following example shows how RAID 5 restores data. Suppose three hard drives form a RAID 5 array with a user-defined stripe size of 64k, and a 128k file needs to be stored. When the RAID controller receives the file, it divides the 128k file into two 64k fragments, uses its algorithm to derive the check information from them, writes the two fragments to hard disks 1 and 2 at the same time, and finally sends the check information to hard disk 3. If one disk in the array is damaged, the original data can still be restored: if hard disk 3, which stores the parity information, is damaged, the parity can be rebuilt from the data on hard disks 1 and 2; if hard disk 1 or 2 is damaged, the missing file fragment can be regenerated from the parity information on hard disk 3 and the surviving fragment.
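The 128k example can be replayed in Python. The sketch below assumes that the controller's "certain algorithm" is plain XOR parity, as in the formula given earlier, and it models the three disks as a dictionary; the file contents and all names are made up for illustration.

```python
CHUNK = 64 * 1024  # user-defined stripe size: 64k


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# A deterministic stand-in for the 128k file.
file_data = bytes(range(256)) * 512
assert len(file_data) == 2 * CHUNK

# Store: two 64k fragments plus their parity on three disks.
disks = {
    1: file_data[:CHUNK],
    2: file_data[CHUNK:],
}
disks[3] = xor_bytes(disks[1], disks[2])

# Case 1: disk 3 (parity) fails; rebuild it from disks 1 and 2.
rebuilt_parity = xor_bytes(disks[1], disks[2])
print(rebuilt_parity == disks[3])            # True

# Case 2: disk 1 fails; rebuild its fragment from disk 2 and the parity.
rebuilt_fragment = xor_bytes(disks[2], disks[3])
print(rebuilt_fragment == file_data[:CHUNK]) # True
print(rebuilt_fragment + disks[2] == file_data)  # True
```

In both failure cases the two surviving disks are enough to regenerate the third, which is exactly the single-disk fault tolerance described above.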
RAID 5 also has a drawback: if the data on one of the hard disks changes, the file fragments and the parity information must be recalculated, and all three hard drives must be accessed again.
Effective capacity: as with other arrays, a RAID 5 array is best built from hard drives of the same speed and capacity. The effective capacity of RAID 5 is the capacity of the smallest hard disk multiplied by the number of disks in the array minus one; one is subtracted because one disk's worth of space is used to store parity information.
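The capacity rules stated in this article can be collected into one small Python helper. It is only a sketch: the function name `usable_capacity` is hypothetical, sizes are in arbitrary units, and the RAID 1 case is restricted to a simple mirror whose capacity is that of the smallest disk.

```python
def usable_capacity(level: int, disk_sizes: list[int]) -> int:
    """Usable capacity for RAID 1, 4, or 5, based on the smallest disk."""
    n = len(disk_sizes)
    smallest = min(disk_sizes)
    if level == 1:
        if n < 2:
            raise ValueError("RAID 1 needs at least two disks")
        return smallest                  # every mirror holds the same data
    if level in (4, 5):
        if n < 3:
            raise ValueError("RAID 4/5 needs at least three disks")
        return (n - 1) * smallest        # one disk's worth goes to parity
    raise ValueError("level not covered by this sketch")


print(usable_capacity(5, [500, 500, 500]))  # 1000
print(usable_capacity(5, [500, 400, 500]))  # 800
print(usable_capacity(1, [500, 400]))       # 400
```

The second call shows why equal-sized disks are recommended: with one 400-unit disk in the set, 100 units on each of the larger disks go unused.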
RAID 5 can both double the speed and ensure the security of the data, so many high-end systems use this RAID mode.
RAID 0+1/RAID 10: characteristics, principles, and applications
RAID 0+1/RAID 10 combines the advantages of RAID 0 and RAID 1. It suits applications that need high speed and full fault tolerance and, of course, have a large budget.
RAID striped access modes
In RAID systems that use data striping, the access mode of the member disk drives falls into two types:
- Parallel Access
- Independent Access
RAID 2 and RAID 3 use parallel access mode.
RAID 0, RAID 4, RAID 5, and RAID 6 use independent access mode.