Scalability and elasticity are the trendiest words in the database arena these days – everybody scales, and everybody claims that only they scale the right way :) .

There are several ways to scale a database. When evaluating a database solution, ask yourself how it scales and see if it scales in a way that would be optimal for the needs of your application.

Before we can choose the most suitable scaling solution, there are a couple of things to consider:

MySQL Scalability Issues – What is it that we want to scale in a database?

Capacity – The most natural scaling scenario, where data has grown and the disk/memory/attached storage is too small.
Throughput – Application usage has grown, resulting in a decline in performance. Throughput is influenced by latency & concurrency, which means you can either improve the response time for every task (latency) , do a lot of tasks in parallel (concurrency), or use a combination of both (remember that one influences the other, and as concurrency increases, it affects latency as well.)
Scaling throughput can be accomplished either for a specific type of transaction – such as scaling for read operations only – or for all types of transactions and operations: reads, writes and everything in between.

What we are willing to undergo during the scale event?

The Scale can be:

Offline – This scale event requires stopping the applications running against the database until we bring back the newly scaled (up or down) database. This scale event can take anywhere from minutes to hours. In some cases, it may also require a re-portioning event or something of the sort.
To maintain service during the time that we need to take the database offline, we often back up the data and restore it elsewhere to maintain service from a temporary location. Also, once the scaled instance is once again operational, you’ll need to sync all your data with the data in your backup.
Online – The database continues to serve the application during this scale event. At most, the application might suffer some performance degradation during the scaling time, but overall service is maintained.
Since your database records could change during the scale event, any solution that offers online scalability needs to be able to ensure data consistency.

So what types of scales are there?

Scaling Up and Down

Scaling up and down are the most common and ‘brute’ ways to scale:

Need more capacity or need better performance? Then buy a bigger, faster, meaner machine. The cloud infrastructure makes this approach very easy from the operational standpoint: you simply spawn the next available VM class and voila!

This approach can be achieved through a Do-It-Yourself operation of purchasing and installing additional virtual resources, or by subscribing to a database service that provides the ability to allocate additional instances on demand.

Advantages of scaling up or down:

It’s simple – This type of scale is usually fairly simple to deploy, and is supported by most of the databases used today (so that once more processors and cores are identified, the database software is able to take advantage of the additional resources.)
No code changes to the application – it is still runs against a single (larger) machine.
Scales both throughput and capacity

Disadvantages of scaling up or down:

Cost – High-end hardware is usually costly when deploying physical servers as well as when provisioning virtualized resources on the cloud. Moreover, if considering non-standard equipment (ultra-fast networks, PCI SSD storage etc.), the cost for such an infrastructure will be so expensive, making it unfeasible for most organizations. In addition, most of the cloud infrastructure uses commodity hardware, so the more high-end configurations and hardware are usually not available at all.
Limited MySQL scalability – At the end of the day, you can’t outgrow the capabilities of a single machine. In the cloud infrastructure, those capabilities are also restricted to the hardware offers by your chosen cloud provider.
Scaling up or down is usually done offline – This is the case because the hardware needs to be changed. To avoid service downtime, some kind of master/slave architecture needs to be established and upgraded one by one.

Scaling Out and In

This approach is more complex to achieve because it requires the database to be designed to support scale-out capabilities. There are various ways to scale-out; each solution carries its own advantages, disadvantages, tradeoffs, and architectural implications. We’ll address two popular architectures: read replicas and sharding.

Distributed read replicas:

This is the most common way to scale out. In this configuration, you have a single master server where all writes are performed and then distributed to all the read replicas. Read operations can be done against the master or the read replicas. The distribution of write operations can be synchronous, where the write operation will be blocking until it reaches all the read replicas (ensuring data consistency), or it can be asynchronous, where the write operation is unaffected, resulting in data inconsistency since you may not be reading the most up-to-date data.

Advantages of scaling out and in using read replicas:

Simple – relatively easy to implement compared to other distribution approaches
Scales beyond the limitations of a single machine
Usually is done online (but the client is not automatically updated on the added/removed replicas)

Disadvantages of scaling out and in using read replicas:

Scaling is only for throughput and read operation (sometimes at the expense of write operations).
Application changes are usually required to perform the reads and writes in the right place and to add or remove nodes.

The read replicas approach is used by MySQL replication, Amazon RDS, Zimory, and more. It is usually suitable for 1^st generation web applications where the application is mostly read (90/10, 80/20 – scalability is needed mostly for read operations). For next-generation web apps that are user and content-driven, this approach is less suitable.

With the expansion of social applications and social/sharing features, today’s applications are characterized with a much higher ratio of write operations. Some applications even demonstrate a ratio of 70/30 for write operations.

Sharding:

Sharding is another way to resolve MySQL scalability issues. It usually means splitting up the data by some logic derived from the application. This can be done by selecting a key in the data and splitting the data by hashing that key and having some distribution logic. It can also be done by identifying the application needs and setting different tables or different data sets in different databases (splitting the North-America sales data from the EMEA sales data, etc.)

This approach is simple from the database standpoint, but is very complex from the application standpoint since the application needs to be modified to deal with the data being scattered into the different shards. Moreover, combining data from different shards can be very complex and involves development in the application (you can’t just run a simple JOIN.)

Advantages of scaling out and in using sharding:

Scales beyond the limitations of a single machine
Scales both read and write operations (but makes some operations impossible to achieve in the database)
Scales both throughput and capacity

Disadvantages of scaling out and in using sharding:

Complex and requires application changes
Scaling is usually offline and requires a re-partitioning event – and may require application changes.

Today, there are some solutions that introduce auto-sharding (Scalebase, Dbshards). This approach makes sharding more similar to shared-nothing partitioning, thus taking the sting out of some sharding complexities. However, it still requires application awareness and could prove to be a limiting factor if you needed to update your app or migrate to a different database solution.

Understanding RAID 10 and RAID 01

Before understanding database, its also important to understand more about Disk RAID.

This article explains the difference between the two with a simple diagram.

I’m going to keep this explanation very simple for you to understand the basic concepts well. In the following diagrams A, B, C, D, E and F represents blocks.

RAID 10

RAID 10 is also called as RAID 1+0
It is also called as “stripe of mirrors”
It requires minimum of 4 disks
To understand this better, group the disks in pair of two (for mirror). For example, if you have a total of 6 disks in RAID 10, there will be three groups–Group 1, Group 2, Group 3 as shown in the above diagram.
Within the group, the data is mirrored. In the above example, Disk 1 and Disk 2 belongs to Group 1. The data on Disk 1 will be exactly same as the data on Disk 2. So, block A written on Disk 1 will be mirroed on Disk 2. Block B written on Disk 3 will be mirrored on Disk 4.
Across the group, the data is striped. i.e Block A is written to Group 1, Block B is written to Group 2, Block C is written to Group 3.
This is why it is called “stripe of mirrors”. i.e the disks within the group are mirrored. But, the groups themselves are striped.

RAID 01

RAID 01 is also called as RAID 0+1
It is also called as “mirror of stripes”
It requires minimum of 3 disks. But in most cases this will be implemented as minimum of 4 disks.
To understand this better, create two groups. For example, if you have total of 6 disks, create two groups with 3 disks each as shown below. In the above example, Group 1 has 3 disks and Group 2 has 3 disks.
Within the group, the data is striped. i.e In the Group 1 which contains three disks, the 1st block will be written to 1st disk, 2nd block to 2nd disk, and the 3rd block to 3rd disk. So, block A is written to Disk 1, block B to Disk 2, block C to Disk 3.
Across the group, the data is mirrored. i.e The Group 1 and Group 2 will look exactly the same. i.e Disk 1 is mirrored to Disk 4, Disk 2 to Disk 5, Disk 3 to Disk 6.
This is why it is called “mirror of stripes”. i.e the disks within the groups are striped. But, the groups are mirrored.

Main difference between RAID 10 vs RAID 01

Performance on both RAID 10 and RAID 01 will be the same.
The storage capacity on these will be the same.
The main difference is the fault tolerance level. On most implememntations of RAID controllers, RAID 01 fault tolerance is less. On RAID 01, since we have only two groups of RAID 0, if two drives (one in each group) fails, the entire RAID 01 will fail. In the above RAID 01 diagram, if Disk 1 and Disk 4 fails, both the groups will be down. So, the whole RAID 01 will fail.
RAID 10 fault tolerance is more. On RAID 10, since there are many groups (as the individual group is only two disks), even if three disks fails (one in each group), the RAID 10 is still functional. In the above RAID 10 example, even if Disk 1, Disk 3, Disk 5 fails, the RAID 10 will still be functional.
So, given a choice between RAID 10 and RAID 01, always choose RAID 10.

Courtesy - http://www.thegeekstuff.com/2011/10/raid10-vs-raid01/

MySQL DBA

Monday, 7 November 2011

MySQL Scalability