Wednesday, 29 May 2013

3 ways MySQL uses indexes

I often see people confuse different ways MySQL can use indexing, getting wrong ideas on what query performance they should expect. There are 3 main ways how MySQL can use the indexes for query execution, which are not mutually exclusive, in fact some queries will use indexes for all 3 purposes listed here.
Using index to find rows The main purpose of the index is to find rows quickly – without scanning whole data set. This is most typical reason index gets added on the first place. Most popular index type in MySQL – BTREE can speed up equality and prefix range matches. So if you have index on (A,B) This index can be used to lookup rows for WHERE clauses like A=5 A BETWEEN 5 AND 10 ;A=5 AND B BETWEEN 5 AND 10 it however will NOT be able to help lookup rows for B BETWEEN 5 AND 10 predicate because it is not index prefix. It is important to look at key_len column in explain plan to see how many index parts are actually used for row lookup. Very common problem I see is multi column indexes which are used but only to their short prefix which is not very selective. A lot of this mistakes come from missing one very important MySQL limitation – once MySQL runs into the interval range it will not use any further index parts. If you have A BETWEEN 5 AND 10 AND B=5 for the same index MySQL will use the index… but it will only use A prefix for row lookups and scan whole A BETWEEN 5 AND 10 range. It is interesting to note this limitation only applies to interval ranges – for enumerated ranges MySQL will use both key parts. Hence if you change this predicate to A IN (5,6,7,8,9,10) AND B=5 you will quite likely see improved query performance. Beware however of large nested enumerated ranges they are very hard on the optimizer. This just describes how MySQL uses single index – there are more complex rules of how indexes will be used if you look at multiple indexes usage with “index merge”
Using Index to Sort Data Another great benefit of BTREE index is – it allows to retrieve data in sorted form hence avoiding external sort process for executing of queries which require sorting. Using index for sorting often comes together with using index to find rows, however it can also be used just for sort for example if you’re just using ORDER BY without and where clauses on the table. In such case you would see “Index” type in explain which correspond to scanning (potentially) complete table in the index order. It is very important to understand in which conditions index can be used to sort data together with restricting amount of rows. Looking at the same index (A,B)things like ORDER BY A ; ORDER BY A,B ; ORDER BY A DESC, B DESC will be able to use full index for sorting (note MySQL may not select to use index for sort if you sort full table without a limit). However ORDER BY B or ORDER BY A, B DESC will not be able to use index because requested order does not line up with the order of data in BTREE. If you have both restriction and sorting things like this would work A=5 ORDER BY B ; A=5 ORDER BY B DESC; A>5 ORDER BY A ; A>5 ORDER BY A,B ; A>5 ORDER BY A DESC which again can be easily visualized as scanning a range in BTREE. Things like this however would not work A>5 ORDER BY BA>5 ORDER BY A,B DESC or A IN (3,4) ORDER BY B – in these cases getting data in sorting form would require a bit more than simple range scan in the BTREE and MySQL decides to pass it on. There are some workarounds you can use though.
Using index to read data Some storage engines (MyISAM and Innodb included) can also use index to read the data, hence avoiding to read the row data itself. This is not simply savings of having 2 reads per index entry instead of one but it can save IO orders of magnitude in some cases – Indexes are sorted (at least on the page boundary) so doing index range scan you typically get many index entries from the same page but the rows itself can be scattered across many pages requiring potentially a lot of IOs. On top of that if you just need access to couple of columns
index can be simply much smaller than the data which is one of the reason covering indexes help to speed up queries even if data is in memory. If MySQL is only reading index and not accessing rows you will see “using index” in EXPLAIN output.
These are the main “core” use for indexes. You can also see others like using index for group by but I think they can be pretty much looked as one of these 3 ways described


Wednesday, 23 January 2013

How to evaluate whether or not you should join a startup


When interviewing for a job at a start-up  you might focus on the questions you will be asked by the interviewer, but there are also a series of questions you need to ask yourself. Are you ready to grind yourself into the ground for the next 5 years? Do you believe in the company’s mission? Can you live off this (likely lower) salary? In the case of a successful exit, what future opportunities does this job provide? This article provides a series of information startup job candidates should consider and answers to commonly asked questions. It’s tailored more for those in business roles than those in technical or design roles.

Do you believe in the company’s mission and are you ready to work on it for the next 5 years?

This should be the first question you ask yourself. If the answer is no to both, then you are probably not joining the right company. While not everyone always gets to work for a company they truly believe in, if you plan on joining a startup and getting paid much less to do something you don’t believe in (or are excited about) you may as well go to a bigger company and get paid more money to do something you don’t love.
The one exception to not believing in the company mission and not being ready to spend the next 5 years working at it is if you are in education mode, and have an opportunity to work for an individual (or multiple people) who will be able to teach you everything from starting your own business, scaling the business, raising money, closing partnerships, getting press, and so on.
While one should always be looking for opportunities to professionally educate, I am referring to the rare opportunity at the beginning of a career in which candidates should learn as much as possible about business. If you get the chance to work with an absolute rockstar (i.e. working with Mark Zuckerberg at Facebook, even if you’re passionate about the pharmaceutical industry), then you should jump on it, no matter the field.

Who is your manager?

There are tons of people who believe that knowing who you will be reporting to is the single most important thing in deciding whether you should join. I tend to agree with them. If you are reporting to someone who knows what they are doing and is very good at their job, you will accelerate your career two, maybe even five years. Working for a well-respected individual in the industry will help legitimize you. Ask them to take you under their wing, bring you around to events, introduce you to their network, etc. It will go a very long way in building your career. Don’t be discouraged if your boss is only a year older than you, or even younger! It doesn’t mean that they don’t know what they are doing.

What will you be doing at the startup?

The next thing you should be thinking about is what your role will be. What are their expectations for you? What are your short term goals? Long term goals? What will you be doing Day 1? Understanding what your responsibilities will be, will help you evaluate if this is where you belong.

Do you get along with the employees?

Have you met the team yet? Do you like them? Do they like you? All very important questions, because you will be grinding yourself into the ground with them for the next few years. They are your brothers and sisters in the army that is your startup. You will share wins and losses together. If you don’t get along with them, then you are probably at the wrong startup. If you do get along with them, and they seem to like you, then you are probably a good culture fit.

What salary are you hoping to earn?

Expectations here vary from startup to startup. The best general rule of thumb is that if your role is to generate revenue for the company, then you are in Bucket X. If your role is not actively creating revenue, then you are in Bucket Y. From my experience, Bucket X is typically in the $90-150k range whereas Bucket Y is typically in the $30-90k range. Now these are pretty wide ranges, but I’ve seen them hold up in the real world. For example, if you work for an ecommerce startup in a sales/business development role and are doing deals that bring in money for the company, you are probably in Bucket X. Whereas if you are in an industry where your BD/marketing/community role doesn’t bring in money, then you are in Bucket Y. These numbers are not for C-level executive roles and are general estimates.

What about company stock options?

Ah, the infamous golden ticket. Here is something that they don’t tell you when hiring you, unless you join a rocket-ship, you are probably not going to get anything significant at a startup. Making those large sums of money only happens if you join the next billion dollar startup (only a handful every few years) or if you are a founder. If you think that by joining a startup you are going to strike gold, then you are in the wrong industry.
That being said, one of the perks of working for a startup is receiving equity in the company. Depending on your role and where the company is, you are most likely getting on the lower end of .05-.30% in your junior role (if you thought you were getting 1%, dream on!). What you need to understand is actually how much it is worth. The CEO telling you that you are getting .1% of the company in stock options doesn’t mean anything unless you know what the valuation of the company is!
To figure this out you need to know how much the company is worth and the number of outstanding shares. This will help you value your options. Once you receive this information you can estimate the value per share minus the exercise price multiplied by the number of shares and you should come up with a basic value of your stock options.
Some companies grant stock options, while others grant restricted stock units. The difference there is that options grant the employee the right to purchase shares of the company at a price (often a small fraction of the actual price) whereas RSUs mean that the stock is given to the employee free of charge.

Where is the company financially?

This is only important in terms of figuring out if the company will be around so you can grind yourself into the ground trying to achieve the mission. Try to get a good grasp on where the company is financially. If the company raised $1 million dollars eighteen months ago and makes no revenue, it is probably not a good idea to join them since they are bleeding money. Even if you can get some of the information in the press, if you are serious about joining the team, ask the management at the company. Be sure to ask how much runway they have and when they think they’ll need to raise money again. It is something that any smart interviewer should ask.
I would even preface it by saying that the press isn’t always 100% accurate and you’d love to hear from them about where they are financially. If someone in the company gets insulted by this question, then that is a big red flag and you should be wary. A confident and comfortable interviewer (likely the CEO or founder) should be happy you are asking such an important question and not blindly jumping into the unknown.

What is the most important piece to properly evaluating whether you should join a startup?

This entirely depends on you. What do you find most important? If you are looking to strike it rich, you probably have a better shot at doing so by being a moderately successful co-founder (.1% of a $100M exit is $100k, 30% of $10M exit is $3M). If you are looking to accelerate your career, then you probably should go work for a superstar and have them take you under their wing. At the end of the day, what is most important to you will depend on how you evaluate whether you should join the startup or not.

Friday, 31 August 2012

MySQL Enterprise Monitor 2.3.11

Overview and Upgrade of MySQL EM 2.3.11

The new MySQL Enterprise Monitor 2.3.11 has lot new monitor elements than the previous stable version of 2.3.5. This topic covers on what are new monitor elements are added and also the steps to upgrade your Enterprise Monitor with un-attended installation mode.This new Monitor has removed some possibly un-wanted elements and added new monitoring elements. See the table below for comparison between 2.3.5 and 2.3.11

The LEFT side color GREEN and new monitor elements and RIGHT side and monitoring elements removed.

MySQL EM Comaprison Chart

 

Upgrade of Monitor Server

Please download MySQL Enterprise Monitor server 2.3.11 update from your Oracle e-Delivery account. Typically the file name would be below.

mysqlmonitor-2.3.11.2165-linux-x86_64-update-installer.bin [Upgrade binary]

1. Download above file in your MySQL EM server
2. Grant execute privilege to the file
  
chmod +x mysqlmonitor-2.3.11.2165-linux-x86_64-update-installer.bin

3. Execute the below command which automatically upgrades your current MySQL EM to latest version. You need to mention the path of previous installation as input for the command below. 

./mysqlmonitor-2.3.11.2165-linux-x86_64-update-installer.bin --mode unattended --createDataBackup 0 --debugtrace ~/debug.log --forceRestart 1 --installdir <<path of previous installation>>

With above executions you have completed your upgrade of your MySQL EM server to 2.3.11. Validate it by logging into your web console and if you have noticed the new values monitored as mentioned above.

 Upgrade of Monitor Agent  

1. Download MySQL EM agent from your Oracle e-Delivery

mysqlmonitoragent-2.3.11.2163-linux-glibc2.3-x86-64bit-update-installer.bin 

2. Grant execute privilege to the file
3. Execute the below command

./mysqlmonitoragent-2.3.11.2163-linux-glibc2.3-x86-64bit-update-installer.bin --mode unattended --enableproxy 0 --createBackup 0 --installdir /opt/mysql/enterprise/agent/ --debugtrace ~/test.log
 

Looks simple. As MySQL



Thursday, 29 December 2011

MySQL DBA Training Online

Training on MySQL Database Administration, the most and quickly emerging database from last 2 years.
About Trainer - Has real-time experience of 7+ years in role of Linux and database administration. 

Course Content - Click Here to view the content 

Contact Details - themysqldba@gmail.com
Follow us         - themysqldba [Twitter]

Monday, 7 November 2011

MySQL Scalability


Scalability and elasticity are the trendiest words in the database arena these days – everybody scales, and everybody claims that only they scale the right way :) .
There are several ways to scale a database. When evaluating a database solution, ask yourself how it scales and see if it scales in a way that would be optimal for the needs of your application.
Before we can choose the most suitable scaling solution, there are a couple of things to consider:

MySQL Scalability Issues – What is it that we want to scale in a database?

  1. Capacity – The most natural scaling scenario, where data has grown and the disk/memory/attached storage is too small.
  2. Throughput – Application usage has grown, resulting in a decline in performance. Throughput is influenced by latency & concurrency, which means you can either improve the response time for every task (latency) , do a lot of tasks in parallel (concurrency), or use a combination of both (remember that one influences the other, and as concurrency increases, it affects latency as well.)
    Scaling throughput can be accomplished either for a specific type of transaction – such as scaling for read operations only – or for all types of transactions and operations: reads, writes and everything in between.

What we are willing to undergo during the scale event?

The Scale can be:
  1. Offline – This scale event requires stopping the applications running against the database until we bring back the newly scaled (up or down) database. This scale event can take anywhere from minutes to hours. In some cases, it may also require a re-portioning event or something of the sort.
    To maintain service during the time that we need to take the database offline, we often back up the data and restore it elsewhere to maintain service from a temporary location. Also, once the scaled instance is once again operational, you’ll need to sync all your data with the data in your backup.
  2. Online – The database continues to serve the application during this scale event. At most, the application might suffer some performance degradation during the scaling time, but overall service is maintained.
    Since your database records could change during the scale event, any solution that offers online scalability needs to be able to ensure data consistency.

So what types of scales are there?

Scaling Up and Down

Scaling up and down are the most common and ‘brute’ ways to scale:
Need more capacity or need better performance? Then buy a bigger, faster, meaner machine. The cloud infrastructure makes this approach very easy from the operational standpoint: you simply spawn the next available VM class and voila!
This approach can be achieved through a Do-It-Yourself operation of purchasing and installing additional virtual resources, or by subscribing to a database service that provides the ability to allocate additional instances on demand.
Advantages of scaling up or down:
  • It’s simple – This type of scale is usually fairly simple to deploy, and is supported by most of the databases used today (so that once more processors and cores are identified, the database software is able to take advantage of the additional resources.)
  • No code changes to the application – it is still runs against a single (larger) machine.
  • Scales both throughput and capacity
Disadvantages of scaling up or down:
  • Cost – High-end hardware is usually costly when deploying physical servers as well as when provisioning virtualized resources on the cloud. Moreover, if considering non-standard equipment (ultra-fast networks, PCI SSD storage etc.), the cost for such an infrastructure will be so expensive, making it unfeasible for most organizations. In addition, most of the cloud infrastructure uses commodity hardware, so the more high-end configurations and hardware are usually not available at all.
  • Limited MySQL scalability – At the end of the day, you can’t outgrow the capabilities of a single machine. In the cloud infrastructure, those capabilities are also restricted to the hardware offers by your chosen cloud provider.
  • Scaling up or down is usually done offline – This is the case because the hardware needs to be changed. To avoid service downtime, some kind of master/slave architecture needs to be established and upgraded one by one.

Scaling Out and In

This approach is more complex to achieve because it requires the database to be designed to support scale-out capabilities. There are various ways to scale-out; each solution carries its own advantages, disadvantages, tradeoffs, and architectural implications. We’ll address two popular architectures: read replicas and sharding.

Distributed read replicas:

This is the most common way to scale out. In this configuration, you have a single master server where all writes are performed and then distributed to all the read replicas. Read operations can be done against the master or the read replicas. The distribution of write operations can be synchronous, where the write operation will be blocking until it reaches all the read replicas (ensuring data consistency), or it can be asynchronous, where the write operation is unaffected, resulting in data inconsistency since you may not be reading the most up-to-date data.
Advantages of scaling out and in using read replicas:
  • Simple – relatively easy to implement compared to other distribution approaches
  • Scales beyond the limitations of a single machine
  • Usually is done online (but the client is not automatically updated on the added/removed replicas)
Disadvantages of scaling out and in using read replicas:
  • Scaling is only for throughput and  read operation (sometimes at the expense of write operations).
  • Application changes are usually required to perform the reads and writes in the right place and to add or remove nodes.
The read replicas approach is used by MySQL replication, Amazon RDS, Zimory, and more. It is usually suitable for 1st generation web applications where the application is mostly read (90/10, 80/20 – scalability is needed mostly for read operations). For next-generation web apps that are user and content-driven, this approach is less suitable.
With the expansion of social applications and social/sharing features, today’s applications are characterized with a much higher ratio of write operations. Some applications even demonstrate a ratio of 70/30 for write operations.

Sharding:

Sharding is another way to resolve MySQL scalability issues. It usually means splitting up the data by some logic derived from the application. This can be done by selecting a key in the data and splitting the data by hashing that key and having some distribution logic. It can also be done by identifying the application needs and setting different tables or different data sets in different databases (splitting the North-America sales data from the EMEA sales data, etc.)
This approach is simple from the database standpoint, but is very complex from the application standpoint since the application needs to be modified to deal with the data being scattered into the different shards. Moreover, combining data from different shards can be very complex and involves development in the application (you can’t just run a simple JOIN.)
Advantages of scaling out and in using sharding:
  • Scales beyond the limitations of a single machine
  • Scales both read and write operations (but makes some operations impossible to achieve in the database)
  • Scales both throughput and capacity
Disadvantages of scaling out and in using sharding:
  • Complex and requires application changes
  • Scaling is usually offline and requires a re-partitioning event – and may require application changes.
Today, there are some solutions that introduce auto-sharding (Scalebase, Dbshards). This approach makes sharding more similar to shared-nothing partitioning, thus taking the sting out of some sharding complexities. However, it still requires application awareness and could prove to be a limiting factor if you needed to update your app or migrate to a different database solution.