Wednesday, October 18, 2017

    I found myself in a situation where the same exact query ran much slower in our MySQL replication(slave) node compared to the primary(master) node. Both nodes had identical configuration, both were running the same version of MySQL, the tables and the indexes which the query ran against were also identical. Running explain through MySQL Workbench told me that the index that was making the query fast on the primary node, was not being used by the optimizer on the replica node. Hinting the query to use that particular index did improve the query speed on the replica node. But why did I have to hint? I knew with certainty there was a similar performance discrepancy for at least two other queries in my app.
    Jared Call, a colleague of mine, found that there was hung query on the replica node, by running
show processlist . The query was running for a number of days. He proceeded to kill the query:
https://stackoverflow.com/a/3787661/2948202
As soon as I learned about this, I ran my query against the replica, hoping that my problem would have been solved. Indeed, the query ran 5 seconds faster, but it was still slow - it took close to 7 seconds, while it was only a second on the primary node.
    Next Day: I tried on the replica node again and BOOM!!! - the query ran just as fast on replica as it ran on the primary. My guess is that this hung long running query was preventing any new records from being indexed, which made the query optimizer...not optimal. Once the hung query died, MySQL started indexing again and eventually caught up. This is my hand wavy theory as I am not a db admin pro.
   Tl;DR - if you have a query that MySQL doesn't seem to optimize correctly, check for any hung queries and kill them(as long as you know what you are doing). Then sit back and relax while MySQL catches up indexing.

Credits: Jared Call, Kelly Shutt, Eric Griffin, Nathan Wakefield, stackoverflow.com