Optimal Is Not The Best

Optimal Is Not The Best

As an example I will discuss a real life use case when we need to select some amount of latest news (let’s say 25 news) by tags. Suppose we have 1 billion records in table news and 10 thousands records in table tags. Natural relation between news and tags will be many-to-many using bridge table tags_in_news as shown below:

And the sql query will look like:

[sql]
select n.* from news n, tags_in_news tin
where ‘getNewsByTags’ = ‘getNewsByTags’
and n.deleted = false
and n.id = tin.news_id
and tin.deleted = false
and tin.uk_tag_id in (0, 1119, 2228, 4031, 1792, 1117, 3954, 2150, 3919, 4087)
and n.modified_date <= Now()
order by n.modified_date desc limit 25
[/sql]

After the first run of this query its execution time is not very fast as I would like it to be. It’s because of order by sorting construction. So I created index for modified_date field:

[sql]
create index ix_modified_date on news (modified_date)
[/sql]

But even after that MySQL did not choose query plan strategy using this index – it chose optimal strategy using another indexes that was not the best strategy. So I had to force MySQL to use this index. Fortunately, we have such possibility using force index construction and now our query will look like this:

[sql]
select n.* from news n force index (ix_modified_date), tags_in_news tin
where ‘getNewsByTags’ = ‘getNewsByTags’
and n.deleted = false and n.id = tin.news_id
and tin.deleted = false
and tin.uk_tag_id in (0, 1119, 2228, 4031, 1792, 1117, 3954, 2150, 3919, 4087)
and n.modified_date <= Now()
order by n.modified_date desc limit 25
[/sql]

Bingo! Query time decreased from more than a second to less than 50 ms. I discovered the best query execution strategy. So the problem was resolved… till one “fine” morning when our DB server fell down. After slow_query_log investigation I found out that for some tags values our query executes dramatically slowly. And the cause was really strange: the total amount of records returned from query was less than the limit value. In such case, it’s better not to use force index and let MySQL decide how to execute query. Hmm… looks like optimal strategy is better than the best strategy? So I had two options: to give up and use optimal strategy or to fight and find a solution how to omit this issue. I’ve chosen the second one. But more on it in my next article. Stay tuned!:)

Ivan
ivan Hi, my name is Ivan, Senior Architect & R&D team expert at Romexsoft, keen on high-load Java application development and databases.
Share The Post

Leave a Comment

Your email address will not be published. Required fields are marked *