Sub query vs exist vs join performance in particular case

SamSam
4 min readOct 2, 2019

Actually, it’s like an antique. You can query thousands related articles from internet. None of them is precise. I mean that there’s always said “It’s partially true, but the other part with another criteria (condition) is not true.”

Believe me. My “translation” from “someone said” above is much preciser than the answers you can find. The truth is like black magic. It’s forbidden because the blurry parts of condition combination are too ugly. The better one’s performance may 10 times better than the worse one. (more or less. Metaphor ok?)

So that might be the tricks that influence our bonus and performance as a back-end or data engineer or DBA. Imagine the words as Gandalf the White :

Gandalf:Gandalf:
I am Gandalf the White. And I come back to you now — at the turn of the tide.
Gandalf? Yes… that was what they used to call me. Gandalf the Gray. That was my name.

“I can accelerate the system performance 10 times from now. And I come back to you now — at the turn of the tide.”
“The only I need is double the bonus”
said poor Sam.

Well, it’s a joke. But somehow showed the truth that we can focus -> Performance of database and the query we made is very very close to each other.

After struggling with surfing online, I found the most approximate answer with promise is as below:

https://explainextended.com/2009/09/30/in-vs-join-vs-exists-oracle/

Good Job
by Quassnoi
September 30th, 2009 at 11:00 pm

Please don’t laugh at the time and doubt the time “2009″. It’s like the approximately truth after my testing in 10 pages of Google search.

Briefly, be in short, I’ll condense the final as:

1. Test by yourself

2. Check your DATA SIZE. The answer varies by data size.

3. Avoid full table scan. (Like just graduated from high school…)

4. Design good primary key and index also helps. (Query condition should be on key or index please. Thank you.)

5. Separate the huge I/O by design of tables and table partition

Don’t ask me why those above are nothing to do with “query” itself. You should ask yourself: “I just care about the performance. Using fuxking sub query or exists or join are all ok. Just give me the speed.”

So touched. right?

Yes. I will also give you a test result of “MY CONFIGURATION” and “MY data size”. Remember. It’s “MINE”. Dig yourself.

Be very very careful of my test case below. Make sure all your conditions are considered. It’s definitely not a black magic.

OS: Windows server 2016
DB: Relational database. Oracle 12c
data size: 138 columns and 6,410,782 rows
table size: 5409 Mb
24/7 table I/O: Yes
P_KEY: Primary Key
Date_created and Date_MODI: NOT index

My testing scripts are:
-

select * from T
where P_KEY in (select P_KEY from T where (DATE_CREATED > current_date() -365) or(DATE_MODI > current_date() -365) );

VS

select * from T
where exists (select P_KEY from B where t.P_KEY=b.P_KEY) and (DATE_CREATED > current_date() -365) or(DATE_MODI > current_date() -365) );

VS

select t.* from T
inner join (select P_KEY from B ) on t.P_KEY=b.P_KEY where (DATE_CREATED > current_date() -365) or(DATE_MODI > current_date() -365) );

My test result is:

sub query : 23 mins
exists query: 56 mins
join query: 74 mins

Sub query WINS.
-
(select * from T where p_key IN (select * from B))

Doubt my result and with many many questions? Fuxking testing yourself and strictly follow the conditions set. Allow me to remind you again:

1. Test by yourself

2. Check your DATA SIZE. The answer varies by data size.

3. Avoid full table scan. (Like just graduated from high school…)

4. Design good primary key and index also helps. (Query condition should be on key or index please. Thank you.)

5. Separate the huge I/O by design of tables and table partition

Don’t ask me why those above are nothing to do with “query” itself. You should ask yourself: “I just care about the performance. Using fuxking sub query or exists or join are all ok. Just give me the speed.”

So touched again. right?

Have a nice day with your database.

Originally published at http://datamansamxiao.wordpress.com on October 2, 2019.

--

--