SQL優化技巧之超級大表和超級大表的連接優化

公眾號：pythonislover

Python大數據與SQL優化筆記

前面文章有寫到大表與大表的join優化，一個表5G，一個表8G表之間怎麼去實現HASH HASH 連接。如果如果2個表都是30G，那麼怎麼辦？就算走了HASH HASH我還想還是比較慢的

今天我們聊聊這個場景我們該怎麼做，大致思路就算通過分區實現表的分片連接，具體怎麼做呢？

假如原來兩張2表名稱為 t1,t2 ,現在我們新建2張新表

create table bt1( hash_code number, name varchar(100), country varchar(50), family number, code varchar(50) ) partition by list(hash_code) ( partition p0 values(0), partition p1 values(1), partition p2 values(2), partition p3 values(3), partition p4 values(4));

create table bt2(
hash_code number,
name varchar(100),
country varchar(50),
family number,
code varchar(50)
)
partition by list(hash_code)
(
partition p0 values(0),
partition p1 values(1),
partition p2 values(2),
partition p3 values(3),
partition p4 values(4));

其他欄位不變，新增一個hash_code的欄位，並且採用分區的方式，分區數量安裝你的表實際大小來定

下面把源表數據導入到新建的分區表裡，注意導入方式

insert into bt1 select ora_hash(code,4),t1.* from t1;

insert into bt2
select ora_hash(code,4),t2.* from t2;

這裡先介紹下ora_hash(列，hash桶數)，就是把列hash之後放入到0，1，2，3，4這幾個桶里，相當於把表分成5份，對於上面的例子hash_code hash之後的值分別為0，1，2，3，4。

然後實際代碼要改成下面的分片形式

源SQL select * from bt1,bt2 where bt1.code = bt2.code

改之後的SQL
select * from bt1,bt2 where bt1.code = bt2.code
and bt1.hash_code=0
and bt2.hash_code=0

select * from bt1,bt2 where bt1.code = bt2.code
and bt1.hash_code=1
and bt2.hash_code=1

select * from bt1,bt2 where bt1.code = bt2.code
and bt1.hash_code=2
and bt2.hash_code=2

select * from bt1,bt2 where bt1.code = bt2.code
and bt1.hash_code=3
and bt2.hash_code=3

select * from bt1,bt2 where bt1.code = bt2.code
and bt1.hash_code=4
and bt2.hash_code=4