BigQuery –Joining Tables (basic operation)

這一篇我們來教學如何在BQ在做基本joining tables的操作,雖然在BQ上是不太鼓勵做joining tables的操作。
若是您要分析的資料是從一般的RDBMS的資料庫而來,哪麼建議您在ETL時就將RDBMS的資料做Denormalization的動作,這樣避免掉joining tables的所要的資源及時間
後面會有篇章帶到如何在BQ上做joining tables的優化。
BQ支援以下的joining tables的型態
inner/outer/cross/anti/semi/anti-semi

請看底下的範例

With bike_rentals as (
Select
count(starttime) as num_trips,
extract(date from starttime) as trip_date
From `bigquery-public-data`.new_york_citibike.citibike_trips
Group by trip_date
),

rainy_days As
(
Select
Date,
(Max(prcp) > 5) as rainy
From (
Select
Wx.date as date,
If (wx.element = ‘PRCP’ , wx.value/10, NULL) as prcp
From
`bigquery-public-data`.ghcn_d.ghcnd_2016 as wx
Where
wx.id = ‘USW00094728’
)
Group by
Date
)

Select
Round(AVG(bk.num_trips)) as num_trips,
wx.rainy
From bike_rentals as bk
Join rainy_days as wx
On wx.date = bk.trip_date
Group by wx.rainy

從上面的範例中我們從兩個不同的dataset篩選我們要的資料出來,個別名命為bike_rentals 及rainy_days 。在第三段的select語法將它們做join,其中rainy_days這段的語法中,wx.id = ‘USW00094728’是New York某一個氣象站台。若我們分別執行這兩段bike_rentals 及rainy_days會看到以下的結果

第一段是騎乘數量與日期第二段是日期與時間,最後我們以“日期”為條件將兩個資料合併起來。就變成上面第一個範例的結果
這個方式就叫 “Inner Join”
from “資料庫A” join ”資料庫B ” on “資料庫A共同的欄位” = ”資料庫B 共同的欄位”

讓我們再看下一個Inner Join範例

With from_item_a as (
select ‘Dalles’ as city, ‘or’ as state
union all select ‘Tokyo’, ‘Tokyo’
union all select ‘Taiwan’, ‘Taipei’
),

from_item_b as (
select ‘or’ as state, ‘USA’ as country
union all select ‘Tokyo’, ‘Japan’
union all select ‘Taipei’, ‘Taiwan’
)

select from_item_a.*, country
from from_item_a
join from_item_b
on from_item_a.state = from_item_b.state

這個範例我們就直接做兩個dataset出來,第一段第一個欄位是城市名稱第二個欄位是州,第二段第一個欄位是城市名稱第二個欄位是國家名稱。
最後一段我們select第一個datset全部的資料及國家名稱,以兩個dataset都共有的”州”欄位去做join table。

在on 這邊也可以用不等於來做,例如在這三個國家間運送貨物會有額外的費用發生,請看以下範例

With from_item_a as (
select ‘Dalles’ as city, ‘or’ as state
union all select ‘Tokyo’, ‘Tokyo’
union all select ‘Taiwan’, ‘Taipei’
),

from_item_b as (
select ‘or’ as state, ‘USA’ as country
union all select ‘Tokyo’, ‘Japan’
union all select ‘Taipei’, ‘Taiwan’
)

select from_item_a.*, country as surcharage
from from_item_a
join from_item_b
on from_item_a.state != from_item_b.state

接下來我們看一下Cross Join,請看以下範例

With winners as (
select ‘John’ as person, ‘100m’ as event
union all select ‘Jason’, ‘200m’
union all select ‘Bob’, ‘500m’
),
gifts as (
select ‘Google Home’ as gift, ‘100m’ as event
union all select ‘Google Hub’, ‘200m’
union all select ‘Pixel3’, ‘500m’
)
select winners.*, gifts.gift
from winners
JOIN gifts
on winners.event =gifts.event

With winners as (
select ‘John’ as person, ‘100m’ as event
union all select ‘Jason’, ‘200m’
union all select ‘Bob’, ‘500m’
),
gifts as (
select ‘Google Home’ as gift, ‘100m’ as event
union all select ‘Google Hub’, ‘200m’
union all select ‘Pixel3’, ‘500m’
)
select winners.*, gifts.gift
from winners
cross join gifts

上面的兩個範例中,第一個我們還是用innter join的方式用event為共同欄位去把兩個tbale組合起來,所以是一個蘿蔔一個坑。
但第二個範例是用cross join的方式,可以看到每一個名字都有mapping到所有第二個dataset的欄位。

最後我們來看一下Outer Join的範例
請看以下四個範例,分別是Inner/Full outer/Left/Right Join

With winners as (
Select ‘John’ as person, ‘100m’ as event
Union all select ‘Jason’, ‘200m’
Union all select ‘Ellen’, ‘400m’
Union all select ‘Aaron’, ’50m’
),
Gifts as (
Select ‘Google Hone’ as gift, ‘100m’ as event
Union all select ‘Google Hub’, ‘200m’
Union all select ‘Google Mini’, ‘400m’
Union all select ‘Google Pixel3’, ‘5000m’
)

Select person, gift from winners
Inner join gifts on winners.event = gifts.event

Inner Join

With winners as (
Select ‘John’ as person, ‘100m’ as event
Union all select ‘Jason’, ‘200m’
Union all select ‘Ellen’, ‘400m’
Union all select ‘Aaron’, ’50m’
),
Gifts as (
Select ‘Google Hone’ as gift, ‘100m’ as event
Union all select ‘Google Hub’, ‘200m’
Union all select ‘Google Mini’, ‘400m’
Union all select ‘Google Pixel3’, ‘5000m’
)

Select person, gift from winners
Full outer join gifts on winners.event = gifts.event

Full Outer Join

With winners as (
Select ‘John’ as person, ‘100m’ as event
Union all select ‘Jason’, ‘200m’
Union all select ‘Ellen’, ‘400m’
Union all select ‘Aaron’, ’50m’
),
Gifts as (
Select ‘Google Hone’ as gift, ‘100m’ as event
Union all select ‘Google Hub’, ‘200m’
Union all select ‘Google Mini’, ‘400m’
Union all select ‘Google Pixel3’, ‘5000m’
)

Select person, gift from winners
Left outer join gifts on winners.event = gifts.event

Left Outer Join

With winners as (
Select ‘John’ as person, ‘100m’ as event
Union all select ‘Jason’, ‘200m’
Union all select ‘Ellen’, ‘400m’
Union all select ‘Aaron’, ’50m’
),
Gifts as (
Select ‘Google Hone’ as gift, ‘100m’ as event
Union all select ‘Google Hub’, ‘200m’
Union all select ‘Google Mini’, ‘400m’
Union all select ‘Google Pixel3’, ‘5000m’
)

Select person, gift from winners
Right outer join gifts on winners.event = gifts.event

Right Outer Join

由上面四個範例來看,Inner join一樣是一個蘿蔔一個坑,以event為兩個dataset為共同欄位進行比對及Join。然而兩個dataset都有比對不到的資料故沒有顯示出來。
Full outer Join —
我們可以看到,以event作為共同欄位來做比對及Join,但與inner join不一樣的是,它把沒有比對到的資料也ㄧ併顯示出來。
Left outer join—
顧名思義,就是以左邊的資料為主顯示出所有比對到的資料。故我們可以看到Aaron雖然在左手邊有資料但另一個(右手邊)的dataset卻沒有資料可以進行合併。
Right outer Join—
與Left outer Join相反

以上就是在BQ中,簡單的joining tables的介紹,下一篇我們會在深入BQ的資料型態與功能。