1. 簡介

在資料庫SQL處理中,常常有行轉列(Pivot)和列轉行(Unpivot)的數據處理需求。本文以示例說明在Data Lake Analytics(https://www.aliyun.com/product/datalakeanalytics)中,如何使用SQL的一些技巧,達到行轉列(Pivot)和列轉行(Unpivot)的目的。另外,DLA支持函數式表達式的處理邏輯、豐富的JSON數據處理函數和UNNEST的SQL語法,結合這些功能,能夠實現非常豐富、強大的SQL數據處理語義和能力,本文也以JSON數據列展開為示例,說明在DLA中使用這種SQL的技巧。

2. 行轉列(Pivot)

2.1 樣例數據

test_pivot表內容:

+------+----------+---------+--------+
| id | username | subject | source |
+------+----------+---------+--------+
| 1 | 張三 | 語文 | 60 |
| 2 | 李四 | 數學 | 70 |
| 3 | 王五 | 英語 | 80 |
| 4 | 王五 | 數學 | 75 |
| 5 | 王五 | 語文 | 57 |
| 6 | 李四 | 語文 | 80 |
| 7 | 張三 | 英語 | 100 |
+------+----------+---------+--------+

2.2 方法一:通過CASE WHEN語句

SQL語句:

SELECT
username,
max(CASE WHEN subject = 語文 THEN source END) AS `語文`,
max(CASE WHEN subject = 數學 THEN source END) AS `數學`,
max(CASE WHEN subject = 英語 THEN source END) AS `英語`
FROM test_pivot
GROUP BY username
ORDER BY username;

結果:

+----------+--------+--------+--------+
| username | 語文 | 數學 | 英語 |
+----------+--------+--------+--------+
| 張三 | 60 | NULL | 100 |
| 李四 | 80 | 70 | NULL |
| 王五 | 57 | 75 | 80 |
+----------+--------+--------+--------+

2.3 方法二:通過map_agg函數

該方法思路上分為兩個步驟:

第一步,通過map_agg函數把兩個列的多行的值,映射為map;第二步,通過map的輸出,達到多列輸出的目的。

第一步SQL:

SELECT username, map_agg(subject, source) kv
FROM test_pivot
GROUP BY username
ORDER BY username;

第一步輸出:

+----------+-----------------------------------+
| username | kv |
+----------+-----------------------------------+
| 張三 | {語文=60, 英語=100} |
| 李四 | {數學=70, 語文=80} |
| 王五 | {數學=75, 語文=57, 英語=80} |
+----------+-----------------------------------+

可以看到map_agg的輸出效果。

最終,該方法的SQL:

SELECT
username,
if(element_at(kv, 語文) = null, null, kv[語文]) AS `語文`,
if(element_at(kv, 數學) = null, null, kv[數學]) AS `數學`,
if(element_at(kv, 英語) = null, null, kv[英語]) AS `英語`
FROM (
SELECT username, map_agg(subject, source) kv
FROM test_pivot
GROUP BY username
) t
ORDER BY username;

結果:

+----------+--------+--------+--------+
| username | 語文 | 數學 | 英語 |
+----------+--------+--------+--------+
| 張三 | 60 | NULL | 100 |
| 李四 | 80 | 70 | NULL |
| 王五 | 57 | 75 | 80 |
+----------+--------+--------+--------+

3. 列轉行(Unpivot)

3.1 樣例數據

test_unpivot表內容:

+----------+--------+--------+--------+
| username | 語文 | 數學 | 英語 |
+----------+--------+--------+--------+
| 張三 | 60 | NULL | 100 |
| 李四 | 80 | 70 | NULL |
| 王五 | 57 | 75 | 80 |
+----------+--------+--------+--------+

3.2 方法一:通過UNION語句

SQL語句:

SELECT username, subject, source
FROM (
SELECT username, 語文 AS subject, `語文` AS source FROM test_unpivot WHERE `語文` is not null
UNION
SELECT username, 數學 AS subject, `數學` AS source FROM test_unpivot WHERE `數學` is not null
UNION
SELECT username, 英語 AS subject, `英語` AS source FROM test_unpivot WHERE `英語` is not null
)
ORDER BY username;

結果:

+----------+---------+--------+
| username | subject | source |
+----------+---------+--------+
| 張三 | 語文 | 60 |
| 張三 | 英語 | 100 |
| 李四 | 語文 | 80 |
| 李四 | 數學 | 70 |
| 王五 | 英語 | 80 |
| 王五 | 語文 | 57 |
| 王五 | 數學 | 75 |
+----------+---------+--------+

3.3 方法二:通過CROSS JOIN UNNEST語句

SQL語句:

SELECT t1.username, t2.subject, t2.source
FROM test_unpivot t1
CROSS JOIN UNNEST (
array[語文, 數學, 英語],
array[`語文`, `數學`, `英語`]
) t2 (subject, source)
WHERE t2.source is not null

結果:

+----------+---------+--------+
| username | subject | source |
+----------+---------+--------+
| 張三 | 語文 | 60 |
| 張三 | 英語 | 100 |
| 李四 | 語文 | 80 |
| 李四 | 數學 | 70 |
| 王五 | 語文 | 57 |
| 王五 | 數學 | 75 |
| 王五 | 英語 | 80 |
+----------+---------+--------+

4. JSON數據列展開

JSON數據的表達能力非常靈活,因此在資料庫和SQL中,常常需要處理JSON數據,常常碰到稍複雜的需求,就是將JSON數據中的某些屬性欄位,進行展開轉換,轉成行、列的關係型表達。

4.1 基本思路和步驟

  • 使用JSON函數,對JSON字元串進行解析和數據提取;
  • 提取、轉換為ARRAY或者MAP的數據結構,如有需要,可以使用Lambda函數式表達式進行轉換處理;
  • 利用UNNEST語法進行列展開。

下面以多個示例說明。

4.2 用UNNEST對MAP進行關係型展開

SQL示例:

SELECT t.m, t.n
FROM (
SELECT MAP(ARRAY[foo, bar], ARRAY[1, 2]) as map_data
)
CROSS JOIN unnest(map_data) AS t(m, n);

結果:

+------+------+
| m | n |
+------+------+
| foo | 1 |
| bar | 2 |
+------+------+

4.3 用UNNEST對JSON數據進行關係型展開

SQL示例:

SELECT json_extract(t.a, $.a) AS a,
json_extract(t.a, $.b) AS b
FROM (
SELECT cast(json_extract({"x":[{"a":1,"b":2},{"a":3,"b":4}]}, $.x)
AS array<JSON>) AS package_array
)
CROSS JOIN UNNEST(package_array) AS t(a);

結果:

+------+------+
| a | b |
+------+------+
| 1 | 2 |
| 3 | 4 |
+------+------+

SQL示例:

SELECT t.m AS _col1, t.n AS _col2
FROM (
SELECT cast(json_extract({"x":[{"a":1,"b":2},{"a":3,"b":4}]}, $.x)
AS array<JSON>) AS array_1,
cast(json_extract({"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}, $.x)
AS array<JSON>) AS array_2
)
CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);

結果:

+---------------+-----------------+
| _col1 | _col2 |
+---------------+-----------------+
| {"a":1,"b":2} | {"a":5,"b":6} |
| {"a":3,"b":4} | {"a":7,"b":8} |
| NULL | {"a":9,"b":10} |
| NULL | {"a":11,"b":12} |
+---------------+-----------------+

SQL示例:

SELECT json_extract(t.m, $.a) AS _col1,
json_extract(t.m, $.b) AS _col2,
json_extract(t.n, $.a) AS _col3,
json_extract(t.n, $.b) AS _col4
FROM (
SELECT cast(json_extract({"x":[{"a":1,"b":2},{"a":3,"b":4}]}, $.x)
AS array<JSON>) AS array_1,
cast(json_extract({"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}, $.x)
AS array<JSON>) AS array_2
)
CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);

結果:

+-------+-------+-------+-------+
| _col1 | _col2 | _col3 | _col4 |
+-------+-------+-------+-------+
| 1 | 2 | 5 | 6 |
| 3 | 4 | 7 | 8 |
| NULL | NULL | 9 | 10 |
| NULL | NULL | 11 | 12 |
+-------+-------+-------+-------+

4.4 結合Lambda表達式,用UNNEST對JSON數據進行關係型展開

SQL示例:

SELECT count(*) AS cnt,
package_name
FROM (
SELECT t.a AS package_name
FROM (
SELECT transform(packages_map_array, x -> Element_at(x, packageName))
AS package_array
FROM (
SELECT cast(Json_extract(data_json, $.packages)
AS array<map<VARCHAR, VARCHAR>>) AS packages_map_array
FROM (
SELECT json_parse(data) AS data_json
FROM (
SELECT {
"packages": [
{
"appName": "鐵路12306",
"packageName": "com.MobileTicket",
"versionName": "4.1.9",
"versionCode": "194"
},
{
"appName": "QQ飛車",
"packageName": "com.tencent.tmgp.speedmobile",
"versionName": "1.11.0.13274",
"versionCode": "1110013274"
},
{
"appName": "掌閱",
"packageName": "com.chaozh.iReaderFree",
"versionName": "7.11.0",
"versionCode": "71101"
}
]
}
AS data
)
)
)
) AS x (package_array)
CROSS JOIN UNNEST(package_array) AS t (a)
)
GROUP BY package_name
ORDER BY cnt DESC;

結果:

+------+------------------------------+
| cnt | package_name |
+------+------------------------------+
| 1 | com.MobileTicket |
| 1 | com.tencent.tmgp.speedmobile |
| 1 | com.chaozh.iReaderFree |
+------+------------------------------+

本文作者:julian.zhou

原文鏈接

更多技術乾貨敬請關注云棲社區知乎機構號:阿里云云棲社區 - 知乎

本文為雲棲社區原創內容,未經允許不得轉載。

推薦閱讀:

相关文章