Abstract: This article builds a distributed system demo based on spring cloud + spring jpa + spring cloud alibaba fescar + mysql + seata, and uses seata's debug logs and source code to analyze its workflow and principles from the client side (RM and TM).

Preface

In distributed systems, distributed transactions are a problem that must be solved, and eventual-consistency solutions are currently the most widely used. Since Alibaba open-sourced Fescar (renamed Seata in early April) at the beginning of the year, the project has attracted a great deal of attention and now has close to 8,000 stars. Seata aims to solve the distributed-transaction problem in the microservices domain with high performance and zero intrusion into business code. It is iterating rapidly, and its near-term goal is a production-ready MySQL version.

This article builds a distributed system demo based on spring cloud + spring jpa + spring cloud alibaba fescar + mysql + seata, and analyzes the workflow and principles from the client side (RM and TM) through seata's debug logs and source code. (Sample project: github.com/fescar-group)

To better follow the rest of the article, let's first go over the relevant concepts:

  • XID: the unique identifier of a global transaction, composed of ip:port:sequence;
  • Transaction Coordinator (TC): the transaction coordinator; maintains the running state of global transactions and is responsible for coordinating and driving their commit or rollback;
  • Transaction Manager (TM): controls the boundary of a global transaction; responsible for opening a global transaction and ultimately initiating the decision to globally commit or roll back;
  • Resource Manager (RM): controls branch transactions; responsible for branch registration and status reporting, and for receiving the coordinator's instructions to drive the commit or rollback of branch (local) transactions.

Note: the code in this article is based on fescar-0.4.1. Since the project was renamed to seata only recently, some package names, class names, and jar names have not yet been updated, so the text below still uses fescar.

Distributed Framework Support

Fescar uses an XID to represent a distributed transaction. The XID needs to be propagated through all the systems involved in one distributed-transaction request, so that each of them can report the handling of its branch transaction to fescar-server and receive fescar-server's commit and rollback instructions. Fescar officially supports all versions of the dubbo protocol, and for spring cloud (spring-boot) distributed projects the community provides a corresponding implementation:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-alibaba-fescar</artifactId>
    <version>2.1.0.BUILD-SNAPSHOT</version>
</dependency>

This component implements XID propagation for RestTemplate- and Feign-based communication.
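
To make the propagation concrete, here is a minimal sketch of the sending side, assuming a plain Feign RequestInterceptor that copies the current XID into an HTTP header. The class name and the header name "Fescar-Xid" are assumptions for illustration, not the component's actual code:

import com.alibaba.fescar.core.context.RootContext;
import feign.RequestInterceptor;
import feign.RequestTemplate;

// Hypothetical sketch: relay the XID of the current global transaction
// on every outgoing Feign call so the downstream service can bind it.
public class XidRelayInterceptor implements RequestInterceptor {
    @Override
    public void apply(RequestTemplate template) {
        String xid = RootContext.getXID();
        if (xid != null) {
            // Header name is an assumption; the real component defines its own constant.
            template.header("Fescar-Xid", xid);
        }
    }
}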

Business Logic

The business logic is the classic place-order, deduct-balance, reduce-inventory flow. It is split by module into three independent services, each connected to its own database:

  • Order: order-server
  • Account: account-server
  • Inventory: storage-server

There is also the business system that initiates the distributed transaction:

  • Business: business-server

The project structure is shown in the figure below.

Normal flow:

  1. business initiates a purchase request
  2. storage deducts inventory
  3. order creates the order
  4. account deducts the balance

Abnormal flow:

  1. business initiates a purchase request
  2. storage deducts inventory
  3. order creates the order
  4. account throws an exception while deducting the balance

In the normal flow, the data updates of steps 2, 3 and 4 are committed globally; in the abnormal flow, the exception in step 4 triggers a global rollback.

Configuration Files

fescar's configuration entry file is registry.conf. Looking at the ConfigurationFactory code, this file cannot currently be specified elsewhere, so the configuration file name must be registry.conf.

private static final String REGISTRY_CONF = "registry.conf";
public static final Configuration FILE_INSTANCE = new FileConfiguration(REGISTRY_CONF);
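
For reference, a minimal registry.conf that keeps the default file type might look like the following; the structure follows the 0.4.x conventions, so verify the exact fields against your version:

registry {
  # file, nacos, eureka, redis, zk ...
  type = "file"

  file {
    name = "file.conf"
  }
}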

registry specifies the form the concrete configuration takes; the default is the file type, and file.conf contains three sections:

  1. transport: this section corresponds to the NettyServerConfig class and defines Netty-related parameters; TM and RM communicate with fescar-server over Netty.
  2. service

service {
  #vgroup -> rgroup
  vgroup_mapping.my_test_tx_group = "default"
  #address the Client uses to connect to the TC
  default.grouplist = "127.0.0.1:8091"
  #degrade is not currently supported
  enableDegrade = false
  #whether to disable seata's distributed transactions
  disableGlobalTransaction = false
}

  3. client

client {
  #upper limit of the RM's buffer for commit notifications received from the TC
  async.commit.buffer.limit = 10000
  lock {
    retry.internal = 10
    retry.times = 30
  }
}

Data Source Proxy

Beyond the configuration files above, the one place in AT mode that requires a bit of code is designating a proxy for the data source, and currently the proxy can only be based on DruidDataSource. (Note: the newly released 0.4.2 version already supports any data source type.)

@Bean
@ConfigurationProperties(prefix = "spring.datasource")
public DruidDataSource druidDataSource() {
    // plain Druid data source populated from the spring.datasource.* properties
    DruidDataSource druidDataSource = new DruidDataSource();
    return druidDataSource;
}

@Primary
@Bean("dataSource")
public DataSourceProxy dataSource(DruidDataSource druidDataSource) {
    // wrap the real data source so fescar can intercept connections and commits
    return new DataSourceProxy(druidDataSource);
}

The purpose of using DataSourceProxy is to bring in ConnectionProxy. One aspect of fescar's non-intrusiveness shows in the ConnectionProxy implementation: the cut-in point where a branch transaction joins the global transaction is the commit phase of the local transaction. This design guarantees that the business data and the undo_log are written in the same local transaction.

undo_log is a table that must be created in each business database; fescar relies on it to record the status of every branch transaction and the replay data for phase-two rollback. There is no need to worry about the table growing into a bottleneck: when a global transaction commits, the corresponding undo_log records are deleted asynchronously.

CREATE TABLE `undo_log` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`branch_id` bigint(20) NOT NULL,
`xid` varchar(100) NOT NULL,
`rollback_info` longblob NOT NULL,
`log_status` int(11) NOT NULL,
`log_created` datetime NOT NULL,
`log_modified` datetime NOT NULL,
`ext` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ux_undo_log` (`xid`,`branch_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Starting the Server

Go to github.com/seata/seata/ and download the fescar-server that matches your Client version, to avoid protocol mismatches between different versions. Enter the bin directory of the unpacked archive and run:

./fescar-server.sh 8091 ../data

Output on successful startup:

2019-04-09 20:27:24.637 INFO [main]c.a.fescar.core.rpc.netty.AbstractRpcRemotingServer.start:152 -Server started ...

Starting the Client

fescar's load entry point is GlobalTransactionAutoConfiguration, which is loaded automatically in spring boot projects; the GlobalTransactionScanner can of course also be instantiated in other ways, as the sketch after the following code shows.

@Configuration
@EnableConfigurationProperties({FescarProperties.class})
public class GlobalTransactionAutoConfiguration {
    private final ApplicationContext applicationContext;
    private final FescarProperties fescarProperties;

    public GlobalTransactionAutoConfiguration(ApplicationContext applicationContext, FescarProperties fescarProperties) {
        this.applicationContext = applicationContext;
        this.fescarProperties = fescarProperties;
    }

    /**
     * Instantiate the GlobalTransactionScanner.
     * The scanner is what kicks off client initialization.
     */
    @Bean
    public GlobalTransactionScanner globalTransactionScanner() {
        String applicationName = this.applicationContext.getEnvironment().getProperty("spring.application.name");
        String txServiceGroup = this.fescarProperties.getTxServiceGroup();
        if (StringUtils.isEmpty(txServiceGroup)) {
            txServiceGroup = applicationName + "-fescar-service-group";
            this.fescarProperties.setTxServiceGroup(txServiceGroup);
        }

        return new GlobalTransactionScanner(applicationName, txServiceGroup);
    }
}
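
For projects that skip the auto-configuration, a hand-wired bean is enough. A minimal sketch, with "my-app" and "my_test_tx_group" as placeholder values (package name per fescar-0.4.1):

import com.alibaba.fescar.spring.annotation.GlobalTransactionScanner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ManualFescarConfig {
    // Hypothetical manual wiring; application id and tx service group are placeholders.
    @Bean
    public GlobalTransactionScanner globalTransactionScanner() {
        return new GlobalTransactionScanner("my-app", "my_test_tx_group");
    }
}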

As you can see, there is one supporting configuration class, FescarProperties, used to configure the transaction service group name:

spring.cloud.alibaba.fescar.tx-service-group=my_test_tx_group

If no service group is specified, the name defaults to spring.application.name + "-fescar-service-group", so starting without spring.application.name set will fail with an error.

@ConfigurationProperties("spring.cloud.alibaba.fescar")
public class FescarProperties {
    private String txServiceGroup;

    public FescarProperties() {
    }

    public String getTxServiceGroup() {
        return this.txServiceGroup;
    }

    public void setTxServiceGroup(String txServiceGroup) {
        this.txServiceGroup = txServiceGroup;
    }
}

After obtaining the applicationId and txServiceGroup, a GlobalTransactionScanner object is created; the key method in this class is initClient.

private void initClient() {
    if (StringUtils.isNullOrEmpty(applicationId) || StringUtils.isNullOrEmpty(txServiceGroup)) {
        throw new IllegalArgumentException(
            "applicationId: " + applicationId + ", txServiceGroup: " + txServiceGroup);
    }
    //init TM
    TMClient.init(applicationId, txServiceGroup);

    //init RM
    RMClient.init(applicationId, txServiceGroup);

}

The method initializes both TMClient and RMClient. A single service can act as TM as well as RM; whether it plays the TM or RM role in a given global transaction depends on where the @GlobalTransactional annotation is placed. Creating each client results in a Netty connection to the TC, so in the startup log you can see two Netty channels whose transactionRole is marked TMROLE and RMROLE respectively.

2019-04-09 13:42:57.417 INFO 93715 --- [imeoutChecker_1] c.a.f.c.rpc.netty.NettyPoolableFactory : NettyPool create channel to {"address":"127.0.0.1:8091","message":{"applicationId":"business-service","byteBuffer":{"char":"u0000","direct":false,"double":0.0,"float":0.0,"int":0,"long":0,"readOnly":false,"short":0},"transactionServiceGroup":"my_test_tx_group","typeCode":101,"version":"0.4.1"},"transactionRole":"TMROLE"}
2019-04-09 13:42:57.505 INFO 93715 --- [imeoutChecker_1] c.a.f.c.rpc.netty.NettyPoolableFactory : NettyPool create channel to {"address":"127.0.0.1:8091","message":{"applicationId":"business-service","byteBuffer":{"char":"u0000","direct":false,"double":0.0,"float":0.0,"int":0,"long":0,"readOnly":false,"short":0},"transactionServiceGroup":"my_test_tx_group","typeCode":103,"version":"0.4.1"},"transactionRole":"RMROLE"}
2019-04-09 13:42:57.629 DEBUG 93715 --- [lector_TMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:RegisterTMRequest{applicationId=business-service, transactionServiceGroup=my_test_tx_group}
2019-04-09 13:42:57.629 DEBUG 93715 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:RegisterRMRequest{resourceIds=null, applicationId=business-service, transactionServiceGroup=my_test_tx_group}
2019-04-09 13:42:57.699 DEBUG 93715 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null,messageId:1
2019-04-09 13:42:57.699 DEBUG 93715 --- [lector_TMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null,messageId:2
2019-04-09 13:42:57.701 DEBUG 93715 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : com.alibaba.fescar.core.rpc.netty.RmRpcClient@3b06d101 msgId:1, future :com.alibaba.fescar.core.protocol.MessageFuture@28bb1abd, body:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null
2019-04-09 13:42:57.701 DEBUG 93715 --- [lector_TMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : com.alibaba.fescar.core.rpc.netty.TmRpcClient@65fc3fb7 msgId:2, future :com.alibaba.fescar.core.protocol.MessageFuture@9a1e3df, body:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null
2019-04-09 13:42:57.710 INFO 93715 --- [imeoutChecker_1] c.a.fescar.core.rpc.netty.RmRpcClient : register RM success. server version:0.4.1,channel:[id: 0xe6468995, L:/127.0.0.1:57397 - R:/127.0.0.1:8091]
2019-04-09 13:42:57.710 INFO 93715 --- [imeoutChecker_1] c.a.f.c.rpc.netty.NettyPoolableFactory : register success, cost 114 ms, version:0.4.1,role:TMROLE,channel:[id: 0xd22fe0c5, L:/127.0.0.1:57398 - R:/127.0.0.1:8091]
2019-04-09 13:42:57.711 INFO 93715 --- [imeoutChecker_1] c.a.f.c.rpc.netty.NettyPoolableFactory : register success, cost 125 ms, version:0.4.1,role:RMROLE,channel:[id: 0xe6468995, L:/127.0.0.1:57397 - R:/127.0.0.1:8091]

The log shows:

  1. Netty connections being created
  2. registration requests being sent
  3. responses being received
  4. RmRpcClient and TmRpcClient being instantiated successfully

TM Processing Flow

In this example, the TM role is played by business-service; the purchase method of BusinessService is annotated with @GlobalTransactional:

@Service
public class BusinessService {

    @Autowired
    private StorageFeignClient storageFeignClient;
    @Autowired
    private OrderFeignClient orderFeignClient;

    @GlobalTransactional
    public void purchase(String userId, String commodityCode, int orderCount) {
        storageFeignClient.deduct(commodityCode, orderCount);

        orderFeignClient.create(userId, commodityCode, orderCount);
    }
}

Calling this method creates a global transaction. First, look at what the @GlobalTransactional annotation does: methods carrying it are intercepted and handled in GlobalTransactionalInterceptor.

/**
 * AOP interception of the method call
 */
@Override
public Object invoke(final MethodInvocation methodInvocation) throws Throwable {
    Class<?> targetClass = (methodInvocation.getThis() != null ? AopUtils.getTargetClass(methodInvocation.getThis()) : null);
    Method specificMethod = ClassUtils.getMostSpecificMethod(methodInvocation.getMethod(), targetClass);
    final Method method = BridgeMethodResolver.findBridgedMethod(specificMethod);

    //read the GlobalTransactional annotation on the method
    final GlobalTransactional globalTransactionalAnnotation = getAnnotation(method, GlobalTransactional.class);
    final GlobalLock globalLockAnnotation = getAnnotation(method, GlobalLock.class);

    //if the method carries the GlobalTransactional annotation, intercept and handle it accordingly
    if (globalTransactionalAnnotation != null) {
        return handleGlobalTransaction(methodInvocation, globalTransactionalAnnotation);
    } else if (globalLockAnnotation != null) {
        return handleGlobalLock(methodInvocation);
    } else {
        return methodInvocation.proceed();
    }
}

The handleGlobalTransaction method calls execute on a TransactionalTemplate. As the class name suggests, this is a standard template method that defines the TM's steps for handling a global transaction; the inline comments make the steps fairly clear.

public Object execute(TransactionalExecutor business) throws TransactionalExecutor.ExecutionException {
    // 1. get or create a transaction
    GlobalTransaction tx = GlobalTransactionContext.getCurrentOrCreate();

    try {
        // 2. begin transaction
        try {
            triggerBeforeBegin();
            tx.begin(business.timeout(), business.name());
            triggerAfterBegin();
        } catch (TransactionException txe) {
            throw new TransactionalExecutor.ExecutionException(tx, txe,
                TransactionalExecutor.Code.BeginFailure);
        }
        Object rs = null;
        try {
            // Do Your Business
            rs = business.execute();
        } catch (Throwable ex) {
            // 3. any business exception, rollback.
            try {
                triggerBeforeRollback();
                tx.rollback();
                triggerAfterRollback();
                // 3.1 Successfully rolled back
                throw new TransactionalExecutor.ExecutionException(tx, TransactionalExecutor.Code.RollbackDone, ex);
            } catch (TransactionException txe) {
                // 3.2 Failed to rollback
                throw new TransactionalExecutor.ExecutionException(tx, txe,
                    TransactionalExecutor.Code.RollbackFailure, ex);
            }
        }
        // 4. everything is fine, commit.
        try {
            triggerBeforeCommit();
            tx.commit();
            triggerAfterCommit();
        } catch (TransactionException txe) {
            // 4.1 Failed to commit
            throw new TransactionalExecutor.ExecutionException(tx, txe,
                TransactionalExecutor.Code.CommitFailure);
        }
        return rs;
    } finally {
        //5. clear
        triggerAfterCompletion();
        cleanUp();
    }
}

A global transaction is opened via the begin method of DefaultGlobalTransaction.

public void begin(int timeout, String name) throws TransactionException {
    if (role != GlobalTransactionRole.Launcher) {
        check();
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Ignore Begin(): just involved in global transaction [" + xid + "]");
        }
        return;
    }
    if (xid != null) {
        throw new IllegalStateException();
    }
    if (RootContext.getXID() != null) {
        throw new IllegalStateException();
    }
    //the actual begin: obtain the XID returned by the TC
    xid = transactionManager.begin(null, null, name, timeout);
    status = GlobalStatus.Begin;
    RootContext.bind(xid);
    if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Begin a NEW global transaction [" + xid + "]");
    }
}

The role check at the start of the method, if (role != GlobalTransactionRole.Launcher), plays a key part: it indicates whether the current node is the initiator (Launcher) of the global transaction or a participant (Participant). If a downstream system's method in the distributed transaction is also annotated with @GlobalTransactional, its role is Participant, so it skips the rest of begin and returns immediately. Whether a node is the Launcher or a Participant is determined by whether an XID already exists in the current context: no XID means Launcher, an existing XID means Participant. It follows that only the Launcher can create the global transaction, and that exactly one Launcher exists per distributed transaction.
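
The role decision is made when the transaction object is obtained. The following is a condensed sketch of GlobalTransactionContext.getCurrentOrCreate as of fescar-0.4.1, simplified for illustration; consult the source for the exact code:

// If the context already carries an XID, join as a Participant; otherwise
// create a brand-new transaction that carries the Launcher role.
public static GlobalTransaction getCurrentOrCreate() {
    GlobalTransaction tx = getCurrent();
    if (tx == null) {
        return createNew();
    }
    return tx;
}

private static GlobalTransaction getCurrent() {
    String xid = RootContext.getXID();
    if (xid == null) {
        return null;
    }
    // an XID already exists in the context -> Participant
    return new DefaultGlobalTransaction(xid, GlobalStatus.Begin, GlobalTransactionRole.Participant);
}

private static GlobalTransaction createNew() {
    // no XID in the context -> this node is the Launcher
    return new DefaultGlobalTransaction();
}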

DefaultTransactionManager handles TM-to-TC communication, sending the begin, commit, and rollback commands.

@Override
public String begin(String applicationId, String transactionServiceGroup, String name, int timeout)
    throws TransactionException {
    GlobalBeginRequest request = new GlobalBeginRequest();
    request.setTransactionName(name);
    request.setTimeout(timeout);
    GlobalBeginResponse response = (GlobalBeginResponse)syncCall(request);
    return response.getXid();
}

At this point the XID returned by fescar-server has been obtained, which means a global transaction was created successfully; the log also reflects this flow.

2019-04-09 13:46:57.417 DEBUG 31326 --- [nio-8084-exec-1] c.a.f.c.rpc.netty.AbstractRpcRemoting : offer message: timeout=60000,transactionName=purchase(java.lang.String,java.lang.String,int)
2019-04-09 13:46:57.417 DEBUG 31326 --- [geSend_TMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : write message:FescarMergeMessage timeout=60000,transactionName=purchase(java.lang.String,java.lang.String,int), channel:[id: 0xa148545e, L:/127.0.0.1:56120 - R:/127.0.0.1:8091],active?true,writable?true,isopen?true
2019-04-09 13:46:57.418 DEBUG 31326 --- [lector_TMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:FescarMergeMessage timeout=60000,transactionName=purchase(java.lang.String,java.lang.String,int)
2019-04-09 13:46:57.421 DEBUG 31326 --- [lector_TMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:MergeResultMessage com.alibaba.fescar.core.protocol.transaction.GlobalBeginResponse@2dc480dc,messageId:1196
2019-04-09 13:46:57.421 DEBUG 31326 --- [nio-8084-exec-1] c.a.fescar.core.context.RootContext : bind 192.168.224.93:8091:2008502699
2019-04-09 13:46:57.421 DEBUG 31326 --- [nio-8084-exec-1] c.a.f.tm.api.DefaultGlobalTransaction : Begin a NEW global transaction [192.168.224.93:8091:2008502699]

Once the global transaction is created, business.execute() runs, i.e. the business code storageFeignClient.deduct(commodityCode, orderCount) enters the RM processing flow; here the business logic calls the inventory-deduction endpoint of storage-service.

RM Processing Flow

@GetMapping(path = "/deduct")
public Boolean deduct(String commodityCode, Integer count) {
    storageService.deduct(commodityCode, count);
    return true;
}

@Transactional
public void deduct(String commodityCode, int count) {
    Storage storage = storageDAO.findByCommodityCode(commodityCode);
    storage.setCount(storage.getCount() - count);

    storageDAO.save(storage);
}

Neither the storage endpoint nor the service method contains any fescar-related code or annotations, which demonstrates fescar's non-intrusiveness. So how does this branch join the global transaction? The answer lies in ConnectionProxy, and this is also why DataSourceProxy must be used: only through DataSourceProxy can fescar cut in at the local-transaction commit of the business code to register the branch transaction with the TC and send the RM's processing result.

Because the business code's local-transaction commit is carried out by the ConnectionProxy, committing the local transaction actually executes ConnectionProxy's commit method.

public void commit() throws SQLException {
    //if we are inside a global transaction, perform the global-transaction commit
    //being in a global transaction simply means an XID exists in the current context
    if (context.inGlobalTransaction()) {
        processGlobalTransactionCommit();
    } else if (context.isGlobalLockRequire()) {
        processLocalCommitWithGlobalLocks();
    } else {
        targetConnection.commit();
    }
}

private void processGlobalTransactionCommit() throws SQLException {
    try {
        //first register the RM with the TC and obtain the branchId the TC assigns
        register();
    } catch (TransactionException e) {
        recognizeLockKeyConflictException(e);
    }

    try {
        if (context.hasUndoLog()) {
            //write the undo log
            UndoLogManager.flushUndoLogs(this);
        }

        //commit the local transaction: the undo_log and the business data are written in the same local transaction
        targetConnection.commit();
    } catch (Throwable ex) {
        //notify the TC that the RM's transaction processing failed
        report(false);
        if (ex instanceof SQLException) {
            throw new SQLException(ex);
        }
    }
    //notify the TC that the RM's transaction processing succeeded
    report(true);
    context.reset();
}

private void register() throws TransactionException {
    //register the RM: build the request and send the registration command to the TC over netty
    Long branchId = DefaultResourceManager.get().branchRegister(BranchType.AT, getDataSourceProxy().getResourceId(),
        null, context.getXid(), null, context.buildLockKeys());
    //store the returned branchId in the context
    context.setBranchId(branchId);
}

Let's corroborate the flow above with the logs.

2019-04-09 21:57:48.341 DEBUG 38933 --- [nio-8081-exec-1] o.s.c.a.f.web.FescarHandlerInterceptor : xid in RootContext null xid in RpcContext 192.168.0.2:8091:2008546211
2019-04-09 21:57:48.341 DEBUG 38933 --- [nio-8081-exec-1] c.a.fescar.core.context.RootContext : bind 192.168.0.2:8091:2008546211
2019-04-09 21:57:48.341 DEBUG 38933 --- [nio-8081-exec-1] o.s.c.a.f.web.FescarHandlerInterceptor : bind 192.168.0.2:8091:2008546211 to RootContext
2019-04-09 21:57:48.386 INFO 38933 --- [nio-8081-exec-1] o.h.h.i.QueryTranslatorFactoryInitiator : HHH000397: Using ASTQueryTranslatorFactory
Hibernate: select storage0_.id as id1_0_, storage0_.commodity_code as commodit2_0_, storage0_.count as count3_0_ from storage_tbl storage0_ where storage0_.commodity_code=?
Hibernate: update storage_tbl set count=? where id=?
2019-04-09 21:57:48.673 INFO 38933 --- [nio-8081-exec-1] c.a.fescar.core.rpc.netty.RmRpcClient : will connect to 192.168.0.2:8091
2019-04-09 21:57:48.673 INFO 38933 --- [nio-8081-exec-1] c.a.fescar.core.rpc.netty.RmRpcClient : RM will register :jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false
2019-04-09 21:57:48.673 INFO 38933 --- [nio-8081-exec-1] c.a.f.c.rpc.netty.NettyPoolableFactory : NettyPool create channel to {"address":"192.168.0.2:8091","message":{"applicationId":"storage-service","byteBuffer":{"char":"u0000","direct":false,"double":0.0,"float":0.0,"int":0,"long":0,"readOnly":false,"short":0},"resourceIds":"jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false","transactionServiceGroup":"hello-service-fescar-service-group","typeCode":103,"version":"0.4.0"},"transactionRole":"RMROLE"}
2019-04-09 21:57:48.677 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:RegisterRMRequest{resourceIds=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false, applicationId=storage-service, transactionServiceGroup=hello-service-fescar-service-group}
2019-04-09 21:57:48.680 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null,messageId:9
2019-04-09 21:57:48.680 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : com.alibaba.fescar.core.rpc.netty.RmRpcClient@7d61f5d4 msgId:9, future :com.alibaba.fescar.core.protocol.MessageFuture@186cd3e0, body:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null
2019-04-09 21:57:48.680 INFO 38933 --- [nio-8081-exec-1] c.a.fescar.core.rpc.netty.RmRpcClient : register RM success. server version:0.4.1,channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091]
2019-04-09 21:57:48.680 INFO 38933 --- [nio-8081-exec-1] c.a.f.c.rpc.netty.NettyPoolableFactory : register success, cost 3 ms, version:0.4.1,role:RMROLE,channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091]
2019-04-09 21:57:48.680 DEBUG 38933 --- [nio-8081-exec-1] c.a.f.c.rpc.netty.AbstractRpcRemoting : offer message: transactionId=2008546211,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,lockKey=storage_tbl:1
2019-04-09 21:57:48.681 DEBUG 38933 --- [geSend_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : write message:FescarMergeMessage transactionId=2008546211,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,lockKey=storage_tbl:1, channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091],active?true,writable?true,isopen?true
2019-04-09 21:57:48.681 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:FescarMergeMessage transactionId=2008546211,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,lockKey=storage_tbl:1
2019-04-09 21:57:48.687 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:MergeResultMessage BranchRegisterResponse: transactionId=2008546211,branchId=2008546212,result code =Success,getMsg =null,messageId:11
2019-04-09 21:57:48.702 DEBUG 38933 --- [nio-8081-exec-1] c.a.f.rm.datasource.undo.UndoLogManager : Flushing UNDO LOG: {"branchId":2008546212,"sqlUndoLogs":[{"afterImage":{"rows":[{"fields":[{"keyType":"PrimaryKey","name":"id","type":4,"value":1},{"keyType":"NULL","name":"count","type":4,"value":993}]}],"tableName":"storage_tbl"},"beforeImage":{"rows":[{"fields":[{"keyType":"PrimaryKey","name":"id","type":4,"value":1},{"keyType":"NULL","name":"count","type":4,"value":994}]}],"tableName":"storage_tbl"},"sqlType":"UPDATE","tableName":"storage_tbl"}],"xid":"192.168.0.2:8091:2008546211"}
2019-04-09 21:57:48.755 DEBUG 38933 --- [nio-8081-exec-1] c.a.f.c.rpc.netty.AbstractRpcRemoting : offer message: transactionId=2008546211,branchId=2008546212,resourceId=null,status=PhaseOne_Done,applicationData=null
2019-04-09 21:57:48.755 DEBUG 38933 --- [geSend_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : write message:FescarMergeMessage transactionId=2008546211,branchId=2008546212,resourceId=null,status=PhaseOne_Done,applicationData=null, channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091],active?true,writable?true,isopen?true
2019-04-09 21:57:48.756 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:FescarMergeMessage transactionId=2008546211,branchId=2008546212,resourceId=null,status=PhaseOne_Done,applicationData=null
2019-04-09 21:57:48.758 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:MergeResultMessage com.alibaba.fescar.core.protocol.transaction.BranchReportResponse@582a08cf,messageId:13
2019-04-09 21:57:48.799 DEBUG 38933 --- [nio-8081-exec-1] c.a.fescar.core.context.RootContext : unbind 192.168.0.2:8091:2008546211
2019-04-09 21:57:48.799 DEBUG 38933 --- [nio-8081-exec-1] o.s.c.a.f.web.FescarHandlerInterceptor : unbind 192.168.0.2:8091:2008546211 from RootContext

  1. obtain the XID passed in from business-service
  2. bind the XID to the current context
  3. execute the business SQL
  4. create this RM's Netty connection to the TC
  5. send the branch transaction's details to the TC
  6. receive the branchId returned by the TC
  7. record the undo log data
  8. send the TC this branch's PhaseOne result
  9. unbind the XID from the current context

Steps 1 and 9 are performed in FescarHandlerInterceptor, which is not part of fescar itself but of the spring-cloud-alibaba-fescar component mentioned earlier; it binds and unbinds the xid to and from the current request context for feign- and rest-based communication. At this point the RM has finished its PhaseOne work; next, look at the PhaseTwo handling logic.
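
The receiving side can be pictured with a condensed sketch of such an interceptor, assuming a standard Spring HandlerInterceptor that reads the XID header and binds it around the request. This illustrates the idea rather than reproducing the component's verbatim code; the header name is the same assumption as in the earlier Feign sketch:

import com.alibaba.fescar.core.context.RootContext;
import org.springframework.web.servlet.handler.HandlerInterceptorAdapter;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: bind the incoming XID before the controller runs (step 2 above),
// unbind it once the request completes (step 9 above).
public class XidBindInterceptor extends HandlerInterceptorAdapter {
    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
        String rpcXid = request.getHeader("Fescar-Xid"); // header name is an assumption
        if (RootContext.getXID() == null && rpcXid != null) {
            RootContext.bind(rpcXid);
        }
        return true;
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) {
        if (RootContext.getXID() != null) {
            RootContext.unbind();
        }
    }
}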

Transaction Commit

After all branch transactions have finished, the TC aggregates the results reported by the RMs and sends each RM a commit or rollback command.

2019-04-09 21:57:49.813 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Receive:xid=192.168.0.2:8091:2008546211,branchId=2008546212,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,applicationData=null,messageId:1
2019-04-09 21:57:49.813 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting : com.alibaba.fescar.core.rpc.netty.RmRpcClient@7d61f5d4 msgId:1, body:xid=192.168.0.2:8091:2008546211,branchId=2008546212,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,applicationData=null
2019-04-09 21:57:49.814 INFO 38933 --- [atch_RMROLE_1_8] c.a.f.core.rpc.netty.RmMessageListener : onMessage:xid=192.168.0.2:8091:2008546211,branchId=2008546212,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,applicationData=null
2019-04-09 21:57:49.816 INFO 38933 --- [atch_RMROLE_1_8] com.alibaba.fescar.rm.AbstractRMHandler : Branch committing: 192.168.0.2:8091:2008546211 2008546212 jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false null
2019-04-09 21:57:49.816 INFO 38933 --- [atch_RMROLE_1_8] com.alibaba.fescar.rm.AbstractRMHandler : Branch commit result: PhaseTwo_Committed
2019-04-09 21:57:49.817 INFO 38933 --- [atch_RMROLE_1_8] c.a.fescar.core.rpc.netty.RmRpcClient : RmRpcClient sendResponse branchStatus=PhaseTwo_Committed,result code =Success,getMsg =null
2019-04-09 21:57:49.817 DEBUG 38933 --- [atch_RMROLE_1_8] c.a.f.c.rpc.netty.AbstractRpcRemoting : send response:branchStatus=PhaseTwo_Committed,result code =Success,getMsg =null,channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091]
2019-04-09 21:57:49.817 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler : Send:branchStatus=PhaseTwo_Committed,result code =Success,getMsg =null

From the logs we can see:

  1. the RM receives the commit notification for XID=192.168.0.2:8091:2008546211, branchId=2008546212;
  2. it executes the commit action;
  3. it sends the commit result back to the TC, with branchStatus PhaseTwo_Committed.

Look at the phase-two commit in detail, in the doBranchCommit method of the AbstractRMHandler class:

/**
 * Take the xid, branchId and other key parameters from the notification,
 * then invoke the RM's branchCommit
 */
protected void doBranchCommit(BranchCommitRequest request, BranchCommitResponse response) throws TransactionException {
    String xid = request.getXid();
    long branchId = request.getBranchId();
    String resourceId = request.getResourceId();
    String applicationData = request.getApplicationData();
    LOGGER.info("Branch committing: " + xid + " " + branchId + " " + resourceId + " " + applicationData);
    BranchStatus status = getResourceManager().branchCommit(request.getBranchType(), xid, branchId, resourceId, applicationData);
    response.setBranchStatus(status);
    LOGGER.info("Branch commit result: " + status);
}

The branchCommit request eventually reaches the branchCommit method of AsyncWorker. The way AsyncWorker handles it is a key part of the fescar architecture: since most transactions commit normally, the work is essentially done at PhaseOne, which releases locks as early as possible; after receiving the commit command in PhaseTwo, the remaining work can be processed asynchronously, keeping PhaseTwo's cost out of the distributed transaction's latency.

private static final List<Phase2Context> ASYNC_COMMIT_BUFFER = Collections.synchronizedList(new ArrayList<Phase2Context>());

/**
 * Add the XID that needs committing to the list
 */
@Override
public BranchStatus branchCommit(BranchType branchType, String xid, long branchId, String resourceId, String applicationData) throws TransactionException {
    if (ASYNC_COMMIT_BUFFER.size() < ASYNC_COMMIT_BUFFER_LIMIT) {
        ASYNC_COMMIT_BUFFER.add(new Phase2Context(branchType, xid, branchId, resourceId, applicationData));
    } else {
        LOGGER.warn("Async commit buffer is FULL. Rejected branch [" + branchId + "/" + xid + "] will be handled by housekeeping later.");
    }
    return BranchStatus.PhaseTwo_Committed;
}

/**
 * Consume the XIDs in the list via a scheduled task
 */
public synchronized void init() {
    LOGGER.info("Async Commit Buffer Limit: " + ASYNC_COMMIT_BUFFER_LIMIT);
    timerExecutor = new ScheduledThreadPoolExecutor(1,
        new NamedThreadFactory("AsyncWorker", 1, true));
    timerExecutor.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            try {
                doBranchCommits();
            } catch (Throwable e) {
                LOGGER.info("Failed at async committing ... " + e.getMessage());
            }
        }
    }, 10, 1000 * 1, TimeUnit.MILLISECONDS);
}

private void doBranchCommits() {
    if (ASYNC_COMMIT_BUFFER.size() == 0) {
        return;
    }
    Map<String, List<Phase2Context>> mappedContexts = new HashMap<>();
    Iterator<Phase2Context> iterator = ASYNC_COMMIT_BUFFER.iterator();

    //each scheduled run drains all pending entries from ASYNC_COMMIT_BUFFER
    //the entries to commit are grouped by resourceId, which is a database connection url
    //as seen in the earlier logs; the grouping covers applications that create multiple data sources
    while (iterator.hasNext()) {
        Phase2Context commitContext = iterator.next();
        List<Phase2Context> contextsGroupedByResourceId = mappedContexts.get(commitContext.resourceId);
        if (contextsGroupedByResourceId == null) {
            contextsGroupedByResourceId = new ArrayList<>();
            mappedContexts.put(commitContext.resourceId, contextsGroupedByResourceId);
        }
        contextsGroupedByResourceId.add(commitContext);

        iterator.remove();

    }

    for (Map.Entry<String, List<Phase2Context>> entry : mappedContexts.entrySet()) {
        Connection conn = null;
        try {
            try {
                //look up the data source and a connection by resourceId
                DataSourceProxy dataSourceProxy = DataSourceManager.get().get(entry.getKey());
                conn = dataSourceProxy.getPlainConnection();
            } catch (SQLException sqle) {
                LOGGER.warn("Failed to get connection for async committing on " + entry.getKey(), sqle);
                continue;
            }
            List<Phase2Context> contextsGroupedByResourceId = entry.getValue();
            for (Phase2Context commitContext : contextsGroupedByResourceId) {
                try {
                    //process the undo log, i.e. delete the records for this xid and branchId
                    UndoLogManager.deleteUndoLog(commitContext.xid, commitContext.branchId, conn);
                } catch (Exception ex) {
                    LOGGER.warn(
                        "Failed to delete undo log [" + commitContext.branchId + "/" + commitContext.xid + "]", ex);
                }
            }

        } finally {
            if (conn != null) {
                try {
                    conn.close();
                } catch (SQLException closeEx) {
                    LOGGER.warn("Failed to close JDBC resource while deleting undo_log ", closeEx);
                }
            }
        }
    }
}

So to handle the commit action, the RM only needs to delete the undo_log records for the given xid and branchId.

Transaction Rollback

A rollback is triggered in two situations:

  1. a branch transaction fails, i.e. the report(false) case in ConnectionProxy;
  2. the TM catches an exception thrown up from a downstream system, i.e. an exception caught in the @GlobalTransactional-annotated method that initiated the global transaction. In the execute template method of the TransactionalTemplate class shown earlier, the call to business.execute() is wrapped in a catch; on catching, rollback is called and the TM notifies the TC that the transaction with that XID must be rolled back.

public void rollback() throws TransactionException {
    //only the Launcher can initiate this rollback
    if (role == GlobalTransactionRole.Participant) {
        // Participant has no responsibility of committing
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Ignore Rollback(): just involved in global transaction [" + xid + "]");
        }
        return;
    }
    if (xid == null) {
        throw new IllegalStateException();
    }

    status = transactionManager.rollback(xid);
    if (RootContext.getXID() != null) {
        if (xid.equals(RootContext.getXID())) {
            RootContext.unbind();
        }
    }
}

After aggregating, the TC sends the rollback command to the participants; the RM receives this rollback notification in the doBranchRollback method of the AbstractRMHandler class.

protected void doBranchRollback(BranchRollbackRequest request, BranchRollbackResponse response) throws TransactionException {
    String xid = request.getXid();
    long branchId = request.getBranchId();
    String resourceId = request.getResourceId();
    String applicationData = request.getApplicationData();
    LOGGER.info("Branch rolling back: " + xid + " " + branchId + " " + resourceId);
    BranchStatus status = getResourceManager().branchRollback(request.getBranchType(), xid, branchId, resourceId, applicationData);
    response.setBranchStatus(status);
    LOGGER.info("Branch rollback result: " + status);
}

The rollback request is then passed on to the branchRollback method of the DataSourceManager class.

public BranchStatus branchRollback(BranchType branchType, String xid, long branchId, String resourceId, String applicationData) throws TransactionException {
    //look up the corresponding data source by resourceId
    DataSourceProxy dataSourceProxy = get(resourceId);
    if (dataSourceProxy == null) {
        throw new ShouldNeverHappenException();
    }
    try {
        UndoLogManager.undo(dataSourceProxy, xid, branchId);
    } catch (TransactionException te) {
        if (te.getCode() == TransactionExceptionCode.BranchRollbackFailed_Unretriable) {
            return BranchStatus.PhaseTwo_RollbackFailed_Unretryable;
        } else {
            return BranchStatus.PhaseTwo_RollbackFailed_Retryable;
        }
    }
    return BranchStatus.PhaseTwo_Rollbacked;
}

Ultimately the undo method of the UndoLogManager class is executed. Because it is pure jdbc code and fairly long, it is not included here; you can read the source on github. The concrete undo flow is as follows (a condensed sketch follows the list):

  1. look up the undo_log committed in PhaseOne by xid and branchId;
  2. if found, generate the replay sql from the data recorded in the undo_log and execute it, i.e. restore the data modified in PhaseOne;
  3. after step 2, delete that undo_log record;
  4. if step 1 found no matching undo_log, insert an undo_log record with status GlobalFinished. The likely reason for a miss is that PhaseOne's local transaction failed and never wrote its record. Because (xid, branchId) is a unique index, the insert in step 4 prevents a recovered PhaseOne from successfully writing its record later: PhaseOne would then fail, the business data would not be committed, and the net effect is that the data ends up rolled back.
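
Since the article skips the jdbc-heavy body, here is a condensed sketch of the undo control flow described above, with the actual statements summarized as comments; this is a simplification of UndoLogManager.undo in fescar-0.4.1, not the verbatim source:

// Sketch of the undo flow; each numbered comment matches the list above.
public static void undo(DataSourceProxy dataSourceProxy, String xid, long branchId) throws TransactionException {
    Connection conn = null;
    try {
        conn = dataSourceProxy.getPlainConnection();
        conn.setAutoCommit(false);
        // 1. SELECT ... FROM undo_log WHERE branch_id = ? AND xid = ? FOR UPDATE
        //    (lock the record so a concurrent PhaseOne write cannot race with us)
        // 2. if found: deserialize rollback_info, rebuild the before-image SQL
        //    and execute it against the business table
        // 3. DELETE FROM undo_log WHERE branch_id = ? AND xid = ?
        // 4. if not found: INSERT a GlobalFinished undo_log row; the unique key
        //    (xid, branch_id) then blocks any late PhaseOne write
        conn.commit();
    } catch (Throwable e) {
        // roll back the local connection and rethrow as TransactionException
    } finally {
        // close the ResultSet/Statement/Connection resources
    }
}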

Summary

This article used a distributed business scenario to analyze the main processing flow on the fescar client side and walked through the key source code for the TM and RM roles; hopefully it helps you understand how fescar works.

With fescar's rapid iteration and the continued refinement of its roadmap, fescar can, in time, be expected to become the benchmark open-source solution for distributed transactions.

Author: 中間件小哥
