How do I perform asynchronous model training with the TFF framework?
I have reviewed the iterative training process loop, but I don't see how to tell which clients' model updates have been received.
It is quite possible to simulate something like "asynchronous FL" in TFF. One way to think about this is to conceptually decouple the simulation time from the wall-clock time.
Sampling a different number of clients each round (rather than the uniform K clients that is usually done), perhaps drawn from some distribution that weights clients by their expected training time, can simulate asynchronous FL. It is also possible to process only a portion of the selected clients first and apply the rest later; you are free to split the data/computation however you need.
The Python-esque pseudocode below demonstrates both techniques: non-uniform client sampling and delayed (pseudo-)gradient application:
from datetime import timedelta

state = fed_avg_iter_proc.initialize()
for round_num in range(NUM_ROUNDS):
  # Here we conceptualize a "round" as a block of time, rather than a synchronous
  # round. We have a function that determines which clients will "finish" within
  # our configured block of time. This might even return only a single client.
  participants = get_next_clients(time_window=timedelta(minutes=30))
  num_participants = len(participants)
  # Here we only process the first half, and then update the global model.
  state2, metrics = fed_avg_iter_proc.next(state, participants[:num_participants // 2])
  # Now process the second half of the selected clients.
  # Note: this now applies the 'pseudo-gradient' that was computed on the clients
  # (the difference between the original `state` and their local training result)
  # to a model that has already taken one step (`state2`). This possibly has
  # undesirable effects on the optimisation process, or may be improved with
  # techniques that handle "stale" gradients.
  state3, metrics = fed_avg_iter_proc.next(state2, participants[num_participants // 2:])
  # Finally, update the state for the next iteration of the simulation loop.
  state = state3
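The helper `get_next_clients` above is not a TFF API; it stands in for whatever logic decides which clients "finish" inside the simulated block of time. Below is a minimal sketch of one possible implementation, assuming you have the client datasets and a per-client expected training time at hand. The extra parameters and the exponential duration model are assumptions for illustration only; the pseudocode above omits them for brevity.

import random
from datetime import timedelta

def get_next_clients(client_datasets, expected_minutes, time_window):
  """Returns the client datasets whose simulated training time fits in `time_window`.

  Args:
    client_datasets: dict mapping client id to that client's dataset.
    expected_minutes: dict mapping client id to expected training time in minutes.
    time_window: a `datetime.timedelta` giving the length of the simulated block.
  """
  finished = []
  for client_id, dataset in client_datasets.items():
    # Draw a simulated duration around this client's expected training time;
    # an exponential distribution is just one plausible choice.
    minutes = random.expovariate(1.0 / expected_minutes[client_id])
    if timedelta(minutes=minutes) <= time_window:
      finished.append(dataset)
  # Clients that would take longer than the block simply do not show up this
  # "round", which is what makes the driving loop behave asynchronously.
  random.shuffle(finished)
  return finished

A call would then look like `get_next_clients(train_data, expected_minutes, timedelta(minutes=30))`, where `train_data` maps client ids to their local datasets.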