TY - JOUR
T1 - Reinforcement Learning for NOMA-ALOHA under Fading
AU - Ko, Youngwook
AU - Choi, Jinho
PY - 2022/8/11
Y1 - 2022/8/11
N2 - We consider non-orthogonal multiple access (NOMA) in a random-access ALOHA system, in which each user randomly accesses one of several time slots and sends uplink packets based on power differences. In the context of an asymmetric game, we propose a NOMA-ALOHA system based on multi-agent reinforcement learning tools that can help each user find its best strategy for improving the rate of successful action choices. Taking into account not only collisions but also fading, we analyze the mean rewards of actions under general settings and focus on the case of two different groups of users. To characterize the behavior of access strategies, we apply multi-agent action-value methods that consider either greedy or non-greedy actions, combined with accelerated gradient descent. Our results show that in the proposed system, users employing the greedy action-based methods can be randomly divided into two groups and increase their rates of successful action choices. Interestingly, when channel resources are relatively limited, such greedy methods leave many users in a barred-access state. In this case, the proposed accelerated, non-greedy action methods are shown to reduce this unfairness, at the cost of lower successful action rates.
AB - We consider non-orthogonal multiple access (NOMA) in a random-access ALOHA system, in which each user randomly accesses one of several time slots and sends uplink packets based on power differences. In the context of an asymmetric game, we propose a NOMA-ALOHA system based on multi-agent reinforcement learning tools that can help each user find its best strategy for improving the rate of successful action choices. Taking into account not only collisions but also fading, we analyze the mean rewards of actions under general settings and focus on the case of two different groups of users. To characterize the behavior of access strategies, we apply multi-agent action-value methods that consider either greedy or non-greedy actions, combined with accelerated gradient descent. Our results show that in the proposed system, users employing the greedy action-based methods can be randomly divided into two groups and increase their rates of successful action choices. Interestingly, when channel resources are relatively limited, such greedy methods leave many users in a barred-access state. In this case, the proposed accelerated, non-greedy action methods are shown to reduce this unfairness, at the cost of lower successful action rates.
U2 - 10.1109/TCOMM.2022.3198125
DO - 10.1109/TCOMM.2022.3198125
M3 - Article
SN - 0090-6778
JO - IEEE Transactions on Communications
JF - IEEE Transactions on Communications
ER -