The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
We propose a framework that integrates heterogeneous networks for human pose estimation (HPE), with the aim of balancing accuracy and computational complexity. Many existing methods improve HPE accuracy by using multiple frames in a video, but they also increase computational complexity. The key difference is that the proposed heterogeneous framework uses different networks for different types of frames, whereas existing methods apply the same networks to all frames. Specifically, we divide video frames into two types, key frames and non-key frames, and adopt three kinds of networks in our heterogeneous framework: slow networks, fast networks, and transfer networks. For key frames, we use a slow network, which has high accuracy but high computational complexity. For non-key frames that follow a key frame, we warp the slow network's heatmap from the key frame via a transfer network and fuse it with the output of a fast network, which has low accuracy but also low computational complexity. Furthermore, when extending to long-term usage, where a large number of non-key frames follow a key frame, the temporal correlation decreases. Therefore, when necessary, we use an additional transfer network that warps the heatmap from a neighboring non-key frame. Experimental results on the PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed FSPose achieves a better balance between accuracy and computational complexity than the competing method. Our source code is available at https://github.com/Fenax79/fspose.
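The frame scheduling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). The names `estimate_poses`, `fuse`, `key_interval`, and `neighbor_from` are hypothetical, key frames are assumed to occur at a fixed interval, and fusion is assumed to be a plain element-wise mean. The abstract specifies neither choice. The three networks are passed in as callables so that any stand-in can be used.

```python
from typing import Callable, List, Optional, Sequence

Frame = object          # stand-in for an image; any value works in this sketch
Heatmap = List[float]   # stand-in for a flattened joint heatmap


def fuse(heatmaps: Sequence[Heatmap]) -> Heatmap:
    """Element-wise mean of heatmaps (assumed fusion rule, not the paper's)."""
    n = len(heatmaps)
    return [sum(values) / n for values in zip(*heatmaps)]


def estimate_poses(
    frames: Sequence[Frame],
    slow_net: Callable[[Frame], Heatmap],
    fast_net: Callable[[Frame], Heatmap],
    transfer_net: Callable[[Heatmap, Frame, Frame], Heatmap],
    key_interval: int = 4,                # assumption: every 4th frame is a key frame
    neighbor_from: Optional[int] = None,  # assumption: offset at which the extra warp starts
) -> List[Heatmap]:
    """Run the slow network on key frames and a fused fast path on the rest."""
    if key_interval < 1:
        raise ValueError("key_interval must be at least 1")

    outputs: List[Heatmap] = []
    key_frame = key_heat = prev_frame = prev_heat = None

    for t, frame in enumerate(frames):
        offset = t % key_interval
        if offset == 0:
            # Key frame: accurate but expensive slow network.
            heat = slow_net(frame)
            key_frame, key_heat = frame, heat
        else:
            # Non-key frame: cheap fast network plus the key-frame heatmap
            # warped to the current frame by the transfer network.
            candidates = [
                fast_net(frame),
                transfer_net(key_heat, key_frame, frame),
            ]
            # Far from the key frame, also warp the neighboring frame's heatmap.
            if neighbor_from is not None and offset >= neighbor_from:
                candidates.append(transfer_net(prev_heat, prev_frame, frame))
            heat = fuse(candidates)
        prev_frame, prev_heat = frame, heat
        outputs.append(heat)

    return outputs
```

In this sketch the slow network runs only once per `key_interval` frames, which is where the intended saving in computation comes from. The optional neighbor warp corresponds to the long-term case in which temporal correlation with the key frame has weakened.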
Jianfeng XU
KDDI Research, Inc.
Satoshi KOMORITA
KDDI Research, Inc.
Kei KAWAMURA
KDDI Research, Inc.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Jianfeng XU, Satoshi KOMORITA, Kei KAWAMURA, "FSPose: A Heterogeneous Framework with Fast and Slow Networks for Human Pose Estimation in Videos" in IEICE TRANSACTIONS on Information, vol. E106-D, no. 6, pp. 1165-1174, June 2023, doi: 10.1587/transinf.2022EDP7182.
Abstract: We propose a framework for the integration of heterogeneous networks in human pose estimation (HPE) with the aim of balancing accuracy and computational complexity. Although many existing methods can improve the accuracy of HPE using multiple frames in videos, they also increase the computational complexity. The key difference here is that the proposed heterogeneous framework has various networks for different types of frames, while existing methods use the same networks for all frames. In particular, we propose to divide the video frames into two types, including key frames and non-key frames, and adopt three networks including slow networks, fast networks, and transfer networks in our heterogeneous framework. For key frames, a slow network is used that has high accuracy but high computational complexity. For non-key frames that follow a key frame, we propose to warp the heatmap of a slow network from a key frame via a transfer network and fuse it with a fast network that has low accuracy but low computational complexity. Furthermore, when extending to the usage of long-term frames where a large number of non-key frames follow a key frame, the temporal correlation decreases. Therefore, when necessary, we use an additional transfer network that warps the heatmap from a neighboring non-key frame. The experimental results on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed FSPose achieves a better balance between accuracy and computational complexity than the competitor method. Our source code is available at https://github.com/Fenax79/fspose.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDP7182/_p
@ARTICLE{e106-d_6_1165,
author={Jianfeng XU and Satoshi KOMORITA and Kei KAWAMURA},
journal={IEICE TRANSACTIONS on Information},
title={FSPose: A Heterogeneous Framework with Fast and Slow Networks for Human Pose Estimation in Videos},
year={2023},
volume={E106-D},
number={6},
pages={1165-1174},
abstract={We propose a framework for the integration of heterogeneous networks in human pose estimation (HPE) with the aim of balancing accuracy and computational complexity. Although many existing methods can improve the accuracy of HPE using multiple frames in videos, they also increase the computational complexity. The key difference here is that the proposed heterogeneous framework has various networks for different types of frames, while existing methods use the same networks for all frames. In particular, we propose to divide the video frames into two types, including key frames and non-key frames, and adopt three networks including slow networks, fast networks, and transfer networks in our heterogeneous framework. For key frames, a slow network is used that has high accuracy but high computational complexity. For non-key frames that follow a key frame, we propose to warp the heatmap of a slow network from a key frame via a transfer network and fuse it with a fast network that has low accuracy but low computational complexity. Furthermore, when extending to the usage of long-term frames where a large number of non-key frames follow a key frame, the temporal correlation decreases. Therefore, when necessary, we use an additional transfer network that warps the heatmap from a neighboring non-key frame. The experimental results on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed FSPose achieves a better balance between accuracy and computational complexity than the competitor method. Our source code is available at https://github.com/Fenax79/fspose.},
keywords={},
doi={10.1587/transinf.2022EDP7182},
ISSN={1745-1361},
month={June},}
TY - JOUR
TI - FSPose: A Heterogeneous Framework with Fast and Slow Networks for Human Pose Estimation in Videos
T2 - IEICE TRANSACTIONS on Information
SP - 1165
EP - 1174
AU - Jianfeng XU
AU - Satoshi KOMORITA
AU - Kei KAWAMURA
PY - 2023
DO - 10.1587/transinf.2022EDP7182
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2023
AB - We propose a framework for the integration of heterogeneous networks in human pose estimation (HPE) with the aim of balancing accuracy and computational complexity. Although many existing methods can improve the accuracy of HPE using multiple frames in videos, they also increase the computational complexity. The key difference here is that the proposed heterogeneous framework has various networks for different types of frames, while existing methods use the same networks for all frames. In particular, we propose to divide the video frames into two types, including key frames and non-key frames, and adopt three networks including slow networks, fast networks, and transfer networks in our heterogeneous framework. For key frames, a slow network is used that has high accuracy but high computational complexity. For non-key frames that follow a key frame, we propose to warp the heatmap of a slow network from a key frame via a transfer network and fuse it with a fast network that has low accuracy but low computational complexity. Furthermore, when extending to the usage of long-term frames where a large number of non-key frames follow a key frame, the temporal correlation decreases. Therefore, when necessary, we use an additional transfer network that warps the heatmap from a neighboring non-key frame. The experimental results on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed FSPose achieves a better balance between accuracy and computational complexity than the competitor method. Our source code is available at https://github.com/Fenax79/fspose.
ER -