YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition

Introduction

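This paper presents YOWOv3, a new framework for human action detection and recognition that improves on YOWOv2. The framework is designed to make extensive experimentation with different configurations practical and to support easy customization of individual model components, which reduces the effort needed to understand and modify the code. On UCF101-24 and AVAv2.2, two widely used datasets for human action detection and recognition, YOWOv3 outperforms YOWOv2. Specifically, the predecessor YOWOv2 achieves 85.2% and 20.3% mAP on UCF101-24 and AVAv2.2, respectively, with 109.7M parameters and 53.6 GFLOPs. By contrast, YOWOv3 achieves 88.33% and 20.31% mAP on UCF101-24 and AVAv2.2, respectively, with only 59.8M parameters and 39.8 GFLOPs. These results show that YOWOv3 substantially reduces the parameter count and GFLOPs while still achieving comparable performance.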
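
To put the size of the reduction in concrete terms, the short sketch below recomputes the relative savings from the figures quoted in the abstract. It is only illustrative arithmetic: the dictionary layout and the helper names `relative_reduction` and `summarize` are assumptions made for this example and do not come from the YOWOv3 codebase.

```python
# Figures as quoted in the abstract (YOWOv2 vs. YOWOv3).
# The dictionary layout is an assumption made for this example.
REPORTED = {
    "YOWOv2": {"params_m": 109.7, "gflops": 53.6, "map_ucf": 85.2, "map_ava": 20.3},
    "YOWOv3": {"params_m": 59.8, "gflops": 39.8, "map_ucf": 88.33, "map_ava": 20.31},
}


def relative_reduction(old: float, new: float) -> float:
    """Return the fraction by which `new` is smaller than `old`."""
    return (old - new) / old


def summarize(reported: dict = REPORTED) -> dict:
    """Compute cost reductions (percent) and mAP gains (percentage points)."""
    old, new = reported["YOWOv2"], reported["YOWOv3"]
    return {
        "params_reduction_pct": 100 * relative_reduction(old["params_m"], new["params_m"]),
        "gflops_reduction_pct": 100 * relative_reduction(old["gflops"], new["gflops"]),
        "map_ucf_gain": new["map_ucf"] - old["map_ucf"],
        "map_ava_gain": new["map_ava"] - old["map_ava"],
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

On these numbers, YOWOv3 uses roughly 45% fewer parameters and roughly 26% fewer GFLOPs than YOWOv2. The mAP gain is about 3.1 points on UCF101-24 and essentially zero on AVAv2.2, which matches the abstract's wording of "comparable performance" on the latter.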
Problem addressed

YOWOv3: An Improved Framework for Human Action Detection and Recognition
Key idea

YOWOv3 is a framework for human action detection and recognition that matches or exceeds the mAP of its predecessor, YOWOv2, while significantly reducing the number of parameters and GFLOPs.
Other highlights

YOWOv3 is designed to facilitate extensive experimentation with different configurations and supports easy customization of the model's components. It achieves 88.33% mAP on UCF101-24 and 20.31% mAP on AVAv2.2 with only 59.8M parameters and 39.8 GFLOPs. Compared with YOWOv2 on these two widely used datasets, it is clearly better on UCF101-24 (88.33% vs. 85.2%) and on par on AVAv2.2 (20.31% vs. 20.3%).
Related work

Related works in this field include 'Real-time Action Recognition with Enhanced Motion Vector CNNs' and 'Two-Stream Convolutional Networks for Action Recognition in Videos'.
