Effective and Efficient Hand Detection in Egocentric Videos

Visée, Ryan (1, 2); Zariffa, José (1, 2)

1. Institute of Biomaterials and Biomedical Engineering, University of Toronto; 2. Toronto Rehabilitation Institute, University Health Network

Spinal cord injury (SCI) significantly reduces the quality of life of affected individuals. Persons with cervical SCI report that regaining upper limb function would significantly improve their quality of life. As a result, new treatments to improve hand function after SCI are needed. To accurately measure the impact of these interventions on patient function and independence, evaluation needs to occur at home rather than in clinical settings. Although there are currently no tools to evaluate hand function at home, videos from wearable cameras (egocentric videos) can be used to monitor patient activities and can be analyzed using computer vision techniques. Hand detection is an essential step prior to further analysis of hand function. We propose the use of convolutional neural networks (CNNs) in combination with a hand tracking algorithm to create a system for fast and reliable hand detection in egocentric videos.


In previous experiments, a GoPro Hero 4 wearable camera was used to collect data from 17 individuals with SCI performing activities of daily living in a home simulation laboratory at the Toronto Rehabilitation Institute. We then used these videos to create a large dataset for hand detection by manually labelling bounding boxes around the hands in each frame. We will use this data to evaluate traditional CNN-based detection algorithms and compare their results with those of hand tracking algorithms to determine the superior approach. Finally, we may combine hand detection and tracking techniques to produce the complete hand detection algorithm. The resulting system will be tested against previous approaches to hand detection in egocentric videos.


Over 160,000 frames were annotated to develop a large egocentric hand dataset. We then randomly chose 10 videos, comprising over 18,000 frames, to compare current online tracking algorithms. The Median Flow and Kernelized Correlation Filter trackers achieved average F1-scores of 58.24 ± 0.29% and 40.00 ± 0.28%, respectively. This data will then be used to compare the current state-of-the-art CNN-based detection system with previous results.
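As an illustration of how an F1-score can be computed for bounding-box output, the following is a minimal sketch rather than the evaluation code used in this work. It assumes boxes are given as (x, y, width, height) and that a predicted box counts as correct when its intersection over union (IoU) with an unmatched annotated box is at least 0.5; the abstract does not state the matching criterion, so both the threshold and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def f1_score(predictions, ground_truth, iou_threshold=0.5):
    """F1 over a video.

    predictions, ground_truth: one list of boxes per frame.
    iou_threshold is an assumed matching criterion, not taken from the source.
    """
    tp = fp = fn = 0
    for pred_boxes, gt_boxes in zip(predictions, ground_truth):
        unmatched = list(gt_boxes)
        for p in pred_boxes:
            # Greedily match each prediction to its best remaining annotation.
            best = max(unmatched, key=lambda g: iou(p, g), default=None)
            if best is not None and iou(p, best) >= iou_threshold:
                tp += 1
                unmatched.remove(best)
            else:
                fp += 1
        # Annotated hands with no matching prediction are misses.
        fn += len(unmatched)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Under this definition, a tracker that drifts off the hand is penalized twice in the same frame: once for the wrong box (false positive) and once for the missed hand (false negative).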


Because the tracking algorithms could not reliably recover from occlusions and quick movements, which resulted in low F1-scores, tracking alone was not found to be sufficient for hand detection in egocentric video. We expect that using a fast detection algorithm to reset the tracking algorithm after an occlusion will result in more accurate and efficient hand detection. Successful implementation will bring researchers one step closer to directly measuring hand function in a patient’s daily life at home, thus helping to restore independence after SCI.
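The idea of resetting a tracker with a detector can be sketched as the loop below. This is a hypothetical outline, not the system proposed here: `detect` stands in for any detector (for example a CNN) that returns a box or None, and `make_tracker` stands in for any tracker factory whose objects expose `update(frame)` returning a success flag and a box. Both interfaces are assumptions, and the sketch follows a single hand for simplicity.

```python
def detect_and_track(frames, detect, make_tracker):
    """Track a single hand, falling back to detection when tracking fails.

    frames:       iterable of images.
    detect:       hypothetical callable, detect(frame) -> box or None.
    make_tracker: hypothetical factory, make_tracker(frame, box) -> tracker,
                  where tracker.update(frame) -> (ok, box).
    Returns one box (or None) per frame.
    """
    tracker = None
    results = []
    for frame in frames:
        box = None
        if tracker is not None:
            ok, box = tracker.update(frame)
            if not ok:
                # Tracking lost (e.g. occlusion or fast motion): drop the tracker.
                tracker, box = None, None
        if tracker is None:
            # Run the (slower) detector only when no tracker is active.
            box = detect(frame)
            if box is not None:
                tracker = make_tracker(frame, box)
        results.append(box)
    return results
```

Running the detector only when the tracker reports failure is what would make such a combination efficient: most frames cost one tracker update, and the detector is invoked only to initialize or recover.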