Improving egocentric hand detection using convolutional neural networks (CNNs) and a novel Haar-like feature

Likitlersuang, Jirapat 1,2; Sumitro, Elizabeth 1,2; Visee, Ryan 1,2; Kalsi-Ryan, Sukhvinder 1,2,3; Zariffa, Jose 1,2

1. Institute of Biomaterials and Biomedical Engineering, University of Toronto; 2. Toronto Rehabilitation Institute, University Health Network; 3. Department of Physical Therapy, University of Toronto

Objective: 

The restoration of upper extremity function is rated as the highest priority for individuals with cervical spinal cord injury (SCI). In order to develop effective interventions, tools are needed to accurately measure hand function throughout the rehabilitation process. However, existing tools (e.g. the Graded and Redefined Assessment of Strength, Sensibility and Prehension or the Spinal Cord Independence Measure) are limited to clinical settings or rely on self-report questionnaires to collect information about hand function at home.

Multiple attempts to gauge hand function at home have used wearable accelerometers, but these lack the resolution necessary to capture the complexity of functional hand use. A system based on a wearable camera has the potential to overcome this limitation: its wide-angle viewpoint can capture the hands, arms, and surrounding environment.

Computer vision (CV) techniques for hand detection (locating the hand in the image) and segmentation (separating the outline of the hand from the cluttered background) are often applied as a preprocessing step for any analysis of the hand. Unfortunately, the performance of many of these systems remains uneven, and faulty detections are commonly observed. In this study, we propose a hand detection method that takes into account the arm of the user as well as the presence of other individuals in the frame.

 

Methods: 

A commercially available wearable egocentric camera (GoPro Hero4) was used to record 8 participants with cervical SCI performing activities of daily living in several environments.

These egocentric recordings underwent a two-step hand detection process. First, a Convolutional Neural Network (CNN) was trained for hand detection using data from 6 participants as well as a publicly available able-bodied dataset, and tested on the remaining 2 participants. Second, the resulting bounding boxes were used to extract custom Haar-like features designed to encode the arm angle. These Haar-like features were fed into a binary Random Forest classifier to determine whether each bounding box contains a hand, thus helping to eliminate false positives produced by the CNN (see the sketch below).
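To make the second step concrete, the sketch below (Python with NumPy and scikit-learn) shows one plausible way to compute a Haar-like descriptor around a CNN bounding box and filter detections with a Random Forest. The quadrant layout, the expansion factor, and the helper name haar_like_arm_feature are illustrative assumptions; the abstract does not specify the exact feature design.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def haar_like_arm_feature(gray, box, scale=2.0):
        """Toy Haar-like descriptor around a detected hand box (assumed layout).

        Expands the box, splits the expanded region into 2x2 quadrants, and
        returns rectangle differences of mean intensity. The intuition is that
        the arm entering the region shifts intensity toward one quadrant, so
        the differences coarsely encode the arm angle.
        """
        x, y, w, h = box
        cx, cy = x + w / 2.0, y + h / 2.0
        half_w, half_h = scale * w / 2.0, scale * h / 2.0
        x0, x1 = int(max(cx - half_w, 0)), int(min(cx + half_w, gray.shape[1]))
        y0, y1 = int(max(cy - half_h, 0)), int(min(cy + half_h, gray.shape[0]))
        region = gray[y0:y1, x0:x1].astype(float)
        my, mx = region.shape[0] // 2, region.shape[1] // 2
        if mx == 0 or my == 0:
            return np.zeros(4)
        q = [region[:my, :mx].mean(), region[:my, mx:].mean(),   # top-left, top-right
             region[my:, :mx].mean(), region[my:, mx:].mean()]   # bottom-left, bottom-right
        return np.array([q[0] - q[3], q[1] - q[2],               # diagonal contrasts
                         (q[0] + q[1]) - (q[2] + q[3]),          # top vs bottom
                         (q[0] + q[2]) - (q[1] + q[3])])         # left vs right

    # Train on features from CNN boxes labelled true hand (1) or false positive (0):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    # clf.fit(X_train, y_train)  # X_train: one feature row per labelled CNN box
    # At test time, keep only the CNN detections that the classifier accepts:
    # keep = [b for b in cnn_boxes
    #         if clf.predict([haar_like_arm_feature(frame_gray, b)])[0] == 1]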

 

Results: 

The algorithm was tested on our dataset of egocentric recordings from individuals with SCI. Scoring detections against ground truth using Intersection over Union (IoU), the CNN alone achieved an F-score of 0.76, compared with 0.82 for the combined CNN and Haar-like feature approach.
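For reference, a minimal sketch of the two evaluation quantities is given below. The IoU threshold at which a detection counts as a true positive is an assumption (0.5 is a common choice); the abstract does not state the value used.

    def iou(a, b):
        """Intersection over Union of two boxes given as (x, y, w, h)."""
        ax1, ay1 = a[0] + a[2], a[1] + a[3]
        bx1, by1 = b[0] + b[2], b[1] + b[3]
        iw = max(0.0, min(ax1, bx1) - max(a[0], b[0]))  # overlap width
        ih = max(0.0, min(ay1, by1) - max(a[1], b[1]))  # overlap height
        inter = iw * ih
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    def f_score(tp, fp, fn):
        """F1-score from true positive, false positive, and false negative counts."""
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # A detection is counted as a true positive when its IoU with a ground-truth
    # hand box exceeds the chosen threshold (assumed 0.5 here).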

 

Conclusion:  

This improvement in egocentric hand detection for wearable vision will provide a valuable tool for analyzing hand function in a variety of naturalistic contexts. This technology will make it possible to take users' level of independence in upper limb tasks at home into account when judging the outcome of rehabilitation.