Interaction Detection in Egocentric Video: Towards a Novel Outcome Measure for Upper Extremity Function

Likitlersuang, Jirapat 1,2; Sumitro, Elizabeth 1,2; Kalsi-Ryan, Sukhvinder 3,4; Zariffa, Jose 1,2

1. Institute of Biomaterials and Biomedical Engineering, University of Toronto; 2. Toronto Rehabilitation Institute, University Health Network; 3. Department of Physical Therapy, University of Toronto; 4. Toronto Western Hospital, University Health Network

Objective: The restoration of upper extremity function is rated as the highest priority for individuals with cervical spinal cord injury (SCI). In order to develop effective interventions, tools are needed to accurately measure hand function throughout the rehabilitation process (e.g., the Graded and Redefined Assessment of Strength, Sensibility and Prehension). However, there is currently no satisfactory method to collect information about hand function in the community, when patients are not under the direct observation of a clinician. In this study, a wearable sensor is presented that can quantify the amount of functional hand use by detecting interactions of the hand with objects in the environment. The system is based on computer vision techniques applied to recordings from a wearable camera (egocentric video).

Design/Method: A small, commercially available wearable camera (Looxcie 2) was used to collect egocentric recordings of 4 able-bodied participants performing common activities of daily living, such as grasping a mug or picking up a phone. Similar data were also collected from 2 participants with cervical SCI (using GoPro cameras). Beyond capturing a new dataset of our own, we also sought to evaluate the effectiveness of our approach on a publicly available dataset of an able-bodied participant. These egocentric recordings first undergo a pre-processing step, in which the hand is detected and segmented from the cluttered background. After hand segmentation, information associated with hand-object interactions is extracted from the video in the form of features based on motion and shape descriptors. These features are input into a binary Random Forest classifier that labels each video frame as representing an interaction or not, and the output is filtered with a moving average. The smoothed output highlights trends from which metrics of hand use can be extracted, including the number and duration of interactions between the hand and objects in the environment.
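The frame classification and smoothing steps can be illustrated with a minimal sketch, assuming per-frame motion and shape features have already been computed from the segmented hand region; the feature dimensions, window length, threshold, and frame rate below are illustrative placeholders rather than the study's actual parameters.

```python
"""Minimal sketch: per-frame interaction classification, moving-average
smoothing, and extraction of interaction counts/durations.
Features and labels are synthetic stand-ins for the real descriptors."""
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def smooth(frame_labels, window=15):
    """Moving-average filter over binary per-frame predictions."""
    kernel = np.ones(window) / window
    return np.convolve(frame_labels, kernel, mode="same")


def extract_interactions(smoothed, threshold=0.5, fps=30.0):
    """Count interactions and their durations from the smoothed signal."""
    active = smoothed > threshold
    edges = np.diff(active.astype(int))        # rising/falling edges
    starts = np.flatnonzero(edges == 1) + 1
    ends = np.flatnonzero(edges == -1) + 1
    if active[0]:
        starts = np.r_[0, starts]
    if active[-1]:
        ends = np.r_[ends, active.size]
    durations = (ends - starts) / fps          # seconds per interaction
    return len(starts), durations


# Synthetic stand-ins for motion + shape descriptors of the segmented hand.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 12))          # e.g. flow stats + shape moments
y_train = rng.integers(0, 2, size=2000)        # 1 = hand-object interaction frame
X_test = rng.normal(size=(600, 12))

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
frame_preds = clf.predict(X_test)              # per-frame binary decisions
smoothed = smooth(frame_preds, window=15)      # moving average over frames
n_interactions, durations = extract_interactions(smoothed)
print(n_interactions, durations.sum())
```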

Results: The algorithm has been tested on our datasets of egocentric recordings of able-bodied and SCI individuals performing activities of daily living in several environments (F-scores of 0.85 and 0.81, respectively, during leave-one-activity-out cross-validation). It has additionally been validated on a publicly available dataset of able-bodied individuals (F-score of 0.85).
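The leave-one-activity-out evaluation can likewise be sketched, assuming each frame is tagged with the activity it came from and using scikit-learn's LeaveOneGroupOut splitter; the data below are synthetic placeholders, and the reported F-scores come from the real recordings, not from this sketch.

```python
"""Sketch of leave-one-activity-out cross-validation with per-frame F-scores."""
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 12))            # per-frame feature vectors
y = rng.integers(0, 2, size=3000)          # 1 = interaction frame
activities = rng.integers(0, 5, size=3000) # activity ID for each frame

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=activities):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))

print(f"mean F-score: {np.mean(scores):.2f}")
```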

Conclusion: This wearable system will, for the first time, allow researchers and clinicians to gauge a user's level of independence at home in activities involving upper limb function, a key consideration in judging the outcome of the rehabilitation process. The computer vision algorithms developed will provide a valuable tool for analyzing hand function in a variety of naturalistic contexts.