Generalizability of hand-object interaction detection in egocentric video across populations with hand impairment

Tsai, Meng-Fen (1,2), Likitlersuang, Jirapat (1,2), Visée, Ryan (1,2), Zariffa, José (1,2)

1: Institute of Biomaterials & Biomedical Engineering, University of Toronto, Canada

2: Toronto Rehabilitation Institute, University Health Network, Canada

Background and objective: Upper limb function is an important determinant of independence after injury, and stroke survivors often experience a significant impact on their quality of life. Novel methods are needed to measure the impact of new upper limb therapies on the daily lives of affected individuals in their home environment. In previous work, we demonstrated the efficacy of machine learning algorithms for detecting functional hand use in egocentric videos of patients with cervical spinal cord injury, who have bilateral hand function impairment. In the current study, we evaluate this algorithm on videos from individuals with unilateral hand impairment after stroke. If the method generalizes well across populations, we hope to extend this wearable system to a variety of clinical populations with hand function impairments, benefiting clinicians, researchers and patients.
Methods: An egocentric camera (GoPro Hero 5), which records video from a first-person point of view, was used to record 39 activities of daily living (ADLs) performed by stroke survivors in a home simulation laboratory. The participants had no neurological or neuromuscular disease other than stroke. A supervised machine learning algorithm for detecting functional interactions between the hands and objects in the environment, trained on a dataset of 9 participants with cervical spinal cord injury (SCI), was applied to the participants with stroke. The algorithm consisted of 3 steps: 1) the two hands of the participant were detected using YOLOv2; 2) the hands were segmented using edge and colour information; 3) motion, shape and colour features were extracted from the segmented hands and used in conjunction with a random forest to detect interactions.
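The sketch below illustrates the general structure of this three-step pipeline in Python. It is only a minimal illustration: the detector wrapper, segmentation thresholds and feature definitions (detect_hands_yolov2, segment_hand, extract_features and their parameter values) are assumptions made for clarity and are not the implementation used in the study; only the random forest classifier from scikit-learn is a real API.

    import cv2
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def detect_hands_yolov2(frame):
        """Placeholder for the trained YOLOv2 hand detector; would return left/right hand boxes."""
        raise NotImplementedError("stand-in for the study's trained detector")

    def segment_hand(frame, box):
        """Rough hand segmentation inside a detected box using edge and skin-colour cues."""
        x, y, w, h = box
        crop = frame[y:y + h, x:x + w]
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 100, 200)                          # edge cue
        ycrcb = cv2.cvtColor(crop, cv2.COLOR_BGR2YCrCb)
        skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))   # skin-colour cue
        return cv2.bitwise_and(edges, skin)

    def extract_features(mask, prev_mask):
        """Simple motion and shape features from the segmented hand mask
        (the study also used colour features, omitted here)."""
        if prev_mask is not None and prev_mask.shape == mask.shape:
            motion = float(np.mean(mask != prev_mask))             # frame-to-frame change
        else:
            motion = 0.0
        shape = float(np.count_nonzero(mask)) / mask.size          # segmented-area ratio
        return [motion, shape]

    # Frame-level features from the SCI training set (X_sci, y_sci) fit the classifier,
    # which is then applied unchanged to stroke frames, mirroring the study design.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    # clf.fit(X_sci, y_sci)
    # stroke_predictions = clf.predict(X_stroke)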
Results: To date, results have been evaluated using 11,043 frames from one stroke participant who performed 8 of the ADLs. For YOLOv2, the average accuracy in detecting the left and right hands of the participant was 0.63 and 0.79, respectively. The overall F1 score for detecting hand-object interactions was 0.64, in contrast to a previous result of 0.73 in the SCI group.
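As an illustration of this frame-level evaluation, assuming binary per-frame interaction labels, the accuracy and F1 score can be computed with scikit-learn as below; the label arrays are made-up placeholders, not study data.

    from sklearn.metrics import accuracy_score, f1_score

    # Hypothetical per-frame labels: 1 = hand-object interaction, 0 = no interaction.
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    print("Accuracy:", accuracy_score(y_true, y_pred))  # fraction of frames labelled correctly
    print("F1 score:", f1_score(y_true, y_pred))        # harmonic mean of precision and recall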
Conclusion: Preliminary results suggest that detecting interactions in one population with hand impairment using an algorithm trained on another population (stroke and spinal cord injury, respectively) may result in reduced performance. Transfer learning approaches and population-specific training should be explored.
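One simple form of population-specific training that could be explored is retraining the random forest on pooled labelled frames from both populations. The sketch below is only an illustration of that idea, not the approach adopted in this work; the array names (X_sci, y_sci, X_stroke, y_stroke) are hypothetical.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def retrain_pooled(X_sci, y_sci, X_stroke, y_stroke):
        """Fit one classifier on labelled frames from both SCI and stroke participants."""
        X = np.concatenate([X_sci, X_stroke])
        y = np.concatenate([y_sci, y_stroke])
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        return clf.fit(X, y)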