Assessing Cluster Tendency in Neuronal Spike Data

Mahallati, Sara 1, 2 ; Popovic, Milos R. 1, 2 ; Valiante, Taufik A. 1, 3

1. Institute of Biomaterials and Biomedical Engineering, University of Toronto; 2. Toronto Rehabilitation Institute, University Health Network; 3. Krembil Research Institute, University Health Network

Recording the extracellular action potential (spike) waveforms (units) generated by neuronal activity is an important technique in neuroscientific research. However, determining the number of units present in recordings from a single electrode remains a fundamental technical issue. The first step in identifying distinct units is to cluster unlabeled spike trains with one of a number of unsupervised learning algorithms, such as hard c-means, fuzzy c-means, single linkage, or Gaussian mixture decomposition. All of these methods require one of two approaches: specification of a fixed value of c, the number of clusters to seek; or generation of candidate partitions for several possible values of c, followed by selection of a best candidate based on various post-clustering validation criteria. In this work, we explore the use of a pre-clustering method, improved Visual Assessment of Clustering Tendency (iVAT), to estimate the number of clusters to seek before a clustering method is employed to assign memberships. We show the need for such a strategy with numerical examples that use simulated waveform data to provide labeled clusters as the ground truth. To generate simulated spike trains, we used in vitro data recorded with multi-electrode arrays (MEA) from human brain slices; units were clustered both with software and visually by an experienced human operator. Ten such units from electrodes on different slices were selected. Background noise, obtained from sections of the multi-electrode array signals that did not contain spikes, was then added to the average waveform of each unit. Each cluster contained between 700 and 1500 spikes, mimicking average firing rates between 3 Hz and 8 Hz. Groups of units were created by appending the waveforms of 2 or 3 units. The order of unit activity was randomized by drawing from a uniform distribution to mimic the random occurrence of spikes.
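The construction of the labeled synthetic data described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used in this work: the function name `simulate_units`, the template shapes, and the noise level are hypothetical, and Gaussian noise stands in for the background noise that was taken from spike-free sections of the MEA recordings.

```python
import numpy as np

def simulate_units(templates, counts, noise_std=0.1, seed=0):
    """Build a shuffled, labeled set of noisy waveforms from mean unit waveforms.

    templates: array of shape (n_units, n_samples), one mean waveform per unit.
    counts:    number of spikes to generate for each unit.
    """
    rng = np.random.default_rng(seed)
    waves, labels = [], []
    for k, (template, n) in enumerate(zip(templates, counts)):
        # Assumption: Gaussian noise replaces the recorded background noise.
        noise = rng.normal(0.0, noise_std, size=(n, template.shape[0]))
        waves.append(template + noise)
        labels.append(np.full(n, k))
    waves = np.vstack(waves)
    labels = np.concatenate(labels)
    # Randomize the order of unit activity with a uniform random permutation.
    order = rng.permutation(len(labels))
    return waves[order], labels[order]

# Hypothetical mean waveforms for two units (48 samples each).
t = np.linspace(0.0, 1.0, 48)
templates = np.stack([
    -1.0 * np.exp(-((t - 0.3) / 0.05) ** 2),
    -0.6 * np.exp(-((t - 0.3) / 0.10) ** 2) + 0.3 * np.exp(-((t - 0.6) / 0.10) ** 2),
])
X, y = simulate_units(templates, counts=[700, 1500])
```

The returned labels `y` serve as the ground truth against which any estimate of the number of clusters can be compared.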
The synthetic recordings of 2 or 3 neurons were used to demonstrate the inconsistency between different feature extraction techniques (e.g., principal component analysis (PCA) and t-distributed stochastic neighbour embedding (t-SNE)) that project the data to a lower-dimensional space to identify potential clusters, which can lead to an incorrect interpretation of the number of clusters. iVAT is based on an algorithm that enables the visualization of possible cluster structure in the data without dimensionality reduction, since it uses a measure of similarity in the high-dimensional space. Following the application of iVAT, we compute clustering validation indices such as Dunn’s index to illustrate the relationship and compatibility of the pre- and post-clustering assessments. Finally, we show that iVAT and Dunn’s index can be further customized for our purpose by replacing the Euclidean distance measure with a shape-based distance that may be more appropriate for assessing clusters in finite sets of neuronal spike waveform data.
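For readers unfamiliar with the method, a minimal sketch of the usual VAT reordering, the iVAT path-based transform, and Dunn’s index is given below. It assumes a precomputed, symmetric, floating-point dissimilarity matrix `D`; the function names and the toy two-cluster data are illustrative and are not taken from this work.

```python
import numpy as np

def vat_order(D):
    """VAT ordering: a Prim-like traversal of the dissimilarity matrix D."""
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = np.ones(n, dtype=bool)
    remaining[i] = False
    # min_dist[q]: distance from q to the closest already-ordered object.
    min_dist = D[i].copy()
    for _ in range(n - 1):
        j = int(np.argmin(np.where(remaining, min_dist, np.inf)))
        order.append(j)
        remaining[j] = False
        min_dist = np.minimum(min_dist, D[j])
    return np.array(order)

def ivat(D):
    """Return the iVAT (minimax path distance) matrix and the VAT ordering."""
    order = vat_order(D)
    Ds = D[np.ix_(order, order)]  # VAT-reordered dissimilarities
    n = Ds.shape[0]
    Dp = np.zeros_like(Ds)
    for r in range(1, n):
        j = int(np.argmin(Ds[r, :r]))  # nearest previously ordered object
        # Path distance to each earlier object c: max(edge r-j, path j-c).
        Dp[r, :r] = np.maximum(Ds[r, j], Dp[j, :r])
        Dp[r, j] = Ds[r, j]
        Dp[:r, r] = Dp[r, :r]  # keep the matrix symmetric
    return Dp, order

def dunn_index(D, labels):
    """Dunn's index: smallest between-cluster distance over largest diameter."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    diam = max(D[np.ix_(labels == a, labels == a)].max() for a in ids)
    sep = min(D[np.ix_(labels == a, labels == b)].min()
              for a in ids for b in ids if a < b)
    return sep / diam

# Toy data (assumption): two well-separated groups of 20 points in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 5)), rng.normal(3.0, 0.1, (20, 5))])
labels = np.repeat([0, 1], 20)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # Euclidean
Dp, order = ivat(D)
dunn = dunn_index(D, labels)
```

Displaying `Dp` as a grayscale image shows dark blocks along the diagonal, and the number of blocks suggests how many clusters to seek. Because both functions operate only on `D`, any other dissimilarity, including a shape-based one, can be substituted for the Euclidean distance without changing the rest of the code.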