clustering module

@author: Théo Lambert

This module groups all the functions related to single-voxel clustering.

class clustering.SingleVoxelClustering(n_clusters: int, data: ndarray, coords: ndarray, file_indices, names: List[str], data_volume_shape: array, fe_method: str = 'pca', fe_params: dict | None = {}, noise_th: float | None = None)

Bases: object

Main class handling the single voxel clustering process.

Parameters:
  • n_clusters (int) – Number of clusters to form.

  • data (ndarray) – Data volume (3D + time) on which to perform the clustering.

  • coords (ndarray) – Array containing the volume coordinates of each voxel in ‘data’.

  • file_indices (array) – Array containing for each voxel the index identifying the corresponding file.

  • names (list of str) – Names of the files included in the clustering, in the order of their file indices.

  • data_volume_shape (array) – Shape of the original volume (excluding the time dimension), used to reconstruct the cluster maps.

  • fe_method (str) – Selection of the method for extracting the features; see the user guide for details.

  • fe_params (dict) – Dictionary of parameters for the feature extraction method.

  • noise_th (float) – Threshold on the standard deviation of the baseline to remove ‘noisy’ voxels.
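The `noise_th` filter can be pictured with the minimal sketch below. It assumes the data is a 2D array of shape (n_voxels, n_timepoints) and that the baseline is the first `n_baseline` timepoints of each trace; both the baseline window and the helper name `remove_noisy_voxels` are hypothetical, not part of the module.

```python
import numpy as np

def remove_noisy_voxels(data, coords, file_indices, noise_th, n_baseline=10):
    """Drop voxels whose baseline standard deviation exceeds noise_th.

    Hypothetical helper: the baseline is assumed to be the first
    `n_baseline` timepoints of each trace.
    """
    # Standard deviation of each voxel's trace over the assumed baseline window
    baseline_std = data[:, :n_baseline].std(axis=1)
    keep = baseline_std <= noise_th
    return data[keep], coords[keep], file_indices[keep]

rng = np.random.default_rng(0)
data = rng.normal(0.0, 0.1, size=(5, 50))
data[2, :10] = np.tile([5.0, -5.0], 5)  # voxel 2 is very noisy during the baseline
coords = np.arange(15).reshape(5, 3)
file_indices = np.zeros(5, dtype=int)

clean, clean_coords, clean_idx = remove_noisy_voxels(data, coords, file_indices, noise_th=1.0)
print(clean.shape)  # (4, 50)
```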

change_cmap(new_cmap: list | str, continuous=True)

Method for changing the colormap.

Parameters:
  • new_cmap (list | str) – Either a list of RGBA values or the name of a colormap available in pyplot.

  • continuous (bool) – Set to True if the colormap is continuous and False if it is discrete (a fixed list of colors).
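One plausible reading of the `continuous` flag is sketched below in NumPy, under assumptions: a continuous colormap is sampled at `n_clusters` evenly spaced positions, while a discrete one supplies its colors in order. The anchor-color interpolation and the helper `cluster_colors` are hypothetical; the module itself relies on pyplot colormaps.

```python
import numpy as np

def cluster_colors(anchors, n_clusters, continuous=True):
    """Return an (n_clusters, 4) array of RGBA colors.

    anchors: list of RGBA tuples defining the colormap (hypothetical input).
    """
    anchors = np.asarray(anchors, dtype=float)
    if not continuous:
        # Discrete colormap: take the colors in order, cycling if there are too few
        return anchors[np.arange(n_clusters) % len(anchors)]
    # Continuous colormap: linearly interpolate each RGBA channel at evenly spaced positions
    positions = np.linspace(0.0, 1.0, len(anchors))
    samples = np.linspace(0.0, 1.0, n_clusters)
    return np.stack([np.interp(samples, positions, anchors[:, c]) for c in range(4)], axis=1)

blue_to_red = [(0, 0, 1, 1), (1, 0, 0, 1)]
print(cluster_colors(blue_to_red, 3))                    # middle color is (0.5, 0, 0.5, 1)
print(cluster_colors(blue_to_red, 3, continuous=False))  # blue, red, blue
```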

cluster_dataset()

Method for assigning each time trace to a cluster. The default model is K-Means.
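For readers unfamiliar with the default model, K-Means alternates between assigning each sample to its nearest centroid and moving each centroid to the mean of its assigned samples. The following is a plain NumPy sketch of that algorithm (Lloyd's iterations) on a features array; it is not the module's implementation, which may rely on a library such as scikit-learn instead.

```python
import numpy as np

def kmeans(features, n_clusters, n_iter=100, seed=0):
    """Minimal Lloyd's K-Means: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with randomly chosen samples
    centroids = features[rng.choice(len(features), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its samples (keep it if the cluster is empty)
        new_centroids = np.array([
            features[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of 2D features
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centroids = kmeans(features, n_clusters=2)
print(labels)  # the first three samples share one label, the last three the other
```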

compute_amplitude_map(file_idx, extrema)

Method for computing the transparency map based on the normalized amplitude.

Parameters:
  • file_idx (int) – Index of the file to be displayed.

  • extrema (dict) – A dictionary containing the extrema for each cluster for the amplitude normalization.

Returns:

amplitude_map – The map containing the transparency values.

Return type:

ndarray
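A minimal sketch of amplitude-based transparency is shown below. It assumes that a voxel's amplitude is the peak absolute value of its trace and that `extrema` maps each cluster ID to a `(low, high)` tuple; the amplitude definition and the function name `amplitude_alpha` are assumptions for illustration, and the sketch returns per-voxel values rather than a full map.

```python
import numpy as np

def amplitude_alpha(signals, labels, extrema):
    """Per-voxel transparency in [0, 1], normalised within each cluster."""
    # Assumed amplitude: peak absolute value of each time trace
    amplitude = np.abs(signals).max(axis=1)
    alpha = np.zeros(len(signals))
    for cluster_id, (low, high) in extrema.items():
        mask = labels == cluster_id
        # Cluster-specific min-max normalisation, clipped to stay within [0, 1]
        alpha[mask] = np.clip((amplitude[mask] - low) / (high - low), 0.0, 1.0)
    return alpha

signals = np.array([[0.0, 1.0], [0.0, 2.0], [0.0, 3.0], [0.0, 10.0]])
labels = np.array([0, 0, 0, 1])
extrema = {0: (1.0, 3.0), 1: (0.0, 20.0)}
print(amplitude_alpha(signals, labels, extrema))  # alpha values: 0, 0.5, 1, 0.5
```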

extract_features(fe_method: str, fe_params: dict)

Method for applying the feature extraction process to the data.

Parameters:
  • fe_method (str | object) – Refer to the feature extraction package for more details. Accepted values: ‘pca’, ‘ica’, ‘nnmf’, or an object with a ‘fit’ or ‘fit_predict’ method.

  • fe_params (dict) – Dictionary containing the parameters to provide to the feature extractor.

Returns:

features – Features extracted from the data.

Return type:

ndarray
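As an illustration of the ‘pca’ option, principal component scores can be obtained from a singular value decomposition of the mean-centred data. This NumPy sketch is not the module's code, and treating `n_components` as the relevant entry of `fe_params` is only an assumption.

```python
import numpy as np

def pca_features(data, n_components):
    """Project (n_voxels, n_timepoints) traces onto their top principal components."""
    centred = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 30))  # 100 voxels, 30 timepoints
features = pca_features(data, n_components=3)
print(features.shape)  # (100, 3)
```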

flatten_volume(volume: ndarray, volume_boundaries: array | None = None, ncols: int = 8, apply_colormap: bool = True) -> ndarray

Utility function for flattening a 3D volume into a mosaic of 2D slices for display, where each value is the cluster assignment of the associated voxel.

Parameters:
  • volume (ndarray) – The 3D volume to be flattened.

  • volume_boundaries (array | None) – For setting custom boundaries. If None, the dimensions of the whole volume are used.

  • ncols (int) – Number of columns to be used for the flattening.

  • apply_colormap (bool) – If True, the colormap is applied to replace cluster IDs with the corresponding color.

Returns:

res – A 2D array whose values are either int (cluster ID) or tuple (color associated with the cluster ID), depending on the ‘apply_colormap’ parameter.

Return type:

ndarray
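The mosaic layout can be sketched as follows, assuming the slices are taken along the last axis and that unused tiles at the end of the grid are filled with -1. The slicing axis, the fill value, and the name `mosaic` are assumptions, and the colormap step is omitted.

```python
import numpy as np

def mosaic(volume, ncols=8, fill=-1):
    """Tile the 2D slices of an (X, Y, Z) volume into one 2D image, ncols slices per row."""
    nx, ny, nz = volume.shape
    nrows = int(np.ceil(nz / ncols))
    res = np.full((nrows * nx, ncols * ny), fill, dtype=volume.dtype)
    for z in range(nz):
        # Slice z goes to grid cell (row, col) in row-major order
        row, col = divmod(z, ncols)
        res[row * nx:(row + 1) * nx, col * ny:(col + 1) * ny] = volume[:, :, z]
    return res

volume = np.arange(2 * 3 * 5).reshape(2, 3, 5)  # 5 slices of 2x3
flat = mosaic(volume, ncols=2)
print(flat.shape)  # (6, 6): 3 rows x 2 columns of 2x3 tiles
```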

generate_background_img()

Method for generating the microDoppler background image.

get_cluster_locations(file_idx: int, volume_boundaries: array | None = None, plot: bool = True, registered: bool = False, atlas_path: None | str = None, cluster_ids=None, extrema=None, hires_dst=None)

Method for displaying the map in which each voxel is color-coded depending on its cluster attribution.

Parameters:
  • file_idx (int) – Index of the file to be displayed.

  • volume_boundaries (array | None) – For setting custom boundaries. If None, the dimensions of the whole volume are used.

  • plot (bool) – If True, a map in which each voxel is color-coded depending on its cluster attribution will be displayed.

  • registered (bool) – If the data was registered during the data loading, set to True to have the atlas info available through the cursor.

  • atlas_path (str) – Path to the atlas that the data has been registered to.

  • cluster_ids (list of int | None) – The list of clusters to be displayed.

  • extrema (dict | None) – Dictionary containing for each cluster the extrema for normalization.

  • hires_dst (str | None) – If a string, path where the hi-res files will be saved. If None, no hi-res files are generated or saved.

Returns:

volume – A volume (same size as the original data) in which each value represents the cluster attribution.

Return type:

ndarray
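Rebuilding a cluster map from flat per-voxel labels amounts to scattering each label back to its voxel coordinates. A minimal sketch, assuming integer (x, y, z) coordinates and using -1 for voxels that were excluded (for example NaN or noisy voxels); the fill value and the function name `labels_to_volume` are assumptions.

```python
import numpy as np

def labels_to_volume(labels, coords, volume_shape, fill=-1):
    """Place each voxel's cluster label at its (x, y, z) coordinate."""
    volume = np.full(volume_shape, fill, dtype=int)
    # Advanced indexing with one index array per axis
    volume[coords[:, 0], coords[:, 1], coords[:, 2]] = labels
    return volume

coords = np.array([[0, 0, 0], [1, 2, 0], [1, 2, 1]])
labels = np.array([3, 0, 1])
volume = labels_to_volume(labels, coords, (2, 3, 2))
print(volume[1, 2])  # [0 1]
```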

get_signals_or_coords(output: str = 'signals', file_idx: int | None = None)

Generates an iterator yielding, for each sample in the data, its cluster ID and either its time trace or its coordinates.

Parameters:
  • output (str) – If ‘signals’, yield the time trace of each sample in data. If ‘coords’, the coordinates instead.

  • file_idx (int | None) – If int, only yield the samples associated with that file index. If None, yield all samples.
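The iterator's behaviour can be sketched as a Python generator. This assumes flat arrays `labels`, `data`, `coords`, and `file_indices` aligned voxel by voxel; the function `signals_or_coords` is an illustration, not the method's source.

```python
import numpy as np

def signals_or_coords(labels, data, coords, file_indices, output="signals", file_idx=None):
    """Yield (cluster_id, trace) or (cluster_id, coordinates) for each voxel."""
    # Note: as this is a generator, the check only runs once iteration starts
    if output not in ("signals", "coords"):
        raise ValueError("output must be 'signals' or 'coords'")
    source = data if output == "signals" else coords
    for i, cluster_id in enumerate(labels):
        # Skip voxels from other files when a file index is requested
        if file_idx is not None and file_indices[i] != file_idx:
            continue
        yield cluster_id, source[i]

labels = np.array([0, 1, 1])
data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
coords = np.array([[0, 0, 0], [0, 1, 0], [2, 2, 2]])
file_indices = np.array([0, 0, 1])

for cluster_id, xyz in signals_or_coords(labels, data, coords, file_indices, "coords", file_idx=0):
    print(cluster_id, xyz)
```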

merge_clusters(pairs_to_merge, adjust_cmap=True)

Utility function for merging clusters together.

Parameters:
  • pairs_to_merge (list of tuple of int) – List of pairs of clusters to be merged. The higher cluster number is merged into the lower one, i.e. if clusters 5 and 3 are merged, 5s become 3s in the cluster maps. Example input: [(5,3), (1,6)].

  • adjust_cmap (bool) – Whether or not to adjust the colormap to the new number of clusters.
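The merging rule (the higher ID is relabelled to the lower one) can be sketched on a label array. This assumes the cluster maps store integer IDs; whether the module also renumbers the remaining IDs to keep them contiguous is not stated, so the hypothetical helper `merge_labels` leaves them untouched.

```python
import numpy as np

def merge_labels(labels, pairs_to_merge):
    """Relabel the higher ID of each pair with the lower ID."""
    merged = labels.copy()
    for a, b in pairs_to_merge:
        low, high = min(a, b), max(a, b)
        merged[merged == high] = low
    return merged

labels = np.array([1, 3, 5, 6, 2, 5])
print(merge_labels(labels, [(5, 3), (1, 6)]))  # [1 3 3 1 2 3]
```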

plot_signals(file_idx: int | None = None, display: str = 'all', ncols: int = 4, scale: None | Tuple[float, float] = None, stimulation_pattern=None)

Method for plotting the signals in each cluster.

Parameters:
  • file_idx (int | None) – If int, only the traces of the file with index ‘file_idx’ will be plotted.

  • display (str) – How the traces are displayed: ‘all’ displays all traces; ‘mean_std’ only displays the mean trace and the standard deviation.

  • ncols (int) – Number of columns for the display.

  • scale (tuple of float | None) – Min and max scales for the display (arguments vmin / vmax from plt.imshow). If None, auto scale will be used.

  • stimulation_pattern (list of tuple of int | None) – List containing beginnings and ends of stimulation windows.

print_file_info()

Prints to the terminal the names of the files included in the process and their associated indices.

reset_cmap()

Function for resetting the colormap to its default values, including after the number of clusters has changed.

switch_colors_in_cmap(color_swaps)

Utility method for swapping colors in the colormap to adjust the display of the cluster traces.

Parameters:

color_swaps (list of tuple of int) – List containing pairs of colors to be switched.
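Swapping colors can be illustrated on a list-based colormap, assuming each pair in `color_swaps` holds two indices into the list of cluster colors. This interpretation of the pairs and the helper name `swap_colors` are assumptions.

```python
def swap_colors(colors, color_swaps):
    """Return a copy of `colors` with each pair of entries exchanged."""
    colors = list(colors)
    for i, j in color_swaps:
        colors[i], colors[j] = colors[j], colors[i]
    return colors

cmap = ["red", "green", "blue", "orange"]
print(swap_colors(cmap, [(0, 2)]))  # ['blue', 'green', 'red', 'orange']
```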

class clustering.SingleVoxelClusteringWrapper(method: str, n_clusters: int, filelist: List[str], atlas_path: str, regions_info_path: str, fe_method: None | str = None, fe_params: dict = {}, noise_th: float | None = None, registered: bool = True, normalization: None | str = None)

Bases: object

Wrapper to perform the single voxel clustering in various configurations.

Parameters:
  • method (str) –

    Choice of the method, one of:
    • volume: all voxels in the image

    • hemisphere: all voxels in one or both hemispheres

    • structure: all voxels belonging to a list of structures

    • multiregion: all voxels belonging to a list of regions

  • n_clusters (int) – Number of clusters to search for.

  • filelist (list of string) – List of paths to the volumes whose voxels will be clustered.

  • atlas_path (string) – Path to the file containing the atlas volume.

  • regions_info_path (string) – Path to the text file listing all the regions included in the atlas.

  • fe_method (string) – Selection of the feature extraction method: ‘pca’, ‘ica’, ‘nnmf’ or a custom extractor.

  • fe_params (dict) – Selection of the feature extraction parameters.

  • noise_th (float) – Threshold on the standard deviation of the baseline to remove ‘noisy’ voxels.

change_cmap(new_cmap: None | list | str = None, continuous: bool = True)

Method for changing the colormap.

Parameters:
  • new_cmap (None | list | str) – Either a list of RGBA values or the name of a colormap available in pyplot.

  • continuous (bool) – Set to True if the colormap is continuous and False if it is discrete (a fixed list of colors).

compute_amplitude_extrema(quantiles)

Utility function for computing the extrema of each cluster, which are used to compute the transparency values when ‘amplitude_transparency’ is enabled in method ‘display_cluster_locations’. Note: since the raw extrema are noisy, quantiles are used instead.

Parameters:

quantiles (list of float) – List of size 2 containing the quantiles for the minimum and maximum estimation respectively.

Returns:

extrema – Dictionary containing for each cluster the extrema for normalization.

Return type:

dict of tuple
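A minimal sketch of quantile-based extrema, assuming per-voxel amplitudes and cluster labels are available as flat arrays (how the module defines the amplitude is not specified here). `np.quantile` replaces the raw minimum and maximum so that a few outlier voxels do not dominate the normalization; the helper name `amplitude_extrema` is hypothetical.

```python
import numpy as np

def amplitude_extrema(amplitudes, labels, quantiles=(0.05, 0.95)):
    """Map each cluster ID to a (low, high) tuple of amplitude quantiles."""
    q_low, q_high = quantiles
    extrema = {}
    for cluster_id in np.unique(labels):
        values = amplitudes[labels == cluster_id]
        extrema[int(cluster_id)] = (float(np.quantile(values, q_low)),
                                    float(np.quantile(values, q_high)))
    return extrema

amplitudes = np.concatenate([np.arange(101.0), [1000.0]])  # one outlier in cluster 0
labels = np.zeros(102, dtype=int)
print(amplitude_extrema(amplitudes, labels))  # the high value stays near 96, far below 1000
```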

display_atlas_mask_selected_regions(regions_list)

Utility function for displaying the selected regions on the atlas.

Parameters:

regions_list (list of str) – List of regions acronyms to be displayed on the atlas.

display_cluster_locations(names: list | None = None, amplitude_transparency=False, quantiles=[0.05, 0.95], cluster_ids=None, hires_dst=None)

Utility function for calling the display function for spatial locations depending on the method that was selected.

Parameters:
  • names (list) – Identifiers of the elements to be displayed.

  • amplitude_transparency (bool) – Whether or not to modulate the transparency of each voxel based on the amplitude of its signal. The modulation is cluster-specific.

  • quantiles (list of float) – List of size 2 containing the quantiles for the minimum and maximum estimation respectively. Only used if ‘amplitude_transparency’ is True.

  • cluster_ids (None | list) – List of the clusters to be displayed. If None, all clusters will be displayed.

  • hires_dst (str | None) – If a string, path where the hi-res files will be saved. If None, no hi-res files are generated or saved.

get_cluster_maps() -> dict

Utility function for getting the arrays containing the cluster maps where each voxel’s value represents its cluster attribution.

Parameters:

use_boundaries (bool) – Whether or not to crop the maps to the boundaries of the selected regions or structure. If False, the full maps are returned.

Returns:

res – A dictionary with keys being the element names (e.g. session IDs) and values the associated cluster maps.

Return type:

dict

get_names()

Method for printing the names of the data elements in the clustering object.

get_regions_from_groups(group: str, hemisphere: str)

Utility function for converting an anatomical group into a list of regions included in that group in the format expected by the clustering code.

Parameters:
  • group (str) – Acronym of the anatomical group.

  • hemisphere (str) – ‘L’ or ‘R’ respectively for left and right hemispheres, or ‘LR’ for both.

Returns:

res – List of tuples in the format (region acronym, hemisphere ‘L’ or ‘R’).

Return type:

list
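Converting a group into the (region acronym, hemisphere) format could look like the sketch below. The `GROUPS` table and its acronyms are made up for illustration; the real group definitions come from the atlas region information, which is not described here.

```python
# Hypothetical group table: group acronym -> region acronyms
GROUPS = {"GRP": ["RegA", "RegB"]}

def regions_from_group(group, hemisphere, groups=GROUPS):
    """Return [(region acronym, 'L' or 'R'), ...] for the requested hemisphere(s)."""
    if hemisphere not in ("L", "R", "LR"):
        raise ValueError("hemisphere must be 'L', 'R' or 'LR'")
    # 'LR' expands to both hemispheres: iterating over the string yields 'L' then 'R'
    return [(acr, side) for acr in groups[group] for side in hemisphere]

print(regions_from_group("GRP", "LR"))
# [('RegA', 'L'), ('RegA', 'R'), ('RegB', 'L'), ('RegB', 'R')]
```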

get_signals(reduction='median') -> dict

Utility function for accessing the temporal signals associated with each cluster.

Parameters:

reduction (str | None) – If ‘median’, returns the median of all signals in each cluster. If None, returns all signals.

Returns:

res – A dictionary with keys being the cluster IDs and values the associated temporal trace(s).

Return type:

dict
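Grouping traces by cluster and optionally reducing them with the median can be sketched as follows, assuming flat `labels` and `data` arrays aligned voxel by voxel. The helper name `signals_by_cluster` is hypothetical.

```python
import numpy as np

def signals_by_cluster(data, labels, reduction="median"):
    """Map each cluster ID to its median trace, or to all its traces otherwise."""
    res = {}
    for cluster_id in np.unique(labels):
        traces = data[labels == cluster_id]
        # Median across voxels, computed independently at each timepoint
        res[int(cluster_id)] = np.median(traces, axis=0) if reduction == "median" else traces
    return res

data = np.array([[1.0, 1.0], [2.0, 5.0], [9.0, 3.0], [0.0, 0.0]])
labels = np.array([0, 0, 0, 1])
print(signals_by_cluster(data, labels)[0])  # [2. 3.]
```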

merge_clusters(pairs_to_merge, adjust_cmap=True)

Utility function for merging clusters together.

Parameters:
  • pairs_to_merge (list of tuple of int) – List of pairs of clusters to be merged. The higher cluster number is merged into the lower one, i.e. if clusters 5 and 3 are merged, 5s become 3s in the cluster maps. Example input: [(5,3), (1,6)].

  • adjust_cmap (bool) – Whether or not to adjust the colormap to the new number of clusters.

normalize(data)

plot_signals(display: str = 'all', scale: None | Tuple[float, float] = None, stimulation_pattern=None)

Utility function for calling the display function for temporal traces depending on the method that was selected.

Parameters:
  • display (str) – Either ‘all’ to display all signals or ‘mean_std’ for mean/std of the signals.

  • scale (tuple of float | None) – Min and max scales for the display (arguments vmin / vmax from plt.imshow). If None, auto scale will be used.

  • stimulation_pattern (list of tuple of int | None) – List containing beginnings and ends of stimulation windows.

process(acr_list: str | List[Tuple[str, str]] | None = None)

Method to call for clustering the data.

Parameters:

acr_list (list of tuple | None) – If None, all regions are considered. Otherwise, only the acronyms present in the list will be included in the clustering process. The format of the tuple is (<region acronym>, <hemisphere>).

reset_cmap()

Method for resetting the colormap.

switch_colors_in_cmap(color_swaps)

Utility method for swapping colors in the colormap to adjust the display of the cluster traces.

Parameters:

color_swaps (list of tuple of int) – List containing pairs of colors to be switched.

clustering.region_loading(filelist: List[str], acr_list: List[str], atlas: ndarray, regions_nb: ndarray, regions_acr: ndarray, regions_centered: bool = True) -> (ndarray, ndarray, ndarray, ndarray, List[Tuple[int, int]])

Utility function for loading the voxels of the selected regions from the volumes, along with the necessary info: coordinates, file indices and shape of the volumes. Note that NaN values are excluded during the process.

Parameters:
  • filelist (list of str) – The list of paths to the files that will be included in the clustering process.

  • acr_list (list of str) – The list of the acronyms to be used during the clustering.

  • atlas (ndarray) – 3D volume of the atlas, where each voxel’s value corresponds to its region number.

  • regions_nb (ndarray) – Array containing the regions’ numbers.

  • regions_acr (ndarray) – Array containing the regions’ acronyms. Same order as regions_nb.

  • regions_centered (bool) – If True, the output will be cropped so that the boundaries fit the selected areas.

Returns:

  • data (ndarray) – Contains the volume data (time traces).

  • coords (ndarray) – Contains the coordinates in the volume of each voxel.

  • file_indices (array) – Array filled with int values corresponding to the volume the data was extracted from. For example, 0 will correspond to data from the first loaded file.

  • data_volume_shape (list of array) – List of shapes of the volumes (excluding the time dimension), used to reconstruct the volumes later.
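The region-based selection can be sketched as a mask built from the atlas: look up the numbers of the requested acronyms, keep the voxels whose atlas value matches, drop traces containing NaN, and, when `regions_centered` is True, crop to the bounding box of the selection. This NumPy sketch works on an in-memory (X, Y, Z, T) array instead of files and ignores hemispheres; the function name `select_regions` and its `(start, stop)` boundary format are assumptions, not the module's return values.

```python
import numpy as np

def select_regions(volume, atlas, acr_list, regions_nb, regions_acr, regions_centered=True):
    """Extract (data, coords, boundaries) for voxels in the listed regions.

    volume: (X, Y, Z, T) array; atlas: (X, Y, Z) array of region numbers.
    """
    # Region numbers of the requested acronyms (regions_nb and regions_acr share the same order)
    wanted = regions_nb[np.isin(regions_acr, acr_list)]
    mask = np.isin(atlas, wanted)
    # Exclude voxels whose trace contains NaN
    mask &= ~np.isnan(volume).any(axis=-1)
    coords = np.argwhere(mask)  # same C order as volume[mask]
    if regions_centered:
        # Crop boundaries fitted to the selected voxels, as (start, stop) per axis
        boundaries = [(int(lo), int(hi) + 1) for lo, hi in zip(coords.min(axis=0), coords.max(axis=0))]
        coords = coords - np.array([lo for lo, _ in boundaries])
    else:
        boundaries = [(0, s) for s in atlas.shape]
    data = volume[mask]  # (n_voxels, T)
    return data, coords, boundaries

atlas = np.zeros((4, 4, 2), dtype=int)
atlas[1:3, 1:3, 0] = 7           # region "RegA" in a 2x2 patch of slice 0
volume = np.ones((4, 4, 2, 5))
volume[1, 1, 0, 2] = np.nan      # one voxel with a missing sample
data, coords, boundaries = select_regions(
    volume, atlas, ["RegA"], np.array([7, 8]), np.array(["RegA", "RegB"]))
print(data.shape, boundaries)    # (3, 5) [(1, 3), (1, 3), (0, 1)]
```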

clustering.volume_loading(filelist: List[str]) -> (ndarray, ndarray, ndarray, ndarray)

Utility function for loading volumes along with the necessary info: coordinates, file indices and shape of the volume. Note that NaN values are excluded during the process.

Parameters:

filelist (list of str) – List of paths to the files containing the volumes to be loaded.

Returns:

  • dataset (ndarray) – Contains the volumes data (time traces).

  • coords (ndarray) – Contains the coordinates in the volume of each voxel.

  • file_indices (array) – Array filled with int values corresponding to the volume the data was extracted from. For example, 0 will correspond to data from the first loaded file.

  • data_volume_shape (array) – Shape of the volumes (excluding the time dimension), used to reconstruct the volumes later.
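The way these outputs fit together can be sketched as follows. To keep the example self-contained, it takes in-memory (X, Y, Z, T) arrays instead of file paths, since the file format and reader are not described here. The helper name `stack_volumes` is hypothetical, and the sketch assumes all volumes share the same shape.

```python
import numpy as np

def stack_volumes(volumes):
    """Flatten (X, Y, Z, T) volumes into per-voxel traces, skipping NaN voxels."""
    dataset, coords, file_indices = [], [], []
    for idx, vol in enumerate(volumes):
        valid = ~np.isnan(vol).any(axis=-1)      # voxels with a complete trace
        dataset.append(vol[valid])               # (n_valid, T)
        coords.append(np.argwhere(valid))        # (n_valid, 3), same order as vol[valid]
        file_indices.append(np.full(valid.sum(), idx))
    data_volume_shape = np.array(volumes[0].shape[:-1])  # shape without the time axis
    return (np.concatenate(dataset), np.concatenate(coords),
            np.concatenate(file_indices), data_volume_shape)

a = np.zeros((2, 2, 1, 4))
b = np.ones((2, 2, 1, 4))
b[0, 0, 0, 1] = np.nan
dataset, coords, file_indices, shape = stack_volumes([a, b])
print(dataset.shape, file_indices)  # (7, 4) [0 0 0 0 1 1 1]
```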