clustering module


@author: Théo Lambert

This module gathers all the functions related to single-voxel clustering.

class clustering.SingleVoxelClustering(n_clusters: int, data: ndarray, coords: ndarray, file_indices, names: List[str], data_volume_shape: array, fe_method: str = 'pca', fe_params: dict | None = {}, noise_th: float | None = None)

Bases: object

Main class handling the single voxel clustering process.

Parameters:

n_clusters (int) – Number of clusters to form.
data (ndarray) – Data (time traces of a 3D volume) on which to perform the clustering.
coords (ndarray) – Array containing the volume coordinates of each voxel in ‘data’.
file_indices (array) – Array containing for each voxel the index identifying the corresponding file.
names (list of str) – Names identifying the files included in the clustering process.
data_volume_shape (array) – The shape of the ‘data’ array.
fe_method (str) – Method used for extracting the features; check the user guide for more info.
fe_params (dict) – Dictionary of parameters for the feature extraction method.
noise_th (float) – Threshold on the standard deviation of the baseline to remove ‘noisy’ voxels.
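The ‘noise_th’ filtering can be pictured with the minimal NumPy sketch below. It is not the module's implementation: it assumes the traces are arranged as (n_voxels, n_timepoints) and that the baseline is made of the first ‘baseline_len’ frames. Both the helper ‘filter_noisy_voxels’ and the ‘baseline_len’ parameter are hypothetical.

```python
import numpy as np

def filter_noisy_voxels(data: np.ndarray, noise_th: float, baseline_len: int = 10):
    """Hypothetical helper: keep voxels whose baseline std is <= noise_th.

    data: (n_voxels, n_timepoints) time traces.
    baseline_len: assumed number of leading frames forming the baseline.
    Returns the kept traces and the boolean mask of kept voxels.
    """
    baseline_std = data[:, :baseline_len].std(axis=1)
    keep = baseline_std <= noise_th
    return data[keep], keep

# Voxel 1 has a baseline alternating between +5 and -5 (std = 5), so it is removed.
traces = np.zeros((3, 20))
traces[1, :10] = np.tile([5.0, -5.0], 5)
filtered, mask = filter_noisy_voxels(traces, noise_th=1.0)
print(mask.tolist())  # [True, False, True]
```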

change_cmap(new_cmap: list | str, continuous=True)

Method for changing the colormap.

Parameters:

new_cmap (list | str) – Either a list of RGBA values or the name of a colormap available in pyplot.
continuous (bool) – Set to True if the colormap is continuous and False if it is discrete.
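One common way of deriving one color per cluster from a colormap is sketched below with NumPy only. It assumes that a continuous colormap is a callable mapping values in [0, 1] to RGBA tuples (as pyplot colormaps are) and that a discrete one is a list of RGBA values; ‘colors_for_clusters’ and the toy ‘blue_to_red’ map are hypothetical and not part of the module.

```python
import numpy as np

def colors_for_clusters(cmap, n_clusters: int, continuous: bool = True):
    """Hypothetical helper returning one RGBA tuple per cluster."""
    if continuous:
        # Sample the colormap at evenly spaced positions in [0, 1].
        return [tuple(cmap(float(x))) for x in np.linspace(0.0, 1.0, n_clusters)]
    # Discrete list: take the colors in order, cycling if there are too few.
    return [tuple(cmap[i % len(cmap)]) for i in range(n_clusters)]

def blue_to_red(x):
    """Toy continuous colormap going from blue (0) to red (1)."""
    return (x, 0.0, 1.0 - x, 1.0)

print(colors_for_clusters(blue_to_red, 3))
# [(0.0, 0.0, 1.0, 1.0), (0.5, 0.0, 0.5, 1.0), (1.0, 0.0, 0.0, 1.0)]
```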

cluster_dataset()

Method for assigning each time trace to a cluster. The default model is K-Means.
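The default K-Means step can be illustrated with a plain NumPy version of Lloyd's algorithm. This is a sketch for intuition only: the model and settings actually used by the module are not documented here, and ‘kmeans’ is a hypothetical function operating on the extracted features.

```python
import numpy as np

def kmeans(features: np.ndarray, n_clusters: int, n_iter: int = 100, seed: int = 0):
    """Hypothetical minimal Lloyd's algorithm. features: (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with randomly chosen distinct samples.
    centroids = features[rng.choice(len(features), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (squared Euclidean distance).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its samples (keep it if the cluster is empty).
        new_centroids = np.array([
            features[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of 2D feature vectors.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, _ = kmeans(pts, n_clusters=2)
```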

compute_amplitude_map(file_idx, extrema)

Method for computing the transparency map based on the normalized amplitude.

Parameters:

file_idx (int) – Index of the file to be displayed.
extrema (dict) – A dictionary containing the extrema for each cluster for the amplitude normalization.

Returns:

amplitude_map – The map containing the transparency values.

Return type:

ndarray
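A sketch of the cluster-specific normalization behind such a transparency map. It assumes each voxel's amplitude is summarized by one scalar (here, hypothetically, the peak absolute value of its trace) and that ‘extrema’ maps each cluster ID to a (min, max) tuple; ‘amplitude_alpha’ is a hypothetical helper, not the method itself.

```python
import numpy as np

def amplitude_alpha(traces, labels, extrema):
    """Hypothetical helper: per-voxel transparency in [0, 1].

    traces: (n_voxels, n_timepoints); labels: (n_voxels,) cluster IDs;
    extrema: {cluster_id: (low, high)} used for cluster-specific normalization.
    """
    amplitude = np.abs(traces).max(axis=1)  # assumed amplitude summary
    alpha = np.zeros(len(traces))
    for k, (low, high) in extrema.items():
        in_k = labels == k
        span = high - low if high > low else 1.0  # avoid division by zero
        alpha[in_k] = (amplitude[in_k] - low) / span
    # Values outside the (quantile-based) extrema are clipped.
    return np.clip(alpha, 0.0, 1.0)

traces = np.array([[0.0, 2.0], [0.0, 4.0], [0.0, 1.0]])
labels = np.array([0, 0, 1])
alpha = amplitude_alpha(traces, labels, {0: (2.0, 4.0), 1: (0.0, 2.0)})
print(alpha.tolist())  # [0.0, 1.0, 0.5]
```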

extract_features(fe_method: str, fe_params: dict)

Method for applying the feature extraction process to the data.

Parameters:

fe_method (str | object) – Refer to the feature extraction package for more details. Accepted values: ‘pca’, ‘ica’, ‘nnmf’, or an object with a ‘fit’ or ‘fit_predict’ method.
fe_params (dict) – Dictionary containing the parameters to provide to the feature extractor.

Returns:

features – Features extracted from the data.

Return type:

ndarray
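For the ‘pca’ option, the idea is to project each time trace onto the leading principal components. The NumPy sketch below obtains them through an SVD of the centered data, assuming samples are rows. It is an illustration, not the package's implementation; ‘pca_features’ and ‘n_components’ are hypothetical names (the keys actually accepted in ‘fe_params’ are described in the feature extraction package).

```python
import numpy as np

def pca_features(data: np.ndarray, n_components: int) -> np.ndarray:
    """Hypothetical helper: project (n_samples, n_timepoints) data onto its first principal axes."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
traces = rng.normal(size=(50, 30))
features = pca_features(traces, n_components=3)
print(features.shape)  # (50, 3)
```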

flatten_volume(volume: ndarray, volume_boundaries: array | None = None, ncols: int = 8, apply_colormap: bool = True) → ndarray

Utility function for flattening a 3D volume so that it can be displayed as a mosaic of 2D images, where each value is the cluster attribution of the associated voxel.

Parameters:

volume (ndarray) – The 3D volume to be flattened.
volume_boundaries (array | None) – For setting custom boundaries; if None, the dimensions of the whole volume are used.
ncols (int) – Number of columns to be used for the flattening.
apply_colormap (bool) – If True, the colormap is applied to replace cluster IDs with the corresponding color.

Returns:

res – A 2D array whose values are either int (cluster ID) or tuple (color associated with the cluster ID), depending on the ‘apply_colormap’ parameter.

Return type:

ndarray
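The flattening amounts to laying the slices of the volume side by side in a grid with ‘ncols’ columns. A minimal NumPy sketch, assuming slices are taken along the last axis and unused grid cells are padded with a fill value (both assumptions; the colormap step is left out). ‘mosaic’ is a hypothetical name.

```python
import numpy as np

def mosaic(volume: np.ndarray, ncols: int = 8, fill=-1) -> np.ndarray:
    """Hypothetical helper: arrange the slices volume[:, :, z] in a grid with ncols columns."""
    nx, ny, nz = volume.shape
    nrows = -(-nz // ncols)  # ceiling division
    res = np.full((nrows * nx, ncols * ny), fill, dtype=volume.dtype)
    for z in range(nz):
        r, c = divmod(z, ncols)
        res[r * nx:(r + 1) * nx, c * ny:(c + 1) * ny] = volume[:, :, z]
    return res

vol = np.arange(2 * 3 * 5).reshape(2, 3, 5)  # 5 slices of 2x3
flat = mosaic(vol, ncols=2)
print(flat.shape)  # (6, 6)
```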

generate_background_img()

Method for generating the microDoppler background image.

get_cluster_locations(file_idx: int, volume_boundaries: array | None = None, plot: bool = True, registered: bool = False, atlas_path: None | str = None, cluster_ids=None, extrema=None, hires_dst=None)

Method for displaying the map in which each voxel is color-coded depending on its cluster attribution.

Parameters:

file_idx (int) – Index of the file to be displayed.
volume_boundaries (array | None) – For setting custom boundaries; if None, the dimensions of the whole volume are used.
plot (bool) – If True, a map in which each voxel is color-coded depending on its cluster attribution will be displayed.
registered (bool) – If the data was registered during the data loading, set to True to have the atlas info available through the cursor.
atlas_path (str) – Path to the atlas that the data has been registered to.
cluster_ids (list of int | None) – The list of clusters to be displayed.
extrema (dict) – Dictionary containing for each cluster the extrema for normalization.
hires_dst (str | None) – If string, path where the hi-res files will be saved. If None, hi-res files will be neither generated nor saved.

Returns:

volume – A volume (same size as the original data) in which each value represents the cluster attribution.

Return type:

ndarray
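The returned volume can be understood as scattering each voxel's cluster ID back to its coordinates. A sketch under the assumption that ‘coords’ holds integer (x, y, z) indices for the voxels of the selected file and that voxels without a cluster (e.g., NaN or noisy ones) are marked with -1; ‘labels_to_volume’ is hypothetical.

```python
import numpy as np

def labels_to_volume(labels, coords, shape, background=-1):
    """Hypothetical helper: place each voxel's cluster ID at its (x, y, z) coordinates."""
    volume = np.full(shape, background, dtype=int)
    volume[tuple(np.asarray(coords).T)] = labels
    return volume

coords = np.array([[0, 0, 0], [1, 2, 0], [1, 1, 1]])
labels = np.array([2, 0, 1])
vol = labels_to_volume(labels, coords, shape=(2, 3, 2))
print(vol[1, 2, 0], vol[0, 1, 0])  # 0 -1
```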

get_signals_or_coords(output: str = 'signals', file_idx: int | None = None)

Generator yielding, for each sample in data, its cluster ID and either its time trace or its coordinates.

Parameters:

output (str) – If ‘signals’, yield the time trace of each sample in data. If ‘coords’, the coordinates instead.
file_idx (int | None) – If int, only yield the samples associated with that file index. If None, yield all samples.
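A sketch of the generator's behavior, assuming the cluster labels, traces, coordinates and file indices are aligned NumPy arrays with one entry per sample; ‘iter_samples’ is a hypothetical stand-alone version, not the method itself.

```python
import numpy as np

def iter_samples(labels, data, coords, file_indices, output="signals", file_idx=None):
    """Hypothetical helper: yield (cluster_id, trace) or (cluster_id, coords) per sample."""
    if output not in ("signals", "coords"):
        raise ValueError("output must be 'signals' or 'coords'")
    values = data if output == "signals" else coords
    for label, value, f in zip(labels, values, file_indices):
        if file_idx is None or f == file_idx:
            yield label, value

labels = np.array([0, 1, 1])
data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
coords = np.array([[0, 0, 0], [0, 1, 0], [2, 2, 2]])
files = np.array([0, 0, 1])
for cid, xyz in iter_samples(labels, data, coords, files, output="coords", file_idx=0):
    print(cid, xyz.tolist())
# 0 [0, 0, 0]
# 1 [0, 1, 0]
```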

merge_clusters(pairs_to_merge, adjust_cmap=True)

Utility function for merging clusters together.

Parameters:

pairs_to_merge (list of tuple of int) – List of pairs of clusters to be merged. The higher cluster number is merged into the lower one, i.e., if clusters 5 and 3 are merged, the 5s in the cluster maps become 3s. Example input: [(5,3), (1,6)].
adjust_cmap (bool) – Whether or not to adjust the colormap to the new number of clusters.
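The merging rule (the higher ID is replaced by the lower one) can be sketched on a label array as follows. This assumes the pairs are applied in the given order and that the remaining IDs are not renumbered afterwards, neither of which the documentation specifies; ‘merge_labels’ is hypothetical.

```python
import numpy as np

def merge_labels(labels: np.ndarray, pairs_to_merge) -> np.ndarray:
    """Hypothetical helper: replace the higher cluster ID of each pair by the lower one."""
    merged = labels.copy()
    for a, b in pairs_to_merge:
        high, low = max(a, b), min(a, b)
        merged[merged == high] = low
    return merged

labels = np.array([0, 1, 3, 5, 6, 5])
print(merge_labels(labels, [(5, 3), (1, 6)]).tolist())  # [0, 1, 3, 3, 1, 3]
```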

plot_signals(file_idx: int | None = None, display: str = 'all', ncols: int = 4, scale: None | Tuple[float, float] = None, stimulation_pattern=None)

Method for plotting the signals in each cluster.

Parameters:

file_idx (int | None) – If int, only the traces of the file with index ‘file_idx’ will be plotted.
display (str) – Defines how the traces are displayed: ‘all’ displays all traces; ‘mean_std’ only displays the mean trace and the standard deviation.
ncols (int) – Number of columns for the display.
scale (tuple of float | None) – Min and max scales for the display (arguments vmin / vmax from plt.imshow). If None, auto scale will be used.
stimulation_pattern (list of tuple of int | None) – List of (beginning, end) pairs delimiting the stimulation windows.
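With display='mean_std', each cluster is summarized by its mean trace and the standard deviation across traces at each time point. The computation behind such a plot could look like this NumPy sketch (plotting omitted). ‘cluster_mean_std’ is a hypothetical helper, and whether the module uses the population or the sample standard deviation is not documented; the sketch uses NumPy's default (population).

```python
import numpy as np

def cluster_mean_std(traces: np.ndarray, labels: np.ndarray) -> dict:
    """Hypothetical helper: {cluster_id: (mean_trace, std_trace)}, computed across traces."""
    return {
        int(k): (traces[labels == k].mean(axis=0), traces[labels == k].std(axis=0))
        for k in np.unique(labels)
    }

traces = np.array([[0.0, 2.0], [2.0, 4.0], [10.0, 10.0]])
labels = np.array([0, 0, 1])
stats = cluster_mean_std(traces, labels)
print(stats[0][0].tolist(), stats[0][1].tolist())  # [1.0, 3.0] [1.0, 1.0]
```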

print_file_info()

Prints to the terminal the names of the files included in the process and their associated indices.

reset_cmap()

Function for resetting the colormap to its default values, including when the number of clusters has changed.

switch_colors_in_cmap(color_swaps)

Utility method for swapping colors in the colormap to adjust the display of the cluster traces.

Parameters:

color_swaps (list of tuple of int) – List containing pairs of colors to be switched.
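Swapping colors only permutes entries of the colormap: the clusters keep their IDs but exchange display colors. A sketch on a plain list of colors, assuming each pair holds two cluster indices into that list; ‘swap_colors’ is hypothetical.

```python
def swap_colors(cmap: list, color_swaps) -> list:
    """Hypothetical helper: return a copy of cmap with the colors of each (i, j) pair exchanged."""
    colors = list(cmap)
    for i, j in color_swaps:
        colors[i], colors[j] = colors[j], colors[i]
    return colors

print(swap_colors(["red", "green", "blue"], [(0, 2)]))  # ['blue', 'green', 'red']
```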

class clustering.SingleVoxelClusteringWrapper(method: str, n_clusters: int, filelist: List[str], atlas_path: str, regions_info_path: str, fe_method: None | str = None, fe_params: dict = {}, noise_th: float | None = None, registered: bool = True, normalization: None | str = None)

Bases: object

Wrapper to perform the single voxel clustering in various configurations.

Parameters:

method (str) –
Choice of the method, between:
- volume: all voxels in the image
- hemisphere: all voxels in one or both hemispheres
- structure: all voxels belonging to a list of structures
- multiregion: all voxels belonging to a list of regions
n_clusters (int) – Number of clusters to search for.
filelist (list of string) – List of paths to the volumes whose voxels will be clustered.
atlas_path (string) – Path to the file containing the atlas volume.
regions_info_path (string) – Path to the text file listing all the regions included in the atlas.
fe_method (string) – Selection of the feature extraction method, to choose between ‘pca’, ‘ica’, ‘nnmf’ or custom.
fe_params (dict) – Selection of the feature extraction parameters.
noise_th (float) – Threshold on the standard deviation of the baseline to remove ‘noisy’ voxels.

change_cmap(new_cmap: None | list | str = None, continuous: bool = True)

Method for changing the colormap.

Parameters:

new_cmap (None | list | str) – Either a list of RGBA values or the name of a colormap available in pyplot.
continuous (bool) – Set to True if the colormap is continuous and False if it is discrete.

compute_amplitude_extrema(quantiles)

Utility function for computing, for each cluster, the extrema used to derive the transparency values when parameter ‘amplitude_transparency’ of method ‘display_cluster_locations’ is enabled. Note: since the true extrema are sensitive to noise, quantiles are used instead.

Parameters:

quantiles (list of float) – List of size 2 containing the quantiles for the minimum and maximum estimation respectively.

Returns:

extrema – Dictionary containing for each cluster the extrema for normalization.

Return type:

dict of tuple
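A NumPy sketch of quantile-based extrema, assuming one scalar amplitude and one cluster label per voxel. ‘quantile_extrema’ is a hypothetical helper that mirrors the documented return type (a dict of (min, max) tuples); it shows why a single outlier no longer sets the upper bound.

```python
import numpy as np

def quantile_extrema(amplitudes, labels, quantiles=(0.05, 0.95)) -> dict:
    """Hypothetical helper: robust per-cluster (low, high) bounds from quantiles."""
    q_low, q_high = quantiles
    return {
        int(k): (float(np.quantile(amplitudes[labels == k], q_low)),
                 float(np.quantile(amplitudes[labels == k], q_high)))
        for k in np.unique(labels)
    }

amplitudes = np.append(np.arange(1.0, 101.0), 1000.0)  # 1..100 plus one outlier
labels = np.zeros(101, dtype=int)
extrema = quantile_extrema(amplitudes, labels)
# The 1000.0 outlier does not set the upper bound, unlike amplitudes.max().
```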

display_atlas_mask_selected_regions(regions_list)

Utility function for displaying the selected regions on the atlas.

Parameters:

regions_list (list of str) – List of regions acronyms to be displayed on the atlas.

display_cluster_locations(names: list | None = None, amplitude_transparency=False, quantiles=[0.05, 0.95], cluster_ids=None, hires_dst=None)

Utility function for calling the display function for spatial locations depending on the method that was selected.

Parameters:

names (list) – Identifiers of the elements to be displayed.
amplitude_transparency (bool) – Whether or not to modulate the transparency of each voxel's cluster color based on the amplitude of its signal. Modulation is cluster-specific.
quantiles (list of float) – List of size 2 containing the quantiles for the minimum and maximum estimation respectively. Only used if ‘amplitude_transparency’ is True.
cluster_ids (None | list) – List of the clusters to be displayed. If None, all clusters will be displayed.
hires_dst (str | None) – If string, path where the hi-res files will be saved. If None, hi-res files will be neither generated nor saved.

get_cluster_maps() → dict

Utility function for getting the arrays containing the cluster maps where each voxel’s value represents its cluster attribution.

Parameters:

use_boundaries (bool) – Whether or not to adjust the boundaries of the maps to the selected regions or structure. If False, the full maps are returned.

Returns:

res – A dictionary with keys being the element names (e.g., all session IDs) and values the associated cluster maps.

Return type:

dict

get_names()

Method for printing the names of the data elements in the clustering object.

get_regions_from_groups(group: str, hemisphere: str)

Utility function for converting an anatomical group into a list of regions included in that group in the format expected by the clustering code.

Parameters:

group (str) – Acronym of the anatomical group
hemisphere (str) – ‘L’ or ‘R’ respectively for left and right hemispheres, or ‘LR’ for both.

Returns:

res – List of tuples in the format (region acronym, hemisphere ‘L’ or ‘R’).

Return type:

list
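The expected output format can be illustrated with a hypothetical mapping from group acronyms to region acronyms. The real mapping comes from the atlas region information and is not reproduced here; ‘regions_from_group’, ‘GROUPS’ and the acronyms it contains are placeholders.

```python
def regions_from_group(group: str, hemisphere: str, groups: dict) -> list:
    """Hypothetical helper: expand a group into (region acronym, 'L' or 'R') tuples."""
    if hemisphere not in ("L", "R", "LR"):
        raise ValueError("hemisphere must be 'L', 'R' or 'LR'")
    # Iterating over 'LR' yields 'L' then 'R'.
    return [(region, side) for side in hemisphere for region in groups[group]]

# Placeholder mapping, for illustration only.
GROUPS = {"GRP": ["A1", "A2"]}
print(regions_from_group("GRP", "LR", GROUPS))
# [('A1', 'L'), ('A2', 'L'), ('A1', 'R'), ('A2', 'R')]
```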

get_signals(reduction='median') → dict

Utility function for accessing the temporal signals associated with each cluster.

Parameters:

reduction (str | None) – If ‘median’, returns the median of all signals in each cluster. If None, returns all signals.

Returns:

res – A dictionary with keys being the cluster IDs and values the associated temporal trace(s).

Return type:

dict
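With reduction='median', each cluster is summarized by the median of its traces at each time point. A sketch of that behavior on aligned NumPy arrays; ‘signals_by_cluster’ is a hypothetical helper, not the method itself.

```python
import numpy as np

def signals_by_cluster(traces, labels, reduction="median") -> dict:
    """Hypothetical helper: {cluster_id: median trace} or {cluster_id: all traces}."""
    if reduction not in ("median", None):
        raise ValueError("reduction must be 'median' or None")
    res = {}
    for k in np.unique(labels):
        members = traces[labels == k]
        res[int(k)] = np.median(members, axis=0) if reduction == "median" else members
    return res

traces = np.array([[1.0, 5.0], [2.0, 6.0], [9.0, 0.0], [4.0, 4.0]])
labels = np.array([0, 0, 0, 1])
print(signals_by_cluster(traces, labels)[0].tolist())  # [2.0, 5.0]
```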

merge_clusters(pairs_to_merge, adjust_cmap=True)

Utility function for merging clusters together.

Parameters:

pairs_to_merge (list of tuple of int) – List of pairs of clusters to be merged. The higher cluster number is merged into the lower one, i.e., if clusters 5 and 3 are merged, the 5s in the cluster maps become 3s. Example input: [(5,3), (1,6)].
adjust_cmap (bool) – Whether or not to adjust the colormap to the new number of clusters.

normalize(data)

plot_signals(display: str = 'all', scale: None | Tuple[float, float] = None, stimulation_pattern=None)

Utility function for calling the display function for temporal traces depending on the method that was selected.

Parameters:

display (str) – Either ‘all’ to display all signals or ‘mean_std’ for mean/std of the signals.
scale (tuple of float | None) – Min and max scales for the display (arguments vmin / vmax from plt.imshow). If None, auto scale will be used.
stimulation_pattern (list of tuple of int | None) – List of (beginning, end) pairs delimiting the stimulation windows.

process(acr_list: str | List[Tuple[str, str]] | None = None)

Main method to call for clustering the data.

Parameters:

acr_list (list of tuple | None) – If None, all regions are considered. Otherwise, only the acronyms present in the list will be included in the clustering process. The format of each tuple is (<region acronym>, <hemisphere>).

reset_cmap()

Method for resetting the colormap.

switch_colors_in_cmap(color_swaps)

Utility method for swapping colors in the colormap to adjust the display of the cluster traces.

Parameters:

color_swaps (list of tuple of int) – List containing pairs of colors to be switched.

clustering.region_loading(filelist: List[str], acr_list: List[str], atlas: ndarray, regions_nb: ndarray, regions_acr: ndarray, regions_centered: bool = True) → (ndarray, ndarray, ndarray, ndarray, List[Tuple[int, int]])

Utility function for loading the voxels of the selected regions from the volumes, along with the necessary info: coordinates, file indices and shape of the volume. Note that ‘nan’ values are excluded during the process.

Parameters:

filelist (list of str) – The list of paths to the files that will be included in the clustering process.
acr_list (list of str) – The list of the acronyms to be used during the clustering.
atlas (ndarray) – 3D volume of the atlas, where each voxel’s value corresponds to its region number.
regions_nb (ndarray) – Array containing the regions’ numbers.
regions_acr (ndarray) – Array containing the regions’ acronyms. Same order as regions_nb.
regions_centered (bool) – If True, the output will be cropped so that the boundaries fit the selected areas.

Returns:

data (ndarray) – Contains the volume data (time traces).
coords (ndarray) – Contains the coordinates in the volume of each voxel.
file_indices (array) – Array filled with int values corresponding to the volume the data was extracted from. For example, 0 will correspond to data from the first loaded file.
data_volume_shape (list of array) – List of shapes of the volumes (excluding the time dimension) for reconstructing the volumes later.
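The region selection and the optional cropping (‘regions_centered’) can be sketched with NumPy as below. The sketch assumes the atlas is an integer label volume aligned with the data and that acronyms are translated to region numbers through the paired ‘regions_acr’ / ‘regions_nb’ arrays; it ignores hemisphere handling. ‘region_mask’ is a hypothetical helper and the acronyms in the example are placeholders.

```python
import numpy as np

def region_mask(atlas, acr_list, regions_nb, regions_acr, regions_centered=True):
    """Hypothetical helper: boolean mask of voxels in the selected regions, optionally cropped."""
    # Translate acronyms into region numbers (regions_acr[i] <-> regions_nb[i]).
    selected = regions_nb[np.isin(regions_acr, acr_list)]
    mask = np.isin(atlas, selected)
    if regions_centered and mask.any():
        # Crop to the bounding box of the selected voxels.
        idx = np.argwhere(mask)
        lo, hi = idx.min(axis=0), idx.max(axis=0) + 1
        mask = mask[tuple(slice(a, b) for a, b in zip(lo, hi))]
    return mask

atlas = np.zeros((4, 4, 2), dtype=int)
atlas[1:3, 1:2, :] = 7
cropped = region_mask(atlas, ["A"], np.array([7, 8]), np.array(["A", "B"]))
print(cropped.shape)  # (2, 1, 2)
```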

clustering.volume_loading(filelist: List[str]) → (ndarray, ndarray, ndarray, ndarray)

Utility function for loading volumes along with the necessary info: coordinates, file indices and shape of the volume. Note that ‘nan’ values are excluded during the process.

Parameters:

filelist (list of str) – List of paths to the files containing the volumes to be loaded.

Returns:

dataset (ndarray) – Contains the volumes data (time traces).
coords (ndarray) – Contains the coordinates in the volume of each voxel.
file_indices (array) – Array filled with int values corresponding to the volume the data was extracted from. For example, 0 will correspond to data from the first loaded file.
data_volume_shape (array) – Shape of the volumes (excluding the time dimension) for reconstructing the volume later.
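The bookkeeping described above (dropping NaN voxels while recording their coordinates, file index and the volume shape) can be sketched as follows. For self-containment the volumes are passed as in-memory arrays of shape (x, y, z, t) rather than read from files, all volumes are assumed to share the same shape, and a voxel is dropped if any of its time points is NaN; these are assumptions. ‘stack_volumes’ is hypothetical.

```python
import numpy as np

def stack_volumes(volumes):
    """Hypothetical helper: concatenate the non-NaN voxels of several (x, y, z, t) volumes."""
    data, coords, file_indices = [], [], []
    for i, vol in enumerate(volumes):
        valid = ~np.isnan(vol).any(axis=-1)  # drop voxels containing NaN
        data.append(vol[valid])              # (n_valid, t) time traces, C order
        coords.append(np.argwhere(valid))    # (n_valid, 3) coordinates, same C order
        file_indices.append(np.full(int(valid.sum()), i))
    shape = np.array(volumes[0].shape[:-1])  # spatial shape, time excluded
    return np.concatenate(data), np.concatenate(coords), np.concatenate(file_indices), shape

v1 = np.ones((2, 2, 1, 3))
v1[0, 0, 0, 1] = np.nan  # this voxel is excluded
v2 = np.zeros((2, 2, 1, 3))
data, coords, fidx, shape = stack_volumes([v1, v2])
print(data.shape, fidx.tolist())  # (7, 3) [0, 0, 0, 1, 1, 1, 1]
```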
