clustering module
@author: Théo Lambert
This module gathers all the functions related to single-voxel clustering.
- class clustering.SingleVoxelClustering(n_clusters: int, data: ndarray, coords: ndarray, file_indices, names: List[str], data_volume_shape: array, fe_method: str = 'pca', fe_params: dict | None = {}, noise_th: float | None = None)
Bases: object
Main class handling the single voxel clustering process.
- Parameters:
n_clusters (int) – Number of clusters to form.
data (ndarray) – Data volume (3D in time) on which to perform the clustering.
coords (ndarray) – Array containing the volume coordinates of each voxel in ‘data’.
file_indices (array) – Array containing for each voxel the index identifying the corresponding file.
names (list of str) – Names identifying the files included in the process, matched to the indices used in ‘file_indices’.
data_volume_shape (array) – The shape of the ‘data’ array.
fe_method (str) – Selection of the method for extracting the features, check the user guide for more info.
fe_params (dict) – Dictionary of parameters for the feature extraction method.
noise_th (float) – Threshold on the standard deviation of the baseline to remove ‘noisy’ voxels.
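The effect of ‘noise_th’ can be pictured with the sketch below. It is a minimal, dependency-free illustration and not the module's implementation: the helper name `drop_noisy_voxels`, the baseline window `n_baseline`, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import pstdev


def drop_noisy_voxels(traces, noise_th, n_baseline):
    """Return the indices of traces whose baseline std does not exceed noise_th.

    Hypothetical helper: the baseline is assumed to be the first
    n_baseline samples of each trace.
    """
    kept = []
    for idx, trace in enumerate(traces):
        baseline = trace[:n_baseline]
        # noise_th=None is taken to mean "no filtering".
        if noise_th is None or pstdev(baseline) <= noise_th:
            kept.append(idx)
    return kept


traces = [
    [0.0, 0.1, 0.0, 0.1, 2.0],   # quiet baseline
    [0.0, 3.0, -3.0, 3.0, 2.0],  # noisy baseline
]
kept_indices = drop_noisy_voxels(traces, noise_th=0.5, n_baseline=4)
print(kept_indices)
```

Only the first trace survives the filter here, since the second one fluctuates strongly during the assumed baseline window.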
- change_cmap(new_cmap: list | str, continuous=True)
Method for changing the colormap.
- Parameters:
new_cmap (list | str) – Either a list of RGBA values or the name of a colormap available in pyplot.
continuous (bool) – Set to True if the colormap is continuous and False if sequential.
- cluster_dataset()
Method for assigning each time trace to a cluster. The default model is K-Means.
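For readers unfamiliar with K-Means, the sketch below shows the assign-then-update loop on a handful of feature vectors. It is a plain-Python stand-in written for illustration only: the function name `kmeans`, the initialization from the first `k` points, and the fixed iteration count are assumptions, and the module itself may rely on a library implementation.

```python
def kmeans(points, k, n_iter=20):
    """Tiny Lloyd-style K-Means on lists of floats (illustrative only)."""
    # Assumption: the first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```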
- compute_amplitude_map(file_idx, extrema)
Method for computing the transparency map based on the normalized amplitude.
- Parameters:
file_idx (int) – Index of the file to be displayed.
extrema (dict) – A dictionary containing the extrema for each cluster for the amplitude normalization.
- Returns:
amplitude_map – The map containing the transparency values.
- Return type:
ndarray
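One plausible reading of "normalized amplitude" is a min-max rescaling of each voxel's amplitude against the extrema of its cluster, clipped to [0, 1]. The sketch below illustrates that reading; the helper `amplitude_to_alpha`, the clipping, and the fallback for degenerate extrema are assumptions, not the documented behaviour.

```python
def amplitude_to_alpha(amplitude, extrema):
    """Map an amplitude to a transparency value in [0, 1] (hypothetical helper)."""
    low, high = extrema
    if high <= low:
        # Degenerate range: fall back to fully opaque (an assumption).
        return 1.0
    alpha = (amplitude - low) / (high - low)
    return min(1.0, max(0.0, alpha))


# extrema maps each cluster ID to its (min, max) pair.
extrema = {0: (0.0, 10.0), 1: (2.0, 4.0)}
# Each voxel is described here as (cluster ID, amplitude).
voxels = [(0, 5.0), (0, 12.0), (1, 1.0), (1, 3.0)]
alphas = [amplitude_to_alpha(a, extrema[c]) for c, a in voxels]
print(alphas)
```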
- extract_features(fe_method: str, fe_params: dict)
Method for applying the feature extraction process to the data.
- Parameters:
fe_method (str | object) – Refer to the feature extraction package for more details. Accepted values: ‘pca’, ‘ica’, ‘nnmf’, or an object exposing a ‘fit’ or ‘fit_predict’ method.
fe_params (dict) – Dictionary containing the parameters to provide to the feature extractor.
- Returns:
features – Features extracted from the data.
- Return type:
ndarray
- flatten_volume(volume: ndarray, volume_boundaries: array | None = None, ncols: int = 8, apply_colormap: bool = True) -> ndarray
Utility function for flattening a 3D volume so that it can be displayed as a mosaic of 2D images, where each value is the cluster attribution of the associated voxel.
- Parameters:
volume (ndarray) – The 3D volume to be flattened.
volume_boundaries (array | None) – For setting custom boundaries. If None, the dimensions of the whole volume will be used.
ncols (int) – Number of columns to be used for the flattening.
apply_colormap (bool) – If True, the colormap is applied to replace cluster IDs with the corresponding color.
- Returns:
res – A 2D array with values being either int (cluster ID) or tuple (color associated with cluster ID) depending on the ‘apply_colormap’ parameter.
- Return type:
ndarray
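The mosaic layout can be sketched as follows, using nested lists to stay dependency-free. This is an illustration under stated assumptions rather than the method's code: the function name `flatten_to_mosaic`, the choice of the first axis as the slicing axis, and the `fill` value for unused tiles are all hypothetical, and the colormap step is left out.

```python
def flatten_to_mosaic(volume, ncols, fill=0):
    """Tile the 2D slices of volume[slice][row][col] into one 2D grid."""
    n_slices = len(volume)
    height = len(volume[0])
    width = len(volume[0][0])
    nrows = -(-n_slices // ncols)  # ceiling division
    mosaic = [[fill] * (ncols * width) for _ in range(nrows * height)]
    for idx, image in enumerate(volume):
        tile_row, tile_col = divmod(idx, ncols)
        for y in range(height):
            start = tile_col * width
            mosaic[tile_row * height + y][start:start + width] = image[y]
    return mosaic


# Three slices of shape 1 x 2, laid out on two columns.
volume = [[[1, 1]], [[2, 2]], [[3, 3]]]
mosaic = flatten_to_mosaic(volume, ncols=2)
for row in mosaic:
    print(row)
```

The last tile of the grid is left at the fill value because three slices do not fill a 2 x 2 layout.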
- generate_background_img()
Method for generating the microDoppler background image.
- get_cluster_locations(file_idx: int, volume_boundaries: array | None = None, plot: bool = True, registered: bool = False, atlas_path: None | str = None, cluster_ids=None, extrema=None, hires_dst=None)
Method for displaying the map in which each voxel is color-coded depending on its cluster attribution.
- Parameters:
file_idx (int) – Index of the file to be displayed.
volume_boundaries (array | None) – For setting custom boundaries. If None, the dimensions of the whole volume will be used.
plot (bool) – If True, a map in which each voxel is color-coded depending on its cluster attribution will be displayed.
registered (bool) – If the data was registered during the data loading, set to True to have the atlas info available through the cursor.
atlas_path (str) – Path to the atlas that the data has been registered to.
cluster_ids (list of int | None) – The list of clusters to be displayed.
extrema (dict) – Dictionary containing for each cluster the extrema for normalization.
hires_dst (str | None) – If string, path where the hi-res files will be saved. If None, hi-res files will be neither generated nor saved.
- Returns:
volume – A volume (same size as the original data) in which each value represents the cluster attribution.
- Return type:
ndarray
- get_signals_or_coords(output: str = 'signals', file_idx: int | None = None)
Generates an iterator yielding, for each sample in data, the cluster ID and either the data or the coordinates.
- Parameters:
output (str) – If ‘signals’, yield the time trace of each sample in data. If ‘coords’, the coordinates instead.
file_idx (int | None) – If int, yield the signals associated with the file index. Else, returns everything.
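The iteration pattern described above can be sketched with a plain generator. The function `iter_cluster_samples` and its explicit arguments are hypothetical; in the class these arrays would presumably be attributes of the object.

```python
def iter_cluster_samples(labels, signals, coords, file_indices,
                         output="signals", file_idx=None):
    """Yield (cluster ID, signal) or (cluster ID, coords) pairs."""
    if output not in ("signals", "coords"):
        raise ValueError("output must be 'signals' or 'coords'")
    source = signals if output == "signals" else coords
    for label, item, f_idx in zip(labels, source, file_indices):
        # file_idx=None means samples from every file are yielded.
        if file_idx is None or f_idx == file_idx:
            yield label, item


labels = [0, 1, 0]
signals = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
coords = [(0, 0, 0), (0, 1, 0), (2, 1, 1)]
file_indices = [0, 0, 1]

from_file_1 = list(iter_cluster_samples(labels, signals, coords, file_indices,
                                        output="coords", file_idx=1))
print(from_file_1)
```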
- merge_clusters(pairs_to_merge, adjust_cmap=True)
Utility function for merging clusters together.
- Parameters:
pairs_to_merge (list of tuple of int) – List of pairs of clusters to be merged. The higher cluster number is merged into the lower one, i.e. if clusters 5 and 3 are merged, 5s become 3s in the cluster maps. Example input: [(5,3), (1,6)].
adjust_cmap (bool) – Whether or not to adjust the colormap to the new number of clusters.
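The merging rule (the higher ID is relabelled to the lower one) can be sketched as below. `merge_labels` is a hypothetical helper operating on a flat list of cluster IDs; applying the pairs sequentially and leaving the remaining IDs unchanged are assumptions of this sketch.

```python
def merge_labels(labels, pairs_to_merge):
    """Relabel clusters so that, in each pair, the higher ID becomes the lower."""
    merged = list(labels)
    for first, second in pairs_to_merge:
        keep, drop = min(first, second), max(first, second)
        merged = [keep if label == drop else label for label in merged]
    return merged


labels = [5, 3, 1, 6, 2]
merged = merge_labels(labels, [(5, 3), (1, 6)])
print(merged)
```

With the example input from the parameter description, the 5s become 3s and the 6s become 1s, whatever the order inside each tuple.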
- plot_signals(file_idx: int | None = None, display: str = 'all', ncols: int = 4, scale: None | Tuple[float, float] = None, stimulation_pattern=None)
Method for plotting the signals in each cluster.
- Parameters:
file_idx (int | None) – If int, only the traces of the file with index ‘file_idx’ will be plotted.
display (str) – Defines how the traces are displayed: ‘all’ displays all traces; ‘mean_std’ displays only the mean trace and the standard deviation.
ncols (int) – Number of columns for the display.
scale (tuple of float | None) – Min and max scales for the display (arguments vmin / vmax from plt.imshow). If None, auto scale will be used.
stimulation_pattern (list of tuple of int | None) – List containing beginnings and ends of stimulation windows.
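To make the ‘stimulation_pattern’ format concrete, the sketch below converts a list of (beginning, end) pairs into a per-sample boolean mask. Treating the values as sample indices and the end as exclusive are assumptions made for the example; the documentation above does not state the units or the boundary convention.

```python
def stimulation_mask(n_samples, stimulation_pattern):
    """Return a list of booleans, True inside any stimulation window."""
    mask = [False] * n_samples
    for start, end in stimulation_pattern:
        # Assumption: windows are [start, end) in sample indices.
        for t in range(max(0, start), min(end, n_samples)):
            mask[t] = True
    return mask


mask = stimulation_mask(8, [(1, 3), (5, 7)])
print(mask)
```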
- print_file_info()
Prints to the terminal the names of the files included in the process and their associated indices.
- reset_cmap()
Function for resetting the colormap to default values, including when the number of clusters has changed.
- switch_colors_in_cmap(color_swaps)
Utility method for swapping colors in the colormap to adjust the display of the cluster traces.
- Parameters:
color_swaps (list of tuple of int) – List containing pairs of colors to be switched.
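A minimal sketch of the swap operation follows. It assumes that each pair in ‘color_swaps’ holds two indices into the colormap and that swaps are applied in order; the helper name `swap_colors` and the color names are illustrative only.

```python
def swap_colors(cmap, color_swaps):
    """Return a copy of cmap with the listed index pairs exchanged."""
    swapped = list(cmap)
    for i, j in color_swaps:
        swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped


cmap = ["red", "green", "blue", "orange"]
new_cmap = swap_colors(cmap, [(0, 2), (1, 3)])
print(new_cmap)
```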
- class clustering.SingleVoxelClusteringWrapper(method: str, n_clusters: int, filelist: List[str], atlas_path: str, regions_info_path: str, fe_method: None | str = None, fe_params: dict = {}, noise_th: float | None = None, registered: bool = True, normalization: None | str = None)
Bases: object
Wrapper to perform the single voxel clustering in various configurations.
- Parameters:
method (str) – Choice of the method, one of:
‘volume’: all voxels in the image;
‘hemisphere’: all voxels in one or both hemispheres;
‘structure’: all voxels belonging to a list of structures;
‘multiregion’: all voxels belonging to a list of regions.
n_clusters (int) – Number of clusters to search for.
filelist (list of string) – List of paths to the volumes whose voxels will be clustered.
atlas_path (string) – Path to the file containing the atlas volume.
regions_info_path (string) – Text file listing all the regions included in the atlas.
fe_method (string) – Selection of the feature extraction method, to choose between ‘pca’, ‘ica’, ‘nnmf’ or custom.
fe_params (dict) – Selection of the feature extraction parameters.
noise_th (float) – Threshold on the standard deviation of the baseline to remove ‘noisy’ voxels.
- change_cmap(new_cmap: None | list | str = None, continuous: bool = True)
Method for changing the colormap.
- Parameters:
new_cmap (None | list | str) – Either a list of RGBA values or the name of a colormap available in pyplot.
continuous (bool) – Set to True if the colormap is continuous and False if sequential.
- compute_amplitude_extrema(quantiles)
Utility function for computing the per-cluster extrema used to derive the transparency values when the ‘amplitude_transparency’ parameter of ‘display_cluster_locations’ is enabled. Note: since the raw extrema are noisy, quantiles are used instead.
- Parameters:
quantiles (list of float) – List of size 2 containing the quantiles for the minimum and maximum estimation respectively.
- Returns:
extrema – Dictionary containing for each cluster the extrema for normalization.
- Return type:
dict of tuple
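The idea of replacing noisy extrema with quantiles can be sketched as follows. This is an illustration only: the helpers `quantile` and `amplitude_extrema`, the linear interpolation scheme, and the per-cluster list of amplitudes used as input are assumptions, not the method's actual inputs or algorithm.

```python
import math


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted, non-empty list."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def amplitude_extrema(amplitudes_by_cluster, quantiles):
    """Return {cluster ID: (low quantile, high quantile)}."""
    q_min, q_max = quantiles
    extrema = {}
    for cluster_id, amplitudes in amplitudes_by_cluster.items():
        ordered = sorted(amplitudes)
        extrema[cluster_id] = (quantile(ordered, q_min), quantile(ordered, q_max))
    return extrema


# The value 100.0 is an outlier that the true maximum would pick up.
amplitudes_by_cluster = {0: [100.0, 0.0, 2.0, 1.0, 3.0]}
extrema = amplitude_extrema(amplitudes_by_cluster, [0.25, 0.75])
print(extrema)
```

The outlier does not affect the upper bound here, which is the motivation given in the note above for preferring quantiles.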
- display_atlas_mask_selected_regions(regions_list)
Utility function to display on the atlas the list of selected regions.
- Parameters:
regions_list (list of str) – List of regions acronyms to be displayed on the atlas.
- display_cluster_locations(names: list | None = None, amplitude_transparency=False, quantiles=[0.05, 0.95], cluster_ids=None, hires_dst=None)
Utility function for calling the display function for spatial locations depending on the method that was selected.
- Parameters:
names (list) – Identifiers of the elements to be displayed.
amplitude_transparency (bool) – Whether or not to modulate the transparency of each cluster based on the amplitude of the signals. The modulation is cluster-specific.
quantiles (list of float) – List of size 2 containing the quantiles for the minimum and maximum estimation respectively. Only used if amplitude transparency is True.
cluster_ids (None | list) – List of the clusters to be displayed. If None, all clusters will be displayed.
hires_dst (str | None) – If string, path where the hi-res files will be saved. If None, hi-res files will be neither generated nor saved.
- get_cluster_maps() -> dict
Utility function for getting the arrays containing the cluster maps where each voxel’s value represents its cluster attribution.
- Parameters:
use_boundaries (bool) – Whether or not to adjust the boundaries of the maps to the selected regions or structure. If False, the full maps are output.
- Returns:
res – A dictionary with keys being the element names (e.g. all session IDs) and values the associated cluster maps.
- Return type:
dict
- get_names()
Method for printing the names of the data elements in the clustering object.
- get_regions_from_groups(group: str, hemisphere: str)
Utility function for converting an anatomical group into a list of regions included in that group in the format expected by the clustering code.
- Parameters:
group (str) – Acronym of the anatomical group.
hemisphere (str) – ‘L’ or ‘R’ respectively for left and right hemispheres, or ‘LR’ for both.
- Returns:
res – List of tuples in the format (region acronym, hemisphere ‘L’ or ‘R’).
- Return type:
list
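The output format can be illustrated with the sketch below, which expands a list of region acronyms into (acronym, hemisphere) tuples. The helper `expand_regions`, the ordering of the result, and the placeholder acronyms are hypothetical; the lookup of which regions belong to a group is not shown since it depends on the atlas.

```python
def expand_regions(region_acronyms, hemisphere):
    """Pair each acronym with 'L', 'R', or both when hemisphere is 'LR'."""
    if hemisphere not in ("L", "R", "LR"):
        raise ValueError("hemisphere must be 'L', 'R' or 'LR'")
    # Iterating over the string 'LR' yields 'L' then 'R'.
    return [(acr, side) for acr in region_acronyms for side in hemisphere]


# Placeholder acronyms, not taken from any real atlas.
regions = expand_regions(["REG_A", "REG_B"], "LR")
print(regions)
```

A list in this shape matches the (<region acronym>, <hemisphere>) tuple format that ‘process’ documents for ‘acr_list’.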
- get_signals(reduction='median') -> dict
Utility function for accessing the temporal signals associated with each cluster.
- Parameters:
reduction (str | None) – If ‘median’, returns the median of all signals in each cluster. If None, returns all signals.
- Returns:
res – A dictionary with keys being the cluster IDs and values the associated temporal trace(s).
- Return type:
dict
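The two reduction modes can be sketched as follows. The helper `reduce_signals` and its explicit `labels` and `signals` arguments are assumptions for the example; the median is taken independently at each time point across the traces of a cluster.

```python
from statistics import median


def reduce_signals(labels, signals, reduction="median"):
    """Group traces by cluster ID, optionally reducing them to a median trace."""
    grouped = {}
    for label, signal in zip(labels, signals):
        grouped.setdefault(label, []).append(signal)
    if reduction is None:
        return grouped
    if reduction != "median":
        raise ValueError("reduction must be 'median' or None")
    return {
        label: [median(samples) for samples in zip(*traces)]
        for label, traces in grouped.items()
    }


labels = [0, 0, 0, 1]
signals = [[1, 2], [3, 4], [5, 9], [7, 8]]
median_traces = reduce_signals(labels, signals)
all_traces = reduce_signals(labels, signals, reduction=None)
print(median_traces)
```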
- merge_clusters(pairs_to_merge, adjust_cmap=True)
Utility function for merging clusters together.
- Parameters:
pairs_to_merge (list of tuple of int) – List of pairs of clusters to be merged. The higher cluster number is merged into the lower one, i.e. if clusters 5 and 3 are merged, 5s become 3s in the cluster maps. Example input: [(5,3), (1,6)].
adjust_cmap (bool) – Whether or not to adjust the colormap to the new number of clusters.
- normalize(data)
- plot_signals(display: str = 'all', scale: None | Tuple[float, float] = None, stimulation_pattern=None)
Utility function for calling the display function for temporal traces depending on the method that was selected.
- Parameters:
display (str) – Either ‘all’ to display all signals or ‘mean_std’ for mean/std of the signals.
scale (tuple of float | None) – Min and max scales for the display (arguments vmin / vmax from plt.imshow). If None, auto scale will be used.
stimulation_pattern (list of tuple of int | None) – List containing beginnings and ends of stimulation windows.
- process(acr_list: str | List[Tuple[str, str]] | None = None)
Method to call for clustering the data.
- Parameters:
acr_list (list of tuple | None) – If None, all regions are considered. Otherwise, only the acronyms present in the list will be included in the clustering process. The format of the tuple is (<region acronym>, <hemisphere>).
- reset_cmap()
Method for resetting the colormap.
- switch_colors_in_cmap(color_swaps)
Utility method for swapping colors in the colormap to adjust the display of the cluster traces.
- Parameters:
color_swaps (list of tuple of int) – List containing pairs of colors to be switched.
- clustering.region_loading(filelist: List[str], acr_list: List[str], atlas: ndarray, regions_nb: ndarray, regions_acr: ndarray, regions_centered: bool = True) -> (ndarray, ndarray, ndarray, ndarray, List[Tuple[int, int]])
Utility function for loading a single volume along with the necessary info: coordinates, file indices and shape of the volume. Note that ‘nan’ values are excluded during the process.
- Parameters:
filelist (list of str) – The list of paths to the files that will be included in the clustering process.
acr_list (list of str) – The list of the acronyms to be used during the clustering.
atlas (ndarray) – 3D volume of the atlas, where each voxel’s value corresponds to its region number.
regions_nb (ndarray) – Array containing the regions’ numbers.
regions_acr (ndarray) – Array containing the regions’ acronyms. Same order as regions_nb.
regions_centered (bool) – If True, the output will be cropped so that the boundaries fit the selected areas.
- Returns:
data (ndarray) – Contains the volume data (time traces).
coords (ndarray) – Contains the coordinates in the volume of each voxel.
file_indices (array) – Array filled with int values corresponding to the volume the data was extracted from. For example, 0 will correspond to data from the first loaded file.
data_volume_shape (list of array) – List of shapes of the volumes (excluding the time dimension) for reconstructing the volumes later.
- clustering.volume_loading(filelist: List[str]) -> (ndarray, ndarray, ndarray, ndarray)
Utility function for loading volumes along with the necessary info: coordinates, file indices and shape of the volume. Note that ‘nan’ values are excluded during the process.
- Parameters:
filelist (list of str) – List of paths to the files containing the volumes to be loaded.
- Returns:
dataset (ndarray) – Contains the volumes data (time traces).
coords (ndarray) – Contains the coordinates in the volume of each voxel.
file_indices (array) – Array filled with int values corresponding to the volume the data was extracted from. For example, 0 will correspond to data from the first loaded file.
data_volume_shape (array) – Shape of the volumes (excluding the time dimension) for reconstructing the volume later.
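Since the loaders return per-voxel values together with ‘coords’ and ‘data_volume_shape’, a volume can be rebuilt by writing each value back at its coordinates. The sketch below shows this with nested lists; the helper `rebuild_volume`, the (x, y, z) coordinate order, and the `fill` value used for voxels that were excluded (for example ‘nan’ voxels) are assumptions.

```python
def rebuild_volume(values, coords, shape, fill=-1):
    """Place one value per voxel back into a 3D nested list of the given shape."""
    nx, ny, nz = shape
    volume = [[[fill for _ in range(nz)] for _ in range(ny)] for _ in range(nx)]
    for value, (x, y, z) in zip(values, coords):
        volume[x][y][z] = value
    return volume


# Two voxels with cluster IDs 1 and 2 in a 2 x 1 x 2 volume.
cluster_ids = [1, 2]
coords = [(0, 0, 0), (1, 0, 1)]
volume = rebuild_volume(cluster_ids, coords, shape=(2, 1, 2))
print(volume)
```

Voxels that were dropped during loading keep the fill value, so they remain distinguishable from valid cluster IDs.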