data_loading module

@author: Théo Lambert

This module groups all the functions and classes related to data loading.

class data_loading.DataLoaderAndPreproc(root_folder: str, experiment_ID: str, expgroup_ID: list[str] | None = None, subject_ID: list[str] | None = None, session_ID: list[str] | None = None, stim_ID: list[str] | None = None, mode: str = 'folder_tree', excel_source: str | None = None, reduction: str = 'median', level_avg: str = 'stim', register: bool = True, atlas_resolution: int = 100, baseline: None | Iterable[int] = None, make_reliability_maps: bool = False, remove_unreliable: float | None = None, trial_preprocessing: object | None = None)

Bases: object

Object handling the whole data loading and preprocessing step.

Parameters:
  • root_folder (str) – Path to the folder that stores the data for all the experiments. See the general documentation for more information on the folder structure.

  • experiment_ID (str) – Name/identifier of the experiment.

  • expgroup_ID (list of str) – List providing the expgroups to be processed.

  • subject_ID (list of str) – List providing the subjects to be processed.

  • session_ID (list of str) – List providing the sessions to be processed.

  • stim_ID (list of str) – List providing the stimuli to be processed.

  • mode (str) – Loading mode, either folder_tree or excel. See the general documentation for more information.

  • excel_source (str) – Optional, used only if mode == 'excel'. Path to the Excel source file.

  • reduction (str) – How the data are aggregated: single_trial, median or mean.

  • level_avg (str) – Level at which the reduction is performed: all, expgroup, subject, session or stim (the default).

  • register (bool) – Whether or not to register the data.

  • atlas_resolution (int) – Resolution, in µm, of the atlas to which the data will be registered (e.g. 100).

  • baseline (iterable) – Indices of the baseline period used to normalize the data (e.g. range(0, 20)).

  • make_reliability_maps (bool) – Whether or not to make reliability maps. Check the quality_control module for more info.

  • remove_unreliable (float | None) – If None, no filtering is applied. Otherwise, the loaded data are filtered based on the reliability map, and excluded values are set to NaN.

  • trial_preprocessing (object | None) – If None, will be ignored. If object, will be used to preprocess trials. Check quality_control.py for further details.
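As a rough illustration of how the parameters above fit together, the sketch below assembles a set of keyword arguments matching the documented signature. The path and all identifiers are hypothetical placeholders, and the commented-out calls only show how the loader would presumably be built and run; they are not taken from the package's own examples.

```python
# Sketch only: every path and ID below is a hypothetical placeholder.
loader_kwargs = dict(
    root_folder="/path/to/data",      # hypothetical root folder
    experiment_ID="exp01",            # hypothetical experiment name
    subject_ID=["mouse1", "mouse2"],  # hypothetical subjects (no '_' in IDs)
    stim_ID=["stimA"],                # hypothetical stimulus
    mode="folder_tree",               # documented options: folder_tree or excel
    reduction="median",               # documented options: single_trial, median, mean
    level_avg="session",
    register=True,
    atlas_resolution=100,             # in µm
    baseline=range(0, 20),
    make_reliability_maps=False,
)

# With the package importable, the loader would presumably be used as:
# import data_loading
# loader = data_loading.DataLoaderAndPreproc(**loader_kwargs)
# loader.process_data()
```

Parameters left out of the dictionary (expgroup_ID, session_ID, excel_source, and so on) keep the defaults shown in the signature.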

check_ids()

Method for checking that none of the IDs extracted from the folder structure or the Excel file contains the character ‘_’, as this character is used later to parse filenames.
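The check described above can be sketched as follows. This is a minimal standalone illustration, not the method's actual code: the function name and the dictionary layout are assumptions made for the example.

```python
def find_ids_with_underscore(ids_by_level):
    """Return {level: [offending IDs]} for every ID that contains '_'.

    `ids_by_level` is a hypothetical mapping such as
    {"subject": [...], "session": [...], "stim": [...]}.
    """
    bad = {}
    for level, ids in ids_by_level.items():
        offending = [identifier for identifier in ids if "_" in identifier]
        if offending:
            bad[level] = offending
    return bad


ids = {"subject": ["mouse1", "mouse_2"], "session": ["s1"], "stim": ["stimA"]}
print(find_ids_with_underscore(ids))  # prints {'subject': ['mouse_2']}
```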

check_transf(session_path: str) → None | str

Method for checking whether a transformation matrix for the session can be found in the folder structure. The file must be located in the ‘other’ folder, and its name must contain either ‘transf’ or ‘Transf’ as a part delimited by ‘_’. Examples of valid names are: ‘tmp_Transf_001.mat’, ‘Transf.mat’ or ‘transf_XXX.mat’.

Parameters:

session_path (str) – Path to the session for which the transform matrix must be searched.

Returns:

transf_path – Path to the transformation matrix, or None if no transformation matrix was found.

Return type:

None | str
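The naming rule above can be illustrated with a small standalone search function. This is a sketch under assumptions: the function name is hypothetical, the .mat extension is required only because all the documented examples use it, and when several files match, the first one in alphabetical order is returned.

```python
import os
import tempfile


def find_transf(session_path):
    """Return the path of a transformation file in '<session>/other', or None."""
    other = os.path.join(session_path, "other")
    if not os.path.isdir(other):
        return None
    for fname in sorted(os.listdir(other)):
        stem, ext = os.path.splitext(fname)
        if ext != ".mat":  # assumption: only .mat files are considered
            continue
        # The name must contain 'transf' or 'Transf' as a '_'-delimited part.
        if any(part in ("transf", "Transf") for part in stem.split("_")):
            return os.path.join(other, fname)
    return None


# Small demonstration on a temporary, hypothetical session folder.
with tempfile.TemporaryDirectory() as session:
    os.makedirs(os.path.join(session, "other"))
    open(os.path.join(session, "other", "notes.txt"), "w").close()
    open(os.path.join(session, "other", "tmp_Transf_001.mat"), "w").close()
    found_name = os.path.basename(find_transf(session))

print(found_name)  # prints tmp_Transf_001.mat
```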

data_iterator_excel_file(level_avg: None | str = None)

Method for generating an iterator from the information extracted from an Excel file. The iterator yields destination filenames as well as raw data paths. Note: the code is quite similar to that of the folder tree iterator, but the two were not merged for the sake of clarity, since multiple lines differ.

Parameters:

level_avg (str | None) – Level at which the reduction will be performed. Set to None if single trials are requested, in which case the full folder tree will be reproduced.

data_iterator_folder_tree(level_avg: None | str = None) → None

Method for generating an iterator over the folder tree structure. The iterator yields destination filenames as well as raw data paths.

Parameters:

level_avg (str | None) – Level at which the reduction will be performed. Set to None if single trials are requested, in which case the full folder tree will be reproduced.

extract_info_excel_file(excel_source: str)

Method for extracting the data structure (subjects, sessions, etc.) from the Excel file and creating a dictionary representing this structure. If the data associated with a key is empty, this session or stimulus does not exist for this animal.

Parameters:

excel_source (str) – Path to the excel file from which to extract the structure.

init_data_reduction_func() → None

Method for initializing the data reduction function according to the ‘reduction’ parameter passed to the class constructor.
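One plausible way to select a reduction function from the ‘reduction’ string is sketched below. The function name is hypothetical, and stacking trials along the first axis is an assumption; the module's actual implementation may differ, for instance in how it handles NaN values.

```python
import numpy as np


def make_reduction_func(reduction):
    """Return a function that reduces trials stacked along axis 0 (assumption)."""
    if reduction == "mean":
        return lambda trials: np.mean(trials, axis=0)
    if reduction == "median":
        return lambda trials: np.median(trials, axis=0)
    if reduction == "single_trial":
        # No aggregation: the trials are returned unchanged.
        return lambda trials: trials
    raise ValueError(f"Unknown reduction: {reduction!r}")


# Three hypothetical trials with constant values 1, 2 and 9.
trials = np.stack([np.full((2, 2), v) for v in (1.0, 2.0, 9.0)])
median_map = make_reduction_func("median")(trials)  # every entry equals 2.0
mean_map = make_reduction_func("mean")(trials)      # every entry equals 4.0
```

The median is less sensitive than the mean to the outlying third trial, which is one common reason to prefer it as a default.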

list_selected_data() → str

Method for generating the log, containing all the information about the data that were selected and the processing that was applied.

Returns:

header – A string containing the log.

Return type:

str

load_data_and_transf(filelist: list[str], transf_path: str, name: str, block_size: int = 5, trial_preprocessing=typing.Optional[object]) → (dict, dict)

Method for loading the data files listed in filelist together with the associated transformation matrix.

Parameters:
  • filelist (list of str) – List of paths to files to be loaded.

  • transf_path (str) – Path to the transformation matrix file.

  • name (str) – Name of the output file. Only necessary if required by a trial selection object.

  • block_size (int) – Intended for the low-memory option, which is not properly implemented yet.

  • trial_preprocessing (None or object) – If None, no trial-based preprocessing. Otherwise, the ‘__call__’ method of the object is applied to the list of data. See quality_control.OutlierFrameRemoval for an example.

Returns:

  • data (dict) – A dictionary containing the reduced data, its size, voxel size and direction / anatomical orientation.

  • M (ndarray) – The 4x4 transformation matrix.
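To illustrate the trial_preprocessing parameter, here is a minimal sketch of an object whose ‘__call__’ method is applied to a list of trials. The class, its threshold and the exact call signature (a list of arrays in, a list of arrays out) are assumptions made for the example; quality_control.OutlierFrameRemoval remains the reference for the real interface.

```python
import numpy as np


class DropFlatTrials:
    """Hypothetical preprocessing object: discards trials with no signal variation."""

    def __init__(self, min_std=1e-6):
        self.min_std = min_std

    def __call__(self, trials):
        # Assumed interface: receives a list of arrays, returns a filtered list.
        return [trial for trial in trials if np.std(trial) > self.min_std]


# Two hypothetical 4D trials: one flat, one with varying values.
trials = [
    np.zeros((2, 2, 2, 5)),
    np.arange(40, dtype=float).reshape(2, 2, 2, 5),
]
kept = DropFlatTrials()(trials)
print(len(kept))  # prints 1
```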

normalize_data(data: ndarray) → ndarray

Method for normalizing and centering the data using the self.baseline range.

Parameters:

data (ndarray) – The 4D data to be normalized.

Returns:

data – The normalized and centered data.

Return type:

ndarray
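The exact formula used by normalize_data is not stated here. One common interpretation of centering and normalizing against a baseline is a per-voxel z-score relative to the baseline period, sketched below. The function name, the choice of the last axis as time and the z-score itself are all assumptions.

```python
import numpy as np


def baseline_zscore(data, baseline, time_axis=-1):
    """Center and scale `data` using the mean and std of the baseline samples."""
    base = np.take(data, list(baseline), axis=time_axis)
    mean = base.mean(axis=time_axis, keepdims=True)
    std = base.std(axis=time_axis, keepdims=True)
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero on flat voxels
    return (data - mean) / std


rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(2, 2, 2, 30))  # hypothetical 4D data
normalized = baseline_zscore(data, range(0, 20))
```

After this transformation, each voxel has zero mean and unit standard deviation over the baseline samples, so later values read as deviations from baseline.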

process_data(print_reg_error: bool = True, block_size: int = 5)

Main method for performing the data loading process. Note: an iterative way of computing the mean or median for heavy datasets may be implemented later.

Parameters:
  • print_reg_error (bool) – If True, data that were skipped because no transformation matrix was found will be displayed in the terminal.

  • block_size (int) – Currently unused, as the low-memory option is not implemented yet.

replicate_folder_tree_structure(level: None | str = None) → None

Method for replicating the folder tree structure in order to store the processed data.

Parameters:

level (str | None) – Level at which the reduction will be performed. Set to None if single trials are requested, in which case the full folder tree will be reproduced.

scan_folder_tree(print_skip_session: bool = False) → None

Method for checking whether all the requested data and the associated transformation matrices have been found in the folder structure.

Parameters:

print_skip_session (bool) – Whether or not to print the requested sessions that have not been found.