data_loading module
@author: Théo Lambert
This module regroups all the functions and classes related to data loading.
- class data_loading.DataLoaderAndPreproc(root_folder: str, experiment_ID: str, expgroup_ID: list[str] | None = None, subject_ID: list[str] | None = None, session_ID: list[str] | None = None, stim_ID: list[str] | None = None, mode: str = 'folder_tree', excel_source: str | None = None, reduction: str = 'median', level_avg: str = 'stim', register: bool = True, atlas_resolution: int = 100, baseline: None | Iterable[int] = None, make_reliability_maps: bool = False, remove_unreliable: float | None = None, trial_preprocessing: object | None = None)
Bases: object
Object handling the whole data loading and preprocessing step.
- Parameters:
root_folder (str) – Path to the folder in which are stored the data for all the experiments. See doc for more info on folder structure.
experiment_ID (str) – Name/identifier of the experiment.
expgroup_ID (list of str) – List providing the expgroups to be processed.
subject_ID (list of str) – List providing the subjects to be processed.
session_ID (list of str) – List providing the sessions to be processed.
stim_ID (list of str) – List providing the stimuli to be processed.
mode (string) – Loading mode, either folder_tree or excel. See general doc for more information.
excel_source (string) – Optional, used only if mode==’excel’. Path to the Excel source file.
reduction (string) – How to aggregate the data: single_trial, median or mean.
level_avg (string) – Level at which the reduction will be performed: all, expgroup, subject, session (the default in the signature is ‘stim’).
register (bool) – Whether or not to register the data.
atlas_resolution (int) – Resolution of the atlas to which the data will be registered in µm (ex: 100).
baseline (iterable) – Baseline period on which to baseline the data (ex: range(0,20)).
make_reliability_maps (bool) – Whether or not to make reliability maps. Check the quality_control module for more info.
remove_unreliable (float | None) – Whether or not to filter the loaded data based on the reliability map. Excluded values will be set to NaN.
trial_preprocessing (object | None) – If None, will be ignored. If object, will be used to preprocess trials. Check quality_control.py for further details.
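The constraints stated above for mode, excel_source and reduction can be checked before building the loader. The following is a minimal sketch, not part of data_loading: validate_loader_args is a hypothetical helper, the accepted values are copied from the parameter descriptions, and treating excel_source as mandatory in excel mode is an assumption.

```python
# Accepted values, taken from the parameter descriptions above.
VALID_MODES = {"folder_tree", "excel"}
VALID_REDUCTIONS = {"single_trial", "median", "mean"}


def validate_loader_args(mode="folder_tree", excel_source=None, reduction="median"):
    """Hypothetical pre-check mirroring the documented parameter constraints."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    # Assumption: an Excel source file is required as soon as mode == 'excel'.
    if mode == "excel" and excel_source is None:
        raise ValueError("excel_source is required when mode == 'excel'")
    if reduction not in VALID_REDUCTIONS:
        raise ValueError(
            f"reduction must be one of {sorted(VALID_REDUCTIONS)}, got {reduction!r}"
        )
    return {"mode": mode, "excel_source": excel_source, "reduction": reduction}


# 'experiments.xlsx' is a placeholder file name used only for illustration.
print(validate_loader_args(mode="excel", excel_source="experiments.xlsx"))
```

The returned dictionary could then be merged with the remaining keyword arguments of DataLoaderAndPreproc.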
- check_ids()
Method for checking that the different IDs that were extracted from the folder structure or the excel file do not have the character ‘_’ in their string, as it is used later to parse filenames.
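The reason for this check can be illustrated with a short sketch. The helper name find_ids_with_underscore, the example IDs and the ‘subject_session_stim’ filename layout are assumptions made for illustration; the actual implementation and filename convention may differ.

```python
def find_ids_with_underscore(ids_by_level):
    """Return {level: [offending IDs]} for every ID that contains '_'."""
    bad = {}
    for level, ids in ids_by_level.items():
        offending = [identifier for identifier in ids if "_" in identifier]
        if offending:
            bad[level] = offending
    return bad


# Hypothetical IDs: 'mouse_02' contains the reserved separator.
ids = {"subject": ["mouse01", "mouse_02"], "session": ["S1"], "stim": ["grating"]}
print(find_ids_with_underscore(ids))  # {'subject': ['mouse_02']}

# Why it matters: splitting an assumed 'subject_session_stim' filename on '_'
# yields four fields instead of three when an ID itself contains '_'.
print("mouse_02_S1_grating".split("_"))  # ['mouse', '02', 'S1', 'grating']
```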
- check_transf(session_path: str) → None | str
Method for checking whether a transformation matrix for the session can be found in the folder structure. It must be located in the ‘other’ folder, and its name must contain either ‘transf’ or ‘Transf’ as a token delimited by ‘_’. Examples of valid names are: ‘tmp_Transf_001.mat’, ‘Transf.mat’ or ‘transf_XXX.mat’.
- Parameters:
session_path (str) – Path to the session for which the transform matrix must be searched.
- Returns:
transf_path – Path to the transformation matrix, or None if no transformation matrix was found.
- Return type:
None | str
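The naming rule described for check_transf can be sketched as follows. This is an illustration under stated assumptions, not the module's code: find_transf is a hypothetical name, only ‘.mat’ files are considered (as in the examples above), and the first match in alphabetical order is returned.

```python
from pathlib import Path
import tempfile


def find_transf(session_path):
    """Return the path of a transformation matrix in '<session>/other', or None."""
    other = Path(session_path) / "other"
    if not other.is_dir():
        return None
    # Assumption: only '.mat' files are candidates, as in the documented examples.
    for candidate in sorted(other.glob("*.mat")):
        tokens = candidate.stem.split("_")
        if "transf" in tokens or "Transf" in tokens:
            return str(candidate)
    return None


# Demonstration on a temporary, made-up session folder.
with tempfile.TemporaryDirectory() as session:
    other_dir = Path(session) / "other"
    other_dir.mkdir()
    (other_dir / "notes.mat").touch()
    print(find_transf(session))  # None
    (other_dir / "tmp_Transf_001.mat").touch()
    print(Path(find_transf(session)).name)  # tmp_Transf_001.mat
```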
- data_iterator_excel_file(level_avg: None | str = None)
Method for generating an iterator from the information extracted from an Excel file, yielding destination filenames as well as raw data paths. Note: the code is quite similar to the folder tree iterator, but the two were not merged for the sake of clarity, since multiple lines differ.
- Parameters:
level_avg (string | None) – Level at which the reduction will be performed. Set to None if single trials are requested, then the full folder tree will be reproduced.
- data_iterator_folder_tree(level_avg: None | str = None) → None
Method for generating an iterator over the folder tree structure, yielding destination filenames as well as raw data paths.
- Parameters:
level_avg (str | None) – Level at which the reduction will be performed. Set to None if single trials are requested, then the full folder tree will be reproduced.
- extract_info_excel_file(excel_source: str)
Function for extracting the data structure (subjects, sessions, etc.) from the Excel file and creating a dictionary representing this structure. If the data associated with a key is empty, it means this session or stimulus does not exist for this animal.
- Parameters:
excel_source (str) – Path to the excel file from which to extract the structure.
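The convention that an empty entry marks a missing session or stimulus can be illustrated with a small dictionary. The exact layout produced by extract_info_excel_file is not specified here, so the subject → session → list of stimuli nesting, the IDs and the helper existing_sessions below are all assumptions.

```python
# Hypothetical structure: subject -> session -> list of stimuli.
structure = {
    "mouse01": {"S1": ["grating", "flash"], "S2": []},  # S2 does not exist for mouse01
    "mouse02": {"S1": ["grating"], "S2": ["flash"]},
}


def existing_sessions(structure, subject):
    """List the sessions of a subject whose entry is not empty."""
    return [session for session, stims in structure[subject].items() if stims]


print(existing_sessions(structure, "mouse01"))  # ['S1']
print(existing_sessions(structure, "mouse02"))  # ['S1', 'S2']
```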
- init_data_reduction_func() → None
Method for initializing the data reduction function depending on the ‘reduction’ param defined in the init of the class.
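One possible way to select a reduction function from the ‘reduction’ string is sketched below. It is only an illustration: make_reduction_func is a hypothetical name, trials are represented as flat lists of floats rather than arrays, and returning the trials unchanged for single_trial is an assumption.

```python
from statistics import fmean, median


def make_reduction_func(reduction):
    """Return a function that aggregates a list of equally long trials."""
    if reduction == "single_trial":
        # Assumption: no aggregation, the trials are returned as they are.
        return lambda trials: trials
    if reduction == "mean":
        return lambda trials: [fmean(values) for values in zip(*trials)]
    if reduction == "median":
        return lambda trials: [median(values) for values in zip(*trials)]
    raise ValueError(f"unknown reduction: {reduction!r}")


# Three made-up trials of three samples each; the last one contains an outlier.
trials = [[1.0, 2.0, 3.0], [1.0, 4.0, 3.0], [4.0, 6.0, 30.0]]
print(make_reduction_func("median")(trials))  # [1.0, 4.0, 3.0]
print(make_reduction_func("mean")(trials))    # [2.0, 4.0, 12.0]
```

The example also shows why the median can be preferable: the outlying sample in the third trial shifts the mean but not the median.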
- list_selected_data() → str
Method for generating the log file, containing all the info about the data that were selected and the processing that was applied.
- Returns:
header – A string containing the log.
- Return type:
str
- load_data_and_transf(filelist: list[str], transf_path: str, name: str, block_size: int = 5, trial_preprocessing=typing.Optional[object]) → (dict, dict)
Method for loading the files listed in filelist together with the transformation matrix found at transf_path, and returning the reduced data and that matrix.
- Parameters:
filelist (list of str) – List of paths to files to be loaded.
transf_path (str) – Path to the transformation matrix.
name (str) – Name of the output file. Only necessary if required by a trial selection object.
block_size (int) – Block size for the low-memory option, which is not properly implemented yet.
trial_preprocessing (None or object) – If None, no trial-based preprocessing. Otherwise, the ‘__call__’ method of the object is applied to the list of data. See quality_control.OutlierFrameRemoval for an example.
- Returns:
data (dict) – A dictionary containing the reduced data, its size, voxel size and direction / anatomical orientation.
M (ndarray) – The 4x4 transformation matrix.
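The trial_preprocessing mechanism relies on a callable object. The sketch below shows the general pattern only: DropConstantTrials and apply_trial_preprocessing are hypothetical, the trials are flat lists instead of arrays, and the real ‘__call__’ signature may take additional arguments (for instance the output name). See quality_control.OutlierFrameRemoval for the actual example.

```python
class DropConstantTrials:
    """Hypothetical preprocessing object: discards trials with no signal variation."""

    def __call__(self, trials):
        return [trial for trial in trials if max(trial) != min(trial)]


def apply_trial_preprocessing(trials, trial_preprocessing=None):
    """Apply the preprocessing object to the list of trials if one is given."""
    if trial_preprocessing is None:
        return trials
    return trial_preprocessing(trials)


# Made-up data: the first trial is flat and gets removed.
trials = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
print(apply_trial_preprocessing(trials, DropConstantTrials()))  # [[1.0, 2.0, 3.0]]
```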
- normalize_data(data: ndarray) → ndarray
Method for normalizing and centering the data using the self.baseline range.
- Parameters:
data (ndarray) – The 4D data to be normalized.
- Returns:
data – The normalized and centered data.
- Return type:
ndarray
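Baseline normalization can be illustrated on a single time course. The exact formula used by normalize_data is not given here, so the sketch assumes one common choice, the change relative to the baseline mean; baseline_normalize is a hypothetical helper working on a plain list rather than on 4D data.

```python
from statistics import fmean


def baseline_normalize(timecourse, baseline):
    """Center on the baseline mean and scale by it (assumed formula)."""
    base = fmean([timecourse[t] for t in baseline])
    return [(value - base) / base for value in timecourse]


# Made-up signal: the first three frames are the baseline period.
signal = [10.0, 10.0, 10.0, 12.0, 15.0]
print(baseline_normalize(signal, range(0, 3)))  # [0.0, 0.0, 0.0, 0.2, 0.5]
```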
- process_data(print_reg_error: bool = True, block_size: int = 5)
Main method for performing the data loading process. TODO: consider implementing an iterative computation of the mean or median for heavy datasets.
- Parameters:
print_reg_error (bool) – If True, data that were skipped because no transformation matrix was found will be displayed in the terminal.
block_size (int) – Not used, as the low-memory option is not implemented yet.
- replicate_folder_tree_structure(level: None | str = None) → None
Method for replicating the folder tree structure in order to store the processed data.
- Parameters:
level (str | None) – Level at which the reduction will be performed. Set to None if single trials are requested, then the full folder tree will be reproduced.
- scan_folder_tree(print_skip_session: bool = False) → None
Method for checking whether all the requested data and the associated transformation matrices have been found in the folder structure.
- Parameters:
print_skip_session (bool) – Whether or not to print the requested sessions that have not been found.