paprica.parser

Submodule containing classes and functions related to parsing.

This submodule parses the data that will be processed later on. It was developed for our particular folder layout and is adapted to COLM, mesoSPIM and ClearScope acquisitions.

Note that each channel is parsed separately so as to give maximum flexibility for stitching and visualization.

There are two general ways of parsing the data:
- multitile parsing (tileParser class), where each tile has a given position on a 2D grid and can therefore be stitched
- independent parsing (baseParser class), where each tile is independent

We also provide a few classes to parse data from given microscopes:
- COLM
- mesoSPIM
- ClearScope

By using this code you agree to the terms of the software license agreement.

paprica.parser.autoParser(path, **kwargs)[source]

Parses a data-set automatically by guessing the folder architecture. This function uses the microscope_list that is automatically inferred from this submodule. For each microscope, it calls the _is_valid_acquisition() method and returns the corresponding parser object.

Parameters: path (str) – path containing the data
Return type: tileParser object
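
The dispatch described above can be sketched as follows. This is a minimal, self-contained illustration and not paprica's implementation: the `*Sketch` classes, the `MICROSCOPES` dict and the path-suffix checks are all hypothetical stand-ins chosen so the example runs without any data on disk.

```python
class TileParserSketch:
    """Hypothetical stand-in for a generic multi-tile parser."""

    def __init__(self, path):
        self.path = path

    @staticmethod
    def _is_valid_acquisition(path):
        # The generic parser accepts anything and acts as the fallback.
        return True


class ColmParserSketch(TileParserSketch):
    @staticmethod
    def _is_valid_acquisition(path):
        # Assumed marker, for illustration only.
        return path.endswith("_colm")


class ClearscopeParserSketch(TileParserSketch):
    @staticmethod
    def _is_valid_acquisition(path):
        # Assumed marker, for illustration only.
        return path.endswith("_clearscope")


MICROSCOPES = {
    "colm": ColmParserSketch,
    "clearscope": ClearscopeParserSketch,
    "default": TileParserSketch,
}


def auto_parser_sketch(path, microscopes=MICROSCOPES):
    """Return the first specific parser that recognises `path`, else the default."""
    for name, parser_cls in microscopes.items():
        if name == "default":
            continue  # the fallback is only used when nothing else matches
        if parser_cls._is_valid_acquisition(path):
            return parser_cls(path)
    return microscopes["default"](path)
```

Keeping the validity check as a static method lets the dispatcher test a folder before constructing any parser object.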

class paprica.parser.baseParser(path, frame_size, ftype, verbose=True)[source]

Bases: object

Class used to parse several independent tiles (not multitile).

__getitem__(item)[source]: Return tiles, add neighbors information before returning.

__init__(path, frame_size, ftype, verbose=True)[source]

Constructor of the baseParser object.

Parameters

path (string) – path where to look for the data.
frame_size (int) – size of each frame (camera resolution).
ftype (string) – input data type: one of ‘apr’, ‘tiff2D’ or ‘tiff3D’

__iter__()[source]

Method called under the hood when iterating on tiles. Because it is a generator, the memory is released at each iteration.

Return type: Generator containing the tileLoader object.
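
As a minimal sketch of the generator behaviour described above (not paprica's implementation; `LazyTilesSketch` and `load_tile` are hypothetical names), each tile object is built only when the loop asks for it:

```python
class LazyTilesSketch:
    """Hypothetical container that builds one tile object per iteration."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.n_loaded = 0  # counts how many tile objects were built so far

    def load_tile(self, path):
        # Stand-in for constructing a tileLoader; here just a small dict.
        self.n_loaded += 1
        return {"path": path}

    def __len__(self):
        return len(self.paths)

    def __iter__(self):
        # A generator function: nothing is loaded until the caller iterates.
        for path in self.paths:
            yield self.load_tile(path)


tiles = LazyTilesSketch(["tile_0", "tile_1", "tile_2"])
iterator = iter(tiles)
first = next(iterator)  # only the first tile has been built at this point
```

The previous tile can only be reclaimed if the calling code keeps no other reference to it, for example by not appending tiles to a list inside the loop.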

__len__()[source]: Returns the number of tiles.

_correct_offset()[source]: If the row or column indices do not start at 0, subtract min_row and min_col so that they do.
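
A minimal sketch of this offset correction, assuming each tile is a dictionary with hypothetical `row` and `col` keys (the actual key names are not given here):

```python
def correct_offset(tiles):
    """Return copies of the tiles shifted so the smallest row and col are 0."""
    min_row = min(tile["row"] for tile in tiles)
    min_col = min(tile["col"] for tile in tiles)
    return [
        {**tile, "row": tile["row"] - min_row, "col": tile["col"] - min_col}
        for tile in tiles
    ]


# Example: a grid whose indices start at (2, 5) instead of (0, 0).
shifted = correct_offset([
    {"path": "a", "row": 2, "col": 5},
    {"path": "b", "row": 3, "col": 6},
])
```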

_get_neighbors_map()[source]: Returns the non-redundant neighbors map, where neighbors[row, col] gives the list of neighbors of that tile, together with the total number of pair-wise neighbors. Only SOUTH and EAST neighbors are returned to avoid redundancy.
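
The idea can be sketched as follows, under the assumption that the tile pattern is a 2D grid of 0/1 values (0 = no tile, 1 = tile). The function name and the dictionary-based return value are illustrative choices, not paprica's data structures.

```python
def neighbors_map(pattern):
    """Return (neighbors, n_pairs) for a 2D 0/1 tile pattern.

    neighbors[(row, col)] lists only the SOUTH and EAST neighbors, so each
    pair of adjacent tiles is recorded exactly once.
    """
    n_row, n_col = len(pattern), len(pattern[0])
    neighbors = {}
    n_pairs = 0
    for row in range(n_row):
        for col in range(n_col):
            if not pattern[row][col]:
                continue  # no tile at this grid position
            found = []
            if row + 1 < n_row and pattern[row + 1][col]:  # SOUTH
                found.append((row + 1, col))
            if col + 1 < n_col and pattern[row][col + 1]:  # EAST
                found.append((row, col + 1))
            neighbors[(row, col)] = found
            n_pairs += len(found)
    return neighbors, n_pairs


# An L-shaped acquisition: three tiles, the bottom-right one is missing.
neighbors, n_pairs = neighbors_map([[1, 1],
                                    [1, 0]])
```

Recording NORTH and WEST as well would list every pair twice, once from each side.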

_get_path_list()[source]: Returns a list containing the path to each tile.

_get_tile_list()[source]: Returns a list of tiles, each represented as a dictionary.

_get_tiles_from_path(files)[source]: Create a list of dictionnary for each tile containing it’s path and coordinate on the grid. Coordinates are set to None for the baseParser which only parse independant tiles.

_get_tiles_path()[source]: Returns a list containing file paths (for tiff3D and APR) or folder paths (for tiff2).

_get_tiles_pattern()[source]: Return the tile pattern (0 = no tile, 1 = tile)

_get_total_neighbors_map()[source]: Returns the total neighbors map (with redundancy, as in the case of an undirected graph).

_print_info()[source]: Display parsing summary in the terminal.

_sort_tiles()[source]: Sort tiles so that they are arranged in columns and rows (read from left to right and top to bottom).
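
A minimal sketch of such an ordering, again assuming tiles are dictionaries with hypothetical `row` and `col` keys: sorting on the `(row, col)` tuple yields tiles top to bottom and, within each row, left to right.

```python
def sort_tiles(tiles):
    """Return the tiles ordered top to bottom, then left to right."""
    return sorted(tiles, key=lambda tile: (tile["row"], tile["col"]))


ordered = sort_tiles([
    {"path": "d", "row": 1, "col": 1},
    {"path": "b", "row": 0, "col": 1},
    {"path": "c", "row": 1, "col": 0},
    {"path": "a", "row": 0, "col": 0},
])
```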

check_files_integrity()[source]

Check that all tiles are readable and not corrupted.

Return type: None

compute_average_CR(progress_bar=True)[source]

Compute the average Computational Ratio (CR). Note: data must be of type APR.

Returns: cr – average CR for the dataset
Return type: float

make_lazy_loadable()[source]

Loads all parsed APR tiles, then computes and saves the tree parts so that the data-set becomes lazy loadable.

Return type: None

class paprica.parser.clearscopeParser(path, channel=0, verbose=True)[source]

Bases: paprica.parser.tileParser

Class used to parse multi-tile ClearScope data where each tile position in space matters. Tiles parsed this way are usually stitched later on.

__init__(path, channel=0, verbose=True)[source]

Constructor of the tileParser object for ClearScope acquisition.

Parameters

path (string) – path where to look for the data.
channel (int) – fluorescence channel for parsing CLEARSCOPE data
verbose (bool) – Control verbosity of the parsing. If True, the parser will print acquisition info in the terminal.

static _get_n_channels(path)[source]: Returns the number of channels of the CLEARSCOPE acquisition. This is based on a regex that checks the highest n in the path: some_path/X_Y___nx/
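
A regex of this kind could look like the sketch below. The pattern, the function name and the example folder names are assumptions made for illustration: only the `___nx` suffix is taken from the description above. The sketch returns the highest n it finds; whether the channel count is n or n + 1 depends on how channels are indexed, which is not stated here.

```python
import re

# Assumed pattern: folder names end with three underscores, an integer n, and "x".
CHANNEL_RE = re.compile(r"___(\d+)x$")


def highest_channel_index(folder_names):
    """Return the largest n found in names shaped like X_Y___nx, or None."""
    indices = []
    for name in folder_names:
        match = CHANNEL_RE.search(name.rstrip("/"))
        if match:
            indices.append(int(match.group(1)))
    return max(indices) if indices else None


# Hypothetical folder names, for illustration only.
highest = highest_channel_index(["0001_0002___0x", "0001_0002___2x/", "notes.txt"])
```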

_get_row_col(n)[source]

Get ClearScope tile row and col position given the tile number.

Parameters

n (int) – ClearScope tile number

Returns

row (int) – row number
col (int) – col number

_get_tiles_from_path(files)[source]: Returns a list of tiles as a dictionary for ClearScope data.

_get_tiles_path()[source]: Returns a list containing ClearScope folders which contains individual tiff.

static _is_valid_acquisition(path)[source]

Returns True if the path folder is a ClearScope acquisition.

Parameters: path (str) – path to check if acquisition is ClearScope

Return type: bool

interpolate_missing_frames()[source]

Interpolate missing frames and save them.

Return type: None

class paprica.parser.colmParser(path, channel=0, verbose=True)[source]

Bases: paprica.parser.tileParser

Class used to parse multi-tile COLM data where each tile position in space matters. Tiles parsed this way are usually stitched later on.

__init__(path, channel=0, verbose=True)[source]

Constructor of the tileParser object for COLM acquisition.

Parameters

path (string) – path where to look for the data. More specifically it should be the folder that contains the acquisition.
channel (int) – fluorescence channel for parsing COLM LOCXXX data
verbose (bool) – Control verbosity of the parsing. If True, the parser will print acquisition info in the terminal.

static _get_n_channels(path)[source]: Returns the number of channel of the COLM acquisition.

_get_tiles_from_path(files)[source]: Returns a list of tiles as a dictionary for data saved as LOC00X.

_get_tiles_path()[source]: Returns a list containing COLM folders which contains individual tiff.

static _is_number(s)[source]: Checks if a string contains any kind of number (the isnumeric() method doesn’t work for decimal numbers). Returns True if s is a number, False otherwise.
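
One common way to write such a check is to attempt a `float()` conversion. The sketch below illustrates that technique and is not necessarily how paprica implements it.

```python
def is_number(s):
    """Return True if `s` can be converted to a float, False otherwise."""
    try:
        float(s)
    except (TypeError, ValueError):
        return False
    return True


# str.isnumeric() rejects the decimal point, which is why it is not enough here.
decimal_is_numeric = "3.14".isnumeric()
decimal_is_number = is_number("3.14")
```

Note that `float()` also accepts strings such as "1e3", "inf" and "nan", so this check is more permissive than a strict decimal test.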

static _is_valid_acquisition(path)[source]

Returns True if the path folder is a COLM acquisition.

Parameters: path (str) – path to check if acquisition is COLM

Return type: bool

get_overlap()[source]: Extract overlap from COLM Experiment.ini file.

paprica.parser.get_microscope_list()[source]

This function builds up a dict containing microscopes supported by the pipeline (refer to the documentation to add yours). By default, this list should be equivalent to:

>>> microscope_list = {'colm': colmParser,
...                    'clearscope': clearscopeParser,
...                    'default': tileParser
...                    }

The list is built by looking at the classes that inherit from the base parser class, excluding the base parser class itself.

Returns: microscope_list – dictionary containing all supported microscopes.
Return type: dict
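
Building such a registry from the class hierarchy can be sketched with `__subclasses__()`. Everything below is a hypothetical stand-in: the `*Sketch` classes mimic the inheritance chain, and deriving each key from the class name is an assumption about the naming rule, not a documented behaviour.

```python
class BaseParserSketch:
    """Hypothetical stand-in for the base parser class."""


class TileParserSketch(BaseParserSketch):
    """Hypothetical stand-in for the generic multi-tile parser."""


class ColmParserSketch(TileParserSketch):
    pass


class ClearscopeParserSketch(TileParserSketch):
    pass


def all_subclasses(cls):
    """Recursively collect every direct and indirect subclass of `cls`."""
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


def build_microscope_list(base=BaseParserSketch, default=TileParserSketch):
    """Map a lowercase microscope name to its parser class."""
    microscopes = {}
    for cls in all_subclasses(base):
        if cls is default:
            continue  # the generic parser is registered under 'default' below
        # Assumed naming rule: strip the class-name suffix and lowercase it.
        key = cls.__name__.replace("ParserSketch", "").lower()
        microscopes[key] = cls
    microscopes["default"] = default
    return microscopes


microscope_list = build_microscope_list()
```

With this approach, defining a new subclass is enough for it to appear in the registry; no central list has to be edited.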

paprica.parser.get_number_of_channels(path)[source]

Returns the number of channels acquired in the acquisition located in the path folder.

Parameters: path (str) – path to the folder to check the number of channels
Returns: number of acquired channels

class paprica.parser.tileParser(path, frame_size=2048, ftype=None, verbose=True)[source]

Bases: paprica.parser.baseParser

Class used to parse multi-tile data where each tile position in space matters. Tiles parsed this way are usually stitched later on.

__getitem__(item)[source]

If item is an int, then returns the corresponding tileLoader object.

If item is a tuple, then returns the corresponding (row, col) tileLoader object.

If item is a slice, then it creates a generator so the tileLoader object is garbage collected at each iteration.
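
The three indexing modes above can be sketched as a type dispatch inside `__getitem__`. This is an illustration under stated assumptions, not paprica's code: `TileGridSketch` is hypothetical, tiles are stored in row-major order on a complete rectangular grid, and `_load` stands in for building a tileLoader object.

```python
class TileGridSketch:
    """Hypothetical container indexed by int, (row, col) tuple or slice."""

    def __init__(self, tiles, n_col):
        self.tiles = list(tiles)  # assumed row-major, complete grid
        self.n_col = n_col

    def _load(self, index):
        # Stand-in for constructing a tileLoader object.
        return self.tiles[index]

    def __len__(self):
        return len(self.tiles)

    def __getitem__(self, item):
        if isinstance(item, int):
            return self._load(item)
        if isinstance(item, tuple):
            row, col = item
            return self._load(row * self.n_col + col)
        if isinstance(item, slice):
            # A generator expression: tiles are produced one at a time.
            return (self._load(i) for i in range(*item.indices(len(self))))
        raise TypeError(f"unsupported index type: {type(item).__name__}")


grid = TileGridSketch(["a", "b", "c", "d", "e", "f"], n_col=3)
by_int = grid[4]
by_pos = grid[1, 1]
by_slice = list(grid[1:3])
```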

__init__(path, frame_size=2048, ftype=None, verbose=True)[source]

Constructor of the tileParser object.

Parameters

path (string) – path where to look for the data.
frame_size (int) – size of each frame (camera resolution).
ftype (string) – input data type: one of ‘apr’, ‘tiff2D’ or ‘tiff3D’

__iter__()[source]

Method called under the hood when iterating on tiles. Because it is a generator, the memory is released at each iteration.

Return type: Generator containing the tileLoader object.

static _get_n_channels(path)[source]: Returns the number of channels of the CLEARSCOPE acquisition. This is based on a regex that checks the highest n in the path: some_path/X_Y___nx/

_get_ncol()[source]: Returns the number of columns (H) to be stitched.

_get_nrow()[source]: Returns the number of rows (V) to be stitched.

_get_tiles_from_path(files)[source]: Create a list of dictionnary for each tile containing it’s path and coordinate on the grid.

_get_type()[source]: Automatically determine file type based on what’s inside ‘path’.
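
A rough sketch of how a file type could be guessed from a folder's contents. The rules below are assumptions for illustration: files ending in `.apr` are taken as ‘apr’, TIFF files directly inside the folder as ‘tiff3D’, and sub-folders as ‘tiff2D’ (one folder of 2D frames per tile). The actual detection logic and file extensions may differ.

```python
from pathlib import Path


def guess_ftype(path):
    """Guess 'apr', 'tiff3D' or 'tiff2D' from the entries found in `path`."""
    entries = list(Path(path).iterdir())
    if any(e.is_file() and e.suffix.lower() == ".apr" for e in entries):
        return "apr"
    if any(e.is_file() and e.suffix.lower() in {".tif", ".tiff"} for e in entries):
        return "tiff3D"
    if any(e.is_dir() for e in entries):
        return "tiff2D"
    raise ValueError(f"could not determine the file type in {path}")
```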

_print_info()[source]: Display parsing summary in the terminal.