paprica.segmenter

Module containing classes and functions related to segmentation.

By using this code you agree to the terms of the software license agreement.

paprica.segmenter._predict_on_APR_block(x, clf, n_parts=10000000.0, output='class', verbose=False)[source]

Predict particle class with the trained classifier clf on the precomputed features x, using a blocked strategy to avoid memory-related segmentation faults.

Parameters

x (ndarray) – features (n_particle, n_features) for particle prediction
clf – trained classifier used for the prediction
n_parts (int) – number of particles in each batch to predict
output (string) – output type, can be ‘class’ where each particle gets assigned a class or ‘proba’ where each particle gets assigned a probability of belonging to each class.
verbose (bool) – control function verbosity

Returns

parts_pred – Class prediction for each particle.

Return type

array_like
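
The blocked strategy can be illustrated with a minimal NumPy sketch. This is not the library's implementation: ThresholdClassifier is a hypothetical stand-in for a trained scikit-learn style classifier, and predict_in_blocks is an illustrative name. The only assumption about clf is that it exposes predict and predict_proba.

```python
import numpy as np


class ThresholdClassifier:
    """Hypothetical stand-in for a trained scikit-learn style classifier."""

    def predict(self, x):
        # Class 1 when the first feature is positive, class 0 otherwise.
        return (x[:, 0] > 0).astype(np.int64)

    def predict_proba(self, x):
        # Logistic score on the first feature, one column per class.
        p1 = 1.0 / (1.0 + np.exp(-x[:, 0]))
        return np.column_stack((1.0 - p1, p1))


def predict_in_blocks(x, clf, n_parts=1e7, output='class'):
    """Predict block by block so that at most n_parts rows are processed at once."""
    if output == 'class':
        predict = clf.predict
    elif output == 'proba':
        predict = clf.predict_proba
    else:
        raise ValueError("output must be 'class' or 'proba'")
    n_parts = int(n_parts)  # the documented default is the float 10000000.0
    blocks = [predict(x[start:start + n_parts])
              for start in range(0, x.shape[0], n_parts)]
    return np.concatenate(blocks, axis=0)


rng = np.random.default_rng(0)
features = rng.normal(size=(25, 3))  # 25 particles, 3 features
clf = ThresholdClassifier()
blocked = predict_in_blocks(features, clf, n_parts=10)
```

Because each block is predicted independently, the concatenated result matches a single call on the full feature array; only the peak memory use changes.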

paprica.segmenter.compute_gradients(apr, parts)[source]

Compute gradient for each spatial direction directly on APR.

Parameters

apr (APR) – APR object
parts (ParticleData) – particle data sampled on APR

Returns

(dx, dy, dz) – gradient for each direction

Return type

arrays

paprica.segmenter.compute_gradmag(apr, parts)[source]

Compute gradient magnitude directly on APR.

Parameters

apr (APR) – APR object
parts (ParticleData) – particle data sampled on APR

Returns

Gradient magnitude of APR.
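
As a reminder of what the gradient magnitude is, here is a dense-array analogue in NumPy. It is only a sketch of the quantity: the functions above operate directly on APR particles, which this example does not reproduce, and the test volume is made up.

```python
import numpy as np

# Dense stand-in volume indexed as [z, y, x]; intensity grows linearly along x.
z, y, x = np.meshgrid(np.arange(4), np.arange(5), np.arange(6), indexing='ij')
volume = 3.0 * x

# np.gradient returns one array per axis, in axis order (z, y, x).
dz, dy, dx = np.gradient(volume)

# The gradient magnitude combines the three directional derivatives.
gradmag = np.sqrt(dx**2 + dy**2 + dz**2)
```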

paprica.segmenter.compute_laplacian(apr, parts, grad=None)[source]

Compute Laplacian for each spatial direction directly on APR.

Parameters

apr (APR) – APR object
parts (ParticleData) – particle data sampled on APR
grad – (dz, dy, dx) gradient for each direction, if precomputed (faster for Laplacian computation)

Returns

Laplacian of APR.

paprica.segmenter.gaussian_blur(apr, parts, sigma=1.5, size=11)[source]

Compute Gaussian blur directly on APR.

Parameters

apr (APR) – APR object
parts (ParticleData) – particle data sampled on APR
sigma (float) – Gaussian blur standard deviation (kernel radius)
size (int) – kernel size (increase with caution, complexity is not linear)

Returns

Blurred APR.

paprica.segmenter.map_feature(apr, parts_cc, features)[source]

Map feature values to segmented particle data.

Parameters

apr (pyapr.APR) – apr object to map features to
parts_cc (pyapr.ParticleData) – connected component particle array corresponding to apr
features (array_like) – array containing the values to map

Returns

Array of mapped values (each particle in a connected component now has the corresponding value from features)
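
The mapping idea can be sketched with a lookup table in plain NumPy. This is an illustration under stated assumptions, not the library's code: labels are assumed to be integers with 0 as background and components numbered 1..n, features[k - 1] is assumed to hold the value of component k, and map_feature_lookup is a hypothetical name.

```python
import numpy as np


def map_feature_lookup(labels, features, background_value=0.0):
    """Give every labelled element the feature value of its component."""
    features = np.asarray(features, dtype=float)
    # Lookup table indexed by label: slot 0 is the background,
    # slot k holds the value of component k.
    lut = np.concatenate(([background_value], features))
    return lut[np.asarray(labels)]


labels = np.array([0, 1, 1, 2, 0, 3, 3, 3])  # hypothetical component label per particle
volumes = np.array([2.0, 1.0, 3.0])          # one value per component (labels 1, 2, 3)
mapped = map_feature_lookup(labels, volumes)
```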

class paprica.segmenter.multitileSegmenter(tiles: paprica.parser.tileParser, database: (<class 'str'>, <class 'pandas.core.frame.DataFrame'>), clf, func_to_compute_features, func_to_get_cc, verbose=True)[source]

Bases: object

Class used to segment multitile acquisitions.

__init__(tiles: paprica.parser.tileParser, database: (<class 'str'>, <class 'pandas.core.frame.DataFrame'>), clf, func_to_compute_features, func_to_get_cc, verbose=True)[source]

Parameters

tiles (tileParser) – tile object for loading the tile (or containing the preloaded tile).
database (pd.DataFrame, string) – dataframe (or path to the csv file) containing the registration parameters to correctly place each tile.
clf (sklearn.classifier) – pre-trained classifier
func_to_compute_features (func) – function to compute the features on ParticleData. Must be the same set of features as the one used to train the classifier.
func_to_get_cc (func) – function to post-process the segmentation map into connected components (each cell has a unique id)

_filter_cells_flann(c1, c2, lowe_ratio=0.7, distance_max=5)[source]

Remove duplicate cells using the FLANN criterion and a distance threshold.

Parameters

c1 (ndarray) – array containing the coordinates of the first set of cells
c2 (ndarray) – array containing the coordinates of the second set of cells
lowe_ratio (float) – threshold on the ratio between the second nearest neighbor distance and the nearest neighbor distance. Above lowe_ratio, the cell is assumed to be unique; below lowe_ratio, it might have a second detection on the neighboring tile.
distance_max (float) – maximum distance in pixels for two cells to be considered the same.
verbose (bool) – control function verbosity

Returns

_ – array containing the merged sets without the duplicates.

Return type

ndarray
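
A minimal sketch of this kind of duplicate removal, not the library's implementation. It swaps the FLANN approximate nearest-neighbor search for brute-force NumPy distances, and it assumes the standard Lowe ratio test (nearest distance smaller than lowe_ratio times the second nearest distance) combined with the distance_max threshold. The function name filter_duplicates and the sample coordinates are hypothetical.

```python
import numpy as np


def filter_duplicates(c1, c2, lowe_ratio=0.7, distance_max=5):
    """Merge two coordinate sets, dropping cells of c2 that duplicate a cell of c1.

    Requires at least two cells in c2 (the ratio test needs a runner-up).
    """
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    # Pairwise Euclidean distances, shape (len(c1), len(c2)).
    dist = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=-1)
    # Indices of the c2 cells sorted by distance, for every c1 cell.
    order = np.argsort(dist, axis=1)
    rows = np.arange(len(c1))
    d_first = dist[rows, order[:, 0]]
    d_second = dist[rows, order[:, 1]]
    # Assumed criterion: clearly closer than the runner-up and within distance_max.
    is_duplicate = (d_first < lowe_ratio * d_second) & (d_first < distance_max)
    duplicated_in_c2 = np.unique(order[is_duplicate, 0])
    return np.vstack((c1, np.delete(c2, duplicated_in_c2, axis=0)))


c1 = np.array([[10, 10, 10], [50, 50, 50]])
c2 = np.array([[10, 10, 11], [80, 80, 80], [120, 20, 20]])
merged = filter_duplicates(c1, c2)
```

In this example the cell at [10, 10, 11] lies one pixel away from a cell of c1 and far from every other candidate, so it is treated as a second detection and dropped from the merged set.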

_get_tile_position(row, col)[source]

Get the absolute tile position defined by its coordinates in the multitile set.

Parameters

row (int) – row number
col (int) – column number

Returns

_ – tile absolute position

Return type

ndarray

_merge_cells(tile, lowe_ratio, distance_max)[source]

Merge the cells of a tile into the final cell list and remove duplicates.

Parameters

tile (tileLoader) – tile from which to merge cells
lowe_ratio (float) – threshold on the ratio between the second nearest neighbor distance and the nearest neighbor distance. Above lowe_ratio, the cell is assumed to be unique; below lowe_ratio, it might have a second detection on the neighboring tile.
distance_max (float) – maximum distance in pixels for two cells to be considered the same.

Return type

None

_segment_tile(tile: paprica.loader.tileLoader, save_cc=True, save_mask=False, lazy_loading=True)[source]

Compute the segmentation and store the result as an independent APR.

Parameters: verbose (bool) – control the verbosity of the function to print some info
Return type: None

compute_multitile_segmentation(save_cc=True, save_mask=False, lowe_ratio=0.7, distance_max=5, lazy_loading=True)[source]

Compute the segmentation and store the result as an independent APR.

Parameters

verbose (bool) – control the verbosity of the function to print some info
save_cc (bool) – option to save the connected component particle to file
save_mask (bool) – option to save the prediction mask to file
lowe_ratio (float in ]0, 1[) – ratio between the second nearest neighbor distance and the nearest neighbor distance for a pair to be considered a good match
distance_max (float) – maximum distance in pixels for two objects to be matched
lazy_loading (bool) – option to save the tree particles to allow for lazy loading later on

Return type

None

extract_and_merge_cells(lowe_ratio=0.7, distance_max=5)[source]

Extract cell positions in each tile and merge them across all tiles. Identical cells in overlapping areas are automatically detected using the FLANN method.

Parameters

lowe_ratio (float) – threshold on the ratio between the second nearest neighbor distance and the nearest neighbor distance. Above lowe_ratio, the cell is assumed to be unique; below lowe_ratio, it might have a second detection on the neighboring tile.
distance_max (float) – maximum distance in pixels for two cells to be considered the same.

Return type

None

classmethod from_classifier(tiles: paprica.parser.tileParser, database: (<class 'str'>, <class 'pandas.core.frame.DataFrame'>), classifier, func_to_compute_features, func_to_get_cc=None, verbose=True)[source]

Instantiate a multitileSegmenter object with a classifier, a function to compute the features, and a function to get the connected components.

Parameters

classifier –
func_to_compute_features (func) – function to compute features used by the classifier to perform the segmentation.
func_to_get_cc (func) – function to compute the connected component from the classifier prediction.
verbose (bool) – control function output.

Return type

multitileSegmenter object

classmethod from_trainer(tiles: paprica.parser.tileParser, database: (<class 'str'>, <class 'pandas.core.frame.DataFrame'>), trainer, verbose=True)[source]

Instantiate a multitileSegmenter object with a tileTrainer object.

Parameters

trainer (tileTrainer) – trainer object previously trained for segmentation
verbose (bool) – control function output

Return type

multitileSegmenter object

save_cells(output_path)[source]

Save cells as a CSV file.

Parameters: output_path (string) – path for saving the CSV file.
Return type: None

paprica.segmenter.particle_levels(apr)[source]

Return the APR level of each particle: the level is defined as the size of the particle in pixels.

Parameters: apr (APR) – APR object
Returns: Particle level.

class paprica.segmenter.tileSegmenter(clf, func_to_compute_features, func_to_get_cc, verbose)[source]

Bases: object

Class used to segment tiles. It is instantiated with a tileLoader object, a previously trained classifier, a function to compute features (the same features used to train the classifier), and a function to get the post-processed connected components from the classifier output.

__init__(clf, func_to_compute_features, func_to_get_cc, verbose)[source]

Parameters

clf (sklearn.classifier) – pre-trained classifier
func_to_compute_features (func) – function to compute the features on ParticleData. Must be the same set of features as the one used to train the classifier.
func_to_get_cc (func) – function to post-process the segmentation map into connected components (each cell has a unique id)

compute_segmentation(tile: paprica.loader.tileLoader, save_cc=True, save_mask=False, lazy_loading=True)[source]

Compute the segmentation and store the result as an independent APR.

Parameters: verbose (bool) – control the verbosity of the function to print some info
Return type: None

classmethod from_classifier(classifier, func_to_compute_features, func_to_get_cc=None, verbose=True)[source]

Instantiate tileSegmenter object with a classifier, function to compute the features and to get the connected components.

Parameters

classifier –
func_to_compute_features (func) – function to compute features used by the classifier to perform the segmentation.
func_to_get_cc (func) – function to compute the connected component from the classifier prediction.
verbose (bool) – control function output.

Return type

tileSegmenter object

classmethod from_trainer(trainer, verbose=True)[source]

Instantiate tileSegmenter object with a tileTrainer object.

Parameters

trainer (tileTrainer) – trainer object previously trained for segmentation
verbose (bool) – control function output

Return type

tileSegmenter object

class paprica.segmenter.tileTrainer(tile: paprica.loader.tileLoader, func_to_compute_features, func_to_get_cc=None)[source]

Bases: object

Class used to train a classifier that works directly on APR data. It uses Napari to manually add labels.

static _are_labels_the_same(local_labels)[source]

Determine whether the manual labels within a particle are all the same and return the label.

Parameters: local_labels (ndarray) – particle labels
Returns: True if labels are the same, and the corresponding label
Return type: (bool, int)
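
The check can be sketched with np.unique. This is an illustration, not the library's code: are_labels_the_same is a hypothetical standalone version, and returning None as the label when the annotations conflict is an assumption, since the documentation does not say what is returned in that case.

```python
import numpy as np


def are_labels_the_same(local_labels):
    """Return (True, label) when all labels agree, (False, None) otherwise."""
    unique_labels = np.unique(local_labels)
    if unique_labels.size == 1:
        return True, int(unique_labels[0])
    # Assumption: no meaningful label exists for a conflicting particle.
    return False, None


consistent = are_labels_the_same(np.array([2, 2, 2]))
conflicting = are_labels_the_same(np.array([1, 2, 1]))
```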

_find_particle(coords)[source]

Find particle index corresponding to pixel location coords.

Parameters: coords (array_like) – pixel coordinate [z, y, x]
Returns: idx – particle index
Return type: int

_order_labels()[source]

Order pixel_list in z increasing order, then y increasing order and finally x increasing order.

Return type: None
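
One way to obtain this ordering is np.lexsort, shown here as a sketch. The [z, y, x, label] column layout is taken from the label format described for save_labels; the sample rows are made up, and whether the library uses lexsort internally is not known.

```python
import numpy as np

# Hypothetical annotations, one row per labelled pixel: [z, y, x, label].
pixel_list = np.array([
    [1, 0, 2, 1],
    [0, 3, 1, 2],
    [0, 3, 0, 1],
    [0, 1, 5, 2],
])

# np.lexsort uses the LAST key as the primary key, so z goes last.
order = np.lexsort((pixel_list[:, 2], pixel_list[:, 1], pixel_list[:, 0]))
ordered = pixel_list[order]
```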

_remove_ambiguities(verbose)[source]

Remove particles that have been labelled with different labels.

Parameters: verbose (bool) – option to print out information.

_sample_pixel_list_on_APR()[source]

Convert manual annotations coordinates from pixel to APR.

Return type: None

add_annotations(use_sparse_labels=True, **kwargs)[source]

Add annotations to a previously annotated dataset.

Parameters: use_sparse_labels (bool) – use sparse array to store the labels (memory efficient but slower graphics)
Return type: None

apply_on_tile(tile, bg_label=None, func_to_get_cc=None, display_result=True, verbose=True)[source]

Apply classifier to the whole tile and display segmentation results using Napari.

Parameters

display_result (bool) – option to display segmentation results using Napari
verbose (bool) – option to print out information.

Return type

None

display_features()[source]

Display the computed features.

display_training_annotations(**kwargs)[source]

Display manual annotations and their sampling on APR grid (if available).

Return type: None

load_classifier(path=None)[source]

Load a trained classifier.

Parameters: path (string) – path for loading the classifier. By default, it is loaded from the data root folder.
Return type: None

load_labels(path=None)[source]

Load previously saved labels as numpy array with columns corresponding to [z, y, x, label].

Parameters: path (string) – path to load the saved labels. By default, it loads them from the data root folder.
Return type: None

manually_annotate(use_sparse_labels=True, **kwargs)[source]

Manually annotate dataset using Napari.

Parameters: use_sparse_labels (bool) – use sparse array to store the labels (memory efficient but slower graphics)
Return type: None

save_classifier(path=None)[source]

Save the trained classifier.

Parameters: path (string) – path for saving the classifier. By default, it is saved in the data root folder.
Return type: None

save_labels(path=None)[source]

Save labels as numpy array with columns corresponding to [z, y, x, label].

Parameters: path (string) – path to save labels. By default it saves them in the data root folder.
Return type: None
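
The label layout can be illustrated with a NumPy round trip. This is a sketch only: storing the array with np.save in .npy format and the file name manual_labels.npy are assumptions, not something the documentation specifies.

```python
import os
import tempfile

import numpy as np

# Columns: z, y, x, label (one row per annotated pixel).
labels = np.array([[0, 12, 40, 1],
                   [3, 15, 42, 2]])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'manual_labels.npy')  # hypothetical file name
    np.save(path, labels)
    reloaded = np.load(path)

# Unpack the columns of the reloaded array.
z, y, x, label = reloaded.T
```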

segment_training_tile(bg_label=None, display_result=True, verbose=True)[source]

Apply classifier to the whole tile and display segmentation results using Napari.

Parameters

display_result (bool) – option to display segmentation results using Napari
verbose (bool) – option to print out information.

Return type

None

train_classifier(verbose=True, n_estimators=10, class_weight='balanced', mean_norm=True, std_norm=True)[source]

Train the classifier for segmentation.

Parameters

verbose (bool) – option to print out information.
n_estimators (int) – The number of trees in the random forest.
class_weight ({"balanced", "balanced_subsample"}, dict or list of dicts) – Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

The “balanced_subsample” mode is the same as “balanced” except that weights are computed based on the bootstrap sample for every tree grown.

For multi-output, the weights of each column of y will be multiplied.

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
mean_norm (bool) – If True, center the data before scaling.
std_norm (bool) – If True, scale the data to unit variance (or equivalently, unit standard deviation).

Return type

None
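
The “balanced” formula quoted above can be worked through on a small example. The label vector is hypothetical, and the sketch assumes that every class from 0 to the largest label appears at least once, so that the length of np.bincount(y) equals the number of classes.

```python
import numpy as np

y = np.array([0, 0, 0, 1])  # hypothetical labels: three of class 0, one of class 1

n_samples = y.size
counts = np.bincount(y)     # samples per class
n_classes = counts.size     # valid only if every class 0..max is present
balanced_weights = n_samples / (n_classes * counts)
```

Here class 0 receives the weight 4 / (2 * 3) and class 1 receives 4 / (2 * 1), so the rarer class weighs three times as much as the frequent one.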