paprica.segmenter
Module containing classes and functions related to segmentation.
By using this code you agree to the terms of the software license agreement.
© Copyright 2020 Wyss Center for Bio and Neuro Engineering – All rights reserved
- paprica.segmenter._predict_on_APR_block(x, clf, n_parts=10000000.0, output='class', verbose=False)
Predict the class of each particle with the trained classifier clf on the precomputed features x, processing the particles in blocks to avoid running out of memory (segfault).
- Parameters
x (ndarray) – features (n_particle, n_features) for particle prediction
clf (sklearn.classifier) – pre-trained classifier
n_parts (int) – number of particles in each block to predict
output (string) – output type: ‘class’, where each particle is assigned a class, or ‘proba’, where each particle is assigned a probability of belonging to each class.
verbose (bool) – control function verbosity
- Returns
parts_pred – Class prediction for each particle (class probabilities if output is ‘proba’).
- Return type
array_like
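The blocked strategy described above can be sketched with NumPy alone. This is a minimal illustration, not paprica's implementation: `predict_in_blocks` and `ThresholdClassifier` are hypothetical names, and the classifier is only assumed to expose scikit-learn style `predict` and `predict_proba` methods.

```python
import numpy as np


def predict_in_blocks(x, clf, n_parts=int(1e7), output="class"):
    """Predict per-particle classes (or probabilities) block by block.

    Hypothetical sketch: `clf` is any object with scikit-learn style
    `predict` / `predict_proba` methods.
    """
    if output not in ("class", "proba"):
        raise ValueError("output must be 'class' or 'proba'")
    n_parts = int(n_parts)
    predict = clf.predict if output == "class" else clf.predict_proba
    # Run the classifier on at most n_parts rows at a time, so the
    # classifier's intermediate buffers only ever cover one block.
    blocks = [predict(x[start:start + n_parts])
              for start in range(0, x.shape[0], n_parts)]
    return np.concatenate(blocks, axis=0)


class ThresholdClassifier:
    """Toy stand-in for a trained classifier: class 1 if feature 0 > 0.5."""

    def predict(self, x):
        return (x[:, 0] > 0.5).astype(int)

    def predict_proba(self, x):
        p1 = (x[:, 0] > 0.5).astype(float)
        return np.stack([1 - p1, p1], axis=1)


features = np.array([[0.1, 0.0], [0.9, 1.0], [0.7, 2.0], [0.2, 3.0], [0.6, 4.0]])
print(predict_in_blocks(features, ThresholdClassifier(), n_parts=2))
# prints [0 1 1 0 1]
```

Because each particle is predicted independently, the concatenated result is the same as a single call on the whole array; only the size of each call changes.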
- paprica.segmenter.compute_gradients(apr, parts)
Compute the gradient along each spatial direction directly on the APR.
- Parameters
apr (pyapr.APR) – APR object
parts (pyapr.ParticleData) – particle data sampled on the APR
- Returns
(dx, dy, dz) – gradient for each direction
- Return type
tuple of arrays
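compute_gradients works on APR particles through pyapr. As a hedged illustration of what a per-direction gradient means, the sketch below computes the same quantity on a dense (z, y, x) NumPy volume with finite differences; `dense_gradients` is a hypothetical helper, and the axis order and unit voxel spacing are assumptions.

```python
import numpy as np


def dense_gradients(volume, spacing=(1.0, 1.0, 1.0)):
    """Finite-difference gradient along each axis of a (z, y, x) volume.

    Hypothetical dense-grid analogue only: it does not operate on APR
    particles. `spacing` is the voxel size along (z, y, x).
    """
    dz, dy, dx = np.gradient(volume.astype(float), *spacing)
    return dx, dy, dz  # same ordering as the documented return value


# A ramp that increases by 2 per voxel along x and is constant along z and y.
vol = np.tile(2.0 * np.arange(5), (3, 4, 1))  # shape (3, 4, 5)
dx, dy, dz = dense_gradients(vol)
```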
- paprica.segmenter.compute_gradmag(apr, parts)
Compute the gradient magnitude directly on the APR.
- Parameters
apr (pyapr.APR) – APR object
parts (pyapr.ParticleData) – particle data sampled on the APR
- Returns
Gradient magnitude of the APR.
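Assuming the gradient magnitude is the usual Euclidean norm of the three directional gradients, it can be combined per particle as follows. `gradient_magnitude` is a hypothetical helper working on plain arrays, not a paprica function.

```python
import numpy as np


def gradient_magnitude(dx, dy, dz):
    """Hypothetical helper: Euclidean norm of the directional gradients."""
    return np.sqrt(dx ** 2 + dy ** 2 + dz ** 2)


# Per-particle gradient components (e.g. as returned by compute_gradients).
dx = np.array([3.0, 0.0, 1.0])
dy = np.array([4.0, 0.0, 2.0])
dz = np.array([0.0, 0.0, 2.0])
print(gradient_magnitude(dx, dy, dz))
# prints [5. 0. 3.]
```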
- paprica.segmenter.compute_laplacian(apr, parts, grad=None)
Compute the Laplacian directly on the APR from the second derivatives along each spatial direction.
- Parameters
apr (pyapr.APR) – APR object
parts (pyapr.ParticleData) – particle data sampled on the APR
grad (tuple of arrays, optional) – precomputed gradient (dz, dy, dx) for each direction; providing it makes the Laplacian computation faster
- Returns
Laplacian of the APR.
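The Laplacian is the sum of the second derivatives along each axis, which is why a precomputed gradient saves work: only one more derivative per direction is needed. The dense NumPy sketch below illustrates this under stated assumptions (hypothetical `dense_laplacian`, (z, y, x) axis order, unit spacing); it does not run on APR particles.

```python
import numpy as np


def dense_laplacian(volume, grad=None):
    """Laplacian of a dense (z, y, x) volume: sum of second derivatives.

    Hypothetical dense-grid analogue of compute_laplacian (not APR-based).
    `grad` may hold precomputed first derivatives (dz, dy, dx).
    """
    if grad is None:
        grad = np.gradient(volume.astype(float))  # [dz, dy, dx], unit spacing
    dz, dy, dx = grad
    # Differentiate each first derivative once more along its own axis.
    d2z = np.gradient(dz, axis=0)
    d2y = np.gradient(dy, axis=1)
    d2x = np.gradient(dx, axis=2)
    return d2z + d2y + d2x


n = 7
z, y, x = np.meshgrid(np.arange(n), np.arange(n), np.arange(n), indexing="ij")
f = (x ** 2 + y ** 2 + z ** 2).astype(float)
lap = dense_laplacian(f)
# Away from the borders the discrete Laplacian of x^2 + y^2 + z^2 equals 6.
lap_reused = dense_laplacian(f, grad=np.gradient(f))
```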
- paprica.segmenter.gaussian_blur(apr, parts, sigma=1.5, size=11)
Compute a Gaussian blur directly on the APR.
- Parameters
apr (pyapr.APR) – APR object
parts (pyapr.ParticleData) – particle data sampled on the APR
sigma (float) – standard deviation of the Gaussian kernel (controls the blur radius)
size (int) – kernel size (increase with caution: the computational cost grows non-linearly with the kernel size)
- Returns
Blurred APR.
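To illustrate how sigma and size interact, the sketch below builds a normalized 1-D Gaussian kernel with size taps and applies it along each axis of a dense volume. It is a hypothetical dense-grid analogue (`gaussian_kernel_1d`, `dense_gaussian_blur`); whether gaussian_blur filters separably on the APR is not stated here.

```python
import numpy as np


def gaussian_kernel_1d(sigma=1.5, size=11):
    """Normalized 1-D Gaussian kernel with `size` taps centred on zero."""
    half = (size - 1) / 2.0
    t = np.arange(size) - half
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()


def dense_gaussian_blur(volume, sigma=1.5, size=11):
    """Hypothetical dense analogue: blur a (z, y, x) volume axis by axis.

    A separable pass costs about 3 * size operations per voxel, versus
    size ** 3 for a full 3-D kernel. Borders are zero-padded ('same' mode),
    so every axis must be at least `size` long.
    """
    k = gaussian_kernel_1d(sigma, size)
    out = volume.astype(float)
    for axis in range(out.ndim):
        out = np.apply_along_axis(np.convolve, axis, out, k, mode="same")
    return out


vol = np.zeros((21, 21, 21))
vol[10, 10, 10] = 1.0  # single bright voxel
blurred = dense_gaussian_blur(vol)
```

With the defaults (sigma=1.5, size=11) the kernel extends to about ±3.3 sigma; a smaller size for the same sigma truncates the Gaussian tails.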
- paprica.segmenter.map_feature(apr, parts_cc, features)
Map feature values to segmented particle data.
- Parameters
apr (pyapr.APR) – apr object to map features to
parts_cc (pyapr.ParticleData) – connected component particle array corresponding to apr
features (array_like) – array containing the values to map
- Returns
Array of mapped values: each particle of a connected component takes the value in features that corresponds to its component.
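The mapping can be expressed as a lookup table indexed by the connected-component label. The sketch assumes that label 0 is background (mapped to 0) and that component i takes features[i - 1]; both conventions, and the helper name `map_values_to_particles`, are hypothetical.

```python
import numpy as np


def map_values_to_particles(cc_labels, features):
    """Give every particle the value of the connected component it belongs to.

    Hypothetical conventions: label 0 is background (mapped to 0) and
    component i (i >= 1) takes the value features[i - 1].
    """
    # Prepend the background value so that the label can be used as an index.
    lut = np.concatenate(([0], np.asarray(features)))
    return lut[cc_labels]


cc = np.array([0, 1, 1, 2, 0, 3, 2])    # component id of each particle
volumes = np.array([12.5, 40.0, 7.0])   # one value per component
mapped = map_values_to_particles(cc, volumes)
```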
- class paprica.segmenter.multitileSegmenter(tiles: paprica.parser.tileParser, database: (str, pandas.DataFrame), clf, func_to_compute_features, func_to_get_cc, verbose=True)
Bases: object
Class used to segment multitile acquisitions.
- __init__(tiles: paprica.parser.tileParser, database: (str, pandas.DataFrame), clf, func_to_compute_features, func_to_get_cc, verbose=True)
- Parameters
tiles (tileParser) – tile object for loading the tiles (or containing the preloaded tiles).
database (pd.DataFrame, string) – dataframe (or path to the csv file) containing the registration parameters to correctly place each tile.
clf (sklearn.classifier) – pre-trained classifier
func_to_compute_features (func) – function to compute the features on ParticleData. Must compute the same set of features as the one used to train the classifier.
func_to_get_cc (func) – function to post-process the segmentation map into connected components (each cell has a unique id)
verbose (bool) – control function verbosity
- _filter_cells_flann(c1, c2, lowe_ratio=0.7, distance_max=5)
Remove duplicate cells using FLANN nearest-neighbor matching with Lowe's ratio test and a distance threshold.
- Parameters
c1 (ndarray) – array containing the coordinates of the first set of cells
c2 (ndarray) – array containing the coordinates of the second set of cells
lowe_ratio (float) – threshold on the ratio nearest-neighbor distance / second-nearest-neighbor distance. Above lowe_ratio, the cell is considered unique; below lowe_ratio, it is likely a duplicate of a detection on the neighboring tile.
distance_max (float) – maximum distance in pixels for two cells to be considered the same.
- Returns
_ – array containing the merged sets without the duplicates.
- Return type
ndarray
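The duplicate-removal criterion can be sketched with brute-force nearest neighbors in NumPy, which here replaces FLANN (an approximate nearest-neighbor library) and is only practical for small sets. The helper `merge_without_duplicates` is hypothetical; it assumes, consistently with the parameters above, that a cell of c2 is a duplicate when it is the nearest neighbor of a cell of c1, passes Lowe's ratio test, and is closer than distance_max.

```python
import numpy as np


def merge_without_duplicates(c1, c2, lowe_ratio=0.7, distance_max=5):
    """Hypothetical sketch: stack c1 and c2, dropping cells of c2 that duplicate c1.

    Brute-force search stands in for FLANN. A cell of c1 is matched to its
    nearest cell in c2 when d1 < lowe_ratio * d2 (Lowe's ratio test) and
    d1 < distance_max, with d1/d2 the distances to the nearest and
    second-nearest cells of c2. Both sets must be non-empty.
    """
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    # Pairwise Euclidean distances, shape (len(c1), len(c2)).
    dist = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=-1)
    order = np.argsort(dist, axis=1)
    rows = np.arange(len(c1))
    d1 = dist[rows, order[:, 0]]
    # With a single candidate there is no second neighbor: accept on distance.
    d2 = dist[rows, order[:, 1]] if c2.shape[0] > 1 else np.full(len(c1), np.inf)
    good = (d1 < lowe_ratio * d2) & (d1 < distance_max)
    duplicates = np.unique(order[good, 0])
    return np.vstack((c1, np.delete(c2, duplicates, axis=0)))


# Cells detected on two overlapping tiles (z, y, x coordinates in pixels).
tile_a = np.array([[10, 10, 10], [50, 50, 50]])
tile_b = np.array([[11, 10, 10], [80, 80, 80], [200, 0, 0]])
merged = merge_without_duplicates(tile_a, tile_b)
# [11, 10, 10] is dropped as a duplicate of [10, 10, 10]; 4 cells remain.
```

An ambiguous match, where the second-nearest candidate is almost as close as the nearest, fails the ratio test, so both cells are kept rather than risking the removal of a distinct cell.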
- _get_tile_position(row, col)
Get the absolute position of a tile from its (row, column) coordinates in the multitile set.
- Parameters
row (int) – row number
col (int) – column number
- Returns
_ – tile absolute position
- Return type
ndarray
- _merge_cells(tile, lowe_ratio, distance_max)
Merge the cells detected on a tile into the final cell list and remove duplicates.
- Parameters
tile (tileLoader) – tile from which to merge cells
lowe_ratio (float) – threshold on the ratio nearest-neighbor distance / second-nearest-neighbor distance. Above lowe_ratio, the cell is considered unique; below lowe_ratio, it is likely a duplicate of a detection on the neighboring tile.
distance_max (float) – maximum distance in pixels for two cells to be considered the same.
- Return type
None
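- _segment_tile(tile: paprica.loader.tileLoader, save_cc=True, save_mask=False, lazy_loading=True)
Compute the segmentation of a tile and store the result as an independent APR.
- Parameters
tile (tileLoader) – tile to segment
save_cc (bool) – option to save the connected component particles to file
save_mask (bool) – option to save the prediction mask to file
lazy_loading (bool) – option to save the tree particles to allow for lazy loading later on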
- Return type
None
- compute_multitile_segmentation(save_cc=True, save_mask=False, lowe_ratio=0.7, distance_max=5, lazy_loading=True)
Compute the segmentation of each tile and store the results as independent APRs.
- Parameters
save_cc (bool) – option to save the connected component particles to file
save_mask (bool) – option to save the prediction mask to file
lowe_ratio (float in ]0, 1[) – maximum ratio between the nearest-neighbor distance and the second-nearest-neighbor distance for two objects to be considered a good match
distance_max (float) – maximum distance in pixels for two objects to be matched
lazy_loading (bool) – option to save the tree particles to allow for lazy loading later on
- Return type
None
- extract_and_merge_cells(lowe_ratio=0.7, distance_max=5)
Extract cell positions in each tile and merge them across all tiles. Identical cells in overlapping areas are automatically detected using the FLANN method.
- Parameters
lowe_ratio (float) – threshold on the ratio nearest-neighbor distance / second-nearest-neighbor distance. Above lowe_ratio, the cell is considered unique; below lowe_ratio, it is likely a duplicate of a detection on the neighboring tile.
distance_max (float) – maximum distance in pixels for two cells to be considered the same.
- Return type
None
- classmethod from_classifier(tiles: paprica.parser.tileParser, database: (str, pandas.DataFrame), classifier, func_to_compute_features, func_to_get_cc=None, verbose=True)
Instantiate a multitileSegmenter object with a classifier, a function to compute the features and a function to get the connected components.
- Parameters
tiles (tileParser) – tile object for loading the tiles (or containing the preloaded tiles).
database (pd.DataFrame, string) – dataframe (or path to the csv file) containing the registration parameters to correctly place each tile.
classifier (sklearn.classifier) – pre-trained classifier
func_to_compute_features (func) – function to compute features used by the classifier to perform the segmentation.
func_to_get_cc (func) – function to compute the connected components from the classifier prediction.
verbose (bool) – control function output.
- Return type
multitileSegmenter object
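- classmethod from_trainer(tiles: paprica.parser.tileParser, database: (str, pandas.DataFrame), trainer, verbose=True)
Instantiate a multitileSegmenter object from a tileTrainer object.
- Parameters
tiles (tileParser) – tile object for loading the tiles (or containing the preloaded tiles).
database (pd.DataFrame, string) – dataframe (or path to the csv file) containing the registration parameters to correctly place each tile.
trainer (tileTrainer) – trainer object previously trained for segmentation
verbose (bool) – control function output
- Return type
multitileSegmenter object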
- paprica.segmenter.particle_levels(apr)
Return the level of each APR particle, where the level is expressed as the size of the particle in pixels.
- Parameters
apr (pyapr.APR) – APR object
- Returns
Particle level (size in pixels) for each particle.
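Under the usual APR convention, where particles at the maximum level are one pixel wide and each lower level doubles the side length, a level converts to a particle size as 2 ** (level_max - level). The hypothetical helper below applies that conversion; whether particle_levels takes level_max from the APR or from the particles is not stated here.

```python
import numpy as np


def level_to_particle_size(levels, level_max=None):
    """Hypothetical helper: convert APR particle levels to side lengths in pixels.

    Assumes the usual APR convention: particles at level_max are one pixel
    wide and each level below doubles the side length.
    """
    levels = np.asarray(levels)
    if level_max is None:
        level_max = int(levels.max())
    return 2 ** (level_max - levels)


sizes = level_to_particle_size([5, 4, 3, 5])
```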
- class paprica.segmenter.tileSegmenter(clf, func_to_compute_features, func_to_get_cc, verbose)
Bases: object
Class used to segment tiles. It is instantiated with a previously trained classifier, a function to compute features (the same features used to train the classifier) and a function to get the post-processed connected components from the classifier output. The tile to segment (tileLoader object) is passed to compute_segmentation.
- __init__(clf, func_to_compute_features, func_to_get_cc, verbose)
- Parameters
clf (sklearn.classifier) – pre-trained classifier
func_to_compute_features (func) – function to compute the features on ParticleData. Must compute the same set of features as the one used to train the classifier.
func_to_get_cc (func) – function to post-process the segmentation map into connected components (each cell has a unique id)
verbose (bool) – control function verbosity
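- compute_segmentation(tile: paprica.loader.tileLoader, save_cc=True, save_mask=False, lazy_loading=True)
Compute the segmentation of the tile and store the result as an independent APR.
- Parameters
tile (tileLoader) – tile to segment
save_cc (bool) – option to save the connected component particles to file
save_mask (bool) – option to save the prediction mask to file
lazy_loading (bool) – option to save the tree particles to allow for lazy loading later on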
- Return type
None
- classmethod from_classifier(classifier, func_to_compute_features, func_to_get_cc=None, verbose=True)
Instantiate a tileSegmenter object with a classifier, a function to compute the features and a function to get the connected components.
- Parameters
classifier (sklearn.classifier) – pre-trained classifier
func_to_compute_features (func) – function to compute features used by the classifier to perform the segmentation.
func_to_get_cc (func) – function to compute the connected component from the classifier prediction.
verbose (bool) – control function output.
- Return type
tileSegmenter object
- classmethod from_trainer(trainer, verbose=True)
Instantiate a tileSegmenter object from a tileTrainer object.
- Parameters
trainer (tileTrainer) – trainer object previously trained for segmentation
verbose (bool) – control function output
- Return type
tileSegmenter object
- class paprica.segmenter.tileTrainer(tile: paprica.loader.tileLoader, func_to_compute_features, func_to_get_cc=None)
Bases: object
Class used to train a classifier that works directly on APR data. It uses Napari to add labels manually.
- static _are_labels_the_same(local_labels)
Determine whether the manual labels within a particle are all the same, and return the label.
- Parameters
local_labels (ndarray) – particle labels
- Returns
(bool, int) – True if the labels are the same, and the corresponding label
- Return type
tuple
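A possible way to implement this check, given as a hypothetical sketch: `labels_agree` returns the documented (bool, int) pair when all labels match; returning (False, None) otherwise is an assumption.

```python
import numpy as np


def labels_agree(local_labels):
    """Hypothetical sketch: (True, label) if all labels match, else (False, None)."""
    values = np.unique(np.asarray(local_labels))
    if values.size == 1:
        return True, int(values[0])
    return False, None  # assumed return value for conflicting labels


same = labels_agree([2, 2, 2])
mixed = labels_agree([1, 2, 1])
```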
- _find_particle(coords)
Find particle index corresponding to pixel location coords.
- Parameters
coords (array_like) – pixel coordinate [z, y, x]
- Returns
idx – particle index
- Return type
int
- _order_labels()
Sort pixel_list by increasing z, then increasing y, then increasing x.
- Return type
None
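The z, then y, then x ordering is a lexicographic sort, which NumPy provides through np.lexsort. This sketch assumes a hypothetical pixel_list layout with columns [z, y, x, label] (the layout used by save_labels); `order_zyx` is not a paprica function.

```python
import numpy as np


def order_zyx(pixel_list):
    """Sort annotation rows by increasing z, then y, then x.

    Assumes a hypothetical layout with columns [z, y, x, label].
    """
    pixel_list = np.asarray(pixel_list)
    # np.lexsort treats the LAST key as the primary one, so pass (x, y, z).
    order = np.lexsort((pixel_list[:, 2], pixel_list[:, 1], pixel_list[:, 0]))
    return pixel_list[order]


annotations = np.array([[1, 0, 5, 2],
                        [0, 3, 1, 1],
                        [0, 3, 0, 1],
                        [1, 0, 2, 2]])
ordered = order_zyx(annotations)
```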
- _remove_ambiguities(verbose)
Remove particles that have been labelled with different labels.
- Parameters
verbose (bool) – option to print out information.
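Removing ambiguous annotations amounts to counting the distinct labels per particle. In this sketch, `drop_ambiguous` and its inputs (one particle index and one label per annotated pixel) are hypothetical; it keeps only the annotations whose particle received a single label.

```python
import numpy as np


def drop_ambiguous(particle_idx, labels):
    """Hypothetical sketch: keep annotations whose particle has a single label.

    particle_idx[i] is the APR particle containing annotated pixel i and
    labels[i] its manual label.
    """
    particle_idx = np.asarray(particle_idx)
    labels = np.asarray(labels)
    # Distinct (particle, label) pairs, then number of labels per particle.
    pairs = np.unique(np.stack([particle_idx, labels], axis=1), axis=0)
    ids, n_labels = np.unique(pairs[:, 0], return_counts=True)
    ambiguous = ids[n_labels > 1]
    keep = ~np.isin(particle_idx, ambiguous)
    return particle_idx[keep], labels[keep]


# Particle 9 was annotated with both label 1 and label 2, so it is removed.
idx, lab = drop_ambiguous([4, 4, 7, 9, 9, 9], [1, 1, 2, 1, 2, 1])
```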
- _sample_pixel_list_on_APR()
Convert manual annotations coordinates from pixel to APR.
- Return type
None
- add_annotations(use_sparse_labels=True, **kwargs)
Add annotations to a previously annotated dataset.
- Parameters
use_sparse_labels (bool) – use sparse array to store the labels (memory efficient but slower graphics)
- Return type
None
- apply_on_tile(tile, bg_label=None, func_to_get_cc=None, display_result=True, verbose=True)
Apply the classifier to the whole tile and display the segmentation results using Napari.
- Parameters
tile (tileLoader) – tile on which to apply the classifier
func_to_get_cc (func) – function to compute the connected components from the classifier prediction
display_result (bool) – option to display segmentation results using Napari
verbose (bool) – option to print out information.
- Return type
None
- display_training_annotations(**kwargs)
Display manual annotations and their sampling on APR grid (if available).
- Return type
None
- load_classifier(path=None)
Load a trained classifier.
- Parameters
path (string) – path for loading the classifier. By default, it is loaded from the data root folder.
- Return type
None
- load_labels(path=None)
Load previously saved labels as a numpy array with columns corresponding to [z, y, x, label].
- Parameters
path (string) – path to load the saved labels. By default, they are loaded from the data root folder.
- Return type
None
- manually_annotate(use_sparse_labels=True, **kwargs)
Manually annotate dataset using Napari.
- Parameters
use_sparse_labels (bool) – use sparse array to store the labels (memory efficient but slower graphics)
- Return type
None
- save_classifier(path=None)
Save the trained classifier.
- Parameters
path (string) – path for saving the classifier. By default, it is saved in the data root folder.
- Return type
None
- save_labels(path=None)
Save labels as a numpy array with columns corresponding to [z, y, x, label].
- Parameters
path (string) – path to save labels. By default it saves them in the data root folder.
- Return type
None
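The [z, y, x, label] layout shared by save_labels and load_labels can be round-tripped with plain NumPy. This sketch assumes the .npy format written by np.save; the file name and the temporary folder are hypothetical, and paprica may store the array differently.

```python
import os
import tempfile

import numpy as np

# Hypothetical annotations, one row per pixel: columns [z, y, x, label].
labels = np.array([[0, 3, 1, 1],
                   [2, 5, 7, 2]])

# Assumption: labels stored with np.save; "labels.npy" is a hypothetical file
# name in a temporary folder (the documented default is the data root folder).
path = os.path.join(tempfile.mkdtemp(), "labels.npy")
np.save(path, labels)

restored = np.load(path)
z, y, x, label = restored.T  # one 1-D array per column
```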
- segment_training_tile(bg_label=None, display_result=True, verbose=True)
Apply the classifier to the whole training tile and display the segmentation results using Napari.
- Parameters
display_result (bool) – option to display segmentation results using Napari
verbose (bool) – option to print out information.
- Return type
None
- train_classifier(verbose=True, n_estimators=10, class_weight='balanced', mean_norm=True, std_norm=True)
Train the classifier for segmentation.
- Parameters
verbose (bool) – option to print out information.
n_estimators (int) – The number of trees in the random forest.
class_weight ({"balanced", "balanced_subsample"}, dict or list of dicts) –
Weights associated with classes in the form {class_label: weight}. If None, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y. Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].
The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
The “balanced_subsample” mode is the same as “balanced” except that weights are computed based on the bootstrap sample for every tree grown.
For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
mean_norm (bool) – If True, center the data before scaling.
std_norm (bool) – If True, scale the data to unit variance (or equivalently, unit standard deviation).
- Return type
None
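The “balanced” formula and the mean_norm/std_norm options can be checked on toy data. This is a worked example under assumptions, not paprica's training code: the labels and features are made up, and the standardization follows the mean_norm and std_norm descriptions (center, then scale to unit variance), using the population standard deviation as an assumption.

```python
import numpy as np

# Toy training labels (made up): 6 particles of class 0, 2 of class 1.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])

# "balanced" class weights: n_samples / (n_classes * np.bincount(y)).
n_samples, n_classes = y.size, np.unique(y).size
weights = n_samples / (n_classes * np.bincount(y))

# mean_norm / std_norm: center each feature, then scale it to unit variance
# (assumption: population standard deviation, i.e. ddof=0).
x = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 20.0], [7.0, 40.0]])
x_std = (x - x.mean(axis=0)) / x.std(axis=0)
```

With 6 particles of class 0 and 2 of class 1, the balanced weights are 8 / (2 * 6) ≈ 0.67 and 8 / (2 * 2) = 2, so each class contributes the same total weight (6 * 0.67 = 2 * 2 = 4).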