paprica.segmenter

Module containing classes and functions related to segmentation.

By using this code you agree to the terms of the software license agreement.

© Copyright 2020 Wyss Center for Bio and Neuro Engineering – All rights reserved

paprica.segmenter._predict_on_APR_block(x, clf, n_parts=10000000.0, output='class', verbose=False)

Predict particle class with the trained classifier clf on the precomputed features x, using a blocked strategy to avoid memory-related segfaults.

Parameters
  • x (ndarray) – features (n_particle, n_features) for particle prediction

  • clf (sklearn.classifier) – pre-trained classifier

  • n_parts (int) – number of particles in the batch to predict

  • output (string) – output type, can be ‘class’, where each particle gets assigned a class, or ‘proba’, where each particle gets assigned a probability of belonging to each class.

  • verbose (bool) – control function verbosity

Returns

parts_pred – Class prediction for each particle.

Return type

array_like
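The blocked strategy can be illustrated without paprica. The sketch below is not the library's implementation: ThresholdClassifier is a hypothetical stand-in for a trained scikit-learn classifier, and predict_in_blocks is an illustrative name. It only shows how the feature array can be predicted n_parts rows at a time and the partial results concatenated.

```python
import numpy as np


class ThresholdClassifier:
    """Hypothetical stand-in for a trained scikit-learn classifier."""

    def predict(self, x):
        # Class 1 when the first feature is positive, class 0 otherwise.
        return (x[:, 0] > 0).astype(int)

    def predict_proba(self, x):
        # Logistic score on the first feature: [P(class 0), P(class 1)].
        p1 = 1.0 / (1.0 + np.exp(-x[:, 0]))
        return np.column_stack([1.0 - p1, p1])


def predict_in_blocks(x, clf, n_parts=1e7, output="class"):
    """Predict block by block so that at most n_parts rows are handled at once."""
    if output not in ("class", "proba"):
        raise ValueError("output must be 'class' or 'proba'")
    predict = clf.predict if output == "class" else clf.predict_proba
    n_parts = int(n_parts)
    blocks = [predict(x[start:start + n_parts])
              for start in range(0, x.shape[0], n_parts)]
    return np.concatenate(blocks, axis=0)


rng = np.random.default_rng(0)
features = rng.normal(size=(25, 3))

# 25 particles predicted in blocks of 10, 10 and 5.
classes = predict_in_blocks(features, ThresholdClassifier(), n_parts=10)
probas = predict_in_blocks(features, ThresholdClassifier(), n_parts=10,
                           output="proba")
```

The blocked result is identical to a single call on the whole array; only the peak memory used during prediction changes.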

paprica.segmenter.compute_gradients(apr, parts)

Compute gradient for each spatial direction directly on APR.

Parameters
  • apr (APR) – APR object

  • parts (ParticleData) – particle data sampled on APR

Returns

(dx, dy, dz) – gradient for each direction

Return type

arrays

paprica.segmenter.compute_gradmag(apr, parts)

Compute gradient magnitude directly on APR.

Parameters
  • apr (APR) – APR object

  • parts (ParticleData) – particle data sampled on APR

Returns

Gradient magnitude of APR.
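For intuition, the same quantity can be computed on a regular pixel grid with NumPy. This is only a pixel-grid analogue under that assumption; the functions above work on APR particles and their implementation is not shown here.

```python
import numpy as np

# Linear ramp on a small (z, y, x) grid: intensity = 2*z + 3*y + 6*x.
z, y, x = np.meshgrid(np.arange(4), np.arange(5), np.arange(6), indexing="ij")
volume = 2.0 * z + 3.0 * y + 6.0 * x

# np.gradient returns one array per axis, in axis order (z, y, x) here.
dz, dy, dx = np.gradient(volume)

# Gradient magnitude: Euclidean norm of the three directional derivatives.
gradmag = np.sqrt(dz**2 + dy**2 + dx**2)
```

For this ramp the three derivatives are constant (2, 3 and 6), so the magnitude is the same at every voxel.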

paprica.segmenter.compute_laplacian(apr, parts, grad=None)

Compute Laplacian for each spatial direction directly on APR.

Parameters
  • apr (APR) – APR object

  • parts (ParticleData) – particle data sampled on APR

  • grad ((dz, dy, dx)) – gradient for each direction, if precomputed (faster for Laplacian computation)

Returns

Laplacian of APR.
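The benefit of passing a precomputed gradient can be sketched on a regular pixel grid. The function name laplacian_from_gradient is illustrative, and deriving the Laplacian by differentiating each first derivative again is an assumption about the approach, not a description of the library's code.

```python
import numpy as np


def laplacian_from_gradient(volume, grad=None):
    """Sum of second derivatives; reuses the first derivatives when provided."""
    if grad is None:
        grad = np.gradient(volume)
    dz, dy, dx = grad
    # Differentiate each first derivative again along its own axis.
    return (np.gradient(dz, axis=0)
            + np.gradient(dy, axis=1)
            + np.gradient(dx, axis=2))


coords = np.arange(8.0)
z, y, x = np.meshgrid(coords, coords, coords, indexing="ij")
volume = z**2 + y**2 + x**2

grad = np.gradient(volume)  # computed once, reusable by other features
lap = laplacian_from_gradient(volume, grad=grad)
```

Away from the borders, where finite differences are exact for a quadratic, the result equals the analytical Laplacian of z² + y² + x².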

paprica.segmenter.gaussian_blur(apr, parts, sigma=1.5, size=11)

Compute Gaussian blur directly on APR.

Parameters
  • apr (APR) – APR object

  • parts (ParticleData) – particle data sampled on APR

  • sigma (float) – Gaussian blur standard deviation (kernel radius)

  • size (int) – kernel size (increase with caution, complexity is not linear)

Returns

Blurred APR.
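To show how sigma and size interact, here is a pixel-grid sketch with the same default values. The helper names are hypothetical, and the separable convolution used here is a simplification chosen for the example; it may differ from how the blur is evaluated on APR particles.

```python
import numpy as np


def gaussian_kernel_1d(sigma=1.5, size=11):
    """Normalized 1D Gaussian kernel with `size` taps centered on zero."""
    offsets = np.arange(size) - (size - 1) / 2.0
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()


def blur_separable(volume, sigma=1.5, size=11):
    """Blur a 3D array by convolving each axis with the same 1D kernel."""
    kernel = gaussian_kernel_1d(sigma, size)
    out = volume.astype(float)
    for axis in range(out.ndim):
        # mode="same" keeps the axis length (each axis must be >= size).
        out = np.apply_along_axis(np.convolve, axis, out, kernel, mode="same")
    return out


# Impulse response: a single bright voxel in the middle of a 21^3 volume.
volume = np.zeros((21, 21, 21))
volume[10, 10, 10] = 1.0

kernel = gaussian_kernel_1d()
blurred = blur_separable(volume)
```

With size=11 the kernel extends 5 pixels on each side of the center, a little more than three standard deviations for sigma=1.5, so little of the Gaussian is truncated.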

paprica.segmenter.map_feature(apr, parts_cc, features)

Map feature values to segmented particle data.

Parameters
  • apr (pyapr.APR) – apr object to map features to

  • parts_cc (pyapr.ParticleData) – connected component particle array corresponding to apr

  • features (array_like) – array containing the values to map

Returns

Array of mapped values (each particle in the connected component now has the value present in features).
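The mapping amounts to a lookup from object id to feature value. The NumPy sketch below assumes that label 0 is background and that features[i - 1] holds the value of object i; both conventions are assumptions made for the example and are not stated above.

```python
import numpy as np

# Connected-component label of each particle (0 assumed to be background).
parts_cc = np.array([0, 1, 1, 2, 0, 3, 3, 3])

# One value per object, e.g. its volume; index i - 1 holds object i.
features = np.array([10.0, 20.0, 30.0])

# Prepend a background value so that labels can be used directly as indices.
lookup = np.concatenate([[0.0], features])
mapped = lookup[parts_cc]
```

Every particle of object 3 now carries the value 30.0, which is what makes the result usable for display or filtering.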

class paprica.segmenter.multitileSegmenter(tiles: paprica.parser.tileParser, database: (str, pandas.DataFrame), clf, func_to_compute_features, func_to_get_cc, verbose=True)

Bases: object

Class used to segment multitile acquisitions.

__init__(tiles: paprica.parser.tileParser, database: (str, pandas.DataFrame), clf, func_to_compute_features, func_to_get_cc, verbose=True)
Parameters
  • tiles (tileParser) – tile parser object referencing the tiles of the acquisition to segment.

  • database (pd.DataFrame, string) – dataframe (or path to the csv file) containing the registration parameters to correctly place each tile.

  • clf (sklearn.classifier) – pre-trained classifier

  • func_to_compute_features (func) – function to compute the features on ParticleData. Must be the same set of features as the one used to train the classifier.

  • func_to_get_cc (func) – function to post-process the segmentation map into connected components (each cell has a unique id)

_filter_cells_flann(c1, c2, lowe_ratio=0.7, distance_max=5)

Remove duplicate cells using the FLANN criterion and a distance threshold.

Parameters
  • c1 (ndarray) – array containing the coordinates of the first set of cells

  • c2 (ndarray) – array containing the coordinates of the second set of cells

  • lowe_ratio (float) – threshold on the ratio of the nearest neighbor distance to the second nearest neighbor distance. Above lowe_ratio, the cell is considered unique. Below lowe_ratio, it might have a second detection on the neighboring tile.

  • distance_max (float) – maximum distance in pixels for two cells to be considered the same.

  • verbose (bool) – control function verbosity

Returns

_ – array containing the merged sets without the duplicates.

Return type

ndarray
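The two criteria can be sketched with a brute-force nearest-neighbor search in NumPy. This is an illustration under assumptions, not the method's code: FLANN performs an approximate search, the name merge_without_duplicates is hypothetical, and the ratio is assumed to compare the nearest distance with the second nearest one, as in Lowe's ratio test.

```python
import numpy as np


def merge_without_duplicates(c1, c2, lowe_ratio=0.7, distance_max=5):
    """Append to c1 the cells of c2 that are not already present in c1."""
    # Pairwise Euclidean distances, shape (len(c2), len(c1)); needs len(c1) >= 2.
    dist = np.linalg.norm(c2[:, None, :] - c1[None, :, :], axis=2)
    # Two smallest distances for each cell of c2.
    nearest, second = np.sort(dist, axis=1)[:, :2].T
    # Duplicate: close to one cell of c1 and clearly closer than to any other.
    is_duplicate = (nearest < lowe_ratio * second) & (nearest < distance_max)
    return np.vstack([c1, c2[~is_duplicate]])


# Cells detected on two overlapping tiles, as (z, y, x) coordinates.
c1 = np.array([[0.0, 0.0, 0.0], [50.0, 50.0, 50.0], [100.0, 0.0, 0.0]])
c2 = np.array([[1.0, 0.0, 0.0], [200.0, 200.0, 200.0]])

merged = merge_without_duplicates(c1, c2)
```

The first cell of c2 lies one pixel away from a cell of c1 and far from all others, so it is treated as a second detection and dropped; the second cell has no close neighbor and is kept.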

_get_tile_position(row, col)

Get the absolute position of a tile from its coordinates in the multitile set.

Parameters
  • row (int) – row number

  • col (int) – column number

Returns

_ – tile absolute position

Return type

ndarray

_merge_cells(tile, lowe_ratio, distance_max)

Merge the cells of a tile into the final cell list and remove duplicates.

Parameters
  • tile (tileLoader) – tile from which to merge cells

  • lowe_ratio (float) – threshold on the ratio of the nearest neighbor distance to the second nearest neighbor distance. Above lowe_ratio, the cell is considered unique. Below lowe_ratio, it might have a second detection on the neighboring tile.

  • distance_max (float) – maximum distance in pixels for two cells to be considered the same.

Return type

None

_segment_tile(tile: paprica.loader.tileLoader, save_cc=True, save_mask=False, lazy_loading=True)

Compute the segmentation and store the result as an independent APR.

Parameters

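  • tile (tileLoader) – tile to segment

  • save_cc (bool) – option to save the connected component particle to file

  • save_mask (bool) – option to save the prediction mask to file

  • lazy_loading (bool) – option to save the tree particles to allow for lazy loading later on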

Return type

None

compute_multitile_segmentation(save_cc=True, save_mask=False, lowe_ratio=0.7, distance_max=5, lazy_loading=True)

Compute the segmentation and store the result as an independent APR.

Parameters
  • verbose (bool) – control the verbosity of the function to print some info

  • save_cc (bool) – option to save the connected component particle to file

  • save_mask (bool) – option to save the prediction mask to file

  • lowe_ratio (float in ]0, 1[) – maximum ratio between the nearest neighbor distance and the second nearest neighbor distance for a pair to be considered a good match

  • distance_max (float) – maximum distance in pixels for two objects to be matched

  • lazy_loading (bool) – option to save the tree particles to allow for lazy loading later on

Return type

None

extract_and_merge_cells(lowe_ratio=0.7, distance_max=5)

Extract cell positions in each tile and merge them across all tiles. Identical cells in overlapping areas are automatically detected using the FLANN method.

Parameters
  • lowe_ratio (float) – threshold on the ratio of the nearest neighbor distance to the second nearest neighbor distance. Above lowe_ratio, the cell is considered unique. Below lowe_ratio, it might have a second detection on the neighboring tile.

  • distance_max (float) – maximum distance in pixels for two cells to be considered the same.

Return type

None

classmethod from_classifier(tiles: paprica.parser.tileParser, database: (str, pandas.DataFrame), classifier, func_to_compute_features, func_to_get_cc=None, verbose=True)

Instantiate a multitileSegmenter object with a classifier, a function to compute the features, and a function to get the connected components.

Parameters
  • classifier (sklearn.classifier) – pre-trained classifier

  • func_to_compute_features (func) – function to compute features used by the classifier to perform the segmentation.

  • func_to_get_cc (func) – function to compute the connected component from the classifier prediction.

  • verbose (bool) – control function output.

Return type

multitileSegmenter object

classmethod from_trainer(tiles: paprica.parser.tileParser, database: (str, pandas.DataFrame), trainer, verbose=True)

Instantiate a multitileSegmenter object with a tileTrainer object.

Parameters
  • trainer (tileTrainer) – trainer object previously trained for segmentation

  • verbose (bool) – control function output

Return type

multitileSegmenter object

save_cells(output_path)

Save cells as a CSV file.

Parameters

output_path (string) – path for saving the CSV file.

Return type

None

paprica.segmenter.particle_levels(apr)

Return the APR level of each particle. The level is defined as the size of the particle in pixels.

Parameters

apr (APR) – APR object

Returns

Particle level.
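One way to obtain a size in pixels from a resolution level is sketched below. It assumes the usual APR convention that a particle at the finest level covers one pixel per side and that each coarser level doubles that size; the levels array is hypothetical and this is not necessarily how the function is implemented.

```python
import numpy as np

# Hypothetical resolution level of each particle; the finest level is the largest.
levels = np.array([3, 4, 5, 5, 2])
level_max = levels.max()

# Each level below the finest one doubles the side length of the particle.
size_in_pixels = 2 ** (level_max - levels)
```

Under this convention, particles in flat regions of the image get large values and particles near edges get a value of 1.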

class paprica.segmenter.tileSegmenter(clf, func_to_compute_features, func_to_get_cc, verbose)

Bases: object

Class used to segment tiles. It is instantiated with a previously trained classifier, a function to compute features (the same features used to train the classifier), and a function to get the post-processed connected components from the classifier output. The tile to segment is passed to compute_segmentation as a tileLoader object.

__init__(clf, func_to_compute_features, func_to_get_cc, verbose)
Parameters
  • clf (sklearn.classifier) – pre-trained classifier

  • func_to_compute_features (func) – function to compute the features on ParticleData. Must be the same set of features as the one used to train the classifier.

  • func_to_get_cc (func) – function to post-process the segmentation map into connected components (each cell has a unique id)

compute_segmentation(tile: paprica.loader.tileLoader, save_cc=True, save_mask=False, lazy_loading=True)

Compute the segmentation and store the result as an independent APR.

Parameters

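  • tile (tileLoader) – tile to segment

  • save_cc (bool) – option to save the connected component particle to file

  • save_mask (bool) – option to save the prediction mask to file

  • lazy_loading (bool) – option to save the tree particles to allow for lazy loading later on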

Return type

None

classmethod from_classifier(classifier, func_to_compute_features, func_to_get_cc=None, verbose=True)

Instantiate a tileSegmenter object with a classifier, a function to compute the features, and a function to get the connected components.

Parameters
  • classifier (sklearn.classifier) – pre-trained classifier

  • func_to_compute_features (func) – function to compute features used by the classifier to perform the segmentation.

  • func_to_get_cc (func) – function to compute the connected component from the classifier prediction.

  • verbose (bool) – control function output.

Return type

tileSegmenter object

classmethod from_trainer(trainer, verbose=True)

Instantiate a tileSegmenter object from a tileTrainer object.

Parameters
  • trainer (tileTrainer) – trainer object previously trained for segmentation

  • verbose (bool) – control function output

Return type

tileSegmenter object

class paprica.segmenter.tileTrainer(tile: paprica.loader.tileLoader, func_to_compute_features, func_to_get_cc=None)

Bases: object

Class used to train a classifier that works directly on APR data. It uses Napari to manually add labels.

static _are_labels_the_same(local_labels)

Determine whether the manual labels in a particle are the same, and return the label.

Parameters

local_labels (ndarray) – particle labels

Returns

(bool) True if labels are the same, (int) corresponding label

_find_particle(coords)

Find particle index corresponding to pixel location coords.

Parameters

coords (array_like) – pixel coordinate [z, y, x]

Returns

idx – particle index

Return type

int

_order_labels()

Order pixel_list in z increasing order, then y increasing order and finally x increasing order.

Return type

None
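The ordering described here is a lexicographic sort on (z, y, x). A NumPy sketch follows, assuming the pixel list is stored as rows of [z, y, x, label], which is the column layout used by save_labels; the sample rows are made up for the example.

```python
import numpy as np

# Rows are [z, y, x, label].
pixel_list = np.array([
    [2, 0, 1, 1],
    [0, 5, 3, 2],
    [0, 5, 1, 1],
    [0, 2, 9, 2],
])

# np.lexsort sorts by the LAST key first, so the keys are passed as (x, y, z).
order = np.lexsort((pixel_list[:, 2], pixel_list[:, 1], pixel_list[:, 0]))
ordered = pixel_list[order]
```

Rows with equal z are ordered by y, and rows with equal z and y are ordered by x; each label stays attached to its coordinates.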

_remove_ambiguities(verbose)

Remove particles that have been labelled with different labels.

Parameters

verbose (bool) – option to print out information.
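A particle can receive several annotated pixels, and they may disagree. The sketch below shows one possible filtering rule under assumptions: the arrays particle_ids and labels are hypothetical inputs (one entry per annotated pixel), and the function is an illustration rather than the method's actual code.

```python
import numpy as np


def remove_ambiguities(particle_ids, labels):
    """Keep one (particle, label) pair per particle whose labels all agree."""
    kept_ids, kept_labels = [], []
    for pid in np.unique(particle_ids):
        local_labels = np.unique(labels[particle_ids == pid])
        if local_labels.size == 1:  # every annotation of this particle agrees
            kept_ids.append(pid)
            kept_labels.append(local_labels[0])
    return np.array(kept_ids), np.array(kept_labels)


# Particle 3 was annotated with labels 1 and 2, so it is ambiguous.
particle_ids = np.array([7, 7, 3, 3, 3, 9])
labels = np.array([1, 1, 1, 2, 1, 2])

ids, clean_labels = remove_ambiguities(particle_ids, labels)
```

Only particles 7 and 9 remain, each with a single label, so the training set contains no contradictory samples.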

_sample_pixel_list_on_APR()

Convert manual annotations coordinates from pixel to APR.

Return type

None

add_annotations(use_sparse_labels=True, **kwargs)

Add annotations to a previously annotated dataset.

Parameters

use_sparse_labels (bool) – use sparse array to store the labels (memory efficient but slower graphics)

Return type

None

apply_on_tile(tile, bg_label=None, func_to_get_cc=None, display_result=True, verbose=True)

Apply classifier to the whole tile and display segmentation results using Napari.

Parameters
  • display_result (bool) – option to display segmentation results using Napari

  • verbose (bool) – option to print out information.

Return type

None

display_features()

Display the computed features.

display_training_annotations(**kwargs)

Display manual annotations and their sampling on APR grid (if available).

Return type

None

load_classifier(path=None)

Load a trained classifier.

Parameters

path (string) – path for loading the classifier. By default, it is loaded from the data root folder.

Return type

None

load_labels(path=None)

Load previously saved labels as a numpy array with columns corresponding to [z, y, x, label].

Parameters

path (string) – path to load the saved labels. By default, it loads them from the data root folder.

Return type

None

manually_annotate(use_sparse_labels=True, **kwargs)

Manually annotate dataset using Napari.

Parameters

use_sparse_labels (bool) – use sparse array to store the labels (memory efficient but slower graphics)

Return type

None

save_classifier(path=None)

Save the trained classifier.

Parameters

path (string) – path for saving the classifier. By default, it is saved in the data root folder.

Return type

None

save_labels(path=None)

Save labels as a numpy array with columns corresponding to [z, y, x, label].

Parameters

path (string) – path to save labels. By default it saves them in the data root folder.

Return type

None
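The [z, y, x, label] layout can be illustrated with a plain NumPy round trip. The .npy format and the file name labels.npy are assumptions made for this sketch; the format actually written by save_labels is not specified above.

```python
import os
import tempfile

import numpy as np

# Two annotated pixels: columns are [z, y, x, label].
labels = np.array([[0, 12, 40, 1],
                   [3, 15, 42, 2]], dtype=np.int64)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "labels.npy")  # hypothetical file name
    np.save(path, labels)
    loaded = np.load(path)

# Unpack the columns of the reloaded array.
z, y, x, label = loaded.T
```

Keeping the coordinates and the label in one array means a single file is enough to resume annotation later.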

segment_training_tile(bg_label=None, display_result=True, verbose=True)

Apply classifier to the whole tile and display segmentation results using Napari.

Parameters
  • display_result (bool) – option to display segmentation results using Napari

  • verbose (bool) – option to print out information.

Return type

None

train_classifier(verbose=True, n_estimators=10, class_weight='balanced', mean_norm=True, std_norm=True)

Train the classifier for segmentation.

Parameters
  • verbose (bool) – option to print out information.

  • n_estimators (int) – The number of trees in the random forest.

  • class_weight ({"balanced", "balanced_subsample"}, dict or list of dicts) –

    Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

    Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

    The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

    The “balanced_subsample” mode is the same as “balanced” except that weights are computed based on the bootstrap sample for every tree grown.

    For multi-output, the weights of each column of y will be multiplied.

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

  • mean_norm (bool) – If True, center the data before scaling.

  • std_norm (bool) – If True, scale the data to unit variance (or equivalently, unit standard deviation).

Return type

None
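The “balanced” formula and the two normalization flags can be checked by hand with NumPy. The standardize helper below is a hypothetical illustration of what centering and unit-variance scaling mean; it assumes no feature is constant and is not the code used by train_classifier.

```python
import numpy as np

# Three background samples for one foreground sample.
y = np.array([0, 0, 0, 1])
n_classes = np.unique(y).size

# "balanced" weights: n_samples / (n_classes * np.bincount(y)).
balanced_weights = y.size / (n_classes * np.bincount(y))


def standardize(x, mean_norm=True, std_norm=True):
    """Center each feature column and/or scale it to unit variance."""
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0)  # assumes no constant feature (std > 0)
    if mean_norm:
        x = x - x.mean(axis=0)
    if std_norm:
        x = x / std
    return x


features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
normalized = standardize(features)
```

The rare class gets weight 2 and the frequent class weight 2/3, so both classes contribute equally to training despite the imbalance that sparse manual annotations tend to produce.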