Utils Module

Utility Functions

The Utils module within the ExSeq Toolbox provides a comprehensive suite of utility functions to support the preprocessing, retrieval, manipulation, and visualization of expansion microscopy data.

exm.utils.utils.chmod(path)

Sets permissions so that users and the owner can read, write and execute files at the given path.

Parameters:

path (pathlib.Path) – Path in which privileges should be granted.

Return type:

None
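A minimal sketch of this kind of permission change, assuming the intent is to add read, write and execute bits for the owner and the group. The helper name grant_rwx and the exact mode bits are assumptions made for illustration; the toolbox's own implementation may set a different mode.

```python
import stat
import tempfile
from pathlib import Path


def grant_rwx(path: Path) -> None:
    """Add read, write and execute bits for the owner and the group (assumed mode)."""
    current = path.stat().st_mode
    path.chmod(current | stat.S_IRWXU | stat.S_IRWXG)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.txt"
    target.write_text("demo")
    grant_rwx(target)
    # Keep only the permission bits of the resulting mode.
    mode = stat.S_IMODE(target.stat().st_mode)
```

On POSIX systems the owner and group bits are set afterwards; permission bits for other users are left as they were.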

exm.utils.utils.display_img(img)

Displays an image using the Image module from the Python Imaging Library (PIL).

The function supports images of type boolean and other numpy data types. For boolean images, the function multiplies the image by 255 to create an 8-bit grayscale image. For non-boolean images, the function simply converts the image to an 8-bit grayscale image without scaling.

Parameters:

img (Union[np.ndarray, bool]) – The input image to display. This can be a boolean or non-boolean numpy array.

Return type:

None
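The conversion described above can be sketched with NumPy alone. The helper name to_uint8 is hypothetical, and the display step is left out; this only illustrates the boolean and non-boolean branches.

```python
import numpy as np


def to_uint8(img: np.ndarray) -> np.ndarray:
    """Convert an array to 8-bit grayscale values without rescaling."""
    if img.dtype == np.bool_:
        # True becomes 255 (white) and False becomes 0 (black).
        return img.astype(np.uint8) * 255
    # Non-boolean data is cast directly; values are not rescaled.
    return img.astype(np.uint8)


mask = np.array([[True, False], [False, True]])
mask_u8 = to_uint8(mask)
gray_u8 = to_uint8(np.array([[0, 128], [200, 255]]))
```

If Pillow is installed, an array prepared this way could be shown with Image.fromarray(mask_u8).show().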

exm.utils.utils.downsample_volume(array, factors)

Reduces the size of an array by downsampling along each dimension using specified factors.

Parameters:
  • array (np.ndarray) – The input array to be downsampled.

  • factors (Tuple[Union[int, float], ...]) – The factors to downsample by for each dimension of the array. Each factor must be a positive number.

Returns:

The downsampled array.

Return type:

np.ndarray
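As an illustration of per-dimension downsampling, the sketch below keeps every n-th sample along each axis. This is a simplification that only handles integer factors; the toolbox function also accepts floats and may interpolate, which is not shown here. The name downsample_by_stride is hypothetical.

```python
import numpy as np


def downsample_by_stride(array: np.ndarray, factors) -> np.ndarray:
    """Keep every n-th sample along each axis (integer factors only)."""
    if len(factors) != array.ndim:
        raise ValueError("one factor is required per array dimension")
    if any(f < 1 or int(f) != f for f in factors):
        raise ValueError("this sketch only supports positive integer factors")
    slices = tuple(slice(None, None, int(f)) for f in factors)
    return array[slices]


volume = np.arange(4 * 6 * 8).reshape(4, 6, 8)
small = downsample_by_stride(volume, (2, 3, 4))
```

Here a (4, 6, 8) volume becomes a (2, 2, 2) volume.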

exm.utils.utils.enhance_and_filter_volume(volume, low_percentile=0, high_percentile=100, acclerated=False)

Enhances the contrast of a volume using specified percentiles and applies a median filter to reduce noise. Optionally uses GPU acceleration for the median filtering step if accelerated is set to True.

Parameters:
  • volume (np.ndarray) – The input volume to be processed.

  • low_percentile (float, optional) – The lower percentile to use for contrast adjustment. Values below this percentile will be adjusted to the minimum intensity. Default is 0.

  • high_percentile (float, optional) – The higher percentile to use for contrast adjustment. Values above this percentile will be adjusted to the maximum intensity. Default is 100.

  • acclerated (bool, optional) – If True, uses GPU acceleration to perform the median filtering. Requires CuPy to be installed. Default is False. Note that the keyword is spelled acclerated in the function signature.

Returns:

The volume after contrast enhancement and median filtering.

Return type:

np.ndarray

Raises:
  • ValueError – If the percentiles are out of the [0, 100] range or if high_percentile is not greater than low_percentile.

  • TypeError – If the input volume is not a numpy ndarray or if percentiles are not numeric.

  • ImportError – If accelerated is True but CuPy is not installed.
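The contrast step and the argument checks listed above can be sketched as follows. This is an assumption-laden illustration: the helper name stretch_contrast is hypothetical, the output is scaled to [0, 1] for simplicity, and the median filtering step (which could be done with, for example, scipy.ndimage.median_filter) is omitted.

```python
import numpy as np


def stretch_contrast(volume, low_percentile=0.0, high_percentile=100.0):
    """Clip a volume to two percentiles and rescale the result to [0, 1]."""
    if not isinstance(volume, np.ndarray):
        raise TypeError("volume must be a numpy ndarray")
    if not 0 <= low_percentile < high_percentile <= 100:
        raise ValueError("percentiles must satisfy 0 <= low < high <= 100")
    low, high = np.percentile(volume, [low_percentile, high_percentile])
    if high == low:
        # A constant volume has no contrast to stretch.
        return np.zeros_like(volume, dtype=np.float64)
    clipped = np.clip(volume.astype(np.float64), low, high)
    return (clipped - low) / (high - low)


volume = np.arange(101, dtype=np.float64).reshape(1, 1, 101)
enhanced = stretch_contrast(volume, low_percentile=10, high_percentile=90)
```

Values at or below the 10th percentile map to 0, and values at or above the 90th percentile map to 1.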

exm.utils.utils.gene_barcode_mapping(args)

Loads a CSV file containing gene symbols and corresponding barcodes, and creates mappings between them.

This function reads a CSV file specified by args.gene_digit_csv, which contains gene symbols and their corresponding barcodes. It converts the barcodes into digit representations and creates two mappings: ‘digit2gene’ for mapping from digit representation to gene symbol, and ‘gene2digit’ for mapping from gene symbol to digit representation. These mappings are useful for identifying genes associated with puncta barcodes in a field of view.

Parameters:

args (Args) – Configuration options. This should be an instance of the Args class.

Returns:

A tuple containing:
  • A pandas DataFrame with the original CSV data and an additional column for digit representations.

  • A dictionary mapping from digit representation to gene symbol (‘digit2gene’).

  • A dictionary mapping from gene symbol to digit representation (‘gene2digit’).

Return type:

Tuple[pd.DataFrame, Dict[str, str], Dict[str, str]]
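The two mappings can be illustrated with the standard library. Everything specific in this sketch is an assumption: the column names Symbol and Barcode, the gene rows, and the base-to-digit encoding are hypothetical, and the toolbox itself returns a pandas DataFrame rather than a list of rows.

```python
import csv
import io

# Hypothetical CSV content; the real column names may differ.
CSV_TEXT = """Symbol,Barcode
ACTB,ACGT
GAPDH,TTGA
"""

# Assumed base-to-digit encoding, for illustration only.
BASE_TO_DIGIT = {"A": "0", "C": "1", "G": "2", "T": "3"}


def barcode_to_digit(barcode: str) -> str:
    """Translate a barcode into its digit representation."""
    return "".join(BASE_TO_DIGIT[base] for base in barcode)


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in rows:
    row["Digits"] = barcode_to_digit(row["Barcode"])

digit2gene = {row["Digits"]: row["Symbol"] for row in rows}
gene2digit = {row["Symbol"]: row["Digits"] for row in rows}
```

With these mappings, a puncta barcode in digit form can be looked up in digit2gene to find its gene.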

exm.utils.utils.generate_debug_candidate(args, gene=None, fov=None, num_missing_code=1)

Generates a candidate puncta for debugging purposes.

The function first randomly selects a gene if not provided and retrieves all corresponding puncta. It then filters the puncta based on the number of missing codes in their barcodes. Finally, it randomly selects one puncta from the filtered list.

Parameters:
  • args (Args) – Configuration options. This should be an instance of the Args class.

  • gene (Optional[str]) – The gene of interest. If none is provided, a gene is randomly selected.

  • fov (Optional[int]) – The field of view (FOV) to consider. If none is provided, all FOVs are considered.

  • num_missing_code (int) – The number of missing codes in the barcode of the puncta to be retrieved. Default is 1.

Returns:

A single randomly chosen puncta that satisfies all the criteria (matching gene, within FOV, correct number of missing codes).

Return type:

Optional[Dict]
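The selection logic described above (pick a gene, filter by FOV and missing codes, pick one match) can be sketched on in-memory data. The record keys and the use of "_" to mark a missing code are assumptions for illustration; the real puncta dictionaries may be structured differently.

```python
import random

# Hypothetical puncta records; the real dictionaries may use other keys.
PUNCTA = [
    {"gene": "ACTB", "fov": 0, "barcode": "0123"},
    {"gene": "ACTB", "fov": 0, "barcode": "01_3"},
    {"gene": "ACTB", "fov": 1, "barcode": "0_23"},
    {"gene": "GAPDH", "fov": 1, "barcode": "33__"},
]


def pick_candidate(puncta, gene=None, fov=None, num_missing_code=1, rng=random):
    """Return one random record matching the criteria, or None if none match."""
    if gene is None:
        gene = rng.choice(sorted({p["gene"] for p in puncta}))
    matches = [
        p
        for p in puncta
        if p["gene"] == gene
        and (fov is None or p["fov"] == fov)
        # "_" marks a missing code in this sketch (an assumption).
        and p["barcode"].count("_") == num_missing_code
    ]
    return rng.choice(matches) if matches else None


candidate = pick_candidate(PUNCTA, gene="ACTB", fov=1)
no_match = pick_candidate(PUNCTA, gene="GAPDH", num_missing_code=1)
```

Returning None when nothing matches mirrors the Optional[Dict] return type.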

exm.utils.utils.get_offsets(filename)

Given the filename of the BDV/H5 XML file, returns the stitching offsets as an (N, 3) array in (X, Y, Z) order.

The offsets are expressed in micrometers (µm) and are extracted from the XML file produced by the BigStitcher plugin for Fiji.

Parameters:

filename (str) – The file name of the BDV/H5 XML file.

Returns:

An array of stitching offsets in the format of (X, Y, Z).

Return type:

np.ndarray

Raises:
  • FileNotFoundError – If the XML file cannot be found.

  • ET.ParseError – If there is an error parsing the XML file.

  • ValueError – If the XML file has an unexpected structure or if the affine transformation cannot be read.
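A hedged sketch of reading translations from such a file, assuming a simplified BigDataViewer-style layout in which each ViewRegistration holds a 3x4 affine written as 12 row-major numbers. Which transform the toolbox actually reads, and how it converts values to micrometers, is not specified here; the XML snippet and the name read_translations are illustrative only.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical XML in a BigDataViewer-like layout.
XML_TEXT = """<SpimData>
  <ViewRegistrations>
    <ViewRegistration timepoint="0" setup="0">
      <ViewTransform type="affine">
        <affine>1.0 0.0 0.0 10.0 0.0 1.0 0.0 20.0 0.0 0.0 1.0 30.0</affine>
      </ViewTransform>
    </ViewRegistration>
    <ViewRegistration timepoint="0" setup="1">
      <ViewTransform type="affine">
        <affine>1.0 0.0 0.0 110.0 0.0 1.0 0.0 20.0 0.0 0.0 1.0 30.0</affine>
      </ViewTransform>
    </ViewRegistration>
  </ViewRegistrations>
</SpimData>"""


def read_translations(xml_text: str):
    """Return one (x, y, z) translation per ViewRegistration element."""
    root = ET.fromstring(xml_text)
    offsets = []
    for registration in root.iter("ViewRegistration"):
        # Only the first transform of each registration is read here.
        affine = registration.find("ViewTransform/affine")
        if affine is None or affine.text is None:
            raise ValueError("ViewRegistration has no readable affine transform")
        values = [float(v) for v in affine.text.split()]
        if len(values) != 12:
            raise ValueError("expected 12 affine coefficients")
        # The translation is the last column of the 3x4 matrix.
        offsets.append((values[3], values[7], values[11]))
    return offsets


offsets = read_translations(XML_TEXT)
```

Malformed XML makes ET.fromstring raise ET.ParseError, and a missing affine raises ValueError, in line with the exceptions listed above.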

exm.utils.utils.retrieve_all_puncta(args, fov)

Returns all identified puncta for a given field of view.

This function loads and returns all puncta data from a pickle file for the specified field of view. The path to the pickle file is constructed using the configuration options provided in the args parameter.

Parameters:
  • args (Args) – Configuration options. This should be an instance of the Args class.

  • fov (int) – The field of view for which to return all identified puncta.

Returns:

The data of all puncta identified in the specified field of view.

Return type:

List[Dict]
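Loading a list of puncta dictionaries from a pickle file can be sketched as below. The file name and record keys are hypothetical; in the toolbox the path is built from args, and that construction is not shown in this section.

```python
import pickle
import tempfile
from pathlib import Path


def load_puncta(pickle_path):
    """Load a list of puncta dictionaries from a pickle file."""
    with open(pickle_path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the real path is derived from args.
    pickle_path = Path(tmp) / "fov0_puncta.pkl"
    example = [{"index": 0, "barcode": "0123"}, {"index": 1, "barcode": "3320"}]
    with open(pickle_path, "wb") as handle:
        pickle.dump(example, handle)
    loaded = load_puncta(pickle_path)
```

As with any pickle file, only load data from a trusted source.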

exm.utils.utils.retrieve_complete(args)

Retrieves a complete summary of barcodes present in both the gene-barcode mapping and the overall barcode summary.

Parameters:

args (Args) – Configuration options. This should be an instance of the Args class.

Returns:

A pandas DataFrame containing the complete summary of barcodes, indexed by barcode with columns for total count (‘number’) and count per FOV (e.g., ‘fov1’, ‘fov2’, …), and a ‘gene’ column mapping each barcode to its corresponding gene. The DataFrame is sorted by gene name in ascending order.

Return type:

pd.DataFrame
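The idea of keeping only barcodes that appear in both tables can be sketched with plain dictionaries. The counts and gene names below are made up for illustration, and the toolbox returns a pandas DataFrame rather than a list of pairs.

```python
# Hypothetical barcode summary: total count and per-FOV counts.
summary = {
    "0123": {"number": 5, "fov0": 3, "fov1": 2},
    "3320": {"number": 4, "fov0": 1, "fov1": 3},
    "9999": {"number": 7, "fov0": 7, "fov1": 0},  # not in the gene mapping
}
# Hypothetical digit-to-gene mapping.
digit2gene = {"0123": "GAPDH", "3320": "ACTB"}

# Keep barcodes present in both tables and attach the gene name.
complete = {
    barcode: {**counts, "gene": digit2gene[barcode]}
    for barcode, counts in summary.items()
    if barcode in digit2gene
}

# Sort by gene name in ascending order.
ordered = sorted(complete.items(), key=lambda item: item[1]["gene"])
ordered_barcodes = [barcode for barcode, _ in ordered]
```

The barcode "9999" is dropped because it has no entry in the gene mapping.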

exm.utils.utils.retrieve_digit(args, digit)

Retrieves all puncta with a specified barcode (represented as a digit) across all fields of view.

This function iterates over all provided fields of view (FOVs) and retrieves puncta that match the specified barcode. Each matching puncta, along with its FOV information, is appended to a list.

Parameters:
  • args (Args) – Configuration options. This should be an instance of the Args class.

  • digit (str) – The barcode to search for, represented as a digit.

Returns:

A list of dictionaries where each dictionary contains information about a puncta and the FOV it was found in.

Return type:

List[Dict]
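The iteration described above can be sketched on in-memory data. The record keys and the name find_by_digit are assumptions; in the toolbox the per-FOV puncta are loaded through args.

```python
# Hypothetical puncta grouped by field of view.
PUNCTA_BY_FOV = {
    0: [{"index": 0, "barcode": "0123"}, {"index": 1, "barcode": "3320"}],
    1: [{"index": 0, "barcode": "0123"}],
}


def find_by_digit(puncta_by_fov, digit):
    """Collect every record whose barcode equals digit, tagged with its FOV."""
    matches = []
    for fov, puncta in puncta_by_fov.items():
        for record in puncta:
            if record["barcode"] == digit:
                matches.append({**record, "fov": fov})
    return matches


hits = find_by_digit(PUNCTA_BY_FOV, "0123")
```

Each returned dictionary carries the original fields plus the FOV it came from.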

exm.utils.utils.retrieve_gene(args, gene)

Retrieves all puncta associated with a specific gene across all fields of view (FOVs).

Parameters:
  • args (Args) – Configuration options. This should be an instance of the Args class.

  • gene (str) – The gene of interest for which all corresponding puncta across all FOVs will be retrieved.

Returns:

A list of dictionaries, each representing a puncta associated with the gene, including the puncta’s properties and the FOV in which it is found.

Return type:

List[Dict]

exm.utils.utils.retrieve_img(args, fov, code, channel, ROI_min, ROI_max)

Returns the middle slice of a specified volume chunk.

This function retrieves a middle z-slice from a 3D volume chunk specified by its field of view, code, and channel. The ROI (Region of Interest) is defined by minimum and maximum coordinates.

Parameters:
  • args (Args) – Configuration options. This should be an instance of the Args class.

  • fov (int) – The field of view of the volume slice to be returned.

  • code (int) – The code of the volume slice to be returned.

  • channel (int) – The channel of the volume slice to be returned.

  • ROI_min (List[int]) – Minimum coordinates of the volume chunk in the format of [z, y, x].

  • ROI_max (List[int]) – Maximum coordinates of the volume chunk in the format of [z, y, x].

Returns:

A 2D numpy array representing the middle z-slice of the specified volume chunk.

Return type:

np.ndarray
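Extracting the middle z-slice of an ROI can be sketched with NumPy indexing. Whether the maximum coordinates are inclusive and exactly how the middle index is chosen are assumptions here (exclusive maximum, floor division); the name middle_slice is hypothetical.

```python
import numpy as np


def middle_slice(volume: np.ndarray, roi_min, roi_max) -> np.ndarray:
    """Crop a [z, y, x] ROI and return its middle z-slice."""
    z0, y0, x0 = roi_min
    z1, y1, x1 = roi_max
    # The maximum coordinates are treated as exclusive in this sketch.
    chunk = volume[z0:z1, y0:y1, x0:x1]
    return chunk[chunk.shape[0] // 2]


volume = np.arange(10 * 8 * 6).reshape(10, 8, 6)
image = middle_slice(volume, [2, 1, 0], [8, 5, 4])
```

The ROI spans six z-planes (2 to 7), so the returned 4x4 image comes from z = 5.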

exm.utils.utils.retrieve_one_puncta(args, fov, puncta_index)

Retrieves information about a specific puncta from a given field of view.

This function uses the provided configuration options to access and return data for a single puncta, identified by its index, within the specified field of view.

Parameters:
  • args (Args) – Configuration options. This should be an instance of the Args class.

  • fov (int) – The field of view from which to retrieve the puncta.

  • puncta_index (int) – The index of the specific puncta to retrieve.

Returns:

A dictionary containing information about the puncta.

Return type:

Dict

exm.utils.utils.retrieve_summary(args)

Retrieves a summary of all puncta for each field of view (FOV).

This function iterates over the provided list of FOVs, retrieves all puncta for each FOV, and aggregates the count of each barcode across all FOVs and individually per FOV. The summary is then saved to a CSV file.

Parameters:

args (Args) – Configuration options. This should be an instance of the Args class.

Returns:

A pandas DataFrame containing the summary of barcodes. The DataFrame is indexed by barcode with columns for total count (‘number’) and count per FOV (e.g., ‘fov1’, ‘fov2’, …). The DataFrame is sorted by total count in descending order.

Return type:

pd.DataFrame
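The aggregation described above can be sketched with collections.Counter. The puncta records are made up, rows are plain dictionaries rather than a pandas DataFrame, and writing the CSV file is omitted.

```python
from collections import Counter

# Hypothetical puncta grouped by field of view.
PUNCTA_BY_FOV = {
    0: [{"barcode": "0123"}, {"barcode": "0123"}, {"barcode": "3320"}],
    1: [{"barcode": "3320"}, {"barcode": "3320"}, {"barcode": "3320"}],
}

# Count barcodes separately for each FOV.
per_fov = {
    fov: Counter(record["barcode"] for record in puncta)
    for fov, puncta in PUNCTA_BY_FOV.items()
}

# Add the per-FOV counters to get the total count per barcode.
total = sum(per_fov.values(), Counter())

# Build one row per barcode, sorted by total count in descending order.
rows = []
for barcode, number in total.most_common():
    row = {"barcode": barcode, "number": number}
    for fov, counts in per_fov.items():
        row[f"fov{fov}"] = counts[barcode]
    rows.append(row)
```

A barcode that is absent from a FOV gets a count of 0 for that FOV, because Counter returns 0 for missing keys.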

exm.utils.utils.retrieve_vol(args, fov, code, c, ROI_min, ROI_max)

Returns a specified volume chunk from a dataset.

Parameters:
  • args (Args) – Configuration options. This should be an instance of the Args class.

  • fov (int) – The field of view of the volume chunk to be returned.

  • code (int) – The code of the volume chunk to be returned.

  • c (int) – The channel of the volume chunk to be returned.

  • ROI_min (List[int]) – Minimum coordinates of the volume chunk in the format of [z, y, x].

  • ROI_max (List[int]) – Maximum coordinates of the volume chunk in the format of [z, y, x].

Returns:

A numpy array representing the retrieved volume chunk.

Return type:

h5py.Dataset

exm.utils.utils.subtract_background_rolling_ball(volume, radius=50, num_threads=40)

Performs background subtraction on a volume image using the rolling ball method.

Parameters:
  • volume (np.ndarray) – The input volume image.

  • radius (int, optional) – The radius of the rolling ball used for background subtraction. Default is 50.

  • num_threads (int, optional) – The number of threads to use for the rolling ball operation. Default is 40.

Returns:

The volume image after background subtraction.

Return type:

np.ndarray

exm.utils.utils.subtract_background_top_hat(volume, radius=50, use_gpu=True)

Performs top-hat background subtraction on a volume image.

Parameters:
  • volume (np.ndarray) – The input volume image.

  • radius (int, optional) – The radius of the disk structuring element used for top-hat transformation. Default is 50.

  • use_gpu (bool, optional) – If True, uses GPU for computation (requires CuPy). Default is True.

Returns:

The volume image after background subtraction.

Return type:

np.ndarray

exm.utils.utils.visualize_progress(args)

Visualizes the progress of the ExSeq Toolbox.

This function creates a heatmap visualizing the completion status of different steps in the ExSeq Toolbox for each field of view (FOV) and each code.

Parameters:

args (Args) – Configuration options. This should be an instance of the Args class.

Return type:

None

Package Logger

exm.utils.log.configure_logger(name, log_file_name='ExSeq-Toolbox_logs.log')

Configures and returns a logger with both stream and file handlers.

This function sets up a logger to send log messages to the console and to a log file. The console will display messages with a level of INFO and higher, while the file will contain messages with a level of DEBUG and higher.

Parameters:
  • name (str) – Name of the logger to configure. Typically, this is the name of the module calling the logger.

  • log_file_name (str) – Name of the log file where logs will be saved. Defaults to ‘ExSeq-Toolbox_logs.log’.

Returns:

Configured logger object.

Return type:

logging.Logger

Raises:

OSError – If there is an issue with opening or writing to the log file.
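A logger with this console and file split can be sketched with the standard logging module. This is not the toolbox's implementation; the helper name build_logger, the logger name, the message format, and the temporary log path are assumptions for illustration.

```python
import logging
import tempfile
from pathlib import Path


def build_logger(name: str, log_file) -> logging.Logger:
    """Create a logger that sends INFO+ to the console and DEBUG+ to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let the handlers decide what to keep
    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")

    stream_handler = logging.StreamHandler()
    stream_handler.setLevel(logging.INFO)

    # Opening the file raises OSError if the path is not writable.
    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(logging.DEBUG)

    for handler in (stream_handler, file_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


log_path = Path(tempfile.mkdtemp()) / "example.log"
logger = build_logger("exm_demo", log_path)
logger.debug("written to the file only")
logger.info("written to both the console and the file")
for handler in logger.handlers:
    handler.flush()
log_text = log_path.read_text()
```

Setting the logger itself to DEBUG and filtering at the handlers is what allows the file to receive messages that the console suppresses.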