smftools.informatics.binarize_converted_base_identities


Functions

binarize_converted_base_identities(...[, ...])

Efficiently binarizes conversion SMF data from per-read base identities using NumPy arrays.

smftools.informatics.binarize_converted_base_identities.binarize_converted_base_identities(base_identities, strand, modification_type, deaminase_footprinting=False, mismatch_trend_per_read={}, on_missing='nan')

Efficiently binarizes conversion SMF data from per-read base identities using NumPy arrays. In the conversion modality, the strand parameter determines how base identities are mapped to binary values. In the deaminase modality, mismatch_trend_per_read determines the mapping for each read.
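As a rough illustration of the conversion-mode mapping, the sketch below binarizes a single read with NumPy. It is not the smftools implementation. The helper name binarize_read_conversion and the CONVERSION_MAP table are hypothetical. The sketch covers only 5mC conversion: a retained C scores as methylated and a C->T change as unmethylated, which appears as G/A on the top-strand reference when the bottom strand was converted. It also ignores the reference sequence, so any other base becomes NaN.

```python
import numpy as np

# Hypothetical mapping (assumption, 5mC conversion only):
# strand -> (base meaning methylated/protected, base meaning unmethylated/converted)
CONVERSION_MAP = {
    "top": ("C", "T"),     # top strand: methylated C stays C, unmethylated C reads as T
    "bottom": ("G", "A"),  # bottom strand, seen on the top-strand reference as G -> A
}

def binarize_read_conversion(bases, strand):
    """Map one read's base identities to 1.0 / 0.0 / NaN (illustrative sketch)."""
    methylated, unmethylated = CONVERSION_MAP[strand]
    arr = np.asarray(bases)
    out = np.full(arr.shape, np.nan)  # default: no methylation information
    out[arr == methylated] = 1.0      # protected base -> methylated
    out[arr == unmethylated] = 0.0    # converted base -> unmethylated
    return out

values = binarize_read_conversion(["C", "T", "A", "C"], "top")  # -> 1.0, 0.0, nan, 1.0
```

Boolean masks keep the mapping vectorized, so each read is processed without a Python-level loop over positions.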

Parameters:
  • base_identities (dict) -- A dictionary returned by extract_base_identities, keyed by read name. Each value is a list of base identities for that read.

  • strand (str) -- A string indicating which strand was converted in the experiment (options are 'top' and 'bottom').

  • modification_type (str) -- A string indicating the modification type of interest (options are '5mC' and '6mA').

  • deaminase_footprinting (bool) -- Whether direct deaminase footprinting chemistry was used.

  • mismatch_trend_per_read (dict) -- For deaminase footprinting, the type of conversion observed in each read relative to the top-strand reference: C->T, or G->A if the bottom strand was converted.

  • on_missing (str) -- Error-handling behavior when a read is missing (default: 'nan').
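The deaminase path can be sketched in the same spirit. In this hypothetical helper, binarize_deaminase, the trend labels "C->T" and "G->A" are assumed string values taken from the parameter description. The converted base is scored 1 and the unconverted base 0. on_missing='nan' is assumed to give a read with no trend entry an all-NaN array. Any other on_missing value raises KeyError here, which may not match the options the library accepts.

```python
import numpy as np

# Assumed trend labels -> (deaminated base, non-deaminated base).
DEAMINASE_MAP = {
    "C->T": ("T", "C"),
    "G->A": ("A", "G"),
}

def binarize_deaminase(base_identities, mismatch_trend_per_read, on_missing="nan"):
    """Sketch of deaminase-mode binarization over a dict of reads (hypothetical)."""
    result = {}
    for read, bases in base_identities.items():
        arr = np.asarray(bases)
        trend = mismatch_trend_per_read.get(read)
        if trend is None:
            if on_missing == "nan":
                # Assumed behavior: a read without a trend carries no information.
                result[read] = np.full(arr.shape, np.nan)
                continue
            raise KeyError(f"no mismatch trend for read {read!r}")
        deaminated, intact = DEAMINASE_MAP[trend]
        out = np.full(arr.shape, np.nan)
        out[arr == deaminated] = 1.0  # converted base -> deaminated site
        out[arr == intact] = 0.0      # unconverted base -> non-deaminated site
        result[read] = out
    return result

reads = {"r1": ["C", "T", "T"], "r2": ["G", "A", "C"], "r3": ["C", "C"]}
trends = {"r1": "C->T", "r2": "G->A"}  # r3 intentionally has no trend
binarized = binarize_deaminase(reads, trends)
```

Looking up the trend per read is what lets one call handle reads from both strands, which is the role the parameter description gives mismatch_trend_per_read.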

Returns:

A dictionary keyed by read name. In each read's binarized values, 1 represents a methylated site, 0 an unmethylated site, and NaN a site without methylation information. If deaminase_footprinting is True, 1 represents a deaminated site and 0 a non-deaminated site.

Return type:

dict
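A common next step is to summarize the returned dictionary across reads. The snippet below uses a small hand-made dictionary shaped like the documented return value (illustrative values, not real output). It assumes every read has the same length, so the arrays can be stacked into a reads-by-positions matrix. np.nanmean then ignores NaN entries when computing the per-position fraction of 1s.

```python
import numpy as np

# Hypothetical return value: read name -> per-position values
# (1 = methylated/deaminated, 0 = not, NaN = no information).
binarized = {
    "read_1": np.array([1.0, 0.0, np.nan, 1.0]),
    "read_2": np.array([1.0, np.nan, 0.0, 0.0]),
}

# Stack reads into a (reads x positions) matrix; assumes equal-length reads.
matrix = np.vstack(list(binarized.values()))

# Fraction of informative reads scoring 1 at each position (NaN ignored).
fraction = np.nanmean(matrix, axis=0)   # -> 1.0, 0.0, 0.0, 0.5

# Number of reads carrying information at each position.
coverage = np.sum(~np.isnan(matrix), axis=0)  # -> 2, 1, 1, 2
```

A position with no informative reads would yield NaN from np.nanmean (with a RuntimeWarning), so checking coverage alongside the fraction is a reasonable safeguard.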