smftools.informatics.binarize_converted_base_identities
Functions
Efficiently binarizes conversion SMF data within a sequence string using NumPy arrays.
- smftools.informatics.binarize_converted_base_identities.binarize_converted_base_identities(base_identities, strand, modification_type, deaminase_footprinting=False, mismatch_trend_per_read={}, on_missing='nan')
Efficiently binarizes conversion SMF data within a sequence string using NumPy arrays. In the conversion modality, the strand parameter determines the base mapping. In the deaminase modality, mismatch_trend_per_read determines the mapping instead.
- Parameters:
base_identities (dict) -- A dictionary returned by extract_base_identities, keyed by read name. Each value is a list of base identities.
strand (str) -- Which strand was converted in the experiment (options are 'top' and 'bottom').
modification_type (str) -- The modification type of interest (options are '5mC' and '6mA').
deaminase_footprinting (bool) -- Whether direct deaminase footprinting chemistry was used. Defaults to False.
mismatch_trend_per_read (dict) -- For deaminase footprinting, the type of conversion relative to the top strand reference for each read (C->T, or G->A if the bottom strand was converted).
on_missing (str) -- How to handle a missing read. Defaults to 'nan'.
- Returns:
A dictionary where 1 represents a methylated site, 0 represents an unmethylated site, and NaN represents a site without methylation info. If deaminase_footprinting is True, 1 represents a deaminated site and 0 represents a non-deaminated site.
- Return type:
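
The binarization described above can be illustrated with a minimal, self-contained sketch. It does not call smftools and is not the library's implementation. The helper name binarize_sketch and the lookup table CONVERSION_5MC are hypothetical, and the sketch covers only the 5mC conversion modality, on the assumption that an unconverted base (C on the top strand, G on the bottom strand) marks a methylated site and a converted base (T or A) marks an unmethylated one.

```python
import numpy as np

# Hypothetical lookup for 5mC conversion chemistry (an assumption for
# illustration, not smftools' actual table): an unconverted base maps to 1
# (methylated) and a converted base maps to 0 (unmethylated).
CONVERSION_5MC = {
    "top": {"C": 1.0, "T": 0.0},
    "bottom": {"G": 1.0, "A": 0.0},
}


def binarize_sketch(base_identities, strand):
    """Map each read's base calls to 1, 0, or NaN using NumPy masks."""
    mapping = CONVERSION_5MC[strand]
    binarized = {}
    for read_name, bases in base_identities.items():
        bases = np.asarray(bases, dtype="<U1")
        # Start with NaN everywhere: positions without methylation info.
        out = np.full(bases.shape, np.nan)
        # Overwrite informative positions with one vectorized mask per base.
        for base, value in mapping.items():
            out[bases == base] = value
        binarized[read_name] = out
    return binarized


# Toy input shaped like the documented base_identities argument:
# a dict keyed by read name, pointing to a list of base identities.
reads = {
    "read_1": ["C", "T", "A", "C", "N"],
    "read_2": ["T", "T", "G", "C", "C"],
}
result = binarize_sketch(reads, "top")
# read_1 -> [1, 0, nan, 1, nan]
```

Starting from an all-NaN array and overwriting only the informative positions keeps uninformative bases as NaN without a per-base Python loop over each read.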