smftools.informatics.ohe
Functions
| ohe_batching | Efficient version of ohe_batching: one-hot encodes sequences in parallel and writes batches immediately. |
| ohe_layers_decode | Takes an AnnData object and a list of observation names. |
| one_hot_decode | Takes a flattened one-hot encoded array and returns the sequence string from that array. |
| one_hot_encode | One-hot encodes a DNA sequence. |
- smftools.informatics.ohe.one_hot_encode(sequence, device='auto')
One-hot encodes a DNA sequence.
- Parameters:
sequence (str or list) -- DNA sequence (e.g., "ACGTN" or ['A', 'C', 'G', 'T', 'N']).
- Returns:
Flattened one-hot encoded representation of the input sequence.
- Return type:
ndarray
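The following is a minimal sketch of flattened one-hot encoding, not the smftools implementation. The alphabet order `ACGTN`, the `int8` dtype, the fallback of unknown symbols to `N`, and the name `one_hot_encode_sketch` are all assumptions made for illustration; the `device` parameter is not modeled.

```python
import numpy as np

# Assumed alphabet and column order; the library's actual mapping may differ.
ALPHABET = "ACGTN"


def one_hot_encode_sketch(sequence):
    """Return a flattened one-hot array of length len(sequence) * len(ALPHABET)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(sequence), len(ALPHABET)), dtype=np.int8)
    for pos, base in enumerate(sequence):
        # Assumption of this sketch: any unrecognized symbol is treated as N.
        column = index.get(str(base).upper(), index["N"])
        encoded[pos, column] = 1
    # Flatten row by row, so each base occupies len(ALPHABET) consecutive entries.
    return encoded.flatten()


flat = one_hot_encode_sketch("ACGTN")
print(flat.shape)
```

Under these assumptions, a sequence of length n yields an array of length 5n, and both a string and a list of single-character strings are accepted.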
- smftools.informatics.ohe.one_hot_decode(ohe_array)
Takes a flattened one-hot encoded array and returns the sequence string from that array.
- Parameters:
ohe_array (np.array) -- A one-hot encoded array.
- Returns:
Sequence string of the one-hot encoded array.
- Return type:
str
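Decoding a flattened one-hot array can be sketched as a reshape followed by a per-row argmax. This is an illustrative sketch only: the alphabet order `ACGTN` and the name `one_hot_decode_sketch` are assumptions, and the library's own decoding may handle edge cases differently.

```python
import numpy as np

# Assumed alphabet and column order; the library's actual mapping may differ.
ALPHABET = "ACGTN"


def one_hot_decode_sketch(ohe_array):
    """Return the sequence string encoded by a flattened one-hot array."""
    arr = np.asarray(ohe_array)
    if arr.size % len(ALPHABET) != 0:
        raise ValueError("array length must be a multiple of the alphabet size")
    # One row per base, one column per alphabet symbol.
    matrix = arr.reshape(-1, len(ALPHABET))
    # The column holding the 1 in each row identifies the base.
    return "".join(ALPHABET[i] for i in matrix.argmax(axis=1))


# Build a flattened encoding for "GGA" by selecting rows of the identity matrix.
example = np.eye(len(ALPHABET), dtype=int)[[2, 2, 0]].flatten()
print(one_hot_decode_sketch(example))
```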
- smftools.informatics.ohe.ohe_layers_decode(adata, obs_names)
Takes an AnnData object and a list of observation names. Returns a list of sequence strings for the reads of interest.
- Parameters:
adata (AnnData) -- An AnnData object.
obs_names (list) -- A list of observation name strings to retrieve sequences for.
- smftools.informatics.ohe.ohe_batching(base_identities, tmp_dir, record, prefix='', batch_size=100000, progress_bar=None, device='auto', threads=None)
Efficient version of ohe_batching: one-hot encodes sequences in parallel and writes batches immediately.
- Parameters:
base_identities (dict) -- Dictionary mapping read names to sequences.
tmp_dir (str) -- Directory for storing temporary files.
record (str) -- Record name.
prefix (str) -- Prefix for file naming.
batch_size (int) -- Number of reads per batch.
progress_bar (tqdm instance, optional) -- Shared progress bar.
device (str) -- Device for encoding.
threads (int, optional) -- Number of parallel workers.
- Returns:
List of valid H5AD file paths.
- Return type: