smftools.informatics.converted_BAM_to_adata


smftools.informatics.converted_BAM_to_adata(converted_FASTA, split_dir, output_dir, input_already_demuxed, mapping_threshold, experiment_name, conversions, bam_suffix, device='cpu', num_threads=8, deaminase_footprinting=False, delete_intermediates=True, double_barcoded_path=None, samtools_backend='auto', demux_backend=None, single_bam=None, barcode_sidecar=None)

Load converted BAM files into an AnnData object, encoding read sequences as integers.
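The docstring does not specify the integer encoding. As a rough illustration only, the sketch below assumes a hypothetical mapping (A=0, C=1, G=2, T=3, N=4, with 5 marking reference positions a read does not cover) and gapless alignments without indels. `BASE_TO_INT`, `PAD`, and `encode_read` are illustrative names, not part of the smftools API.

```python
import numpy as np

# Hypothetical base-to-integer mapping; smftools' actual encoding may differ.
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}
PAD = 5  # hypothetical code for reference positions not covered by the read


def encode_read(seq: str, start: int, ref_length: int) -> np.ndarray:
    """Place a gapless aligned read onto a fixed-width reference row as integers."""
    row = np.full(ref_length, PAD, dtype=np.int8)
    for offset, base in enumerate(seq.upper()):
        pos = start + offset
        if 0 <= pos < ref_length:
            # Unknown characters fall back to the N code.
            row[pos] = BASE_TO_INT.get(base, BASE_TO_INT["N"])
    return row


# Two toy reads (sequence, 0-based alignment start) on a 6-bp reference.
reads = [("ACGT", 0), ("GTNA", 2)]
matrix = np.vstack([encode_read(seq, start, 6) for seq, start in reads])
```

Stacking one row per read gives a reads-by-positions matrix, which is the natural shape for an AnnData `X` where observations are reads and variables are reference positions.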

Parameters:
  • converted_FASTA (str | Path) -- Path to the converted FASTA reference.

  • split_dir (Path) -- Directory containing converted BAM files (ignored when single_bam is set).

  • output_dir (Path) -- Output directory for intermediate and final files.

  • input_already_demuxed (bool) -- Whether input reads were originally demultiplexed.

  • mapping_threshold (float) -- Minimum fraction of aligned reads required for inclusion.

  • experiment_name (str) -- Name for the output AnnData object.

  • conversions (list[str]) -- List of modification types (e.g., ["unconverted", "5mC", "6mA"]).

  • bam_suffix (str) -- File suffix for BAM files.

  • device (str | torch.device (default: 'cpu')) -- Torch device or device string.

  • num_threads (int (default: 8)) -- Number of parallel processing threads.

  • deaminase_footprinting (bool (default: False)) -- Whether the footprinting used direct deamination chemistry.

  • delete_intermediates (bool (default: True)) -- Whether to remove intermediate files after processing.

  • double_barcoded_path (Path | None (default: None)) -- Path to the dorado demux summary file for double-ended barcodes.

  • samtools_backend (str | None (default: 'auto')) -- Samtools backend choice for alignment parsing.

  • demux_backend (str | None (default: None)) -- Demux backend used ("smftools" or "dorado"). If "smftools", the demux_type annotation is skipped here and derived later from the BM tag.

  • single_bam (Path | None (default: None)) -- When set, load from this single BAM instead of split_dir (non-split mode).

  • barcode_sidecar (Path | None (default: None)) -- Path to the barcode sidecar Parquet file used for read-to-barcode lookup in non-split mode.
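One plausible reading of mapping_threshold is a per-reference cutoff: a reference is kept only if it receives at least that fraction of all aligned reads. The sketch below assumes that interpretation and takes hypothetical per-reference read counts (for example, tallied from `samtools idxstats`). `references_passing_threshold` and the reference names are illustrative, not part of smftools.

```python
def references_passing_threshold(mapped_counts: dict, mapping_threshold: float) -> list:
    """Keep references whose share of aligned reads meets the threshold.

    Hypothetical helper illustrating one interpretation of mapping_threshold;
    it is not smftools' implementation.
    """
    total = sum(mapped_counts.values())
    if total == 0:
        # No aligned reads at all: nothing can pass the threshold.
        return []
    return sorted(
        ref for ref, n in mapped_counts.items() if n / total >= mapping_threshold
    )


# Hypothetical read counts per converted reference in one BAM.
counts = {"locus1_top": 900, "locus1_bottom": 80, "locus2_top": 20}
kept = references_passing_threshold(counts, 0.05)
```

With 1000 aligned reads in total, `locus2_top` holds only 2% and is dropped at a 0.05 threshold, while the other two references are kept.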

Returns:

The AnnData object (if generated) and its path.

Return type:

tuple[anndata.AnnData | None, Path]

Processing Steps:
  1. Resolve the best available torch device and create output directories.

  2. Load converted FASTA records and compute conversion sites.

  3. Filter BAMs based on mapping thresholds.

  4. Process each BAM in parallel, building per-sample H5AD files.

  5. Concatenate per-sample AnnData objects and attach reference metadata.

  6. Add demultiplexing annotations and clean up intermediate artifacts.
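Step 2 does not define "conversion sites". A minimal sketch, assuming C-to-T style conversion chemistry: top-strand sites are cytosines, and bottom-strand sites are the cytosines opposite each top-strand G, reported in top-strand coordinates. The optional `context` filter (GpC or CpG) is an illustrative assumption, and `conversion_sites` is a hypothetical helper rather than smftools' implementation.

```python
from typing import Dict, List, Optional


def conversion_sites(ref: str, context: Optional[str] = None) -> Dict[str, List[int]]:
    """Return 0-based candidate conversion positions per strand.

    Hypothetical simplification, not smftools' implementation.
    context: None (all C), "GpC", or "CpG".
    """
    ref = ref.upper()
    n = len(ref)

    def prev_base(i: int) -> str:
        return ref[i - 1] if i > 0 else ""

    def next_base(i: int) -> str:
        return ref[i + 1] if i + 1 < n else ""

    top: List[int] = []
    bottom: List[int] = []
    for i, base in enumerate(ref):
        if base == "C":
            # Top-strand C: 5' neighbour is ref[i - 1], 3' neighbour is ref[i + 1].
            if context == "GpC" and prev_base(i) != "G":
                continue
            if context == "CpG" and next_base(i) != "G":
                continue
            top.append(i)
        elif base == "G":
            # Bottom-strand C opposite this G: its 5' neighbour pairs with
            # ref[i + 1] and its 3' neighbour pairs with ref[i - 1].
            if context == "GpC" and next_base(i) != "C":
                continue
            if context == "CpG" and prev_base(i) != "C":
                continue
            bottom.append(i)
    return {"top": top, "bottom": bottom}


all_sites = conversion_sites("AGCGTC")
gpc_sites = conversion_sites("AGCGTC", "GpC")
```

Keeping the two strands separate mirrors the top and bottom references typically found in a converted FASTA, where each strand's conversions show up as different base substitutions in top-strand coordinates.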