smftools.preprocessing.calculate_complexity_II

smftools.preprocessing.calculate_complexity_II(adata, output_directory='', sample_col='Sample_names', ref_col='Reference_strand', cluster_col='sequence__merged_cluster_id', plot=True, save_plot=False, n_boot=30, n_depths=12, random_state=0, csv_summary=True, uns_flag='calculate_complexity_II_performed', force_redo=False, bypass=False)

Estimate and optionally plot library complexity.

If ref_col is None, the calculation is performed per sample. If provided, complexity is computed for each (sample, reference) pair.
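The grouping rule above can be illustrated with pandas on a DataFrame shaped like adata.obs. This is a minimal sketch, not smftools code: the helper name complexity_groups is hypothetical, and it only shows how setting ref_col to None switches from per-sample groups to (sample, reference) groups.

```python
import pandas as pd


def complexity_groups(obs: pd.DataFrame, sample_col="Sample_names", ref_col="Reference_strand"):
    """Hypothetical helper: split an obs-like table into the groups complexity is computed on."""
    # ref_col=None -> one group per sample (scalar keys);
    # otherwise one group per (sample, reference) pair (tuple keys).
    keys = sample_col if ref_col is None else [sample_col, ref_col]
    return {key: sub for key, sub in obs.groupby(keys, observed=True)}


obs = pd.DataFrame(
    {
        "Sample_names": ["s1", "s1", "s2"],
        "Reference_strand": ["refA", "refB", "refA"],
        "sequence__merged_cluster_id": [0, 1, 0],
    }
)
print(sorted(complexity_groups(obs, ref_col=None)))  # per-sample keys
print(sorted(complexity_groups(obs)))                # (sample, reference) keys
```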

Parameters:

adata (AnnData) -- AnnData object containing read metadata.
output_directory (str | Path (default: '')) -- Directory for output plots/CSVs.
sample_col (str (default: 'Sample_names')) -- Obs column containing sample names.
ref_col (Optional[str] (default: 'Reference_strand')) -- Obs column with reference/strand categories, or None.
cluster_col (str (default: 'sequence__merged_cluster_id')) -- Obs column with merged cluster IDs.
plot (bool (default: True)) -- Whether to generate plots.
save_plot (bool (default: False)) -- Whether to save plots to disk.
n_boot (int (default: 30)) -- Number of bootstrap iterations per depth.
n_depths (int (default: 12)) -- Number of subsampling depths to evaluate.
random_state (int (default: 0)) -- Random seed for bootstrapping.
csv_summary (bool (default: True)) -- Whether to write CSV summary files.
uns_flag (str (default: 'calculate_complexity_II_performed')) -- Flag in adata.uns indicating prior completion.
force_redo (bool (default: False)) -- Whether to rerun even if uns_flag is present.
bypass (bool (default: False)) -- Whether to skip processing.
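The n_depths, n_boot, and random_state parameters suggest a saturation (rarefaction) curve: reads are repeatedly subsampled at several depths and the distinct cluster IDs in cluster_col are counted. The docstring does not state the exact algorithm, so the following is only a hypothetical sketch under these assumptions: depths are evenly spaced, subsampling is without replacement, and "complexity" is the mean number of distinct clusters per depth. The function name subsampled_complexity is illustrative, not part of smftools.

```python
import numpy as np


def subsampled_complexity(cluster_ids, n_boot=30, n_depths=12, random_state=0):
    """Hypothetical rarefaction sketch: mean distinct clusters at each subsampling depth."""
    ids = np.asarray(cluster_ids)
    n_reads = ids.size
    if n_reads == 0:
        raise ValueError("no reads to subsample")
    rng = np.random.default_rng(random_state)  # seeded, so results are reproducible
    # Assumption: n_depths evenly spaced depths from 1 read up to all reads (duplicates dropped).
    depths = np.unique(np.linspace(1, n_reads, n_depths).astype(int))
    means = []
    for depth in depths:
        # Draw n_boot subsamples of `depth` reads without replacement; count distinct clusters in each.
        counts = [np.unique(rng.choice(ids, size=depth, replace=False)).size for _ in range(n_boot)]
        means.append(float(np.mean(counts)))
    return depths, np.array(means)


depths, means = subsampled_complexity([0, 0, 1, 1, 2, 2, 3, 3], n_boot=5, n_depths=4)
print(depths, means)
```

A curve that is still rising at full depth suggests the library is not saturated; a plateau suggests further sequencing would recover few new clusters.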

Return type:

None
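Because the function returns None, its effects are side effects: plots, CSV summaries, and the completion flag in adata.uns. The sketch below shows one plausible reading of how bypass, force_redo, and uns_flag interact, based only on the parameter descriptions above. The helper name should_run is hypothetical, a plain dict stands in for adata.uns, and the precedence of bypass over force_redo is an assumption, not documented smftools behavior.

```python
def should_run(uns: dict, uns_flag="calculate_complexity_II_performed", force_redo=False, bypass=False) -> bool:
    """Hypothetical guard mirroring the documented bypass / force_redo / uns_flag parameters."""
    if bypass:
        return False  # assumption: bypass always wins and skips processing
    if uns.get(uns_flag, False) and not force_redo:
        return False  # already performed, and no rerun requested
    return True


uns = {}  # stands in for adata.uns
print(should_run(uns))                   # True: never run before
uns["calculate_complexity_II_performed"] = True
print(should_run(uns))                   # False: flag present
print(should_run(uns, force_redo=True))  # True: rerun requested
```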
