smftools.preprocessing.calculate_complexity_II

smftools.preprocessing.calculate_complexity_II(adata, output_directory='', sample_col='Sample_names', ref_col='Reference_strand', cluster_col='sequence__merged_cluster_id', plot=True, save_plot=False, n_boot=30, n_depths=12, random_state=0, csv_summary=True, uns_flag='calculate_complexity_II_performed', force_redo=False, bypass=False)

Estimate and optionally plot library complexity.

If ref_col is None, complexity is computed per sample. If ref_col is provided, complexity is computed separately for each (sample, reference) pair.
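The grouping rule can be illustrated with a minimal sketch. It is not the smftools implementation: the helper unique_clusters_per_group and the example records are hypothetical, and plain dictionaries stand in for rows of adata.obs. Only the column names are taken from the signature defaults.

```python
from collections import defaultdict


def unique_clusters_per_group(records, sample_col="Sample_names",
                              ref_col="Reference_strand",
                              cluster_col="sequence__merged_cluster_id"):
    """Return {group: (n_reads, n_unique_clusters)} for each group.

    Hypothetical helper: groups by sample when ref_col is None,
    otherwise by (sample, reference).
    """
    reads = defaultdict(int)
    clusters = defaultdict(set)
    for row in records:
        # The group key is the sample alone when ref_col is None.
        key = row[sample_col] if ref_col is None else (row[sample_col], row[ref_col])
        reads[key] += 1
        clusters[key].add(row[cluster_col])
    return {key: (reads[key], len(clusters[key])) for key in reads}


# Hypothetical per-read metadata standing in for adata.obs rows.
records = [
    {"Sample_names": "s1", "Reference_strand": "locus_top", "sequence__merged_cluster_id": 0},
    {"Sample_names": "s1", "Reference_strand": "locus_top", "sequence__merged_cluster_id": 0},
    {"Sample_names": "s1", "Reference_strand": "locus_bottom", "sequence__merged_cluster_id": 1},
    {"Sample_names": "s2", "Reference_strand": "locus_top", "sequence__merged_cluster_id": 2},
]

per_pair = unique_clusters_per_group(records)                  # (sample, reference) groups
per_sample = unique_clusters_per_group(records, ref_col=None)  # sample-only groups
```

With ref_col=None the three reads of sample s1 fall into one group. With the default ref_col they are split across two (sample, reference) groups.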

Parameters:
  • adata (AnnData) -- AnnData object containing read metadata.

  • output_directory (str | Path (default: '')) -- Directory for output plots/CSVs.

  • sample_col (str (default: 'Sample_names')) -- Obs column containing sample names.

  • ref_col (Optional[str] (default: 'Reference_strand')) -- Obs column with reference/strand categories, or None.

  • cluster_col (str (default: 'sequence__merged_cluster_id')) -- Obs column with merged cluster IDs.

  • plot (bool (default: True)) -- Whether to generate plots.

  • save_plot (bool (default: False)) -- Whether to save plots to disk.

  • n_boot (int (default: 30)) -- Number of bootstrap iterations per depth.

  • n_depths (int (default: 12)) -- Number of subsampling depths to evaluate.

  • random_state (int (default: 0)) -- Random seed for bootstrapping.

  • csv_summary (bool (default: True)) -- Whether to write CSV summary files.

  • uns_flag (str (default: 'calculate_complexity_II_performed')) -- Flag in adata.uns indicating prior completion.

  • force_redo (bool (default: False)) -- Whether to rerun even if uns_flag is present.

  • bypass (bool (default: False)) -- Whether to skip processing.
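To show how n_boot, n_depths, and random_state might interact, the following is a sketch of a generic subsampling (rarefaction) estimate. It is an assumption about the general technique, not a description of what smftools does internally: the function rarefaction_curve, the evenly spaced depths, and sampling without replacement are all illustrative choices.

```python
import random
from statistics import mean


def rarefaction_curve(cluster_ids, n_boot=30, n_depths=12, random_state=0):
    """Return [(depth, mean_unique_clusters), ...] for one group of reads.

    Hypothetical helper: at each depth, draw n_boot random subsamples of
    the reads (without replacement, an assumption) and average the number
    of distinct cluster IDs observed.
    """
    rng = random.Random(random_state)  # seeded for reproducibility
    total = len(cluster_ids)
    # Evenly spaced depths up to the full read count (assumed spacing).
    depths = sorted({max(1, round(total * i / n_depths)) for i in range(1, n_depths + 1)})
    curve = []
    for depth in depths:
        uniques = [len(set(rng.sample(cluster_ids, depth))) for _ in range(n_boot)]
        curve.append((depth, mean(uniques)))
    return curve


# Hypothetical group: 120 reads, 40 clusters, each cluster seen 3 times.
cluster_ids = [i % 40 for i in range(120)]
curve = rarefaction_curve(cluster_ids)
```

A curve that flattens well before the full read count suggests that most distinct molecules have already been observed. A curve that is still rising suggests additional sequencing would reveal more.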

Return type:

None
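The uns_flag, force_redo, and bypass parameters describe a run-once guard. A minimal sketch of that pattern follows, assuming the behavior implied by the parameter descriptions. The helper run_once is hypothetical, a plain dict stands in for adata.uns, storing True under the flag is an assumption, and the boolean return value exists only to make the example observable (the documented function returns None).

```python
def run_once(uns, step, uns_flag="calculate_complexity_II_performed",
             force_redo=False, bypass=False):
    """Run step() unless bypassed or already flagged; return True if it ran.

    Hypothetical helper illustrating the guard pattern only.
    """
    if bypass:
        return False  # skip processing entirely
    if uns.get(uns_flag, False) and not force_redo:
        return False  # already completed on a previous call
    step()
    uns[uns_flag] = True  # record completion (assumed flag value)
    return True


uns = {}    # stand-in for adata.uns
calls = []  # records each time the step actually runs

first = run_once(uns, lambda: calls.append("run"))                    # runs
second = run_once(uns, lambda: calls.append("run"))                   # skipped: flag set
forced = run_once(uns, lambda: calls.append("run"), force_redo=True)  # runs again
skipped = run_once({}, lambda: calls.append("run"), bypass=True)      # skipped: bypass
```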