smftools.preprocessing.subsample_adata

smftools.preprocessing.subsample_adata(adata, obs_columns=None, max_samples=2000, random_seed=42)

Subsample an AnnData object by observation categories.

Each unique combination of categories in obs_columns is capped at max_samples observations. If obs_columns is None, the function randomly subsamples the entire dataset.

Parameters:
  • adata (AnnData) -- AnnData object to subsample.

  • obs_columns (Optional[Sequence[str]] (default: None)) -- Observation column names to group by.

  • max_samples (int (default: 2000)) -- Maximum observations per category combination.

  • random_seed (int (default: 42)) -- Random seed for reproducibility.

Returns:

Subsampled AnnData object.

Return type:

anndata.AnnData
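
Example:

The per-group cap described above can be illustrated with a minimal, standard-library sketch. This is not the smftools implementation: cap_per_group is a hypothetical helper that works on a list of dicts standing in for adata.obs, and it assumes that when obs_columns is None the whole dataset is treated as a single group capped at max_samples (the documentation does not state the target size for that case).

```python
import random
from collections import defaultdict


def cap_per_group(obs_rows, obs_columns=None, max_samples=2000, random_seed=42):
    """Hypothetical helper: return sorted row indices to keep.

    Each unique combination of values in obs_columns keeps at most
    max_samples rows, chosen with a seeded random generator.
    """
    rng = random.Random(random_seed)

    if obs_columns is None:
        # Assumption: with no grouping columns, the whole dataset is one group.
        groups = {(): list(range(len(obs_rows)))}
    else:
        groups = defaultdict(list)
        for i, row in enumerate(obs_rows):
            key = tuple(row[col] for col in obs_columns)
            groups[key].append(i)

    keep = []
    for indices in groups.values():
        if len(indices) > max_samples:
            # Only groups larger than the cap are subsampled.
            indices = rng.sample(indices, max_samples)
        keep.extend(indices)
    return sorted(keep)


# Five rows in group ("A", "top") and two rows in group ("B", "top").
obs = [{"sample": "A", "strand": "top"}] * 5 + [{"sample": "B", "strand": "top"}] * 2
kept = cap_per_group(obs, obs_columns=["sample", "strand"], max_samples=3)
```

Here group A is reduced from five rows to three, while group B is below the cap and is kept in full. Calling the helper again with the same random_seed selects the same rows, which is the role the random_seed parameter plays in the documented signature.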