smftools.informatics.pod5_functions

Functions

basecall_pod5s(config_path)

Basecall POD5 inputs using a configuration file.

fast5_to_pod5(fast5_dir[, output_pod5])

Convert FAST5 inputs into a single POD5 file.

subsample_pod5(pod5_path, read_name_path, ...)

Write a subsampled POD5 containing selected reads.

subsample_pod5_for_basecalling(input_path, ...)

Randomly sample up to max_reads reads from POD5 inputs and write them to a temporary POD5 file.

smftools.informatics.pod5_functions.basecall_pod5s(config_path)

Basecall POD5 inputs using a configuration file.

Parameters:

config_path (str | Path) -- Path to the basecall configuration file.

Return type:

None

smftools.informatics.pod5_functions.fast5_to_pod5(fast5_dir, output_pod5='FAST5s_to_POD5.pod5')

Convert FAST5 inputs into a single POD5 file.

Parameters:
  • fast5_dir (Union[str, Path, Iterable[str | Path]]) -- FAST5 file path, directory, or iterable of file paths to convert.

  • output_pod5 (str | Path (default: 'FAST5s_to_POD5.pod5')) -- Output POD5 file path.

Raises:

FileNotFoundError -- If no FAST5 files are found or the input path is invalid.

Return type:

None
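The three accepted forms of fast5_dir (single file, directory, or iterable of paths) and the documented FileNotFoundError can be illustrated with a small standard-library sketch. The helper name collect_fast5_files, the "*.fast5" glob pattern, and the non-recursive directory search are assumptions made for illustration; they are not taken from the smftools source, and the sketch does not perform any conversion.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def collect_fast5_files(fast5_dir: Union[PathLike, Iterable[PathLike]]) -> List[Path]:
    """Hypothetical helper: resolve the documented input forms to a list of files."""
    if isinstance(fast5_dir, (str, Path)):
        path = Path(fast5_dir)
        if path.is_dir():
            # Assumption: only the top level of the directory is searched.
            files = sorted(path.glob("*.fast5"))
        elif path.is_file():
            files = [path]
        else:
            raise FileNotFoundError(f"Input path does not exist: {path}")
    else:
        # Iterable form: every listed path must point to an existing file.
        files = [Path(p) for p in fast5_dir]
        missing = [p for p in files if not p.is_file()]
        if missing:
            raise FileNotFoundError(f"FAST5 file(s) not found: {missing}")
    if not files:
        raise FileNotFoundError("No FAST5 files found in the given input")
    return files


# Demonstration on empty placeholder files (not real FAST5 data).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.fast5", "b.fast5", "notes.txt"):
        (Path(tmp) / name).touch()
    found_names = [p.name for p in collect_fast5_files(tmp)]
    single_names = [p.name for p in collect_fast5_files(Path(tmp) / "a.fast5")]
    listed_names = [p.name for p in collect_fast5_files([Path(tmp) / "b.fast5"])]
```

Checking str and Path before treating the argument as an iterable matters because a str is itself iterable; without that order a single path would be split into characters.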

smftools.informatics.pod5_functions.subsample_pod5(pod5_path, read_name_path, output_directory)

Write a subsampled POD5 containing selected reads.

Parameters:
  • pod5_path (str | Path) -- POD5 file path or directory of POD5 files to subsample.

  • read_name_path (str | int) -- Path to a text file of read names (one per line) or an integer specifying a random subset size.

  • output_directory (str | Path) -- Directory in which to write the subsampled POD5 file.

Return type:

None
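Because read_name_path accepts either a path to a read-name file or an integer, the selection step can be pictured as a simple type dispatch. The following is a minimal sketch under stated assumptions: select_read_names is a hypothetical helper, the handling of blank lines and of an integer larger than the number of reads is assumed, and the read IDs are made up. It does not read or write POD5 data.

```python
import random
import tempfile
from pathlib import Path
from typing import List, Sequence, Union


def select_read_names(
    all_read_ids: Sequence[str], read_name_path: Union[str, int]
) -> List[str]:
    """Hypothetical helper mirroring the two documented forms of read_name_path."""
    if isinstance(read_name_path, int):
        # Integer form: draw a random subset of that size.
        # Assumption: the size is capped at the number of available reads.
        n = min(read_name_path, len(all_read_ids))
        return random.sample(list(all_read_ids), n)
    # Path form: one read name per line; blank lines are skipped (assumption).
    lines = Path(read_name_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


# Made-up read IDs standing in for the contents of a POD5 file.
read_ids = [f"read_{i}" for i in range(10)]
random_subset = select_read_names(read_ids, 3)

with tempfile.TemporaryDirectory() as tmp:
    names_file = Path(tmp) / "read_names.txt"
    names_file.write_text("read_2\nread_7\n\n")
    listed_subset = select_read_names(read_ids, str(names_file))
```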

smftools.informatics.pod5_functions.subsample_pod5_for_basecalling(input_path, max_reads, output_dir, seed=42)

Randomly sample up to max_reads reads from POD5 inputs and write them to a temporary POD5 file.

Read IDs are collected from all POD5 files first (memory-efficient), a random sample is drawn, and only the selected reads are written. If the total read count is already <= max_reads, the original input_path is returned unchanged.

Parameters:
  • input_path (str | Path) -- A single POD5 file or a directory containing POD5 files.

  • max_reads (int) -- Maximum number of reads to retain.

  • output_dir (str | Path) -- Directory in which to write the subsampled POD5 file.

  • seed (int (default: 42)) -- Random seed for reproducibility.

Return type:

Path

Returns:

Path to the POD5 input to use for basecalling: the newly written subsampled file, or the original input_path if the total read count was already <= max_reads.
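The documented behaviour (return the input unchanged when it is already small enough, otherwise draw a seeded sample of read IDs) can be sketched without any POD5 I/O. The helper choose_reads and the use of random.Random are assumptions for illustration; which random number generator smftools uses internally is not stated here, and the read IDs below are made up.

```python
import random
from typing import List, Optional, Sequence


def choose_reads(
    read_ids: Sequence[str], max_reads: int, seed: int = 42
) -> Optional[List[str]]:
    """Return None when no subsampling is needed, else the sampled read IDs."""
    if len(read_ids) <= max_reads:
        # Nothing to do: a caller would keep using the original input_path.
        return None
    # A local generator keeps the global random state untouched and makes
    # the selection repeatable for a given seed.
    rng = random.Random(seed)
    return rng.sample(list(read_ids), max_reads)


# Made-up read IDs standing in for IDs collected from POD5 files.
all_ids = [f"read_{i:03d}" for i in range(100)]
first = choose_reads(all_ids, max_reads=10)
second = choose_reads(all_ids, max_reads=10)
unchanged = choose_reads(all_ids, max_reads=100)
```

With the same seed and the same ordered list of IDs, first and second contain identical reads, which is the reproducibility the seed parameter is meant to provide. Note that repeatability in a sketch like this also depends on the IDs being collected in a stable order.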