smftools.informatics.fasta_functions
Functions
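| Function | Description |
|---|---|
| find_conversion_sites | Find genomic coordinates of modified bases in a reference FASTA. |
| generate_converted_FASTA | Convert a FASTA file and write converted records to disk. |
| get_chromosome_lengths | Create or reuse <fasta>.chrom.sizes derived from the FASTA index. |
| get_native_references | Return record lengths and sequences from a FASTA file. |
| index_fasta | Index a FASTA file and optionally write chromosome sizes. |
| subsample_fasta_from_bed | Subsample a FASTA using BED coordinates. |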
- smftools.informatics.fasta_functions.generate_converted_FASTA(input_fasta, modification_types, strands, output_fasta, num_threads=4, chunk_size=500)
Convert a FASTA file and write converted records to disk.
- Parameters:
  - input_fasta (str | Path) -- Path to the unconverted FASTA file.
  - modification_types (list[str]) -- List of modification types (5mC, 6mA, or unconverted).
  - output_fasta (str | Path) -- Path to the converted FASTA output file.
  - num_threads (int, default: 4) -- Number of parallel workers to use.
  - chunk_size (int, default: 500) -- Number of records to process per write batch.
- Return type:
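The idea of an in-silico converted record can be illustrated with a minimal sketch. It is not the smftools implementation: the helper name `convert_sequence` is hypothetical, and the base swaps (C to T on the top strand and G to A on the bottom strand for 5mC, A to G and T to C for 6mA) are assumptions about what "converted" means here.

```python
def convert_sequence(seq: str, modification_type: str, strand: str) -> str:
    """Return an in-silico converted copy of seq (hypothetical helper)."""
    if modification_type == "unconverted":
        # Unconverted records are passed through unchanged.
        return seq
    # Assumed base swaps: the target base on the top strand,
    # and its complement when the bottom strand is converted.
    swaps = {
        ("5mC", "top"): ("C", "T"),
        ("5mC", "bottom"): ("G", "A"),
        ("6mA", "top"): ("A", "G"),
        ("6mA", "bottom"): ("T", "C"),
    }
    key = (modification_type, strand)
    if key not in swaps:
        raise ValueError(f"Unsupported combination: {key}")
    old, new = swaps[key]
    return seq.upper().replace(old, new)
```

Under these assumptions, one converted record would be written per requested combination of modification type and strand.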
- smftools.informatics.fasta_functions.index_fasta(fasta, write_chrom_sizes=True)
Index a FASTA file and optionally write chromosome sizes.
- smftools.informatics.fasta_functions.get_chromosome_lengths(fasta)
Create or reuse <fasta>.chrom.sizes derived from the FASTA index.
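As a rough illustration of the "create or reuse" behaviour, the sketch below derives a chrom.sizes file from a samtools-style `.fai` index, whose first two tab-separated columns are the record name and its length. The helper name `chrom_sizes_from_fai` and its two-argument signature are hypothetical, not part of smftools.

```python
from pathlib import Path


def chrom_sizes_from_fai(fai_path, sizes_path):
    """Create sizes_path from a .fai index if missing, then return {name: length}."""
    sizes_path = Path(sizes_path)
    if not sizes_path.exists():
        # Only the first two .fai columns (name, length) are needed.
        with open(fai_path) as fai, open(sizes_path, "w") as out:
            for line in fai:
                fields = line.rstrip("\n").split("\t")
                if len(fields) >= 2:
                    out.write(f"{fields[0]}\t{fields[1]}\n")
    # Read the (new or pre-existing) chrom.sizes file back into a dict.
    sizes = {}
    with open(sizes_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                sizes[fields[0]] = int(fields[1])
    return sizes
```

Because an existing sizes file is read back rather than regenerated, a second call does not need the index at all.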
- smftools.informatics.fasta_functions.get_native_references(fasta_file)
Return record lengths and sequences from a FASTA file.
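The exact return structure of get_native_references is not documented here. A minimal pure-Python sketch of collecting record lengths and sequences might look like the following, where the helper name `read_fasta_records` and the `{name: (length, sequence)}` layout are assumptions made for illustration.

```python
def read_fasta_records(path):
    """Return {record_name: (length, sequence)} for a FASTA file (hypothetical helper)."""
    records = {}
    name, chunks = None, []

    def flush():
        # Store the record collected so far, if any.
        if name is not None:
            seq = "".join(chunks)
            records[name] = (len(seq), seq)

    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                flush()
                # Use the first whitespace-delimited token of the header as the name.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                chunks = []
            elif name is not None:
                chunks.append(line)
    flush()
    return records
```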
- smftools.informatics.fasta_functions.find_conversion_sites(fasta_file, modification_type, conversions, deaminase_footprinting=False)
Find genomic coordinates of modified bases in a reference FASTA.
- Parameters:
  - fasta_file (str | Path) -- Path to the converted reference FASTA.
  - modification_type (str) -- Modification type (5mC, 6mA, or unconverted).
  - conversions (list[str]) -- List of conversion types (first entry is the unconverted record type).
  - deaminase_footprinting (bool, default: False) -- Whether the footprinting used direct deamination chemistry.
- Returns:
  Mapping of record name to [sequence length, top strand coordinates, bottom strand coordinates, sequence, complement].
- Return type:
- Raises:
  ValueError -- If the modification type is invalid.
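The per-record value described under Returns can be sketched for a single sequence as follows. This is an illustration, not the library's code: the helper name `find_target_sites` is hypothetical, coordinates are assumed to be 0-based, the target bases (C on the top strand and G on the bottom for 5mC, A and T for 6mA) are assumptions, and the `conversions` and `deaminase_footprinting` arguments are ignored.

```python
def find_target_sites(sequence: str, modification_type: str) -> list:
    """Return [length, top coords, bottom coords, sequence, complement] (hypothetical helper)."""
    # Assumed target base on the top strand and on the bottom strand
    # (the bottom-strand target is read on the top strand as its complement).
    targets = {"5mC": ("C", "G"), "6mA": ("A", "T")}
    seq = sequence.upper()
    complement = seq.translate(str.maketrans("ACGT", "TGCA"))
    if modification_type == "unconverted":
        # Assumption: unconverted records carry no target coordinates.
        top, bottom = [], []
    elif modification_type in targets:
        top_base, bottom_base = targets[modification_type]
        top = [i for i, base in enumerate(seq) if base == top_base]
        bottom = [i for i, base in enumerate(seq) if base == bottom_base]
    else:
        raise ValueError(f"Invalid modification type: {modification_type}")
    return [len(seq), top, bottom, seq, complement]
```

Applying such a helper to every record of a FASTA file and keying the results by record name would give a mapping of the shape described above.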
- smftools.informatics.fasta_functions.subsample_fasta_from_bed(input_FASTA, input_bed, output_directory, output_FASTA)
Subsample a FASTA using BED coordinates.
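BED intervals are 0-based and half-open, so a BED line maps directly onto a Python slice. The sketch below shows that extraction step on in-memory sequences; the helper name `extract_bed_regions`, the `chrom:start-end` record naming, and the choice to skip unknown chromosomes are assumptions, and the file reading and writing that subsample_fasta_from_bed performs is left out.

```python
def extract_bed_regions(sequences: dict, bed_lines) -> dict:
    """Return {"chrom:start-end": subsequence} for each BED interval (hypothetical helper)."""
    regions = {}
    for raw in bed_lines:
        line = raw.strip()
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        if chrom not in sequences:
            continue  # assumption: intervals on unknown records are ignored
        # BED is 0-based, half-open, which matches Python slicing exactly.
        regions[f"{chrom}:{start}-{end}"] = sequences[chrom][start:end]
    return regions
```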