fastaq_replace_bases

FASTA and FASTQ file manipulation tools

Install

All systems
curl cmd.cat/fastaq_replace_bases.sh
Debian
apt-get install fastaq
Ubuntu
apt-get install fastaq
Kali Linux
apt-get install fastaq
Windows (WSL2)
sudo apt-get update
sudo apt-get install fastaq
Raspbian
apt-get install fastaq

fastaq

A collection of scripts that perform useful and common fasta/q manipulation tasks. All scripts automatically detect whether the input is a FASTA or FASTQ file. Input and output files can be gzipped.

fastaq_capillary_to_pairs - Given a fasta/q file of capillary reads, makes an interleaved file of read pairs.
fastaq_chunker - Splits a multi-fasta/q file into separate files. Splits sequences into chunks of a fixed size.
fastaq_count_sequences - Counts the number of sequences in a fasta/q file.
fastaq_deinterleave - Deinterleaves a fasta/q file, writing reads alternately to two output files.
fastaq_enumerate_names - Renames the sequences in a file, calling them 1, 2, 3...
fastaq_expand_nucleotides - Makes all combinations of sequences in the input file by using every possibility for redundant bases, e.g. ART could be AAT or AGT.
fastaq_extend_gaps - Extends the length of all gaps (and trims the start/end of sequences) in a fasta/q file.
fastaq_fasta_to_fastq - Given a fasta and a qual file, makes a fastq file.
fastaq_filter - Filters a fasta/q file by sequence length and/or by name matching a regular expression.
fastaq_get_ids - Gets the ID of each sequence in a fasta or fastq file.
fastaq_get_seq_flanking_gaps - Gets the sequences on either side of gaps in a fasta/q file.
fastaq_insert_or_delete_bases - Deletes or inserts bases at given position(s) in a fasta/q file.
fastaq_interleave - Interleaves two fasta/q files, so that reads are written alternately first/second in the output file.
fastaq_long_read_simulate - Simulates long reads from a fasta/q file. Can optionally make insertions into the reads, as PacBio does.
fastaq_make_random_contigs - Makes a multi-fasta file of random sequences, all of the same length. Each base has an equal chance of being A, C, G or T.
fastaq_merge - Converts a multi-fasta/q file to a single-sequence file, preserving the original order of sequences.
fastaq_replace_bases - Replaces all occurrences of one letter with another in a fasta/q file.
fastaq_reverse_complement - Reverse complements all sequences in a fasta/q file.
fastaq_scaffolds_to_contigs - Creates a file of contigs from a file of scaffolds, i.e. breaks at every gap in the input.
fastaq_search_for_seq - Searches every sequence of a fasta/q file for exact matches to a given string and its reverse complement. Case insensitive. Guaranteed to find all hits.
fastaq_sequence_trim - Trims sequences off the start of all sequences in a pair of fasta/q files whenever there is a perfect match. Only keeps a read pair if both reads of the pair are at least a minimum length after any trimming.
fastaq_split_by_base_count - Splits a multi-fasta/q file into separate files. Does not split sequences. Puts up to max_bases into each split file; the exception is that any sequence longer than max_bases is put into its own file.
fastaq_strip_illumina_suffix - Strips /1 or /2 off the end of every read name in a fasta/q file.
fastaq_to_fake_qual - Makes a file of fake quality scores from a fasta/q file.
fastaq_to_fasta - Converts a sequence file to FASTA format.
fastaq_to_mira_xml - Creates an XML file from a fasta/q file of reads, for use with the Mira assembler.
fastaq_to_orfs_gff - Writes a GFF file of open reading frames from a fasta/q file.
fastaq_to_perfect_reads - Makes perfect paired-end fastq reads from a fasta/q file, with insert sizes sampled from a normal distribution. Read orientation is innies (the two reads of a pair face each other). Output is an interleaved fastq file.
fastaq_to_quasr_primers_file - Converts a fasta/q file to QUASR primers format: each line holds a sequence and its reverse complement, tab separated.
fastaq_to_random_subset - Takes a random subset of reads from a fasta/q file and, optionally, the corresponding reads from a mates file. Output is interleaved if a mates file is given.
fastaq_to_tiling_bam - Takes a fasta/q file and makes a BAM file of perfect (unpaired) reads tiling the whole genome.
fastaq_to_unique_by_id - Removes duplicate sequences from a fasta/q file, based on their names. If the same name is found more than once, the longest sequence is kept. The order of sequences is preserved in the output.
fastaq_translate - Translates all sequences in a fasta or fastq file. Output is always in fasta format.
fastaq_trim_ends - Trims a set number of bases off each end of every sequence in a fasta/q file.
fastaq_trim_Ns_at_end - Trims any Ns off the ends of each sequence in a fasta/q file. Gaps in the middle are left untouched.

A developer API is also provided by this package. There are plenty of examples in tasks.py.
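To illustrate what fastaq_replace_bases does, here is a minimal Python sketch. It is not the pyfastaq API: parse_records, replace_bases and format_records are hypothetical names. The sketch copies the tools' FASTA/FASTQ auto-detection by looking at the first character of the input (> or @). It assumes plain-text input, FASTQ records of exactly four lines, and case-sensitive replacement. Qualities are left alone, so a FASTQ record's sequence and quality strings keep the same length.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# (name, sequence, quality); quality is None for FASTA records.
Record = Tuple[str, str, Optional[str]]


def parse_records(text: str) -> Iterator[Record]:
    """Parse FASTA or FASTQ text, detecting the format from the first character.

    Simplifying assumption (hypothetical helper): each FASTQ record is exactly four lines.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return
    if lines[0].startswith("@"):  # FASTQ
        for i in range(0, len(lines), 4):
            header, seq, _plus, qual = lines[i:i + 4]
            yield header[1:], seq, qual
    elif lines[0].startswith(">"):  # FASTA; a sequence may span several lines
        name, chunks = lines[0][1:], []
        for line in lines[1:]:
            if line.startswith(">"):
                yield name, "".join(chunks), None
                name, chunks = line[1:], []
            else:
                chunks.append(line)
        yield name, "".join(chunks), None
    else:
        raise ValueError("input is neither FASTA nor FASTQ")


def replace_bases(records: Iterable[Record], old: str, new: str) -> List[Record]:
    """Replace every occurrence of the letter `old` with `new` (case-sensitive)."""
    if len(old) != 1 or len(new) != 1:
        raise ValueError("old and new must each be a single letter")
    # Qualities are untouched, so FASTQ sequence and quality lengths still match.
    return [(name, seq.replace(old, new), qual) for name, seq, qual in records]


def format_records(records: Iterable[Record]) -> str:
    """Write records back out as FASTA or FASTQ, one sequence per line."""
    out = []
    for name, seq, qual in records:
        if qual is None:
            out.append(f">{name}\n{seq}")
        else:
            out.append(f"@{name}\n{seq}\n+\n{qual}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    fastq = "@read1\nACGTTT\n+\nIIIIII\n"
    print(format_records(replace_bases(parse_records(fastq), "T", "U")), end="")
    # @read1
    # ACGUUU
    # +
    # IIIIII
```

The sketch leaves gzip handling out. Real files would need to be opened with Python's gzip module when compressed.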
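Two of the tools described above, fastaq_reverse_complement and fastaq_search_for_seq, rest on the same operation: taking the reverse complement of a sequence. The following is a small Python sketch of both ideas, not fastaq's implementation. The function names are hypothetical, and IUPAC ambiguity codes are assumed to be out of scope. The search is case insensitive and advances one base at a time, so overlapping hits on either strand are all reported, which matches the "guaranteed to find all hits" behaviour described for fastaq_search_for_seq.

```python
from typing import List, Tuple

# Complements for unambiguous bases and N, in both cases.
# Assumption: IUPAC ambiguity codes (R, Y, ...) are not handled here.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def search_both_strands(seq: str, query: str) -> List[Tuple[int, str]]:
    """Find every exact, case-insensitive hit of `query` on either strand.

    Each hit is (0-based start on the forward strand, "+" or "-").
    Overlapping hits are all reported.
    """
    seq_u, fwd = seq.upper(), query.upper()
    targets = [(fwd, "+")]
    rev = reverse_complement(fwd)
    if rev != fwd:  # this sketch reports a palindromic query's hits only once
        targets.append((rev, "-"))
    hits = []
    for target, strand in targets:
        start = seq_u.find(target)
        while start != -1:
            hits.append((start, strand))
            # advance by one base so that overlapping hits are not skipped
            start = seq_u.find(target, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    print(reverse_complement("AACG"))  # CGTT
    print(search_both_strands("AAAACTTT", "aaa"))  # [(0, '+'), (1, '+'), (5, '-')]
```

Reporting a palindromic query (for example ACGT) only once is a design choice of this sketch. The real tool may handle that case differently.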
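The last few entries in the list (fastaq_to_unique_by_id, fastaq_trim_ends and fastaq_trim_Ns_at_end) are simple enough to express directly. The following Python sketch shows one way to implement them. It is not the package's code: the function names are hypothetical, records are assumed to be FASTA-style (name, sequence) pairs, and lowercase n is assumed to count as N. For duplicate names, the longest sequence is kept in the position where the name first appeared. On a length tie the earlier record wins, which is this sketch's own choice.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (name, sequence); qualities omitted for brevity


def unique_by_id(records: List[Record]) -> List[Record]:
    """Keep one record per name: the longest sequence, in first-seen order.

    On a length tie the earlier record wins (a choice made for this sketch).
    """
    best: Dict[str, str] = {}
    for name, seq in records:
        # dicts keep insertion order, so each name stays where it first appeared
        if name not in best or len(seq) > len(best[name]):
            best[name] = seq
    return list(best.items())


def trim_ends(seq: str, start_bases: int, end_bases: int) -> str:
    """Remove a fixed number of bases from the start and the end of `seq`."""
    if start_bases + end_bases >= len(seq):
        return ""
    return seq[start_bases:len(seq) - end_bases]


def trim_ns_at_ends(seq: str) -> str:
    """Strip leading and trailing N/n runs; gaps inside the sequence are kept."""
    return seq.strip("Nn")


if __name__ == "__main__":
    reads = [("a", "AC"), ("b", "G"), ("a", "ACGT")]
    print(unique_by_id(reads))  # [('a', 'ACGT'), ('b', 'G')]
    print(trim_ns_at_ends("NNACNNGTnn"))  # ACNNGT
```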