spaced

alignment-free sequence comparison using spaced words

Install

All systems
curl cmd.cat/spaced.sh
Debian
apt-get install spaced
Ubuntu
apt-get install spaced
Kali Linux
apt-get install spaced
Windows (WSL2)
sudo apt-get update
sudo apt-get install spaced
Raspbian
apt-get install spaced

Spaced (Words) is a new approach to alignment-free sequence comparison. While most alignment-free algorithms compare the word composition of sequences, spaced uses a pattern of match ("care") and don't-care positions. An occurrence of a spaced word in a sequence is defined by the characters at the match positions only; the characters at the don't-care positions are ignored. Instead of comparing the frequencies of contiguous words in the input sequences, this approach compares the frequencies of spaced words according to the predefined pattern. An information-theoretic distance measure is then used to define pairwise distances on the set of input sequences based on their spaced-word frequencies. Systematic test runs on real and simulated sequence sets have shown that, for phylogeny reconstruction, this multiple-spaced-words approach is far superior to the classical alignment-free approach based on contiguous word frequencies.
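
The idea described above can be illustrated with a minimal Python sketch. It is not the implementation shipped in the spaced package. The function names, the '1'/'0' pattern notation (1 = match position, 0 = don't-care position), and the choice of the Jensen-Shannon divergence as the information-theoretic distance are all assumptions made for illustration; the description above does not say which measure the tool uses.

```python
from collections import Counter
from math import log2


def spaced_word_counts(seq, pattern):
    """Count spaced words in seq (hypothetical helper, not the spaced API).

    pattern is a string of '1' (match position) and '0' (don't-care position).
    Only the characters at match positions contribute to a spaced word.
    """
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    counts = Counter()
    # Slide a window of the pattern's length along the sequence.
    for start in range(len(seq) - len(pattern) + 1):
        window = seq[start:start + len(pattern)]
        word = "".join(window[i] for i in match_positions)
        counts[word] += 1
    return counts


def jensen_shannon(counts_a, counts_b):
    """Jensen-Shannon divergence (base 2) between two spaced-word profiles.

    Assumed stand-in for the information-theoretic distance; the result is
    0.0 for identical profiles and 1.0 for profiles with no shared words.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each sequence must be at least as long as the pattern")
    divergence = 0.0
    for word in set(counts_a) | set(counts_b):
        p = counts_a[word] / total_a  # Counter returns 0 for unseen words
        q = counts_b[word] / total_b
        m = (p + q) / 2
        if p > 0:
            divergence += 0.5 * p * log2(p / m)
        if q > 0:
            divergence += 0.5 * q * log2(q / m)
    return divergence


if __name__ == "__main__":
    pattern = "1101"  # example pattern: third position is ignored
    a = spaced_word_counts("ACGTACGTTGCA", pattern)
    b = spaced_word_counts("ACGAACGTTGGA", pattern)
    print(jensen_shannon(a, b))
```

With the pattern 101, the sequences ACA and AGA yield the same single spaced word AA, because the middle character sits at a don't-care position. A contiguous-word comparison would treat them as different words.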