In this paper, we propose two metrics for comparing DNA and protein sequences based on a Poisson model of word occurrences. Instead of comparing the frequencies of all fixed-length words in two sequences, we consider (1) the probability of 'generating' one sequence under the Poisson model estimated from the other, and (2) the differences in their expression levels of words. To illustrate our approach, we construct phylogenetic trees of 25 viruses, including SARS-CoVs. © 2008 Elsevier Inc.
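The first idea, scoring one sequence under a Poisson word model estimated from the other, can be illustrated with a minimal sketch. The abstract does not give the authors' formulas, so everything here is an assumption made for illustration: the function names (`word_counts`, `poisson_log_likelihood`, `poisson_dissimilarity`), the independent-Poisson treatment of each word count, the pseudocount smoothing, and the symmetrized per-position score are hypothetical choices, not the paper's metric.

```python
import math
from collections import Counter
from itertools import product


def word_counts(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def poisson_log_likelihood(source, target, k=2, alphabet="ACGT", pseudocount=0.5):
    """Log-probability of target's word counts under a Poisson model
    whose rates are estimated from source.

    Assumption: each word count is treated as an independent Poisson
    variable; the pseudocount keeps unseen words from having rate zero.
    """
    src = word_counts(source, k)
    tgt = word_counts(target, k)
    n_src = max(len(source) - k + 1, 1)  # word positions in source
    n_tgt = max(len(target) - k + 1, 1)  # word positions in target
    n_words = len(alphabet) ** k
    total = 0.0
    for word in map("".join, product(alphabet, repeat=k)):
        # Smoothed per-position frequency in source, scaled to target length.
        rate = (src[word] + pseudocount) / (n_src + pseudocount * n_words) * n_tgt
        x = tgt[word]
        # log of the Poisson pmf: x*log(rate) - rate - log(x!)
        total += x * math.log(rate) - rate - math.lgamma(x + 1)
    return total


def poisson_dissimilarity(a, b, k=2, alphabet="ACGT"):
    """Symmetrized negative log-likelihood per word position (illustrative)."""
    n_a = max(len(a) - k + 1, 1)
    n_b = max(len(b) - k + 1, 1)
    ll_ab = poisson_log_likelihood(a, b, k, alphabet) / n_b
    ll_ba = poisson_log_likelihood(b, a, k, alphabet) / n_a
    return -(ll_ab + ll_ba) / 2.0
```

A pairwise matrix of such scores over a set of genomes could then be passed to a standard distance-based tree-building method; which method the authors used is not stated in the abstract.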
keywords
year 2009
issn 00255564
volume 217
number 2
page 159-166
citedbycount 7