Statistics based on the length of the longest run of a given pattern have been widely used in gene sequencing (Mott et al., 1990). The distributions of these statistics are closely related to the results of Erdős and others (1970-1975) on the binary case. Fu and Koutras (1994) developed a Markov chain imbedding method, which not only greatly simplifies the computation but also provides a key to problems in more general cases.
Because of the nature of gene sequences, a long matching run that allows minor errors is usually preferred. Karlin et al. (1990) developed a comprehensive class of asymptotic distributions for statistics of this kind. In this work, we extend the Fu and Koutras method to obtain the exact distribution of the length of a non-perfect run, and apply it to assess the DNA similarity among species for the construction of phylogenetic trees.
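
As a rough illustration of the Markov chain imbedding idea in its simplest setting, the following Python sketch computes the exact distribution of the longest perfect run of successes in n independent binary trials. It is not the extension to non-perfect runs described above. The function names and the assumption of i.i.d. trials with a fixed success probability p are illustrative choices, not taken from the cited work.

```python
def longest_run_cdf(n, p, k):
    """Return P(L_n < k), where L_n is the longest success run
    in n i.i.d. Bernoulli(p) trials (illustrative assumption)."""
    if k <= 0:
        return 0.0
    q = 1.0 - p
    # state[j]: probability that the current trailing run has length j (j < k)
    # and no run of length k has occurred so far.
    state = [0.0] * k
    state[0] = 1.0
    for _ in range(n):
        new = [0.0] * k
        new[0] = q * sum(state)        # a failure resets the trailing run
        for j in range(k - 1):
            new[j + 1] = p * state[j]  # a success extends the trailing run
        # The mass p * state[k - 1] reaches the absorbing state
        # (a run of length k has occurred) and is dropped here.
        state = new
    return sum(state)


def longest_run_pmf(n, p):
    """Return [P(L_n = m) for m = 0..n] as differences of the CDF."""
    return [
        longest_run_cdf(n, p, m + 1) - longest_run_cdf(n, p, m)
        for m in range(n + 1)
    ]


if __name__ == "__main__":
    print(longest_run_pmf(3, 0.5))
```

For n = 3 and p = 0.5 the probabilities of a longest run of length 0, 1, 2, and 3 are 1/8, 1/2, 1/4, and 1/8, which can be checked by enumerating the eight possible sequences. Tracking only the trailing run length keeps the state space at k transient states; a tolerance for mismatches would need a larger state space.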