A dependent rates model and MCMC based methodology for the Maximum Likelihood analysis of sequences with overlapping reading frames

by Anne-Mette Krabbe Pedersen and Jens Ledet Jensen
Research Reports Number 418 (December 2000)
We present a model and methodology for the maximum likelihood analysis of pairwise alignments of DNA sequences in which two genes are encoded in overlapping reading frames. In the model for the substitution process, the instantaneous rates of substitution are allowed to depend upon the nucleotides occupying the sites in a neighborhood of the site subject to substitution, at the instant of the substitution. By defining the neighborhood of a site to extend over all sites in the codons, in both reading frames, to which the site belongs, constraints imposed by the genetic code in both reading frames can be taken into account. Because the instantaneous rates of substitution depend on the states at neighboring sites, the transition probability between sequences does not factorize over sites and therefore cannot be obtained directly. We present a Markov chain Monte Carlo procedure for obtaining the ratio of two transition probabilities between two sequences under the model considered, and describe how maximum likelihood parameter estimation and likelihood ratio tests can be performed using the procedure. We describe how the expected numbers of different types of substitutions in the shared history of two sequences can be calculated, and use the described model and methodology in an analysis of a pairwise alignment of two Hepatitis B sequences in which two genes are encoded in overlapping reading frames. Finally, we present an extended model together with a simpler approximate estimation procedure, and use this to test the adequacy of the former model.
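As an illustration of the neighborhood definition only, the following minimal Python sketch collects the sites of the codons, in two reading frames, that contain a given site. The 0-based site indexing, the representation of each reading frame by an offset in {0, 1, 2}, and the function names `codon_sites` and `neighborhood` are assumptions made for this example and are not taken from the report.

```python
def codon_sites(site, offset):
    """Return the three sites of the codon containing `site`.

    `offset` (0, 1 or 2) is the assumed position of the first codon
    start in this reading frame; sites are 0-based.
    """
    start = site - (site - offset) % 3
    return range(start, start + 3)


def neighborhood(site, offset_a, offset_b, length):
    """Union of the codons containing `site` in both reading frames.

    Sites falling outside the sequence (0 <= s < length) are dropped,
    which happens near the ends where a codon is incomplete.
    """
    sites = set(codon_sites(site, offset_a)) | set(codon_sites(site, offset_b))
    return sorted(s for s in sites if 0 <= s < length)


if __name__ == "__main__":
    # Two frames shifted by one nucleotide, sequence of 12 sites.
    for site in (3, 4, 5):
        print(site, neighborhood(site, 0, 1, 12))
```

With frames shifted by one nucleotide, a site's neighborhood in this sketch spans four or five sites depending on its codon position, which is why the substitution rate at one site cannot be treated independently of its neighbors.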
Format available: PDF (378 KB)