This method relies heavily on the ordering of nucleotides in the sequence. As species diverge over time, however, genomic rearrangements, and genetic shuffling in particular, make sequence alignment unreliable or impossible. Graphical techniques are another powerful tool for the analysis and visualization of DNA sequences. Graphical approaches can provide intuitive pictures and useful insights that assist the analysis of complicated relations between DNA sequences. This methodology starts with a graphical representation of a DNA sequence, which may be based on 2D, 3D, 4D, 5D, or 6D spaces. The DNA is then represented by matrices associated with selected geometrical objects, and vectors composed of invariants of these matrices are used to compare DNA sequences; see [1–10].
Such schemes have the advantage that they offer an instant, though visual and qualitative, summary of lengthy DNA sequences. This approach also leaves many questions unresolved. For example, how does one obtain suitable matrices to characterize DNA sequences, and how does one select invariants suitable for sequence comparison? In many cases, the calculation of the matrices or the invariants becomes increasingly difficult as the sequence length grows. There are also approaches that arrive at a mathematical representation of DNA sequences by nongraphical means; see [11–13]. More recently, a representation based on symbolic dynamics [14] and a representation based on a digital signal method [15] have also been proposed.
In this contribution, we introduce a novel nongraphical and nonalignment approach for DNA sequence comparison. We use the DNA sequence directly by considering the frequencies of dinucleotides. We represent each DNA sequence by a dinucleotide frequency matrix or by a dinucleotide frequency vector, and define a distance measure on each of these two representations. Comparisons between DNA sequences can then be carried out by calculating the distances between these mathematical descriptors. The most important feature of this method is that the descriptors take into account the frequencies not only of adjacent XY pairs but also of nonadjacent XY pairs. In this way, information contained in the relative spacing of nucleotides is preserved. The method is simple and fast, and it requires neither sequence alignment nor a graphical representation of the sequence, both of which would involve complex calculations. It can be used to analyze both short and long DNA sequences. As an application, the method is tested on the exon-1 coding sequences of β-globin for 11 species, and the results are consistent with those reported previously [5, 9, 12, 14, 15], which supports the utility of this new method.
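The idea of counting both adjacent and nonadjacent XY pairs can be illustrated with a minimal Python sketch. The exact definitions of the frequency matrix and the distance measures are not stated in this section, so everything below is an assumption made for illustration: the function names, the `max_gap` parameter, the 16 × `max_gap` layout (one row per XY pair, one column per separation d), the per-column normalization, and the Euclidean distance are hypothetical choices, not necessarily those of the method itself.

```python
from itertools import product
import math

NUCLEOTIDES = "ACGT"
# The 16 ordered XY pairs: AA, AC, AG, AT, CA, ..., TT.
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def dinucleotide_frequency_matrix(seq, max_gap=3):
    """Return a 16 x max_gap matrix of XY pair frequencies (illustrative only).

    Column d-1 holds, for each pair XY, the fraction of positions i with
    seq[i] == X and seq[i + d] == Y. d = 1 gives adjacent pairs; d > 1
    gives nonadjacent pairs. This layout is an assumption of this sketch.
    """
    seq = seq.upper()
    row_of = {pair: r for r, pair in enumerate(DINUCLEOTIDES)}
    matrix = [[0.0] * max_gap for _ in DINUCLEOTIDES]
    for d in range(1, max_gap + 1):
        total = 0
        for i in range(len(seq) - d):
            pair = seq[i] + seq[i + d]
            if pair in row_of:  # skip ambiguous symbols such as N
                matrix[row_of[pair]][d - 1] += 1
                total += 1
        if total:  # normalize counts in this column to frequencies
            for r in range(len(DINUCLEOTIDES)):
                matrix[r][d - 1] /= total
    return matrix


def matrix_distance(a, b):
    """Euclidean distance between two frequency matrices (assumed measure)."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


# Toy strings for illustration only; these are not real beta-globin sequences.
m1 = dinucleotide_frequency_matrix("ATGGTGCACCTGACT")
m2 = dinucleotide_frequency_matrix("ATGGTGCATCTGTCC")
print(matrix_distance(m1, m2))
```

Each sequence is scanned once per separation d, so the cost grows linearly with sequence length for a fixed `max_gap`, and no alignment step is involved. A vector descriptor could be obtained under the same assumptions by flattening the matrix row by row.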