This is the last part of my little experiment on open access sharing via blog. The data and applications presented here were part of a draft publication that needed a bit more work before it could be submitted, but there was no time to finish it. On the other hand, I thought it should be shared with a larger audience instead of being hidden on my hard disk. So, here it is - the last third of a manuscript that was never fully completed or submitted. I rearranged and edited a few sections, but that's all.
and now ... genetic variation
So far, the indicator vector method has been applied only to COI barcode
data. Its utility for other genes, or for comparing the diversity patterns of
several genes across a set of organisms, has not been tested yet.
Here I show two comparisons of fish sequence data obtained from full mtDNA
sequences on GenBank. All analyses were done with different mtDNA sequences of
the same individuals of 486 species representing 80% of all extant fish families.
The figure above compares the full-length COI sequence, the
standard COI barcode region, and the mini-barcode region proposed for archival
sequences. It demonstrates the gradual loss of the discontinuities necessary
for separation into species as the region analyzed becomes shorter. However,
the differences between the full-length COI and the DNA barcoding region in
particular are not very pronounced, which confirms the utility of COI barcodes
in fishes that has been shown in many studies.
It has also been shown that several important attributes of complete
mitochondrial genomes can be predicted with high accuracy from the DNA barcode
sequences alone. These attributes include average nucleotide composition, patterns
of strand asymmetry, GC content, and the high frequency of codons that encode
hydrophobic amino acids. Therefore, DNA Barcodes, or other short sequences
sampled from a wide taxonomic range, can give a meaningful overview of
variations in genome composition long before complete genome sequences become
available for many of the sampled taxa.
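
As a rough illustration of the composition attributes listed above, here is a minimal Python sketch that computes base frequencies, GC content, and strand skew for a single sequence. The function name `composition_summary` and the skew definitions (G-C)/(G+C) and (A-T)/(A+T) are my assumptions for this example. They are common conventions, not necessarily the measures used in the studies referred to.

```python
from collections import Counter


def composition_summary(seq):
    """Return base frequencies, GC content, and GC/AT skew for one strand.

    Hypothetical helper for illustration only; gaps and ambiguity codes
    are ignored rather than modelled.
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")

    freqs = {b: counts[b] / total for b in "ACGT"}
    a, c, g, t = (counts[b] for b in "ACGT")

    # Skew is left at 0.0 when the relevant pair of bases is absent.
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    at_skew = (a - t) / (a + t) if (a + t) else 0.0

    return {
        "freqs": freqs,
        "gc_content": freqs["G"] + freqs["C"],
        "gc_skew": gc_skew,
        "at_skew": at_skew,
    }
```

Running this on a barcode fragment and on the complete mitochondrial genome of the same species gives two summaries that can be compared directly.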
In an attempt to confirm these findings across a wide range of fish
species and to further test the capabilities of the indicator vector method, I
conducted a parallel analysis of six representative mtDNA genes (ATPase 6,
Cytochrome b, Cytochrome Oxidase I, II, III, NADH dehydrogenase I), imposing an
identical order of sequences on all data subsets. The sequences were ordered
according to the topology of a Maximum Likelihood tree generated in RAxML from
a concatenated dataset of all mtDNA sequences obtained. A partitioned maximum
likelihood analysis was performed with the GTRMIX option. The resulting
topology was used to re-order all single-gene data sets.
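
The re-ordering step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the tree is available as a Newick string with unquoted taxon names, and each single-gene data set is a dictionary mapping taxon name to sequence. The function names `newick_leaf_order` and `reorder_by_tree` are made up for this example and are not part of RAxML or any other package.

```python
import re


def newick_leaf_order(newick):
    """Return leaf names in the order they appear in a Newick string.

    Assumes unquoted names without parentheses, commas, colons, or spaces.
    Leaves always follow '(' or ',', whereas internal labels follow ')'
    and branch lengths follow ':', so neither of those is picked up.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


def reorder_by_tree(sequences, leaf_order):
    """Return (name, sequence) pairs sorted to match the tree's leaf order."""
    missing = [name for name in leaf_order if name not in sequences]
    if missing:
        raise KeyError(f"taxa missing from data set: {missing}")
    return [(name, sequences[name]) for name in leaf_order]


# Toy example with made-up taxon labels and sequences
tree = "((sp_A:0.1,sp_B:0.2)90:0.3,(sp_C:0.1,sp_D:0.4):0.2);"
coi = {"sp_D": "ACGT", "sp_A": "ACGA", "sp_C": "ACGG", "sp_B": "ACGC"}
ordered_coi = reorder_by_tree(coi, newick_leaf_order(tree))
```

Applying the same leaf order to every gene keeps each row and column of the resulting diagrams tied to the same individual, which is what makes the diagrams directly comparable.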
All Klee diagrams retrieved were strikingly similar in appearance,
indicating similar signals from all datasets.
Blocks of high correlation on the diagonal, which reflect affinity
among species, are visible in all cases. COI and Cytb produced relatively smooth
mapping, with maximum correlation among neighboring species and decorrelation
among more distant species. Given the broad sampling across all fishes, the
latter occurs much more frequently.
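
To make the block-diagonal pattern described above more tangible, here is a simplified Python sketch of a pairwise correlation matrix for aligned sequences. It is a stand-in and not the indicator vector method itself: each sequence is one-hot encoded, the mean vector is subtracted, and the normalized dot products are returned. The names `one_hot` and `correlation_matrix` are assumptions for this example.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode an aligned sequence as a flat 0/1 vector, four entries per site.

    Gaps and ambiguity codes become four zeros.
    """
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def correlation_matrix(seqs):
    """Return the matrix of correlations between mean-centered sequence vectors."""
    vectors = [one_hot(s) for s in seqs]
    n, length = len(vectors), len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("sequences must be aligned to the same length")

    # Subtract the mean vector so sites shared by all sequences carry no weight.
    mean = [sum(v[i] for v in vectors) / n for i in range(length)]
    centered = [[x - m for x, m in zip(v, mean)] for v in vectors]
    norms = [math.sqrt(sum(x * x for x in v)) for v in centered]

    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            denom = norms[i] * norms[j]
            row.append(dot / denom if denom else 0.0)
        matrix.append(row)
    return matrix
```

If the sequences are supplied in tree order, closely related neighbors yield high values next to the diagonal while distant taxa decorrelate. Plotting the matrix as a heat map then gives a picture in the spirit of a Klee diagram.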
There is increasing evidence that within the nuclear genome,
selection works at a fine scale, gene by gene, rather than on a genome-wide basis.
Because the mitochondrial genome is inherited as a single molecule, mutational
biases or selective events would likely act on it as a whole, providing a basis
for the overall similarity of the false-color maps of all mtDNA coding
genes used. This means that any subset of the mitochondrial genome could be used as
a sentinel sequence that provides rapid insights into nucleotide usage and
composition.