Monday, February 18, 2013

Marine Klee-diagrams (3)


This is the last part of my little experiment on open access sharing via blog. The data and applications presented were part of a draft publication that needed a bit more in order to be submitted but there was no time to finish it. On the other hand I thought it should be shared with a larger audience instead of hiding it on my hard disk. So, here it is - the last third of a manuscript that was never fully completed nor submitted. I rearranged and edited a few sections but that's all.

and now ... genetic variation


The indicator vector method has so far only been applied to COI barcode data. Its utility for other genes, or for comparing the diversity patterns of several genes across one set of organisms, hasn't been tested yet. Here I show two comparisons of fish sequence data obtained from full mtDNA sequences on GenBank. All analyses were done with different mtDNA sequences from the same individuals of 486 species representing 80% of all extant fish families.

Klee diagrams for 486 fish species using indicator vectors for full length COI, COI barcode region, and COI mini-barcodes (Meusnier et al. 2008). Data were retrieved from full mtDNA genomes at the NCBI Genome database.



The figure above compares the full length COI sequence, the standard COI barcode region, and the mini-barcode region proposed for archival sequences. It demonstrates the gradual loss of the discontinuities necessary for separation into species as the sequence gets shorter. However, the differences between the full length COI and the DNA barcoding region in particular are not very pronounced, which confirms the utility of COI barcodes in fishes that has been shown in so many studies.

It has also been shown that several important attributes of complete mitochondrial genomes can be predicted with high accuracy from the DNA barcode sequences alone. These attributes include average nucleotide composition, patterns of strand asymmetry, GC content, and the high frequency of codons that encode hydrophobic amino acids. Therefore, DNA Barcodes, or other short sequences sampled from a wide taxonomic range, can give a meaningful overview of variations in genome composition long before complete genome sequences become available for many of the sampled taxa.
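To make these attributes concrete, here is a minimal sketch of how nucleotide composition, GC content, and two simple skew measures could be computed for a single sequence. The function name `composition` and the use of GC and AT skew as stand-ins for "strand asymmetry" are my own assumptions for illustration; this is not the code used for the published predictions.

```python
from collections import Counter


def composition(seq):
    """Return base frequencies, GC content and GC/AT skew for one DNA sequence.

    Hypothetical helper for illustration only. Ambiguous bases (N, R, Y, ...)
    and gaps are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")

    freqs = {base: counts[base] / total for base in "ACGT"}
    gc_count = counts["G"] + counts["C"]
    at_count = counts["A"] + counts["T"]

    # Skew is defined as (G - C) / (G + C) and (A - T) / (A + T);
    # it is set to 0.0 when the denominator would be zero.
    gc_skew = (counts["G"] - counts["C"]) / gc_count if gc_count else 0.0
    at_skew = (counts["A"] - counts["T"]) / at_count if at_count else 0.0

    return {
        "freqs": freqs,
        "gc_content": gc_count / total,
        "gc_skew": gc_skew,
        "at_skew": at_skew,
    }


if __name__ == "__main__":
    # Toy sequence, not real barcode data.
    print(composition("ACGTGGCC"))
```

Running the same function on a barcode fragment and on the full genome of the same individual would give two sets of values that can be compared directly.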

In an attempt to confirm these findings across a wide range of fish species and to further test the capabilities of the indicator vector method, I conducted a parallel analysis of 6 representative mtDNA genes (ATPase 6, Cytochrome b, Cytochrome Oxidase I, II, III, NADH dehydrogenase I), imposing an identical order of sequences on all data subsets. The sequences were ordered according to the topology of a Maximum Likelihood tree generated in RAxML from a concatenated dataset of all mtDNA sequences obtained. A partitioned maximum likelihood analysis was performed with the GTRMIX option. The resulting topology was used to re-order all single gene data sets.
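The re-ordering step can be sketched roughly as follows, assuming the tree is available as a Newick string and each single gene data set is a dictionary mapping species name to sequence. The helper names `leaf_order` and `reorder` are hypothetical, and the Newick handling is deliberately simplified (no quoted labels, no comments); a proper tree library would be the safer choice for real data.

```python
import re


def leaf_order(newick):
    """Return leaf names in the order they appear in a Newick string.

    Simplified assumption: a leaf is any label that directly follows "(" or
    ",". Labels after ")" (internal node names, support values) are skipped.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def reorder(seqs, order):
    """Return (name, sequence) pairs of one gene data set in tree order."""
    missing = [name for name in order if name not in seqs]
    if missing:
        raise KeyError(f"no sequence for: {', '.join(missing)}")
    return [(name, seqs[name]) for name in order]


if __name__ == "__main__":
    # Toy tree and toy sequences, not the real 486-species data.
    tree = "((sp_b:0.1,sp_a:0.2):0.3,sp_c:0.5);"
    coi = {"sp_a": "ACGT", "sp_b": "ACGA", "sp_c": "TTGT"}
    for name, seq in reorder(coi, leaf_order(tree)):
        print(name, seq)
```

Applying the same `order` list to every gene data set guarantees that row and column positions refer to the same species in all diagrams.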

Klee diagrams for 486 fish species using indicator vectors for Cytochrome Oxidase I (COI), II (COII), III (COIII), Cytochrome b (Cytb), NADH dehydrogenase 1 (ND1), and ATPase 6. Data were retrieved from full mtDNA genomes at the NCBI Genome database.
 

All Klee diagrams were strikingly similar in appearance, indicating similar signals from all datasets. Blocks of high correlation on the diagonal, which reflect affinity among species, are visible in all cases. COI and Cytb produced relatively smooth mapping, with maximum correlation among neighboring species and decorrelation among more distant species. Given the broad sampling across all fishes, the latter occurs much more frequently.
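For readers who want to see where the diagonal blocks come from, here is a minimal sketch of the matrix behind such a false-color map. It assumes each species is already represented by a numeric vector (how the indicator vectors themselves are built is not covered in this post) and it uses Pearson correlation as the similarity measure, which is my assumption; the actual method may use a different measure. The names `pearson` and `correlation_matrix` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / (sd_x * sd_y)


def correlation_matrix(vectors):
    """Square matrix of pairwise correlations, rows kept in input order."""
    return [[pearson(u, v) for v in vectors] for u in vectors]


if __name__ == "__main__":
    # Toy vectors in tree order: the first two are "close", the third is not.
    vectors = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
    for row in correlation_matrix(vectors):
        print(["%.2f" % value for value in row])
```

When the rows follow the tree order, closely related species sit next to each other, so high values cluster in blocks along the diagonal. The matrix can then be passed to any heat map plotting function.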

There is increasing evidence that within the nuclear genome, selection works at a fine scale, gene by gene, rather than on a genome-wide basis. Because the mitochondrial genome is inherited as a single molecule, mutational biases or selective events would likely act on it as a whole, providing a basis for the overall similarity of the false-color maps of all mtDNA coding genes used here. This means that any subset of the mitochondrial genome could be used as a sentinel sequence that provides rapid insights into nucleotide usage and composition.


