Paul Klee - Ancient Sound |
However, the larger datasets become,
the more difficult it is to provide a panoramic view of the entire
dataset. Indicator vectors of sequences, visualized in so-called Klee-diagrams,
have the potential to overcome this limitation of tree methods. I would like to show a few examples here on the blog to demonstrate the power of Klee-diagrams for displaying larger DNA Barcode datasets.
For my Klee-diagrams I utilized a mathematical approach to the comparative analysis of nucleotide sequences using digital transformation in vector space. Essentially, DNA data are transformed into vectors. A distinguishing vector that is indicative of a specific group of organisms can be calculated from the transformed DNA sequence information (the number n of members in such a group can be defined). These so-called indicator vectors can be constructed for different taxonomic levels or other interesting groupings. All this is
implemented in a MATLAB routine available at Mark Stoeckle's barcode site.
Matrices of correlations among the indicator vectors can be displayed as
false-color maps (Klee-diagrams) using the MATLAB graph functions. Note that the input could also be any other set of values, e.g. a distance matrix.
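To make the idea of vectors and correlation matrices more concrete, here is a minimal Python sketch. It is not the MATLAB routine mentioned above: the one-hot encoding, the function names, and the toy sequences are all assumptions made for illustration, and a plain group average is swapped in as a simplified stand-in for a true indicator vector.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode an aligned DNA sequence as a flat vector with 4 slots per site."""
    vec = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:  # gaps and ambiguity codes are left as all zeros
            vec[i, j] = 1.0
    return vec.ravel()

def group_vectors(seqs, labels):
    """Average the encoded sequences of each group.

    Assumption: the group mean is only a simplified stand-in for an
    indicator vector, not the published calculation.
    """
    groups = list(dict.fromkeys(labels))  # unique labels, first-seen order
    encoded = np.array([one_hot(s) for s in seqs])
    labels = np.array(labels)
    means = np.array([encoded[labels == g].mean(axis=0) for g in groups])
    return groups, means

# Toy aligned sequences (made up), two members per group
seqs = ["ACGTAC", "ACGTAT", "TTGCAA", "TTGCAG", "GGATCC", "GGATCA"]
labels = ["sp1", "sp1", "sp2", "sp2", "sp3", "sp3"]

groups, vectors = group_vectors(seqs, labels)
corr = np.corrcoef(vectors)  # group-by-group correlation matrix
```

The matrix `corr` is what would be drawn as a false-color map, for example with `matplotlib.pyplot.imshow(corr)`. A distance matrix of the same shape could be passed to the plotting call instead.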
While the order of sequences in an alignment does not affect the actual
calculations, for the resulting Klee diagram it is useful to
arrange the sequences to approximate evolutionary relationships. Therefore, I organized the data based on the topology of Neighbor Joining trees (here constructed with MEGA 5). The re-ordering of my alignments was conducted
with a customized Tree Parser routine.
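The re-ordering step can be sketched roughly as follows. This is not the Tree Parser routine itself, just a small Python illustration under simple assumptions: the tree is available as a Newick string with unquoted leaf names, and the tree string, names, and sequences below are invented.

```python
import re

def leaf_order(newick):
    """Return leaf names in the order they appear in a Newick string.

    Assumption: leaf names are unquoted. A leaf is any label that directly
    follows '(' or ','; internal node labels follow ')' and are skipped.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)

def reorder_alignment(alignment, newick):
    """Return (name, sequence) pairs sorted by the tree's leaf order."""
    return [(name, alignment[name])
            for name in leaf_order(newick) if name in alignment]

# Hypothetical neighbor-joining tree and alignment
tree = "((sp_A:0.1,sp_B:0.2):0.3,(sp_C:0.1,sp_D:0.4):0.2);"
alignment = {"sp_D": "ACGA", "sp_A": "ACGT", "sp_C": "ACGG", "sp_B": "ACGC"}

ordered = reorder_alignment(alignment, tree)
```

Because only the row and column order changes, the correlations themselves stay the same; related sequences simply end up next to each other in the diagram.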
Over the last couple of years I have dedicated a large amount of my time to marine DNA Barcoding, so it seems logical to use some of the data we collected over four years to showcase the Klee-diagrams. By the way, all figures are hyperlinked (via their captions) and available on figshare.
Klee-Diagram depicting indicator vectors (n=3) for 5000 marine species DNA Barcodes representing 10 phyla. |
The figure shows a Klee-diagram that was constructed using marine COI barcodes
publicly available on BOLD. Blocks of high
correlation on the diagonal reflect affinities within groups of species,
corresponding to taxonomic divisions.
Major marine groups are clearly separated in the diagram. While COI
usually fails to resolve intermediate taxonomy, it performs surprisingly well at
resolving the major marine phyla in this dataset. Rapidly evolving sites appear
saturated while more constrained sites are sufficiently variable to be
phylogenetically useful. Thus, it could be argued that the level of divergence
of genetic relationships examined here is for the most part located in windows
in which rapidly evolving sites are too saturated and slowly evolving sites are
variable enough to provide phylogenetic signal on two levels.
Klee diagrams utilizing only one gene fragment cannot replace in-depth
multi-gene phylogenetic analysis, but it is conceivable that heat-map-based
visualization can overcome the inadequacies of large-scale trees.
Topologies generated through complex multi-gene algorithms could be translated
into such diagrams as well.
An advantage of the
method is its scalability. The figure below depicts a comparison
of two groups of marine invertebrates – echinoderms and polychaetes – based on
the COI gene. DNA Barcoding has been proven to be an effective, accurate and
useful method of species diagnosis for all five classes of Echinodermata. In addition, our Klee diagram reveals
discontinuities corresponding to higher-level taxonomic divisions (left diagram). Furthermore,
some areas of high correlation are indicative of species groups that exhibit
low barcode divergence due to rather recent speciation events.
Klee diagrams for 2 groups of marine invertebrates. a. Indicator vector correlation (n=2) for 560 echinoderm species. b. Indicator vectors (n=2) for 375 polychaete species. |
Indeed, some of the crinoid species (in the left diagram, upper left block)
have recently been identified as Antarctica’s first example of a marine invertebrate species flock. Similar
species complexes are discussed for the asteroid genus Henricia, which is also shown in the diagram
(left diagram, position 170-195).
The diagram on the right shows that COI is not able to resolve the major groups
within the Polychaeta. Traditionally, 18S rRNA has been used to provide
phylogenies that resolve the divisions within the polychaetes. Many species thought to have broad distributions turned out to be complexes of allied species, which often reflects the
limitations of conventional taxonomy rather than actual cosmopolitanism. Also, polychaetes
in general are thought to be paraphyletic and the lack of distinctness in the
diagram might reflect an overall unresolved taxonomy. However, it needs to be
pointed out that the method used to calculate the indicator vectors is based on
the rather arbitrary grouping by species identifications which could mask true
diversity patterns in some cases.
...to be continued
Starting a Wikipedia article on these ideas - would be nice if you could weigh in at https://en.wikipedia.org/wiki/Klee_diagram