Modern biodiversity detectives have found new ways to synthesize massive amounts of sequence data into clear information and insights. Two powerful tools to help visualize and understand the structure of life are DNA barcodes and Klee diagrams. Mark Stoeckle and David Thaler pioneered the use and explanation of these tools to offer insights into how species originated and evolved.
What is a DNA Barcode?
A DNA barcode is a short, standardized segment of the genome used for species identification. In the animal kingdom, the gold standard is a 648-base pair (bp) segment of the mitochondrial cytochrome c oxidase subunit I (COI) gene. While this segment represents less than one-millionth of an organism’s total genome, it has proven remarkably effective because mitochondrial DNA clusters largely overlap with species as defined by experts.
DNA barcodes are commonly applied to environmental DNA (eDNA) samples to identify which species are present. BOLD (the Barcode of Life Data System) now contains approximately five million of these barcodes, covering about 100,000 animal species. Interestingly, there is nothing inherently “special” about the COI gene biologically; it became the standard because reliable primers were adopted by a critical mass of the scientific community.
Visualizing Life: The Klee Diagram
To make sense of these millions of sequences, scientists developed the Klee diagram, a heat map that displays correlations between DNA sequences. In these diagrams, every sequence is compared with every other sequence, and the intersections are color-coded to show similarity. (Sirovich, Lawrence, Mark Y. Stoeckle, and Yu Zhang. “Structural analysis of biodiversity.” PLoS One 5.2 (2010))

Key features of Klee diagrams include:
• Indicator Vectors: Each DNA sample is listed on both the x and y axes, and the heat map compares each sample with itself (red = 1, a perfect match) and with every other sample in the database.
• Species Islands: When sequences are arrayed, species appear as sharp, non-overlapping squares. This visualization confirms that species are “islands in sequence space,” with distinct clusters and empty gaps between them.
• Scalability: Recent software developments like PyKleeBarcode allow these diagrams to be computed for very large datasets, potentially representing the whole animal kingdom in a single information space.
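
The all-against-all comparison behind a Klee-style heat map can be sketched in a few lines of Python. This is a simplified illustration, not the indicator-vector construction from Sirovich et al. and not the PyKleeBarcode API: it assumes the sequences are already aligned, contain only A, C, G, and T, and it substitutes a plain one-hot encoding with cosine similarity for the published method. The function names and toy sequences are hypothetical.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode an aligned DNA sequence as a flat 0/1 vector (4 slots per site)."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def similarity_matrix(seqs):
    """Cosine similarity of every encoded sequence with every other one."""
    vectors = [one_hot(s) for s in seqs]
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            matrix[i][j] = dot / (norms[i] * norms[j])
    return matrix

# Toy data (made up): two sequences from each of two hypothetical species.
toy_sequences = [
    "ACGTACGTAC", "ACGTACGTAT",   # species A
    "TTGCACGGAC", "TTGCACGGAT",   # species B
]
toy_matrix = similarity_matrix(toy_sequences)

# Print the matrix row by row; high values cluster in two diagonal blocks.
for row in toy_matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```

When the rows are ordered by species, the high-similarity values form square blocks along the diagonal, which is the "species islands" pattern described above. Coloring each cell by its value would turn the printed matrix into a heat map.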

Evolutionary Implications: Why Mitochondria Define Species
A long controversy in biology concerns whether species are “real” or just human constructs. Dobzhansky, in his 1937 book Genetics and the Origin of Species, claimed that “Biological classification [of species] is simultaneously a man-made system of pigeonholes devised for the pragmatic purpose of recording observations… and an acknowledgement of the fact of organic discontinuity.”
Stoeckle and Thaler, in their 2018 paper “Why should mitochondria define species?”, expand on the evolutionary meaning behind these barcode clusters. They argue that the patterns seen in DNA barcodes are central facts of animal life that evolutionary theory must explain.
1. The “Barcode Gap” and Low Intraspecific Variation: Across the animal kingdom, the average pairwise difference (APD) within species is typically very low, between 0.0% and 0.5%. Meanwhile, the distance between even the most closely related species is usually 2% or more. This “gap” exists because intermediates between clusters are absent or rare.
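
The two quantities in point 1 can be computed directly from aligned sequences. The sketch below is illustrative only: it assumes equal-length, pre-aligned sequences, measures distance as the simple fraction of differing sites (an uncorrected p-distance, which is an assumption about the metric), and uses made-up data built by a hypothetical `mutate` helper.

```python
from itertools import combinations

# Deterministic substitution used only to build toy data.
SUBSTITUTE = {"A": "C", "C": "G", "G": "T", "T": "A"}

def mutate(seq, positions):
    """Return a copy of seq with a base substitution at each given position."""
    chars = list(seq)
    for p in positions:
        chars[p] = SUBSTITUTE[chars[p]]
    return "".join(chars)

def p_distance(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def average_pairwise_difference(seqs):
    """Mean p-distance over all unordered pairs within one species."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def min_interspecific_distance(species_a, species_b):
    """Smallest p-distance between any sequence of A and any sequence of B."""
    return min(p_distance(a, b) for a in species_a for b in species_b)

# Toy data (made up): a 1000-site reference and a few variants of it.
reference = "ACGT" * 250
species_1 = [reference, mutate(reference, [10]), mutate(reference, [20, 30])]
species_2_ref = mutate(reference, range(100, 400, 10))  # 30 fixed differences
species_2 = [species_2_ref, mutate(species_2_ref, [500])]

apd_1 = average_pairwise_difference(species_1)
apd_2 = average_pairwise_difference(species_2)
gap = min_interspecific_distance(species_1, species_2)

print(f"APD species 1: {apd_1:.2%}")
print(f"APD species 2: {apd_2:.2%}")
print(f"Nearest interspecific distance: {gap:.2%}")
```

In this toy case both within-species values fall inside the 0.0% to 0.5% range while the nearest between-species distance sits above 2%, which is the shape of the barcode gap described above.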
2. The Neutrality of Synonymous Mutations: Most variation within and between these barcode clusters consists of synonymous substitutions, that is, mutations that change the DNA sequence but not the resulting protein.
Stoeckle and Thaler argue that these changes are selectively neutral in mitochondria. This is because animal mitochondria are simpler than the nuclear genome; they lack introns (and thus splicing) and only have 22 different tRNA types. This lack of complexity means synonymous codons are less likely to affect the “fitness” of the organism, allowing them to accumulate as a “molecular clock”.
However, the observed patterns of variation in DNA barcodes do not match the predictions of Kimura’s neutral theory of evolution, under which mutations accumulate at random.
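
To make the synonymous versus nonsynonymous distinction in point 2 concrete, the sketch below classifies codon differences between two aligned coding sequences. It is a minimal illustration under stated assumptions: it uses the vertebrate mitochondrial genetic code (NCBI translation table 2; invertebrate mitochondria use different tables), it assumes both sequences start in the correct reading frame, and it counts differing codons rather than individual substitutions. The function name and example sequences are hypothetical.

```python
BASES = "TCAG"

# Vertebrate mitochondrial code (NCBI translation table 2), with codons
# ordered TTT, TTC, TTA, TTG, TCT, ... ; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSS**"
    "VVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def classify_codon_differences(seq_a, seq_b):
    """Count differing codons as (synonymous, nonsynonymous).

    Assumes both sequences are aligned, in frame, and made of A/C/G/T only.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = 0
    nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # same amino acid, different DNA
        else:
            nonsynonymous += 1   # the protein sequence changes
    return synonymous, nonsynonymous

# Toy example (made up): CTT -> CTC keeps leucine; GGA -> GAA swaps
# glycine for glutamate.
example_counts = classify_codon_differences("CTTATAGGA", "CTCATAGAA")
print(example_counts)
```

Applied to real barcode pairs, a count dominated by the first number would correspond to the pattern described in point 2, where most differences leave the protein unchanged.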

3. A Recent Universal Expansion? To reconcile these observations, Stoeckle and Thaler use humans as a case study. Modern humans have an APD of 0.1%, which is about average for the animal kingdom.
Several lines of evidence suggest that human mitochondria originated from a state of uniformity approximately 100,000 to 200,000 years ago before expanding. Stoeckle and Thaler propose that the extant populations of humans, and almost all other animal species, arrived at a similar result due to a similar process of expansion from mitochondrial uniformity within the same recent geological timeframe.

This coincides with Mayr’s 1942 idea that bottlenecks followed by expansion could explain speciation:
“The reduced variability of small populations is not always due to accidental gene loss, but sometimes to the fact that the entire population was started by a single pair or by a single fertilized female. These “founders” of the population carried with them only a very small proportion of the variability of the parent population. This “founder” principle sometimes explains even the uniformity of rather large populations…”

Conclusion
DNA barcodes and Klee diagrams do more than just identify species; they reveal a kingdom-wide pattern of organic discontinuity. Whether through population bottlenecks, lineage sorting, or gene sweeps, the uniform low variance across species suggests that the “islands” of biodiversity we see today are the result of deep evolutionary currents that affect all animals—from humans to birds to insects—in a surprisingly similar way.
Stoeckle and Thaler conclude their 2018 paper by noting that “there is irony but also grandeur in this view that, precisely because they have no phenotype, synonymous codon variations in mitochondria reveal the structure of species and the mechanism of speciation.”
Annotated Bibliography
Sirovich, Lawrence, Mark Y. Stoeckle, and Yu Zhang. “Structural analysis of biodiversity.” PLoS One 5.2 (2010): e9266.
- lays out math and originally defines “Klee diagrams”. Some examples.
- short and sweet version for Nature. Butterfly and warbler Klee examples.
- first paper to note that the observed patterns in Klee diagrams, of homogeneous species, don’t match neutral theory. OK.
- short but good paper, cool data on humans, bonobos, and chimps, and comparison to results from their 2014 paper disproving neutral theory. Human/Chimp Klee example.
- deep dive analysis that builds on the 2014 observation that mitochondrial DNA barcodes don’t match the expectations of neutral theory (“Species are islands in sequence space.”), while at the same time appearing to be created by neutral (synonymous) sequence changes. This is explained by evolutionary mechanisms of speciation, which has implications for how recently most species became species. These results also help to resolve some of the disagreements about the definition of a species.
- methods paper

