2013 Annual Science Report

Massachusetts Institute of Technology Reporting  |  SEP 2012 – AUG 2013

Early Animals: The Genomic Origins of Morphological Complexity

Project Summary

Understanding the origins of life’s complexity here on Earth is paramount to finding it elsewhere in the universe. The fossil record indicates that complexity on Earth arose in a near geological moment, the famous Cambrian explosion, about 525 million years ago. However, molecular sequence analyses indicate that complex animals actually arose nearly 200 million years before they first appear in the fossil record (Erwin et al. 2011). This disparity between the advent of morphological complexity and its appearance in the fossil record motivates an interesting question: why can we not detect complex life here on Earth for nearly 200 million years? And if we cannot detect it on Earth, what hope would we have on a distant Earth-like planet? Our research addresses this question by seeking a better understanding of what encodes morphological complexity in the genome. Our work (Heimberg et al. 2008; Philippe et al. 2011; Tarver et al. 2013) suggests that a group of non-coding RNA genes, the microRNAs, might be instrumental for the advent and maintenance of complexity in animals. Sequencing the genomes and transcriptomes (the expressed component of the genome) of carefully chosen taxa might therefore allow us to better understand the biology of animals that predated the Cambrian explosion.

4 Institutions
3 Teams
5 Publications
0 Field Sites

Project Progress

To date, we have sequenced the genome and both the mRNA and miRNA transcriptomes of the chaetognath Parasagitta elegans. Chaetognaths are relatively complex animals and are the first predators to appear in the fossil record, during the Early Cambrian (Vannier et al. 2007). The genome assembly of Parasagitta consists of 866,422 scaffolds, the longest of which is 28,090 bp. The assembled genome is 1.2 Gb, with a calculated N50 of 750 bp. The mRNA transcriptome assembly is 44 Mb with an N50 of 1,738 bp. For the small RNA transcriptome, we analyzed 93,829 non-redundant sequences, all of which were expressed four or more times in our library.

Using both genomic and mRNA transcriptome sequence, we compiled a data set of the amino acid sequences of 186 protein-coding genes and aligned them to an existing phylogenomics data set (Philippe et al. 2011). Our phylogenetic analysis (Figure 1) strongly suggests that chaetognaths are basal to the two major protostome clades, the lophotrochozoans and the ecdysozoans, similar to what others have recently proposed (e.g., Marlétaz et al. 2008). This phylogenetic position indicates that the chaetognath lineage split from the main protostome lineage ~650 Ma, more than 100 million years before chaetognaths appear in the fossil record. Our analysis of the microRNAs (Tarver et al. 2013) is consistent with this phylogenetic position: the chaetognath P. elegans shares nine miRNAs with ecdysozoans and lophotrochozoans that are not present in deuterostomes or in more basal organisms such as sponges and jellyfish. However, the lophotrochozoans and ecdysozoans also possess three additional miRNAs, miR-36, miR-67 and miR-317, that are present neither in our small RNA library nor in the genome of the chaetognath (Figure 2). This suggests that these miRNAs evolved in the protostome lineage after the split between chaetognaths and the traditional protostomes.
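The N50 values quoted above summarize how fragmented an assembly is: N50 is the length of the scaffold at which the running total, taken from the longest scaffold downward, first reaches half of the total assembly length. The following is a minimal sketch of that calculation, not the software used for this project; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths in bp."""
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    half_total = sum(ordered) / 2             # half of the assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This scaffold is the one that carries the running total
            # past the halfway point of the assembly.
            return length
    raise ValueError("no scaffold lengths supplied")


# Hypothetical toy assembly (lengths in bp), not data from this project.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))
```

For the toy list the total is 290 bp, so the halfway point (145 bp) is reached by the second-longest scaffold. An N50 of 750 bp on a 1.2 Gb assembly likewise means that half of the assembled sequence sits in scaffolds of 750 bp or longer.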

Because the phylogenetic position of chaetognaths appears to be (finally!) firmly established, we can now properly evaluate the taxon’s relative microRNA repertoire. Our data indicate that the chaetognath has lost very few miRNAs: only miR-242 and miR-2001 appear to be secondarily lost. This is consistent with the observation that chaetognaths are relatively complex animals with no obvious signs of secondary simplification, and it highlights the disparity between the origin of morphological complexity and its manifestation in the fossil record.

Figure 1. Phylogenomic analysis of 186 orthologous protein-coding genes strongly suggests that chaetognaths are basal to the protostomes. One hundred and eighty-six genes (~21,000 amino acid positions) taken from the data set of Philippe et al. (2011), plus the orthologues found in our genome and transcriptome databases from the chaetognath Parasagitta elegans, were aligned using Gblocks and analyzed using the CAT model in PhyloBayes. This analysis strongly supports (posterior probabilities are given at key nodes) the monophyly of the protostomes (labeled “P”) and of the two major protostome sub-groups, the Lophotrochozoa (molluscs, annelids, flatworms and the like, labeled “L”) and the Ecdysozoa (arthropods and priapulids, labeled “E”). Note that the branch leading to the three chaetognath species analyzed falls below the protostome node (black box).

Figure 2. microRNAs support the phylogenomics result that chaetognaths are basal to the protostomes. Twenty-four metazoan taxa were scored for the presence/absence of 523 miRNA families. Four equally shortest trees were found using Dollo parsimony (PAUP* 4.0b10), with all characters given equal weight and using the branch-and-bound search algorithm. Note that chaetognaths fall in the same phylogenetic position as they did in the phylogenomics analysis (Fig. 1), basal to the protostomes. Further, like most metazoan taxa, but unlike secondarily simplified taxa such as acoel flatworms, the chaetognath Parasagitta elegans retains most of its ancestral miRNA repertoire, losing only miR-242 and miR-2001 (note that losses in other taxa are not shown).
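
Dollo parsimony, as used for the miRNA presence/absence matrix, allows each character (here, a miRNA family) to be gained only once on the tree but lost any number of times. The sketch below illustrates how the steps for one family can be counted on a fixed rooted tree. It is not PAUP*'s implementation and performs no tree search; the nested-tuple tree encoding, the function names and the taxon labels are all hypothetical.

```python
def leaf_set(node):
    """Collect the taxon names under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return {node}
    names = set()
    for child in node:
        names |= leaf_set(child)
    return names


def dollo_steps(tree, present):
    """Count Dollo steps (one gain plus all losses) for one binary character."""
    present = set(present)
    if not present:
        return 0  # family never observed: no gain and no loss

    # Place the single gain on the stem of the smallest clade that
    # contains every taxon carrying the family.
    node = tree
    while not isinstance(node, str):
        carriers = [child for child in node if leaf_set(child) & present]
        if len(carriers) != 1:
            break
        node = carriers[0]

    def count_losses(sub):
        if not (leaf_set(sub) & present):
            return 1  # a whole clade lacks the family: one loss on its stem
        if isinstance(sub, str):
            return 0  # a taxon that still carries the family
        return sum(count_losses(child) for child in sub)

    return 1 + count_losses(node)


# Hypothetical five-taxon rooted tree and hypothetical presence data.
toy_tree = ((("taxon_A", "taxon_B"), "taxon_C"), ("taxon_D", "taxon_E"))
print(dollo_steps(toy_tree, {"taxon_A", "taxon_C"}))
print(dollo_steps(toy_tree, {"taxon_A", "taxon_D"}))
```

In the first toy case the family is gained once in the ancestor of taxa A, B and C and lost once in taxon B. In the second it must be gained at the root and lost three times. Summing such counts over all families gives the length of a tree under this criterion, and a family inferred as gained in an ancestor but absent from a descendant corresponds to a secondary loss of the kind reported for miR-242 and miR-2001.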