ABSTRACT
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1-4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help enable a new era of discovery across the life sciences.
Subject(s)
Genome , Genomics/methods , Vertebrates/genetics , Animals , Birds , Gene Library , Genome Size , Mitochondrial Genome , Haplotypes , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Sequence Alignment , DNA Sequence Analysis , Sex Chromosomes/genetics
ABSTRACT
Lymphatic filariasis is caused by parasitic nematodes and is a leading cause of disability worldwide. Many filarial worms contain the bacterium Wolbachia as an obligate endosymbiont. RNA sequencing is a common technique used to study their molecular relationships and to identify potential drug targets against the nematode and bacteria. Ribosomal RNA (rRNA) is the most abundant RNA species, accounting for 80-90% of the RNA in a sample. To reduce sequencing costs, it is necessary to remove ribosomal reads through poly-A enrichment or ribosomal depletion. Bacterial RNA does not contain a poly-A tail, making it difficult to sequence both the nematode and Wolbachia from the same library preparation using standard poly-A selection. Ribosomal depletion can utilize species-specific oligonucleotide probes to remove rRNA through pull-down or degradation methods. While species-specific probes are commercially available for many commonly studied model organisms, there are currently limited depletion options for filarial parasites. Here, we performed total RNA sequencing from Brugia malayi containing the Wolbachia symbiont (wBm) and designed ssDNA depletion probes against their rRNA sequences. We compared the total RNA library to poly-A enriched, Terminator 5'-Phosphate-Dependent Exonuclease treated, NEBNext Human/Bacteria rRNA depleted and our custom nematode probe depleted libraries. The custom nematode depletion library had the lowest percentage of ribosomal reads across all methods, with a 300-fold decrease in rRNA when compared to the total RNA library. The nematode depletion libraries also contained the highest percentage of Wolbachia mRNA reads, resulting in a 16-1,000-fold increase in bacterial reads compared to the other enrichment and depletion methods. Finally, we found that the Brugia malayi depletion probes can remove rRNA from the filarial worm Dirofilaria immitis and the majority of rRNA from the more distantly related free-living nematode Caenorhabditis elegans. These custom filarial probes will allow for future dual RNA-seq experiments between nematodes and their bacterial symbionts from a single sequencing library.
ABSTRACT
Genomics can be used to study the complex relationships between hosts and their microbiota. Many bacteria cannot be cultured in the laboratory, making it difficult to obtain adequate amounts of bacterial DNA and to limit host DNA contamination for the construction of metagenome-assembled genomes (MAGs). For example, Wolbachia is a genus of exclusively obligate intracellular bacteria that live in a wide range of arthropods and some nematodes. While Wolbachia endosymbionts are frequently described as facultative reproductive parasites in arthropods, the bacteria are obligate mutualistic endosymbionts of filarial worms. Here, we achieve 50-fold enrichment of bacterial sequences using ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) with Brugia malayi nematodes, containing Wolbachia (wBm). ATAC-seq uses the Tn5 transposase to cut and attach Illumina sequencing adapters to accessible DNA lacking histones, typically thought to be open chromatin. Bacterial and mitochondrial DNA in the lysates are also cut preferentially since they lack histones, leading to the enrichment of these sequences. The benefits of this include minimal tissue input (<1 mg of tissue), a quick protocol (<4 h), low sequencing costs, less bias, correct assembly of lateral gene transfers and no prior sequence knowledge required. We assembled the wBm genome with as few as 1 million Illumina short paired-end reads with >97% coverage of the published genome, compared to only 12% coverage with the standard gDNA libraries. We found significant bacterial sequence enrichment that facilitated genome assembly in previously published ATAC-seq data sets from human cells infected with Mycobacterium tuberculosis and C. elegans contaminated with their food source, the OP50 strain of E. coli. These results demonstrate the feasibility and benefits of using ATAC-seq to easily obtain bacterial genomes to aid in symbiosis, infectious disease, and microbiome research.
ABSTRACT
The genetic profile of vertebrate pallia has long driven debate on homology across distantly related clades. Based on an expression profile of the orphan nuclear receptor NR4A2 in mouse and chicken brains, Puelles et al. (The Journal of Comparative Neurology, 2016, 524, 665-703) concluded that the avian lateral mesopallium is homologous to the mammalian claustrum, and the medial mesopallium homologous to the insula cortex. They argued that their findings contradict conclusions by Jarvis et al. (The Journal of Comparative Neurology, 2013, 521, 3614-3665) and Chen et al. (The Journal of Comparative Neurology, 2013, 521, 3666-3701) that the hyperpallium densocellare is instead a mesopallium cell population, and by Suzuki and Hirata (Frontiers in Neuroanatomy, 2014, 8, 783) that the avian mesopallium is homologous to mammalian cortical layers 2/3. Here, we find that NR4A2 is an activity-dependent gene and cannot be used to determine brain organization or species relationships without considering behavioral state. Activity-dependent NR4A2 expression has been previously demonstrated in the rodent brain, with the highest induction occurring within the claustrum, amygdala, deep and superficial cortical layers, and hippocampus. In the zebra finch, we find that NR4A2 is constitutively expressed in the arcopallium, but induced in parts of the mesopallium, and in sparse cells within the hyperpallium, depending on animal stimulus or behavioral state. Basal and induced NR4A2 expression patterns do not discount the previously named avian hyperpallium densocellare as dorsal mesopallium and conflict with proposed homology between the avian mesopallium and mammalian claustrum/insula at the exclusion of other brain regions. Broadly, these findings highlight the importance of controlling for behavioral state and neural activity to genetically define brain cell population relationships within and across species.