ABSTRACT
Large-scale population analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole-genome sequencing. However, these short-read approaches fail to give a complete picture of a genome. They struggle to identify structural events, cannot access repetitive regions, and fail to resolve the human genome into haplotypes. Here, we describe an approach that retains long-range information while maintaining the advantages of short reads. Starting from ~1 ng of high-molecular-weight DNA, we produce barcoded short-read libraries. Novel informatic approaches allow the barcoded short reads to be associated with their original long molecules, producing a new data type known as "Linked-Reads". This approach allows for simultaneous detection of small and large variants from a single library. In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. Linked-Reads allow mapping to 38 Mb of sequence not accessible to short reads, adding sequence in 423 difficult-to-sequence genes, including the disease-relevant genes STRC, SMN1, and SMN2. Both Linked-Read whole-genome and whole-exome sequencing identify complex structural variations, including balanced events and single-exon deletions and duplications. Further, Linked-Reads extend the region of high-confidence calls by 68.9 Mb. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.
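The abstract does not spell out how barcoded reads are linked back to their source molecules. As a rough illustration only, the sketch below assumes reads are already aligned and tagged with a barcode; reads sharing a barcode on the same chromosome are grouped and split into putative long molecules wherever neighboring reads lie more than a gap threshold apart. The `Read`/`Molecule` types, the `infer_molecules` function, and the 50 kb default `max_gap` are hypothetical choices, not the informatics described in the manuscript.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    barcode: str  # partition barcode attached to the short read (hypothetical field)
    chrom: str    # chromosome the read aligned to
    pos: int      # leftmost aligned position


class Molecule(NamedTuple):
    barcode: str
    chrom: str
    start: int
    end: int
    n_reads: int


def infer_molecules(reads: Iterable[Read], max_gap: int = 50_000) -> List[Molecule]:
    """Group reads by (barcode, chromosome), then split each group into putative
    long molecules wherever two neighboring reads are more than max_gap apart."""
    groups = defaultdict(list)
    for r in reads:
        groups[(r.barcode, r.chrom)].append(r.pos)

    molecules = []
    for (barcode, chrom), positions in sorted(groups.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for p in positions[1:]:
            if p - prev > max_gap:
                # gap too large: close the current molecule and start a new one at p
                molecules.append(Molecule(barcode, chrom, start, prev, count))
                start, count = p, 0
            count += 1
            prev = p
        molecules.append(Molecule(barcode, chrom, start, prev, count))
    return molecules
```

The gap rule rests on the assumption that each barcoded partition holds only a few long molecules, so distant reads sharing a barcode are more likely to come from different molecules than from one.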
Subjects
Genome-Wide Association Study/methods , Polymorphism, Genetic , Whole Genome Sequencing/methods , Cell Line , Genome, Human , Humans , Intercellular Signaling Peptides and Proteins , Membrane Proteins/genetics , Survival of Motor Neuron 1 Protein/genetics , Survival of Motor Neuron 2 Protein/genetics
ABSTRACT
During cell division, spindle fibers attach to chromosomes at centromeres. The DNA sequence at regional centromeres is fast evolving, with no conserved genetic signature for centromere identity. Instead, CENH3, a centromere-specific histone H3 variant, is the epigenetic signature that specifies centromere location across both plant and animal kingdoms. Paradoxically, CENH3 is also adaptively evolving. An ongoing question is whether CENH3 evolution is driven by a functional relationship with the underlying DNA sequence. Here, we demonstrate that despite extensive protein sequence divergence, CENH3 histones from distant species assemble centromeres on the same underlying DNA sequence. We first characterized the organization and diversity of centromere repeats in wild-type Arabidopsis thaliana. We show that A. thaliana CENH3-containing nucleosomes exhibit a strong preference for a unique subset of centromeric repeats. These sequences are largely missing from the genome assemblies and represent the youngest and most homogeneous class of repeats. Next, we tested the evolutionary specificity of this interaction in a background in which the native A. thaliana CENH3 is replaced with CENH3s from distant species. Strikingly, we find that CENH3 from Lepidium oleraceum and Zea mays, although specifying epigenetically weaker centromeres that result in genome elimination upon outcrossing, show a binding pattern on A. thaliana centromere repeats that is indistinguishable from that of the native CENH3. Our results demonstrate positional stability of a highly diverged CENH3 on independently evolved repeats, suggesting that the sequence specificity of centromeres is determined by a mechanism independent of CENH3.
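The "strong preference" for a subset of repeats and the "indistinguishable" binding pattern can be pictured with two simple quantities: a ChIP-over-input enrichment ratio per repeat class, and a correlation between the enrichment profiles of two CENH3 variants. The sketch below assumes reads have already been assigned to repeat classes; the function names, the pseudocount, and the toy counts are hypothetical, and the abstract does not state that the authors used these particular statistics.

```python
import math
from typing import Dict, Sequence


def enrichment(chip: Dict[str, int], inp: Dict[str, int], pseudocount: float = 1.0) -> Dict[str, float]:
    """Per repeat class: (fraction of ChIP reads) / (fraction of input reads).
    Values above 1 mean the class is over-represented in the ChIP."""
    k = len(inp)
    chip_total = sum(chip.get(c, 0) for c in inp) + pseudocount * k
    inp_total = sum(inp.values()) + pseudocount * k
    return {
        c: ((chip.get(c, 0) + pseudocount) / chip_total) / ((inp[c] + pseudocount) / inp_total)
        for c in inp
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between log2 enrichment profiles of two CENH3 variants."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For example, `enrichment({"A": 90, "B": 10}, {"A": 50, "B": 50}, pseudocount=0)` gives roughly 1.8 for class A and 0.2 for class B; correlating the log2 of such profiles for a native and a replacement CENH3 would give values near 1 if their patterns match.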
Subjects
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Centromere Protein A/genetics , Centromere/genetics , Polymorphism, Genetic , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/metabolism , Centromere Protein A/chemistry , Centromere Protein A/metabolism , Evolution, Molecular , Nucleosomes/metabolism
ABSTRACT
Incompatibilities in interspecific hybrids, such as sterility and lethality, are widely observed causes of reproductive isolation and thus contribute to speciation. Because hybrid incompatibilities are caused by divergence in each of the hybridizing species, they also reveal genomic changes occurring on short evolutionary time scales that have functional consequences. These changes include divergence in protein-coding gene sequence, structure, and location, as well as divergence in noncoding DNAs. The most important unresolved issue is understanding the evolutionary causes of the divergence within species that in turn leads to incompatibility between species. Surprisingly, much of this divergence does not appear to be driven by ecological adaptation but may instead result from responses to purely mutational mechanisms or to internal genetic conflicts.
Subjects
Chimera/genetics , Hybridization, Genetic , Reproductive Isolation , Adaptation, Biological , Alleles , Animals , Chromosome Aberrations , DNA/genetics , DNA Transposable Elements , Epistasis, Genetic , Genetic Pleiotropy , Genetic Speciation , Heterochromatin/genetics , Mutation , Repetitive Sequences, Nucleic Acid , Species Specificity , Transcription, Genetic
ABSTRACT
The point of attachment of spindle microtubules to metaphase chromosomes is known as the centromere. Plant and animal centromeres are epigenetically specified by a centromere-specific variant of histone H3, CENH3 (a.k.a. CENP-A). Unlike canonical histones, which are invariant, CENH3 proteins are accumulating substitutions at an accelerated rate. This diversification of CENH3 is a conundrum, since its role as the key determinant of centromere identity remains constant across species. Here, we ask whether naturally occurring divergence in CENH3 has functional consequences. We performed functional complementation assays on cenh3-1, a null mutation in Arabidopsis thaliana, using untagged CENH3s from increasingly distant relatives. Contrary to previous results using GFP-tagged CENH3, we find that the essential functions of CENH3 are conserved across a broad evolutionary landscape. CENH3 from a species as distant as the monocot Zea mays can functionally replace A. thaliana CENH3. Plants expressing variant CENH3s that are fertile when selfed show dramatic segregation errors when crossed to a wild-type individual. The progeny of this cross include hybrid diploids, aneuploids with novel genetic rearrangements, and haploids that inherit only the genome of the wild-type parent. Importantly, it is always chromosomes from the plant expressing the divergent CENH3 that missegregate. Using chimeras, we show that divergence in the fast-evolving N-terminal tail of CENH3 causes the segregation errors and genome elimination. Furthermore, we analyzed N-terminal tail sequences from plant CENH3s and discovered a modular pattern of sequence conservation. From this we hypothesize that while the essential functions of CENH3 are largely conserved, the N-terminal tail is evolving to adapt to lineage-specific centromeric constraints.
Our results demonstrate that this lineage-specific evolution of CENH3 causes inviability and sterility of progeny in crosses, at the same time producing karyotypic variation. Thus, CENH3 evolution can contribute to postzygotic reproductive barriers.
Subjects
Arabidopsis/genetics , Autoantigens/genetics , Chromosomal Proteins, Non-Histone/genetics , Chromosome Segregation/genetics , Mitosis/genetics , Amino Acid Sequence , Animals , Arabidopsis/growth & development , Biological Evolution , Centromere/genetics , Centromere Protein A , Chimera/genetics , Diploidy , Haploidy , Histones/genetics , Molecular Sequence Data , Zygote/growth & development
ABSTRACT
The Dobzhansky and Muller (D-M) model explains the evolution of hybrid incompatibility (HI) through the interaction between lineage-specific derived alleles at two or more loci. In agreement with the expectation that HI results from functional divergence, many protein-coding genes that contribute to incompatibilities between species show signatures of adaptive evolution, including Lhr, which encodes a heterochromatin protein whose amino acid sequence has diverged extensively between Drosophila melanogaster and D. simulans by natural selection. The lethality of D. melanogaster/D. simulans F1 hybrid sons is rescued by removing D. simulans Lhr, but not D. melanogaster Lhr, suggesting that the lethal effect results from adaptive evolution in the D. simulans lineage. It has been proposed that adaptive protein divergence in Lhr reflects antagonistic coevolution with species-specific heterochromatin sequences and that defects in LHR protein localization cause hybrid lethality. Here we present surprising results that are inconsistent with this coding-sequence-based model. Using Lhr transgenes expressed under native conditions, we find no evidence that LHR localization differs between D. melanogaster and D. simulans, nor do we find evidence that it mislocalizes in their interspecific hybrids. Rather, we demonstrate that Lhr orthologs are differentially expressed in the hybrid background, with the levels of D. simulans Lhr double those of D. melanogaster Lhr. We further show that this asymmetric expression is caused by cis-by-trans regulatory divergence of Lhr. Therefore, the non-equivalent hybrid lethal effects of Lhr orthologs can be explained by asymmetric expression of a molecular function that is shared by both orthologs and thus was presumably inherited from the ancestral allele of Lhr. We present a model whereby hybrid lethality occurs through the interaction between evolutionarily ancestral and derived alleles.
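The cis-by-trans conclusion rests on comparing allele-specific expression in hybrids with expression in the parental species. A minimal sketch of the standard log2 decomposition is shown below: because both alleles in an F1 hybrid share one trans environment, their ratio in the hybrid estimates the cis component, and the rest of the parental difference is attributed to trans. The function name and inputs are illustrative assumptions; the abstract does not describe the authors' exact calculation, and this simple split does not by itself test for a cis-by-trans interaction.

```python
import math
from typing import Tuple


def cis_trans(parental_ratio: float, hybrid_ratio: float) -> Tuple[float, float]:
    """Split a between-species expression difference into cis and trans parts (log2 scale).

    parental_ratio: expression in species 1 / expression in species 2 (pure species)
    hybrid_ratio:   species-1 allele / species-2 allele within the same F1 hybrid,
                    where both alleles share one trans environment
    """
    cis = math.log2(hybrid_ratio)            # allelic imbalance in a shared trans background
    trans = math.log2(parental_ratio) - cis  # remainder of the parental difference
    return cis, trans
```

Taking the reported twofold excess of D. simulans Lhr in hybrids and a hypothetical parental ratio of 1 (no difference between the pure species), `cis_trans(1.0, 2.0)` returns `(1.0, -1.0)`: a cis effect favoring the D. simulans allele offset by an opposing trans effect.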
Subjects
Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Regulatory Sequences, Nucleic Acid/genetics , Reproductive Isolation , Animals , Animals, Genetically Modified , Biological Evolution , Gene Expression Regulation , Genes, Lethal , Genetic Speciation , Heterochromatin/genetics , Hybridization, Genetic
ABSTRACT
We performed shallow single-cell sequencing of genomic DNA across 1475 cells from the cell line COLO829 to resolve its overall complexity and clonality. This melanoma tumor line has been previously characterized by multiple technologies and is a benchmark for evaluating somatic alterations. In some of these studies, COLO829 has shown conflicting and/or indeterminate copy number; single-cell sequencing therefore provides a tool for gaining further insight. Following shallow single-cell sequencing, we first identified at least four major sub-clones by discriminant analysis of principal components of single-cell copy number data. Based on clustering, breakpoint, and loss-of-heterozygosity analysis of aggregated data from sub-clones, we identified distinct hallmark events that were validated by bulk sequencing and spectral karyotyping. In summary, COLO829 exhibits a classical Dutrillaux's monosomic/trisomic pattern of karyotype evolution with endoreduplication, in which consistent sub-clones emerge from the loss or gain of abnormal chromosomes. Overall, our results demonstrate how shallow copy number profiling can uncover hidden biological insights.
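The copy-number step that precedes subclone discovery can be sketched in a few lines. The toy example below assumes each cell's reads are already counted in fixed genomic bins, scales counts so the median bin equals an assumed baseline ploidy, rounds to integer states, and then groups cells with identical profiles. This exact-match grouping is a deliberately crude stand-in for the discriminant analysis of principal components named in the abstract, and the function names and `ploidy` default are hypothetical; real shallow single-cell data also need GC and mappability correction and noise-tolerant segmentation.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence, Tuple


def integer_copy_number(bin_counts: Sequence[int], ploidy: int = 2) -> Tuple[int, ...]:
    """Scale one cell's per-bin read counts so the median bin equals `ploidy`,
    then round each bin to an integer copy-number state."""
    m = median(bin_counts)
    if m == 0:
        raise ValueError("median bin count is zero; too few reads for this cell")
    # note: Python's round() rounds exact halves to the nearest even integer
    return tuple(round(ploidy * c / m) for c in bin_counts)


def group_by_profile(cells: Dict[str, Sequence[int]], ploidy: int = 2) -> Dict[Tuple[int, ...], List[str]]:
    """Group cells that share an identical integer copy-number profile."""
    clones: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for cell_id, counts in cells.items():
        clones[integer_copy_number(counts, ploidy)].append(cell_id)
    return dict(clones)
```

Because each cell is normalized by its own median, two cells sequenced to different depths but carrying the same relative gains and losses land in the same group.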
Subjects
Melanoma/genetics , Melanoma/pathology , Single-Cell Analysis/methods , Cell Line, Tumor , DNA Copy Number Variations , Humans , Karyotyping , Loss of Heterozygosity , Sequence Analysis, DNA
ABSTRACT
In most diploids, the centromere-specific histone H3 (CENH3), the assembly site of active centromeres, is encoded by a single-copy gene. The persistence of two CENH3 paralogs in diploid species raises the possibility of subfunctionalization. Here we analysed both CENH3 genes of the diploid dryland crop cowpea. Phylogenetic analysis suggests that gene duplication of CENH3 occurred independently during the speciation of Vigna unguiculata. Both functional CENH3 variants are transcribed, and the corresponding proteins are intermingled in subdomains of different types of centromere sequences in a tissue-specific manner together with the kinetochore protein CENPC. CENH3.2 is removed from the generative cell of mature pollen, while CENH3.1 persists. CRISPR/Cas9-based inactivation of CENH3.1 resulted in delayed vegetative growth and sterility, indicating that this variant is needed for plant development and reproduction. By contrast, CENH3.2 knockout individuals did not show obvious defects during vegetative and reproductive development. Hence, CENH3.2 of cowpea is likely at an early stage of pseudogenization and is less likely to be undergoing subfunctionalization.
Subjects
Centromere Protein A/genetics , Centromere/genetics , Genetic Variation , Vigna/genetics , Centromere/metabolism , Centromere Protein A/metabolism , Evolution, Molecular , Fluorescent Antibody Technique , Gene Expression Regulation, Plant , In Situ Hybridization, Fluorescence , Organ Specificity , Phenotype , Phylogeny , Plant Proteins/genetics , Plant Proteins/metabolism , Vigna/classification
ABSTRACT
Lethality in hybrids between Drosophila melanogaster and its sibling species Drosophila simulans is caused in part by the interaction of the genes Hybrid male rescue (Hmr) and Lethal hybrid rescue (Lhr). Hmr and Lhr have diverged under positive selection in the hybridizing species. Here we test whether positive selection of Hmr is confined to D. melanogaster and D. simulans. We find that Hmr has continued to diverge under recurrent positive selection between the sibling species D. simulans and Drosophila mauritiana and along the lineage leading to the melanogaster subgroup species pair Drosophila yakuba and Drosophila santomea. Hmr encodes a member of the Myb/SANT-like domain in ADF1 (MADF) family of transcriptional regulators. We show that although MADF domains from other Drosophila proteins have predicted ionic properties consistent with DNA binding, the MADF domains encoded by different Hmr orthologs have divergent properties consistent with binding to either the DNA or the protein components of chromatin. Our results suggest that Hmr may be functionally diverged in multiple species.
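"Predicted ionic properties" such as net charge and isoelectric point can be estimated from a protein sequence with the Henderson-Hasselbalch equation. The sketch below assumes one approximate set of side-chain and terminal pKa values (published scales differ) and ignores structure and local environment; it is not the prediction method used in the study, and no MADF sequences are reproduced here, so any peptides passed to it are toy inputs.

```python
# Approximate pKa values (an assumption; different published scales vary)
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(seq: str, ph: float = 7.0) -> float:
    """Expected net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # fraction protonated for basic groups, fraction deprotonated for acidic groups
    pos = 1 / (1 + 10 ** (ph - N_TERM_PKA))
    neg = 1 / (1 + 10 ** (C_TERM_PKA - ph))
    for aa in seq.upper():
        if aa in POSITIVE_PKA:
            pos += 1 / (1 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            neg += 1 / (1 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return pos - neg


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisect for the pH at which the net charge crosses zero.
    Net charge decreases monotonically with pH, so bisection converges."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

A strongly positive net charge at physiological pH is the kind of signal usually read as compatible with binding the negatively charged DNA backbone.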
Subjects
Drosophila Proteins/genetics , Drosophila/genetics , Hybridization, Genetic , Selection, Genetic , Amino Acid Sequence , Animals , Drosophila/classification , Drosophila Proteins/classification , Drosophila melanogaster/genetics , Female , Male , Molecular Sequence Data , Phylogeny , Protein Binding , Protein Structure, Tertiary
ABSTRACT
Linked-Read sequencing technology has recently been employed successfully for de novo assembly of human genomes; however, the utility of this technology for complex plant genomes is unproven. We evaluated the technology for this purpose by sequencing the 3.5-gigabase (Gb) diploid pepper (Capsicum annuum) genome with a single Linked-Read library. Plant genomes, including pepper, are characterized by long, highly similar repetitive sequences. Accordingly, significant effort is made to ensure that the sequenced plant is highly homozygous and that the resulting assembly is a haploid consensus. With a phased assembly approach, we targeted a heterozygous F1 derived from a wide cross to assess the ability to derive both haplotypes and characterize a pungency gene with a large insertion/deletion. The Supernova software generated a highly ordered sequence assembly that is more contiguous than all currently available C. annuum reference genomes. Over 83% of the final assembly was anchored and oriented using four publicly available de novo linkage maps. A comparison of the annotation of conserved eukaryotic genes indicated the completeness of the assembly. The validity of the phased assembly is further demonstrated by the complete recovery of both 2.5-kb insertion/deletion haplotypes of the PUN1 locus in the F1 sample, which represent pungent and nonpungent peppers, as well as nearly full recovery of the BUSCO2 gene set within each of the two haplotypes. The result is the most contiguous pepper genome assembly to date, demonstrating that Linked-Read library technology provides a tool to de novo assemble complex, highly repetitive, heterozygous plant genomes. This technology can provide an opportunity to cost-effectively develop high-quality genome assemblies for other complex plants and to compare structural and gene differences through accurate haplotype reconstruction.
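The contiguity and anchoring claims are length-based summaries of the assembly. A minimal sketch of two such summaries follows: N50, a standard contiguity statistic, and the fraction of assembled length placed on linkage maps. The abstract does not say whether the 83% figure is length-weighted, so that is an assumption here, as are the function names and any scaffold names used with them.

```python
from typing import Dict, Iterable, Sequence


def n50(lengths: Sequence[int]) -> int:
    """Largest length L such that sequences of length >= L together cover
    at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise RuntimeError("unreachable: the loop always returns")


def anchored_fraction(lengths: Dict[str, int], anchored: Iterable[str]) -> float:
    """Length-weighted fraction of the assembly placed on linkage maps
    (hypothetical helper; assumes scaffold names map to lengths)."""
    anchored_set = set(anchored)
    total = sum(lengths.values())
    return sum(length for name, length in lengths.items() if name in anchored_set) / total
```

For scaffolds of 10, 20, 30 and 40 kb, `n50` returns 30, because the two longest scaffolds (70 kb) are the first to cover at least half of the 100 kb total.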
ABSTRACT
Plant centromeres, which are determined epigenetically by centromeric histone 3 (CENH3), have revealed surprising structural diversity, ranging from the canonical monocentric type seen in vertebrates to polycentric and holocentric types. Normally stable, centromeres can change position over evolutionary time or upon genomic stress, such as when chromosomes are broken. At the DNA level, centromeres can be based on single-copy DNA or, more commonly, on repeats. Rapid evolution of centromeric sequences and of CENH3 protein remains a mystery, as evidence of co-adaptation is lacking. Epigenetic differences between parents can trigger uniparental centromere failure and genome elimination, contributing to postzygotic hybridization barriers.
Subjects
Centromere , Chromosomes, Plant , Biological Evolution , Epigenesis, Genetic , Haploidy
ABSTRACT
Genetic analysis in haploids provides unconventional yet powerful advantages not available in diploid organisms. In Arabidopsis thaliana, haploids can be generated through seeds by crossing a wild-type strain to a transgenic strain with altered centromeres. Here we report the development of an improved haploid inducer (HI) strain, SeedGFP-HI, that aids selection of haploid seeds prior to germination. We also show that haploids can be used as a tool to accelerate a variety of genetic analyses, specifically pyramiding multiple mutant combinations, forward mutagenesis screens, scaling down a tetraploid to lower ploidy levels and swapping of nuclear and cytoplasmic genomes. Furthermore, the A. thaliana HI can be used to produce haploids from a related species A. suecica and generate homozygous mutant plants from strong maternal gametophyte lethal alleles, which is not possible via conventional diploid genetics. Taken together, our results demonstrate the utility and power of haploid genetics in A. thaliana.
Subjects
Arabidopsis/genetics , Genetic Techniques , Haploidy , Genome, Plant , Homozygote , Mutation , Phenotype
ABSTRACT
Hybrid incompatibility (HI) genes are frequently observed to be rapidly evolving under selection. This observation has led to the attractive conjecture that selection-derived protein-sequence divergence is culpable for incompatibilities in hybrids. The Drosophila simulans HI gene Lethal hybrid rescue (Lhr) is an intriguing case, because despite having experienced rapid sequence evolution, its HI properties are a shared function inherited from the ancestral state. Using an unusual D. simulans Lhr hybrid rescue allele, Lhr(2), here we identify a conserved stretch of 10 amino acids in the C terminus of LHR that is critical for causing hybrid incompatibility. Altering these 10 amino acids weakens or abolishes the ability of Lhr to suppress the hybrid rescue alleles Lhr(1) or Hmr(1), respectively. Besides single-amino-acid substitutions, Lhr orthologs differ by a 16-aa indel polymorphism, with the ancestral deletion state fixed in D. melanogaster and the derived insertion state at very high frequency in D. simulans. Lhr(2) is a rare D. simulans allele that has the ancestral deletion state of the 16-aa polymorphism. Through a series of transgenic constructs we demonstrate that the ancestral deletion state contributes to the rescue activity of Lhr(2). This indel is thus a polymorphism that can affect the HI function of Lhr.
Subjects
Chimera/genetics , Drosophila Proteins/genetics , Drosophila/genetics , Genes, Lethal , Animals , Biological Evolution , Crosses, Genetic , INDEL Mutation/genetics , Polymorphism, Genetic , Selection, Genetic
ABSTRACT
The Dobzhansky-Muller model proposes that hybrid incompatibilities are caused by the interaction between genes that have functionally diverged in the respective hybridizing species. Here, we show that Lethal hybrid rescue (Lhr) has functionally diverged in Drosophila simulans and interacts with Hybrid male rescue (Hmr), which has functionally diverged in D. melanogaster, to cause lethality in F1 hybrid males. LHR localizes to heterochromatic regions of the genome and has diverged extensively in sequence between these species in a manner consistent with positive selection. Rapidly evolving heterochromatic DNA sequences may be driving the evolution of this incompatibility gene.
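The Dobzhansky-Muller logic can be made concrete with a toy two-locus model: lineage 1 fixes a derived allele A1 at locus A, lineage 2 fixes a derived allele B1 at locus B, and only genotypes carrying both derived alleles, which never co-occurred in either lineage, are incompatible. Allele names are hypothetical, and treating a single copy of each derived allele as sufficient (a dominant interaction) with unlinked autosomal loci is a simplifying assumption; the sketch ignores sex-linked and dosage effects.

```python
from fractions import Fraction
from itertools import product
from typing import List, Tuple

Genotype = Tuple[Tuple[str, str], Tuple[str, str]]  # (locus A alleles, locus B alleles)


def is_incompatible(g: Genotype, derived_a: str = "A1", derived_b: str = "B1") -> bool:
    """Toy rule: any individual carrying both derived alleles is incompatible."""
    locus_a, locus_b = g
    return derived_a in locus_a and derived_b in locus_b


def offspring(p1: Genotype, p2: Genotype) -> List[Genotype]:
    """All equally likely offspring of two parents, assuming unlinked loci and
    Mendelian transmission of one allele per locus from each parent."""
    kids = []
    for a in product(p1[0], p2[0]):
        for b in product(p1[1], p2[1]):
            kids.append((a, b))
    return kids


def incompatible_fraction(p1: Genotype, p2: Genotype) -> Fraction:
    """Exact fraction of offspring that carry both derived alleles."""
    kids = offspring(p1, p2)
    return Fraction(sum(is_incompatible(k) for k in kids), len(kids))
```

Crossing the two pure lineages makes every F1 incompatible, whereas each lineage alone is fully compatible; an F1 × F1 cross under this rule gives 9/16 incompatible offspring.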