Results 1 - 20 of 27
1.
medRxiv ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38496498

ABSTRACT

Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
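The read N50 of 54 kbp reported above is the length L at which reads of length L or longer together hold at least half of all sequenced bases. A minimal Python sketch of that calculation, assuming the read lengths have already been extracted into a list of integers; the function name `n50` is illustrative and not taken from the consortium's pipeline:

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of read (or contig) lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating bases,
    # and stop at the first length that brings the total to >= 50%.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    read_lengths = [2, 3, 4, 5, 6, 8, 10]  # toy values, not data from the study
    print(n50(read_lengths))
```

Sorting once and scanning gives O(n log n) time; the same function applies unchanged to contig lengths.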

2.
Cell ; 185(18): 3426-3440.e19, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36055201

ABSTRACT

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.


Subjects
Genome, Human , Whole Genome Sequencing , Female , High-Throughput Nucleotide Sequencing/methods , Humans , INDEL Mutation , Male , Polymorphism, Single Nucleotide
3.
Nat Genet ; 54(4): 518-525, 2022 04.
Article in English | MEDLINE | ID: mdl-35410384

ABSTRACT

Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster than alignment-based workflows.


Subjects
Genetic Variation , Genome, Human , Genomics , Algorithms , Genome, Human/genetics , Genome-Wide Association Study , Genomics/methods , Genotype , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
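PanGenie, described above, genotypes from k-mer counts of short reads rather than from read alignments. The sketch below illustrates only the counting step that this idea rests on; it is not PanGenie's algorithm or code. It tallies canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) so both strands are counted together. The function names and the toy value of k are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable

# Complement table for uppercase DNA; k-mers containing other symbols (e.g. N) are skipped.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID_BASES = frozenset("ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Strand-independent representative: the smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers across all reads, skipping k-mers with non-ACGT bases."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            if _VALID_BASES.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return counts


if __name__ == "__main__":
    # Toy reads; real k-mer-based genotypers use a much larger k (assumed here, not from the source).
    print(count_kmers(["ACGTT", "AACGT"], k=3))
```

Counting canonical k-mers means a read and its reverse complement contribute to the same entries, so strand orientation of the reads does not matter.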
4.
Nat Biotechnol ; 40(5): 672-680, 2022 05.
Article in English | MEDLINE | ID: mdl-35132260

ABSTRACT

The repetitive nature and complexity of some medically relevant genes pose a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly. This curated benchmark reports over 17,000 single-nucleotide variations, 3,600 insertions and deletions and 200 structural variations in HG002, each relative to both human reference genomes GRCh37 and GRCh38. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically relevant genes, including CBS, CRYAA and KCNE1. When masking these false duplications, variant recall can improve from 8% to 100%. Forming benchmarks from a haplotype-resolved whole-genome assembly may become a prototype for future benchmarks covering the whole genome.


Subjects
Genome, Human , Genome, Human/genetics , Haplotypes/genetics , Humans , Sequence Analysis, DNA
6.
Nat Biotechnol ; 39(9): 1129-1140, 2021 09.
Article in English | MEDLINE | ID: mdl-34504351

ABSTRACT

Assessing the reproducibility, accuracy and utility of massively parallel DNA sequencing platforms remains an ongoing challenge. Here the Association of Biomolecular Resource Facilities (ABRF) Next-Generation Sequencing Study benchmarks the performance of a set of sequencing instruments (HiSeq/NovaSeq/paired-end 2 × 250-bp chemistry, Ion S5/Proton, PacBio circular consensus sequencing (CCS), Oxford Nanopore Technologies PromethION/MinION, BGISEQ-500/MGISEQ-2000 and GS111) on human and bacterial reference DNA samples. Among short-read instruments, HiSeq 4000 and X10 provided the most consistent, highest genome coverage, while BGI/MGISEQ provided the lowest sequencing error rates. The long-read instrument PacBio CCS had the highest reference-based mapping rate and lowest non-mapping rate. The two long-read platforms PacBio CCS and PromethION/MinION showed the best sequence mapping in repeat-rich areas and across homopolymers. NovaSeq 6000 using 2 × 250-bp read chemistry was the most robust instrument for capturing known insertion/deletion events. This study serves as a benchmark for current genomics technologies, as well as a resource to inform experimental design and next-generation sequencing variant calling.


Subjects
High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Base Pair Mismatch , Benchmarking , DNA/genetics , DNA, Bacterial/genetics , Genome, Bacterial , Genome, Human , Humans
7.
Science ; 372(6537)2021 04 02.
Article in English | MEDLINE | ID: mdl-33632895

ABSTRACT

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.


Subjects
Genetic Variation , Genome, Human , Haplotypes , Female , Genotype , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Interspersed Repetitive Sequences , Male , Population Groups/genetics , Quantitative Trait Loci , Retroelements , Sequence Analysis, DNA , Sequence Inversion , Whole Genome Sequencing
8.
Nature ; 590(7845): 290-299, 2021 02.
Article in English | MEDLINE | ID: mdl-33568819

ABSTRACT

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.


Subjects
Genetic Variation/genetics , Genome, Human/genetics , Genomics , National Heart, Lung, and Blood Institute (U.S.) , Precision Medicine , Cytochrome P-450 CYP2D6/genetics , Haplotypes/genetics , Heterozygote , Humans , INDEL Mutation , Loss of Function Mutation , Mutagenesis , Phenotype , Polymorphism, Single Nucleotide , Population Density , Precision Medicine/standards , Quality Control , Sample Size , United States , Whole Genome Sequencing/standards
9.
Sci Rep ; 10(1): 12629, 2020 07 28.
Article in English | MEDLINE | ID: mdl-32724070

ABSTRACT

Ethiopian mustard (Brassica carinata A. Braun) is an emerging sustainable source of vegetable oil, in particular for the biofuel industry. The present study exploited genome assemblies of the Brassica diploids, Brassica nigra and Brassica oleracea, to discover over 10,000 genome-wide SNPs using genotyping-by-sequencing of 620 B. carinata lines. The analyses revealed a SNP frequency of one every 91.7 kb, a heterozygosity level of 0.30 and a nucleotide diversity of 1.31 × 10⁻⁵, and the first five principal components captured only 13% of the molecular variation, indicating low levels of genetic diversity among the B. carinata collection. Genome bias was observed, with greater SNP density found on the B subgenome. The 620 lines clustered into two distinct sub-populations (SP1 and SP2), with the majority of accessions (88%) clustered in SP1 together with those from Ethiopia, the presumed centre of origin. SP2 was distinguished by a collection of breeding lines, implicating targeted selection in creating population structure. Two selective sweep regions on B3 and B8 were detected, which harbour genes involved in fatty acid and aliphatic glucosinolate biosynthesis, respectively. The assessment of genetic diversity, population structure, and linkage disequilibrium (LD) in the global B. carinata collection provides critical information to assist future crop improvement.


Subjects
Crops, Agricultural/genetics , Industry , Linkage Disequilibrium/genetics , Mustard Plant/genetics , Chromosomes, Plant/genetics , Genetic Variation , Genetics, Population , Genome, Plant , Haplotypes/genetics , Polymorphism, Single Nucleotide/genetics , Selection, Genetic
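Nucleotide diversity (π), one of the summary statistics in the B. carinata abstract above, is commonly defined as the average number of pairwise differences per site among sampled sequences. The sketch below computes it directly from a few aligned haplotype strings. It assumes equal-length sequences with no gaps or missing data and is not the estimator used in that study, which worked from genotyping-by-sequencing SNP calls whose processing the abstract does not describe; the function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(haplotypes: Sequence[str]) -> float:
    """Average number of pairwise differences per site (pi).

    Assumes aligned, equal-length sequences with no gaps or missing data.
    """
    if len(haplotypes) < 2:
        raise ValueError("need at least two sequences")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    pairs = list(combinations(haplotypes, 2))
    # Count mismatching sites for every unordered pair of sequences.
    differences = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return differences / (len(pairs) * length)


if __name__ == "__main__":
    sample = ["AAAA", "AAAT", "AATT"]  # toy haplotypes, not data from the study
    print(round(nucleotide_diversity(sample), 4))
```

Enumerating all pairs is quadratic in the number of sequences, which is fine for illustration; large panels are usually summarised per site from allele frequencies instead.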
10.
PLoS Comput Biol ; 16(6): e1007933, 2020 06.
Article in English | MEDLINE | ID: mdl-32559231

ABSTRACT

A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.


Subjects
Genome, Human , Genomic Structural Variation , Heuristics , Humans , INDEL Mutation
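The SVCurator abstract above reports concordance between curators. One simple reading is percent agreement: the share of SVs reviewed by both curators to which they assigned the same label. The sketch below assumes labels are stored as dictionaries from a hypothetical SV identifier to a label string; the data layout and function name are illustrative, and the study's exact concordance definition (for example, how it combines type, size and genotype labels) may differ.

```python
from typing import Hashable, Mapping


def percent_concordance(a: Mapping[Hashable, str], b: Mapping[Hashable, str]) -> float:
    """Percentage of SVs labelled by both curators on which their labels agree."""
    shared = a.keys() & b.keys()  # only SVs that both curators reviewed
    if not shared:
        raise ValueError("curators share no labelled SVs")
    agree = sum(a[sv] == b[sv] for sv in shared)
    return 100.0 * agree / len(shared)


if __name__ == "__main__":
    # Hypothetical labels for illustration only.
    curator_1 = {"sv1": "deletion", "sv2": "insertion", "sv3": "deletion", "sv4": "insertion"}
    curator_2 = {"sv1": "deletion", "sv2": "deletion", "sv3": "deletion", "sv5": "insertion"}
    print(f"{percent_concordance(curator_1, curator_2):.1f}%")
```

Restricting the comparison to shared SVs keeps unreviewed events from counting as disagreements; a chance-corrected statistic such as Cohen's kappa would be a natural alternative when label frequencies are skewed.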
11.
F1000Res ; 8: 1751, 2019.
Article in English | MEDLINE | ID: mdl-34386196

ABSTRACT

In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed is reported in this manuscript.

12.
G3 (Bethesda) ; 8(8): 2673-2683, 2018 07 31.
Article in English | MEDLINE | ID: mdl-29907649

ABSTRACT

The heavy selection pressure due to intensive breeding of Brassica napus has created a narrow gene pool, limiting the ability to produce improved varieties through crosses between B. napus cultivars. One mechanism that has contributed to the adaptation of important agronomic traits in the allotetraploid B. napus has been chromosomal rearrangements resulting from homoeologous recombination between the constituent A and C diploid genomes. Determining the rate and distribution of such events in natural B. napus will assist efforts to understand and potentially manipulate this phenomenon. The Brassica high-density 60K SNP array, which provides genome-wide coverage for assessment of recombination events, was used to assay 254 individuals derived from 11 diverse cultivated spring-type B. napus. These analyses identified reciprocal allele gain and loss between the A and C genomes and allowed visualization of de novo homoeologous recombination events across the B. napus genome. The events ranged from loss/gain of 0.09 Mb to entire chromosomes, with almost 5% aneuploidy observed across all gametes. There was a bias toward sub-telomeric exchanges, leading to genome homogenization at chromosome termini. The A genome replaced the C genome in 66% of events, and also featured more dominantly in gain of whole chromosomes. These analyses indicate that de novo homoeologous recombination is a continuous source of variation in established Brassica napus and that the rate of observed events appears to vary with genetic background. The Brassica 60K SNP array will be a useful tool in further study and manipulation of this phenomenon.


Subjects
Brassica napus/genetics , Polymorphism, Single Nucleotide , Recombination, Genetic , Chromosomes, Plant/genetics , Gene Frequency , Genome, Plant , Oligonucleotide Array Sequence Analysis
13.
Genome Res ; 28(5): 751-758, 2018 05.
Article in English | MEDLINE | ID: mdl-29588360

ABSTRACT

High-throughput sequencing is a revolutionary technology for the analysis of metagenomic samples. However, querying large volumes of reads against comprehensive DNA/RNA databases in a sensitive manner can be compute-intensive. Here, we present taxMaps, a highly efficient, sensitive, and fully scalable taxonomic classification tool. Using a combination of simulated and real metagenomics data sets, we demonstrate that taxMaps is more sensitive and more precise than widely used taxonomic classifiers and is capable of delivering classification accuracy comparable to that of BLASTN, but at up to three orders of magnitude less computational cost.


Subjects
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Software , Bacteria/classification , Bacteria/genetics , Databases, Nucleic Acid , Humans , Microbiota/genetics , Reproducibility of Results , Rivers/microbiology , Species Specificity , Water Microbiology
14.
Plant J ; 88(5): 879-894, 2016 12.
Article in English | MEDLINE | ID: mdl-27513981

ABSTRACT

Camelina sativa is currently being embraced as a viable industrial bio-platform crop due to a number of desirable agronomic attributes and the unique fatty acid profile of the seed oil that has applications for food, feed and biofuel. The recent completion of the reference genome sequence of C. sativa identified a young hexaploid genome. To complement this work, we have generated a genome-wide developmental transcriptome map by RNA sequencing of 12 different tissues covering major developmental stages during the life cycle of C. sativa. We have generated a digital atlas of this comprehensive transcriptome resource that enables interactive visualization of expression data through a searchable database of electronic fluorescent pictographs (eFP browser). An analysis of this dataset supported expression of 88% of the annotated genes in C. sativa and provided a global overview of the complex architecture of temporal and spatial gene expression patterns active during development. Conventional differential gene expression analysis combined with weighted gene expression network analysis uncovered similarities as well as differences in gene expression patterns between different tissues and identified tissue-specific genes and network modules. A high-quality census of transcription factors, analysis of alternative splicing and tissue-specific genome dominance provided insight into the transcriptional dynamics and sub-genome interplay among the well-preserved triplicated repertoire of homeologous loci. The comprehensive transcriptome atlas in combination with the reference genome sequence provides a powerful resource for genomics research which can be leveraged to identify functional associations between genes and understand the regulatory networks underlying developmental processes.


Subjects
Biofuels , Brassicaceae/metabolism , Plant Proteins/metabolism , Transcriptome/genetics , Brassicaceae/genetics , Gene Expression Regulation, Plant/genetics , Gene Expression Regulation, Plant/physiology , Plant Proteins/genetics , Polyploidy , Transcription Factors/genetics , Transcription Factors/metabolism
15.
Theor Appl Genet ; 129(10): 1887-99, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27364915

ABSTRACT

KEY MESSAGE: The Brassica napus Illumina array provides genome-wide markers linked to the available genome sequence, a significant tool for genetic analyses of the allotetraploid B. napus and its progenitor diploid genomes. A high-density single nucleotide polymorphism (SNP) Illumina Infinium array, containing 52,157 markers, was developed for the allotetraploid Brassica napus. A stringent selection process employing the short probe sequence for each SNP assay was used to limit the majority of the selected markers to those represented a minimum number of times across the highly replicated genome. As a result, approximately 60% of the SNP assays display genome specificity, resolving as three clearly separated clusters (AA, AB, and BB) when tested with a diverse range of B. napus material. This genome specificity was supported by the analysis of the diploid ancestors of B. napus, whereby 26,504 and 29,720 markers were scorable in B. oleracea and B. rapa, respectively. Forty-four percent of the assayed loci on the array were genetically mapped in a single doubled-haploid B. napus population, allowing alignment of their physical and genetic coordinates. Although strong conservation of the two positions was shown, at least 3% of the loci were genetically mapped to a homoeologous position compared to their presumed physical position in the respective genome, underlining the importance of genetic corroboration of locus identity. In addition, the alignments identified multiple rearrangements between the diploid and tetraploid Brassica genomes. Although mostly attributed to genome assembly errors, some are likely evidence of rearrangements that occurred since the hybridisation of the progenitor genomes in the B. napus nucleus. Based on estimates for linkage disequilibrium decay, the array is a valuable tool for genetic fine mapping and genome-wide association studies in B. napus and its progenitor genomes.


Subjects
Brassica napus/genetics , Chromosome Mapping , Genome, Plant , Genotyping Techniques , Polymorphism, Single Nucleotide , DNA, Plant/genetics , Diploidy , Genetic Markers , Sequence Analysis, DNA , Tetraploidy
16.
BMC Genomics ; 17: 272, 2016 Mar 31.
Article in English | MEDLINE | ID: mdl-27036196

ABSTRACT

BACKGROUND: The protist Plasmodiophora brassicae is a soil-borne pathogen of cruciferous species and the causal agent of clubroot disease of Brassicas including agriculturally important crops such as canola/rapeseed (Brassica napus). P. brassicae has remained an enigmatic plant pathogen and is a rare example of an obligate biotroph that resides entirely inside the host plant cell. The pathogen is the cause of severe yield losses and can render infested fields unsuitable for Brassica crop growth due to the persistence of resting spores in the soil for up to 20 years. RESULTS: To provide insight into the biology of the pathogen and its interaction with its primary host B. napus, we produced a draft genome of P. brassicae pathotypes 3 and 6 (Pb3 and Pb6) that differ in their host range. Pb3 is highly virulent on B. napus (but also infects other Brassica species) while Pb6 infects only vegetable Brassica crops. Both the Pb3 and Pb6 genomes are highly compact, each with a total size of 24.2 Mb, and contain less than 2% repetitive DNA. Clustering of genome-wide single nucleotide polymorphisms (SNP) of Pb3, Pb6 and three additional re-sequenced pathotypes (Pb2, Pb5 and Pb8) shows a high degree of correlation of cluster grouping with host range. The Pb3 genome features significant reduction of intergenic space with multiple examples of overlapping untranslated regions (UTRs). Dependency on the host for essential nutrients is evident from the loss of genes for the biosynthesis of thiamine and some amino acids and the presence of a wide range of transport proteins, including some unique to P. brassicae. The annotated genes of Pb3 include those with a potential role in the regulation of the plant growth hormones cytokinin and auxin. The expression profile of Pb3 genes, including putative effectors, during infection and their potential role in manipulation of host defence is discussed. CONCLUSION: The P. brassicae genome sequence reveals a compact genome, a dependency of the pathogen on its host for some essential nutrients and a potential role in the regulation of host plant cytokinin and auxin. Genome annotation supported by RNA sequencing reveals significant reduction in intergenic space which, in addition to low repeat content, has likely contributed to the P. brassicae compact genome.


Subjects
Brassica/parasitology , Genome, Protozoan , Host-Parasite Interactions/genetics , Plasmodiophorida/genetics , Arabidopsis , Crops, Agricultural/parasitology , Cytokinins/metabolism , DNA, Protozoan/genetics , Host Specificity , Indoleacetic Acids/metabolism , Plant Diseases/parasitology , Sequence Analysis, RNA , Transcriptome
17.
Methods Mol Biol ; 1374: 269-84, 2016.
Article in English | MEDLINE | ID: mdl-26519412

ABSTRACT

The development of genotyping-by-sequencing (GBS) to rapidly detect nucleotide variation at the whole genome level, in many individuals simultaneously, has provided a transformative genetic profiling technique. GBS can be carried out in species with or without reference genome sequences and yields huge amounts of potentially informative data. One limitation of the approach is the paucity of tools to transform the raw data into a format that can be easily interrogated at the genetic level. In this chapter we describe bioinformatics tools developed to address this shortfall, together with experimental design considerations to fully leverage the power of GBS for genetic analysis.


Subjects
Computational Biology/methods , Genomics/methods , Genotyping Techniques , Software , Web Browser
18.
Mol Breed ; 35: 35, 2015.
Article in English | MEDLINE | ID: mdl-25620879

ABSTRACT

Camelina sativa, a largely relict crop, has recently regained interest due to its potential as an industrial oilseed. Molecular markers are key tools that will allow C. sativa to benefit from modern breeding approaches. Two complementary methodologies, capture of 3' cDNA tags and genomic reduced-representation libraries, both of which exploited second-generation sequencing platforms, were used to develop a low-density (768) Illumina GoldenGate single nucleotide polymorphism (SNP) array. The array allowed 533 SNP loci to be genetically mapped in a recombinant inbred population of C. sativa. Alignment of the SNP loci to the C. sativa genome identified the underlying sequenced regions that would delimit potential candidate genes in any mapping project. In addition, the SNP array was used to assess genetic variation among a collection of 175 accessions of C. sativa, identifying two sub-populations, yet low overall gene diversity. The SNP loci will provide useful tools for future crop improvement of C. sativa.

19.
Plant Cell ; 26(7): 2777-91, 2014 Jul.
Article in English | MEDLINE | ID: mdl-25035408

ABSTRACT

The Brassicaceae (Cruciferae) family, owing to its remarkable species, genetic, and physiological diversity as well as its significant economic potential, has become a model for polyploidy and evolutionary studies. Utilizing extensive transcriptome pyrosequencing of diverse taxa, we established a resolved phylogeny of a subset of crucifer species. We elucidated the frequency, age, and phylogenetic position of polyploidy and lineage separation events that have marked the evolutionary history of the Brassicaceae. Besides the well-known ancient α (47 million years ago [Mya]) and β (124 Mya) paleopolyploidy events, several species were shown to have undergone a further, more recent (∼7 to 12 Mya) round of genome multiplication. We identified eight whole-genome duplications corresponding to at least five independent neo/mesopolyploidy events. Although the Brassicaceae family evolved from other eudicots at the beginning of the Cenozoic era of the Earth (60 Mya), major diversification occurred only during the Neogene period (0 to 23 Mya). Remarkably, the widespread species divergence, major polyploidy, and lineage separation events during Brassicaceae evolution are clustered in time around epoch transitions characterized by prolonged unstable climatic conditions. The synchronized diversification of Brassicaceae species suggests that polyploid events may have conferred higher adaptability and increased tolerance toward the drastically changing global environment, thus facilitating species radiation.


Subjects
Brassicaceae/genetics , Cleome/genetics , Evolution, Molecular , Genome, Plant/genetics , Base Sequence , Brassicaceae/classification , Cleome/classification , Gene Library , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Molecular Sequence Data , Phylogeny , Plant Leaves/classification , Plant Leaves/genetics , Polyploidy , RNA, Messenger/genetics , RNA, Plant/chemistry , RNA, Plant/genetics , Sequence Analysis, DNA , Time Factors , Transcriptome
20.
Genome Biol ; 15(6): R77, 2014 Jun 10.
Article in English | MEDLINE | ID: mdl-24916971

ABSTRACT

BACKGROUND: Brassica oleracea is a valuable vegetable species that has contributed to human health and nutrition for hundreds of years and comprises multiple distinct cultivar groups with diverse morphological and phytochemical attributes. In addition to this phenotypic wealth, B. oleracea offers unique insights into polyploid evolution, as it results from multiple ancestral polyploidy events and a final Brassiceae-specific triplication event. Further, B. oleracea represents one of the diploid genomes that formed the economically important allopolyploid oilseed, Brassica napus. A deeper understanding of B. oleracea genome architecture provides a foundation for crop improvement strategies throughout the Brassica genus. RESULTS: We generate an assembly representing 75% of the predicted B. oleracea genome using a hybrid Illumina/Roche 454 approach. Two dense genetic maps are generated to anchor almost 92% of the assembled scaffolds to nine pseudo-chromosomes. Over 50,000 genes are annotated and 40% of the genome predicted to be repetitive, thus contributing to the increased genome size of B. oleracea compared to its close relative B. rapa. A snapshot of both the leaf transcriptome and methylome allows comparisons to be made across the triplicated sub-genomes, which resulted from the most recent Brassiceae-specific polyploidy event. CONCLUSIONS: Differential expression of the triplicated syntelogs and cytosine methylation levels across the sub-genomes suggest residual marks of the genome dominance that led to the current genome architecture. Although cytosine methylation does not correlate with individual gene dominance, the independent methylation patterns of triplicated copies suggest epigenetic mechanisms play a role in the functional diversification of duplicate genes.


Subjects
Brassica/genetics , Genome, Plant , Transcriptome , Aneuploidy , Brassica/metabolism , Chromosome Mapping , DNA Methylation , Epigenesis, Genetic , Evolution, Molecular , Gene Expression Regulation, Plant , Molecular Sequence Annotation , Molecular Sequence Data , Sequence Analysis, DNA