Results 1 - 20 of 64
1.
Plant J ; 109(1): 7-22, 2022 01.
Article in English | MEDLINE | ID: mdl-34800071

ABSTRACT

Drought is a major limitation for survival and growth in plants. With more frequent and severe drought episodes occurring due to climate change, it is imperative to understand the genomic and physiological basis of drought tolerance to be able to predict how species will respond in the future. In this study, univariate and multitrait multivariate genome-wide association study methods were used to identify candidate genes in two iconic and ecosystem-dominating species of the western USA, coast redwood and giant sequoia, using 10 drought-related physiological and anatomical traits and genome-wide sequence-capture single nucleotide polymorphisms. Population-level phenotypic variation was found in carbon isotope discrimination, osmotic pressure at full turgor, xylem hydraulic diameter, and total area of transporting fibers in both species. Our study identified 78 new marker × trait associations in coast redwood and six in giant sequoia, with genes involved in a range of metabolic, stress, and signaling pathways, among other functions. This study contributes to a better understanding of the genomic basis of drought tolerance in long-generation conifers and helps guide current and future conservation efforts in the species.


Subjects
Physiological Adaptation/genetics, Plant Genome/genetics, Sequoia/genetics, Sequoiadendron/genetics, Signal Transduction/genetics, Carbon Isotopes/analysis, Conservation of Natural Resources, Droughts, Genome-Wide Association Study, Multifactorial Inheritance/genetics, Osmotic Pressure, Phenotype, Plant Stomata/genetics, Plant Stomata/physiology, Sequoia/physiology, Sequoiadendron/physiology, Xylem/genetics, Xylem/physiology
2.
Nature ; 551(7681): 498-502, 2017 11 23.
Article in English | MEDLINE | ID: mdl-29143815

ABSTRACT

Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat (Triticum aestivum, genomes AABBDD) and an important genetic resource for wheat. The large size and highly repetitive nature of the Ae. tauschii genome has until now precluded the development of a reference-quality genome sequence. Here we use an array of advanced technologies, including ordered-clone genome sequencing, whole-genome shotgun sequencing, and BioNano optical genome mapping, to generate a reference-quality genome sequence for Ae. tauschii ssp. strangulata accession AL8/78, which is closely related to the wheat D genome. We show that compared to other sequenced plant genomes, including a much larger conifer genome, the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences. Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes. The decay of colinearity with other grass genomes correlates with recombination rates along chromosomes. We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.


Subjects
Plant Genome, Phylogeny, Poaceae/genetics, Triticum/genetics, Chromosome Mapping, Diploidy, Molecular Evolution, Gene Duplication, Plant Genes/genetics, Genomics/standards, Poaceae/classification, Genetic Recombination/genetics, DNA Sequence Analysis/standards, Triticum/classification
3.
Plant J ; 104(2): 365-376, 2020 10.
Article in English | MEDLINE | ID: mdl-32654344

ABSTRACT

The genomic architecture and molecular mechanisms controlling variation in quantitative disease resistance loci are not well understood in plant species and have been barely studied in long-generation trees. Quantitative trait loci mapping and genome-wide association studies were combined to test a large single nucleotide polymorphism (SNP) set for association with quantitative and qualitative white pine blister rust resistance in sugar pine. In the absence of a chromosome-scale reference genome, a high-density consensus linkage map was generated to obtain locations for associated SNPs. Newly discovered associations for white pine blister rust quantitative disease resistance included 453 SNPs involved in wide biological functions, including genes associated with disease resistance and others involved in morphological and developmental processes. In addition, NBS-LRR pathogen recognition genes were found to be involved in quantitative disease resistance, suggesting these newly reported genes are qualitative genes with partial resistance, they are the result of defeated qualitative resistance due to avirulent races, or they have epistatic effects on qualitative disease resistance genes. This study is a step forward in our understanding of the complex genomic architecture of quantitative disease resistance in long-generation trees, and constitutes the first step towards marker-assisted disease resistance breeding in white pine species.


Subjects
Basidiomycota/physiology, Disease Resistance/genetics, Pinus/genetics, Pinus/microbiology, Chromosome Mapping, Plant Genes, Population Genetics, Plant Genome, Genome-Wide Association Study, Phenotype, Plant Diseases/microbiology, Single Nucleotide Polymorphism, Quantitative Trait Loci
4.
Genome Res ; 27(5): 787-792, 2017 05.
Article in English | MEDLINE | ID: mdl-28130360

ABSTRACT

Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here, we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and extremely repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807 nucleotides. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy.


Subjects
Contig Mapping/methods, Plant Genome, Genomics/methods, Poaceae/genetics, Repetitive Nucleic Acid Sequences, DNA Sequence Analysis/methods, Software, Contig Mapping/standards, Genome Size, Genomics/standards, DNA Sequence Analysis/standards
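
The abstract of entry 4 notes that a single indel in a consensus sequence can shift the reading frame and truncate the predicted protein. The following is a minimal sketch of that effect, assuming the standard genetic code and translation from the first base. The 18-base sequence and the names `translate`, `reference`, and `with_deletion` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical coding sequence (not taken from any real genome).
reference = "ATGGCTGAACTGAAATAA"
# Simulate a consensus error: a single-base deletion at index 3.
with_deletion = reference[:3] + reference[4:]

print(translate(reference))
print(translate(with_deletion))
```

With these inputs the intact sequence translates to `MAELK`, while the sequence with the one-base deletion reads a premature stop codon and yields only `MLN`.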
5.
New Phytol ; 221(4): 1789-1801, 2019 03.
Article in English | MEDLINE | ID: mdl-30318590

ABSTRACT

Dissecting the genetic and genomic architecture of complex traits is essential to understand the forces maintaining the variation in phenotypic traits of ecological and economical importance. Whole-genome resequencing data were used to generate high-resolution polymorphic single nucleotide polymorphism (SNP) markers and genotype individuals from common gardens across the loblolly pine (Pinus taeda) natural range. Genome-wide associations were tested with a large phenotypic dataset comprising 409 variables including morphological traits (height, diameter, carbon isotope discrimination, pitch canker resistance), and molecular traits such as metabolites and expression of xylem development genes. Our study identified 2335 new SNP × trait associations for the species, with many SNPs located in physical clusters in the genome of the species; and the genomic location of hotspots for metabolic × genotype associations. We found a highly polygenic basis of quantitative inheritance, with significant differences in number, effect size, genomic location and frequency of alleles contributing to variation in phenotypes in the different traits. While mutation-selection balance might be shaping the genetic variation in metabolic traits, balancing selection is more likely to shape the variation in expression of xylem development genes. Our work contributes to the study of complex traits in nonmodel plant species by identifying associations at a whole-genome level.


Subjects
Multifactorial Inheritance, Pinus taeda/genetics, Single Nucleotide Polymorphism, Gene Frequency, Population Genetics, Genome-Wide Association Study, Genotype, Phenotype, Pinus taeda/physiology, United States, Whole Genome Sequencing, Xylem/genetics, Xylem/growth & development
6.
Plant J ; 87(5): 507-32, 2016 09.
Article in English | MEDLINE | ID: mdl-27145194

ABSTRACT

The Persian walnut (Juglans regia L.), a diploid species native to the mountainous regions of Central Asia, is the major walnut species cultivated for nut production and is one of the most widespread tree nut species in the world. The high nutritional value of J. regia nuts is associated with a rich array of polyphenolic compounds, whose complete biosynthetic pathways are still unknown. A J. regia genome sequence was obtained from the cultivar 'Chandler' to discover target genes and additional unknown genes. The 667-Mbp genome was assembled using two different methods (SOAPdenovo2 and MaSuRCA), with an N50 scaffold size of 464 955 bp (based on a genome size of 606 Mbp), 221 640 contigs and a GC content of 37%. Annotation with MAKER-P and other genomic resources yielded 32 498 gene models. Previous studies in walnut relying on tissue-specific methods have only identified a single polyphenol oxidase (PPO) gene (JrPPO1). Enabled by the J. regia genome sequence, a second homolog of PPO (JrPPO2) was discovered. In addition, about 130 genes in the large gallate 1-β-glucosyltransferase (GGT) superfamily were detected. Specifically, two genes, JrGGT1 and JrGGT2, were significantly homologous to the GGT from Quercus robur (QrGGT), which is involved in the synthesis of 1-O-galloyl-β-D-glucose, a precursor for the synthesis of hydrolysable tannins. The reference genome for J. regia provides meaningful insight into the complex pathways required for the synthesis of polyphenols. The walnut genome sequence provides important tools and methods to accelerate breeding and to facilitate the genetic dissection of complex traits.


Subjects
Plant Genome/genetics, Juglans/genetics, Plant Proteins/genetics, Polyphenols/metabolism, Catechol Oxidase/metabolism
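
Entry 6 reports an N50 scaffold size "based on a genome size of 606 Mbp", which suggests the statistic was computed against an assumed genome size rather than the assembly total (a variant often called NG50). The abstract does not give the exact procedure, so the following is only a sketch of the usual definition: sort sequences from longest to shortest and report the length at which the running sum first reaches half of the chosen total. The function name `n50` and the toy lengths are hypothetical.

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[int], genome_size: Optional[int] = None) -> int:
    """Return the N50 of the given sequence lengths.

    If genome_size is given, half of that value is used as the target
    instead of half of the assembly total (the NG50 variant).
    Returns 0 if the assembly never reaches the target.
    """
    lengths = sorted(lengths, reverse=True)
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # reached half of the target size
            return length
    return 0


# Toy scaffold lengths (hypothetical, not from the walnut assembly).
scaffolds = [80, 70, 50, 40, 30, 20]

print(n50(scaffolds))                    # against the assembly total (290)
print(n50(scaffolds, genome_size=400))   # against an assumed genome size
```

For these toy lengths the plain N50 is 70, and the value drops to 50 when an assumed genome size of 400 is used, which shows why the two statistics should not be compared directly.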
7.
Genome Res ; 22(3): 557-67, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22147368

ABSTRACT

New sequencing technology has dramatically altered the landscape of whole-genome sequencing, allowing scientists to initiate numerous projects to decode the genomes of previously unsequenced organisms. The lowest-cost technology can generate deep coverage of most species, including mammals, in just a few days. The sequence data generated by one of these projects consist of millions or billions of short DNA sequences (reads) that range from 50 to 150 nt in length. These sequences must then be assembled de novo before most genome analyses can begin. Unfortunately, genome assembly remains a very difficult problem, made more difficult by shorter reads and unreliable long-range linking information. In this study, we evaluated several of the leading de novo assembly algorithms on four different short-read data sets, all generated by Illumina sequencers. Our results describe the relative performance of the different assemblers as well as other significant differences in assembly difficulty that appear to be inherent in the genomes themselves. Three overarching conclusions are apparent: first, that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome; second, that the degree of contiguity of an assembly varies enormously among different assemblers and different genomes; and third, that the correctness of an assembly also varies widely and is not well correlated with statistics on contiguity. To enable others to replicate our results, all of our data and methods are freely available, as are all assemblers used in this study.


Subjects
Algorithms, Genomics/methods, DNA Sequence Analysis, Animals, Computational Biology/methods, Genome, Bacterial Genome/genetics, Humans, Internet, Reproducibility of Results
8.
Brief Bioinform ; 14(2): 213-24, 2013 Mar.
Article in English | MEDLINE | ID: mdl-22199379

ABSTRACT

Since its launch in 2004, the open-source AMOS project has released several innovative DNA sequence analysis applications including: Hawkeye, a visual analytics tool for inspecting the structure of genome assemblies; the Assembly Forensics and FRCurve pipelines for systematically evaluating the quality of a genome assembly; and AMOScmp, the first comparative genome assembler. These applications have been used to assemble and analyze dozens of genomes ranging in complexity from simple microbial species through mammalian genomes. Recent efforts have been focused on enhancing support for new data characteristics brought on by second- and now third-generation sequencing. This review describes the major components of AMOS in light of these challenges, with an emphasis on methods for assessing assembly quality and the visual analytics capabilities of Hawkeye. These interactive graphical aspects are essential for navigating and understanding the complexities of a genome assembly, from the overall genome structure down to individual bases. Hawkeye and AMOS are available open source at http://amos.sourceforge.net.


Subjects
Genomics/statistics & numerical data, DNA Sequence Analysis/statistics & numerical data, Software, Animals, Computational Biology, Computer Graphics, Data Display, High-Throughput Nucleotide Sequencing/statistics & numerical data, Humans
9.
Bioinformatics ; 29(21): 2669-77, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-23990416

ABSTRACT

MOTIVATION: Second-generation sequencing technologies produce high coverage of the genome by short reads at a low cost, which has prompted development of new assembly methods. In particular, multiple algorithms based on de Bruijn graphs have been shown to be effective for the assembly problem. In this article, we describe a new hybrid approach that has the computational efficiency of de Bruijn graph methods and the flexibility of overlap-based assembly strategies, and which allows variable read lengths while tolerating a significant level of sequencing error. Our method transforms large numbers of paired-end reads into a much smaller number of longer 'super-reads'. The use of super-reads allows us to assemble combinations of Illumina reads of differing lengths together with longer reads from 454 and Sanger sequencing technologies, making it one of the few assemblers capable of handling such mixtures. We call our system the Maryland Super-Read Celera Assembler (abbreviated MaSuRCA and pronounced 'mazurka'). RESULTS: We evaluate the performance of MaSuRCA against two of the most widely used assemblers for Illumina data, Allpaths-LG and SOAPdenovo2, on two datasets from organisms for which high-quality assemblies are available: the bacterium Rhodobacter sphaeroides and chromosome 16 of the mouse genome. We show that MaSuRCA performs on par with or better than Allpaths-LG and significantly better than SOAPdenovo on these data, when evaluated against the finished sequence. We then show that MaSuRCA can significantly improve its assemblies when the original data are augmented with long reads. AVAILABILITY: MaSuRCA is available as open-source code at ftp://ftp.genome.umd.edu/pub/MaSuRCA/. Previous (pre-publication) releases have been publicly available for over a year. CONTACT: alekseyz@ipst.umd.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics/methods, Algorithms, Animals, Bacterial Genome, Mice, Rhodobacter sphaeroides/genetics, DNA Sequence Analysis/methods, Software
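
The abstract of entry 9 does not spell out how reads are turned into longer 'super-reads'. The sketch below illustrates one generic idea from de Bruijn graph assembly that fits the description: extend a read one base at a time for as long as exactly one k-mer in the data continues it, and stop at a dead end or a fork. It is a toy illustration under that assumption, not MaSuRCA's implementation; the function names, the value of `k`, and the reads are all hypothetical.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_kmer_set(reads: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer seen in any read (the de Bruijn graph edges)."""
    seen: Set[str] = set()
    for read in reads:
        seen.update(kmers(read, k))
    return seen


def extend_right(read: str, kmer_set: Set[str], k: int) -> str:
    """Extend a read to the right while the next k-mer is unambiguous."""
    seq = read
    visited = set(kmers(read, k))  # guards against looping around a cycle
    while True:
        suffix = seq[-(k - 1):]
        candidates = [suffix + b for b in "ACGT" if suffix + b in kmer_set]
        # Stop at a dead end, at a fork, or when we would revisit a k-mer.
        if len(candidates) != 1 or candidates[0] in visited:
            return seq
        visited.add(candidates[0])
        seq += candidates[0][-1]


# Three hypothetical overlapping reads from a 13-base toy sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
k = 4
graph = build_kmer_set(reads, k)

print(extend_right(reads[0], graph, k))
```

Here the first read extends unambiguously to `ATGGCGTACCTGA`. Adding a read that introduces a competing k-mer, for example `CGTT`, creates a fork after `CGT`, and the same read is then left unextended.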
10.
Bioinformatics ; 29(14): 1718-25, 2013 Jul 15.
Article in English | MEDLINE | ID: mdl-23665771

ABSTRACT

MOTIVATION: A large and rapidly growing number of bacterial organisms have been sequenced by the newest sequencing technologies. Cheaper and faster sequencing technologies make it easy to generate very high coverage of bacterial genomes, but these advances mean that DNA preparation costs can exceed the cost of sequencing for small genomes. The need to contain costs often results in the creation of only a single sequencing library, which in turn introduces new challenges for genome assembly methods. RESULTS: We evaluated the ability of multiple genome assembly programs to assemble bacterial genomes from a single, deep-coverage library. For our comparison, we chose bacterial species spanning a wide range of GC content and measured the contiguity and accuracy of the resulting assemblies. We compared the assemblies produced by this very high-coverage, one-library strategy to the best assemblies created by two-library sequencing, and we found that remarkably good bacterial assemblies are possible with just one library. We also measured the effect of read length and depth of coverage on assembly quality and determined the values that provide the best results with current algorithms. CONTACT: salzberg@jhu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Bacterial Genome, Genomics/methods, Software, Algorithms, Gene Library, DNA Sequence Analysis
11.
G3 (Bethesda) ; 14(5)2024 05 07.
Article in English | MEDLINE | ID: mdl-38526344

ABSTRACT

Whitebark pine (WBP, Pinus albicaulis) is a white pine of subalpine regions in the Western contiguous United States and Canada. WBP has become critically threatened throughout a significant part of its natural range due to mortality from the introduced fungal pathogen white pine blister rust (WPBR, Cronartium ribicola) and additional threats from mountain pine beetle (Dendroctonus ponderosae), wildfire, and maladaptation due to changing climate. Vast acreages of WBP have suffered nearly complete mortality. Genomic technologies can contribute to a faster, more cost-effective approach to the traditional practices of identifying disease-resistant, climate-adapted seed sources for restoration. With deep-coverage Illumina short reads of haploid megagametophyte tissue and Oxford Nanopore long reads of diploid needle tissue, followed by a hybrid, multistep assembly approach, we produced a final assembly containing 27.6 Gb of sequence in 92,740 contigs (N50 537,007 bp) and 34,716 scaffolds (N50 2.0 Gb). Approximately 87.2% (24.0 Gb) of total sequence was placed on the 12 WBP chromosomes. Annotation yielded 25,362 protein-coding genes, and over 77% of the genome was characterized as repeats. WBP has demonstrated the greatest variation in resistance to WPBR among the North American white pines. Candidate genes for quantitative resistance include disease resistance genes known as nucleotide-binding leucine-rich repeat receptors (NLRs). A combination of protein domain alignments and direct genome scanning was employed to fully describe the 3 subclasses of NLRs. Our high-quality reference sequence and annotation provide a marked improvement in NLR identification compared to previous assessments that leveraged de novo-assembled transcriptomes.


Subjects
Plant Genome, Molecular Sequence Annotation, Pinus, Pinus/genetics, Pinus/parasitology, Genomics/methods, Endangered Species, High-Throughput Nucleotide Sequencing
12.
medRxiv ; 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38585741

ABSTRACT

A common feature of human aging is the acquisition of somatic mutations, and mitochondria are particularly prone to mutation due to their inefficient DNA repair and close proximity to reactive oxygen species, leading to a state of mitochondrial DNA heteroplasmy [1,2]. Cross-sectional studies have demonstrated that detection of heteroplasmy increases with participant age [3], a phenomenon that has been attributed to genetic drift [4-7]. In this first large-scale longitudinal study, we measured heteroplasmy in two prospective cohorts (combined n=1405) at two timepoints (mean time between visits, 8.6 years), demonstrating that deleterious heteroplasmies were more likely to increase in variant allele fraction (VAF). We further demonstrated that increase in VAF was associated with increased risk of overall mortality. These results challenge the claim that somatic mtDNA mutations arise mainly due to genetic drift, instead demonstrating positive selection for predicted deleterious mutations at the cellular level, despite a negative impact on overall mortality.

13.
PLoS Biol ; 8(9)2010 Sep 07.
Article in English | MEDLINE | ID: mdl-20838655

ABSTRACT

A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.


Subjects
Genome, Turkeys/genetics, Animals, Base Sequence, Chromosome Mapping, DNA/genetics, Single Nucleotide Polymorphism, DNA Sequence Analysis, Nucleic Acid Sequence Homology, Species Specificity
14.
bioRxiv ; 2023 Jul 31.
Article in English | MEDLINE | ID: mdl-37577699

ABSTRACT

We re-analyzed the data from a recent large-scale study that reported strong correlations between microbial organisms and 33 different cancer types, and that created machine learning predictors with near-perfect accuracy at distinguishing among cancers. We found at least two fundamental flaws in the reported data and in the methods: (1) errors in the genome database and the associated computational methods led to millions of false positive findings of bacterial reads across all samples, largely because most of the sequences identified as bacteria were instead human; and (2) errors in transformation of the raw data created an artificial signature, even for microbes with no reads detected, tagging each tumor type with a distinct signal that the machine learning programs then used to create an apparently accurate classifier. Each of these problems invalidates the results, leading to the conclusion that the microbiome-based classifiers for identifying cancer presented in the study are entirely wrong. These flaws have subsequently affected more than a dozen additional published studies that used the same data and whose results are likely invalid as well.

15.
mBio ; 14(5): e0160723, 2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37811944

ABSTRACT

IMPORTANCE: Recent reports showing that human cancers have a distinctive microbiome have led to a flurry of papers describing microbial signatures of different cancer types. Many of these reports are based on flawed data that, upon re-analysis, completely overturns the original findings. The re-analysis conducted here shows that most of the microbes originally reported as associated with cancer were not present at all in the samples. The original report of a cancer microbiome and more than a dozen follow-up studies are, therefore, likely to be invalid.


Subjects
Microbiota, Neoplasms, Humans, Computational Biology, Metagenomics, Data Analysis
16.
Genome Biol ; 24(1): 249, 2023 10 30.
Article in English | MEDLINE | ID: mdl-37904256

ABSTRACT

CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess.


Subjects
Human Genome, Proteins, Humans, Phylogeny, Proteins/genetics, Algorithms, Software, Molecular Sequence Annotation
17.
bioRxiv ; 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-38014212

ABSTRACT

Whitebark pine (WBP, Pinus albicaulis) is a white pine of subalpine regions in western contiguous US and Canada. WBP has become critically threatened throughout a significant part of its natural range due to mortality from the introduced fungal pathogen white pine blister rust (WPBR, Cronartium ribicola) and additional threats from mountain pine beetle (Dendroctonus ponderosae), wildfire, and maladaptation due to changing climate. Vast acreages of WBP have suffered nearly complete mortality. Genomic technologies can contribute to a faster, more cost-effective approach to the traditional practices of identifying disease-resistant, climate-adapted seed sources for restoration. With deep-coverage Illumina short-reads of haploid megagametophyte tissue and Oxford Nanopore long-reads of diploid needle tissue, followed by a hybrid, multistep assembly approach, we produced a final assembly containing 27.6 Gbp of sequence in 92,740 contigs (N50 537,007 bp) and 34,716 scaffolds (N50 2.0 Gbp). Approximately 87.2% (24.0 Gbp) of total sequence was placed on the twelve WBP chromosomes. Annotation yielded 25,362 protein-coding genes, and over 77% of the genome was characterized as repeats. WBP has demonstrated the greatest variation in resistance to WPBR among the North American white pines. Candidate genes for quantitative resistance include disease resistance genes known as nucleotide-binding leucine-rich-repeat receptors (NLRs). A combination of protein domain alignments and direct genome scanning was employed to fully describe the three subclasses of NLRs (TNL, CNL, RNL). Our high-quality reference sequence and annotation provide a marked improvement in NLR identification compared to previous assessments that leveraged de novo assembled transcriptomes.

18.
Nat Commun ; 14(1): 6113, 2023 09 30.
Article in English | MEDLINE | ID: mdl-37777527

ABSTRACT

Mitochondria carry their own circular genome and disruption of the mitochondrial genome is associated with various aging-related diseases. Unlike the nuclear genome, mitochondrial DNA (mtDNA) can be present at 1000s to 10,000s of copies in somatic cells and variants may exist in a state of heteroplasmy, where only a fraction of the DNA molecules harbors a particular variant. We quantify mtDNA heteroplasmy in 194,871 participants in the UK Biobank and find that heteroplasmy is associated with a 1.5-fold increased risk of all-cause mortality. Additionally, we functionally characterize mtDNA single nucleotide variants (SNVs) using a constraint-based score, mitochondrial local constraint score sum (MSS) and find it associated with all-cause mortality, and with the prevalence and incidence of cancer and cancer-related mortality, particularly leukemia. These results indicate that mitochondria may have a functional role in certain cancers, and mitochondrial heteroplasmic SNVs may serve as a prognostic marker for cancer, especially for leukemia.


Subjects
Leukemia, Mitochondria, Humans, Mitochondria/genetics, Mitochondrial DNA/genetics, Heteroplasmy, Leukemia/genetics, Mutation
19.
Genetics ; 220(2)2022 02 04.
Article in English | MEDLINE | ID: mdl-34897437

ABSTRACT

Until 2019, the human genome was available in only one fully annotated version, GRCh38, which was the result of 18 years of continuous improvement and revision. Despite dramatic improvements in sequencing technology, no other genome was available as an annotated reference until 2019, when the genome of an Ashkenazi individual, Ash1, was released. In this study, we describe the assembly and annotation of a second individual genome, from a Puerto Rican individual whose DNA was collected as part of the Human Pangenome project. The new genome, called PR1, is the first true reference genome created from an individual of African descent. Due to recent improvements in both sequencing and assembly technology, and particularly to the use of the recently completed CHM13 human genome as a guide to assembly, PR1 is more complete and more contiguous than either GRCh38 or Ash1. Annotation revealed 37,755 genes (of which 19,999 are protein coding), including 12 additional gene copies that are present in PR1 and missing from CHM13. Fifty-seven genes have fewer copies in PR1 than in CHM13, 9 map only partially, and 3 genes (all noncoding) from CHM13 are entirely missing from PR1.


Subjects
Black People, Human Genome, Hispanic or Latino/genetics, Humans, Molecular Sequence Annotation
20.
Nat Commun ; 13(1): 2047, 2022 04 19.
Article in English | MEDLINE | ID: mdl-35440538

ABSTRACT

The genus Quercus, which emerged ∼55 million years ago during globally warm temperatures, diversified into ∼450 extant species. We present a high-quality de novo genome assembly of a California endemic oak, Quercus lobata, revealing features consistent with oak evolutionary success. Effective population size remained large throughout history despite declining since early Miocene. Analysis of 39,373 mapped protein-coding genes outlined copious duplications consistent with genetic and phenotypic diversity, both by retention of genes created during the ancient γ whole genome hexaploid duplication event and by tandem duplication within families, including numerous resistance genes and a very large block of duplicated DUF247 genes, which have been found to be associated with self-incompatibility in grasses. An additional surprising finding is that subcontext-specific patterns of DNA methylation associated with transposable elements reveal broadly-distributed heterochromatin in intergenic regions, similar to grasses. Collectively, these features promote genetic and phenotypic variation that would facilitate adaptability to changing environments.


Subjects
Quercus, Biological Evolution, DNA Methylation/genetics, Epigenome, Molecular Evolution, Humans, Quercus/genetics