1.
BMC Genomics ; 24(1): 226, 2023 May 01.
Article in English | MEDLINE | ID: mdl-37127568

ABSTRACT

Open reading frames (ORFs) with fewer than 100 codons are generally not annotated in genomes, although bona fide genes of that size are known. Newer biochemical studies have suggested that thousands of small protein-coding ORFs (smORFs) may exist in the human genome, but the true number and the biological significance of the micropeptides they encode remain uncertain. Here, we used a comparative genomics approach to identify high-confidence smORFs that are likely protein-coding. We identified 3,326 high-confidence smORFs using constraint within human populations and evolutionary conservation as additional lines of evidence. Next, we validated that, as a group, our high-confidence smORFs are conserved at the amino-acid level rather than merely residing in highly conserved non-coding regions. Finally, we found that high-confidence smORFs are enriched among disease-associated variants from GWAS. Overall, our results highlight that smORF-encoded peptides likely have important functional roles in human disease.


Subjects
Peptides , Proteins , Humans , Open Reading Frames , Proteins/genetics , Peptides/genetics , Human Genome , Micropeptides
2.
Genet Med ; 25(1): 16-26, 2023 01.
Article in English | MEDLINE | ID: mdl-36305854

ABSTRACT

PURPOSE: This study aimed to explore whether evidence of pathogenicity from prior variant classifications in ClinVar could be used to inform variant interpretation using the American College of Medical Genetics and Genomics/Association for Molecular Pathology clinical guidelines. METHODS: We identified distinct single-nucleotide variants (SNVs) that are either similar in location or in functional consequence to pathogenic variants in ClinVar and analyzed evidence in support of pathogenicity using 3 interpretation criteria. RESULTS: Thousands of variants, including many in clinically actionable disease genes (American College of Medical Genetics and Genomics secondary findings v3.0), have evidence of pathogenicity from existing variant classifications, accounting for 2.5% of nonsynonymous SNVs within ClinVar. Notably, there are many variants with uncertain or conflicting classifications that cause the same amino acid substitution as other pathogenic variants (PS1, N = 323), variants that are predicted to cause different amino acid substitutions in the same codon as pathogenic variants (PM5, N = 7692), and loss-of-function variants that are present in genes in which many loss-of-function variants are classified as pathogenic (PVS1, N = 3635). Most of these variants have similar computational predictions of pathogenicity and splicing effect as their associated pathogenic variants. CONCLUSION: Broadly, for >1.4 million SNVs exome wide, information from previously classified variants could be used to provide evidence of pathogenicity. We have developed a pipeline to identify variants meeting these criteria that may inform interpretation efforts.


Subjects
Genetic Testing , Genomics , Humans , Exome , RNA Splicing , Molecular Pathology , Genetic Variation/genetics
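
The PS1 and PM5 comparisons described in the abstract above can be illustrated with a minimal sketch. The record layout `(gene, codon, reference residue, alternate residue)`, the function name `classify_candidates`, and the example gene name are hypothetical and are not taken from the authors' pipeline. The sketch also simplifies PS1, which in practice concerns a different nucleotide change producing the same amino acid substitution.

```python
from collections import defaultdict


def classify_candidates(pathogenic, candidates):
    """Label each candidate variant as "PS1", "PM5", or None.

    Each variant is a hypothetical tuple:
    (gene, codon position, reference amino acid, alternate amino acid).
    """
    # Index known pathogenic substitutions by (gene, codon, reference residue).
    known = defaultdict(set)
    for gene, codon, ref, alt in pathogenic:
        known[(gene, codon, ref)].add(alt)

    labels = {}
    for variant in candidates:
        gene, codon, ref, alt = variant
        known_alts = known.get((gene, codon, ref), set())
        if alt in known_alts:
            # Same amino acid substitution as a known pathogenic variant.
            labels[variant] = "PS1"
        elif known_alts:
            # Different substitution in a codon that has a pathogenic variant.
            labels[variant] = "PM5"
        else:
            # No supporting evidence in this index.
            labels[variant] = None
    return labels
```

A real pipeline would also need transcript-aware annotation and a check of the underlying nucleotide change, both of which this sketch leaves out.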
3.
Dev Biol ; 424(2): 181-188, 2017 04 15.
Article in English | MEDLINE | ID: mdl-28283406

ABSTRACT

We characterize the genetic diversity of Xenopus laevis strains using RNA-seq data and allele-specific analysis. This data provides a catalogue of coding variation, which can be used for improving the genomic sequence, as well as for better sequence alignment, probe design, and proteomic analysis. In addition, we paint a broad picture of the genetic landscape of the species by functionally annotating different classes of mutations with a well-established prediction tool (PolyPhen-2). Further, we specifically compare the variation in the progeny of four crosses: inbred genomic (J)-strain, outbred albino (B)-strain, and two hybrid crosses of J and B strains. We identify a subset of mutations specific to the B strain, which allows us to investigate the selection pressures affecting duplicated genes in this allotetraploid. From these crosses we find the ratio of non-synonymous to synonymous mutations is lower in duplicated genes, which suggests that they are under greater purifying selection. Surprisingly, we also find that function-altering ("damaging") mutations constitute a greater fraction of the non-synonymous variants in this group, which suggests a role for subfunctionalization in coding variation affecting duplicated genes.


Subjects
Genetic Variation , Open Reading Frames/genetics , Transcriptome/genetics , Xenopus laevis/genetics , Animals , Base Sequence , Genetic Crosses , Gene Duplication , Genome , Genetic Hybridization , Inbreeding , Mass Spectrometry , Missense Mutation/genetics , Single Nucleotide Polymorphism/genetics , Reproducibility of Results , RNA Sequence Analysis , Xenopus Proteins/chemistry , Xenopus Proteins/genetics , Xenopus Proteins/metabolism
4.
Genet Med ; 20(9): 936-941, 2018 09.
Article in English | MEDLINE | ID: mdl-29388949

ABSTRACT

PURPOSE: Over 150,000 variants have been reported to cause Mendelian disease in the medical literature. It is still difficult to leverage this knowledge base in clinical practice, as many reports lack strong statistical evidence or may include false associations. Clinical laboratories assess whether these variants (along with newly observed variants that are adjacent to these published ones) underlie clinical disorders. METHODS: We investigated whether citation data, including journal impact factor and the number of cited variants (NCV) in each gene with published disease associations, can be used to improve variant assessment. RESULTS: Surprisingly, we found that impact factor is not predictive of pathogenicity, but the NCV score for each gene can provide statistical support for prediction of pathogenicity. When this gene-level citation metric is combined with variant-level evolutionary conservation and structural features, classification accuracy reaches 89.5%. Further, variants identified in clinical exome sequencing cases have higher NCVs than do simulated rare variants from the Exome Aggregation Consortium database within the same set of genes and functional consequences (P < 2.22 × 10⁻¹⁶). CONCLUSION: Aggregate citation data can complement existing variant-based predictive algorithms, and can boost their performance without the need to access and review large numbers of papers. The NCV is a slow-growing metric of scientific knowledge about each gene's association with disease.


Subjects
Computational Biology/methods , Genome-Wide Association Study/methods , Algorithms , Genetic Databases , Forecasting , Genetic Variation , Humans , Journal Impact Factor
5.
Genome Res ; 22(8): 1541-8, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22555591

ABSTRACT

Genetic mapping of mutations in model systems has facilitated the identification of genes contributing to fundamental biological processes including human diseases. However, this approach has historically required the prior characterization of informative markers. Here we report a fast and cost-effective method for genetic mapping using next-generation sequencing that combines single nucleotide polymorphism discovery, mutation localization, and potential identification of causal sequence variants. In contrast to prior approaches, we have developed a hidden Markov model to narrowly define the mutation area by inferring recombination breakpoints of chromosomes in the mutant pool. In addition, we created an interactive online software resource to facilitate automated analysis of sequencing data and demonstrate its utility in the zebrafish and mouse models. Our novel methodology and online tools will make next-generation sequencing an easily applicable resource for mutation mapping in all model systems.


Subjects
DNA Mutational Analysis/methods , Software , Zebrafish/genetics , Alleles , Animals , Chromosome Mapping/methods , Chromosomes/genetics , Genetic Crosses , Female , Gene Frequency , Genomics/methods , Homozygote , Male , Markov Chains , Mice , Inbred C57BL Mice , Mutation , Single Nucleotide Polymorphism , Genetic Recombination , Time Factors
6.
Biomedicines ; 12(1)2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38255267

ABSTRACT

We hypothesized that subjects with heterozygous loss-of-function (LoF) ACE mutations are at risk for Alzheimer's disease (AD) because amyloid Aβ42, a primary component of the protein aggregates that accumulate in the brains of AD patients, is cleaved by ACE (angiotensin I-converting enzyme). Thus, decreased ACE activity in the brain, either due to genetic mutation or the effects of ACE inhibitors, could be a risk factor for AD. To explore this hypothesis in the current study, existing SNP databases were analyzed for LoF ACE mutations using four prediction tools, including PolyPhen-2, and compared with the topology of known ACE mutations already associated with AD. The combined frequency of >400 of these LoF-damaging ACE mutations in the general population is quite significant (up to 5%), comparable to the frequency of AD in the population over 70 years old, which indicates that the contribution of low ACE to the development of AD could be underappreciated. Our analysis suggests several mechanisms by which ACE mutations may be associated with Alzheimer's disease. Systematic analysis of blood ACE levels in patients with all ACE mutations is likely to have clinical significance because available sequencing data will help detect persons with increased risk of late-onset Alzheimer's disease. Patients with transport-deficient ACE mutations (about 20% of damaging ACE mutations) may benefit from preventive or therapeutic treatment with a combination of chemical and pharmacological (e.g., centrally acting ACE inhibitors) chaperones and proteasome inhibitors to restore impaired surface ACE expression, as was shown previously by our group for another transport-deficient ACE mutation, Q1069R.

7.
Nature ; 447(7146): 799-816, 2007 Jun 14.
Article in English | MEDLINE | ID: mdl-17571346

ABSTRACT

We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.


Subjects
Human Genome/genetics , Genomics , Nucleic Acid Regulatory Sequences/genetics , Genetic Transcription/genetics , Chromatin/genetics , Chromatin/metabolism , Chromatin Immunoprecipitation , Conserved Sequence/genetics , DNA Replication , Molecular Evolution , Exons/genetics , Genetic Variation/genetics , Heterozygote , Histones/metabolism , Humans , Pilot Projects , Protein Binding , Messenger RNA/genetics , Untranslated RNA/genetics , Transcription Factors/metabolism , Transcription Initiation Site
8.
Nat Commun ; 14(1): 2230, 2023 04 19.
Article in English | MEDLINE | ID: mdl-37076482

ABSTRACT

Despite the increasing use of genomic sequencing in clinical practice, the interpretation of rare genetic variants remains challenging even in well-studied disease genes, resulting in many patients with Variants of Uncertain Significance (VUSs). Computational Variant Effect Predictors (VEPs) provide valuable evidence in variant assessment, but they are prone to misclassifying benign variants, contributing to false positives. Here, we develop Deciphering Mutations in Actionable Genes (DeMAG), a supervised classifier for missense variants trained using extensive diagnostic data available in 59 actionable disease genes (American College of Medical Genetics and Genomics Secondary Findings v2.0, ACMG SF v2.0). DeMAG improves performance over existing VEPs by reaching balanced specificity (82%) and sensitivity (94%) on clinical data, and includes a novel epistatic feature, the 'partners score', which leverages evolutionary and structural partnerships of residues. The 'partners score' provides a general framework for modeling epistatic interactions, integrating both clinical and functional information. We provide our tool and predictions for all missense variants in 316 clinically actionable disease genes (demag.org) to facilitate the interpretation of variants and improve clinical decision-making.


Subjects
Genomics , Missense Mutation , Humans , United States , Genomics/methods , Genetic Variation , Genetic Testing/methods
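
The specificity and sensitivity figures quoted in the abstract above are standard confusion-matrix metrics. As a minimal sketch, they can be computed from binary labels as follows. The function name `sensitivity_specificity` and the toy labels are hypothetical and do not come from DeMAG itself.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for binary labels (1 = pathogenic, 0 = benign)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # pathogenic, called pathogenic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # pathogenic, called benign
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, called benign
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, called pathogenic
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one pathogenic and one benign example")
    sensitivity = tp / (tp + fn)  # fraction of pathogenic variants detected
    specificity = tn / (tn + fp)  # fraction of benign variants correctly cleared
    return sensitivity, specificity
```

Low specificity corresponds to the false positives mentioned in the abstract, that is, benign variants misclassified as damaging.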
9.
Nature ; 433(7026): 633-8, 2005 Feb 10.
Article in English | MEDLINE | ID: mdl-15660107

ABSTRACT

Amino acid composition of proteins varies substantially between taxa and, thus, can evolve. For example, proteins from organisms with (G + C)-rich (or (A + T)-rich) genomes contain more (or fewer) amino acids encoded by (G + C)-rich codons. However, no universal trends in ongoing changes of amino acid frequencies have been reported. We compared sets of orthologous proteins encoded by triplets of closely related genomes from 15 taxa representing all three domains of life (Bacteria, Archaea and Eukaryota), and used phylogenies to polarize amino acid substitutions. Cys, Met, His, Ser and Phe accrue in at least 14 taxa, whereas Pro, Ala, Glu and Gly are consistently lost. The same nine amino acids are currently accrued or lost in human proteins, as shown by analysis of non-synonymous single-nucleotide polymorphisms. All amino acids with declining frequencies are thought to be among the first incorporated into the genetic code; conversely, all amino acids with increasing frequencies, except Ser, were probably recruited late. Thus, expansion of initially under-represented amino acids, which began over 3,400 million years ago, apparently continues to this day.


Subjects
Amino Acids/genetics , Molecular Evolution , Genome , Proteins/chemistry , Proteins/genetics , AT Rich Sequence/genetics , Amino Acid Substitution/genetics , Animals , Archaea/genetics , Bacteria/genetics , Base Composition , Eukaryotic Cells/metabolism , GC Rich Sequence/genetics , Humans , Phylogeny , Single Nucleotide Polymorphism/genetics , Prokaryotic Cells/metabolism
10.
PLoS Genet ; 4(12)2008 Dec.
Article in English | MEDLINE | ID: mdl-19096535

ABSTRACT

This corrects the article "Hypermutable Non-Synonymous Sites Are Under Stronger Negative Selection" on p. e1000281 in Vol. 4, PMID: 19043566.


Subjects
Genetic Variation , Mutation , Genetic Selection , Molecular Evolution
11.
PLoS Genet ; 4(11): e1000281, 2008 11.
Article in English | MEDLINE | ID: mdl-19043566

ABSTRACT

Mutation rate varies greatly between nucleotide sites of the human genome and depends both on the global genomic location and the local sequence context of a site. In particular, CpG context elevates the mutation rate by an order of magnitude. Mutations also vary widely in their effect on the molecular function, phenotype, and fitness. Independence between the probability that a new mutation occurs and its effect has been a fundamental premise in genetics. However, highly mutable contexts may be preserved by negative selection at important sites but destroyed by mutation at sites under no selection. Thus, there may be a positive correlation between the rate of mutations at a nucleotide site and the magnitude of their effect on fitness. We studied the impact of CpG context on the rate of human-chimpanzee divergence and on intrahuman nucleotide diversity at non-synonymous coding sites. We compared nucleotides that occupy identical positions within codons of identical amino acids and only differ by being within versus outside CpG context. Nucleotides within CpG context are under a stronger negative selection, as revealed by their lower, proportionally to the mutation rate, rate of evolution and nucleotide diversity. In particular, the probability of fixation of a non-synonymous transition at a CpG site is two times lower than at a non-CpG site. Thus, sites with different mutation rates are not necessarily selectively equivalent. This suggests that the mutation rate may complement sequence conservation as a characteristic predictive of functional importance of nucleotide sites.


Subjects
CpG Islands/genetics , Mutation , Genetic Selection , Animals , Molecular Evolution , Genome , Human Genome , Humans , Pan troglodytes/genetics
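
The comparison in the abstract above depends on deciding whether a nucleotide lies within or outside CpG context. A minimal sketch of that test follows. The function names `in_cpg_context` and `partition_sites` are hypothetical, and the sketch considers a single strand of a plain DNA string, not the authors' codon-matched analysis.

```python
def in_cpg_context(sequence, index):
    """Return True if the base at `index` is the C or the G of a CpG dinucleotide."""
    seq = sequence.upper()
    base = seq[index]
    if base == "C":
        # A C is in CpG context when the next base is G.
        return index + 1 < len(seq) and seq[index + 1] == "G"
    if base == "G":
        # A G is in CpG context when the previous base is C.
        return index > 0 and seq[index - 1] == "C"
    return False


def partition_sites(sequence):
    """Split all positions into those inside and those outside CpG context."""
    inside = [i for i in range(len(sequence)) if in_cpg_context(sequence, i)]
    outside = [i for i in range(len(sequence)) if not in_cpg_context(sequence, i)]
    return inside, outside
```

For the sequence `ACGT`, positions 1 and 2 fall inside CpG context and positions 0 and 3 fall outside it.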
13.
Nat Genet ; 47(2): 126-31, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25581429

ABSTRACT

Non-African populations have experienced size reductions in the time since their split from West Africans, leading to the hypothesis that natural selection to remove weakly deleterious mutations has been less effective in the history of non-Africans. To test this hypothesis, we measured the per-genome accumulation of nonsynonymous substitutions across diverse pairs of populations. We find no evidence for a higher load of deleterious mutations in non-Africans. However, we detect significant differences among more divergent populations, as archaic Denisovans have accumulated nonsynonymous mutations faster than either modern humans or Neanderthals. To reconcile these findings with patterns that have been interpreted as evidence of the less effective removal of deleterious mutations in non-Africans than in West Africans, we use simulations to show that the observed patterns are not likely to reflect changes in the effectiveness of selection after the populations split but are instead likely to be driven by other population genetic factors.


Subjects
Black People/genetics , Human Genome/genetics , Neanderthals/genetics , Genetic Selection/physiology , White People/genetics , Amino Acid Substitution , Animals , Computer Simulation , Gene Frequency , Genetic Variation , Population Genetics , Humans , Genetic Models , Mutation
14.
Curr Protoc Hum Genet ; Chapter 7: Unit7.20, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23315928

ABSTRACT

PolyPhen-2 (Polymorphism Phenotyping v2), available as software and via a Web server, predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations. It performs functional annotation of single-nucleotide polymorphisms (SNPs), maps coding SNPs to gene transcripts, extracts protein sequence annotations and structural attributes, and builds conservation profiles. It then estimates the probability of the missense mutation being damaging based on a combination of all these properties. PolyPhen-2 features include a high-quality multiple protein sequence alignment pipeline and a prediction method employing machine-learning classification. The software also integrates the UCSC Genome Browser's human genome annotations and MultiZ multiple alignments of vertebrate genomes with the human genome. PolyPhen-2 is capable of analyzing large volumes of data produced by next-generation sequencing projects, thanks to built-in support for high-performance computing environments like Grid Engine and Platform LSF.


Subjects
Computational Biology/methods , Missense Mutation , Proteins/genetics , Software , Genetic Databases , Humans , Internet , Single Nucleotide Polymorphism , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Search Engine
15.
Nat Genet ; 41(4): 393-5, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19287383

ABSTRACT

Eukaryotic DNA replication is highly stratified, with different genomic regions shown to replicate at characteristic times during S phase. Here we observe that mutation rate, as reflected in recent evolutionary divergence and human nucleotide diversity, is markedly increased in later-replicating regions of the human genome. All classes of substitutions are affected, suggesting a generalized mechanism involving replication time-dependent DNA damage. This correlation between mutation rate and regionally stratified replication timing may have substantial evolutionary implications.


Subjects
DNA Replication/genetics , Mutation , Single Nucleotide Polymorphism/genetics , Genetic Variation , Human Genome , Humans , Kinetics , Genetic Models
16.
J Proteomics ; 71(3): 346-56, 2008 Aug 21.
Article in English | MEDLINE | ID: mdl-18639657

ABSTRACT

Homology-driven proteomics is a major tool to characterize proteomes of organisms with unsequenced genomes. This paper addresses practical aspects of automated homology-driven protein identifications by LC-MS/MS on a hybrid LTQ Orbitrap mass spectrometer. All essential software elements supporting the presented pipeline are either hosted at a publicly accessible web server or available for free download.


Subjects
Mass Spectrometry/methods , Proteins/analysis , Proteomics/methods , Amino Acid Sequence , Animals , Liquid Chromatography/methods , Computational Biology/methods , Protein Databases , Genome , Insect Proteins/chemistry , Internet , Molecular Sequence Data , Peptide Mapping , Plant Proteins/chemistry , Proteins/chemistry , Software
17.
J Proteome Res ; 7(8): 3382-95, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18558732

ABSTRACT

Only a small fraction of spectra acquired in LC-MS/MS runs matches peptides from target proteins upon database searches. The remaining, operationally termed background, spectra originate from a variety of poorly controlled sources and affect the throughput and confidence of database searches. Here, we report an algorithm and its software implementation that rapidly removes background spectra, regardless of their precise origin. The method estimates the dissimilarity distance between screened MS/MS spectra and unannotated spectra from a partially redundant background library compiled from several control and blank runs. Filtering MS/MS queries enhanced the protein identification capacity when searches lacked spectrum to sequence matching specificity. In sequence-similarity searches it reduced by, on average, 30-fold the number of orphan hits, which were not explicitly related to background protein contaminants and required manual validation. Removing high quality background MS/MS spectra, while preserving in the data set the genuine spectra from target proteins, decreased the false positive rate of stringent database searches and improved the identification of low-abundance proteins.


Subjects
Proteins/analysis , Animals , Liquid Chromatography , Factual Databases , HeLa Cells , Humans , Insect Proteins/analysis , Plant Proteins/analysis , Proteomics , Software , Tandem Mass Spectrometry , Tracheophyta , Triatoma