Search | BVS Regional Portal

1.

Revisiting mutagenesis at non-B DNA motifs in the human genome.

McGinty, R J; Sunyaev, S R.

Nat Struct Mol Biol ; 30(4): 417-424, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36914796

ABSTRACT

Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping and interrupted motifs and recurrent sequencing errors. Here, we show that accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically shrinks the magnitude of mutagenesis within some motifs, contradicting previous reports. Mutagenesis not attributable to artifacts revealed several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat motif, implicating slipped-strand structures. Interruption-correcting single nucleotide variants within short tandem repeats may originate from error-prone polymerases. Secondary-structure formation promotes single nucleotide variants within palindromic repeats and duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, whereas mutagenesis at Z-DNAs is conspicuously absent.

Subjects

DNA; Genome, Human; Humans; Nucleotide Motifs/genetics; Mutagenesis; DNA/genetics; DNA/chemistry; Nucleotides

2.

AnFiSA: An open-source computational platform for the analysis of sequencing data for rare genetic disease.

Bouzinier, M A; Etin, D; Trifonov, S I; Evdokimova, V N; Ulitin, V; Shen, J; Kokorev, A; Ghazani, A A; Chekaluk, Y; Albertyn, Z; Giersch, A; Morton, C C; Abraamyan, F; Bendapudi, P K; Sunyaev, S; Krier, J B.

J Biomed Inform ; 133: 104174, 2022 Sep.

Article in English | MEDLINE | ID: mdl-35998814

ABSTRACT

Despite genomic sequencing rapidly transforming from being a bench-side tool to a routine procedure in a hospital, there is a noticeable lack of genomic analysis software that supports both clinical and research workflows as well as crowdsourcing. Furthermore, most existing software packages are not forward-compatible with regard to supporting ever-changing diagnostic rules adopted by the genetics community. Regular updates of genomics databases pose challenges for reproducible and traceable automated genetic diagnostics tools. Lastly, most of the software tools score low on explainability amongst clinicians. We have created a fully open-source variant curation tool, AnFiSA, with the intention to invite and accept contributions from clinicians, researchers, and professional software developers. The design of AnFiSA addresses the aforementioned issues via the following architectural principles: using a multidimensional database management system (DBMS) for genomic data to address reproducibility, curated decision trees adaptable to changing clinical rules, and a crowdsourcing-friendly interface to address difficult-to-diagnose cases. We discuss how we have chosen our technology stack and describe the design and implementation of the software. Finally, we show in detail how selected workflows can be implemented using the current version of AnFiSA by a medical geneticist.

Subjects

Genomics; Software; Computational Biology/methods; Database Management Systems; Databases, Genetic; Genomics/methods; Reproducibility of Results; Workflow

3.

Guidelines for investigating causality of sequence variants in human disease.

MacArthur, D G; Manolio, T A; Dimmock, D P; Rehm, H L; Shendure, J; Abecasis, G R; Adams, D R; Altman, R B; Antonarakis, S E; Ashley, E A; Barrett, J C; Biesecker, L G; Conrad, D F; Cooper, G M; Cox, N J; Daly, M J; Gerstein, M B; Goldstein, D B; Hirschhorn, J N; Leal, S M; Pennacchio, L A; Stamatoyannopoulos, J A; Sunyaev, S R; Valle, D; Voight, B F; Winckler, W; Gunter, C.

Nature ; 508(7497): 469-76, 2014 Apr 24.

Article in English | MEDLINE | ID: mdl-24759409

ABSTRACT

The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development.

Subjects

Disease; Genetic Predisposition to Disease/genetics; Genetic Variation/genetics; Guidelines as Topic; False Positive Reactions; Genes/genetics; Humans; Information Dissemination; Publishing; Reproducibility of Results; Research Design; Translational Research, Biomedical/standards

4.

Charting the proteomes of organisms with unsequenced genomes by MALDI-quadrupole time-of-flight mass spectrometry and BLAST homology searching.

Shevchenko, A; Sunyaev, S; Loboda, A; Shevchenko, A; Bork, P; Ens, W; Standing, K G.

Anal Chem ; 73(9): 1917-26, 2001 May 01.

Article in English | MEDLINE | ID: mdl-11354471

ABSTRACT

MALDI-quadrupole time-of-flight mass spectrometry was applied to identify proteins from organisms whose genomes are still unknown. The identification was carried out by successively searching a sequence database: first with a peptide mass fingerprint, then with a packet of noninterpreted MS/MS spectra, and finally with peptide sequences obtained by automated interpretation of the MS/MS spectra. An "MS BLAST" homology searching protocol was developed to overcome specific limitations imposed by mass spectrometric data, such as the limited accuracy of de novo sequence predictions. This approach was tested in a small-scale proteomic project involving the identification of 15 bands of gel-separated proteins from the methylotrophic yeast Pichia pastoris, whose genome has not yet been sequenced and which is only distantly related to other fungi.

Subjects

Databases, Factual; Genome, Fungal; Pichia/genetics; Proteome/chemistry; Sequence Analysis, Protein/methods; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization; Algorithms; Amino Acid Sequence; Animals; Cell Line; Dogs; Kidney/cytology; Membrane Proteins/chemistry; Molecular Sequence Data; Peptide Mapping/instrumentation; Peptide Mapping/methods; Trypsin/metabolism

5.

Prediction of deleterious human alleles.

Sunyaev, S; Ramensky, V; Koch, I; Lathe, W; Kondrashov, A S; Bork, P.

Hum Mol Genet ; 10(6): 591-7, 2001 Mar 15.

Article in English | MEDLINE | ID: mdl-11230178

ABSTRACT

Single nucleotide polymorphisms (SNPs) constitute the bulk of human genetic variation, occurring with an average density of approximately 1/1000 nucleotides of a genotype. SNPs are either neutral allelic variants or are under selection of various strengths, and the impact of SNPs on fitness remains unknown. Identification of SNPs affecting human phenotype, especially leading to risks of complex disorders, is one of the key problems of medical genetics. SNPs in protein-coding regions that cause amino acid variants (non-synonymous cSNPs) are most likely to affect phenotypes. We have developed a straightforward and reliable method based on physical and comparative considerations that estimates the impact of an amino acid replacement on the three-dimensional structure and function of the protein. We estimate that approximately 20% of common human non-synonymous SNPs damage the protein. The average minor allele frequency of such SNPs in our data set was two times lower than that of benign non-synonymous SNPs. The average human genotype carries approximately 10^3 damaging non-synonymous SNPs that together cause a substantial reduction in fitness.

Subjects

Gene Deletion; Gene Frequency/genetics; Polymorphism, Single Nucleotide; Alleles; Amino Acid Substitution/genetics; Genetic Variation; Genotype; Humans; Models, Molecular; Protein Conformation; Selection, Genetic

6.

Integration of genome data and protein structures: prediction of protein folds, protein interactions and "molecular phenotypes" of single nucleotide polymorphisms.

Sunyaev, S; Lathe, W; Bork, P.

Curr Opin Struct Biol ; 11(1): 125-30, 2001 Feb.

Article in English | MEDLINE | ID: mdl-11179902

ABSTRACT

With the massive amount of sequence and structural data being produced, new avenues emerge for exploiting the information therein for applications in several fields. Fold distributions can be mapped onto entire genomes to learn about the nature of the protein universe and many of the interactions between proteins can now be predicted solely on the basis of the genomic context of their genes. Furthermore, by utilising the new incoming data on single nucleotide polymorphisms by mapping them onto three-dimensional structures of proteins, problems concerning population, medical and evolutionary genetics can be addressed.

Subjects

Genomics/methods; Polymorphism, Single Nucleotide; Protein Binding; Protein Folding; Apolipoproteins E/chemistry; Apolipoproteins E/genetics; Forecasting/methods; Models, Theoretical; Phenotype; Sequence Homology, Amino Acid

7.

SNP frequencies in human genes: an excess of rare alleles and differing modes of selection.

Sunyaev, S R; Lathe, W C; Ramensky, V E; Bork, P.

Trends Genet ; 16(8): 335-7, 2000 Aug.

Article in English | MEDLINE | ID: mdl-10904261

Subjects

Alleles; Gene Frequency; Polymorphism, Single Nucleotide/genetics; Selection, Genetic; Humans; Models, Genetic

8.

Individual variation in protein-coding sequences of human genome.

Sunyaev, S; Hanke, J; Brett, D; Aydin, A; Zastrow, I; Lathe, W; Bork, P; Reich, J.

Adv Protein Chem ; 54: 409-37, 2000.

Article in English | MEDLINE | ID: mdl-10829234

Subjects

Genetic Variation; Genome, Human; Proteins/genetics; Alternative Splicing; Expressed Sequence Tags; Humans; Polymorphism, Genetic; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Sequence Analysis, Protein

9.

Towards a structural basis of human non-synonymous single nucleotide polymorphisms.

Sunyaev, S; Ramensky, V; Bork, P.

Trends Genet ; 16(5): 198-200, 2000 May.

Article in English | MEDLINE | ID: mdl-10782110

Subjects

Polymorphism, Single Nucleotide; Proteins/chemistry; Proteins/genetics; Databases, Factual; Genetic Diseases, Inborn/genetics; Genetic Variation; Humans; Mutation

10.

PSIC: profile extraction from sequence alignments with position-specific counts of independent observations.

Sunyaev, S R; Eisenhaber, F; Rodchenkov, I V; Eisenhaber, B; Tumanyan, V G; Kuznetsov, E N.

Protein Eng ; 12(5): 387-94, 1999 May.

Article in English | MEDLINE | ID: mdl-10360979

ABSTRACT

Sequence weighting techniques are aimed at balancing redundant observed information from subsets of similar sequences in multiple alignments. Traditional approaches apply the same weight to all positions of a given sequence, hence equal efficiency of phylogenetic changes is assumed along the whole sequence. This restrictive assumption is not required for the new method PSIC (position-specific independent counts) described in this paper. The number of independent observations (counts) of an amino acid type at a given alignment position is calculated from the overall similarity of the sequences that share the amino acid type at this position with the help of statistical concepts. This approach allows the fast computation of position-specific sequence weights even for alignments containing hundreds of sequences. The PSIC approach has been applied to profile extraction and to the fold family assignment of protein sequences with known structures. Our method was shown to be very productive in finding distantly related sequences and more powerful than Hidden Markov Models or the profile methods in WiseTools and PSI-BLAST in many cases. The profile extraction routine is available on the WWW (http://www.bork.embl-heidelberg.de/PSIC or http://www.imb.ac.ru/PSIC).

Subjects

Proteins/chemistry; Sequence Alignment/statistics & numerical data; Algorithms; Amino Acid Sequence; Amino Acids/chemistry; Conserved Sequence; Databases, Factual; Internet; Molecular Sequence Data; Protein Folding

11.

Prediction of nonsynonymous single nucleotide polymorphisms in human disease-associated genes.

Sunyaev, S; Hanke, J; Aydin, A; Wirkner, U; Zastrow, I; Reich, J; Bork, P.

J Mol Med (Berl) ; 77(11): 754-60, 1999 Nov.

Article in English | MEDLINE | ID: mdl-10619435

ABSTRACT

Analysis of human genetic variation can shed light on the problem of the genetic basis of complex disorders. Nonsynonymous single nucleotide polymorphisms (SNPs), which affect the amino acid sequence of proteins, are believed to be the most frequent type of variation associated with the respective disease phenotype. Complete enumeration of nonsynonymous SNPs in the candidate genes will enable further association studies on panels of affected and unaffected individuals. Experimental detection of SNPs requires implementation of expensive technologies and is still far from being routine. Alternatively, SNPs can be identified by computational analysis of a publicly available expressed sequence tag (EST) database following experimental verification. We performed in silico analysis of amino acid variation for 471 proteins with a documented history of experimental variation studies and with confirmed association with human diseases. This allowed us to evaluate the level of completeness of the current knowledge of nonsynonymous SNPs in well studied, medically relevant genes and to estimate the proportion of new variants which can be added with the help of computer-aided mining in EST databases. Our results suggest that approx. 50% of frequent nonsynonymous variants are already stored in public databases. Computational methods based on the scan of an EST database can add significantly to the current knowledge, but they are greatly limited by the size of EST databases and the nonuniform coverage of genes by ESTs. Nevertheless, a considerable number of new candidate nonsynonymous SNPs in genes of medical interest were found by the EST screening procedure.

Subjects

Genetic Diseases, Inborn/genetics; Polymorphism, Single Nucleotide/genetics; Databases, Factual; Electronic Data Processing; Expressed Sequence Tags; Humans

12.

Homology-based fold predictions for Mycoplasma genitalium proteins.

Huynen, M; Doerks, T; Eisenhaber, F; Orengo, C; Sunyaev, S; Yuan, Y; Bork, P.

J Mol Biol ; 280(3): 323-6, 1998 Jul 17.

Article in English | MEDLINE | ID: mdl-9665839

ABSTRACT

Homology search techniques based on the iterative PSI-BLAST method in combination with various filters for low sequence complexity are applied to assign folds to all Mycoplasma genitalium proteins. The resulting procedure (implemented as a web server) is able to predict at least one domain in 37% of these proteins automatically, with an estimated accuracy higher than 98%. Taking structural features such as coiled coil or transmembrane regions aside, folds can be assigned to more than half of the globular proteins in a bacterium just by iterative sequence comparison.

Subjects

Bacterial Proteins/chemistry; Mycoplasma/chemistry; Protein Folding; Protein Conformation; Sequence Homology

13.

Are knowledge-based potentials derived from protein structure sets discriminative with respect to amino acid types?

Sunyaev, S R; Eisenhaber, F; Argos, P; Kuznetsov, E N; Tumanyan, V G.

Proteins ; 31(3): 225-46, 1998 May 15.

Article in English | MEDLINE | ID: mdl-9593195

ABSTRACT

The parametric description of residue environments through solvent accessibility, backbone conformation, or pairwise residue-residue distances is the key to the comparison between amino acid types at protein sequence positions and residue locations in structural templates (condition of protein sequence-structure match). For the first time, the research results presented in this study clarify and allow us to quantify, on a rigorous statistical basis, to what extent the amino acid type-specific distributions of commonly used environment parameters are discriminative with respect to the 20 amino acid types. Relying on the Bahadur theory, we estimate the probability of error in a single-sequence-structure alignment based on weak or absent discriminative power in a learning database of protein structure. We present the results for many residue environment variables and demonstrate that each fold description parameter is sensitive with respect to only a few amino acid types while indifferent to most of the other amino acid types. Even complex structural characteristics combining solvent-accessible surface area, backbone conformation, and pairwise distances distinguish only some amino acid types, whereas the others remain nondiscriminated. We find that the knowledge-based potentials currently in use treat especially Ala, Asp, Gln, His, Ser, Thr, and Tyr as essentially "average" amino acids. Thus, highly discriminative amino acid types define the alignment register in gapless sequence-structure alignments. The introduction of gaps leads to alignment ambiguities at sequence positions occupied by nondiscriminated amino acid types. Therefore, local sequence-structure alignments produced by techniques with gaps cannot be reliable. Conceptually new and more sensitive environment parameters must be invented.

Subjects

Amino Acids/chemistry; Protein Conformation; Chemical Phenomena; Chemistry, Physical; Databases, Factual; Mathematics; Protein Folding; Protein Structure, Secondary; Protein Structure, Tertiary; Sequence Alignment; Solvents; Templates, Genetic

14.

Protein sequence-structure compatibility criteria in terms of statistical hypothesis testing.

Sunyaev, S; Kuznetsov, E; Rodchenkov, I; Tumanyan, V.

Protein Eng ; 10(6): 635-46, 1997 Jun.

Article in English | MEDLINE | ID: mdl-9278276

ABSTRACT

The assignment of query protein sequences to probable folds in a threading approach is based on the statistical analysis (learning) of structural properties of amino acids in known protein structures. We formalize the recognition problem in terms of mathematical statistics, namely statistical hypothesis testing. Our general formulation leads to various mathematical forms of a decision rule function for evaluation of the quality of a sequence-structure fit. Three criteria were derived according to a likelihood ratio approach. Two of them have new functional forms while the third happens to coincide with the mean force potential function previously derived under the additional assumption of the Boltzmann law. New decision rule functions employ (i) the Parzen estimator of a probability density and (ii) the newly introduced non-parametric statistic with known asymptotic distribution. We compared criteria efficiency by a 'structure seeks sequence' search for three highly populated template folds through a query library of non-homologous sequences of proteins with known 3D structure using residue accessibility as an environmental variable. Various criteria reflect different underlying statistical propositions and thus often recognize diverse correct sequence-structure matches. On the other hand, if an amino acid sequence is recognized as compatible with a template by each of three decision rules it appears that one can make a more reliable inference of sequence-structure relationship since almost all false positives obtained by the three criteria differ.

Subjects

Amino Acid Sequence; Models, Statistical; Protein Conformation; Algorithms; Data Interpretation, Statistical; Likelihood Functions; Peptide Library; Protein Folding; Sequence Alignment; Structure-Activity Relationship; Templates, Genetic
