Search | BVS Regional Portal

Learning sequence, structure, and function representations of proteins with language models.

Hamamsy, Tymor; Barot, Meet; Morton, James T; Steinegger, Martin; Bonneau, Richard; Cho, Kyunghyun.

bioRxiv ; 2023 Nov 26.

Article in English | MEDLINE | ID: mdl-38045331

ABSTRACT

The sequence-structure-function relationships that ultimately generate the diversity of extant observed proteins are complex, as proteins bridge the gap between multiple informational and physical scales involved in nearly all cellular processes. One limitation of existing protein annotation databases such as UniProt is that less than 1% of proteins have experimentally verified functions, and computational methods are needed to fill in the missing information. Here, we demonstrate that a multi-aspect framework based on protein language models can learn sequence-structure-function representations of amino acid sequences, and can provide the foundation for sensitive sequence-structure-function aware protein sequence search and annotation. Based on this model, we introduce a multi-aspect information retrieval system for proteins, Protein-Vec, covering sequence, structure, and function aspects, that enables computational protein annotation and function prediction at tree-of-life scales.

Protein remote homology detection and structural alignment using deep learning.

Hamamsy, Tymor; Morton, James T; Blackwell, Robert; Berenberg, Daniel; Carriero, Nicholas; Gligorijevic, Vladimir; Strauss, Charlie E M; Leman, Julia Koehler; Cho, Kyunghyun; Bonneau, Richard.

Nat Biotechnol ; 2023 Sep 07.

Article in English | MEDLINE | ID: mdl-37679542

ABSTRACT

Exploiting sequence-structure-function relationships in biotechnology requires improved methods for aligning proteins that have low sequence similarity to previously annotated proteins. We develop two deep learning methods to address this gap, TM-Vec and DeepBLAST. TM-Vec allows searching for structure-structure similarities in large sequence databases. It is trained to accurately predict TM-scores as a metric of structural similarity directly from sequence pairs without the need for intermediate computation or solution of structures. Once structurally similar proteins have been identified, DeepBLAST can structurally align proteins using only sequence information by identifying structurally homologous regions between proteins. It outperforms traditional sequence alignment methods and performs similarly to structure-based alignment methods. We show the merits of TM-Vec and DeepBLAST on a variety of datasets, including better identification of remotely homologous proteins compared with state-of-the-art sequence alignment and structure prediction methods.

High-performance single-cell gene regulatory network inference at scale: the Inferelator 3.0.

Skok Gibbs, Claudia; Jackson, Christopher A; Saldi, Giuseppe-Antonio; Tjärnberg, Andreas; Shah, Aashna; Watters, Aaron; De Veaux, Nicholas; Tchourine, Konstantine; Yi, Ren; Hamamsy, Tymor; Castro, Dayanne M; Carriero, Nicholas; Gorissen, Bram L; Gresham, David; Miraldi, Emily R; Bonneau, Richard.

Bioinformatics ; 38(9): 2519-2528, 2022 04 28.

Article in English | MEDLINE | ID: mdl-35188184

ABSTRACT

MOTIVATION: Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. Methods for inferring and reconstructing networks from genomics data have evolved rapidly over the last decade in response to advances in sequencing technology and machine learning. The scale of data collection has increased dramatically; the largest genome-wide gene expression datasets have grown from thousands of measurements to millions of single cells, and new technologies are on the horizon to increase to tens of millions of cells and above. RESULTS: In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator is able to integrate the largest single-cell datasets and learn cell-type-specific gene regulatory networks. Compared to other network inference methods, the Inferelator learns new and informative Saccharomyces cerevisiae networks from single-cell gene expression data, measured by recovery of a known gold standard. We demonstrate its scaling capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million) single-cell gene expression dataset with paired single-cell chromatin accessibility data. AVAILABILITY AND IMPLEMENTATION: The inferelator software is available on GitHub (https://github.com/flatironinstitute/inferelator) under the MIT license and has been released as python packages with associated documentation (https://inferelator.readthedocs.io/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Gene Regulatory Networks , Software , Animals , Mice , Genomics , Genome , Chromatin

Quantifying the Severity of Adverse Drug Reactions Using Social Media: Network Analysis.

Lavertu, Adam; Hamamsy, Tymor; Altman, Russ B.

J Med Internet Res ; 23(10): e27714, 2021 10 21.

Article in English | MEDLINE | ID: mdl-34673524

ABSTRACT

BACKGROUND: Adverse drug reactions (ADRs) affect the health of hundreds of thousands of individuals annually in the United States, with associated costs of hundreds of billions of dollars. The monitoring and analysis of the severity of ADRs is limited by the current qualitative and categorical systems of severity classification. Previous efforts have generated quantitative estimates for a subset of ADRs but were limited in scope because of the time and costs associated with the efforts. OBJECTIVE: The aim of this study is to increase the number of ADRs for which there are quantitative severity estimates while improving the quality of these severity estimates. METHODS: We present a semisupervised approach that estimates ADR severity by using social media word embeddings to construct a lexical network of ADRs and perform label propagation. We used this method to estimate the severity of 28,113 ADRs, representing 12,198 unique ADR concepts from the Medical Dictionary for Regulatory Activities. RESULTS: Our Severity of Adverse Events Derived from Reddit (SAEDR) scores have good correlations with real-world outcomes. The SAEDR scores had Spearman correlations of 0.595, 0.633, and -0.748 for death, serious outcome, and no outcome, respectively, with ADR case outcomes in the Food and Drug Administration Adverse Event Reporting System. We investigated different methods for defining initial seed term sets and evaluated their impact on the severity estimates. We analyzed severity distributions for ADRs based on their appearance in boxed warning drug label sections, as well as for ADRs with sex-specific associations. We found that ADRs discovered in the postmarketing period had significantly greater severity than those discovered during the clinical trial (P<.001). We created quantitative drug-risk profile (DRIP) scores for 968 drugs that had a Spearman correlation of 0.377 with drugs ranked by the Food and Drug Administration Adverse Event Reporting System cases resulting in death, where the given drug was the primary suspect. CONCLUSIONS: Our SAEDR and DRIP scores are well correlated with the real-world outcomes of the entities they represent and have demonstrated utility in pharmacovigilance research. We make the SAEDR scores for 12,198 ADRs and the DRIP scores for 968 drugs publicly available to enable more quantitative analysis of pharmacovigilance data.

Subjects

Drug-Related Side Effects and Adverse Reactions , Social Media , Adverse Drug Reaction Reporting Systems , Drug Labeling , Female , Humans , Male , Pharmacovigilance

Viewing the US presidential electoral map through the lens of public health.

Hamamsy, Tymor; Danziger, Michael; Nagler, Jonathan; Bonneau, Richard.

PLoS One ; 16(7): e0254001, 2021.

Article in English | MEDLINE | ID: mdl-34288913

ABSTRACT

Health, disease, and mortality vary greatly at the county level, and there are strong geographical trends of disease in the United States. Healthcare is and has been a top priority for voters in the U.S., and an important political issue. Consequently, it is important to determine what relationship voting patterns have with health, disease, and mortality, as doing so may help guide appropriate policy. We performed a comprehensive analysis of the relationship between voting patterns and over 150 different public health and wellbeing variables at the county level, comparing all states, including counties in 2016 battleground states, and counties in states that flipped from majority Democrat to majority Republican from 2012 to 2016. We also investigated county-level health trends over the last 30+ years and find statistically significant relationships between a number of health measures and the voting patterns of counties in presidential elections. Collectively, these data exhibit a strong pattern: counties that voted Republican in the 2016 election had overall worse health outcomes than those that voted Democrat. We hope that this strong relationship can guide improvements in healthcare policy legislation at the county level.

Subjects

Federal Government , Medical Geography , Government Employees , Politics , Public Health , Health Expenditures/statistics & numerical data , Health Policy , Health Status Indicators , Humans , Morbidity , Mortality , United States

The ExAC browser: displaying reference data information from over 60 000 exomes.

Karczewski, Konrad J; Weisburd, Ben; Thomas, Brett; Solomonson, Matthew; Ruderfer, Douglas M; Kavanagh, David; Hamamsy, Tymor; Lek, Monkol; Samocha, Kaitlin E; Cummings, Beryl B; Birnbaum, Daniel; Daly, Mark J; MacArthur, Daniel G.

Nucleic Acids Res ; 45(D1): D840-D845, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899611

ABSTRACT

Worldwide, hundreds of thousands of humans have had their genomes or exomes sequenced, and access to the resulting data sets can provide valuable information for variant interpretation and understanding gene function. Here, we present a lightweight, flexible browser framework to display large population datasets of genetic variation. We demonstrate its use for exome sequence data from 60 706 individuals in the Exome Aggregation Consortium (ExAC). The ExAC browser provides gene- and transcript-centric displays of variation, a critical view for clinical applications. Additionally, we provide a variant display, which includes population frequency and functional annotation data as well as short read support for the called variant. This browser is open-source, freely available at http://exac.broadinstitute.org, and has already been used extensively by clinical laboratories worldwide.

Subjects

Computational Biology/methods , Genetic Databases , Exome , Genomics/methods , Web Browser , Genome-Wide Association Study/methods , Humans , Software , User-Computer Interface

Gene expression elucidates functional impact of polygenic risk for schizophrenia.

Fromer, Menachem; Roussos, Panos; Sieberts, Solveig K; Johnson, Jessica S; Kavanagh, David H; Perumal, Thanneer M; Ruderfer, Douglas M; Oh, Edwin C; Topol, Aaron; Shah, Hardik R; Klei, Lambertus L; Kramer, Robin; Pinto, Dalila; Gümüs, Zeynep H; Cicek, A Ercument; Dang, Kristen K; Browne, Andrew; Lu, Cong; Xie, Lu; Readhead, Ben; Stahl, Eli A; Xiao, Jianqiu; Parvizi, Mahsa; Hamamsy, Tymor; Fullard, John F; Wang, Ying-Chih; Mahajan, Milind C; Derry, Jonathan M J; Dudley, Joel T; Hemby, Scott E; Logsdon, Benjamin A; Talbot, Konrad; Raj, Towfique; Bennett, David A; De Jager, Philip L; Zhu, Jun; Zhang, Bin; Sullivan, Patrick F; Chess, Andrew; Purcell, Shaun M; Shinobu, Leslie A; Mangravite, Lara M; Toyoshiba, Hiroyoshi; Gur, Raquel E; Hahn, Chang-Gyu; Lewis, David A; Haroutunian, Vahram; Peters, Mette A; Lipska, Barbara K; Buxbaum, Joseph D.

Nat Neurosci ; 19(11): 1442-1453, 2016 11.

Article in English | MEDLINE | ID: mdl-27668389

ABSTRACT

Over 100 genetic loci harbor schizophrenia-associated variants, yet how these variants confer liability is uncertain. The CommonMind Consortium sequenced RNA from dorsolateral prefrontal cortex of people with schizophrenia (N = 258) and control subjects (N = 279), creating a resource of gene expression and its genetic regulation. Using this resource, ∼20% of schizophrenia loci have variants that could contribute to altered gene expression and liability. In five loci, only a single gene was involved: FURIN, TSNARE1, CNTN4, CLCN3 or SNAP91. Altering expression of FURIN, TSNARE1 or CNTN4 changed neurodevelopment in zebrafish; knockdown of FURIN in human neural progenitor cells yielded abnormal migration. Of 693 genes showing significant case-versus-control differential expression, their fold changes were ≤ 1.33, and an independent cohort yielded similar results. Gene co-expression implicates a network relevant for schizophrenia. Our findings show that schizophrenia is polygenic and highlight the utility of this resource for mechanistic interpretations of genetic liability for brain diseases.

Subjects

Gene Expression Regulation/genetics , Genetic Predisposition to Disease , Multifactorial Inheritance/genetics , Schizophrenia/genetics , Brain/metabolism , Female , Genome-Wide Association Study , Humans , Male , Single Nucleotide Polymorphism , Risk

Patterns of genic intolerance of rare copy number variation in 59,898 human exomes.

Ruderfer, Douglas M; Hamamsy, Tymor; Lek, Monkol; Karczewski, Konrad J; Kavanagh, David; Samocha, Kaitlin E; Daly, Mark J; MacArthur, Daniel G; Fromer, Menachem; Purcell, Shaun M.

Nat Genet ; 48(10): 1107-11, 2016 10.

Article in English | MEDLINE | ID: mdl-27533299

ABSTRACT

Copy number variation (CNV) affecting protein-coding genes contributes substantially to human diversity and disease. Here we characterized the rates and properties of rare genic CNVs (<0.5% frequency) in exome sequencing data from nearly 60,000 individuals in the Exome Aggregation Consortium (ExAC) database. On average, individuals possessed 0.81 deleted and 1.75 duplicated genes, and most (70%) carried at least one rare genic CNV. For every gene, we empirically estimated an index of relative intolerance to CNVs that demonstrated moderate correlation with measures of genic constraint based on single-nucleotide variation (SNV) and was independently correlated with measures of evolutionary conservation. For individuals with schizophrenia, genes affected by CNVs were more intolerant than in controls. The ExAC CNV data constitute a critical component of an integrated database spanning the spectrum of human genetic variation, aiding in the interpretation of personal genomes as well as population-based disease studies. These data are freely available for download and visualization online.

Subjects

DNA Copy Number Variations , Exome , Genetic Predisposition to Disease , Adult , Child , Genetic Databases , Female , Gene Frequency , Human Genome , Humans , Male , Single Nucleotide Polymorphism , Schizophrenia/genetics
