Search | Secretaria de Estado da Saúde

Obtaining genetics insights from deep learning via explainable artificial intelligence.

Novakovsky, Gherman; Dexter, Nick; Libbrecht, Maxwell W; Wasserman, Wyeth W; Mostafavi, Sara.

Nat Rev Genet ; 24(2): 125-137, 2023 02.

Article in English | MEDLINE | ID: mdl-36192604

ABSTRACT

Artificial intelligence (AI) models based on deep learning now represent the state of the art for making functional predictions in genomics research. However, the underlying basis on which predictive models make such predictions is often unknown. For genomics researchers, this missing explanatory information would frequently be of greater value than the predictions themselves, as it can enable new insights into genetic processes. We review progress in the emerging area of explainable AI (xAI), a field with the potential to empower life science researchers to gain mechanistic insights into complex deep learning models. We discuss and categorize approaches for model interpretation, including an intuitive understanding of how each approach works and their underlying assumptions and limitations in the context of typical high-throughput biological datasets.

Subjects

Artificial Intelligence, Deep Learning, Genomics

Petabase-scale sequence alignment catalyses viral discovery.

Edgar, Robert C; Taylor, Brie; Lin, Victor; Altman, Tomer; Barbera, Pierre; Meleshko, Dmitry; Lohr, Dan; Novakovsky, Gherman; Buchfink, Benjamin; Al-Shayeb, Basem; Banfield, Jillian F; de la Peña, Marcos; Korobeynikov, Anton; Chikhi, Rayan; Babaian, Artem.

Nature ; 602(7895): 142-147, 2022 02.

Article in English | MEDLINE | ID: mdl-35082445

ABSTRACT

Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially [1]. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.

Subjects

Cloud Computing, Genetic Databases, RNA Viruses/genetics, RNA Viruses/isolation & purification, Sequence Alignment/methods, Virology/methods, Virome/genetics, Animals, Archives, Bacteriophages/enzymology, Bacteriophages/genetics, Biodiversity, Coronavirus/classification, Coronavirus/enzymology, Coronavirus/genetics, Molecular Evolution, Hepatitis Delta Virus/enzymology, Hepatitis Delta Virus/genetics, Humans, Molecular Models, RNA Viruses/classification, RNA Viruses/enzymology, RNA-Dependent RNA Polymerase/chemistry, RNA-Dependent RNA Polymerase/genetics, Software

Tonic-signaling chimeric antigen receptors drive human regulatory T cell exhaustion.

Lamarche, Caroline; Ward-Hartstonge, Kirsten; Mi, Tian; Lin, David T S; Huang, Qing; Brown, Andrew; Edwards, Karlie; Novakovsky, Gherman E; Qi, Christopher N; Kobor, Michael S; Zebley, Caitlin C; Weber, Evan W; Mackall, Crystal L; Levings, Megan K.

Proc Natl Acad Sci U S A ; 120(14): e2219086120, 2023 04 04.

Article in English | MEDLINE | ID: mdl-36972454

ABSTRACT

Regulatory T cell (Treg) therapy is a promising approach to improve outcomes in transplantation and autoimmunity. In conventional T cell therapy, chronic stimulation can result in poor in vivo function, a phenomenon termed exhaustion. Whether or not Tregs are also susceptible to exhaustion, and if so, if this would limit their therapeutic effect, was unknown. To "benchmark" exhaustion in human Tregs, we used a method known to induce exhaustion in conventional T cells: expression of a tonic-signaling chimeric antigen receptor (TS-CAR). We found that TS-CAR-expressing Tregs rapidly acquired a phenotype that resembled exhaustion and had major changes in their transcriptome, metabolism, and epigenome. Similar to conventional T cells, TS-CAR Tregs upregulated expression of inhibitory receptors and transcription factors such as PD-1, TIM3, TOX and BLIMP1, and displayed a global increase in chromatin accessibility-enriched AP-1 family transcription factor binding sites. However, they also displayed Treg-specific changes such as high expression of 4-1BB, LAP, and GARP. DNA methylation analysis and comparison to a CD8+ T cell-based multipotency index showed that Tregs naturally exist in a relatively differentiated state, with further TS-CAR-induced changes. Functionally, TS-CAR Tregs remained stable and suppressive in vitro but were nonfunctional in vivo, as tested in a model of xenogeneic graft-versus-host disease. These data are the first comprehensive investigation of exhaustion in Tregs and reveal key similarities and differences with exhausted conventional T cells. The finding that human Tregs are susceptible to chronic stimulation-driven dysfunction has important implications for the design of CAR Treg adoptive immunotherapy strategies.

Subjects

Graft vs Host Disease, Chimeric Antigen Receptors, Humans, Regulatory T-Lymphocytes, T-Cell Exhaustion, Adoptive Immunotherapy/methods, T-Cell Antigen Receptors/genetics, T-Cell Antigen Receptors/metabolism

Using Transcriptomic Hidden Variables to Infer Context-Specific Genotype Effects in the Brain.

Ng, Bernard; Casazza, William; Patrick, Ellis; Tasaki, Shinya; Novakovsky, Gherman; Felsky, Daniel; Ma, Yiyi; Bennett, David A; Gaiteri, Chris; De Jager, Philip L; Mostafavi, Sara.

Am J Hum Genet ; 105(3): 562-572, 2019 09 05.

Article in English | MEDLINE | ID: mdl-31447098

ABSTRACT

Deciphering the environmental contexts at which genetic effects are most prominent is central for making full use of GWAS results in follow-up experiment design and treatment development. However, measuring a large number of environmental factors at high granularity might not always be feasible. Instead, here we propose extracting cellular embedding of environmental factors from gene expression data by using latent variable (LV) analysis and taking these LVs as environmental proxies in detecting gene-by-environment (GxE) interaction effects on gene expression, i.e., GxE expression quantitative trait loci (eQTLs). Applying this approach to two largest brain eQTL datasets (n = 1,100), we show that LVs and GxE eQTLs in one dataset replicate well in the other dataset. Combining the two samples via meta-analysis, 895 GxE eQTLs are identified. On average, GxE effect explains an additional ~4% variation in expression of each gene that displays a GxE effect. Ten of these 52 genes are associated with cell-type-specific eQTLs, and the remaining genes are multi-functional. Furthermore, after substituting LVs with expression of transcription factors (TF), we found 91 TF-specific eQTLs, which demonstrates an important use of our brain GxE eQTLs.

Subjects

Brain/metabolism, Genotype, Transcriptome, Humans, Quantitative Trait Loci

ExplaiNN: interpretable and transparent neural networks for genomics.

Novakovsky, Gherman; Fornes, Oriol; Saraswat, Manu; Mostafavi, Sara; Wasserman, Wyeth W.

Genome Biol ; 24(1): 154, 2023 06 27.

Article in English | MEDLINE | ID: mdl-37370113

ABSTRACT

Deep learning models such as convolutional neural networks (CNNs) excel in genomic tasks but lack interpretability. We introduce ExplaiNN, which combines the expressiveness of CNNs with the interpretability of linear models. ExplaiNN can predict TF binding, chromatin accessibility, and de novo motifs, achieving performance comparable to state-of-the-art methods. Its predictions are transparent, providing global (cell state level) as well as local (individual sequence level) biological insights into the data. ExplaiNN can serve as a plug-and-play platform for pretrained models and annotated position weight matrices. ExplaiNN aims to accelerate the adoption of deep learning in genomic sequence analysis by domain experts.

Subjects

Genomics, Computer Neural Networks, Genomics/methods, Chromatin/genetics, Protein Binding

In silico discovery of small molecules for efficient stem cell differentiation into definitive endoderm.

Novakovsky, Gherman; Sasaki, Shugo; Fornes, Oriol; Omur, Meltem E; Huang, Helen; Bayly, Carmen L; Zhang, Dahai; Lim, Nathaniel; Cherkasov, Artem; Pavlidis, Paul; Mostafavi, Sara; Lynn, Francis C; Wasserman, Wyeth W.

Stem Cell Reports ; 18(3): 765-781, 2023 03 14.

Article in English | MEDLINE | ID: mdl-36801003

ABSTRACT

Improving methods for human embryonic stem cell differentiation represents a challenge in modern regenerative medicine research. Using drug repurposing approaches, we discover small molecules that regulate the formation of definitive endoderm. Among them are inhibitors of known processes involved in endoderm differentiation (mTOR, PI3K, and JNK pathways) and a new compound, with an unknown mechanism of action, capable of inducing endoderm formation in the absence of growth factors in the media. Optimization of the classical protocol by inclusion of this compound achieves the same differentiation efficiency with a 90% cost reduction. The presented in silico procedure for candidate molecule selection has broad potential for improving stem cell differentiation protocols.

Subjects

Endoderm, Human Embryonic Stem Cells, Humans, Cell Differentiation/physiology

Biologically relevant transfer learning improves transcription factor binding prediction.

Novakovsky, Gherman; Saraswat, Manu; Fornes, Oriol; Mostafavi, Sara; Wasserman, Wyeth W.

Genome Biol ; 22(1): 280, 2021 09 27.

Article in English | MEDLINE | ID: mdl-34579793

ABSTRACT

BACKGROUND: Deep learning has proven to be a powerful technique for transcription factor (TF) binding prediction but requires large training datasets. Transfer learning can reduce the amount of data required for deep learning, while improving overall model performance, compared to training a separate model for each new task. RESULTS: We assess a transfer learning strategy for TF binding prediction consisting of a pre-training step, wherein we train a multi-task model with multiple TFs, and a fine-tuning step, wherein we initialize single-task models for individual TFs with the weights learned by the multi-task model, after which the single-task models are trained at a lower learning rate. We corroborate that transfer learning improves model performance, especially if in the pre-training step the multi-task model is trained with biologically relevant TFs. We show the effectiveness of transfer learning for TFs with ~ 500 ChIP-seq peak regions. Using model interpretation techniques, we demonstrate that the features learned in the pre-training step are refined in the fine-tuning step to resemble the binding motif of the target TF (i.e., the recipient of transfer learning in the fine-tuning step). Moreover, pre-training with biologically relevant TFs allows single-task models in the fine-tuning step to learn useful features other than the motif of the target TF. CONCLUSIONS: Our results confirm that transfer learning is a powerful technique for TF binding prediction.

Subjects

Machine Learning, Transcription Factors/metabolism, Chromatin Immunoprecipitation Sequencing, Genome
