Search | VHL Regional Portal

1.

EGGNet, a Generalizable Geometric Deep Learning Framework for Protein Complex Pose Scoring.

Wang, Zichen; Brand, Ryan; Adolf-Bryfogle, Jared; Grewal, Jasleen; Qi, Yanjun; Combs, Steven A; Golovach, Nataliya; Alford, Rebecca; Rangwala, Huzefa; Clark, Peter M.

ACS Omega ; 9(7): 7471-7479, 2024 Feb 20.

Article in English | MEDLINE | ID: mdl-38405499

ABSTRACT

Computational prediction of molecule-protein interactions has been key for developing new molecules to interact with a target protein for therapeutics development. Previous work includes two independent streams of approaches: (1) predicting protein-protein interactions (PPIs) between naturally occurring proteins and (2) predicting binding affinities between proteins and small-molecule ligands [also known as drug-target interaction (DTI)]. Studying the two problems in isolation has limited the ability of these computational models to generalize across the PPI and DTI tasks, both of which ultimately involve noncovalent interactions with a protein target. In this work, we developed Equivariant Graph of Graphs neural Network (EGGNet), a geometric deep learning (GDL) framework, for molecule-protein binding predictions that can handle three types of molecules for interacting with a target protein: (1) small molecules, (2) synthetic peptides, and (3) natural proteins. EGGNet leverages a graph of graphs (GoG) representation constructed from the molecular structures at atomic resolution and utilizes a multiresolution equivariant graph neural network to learn from such representations. In addition, EGGNet leverages the underlying biophysics and makes use of both atom- and residue-level interactions, which improve EGGNet's ability to rank candidate poses from blind docking. EGGNet achieves competitive performance on both a public protein-small-molecule binding affinity prediction task (80.2% top 1 success rate on CASF-2016) and a synthetic protein interface prediction task (88.4% area under the precision-recall curve). We envision that the proposed GDL framework can generalize to many other protein interaction prediction problems, such as binding site prediction and molecular docking, helping accelerate protein engineering and structure-based drug development.
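The "top 1 success rate" reported above can be illustrated with a small sketch. The abstract does not define the metric, so the code assumes a common docking-benchmark convention: a complex counts as a success when its highest-scored pose lies within an RMSD cutoff of the native pose. The 2.0 Å cutoff, the function name, and the toy data are all assumptions made for illustration, not details taken from the paper.

```python
def top1_success_rate(complexes, rmsd_cutoff=2.0):
    """Fraction of complexes whose best-scored pose is within rmsd_cutoff of native.

    complexes: list of pose lists; each pose is (predicted_score, rmsd_to_native).
    Assumption: a higher predicted_score means the model prefers that pose.
    """
    hits = 0
    for poses in complexes:
        # Pick the pose the scoring model ranks first.
        _, best_rmsd = max(poses, key=lambda pose: pose[0])
        if best_rmsd <= rmsd_cutoff:
            hits += 1
    return hits / len(complexes)


# Hypothetical toy data: four complexes with a few candidate poses each.
toy_complexes = [
    [(0.9, 1.2), (0.4, 6.5), (0.1, 9.0)],  # top pose is near-native: hit
    [(0.3, 1.0), (0.8, 7.3)],              # top pose is a decoy: miss
    [(0.7, 1.9), (0.6, 0.8)],              # hit
    [(0.2, 4.0), (0.5, 1.5)],              # hit
]
rate = top1_success_rate(toy_complexes)
```

With this toy input, three of the four complexes are ranked correctly, so `rate` is 0.75.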

2.

Correction for Rando et al., "Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure".

Rando, Halie M; MacLean, Adam L; Lee, Alexandra J; Lordan, Ronan; Ray, Sandipan; Bansal, Vikas; Skelly, Ashwin N; Sell, Elizabeth; Dziak, John J; Shinholster, Lamonica; D'Agostino McGowan, Lucy; Ben Guebila, Marouen; Wellhausen, Nils; Knyazev, Sergey; Boca, Simina M; Capone, Stephen; Qi, Yanjun; Park, YoSon; Mai, David; Sun, Yuchen; Boerckel, Joel D; Brueffer, Christian; Byrd, James Brian; Kamil, Jeremy P; Wang, Jinhui; Velazquez, Ryan; Szeto, Gregory L; Barton, John P; Goel, Rishi Raj; Mangul, Serghei; Lubiana, Tiago; Gitter, Anthony; Greene, Casey S.

mSystems ; 7(1): e0144721, 2022 Feb 22.

Article in English | MEDLINE | ID: mdl-35076276

3.

Identification and Development of Therapeutics for COVID-19.

Rando, Halie M; Wellhausen, Nils; Ghosh, Soumita; Lee, Alexandra J; Dattoli, Anna Ada; Hu, Fengling; Byrd, James Brian; Rafizadeh, Diane N; Lordan, Ronan; Qi, Yanjun; Sun, Yuchen; Brueffer, Christian; Field, Jeffrey M; Ben Guebila, Marouen; Jadavji, Nafisa M; Skelly, Ashwin N; Ramsundar, Bharath; Wang, Jinhui; Goel, Rishi Raj; Park, YoSon; Boca, Simina M; Gitter, Anthony; Greene, Casey S.

mSystems ; 6(6): e0023321, 2021 Dec 21.

Article in English | MEDLINE | ID: mdl-34726496

ABSTRACT

After emerging in China in late 2019, the novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spread worldwide, and as of mid-2021, it remains a significant threat globally. Only a few coronaviruses are known to infect humans, and only two cause infections similar in severity to SARS-CoV-2: Severe acute respiratory syndrome-related coronavirus, a species closely related to SARS-CoV-2 that emerged in 2002, and Middle East respiratory syndrome-related coronavirus, which emerged in 2012. Unlike the current pandemic, previous epidemics were controlled rapidly through public health measures, but the body of research investigating severe acute respiratory syndrome and Middle East respiratory syndrome has proven valuable for identifying approaches to treating and preventing novel coronavirus disease 2019 (COVID-19). Building on this research, the medical and scientific communities have responded rapidly to the COVID-19 crisis and identified many candidate therapeutics. The approaches used to identify candidates fall into four main categories: adaptation of clinical approaches to diseases with related pathologies, adaptation based on virological properties, adaptation based on host response, and data-driven identification (ID) of candidates based on physical properties or on pharmacological compendia. To date, a small number of therapeutics have already been authorized by regulatory agencies such as the Food and Drug Administration (FDA), while most remain under investigation. The scale of the COVID-19 crisis offers a rare opportunity to collect data on the effects of candidate therapeutics. This information provides insight not only into the management of coronavirus diseases but also into the relative success of different approaches to identifying candidate therapeutics against an emerging disease. IMPORTANCE: The COVID-19 pandemic is a rapidly evolving crisis. With the worldwide scientific community shifting focus onto the SARS-CoV-2 virus and COVID-19, a large number of possible pharmaceutical approaches for treatment and prevention have been proposed. What was known about each of these potential interventions evolved rapidly throughout 2020 and 2021. This fast-paced area of research provides important insight into how the ongoing pandemic can be managed and also demonstrates the power of interdisciplinary collaboration to rapidly understand a virus and match its characteristics with existing or novel pharmaceuticals. As illustrated by the continued threat of viral epidemics during the current millennium, a rapid and strategic response to emerging viral threats can save lives. In this review, we explore how different modes of identifying candidate therapeutics have borne out during COVID-19.

4.

Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure.

Rando, Halie M; MacLean, Adam L; Lee, Alexandra J; Lordan, Ronan; Ray, Sandipan; Bansal, Vikas; Skelly, Ashwin N; Sell, Elizabeth; Dziak, John J; Shinholster, Lamonica; D'Agostino McGowan, Lucy; Ben Guebila, Marouen; Wellhausen, Nils; Knyazev, Sergey; Boca, Simina M; Capone, Stephen; Qi, Yanjun; Park, YoSon; Mai, David; Sun, Yuchen; Boerckel, Joel D; Brueffer, Christian; Byrd, James Brian; Kamil, Jeremy P; Wang, Jinhui; Velazquez, Ryan; Szeto, Gregory L; Barton, John P; Goel, Rishi Raj; Mangul, Serghei; Lubiana, Tiago; Gitter, Anthony; Greene, Casey S.

mSystems ; 6(5): e0009521, 2021 Oct 26.

Article in English | MEDLINE | ID: mdl-34698547

ABSTRACT

The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond. Here, we contextualize SARS-CoV-2 among other coronaviruses and identify what is known and what can be inferred about its behavior once inside a human host. Because the genomic content of coronaviruses, which specifies the virus's structure, is highly conserved, early genomic analysis provided a significant head start in predicting viral pathogenesis and in understanding potential differences among variants. The pathogenesis of the virus offers insights into symptomatology, transmission, and individual susceptibility. Additionally, prior research into interactions between the human immune system and coronaviruses has identified how these viruses can evade the immune system's protective mechanisms. We also explore systems-level research into the regulatory and proteomic effects of SARS-CoV-2 infection and the immune response. Understanding the structure and behavior of the virus serves to contextualize the many facets of the COVID-19 pandemic and can influence efforts to control the virus and treat the disease. IMPORTANCE: COVID-19 involves a number of organ systems and can present with a wide range of symptoms. From how the virus infects cells to how it spreads between people, the available research suggests that these patterns are very similar to those seen in the closely related viruses SARS-CoV-1 and possibly Middle East respiratory syndrome-related CoV (MERS-CoV). Understanding the pathogenesis of the SARS-CoV-2 virus also contextualizes how the different biological systems affected by COVID-19 connect. Exploring the structure, phylogeny, and pathogenesis of the virus therefore helps to guide interpretation of the broader impacts of the virus on the human body and on human populations. For this reason, an in-depth exploration of viral mechanisms is critical to a robust understanding of SARS-CoV-2 and, potentially, future emergent human CoVs (HCoVs).

5.

Identification and Development of Therapeutics for COVID-19.

Rando, Halie M; Wellhausen, Nils; Ghosh, Soumita; Lee, Alexandra J; Dattoli, Anna Ada; Hu, Fengling; Byrd, James Brian; Rafizadeh, Diane N; Lordan, Ronan; Qi, Yanjun; Sun, Yuchen; Brueffer, Christian; Field, Jeffrey M; Guebila, Marouen Ben; Jadavji, Nafisa M; Skelly, Ashwin N; Ramsundar, Bharath; Wang, Jinhui; Goel, Rishi Raj; Park, YoSon; Boca, Simina M; Gitter, Anthony; Greene, Casey S.

ArXiv ; 2021 Mar 03.

Article in English | MEDLINE | ID: mdl-33688554

ABSTRACT

After emerging in China in late 2019, the novel coronavirus SARS-CoV-2 spread worldwide and as of mid-2021 remains a significant threat globally. Only a few coronaviruses are known to infect humans, and only two cause infections similar in severity to SARS-CoV-2: Severe acute respiratory syndrome-related coronavirus, a closely related species of SARS-CoV-2 that emerged in 2002, and Middle East respiratory syndrome-related coronavirus, which emerged in 2012. Unlike the current pandemic, previous epidemics were controlled rapidly through public health measures, but the body of research investigating severe acute respiratory syndrome and Middle East respiratory syndrome has proven valuable for identifying approaches to treating and preventing novel coronavirus disease 2019 (COVID-19). Building on this research, the medical and scientific communities have responded rapidly to the COVID-19 crisis to identify many candidate therapeutics. The approaches used to identify candidates fall into four main categories: adaptation of clinical approaches to diseases with related pathologies, adaptation based on virological properties, adaptation based on host response, and data-driven identification of candidates based on physical properties or on pharmacological compendia. To date, a small number of therapeutics have already been authorized by regulatory agencies such as the Food and Drug Administration (FDA), while most remain under investigation. The scale of the COVID-19 crisis offers a rare opportunity to collect data on the effects of candidate therapeutics. This information provides insight not only into the management of coronavirus diseases, but also into the relative success of different approaches to identifying candidate therapeutics against an emerging disease.

6.

Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure.

Rando, Halie M; MacLean, Adam L; Lee, Alexandra J; Lordan, Ronan; Ray, Sandipan; Bansal, Vikas; Skelly, Ashwin N; Sell, Elizabeth; Dziak, John J; Shinholster, Lamonica; McGowan, Lucy D'Agostino; Guebila, Marouen Ben; Wellhausen, Nils; Knyazev, Sergey; Boca, Simina M; Capone, Stephen; Qi, Yanjun; Park, YoSon; Sun, Yuchen; Mai, David; Boerckel, Joel D; Brueffer, Christian; Byrd, James Brian; Kamil, Jeremy P; Wang, Jinhui; Velazquez, Ryan; Szeto, Gregory L; Barton, John P; Goel, Rishi Raj; Mangul, Serghei; Lubiana, Tiago; Gitter, Anthony; Greene, Casey S.

ArXiv ; 2021 Feb 01.

Article in English | MEDLINE | ID: mdl-33594340

ABSTRACT

The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond. Here, we contextualize SARS-CoV-2 among other coronaviruses and identify what is known and what can be inferred about its behavior once inside a human host. Because the genomic content of coronaviruses, which specifies the virus's structure, is highly conserved, early genomic analysis provided a significant head start in predicting viral pathogenesis and in understanding potential differences among variants. The pathogenesis of the virus offers insights into symptomatology, transmission, and individual susceptibility. Additionally, prior research into interactions between the human immune system and coronaviruses has identified how these viruses can evade the immune system's protective mechanisms. We also explore systems-level research into the regulatory and proteomic effects of SARS-CoV-2 infection and the immune response. Understanding the structure and behavior of the virus serves to contextualize the many facets of the COVID-19 pandemic and can influence efforts to control the virus and treat the disease.

7.

Graph convolutional networks for epigenetic state prediction using both sequence and 3D genome data.

Lanchantin, Jack; Qi, Yanjun.

Bioinformatics ; 36(Suppl_2): i659-i667, 2020 Dec 30.

Article in English | MEDLINE | ID: mdl-33381816

ABSTRACT

MOTIVATION: Predictive models of DNA chromatin profile (i.e. epigenetic state), such as transcription factor binding, are essential for understanding regulatory processes and developing gene therapies. It is known that the 3D genome, or spatial structure of DNA, is highly influential in the chromatin profile. Deep neural networks have achieved state of the art performance on chromatin profile prediction by using short windows of DNA sequences independently. These methods, however, ignore the long-range dependencies when predicting the chromatin profiles because modeling the 3D genome is challenging. RESULTS: In this work, we introduce ChromeGCN, a graph convolutional network for chromatin profile prediction by fusing both local sequence and long-range 3D genome information. By incorporating the 3D genome, we relax the independent and identically distributed assumption of local windows for a better representation of DNA. ChromeGCN explicitly incorporates known long-range interactions into the modeling, allowing us to identify and interpret those important long-range dependencies in influencing chromatin profiles. We show experimentally that by fusing sequential and 3D genome data using ChromeGCN, we get a significant improvement over the state-of-the-art deep learning methods as indicated by three metrics. Importantly, we show that ChromeGCN is particularly useful for identifying epigenetic effects in those DNA windows that have a high degree of interactions with other DNA windows. AVAILABILITY AND IMPLEMENTATION: https://github.com/QData/ChromeGCN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
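The idea of fusing per-window sequence features with long-range 3D contacts can be sketched as a single graph-convolution step. This is a minimal illustration, not the ChromeGCN implementation: it assumes the widely used symmetric-normalized propagation rule, which the abstract does not confirm, and the window features, contact matrix, and weight matrix are hypothetical toy values.

```python
import numpy as np


def normalize_adjacency(adjacency):
    """Add self-loops, then scale as D^-1/2 (A + I) D^-1/2."""
    with_loops = adjacency + np.eye(adjacency.shape[0])
    inv_sqrt_degree = 1.0 / np.sqrt(with_loops.sum(axis=1))
    return with_loops * inv_sqrt_degree[:, None] * inv_sqrt_degree[None, :]


def gcn_layer(features, adjacency, weight):
    """Mix each window's features with those of its 3D contacts, then apply ReLU."""
    return np.maximum(normalize_adjacency(adjacency) @ features @ weight, 0.0)


# Hypothetical toy input: 4 DNA windows, 3 sequence-derived features per window.
window_features = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])
# Hypothetical contacts: windows 0 and 3 interact, windows 1 and 2 interact.
contacts = np.array([
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
])
weight = np.eye(3)  # identity weight keeps the toy output easy to read
hidden = gcn_layer(window_features, contacts, weight)
```

After this step, window 0 no longer depends on its own sequence alone: its representation is the average of its own features and those of window 3, its contact partner.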

Subject(s)

Genome; Neural Networks, Computer; Chromatin/genetics; Epigenesis, Genetic; Epigenomics

8.

FastSK: fast sequence analysis with gapped string kernels.

Blakely, Derrick; Collins, Eamon; Singh, Ritambhara; Norton, Andrew; Lanchantin, Jack; Qi, Yanjun.

Bioinformatics ; 36(Suppl_2): i857-i865, 2020 Dec 30.

Article in English | MEDLINE | ID: mdl-33381828

ABSTRACT

MOTIVATION: Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences on modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation time, as they depend exponentially on the sub-sequence feature length, number of mismatch positions, and the task's alphabet size. RESULTS: In this work, we introduce a fast and scalable algorithm for calculating gapped k-mer string kernels. Our method, named FastSK, uses a simplified kernel formulation that decomposes the kernel calculation into a set of independent counting operations over the possible mismatch positions. This simplified decomposition allows us to devise a fast Monte Carlo approximation that rapidly converges. FastSK can scale to much greater feature lengths, allows us to consider more mismatches, and is performant on a variety of sequence analysis tasks. On multiple DNA transcription factor binding site prediction datasets, FastSK consistently matches or outperforms the state-of-the-art gkmSVM-2.0 algorithms in area under the ROC curve, while achieving average speedups in kernel computation of ~100× and speedups of ~800× for large feature lengths. We further show that FastSK outperforms character-level recurrent and convolutional neural networks while achieving low variance. We then extend FastSK to 7 English-language medical named entity recognition datasets and 10 protein remote homology detection datasets. FastSK consistently matches or outperforms these baselines. AVAILABILITY AND IMPLEMENTATION: Our algorithm is available as a Python package and as C++ source code at https://github.com/QData/FastSK. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
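The decomposition into independent counting operations, and its Monte Carlo approximation, can be sketched in a few lines. This is an illustrative reading of the abstract, not the FastSK package or its exact kernel definition. The parameter names are assumptions: `g` is the feature length and `k` is the number of positions that must match, so `g - k` positions are treated as mismatch positions. For each choice of kept positions, the code counts pairs of length-`g` substrings that agree on those positions; the sampled version averages over randomly drawn choices instead of enumerating all of them.

```python
import math
import random
from collections import Counter
from itertools import combinations


def project(seq, g, keep):
    """Count each length-g window of seq after dropping the non-kept positions."""
    return Counter(tuple(seq[i + p] for p in keep) for i in range(len(seq) - g + 1))


def matches_at(seq_a, seq_b, g, keep):
    """Number of window pairs that agree on every kept position."""
    counts_a = project(seq_a, g, keep)
    counts_b = project(seq_b, g, keep)
    return sum(n * counts_b[key] for key, n in counts_a.items())


def gapped_kernel_exact(seq_a, seq_b, g, k):
    """Sum the independent counts over every choice of k kept positions."""
    return sum(matches_at(seq_a, seq_b, g, keep) for keep in combinations(range(g), k))


def gapped_kernel_sampled(seq_a, seq_b, g, k, n_samples, seed=0):
    """Monte Carlo estimate: average over random position choices, then rescale."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        keep = tuple(sorted(rng.sample(range(g), k)))
        total += matches_at(seq_a, seq_b, g, keep)
    return math.comb(g, k) * total / n_samples
```

For example, `gapped_kernel_exact("ACGT", "ACGA", 3, 2)` evaluates three position choices and sums their counts. The sampled variant never enumerates all `math.comb(g, k)` choices, which is what makes larger feature lengths tractable in this sketch.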

Subject(s)

Sequence Analysis, Protein; Support Vector Machine; Algorithms; Proteins; Software

9.

Transfer String Kernel for Cross-Context DNA-Protein Binding Prediction.

Singh, Ritambhara; Lanchantin, Jack; Robins, Gabriel; Qi, Yanjun.

IEEE/ACM Trans Comput Biol Bioinform ; 16(5): 1524-1536, 2019.

Article in English | MEDLINE | ID: mdl-27654939

ABSTRACT

Through sequence-based classification, this paper tries to accurately predict the DNA binding sites of transcription factors (TFs) in an unannotated cellular context. Related methods in the literature fail to perform such predictions accurately, since they do not consider sample distribution shift of sequence segments from an annotated (source) context to an unannotated (target) context. We, therefore, propose a method called "Transfer String Kernel" (TSK) that achieves improved prediction of transcription factor binding site (TFBS) using knowledge transfer via cross-context sample adaptation. TSK maps sequence segments to a high-dimensional feature space using a discriminative mismatch string kernel framework. In this high-dimensional space, labeled examples of the source context are re-weighted so that the revised sample distribution matches the target context more closely. We have experimentally verified TSK for TFBS identifications on 14 different TFs under a cross-organism setting. We find that TSK consistently outperforms the state-of-the-art TFBS tools, especially when working with TFs whose binding sequences are not conserved across contexts. We also demonstrate the generalizability of TSK by showing its cutting-edge performance on a different set of cross-context tasks for the MHC peptide binding predictions.
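The re-weighting step can be illustrated with kernel mean matching, a standard technique for aligning a source sample with a target sample in a kernel feature space. The abstract does not name the method TSK uses, so this is a swapped-in sketch under that assumption. It uses an unconstrained ridge-regularized solution with clipping rather than a constrained optimizer, and a Gaussian kernel on scalar toy data stands in for the mismatch string kernel. All function names and values are hypothetical.

```python
import numpy as np


def kmm_weights(k_source, k_cross, ridge=1e-8):
    """Weights that move the weighted source mean toward the target mean in feature space.

    k_source: (n_source, n_source) kernel matrix among source examples.
    k_cross:  (n_source, n_target) kernel matrix between source and target examples.
    Solves (K + ridge * I) w = (n_source / n_target) * K_cross 1, then clips at zero.
    """
    n_source, n_target = k_cross.shape
    kappa = (n_source / n_target) * k_cross.sum(axis=1)
    weights = np.linalg.solve(k_source + ridge * np.eye(n_source), kappa)
    return np.clip(weights, 0.0, None)


def gaussian_kernel(a, b, gamma=1.0):
    """Toy stand-in kernel on scalars; a string kernel matrix could be passed instead."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)


# Hypothetical toy data: the target context resembles only the second source example.
source = np.array([0.0, 5.0])
target = np.array([5.0, 5.0])
weights = kmm_weights(gaussian_kernel(source, source), gaussian_kernel(source, target))
```

Here the source example at 5.0 receives nearly all of the weight, because it is the only one that looks like the target sample. The resulting weights could then be supplied as per-example weights when training a classifier on the labeled source data.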

Subject(s)

Computational Biology/methods; DNA-Binding Proteins; DNA; Machine Learning; Models, Statistical; Algorithms; Animals; DNA/chemistry; DNA/metabolism; DNA-Binding Proteins/chemistry; DNA-Binding Proteins/metabolism; Humans; Mice; Protein Binding

10.

DeepDiff: DEEP-learning for predicting DIFFerential gene expression from histone modifications.

Sekhon, Arshdeep; Singh, Ritambhara; Qi, Yanjun.

Bioinformatics ; 34(17): i891-i900, 2018 Sep 01.

Article in English | MEDLINE | ID: mdl-30423076

ABSTRACT

Motivation: Computational methods that predict differential gene expression from histone modification signals are highly desirable for understanding how histone modifications control the functional heterogeneity of cells through influencing differential gene regulation. Recent studies either failed to capture combinatorial effects on differential prediction or primarily only focused on cell type-specific analysis. In this paper we develop a novel attention-based deep learning architecture, DeepDiff, that provides a unified and end-to-end solution to model and to interpret how dependencies among histone modifications control the differential patterns of gene regulation. DeepDiff uses a hierarchy of multiple Long Short-Term Memory (LSTM) modules to encode the spatial structure of input signals and to model how various histone modifications cooperate automatically. We introduce and train two levels of attention jointly with the target prediction, enabling DeepDiff to attend differentially to relevant modifications and to locate important genome positions for each modification. Additionally, DeepDiff introduces a novel deep-learning based multi-task formulation to use the cell-type-specific gene expression predictions as auxiliary tasks, encouraging richer feature embeddings in our primary task of differential expression prediction. Results: Using data from Roadmap Epigenomics Project (REMC) for ten different pairs of cell types, we show that DeepDiff significantly outperforms the state-of-the-art baselines for differential gene expression prediction. The learned attention weights are validated by observations from previous studies about how epigenetic mechanisms connect to differential gene expression. Availability and implementation: Codes and results are available at deepchrome.org. Supplementary information: Supplementary data are available at Bioinformatics online.

Subject(s)

Gene Expression; Histones/metabolism; Machine Learning; Histone Code; Humans; Protein Processing, Post-Translational; Software

11.

Opportunities and obstacles for deep learning in biology and medicine.

Ching, Travers; Himmelstein, Daniel S; Beaulieu-Jones, Brett K; Kalinin, Alexandr A; Do, Brian T; Way, Gregory P; Ferrero, Enrico; Agapow, Paul-Michael; Zietz, Michael; Hoffman, Michael M; Xie, Wei; Rosen, Gail L; Lengerich, Benjamin J; Israeli, Johnny; Lanchantin, Jack; Woloszynek, Stephen; Carpenter, Anne E; Shrikumar, Avanti; Xu, Jinbo; Cofer, Evan M; Lavender, Christopher A; Turaga, Srinivas C; Alexandari, Amr M; Lu, Zhiyong; Harris, David J; DeCaprio, Dave; Qi, Yanjun; Kundaje, Anshul; Peng, Yifan; Wiley, Laura K; Segler, Marwin H S; Boca, Simina M; Swamidass, S Joshua; Huang, Austin; Gitter, Anthony; Greene, Casey S.

J R Soc Interface ; 15(141), 2018 Apr.

Article in English | MEDLINE | ID: mdl-29618526

ABSTRACT

Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.

Subject(s)

Biomedical Research/trends; Biomedical Technology/trends; Deep Learning/trends; Algorithms; Biomedical Research/methods; Decision Making; Delivery of Health Care/methods; Delivery of Health Care/trends; Disease/genetics; Drug Design; Electronic Health Records/trends; Humans; Terminology as Topic

12.

Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin.

Singh, Ritambhara; Lanchantin, Jack; Sekhon, Arshdeep; Qi, Yanjun.

Adv Neural Inf Process Syst ; 30: 6785-6795, 2017 Dec.

Article in English | MEDLINE | ID: mdl-30147283

ABSTRACT

The past decade has seen a revolution in genomic technologies that enabled a flood of genome-wide profiling of chromatin marks. Recent literature tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand what the relevant factors are and how they work together. Previous studies either failed to model complex dependencies among input signals or relied on separate feature analysis to explain the decisions. This paper presents an attention-based deep learning approach, AttentiveChrome, that uses a unified architecture to model and to interpret dependencies among chromatin factors for controlling gene regulation. AttentiveChrome uses a hierarchy of multiple Long Short-Term Memory (LSTM) modules to encode the input signals and to model how various chromatin marks cooperate automatically. AttentiveChrome trains two levels of attention jointly with the target prediction, enabling it to attend differentially to relevant marks and to locate important positions per mark. We evaluate the model across 56 different cell types (tasks) in humans. Not only is the proposed architecture more accurate, but its attention scores provide a better interpretation than state-of-the-art feature visualization methods such as saliency maps.
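The two levels of attention described above can be sketched as follows. This is a simplified illustration, not the AttentiveChrome code: the LSTM encoders are omitted and replaced by random embeddings, the attention scores are plain dot products with learned query vectors, and the sizes (5 marks, 100 bins, 8 dimensions) and all names are hypothetical.

```python
import numpy as np


def softmax(scores, axis):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=axis, keepdims=True)


def two_level_attention(bin_embeddings, bin_query, mark_query):
    """bin_embeddings has shape (marks, bins, dim); queries have shape (dim,)."""
    # Level 1: within each chromatin mark, weight the genome positions (bins).
    bin_weights = softmax(bin_embeddings @ bin_query, axis=1)               # (marks, bins)
    mark_summaries = (bin_weights[..., None] * bin_embeddings).sum(axis=1)  # (marks, dim)
    # Level 2: weight the marks themselves to form one vector per gene.
    mark_weights = softmax(mark_summaries @ mark_query, axis=0)             # (marks,)
    gene_vector = mark_weights @ mark_summaries                             # (dim,)
    return gene_vector, bin_weights, mark_weights


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 100, 8))  # stand-in for encoder outputs
gene_vector, bin_weights, mark_weights = two_level_attention(
    embeddings, rng.normal(size=8), rng.normal(size=8)
)
```

Both weight arrays are probability distributions, which is what makes them readable as explanations: `bin_weights[m]` indicates which positions matter for mark `m`, and `mark_weights` indicates which marks matter for the gene.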

13.

DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.

Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun.

Pac Symp Biocomput ; 22: 254-265, 2017.

Article in English | MEDLINE | ID: mdl-27896980

ABSTRACT

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard), which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering that recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs and dependencies among them.
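The saliency-map idea, using first-order derivatives to score each nucleotide, can be shown on a toy model. This sketch is not the DeMo Dashboard code. The "model" is a hypothetical logistic scorer over a one-hot encoded sequence, chosen because its gradient has a closed form, and masking the gradient by the one-hot input before summing per position is an assumption about how per-nucleotide importance is read out.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    encoded = np.zeros((len(seq), len(ALPHABET)))
    encoded[np.arange(len(seq)), [ALPHABET.index(base) for base in seq]] = 1.0
    return encoded


def toy_model(encoded, weights, bias):
    """Hypothetical binding score: sigmoid of a position-specific linear score."""
    return 1.0 / (1.0 + np.exp(-(np.sum(encoded * weights) + bias)))


def saliency_map(encoded, weights, bias):
    """Per-position importance from the derivative of the score w.r.t. the input."""
    score = toy_model(encoded, weights, bias)
    gradient = score * (1.0 - score) * weights  # d(sigmoid(z))/dx = s * (1 - s) * w
    return np.abs(gradient * encoded).sum(axis=1)


# Hypothetical weights that reward the motif TATA at positions 2 to 5.
sequence = "GGTATAGG"
weights = np.zeros((len(sequence), len(ALPHABET)))
for offset, base in enumerate("TATA"):
    weights[2 + offset, ALPHABET.index(base)] = 2.0
saliency = saliency_map(one_hot(sequence), weights, bias=-4.0)
```

In this toy case the saliency is nonzero only at the four motif positions, which is the kind of pattern a saliency map is meant to surface. For a real network the gradient would come from automatic differentiation rather than a closed-form expression.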

Subject(s)

Genomics; Models, Genetic; Neural Networks, Computer; Binding Sites/genetics; Computational Biology; DNA/genetics; DNA/metabolism; Databases, Genetic/statistics & numerical data; Genome, Human; Humans; Transcription Factors/metabolism

14.

DeepChrome: deep-learning for predicting gene expression from histone modifications.

Singh, Ritambhara; Lanchantin, Jack; Robins, Gabriel; Qi, Yanjun.

Bioinformatics ; 32(17): i639-i648, 2016 Sep 01.

Article in English | MEDLINE | ID: mdl-27587684

ABSTRACT

MOTIVATION: Histone modifications are among the most important factors that control gene regulation. Computational methods that predict gene expression from histone modification signals are highly desirable for understanding their combinatorial effects in gene regulation. This knowledge can help in developing 'epigenetic drugs' for diseases like cancer. Previous studies for quantifying the relationship between histone modifications and gene expression levels either failed to capture combinatorial effects or relied on multiple methods that separate predictions and combinatorial analysis. This paper develops a unified discriminative framework using a deep convolutional neural network to classify gene expression using histone modification data as input. Our system, called DeepChrome, allows automatic extraction of complex interactions among important features. To simultaneously visualize the combinatorial interactions among histone modifications, we propose a novel optimization-based technique that generates feature pattern maps from the learnt deep model. This provides an intuitive description of underlying epigenetic mechanisms that regulate genes. RESULTS: We show that DeepChrome outperforms state-of-the-art models like Support Vector Machines and Random Forests on the gene expression classification task for 56 different cell types from the REMC database. The output of our visualization technique not only validates the previous observations but also allows novel insights about combinatorial interactions among histone modification marks, some of which have recently been observed by experimental studies. AVAILABILITY AND IMPLEMENTATION: Codes and results are available at www.deepchrome.org. CONTACT: yanjun@virginia.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Gene Expression Regulation; Histone Code; Support Vector Machine; Cluster Analysis; Computational Biology; Epigenesis, Genetic; Gene Regulatory Networks; Humans; Neural Networks, Computer

15.

Causality Analysis of Inertial Body Sensors for Multiple Sclerosis Diagnostic Enhancement.

Gong, Jiaqi; Qi, Yanjun; Goldman, Myla D; Lach, John.

IEEE J Biomed Health Inform ; 20(5): 1273-80, 2016 Sep.

Article in English | MEDLINE | ID: mdl-27411232

ABSTRACT

Inertial body sensors have emerged in recent years as an effective tool for evaluating mobility impairment resulting from various diseases, disorders, and injuries. For example, body sensors have been used in 6-min walk (6 MW) tests for multiple sclerosis (MS) patients to identify gait features useful in the study, diagnosis, and tracking of the disease. However, most studies to date have focused on features localized to the lower or upper extremities and do not provide a holistic assessment of mobility. This paper presents a causality analysis method focused on the coordination between extremities to identify subtle whole-body mobility impairment that may aid disease diagnosis. This method was developed for and utilized in an MS pilot study with 41 subjects (28 persons with MS (PwMS) and 13 healthy controls) performing 6 MW tests. Compared with existing methods, the causality analysis provided better discrimination between healthy controls and PwMS and a deeper understanding of MS disease impact on mobility.

Subject(s)

Gait/physiology , Multiple Sclerosis/diagnosis , Remote Sensing Technology/methods , Adult , Female , Humans , Male , Middle Aged , Computer-Assisted Signal Processing , Walking/physiology

16.

Recurrent chimeric fusion RNAs in non-cancer tissues and cells.

Babiceanu, Mihaela; Qin, Fujun; Xie, Zhongqiu; Jia, Yuemeng; Lopez, Kevin; Janus, Nick; Facemire, Loryn; Kumar, Shailesh; Pang, Yuwei; Qi, Yanjun; Lazar, Iulia M; Li, Hui.

Nucleic Acids Res ; 44(6): 2859-72, 2016 Apr 07.

Article in English | MEDLINE | ID: mdl-26837576

ABSTRACT

Gene fusions and their products (RNA and protein) were once thought to be features unique to cancer. However, chimeric RNAs can also be found in normal cells. Here, we performed, curated and analyzed nearly 300 RNA-Seq libraries covering 30 different non-neoplastic human tissues and cells as well as 15 mouse tissues. A large number of fusion transcripts were found. Most fusions were detected only once, while 291 were seen in more than one sample. We focused on the recurrent fusions and performed RNA and protein level validations on a subset. We characterized these fusions based on various features of the fusions and their parental genes. Recurrent fusions tend to be expressed at higher levels relative to their parental genes than non-recurrent ones. Over half of the recurrent fusions involve neighboring genes transcribed in the same direction. A few sequence motifs were found to be enriched close to the fusion junction sites. We performed functional analyses on a few widely expressed fusions, and found that silencing them resulted in a dramatic reduction in normal cell growth and/or motility. Most chimeras use canonical splicing sites, and are thus likely products of 'intergenic splicing'. We also explored the implications of these non-pathological fusions in cancer and in evolution.

Subject(s)

Fibroblasts/metabolism , Gene Fusion , Mesenchymal Stem Cells/metabolism , RNA Splicing , Messenger RNA/genetics , Animals , Astrocytes/cytology , Astrocytes/metabolism , Base Sequence , Transformed Cell Line , Computational Biology , Molecular Evolution , Fibroblasts/cytology , Gene Library , Gene Silencing , High-Throughput Nucleotide Sequencing , Humans , Mesenchymal Stem Cells/cytology , Mice , Molecular Sequence Data , Primary Cell Culture , Messenger RNA/antagonists & inhibitors , Messenger RNA/metabolism , Small Interfering RNA/genetics , Small Interfering RNA/metabolism , RNA Sequence Analysis , Species Specificity

17.

Cas9-chromatin binding information enables more accurate CRISPR off-target prediction.

Singh, Ritambhara; Kuscu, Cem; Quinlan, Aaron; Qi, Yanjun; Adli, Mazhar.

Nucleic Acids Res ; 43(18): e118, 2015 Oct 15.

Article in English | MEDLINE | ID: mdl-26032770

ABSTRACT

The CRISPR system has become a powerful biological tool with a wide range of applications. However, improving targeting specificity and accurately predicting potential off-targets remain significant goals. Here, we introduce a web-based CRISPR/Cas9 Off-target Prediction and Identification Tool (CROP-IT) that performs improved off-target binding and cleavage site predictions. Unlike existing prediction programs that solely use DNA sequence information, CROP-IT integrates whole-genome-level biological information from existing Cas9 binding and cleavage data sets. Utilizing whole-genome chromatin state information from 125 human cell types further enhances its computational prediction power. Comparative analyses on experimentally validated datasets show that CROP-IT outperforms existing computational algorithms in predicting both Cas9 binding and cleavage sites. With a user-friendly web interface, CROP-IT outputs a scored and ranked list of potential off-targets that enables improved guide RNA design and more accurate prediction of Cas9 binding or cleavage sites.

Subject(s)

CRISPR-Associated Proteins/metabolism , CRISPR-Cas Systems , Chromatin/metabolism , Deoxyribonucleases/metabolism , Software , Algorithms , Binding Sites , DNA Cleavage , Humans , DNA Sequence Analysis , RNA Sequence Analysis/methods

18.

Refining literature curated protein interactions using expert opinions.

Tastan, Oznur; Qi, Yanjun; Carbonell, Jaime G; Klein-Seetharaman, Judith.

Pac Symp Biocomput ; : 318-29, 2015.

Article in English | MEDLINE | ID: mdl-25592592

ABSTRACT

The availability of high-quality physical interaction datasets is a prerequisite for system-level analysis of interactomes and for supervised models that predict protein-protein interactions (PPIs). One source is literature-curated PPI databases, in which pairwise associations of proteins published in the scientific literature are deposited. However, PPIs may not be clearly labelled as physical interactions, which affects the quality of the entire dataset. In order to obtain a high-quality gold standard dataset for PPIs between human immunodeficiency virus (HIV-1) and its human host, we adopted a crowd-sourcing approach. We collected expert opinions and utilized an expectation-maximization based approach to estimate expert labeling quality. These estimates are used to infer the probability of a reported PPI actually being a direct physical interaction given the set of expert opinions. The effectiveness of our approach is demonstrated through synthetic data experiments, and a high-quality physical interaction network between HIV and human proteins is obtained. Since many literature-curated databases suffer from similar challenges, the framework described herein could be utilized in refining other databases. The curated data is available at http://www.cs.bilkent.edu.tr/~oznur.tastan/supp/psb2015/.

Subject(s)

Protein Databases/statistics & numerical data , Protein Interaction Maps , Computational Biology , Crowdsourcing , Expert Testimony , HIV-1/pathogenicity , HIV-1/physiology , Host-Pathogen Interactions , Human Immunodeficiency Virus Proteins/physiology , Humans , Knowledge Discovery , Likelihood Functions , Statistical Models , Systems Analysis
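
The abstract of entry 18 mentions an expectation-maximization approach that estimates expert labeling quality and infers the probability that a reported PPI is a direct physical interaction. The abstract does not give the model itself, so what follows is only a minimal sketch under a simplifying assumption: each expert has a single accuracy parameter (a "one-coin" model), and each candidate interaction has a hidden binary label. The function name `em_expert_quality`, the vote encoding, and the toy data are all hypothetical and are not taken from the paper.

```python
import math

def em_expert_quality(votes, n_iter=50):
    """One-coin EM sketch (an assumption, not the paper's exact model).

    votes[i][j] is 1 (physical), 0 (not physical), or None when
    expert j gave no opinion on candidate interaction i.
    Returns (posteriors, accuracies).
    """
    n_items, n_experts = len(votes), len(votes[0])
    eps = 1e-6  # keeps probabilities away from 0 and 1 before taking logs

    # Initialise each posterior with the fraction of positive votes.
    post = []
    for row in votes:
        given = [v for v in row if v is not None]
        post.append(sum(given) / len(given) if given else 0.5)

    acc = [0.5] * n_experts
    for _ in range(n_iter):
        # M-step: re-estimate the class prior and each expert's accuracy
        # from the current soft labels.
        prior = min(max(sum(post) / n_items, eps), 1 - eps)
        for j in range(n_experts):
            agree, total = 0.0, 0
            for i in range(n_items):
                v = votes[i][j]
                if v is None:
                    continue
                agree += post[i] if v == 1 else 1 - post[i]
                total += 1
            a = agree / total if total else 0.5
            acc[j] = min(max(a, eps), 1 - eps)

        # E-step: posterior that each item is a true physical interaction.
        for i in range(n_items):
            log1, log0 = math.log(prior), math.log(1 - prior)
            for j in range(n_experts):
                v = votes[i][j]
                if v is None:
                    continue
                if v == 1:
                    log1 += math.log(acc[j])
                    log0 += math.log(1 - acc[j])
                else:
                    log1 += math.log(1 - acc[j])
                    log0 += math.log(acc[j])
            m = max(log1, log0)
            p1, p0 = math.exp(log1 - m), math.exp(log0 - m)
            post[i] = p1 / (p1 + p0)
    return post, acc

# Toy data: experts 0 and 1 mostly agree; expert 2 often disagrees.
votes = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, None, 0],
]
posteriors, accuracies = em_expert_quality(votes)
```

In this toy run the expert who usually contradicts the other two ends up with a lower estimated accuracy, and that expert's votes are down-weighted when the posteriors are computed. A fuller model could give each expert separate sensitivity and specificity parameters.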

19.

[Changes of proliferation and angiogenesis of residual tumor in rabbit lung after radiofrequency ablation].

Ma, Lianjun; Zhou, Naikang; Qi, Yanjun; Peng, Yanghong; Zhang, Shuxin.

Zhonghua Yi Xue Za Zhi ; 94(21): 1671-3, 2014 Jun 03.

Article in Chinese | MEDLINE | ID: mdl-25152296

ABSTRACT

OBJECTIVE: To observe the changes in proliferation and angiogenesis of residual tumor in rabbit lung after radiofrequency ablation (RFA). METHODS: A model of VX2 tumor in rabbit lung was established by injection of a tissue block suspension. 64 New Zealand White rabbits bearing VX2 tumor were assigned randomly to the control group (n = 10) and the RFA group (n = 48). During the RFA procedure, residual tumors were created by controlling the extent of electrode expansion, output power and treatment time. At several time points, the Ki-67 labeling index (Ki-67LI) and microvessel density (MVD) of the residual tumors were calculated by immunohistochemical detection. RESULTS: Ki-67LI of the control group was 45.3% ± 2.1%. Ki-67LI of the RFA group on days 1, 3 and 5 was 56.4% ± 3.4%, 60.1% ± 4.1% and 59.8% ± 2.4% respectively, significantly higher than that of the control group; however, on days 7, 9, 14 and 21 it was 45.4% ± 2.0%, 46.2% ± 3.4%, 45.1% ± 4.4% and 47.8% ± 3.9% respectively, with no significant difference compared with the control group. MVD of the control group was 28.9 ± 2.9. MVD of the RFA group on days 3, 5 and 7 was 36.8 ± 2.6, 55.6 ± 4.8 and 51.5 ± 2.8 respectively, significantly higher than that of the control group; however, on days 1, 9, 14 and 21 it was 27 ± 2.8, 29.2 ± 3.2, 30 ± 2.8 and 28.8 ± 3.1 respectively, with no significant difference compared with the control group. CONCLUSIONS: The proliferation and angiogenesis of pulmonary residual tumor exhibit a transient increase after RFA.

Subject(s)

Cell Proliferation , Lung Neoplasms/blood supply , Lung Neoplasms/pathology , Pathologic Neovascularization , Animals , Catheter Ablation , Electrodes , Residual Neoplasm , Rabbits

20.

An integrated approach to blood-based cancer diagnosis and biomarker discovery.

Min, Martin Renqiang; Chowdhury, Salim; Qi, Yanjun; Stewart, Alex; Ostroff, Rachel.

Pac Symp Biocomput ; : 87-98, 2014.

Article in English | MEDLINE | ID: mdl-24297536

ABSTRACT

Disrupted or abnormal biological processes responsible for cancers often quantitatively manifest as disrupted additive and multiplicative interactions of gene/protein expressions correlating with cancer progression. However, the examination of all possible combinatorial interactions between gene features in most case-control studies with limited training data is computationally infeasible. In this paper, we propose a practically feasible data integration approach, QUIRE (QUadratic Interactions among infoRmative fEatures), to identify discriminative complex interactions among informative gene features for cancer diagnosis and biomarker discovery directly based on patient blood samples. QUIRE works in two stages, where it first identifies functionally relevant gene groups for the disease with the help of gene functional annotations and available physical protein interactions, then it explores the combinatorial relationships among the genes from the selected informative groups. Based on our private experimentally generated data from patient blood samples using a novel SOMAmer (Slow Off-rate Modified Aptamer) technology, we apply QUIRE to cancer diagnosis and biomarker discovery for Renal Cell Carcinoma (RCC) and Ovarian Cancer (OVC). To further demonstrate the general applicability of our approach, we also apply QUIRE to a publicly available Colorectal Cancer (CRC) dataset that can be used to prioritize our SOMAmer design. Our experimental results show that QUIRE identifies gene-gene interactions that can better identify the different cancer stages of samples, as compared to other state-of-the-art feature selection methods. A literature survey shows that many of the interactions identified by QUIRE play important roles in the development of cancer.

Subject(s)

Biomarkers/blood , Neoplasms/blood , Neoplasms/diagnosis , Artificial Intelligence , Renal Cell Carcinoma/blood , Renal Cell Carcinoma/diagnosis , Renal Cell Carcinoma/genetics , Colorectal Neoplasms/blood , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/genetics , Computational Biology , Disease Progression , Genetic Epistasis , Female , Genetic Markers , Genome-Wide Association Study/statistics & numerical data , Humans , Kidney Neoplasms/blood , Kidney Neoplasms/diagnosis , Kidney Neoplasms/genetics , Genetic Models , Neoplasms/genetics , Ovarian Neoplasms/blood , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/genetics , SELEX Aptamer Technique/statistics & numerical data
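
The two-stage idea described in the abstract of entry 20 (first pick informative gene groups, then explore pairwise interactions only among genes in those groups) can be illustrated with a small sketch. This is not the QUIRE algorithm: the abstract says group selection uses functional annotations and physical protein interactions, whereas the stand-in score here is simply the mean absolute difference between class means. The function names, the `top_k` parameter, and the toy data are hypothetical.

```python
from itertools import combinations

def group_score(X, y, genes):
    """Stand-in relevance score (an assumption, not the paper's criterion):
    mean absolute difference between case and control means."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    diffs = []
    for g in genes:
        mean_pos = sum(r[g] for r in pos) / len(pos)
        mean_neg = sum(r[g] for r in neg) / len(neg)
        diffs.append(abs(mean_pos - mean_neg))
    return sum(diffs) / len(diffs)

def quadratic_features(X, y, groups, top_k=1):
    """Stage 1: keep the top_k highest-scoring gene groups.
    Stage 2: add a product feature for every gene pair in the kept groups."""
    ranked = sorted(groups, key=lambda name: group_score(X, y, groups[name]),
                    reverse=True)
    selected = sorted({g for name in ranked[:top_k] for g in groups[name]})
    pairs = list(combinations(selected, 2))
    expanded = [
        [row[g] for g in selected] + [row[a] * row[b] for a, b in pairs]
        for row in X
    ]
    return selected, pairs, expanded

# Toy data: 4 samples x 4 genes, two hypothetical gene groups.
X = [
    [2.0, 3.0, 1.0, 1.0],
    [2.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]
y = [1, 1, 0, 0]
groups = {"A": [0, 1], "B": [2, 3]}
selected, pairs, expanded = quadratic_features(X, y, groups, top_k=1)
```

Restricting the pairwise products to genes from the selected groups is what keeps the number of interaction terms manageable: with n genes overall there are n(n-1)/2 possible pairs, but only the pairs within the kept groups are generated.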
