Results 1 - 20 of 32
1.
Nature; 2021 Nov 24.
Article in English | MEDLINE | ID: mdl-34819669

ABSTRACT

The cell is a multi-scale structure with modular organization across at least four orders of magnitude [1]. Two central approaches for mapping this structure, protein fluorescent imaging and protein biophysical association, each generate extensive datasets, but of distinct qualities and resolutions that are typically treated separately [2,3]. Here we integrate immunofluorescence images in the Human Protein Atlas [4] with affinity purifications in BioPlex [5] to create a unified hierarchical map of human cell architecture. Integration is achieved by configuring each approach as a general measure of protein distance, then calibrating the two measures using machine learning. The map, known as the multi-scale integrated cell (MuSIC 1.0), resolves 69 subcellular systems, of which approximately half are to our knowledge undocumented. Accordingly, we perform 134 additional affinity purifications and validate subunit associations for the majority of systems. The map reveals a pre-ribosomal RNA processing assembly and accessory factors, which we show govern rRNA maturation, and functional roles for SRRM1 and FAM120C in chromatin and RPS3A in splicing. By integration across scales, MuSIC increases the resolution of imaging while giving protein interactions a spatial dimension, paving the way to incorporate diverse types of data in proteome-wide cell maps.

2.
NAR Genom Bioinform; 3(4): lqab097, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34729476

ABSTRACT

Prediction of cancer-specific drug responses as well as identification of the corresponding drug-sensitive genes and pathways remains a major biological and clinical challenge. Deep learning models hold immense promise for better drug response predictions, but most of them cannot provide biological and clinical interpretability. Visible neural network (VNN) models have emerged to solve the problem by giving neurons biological meanings and directly casting biological networks into the models. However, the biological networks used in VNNs are often redundant and contain components that are irrelevant to the downstream predictions. Therefore, the VNNs using these redundant biological networks are overparameterized, which significantly limits VNNs' predictive and explanatory power. To overcome the problem, we treat the edges and nodes in biological networks used in VNNs as features and develop a sparse learning framework ParsVNN to learn parsimony VNNs with only edges and nodes that contribute the most to the prediction task. We applied ParsVNN to build cancer-specific VNN models to predict drug response for five different cancer types. We demonstrated that the parsimony VNNs built by ParsVNN are superior to other state-of-the-art methods in terms of prediction performance and identification of cancer driver genes. Furthermore, we found that the pathways selected by ParsVNN have great potential to predict clinical outcomes as well as recommend synergistic drug combinations.
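
The idea of treating network edges as features and keeping only those that contribute to the prediction can be sketched with a generic L1 shrinkage step. This is a minimal illustration, not ParsVNN's actual optimizer: the soft-thresholding rule, the edge names, the weights, and the penalty `lam` are all assumptions chosen for the example.

```python
def soft_threshold(w, lam):
    """Proximal operator of lam * |w|: shrink w toward zero by lam."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def prune_edges(edge_weights, lam):
    """Shrink every edge weight and keep only edges that remain non-zero."""
    shrunk = {edge: soft_threshold(w, lam) for edge, w in edge_weights.items()}
    return {edge: w for edge, w in shrunk.items() if w != 0.0}


# Hypothetical gene -> subsystem edges with already-learned weights.
edges = {
    ("TP53", "DNA repair"): 0.9,
    ("GENE_X", "DNA repair"): 0.05,
    ("RB1", "cell cycle"): -0.6,
    ("GENE_Y", "cell cycle"): -0.02,
}

# Edges whose weight magnitude is at most lam are removed from the network.
kept = prune_edges(edges, lam=0.1)
```

In a full training loop a step like this would alternate with gradient updates, so that weakly contributing edges and nodes are driven to exactly zero and drop out of the model.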

3.
Bioinformatics; 37(Suppl_1): i254-i261, 2021 Jul 12.
Article in English | MEDLINE | ID: mdl-34252932

ABSTRACT

MOTIVATION: The prediction of the binding between peptides and major histocompatibility complex (MHC) molecules plays an important role in neoantigen identification. Although a large number of computational methods have been developed to address this problem, they produce high false-positive rates in practical applications, because in most cases a single residue mutation may largely alter the binding affinity of a peptide to MHC, a change that conventional deep learning methods cannot identify. RESULTS: We developed a differential boundary tree-based model, named DBTpred, to address this problem. We demonstrated that DBTpred can accurately predict MHC class I binding affinity compared to the state-of-the-art deep learning methods. We also presented a parallel training algorithm to accelerate the training and inference process, which enables DBTpred to be applied to large datasets. By investigating the statistical properties of differential boundary trees and the prediction paths to test samples, we revealed that DBTpred can provide an intuitive interpretation and possible hints in detecting important residue mutations that can largely influence binding affinity. AVAILABILITY AND IMPLEMENTATION: The DBTpred package is implemented in Python and freely available at: https://github.com/fpy94/DBT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Histocompatibility Antigens Class I; Peptides; Algorithms; Histocompatibility Antigens Class I/genetics; Histocompatibility Antigens Class I/metabolism; Humans; Major Histocompatibility Complex; Peptides/metabolism; Protein Binding
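
For readers unfamiliar with boundary trees, the sketch below shows the classic, non-differentiable variant: a query descends greedily toward the closest stored example, and a training example is stored only when the tree would mislabel it. It is an assumption-laden illustration, not the DBTpred model; the two-dimensional feature vectors and the labels are hypothetical stand-ins for encoded peptides.

```python
import math


class Node:
    def __init__(self, x, label):
        self.x = x
        self.label = label
        self.children = []


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def query(root, x):
    """Greedy descent: move to the closest child until the current node is closest."""
    node = root
    while True:
        best = min([node] + node.children, key=lambda n: dist(n.x, x))
        if best is node:
            return node
        node = best


def train(root, x, label):
    """Store the example only if the tree currently predicts the wrong label."""
    nearest = query(root, x)
    if nearest.label != label:
        nearest.children.append(Node(x, label))


# Hypothetical encoded peptides with binder / non-binder labels.
root = Node((0.0, 0.0), "non-binder")
train(root, (1.0, 1.0), "binder")    # misclassified, so it is stored
train(root, (0.9, 0.9), "binder")    # already classified correctly, so it is skipped
prediction = query(root, (0.8, 1.0)).label
```

Because every prediction is a short path through stored examples, the path itself indicates which training samples drove the decision.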
4.
Bioinformatics; 37(Suppl_1): i410-i417, 2021 Jul 12.
Article in English | MEDLINE | ID: mdl-34252957

ABSTRACT

MOTIVATION: Recently, machine learning models have achieved tremendous success in prioritizing candidate genes for genetic diseases. These models are able to accurately quantify the similarity among diseases and genes based on the intuition that similar genes are more likely to be associated with similar diseases. However, the genetic features these methods rely on are often hard to collect due to high experimental cost and various other technical limitations. Existing solutions to this problem significantly increase the risk of overfitting and decrease the generalizability of the models. RESULTS: In this work, we propose a graph neural network (GNN) version of the Learning under Privileged Information paradigm to predict new disease gene associations. Unlike previous gene prioritization approaches, our model does not require the genetic features to be the same at training and test stages. If a genetic feature is hard to measure and therefore missing at the test stage, our model could still efficiently incorporate its information during the training process. To implement this, we develop a Heteroscedastic Gaussian Dropout algorithm, where the dropout probability of the GNN model is determined by another GNN model with a mirrored GNN architecture. To evaluate our method, we compared it with four state-of-the-art methods on the Online Mendelian Inheritance in Man dataset to prioritize candidate disease genes. Extensive evaluations show that our model could improve the prediction accuracy when all the features are available compared to other methods. More importantly, our model could make very accurate predictions when >90% of the features are missing at the test stage. AVAILABILITY AND IMPLEMENTATION: Our method is implemented in Python 3.7 and PyTorch 1.5.0; the method and data are freely available at: https://github.com/juanshu30/Disease-Gene-Prioritization-with-Privileged-Information-and-Heteroscedastic-Dropout.


Subjects
Databases, Genetic; Neural Networks, Computer; Algorithms; Humans; Machine Learning
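
The general mechanism behind heteroscedastic Gaussian dropout can be sketched in a few lines: activations are multiplied by Gaussian noise centered at 1, and the noise scale is computed from the privileged features that exist only at training time. This is a rough sketch under stated assumptions, not the authors' GNN implementation; the linear-plus-softplus `noise_scale` function, its weights, and the feature values are hypothetical.

```python
import math
import random


def noise_scale(privileged, weights):
    """Hypothetical helper: map privileged features to a positive noise std via softplus."""
    z = sum(w * p for w, p in zip(weights, privileged))
    return math.log1p(math.exp(z))


def gaussian_dropout(activations, sigma, rng, training=True):
    """Multiply each activation by noise drawn from N(1, sigma^2) during training only."""
    if not training:
        return list(activations)
    return [a * rng.gauss(1.0, sigma) for a in activations]


rng = random.Random(0)
hidden = [0.4, -1.2, 0.7]

# Training: privileged features are available and set the noise level.
sigma = noise_scale(privileged=[0.2, -0.1], weights=[1.5, 0.5])
noisy_hidden = gaussian_dropout(hidden, sigma, rng, training=True)

# Test: privileged features are missing, so no noise is applied.
test_hidden = gaussian_dropout(hidden, sigma, rng, training=False)
```

The design point is that privileged information shapes the regularization during training but is never needed as a model input at test time.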
5.
Nat Cancer; 2(2): 233-244, 2021 Feb.
Article in English | MEDLINE | ID: mdl-34223192

ABSTRACT

Cell-line screens create expansive datasets for learning predictive markers of drug response, but these models do not readily translate to the clinic with its diverse contexts and limited data. In the present study, we apply a recently developed technique, few-shot machine learning, to train a versatile neural network model in cell lines that can be tuned to new contexts using few additional samples. The model quickly adapts when switching among different tissue types and in moving from cell-line models to clinical contexts, including patient-derived tumor cells and patient-derived xenografts. It can also be interpreted to identify the molecular features most important to a drug response, highlighting critical roles for RB1 and SMAD4 in the response to CDK inhibition and RNF8 and CHD4 in the response to ATM inhibition. The few-shot learning framework provides a bridge from the many samples surveyed in high-throughput screens (n-of-many) to the distinctive contexts of individual patients (n-of-one).

6.
Immunity; 54(6): 1304-1319.e9, 2021 Jun 08.
Article in English | MEDLINE | ID: mdl-34048708

ABSTRACT

Despite mounting evidence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) engagement with immune cells, most express little, if any, of the canonical receptor of SARS-CoV-2, angiotensin-converting enzyme 2 (ACE2). Here, using a myeloid cell receptor-focused ectopic expression screen, we identified several C-type lectins (DC-SIGN, L-SIGN, LSECtin, ASGR1, and CLEC10A) and Tweety family member 2 (TTYH2) as glycan-dependent binding partners of the SARS-CoV-2 spike. Except for TTYH2, these molecules primarily interacted with spike via regions outside of the receptor-binding domain. Single-cell RNA sequencing analysis of pulmonary cells from individuals with coronavirus disease 2019 (COVID-19) indicated predominant expression of these molecules on myeloid cells. Although these receptors do not support active replication of SARS-CoV-2, their engagement with the virus induced robust proinflammatory responses in myeloid cells that correlated with COVID-19 severity. We also generated a bispecific anti-spike nanobody that not only blocked ACE2-mediated infection but also the myeloid receptor-mediated proinflammatory responses. Our findings suggest that SARS-CoV-2-myeloid receptor interactions promote immune hyperactivation, which represent potential targets for COVID-19 therapy.


Subjects
COVID-19/metabolism; COVID-19/virology; Host-Pathogen Interactions; Lectins, C-Type/metabolism; Membrane Proteins/metabolism; Myeloid Cells/immunology; Myeloid Cells/metabolism; Neoplasm Proteins/metabolism; SARS-CoV-2/physiology; Angiotensin-Converting Enzyme 2/metabolism; Binding Sites; COVID-19/genetics; Cell Line; Cytokines; Gene Expression Regulation; Host-Pathogen Interactions/genetics; Host-Pathogen Interactions/immunology; Humans; Inflammation Mediators/metabolism; Lectins, C-Type/chemistry; Membrane Proteins/chemistry; Models, Molecular; Neoplasm Proteins/chemistry; Protein Binding; Protein Conformation; Single-Domain Antibodies/immunology; Spike Glycoprotein, Coronavirus/chemistry; Spike Glycoprotein, Coronavirus/immunology; Spike Glycoprotein, Coronavirus/metabolism; Structure-Activity Relationship
7.
Sensors (Basel); 21(1), 2020 Dec 26.
Article in English | MEDLINE | ID: mdl-33375324

ABSTRACT

In this paper, we propose AirSign, a novel user authentication technology to provide users with more convenient, intuitive, and secure ways of interacting with smartphones in daily settings. AirSign leverages both acoustic and motion sensors for user authentication by signing signatures in the air through smartphones without requiring any special hardware. This technology actively transmits inaudible acoustic signals from the earpiece speaker, receives echoes back through both built-in microphones to "illuminate" signature and hand geometry, and authenticates users according to the unique features extracted from echoes and motion sensors. To evaluate our system, we collected registered, genuine, and forged signatures from 30 participants, and by applying AirSign on the above dataset, we were able to successfully distinguish between genuine and forged signatures with a 97.1% F-score while requesting only seven signatures during the registration phase.

8.
Cancer Cell; 38(5): 672-684.e6, 2020 Nov 09.
Article in English | MEDLINE | ID: mdl-33096023

ABSTRACT

Most drugs entering clinical trials fail, often related to an incomplete understanding of the mechanisms governing drug response. Machine learning techniques hold immense promise for better drug response predictions, but most have not reached clinical practice due to their lack of interpretability and their focus on monotherapies. We address these challenges by developing DrugCell, an interpretable deep learning model of human cancer cells trained on the responses of 1,235 tumor cell lines to 684 drugs. Tumor genotypes induce states in cellular subsystems that are integrated with drug structure to predict response to therapy and, simultaneously, learn biological mechanisms underlying the drug response. DrugCell predictions are accurate in cell lines and also stratify clinical outcomes. Analysis of DrugCell mechanisms leads directly to the design of synergistic drug combinations, which we validate systematically by combinatorial CRISPR, drug-drug screening in vitro, and patient-derived xenografts. DrugCell provides a blueprint for constructing interpretable models for predictive medicine.


Subjects
Antineoplastic Agents/therapeutic use; Computational Biology/methods; Neoplasms/drug therapy; Antineoplastic Agents/pharmacology; Cell Line, Tumor; Databases, Factual; Deep Learning; Drug Screening Assays, Antitumor; Drug Synergism; Genotype; Humans; Neoplasms/genetics; Patient-Specific Modeling
9.
Bioinformatics; 36(Suppl_1): i542-i550, 2020 Jul 01.
Article in English | MEDLINE | ID: mdl-32657383

ABSTRACT

MOTIVATION: Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq) couples the measurement of surface marker proteins with simultaneous sequencing of mRNA at the single-cell level, which brings accurate cell surface phenotyping to single-cell transcriptomics. Unfortunately, multiplets in CITE-seq datasets create artificial cell types (ACT) and complicate the automation of cell surface phenotyping. RESULTS: We propose CITE-sort, an artificial-cell-type aware surface marker clustering method for CITE-seq. CITE-sort is aware of and is robust to multiplet-induced ACT. We benchmarked CITE-sort with real and simulated CITE-seq datasets and compared CITE-sort against canonical clustering methods. We show that CITE-sort produces the best clustering performance across the board. CITE-sort not only accurately identifies real biological cell types (BCT) but also consistently and reliably separates multiplet-induced artificial-cell-type droplet clusters from real BCT droplet clusters. In addition, CITE-sort organizes its clustering process with a binary tree, which facilitates easy interpretation and verification of its clustering result and simplifies cell-type annotation with domain knowledge in CITE-seq. AVAILABILITY AND IMPLEMENTATION: http://github.com/QiuyuLian/CITE-sort. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling; Single-Cell Analysis; Cluster Analysis; Epitopes; Sequence Analysis, RNA; Software
10.
Cell Syst; 11(2): 176-185.e6, 2020 Aug 26.
Article in English | MEDLINE | ID: mdl-32619550

ABSTRACT

All mammals progress through similar physiological stages throughout life, from early development to puberty, aging, and death. Yet, the extent to which this conserved physiology reflects underlying genomic events is unclear. Here, we map the common methylation changes experienced by mammalian genomes as they age, focusing on comparison of humans with dogs, an emerging model of aging. Using oligo-capture sequencing, we characterize methylomes of 104 Labrador retrievers spanning a 16-year age range, achieving >150× coverage within mammalian syntenic blocks. Comparison with human methylomes reveals a nonlinear relationship that translates dog-to-human years and aligns the timing of major physiological milestones between the two species, with extension to mice. Conserved changes center on developmental gene networks, which are sufficient to translate age and the effects of anti-aging interventions across multiple mammals. These results establish methylation not only as a diagnostic age readout but also as a cross-species translator of physiological aging milestones.


Subjects
Aging/genetics; DNA Methylation/genetics; Animals; Dogs; Humans
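
The nonlinear dog-to-human translation mentioned in the abstract can be illustrated with a logarithmic conversion. The abstract does not state the functional form or any coefficients; the natural-log form and the values 16 and 31 below are assumed from the commonly cited version of this study's result and should be checked against the full paper before use.

```python
import math


def dog_to_human_years(dog_age_years):
    """Assumed logarithmic conversion: human_age = 16 * ln(dog_age) + 31."""
    if dog_age_years <= 0:
        raise ValueError("dog age must be positive")
    return 16.0 * math.log(dog_age_years) + 31.0


# The curve rises steeply for young dogs and flattens with age.
ages = {dog_age: dog_to_human_years(dog_age) for dog_age in (1, 4, 12)}
```

Under these assumed coefficients, equal steps in dog age correspond to progressively smaller steps in human age, which is what distinguishes the relationship from a fixed multiplier.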
11.
Proc Natl Acad Sci U S A; 116(28): 14011-14018, 2019 Jul 09.
Article in English | MEDLINE | ID: mdl-31235599

ABSTRACT

Three-dimensional genome structure plays a pivotal role in gene regulation and cellular function. Single-cell analysis of genome architecture has been achieved using imaging and chromatin conformation capture methods such as Hi-C. To study variation in chromosome structure between different cell types, computational approaches are needed that can utilize sparse and heterogeneous single-cell Hi-C data. However, few methods exist that are able to accurately and efficiently cluster such data into constituent cell types. Here, we describe scHiCluster, a single-cell clustering algorithm for Hi-C contact matrices that is based on imputations using linear convolution and random walk. Using both simulated and real single-cell Hi-C data as benchmarks, scHiCluster significantly improves clustering accuracy when applied to low coverage datasets compared with existing methods. After imputation by scHiCluster, topologically associating domain (TAD)-like structures (TLSs) can be identified within single cells, and their consensus boundaries were enriched at the TAD boundaries observed in bulk cell Hi-C samples. In summary, scHiCluster facilitates visualization and comparison of single-cell 3D genomes.


Subjects
Chromatin/ultrastructure; Chromosome Structures/ultrastructure; Computational Biology; Single-Cell Analysis; Algorithms; Cluster Analysis; Genome/genetics; Humans; Molecular Conformation
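
The two imputation steps named in the abstract, linear convolution and random walk, can be sketched on a toy contact matrix. This is a simplified illustration rather than the scHiCluster code: the all-ones filter, the window size `w`, the restart probability, and the 4 x 4 matrix are assumptions.

```python
def convolve(mat, w=1):
    """Replace each entry with the sum of its (2w+1) x (2w+1) neighborhood."""
    n = len(mat)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[i][j] = sum(
                mat[a][b]
                for a in range(max(0, i - w), min(n, i + w + 1))
                for b in range(max(0, j - w), min(n, j + w + 1))
            )
    return out


def row_normalize(mat):
    """Scale each row to sum to 1 (all-zero rows stay zero)."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in mat]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def random_walk_restart(mat, restart=0.5, tol=1e-6, max_iter=100):
    """Iterate Q <- (1 - restart) * Q @ P + restart * I until Q stops changing."""
    n = len(mat)
    p = row_normalize(mat)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    q = identity
    for _ in range(max_iter):
        step = matmul(q, p)
        nxt = [
            [(1 - restart) * step[i][j] + restart * identity[i][j] for j in range(n)]
            for i in range(n)
        ]
        delta = max(abs(nxt[i][j] - q[i][j]) for i in range(n) for j in range(n))
        q = nxt
        if delta < tol:
            break
    return q


# Hypothetical sparse single-cell contact matrix for four genomic bins.
contacts = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
imputed = random_walk_restart(convolve(contacts, w=1), restart=0.5)
```

Convolution borrows signal from neighboring bins, and the random walk then spreads it along longer paths, so bin pairs with no observed contact receive non-zero imputed values.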
12.
Cell Syst; 8(3): 267-273.e3, 2019 Mar 27.
Article in English | MEDLINE | ID: mdl-30878356

ABSTRACT

Systems biology requires not only genome-scale data but also methods to integrate these data into interpretable models. Previously, we developed approaches that organize omics data into a structured hierarchy of cellular components and pathways, called a "data-driven ontology." Such hierarchies recapitulate known cellular subsystems and discover new ones. To broadly facilitate this type of modeling, we report the development of a software library called the Data-Driven Ontology Toolkit (DDOT), consisting of a Python package (https://github.com/idekerlab/ddot) to assemble and analyze ontologies and a web application (http://hiview.ucsd.edu) to visualize them. Using DDOT, we programmatically assemble a compendium of ontologies for 652 diseases by integrating gene-disease mappings with a gene similarity network derived from omics data. For example, the ontology for Fanconi anemia describes known and novel disease mechanisms in its hierarchy of 194 genes and 74 subsystems. DDOT provides an easy interface to share ontologies online at the Network Data Exchange.


Subjects
Biological Ontologies; Computational Biology/methods; Gene Regulatory Networks; Software; Gene Ontology; Humans
13.
Bioinformatics; 35(14): 2528, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30726869
14.
Nat Commun; 9(1): 4159, 2018 Oct 08.
Article in English | MEDLINE | ID: mdl-30297789

ABSTRACT

Many recent efforts to analyze cancer genomes involve aggregation of mutations within reference maps of molecular pathways and protein networks. Here, we find these pathway studies are impeded by molecular interactions that are functionally irrelevant to cancer or the patient's tumor type, as these interactions diminish the contrast of driver pathways relative to individual frequently mutated genes. This problem can be addressed by creating stringent tumor-specific networks of biophysical protein interactions, identified by signatures of epistatic selection during tumor evolution. Using such an evolutionarily selected pathway (ESP) map, we analyze the major cancer genome atlases to derive a hierarchical classification of tumor subtypes linked to characteristic mutated pathways. These pathways are clinically prognostic and predictive, including the TP53-AXIN-ARHGEF17 combination in liver and CYLC2-STK11-STK11IP in lung cancer, which we validate in independent cohorts. This ESP framework substantially improves the definition of cancer pathways and subtypes from tumor genome data.


Subjects
Clonal Evolution/genetics; Mutation; Neoplasms/genetics; Signal Transduction/genetics; Gene Expression Regulation, Neoplastic; Genome, Human/genetics; Humans; Liver Neoplasms/genetics; Liver Neoplasms/metabolism; Lung Neoplasms/genetics; Lung Neoplasms/metabolism; Neoplasms/classification; Neoplasms/metabolism; Protein Interaction Maps/genetics
15.
Bioinformatics; 34(13): i484-i493, 2018 Jul 01.
Article in English | MEDLINE | ID: mdl-29949979

ABSTRACT

Motivation: Network propagation has been widely used to aggregate and amplify the effects of tumor mutations using knowledge of molecular interaction networks. However, propagating mutations through interactions irrelevant to cancer leads to erosion of pathway signals and complicates the identification of cancer subtypes. Results: To address this problem we introduce a propagation algorithm, Network-Based Supervised Stratification (NBS2), which learns the mutated subnetworks underlying tumor subtypes using a supervised approach. Given an annotated molecular network and reference tumor mutation profiles for which subtypes have been predefined, NBS2 is trained by adjusting the weights on interaction features such that network propagation best recovers the provided subtypes. After training, weights are fixed such that mutation profiles of new tumors can be accurately classified. We evaluate NBS2 on breast and glioblastoma tumors, demonstrating that it outperforms the best network-based approaches in classifying tumors to known subtypes for these diseases. By interpreting the interaction weights, we highlight characteristic molecular pathways driving selected subtypes. Availability and implementation: The NBS2 package is freely available at: https://github.com/wzhang1984/NBSS. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods; Mutation; Neoplasms/classification; Signal Transduction; Supervised Machine Learning; Breast Neoplasms/classification; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Female; Glioblastoma/classification; Glioblastoma/genetics; Glioblastoma/metabolism; Humans; Neoplasms/genetics; Neoplasms/metabolism; Protein Interaction Maps; Software
16.
Cell; 173(7): 1562-1565, 2018 Jun 14.
Article in English | MEDLINE | ID: mdl-29906441

ABSTRACT

A major ambition of artificial intelligence lies in translating patient data to successful therapies. Machine learning models face particular challenges in biomedicine, however, including handling of extreme data heterogeneity and lack of mechanistic insight into predictions. Here, we argue for "visible" approaches that guide model structure with experimental biology.


Subjects
Computational Biology/methods; Machine Learning; Algorithms; Biomedical Research
17.
Nat Methods; 15(4): 290-298, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29505029

ABSTRACT

Although artificial neural networks are powerful classifiers, their internal structures are hard to interpret. In the life sciences, extensive knowledge of cell biology provides an opportunity to design visible neural networks (VNNs) that couple the model's inner workings to those of real systems. Here we develop DCell, a VNN embedded in the hierarchical structure of 2,526 subsystems comprising a eukaryotic cell (http://d-cell.ucsd.edu/). Trained on several million genotypes, DCell simulates cellular growth nearly as accurately as laboratory observations. During simulation, genotypes induce patterns of subsystem activities, enabling in silico investigations of the molecular mechanisms underlying genotype-phenotype associations. These mechanisms can be validated, and many are unexpected; some are governed by Boolean logic. Cumulatively, 80% of the importance for growth prediction is captured by 484 subsystems (21%), reflecting the emergence of a complex phenotype. DCell provides a foundation for decoding the genetics of disease, drug resistance and synthetic life.


Subjects
Cell Physiological Phenomena; Deep Learning; Neural Networks, Computer; Computer Simulation; Gene Expression Regulation; Genotype; Humans
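
The core idea of a visible neural network, in which each neuron corresponds to a named subsystem and connects only to its children in a biological hierarchy, can be sketched as follows. This is a toy illustration, not DCell: the three-subsystem hierarchy, the gene names, the uniform weights, and the single tanh neuron per subsystem are all assumptions.

```python
import math

# Hypothetical toy hierarchy: each subsystem lists its children (genes or subsystems).
HIERARCHY = {
    "cell": ["dna_repair", "cell_cycle"],
    "dna_repair": ["GENE_A", "GENE_B"],
    "cell_cycle": ["GENE_C"],
}


def subsystem_states(genotype, weights, root="cell"):
    """Propagate a genotype (gene -> 0/1 mutation flag) up the hierarchy.

    Returns the activation of every subsystem, so each internal state
    can be read off by name instead of being hidden in an anonymous layer.
    """
    states = {}

    def activate(term):
        if term in genotype:  # leaf: a gene
            return float(genotype[term])
        total = sum(weights[(child, term)] * activate(child) for child in HIERARCHY[term])
        states[term] = math.tanh(total)
        return states[term]

    activate(root)
    return states


# One weight per (child, parent) edge; all set to 1.0 for illustration.
weights = {(child, parent): 1.0 for parent, kids in HIERARCHY.items() for child in kids}
genotype = {"GENE_A": 1, "GENE_B": 0, "GENE_C": 0}
states = subsystem_states(genotype, weights)
```

Here the disrupted gene activates only the `dna_repair` subsystem and its ancestors, which is the kind of traceable pattern that makes such models interpretable.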
18.
Pac Symp Biocomput; 23: 602-613, 2018.
Article in English | MEDLINE | ID: mdl-29218918

ABSTRACT

Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.


Subjects
Gene Ontology/statistics & numerical data; Gene Regulatory Networks; Molecular Sequence Annotation/statistics & numerical data; Protein Interaction Maps; Algorithms; Computational Biology/methods; Data Mining/statistics & numerical data; Humans; Natural Language Processing; Neoplasms/genetics
19.
Bioinformatics; 33(14): i267-i273, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28881999

ABSTRACT

Motivation: Reconstructing the full-length expressed transcripts (a.k.a. the transcript assembly problem) from the short sequencing reads produced by the RNA-seq protocol plays a central role in identifying novel genes and transcripts as well as in studying gene expression and gene function. A crucial step in transcript assembly is to accurately determine the splicing junctions and boundaries of the expressed transcripts from the read alignment. In contrast to the splicing junctions, which can be efficiently detected from spliced reads, the problem of identifying boundaries remains open and challenging because the signal related to boundaries is noisy and weak. Results: We present DeepBound, an effective approach to identify boundaries of expressed transcripts from RNA-seq read alignments. At its core, DeepBound employs deep convolutional neural fields to learn the hidden distributions and patterns of boundaries. To accurately model the transition probabilities and to solve the label-imbalance problem, we incorporate the AUC (area under the curve) score into the objective function in a novel way. To address the issue that deep probabilistic graphical models require a large number of labeled training samples, we propose to use simulated RNA-seq datasets to train our model. Through extensive experimental studies on both simulation datasets of two species and biological datasets, we show that DeepBound consistently and significantly outperforms the two existing methods. Availability and implementation: DeepBound is freely available at https://github.com/realbigws/DeepBound. Contact: mingfu.shao@cs.cmu.edu or realbigws@gmail.com.


Subjects
RNA Splicing; Sequence Analysis, RNA/methods; Software; Algorithms; Area Under Curve; Computer Simulation; Exons; Humans; Introns; Models, Genetic
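
To see why an AUC-based objective helps with label imbalance, it is useful to compute AUC directly as a ranking statistic: the fraction of positive-negative pairs in which the positive example scores higher. The sketch below is a plain evaluation routine, not DeepBound's training objective (which would need a differentiable surrogate); the scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced data: a single boundary position among eight.
labels = [0, 0, 0, 0, 0, 0, 0, 1]
scores = [0.1, 0.2, 0.3, 0.1, 0.2, 0.4, 0.6, 0.5]

# 6 of the 7 positive-negative pairs are ranked correctly.
value = auc(scores, labels)
```

A classifier that labels every position as negative would reach 87.5% accuracy on this data while finding no boundary at all, whereas AUC depends only on how the rare positives rank against the negatives.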
20.
Bioinformatics; 32(17): i658-i664, 2016 Sep 01.
Article in English | MEDLINE | ID: mdl-27587686

ABSTRACT

MOTIVATION: As an increasing amount of protein-protein interaction (PPI) data becomes available, their computational interpretation has become an important problem in bioinformatics. The alignment of PPI networks from different species provides valuable information about conserved subnetworks, evolutionary pathways and functional orthologs. Although several methods have been proposed for global network alignment, there is a pressing need for methods that produce more accurate alignments in terms of both topological and functional consistency. RESULTS: In this work, we present a novel global network alignment algorithm, named ModuleAlign, which makes use of local topology information to define a module-based homology score. Based on a hierarchical clustering of functionally coherent proteins involved in the same module, ModuleAlign employs a novel iterative scheme to find the alignment between two networks. Evaluated on a diverse set of benchmarks, ModuleAlign outperforms state-of-the-art methods in producing functionally consistent alignments. By aligning Pathogen-Human PPI networks, ModuleAlign also detects a novel set of conserved human genes that pathogens preferentially target to cause pathogenesis. AVAILABILITY: http://ttic.uchicago.edu/~hashemifar/ModuleAlign.html CONTACT: canzar@ttic.edu or j3xu.ttic.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Protein Interaction Mapping; Protein Interaction Maps; Humans; Proteins; Software